From patchwork Fri Mar 10 15:22:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Schwinge X-Patchwork-Id: 66226 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 717E33858428 for ; Fri, 10 Mar 2023 15:23:29 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 37EAF3858D32 for ; Fri, 10 Mar 2023 15:23:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 37EAF3858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.98,250,1673942400"; d="scan'208,223";a="99558948" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 10 Mar 2023 07:23:10 -0800 IronPort-SDR: XkvqfAKaGmEHERy53OnFnqOEgD2MRS25hCel7fz6xgdhLGQ5X05OJb23YFps8SQzJErParYACP LSIGSkcybvPaeHRHrz+MC5fb3EHVVvE07aC1pK/Ej86xGr95r9JkIDw1lagds5kkndfwh5rJ3U b++2CBcQGQqnSj/LatBl5rK6xz0xZoSQQC9KXIiN0BkLMBGJ0uAwGwJ3rN3PwdTXNxDBSv88k2 fGk3E6JtgchOwR+HjxatXYV6ONUmK2ZDqNhMiP+3ku9tngFgi+nN70boAt+Lvzfw2ZH69lOm+p zo8= From: Thomas Schwinge To: Julian Brown , CC: Jakub Jelinek , Tobias Burnus Subject: Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data (was: [PATCH 3/4] openacc: Fix asynchronous host-to-device copies in libgomp runtime) In-Reply-To: <87r1fkt6s1.fsf@euler.schwinge.homeip.net> References: <560623585b89e577c9e76d8b53939569f49064fa.1624987598.git.julian@codesourcery.com> <87r1fkt6s1.fsf@euler.schwinge.homeip.net> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/28.2 (x86_64-pc-linux-gnu) Date: Fri, 10 Mar 2023 16:22:53 +0100 Message-ID: <87356cbpgy.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-14.mgc.mentorg.com (139.181.222.14) To svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hi! On 2021-07-27T12:01:18+0200, I wrote: > On 2021-06-29T16:42:03-0700, Julian Brown wrote: >> This patch fixes several places in libgomp/target.c where "ephemeral" data >> (on the stack or in temporary heap locations) may be used as the source of >> an asynchronous host-to-device copy that may not complete before the host >> data disappears. Versions of the patch have been posted several times >> before, but this one (at Chung-Lin Tang's prior suggesion, IIRC) moves >> all logic into target.c rather than pushing it out to each target plugin. > > Thanks for the re-work! >> +/* Copy host memory to an offload device. In asynchronous mode (if AQ is >> + non-NULL), when the source data is stack or may otherwise be deallocated >> + before the asynchronous copy takes place, EPHEMERAL must be passed as >> + TRUE. The CBUF isn't used for non-ephemeral asynchronous copies, because >> + the host data might not be computed yet (by an earlier asynchronous compute >> + region). */ >> + >> [gomp_copy_host2dev] > > Code changes related to the latter sentence have moved into a separate > "Don't use libgomp 'cbuf' buffering with OpenACC 'async'", pushed to > master branch in commit d88a6951586c7229b25708f4486eaaf4bf4b5bbe, [...] Re this TODO comment: > + TODO ... but we could allow CBUF usage for EPHEMERAL data? (Open question: > + is it more performant to use libgomp CBUF buffering or individual device > + asyncronous copying?) */ Pushed to master branch commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd "Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data", see attached. Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 From 2b2340e236c0bba8aaca358ea25a5accd8249fbd Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Mon, 27 Feb 2023 16:41:17 +0100 Subject: [PATCH] Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data This does *allow*, but under no circumstances is this currently going to be used: all potentially applicable data is non-'ephemeral', and thus not considered for 'gomp_coalesce_buf_add' for OpenACC 'async'. (But a use will emerge later.) Follow-up to commit r12-2530-gd88a6951586c7229b25708f4486eaaf4bf4b5bbe "Don't use libgomp 'cbuf' buffering with OpenACC 'async'", addressing this TODO comment: TODO ... but we could allow CBUF usage for EPHEMERAL data? (Open question: is it more performant to use libgomp CBUF buffering or individual device asyncronous copying?) Ephemeral data is small, and therefore individual device asyncronous copying does seem dubious -- in particular given that for all those, we'd individually have to allocate and queue for deallocation a temporary buffer to capture the ephemeral data. Instead, just let the 'cbuf' *be* the temporary buffer. libgomp/ * target.c (gomp_copy_host2dev, gomp_map_vars_internal): Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data. --- libgomp/target.c | 70 +++++++++++++++++++++++++----------------------- 1 file changed, 36 insertions(+), 34 deletions(-) diff --git a/libgomp/target.c b/libgomp/target.c index 0344f68a936..074caa6a4dc 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -310,10 +310,8 @@ struct gomp_coalesce_buf This must not be used for asynchronous copies, because the host data might not be computed yet (by an earlier asynchronous compute region, for - example). - TODO ... but we could allow CBUF usage for EPHEMERAL data? (Open question: - is it more performant to use libgomp CBUF buffering or individual device - asyncronous copying?) */ + example). The exception is for EPHEMERAL data, that we know is available + already "by construction". */ static inline void gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len) @@ -377,30 +375,6 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep, void *d, const void *h, size_t sz, bool ephemeral, struct gomp_coalesce_buf *cbuf) { - if (__builtin_expect (aq != NULL, 0)) - { - /* See 'gomp_coalesce_buf_add'. */ - assert (!cbuf); - - void *h_buf = (void *) h; - if (ephemeral) - { - /* We're queueing up an asynchronous copy from data that may - disappear before the transfer takes place (i.e. because it is a - stack local in a function that is no longer executing). Make a - copy of the data into a temporary buffer in those cases. */ - h_buf = gomp_malloc (sz); - memcpy (h_buf, h, sz); - } - goacc_device_copy_async (devicep, devicep->openacc.async.host2dev_func, - "dev", d, "host", h_buf, h, sz, aq); - if (ephemeral) - /* Free temporary buffer once the transfer has completed. */ - devicep->openacc.async.queue_callback_func (aq, free, h_buf); - - return; - } - if (cbuf) { uintptr_t doff = (uintptr_t) d - cbuf->tgt->tgt_start; @@ -420,6 +394,12 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep, gomp_mutex_unlock (&devicep->lock); gomp_fatal ("internal libgomp cbuf error"); } + + /* In an asynchronous context, verify that CBUF isn't used + with non-EPHEMERAL data; see 'gomp_coalesce_buf_add'. */ + if (__builtin_expect (aq != NULL, 0)) + assert (ephemeral); + memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0].start), h, sz); return; @@ -430,7 +410,28 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep, } } - gomp_device_copy (devicep, devicep->host2dev_func, "dev", d, "host", h, sz); + if (__builtin_expect (aq != NULL, 0)) + { + void *h_buf = (void *) h; + if (ephemeral) + { + /* We're queueing up an asynchronous copy from data that may + disappear before the transfer takes place (i.e. because it is a + stack local in a function that is no longer executing). As we've + not been able to use CBUF, make a copy of the data into a + temporary buffer. */ + h_buf = gomp_malloc (sz); + memcpy (h_buf, h, sz); + } + goacc_device_copy_async (devicep, devicep->openacc.async.host2dev_func, + "dev", d, "host", h_buf, h, sz, aq); + if (ephemeral) + /* Free once the transfer has completed. */ + devicep->openacc.async.queue_callback_func (aq, free, h_buf); + } + else + gomp_device_copy (devicep, devicep->host2dev_func, + "dev", d, "host", h, sz); } attribute_hidden void @@ -1751,9 +1752,6 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, if (cbufp) { - /* See 'gomp_coalesce_buf_add'. */ - assert (!aq); - long c = 0; for (c = 0; c < cbuf.chunk_cnt; ++c) gomp_copy_host2dev (devicep, aq, @@ -1761,8 +1759,12 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, (char *) cbuf.buf + (cbuf.chunks[c].start - cbuf.chunks[0].start), cbuf.chunks[c].end - cbuf.chunks[c].start, - true, NULL); - free (cbuf.buf); + false, NULL); + if (aq) + /* Free once the transfer has completed. */ + devicep->openacc.async.queue_callback_func (aq, free, cbuf.buf); + else + free (cbuf.buf); cbuf.buf = NULL; cbufp = NULL; } -- 2.25.1