From patchwork Thu Jan 13 10:07:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Schwinge X-Patchwork-Id: 49957 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2D786389F4CF for ; Thu, 13 Jan 2022 10:07:56 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 094DD3858029 for ; Thu, 13 Jan 2022 10:07:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 094DD3858029 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: ZtVXxC5+QP/Vj6IWfDh35gfpUdrK7GzL6mI3vqscg5cYtVSIQfNlFLorq947QJoNt1BiY5iE7R wsnP2s5+YXJloquFRJw3BD5nMkHqtNUYXbWpvgZsYafAUlMrlkBK9bAubYgPBe4mVvywc9o1nb qNuhpddWbWgj0hZs1f5beyFJ/OgvvbUeFbSD2hboH6ktSe08DlMrVUU12obSHBGcy3FOC7EGUj 5vHCATXyTw18WAalqrJY2CUobKvXCjUyxiQMBGJ+zuLiUL51W6sez32gjHYZrWno1JYNg2YiI6 ItndDDkKXeQaeP2dfEp0ExmU X-IronPort-AV: E=Sophos;i="5.88,284,1635235200"; d="scan'208,223";a="70715883" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 13 Jan 2022 02:07:36 -0800 IronPort-SDR: qEV3KTzVj9mSmQqtd2yDmr/kR+FaPi+ZE2s6JeTj83Fos0nGIOlTip1IlIed84SPkCz6lUOAeE RaPdy0L6wis318YN5yq+2D758m+cV+VruDCJybo8kS2Zpf57TCGbsqB4XYTfAcqLQh1n1SmDlQ 8Qfhyz3lAUHJoh4Eb2wjyVZqv10Ta9nphfr0j9lVJVWogc0KZFz9UfNlaaSwFIgvh5iVmqLfXH vmsIywpwzbkOZ3++QZFvaC2q7ay1TSR0geHLsBGqy1v0SFX23XmI0wUKTiNwFnFsjzQUUj1wFq lAY= From: Thomas Schwinge To: Julian Brown , Subject: Wait at end of OpenACC asynchronous kernels regions In-Reply-To: <79cc9084f24fec88df02daa5b099c8288ee06626.1565729221.git.julian@codesourcery.com> References: <79cc9084f24fec88df02daa5b099c8288ee06626.1565729221.git.julian@codesourcery.com> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/27.1 (x86_64-pc-linux-gnu) Date: Thu, 13 Jan 2022 11:07:26 +0100 Message-ID: <87a6g0dk4h.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hi! On 2019-08-13T14:37:13-0700, Julian Brown wrote: > This patch provides a workaround for unreliable operation of asynchronous > kernels regions on AMD GCN. At present, kernels regions are decomposed > into a series of parallel regions surrounded by a data region capturing > the data-movement clauses needed by the region as a whole: > > #pragma acc kernels async(n) > { ... } > > is translated to: ... simplified... > #pragma acc data copyin(...) copyout(...) > { > #pragma acc parallel async(n) present(...) > { ... } > #pragma acc parallel async(n) present(...) > { ... } > } > > This is however problematic for two reasons: > > - Variables mapped by the data clause will be unmapped immediately at the end > of the data region, regardless of whether the inner asynchronous > parallels have completed. (This causes crashes for GCN.) > > - Even if the "present" clause caused the reference count to stay above zero > at the end of the data region -- which it doesn't -- the "present" > clauses on the inner parallel regions would not cause "copyout" > variables to be transferred back to the host at the appropriate time, > i.e. when the async parallel region had completed. > There is no "async" data construct in OpenACC (Actually, as of OpenACC 3.2 there now is: "[OpenACC] 'async' clause on 'data' construct" -- but that's not yet implemented, so doesn't help us here.) > so the correct solution > (which I am deferring on for now) is probably to use asynchronous > "enter data" and "exit data" directives when translating asynchronous > kernels regions instead. (Or rather, use structured 'data' (as we're now doing), but with appropriate 'async' clauses.) > The attached patch just adds a "wait" operation before the end of > the enclosing data region. This works, but introduces undesirable > synchronisation with the host. ACK, thanks. Pushed to master branch in commit e52253bcc0916d9a7c7ba4bbe7501ae1ded3b8a8 "Wait at end of OpenACC asynchronous kernels regions", see attached. Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 From e52253bcc0916d9a7c7ba4bbe7501ae1ded3b8a8 Mon Sep 17 00:00:00 2001 From: Julian Brown Date: Fri, 9 Aug 2019 13:01:33 -0700 Subject: [PATCH] Wait at end of OpenACC asynchronous kernels regions In OpenACC 'kernels' decomposition, we're improperly nesting synchronous and asynchronous data and compute regions, giving rise to data races when the asynchronicity is actually executed, as is visible in at least on test case with GCN offloading. The proper fix is to correctly use the asynchronous interfaces, making the currently synchronous data regions fully asynchronous (see also "[OpenACC] 'async' clause on 'data' construct", which is to share the same implementation), but that's for later; for now add some more synchronization. gcc/ * omp-oacc-kernels-decompose.cc (add_wait): New function, split out of... (add_async_clauses_and_wait): ...here. Call new outlined function. (decompose_kernels_region_body): Add wait at the end of explicitly-asynchronous kernels regions. libgomp/ * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Remove GCN offloading execution XFAIL. Co-Authored-By: Thomas Schwinge --- gcc/omp-oacc-kernels-decompose.cc | 31 ++++++++++++++----- .../libgomp.oacc-c-c++-common/f-asyncwait-1.c | 1 - 2 files changed, 24 insertions(+), 8 deletions(-) diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc index 4ca899d5ece..21872db3ed3 100644 --- a/gcc/omp-oacc-kernels-decompose.cc +++ b/gcc/omp-oacc-kernels-decompose.cc @@ -878,6 +878,18 @@ maybe_build_inner_data_region (location_t loc, gimple *body, return body; } +static void +add_wait (location_t loc, gimple_seq *region_body) +{ + /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0). */ + tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT); + tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC); + gimple *wait_call = gimple_build_call (wait_fn, 2, + sync_arg, integer_zero_node); + gimple_set_location (wait_call, loc); + gimple_seq_add_stmt (region_body, wait_call); +} + /* Helper function of decompose_kernels_region_body. The statements in REGION_BODY are expected to be decomposed parts; add an 'async' clause to each. Also add a 'wait' directive at the end of the sequence. */ @@ -900,13 +912,7 @@ add_async_clauses_and_wait (location_t loc, gimple_seq *region_body) gimple_omp_target_set_clauses (as_a (stmt), target_clauses); } - /* A '#pragma acc wait' is just a call 'GOACC_wait (acc_async_sync, 0)'. */ - tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT); - tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC); - gimple *wait_call = gimple_build_call (wait_fn, 2, - sync_arg, integer_zero_node); - gimple_set_location (wait_call, loc); - gimple_seq_add_stmt (region_body, wait_call); + add_wait (loc, region_body); } /* Auxiliary analysis of the body of a kernels region, to determine for each @@ -1352,6 +1358,17 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses) a wait directive at the end. */ if (async_clause == NULL) add_async_clauses_and_wait (loc, ®ion_body); + else + /* !!! If we have asynchronous parallel blocks inside a (synchronous) data + region, then target memory will get unmapped at the point the data + region ends, even if the inner asynchronous parallels have not yet + completed. For kernels marked "async", we might want to use "enter data + async(...)" and "exit data async(...)" instead, or asynchronous data + regions (see also + "[OpenACC] 'async' clause on 'data' construct", + which is to share the same implementation). + For now, insert a (synchronous) wait at the end of the block. */ + add_wait (loc, ®ion_body); tree kernels_locals = gimple_bind_vars (as_a (kernels_body)); gimple *body = gimple_build_bind (kernels_locals, region_body, diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c index f7ccecbf4b4..ef7735b2ef4 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c @@ -3,7 +3,6 @@ /* Based on '../libgomp.oacc-fortran/asyncwait-1.f90'. */ /* { dg-additional-options "--param=openacc-kernels=decompose" } */ -/* { dg-xfail-run-if TODO { openacc_radeon_accel_selected } } */ /* { dg-additional-options "-fopt-info-all-omp" } { dg-additional-options "-foffload=-fopt-info-all-omp" } */ -- 2.34.1