From patchwork Tue Feb 22 16:39:42 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thomas Schwinge
X-Patchwork-Id: 51305
From: Thomas Schwinge
To: Tobias Burnus ,
Subject: Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90' (was: Add 'libgomp.oacc-fortran/privatized-ref-2.f90')
In-Reply-To: <87mtsow47q.fsf@euler.schwinge.homeip.net>
References: <85518091-66ad-4844-cd0f-b8a245d5af7b@codesourcery.com> <0968ff25a4dc4d9895682ce0669345c5@svr-orw-mbx-02.mgc.mentorg.com> <87mtsow47q.fsf@euler.schwinge.homeip.net>
User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/27.1 (x86_64-pc-linux-gnu)
Date: Tue, 22 Feb 2022 17:39:42 +0100
Message-ID: <87zgmionxt.fsf@euler.schwinge.homeip.net>
Cc: Andrew Stubbs

Hi!

On 2021-05-21T16:28:57+0200, I wrote:
> This came into existence internally, when the og10 branch was set up.
>
> On 2020-06-03T17:23:51+0200, Tobias Burnus wrote:
>> This fixes [...] on OG10 (og10_prerelease); it will be
>> later applied to gcn/… to fix the issue.  (Upstream is unaffected.)
>> [...]
>
> However, that means that your testcase does work on master branch (and
> would regress if certain commits got pushed there).
> As the testcase has
> got a property useful for a thing I'm currently working on, I've pushed
> to master branch "Add 'libgomp.oacc-fortran/privatized-ref-2.f90'" in
> commit 61796dc03befa9b7426d5bc7c336cca585944143

After commit a78b1ab1df9ca44acc5638e8f9d0ae2e62bd65ed
"amdgcn: Tune default OpenMP/OpenACC GPU utilization", we'd seen this
test case regress (only) on our AMD GPU amd-instinct1/'-march=gfx908'
system:

    {+WARNING: program timed out.+}
    [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test

Same for other optimization levels.  Nothing more in 'libgomp.log'.

I have determined this is a latent problem in the original test case,
which contains a few instances of code as follows:

    !$acc parallel copyout(array)
    array = [(-i, i = 1, nn)]
    !$acc loop gang private(array)
    do i = 1, 10
      array(i) = i
    end do
    if (any (array /= [(-i, i = 1, nn)])) error stop 1
    !$acc end parallel

Given the '!$acc loop gang', the whole containing '!$acc parallel' region
is launched with gang parallelism.  The '!$acc loop gang' executes in
gang-partitioned mode, but the 'array' assignment before and checks after
don't execute in a (hypothetical) gang-single mode, but instead in
gang-redundant mode, meaning that each gang executes these concurrently,
giving rise to data races and other mischief.

Thus, we have to make sure that we're not executing non-parallelized code
in gang-redundant mode, by putting these parts into their own 'parallel'
constructs, which then default to 'num_gangs(1)'.

Pushed to master branch commit f8187b5c0d22723c8e0a3d13d0ea5dd7ecfeff75
"Fix OpenACC gang-redundant execution in
'libgomp.oacc-fortran/privatized-ref-2.f90'", see attached.


Grüße
 Thomas


> I confirm that "FIXME: Fails due to PR middle-end/95499" is still a
> problem.
>
> And, GCC '-O' reports:
>
>     [...]/libgomp.oacc-fortran/privatized-ref-2.f90:147:21:
>
>       147 |   subroutine foobar15 (scalar)
>           |                     ^
>     Warning: ‘foobar15’ defined but not used [-Wunused-function]
>     [...]/libgomp.oacc-fortran/privatized-ref-2.f90: In function ‘MAIN__’:
>     [...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: ‘a.offset’ is used uninitialized [-Wuninitialized]
>        31 |   A = [(3*j, j=1, 10)]
>           |                      ^
>     [...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared here
>        27 |   integer, allocatable :: A(:)
>           |                              ^
>     [...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: ‘a.dim[0].lbound’ is used uninitialized [-Wuninitialized]
>        31 |   A = [(3*j, j=1, 10)]
>           |                      ^
>     [...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared here
>        27 |   integer, allocatable :: A(:)
>           |                              ^
>     [...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: ‘a.dim[0].ubound’ is used uninitialized [-Wuninitialized]
>        31 |   A = [(3*j, j=1, 10)]
>           |                      ^
>     [...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared here
>        27 |   integer, allocatable :: A(:)
>           |                              ^
>
> I haven't looked into these.
>
>
> Grüße
>  Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

From f8187b5c0d22723c8e0a3d13d0ea5dd7ecfeff75 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Fri, 21 Jan 2022 14:58:23 +0100
Subject: [PATCH] Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90'

This was a latent problem, and this commit here now resolves a regression that
after recent commit a78b1ab1df9ca44acc5638e8f9d0ae2e62bd65ed
"amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a
GCN offloading '-march=gfx908' system:

    {+WARNING: program timed out.+}
    [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test

Same for other optimization levels.

Make sure that we're not executing non-parallelized code in gang-redundant mode,
by putting these parts into their own 'parallel' constructs, which then default
to 'num_gangs(1)'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC
	gang-redundant execution.
---
 .../libgomp.oacc-fortran/privatized-ref-2.f90 | 42 ++++++++++++++-----
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
index f4a6af986e8..6bd17148911 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
@@ -53,12 +53,10 @@ contains
     integer :: array(nn)
 
     !$acc parallel copyout(array) ! { dg-line l_compute[incr c_compute] }
-    ! { dg-note {variable 'atmp\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'shadow_loopvar\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'offset\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
     ! { dg-note {variable 'S\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'test\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
     array = [(-i, i = 1, nn)]
+    !$acc end parallel
+    !$acc parallel copy(array)
     !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'array' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
@@ -66,6 +64,13 @@ contains
     do i = 1, 10
       array(i) = i
     end do
+    !$acc end parallel
+    !$acc parallel copyin(array) ! { dg-line l_compute[incr c_compute] }
+    ! { dg-note {variable 'test\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
+    ! { dg-note {variable 'atmp\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
+    ! { dg-note {variable 'shadow_loopvar\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
+    ! { dg-note {variable 'offset\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
+    ! { dg-note {variable 'S\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
     if (any (array /= [(-i, i = 1, nn)])) error stop 1
     !$acc end parallel
   end subroutine foo
@@ -74,14 +79,10 @@ contains
     integer :: array(:)
 
     !$acc parallel copyout(array) ! { dg-line l_compute[incr c_compute] }
-    ! { dg-note {variable 'atmp\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'shadow_loopvar\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'offset\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
     ! { dg-note {variable 'S\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'test\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'A\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: static} "" { target *-*-* } l_compute$c_compute }
     array = [(-2*i, i = 1, size(array))]
+    !$acc end parallel
+    !$acc parallel copy(array)
     !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'array\.[0-9]+' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
@@ -91,6 +92,11 @@ contains
     do i = 1, 10
       array(i) = 9*i
     end do
+    !$acc end parallel
+    !$acc parallel copyin(array) ! { dg-line l_compute[incr c_compute] }
+    ! { dg-note {variable 'test\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
+    ! { dg-note {variable 'A\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: static} "" { target *-*-* } l_compute$c_compute }
+    ! { dg-note {variable 'S\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute }
     if (any (array /= [(-2*i, i = 1, 10)])) error stop 2
     !$acc end parallel
   end subroutine bar
@@ -100,6 +106,8 @@ contains
 
     !$acc parallel copyout(str)
     str = "abcdefghij"
+    !$acc end parallel
+    !$acc parallel copy(str)
     !$acc loop gang private(str) ! { dg-line l_loop[incr c_loop] }
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'str' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
@@ -110,6 +118,8 @@ contains
     do i = 1, 10
       str(i:i) = achar(ichar('A') + i)
     end do
+    !$acc end parallel
+    !$acc parallel copyin(str)
     if (str /= "abcdefghij") error stop 3
     !$acc end parallel
   end
@@ -122,10 +132,14 @@ contains
     ! ***************************************
     !!$acc parallel copyout(str)
     str = "abcdefghij"
+    !!$acc end parallel
+    !!$acc parallel copy(str)
     !!$acc loop gang private(str)
     !do i = 1, 10
     !  str(i:i) = achar(ichar('A') + i)
     !end do
+    !!$acc end parallel
+    !!$acc parallel copyin(str)
     if (str /= "abcdefghij") error stop 5
     !!$acc end parallel
   end
@@ -135,6 +149,8 @@ contains
 
     !$acc parallel copyout(scalar)
     scalar = "abcdefghi-12345"
+    !$acc end parallel
+    !$acc parallel copy(scalar)
     !$acc loop gang private(scalar) ! { dg-line l_loop[incr c_loop] }
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'scalar' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
@@ -145,7 +161,9 @@ contains
       scalar(i:i) = achar(ichar('A') + i)
     end do
     !$acc end parallel
+    !$acc parallel copyin(scalar)
     if (scalar /= "abcdefghi-12345") error stop 6
+    !$acc end parallel
   end subroutine foobar
   subroutine foobar15 (scalar)
     integer :: i
@@ -153,11 +171,15 @@ contains
 
     !$acc parallel copyout(scalar)
     scalar = "abcdefghi-12345"
+    !$acc end parallel
+    !$acc parallel copy(scalar)
     !$acc loop gang private(scalar)
     do i = 1, 15
       scalar(i:i) = achar(ichar('A') + i)
     end do
     !$acc end parallel
+    !$acc parallel copyin(scalar)
     if (scalar /= "abcdefghi-12345") error stop 1
+    !$acc end parallel
   end subroutine foobar15
 end
-- 
2.34.1
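
For readers who want to try the restructuring outside the testsuite, here is a
minimal stand-alone sketch of the pattern the commit message describes: the
initialization, the gang-partitioned loop, and the check each get their own
'parallel' construct.  It is not part of the patch.  The program name
'split_regions_sketch' is hypothetical, and the sketch deliberately leaves out
the 'private' clause that the real test case exercises, so it should behave the
same whether it is built with OpenACC enabled (for example with gfortran's
'-fopenacc') or with the directives ignored.

```fortran
! Hypothetical stand-alone sketch; 'split_regions_sketch' is an illustrative
! name that does not appear in the test case.
program split_regions_sketch
  implicit none
  integer, parameter :: nn = 10
  integer :: array(nn)
  integer :: i

  ! Region 1: no 'loop gang' inside.  Per the commit message, such a region
  ! defaults to 'num_gangs(1)', so the initialization is not run redundantly.
  !$acc parallel copyout(array)
  array = [(-i, i = 1, nn)]
  !$acc end parallel

  ! Region 2: contains only the gang-partitioned loop.  Assumption: unlike
  ! the test case, there is no 'private' clause here, so the updates made
  ! by the loop are copied back and visible afterwards.
  !$acc parallel copy(array)
  !$acc loop gang
  do i = 1, nn
    array(i) = 2 * array(i)
  end do
  !$acc end parallel

  ! Region 3: the check, again in a region of its own.
  !$acc parallel copyin(array)
  if (any (array /= [(-2*i, i = 1, nn)])) error stop 1
  !$acc end parallel

  print '(a)', 'ok'
end program split_regions_sketch
```

The same three-way split is what the patch applies to 'foo', 'bar', and the
'foobar' variants, with 'copyout', 'copy', and 'copyin' chosen to match the
direction in which each region needs the data.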