Message ID | 20220219190716.GA1470@delia.home |
---|---|
State | Committed |
Commit | 8e5c34ab45f34aadea65c5ba33ec685264b6ec66 |
Series |
[committed,nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt
|
|
Commit Message
Tom de Vries
Feb. 19, 2022, 7:07 p.m. UTC
Hi,

With the default ptx isa 6.0, we have for uniform-simt-1.c:
...
        @%r33 atom.global.cas.b32 %r26, [a], %r28, %r29;
        shfl.sync.idx.b32 %r26, %r26, %r32, 31, 0xffffffff;
...

The atomic insn is predicated by -muniform-simt, and the subsequent insn does
a warp sync, at which point the warp is uniform again.

But with -mptx=3.1, we have instead:
...
        @%r33 atom.global.cas.b32 %r26, [a], %r28, %r29;
        shfl.idx.b32 %r26, %r26, %r32, 31;
...

The shfl does not sync the warp, and we want the warp to go back to executing
uniformly asap.  We cannot enforce this, but at least check this using
nvptx_uniform_warp_check, similar to how that is done for openacc.

Likewise, detect the case that no shfl insn is emitted, and add a
nvptx_uniform_warp_check or nvptx_warpsync.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt

gcc/ChangeLog:

2022-02-19  Tom de Vries  <tdevries@suse.de>

        * config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Change return
        type to bool.
        (nvptx_reorg_uniform_simt): Insert nvptx_uniform_warp_check or
        nvptx_warpsync, if necessary.

gcc/testsuite/ChangeLog:

2022-02-19  Tom de Vries  <tdevries@suse.de>

        * gcc.target/nvptx/uniform-simt-1.c: Add scan-assembler test.
        * gcc.target/nvptx/uniform-simt-2.c: New test.

---
 gcc/config/nvptx/nvptx.cc                       | 34 ++++++++++++++++++++++---
 gcc/testsuite/gcc.target/nvptx/uniform-simt-1.c |  1 +
 gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c | 20 +++++++++++++++
 3 files changed, 52 insertions(+), 3 deletions(-)
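The three cases described in the commit message can be summarized as a small decision table. The following is only a sketch for illustration: the enum and function names are hypothetical and do not appear in nvptx.cc, and the two boolean parameters stand in for "a shuffle was emitted for this insn" and "the target PTX ISA is 6.0 or later".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; not part of GCC.  */
enum unisimt_followup
{
  FOLLOWUP_NONE,               /* A shfl.sync already syncs the warp.  */
  FOLLOWUP_WARPSYNC,           /* Emit a warp sync after the insn.  */
  FOLLOWUP_UNIFORM_WARP_CHECK  /* Emit a check that the warp is uniform.  */
};

/* Return what has to follow a predicated insn, given whether a shuffle
   was emitted for it and whether PTX ISA 6.0 or later is targeted.  */
enum unisimt_followup
choose_followup (bool shuffle_emitted, bool ptx_6_0)
{
  if (ptx_6_0)
    return shuffle_emitted ? FOLLOWUP_NONE : FOLLOWUP_WARPSYNC;

  /* Before PTX 6.0 the shuffle does not sync, so uniformity can only be
     checked, with or without a shuffle.  */
  return FOLLOWUP_UNIFORM_WARP_CHECK;
}

/* Usage: walk through all four combinations.  */
void
choose_followup_demo (void)
{
  assert (choose_followup (true, true) == FOLLOWUP_NONE);
  assert (choose_followup (false, true) == FOLLOWUP_WARPSYNC);
  assert (choose_followup (true, false) == FOLLOWUP_UNIFORM_WARP_CHECK);
  assert (choose_followup (false, false) == FOLLOWUP_UNIFORM_WARP_CHECK);
}
```

Note that only the PTX 6.0 path can enforce uniformity; the older path can merely detect that it was lost.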
Comments
Hi Tom!

This is me again, following along GCC/nvptx development, and asking
questions.  ;-)

On 2022-02-19T20:07:18+0100, Tom de Vries via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> With the default ptx isa 6.0, we have for uniform-simt-1.c:
> ...
>         @%r33 atom.global.cas.b32 %r26, [a], %r28, %r29;
>         shfl.sync.idx.b32 %r26, %r26, %r32, 31, 0xffffffff;
> ...
>
> The atomic insn is predicated by -muniform-simt, and the subsequent insn does
> a warp sync, at which point the warp is uniform again.

I understand the concern here is Independent Thread Scheduling, where the
execution of predicated-off threads of a warp ('@ ! %r33') may proceed
with the next instruction, 'shfl', without implicitly waiting for the
other threads of a warp still working on the 'atom'?  Hence, the 'sync'
aspect of 'shfl.sync', as a means that PTX provides at the ISA level such
that we're getting the desired semantics: as its first step, "wait for
all threads in membermask to arrive".

> But with -mptx=3.1, we have instead:
> ...
>         @%r33 atom.global.cas.b32 %r26, [a], %r28, %r29;
>         shfl.idx.b32 %r26, %r26, %r32, 31;
> ...
>
> The shfl does not sync the warp, and we want the warp to go back to executing
> uniformly asap.  We cannot enforce this

Is it really the case that such code may cause "permanent" warp-divergent
execution (until re-converging "somewhere")?  My understanding has been
that predicated-off threads of a warp ('@ ! %r33') would simply idle,
implicitly waiting for the other threads of a warp still working on the
'atom' -- due to the nature of a shared program counter per warp, and the
desire to re-converge as soon as possible.

For example, PTX ISA 7.2, 3.1. "A Set of SIMT Multiprocessors":

| [...]
| At every instruction issue time, the SIMT unit selects a warp that is ready to execute and
| issues the next instruction to the active threads of the warp. A warp executes one common
| instruction at a time, so full efficiency is realized when all threads of a warp agree on their
| execution path. If threads of a warp diverge via a data-dependent conditional branch, the
| warp serially executes each branch path taken, disabling threads that are not on that path,
| and when all paths complete, the threads converge back to the same execution path. [...]

So I'd have assumed that after the potentially-diverging
'@%r33'-predicated 'atom' instruction, we're implicitly re-converging for
the unpredicated 'shfl' (as long as Independent Thread Scheduling isn't
involved, which it isn't for '-mptx=3.1')?

As I'm understanding you, my understanding is not correct, and we may
thus be getting "permanent" warp-divergent execution as soon as there's
any predication/conditional involved that may evaluate differently for
individual threads of a warp, and we thus need such *explicit*
synchronization after all such instances?

> but at least check this using
> nvptx_uniform_warp_check, similar to how that is done for openacc.
>
> Likewise, detect the case that no shfl insn is emitted, and add a
> nvptx_uniform_warp_check or nvptx_warpsync.

For example, 'nvptx-none/mgomp/libatomic/cas_1_.o':

    [...]
     @ %r71 atom.cas.b64 %r62,[%r35],%r29,%r61;
    +{
    +.reg .b32 act;
    +vote.ballot.b32 act,1;
    +.reg .pred uni;
    +setp.eq.b32 uni,act,0xffffffff;
    +@ ! uni trap;
    +@ ! uni exit;
    +}
     mov.b64 {%r69,%r70},%r62;
     shfl.idx.b32 %r69,%r69,%r68,31;
     shfl.idx.b32 %r70,%r70,%r68,31;
    [...]

So that's basically an 'assert' that all threads of a warp are converged.
(Is the JIT maybe even able to optimize that out?)  I guess I just wonder
if that's not satisfied implicitly.


Regards
 Thomas


> [nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt
>
> gcc/ChangeLog:
>
> 2022-02-19  Tom de Vries  <tdevries@suse.de>
>
>       * config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Change return
>       type to bool.
>       (nvptx_reorg_uniform_simt): Insert nvptx_uniform_warp_check or
>       nvptx_warpsync, if necessary.
>
> gcc/testsuite/ChangeLog:
>
> 2022-02-19  Tom de Vries  <tdevries@suse.de>
>
>       * gcc.target/nvptx/uniform-simt-1.c: Add scan-assembler test.
>       * gcc.target/nvptx/uniform-simt-2.c: New test.
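The inserted PTX sequence quoted above (vote.ballot with a true predicate, then a comparison against 0xffffffff) can be modelled on the host to see why it acts as an assertion. This is a sketch only: the function names are hypothetical, the warp is represented as an array of 32 "lane is currently executing" flags, and real hardware behaviour (in particular what the JIT does with the sequence) is not captured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WARP_SIZE 32

/* Model of 'vote.ballot.b32 act,1': bit I of the result is set iff lane I
   takes part in the vote, since every participating lane votes "true".  */
uint32_t
ballot_of_active_lanes (const bool active[WARP_SIZE])
{
  uint32_t mask = 0;
  for (int lane = 0; lane < WARP_SIZE; lane++)
    if (active[lane])
      mask |= UINT32_C (1) << lane;
  return mask;
}

/* Model of 'setp.eq.b32 uni,act,0xffffffff': the warp counts as uniform
   only if all 32 lanes took part in the vote.  */
bool
warp_is_uniform (const bool active[WARP_SIZE])
{
  return ballot_of_active_lanes (active) == UINT32_C (0xffffffff);
}

/* Usage: a fully active warp passes, a warp with one lane missing would
   reach the 'trap' in the PTX sequence.  */
void
warp_is_uniform_demo (void)
{
  bool active[WARP_SIZE];
  for (int lane = 0; lane < WARP_SIZE; lane++)
    active[lane] = true;
  assert (warp_is_uniform (active));

  active[5] = false;
  assert (!warp_is_uniform (active));
}
```

The model makes the key property visible: the sequence observes which lanes arrived together, but does nothing to make the missing lanes arrive.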
On 2/23/22 10:06, Thomas Schwinge wrote:
> Hi Tom!
>
> This is me again, following along GCC/nvptx development, and asking
> questions.  ;-)
>

Yes, thanks for that, that's useful :)

> On 2022-02-19T20:07:18+0100, Tom de Vries via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> With the default ptx isa 6.0, we have for uniform-simt-1.c:
>> ...
>>         @%r33 atom.global.cas.b32 %r26, [a], %r28, %r29;
>>         shfl.sync.idx.b32 %r26, %r26, %r32, 31, 0xffffffff;
>> ...
>>
>> The atomic insn is predicated by -muniform-simt, and the subsequent insn does
>> a warp sync, at which point the warp is uniform again.
>
> I understand the concern here is Independent Thread Scheduling, where the
> execution of predicated-off threads of a warp ('@ ! %r33') may proceed
> with the next instruction, 'shfl', without implicitly waiting for the
> other threads of a warp still working on the 'atom'?  Hence, the 'sync'
> aspect of 'shfl.sync', as a means that PTX provides at the ISA level such
> that we're getting the desired semantics: as its first step, "wait for
> all threads in membermask to arrive".
>

Indeed.

>> But with -mptx=3.1, we have instead:
>> ...
>>         @%r33 atom.global.cas.b32 %r26, [a], %r28, %r29;
>>         shfl.idx.b32 %r26, %r26, %r32, 31;
>> ...
>>
>> The shfl does not sync the warp, and we want the warp to go back to executing
>> uniformly asap.  We cannot enforce this
>
> Is it really the case that such code may cause "permanent" warp-divergent
> execution (until re-converging "somewhere")?  My understanding has been
> that predicated-off threads of a warp ('@ ! %r33') would simply idle,
> implicitly waiting for the other threads of a warp still working on the
> 'atom' -- due to the nature of a shared program counter per warp, and the
> desire to re-converge as soon as possible.
>
> For example, PTX ISA 7.2, 3.1. "A Set of SIMT Multiprocessors":
>
> | [...]
> | At every instruction issue time, the SIMT unit selects a warp that is ready to execute and
> | issues the next instruction to the active threads of the warp. A warp executes one common
> | instruction at a time, so full efficiency is realized when all threads of a warp agree on their
> | execution path. If threads of a warp diverge via a data-dependent conditional branch, the
> | warp serially executes each branch path taken, disabling threads that are not on that path,
> | and when all paths complete, the threads converge back to the same execution path. [...]
>
> So I'd have assumed that after the potentially-diverging
> '@%r33'-predicated 'atom' instruction, we're implicitly re-converging for
> the unpredicated 'shfl' (as long as Independent Thread Scheduling isn't
> involved, which it isn't for '-mptx=3.1')?
>
> As I'm understanding you, my understanding is not correct, and we may
> thus be getting "permanent" warp-divergent execution as soon as there's
> any predication/conditional involved that may evaluate differently for
> individual threads of a warp, and we thus need such *explicit*
> synchronization after all such instances?
>

Reading the ptx manual, I think your interpretation of what _should_
happen is right.

Regardless, the JIT is still free to translate, say, a block of equally
predicated insns using a branch, as long as it inserts a warp sync right
after.  And then there might be a JIT bug that optimizes that sync away,
or shifts it further out, past the shfl.

So perhaps the rationale should have been formulated more in terms of the
shfl.

Note btw that it's possible that there's a compiler bug that does a
diverging branch earlier, which would give problems for the shfl, and
which the check would catch.

Note that the uniform-warp-check insn doesn't enforce convergence.  It
only checks that the warp is convergent.  So, if the warp is not
convergent, the check will abort.  If the warp is convergent, the JIT
optimizer is free to optimize the check away.  And sometimes we have seen
that adding the check makes the warp convergent (as in: preventing some
JIT bug from triggering).

Anyway, unfortunately at this point I don't remember whether I found a
smoking gun specifically for openmp.

Thanks,
- Tom

>> but at least check this using
>> nvptx_uniform_warp_check, similar to how that is done for openacc.
>>
>> Likewise, detect the case that no shfl insn is emitted, and add a
>> nvptx_uniform_warp_check or nvptx_warpsync.
>
> For example, 'nvptx-none/mgomp/libatomic/cas_1_.o':
>
>     [...]
>      @ %r71 atom.cas.b64 %r62,[%r35],%r29,%r61;
>     +{
>     +.reg .b32 act;
>     +vote.ballot.b32 act,1;
>     +.reg .pred uni;
>     +setp.eq.b32 uni,act,0xffffffff;
>     +@ ! uni trap;
>     +@ ! uni exit;
>     +}
>      mov.b64 {%r69,%r70},%r62;
>      shfl.idx.b32 %r69,%r69,%r68,31;
>      shfl.idx.b32 %r70,%r70,%r68,31;
>     [...]
>
> So that's basically an 'assert' that all threads of a warp are converged.
> (Is the JIT maybe even able to optimize that out?)  I guess I just wonder
> if that's not satisfied implicitly.
>
>
> Regards
>  Thomas
>
>
>> [nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt
>>
>> gcc/ChangeLog:
>>
>> 2022-02-19  Tom de Vries  <tdevries@suse.de>
>>
>>      * config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Change return
>>      type to bool.
>>      (nvptx_reorg_uniform_simt): Insert nvptx_uniform_warp_check or
>>      nvptx_warpsync, if necessary.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2022-02-19  Tom de Vries  <tdevries@suse.de>
>>
>>      * gcc.target/nvptx/uniform-simt-1.c: Add scan-assembler test.
>>      * gcc.target/nvptx/uniform-simt-2.c: New test.
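The discussion keeps returning to the pairing of a predicated atomic with a following shfl, so a host-side model of that pairing may help. The sketch below assumes a simplified view: only the master lane performs the compare-and-swap, and a shfl.idx-style step then copies the master's result register to every lane. The function names and the array-of-registers representation are hypothetical, and the clamp operand of the real shfl instruction is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define WARP_SIZE 32

/* Model of 'shfl.idx': every lane reads the register value held by
   lane SRC_LANE.  */
void
shfl_idx_model (uint32_t regs[WARP_SIZE], unsigned src_lane)
{
  assert (src_lane < WARP_SIZE);
  uint32_t value = regs[src_lane];
  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    regs[lane] = value;
}

/* Model of the uniform-simt pattern: a predicated 'atom.cas' that only
   lane MASTER executes, followed by a shuffle that propagates the old
   value it observed to all lanes.  */
void
unisimt_cas_model (uint32_t *mem, uint32_t compare, uint32_t swap,
                   unsigned master, uint32_t result[WARP_SIZE])
{
  assert (master < WARP_SIZE);

  /* Predicated atomic: performed once, by the master lane.  */
  uint32_t old = *mem;
  if (old == compare)
    *mem = swap;
  result[master] = old;

  /* Shuffle: all lanes end up with the master's result.  */
  shfl_idx_model (result, master);
}

/* Usage: after the pattern, memory was updated once and every lane sees
   the same old value.  */
void
unisimt_cas_demo (void)
{
  uint32_t mem = 1;
  uint32_t result[WARP_SIZE];
  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    result[lane] = 0xdeadbeef;

  unisimt_cas_model (&mem, 1, 0, 3, result);

  assert (mem == 0);
  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    assert (result[lane] == 1);
}
```

In this sequential model the shuffle trivially sees the master's finished result. On real hardware that ordering is exactly what the sync (or, before PTX 6.0, the uniformity check) is meant to protect.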
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index afbad5bdde6..4942f1100da 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -3248,12 +3248,18 @@ nvptx_call_insn_is_syscall_p (rtx_insn *insn)
 /* If SET subexpression of INSN sets a register, emit a shuffle instruction to
    propagate its value from lane MASTER to current lane.  */
 
-static void
+static bool
 nvptx_unisimt_handle_set (rtx set, rtx_insn *insn, rtx master)
 {
   rtx reg;
   if (GET_CODE (set) == SET && REG_P (reg = SET_DEST (set)))
-    emit_insn_after (nvptx_gen_shuffle (reg, reg, master, SHUFFLE_IDX), insn);
+    {
+      emit_insn_after (nvptx_gen_shuffle (reg, reg, master, SHUFFLE_IDX),
+                       insn);
+      return true;
+    }
+
+  return false;
 }
 
 /* Adjust code for uniform-simt code generation variant by making atomics and
@@ -3275,8 +3281,30 @@ nvptx_reorg_uniform_simt ()
         continue;
       rtx pat = PATTERN (insn);
       rtx master = nvptx_get_unisimt_master ();
+      bool shuffle_p = false;
       for (int i = 0; i < XVECLEN (pat, 0); i++)
-        nvptx_unisimt_handle_set (XVECEXP (pat, 0, i), insn, master);
+        shuffle_p
+          |= nvptx_unisimt_handle_set (XVECEXP (pat, 0, i), insn, master);
+      if (shuffle_p && TARGET_PTX_6_0)
+        {
+          /* The shuffle is a sync, so uniformity is guaranteed.  */
+        }
+      else
+        {
+          if (TARGET_PTX_6_0)
+            {
+              gcc_assert (!shuffle_p);
+              /* Emit after the insn, to guarantee uniformity.  */
+              emit_insn_after (gen_nvptx_warpsync (), insn);
+            }
+          else
+            {
+              /* Emit after the insn (and before the shuffle, if there are any)
+                 to check uniformity.  */
+              emit_insn_after (gen_nvptx_uniform_warp_check (), insn);
+            }
+        }
+
       rtx pred = nvptx_get_unisimt_predicate ();
       pred = gen_rtx_NE (BImode, pred, const0_rtx);
       pat = gen_rtx_COND_EXEC (VOIDmode, pred, pat);
diff --git a/gcc/testsuite/gcc.target/nvptx/uniform-simt-1.c b/gcc/testsuite/gcc.target/nvptx/uniform-simt-1.c
index 1bc0adae014..77cffc40a66 100644
--- a/gcc/testsuite/gcc.target/nvptx/uniform-simt-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/uniform-simt-1.c
@@ -16,3 +16,4 @@ f (void)
 }
 
 /* { dg-final { scan-assembler-times "@%r\[0-9\]*\tatom.global.cas" 1 } } */
+/* { dg-final { scan-assembler-times "shfl.sync.idx.b32" 1 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c b/gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c
new file mode 100644
index 00000000000..0f1e4e780fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c
@@ -0,0 +1,20 @@
+/* { dg-options "-O2 -muniform-simt -mptx=3.1" } */
+
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0,
+};
+
+int a = 0;
+
+int
+f (void)
+{
+  int expected = 1;
+  return __atomic_compare_exchange_n (&a, &expected, 0, 0, MEMMODEL_RELAXED,
+                                      MEMMODEL_RELAXED);
+}
+
+/* { dg-final { scan-assembler-times "@%r\[0-9\]*\tatom.global.cas" 1 } } */
+/* { dg-final { scan-assembler-times "shfl.idx.b32" 1 } } */
+/* { dg-final { scan-assembler-times "vote.ballot.b32" 1 } } */
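The new test only scans the generated assembly, so it may be useful to recall what the builtin it calls does at the C level. The following host-side sketch (not an nvptx test, and not part of the patch) wraps `__atomic_compare_exchange_n` in a hypothetical helper, `try_cas`, that also reports the value the builtin writes back into its `expected` argument on failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: attempt a strong, relaxed compare-and-swap on
   *OBJECT.  *OBSERVED receives EXPECTED on success, or the value actually
   found in *OBJECT on failure.  */
bool
try_cas (int *object, int expected, int desired, int *observed)
{
  bool ok = __atomic_compare_exchange_n (object, &expected, desired,
                                         0 /* strong */,
                                         __ATOMIC_RELAXED, __ATOMIC_RELAXED);
  *observed = expected;
  return ok;
}

/* Usage: the first call mirrors the test case (object holds 0, expected
   is 1), so the exchange fails and reports 0.  */
void
try_cas_demo (void)
{
  int a = 0;
  int observed = -1;

  assert (!try_cas (&a, 1, 0, &observed));
  assert (observed == 0 && a == 0);

  assert (try_cas (&a, 0, 5, &observed));
  assert (observed == 0 && a == 5);
}
```

With the values used in uniform-simt-2.c the exchange fails, which is fine for the test's purpose: it only needs the atom.global.cas, shfl.idx and vote.ballot instructions to appear in the output.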