From patchwork Tue Feb 1 18:30:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 50635 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 30CF83858C78 for ; Tue, 1 Feb 2022 18:31:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 30CF83858C78 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1643740317; bh=DK6c7xyojXnM/s8y99fWxhW4Rqg+rDCcx3k9q9TqBhI=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=ruf24KBQZa0yx31TOEGezMIdD5zocojmrfPNlJzsiHT6B5rnky0oK3MzcyvaJGdzn AltDaXU2+KxD1cFIAHsET2GOxWqosOz1M6F6lAFiqCxFsbQQQOQgVYw/LhOdjr0xt+ pah7fYN/Ws7hV3rwgnI2nChsNpUacFjePt7vtXss= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id 4EE623857805 for ; Tue, 1 Feb 2022 18:30:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4EE623857805 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 8112A1F37C for ; Tue, 1 Feb 2022 18:30:19 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 6D5D113B54 for ; Tue, 1 Feb 2022 18:30:19 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id fqKcGTt8+WEJCQAAMHmgww (envelope-from ) for ; Tue, 01 Feb 2022 18:30:19 +0000 Date: Tue, 1 Feb 2022 19:30:18 +0100 To: gcc-patches@gcc.gnu.org Subject: [committed][nvptx] Add some support for .local atomics Message-ID: <20220201183016.GA4137@delia.home> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom de Vries via Gcc-patches From: Tom de Vries Reply-To: Tom de Vries Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hi, The ptx insn atom doesn't support local memory. In case of doing an atomic operation on local memory, we run into: ... operation not supported on global/shared address space ... This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE. The message is somewhat confusing given that actually the operation is not supported on local address space. Fix this by falling back on a non-atomic version when detecting a frame-related memory operand. This only solves some cases that are detected at compile-time. It does however fix the openacc private-atomic-* test-cases. Tested on x86_64 with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx] Add some support for .local atomics gcc/ChangeLog: 2022-01-27 Tom de Vries * config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap_1") (define_insn "atomic_exchange") (define_insn "atomic_fetch_add") (define_insn "atomic_fetch_addsf") (define_insn "atomic_fetch_"): Output non-atomic version if memory operands is frame-relative. gcc/testsuite/ChangeLog: 2022-01-31 Tom de Vries * gcc.target/nvptx/stack-atomics-run.c: New test. libgomp/ChangeLog: 2022-01-27 Tom de Vries * testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove PR83812 workaround. * testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same. * testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same. --- gcc/config/nvptx/nvptx.md | 82 +++++++++++++++++++++- gcc/testsuite/gcc.target/nvptx/stack-atomics-run.c | 44 ++++++++++++ .../libgomp.oacc-c-c++-common/private-atomic-1.c | 7 -- .../private-atomic-1-vector.f90 | 7 -- .../private-atomic-1-worker.f90 | 7 -- 5 files changed, 124 insertions(+), 23 deletions(-) diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 773ae8fdc6f..9cbbd956f9d 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -1790,11 +1790,28 @@ (define_insn "atomic_compare_and_swap_1" (unspec_volatile:SDIM [(const_int 0)] UNSPECV_CAS))] "" { + struct address_info info; + decompose_mem_address (&info, operands[1]); + if (info.base != NULL && REG_P (*info.base) + && REGNO_PTR_FRAME_P (REGNO (*info.base))) + { + output_asm_insn ("{", NULL); + output_asm_insn ("\\t" ".reg.pred" "\\t" "%%eq_p;", NULL); + output_asm_insn ("\\t" ".reg%t0" "\\t" "%%val;", operands); + output_asm_insn ("\\t" "ld%A1%t0" "\\t" "%%val,%1;", operands); + output_asm_insn ("\\t" "setp.eq%t0" "\\t" "%%eq_p, %%val, %2;", + operands); + output_asm_insn ("@%%eq_p\\t" "st%A1%t0" "\\t" "%1,%3;", operands); + output_asm_insn ("\\t" "mov%t0" "\\t" "%0,%%val;", operands); + output_asm_insn ("}", NULL); + return ""; + } const char *t - = "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;"; + = "\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;"; return nvptx_output_atomic_insn (t, operands, 1, 4); } - [(set_attr "atomic" "true")]) + [(set_attr "atomic" "true") + (set_attr "predicable" "false")]) (define_insn "atomic_exchange" [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R") ;; output @@ -1806,6 +1823,19 @@ (define_insn "atomic_exchange" (match_operand:SDIM 2 "nvptx_nonmemory_operand" "Ri"))] ;; input "" { + struct address_info info; + decompose_mem_address (&info, operands[1]); + if (info.base != NULL && REG_P (*info.base) + && REGNO_PTR_FRAME_P (REGNO (*info.base))) + { + output_asm_insn ("{", NULL); + output_asm_insn ("\\t" ".reg%t0" "\\t" "%%val;", operands); + output_asm_insn ("%.\\t" "ld%A1%t0" "\\t" "%%val,%1;", operands); + output_asm_insn ("%.\\t" "st%A1%t0" "\\t" "%1,%2;", operands); + output_asm_insn ("%.\\t" "mov%t0" "\\t" "%0,%%val;", operands); + output_asm_insn ("}", NULL); + return ""; + } const char *t = "%.\tatom%A1.exch.b%T0\t%0, %1, %2;"; return nvptx_output_atomic_insn (t, operands, 1, 3); @@ -1823,6 +1853,22 @@ (define_insn "atomic_fetch_add" (match_dup 1))] "" { + struct address_info info; + decompose_mem_address (&info, operands[1]); + if (info.base != NULL && REG_P (*info.base) + && REGNO_PTR_FRAME_P (REGNO (*info.base))) + { + output_asm_insn ("{", NULL); + output_asm_insn ("\\t" ".reg%t0" "\\t" "%%val;", operands); + output_asm_insn ("\\t" ".reg%t0" "\\t" "%%update;", operands); + output_asm_insn ("%.\\t" "ld%A1%t0" "\\t" "%%val,%1;", operands); + output_asm_insn ("%.\\t" "add%t0" "\\t" "%%update,%%val,%2;", + operands); + output_asm_insn ("%.\\t" "st%A1%t0" "\\t" "%1,%%update;", operands); + output_asm_insn ("%.\\t" "mov%t0" "\\t" "%0,%%val;", operands); + output_asm_insn ("}", NULL); + return ""; + } const char *t = "%.\\tatom%A1.add%t0\\t%0, %1, %2;"; return nvptx_output_atomic_insn (t, operands, 1, 3); @@ -1840,6 +1886,22 @@ (define_insn "atomic_fetch_addsf" (match_dup 1))] "" { + struct address_info info; + decompose_mem_address (&info, operands[1]); + if (info.base != NULL && REG_P (*info.base) + && REGNO_PTR_FRAME_P (REGNO (*info.base))) + { + output_asm_insn ("{", NULL); + output_asm_insn ("\\t" ".reg%t0" "\\t" "%%val;", operands); + output_asm_insn ("\\t" ".reg%t0" "\\t" "%%update;", operands); + output_asm_insn ("%.\\t" "ld%A1%t0" "\\t" "%%val,%1;", operands); + output_asm_insn ("%.\\t" "add%t0" "\\t" "%%update,%%val,%2;", + operands); + output_asm_insn ("%.\\t" "st%A1%t0" "\\t" "%1,%%update;", operands); + output_asm_insn ("%.\\t" "mov%t0" "\\t" "%0,%%val;", operands); + output_asm_insn ("}", NULL); + return ""; + } const char *t = "%.\\tatom%A1.add%t0\\t%0, %1, %2;"; return nvptx_output_atomic_insn (t, operands, 1, 3); @@ -1860,6 +1922,22 @@ (define_insn "atomic_fetch_" (match_dup 1))] "mode == SImode || TARGET_SM35" { + struct address_info info; + decompose_mem_address (&info, operands[1]); + if (info.base != NULL && REG_P (*info.base) + && REGNO_PTR_FRAME_P (REGNO (*info.base))) + { + output_asm_insn ("{", NULL); + output_asm_insn ("\\t" ".reg.b%T0" "\\t" "%%val;", operands); + output_asm_insn ("\\t" ".reg.b%T0" "\\t" "%%update;", operands); + output_asm_insn ("%.\\t" "ld%A1%t0" "\\t" "%%val,%1;", operands); + output_asm_insn ("%.\\t" ".b%T0" "\\t" "%%update,%%val,%2;", + operands); + output_asm_insn ("%.\\t" "st%A1%t0" "\\t" "%1,%%update;", operands); + output_asm_insn ("%.\\t" "mov%t0" "\\t" "%0,%%val;", operands); + output_asm_insn ("}", NULL); + return ""; + } const char *t = "%.\\tatom%A1.b%T0.\\t%0, %1, %2;"; return nvptx_output_atomic_insn (t, operands, 1, 3); diff --git a/gcc/testsuite/gcc.target/nvptx/stack-atomics-run.c b/gcc/testsuite/gcc.target/nvptx/stack-atomics-run.c new file mode 100644 index 00000000000..ad8e2f842fb --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/stack-atomics-run.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ + +enum memmodel { + MEMMODEL_RELAXED = 0 +}; + +int +main (void) +{ + int a, b; + + a = 1; + __atomic_fetch_add (&a, 1, MEMMODEL_RELAXED); + if (a != 2) + __builtin_abort (); + + a = 0; + __atomic_fetch_or (&a, 1, MEMMODEL_RELAXED); + if (a != 1) + __builtin_abort (); + + a = 1; + b = -1; + b = __atomic_exchange_n (&a, 0, MEMMODEL_RELAXED); + if (a != 0) + __builtin_abort (); + if (b != 1) + __builtin_abort (); + + a = 1; + b = -1; + { + int expected = a; + b = __atomic_compare_exchange_n (&a, &expected, 0, 0, MEMMODEL_RELAXED, + MEMMODEL_RELAXED); + } + if (a != 0) + __builtin_abort (); + if (b != 1) + __builtin_abort (); + + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c index e651012f463..2f9e6f2d8a5 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c @@ -32,13 +32,6 @@ int main (void) { #pragma acc atomic update ++v; - /* nvptx offloading: PR83812 "operation not supported on global/shared address space". - { dg-output "(\n|\r\n|\r)libgomp: cuStreamSynchronize error: operation not supported on global/shared address space(\n|\r\n|\r)$" { target openacc_nvidia_accel_selected } } - Scan for what we expect in the "XFAILed" case (without actually XFAILing). - { dg-shouldfail "XFAILed" { openacc_nvidia_accel_selected } } - ... instead of 'dg-xfail-run-if' so that 'dg-output' is evaluated at all. - { dg-final { if { [dg-process-target { xfail openacc_nvidia_accel_selected }] == "F" } { xfail "[testname-for-summary] really is XFAILed" } } } - ... so that we still get an XFAIL visible in the log. */ } res += (v == -222 + 121); diff --git a/libgomp/testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90 b/libgomp/testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90 index e916837fc8f..3f39d9e18e8 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90 @@ -25,13 +25,6 @@ program main do i = 0, 31 !$acc atomic update w = w + 1 - ! nvptx offloading: PR83812 "operation not supported on global/shared address space". - ! { dg-output "(\n|\r\n|\r)libgomp: cuStreamSynchronize error: operation not supported on global/shared address space(\n|\r\n|\r)$" { target openacc_nvidia_accel_selected } } - ! Scan for what we expect in the "XFAILed" case (without actually XFAILing). - ! { dg-shouldfail "XFAILed" { openacc_nvidia_accel_selected } } - ! ... instead of 'dg-xfail-run-if' so that 'dg-output' is evaluated at all. - ! { dg-final { if { [dg-process-target { xfail openacc_nvidia_accel_selected }] == "F" } { xfail "[testname-for-summary] really is XFAILed" } } } - ! ... so that we still get an XFAIL visible in the log. !$acc end atomic end do arr(j) = w diff --git a/libgomp/testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90 b/libgomp/testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90 index 5fa157b1674..a86b7a491bc 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90 @@ -25,13 +25,6 @@ program main do i = 0, 31 !$acc atomic update w = w + 1 - ! nvptx offloading: PR83812 "operation not supported on global/shared address space". - ! { dg-output "(\n|\r\n|\r)libgomp: cuStreamSynchronize error: operation not supported on global/shared address space(\n|\r\n|\r)$" { target openacc_nvidia_accel_selected } } - ! Scan for what we expect in the "XFAILed" case (without actually XFAILing). - ! { dg-shouldfail "XFAILed" { openacc_nvidia_accel_selected } } - ! ... instead of 'dg-xfail-run-if' so that 'dg-output' is evaluated at all. - ! { dg-final { if { [dg-process-target { xfail openacc_nvidia_accel_selected }] == "F" } { xfail "[testname-for-summary] really is XFAILed" } } } - ! ... so that we still get an XFAIL visible in the log. !$acc end atomic end do arr(j) = w