[AArch64] PR105162: emit barrier for __sync and __atomic builtins on CPUs without LSE
Message ID: 1649362558922.26300@amazon.com
State: New
From: "Pop, Sebastian" <spop@amazon.com>
To: gcc-patches@gcc.gnu.org
Date: Thu, 7 Apr 2022 20:15:59 +0000
Attachments: 0001-AArch64-add-barrier-to-no-LSE-path-in-outline-atomic.patch, 0002-AArch64-emit-a-barrier-for-__atomic-builtins.patch
Commit Message
Pop, Sebastian via Gcc-patches
April 7, 2022, 8:15 p.m. UTC
Hi,

With -moutline-atomics, GCC stops generating a barrier for __sync builtins: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

This is a problem on CPUs without LSE instructions, where the load-/store-exclusive sequences do not guarantee a full barrier. The attached patch adds the barrier to the outline-atomics functions on the path without LSE instructions. As a consequence, under -moutline-atomics the __atomic and __sync builtins now behave the same with and without LSE instructions. To complete the change, the second patch makes GCC emit the barrier for __atomic builtins as well, i.e., independently of is_mm_sync().

Sebastian
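As an illustration of the builtins at stake (not part of the patch), a minimal sketch of the __sync operations whose full-barrier semantics PR105162 is about; on armv8-a without LSE these expand to load-/store-exclusive loops (or outline-atomics calls) that must end in a `dmb ish`:

```c
#include <assert.h>

/* Hypothetical demo helper, not from the patch: exercises two of the
   __sync builtins.  Each must act as a full memory barrier.  */
static int
sync_demo (void)
{
  int counter = 0;

  /* Full-barrier fetch-and-add: returns the previous value.  */
  int old = __sync_fetch_and_add (&counter, 5);
  assert (old == 0);

  /* Full-barrier compare-and-swap: returns the value found in memory,
     so the swap succeeded iff the return value equals the expected 5.  */
  int found = __sync_val_compare_and_swap (&counter, 5, 7);
  assert (found == 5);

  return counter;
}
```

The correctness issue is invisible in single-threaded code like this; it only shows up as reordering across the builtin on multi-core non-LSE hardware.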
Comments
Hi,

Wilco pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162#c7 that "Only __sync needs the extra full barrier, but __atomic does not." The attached patch does that by adding out-of-line functions for MEMMODEL_SYNC_*. Those new functions contain a barrier on the path without LSE instructions. I tested the patch on aarch64-linux with bootstrap and make check.

Sebastian
Hi Sebastian,

> Wilco pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162#c7 that
> "Only __sync needs the extra full barrier, but __atomic does not."
> The attached patch does that by adding out-of-line functions for MEMMODEL_SYNC_*.
> Those new functions contain a barrier on the path without LSE instructions.

Yes, adding _sync versions of the outline functions is the correct approach. However, there is no need to have separate _acq/_rel/_seq variants for every function, since all but one are _seq. Also, we should ensure we generate the same sequence as the inlined versions so that they are consistent. This means ensuring the LDXR macro ignores the 'A' for the _sync variants and that the swp function switches to acquire semantics.

Cheers,
Wilco
Hi Wilco,

Thanks for your review. Please find attached the patch amended following your recommendations. The number of new functions for _sync is reduced by 3x. I tested the patch on Graviton2 aarch64-linux. I also checked by hand that the outline functions in libgcc look similar to what GCC produces for the inline version.

Thanks,
Sebastian
Hi Sebastian,

> Please find attached the patch amended following your recommendations.
> The number of new functions for _sync is reduced by 3x.
> I tested the patch on Graviton2 aarch64-linux.
> I also checked by hand that the outline functions in libgcc look similar to what GCC produces for the inline version.

Yes, this looks good to me (still needs maintainer approval). One minor nitpick: a few of the tests check for __aarch64_cas2 - this should be __aarch64_cas2_sync. Note the patch still needs an appropriate commit message.

Cheers,
Wilco
> Yes this looks good to me (still needs maintainer approval).

Thanks again Wilco for your review.

> One minor nitpick,
> a few of the tests check for __aarch64_cas2 - this should be __aarch64_cas2_sync.

Fixed in the attached patch.

> Note the patch still needs an appropriate commit message.

Added the following ChangeLog entry to the commit message.

	* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
	of str array.
	* config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call
	memmodel_from_int and handle MEMMODEL_SYNC_*.
	(DEF0): Add __aarch64_*_sync functions.

	testsuite/
	* gcc.target/aarch64/sync-comp-swap-ool.c: New.
	* gcc.target/aarch64/sync-op-acquire-ool.c: New.
	* gcc.target/aarch64/sync-op-full-ool.c: New.
	* gcc.target/aarch64/target_attr_20.c: Update check.
	* gcc.target/aarch64/target_attr_21.c: Same.

	libgcc/
	* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
	* config/aarch64/t-lse: Add a 5th memory model for _sync functions.
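For readers following the ChangeLog, the model-to-column mapping it describes can be sketched as a standalone approximation of the switch in `aarch64_atomic_ool_func` (the enum values here are hypothetical stand-ins; GCC's real encoding lives in gcc/memmodel.h):

```c
#include <assert.h>

/* Illustrative stand-ins for GCC's memmodel values.  */
enum memmodel_demo {
  MM_RELAXED, MM_CONSUME, MM_ACQUIRE, MM_RELEASE, MM_ACQ_REL, MM_SEQ_CST,
  MM_SYNC_ACQUIRE, MM_SYNC_RELEASE, MM_SYNC_SEQ_CST
};

/* Column index into atomic_ool_names::str[mode][5]:
   0 = _relax, 1 = _acq, 2 = _rel, 3 = _acq_rel, 4 = _sync.  */
static int
model_index (enum memmodel_demo model)
{
  switch (model)
    {
    case MM_RELAXED:
      return 0;
    case MM_CONSUME:
    case MM_ACQUIRE:
      return 1;
    case MM_RELEASE:
      return 2;
    case MM_ACQ_REL:
    case MM_SEQ_CST:
      return 3;
    /* New in this patch: every MEMMODEL_SYNC_* model selects the
       __aarch64_*_sync helpers, which end in a full barrier.  */
    case MM_SYNC_ACQUIRE:
    case MM_SYNC_RELEASE:
    case MM_SYNC_SEQ_CST:
      return 4;
    }
  return -1;
}
```

This is why the `str` array grows from `[5][4]` to `[5][5]`: one extra column of helper names per operation size for the _sync variants.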
"Pop, Sebastian via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: >> Yes this looks good to me (still needs maintainer approval). > > Thanks again Wilco for your review. > >> One minor nitpick, >> a few of the tests check for __aarch64_cas2 - this should be __aarch64_cas2_sync. > > Fixed in the attached patch. > >> Note the patch still needs an appropriate commit message. > > Added the following ChangeLog entry to the commit message. > > * config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension > of str array. > * config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call > memmodel_from_int and handle MEMMODEL_SYNC_*. > (DEF0): Add __aarch64_*_sync functions. > > testsuite/ > * gcc.target/aarch64/sync-comp-swap-ool.c: New. > * gcc.target/aarch64/sync-op-acquire-ool.c: New. > * gcc.target/aarch64/sync-op-full-ool.c: New. > * gcc.target/aarch64/target_attr_20.c: Update check. > * gcc.target/aarch64/target_attr_21.c: Same. > > libgcc/ > * config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5. > * config/aarch64/t-lse: Add a 5th memory model for _sync functions. OK, thanks. Richard > From 3b624598035e4e0c1aee89efaae28596a64b3d0d Mon Sep 17 00:00:00 2001 > From: Sebastian Pop <spop@amazon.com> > Date: Mon, 18 Apr 2022 15:13:20 +0000 > Subject: [PATCH] [AArch64] add barriers to ool __sync builtins > > * config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension > of str array. > * config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call > memmodel_from_int and handle MEMMODEL_SYNC_*. > (DEF0): Add __aarch64_*_sync functions. > > testsuite/ > * gcc.target/aarch64/sync-comp-swap-ool.c: New. > * gcc.target/aarch64/sync-op-acquire-ool.c: New. > * gcc.target/aarch64/sync-op-full-ool.c: New. > * gcc.target/aarch64/target_attr_20.c: Update check. > * gcc.target/aarch64/target_attr_21.c: Same. > > libgcc/ > * config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5. > * config/aarch64/t-lse: Add a 5th memory model for _sync functions. 
> --- > gcc/config/aarch64/aarch64-protos.h | 2 +- > gcc/config/aarch64/aarch64.cc | 12 ++++-- > .../gcc.target/aarch64/sync-comp-swap-ool.c | 6 +++ > .../gcc.target/aarch64/sync-op-acquire-ool.c | 6 +++ > .../gcc.target/aarch64/sync-op-full-ool.c | 9 ++++ > .../gcc.target/aarch64/target_attr_20.c | 2 +- > .../gcc.target/aarch64/target_attr_21.c | 2 +- > libgcc/config/aarch64/lse.S | 42 +++++++++++++++++-- > libgcc/config/aarch64/t-lse | 8 ++-- > 9 files changed, 75 insertions(+), 14 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c > > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h > index 46bade28ed6..3ad5e77a1af 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -1051,7 +1051,7 @@ bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT); > > struct atomic_ool_names > { > - const char *str[5][4]; > + const char *str[5][5]; > }; > > rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx, > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 18f80499079..3ad11e84aae 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -22670,14 +22670,14 @@ aarch64_emit_unlikely_jump (rtx insn) > add_reg_br_prob_note (jump, profile_probability::very_unlikely ()); > } > > -/* We store the names of the various atomic helpers in a 5x4 array. > +/* We store the names of the various atomic helpers in a 5x5 array. > Return the libcall function given MODE, MODEL and NAMES. 
*/ > > rtx > aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx, > const atomic_ool_names *names) > { > - memmodel model = memmodel_base (INTVAL (model_rtx)); > + memmodel model = memmodel_from_int (INTVAL (model_rtx)); > int mode_idx, model_idx; > > switch (mode) > @@ -22717,6 +22717,11 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx, > case MEMMODEL_SEQ_CST: > model_idx = 3; > break; > + case MEMMODEL_SYNC_ACQUIRE: > + case MEMMODEL_SYNC_RELEASE: > + case MEMMODEL_SYNC_SEQ_CST: > + model_idx = 4; > + break; > default: > gcc_unreachable (); > } > @@ -22729,7 +22734,8 @@ aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx, > { "__aarch64_" #B #N "_relax", \ > "__aarch64_" #B #N "_acq", \ > "__aarch64_" #B #N "_rel", \ > - "__aarch64_" #B #N "_acq_rel" } > + "__aarch64_" #B #N "_acq_rel", \ > + "__aarch64_" #B #N "_sync" } > > #define DEF4(B) DEF0(B, 1), DEF0(B, 2), DEF0(B, 4), DEF0(B, 8), \ > { NULL, NULL, NULL, NULL } > diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c > new file mode 100644 > index 00000000000..372f4aa8746 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap-ool.c > @@ -0,0 +1,6 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=armv8-a+nolse -O2 -fno-ipa-icf -moutline-atomics" } */ > + > +#include "sync-comp-swap.x" > + > +/* { dg-final { scan-assembler-times "bl.*__aarch64_cas4_sync" 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c > new file mode 100644 > index 00000000000..95d9c56b5e1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire-ool.c > @@ -0,0 +1,6 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */ > + > +#include "sync-op-acquire.x" > + > +/* { dg-final { scan-assembler-times "bl.*__aarch64_swp4_sync" 1 } } */ > diff --git 
a/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c > new file mode 100644 > index 00000000000..2f3881d9755 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full-ool.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=armv8-a+nolse -O2 -moutline-atomics" } */ > + > +#include "sync-op-full.x" > + > +/* { dg-final { scan-assembler-times "bl.*__aarch64_ldadd4_sync" 1 } } */ > +/* { dg-final { scan-assembler-times "bl.*__aarch64_ldclr4_sync" 1 } } */ > +/* { dg-final { scan-assembler-times "bl.*__aarch64_ldeor4_sync" 1 } } */ > +/* { dg-final { scan-assembler-times "bl.*__aarch64_ldset4_sync" 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c > index 509fb039e84..c9454fc420b 100644 > --- a/gcc/testsuite/gcc.target/aarch64/target_attr_20.c > +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_20.c > @@ -24,4 +24,4 @@ bar (void) > } > } > > -/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_acq_rel" } } */ > +/* { dg-final { scan-assembler-not "bl.*__aarch64_cas2_sync" } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c > index acace4c8f2a..b8e56223b02 100644 > --- a/gcc/testsuite/gcc.target/aarch64/target_attr_21.c > +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_21.c > @@ -24,4 +24,4 @@ bar (void) > } > } > > -/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_acq_rel" 1 } } */ > +/* { dg-final { scan-assembler-times "bl.*__aarch64_cas2_sync" 1 } } */ > diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S > index c353ec2215b..9c29cf08b59 100644 > --- a/libgcc/config/aarch64/lse.S > +++ b/libgcc/config/aarch64/lse.S > @@ -87,24 +87,44 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see > # define L > # define M 0x000000 > # define N 0x000000 > +# define BARRIER > #elif MODEL == 2 > # define SUFF _acq > # define A a > # define L > # define M 0x400000 > # define N 0x800000 > +# define BARRIER > #elif MODEL == 3 > # define SUFF _rel > # define A > # define L l > # define M 0x008000 > # define N 0x400000 > +# define BARRIER > #elif MODEL == 4 > # define SUFF _acq_rel > # define A a > # define L l > # define M 0x408000 > # define N 0xc00000 > +# define BARRIER > +#elif MODEL == 5 > +# define SUFF _sync > +#ifdef L_swp > +/* swp has _acq semantics. */ > +# define A a > +# define L > +# define M 0x400000 > +# define N 0x800000 > +#else > +/* All other _sync functions have _seq semantics. */ > +# define A a > +# define L l > +# define M 0x408000 > +# define N 0xc00000 > +#endif > +# define BARRIER dmb ish > #else > # error > #endif > @@ -127,7 +147,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > #endif > > #define NAME(BASE) glue4(__aarch64_, BASE, SIZE, SUFF) > -#define LDXR glue4(ld, A, xr, S) > +#if MODEL == 5 > +/* Drop A for _sync functions. */ > +# define LDXR glue3(ld, xr, S) > +#else > +# define LDXR glue4(ld, A, xr, S) > +#endif > #define STXR glue4(st, L, xr, S) > > /* Temporary registers used. Other than these, only the return value > @@ -183,10 +208,16 @@ STARTFN NAME(cas) > bne 1f > STXR w(tmp1), s(1), [x2] > cbnz w(tmp1), 0b > -1: ret > +1: BARRIER > + ret > > #else > -#define LDXP glue3(ld, A, xp) > +#if MODEL == 5 > +/* Drop A for _sync functions. 
*/ > +# define LDXP glue2(ld, xp) > +#else > +# define LDXP glue3(ld, A, xp) > +#endif > #define STXP glue3(st, L, xp) > #ifdef HAVE_AS_LSE > # define CASP glue3(casp, A, L) x0, x1, x2, x3, [x4] > @@ -205,7 +236,8 @@ STARTFN NAME(cas) > bne 1f > STXP w(tmp2), x2, x3, [x4] > cbnz w(tmp2), 0b > -1: ret > +1: BARRIER > + ret > > #endif > > @@ -229,6 +261,7 @@ STARTFN NAME(swp) > 0: LDXR s(0), [x1] > STXR w(tmp1), s(tmp0), [x1] > cbnz w(tmp1), 0b > + BARRIER > ret > > ENDFN NAME(swp) > @@ -273,6 +306,7 @@ STARTFN NAME(LDNM) > OP s(tmp1), s(0), s(tmp0) > STXR w(tmp2), s(tmp1), [x1] > cbnz w(tmp2), 0b > + BARRIER > ret > > ENDFN NAME(LDNM) > diff --git a/libgcc/config/aarch64/t-lse b/libgcc/config/aarch64/t-lse > index 790cada3315..624daf7eddf 100644 > --- a/libgcc/config/aarch64/t-lse > +++ b/libgcc/config/aarch64/t-lse > @@ -18,13 +18,13 @@ > # along with GCC; see the file COPYING3. If not see > # <http://www.gnu.org/licenses/>. > > -# Compare-and-swap has 5 sizes and 4 memory models. > +# Compare-and-swap has 5 sizes and 5 memory models. > S0 := $(foreach s, 1 2 4 8 16, $(addsuffix _$(s), cas)) > -O0 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S0))) > +O0 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S0))) > > -# Swap, Load-and-operate have 4 sizes and 4 memory models > +# Swap, Load-and-operate have 4 sizes and 5 memory models > S1 := $(foreach s, 1 2 4 8, $(addsuffix _$(s), swp ldadd ldclr ldeor ldset)) > -O1 := $(foreach m, 1 2 3 4, $(addsuffix _$(m)$(objext), $(S1))) > +O1 := $(foreach m, 1 2 3 4 5, $(addsuffix _$(m)$(objext), $(S1))) > > LSE_OBJS := $(O0) $(O1)
Hi Sebastian,

>> Note the patch still needs an appropriate commit message.
>
> Added the following ChangeLog entry to the commit message.
>
>	* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
>	of str array.
>	* config/aarch64/aarch64.cc (aarch64_atomic_ool_func): Call
>	memmodel_from_int and handle MEMMODEL_SYNC_*.
>	(DEF0): Add __aarch64_*_sync functions.

When you commit this, please also add a "PR target/105162" line before the ChangeLog so the commit gets automatically added to the PR. We will need backports as well.

Cheers,
Wilco
Please see attached the patch back-ported to branches 12, 11, 10, and 9. Tested on aarch64-linux with bootstrap and regression test. Ok to commit to the GCC active branches? Thanks, Sebastian
"Pop, Sebastian" <spop@amazon.com> writes: > Please see attached the patch back-ported to branches 12, 11, 10, and 9. > Tested on aarch64-linux with bootstrap and regression test. > Ok to commit to the GCC active branches? OK, thanks. Only very safe patches are supposed to be going into GCC 9 at this stage, and I guess this one is a bit on the edge. But I think it's worth applying anyway because it's fixing a non-determinstic wrong-code regression. Richard
From 68c07f95157057f0167723b182f0ccffdac8a17e Mon Sep 17 00:00:00 2001
From: Sebastian Pop <spop@amazon.com>
Date: Thu, 7 Apr 2022 19:18:57 +0000
Subject: [PATCH 2/2] [AArch64] emit a barrier for __atomic builtins

---
 gcc/config/aarch64/aarch64.cc | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 18f80499079..be1b8d22c6a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22931,9 +22931,7 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
     aarch64_gen_compare_reg (NE, rval, const0_rtx);

-  /* Emit any final barrier needed for a __sync operation.  */
-  if (is_mm_sync (model))
-    aarch64_emit_post_barrier (model);
+  aarch64_emit_post_barrier (model);
 }

 /* Split an atomic operation.  */
@@ -22948,7 +22946,6 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
   const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
-  const bool is_sync = is_mm_sync (model);
   rtx_code_label *label;
   rtx x;
@@ -22966,11 +22963,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   /* The initial load can be relaxed for a __sync operation since a final
      barrier will be emitted to stop code hoisting.  */
-  if (is_sync)
-    aarch64_emit_load_exclusive (mode, old_out, mem,
-				 GEN_INT (MEMMODEL_RELAXED));
-  else
-    aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, GEN_INT (MEMMODEL_RELAXED));

   switch (code)
     {
@@ -23016,9 +23009,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
			    gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));

-  /* Emit any final barrier needed for a __sync operation.  */
-  if (is_sync)
-    aarch64_emit_post_barrier (model);
+  aarch64_emit_post_barrier (model);
 }

 static void
--
2.25.1
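For readers tracing the `is_mm_sync()` checks that this (ultimately superseded) second patch removed: GCC tells __sync-derived memory models apart from plain __atomic ones by an extra flag bit on the model value. The real definitions live in gcc/memmodel.h; the following standalone sketch uses an illustrative bit layout, not GCC's exact one:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative encoding: low bits carry the base C11-style model,
   one extra bit marks a model that came from a __sync builtin.  */
#define MM_BASE_MASK 0x7
#define MM_SYNC_FLAG 0x8

enum { MM_ACQUIRE = 2, MM_SEQ_CST = 5 };

/* A __sync model carries the flag bit on top of its base model.  */
static bool
is_mm_sync_demo (int model)
{
  return (model & MM_SYNC_FLAG) != 0;
}

/* Strip the __sync flag, leaving the plain base model.  */
static int
memmodel_base_demo (int model)
{
  return model & MM_BASE_MASK;
}
```

The removed code used `is_mm_sync (model)` to emit the trailing barrier only for __sync builtins; this patch dropped the condition and emitted it unconditionally, which is why it was later replaced by the dedicated __aarch64_*_sync outline functions instead.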