libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688]

Message ID Y3Hy1ckL3ZluEOSi@tucnak
State New
Headers
Series libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688] |

Commit Message

Jakub Jelinek Nov. 14, 2022, 7:48 a.m. UTC
  Hi!

Working virtually out of Baker Island.

We got a response from AMD in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
so the following patch starts treating AMD with AVX and CMPXCHG16B
ISAs like Intel by using vmovdqa for atomic load/store in libatomic.

Ok for trunk if it passes bootstrap/regtest?

2022-11-13  Jakub Jelinek  <jakub@redhat.com>

	PR target/104688
	* config/x86/init.c (__libat_feat1_init): Revert 2022-03-17 change
	- on x86_64 no longer clear bit_AVX if CPU vendor is not Intel.


	Jakub
  

Comments

Uros Bizjak Nov. 14, 2022, 7:55 a.m. UTC | #1
On Mon, Nov 14, 2022 at 8:48 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> Working virtually out of Baker Island.
>
> We got a response from AMD in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
> so the following patch starts treating AMD with AVX and CMPXCHG16B
> ISAs like Intel by using vmovdqa for atomic load/store in libatomic.
>
> Ok for trunk if it passes bootstrap/regtest?
>
> 2022-11-13  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/104688
>         * config/x86/init.c (__libat_feat1_init): Revert 2022-03-17 change
>         - on x86_64 no longer clear bit_AVX if CPU vendor is not Intel.
>
> --- libatomic/config/x86/init.c.jj      2022-03-17 18:48:56.708723194 +0100
> +++ libatomic/config/x86/init.c 2022-11-13 18:23:26.315440071 -1200
> @@ -34,18 +34,6 @@ __libat_feat1_init (void)
>    unsigned int eax, ebx, ecx, edx;
>    FEAT1_REGISTER = 0;
>    __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> -#ifdef __x86_64__
> -  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> -      == (bit_AVX | bit_CMPXCHG16B))
> -    {
> -      /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> -        is atomic, but so far we don't have this guarantee from AMD.  */
> -      unsigned int ecx2 = 0;
> -      __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> -      if (ecx2 != signature_INTEL_ecx)
> -       FEAT1_REGISTER &= ~bit_AVX;

We still need this, but also bypass it for AMD signature. There are
other vendors than Intel and AMD.

OK with the above addition.

Thanks,
Uros.

> -    }
> -#endif
>    /* See the load in load_feat1.  */
>    __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
>    return FEAT1_REGISTER;
>
>         Jakub
>
  
Xi Ruoyao Nov. 14, 2022, 8:19 a.m. UTC | #2
On Mon, 2022-11-14 at 08:55 +0100, Uros Bizjak via Gcc-patches wrote:
> On Mon, Nov 14, 2022 at 8:48 AM Jakub Jelinek <jakub@redhat.com>
> wrote:
> > 
> > Hi!
> > 
> > Working virtually out of Baker Island.
> > 
> > We got a response from AMD in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
> > so the following patch starts treating AMD with AVX and CMPXCHG16B
> > ISAs like Intel by using vmovdqa for atomic load/store in libatomic.
> > 
> > Ok for trunk if it passes bootstrap/regtest?
> > 
> > 2022-11-13  Jakub Jelinek  <jakub@redhat.com>
> > 
> >         PR target/104688
> >         * config/x86/init.c (__libat_feat1_init): Revert 2022-03-17
> > change
> >         - on x86_64 no longer clear bit_AVX if CPU vendor is not
> > Intel.
> > 
> > --- libatomic/config/x86/init.c.jj      2022-03-17
> > 18:48:56.708723194 +0100
> > +++ libatomic/config/x86/init.c 2022-11-13 18:23:26.315440071 -1200
> > @@ -34,18 +34,6 @@ __libat_feat1_init (void)
> >    unsigned int eax, ebx, ecx, edx;
> >    FEAT1_REGISTER = 0;
> >    __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> > -#ifdef __x86_64__
> > -  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> > -      == (bit_AVX | bit_CMPXCHG16B))
> > -    {
> > -      /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte
> > aligned address
> > -        is atomic, but so far we don't have this guarantee from
> > AMD.  */
> > -      unsigned int ecx2 = 0;
> > -      __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> > -      if (ecx2 != signature_INTEL_ecx)
> > -       FEAT1_REGISTER &= ~bit_AVX;
> 
> We still need this, but also bypass it for AMD signature. There are
> other vendors than Intel and AMD.

Mayshao: how about the status of this feature on Zhaoxin product lines?
IIRC they support AVX (but disabled by default in GCC for Lujiazui), but
we don't know if they make the guarantee about atomicity of 16B aligned
access.

> 
> OK with the above addition.
> 
> Thanks,
> Uros.
> 
> > -    }
> > -#endif
> >    /* See the load in load_feat1.  */
> >    __atomic_store_n (&__libat_feat1, FEAT1_REGISTER,
> > __ATOMIC_RELAXED);
> >    return FEAT1_REGISTER;
> > 
> >         Jakub
> >
  
Jakub Jelinek Nov. 14, 2022, 8:34 a.m. UTC | #3
On Mon, Nov 14, 2022 at 04:19:48PM +0800, Xi Ruoyao wrote:
> > > --- libatomic/config/x86/init.c.jj      2022-03-17
> > > 18:48:56.708723194 +0100
> > > +++ libatomic/config/x86/init.c 2022-11-13 18:23:26.315440071 -1200
> > > @@ -34,18 +34,6 @@ __libat_feat1_init (void)
> > >    unsigned int eax, ebx, ecx, edx;
> > >    FEAT1_REGISTER = 0;
> > >    __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> > > -#ifdef __x86_64__
> > > -  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> > > -      == (bit_AVX | bit_CMPXCHG16B))
> > > -    {
> > > -      /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte
> > > aligned address
> > > -        is atomic, but so far we don't have this guarantee from
> > > AMD.  */
> > > -      unsigned int ecx2 = 0;
> > > -      __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> > > -      if (ecx2 != signature_INTEL_ecx)
> > > -       FEAT1_REGISTER &= ~bit_AVX;
> > 
> > We still need this, but also bypass it for AMD signature. There are
> > other vendors than Intel and AMD.
> 
> Mayshao: how about the status of this feature on Zhaoxin product lines?
> IIRC they support AVX (but disabled by default in GCC for Lujiazui), but
> we don't know if they make the guarantee about atomicity of 16B aligned
> access.

I did the change on the assumption that only Intel and AMD implement AVX.
Looking around, I'm afraid Zhaoxin Zhangjiang/Wudaokou/Lujiazui
and VIA Eden C and VIA Nano C CPUs do support AVX too, the question is
if they implement CMPXCHG16B too.
From what is in i386-common.cc, none of non-Intel CPUs in there have
PTA_AVX and only Lujiazui has CX16.  But that doesn't need to match what
the HW actually does and one can just compile with -mcx16 -mavx -m64
rather than using some -march=whatever.

Sure, can change the check so that it checks for AMD too for now and
therefore discard the sync.md patch, the question is whom do we talk at
Zhaoxin and VIA and if there are any further other CX16+AVX CPUs

	Jakub
  
Xi Ruoyao Nov. 14, 2022, 8:44 a.m. UTC | #4
On Mon, 2022-11-14 at 09:34 +0100, Jakub Jelinek wrote:

> > Mayshao: how about the status of this feature on Zhaoxin product lines?
> > IIRC they support AVX (but disabled by default in GCC for Lujiazui), but
> > we don't know if they make the guarantee about atomicity of 16B aligned
> > access.
> 
> I did the change on the assumption that only Intel and AMD implement AVX.
> Looking around, I'm afraid Zhaoxin Zhangjiang/Wudaokou/Lujiazui
> and VIA Eden C and VIA Nano C CPUs do support AVX too, the question is
> if they implement CMPXCHG16B too.

According to r13-713, at least Lujiazui has CX16.

> From what is in i386-common.cc, none of non-Intel CPUs in there have
> PTA_AVX and only Lujiazui has CX16.  But that doesn't need to match what
> the HW actually does and one can just compile with -mcx16 -mavx -m64
> rather than using some -march=whatever.
> 
> Sure, can change the check so that it checks for AMD too for now and
> therefore discard the sync.md patch, the question is whom do we talk at
> Zhaoxin and VIA and if there are any further other CX16+AVX CPUs
> 
>         Jakub
>
  

Patch

--- libatomic/config/x86/init.c.jj	2022-03-17 18:48:56.708723194 +0100
+++ libatomic/config/x86/init.c	2022-11-13 18:23:26.315440071 -1200
@@ -34,18 +34,6 @@  __libat_feat1_init (void)
   unsigned int eax, ebx, ecx, edx;
   FEAT1_REGISTER = 0;
   __get_cpuid (1, &eax, &ebx, &ecx, &edx);
-#ifdef __x86_64__
-  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
-      == (bit_AVX | bit_CMPXCHG16B))
-    {
-      /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
-	 is atomic, but so far we don't have this guarantee from AMD.  */
-      unsigned int ecx2 = 0;
-      __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
-      if (ecx2 != signature_INTEL_ecx)
-	FEAT1_REGISTER &= ~bit_AVX;
-    }
-#endif
   /* See the load in load_feat1.  */
   __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
   return FEAT1_REGISTER;