x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI

Message ID 20211206032303.7159-1-hjl.tools@gmail.com
State Committed
Commit ceeffe968c01b1202e482f4855cb6baf5c6cb713
Series x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

H.J. Lu Dec. 6, 2021, 3:23 a.m. UTC
  Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.
---
 sysdeps/x86/cpu-features.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
  

Comments

Noah Goldstein Dec. 7, 2021, 7:47 a.m. UTC | #1
On Sun, Dec 5, 2021 at 9:23 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
> they won't lower CPU frequency when ZMM load and store instructions are
> used.
> ---
>  sysdeps/x86/cpu-features.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index be2498b2e7..311ade1f26 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
>           |= bit_arch_Prefer_No_VZEROUPPER;
>        else
>         {
> -         cpu_features->preferred[index_arch_Prefer_No_AVX512]
> -           |= bit_arch_Prefer_No_AVX512;
> +         /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
> +            when ZMM load and store instructions are used.  */
> +         if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
> +           cpu_features->preferred[index_arch_Prefer_No_AVX512]
> +             |= bit_arch_Prefer_No_AVX512;
>
>           /* Avoid RTM abort triggered by VZEROUPPER inside a
>              transactionally executing RTM region.  */
> --
> 2.33.1
>

Should we also do Rocket Lake?
According to Travis Downs, at least, downclocking isn't an issue there either:
https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
  
H.J. Lu Dec. 7, 2021, 12:53 p.m. UTC | #2
On Mon, Dec 6, 2021 at 11:47 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Sun, Dec 5, 2021 at 9:23 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
> > they won't lower CPU frequency when ZMM load and store instructions are
> > used.
> > ---
> >  sysdeps/x86/cpu-features.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > index be2498b2e7..311ade1f26 100644
> > --- a/sysdeps/x86/cpu-features.c
> > +++ b/sysdeps/x86/cpu-features.c
> > @@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
> >           |= bit_arch_Prefer_No_VZEROUPPER;
> >        else
> >         {
> > -         cpu_features->preferred[index_arch_Prefer_No_AVX512]
> > -           |= bit_arch_Prefer_No_AVX512;
> > +         /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
> > +            when ZMM load and store instructions are used.  */
> > +         if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
> > +           cpu_features->preferred[index_arch_Prefer_No_AVX512]
> > +             |= bit_arch_Prefer_No_AVX512;
> >
> >           /* Avoid RTM abort triggered by VZEROUPPER inside a
> >              transactionally executing RTM region.  */
> > --
> > 2.33.1
> >
>
> Should we also do Rocket Lake?
> According to Travis Downs, at least, downclocking isn't an issue there either:
> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake

Thiago, Arjan,

Is it true that Rocket Lake can use ZMM load/store?
  
Arjan van de Ven Dec. 7, 2021, 1:17 p.m. UTC | #3
On 12/7/2021 4:53 AM, H.J. Lu wrote:
>> Should we also do Rocket Lake?
>> According to Travis Downs, at least, downclocking isn't an issue there either:
>> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
> 
> Thiago, Arjan,
> 
> Is this true that Rocket Lake can use ZMM load/store?
> 


I have no specific data myself about Rocket Lake... but data is data...
so I'm all for trying it, but other than looking at cpuid's model number
I wouldn't know of an easy way to detect RKL vs ICL or others
  
H.J. Lu Dec. 7, 2021, 1:34 p.m. UTC | #4
On Tue, Dec 7, 2021 at 5:18 AM Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> On 12/7/2021 4:53 AM, H.J. Lu wrote:
> >> Should we also do Rocket Lake?
> >> According to Travis Downs, at least, downclocking isn't an issue there either:
> >> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
> >
> > Thiago, Arjan,
> >
> > Is this true that Rocket Lake can use ZMM load/store?
> >
>
>
> I have no specific data myself about rocket lake... but data is data...
> so I'm all for trying it, but other than looking at cpuid's model number

Hongtao, Hongyu,  can you find a Rocket Lake to test?

> I wouldn't know of an easy way to detect RKL vs ICL or others

In GCC, RKL ISAs are ICL ISAs without SGX.
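
For reference, a minimal sketch of the model-number check Arjan mentions,
assuming GCC or Clang's <cpuid.h> on x86.  The helper names are hypothetical,
and the model value 0xA7 for Rocket Lake is an assumption taken from the Linux
kernel's Intel family-6 model table, not from this thread.  This is not how
glibc does its own detection.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Display family and model decoded from CPUID leaf 1, register EAX.  */
struct cpu_sig
{
  unsigned int family;
  unsigned int model;
};

/* Decode the signature: the extended model bits apply to families 6
   and 15, the extended family bits apply to family 15 only.  */
struct cpu_sig
decode_sig (uint32_t eax)
{
  struct cpu_sig s;
  unsigned int family = (eax >> 8) & 0xf;
  unsigned int model = (eax >> 4) & 0xf;
  unsigned int ext_model = (eax >> 16) & 0xf;
  unsigned int ext_family = (eax >> 20) & 0xff;

  if (family == 0x6 || family == 0xf)
    model |= ext_model << 4;
  if (family == 0xf)
    family += ext_family;

  s.family = family;
  s.model = model;
  return s;
}

/* Assumption: family 6, model 0xA7 identifies Rocket Lake.  */
bool
sig_is_rocket_lake (struct cpu_sig s)
{
  return s.family == 6 && s.model == 0xa7;
}

/* Query the running CPU; returns false on non-x86 builds or when
   CPUID leaf 1 is unavailable.  */
bool
running_on_rocket_lake (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return sig_is_rocket_lake (decode_sig (eax));
#endif
  return false;
}
```

Calling running_on_rocket_lake () from a small test program is enough to
confirm which machine a benchmark ran on.  A model-number check like this has
to be extended for every new part, which is why a feature-bit test such as
the AVX-VNNI check in the patch is easier to maintain.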
  
Florian Weimer Dec. 7, 2021, 2:05 p.m. UTC | #5
* H. J. Lu via Libc-alpha:

> Hongtao, Hongyu,  can you find a Rocket Lake to test?

I've found a lab machine with an i7-11700 CPU.  Is there something I
could test for you?

(This could be non-production silicon, though.)

Thanks,
Florian
  
H.J. Lu Dec. 7, 2021, 2:15 p.m. UTC | #6
On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
>
> I've found a lab machine with an i7-11700 CPU.  Is there something I
> could test for you?

You can enable AVX512 in glibc with:

$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512

While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
there is no CPU frequency drop and the build is faster than it is
without GLIBC_TUNABLES, we can enable AVX512.
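
Where turbostat is not available, the current frequency can also be sampled
from the Linux cpufreq sysfs files.  The following is only a sketch: the
function name is hypothetical, and it assumes the file holds a single value
in kHz, as /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq does on
systems that expose cpufreq.

```c
#include <stdio.h>

/* Read one frequency value, in kHz, from a cpufreq sysfs file.
   Returns -1 if the file cannot be opened or parsed.  */
long
read_khz (const char *path)
{
  long khz = -1;
  FILE *f = fopen (path, "r");

  if (f == NULL)
    return -1;
  if (fscanf (f, "%ld", &khz) != 1)
    khz = -1;
  fclose (f);
  return khz;
}
```

Polling read_khz () for each core once a second during the build, with and
without GLIBC_TUNABLES set, gives a rough frequency trace to compare.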

> (This could be non-production silicon, though.)
>

The frequency behavior of non-production silicon can be different.
  
Florian Weimer Dec. 7, 2021, 3:47 p.m. UTC | #7
* H. J. Lu via Libc-alpha:

> On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
>>
>> I've found a lab machine with an i7-11700 CPU.  Is there something I
>> could test for you?
>
> You can enable AVX512 in glibc with:
>
> $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
>
> While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> there is no CPU frequency drop and the build is faster than it is
> without GLIBC_TUNABLES, we can enable AVX512.
>
>> (This could be non-production silicon, though.)
>>
>
> The frequency behavior of non-production silicon can be different.

With that caveat, it seems that frequencies drop further with
GLIBC_TUNABLES set as above, and the build is also a little bit slower
(5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
little bit warmer for the second run).

Would it make sense to run more extensive tests, or should we wait for
someone with production silicon to show up?

Thanks,
Florian
  
H.J. Lu Dec. 7, 2021, 3:52 p.m. UTC | #8
On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu via Libc-alpha:
> >>
> >> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
> >>
> >> I've found a lab machine with an i7-11700 CPU.  Is there something I
> >> could test for you?
> >
> > You can enable AVX512 in glibc with:
> >
> > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> >
> > While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> > there is no CPU frequency drop and the build is faster than it is
> > without GLIBC_TUNABLES, we can enable AVX512.
> >
> >> (This could be non-production silicon, though.)
> >>
> >
> > The frequency behavior of non-production silicon can be different.
>
> With that caveat, it seems that frequencies drop further with
> GLIBC_TUNABLES set as above, and the build is also a little bit slower
> (5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
> little bit warmer for the second run).
>
> Would it make sense to run more extensive tests, or should we wait for
> someone with production silicon to show up?

GCC is a heavy user of memcpy/memset, which is a good proxy of
ZMM load/store impact on CPU frequency.   We need to run the same
test on a production Rocket Lake.
  
Thiago Macieira Dec. 7, 2021, 4:22 p.m. UTC | #9
On Tuesday, 7 December 2021 07:52:44 PST H.J. Lu wrote:
> > Would it make sense to run more extensive tests, or should we wait for
> > someone with production silicon to show up?
> 
> GCC is a heavy user of memcpy/memset, which is a good proxy of
> ZMM load/store impact on CPU frequency.   We need to run the same
> test on a production Rocket Lake.

Can someone run the same test on an Ice Lake? That will also answer whether we 
should enable the same thing for ICL / ICX.

RKL is a Cypress Cove, so I'd expect it to have the same performance numbers 
as ICL's Sunny Cove. The data I have says that, in theory, we should not see a 
frequency drop for 512-bit memcpy / memset on ICL or TGL, but I haven't got 
experimental data confirming that. And I can't really run the benchmark test 
on a laptop with very poor thermal dissipation (freq drops to 1500 MHz all on 
its own).

If a good ICL has the drop, then I'd assume RKL will too.
  
Noah Goldstein Dec. 7, 2021, 7:32 p.m. UTC | #10
On Tue, Dec 7, 2021 at 9:53 AM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu via Libc-alpha:
> >
> > > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >>
> > >> * H. J. Lu via Libc-alpha:
> > >>
> > >> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
> > >>
> > >> I've found a lab machine with an i7-11700 CPU.  Is there something I
> > >> could test for you?
> > >
> > > You can enable AVX512 in glibc with:
> > >
> > > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> > >
> > > While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> > > there is no CPU frequency drop and the build is faster than it is
> > > without GLIBC_TUNABLES, we can enable AVX512.
> > >
> > >> (This could be non-production silicon, though.)
> > >>
> > >
> > > The frequency behavior of non-production silicon can be different.
> >
> > With that caveat, it seems that frequencies drop further with
> > GLIBC_TUNABLES set as above, and the build is also a little bit slower
> > (5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
> > little bit warmer for the second run).
> >
> > Would it make sense to run more extensive tests, or should we wait for
> > someone with production silicon to show up?
>
> GCC is a heavy user of memcpy/memset, which is a good proxy of
> ZMM load/store impact on CPU frequency.   We need to run the same
> test on a production Rocket Lake.

I would think a microbenchmark would be better for determining if
Rocket Lake actually has throttling.

Testing the full -j8 GCC build will add a bunch of frequency "noise"
due to thermal throttling.
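
A microbenchmark along those lines could look like the sketch below.  It
only times repeated memcpy calls.  Whether glibc actually selects a ZMM-based
memcpy depends on the ifunc selection at startup (for example on
GLIBC_TUNABLES) and on the copy size, so that is an assumption to verify, and
the CPU frequency still has to be observed separately.  The function name and
parameters are hypothetical, and the empty asm barrier assumes GCC or Clang.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy SIZE bytes ITERS times with memcpy and return the elapsed
   wall-clock time in seconds, or -1.0 if allocation fails.  */
double
time_memcpy (size_t size, int iters)
{
  unsigned char *src = malloc (size);
  unsigned char *dst = malloc (size);
  struct timespec t0, t1;

  if (src == NULL || dst == NULL)
    {
      free (src);
      free (dst);
      return -1.0;
    }
  memset (src, 0x5a, size);

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < iters; i++)
    {
      memcpy (dst, src, size);
      /* Compiler barrier so the copies are not optimized away.  */
      __asm__ volatile ("" : : "r" (dst) : "memory");
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);

  free (src);
  free (dst);
  return (double) (t1.tv_sec - t0.tv_sec)
	 + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Running this on a single pinned core for a few seconds, once with
GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512 and once without, keeps the
thermal load much lower than a parallel GCC bootstrap.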

>
> --
> H.J.
  
Sunil Pandey April 23, 2022, 1:51 a.m. UTC | #11
On Tue, Dec 7, 2021 at 11:33 AM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Dec 7, 2021 at 9:53 AM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu via Libc-alpha:
> > >
> > > > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >>
> > > >> * H. J. Lu via Libc-alpha:
> > > >>
> > > >> > Hongtao, Hongyu,  can you find a Rocket Lake to test?
> > > >>
> > > >> I've found a lab machine with an i7-11700 CPU.  Is there something I
> > > >> could test for you?
> > > >
> > > > You can enable AVX512 in glibc with:
> > > >
> > > > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> > > >
> > > > While bootstrapping GCC with -j8, track CPU frequency with turbostat.  If
> > > > there is no CPU frequency drop and the build is faster than it is
> > > > without GLIBC_TUNABLES, we can enable AVX512.
> > > >
> > > >> (This could be non-production silicon, though.)
> > > >>
> > > >
> > > > The frequency behavior of non-production silicon can be different.
> > >
> > > With that caveat, it seems that frequencies drop further with
> > > GLIBC_TUNABLES set as above, and the build is also a little bit slower
> > > (5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
> > > little bit warmer for the second run).
> > >
> > > Would it make sense to run more extensive tests, or should we wait for
> > > someone with production silicon to show up?
> >
> > GCC is a heavy user of memcpy/memset, which is a good proxy of
> > ZMM load/store impact on CPU frequency.   We need to run the same
> > test on a production Rocket Lake.
>
> I would think a microbenchmark would be better for determining if
> Rocket Lake actually has throttling.
>
> Testing the full -j8 GCC build will add a bunch of frequency "noise"
> due to thermal throttling.
>
> >
> > --
> > H.J.

I would like to backport this patch to release branches.
Any comments or objections?

--Sunil
  

Patch

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index be2498b2e7..311ade1f26 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -538,8 +538,11 @@  init_cpu_features (struct cpu_features *cpu_features)
 	  |= bit_arch_Prefer_No_VZEROUPPER;
       else
 	{
-	  cpu_features->preferred[index_arch_Prefer_No_AVX512]
-	    |= bit_arch_Prefer_No_AVX512;
+	  /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
+	     when ZMM load and store instructions are used.  */
+	  if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
+	    cpu_features->preferred[index_arch_Prefer_No_AVX512]
+	      |= bit_arch_Prefer_No_AVX512;
 
 	  /* Avoid RTM abort triggered by VZEROUPPER inside a
 	     transactionally executing RTM region.  */
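
As an illustration only, the effect of the changed branch can be modeled
outside glibc as below.  The bit value and function name are hypothetical
stand-ins: the real bit_arch_Prefer_No_AVX512 and the preferred[] array are
defined by glibc's cpu-features headers, and the surrounding conditions in
init_cpu_features are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value standing in for bit_arch_Prefer_No_AVX512.  */
#define TOY_PREFER_NO_AVX512 (1u << 0)

/* Model of the patched else-branch: the preference bit is set only
   when the CPU does not report AVX-VNNI.  Other bits already present
   in PREFERRED are left untouched.  */
uint32_t
apply_patched_branch (uint32_t preferred, bool has_avx_vnni)
{
  if (!has_avx_vnni)
    preferred |= TOY_PREFER_NO_AVX512;
  return preferred;
}
```

Before the patch, the equivalent function would have set the bit
unconditionally.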