x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
Checks
Context | Check | Description
dj/TryBot-apply_patch | success | Patch applied to master at the time it was sent
dj/TryBot-32bit | success | Build for i686
Commit Message
Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.
---
sysdeps/x86/cpu-features.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Comments
On Sun, Dec 5, 2021 at 9:23 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
> they won't lower CPU frequency when ZMM load and store instructions are
> used.
> ---
> sysdeps/x86/cpu-features.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index be2498b2e7..311ade1f26 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
> |= bit_arch_Prefer_No_VZEROUPPER;
> else
> {
> - cpu_features->preferred[index_arch_Prefer_No_AVX512]
> - |= bit_arch_Prefer_No_AVX512;
> + /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
> + when ZMM load and store instructions are used. */
> + if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
> + cpu_features->preferred[index_arch_Prefer_No_AVX512]
> + |= bit_arch_Prefer_No_AVX512;
>
> /* Avoid RTM abort triggered by VZEROUPPER inside a
> transactionally executing RTM region. */
> --
> 2.33.1
>
Should we also do Rocket Lake?
According to Travis Downs, at least, downclocking isn't an issue there either:
https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
On Mon, Dec 6, 2021 at 11:47 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Sun, Dec 5, 2021 at 9:23 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
> > they won't lower CPU frequency when ZMM load and store instructions are
> > used.
> > ---
> > sysdeps/x86/cpu-features.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > index be2498b2e7..311ade1f26 100644
> > --- a/sysdeps/x86/cpu-features.c
> > +++ b/sysdeps/x86/cpu-features.c
> > @@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
> > |= bit_arch_Prefer_No_VZEROUPPER;
> > else
> > {
> > - cpu_features->preferred[index_arch_Prefer_No_AVX512]
> > - |= bit_arch_Prefer_No_AVX512;
> > + /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
> > + when ZMM load and store instructions are used. */
> > + if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
> > + cpu_features->preferred[index_arch_Prefer_No_AVX512]
> > + |= bit_arch_Prefer_No_AVX512;
> >
> > /* Avoid RTM abort triggered by VZEROUPPER inside a
> > transactionally executing RTM region. */
> > --
> > 2.33.1
> >
>
> Should we also do Rocket Lake?
> According to Travis Downs at least downclocking is an issue there ether:
> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
Thiago, Arjan,
Is it true that Rocket Lake can use ZMM load/store?
On 12/7/2021 4:53 AM, H.J. Lu wrote:
>> Should we also do Rocket Lake?
>> According to Travis Downs at least downclocking is an issue there ether:
>> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
>
> Thiago, Arjan,
>
> Is this true that Rocket Lake can use ZMM load/store?
>
I have no specific data myself about Rocket Lake... but data is data...
so I'm all for trying it, but other than looking at cpuid's model number
I wouldn't know of an easy way to detect RKL vs ICL or others.
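
For reference, the model-number check mentioned above could look roughly like the sketch below. It decodes the display family and model from the EAX value returned by CPUID leaf 1. The function names are hypothetical and not part of glibc, and the Rocket Lake model number (family 6, model 0xA7) is an assumption taken from the Linux kernel's intel-family.h. The sketch does not check the vendor string, which a real check would also need.

```c
#include <stdbool.h>
#include <stdint.h>

/* Display family and model decoded from CPUID leaf 1 EAX.  */
struct cpu_sig
{
  unsigned int family;
  unsigned int model;
};

/* Hypothetical helper: decode the CPUID.1:EAX signature.  */
struct cpu_sig
decode_cpu_sig (uint32_t eax)
{
  struct cpu_sig sig;
  unsigned int ext_family = (eax >> 20) & 0xff;
  unsigned int ext_model = (eax >> 16) & 0xf;

  sig.family = (eax >> 8) & 0xf;
  sig.model = (eax >> 4) & 0xf;

  /* The extended model is used for base families 6 and 15.  */
  if (sig.family == 0x6 || sig.family == 0xf)
    sig.model |= ext_model << 4;

  /* The extended family is added only for base family 15.  */
  if (sig.family == 0xf)
    sig.family += ext_family;

  return sig;
}

/* Assumption: Rocket Lake is family 6, model 0xA7.  */
bool
is_rocket_lake (uint32_t eax)
{
  struct cpu_sig sig = decode_cpu_sig (eax);
  return sig.family == 0x6 && sig.model == 0xa7;
}
```

With a signature of 0x000A0671 this decodes to family 6, model 0xA7; an Ice Lake client signature such as 0x000706E5 decodes to model 0x7E and is not matched.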
On Tue, Dec 7, 2021 at 5:18 AM Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> On 12/7/2021 4:53 AM, H.J. Lu wrote:
> >> Should we also do Rocket Lake?
> >> According to Travis Downs at least downclocking is an issue there ether:
> >> https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#rocket-lake
> >
> > Thiago, Arjan,
> >
> > Is this true that Rocket Lake can use ZMM load/store?
> >
>
>
> I have no specific data myself about rocket lake... but data is data...
> so I'm all for trying it, but other than looking at cpuid's model number
Hongtao, Hongyu, can you find a Rocket Lake to test?
> I wouldn't know of an easy way to detect RKL vs ICL or others
In GCC, RKL ISAs are ICL ISAs without SGX.
* H. J. Lu via Libc-alpha:
> Hongtao, Hongyu, can you find a Rocket Lake to test?
I've found a lab machine with an i7-11700 CPU. Is there something I
could test for you?
(This could be non-production silicon, though.)
Thanks,
Florian
On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > Hongtao, Hongyu, can you find a Rocket Lake to test?
>
> I've found a lab machine with an i7-11700 CPU. Is there something I
> could test for you?
You can enable AVX512 in glibc with:
$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
While bootstrapping GCC with -j8, track CPU frequency with turbostat. If
there is no CPU frequency drop and the build time is shorter than
without GLIBC_TUNABLES, we can enable AVX512.
> (This could be non-production silicon, though.)
>
The frequency behavior of non-production silicon can be different.
* H. J. Lu via Libc-alpha:
> On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > Hongtao, Hongyu, can you find a Rocket Lake to test?
>>
>> I've found a lab machine with an i7-11700 CPU. Is there something I
>> could test for you?
>
> You can enable AVX512 in glibc with:
>
> $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
>
> While bootstrapping GCC with -j8, track CPU frequency with turbostat. If
> there is no CPU frequency drop and build time is less comparing against
> without GLIBC_TUNABLES, we can enable AVX512.
>
>> (This could be non-production silicon, though.)
>>
>
> The frequency behavior of non-production silicon can be different.
With that caveat, it seems that frequencies drop further with
GLIBC_TUNABLES set as above, and the build is also a little bit slower
(5m31s vs 5m23s; the AVX-512 build was run first, and the system was a
little bit warmer for the second run).
Would it make sense to run more extensive tests, or should we wait for
someone with production silicon to show up?
Thanks,
Florian
On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu via Libc-alpha:
> >>
> >> > Hongtao, Hongyu, can you find a Rocket Lake to test?
> >>
> >> I've found a lab machine with an i7-11700 CPU. Is there something I
> >> could test for you?
> >
> > You can enable AVX512 in glibc with:
> >
> > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> >
> > While bootstrapping GCC with -j8, track CPU frequency with turbostat. If
> > there is no CPU frequency drop and build time is less comparing against
> > without GLIBC_TUNABLES, we can enable AVX512.
> >
> >> (This could be non-production silicon, though.)
> >>
> >
> > The frequency behavior of non-production silicon can be different.
>
> With that caveat, it seems that frequencies drop further with
> GLIBC_TUNABLES set as above, and the build is also a little bit slower
> (5m31s vs 5m23s, the AVX-512 build was run first, and the systems was a
> little bit warmer for the second run).
>
> Would it make sense to run more extensive tests, or should we wait for
> someone with production silicon to show up?
GCC is a heavy user of memcpy/memset, which makes it a good proxy for
the impact of ZMM load/store on CPU frequency. We need to run the same
test on a production Rocket Lake.
On Tuesday, 7 December 2021 07:52:44 PST H.J. Lu wrote:
> > Would it make sense to run more extensive tests, or should we wait for
> > someone with production silicon to show up?
>
> GCC is a heavy user of memcpy/memset, which is a good proxy of
> ZMM load/store impact on CPU frequency. We need to run the same
> test on a production Rocket Lake.
Can someone run the same test on an Ice Lake? That will also answer whether we
should enable the same thing for ICL / ICX.
RKL is a Cypress Cove, so I'd expect it to have the same performance numbers
as ICL's Sunny Cove. The data I have says that, in theory, we should not see a
frequency drop for 512-bit memcpy / memset on ICL or TGL, but I haven't got
experimental data confirming that. And I can't really run the benchmark test
on a laptop with very poor thermal dissipation (freq drops to 1500 MHz all on
its own).
If a good ICL has the drop, then I'd assume RKL will too.
On Tue, Dec 7, 2021 at 9:53 AM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu via Libc-alpha:
> >
> > > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >>
> > >> * H. J. Lu via Libc-alpha:
> > >>
> > >> > Hongtao, Hongyu, can you find a Rocket Lake to test?
> > >>
> > >> I've found a lab machine with an i7-11700 CPU. Is there something I
> > >> could test for you?
> > >
> > > You can enable AVX512 in glibc with:
> > >
> > > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> > >
> > > While bootstrapping GCC with -j8, track CPU frequency with turbostat. If
> > > there is no CPU frequency drop and build time is less comparing against
> > > without GLIBC_TUNABLES, we can enable AVX512.
> > >
> > >> (This could be non-production silicon, though.)
> > >>
> > >
> > > The frequency behavior of non-production silicon can be different.
> >
> > With that caveat, it seems that frequencies drop further with
> > GLIBC_TUNABLES set as above, and the build is also a little bit slower
> > (5m31s vs 5m23s, the AVX-512 build was run first, and the systems was a
> > little bit warmer for the second run).
> >
> > Would it make sense to run more extensive tests, or should we wait for
> > someone with production silicon to show up?
>
> GCC is a heavy user of memcpy/memset, which is a good proxy of
> ZMM load/store impact on CPU frequency. We need to run the same
> test on a production Rocket Lake.
I would think a microbenchmark would be better for determining whether
Rocket Lake actually has throttling.
Testing the full -j8 GCC build will add a bunch of frequency "noise"
due to thermal throttling.
>
> --
> H.J.
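
A minimal sketch of such a microbenchmark, assuming a POSIX system with clock_gettime: it times a loop of memcpy calls through libc, so it can be run once with and once without GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512 and the results compared. The function name, buffer size and iteration count are arbitrary choices for illustration. Wall time alone does not show the core frequency, so it would still have to be watched with turbostat alongside.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: return the average time in nanoseconds of one
   memcpy call of SIZE bytes over ITERS calls, or -1.0 on bad input or
   allocation failure.  */
double
time_memcpy_ns (size_t size, unsigned int iters)
{
  unsigned char *src = malloc (size);
  unsigned char *dst = malloc (size);
  struct timespec t0, t1;
  double ns;

  if (src == NULL || dst == NULL || iters == 0)
    {
      free (src);
      free (dst);
      return -1.0;
    }

  /* Touch both buffers so page faults are not part of the timing.  */
  memset (src, 0x5a, size);
  memset (dst, 0, size);

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (unsigned int i = 0; i < iters; i++)
    {
      memcpy (dst, src, size);
      /* Compiler barrier (GNU extension) so the copy is not
         optimized away.  */
      __asm__ volatile ("" : : "r" (dst) : "memory");
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);

  ns = (double) (t1.tv_sec - t0.tv_sec) * 1e9
       + (double) (t1.tv_nsec - t0.tv_nsec);

  free (src);
  free (dst);
  return ns / iters;
}

#ifdef MEMCPY_BENCH_MAIN
/* Build with -DMEMCPY_BENCH_MAIN to get a standalone program.  */
int
main (void)
{
  printf ("%.1f ns per 1 MiB memcpy\n", time_memcpy_ns (1 << 20, 10000));
  return 0;
}
#endif
```

Because the copy size is only known at run time, the call goes through the libc memcpy that the tunable affects, rather than being expanded inline by the compiler.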
On Tue, Dec 7, 2021 at 11:33 AM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Dec 7, 2021 at 9:53 AM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > On Tue, Dec 7, 2021 at 7:48 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu via Libc-alpha:
> > >
> > > > On Tue, Dec 7, 2021 at 6:05 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >>
> > > >> * H. J. Lu via Libc-alpha:
> > > >>
> > > >> > Hongtao, Hongyu, can you find a Rocket Lake to test?
> > > >>
> > > >> I've found a lab machine with an i7-11700 CPU. Is there something I
> > > >> could test for you?
> > > >
> > > > You can enable AVX512 in glibc with:
> > > >
> > > > $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-Prefer_No_AVX512
> > > >
> > > > While bootstrapping GCC with -j8, track CPU frequency with turbostat. If
> > > > there is no CPU frequency drop and build time is less comparing against
> > > > without GLIBC_TUNABLES, we can enable AVX512.
> > > >
> > > >> (This could be non-production silicon, though.)
> > > >>
> > > >
> > > > The frequency behavior of non-production silicon can be different.
> > >
> > > With that caveat, it seems that frequencies drop further with
> > > GLIBC_TUNABLES set as above, and the build is also a little bit slower
> > > (5m31s vs 5m23s, the AVX-512 build was run first, and the systems was a
> > > little bit warmer for the second run).
> > >
> > > Would it make sense to run more extensive tests, or should we wait for
> > > someone with production silicon to show up?
> >
> > GCC is a heavy user of memcpy/memset, which is a good proxy of
> > ZMM load/store impact on CPU frequency. We need to run the same
> > test on a production Rocket Lake.
>
> I would think a microbenchmark would be better for determining if
> rocketlake actually has throttling.
>
> Testing the full j8 GCC build will add a bunch of frequency "noise"
> due to thermal throttling.
>
> >
> > --
> > H.J.
I would like to backport this patch to release branches.
Any comments or objections?
--Sunil
@@ -538,8 +538,11 @@ init_cpu_features (struct cpu_features *cpu_features)
|= bit_arch_Prefer_No_VZEROUPPER;
else
{
- cpu_features->preferred[index_arch_Prefer_No_AVX512]
- |= bit_arch_Prefer_No_AVX512;
+ /* Processors with AVX512 and AVX-VNNI won't lower CPU frequency
+ when ZMM load and store instructions are used. */
+ if (!CPU_FEATURES_CPU_P (cpu_features, AVX_VNNI))
+ cpu_features->preferred[index_arch_Prefer_No_AVX512]
+ |= bit_arch_Prefer_No_AVX512;
/* Avoid RTM abort triggered by VZEROUPPER inside a
transactionally executing RTM region. */
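
The effect of the hunk above can be sketched outside glibc as a small predicate on the raw CPUID bits. This is an illustration only: the function names are hypothetical, the surrounding conditions in init_cpu_features that the hunk does not show are ignored, and the bit position used (AVX-VNNI is CPUID leaf 7, subleaf 1, EAX bit 4) is stated here as an assumption from the Intel SDM rather than taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: AVX-VNNI is reported in CPUID.(EAX=7,ECX=1):EAX bit 4.  */
#define LEAF7_SUB1_EAX_AVX_VNNI (UINT32_C (1) << 4)

/* Hypothetical model of the patched branch: Prefer_No_AVX512 is set
   only when the AVX-VNNI bit is clear.  */
bool
sets_prefer_no_avx512 (uint32_t leaf7_sub1_eax)
{
  return (leaf7_sub1_eax & LEAF7_SUB1_EAX_AVX_VNNI) == 0;
}

#if defined (__x86_64__) || defined (__i386__)
#include <cpuid.h>

/* Read CPUID.(EAX=7,ECX=1):EAX on the running CPU, or 0 if leaf 7 or
   its subleaf 1 is not available.  */
uint32_t
read_leaf7_sub1_eax (void)
{
  unsigned int eax, ebx, ecx, edx;

  /* Subleaf 0 reports the highest supported subleaf in EAX.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
    return 0;

  __get_cpuid_count (7, 1, &eax, &ebx, &ecx, &edx);
  return eax;
}
#endif
```

On an x86 machine, sets_prefer_no_avx512 (read_leaf7_sub1_eax ()) would then say whether this branch still sets the preference on that CPU.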