[1/3] x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors
Checks

Context                                          | Check   | Description
-------------------------------------------------+---------+-------------------------------------------------
redhat-pt-bot/TryBot-apply_patch                 | success | Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm     | success | Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm     | success | Test passed
Commit Message
From: MayShao <mayshao-oc@zhaoxin.com>
Fix code indentation issues under the Zhaoxin branch.
Unaligned AVX loads are slower on the KH-40000 and KX-7000, so disable
AVX_Fast_Unaligned_Load.

Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features so that
the sse2_unaligned versions of memset, strcpy and strcat are used.
---
sysdeps/x86/cpu-features.c | 66 ++++++++++++++++++++++----------------
1 file changed, 39 insertions(+), 27 deletions(-)
Comments
On Wed, 2024-06-26 at 10:46 +0800, MayShao wrote:
> From: MayShao <mayshao-oc@zhaoxin.com>
>
> Fix code indentation issues under the Zhaoxin branch.
>
> Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
> the AVX_Fast_Unaligned_Load.
>
> Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> use sse2_unaligned version of memset,strcpy and strcat.
> ---
> sysdeps/x86/cpu-features.c | 66 ++++++++++++++++++++++---------------
> -
> 1 file changed, 39 insertions(+), 27 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 3d7c2819d7..24fbf699b9 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -1015,7 +1015,7 @@
> https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
> kind = arch_kind_zhaoxin;
>
> get_common_indices (cpu_features, &family, &model,
> &extended_model,
> - &stepping);
> + &stepping);
Don't randomly change tabs to spaces.
On Wed, Jun 26, 2024 at 10:56 AM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Wed, 2024-06-26 at 10:46 +0800, MayShao wrote:
> > From: MayShao <mayshao-oc@zhaoxin.com>
> >
> > Fix code indentation issues under the Zhaoxin branch.
> >
> > Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
> > the AVX_Fast_Unaligned_Load.
> >
> > Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> > use sse2_unaligned version of memset,strcpy and strcat.
> > ---
> > sysdeps/x86/cpu-features.c | 66 ++++++++++++++++++++++---------------
> > -
> > 1 file changed, 39 insertions(+), 27 deletions(-)
> >
> > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > index 3d7c2819d7..24fbf699b9 100644
> > --- a/sysdeps/x86/cpu-features.c
> > +++ b/sysdeps/x86/cpu-features.c
> > @@ -1015,7 +1015,7 @@
> > https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
> > kind = arch_kind_zhaoxin;
> >
> > get_common_indices (cpu_features, &family, &model,
> > &extended_model,
> > - &stepping);
> > + &stepping);
>
> Don't randomly change tabs to spaces.
We have a clang-format file; you should run it over the entire patch.
Also, can you comment or enum the different models?
>
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University
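As an illustration of this suggestion (not part of the patch), the model
numbers tested in the Zhaoxin branch could be given symbolic names.  The
identifiers below are hypothetical, and the association of 0x5b/0x6b with
the KH-40000 and KX-7000 is inferred from the patch subject rather than
stated in the code:

/* Hypothetical symbolic names for the CPUID models that the Zhaoxin
   branch of cpu-features.c currently tests as magic numbers.  */
enum zhaoxin_cpu_model
{
  ZX_MODEL_0F = 0x0f,   /* Tested together with 0x19 in the first branch.  */
  ZX_MODEL_19 = 0x19,
  ZX_MODEL_1B = 0x1b,   /* family 0x7 */
  ZX_MODEL_3B = 0x3b,   /* family 0x7 */
  ZX_MODEL_5B = 0x5b,   /* family 0x7: KH-40000 or KX-7000 (per the subject) */
  ZX_MODEL_6B = 0x6b,   /* family 0x7: KH-40000 or KX-7000 (per the subject) */
};

The switch statement added by the patch could then use, e.g.,
"case ZX_MODEL_5B:" instead of "case 0x5b:".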
On Wed, Jun 26, 2024 at 11:26 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Wed, Jun 26, 2024 at 10:56 AM Xi Ruoyao <xry111@xry111.site> wrote:
> >
> > On Wed, 2024-06-26 at 10:46 +0800, MayShao wrote:
> > > From: MayShao <mayshao-oc@zhaoxin.com>
> > >
> > > Fix code indentation issues under the Zhaoxin branch.
> > >
> > > Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
> > > the AVX_Fast_Unaligned_Load.
> > >
> > > Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> > > use sse2_unaligned version of memset,strcpy and strcat.
> > > ---
> > > sysdeps/x86/cpu-features.c | 66 ++++++++++++++++++++++---------------
> > > -
> > > 1 file changed, 39 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > > index 3d7c2819d7..24fbf699b9 100644
> > > --- a/sysdeps/x86/cpu-features.c
> > > +++ b/sysdeps/x86/cpu-features.c
> > > @@ -1015,7 +1015,7 @@
> > > https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
> > > kind = arch_kind_zhaoxin;
> > >
> > > get_common_indices (cpu_features, &family, &model,
> > > &extended_model,
> > > - &stepping);
> > > + &stepping);
> >
> > Don't randomly change tabs to spaces.
>
> We have a clang-format file, you should run it over the entire patch.
>
Ack. I will fix this.
> Also, can you comment or enum the different models?
Will fix it.
> >
> > --
> > Xi Ruoyao <xry111@xry111.site>
> > School of Aerospace Science and Technology, Xidian University
* MayShao:
> From: MayShao <mayshao-oc@zhaoxin.com>
>
> Fix code indentation issues under the Zhaoxin branch.
>
> Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
> the AVX_Fast_Unaligned_Load.
>
> Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> use sse2_unaligned version of memset,strcpy and strcat.
Somewhat related to that, do you have documentation of the behavior of
*aligned* 128-bit loads? Are they guaranteed to be atomic?
At least if MOVAPD, MOVAPS, MOVDQA are used?
Thanks,
Florian
On Wed, Jun 26, 2024 7:01 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * MayShao:
>
> > From: MayShao <mayshao-oc@zhaoxin.com>
> >
> > Fix code indentation issues under the Zhaoxin branch.
> >
> > Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
> > the AVX_Fast_Unaligned_Load.
> >
> > Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> > use sse2_unaligned version of memset,strcpy and strcat.
>
> Somewhat related to that, do you have documentation of the behavior of
> *aligned* 128-bit loads? Are they guaranteed to be atomic?
> At least if MOVAPD, MOVAPS, MOVDQA are used?
I can confirm that aligned 128-bit loads (such as MOVAPD, MOVAPS, and
MOVDQA) from write-back (WB) memory are atomic, and unaligned 128-bit
loads are also guaranteed to be atomic as long as they stay within a
single cache line.
>
> Thanks,
> Florian
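For readers unfamiliar with the instructions being discussed, here is a
minimal sketch (not from the patch, GCC syntax assumed) of the kind of
aligned 128-bit load the statement above covers:

#include <emmintrin.h>   /* SSE2 intrinsics: __m128i, _mm_load_si128.  */
#include <stdint.h>

/* A 16-byte object, kept 16-byte aligned so MOVDQA/MOVAPS may be used.  */
struct pair128
{
  uint64_t lo;
  uint64_t hi;
} __attribute__ ((aligned (16)));

static inline __m128i
load_128_aligned (const struct pair128 *p)
{
  /* Compiles to a single (V)MOVDQA.  The claim above is that such a
     load from write-back (WB) memory is performed atomically on the
     KH-40000 and KX-7000.  */
  return _mm_load_si128 ((const __m128i *) p);
}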
>
> On Wen, Jun 16,2024 7:01 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * MayShao:
>>
>> > From: MayShao <mayshao-oc@zhaoxin.com>
>> >
>> > Fix code indentation issues under the Zhaoxin branch.
>> >
>> > Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
>> > the AVX_Fast_Unaligned_Load.
>> >
>> > Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
>> > use sse2_unaligned version of memset,strcpy and strcat.
>>
>> Somewhat related to that, do you have documentation of the behavior of
>> *aligned* 128-bit loads? Are they guaranteed to be atomic?
>> At least if MOVAPD, MOVAPS, MOVDQA are used?
>
> I can confirm is that aligned 128-bit loads (such as MOVAPD, MOVAPS,
> MOVDQA) in the WB memory region are atomic, and for unaligned
> 128-bit loads, it can also be guaranteed to be atomic if within a cacheline.
This is great news. Could you update this GCC bug with the information?
Bug 104688 - gcc and libatomic can use SSE for 128-bit atomic loads
on Intel and AMD CPUs with AVX
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688>
I think it means we can teach GCC to use 128-bit atomic loads
unconditionally for AVX targets (bypassing libatomic).
Thanks,
Florian
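For context, a sketch of the kind of code GCC bug 104688 is about; it is
not from the thread, and the lowering mentioned in the comment is the
proposed behavior, not necessarily what current compilers emit:

/* A 16-byte atomic load.  On x86-64 GCC may emit a call into libatomic
   (__atomic_load_16) for this today; the idea discussed above is that,
   given the guarantee for aligned 16-byte loads, an AVX-enabled build
   could instead inline it as a single aligned vector load.  */
unsigned __int128
load_u128_atomic (const unsigned __int128 *p)
{
  return __atomic_load_n (p, __ATOMIC_SEQ_CST);
}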
On Thu, 2024-06-27 at 08:32 +0200, Florian Weimer wrote:
> > On Wen, Jun 16,2024 7:01 PM Florian Weimer <fweimer@redhat.com>
> > wrote:
> > >
> > > * MayShao:
> > >
> > > > From: MayShao <mayshao-oc@zhaoxin.com>
> > > >
> > > > Fix code indentation issues under the Zhaoxin branch.
> > > >
> > > > Unaligned AVX load are slower on KH-40000 and KX-7000, so
> > > > disable
> > > > the AVX_Fast_Unaligned_Load.
> > > >
> > > > Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> > > > use sse2_unaligned version of memset,strcpy and strcat.
> > >
> > > Somewhat related to that, do you have documentation of the
> > > behavior of
> > > *aligned* 128-bit loads? Are they guaranteed to be atomic?
> > > At least if MOVAPD, MOVAPS, MOVDQA are used?
> >
> > I can confirm is that aligned 128-bit loads (such as MOVAPD, MOVAPS,
> > MOVDQA) in the WB memory region are atomic, and for unaligned
> > 128-bit loads, it can also be guaranteed to be atomic if within a
> > cacheline.
>
> This is great news. Could you update this GCC bug with the
> information?
>
> Bug 104688 - gcc and libatomic can use SSE for 128-bit atomic loads
> on Intel and AMD CPUs with AVX
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688>
The info is already there. But the discussion stalled after some
concern about non-Write-Back memory regions was raised.
@@ -1015,7 +1015,7 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
kind = arch_kind_zhaoxin;
get_common_indices (cpu_features, &family, &model, &extended_model,
- &stepping);
+ &stepping);
get_extended_indices (cpu_features);
@@ -1026,38 +1026,50 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
{
if (model == 0xf || model == 0x19)
{
- CPU_FEATURE_UNSET (cpu_features, AVX);
- CPU_FEATURE_UNSET (cpu_features, AVX2);
+ CPU_FEATURE_UNSET (cpu_features, AVX);
+ CPU_FEATURE_UNSET (cpu_features, AVX2);
- cpu_features->preferred[index_arch_Slow_SSE4_2]
- |= bit_arch_Slow_SSE4_2;
+ cpu_features->preferred[index_arch_Slow_SSE4_2]
+ |= bit_arch_Slow_SSE4_2;
- cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
- &= ~bit_arch_AVX_Fast_Unaligned_Load;
+ cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+ &= ~bit_arch_AVX_Fast_Unaligned_Load;
}
}
else if (family == 0x7)
{
- if (model == 0x1b)
- {
- CPU_FEATURE_UNSET (cpu_features, AVX);
- CPU_FEATURE_UNSET (cpu_features, AVX2);
-
- cpu_features->preferred[index_arch_Slow_SSE4_2]
- |= bit_arch_Slow_SSE4_2;
-
- cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
- &= ~bit_arch_AVX_Fast_Unaligned_Load;
- }
- else if (model == 0x3b)
- {
- CPU_FEATURE_UNSET (cpu_features, AVX);
- CPU_FEATURE_UNSET (cpu_features, AVX2);
-
- cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
- &= ~bit_arch_AVX_Fast_Unaligned_Load;
- }
- }
+ switch (model)
+ {
+ case 0x1b:
+ CPU_FEATURE_UNSET (cpu_features, AVX);
+ CPU_FEATURE_UNSET (cpu_features, AVX2);
+
+ cpu_features->preferred[index_arch_Slow_SSE4_2]
+ |= bit_arch_Slow_SSE4_2;
+
+ cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+ &= ~bit_arch_AVX_Fast_Unaligned_Load;
+ break;
+
+ case 0x3b:
+ CPU_FEATURE_UNSET (cpu_features, AVX);
+ CPU_FEATURE_UNSET (cpu_features, AVX2);
+
+ cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+ &= ~bit_arch_AVX_Fast_Unaligned_Load;
+ break;
+
+ case 0x5b:
+ case 0x6b:
+ cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+ &= ~bit_arch_AVX_Fast_Unaligned_Load;
+
+ cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
+ |= (bit_arch_Prefer_No_VZEROUPPER
+ | bit_arch_Fast_Unaligned_Load);
+ break;
+ }
+ }
}
else
{