x86: Set Prefer_No_VZEROUPPER if AVX512ER is available
Commit Message
Since AVX512ER is unique to Xeon Phi, set Prefer_No_VZEROUPPER if
AVX512ER is available.
Any comments?
H.J.
----
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
Prefer_No_VZEROUPPER if AVX512ER is available.
* sysdeps/x86/cpu-features.h
(bit_cpu_AVX512PF): New.
(bit_cpu_AVX512ER): Likewise.
(bit_cpu_AVX512CD): Likewise.
(bit_cpu_AVX512BW): Likewise.
(bit_cpu_AVX512VL): Likewise.
(index_cpu_AVX512PF): Likewise.
(index_cpu_AVX512ER): Likewise.
(index_cpu_AVX512CD): Likewise.
(index_cpu_AVX512BW): Likewise.
(index_cpu_AVX512VL): Likewise.
(reg_AVX512PF): Likewise.
(reg_AVX512ER): Likewise.
(reg_AVX512CD): Likewise.
(reg_AVX512BW): Likewise.
(reg_AVX512VL): Likewise.
---
sysdeps/x86/cpu-features.c | 8 ++++++--
sysdeps/x86/cpu-features.h | 15 +++++++++++++++
2 files changed, 21 insertions(+), 2 deletions(-)
Comments
On 04/17/2017 09:54 PM, H.J. Lu wrote:
> Since AVX512ER is unique to Xeon Phi, set Prefer_No_VZEROUPPER if
> AVX512ER is available.
This approach doesn't seem very future-proof to me.
Thanks,
Florian
On Mon, Apr 17, 2017 at 1:06 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 04/17/2017 09:54 PM, H.J. Lu wrote:
>>
>> Since AVX512ER is unique to Xeon Phi, set Prefer_No_VZEROUPPER if
>> AVX512ER is available.
>
>
> This approach doesn't seem very future-proof to me.
>
AVX512ER won't be implemented in any Xeon processors and it will be in
all Xeon Phi processors. In this way, we don't need to check CPU model
numbers when setting Prefer_No_VZEROUPPER. It will work with current
and future Xeon Phi and non-Xeon Phi processors.
On 04/17/2017 10:13 PM, H.J. Lu wrote:
> On Mon, Apr 17, 2017 at 1:06 PM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 04/17/2017 09:54 PM, H.J. Lu wrote:
>>>
>>> Since AVX512ER is unique to Xeon Phi, set Prefer_No_VZEROUPPER if
>>> AVX512ER is available.
>>
>>
>> This approach doesn't seem very future-proof to me.
>>
>
> AVX512ER won't be implemented in any Xeon processors and it will be in
> all Xeon Phi processors. In this way, we don't need to check CPU model
> numbers when setting Prefer_No_VZEROUPPER. It will work with current
> and future Xeon Phi and non-Xeon Phi processors.
Well, okay. I see it's in the Intel-specific block, so the change
should be okay.
Thanks,
Florian
@@ -138,8 +138,6 @@ init_cpu_features (struct cpu_features *cpu_features)
case 0x57:
/* Knights Landing. Enable Silvermont optimizations. */
- cpu_features->feature[index_arch_Prefer_No_VZEROUPPER]
- |= bit_arch_Prefer_No_VZEROUPPER;
case 0x5c:
case 0x5f:
@@ -225,6 +223,12 @@ init_cpu_features (struct cpu_features *cpu_features)
cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
|= bit_arch_AVX_Fast_Unaligned_Load;
+ /* Since AVX512ER is unique to Xeon Phi, set Prefer_No_VZEROUPPER
+ if AVX512ER is available. */
+ if (CPU_FEATURES_CPU_P (cpu_features, AVX512ER))
+ cpu_features->feature[index_arch_Prefer_No_VZEROUPPER]
+ |= bit_arch_Prefer_No_VZEROUPPER;
+
/* To avoid SSE transition penalty, use _dl_runtime_resolve_slow.
If XGETBV suports ECX == 1, use _dl_runtime_resolve_opt. */
cpu_features->feature[index_arch_Use_dl_runtime_resolve_slow]
@@ -63,6 +63,11 @@
#define bit_cpu_AVX2 (1 << 5)
#define bit_cpu_AVX512F (1 << 16)
#define bit_cpu_AVX512DQ (1 << 17)
+#define bit_cpu_AVX512PF (1 << 26)
+#define bit_cpu_AVX512ER (1 << 27)
+#define bit_cpu_AVX512CD (1 << 28)
+#define bit_cpu_AVX512BW (1 << 30)
+#define bit_cpu_AVX512VL (1u << 31)
/* XCR0 Feature flags. */
#define bit_XMM_state (1 << 1)
@@ -239,6 +244,11 @@ extern const struct cpu_features *__get_cpu_features (void)
# define index_cpu_AVX2 COMMON_CPUID_INDEX_7
# define index_cpu_AVX512F COMMON_CPUID_INDEX_7
# define index_cpu_AVX512DQ COMMON_CPUID_INDEX_7
+# define index_cpu_AVX512PF COMMON_CPUID_INDEX_7
+# define index_cpu_AVX512ER COMMON_CPUID_INDEX_7
+# define index_cpu_AVX512CD COMMON_CPUID_INDEX_7
+# define index_cpu_AVX512BW COMMON_CPUID_INDEX_7
+# define index_cpu_AVX512VL COMMON_CPUID_INDEX_7
# define index_cpu_ERMS COMMON_CPUID_INDEX_7
# define index_cpu_RTM COMMON_CPUID_INDEX_7
# define index_cpu_FMA COMMON_CPUID_INDEX_1
@@ -258,6 +268,11 @@ extern const struct cpu_features *__get_cpu_features (void)
# define reg_AVX2 ebx
# define reg_AVX512F ebx
# define reg_AVX512DQ ebx
+# define reg_AVX512PF ebx
+# define reg_AVX512ER ebx
+# define reg_AVX512CD ebx
+# define reg_AVX512BW ebx
+# define reg_AVX512VL ebx
# define reg_ERMS ebx
# define reg_RTM ebx
# define reg_FMA ecx
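
For readers following the thread, the check the patch adds can be sketched outside glibc. This is a minimal illustration, not the glibc implementation. It assumes the caller has already obtained the EBX value from CPUID leaf 7, subleaf 0, which is where the patch places the new bits (COMMON_CPUID_INDEX_7, ebx). The names prefixed with `sketch_` / `SKETCH_` are hypothetical and do not exist in glibc; only the bit positions (16 for AVX512F, 27 for AVX512ER) are taken from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7,ECX=0):EBX, copied from the patch.
   The SKETCH_ names are hypothetical, for illustration only.  */
#define SKETCH_BIT_AVX512F  (1u << 16)
#define SKETCH_BIT_AVX512ER (1u << 27)

/* Return true if the given leaf-7 EBX value advertises AVX512ER.  */
static bool
sketch_has_avx512er (uint32_t cpuid7_ebx)
{
  return (cpuid7_ebx & SKETCH_BIT_AVX512ER) != 0;
}

/* Mirror the policy proposed in the patch: treat AVX512ER as the
   marker for Xeon Phi and prefer not to emit VZEROUPPER there,
   instead of matching individual CPU model numbers.  */
static bool
sketch_prefer_no_vzeroupper (uint32_t cpuid7_ebx)
{
  return sketch_has_avx512er (cpuid7_ebx);
}
```

Note that the patch writes bit 31 as `(1u << 31)` rather than `(1 << 31)`: with a 32-bit `int`, shifting 1 into the sign bit is undefined behavior in C, so the unsigned literal is the safe spelling. The sketch uses unsigned literals throughout for the same reason.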