[1/2] Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h

Message ID 1457049161-13783-1-git-send-email-hjl.tools@gmail.com
State New, archived

Commit Message

H.J. Lu March 3, 2016, 11:52 p.m. UTC
index_* and bit_* macros are used to access the cpuid and feature arrays of
struct cpu_features.  It is very easy to mistakenly use bits and indices of
the cpuid array on the feature array, especially in assembly code.  For example,
sysdeps/i386/i686/multiarch/bcopy.S has

	HAS_CPU_FEATURE (Fast_Rep_String)

which should be

	HAS_ARCH_FEATURE (Fast_Rep_String)

We change index_* and bit_* to index_cpu_*/index_arch_* and
bit_cpu_*/bit_arch_* so that we can catch such errors at build time.

	[BZ #19762]
	* sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
	(EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
	* sysdeps/x86/cpu-features.h (bit_*): Renamed to ...
	(bit_arch_*): This for feature array.
	(bit_*): Renamed to ...
	(bit_cpu_*): This for cpu array.
	(index_*): Renamed to ...
	(index_arch_*): This for feature array.
	(index_*): Renamed to ...
	(index_cpu_*): This for cpu array.
	[__ASSEMBLER__] (HAS_FEATURE): Add and use field.
	[__ASSEMBLER__] (HAS_CPU_FEATURE): Pass cpu to HAS_FEATURE.
	[__ASSEMBLER__] (HAS_ARCH_FEATURE): Pass arch to HAS_FEATURE.
	[!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and
	bit_##name with index_cpu_##name and bit_cpu_##name.
	[!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and
	bit_##name with index_arch_##name and bit_arch_##name.
---
 sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h |   8 +-
 sysdeps/x86/cpu-features.c                      |  80 +++++----
 sysdeps/x86/cpu-features.h                      | 222 ++++++++++++------------
 3 files changed, 159 insertions(+), 151 deletions(-)
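
The effect of the renaming can be illustrated with a small, self-contained C sketch.  It is not the glibc code: struct mini_features, mini_init, mini_has_avx and mini_has_fast_rep_string are hypothetical names, the array sizes and index values are made up, and the four-argument HAS_FEATURE is only loosely modelled on the assembler macro in the patch (which takes an offset, not an array name).  Only the index_cpu_*/index_arch_* and bit_cpu_*/bit_arch_* naming scheme is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of struct cpu_features: one word of raw
   CPUID bits and one word of derived "arch" tuning bits.  */
struct mini_features
{
  unsigned int cpuid[1];
  unsigned int feature[1];
};

/* cpuid-array namespace (index value is an assumption).  */
#define index_cpu_AVX			0
#define bit_cpu_AVX			(1u << 28)

/* feature-array namespace (index value is an assumption).  */
#define index_arch_Fast_Rep_String	0
#define bit_arch_Fast_Rep_String	(1u << 0)

/* The field argument is pasted into both macro names, so each
   accessor can only reach names from its own namespace.  */
#define HAS_FEATURE(f, array, field, name) \
  (((f)->array[index_##field##_##name] & (bit_##field##_##name)) != 0)
#define HAS_CPU_FEATURE(f, name)  HAS_FEATURE (f, cpuid, cpu, name)
#define HAS_ARCH_FEATURE(f, name) HAS_FEATURE (f, feature, arch, name)

/* Clear both arrays, then set one bit in each.  */
void
mini_init (struct mini_features *f)
{
  assert (f != NULL);
  f->cpuid[0] = 0;
  f->feature[0] = 0;
  f->cpuid[index_cpu_AVX] |= bit_cpu_AVX;
  f->feature[index_arch_Fast_Rep_String] |= bit_arch_Fast_Rep_String;
}

int
mini_has_avx (const struct mini_features *f)
{
  return HAS_CPU_FEATURE (f, AVX);
}

int
mini_has_fast_rep_string (const struct mini_features *f)
{
  return HAS_ARCH_FEATURE (f, Fast_Rep_String);
}
```

In this sketch, writing HAS_CPU_FEATURE (f, Fast_Rep_String) would expand to index_cpu_Fast_Rep_String and bit_cpu_Fast_Rep_String, neither of which is defined, so the compiler should reject it instead of silently testing the wrong array.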
  

Comments

H.J. Lu March 6, 2016, 3:46 p.m. UTC | #1
Any comments?  This change is almost mechanical.  But it is
very useful to avoid typos.
  
H.J. Lu March 10, 2016, 1:25 p.m. UTC | #2
I am checking it in now.
  
Roland McGrath March 11, 2016, 9:47 p.m. UTC | #3
Your post didn't mention testing done.  Every post of a patch you want
actually considered for approval should explicitly say what testing you
have done, and nobody should approve if you haven't clearly tested it.
Any patch that has been waiting for more than a day or two needs
explicit re-testing (and report of doing so) before you land it.  This
change broke the 'make check' build on Linux/x86_64.  There is really no
excuse for that.
  
H.J. Lu March 11, 2016, 9:50 p.m. UTC | #4
On Fri, Mar 11, 2016 at 1:47 PM, Roland McGrath <roland@hack.frob.com> wrote:
> Your post didn't mention testing done.  Every post of a patch you want
> actually considered for approval should explicitly say what testing you
> have done, and nobody should approve if you haven't clearly tested it.
> Any patch that has been waiting for more than a day or two needs
> explicit re-testing (and report of doing so) before you land it.  This
> change broke the 'make check' build on Linux/x86_64.  There is really no
> excuse for that.

Sorry I didn't mention that I tested it on both x86-64 and i686 before
commit.
  
Roland McGrath March 11, 2016, 10 p.m. UTC | #5
> Sorry I didn't mention that I tested it on both x86-64 and i686 before
> commit.

But clearly you didn't!  The tree right now is broken, as I said.
  
H.J. Lu March 11, 2016, 10:20 p.m. UTC | #6
On Fri, Mar 11, 2016 at 2:00 PM, Roland McGrath <roland@hack.frob.com> wrote:
>> Sorry I didn't mention that I tested it on both x86-64 and i686 before
>> commit.
>
> But clearly you didn't!  The tree right now is broken, as I said.

I double checked again with

commit 6aa3e97e2530f9917f504eb4146af119a3f27229
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 10 05:26:46 2016 -0800

    Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h

on x86-64 and i686.  I only see:

[hjl@gnu-6 build-i686-linux]$ ./nptl/tst-cleanupx4
test 0
clh (1)
clh (2)
clh (3)
test 1
clh (1)
clh (4)
clh (5)
clh (6)
test 2
clh (7)
clh (8)
global = 64, expected 120
test 3
clh (1)
clh (2)
clh (9)
clh (10)
[hjl@gnu-6 build-i686-linux]$

Did you remove the old build directory?
  
Roland McGrath March 11, 2016, 10:29 p.m. UTC | #7
> Did you remove the old build directory?

http://130.211.48.148:8080/builders/glibc-x86_64-linux/builds/1125/steps/check%20%28clobber%29/logs/stdio
  

Patch

diff --git a/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h b/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
index a759934..1c67050 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
+++ b/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
@@ -30,10 +30,10 @@ 
    is always disabled for SUID programs and can be enabled by setting
    environment variable, LD_PREFER_MAP_32BIT_EXEC.  */
 #define EXTRA_LD_ENVVARS \
-  case 21:							      \
-    if (memcmp (envline, "PREFER_MAP_32BIT_EXEC", 21) == 0)	      \
-      GLRO(dl_x86_cpu_features).feature[index_Prefer_MAP_32BIT_EXEC]  \
-	|= bit_Prefer_MAP_32BIT_EXEC;				      \
+  case 21:								  \
+    if (memcmp (envline, "PREFER_MAP_32BIT_EXEC", 21) == 0)		  \
+      GLRO(dl_x86_cpu_features).feature[index_arch_Prefer_MAP_32BIT_EXEC] \
+	|= bit_arch_Prefer_MAP_32BIT_EXEC;				  \
     break;
 
 /* Extra unsecure variables.  The names are all stuffed in a single
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 218ff2b..1787716 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -75,13 +75,14 @@  init_cpu_features (struct cpu_features *cpu_features)
 	    case 0x1c:
 	    case 0x26:
 	      /* BSF is slow on Atom.  */
-	      cpu_features->feature[index_Slow_BSF] |= bit_Slow_BSF;
+	      cpu_features->feature[index_arch_Slow_BSF]
+		|= bit_arch_Slow_BSF;
 	      break;
 
 	    case 0x57:
 	      /* Knights Landing.  Enable Silvermont optimizations.  */
-	      cpu_features->feature[index_Prefer_No_VZEROUPPER]
-		|= bit_Prefer_No_VZEROUPPER;
+	      cpu_features->feature[index_arch_Prefer_No_VZEROUPPER]
+		|= bit_arch_Prefer_No_VZEROUPPER;
 
 	    case 0x37:
 	    case 0x4a:
@@ -90,22 +91,22 @@  init_cpu_features (struct cpu_features *cpu_features)
 	    case 0x5d:
 	      /* Unaligned load versions are faster than SSSE3
 		 on Silvermont.  */
-#if index_Fast_Unaligned_Load != index_Prefer_PMINUB_for_stringop
-# error index_Fast_Unaligned_Load != index_Prefer_PMINUB_for_stringop
+#if index_arch_Fast_Unaligned_Load != index_arch_Prefer_PMINUB_for_stringop
+# error index_arch_Fast_Unaligned_Load != index_arch_Prefer_PMINUB_for_stringop
 #endif
-#if index_Fast_Unaligned_Load != index_Slow_SSE4_2
-# error index_Fast_Unaligned_Load != index_Slow_SSE4_2
+#if index_arch_Fast_Unaligned_Load != index_arch_Slow_SSE4_2
+# error index_arch_Fast_Unaligned_Load != index_arch_Slow_SSE4_2
 #endif
-	      cpu_features->feature[index_Fast_Unaligned_Load]
-		|= (bit_Fast_Unaligned_Load
-		    | bit_Prefer_PMINUB_for_stringop
-		    | bit_Slow_SSE4_2);
+	      cpu_features->feature[index_arch_Fast_Unaligned_Load]
+		|= (bit_arch_Fast_Unaligned_Load
+		    | bit_arch_Prefer_PMINUB_for_stringop
+		    | bit_arch_Slow_SSE4_2);
 	      break;
 
 	    default:
 	      /* Unknown family 0x06 processors.  Assuming this is one
 		 of Core i3/i5/i7 processors if AVX is available.  */
-	      if ((ecx & bit_AVX) == 0)
+	      if ((ecx & bit_cpu_AVX) == 0)
 		break;
 
 	    case 0x1a:
@@ -117,20 +118,20 @@  init_cpu_features (struct cpu_features *cpu_features)
 	    case 0x2f:
 	      /* Rep string instructions, copy backward, unaligned loads
 		 and pminub are fast on Intel Core i3, i5 and i7.  */
-#if index_Fast_Rep_String != index_Fast_Copy_Backward
-# error index_Fast_Rep_String != index_Fast_Copy_Backward
+#if index_arch_Fast_Rep_String != index_arch_Fast_Copy_Backward
+# error index_arch_Fast_Rep_String != index_arch_Fast_Copy_Backward
 #endif
-#if index_Fast_Rep_String != index_Fast_Unaligned_Load
-# error index_Fast_Rep_String != index_Fast_Unaligned_Load
+#if index_arch_Fast_Rep_String != index_arch_Fast_Unaligned_Load
+# error index_arch_Fast_Rep_String != index_arch_Fast_Unaligned_Load
 #endif
-#if index_Fast_Rep_String != index_Prefer_PMINUB_for_stringop
-# error index_Fast_Rep_String != index_Prefer_PMINUB_for_stringop
+#if index_arch_Fast_Rep_String != index_arch_Prefer_PMINUB_for_stringop
+# error index_arch_Fast_Rep_String != index_arch_Prefer_PMINUB_for_stringop
 #endif
-	      cpu_features->feature[index_Fast_Rep_String]
-		|= (bit_Fast_Rep_String
-		    | bit_Fast_Copy_Backward
-		    | bit_Fast_Unaligned_Load
-		    | bit_Prefer_PMINUB_for_stringop);
+	      cpu_features->feature[index_arch_Fast_Rep_String]
+		|= (bit_arch_Fast_Rep_String
+		    | bit_arch_Fast_Copy_Backward
+		    | bit_arch_Fast_Unaligned_Load
+		    | bit_arch_Prefer_PMINUB_for_stringop);
 	      break;
 	    }
 	}
@@ -159,8 +160,8 @@  init_cpu_features (struct cpu_features *cpu_features)
 	{
 	  /* "Excavator"   */
 	  if (model >= 0x60 && model <= 0x7f)
-	    cpu_features->feature[index_Fast_Unaligned_Load]
-	      |= bit_Fast_Unaligned_Load;
+	    cpu_features->feature[index_arch_Fast_Unaligned_Load]
+	      |= bit_arch_Fast_Unaligned_Load;
 	}
     }
   else
@@ -168,11 +169,11 @@  init_cpu_features (struct cpu_features *cpu_features)
 
   /* Support i586 if CX8 is available.  */
   if (HAS_CPU_FEATURE (CX8))
-    cpu_features->feature[index_I586] |= bit_I586;
+    cpu_features->feature[index_arch_I586] |= bit_arch_I586;
 
   /* Support i686 if CMOV is available.  */
   if (HAS_CPU_FEATURE (CMOV))
-    cpu_features->feature[index_I686] |= bit_I686;
+    cpu_features->feature[index_arch_I686] |= bit_arch_I686;
 
   if (cpu_features->max_cpuid >= 7)
     __cpuid_count (7, 0,
@@ -193,15 +194,16 @@  init_cpu_features (struct cpu_features *cpu_features)
 	{
 	  /* Determine if AVX is usable.  */
 	  if (HAS_CPU_FEATURE (AVX))
-	    cpu_features->feature[index_AVX_Usable] |= bit_AVX_Usable;
-#if index_AVX2_Usable != index_AVX_Fast_Unaligned_Load
-# error index_AVX2_Usable != index_AVX_Fast_Unaligned_Load
+	    cpu_features->feature[index_arch_AVX_Usable]
+	      |= bit_arch_AVX_Usable;
+#if index_arch_AVX2_Usable != index_arch_AVX_Fast_Unaligned_Load
+# error index_arch_AVX2_Usable != index_arch_AVX_Fast_Unaligned_Load
 #endif
 	  /* Determine if AVX2 is usable.  Unaligned load with 256-bit
 	     AVX registers are faster on processors with AVX2.  */
 	  if (HAS_CPU_FEATURE (AVX2))
-	    cpu_features->feature[index_AVX2_Usable]
-	      |= bit_AVX2_Usable | bit_AVX_Fast_Unaligned_Load;
+	    cpu_features->feature[index_arch_AVX2_Usable]
+	      |= bit_arch_AVX2_Usable | bit_arch_AVX_Fast_Unaligned_Load;
 	  /* Check if OPMASK state, upper 256-bit of ZMM0-ZMM15 and
 	     ZMM16-ZMM31 state are enabled.  */
 	  if ((xcrlow & (bit_Opmask_state | bit_ZMM0_15_state
@@ -211,20 +213,22 @@  init_cpu_features (struct cpu_features *cpu_features)
 	      /* Determine if AVX512F is usable.  */
 	      if (HAS_CPU_FEATURE (AVX512F))
 		{
-		  cpu_features->feature[index_AVX512F_Usable]
-		    |= bit_AVX512F_Usable;
+		  cpu_features->feature[index_arch_AVX512F_Usable]
+		    |= bit_arch_AVX512F_Usable;
 		  /* Determine if AVX512DQ is usable.  */
 		  if (HAS_CPU_FEATURE (AVX512DQ))
-		    cpu_features->feature[index_AVX512DQ_Usable]
-		      |= bit_AVX512DQ_Usable;
+		    cpu_features->feature[index_arch_AVX512DQ_Usable]
+		      |= bit_arch_AVX512DQ_Usable;
 		}
 	    }
 	  /* Determine if FMA is usable.  */
 	  if (HAS_CPU_FEATURE (FMA))
-	    cpu_features->feature[index_FMA_Usable] |= bit_FMA_Usable;
+	    cpu_features->feature[index_arch_FMA_Usable]
+	      |= bit_arch_FMA_Usable;
 	  /* Determine if FMA4 is usable.  */
 	  if (HAS_CPU_FEATURE (FMA4))
-	    cpu_features->feature[index_FMA4_Usable] |= bit_FMA4_Usable;
+	    cpu_features->feature[index_arch_FMA4_Usable]
+	      |= bit_arch_FMA4_Usable;
 	}
     }
 
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index e354920..0624a92 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -18,48 +18,48 @@ 
 #ifndef cpu_features_h
 #define cpu_features_h
 
-#define bit_Fast_Rep_String		(1 << 0)
-#define bit_Fast_Copy_Backward		(1 << 1)
-#define bit_Slow_BSF			(1 << 2)
-#define bit_Fast_Unaligned_Load		(1 << 4)
-#define bit_Prefer_PMINUB_for_stringop	(1 << 5)
-#define bit_AVX_Usable			(1 << 6)
-#define bit_FMA_Usable			(1 << 7)
-#define bit_FMA4_Usable			(1 << 8)
-#define bit_Slow_SSE4_2			(1 << 9)
-#define bit_AVX2_Usable			(1 << 10)
-#define bit_AVX_Fast_Unaligned_Load	(1 << 11)
-#define bit_AVX512F_Usable		(1 << 12)
-#define bit_AVX512DQ_Usable		(1 << 13)
-#define bit_I586			(1 << 14)
-#define bit_I686			(1 << 15)
-#define bit_Prefer_MAP_32BIT_EXEC	(1 << 16)
-#define bit_Prefer_No_VZEROUPPER	(1 << 17)
+#define bit_arch_Fast_Rep_String		(1 << 0)
+#define bit_arch_Fast_Copy_Backward		(1 << 1)
+#define bit_arch_Slow_BSF			(1 << 2)
+#define bit_arch_Fast_Unaligned_Load		(1 << 4)
+#define bit_arch_Prefer_PMINUB_for_stringop	(1 << 5)
+#define bit_arch_AVX_Usable			(1 << 6)
+#define bit_arch_FMA_Usable			(1 << 7)
+#define bit_arch_FMA4_Usable			(1 << 8)
+#define bit_arch_Slow_SSE4_2			(1 << 9)
+#define bit_arch_AVX2_Usable			(1 << 10)
+#define bit_arch_AVX_Fast_Unaligned_Load	(1 << 11)
+#define bit_arch_AVX512F_Usable			(1 << 12)
+#define bit_arch_AVX512DQ_Usable		(1 << 13)
+#define bit_arch_I586				(1 << 14)
+#define bit_arch_I686				(1 << 15)
+#define bit_arch_Prefer_MAP_32BIT_EXEC		(1 << 16)
+#define bit_arch_Prefer_No_VZEROUPPER		(1 << 17)
 
 /* CPUID Feature flags.  */
 
 /* COMMON_CPUID_INDEX_1.  */
-#define bit_CX8		(1 << 8)
-#define bit_CMOV	(1 << 15)
-#define bit_SSE2	(1 << 26)
-#define bit_SSSE3	(1 << 9)
-#define bit_SSE4_1	(1 << 19)
-#define bit_SSE4_2	(1 << 20)
-#define bit_OSXSAVE	(1 << 27)
-#define bit_AVX		(1 << 28)
-#define bit_POPCOUNT	(1 << 23)
-#define bit_FMA		(1 << 12)
-#define bit_FMA4	(1 << 16)
+#define bit_cpu_CX8		(1 << 8)
+#define bit_cpu_CMOV		(1 << 15)
+#define bit_cpu_SSE2		(1 << 26)
+#define bit_cpu_SSSE3		(1 << 9)
+#define bit_cpu_SSE4_1		(1 << 19)
+#define bit_cpu_SSE4_2		(1 << 20)
+#define bit_cpu_OSXSAVE		(1 << 27)
+#define bit_cpu_AVX		(1 << 28)
+#define bit_cpu_POPCOUNT	(1 << 23)
+#define bit_cpu_FMA		(1 << 12)
+#define bit_cpu_FMA4		(1 << 16)
 
 /* COMMON_CPUID_INDEX_7.  */
-#define bit_RTM		(1 << 11)
-#define bit_AVX2	(1 << 5)
-#define bit_AVX512F	(1 << 16)
-#define bit_AVX512DQ	(1 << 17)
+#define bit_cpu_RTM		(1 << 11)
+#define bit_cpu_AVX2		(1 << 5)
+#define bit_cpu_AVX512F		(1 << 16)
+#define bit_cpu_AVX512DQ	(1 << 17)
 
 /* XCR0 Feature flags.  */
-#define bit_XMM_state  (1 << 1)
-#define bit_YMM_state  (2 << 1)
+#define bit_XMM_state		(1 << 1)
+#define bit_YMM_state		(2 << 1)
 #define bit_Opmask_state	(1 << 5)
 #define bit_ZMM0_15_state	(1 << 6)
 #define bit_ZMM16_31_state	(1 << 7)
@@ -75,32 +75,32 @@ 
 # include <ifunc-defines.h>
 # include <rtld-global-offsets.h>
 
-# define index_CX8	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_EDX_OFFSET
-# define index_CMOV	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_EDX_OFFSET
-# define index_SSE2	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_EDX_OFFSET
-# define index_SSSE3	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
-# define index_SSE4_1	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
-# define index_SSE4_2	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
-# define index_AVX	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
-# define index_AVX2	COMMON_CPUID_INDEX_7*CPUID_SIZE+CPUID_EBX_OFFSET
+# define index_cpu_CX8	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_EDX_OFFSET
+# define index_cpu_CMOV	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_EDX_OFFSET
+# define index_cpu_SSE2	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_EDX_OFFSET
+# define index_cpu_SSSE3 COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
+# define index_cpu_SSE4_1 COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
+# define index_cpu_SSE4_2 COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
+# define index_cpu_AVX	COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
+# define index_cpu_AVX2	COMMON_CPUID_INDEX_7*CPUID_SIZE+CPUID_EBX_OFFSET
 
-# define index_Fast_Rep_String		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Fast_Copy_Backward	FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Slow_BSF			FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Fast_Unaligned_Load	FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Prefer_PMINUB_for_stringop FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_FMA_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_FMA4_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Slow_SSE4_2		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX2_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX_Fast_Unaligned_Load	FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX512F_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_AVX512DQ_Usable		FEATURE_INDEX_1*FEATURE_SIZE
-# define index_I586			FEATURE_INDEX_1*FEATURE_SIZE
-# define index_I686			FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Prefer_MAP_32BIT_EXEC	FEATURE_INDEX_1*FEATURE_SIZE
-# define index_Prefer_No_VZEROUPPER	FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Fast_Rep_String	FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Fast_Copy_Backward	FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Slow_BSF		FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Fast_Unaligned_Load	FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Prefer_PMINUB_for_stringop FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_AVX_Usable		FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_FMA_Usable		FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_FMA4_Usable		FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Slow_SSE4_2		FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_AVX2_Usable		FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_AVX_Fast_Unaligned_Load FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_AVX512F_Usable	FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_AVX512DQ_Usable	FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_I586		FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_I686		FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Prefer_MAP_32BIT_EXEC FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Prefer_No_VZEROUPPER FEATURE_INDEX_1*FEATURE_SIZE
 
 
 # if defined (_LIBC) && !IS_IN (nonlib)
@@ -108,19 +108,21 @@ 
 #   ifdef SHARED
 #    if IS_IN (rtld)
 #     define LOAD_RTLD_GLOBAL_RO_RDX
-#     define HAS_FEATURE(offset, name) \
-  testl $(bit_##name), _rtld_local_ro+offset+(index_##name)(%rip)
+#     define HAS_FEATURE(offset, field, name) \
+  testl $(bit_##field##_##name), \
+	_rtld_local_ro+offset+(index_##field##_##name)(%rip)
 #    else
 #      define LOAD_RTLD_GLOBAL_RO_RDX \
   mov _rtld_global_ro@GOTPCREL(%rip), %RDX_LP
-#     define HAS_FEATURE(offset, name) \
-  testl $(bit_##name), \
-	RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+offset+(index_##name)(%rdx)
+#     define HAS_FEATURE(offset, field, name) \
+  testl $(bit_##field##_##name), \
+	RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+offset+(index_##field##_##name)(%rdx)
 #    endif
 #   else /* SHARED */
 #    define LOAD_RTLD_GLOBAL_RO_RDX
-#    define HAS_FEATURE(offset, name) \
-  testl $(bit_##name), _dl_x86_cpu_features+offset+(index_##name)(%rip)
+#    define HAS_FEATURE(offset, field, name) \
+  testl $(bit_##field##_##name), \
+	_dl_x86_cpu_features+offset+(index_##field##_##name)(%rip)
 #   endif /* !SHARED */
 #  else  /* __x86_64__ */
 #   ifdef SHARED
@@ -129,22 +131,24 @@ 
 #    if IS_IN (rtld)
 #    define LOAD_GOT_AND_RTLD_GLOBAL_RO \
   LOAD_PIC_REG(dx)
-#     define HAS_FEATURE(offset, name) \
-  testl $(bit_##name), offset+(index_##name)+_rtld_local_ro@GOTOFF(%edx)
+#     define HAS_FEATURE(offset, field, name) \
+  testl $(bit_##field##_##name), \
+	offset+(index_##field##_##name)+_rtld_local_ro@GOTOFF(%edx)
 #    else
 #     define LOAD_GOT_AND_RTLD_GLOBAL_RO \
   LOAD_PIC_REG(dx); \
   mov _rtld_global_ro@GOT(%edx), %ecx
-#     define HAS_FEATURE(offset, name) \
-  testl $(bit_##name), \
-	RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+offset+(index_##name)(%ecx)
+#     define HAS_FEATURE(offset, field, name) \
+  testl $(bit_##field##_##name), \
+	RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+offset+(index_##field##_##name)(%ecx)
 #    endif
 #   else  /* SHARED */
 #    define LOAD_FUNC_GOT_EAX(func) \
   leal func, %eax
 #    define LOAD_GOT_AND_RTLD_GLOBAL_RO
-#    define HAS_FEATURE(offset, name) \
-  testl $(bit_##name), _dl_x86_cpu_features+offset+(index_##name)
+#    define HAS_FEATURE(offset, field, name) \
+  testl $(bit_##field##_##name), \
+	_dl_x86_cpu_features+offset+(index_##field##_##name)
 #   endif /* !SHARED */
 #  endif /* !__x86_64__ */
 # else /* _LIBC && !nonlib */
@@ -152,8 +156,8 @@ 
 # endif /* !_LIBC || nonlib */
 
 /* HAS_* evaluates to true if we may use the feature at runtime.  */
-# define HAS_CPU_FEATURE(name)	HAS_FEATURE (CPUID_OFFSET, name)
-# define HAS_ARCH_FEATURE(name) HAS_FEATURE (FEATURE_OFFSET, name)
+# define HAS_CPU_FEATURE(name)	HAS_FEATURE (CPUID_OFFSET, cpu, name)
+# define HAS_ARCH_FEATURE(name) HAS_FEATURE (FEATURE_OFFSET, arch, name)
 
 #else	/* __ASSEMBLER__ */
 
@@ -202,25 +206,25 @@  extern const struct cpu_features *__get_cpu_features (void)
 
 /* HAS_* evaluates to true if we may use the feature at runtime.  */
 # define HAS_CPU_FEATURE(name) \
-  ((__get_cpu_features ()->cpuid[index_##name].reg_##name & (bit_##name)) != 0)
+  ((__get_cpu_features ()->cpuid[index_cpu_##name].reg_##name & (bit_cpu_##name)) != 0)
 # define HAS_ARCH_FEATURE(name) \
-  ((__get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
+  ((__get_cpu_features ()->feature[index_arch_##name] & (bit_arch_##name)) != 0)
 
-# define index_CX8		COMMON_CPUID_INDEX_1
-# define index_CMOV		COMMON_CPUID_INDEX_1
-# define index_SSE2		COMMON_CPUID_INDEX_1
-# define index_SSSE3		COMMON_CPUID_INDEX_1
-# define index_SSE4_1		COMMON_CPUID_INDEX_1
-# define index_SSE4_2		COMMON_CPUID_INDEX_1
-# define index_AVX		COMMON_CPUID_INDEX_1
-# define index_AVX2		COMMON_CPUID_INDEX_7
-# define index_AVX512F		COMMON_CPUID_INDEX_7
-# define index_AVX512DQ		COMMON_CPUID_INDEX_7
-# define index_RTM		COMMON_CPUID_INDEX_7
-# define index_FMA		COMMON_CPUID_INDEX_1
-# define index_FMA4		COMMON_CPUID_INDEX_80000001
-# define index_POPCOUNT		COMMON_CPUID_INDEX_1
-# define index_OSXSAVE		COMMON_CPUID_INDEX_1
+# define index_cpu_CX8		COMMON_CPUID_INDEX_1
+# define index_cpu_CMOV		COMMON_CPUID_INDEX_1
+# define index_cpu_SSE2		COMMON_CPUID_INDEX_1
+# define index_cpu_SSSE3	COMMON_CPUID_INDEX_1
+# define index_cpu_SSE4_1	COMMON_CPUID_INDEX_1
+# define index_cpu_SSE4_2	COMMON_CPUID_INDEX_1
+# define index_cpu_AVX		COMMON_CPUID_INDEX_1
+# define index_cpu_AVX2		COMMON_CPUID_INDEX_7
+# define index_cpu_AVX512F	COMMON_CPUID_INDEX_7
+# define index_cpu_AVX512DQ	COMMON_CPUID_INDEX_7
+# define index_cpu_RTM		COMMON_CPUID_INDEX_7
+# define index_cpu_FMA		COMMON_CPUID_INDEX_1
+# define index_cpu_FMA4		COMMON_CPUID_INDEX_80000001
+# define index_cpu_POPCOUNT	COMMON_CPUID_INDEX_1
+# define index_cpu_OSXSAVE	COMMON_CPUID_INDEX_1
 
 # define reg_CX8		edx
 # define reg_CMOV		edx
@@ -238,23 +242,23 @@  extern const struct cpu_features *__get_cpu_features (void)
 # define reg_POPCOUNT		ecx
 # define reg_OSXSAVE		ecx
 
-# define index_Fast_Rep_String		FEATURE_INDEX_1
-# define index_Fast_Copy_Backward	FEATURE_INDEX_1
-# define index_Slow_BSF			FEATURE_INDEX_1
-# define index_Fast_Unaligned_Load	FEATURE_INDEX_1
-# define index_Prefer_PMINUB_for_stringop FEATURE_INDEX_1
-# define index_AVX_Usable		FEATURE_INDEX_1
-# define index_FMA_Usable		FEATURE_INDEX_1
-# define index_FMA4_Usable		FEATURE_INDEX_1
-# define index_Slow_SSE4_2		FEATURE_INDEX_1
-# define index_AVX2_Usable		FEATURE_INDEX_1
-# define index_AVX_Fast_Unaligned_Load	FEATURE_INDEX_1
-# define index_AVX512F_Usable		FEATURE_INDEX_1
-# define index_AVX512DQ_Usable		FEATURE_INDEX_1
-# define index_I586			FEATURE_INDEX_1
-# define index_I686			FEATURE_INDEX_1
-# define index_Prefer_MAP_32BIT_EXEC	FEATURE_INDEX_1
-# define index_Prefer_No_VZEROUPPER     FEATURE_INDEX_1
+# define index_arch_Fast_Rep_String	FEATURE_INDEX_1
+# define index_arch_Fast_Copy_Backward	FEATURE_INDEX_1
+# define index_arch_Slow_BSF		FEATURE_INDEX_1
+# define index_arch_Fast_Unaligned_Load	FEATURE_INDEX_1
+# define index_arch_Prefer_PMINUB_for_stringop FEATURE_INDEX_1
+# define index_arch_AVX_Usable		FEATURE_INDEX_1
+# define index_arch_FMA_Usable		FEATURE_INDEX_1
+# define index_arch_FMA4_Usable		FEATURE_INDEX_1
+# define index_arch_Slow_SSE4_2		FEATURE_INDEX_1
+# define index_arch_AVX2_Usable		FEATURE_INDEX_1
+# define index_arch_AVX_Fast_Unaligned_Load FEATURE_INDEX_1
+# define index_arch_AVX512F_Usable	FEATURE_INDEX_1
+# define index_arch_AVX512DQ_Usable	FEATURE_INDEX_1
+# define index_arch_I586		FEATURE_INDEX_1
+# define index_arch_I686		FEATURE_INDEX_1
+# define index_arch_Prefer_MAP_32BIT_EXEC FEATURE_INDEX_1
+# define index_arch_Prefer_No_VZEROUPPER FEATURE_INDEX_1
 
 #endif	/* !__ASSEMBLER__ */
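
The "#if index_... != index_..." / "# error" pairs in init_cpu_features guard statements that OR several bits into a single array element.  A minimal sketch of that pattern follows.  The macro names follow the patch, but the value given to FEATURE_INDEX_1 and the function set_core_tuning are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value; only the equality between the two indices matters.  */
#define FEATURE_INDEX_1			0

#define index_arch_Fast_Rep_String	FEATURE_INDEX_1
#define index_arch_Fast_Copy_Backward	FEATURE_INDEX_1
#define bit_arch_Fast_Rep_String	(1u << 0)
#define bit_arch_Fast_Copy_Backward	(1u << 1)

/* Both bits are written through one index below, so the build must
   stop if they are ever moved to different array elements.  */
#if index_arch_Fast_Rep_String != index_arch_Fast_Copy_Backward
# error index_arch_Fast_Rep_String != index_arch_Fast_Copy_Backward
#endif

/* Hypothetical helper: set both tuning bits with a single OR.  */
void
set_core_tuning (unsigned int *feature)
{
  assert (feature != NULL);
  feature[index_arch_Fast_Rep_String]
    |= (bit_arch_Fast_Rep_String | bit_arch_Fast_Copy_Backward);
}
```

If one of the two index macros were later redefined to a different element, the preprocessor check should fail the build before the combined OR could write a bit into the wrong word.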