From patchwork Wed Aug 5 17:14:08 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lu, Hongjiu" X-Patchwork-Id: 8021 Received: (qmail 33679 invoked by alias); 5 Aug 2015 17:14:18 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 33670 invoked by uid 89); 5 Aug 2015 17:14:18 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.3 required=5.0 tests=AWL, BAYES_50, KAM_LAZY_DOMAIN_SECURITY, NO_DNS_FOR_FROM, RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: mga03.intel.com X-ExtLoop1: 1 Date: Wed, 5 Aug 2015 10:14:08 -0700 From: "H.J. Lu" To: GNU C Library Subject: [PATCH 2/5] Update x86_64 multiarch functions for Message-ID: <20150805171407.GB2340@intel.com> Reply-To: "H.J. Lu" MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) This patch updates x86_64 multiarch functions to use the newly defined HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from . OK for master? H.J. --- * sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use LOAD_RTLD_GLOBAL_RO_RDX and HAS_CPU_FEATURE (SSE4_1). * sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S : Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.S : Likewise. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise. * sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise. * sysdeps/x86_64/multiarch/strstr.c: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/test-multiarch.c: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features call. Add LOAD_RTLD_GLOBAL_RO_RDX. Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/multiarch/memcpy.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/strcat.S: Likewise. * sysdeps/x86_64/multiarch/strchr.S: Likewise. * sysdeps/x86_64/multiarch/strcmp.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.S: Likewise. * sysdeps/x86_64/multiarch/strcspn.S: Likewise. * sysdeps/x86_64/multiarch/strspn.S: Likewise. * sysdeps/x86_64/multiarch/wcscpy.S: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.S: Likewise. --- sysdeps/x86_64/fpu/multiarch/e_asin.c | 8 +- sysdeps/x86_64/fpu/multiarch/e_atan2.c | 9 +- sysdeps/x86_64/fpu/multiarch/e_exp.c | 9 +- sysdeps/x86_64/fpu/multiarch/e_log.c | 9 +- sysdeps/x86_64/fpu/multiarch/e_pow.c | 5 +- sysdeps/x86_64/fpu/multiarch/s_atan.c | 9 +- sysdeps/x86_64/fpu/multiarch/s_ceil.S | 5 +- sysdeps/x86_64/fpu/multiarch/s_ceilf.S | 5 +- sysdeps/x86_64/fpu/multiarch/s_floor.S | 5 +- sysdeps/x86_64/fpu/multiarch/s_floorf.S | 4 +- sysdeps/x86_64/fpu/multiarch/s_fma.c | 9 +- sysdeps/x86_64/fpu/multiarch/s_fmaf.c | 9 +- sysdeps/x86_64/fpu/multiarch/s_nearbyint.S | 4 +- sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S | 5 +- sysdeps/x86_64/fpu/multiarch/s_rint.S | 5 +- sysdeps/x86_64/fpu/multiarch/s_rintf.S | 5 +- sysdeps/x86_64/fpu/multiarch/s_sin.c | 14 +-- sysdeps/x86_64/fpu/multiarch/s_tan.c | 9 +- sysdeps/x86_64/multiarch/ifunc-impl-list.c | 139 ++++++++++++++++++---------- sysdeps/x86_64/multiarch/memcmp.S | 9 +- sysdeps/x86_64/multiarch/memcpy.S | 12 +-- sysdeps/x86_64/multiarch/memcpy_chk.S | 12 +-- sysdeps/x86_64/multiarch/memmove.c | 6 +- sysdeps/x86_64/multiarch/memmove_chk.c | 6 +- sysdeps/x86_64/multiarch/mempcpy.S | 12 +-- sysdeps/x86_64/multiarch/mempcpy_chk.S | 12 +-- sysdeps/x86_64/multiarch/memset.S | 8 +- sysdeps/x86_64/multiarch/memset_chk.S | 8 +- sysdeps/x86_64/multiarch/sched_cpucount.c | 2 +- sysdeps/x86_64/multiarch/strcat.S | 10 +- sysdeps/x86_64/multiarch/strchr.S | 8 +- sysdeps/x86_64/multiarch/strcmp.S | 42 +++------ sysdeps/x86_64/multiarch/strcpy.S | 10 +- sysdeps/x86_64/multiarch/strcspn.S | 8 +- sysdeps/x86_64/multiarch/strspn.S | 8 +- sysdeps/x86_64/multiarch/strstr.c | 5 +- sysdeps/x86_64/multiarch/test-multiarch.c | 18 ++-- sysdeps/x86_64/multiarch/wcscpy.S | 7 +- sysdeps/x86_64/multiarch/wmemcmp.S | 9 +- 39 files changed, 246 insertions(+), 233 deletions(-) diff --git a/sysdeps/x86_64/fpu/multiarch/e_asin.c b/sysdeps/x86_64/fpu/multiarch/e_asin.c index 55865c0..a0edb96 100644 --- a/sysdeps/x86_64/fpu/multiarch/e_asin.c +++ b/sysdeps/x86_64/fpu/multiarch/e_asin.c @@ -9,11 +9,15 @@ extern double __ieee754_acos_fma4 (double); extern double __ieee754_asin_fma4 (double); libm_ifunc (__ieee754_acos, - HAS_FMA4 ? __ieee754_acos_fma4 : __ieee754_acos_sse2); + HAS_ARCH_FEATURE (FMA4_Usable) + ? __ieee754_acos_fma4 + : __ieee754_acos_sse2); strong_alias (__ieee754_acos, __acos_finite) libm_ifunc (__ieee754_asin, - HAS_FMA4 ? __ieee754_asin_fma4 : __ieee754_asin_sse2); + HAS_ARCH_FEATURE (FMA4_Usable) + ? __ieee754_asin_fma4 + : __ieee754_asin_sse2); strong_alias (__ieee754_asin, __asin_finite) # define __ieee754_acos __ieee754_acos_sse2 diff --git a/sysdeps/x86_64/fpu/multiarch/e_atan2.c b/sysdeps/x86_64/fpu/multiarch/e_atan2.c index 547681c..269dcc9 100644 --- a/sysdeps/x86_64/fpu/multiarch/e_atan2.c +++ b/sysdeps/x86_64/fpu/multiarch/e_atan2.c @@ -8,14 +8,15 @@ extern double __ieee754_atan2_avx (double, double); # ifdef HAVE_FMA4_SUPPORT extern double __ieee754_atan2_fma4 (double, double); # else -# undef HAS_FMA4 -# define HAS_FMA4 0 +# undef HAS_ARCH_FEATURE +# define HAS_ARCH_FEATURE(feature) 0 # define __ieee754_atan2_fma4 ((void *) 0) # endif libm_ifunc (__ieee754_atan2, - HAS_FMA4 ? __ieee754_atan2_fma4 - : (HAS_AVX ? __ieee754_atan2_avx : __ieee754_atan2_sse2)); + HAS_ARCH_FEATURE (FMA4_Usable) ? __ieee754_atan2_fma4 + : (HAS_ARCH_FEATURE (AVX_Usable) + ? __ieee754_atan2_avx : __ieee754_atan2_sse2)); strong_alias (__ieee754_atan2, __atan2_finite) # define __ieee754_atan2 __ieee754_atan2_sse2 diff --git a/sysdeps/x86_64/fpu/multiarch/e_exp.c b/sysdeps/x86_64/fpu/multiarch/e_exp.c index d244954..9c124ca 100644 --- a/sysdeps/x86_64/fpu/multiarch/e_exp.c +++ b/sysdeps/x86_64/fpu/multiarch/e_exp.c @@ -8,14 +8,15 @@ extern double __ieee754_exp_avx (double); # ifdef HAVE_FMA4_SUPPORT extern double __ieee754_exp_fma4 (double); # else -# undef HAS_FMA4 -# define HAS_FMA4 0 +# undef HAS_ARCH_FEATURE +# define HAS_ARCH_FEATURE(feature) 0 # define __ieee754_exp_fma4 ((void *) 0) # endif libm_ifunc (__ieee754_exp, - HAS_FMA4 ? __ieee754_exp_fma4 - : (HAS_AVX ? __ieee754_exp_avx : __ieee754_exp_sse2)); + HAS_ARCH_FEATURE (FMA4_Usable) ? __ieee754_exp_fma4 + : (HAS_ARCH_FEATURE (AVX_Usable) + ? __ieee754_exp_avx : __ieee754_exp_sse2)); strong_alias (__ieee754_exp, __exp_finite) # define __ieee754_exp __ieee754_exp_sse2 diff --git a/sysdeps/x86_64/fpu/multiarch/e_log.c b/sysdeps/x86_64/fpu/multiarch/e_log.c index 9805473..04e9ac5 100644 --- a/sysdeps/x86_64/fpu/multiarch/e_log.c +++ b/sysdeps/x86_64/fpu/multiarch/e_log.c @@ -8,14 +8,15 @@ extern double __ieee754_log_avx (double); # ifdef HAVE_FMA4_SUPPORT extern double __ieee754_log_fma4 (double); # else -# undef HAS_FMA4 -# define HAS_FMA4 0 +# undef HAS_ARCH_FEATURE +# define HAS_ARCH_FEATURE(feature) 0 # define __ieee754_log_fma4 ((void *) 0) # endif libm_ifunc (__ieee754_log, - HAS_FMA4 ? __ieee754_log_fma4 - : (HAS_AVX ? __ieee754_log_avx : __ieee754_log_sse2)); + HAS_ARCH_FEATURE (FMA4_Usable) ? __ieee754_log_fma4 + : (HAS_ARCH_FEATURE (AVX_Usable) + ? __ieee754_log_avx : __ieee754_log_sse2)); strong_alias (__ieee754_log, __log_finite) # define __ieee754_log __ieee754_log_sse2 diff --git a/sysdeps/x86_64/fpu/multiarch/e_pow.c b/sysdeps/x86_64/fpu/multiarch/e_pow.c index 433cce0..6d422d6 100644 --- a/sysdeps/x86_64/fpu/multiarch/e_pow.c +++ b/sysdeps/x86_64/fpu/multiarch/e_pow.c @@ -6,7 +6,10 @@ extern double __ieee754_pow_sse2 (double, double); extern double __ieee754_pow_fma4 (double, double); -libm_ifunc (__ieee754_pow, HAS_FMA4 ? __ieee754_pow_fma4 : __ieee754_pow_sse2); +libm_ifunc (__ieee754_pow, + HAS_ARCH_FEATURE (FMA4_Usable) + ? __ieee754_pow_fma4 + : __ieee754_pow_sse2); strong_alias (__ieee754_pow, __pow_finite) # define __ieee754_pow __ieee754_pow_sse2 diff --git a/sysdeps/x86_64/fpu/multiarch/s_atan.c b/sysdeps/x86_64/fpu/multiarch/s_atan.c index ae16d7c..57b5c65 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_atan.c +++ b/sysdeps/x86_64/fpu/multiarch/s_atan.c @@ -7,13 +7,14 @@ extern double __atan_avx (double); # ifdef HAVE_FMA4_SUPPORT extern double __atan_fma4 (double); # else -# undef HAS_FMA4 -# define HAS_FMA4 0 +# undef HAS_ARCH_FEATURE +# define HAS_ARCH_FEATURE(feature) 0 # define __atan_fma4 ((void *) 0) # endif -libm_ifunc (atan, (HAS_FMA4 ? __atan_fma4 : - HAS_AVX ? __atan_avx : __atan_sse2)); +libm_ifunc (atan, (HAS_ARCH_FEATURE (FMA4_Usable) ? __atan_fma4 : + HAS_ARCH_FEATURE (AVX_Usable) + ? __atan_avx : __atan_sse2)); # define atan __atan_sse2 #endif diff --git a/sysdeps/x86_64/fpu/multiarch/s_ceil.S b/sysdeps/x86_64/fpu/multiarch/s_ceil.S index 00ecede..c1b9026 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_ceil.S +++ b/sysdeps/x86_64/fpu/multiarch/s_ceil.S @@ -22,10 +22,9 @@ ENTRY(__ceil) .type __ceil, @gnu_indirect_function - call __get_cpu_features@plt - movq %rax, %rdx + LOAD_RTLD_GLOBAL_RO_RDX leaq __ceil_sse41(%rip), %rax - testl $bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx) + HAS_CPU_FEATURE (SSE4_1) jnz 2f leaq __ceil_c(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/fpu/multiarch/s_ceilf.S b/sysdeps/x86_64/fpu/multiarch/s_ceilf.S index c8ed705..7809e03 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_ceilf.S +++ b/sysdeps/x86_64/fpu/multiarch/s_ceilf.S @@ -22,10 +22,9 @@ ENTRY(__ceilf) .type __ceilf, @gnu_indirect_function - call __get_cpu_features@plt - movq %rax, %rdx + LOAD_RTLD_GLOBAL_RO_RDX leaq __ceilf_sse41(%rip), %rax - testl $bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx) + HAS_CPU_FEATURE (SSE4_1) jnz 2f leaq __ceilf_c(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/fpu/multiarch/s_floor.S b/sysdeps/x86_64/fpu/multiarch/s_floor.S index 952ffaa..fa3f98e 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_floor.S +++ b/sysdeps/x86_64/fpu/multiarch/s_floor.S @@ -22,10 +22,9 @@ ENTRY(__floor) .type __floor, @gnu_indirect_function - call __get_cpu_features@plt - movq %rax, %rdx + LOAD_RTLD_GLOBAL_RO_RDX leaq __floor_sse41(%rip), %rax - testl $bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx) + HAS_CPU_FEATURE (SSE4_1) jnz 2f leaq __floor_c(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/fpu/multiarch/s_floorf.S b/sysdeps/x86_64/fpu/multiarch/s_floorf.S index c8231e8..f60f662 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_floorf.S +++ b/sysdeps/x86_64/fpu/multiarch/s_floorf.S @@ -22,10 +22,10 @@ ENTRY(__floorf) .type __floorf, @gnu_indirect_function - call __get_cpu_features@plt + LOAD_RTLD_GLOBAL_RO_RDX movq %rax, %rdx leaq __floorf_sse41(%rip), %rax - testl $bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx) + HAS_CPU_FEATURE (SSE4_1) jnz 2f leaq __floorf_c(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/fpu/multiarch/s_fma.c b/sysdeps/x86_64/fpu/multiarch/s_fma.c index 0963a0b..78e7732 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_fma.c +++ b/sysdeps/x86_64/fpu/multiarch/s_fma.c @@ -42,14 +42,15 @@ __fma_fma4 (double x, double y, double z) return x; } # else -# undef HAS_FMA4 -# define HAS_FMA4 0 +# undef HAS_ARCH_FEATURE +# define HAS_ARCH_FEATURE(feature) 0 # define __fma_fma4 ((void *) 0) # endif -libm_ifunc (__fma, HAS_FMA - ? __fma_fma3 : (HAS_FMA4 ? __fma_fma4 : __fma_sse2)); +libm_ifunc (__fma, HAS_ARCH_FEATURE (FMA_Usable) + ? __fma_fma3 : (HAS_ARCH_FEATURE (FMA4_Usable) + ? __fma_fma4 : __fma_sse2)); weak_alias (__fma, fma) # define __fma __fma_sse2 diff --git a/sysdeps/x86_64/fpu/multiarch/s_fmaf.c b/sysdeps/x86_64/fpu/multiarch/s_fmaf.c index 6046961..bebd3ee 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_fmaf.c +++ b/sysdeps/x86_64/fpu/multiarch/s_fmaf.c @@ -41,14 +41,15 @@ __fmaf_fma4 (float x, float y, float z) return x; } # else -# undef HAS_FMA4 -# define HAS_FMA4 0 +# undef HAS_ARCH_FEATURE +# define HAS_ARCH_FEATURE(feature) 0 # define __fmaf_fma4 ((void *) 0) # endif -libm_ifunc (__fmaf, HAS_FMA - ? __fmaf_fma3 : (HAS_FMA4 ? __fmaf_fma4 : __fmaf_sse2)); +libm_ifunc (__fmaf, HAS_ARCH_FEATURE (FMA_Usable) + ? __fmaf_fma3 : (HAS_ARCH_FEATURE (FMA4_Usable) + ? __fmaf_fma4 : __fmaf_sse2)); weak_alias (__fmaf, fmaf) # define __fmaf __fmaf_sse2 diff --git a/sysdeps/x86_64/fpu/multiarch/s_nearbyint.S b/sysdeps/x86_64/fpu/multiarch/s_nearbyint.S index b5d32b5..109395c 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_nearbyint.S +++ b/sysdeps/x86_64/fpu/multiarch/s_nearbyint.S @@ -22,10 +22,10 @@ ENTRY(__nearbyint) .type __nearbyint, @gnu_indirect_function - call __get_cpu_features@plt + LOAD_RTLD_GLOBAL_RO_RDX movq %rax, %rdx leaq __nearbyint_sse41(%rip), %rax - testl $bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx) + HAS_CPU_FEATURE (SSE4_1) jnz 2f leaq __nearbyint_c(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S b/sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S index cd7e177..b870c0c 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S +++ b/sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S @@ -22,10 +22,9 @@ ENTRY(__nearbyintf) .type __nearbyintf, @gnu_indirect_function - call __get_cpu_features@plt - movq %rax, %rdx + LOAD_RTLD_GLOBAL_RO_RDX leaq __nearbyintf_sse41(%rip), %rax - testl $bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx) + HAS_CPU_FEATURE (SSE4_1) jnz 2f leaq __nearbyintf_c(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/fpu/multiarch/s_rint.S b/sysdeps/x86_64/fpu/multiarch/s_rint.S index f52cef6..b238d49 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_rint.S +++ b/sysdeps/x86_64/fpu/multiarch/s_rint.S @@ -22,10 +22,9 @@ ENTRY(__rint) .type __rint, @gnu_indirect_function - call __get_cpu_features@plt - movq %rax, %rdx + LOAD_RTLD_GLOBAL_RO_RDX leaq __rint_sse41(%rip), %rax - testl $bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx) + HAS_CPU_FEATURE (SSE4_1) jnz 2f leaq __rint_c(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/fpu/multiarch/s_rintf.S b/sysdeps/x86_64/fpu/multiarch/s_rintf.S index e2608d4..8869196 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_rintf.S +++ b/sysdeps/x86_64/fpu/multiarch/s_rintf.S @@ -22,10 +22,9 @@ ENTRY(__rintf) .type __rintf, @gnu_indirect_function - call __get_cpu_features@plt - movq %rax, %rdx + LOAD_RTLD_GLOBAL_RO_RDX leaq __rintf_sse41(%rip), %rax - testl $bit_SSE4_1, CPUID_OFFSET+index_SSE4_1(%rdx) + HAS_CPU_FEATURE (SSE4_1) jnz 2f leaq __rintf_c(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/fpu/multiarch/s_sin.c b/sysdeps/x86_64/fpu/multiarch/s_sin.c index a0c2521..3bc7330 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_sin.c +++ b/sysdeps/x86_64/fpu/multiarch/s_sin.c @@ -11,18 +11,20 @@ extern double __sin_avx (double); extern double __cos_fma4 (double); extern double __sin_fma4 (double); # else -# undef HAS_FMA4 -# define HAS_FMA4 0 +# undef HAS_ARCH_FEATURE +# define HAS_ARCH_FEATURE(feature) 0 # define __cos_fma4 ((void *) 0) # define __sin_fma4 ((void *) 0) # endif -libm_ifunc (__cos, (HAS_FMA4 ? __cos_fma4 : - HAS_AVX ? __cos_avx : __cos_sse2)); +libm_ifunc (__cos, (HAS_ARCH_FEATURE (FMA4_Usable) ? __cos_fma4 : + HAS_ARCH_FEATURE (AVX_Usable) + ? __cos_avx : __cos_sse2)); weak_alias (__cos, cos) -libm_ifunc (__sin, (HAS_FMA4 ? __sin_fma4 : - HAS_AVX ? __sin_avx : __sin_sse2)); +libm_ifunc (__sin, (HAS_ARCH_FEATURE (FMA4_Usable) ? __sin_fma4 : + HAS_ARCH_FEATURE (AVX_Usable) + ? __sin_avx : __sin_sse2)); weak_alias (__sin, sin) # define __cos __cos_sse2 diff --git a/sysdeps/x86_64/fpu/multiarch/s_tan.c b/sysdeps/x86_64/fpu/multiarch/s_tan.c index 904308f..d99d9db 100644 --- a/sysdeps/x86_64/fpu/multiarch/s_tan.c +++ b/sysdeps/x86_64/fpu/multiarch/s_tan.c @@ -7,13 +7,14 @@ extern double __tan_avx (double); # ifdef HAVE_FMA4_SUPPORT extern double __tan_fma4 (double); # else -# undef HAS_FMA4 -# define HAS_FMA4 0 +# undef HAS_ARCH_FEATURE +# define HAS_ARCH_FEATURE(feature) 0 # define __tan_fma4 ((void *) 0) # endif -libm_ifunc (tan, (HAS_FMA4 ? __tan_fma4 : - HAS_AVX ? __tan_avx : __tan_sse2)); +libm_ifunc (tan, (HAS_ARCH_FEATURE (FMA4_Usable) ? __tan_fma4 : + HAS_ARCH_FEATURE (AVX_Usable) + ? __tan_avx : __tan_sse2)); # define tan __tan_sse2 #endif diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c index b64e4f1..f5a576c 100644 --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c @@ -39,48 +39,57 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, /* Support sysdeps/x86_64/multiarch/memcmp.S. */ IFUNC_IMPL (i, name, memcmp, - IFUNC_IMPL_ADD (array, i, memcmp, HAS_SSE4_1, + IFUNC_IMPL_ADD (array, i, memcmp, HAS_CPU_FEATURE (SSE4_1), __memcmp_sse4_1) - IFUNC_IMPL_ADD (array, i, memcmp, HAS_SSSE3, __memcmp_ssse3) + IFUNC_IMPL_ADD (array, i, memcmp, HAS_CPU_FEATURE (SSSE3), + __memcmp_ssse3) IFUNC_IMPL_ADD (array, i, memcmp, 1, __memcmp_sse2)) /* Support sysdeps/x86_64/multiarch/memmove_chk.S. */ IFUNC_IMPL (i, name, __memmove_chk, - IFUNC_IMPL_ADD (array, i, __memmove_chk, HAS_AVX, + IFUNC_IMPL_ADD (array, i, __memmove_chk, + HAS_ARCH_FEATURE (AVX_Usable), __memmove_chk_avx_unaligned) - IFUNC_IMPL_ADD (array, i, __memmove_chk, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, __memmove_chk, + HAS_CPU_FEATURE (SSSE3), __memmove_chk_ssse3_back) - IFUNC_IMPL_ADD (array, i, __memmove_chk, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, __memmove_chk, + HAS_CPU_FEATURE (SSSE3), __memmove_chk_ssse3) IFUNC_IMPL_ADD (array, i, __memmove_chk, 1, __memmove_chk_sse2)) /* Support sysdeps/x86_64/multiarch/memmove.S. */ IFUNC_IMPL (i, name, memmove, - IFUNC_IMPL_ADD (array, i, memmove, HAS_AVX, + IFUNC_IMPL_ADD (array, i, memmove, + HAS_ARCH_FEATURE (AVX_Usable), __memmove_avx_unaligned) - IFUNC_IMPL_ADD (array, i, memmove, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, memmove, HAS_CPU_FEATURE (SSSE3), __memmove_ssse3_back) - IFUNC_IMPL_ADD (array, i, memmove, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, memmove, HAS_CPU_FEATURE (SSSE3), __memmove_ssse3) IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_sse2)) #ifdef HAVE_AVX2_SUPPORT /* Support sysdeps/x86_64/multiarch/memset_chk.S. */ IFUNC_IMPL (i, name, __memset_chk, - IFUNC_IMPL_ADD (array, i, __memset_chk, 1, __memset_chk_sse2) - IFUNC_IMPL_ADD (array, i, __memset_chk, HAS_AVX2, + IFUNC_IMPL_ADD (array, i, __memset_chk, 1, + __memset_chk_sse2) + IFUNC_IMPL_ADD (array, i, __memset_chk, + HAS_ARCH_FEATURE (AVX2_Usable), __memset_chk_avx2)) /* Support sysdeps/x86_64/multiarch/memset.S. */ IFUNC_IMPL (i, name, memset, IFUNC_IMPL_ADD (array, i, memset, 1, __memset_sse2) - IFUNC_IMPL_ADD (array, i, memset, HAS_AVX2, __memset_avx2)) + IFUNC_IMPL_ADD (array, i, memset, + HAS_ARCH_FEATURE (AVX2_Usable), + __memset_avx2)) #endif /* Support sysdeps/x86_64/multiarch/stpncpy.S. */ IFUNC_IMPL (i, name, stpncpy, - IFUNC_IMPL_ADD (array, i, stpncpy, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, stpncpy, HAS_CPU_FEATURE (SSSE3), __stpncpy_ssse3) IFUNC_IMPL_ADD (array, i, stpncpy, 1, __stpncpy_sse2_unaligned) @@ -88,27 +97,34 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, /* Support sysdeps/x86_64/multiarch/stpcpy.S. */ IFUNC_IMPL (i, name, stpcpy, - IFUNC_IMPL_ADD (array, i, stpcpy, HAS_SSSE3, __stpcpy_ssse3) + IFUNC_IMPL_ADD (array, i, stpcpy, HAS_CPU_FEATURE (SSSE3), + __stpcpy_ssse3) IFUNC_IMPL_ADD (array, i, stpcpy, 1, __stpcpy_sse2_unaligned) IFUNC_IMPL_ADD (array, i, stpcpy, 1, __stpcpy_sse2)) /* Support sysdeps/x86_64/multiarch/strcasecmp_l.S. */ IFUNC_IMPL (i, name, strcasecmp, - IFUNC_IMPL_ADD (array, i, strcasecmp, HAS_AVX, + IFUNC_IMPL_ADD (array, i, strcasecmp, + HAS_ARCH_FEATURE (AVX_Usable), __strcasecmp_avx) - IFUNC_IMPL_ADD (array, i, strcasecmp, HAS_SSE4_2, + IFUNC_IMPL_ADD (array, i, strcasecmp, + HAS_CPU_FEATURE (SSE4_2), __strcasecmp_sse42) - IFUNC_IMPL_ADD (array, i, strcasecmp, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, strcasecmp, + HAS_CPU_FEATURE (SSSE3), __strcasecmp_ssse3) IFUNC_IMPL_ADD (array, i, strcasecmp, 1, __strcasecmp_sse2)) /* Support sysdeps/x86_64/multiarch/strcasecmp_l.S. */ IFUNC_IMPL (i, name, strcasecmp_l, - IFUNC_IMPL_ADD (array, i, strcasecmp_l, HAS_AVX, + IFUNC_IMPL_ADD (array, i, strcasecmp_l, + HAS_ARCH_FEATURE (AVX_Usable), __strcasecmp_l_avx) - IFUNC_IMPL_ADD (array, i, strcasecmp_l, HAS_SSE4_2, + IFUNC_IMPL_ADD (array, i, strcasecmp_l, + HAS_CPU_FEATURE (SSE4_2), __strcasecmp_l_sse42) - IFUNC_IMPL_ADD (array, i, strcasecmp_l, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, strcasecmp_l, + HAS_CPU_FEATURE (SSSE3), __strcasecmp_l_ssse3) IFUNC_IMPL_ADD (array, i, strcasecmp_l, 1, __strcasecmp_l_sse2)) @@ -119,7 +135,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, /* Support sysdeps/x86_64/multiarch/strcat.S. */ IFUNC_IMPL (i, name, strcat, - IFUNC_IMPL_ADD (array, i, strcat, HAS_SSSE3, __strcat_ssse3) + IFUNC_IMPL_ADD (array, i, strcat, HAS_CPU_FEATURE (SSSE3), + __strcat_ssse3) IFUNC_IMPL_ADD (array, i, strcat, 1, __strcat_sse2_unaligned) IFUNC_IMPL_ADD (array, i, strcat, 1, __strcat_sse2)) @@ -130,48 +147,57 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, /* Support sysdeps/x86_64/multiarch/strcmp.S. */ IFUNC_IMPL (i, name, strcmp, - IFUNC_IMPL_ADD (array, i, strcmp, HAS_SSE4_2, __strcmp_sse42) - IFUNC_IMPL_ADD (array, i, strcmp, HAS_SSSE3, __strcmp_ssse3) + IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSE4_2), + __strcmp_sse42) + IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSSE3), + __strcmp_ssse3) IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2_unaligned) IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2)) /* Support sysdeps/x86_64/multiarch/strcpy.S. */ IFUNC_IMPL (i, name, strcpy, - IFUNC_IMPL_ADD (array, i, strcpy, HAS_SSSE3, __strcpy_ssse3) + IFUNC_IMPL_ADD (array, i, strcpy, HAS_CPU_FEATURE (SSSE3), + __strcpy_ssse3) IFUNC_IMPL_ADD (array, i, strcpy, 1, __strcpy_sse2_unaligned) IFUNC_IMPL_ADD (array, i, strcpy, 1, __strcpy_sse2)) /* Support sysdeps/x86_64/multiarch/strcspn.S. */ IFUNC_IMPL (i, name, strcspn, - IFUNC_IMPL_ADD (array, i, strcspn, HAS_SSE4_2, + IFUNC_IMPL_ADD (array, i, strcspn, HAS_CPU_FEATURE (SSE4_2), __strcspn_sse42) IFUNC_IMPL_ADD (array, i, strcspn, 1, __strcspn_sse2)) /* Support sysdeps/x86_64/multiarch/strncase_l.S. */ IFUNC_IMPL (i, name, strncasecmp, - IFUNC_IMPL_ADD (array, i, strncasecmp, HAS_AVX, + IFUNC_IMPL_ADD (array, i, strncasecmp, + HAS_ARCH_FEATURE (AVX_Usable), __strncasecmp_avx) - IFUNC_IMPL_ADD (array, i, strncasecmp, HAS_SSE4_2, + IFUNC_IMPL_ADD (array, i, strncasecmp, + HAS_CPU_FEATURE (SSE4_2), __strncasecmp_sse42) - IFUNC_IMPL_ADD (array, i, strncasecmp, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, strncasecmp, + HAS_CPU_FEATURE (SSSE3), __strncasecmp_ssse3) IFUNC_IMPL_ADD (array, i, strncasecmp, 1, __strncasecmp_sse2)) /* Support sysdeps/x86_64/multiarch/strncase_l.S. */ IFUNC_IMPL (i, name, strncasecmp_l, - IFUNC_IMPL_ADD (array, i, strncasecmp_l, HAS_AVX, + IFUNC_IMPL_ADD (array, i, strncasecmp_l, + HAS_ARCH_FEATURE (AVX_Usable), __strncasecmp_l_avx) - IFUNC_IMPL_ADD (array, i, strncasecmp_l, HAS_SSE4_2, + IFUNC_IMPL_ADD (array, i, strncasecmp_l, + HAS_CPU_FEATURE (SSE4_2), __strncasecmp_l_sse42) - IFUNC_IMPL_ADD (array, i, strncasecmp_l, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, strncasecmp_l, + HAS_CPU_FEATURE (SSSE3), __strncasecmp_l_ssse3) IFUNC_IMPL_ADD (array, i, strncasecmp_l, 1, __strncasecmp_l_sse2)) /* Support sysdeps/x86_64/multiarch/strncat.S. */ IFUNC_IMPL (i, name, strncat, - IFUNC_IMPL_ADD (array, i, strncat, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, strncat, HAS_CPU_FEATURE (SSSE3), __strncat_ssse3) IFUNC_IMPL_ADD (array, i, strncat, 1, __strncat_sse2_unaligned) @@ -179,7 +205,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, /* Support sysdeps/x86_64/multiarch/strncpy.S. */ IFUNC_IMPL (i, name, strncpy, - IFUNC_IMPL_ADD (array, i, strncpy, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, strncpy, HAS_CPU_FEATURE (SSSE3), __strncpy_ssse3) IFUNC_IMPL_ADD (array, i, strncpy, 1, __strncpy_sse2_unaligned) @@ -187,14 +213,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, /* Support sysdeps/x86_64/multiarch/strpbrk.S. */ IFUNC_IMPL (i, name, strpbrk, - IFUNC_IMPL_ADD (array, i, strpbrk, HAS_SSE4_2, + IFUNC_IMPL_ADD (array, i, strpbrk, HAS_CPU_FEATURE (SSE4_2), __strpbrk_sse42) IFUNC_IMPL_ADD (array, i, strpbrk, 1, __strpbrk_sse2)) /* Support sysdeps/x86_64/multiarch/strspn.S. */ IFUNC_IMPL (i, name, strspn, - IFUNC_IMPL_ADD (array, i, strspn, HAS_SSE4_2, __strspn_sse42) + IFUNC_IMPL_ADD (array, i, strspn, HAS_CPU_FEATURE (SSE4_2), + __strspn_sse42) IFUNC_IMPL_ADD (array, i, strspn, 1, __strspn_sse2)) /* Support sysdeps/x86_64/multiarch/strstr.c. */ @@ -204,65 +231,75 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, /* Support sysdeps/x86_64/multiarch/wcscpy.S. */ IFUNC_IMPL (i, name, wcscpy, - IFUNC_IMPL_ADD (array, i, wcscpy, HAS_SSSE3, __wcscpy_ssse3) + IFUNC_IMPL_ADD (array, i, wcscpy, HAS_CPU_FEATURE (SSSE3), + __wcscpy_ssse3) IFUNC_IMPL_ADD (array, i, wcscpy, 1, __wcscpy_sse2)) /* Support sysdeps/x86_64/multiarch/wmemcmp.S. */ IFUNC_IMPL (i, name, wmemcmp, - IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_SSE4_1, + IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_CPU_FEATURE (SSE4_1), __wmemcmp_sse4_1) - IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, wmemcmp, HAS_CPU_FEATURE (SSSE3), __wmemcmp_ssse3) IFUNC_IMPL_ADD (array, i, wmemcmp, 1, __wmemcmp_sse2)) #ifdef SHARED /* Support sysdeps/x86_64/multiarch/memcpy_chk.S. */ IFUNC_IMPL (i, name, __memcpy_chk, - IFUNC_IMPL_ADD (array, i, __memcpy_chk, HAS_AVX, + IFUNC_IMPL_ADD (array, i, __memcpy_chk, + HAS_ARCH_FEATURE (AVX_Usable), __memcpy_chk_avx_unaligned) - IFUNC_IMPL_ADD (array, i, __memcpy_chk, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, __memcpy_chk, + HAS_CPU_FEATURE (SSSE3), __memcpy_chk_ssse3_back) - IFUNC_IMPL_ADD (array, i, __memcpy_chk, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, __memcpy_chk, + HAS_CPU_FEATURE (SSSE3), __memcpy_chk_ssse3) IFUNC_IMPL_ADD (array, i, __memcpy_chk, 1, __memcpy_chk_sse2)) /* Support sysdeps/x86_64/multiarch/memcpy.S. */ IFUNC_IMPL (i, name, memcpy, - IFUNC_IMPL_ADD (array, i, memcpy, HAS_AVX, + IFUNC_IMPL_ADD (array, i, memcpy, + HAS_ARCH_FEATURE (AVX_Usable), __memcpy_avx_unaligned) - IFUNC_IMPL_ADD (array, i, memcpy, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, memcpy, HAS_CPU_FEATURE (SSSE3), __memcpy_ssse3_back) - IFUNC_IMPL_ADD (array, i, memcpy, HAS_SSSE3, __memcpy_ssse3) + IFUNC_IMPL_ADD (array, i, memcpy, HAS_CPU_FEATURE (SSSE3), + __memcpy_ssse3) IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_sse2_unaligned) IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_sse2)) /* Support sysdeps/x86_64/multiarch/mempcpy_chk.S. */ IFUNC_IMPL (i, name, __mempcpy_chk, - IFUNC_IMPL_ADD (array, i, __mempcpy_chk, HAS_AVX, + IFUNC_IMPL_ADD (array, i, __mempcpy_chk, + HAS_ARCH_FEATURE (AVX_Usable), __mempcpy_chk_avx_unaligned) - IFUNC_IMPL_ADD (array, i, __mempcpy_chk, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, __mempcpy_chk, + HAS_CPU_FEATURE (SSSE3), __mempcpy_chk_ssse3_back) - IFUNC_IMPL_ADD (array, i, __mempcpy_chk, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, __mempcpy_chk, + HAS_CPU_FEATURE (SSSE3), __mempcpy_chk_ssse3) IFUNC_IMPL_ADD (array, i, __mempcpy_chk, 1, __mempcpy_chk_sse2)) /* Support sysdeps/x86_64/multiarch/mempcpy.S. */ IFUNC_IMPL (i, name, mempcpy, - IFUNC_IMPL_ADD (array, i, mempcpy, HAS_AVX, + IFUNC_IMPL_ADD (array, i, mempcpy, + HAS_ARCH_FEATURE (AVX_Usable), __mempcpy_avx_unaligned) - IFUNC_IMPL_ADD (array, i, mempcpy, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, mempcpy, HAS_CPU_FEATURE (SSSE3), __mempcpy_ssse3_back) - IFUNC_IMPL_ADD (array, i, mempcpy, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, mempcpy, HAS_CPU_FEATURE (SSSE3), __mempcpy_ssse3) IFUNC_IMPL_ADD (array, i, mempcpy, 1, __mempcpy_sse2)) /* Support sysdeps/x86_64/multiarch/strncmp.S. */ IFUNC_IMPL (i, name, strncmp, - IFUNC_IMPL_ADD (array, i, strncmp, HAS_SSE4_2, + IFUNC_IMPL_ADD (array, i, strncmp, HAS_CPU_FEATURE (SSE4_2), __strncmp_sse42) - IFUNC_IMPL_ADD (array, i, strncmp, HAS_SSSE3, + IFUNC_IMPL_ADD (array, i, strncmp, HAS_CPU_FEATURE (SSSE3), __strncmp_ssse3) IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_sse2)) #endif diff --git a/sysdeps/x86_64/multiarch/memcmp.S b/sysdeps/x86_64/multiarch/memcmp.S index f8b4636..871a081 100644 --- a/sysdeps/x86_64/multiarch/memcmp.S +++ b/sysdeps/x86_64/multiarch/memcmp.S @@ -26,16 +26,13 @@ .text ENTRY(memcmp) .type memcmp, @gnu_indirect_function - cmpl $0, KIND_OFFSET+__cpu_features(%rip) - jne 1f - call __init_cpu_features - -1: testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + HAS_CPU_FEATURE (SSSE3) jnz 2f leaq __memcmp_sse2(%rip), %rax ret -2: testl $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip) +2: HAS_CPU_FEATURE (SSE4_1) jz 3f leaq __memcmp_sse4_1(%rip), %rax ret diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S index 4e18cd3..7e119d3 100644 --- a/sysdeps/x86_64/multiarch/memcpy.S +++ b/sysdeps/x86_64/multiarch/memcpy.S @@ -29,19 +29,17 @@ .text ENTRY(__new_memcpy) .type __new_memcpy, @gnu_indirect_function - cmpl $0, KIND_OFFSET+__cpu_features(%rip) - jne 1f - call __init_cpu_features -1: leaq __memcpy_avx_unaligned(%rip), %rax - testl $bit_AVX_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_AVX_Fast_Unaligned_Load(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq __memcpy_avx_unaligned(%rip), %rax + HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load) jz 1f ret 1: leaq __memcpy_sse2(%rip), %rax - testl $bit_Slow_BSF, __cpu_features+FEATURE_OFFSET+index_Slow_BSF(%rip) + HAS_ARCH_FEATURE (Slow_BSF) jnz 2f leaq __memcpy_sse2_unaligned(%rip), %rax ret -2: testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) +2: HAS_CPU_FEATURE (SSSE3) jz 3f leaq __memcpy_ssse3(%rip), %rax 3: ret diff --git a/sysdeps/x86_64/multiarch/memcpy_chk.S b/sysdeps/x86_64/multiarch/memcpy_chk.S index 1e756ea..81f83dd 100644 --- a/sysdeps/x86_64/multiarch/memcpy_chk.S +++ b/sysdeps/x86_64/multiarch/memcpy_chk.S @@ -29,17 +29,15 @@ .text ENTRY(__memcpy_chk) .type __memcpy_chk, @gnu_indirect_function - cmpl $0, KIND_OFFSET+__cpu_features(%rip) - jne 1f - call __init_cpu_features -1: leaq __memcpy_chk_sse2(%rip), %rax - testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq __memcpy_chk_sse2(%rip), %rax + HAS_CPU_FEATURE (SSSE3) jz 2f leaq __memcpy_chk_ssse3(%rip), %rax - testl $bit_Fast_Copy_Backward, __cpu_features+FEATURE_OFFSET+index_Fast_Copy_Backward(%rip) + HAS_ARCH_FEATURE (Fast_Copy_Backward) jz 2f leaq __memcpy_chk_ssse3_back(%rip), %rax - testl $bit_AVX_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_AVX_Fast_Unaligned_Load(%rip) + HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load) jz 2f leaq __memcpy_chk_avx_unaligned(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/multiarch/memmove.c b/sysdeps/x86_64/multiarch/memmove.c index dd153a3..bbddbc1 100644 --- a/sysdeps/x86_64/multiarch/memmove.c +++ b/sysdeps/x86_64/multiarch/memmove.c @@ -49,10 +49,10 @@ extern __typeof (__redirect_memmove) __memmove_avx_unaligned attribute_hidden; ifunc symbol properly. */ extern __typeof (__redirect_memmove) __libc_memmove; libc_ifunc (__libc_memmove, - HAS_AVX_FAST_UNALIGNED_LOAD + HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load) ? __memmove_avx_unaligned - : (HAS_SSSE3 - ? (HAS_FAST_COPY_BACKWARD + : (HAS_CPU_FEATURE (SSSE3) + ? (HAS_ARCH_FEATURE (Fast_Copy_Backward) ? __memmove_ssse3_back : __memmove_ssse3) : __memmove_sse2)); diff --git a/sysdeps/x86_64/multiarch/memmove_chk.c b/sysdeps/x86_64/multiarch/memmove_chk.c index 8b12d00..5f70e3a 100644 --- a/sysdeps/x86_64/multiarch/memmove_chk.c +++ b/sysdeps/x86_64/multiarch/memmove_chk.c @@ -30,8 +30,8 @@ extern __typeof (__memmove_chk) __memmove_chk_avx_unaligned attribute_hidden; #include "debug/memmove_chk.c" libc_ifunc (__memmove_chk, - HAS_AVX_FAST_UNALIGNED_LOAD ? __memmove_chk_avx_unaligned : - (HAS_SSSE3 - ? (HAS_FAST_COPY_BACKWARD + HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load) ? __memmove_chk_avx_unaligned : + (HAS_CPU_FEATURE (SSSE3) + ? (HAS_ARCH_FEATURE (Fast_Copy_Backward) ? __memmove_chk_ssse3_back : __memmove_chk_ssse3) : __memmove_chk_sse2)); diff --git a/sysdeps/x86_64/multiarch/mempcpy.S b/sysdeps/x86_64/multiarch/mempcpy.S index 2eaacdf..ad36840 100644 --- a/sysdeps/x86_64/multiarch/mempcpy.S +++ b/sysdeps/x86_64/multiarch/mempcpy.S @@ -27,17 +27,15 @@ #if defined SHARED && IS_IN (libc) ENTRY(__mempcpy) .type __mempcpy, @gnu_indirect_function - cmpl $0, KIND_OFFSET+__cpu_features(%rip) - jne 1f - call __init_cpu_features -1: leaq __mempcpy_sse2(%rip), %rax - testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq __mempcpy_sse2(%rip), %rax + HAS_CPU_FEATURE (SSSE3) jz 2f leaq __mempcpy_ssse3(%rip), %rax - testl $bit_Fast_Copy_Backward, __cpu_features+FEATURE_OFFSET+index_Fast_Copy_Backward(%rip) + HAS_ARCH_FEATURE (Fast_Copy_Backward) jz 2f leaq __mempcpy_ssse3_back(%rip), %rax - testl $bit_AVX_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_AVX_Fast_Unaligned_Load(%rip) + HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load) jz 2f leaq __mempcpy_avx_unaligned(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/multiarch/mempcpy_chk.S b/sysdeps/x86_64/multiarch/mempcpy_chk.S index 17b8470..0a46b56 100644 --- a/sysdeps/x86_64/multiarch/mempcpy_chk.S +++ b/sysdeps/x86_64/multiarch/mempcpy_chk.S @@ -29,17 +29,15 @@ .text ENTRY(__mempcpy_chk) .type __mempcpy_chk, @gnu_indirect_function - cmpl $0, KIND_OFFSET+__cpu_features(%rip) - jne 1f - call __init_cpu_features -1: leaq __mempcpy_chk_sse2(%rip), %rax - testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq __mempcpy_chk_sse2(%rip), %rax + HAS_CPU_FEATURE (SSSE3) jz 2f leaq __mempcpy_chk_ssse3(%rip), %rax - testl $bit_Fast_Copy_Backward, __cpu_features+FEATURE_OFFSET+index_Fast_Copy_Backward(%rip) + HAS_ARCH_FEATURE (Fast_Copy_Backward) jz 2f leaq __mempcpy_chk_ssse3_back(%rip), %rax - testl $bit_AVX_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_AVX_Fast_Unaligned_Load(%rip) + HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load) jz 2f leaq __mempcpy_chk_avx_unaligned(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/multiarch/memset.S b/sysdeps/x86_64/multiarch/memset.S index c5f1fb3..16fefa7 100644 --- a/sysdeps/x86_64/multiarch/memset.S +++ b/sysdeps/x86_64/multiarch/memset.S @@ -26,11 +26,9 @@ # if IS_IN (libc) ENTRY(memset) .type memset, @gnu_indirect_function - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: leaq __memset_sse2(%rip), %rax - testl $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq __memset_sse2(%rip), %rax + HAS_ARCH_FEATURE (AVX2_Usable) jz 2f leaq __memset_avx2(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/multiarch/memset_chk.S b/sysdeps/x86_64/multiarch/memset_chk.S index 64fed31..ef8c64f 100644 --- a/sysdeps/x86_64/multiarch/memset_chk.S +++ b/sysdeps/x86_64/multiarch/memset_chk.S @@ -25,11 +25,9 @@ # if defined SHARED && defined HAVE_AVX2_SUPPORT ENTRY(__memset_chk) .type __memset_chk, @gnu_indirect_function - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: leaq __memset_chk_sse2(%rip), %rax - testl $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq __memset_chk_sse2(%rip), %rax + HAS_ARCH_FEATURE (AVX2_Usable) jz 2f leaq __memset_chk_avx2(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/multiarch/sched_cpucount.c b/sysdeps/x86_64/multiarch/sched_cpucount.c index 72ad7b0..e9391a2 100644 --- a/sysdeps/x86_64/multiarch/sched_cpucount.c +++ b/sysdeps/x86_64/multiarch/sched_cpucount.c @@ -33,4 +33,4 @@ #undef __sched_cpucount libc_ifunc (__sched_cpucount, - HAS_POPCOUNT ? popcount_cpucount : generic_cpucount); + HAS_CPU_FEATURE (POPCOUNT) ? popcount_cpucount : generic_cpucount); diff --git a/sysdeps/x86_64/multiarch/strcat.S b/sysdeps/x86_64/multiarch/strcat.S index 44993fa..25d926c 100644 --- a/sysdeps/x86_64/multiarch/strcat.S +++ b/sysdeps/x86_64/multiarch/strcat.S @@ -47,14 +47,12 @@ .text ENTRY(STRCAT) .type STRCAT, @gnu_indirect_function - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: leaq STRCAT_SSE2_UNALIGNED(%rip), %rax - testl $bit_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_Fast_Unaligned_Load(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq STRCAT_SSE2_UNALIGNED(%rip), %rax + HAS_ARCH_FEATURE (Fast_Unaligned_Load) jnz 2f leaq STRCAT_SSE2(%rip), %rax - testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + HAS_CPU_FEATURE (SSSE3) jz 2f leaq STRCAT_SSSE3(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/multiarch/strchr.S b/sysdeps/x86_64/multiarch/strchr.S index af55fac..0c5fdd9 100644 --- a/sysdeps/x86_64/multiarch/strchr.S +++ b/sysdeps/x86_64/multiarch/strchr.S @@ -25,11 +25,9 @@ .text ENTRY(strchr) .type strchr, @gnu_indirect_function - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: leaq __strchr_sse2(%rip), %rax -2: testl $bit_Slow_BSF, __cpu_features+FEATURE_OFFSET+index_Slow_BSF(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq __strchr_sse2(%rip), %rax +2: HAS_ARCH_FEATURE (Slow_BSF) jz 3f leaq __strchr_sse2_no_bsf(%rip), %rax 3: ret diff --git a/sysdeps/x86_64/multiarch/strcmp.S b/sysdeps/x86_64/multiarch/strcmp.S index f50f26c..c180ce6 100644 --- a/sysdeps/x86_64/multiarch/strcmp.S +++ b/sysdeps/x86_64/multiarch/strcmp.S @@ -84,24 +84,20 @@ .text ENTRY(STRCMP) .type STRCMP, @gnu_indirect_function - /* Manually inlined call to __get_cpu_features. */ - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: + LOAD_RTLD_GLOBAL_RO_RDX #ifdef USE_AS_STRCMP leaq __strcmp_sse2_unaligned(%rip), %rax - testl $bit_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_Fast_Unaligned_Load(%rip) + HAS_ARCH_FEATURE (Fast_Unaligned_Load) jnz 3f #else - testl $bit_Slow_SSE4_2, __cpu_features+FEATURE_OFFSET+index_Slow_SSE4_2(%rip) + HAS_ARCH_FEATURE (Slow_SSE4_2) jnz 2f leaq STRCMP_SSE42(%rip), %rax - testl $bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip) + HAS_CPU_FEATURE (SSE4_2) jnz 3f #endif 2: leaq STRCMP_SSSE3(%rip), %rax - testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + HAS_CPU_FEATURE (SSSE3) jnz 3f leaq STRCMP_SSE2(%rip), %rax 3: ret @@ -110,23 +106,19 @@ END(STRCMP) # ifdef USE_AS_STRCASECMP_L ENTRY(__strcasecmp) .type __strcasecmp, @gnu_indirect_function - /* Manually inlined call to __get_cpu_features. */ - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: + LOAD_RTLD_GLOBAL_RO_RDX # ifdef HAVE_AVX_SUPPORT leaq __strcasecmp_avx(%rip), %rax - testl $bit_AVX_Usable, __cpu_features+FEATURE_OFFSET+index_AVX_Usable(%rip) + HAS_ARCH_FEATURE (AVX_Usable) jnz 3f # endif - testl $bit_Slow_SSE4_2, __cpu_features+FEATURE_OFFSET+index_Slow_SSE4_2(%rip) + HAS_ARCH_FEATURE (Slow_SSE4_2) jnz 2f leaq __strcasecmp_sse42(%rip), %rax - testl $bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip) + HAS_CPU_FEATURE (SSE4_2) jnz 3f 2: leaq __strcasecmp_ssse3(%rip), %rax - testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + HAS_CPU_FEATURE (SSSE3) jnz 3f leaq __strcasecmp_sse2(%rip), %rax 3: ret @@ -136,23 +128,19 @@ weak_alias (__strcasecmp, strcasecmp) # ifdef USE_AS_STRNCASECMP_L ENTRY(__strncasecmp) .type __strncasecmp, @gnu_indirect_function - /* Manually inlined call to __get_cpu_features. */ - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: + LOAD_RTLD_GLOBAL_RO_RDX # ifdef HAVE_AVX_SUPPORT leaq __strncasecmp_avx(%rip), %rax - testl $bit_AVX_Usable, __cpu_features+FEATURE_OFFSET+index_AVX_Usable(%rip) + HAS_ARCH_FEATURE (AVX_Usable) jnz 3f # endif - testl $bit_Slow_SSE4_2, __cpu_features+FEATURE_OFFSET+index_Slow_SSE4_2(%rip) + HAS_ARCH_FEATURE (Slow_SSE4_2) jnz 2f leaq __strncasecmp_sse42(%rip), %rax - testl $bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip) + HAS_CPU_FEATURE (SSE4_2) jnz 3f 2: leaq __strncasecmp_ssse3(%rip), %rax - testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + HAS_CPU_FEATURE (SSSE3) jnz 3f leaq __strncasecmp_sse2(%rip), %rax 3: ret diff --git a/sysdeps/x86_64/multiarch/strcpy.S b/sysdeps/x86_64/multiarch/strcpy.S index 9464ee8..3aae8ee 100644 --- a/sysdeps/x86_64/multiarch/strcpy.S +++ b/sysdeps/x86_64/multiarch/strcpy.S @@ -61,14 +61,12 @@ .text ENTRY(STRCPY) .type STRCPY, @gnu_indirect_function - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: leaq STRCPY_SSE2_UNALIGNED(%rip), %rax - testl $bit_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_Fast_Unaligned_Load(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq STRCPY_SSE2_UNALIGNED(%rip), %rax + HAS_ARCH_FEATURE (Fast_Unaligned_Load) jnz 2f leaq STRCPY_SSE2(%rip), %rax - testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + HAS_CPU_FEATURE (SSSE3) jz 2f leaq STRCPY_SSSE3(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/multiarch/strcspn.S b/sysdeps/x86_64/multiarch/strcspn.S index 95e882c..45c69b3 100644 --- a/sysdeps/x86_64/multiarch/strcspn.S +++ b/sysdeps/x86_64/multiarch/strcspn.S @@ -45,11 +45,9 @@ .text ENTRY(STRCSPN) .type STRCSPN, @gnu_indirect_function - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: leaq STRCSPN_SSE2(%rip), %rax - testl $bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq STRCSPN_SSE2(%rip), %rax + HAS_CPU_FEATURE (SSE4_2) jz 2f leaq STRCSPN_SSE42(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/multiarch/strspn.S b/sysdeps/x86_64/multiarch/strspn.S index b734c17..c4d3b27 100644 --- a/sysdeps/x86_64/multiarch/strspn.S +++ b/sysdeps/x86_64/multiarch/strspn.S @@ -30,11 +30,9 @@ .text ENTRY(strspn) .type strspn, @gnu_indirect_function - cmpl $0, __cpu_features+KIND_OFFSET(%rip) - jne 1f - call __init_cpu_features -1: leaq __strspn_sse2(%rip), %rax - testl $bit_SSE4_2, __cpu_features+CPUID_OFFSET+index_SSE4_2(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + leaq __strspn_sse2(%rip), %rax + HAS_CPU_FEATURE (SSE4_2) jz 2f leaq __strspn_sse42(%rip), %rax 2: ret diff --git a/sysdeps/x86_64/multiarch/strstr.c b/sysdeps/x86_64/multiarch/strstr.c index 507994b..b8827f0 100644 --- a/sysdeps/x86_64/multiarch/strstr.c +++ b/sysdeps/x86_64/multiarch/strstr.c @@ -41,7 +41,10 @@ extern __typeof (__redirect_strstr) __strstr_sse2 attribute_hidden; /* Avoid DWARF definition DIE on ifunc symbol so that GDB can handle ifunc symbol properly. */ extern __typeof (__redirect_strstr) __libc_strstr; -libc_ifunc (__libc_strstr, HAS_FAST_UNALIGNED_LOAD ? __strstr_sse2_unaligned : __strstr_sse2) +libc_ifunc (__libc_strstr, + HAS_ARCH_FEATURE (Fast_Unaligned_Load) + ? __strstr_sse2_unaligned + : __strstr_sse2) #undef strstr strong_alias (__libc_strstr, strstr) diff --git a/sysdeps/x86_64/multiarch/test-multiarch.c b/sysdeps/x86_64/multiarch/test-multiarch.c index 949d26e..e893894 100644 --- a/sysdeps/x86_64/multiarch/test-multiarch.c +++ b/sysdeps/x86_64/multiarch/test-multiarch.c @@ -75,12 +75,18 @@ do_test (int argc, char **argv) int fails; get_cpuinfo (); - fails = check_proc ("avx", HAS_AVX, "HAS_AVX"); - fails += check_proc ("fma4", HAS_FMA4, "HAS_FMA4"); - fails += check_proc ("sse4_2", HAS_SSE4_2, "HAS_SSE4_2"); - fails += check_proc ("sse4_1", HAS_SSE4_1, "HAS_SSE4_1"); - fails += check_proc ("ssse3", HAS_SSSE3, "HAS_SSSE3"); - fails += check_proc ("popcnt", HAS_POPCOUNT, "HAS_POPCOUNT"); + fails = check_proc ("avx", HAS_ARCH_FEATURE (AVX_Usable), + "HAS_ARCH_FEATURE (AVX_Usable)"); + fails += check_proc ("fma4", HAS_ARCH_FEATURE (FMA4_Usable), + "HAS_ARCH_FEATURE (FMA4_Usable)"); + fails += check_proc ("sse4_2", HAS_CPU_FEATURE (SSE4_2), + "HAS_CPU_FEATURE (SSE4_2)"); + fails += check_proc ("sse4_1", HAS_CPU_FEATURE (SSE4_1) + , "HAS_CPU_FEATURE (SSE4_1)"); + fails += check_proc ("ssse3", HAS_CPU_FEATURE (SSSE3), + "HAS_CPU_FEATURE (SSSE3)"); + fails += check_proc ("popcnt", HAS_CPU_FEATURE (POPCOUNT), + "HAS_CPU_FEATURE (POPCOUNT)"); printf ("%d differences between /proc/cpuinfo and glibc code.\n", fails); diff --git a/sysdeps/x86_64/multiarch/wcscpy.S b/sysdeps/x86_64/multiarch/wcscpy.S index ff2f5a7..c47c51c 100644 --- a/sysdeps/x86_64/multiarch/wcscpy.S +++ b/sysdeps/x86_64/multiarch/wcscpy.S @@ -27,11 +27,8 @@ .text ENTRY(wcscpy) .type wcscpy, @gnu_indirect_function - cmpl $0, KIND_OFFSET+__cpu_features(%rip) - jne 1f - call __init_cpu_features - -1: testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + HAS_CPU_FEATURE (SSSE3) jnz 2f leaq __wcscpy_sse2(%rip), %rax ret diff --git a/sysdeps/x86_64/multiarch/wmemcmp.S b/sysdeps/x86_64/multiarch/wmemcmp.S index 109e245..62215f4 100644 --- a/sysdeps/x86_64/multiarch/wmemcmp.S +++ b/sysdeps/x86_64/multiarch/wmemcmp.S @@ -26,16 +26,13 @@ .text ENTRY(wmemcmp) .type wmemcmp, @gnu_indirect_function - cmpl $0, KIND_OFFSET+__cpu_features(%rip) - jne 1f - call __init_cpu_features - -1: testl $bit_SSSE3, __cpu_features+CPUID_OFFSET+index_SSSE3(%rip) + LOAD_RTLD_GLOBAL_RO_RDX + HAS_CPU_FEATURE (SSSE3) jnz 2f leaq __wmemcmp_sse2(%rip), %rax ret -2: testl $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip) +2: HAS_CPU_FEATURE (SSE4_1) jz 3f leaq __wmemcmp_sse4_1(%rip), %rax ret