[v2] x86: Add missing Slow_SSE4_2 to ifunc-sse4_2.h

Message ID 20220624201216.3783855-1-goldstein.w.n@gmail.com
State Superseded
Series [v2] x86: Add missing Slow_SSE4_2 to ifunc-sse4_2.h

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Noah Goldstein June 24, 2022, 8:12 p.m. UTC
The functions that use this ifunc are strspn, strcspn, and strpbrk.

All of these functions use pcmpstri which can be slow on some
processors (checked by Slow_SSE4_2).
---
 sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

H.J. Lu June 24, 2022, 8:37 p.m. UTC | #1
On Fri, Jun 24, 2022 at 1:12 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> The functions that use this ifunc are strspn, strcspn, and strpbrk.
>
> All of these functions use pcmpstri which can be slow on some
> processors (checked by Slow_SSE4_2).
> ---
>  sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> index ee36525bcf..1830597862 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> @@ -27,7 +27,8 @@ IFUNC_SELECTOR (void)
>  {
>    const struct cpu_features* cpu_features = __get_cpu_features ();
>
> -  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
> +  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
> +      && !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
>      return OPTIMIZE (sse42);
>
>    return OPTIMIZE (generic);
> --
> 2.34.1
>

Slow_SSE4_2 means the SSE4.2 version is slow relative to the SSE2
assembly version.  It may still be faster than the generic C version.
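
For readers following the thread, the selector logic before and after
this patch can be modeled as a small standalone C sketch.  This is not
glibc code: struct cpu_model, enum impl, select_before_patch, and
select_after_patch are hypothetical names, and the two booleans only
stand in for the results of CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
and CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2).

```c
#include <stdbool.h>

/* Hypothetical stand-in for glibc's cpu_features; not the real struct.  */
struct cpu_model
{
  bool sse4_2_usable;  /* Models CPU_FEATURE_USABLE_P (..., SSE4_2).  */
  bool slow_sse4_2;    /* Models CPU_FEATURES_ARCH_P (..., Slow_SSE4_2).  */
};

/* Hypothetical labels for the two variants the selector can return.  */
enum impl { IMPL_GENERIC, IMPL_SSE42 };

/* Selection before the patch: only SSE4.2 usability is checked.  */
static enum impl
select_before_patch (const struct cpu_model *cpu)
{
  if (cpu->sse4_2_usable)
    return IMPL_SSE42;

  return IMPL_GENERIC;
}

/* Selection after the patch: CPUs flagged Slow_SSE4_2 fall through to
   the generic variant.  */
static enum impl
select_after_patch (const struct cpu_model *cpu)
{
  if (cpu->sse4_2_usable && !cpu->slow_sse4_2)
    return IMPL_SSE42;

  return IMPL_GENERIC;
}
```

The two functions differ only for a CPU where SSE4.2 is usable but
flagged slow: select_before_patch returns IMPL_SSE42 and
select_after_patch returns IMPL_GENERIC.  That is the case the comment
above is about, since the fallback there is the generic C version
rather than an SSE2 assembly version.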
  

Patch

diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
index ee36525bcf..1830597862 100644
--- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
+++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
@@ -27,7 +27,8 @@  IFUNC_SELECTOR (void)
 {
   const struct cpu_features* cpu_features = __get_cpu_features ();
 
-  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
+  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
+      && !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
     return OPTIMIZE (sse42);
 
   return OPTIMIZE (generic);