[v2] x86: Add missing Slow_SSE4_2 to ifunc-sse4_2.h
Checks
Context | Check | Description
dj/TryBot-apply_patch | success | Patch applied to master at the time it was sent
dj/TryBot-32bit | success | Build for i686
Commit Message
The functions that use this ifunc are strspn, strcspn, and strpbrk.
All of these functions use pcmpstri which can be slow on some
processors (checked by Slow_SSE4_2).
---
sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
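The selection rule the commit message describes can be modelled in a few lines of C. This is a minimal sketch, not glibc code. `struct cpu_model`, `enum strspn_impl`, and `select_strspn_impl` are hypothetical names. The two booleans stand in for what `CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)` and `CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2)` report in the real selector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two bits the selector reads from
   glibc's struct cpu_features (the real structure is much larger).  */
struct cpu_model
{
  bool sse4_2_usable;  /* models CPU_FEATURE_USABLE_P (..., SSE4_2) */
  bool slow_sse4_2;    /* models CPU_FEATURES_ARCH_P (..., Slow_SSE4_2) */
};

/* Hypothetical names for the two variants the resolver can return.  */
enum strspn_impl
{
  IMPL_GENERIC,  /* models OPTIMIZE (generic) */
  IMPL_SSE42     /* models OPTIMIZE (sse42) */
};

/* Mirrors the patched condition: pick the SSE4.2 variant only when
   SSE4.2 is usable AND the CPU is not flagged Slow_SSE4_2.  */
static enum strspn_impl
select_strspn_impl (const struct cpu_model *cpu)
{
  assert (cpu != NULL);
  if (cpu->sse4_2_usable && !cpu->slow_sse4_2)
    return IMPL_SSE42;
  return IMPL_GENERIC;
}
```

For example, a CPU with usable SSE4.2 but the Slow_SSE4_2 bit set now selects `IMPL_GENERIC`. That is exactly the case the patch changes, since before it the SSE4.2 variant was chosen whenever SSE4.2 was usable.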
Comments
On Fri, Jun 24, 2022 at 1:12 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> The functions that use this ifunc are strspn, strcspn, and strpbrk.
>
> All of these functions use pcmpstri which can be slow on some
> processors (checked by Slow_SSE4_2).
> ---
> sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> index ee36525bcf..1830597862 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> @@ -27,7 +27,8 @@ IFUNC_SELECTOR (void)
>  {
>    const struct cpu_features* cpu_features = __get_cpu_features ();
>
> -  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
> +  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
> +      && !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
>      return OPTIMIZE (sse42);
>
>    return OPTIMIZE (generic);
> --
> 2.34.1
>
Slow_SSE4_2 means that SSE 4.2 is slower relative to the SSE2 versions written in assembly.
It may still be faster than the generic version written in C.
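One way to read this comment is that Slow_SSE4_2 only answers whether SSE4.2 beats SSE2 assembly, so it should steer selection only when such an SSE2 assembly variant is the alternative. The sketch below is a hypothetical three-way selector illustrating that reading. Its names, and the `have_sse2_asm` parameter in particular, are assumptions and do not correspond to anything in glibc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two CPU feature bits involved.  */
struct cpu_model
{
  bool sse4_2_usable;  /* models CPU_FEATURE_USABLE_P (..., SSE4_2) */
  bool slow_sse4_2;    /* models CPU_FEATURES_ARCH_P (..., Slow_SSE4_2) */
};

/* Hypothetical set of candidate implementations.  */
enum impl_choice
{
  IMPL_GENERIC_C,  /* portable C fallback */
  IMPL_SSE2_ASM,   /* hypothetical hand-written SSE2 assembly variant */
  IMPL_SSE42       /* SSE4.2 string-compare variant */
};

/* have_sse2_asm (hypothetical): whether an SSE2 assembly variant exists.
   Slow_SSE4_2 is honoured only in that case, because the flag compares
   SSE4.2 against SSE2 assembly, not against generic C.  */
static enum impl_choice
select_impl (const struct cpu_model *cpu, bool have_sse2_asm)
{
  assert (cpu != NULL);
  if (!cpu->sse4_2_usable)
    return have_sse2_asm ? IMPL_SSE2_ASM : IMPL_GENERIC_C;
  if (cpu->slow_sse4_2 && have_sse2_asm)
    return IMPL_SSE2_ASM;
  return IMPL_SSE42;
}
```

With `have_sse2_asm` false, as when the only fallback is the generic C code, a CPU flagged Slow_SSE4_2 still gets the SSE4.2 variant under this reading.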
Patch
@@ -27,7 +27,8 @@ IFUNC_SELECTOR (void)
 {
   const struct cpu_features* cpu_features = __get_cpu_features ();
 
-  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
+  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
+      && !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
     return OPTIMIZE (sse42);
 
   return OPTIMIZE (generic);