[v1] ppc64le: Add optimized __memcmpeq for POWER10

Message ID 58f216af-d292-415b-a836-08c656f4b727@linux.ibm.com (mailing list archive)
State Superseded
Series [v1] ppc64le: Add optimized __memcmpeq for POWER10 |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch fail Patch failed to apply to master at the time it was sent
redhat-pt-bot/TryBot-32bit fail Patch series failed to apply

Commit Message

Sachin Monga May 25, 2026, 8:09 a.m. UTC
  Hi

This patch adds an optimized __memcmpeq implementation for POWER10.

 >8>8>8>8>8>8>8>8>8>8

 From 7c0eb2fbd5a850dfabfdb77af6bda4b5bc8ada98 Mon Sep 17 00:00:00 2001
From: Sachin Monga <smonga@linux.ibm.com>
Date: Mon, 25 May 2026 00:58:04 -0500
Subject: [PATCH v1] ppc64le: Add optimized __memcmpeq for POWER10

__memcmpeq (added in glibc 2.35) was previously an alias to memcmp on
POWER10 via strong_alias. However, in the multiarch IFUNC path, this
caused __memcmpeq to resolve to the generic C memcmp.c implementation
rather than the optimized POWER10 memcmp.S, leaving a significant
performance gap.

Unlike memcmp, __memcmpeq only needs to return zero or nonzero with
no requirement on the sign or magnitude for unequal inputs, allowing
a simpler and faster implementation.
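
For illustration only, the relaxed contract can be modelled in portable C. This is a sketch, not code from the patch: memcmpeq_model and memcmpeq_model_selfcheck are hypothetical names, and the byte loop stands in for the vector code. The point is that the result only says "equal" or "not equal", so the routine never has to locate the first differing byte or order it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference model of the __memcmpeq contract: 0 when the
   buffers match, some nonzero value otherwise.  No ordering is implied.  */
int
memcmpeq_model (const void *s1, const void *s2, size_t n)
{
  const unsigned char *a = s1;
  const unsigned char *b = s2;
  unsigned char diff = 0;

  /* Fold every byte difference into one accumulator; there is no need
     to remember where the first mismatch was.  */
  for (size_t i = 0; i < n; i++)
    diff |= (unsigned char) (a[i] ^ b[i]);

  return diff != 0;
}

/* Illustrative self-check contrasting the model with memcmp.  */
void
memcmpeq_model_selfcheck (void)
{
  /* memcmp must report the direction of the difference ...  */
  assert (memcmp ("abc", "abd", 3) < 0);
  assert (memcmp ("abd", "abc", 3) > 0);
  /* ... while the model only reports that a difference exists.  */
  assert (memcmpeq_model ("abc", "abd", 3) != 0);
  assert (memcmpeq_model ("abd", "abc", 3) != 0);
  assert (memcmpeq_model ("abc", "abc", 3) == 0);
}
```

Any memcmp implementation also satisfies this contract, which is why the older variants can be reused unchanged.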

Performance on POWER10 :

   1) __memcmpeq (generic) -> __memcmpeq_power10
      The primary motivation - __memcmpeq was resolving to generic C
      in the multiarch path.

   - Small data (< 8B to < 512B) : ~52% - 82% improvement.
   - Bulk  (< 16MB to < 256MB)   : ~25% - 32% improvement.
   - Large (1GB)                 : ~33% improvement.

   2) memcmp_power10 (optimized .S) -> __memcmpeq_power10:
      Comparing dedicated __memcmpeq against the optimized memcmp
      it previously aliased to.

   - Small data (< 8B to < 256B) : No improvement observed.
     Real-world workloads predominantly operate on larger buffers.
   - >= 512B                     : ~9% improvement.
   - 16MB - 128MB                : ~25% - 32% improvement.
   - 256MB                       : ~3% improvement.
   - Large (1GB)                 : On par.
---
This patch is regression tested.

  sysdeps/powerpc/powerpc64/le/power10/memcmp.S |   2 -
  .../powerpc/powerpc64/le/power10/memcmpeq.S   | 156 ++++++++++++++++++
  sysdeps/powerpc/powerpc64/multiarch/Makefile  |   2 +-
  .../powerpc64/multiarch/ifunc-impl-list.c     |  20 +++
  .../powerpc64/multiarch/memcmp-ppc64.c        |   7 +-
  .../powerpc64/multiarch/memcmpeq-power10.S    |  28 ++++
  .../powerpc/powerpc64/multiarch/memcmpeq.c    |  57 +++++++
  7 files changed, 266 insertions(+), 6 deletions(-)
  create mode 100644 sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S
  create mode 100644 sysdeps/powerpc/powerpc64/multiarch/memcmpeq-power10.S
  create mode 100644 sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c


Comments

Adhemerval Zanella Netto May 25, 2026, 12:42 p.m. UTC | #1
On 25/05/26 05:09, Sachin Monga wrote:
> Hi
> 
> This patch adds an optimized __memcmpeq implementation for POWER10.
> 
>>8>8>8>8>8>8>8>8>8>8
> 
> From 7c0eb2fbd5a850dfabfdb77af6bda4b5bc8ada98 Mon Sep 17 00:00:00 2001
> From: Sachin Monga <smonga@linux.ibm.com>
> Date: Mon, 25 May 2026 00:58:04 -0500
> Subject: [PATCH v1] ppc64le: Add optimized __memcmpeq for POWER10
> 
> __memcmpeq (added in glibc 2.35) was previously an alias to memcmp on
> POWER10 via strong_alias. However, in the multiarch IFUNC path, this
> caused __memcmpeq to resolve to the generic C memcmp.c implementation
> rather than the optimized POWER10 memcmp.S, leaving a significant
> performance gap.

Is it really worth duplicating most of the memcmp implementation for this
specific optimization? A simpler solution would be to add an ifunc variant
that returns the memcmp variants already in place.

> 
> Unlike memcmp, __memcmpeq only needs to return zero or nonzero with
> no requirement on the sign or magnitude for unequal inputs, allowing
> a simpler and faster implementation.
> 
> Performance on POWER10 :
> 
>   1) __memcmpeq (generic) -> __memcmpeq_power10
>      The primary motivation - __memcmpeq was resolving to generic C
>      in the multiarch path.
> 
>   - Small data (< 8B to < 512B) : ~52% - 82% improvement.
>   - Bulk  (< 16MB to < 256MB)   : ~25% - 32% improvement.
>   - Large (1GB)                 : ~33% improvement.
> 
>   2) memcmp_power10 (optimized .S) -> __memcmpeq_power10:
>      Comparing dedicated __memcmpeq against the optimized memcmp
>      it previously aliased to.
> 
>   - Small data (< 8B to < 256B) : No improvement observed.
>     Real-world workloads predominantly operate on larger buffers.
>   - >= 512B                     : ~9% improvement.
>   - 16MB - 128MB                : ~25% - 32% improvement.
>   - 256MB                       : ~3% improvement.
>   - Large (1GB)                 : On par.
> ---
> This patch is regression tested.
> 
>  sysdeps/powerpc/powerpc64/le/power10/memcmp.S |   2 -
>  .../powerpc/powerpc64/le/power10/memcmpeq.S   | 156 ++++++++++++++++++
>  sysdeps/powerpc/powerpc64/multiarch/Makefile  |   2 +-
>  .../powerpc64/multiarch/ifunc-impl-list.c     |  20 +++
>  .../powerpc64/multiarch/memcmp-ppc64.c        |   7 +-
>  .../powerpc64/multiarch/memcmpeq-power10.S    |  28 ++++
>  .../powerpc/powerpc64/multiarch/memcmpeq.c    |  57 +++++++
>  7 files changed, 266 insertions(+), 6 deletions(-)
>  create mode 100644 sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S
>  create mode 100644 sysdeps/powerpc/powerpc64/multiarch/memcmpeq-power10.S
>  create mode 100644 sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c
> 
> diff --git a/sysdeps/powerpc/powerpc64/le/power10/memcmp.S b/sysdeps/powerpc/powerpc64/le/power10/memcmp.S
> index 46a74dea4d..8915676e1b 100644
> --- a/sysdeps/powerpc/powerpc64/le/power10/memcmp.S
> +++ b/sysdeps/powerpc/powerpc64/le/power10/memcmp.S
> @@ -161,5 +161,3 @@ L(tail8):
>  END (MEMCMP)
>  libc_hidden_builtin_def (memcmp)
>  weak_alias (memcmp, bcmp)
> -strong_alias (memcmp, __memcmpeq)
> -libc_hidden_def (__memcmpeq)
> diff --git a/sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S b/sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S
> new file mode 100644
> index 0000000000..4a1a4ad3ce
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S
> @@ -0,0 +1,156 @@
> +/* Optimized __memcmpeq implementation for POWER10.
> +   Copyright (C) 2026 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +
> +#define COMPARE_32(vr1,vr2,offset)\
> +    lxvp      32+vr1,offset(r3);    \
> +    lxvp      32+vr2,offset(r4);    \
> +    vxor      v4,vr1,vr2;        \
> +    vxor      v5,vr1+1,vr2+1;    \
> +    vor       v19,v19,v4;        \
> +    vor       v19,v19,v5;
> +
> +/* int [r3] __memcmpeq (const char *s1 [r3], const char *s2 [r4],
> +                    size_t size [r5])
> +   Returns 0 if equal, 1 if not equal (no lexicographic comparison) */
> +
> +#ifndef MEMCMPEQ
> +# define MEMCMPEQ __memcmpeq
> +#endif
> +    .machine  power10
> +ENTRY_TOCLESS (MEMCMPEQ, 4)
> +    CALL_MCOUNT 3
> +
> +    /* Fast path: size == 0 */
> +    cmpdi    cr7,r5,0
> +    beq    cr7,L(finish)
> +
> +    /* Fast path: same pointer */
> +    cmpd    cr7,r3,r4
> +    beq    cr7,L(finish)
> +
> +    cmpldi    cr6,r5,64
> +    bgt    cr6,L(loop_head)
> +
> +/* Compare 64 bytes. This section is used for lengths <= 64 and for the last
> +   bytes for larger lengths.  */
> +L(last_compare):
> +    li    r8,16
> +
> +    sldi    r9,r5,56
> +    sldi    r8,r8,56
> +    addi    r6,r3,16
> +    addi    r7,r4,16
> +
> +    /* Align up to 16 bytes.  */
> +    lxvl    32+v0,r3,r9
> +    lxvl    32+v2,r4,r9
> +
> +    /* Branch to not_equal if any bytes differ (CR6 set by vcmpneb.).
> +       Branch to finish if no bytes remain (CR0.LT set when r9 went
> +       negative after sub.).  */
> +    sub.      r9,r9,r8
> +    vcmpneb.  v4,v0,v2
> +    bne      cr6,L(not_equal)
> +    bt      4*cr0+lt,L(finish)
> +
> +    addi      r3,r3,32
> +    addi      r4,r4,32
> +
> +    lxvl      32+v1,r6,r9
> +    lxvl      32+v3,r7,r9
> +    sub.      r9,r9,r8
> +    vcmpneb.  v5,v1,v3
> +    bne      cr6,L(not_equal)
> +    bt      4*cr0+lt,L(finish)
> +
> +    addi      r6,r3,16
> +    addi      r7,r4,16
> +
> +    lxvl      32+v6,r3,r9
> +    lxvl      32+v8,r4,r9
> +    sub.      r9,r9,r8
> +    vcmpneb.  v4,v6,v8
> +    bne      cr6,L(not_equal)
> +    bt      4*cr0+lt,L(finish)
> +
> +    lxvl      32+v7,r6,r9
> +    lxvl      32+v9,r7,r9
> +    vcmpneb.  v5,v7,v9
> +    bne      cr6,L(not_equal)
> +
> +L(finish):
> +    /* The contents are equal.  */
> +    li    r3,0
> +    blr
> +
> +L(not_equal):
> +    li    r3,1
> +    blr
> +
> +L(loop_head):
> +    /* Calculate how many loops to run.  */
> +    srdi.    r8,r5,7
> +    beq    L(loop_tail)
> +    mtctr    r8
> +
> +    vxor    v18,v18,v18
> +    vxor    v19,v19,v19
> +    .p2align 5
> +L(loop_128):
> +    COMPARE_32(v0,v2,0)
> +    COMPARE_32(v6,v8,32)
> +    COMPARE_32(v10,v12,64)
> +    COMPARE_32(v14,v16,96)
> +
> +    vcmpneb. v17,v19,v18
> +    bne    cr6,L(not_equal)
> +
> +    addi    r3,r3,128
> +    addi    r4,r4,128
> +    bdnz    L(loop_128)
> +
> +    /* Account loop comparisons.  */
> +    clrldi.  r5,r5,57
> +    beq     L(finish)
> +
> +/* Compares 64 bytes if length is still bigger than 64 bytes.  */
> +    .p2align 5
> +L(loop_tail):
> +    /* Initialize accumulator for tail */
> +    vxor    v18,v18,v18
> +    vxor    v19,v19,v19
> +
> +    cmpldi    r5,64
> +    ble    L(last_compare)
> +
> +    COMPARE_32(v0,v2,0)
> +    COMPARE_32(v6,v8,32)
> +
> +    vcmpneb. v17,v19,v18
> +    bne    cr6,L(not_equal)
> +
> +    addi    r3,r3,64
> +    addi    r4,r4,64
> +    subi    r5,r5,64
> +    b    L(last_compare)
> +
> +END (MEMCMPEQ)
> +
> +libc_hidden_def (MEMCMPEQ)
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
> index c9178223a8..164aac9dca 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
> @@ -30,7 +30,7 @@ sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \
>             strncase-power8
> 
>  ifneq (,$(filter %le,$(config-machine)))
> -sysdep_routines += memcmp-power10 memcpy-power10 memmove-power10 memset-power10 \
> +sysdep_routines += memcmp-power10 memcpy-power10 memmove-power10 memset-power10 memcmpeq-power10 \
>             rawmemchr-power9 rawmemchr-power10 \
>             strcmp-power9 strcmp-power10 strncmp-power9 strncmp-power10 \
>             strcpy-power9 strcat-power10 stpcpy-power9 \
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> index 1458b4575d..b346381a35 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> @@ -218,6 +218,26 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>                    __memcmp_power4)
>            IFUNC_IMPL_ADD (array, i, memcmp, 1, __memcmp_ppc))
> 
> +  /* Support sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c.
> +   * Pre-POWER10 variants reuse __memcmp_* since memcmp's return value
> +   * satisfies __memcmpeq's zero/non-zero contract. */
> +
> +  IFUNC_IMPL (i, name, __memcmpeq,
> +#ifdef __LITTLE_ENDIAN__
> +          IFUNC_IMPL_ADD (array, i, __memcmpeq,
> +                  hwcap2 & PPC_FEATURE2_ARCH_3_1
> +                  && hwcap & PPC_FEATURE_HAS_VSX,
> +                  __memcmpeq_power10)
> +#endif
> +          IFUNC_IMPL_ADD (array, i, __memcmpeq, hwcap2 & PPC_FEATURE2_ARCH_2_07
> +                  && hwcap & PPC_FEATURE_HAS_ALTIVEC,
> +                  __memcmp_power8)
> +          IFUNC_IMPL_ADD (array, i, __memcmpeq, hwcap & PPC_FEATURE_ARCH_2_06,
> +                  __memcmp_power7)
> +          IFUNC_IMPL_ADD (array, i, __memcmpeq, hwcap & PPC_FEATURE_POWER4,
> +                  __memcmp_power4)
> +          IFUNC_IMPL_ADD (array, i, __memcmpeq, 1, __memcmp_ppc))
> +
>    /* Support sysdeps/powerpc/powerpc64/multiarch/mempcpy.c.  */
>    IFUNC_IMPL (i, name, mempcpy,
>            IFUNC_IMPL_ADD (array, i, mempcpy,
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c
> index ef69cfe8da..f885d3fb55 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c
> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c
> @@ -22,14 +22,15 @@
>  #define weak_alias(name, aliasname) \
>    extern __typeof (__memcmp_ppc) aliasname \
>      __attribute__ ((weak, alias ("__memcmp_ppc")));
> +/* __memcmpeq is now owned by the memcmpeq IFUNC selector (memcmpeq.os) */
>  #undef strong_alias
> -#define strong_alias(name, aliasname) \
> -  extern __typeof (__memcmp_ppc) aliasname \
> -    __attribute__ ((alias ("__memcmp_ppc")));
> +#define strong_alias(name, aliasname)
>  #if IS_IN (libc) && defined(SHARED)
>  # undef libc_hidden_builtin_def
>  # define libc_hidden_builtin_def(name) \
>    __hidden_ver1(__memcmp_ppc, __GI_memcmp, __memcmp_ppc);
> +# undef libc_hidden_def
> +# define libc_hidden_def(name)
>  #endif
> 
>  extern __typeof (memcmp) __memcmp_ppc attribute_hidden;
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcmpeq-power10.S b/sysdeps/powerpc/powerpc64/multiarch/memcmpeq-power10.S
> new file mode 100644
> index 0000000000..ee4b433712
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcmpeq-power10.S
> @@ -0,0 +1,28 @@
> +/* Wrapper for POWER10 __memcmpeq implementation.
> +   Copyright (C) 2026 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define MEMCMPEQ __memcmpeq_power10
> +
> +#undef libc_hidden_builtin_def
> +#define libc_hidden_builtin_def(name)
> +#undef libc_hidden_def
> +#define libc_hidden_def(name)
> +#undef weak_alias
> +#define weak_alias(name, alias)
> +
> +#include <sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S>
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c b/sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c
> new file mode 100644
> index 0000000000..3f1266a2e8
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c
> @@ -0,0 +1,57 @@
> +/* Multiple versions of memcmpeq. PowerPC64 version.
> +   Copyright (C) 2026 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Define multiple versions only for definition in libc.  */
> +#if IS_IN (libc)
> +# define __memcmpeq __redirect___memcmpeq
> +# include <string.h>
> +# include <shlib-compat.h>
> +# include "init-arch.h"
> +
> +/* Reuse the existing optimized memcmp variants for pre-POWER10 hardware
> + * as memcmp is a superset */
> +extern __typeof (memcmp) __memcmp_ppc attribute_hidden;
> +extern __typeof (memcmp) __memcmp_power4 attribute_hidden;
> +extern __typeof (memcmp) __memcmp_power7 attribute_hidden;
> +extern __typeof (memcmp) __memcmp_power8 attribute_hidden;
> +extern __typeof (__memcmpeq) __memcmpeq_power10 attribute_hidden;
> +# undef __memcmpeq
> +
> +/* Avoid DWARF definition DIE on ifunc symbol so that GDB can handle
> +   ifunc symbol properly.  */
> +libc_ifunc_redirected (__redirect___memcmpeq, __memcmpeq,
> +#ifdef __LITTLE_ENDIAN__
> +                (hwcap2 & PPC_FEATURE2_ARCH_3_1
> +                 && hwcap & PPC_FEATURE_HAS_VSX)
> +                 ? __memcmpeq_power10 :
> +#endif
> +               (hwcap2 & PPC_FEATURE2_ARCH_2_07
> +            && hwcap & PPC_FEATURE_HAS_ALTIVEC)
> +               ? __memcmp_power8 :
> +               (hwcap & PPC_FEATURE_ARCH_2_06)
> +               ? __memcmp_power7
> +               : (hwcap & PPC_FEATURE_POWER4)
> +             ? __memcmp_power4
> +             : __memcmp_ppc);
> +# ifdef SHARED
> +__hidden_ver1 (__memcmpeq, __GI___memcmpeq, __redirect___memcmpeq)
> +    __attribute__ ((visibility ("hidden"))) __attribute_copy__ (__memcmpeq);
> +# endif
> +#else
> +#include <string/memcmp.c>
> +#endif
  
Sachin Monga May 26, 2026, 9:34 a.m. UTC | #2
Thanks for the review, Adhemerval.

> Is it really worth duplicating most of the memcmp implementation for this
> specific optimization? A simpler solution would be to add an ifunc variant
> that returns the memcmp variants already in place.
One clarification on scope: the IFUNC selector aliases to the existing 
|__memcmp_*| variants for everything except POWER10.

Power8, power7, power4, and ppc reuse the in-place memcmp 
implementations exactly as you're describing. The only dedicated file in 
the patch is |memcmpeq-power10.S|.
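
As a rough illustration of that selection order, the cascade in memcmpeq.c can be sketched in plain C. This is a sketch under stated assumptions: the FEAT_* bit values are invented for the example and do not match the real PPC_FEATURE_* constants, select_memcmpeq and struct variant are hypothetical names, every entry uses the host memcmp as a stand-in, and the little-endian guard around the POWER10 entry is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative feature bits.  The values are made up for this sketch and
   are not the kernel's PPC_FEATURE_* / PPC_FEATURE2_* definitions.  */
#define FEAT_POWER4       (1UL << 0)
#define FEAT_ARCH_2_06    (1UL << 1)
#define FEAT_HAS_ALTIVEC  (1UL << 2)
#define FEAT_HAS_VSX      (1UL << 3)
#define FEAT2_ARCH_2_07   (1UL << 0)
#define FEAT2_ARCH_3_1    (1UL << 1)

typedef int (*memcmpeq_fn) (const void *, const void *, size_t);

struct variant
{
  const char *name;   /* Symbol the real selector would resolve to.  */
  memcmpeq_fn fn;     /* Stand-in implementation: host memcmp.  */
};

static const struct variant variants[] =
{
  { "__memcmpeq_power10", memcmp },   /* Only dedicated implementation.  */
  { "__memcmp_power8", memcmp },      /* Reused memcmp variants below.  */
  { "__memcmp_power7", memcmp },
  { "__memcmp_power4", memcmp },
  { "__memcmp_ppc", memcmp },
};

/* Pick the most capable variant first and fall through to the generic
   one, mirroring the order of the conditional chain in the patch.  */
const struct variant *
select_memcmpeq (unsigned long hwcap, unsigned long hwcap2)
{
  if ((hwcap2 & FEAT2_ARCH_3_1) && (hwcap & FEAT_HAS_VSX))
    return &variants[0];
  if ((hwcap2 & FEAT2_ARCH_2_07) && (hwcap & FEAT_HAS_ALTIVEC))
    return &variants[1];
  if (hwcap & FEAT_ARCH_2_06)
    return &variants[2];
  if (hwcap & FEAT_POWER4)
    return &variants[3];
  return &variants[4];
}
```

For example, select_memcmpeq (FEAT_HAS_VSX, FEAT2_ARCH_3_1)->name yields "__memcmpeq_power10", while select_memcmpeq (0, 0)->name falls back to "__memcmp_ppc".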

POWER10 has its own because of the numbers in section 2 of the commit 
message — Dedicated |__memcmpeq_power10| vs the same selector aliased to 
|__memcmp_power10|:

  ≥ 512B        : ~9%
  16MB – 128MB  : ~25% – 32%
  256MB         : ~3%
  1GB           : on par

16MB–128MB is the customer workload range — that's the band that 
motivates the dedicated implementation.

Precedent: x86_64 takes the same approach — 
|sysdeps/x86_64/multiarch/memcmpeq-{sse2,avx2,evex}.S| are dedicated 
rather than aliased to |__memcmp_*|.

Regards:
Sachin.
  
Adhemerval Zanella Netto May 26, 2026, 12:33 p.m. UTC | #3
On 26/05/26 06:34, Sachin Monga wrote:
> Thanks for the review Adhemerval.
> 
>> Is it really worth duplicating most of the memcmp implementation for this
>> specific optimization? A simpler solution would be to add an ifunc variant
>> that returns the memcmp variants already in place.
> One clarification on scope: the IFUNC selector aliases to the existing |__memcmp_*| variants for everything except POWER10. 
> 
> Power8, power7, power4, and ppc reuse the in-place memcmp implementations exactly as you're describing. The only dedicated file in the patch is |memcmpeq-power10.S|.
> 
> POWER10 has its own because of the numbers in section 2 of the commit message — Dedicated |__memcmpeq_power10| vs the same selector aliased to |__memcmp_power10|:
> 
>   ≥ 512B        : ~9%
>   16MB – 128MB  : ~25% – 32%
>   256MB         : ~3%
>   1GB           : on par
> 
> 16MB–128MB is the customer workload range — that's the band that motivates the dedicated implementation.
> 
> Precedent: x86_64 takes the same approach — |sysdeps/x86_64/multiarch/memcmpeq-{sse2,avx2,evex}.S| are dedicated rather than aliased to |__memcmp_*|.
Right, I was not expecting that the COMPARE_32 change would yield that much
difference.
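
For readers following along, the COMPARE_32 change amounts to XOR-ing each pair of loaded vectors, OR-ing the result into one accumulator (v19 in the patch), and testing that accumulator once per 128-byte iteration. A portable C sketch of the same idea follows. It is only an approximation: it uses 8-byte lanes instead of VSX register pairs, memcmpeq_sketch and block_differs are hypothetical names, and it makes no claim about the measured speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 128   /* Bytes handled per main-loop iteration.  */

/* XOR each 8-byte lane of the two blocks and OR the results into a
   single accumulator.  One test at the end replaces a compare-and-branch
   per lane.  */
static int
block_differs (const unsigned char *a, const unsigned char *b)
{
  uint64_t acc = 0;

  for (size_t off = 0; off < BLOCK; off += sizeof (uint64_t))
    {
      uint64_t x, y;
      memcpy (&x, a + off, sizeof x);   /* Safe unaligned load.  */
      memcpy (&y, b + off, sizeof y);
      acc |= x ^ y;
    }
  return acc != 0;
}

/* Returns 0 if the buffers are equal and 1 otherwise.  */
int
memcmpeq_sketch (const void *s1, const void *s2, size_t n)
{
  const unsigned char *a = s1;
  const unsigned char *b = s2;
  unsigned char tail = 0;

  while (n >= BLOCK)
    {
      if (block_differs (a, b))
        return 1;
      a += BLOCK;
      b += BLOCK;
      n -= BLOCK;
    }

  /* Remaining bytes (fewer than BLOCK): same accumulate-then-test idea.  */
  for (size_t i = 0; i < n; i++)
    tail |= (unsigned char) (a[i] ^ b[i]);

  return tail != 0;
}
```

A memcmp-style loop cannot defer the test this way as cheaply, because on a mismatch it still has to find the first differing byte to produce an ordered result.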
  

Patch

diff --git a/sysdeps/powerpc/powerpc64/le/power10/memcmp.S b/sysdeps/powerpc/powerpc64/le/power10/memcmp.S
index 46a74dea4d..8915676e1b 100644
--- a/sysdeps/powerpc/powerpc64/le/power10/memcmp.S
+++ b/sysdeps/powerpc/powerpc64/le/power10/memcmp.S
@@ -161,5 +161,3 @@  L(tail8):
  END (MEMCMP)
  libc_hidden_builtin_def (memcmp)
  weak_alias (memcmp, bcmp)
-strong_alias (memcmp, __memcmpeq)
-libc_hidden_def (__memcmpeq)
diff --git a/sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S b/sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S
new file mode 100644
index 0000000000..4a1a4ad3ce
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S
@@ -0,0 +1,156 @@ 
+/* Optimized __memcmpeq implementation for POWER10.
+   Copyright (C) 2026 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+#define COMPARE_32(vr1,vr2,offset)\
+    lxvp      32+vr1,offset(r3);    \
+    lxvp      32+vr2,offset(r4);    \
+    vxor      v4,vr1,vr2;        \
+    vxor      v5,vr1+1,vr2+1;    \
+    vor       v19,v19,v4;        \
+    vor       v19,v19,v5;
+
+/* int [r3] __memcmpeq (const char *s1 [r3], const char *s2 [r4],
+                    size_t size [r5])
+   Returns 0 if equal, 1 if not equal (no lexicographic comparison) */
+
+#ifndef MEMCMPEQ
+# define MEMCMPEQ __memcmpeq
+#endif
+    .machine  power10
+ENTRY_TOCLESS (MEMCMPEQ, 4)
+    CALL_MCOUNT 3
+
+    /* Fast path: size == 0 */
+    cmpdi    cr7,r5,0
+    beq    cr7,L(finish)
+
+    /* Fast path: same pointer */
+    cmpd    cr7,r3,r4
+    beq    cr7,L(finish)
+
+    cmpldi    cr6,r5,64
+    bgt    cr6,L(loop_head)
+
+/* Compare 64 bytes. This section is used for lengths <= 64 and for the last
+   bytes for larger lengths.  */
+L(last_compare):
+    li    r8,16
+
+    sldi    r9,r5,56
+    sldi    r8,r8,56
+    addi    r6,r3,16
+    addi    r7,r4,16
+
+    /* Align up to 16 bytes.  */
+    lxvl    32+v0,r3,r9
+    lxvl    32+v2,r4,r9
+
+    /* Branch to not_equal if any bytes differ (CR6 set by vcmpneb.).
+       Branch to finish if no bytes remain (CR0.LT set when r9 went
+       negative after sub.).  */
+    sub.      r9,r9,r8
+    vcmpneb.  v4,v0,v2
+    bne      cr6,L(not_equal)
+    bt      4*cr0+lt,L(finish)
+
+    addi      r3,r3,32
+    addi      r4,r4,32
+
+    lxvl      32+v1,r6,r9
+    lxvl      32+v3,r7,r9
+    sub.      r9,r9,r8
+    vcmpneb.  v5,v1,v3
+    bne      cr6,L(not_equal)
+    bt      4*cr0+lt,L(finish)
+
+    addi      r6,r3,16
+    addi      r7,r4,16
+
+    lxvl      32+v6,r3,r9
+    lxvl      32+v8,r4,r9
+    sub.      r9,r9,r8
+    vcmpneb.  v4,v6,v8
+    bne      cr6,L(not_equal)
+    bt      4*cr0+lt,L(finish)
+
+    lxvl      32+v7,r6,r9
+    lxvl      32+v9,r7,r9
+    vcmpneb.  v5,v7,v9
+    bne      cr6,L(not_equal)
+
+L(finish):
+    /* The contents are equal.  */
+    li    r3,0
+    blr
+
+L(not_equal):
+    li    r3,1
+    blr
+
+L(loop_head):
+    /* Calculate how many loops to run.  */
+    srdi.    r8,r5,7
+    beq    L(loop_tail)
+    mtctr    r8
+
+    vxor    v18,v18,v18
+    vxor    v19,v19,v19
+    .p2align 5
+L(loop_128):
+    COMPARE_32(v0,v2,0)
+    COMPARE_32(v6,v8,32)
+    COMPARE_32(v10,v12,64)
+    COMPARE_32(v14,v16,96)
+
+    vcmpneb. v17,v19,v18
+    bne    cr6,L(not_equal)
+
+    addi    r3,r3,128
+    addi    r4,r4,128
+    bdnz    L(loop_128)
+
+    /* Account loop comparisons.  */
+    clrldi.  r5,r5,57
+    beq     L(finish)
+
+/* Compares 64 bytes if length is still bigger than 64 bytes.  */
+    .p2align 5
+L(loop_tail):
+    /* Initialize accumulator for tail */
+    vxor    v18,v18,v18
+    vxor    v19,v19,v19
+
+    cmpldi    r5,64
+    ble    L(last_compare)
+
+    COMPARE_32(v0,v2,0)
+    COMPARE_32(v6,v8,32)
+
+    vcmpneb. v17,v19,v18
+    bne    cr6,L(not_equal)
+
+    addi    r3,r3,64
+    addi    r4,r4,64
+    subi    r5,r5,64
+    b    L(last_compare)
+
+END (MEMCMPEQ)
+
+libc_hidden_def (MEMCMPEQ)
diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
index c9178223a8..164aac9dca 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
@@ -30,7 +30,7 @@  sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \
             strncase-power8

  ifneq (,$(filter %le,$(config-machine)))
-sysdep_routines += memcmp-power10 memcpy-power10 memmove-power10 memset-power10 \
+sysdep_routines += memcmp-power10 memcpy-power10 memmove-power10 memset-power10 memcmpeq-power10 \
             rawmemchr-power9 rawmemchr-power10 \
             strcmp-power9 strcmp-power10 strncmp-power9 strncmp-power10 \
             strcpy-power9 strcat-power10 stpcpy-power9 \
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 1458b4575d..b346381a35 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -218,6 +218,26 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                    __memcmp_power4)
            IFUNC_IMPL_ADD (array, i, memcmp, 1, __memcmp_ppc))

+  /* Support sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c.
+   * Pre-POWER10 variants reuse __memcmp_* since memcmp's return value
+   * satisfies __memcmpeq's zero/non-zero contract. */
+
+  IFUNC_IMPL (i, name, __memcmpeq,
+#ifdef __LITTLE_ENDIAN__
+          IFUNC_IMPL_ADD (array, i, __memcmpeq,
+                  hwcap2 & PPC_FEATURE2_ARCH_3_1
+                  && hwcap & PPC_FEATURE_HAS_VSX,
+                  __memcmpeq_power10)
+#endif
+          IFUNC_IMPL_ADD (array, i, __memcmpeq, hwcap2 & PPC_FEATURE2_ARCH_2_07
+                  && hwcap & PPC_FEATURE_HAS_ALTIVEC,
+                  __memcmp_power8)
+          IFUNC_IMPL_ADD (array, i, __memcmpeq, hwcap & PPC_FEATURE_ARCH_2_06,
+                  __memcmp_power7)
+          IFUNC_IMPL_ADD (array, i, __memcmpeq, hwcap & PPC_FEATURE_POWER4,
+                  __memcmp_power4)
+          IFUNC_IMPL_ADD (array, i, __memcmpeq, 1, __memcmp_ppc))
+
    /* Support sysdeps/powerpc/powerpc64/multiarch/mempcpy.c.  */
    IFUNC_IMPL (i, name, mempcpy,
            IFUNC_IMPL_ADD (array, i, mempcpy,
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c
index ef69cfe8da..f885d3fb55 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c
@@ -22,14 +22,15 @@ 
  #define weak_alias(name, aliasname) \
    extern __typeof (__memcmp_ppc) aliasname \
      __attribute__ ((weak, alias ("__memcmp_ppc")));
+/* __memcmpeq is now owned by the memcmpeq IFUNC selector (memcmpeq.os) */
  #undef strong_alias
-#define strong_alias(name, aliasname) \
-  extern __typeof (__memcmp_ppc) aliasname \
-    __attribute__ ((alias ("__memcmp_ppc")));
+#define strong_alias(name, aliasname)
  #if IS_IN (libc) && defined(SHARED)
  # undef libc_hidden_builtin_def
  # define libc_hidden_builtin_def(name) \
    __hidden_ver1(__memcmp_ppc, __GI_memcmp, __memcmp_ppc);
+# undef libc_hidden_def
+# define libc_hidden_def(name)
  #endif

  extern __typeof (memcmp) __memcmp_ppc attribute_hidden;
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcmpeq-power10.S b/sysdeps/powerpc/powerpc64/multiarch/memcmpeq-power10.S
new file mode 100644
index 0000000000..ee4b433712
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/memcmpeq-power10.S
@@ -0,0 +1,28 @@ 
+/* Wrapper for POWER10 __memcmpeq implementation.
+   Copyright (C) 2026 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define MEMCMPEQ __memcmpeq_power10
+
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+#undef libc_hidden_def
+#define libc_hidden_def(name)
+#undef weak_alias
+#define weak_alias(name, alias)
+
+#include <sysdeps/powerpc/powerpc64/le/power10/memcmpeq.S>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c b/sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c
new file mode 100644
index 0000000000..3f1266a2e8
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/memcmpeq.c
@@ -0,0 +1,57 @@ 
+/* Multiple versions of memcmpeq. PowerPC64 version.
+   Copyright (C) 2026 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for definition in libc.  */
+#if IS_IN (libc)
+# define __memcmpeq __redirect___memcmpeq
+# include <string.h>
+# include <shlib-compat.h>
+# include "init-arch.h"
+
+/* Reuse the existing optimized memcmp variants for pre-POWER10 hardware
+ * as memcmp is a superset */
+extern __typeof (memcmp) __memcmp_ppc attribute_hidden;
+extern __typeof (memcmp) __memcmp_power4 attribute_hidden;
+extern __typeof (memcmp) __memcmp_power7 attribute_hidden;
+extern __typeof (memcmp) __memcmp_power8 attribute_hidden;
+extern __typeof (__memcmpeq) __memcmpeq_power10 attribute_hidden;
+# undef __memcmpeq
+
+/* Avoid DWARF definition DIE on ifunc symbol so that GDB can handle
+   ifunc symbol properly.  */
+libc_ifunc_redirected (__redirect___memcmpeq, __memcmpeq,
+#ifdef __LITTLE_ENDIAN__
+                (hwcap2 & PPC_FEATURE2_ARCH_3_1
+                 && hwcap & PPC_FEATURE_HAS_VSX)
+                 ? __memcmpeq_power10 :
+#endif
+               (hwcap2 & PPC_FEATURE2_ARCH_2_07
+            && hwcap & PPC_FEATURE_HAS_ALTIVEC)
+               ? __memcmp_power8 :
+               (hwcap & PPC_FEATURE_ARCH_2_06)
+               ? __memcmp_power7
+               : (hwcap & PPC_FEATURE_POWER4)
+             ? __memcmp_power4
+             : __memcmp_ppc);
+# ifdef SHARED
+__hidden_ver1 (__memcmpeq, __GI___memcmpeq, __redirect___memcmpeq)
+    __attribute__ ((visibility ("hidden"))) __attribute_copy__ (__memcmpeq);
+# endif
+#else
+#include <string/memcmp.c>
+#endif