riscv: avoid vrgather in RVV memcmp mismatch path

Message ID: 20260529012933.12831-1-pincheng.plct@isrc.iscas.ac.cn
State: New
Series: riscv: avoid vrgather in RVV memcmp mismatch path

Commit Message

Pincheng Wang May 29, 2026, 1:29 a.m. UTC
  vfirst.m already returns the byte offset of the first mismatch in the
current vector chunk. Use that offset to reload the two differing bytes
with lbu instead of extracting them with vrgather.vx and vmv.x.s.

The vector gather path can be more expensive on some implementations and
also increases vector register pressure. This keeps the mismatch path
shorter while preserving the memcmp result.

Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
---
 newlib/libc/machine/riscv/memcmp-asm.S | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
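
For readers less familiar with RVV, the idea in the commit message can be modeled in scalar C. This is only a sketch, not the newlib code: `first_mismatch` is a hypothetical stand-in for the vector compare followed by vfirst.m, the 16-byte chunk length is an arbitrary assumption, and none of these names appear in newlib.

```c
#include <stddef.h>

/* Hypothetical scalar stand-in for "vector compare + vfirst.m" on one chunk:
   returns the index of the first differing byte, or -1 if the chunk matches
   (vfirst.m likewise yields -1 when no mask bit is set). */
static long first_mismatch(const unsigned char *a, const unsigned char *b,
                           size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        if (a[i] != b[i])
            return (long)i;
    return -1;
}

/* Chunked memcmp model. MAX_VL = 16 is an assumed chunk size; real hardware
   picks vl via vsetvli. */
static int memcmp_model(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1;
    const unsigned char *p2 = s2;
    const size_t MAX_VL = 16;

    while (n > 0) {
        size_t vl = n < MAX_VL ? n : MAX_VL;
        long idx = first_mismatch(p1, p2, vl);
        if (idx >= 0) {
            /* Mismatch path after the patch: add the offset to both chunk
               base pointers, reload one byte from each, and subtract. */
            return (int)p1[idx] - (int)p2[idx];
        }
        p1 += vl;
        p2 += vl;
        n -= vl;
    }
    return 0;
}
```

The model indexes from the chunk base, which mirrors `add a0, a0, a4` only if a0 and a1 still point at the start of the current chunk when `.Lfound` is reached. The loop body is not visible in the hunk, so that is an assumption here.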
  

Comments

Pincheng Wang June 11, 2026, 2 p.m. UTC | #1
Hi all,

Gentle ping. :)

BR,
Pincheng Wang

  
Kito Cheng June 11, 2026, 2:03 p.m. UTC | #2
Ack, the patch looks good to me. I plan to put it in my test queue and then push it :)

Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn> wrote on Thu, Jun 11, 2026 at 10:01 PM:

> Hi all,
>
> Gentle ping. :)
>
> BR,
> Pincheng Wang
  

Patch

diff --git a/newlib/libc/machine/riscv/memcmp-asm.S b/newlib/libc/machine/riscv/memcmp-asm.S
index b05df9521..1cf1680e2 100644
--- a/newlib/libc/machine/riscv/memcmp-asm.S
+++ b/newlib/libc/machine/riscv/memcmp-asm.S
@@ -28,12 +28,10 @@  memcmp:
   li a0, 0
   ret
 .Lfound:
-  vrgather.vx v16, v0, a4
-  vrgather.vx v24, v8, a4
-  vmv.x.s a0, v16
-  vmv.x.s a4, v24
-  andi a0, a0, 0xff
-  andi a4, a4, 0xff
+  add a0, a0, a4
+  add a1, a1, a4
+  lbu a0, 0(a0)
+  lbu a4, 0(a1)
   sub a0, a0, a4
   ret
 .size memcmp, .-memcmp
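
One detail the hunk implies but the commit message does not spell out is why the two `andi ..., 0xff` instructions can go away. A plausible reading, assuming the loop runs with SEW=8, is that vmv.x.s sign-extends the element to XLEN and therefore needed the mask, while lbu zero-extends on its own. The C sketch below models both paths; the function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Models vmv.x.s with SEW=8 (assumed): the 8-bit element is sign-extended
   to the full register width. */
static long vmv_x_s_e8(uint8_t elem)
{
    return (long)(int8_t)elem;
}

/* Models lbu: the loaded byte is zero-extended. */
static long lbu_model(const uint8_t *p)
{
    return (long)*p;
}

/* Old mismatch path: extract, mask to 8 bits, subtract. */
static long diff_old(uint8_t x, uint8_t y)
{
    return (vmv_x_s_e8(x) & 0xff) - (vmv_x_s_e8(y) & 0xff);
}

/* New mismatch path: reload with lbu, subtract; no mask needed. */
static long diff_new(const uint8_t *px, const uint8_t *py)
{
    return lbu_model(px) - lbu_model(py);
}
```

Under that assumption both paths yield the same difference for every byte pair, which is consistent with the commit message's claim that the memcmp result is preserved.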