[v3,0/1] riscv: add vectorized memset, memcpy and memmove

Message ID 20260305061947.31797-1-pincheng.plct@isrc.iscas.ac.cn

Message

Pincheng Wang March 5, 2026, 6:19 a.m. UTC
Hi all,

This v3 patch adds RISC-V Vector (RVV) optimized implementations for
memset, memcpy and memmove.

Changes since v2:
- Reordered the conditional compilation checks so that the vector path
  is not chosen when optimizing for size.
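
As a rough illustration of that ordering (not the code in the patch), the
selection can be sketched in C as below. The function name
selected_mem_path is hypothetical, and the use of PREFER_SIZE_OVER_SPEED
and __OPTIMIZE_SIZE__ as the size checks is an assumption about how the
existing newlib guards are written.

```c
/* Sketch only: reports which implementation the preprocessor would pick.
   The size check comes first, so a size-optimized build never reaches
   the vector branch even when __riscv_vector is defined. */
const char *selected_mem_path(void)
{
#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__)
    return "size";    /* compact byte-wise routine */
#elif defined(__riscv_vector)
    return "vector";  /* RVV routine */
#else
    return "scalar";  /* existing speed-optimized scalar routine */
#endif
}
```

With the checks in the opposite order, a build using -Os together with a
vector-enabled -march would pick the larger vector routine.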

Changes since v1:
- Switched the conditional compilation macro from __riscv_v to
  __riscv_vector.
- Replaced '.option arch,+v' with '.option arch,+zve32x'.
- Removed an unnecessary unconditional jump instruction in
  memmove-asm.S.
- In memcpy and memmove, when __riscv_misaligned_fast is not defined,
  the destination address is now aligned to SZREG to improve performance
  on systems with slow misaligned accesses.
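
The destination alignment step can be pictured with the C sketch below.
It is only an illustration, not the assembly in the patch: head_bytes,
copy_dst_aligned and SKETCH_SZREG are hypothetical names, SKETCH_SZREG
stands in for SZREG, and the library memcpy stands in for the bulk
vector loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SZREG: the register width in bytes. */
#define SKETCH_SZREG sizeof(uintptr_t)

/* Number of leading bytes to copy one at a time so that dst becomes
   aligned to `align` (a power of two), capped at the total length n. */
size_t head_bytes(uintptr_t dst, size_t n, size_t align)
{
    size_t head = (size_t)(-dst & (align - 1));
    return head < n ? head : n;
}

void *copy_dst_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

#ifndef __riscv_misaligned_fast
    /* Slow misaligned accesses: align the destination first. */
    size_t head = head_bytes((uintptr_t)d, n, SKETCH_SZREG);
    n -= head;
    while (head--)
        *d++ = *s++;
#endif
    /* Bulk copy; the real routine would run its vector loop here. */
    memcpy(d, s, n);
    return dst;
}
```

For example, with an 8-byte register width a destination ending in 0x...1
gets 7 head bytes, after which every bulk store starts on an aligned
address.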

These implementations use the RVV extension with e8 element size and m8
LMUL, and are conditionally compiled only when __riscv_vector is
defined, ensuring compatibility with non-vector RISC-V systems.
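
For readers less familiar with RVV, the e8/m8 configuration amounts to a
strip-mined byte loop. The sketch below expresses that idea with the RVV
C intrinsics; it is not the assembly from the patch, the function name
sketch_memcpy_e8m8 is hypothetical, and the byte-wise fallback exists
only so the example also builds on non-vector targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

void *sketch_memcpy_e8m8(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

#if defined(__riscv_vector)
    while (n > 0) {
        /* Ask for up to n byte elements (e8) in a group of eight
           vector registers (m8); the hardware returns how many it
           will process in this iteration. */
        size_t vl = __riscv_vsetvl_e8m8(n);
        vuint8m8_t v = __riscv_vle8_v_u8m8(s, vl);
        __riscv_vse8_v_u8m8(d, v, vl);
        s += vl;
        d += vl;
        n -= vl;
    }
#else
    /* Non-vector build: plain byte copy. */
    while (n--)
        *d++ = *s++;
#endif
    return dst;
}
```

Because vl is returned by the hardware on every iteration, the same loop
handles any vector length and needs no separate tail handling.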

Benchmark results on Spacemit X60 (Muse-pi) and Canaan K230 show the
following peak speedups:

memcpy: up to 4.84x on Muse-pi and 4.66x on K230.
memset: up to 4.31x on Muse-pi and 3.14x on K230.
memmove: up to 2.87x on Muse-pi and 1.48x on K230.

Comments and suggestions are greatly appreciated. Thank you for your
time and review!

Best regards,
Pincheng Wang

Pincheng Wang (1):
  riscv: add vectorized memset, memcpy and memmove

 newlib/libc/machine/riscv/memcpy-asm.S  | 52 ++++++++++++++
 newlib/libc/machine/riscv/memcpy.c      |  2 +-
 newlib/libc/machine/riscv/memmove-asm.S | 93 +++++++++++++++++++++++++
 newlib/libc/machine/riscv/memmove.c     |  2 +-
 newlib/libc/machine/riscv/memset.S      | 18 +++++
 5 files changed, 165 insertions(+), 2 deletions(-)
  

Comments

Corinna Vinschen March 17, 2026, 3:29 p.m. UTC | #1
Kito, ping?

  
Pincheng Wang April 16, 2026, 1:17 p.m. UTC | #2
Hi all,

Gentle ping on this patch series.  Please let me know if you need any 
clarifications, reworks, or further testing from my end. :)

Moreover, I’m also ready to upstream more RVV-enabled string functions 
for newlib once this series lands, so I’d welcome any early feedback on 
the overall direction.

Best regards,
Pincheng Wang

  
Kito Cheng April 16, 2026, 2:26 p.m. UTC | #3
Ack, it seems OK. Let me merge it tomorrow after local testing passes.

  
Kito Cheng April 20, 2026, 8:12 a.m. UTC | #4
Pushed, thanks :)
