[v2,0/6] aarch64: Added optimized memcpy/memmove/memset for A64FX

Message ID 20210512092308.900998-1-naohirot@fujitsu.com
Series aarch64: Added optimized memcpy/memmove/memset for A64FX

Message

Naohiro Tamura May 12, 2021, 9:23 a.m. UTC
  Hi Szabolcs, Wilco, Florian,

Thank you for reviewing Patch V1.

Patch V2 reflects all of the V1 comments, which mainly concerned
redundant assembler code.
Consequently, the assembler code has been minimized, and each line of
the V2 assembler code is justified by string bench performance data.
In terms of assembler LOC (lines of code), memcpy/memmove was reduced
by 60%, from 1,000 to 400 lines, and memset was reduced by 55%, from
600 to 270 lines.

Please review V2.

Thanks.
Naohiro

Naohiro Tamura (6):
  config: Added HAVE_AARCH64_SVE_ASM for aarch64
  aarch64: define BTI_C and BTI_J macros as NOP unless HAVE_AARCH64_BTI
  aarch64: Added optimized memcpy and memmove for A64FX
  aarch64: Added optimized memset for A64FX
  scripts: Added Vector Length Set test helper script
  benchtests: Fixed bench-memcpy-random: buf1: mprotect failed

 benchtests/bench-memcpy-random.c              |   4 +-
 config.h.in                                   |   5 +
 manual/tunables.texi                          |   3 +-
 scripts/vltest.py                             |  82 ++++
 sysdeps/aarch64/configure                     |  28 ++
 sysdeps/aarch64/configure.ac                  |  15 +
 sysdeps/aarch64/multiarch/Makefile            |   3 +-
 sysdeps/aarch64/multiarch/ifunc-impl-list.c   |  13 +-
 sysdeps/aarch64/multiarch/init-arch.h         |   4 +-
 sysdeps/aarch64/multiarch/memcpy.c            |  12 +-
 sysdeps/aarch64/multiarch/memcpy_a64fx.S      | 405 ++++++++++++++++++
 sysdeps/aarch64/multiarch/memmove.c           |  12 +-
 sysdeps/aarch64/multiarch/memset.c            |  11 +-
 sysdeps/aarch64/multiarch/memset_a64fx.S      | 268 ++++++++++++
 sysdeps/aarch64/sysdep.h                      |   9 +-
 .../unix/sysv/linux/aarch64/cpu-features.c    |   4 +
 .../unix/sysv/linux/aarch64/cpu-features.h    |   4 +
 17 files changed, 868 insertions(+), 14 deletions(-)
 create mode 100755 scripts/vltest.py
 create mode 100644 sysdeps/aarch64/multiarch/memcpy_a64fx.S
 create mode 100644 sysdeps/aarch64/multiarch/memset_a64fx.S
  

Comments

Naohiro Tamura May 27, 2021, 12:22 a.m. UTC | #1
Hi Szabolcs,

>   config: Added HAVE_AARCH64_SVE_ASM for aarch64
>   aarch64: define BTI_C and BTI_J macros as NOP unless HAVE_AARCH64_BTI
>   scripts: Added Vector Length Set test helper script
>   benchtests: Fixed bench-memcpy-random: buf1: mprotect failed

Thank you for the merges!

>   aarch64: Added optimized memcpy and memmove for A64FX
>   aarch64: Added optimized memset for A64FX

I'll fix the whitespaces.

Thanks
Naohiro
  
Naohiro Tamura May 27, 2021, 11:50 p.m. UTC | #2
Hi Szabolcs,

> >   aarch64: Added optimized memcpy and memmove for A64FX
> >   aarch64: Added optimized memset for A64FX
> 
> I'll fix the whitespaces.

Thank you very much for the merges!
Naohiro