[0/5] Added optimized memcpy/memmove/memset for A64FX

Message ID 20210317022849.323046-1-naohirot@fujitsu.com

Message

Naohiro Tamura March 17, 2021, 2:28 a.m. UTC
  Fujitsu is in the process of signing the copyright assignment paper.
We'd like to have some feedback in advance.

This patch series optimizes the performance of
memcpy/memmove/memset for A64FX [1], which implements ARMv8-A SVE and
has a 64KB L1 cache per core and an 8MB L2 cache per NUMA node.

The first patch updates autoconf to check whether the assembler is
capable of ARMv8-A SVE code generation, and then defines the
HAVE_SVE_ASM_SUPPORT macro accordingly.
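
As a rough illustration of what such a configure-time probe does, the
Python sketch below tries to assemble a single SVE instruction and
reports whether that worked. The compiler name `cc`, the
`-march=armv8.2-a+sve` flag, the `ptrue` probe instruction, and the
function name are assumptions made for illustration; the real check is
in sysdeps/aarch64/configure.ac and may differ.

```python
import os
import subprocess
import tempfile


def assembler_supports_sve(cc="cc"):
    """Return True if `cc` can assemble one SVE instruction, else False.

    Hypothetical helper: the flag and the probe instruction are
    assumptions, not taken from the patch.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.s")
        obj = os.path.join(tmp, "conftest.o")
        with open(src, "w") as f:
            f.write("\tptrue p0.b\n")  # a minimal SVE-only instruction
        try:
            result = subprocess.run(
                [cc, "-c", "-march=armv8.2-a+sve", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # The compiler driver itself could not be started.
            return False
        return result.returncode == 0
```

A build system would define something like HAVE_SVE_ASM_SUPPORT only
when this kind of probe succeeds.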

The second patch optimizes memcpy/memmove performance. It makes use
of the Scalable Vector Registers with several techniques such as
loop unrolling, memory access alignment, cache zero fill, prefetch,
and software pipelining.

The third patch optimizes memset performance. It makes use of the
Scalable Vector Registers with several techniques such as
loop unrolling, memory access alignment, cache zero fill, and
prefetch.
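
To make the "memory access alignment" technique named above concrete,
here is a minimal Python sketch of the usual head/body/tail split: a
short unaligned head brings the pointer up to a vector-length
boundary, the body then runs in whole aligned vectors, and a short
tail finishes the job. The function name and the exact split are
assumptions for illustration, not the logic of the patches.

```python
def split_for_alignment(addr, length, vl_bytes):
    """Return (head, body, tail) byte counts for `length` bytes at `addr`.

    head: bytes needed to reach the next `vl_bytes` boundary
    body: bytes covered by whole, aligned vectors
    tail: leftover bytes after the last whole vector
    """
    head = min((-addr) % vl_bytes, length)
    remaining = length - head
    body = remaining - remaining % vl_bytes
    tail = remaining - body
    return head, body, tail
```

For example, with a 64-byte vector, 200 bytes starting at address
0x1003 split into a 61-byte head, a 128-byte aligned body, and an
11-byte tail.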

The fourth patch is a test helper script to change the Vector Length
for a child process. This script can be used as a test-wrapper for
'make check'.
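
For readers unfamiliar with how a wrapper can change the vector length
of a child, the sketch below shows one possible approach on Linux:
call prctl(PR_SVE_SET_VL) with the PR_SVE_VL_INHERIT flag and then
exec the test program. The constants come from <linux/prctl.h>; the
function names are hypothetical, and whether scripts/vltest.py works
exactly this way is an assumption.

```python
import ctypes
import os

PR_SVE_SET_VL = 50           # prctl option from <linux/prctl.h>
PR_SVE_VL_INHERIT = 1 << 17  # keep the vector length across execve()


def sve_vl_arg(vl_bytes):
    """Build the prctl argument for a vector length given in bytes."""
    if vl_bytes < 16 or vl_bytes % 16 != 0:
        raise ValueError("SVE vector length must be a multiple of 16 bytes")
    return vl_bytes | PR_SVE_VL_INHERIT


def run_with_vl(vl_bytes, argv):
    """Set the SVE vector length, then replace this process with argv."""
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.prctl(PR_SVE_SET_VL, ctypes.c_ulong(sve_vl_arg(vl_bytes)))
    if rc < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    os.execvp(argv[0], argv)  # does not return on success
```

On an SVE-capable machine, run_with_vl(32, ["./some-test"]) would run
the hypothetical ./some-test binary with a 256-bit vector length; on a
kernel or CPU without SVE the prctl call fails and OSError is raised.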

The fifth patch adds generic_memcpy and generic_memmove to
bench-memcpy-large.c and bench-memmove-large.c respectively, so that we
can compare the performance of the 512-bit scalable vector registers
with that of the scalar 64-bit registers consistently across the
default and large benchtests for memcpy/memmove/memset.


The SVE assembler code for memcpy/memmove/memset is implemented as
Vector Length Agnostic code, so in theory it can run on any SoC that
supports the ARMv8-A SVE standard.
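
The idea of Vector Length Agnostic code can be modelled in a few lines
of Python: the loop never hard-codes the vector width, it advances by
whatever width the hardware reports, and a predicate limits the final
partial vector. This is only a behavioural model under that
assumption (the function name is hypothetical); the patches implement
the loops in SVE assembly.

```python
def vla_copy(src, vl_bytes):
    """Copy `src` one "vector" of `vl_bytes` at a time.

    `vl_bytes` stands in for the hardware vector length, which real
    VLA code reads at run time instead of assuming.
    """
    n = len(src)
    dst = bytearray(n)
    i = 0
    while i < n:
        # Number of active lanes, like a predicate covering [i, n).
        active = min(vl_bytes, n - i)
        dst[i:i + active] = src[i:i + active]
        i += vl_bytes  # advance by one full vector length
    return bytes(dst)
```

The same function gives the same result for any vector length, for
example vla_copy(data, 16) and vla_copy(data, 64) both return a copy
of data.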

We confirmed that all test cases pass by running 'make check' and
'make xcheck' not only on A64FX but also on ThunderX2.

We also confirmed, by running 'make bench', that the SVE 512-bit
vector register performance is roughly 4 times better than that of the
Advanced SIMD 128-bit registers and 8 times better than that of the
scalar 64-bit registers.

[1] https://github.com/fujitsu/A64FX


Naohiro Tamura (5):
  config: Added HAVE_SVE_ASM_SUPPORT for aarch64
  aarch64: Added optimized memcpy and memmove for A64FX
  aarch64: Added optimized memset for A64FX
  scripts: Added Vector Length Set test helper script
  benchtests: Added generic_memcpy and generic_memmove to large
    benchtests

 benchtests/bench-memcpy-large.c               |   9 +
 benchtests/bench-memmove-large.c              |   9 +
 config.h.in                                   |   3 +
 manual/tunables.texi                          |   3 +-
 scripts/vltest.py                             |  82 ++
 sysdeps/aarch64/configure                     |  28 +
 sysdeps/aarch64/configure.ac                  |  15 +
 sysdeps/aarch64/multiarch/Makefile            |   3 +-
 sysdeps/aarch64/multiarch/ifunc-impl-list.c   |  17 +-
 sysdeps/aarch64/multiarch/init-arch.h         |   4 +-
 sysdeps/aarch64/multiarch/memcpy.c            |  12 +-
 sysdeps/aarch64/multiarch/memcpy_a64fx.S      | 979 ++++++++++++++++++
 sysdeps/aarch64/multiarch/memmove.c           |  12 +-
 sysdeps/aarch64/multiarch/memset.c            |  11 +-
 sysdeps/aarch64/multiarch/memset_a64fx.S      | 574 ++++++++++
 .../unix/sysv/linux/aarch64/cpu-features.c    |   4 +
 .../unix/sysv/linux/aarch64/cpu-features.h    |   4 +
 17 files changed, 1759 insertions(+), 10 deletions(-)
 create mode 100755 scripts/vltest.py
 create mode 100644 sysdeps/aarch64/multiarch/memcpy_a64fx.S
 create mode 100644 sysdeps/aarch64/multiarch/memset_a64fx.S
  

Comments

Szabolcs Nagy March 29, 2021, 12:03 p.m. UTC | #1
The 03/17/2021 02:28, Naohiro Tamura wrote:
> Fujitsu is in the process of signing the copyright assignment paper.
> We'd like to have some feedback in advance.

thanks for these patches, please let me know when the
copyright is sorted out. i will do some review now.
  
Naohiro Tamura May 10, 2021, 1:45 a.m. UTC | #2
Hi Szabolcs, Wilco, Florian,

> From: Naohiro Tamura <naohirot@fujitsu.com>
> Sent: Wednesday, March 17, 2021 11:29 AM
 
> Fujitsu is in the process of signing the copyright assignment paper.
> We'd like to have some feedback in advance.

FYI: Fujitsu has submitted the signed assignment finally.

Thanks.
Naohiro
  
Szabolcs Nagy May 14, 2021, 1:35 p.m. UTC | #3
The 05/10/2021 01:45, naohirot@fujitsu.com wrote:
> FYI: Fujitsu has submitted the signed assignment finally.

Carlos, can we commit patches from fujitsu now?
(i dont know if we are still waiting for something)
  
Naohiro Tamura May 19, 2021, 12:11 a.m. UTC | #4
Hi Szabolcs, Carlos,

> From: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
> Sent: Friday, May 14, 2021 10:36 PM
> 
> The 05/10/2021 01:45, naohirot@fujitsu.com wrote:
> > FYI: Fujitsu has submitted the signed assignment finally.
> 
> Carlos, can we commit patches from fujitsu now?
> (i dont know if we are still waiting for something)

Fujitsu has received the FSF-signed assignment,
so the contract process is now complete.

Thanks.
Naohiro