[0/5] Added optimized memcpy/memmove/memset for A64FX

Message ID

20210317022849.323046-1-naohirot@fujitsu.com

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C0F243851C26
IronPort-SDR: 
 SOPKq9FNNCa/b7onIECbTL7nwbQPPrRqcteKrUDQcrWTvzY2uyvn1cLfqg8yOqcR+Hsm6jTjkW
 Oqc83vS4RNy8AaIe8sncIbnFiYj5m/FpbtWeXxAPQSLyUTjaaIzmbyIE95LdW8J8ucZ1yQRUan
 6+h5iFTPMxMqdt6QP+CZVo/9KGMU9+AAaNal5ROuIqfTt+HWB76nqtDOvNHIVUqllPJvqT34CW
 UxHKYWpdU227O9Eg4ZWyfcTSqSh23lsECq9oPQ5YdoTe5qgtfqpB8yAI8rENI23vxcLyu3ZtfF
 ZFg=
From: Naohiro Tamura <naohirot@fujitsu.com>
To: libc-alpha@sourceware.org
Subject: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Wed, 17 Mar 2021 02:28:49 +0000
Message-Id: <20210317022849.323046-1-naohirot@fujitsu.com>
Precedence: list
Errors-To: libc-alpha-bounces@sourceware.org
Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org>

Series

Added optimized memcpy/memmove/memset for A64FX
  [0/5] Added optimized memcpy/memmove/memset for A64FX
  [1/5] config: Added HAVE_SVE_ASM_SUPPORT for aarch64
  [2/5] aarch64: Added optimized memcpy and memmove for A64FX
  [3/5] aarch64: Added optimized memset for A64FX
  [4/5] scripts: Added Vector Length Set test helper script
  [5/5] benchtests: Added generic_memcpy and generic_memmove to large benchtests

Message

Naohiro Tamura March 17, 2021, 2:28 a.m. UTC

Fujitsu is in the process of signing the copyright assignment papers.
We'd like to get some feedback in advance.

This patch series optimizes the performance of
memcpy/memmove/memset for A64FX [1], which implements ARMv8-A SVE and
has a 64KB L1 cache per core and an 8MB L2 cache per NUMA node.

The first patch updates autoconf to check whether the assembler is
capable of generating ARMv8-A SVE code and, if so, defines the
HAVE_SVE_ASM_SUPPORT macro.
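A minimal sketch of how a configure result like this is typically consumed: the macro gates SVE-only code at compile time, and a runtime check then picks an implementation. Only HAVE_SVE_ASM_SUPPORT comes from the patch; copy_generic, copy_a64fx and select_copy are hypothetical names, and the two boolean inputs stand in for whatever CPU feature detection the real ifunc resolver uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn) (void *, const void *, size_t);

/* Hypothetical portable fallback standing in for the generic memcpy.  */
static void *
copy_generic (void *dst, const void *src, size_t n)
{
  assert (n == 0 || (dst != NULL && src != NULL));
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n-- > 0)
    *d++ = *s++;
  return dst;
}

#ifdef HAVE_SVE_ASM_SUPPORT
/* Hypothetical SVE routine: it is only declared (and could only be
   assembled) when configure found an SVE-capable assembler.  */
void *copy_a64fx (void *, const void *, size_t);
#endif

/* Hypothetical selector: the SVE variant is a candidate only if it was
   built at all (compile-time check) and the CPU supports it (runtime).  */
copy_fn
select_copy (bool cpu_is_a64fx, bool cpu_has_sve)
{
#ifdef HAVE_SVE_ASM_SUPPORT
  if (cpu_is_a64fx && cpu_has_sve)
    return copy_a64fx;
#else
  (void) cpu_is_a64fx;
  (void) cpu_has_sve;
#endif
  return copy_generic;
}
```

When the macro is undefined, the selector can only ever return the fallback, so a toolchain without SVE support still builds a working library.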

The second patch optimizes memcpy/memmove performance by using the
Scalable Vector Registers together with several techniques: loop
unrolling, memory access alignment, cache zero fill, prefetch, and
software pipelining.
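To make some of these techniques concrete, here is a portable C sketch of memory access alignment, 4x loop unrolling and a simple form of software pipelining, using 8-byte scalar words in place of SVE registers. It is an illustration only, not the algorithm in memcpy_a64fx.S: prefetch and cache zero fill are omitted, the name copy_unrolled is hypothetical, and overlapping buffers (the memmove case) are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned-safe 8-byte load and store; compilers lower these
   fixed-size memcpy calls to single instructions.  */
static inline uint64_t
load64 (const unsigned char *p)
{
  uint64_t v;
  memcpy (&v, p, 8);
  return v;
}

static inline void
store64 (unsigned char *p, uint64_t v)
{
  memcpy (p, &v, 8);
}

/* Copy N bytes from SRCV to DSTV (buffers must not overlap).  */
void *
copy_unrolled (void *dstv, const void *srcv, size_t n)
{
  unsigned char *dst = dstv;
  const unsigned char *src = srcv;

  /* Memory access alignment: copy bytes until DST is 8-byte aligned,
     so every store in the main loop is aligned.  */
  while (n > 0 && ((uintptr_t) dst & 7) != 0)
    {
      *dst++ = *src++;
      n--;
    }

  if (n >= 32)
    {
      /* Software pipelining prologue: load the first 32-byte block.  */
      uint64_t a0 = load64 (src), a1 = load64 (src + 8);
      uint64_t a2 = load64 (src + 16), a3 = load64 (src + 24);
      src += 32;
      n -= 32;

      /* Unrolled by 4: load block k+1 before storing block k.  */
      while (n >= 32)
        {
          uint64_t b0 = load64 (src), b1 = load64 (src + 8);
          uint64_t b2 = load64 (src + 16), b3 = load64 (src + 24);
          store64 (dst, a0);
          store64 (dst + 8, a1);
          store64 (dst + 16, a2);
          store64 (dst + 24, a3);
          a0 = b0, a1 = b1, a2 = b2, a3 = b3;
          src += 32;
          dst += 32;
          n -= 32;
        }

      /* Epilogue: store the last block that was loaded.  */
      store64 (dst, a0);
      store64 (dst + 8, a1);
      store64 (dst + 16, a2);
      store64 (dst + 24, a3);
      dst += 32;
    }

  /* Tail: fewer than 32 bytes remain.  */
  while (n-- > 0)
    *dst++ = *src++;
  return dstv;
}
```

Issuing the next block's loads before the current block's stores gives the loads time to complete while the stores drain, which is the basic idea behind software pipelining.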

The third patch optimizes memset performance by using the Scalable
Vector Registers together with loop unrolling, memory access
alignment, cache zero fill, and prefetch.
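A comparable sketch for memset, again in portable C with 8-byte words. The zero path shows where a cache zero fill would apply: on AArch64 the DC ZVA instruction zeroes a whole aligned block without first reading it, and the block size is reported by DCZID_EL0. Here zero_block only emulates that with ordinary stores, the block size is passed in as zva_block, and set_sketch, zero_block and store64 are hypothetical names; this is not the code in memset_a64fx.S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store one 8-byte word; the fixed-size memcpy avoids alignment and
   aliasing issues and compiles to a single store.  */
static inline void
store64 (unsigned char *p, uint64_t v)
{
  memcpy (p, &v, 8);
}

/* Emulates a cache zero fill (e.g. DC ZVA) of one aligned block.  */
static void
zero_block (unsigned char *p, size_t block)
{
  for (size_t i = 0; i < block; i += 8)
    store64 (p + i, 0);
}

void *
set_sketch (void *dstv, int c, size_t n, size_t zva_block)
{
  assert (dstv != NULL || n == 0);
  unsigned char *dst = dstv;
  unsigned char byte = (unsigned char) c;
  uint64_t word = 0x0101010101010101ULL * byte;  /* byte in every lane */

  /* Store single bytes until dst is 8-byte aligned.  */
  while (n > 0 && ((uintptr_t) dst & 7) != 0)
    {
      *dst++ = byte;
      n--;
    }

  /* Zero fill path: only for c == 0 and large sizes.  zva_block must be
     a power of two of at least 8 bytes.  */
  if (byte == 0 && zva_block >= 8 && (zva_block & (zva_block - 1)) == 0
      && n >= 2 * zva_block)
    {
      /* Advance in 8-byte steps to a zva_block boundary.  */
      while (((uintptr_t) dst & (zva_block - 1)) != 0)
        {
          store64 (dst, 0);
          dst += 8;
          n -= 8;
        }
      while (n >= zva_block)
        {
          zero_block (dst, zva_block);
          dst += zva_block;
          n -= zva_block;
        }
    }

  /* Unrolled body: four 8-byte stores per iteration.  */
  while (n >= 32)
    {
      store64 (dst, word);
      store64 (dst + 8, word);
      store64 (dst + 16, word);
      store64 (dst + 24, word);
      dst += 32;
      n -= 32;
    }
  while (n >= 8)
    {
      store64 (dst, word);
      dst += 8;
      n -= 8;
    }
  /* Remaining tail bytes.  */
  while (n-- > 0)
    *dst++ = byte;
  return dstv;
}
```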

The fourth patch is a test helper script that changes the Vector
Length for a child process. The script can be used as the
test-wrapper for 'make check'.
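The helper itself is a Python script (scripts/vltest.py) and is not reproduced here. As a C sketch of a Linux mechanism such a wrapper can rely on: the PR_SVE_SET_VL prctl combined with the PR_SVE_SET_VL_ONEXEC flag defers a new vector length until the calling thread's next execve(), so a wrapper can set the length and then exec the test program. Whether the script uses exactly this call is an assumption, and set_vl_for_next_exec is a hypothetical name.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Fallback values from <linux/prctl.h> for older kernel headers.  */
#ifndef PR_SVE_SET_VL
# define PR_SVE_SET_VL 50
#endif
#ifndef PR_SVE_SET_VL_ONEXEC
# define PR_SVE_SET_VL_ONEXEC (1 << 18)
#endif
#ifndef PR_SVE_VL_LEN_MASK
# define PR_SVE_VL_LEN_MASK 0xffff
#endif

/* Request an SVE vector length of VL_BYTES for the program this thread
   execs next.  Returns the vector length in bytes that will apply after
   execve(), or -1 on failure (EINVAL without SVE support).  */
int
set_vl_for_next_exec (int vl_bytes)
{
  assert (vl_bytes > 0);
  int ret = prctl (PR_SVE_SET_VL,
                   (unsigned long) vl_bytes | PR_SVE_SET_VL_ONEXEC,
                   0UL, 0UL, 0UL);
  if (ret < 0)
    return -1;
  return ret & PR_SVE_VL_LEN_MASK;
}
```

The kernel picks the largest supported length that does not exceed the request, so the returned value may be smaller than vl_bytes; a wrapper would then exec the test binary, for example with execvp().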

The fifth patch adds generic_memcpy and generic_memmove to
bench-memcpy-large.c and bench-memmove-large.c respectively, so that
the 512-bit scalable vector register implementations can be compared
consistently with the scalar 64-bit register implementations across
the memcpy/memmove/memset default and large benchtests.
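The point of the fifth patch is that every implementation is measured by the same harness on the same inputs. The sketch below illustrates that idea in plain C with clock_gettime(); it is not glibc's benchtests framework, and byte_copy and time_copy are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn) (void *, const void *, size_t);

/* Hypothetical scalar baseline, similar in spirit to generic_memcpy.  */
static void *
byte_copy (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n-- > 0)
    *d++ = *s++;
  return dst;
}

/* Average nanoseconds per call of FN copying N bytes, over ITERS calls.
   Every implementation goes through the same loop and buffers.  */
double
time_copy (copy_fn fn, void *dst, const void *src, size_t n, int iters)
{
  assert (iters > 0);
  struct timespec t0, t1;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < iters; i++)
    fn (dst, src, n);
  clock_gettime (CLOCK_MONOTONIC, &t1);
  double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
  return ns / iters;
}
```

Calling time_copy(memcpy, ...) and time_copy(byte_copy, ...) with identical arguments gives directly comparable numbers.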


The SVE assembler code for memcpy/memmove/memset is implemented as
Vector Length Agnostic code, so in principle it can run on any SoC
that supports the ARMv8-A SVE standard.
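The following portable C sketch imitates the shape of a Vector Length Agnostic loop. The vector length vl is a runtime parameter (on SVE hardware the CNTB instruction returns it in bytes), and the per-lane condition plays the role of the predicate produced by WHILELT, so the final partial vector needs no separate tail loop. It emulates lanes with scalar code and is not how memcpy_a64fx.S is written; vla_copy is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Copy N bytes in chunks of VL bytes, where VL is only known at run
   time.  The inner loop stands in for one predicated vector load and
   store.  */
void
vla_copy (unsigned char *dst, const unsigned char *src, size_t n, size_t vl)
{
  assert (vl > 0);
  for (size_t i = 0; i < n; i += vl)
    {
      /* WHILELT-style predicate: lane j is active iff i + j < n.  */
      for (size_t j = 0; j < vl; j++)
        if (i + j < n)
          dst[i + j] = src[i + j];
    }
}
```

Because nothing in the loop depends on a fixed vl, the same code is correct for 128-bit through 2048-bit implementations.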

We confirmed that all test cases pass with 'make check' and
'make xcheck', not only on A64FX but also on ThunderX2.

We also confirmed with 'make bench' that performance with SVE 512-bit
vector registers is roughly 4 times that of Advanced SIMD 128-bit
registers and 8 times that of scalar 64-bit registers.
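The quoted factors match the ratio of bytes moved per register-wide operation, which is what one would expect if these copies are limited by register width rather than by cache or memory bandwidth (an assumption; the measurements are only described as rough):

```latex
\frac{512\ \text{bits (SVE on A64FX)}}{128\ \text{bits (Advanced SIMD)}} = 4,
\qquad
\frac{512\ \text{bits (SVE on A64FX)}}{64\ \text{bits (scalar)}} = 8
```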

[1] https://github.com/fujitsu/A64FX


Naohiro Tamura (5):
  config: Added HAVE_SVE_ASM_SUPPORT for aarch64
  aarch64: Added optimized memcpy and memmove for A64FX
  aarch64: Added optimized memset for A64FX
  scripts: Added Vector Length Set test helper script
  benchtests: Added generic_memcpy and generic_memmove to large
    benchtests

 benchtests/bench-memcpy-large.c               |   9 +
 benchtests/bench-memmove-large.c              |   9 +
 config.h.in                                   |   3 +
 manual/tunables.texi                          |   3 +-
 scripts/vltest.py                             |  82 ++
 sysdeps/aarch64/configure                     |  28 +
 sysdeps/aarch64/configure.ac                  |  15 +
 sysdeps/aarch64/multiarch/Makefile            |   3 +-
 sysdeps/aarch64/multiarch/ifunc-impl-list.c   |  17 +-
 sysdeps/aarch64/multiarch/init-arch.h         |   4 +-
 sysdeps/aarch64/multiarch/memcpy.c            |  12 +-
 sysdeps/aarch64/multiarch/memcpy_a64fx.S      | 979 ++++++++++++++++++
 sysdeps/aarch64/multiarch/memmove.c           |  12 +-
 sysdeps/aarch64/multiarch/memset.c            |  11 +-
 sysdeps/aarch64/multiarch/memset_a64fx.S      | 574 ++++++++++
 .../unix/sysv/linux/aarch64/cpu-features.c    |   4 +
 .../unix/sysv/linux/aarch64/cpu-features.h    |   4 +
 17 files changed, 1759 insertions(+), 10 deletions(-)
 create mode 100755 scripts/vltest.py
 create mode 100644 sysdeps/aarch64/multiarch/memcpy_a64fx.S
 create mode 100644 sysdeps/aarch64/multiarch/memset_a64fx.S

Comments

Szabolcs Nagy March 29, 2021, 12:03 p.m. UTC | #1

The 03/17/2021 02:28, Naohiro Tamura wrote:
> Fujitsu is in the process of signing the copyright assignment paper.
> We'd like to have some feedback in advance.

thanks for these patches, please let me know when the
copyright is sorted out. i will do some review now.

Naohiro Tamura May 10, 2021, 1:45 a.m. UTC | #2

Hi Szabolcs, Wilco, Florian,

> From: Naohiro Tamura <naohirot@fujitsu.com>
> Sent: Wednesday, March 17, 2021 11:29 AM
 
> Fujitsu is in the process of signing the copyright assignment paper.
> We'd like to have some feedback in advance.

FYI: Fujitsu has submitted the signed assignment finally.

Thanks.
Naohiro

Szabolcs Nagy May 14, 2021, 1:35 p.m. UTC | #3

The 05/10/2021 01:45, naohirot@fujitsu.com wrote:
> FYI: Fujitsu has submitted the signed assignment finally.

Carlos, can we commit patches from fujitsu now?
(i dont know if we are still waiting for something)

Naohiro Tamura May 19, 2021, 12:11 a.m. UTC | #4

Hi Szabolcs, Carlos,

> From: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
> Sent: Friday, May 14, 2021 10:36 PM
> 
> The 05/10/2021 01:45, naohirot@fujitsu.com wrote:
> > FYI: Fujitsu has submitted the signed assignment finally.
> 
> Carlos, can we commit patches from fujitsu now?
> (i dont know if we are still waiting for something)

Fujitsu has received the FSF-signed assignment,
so the contract process is complete.

Thanks.
Naohiro