[RFC] Proposal for implementing AArch64 port of libmvec

Message ID 20230207113555.66008-1-Joe.Ramsay@arm.com
State Superseded
Headers
Series [RFC] Proposal for implementing AArch64 port of libmvec |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Joe Ramsay Feb. 7, 2023, 11:35 a.m. UTC
  Hi,

The attached patch is an attempt to enable libmvec on AArch64. The
proposed change is mainly implementing build infrastructure to add the
new routines to ABI, tests and benchmarks. I have demonstrated how
this all fits together by adding implementations for vector cos, in
both single and double precision, targeting both Advanced SIMD and
SVE.

The implementations of the routines themselves are just loops over the
scalar routine from libm for now, as we are more concerned with
getting the plumbing right at this point. We plan to contribute vector
routines from the Arm Optimized Routines repo that are compliant with
requirements described in the libmvec wiki.

Any comments/thoughts much appreciated! In particular, the patch
raises the minimum GCC to 10, in order to be able to submit routines
written using ACLE instead of assembly. This is clearly a big jump,
but we have options if this is not acceptable. One option would be to
submit compiler-generated assembly, similar to the equivalent routines
under sysdeps/x86_64. If GCC 9 is an acceptable compromise then this
would only have to be for SVE routines.

Also, are there plans to merge libmvec into libm, or will they be kept
separate?

Note that at this point users have to manually call the vector math
functions, there is no declaration in math.h to assist auto
vectorization of scalar math calls. This seems to be acceptable to
some downstream users.

Thanks,
Joe
---
 INSTALL                                       |  3 +
 manual/install.texi                           |  3 +
 sysdeps/aarch64/configure                     | 28 ++++++
 sysdeps/aarch64/configure.ac                  | 20 ++++
 sysdeps/aarch64/fpu/Makefile                  | 66 +++++++++++++
 sysdeps/aarch64/fpu/Versions                  |  8 ++
 sysdeps/aarch64/fpu/advsimd_utils.h           | 39 ++++++++
 sysdeps/aarch64/fpu/bench-libmvec-skeleton.c  | 83 +++++++++++++++++
 sysdeps/aarch64/fpu/bits/math-vector.h        | 65 +++++++++++++
 sysdeps/aarch64/fpu/cos_advsimd.c             | 28 ++++++
 sysdeps/aarch64/fpu/cos_sve.c                 | 27 ++++++
 sysdeps/aarch64/fpu/cosf_advsimd.c            | 28 ++++++
 sysdeps/aarch64/fpu/cosf_sve.c                | 27 ++++++
 sysdeps/aarch64/fpu/libm-test-ulps            |  7 ++
 sysdeps/aarch64/fpu/libm-test-ulps-name       |  1 +
 sysdeps/aarch64/fpu/math-tests-arch.h         | 34 +++++++
 .../fpu/scripts/bench_libmvec_advsimd.py      | 91 ++++++++++++++++++
 .../aarch64/fpu/scripts/bench_libmvec_sve.py  | 93 +++++++++++++++++++
 sysdeps/aarch64/fpu/sve_utils.h               | 55 +++++++++++
 .../fpu/test-double-advsimd-wrappers.c        | 26 ++++++
 sysdeps/aarch64/fpu/test-double-advsimd.h     | 25 +++++
 .../aarch64/fpu/test-double-sve-wrappers.c    | 34 +++++++
 sysdeps/aarch64/fpu/test-double-sve.h         | 26 ++++++
 .../aarch64/fpu/test-float-advsimd-wrappers.c | 26 ++++++
 sysdeps/aarch64/fpu/test-float-advsimd.h      | 25 +++++
 sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 34 +++++++
 sysdeps/aarch64/fpu/test-float-sve.h          | 26 ++++++
 .../aarch64/fpu/test-vpcs-vector-wrapper.h    | 30 ++++++
 .../unix/sysv/linux/aarch64/libmvec.abilist   |  4 +
 29 files changed, 962 insertions(+)
 create mode 100644 sysdeps/aarch64/fpu/Makefile
 create mode 100644 sysdeps/aarch64/fpu/Versions
 create mode 100644 sysdeps/aarch64/fpu/advsimd_utils.h
 create mode 100644 sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
 create mode 100644 sysdeps/aarch64/fpu/bits/math-vector.h
 create mode 100644 sysdeps/aarch64/fpu/cos_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/cos_sve.c
 create mode 100644 sysdeps/aarch64/fpu/cosf_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/cosf_sve.c
 create mode 100644 sysdeps/aarch64/fpu/libm-test-ulps
 create mode 100644 sysdeps/aarch64/fpu/libm-test-ulps-name
 create mode 100644 sysdeps/aarch64/fpu/math-tests-arch.h
 create mode 100644 sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py
 create mode 100755 sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py
 create mode 100644 sysdeps/aarch64/fpu/sve_utils.h
 create mode 100644 sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
 create mode 100644 sysdeps/aarch64/fpu/test-double-advsimd.h
 create mode 100644 sysdeps/aarch64/fpu/test-double-sve-wrappers.c
 create mode 100644 sysdeps/aarch64/fpu/test-double-sve.h
 create mode 100644 sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
 create mode 100644 sysdeps/aarch64/fpu/test-float-advsimd.h
 create mode 100644 sysdeps/aarch64/fpu/test-float-sve-wrappers.c
 create mode 100644 sysdeps/aarch64/fpu/test-float-sve.h
 create mode 100644 sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
  

Comments

Carlos Seo Feb. 7, 2023, 1:11 p.m. UTC | #1
On Tue, 7 Feb 2023 at 08:36, Joe Ramsay via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> diff --git a/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c b/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
> new file mode 100644
> index 0000000000..ca6a10d1fe
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
> @@ -0,0 +1,83 @@
> +/* Skeleton for libmvec benchmark programs.
> +   Copyright (C) 2021-2023 Free Software Foundation, Inc.

New file. Copyright should be 2023 only.
  
Adhemerval Zanella Netto Feb. 8, 2023, 1:11 p.m. UTC | #2
On 07/02/23 08:35, Joe Ramsay via Libc-alpha wrote:
> Hi,
> 
> The attached patch is an attempt to enable libmvec on AArch64. The
> proposed change is mainly implementing build infrastructure to add the
> new routines to ABI, tests and benchmarks. I have demonstrated how
> this all fits together by adding implementations for vector cos, in
> both single and double precision, targeting both Advanced SIMD and
> SVE.
> 
> The implementations of the routines themselves are just loops over the
> scalar routine from libm for now, as we are more concerned with
> getting the plumbing right at this point. We plan to contribute vector
> routines from the Arm Optimized Routines repo that are compliant with
> requirements described in the libmvec wiki.
> 
> Any comments/thoughts much appreciated! In particular, the patch
> raises the minimum GCC to 10, in order to be able to submit routines
> written using ACLE instead of assembly. This is clearly a big jump,
> but we have options if this is not acceptable. One option would be to
> submit compiler-generated assembly, similar to the equivalent routines
> under sysdeps/x86_64. If GCC 9 is an acceptable compromise then this
> would only have to be for SVE routines.

Using C implementation with intrinsics would be idea, there are more easily
maintained and can leverage compiler improvements.  I rather do it instead
of the assembly dump Intel did.

The minimum GCC 10 is not ideal, however I don't see it as blocker either
(it should be up to arch-maintainers).  One option might be check if
compiler does not support building libmvec, disable the build and related
checks.  It is not ideal either, since the resulting glibc won't have
a complete ABI.


> 
> Also, are there plans to merge libmvec into libm, or will they be kept
> separate?

There is none afaik.  The libpthread, librt, etc. merge was done to
fix long standing design and maintanance issues that is not really presented
with libm and libmvec.  There is still the partial upgrade one, but
it is still present with a disjoint ld, libc, libm anyway.

However, it is feasible to merge if your willing to work on it.  We will
need to keep the x86_64 lib with the sentinel compat symbol (similar to
what we did for libpthread).

What I would like to avoid is to have different arquitectures using different
approaches, for instance aarch64 begin merged while having x86_64 still
using a different library.  It add a slight more complexity to the build
process and extra arch specific boilerplate code.

> 
> Note that at this point users have to manually call the vector math
> functions, there is no declaration in math.h to assist auto
> vectorization of scalar math calls. This seems to be acceptable to
> some downstream users.

I think that's the current approach for x86_64 anyway, since most usages
are done through compiler autovectorization code.

Some comments below.

> 
> Thanks,
> Joe
> ---
>  INSTALL                                       |  3 +
>  manual/install.texi                           |  3 +
>  sysdeps/aarch64/configure                     | 28 ++++++
>  sysdeps/aarch64/configure.ac                  | 20 ++++
>  sysdeps/aarch64/fpu/Makefile                  | 66 +++++++++++++
>  sysdeps/aarch64/fpu/Versions                  |  8 ++
>  sysdeps/aarch64/fpu/advsimd_utils.h           | 39 ++++++++
>  sysdeps/aarch64/fpu/bench-libmvec-skeleton.c  | 83 +++++++++++++++++
>  sysdeps/aarch64/fpu/bits/math-vector.h        | 65 +++++++++++++
>  sysdeps/aarch64/fpu/cos_advsimd.c             | 28 ++++++
>  sysdeps/aarch64/fpu/cos_sve.c                 | 27 ++++++
>  sysdeps/aarch64/fpu/cosf_advsimd.c            | 28 ++++++
>  sysdeps/aarch64/fpu/cosf_sve.c                | 27 ++++++
>  sysdeps/aarch64/fpu/libm-test-ulps            |  7 ++
>  sysdeps/aarch64/fpu/libm-test-ulps-name       |  1 +
>  sysdeps/aarch64/fpu/math-tests-arch.h         | 34 +++++++
>  .../fpu/scripts/bench_libmvec_advsimd.py      | 91 ++++++++++++++++++
>  .../aarch64/fpu/scripts/bench_libmvec_sve.py  | 93 +++++++++++++++++++
>  sysdeps/aarch64/fpu/sve_utils.h               | 55 +++++++++++
>  .../fpu/test-double-advsimd-wrappers.c        | 26 ++++++
>  sysdeps/aarch64/fpu/test-double-advsimd.h     | 25 +++++
>  .../aarch64/fpu/test-double-sve-wrappers.c    | 34 +++++++
>  sysdeps/aarch64/fpu/test-double-sve.h         | 26 ++++++
>  .../aarch64/fpu/test-float-advsimd-wrappers.c | 26 ++++++
>  sysdeps/aarch64/fpu/test-float-advsimd.h      | 25 +++++
>  sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 34 +++++++
>  sysdeps/aarch64/fpu/test-float-sve.h          | 26 ++++++
>  .../aarch64/fpu/test-vpcs-vector-wrapper.h    | 30 ++++++
>  .../unix/sysv/linux/aarch64/libmvec.abilist   |  4 +
>  29 files changed, 962 insertions(+)
>  create mode 100644 sysdeps/aarch64/fpu/Makefile
>  create mode 100644 sysdeps/aarch64/fpu/Versions
>  create mode 100644 sysdeps/aarch64/fpu/advsimd_utils.h
>  create mode 100644 sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
>  create mode 100644 sysdeps/aarch64/fpu/bits/math-vector.h
>  create mode 100644 sysdeps/aarch64/fpu/cos_advsimd.c
>  create mode 100644 sysdeps/aarch64/fpu/cos_sve.c
>  create mode 100644 sysdeps/aarch64/fpu/cosf_advsimd.c
>  create mode 100644 sysdeps/aarch64/fpu/cosf_sve.c
>  create mode 100644 sysdeps/aarch64/fpu/libm-test-ulps
>  create mode 100644 sysdeps/aarch64/fpu/libm-test-ulps-name
>  create mode 100644 sysdeps/aarch64/fpu/math-tests-arch.h
>  create mode 100644 sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py
>  create mode 100755 sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py
>  create mode 100644 sysdeps/aarch64/fpu/sve_utils.h
>  create mode 100644 sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
>  create mode 100644 sysdeps/aarch64/fpu/test-double-advsimd.h
>  create mode 100644 sysdeps/aarch64/fpu/test-double-sve-wrappers.c
>  create mode 100644 sysdeps/aarch64/fpu/test-double-sve.h
>  create mode 100644 sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
>  create mode 100644 sysdeps/aarch64/fpu/test-float-advsimd.h
>  create mode 100644 sysdeps/aarch64/fpu/test-float-sve-wrappers.c
>  create mode 100644 sysdeps/aarch64/fpu/test-float-sve.h
>  create mode 100644 sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h
>  create mode 100644 sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
> 
> diff --git a/INSTALL b/INSTALL
> index 970d6627e2..ba800e41d6 100644
> --- a/INSTALL
> +++ b/INSTALL
> @@ -524,6 +524,9 @@ build the GNU C Library:
>       For s390x architecture builds, GCC 7.1 or higher is needed (See gcc
>       Bug 98269).
>  
> +     For AArch64 architecture builds with mathvec enabled, GCC 10 or
> +     higher is needed due to dependency on arm_sve.h.
> +
>       For multi-arch support it is recommended to use a GCC which has
>       been built with support for GNU indirect functions.  This ensures
>       that correct debugging information is generated for functions
> diff --git a/manual/install.texi b/manual/install.texi
> index 260f8a5c82..e9c62b51ae 100644
> --- a/manual/install.texi
> +++ b/manual/install.texi
> @@ -567,6 +567,9 @@ For ARC architecture builds, GCC 8.3 or higher is needed.
>  
>  For s390x architecture builds, GCC 7.1 or higher is needed (See gcc Bug 98269).
>  
> +For AArch64 architecture builds with mathvec enabled, GCC 10 or higher is needed
> +due to dependency on arm_sve.h.
> +
>  For multi-arch support it is recommended to use a GCC which has been built with
>  support for GNU indirect functions.  This ensures that correct debugging
>  information is generated for functions selected by IFUNC resolvers.  This
> diff --git a/sysdeps/aarch64/configure b/sysdeps/aarch64/configure
> index 2130f6b8f8..a71c32d70f 100644
> --- a/sysdeps/aarch64/configure
> +++ b/sysdeps/aarch64/configure
> @@ -327,3 +327,31 @@ if test $libc_cv_aarch64_sve_asm = yes; then
>    $as_echo "#define HAVE_AARCH64_SVE_ASM 1" >>confdefs.h
>  
>  fi
> +
> +# Check if the local system can run SVE binary
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for local SVE hardware" >&5
> +$as_echo_n "checking for local SVE hardware... " >&6; }
> +if ${libc_cv_can_run_sve+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +    cat > conftest.c <<EOF
> +#include <sys/auxv.h>
> +int main(void) {
> +  if (! (getauxval (AT_HWCAP) & HWCAP_SVE))
> +    return 1;
> +  return 0;
> +}
> +EOF
> +  libc_cv_can_run_sve=yes
> +  ${CC-cc} conftest.c -o conftest
> +  ./conftest || libc_cv_can_run_sve=no
> +  rm -f conftest*
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_can_run_sve" >&5
> +$as_echo "$libc_cv_can_run_sve" >&6; }
> +config_vars="$config_vars
> +aarch64-can-run-sve = $libc_cv_can_run_sve"
> +
> +if test x"$build_mathvec" = xnotset; then
> +  build_mathvec=yes
> +fi
> diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac
> index 85c6f76508..688f8772a6 100644
> --- a/sysdeps/aarch64/configure.ac
> +++ b/sysdeps/aarch64/configure.ac
> @@ -101,3 +101,23 @@ rm -f conftest*])
>  if test $libc_cv_aarch64_sve_asm = yes; then
>    AC_DEFINE(HAVE_AARCH64_SVE_ASM)
>  fi
> +
> +# Check if the local system can run SVE binary
> +AC_CACHE_CHECK(for local SVE hardware, libc_cv_can_run_sve, [dnl
> +  cat > conftest.c <<EOF
> +#include <sys/auxv.h>
> +int main(void) {
> +  if (! (getauxval (AT_HWCAP) & HWCAP_SVE))
> +    return 1;
> +  return 0;
> +}
> +EOF
> +  libc_cv_can_run_sve=yes
> +  ${CC-cc} conftest.c -o conftest
> +  ./conftest || libc_cv_can_run_sve=no
> +  rm -f conftest*])
> +LIBC_CONFIG_VAR([aarch64-can-run-sve], [$libc_cv_can_run_sve])
> +
> +if test x"$build_mathvec" = xnotset; then
> +  build_mathvec=yes
> +fi
> diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
> new file mode 100644
> index 0000000000..caf5d60669
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/Makefile
> @@ -0,0 +1,66 @@
> +float-advsimd-funcs = cos
> +
> +double-advsimd-funcs = cos
> +
> +float-sve-funcs = cos
> +
> +double-sve-funcs = cos
> +
> +ifeq ($(subdir),mathvec)
> +libmvec-support = $(addsuffix f_advsimd,$(float-advsimd-funcs)) \
> +                  $(addsuffix _advsimd,$(double-advsimd-funcs)) \
> +                  $(addsuffix f_sve,$(float-sve-funcs)) \
> +                  $(addsuffix _sve,$(double-sve-funcs))
> +endif
> +
> +sve-cflags = -march=armv8-a+sve
> +
> +
> +ifeq ($(build-mathvec),yes)
> +bench-libmvec = $(addprefix float-advsimd-,$(float-advsimd-funcs)) \
> +                $(addprefix double-advsimd-,$(double-advsimd-funcs))
> +
> +# If not on an SVE-enabled machine, do not add SVE routines to benchmarks.
> +# The routines are still built.
> +ifeq ($(aarch64-can-run-sve),yes)
> +  bench-libmvec += $(addprefix float-sve-,$(float-sve-funcs)) \
> +                   $(addprefix double-sve-,$(double-sve-funcs))
> +endif
> +endif
> +
> +$(objpfx)bench-float-advsimd-%.c:
> +	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py $(basename $(@F)) > $@
> +$(objpfx)bench-double-advsimd-%.c:
> +	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py $(basename $(@F)) > $@
> +$(objpfx)bench-float-sve-%.c:
> +	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py $(basename $(@F)) > $@
> +$(objpfx)bench-double-sve-%.c:
> +	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py $(basename $(@F)) > $@
> +
> +ifeq (${STATIC-BENCHTESTS},yes)
> +libmvec-benchtests = $(common-objpfx)mathvec/libmvec.a $(common-objpfx)math/libm.a
> +else
> +libmvec-benchtests = $(libmvec) $(libm)
> +endif
> +
> +$(addprefix $(objpfx)bench-,$(bench-libmvec)): $(libmvec-benchtests)
> +
> +ifeq ($(build-mathvec),yes)
> +libmvec-tests += float-advsimd double-advsimd float-sve double-sve
> +endif
> +
> +define sve-float-cflags-template
> +CFLAGS-$(1)f_sve.c += $(sve-cflags)
> +CFLAGS-bench-float-sve-$(1).c += $(sve-cflags)
> +endef
> +
> +define sve-double-cflags-template
> +CFLAGS-$(1)_sve.c += $(sve-cflags)
> +CFLAGS-bench-double-sve-$(1).c += $(sve-cflags)
> +endef
> +
> +$(foreach f,$(float-sve-funcs), $(eval $(call sve-float-cflags-template,$(f))))
> +$(foreach f,$(double-sve-funcs), $(eval $(call sve-double-cflags-template,$(f))))
> +
> +CFLAGS-test-float-sve-wrappers.c = $(sve-cflags)
> +CFLAGS-test-double-sve-wrappers.c = $(sve-cflags)
> diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
> new file mode 100644
> index 0000000000..5222a6f180
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/Versions
> @@ -0,0 +1,8 @@
> +libmvec {
> +  GLIBC_2.38 {
> +    _ZGVnN2v_cos;
> +    _ZGVnN4v_cosf;
> +    _ZGVsMxv_cos;
> +    _ZGVsMxv_cosf;
> +  }
> +}
> diff --git a/sysdeps/aarch64/fpu/advsimd_utils.h b/sysdeps/aarch64/fpu/advsimd_utils.h
> new file mode 100644
> index 0000000000..b597a18b8f
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/advsimd_utils.h
> @@ -0,0 +1,39 @@
> +/* Helpers for Advanced SIMD vector math funtions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <arm_neon.h>
> +
> +#define VPCS_ATTR __attribute__ ((aarch64_vector_pcs))
> +
> +#define V_NAME_F1(fun) _ZGVnN4v_##fun##f
> +#define V_NAME_D1(fun) _ZGVnN2v_##fun
> +#define V_NAME_F2(fun) _ZGVnN4vv_##fun##f
> +#define V_NAME_D2(fun) _ZGVnN2vv_##fun
> +
> +static inline float32x4_t

You might considere using __always_inline here if the idea is to use this functions
as macros.

> +v_call_f32 (float (*f) (float), float32x4_t x)
> +{
> +  return (float32x4_t){f (x[0]), f (x[1]), f (x[2]), f (x[3])};
> +}
> +
> +static inline float64x2_t
> +v_call_f64 (double (*f) (double), float64x2_t x)
> +{
> +  return (float64x2_t){f (x[0]), f (x[1])};
> +}
> diff --git a/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c b/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
> new file mode 100644
> index 0000000000..ca6a10d1fe
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
> @@ -0,0 +1,83 @@
> +/* Skeleton for libmvec benchmark programs.
> +   Copyright (C) 2021-2023 Free Software Foundation, Inc.

I think Copyright year is only 2023 here.

> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <string.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <time.h>
> +#include <inttypes.h>
> +#include <bench-timing.h>
> +#include <json-lib.h>
> +#include <bench-util.h>
> +#include <math-tests-arch.h>
> +
> +#include <bench-util.c>
> +#define D_ITERS 10000
> +
> +int
> +main (int argc, char **argv)
> +{
> +  unsigned long i, k;
> +  timing_t start, end;
> +  json_ctx_t json_ctx;> +
> +  bench_start ();
> +
> +#ifdef BENCH_INIT
> +  BENCH_INIT ();
> +#endif
> +
> +  json_init (&json_ctx, 2, stdout);
> +
> +  /* Begin function.  */
> +  json_attr_object_begin (&json_ctx, FUNCNAME);
> +
> +  for (int v = 0; v < NUM_VARIANTS; v++)
> +    {
> +      double d_total_time = 0;
> +      timing_t cur;
> +      for (k = 0; k < D_ITERS; k++)
> +	{
> +	  TIMING_NOW (start);
> +	  for (i = 0; i < NUM_SAMPLES (v); i++)
> +	    BENCH_FUNC (v, i);
> +	  TIMING_NOW (end);
> +
> +	  TIMING_DIFF (cur, start, end);
> +
> +	  TIMING_ACCUM (d_total_time, cur);
> +	}
> +      double d_total_data_set = D_ITERS * NUM_SAMPLES (v) * STRIDE;
> +
> +      /* Begin variant.  */
> +      json_attr_object_begin (&json_ctx, VARIANT (v));
> +
> +      json_attr_double (&json_ctx, "duration", d_total_time);
> +      json_attr_double (&json_ctx, "iterations", d_total_data_set);
> +      json_attr_double (&json_ctx, "mean", d_total_time / d_total_data_set);
> +
> +      /* End variant.  */
> +      json_attr_object_end (&json_ctx);
> +    }
> +
> +  /* End function.  */
> +  json_attr_object_end (&json_ctx);
> +
> +  return 0;
> +}

This file is quite similar to x86_64 modulo the extra CPU_FEATURE_ACTIVE checks.
Maybe try to refactor to use a common definition and parametrize the x86 code
on a arch-specific code?

> diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
> new file mode 100644
> index 0000000000..a25845bff8
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/bits/math-vector.h
> @@ -0,0 +1,65 @@
> +/* Platform-specific SIMD declarations of math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _MATH_H
> +# error "Never include <bits/math-vector.h> directly;\
> + include <math.h> instead."
> +#endif
> +
> +/* Get default empty definitions for simd declarations.  */
> +#include <bits/libm-simd-decl-stubs.h>
> +
> +#if __GNUC_PREREQ (9, 0)

I think these tests should move to configure tests instead, it advertises beforehand
the user that it needs to update the compiler instead through a compiler error.

The configure check will then check for both advsimd and SVE support, so there is
no need for __ADVSIMD_VEC_MATH_SUPPORTED or __SVE_VEC_MATH_SUPPORTED.

> +# define __ADVSIMD_VEC_MATH_SUPPORTED
> +typedef __Float32x4_t __f32x4_t;
> +typedef __Float64x2_t __f64x2_t;
> +#elif __clang_major__ >= 8
> +# define __ADVSIMD_VEC_MATH_SUPPORTED
> +typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
> +typedef __attribute__((__neon_vector_type__(2))) double __f64x2_t;
> +#endif
> +
> +#if __GNUC_PREREQ (10, 0) || __clang_major >= 11
> +# define __SVE_VEC_MATH_SUPPORTED
> +typedef __SVFloat32_t __sv_f32_t;
> +typedef __SVFloat64_t __sv_f64_t;
> +typedef __SVBool_t __sv_bool_t;
> +#endif
> +
> +/* If vector types and vector PCS are unsupported in the working
> +   compiler, no choice but to omit vector math declarations.  */
> +
> +#ifdef __ADVSIMD_VEC_MATH_SUPPORTED
> +
> +# define __vpcs __attribute__((__aarch64_vector_pcs__))
> +
> +__vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
> +__vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t);
> +
> +#undef __ADVSIMD_VEC_MATH_SUPPORTED
> +#endif /* __ADVSIMD_VEC_MATH_SUPPORTED */
> +
> +#ifdef __SVE_VEC_MATH_SUPPORTED
> +
> +__sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
> +__sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t);
> +
> +#undef __SVE_VEC_MATH_SUPPORTED
> +#endif /* __SVE_VEC_MATH_SUPPORTED */
> +
> diff --git a/sysdeps/aarch64/fpu/cos_advsimd.c b/sysdeps/aarch64/fpu/cos_advsimd.c
> new file mode 100644
> index 0000000000..5a42fbb182
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/cos_advsimd.c
> @@ -0,0 +1,28 @@
> +/* Double-precision vector (Advanced SIMD) cos function.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <math.h>
> +
> +#include "advsimd_utils.h"
> +
> +VPCS_ATTR
> +float64x2_t V_NAME_D1 (cos) (float64x2_t x)
> +{
> +  return v_call_f64 (cos, x);
> +}
> diff --git a/sysdeps/aarch64/fpu/cos_sve.c b/sysdeps/aarch64/fpu/cos_sve.c
> new file mode 100644
> index 0000000000..62bd2ece0e
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/cos_sve.c
> @@ -0,0 +1,27 @@
> +/* Double-precision vector (SVE) cos function.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <math.h>
> +
> +#include "sve_utils.h"
> +
> +svfloat64_t SV_NAME_D1 (cos) (svfloat64_t x, svbool_t pg)
> +{
> +  return sv_call_f64 (cos, x, svdup_n_f64 (0), pg);
> +}
> diff --git a/sysdeps/aarch64/fpu/cosf_advsimd.c b/sysdeps/aarch64/fpu/cosf_advsimd.c
> new file mode 100644
> index 0000000000..23f54bd905
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/cosf_advsimd.c
> @@ -0,0 +1,28 @@
> +/* Single-precision vector (Advanced SIMD) cos function.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <math.h>
> +
> +#include "advsimd_utils.h"
> +
> +VPCS_ATTR
> +float32x4_t V_NAME_F1 (cos) (float32x4_t x)
> +{
> +  return v_call_f32 (cosf, x);
> +}
> diff --git a/sysdeps/aarch64/fpu/cosf_sve.c b/sysdeps/aarch64/fpu/cosf_sve.c
> new file mode 100644
> index 0000000000..0c4e365e1e
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/cosf_sve.c
> @@ -0,0 +1,27 @@
> +/* Single-precision vector (SVE) cos function.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <math.h>
> +
> +#include "sve_utils.h"
> +
> +svfloat32_t SV_NAME_F1 (cos) (svfloat32_t x, svbool_t pg)
> +{
> +  return sv_call_f32 (cosf, x, svdup_n_f32 (0), pg);
> +}
> diff --git a/sysdeps/aarch64/fpu/libm-test-ulps b/sysdeps/aarch64/fpu/libm-test-ulps
> new file mode 100644
> index 0000000000..b199d7ddab
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/libm-test-ulps
> @@ -0,0 +1,7 @@
> +Function: "cos_advsimd":
> +double: 2
> +float: 2
> +
> +Function: "cos_sve":
> +double: 2
> +float: 2
> \ No newline at end of file

Bogus line feed here.

> diff --git a/sysdeps/aarch64/fpu/libm-test-ulps-name b/sysdeps/aarch64/fpu/libm-test-ulps-name
> new file mode 100644
> index 0000000000..1f66c5cda0
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/libm-test-ulps-name
> @@ -0,0 +1 @@
> +AArch64
> diff --git a/sysdeps/aarch64/fpu/math-tests-arch.h b/sysdeps/aarch64/fpu/math-tests-arch.h
> new file mode 100644
> index 0000000000..263d4cabf1
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/math-tests-arch.h
> @@ -0,0 +1,34 @@
> +/* Runtime architecture check for math tests. AArch64 version.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifdef REQUIRE_SVE
> +# include <sys/auxv.h>
> +
> +# define INIT_ARCH_EXT
> +# define CHECK_ARCH_EXT							\
> +   do									\
> +     {									\
> +       if (!(getauxval (AT_HWCAP) & HWCAP_SVE)) return;			\
> +     }									\
> +   while (0)
> +
> +#else
> +# include <sysdeps/generic/math-tests-arch.h>
> +#endif
> +

Spurions new line here.

> diff --git a/sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py b/sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py
> new file mode 100644
> index 0000000000..9c092670d7
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py
> @@ -0,0 +1,91 @@
> +#!/usr/bin/python3
> +# Copyright (C) 2023 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <https://www.gnu.org/licenses/>.
> +
> +import sys
> +
> +TEMPLATE = """
> +#include <math.h>
> +#include <arm_neon.h>
> +
> +#define STRIDE {stride}
> +
> +#define CALL_BENCH_FUNC(v, i) (__extension__ ({{                         \\
> +   {rtype} mx0 = {fname}(vld1q_f{prec_short} (variants[v].in[i].arg0));  \\
> +   mx0; }}))
> +
> +struct args
> +{{
> +  {stype} arg0[STRIDE];
> +  double timing;
> +}};
> +
> +struct _variants
> +{{
> +  const char *name;
> +  int count;
> +  struct args *in;
> +}};
> +
> +struct args in0[{rowcount}] = {{
> +{in_data}
> +}};
> +
> +struct _variants variants[1] = {{
> +  {{"", {rowcount}, in0}},
> +}};

Maybe define them as static const?

> +
> +#define NUM_VARIANTS 1
> +#define NUM_SAMPLES(i) (variants[i].count)
> +#define VARIANT(i) (variants[i].name)
> +
> +// Cannot pass volatile pointer to svst1. This still does not appear to get optimised out.
> +static {rtype} volatile ret;
> +
> +#define BENCH_FUNC(i, j) ({{ ret = CALL_BENCH_FUNC(i, j); }})
> +#define FUNCNAME "{fname}"
> +#include <bench-libmvec-skeleton.c>
> +"""
> +
> +def main(name):
> +    _, prec, _, func = name.split("-")
> +    scalar_to_advsimd_type = {"double": "float64x2_t", "float": "float32x4_t"}
> +
> +    stride = {"double": 2, "float": 4}[prec]
> +    rtype = scalar_to_advsimd_type[prec]
> +    atype = scalar_to_advsimd_type[prec]
> +    fname = f"_ZGVnN{stride}v_{func}{'f' if prec == 'float' else ''}"
> +    prec_short = {"double": 64, "float": 32}[prec]
> +
> +    with open(f"../benchtests/{func}-inputs") as f:
> +        in_vals = [l.strip() for l in f.readlines() if l and not l.startswith("#")]
> +    in_vals = [in_vals[i:i+stride] for i in range(0, len(in_vals), stride)]
> +    rowcount= len(in_vals)
> +    in_data = ",\n".join("{{" + ", ".join(row) + "}, 0}" for row in in_vals)
> +
> +    print(TEMPLATE.format(stride=stride,
> +                          rtype=rtype,
> +                          atype=atype,
> +                          fname=fname,
> +                          prec_short=prec_short,
> +                          in_data=in_data,
> +                          rowcount=rowcount,
> +                          stype=prec))
> +
> +
> +if __name__ == "__main__":
> +    main(sys.argv[1])
> diff --git a/sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py b/sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py
> new file mode 100755
> index 0000000000..0ea21c4c69
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py
> @@ -0,0 +1,93 @@
> +#!/usr/bin/python3
> +# Copyright (C) 2023 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <https://www.gnu.org/licenses/>.
> +
> +import sys
> +
> +TEMPLATE = """
> +#include <math.h>
> +#include <arm_sve.h>
> +
> +#define STRIDE {stride}
> +
> +#define CALL_BENCH_FUNC(v, i) (__extension__ ({{                         \\
> +   {rtype} mx0 = {fname}(svld1rq_f{prec_short} (svptrue_b{prec_short}(), \\
> +                                                variants[v].in[i].arg0), \\
> +                         svptrue_b{prec_short}());                       \\
> +   mx0; }}))
> +
> +struct args
> +{{
> +  {stype} arg0[STRIDE];
> +  double timing;
> +}};
> +
> +struct _variants
> +{{
> +  const char *name;
> +  int count;
> +  struct args *in;
> +}};
> +
> +struct args in0[{rowcount}] = {{
> +{in_data}
> +}};
> +
> +struct _variants variants[1] = {{
> +  {{"", {rowcount}, in0}},
> +}};
> +
> +#define NUM_VARIANTS 1
> +#define NUM_SAMPLES(i) (variants[i].count)
> +#define VARIANT(i) (variants[i].name)
> +
> +// Cannot pass volatile pointer to svst1. This still does not appear to get optimised out.
> +static {stype} /*volatile*/ ret[STRIDE];
> +
> +#define BENCH_FUNC(i, j) ({{ svst1_f{prec_short}(svwhilelt_b{prec_short}(0, 4), ret, CALL_BENCH_FUNC(i, j)); }})
> +#define FUNCNAME "{fname}"
> +#include <bench-libmvec-skeleton.c>
> +"""
> +
> +def main(name):
> +    _, prec, _, func = name.split("-")
> +    scalar_to_sve_type = {"double": "svfloat64_t", "float": "svfloat32_t"}
> +
> +    stride = {"double": 2, "float": 4}[prec]
> +    rtype = scalar_to_sve_type[prec]
> +    atype = scalar_to_sve_type[prec]
> +    fname = f"_ZGVsMxv_{func}{'f' if prec == 'float' else ''}"
> +    prec_short = {"double": 64, "float": 32}[prec]
> +
> +    with open(f"../benchtests/{func}-inputs") as f:
> +        in_vals = [l.strip() for l in f.readlines() if l and not l.startswith("#")]
> +    in_vals = [in_vals[i:i+stride] for i in range(0, len(in_vals), stride)]
> +    rowcount= len(in_vals)
> +    in_data = ",\n".join("{{" + ", ".join(row) + "}, 0}" for row in in_vals)
> +
> +    print(TEMPLATE.format(stride=stride,
> +                          rtype=rtype,
> +                          atype=atype,
> +                          fname=fname,
> +                          prec_short=prec_short,
> +                          in_data=in_data,
> +                          rowcount=rowcount,
> +                          stype=prec))
> +
> +
> +if __name__ == "__main__":
> +    main(sys.argv[1])
> diff --git a/sysdeps/aarch64/fpu/sve_utils.h b/sysdeps/aarch64/fpu/sve_utils.h
> new file mode 100644
> index 0000000000..dbdc03387c
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/sve_utils.h
> @@ -0,0 +1,55 @@
> +/* Helpers for SVE vector math funtions.

s/funtions/functions

> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <arm_sve.h>
> +
> +#define SV_NAME_F1(fun) _ZGVsMxv_##fun##f
> +#define SV_NAME_D1(fun) _ZGVsMxv_##fun
> +#define SV_NAME_F2(fun) _ZGVsMxvv_##fun##f
> +#define SV_NAME_D2(fun) _ZGVsMxvv_##fun
> +
> +static inline svfloat32_t
> +sv_call_f32 (float (*f) (float), svfloat32_t x, svfloat32_t y, svbool_t cmp)
> +{
> +  svbool_t p = svpfirst (cmp, svpfalse ());
> +  while (svptest_any (cmp, p))
> +    {
> +      float elem = svclastb_n_f32 (p, 0, x);
> +      elem = (*f) (elem);
> +      svfloat32_t y2 = svdup_n_f32 (elem);
> +      y = svsel_f32 (p, y2, y);
> +      p = svpnext_b32 (cmp, p);
> +    }
> +  return y;
> +}
> +
> +static inline svfloat64_t
> +sv_call_f64 (double (*f) (double), svfloat64_t x, svfloat64_t y, svbool_t cmp)
> +{
> +  svbool_t p = svpfirst (cmp, svpfalse ());
> +  while (svptest_any (cmp, p))
> +    {
> +      double elem = svclastb_n_f64 (p, 0, x);
> +      elem = (*f) (elem);
> +      svfloat64_t y2 = svdup_n_f64 (elem);
> +      y = svsel_f64 (p, y2, y);
> +      p = svpnext_b64 (cmp, p);
> +    }
> +  return y;
> +}
> diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
> new file mode 100644
> index 0000000000..52e330f469
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
> @@ -0,0 +1,26 @@
> +/* Scalar wrappers for double-precision Advanced SIMD vector math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <arm_neon.h>
> +
> +#include "test-double-advsimd.h"
> +
> +#define VEC_TYPE float64x2_t
> +
> +VPCS_VECTOR_WRAPPER(cos_advsimd, _ZGVnN2v_cos)
> diff --git a/sysdeps/aarch64/fpu/test-double-advsimd.h b/sysdeps/aarch64/fpu/test-double-advsimd.h
> new file mode 100644
> index 0000000000..8bd32b97fa
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/test-double-advsimd.h
> @@ -0,0 +1,25 @@
> +/* Test declarations for double-precision Advanced SIMD vector math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "test-double.h"
> +#include "test-math-vector.h"
> +#include "test-vpcs-vector-wrapper.h"
> +
> +#define VEC_SUFF _advsimd
> +#define VEC_LEN 2
> diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
> new file mode 100644
> index 0000000000..8edc5ed5ab
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
> @@ -0,0 +1,34 @@
> +/* Scalar wrappers for double-precision SVE vector math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <arm_sve.h>
> +
> +#include "test-double-sve.h"
> +
> +/* Wrapper from scalar to SVE function. Cannot just use VECTOR_WRAPPER due to predication.  */
> +#define SVE_VECTOR_WRAPPER(scalar_func, vector_func)			\
> +  extern VEC_TYPE vector_func (VEC_TYPE, svbool_t);			\
> +FLOAT scalar_func (FLOAT x)						\
> +{									\
> +  VEC_TYPE mx = svdup_n_f64 (x);					\
> +  VEC_TYPE mr = vector_func (mx, svptrue_b64 ());			\
> +  return svlastb_f64 (svptrue_b64 (), mr);				\
> +}
> +
> +SVE_VECTOR_WRAPPER(cos_sve, _ZGVsMxv_cos)
> diff --git a/sysdeps/aarch64/fpu/test-double-sve.h b/sysdeps/aarch64/fpu/test-double-sve.h
> new file mode 100644
> index 0000000000..857a40861d
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/test-double-sve.h
> @@ -0,0 +1,26 @@
> +/* Test declarations for double-precision SVE vector math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "test-double.h"
> +#include "test-math-vector.h"
> +
> +#define REQUIRE_SVE
> +#define VEC_SUFF _sve
> +#define VEC_LEN svcntd()
> +#define VEC_TYPE svfloat64_t
> diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
> new file mode 100644
> index 0000000000..3577ca93b8
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
> @@ -0,0 +1,26 @@
> +/* Scalar wrappers for single-precision Advanced SIMD vector math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <arm_neon.h>
> +
> +#include "test-float-advsimd.h"
> +
> +#define VEC_TYPE float32x4_t
> +
> +VPCS_VECTOR_WRAPPER(cosf_advsimd, _ZGVnN4v_cosf)
> diff --git a/sysdeps/aarch64/fpu/test-float-advsimd.h b/sysdeps/aarch64/fpu/test-float-advsimd.h
> new file mode 100644
> index 0000000000..86fce613cd
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/test-float-advsimd.h
> @@ -0,0 +1,25 @@
> +/* Test declarations for singlex-precision Advanced SIMD vector math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "test-float.h"
> +#include "test-math-vector.h"
> +#include "test-vpcs-vector-wrapper.h"
> +
> +#define VEC_SUFF _advsimd
> +#define VEC_LEN 4
> diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
> new file mode 100644
> index 0000000000..b6a944d502
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
> @@ -0,0 +1,34 @@
> +/* Scalar wrappers for single-precision SVE vector math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <arm_sve.h>
> +
> +#include "test-float-sve.h"
> +
> +/* Wrapper from scalar to SVE function. Cannot just use VECTOR_WRAPPER due to predication.  */
> +#define SVE_VECTOR_WRAPPER(scalar_func, vector_func)			\
> +  extern VEC_TYPE vector_func (VEC_TYPE, svbool_t);			\
> +FLOAT scalar_func (FLOAT x)						\
> +{									\
> +  VEC_TYPE mx = svdup_n_f32 (x);					\
> +  VEC_TYPE mr = vector_func (mx, svptrue_b32 ());			\
> +  return svlastb_f32 (svptrue_b32 (), mr);				\
> +}
> +
> +SVE_VECTOR_WRAPPER(cosf_sve, _ZGVsMxv_cosf)
> diff --git a/sysdeps/aarch64/fpu/test-float-sve.h b/sysdeps/aarch64/fpu/test-float-sve.h
> new file mode 100644
> index 0000000000..d6e122cf67
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/test-float-sve.h
> @@ -0,0 +1,26 @@
> +/* Test declarations for single-precision SVE vector math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include "test-float.h"
> +#include "test-math-vector.h"
> +
> +#define REQUIRE_SVE
> +#define VEC_SUFF _sve
> +#define VEC_LEN svcntw()
> +#define VEC_TYPE svfloat32_t
> diff --git a/sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h b/sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h
> new file mode 100644
> index 0000000000..eb0f0db838
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h
> @@ -0,0 +1,30 @@
> +/* Scalar wrapper for vpcs-enabled Advanced SIMD vector math functions.
> +
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define VPCS_VECTOR_WRAPPER(scalar_func, vector_func)				\
> +extern __attribute__ ((aarch64_vector_pcs)) VEC_TYPE vector_func (VEC_TYPE);	\
> +FLOAT scalar_func (FLOAT x)							\
> +{										\
> +  int i;									\
> +  VEC_TYPE mx;									\
> +  INIT_VEC_LOOP (mx, x, VEC_LEN);						\
> +  VEC_TYPE mr = vector_func (mx);						\
> +  TEST_VEC_LOOP (mr, VEC_LEN);							\
> +  return ((FLOAT) mr[0]);							\
> +}
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
> new file mode 100644
> index 0000000000..13af421af2
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
> @@ -0,0 +1,4 @@
> +GLIBC_2.38 _ZGVnN2v_cos F
> +GLIBC_2.38 _ZGVnN4v_cosf F
> +GLIBC_2.38 _ZGVsMxv_cos F
> +GLIBC_2.38 _ZGVsMxv_cosf F
  
Joe Ramsay Feb. 9, 2023, 12:43 p.m. UTC | #3
Thanks for the comments. I will attempt a patch that addresses them, in 
the meantime just a few questions:

On 08/02/2023 13:11, Adhemerval Zanella Netto wrote:
> 
> 
> On 07/02/23 08:35, Joe Ramsay via Libc-alpha wrote:
>> Hi,
>>
>> The attached patch is an attempt to enable libmvec on AArch64. The
>> proposed change is mainly implementing build infrastructure to add the
>> new routines to ABI, tests and benchmarks. I have demonstrated how
>> this all fits together by adding implementations for vector cos, in
>> both single and double precision, targeting both Advanced SIMD and
>> SVE.
>>
>> The implementations of the routines themselves are just loops over the
>> scalar routine from libm for now, as we are more concerned with
>> getting the plumbing right at this point. We plan to contribute vector
>> routines from the Arm Optimized Routines repo that are compliant with
>> requirements described in the libmvec wiki.
>>
>> Any comments/thoughts much appreciated! In particular, the patch
>> raises the minimum GCC to 10, in order to be able to submit routines
>> written using ACLE instead of assembly. This is clearly a big jump,
>> but we have options if this is not acceptable. One option would be to
>> submit compiler-generated assembly, similar to the equivalent routines
>> under sysdeps/x86_64. If GCC 9 is an acceptable compromise then this
>> would only have to be for SVE routines.
> 
> Using C implementation with intrinsics would be idea, there are more easily
> maintained and can leverage compiler improvements.  I rather do it instead
> of the assembly dump Intel did.
> 
> The minimum GCC 10 is not ideal, however I don't see it as blocker either
> (it should be up to arch-maintainers).  One option might be check if
> compiler does not support building libmvec, disable the build and related
> checks.  It is not ideal either, since the resulting glibc won't have
> a complete ABI.
> 
> 
OK, let's see what arch-maintainers make of it.
>>
>> Also, are there plans to merge libmvec into libm, or will they be kept
>> separate?
> 
> There is none afaik.  The libpthread, librt, etc. merge was done to
> fix long standing design and maintanance issues that is not really presented
> with libm and libmvec.  There is still the partial upgrade one, but
> it is still present with a disjoint ld, libc, libm anyway.
> 
> However, it is feasible to merge if your willing to work on it.  We will
> need to keep the x86_64 lib with the sentinel compat symbol (similar to
> what we did for libpthread).
> 
> What I would like to avoid is to have different arquitectures using different
> approaches, for instance aarch64 begin merged while having x86_64 still
> using a different library.  It add a slight more complexity to the build
> process and extra arch specific boilerplate code.
> 
This sounds good - keeping them separate would be our choice too. I have 
not come across the sentinel compat symbol - is this something we need 
to do for AArch64 also?
>>
>> Note that at this point users have to manually call the vector math
>> functions, there is no declaration in math.h to assist auto
>> vectorization of scalar math calls. This seems to be acceptable to
>> some downstream users.
> 
> I think that's the current approach for x86_64 anyway, since most usages
> are done through compiler autovectorization code.
> 
> Some comments below.
> 
>>
>> Thanks,
>> Joe
>> ---
>>   INSTALL                                       |  3 +
>>   manual/install.texi                           |  3 +
>>   sysdeps/aarch64/configure                     | 28 ++++++
>>   sysdeps/aarch64/configure.ac                  | 20 ++++
>>   sysdeps/aarch64/fpu/Makefile                  | 66 +++++++++++++
>>   sysdeps/aarch64/fpu/Versions                  |  8 ++
>>   sysdeps/aarch64/fpu/advsimd_utils.h           | 39 ++++++++
>>   sysdeps/aarch64/fpu/bench-libmvec-skeleton.c  | 83 +++++++++++++++++
>>   sysdeps/aarch64/fpu/bits/math-vector.h        | 65 +++++++++++++
>>   sysdeps/aarch64/fpu/cos_advsimd.c             | 28 ++++++
>>   sysdeps/aarch64/fpu/cos_sve.c                 | 27 ++++++
>>   sysdeps/aarch64/fpu/cosf_advsimd.c            | 28 ++++++
>>   sysdeps/aarch64/fpu/cosf_sve.c                | 27 ++++++
>>   sysdeps/aarch64/fpu/libm-test-ulps            |  7 ++
>>   sysdeps/aarch64/fpu/libm-test-ulps-name       |  1 +
>>   sysdeps/aarch64/fpu/math-tests-arch.h         | 34 +++++++
>>   .../fpu/scripts/bench_libmvec_advsimd.py      | 91 ++++++++++++++++++
>>   .../aarch64/fpu/scripts/bench_libmvec_sve.py  | 93 +++++++++++++++++++
>>   sysdeps/aarch64/fpu/sve_utils.h               | 55 +++++++++++
>>   .../fpu/test-double-advsimd-wrappers.c        | 26 ++++++
>>   sysdeps/aarch64/fpu/test-double-advsimd.h     | 25 +++++
>>   .../aarch64/fpu/test-double-sve-wrappers.c    | 34 +++++++
>>   sysdeps/aarch64/fpu/test-double-sve.h         | 26 ++++++
>>   .../aarch64/fpu/test-float-advsimd-wrappers.c | 26 ++++++
>>   sysdeps/aarch64/fpu/test-float-advsimd.h      | 25 +++++
>>   sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 34 +++++++
>>   sysdeps/aarch64/fpu/test-float-sve.h          | 26 ++++++
>>   .../aarch64/fpu/test-vpcs-vector-wrapper.h    | 30 ++++++
>>   .../unix/sysv/linux/aarch64/libmvec.abilist   |  4 +
>>   29 files changed, 962 insertions(+)
>>   create mode 100644 sysdeps/aarch64/fpu/Makefile
>>   create mode 100644 sysdeps/aarch64/fpu/Versions
>>   create mode 100644 sysdeps/aarch64/fpu/advsimd_utils.h
>>   create mode 100644 sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
>>   create mode 100644 sysdeps/aarch64/fpu/bits/math-vector.h
>>   create mode 100644 sysdeps/aarch64/fpu/cos_advsimd.c
>>   create mode 100644 sysdeps/aarch64/fpu/cos_sve.c
>>   create mode 100644 sysdeps/aarch64/fpu/cosf_advsimd.c
>>   create mode 100644 sysdeps/aarch64/fpu/cosf_sve.c
>>   create mode 100644 sysdeps/aarch64/fpu/libm-test-ulps
>>   create mode 100644 sysdeps/aarch64/fpu/libm-test-ulps-name
>>   create mode 100644 sysdeps/aarch64/fpu/math-tests-arch.h
>>   create mode 100644 sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py
>>   create mode 100755 sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py
>>   create mode 100644 sysdeps/aarch64/fpu/sve_utils.h
>>   create mode 100644 sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
>>   create mode 100644 sysdeps/aarch64/fpu/test-double-advsimd.h
>>   create mode 100644 sysdeps/aarch64/fpu/test-double-sve-wrappers.c
>>   create mode 100644 sysdeps/aarch64/fpu/test-double-sve.h
>>   create mode 100644 sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
>>   create mode 100644 sysdeps/aarch64/fpu/test-float-advsimd.h
>>   create mode 100644 sysdeps/aarch64/fpu/test-float-sve-wrappers.c
>>   create mode 100644 sysdeps/aarch64/fpu/test-float-sve.h
>>   create mode 100644 sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h
>>   create mode 100644 sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
>>
>> diff --git a/INSTALL b/INSTALL
>> index 970d6627e2..ba800e41d6 100644
>> --- a/INSTALL
>> +++ b/INSTALL
>> @@ -524,6 +524,9 @@ build the GNU C Library:
>>        For s390x architecture builds, GCC 7.1 or higher is needed (See gcc
>>        Bug 98269).
>>   
>> +     For AArch64 architecture builds with mathvec enabled, GCC 10 or
>> +     higher is needed due to dependency on arm_sve.h.
>> +
>>        For multi-arch support it is recommended to use a GCC which has
>>        been built with support for GNU indirect functions.  This ensures
>>        that correct debugging information is generated for functions
>> diff --git a/manual/install.texi b/manual/install.texi
>> index 260f8a5c82..e9c62b51ae 100644
>> --- a/manual/install.texi
>> +++ b/manual/install.texi
>> @@ -567,6 +567,9 @@ For ARC architecture builds, GCC 8.3 or higher is needed.
>>   
>>   For s390x architecture builds, GCC 7.1 or higher is needed (See gcc Bug 98269).
>>   
>> +For AArch64 architecture builds with mathvec enabled, GCC 10 or higher is needed
>> +due to dependency on arm_sve.h.
>> +
>>   For multi-arch support it is recommended to use a GCC which has been built with
>>   support for GNU indirect functions.  This ensures that correct debugging
>>   information is generated for functions selected by IFUNC resolvers.  This
>> diff --git a/sysdeps/aarch64/configure b/sysdeps/aarch64/configure
>> index 2130f6b8f8..a71c32d70f 100644
>> --- a/sysdeps/aarch64/configure
>> +++ b/sysdeps/aarch64/configure
>> @@ -327,3 +327,31 @@ if test $libc_cv_aarch64_sve_asm = yes; then
>>     $as_echo "#define HAVE_AARCH64_SVE_ASM 1" >>confdefs.h
>>   
>>   fi
>> +
>> +# Check if the local system can run SVE binary
>> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for local SVE hardware" >&5
>> +$as_echo_n "checking for local SVE hardware... " >&6; }
>> +if ${libc_cv_can_run_sve+:} false; then :
>> +  $as_echo_n "(cached) " >&6
>> +else
>> +    cat > conftest.c <<EOF
>> +#include <sys/auxv.h>
>> +int main(void) {
>> +  if (! (getauxval (AT_HWCAP) & HWCAP_SVE))
>> +    return 1;
>> +  return 0;
>> +}
>> +EOF
>> +  libc_cv_can_run_sve=yes
>> +  ${CC-cc} conftest.c -o conftest
>> +  ./conftest || libc_cv_can_run_sve=no
>> +  rm -f conftest*
>> +fi
>> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_can_run_sve" >&5
>> +$as_echo "$libc_cv_can_run_sve" >&6; }
>> +config_vars="$config_vars
>> +aarch64-can-run-sve = $libc_cv_can_run_sve"
>> +
>> +if test x"$build_mathvec" = xnotset; then
>> +  build_mathvec=yes
>> +fi
>> diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac
>> index 85c6f76508..688f8772a6 100644
>> --- a/sysdeps/aarch64/configure.ac
>> +++ b/sysdeps/aarch64/configure.ac
>> @@ -101,3 +101,23 @@ rm -f conftest*])
>>   if test $libc_cv_aarch64_sve_asm = yes; then
>>     AC_DEFINE(HAVE_AARCH64_SVE_ASM)
>>   fi
>> +
>> +# Check if the local system can run SVE binary
>> +AC_CACHE_CHECK(for local SVE hardware, libc_cv_can_run_sve, [dnl
>> +  cat > conftest.c <<EOF
>> +#include <sys/auxv.h>
>> +int main(void) {
>> +  if (! (getauxval (AT_HWCAP) & HWCAP_SVE))
>> +    return 1;
>> +  return 0;
>> +}
>> +EOF
>> +  libc_cv_can_run_sve=yes
>> +  ${CC-cc} conftest.c -o conftest
>> +  ./conftest || libc_cv_can_run_sve=no
>> +  rm -f conftest*])
>> +LIBC_CONFIG_VAR([aarch64-can-run-sve], [$libc_cv_can_run_sve])
>> +
>> +if test x"$build_mathvec" = xnotset; then
>> +  build_mathvec=yes
>> +fi
>> diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
>> new file mode 100644
>> index 0000000000..caf5d60669
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/Makefile
>> @@ -0,0 +1,66 @@
>> +float-advsimd-funcs = cos
>> +
>> +double-advsimd-funcs = cos
>> +
>> +float-sve-funcs = cos
>> +
>> +double-sve-funcs = cos
>> +
>> +ifeq ($(subdir),mathvec)
>> +libmvec-support = $(addsuffix f_advsimd,$(float-advsimd-funcs)) \
>> +                  $(addsuffix _advsimd,$(double-advsimd-funcs)) \
>> +                  $(addsuffix f_sve,$(float-sve-funcs)) \
>> +                  $(addsuffix _sve,$(double-sve-funcs))
>> +endif
>> +
>> +sve-cflags = -march=armv8-a+sve
>> +
>> +
>> +ifeq ($(build-mathvec),yes)
>> +bench-libmvec = $(addprefix float-advsimd-,$(float-advsimd-funcs)) \
>> +                $(addprefix double-advsimd-,$(double-advsimd-funcs))
>> +
>> +# If not on an SVE-enabled machine, do not add SVE routines to benchmarks.
>> +# The routines are still built.
>> +ifeq ($(aarch64-can-run-sve),yes)
>> +  bench-libmvec += $(addprefix float-sve-,$(float-sve-funcs)) \
>> +                   $(addprefix double-sve-,$(double-sve-funcs))
>> +endif
>> +endif
>> +
>> +$(objpfx)bench-float-advsimd-%.c:
>> +	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py $(basename $(@F)) > $@
>> +$(objpfx)bench-double-advsimd-%.c:
>> +	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py $(basename $(@F)) > $@
>> +$(objpfx)bench-float-sve-%.c:
>> +	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py $(basename $(@F)) > $@
>> +$(objpfx)bench-double-sve-%.c:
>> +	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py $(basename $(@F)) > $@
>> +
>> +ifeq (${STATIC-BENCHTESTS},yes)
>> +libmvec-benchtests = $(common-objpfx)mathvec/libmvec.a $(common-objpfx)math/libm.a
>> +else
>> +libmvec-benchtests = $(libmvec) $(libm)
>> +endif
>> +
>> +$(addprefix $(objpfx)bench-,$(bench-libmvec)): $(libmvec-benchtests)
>> +
>> +ifeq ($(build-mathvec),yes)
>> +libmvec-tests += float-advsimd double-advsimd float-sve double-sve
>> +endif
>> +
>> +define sve-float-cflags-template
>> +CFLAGS-$(1)f_sve.c += $(sve-cflags)
>> +CFLAGS-bench-float-sve-$(1).c += $(sve-cflags)
>> +endef
>> +
>> +define sve-double-cflags-template
>> +CFLAGS-$(1)_sve.c += $(sve-cflags)
>> +CFLAGS-bench-double-sve-$(1).c += $(sve-cflags)
>> +endef
>> +
>> +$(foreach f,$(float-sve-funcs), $(eval $(call sve-float-cflags-template,$(f))))
>> +$(foreach f,$(double-sve-funcs), $(eval $(call sve-double-cflags-template,$(f))))
>> +
>> +CFLAGS-test-float-sve-wrappers.c = $(sve-cflags)
>> +CFLAGS-test-double-sve-wrappers.c = $(sve-cflags)
>> diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
>> new file mode 100644
>> index 0000000000..5222a6f180
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/Versions
>> @@ -0,0 +1,8 @@
>> +libmvec {
>> +  GLIBC_2.38 {
>> +    _ZGVnN2v_cos;
>> +    _ZGVnN4v_cosf;
>> +    _ZGVsMxv_cos;
>> +    _ZGVsMxv_cosf;
>> +  }
>> +}
>> diff --git a/sysdeps/aarch64/fpu/advsimd_utils.h b/sysdeps/aarch64/fpu/advsimd_utils.h
>> new file mode 100644
>> index 0000000000..b597a18b8f
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/advsimd_utils.h
>> @@ -0,0 +1,39 @@
>> +/* Helpers for Advanced SIMD vector math funtions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <arm_neon.h>
>> +
>> +#define VPCS_ATTR __attribute__ ((aarch64_vector_pcs))
>> +
>> +#define V_NAME_F1(fun) _ZGVnN4v_##fun##f
>> +#define V_NAME_D1(fun) _ZGVnN2v_##fun
>> +#define V_NAME_F2(fun) _ZGVnN4vv_##fun##f
>> +#define V_NAME_D2(fun) _ZGVnN2vv_##fun
>> +
>> +static inline float32x4_t
> 
> You might considere using __always_inline here if the idea is to use this functions
> as macros.
> 
>> +v_call_f32 (float (*f) (float), float32x4_t x)
>> +{
>> +  return (float32x4_t){f (x[0]), f (x[1]), f (x[2]), f (x[3])};
>> +}
>> +
>> +static inline float64x2_t
>> +v_call_f64 (double (*f) (double), float64x2_t x)
>> +{
>> +  return (float64x2_t){f (x[0]), f (x[1])};
>> +}
>> diff --git a/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c b/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
>> new file mode 100644
>> index 0000000000..ca6a10d1fe
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
>> @@ -0,0 +1,83 @@
>> +/* Skeleton for libmvec benchmark programs.
>> +   Copyright (C) 2021-2023 Free Software Foundation, Inc.
> 
> I think Copyright year is only 2023 here.
> 
Carlos flagged this up as well. This file was copied, with some 
modifications, from sysdeps/x86_64/fpu/bench-libmvec-skeleton.c - is it 
OK for it to get a brand new copyright header despite not being brand 
new code?
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <string.h>
>> +#include <stdint.h>
>> +#include <stdbool.h>
>> +#include <stdio.h>
>> +#include <time.h>
>> +#include <inttypes.h>
>> +#include <bench-timing.h>
>> +#include <json-lib.h>
>> +#include <bench-util.h>
>> +#include <math-tests-arch.h>
>> +
>> +#include <bench-util.c>
>> +#define D_ITERS 10000
>> +
>> +int
>> +main (int argc, char **argv)
>> +{
>> +  unsigned long i, k;
>> +  timing_t start, end;
>> +  json_ctx_t json_ctx;> +
>> +  bench_start ();
>> +
>> +#ifdef BENCH_INIT
>> +  BENCH_INIT ();
>> +#endif
>> +
>> +  json_init (&json_ctx, 2, stdout);
>> +
>> +  /* Begin function.  */
>> +  json_attr_object_begin (&json_ctx, FUNCNAME);
>> +
>> +  for (int v = 0; v < NUM_VARIANTS; v++)
>> +    {
>> +      double d_total_time = 0;
>> +      timing_t cur;
>> +      for (k = 0; k < D_ITERS; k++)
>> +	{
>> +	  TIMING_NOW (start);
>> +	  for (i = 0; i < NUM_SAMPLES (v); i++)
>> +	    BENCH_FUNC (v, i);
>> +	  TIMING_NOW (end);
>> +
>> +	  TIMING_DIFF (cur, start, end);
>> +
>> +	  TIMING_ACCUM (d_total_time, cur);
>> +	}
>> +      double d_total_data_set = D_ITERS * NUM_SAMPLES (v) * STRIDE;
>> +
>> +      /* Begin variant.  */
>> +      json_attr_object_begin (&json_ctx, VARIANT (v));
>> +
>> +      json_attr_double (&json_ctx, "duration", d_total_time);
>> +      json_attr_double (&json_ctx, "iterations", d_total_data_set);
>> +      json_attr_double (&json_ctx, "mean", d_total_time / d_total_data_set);
>> +
>> +      /* End variant.  */
>> +      json_attr_object_end (&json_ctx);
>> +    }
>> +
>> +  /* End function.  */
>> +  json_attr_object_end (&json_ctx);
>> +
>> +  return 0;
>> +}
> 
> This file is quite similar to x86_64 modulo the extra CPU_FEATURE_ACTIVE checks.
> Maybe try to refactor to use a common definition and parametrize the x86 code
> on a arch-specific code?
> 
>> diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
>> new file mode 100644
>> index 0000000000..a25845bff8
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/bits/math-vector.h
>> @@ -0,0 +1,65 @@
>> +/* Platform-specific SIMD declarations of math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#ifndef _MATH_H
>> +# error "Never include <bits/math-vector.h> directly;\
>> + include <math.h> instead."
>> +#endif
>> +
>> +/* Get default empty definitions for simd declarations.  */
>> +#include <bits/libm-simd-decl-stubs.h>
>> +
>> +#if __GNUC_PREREQ (9, 0)
> 
> I think these tests should move to configure tests instead, it advertises beforehand
> the user that it needs to update the compiler instead through a compiler error.
> 
> The configure check will then check for both advsimd and SVE support, so there is
> no need for __ADVSIMD_VEC_MATH_SUPPORTED or __SVE_VEC_MATH_SUPPORTED.
> 
Apologies, I don't quite understand what you mean by this. I put these 
tests in so that users could compile against the new symbols with math.h 
as long as they had a sufficiently new compiler, but wouldn't get 
undefined types in math.h if they were using an old compiler that didn't 
have e.g. __Float32x4_t. (I think I remarked in the original message 
that new symbols hadn't been added to math.h, but this was not correct).
I don't see how this relates to configure, since this isn't for the 
benefit of library builders, but maybe I have misunderstood? We could 
put a separate check in at configure time for a compiler which is 
sufficient for vector types, but that would not IMO make these tests 
redundant. Let me know what you think.
>> +# define __ADVSIMD_VEC_MATH_SUPPORTED
>> +typedef __Float32x4_t __f32x4_t;
>> +typedef __Float64x2_t __f64x2_t;
>> +#elif __clang_major__ >= 8
>> +# define __ADVSIMD_VEC_MATH_SUPPORTED
>> +typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
>> +typedef __attribute__((__neon_vector_type__(2))) double __f64x2_t;
>> +#endif
>> +
>> +#if __GNUC_PREREQ (10, 0) || __clang_major >= 11
>> +# define __SVE_VEC_MATH_SUPPORTED
>> +typedef __SVFloat32_t __sv_f32_t;
>> +typedef __SVFloat64_t __sv_f64_t;
>> +typedef __SVBool_t __sv_bool_t;
>> +#endif
>> +
>> +/* If vector types and vector PCS are unsupported in the working
>> +   compiler, no choice but to omit vector math declarations.  */
>> +
>> +#ifdef __ADVSIMD_VEC_MATH_SUPPORTED
>> +
>> +# define __vpcs __attribute__((__aarch64_vector_pcs__))
>> +
>> +__vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
>> +__vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t);
>> +
>> +#undef __ADVSIMD_VEC_MATH_SUPPORTED
>> +#endif /* __ADVSIMD_VEC_MATH_SUPPORTED */
>> +
>> +#ifdef __SVE_VEC_MATH_SUPPORTED
>> +
>> +__sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
>> +__sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t);
>> +
>> +#undef __SVE_VEC_MATH_SUPPORTED
>> +#endif /* __SVE_VEC_MATH_SUPPORTED */
>> +
>> diff --git a/sysdeps/aarch64/fpu/cos_advsimd.c b/sysdeps/aarch64/fpu/cos_advsimd.c
>> new file mode 100644
>> index 0000000000..5a42fbb182
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/cos_advsimd.c
>> @@ -0,0 +1,28 @@
>> +/* Double-precision vector (Advanced SIMD) cos function.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <math.h>
>> +
>> +#include "advsimd_utils.h"
>> +
>> +VPCS_ATTR
>> +float64x2_t V_NAME_D1 (cos) (float64x2_t x)
>> +{
>> +  return v_call_f64 (cos, x);
>> +}
>> diff --git a/sysdeps/aarch64/fpu/cos_sve.c b/sysdeps/aarch64/fpu/cos_sve.c
>> new file mode 100644
>> index 0000000000..62bd2ece0e
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/cos_sve.c
>> @@ -0,0 +1,27 @@
>> +/* Double-precision vector (SVE) cos function.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <math.h>
>> +
>> +#include "sve_utils.h"
>> +
>> +svfloat64_t SV_NAME_D1 (cos) (svfloat64_t x, svbool_t pg)
>> +{
>> +  return sv_call_f64 (cos, x, svdup_n_f64 (0), pg);
>> +}
>> diff --git a/sysdeps/aarch64/fpu/cosf_advsimd.c b/sysdeps/aarch64/fpu/cosf_advsimd.c
>> new file mode 100644
>> index 0000000000..23f54bd905
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/cosf_advsimd.c
>> @@ -0,0 +1,28 @@
>> +/* Single-precision vector (Advanced SIMD) cos function.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <math.h>
>> +
>> +#include "advsimd_utils.h"
>> +
>> +VPCS_ATTR
>> +float32x4_t V_NAME_F1 (cos) (float32x4_t x)
>> +{
>> +  return v_call_f32 (cosf, x);
>> +}
>> diff --git a/sysdeps/aarch64/fpu/cosf_sve.c b/sysdeps/aarch64/fpu/cosf_sve.c
>> new file mode 100644
>> index 0000000000..0c4e365e1e
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/cosf_sve.c
>> @@ -0,0 +1,27 @@
>> +/* Single-precision vector (SVE) cos function.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <math.h>
>> +
>> +#include "sve_utils.h"
>> +
>> +svfloat32_t SV_NAME_F1 (cos) (svfloat32_t x, svbool_t pg)
>> +{
>> +  return sv_call_f32 (cosf, x, svdup_n_f32 (0), pg);
>> +}
>> diff --git a/sysdeps/aarch64/fpu/libm-test-ulps b/sysdeps/aarch64/fpu/libm-test-ulps
>> new file mode 100644
>> index 0000000000..b199d7ddab
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/libm-test-ulps
>> @@ -0,0 +1,7 @@
>> +Function: "cos_advsimd":
>> +double: 2
>> +float: 2
>> +
>> +Function: "cos_sve":
>> +double: 2
>> +float: 2
>> \ No newline at end of file
> 
> Bogus line feed here.
> 
>> diff --git a/sysdeps/aarch64/fpu/libm-test-ulps-name b/sysdeps/aarch64/fpu/libm-test-ulps-name
>> new file mode 100644
>> index 0000000000..1f66c5cda0
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/libm-test-ulps-name
>> @@ -0,0 +1 @@
>> +AArch64
>> diff --git a/sysdeps/aarch64/fpu/math-tests-arch.h b/sysdeps/aarch64/fpu/math-tests-arch.h
>> new file mode 100644
>> index 0000000000..263d4cabf1
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/math-tests-arch.h
>> @@ -0,0 +1,34 @@
>> +/* Runtime architecture check for math tests. AArch64 version.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#ifdef REQUIRE_SVE
>> +# include <sys/auxv.h>
>> +
>> +# define INIT_ARCH_EXT
>> +# define CHECK_ARCH_EXT							\
>> +   do									\
>> +     {									\
>> +       if (!(getauxval (AT_HWCAP) & HWCAP_SVE)) return;			\
>> +     }									\
>> +   while (0)
>> +
>> +#else
>> +# include <sysdeps/generic/math-tests-arch.h>
>> +#endif
>> +
> 
> Spurions new line here.
> 
>> diff --git a/sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py b/sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py
>> new file mode 100644
>> index 0000000000..9c092670d7
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py
>> @@ -0,0 +1,91 @@
>> +#!/usr/bin/python3
>> +# Copyright (C) 2023 Free Software Foundation, Inc.
>> +# This file is part of the GNU C Library.
>> +#
>> +# The GNU C Library is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU Lesser General Public
>> +# License as published by the Free Software Foundation; either
>> +# version 2.1 of the License, or (at your option) any later version.
>> +#
>> +# The GNU C Library is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +# Lesser General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU Lesser General Public
>> +# License along with the GNU C Library; if not, see
>> +# <https://www.gnu.org/licenses/>.
>> +
>> +import sys
>> +
>> +TEMPLATE = """
>> +#include <math.h>
>> +#include <arm_neon.h>
>> +
>> +#define STRIDE {stride}
>> +
>> +#define CALL_BENCH_FUNC(v, i) (__extension__ ({{                         \\
>> +   {rtype} mx0 = {fname}(vld1q_f{prec_short} (variants[v].in[i].arg0));  \\
>> +   mx0; }}))
>> +
>> +struct args
>> +{{
>> +  {stype} arg0[STRIDE];
>> +  double timing;
>> +}};
>> +
>> +struct _variants
>> +{{
>> +  const char *name;
>> +  int count;
>> +  struct args *in;
>> +}};
>> +
>> +struct args in0[{rowcount}] = {{
>> +{in_data}
>> +}};
>> +
>> +struct _variants variants[1] = {{
>> +  {{"", {rowcount}, in0}},
>> +}};
> 
> Maybe define them as static const?
> 
>> +
>> +#define NUM_VARIANTS 1
>> +#define NUM_SAMPLES(i) (variants[i].count)
>> +#define VARIANT(i) (variants[i].name)
>> +
>> +// Cannot pass volatile pointer to svst1. This still does not appear to get optimised out.
>> +static {rtype} volatile ret;
>> +
>> +#define BENCH_FUNC(i, j) ({{ ret = CALL_BENCH_FUNC(i, j); }})
>> +#define FUNCNAME "{fname}"
>> +#include <bench-libmvec-skeleton.c>
>> +"""
>> +
>> +def main(name):
>> +    _, prec, _, func = name.split("-")
>> +    scalar_to_advsimd_type = {"double": "float64x2_t", "float": "float32x4_t"}
>> +
>> +    stride = {"double": 2, "float": 4}[prec]
>> +    rtype = scalar_to_advsimd_type[prec]
>> +    atype = scalar_to_advsimd_type[prec]
>> +    fname = f"_ZGVnN{stride}v_{func}{'f' if prec == 'float' else ''}"
>> +    prec_short = {"double": 64, "float": 32}[prec]
>> +
>> +    with open(f"../benchtests/{func}-inputs") as f:
>> +        in_vals = [l.strip() for l in f.readlines() if l and not l.startswith("#")]
>> +    in_vals = [in_vals[i:i+stride] for i in range(0, len(in_vals), stride)]
>> +    rowcount= len(in_vals)
>> +    in_data = ",\n".join("{{" + ", ".join(row) + "}, 0}" for row in in_vals)
>> +
>> +    print(TEMPLATE.format(stride=stride,
>> +                          rtype=rtype,
>> +                          atype=atype,
>> +                          fname=fname,
>> +                          prec_short=prec_short,
>> +                          in_data=in_data,
>> +                          rowcount=rowcount,
>> +                          stype=prec))
>> +
>> +
>> +if __name__ == "__main__":
>> +    main(sys.argv[1])
>> diff --git a/sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py b/sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py
>> new file mode 100755
>> index 0000000000..0ea21c4c69
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py
>> @@ -0,0 +1,93 @@
>> +#!/usr/bin/python3
>> +# Copyright (C) 2023 Free Software Foundation, Inc.
>> +# This file is part of the GNU C Library.
>> +#
>> +# The GNU C Library is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU Lesser General Public
>> +# License as published by the Free Software Foundation; either
>> +# version 2.1 of the License, or (at your option) any later version.
>> +#
>> +# The GNU C Library is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +# Lesser General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU Lesser General Public
>> +# License along with the GNU C Library; if not, see
>> +# <https://www.gnu.org/licenses/>.
>> +
>> +import sys
>> +
>> +TEMPLATE = """
>> +#include <math.h>
>> +#include <arm_sve.h>
>> +
>> +#define STRIDE {stride}
>> +
>> +#define CALL_BENCH_FUNC(v, i) (__extension__ ({{                         \\
>> +   {rtype} mx0 = {fname}(svld1rq_f{prec_short} (svptrue_b{prec_short}(), \\
>> +                                                variants[v].in[i].arg0), \\
>> +                         svptrue_b{prec_short}());                       \\
>> +   mx0; }}))
>> +
>> +struct args
>> +{{
>> +  {stype} arg0[STRIDE];
>> +  double timing;
>> +}};
>> +
>> +struct _variants
>> +{{
>> +  const char *name;
>> +  int count;
>> +  struct args *in;
>> +}};
>> +
>> +struct args in0[{rowcount}] = {{
>> +{in_data}
>> +}};
>> +
>> +struct _variants variants[1] = {{
>> +  {{"", {rowcount}, in0}},
>> +}};
>> +
>> +#define NUM_VARIANTS 1
>> +#define NUM_SAMPLES(i) (variants[i].count)
>> +#define VARIANT(i) (variants[i].name)
>> +
>> +// Cannot pass volatile pointer to svst1. This still does not appear to get optimised out.
>> +static {stype} /*volatile*/ ret[STRIDE];
>> +
>> +#define BENCH_FUNC(i, j) ({{ svst1_f{prec_short}(svwhilelt_b{prec_short}(0, 4), ret, CALL_BENCH_FUNC(i, j)); }})
>> +#define FUNCNAME "{fname}"
>> +#include <bench-libmvec-skeleton.c>
>> +"""
>> +
>> +def main(name):
>> +    _, prec, _, func = name.split("-")
>> +    scalar_to_sve_type = {"double": "svfloat64_t", "float": "svfloat32_t"}
>> +
>> +    stride = {"double": 2, "float": 4}[prec]
>> +    rtype = scalar_to_sve_type[prec]
>> +    atype = scalar_to_sve_type[prec]
>> +    fname = f"_ZGVsMxv_{func}{'f' if prec == 'float' else ''}"
>> +    prec_short = {"double": 64, "float": 32}[prec]
>> +
>> +    with open(f"../benchtests/{func}-inputs") as f:
>> +        in_vals = [l.strip() for l in f.readlines() if l and not l.startswith("#")]
>> +    in_vals = [in_vals[i:i+stride] for i in range(0, len(in_vals), stride)]
>> +    rowcount= len(in_vals)
>> +    in_data = ",\n".join("{{" + ", ".join(row) + "}, 0}" for row in in_vals)
>> +
>> +    print(TEMPLATE.format(stride=stride,
>> +                          rtype=rtype,
>> +                          atype=atype,
>> +                          fname=fname,
>> +                          prec_short=prec_short,
>> +                          in_data=in_data,
>> +                          rowcount=rowcount,
>> +                          stype=prec))
>> +
>> +
>> +if __name__ == "__main__":
>> +    main(sys.argv[1])
>> diff --git a/sysdeps/aarch64/fpu/sve_utils.h b/sysdeps/aarch64/fpu/sve_utils.h
>> new file mode 100644
>> index 0000000000..dbdc03387c
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/sve_utils.h
>> @@ -0,0 +1,55 @@
>> +/* Helpers for SVE vector math funtions.
> 
> s/funtions/functions
> 
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <arm_sve.h>
>> +
>> +#define SV_NAME_F1(fun) _ZGVsMxv_##fun##f
>> +#define SV_NAME_D1(fun) _ZGVsMxv_##fun
>> +#define SV_NAME_F2(fun) _ZGVsMxvv_##fun##f
>> +#define SV_NAME_D2(fun) _ZGVsMxvv_##fun
>> +
>> +static inline svfloat32_t
>> +sv_call_f32 (float (*f) (float), svfloat32_t x, svfloat32_t y, svbool_t cmp)
>> +{
>> +  svbool_t p = svpfirst (cmp, svpfalse ());
>> +  while (svptest_any (cmp, p))
>> +    {
>> +      float elem = svclastb_n_f32 (p, 0, x);
>> +      elem = (*f) (elem);
>> +      svfloat32_t y2 = svdup_n_f32 (elem);
>> +      y = svsel_f32 (p, y2, y);
>> +      p = svpnext_b32 (cmp, p);
>> +    }
>> +  return y;
>> +}
>> +
>> +static inline svfloat64_t
>> +sv_call_f64 (double (*f) (double), svfloat64_t x, svfloat64_t y, svbool_t cmp)
>> +{
>> +  svbool_t p = svpfirst (cmp, svpfalse ());
>> +  while (svptest_any (cmp, p))
>> +    {
>> +      double elem = svclastb_n_f64 (p, 0, x);
>> +      elem = (*f) (elem);
>> +      svfloat64_t y2 = svdup_n_f64 (elem);
>> +      y = svsel_f64 (p, y2, y);
>> +      p = svpnext_b64 (cmp, p);
>> +    }
>> +  return y;
>> +}
>> diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
>> new file mode 100644
>> index 0000000000..52e330f469
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
>> @@ -0,0 +1,26 @@
>> +/* Scalar wrappers for double-precision Advanced SIMD vector math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <arm_neon.h>
>> +
>> +#include "test-double-advsimd.h"
>> +
>> +#define VEC_TYPE float64x2_t
>> +
>> +VPCS_VECTOR_WRAPPER(cos_advsimd, _ZGVnN2v_cos)
>> diff --git a/sysdeps/aarch64/fpu/test-double-advsimd.h b/sysdeps/aarch64/fpu/test-double-advsimd.h
>> new file mode 100644
>> index 0000000000..8bd32b97fa
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/test-double-advsimd.h
>> @@ -0,0 +1,25 @@
>> +/* Test declarations for double-precision Advanced SIMD vector math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include "test-double.h"
>> +#include "test-math-vector.h"
>> +#include "test-vpcs-vector-wrapper.h"
>> +
>> +#define VEC_SUFF _advsimd
>> +#define VEC_LEN 2
>> diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
>> new file mode 100644
>> index 0000000000..8edc5ed5ab
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
>> @@ -0,0 +1,34 @@
>> +/* Scalar wrappers for double-precision SVE vector math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <arm_sve.h>
>> +
>> +#include "test-double-sve.h"
>> +
>> +/* Wrapper from scalar to SVE function. Cannot just use VECTOR_WRAPPER due to predication.  */
>> +#define SVE_VECTOR_WRAPPER(scalar_func, vector_func)			\
>> +  extern VEC_TYPE vector_func (VEC_TYPE, svbool_t);			\
>> +FLOAT scalar_func (FLOAT x)						\
>> +{									\
>> +  VEC_TYPE mx = svdup_n_f64 (x);					\
>> +  VEC_TYPE mr = vector_func (mx, svptrue_b64 ());			\
>> +  return svlastb_f64 (svptrue_b64 (), mr);				\
>> +}
>> +
>> +SVE_VECTOR_WRAPPER(cos_sve, _ZGVsMxv_cos)
>> diff --git a/sysdeps/aarch64/fpu/test-double-sve.h b/sysdeps/aarch64/fpu/test-double-sve.h
>> new file mode 100644
>> index 0000000000..857a40861d
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/test-double-sve.h
>> @@ -0,0 +1,26 @@
>> +/* Test declarations for double-precision SVE vector math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include "test-double.h"
>> +#include "test-math-vector.h"
>> +
>> +#define REQUIRE_SVE
>> +#define VEC_SUFF _sve
>> +#define VEC_LEN svcntd()
>> +#define VEC_TYPE svfloat64_t
>> diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
>> new file mode 100644
>> index 0000000000..3577ca93b8
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
>> @@ -0,0 +1,26 @@
>> +/* Scalar wrappers for single-precision Advanced SIMD vector math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <arm_neon.h>
>> +
>> +#include "test-float-advsimd.h"
>> +
>> +#define VEC_TYPE float32x4_t
>> +
>> +VPCS_VECTOR_WRAPPER(cosf_advsimd, _ZGVnN4v_cosf)
>> diff --git a/sysdeps/aarch64/fpu/test-float-advsimd.h b/sysdeps/aarch64/fpu/test-float-advsimd.h
>> new file mode 100644
>> index 0000000000..86fce613cd
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/test-float-advsimd.h
>> @@ -0,0 +1,25 @@
>> +/* Test declarations for singlex-precision Advanced SIMD vector math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include "test-float.h"
>> +#include "test-math-vector.h"
>> +#include "test-vpcs-vector-wrapper.h"
>> +
>> +#define VEC_SUFF _advsimd
>> +#define VEC_LEN 4
>> diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
>> new file mode 100644
>> index 0000000000..b6a944d502
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
>> @@ -0,0 +1,34 @@
>> +/* Scalar wrappers for single-precision SVE vector math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <arm_sve.h>
>> +
>> +#include "test-float-sve.h"
>> +
>> +/* Wrapper from scalar to SVE function. Cannot just use VECTOR_WRAPPER due to predication.  */
>> +#define SVE_VECTOR_WRAPPER(scalar_func, vector_func)			\
>> +  extern VEC_TYPE vector_func (VEC_TYPE, svbool_t);			\
>> +FLOAT scalar_func (FLOAT x)						\
>> +{									\
>> +  VEC_TYPE mx = svdup_n_f32 (x);					\
>> +  VEC_TYPE mr = vector_func (mx, svptrue_b32 ());			\
>> +  return svlastb_f32 (svptrue_b32 (), mr);				\
>> +}
>> +
>> +SVE_VECTOR_WRAPPER(cosf_sve, _ZGVsMxv_cosf)
>> diff --git a/sysdeps/aarch64/fpu/test-float-sve.h b/sysdeps/aarch64/fpu/test-float-sve.h
>> new file mode 100644
>> index 0000000000..d6e122cf67
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/test-float-sve.h
>> @@ -0,0 +1,26 @@
>> +/* Test declarations for single-precision SVE vector math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include "test-float.h"
>> +#include "test-math-vector.h"
>> +
>> +#define REQUIRE_SVE
>> +#define VEC_SUFF _sve
>> +#define VEC_LEN svcntw()
>> +#define VEC_TYPE svfloat32_t
>> diff --git a/sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h b/sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h
>> new file mode 100644
>> index 0000000000..eb0f0db838
>> --- /dev/null
>> +++ b/sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h
>> @@ -0,0 +1,30 @@
>> +/* Scalar wrapper for vpcs-enabled Advanced SIMD vector math functions.
>> +
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#define VPCS_VECTOR_WRAPPER(scalar_func, vector_func)				\
>> +extern __attribute__ ((aarch64_vector_pcs)) VEC_TYPE vector_func (VEC_TYPE);	\
>> +FLOAT scalar_func (FLOAT x)							\
>> +{										\
>> +  int i;									\
>> +  VEC_TYPE mx;									\
>> +  INIT_VEC_LOOP (mx, x, VEC_LEN);						\
>> +  VEC_TYPE mr = vector_func (mx);						\
>> +  TEST_VEC_LOOP (mr, VEC_LEN);							\
>> +  return ((FLOAT) mr[0]);							\
>> +}
>> diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
>> new file mode 100644
>> index 0000000000..13af421af2
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
>> @@ -0,0 +1,4 @@
>> +GLIBC_2.38 _ZGVnN2v_cos F
>> +GLIBC_2.38 _ZGVnN4v_cosf F
>> +GLIBC_2.38 _ZGVsMxv_cos F
>> +GLIBC_2.38 _ZGVsMxv_cosf F
  
Adhemerval Zanella Netto Feb. 9, 2023, 12:55 p.m. UTC | #4
On 09/02/23 09:43, Joe Ramsay wrote:
> Thanks for the comments. I will attempt a patch that addresses them, in the meantime just a few questions:
> 
> On 08/02/2023 13:11, Adhemerval Zanella Netto wrote:
>>
>>
>> On 07/02/23 08:35, Joe Ramsay via Libc-alpha wrote:
>>> Hi,
>>>
>>> The attached patch is an attempt to enable libmvec on AArch64. The
>>> proposed change is mainly implementing build infrastructure to add the
>>> new routines to ABI, tests and benchmarks. I have demonstrated how
>>> this all fits together by adding implementations for vector cos, in
>>> both single and double precision, targeting both Advanced SIMD and
>>> SVE.
>>>
>>> The implementations of the routines themselves are just loops over the
>>> scalar routine from libm for now, as we are more concerned with
>>> getting the plumbing right at this point. We plan to contribute vector
>>> routines from the Arm Optimized Routines repo that are compliant with
>>> requirements described in the libmvec wiki.
>>>
>>> Any comments/thoughts much appreciated! In particular, the patch
>>> raises the minimum GCC to 10, in order to be able to submit routines
>>> written using ACLE instead of assembly. This is clearly a big jump,
>>> but we have options if this is not acceptable. One option would be to
>>> submit compiler-generated assembly, similar to the equivalent routines
>>> under sysdeps/x86_64. If GCC 9 is an acceptable compromise then this
>>> would only have to be for SVE routines.
>>
>> Using C implementation with intrinsics would be idea, there are more easily
>> maintained and can leverage compiler improvements.  I rather do it instead
>> of the assembly dump Intel did.
>>
>> The minimum GCC 10 is not ideal, however I don't see it as blocker either
>> (it should be up to arch-maintainers).  One option might be check if
>> compiler does not support building libmvec, disable the build and related
>> checks.  It is not ideal either, since the resulting glibc won't have
>> a complete ABI.
>>
>>
> OK, let's see what arch-maintainers make of it.
>>>
>>> Also, are there plans to merge libmvec into libm, or will they be kept
>>> separate?
>>
>> There is none afaik.  The libpthread, librt, etc. merge was done to
>> fix long standing design and maintanance issues that is not really presented
>> with libm and libmvec.  There is still the partial upgrade one, but
>> it is still present with a disjoint ld, libc, libm anyway.
>>
>> However, it is feasible to merge if your willing to work on it.  We will
>> need to keep the x86_64 lib with the sentinel compat symbol (similar to
>> what we did for libpthread).
>>
>> What I would like to avoid is to have different arquitectures using different
>> approaches, for instance aarch64 begin merged while having x86_64 still
>> using a different library.  It add a slight more complexity to the build
>> process and extra arch specific boilerplate code.
>>
> This sounds good - keeping them separate would be our choice too. I have not come across the sentinel compat symbol - is this something we need to do for AArch64 also?

The sentinel compat symbols are required for the case if you want to merge
x86 libmvec on libm, similar to what rt/librt-compat.c and nptl/libpthread-compat.c
does.  They are required so old binaries linked to libmvec does not fail at
loading time, since they will have libmvec as DT_NEEDED.

>>
>> I think these tests should move to configure tests instead, it advertises beforehand
>> the user that it needs to update the compiler instead through a compiler error.
>>
>> The configure check will then check for both advsimd and SVE support, so there is
>> no need for __ADVSIMD_VEC_MATH_SUPPORTED or __SVE_VEC_MATH_SUPPORTED.
>>
> Apologies, I don't quite understand what you mean by this. I put these tests in so that users could compile against the new symbols with math.h as long as they had a sufficiently new compiler, but wouldn't get undefined types in math.h if they were using an old compiler that didn't have e.g. __Float32x4_t. (I think I remarked in the original message that new symbols hadn't been added to math.h, but this was not correct).

Ah right, I missed that this is an *installed header*.  In this case the 
__GNUC_PREREQ does make sense (sorry for the noise).
  

Patch

diff --git a/INSTALL b/INSTALL
index 970d6627e2..ba800e41d6 100644
--- a/INSTALL
+++ b/INSTALL
@@ -524,6 +524,9 @@  build the GNU C Library:
      For s390x architecture builds, GCC 7.1 or higher is needed (See gcc
      Bug 98269).
 
+     For AArch64 architecture builds with mathvec enabled, GCC 10 or
+     higher is needed due to dependency on arm_sve.h.
+
      For multi-arch support it is recommended to use a GCC which has
      been built with support for GNU indirect functions.  This ensures
      that correct debugging information is generated for functions
diff --git a/manual/install.texi b/manual/install.texi
index 260f8a5c82..e9c62b51ae 100644
--- a/manual/install.texi
+++ b/manual/install.texi
@@ -567,6 +567,9 @@  For ARC architecture builds, GCC 8.3 or higher is needed.
 
 For s390x architecture builds, GCC 7.1 or higher is needed (See gcc Bug 98269).
 
+For AArch64 architecture builds with mathvec enabled, GCC 10 or higher is needed
+due to dependency on arm_sve.h.
+
 For multi-arch support it is recommended to use a GCC which has been built with
 support for GNU indirect functions.  This ensures that correct debugging
 information is generated for functions selected by IFUNC resolvers.  This
diff --git a/sysdeps/aarch64/configure b/sysdeps/aarch64/configure
index 2130f6b8f8..a71c32d70f 100644
--- a/sysdeps/aarch64/configure
+++ b/sysdeps/aarch64/configure
@@ -327,3 +327,31 @@  if test $libc_cv_aarch64_sve_asm = yes; then
   $as_echo "#define HAVE_AARCH64_SVE_ASM 1" >>confdefs.h
 
 fi
+
+# Check if the local system can run SVE binary
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for local SVE hardware" >&5
+$as_echo_n "checking for local SVE hardware... " >&6; }
+if ${libc_cv_can_run_sve+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+    cat > conftest.c <<EOF
+#include <sys/auxv.h>
+int main(void) {
+  if (! (getauxval (AT_HWCAP) & HWCAP_SVE))
+    return 1;
+  return 0;
+}
+EOF
+  libc_cv_can_run_sve=yes
+  ${CC-cc} conftest.c -o conftest
+  ./conftest || libc_cv_can_run_sve=no
+  rm -f conftest*
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_can_run_sve" >&5
+$as_echo "$libc_cv_can_run_sve" >&6; }
+config_vars="$config_vars
+aarch64-can-run-sve = $libc_cv_can_run_sve"
+
+if test x"$build_mathvec" = xnotset; then
+  build_mathvec=yes
+fi
diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac
index 85c6f76508..688f8772a6 100644
--- a/sysdeps/aarch64/configure.ac
+++ b/sysdeps/aarch64/configure.ac
@@ -101,3 +101,23 @@  rm -f conftest*])
 if test $libc_cv_aarch64_sve_asm = yes; then
   AC_DEFINE(HAVE_AARCH64_SVE_ASM)
 fi
+
+# Check if the local system can run SVE binary
+AC_CACHE_CHECK(for local SVE hardware, libc_cv_can_run_sve, [dnl
+  cat > conftest.c <<EOF
+#include <sys/auxv.h>
+int main(void) {
+  if (! (getauxval (AT_HWCAP) & HWCAP_SVE))
+    return 1;
+  return 0;
+}
+EOF
+  libc_cv_can_run_sve=yes
+  ${CC-cc} conftest.c -o conftest
+  ./conftest || libc_cv_can_run_sve=no
+  rm -f conftest*])
+LIBC_CONFIG_VAR([aarch64-can-run-sve], [$libc_cv_can_run_sve])
+
+if test x"$build_mathvec" = xnotset; then
+  build_mathvec=yes
+fi
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
new file mode 100644
index 0000000000..caf5d60669
--- /dev/null
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -0,0 +1,66 @@ 
+float-advsimd-funcs = cos
+
+double-advsimd-funcs = cos
+
+float-sve-funcs = cos
+
+double-sve-funcs = cos
+
+ifeq ($(subdir),mathvec)
+libmvec-support = $(addsuffix f_advsimd,$(float-advsimd-funcs)) \
+                  $(addsuffix _advsimd,$(double-advsimd-funcs)) \
+                  $(addsuffix f_sve,$(float-sve-funcs)) \
+                  $(addsuffix _sve,$(double-sve-funcs))
+endif
+
+sve-cflags = -march=armv8-a+sve
+
+
+ifeq ($(build-mathvec),yes)
+bench-libmvec = $(addprefix float-advsimd-,$(float-advsimd-funcs)) \
+                $(addprefix double-advsimd-,$(double-advsimd-funcs))
+
+# If not on an SVE-enabled machine, do not add SVE routines to benchmarks.
+# The routines are still built.
+ifeq ($(aarch64-can-run-sve),yes)
+  bench-libmvec += $(addprefix float-sve-,$(float-sve-funcs)) \
+                   $(addprefix double-sve-,$(double-sve-funcs))
+endif
+endif
+
+$(objpfx)bench-float-advsimd-%.c:
+	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py $(basename $(@F)) > $@
+$(objpfx)bench-double-advsimd-%.c:
+	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py $(basename $(@F)) > $@
+$(objpfx)bench-float-sve-%.c:
+	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py $(basename $(@F)) > $@
+$(objpfx)bench-double-sve-%.c:
+	$(PYTHON) $(..)sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py $(basename $(@F)) > $@
+
+ifeq (${STATIC-BENCHTESTS},yes)
+libmvec-benchtests = $(common-objpfx)mathvec/libmvec.a $(common-objpfx)math/libm.a
+else
+libmvec-benchtests = $(libmvec) $(libm)
+endif
+
+$(addprefix $(objpfx)bench-,$(bench-libmvec)): $(libmvec-benchtests)
+
+ifeq ($(build-mathvec),yes)
+libmvec-tests += float-advsimd double-advsimd float-sve double-sve
+endif
+
+define sve-float-cflags-template
+CFLAGS-$(1)f_sve.c += $(sve-cflags)
+CFLAGS-bench-float-sve-$(1).c += $(sve-cflags)
+endef
+
+define sve-double-cflags-template
+CFLAGS-$(1)_sve.c += $(sve-cflags)
+CFLAGS-bench-double-sve-$(1).c += $(sve-cflags)
+endef
+
+$(foreach f,$(float-sve-funcs), $(eval $(call sve-float-cflags-template,$(f))))
+$(foreach f,$(double-sve-funcs), $(eval $(call sve-double-cflags-template,$(f))))
+
+CFLAGS-test-float-sve-wrappers.c = $(sve-cflags)
+CFLAGS-test-double-sve-wrappers.c = $(sve-cflags)
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
new file mode 100644
index 0000000000..5222a6f180
--- /dev/null
+++ b/sysdeps/aarch64/fpu/Versions
@@ -0,0 +1,8 @@ 
+libmvec {
+  GLIBC_2.38 {
+    _ZGVnN2v_cos;
+    _ZGVnN4v_cosf;
+    _ZGVsMxv_cos;
+    _ZGVsMxv_cosf;
+  }
+}
diff --git a/sysdeps/aarch64/fpu/advsimd_utils.h b/sysdeps/aarch64/fpu/advsimd_utils.h
new file mode 100644
index 0000000000..b597a18b8f
--- /dev/null
+++ b/sysdeps/aarch64/fpu/advsimd_utils.h
@@ -0,0 +1,39 @@ 
+/* Helpers for Advanced SIMD vector math funtions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+
+#define VPCS_ATTR __attribute__ ((aarch64_vector_pcs))
+
+#define V_NAME_F1(fun) _ZGVnN4v_##fun##f
+#define V_NAME_D1(fun) _ZGVnN2v_##fun
+#define V_NAME_F2(fun) _ZGVnN4vv_##fun##f
+#define V_NAME_D2(fun) _ZGVnN2vv_##fun
+
+static inline float32x4_t
+v_call_f32 (float (*f) (float), float32x4_t x)
+{
+  return (float32x4_t){f (x[0]), f (x[1]), f (x[2]), f (x[3])};
+}
+
+static inline float64x2_t
+v_call_f64 (double (*f) (double), float64x2_t x)
+{
+  return (float64x2_t){f (x[0]), f (x[1])};
+}
diff --git a/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c b/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
new file mode 100644
index 0000000000..ca6a10d1fe
--- /dev/null
+++ b/sysdeps/aarch64/fpu/bench-libmvec-skeleton.c
@@ -0,0 +1,83 @@ 
+/* Skeleton for libmvec benchmark programs.
+   Copyright (C) 2021-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <time.h>
+#include <inttypes.h>
+#include <bench-timing.h>
+#include <json-lib.h>
+#include <bench-util.h>
+#include <math-tests-arch.h>
+
+#include <bench-util.c>
+#define D_ITERS 10000
+
+int
+main (int argc, char **argv)
+{
+  unsigned long i, k;
+  timing_t start, end;
+  json_ctx_t json_ctx;
+
+  bench_start ();
+
+#ifdef BENCH_INIT
+  BENCH_INIT ();
+#endif
+
+  json_init (&json_ctx, 2, stdout);
+
+  /* Begin function.  */
+  json_attr_object_begin (&json_ctx, FUNCNAME);
+
+  for (int v = 0; v < NUM_VARIANTS; v++)
+    {
+      double d_total_time = 0;
+      timing_t cur;
+      for (k = 0; k < D_ITERS; k++)
+	{
+	  TIMING_NOW (start);
+	  for (i = 0; i < NUM_SAMPLES (v); i++)
+	    BENCH_FUNC (v, i);
+	  TIMING_NOW (end);
+
+	  TIMING_DIFF (cur, start, end);
+
+	  TIMING_ACCUM (d_total_time, cur);
+	}
+      double d_total_data_set = D_ITERS * NUM_SAMPLES (v) * STRIDE;
+
+      /* Begin variant.  */
+      json_attr_object_begin (&json_ctx, VARIANT (v));
+
+      json_attr_double (&json_ctx, "duration", d_total_time);
+      json_attr_double (&json_ctx, "iterations", d_total_data_set);
+      json_attr_double (&json_ctx, "mean", d_total_time / d_total_data_set);
+
+      /* End variant.  */
+      json_attr_object_end (&json_ctx);
+    }
+
+  /* End function.  */
+  json_attr_object_end (&json_ctx);
+
+  return 0;
+}
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
new file mode 100644
index 0000000000..a25845bff8
--- /dev/null
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -0,0 +1,65 @@ 
+/* Platform-specific SIMD declarations of math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+/* Get default empty definitions for simd declarations.  */
+#include <bits/libm-simd-decl-stubs.h>
+
+#if __GNUC_PREREQ (9, 0)
+# define __ADVSIMD_VEC_MATH_SUPPORTED
+typedef __Float32x4_t __f32x4_t;
+typedef __Float64x2_t __f64x2_t;
+#elif __clang_major__ >= 8
+# define __ADVSIMD_VEC_MATH_SUPPORTED
+typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
+typedef __attribute__((__neon_vector_type__(2))) double __f64x2_t;
+#endif
+
+#if __GNUC_PREREQ (10, 0) || __clang_major >= 11
+# define __SVE_VEC_MATH_SUPPORTED
+typedef __SVFloat32_t __sv_f32_t;
+typedef __SVFloat64_t __sv_f64_t;
+typedef __SVBool_t __sv_bool_t;
+#endif
+
+/* If vector types and vector PCS are unsupported in the working
+   compiler, no choice but to omit vector math declarations.  */
+
+#ifdef __ADVSIMD_VEC_MATH_SUPPORTED
+
+# define __vpcs __attribute__((__aarch64_vector_pcs__))
+
+__vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
+__vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t);
+
+#undef __ADVSIMD_VEC_MATH_SUPPORTED
+#endif /* __ADVSIMD_VEC_MATH_SUPPORTED */
+
+#ifdef __SVE_VEC_MATH_SUPPORTED
+
+__sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
+__sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t);
+
+#undef __SVE_VEC_MATH_SUPPORTED
+#endif /* __SVE_VEC_MATH_SUPPORTED */
+
diff --git a/sysdeps/aarch64/fpu/cos_advsimd.c b/sysdeps/aarch64/fpu/cos_advsimd.c
new file mode 100644
index 0000000000..5a42fbb182
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cos_advsimd.c
@@ -0,0 +1,28 @@ 
+/* Double-precision vector (Advanced SIMD) cos function.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+
+#include "advsimd_utils.h"
+
+VPCS_ATTR
+float64x2_t V_NAME_D1 (cos) (float64x2_t x)
+{
+  return v_call_f64 (cos, x);
+}
diff --git a/sysdeps/aarch64/fpu/cos_sve.c b/sysdeps/aarch64/fpu/cos_sve.c
new file mode 100644
index 0000000000..62bd2ece0e
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cos_sve.c
@@ -0,0 +1,27 @@ 
+/* Double-precision vector (SVE) cos function.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+
+#include "sve_utils.h"
+
+svfloat64_t SV_NAME_D1 (cos) (svfloat64_t x, svbool_t pg)
+{
+  return sv_call_f64 (cos, x, svdup_n_f64 (0), pg);
+}
diff --git a/sysdeps/aarch64/fpu/cosf_advsimd.c b/sysdeps/aarch64/fpu/cosf_advsimd.c
new file mode 100644
index 0000000000..23f54bd905
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cosf_advsimd.c
@@ -0,0 +1,28 @@ 
+/* Single-precision vector (Advanced SIMD) cos function.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+
+#include "advsimd_utils.h"
+
+VPCS_ATTR
+float32x4_t V_NAME_F1 (cos) (float32x4_t x)
+{
+  return v_call_f32 (cosf, x);
+}
diff --git a/sysdeps/aarch64/fpu/cosf_sve.c b/sysdeps/aarch64/fpu/cosf_sve.c
new file mode 100644
index 0000000000..0c4e365e1e
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cosf_sve.c
@@ -0,0 +1,27 @@ 
+/* Single-precision vector (SVE) cos function.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+
+#include "sve_utils.h"
+
+svfloat32_t SV_NAME_F1 (cos) (svfloat32_t x, svbool_t pg)
+{
+  return sv_call_f32 (cosf, x, svdup_n_f32 (0), pg);
+}
diff --git a/sysdeps/aarch64/fpu/libm-test-ulps b/sysdeps/aarch64/fpu/libm-test-ulps
new file mode 100644
index 0000000000..b199d7ddab
--- /dev/null
+++ b/sysdeps/aarch64/fpu/libm-test-ulps
@@ -0,0 +1,7 @@ 
+Function: "cos_advsimd":
+double: 2
+float: 2
+
+Function: "cos_sve":
+double: 2
+float: 2
\ No newline at end of file
diff --git a/sysdeps/aarch64/fpu/libm-test-ulps-name b/sysdeps/aarch64/fpu/libm-test-ulps-name
new file mode 100644
index 0000000000..1f66c5cda0
--- /dev/null
+++ b/sysdeps/aarch64/fpu/libm-test-ulps-name
@@ -0,0 +1 @@ 
+AArch64
diff --git a/sysdeps/aarch64/fpu/math-tests-arch.h b/sysdeps/aarch64/fpu/math-tests-arch.h
new file mode 100644
index 0000000000..263d4cabf1
--- /dev/null
+++ b/sysdeps/aarch64/fpu/math-tests-arch.h
@@ -0,0 +1,34 @@ 
+/* Runtime architecture check for math tests. AArch64 version.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifdef REQUIRE_SVE
+# include <sys/auxv.h>
+
+# define INIT_ARCH_EXT
+# define CHECK_ARCH_EXT							\
+   do									\
+     {									\
+       if (!(getauxval (AT_HWCAP) & HWCAP_SVE)) return;			\
+     }									\
+   while (0)
+
+#else
+# include <sysdeps/generic/math-tests-arch.h>
+#endif
+
diff --git a/sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py b/sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py
new file mode 100644
index 0000000000..9c092670d7
--- /dev/null
+++ b/sysdeps/aarch64/fpu/scripts/bench_libmvec_advsimd.py
@@ -0,0 +1,91 @@ 
+#!/usr/bin/python3
+# Copyright (C) 2023 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+
+import sys
+
+TEMPLATE = """
+#include <math.h>
+#include <arm_neon.h>
+
+#define STRIDE {stride}
+
+#define CALL_BENCH_FUNC(v, i) (__extension__ ({{                         \\
+   {rtype} mx0 = {fname}(vld1q_f{prec_short} (variants[v].in[i].arg0));  \\
+   mx0; }}))
+
+struct args
+{{
+  {stype} arg0[STRIDE];
+  double timing;
+}};
+
+struct _variants
+{{
+  const char *name;
+  int count;
+  struct args *in;
+}};
+
+struct args in0[{rowcount}] = {{
+{in_data}
+}};
+
+struct _variants variants[1] = {{
+  {{"", {rowcount}, in0}},
+}};
+
+#define NUM_VARIANTS 1
+#define NUM_SAMPLES(i) (variants[i].count)
+#define VARIANT(i) (variants[i].name)
+
+// Cannot pass volatile pointer to svst1. This still does not appear to get optimised out.
+static {rtype} volatile ret;
+
+#define BENCH_FUNC(i, j) ({{ ret = CALL_BENCH_FUNC(i, j); }})
+#define FUNCNAME "{fname}"
+#include <bench-libmvec-skeleton.c>
+"""
+
+def main(name):
+    _, prec, _, func = name.split("-")
+    scalar_to_advsimd_type = {"double": "float64x2_t", "float": "float32x4_t"}
+
+    stride = {"double": 2, "float": 4}[prec]
+    rtype = scalar_to_advsimd_type[prec]
+    atype = scalar_to_advsimd_type[prec]
+    fname = f"_ZGVnN{stride}v_{func}{'f' if prec == 'float' else ''}"
+    prec_short = {"double": 64, "float": 32}[prec]
+
+    with open(f"../benchtests/{func}-inputs") as f:
+        in_vals = [l.strip() for l in f.readlines() if l and not l.startswith("#")]
+    in_vals = [in_vals[i:i+stride] for i in range(0, len(in_vals), stride)]
+    rowcount= len(in_vals)
+    in_data = ",\n".join("{{" + ", ".join(row) + "}, 0}" for row in in_vals)
+
+    print(TEMPLATE.format(stride=stride,
+                          rtype=rtype,
+                          atype=atype,
+                          fname=fname,
+                          prec_short=prec_short,
+                          in_data=in_data,
+                          rowcount=rowcount,
+                          stype=prec))
+
+
+if __name__ == "__main__":
+    main(sys.argv[1])
diff --git a/sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py b/sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py
new file mode 100755
index 0000000000..0ea21c4c69
--- /dev/null
+++ b/sysdeps/aarch64/fpu/scripts/bench_libmvec_sve.py
@@ -0,0 +1,93 @@ 
+#!/usr/bin/python3
+# Copyright (C) 2023 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+
+import sys
+
+TEMPLATE = """
+#include <math.h>
+#include <arm_sve.h>
+
+#define STRIDE {stride}
+
+#define CALL_BENCH_FUNC(v, i) (__extension__ ({{                         \\
+   {rtype} mx0 = {fname}(svld1rq_f{prec_short} (svptrue_b{prec_short}(), \\
+                                                variants[v].in[i].arg0), \\
+                         svptrue_b{prec_short}());                       \\
+   mx0; }}))
+
+struct args
+{{
+  {stype} arg0[STRIDE];
+  double timing;
+}};
+
+struct _variants
+{{
+  const char *name;
+  int count;
+  struct args *in;
+}};
+
+struct args in0[{rowcount}] = {{
+{in_data}
+}};
+
+struct _variants variants[1] = {{
+  {{"", {rowcount}, in0}},
+}};
+
+#define NUM_VARIANTS 1
+#define NUM_SAMPLES(i) (variants[i].count)
+#define VARIANT(i) (variants[i].name)
+
+// Cannot pass volatile pointer to svst1. This still does not appear to get optimised out.
+static {stype} /*volatile*/ ret[STRIDE];
+
+#define BENCH_FUNC(i, j) ({{ svst1_f{prec_short}(svwhilelt_b{prec_short}(0, 4), ret, CALL_BENCH_FUNC(i, j)); }})
+#define FUNCNAME "{fname}"
+#include <bench-libmvec-skeleton.c>
+"""
+
+def main(name):
+    _, prec, _, func = name.split("-")
+    scalar_to_sve_type = {"double": "svfloat64_t", "float": "svfloat32_t"}
+
+    stride = {"double": 2, "float": 4}[prec]
+    rtype = scalar_to_sve_type[prec]
+    atype = scalar_to_sve_type[prec]
+    fname = f"_ZGVsMxv_{func}{'f' if prec == 'float' else ''}"
+    prec_short = {"double": 64, "float": 32}[prec]
+
+    with open(f"../benchtests/{func}-inputs") as f:
+        in_vals = [l.strip() for l in f.readlines() if l and not l.startswith("#")]
+    in_vals = [in_vals[i:i+stride] for i in range(0, len(in_vals), stride)]
+    rowcount= len(in_vals)
+    in_data = ",\n".join("{{" + ", ".join(row) + "}, 0}" for row in in_vals)
+
+    print(TEMPLATE.format(stride=stride,
+                          rtype=rtype,
+                          atype=atype,
+                          fname=fname,
+                          prec_short=prec_short,
+                          in_data=in_data,
+                          rowcount=rowcount,
+                          stype=prec))
+
+
+if __name__ == "__main__":
+    main(sys.argv[1])
diff --git a/sysdeps/aarch64/fpu/sve_utils.h b/sysdeps/aarch64/fpu/sve_utils.h
new file mode 100644
index 0000000000..dbdc03387c
--- /dev/null
+++ b/sysdeps/aarch64/fpu/sve_utils.h
@@ -0,0 +1,55 @@ 
+/* Helpers for SVE vector math funtions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <arm_sve.h>
+
+#define SV_NAME_F1(fun) _ZGVsMxv_##fun##f
+#define SV_NAME_D1(fun) _ZGVsMxv_##fun
+#define SV_NAME_F2(fun) _ZGVsMxvv_##fun##f
+#define SV_NAME_D2(fun) _ZGVsMxvv_##fun
+
+static inline svfloat32_t
+sv_call_f32 (float (*f) (float), svfloat32_t x, svfloat32_t y, svbool_t cmp)
+{
+  svbool_t p = svpfirst (cmp, svpfalse ());
+  while (svptest_any (cmp, p))
+    {
+      float elem = svclastb_n_f32 (p, 0, x);
+      elem = (*f) (elem);
+      svfloat32_t y2 = svdup_n_f32 (elem);
+      y = svsel_f32 (p, y2, y);
+      p = svpnext_b32 (cmp, p);
+    }
+  return y;
+}
+
+static inline svfloat64_t
+sv_call_f64 (double (*f) (double), svfloat64_t x, svfloat64_t y, svbool_t cmp)
+{
+  svbool_t p = svpfirst (cmp, svpfalse ());
+  while (svptest_any (cmp, p))
+    {
+      double elem = svclastb_n_f64 (p, 0, x);
+      elem = (*f) (elem);
+      svfloat64_t y2 = svdup_n_f64 (elem);
+      y = svsel_f64 (p, y2, y);
+      p = svpnext_b64 (cmp, p);
+    }
+  return y;
+}
diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
new file mode 100644
index 0000000000..52e330f469
--- /dev/null
+++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
@@ -0,0 +1,26 @@ 
+/* Scalar wrappers for double-precision Advanced SIMD vector math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+
+#include "test-double-advsimd.h"
+
+#define VEC_TYPE float64x2_t
+
+VPCS_VECTOR_WRAPPER(cos_advsimd, _ZGVnN2v_cos)
diff --git a/sysdeps/aarch64/fpu/test-double-advsimd.h b/sysdeps/aarch64/fpu/test-double-advsimd.h
new file mode 100644
index 0000000000..8bd32b97fa
--- /dev/null
+++ b/sysdeps/aarch64/fpu/test-double-advsimd.h
@@ -0,0 +1,25 @@ 
+/* Test declarations for double-precision Advanced SIMD vector math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "test-double.h"
+#include "test-math-vector.h"
+#include "test-vpcs-vector-wrapper.h"
+
+#define VEC_SUFF _advsimd
+#define VEC_LEN 2
diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
new file mode 100644
index 0000000000..8edc5ed5ab
--- /dev/null
+++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
@@ -0,0 +1,34 @@ 
+/* Scalar wrappers for double-precision SVE vector math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <arm_sve.h>
+
+#include "test-double-sve.h"
+
+/* Wrapper from scalar to SVE function. Cannot just use VECTOR_WRAPPER due to predication.  */
+#define SVE_VECTOR_WRAPPER(scalar_func, vector_func)			\
+  extern VEC_TYPE vector_func (VEC_TYPE, svbool_t);			\
+FLOAT scalar_func (FLOAT x)						\
+{									\
+  VEC_TYPE mx = svdup_n_f64 (x);					\
+  VEC_TYPE mr = vector_func (mx, svptrue_b64 ());			\
+  return svlastb_f64 (svptrue_b64 (), mr);				\
+}
+
+SVE_VECTOR_WRAPPER(cos_sve, _ZGVsMxv_cos)
diff --git a/sysdeps/aarch64/fpu/test-double-sve.h b/sysdeps/aarch64/fpu/test-double-sve.h
new file mode 100644
index 0000000000..857a40861d
--- /dev/null
+++ b/sysdeps/aarch64/fpu/test-double-sve.h
@@ -0,0 +1,26 @@ 
+/* Test declarations for double-precision SVE vector math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "test-double.h"
+#include "test-math-vector.h"
+
+#define REQUIRE_SVE
+#define VEC_SUFF _sve
+#define VEC_LEN svcntd()
+#define VEC_TYPE svfloat64_t
diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
new file mode 100644
index 0000000000..3577ca93b8
--- /dev/null
+++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
@@ -0,0 +1,26 @@ 
+/* Scalar wrappers for single-precision Advanced SIMD vector math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+
+#include "test-float-advsimd.h"
+
+#define VEC_TYPE float32x4_t
+
+VPCS_VECTOR_WRAPPER(cosf_advsimd, _ZGVnN4v_cosf)
diff --git a/sysdeps/aarch64/fpu/test-float-advsimd.h b/sysdeps/aarch64/fpu/test-float-advsimd.h
new file mode 100644
index 0000000000..86fce613cd
--- /dev/null
+++ b/sysdeps/aarch64/fpu/test-float-advsimd.h
@@ -0,0 +1,25 @@ 
+/* Test declarations for singlex-precision Advanced SIMD vector math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "test-float.h"
+#include "test-math-vector.h"
+#include "test-vpcs-vector-wrapper.h"
+
+#define VEC_SUFF _advsimd
+#define VEC_LEN 4
diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
new file mode 100644
index 0000000000..b6a944d502
--- /dev/null
+++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
@@ -0,0 +1,34 @@ 
+/* Scalar wrappers for single-precision SVE vector math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <arm_sve.h>
+
+#include "test-float-sve.h"
+
+/* Wrapper from scalar to SVE function. Cannot just use VECTOR_WRAPPER due to predication.  */
+#define SVE_VECTOR_WRAPPER(scalar_func, vector_func)			\
+  extern VEC_TYPE vector_func (VEC_TYPE, svbool_t);			\
+FLOAT scalar_func (FLOAT x)						\
+{									\
+  VEC_TYPE mx = svdup_n_f32 (x);					\
+  VEC_TYPE mr = vector_func (mx, svptrue_b32 ());			\
+  return svlastb_f32 (svptrue_b32 (), mr);				\
+}
+
+SVE_VECTOR_WRAPPER(cosf_sve, _ZGVsMxv_cosf)
diff --git a/sysdeps/aarch64/fpu/test-float-sve.h b/sysdeps/aarch64/fpu/test-float-sve.h
new file mode 100644
index 0000000000..d6e122cf67
--- /dev/null
+++ b/sysdeps/aarch64/fpu/test-float-sve.h
@@ -0,0 +1,26 @@ 
+/* Test declarations for single-precision SVE vector math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "test-float.h"
+#include "test-math-vector.h"
+
+#define REQUIRE_SVE
+#define VEC_SUFF _sve
+#define VEC_LEN svcntw()
+#define VEC_TYPE svfloat32_t
diff --git a/sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h b/sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h
new file mode 100644
index 0000000000..eb0f0db838
--- /dev/null
+++ b/sysdeps/aarch64/fpu/test-vpcs-vector-wrapper.h
@@ -0,0 +1,30 @@ 
+/* Scalar wrapper for vpcs-enabled Advanced SIMD vector math functions.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define VPCS_VECTOR_WRAPPER(scalar_func, vector_func)				\
+extern __attribute__ ((aarch64_vector_pcs)) VEC_TYPE vector_func (VEC_TYPE);	\
+FLOAT scalar_func (FLOAT x)							\
+{										\
+  int i;									\
+  VEC_TYPE mx;									\
+  INIT_VEC_LOOP (mx, x, VEC_LEN);						\
+  VEC_TYPE mr = vector_func (mx);						\
+  TEST_VEC_LOOP (mr, VEC_LEN);							\
+  return ((FLOAT) mr[0]);							\
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
new file mode 100644
index 0000000000..13af421af2
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
@@ -0,0 +1,4 @@ 
+GLIBC_2.38 _ZGVnN2v_cos F
+GLIBC_2.38 _ZGVnN4v_cosf F
+GLIBC_2.38 _ZGVsMxv_cos F
+GLIBC_2.38 _ZGVsMxv_cosf F