[v2] Aarch64: Add simd exp/expf functions

Message ID bd1b565d471b5549f332a53cf450c91e38634d5a.camel@marvell.com
State New, archived

Commit Message

Steve Ellcey May 22, 2019, 4:54 p.m. UTC
  Here is an updated version of my patch to add libmvec and vector exp
functions to Aarch64.  GCC 9.1 has now been released so the build
no longer depends on an unreleased compiler.  I have not added any
assembly trampolines to allow older compilers to be used.

The SIMD ABI that this patch uses is supported by GCC 9.1 and is defined at:

https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi

If you build with GCC 9.1 (or any compiler that supports the aarch64_vector_pcs
attribute) you will get libmvec by default, otherwise you will not.  If you
try to build libmvec using a compiler without aarch64_vector_pcs support,
configure will fail.

There was a question of whether building libmvec should be optional or not;
I don't have a strong opinion on that but would be interested in what others
think.  I could change this to require aarch64_vector_pcs attribute support
in all cases and always build libmvec if that is what we want.

I added static *_finite function names so that they are not exported.  If
Wilco's patch to remove the *_finite names entirely is approved I can remove
this part of the patch.

I removed the 'if (aarch64)' conditionals from math-vector-fortran.h
(and fixed my use of the BIG_ENDIAN macros); this means that the vector exp
and expf routines should get used in big-endian and little-endian modes for
Fortran (just like C).  I have not done any big-endian testing at this point
because I have been doing all my testing on a little-endian Aarch64 linux box.

If anyone has ideas on how to do big-endian testing I would be interested.
I am guessing I would have to build an elf target and test with qemu or
something like that but I haven't done a build/test setup like that in quite
a while.

Steve Ellcey
sellcey@marvell.com

2019-05-22  Steve Ellcey  <sellcey@marvell.com>

	* NEWS: Add entry about libmvec support on aarch64.
	* sysdeps/aarch64/configure.ac (build_mathvec): Check for ABI support,
	Build libmvec if support exists.
	* sysdeps/aarch64/configure: Regenerate.
	* sysdeps/aarch64/fpu/Makefile (CFLAGS-libmvec_double_vlen2_exp.c):
	Set flag.
	(CFLAGS-libmvec_float_vlen4_expf.c): Likewise.
	(CFLAGS-libmvec_exp_data.c): Likewise.
	(CFLAGS-libmvec_exp2f_data.c): Likewise.
	(libmvec-support): Add libmvec_double_vlen2_exp,
	libmvec_float_vlen4_expf, libmvec_exp_data, libmvec_exp2f_data,
	libmvec_aliases to list.
	(libmvec-static-only-routines): Add libmvec_aliases to list.
	(libmvec-tests): Add double-vlen2, float-vlen4 to list.
	(double-vlen2-funcs): Add new vector function name.
	(float-vlen4-funcs): Add new vector function name.
	* sysdeps/aarch64/fpu/Versions: New file.
	* sysdeps/aarch64/fpu/bits/math-vector.h: New file.
	* sysdeps/aarch64/fpu/finclude/math-vector-fortran.h: New file.
	* sysdeps/aarch64/fpu/libmvec_aliases.c: New file.
	* sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c: New file.
	* sysdeps/aarch64/fpu/libmvec_exp2f_data.c: New file.
	* sysdeps/aarch64/fpu/libmvec_exp_data.c: New file.
	* sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c: New file.
	* sysdeps/aarch64/fpu/libmvec_util.h: New file.
	* sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c: New file.
	* sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c: New file.
	* sysdeps/aarch64/libm-test-ulps (exp_vlen2): New entry.
	(exp_vlen4): Likewise.
	* sysdeps/unix/sysv/linux/aarch64/libmvec.abilist: New file.
  

Comments

Szabolcs Nagy May 23, 2019, 10:53 a.m. UTC | #1
On 22/05/2019 17:54, Steve Ellcey wrote:
> Here is an updated version of my patch to add libmvec and vector exp
> functions to Aarch64.  GCC 9.1 has now been released so the build
> no longer depends on an unreleased compiler.  I have not added any
> assembly trampolines to allow older compilers to be used.
> 
> The SIMD ABI that this patch uses is supported by GCC 9.1 and is defined at:
> 
> https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi
> 
> If you build with GCC 9.1 (or any compiler that supports the aarch64_vector_pcs
> attribute) you will get libmvec by default, otherwise you will not.  If you
> try to build libmvec using a compiler without aarch64_vector_pcs support,
> configure will fail.

i just sent out the draft text of the ELF ABI yesterday that
addresses the lazy binding issue with sve and advsimd vector pcs.

this means gcc-9.1 does not produce abi conforming binaries;
a further gcc and binutils patch will be needed for that.

however not following the new ELF abi in libmvec.so itself should
be fine: the new ELF symbol marking is needed in the module that
has R_*_JUMP_SLOT relocations for vector symbols, i.e. on the
caller side. testing libmvec in glibc still works by luck since
lazy binding does not clobber problematic fp regs, and there are
no ifunc resolvers for vector math functions that use fp regs
either (the test binaries won't be abi conform though because
of the missing symbol marking).

(1) it is not obvious that we can enable libmvec in glibc
without a toolchain that follows the new ELF abi.

> There was a question of whether building libmvec should be optional or not;
> I don't have a strong opinion on that but would be interested in what others
> think.  I could change this to require aarch64_vector_pcs attribute support
> in all cases and always build libmvec if that is what we want.

if libmvec is a conditional part of the glibc abi on aarch64
then binaries may not be portable between systems running the
same version of glibc.

so either we make libmvec "experimental abi" for which you
have to explicitly opt in (not automatic based on compiler
support) or somehow ensure libmvec abi is present even if
the compiler building glibc has no support for the vector abi.

the latter can be achieved by a dummy libmvec (e.g. using
trivial asm) that provides the right symbols without optimal
implementations. (testing the dummy libmvec is problematic
without compiler support though)

(2) we need to decide between 'experimental abi' or dummy libmvec
or increasing the minimum toolchain requirement.

(3a) in case of 'experimental abi' there has to be some
description about what users can expect and how to opt out
from using libmvec to build portable binaries.

(3b) in case of dummy libmvec the glibc mechanism to provide
it and test it should be figured out.

> I added static *_finite function names so that they are not exported.  If
> Wilco's patch to remove the *_finite names entirely is approved I can remove
> this part of the patch.
> 
> I removed the 'if (aarch64)' conditionals from math-vector-fortran.h
> (and fixed my use of the BIG_ENDIAN macros); this means that the vector exp
> and expf routines should get used in big-endian and little-endian modes for
> Fortran (just like C).  I have not done any big-endian testing at this point
> because I have been doing all my testing on a little-endian Aarch64 linux box.
> 
> If anyone has ideas on how to do big-endian testing I would be interested.
> I am guessing I would have to build an elf target and test with qemu or
> something like that but I haven't done a build/test setup like that in quite
> a while.

i'd like to get consensus around (1), (2), (3) first.

i don't yet have a proposal for how this should work.
  
Florian Weimer May 23, 2019, 11:08 a.m. UTC | #2
* Szabolcs Nagy:

> however not following the new ELF abi in libmvec.so itself should
> be fine: the new ELF symbol marking is needed in the module that
> has R_*_JUMP_SLOT relocations for vector symbols, i.e. on the
> caller side. testing libmvec in glibc still works by luck since
> lazy binding does not clobber problematic fp regs, and there are
> no ifunc resolvers for vector math functions that use fp regs
> either (the test binaries won't be abi conform though because
> of the missing symbol marking).

But the dynamic section of libmvec won't have the ABI-mandated markup,
right?  Based on the draft, I assume that it's sufficient if the markup
is present on either side to trigger the special processing.  It's
unclear if it's the compiler's job to set the flag for the libmvec
routines, or if it can rely on the link editor divining it (from the
DSO, which it needs anyway for the symbol version) and trigger the
required processing for the symbols concerned.

Thanks,
Florian
  
Szabolcs Nagy May 23, 2019, 12:12 p.m. UTC | #3
On 23/05/2019 12:08, Florian Weimer wrote:
> * Szabolcs Nagy:
> 
>> however not following the new ELF abi in libmvec.so itself should
>> be fine: the new ELF symbol marking is needed in the module that
>> has R_*_JUMP_SLOT relocations for vector symbols, i.e. on the
>> caller side. testing libmvec in glibc still works by luck since
>> lazy binding does not clobber problematic fp regs, and there are
>> no ifunc resolvers for vector math functions that use fp regs
>> either (the test binaries won't be abi conform though because
>> of the missing symbol marking).
> 
> But the dynamic section of libmvec won't have the ABI-mandated markup,
> right?

yes (st_other marking will be missing from the dynamic symbol table)

>  Based on the draft, I assume that it's sufficient if the markup
> is present on either side to trigger the special processing.

markup on either side (definition or reference) triggers adding
the markup to a shared lib or executable at link time.

markup must be present on the caller side (reference) to trigger
special processing at runtime.

the VABI requires markup on *both* sides (i.e. in all conforming
binaries the vector function symbol references will be marked)

>  It's
> unclear if it's the compiler's job to set the flag for the libmvec
> routines, or if it can rely on the link editor divining it (from the
> DSO, which it needs anyway for the symbol version) and trigger the
> required processing for the symbols concerned.

both:

the reference should be marked by the compiler(+assembler) according
to VABI rules.

if any reference or definition is marked in linker inputs then on
the output the symbol should be marked according to ELF ABI rules.

in practice the case where the definition is marked but the ref
is unmarked should not happen if everything conforms to the VABI.
(future pcs extension using the same marking may behave differently).

so unmarked libmvec.so would work in practice but not abi conform,
i don't immediately see a huge problem with non-abi conform libmvec
(other than it's ugly and may confuse tools that try to rely on the
marking for some reason), but i think it's also possible to add a
tiny elf processing tool to glibc that adds the marking manually to
the dynamic symbol table if the linker does not know how to do it.
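
such a tool could be fairly small. a rough sketch (hypothetical, not part of
any posted patch) for a little-endian ELF64 object, taking
STO_AARCH64_VARIANT_PCS as 0x80 from the draft ABI, marking every _ZGV*
symbol in .dynsym, with minimal error handling:

#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef STO_AARCH64_VARIANT_PCS
# define STO_AARCH64_VARIANT_PCS 0x80  /* value from the draft ELF ABI */
#endif

int
main (int argc, char **argv)
{
  if (argc != 2)
    {
      fprintf (stderr, "usage: %s shared-object\n", argv[0]);
      return 1;
    }

  int fd = open (argv[1], O_RDWR);
  struct stat st;
  if (fd < 0 || fstat (fd, &st) != 0)
    {
      perror (argv[1]);
      return 1;
    }

  /* Map the whole file writable so the st_other edits land back on disk.  */
  void *map = mmap (NULL, st.st_size, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
  if (map == MAP_FAILED)
    {
      perror ("mmap");
      return 1;
    }

  Elf64_Ehdr *eh = map;
  Elf64_Shdr *sh = (Elf64_Shdr *) ((char *) map + eh->e_shoff);

  for (int i = 0; i < eh->e_shnum; i++)
    if (sh[i].sh_type == SHT_DYNSYM)
      {
        Elf64_Sym *sym = (Elf64_Sym *) ((char *) map + sh[i].sh_offset);
        const char *str = (const char *) map + sh[sh[i].sh_link].sh_offset;
        size_t nsyms = sh[i].sh_size / sizeof (Elf64_Sym);

        /* Mark every vector-ABI symbol (_ZGV prefix) as variant PCS.  */
        for (size_t j = 0; j < nsyms; j++)
          if (strncmp (str + sym[j].st_name, "_ZGV", 4) == 0)
            sym[j].st_other |= STO_AARCH64_VARIANT_PCS;
      }

  munmap (map, st.st_size);
  close (fd);
  return 0;
}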
  
Joseph Myers May 23, 2019, 9:11 p.m. UTC | #4
On Wed, 22 May 2019, Steve Ellcey wrote:

> I added static *_finite function names so that they are not exported.  If
> Wilco's patch to remove the *_finite names entirely is approved I can remove
> this part of the patch.

There's an interesting implication here - that Wilco's patches should also 
be removing the *existing* x86_64 libmvec *_finite aliases and tests 
thereof (because they are static-only, so not relevant for compatibility 
with existing executables and shared libraries - and we don't claim to 
keep compatibility with existing .o files when we obsolete a feature).

A more significant omission in that area from Wilco's patches is that all 
the _finite aliases in libm (as opposed to libmvec) need to be conditioned 
on SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30), as if we're obsoleting the 
aliases that means they should not become part of the ABI for any future 
glibc ports.
  
Wilco Dijkstra May 28, 2019, 3:36 p.m. UTC | #5
Hi Joseph,

> There's an interesting implication here - that Wilco's patches should also 
> be removing the *existing* x86_64 libmvec *_finite aliases and tests 
> thereof (because they are static-only, so not relevant for compatibility 
> with existing executables and shared libraries - and we don't claim to 
> keep compatibility with existing .o files when we obsolete a feature).

You mean sysdeps/x86_64/fpu/svml_finite_alias.S and test-libmvec-alias-mod.c?

> A more significant omission in that area from Wilco's patches is that all 
> the _finite aliases in libm (as opposed to libmvec) need to be conditioned 
> on SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30), as if we're obsoleting the 
> aliases that means they should not become part of the ABI for any future 
> glibc ports.

So concretely, does this mean the strong aliases need to have a #if around them
like:

#if SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30)
strong_alias (__ieee754_cosh, __cosh_finite)
#endif

Do the abi lists need to be changed too?

Wilco
  
Joseph Myers May 28, 2019, 8:07 p.m. UTC | #6
On Tue, 28 May 2019, Wilco Dijkstra wrote:

> Hi Joseph,
> 
> > There's an interesting implication here - that Wilco's patches should also 
> > be removing the *existing* x86_64 libmvec *_finite aliases and tests 
> > thereof (because they are static-only, so not relevant for compatibility 
> > with existing executables and shared libraries - and we don't claim to 
> > keep compatibility with existing .o files when we obsolete a feature).
> 
> You mean sysdeps/x86_64/fpu/svml_finite_alias.S and test-libmvec-alias-mod.c?

There seem to be more files involved in testing those aliases, but that's 
the general idea.

> > A more significant omission in that area from Wilco's patches is that all 
> > the _finite aliases in libm (as opposed to libmvec) need to be conditioned 
> > on SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30), as if we're obsoleting the 
> > aliases that means they should not become part of the ABI for any future 
> > glibc ports.
> 
> So concretely, does this mean the strong aliases need to have a #if around it
> like:
> 
> #if SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30)
> strong_alias (__ieee754_cosh, __cosh_finite)
> #endif

Yes.  They also need to change from strong_alias to compat_symbol calls, 
of the form:

compat_symbol (libm, __ieee754_cosh, __cosh_finite, GLIBC_2_15)

(If any functions have existing versioned_symbol calls you may need to add 
another alias to use in the compat_symbol call, because of assembler 
requirements that each symbol version gets defined using a different name 
for the original symbol.  Hopefully any such cases are rare, but this 
probably applies to a few functions such as expf, for example, where new 
symbol versions were added as part of eliminating wrappers.)
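
As an illustration only, a hedged sketch of that pattern for expf, written as
it might appear in the file that defines __expf; the extra alias name
__expf_compat_alias is invented here, and the version range follows the
discussion above rather than any posted patch:

#include <shlib-compat.h>

/* The public symbol keeps its existing version.  */
versioned_symbol (libm, __expf, expf, GLIBC_2_27);

#if SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30)
/* The assembler wants a distinct local name per symbol version of the
   same definition, hence the extra (hypothetical) alias.  */
strong_alias (__expf, __expf_compat_alias)
compat_symbol (libm, __expf_compat_alias, __expf_finite, GLIBC_2_15);
#endif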

*Calls* to the _finite functions will need updating to call other public 
names for those functions.  There are a few calls of WRAPPER_IMPL_SSE2 
etc. in sysdeps/x86_64/fpu/, and calls to the _finite functions from .S 
files in sysdeps/x86_64/fpu/multiarch/, that need updating.  (There might 
be other calls, I haven't done an exhaustive check.)

> Do the abi lists need to be changed too?

No.  The ABI lists do not distinguish whether a symbol is a compat symbol 
or not.  Testing with build-many-glibcs.py to verify that all the ABI 
tests still pass is a good idea to verify that the patch hasn't 
accidentally broken the ABI.
  
Szabolcs Nagy June 20, 2019, 3:18 p.m. UTC | #7
On 23/05/2019 11:53, Szabolcs Nagy wrote:
> (1) it is not obvious that we can enable libmvec in glibc
> without a toolchain that follows the new ELF abi.

i was thinking about this and i believe we can add
libmvec abi without toolchain support (e.g. dummy
asm implementation that falls back to scalar code).

testing the code requires toolchain support, but i
think that can be conditional.

>> There was a question of whether building libmvec should be optional or not;
>> I don't have a strong opinion on that but would be interested in what others
>> think.  I could change this to require aarch64_vector_pcs attribute support
>> in all cases and always build libmvec if that is what we want.

i would add libmvec unconditionally with dummy
implementations and then the abi is in place for
later backports of algorithms.

> (2) we need to decide between 'experimental abi' or dummy libmvec
> or increasing the minimum toolchain requirement.
> 
> (3a) in case of 'experimental abi' there has to be some
> description about what users can expect and how to opt out
> from using libmvec to build portable binaries.
> 
> (3b) in case of dummy libmvec the glibc mechanism to provide
> it and test it should be figured out.

i think dummy libmvec may work for now.

unfortunately in math.h omp declare simd would imply
the availability of both advsimd and sve functions.
there will be a way to say "only advsimd version is
available", but gcc does not support that yet.
so we can

(1) add omp declare simd to math.h and add dummy sve
    symbols too. (however sve calls cannot be tested)

(2) don't add anything to math.h for now, only have
    the abi symbols.

if it's ok to backport public header changes then (2)
should work: then vectorization is not supported for
now, but e.g. if gcc-10 has the right support we can
conditionally add the header magic to math.h (and the
fortran module) later and backport that change to
glibc-2.30, potentially together with improved
algorithms (conditionally compiled with gcc-10).
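
for reference, a rough sketch (not part of the patch) of what the header
magic buys on the caller side: with omp declare simd on exp in math.h, a
loop like the one below, built with something like gcc -Ofast -fopenmp-simd
(flags illustrative), can be turned into calls to _ZGVnN2v_exp; without the
declaration it keeps calling scalar exp.

#include <math.h>

void
exp_array (double *restrict y, const double *restrict x, int n)
{
  for (int i = 0; i < n; i++)
    y[i] = exp (x[i]);
}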

in glibc-2.30 the testing can be conditional on attr
aarch64_vector_pcs support (which should work with
current gcc-9, the lazy binding issue is only solved
in trunk gcc yet, but that's not important for testing).

note that current math/test-math-vector.h provides a
declaration for vector functions, on aarch64 the
declaration is wrong, it needs the aarch64_vector_pcs
attribute:

/* Wrapper from scalar to vector function.  */
#define VECTOR_WRAPPER(scalar_func, vector_func) \
extern VEC_TYPE vector_func (VEC_TYPE);         \
FLOAT scalar_func (FLOAT x)                     \
{                                               \
...
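
a hand-written aarch64 wrapper with the correct declaration would look
roughly like this (hypothetical sketch, not part of the patch; the wrapper
name is made up):

#include <arm_neon.h>

extern __attribute__ ((aarch64_vector_pcs)) float64x2_t
_ZGVnN2v_exp (float64x2_t);

/* scalar-to-vector wrapper: broadcast x, call the vector function
   with the vector PCS, return lane 0.  */
double
exp_vlen2_wrapper (double x)
{
  float64x2_t mx = { x, x };
  float64x2_t mr = _ZGVnN2v_exp (mx);
  return mr[0];
}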

if backporting public header changes is problematic
then this has to wait for another release.
  
Florian Weimer June 21, 2019, 10:53 a.m. UTC | #8
* Szabolcs Nagy:

> On 23/05/2019 11:53, Szabolcs Nagy wrote:
>> (1) it is not obvious that we can enable libmvec in glibc
>> without a toolchain that follows the new ELF abi.
>
> i was thinking about this and i believe we can add
> libmvec abi without toolchain support (e.g. dummy
> asm implementation that falls back to scalar code).

To be clear, it's not a dummy implementation, but one with assembler
trampolines which adjust the calling convention?

So it will be fully functional, just rather slow?

Thanks,
Florian
  
Szabolcs Nagy June 21, 2019, 11:51 a.m. UTC | #9
On 21/06/2019 11:53, Florian Weimer wrote:
> * Szabolcs Nagy:
> 
>> On 23/05/2019 11:53, Szabolcs Nagy wrote:
>>> (1) it is not obvious that we can enable libmvec in glibc
>>> without a toolchain that follows the new ELF abi.
>>
>> i was thinking about this and i believe we can add
>> libmvec abi without toolchain support (e.g. dummy
>> asm implementation that falls back to scalar code).
> 
> To be clear, it's not a dummy implementation, but one with assembler
> trampolines which adjust the calling convention?
> 
> So it will be fully functional, just rather slow?

yes, it's fully functional and conforms to the call abi.

i call it dummy because i would not add vector
code, just fall back to scalar calls in the asm.

i guess the asm could be a trampoline that calls
vector c code following the base call convention,
but we don't have reasonable vector c code yet.

either way we can have abi symbols that follow
the right call convention using asm with an old
compiler.

what we cannot have is declarations that only
specify the availability of the specific symbols
we define.
  
Florian Weimer June 21, 2019, 12:02 p.m. UTC | #10
* Szabolcs Nagy:

> On 21/06/2019 11:53, Florian Weimer wrote:
>> * Szabolcs Nagy:
>> 
>>> On 23/05/2019 11:53, Szabolcs Nagy wrote:
>>>> (1) it is not obvious that we can enable libmvec in glibc
>>>> without a toolchain that follows the new ELF abi.
>>>
>>> i was thinking about this and i believe we can add
>>> libmvec abi without toolchain support (e.g. dummy
>>> asm implementation that falls back to scalar code).
>> 
>> To be clear, it's not a dummy implementation, but one with assembler
>> trampolines which adjust the calling convention?
>> 
>> So it will be fully functional, just rather slow?
>
> yes, it's fully functional and call abi conform.
>
> i call it dummy because i would not add vector
> code, just fall back to scalar calls in the asm.
>
> i guess the asm could be a trampoline that calls
> vector c code following the base call convention,
> but we don't have reasonable vector c code yet.
>
> either way we can have abi symbols that follow
> the right call convention using asm with an old
> compiler.

That still sounds reasonable to me. 8-)

Thanks,
Florian
  

Patch

diff --git a/NEWS b/NEWS
index 0e4c57f273..b1845c02e4 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@  Version 2.30
 
 Major new features:
 
+* Aarch64 now supports libmvec.  Building libmvec on aarch64 requires
+  a compiler that supports the vector function ABI that is defined at
+  https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi
+
+  GCC 9.1 has support for this ABI.  The current libmvec for aarch64
+  has vector versions of the exp and expf functions.
+
 * Unicode 12.1.0 Support: Character encoding, character type info, and
   transliteration tables are all updated to Unicode 12.1.0, using
   generator scripts contributed by Mike FABIAN (Red Hat).
diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac
index 7851dd4dac..5c56511deb 100644
--- a/sysdeps/aarch64/configure.ac
+++ b/sysdeps/aarch64/configure.ac
@@ -20,3 +20,27 @@  if test $libc_cv_aarch64_be = yes; then
 else
   LIBC_CONFIG_VAR([default-abi], [lp64])
 fi
+
+AC_CACHE_CHECK([for pcs attribute support],
+               libc_cv_gcc_pcs_attribute, [dnl
+cat > conftest.c <<EOF
+__attribute__((aarch64_vector_pcs)) extern void foo (void);
+EOF
+libc_cv_gcc_pcs_attribute=no
+if ${CC-cc} -c -Wall -Werror conftest.c -o conftest.o 1>&AS_MESSAGE_LOG_FD \
+   2>&AS_MESSAGE_LOG_FD ; then
+  libc_cv_gcc_pcs_attribute=yes
+fi
+rm -f conftest*])
+
+if test x"$build_mathvec" = xyes; then
+  if test $libc_cv_gcc_pcs_attribute = no; then
+    AC_MSG_ERROR([--enable-mathvec requires a gcc that supports the aarch64_vector_pcs attribute])
+  fi
+fi
+
+if test x"$build_mathvec" = xnotset; then
+  if test $libc_cv_gcc_pcs_attribute = yes; then
+    build_mathvec=yes
+  fi
+fi
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
index 4a182bd6d6..c0720484e2 100644
--- a/sysdeps/aarch64/fpu/Makefile
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -12,3 +12,27 @@  CFLAGS-s_fmaxf.c += -ffinite-math-only
 CFLAGS-s_fmin.c += -ffinite-math-only
 CFLAGS-s_fminf.c += -ffinite-math-only
 endif
+
+ifeq ($(subdir),mathvec)
+CFLAGS-libmvec_double_vlen2_exp.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_float_vlen4_expf.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp2f_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_aliases.c += -march=armv8-a+simd -fno-math-errno
+
+libmvec-support += libmvec_double_vlen2_exp
+libmvec-support += libmvec_float_vlen4_expf
+libmvec-support += libmvec_exp_data
+libmvec-support += libmvec_exp2f_data
+libmvec-support += libmvec_aliases
+
+libmvec-static-only-routines += libmvec_aliases
+endif
+
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen2 float-vlen4
+double-vlen2-funcs = exp
+float-vlen4-funcs = exp
+endif
+endif
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
index e69de29bb2..da36f3c495 100644
--- a/sysdeps/aarch64/fpu/Versions
+++ b/sysdeps/aarch64/fpu/Versions
@@ -0,0 +1,5 @@ 
+libmvec {
+  GLIBC_2.30 {
+    _ZGVnN2v_exp; _ZGVnN4v_expf;
+  }
+}
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
index e69de29bb2..4c3415987a 100644
--- a/sysdeps/aarch64/fpu/bits/math-vector.h
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -0,0 +1,43 @@ 
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+/* Get default empty definitions for simd declarations.  */
+#include <bits/libm-simd-decl-stubs.h>
+
+#if defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case.  */
+#  define __DECL_SIMD_AARCH64 _Pragma ("omp declare simd notinbranch")
+# elif __GNUC_PREREQ (6,0)
+/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
+#  define __DECL_SIMD_AARCH64 __attribute__ ((__simd__ ("notinbranch")))
+# endif
+
+# ifdef __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_exp
+#  define __DECL_SIMD_exp __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_expf
+#  define __DECL_SIMD_expf __DECL_SIMD_AARCH64
+
+# endif
+#endif
diff --git a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
index e69de29bb2..293983eb2c 100644
--- a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
@@ -0,0 +1,20 @@ 
+! Platform-specific declarations of SIMD math functions for Fortran. -*- f90 -*-
+!   Copyright (C) 2019 Free Software Foundation, Inc.
+!   This file is part of the GNU C Library.
+!
+!   The GNU C Library is free software; you can redistribute it and/or
+!   modify it under the terms of the GNU Lesser General Public
+!   License as published by the Free Software Foundation; either
+!   version 2.1 of the License, or (at your option) any later version.
+!
+!   The GNU C Library is distributed in the hope that it will be useful,
+!   but WITHOUT ANY WARRANTY; without even the implied warranty of
+!   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+!   Lesser General Public License for more details.
+!
+!   You should have received a copy of the GNU Lesser General Public
+!   License along with the GNU C Library; if not, see
+!   <http://www.gnu.org/licenses/>.
+
+!GCC$ builtin (exp) attributes simd (notinbranch)
+!GCC$ builtin (expf) attributes simd (notinbranch)
diff --git a/sysdeps/aarch64/fpu/libmvec_aliases.c b/sysdeps/aarch64/fpu/libmvec_aliases.c
index e69de29bb2..bc3f9b8118 100644
--- a/sysdeps/aarch64/fpu/libmvec_aliases.c
+++ b/sysdeps/aarch64/fpu/libmvec_aliases.c
@@ -0,0 +1,40 @@ 
+/* These aliases are added as a workaround to exclude unnecessary symbol
+   aliases in libmvec.so while the compiler creates the vector names
+   based on the scalar asm name.  Corresponding discussion is at
+   <https://gcc.gnu.org/ml/gcc/2015-06/msg00173.html>.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+
+extern __attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v_exp (float64x2_t x);
+
+__attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v___exp_finite (float64x2_t x)
+{
+  return _ZGVnN2v_exp (x);
+}
+
+extern __attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v_expf (float32x4_t x);
+
+__attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v___expf_finite (float32x4_t x)
+{
+  return _ZGVnN4v_expf (x);
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
index e69de29bb2..ce618c8859 100644
--- a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
+++ b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
@@ -0,0 +1,94 @@ 
+/* Double-precision 2 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/dbl-64/e_exp.c.  */
+
+#include <math.h>
+#include <float.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <ieee754.h>
+#include <math-narrow-eval.h>
+#include "math_config.h"
+#include "libmvec_util.h"
+
+#define N (1 << EXP_TABLE_BITS)
+#define InvLn2N __exp_data.invln2N
+#define NegLn2hiN __exp_data.negln2hiN
+#define NegLn2loN __exp_data.negln2loN
+#define Shift __exp_data.shift
+#define T __exp_data.tab
+#define C2 __exp_data.poly[5 - EXP_POLY_ORDER]
+#define C3 __exp_data.poly[6 - EXP_POLY_ORDER]
+#define C4 __exp_data.poly[7 - EXP_POLY_ORDER]
+#define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
+
+#define LIMIT 700.0
+
+/* Do not inline this call.   That way _ZGVnN2v_exp has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN2v_exp has to do.  */
+
+__attribute__((aarch64_vector_pcs, noinline)) static float64x2_t
+__scalar_exp (float64x2_t x)
+{
+  return (float64x2_t) { exp(x[0]), exp(x[1]) };
+}
+
+__attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v_exp (float64x2_t x)
+{
+  double h, z_0, z_1;
+  float64x2_t g, scale_v, tail_v, tmp_v, r_v, r2_v, kd_v;
+  float64x2_t NegLn2hiN_v, NegLn2loN_v, C2_v, C3_v, C4_v, C5_v;
+  uint64_t ki_0, ki_1, idx_0, idx_1;
+  uint64_t top_0, top_1, sbits_0, sbits_1;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = vabsq_f64 (x);
+  h = vmaxnmvq_f64 (g);
+  if (__glibc_unlikely (!(h < LIMIT)))
+    return __scalar_exp (x);
+
+  z_0 = InvLn2N * x[0];
+  z_1 = InvLn2N * x[1];
+  ki_0 = converttoint (z_0);
+  ki_1 = converttoint (z_1);
+
+  idx_0 = 2 * (ki_0 % N);
+  idx_1 = 2 * (ki_1 % N);
+  top_0 = ki_0 << (52 - EXP_TABLE_BITS);
+  top_1 = ki_1 << (52 - EXP_TABLE_BITS);
+  sbits_0 = T[idx_0 + 1] + top_0;
+  sbits_1 = T[idx_1 + 1] + top_1;
+
+  kd_v = (float64x2_t) { roundtoint (z_0), roundtoint (z_1) };
+  scale_v = (float64x2_t) { asdouble (sbits_0), asdouble (sbits_1) };
+  tail_v = (float64x2_t) { asdouble (T[idx_0]), asdouble (T[idx_1]) };
+  NegLn2hiN_v = (float64x2_t) { NegLn2hiN, NegLn2hiN };
+  NegLn2loN_v = (float64x2_t) { NegLn2loN, NegLn2loN };
+  C2_v = (float64x2_t) { C2, C2 };
+  C3_v = (float64x2_t) { C3, C3 };
+  C4_v = (float64x2_t) { C4, C4 };
+  C5_v = (float64x2_t) { C5, C5 };
+
+  r_v = x + kd_v * NegLn2hiN_v + kd_v * NegLn2loN_v;
+  r2_v = r_v * r_v;
+  tmp_v = tail_v + r_v + r2_v * (C2_v + r_v * C3_v) + r2_v * r2_v
+	  * (C4_v + r_v * C5_v);
+  return scale_v + scale_v * tmp_v;
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
index e69de29bb2..d97ce157b0 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
@@ -0,0 +1,2 @@ 
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include <sysdeps/ieee754/flt-32/e_exp2f_data.c>
diff --git a/sysdeps/aarch64/fpu/libmvec_exp_data.c b/sysdeps/aarch64/fpu/libmvec_exp_data.c
index e69de29bb2..a83661b39d 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp_data.c
@@ -0,0 +1 @@ 
+#include <sysdeps/ieee754/dbl-64/e_exp_data.c>
diff --git a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
index e69de29bb2..938c72ddab 100644
--- a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
+++ b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
@@ -0,0 +1,114 @@ 
+/* Single-precision 4 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/flt-32/e_expf.c.  */
+
+#include <math.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include "libmvec_util.h"
+
+#define N (1 << EXP2F_TABLE_BITS)
+#define LIMIT 80.0
+
+#define InvLn2N __exp2f_data.invln2_scaled
+#define T __exp2f_data.tab
+#define C __exp2f_data.poly_scaled
+#define SHIFT __exp2f_data.shift
+
+/* Do not inline this call.  That way _ZGVnN4v_expf has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN4v_expf has to do.  */
+
+__attribute__((aarch64_vector_pcs,noinline)) static float32x4_t
+__scalar_expf (float32x4_t x)
+{
+  return (float32x4_t) { expf(x[0]), expf(x[1]), expf(x[2]), expf(x[3]) };
+}
+
+__attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v_expf (float32x4_t x)
+{
+  float32x4_t g, result;
+  float64x2_t xd_0, xd_1, vInvLn2N, z_0, z_1, vkd_0, vkd_1, r_0, r_1;
+  float64x2_t vs_0, vs_1, c0, c1, c2, y_0, y_1, r2_0, r2_1, one;
+  uint64_t ki_0, ki_1, ki_2, ki_3, t_0, t_1, t_2, t_3;
+  double s_0, s_1, s_2, s_3;
+  float f;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = vabsq_f32 (x);
+  f = vmaxnmvq_f32 (g);
+  if (__glibc_unlikely (!(f < LIMIT)))
+    return __scalar_expf (x);
+  
+  xd_0 = get_lo_and_extend (x);
+  xd_1 = get_hi_and_extend (x);
+
+  vInvLn2N = (float64x2_t) { InvLn2N, InvLn2N };
+  /* x*N/Ln2 = k + r with r in [-1/2, 1/2] and int k.  */
+  z_0 = vInvLn2N * xd_0;
+  z_1 = vInvLn2N * xd_1;
+
+  /* Round and convert z to int, the result is in [-150*N, 128*N] and
+     ideally ties-to-even rule is used, otherwise the magnitude of r
+     can be bigger which gives larger approximation error.  */
+  vkd_0 = vrndaq_f64 (z_0);
+  vkd_1 = vrndaq_f64 (z_1);
+  r_0 = z_0 - vkd_0;
+  r_1 = z_1 - vkd_1;
+
+  ki_0 = (long) vkd_0[0];
+  ki_1 = (long) vkd_0[1];
+  ki_2 = (long) vkd_1[0];
+  ki_3 = (long) vkd_1[1];
+
+  /* exp(x) = 2^(k/N) * 2^(r/N) ~= s * (C0*r^3 + C1*r^2 + C2*r + 1) */
+  t_0 = T[ki_0 % N];
+  t_1 = T[ki_1 % N];
+  t_2 = T[ki_2 % N];
+  t_3 = T[ki_3 % N];
+  t_0 += ki_0 << (52 - EXP2F_TABLE_BITS);
+  t_1 += ki_1 << (52 - EXP2F_TABLE_BITS);
+  t_2 += ki_2 << (52 - EXP2F_TABLE_BITS);
+  t_3 += ki_3 << (52 - EXP2F_TABLE_BITS);
+  s_0 = asdouble (t_0);
+  s_1 = asdouble (t_1);
+  s_2 = asdouble (t_2);
+  s_3 = asdouble (t_3);
+
+  vs_0 = (float64x2_t) { s_0, s_1 };
+  vs_1 = (float64x2_t) { s_2, s_3 };
+  c0 = (float64x2_t) { C[0], C[0] };
+  c1 = (float64x2_t) { C[1], C[1] };
+  c2 = (float64x2_t) { C[2], C[2] };
+  one = (float64x2_t) { 1.0, 1.0 };
+
+  z_0 = c0 * r_0 + c1;
+  z_1 = c0 * r_1 + c1;
+  r2_0 = r_0 * r_0;
+  r2_1 = r_1 * r_1;
+  y_0 = c2 * r_0 + one;
+  y_1 = c2 * r_1 + one;
+  y_0 = z_0 * r2_0 + y_0;
+  y_1 = z_1 * r2_1 + y_1;
+  y_0 = y_0 * vs_0;
+  y_1 = y_1 * vs_1;
+  result = pack_and_trunc (y_0, y_1);
+  return result;
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_util.h b/sysdeps/aarch64/fpu/libmvec_util.h
index e69de29bb2..bd0463ce22 100644
--- a/sysdeps/aarch64/fpu/libmvec_util.h
+++ b/sysdeps/aarch64/fpu/libmvec_util.h
@@ -0,0 +1,54 @@ 
+/* Utility functions for Aarch64 vector functions.
+   Copyright (C) 2015-2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include <arm_neon.h>
+
+/* Copy lower 2 elements of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline float64x2_t
+get_lo_and_extend (float32x4_t x)
+{
+  __Uint64x2_t tmp1 = (__Uint64x2_t) x;
+#if __BYTE_ORDER == __BIG_ENDIAN
+  uint64_t tmp2 = (uint64_t) tmp1[1];
+#else
+  uint64_t tmp2 = (uint64_t) tmp1[0];
+#endif
+  return vcvt_f64_f32 ((float32x2_t) tmp2);
+}
+
+/* Copy upper 2 elements of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline float64x2_t
+get_hi_and_extend (float32x4_t x)
+{
+  return vcvt_high_f64_f32 (x);
+}
+
+/* Copy a pair of 2 element double vectors into a 4 element float vector.  */
+
+static __always_inline float32x4_t
+pack_and_trunc (float64x2_t x, float64x2_t y)
+{
+  float32x2_t xx = vcvt_f32_f64 (x);
+  float32x2_t yy = vcvt_f32_f64 (y);
+  return (vcombine_f32 (xx, yy));
+}
diff --git a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
index e69de29bb2..9eb31c8dfc 100644
--- a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
@@ -0,0 +1,24 @@ 
+/* Wrapper part of tests for aarch64 double vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+#include "test-double-vlen2.h"
+
+#define VEC_TYPE float64x2_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVnN2v_exp)
diff --git a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
index e69de29bb2..7f64acf886 100644
--- a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
@@ -0,0 +1,24 @@ 
+/* Wrapper part of tests for float aarch64 vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+#include "test-float-vlen4.h"
+
+#define VEC_TYPE float32x4_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVnN4v_expf)
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 585e5bbce7..1ed4af9e55 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1601,6 +1601,12 @@  float: 1
 idouble: 1
 ifloat: 1
 
+Function: "exp_vlen2":
+double: 1
+
+Function: "exp_vlen4":
+float: 1
+
 Function: "expm1":
 double: 1
 float: 1
diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
index e69de29bb2..9e178253f7 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
@@ -0,0 +1,2 @@ 
+GLIBC_2.30 _ZGVnN2v_exp F
+GLIBC_2.30 _ZGVnN4v_expf F