Aarch64: Add simd exp/expf functions
Commit Message
Here are float and double vector exp functions for Aarch64. The vector
functions are based on the ieee ones in sysdeps/ieee754/flt-32/e_expf.c
and sysdeps/ieee754/dbl-64/e_exp.c. If any of the values are 'large'
or NaN they actually call the scalar routines, otherwise they use the
Aarch64 SIMD instructions with the same algorithm as the ieee functions.
My testing has not found any differences in exp output for scalar vs.
vector and the newly added tests for the vector routines pass using the
updated libm-test-ulps file.
This patch also sets build_mathvec to yes by default on Aarch64, applies
the simd attribute to exp and expf in the C header and includes a
Fortran header. The Fortran header is in finclude so this patch needs
Martin Liska's patch that moves math-vector-fortran.h from bits to
finclude in order to work correctly.
Comments?
Steve Ellcey
sellcey@marvell.com
2019-03-06 Steve Ellcey <sellcey@marvell.com>
* sysdeps/aarch64/configure.ac (build_mathvec): Set to yes by default.
* sysdeps/aarch64/configure: Regenerate.
* sysdeps/aarch64/fpu/Makefile (CFLAGS-libmvec_double_vlen2_exp.c):
Set flag.
(CFLAGS-libmvec_float_vlen4_expf.c): Likewise.
(CFLAGS-libmvec_exp_data.c): Likewise.
(CFLAGS-libmvec_exp2f_data.c): Likewise.
(libmvec-support): Add libmvec_double_vlen2_exp,
libmvec_float_vlen4_expf, libmvec_exp_data, libmvec_exp2f_data to list.
(libmvec-static-only-routines): Add dummy name to list.
(libmvec-tests): Add double-vlen2, float-vlen4 to list.
(double-vlen2-funcs): Add new vector function name.
(float-vlen4-funcs): Add new vector function name.
* sysdeps/aarch64/fpu/Versions: New file.
* sysdeps/aarch64/fpu/bits/math-vector.h: New file.
* sysdeps/aarch64/fpu/finclude/math-vector-fortran.h: New file.
* sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c: New file.
* sysdeps/aarch64/fpu/libmvec_exp2f_data.c: New file.
* sysdeps/aarch64/fpu/libmvec_exp_data.c: New file.
* sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c: New file.
* sysdeps/aarch64/fpu/libmvec_util.h: New file.
* sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c: New file.
* sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c: New file.
* sysdeps/aarch64/libm-test-ulps (exp_vlen2): New entry.
(exp_vlen4): Likewise.
* sysdeps/unix/sysv/linux/aarch64/libmvec.abilist: New file.
Comments
On 06/03/2019 17:18, Steve Ellcey wrote:
> Here are float and double vector exp functions for Aarch64. The vector
> functions are based on the ieee ones in sysdeps/ieee754/flt-32/e_expf.c
> and sysdeps/ieee754/dbl-64/e_exp.c. If any of the values are 'large'
> or NaN they actually call the scalar routines, otherwise they use the
> Aarch64 SIMD instructions with the same algorithm as the ieee functions.
> My testing has not found any differences in exp output for scalar vs.
> vector and the newly added tests for the vector routines pass using the
> updated libm-test-ulps file.
>
> This patch also sets build_mathvec to yes by default on Aarch64, applies
> the simd attribute to exp and expf in the C header and includes a
> Fortran header. The Fortran header is in finclude so this patch needs
> Martin Liska's patch that moves math-vector-fortran.h from bits to
> finclude in order to work correctly.
>
> Comments?
thanks
this will need to detect support for
__attribute__((aarch64_vector_pcs))
(which will require gcc-9)
and i plan to fix the lazy binding issue
with vector pcs which will require a new
binutils too (currently that's not super
important since the dynamic linker is
unlikely to use fpregs outside of v0-v7,
but depending on the exact nature of the
solution we may require a new gcc and
new binutils too for libmvec)
the scalar algorithms are not optimal for
simd, but should work and i'm fine with
such initial code to enable libmvec and
then optimize it later.
On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote:
>
> this will need to detect support for
> __attribute__((aarch64_vector_pcs))
> (which will require gcc-9)
That seems easy enough to check for.
> and i plan to fix the lazy binding issue
> with vector pcs which will require a new
> binutils too (currently that's not super
> important since the dynamic linker is
> unlikely to use fpregs outside of v0-v7,
> but depending on the exact nature of the
> solution we may require a new gcc and
> new binutils too for libmvec)
I am not sure how I would check for this.
Will it need to be a version check on binutils
or will there be some functionality that can
be checked for? A new fixup type? Do you
have an estimate for when the binutils change
will go in?
Steve Ellcey
sellcey@marvell.com
* Steve Ellcey:
> On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote:
>>
>> this will need to detect support for
>> __attribute__((aarch64_vector_pcs))
>> (which will require gcc-9)
>
> That seems easy enough to check for.
Can you add assembler trampolines, so that the compiler support
becomes optional, at a performance cost?
>> and i plan to fix the lazy binding issue
>> with vector pcs which will require a new
>> binutils too (currently that's not super
>> important since the dynamic linker is
>> unlikely to use fpregs outside of v0-v7,
>> but depending on the exact nature of the
>> solution we may require a new gcc and
>> new binutils too for libmvec)
>
> I am not sure how I would check for this.
> Will it need to be a version check on binutils
> or will there be some functionality that can
> be checked for? A new fixup type? Do you
> have an estimate for when the binutils change
> will go in?
I don't think the binutils change is needed for building or testing
glibc, at least not initially. Just disable lazy binding.
On Wed, 2019-03-06 at 20:16 +0100, Florian Weimer wrote:
> * Steve Ellcey:
>
> > On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote:
> > >
> > > this will need to detect support for
> > > __attribute__((aarch64_vector_pcs))
> > > (which will require gcc-9)
> >
> > That seems easy enough to check for.
>
> Can you add assembler trampolines, so that the compiler support
> becomes optional, at a performance cost?
Yuck. I suppose this is possible, but I do not want to do it.
The whole reason for vector functions (and for the new vector ABI)
is performance so adding a slow path doesn't seem to me like it is
worthwhile.
Steve Ellcey
sellcey@marvell.com
* Steve Ellcey:
> On Wed, 2019-03-06 at 20:16 +0100, Florian Weimer wrote:
>
>> * Steve Ellcey:
>>
>> > On Wed, 2019-03-06 at 19:04 +0000, Szabolcs Nagy wrote:
>> > >
>> > > this will need to detect support for
>> > > __attribute__((aarch64_vector_pcs))
>> > > (which will require gcc-9)
>> >
>> > That seems easy enough to check for.
>>
>> Can you add assembler trampolines, so that the compiler support
>> becomes optional, at a performance cost?
>
> Yuck. I suppose this is possible, but I do not want to do it.
> The whole reason for vector functions (and for the new vector ABI)
> is performance so adding a slow path doesn't seem to me like it is
> worthwhile.
On the other hand, it could help to get libmvec out of the door more
quickly. I think it's not ideal that if you use an older compiler,
you get only a subset of the glibc ABI. We can get away with it here
because it affects an entire soname. Still it might be difficult to
explain why applications are not portable.
On Wed, 2019-03-06 at 20:45 +0100, Florian Weimer wrote:
> > > Can you add assembler trampolines, so that the compiler support
> > > becomes optional, at a performance cost?
> >
> > Yuck. I suppose this is possible, but I do not want to do it.
> > The whole reason for vector functions (and for the new vector ABI)
> > is performance so adding a slow path doesn't seem to me like it is
> > worthwhile.
>
> On the other hand, it could help to get libmvec out of the door more
> quickly. I think it's not ideal that if you use an older compiler,
> you get only a subset of the glibc ABI. We can get away with it here
> because it affects an entire soname. Still it might be difficult to
> explain why applications are not portable.
If the user doesn't have gcc-9, their compiler isn't going to generate
any calls to these routines anyway. So it doesn't really matter if
they have libmvec or not if they don't have gcc-9. If a program was
compiled with gcc-9 somewhere else and then moved, then yes the new
platform might not have libmvec and there will be portability problems.
I guess if someone was building a platform with gcc-8 and the latest
glibc then it might be nice if libmvec could be built, but gcc-9 should
be released before the next glibc is released so hopefully anyone using
the latest released glibc will also use the latest gcc and have all the
necessary compiler support.
Steve Ellcey
sellcey@marvell.com
On 06/03/2019 20:54, Steve Ellcey wrote:
> On Wed, 2019-03-06 at 20:45 +0100, Florian Weimer wrote:
>
>>>> Can you add assembler trampolines, so that the compiler support
>>>> becomes optional, at a performance cost?
>>>
>>> Yuck. I suppose this is possible, but I do not want to do it.
>>> The whole reason for vector functions (and for the new vector ABI)
>>> is performance so adding a slow path doesn't seem to me like it is
>>> worthwhile.
>>
>> On the other hand, it could help to get libmvec out of the door more
>> quickly. I think it's not ideal that if you use an older compiler,
>> you get only a subset of the glibc ABI. We can get away with it here
>> because it affects an entire soname. Still it might be difficult to
>> explain why applications are not portable.
>
> If the user doesn't have gcc-9, their compiler isn't going to generate
> any calls to these routines anyway. So it doesn't really matter if
> they have libmvec or not if they don't have gcc-9. If a program was
> compiled with gcc-9 somewhere else and then moved, then yes the new
> platform might not have libmvec and there will be portability problems.
glibc is probably built with a stable distro gcc, but then the user
may use a trunk gcc to compile code.
of course with trampolines vector math functions may not be worth to
call at all, so it's not clear if having a libmvec with trampolines is
useful other than allowing the glibc abi to be independent of the gcc
version used to compile it.
On 06/03/2019 19:16, Florian Weimer wrote:
> I don't think the binutils change is needed for building or testing
> glibc, at least not initially. Just disable lazy binding.
in principle libmvec dso as well as anything that references vector
pcs symbols would need to be linked with -z now, and even that's
not enough if we ever want to support LD_AUDIT (which is like
permanent lazy binding). i originally thought i can fix this up
with some simple hack, but it will need a bigger change across the
toolchain, i hope i can post some patches soon and then we can
discuss what to do in glibc.
On 06/03/2019 17:18, Steve Ellcey wrote:
> + g = __builtin_aarch64_absv2df (x);
> + h = __builtin_aarch64_reduc_smax_scal_v2df (g);
please use arm_neon.h intrinsics instead of __builtin_aarch64_*, these
are not documented gcc apis, so they may change.
The following comments are mostly on issues also raised for other
architectures, so reading the discussions of both the powerpc patches and
the x86_64 patches is encouraged.
1. The commit message needs to reference the specification of the ABI
being immplemented, and give confirmation of this having been agreed among
all relevant parties, and give details of the GCC version implementing the
ABI. (The ABI document should be clear on exactly what function variants
the pragma / attributes mean should be available. If you wish to add
other variants in future, e.g. SVE variants, those will need to use a
*different* pragma / attribute, to avoid new compilers misinterpreting the
headers from old glibc as meaning the CVE variants are available.)
2. There needs to be a NEWS entry describing the new user-visible feature
and also giving details of the GCC version with support.
3. There should not be any _finite aliases exported from the shared
library; rather, use static-only wrappers as on x86_64. Or fix the
underlying GCC issue to allow the asm name used as a basis for vector
function names to be different from that used as a scalar function name;
see <https://gcc.gnu.org/ml/gcc/2015-06/msg00173.html>.
4. There are formatting issues in this code, including missing spaces
before '(' and incorrect indentation.
5. Give details (including test programs) of how you tested that the
functions do work, with an installed glibc and new-enough GCC, for
vectorized calls resulting from source code calling the scalar functions,
which the glibc testsuite doesn't cover. This is important end-to-end
validation that the ABI is as intended; the lack of it for x86_64 resulted
in sincos ABI issues only being found later.
6. What does if('aarch64') in the Fortran header mean? What do you need
it at all? The installed header should work for all AArch64 ABIs (so
currently BE and LE); it's not expected to work for other architectures.
7. "#ifdef BIG_ENDIAN" is not a valid conditional. The endian.h header
defines both BIG_ENDIAN and LITTLE_ENDIAN, and then defines BYTE_ORDER to
one of those. Does libmvec_util.h get an implicit include of endian.h
somewhere (so you always get the BE path, which somehow works on LE,
indicating test coverage issues that should be resolved, preferably
through automated tests but failing that please describe in the commit
message how you tested that the endian conditionals were correct), or have
you only tested for little-endian which worked because of the macro being
accidentally undefined?
8. Please confirm in the commit message how testing was run for both BE
and LE, given the presence of such conditionals.
On Wed, 6 Mar 2019, Florian Weimer wrote:
> On the other hand, it could help to get libmvec out of the door more
> quickly. I think it's not ideal that if you use an older compiler,
> you get only a subset of the glibc ABI. We can get away with it here
> because it affects an entire soname. Still it might be difficult to
> explain why applications are not portable.
On the whole I think I agree with Rich Felker's argument
<https://sourceware.org/ml/libc-alpha/2015-11/msg00184.html> against
having the presence of libmvec depend on the tools used for the build.
(Note that the installed bits/math-vector.h file, which may be shared
between multilibs, does not depend on the tools used, so if libmvec was
disabled then the installed bits/math-vector.h is not actually correct and
some programs will fail to build.)
This is an argument for removing the --disable-mathvec configure option as
well as either having assembly wrappers or a requirement for new-enough
tool versions for building libmvec functions on platforms where the oldest
supported GCC / binutils aren't new enough.
On 07/03/2019 19:04, Joseph Myers wrote:
> On Wed, 6 Mar 2019, Florian Weimer wrote:
>
>> On the other hand, it could help to get libmvec out of the door more
>> quickly. I think it's not ideal that if you use an older compiler,
>> you get only a subset of the glibc ABI. We can get away with it here
>> because it affects an entire soname. Still it might be difficult to
>> explain why applications are not portable.
>
> On the whole I think I agree with Rich Felker's argument
> <https://sourceware.org/ml/libc-alpha/2015-11/msg00184.html> against
> having the presence of libmvec depend on the tools used for the build.
> (Note that the installed bits/math-vector.h file, which may be shared
> between multilibs, does not depend on the tools used, so if libmvec was
> disabled then the installed bits/math-vector.h is not actually correct and
> some programs will fail to build.)
>
> This is an argument for removing the --disable-mathvec configure option as
> well as either having assembly wrappers or a requirement for new-enough
> tool versions for building libmvec functions on platforms where the oldest
> supported GCC / binutils aren't new enough.
so is it acceptable to submit generated asm to the
source tree together with the c source?
(or even object files if the assembler is not new
enough?)
* Szabolcs Nagy:
> On 07/03/2019 19:04, Joseph Myers wrote:
>> On Wed, 6 Mar 2019, Florian Weimer wrote:
>>
>>> On the other hand, it could help to get libmvec out of the door more
>>> quickly. I think it's not ideal that if you use an older compiler,
>>> you get only a subset of the glibc ABI. We can get away with it here
>>> because it affects an entire soname. Still it might be difficult to
>>> explain why applications are not portable.
>>
>> On the whole I think I agree with Rich Felker's argument
>> <https://sourceware.org/ml/libc-alpha/2015-11/msg00184.html> against
>> having the presence of libmvec depend on the tools used for the build.
>> (Note that the installed bits/math-vector.h file, which may be shared
>> between multilibs, does not depend on the tools used, so if libmvec was
>> disabled then the installed bits/math-vector.h is not actually correct and
>> some programs will fail to build.)
>>
>> This is an argument for removing the --disable-mathvec configure option as
>> well as either having assembly wrappers or a requirement for new-enough
>> tool versions for building libmvec functions on platforms where the oldest
>> supported GCC / binutils aren't new enough.
>
> so is it acceptable to submit generated asm to the
> source tree together with the c source?
No, before we do that, I think we should just require GCC 9 and binutils
2.33 for building aarch64.
I had the hope that you could build a compatible ABI with just a few
assember trampolines, but that's not the case if the DSOs need markers
for disabling lazy binding in client code. (But it is probably more
natural to disable lazy binding though function attributes in the header
file.)
Thanks,
Florian
On Fri, 8 Mar 2019, Florian Weimer wrote:
> > so is it acceptable to submit generated asm to the
> > source tree together with the c source?
>
> No, before we do that, I think we should just require GCC 9 and binutils
> 2.33 for building aarch64.
I'm dubious of requiring unreleased versions (for an architecture that
previously worked with released versions), but given suitable releases,
requiring recent releases for a given architecture may well be appropriate
if it's required for some feature the architecture maintainers want to
support now rather than later. (Cf. how we set the required version to
6.2 for powerpc64le to facilitate the work towards IEEE long double
support.)
On 07/03/2019 15:08, Joseph Myers wrote:
> 1. The commit message needs to reference the specification of the ABI
> being immplemented, and give confirmation of this having been agreed among
> all relevant parties, and give details of the GCC version implementing the
> ABI. (The ABI document should be clear on exactly what function variants
> the pragma / attributes mean should be available. If you wish to add
> other variants in future, e.g. SVE variants, those will need to use a
> *different* pragma / attribute, to avoid new compilers misinterpreting the
> headers from old glibc as meaning the CVE variants are available.)
the next revision of the vector abi document will try to address this.
(might need some gcc changes)
@@ -20,3 +20,7 @@ if test $libc_cv_aarch64_be = yes; then
else
LIBC_CONFIG_VAR([default-abi], [lp64])
fi
+
+if test x"$build_mathvec" = xnotset; then
+ build_mathvec=yes
+fi
@@ -12,3 +12,27 @@ CFLAGS-s_fmaxf.c += -ffinite-math-only
CFLAGS-s_fmin.c += -ffinite-math-only
CFLAGS-s_fminf.c += -ffinite-math-only
endif
+
+ifeq ($(subdir),mathvec)
+CFLAGS-libmvec_double_vlen2_exp.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_float_vlen4_expf.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp2f_data.c += -march=armv8-a+simd -fno-math-errno
+
+libmvec-support += libmvec_double_vlen2_exp
+libmvec-support += libmvec_float_vlen4_expf
+libmvec-support += libmvec_exp_data
+libmvec-support += libmvec_exp2f_data
+
+# If I do not add a static routine I do not get libmvec_nonshared.a
+# installed and GCC will fail to link when it cannot find it.
+libmvec-static-only-routines += libmvec_dummy
+endif
+
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen2 float-vlen4
+double-vlen2-funcs = exp
+float-vlen4-funcs = exp
+endif
+endif
@@ -0,0 +1,5 @@
+libmvec {
+ GLIBC_2.30 {
+ _ZGVnN2v___exp_finite; _ZGVnN2v_exp; _ZGVnN4v___expf_finite; _ZGVnN4v_expf;
+ }
+}
@@ -0,0 +1,43 @@
+/* Platform-specific SIMD declarations of math functions.
+ Copyright (C) 2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+/* Get default empty definitions for simd declarations. */
+#include <bits/libm-simd-decl-stubs.h>
+
+#if defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case. */
+# define __DECL_SIMD_AARCH64 _Pragma ("omp declare simd notinbranch")
+# elif __GNUC_PREREQ (6,0)
+/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)). */
+# define __DECL_SIMD_AARCH64 __attribute__ ((__simd__ ("notinbranch")))
+# endif
+
+# ifdef __DECL_SIMD_AARCH64
+# undef __DECL_SIMD_exp
+# define __DECL_SIMD_exp __DECL_SIMD_AARCH64
+# undef __DECL_SIMD_expf
+# define __DECL_SIMD_expf __DECL_SIMD_AARCH64
+
+# endif
+#endif
@@ -0,0 +1,20 @@
+! Platform-specific declarations of SIMD math functions for Fortran. -*- f90 -*-
+! Copyright (C) 2019 Free Software Foundation, Inc.
+! This file is part of the GNU C Library.
+!
+! The GNU C Library is free software; you can redistribute it and/or
+! modify it under the terms of the GNU Lesser General Public
+! License as published by the Free Software Foundation; either
+! version 2.1 of the License, or (at your option) any later version.
+!
+! The GNU C Library is distributed in the hope that it will be useful,
+! but WITHOUT ANY WARRANTY; without even the implied warranty of
+! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+! Lesser General Public License for more details.
+!
+! You should have received a copy of the GNU Lesser General Public
+! License along with the GNU C Library; if not, see
+! <http://www.gnu.org/licenses/>.
+
+!GCC$ builtin (exp) attributes simd (notinbranch) if('aarch64')
+!GCC$ builtin (expf) attributes simd (notinbranch) if('aarch64')
@@ -0,0 +1,95 @@
+/* Double-precision 2 element vector e^x function.
+ Copyright (C) 2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* This function is based on sysdeps/ieee754/dbl-64/e_exp.c. */
+
+#include <math.h>
+#include <float.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <ieee754.h>
+#include <math-narrow-eval.h>
+#include "math_config.h"
+#include "libmvec_util.h"
+
+#define N (1 << EXP_TABLE_BITS)
+#define InvLn2N __exp_data.invln2N
+#define NegLn2hiN __exp_data.negln2hiN
+#define NegLn2loN __exp_data.negln2loN
+#define Shift __exp_data.shift
+#define T __exp_data.tab
+#define C2 __exp_data.poly[5 - EXP_POLY_ORDER]
+#define C3 __exp_data.poly[6 - EXP_POLY_ORDER]
+#define C4 __exp_data.poly[7 - EXP_POLY_ORDER]
+#define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
+
+#define LIMIT 700.0
+
+/* Do not inline this call. That way _ZGVnN2v_exp has no calls to non-vector
+ functions. This reduces the register saves that _ZGVnN2v_exp has to do. */
+
+__attribute__((aarch64_vector_pcs, noinline)) static __Float64x2_t
+__scalar_exp(__Float64x2_t x)
+{
+ return (__Float64x2_t) { exp(x[0]), exp(x[1]) };
+}
+
+__attribute__((aarch64_vector_pcs)) __Float64x2_t
+_ZGVnN2v_exp(__Float64x2_t x)
+{
+ double h, z_0, z_1;
+ __Float64x2_t g, scale_v, tail_v, tmp_v, r_v, r2_v, kd_v;
+ __Float64x2_t NegLn2hiN_v, NegLn2loN_v, C2_v, C3_v, C4_v, C5_v;
+ uint64_t ki_0, ki_1, idx_0, idx_1;
+ uint64_t top_0, top_1, sbits_0, sbits_1;
+
+ /* If any value is larger than LIMIT, or NAN, call scalar operation. */
+ g = __builtin_aarch64_absv2df (x);
+ h = __builtin_aarch64_reduc_smax_scal_v2df (g);
+ if (__glibc_unlikely (!(h < LIMIT)))
+ return __scalar_exp (x);
+
+ z_0 = InvLn2N * x[0];
+ z_1 = InvLn2N * x[1];
+ ki_0 = converttoint (z_0);
+ ki_1 = converttoint (z_1);
+
+ idx_0 = 2 * (ki_0 % N);
+ idx_1 = 2 * (ki_1 % N);
+ top_0 = ki_0 << (52 - EXP_TABLE_BITS);
+ top_1 = ki_1 << (52 - EXP_TABLE_BITS);
+ sbits_0 = T[idx_0 + 1] + top_0;
+ sbits_1 = T[idx_1 + 1] + top_1;
+
+ kd_v = (__Float64x2_t) { roundtoint (z_0), roundtoint (z_1) };
+ scale_v = (__Float64x2_t) { asdouble (sbits_0), asdouble (sbits_1) };
+ tail_v = (__Float64x2_t) { asdouble (T[idx_0]), asdouble (T[idx_1]) };
+ NegLn2hiN_v = (__Float64x2_t) { NegLn2hiN, NegLn2hiN };
+ NegLn2loN_v = (__Float64x2_t) { NegLn2loN, NegLn2loN };
+ C2_v = (__Float64x2_t) { C2, C2 };
+ C3_v = (__Float64x2_t) { C3, C3 };
+ C4_v = (__Float64x2_t) { C4, C4 };
+ C5_v = (__Float64x2_t) { C5, C5 };
+
+ r_v = x + kd_v * NegLn2hiN_v + kd_v * NegLn2loN_v;
+ r2_v = r_v * r_v;
+ tmp_v = tail_v + r_v + r2_v * (C2_v + r_v * C3_v) + r2_v * r2_v
+ * (C4_v + r_v * C5_v);
+ return scale_v + scale_v * tmp_v;
+}
+weak_alias (_ZGVnN2v_exp, _ZGVnN2v___exp_finite)
@@ -0,0 +1,2 @@
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include <sysdeps/ieee754/flt-32/e_exp2f_data.c>
@@ -0,0 +1 @@
+#include <sysdeps/ieee754/dbl-64/e_exp_data.c>
@@ -0,0 +1,115 @@
+/* Single-precision 2 element vector e^x function.
+ Copyright (C) 2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* This function is based on sysdeps/ieee754/flt-32/e_expf.c. */
+
+#include <math.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include "libmvec_util.h"
+
+#define N (1 << EXP2F_TABLE_BITS)
+#define LIMIT 80.0
+
+#define InvLn2N __exp2f_data.invln2_scaled
+#define T __exp2f_data.tab
+#define C __exp2f_data.poly_scaled
+#define SHIFT __exp2f_data.shift
+
+/* Do not inline this call. That way _ZGVnN4v_expf has no calls to non-vector
+ functions. This reduces the register saves that _ZGVnN4v_expf has to do. */
+
+__attribute__((aarch64_vector_pcs,noinline)) static __Float32x4_t
+__scalar_expf (__Float32x4_t x)
+{
+ return (__Float32x4_t) { expf(x[0]), expf(x[1]), expf(x[2]), expf(x[3]) };
+}
+
+__attribute__((aarch64_vector_pcs)) __Float32x4_t
+_ZGVnN4v_expf(__Float32x4_t x)
+{
+ __Float32x4_t g, result;
+ __Float64x2_t xd_0, xd_1, vInvLn2N, z_0, z_1, vkd_0, vkd_1, r_0, r_1;
+ __Float64x2_t vs_0, vs_1, c0, c1, c2, y_0, y_1, r2_0, r2_1, one;
+ uint64_t ki_0, ki_1, ki_2, ki_3, t_0, t_1, t_2, t_3;
+ double s_0, s_1, s_2, s_3;
+ float f;
+
+ /* If any value is larger than LIMIT, or NAN, call scalar operation. */
+ g = __builtin_aarch64_absv4sf (x);
+ f = __builtin_aarch64_reduc_smax_scal_v4sf (g);
+ if (__glibc_unlikely (!(f < LIMIT)))
+ return __scalar_expf (x);
+
+ xd_0 = get_lo_and_extend (x);
+ xd_1 = get_hi_and_extend (x);
+
+ vInvLn2N = (__Float64x2_t) { InvLn2N, InvLn2N };
+ /* x*N/Ln2 = k + r with r in [-1/2, 1/2] and int k. */
+ z_0 = vInvLn2N * xd_0;
+ z_1 = vInvLn2N * xd_1;
+
+ /* Round and convert z to int, the result is in [-150*N, 128*N] and
+ ideally ties-to-even rule is used, otherwise the magnitude of r
+ can be bigger which gives larger approximation error. */
+ vkd_0 = __builtin_aarch64_roundv2df (z_0);
+ vkd_1 = __builtin_aarch64_roundv2df (z_1);
+ r_0 = z_0 - vkd_0;
+ r_1 = z_1 - vkd_1;
+
+ ki_0 = (long) vkd_0[0];
+ ki_1 = (long) vkd_0[1];
+ ki_2 = (long) vkd_1[0];
+ ki_3 = (long) vkd_1[1];
+
+ /* exp(x) = 2^(k/N) * 2^(r/N) ~= s * (C0*r^3 + C1*r^2 + C2*r + 1) */
+ t_0 = T[ki_0 % N];
+ t_1 = T[ki_1 % N];
+ t_2 = T[ki_2 % N];
+ t_3 = T[ki_3 % N];
+ t_0 += ki_0 << (52 - EXP2F_TABLE_BITS);
+ t_1 += ki_1 << (52 - EXP2F_TABLE_BITS);
+ t_2 += ki_2 << (52 - EXP2F_TABLE_BITS);
+ t_3 += ki_3 << (52 - EXP2F_TABLE_BITS);
+ s_0 = asdouble (t_0);
+ s_1 = asdouble (t_1);
+ s_2 = asdouble (t_2);
+ s_3 = asdouble (t_3);
+
+ vs_0 = (__Float64x2_t) { s_0, s_1 };
+ vs_1 = (__Float64x2_t) { s_2, s_3 };
+ c0 = (__Float64x2_t) { C[0], C[0] };
+ c1 = (__Float64x2_t) { C[1], C[1] };
+ c2 = (__Float64x2_t) { C[2], C[2] };
+ one = (__Float64x2_t) { 1.0, 1.0 };
+
+ z_0 = c0 * r_0 + c1;
+ z_1 = c0 * r_1 + c1;
+ r2_0 = r_0 * r_0;
+ r2_1 = r_1 * r_1;
+ y_0 = c2 * r_0 + one;
+ y_1 = c2 * r_1 + one;
+ y_0 = z_0 * r2_0 + y_0;
+ y_1 = z_1 * r2_1 + y_1;
+ y_0 = y_0 * vs_0;
+ y_1 = y_1 * vs_1;
+ result = pack_and_trunc (y_0, y_1);
+ return result;
+}
+weak_alias (_ZGVnN4v_expf, _ZGVnN4v___expf_finite)
@@ -0,0 +1,53 @@
+/* Utility functions for Aarch64 vector functions.
+ Copyright (C) 2015-2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <stdint.h>
+
+/* Copy lower 2 elements of of a 4 element float vector into a 2 element
+ double vector. */
+
+static __always_inline
+__Float64x2_t get_lo_and_extend (__Float32x4_t x)
+{
+ __Uint64x2_t tmp1 = (__Uint64x2_t) x;
+#ifdef BIG_ENDIAN
+ uint64_t tmp2 = (uint64_t) tmp1[1];
+#else
+ uint64_t tmp2 = (uint64_t) tmp1[0];
+#endif
+ return __builtin_aarch64_float_extend_lo_v2df ((__Float32x2_t) tmp2);
+}
+
+/* Copy upper 2 elements of of a 4 element float vector into a 2 element
+ double vector. */
+
+static __always_inline
+__Float64x2_t get_hi_and_extend (__Float32x4_t x)
+{
+ return __builtin_aarch64_vec_unpacks_hi_v4sf (x);
+}
+
+/* Copy a pair of 2 element double vectors into a 4 element float vector. */
+
+static __always_inline
+__Float32x4_t pack_and_trunc (__Float64x2_t x, __Float64x2_t y)
+{
+ __Float32x2_t xx = __builtin_aarch64_float_truncate_lo_v2sf (x);
+ __Float32x2_t yy = __builtin_aarch64_float_truncate_lo_v2sf (y);
+ return (__builtin_aarch64_combinev2sf (xx, yy));
+}
@@ -0,0 +1,23 @@
+/* Wrapper part of tests for aarch64 double vector math functions.
+ Copyright (C) 2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "test-double-vlen2.h"
+
+#define VEC_TYPE __Float64x2_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVnN2v_exp)
@@ -0,0 +1,23 @@
+/* Wrapper part of tests for float aarch64 vector math functions.
+ Copyright (C) 2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "test-float-vlen4.h"
+
+#define VEC_TYPE __Float32x4_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVnN4v_expf)
@@ -1601,6 +1601,12 @@ float: 1
idouble: 1
ifloat: 1
+Function: "exp_vlen2":
+double: 1
+
+Function: "exp_vlen4":
+float: 1
+
Function: "expm1":
double: 1
float: 1
@@ -0,0 +1,4 @@
+GLIBC_2.30 _ZGVnN2v___exp_finite F
+GLIBC_2.30 _ZGVnN2v_exp F
+GLIBC_2.30 _ZGVnN4v___expf_finite F
+GLIBC_2.30 _ZGVnN4v_expf F