| Submitter | Steve Ellcey |
|---|---|
| Date | May 22, 2019, 4:54 p.m. |
| Message ID | `<bd1b565d471b5549f332a53cf450c91e38634d5a.camel@marvell.com>` |
| Download | mbox, patch |
| Permalink | /patch/32798/ |
| State | New |

## Comments

On 22/05/2019 17:54, Steve Ellcey wrote:

> Here is an updated version of my patch to add libmvec and vector exp
> functions to Aarch64.  GCC 9.1 has now been released so the build
> no longer depends on an unreleased compiler.  I have not added any
> assembly trampolines to allow older compilers to be used.
>
> The SIMD ABI that this patch uses is supported by GCC 9.1 and is defined at:
>
> https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi
>
> If you build with GCC 9.1 (or any compiler that supports the aarch64_vector_pcs
> attribute) you will get libmvec by default, otherwise you will not.  If you
> try to build libmvec using a compiler without aarch64_vector_pcs support the
> configure will fail.

i just sent out the draft text of the ELF ABI yesterday that addresses
the lazy binding issue with sve and advsimd vector pcs. this means
gcc-9.1 does not produce abi conforming binaries, a further gcc and
binutils patch will be needed for that.

however not following the new ELF abi in libmvec.so itself should be
fine: the new ELF symbol marking is needed in the module that has
R_*_JUMP_SLOT relocations for vector symbols, i.e. on the caller side.
testing libmvec in glibc still works by luck since lazy binding does
not clobber problematic fp regs, and there are no ifunc resolvers for
vector math functions that use fp regs either (the test binaries won't
be abi conform though because of the missing symbol marking).

(1) it is not obvious that we can enable libmvec in glibc without a
toolchain that follows the new ELF abi.

> There was a question of whether building libmvec should be optional or not,
> I don't have a strong opinion on that but would be interested in what others
> think.  I could change this to require aarch64_vector_pcs attribute support
> in all cases and always build libmvec if that is what we want.

if libmvec is a conditional part of the glibc abi on aarch64 then
binaries may not be portable between the same version of glibc. so
either we make libmvec an "experimental abi" for which you have to
explicitly opt in (not automatic based on compiler support) or we
somehow ensure the libmvec abi is present even if the compiler building
glibc has no support for the vector abi. the latter can be achieved by
a dummy libmvec (e.g. using trivial asm) that provides the right
symbols without optimal implementations. (testing the dummy libmvec is
problematic without compiler support though)

(2) we need to decide between 'experimental abi' or dummy libmvec or
increasing the minimum toolchain requirement.

(3a) in case of 'experimental abi' there has to be some description of
what users can expect and how to opt out of using libmvec to build
portable binaries.

(3b) in case of dummy libmvec the glibc mechanism to provide it and
test it should be figured out.

> I added static *_finite function names so that they are not exported.  If
> Wilco's patch to remove the *_finite names entirely is approved I can remove
> this part of the patch.
>
> I removed the 'if (aarch64)' conditionals from math-vector-fortran.h
> (and fixed my use of the BIG_ENDIAN macros), this means that the vector exp
> and expf routines should get used in big-endian and little-endian modes for
> Fortran (just like C).  I have not done any big-endian testing at this point
> because I have been doing all my testing on a little-endian Aarch64 linux box.
>
> If anyone has ideas on how to do big-endian testing I would be interested.
> I am guessing I would have to build an elf target and test with qemu or
> something like that but I haven't done a build/test setup like that in quite
> a while.

i'd like to get consensus around (1), (2), (3) first. i don't yet have
a proposal for how this should work.

* Szabolcs Nagy:

> however not following the new ELF abi in libmvec.so itself should
> be fine: the new ELF symbol marking is needed in the module that
> has R_*_JUMP_SLOT relocations for vector symbols, i.e. on the
> caller side. testing libmvec in glibc still works by luck since
> lazy binding does not clobber problematic fp regs, and there are
> no ifunc resolvers for vector math functions that use fp regs
> either (the test binaries won't be abi conform though because
> of the missing symbol marking).

But the dynamic section of libmvec won't have the ABI-mandated markup,
right?

Based on the draft, I assume that it's sufficient if the markup is
present on either side to trigger the special processing.  It's unclear
if it's the compiler's job to set the flag for the libmvec routines, or
if it can rely on the link editor divining it (from the DSO, which it
needs anyway for the symbol version) and trigger the required
processing for the symbols concerned.

Thanks,
Florian

On 23/05/2019 12:08, Florian Weimer wrote:

> * Szabolcs Nagy:
>
>> however not following the new ELF abi in libmvec.so itself should
>> be fine: the new ELF symbol marking is needed in the module that
>> has R_*_JUMP_SLOT relocations for vector symbols, i.e. on the
>> caller side. testing libmvec in glibc still works by luck since
>> lazy binding does not clobber problematic fp regs, and there are
>> no ifunc resolvers for vector math functions that use fp regs
>> either (the test binaries won't be abi conform though because
>> of the missing symbol marking).
>
> But the dynamic section of libmvec won't have the ABI-mandated markup,
> right?

yes (st_other marking will be missing from the dynamic symbol table)

> Based on the draft, I assume that it's sufficient if the markup
> is present on either side to trigger the special processing.

markup on either side (definition or reference) triggers adding the
markup to a shared lib or executable at link time. markup must be
present on the caller side (reference) to trigger special processing
at runtime. the VABI requires markup on *both* sides (i.e. in all
conforming binaries the vector function symbol references will be
marked)

> It's unclear if it's the compiler's job to set the flag for the libmvec
> routines, or if it can rely on the link editor divining it (from the
> DSO, which it needs anyway for the symbol version) and trigger the
> required processing for the symbols concerned.

both: the reference should be marked by the compiler(+assembler)
according to VABI rules. if any reference or definition is marked in
linker inputs then on the output the symbol should be marked according
to ELF ABI rules. in practice the case where the definition is marked
but the ref is unmarked should not happen if everything conforms to
the VABI. (future pcs extensions using the same marking may behave
differently.)

so an unmarked libmvec.so would work in practice but would not be abi
conform. i don't immediately see a huge problem with a non-abi-conform
libmvec (other than that it's ugly and may confuse tools that try to
rely on the marking for some reason), but i think it's also possible
to add a tiny elf processing tool to glibc that adds the marking
manually to the dynamic symbol table if the linker does not know how
to do it.

On Wed, 22 May 2019, Steve Ellcey wrote:

> I added static *_finite function names so that they are not exported.  If
> Wilco's patch to remove the *_finite names entirely is approved I can remove
> this part of the patch.

There's an interesting implication here - that Wilco's patches should
also be removing the *existing* x86_64 libmvec *_finite aliases and
tests thereof (because they are static-only, so not relevant for
compatibility with existing executables and shared libraries - and we
don't claim to keep compatibility with existing .o files when we
obsolete a feature).

A more significant omission in that area from Wilco's patches is that
all the _finite aliases in libm (as opposed to libmvec) need to be
conditioned on SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30), as if
we're obsoleting the aliases that means they should not become part of
the ABI for any future glibc ports.

Hi Joseph,

> There's an interesting implication here - that Wilco's patches should also
> be removing the *existing* x86_64 libmvec *_finite aliases and tests
> thereof (because they are static-only, so not relevant for compatibility
> with existing executables and shared libraries - and we don't claim to
> keep compatibility with existing .o files when we obsolete a feature).

You mean sysdeps/x86_64/fpu/svml_finite_alias.S and
test-libmvec-alias-mod.c?

> A more significant omission in that area from Wilco's patches is that all
> the _finite aliases in libm (as opposed to libmvec) need to be conditioned
> on SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30), as if we're obsoleting the
> aliases that means they should not become part of the ABI for any future
> glibc ports.

So concretely, does this mean the strong aliases need to have a #if
around it like:

```c
#if SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30)
strong_alias (__ieee754_cosh, __cosh_finite)
#endif
```

Do the abi lists need to be changed too?

Wilco

On Tue, 28 May 2019, Wilco Dijkstra wrote:

> You mean sysdeps/x86_64/fpu/svml_finite_alias.S and
> test-libmvec-alias-mod.c?

There seem to be more files involved in testing those aliases, but
that's the general idea.

> So concretely, does this mean the strong aliases need to have a #if
> around it like:
>
> #if SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30)
> strong_alias (__ieee754_cosh, __cosh_finite)
> #endif

Yes.  They also need to change from strong_alias to compat_symbol
calls, of the form:

```c
compat_symbol (__ieee754_cosh, __cosh_finite, GLIBC_2_15)
```

(If any functions have existing versioned_symbol calls you may need to
add another alias to use in the compat_symbol call, because of
assembler requirements that each symbol version gets defined using a
different name for the original symbol.  Hopefully any such cases are
rare, but this probably applies to a few functions such as expf, for
example, where new symbol versions were added as part of eliminating
wrappers.)

*Calls* to the _finite functions will need updating to call other
public names for those functions.  There are a few calls of
WRAPPER_IMPL_SSE2 etc. in sysdeps/x86_64/fpu/, and calls to the
_finite functions from .S files in sysdeps/x86_64/fpu/multiarch/, that
need updating.  (There might be other calls, I haven't done an
exhaustive check.)

> Do the abi lists need to be changed too?

No.  The ABI lists do not distinguish whether a symbol is a compat
symbol or not.  Testing with build-many-glibcs.py to verify that all
the ABI tests still pass is a good idea to verify that the patch
hasn't accidentally broken the ABI.
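Putting the two points above together, the shape of the change for one function might look like the sketch below. This is glibc-internal and not standalone-compilable; `__cosh_finite_alias` is a hypothetical extra alias name for the versioned_symbol-clash case Joseph describes, and the four-argument `compat_symbol (lib, local, symbol, version)` form is assumed from glibc's `shlib-compat.h` conventions.

```c
/* Sketch only: keep __cosh_finite as a compat symbol for ports whose
   ABI baseline predates GLIBC_2.30, and omit it everywhere else.  */
#include <shlib-compat.h>

#if SHLIB_COMPAT (libm, GLIBC_2_15, GLIBC_2_30)
/* A distinct alias name is needed because each symbol version must be
   defined from a different name of the original symbol.  */
strong_alias (__ieee754_cosh, __cosh_finite_alias)
compat_symbol (libm, __cosh_finite_alias, __cosh_finite, GLIBC_2_15);
#endif
```

New ports built after the obsolescence then never export `__cosh_finite` at all, which is exactly why the abilist files need no change: they list what each existing port exports, compat or not.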

## Patch

```diff
diff --git a/NEWS b/NEWS
index 0e4c57f273..b1845c02e4 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Version 2.30
 
 Major new features:
 
+* Aarch64 now supports libmvec.  Building libmvec on aarch64 requires
+  a compiler that supports the vector function ABI that is defined at
+  https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi
+
+  GCC 9.1 has support for this ABI.  The current libmvec for aarch64
+  has vector versions of the exp and expf functions.
+
 * Unicode 12.1.0 Support: Character encoding, character type info, and
   transliteration tables are all updated to Unicode 12.1.0, using
   generator scripts contributed by Mike FABIAN (Red Hat).
diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac
index 7851dd4dac..5c56511deb 100644
--- a/sysdeps/aarch64/configure.ac
+++ b/sysdeps/aarch64/configure.ac
@@ -20,3 +20,27 @@ if test $libc_cv_aarch64_be = yes; then
 else
   LIBC_CONFIG_VAR([default-abi], [lp64])
 fi
+
+AC_CACHE_CHECK([for pcs attribute support],
+	       libc_cv_gcc_pcs_attribute, [dnl
+cat > conftest.c <<EOF
+__attribute__((aarch64_vector_pcs)) extern void foo (void);
+EOF
+libc_cv_gcc_pcs_attribute=no
+if ${CC-cc} -c -Wall -Werror conftest.c -o conftest.o 1>&AS_MESSAGE_LOG_FD \
+   2>&AS_MESSAGE_LOG_FD ; then
+  libc_cv_gcc_pcs_attribute=yes
+fi
+rm -f conftest*])
+
+if test x"$build_mathvec" = xyes; then
+  if test $libc_cv_gcc_pcs_attribute = no; then
+    AC_MSG_ERROR([--enable-mathvec requires a gcc that supports the aarch64_vector_pcs attribute])
+  fi
+fi
+
+if test x"$build_mathvec" = xnotset; then
+  if test $libc_cv_gcc_pcs_attribute = yes; then
+    build_mathvec=yes
+  fi
+fi
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
index 4a182bd6d6..c0720484e2 100644
--- a/sysdeps/aarch64/fpu/Makefile
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -12,3 +12,27 @@ CFLAGS-s_fmaxf.c += -ffinite-math-only
 CFLAGS-s_fmin.c += -ffinite-math-only
 CFLAGS-s_fminf.c += -ffinite-math-only
 endif
+
+ifeq ($(subdir),mathvec)
+CFLAGS-libmvec_double_vlen2_exp.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_float_vlen4_expf.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp2f_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_aliases.c += -march=armv8-a+simd -fno-math-errno
+
+libmvec-support += libmvec_double_vlen2_exp
+libmvec-support += libmvec_float_vlen4_expf
+libmvec-support += libmvec_exp_data
+libmvec-support += libmvec_exp2f_data
+libmvec-support += libmvec_aliases
+
+libmvec-static-only-routines += libmvec_aliases
+endif
+
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen2 float-vlen4
+double-vlen2-funcs = exp
+float-vlen4-funcs = exp
+endif
+endif
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
index e69de29bb2..da36f3c495 100644
--- a/sysdeps/aarch64/fpu/Versions
+++ b/sysdeps/aarch64/fpu/Versions
@@ -0,0 +1,5 @@
+libmvec {
+  GLIBC_2.30 {
+    _ZGVnN2v_exp; _ZGVnN4v_expf;
+  }
+}
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
index e69de29bb2..4c3415987a 100644
--- a/sysdeps/aarch64/fpu/bits/math-vector.h
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -0,0 +1,43 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+/* Get default empty definitions for simd declarations.  */
+#include <bits/libm-simd-decl-stubs.h>
+
+#if defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case.  */
+#  define __DECL_SIMD_AARCH64 _Pragma ("omp declare simd notinbranch")
+# elif __GNUC_PREREQ (6,0)
+/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
+#  define __DECL_SIMD_AARCH64 __attribute__ ((__simd__ ("notinbranch")))
+# endif
+
+# ifdef __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_exp
+#  define __DECL_SIMD_exp __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_expf
+#  define __DECL_SIMD_expf __DECL_SIMD_AARCH64
+
+# endif
+#endif
diff --git a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
index e69de29bb2..293983eb2c 100644
--- a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
@@ -0,0 +1,20 @@
+! Platform-specific declarations of SIMD math functions for Fortran. -*- f90 -*-
+! Copyright (C) 2019 Free Software Foundation, Inc.
+! This file is part of the GNU C Library.
+!
+! The GNU C Library is free software; you can redistribute it and/or
+! modify it under the terms of the GNU Lesser General Public
+! License as published by the Free Software Foundation; either
+! version 2.1 of the License, or (at your option) any later version.
+!
+! The GNU C Library is distributed in the hope that it will be useful,
+! but WITHOUT ANY WARRANTY; without even the implied warranty of
+! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+! Lesser General Public License for more details.
+!
+! You should have received a copy of the GNU Lesser General Public
+! License along with the GNU C Library; if not, see
+! <http://www.gnu.org/licenses/>.
+
+!GCC$ builtin (exp) attributes simd (notinbranch)
+!GCC$ builtin (expf) attributes simd (notinbranch)
diff --git a/sysdeps/aarch64/fpu/libmvec_aliases.c b/sysdeps/aarch64/fpu/libmvec_aliases.c
index e69de29bb2..bc3f9b8118 100644
--- a/sysdeps/aarch64/fpu/libmvec_aliases.c
+++ b/sysdeps/aarch64/fpu/libmvec_aliases.c
@@ -0,0 +1,40 @@
+/* These aliases added as workaround to exclude unnecessary symbol
+   aliases in libmvec.so while compiler creates the vector names
+   based on scalar asm name.  Corresponding discussion is at
+   <https://gcc.gnu.org/ml/gcc/2015-06/msg00173.html>.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+
+extern __attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v_exp (float64x2_t x);
+
+__attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v___exp_finite (float64x2_t x)
+{
+  return _ZGVnN2v_exp (x);
+}
+
+extern __attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v_expf (float32x4_t x);
+
+__attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v___expf_finite (float32x4_t x)
+{
+  return _ZGVnN4v_expf (x);
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
index e69de29bb2..ce618c8859 100644
--- a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
+++ b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
@@ -0,0 +1,94 @@
+/* Double-precision 2 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/dbl-64/e_exp.c.  */
+
+#include <math.h>
+#include <float.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <ieee754.h>
+#include <math-narrow-eval.h>
+#include "math_config.h"
+#include "libmvec_util.h"
+
+#define N (1 << EXP_TABLE_BITS)
+#define InvLn2N __exp_data.invln2N
+#define NegLn2hiN __exp_data.negln2hiN
+#define NegLn2loN __exp_data.negln2loN
+#define Shift __exp_data.shift
+#define T __exp_data.tab
+#define C2 __exp_data.poly[5 - EXP_POLY_ORDER]
+#define C3 __exp_data.poly[6 - EXP_POLY_ORDER]
+#define C4 __exp_data.poly[7 - EXP_POLY_ORDER]
+#define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
+
+#define LIMIT 700.0
+
+/* Do not inline this call.  That way _ZGVnN2v_exp has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN2v_exp has to do.  */
+
+__attribute__((aarch64_vector_pcs, noinline)) static float64x2_t
+__scalar_exp (float64x2_t x)
+{
+  return (float64x2_t) { exp(x[0]), exp(x[1]) };
+}
+
+__attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v_exp (float64x2_t x)
+{
+  double h, z_0, z_1;
+  float64x2_t g, scale_v, tail_v, tmp_v, r_v, r2_v, kd_v;
+  float64x2_t NegLn2hiN_v, NegLn2loN_v, C2_v, C3_v, C4_v, C5_v;
+  uint64_t ki_0, ki_1, idx_0, idx_1;
+  uint64_t top_0, top_1, sbits_0, sbits_1;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = vabsq_f64 (x);
+  h = vmaxnmvq_f64 (g);
+  if (__glibc_unlikely (!(h < LIMIT)))
+    return __scalar_exp (x);
+
+  z_0 = InvLn2N * x[0];
+  z_1 = InvLn2N * x[1];
+  ki_0 = converttoint (z_0);
+  ki_1 = converttoint (z_1);
+
+  idx_0 = 2 * (ki_0 % N);
+  idx_1 = 2 * (ki_1 % N);
+  top_0 = ki_0 << (52 - EXP_TABLE_BITS);
+  top_1 = ki_1 << (52 - EXP_TABLE_BITS);
+  sbits_0 = T[idx_0 + 1] + top_0;
+  sbits_1 = T[idx_1 + 1] + top_1;
+
+  kd_v = (float64x2_t) { roundtoint (z_0), roundtoint (z_1) };
+  scale_v = (float64x2_t) { asdouble (sbits_0), asdouble (sbits_1) };
+  tail_v = (float64x2_t) { asdouble (T[idx_0]), asdouble (T[idx_1]) };
+  NegLn2hiN_v = (float64x2_t) { NegLn2hiN, NegLn2hiN };
+  NegLn2loN_v = (float64x2_t) { NegLn2loN, NegLn2loN };
+  C2_v = (float64x2_t) { C2, C2 };
+  C3_v = (float64x2_t) { C3, C3 };
+  C4_v = (float64x2_t) { C4, C4 };
+  C5_v = (float64x2_t) { C5, C5 };
+
+  r_v = x + kd_v * NegLn2hiN_v + kd_v * NegLn2loN_v;
+  r2_v = r_v * r_v;
+  tmp_v = tail_v + r_v + r2_v * (C2_v + r_v * C3_v) + r2_v * r2_v
+	  * (C4_v + r_v * C5_v);
+  return scale_v + scale_v * tmp_v;
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
index e69de29bb2..d97ce157b0 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
@@ -0,0 +1,2 @@
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include <sysdeps/ieee754/flt-32/e_exp2f_data.c>
diff --git a/sysdeps/aarch64/fpu/libmvec_exp_data.c b/sysdeps/aarch64/fpu/libmvec_exp_data.c
index e69de29bb2..a83661b39d 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp_data.c
@@ -0,0 +1 @@
+#include <sysdeps/ieee754/dbl-64/e_exp_data.c>
diff --git a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
index e69de29bb2..938c72ddab 100644
--- a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
+++ b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
@@ -0,0 +1,114 @@
+/* Single-precision 4 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/flt-32/e_expf.c.  */
+
+#include <math.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include "libmvec_util.h"
+
+#define N (1 << EXP2F_TABLE_BITS)
+#define LIMIT 80.0
+
+#define InvLn2N __exp2f_data.invln2_scaled
+#define T __exp2f_data.tab
+#define C __exp2f_data.poly_scaled
+#define SHIFT __exp2f_data.shift
+
+/* Do not inline this call.  That way _ZGVnN4v_expf has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN4v_expf has to do.  */
+
+__attribute__((aarch64_vector_pcs,noinline)) static float32x4_t
+__scalar_expf (float32x4_t x)
+{
+  return (float32x4_t) { expf(x[0]), expf(x[1]), expf(x[2]), expf(x[3]) };
+}
+
+__attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v_expf (float32x4_t x)
+{
+  float32x4_t g, result;
+  float64x2_t xd_0, xd_1, vInvLn2N, z_0, z_1, vkd_0, vkd_1, r_0, r_1;
+  float64x2_t vs_0, vs_1, c0, c1, c2, y_0, y_1, r2_0, r2_1, one;
+  uint64_t ki_0, ki_1, ki_2, ki_3, t_0, t_1, t_2, t_3;
+  double s_0, s_1, s_2, s_3;
+  float f;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = vabsq_f32 (x);
+  f = vmaxnmvq_f32 (g);
+  if (__glibc_unlikely (!(f < LIMIT)))
+    return __scalar_expf (x);
+
+  xd_0 = get_lo_and_extend (x);
+  xd_1 = get_hi_and_extend (x);
+
+  vInvLn2N = (float64x2_t) { InvLn2N, InvLn2N };
+  /* x*N/Ln2 = k + r with r in [-1/2, 1/2] and int k.  */
+  z_0 = vInvLn2N * xd_0;
+  z_1 = vInvLn2N * xd_1;
+
+  /* Round and convert z to int, the result is in [-150*N, 128*N] and
+     ideally ties-to-even rule is used, otherwise the magnitude of r
+     can be bigger which gives larger approximation error.  */
+  vkd_0 = vrndaq_f64 (z_0);
+  vkd_1 = vrndaq_f64 (z_1);
+  r_0 = z_0 - vkd_0;
+  r_1 = z_1 - vkd_1;
+
+  ki_0 = (long) vkd_0[0];
+  ki_1 = (long) vkd_0[1];
+  ki_2 = (long) vkd_1[0];
+  ki_3 = (long) vkd_1[1];
+
+  /* exp(x) = 2^(k/N) * 2^(r/N) ~= s * (C0*r^3 + C1*r^2 + C2*r + 1) */
+  t_0 = T[ki_0 % N];
+  t_1 = T[ki_1 % N];
+  t_2 = T[ki_2 % N];
+  t_3 = T[ki_3 % N];
+  t_0 += ki_0 << (52 - EXP2F_TABLE_BITS);
+  t_1 += ki_1 << (52 - EXP2F_TABLE_BITS);
+  t_2 += ki_2 << (52 - EXP2F_TABLE_BITS);
+  t_3 += ki_3 << (52 - EXP2F_TABLE_BITS);
+  s_0 = asdouble (t_0);
+  s_1 = asdouble (t_1);
+  s_2 = asdouble (t_2);
+  s_3 = asdouble (t_3);
+
+  vs_0 = (float64x2_t) { s_0, s_1 };
+  vs_1 = (float64x2_t) { s_2, s_3 };
+  c0 = (float64x2_t) { C[0], C[0] };
+  c1 = (float64x2_t) { C[1], C[1] };
+  c2 = (float64x2_t) { C[2], C[2] };
+  one = (float64x2_t) { 1.0, 1.0 };
+
+  z_0 = c0 * r_0 + c1;
+  z_1 = c0 * r_1 + c1;
+  r2_0 = r_0 * r_0;
+  r2_1 = r_1 * r_1;
+  y_0 = c2 * r_0 + one;
+  y_1 = c2 * r_1 + one;
+  y_0 = z_0 * r2_0 + y_0;
+  y_1 = z_1 * r2_1 + y_1;
+  y_0 = y_0 * vs_0;
+  y_1 = y_1 * vs_1;
+  result = pack_and_trunc (y_0, y_1);
+  return result;
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_util.h b/sysdeps/aarch64/fpu/libmvec_util.h
index e69de29bb2..bd0463ce22 100644
--- a/sysdeps/aarch64/fpu/libmvec_util.h
+++ b/sysdeps/aarch64/fpu/libmvec_util.h
@@ -0,0 +1,54 @@
+/* Utility functions for Aarch64 vector functions.
+   Copyright (C) 2015-2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include <arm_neon.h>
+
+/* Copy lower 2 elements of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline float64x2_t
+get_lo_and_extend (float32x4_t x)
+{
+  __Uint64x2_t tmp1 = (__Uint64x2_t) x;
+#if __BYTE_ORDER == __BIG_ENDIAN
+  uint64_t tmp2 = (uint64_t) tmp1[1];
+#else
+  uint64_t tmp2 = (uint64_t) tmp1[0];
+#endif
+  return vcvt_f64_f32 ((float32x2_t) tmp2);
+}
+
+/* Copy upper 2 elements of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline float64x2_t
+get_hi_and_extend (float32x4_t x)
+{
+  return vcvt_high_f64_f32 (x);
+}
+
+/* Copy a pair of 2 element double vectors into a 4 element float vector.  */
+
+static __always_inline float32x4_t
+pack_and_trunc (float64x2_t x, float64x2_t y)
+{
+  float32x2_t xx = vcvt_f32_f64 (x);
+  float32x2_t yy = vcvt_f32_f64 (y);
+  return (vcombine_f32 (xx, yy));
+}
diff --git a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
index e69de29bb2..9eb31c8dfc 100644
--- a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
@@ -0,0 +1,24 @@
+/* Wrapper part of tests for aarch64 double vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+#include "test-double-vlen2.h"
+
+#define VEC_TYPE float64x2_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVnN2v_exp)
diff --git a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
index e69de29bb2..7f64acf886 100644
--- a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
@@ -0,0 +1,24 @@
+/* Wrapper part of tests for float aarch64 vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+#include "test-float-vlen4.h"
+
+#define VEC_TYPE float32x4_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVnN4v_expf)
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 585e5bbce7..1ed4af9e55 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1601,6 +1601,12 @@ float: 1
 idouble: 1
 ifloat: 1
 
+Function: "exp_vlen2":
+double: 1
+
+Function: "exp_vlen4":
+float: 1
+
 Function: "expm1":
 double: 1
 float: 1
diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
index e69de29bb2..9e178253f7 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
@@ -0,0 +1,2 @@
+GLIBC_2.30 _ZGVnN2v_exp F
+GLIBC_2.30 _ZGVnN4v_expf F
```