[v2,1/2] PPC64: Add libmvec SIMD single-precision natural exponent function.
Commit Message
Passes all tests.
Based off the ./sysdeps/ieee754/dbl-64/e_exp.c implementation,
and thus provides identical results.
Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar expf to figure out what is causing the overflow or underflow.
Suprisingly the special-case path performs as well as the normal path.
(both of which are vectorized)
Running 20 times over 32MiB
vector: mean 432.263032 MiB/s (sd 0.486733)
scalar: mean 178.646197 MiB/s (sd 0.050013)
v2: NFC, fixed some style issues, note that this is based on another impl.
2019-05-11 Shawn Landden <shawn@git.icu>
[BZ #24209]
* NEWS: Noted the addition of PPC64 vector expf function.
* sysdeps/powerpc/bits/math-vector.h: Added entry for vector expf.
* sysdeps/powerpc/powerpc64/fpu/Versions: Added vector expf entry.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile:
(libmvec-sysdep_routines, CFLAGS-vec_s_expf4_vsx.c):
(CFLAGS-vec_s_exp2f_data.c): Added build of VSX SIMD expf
function and tests. Added vec_math_errf.c to build.
* sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_flt.h: Modified for expf.
* sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c:
Added entry for vector expf.
* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_expf4_vsx.c: New file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_exp2f_data.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD expf
added.
---
NEWS | 1 +
sysdeps/powerpc/bits/math-vector.h | 2 +
sysdeps/powerpc/powerpc64/fpu/Versions | 1 +
.../powerpc/powerpc64/fpu/multiarch/Makefile | 6 +-
.../powerpc64/fpu/multiarch/math_config_flt.h | 64 +++++--
.../fpu/multiarch/test-float-vlen4-wrappers.c | 1 +
.../powerpc64/fpu/multiarch/vec_math_errf.c | 39 +++++
.../fpu/multiarch/vec_s_exp2f_data.c | 58 +++++++
.../powerpc64/fpu/multiarch/vec_s_expf4_vsx.c | 159 ++++++++++++++++++
.../linux/powerpc/powerpc64/libmvec.abilist | 1 +
10 files changed, 321 insertions(+), 11 deletions(-)
create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c
create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_exp2f_data.c
create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_expf4_vsx.c
Comments
On Mon, May 27, 2019 at 12:47 PM Shawn Landden <shawn@git.icu> wrote:
>
> Passes all tests.
>
> Based off the ./sysdeps/ieee754/dbl-64/e_exp.c implementation,
> and thus provides identical results.
>
> Unlike other libmvec functions, this sets the underflow and overflow bits.
> The caller can check these flags, and possibly re-run the calculations with
> scalar expf to figure out what is causing the overflow or underflow.
>
> Suprisingly the special-case path performs as well as the normal path.
> (both of which are vectorized)
> Running 20 times over 32MiB
> vector: mean 432.263032 MiB/s (sd 0.486733)
> scalar: mean 178.646197 MiB/s (sd 0.050013)
>
> v2: NFC, fixed some style issues, note that this is based on another impl.
>
> 2019-05-11 Shawn Landden <shawn@git.icu>
>
> [BZ #24209]
> * NEWS: Noted the addition of PPC64 vector expf function.
> * sysdeps/powerpc/bits/math-vector.h: Added entry for vector expf.
> * sysdeps/powerpc/powerpc64/fpu/Versions: Added vector expf entry.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile:
> (libmvec-sysdep_routines, CFLAGS-vec_s_expf4_vsx.c):
> (CFLAGS-vec_s_exp2f_data.c): Added build of VSX SIMD expf
> function and tests. Added vec_math_errf.c to build.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_flt.h: Modified for expf.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c:
> Added entry for vector expf.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_expf4_vsx.c: New file.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_exp2f_data.c: Likewise.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c: Likewise.
> * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD expf
> added.
> ---
> NEWS | 1 +
> sysdeps/powerpc/bits/math-vector.h | 2 +
> sysdeps/powerpc/powerpc64/fpu/Versions | 1 +
> .../powerpc/powerpc64/fpu/multiarch/Makefile | 6 +-
> .../powerpc64/fpu/multiarch/math_config_flt.h | 64 +++++--
> .../fpu/multiarch/test-float-vlen4-wrappers.c | 1 +
> .../powerpc64/fpu/multiarch/vec_math_errf.c | 39 +++++
> .../fpu/multiarch/vec_s_exp2f_data.c | 58 +++++++
> .../powerpc64/fpu/multiarch/vec_s_expf4_vsx.c | 159 ++++++++++++++++++
> .../linux/powerpc/powerpc64/libmvec.abilist | 1 +
> 10 files changed, 321 insertions(+), 11 deletions(-)
> create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c
> create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_exp2f_data.c
> create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_expf4_vsx.c
>
> diff --git a/NEWS b/NEWS
> index 6f9af0e24e..26f5c67f75 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -18,10 +18,11 @@ Major new features:
> - single-precision logarithm: logf
> - double-precision sine: sin
> - single-precision sine: sinf
> - double-precision sincos: sincos
> - single-precision sincos: sincosf
> + - single-precision exp: expf
>
> GCC support for auto-vectorization of functions on PPC64 is not yet
> available. Until that is done, the new vector math functions are
> inaccessible to applications.
>
> diff --git a/sysdeps/powerpc/bits/math-vector.h b/sysdeps/powerpc/bits/math-vector.h
> index 6dc62480fd..bbc9db0dd9 100644
> --- a/sysdeps/powerpc/bits/math-vector.h
> +++ b/sysdeps/powerpc/bits/math-vector.h
> @@ -42,8 +42,10 @@
> # define __DECL_SIMD_logf __DECL_SIMD_PPC64
> # undef __DECL_SIMD_sincos
> # define __DECL_SIMD_sincos __DECL_SIMD_PPC64
> # undef __DECL_SIMD_sincosf
> # define __DECL_SIMD_sincosf __DECL_SIMD_PPC64
> +# undef __DECL_SIMD_expf
> +# define __DECL_SIMD_expf __DECL_SIMD_PPC64
>
> # endif
> #endif
> diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions
> index c083d9f938..7c45e76074 100644
> --- a/sysdeps/powerpc/powerpc64/fpu/Versions
> +++ b/sysdeps/powerpc/powerpc64/fpu/Versions
> @@ -1,6 +1,7 @@
> libmvec {
> GLIBC_2.30 {
> _ZGVbN2v_cos; _ZGVbN4v_cosf; _ZGVbN2v_sin; _ZGVbN4v_sinf;
> _ZGVbN2v_log; _ZGVbN4v_logf; _ZGVbN2vvv_sincos; _ZGVbN4vvv_sincosf;
> + _ZGVbN4v_expf;
> }
> }
> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
> index a4f0d1694e..ad3c29b1ab 100644
> --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
> @@ -46,10 +46,12 @@ endif
> ifeq ($(subdir),mathvec)
> libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \
> vec_d_sin2_vsx vec_s_sinf4_vsx \
> vec_d_log2_vsx vec_d_log_data \
> vec_s_logf4_vsx vec_s_logf_data \
> + vec_s_expf4_vsx vec_s_exp2f_data \
> + vec_math_errf \
> vec_d_sincos2_vsx vec_s_sincosf4_vsx
> CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_d_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_d_log_data.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_s_cosf4_vsx.c += -mabi=altivec -maltivec -mvsx
> @@ -57,19 +59,21 @@ CFLAGS-vec_s_logf4_vsx.c += -mabi=altivec -maltivec -mvsx
> CFLAGS-vec_s_logf_data.c += -mabi=altivec -maltivec -mvsx
> CFLAGS-vec_d_sin2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_s_sinf4_vsx.c += -mabi=altivec -maltivec -mvsx
> CFLAGS-vec_d_sincos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_s_sincosf4_vsx.c += -mabi=altivec -maltivec -mvsx
> +CFLAGS-vec_s_expf4_vsx.c += -mabi=altivec -maltivec -mvsx
> +CFLAGS-vec_s_exp2f_data.c += -mabi=altivec -maltivec -mvsx
> endif
>
> # Variables for libmvec tests.
> ifeq ($(subdir),math)
> ifeq ($(build-mathvec),yes)
> libmvec-tests += double-vlen2 float-vlen4
>
> double-vlen2-funcs = cos sin sincos log
> -float-vlen4-funcs = cos sin sincos log
> +float-vlen4-funcs = cos sin sincos log exp
>
> double-vlen2-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
> float-vlen4-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
>
> CFLAGS-test-double-vlen2-wrappers.c += -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_flt.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_flt.h
> index b5fbe02d91..18f34197e2 100644
> --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_flt.h
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_flt.h
> @@ -46,20 +46,42 @@
>
> #if TOINT_INTRINSICS
> /* Round x to nearest int in all rounding modes, ties have to be rounded
> consistently with converttoint so the results match. If the result
> would be outside of [-2^31, 2^31-1] then the semantics is unspecified. */
> -static inline double_t
> -roundtoint (double_t x);
> +static inline vector double
> +roundtoint (vector double x);
>
> /* Convert x to nearest int in all rounding modes, ties have to be rounded
> consistently with roundtoint. If the result is not representible in an
> int32_t then the semantics is unspecified. */
> -static inline int32_t
> -converttoint (double_t x);
> +static inline vector unsigned long long
> +converttoint (vector double x);
> #endif
>
> +static inline vector unsigned
> +vasuint (vector float f)
> +{
> + union
> + {
> + vector float f;
> + vector unsigned i;
> + } u = {f};
> + return u.i;
> +}
> +
> +static inline vector float
> +vasfloat (vector unsigned i)
> +{
> + union
> + {
> + vector unsigned i;
> + vector float f;
> + } u = {i};
> + return u.f;
> +}
> +
> static inline uint32_t
> asuint (float f)
> {
> union
> {
> @@ -78,10 +100,32 @@ asfloat (uint32_t i)
> float f;
> } u = {i};
> return u.f;
> }
>
> +static inline vector unsigned long long
> +vasuint64 (vector double f)
> +{
> + union
> + {
> + vector double f;
> + vector unsigned long long i;
> + } u = {f};
> + return u.i;
> +}
> +
> +static inline vector double
> +vasdouble (vector unsigned long long i)
> +{
> + union
> + {
> + vector unsigned long long i;
> + vector double f;
> + } u = {i};
> + return u.f;
> +}
> +
> static inline uint64_t
> asuint64 (double f)
> {
> union
> {
> @@ -122,16 +166,16 @@ attribute_hidden float __math_invalidf (float);
> /* Shared between expf, exp2f and powf. */
> #define EXP2F_TABLE_BITS 5
> #define EXP2F_POLY_ORDER 3
> extern const struct exp2f_data
> {
> - uint64_t tab[1 << EXP2F_TABLE_BITS];
> - double shift_scaled;
> - double poly[EXP2F_POLY_ORDER];
> - double shift;
> - double invln2_scaled;
> - double poly_scaled[EXP2F_POLY_ORDER];
> + long long unsigned tab[1 << EXP2F_TABLE_BITS];
> + vector double shift_scaled;
> + vector double poly[EXP2F_POLY_ORDER];
> + vector double shift;
> + vector double invln2_scaled;
> + vector double poly_scaled[EXP2F_POLY_ORDER];
> } __exp2f_data attribute_hidden;
>
> #define LOGF_TABLE_BITS 4
> #define LOGF_POLY_ORDER 4
> extern const struct logf_data
> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
> index b4721b1dac..37886d1a45 100644
> --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
> @@ -22,6 +22,7 @@
> #define VEC_TYPE vector float
>
> VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
> VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf)
> VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf)
> +VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
> VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVbN4vvv_sincosf)
> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c
> new file mode 100644
> index 0000000000..22b3a62fcf
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c
> @@ -0,0 +1,39 @@
> +/* Single-precision vector math error handling.
> + Copyright 2019 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include "math_config_flt.h"
> +/* NOINLINE prevents fenv semantics breaking optimizations. */
> +NOINLINE static float
> +xflowf (uint32_t sign, float y)
> +{
> + y = (sign ? -y : y) * y;
> + return y;
> +}
> +
> +attribute_hidden float
> +__math_uflowf (uint32_t sign)
> +{
> + return xflowf (sign, 0x1p-95f);
> +}
> +
> +attribute_hidden float
> +__math_oflowf (uint32_t sign)
> +{
> + return xflowf (sign, 0x1p97f);
> +}
> +
> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_exp2f_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_exp2f_data.c
> new file mode 100644
> index 0000000000..b2232713a9
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_exp2f_data.c
> @@ -0,0 +1,58 @@
> +/* Data for exp, rearranged for vector operations
> + Copyright (C) 2019 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include "math_config_flt.h"
> +
> +#define N (1 << EXP2F_TABLE_BITS)
> +
> +const struct exp2f_data __exp2f_data = {
> + /* tab[i] = uint(2^(i/N)) - (i << 52-BITS)
> + used for computing 2^(k/N) for an int |k| < 150 N as
> + double(tab[k%N] + (k << 52-BITS)) */
> + .tab = {
> +0x3ff0000000000000, 0x3fefd9b0d3158574, 0x3fefb5586cf9890f, 0x3fef9301d0125b51,
> +0x3fef72b83c7d517b, 0x3fef54873168b9aa, 0x3fef387a6e756238, 0x3fef1e9df51fdee1,
> +0x3fef06fe0a31b715, 0x3feef1a7373aa9cb, 0x3feedea64c123422, 0x3feece086061892d,
> +0x3feebfdad5362a27, 0x3feeb42b569d4f82, 0x3feeab07dd485429, 0x3feea47eb03a5585,
> +0x3feea09e667f3bcd, 0x3fee9f75e8ec5f74, 0x3feea11473eb0187, 0x3feea589994cce13,
> +0x3feeace5422aa0db, 0x3feeb737b0cdc5e5, 0x3feec49182a3f090, 0x3feed503b23e255d,
> +0x3feee89f995ad3ad, 0x3feeff76f2fb5e47, 0x3fef199bdd85529c, 0x3fef3720dcef9069,
> +0x3fef5818dcfba487, 0x3fef7c97337b9b5f, 0x3fefa4afa2a490da, 0x3fefd0765b6e4540,
> + },
> + .shift_scaled = {
> + 0x1.8p+52 / N,
> + 0x1.8p+52 / N },
> + .poly = {
> + {0x1.c6af84b912394p-5, 0x1.c6af84b912394p-5},
> + {0x1.ebfce50fac4f3p-3, 0x1.ebfce50fac4f3p-3},
> + {0x1.62e42ff0c52d6p-1, 0x1.62e42ff0c52d6p-1},
> + },
> + .shift = {
> + 0x1.8p+52,
> + 0x1.8p+52,
> + },
> + .invln2_scaled = {
> + 0x1.71547652b82fep+0 * N,
> + 0x1.71547652b82fep+0 * N,
> + },
> + .poly_scaled = {
> + {0x1.c6af84b912394p-5/N/N/N, 0x1.c6af84b912394p-5/N/N/N},
> + {0x1.ebfce50fac4f3p-3/N/N, 0x1.ebfce50fac4f3p-3/N/N},
> + {0x1.62e42ff0c52d6p-1/N, 0x1.62e42ff0c52d6p-1/N},
> + },
> +};
> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_expf4_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_expf4_vsx.c
> new file mode 100644
> index 0000000000..d118428a66
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_expf4_vsx.c
> @@ -0,0 +1,159 @@
> +/* Single-precision vector expf function.
> + Copyright (C) 2019 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +/* Based off ./sysdeps/ieee754/flt-32/e_expf.c
> + which was contributed by Szabolcs Nagy of ARM Ltd. */
> +#include <altivec.h>
> +#include <math.h>
> +
> +#include "math_config_flt.h"
> +
> +typedef vector long long unsigned v64u;
> +typedef union {
> + vector unsigned u;
> + vector float f;
> + v64u l;
> + vector double d;
> +} u;
> +typedef union {
> + double d;
> + int64_t l;
> + unsigned u;
> + float f;
> +} us;
> +
> +#define N (1 << EXP2F_TABLE_BITS)
> +#define InvLn2N __exp2f_data.invln2_scaled
> +#define T __exp2f_data.tab
> +#define C __exp2f_data.poly_scaled
> +
> +vector float
> +_ZGVbN4v_expf (vector float x)
> +{
> + u res;
> + u xu;
> + xu.f = x;
> + us c88;
> + c88.f = 88.0f;
> + us inf;
> + inf.f = INFINITY;
> + us ninf;
> + ninf.f = -INFINITY;
> + vector unsigned constants = {(c88.u & 0xfff00000) << 1,
> + ninf.u, inf.u, 0};
> + vector float constants2 = {0x1.62e42ep6f, -0x1.9fe368p6f, 0, 0};
> + vector unsigned zero = {0, 0, 0, 0};
> + vector unsigned v88 = vec_splat (constants, 0);
> + vector unsigned is_special_case
> + = (vector unsigned) vec_cmpge (xu.u << 1, v88);
> + vector unsigned is_special_not_covered = is_special_case;
> + if (__glibc_unlikely (!vec_all_eq (is_special_case, zero)))
> + {
> + vector unsigned inf = vec_splat (constants, 2);
> + vector unsigned is_inf_or_ninf_or_nan
> + = (vector unsigned) vec_cmpge (xu.u, inf);
> + u xpx;
> + xpx.f = x + x;
> + res.u = xpx.u;
> + is_special_not_covered &= ~is_inf_or_ninf_or_nan;
> + vector unsigned ninf = vec_splat (constants, 1);
> + vector unsigned is_ninf
> + = (vector unsigned) vec_cmpeq (xu.u, ninf);
> + res.u = vec_sel (res.u, zero, is_ninf);
> +
> + vector float overflow_v = vec_splat (constants2, 0);
> + vector unsigned is_overflow
> + = (vector unsigned) vec_cmpgt (xu.f, overflow_v);
> + if (__glibc_unlikely (!vec_all_eq (is_overflow, zero)))
> + {
> + // This branch is because we are generating an
> + // overflow fp flag
> + us ofu;
> + ofu.f = __math_oflowf (0);
> + vector unsigned of = {ofu.u, ofu.u, ofu.u, ofu.u};
> + res.u = vec_sel (res.u, of, is_overflow);
> + is_special_not_covered &= ~is_overflow;
> + }
> + vector float underflow_v = vec_splat (constants2, 1);
> + vector unsigned is_underflow
> + = (vector unsigned) vec_cmplt (xu.f, underflow_v);
> + if (__glibc_unlikely (!vec_all_eq (is_underflow, zero)))
> + {
> + // This branch is because we are generating an
> + // underflow fp flag
> + us ufu;
> + ufu.f = __math_uflowf (0);
> + vector unsigned uf = {ufu.u, ufu.u, ufu.u, ufu.u};
> + res.u = vec_sel (res.u, uf, is_underflow);
> + is_special_not_covered &= ~is_underflow;
> + }
> + }
> +
> +#if __GNUC__ >= 8
|| __clang_major__ >= 6
https://reviews.llvm.org/rG41e14c4dfa63c45271cb42368d09b5a94ee16e4f#change-sAqzIEx8bz0L
> + vector double xl = vec_unpackh (x);
> + vector double xr = vec_unpackl (x);
> +#else
> + vector double xl = {x[0], x[1]};
> + vector double xr = {x[2], x[3]};
> +#endif
> + vector double zl = InvLn2N * xl;
> + vector double zr = InvLn2N * xr;
> +#if TOINT_INTRINSICS
> + vector double kdl = roundtoint (zl);
> + vector double kdl = roundtoint (zl);
> + v64u kil = converttoint (zl);
> + v64u kir = converttoint (zr);
> +#else
> +#define SHIFT __exp2f_data.shift
> + vector double kdl = zl + SHIFT;
> + vector double kdr = zr + SHIFT;
> + u kilu;
> + kilu.d = kdl;
> + u kiru;
> + kiru.d = kdr;
> + v64u kil = kilu.l;
> + v64u kir = kiru.l;
> + kdl -= SHIFT;
> + kdr -= SHIFT;
> +#endif
> + vector double rl = zl - kdl;
> + vector double rr = zr - kdr;
> +
> + v64u tl = {T[kil[0] % N], T[kil[1] % N]};
> + v64u tr = {T[kir[0] % N], T[kir[1] % N]};
> + tl += (kil << (52 - EXP2F_TABLE_BITS));
> + tr += (kir << (52 - EXP2F_TABLE_BITS));
> + u slu;
> + slu.l = tl;
> + u sru;
> + sru.l = tr;
> + // This cast is obnoxious, but there is no vec_ld for double
> + zl = C[0] * rl + C[1];
> + zr = C[0] * rr + C[1];
> + vector double r2l = rl * rl;
> + vector double r2r = rr * rr;
> + vector double yl = C[2] * rl + 1;
> + vector double yr = C[2] * rr + 1;
> + yl = zl * r2l + yl;
> + yr = zr * r2r + yr;
> + yl = yl * slu.d;
> + yr = yr * sru.d;
> + vector float restmp = {(float) yl[0], (float) yl[1],
> + (float) yr[0], (float) yr[1]};
> + return vec_sel (restmp, res.f,
> + is_special_case & ~is_special_not_covered);
> +}
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
> index bb9f0beea4..63770c8da3 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
> @@ -4,5 +4,6 @@ GLIBC_2.30 _ZGVbN2v_sin F
> GLIBC_2.30 _ZGVbN2vvv_sincos F
> GLIBC_2.30 _ZGVbN4v_cosf F
> GLIBC_2.30 _ZGVbN4v_logf F
> GLIBC_2.30 _ZGVbN4v_sinf F
> GLIBC_2.30 _ZGVbN4vvv_sincosf F
> +GLIBC_2.30 _ZGVbN4v_expf F
> --
> 2.20.1
>
Shawn Landden <shawn@git.icu> writes:
> 2019-05-11 Shawn Landden <shawn@git.icu>
>
> [BZ #24209]
> * NEWS: Noted the addition of PPC64 vector expf function.
> * sysdeps/powerpc/bits/math-vector.h: Added entry for vector expf.
> * sysdeps/powerpc/powerpc64/fpu/Versions: Added vector expf entry.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile:
> (libmvec-sysdep_routines, CFLAGS-vec_s_expf4_vsx.c):
> (CFLAGS-vec_s_exp2f_data.c): Added build of VSX SIMD expf
> function and tests. Added vec_math_errf.c to build.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_flt.h: Modified for expf.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c:
> Added entry for vector expf.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_expf4_vsx.c: New file.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_exp2f_data.c: Likewise.
> * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c: Likewise.
> * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD expf
> added.
These entries are still indented with spaces. Fixed.
> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
> index a4f0d1694e..ad3c29b1ab 100644
> --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
> @@ -46,10 +46,12 @@ endif
> ifeq ($(subdir),mathvec)
> libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \
> vec_d_sin2_vsx vec_s_sinf4_vsx \
> vec_d_log2_vsx vec_d_log_data \
> vec_s_logf4_vsx vec_s_logf_data \
> + vec_s_expf4_vsx vec_s_exp2f_data \
> + vec_math_errf \
> vec_d_sincos2_vsx vec_s_sincosf4_vsx
> CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_d_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_d_log_data.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_s_cosf4_vsx.c += -mabi=altivec -maltivec -mvsx
> @@ -57,19 +59,21 @@ CFLAGS-vec_s_logf4_vsx.c += -mabi=altivec -maltivec -mvsx
> CFLAGS-vec_s_logf_data.c += -mabi=altivec -maltivec -mvsx
> CFLAGS-vec_d_sin2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_s_sinf4_vsx.c += -mabi=altivec -maltivec -mvsx
> CFLAGS-vec_d_sincos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> CFLAGS-vec_s_sincosf4_vsx.c += -mabi=altivec -maltivec -mvsx
> +CFLAGS-vec_s_expf4_vsx.c += -mabi=altivec -maltivec -mvsx
> +CFLAGS-vec_s_exp2f_data.c += -mabi=altivec -maltivec -mvsx
vec_math_errf.c indirectly depends on altivec.h, which requires to use the same
flags.
Fixed locally.
> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c
> new file mode 100644
> index 0000000000..22b3a62fcf
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_errf.c
> @@ -0,0 +1,39 @@
> ...
> +attribute_hidden float
> +__math_oflowf (uint32_t sign)
> +{
> + return xflowf (sign, 0x1p97f);
> +}
> +
Extra line at EOF here. Fixed.
I'm merging this patch with the fixes I mentioned to branch tuliom/libmvec.
Thanks!
@@ -18,10 +18,11 @@ Major new features:
- single-precision logarithm: logf
- double-precision sine: sin
- single-precision sine: sinf
- double-precision sincos: sincos
- single-precision sincos: sincosf
+ - single-precision exp: expf
GCC support for auto-vectorization of functions on PPC64 is not yet
available. Until that is done, the new vector math functions are
inaccessible to applications.
@@ -42,8 +42,10 @@
# define __DECL_SIMD_logf __DECL_SIMD_PPC64
# undef __DECL_SIMD_sincos
# define __DECL_SIMD_sincos __DECL_SIMD_PPC64
# undef __DECL_SIMD_sincosf
# define __DECL_SIMD_sincosf __DECL_SIMD_PPC64
+# undef __DECL_SIMD_expf
+# define __DECL_SIMD_expf __DECL_SIMD_PPC64
# endif
#endif
@@ -1,6 +1,7 @@
libmvec {
GLIBC_2.30 {
_ZGVbN2v_cos; _ZGVbN4v_cosf; _ZGVbN2v_sin; _ZGVbN4v_sinf;
_ZGVbN2v_log; _ZGVbN4v_logf; _ZGVbN2vvv_sincos; _ZGVbN4vvv_sincosf;
+ _ZGVbN4v_expf;
}
}
@@ -46,10 +46,12 @@ endif
ifeq ($(subdir),mathvec)
libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \
vec_d_sin2_vsx vec_s_sinf4_vsx \
vec_d_log2_vsx vec_d_log_data \
vec_s_logf4_vsx vec_s_logf_data \
+ vec_s_expf4_vsx vec_s_exp2f_data \
+ vec_math_errf \
vec_d_sincos2_vsx vec_s_sincosf4_vsx
CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
CFLAGS-vec_d_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
CFLAGS-vec_d_log_data.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
CFLAGS-vec_s_cosf4_vsx.c += -mabi=altivec -maltivec -mvsx
@@ -57,19 +59,21 @@ CFLAGS-vec_s_logf4_vsx.c += -mabi=altivec -maltivec -mvsx
CFLAGS-vec_s_logf_data.c += -mabi=altivec -maltivec -mvsx
CFLAGS-vec_d_sin2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
CFLAGS-vec_s_sinf4_vsx.c += -mabi=altivec -maltivec -mvsx
CFLAGS-vec_d_sincos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
CFLAGS-vec_s_sincosf4_vsx.c += -mabi=altivec -maltivec -mvsx
+CFLAGS-vec_s_expf4_vsx.c += -mabi=altivec -maltivec -mvsx
+CFLAGS-vec_s_exp2f_data.c += -mabi=altivec -maltivec -mvsx
endif
# Variables for libmvec tests.
ifeq ($(subdir),math)
ifeq ($(build-mathvec),yes)
libmvec-tests += double-vlen2 float-vlen4
double-vlen2-funcs = cos sin sincos log
-float-vlen4-funcs = cos sin sincos log
+float-vlen4-funcs = cos sin sincos log exp
double-vlen2-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
float-vlen4-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
CFLAGS-test-double-vlen2-wrappers.c += -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
@@ -46,20 +46,42 @@
#if TOINT_INTRINSICS
/* Round x to nearest int in all rounding modes, ties have to be rounded
consistently with converttoint so the results match. If the result
would be outside of [-2^31, 2^31-1] then the semantics is unspecified. */
-static inline double_t
-roundtoint (double_t x);
+static inline vector double
+roundtoint (vector double x);
/* Convert x to nearest int in all rounding modes, ties have to be rounded
consistently with roundtoint. If the result is not representible in an
int32_t then the semantics is unspecified. */
-static inline int32_t
-converttoint (double_t x);
+static inline vector unsigned long long
+converttoint (vector double x);
#endif
+static inline vector unsigned
+vasuint (vector float f)
+{
+ union
+ {
+ vector float f;
+ vector unsigned i;
+ } u = {f};
+ return u.i;
+}
+
+static inline vector float
+vasfloat (vector unsigned i)
+{
+ union
+ {
+ vector unsigned i;
+ vector float f;
+ } u = {i};
+ return u.f;
+}
+
static inline uint32_t
asuint (float f)
{
union
{
@@ -78,10 +100,32 @@ asfloat (uint32_t i)
float f;
} u = {i};
return u.f;
}
+static inline vector unsigned long long
+vasuint64 (vector double f)
+{
+ union
+ {
+ vector double f;
+ vector unsigned long long i;
+ } u = {f};
+ return u.i;
+}
+
+static inline vector double
+vasdouble (vector unsigned long long i)
+{
+ union
+ {
+ vector unsigned long long i;
+ vector double f;
+ } u = {i};
+ return u.f;
+}
+
static inline uint64_t
asuint64 (double f)
{
union
{
@@ -122,16 +166,16 @@ attribute_hidden float __math_invalidf (float);
/* Shared between expf, exp2f and powf. */
#define EXP2F_TABLE_BITS 5
#define EXP2F_POLY_ORDER 3
extern const struct exp2f_data
{
- uint64_t tab[1 << EXP2F_TABLE_BITS];
- double shift_scaled;
- double poly[EXP2F_POLY_ORDER];
- double shift;
- double invln2_scaled;
- double poly_scaled[EXP2F_POLY_ORDER];
+ long long unsigned tab[1 << EXP2F_TABLE_BITS];
+ vector double shift_scaled;
+ vector double poly[EXP2F_POLY_ORDER];
+ vector double shift;
+ vector double invln2_scaled;
+ vector double poly_scaled[EXP2F_POLY_ORDER];
} __exp2f_data attribute_hidden;
#define LOGF_TABLE_BITS 4
#define LOGF_POLY_ORDER 4
extern const struct logf_data
@@ -22,6 +22,7 @@
#define VEC_TYPE vector float
VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf)
VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf)
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVbN4vvv_sincosf)
new file mode 100644
@@ -0,0 +1,39 @@
+/* Single-precision vector math error handling.
+ Copyright 2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "math_config_flt.h"
+/* NOINLINE prevents fenv semantics breaking optimizations. */
+NOINLINE static float
+xflowf (uint32_t sign, float y)
+{
+ y = (sign ? -y : y) * y;
+ return y;
+}
+
+attribute_hidden float
+__math_uflowf (uint32_t sign)
+{
+ return xflowf (sign, 0x1p-95f);
+}
+
+attribute_hidden float
+__math_oflowf (uint32_t sign)
+{
+ return xflowf (sign, 0x1p97f);
+}
+
new file mode 100644
@@ -0,0 +1,58 @@
+/* Data for exp, rearranged for vector operations
+ Copyright (C) 2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "math_config_flt.h"
+
+#define N (1 << EXP2F_TABLE_BITS)
+
+const struct exp2f_data __exp2f_data = {
+ /* tab[i] = uint(2^(i/N)) - (i << 52-BITS)
+ used for computing 2^(k/N) for an int |k| < 150 N as
+ double(tab[k%N] + (k << 52-BITS)) */
+ .tab = {
+0x3ff0000000000000, 0x3fefd9b0d3158574, 0x3fefb5586cf9890f, 0x3fef9301d0125b51,
+0x3fef72b83c7d517b, 0x3fef54873168b9aa, 0x3fef387a6e756238, 0x3fef1e9df51fdee1,
+0x3fef06fe0a31b715, 0x3feef1a7373aa9cb, 0x3feedea64c123422, 0x3feece086061892d,
+0x3feebfdad5362a27, 0x3feeb42b569d4f82, 0x3feeab07dd485429, 0x3feea47eb03a5585,
+0x3feea09e667f3bcd, 0x3fee9f75e8ec5f74, 0x3feea11473eb0187, 0x3feea589994cce13,
+0x3feeace5422aa0db, 0x3feeb737b0cdc5e5, 0x3feec49182a3f090, 0x3feed503b23e255d,
+0x3feee89f995ad3ad, 0x3feeff76f2fb5e47, 0x3fef199bdd85529c, 0x3fef3720dcef9069,
+0x3fef5818dcfba487, 0x3fef7c97337b9b5f, 0x3fefa4afa2a490da, 0x3fefd0765b6e4540,
+ },
+ .shift_scaled = {
+ 0x1.8p+52 / N,
+ 0x1.8p+52 / N },
+ .poly = {
+ {0x1.c6af84b912394p-5, 0x1.c6af84b912394p-5},
+ {0x1.ebfce50fac4f3p-3, 0x1.ebfce50fac4f3p-3},
+ {0x1.62e42ff0c52d6p-1, 0x1.62e42ff0c52d6p-1},
+ },
+ .shift = {
+ 0x1.8p+52,
+ 0x1.8p+52,
+ },
+ .invln2_scaled = {
+ 0x1.71547652b82fep+0 * N,
+ 0x1.71547652b82fep+0 * N,
+ },
+ .poly_scaled = {
+ {0x1.c6af84b912394p-5/N/N/N, 0x1.c6af84b912394p-5/N/N/N},
+ {0x1.ebfce50fac4f3p-3/N/N, 0x1.ebfce50fac4f3p-3/N/N},
+ {0x1.62e42ff0c52d6p-1/N, 0x1.62e42ff0c52d6p-1/N},
+ },
+};
new file mode 100644
@@ -0,0 +1,159 @@
+/* Single-precision vector expf function.
+ Copyright (C) 2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+/* Based off ./sysdeps/ieee754/flt-32/e_expf.c
+ which was contributed by Szabolcs Nagy of ARM Ltd. */
+#include <altivec.h>
+#include <math.h>
+
+#include "math_config_flt.h"
+
+typedef vector long long unsigned v64u;
+typedef union {
+ vector unsigned u;
+ vector float f;
+ v64u l;
+ vector double d;
+} u;
+typedef union {
+ double d;
+ int64_t l;
+ unsigned u;
+ float f;
+} us;
+
+#define N (1 << EXP2F_TABLE_BITS)
+#define InvLn2N __exp2f_data.invln2_scaled
+#define T __exp2f_data.tab
+#define C __exp2f_data.poly_scaled
+
+vector float
+_ZGVbN4v_expf (vector float x)
+{
+ u res;
+ u xu;
+ xu.f = x;
+ us c88;
+ c88.f = 88.0f;
+ us inf;
+ inf.f = INFINITY;
+ us ninf;
+ ninf.f = -INFINITY;
+ vector unsigned constants = {(c88.u & 0xfff00000) << 1,
+ ninf.u, inf.u, 0};
+ vector float constants2 = {0x1.62e42ep6f, -0x1.9fe368p6f, 0, 0};
+ vector unsigned zero = {0, 0, 0, 0};
+ vector unsigned v88 = vec_splat (constants, 0);
+ vector unsigned is_special_case
+ = (vector unsigned) vec_cmpge (xu.u << 1, v88);
+ vector unsigned is_special_not_covered = is_special_case;
+ if (__glibc_unlikely (!vec_all_eq (is_special_case, zero)))
+ {
+ vector unsigned inf = vec_splat (constants, 2);
+ vector unsigned is_inf_or_ninf_or_nan
+ = (vector unsigned) vec_cmpge (xu.u, inf);
+ u xpx;
+ xpx.f = x + x;
+ res.u = xpx.u;
+ is_special_not_covered &= ~is_inf_or_ninf_or_nan;
+ vector unsigned ninf = vec_splat (constants, 1);
+ vector unsigned is_ninf
+ = (vector unsigned) vec_cmpeq (xu.u, ninf);
+ res.u = vec_sel (res.u, zero, is_ninf);
+
+ vector float overflow_v = vec_splat (constants2, 0);
+ vector unsigned is_overflow
+ = (vector unsigned) vec_cmpgt (xu.f, overflow_v);
+ if (__glibc_unlikely (!vec_all_eq (is_overflow, zero)))
+ {
+ // This branch is because we are generating an
+ // overflow fp flag
+ us ofu;
+ ofu.f = __math_oflowf (0);
+ vector unsigned of = {ofu.u, ofu.u, ofu.u, ofu.u};
+ res.u = vec_sel (res.u, of, is_overflow);
+ is_special_not_covered &= ~is_overflow;
+ }
+ vector float underflow_v = vec_splat (constants2, 1);
+ vector unsigned is_underflow
+ = (vector unsigned) vec_cmplt (xu.f, underflow_v);
+ if (__glibc_unlikely (!vec_all_eq (is_underflow, zero)))
+ {
+ // This branch is because we are generating an
+ // underflow fp flag
+ us ufu;
+ ufu.f = __math_uflowf (0);
+ vector unsigned uf = {ufu.u, ufu.u, ufu.u, ufu.u};
+ res.u = vec_sel (res.u, uf, is_underflow);
+ is_special_not_covered &= ~is_underflow;
+ }
+ }
+
+#if __GNUC__ >= 8
+ vector double xl = vec_unpackh (x);
+ vector double xr = vec_unpackl (x);
+#else
+ vector double xl = {x[0], x[1]};
+ vector double xr = {x[2], x[3]};
+#endif
+ vector double zl = InvLn2N * xl;
+ vector double zr = InvLn2N * xr;
+#if TOINT_INTRINSICS
+ vector double kdl = roundtoint (zl);
+ vector double kdl = roundtoint (zl);
+ v64u kil = converttoint (zl);
+ v64u kir = converttoint (zr);
+#else
+#define SHIFT __exp2f_data.shift
+ vector double kdl = zl + SHIFT;
+ vector double kdr = zr + SHIFT;
+ u kilu;
+ kilu.d = kdl;
+ u kiru;
+ kiru.d = kdr;
+ v64u kil = kilu.l;
+ v64u kir = kiru.l;
+ kdl -= SHIFT;
+ kdr -= SHIFT;
+#endif
+ vector double rl = zl - kdl;
+ vector double rr = zr - kdr;
+
+ v64u tl = {T[kil[0] % N], T[kil[1] % N]};
+ v64u tr = {T[kir[0] % N], T[kir[1] % N]};
+ tl += (kil << (52 - EXP2F_TABLE_BITS));
+ tr += (kir << (52 - EXP2F_TABLE_BITS));
+ u slu;
+ slu.l = tl;
+ u sru;
+ sru.l = tr;
+ // This cast is obnoxious, but there is no vec_ld for double
+ zl = C[0] * rl + C[1];
+ zr = C[0] * rr + C[1];
+ vector double r2l = rl * rl;
+ vector double r2r = rr * rr;
+ vector double yl = C[2] * rl + 1;
+ vector double yr = C[2] * rr + 1;
+ yl = zl * r2l + yl;
+ yr = zr * r2r + yr;
+ yl = yl * slu.d;
+ yr = yr * sru.d;
+ vector float restmp = {(float) yl[0], (float) yl[1],
+ (float) yr[0], (float) yr[1]};
+ return vec_sel (restmp, res.f,
+ is_special_case & ~is_special_not_covered);
+}
@@ -4,5 +4,6 @@ GLIBC_2.30 _ZGVbN2v_sin F
GLIBC_2.30 _ZGVbN2vvv_sincos F
GLIBC_2.30 _ZGVbN4v_cosf F
GLIBC_2.30 _ZGVbN4v_logf F
GLIBC_2.30 _ZGVbN4v_sinf F
GLIBC_2.30 _ZGVbN4vvv_sincosf F
+GLIBC_2.30 _ZGVbN4v_expf F