From patchwork Mon May 6 20:32:09 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shawn Landden
X-Patchwork-Id: 32574
X-Patchwork-Delegate: tuliom@linux.vnet.ibm.com
From: Shawn Landden
To: libc-alpha@sourceware.org
Cc: Shawn Landden
Subject: [PATCH] powerpc: add libmvec implementations of log and logf
Date: Mon, 6 May 2019 13:32:09 -0700
Message-Id: <20190506203210.8536-1-shawn@git.icu>
In-Reply-To: <20190505203438.28140-1-shawn@git.icu>
References: <20190505203438.28140-1-shawn@git.icu>

[BZ #24207]

When fed numbers in the range 0 to 2^32-1 (as doubles), the vector log
is about 75% faster than scalar log. Without fused multiply-add it is
only about 30% faster and, while less accurate, gives results identical
to log(). I think log() should switch to using fma.

When fed numbers in the range 0 to 2^16-1 (as floats), the vector logf
is about 30% faster than scalar logf. This should probably be faster;
I did not spend much time in perf looking into it. [2]

logf requires Power 7
log requires Power 8

I have not completed an FSF copyright assignment, but would be happy
to do so.
[2] benchmark programs: https://github.com/shawnl/libmvec

v2: rebase on top of the appropriate branch
v3: get the function name right
v4: add the function to the abilist
    add a non-FMA version so the results are the same as log()
    respect WANT_ROUNDING

2019-05-06  Shawn Landden

	[BZ #24207]
	* NEWS: Note the addition of PPC64 vector log and logf.
	* sysdeps/powerpc/bits/math-vector.h: Added log and logf entries.
	* sysdeps/powerpc/powerpc64/fpu/Versions: Added log and logf entries.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile:
	(libmvec-sysdep_routines)
	(CFLAGS-vec_s_log2_vsx.c, double-vlen2-funcs): Added build of
	VSX SIMD log function and its tests.
	(CFLAGS-vec_s_logf4_vsx.c, float-vlen4-funcs): Added build of
	VSX SIMD logf function and its tests.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c:
	Added logf entry.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c:
	Added log entry.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log2_vsx.c: New file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf4_vsx.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.h: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.h: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.c: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD
	log/logf added.
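
For readers following the patch, the range reduction at the core of
_ZGVbN2v_log (x = 2^k z, with z in [OFF, 2*OFF)) can be sketched for a
single lane in scalar C. This is only an illustrative sketch and not part
of the patch: the helper names as_u64, as_f64 and split_log_arg are
hypothetical, and it assumes x is a positive normal double and that
right-shifting a negative int64_t is an arithmetic shift (as it is with GCC).

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Same bit pattern the patch uses for OFF: the double 0x1.6p-1 (0.6875).  */
#define OFF 0x3fe6000000000000ULL

/* Hypothetical helpers: reinterpret the bits of a double.  */
static uint64_t as_u64(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double as_f64(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Split a positive normal x into x = 2^k * z with z in [0x1.6p-1, 0x1.6p0).
   Mirrors the tmp/k/iz lines of the vector code, one lane at a time.  */
static double split_log_arg(double x, int *k)
{
  assert(x >= DBL_MIN && x <= DBL_MAX);

  uint64_t ix = as_u64(x);
  uint64_t tmp = ix - OFF;

  /* The bits above the 52-bit mantissa of tmp are the (signed) exponent k.
     Assumption: >> on a negative int64_t is arithmetic.  */
  *k = (int)((int64_t)tmp >> 52);

  /* Subtracting k << 52 from ix rescales x by 2^-k without rounding.  */
  uint64_t iz = ix - (tmp & (0xfffULL << 52));
  return as_f64(iz);
}
```

For example, split_log_arg(8.0, &k) yields z = 1.0 with k = 3; the vector
code then evaluates log(x) = k*ln2 + log(c) + log(z/c) with c taken from
the lookup table.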
---
 NEWS                                          |   2 +
 sysdeps/powerpc/powerpc64/fpu/Versions        |   2 +-
 .../powerpc/powerpc64/fpu/multiarch/Makefile  |   8 +-
 .../multiarch/test-double-vlen2-wrappers.c    |   1 +
 .../fpu/multiarch/test-float-vlen4-wrappers.c |   1 +
 .../powerpc64/fpu/multiarch/vec_s_log2_vsx.c  | 179 +++++++++
 .../powerpc64/fpu/multiarch/vec_s_log_data.c  | 340 ++++++++++++++++++
 .../powerpc64/fpu/multiarch/vec_s_log_data.h  |  41 +++
 .../powerpc64/fpu/multiarch/vec_s_logf4_vsx.c | 123 +++++++
 .../powerpc64/fpu/multiarch/vec_s_logf_data.c |  44 +++
 .../powerpc64/fpu/multiarch/vec_s_logf_data.h |  32 ++
 .../linux/powerpc/powerpc64/libmvec.abilist   |   2 +
 12 files changed, 772 insertions(+), 3 deletions(-)
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log2_vsx.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.h
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf4_vsx.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.h

diff --git a/NEWS b/NEWS
index 6b542faf93..b29eae7bd8 100644
--- a/NEWS
+++ b/NEWS
@@ -18,6 +18,8 @@ Major new features:
   - single-precision sine: sinf
   - double-precision sincos: sincos
   - single-precision sincos: sincosf
+  - double-precision log: log
+  - single-precision log: logf
 GCC support for auto-vectorization of functions on PPC64 is not yet available.
Until that is done, the new vector math functions are
diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions
index 22bd7cc51b..ffe3bbc4ca 100644
--- a/sysdeps/powerpc/powerpc64/fpu/Versions
+++ b/sysdeps/powerpc/powerpc64/fpu/Versions
@@ -1,6 +1,6 @@
 libmvec {
   GLIBC_2.30 {
     _ZGVbN2v_cos; _ZGVbN4v_cosf; _ZGVbN2v_sin; _ZGVbN4v_sinf;
-    _ZGVbN2vvv_sincos; _ZGVbN4vvv_sincosf;
+    _ZGVbN2vvv_sincos; _ZGVbN4vvv_sincosf; _ZGVbN2v_log; _ZGVbN4v_logf;
   }
 }
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
index b94fb354c6..c01f45f579 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
@@ -46,6 +46,8 @@ endif
 ifeq ($(subdir),mathvec)
 libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \
                            vec_d_sin2_vsx vec_s_sinf4_vsx \
+                           vec_s_logf4_vsx vec_s_log2_vsx \
+                           vec_s_logf_data vec_s_log_data \
                            vec_d_sincos2_vsx vec_s_sincosf4_vsx
 CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
 CFLAGS-vec_s_cosf4_vsx.c += -mabi=altivec -maltivec -mvsx
@@ -53,6 +55,8 @@ CFLAGS-vec_d_sin2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
 CFLAGS-vec_s_sinf4_vsx.c += -mabi=altivec -maltivec -mvsx
 CFLAGS-vec_d_sincos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
 CFLAGS-vec_s_sincosf4_vsx.c += -mabi=altivec -maltivec -mvsx
+CFLAGS-vec_s_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
+CFLAGS-vec_s_logf4_vsx.c += -mabi=altivec -maltivec -mvsx
 endif
 # Variables for libmvec tests.
@@ -60,8 +64,8 @@ ifeq ($(subdir),math)
 ifeq ($(build-mathvec),yes)
 libmvec-tests += double-vlen2 float-vlen4
-double-vlen2-funcs = cos sin sincos
-float-vlen4-funcs = cos sin sincos
+double-vlen2-funcs = cos sin sincos log
+float-vlen4-funcs = cos sin sincos log
 double-vlen2-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
 float-vlen4-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c
index 7082a3500e..0ed2812cf6 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c
@@ -23,5 +23,6 @@
 VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVbN2v_cos)
 VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVbN2v_sin)
+VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log)
 VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVbN2vvv_sincos)
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
index 87be9de0be..79b671c601 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
@@ -23,5 +23,6 @@
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf)
+VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf)
 VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVbN4vvv_sincosf)
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log2_vsx.c
new file mode 100644
index 0000000000..682eeb1476
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log2_vsx.c
@@ -0,0 +1,179 @@
+/* Double-precision vector log(x) function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+#include
+#include
+
+#include "math_config.h"
+#include "vec_s_log_data.h"
+
+typedef vector long long unsigned v64u;
+typedef vector long long v64i;
+
+typedef union {
+  vector double d;
+  v64u l;
+} u;
+
+typedef union {
+  double d;
+  uint64_t l;
+  unsigned u;
+  float f;
+} us;
+
+#define T __log_data.tab
+#define T2 __log_data.tab2
+#define B __log_data.poly1
+#define A __log_data.poly
+#define Ln2hi __log_data.ln2hi
+#define Ln2lo __log_data.ln2lo
+#define N (1 << LOG_TABLE_BITS)
+#define OFF 0x3fe6000000000000
+
+#define INF 0x7ff0000000000000
+
+vector double _ZGVbN2v_log(vector double x) {
+  v64u inf = {INF, INF};
+  v64u ninf = inf | (1ULL << 63);
+  v64u zero = {0, 0};
+
+  u un;
+  un.d = x;
+  v64u xi = un.l;
+
+  us lo;
+  us hi;
+  lo.d = 1.0 - 0x1p-4;
+  hi.d = 1.0 + 0x1.09p-4;
+  v64u lov = {lo.l, lo.l};
+  v64u hiv = {hi.l, hi.l};
+  u res;
+  v64u is_close_to_one = (v64u)vec_cmplt(xi - lov, hiv - lov);
+  if (!vec_all_eq(is_close_to_one, zero)) {
+    vector double r = x - 1.0;
+    vector double r2 = r * r;
+    vector double r3 = r * r2;
+    vector double b0 = {B[0], B[0]};
+    vector double b12 = {B[1], B[2]};
+    vector double b1 = vec_splat(b12, 0);
+    vector double b2 = vec_splat(b12, 1);
+    vector double b34 = {B[3], B[4]};
+    vector double b3 = vec_splat(b34, 0);
+    vector double b4 = vec_splat(b34, 1);
+    vector double b56 = {B[5],
B[6]};
+    vector double b5 = vec_splat(b56, 0);
+    vector double b6 = vec_splat(b56, 1);
+    vector double b78 = {B[7], B[8]};
+    vector double b7 = vec_splat(b78, 0);
+    vector double b8 = vec_splat(b78, 1);
+    vector double b910 = {B[9], B[10]};
+    vector double b9 = vec_splat(b910, 0);
+    vector double b10 = vec_splat(b910, 1);
+
+    res.d = r3 * (b1 + r * b2 + r2 * b3 + r3 *
+      (b4 + r * b5 + r2 * b6 + r3 *
+      (b7 + r * b8 + r2 * b9 + r3 * b10)));
+    /* Worst-case error is around 0.507 ULP. */
+    vector double w = r * 0x1p27;
+    vector double rhi = r + w - w;
+    vector double rlo = r - rhi;
+    w = rhi * rhi * b0; /* B[0] == -0.5 */
+    vector double hi = r + w;
+    vector double lo = r - hi + w;
+    lo += b0 * rlo * (rhi + r);
+    res.d += lo;
+    res.d += hi;
+
+#if WANT_ROUNDING
+    /* 1.0 -> 0 */
+    u oned;
+    vector double one = {1.0, 1.0};
+    oned.d = one;
+    v64u is_one = (v64u)vec_cmpeq(xi, oned.l);
+    res.l = vec_sel(res.l, zero, is_one);
+#endif
+  } else
+    res.l = zero;
+
+  v64u infexp = {0x7ff0000000000000, 0x7ff0000000000000};
+  v64u is_special_cases = (v64u)vec_cmpge(xi - 0x0010000000000000, infexp - 0x0010000000000000);
+  if (!vec_all_eq(is_special_cases, zero)) {
+    v64u is_zero = (v64u)vec_cmpeq(xi << 1, zero);
+    res.l = vec_sel(res.l, ninf, is_zero);
+
+    v64u is_inf = (v64u)vec_cmpeq(xi, inf);
+    res.l = vec_sel(res.l, inf, is_inf);
+
+    v64u is_neg = (v64u)vec_cmpne(xi >> 63, zero);
+    v64u is_nan = (v64u)vec_cmpeq(xi & infexp, infexp) & ~is_inf;
+    res.l = vec_sel(res.l, infexp + 1/*NaN*/, is_nan | (is_neg & ~is_zero));
+
+    /* subnormals: normalize, remove from is_special_cases */
+    v64u is_not_subnormal = is_nan | is_neg | is_zero | is_inf;
+    un.d = x * 0x1p52;
+    xi = vec_sel(xi, un.l - (52ULL << 52), is_special_cases & ~is_not_subnormal);
+    is_special_cases = is_not_subnormal;
+  }
+  /* x = 2^k z; where z is in range [OFF,2*OFF) and exact.
+     The range is split into N subintervals.
+     The ith subinterval contains z and c is near its center.
*/
+  v64u tmp = xi - OFF;
+  v64u i = (tmp >> (52 - LOG_TABLE_BITS)) % N;
+  v64i k = ((v64i)tmp >> 52);
+  v64u iz = xi - (tmp & 0xfffULL << 52);
+
+  vector double invc = {T[i[0]].invc, T[i[1]].invc};
+  vector double logc = {T[i[0]].logc, T[i[1]].logc};
+
+  u z;
+  z.l = iz;
+/* VSX has a fast fma, but we do this so that we get the same result as log(). */
+#ifdef __FP_FAST_FMA
+  vector double neg1 = {-1.0, -1.0};
+  vector double r = vec_madd(z.d, invc, neg1);
+#else
+  vector double chi = {T2[i[0]].chi, T2[i[1]].chi};
+  vector double clo = {T2[i[0]].clo, T2[i[1]].clo};
+  vector double r = (z.d - chi - clo) * invc;
+#endif
+  vector double ln2hi = {Ln2hi, Ln2hi};
+  vector double ln2lo = {Ln2lo, Ln2lo};
+  vector double kd = {(double)k[0], (double)k[1]};
+  vector double w2 = kd * ln2hi + logc;
+  vector double hid = w2 + r;
+  vector double lod = w2 - hid + r + kd * ln2lo;
+
+
+  /* log(x) = lo + (log1p(r) - r) + hi. */
+  vector double r2 = r * r; /* rounding error: 0x1p-54/N^2. */
+
+  /* Worst case error if |y| > 0x1p-5:
+     0.5 + 4.13/N + abs-poly-error*2^57 ULP
+     Worst case error if |y| > 0x1p-4:
+     0.5 + 2.06/N + abs-poly-error*2^56 ULP. */
+  vector double a0 = {A[0], A[0]};
+  vector double a1 = {A[1], A[1]};
+  vector double a2 = {A[2], A[2]};
+  vector double a3 = {A[3], A[3]};
+  vector double a4 = {A[4], A[4]};
+  vector double y = lod + r2 * a0 + r * r2 *
+    (a1 + r * a2 + r2 *
+    (a3 + r * a4)) + hid;
+  res.d = vec_sel(res.d, y, ~is_close_to_one & ~is_special_cases);
+  return res.d;
+}
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.c
new file mode 100644
index 0000000000..aea5544f66
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.c
@@ -0,0 +1,340 @@
+/* Constants used in polynomial approximations for vectorized log
+   function.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ +#include "vec_s_log_data.h" + +#define N (1 << LOG_TABLE_BITS) + +const struct log_data __log_data = { +.ln2hi = 0x1.62e42fefa3800p-1, +.ln2lo = 0x1.ef35793c76730p-45, +.poly1 = { +// relative error: 0x1.c04d76cp-63 +// in -0x1p-4 0x1.09p-4 (|log(1+x)| > 0x1p-4 outside the interval) +-0x1p-1, +0x1.5555555555577p-2, +-0x1.ffffffffffdcbp-3, +0x1.999999995dd0cp-3, +-0x1.55555556745a7p-3, +0x1.24924a344de3p-3, +-0x1.fffffa4423d65p-4, +0x1.c7184282ad6cap-4, +-0x1.999eb43b068ffp-4, +0x1.78182f7afd085p-4, +-0x1.5521375d145cdp-4, +}, +.poly = { +// relative error: 0x1.926199e8p-56 +// abs error: 0x1.882ff33p-65 +// in -0x1.fp-9 0x1.fp-9 +-0x1.0000000000001p-1, +0x1.555555551305bp-2, +-0x1.fffffffeb459p-3, +0x1.999b324f10111p-3, +-0x1.55575e506c89fp-3, +}, +/* Algorithm: + + x = 2^k z + log(x) = k ln2 + log(c) + log(z/c) + log(z/c) = poly(z/c - 1) + +where z is in [1.6p-1; 1.6p0] which is split into N subintervals and z falls +into the ith one, then table entries are computed as + + tab[i].invc = 1/c + tab[i].logc = (double)log(c) + tab2[i].chi = (double)c + tab2[i].clo = (double)(c - (double)c) + +where c is near the center of the subinterval and is chosen by trying +-2^29 +floating point invc candidates around 1/center and selecting one for which + + 1) the rounding error in 0x1.8p9 + logc is 0, + 2) the rounding error in z - chi - clo 
is < 0x1p-66 and + 3) the rounding error in (double)log(c) is minimized (< 0x1p-66). + +Note: 1) ensures that k*ln2hi + logc can be computed without rounding error, +2) ensures that z/c - 1 can be computed as (z - chi - clo)*invc with close to +a single rounding error when there is no fast fma for z*invc - 1, 3) ensures +that logc + poly(z/c - 1) has small error, however near x == 1 when +|log(x)| < 0x1p-4, this is not enough so that is special cased. */ +.tab = { +{0x1.734f0c3e0de9fp+0, -0x1.7cc7f79e69000p-2}, +{0x1.713786a2ce91fp+0, -0x1.76feec20d0000p-2}, +{0x1.6f26008fab5a0p+0, -0x1.713e31351e000p-2}, +{0x1.6d1a61f138c7dp+0, -0x1.6b85b38287800p-2}, +{0x1.6b1490bc5b4d1p+0, -0x1.65d5590807800p-2}, +{0x1.69147332f0cbap+0, -0x1.602d076180000p-2}, +{0x1.6719f18224223p+0, -0x1.5a8ca86909000p-2}, +{0x1.6524f99a51ed9p+0, -0x1.54f4356035000p-2}, +{0x1.63356aa8f24c4p+0, -0x1.4f637c36b4000p-2}, +{0x1.614b36b9ddc14p+0, -0x1.49da7fda85000p-2}, +{0x1.5f66452c65c4cp+0, -0x1.445923989a800p-2}, +{0x1.5d867b5912c4fp+0, -0x1.3edf439b0b800p-2}, +{0x1.5babccb5b90dep+0, -0x1.396ce448f7000p-2}, +{0x1.59d61f2d91a78p+0, -0x1.3401e17bda000p-2}, +{0x1.5805612465687p+0, -0x1.2e9e2ef468000p-2}, +{0x1.56397cee76bd3p+0, -0x1.2941b3830e000p-2}, +{0x1.54725e2a77f93p+0, -0x1.23ec58cda8800p-2}, +{0x1.52aff42064583p+0, -0x1.1e9e129279000p-2}, +{0x1.50f22dbb2bddfp+0, -0x1.1956d2b48f800p-2}, +{0x1.4f38f4734ded7p+0, -0x1.141679ab9f800p-2}, +{0x1.4d843cfde2840p+0, -0x1.0edd094ef9800p-2}, +{0x1.4bd3ec078a3c8p+0, -0x1.09aa518db1000p-2}, +{0x1.4a27fc3e0258ap+0, -0x1.047e65263b800p-2}, +{0x1.4880524d48434p+0, -0x1.feb224586f000p-3}, +{0x1.46dce1b192d0bp+0, -0x1.f474a7517b000p-3}, +{0x1.453d9d3391854p+0, -0x1.ea4443d103000p-3}, +{0x1.43a2744b4845ap+0, -0x1.e020d44e9b000p-3}, +{0x1.420b54115f8fbp+0, -0x1.d60a22977f000p-3}, +{0x1.40782da3ef4b1p+0, -0x1.cc00104959000p-3}, +{0x1.3ee8f5d57fe8fp+0, -0x1.c202956891000p-3}, +{0x1.3d5d9a00b4ce9p+0, -0x1.b81178d811000p-3}, +{0x1.3bd60c010c12bp+0, 
-0x1.ae2c9ccd3d000p-3}, +{0x1.3a5242b75dab8p+0, -0x1.a45402e129000p-3}, +{0x1.38d22cd9fd002p+0, -0x1.9a877681df000p-3}, +{0x1.3755bc5847a1cp+0, -0x1.90c6d69483000p-3}, +{0x1.35dce49ad36e2p+0, -0x1.87120a645c000p-3}, +{0x1.34679984dd440p+0, -0x1.7d68fb4143000p-3}, +{0x1.32f5cceffcb24p+0, -0x1.73cb83c627000p-3}, +{0x1.3187775a10d49p+0, -0x1.6a39a9b376000p-3}, +{0x1.301c8373e3990p+0, -0x1.60b3154b7a000p-3}, +{0x1.2eb4ebb95f841p+0, -0x1.5737d76243000p-3}, +{0x1.2d50a0219a9d1p+0, -0x1.4dc7b8fc23000p-3}, +{0x1.2bef9a8b7fd2ap+0, -0x1.4462c51d20000p-3}, +{0x1.2a91c7a0c1babp+0, -0x1.3b08abc830000p-3}, +{0x1.293726014b530p+0, -0x1.31b996b490000p-3}, +{0x1.27dfa5757a1f5p+0, -0x1.2875490a44000p-3}, +{0x1.268b39b1d3bbfp+0, -0x1.1f3b9f879a000p-3}, +{0x1.2539d838ff5bdp+0, -0x1.160c8252ca000p-3}, +{0x1.23eb7aac9083bp+0, -0x1.0ce7f57f72000p-3}, +{0x1.22a012ba940b6p+0, -0x1.03cdc49fea000p-3}, +{0x1.2157996cc4132p+0, -0x1.f57bdbc4b8000p-4}, +{0x1.201201dd2fc9bp+0, -0x1.e370896404000p-4}, +{0x1.1ecf4494d480bp+0, -0x1.d17983ef94000p-4}, +{0x1.1d8f5528f6569p+0, -0x1.bf9674ed8a000p-4}, +{0x1.1c52311577e7cp+0, -0x1.adc79202f6000p-4}, +{0x1.1b17c74cb26e9p+0, -0x1.9c0c3e7288000p-4}, +{0x1.19e010c2c1ab6p+0, -0x1.8a646b372c000p-4}, +{0x1.18ab07bb670bdp+0, -0x1.78d01b3ac0000p-4}, +{0x1.1778a25efbcb6p+0, -0x1.674f145380000p-4}, +{0x1.1648d354c31dap+0, -0x1.55e0e6d878000p-4}, +{0x1.151b990275fddp+0, -0x1.4485cdea1e000p-4}, +{0x1.13f0ea432d24cp+0, -0x1.333d94d6aa000p-4}, +{0x1.12c8b7210f9dap+0, -0x1.22079f8c56000p-4}, +{0x1.11a3028ecb531p+0, -0x1.10e4698622000p-4}, +{0x1.107fbda8434afp+0, -0x1.ffa6c6ad20000p-5}, +{0x1.0f5ee0f4e6bb3p+0, -0x1.dda8d4a774000p-5}, +{0x1.0e4065d2a9fcep+0, -0x1.bbcece4850000p-5}, +{0x1.0d244632ca521p+0, -0x1.9a1894012c000p-5}, +{0x1.0c0a77ce2981ap+0, -0x1.788583302c000p-5}, +{0x1.0af2f83c636d1p+0, -0x1.5715e67d68000p-5}, +{0x1.09ddb98a01339p+0, -0x1.35c8a49658000p-5}, +{0x1.08cabaf52e7dfp+0, -0x1.149e364154000p-5}, +{0x1.07b9f2f4e28fbp+0, -0x1.e72c082eb8000p-6}, 
+{0x1.06ab58c358f19p+0, -0x1.a55f152528000p-6}, +{0x1.059eea5ecf92cp+0, -0x1.63d62cf818000p-6}, +{0x1.04949cdd12c90p+0, -0x1.228fb8caa0000p-6}, +{0x1.038c6c6f0ada9p+0, -0x1.c317b20f90000p-7}, +{0x1.02865137932a9p+0, -0x1.419355daa0000p-7}, +{0x1.0182427ea7348p+0, -0x1.81203c2ec0000p-8}, +{0x1.008040614b195p+0, -0x1.0040979240000p-9}, +{0x1.fe01ff726fa1ap-1, 0x1.feff384900000p-9}, +{0x1.fa11cc261ea74p-1, 0x1.7dc41353d0000p-7}, +{0x1.f6310b081992ep-1, 0x1.3cea3c4c28000p-6}, +{0x1.f25f63ceeadcdp-1, 0x1.b9fc114890000p-6}, +{0x1.ee9c8039113e7p-1, 0x1.1b0d8ce110000p-5}, +{0x1.eae8078cbb1abp-1, 0x1.58a5bd001c000p-5}, +{0x1.e741aa29d0c9bp-1, 0x1.95c8340d88000p-5}, +{0x1.e3a91830a99b5p-1, 0x1.d276aef578000p-5}, +{0x1.e01e009609a56p-1, 0x1.07598e598c000p-4}, +{0x1.dca01e577bb98p-1, 0x1.253f5e30d2000p-4}, +{0x1.d92f20b7c9103p-1, 0x1.42edd8b380000p-4}, +{0x1.d5cac66fb5ccep-1, 0x1.606598757c000p-4}, +{0x1.d272caa5ede9dp-1, 0x1.7da76356a0000p-4}, +{0x1.cf26e3e6b2ccdp-1, 0x1.9ab434e1c6000p-4}, +{0x1.cbe6da2a77902p-1, 0x1.b78c7bb0d6000p-4}, +{0x1.c8b266d37086dp-1, 0x1.d431332e72000p-4}, +{0x1.c5894bd5d5804p-1, 0x1.f0a3171de6000p-4}, +{0x1.c26b533bb9f8cp-1, 0x1.067152b914000p-3}, +{0x1.bf583eeece73fp-1, 0x1.147858292b000p-3}, +{0x1.bc4fd75db96c1p-1, 0x1.2266ecdca3000p-3}, +{0x1.b951e0c864a28p-1, 0x1.303d7a6c55000p-3}, +{0x1.b65e2c5ef3e2cp-1, 0x1.3dfc33c331000p-3}, +{0x1.b374867c9888bp-1, 0x1.4ba366b7a8000p-3}, +{0x1.b094b211d304ap-1, 0x1.5933928d1f000p-3}, +{0x1.adbe885f2ef7ep-1, 0x1.66acd2418f000p-3}, +{0x1.aaf1d31603da2p-1, 0x1.740f8ec669000p-3}, +{0x1.a82e63fd358a7p-1, 0x1.815c0f51af000p-3}, +{0x1.a5740ef09738bp-1, 0x1.8e92954f68000p-3}, +{0x1.a2c2a90ab4b27p-1, 0x1.9bb3602f84000p-3}, +{0x1.a01a01393f2d1p-1, 0x1.a8bed1c2c0000p-3}, +{0x1.9d79f24db3c1bp-1, 0x1.b5b515c01d000p-3}, +{0x1.9ae2505c7b190p-1, 0x1.c2967ccbcc000p-3}, +{0x1.9852ef297ce2fp-1, 0x1.cf635d5486000p-3}, +{0x1.95cbaeea44b75p-1, 0x1.dc1bd3446c000p-3}, +{0x1.934c69de74838p-1, 0x1.e8c01b8cfe000p-3}, 
+{0x1.90d4f2f6752e6p-1, 0x1.f5509c0179000p-3}, +{0x1.8e6528effd79dp-1, 0x1.00e6c121fb800p-2}, +{0x1.8bfce9fcc007cp-1, 0x1.071b80e93d000p-2}, +{0x1.899c0dabec30ep-1, 0x1.0d46b9e867000p-2}, +{0x1.87427aa2317fbp-1, 0x1.13687334bd000p-2}, +{0x1.84f00acb39a08p-1, 0x1.1980d67234800p-2}, +{0x1.82a49e8653e55p-1, 0x1.1f8ffe0cc8000p-2}, +{0x1.8060195f40260p-1, 0x1.2595fd7636800p-2}, +{0x1.7e22563e0a329p-1, 0x1.2b9300914a800p-2}, +{0x1.7beb377dcb5adp-1, 0x1.3187210436000p-2}, +{0x1.79baa679725c2p-1, 0x1.377266dec1800p-2}, +{0x1.77907f2170657p-1, 0x1.3d54ffbaf3000p-2}, +{0x1.756cadbd6130cp-1, 0x1.432eee32fe000p-2}, +}, +#if !__FP_FAST_FMA +.tab2 = { +{0x1.61000014fb66bp-1, 0x1.e026c91425b3cp-56}, +{0x1.63000034db495p-1, 0x1.dbfea48005d41p-55}, +{0x1.650000d94d478p-1, 0x1.e7fa786d6a5b7p-55}, +{0x1.67000074e6fadp-1, 0x1.1fcea6b54254cp-57}, +{0x1.68ffffedf0faep-1, -0x1.c7e274c590efdp-56}, +{0x1.6b0000763c5bcp-1, -0x1.ac16848dcda01p-55}, +{0x1.6d0001e5cc1f6p-1, 0x1.33f1c9d499311p-55}, +{0x1.6efffeb05f63ep-1, -0x1.e80041ae22d53p-56}, +{0x1.710000e86978p-1, 0x1.bff6671097952p-56}, +{0x1.72ffffc67e912p-1, 0x1.c00e226bd8724p-55}, +{0x1.74fffdf81116ap-1, -0x1.e02916ef101d2p-57}, +{0x1.770000f679c9p-1, -0x1.7fc71cd549c74p-57}, +{0x1.78ffffa7ec835p-1, 0x1.1bec19ef50483p-55}, +{0x1.7affffe20c2e6p-1, -0x1.07e1729cc6465p-56}, +{0x1.7cfffed3fc9p-1, -0x1.08072087b8b1cp-55}, +{0x1.7efffe9261a76p-1, 0x1.dc0286d9df9aep-55}, +{0x1.81000049ca3e8p-1, 0x1.97fd251e54c33p-55}, +{0x1.8300017932c8fp-1, -0x1.afee9b630f381p-55}, +{0x1.850000633739cp-1, 0x1.9bfbf6b6535bcp-55}, +{0x1.87000204289c6p-1, -0x1.bbf65f3117b75p-55}, +{0x1.88fffebf57904p-1, -0x1.9006ea23dcb57p-55}, +{0x1.8b00022bc04dfp-1, -0x1.d00df38e04b0ap-56}, +{0x1.8cfffe50c1b8ap-1, -0x1.8007146ff9f05p-55}, +{0x1.8effffc918e43p-1, 0x1.3817bd07a7038p-55}, +{0x1.910001efa5fc7p-1, 0x1.93e9176dfb403p-55}, +{0x1.9300013467bb9p-1, 0x1.f804e4b980276p-56}, +{0x1.94fffe6ee076fp-1, -0x1.f7ef0d9ff622ep-55}, +{0x1.96fffde3c12d1p-1, -0x1.082aa962638bap-56}, 
+{0x1.98ffff4458a0dp-1, -0x1.7801b9164a8efp-55}, +{0x1.9afffdd982e3ep-1, -0x1.740e08a5a9337p-55}, +{0x1.9cfffed49fb66p-1, 0x1.fce08c19bep-60}, +{0x1.9f00020f19c51p-1, -0x1.a3faa27885b0ap-55}, +{0x1.a10001145b006p-1, 0x1.4ff489958da56p-56}, +{0x1.a300007bbf6fap-1, 0x1.cbeab8a2b6d18p-55}, +{0x1.a500010971d79p-1, 0x1.8fecadd78793p-55}, +{0x1.a70001df52e48p-1, -0x1.f41763dd8abdbp-55}, +{0x1.a90001c593352p-1, -0x1.ebf0284c27612p-55}, +{0x1.ab0002a4f3e4bp-1, -0x1.9fd043cff3f5fp-57}, +{0x1.acfffd7ae1ed1p-1, -0x1.23ee7129070b4p-55}, +{0x1.aefffee510478p-1, 0x1.a063ee00edea3p-57}, +{0x1.b0fffdb650d5bp-1, 0x1.a06c8381f0ab9p-58}, +{0x1.b2ffffeaaca57p-1, -0x1.9011e74233c1dp-56}, +{0x1.b4fffd995badcp-1, -0x1.9ff1068862a9fp-56}, +{0x1.b7000249e659cp-1, 0x1.aff45d0864f3ep-55}, +{0x1.b8ffff987164p-1, 0x1.cfe7796c2c3f9p-56}, +{0x1.bafffd204cb4fp-1, -0x1.3ff27eef22bc4p-57}, +{0x1.bcfffd2415c45p-1, -0x1.cffb7ee3bea21p-57}, +{0x1.beffff86309dfp-1, -0x1.14103972e0b5cp-55}, +{0x1.c0fffe1b57653p-1, 0x1.bc16494b76a19p-55}, +{0x1.c2ffff1fa57e3p-1, -0x1.4feef8d30c6edp-57}, +{0x1.c4fffdcbfe424p-1, -0x1.43f68bcec4775p-55}, +{0x1.c6fffed54b9f7p-1, 0x1.47ea3f053e0ecp-55}, +{0x1.c8fffeb998fd5p-1, 0x1.383068df992f1p-56}, +{0x1.cb0002125219ap-1, -0x1.8fd8e64180e04p-57}, +{0x1.ccfffdd94469cp-1, 0x1.e7ebe1cc7ea72p-55}, +{0x1.cefffeafdc476p-1, 0x1.ebe39ad9f88fep-55}, +{0x1.d1000169af82bp-1, 0x1.57d91a8b95a71p-56}, +{0x1.d30000d0ff71dp-1, 0x1.9c1906970c7dap-55}, +{0x1.d4fffea790fc4p-1, -0x1.80e37c558fe0cp-58}, +{0x1.d70002edc87e5p-1, -0x1.f80d64dc10f44p-56}, +{0x1.d900021dc82aap-1, -0x1.47c8f94fd5c5cp-56}, +{0x1.dafffd86b0283p-1, 0x1.c7f1dc521617ep-55}, +{0x1.dd000296c4739p-1, 0x1.8019eb2ffb153p-55}, +{0x1.defffe54490f5p-1, 0x1.e00d2c652cc89p-57}, +{0x1.e0fffcdabf694p-1, -0x1.f8340202d69d2p-56}, +{0x1.e2fffdb52c8ddp-1, 0x1.b00c1ca1b0864p-56}, +{0x1.e4ffff24216efp-1, 0x1.2ffa8b094ab51p-56}, +{0x1.e6fffe88a5e11p-1, -0x1.7f673b1efbe59p-58}, +{0x1.e9000119eff0dp-1, -0x1.4808d5e0bc801p-55}, 
+{0x1.eafffdfa51744p-1, 0x1.80006d54320b5p-56}, +{0x1.ed0001a127fa1p-1, -0x1.002f860565c92p-58}, +{0x1.ef00007babcc4p-1, -0x1.540445d35e611p-55}, +{0x1.f0ffff57a8d02p-1, -0x1.ffb3139ef9105p-59}, +{0x1.f30001ee58ac7p-1, 0x1.a81acf2731155p-55}, +{0x1.f4ffff5823494p-1, 0x1.a3f41d4d7c743p-55}, +{0x1.f6ffffca94c6bp-1, -0x1.202f41c987875p-57}, +{0x1.f8fffe1f9c441p-1, 0x1.77dd1f477e74bp-56}, +{0x1.fafffd2e0e37ep-1, -0x1.f01199a7ca331p-57}, +{0x1.fd0001c77e49ep-1, 0x1.181ee4bceacb1p-56}, +{0x1.feffff7e0c331p-1, -0x1.e05370170875ap-57}, +{0x1.00ffff465606ep+0, -0x1.a7ead491c0adap-55}, +{0x1.02ffff3867a58p+0, -0x1.77f69c3fcb2ep-54}, +{0x1.04ffffdfc0d17p+0, 0x1.7bffe34cb945bp-54}, +{0x1.0700003cd4d82p+0, 0x1.20083c0e456cbp-55}, +{0x1.08ffff9f2cbe8p+0, -0x1.dffdfbe37751ap-57}, +{0x1.0b000010cda65p+0, -0x1.13f7faee626ebp-54}, +{0x1.0d00001a4d338p+0, 0x1.07dfa79489ff7p-55}, +{0x1.0effffadafdfdp+0, -0x1.7040570d66bcp-56}, +{0x1.110000bbafd96p+0, 0x1.e80d4846d0b62p-55}, +{0x1.12ffffae5f45dp+0, 0x1.dbffa64fd36efp-54}, +{0x1.150000dd59ad9p+0, 0x1.a0077701250aep-54}, +{0x1.170000f21559ap+0, 0x1.dfdf9e2e3deeep-55}, +{0x1.18ffffc275426p+0, 0x1.10030dc3b7273p-54}, +{0x1.1b000123d3c59p+0, 0x1.97f7980030188p-54}, +{0x1.1cffff8299eb7p+0, -0x1.5f932ab9f8c67p-57}, +{0x1.1effff48ad4p+0, 0x1.37fbf9da75bebp-54}, +{0x1.210000c8b86a4p+0, 0x1.f806b91fd5b22p-54}, +{0x1.2300003854303p+0, 0x1.3ffc2eb9fbf33p-54}, +{0x1.24fffffbcf684p+0, 0x1.601e77e2e2e72p-56}, +{0x1.26ffff52921d9p+0, 0x1.ffcbb767f0c61p-56}, +{0x1.2900014933a3cp+0, -0x1.202ca3c02412bp-56}, +{0x1.2b00014556313p+0, -0x1.2808233f21f02p-54}, +{0x1.2cfffebfe523bp+0, -0x1.8ff7e384fdcf2p-55}, +{0x1.2f0000bb8ad96p+0, -0x1.5ff51503041c5p-55}, +{0x1.30ffffb7ae2afp+0, -0x1.10071885e289dp-55}, +{0x1.32ffffeac5f7fp+0, -0x1.1ff5d3fb7b715p-54}, +{0x1.350000ca66756p+0, 0x1.57f82228b82bdp-54}, +{0x1.3700011fbf721p+0, 0x1.000bac40dd5ccp-55}, +{0x1.38ffff9592fb9p+0, -0x1.43f9d2db2a751p-54}, +{0x1.3b00004ddd242p+0, 0x1.57f6b707638e1p-55}, 
+{0x1.3cffff5b2c957p+0, 0x1.a023a10bf1231p-56}, +{0x1.3efffeab0b418p+0, 0x1.87f6d66b152bp-54}, +{0x1.410001532aff4p+0, 0x1.7f8375f198524p-57}, +{0x1.4300017478b29p+0, 0x1.301e672dc5143p-55}, +{0x1.44fffe795b463p+0, 0x1.9ff69b8b2895ap-55}, +{0x1.46fffe80475ep+0, -0x1.5c0b19bc2f254p-54}, +{0x1.48fffef6fc1e7p+0, 0x1.b4009f23a2a72p-54}, +{0x1.4afffe5bea704p+0, -0x1.4ffb7bf0d7d45p-54}, +{0x1.4d000171027dep+0, -0x1.9c06471dc6a3dp-54}, +{0x1.4f0000ff03ee2p+0, 0x1.77f890b85531cp-54}, +{0x1.5100012dc4bd1p+0, 0x1.004657166a436p-57}, +{0x1.530001605277ap+0, -0x1.6bfcece233209p-54}, +{0x1.54fffecdb704cp+0, -0x1.902720505a1d7p-55}, +{0x1.56fffef5f54a9p+0, 0x1.bbfe60ec96412p-54}, +{0x1.5900017e61012p+0, 0x1.87ec581afef9p-55}, +{0x1.5b00003c93e92p+0, -0x1.f41080abf0ccp-54}, +{0x1.5d0001d4919bcp+0, -0x1.8812afb254729p-54}, +{0x1.5efffe7b87a89p+0, -0x1.47eb780ed6904p-54}, +}, +#endif +}; + diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.h new file mode 100644 index 0000000000..c1bf880c14 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_log_data.h @@ -0,0 +1,41 @@ +/* Constants used in polynomial approximations for vectorized log + funtion. + Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/
+#ifndef S_LOG_DATA_H
+#define S_LOG_DATA_H
+
+#define LOG_TABLE_BITS 7
+#define LOG_POLY_ORDER 6
+#define LOG_POLY1_ORDER 12
+extern const struct log_data {
+  double ln2hi;
+  double ln2lo;
+  double poly[LOG_POLY_ORDER - 1]; /* First coefficient is 1. */
+  double poly1[LOG_POLY1_ORDER - 1];
+  struct {
+    double invc, logc;
+  } tab[1 << LOG_TABLE_BITS];
+#if !__FP_FAST_FMA
+  struct {
+    double chi, clo;
+  } tab2[1 << LOG_TABLE_BITS];
+#endif
+} __log_data;
+
+#endif
+
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf4_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf4_vsx.c
new file mode 100644
index 0000000000..7daa5d36e9
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf4_vsx.c
@@ -0,0 +1,123 @@
+/* Single-precision vector logf(x) function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Library General Public License as
+   published by the Free Software Foundation; either version 2 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Library General Public License for more details.
+
+   You should have received a copy of the GNU Library General Public
+   License along with the GNU C Library; see the file COPYING.LIB. If
+   not, see . */
+#include
+/*
+LOGF_TABLE_BITS = 4
+LOGF_POLY_ORDER = 4
+
+ULP error: 0.818 (nearest rounding.)
+Relative error: 1.957 * 2^-26 (before rounding.)
+*/
+#include "vec_s_logf_data.h"
+
+#define INF 0x7F800000
+#define NINF 0xFF800000
+
+typedef union {
+  vector float f;
+  vector unsigned i;
+  vector int s;
+  vector double d;
+  vector long long unsigned l;
+} u;
+
+#define T __logf_data.tab
+#define A __logf_data.poly
+#define Ln2 __logf_data.ln2
+#define N (1 << LOGF_TABLE_BITS)
+#define OFF 0x3f330000
+
+vector float _ZGVbN4v_logf(vector float x) {
+  vector unsigned inf = {INF, INF, INF, INF};
+  vector unsigned ninf = inf | (1U << 31);
+  vector unsigned zero = {0, 0, 0, 0};
+  vector unsigned special_cases, is_subnormal;
+
+  u un;
+  un.f = x;
+  vector unsigned xi = un.i;
+  vector unsigned is_special_cases = (vector unsigned)vec_cmpge(xi - 0x00800000, inf - 0x00800000);
+  if (!vec_all_eq(is_special_cases, zero)) {
+    // +0 and -0 -> -inf (the sign bit is shifted out before the compare)
+    vector unsigned is_zero = (vector unsigned)vec_cmpeq(xi << 1, zero);
+    special_cases = is_zero & ninf;
+
+    // inf -> inf
+    vector unsigned is_inf = (vector unsigned)vec_cmpeq(xi, inf);
+    special_cases = vec_sel(special_cases, inf, is_inf);
+
+    // Invalid: negative nonzero inputs and NaNs -> NaN
+    vector unsigned is_negative = (vector unsigned)vec_cmpne(xi & 0x80000000, zero);
+    vector unsigned splat = {0xff000000, 0xff000000, 0xff000000, 0xff000000};
+    vector unsigned is_outofrange = (vector unsigned)vec_cmpge(xi << 1, splat);
+    vector unsigned not_inf_or_zero = ~is_inf & ~is_zero;
+    vector unsigned is_invalid = (is_negative | is_outofrange) & not_inf_or_zero;
+    vector unsigned nan = {0x7f800001, 0x7f800001, 0x7f800001, 0x7f800001};
+    special_cases = vec_sel(special_cases, nan, is_invalid);
+
+    // normalize subnormals
+    is_subnormal = is_special_cases & ~is_invalid & not_inf_or_zero;
+    vector unsigned subnormals = is_subnormal & xi;
+    un.i = subnormals;
+    u un2;
+    un2.f = un.f * 0x1p23f;
+    subnormals = un2.i - (23 << 23);
+
+    // clear subnormals, and merge in normalized ones
+    xi = vec_sel(xi, subnormals, is_subnormal);
+  } else {
+    special_cases = zero;
+    is_subnormal = zero;
+  }
+
+  // back to the main
part + vector unsigned tmp = xi - OFF; + vector unsigned i = (tmp >> (23 - LOGF_TABLE_BITS)) % N; + vector int k = ((vector int)tmp >> 23); + vector unsigned iz = xi - (tmp & (0x1ffu << 23)); + + vector double abinv = {(double)T[i[0]].invc, (double)T[i[1]].invc}; + vector double ablogc = {(double)T[i[0]].logc, (double)T[i[1]].logc}; + vector double cdinv = {(double)T[i[2]].invc, (double)T[i[3]].invc}; + vector double cdlogc = {(double)T[i[2]].logc, (double)T[i[3]].logc}; + + u izu; + izu.i = iz; + vector double izl = vec_unpackh(izu.f); + vector double izr = vec_unpackl(izu.f); + vector double rl = izl * abinv - 1; + vector double rr = izr * cdinv - 1; + vector double kl = {(double)k[0], (double)k[1]}; + vector double kr = {(double)k[2], (double)k[3]}; + vector double Ln2v = {Ln2, Ln2}; + vector double y0l = ablogc + kl * Ln2v; + vector double y0r = cdlogc + kr * Ln2v; + + vector double r2l = rl * rl; + vector double r2r = rr * rr; + vector double yl = A[1] * rl + A[2]; + vector double yr = A[1] * rr + A[2]; + yl = A[0] * r2l + yl; + yr = A[0] * r2r + yr; + yl = yl * r2l + (y0l + rl); + yr = yr * r2r + (y0r + rr); + vector float y = {(float)yl[0], (float)yl[1], (float)yr[0], (float)yr[1]}; + un.f = y; + un.i = vec_sel(un.i, special_cases, is_special_cases & ~is_subnormal); + return un.f; +} diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.c new file mode 100644 index 0000000000..58902ee77b --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.c @@ -0,0 +1,44 @@ +/* Constants used in polynomial approximations for vectorized logf + Copyright (C) 2017-2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "vec_s_logf_data.h" + +const struct logf_data __logf_data = { + .tab = { + { 0x1.661ec79f8f3bep+0, -0x1.57bf7808caadep-2 }, + { 0x1.571ed4aaf883dp+0, -0x1.2bef0a7c06ddbp-2 }, + { 0x1.49539f0f010bp+0, -0x1.01eae7f513a67p-2 }, + { 0x1.3c995b0b80385p+0, -0x1.b31d8a68224e9p-3 }, + { 0x1.30d190c8864a5p+0, -0x1.6574f0ac07758p-3 }, + { 0x1.25e227b0b8eap+0, -0x1.1aa2bc79c81p-3 }, + { 0x1.1bb4a4a1a343fp+0, -0x1.a4e76ce8c0e5ep-4 }, + { 0x1.12358f08ae5bap+0, -0x1.1973c5a611cccp-4 }, + { 0x1.0953f419900a7p+0, -0x1.252f438e10c1ep-5 }, + { 0x1p+0, 0x0p+0 }, + { 0x1.e608cfd9a47acp-1, 0x1.aa5aa5df25984p-5 }, + { 0x1.ca4b31f026aap-1, 0x1.c5e53aa362eb4p-4 }, + { 0x1.b2036576afce6p-1, 0x1.526e57720db08p-3 }, + { 0x1.9c2d163a1aa2dp-1, 0x1.bc2860d22477p-3 }, + { 0x1.886e6037841edp-1, 0x1.1058bc8a07ee1p-2 }, + { 0x1.767dcf5534862p-1, 0x1.4043057b6ee09p-2 }, + }, + .ln2 = 0x1.62e42fefa39efp-1, + .poly = { + -0x1.00ea348b88334p-2, 0x1.5575b0be00b6ap-2, -0x1.ffffef20a4123p-2, + } +}; diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.h new file mode 100644 index 0000000000..dd96185a33 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_logf_data.h @@ -0,0 +1,32 @@ +/* Constants used in polynomial approximations for vectorized logf + Copyright (C) 
2017-2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ +#ifndef S_LOGF_DATA_H +#define S_LOGF_DATA_H + +#define LOGF_TABLE_BITS 4 +#define LOGF_POLY_ORDER 4 +const struct logf_data { + struct { + double invc, logc; + } tab[1 << LOGF_TABLE_BITS]; + double ln2; + double poly[LOGF_POLY_ORDER - 1]; /* First order coefficient is 1. */ +} __logf_data; + +#endif + diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist index 9d298a788c..07e447172a 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist @@ -4,3 +4,5 @@ GLIBC_2.30 _ZGVbN2vvv_sincos F GLIBC_2.30 _ZGVbN4v_cosf F GLIBC_2.30 _ZGVbN4v_sinf F GLIBC_2.30 _ZGVbN4vvv_sincosf F +GLIBC_2.30 _ZGVbN2v_log F +GLIBC_2.30 _ZGVbN4v_logf F
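
For readers following the main path of _ZGVbN4v_logf, the per-lane arithmetic can be sketched in scalar C roughly as below. This is only an illustration and not part of the patch: the helper name logf_lane_sketch is hypothetical, the table and coefficients are copied from vec_s_logf_data.c above, and the sketch covers positive, normal, finite inputs only (no zero, infinity, NaN, negative, or subnormal handling). It also assumes that right-shifting a negative int32_t is an arithmetic shift, as it is with GCC.

```c
#include <stdint.h>
#include <string.h>

/* Values copied from __logf_data in vec_s_logf_data.c (LOGF_TABLE_BITS = 4). */
static const struct { double invc, logc; } sketch_tab[16] = {
  { 0x1.661ec79f8f3bep+0, -0x1.57bf7808caadep-2 },
  { 0x1.571ed4aaf883dp+0, -0x1.2bef0a7c06ddbp-2 },
  { 0x1.49539f0f010bp+0, -0x1.01eae7f513a67p-2 },
  { 0x1.3c995b0b80385p+0, -0x1.b31d8a68224e9p-3 },
  { 0x1.30d190c8864a5p+0, -0x1.6574f0ac07758p-3 },
  { 0x1.25e227b0b8eap+0, -0x1.1aa2bc79c81p-3 },
  { 0x1.1bb4a4a1a343fp+0, -0x1.a4e76ce8c0e5ep-4 },
  { 0x1.12358f08ae5bap+0, -0x1.1973c5a611cccp-4 },
  { 0x1.0953f419900a7p+0, -0x1.252f438e10c1ep-5 },
  { 0x1p+0, 0x0p+0 },
  { 0x1.e608cfd9a47acp-1, 0x1.aa5aa5df25984p-5 },
  { 0x1.ca4b31f026aap-1, 0x1.c5e53aa362eb4p-4 },
  { 0x1.b2036576afce6p-1, 0x1.526e57720db08p-3 },
  { 0x1.9c2d163a1aa2dp-1, 0x1.bc2860d22477p-3 },
  { 0x1.886e6037841edp-1, 0x1.1058bc8a07ee1p-2 },
  { 0x1.767dcf5534862p-1, 0x1.4043057b6ee09p-2 },
};
static const double sketch_ln2 = 0x1.62e42fefa39efp-1;
static const double sketch_poly[3] = {
  -0x1.00ea348b88334p-2, 0x1.5575b0be00b6ap-2, -0x1.ffffef20a4123p-2,
};

/* Hypothetical scalar equivalent of one lane of the vector main path.
   Valid only for positive, normal, finite x.  */
static float
logf_lane_sketch (float x)
{
  uint32_t ix, iz, tmp, i;
  int32_t k;
  float zf;

  memcpy (&ix, &x, sizeof ix);            /* reinterpret float bits */

  /* x = 2^k * z with z in [OFF, 2*OFF); i selects the subinterval of z.  */
  tmp = ix - 0x3f330000u;                 /* OFF */
  i = (tmp >> (23 - 4)) % 16;             /* top 4 mantissa bits */
  k = (int32_t) tmp >> 23;                /* assumes arithmetic shift */
  iz = ix - (tmp & (0x1ffu << 23));       /* strip the exponent k from x */
  memcpy (&zf, &iz, sizeof zf);

  /* log(x) = log1p(z/c - 1) + log(c) + k*ln2, with invc = 1/c.  */
  double r = (double) zf * sketch_tab[i].invc - 1.0;
  double y0 = sketch_tab[i].logc + (double) k * sketch_ln2;

  /* Polynomial approximation of log1p(r) - r, evaluated as in the patch.  */
  double r2 = r * r;
  double y = sketch_poly[1] * r + sketch_poly[2];
  y = sketch_poly[0] * r2 + y;
  y = y * r2 + (y0 + r);
  return (float) y;
}
```

For example, logf_lane_sketch (2.0f) selects table entry 9 ({1, 0}) with k = 1 and r = 0, so the result is ln2 rounded to float. The vector routine does the same work for four lanes at once, widening to two vector double halves for the arithmetic.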