From patchwork Mon Mar 4 21:03:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: GT X-Patchwork-Id: 31718 Received: (qmail 68974 invoked by alias); 4 Mar 2019 21:03:41 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 68956 invoked by uid 89); 4 Mar 2019 21:03:40 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-27.6 required=5.0 tests=BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-40136.protonmail.ch Date: Mon, 04 Mar 2019 21:03:26 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=default; t=1551733413; bh=Vqx2GagrU6OzG5OwkcXb/5dc6cZrrJ9yXVebaPzJUK0=; h=Date:To:From:Reply-To:Subject:Feedback-ID:From; b=SjoVm20Iu+s8hd/cUfaUDDIRDV4ynyTVrqFcUP6s9gNr2yo617q+se9+hRRTdPOke dFV9YGyTrLoy6k+1++baQTrqAKi4dBDocY56gCZQke8oU4Fb8vDplmScQ3Uoo1kzwS FsR1qRw06m7E6fFwjMk+Kx8+De2qmGB9YILuFPoM= To: "libc-alpha@sourceware.org" , "tnggil@protonmail.com" From: GT Reply-To: GT Subject: [PATCH] PPC64: Adds SIMD sine function for POWER8. Message-ID: MIME-Version: 1.0 This adds the vector sine function for PPC64/POWER8 VSX. Testing with 'make check' passes all tests of the double-precision sine function. The maximum error is 2-ulps, and is expected of the algorithm which guarantees less than 4-ulps in error. Tested on both PPC64 Little and Big Endian. From 78e18edc25ab733e6e25c3c21adb077a7dba2557 Mon Sep 17 00:00:00 2001 From: Bert Tenjy Date: Mon, 4 Mar 2019 20:45:51 +0000 Subject: [PATCH] PPC64: Adds SIMD sine function for POWER8. [BZ #24206] Implements double-precision vector sine function. The polynomial sine-approximating algorithm is adapted for PPC64 from x86_64 [commit #4b9c2b707b]. The patch has been tested on PPC64/POWER8 Little Endian and Big Endian. Testing uses the framework created for libmvec on x86_64 which runs tests on issuing 'make check'. Tests of the new vector sine function all pass. --- ChangeLog | 18 ++++ NEWS | 1 + sysdeps/powerpc/bits/math-vector.h | 2 + sysdeps/powerpc/fpu/libm-test-ulps | 3 + sysdeps/powerpc/powerpc64/fpu/Versions | 2 +- .../powerpc/powerpc64/fpu/multiarch/Makefile | 6 +- .../multiarch/test-double-vlen2-wrappers.c | 1 + .../powerpc64/fpu/multiarch/vec_d_sin2_vsx.c | 101 ++++++++++++++++++ .../powerpc64/fpu/multiarch/vec_d_trig_data.h | 44 +++++--- .../linux/powerpc/powerpc64/libmvec.abilist | 1 + 10 files changed, 160 insertions(+), 19 deletions(-) create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_sin2_vsx.c diff --git a/ChangeLog b/ChangeLog index 0fe27f52fb..75b3d2978f 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,21 @@ +2019-03-04 + + [BZ #24206] + + * NEWS: Note the addition of PPC64 vector sine. + * sysdeps/powerpc/bits/math-vector.h: Added sine SIMD declaration. + * sysdeps/powerpc/fpu/libm-test-ulps: Regenerated. + * sysdeps/powerpc/powerpc64/fpu/Versions: Added sine entry. + * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: (libmvec-sysdep_routines) + (CFLAGS-vec_s_cosf4_vsx.c, libmvec-tests, double-vlen2-funcs) + (double-vlen2-arch-ext-cflags): Added build of VSX SIMD sine function + and its tests. + * sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c: Added sine entry. + * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_sin2_vsx.c: New file. + * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h: Added constants used in + computing vector sine. + * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD sine added. + 2019-03-03 [BZ #24205] diff --git a/NEWS b/NEWS index 42b5aca08b..bbb86dfd41 100644 --- a/NEWS +++ b/NEWS @@ -10,6 +10,7 @@ using `glibc' in the "product" field. The following functions now have vector versions: - cos (double-precision cosine) - cosf (single-precision cosine) + - sin (double-precision sine) GCC support for auto-vectorization of functions on PPC64 is not yet available. Until that is done, the new vector math functions are diff --git a/sysdeps/powerpc/bits/math-vector.h b/sysdeps/powerpc/bits/math-vector.h index bb4834c0a7..0327f204a9 100644 --- a/sysdeps/powerpc/bits/math-vector.h +++ b/sysdeps/powerpc/bits/math-vector.h @@ -38,6 +38,8 @@ # define __DECL_SIMD_cos __DECL_SIMD_PPC64 # undef __DECL_SIMD_cosf # define __DECL_SIMD_cosf __DECL_SIMD_PPC64 +# undef __DECL_SIMD_sin +# define __DECL_SIMD_sin __DECL_SIMD_PPC64 # endif #endif diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps index 3bd9e67096..e72452e757 100644 --- a/sysdeps/powerpc/fpu/libm-test-ulps +++ b/sysdeps/powerpc/fpu/libm-test-ulps @@ -2570,6 +2570,9 @@ ifloat128: 3 ildouble: 5 ldouble: 5 +Function: "sin_vlen2": +double: 2 + Function: "sincos": double: 1 float: 1 diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions index bdd4b657c4..4852a2791f 100644 --- a/sysdeps/powerpc/powerpc64/fpu/Versions +++ b/sysdeps/powerpc/powerpc64/fpu/Versions @@ -1,5 +1,5 @@ libmvec { GLIBC_2.30 { - _ZGVbN2v_cos; _ZGVbN4v_cosf; + _ZGVbN2v_cos; _ZGVbN4v_cosf; _ZGVbN2v_sin; } } diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile index 5f70c7659f..20a3721854 100644 --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile @@ -44,9 +44,11 @@ CFLAGS-s_modff-ppc64.c += -fsignaling-nans endif ifeq ($(subdir),mathvec) -libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx +libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \ + vec_d_sin2_vsx CFLAGS-vec_d_cos2_vsx.c += -mvsx CFLAGS-vec_s_cosf4_vsx.c += -mvsx +CFLAGS-vec_d_sin2_vsx.c += -mvsx endif # Variables for libmvec tests. @@ -54,7 +56,7 @@ ifeq ($(subdir),math) ifeq ($(build-mathvec),yes) libmvec-tests += double-vlen2 float-vlen4 -double-vlen2-funcs = cos +double-vlen2-funcs = cos sin float-vlen4-funcs = cos double-vlen2-arch-ext-cflags = -mvsx -DREQUIRE_VSX diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c index 17e2cc0724..10a1ec281b 100644 --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c @@ -22,3 +22,4 @@ #define VEC_TYPE vector double VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVbN2v_cos) +VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVbN2v_sin) diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_sin2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_sin2_vsx.c new file mode 100644 index 0000000000..d6c1e25673 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_sin2_vsx.c @@ -0,0 +1,101 @@ +/* Function sin vectorized with VSX. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "vec_d_trig_data.h" + +/* ALGORITHM DESCRIPTION: + + ( low accuracy ( < 4ulp ) or enhanced performance + ( half of correct mantissa ) implementation ) + + Argument representation: + arg = N*Pi + R + + Result calculation: + sin(arg) = sin(N*Pi + R) = (-1)^N * sin(R) + sin(R) is approximated by corresponding polynomial. */ + +vector double +_ZGVbN2v_sin (vector double x) +{ + + /* ARGUMENT RANGE REDUCTION: X' = |X|. */ + vector double abs_x_prime = vec_abs (x); + + /* Y = X'*InvPi + RS : right shifter add. */ + vector double y = (abs_x_prime * __d_inv_pi) + __d_rshifter; + + /* N = Y - RS : right shifter sub. */ + vector double n = y - __d_rshifter; + + /* SignRes = Y<<63 : shift LSB to MSB place for result sign. */ + vector double sign_res = (vector double) vec_sl ((vector long long) y, + (vector unsigned long long) + vec_splats (63)); + + /* Check for large arguments path. */ + vector bool long long large_in = vec_cmpgt (abs_x_prime, __d_rangeval); + + /* R = X' - N*Pi1. */ + vector double r = abs_x_prime - (n * __d_pi1_fma); + + /* R = R - N*Pi2. */ + r = r - (n * __d_pi2_fma); + + /* R = R - N*Pi3. */ + r = r - (n * __d_pi3_fma); + + /* POLYNOMIAL APPROXIMATION: R2 = R*R. */ + vector double r2 = r * r; + + /* R = R^SignRes : update sign of reduced argument. */ + vector double r_sign + = (vector double) ((vector long long) r ^ (vector long long) sign_res); + + /* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))). */ + vector double poly = r2 * __d_coeff7_sin + __d_coeff6_sin; + poly = poly * r2 + __d_coeff5_sin; + poly = poly * r2 + __d_coeff4_sin; + poly = poly * r2 + __d_coeff3_sin; + + /* Poly = R2*(C1+R2*(C2+R2*Poly)). */ + poly = poly * r2 + __d_coeff2_sin; + poly = poly * r2 + __d_coeff1_sin; + poly = poly * r2; + + /* Poly = Poly*R + R. */ + poly = poly * r_sign + r_sign; + + /* SignX: -ve sign bit of X. */ + vector double neg_sign + = (vector double) vec_andc ((vector bool long long) x, __d_abs_mask); + + /* RECONSTRUCTION: Final sign setting: Res = Poly^SignX. */ + vector double out + = (vector double) ((vector long long) poly ^ (vector long long) neg_sign); + + if (large_in[0] != 0) + out[0] = sin (x[0]); + + if (large_in[1] != 0) + out[1] = sin (x[1]); + + return out; + +} /* _ZGVbN2v_sin function. */ diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h index ecd47d258f..bc823ef19c 100644 --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h @@ -23,38 +23,50 @@ #include /* PI/2. */ -const vector double __d_half_pi = {0x1.921fb54442d18p+0, 0x1.921fb54442d18p+0}; +static const vector double __d_half_pi = {0x1.921fb54442d18p+0, 0x1.921fb54442d18p+0}; /* Inverse PI. */ -const vector double __d_inv_pi = {0x1.45f306dc9c883p-2, 0x1.45f306dc9c883p-2}; +static const vector double __d_inv_pi = {0x1.45f306dc9c883p-2, 0x1.45f306dc9c883p-2}; /* Right-shifter constant. */ -const vector double __d_rshifter = {0x1.8p+52, 0x1.8p+52}; +static const vector double __d_rshifter = {0x1.8p+52, 0x1.8p+52}; /* Working range threshold. */ -const vector double __d_rangeval = {0x1p+23, 0x1p+23}; +static const vector double __d_rangeval = {0x1p+23, 0x1p+23}; /* One-half. */ -const vector double __d_one_half = {0x1p-1, 0x1p-1}; +static const vector double __d_one_half = {0x1p-1, 0x1p-1}; /* Range reduction PI-based constants if FMA available: PI high part (FMA available). */ -const vector double __d_pi1_fma = {0x1.921fb54442d18p+1, 0x1.921fb54442d18p+1}; +static const vector double __d_pi1_fma = {0x1.921fb54442d18p+1, 0x1.921fb54442d18p+1}; /* PI mid part (FMA available). */ -const vector double __d_pi2_fma = {0x1.1a62633145c06p-53, 0x1.1a62633145c06p-53}; +static const vector double __d_pi2_fma = {0x1.1a62633145c06p-53, 0x1.1a62633145c06p-53}; /* PI low part (FMA available). */ -const vector double __d_pi3_fma +static const vector double __d_pi3_fma = {0x1.c1cd129024e09p-106,0x1.c1cd129024e09p-106}; -/* Polynomial coefficients (relative error 2^(-52.115)). */ -const vector double __d_coeff7 = {-0x1.9f0d60811aac8p-41,-0x1.9f0d60811aac8p-41}; -const vector double __d_coeff6 = {0x1.60e6857a2f22p-33,0x1.60e6857a2f22p-33}; -const vector double __d_coeff5 = {-0x1.ae63546002231p-26,-0x1.ae63546002231p-26}; -const vector double __d_coeff4 = {0x1.71de38030feap-19,0x1.71de38030feap-19}; -const vector double __d_coeff3 = {-0x1.a01a019a5b86dp-13,-0x1.a01a019a5b86dp-13}; -const vector double __d_coeff2 = {0x1.111111110a4a8p-7,0x1.111111110a4a8p-7}; -const vector double __d_coeff1 = {-0x1.55555555554a7p-3,-0x1.55555555554a7p-3}; +/* Polynomial coefficients for cosine (relative error 2^(-52.115)). */ +static const vector double __d_coeff7 = {-0x1.9f0d60811aac8p-41,-0x1.9f0d60811aac8p-41}; +static const vector double __d_coeff6 = {0x1.60e6857a2f22p-33,0x1.60e6857a2f22p-33}; +static const vector double __d_coeff5 = {-0x1.ae63546002231p-26,-0x1.ae63546002231p-26}; +static const vector double __d_coeff4 = {0x1.71de38030feap-19,0x1.71de38030feap-19}; +static const vector double __d_coeff3 = {-0x1.a01a019a5b86dp-13,-0x1.a01a019a5b86dp-13}; +static const vector double __d_coeff2 = {0x1.111111110a4a8p-7,0x1.111111110a4a8p-7}; +static const vector double __d_coeff1 = {-0x1.55555555554a7p-3,-0x1.55555555554a7p-3}; + +/* Absolute value mask. */ +static const vector bool long long __d_abs_mask = { 0x7fffffffffffffff, 0x7fffffffffffffff }; + +/* Polynomial coefficients for sine (relative error 2^(-52.115)). */ +static const vector double __d_coeff7_sin = { -0x1.9f1517e9f65fp-41, -0x1.9f1517e9f65fp-41 }; +static const vector double __d_coeff6_sin = { 0x1.60e6bee01d83ep-33, 0x1.60e6bee01d83ep-33 }; +static const vector double __d_coeff5_sin = { -0x1.ae6355aaa4a53p-26, -0x1.ae6355aaa4a53p-26 }; +static const vector double __d_coeff4_sin = { 0x1.71de3806add1ap-19, 0x1.71de3806add1ap-19 }; +static const vector double __d_coeff3_sin = { -0x1.a01a019a659ddp-13, -0x1.a01a019a659ddp-13 }; +static const vector double __d_coeff2_sin = { 0x1.111111110a573p-7, 0x1.111111110a573p-7 }; +static const vector double __d_coeff1_sin = { -0x1.55555555554a8p-3, -0x1.55555555554a8p-3 }; #endif /* D_TRIG_DATA_H. */ diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist index 8eef5e1e72..a1a7f69d4c 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist @@ -1,2 +1,3 @@ GLIBC_2.30 _ZGVbN2v_cos F +GLIBC_2.30 _ZGVbN2v_sin F GLIBC_2.30 _ZGVbN4v_cosf F -- 2.20.1