From patchwork Thu Mar 7 05:43:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: GT X-Patchwork-Id: 31760 Received: (qmail 18661 invoked by alias); 7 Mar 2019 05:44:22 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 18594 invoked by uid 89); 7 Mar 2019 05:44:14 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mail-40135.protonmail.ch Date: Thu, 07 Mar 2019 05:43:59 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=default; t=1551937441; bh=bRjoh2olweB9oQZjjcbQc0MUUbNcW/s7eKOKJGC/9NU=; h=Date:To:From:Reply-To:Subject:Feedback-ID:From; b=t6SQSBqc0mcTyRA44w1zvpVAH3lK0JmTtNvpoZMjIYEsjdm3rE6c0LPzbFuoF6oqt 00Xa2NTbMNV4PgjoYBjVE3yKE+fp72cNwl+KssisErkkyCWa0tMyBCiX0dJTq8gXft 2cdXlg3o/cZiGGC2lvoRZhK3v6N1GRexwn5/ee0g= To: "libc-alpha@sourceware.org" , "tnggil@protonmail.com" From: GT Reply-To: GT Subject: [RESEND] [PATCH 2/12] PPC64: Addition of SIMD cosf function for POWER8. Message-ID: <8vuSE94joXSRqM9BzIUsziA04k-Pwyusnd73rsuHoJs3c0AJ5o9J7su5IxvjLy6nnxz6HupqLQ04ZKOGlkNnnWN2nH8lvbJpysGgtx-3jKw=@protonmail.com> MIME-Version: 1.0 Different in this patch: Makes changes requested in feedback from previous submission: 1. Items in NEWS are now below heading for version 2.30. 2. Description of NEWS entry no longer characterizes changes as relative to previous patch. 3. Removed changes to code for the double-precision cosine. They have been moved to the resent patch for that function. 4. The patch subject now identifies it as part of a sequence. From 056f0dd58b61a9068608c00472582b9335648095 Mon Sep 17 00:00:00 2001 From: Bert Tenjy Date: Thu, 7 Mar 2019 05:17:47 +0000 Subject: [PATCH 2/12] PPC64: Addition of SIMD cosf function for POWER8. [BZ #24205] Implements single-precision cosine using VSX vector capability. The polynomial cosine-approximating algorithm is adapted for PPC64 from x86_64 [commit #04f496d602]. The patch has been tested on PPC64/POWER8 Little Endian and Big Endian. It is tested using the framework created for libmvec on x86_64 which runs tests on issuing 'make check'. Tests of the new vector cosine function all pass. Details on the ABI are found at this link: But for adjusting the width of operands, details described for the double-precision cosine implemented earlier apply here. See git commit #7956c29f07 for that information. --- ChangeLog | 16 +++ NEWS | 10 +- sysdeps/powerpc/fpu/libm-test-ulps | 3 + sysdeps/powerpc/powerpc64/fpu/Versions | 2 +- .../powerpc/powerpc64/fpu/multiarch/Makefile | 7 +- .../fpu/multiarch/test-float-vlen4-wrappers.c | 24 ++++ .../powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c | 109 ++++++++++++++++++ .../powerpc64/fpu/multiarch/vec_s_trig_data.h | 72 ++++++++++++ .../linux/powerpc/powerpc64/libmvec.abilist | 1 + 9 files changed, 239 insertions(+), 5 deletions(-) create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h diff --git a/ChangeLog b/ChangeLog index 51956e6b78..0dba2c77b4 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,19 @@ +2019-03-07 + + [BZ #24205] + + * NEWS: Note the addition of PPC64 vector cosf. + * sysdeps/powerpc/fpu/libm-test-ulps: Regenerated. + * sysdeps/powerpc/powerpc64/fpu/Versions: Added new cosf entry. + * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: (libmvec-sysdep_routines) + (CFLAGS-vec_s_cosf4_vsx.c, libmvec-tests, float-vlen2-funcs) + (float-vlen2-arch-ext-cflags): Added build of VSX SIMD cosf function + and its tests. + * sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c: New file. + * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c: New file. + * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h: Likewise. + * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: New SIMD cosf added. + 2019-03-06 [BZ #24205] diff --git a/NEWS b/NEWS index 22735e1d1d..183b306f75 100644 --- a/NEWS +++ b/NEWS @@ -10,13 +10,19 @@ Version 2.30 Major new features: -* Start of implementing vector math library libmvec on PPC64/POWER8. - The double-precision cosine now has a vector version. +* Implementation of vector math library libmvec on PPC64/POWER8. + + The following functions now have vector versions. + - double-precision cosine: cos + - single-precision cosine: cosf + GCC support for auto-vectorization of functions on PPC64 is not yet available. Until that is done, the new vector math functions are inaccessible to applications. + Library libmvec is built by default for PPC64. Disable its creation by passing flag --disable-mathvec to configure. + The library ABI specification is x86_64 Vector Function ABI. More information on libmvec including a link to the ABI document is at: diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps index d392b135a7..3bd9e67096 100644 --- a/sysdeps/powerpc/fpu/libm-test-ulps +++ b/sysdeps/powerpc/fpu/libm-test-ulps @@ -1314,6 +1314,9 @@ ldouble: 5 Function: "cos_vlen2": double: 2 +Function: "cos_vlen4": +float: 1 + Function: "cosh": double: 1 float: 1 diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions index 9a3e1211cc..bdd4b657c4 100644 --- a/sysdeps/powerpc/powerpc64/fpu/Versions +++ b/sysdeps/powerpc/powerpc64/fpu/Versions @@ -1,5 +1,5 @@ libmvec { GLIBC_2.30 { - _ZGVbN2v_cos; + _ZGVbN2v_cos; _ZGVbN4v_cosf; } } diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile index 44c1c04c13..5f70c7659f 100644 --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile @@ -44,18 +44,21 @@ CFLAGS-s_modff-ppc64.c += -fsignaling-nans endif ifeq ($(subdir),mathvec) -libmvec-sysdep_routines += vec_d_cos2_vsx +libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx CFLAGS-vec_d_cos2_vsx.c += -mvsx +CFLAGS-vec_s_cosf4_vsx.c += -mvsx endif # Variables for libmvec tests. ifeq ($(subdir),math) ifeq ($(build-mathvec),yes) -libmvec-tests += double-vlen2 +libmvec-tests += double-vlen2 float-vlen4 double-vlen2-funcs = cos +float-vlen4-funcs = cos double-vlen2-arch-ext-cflags = -mvsx -DREQUIRE_VSX +float-vlen4-arch-ext-cflags = -mvsx -DREQUIRE_VSX endif endif diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c new file mode 100644 index 0000000000..f099990d4e --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c @@ -0,0 +1,24 @@ +/* Wrapper part of tests for VSX ISA versions of vector math functions. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "test-float-vlen4.h" +#include + +#define VEC_TYPE vector float + +VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf) diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c new file mode 100644 index 0000000000..fb9d27b865 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c @@ -0,0 +1,109 @@ +/* Function cosf vectorized with VSX. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include "vec_s_trig_data.h" + +vector float +_ZGVbN4v_cosf (vector float x) +{ + + /* + ALGORITHM DESCRIPTION: + + 1) Range reduction to [-Pi/2; +Pi/2] interval + a) We remove sign using absolute value operation + b) Add Pi/2 value to argument X for Cos to Sin transformation + c) Getting octant Y by 1/Pi multiplication + d) Add "Right Shifter" value + e) Treat obtained value as integer for destination sign setting. + Shift first bit of this value to the last (sign) position + f) Subtract "Right Shifter" value + g) Subtract 0.5 from result for octant correction + h) Subtract Y*PI from X argument, where PI divided to 4 parts: + X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4; + 2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval) + a) Calculate X^2 = X * X + b) Calculate polynomial: + R = X + X * X^2 * (A3 + x^2 * (A5 + ..... + 3) Destination sign setting + a) Set shifted destination sign using XOR operation: + R = XOR( R, S ). */ + + /* + ARGUMENT RANGE REDUCTION: + Add Pi/2 to argument: X' = X+Pi/2. Transforms cos to sin. */ + vector float x_prime = __s_half_pi + x; + + /* Y = X'*InvPi + RS : right shifter add. */ + vector float y = (x_prime * __s_inv_pi) + __s_rshifter; + + /* N = Y - RS : right shifter sub. */ + vector float n = y - __s_rshifter; + + /* SignRes = Y<<31 : shift LSB to MSB place for result sign. */ + vector float sign_res = (vector float) + vec_sl ((vector signed int) y, (vector unsigned int) vec_splats (31)); + + /* N = N - 0.5. */ + n = n - __s_one_half; + + /* Get absolute argument value: X = |X|. */ + vector float abs_x = vec_abs (x); + + /* Check for large arguments path. */ + vector bool int large_in = vec_cmpgt (abs_x, __s_rangeval); + + /* R = X - N*Pi1. */ + vector float r = x - (n * __s_pi1_fma); + + /* R = R - N*Pi2. */ + r = r - (n * __s_pi2_fma); + + /* R = R - N*Pi3. */ + r = r - (n * __s_pi3_fma); + + /* R2 = R*R. */ + vector float r2 = r * r; + + /* RECONSTRUCTION: + Final sign setting: Res = Poly^SignRes. */ + vector float res = (vector float) + ((vector signed int) r ^ (vector signed int) sign_res); + + /* Poly = R + R * R2*(A3+R2*(A5+R2*(A7+R2*A9))). */ + vector float poly = r2 * __s_a9_fma + __s_a7_fma; + poly = poly * r2 + __s_a5_fma; + poly = poly * r2 + __s_a3; + poly = poly * r2 * res + res; + + if (large_in[0]) + poly[0] = cosf (x[0]); + + if (large_in[1]) + poly[1] = cosf (x[1]); + + if (large_in[2]) + poly[2] = cosf (x[2]); + + if (large_in[3]) + poly[3] = cosf (x[3]); + + return poly; + +} /* Function _ZGVbN4v_cosf. */ diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h new file mode 100644 index 0000000000..55c28563e7 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h @@ -0,0 +1,72 @@ +/* Constants used in polynomial approximations for vectorized sinf, cosf, + and sincosf functions. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef S_TRIG_DATA_H +#define S_TRIG_DATA_H + +#include + +/* PI/2. */ +const vector float __s_half_pi = +{ 0x1.921fb6p+0, 0x1.921fb6p+0, 0x1.921fb6p+0, 0x1.921fb6p+0 }; + +/* Inverse PI. */ +const vector float __s_inv_pi = +{ 0x1.45f306p-2, 0x1.45f306p-2, 0x1.45f306p-2, 0x1.45f306p-2 }; + +/* Right-shifter constant. */ +const vector float __s_rshifter = +{ 0x1.8p+23, 0x1.8p+23, 0x1.8p+23, 0x1.8p+23 }; + +/* One-half. */ +const vector float __s_one_half = +{ 0x1p-1, 0x1p-1, 0x1p-1, 0x1p-1 }; + +/* Threshold for out-of-range values. */ +const vector float __s_rangeval = +{ 0x1.388p+13, 0x1.388p+13, 0x1.388p+13, 0x1.388p+13 }; + +/* PI1, PI2, and PI3 when FMA is available + PI high part (when FMA available). */ +const vector float __s_pi1_fma = +{ 0x1.921fb6p+1, 0x1.921fb6p+1, 0x1.921fb6p+1, 0x1.921fb6p+1 }; + +/* PI mid part (when FMA available). */ +const vector float __s_pi2_fma = +{ -0x1.777a5cp-24, -0x1.777a5cp-24, -0x1.777a5cp-24, -0x1.777a5cp-24 }; + +/* PI low part (when FMA available). */ +const vector float __s_pi3_fma = +{ -0x1.ee59dap-49, -0x1.ee59dap-49, -0x1.ee59dap-49, -0x1.ee59dap-49 }; + +/* Polynomial constants for work w/o FMA, relative error ~ 2^(-26.625). */ +const vector float __s_a3 = +{ -0x1.55554cp-3, -0x1.55554cp-3, -0x1.55554cp-3, -0x1.55554cp-3 }; + +/* Polynomial constants, work with FMA, relative error ~ 2^(-26.417). */ +const vector float __s_a5_fma = +{ 0x1.110edp-7, 0x1.110edp-7, 0x1.110edp-7, 0x1.110edp-7 }; + +const vector float __s_a7_fma = +{ -0x1.9f6d9ep-13, -0x1.9f6d9ep-13, -0x1.9f6d9ep-13, -0x1.9f6d9ep-13 }; + +const vector float __s_a9_fma = +{ 0x1.5d866ap-19, 0x1.5d866ap-19, 0x1.5d866ap-19, 0x1.5d866ap-19 }; + +#endif /* S_TRIG_DATA_H. */ diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist index 656ce0541f..8eef5e1e72 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist @@ -1 +1,2 @@ GLIBC_2.30 _ZGVbN2v_cos F +GLIBC_2.30 _ZGVbN4v_cosf F -- 2.20.1