[RESEND,2/12] PPC64: Addition of SIMD cosf function for POWER8.

Message ID 8vuSE94joXSRqM9BzIUsziA04k-Pwyusnd73rsuHoJs3c0AJ5o9J7su5IxvjLy6nnxz6HupqLQ04ZKOGlkNnnWN2nH8lvbJpysGgtx-3jKw=@protonmail.com
State Superseded
Headers

Commit Message

GT March 7, 2019, 5:43 a.m. UTC
  Different in this patch:

Makes changes requested in feedback from previous submission:

1. Items in NEWS are now below heading for version 2.30.
2. Description of NEWS entry no longer characterizes changes as
relative to previous patch.
3. Removed changes to code for the double-precision cosine. They
have been moved to the resent patch for that function.
4. The patch subject now identifies it as part of a sequence.
  

Patch

From 056f0dd58b61a9068608c00472582b9335648095 Mon Sep 17 00:00:00 2001
From: Bert Tenjy <bert.tenjy@gmail.com>
Date: Thu, 7 Mar 2019 05:17:47 +0000
Subject: [PATCH 2/12] PPC64: Addition of SIMD cosf function for POWER8.

[BZ #24205]

Implements single-precision cosine using VSX vector capability. The polynomial
cosine-approximating algorithm is adapted for PPC64 from x86_64 [commit #04f496d602].

The patch has been tested on PPC64/POWER8 Little Endian and Big Endian. It is
tested using the framework created for libmvec on x86_64 which runs tests on
issuing 'make check'. Tests of the new vector cosine function all pass.

Details on the ABI are found at this link:
<https://sourceware.org/glibc/wiki/
libmvec?action=AttachFile&do=view&target=VectorABI.txt>

But for adjusting the width of operands, details described for the
double-precision cosine implemented earlier apply here. See git
commit #7956c29f07 for that information.
---
 ChangeLog                                     |  16 +++
 NEWS                                          |  10 +-
 sysdeps/powerpc/fpu/libm-test-ulps            |   3 +
 sysdeps/powerpc/powerpc64/fpu/Versions        |   2 +-
 .../powerpc/powerpc64/fpu/multiarch/Makefile  |   7 +-
 .../fpu/multiarch/test-float-vlen4-wrappers.c |  24 ++++
 .../powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c | 109 ++++++++++++++++++
 .../powerpc64/fpu/multiarch/vec_s_trig_data.h |  72 ++++++++++++
 .../linux/powerpc/powerpc64/libmvec.abilist   |   1 +
 9 files changed, 239 insertions(+), 5 deletions(-)
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h

diff --git a/ChangeLog b/ChangeLog
index 51956e6b78..0dba2c77b4 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@ 
+2019-03-07    <bert.tenjy@gmail.com>
+
+       [BZ #24205]
+
+       * NEWS: Note the addition of PPC64 vector cosf.
+       * sysdeps/powerpc/fpu/libm-test-ulps: Regenerated.
+       * sysdeps/powerpc/powerpc64/fpu/Versions: Added new cosf entry.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: (libmvec-sysdep_routines)
+       (CFLAGS-vec_s_cosf4_vsx.c, libmvec-tests, float-vlen2-funcs)
+       (float-vlen2-arch-ext-cflags): Added build of VSX SIMD cosf function
+       and its tests.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c: New file.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c: New file.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h: Likewise.
+       * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: New SIMD cosf added.
+
 2019-03-06    <bert.tenjy@gmail.com>
 
 	[BZ #24205]
diff --git a/NEWS b/NEWS
index 22735e1d1d..183b306f75 100644
--- a/NEWS
+++ b/NEWS
@@ -10,13 +10,19 @@  Version 2.30
 
 Major new features:
 
-* Start of implementing vector math library libmvec on PPC64/POWER8.
-  The double-precision cosine now has a vector version.
+* Implementation of vector math library libmvec on PPC64/POWER8.
+
+  The following functions now have vector versions.
+  - double-precision cosine: cos
+  - single-precision cosine: cosf
+
   GCC support for auto-vectorization of functions on PPC64 is not yet
   available. Until that is done, the new vector math functions are
   inaccessible to applications.
+
   Library libmvec is built by default for PPC64. Disable its creation by
   passing flag --disable-mathvec to configure.
+
   The library ABI specification is x86_64 Vector Function ABI.
   More information on libmvec including a link to the ABI document is at:
   <https://sourceware.org/glibc/wiki/libmvec>
diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
index d392b135a7..3bd9e67096 100644
--- a/sysdeps/powerpc/fpu/libm-test-ulps
+++ b/sysdeps/powerpc/fpu/libm-test-ulps
@@ -1314,6 +1314,9 @@  ldouble: 5
 Function: "cos_vlen2":
 double: 2
 
+Function: "cos_vlen4":
+float: 1
+
 Function: "cosh":
 double: 1
 float: 1
diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions
index 9a3e1211cc..bdd4b657c4 100644
--- a/sysdeps/powerpc/powerpc64/fpu/Versions
+++ b/sysdeps/powerpc/powerpc64/fpu/Versions
@@ -1,5 +1,5 @@ 
 libmvec {
   GLIBC_2.30 {
-    _ZGVbN2v_cos;
+    _ZGVbN2v_cos; _ZGVbN4v_cosf;
   }
 }
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
index 44c1c04c13..5f70c7659f 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
@@ -44,18 +44,21 @@  CFLAGS-s_modff-ppc64.c += -fsignaling-nans
 endif
 
 ifeq ($(subdir),mathvec)
-libmvec-sysdep_routines += vec_d_cos2_vsx
+libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx
 CFLAGS-vec_d_cos2_vsx.c += -mvsx
+CFLAGS-vec_s_cosf4_vsx.c += -mvsx
 endif
 
 # Variables for libmvec tests.
 ifeq ($(subdir),math)
 ifeq ($(build-mathvec),yes)
-libmvec-tests += double-vlen2
+libmvec-tests += double-vlen2 float-vlen4
 
 double-vlen2-funcs = cos
+float-vlen4-funcs = cos
 
 double-vlen2-arch-ext-cflags = -mvsx -DREQUIRE_VSX
+float-vlen4-arch-ext-cflags = -mvsx -DREQUIRE_VSX
 
 endif
 endif
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
new file mode 100644
index 0000000000..f099990d4e
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
@@ -0,0 +1,24 @@ 
+/* Wrapper part of tests for VSX ISA versions of vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-float-vlen4.h"
+#include <altivec.h>
+
+#define VEC_TYPE vector float
+
+VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c
new file mode 100644
index 0000000000..fb9d27b865
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c
@@ -0,0 +1,109 @@ 
+/* Function cosf vectorized with VSX.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+#include "vec_s_trig_data.h"
+
+vector float
+_ZGVbN4v_cosf (vector float x)
+{
+
+  /*
+   ALGORITHM DESCRIPTION:
+
+   1) Range reduction to [-Pi/2; +Pi/2] interval
+     a) We remove sign using absolute value operation
+     b) Add Pi/2 value to argument X for Cos to Sin transformation
+     c) Getting octant Y by 1/Pi multiplication
+     d) Add "Right Shifter" value
+     e) Treat obtained value as integer for destination sign setting.
+        Shift first bit of this value to the last (sign) position
+     f) Subtract "Right Shifter"  value
+     g) Subtract 0.5 from result for octant correction
+     h) Subtract Y*PI from X argument, where PI divided to 4 parts:
+          X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+   2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+     a) Calculate X^2 = X * X
+     b) Calculate polynomial:
+         R = X + X * X^2 * (A3 + x^2 * (A5 + .....
+   3) Destination sign setting
+     a) Set shifted destination sign using XOR operation:
+          R = XOR( R, S ).  */
+
+  /*
+   ARGUMENT RANGE REDUCTION:
+   Add Pi/2 to argument: X' = X+Pi/2. Transforms cos to sin.  */
+  vector float x_prime = __s_half_pi + x;
+
+  /* Y = X'*InvPi + RS : right shifter add.  */
+  vector float y = (x_prime * __s_inv_pi) + __s_rshifter;
+
+  /* N = Y - RS : right shifter sub.  */
+  vector float n = y - __s_rshifter;
+
+  /* SignRes = Y<<31 : shift LSB to MSB place for result sign.  */
+  vector float sign_res = (vector float)
+      vec_sl ((vector signed int) y, (vector unsigned int) vec_splats (31));
+
+  /* N = N - 0.5.  */
+  n = n - __s_one_half;
+
+  /* Get absolute argument value: X = |X|.  */
+  vector float abs_x = vec_abs (x);
+
+  /* Check for large arguments path.  */
+  vector bool int large_in = vec_cmpgt (abs_x, __s_rangeval);
+
+  /* R = X - N*Pi1. */
+  vector float r = x - (n * __s_pi1_fma);
+
+  /* R = R - N*Pi2.  */
+  r = r - (n * __s_pi2_fma);
+
+  /* R = R - N*Pi3.  */
+  r = r - (n * __s_pi3_fma);
+
+  /* R2 = R*R.  */
+  vector float r2 = r * r;
+
+  /* RECONSTRUCTION:
+     Final sign setting: Res = Poly^SignRes.  */
+  vector float res = (vector float)
+      ((vector signed int) r ^ (vector signed int) sign_res);
+
+  /* Poly = R + R * R2*(A3+R2*(A5+R2*(A7+R2*A9))). */
+  vector float poly = r2 * __s_a9_fma + __s_a7_fma;
+  poly = poly * r2 + __s_a5_fma;
+  poly = poly * r2 + __s_a3;
+  poly = poly * r2 * res + res;
+
+  if (large_in[0])
+    poly[0] = cosf (x[0]);
+
+  if (large_in[1])
+    poly[1] = cosf (x[1]);
+
+  if (large_in[2])
+    poly[2] = cosf (x[2]);
+
+  if (large_in[3])
+    poly[3] = cosf (x[3]);
+
+  return poly;
+
+} /* Function _ZGVbN4v_cosf.  */
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h
new file mode 100644
index 0000000000..55c28563e7
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h
@@ -0,0 +1,72 @@ 
+/* Constants used in polynomial approximations for vectorized sinf, cosf,
+   and sincosf functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef S_TRIG_DATA_H
+#define S_TRIG_DATA_H
+
+#include <altivec.h>
+
+/* PI/2.  */
+const vector float __s_half_pi =
+{ 0x1.921fb6p+0, 0x1.921fb6p+0, 0x1.921fb6p+0, 0x1.921fb6p+0 };
+
+/* Inverse PI.  */
+const vector float __s_inv_pi =
+{ 0x1.45f306p-2, 0x1.45f306p-2, 0x1.45f306p-2, 0x1.45f306p-2 };
+
+/* Right-shifter constant.  */
+const vector float __s_rshifter =
+{ 0x1.8p+23, 0x1.8p+23, 0x1.8p+23, 0x1.8p+23 };
+
+/* One-half.  */
+const vector float __s_one_half =
+{ 0x1p-1, 0x1p-1, 0x1p-1, 0x1p-1 };
+
+/* Threshold for out-of-range values.  */
+const vector float __s_rangeval =
+{ 0x1.388p+13, 0x1.388p+13, 0x1.388p+13, 0x1.388p+13 };
+
+/* PI1, PI2, and PI3 when FMA is available
+   PI high part (when FMA available).  */
+const vector float __s_pi1_fma =
+{ 0x1.921fb6p+1, 0x1.921fb6p+1, 0x1.921fb6p+1, 0x1.921fb6p+1 };
+
+/* PI mid part  (when FMA available).  */
+const vector float __s_pi2_fma =
+{ -0x1.777a5cp-24, -0x1.777a5cp-24, -0x1.777a5cp-24, -0x1.777a5cp-24 };
+
+/* PI low part  (when FMA available).  */
+const vector float __s_pi3_fma =
+{ -0x1.ee59dap-49, -0x1.ee59dap-49, -0x1.ee59dap-49, -0x1.ee59dap-49 };
+
+/* Polynomial constants for work w/o FMA, relative error ~ 2^(-26.625).  */
+const vector float __s_a3 =
+{ -0x1.55554cp-3, -0x1.55554cp-3, -0x1.55554cp-3, -0x1.55554cp-3 };
+
+/* Polynomial constants, work with FMA, relative error ~ 2^(-26.417).  */
+const vector float __s_a5_fma =
+{ 0x1.110edp-7, 0x1.110edp-7, 0x1.110edp-7, 0x1.110edp-7 };
+
+const vector float __s_a7_fma =
+{ -0x1.9f6d9ep-13, -0x1.9f6d9ep-13, -0x1.9f6d9ep-13, -0x1.9f6d9ep-13 };
+
+const vector float __s_a9_fma =
+{ 0x1.5d866ap-19, 0x1.5d866ap-19, 0x1.5d866ap-19, 0x1.5d866ap-19 };
+
+#endif /* S_TRIG_DATA_H.  */
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
index 656ce0541f..8eef5e1e72 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
@@ -1 +1,2 @@ 
 GLIBC_2.30 _ZGVbN2v_cos F
+GLIBC_2.30 _ZGVbN4v_cosf F
-- 
2.20.1