From patchwork Sun Apr  7 05:28:32 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: GT <tnggil@protonmail.com>
X-Patchwork-Id: 32178
X-Patchwork-Delegate: tuliom@linux.vnet.ibm.com
Received: (qmail 60734 invoked by alias); 7 Apr 2019 05:28:43 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 60174 invoked by uid 89); 7 Apr 2019 05:28:42 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-25.4 required=5.0 tests=AWL, BAYES_00,
	FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2,
	GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS,
	SPF_PASS autolearn=ham version=3.3.1 spammy=grab, 2019-04-07,
	multiplication, subtract
X-HELO: mail-40133.protonmail.ch
Date: Sun, 07 Apr 2019 05:28:32 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com;
	s=default; t=1554614915;
	bh=zJvGmdQYWpSw6vFcfOE8BO0G08TZeRlCkUqwe7h260g=;
	h=Date:To:From:Reply-To:Subject:Feedback-ID:From;
	b=eUKGq44EiE9G4Up+VDmBmCfa0pUttYAK5Pk4O+5UagFPRjmjXGm6FlQz84UUNJgF9
	pDZdeWHYnxWgbU27Ph8HLrH3F2rv8+1NUZ0ke+HIkmjwvIzX7+LOTRjBmKzfldwGKw
	DFCtSx9xygKOBn7UFZC061CYeNQvLYRop1QqOLGs=
To: "libc-alpha\\@sourceware.org" <libc-alpha@sourceware.org>
From: GT <tnggil@protonmail.com>
Reply-To: GT <tnggil@protonmail.com>
Subject: [PATCH v1 6/12] PPC64: Add libmvec SIMD single-precision sincosf
	function.
Message-ID: 
 <xR5nuzUjbkeC5DOLVc8egaitsr_UcrvMh1tvwWBuV80GHmZOfvXzLQM7BL7OO8g7CmjTHnykA2Wp0062G_pll3Fhj7JAiYkK51WXVcbWz5I=@protonmail.com>

1. This implementation combines the single-precision cosine and sine functions,
which are in patches No. 2 and No. 4 of this series, respectively.

2. Because sincosf returns both a vector of sines and a vector of cosines, the ABI
requires the caller to pass, as arguments 2 and 3, pointers to the vector floats in
which the sine and cosine results will be stored.
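
For readers unfamiliar with this calling convention, here is a minimal portable
sketch of its shape. Everything in it is an assumption made for illustration:
`v4sf` stands in for the VSX `vector float` type via the GCC/Clang vector
extension, and `model_sin`, `model_cos` and `model_sincosf4` are hypothetical
names. The evaluators are short Taylor polynomials that are only accurate for
small |x|; they are not the minimax code in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in for the VSX `vector float` type (GCC/Clang vector
   extension).  The patch itself uses the AltiVec/VSX type.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical per-lane evaluators: truncated Taylor series, accurate only
   for small |x|.  They are NOT the polynomials used by the patch.  */
static float
model_sin (float x)
{
  float x2 = x * x;
  return x + x * x2 * (-1.0f / 6 + x2 * (1.0f / 120 + x2 * (-1.0f / 5040)));
}

static float
model_cos (float x)
{
  float x2 = x * x;
  return 1.0f + x2 * (-0.5f + x2 * (1.0f / 24 + x2 * (-1.0f / 720)));
}

/* Same shape as _ZGVbN4vvv_sincosf: the input vector is passed by value and
   both result vectors are written through caller-supplied pointers.  */
void
model_sincosf4 (v4sf x, v4sf *sines_x, v4sf *cosines_x)
{
  assert (sines_x != NULL && cosines_x != NULL);

  v4sf s = { 0.0f, 0.0f, 0.0f, 0.0f };
  v4sf c = { 0.0f, 0.0f, 0.0f, 0.0f };
  for (int i = 0; i < 4; i++)
    {
      s[i] = model_sin (x[i]);
      c[i] = model_cos (x[i]);
    }
  *sines_x = s;
  *cosines_x = c;
}
```

A caller declares two result vectors and passes their addresses, for example
`v4sf s, c; model_sincosf4 (x, &s, &c);`.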

From 5a03c4d8b7bc6d69d286f7411283e3892dd4ee26 Mon Sep 17 00:00:00 2001
From: Bert Tenjy <bert.tenjy@gmail.com>
Date: Sun, 7 Apr 2019 05:13:47 +0000
Subject: [PATCH v1 6/12] PPC64: Add libmvec SIMD single-precision sincosf
 function.

[BZ #24207]

Implements the single-precision vector sincosf function. The polynomial
approximation algorithm is adapted for PPC64 from x86_64 [commit #a6336cc446].

The patch has been tested on PPC64/POWER8 Little Endian and Big Endian.
Testing uses the framework created for libmvec on x86_64, which runs the tests
when 'make check' is issued. All tests of the new vector sincosf function pass.
---
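Note for reviewers: the range reduction described in the comments of
vec_s_sincosf4_vsx.h ("Right Shifter" add and subtract, sign from the low bit,
subtraction of N*PI in parts) can be modelled in scalar C roughly as follows.
This is a sketch under stated assumptions, not the patch's code: the `MODEL_*`
constants and the `model_reduce` name are hypothetical, the values are not
taken from vec_s_trig_data.h, PI is split into two parts here rather than
three, and the input bound is an arbitrary cutoff chosen for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants, NOT the values in vec_s_trig_data.h.  */
#define MODEL_RSHIFTER 12582912.0f      /* 1.5 * 2^23: float spacing is 1.0 */
#define MODEL_INV_PI   0.318309886f     /* 1 / pi */
#define MODEL_PI_HI    3.140625f        /* 201/64: n * PI_HI is exact */
#define MODEL_PI_LO    9.6765358979e-4f /* pi - 3.140625 */

struct model_reduced
{
  float n;            /* Nearest integer to |x| / pi.  */
  float r;            /* |x| - n * pi, roughly within [-pi/2, +pi/2].  */
  uint32_t sign_bits; /* 0x80000000 when n is odd, else 0.  */
};

struct model_reduced
model_reduce (float abs_x)
{
  /* Hypothetical cutoff; the patch sends large inputs to scalar code.  */
  assert (abs_x >= 0.0f && abs_x < 10000.0f);

  /* Adding the shifter rounds |x|/pi to an integer kept in the low
     mantissa bits.  volatile forces the sum to be rounded to float.  */
  volatile float y_rounded = abs_x * MODEL_INV_PI + MODEL_RSHIFTER;
  float y = y_rounded;

  /* Reinterpret Y as an integer; its lowest bit is the parity of n.  */
  uint32_t bits;
  memcpy (&bits, &y, sizeof bits);

  struct model_reduced out;
  out.sign_bits = bits << 31;       /* Move the parity bit to the sign bit.  */
  out.n = y - MODEL_RSHIFTER;       /* Subtract the shifter to recover n.  */
  out.r = abs_x - out.n * MODEL_PI_HI;
  out.r = out.r - out.n * MODEL_PI_LO;
  return out;
}
```

For |x| = 10 this yields n = 3 and a set sign bit, consistent with
sin(10) = -sin(10 - 3*pi).
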
 ChangeLog                                     |  15 ++
 NEWS                                          |   1 +
 sysdeps/powerpc/bits/math-vector.h            |   2 +
 sysdeps/powerpc/fpu/libm-test-ulps            |   3 +
 sysdeps/powerpc/powerpc64/fpu/Versions        |   2 +-
 .../powerpc/powerpc64/fpu/multiarch/Makefile  |   5 +-
 .../fpu/multiarch/test-float-vlen4-wrappers.c |   2 +
 .../fpu/multiarch/vec_s_sincosf4_vsx.c        |  33 +++
 .../fpu/multiarch/vec_s_sincosf4_vsx.h        | 203 ++++++++++++++++++
 .../linux/powerpc/powerpc64/libmvec.abilist   |   1 +
 10 files changed, 264 insertions(+), 3 deletions(-)
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.h

diff --git a/ChangeLog b/ChangeLog
index d56d1f2a28..ecf71c6c69 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,18 @@
+2019-04-07  Bert Tenjy  <bert.tenjy@gmail.com>
+
+       [BZ #24207]
+       * NEWS: Note the addition of PPC64 vector sincosf.
+       * sysdeps/powerpc/bits/math-vector.h: Added sincosf entry.
+       * sysdeps/powerpc/fpu/libm-test-ulps: Regenerated.
+       * sysdeps/powerpc/powerpc64/fpu/Versions: Added sincosf entry.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libmvec-sysdep_routines)
+       (CFLAGS-vec_s_sincosf4_vsx.c, float-vlen4-funcs): Added build of VSX SIMD
+       sincosf function and its tests.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c: Added sincosf entry.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.c: New file.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.h: Likewise.
+       * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD sincosf added.
+
 2019-04-06  Bert Tenjy  <bert.tenjy@gmail.com>
 
        [BZ #24207]
diff --git a/NEWS b/NEWS
index d72b9a7f70..838bbb955b 100644
--- a/NEWS
+++ b/NEWS
@@ -18,6 +18,7 @@ Major new features:
   - double-precision sine: sin
   - single-precision cosine: sinf
   - double-precision sincos: sincos
+  - single-precision sincos: sincosf
 
   GCC support for auto-vectorization of functions on PPC64 is not yet
   available. Until that is done, the new vector math functions are
diff --git a/sysdeps/powerpc/bits/math-vector.h b/sysdeps/powerpc/bits/math-vector.h
index 52b4b09024..aa2c809098 100644
--- a/sysdeps/powerpc/bits/math-vector.h
+++ b/sysdeps/powerpc/bits/math-vector.h
@@ -38,6 +38,8 @@
 #  define __DECL_SIMD_cos __DECL_SIMD_PPC64
 #  undef __DECL_SIMD_sincos
 #  define __DECL_SIMD_sincos __DECL_SIMD_PPC64
+#  undef __DECL_SIMD_sincosf
+#  define __DECL_SIMD_sincosf __DECL_SIMD_PPC64
 
 # endif
 #endif
diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
index 14c82e4995..038f2912b8 100644
--- a/sysdeps/powerpc/fpu/libm-test-ulps
+++ b/sysdeps/powerpc/fpu/libm-test-ulps
@@ -2619,6 +2619,9 @@ ldouble: 7
 Function: "sincos_vlen2":
 double: 2
 
+Function: "sincos_vlen4":
+float: 1
+
 Function: "sinh":
 double: 2
 float: 2
diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions
index 8ab0f00f18..22bd7cc51b 100644
--- a/sysdeps/powerpc/powerpc64/fpu/Versions
+++ b/sysdeps/powerpc/powerpc64/fpu/Versions
@@ -1,6 +1,6 @@
 libmvec {
   GLIBC_2.30 {
     _ZGVbN2v_cos; _ZGVbN4v_cosf; _ZGVbN2v_sin; _ZGVbN4v_sinf;
-    _ZGVbN2vvv_sincos;
+    _ZGVbN2vvv_sincos; _ZGVbN4vvv_sincosf;
   }
 }
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
index 8f4f8fabea..37e8755392 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
@@ -46,12 +46,13 @@ endif
 ifeq ($(subdir),mathvec)
 libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \
 			   vec_d_sin2_vsx vec_s_sinf4_vsx \
-			   vec_d_sincos2_vsx
+			   vec_d_sincos2_vsx vec_s_sincosf4_vsx
 CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx
 CFLAGS-vec_s_cosf4_vsx.c += -mabi=altivec -maltivec -mvsx
 CFLAGS-vec_d_sin2_vsx.c += -mabi=altivec -maltivec -mvsx
 CFLAGS-vec_s_sinf4_vsx.c += -mabi=altivec -maltivec -mvsx
 CFLAGS-vec_d_sincos2_vsx.c += -mabi=altivec -maltivec -mvsx
+CFLAGS-vec_s_sincosf4_vsx.c += -mabi=altivec -maltivec -mvsx
 endif
 
 # Variables for libmvec tests.
@@ -60,7 +61,7 @@ ifeq ($(build-mathvec),yes)
 libmvec-tests += double-vlen2 float-vlen4
 
 double-vlen2-funcs = cos sin sincos
-float-vlen4-funcs = cos sin
+float-vlen4-funcs = cos sin sincos
 
 double-vlen2-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
 float-vlen4-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
index 44f94d1c70..87be9de0be 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
@@ -23,3 +23,5 @@
 
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf)
+
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVbN4vvv_sincosf)
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.c
new file mode 100644
index 0000000000..1f2d478fbc
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.c
@@ -0,0 +1,33 @@
+/* Function sincosf vectorized with VSX.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+#include "vec_s_trig_data.h"
+#include "vec_s_sincosf4_vsx.h"
+
+void
+_ZGVbN4vvv_sincosf (vector float x, vector float *sines_x,
+                    vector float *cosines_x)
+{
+  /* Call vector sine evaluator.  */
+  *sines_x = __s_sin_poly_eval (x);
+
+  /* Call vector cosine evaluator.  */
+  *cosines_x = __s_cos_poly_eval (x);
+
+} /* Function _ZGVbN4vvv_sincosf.  */
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.h
new file mode 100644
index 0000000000..3ba1ffb930
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.h
@@ -0,0 +1,203 @@
+/* Definitions to simplify code by allowing reuse of sine and cosine
+   function implementations.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+#include "vec_s_trig_data.h"
+
+static inline vector float
+__s_cos_poly_eval (vector float x)
+{
+
+  /*
+   ALGORITHM DESCRIPTION:
+
+   1) Range reduction to [-Pi/2; +Pi/2] interval
+     a) We remove sign using absolute value operation
+     b) Add Pi/2 value to argument X for Cos to Sin transformation
+     c) Getting octant Y by 1/Pi multiplication
+     d) Add "Right Shifter" value
+     e) Treat obtained value as integer for destination sign setting.
+        Shift first bit of this value to the last (sign) position
+     f) Subtract "Right Shifter"  value
+     g) Subtract 0.5 from result for octant correction
+     h) Subtract Y*PI from X argument, where PI is divided into 3 parts:
+          X = X - Y*PI1 - Y*PI2 - Y*PI3;
+   2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+     a) Calculate X^2 = X * X
+     b) Calculate polynomial:
+         R = X + X * X^2 * (A3 + x^2 * (A5 + .....
+   3) Destination sign setting
+     a) Set shifted destination sign using XOR operation:
+          R = XOR( R, S ).  */
+
+  /*
+   ARGUMENT RANGE REDUCTION:
+   Add Pi/2 to argument: X' = X+Pi/2. Transforms cos to sin.  */
+  vector float x_prime = __s_half_pi + x;
+
+  /* Y = X'*InvPi + RS : right shifter add.  */
+  vector float y = (x_prime * __s_inv_pi) + __s_rshifter;
+
+  /* N = Y - RS : right shifter sub.  */
+  vector float n = y - __s_rshifter;
+
+  /* SignRes = Y<<31 : shift LSB to MSB place for result sign.  */
+  vector float sign_res = (vector float)
+      vec_sl ((vector signed int) y, (vector unsigned int) vec_splats (31));
+
+  /* N = N - 0.5.  */
+  n = n - __s_one_half;
+
+  /* Get absolute argument value: X = |X|.  */
+  vector float abs_x = vec_abs (x);
+
+  /* Check for large arguments path.  */
+  vector bool int large_in = vec_cmpgt (abs_x, __s_rangeval);
+
+  /* R = X - N*Pi1.  */
+  vector float r = x - (n * __s_pi1_fma);
+
+  /* R = R - N*Pi2.  */
+  r = r - (n * __s_pi2_fma);
+
+  /* R = R - N*Pi3.  */
+  r = r - (n * __s_pi3_fma);
+
+  /* R2 = R*R.  */
+  vector float r2 = r * r;
+
+  /* RECONSTRUCTION:
+     Final sign setting: Res = Poly^SignRes.  */
+  vector float res = (vector float)
+      ((vector signed int) r ^ (vector signed int) sign_res);
+
+  /* Poly = R + R * R2*(A3+R2*(A5+R2*(A7+R2*A9))).  */
+  vector float poly = r2 * __s_a9_fma + __s_a7_fma;
+  poly = poly * r2 + __s_a5_fma;
+  poly = poly * r2 + __s_a3;
+  poly = poly * r2 * res + res;
+
+  if (large_in[0])
+    poly[0] = cosf (x[0]);
+
+  if (large_in[1])
+    poly[1] = cosf (x[1]);
+
+  if (large_in[2])
+    poly[2] = cosf (x[2]);
+
+  if (large_in[3])
+    poly[3] = cosf (x[3]);
+
+  return poly;
+
+} /* Function __s_cos_poly_eval.  */
+
+static inline vector float
+__s_sin_poly_eval (vector float x)
+{
+
+  /*
+    ALGORITHM DESCRIPTION:
+
+    1) Range reduction to [-Pi/2; +Pi/2] interval
+       a) Grab sign from source argument and save it.
+       b) Remove sign using AND operation
+       c) Getting octant Y by 1/Pi multiplication
+       d) Add "Right Shifter" value
+       e) Treat obtained value as integer for destination sign setting.
+          Shift first bit of this value to the last (sign) position
+       f) Change destination sign if source sign is negative
+          using XOR operation.
+       g) Subtract "Right Shifter" value
+       h) Subtract Y*PI from X argument, where PI is divided into 3 parts:
+          X = X - Y*PI1 - Y*PI2 - Y*PI3;
+    2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+       a) Calculate X^2 = X * X
+       b) Calculate polynomial:
+          R = X + X * X^2 * (A3 + x^2 * (A5 + ......
+    3) Destination sign setting
+       a) Set shifted destination sign using XOR operation:
+          R = XOR( R, S ).  */
+
+
+  /* Remove sign of input argument: X'=|X|.  */
+  vector float abs_x = vec_abs (x);
+
+  /* Getting octant Y by 1/Pi multiplication. Add "Right Shifter" value.
+     Y = X'*InvPi + RS.  */
+  vector float y = (abs_x * __s_inv_pi) + __s_rshifter;
+
+  /* N = Y - RS : right shifter sub.  */
+  vector float n = y - __s_rshifter;
+
+  /* SignRes = Y<<31 : shift LSB to MSB place for result sign.  */
+  vector float sign_res = (vector float)
+      vec_sl ((vector signed int) y, (vector unsigned int) vec_splats (31));
+
+  /* Subtract N*PI from X argument, where PI divided into 3 parts.  */
+  /* R = X - N*PI1 - N*PI2 - N*PI3.  */
+  vector float r = abs_x - (n * __s_pi1_fma);
+
+  /* R = R - N*Pi2.  */
+  r = r - (n * __s_pi2_fma);
+
+  /* R = R - N*Pi3.  */
+  r = r - (n * __s_pi3_fma);
+
+  /* Check for large arguments path.  */
+  vector bool int large_in = vec_cmpgt (abs_x, __s_rangeval);
+
+  /* Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval).  */
+
+  /* R2 = R*R.  */
+  vector float r2 = r * r;
+
+  /* Save sign of source argument (X & ~AbsMask) for the final XOR.  */
+  vector float neg_sign = vec_andc (x, __s_abs_mask);
+
+  vector float res = (vector float)
+      ((vector signed int) r ^ (vector signed int) sign_res);
+
+  /* Poly = R + R * R2*(A3+R2*(A5+R2*(A7+R2*A9))).  */
+  vector float poly = r2 * __s_a9_fma + __s_a7_fma;
+  poly = poly * r2 + __s_a5_fma;
+  poly = poly * r2 + __s_a3;
+  poly = poly * r2 * res + res;
+
+  /* Destination sign setting.
+     Set shifted destination sign using XOR operation: R = XOR( R, S ).  */
+  vector float out
+      = (vector float) ((vector int) poly ^ (vector int) neg_sign);
+
+  if (large_in[0])
+    out[0] = sinf (x[0]);
+
+  if (large_in[1])
+    out[1] = sinf (x[1]);
+
+  if (large_in[2])
+    out[2] = sinf (x[2]);
+
+  if (large_in[3])
+    out[3] = sinf (x[3]);
+
+  return out;
+
+} /* Function __s_sin_poly_eval.  */
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
index 0337d3adb5..9d298a788c 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
@@ -3,3 +3,4 @@ GLIBC_2.30 _ZGVbN2v_sin F
 GLIBC_2.30 _ZGVbN2vvv_sincos F
 GLIBC_2.30 _ZGVbN4v_cosf F
 GLIBC_2.30 _ZGVbN4v_sinf F
+GLIBC_2.30 _ZGVbN4vvv_sincosf F
-- 
2.20.1