Patchwork [v3,2/12] PPC64: Add libmvec SIMD single-precision cosine function.

login
register
mail settings
Submitter GT
Date April 3, 2019, 10:32 p.m.
Message ID <02mFr7kPjJOxt9Pme-rDgCm2swzXXarXPFAIYgXdgC5TvFnr6lh8FtG54gsBNGUbOXIpiaU2pSJAKP16zyBcpZlKbvB4ZIbywD6wqVJz_aI=@protonmail.com>
Download mbox | patch
Permalink /patch/32156/
State New
Headers show

Comments

GT - April 3, 2019, 10:32 p.m.
Differences from patch version v2:

1. Patch subject line prefix now has version id. The previous versions did not include
their version numbers so here are links to them:

v1: https://sourceware.org/ml/libc-alpha/2019-03/msg00053.html
v2: https://sourceware.org/ml/libc-alpha/2019-03/msg00128.html

2. Makefile adds -mabi=altivec and -maltivec flags for all libmvec targets to enable
successful builds in cases where GCC was not specifically configured for POWER8/9.
Tulio Magno Quites Machado Filho - April 15, 2019, 8:03 p.m.
GT <tnggil@protonmail.com> writes:

> [BZ #24205]
>
> Implements single-precision cosine using VSX vector capability. The polynomial
> cosine-approximating algorithm is adapted for PPC64 from x86_64 [commit #04f496d602].
>
> The patch has been tested on PPC64/POWER8 Little Endian and Big Endian. It is
> tested using the framework created for libmvec on x86_64 which runs tests on
> issuing 'make check'. Tests of the new vector cosine function all pass.
>
> Details on the ABI are found at this link:
> <https://sourceware.org/glibc/wiki/
> libmvec?action=AttachFile&do=view&target=VectorABI.txt>
>
> But for adjusting the width of operands, details described for the
> double-precision cosine implemented earlier apply here. See git
> commit #7956c29f07 for that information.
> ---
>  ChangeLog                                     |  15 +++
>  NEWS                                          |  10 +-
>  sysdeps/powerpc/fpu/libm-test-ulps            |   3 +
>  sysdeps/powerpc/powerpc64/fpu/Versions        |   2 +-
>  .../powerpc/powerpc64/fpu/multiarch/Makefile  |  10 +-
>  .../fpu/multiarch/test-float-vlen4-wrappers.c |  24 ++++
>  .../powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c | 109 ++++++++++++++++++
>  .../powerpc64/fpu/multiarch/vec_s_trig_data.h |  72 ++++++++++++
>  .../linux/powerpc/powerpc64/libmvec.abilist   |   1 +
>  9 files changed, 241 insertions(+), 5 deletions(-)
>  create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
>  create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c
>  create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h
>
> diff --git a/ChangeLog b/ChangeLog
> index efe4253f6d..1d6636b8fb 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,18 @@
> +2019-04-03  Bert Tenjy  <bert.tenjy@gmail.com>
> +
> +       [BZ #24205]
> +       * NEWS: Note the addition of PPC64 vector cosf.
> +       * sysdeps/powerpc/fpu/libm-test-ulps: Regenerated.
> +       * sysdeps/powerpc/powerpc64/fpu/Versions: Added new cosf entry.
> +       * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: (libmvec-sysdep_routines)
> +       (CFLAGS-vec_s_cosf4_vsx.c, libmvec-tests, float-vlen4-funcs)
> +       (float-vlen4-arch-ext-cflags): Added build of VSX SIMD cosf function
> +       and its tests.
> +       * sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c: New file.
> +       * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c: New file.
> +       * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h: Likewise.
> +       * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: New SIMD cosf added.

Changelog entries should be indented with tabs.
The ChangeLog should not be part of the diff either.

More information here:
https://sourceware.org/glibc/wiki/Contribution%20checklist#Properly_Formatted_GNU_ChangeLog

Fixed these and merged into branch tuliom/libmvec.

Thanks!

Patch

From bf203338480daf72207bb101dffcdf6f5a95e10e Mon Sep 17 00:00:00 2001
From: Bert Tenjy <bert.tenjy@gmail.com>
Date: Thu, 7 Mar 2019 05:17:47 +0000
Subject: [PATCH v3 2/12] PPC64: Add libmvec SIMD single-precision cosine
 function.

[BZ #24205]

Implements single-precision cosine using VSX vector capability. The polynomial
cosine-approximating algorithm is adapted for PPC64 from x86_64 [commit #04f496d602].

The patch has been tested on PPC64/POWER8 Little Endian and Big Endian. It is
tested using the framework created for libmvec on x86_64 which runs tests on
issuing 'make check'. Tests of the new vector cosine function all pass.

Details on the ABI are found at this link:
<https://sourceware.org/glibc/wiki/
libmvec?action=AttachFile&do=view&target=VectorABI.txt>

But for adjusting the width of operands, details described for the
double-precision cosine implemented earlier apply here. See git
commit #7956c29f07 for that information.
---
 ChangeLog                                     |  15 +++
 NEWS                                          |  10 +-
 sysdeps/powerpc/fpu/libm-test-ulps            |   3 +
 sysdeps/powerpc/powerpc64/fpu/Versions        |   2 +-
 .../powerpc/powerpc64/fpu/multiarch/Makefile  |  10 +-
 .../fpu/multiarch/test-float-vlen4-wrappers.c |  24 ++++
 .../powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c | 109 ++++++++++++++++++
 .../powerpc64/fpu/multiarch/vec_s_trig_data.h |  72 ++++++++++++
 .../linux/powerpc/powerpc64/libmvec.abilist   |   1 +
 9 files changed, 241 insertions(+), 5 deletions(-)
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h

diff --git a/ChangeLog b/ChangeLog
index efe4253f6d..1d6636b8fb 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,18 @@ 
+2019-04-03  Bert Tenjy  <bert.tenjy@gmail.com>
+
+       [BZ #24205]
+       * NEWS: Note the addition of PPC64 vector cosf.
+       * sysdeps/powerpc/fpu/libm-test-ulps: Regenerated.
+       * sysdeps/powerpc/powerpc64/fpu/Versions: Added new cosf entry.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: (libmvec-sysdep_routines)
+       (CFLAGS-vec_s_cosf4_vsx.c, libmvec-tests, float-vlen4-funcs)
+       (float-vlen4-arch-ext-cflags): Added build of VSX SIMD cosf function
+       and its tests.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c: New file.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c: New file.
+       * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h: Likewise.
+       * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: New SIMD cosf added.
+
 2019-04-03  Bert Tenjy  <bert.tenjy@gmail.com>
 
 	[BZ #24205]
diff --git a/NEWS b/NEWS
index 22735e1d1d..183b306f75 100644
--- a/NEWS
+++ b/NEWS
@@ -10,13 +10,19 @@  Version 2.30
 
 Major new features:
 
-* Start of implementing vector math library libmvec on PPC64/POWER8.
-  The double-precision cosine now has a vector version.
+* Implementation of vector math library libmvec on PPC64/POWER8.
+
+  The following functions now have vector versions.
+  - double-precision cosine: cos
+  - single-precision cosine: cosf
+
   GCC support for auto-vectorization of functions on PPC64 is not yet
   available. Until that is done, the new vector math functions are
   inaccessible to applications.
+
   Library libmvec is built by default for PPC64. Disable its creation by
   passing flag --disable-mathvec to configure.
+
   The library ABI specification is x86_64 Vector Function ABI.
   More information on libmvec including a link to the ABI document is at:
   <https://sourceware.org/glibc/wiki/libmvec>
diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
index d392b135a7..3bd9e67096 100644
--- a/sysdeps/powerpc/fpu/libm-test-ulps
+++ b/sysdeps/powerpc/fpu/libm-test-ulps
@@ -1314,6 +1314,9 @@  ldouble: 5
 Function: "cos_vlen2":
 double: 2
 
+Function: "cos_vlen4":
+float: 1
+
 Function: "cosh":
 double: 1
 float: 1
diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions
index 9a3e1211cc..bdd4b657c4 100644
--- a/sysdeps/powerpc/powerpc64/fpu/Versions
+++ b/sysdeps/powerpc/powerpc64/fpu/Versions
@@ -1,5 +1,5 @@ 
 libmvec {
   GLIBC_2.30 {
-    _ZGVbN2v_cos;
+    _ZGVbN2v_cos; _ZGVbN4v_cosf;
   }
 }
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
index d713b61431..ad9d8f0127 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
@@ -44,18 +44,24 @@  CFLAGS-s_modff-ppc64.c += -fsignaling-nans
 endif
 
 ifeq ($(subdir),mathvec)
-libmvec-sysdep_routines += vec_d_cos2_vsx
+libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx
 CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx
+CFLAGS-vec_s_cosf4_vsx.c += -mabi=altivec -maltivec -mvsx
 endif
 
 # Variables for libmvec tests.
 ifeq ($(subdir),math)
 ifeq ($(build-mathvec),yes)
-libmvec-tests += double-vlen2
+libmvec-tests += double-vlen2 float-vlen4
 
 double-vlen2-funcs = cos
+float-vlen4-funcs = cos
 
 double-vlen2-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
+float-vlen4-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
+
 CFLAGS-test-double-vlen2-wrappers.c += -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
+CFLAGS-test-float-vlen4-wrappers.c += -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
+
 endif
 endif
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
new file mode 100644
index 0000000000..f099990d4e
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-float-vlen4-wrappers.c
@@ -0,0 +1,24 @@ 
+/* Wrapper part of tests for VSX ISA versions of vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-float-vlen4.h"
+#include <altivec.h>
+
+#define VEC_TYPE vector float
+
+VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c
new file mode 100644
index 0000000000..fb9d27b865
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_cosf4_vsx.c
@@ -0,0 +1,109 @@ 
+/* Function cosf vectorized with VSX.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+#include "vec_s_trig_data.h"
+
+vector float
+_ZGVbN4v_cosf (vector float x)
+{
+
+  /*
+   ALGORITHM DESCRIPTION:
+
+   1) Range reduction to [-Pi/2; +Pi/2] interval
+     a) We remove sign using absolute value operation
+     b) Add Pi/2 value to argument X for Cos to Sin transformation
+     c) Getting octant Y by 1/Pi multiplication
+     d) Add "Right Shifter" value
+     e) Treat obtained value as integer for destination sign setting.
+        Shift first bit of this value to the last (sign) position
+     f) Subtract "Right Shifter"  value
+     g) Subtract 0.5 from result for octant correction
+     h) Subtract Y*PI from X argument, where PI divided to 4 parts:
+          X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+   2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+     a) Calculate X^2 = X * X
+     b) Calculate polynomial:
+         R = X + X * X^2 * (A3 + x^2 * (A5 + .....
+   3) Destination sign setting
+     a) Set shifted destination sign using XOR operation:
+          R = XOR( R, S ).  */
+
+  /*
+   ARGUMENT RANGE REDUCTION:
+   Add Pi/2 to argument: X' = X+Pi/2. Transforms cos to sin.  */
+  vector float x_prime = __s_half_pi + x;
+
+  /* Y = X'*InvPi + RS : right shifter add.  */
+  vector float y = (x_prime * __s_inv_pi) + __s_rshifter;
+
+  /* N = Y - RS : right shifter sub.  */
+  vector float n = y - __s_rshifter;
+
+  /* SignRes = Y<<31 : shift LSB to MSB place for result sign.  */
+  vector float sign_res = (vector float)
+      vec_sl ((vector signed int) y, (vector unsigned int) vec_splats (31));
+
+  /* N = N - 0.5.  */
+  n = n - __s_one_half;
+
+  /* Get absolute argument value: X = |X|.  */
+  vector float abs_x = vec_abs (x);
+
+  /* Check for large arguments path.  */
+  vector bool int large_in = vec_cmpgt (abs_x, __s_rangeval);
+
+  /* R = X - N*Pi1. */
+  vector float r = x - (n * __s_pi1_fma);
+
+  /* R = R - N*Pi2.  */
+  r = r - (n * __s_pi2_fma);
+
+  /* R = R - N*Pi3.  */
+  r = r - (n * __s_pi3_fma);
+
+  /* R2 = R*R.  */
+  vector float r2 = r * r;
+
+  /* RECONSTRUCTION:
+     Final sign setting: Res = Poly^SignRes.  */
+  vector float res = (vector float)
+      ((vector signed int) r ^ (vector signed int) sign_res);
+
+  /* Poly = R + R * R2*(A3+R2*(A5+R2*(A7+R2*A9))). */
+  vector float poly = r2 * __s_a9_fma + __s_a7_fma;
+  poly = poly * r2 + __s_a5_fma;
+  poly = poly * r2 + __s_a3;
+  poly = poly * r2 * res + res;
+
+  if (large_in[0])
+    poly[0] = cosf (x[0]);
+
+  if (large_in[1])
+    poly[1] = cosf (x[1]);
+
+  if (large_in[2])
+    poly[2] = cosf (x[2]);
+
+  if (large_in[3])
+    poly[3] = cosf (x[3]);
+
+  return poly;
+
+} /* Function _ZGVbN4v_cosf.  */
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h
new file mode 100644
index 0000000000..55c28563e7
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_trig_data.h
@@ -0,0 +1,72 @@ 
+/* Constants used in polynomial approximations for vectorized sinf, cosf,
+   and sincosf functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef S_TRIG_DATA_H
+#define S_TRIG_DATA_H
+
+#include <altivec.h>
+
+/* PI/2.  */
+const vector float __s_half_pi =
+{ 0x1.921fb6p+0, 0x1.921fb6p+0, 0x1.921fb6p+0, 0x1.921fb6p+0 };
+
+/* Inverse PI.  */
+const vector float __s_inv_pi =
+{ 0x1.45f306p-2, 0x1.45f306p-2, 0x1.45f306p-2, 0x1.45f306p-2 };
+
+/* Right-shifter constant.  */
+const vector float __s_rshifter =
+{ 0x1.8p+23, 0x1.8p+23, 0x1.8p+23, 0x1.8p+23 };
+
+/* One-half.  */
+const vector float __s_one_half =
+{ 0x1p-1, 0x1p-1, 0x1p-1, 0x1p-1 };
+
+/* Threshold for out-of-range values.  */
+const vector float __s_rangeval =
+{ 0x1.388p+13, 0x1.388p+13, 0x1.388p+13, 0x1.388p+13 };
+
+/* PI1, PI2, and PI3 when FMA is available
+   PI high part (when FMA available).  */
+const vector float __s_pi1_fma =
+{ 0x1.921fb6p+1, 0x1.921fb6p+1, 0x1.921fb6p+1, 0x1.921fb6p+1 };
+
+/* PI mid part  (when FMA available).  */
+const vector float __s_pi2_fma =
+{ -0x1.777a5cp-24, -0x1.777a5cp-24, -0x1.777a5cp-24, -0x1.777a5cp-24 };
+
+/* PI low part  (when FMA available).  */
+const vector float __s_pi3_fma =
+{ -0x1.ee59dap-49, -0x1.ee59dap-49, -0x1.ee59dap-49, -0x1.ee59dap-49 };
+
+/* Polynomial constants for work w/o FMA, relative error ~ 2^(-26.625).  */
+const vector float __s_a3 =
+{ -0x1.55554cp-3, -0x1.55554cp-3, -0x1.55554cp-3, -0x1.55554cp-3 };
+
+/* Polynomial constants, work with FMA, relative error ~ 2^(-26.417).  */
+const vector float __s_a5_fma =
+{ 0x1.110edp-7, 0x1.110edp-7, 0x1.110edp-7, 0x1.110edp-7 };
+
+const vector float __s_a7_fma =
+{ -0x1.9f6d9ep-13, -0x1.9f6d9ep-13, -0x1.9f6d9ep-13, -0x1.9f6d9ep-13 };
+
+const vector float __s_a9_fma =
+{ 0x1.5d866ap-19, 0x1.5d866ap-19, 0x1.5d866ap-19, 0x1.5d866ap-19 };
+
+#endif /* S_TRIG_DATA_H.  */
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
index 656ce0541f..8eef5e1e72 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
@@ -1 +1,2 @@ 
 GLIBC_2.30 _ZGVbN2v_cos F
+GLIBC_2.30 _ZGVbN4v_cosf F
-- 
2.20.1