Patchwork PPC64: First in the series of patches implementing POWER8 vector math.

Submitter GT
Date Feb. 27, 2019, 3:24 a.m.
Message ID <Uow5EQOWMJEiljnzCmwsPEKHKWLzQef0kTCKqXAxghjRKcc3SUN3QhV94lQxRB8Fh7j1CsJ4wqw3XBoaPsBWbKkTSeWJU4gzG0OagHItX8s=@protonmail.com>
Permalink /patch/31607/
State New


Patch

From e7d282c21dd987a30d5b6eb674f32a501594273f Mon Sep 17 00:00:00 2001
From: Bert Tenjy <bert.tenjy@gmail.com>
Date: Wed, 27 Feb 2019 02:00:29 +0000
Subject: [PATCH] PPC64: First in the series of patches implementing POWER8
 vector math.

[BZ #24205]

Implements double-precision cosine using VSX vector capability. Algorithm for
cosine is from x86_64 [commit #2193311288] adapted to PPC64.

Name-mangling exactly duplicates SSE ISA of the x86_64 ABI. The details are at
<https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt>

The patch has been tested on PPC64/POWER8, both Little Endian and Big Endian,
using the test framework created for libmvec on x86_64, which runs under
'make check'. All tests of the new vector cosine function pass.

Glibc built with this patch was installed using the procedure outlined at
<https://sourceware.org/glibc/wiki/Testing/Builds>. A test executable compiled
against the new library computes cosines using the vector version of the
function. Its results are at most 2 ulps away from the scalar cosine. That is
expected and indicated in the comments describing the algorithm, as obtained
from x86_64 commit #2193311288.
---
 ChangeLog                                     | 17 ++++
 NEWS                                          | 13 +++
 sysdeps/powerpc/bits/math-vector.h            | 41 +++++++++
 sysdeps/powerpc/fpu/libm-test-ulps            |  3 +
 sysdeps/powerpc/powerpc64/fpu/Makefile        |  7 ++
 sysdeps/powerpc/powerpc64/fpu/Versions        |  5 ++
 .../powerpc/powerpc64/fpu/multiarch/Makefile  | 17 ++++
 .../multiarch/test-double-vlen2-wrappers.c    | 24 +++++
 .../powerpc64/fpu/multiarch/vec_d_cos2_vsx.c  | 88 +++++++++++++++++++
 .../powerpc64/fpu/multiarch/vec_d_trig_data.h | 60 +++++++++++++
 .../powerpc/powerpc64/fpu/vec_finite_alias.c  | 41 +++++++++
 .../linux/powerpc/powerpc64/libmvec.abilist   |  1 +
 12 files changed, 317 insertions(+)
 create mode 100644 sysdeps/powerpc/bits/math-vector.h
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/Makefile
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/Versions
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_cos2_vsx.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/vec_finite_alias.c
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist

Notable differences from the previous patch and further commentary:

1. Renamed the main C source file from vec_d_cos2_power8.c to
vec_d_cos2_vsx.c. VSX functionality is also available on POWER7 and
POWER9, hence the change.

2. Removed vec_d_cos2_core.c and vec_d_cos2_vmx.c. The former did
ifunc selection between the latter and the main C implementation.
File vec_d_cos2_vmx.c was not a true Altivec implementation. It was
only a wrapper around the scalar cosine function.

3. A new file, vec_finite_alias.c, is a workaround until the vector
log function is implemented. It is needed so that libmvec_nonshared.a
is built. Without that archive, compiling against the newly built
glibc fails.

4. __PPC64__ is the macro tested in math-vector.h. Table 5.1 of
the POWER ELFv2 ABI defines it and __powerpc64__ as synonyms.
The other macros in that file are all-uppercase and the choice
made preserves consistency.

5. GCC does not yet vectorize calls to math functions on PPC64. The
OpenMP pragmas are ignored and only scalar cosine calls are generated,
exactly as when libmvec doesn't exist.

6. The executables created to test against the new glibc installation
required a workaround, on x86_64 as well as on PPC64. The test is a
modification of Example #1 at
<https://sourceware.org/glibc/wiki/libmvec>. The only change initially
is replacing the call to cos () with one to the vector version
_ZGVbN2v_cos (). Compilation then fails because the function has no
prototype. The solution for both PPC64 and x86_64 was to supply an
'extern <return type> _ZGVbN2v_cos (<in arg. type>)' forward
declaration. With that, compilation produced an executable that used
the new vector cosine.

7. This patch is half of the requirement for BZ #24205. The other half
is implementing vector single-precision cosine. There are two
outstanding issues which I ask to be deferred to the patch for cosf:
gracefully terminating configure if the GCC used does not provide the
VSX builtins required to build libmvec, and avoiding, at runtime, tests
of the vector functions on machines without VSX hardware.
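
For readers unfamiliar with the mangled names used above, the following is
a minimal sketch of how a name such as _ZGVbN2v_cos breaks down under the
x86_64 Vector Function ABI that this patch reuses: the "_ZGV" prefix, an ISA
letter ('b' is the 128-bit SSE class), a mask letter ('N' unmasked, 'M'
masked), the vector length, one letter per parameter ('v' for a vector
parameter), then '_' and the scalar name. The struct and function names are
hypothetical and not part of glibc, and the parser handles only this simple
form of the grammar.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration type; nothing like this exists in glibc.  */
struct vabi_name
{
  char isa;            /* ISA class letter, e.g. 'b'.  */
  char mask;           /* 'N' = unmasked, 'M' = masked.  */
  int vlen;            /* Number of lanes, e.g. 2.  */
  char params[16];     /* Parameter letters, e.g. "v".  */
  const char *scalar;  /* Scalar name; points into the input string.  */
};

/* Split a mangled vector-function name into its fields.
   Returns 0 on success and -1 if the name does not have the simple
   "_ZGV<isa><mask><vlen><params>_<scalar>" shape.  */
int
vabi_parse (const char *mangled, struct vabi_name *out)
{
  assert (mangled != NULL && out != NULL);

  if (strncmp (mangled, "_ZGV", 4) != 0)
    return -1;
  const char *p = mangled + 4;

  if (*p == '\0')
    return -1;
  out->isa = *p++;

  if (*p != 'N' && *p != 'M')
    return -1;
  out->mask = *p++;

  /* Decimal vector length.  */
  char *end;
  long vlen = strtol (p, &end, 10);
  if (end == p || vlen <= 0)
    return -1;
  out->vlen = (int) vlen;
  p = end;

  /* Parameter letters run up to the first underscore.  */
  size_t i = 0;
  while (*p != '\0' && *p != '_' && i + 1 < sizeof out->params)
    out->params[i++] = *p++;
  out->params[i] = '\0';
  if (*p != '_' || i == 0)
    return -1;

  /* Everything after that underscore is the scalar function name.  */
  out->scalar = p + 1;
  return 0;
}
```

Applied to _ZGVbN2v_cos this yields isa 'b', mask 'N', vlen 2, params "v"
and scalar name "cos". Applied to _ZGVbN2v___log_finite the scalar name is
"__log_finite".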

diff --git a/ChangeLog b/ChangeLog
index 8096175cc9..654774d690 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,20 @@ 
+2019-02-27  Bert Tenjy  <bert.tenjy@gmail.com>
+
+	[BZ #24205]
+	* sysdeps/powerpc/bits/math-vector.h: New file.
+	* sysdeps/powerpc/fpu/libm-test-ulps (cos_vlen2): Regenerated.
+	* sysdeps/powerpc/powerpc64/fpu/Makefile: New file.
+	* sysdeps/powerpc/powerpc64/fpu/Versions: Likewise.
+	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libmvec-sysdep_routines)
+	(CFLAGS-vec_d_cos2_vsx.c, libmvec-tests, double-vlen2-funcs)
+	(double-vlen2-arch-ext-cflags): Added build of VSX vector cos function
+	and its tests.
+	* sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c: New file.
+	* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_cos2_vsx.c: Likewise.
+	* sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h: Likewise.
+	* sysdeps/powerpc/powerpc64/fpu/vec_finite_alias.c: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: Likewise.
+
 2019-02-26  Joseph Myers  <joseph@codesourcery.com>
 
 	* sysdeps/arm/sysdep.h (#if condition): Break lines before rather
diff --git a/NEWS b/NEWS
index 0a3b6c7a5a..fc08f11c51 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,19 @@  See the end for copying conditions.
 Please send GNU C library bug reports via <https://sourceware.org/bugzilla/>
 using `glibc' in the "product" field.
 
+
+* Start of implementing vector math library libmvec on PPC64/POWER8.
+  The double-precision cosine now has a vector version.
+  GCC support for auto-vectorization of functions on PPC64 is not yet
+  available. Until that is done, the new vector math functions will be
+  inaccessible to applications.
+  Building libmvec for PPC64 VSX hardware is done at configuration with
+  --enable-mathvec. The default is to not build.
+  The library ABI specification is x86_64 Vector Function ABI.
+  More information on libmvec including a link to the ABI document is at:
+  <https://sourceware.org/glibc/wiki/libmvec>
+
+
 Version 2.30
 
 Major new features:
diff --git a/sysdeps/powerpc/bits/math-vector.h b/sysdeps/powerpc/bits/math-vector.h
new file mode 100644
index 0000000000..78d9db64bf
--- /dev/null
+++ b/sysdeps/powerpc/bits/math-vector.h
@@ -0,0 +1,41 @@ 
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+/* Get default empty definitions for simd declarations.  */
+#include <bits/libm-simd-decl-stubs.h>
+
+#if defined __PPC64__ && defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case.  */
+#  define __DECL_SIMD_PPC64 _Pragma ("omp declare simd notinbranch")
+# elif __GNUC_PREREQ (6,0)
+/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
+#  define __DECL_SIMD_PPC64 __attribute__ ((__simd__ ("notinbranch")))
+# endif
+
+# ifdef __DECL_SIMD_PPC64
+#  undef __DECL_SIMD_cos
+#  define __DECL_SIMD_cos __DECL_SIMD_PPC64
+
+# endif
+#endif
diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
index 1eec27c1dc..d392b135a7 100644
--- a/sysdeps/powerpc/fpu/libm-test-ulps
+++ b/sysdeps/powerpc/fpu/libm-test-ulps
@@ -1311,6 +1311,9 @@  ifloat128: 2
 ildouble: 5
 ldouble: 5
 
+Function: "cos_vlen2":
+double: 2
+
 Function: "cosh":
 double: 1
 float: 1
diff --git a/sysdeps/powerpc/powerpc64/fpu/Makefile b/sysdeps/powerpc/powerpc64/fpu/Makefile
new file mode 100644
index 0000000000..21dc67ff73
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/Makefile
@@ -0,0 +1,7 @@ 
+ifeq ($(subdir),mathvec)
+libmvec-support += vec_finite_alias
+
+CFLAGS-vec_finite_alias.c += -mvsx
+
+libmvec-static-only-routines = vec_finite_alias
+endif
diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions
new file mode 100644
index 0000000000..9a3e1211cc
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/Versions
@@ -0,0 +1,5 @@ 
+libmvec {
+  GLIBC_2.30 {
+    _ZGVbN2v_cos;
+  }
+}
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
index 39b557604c..44c1c04c13 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
@@ -42,3 +42,20 @@  CFLAGS-e_hypotf-power7.c = -mcpu=power7
 CFLAGS-s_modf-ppc64.c += -fsignaling-nans
 CFLAGS-s_modff-ppc64.c += -fsignaling-nans
 endif
+
+ifeq ($(subdir),mathvec)
+libmvec-sysdep_routines += vec_d_cos2_vsx
+CFLAGS-vec_d_cos2_vsx.c += -mvsx
+endif
+
+# Variables for libmvec tests.
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen2
+
+double-vlen2-funcs = cos
+
+double-vlen2-arch-ext-cflags = -mvsx -DREQUIRE_VSX
+
+endif
+endif
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c
new file mode 100644
index 0000000000..17e2cc0724
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c
@@ -0,0 +1,24 @@ 
+/* Wrapper part of tests for VSX ISA versions of vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-double-vlen2.h"
+#include <altivec.h>
+
+#define VEC_TYPE vector double
+
+VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVbN2v_cos)
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_cos2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_cos2_vsx.c
new file mode 100644
index 0000000000..ed8fe330c1
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_cos2_vsx.c
@@ -0,0 +1,88 @@ 
+/* Function cos vectorized with VSX.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+#include "vec_d_trig_data.h"
+
+vector double
+_ZGVbN2v_cos (vector double x)
+{
+
+  /*
+   ARGUMENT RANGE REDUCTION:
+   Add Pi/2 to argument: X' = X+Pi/2.  */
+  vector double x_prime = (vector double) d_half_pi + x;
+
+  /* Get absolute argument value: X' = |X'|.  */
+  vector double abs_x_prime = vec_abs (x_prime);
+
+  /* Y = X'*InvPi + RS : right shifter add.  */
+  vector double y = (x_prime * d_inv_pi) + d_rshifter;
+
+  /* Check for large arguments path.  */
+  vector bool long long large_in = vec_cmpgt (abs_x_prime, d_rangeval);
+
+  /* N = Y - RS : right shifter sub.  */
+  vector double n = y - d_rshifter;
+
+  /* SignRes = Y<<63 : shift LSB to MSB place for result sign.  */
+  vector double sign_res = (vector double) vec_sl ((vector long long) y,
+						   (vector unsigned long long)
+						   vec_splats (63));
+
+  /* N = N - 0.5.  */
+  n = n - d_one_half;
+
+  /* R = X - N*Pi1.  */
+  vector double r = x - (n * d_pi1_fma);
+
+  /* R = R - N*Pi2.  */
+  r = r - (n * d_pi2_fma);
+
+  /* R = R - N*Pi3.  */
+  r = r - (n * d_pi3_fma);
+
+  /* R2 = R*R.  */
+  vector double r2 = r * r;
+
+  /* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))).  */
+  vector double poly = r2 * d_coeff7 + d_coeff6;
+  poly = poly * r2 + d_coeff5;
+  poly = poly * r2 + d_coeff4;
+  poly = poly * r2 + d_coeff3;
+
+  /* Poly = R+R*(R2*(C1+R2*(C2+R2*Poly))).  */
+  poly = poly * r2 + d_coeff2;
+  poly = poly * r2 + d_coeff1;
+  poly = poly * r2 * r + r;
+
+  /*
+     RECONSTRUCTION:
+     Final sign setting: Res = Poly^SignRes.  */
+  vector double out
+    = (vector double) ((vector long long) poly ^ (vector long long) sign_res);
+
+  if (large_in[0] != 0)
+    out[0] = cos (x[0]);
+
+  if (large_in[1] != 0)
+    out[1] = cos (x[1]);
+
+  return out;
+
+}
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h
new file mode 100644
index 0000000000..4b2678928f
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_trig_data.h
@@ -0,0 +1,60 @@ 
+/* Constants used in polynomial approximations for vectorized sin, cos,
+   and sincos functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef D_TRIG_DATA_H
+#define D_TRIG_DATA_H
+
+#include <altivec.h>
+
+/* PI/2.  */
+const vector double d_half_pi  = {0x1.921fb54442d18p+0, 0x1.921fb54442d18p+0};
+
+/* Inverse PI.  */
+const vector double d_inv_pi   = {0x1.45f306dc9c883p-2, 0x1.45f306dc9c883p-2};
+
+/* Right-shifter constant.  */
+const vector double d_rshifter = {0x1.8p+52, 0x1.8p+52};
+
+/* Working range threshold.  */
+const vector double d_rangeval = {0x1p+23, 0x1p+23};
+
+/* One-half.  */
+const vector double d_one_half = {0x1p-1, 0x1p-1};
+
+/* Range reduction PI-based constants if FMA available:
+   PI high part (FMA available).  */
+const vector double d_pi1_fma = {0x1.921fb54442d18p+1, 0x1.921fb54442d18p+1};
+
+/* PI mid part  (FMA available).  */
+const vector double d_pi2_fma = {0x1.1a62633145c06p-53, 0x1.1a62633145c06p-53};
+
+/* PI low part  (FMA available).  */
+const vector double d_pi3_fma
+= {0x1.c1cd129024e09p-106,0x1.c1cd129024e09p-106};
+
+/* Polynomial coefficients (relative error 2^(-52.115)).  */
+const vector double d_coeff7 = {-0x1.9f0d60811aac8p-41,-0x1.9f0d60811aac8p-41};
+const vector double d_coeff6 = {0x1.60e6857a2f22p-33,0x1.60e6857a2f22p-33};
+const vector double d_coeff5 = {-0x1.ae63546002231p-26,-0x1.ae63546002231p-26};
+const vector double d_coeff4 = {0x1.71de38030feap-19,0x1.71de38030feap-19};
+const vector double d_coeff3 = {-0x1.a01a019a5b86dp-13,-0x1.a01a019a5b86dp-13};
+const vector double d_coeff2 = {0x1.111111110a4a8p-7,0x1.111111110a4a8p-7};
+const vector double d_coeff1 = {-0x1.55555555554a7p-3,-0x1.55555555554a7p-3};
+
+#endif /* D_TRIG_DATA_H.  */
diff --git a/sysdeps/powerpc/powerpc64/fpu/vec_finite_alias.c b/sysdeps/powerpc/powerpc64/fpu/vec_finite_alias.c
new file mode 100644
index 0000000000..f1a062aadf
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/vec_finite_alias.c
@@ -0,0 +1,41 @@ 
+/* A temporary workaround until vector log is implemented.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+#include <altivec.h>
+
+/* We need this wrapper to the scalar log function so that
+   libmvec_nonshared.a is generated. Otherwise compiling
+   against the new glibc during testing results in an error
+   due to the missing libmvec_nonshared.a.  */
+
+vector double
+_ZGVbN2v___log_finite (vector double x)
+{
+
+  /*
+   Calls the scalar log function twice, once for each
+   of the pair of doubles in the input argument.  */
+  vector double out;
+
+  out[0] = log (x[0]);
+  out[1] = log (x[1]);
+
+  return out;
+
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
new file mode 100644
index 0000000000..656ce0541f
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
@@ -0,0 +1 @@ 
+GLIBC_2.30 _ZGVbN2v_cos F
-- 
2.20.1
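
The range reduction and polynomial in vec_d_cos2_vsx.c may be easier to
follow one lane at a time. The following is a scalar sketch of the same
steps in portable C, with the constants copied from vec_d_trig_data.h. The
function name is hypothetical. Where the patch falls back to scalar cos ()
for large arguments, the sketch only asserts that the input is finite and
within the working range. It uses plain multiply and subtract rather than
explicit FMA, so it is not claimed to reproduce the 2-ulp bound.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar sketch of one lane of _ZGVbN2v_cos (hypothetical name).  */
double
cos_lane_sketch (double x)
{
  const double half_pi  = 0x1.921fb54442d18p+0;
  const double inv_pi   = 0x1.45f306dc9c883p-2;
  const double rshifter = 0x1.8p+52;
  const double rangeval = 0x1p+23;
  const double pi1 = 0x1.921fb54442d18p+1;
  const double pi2 = 0x1.1a62633145c06p-53;
  const double pi3 = 0x1.c1cd129024e09p-106;
  /* c[0] is d_coeff1 ... c[6] is d_coeff7.  */
  static const double c[7] = {
    -0x1.55555555554a7p-3, 0x1.111111110a4a8p-7, -0x1.a01a019a5b86dp-13,
    0x1.71de38030feap-19, -0x1.ae63546002231p-26, 0x1.60e6857a2f22p-33,
    -0x1.9f0d60811aac8p-41
  };

  /* cos (x) = sin (x + Pi/2).  */
  double x_prime = x + half_pi;
  double abs_x_prime = x_prime < 0 ? -x_prime : x_prime;
  /* The patch calls scalar cos () for large inputs; here we only assert.  */
  assert (abs_x_prime <= rangeval);

  /* Adding the right shifter rounds X'/Pi to an integer k that lands in
     the low mantissa bits of y.  volatile keeps the add/sub pair intact.  */
  volatile double y = x_prime * inv_pi + rshifter;
  double n = y - rshifter;

  /* The lowest bit of y is the parity of k; move it to the sign bit.  */
  double y_copy = y;
  uint64_t y_bits;
  memcpy (&y_bits, &y_copy, sizeof y_bits);
  uint64_t sign = y_bits << 63;

  /* R = X - (k - 0.5) * Pi, with Pi split into three parts.  */
  n -= 0.5;
  double r = x - n * pi1;
  r -= n * pi2;
  r -= n * pi3;
  double r2 = r * r;

  /* sin (R) ~ R + R*R2*(C1 + R2*(C2 + ... + R2*C7)), Horner form.  */
  double poly = c[6];
  for (int i = 5; i >= 0; i--)
    poly = poly * r2 + c[i];
  poly = poly * r2 * r + r;

  /* Res = Poly ^ SignRes.  */
  uint64_t out_bits;
  memcpy (&out_bits, &poly, sizeof out_bits);
  out_bits ^= sign;
  double out;
  memcpy (&out, &out_bits, sizeof out);
  return out;
}
```

Calling cos_lane_sketch (0.0) should give a value very close to 1.0, and
an input of Pi a value very close to -1.0. The vector function performs
these same steps on both lanes at once.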