[01/17] S390: Use load-fp-integer instruction for nearbyint functions.

Message ID 1572881244-6781-1-git-send-email-stli@linux.ibm.com
State Superseded
Headers

Commit Message

Stefan Liebler Nov. 4, 2019, 3:27 p.m. UTC
  If compiled with z196 zarch support, the load-fp-integer instruction
is used to implement nearbyint, nearbyintf, nearbyintl.
Otherwise the common-code implementation is used.
---
 sysdeps/s390/fpu/s_nearbyint.c  | 38 ++++++++++++++++++++++++++++++++
 sysdeps/s390/fpu/s_nearbyintf.c | 38 ++++++++++++++++++++++++++++++++
 sysdeps/s390/fpu/s_nearbyintl.c | 39 +++++++++++++++++++++++++++++++++
 3 files changed, 115 insertions(+)
 create mode 100644 sysdeps/s390/fpu/s_nearbyint.c
 create mode 100644 sysdeps/s390/fpu/s_nearbyintf.c
 create mode 100644 sysdeps/s390/fpu/s_nearbyintl.c
  

Comments

Adhemerval Zanella Netto Nov. 4, 2019, 6:22 p.m. UTC | #1
On 04/11/2019 12:27, Stefan Liebler wrote:
> If compiled with z196 zarch support, the load-fp-integer instruction
> is used to implement nearbyint, nearbyintf, nearbyintl.
> Otherwise the common-code implementation is used.

> +
> +double
> +__nearbyint (double x)
> +{
> +  double y;
> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
> +     x to the nearest integer according to current rounding mode (M3-field: 0)
> +     where inexact exceptions are suppressed (M4-field: 4).  */
> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
> +  return y;
> +}
> +libm_alias_double (__nearbyint, nearbyint)

At least with recent gcc __builtin_nearbyint generates the expected fidbra
instruction for -march=z196.  I wonder if we could start to simplify some
math symbols implementation where new architectures/extensions provide 
direct implementation by a direct mapping implemented by compiler builtins.

I would expect to:

  1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
     since I hardly doubt these micro-optimizations really pay off with
     recent architectures and compiler version.

  2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:

     * sysdeps/ieee754/dbl-64/s_nearbyint.c
     
     [...]
     double
     __nearbyint (double x)
     {
     #if __USE_NEARBYINT_BUILTIN
       return __builtin_nearbyint (x);
     #else
       /* Use generic implementation.  */
     #endif
     }

  3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.    

It would allow to simplify some architectures, aarch64 for instance.
  
Stefan Liebler Nov. 5, 2019, 3:49 p.m. UTC | #2
On 11/4/19 7:22 PM, Adhemerval Zanella wrote:
> 
> 
> On 04/11/2019 12:27, Stefan Liebler wrote:
>> If compiled with z196 zarch support, the load-fp-integer instruction
>> is used to implement nearbyint, nearbyintf, nearbyintl.
>> Otherwise the common-code implementation is used.
> 
>> +
>> +double
>> +__nearbyint (double x)
>> +{
>> +  double y;
>> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>> +     x to the nearest integer according to current rounding mode (M3-field: 0)
>> +     where inexact exceptions are suppressed (M4-field: 4).  */
>> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>> +  return y;
>> +}
>> +libm_alias_double (__nearbyint, nearbyint)
> 
> At least with recent gcc __builtin_nearbyint generates the expected fidbra
> instruction for -march=z196.  I wonder if we could start to simplify some
> math symbols implementation where new architectures/extensions provide
> direct implementation by a direct mapping implemented by compiler builtins.
> 
> I would expect to:
> 
>    1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>       since I hardly doubt these micro-optimizations really pay off with
>       recent architectures and compiler version.
> 
>    2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
> 
>       * sysdeps/ieee754/dbl-64/s_nearbyint.c
>       
>       [...]
>       double
>       __nearbyint (double x)
>       {
>       #if __USE_NEARBYINT_BUILTIN
>         return __builtin_nearbyint (x);
>       #else
>         /* Use generic implementation.  */
>       #endif
>       }
> 
>    3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
> 
> It would allow to simplify some architectures, aarch64 for instance.
> 

Currently the long double builtins are generating an extra not needed 
stack frame compared to the inline assembly. But this needs to be fixed 
in gcc.

E.g. if build for s390 (31bit), where the fidbra & co instructions are 
not available, the builtins generate a call to libc which would end in 
an infinite loop.  I will make some tests on s390 starting with the 
current minimum gcc 6.2 to be sure that the instructions are used.  I 
have never build glibc with other compilers like clang.  Is there a 
special need to check this behavior?

In general I can start with those functions where the builtins can be 
used on s390, but I won't move all wordsize-64 functions and adjust them 
to use the builtins with this patch series.
This means for now, I start with using builtins for nearbyint, rint, 
floor, ceil, trunc, round and copysign.

Afterwards the same can be done for the remaining functions.

I will create an own header file, e.g. 
sysdeps/generic/math-use-builtins.h in the same way as 
fix-fp-int-compare-invalid.h.
The generic version contains all USE_XYZ_BUILTIN macros defined to 0
and each architecture can provide its own file with other settings.
For each functions XYZ there will be three macros, e.g. 
USE_NEARBYINT_BUILTIN, USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN.
How about this?

Thanks,
Stefan
  
Joseph Myers Nov. 5, 2019, 4:48 p.m. UTC | #3
On Tue, 5 Nov 2019, Stefan Liebler wrote:

> I will create an own header file, e.g. sysdeps/generic/math-use-builtins.h in
> the same way as fix-fp-int-compare-invalid.h.
> The generic version contains all USE_XYZ_BUILTIN macros defined to 0
> and each architecture can provide its own file with other settings.
> For each functions XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN,
> USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN.

You'll need macro variants for _Float128 as well, with corresponding 
#undef / #define of the long double variants and of the built-in names in 
float128_private.h (see what it does for FIX_LDBL_LONG_CONVERT_OVERFLOW 
and FIX_LDBL_LLONG_CONVERT_OVERFLOW) - POWER9 supports doing various such 
_Float128 operations as built-in functions.
  
Adhemerval Zanella Netto Nov. 5, 2019, 6:55 p.m. UTC | #4
On 05/11/2019 12:49, Stefan Liebler wrote:
> On 11/4/19 7:22 PM, Adhemerval Zanella wrote:
>>
>>
>> On 04/11/2019 12:27, Stefan Liebler wrote:
>>> If compiled with z196 zarch support, the load-fp-integer instruction
>>> is used to implement nearbyint, nearbyintf, nearbyintl.
>>> Otherwise the common-code implementation is used.
>>
>>> +
>>> +double
>>> +__nearbyint (double x)
>>> +{
>>> +  double y;
>>> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>>> +     x to the nearest integer according to current rounding mode (M3-field: 0)
>>> +     where inexact exceptions are suppressed (M4-field: 4).  */
>>> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>>> +  return y;
>>> +}
>>> +libm_alias_double (__nearbyint, nearbyint)
>>
>> At least with recent gcc __builtin_nearbyint generates the expected fidbra
>> instruction for -march=z196.  I wonder if we could start to simplify some
>> math symbols implementation where new architectures/extensions provide
>> direct implementation by a direct mapping implemented by compiler builtins.
>>
>> I would expect to:
>>
>>    1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>>       since I hardly doubt these micro-optimizations really pay off with
>>       recent architectures and compiler version.
>>
>>    2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
>>
>>       * sysdeps/ieee754/dbl-64/s_nearbyint.c
>>             [...]
>>       double
>>       __nearbyint (double x)
>>       {
>>       #if __USE_NEARBYINT_BUILTIN
>>         return __builtin_nearbyint (x);
>>       #else
>>         /* Use generic implementation.  */
>>       #endif
>>       }
>>
>>    3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
>>
>> It would allow to simplify some architectures, aarch64 for instance.
>>
> 
> Currently the long double builtins are generating an extra not needed stack frame compared to the inline assembly. But this needs to be fixed in gcc.
> 
> E.g. if build for s390 (31bit), where the fidbra & co instructions are not available, the builtins generate a call to libc which would end in an infinite loop.  I will make some tests on s390 starting with the current minimum gcc 6.2 to be sure that the instructions are used.  I have never build glibc with other compilers like clang.  Is there a special need to check this behavior?

I think google maintains some branches with clang support (google/grte/*),
but there is no know effort to sync these with master.  So I see there is
no need to focus on non-gcc compiler for now.

> 
> In general I can start with those functions where the builtins can be used on s390, but I won't move all wordsize-64 functions and adjust them to use the builtins with this patch series.
> This means for now, I start with using builtins for nearbyint, rint, floor, ceil, trunc, round and copysign.
> 
> Afterwards the same can be done for the remaining functions.
> 
> I will create an own header file, e.g. sysdeps/generic/math-use-builtins.h in the same way as fix-fp-int-compare-invalid.h.
> The generic version contains all USE_XYZ_BUILTIN macros defined to 0
> and each architecture can provide its own file with other settings.
> For each functions XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN, USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN.
> How about this?
> 

I think it is fair start, with the adjustments pointed out by Joseph.
I will check out the worksize-64 refactor to avoid duplicate the
implementations.
  
Stefan Liebler Dec. 2, 2019, 2:56 p.m. UTC | #5
On 11/4/19 7:22 PM, Adhemerval Zanella wrote:
> 
> 
> On 04/11/2019 12:27, Stefan Liebler wrote:
>> If compiled with z196 zarch support, the load-fp-integer instruction
>> is used to implement nearbyint, nearbyintf, nearbyintl.
>> Otherwise the common-code implementation is used.
> 
>> +
>> +double
>> +__nearbyint (double x)
>> +{
>> +  double y;
>> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>> +     x to the nearest integer according to current rounding mode (M3-field: 0)
>> +     where inexact exceptions are suppressed (M4-field: 4).  */
>> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>> +  return y;
>> +}
>> +libm_alias_double (__nearbyint, nearbyint)
> 
> At least with recent gcc __builtin_nearbyint generates the expected fidbra
> instruction for -march=z196.  I wonder if we could start to simplify some
> math symbols implementation where new architectures/extensions provide
> direct implementation by a direct mapping implemented by compiler builtins.
> 
> I would expect to:
> 
>    1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>       since I hardly doubt these micro-optimizations really pay off with
>       recent architectures and compiler version.
> 
>    2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
> 
>       * sysdeps/ieee754/dbl-64/s_nearbyint.c
>       
>       [...]
>       double
>       __nearbyint (double x)
>       {
>       #if __USE_NEARBYINT_BUILTIN
>         return __builtin_nearbyint (x);
>       #else
>         /* Use generic implementation.  */
>       #endif
>       }
> 
>    3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
> 
> It would allow to simplify some architectures, aarch64 for instance.
> 
This patch is superseded by the patch-series which is always using 
wordsize-64 version and allows to use the GCC builtins in common-code 
implementation:
"[PATCH 00/13] Use GCC builtins for some math functions if desired."
https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html

Bye,
Stefan
  
Adhemerval Zanella Netto Dec. 2, 2019, 3:20 p.m. UTC | #6
On 02/12/2019 11:56, Stefan Liebler wrote:
> On 11/4/19 7:22 PM, Adhemerval Zanella wrote:
>>
>>
>> On 04/11/2019 12:27, Stefan Liebler wrote:
>>> If compiled with z196 zarch support, the load-fp-integer instruction
>>> is used to implement nearbyint, nearbyintf, nearbyintl.
>>> Otherwise the common-code implementation is used.
>>
>>> +
>>> +double
>>> +__nearbyint (double x)
>>> +{
>>> +  double y;
>>> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>>> +     x to the nearest integer according to current rounding mode (M3-field: 0)
>>> +     where inexact exceptions are suppressed (M4-field: 4).  */
>>> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>>> +  return y;
>>> +}
>>> +libm_alias_double (__nearbyint, nearbyint)
>>
>> At least with recent gcc __builtin_nearbyint generates the expected fidbra
>> instruction for -march=z196.  I wonder if we could start to simplify some
>> math symbols implementation where new architectures/extensions provide
>> direct implementation by a direct mapping implemented by compiler builtins.
>>
>> I would expect to:
>>
>>    1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>>       since I hardly doubt these micro-optimizations really pay off with
>>       recent architectures and compiler version.
>>
>>    2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
>>
>>       * sysdeps/ieee754/dbl-64/s_nearbyint.c
>>             [...]
>>       double
>>       __nearbyint (double x)
>>       {
>>       #if __USE_NEARBYINT_BUILTIN
>>         return __builtin_nearbyint (x);
>>       #else
>>         /* Use generic implementation.  */
>>       #endif
>>       }
>>
>>    3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
>>
>> It would allow to simplify some architectures, aarch64 for instance.
>>
> This patch is superseded by the patch-series which is always using wordsize-64 version and allows to use the GCC builtins in common-code implementation:
> "[PATCH 00/13] Use GCC builtins for some math functions if desired."
> https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html
> 
> Bye,
> Stefan
> 

Thanks, I will review these set.
  

Patch

diff --git a/sysdeps/s390/fpu/s_nearbyint.c b/sysdeps/s390/fpu/s_nearbyint.c
new file mode 100644
index 0000000000..7af92ace0c
--- /dev/null
+++ b/sysdeps/s390/fpu/s_nearbyint.c
@@ -0,0 +1,38 @@ 
+/* nearbyint() - S390 version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifdef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT
+# include <math.h>
+# include <libm-alias-double.h>
+
+double
+__nearbyint (double x)
+{
+  double y;
+  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
+     x to the nearest integer according to current rounding mode (M3-field: 0)
+     where inexact exceptions are suppressed (M4-field: 4).  */
+  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
+  return y;
+}
+libm_alias_double (__nearbyint, nearbyint)
+
+#else
+# include <sysdeps/ieee754/dbl-64/s_nearbyint.c>
+#endif
diff --git a/sysdeps/s390/fpu/s_nearbyintf.c b/sysdeps/s390/fpu/s_nearbyintf.c
new file mode 100644
index 0000000000..66473e3b2e
--- /dev/null
+++ b/sysdeps/s390/fpu/s_nearbyintf.c
@@ -0,0 +1,38 @@ 
+/* nearbyintf() - S390 version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifdef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT
+# include <math.h>
+# include <libm-alias-float.h>
+
+float
+__nearbyintf (float x)
+{
+  float y;
+  /* The z196 zarch "load fp integer" (fiebra) instruction is rounding
+     x to the nearest integer according to current rounding mode (M3-field: 0)
+     where inexact exceptions are suppressed (M4-field: 4).  */
+  __asm__ ("fiebra %0,0,%1,4" : "=f" (y) : "f" (x));
+  return y;
+}
+libm_alias_float (__nearbyint, nearbyint)
+
+#else
+# include <sysdeps/ieee754/flt-32/s_nearbyintf.c>
+#endif
diff --git a/sysdeps/s390/fpu/s_nearbyintl.c b/sysdeps/s390/fpu/s_nearbyintl.c
new file mode 100644
index 0000000000..f154cf41c7
--- /dev/null
+++ b/sysdeps/s390/fpu/s_nearbyintl.c
@@ -0,0 +1,39 @@ 
+/* nearbyintl() - S390 version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifdef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT
+# include <math.h>
+# include <math_private.h>
+# include <libm-alias-ldouble.h>
+
+_Float128
+__nearbyintl (_Float128 x)
+{
+  _Float128 y;
+  /* The z196 zarch "load fp integer" (fixbra) instruction is rounding
+     x to the nearest integer according to current rounding mode (M3-field: 0)
+     where inexact exceptions are suppressed (M4-field: 4).  */
+  __asm__ ("fixbra %0,0,%1,4" : "=f" (y) : "f" (x));
+  return y;
+}
+libm_alias_ldouble (__nearbyint, nearbyint)
+
+#else
+# include <sysdeps/ieee754/ldbl-128/s_nearbyintl.c>
+#endif