Submitter | Stefan Liebler |
---|---|

Date | Nov. 4, 2019, 3:27 p.m. |

Message ID | <1572881244-6781-1-git-send-email-stli@linux.ibm.com> |

Download | mbox | patch |

Permalink | /patch/35614/ |

State | Superseded |

Headers | show |

## Comments

On 04/11/2019 12:27, Stefan Liebler wrote: > If compiled with z196 zarch support, the load-fp-integer instruction > is used to implement nearbyint, nearbyintf, nearbyintl. > Otherwise the common-code implementation is used. > + > +double > +__nearbyint (double x) > +{ > + double y; > + /* The z196 zarch "load fp integer" (fidbra) instruction is rounding > + x to the nearest integer according to current rounding mode (M3-field: 0) > + where inexact exceptions are suppressed (M4-field: 4). */ > + __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x)); > + return y; > +} > +libm_alias_double (__nearbyint, nearbyint) At least with recent gcc __builtin_nearbyint generates the expected fidbra instruction for -march=z196. I wonder if we could start to simplify some math symbols implementation where new architectures/extensions provide direct implementation by a direct mapping implemented by compiler builtins. I would expect to: 1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/ since I hardly doubt these micro-optimizations really pay off with recent architectures and compiler version. 2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as: * sysdeps/ieee754/dbl-64/s_nearbyint.c [...] double __nearbyint (double x) { #if __USE_NEARBYINT_BUILTIN return __builtin_nearbyint (x); #else /* Use generic implementation. */ #endif } 3. Define the __USE_<SYMBOL>_BUILTIN for each architecture. It would allow to simplify some architectures, aarch64 for instance.

On 11/4/19 7:22 PM, Adhemerval Zanella wrote: > > > On 04/11/2019 12:27, Stefan Liebler wrote: >> If compiled with z196 zarch support, the load-fp-integer instruction >> is used to implement nearbyint, nearbyintf, nearbyintl. >> Otherwise the common-code implementation is used. > >> + >> +double >> +__nearbyint (double x) >> +{ >> + double y; >> + /* The z196 zarch "load fp integer" (fidbra) instruction is rounding >> + x to the nearest integer according to current rounding mode (M3-field: 0) >> + where inexact exceptions are suppressed (M4-field: 4). */ >> + __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x)); >> + return y; >> +} >> +libm_alias_double (__nearbyint, nearbyint) > > At least with recent gcc __builtin_nearbyint generates the expected fidbra > instruction for -march=z196. I wonder if we could start to simplify some > math symbols implementation where new architectures/extensions provide > direct implementation by a direct mapping implemented by compiler builtins. > > I would expect to: > > 1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/ > since I hardly doubt these micro-optimizations really pay off with > recent architectures and compiler version. > > 2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as: > > * sysdeps/ieee754/dbl-64/s_nearbyint.c > > [...] > double > __nearbyint (double x) > { > #if __USE_NEARBYINT_BUILTIN > return __builtin_nearbyint (x); > #else > /* Use generic implementation. */ > #endif > } > > 3. Define the __USE_<SYMBOL>_BUILTIN for each architecture. > > It would allow to simplify some architectures, aarch64 for instance. > Currently the long double builtins are generating an extra not needed stack frame compared to the inline assembly. But this needs to be fixed in gcc. E.g. if build for s390 (31bit), where the fidbra & co instructions are not available, the builtins generate a call to libc which would end in an infinite loop. I will make some tests on s390 starting with the current minimum gcc 6.2 to be sure that the instructions are used. I have never build glibc with other compilers like clang. Is there a special need to check this behavior? In general I can start with those functions where the builtins can be used on s390, but I won't move all wordsize-64 functions and adjust them to use the builtins with this patch series. This means for now, I start with using builtins for nearbyint, rint, floor, ceil, trunc, round and copysign. Afterwards the same can be done for the remaining functions. I will create an own header file, e.g. sysdeps/generic/math-use-builtins.h in the same way as fix-fp-int-compare-invalid.h. The generic version contains all USE_XYZ_BUILTIN macros defined to 0 and each architecture can provide its own file with other settings. For each functions XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN, USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN. How about this? Thanks, Stefan

On Tue, 5 Nov 2019, Stefan Liebler wrote: > I will create an own header file, e.g. sysdeps/generic/math-use-builtins.h in > the same way as fix-fp-int-compare-invalid.h. > The generic version contains all USE_XYZ_BUILTIN macros defined to 0 > and each architecture can provide its own file with other settings. > For each functions XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN, > USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN. You'll need macro variants for _Float128 as well, with corresponding #undef / #define of the long double variants and of the built-in names in float128_private.h (see what it does for FIX_LDBL_LONG_CONVERT_OVERFLOW and FIX_LDBL_LLONG_CONVERT_OVERFLOW) - POWER9 supports doing various such _Float128 operations as built-in functions.

On 05/11/2019 12:49, Stefan Liebler wrote: > On 11/4/19 7:22 PM, Adhemerval Zanella wrote: >> >> >> On 04/11/2019 12:27, Stefan Liebler wrote: >>> If compiled with z196 zarch support, the load-fp-integer instruction >>> is used to implement nearbyint, nearbyintf, nearbyintl. >>> Otherwise the common-code implementation is used. >> >>> + >>> +double >>> +__nearbyint (double x) >>> +{ >>> + double y; >>> + /* The z196 zarch "load fp integer" (fidbra) instruction is rounding >>> + x to the nearest integer according to current rounding mode (M3-field: 0) >>> + where inexact exceptions are suppressed (M4-field: 4). */ >>> + __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x)); >>> + return y; >>> +} >>> +libm_alias_double (__nearbyint, nearbyint) >> >> At least with recent gcc __builtin_nearbyint generates the expected fidbra >> instruction for -march=z196. I wonder if we could start to simplify some >> math symbols implementation where new architectures/extensions provide >> direct implementation by a direct mapping implemented by compiler builtins. >> >> I would expect to: >> >> 1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/ >> since I hardly doubt these micro-optimizations really pay off with >> recent architectures and compiler version. >> >> 2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as: >> >> * sysdeps/ieee754/dbl-64/s_nearbyint.c >> [...] >> double >> __nearbyint (double x) >> { >> #if __USE_NEARBYINT_BUILTIN >> return __builtin_nearbyint (x); >> #else >> /* Use generic implementation. */ >> #endif >> } >> >> 3. Define the __USE_<SYMBOL>_BUILTIN for each architecture. >> >> It would allow to simplify some architectures, aarch64 for instance. >> > > Currently the long double builtins are generating an extra not needed stack frame compared to the inline assembly. But this needs to be fixed in gcc. > > E.g. if build for s390 (31bit), where the fidbra & co instructions are not available, the builtins generate a call to libc which would end in an infinite loop. I will make some tests on s390 starting with the current minimum gcc 6.2 to be sure that the instructions are used. I have never build glibc with other compilers like clang. Is there a special need to check this behavior? I think google maintains some branches with clang support (google/grte/*), but there is no know effort to sync these with master. So I see there is no need to focus on non-gcc compiler for now. > > In general I can start with those functions where the builtins can be used on s390, but I won't move all wordsize-64 functions and adjust them to use the builtins with this patch series. > This means for now, I start with using builtins for nearbyint, rint, floor, ceil, trunc, round and copysign. > > Afterwards the same can be done for the remaining functions. > > I will create an own header file, e.g. sysdeps/generic/math-use-builtins.h in the same way as fix-fp-int-compare-invalid.h. > The generic version contains all USE_XYZ_BUILTIN macros defined to 0 > and each architecture can provide its own file with other settings. > For each functions XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN, USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN. > How about this? > I think it is fair start, with the adjustments pointed out by Joseph. I will check out the worksize-64 refactor to avoid duplicate the implementations.

On 11/4/19 7:22 PM, Adhemerval Zanella wrote: > > > On 04/11/2019 12:27, Stefan Liebler wrote: >> If compiled with z196 zarch support, the load-fp-integer instruction >> is used to implement nearbyint, nearbyintf, nearbyintl. >> Otherwise the common-code implementation is used. > >> + >> +double >> +__nearbyint (double x) >> +{ >> + double y; >> + /* The z196 zarch "load fp integer" (fidbra) instruction is rounding >> + x to the nearest integer according to current rounding mode (M3-field: 0) >> + where inexact exceptions are suppressed (M4-field: 4). */ >> + __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x)); >> + return y; >> +} >> +libm_alias_double (__nearbyint, nearbyint) > > At least with recent gcc __builtin_nearbyint generates the expected fidbra > instruction for -march=z196. I wonder if we could start to simplify some > math symbols implementation where new architectures/extensions provide > direct implementation by a direct mapping implemented by compiler builtins. > > I would expect to: > > 1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/ > since I hardly doubt these micro-optimizations really pay off with > recent architectures and compiler version. > > 2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as: > > * sysdeps/ieee754/dbl-64/s_nearbyint.c > > [...] > double > __nearbyint (double x) > { > #if __USE_NEARBYINT_BUILTIN > return __builtin_nearbyint (x); > #else > /* Use generic implementation. */ > #endif > } > > 3. Define the __USE_<SYMBOL>_BUILTIN for each architecture. > > It would allow to simplify some architectures, aarch64 for instance. > This patch is superseded by the patch-series which is always using wordsize-64 version and allows to use the GCC builtins in common-code implementation: "[PATCH 00/13] Use GCC builtins for some math functions if desired." https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html Bye, Stefan

On 02/12/2019 11:56, Stefan Liebler wrote: > On 11/4/19 7:22 PM, Adhemerval Zanella wrote: >> >> >> On 04/11/2019 12:27, Stefan Liebler wrote: >>> If compiled with z196 zarch support, the load-fp-integer instruction >>> is used to implement nearbyint, nearbyintf, nearbyintl. >>> Otherwise the common-code implementation is used. >> >>> + >>> +double >>> +__nearbyint (double x) >>> +{ >>> + double y; >>> + /* The z196 zarch "load fp integer" (fidbra) instruction is rounding >>> + x to the nearest integer according to current rounding mode (M3-field: 0) >>> + where inexact exceptions are suppressed (M4-field: 4). */ >>> + __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x)); >>> + return y; >>> +} >>> +libm_alias_double (__nearbyint, nearbyint) >> >> At least with recent gcc __builtin_nearbyint generates the expected fidbra >> instruction for -march=z196. I wonder if we could start to simplify some >> math symbols implementation where new architectures/extensions provide >> direct implementation by a direct mapping implemented by compiler builtins. >> >> I would expect to: >> >> 1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/ >> since I hardly doubt these micro-optimizations really pay off with >> recent architectures and compiler version. >> >> 2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as: >> >> * sysdeps/ieee754/dbl-64/s_nearbyint.c >> [...] >> double >> __nearbyint (double x) >> { >> #if __USE_NEARBYINT_BUILTIN >> return __builtin_nearbyint (x); >> #else >> /* Use generic implementation. */ >> #endif >> } >> >> 3. Define the __USE_<SYMBOL>_BUILTIN for each architecture. >> >> It would allow to simplify some architectures, aarch64 for instance. >> > This patch is superseded by the patch-series which is always using wordsize-64 version and allows to use the GCC builtins in common-code implementation: > "[PATCH 00/13] Use GCC builtins for some math functions if desired." > https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html > > Bye, > Stefan > Thanks, I will review these set.

## Patch

diff --git a/sysdeps/s390/fpu/s_nearbyint.c b/sysdeps/s390/fpu/s_nearbyint.c new file mode 100644 index 0000000000..7af92ace0c --- /dev/null +++ b/sysdeps/s390/fpu/s_nearbyint.c @@ -0,0 +1,38 @@ +/* nearbyint() - S390 version. + Copyright (C) 2019 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#ifdef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT +# include <math.h> +# include <libm-alias-double.h> + +double +__nearbyint (double x) +{ + double y; + /* The z196 zarch "load fp integer" (fidbra) instruction is rounding + x to the nearest integer according to current rounding mode (M3-field: 0) + where inexact exceptions are suppressed (M4-field: 4). */ + __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x)); + return y; +} +libm_alias_double (__nearbyint, nearbyint) + +#else +# include <sysdeps/ieee754/dbl-64/s_nearbyint.c> +#endif diff --git a/sysdeps/s390/fpu/s_nearbyintf.c b/sysdeps/s390/fpu/s_nearbyintf.c new file mode 100644 index 0000000000..66473e3b2e --- /dev/null +++ b/sysdeps/s390/fpu/s_nearbyintf.c @@ -0,0 +1,38 @@ +/* nearbyintf() - S390 version. + Copyright (C) 2019 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#ifdef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT +# include <math.h> +# include <libm-alias-float.h> + +float +__nearbyintf (float x) +{ + float y; + /* The z196 zarch "load fp integer" (fiebra) instruction is rounding + x to the nearest integer according to current rounding mode (M3-field: 0) + where inexact exceptions are suppressed (M4-field: 4). */ + __asm__ ("fiebra %0,0,%1,4" : "=f" (y) : "f" (x)); + return y; +} +libm_alias_float (__nearbyint, nearbyint) + +#else +# include <sysdeps/ieee754/flt-32/s_nearbyintf.c> +#endif diff --git a/sysdeps/s390/fpu/s_nearbyintl.c b/sysdeps/s390/fpu/s_nearbyintl.c new file mode 100644 index 0000000000..f154cf41c7 --- /dev/null +++ b/sysdeps/s390/fpu/s_nearbyintl.c @@ -0,0 +1,39 @@ +/* nearbyintl() - S390 version. + Copyright (C) 2019 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#ifdef HAVE_S390_MIN_Z196_ZARCH_ASM_SUPPORT +# include <math.h> +# include <math_private.h> +# include <libm-alias-ldouble.h> + +_Float128 +__nearbyintl (_Float128 x) +{ + _Float128 y; + /* The z196 zarch "load fp integer" (fixbra) instruction is rounding + x to the nearest integer according to current rounding mode (M3-field: 0) + where inexact exceptions are suppressed (M4-field: 4). */ + __asm__ ("fixbra %0,0,%1,4" : "=f" (y) : "f" (x)); + return y; +} +libm_alias_ldouble (__nearbyint, nearbyint) + +#else +# include <sysdeps/ieee754/ldbl-128/s_nearbyintl.c> +#endif