[v4,06/12] math: Remove powerpc e_hypot

Message ID 20211203000103.737833-7-adhemerval.zanella@linaro.org
State Superseded
Headers
Series Improve hypot |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Adhemerval Zanella Dec. 3, 2021, midnight UTC
  The generic implementation is shows only slight worse performance:

POWER9     reciprocal-throughput    latency
master                   13.4024    14.0967
new hypot                14.8479    15.8061

POWER8     reciprocal-throughput    latency
master                   15.5767    16.8885
new hypot                16.5371    18.4057

One way to improve might to make gcc generate xsmaxdp/xsmindp for
fmax/fmin (it onl does for -ffast-math, clang does for default
options).

Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
---
 sysdeps/powerpc/fpu/e_hypot.c                 | 87 -------------------
 sysdeps/powerpc/fpu/e_hypotf.c                | 78 -----------------
 .../powerpc32/power4/fpu/multiarch/Makefile   |  5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     | 23 -----
 .../power4/fpu/multiarch/e_hypot-ppc32.c      | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  | 33 -------
 .../power4/fpu/multiarch/e_hypotf-power7.c    | 23 -----
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 -------
 9 files changed, 1 insertion(+), 327 deletions(-)
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
  

Comments

Adhemerval Zanella Dec. 6, 2021, 5:12 p.m. UTC | #1
On 02/12/2021 21:00, Adhemerval Zanella wrote:
> The generic implementation is shows only slight worse performance:
> 
> POWER9     reciprocal-throughput    latency
> master                   13.4024    14.0967
> new hypot                14.8479    15.8061
> 
> POWER8     reciprocal-throughput    latency
> master                   15.5767    16.8885
> new hypot                16.5371    18.4057
> 
> One way to improve might to make gcc generate xsmaxdp/xsmindp for
> fmax/fmin (it onl does for -ffast-math, clang does for default
> options).
> 
> Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
> (power9).

Hi Tulio/Paul,

This the only missing patch in this set and I would like to check with you,
powerpc maintainers, that if it would be ok to push it.  The resulting
performance difference, including the latest one that removes the wrappers,
is slight better:

         
POWER9           reciprocal-throughput     latency
master                          13.4024    14.0967
new hypot                       11.9206    13.9871

POWER8            reciprocal-throughput    latency
master                          15.5767    16.8885
new hypot                       15.3541    18.0856


The POWER8 çatency difference seems to be due branch misprediction 
in the max/min selection.  In fact, if I use xsmaxdp/xsmindp on the
USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:

POWER8            reciprocal-throughput    latency
xsmaxdp/xsmindp                 12.8959    16.2082

POWER9 is not affected (I don't see any performance difference by
using xsmaxdp/xsmindp).

The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
for some reason, clang use them on default -O2 option.


> ---
>  sysdeps/powerpc/fpu/e_hypot.c                 | 87 -------------------
>  sysdeps/powerpc/fpu/e_hypotf.c                | 78 -----------------
>  .../powerpc32/power4/fpu/multiarch/Makefile   |  5 +-
>  .../power4/fpu/multiarch/e_hypot-power7.c     | 23 -----
>  .../power4/fpu/multiarch/e_hypot-ppc32.c      | 23 -----
>  .../powerpc32/power4/fpu/multiarch/e_hypot.c  | 33 -------
>  .../power4/fpu/multiarch/e_hypotf-power7.c    | 23 -----
>  .../power4/fpu/multiarch/e_hypotf-ppc32.c     | 23 -----
>  .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 -------
>  9 files changed, 1 insertion(+), 327 deletions(-)
>  delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
>  delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
>  delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
> 
> diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c
> deleted file mode 100644
> index f96c589bbd..0000000000
> --- a/sysdeps/powerpc/fpu/e_hypot.c
> +++ /dev/null
> @@ -1,87 +0,0 @@
> -/* Pythagorean addition using doubles
> -   Copyright (C) 2011-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Library General Public License as
> -   published by the Free Software Foundation; either version 2 of the
> -   License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Library General Public License for more details.
> -
> -   You should have received a copy of the GNU Library General Public
> -   License along with the GNU C Library; see the file COPYING.LIB.  If
> -   not, see <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <math-underflow.h>
> -#include <stdint.h>
> -#include <libm-alias-finite.h>
> -
> -/* __ieee754_hypot(x,y)
> - *
> - * This a FP only version without any FP->INT conversion.
> - * It is similar to default C version, making appropriates
> - * overflow and underflows checks as well scaling when it
> - * is needed.
> - */
> -
> -double
> -__ieee754_hypot (double x, double y)
> -{
> -  if ((isinf (x) || isinf (y))
> -      && !issignaling (x) && !issignaling (y))
> -    return INFINITY;
> -  if (isnan (x) || isnan (y))
> -    return x + y;
> -
> -  x = fabs (x);
> -  y = fabs (y);
> -
> -  if (y > x)
> -    {
> -      double t = x;
> -      x = y;
> -      y = t;
> -    }
> -  if (y == 0.0)
> -    return x;
> -
> -  /* if y is higher enough, y * 2^60 might overflow. The tests if
> -     y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the
> -     appropriate check to avoid the overflow exception generation.  */
> -  if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60))
> -    return x + y;
> -
> -  if (x > 0x1p+500)
> -    {
> -      x *= 0x1p-600;
> -      y *= 0x1p-600;
> -      return sqrt (x * x + y * y) / 0x1p-600;
> -    }
> -  if (y < 0x1p-500)
> -    {
> -      if (y <= 0x0.fffffffffffffp-1022)
> -	{
> -	  x *= 0x1p+1022;
> -	  y *= 0x1p+1022;
> -	  double ret = sqrt (x * x + y * y) / 0x1p+1022;
> -	  math_check_force_underflow_nonneg (ret);
> -	  return ret;
> -	}
> -      else
> -	{
> -	  x *= 0x1p+600;
> -	  y *= 0x1p+600;
> -	  return sqrt (x * x + y * y) / 0x1p+600;
> -	}
> -    }
> -  return sqrt (x * x + y * y);
> -}
> -#ifndef __ieee754_hypot
> -libm_alias_finite (__ieee754_hypot, __hypot)
> -#endif
> diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c
> deleted file mode 100644
> index fa201dda51..0000000000
> --- a/sysdeps/powerpc/fpu/e_hypotf.c
> +++ /dev/null
> @@ -1,78 +0,0 @@
> -/* Pythagorean addition using floats
> -   Copyright (C) 2011-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Library General Public License as
> -   published by the Free Software Foundation; either version 2 of the
> -   License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Library General Public License for more details.
> -
> -   You should have received a copy of the GNU Library General Public
> -   License along with the GNU C Library; see the file COPYING.LIB.  If
> -   not, see <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <stdint.h>
> -#include <libm-alias-finite.h>
> -
> -/* __ieee754_hypotf(x,y)
> -
> -   This a FP only version without any FP->INT conversion.
> -   It is similar to default C version, making appropriates
> -   overflow and underflows checks as using double precision
> -   instead of scaling.  */
> -
> -#ifdef _ARCH_PWR7
> -/* POWER7 isinf and isnan optimizations are fast. */
> -# define TEST_INF_NAN(x, y)                                      \
> -   if ((isinff(x) || isinff(y))					 \
> -       && !issignaling (x) && !issignaling (y))			 \
> -     return INFINITY;                                            \
> -   if (isnanf(x) || isnanf(y))                                   \
> -     return x + y;
> -# else
> -/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are
> - * costly (especially for POWER6). */
> -# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2)                         \
> - do {                                                            \
> -   ieee_float_shape_type gf_u1;                                  \
> -   ieee_float_shape_type gf_u2;                                  \
> -   gf_u1.value = (f1);                                           \
> -   gf_u2.value = (f2);                                           \
> -   (i1) = gf_u1.word & 0x7fffffff;                               \
> -   (i2) = gf_u2.word & 0x7fffffff;                               \
> - } while (0)
> -
> -# define TEST_INF_NAN(x, y)                                      \
> - do {                                                            \
> -   uint32_t hx, hy;                                              \
> -   GET_TWO_FLOAT_WORD(x, y, hx, hy);                             \
> -   if (hy > hx) {                                                \
> -     uint32_t ht = hx; hx = hy; hy = ht;                         \
> -   }                                                             \
> -   if (hx >= 0x7f800000) {                                       \
> -     if ((hx == 0x7f800000 || hy == 0x7f800000)			 \
> -	 && !issignaling (x) && !issignaling (y))		 \
> -       return INFINITY;                                          \
> -     return x + y;						 \
> -   }                                                             \
> - } while (0)
> -#endif
> -
> -
> -float
> -__ieee754_hypotf (float x, float y)
> -{
> -  TEST_INF_NAN (x, y);
> -
> -  return sqrt ((double) x * x + (double) y * y);
> -}
> -#ifndef __ieee754_hypotf
> -libm_alias_finite (__ieee754_hypotf, __hypotf)
> -#endif
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
> index 60f2c95532..1de0f9b350 100644
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
> @@ -15,8 +15,7 @@ libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \
>  			s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \
>  			s_modff-power5+ s_modff-ppc32 s_logbl-power7 \
>  			s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \
> -			s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \
> -			e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32
> +			s_logbf-power7 s_logbf-ppc32
>  
>  CFLAGS-s_llrintf-power6.c += -mcpu=power6
>  CFLAGS-s_llrintf-ppc32.c += -mcpu=power4
> @@ -35,8 +34,6 @@ CFLAGS-s_modff-power5+.c = -mcpu=power5+
>  CFLAGS-s_logbl-power7.c = -mcpu=power7
>  CFLAGS-s_logb-power7.c = -mcpu=power7
>  CFLAGS-s_logbf-power7.c = -mcpu=power7
> -CFLAGS-e_hypot-power7.c = -mcpu=power7
> -CFLAGS-e_hypotf-power7.c = -mcpu=power7
>  
>  # These files quiet sNaNs in a way that is optimized away without
>  # -fsignaling-nans.
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
> deleted file mode 100644
> index 382b4a0b27..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee_hypot() POWER7 version.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypot __ieee754_hypot_power7
> -
> -#include <sysdeps/powerpc/fpu/e_hypot.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
> deleted file mode 100644
> index abb14d5469..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee_hypot() PowerPC32 version.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypot __ieee754_hypot_ppc32
> -
> -#include <sysdeps/powerpc/fpu/e_hypot.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
> deleted file mode 100644
> index a16efa350c..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
> +++ /dev/null
> @@ -1,33 +0,0 @@
> -/* Multiple versions of ieee754_hypot.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <math_ldbl_opt.h>
> -#include <libm-alias-finite.h>
> -#include "init-arch.h"
> -
> -extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden;
> -extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden;
> -
> -libc_ifunc (__ieee754_hypot,
> -	    (hwcap & PPC_FEATURE_ARCH_2_06)
> -	    ? __ieee754_hypot_power7
> -            : __ieee754_hypot_ppc32);
> -
> -libm_alias_finite (__ieee754_hypot, __hypot)
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
> deleted file mode 100644
> index f8a26ff22f..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee754_hypot POWER7 version.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypotf __ieee754_hypotf_power7
> -
> -#include <sysdeps/powerpc/fpu/e_hypotf.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
> deleted file mode 100644
> index b13f8c9db2..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* __ieee_hypot() PowerPC32 version.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -
> -#define __ieee754_hypotf __ieee754_hypotf_ppc32
> -
> -#include <sysdeps/ieee754/flt-32/e_hypotf.c>
> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
> deleted file mode 100644
> index 1e72605db8..0000000000
> --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
> +++ /dev/null
> @@ -1,33 +0,0 @@
> -/* Multiple versions of ieee754_hypotf.
> -   Copyright (C) 2013-2021 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <https://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -#include <math_private.h>
> -#include <math_ldbl_opt.h>
> -#include <libm-alias-finite.h>
> -#include "init-arch.h"
> -
> -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden;
> -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden;
> -
> -libc_ifunc (__ieee754_hypotf,
> -	    (hwcap & PPC_FEATURE_ARCH_2_06)
> -	    ? __ieee754_hypotf_power7
> -            : __ieee754_hypotf_ppc32);
> -
> -libm_alias_finite (__ieee754_hypotf, __hypotf)
>
  
Paul A. Clarke Dec. 6, 2021, 9:29 p.m. UTC | #2
On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote:
> On 02/12/2021 21:00, Adhemerval Zanella wrote:
> > The generic implementation is shows only slight worse performance:
> > 
> > POWER9     reciprocal-throughput    latency
> > master                   13.4024    14.0967
> > new hypot                14.8479    15.8061
> > 
> > POWER8     reciprocal-throughput    latency
> > master                   15.5767    16.8885
> > new hypot                16.5371    18.4057
> > 
> > One way to improve might to make gcc generate xsmaxdp/xsmindp for
> > fmax/fmin (it onl does for -ffast-math, clang does for default
> > options).
> > 
> > Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
> > (power9).
> 
> Hi Tulio/Paul,
> 
> This the only missing patch in this set and I would like to check with you,
> powerpc maintainers, that if it would be ok to push it.  The resulting
> performance difference, including the latest one that removes the wrappers,
> is slight better:
> 
>          
> POWER9           reciprocal-throughput     latency
> master                          13.4024    14.0967
> new hypot                       11.9206    13.9871
> 
> POWER8            reciprocal-throughput    latency
> master                          15.5767    16.8885
> new hypot                       15.3541    18.0856

For Power10 / master:

  "hypot": {
   "workload-random": {
    "duration": 5.28242e+08,
    "iterations": 4.8e+07,
    "reciprocal-throughput": 8.28478,
    "latency": 13.7253,
    "max-throughput": 1.20703e+08,
    "min-throughput": 7.2858e+07
   }
  }

For Power10 / new hypot:

  "hypot": {
   "workload-random": {
    "duration": 5.30731e+08,
    "iterations": 5.2e+07,
    "reciprocal-throughput": 7.21945,
    "latency": 13.1933,
    "max-throughput": 1.38515e+08,
    "min-throughput": 7.57963e+07
   }
  }

> The POWER8 çatency difference seems to be due branch misprediction 
> in the max/min selection.  In fact, if I use xsmaxdp/xsmindp on the
> USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:
> 
> POWER8            reciprocal-throughput    latency
> xsmaxdp/xsmindp                 12.8959    16.2082
> 
> POWER9 is not affected (I don't see any performance difference by
> using xsmaxdp/xsmindp).
> 
> The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
> for some reason, clang use them on default -O2 option.

I believe clang may be wrong here, in that the instructions do not
properly handle NaN for the fmin/fmax semantics without -ffast-math.

PC
  
Adhemerval Zanella Dec. 7, 2021, 1:19 p.m. UTC | #3
On 06/12/2021 18:29, Paul A. Clarke wrote:
> On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote:
>> On 02/12/2021 21:00, Adhemerval Zanella wrote:
>>> The generic implementation is shows only slight worse performance:
>>>
>>> POWER9     reciprocal-throughput    latency
>>> master                   13.4024    14.0967
>>> new hypot                14.8479    15.8061
>>>
>>> POWER8     reciprocal-throughput    latency
>>> master                   15.5767    16.8885
>>> new hypot                16.5371    18.4057
>>>
>>> One way to improve might to make gcc generate xsmaxdp/xsmindp for
>>> fmax/fmin (it onl does for -ffast-math, clang does for default
>>> options).
>>>
>>> Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
>>> (power9).
>>
>> Hi Tulio/Paul,
>>
>> This the only missing patch in this set and I would like to check with you,
>> powerpc maintainers, that if it would be ok to push it.  The resulting
>> performance difference, including the latest one that removes the wrappers,
>> is slight better:
>>
>>          
>> POWER9           reciprocal-throughput     latency
>> master                          13.4024    14.0967
>> new hypot                       11.9206    13.9871
>>
>> POWER8            reciprocal-throughput    latency
>> master                          15.5767    16.8885
>> new hypot                       15.3541    18.0856
> 
> For Power10 / master:
> 
>   "hypot": {
>    "workload-random": {
>     "duration": 5.28242e+08,
>     "iterations": 4.8e+07,
>     "reciprocal-throughput": 8.28478,
>     "latency": 13.7253,
>     "max-throughput": 1.20703e+08,
>     "min-throughput": 7.2858e+07
>    }
>   }
> 
> For Power10 / new hypot:
> 
>   "hypot": {
>    "workload-random": {
>     "duration": 5.30731e+08,
>     "iterations": 5.2e+07,
>     "reciprocal-throughput": 7.21945,
>     "latency": 13.1933,
>     "max-throughput": 1.38515e+08,
>     "min-throughput": 7.57963e+07
>    }
>   }

So I assume it would be ok to push this patch based on the power10
performance numbers, right?

> 
>> The POWER8 çatency difference seems to be due branch misprediction 
>> in the max/min selection.  In fact, if I use xsmaxdp/xsmindp on the
>> USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:
>>
>> POWER8            reciprocal-throughput    latency
>> xsmaxdp/xsmindp                 12.8959    16.2082
>>
>> POWER9 is not affected (I don't see any performance difference by
>> using xsmaxdp/xsmindp).
>>
>> The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
>> for some reason, clang use them on default -O2 option.
> 
> I believe clang may be wrong here, in that the instructions do not
> properly handle NaN for the fmin/fmax semantics without -ffast-math.

At least on power8/power9 I don't see any test-double-{fmax,fmin} 
regressions if I use xsmaxdp/xsmindp instead of default implementation.
And libm-test-{fmax,fmin}.inc does extensively tests both default
and signaling NaNs.
  

Patch

diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c
deleted file mode 100644
index f96c589bbd..0000000000
--- a/sysdeps/powerpc/fpu/e_hypot.c
+++ /dev/null
@@ -1,87 +0,0 @@ 
-/* Pythagorean addition using doubles
-   Copyright (C) 2011-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Library General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Library General Public License for more details.
-
-   You should have received a copy of the GNU Library General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math-underflow.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypot(x,y)
- *
- * This a FP only version without any FP->INT conversion.
- * It is similar to default C version, making appropriates
- * overflow and underflows checks as well scaling when it
- * is needed.
- */
-
-double
-__ieee754_hypot (double x, double y)
-{
-  if ((isinf (x) || isinf (y))
-      && !issignaling (x) && !issignaling (y))
-    return INFINITY;
-  if (isnan (x) || isnan (y))
-    return x + y;
-
-  x = fabs (x);
-  y = fabs (y);
-
-  if (y > x)
-    {
-      double t = x;
-      x = y;
-      y = t;
-    }
-  if (y == 0.0)
-    return x;
-
-  /* if y is higher enough, y * 2^60 might overflow. The tests if
-     y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the
-     appropriate check to avoid the overflow exception generation.  */
-  if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60))
-    return x + y;
-
-  if (x > 0x1p+500)
-    {
-      x *= 0x1p-600;
-      y *= 0x1p-600;
-      return sqrt (x * x + y * y) / 0x1p-600;
-    }
-  if (y < 0x1p-500)
-    {
-      if (y <= 0x0.fffffffffffffp-1022)
-	{
-	  x *= 0x1p+1022;
-	  y *= 0x1p+1022;
-	  double ret = sqrt (x * x + y * y) / 0x1p+1022;
-	  math_check_force_underflow_nonneg (ret);
-	  return ret;
-	}
-      else
-	{
-	  x *= 0x1p+600;
-	  y *= 0x1p+600;
-	  return sqrt (x * x + y * y) / 0x1p+600;
-	}
-    }
-  return sqrt (x * x + y * y);
-}
-#ifndef __ieee754_hypot
-libm_alias_finite (__ieee754_hypot, __hypot)
-#endif
diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c
deleted file mode 100644
index fa201dda51..0000000000
--- a/sysdeps/powerpc/fpu/e_hypotf.c
+++ /dev/null
@@ -1,78 +0,0 @@ 
-/* Pythagorean addition using floats
-   Copyright (C) 2011-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Library General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Library General Public License for more details.
-
-   You should have received a copy of the GNU Library General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypotf(x,y)
-
-   This a FP only version without any FP->INT conversion.
-   It is similar to default C version, making appropriates
-   overflow and underflows checks as using double precision
-   instead of scaling.  */
-
-#ifdef _ARCH_PWR7
-/* POWER7 isinf and isnan optimizations are fast. */
-# define TEST_INF_NAN(x, y)                                      \
-   if ((isinff(x) || isinff(y))					 \
-       && !issignaling (x) && !issignaling (y))			 \
-     return INFINITY;                                            \
-   if (isnanf(x) || isnanf(y))                                   \
-     return x + y;
-# else
-/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are
- * costly (especially for POWER6). */
-# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2)                         \
- do {                                                            \
-   ieee_float_shape_type gf_u1;                                  \
-   ieee_float_shape_type gf_u2;                                  \
-   gf_u1.value = (f1);                                           \
-   gf_u2.value = (f2);                                           \
-   (i1) = gf_u1.word & 0x7fffffff;                               \
-   (i2) = gf_u2.word & 0x7fffffff;                               \
- } while (0)
-
-# define TEST_INF_NAN(x, y)                                      \
- do {                                                            \
-   uint32_t hx, hy;                                              \
-   GET_TWO_FLOAT_WORD(x, y, hx, hy);                             \
-   if (hy > hx) {                                                \
-     uint32_t ht = hx; hx = hy; hy = ht;                         \
-   }                                                             \
-   if (hx >= 0x7f800000) {                                       \
-     if ((hx == 0x7f800000 || hy == 0x7f800000)			 \
-	 && !issignaling (x) && !issignaling (y))		 \
-       return INFINITY;                                          \
-     return x + y;						 \
-   }                                                             \
- } while (0)
-#endif
-
-
-float
-__ieee754_hypotf (float x, float y)
-{
-  TEST_INF_NAN (x, y);
-
-  return sqrt ((double) x * x + (double) y * y);
-}
-#ifndef __ieee754_hypotf
-libm_alias_finite (__ieee754_hypotf, __hypotf)
-#endif
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
index 60f2c95532..1de0f9b350 100644
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
@@ -15,8 +15,7 @@  libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \
 			s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \
 			s_modff-power5+ s_modff-ppc32 s_logbl-power7 \
 			s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \
-			s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \
-			e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32
+			s_logbf-power7 s_logbf-ppc32
 
 CFLAGS-s_llrintf-power6.c += -mcpu=power6
 CFLAGS-s_llrintf-ppc32.c += -mcpu=power4
@@ -35,8 +34,6 @@  CFLAGS-s_modff-power5+.c = -mcpu=power5+
 CFLAGS-s_logbl-power7.c = -mcpu=power7
 CFLAGS-s_logb-power7.c = -mcpu=power7
 CFLAGS-s_logbf-power7.c = -mcpu=power7
-CFLAGS-e_hypot-power7.c = -mcpu=power7
-CFLAGS-e_hypotf-power7.c = -mcpu=power7
 
 # These files quiet sNaNs in a way that is optimized away without
 # -fsignaling-nans.
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
deleted file mode 100644
index 382b4a0b27..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@ 
-/* __ieee_hypot() POWER7 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_power7
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
deleted file mode 100644
index abb14d5469..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@ 
-/* __ieee_hypot() PowerPC32 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_ppc32
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
deleted file mode 100644
index a16efa350c..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
+++ /dev/null
@@ -1,33 +0,0 @@ 
-/* Multiple versions of ieee754_hypot.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypot,
-	    (hwcap & PPC_FEATURE_ARCH_2_06)
-	    ? __ieee754_hypot_power7
-            : __ieee754_hypot_ppc32);
-
-libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
deleted file mode 100644
index f8a26ff22f..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@ 
-/* __ieee754_hypot POWER7 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_power7
-
-#include <sysdeps/powerpc/fpu/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
deleted file mode 100644
index b13f8c9db2..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@ 
-/* __ieee_hypot() PowerPC32 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_ppc32
-
-#include <sysdeps/ieee754/flt-32/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
deleted file mode 100644
index 1e72605db8..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
+++ /dev/null
@@ -1,33 +0,0 @@ 
-/* Multiple versions of ieee754_hypotf.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypotf,
-	    (hwcap & PPC_FEATURE_ARCH_2_06)
-	    ? __ieee754_hypotf_power7
-            : __ieee754_hypotf_ppc32);
-
-libm_alias_finite (__ieee754_hypotf, __hypotf)