diff mbox series

[v3,5/7] math: Remove powerpc e_hypot

Message ID	20211101202059.1026032-6-adhemerval.zanella@linaro.org
State	Superseded
Headers
	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 59BA53858039
	To: libc-alpha@sourceware.org
	Subject: [PATCH v3 5/7] math: Remove powerpc e_hypot
	Date: Mon, 1 Nov 2021 17:20:57 -0300
	Message-Id: <20211101202059.1026032-6-adhemerval.zanella@linaro.org>
	In-Reply-To: <20211101202059.1026032-1-adhemerval.zanella@linaro.org>
	References: <20211101202059.1026032-1-adhemerval.zanella@linaro.org>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: list
	From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
	Reply-To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
	Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
	Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
	Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
Series	Improve hypot()
	[v3,0/7] Improve hypot()
	[v3,1/7] math: Simplify hypotf implementation
	[v3,2/7] math: Use an improved algorithm for hypot (dbl-64)
	[v3,3/7] math: Use an improved algorithm for hypotl (ldbl-96)
	[v3,4/7] math: Use an improved algorithm for hypotl (ldbl-128)
	[v3,5/7] math: Remove powerpc e_hypot
	[v3,6/7] i386: Move hypot implementation to C
	[v3,7/7] math: Remove the error handling wrapper from hypot and hypotf

Checks

Context	Check	Description
dj/TryBot-apply_patch	success	Patch applied to master at the time it was sent

Commit Message

Adhemerval Zanella Netto Nov. 1, 2021, 8:20 p.m. UTC

powerpc64 does show a big performance difference (which is mitigated slightly
by the last patch in the series):

power9 master:

  "hypot": {
   "workload-random": {
    "duration": 5.41027e+08,
    "iterations": 3.6e+07,
    "reciprocal-throughput": 14.6467,
    "latency": 15.4103,
    "max-throughput": 6.82746e+07,
    "min-throughput": 6.48916e+07
   }
  }

power9 patched:

  "hypot": {
   "workload-random": {
    "duration": 5.37776e+08,
    "iterations": 2.4e+07,
    "reciprocal-throughput": 19.915,
    "latency": 24.8997,
    "max-throughput": 5.02135e+07,
    "min-throughput": 4.01612e+07
   }
  }

power8 master:

  "hypot": {
   "workload-random": {
    "duration": 5.27104e+08,
    "iterations": 3.2e+07,
    "reciprocal-throughput": 15.872,
    "latency": 17.072,
    "max-throughput": 6.3004e+07,
    "min-throughput": 5.85755e+07
   }
  }

power8 patched:

  "hypot": {
   "workload-random": {
    "duration": 5.21972e+08,
    "iterations": 2.2e+07,
    "reciprocal-throughput": 21.8583,
    "latency": 25.5937,
    "max-throughput": 4.57493e+07,
    "min-throughput": 3.9072e+07
   }
  }

Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
---
 sysdeps/powerpc/fpu/e_hypot.c                 | 87 -------------------
 sysdeps/powerpc/fpu/e_hypotf.c                | 78 -----------------
 .../powerpc32/power4/fpu/multiarch/Makefile   |  5 +-
 .../power4/fpu/multiarch/e_hypot-power7.c     | 23 -----
 .../power4/fpu/multiarch/e_hypot-ppc32.c      | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypot.c  | 33 -------
 .../power4/fpu/multiarch/e_hypotf-power7.c    | 23 -----
 .../power4/fpu/multiarch/e_hypotf-ppc32.c     | 23 -----
 .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 -------
 9 files changed, 1 insertion(+), 327 deletions(-)
 delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c
 delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c

Comments

Paul A. Clarke Nov. 9, 2021, 7:28 p.m. UTC | #1

On Mon, Nov 01, 2021 at 05:20:57PM -0300, Adhemerval Zanella via Libc-alpha wrote:
> The powerpc64 does show a big difference (which is mitigated slightly
> by the last patch):
> 
> power9 master:
> 
>   "hypot": {
>    "workload-random": {
>     "duration": 5.41027e+08,
>     "iterations": 3.6e+07,
>     "reciprocal-throughput": 14.6467,
>     "latency": 15.4103,
>     "max-throughput": 6.82746e+07,
>     "min-throughput": 6.48916e+07
>    }
>   }
> 
> power9 patched:
> 
>   "hypot": {
>    "workload-random": {
>     "duration": 5.37776e+08,
>     "iterations": 2.4e+07,
>     "reciprocal-throughput": 19.915,
>     "latency": 24.8997,
>     "max-throughput": 5.02135e+07,
>     "min-throughput": 4.01612e+07
>    }
>   }
> 
> power8 master:
> 
>   "hypot": {
>    "workload-random": {
>     "duration": 5.27104e+08,
>     "iterations": 3.2e+07,
>     "reciprocal-throughput": 15.872,
>     "latency": 17.072,
>     "max-throughput": 6.3004e+07,
>     "min-throughput": 5.85755e+07
>    }
>   }
> 
> power8 patched:
> 
>   "hypot": {
>    "workload-random": {
>     "duration": 5.21972e+08,
>     "iterations": 2.2e+07,
>     "reciprocal-throughput": 21.8583,
>     "latency": 25.5937,
>     "max-throughput": 4.57493e+07,
>     "min-throughput": 3.9072e+07
>    }
>   }

The patches see poorer performance on Power10, also:

Power10 unpatched:

  "hypot": {
   "workload-random": {
    "duration": 5.33875e+08,
    "iterations": 4.8e+07,
    "reciprocal-throughput": 8.47089,
    "latency": 13.7739,
    "max-throughput": 1.18051e+08,
    "min-throughput": 7.2601e+07
   }
  }

Power10 patched:

  "hypot": {
   "workload-random": {
    "duration": 5.34469e+08,
    "iterations": 3.8e+07,
    "reciprocal-throughput": 9.80905,
    "latency": 18.3209,
    "max-throughput": 1.01947e+08,
    "min-throughput": 5.45825e+07
   }
  }
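
To put the figures above in perspective, the relative latency increase can be
computed as (patched - unpatched) / unpatched. The helper below is a minimal
sketch of that arithmetic; its name is hypothetical and it is not part of the
glibc benchtests.

```c
/* Hypothetical helper: percentage increase from BEFORE to AFTER.
   Assumes BEFORE is nonzero.  */
static double
relative_increase_pct (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

Applied to the "latency" values quoted in this thread, this gives roughly 62%
on power9 (15.4103 -> 24.8997), 50% on power8 (17.072 -> 25.5937) and 33% on
Power10 (13.7739 -> 18.3209).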

PC

Wilco Dijkstra Nov. 10, 2021, 2:34 p.m. UTC | #2

Hi Paul,

> The patches see poorer performance on Power10, also:

Was that with the last patch to remove the redirects as well? The latest version
is still slower even with that on AArch64...

I think the new algorithm will always be slower due to the dependent sqrt and
division. So it's hard to improve unless we only use it for special cases (eg. when
ax and ay are close). Returning sqrt (fma (ax, ax, ay * ay)) is about twice as fast
and gives just over 1 ULP, so we're losing a lot of performance for a small ULP
improvement.

Cheers,
Wilco

Paul A. Clarke Nov. 10, 2021, 2:43 p.m. UTC | #3

On Wed, Nov 10, 2021 at 02:34:09PM +0000, Wilco Dijkstra wrote:
> > The patches see poorer performance on Power10, also:
> 
> Was that with the last patch to remove the redirects as well? The latest version
> is still slower even with that on AArch64...

Sorry that wasn't clear.  I tested with all 7 patches in the series applied.

PC

> I think the new algorithm will always be slower due to the dependent sqrt and
> division. So it's hard to improve unless we only use it for special cases (eg. when
> ax and ay are close). Returning sqrt (fma (ax, ax, ay * ay)) is about twice as fast
> and gives just over 1 ULP, so we're losing a lot of performance for a small ULP
> improvement.

Adhemerval Zanella Netto Nov. 10, 2021, 2:47 p.m. UTC | #4

On 10/11/2021 11:34, Wilco Dijkstra wrote:
> Hi Paul,
> 
>> The patches see poorer performance on Power10, also:
> 
> Was that with the last patch to remove the redirects as well? The latest version
> is still slower even with that on AArch64...
> 
> I think the new algorithm will always be slower due to the dependent sqrt and
> division. So it's hard to improve unless we only use it for special cases (eg. when
> ax and ay are close). Returning sqrt (fma (ax, ax, ay * ay)) is about twice as fast
> and gives just over 1 ULP, so we're losing a lot of performance for a small ULP
> improvement.

My main motivation for this change is to remove the arch-specific implementations
in favor of one that the compiler might optimize better, without the need for
extra hacks through arch-specific hooks (as I did for power7).

So I used a more precise implementation; the paper, at least, states that
using fma might offset some of the performance cost of the more precise
algorithm.

Another option is to use the powerpc implementation, which favors FP over integer
operations, as the default one.

Wilco Dijkstra Nov. 11, 2021, 5:05 p.m. UTC | #5

Hi Adhemerval,

> On 10/11/2021 11:34, Wilco Dijkstra wrote:
>> I think the new algorithm will always be slower due to the dependent sqrt and
>> division. So it's hard to improve unless we only use it for special cases (eg. when
>> ax and ay are close). Returning sqrt (fma (ax, ax, ay * ay)) is about twice as fast
>> and gives just over 1 ULP, so we're losing a lot of performance for a small ULP
>> improvement.
>
> My main drive for this change is remove the arch-specific implementation in
> favor of an implementation that might be optimized better by the compiler
> without the need to extra hacks by arch-specific hooks (as I did for power7).

I'm all for having a single optimized generic implementation like we did for other
math functions. In general there is little scope for compiler optimizations due to
conservative FP settings - it is all down to highly optimizing both the algorithm
and implementation.

> Another option is to use the powerpc implementation which favor FP over integer
> as the default one.

That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
and ~1.21ULP without FMA), so I'm not sure that would be acceptable.

I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.

Cheers,
Wilco

Adhemerval Zanella Netto Nov. 11, 2021, 5:13 p.m. UTC | #6

On 11/11/2021 14:05, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
>> On 10/11/2021 11:34, Wilco Dijkstra wrote:
>>> I think the new algorithm will always be slower due to the dependent sqrt and
>>> division. So it's hard to improve unless we only use it for special cases (eg. when
>>> ax and ay are close). Returning sqrt (fma (ax, ax, ay * ay)) is about twice as fast
>>> and gives just over 1 ULP, so we're losing a lot of performance for a small ULP
>>> improvement.
>>
>> My main drive for this change is remove the arch-specific implementation in
>> favor of an implementation that might be optimized better by the compiler
>> without the need to extra hacks by arch-specific hooks (as I did for power7).
> 
> I'm all for having a single optimized generic implementation like we did for other
> math functions. In general there is little scope for compiler optimizations due to
> conservative FP settings - it is all down to highly optimizing both the algorithm
> and implementation.
> 
>> Another option is to use the powerpc implementation which favor FP over integer
>> as the default one.
> 
> That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
> and ~1.21ULP without FMA), so I'm not sure that would be acceptable.

This should not be worse than the current default (the powerpc one is essentially
the same as the default, but using FP operations).

> 
> I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
> version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.

Do you mean besides the optimized nan/inf checks? I can check if it helps on
powerpc.

Wilco Dijkstra Nov. 11, 2021, 7:48 p.m. UTC | #7

Hi Adhemerval,

>>> Another option is to use the powerpc implementation which favor FP over integer
>>> as the default one.
>> 
>> That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
>> and ~1.21ULP without FMA), so I'm not sure that would be acceptable.
>
> This should not be worse than the current default (the powerpc one is essentially
> the same as default using FP operations).

The generic version carefully computes x * x + y * y with higher accuracy so that
the sqrt stays below 1.0ULP. The powerpc version doesn't and so goes over 1.0ULP.

>> I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
>> version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.
>
> Do you mean besides the optimized nan/inf checks? I can check if it helps on
> powerpc.

Yes. I avoid the unnecessary checks at the end by doing everything in the 3 main
cases. The division can be made independent of the sqrt so they run in parallel on
modern cores.

However we can do even better with FMA and remove the division entirely by 
special casing the difficult case where x and y are really close. This has only 3.5%
higher latency than the powerpc version, so that's the fastest option below 1.0ULP.
I'll see whether it could work without FMA too and send you something to benchmark
if it passes the testsuite.

Cheers,
Wilco

Adhemerval Zanella Netto Nov. 11, 2021, 8:54 p.m. UTC | #8

On 11/11/2021 16:48, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
>>>> Another option is to use the powerpc implementation which favor FP over integer
>>>> as the default one.
>>>
>>> That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
>>> and ~1.21ULP without FMA), so I'm not sure that would be acceptable.
>>
>> This should not be worse than the current default (the powerpc one is essentially
>> the same as default using FP operations).
> 
> The generic version carefully computes x * x + y * y with higher accuracy so that
> the sqrt stays below 1.0ULP. The powerpc version doesn't and so goes over 1.0ULP.

For *hypotf* they are essentially the same; the powerpc one just tries to optimize
the isinf/isnan checks because of the FP->GPR hazards.  I think there is currently
no justification for TEST_INF_NAN; it would be better to use your suggestion
in the default algorithm and just remove the powerpc one:

  if (!isfinite(x) || !isfinite(y))
     {
       a = x; b = y;
       if ((isinf (x) || isinf (y))
	  && !issignaling_inline (x) && !issignaling_inline (y))
	return INFINITY;
       return x + y;
     }
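
A self-contained variant of this check, using only standard C99 macros, might
look like the sketch below. It is a simplification: the helper name is
hypothetical, and signaling NaNs are deliberately not distinguished (the
fragment above relies on the glibc-internal issignaling_inline for that).

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: if either input is non-finite, store the hypot
   result in *OUT and return true; otherwise return false.
   Signaling NaNs are not treated specially in this sketch.  */
static bool
hypot_nonfinite (double x, double y, double *out)
{
  if (isfinite (x) && isfinite (y))
    return false;
  if (isinf (x) || isinf (y))
    *out = INFINITY;	/* An infinity wins even over a quiet NaN.  */
  else
    *out = x + y;	/* Propagate the NaN.  */
  return true;
}
```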

> 
>>> I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
>>> version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.
>>
>> Do you mean besides the optimized nan/inf checks? I can check if it helps on
>> powerpc.
> 
> Yes. I avoid the unnecessary checks at the end by doing everything in the 3 main
> cases. The division can be made independent of the sqrt so they run in parallel on
> modern cores.
> 
> However we can do even better with FMA and remove the division entirely by 
> special casing the difficult case where x and y are really close. This has only 3.5%
> higher latency than the powerpc version, so that's the fastest option below 1.0ULP.
> I'll see whether it could work without FMA too and send you something to benchmark
> if it passes the testsuite.

The original paper does have a version that uses fma, but it aims to be correctly
rounded:

  double h2 = h * h;
  double ax2 = ax * ax;
  h -= (__builtin_fma (-ay, ay, h2 - ax2)
       + __builtin_fma (h, h, -h2)
       - __builtin_fma (ax, ax, -ax2)) / (2.0 * h);
  return h * scale;

However, at least on recent x86_64, I did not see much improvement over the
non-fma version. Maybe we can come up with a version that is not necessarily
correctly rounded but can leverage fma when __FP_FAST_FMA is defined.

(Also, this version does not fully pass the testsuite: it triggers some underflow
exceptions that I have not investigated.)

Patch

diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c
deleted file mode 100644
index f96c589bbd..0000000000
--- a/sysdeps/powerpc/fpu/e_hypot.c
+++ /dev/null
@@ -1,87 +0,0 @@ 
-/* Pythagorean addition using doubles
-   Copyright (C) 2011-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Library General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Library General Public License for more details.
-
-   You should have received a copy of the GNU Library General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math-underflow.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypot(x,y)
- *
- * This a FP only version without any FP->INT conversion.
- * It is similar to default C version, making appropriates
- * overflow and underflows checks as well scaling when it
- * is needed.
- */
-
-double
-__ieee754_hypot (double x, double y)
-{
-  if ((isinf (x) || isinf (y))
-      && !issignaling (x) && !issignaling (y))
-    return INFINITY;
-  if (isnan (x) || isnan (y))
-    return x + y;
-
-  x = fabs (x);
-  y = fabs (y);
-
-  if (y > x)
-    {
-      double t = x;
-      x = y;
-      y = t;
-    }
-  if (y == 0.0)
-    return x;
-
-  /* if y is higher enough, y * 2^60 might overflow. The tests if
-     y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the
-     appropriate check to avoid the overflow exception generation.  */
-  if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60))
-    return x + y;
-
-  if (x > 0x1p+500)
-    {
-      x *= 0x1p-600;
-      y *= 0x1p-600;
-      return sqrt (x * x + y * y) / 0x1p-600;
-    }
-  if (y < 0x1p-500)
-    {
-      if (y <= 0x0.fffffffffffffp-1022)
-	{
-	  x *= 0x1p+1022;
-	  y *= 0x1p+1022;
-	  double ret = sqrt (x * x + y * y) / 0x1p+1022;
-	  math_check_force_underflow_nonneg (ret);
-	  return ret;
-	}
-      else
-	{
-	  x *= 0x1p+600;
-	  y *= 0x1p+600;
-	  return sqrt (x * x + y * y) / 0x1p+600;
-	}
-    }
-  return sqrt (x * x + y * y);
-}
-#ifndef __ieee754_hypot
-libm_alias_finite (__ieee754_hypot, __hypot)
-#endif
diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c
deleted file mode 100644
index fa201dda51..0000000000
--- a/sysdeps/powerpc/fpu/e_hypotf.c
+++ /dev/null
@@ -1,78 +0,0 @@ 
-/* Pythagorean addition using floats
-   Copyright (C) 2011-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Library General Public License as
-   published by the Free Software Foundation; either version 2 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Library General Public License for more details.
-
-   You should have received a copy of the GNU Library General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <stdint.h>
-#include <libm-alias-finite.h>
-
-/* __ieee754_hypotf(x,y)
-
-   This a FP only version without any FP->INT conversion.
-   It is similar to default C version, making appropriates
-   overflow and underflows checks as using double precision
-   instead of scaling.  */
-
-#ifdef _ARCH_PWR7
-/* POWER7 isinf and isnan optimizations are fast. */
-# define TEST_INF_NAN(x, y)                                      \
-   if ((isinff(x) || isinff(y))					 \
-       && !issignaling (x) && !issignaling (y))			 \
-     return INFINITY;                                            \
-   if (isnanf(x) || isnanf(y))                                   \
-     return x + y;
-# else
-/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are
- * costly (especially for POWER6). */
-# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2)                         \
- do {                                                            \
-   ieee_float_shape_type gf_u1;                                  \
-   ieee_float_shape_type gf_u2;                                  \
-   gf_u1.value = (f1);                                           \
-   gf_u2.value = (f2);                                           \
-   (i1) = gf_u1.word & 0x7fffffff;                               \
-   (i2) = gf_u2.word & 0x7fffffff;                               \
- } while (0)
-
-# define TEST_INF_NAN(x, y)                                      \
- do {                                                            \
-   uint32_t hx, hy;                                              \
-   GET_TWO_FLOAT_WORD(x, y, hx, hy);                             \
-   if (hy > hx) {                                                \
-     uint32_t ht = hx; hx = hy; hy = ht;                         \
-   }                                                             \
-   if (hx >= 0x7f800000) {                                       \
-     if ((hx == 0x7f800000 || hy == 0x7f800000)			 \
-	 && !issignaling (x) && !issignaling (y))		 \
-       return INFINITY;                                          \
-     return x + y;						 \
-   }                                                             \
- } while (0)
-#endif
-
-
-float
-__ieee754_hypotf (float x, float y)
-{
-  TEST_INF_NAN (x, y);
-
-  return sqrt ((double) x * x + (double) y * y);
-}
-#ifndef __ieee754_hypotf
-libm_alias_finite (__ieee754_hypotf, __hypotf)
-#endif
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
index 60f2c95532..1de0f9b350 100644
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
@@ -15,8 +15,7 @@  libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \
 			s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \
 			s_modff-power5+ s_modff-ppc32 s_logbl-power7 \
 			s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \
-			s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \
-			e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32
+			s_logbf-power7 s_logbf-ppc32
 
 CFLAGS-s_llrintf-power6.c += -mcpu=power6
 CFLAGS-s_llrintf-ppc32.c += -mcpu=power4
@@ -35,8 +34,6 @@  CFLAGS-s_modff-power5+.c = -mcpu=power5+
 CFLAGS-s_logbl-power7.c = -mcpu=power7
 CFLAGS-s_logb-power7.c = -mcpu=power7
 CFLAGS-s_logbf-power7.c = -mcpu=power7
-CFLAGS-e_hypot-power7.c = -mcpu=power7
-CFLAGS-e_hypotf-power7.c = -mcpu=power7
 
 # These files quiet sNaNs in a way that is optimized away without
 # -fsignaling-nans.
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
deleted file mode 100644
index 382b4a0b27..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@ 
-/* __ieee_hypot() POWER7 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_power7
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
deleted file mode 100644
index abb14d5469..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@ 
-/* __ieee_hypot() PowerPC32 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypot __ieee754_hypot_ppc32
-
-#include <sysdeps/powerpc/fpu/e_hypot.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
deleted file mode 100644
index a16efa350c..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
+++ /dev/null
@@ -1,33 +0,0 @@ 
-/* Multiple versions of ieee754_hypot.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypot,
-	    (hwcap & PPC_FEATURE_ARCH_2_06)
-	    ? __ieee754_hypot_power7
-            : __ieee754_hypot_ppc32);
-
-libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
deleted file mode 100644
index f8a26ff22f..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c
+++ /dev/null
@@ -1,23 +0,0 @@ 
-/* __ieee754_hypot POWER7 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_power7
-
-#include <sysdeps/powerpc/fpu/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
deleted file mode 100644
index b13f8c9db2..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c
+++ /dev/null
@@ -1,23 +0,0 @@ 
-/* __ieee_hypot() PowerPC32 version.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#define __ieee754_hypotf __ieee754_hypotf_ppc32
-
-#include <sysdeps/ieee754/flt-32/e_hypotf.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
deleted file mode 100644
index 1e72605db8..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c
+++ /dev/null
@@ -1,33 +0,0 @@ 
-/* Multiple versions of ieee754_hypotf.
-   Copyright (C) 2013-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-#include <math_private.h>
-#include <math_ldbl_opt.h>
-#include <libm-alias-finite.h>
-#include "init-arch.h"
-
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden;
-extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden;
-
-libc_ifunc (__ieee754_hypotf,
-	    (hwcap & PPC_FEATURE_ARCH_2_06)
-	    ? __ieee754_hypotf_power7
-            : __ieee754_hypotf_ppc32);
-
-libm_alias_finite (__ieee754_hypotf, __hypotf)