Patchwork [1/3] powerpc: remove power6 wcscpy optimization

Submitter Adhemerval Zanella Netto
Date March 4, 2019, 7:43 p.m.
Message ID <20190304194350.16116-1-adhemerval.zanella@linaro.org>
Permalink /patch/31712/
State New

Comments

Adhemerval Zanella Netto - March 4, 2019, 7:43 p.m.
This patch removes the power6 wcscpy optimization.  It is rarely
used in real-world code, the optimizations could potentially be implemented
in generic code (which would benefit more than just powerpc), and
the resulting power6 and power7 binaries are essentially the same.

Checked on powerpc64-linux-gnu.

	* sysdeps/powerpc/power6/wcscpy.c: Remove file.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise.
	* sysdeps/powerpc/powerpc64/power6/wcscpy.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
	[$(subdir) == wcsmbs] (sysdep_routines): Remove wcscpy-power7,
	wcscpy-power6, and wcscpy-ppc32.
	(CFLAGS-wcscpy-power7.c, CFLAGS-wcscpy-power6.c): Remove rule.
	* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c:
	Remove wcscpy optimizations.
	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
---
 sysdeps/powerpc/power6/wcscpy.c               | 105 ------------------
 .../powerpc32/power4/multiarch/Makefile       |   5 +-
 .../power4/multiarch/ifunc-impl-list.c        |  11 --
 .../power4/multiarch/wcscpy-power6.c          |  22 ----
 .../power4/multiarch/wcscpy-power7.c          |  22 ----
 .../powerpc32/power4/multiarch/wcscpy-ppc32.c |  27 -----
 .../powerpc32/power4/multiarch/wcscpy.c       |  36 ------
 sysdeps/powerpc/powerpc64/multiarch/Makefile  |   5 +-
 .../powerpc64/multiarch/ifunc-impl-list.c     |  11 --
 .../powerpc64/multiarch/wcscpy-power6.c       |  19 ----
 .../powerpc64/multiarch/wcscpy-power7.c       |  19 ----
 .../powerpc64/multiarch/wcscpy-ppc64.c        |  18 ---
 sysdeps/powerpc/powerpc64/multiarch/wcscpy.c  |  35 ------
 sysdeps/powerpc/powerpc64/power6/wcscpy.c     |   1 -
 14 files changed, 2 insertions(+), 334 deletions(-)
 delete mode 100644 sysdeps/powerpc/power6/wcscpy.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c
 delete mode 100644 sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c
 delete mode 100644 sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c
 delete mode 100644 sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c
 delete mode 100644 sysdeps/powerpc/powerpc64/multiarch/wcscpy.c
 delete mode 100644 sysdeps/powerpc/powerpc64/power6/wcscpy.c
Gabriel F. T. Gomes - March 9, 2019, 4:08 p.m.
On Mon, Mar 04 2019, Adhemerval Zanella wrote:
> This patch removes the power6 wcscpy optimization.  It is rarely
> used in real-world code, the optimizations could potentially be implemented
> in generic code (which would benefit more than just powerpc), and
> the resulting power6 and power7 binaries are essentially the same.

Although I agree that optimizations could be implemented in generic
code, and that it might benefit other architectures, I tested this patch
and it causes large performance regressions in benchtests, when run on
Power8 and Power9 machines (Power 7 numbers were OK).  Based on that
result (which I shall present and discuss later in this message), I
think this patch should not be committed to master.

I am aware that this patch has the additional benefit of easing code
maintenance, however, the performance impact on Power would be seen
immediately, while the benefit of an optimization to the generic
algorithm would need more time to implement.  Thus, I suggest that this
patch be delayed until we have such implementation.


Now, regarding the results, I have attached a few spreadsheets, each
containing the raw data collected from 'make bench' runs on different
machines, as well as comparisons between the master branch and another
branch with this patch applied.

  attachment/wcscpy-LE-P9.ods: Runs on a Power 9, little-endian machine.
  attachment/wcscpy-LE-P8.ods: Runs on a Power 8, little-endian machine.
  attachment/wcscpy-BE-P8.ods: Runs on a Power 8, big-endian machine.
  attachment/wcscpy-BE-P7.ods: Runs on a Power 7, big-endian machine.

In each of these files, each sheet is named according to the data it
contains: for instance, 'master - cpu=p8' contains the raw data
collected from a build of the master branch, which has been configured
with --with-cpu=power8.  Likewise, 'patched - cpu=p8' contains similar
data, but from a build of a branch with this patch applied.  Finally, a
sheet named 'compare p8' contains the performance comparison between the
two.  In such 'compare pN' sheets, I added a graph of the performance
comparison of the two, in which the data is sorted from worst to best
case, so that we can have a visual assessment of how good or bad the
change is for the whole set of inputs tested (I believe that this kind
of plot is common practice, but I'm not sure, hence this explanation).

In wcscpy-BE-P8.ods, wcscpy-LE-P8.ods, and wcscpy-LE-P9.ods, we can see
that, for almost all inputs, there is a performance regression (dots
below 0% are regressions).  On wcscpy-BE-P7.ods, there are more
performance gains than regressions, and I would be OK with this patch
being applied to master, if it weren't for the P8 and P9 results.
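
The worst-to-best ordering described above can be sketched as follows.  This is only an illustration: the exact formula used in the spreadsheets is not given in this message, so the relative-gain definition below is an assumption, chosen so that negative values are regressions (matching the "dots below 0%" convention).  The function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Relative change in percent between two timings of the same input.
   Assumed convention: negative means the patched build is slower.  */
double
relative_gain (double master_time, double patched_time)
{
  return (master_time - patched_time) / master_time * 100.0;
}

/* qsort comparator ordering doubles ascending (worst case first).  */
static int
compare_doubles (const void *a, const void *b)
{
  double x = *(const double *) a;
  double y = *(const double *) b;
  return (x > y) - (x < y);
}

/* Fill GAINS with the per-input relative gain and sort it from the
   worst regression to the best improvement.  */
void
sorted_gains (const double *master, const double *patched,
              double *gains, size_t n)
{
  for (size_t i = 0; i < n; i++)
    gains[i] = relative_gain (master[i], patched[i]);
  qsort (gains, n, sizeof gains[0], compare_doubles);
}
```

Plotting the sorted array then gives one curve per machine, and the fraction of points below zero shows how widespread the regression is.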

PS: I have not yet tested the other patches in this patch set.  This
message is only about this specific patch.
Adhemerval Zanella Netto - March 11, 2019, 1:02 p.m.
On 09/03/2019 13:08, Gabriel F. T. Gomes wrote:
> On Mon, Mar 04 2019, Adhemerval Zanella wrote:
>> This patch removes the power6 wcscpy optimization.  It is rarely
>> used in real-world code, the optimizations could potentially be implemented
>> in generic code (which would benefit more than just powerpc), and
>> the resulting power6 and power7 binaries are essentially the same.
> 
> Although I agree that optimizations could be implemented in generic
> code, and that it might benefit other architectures, I tested this patch
> and it causes large performance regressions in benchtests, when run on
> Power8 and Power9 machines (Power 7 numbers were OK).  Based on that
> result (which I shall present and discuss later in this message), I
> think this patch should not be committed to master.
> 
> I am aware that this patch has the additional benefit of easing code
> maintenance, however, the performance impact on Power would be seen
> immediately, while the benefit of an optimization to the generic
> algorithm would need more time to implement.  Thus, I suggest that this
> patch be delayed until we have such implementation.
> 
> 
> Now, regarding the results, I have attached a few spreadsheets, each
> containing the raw data collected from 'make bench' runs on different
> machines, as well as comparisons between the master branch and another
> branch with this patch applied.
> 
>   attachment/wcscpy-LE-P9.ods: Runs on a Power 9, little-endian machine.
>   attachment/wcscpy-LE-P8.ods: Runs on a Power 8, little-endian machine.
>   attachment/wcscpy-BE-P8.ods: Runs on a Power 8, big-endian machine.
>   attachment/wcscpy-BE-P7.ods: Runs on a Power 7, big-endian machine.
> 
> In each of these files, each sheet is named according to the data it
> contains: for instance, 'master - cpu=p8' contains the raw data
> collected from a build of the master branch, which has been configured
> with --with-cpu=power8.  Likewise, 'patched - cpu=p8' contains similar
> data, but from a build of a branch with this patch applied.  Finally, a
> sheet named 'compare p8' contains the performance comparison between the
> two.  In such 'compare pN' sheets, I added a graph of the performance
> comparison of the two, in which the data is sorted from worst to best
> case, so that we can have a visual assessment of how good or bad the
> change is for the whole set of inputs tested (I believe that this kind
> of plot is common practice, but I'm not sure, hence this explanation).
> 
> In wcscpy-BE-P8.ods, wcscpy-LE-P8.ods, and wcscpy-LE-P9.ods, we can see
> that, for almost all inputs, there is a performance regression (dots
> below 0% are regressions).  On wcscpy-BE-P7.ods, there are more
> performance gains than regressions, and I would be OK with this patch
> being applied to master, if it weren't for the P8 and P9 results.
> 
> PS: I have not yet tested the other patches in this patch set.  This
> message is only about this specific patch.
> 

Yes, I am aware this is a performance regression on power. Specifically,
wcscpy would now require three function calls: wcslen followed by
wmemcpy and memcpy (and it is worse on the powerpc64le ABI, which does
not allow tail-call optimization).
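
A minimal sketch of the call chain described above, assuming a generic wcscpy built from the standard wcslen and wmemcpy.  The name sketch_wcscpy is hypothetical and this is not claimed to be the exact glibc source.

```c
#include <wchar.h>

/* Hypothetical name; illustrates the wcslen + wmemcpy composition.  */
wchar_t *
sketch_wcscpy (wchar_t *dest, const wchar_t *src)
{
  /* First call: wcslen scans SRC for the terminator.  Second call:
     wmemcpy copies the characters plus the terminating L'\0'.  A
     wmemcpy that forwards to memcpy adds the third call.  */
  return wmemcpy (dest, src, wcslen (src) + 1);
}
```

The source string is traversed twice here (once to measure, once to copy), which is part of the cost compared with the single-pass unrolled loop being removed.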

The main question here is: should we really care about optimizing wcs*
routines at all?

  - Most uses I could find on https://codesearch.debian.net are for
    Win32 routines (which use UTF-16 by default).

  - gnulib defines the wchar module as obsolete, and even in the projects
    that use it, it is tied to Windows routines.

  - Wide char routines are inherently problematic regarding portability
    (the standard does not define the size of wchar_t).

  - Although some runtime environments might use UTF-16 by default
    (JavaScript, Java, Qt, Python), afaik they do not rely on the C runtime,
    exactly because of the previous issue, and they reimplement all string
    routines internally.

  - Most uses are not performance intensive.
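
To make the portability point concrete, here is a small sketch that reports the width of wchar_t on the current target.  The helper names are hypothetical.  wchar_t is 32 bits wide on glibc/Linux and 16 bits wide on Windows, which is why code written against one cannot assume the other.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Width of wchar_t in bits on the current target.  The C standard
   leaves this implementation-defined.  */
size_t
wchar_width_bits (void)
{
  return sizeof (wchar_t) * CHAR_BIT;
}

/* Nonzero when a single wchar_t can hold any Unicode code point
   (up to 0x10FFFF), so that no surrogate pairs are needed.  */
int
wchar_holds_any_code_point (void)
{
  return WCHAR_MAX >= 0x10FFFF;
}
```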

The architectures that do provide optimized routines for it (x86
and s390) seem to do so more to showcase the architecture's instruction
set than to address real-world cases.

Besides the issues described with wcs* routines, my main issue with
the powerpc optimization is that I really think we should avoid the kind
of optimization that tries to work around bad compiler code generation or
a lack of compiler optimization. Our recent move is to try to leverage the
compiler itself (take for instance the internal math.h refactoring
we did). Also, if I recall correctly, I tried to add loop unrolling
to the generic strcasecmp, which was refused exactly because such an
optimization should *not* be in the generic implementation (with which
I tend to agree).

So what is really the point of still providing such complexity for
powerpc for routines that are most likely not used in real-world cases?
If powerpc performance for these routines is really required, I think
I can tinker with some macro to unroll the generic routines for power only.
This will still show some performance hit on wcscpy (mostly the function
call overhead).
Gabriel F. T. Gomes - March 11, 2019, 7:04 p.m.
On Mon, Mar 11 2019, Adhemerval Zanella wrote:
> 
> Yes, I am aware this is a performance regression on power. Specifically,
> wcscpy would now require three function calls: wcslen followed by
> wmemcpy and memcpy (and it is worse on the powerpc64le ABI, which does
> not allow tail-call optimization).
> 
> The main question here is should we really care to optimize wcs* 
> routines at all? 

This question is two-fold.  On the one hand, and based on the usage
analysis you provided below, I tend to agree that we shouldn't spend
time trying to optimize them.  On the other hand, the optimization is
already in place, so I don't see a compelling reason to remove it.

I get it that the benefit of removing such optimizations is easier code
maintenance, but problems seem to be rare enough that fixing them when
they show up is something I would be willing to do
(assuming that that is really the only benefit of the code removal).
Maybe I'll regret it in the future, but then I could change my mind :).
 
>   - Most uses I could find on https://codesearch.debian.net are for
>     Win32 routines (which use UTF-16 by default).
> 
>   - gnulib defines the wchar module as obsolete, and even in the projects
>     that use it, it is tied to Windows routines.
> 
>   - Wide char routines are inherently problematic regarding portability
>     (the standard does not define the size of wchar_t).
> 
>   - Although some runtime environments might use UTF-16 by default
>     (JavaScript, Java, Qt, Python), afaik they do not rely on the C runtime,
>     exactly because of the previous issue, and they reimplement all string
>     routines internally.
> 
>   - Most uses are not performance intensive.

Thanks for doing this research.
 
> Besides the issues described with wcs* routines, my main issue with
> the powerpc optimization is that I really think we should avoid the kind
> of optimization that tries to work around bad compiler code generation or
> a lack of compiler optimization. Our recent move is to try to leverage the
> compiler itself (take for instance the internal math.h refactoring
> we did).

OK, but at the same time, this discussion is also about the fact that we
don't want to spend time to optimize wcs* functions any *further*.  One
of the arguments for the removal of the code is that the compiler should
provide such optimization, however, it currently doesn't.

> So what is really the point of still providing such complexity for
> powerpc for routines that are most likely not used in realworld cases?

I can't say that there aren't realworld cases, even though it seems to
be the case in the research you did, because we don't have access to
everybody's software.

Anyhow, we, as a community, still don't know if we want wcs* functions to
be optimized, but if we decide that we don't want platform-specific
optimizations in glibc, then this should be done globally, not for
powerpc alone.  Right now, this patch only impacts powerpc with a
performance degradation.

The only benefit of removing the optimization is to ease code
maintenance/complexity.  I'm not convinced that it justifies the
performance impact.
Adhemerval Zanella Netto - March 11, 2019, 8:26 p.m.
On 11/03/2019 16:04, Gabriel F. T. Gomes wrote:
> On Mon, Mar 11 2019, Adhemerval Zanella wrote:
>>
>> Yes, I am aware this is a performance regression on power. Specifically,
>> wcscpy would now require three function calls: wcslen followed by
>> wmemcpy and memcpy (and it is worse on the powerpc64le ABI, which does
>> not allow tail-call optimization).
>>
>> The main question here is should we really care to optimize wcs* 
>> routines at all? 
> 
> This question is two-fold.  On the one hand, and based on the usage
> analysis you provided below, I tend to agree that we shouldn't spend
> time trying to optimize them.  On the other hand, the optimization is
> already in place, so I don't see a compelling reason to remove it.
> 
> I get it that the benefit of removing such optimizations is easier code
> maintenance, but problems seem to be rare enough that fixing them when
> they show up is something I would be willing to do
> (assuming that that is really the only benefit of the code removal).
> Maybe I'll regret it in the future, but then I could change my mind :).

It is more in terms of the current goal of making glibc as simple as
possible, while still retaining optimizations where they *matter*. As I
said, we used to make up for the lack of compiler support in math and string
optimizations, and the current trend is to avoid and remove such complications.

The same idea applies to a lot of assembly optimizations, where a cleaner C
implementation usually not only pays off in simplicity, but sometimes
might also show better performance with better compiler support.

>  
>>   - Most uses I could find on https://codesearch.debian.net are for
>>     Win32 routines (which use UTF-16 by default).
>>
>>   - gnulib defines the wchar module as obsolete, and even in the projects
>>     that use it, it is tied to Windows routines.
>>
>>   - Wide char routines are inherently problematic regarding portability
>>     (the standard does not define the size of wchar_t).
>>
>>   - Although some runtime environments might use UTF-16 by default
>>     (JavaScript, Java, Qt, Python), afaik they do not rely on the C runtime,
>>     exactly because of the previous issue, and they reimplement all string
>>     routines internally.
>>
>>   - Most uses are not performance intensive.
> 
> Thanks for doing this research.
>  
>> Besides the issues described with wcs* routines, my main issue with
>> the powerpc optimization is that I really think we should avoid the kind
>> of optimization that tries to work around bad compiler code generation or
>> a lack of compiler optimization. Our recent move is to try to leverage the
>> compiler itself (take for instance the internal math.h refactoring
>> we did).
> 
> OK, but at the same time, this discussion is also about the fact that we
> don't want to spend time to optimize wcs* functions any *further*.  One
> of the arguments for the removal of the code is that the compiler should
> provide such optimization, however, it currently doesn't.

No, the argument is that such optimization should *not* be in libc without a
good reason, and we should aim to first improve generic implementations and only
push for arch-specific implementations when generic code is not a
straightforward gain on all architectures (with strategies such as unaligned
access, prefetch, etc.).

> 
>> So what is really the point of still providing such complexity for
>> powerpc for routines that are most likely not used in realworld cases?
> 
> I can't say that there aren't realworld cases, even though it seems to
> be the case in the research you did, because we don't have access to
> everybody's software.
> 
> Anyhow, we, as a community, still don't know if we want wcs* functions to
> be optimized, but if we decide that we don't want platform-specific
> optimizations in glibc, then this should be done globally, not for
> powerpc alone.  Right now, this patch only impacts powerpc with a
> performance degradation.
> 
> The only benefit of removing the optimization is to ease code
> maintenance/complexity.  I'm not convinced that it justifies the
> performance impact.
> 

My point is that if this kind of code were proposed nowadays, we would probably
make it generic instead of pushing for arch-specific code.  In any case,
I think we can work towards not hurting powerpc performance while still
simplifying the code by adding the loop unroll to the generic implementation
and making powerpc unroll by default. I still think this complexity is
practically unnecessary, but it is better than what we have today.
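
One possible shape for a macro-controlled unroll in a generic routine is sketched below.  Everything here is hypothetical: SKETCH_WCSCPY_UNROLL, COPY_ONE, and sketch_wcscpy_unrolled are illustrative names, not existing glibc macros or symbols, and the sketch assumes an architecture would select the factor by defining the macro before the generic file is compiled.

```c
#include <wchar.h>

/* Hypothetical knob: an architecture could define this to pick the
   unroll factor.  Only 1, 2 and 4 are handled in this sketch.  */
#ifndef SKETCH_WCSCPY_UNROLL
# define SKETCH_WCSCPY_UNROLL 4
#endif

/* Copy one wide character and return RET from the enclosing function
   once the terminator has been stored.  */
#define COPY_ONE(d, s, ret)                 \
  do                                        \
    {                                       \
      if ((*(d)++ = *(s)++) == L'\0')       \
        return (ret);                       \
    }                                       \
  while (0)

wchar_t *
sketch_wcscpy_unrolled (wchar_t *dest, const wchar_t *src)
{
  wchar_t *d = dest;

  /* Each iteration copies SKETCH_WCSCPY_UNROLL characters unless the
     terminator is reached first.  */
  for (;;)
    {
      COPY_ONE (d, src, dest);
#if SKETCH_WCSCPY_UNROLL >= 2
      COPY_ONE (d, src, dest);
#endif
#if SKETCH_WCSCPY_UNROLL >= 4
      COPY_ONE (d, src, dest);
      COPY_ONE (d, src, dest);
#endif
    }
}
```

With the factor set to 1 this degenerates to the plain copy loop, so architectures that do not benefit from unrolling would pay nothing for the extra knob.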

Patch

diff --git a/sysdeps/powerpc/power6/wcscpy.c b/sysdeps/powerpc/power6/wcscpy.c
deleted file mode 100644
index 11c04b081a..0000000000
--- a/sysdeps/powerpc/power6/wcscpy.c
+++ /dev/null
@@ -1,105 +0,0 @@ 
-/* wcscpy.c - Wide Character Copy for POWER6+.
-   Copyright (C) 2012-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <http://www.gnu.org/licenses/>.  */
-
-#include <stddef.h>
-#include <wchar.h>
-
-#ifndef WCSCPY
-# define WCSCPY wcscpy
-#endif
-
-/* Copy SRC to DEST.  */
-wchar_t *
-WCSCPY (wchar_t *dest, const wchar_t *src)
-{
-  wint_t c,d;
-  wchar_t *wcp, *wcp2;
-
-  if (__alignof__ (wchar_t) >= sizeof (wchar_t))
-    {
-      const ptrdiff_t off = dest - src;
-
-      wcp = (wchar_t *) src;
-      wcp2 = wcp + 1 ;
-
-      do
-        {
-          d = *wcp;
-          wcp[off] = d;
-          if (d == L'\0')
-            return dest;
-          wcp += 2;
-
-          c = *wcp2;
-          wcp2[off] = c;
-          if (c == L'\0')
-            return dest;
-          wcp2 += 2;
-
-          d = *wcp;
-          wcp[off] = d;
-          if (d == L'\0')
-            return dest;
-          wcp += 2;
-
-          c = *wcp2;
-          wcp2[off] = c;
-          if (c == L'\0')
-            return dest;
-          wcp2 += 2;
-
-          d = *wcp;
-          wcp[off] = d;
-          if (d == L'\0')
-            return dest;
-          wcp += 2;
-
-          c = *wcp2;
-          wcp2[off] = c;
-          if (c == L'\0')
-            return dest;
-          wcp2 += 2;
-
-          d = *wcp;
-          wcp[off] = d;
-          if (d == L'\0')
-            return dest;
-          wcp += 2;
-
-          c = *wcp2;
-          wcp2[off] = c;
-          if (c == L'\0')
-            return dest;
-          wcp2 += 2;
-        }
-      while (c != L'\0');
-
-    }
-  else
-    {
-      wcp = dest;
-
-      do
-        {
-          c = *src++;
-          *wcp++ = c;
-        }
-      while (c != L'\0');
-    }
-  return dest;
-}
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
index bd9d360efa..f5141bc5b5 100644
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
@@ -18,13 +18,10 @@  endif
 
 ifeq ($(subdir),wcsmbs)
 sysdep_routines += wcschr-power7 wcschr-power6 wcschr-ppc32 \
-		   wcsrchr-power7 wcsrchr-power6 wcsrchr-ppc32 \
-		   wcscpy-power7 wcscpy-power6 wcscpy-ppc32
+		   wcsrchr-power7 wcsrchr-power6 wcsrchr-ppc32
 
 CFLAGS-wcschr-power7.c += -mcpu=power7
 CFLAGS-wcschr-power6.c += -mcpu=power6
 CFLAGS-wcsrchr-power7.c += -mcpu=power7
 CFLAGS-wcsrchr-power6.c += -mcpu=power6
-CFLAGS-wcscpy-power7.c += -mcpu=power7
-CFLAGS-wcscpy-power6.c += -mcpu=power6
 endif
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c
index bcd38e0f79..ae581d62f8 100644
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c
@@ -209,16 +209,5 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, wcsrchr, 1,
 			      __wcsrchr_ppc))
 
-  /* Support sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c.  */
-  IFUNC_IMPL (i, name, wcscpy,
-	      IFUNC_IMPL_ADD (array, i, wcscpy,
-			      hwcap & PPC_FEATURE_HAS_VSX,
-			      __wcscpy_power7)
-	      IFUNC_IMPL_ADD (array, i, wcscpy,
-			      hwcap & PPC_FEATURE_ARCH_2_05,
-			      __wcscpy_power6)
-	      IFUNC_IMPL_ADD (array, i, wcscpy, 1,
-			      __wcscpy_ppc))
-
   return i;
 }
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c b/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c
deleted file mode 100644
index 5bb0c82462..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c
+++ /dev/null
@@ -1,22 +0,0 @@ 
-/* Copyright (C) 2013-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <wchar.h>
-
-#define WCSCPY __wcscpy_power6
-
-#include <sysdeps/powerpc/power6/wcscpy.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c b/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c
deleted file mode 100644
index 5375094b60..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c
+++ /dev/null
@@ -1,22 +0,0 @@ 
-/* Copyright (C) 2013-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <wchar.h>
-
-#define WCSCPY __wcscpy_power7
-
-#include <sysdeps/powerpc/power6/wcscpy.c>
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c b/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c
deleted file mode 100644
index 31e0d81ef0..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c
+++ /dev/null
@@ -1,27 +0,0 @@ 
-/* Copyright (C) 2013-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <wchar.h>
-
-extern __typeof (wcscpy) __wcscpy_ppc;
-
-#define WCSCPY  __wcscpy_ppc
-#include <wcsmbs/wcscpy.c>
-
-#ifdef SHARED
-__hidden_ver1 (__wcscpy_ppc, __GI___wcscpy, __wcscpy_ppc);
-#endif
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c b/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c
deleted file mode 100644
index 0daf55cf70..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c
+++ /dev/null
@@ -1,36 +0,0 @@ 
-/* Multiple versions of wcscpy
-   Copyright (C) 2013-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#if IS_IN (libc)
-# define __wcscpy __redirect_wcscpy
-# include <wchar.h>
-# undef __wcscpy
-# include "init-arch.h"
-
-extern __typeof (__redirect_wcscpy) __wcscpy_ppc attribute_hidden;
-extern __typeof (__redirect_wcscpy) __wcscpy_power6 attribute_hidden;
-extern __typeof (__redirect_wcscpy) __wcscpy_power7 attribute_hidden;
-
-libc_ifunc_redirected (__redirect_wcscpy, __wcscpy,
-		       (hwcap & PPC_FEATURE_HAS_VSX)
-		       ? __wcscpy_power7 :
-			 (hwcap & PPC_FEATURE_ARCH_2_05)
-			 ? __wcscpy_power6
-		       : __wcscpy_ppc);
-weak_alias (__wcscpy, wcscpy)
-#endif
diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
index 963ea84dbf..3913ef580e 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
@@ -40,13 +40,10 @@  endif
 
 ifeq ($(subdir),wcsmbs)
 sysdep_routines += wcschr-power7 wcschr-power6 wcschr-ppc64 \
-		   wcsrchr-power7 wcsrchr-power6 wcsrchr-ppc64 \
-		   wcscpy-power7 wcscpy-power6 wcscpy-ppc64 \
+		   wcsrchr-power7 wcsrchr-power6 wcsrchr-ppc64
 
 CFLAGS-wcschr-power7.c += -mcpu=power7
 CFLAGS-wcschr-power6.c += -mcpu=power6
 CFLAGS-wcsrchr-power7.c += -mcpu=power7
 CFLAGS-wcsrchr-power6.c += -mcpu=power6
-CFLAGS-wcscpy-power7.c += -mcpu=power7
-CFLAGS-wcscpy-power6.c += -mcpu=power6
 endif
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 8d91d9abb9..06d33d71e0 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -282,17 +282,6 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, wcsrchr, 1,
 			      __wcsrchr_ppc))
 
-  /* Support sysdeps/powerpc/powerpc64/multiarch/wcscpy.c.  */
-  IFUNC_IMPL (i, name, wcscpy,
-	      IFUNC_IMPL_ADD (array, i, wcscpy,
-			      hwcap & PPC_FEATURE_HAS_VSX,
-			      __wcscpy_power7)
-	      IFUNC_IMPL_ADD (array, i, wcscpy,
-			      hwcap & PPC_FEATURE_ARCH_2_05,
-			      __wcscpy_power6)
-	      IFUNC_IMPL_ADD (array, i, wcscpy, 1,
-			      __wcscpy_ppc))
-
   /* Support sysdeps/powerpc/powerpc64/multiarch/strrchr.c.  */
   IFUNC_IMPL (i, name, strrchr,
 	      IFUNC_IMPL_ADD (array, i, strrchr,
diff --git a/sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c b/sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c
deleted file mode 100644
index abe2e0f073..0000000000
--- a/sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c
+++ /dev/null
@@ -1,19 +0,0 @@ 
-/* wcscpy.c - Wide Character Search for powerpc64/power6.
-   Copyright (C) 2013-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <http://www.gnu.org/licenses/>.  */
-
-#include <sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c b/sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c
deleted file mode 100644
index be95cd9074..0000000000
--- a/sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c
+++ /dev/null
@@ -1,19 +0,0 @@ 
-/* wcscpy.c - Wide Character Search for powerpc64/power7.
-   Copyright (C) 2013-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; see the file COPYING.LIB.  If
-   not, see <http://www.gnu.org/licenses/>.  */
-
-#include <sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c
deleted file mode 100644
index faa3c1614f..0000000000
--- a/sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c
+++ /dev/null
@@ -1,18 +0,0 @@ 
-/* Copyright (C) 2013-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/wcscpy.c b/sysdeps/powerpc/powerpc64/multiarch/wcscpy.c
deleted file mode 100644
index 3f918b27c6..0000000000
--- a/sysdeps/powerpc/powerpc64/multiarch/wcscpy.c
+++ /dev/null
@@ -1,35 +0,0 @@ 
-/* Multiple versions of wcscpy.
-   Copyright (C) 2013-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#define __wcscpy __redirect___wcscpy
-#include <wchar.h>
-#undef __wcscpy
-#include <shlib-compat.h>
-#include "init-arch.h"
-
-extern __typeof (wcscpy) __wcscpy_ppc attribute_hidden;
-extern __typeof (wcscpy) __wcscpy_power6 attribute_hidden;
-extern __typeof (wcscpy) __wcscpy_power7 attribute_hidden;
-
-libc_ifunc_redirected (__redirect___wcscpy, __wcscpy,
-		       (hwcap & PPC_FEATURE_HAS_VSX)
-		       ? __wcscpy_power7 :
-		         (hwcap & PPC_FEATURE_ARCH_2_05)
-		         ? __wcscpy_power6
-	               : __wcscpy_ppc);
-weak_alias (__wcscpy, wcscpy)
diff --git a/sysdeps/powerpc/powerpc64/power6/wcscpy.c b/sysdeps/powerpc/powerpc64/power6/wcscpy.c
deleted file mode 100644
index 722c8f995b..0000000000
--- a/sysdeps/powerpc/powerpc64/power6/wcscpy.c
+++ /dev/null
@@ -1 +0,0 @@ 
-#include <sysdeps/powerpc/power6/wcscpy.c>