Patchwork powerpc: Fix build of wcscpy with --disable-multi-arch

login
register
mail settings
Submitter Gabriel F. T. Gomes
Date March 2, 2019, 10:06 p.m.
Message ID <20190302220618.706-1-gabriel@inconstante.eti.br>
Download mbox | patch
Permalink /patch/31703/
State New
Headers show

Comments

Gabriel F. T. Gomes - March 2, 2019, 10:06 p.m.
Since the commit

commit 81a14439417552324ec6ca71f65ddf8e7cdd51c7
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Feb 5 17:35:12 2019 -0200

    wcsmbs: optimize wcscat

powerpc64 and powerpc64le builds fail when configured with
--disable-multi-arch, due to an undefined reference to __GI___wcscpy.
This patch fixes this on sysdeps/powerpc/powerpc64/power6/wcscpy.c,
which is only used when multi-arch is disabled.

This patch does nothing for the failures on 32-bits powerpc builds,
because the file is under the powerpc64 subdirectory, however, powerpc
builds were already failing with --disable-multi-arch, with multiple
error messages, even before the aforementioned commit.

Tested for powerpc, powerpc64, and powerpc64le with multi-arch enabled
(all pass) and disabled (powerpc still fails as explained above).

	* sysdeps/powerpc/powerpc64/power6/wcscpy.c (WCSCPY): Define to
	__wcscpy, then use libc_hidden_def and weak_alias to bind it to
	__GI___wcscpy and wcscpy.
---
 sysdeps/powerpc/powerpc64/power6/wcscpy.c | 3 +++
 1 file changed, 3 insertions(+)
Adhemerval Zanella Netto - March 3, 2019, 1:20 p.m.
On 02/03/2019 19:06, Gabriel F. T. Gomes wrote:
> Since the commit
> 
> commit 81a14439417552324ec6ca71f65ddf8e7cdd51c7
> Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Date:   Tue Feb 5 17:35:12 2019 -0200
> 
>     wcsmbs: optimize wcscat
> 
> powerpc64 and powerpc64le builds fail when configured with
> --disable-multi-arch, due to an undefined reference to __GI___wcscpy.
> This patch fixes this on sysdeps/powerpc/powerpc64/power6/wcscpy.c,
> which is only used when multi-arch is disabled.
> 
> This patch does nothing for the failures on 32-bits powerpc builds,
> because the file is under the powerpc64 subdirectory, however, powerpc
> builds were already failing with --disable-multi-arch, with multiple
> error messages, even before the aforementioned commit.
> 
> Tested for powerpc, powerpc64, and powerpc64le with multi-arch enabled
> (all pass) and disabled (powerpc still fails as explained above).
> 
> 	* sysdeps/powerpc/powerpc64/power6/wcscpy.c (WCSCPY): Define to
> 	__wcscpy, then use libc_hidden_def and weak_alias to bind it to
> 	__GI___wcscpy and wcscpy.

LGTM. In fact you need either your compiler to setup to to use power6+
as default or configure with --with-cpu (the powerpc64le gcc I used
to check in fact do not have neither set).

As a side note, this is exactly the kind of annoyance I tried to avoid
when I advocated to reduce the build permutations on glibc for the 
recent s390 ifunc refactor (and it bit me with an issue recently [1]).
So I would ask again if these micro-optimization indeed pays for the 
over-complicated build options we need to actually test to make sure 
nothing broken glibc build.

For powerpc64 there are two issue: 1. can't we just infer the minimum
cpu support from compiler or CC and get rid of --with-cpu? I would like
to avoid to add the possible permutation of --with-cpu=powerX and
--disable-multi-arch of the required build to check. It won't help much
now since powerpc still organizes itself with sysdeps subfolder, but
it can be a first step.

Second, and this is minor one, I will send a patch to just remove the
power6 wcs* optimized routines. They are rarely used in the wild, the
optimization are just loop unrolling that we can either rely on 
compiler (sadly gcc does not unroll this kind of loop either with
aggressive optimization flags) or add on generic implementation,
and power6 and power7 are essentially the same binary for
powerpc64.

[1] https://sourceware.org/ml/libc-alpha/2019-02/msg00140.html

> ---
>  sysdeps/powerpc/powerpc64/power6/wcscpy.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/sysdeps/powerpc/powerpc64/power6/wcscpy.c b/sysdeps/powerpc/powerpc64/power6/wcscpy.c
> index 722c8f995b..59cfb28832 100644
> --- a/sysdeps/powerpc/powerpc64/power6/wcscpy.c
> +++ b/sysdeps/powerpc/powerpc64/power6/wcscpy.c
> @@ -1 +1,4 @@
> +#define WCSCPY __wcscpy
>  #include <sysdeps/powerpc/power6/wcscpy.c>
> +libc_hidden_def (__wcscpy)
> +weak_alias (__wcscpy, wcscpy)
>
Gabriel F. T. Gomes - March 5, 2019, 1:35 p.m.
On Sun, Mar 03 2019, Adhemerval Zanella wrote:
>  
> On 02/03/2019 19:06, Gabriel F. T. Gomes wrote:
> > 
> > powerpc64 and powerpc64le builds fail when configured with
> > --disable-multi-arch, due to an undefined reference to __GI___wcscpy.
> > This patch fixes this on sysdeps/powerpc/powerpc64/power6/wcscpy.c,
> > which is only used when multi-arch is disabled.
> > 
> > This patch does nothing for the failures on 32-bits powerpc builds,
> > because the file is under the powerpc64 subdirectory, however, powerpc
> > builds were already failing with --disable-multi-arch, with multiple
> > error messages, even before the aforementioned commit.
> > 
> > Tested for powerpc, powerpc64, and powerpc64le with multi-arch enabled
> > (all pass) and disabled (powerpc still fails as explained above).
> > 
>
> LGTM.

Thanks.

> In fact you need either your compiler to setup to to use power6+
> as default or configure with --with-cpu (the powerpc64le gcc I used
> to check in fact do not have neither set).

I usually build powerpc64le with --with-cpu=power8, because that's the
machine where little-endian is supposed to work, and since that's what
distros typically use, and that's why I saw the failure.

I don't do the same for powerpce64...  I only wrote that such build is
failing, because I ran an extra test, just to check what was going on
(not because I know of any distros that use --with-cpu=power6 (or
newer)), I'm sorry if I gave the impression that this was the case.

Anyhow, yes, one would need to use --with-cpu to get the failure, thanks
for making that clear.

I will commit this patch with that information.

> As a side note, this is exactly the kind of annoyance I tried to avoid
> when I advocated to reduce the build permutations on glibc for the 
> recent s390 ifunc refactor (and it bit me with an issue recently [1]).
> So I would ask again if these micro-optimization indeed pays for the 
> over-complicated build options we need to actually test to make sure 
> nothing broken glibc build.

We will have to check if they actually do, and I'll review your new
patch set.  I'm not so sure about the representativeness (I hope that
this is an actual word in English) of the benchmarks, though.  This has
always been an open-question for me.

> For powerpc64 there are two issue: 1. can't we just infer the minimum
> cpu support from compiler or CC and get rid of --with-cpu?

I thought that --with-cpu was useful for distros, so that they could
decide what minimum cpu they want to build support for.  Are you
implying that they could set this in the compiler?

> I would like
> to avoid to add the possible permutation of --with-cpu=powerX and
> --disable-multi-arch of the required build to check. It won't help much
> now since powerpc still organizes itself with sysdeps subfolder, but
> it can be a first step.

This is new for me.  I thought that the options were good, in the sense
that distros can use whatever combination they deem appropriate for
their users.  I don't even know the reasons why one would disable
multi-arch builds, I only assumed that distros and users would know
better.
 
> Second, and this is minor one, I will send a patch to just remove the
> power6 wcs* optimized routines. They are rarely used in the wild, the
> optimization are just loop unrolling that we can either rely on 
> compiler (sadly gcc does not unroll this kind of loop either with
> aggressive optimization flags) or add on generic implementation,
> and power6 and power7 are essentially the same binary for
> powerpc64.

Thanks for doing this.
Adhemerval Zanella Netto - March 5, 2019, 2:11 p.m.
On 05/03/2019 10:35, Gabriel F. T. Gomes wrote:
> On Sun, Mar 03 2019, Adhemerval Zanella wrote:
>>  
>> On 02/03/2019 19:06, Gabriel F. T. Gomes wrote:
>>>
>>> powerpc64 and powerpc64le builds fail when configured with
>>> --disable-multi-arch, due to an undefined reference to __GI___wcscpy.
>>> This patch fixes this on sysdeps/powerpc/powerpc64/power6/wcscpy.c,
>>> which is only used when multi-arch is disabled.
>>>
>>> This patch does nothing for the failures on 32-bits powerpc builds,
>>> because the file is under the powerpc64 subdirectory, however, powerpc
>>> builds were already failing with --disable-multi-arch, with multiple
>>> error messages, even before the aforementioned commit.
>>>
>>> Tested for powerpc, powerpc64, and powerpc64le with multi-arch enabled
>>> (all pass) and disabled (powerpc still fails as explained above).
>>>
>>
>> LGTM.
> 
> Thanks.
> 
>> In fact you need either your compiler to setup to to use power6+
>> as default or configure with --with-cpu (the powerpc64le gcc I used
>> to check in fact do not have neither set).
> 
> I usually build powerpc64le with --with-cpu=power8, because that's the
> machine where little-endian is supposed to work, and since that's what
> distros typically use, and that's why I saw the failure.
> 
> I don't do the same for powerpce64...  I only wrote that such build is
> failing, because I ran an extra test, just to check what was going on
> (not because I know of any distros that use --with-cpu=power6 (or
> newer)), I'm sorry if I gave the impression that this was the case.
> 
> Anyhow, yes, one would need to use --with-cpu to get the failure, thanks
> for making that clear.
> 
> I will commit this patch with that information.
> 
>> As a side note, this is exactly the kind of annoyance I tried to avoid
>> when I advocated to reduce the build permutations on glibc for the 
>> recent s390 ifunc refactor (and it bit me with an issue recently [1]).
>> So I would ask again if these micro-optimization indeed pays for the 
>> over-complicated build options we need to actually test to make sure 
>> nothing broken glibc build.
> 
> We will have to check if they actually do, and I'll review your new
> patch set.  I'm not so sure about the representativeness (I hope that
> this is an actual word in English) of the benchmarks, though.  This has
> always been an open-question for me.

As Wilco has pointed out, we can just the similar optimization on generic
implementation instead and avoid a lot of powerpc specific code.

> 
>> For powerpc64 there are two issue: 1. can't we just infer the minimum
>> cpu support from compiler or CC and get rid of --with-cpu?
> 
> I thought that --with-cpu was useful for distros, so that they could
> decide what minimum cpu they want to build support for.  Are you
> implying that they could set this in the compiler?

Currently --with-cpu is still required to correctly infer the internal
sysdeps folders if you aim to build glibc to run on a specific chip
as the minimum base (power8 for instance). However this same information
can be inferred from -mcpu flag from CC, so I see this extra configure
option as just another build permutation that incur in more extra
testing.

My idea is to make powerpc behave as other architectures and avoid the
multiple possible configuration options provide by each chip/ISA and
instead have:

  1. powerpc-linux-gnu: make ifunc default for all build instead of
     just power4. If I recall correctly the only impeding reason
     was the hp-timing support that was added only on ISA 2.02. However
     with current hp-timing refactor, we can conditionally build its
     support iff compiler builds for power4.

  2. powerpc64-linux-gnu: remove all the powerX variations, add some
     missing ifunc to remove chip-specific implementation that are
     only active on some chips (add_n.S and sub_n.S for power7 for
     instance). 

     A possible optimization to enable optimized base symbol to be
     used as default (memcpy for instance) it to implement something
     similar s390 did and make some ifunc build conditional depending
     of the compiler option. I am not a huge fan of this approach,
     since it just moves the build permutations from configure option
     to compiler flags.

  3. powerpc64le-linux-gnu: as forr powerpc64-linux-gnu, remove all
     powerX variations and make power8 the true default by avoid
     build and export the various pre-power8 variations.

The powerpc64{le} has an additional issue which seems that only power8
routines are shared between LE and BE abi (it seems power9 is not
support for BE). So one idea might be to move power8 arch-specific
optimization to a common folder user by LE and BE.

Another issue powerpc64le might also use different some specific
implementation from older chips.  I don't have a best approach on 
how to select it, but one thing that came to my mind might be
reimplement some routines as:

---
sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c

#ifdef _ARCH_PWR6
double __libm_copysign (double x, double y)
{
  __builtin_copysign (x, y);
}
#else
#define NO_MATH_REDIRECT
/* Redefine copysign so that the compiler won't complain about the type
   mismatch with the IFUNC selector in strong_alias below.  */
#undef __copysign
#define __copysign __redirect_copysign
#include <math.h>
#include <math_ldbl_opt.h>
#undef __copysign
#include <shlib-compat.h>
#include "init-arch.h"
#include <libm-alias-double.h>

extern __typeof (__redirect_copysign) __copysign_ppc64 attribute_hidden;
extern __typeof (__redirect_copysign) __copysign_power6 attribute_hidden;

extern __typeof (__redirect_copysign) __libm_copysign;
libc_ifunc (__libm_copysign,
            (hwcap & PPC_FEATURE_ARCH_2_05)
            ? __copysign_power6
            : __copysign_ppc64);
#endif
strong_alias (__libm_copysign, __copysign)
libm_alias_double (__copysign, copysign)

#if LONG_DOUBLE_COMPAT (libc, GLIBC_2_0)
compat_symbol (libc, __copysign, copysignl, GLIBC_2_0);
#endif
---

But I am not sure it is really pays off in term of simplify glibc code
and build. Maybe an option is just to use __builtin_* when we know
compiler generates an efficient instruction and just use the generic
implementation instead.

> 
>> I would like
>> to avoid to add the possible permutation of --with-cpu=powerX and
>> --disable-multi-arch of the required build to check. It won't help much
>> now since powerpc still organizes itself with sysdeps subfolder, but
>> it can be a first step.
> 
> This is new for me.  I thought that the options were good, in the sense
> that distros can use whatever combination they deem appropriate for
> their users.  I don't even know the reasons why one would disable
> multi-arch builds, I only assumed that distros and users would know
> better.

I see the --disable-multi-arch useful just when used along with --with-cpu,
because since the idea is just to provide a chip specific build there is
no sense of adding all possible ifunc variants. But afaik distros usually
do not use it. 

>  
>> Second, and this is minor one, I will send a patch to just remove the
>> power6 wcs* optimized routines. They are rarely used in the wild, the
>> optimization are just loop unrolling that we can either rely on 
>> compiler (sadly gcc does not unroll this kind of loop either with
>> aggressive optimization flags) or add on generic implementation,
>> and power6 and power7 are essentially the same binary for
>> powerpc64.
> 
> Thanks for doing this.
>
Joseph Myers - March 6, 2019, 9:07 p.m.
On Sun, 3 Mar 2019, Adhemerval Zanella wrote:

> For powerpc64 there are two issue: 1. can't we just infer the minimum
> cpu support from compiler or CC and get rid of --with-cpu? I would like

We should do that in general to get rid of --with-cpu (regarding powerpc, 
see my review comments 
<https://sourceware.org/ml/libc-alpha/2017-02/msg00164.html>).

Patch

diff --git a/sysdeps/powerpc/powerpc64/power6/wcscpy.c b/sysdeps/powerpc/powerpc64/power6/wcscpy.c
index 722c8f995b..59cfb28832 100644
--- a/sysdeps/powerpc/powerpc64/power6/wcscpy.c
+++ b/sysdeps/powerpc/powerpc64/power6/wcscpy.c
@@ -1 +1,4 @@ 
+#define WCSCPY __wcscpy
 #include <sysdeps/powerpc/power6/wcscpy.c>
+libc_hidden_def (__wcscpy)
+weak_alias (__wcscpy, wcscpy)