[RFC] Fix float128 IFUNC relocations on ppc64le [BZ #21707]

Message ID 20170705172315.14358-1-tuliom@linux.vnet.ibm.com
State Superseded
Headers

Commit Message

Tulio Magno Quites Machado Filho July 5, 2017, 5:23 p.m. UTC
  > On 06/27/2017 09:05 AM, Florian Weimer wrote:
>> On 06/27/2017 01:02 AM, Joseph Myers wrote:
>>> I'm seeing a testsuite regression for powerpc64le with
>>> build-many-glibcs.py with this patch (elf/check-localplt fails), did you
>>> not see that in your testing?
>>>
>>> https://sourceware.org/ml/libc-testresults/2017-q2/msg00427.html
>>>
>>> The failure is: "Extra PLT reference: libc.so: __getauxval".  As the
>>> __getauxval reference comes from have_ieee_hw_p in libgcc, presumably you
>>> need to allow that PLT reference in localplt.data with an appropriate
>>> comment, as it won't be readily possible to avoid it.
>>
>> The __getauxval call happens from IFUNC resolvers and violates current
>> guidelines regarding what can be done from IFUNC resolvers.  This is
>> another reason to get rid of the PLT reference.
>>
>> My IFUNC resolver enhancements are not ready for 2.26, and I plan to
>> wait for DJ's dl-minimal malloc improvements to land, rather than
>> rolling my own memory allocator to back the IFUNC resolver queue.
>
> The above isn't correct because even if you can call getauxval, it
> doesn't have the data to return meaningful results during relocation.
> This currently breaks --enable-bind-now builds on pcp64le:
>
>   https://sourceware.org/bugzilla/show_bug.cgi?id=21707

I agree.

> For the time being, we just disable --enable-bind-now, but I'd prefer a
> proper fix, perhaps the one suggested here:
>
>   https://sourceware.org/ml/libc-alpha/2017-06/msg01383.html

Unfortunately, this patch creates another issue: the thread pointer is not
initialized at the time of IRELA relocations in static executables.

The following changes solve this issue.

-- 8< --

The patch proposed by Peter Bergner [1] to libgc in order to fix
[BZ #21707] adds a dependency on a symbol provided by the loader,
forcing the loader to be linked to tests after libgcc was linked.

The change also requires to access the thread pointer during IRELA
relocations.

Tested on powerpc, powerpc64, powerpc64le, s390x and x86_64.

[1] https://sourceware.org/ml/libc-alpha/2017-06/msg01383.html

2017-07-05  Tulio Magno Quites Machado Filho  <tuliom@linux.vnet.ibm.com>

	[BZ #21707]
	* csu/libc-start.c (LIBC_START_MAIN): Perform IREL{,A}
	relocations after initializing the TCB..
	* sysdeps/powerpc/powerpc64le/Makefile (f128-loader-link): New
	variable.
	[$(subdir) = math] (test-float128% test-ifloat128%): Force
	linking to the loader after linking to libgcc.
	[$(subdir) = wcsmbs stdlib] (bug-strtod bug-strtod2 bug-strtod2)
	(tst-strtod-round tst-wcstod-round tst-strtod6 tst-strrom)
	(tst-strfrom-locale strfrom-skeleton): Likewise.
---
 csu/libc-start.c                     |  9 ++++++---
 sysdeps/powerpc/powerpc64le/Makefile | 10 ++++++++++
 2 files changed, 16 insertions(+), 3 deletions(-)
  

Comments

Florian Weimer July 5, 2017, 7:19 p.m. UTC | #1
On 07/05/2017 07:23 PM, Tulio Magno Quites Machado Filho wrote:

>    ARCH_INIT_CPU_FEATURES ();
>  
> -  /* Perform IREL{,A} relocations.  */
> -  apply_irel ();
> -
>    /* The stack guard goes into the TCB, so initialize it early.  */
>    __libc_setup_tls ();
>  
> +  /* Perform IREL{,A} relocations.
> +     Note: the relocations must happen after TLS initialization so that
> +     IFUNC resolvers can benefit from thread-local storage, e.g. powerpc's
> +     hwcap and platform fields available in the TCB.  */
> +  apply_irel ();
> +

__libc_setup_tls calls memcpy and a lot of other stuff, so I'm not sure
if this change is correct on all architectures.

> +# libgcc requires __tcb_parse_hwcap_and_convert_at_platform when built with
> +# a binary128 type.  That symbol is provided by the loader on dynamically
> +# linked executables, forcing to link the loader after libgcc link.
> +f128-loader-link = $(as-needed) $(elf-objpfx)ld.so $(no-as-needed)

Why doesn't the regular linker invocation take care of this?  Will user
applications run into the same issue?

Thanks,
Florian
  
Carlos O'Donell July 5, 2017, 7:21 p.m. UTC | #2
On 07/05/2017 01:23 PM, Tulio Magno Quites Machado Filho wrote:
>> On 06/27/2017 09:05 AM, Florian Weimer wrote:
>>> On 06/27/2017 01:02 AM, Joseph Myers wrote:
>>>> I'm seeing a testsuite regression for powerpc64le with
>>>> build-many-glibcs.py with this patch (elf/check-localplt fails), did you
>>>> not see that in your testing?
>>>>
>>>> https://sourceware.org/ml/libc-testresults/2017-q2/msg00427.html
>>>>
>>>> The failure is: "Extra PLT reference: libc.so: __getauxval".  As the
>>>> __getauxval reference comes from have_ieee_hw_p in libgcc, presumably you
>>>> need to allow that PLT reference in localplt.data with an appropriate
>>>> comment, as it won't be readily possible to avoid it.
>>>
>>> The __getauxval call happens from IFUNC resolvers and violates current
>>> guidelines regarding what can be done from IFUNC resolvers.  This is
>>> another reason to get rid of the PLT reference.
>>>
>>> My IFUNC resolver enhancements are not ready for 2.26, and I plan to
>>> wait for DJ's dl-minimal malloc improvements to land, rather than
>>> rolling my own memory allocator to back the IFUNC resolver queue.
>>
>> The above isn't correct because even if you can call getauxval, it
>> doesn't have the data to return meaningful results during relocation.
>> This currently breaks --enable-bind-now builds on pcp64le:
>>
>>   https://sourceware.org/bugzilla/show_bug.cgi?id=21707
> 
> I agree.
> 
>> For the time being, we just disable --enable-bind-now, but I'd prefer a
>> proper fix, perhaps the one suggested here:
>>
>>   https://sourceware.org/ml/libc-alpha/2017-06/msg01383.html
> 
> Unfortunately, this patch creates another issue: the thread pointer is not
> initialized at the time of IRELA relocations in static executables.
> 
> The following changes solve this issue.

OK to checkin with a test.

You need a static program with an IFUNC resolver that touches a __thread
variable and fails before your patch, and passes after. This is marginal
proof that TLS is operational after your change. Have the resolver set the
TLS variable to some value and check that value in main.

e.g.
#include <stdio.h>
#include <stdlib.h>

__thread int called;

int foo (void) __attribute__ ((ifunc ("resolve_foo")));

int my_foo (void)
{
  printf ("GNU ifunc resolver called %d times.\n", called);
  return called;
}

static int (*resolve_foo (void)) (void)
{
  called++;
  return my_foo;
}

int my_caller (void)
{
  return foo ();
}

int
main (void)
{
  if (foo () == 1)
    printf ("PASS: IFUNC resolver called once.\n");
  else
    printf ("FAIL: IFUNC resolver not called onece.\n");
    
  if (called == 1)
    printf ("PASS: __thread variable set correctly from IFUNC resolver.\n");
  else
    printf ("FAIL: __thread variable not set correctly from IFUNC resolver.\n");
}

At a high level this looks OK to me. We previously could not access __thread
in a static application during IFUNC resolvers because of the ordering in 
csu/libc-start.c, and now we can. Verify it with a test please.

That also means the TLS setup can't use IFUNCs, but a quick audit of libc.a
shows we don't call any IFUNC using functions in that sequence, only memcpy()
but we don't have an IFUNC for that.

A test like the one above test would fail if we added more IFUNCs in libc.a
and processed them late because any caller of them would crash if called
early.

> -- 8< --
> 
> The patch proposed by Peter Bergner [1] to libgc in order to fix
> [BZ #21707] adds a dependency on a symbol provided by the loader,
> forcing the loader to be linked to tests after libgcc was linked.
> 
> The change also requires to access the thread pointer during IRELA
> relocations.
> 
> Tested on powerpc, powerpc64, powerpc64le, s390x and x86_64.
> 
> [1] https://sourceware.org/ml/libc-alpha/2017-06/msg01383.html
> 
> 2017-07-05  Tulio Magno Quites Machado Filho  <tuliom@linux.vnet.ibm.com>
> 
> 	[BZ #21707]
> 	* csu/libc-start.c (LIBC_START_MAIN): Perform IREL{,A}
> 	relocations after initializing the TCB..
> 	* sysdeps/powerpc/powerpc64le/Makefile (f128-loader-link): New
> 	variable.
> 	[$(subdir) = math] (test-float128% test-ifloat128%): Force
> 	linking to the loader after linking to libgcc.
> 	[$(subdir) = wcsmbs stdlib] (bug-strtod bug-strtod2 bug-strtod2)
> 	(tst-strtod-round tst-wcstod-round tst-strtod6 tst-strrom)
> 	(tst-strfrom-locale strfrom-skeleton): Likewise.
> ---
>  csu/libc-start.c                     |  9 ++++++---
>  sysdeps/powerpc/powerpc64le/Makefile | 10 ++++++++++
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/csu/libc-start.c b/csu/libc-start.c
> index c2dd159..e6fa848 100644
> --- a/csu/libc-start.c
> +++ b/csu/libc-start.c
> @@ -188,12 +188,15 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>  
>    ARCH_INIT_CPU_FEATURES ();
>  
> -  /* Perform IREL{,A} relocations.  */
> -  apply_irel ();
> -
>    /* The stack guard goes into the TCB, so initialize it early.  */
>    __libc_setup_tls ();
>  
> +  /* Perform IREL{,A} relocations.
> +     Note: the relocations must happen after TLS initialization so that
> +     IFUNC resolvers can benefit from thread-local storage, e.g. powerpc's
> +     hwcap and platform fields available in the TCB.  */
> +  apply_irel ();
> +
>    /* Set up the stack checker's canary.  */
>    uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
>  # ifdef THREAD_SET_STACK_GUARD
> diff --git a/sysdeps/powerpc/powerpc64le/Makefile b/sysdeps/powerpc/powerpc64le/Makefile
> index bd8a82d..abaa74d 100644
> --- a/sysdeps/powerpc/powerpc64le/Makefile
> +++ b/sysdeps/powerpc/powerpc64le/Makefile
> @@ -1,6 +1,11 @@
>  # When building float128 we need to ensure -mfloat128 is
>  # passed to all such object files.
>  
> +# libgcc requires __tcb_parse_hwcap_and_convert_at_platform when built with
> +# a binary128 type.  That symbol is provided by the loader on dynamically
> +# linked executables, forcing to link the loader after libgcc link.
> +f128-loader-link = $(as-needed) $(elf-objpfx)ld.so $(no-as-needed)
> +
>  ifeq ($(subdir),math)
>  # sqrtf128 requires emulation before POWER9.
>  CPPFLAGS += -I../soft-fp
> @@ -11,6 +16,8 @@ $(foreach suf,$(all-object-suffixes),%f128_r$(suf)): CFLAGS += -mfloat128
>  $(objpfx)test-float128%.o $(objpfx)test-float128%.os: CFLAGS += -mfloat128
>  $(objpfx)test-ifloat128%.o $(objpfx)test-ifloat128%.os: CFLAGS += -mfloat128
>  CFLAGS-libm-test-support-float128.c += -mfloat128
> +$(objpfx)test-float128% $(objpfx)test-ifloat128%: \
> +  gnulib-tests += $(f128-loader-link)
>  endif
>  
>  # Append flags to string <-> _Float128 routines.
> @@ -24,6 +31,9 @@ CFLAGS-tst-strtod6.c += -mfloat128
>  CFLAGS-tst-strfrom.c += -mfloat128
>  CFLAGS-tst-strfrom-locale.c += -mfloat128
>  CFLAGS-strfrom-skeleton.c += -mfloat128
> +$(foreach test,bug-strtod bug-strtod2 bug-strtod2 tst-strtod-round \
> +tst-wcstod-round tst-strtod6 tst-strrom tst-strfrom-locale \
> +strfrom-skeleton,$(objpfx)$(pref)): gnulib-tests += $(f128-loader-link)
>  
>  # When building glibc with support for _Float128, the powers of ten tables in
>  # fpioconst.c and in the string conversion functions must be extended.
>
  
Carlos O'Donell July 5, 2017, 7:25 p.m. UTC | #3
On 07/05/2017 03:19 PM, Florian Weimer wrote:
> On 07/05/2017 07:23 PM, Tulio Magno Quites Machado Filho wrote:
> 
>>    ARCH_INIT_CPU_FEATURES ();
>>  
>> -  /* Perform IREL{,A} relocations.  */
>> -  apply_irel ();
>> -
>>    /* The stack guard goes into the TCB, so initialize it early.  */
>>    __libc_setup_tls ();
>>  
>> +  /* Perform IREL{,A} relocations.
>> +     Note: the relocations must happen after TLS initialization so that
>> +     IFUNC resolvers can benefit from thread-local storage, e.g. powerpc's
>> +     hwcap and platform fields available in the TCB.  */
>> +  apply_irel ();
>> +
> 
> __libc_setup_tls calls memcpy and a lot of other stuff, so I'm not sure
> if this change is correct on all architectures.

I audited this, and it looks like we don't IFUNC memcpy in libc.a, probably
because of this issue, we use memcpy very very early.

A new test like I suggest would make sure we don't regress this.

>> +# libgcc requires __tcb_parse_hwcap_and_convert_at_platform when built with
>> +# a binary128 type.  That symbol is provided by the loader on dynamically
>> +# linked executables, forcing to link the loader after libgcc link.
>> +f128-loader-link = $(as-needed) $(elf-objpfx)ld.so $(no-as-needed)
> 
> Why doesn't the regular linker invocation take care of this?  Will user
> applications run into the same issue?

We don't use the regular linker invocation in glibc builds? We have to emulate
what the actual toolchain would do in this case.

I would expect that users using gcc as the normal driver would get ld.so
included as needed.

Tulio can answer more authoritatively here.
  
Gabriel F T Gomes July 5, 2017, 7:40 p.m. UTC | #4
On Wed,  5 Jul 2017 14:23:15 -0300
"Tulio Magno Quites Machado Filho" <tuliom@linux.vnet.ibm.com> wrote:
>
> @@ -24,6 +31,9 @@ CFLAGS-tst-strtod6.c += -mfloat128
>  CFLAGS-tst-strfrom.c += -mfloat128
>  CFLAGS-tst-strfrom-locale.c += -mfloat128
>  CFLAGS-strfrom-skeleton.c += -mfloat128
> +$(foreach test,bug-strtod bug-strtod2 bug-strtod2 tst-strtod-round \
> +tst-wcstod-round tst-strtod6 tst-strrom tst-strfrom-locale \
> +strfrom-skeleton,$(objpfx)$(pref)): gnulib-tests += $(f128-loader-link)
                               ~~~~
I understand what you are trying to do, but I did not understand what is
the effect of $(pref).  I would expect it to be $(test).
  
Florian Weimer July 5, 2017, 7:44 p.m. UTC | #5
On 07/05/2017 09:25 PM, Carlos O'Donell wrote:
>>> +# libgcc requires __tcb_parse_hwcap_and_convert_at_platform when built with
>>> +# a binary128 type.  That symbol is provided by the loader on dynamically
>>> +# linked executables, forcing to link the loader after libgcc link.
>>> +f128-loader-link = $(as-needed) $(elf-objpfx)ld.so $(no-as-needed)
>> Why doesn't the regular linker invocation take care of this?  Will user
>> applications run into the same issue?
> We don't use the regular linker invocation in glibc builds? We have to emulate
> what the actual toolchain would do in this case.

But then we should do this everywhere, and not just for the few tests
that actually need it, right?  I assume that's also what the official
linker script does.

Thanks,
Florian
  
Joseph Myers July 5, 2017, 8:11 p.m. UTC | #6
On Wed, 5 Jul 2017, Florian Weimer wrote:

> On 07/05/2017 09:25 PM, Carlos O'Donell wrote:
> >>> +# libgcc requires __tcb_parse_hwcap_and_convert_at_platform when built with
> >>> +# a binary128 type.  That symbol is provided by the loader on dynamically
> >>> +# linked executables, forcing to link the loader after libgcc link.
> >>> +f128-loader-link = $(as-needed) $(elf-objpfx)ld.so $(no-as-needed)
> >> Why doesn't the regular linker invocation take care of this?  Will user
> >> applications run into the same issue?
> > We don't use the regular linker invocation in glibc builds? We have to emulate
> > what the actual toolchain would do in this case.
> 
> But then we should do this everywhere, and not just for the few tests
> that actually need it, right?  I assume that's also what the official
> linker script does.

The installed compiler does -lgcc -lc -lgcc (roughly; -lgcc may actually 
use -lgcc_s or -lgcc_eh, and -lc may vary for e.g. profiling).  -lc is a 
linker script doing e.g. GROUP ( /lib64/libc.so.6 
/usr/lib64/libc_nonshared.a  AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) ).  
The shared libgcc is also a linker script on some platforms.

The glibc build uses libc.so --as-needed ld.so --no-as-needed -lgcc 
(again, roughly) (but libc.a -lgcc libc.a -lgcc in the static build case).

I suppose the suggestion is that we should both use -lgcc before linking 
with libc, not just after (this being the issue involved in the present 
case), and use -Wl,--start-group -Wl,--end-group around libc.so and ld.so, 
both to be more similar to how things work with the installed compiler?  
Which makes sense, but is also rather risky for this development stage.
  
Tulio Magno Quites Machado Filho July 5, 2017, 8:31 p.m. UTC | #7
"Gabriel F. T. Gomes" <gftg@linux.vnet.ibm.com> writes:

> [ text/plain ]
> On Wed,  5 Jul 2017 14:23:15 -0300
> "Tulio Magno Quites Machado Filho" <tuliom@linux.vnet.ibm.com> wrote:
>>
>> @@ -24,6 +31,9 @@ CFLAGS-tst-strtod6.c += -mfloat128
>>  CFLAGS-tst-strfrom.c += -mfloat128
>>  CFLAGS-tst-strfrom-locale.c += -mfloat128
>>  CFLAGS-strfrom-skeleton.c += -mfloat128
>> +$(foreach test,bug-strtod bug-strtod2 bug-strtod2 tst-strtod-round \
>> +tst-wcstod-round tst-strtod6 tst-strrom tst-strfrom-locale \
>> +strfrom-skeleton,$(objpfx)$(pref)): gnulib-tests += $(f128-loader-link)
>                                ~~~~
> I understand what you are trying to do, but I did not understand what is
> the effect of $(pref).  I would expect it to be $(test).

Yes.
Sorry, I messed up the patch while preparing for submission.
  
Florian Weimer July 6, 2017, 6:12 a.m. UTC | #8
On 07/05/2017 10:11 PM, Joseph Myers wrote:
> The installed compiler does -lgcc -lc -lgcc (roughly; -lgcc may actually 
> use -lgcc_s or -lgcc_eh, and -lc may vary for e.g. profiling).  -lc is a 
> linker script doing e.g. GROUP ( /lib64/libc.so.6 
> /usr/lib64/libc_nonshared.a  AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) ).  
> The shared libgcc is also a linker script on some platforms.
> 
> The glibc build uses libc.so --as-needed ld.so --no-as-needed -lgcc 
> (again, roughly) (but libc.a -lgcc libc.a -lgcc in the static build case).
> 
> I suppose the suggestion is that we should both use -lgcc before linking 
> with libc, not just after (this being the issue involved in the present 
> case), and use -Wl,--start-group -Wl,--end-group around libc.so and ld.so, 
> both to be more similar to how things work with the installed compiler?  
> Which makes sense, but is also rather risky for this development stage.

Okay, then let's postpone the more general approach and use whatever
works on POWER for the release.

I assumed that the general fix would still be POWER-specific, but that
does not seem to be the case.

Thanks,
Florian
  
Carlos O'Donell July 6, 2017, 2:04 p.m. UTC | #9
On 07/06/2017 02:12 AM, Florian Weimer wrote:
> On 07/05/2017 10:11 PM, Joseph Myers wrote:
>> The installed compiler does -lgcc -lc -lgcc (roughly; -lgcc may actually 
>> use -lgcc_s or -lgcc_eh, and -lc may vary for e.g. profiling).  -lc is a 
>> linker script doing e.g. GROUP ( /lib64/libc.so.6 
>> /usr/lib64/libc_nonshared.a  AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) ).  
>> The shared libgcc is also a linker script on some platforms.
>>
>> The glibc build uses libc.so --as-needed ld.so --no-as-needed -lgcc 
>> (again, roughly) (but libc.a -lgcc libc.a -lgcc in the static build case).
>>
>> I suppose the suggestion is that we should both use -lgcc before linking 
>> with libc, not just after (this being the issue involved in the present 
>> case), and use -Wl,--start-group -Wl,--end-group around libc.so and ld.so, 
>> both to be more similar to how things work with the installed compiler?  
>> Which makes sense, but is also rather risky for this development stage.
> 
> Okay, then let's postpone the more general approach and use whatever
> works on POWER for the release.

Right, for now only POWER needs the link command fixed.

> I assumed that the general fix would still be POWER-specific, but that
> does not seem to be the case.

Right, the fix is in general code, but it fixes a general problem for
all machines. Every machine, including x86_64, used to segfault upon
first use of __thread in an IFUNC resolver function, and the change
fixes that.
  
Florian Weimer July 7, 2017, 10:08 a.m. UTC | #10
On 07/06/2017 04:04 PM, Carlos O'Donell wrote:
> On 07/06/2017 02:12 AM, Florian Weimer wrote:
>> On 07/05/2017 10:11 PM, Joseph Myers wrote:
>>> The installed compiler does -lgcc -lc -lgcc (roughly; -lgcc may actually 
>>> use -lgcc_s or -lgcc_eh, and -lc may vary for e.g. profiling).  -lc is a 
>>> linker script doing e.g. GROUP ( /lib64/libc.so.6 
>>> /usr/lib64/libc_nonshared.a  AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) ).  
>>> The shared libgcc is also a linker script on some platforms.
>>>
>>> The glibc build uses libc.so --as-needed ld.so --no-as-needed -lgcc 
>>> (again, roughly) (but libc.a -lgcc libc.a -lgcc in the static build case).
>>>
>>> I suppose the suggestion is that we should both use -lgcc before linking 
>>> with libc, not just after (this being the issue involved in the present 
>>> case), and use -Wl,--start-group -Wl,--end-group around libc.so and ld.so, 
>>> both to be more similar to how things work with the installed compiler?  
>>> Which makes sense, but is also rather risky for this development stage.
>>
>> Okay, then let's postpone the more general approach and use whatever
>> works on POWER for the release.
> 
> Right, for now only POWER needs the link command fixed.
> 
>> I assumed that the general fix would still be POWER-specific, but that
>> does not seem to be the case.
> 
> Right, the fix is in general code, but it fixes a general problem for
> all machines. Every machine, including x86_64, used to segfault upon
> first use of __thread in an IFUNC resolver function, and the change
> fixes that.

Note that I was discussing the Makefile changes, not the apply_urel
reordering (which affects only static binaries anyway).

Florian
  
Tulio Magno Quites Machado Filho July 8, 2017, 6:45 p.m. UTC | #11
Carlos O'Donell <carlos@redhat.com> writes:

> On 07/05/2017 01:23 PM, Tulio Magno Quites Machado Filho wrote:
>>> On 06/27/2017 09:05 AM, Florian Weimer wrote:
>>>> On 06/27/2017 01:02 AM, Joseph Myers wrote:
>>>>> I'm seeing a testsuite regression for powerpc64le with
>>>>> build-many-glibcs.py with this patch (elf/check-localplt fails), did you
>>>>> not see that in your testing?
>>>>>
>>>>> https://sourceware.org/ml/libc-testresults/2017-q2/msg00427.html
>>>>>
>>>>> The failure is: "Extra PLT reference: libc.so: __getauxval".  As the
>>>>> __getauxval reference comes from have_ieee_hw_p in libgcc, presumably you
>>>>> need to allow that PLT reference in localplt.data with an appropriate
>>>>> comment, as it won't be readily possible to avoid it.
>>>>
>>>> The __getauxval call happens from IFUNC resolvers and violates current
>>>> guidelines regarding what can be done from IFUNC resolvers.  This is
>>>> another reason to get rid of the PLT reference.
>>>>
>>>> My IFUNC resolver enhancements are not ready for 2.26, and I plan to
>>>> wait for DJ's dl-minimal malloc improvements to land, rather than
>>>> rolling my own memory allocator to back the IFUNC resolver queue.
>>>
>>> The above isn't correct because even if you can call getauxval, it
>>> doesn't have the data to return meaningful results during relocation.
>>> This currently breaks --enable-bind-now builds on pcp64le:
>>>
>>>   https://sourceware.org/bugzilla/show_bug.cgi?id=21707
>> 
>> I agree.
>> 
>>> For the time being, we just disable --enable-bind-now, but I'd prefer a
>>> proper fix, perhaps the one suggested here:
>>>
>>>   https://sourceware.org/ml/libc-alpha/2017-06/msg01383.html
>> 
>> Unfortunately, this patch creates another issue: the thread pointer is not
>> initialized at the time of IRELA relocations in static executables.
>> 
>> The following changes solve this issue.
>
> OK to checkin with a test.
>
> You need a static program with an IFUNC resolver that touches a __thread
> variable and fails before your patch, and passes after. This is marginal
> proof that TLS is operational after your change. Have the resolver set the
> TLS variable to some value and check that value in main.

It wasn't my intention to allow writing to the TLS from an IFUNC resolver.
It was intended to enable reading from the TCB.
Writing to the TLS during IRELA relocation is still not allowed because the
initialization of thread-local variables happen after relocating all objects
in a dynamically-linked executable.

So, I adapted your test to read the address of a thread-local variable
and included tests both for statically and dynamically-linked executables
in the version 2 of the patch.

Thanks!
  

Patch

diff --git a/csu/libc-start.c b/csu/libc-start.c
index c2dd159..e6fa848 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -188,12 +188,15 @@  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 
   ARCH_INIT_CPU_FEATURES ();
 
-  /* Perform IREL{,A} relocations.  */
-  apply_irel ();
-
   /* The stack guard goes into the TCB, so initialize it early.  */
   __libc_setup_tls ();
 
+  /* Perform IREL{,A} relocations.
+     Note: the relocations must happen after TLS initialization so that
+     IFUNC resolvers can benefit from thread-local storage, e.g. powerpc's
+     hwcap and platform fields available in the TCB.  */
+  apply_irel ();
+
   /* Set up the stack checker's canary.  */
   uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
 # ifdef THREAD_SET_STACK_GUARD
diff --git a/sysdeps/powerpc/powerpc64le/Makefile b/sysdeps/powerpc/powerpc64le/Makefile
index bd8a82d..abaa74d 100644
--- a/sysdeps/powerpc/powerpc64le/Makefile
+++ b/sysdeps/powerpc/powerpc64le/Makefile
@@ -1,6 +1,11 @@ 
 # When building float128 we need to ensure -mfloat128 is
 # passed to all such object files.
 
+# libgcc requires __tcb_parse_hwcap_and_convert_at_platform when built with
+# a binary128 type.  That symbol is provided by the loader on dynamically
+# linked executables, forcing to link the loader after libgcc link.
+f128-loader-link = $(as-needed) $(elf-objpfx)ld.so $(no-as-needed)
+
 ifeq ($(subdir),math)
 # sqrtf128 requires emulation before POWER9.
 CPPFLAGS += -I../soft-fp
@@ -11,6 +16,8 @@  $(foreach suf,$(all-object-suffixes),%f128_r$(suf)): CFLAGS += -mfloat128
 $(objpfx)test-float128%.o $(objpfx)test-float128%.os: CFLAGS += -mfloat128
 $(objpfx)test-ifloat128%.o $(objpfx)test-ifloat128%.os: CFLAGS += -mfloat128
 CFLAGS-libm-test-support-float128.c += -mfloat128
+$(objpfx)test-float128% $(objpfx)test-ifloat128%: \
+  gnulib-tests += $(f128-loader-link)
 endif
 
 # Append flags to string <-> _Float128 routines.
@@ -24,6 +31,9 @@  CFLAGS-tst-strtod6.c += -mfloat128
 CFLAGS-tst-strfrom.c += -mfloat128
 CFLAGS-tst-strfrom-locale.c += -mfloat128
 CFLAGS-strfrom-skeleton.c += -mfloat128
+$(foreach test,bug-strtod bug-strtod2 bug-strtod2 tst-strtod-round \
+tst-wcstod-round tst-strtod6 tst-strrom tst-strfrom-locale \
+strfrom-skeleton,$(objpfx)$(pref)): gnulib-tests += $(f128-loader-link)
 
 # When building glibc with support for _Float128, the powers of ten tables in
 # fpioconst.c and in the string conversion functions must be extended.