[PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]

Message ID 20200504152503.9608-1-murphyp@linux.vnet.ibm.com
State Committed
Headers
Series [PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905] |

Commit Message

Paul E. Murphy May 4, 2020, 3:25 p.m. UTC
  Use rtld-strcpy.S file in power9 to redirect to power8 implementation
instead.  This builds the power8 strcpy rtld with --with-cpu=power9 and
--disable-multi-arch.  The existing behavior is unchanged when
multi-arch is enabled.

---8<---
strcmp is used while resolving PLT references.  Vector registers
should not be used during this.  The P9 strcmp makes heavy use of
vector registers, so it should be avoided in rtld.

This prevents quiet vector register corruption when glibc is configured
with --disable-multi-arch and --with-cpu=power9.  This can be seen with
test-float64x-compat_totalordermag during the first call into
totalordermagf64x@GLIBC_2.27.

Add a guard to fallback to the power8 implementation when building
power9 strcmp for libraries other than libc.
---
 sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
  

Comments

Adhemerval Zanella May 4, 2020, 4:36 p.m. UTC | #1
On 04/05/2020 12:25, Paul E. Murphy wrote:
> Use rtld-strcpy.S file in power9 to redirect to power8 implementation
> instead.  This builds the power8 strcpy rtld with --with-cpu=power9 and

s/strcpy/strcmp.

> --disable-multi-arch.  The existing behavior is unchanged when
> multi-arch is enabled.

LGTM thanks.

As a side note, is strcmp the only string routine that used in lazy
resolution?

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> 
> ---8<---
> strcmp is used while resolving PLT references.  Vector registers
> should not be used during this.  The P9 strcmp makes heavy use of
> vector registers, so it should be avoided in rtld.
> 
> This prevents quiet vector register corruption when glibc is configured
> with --disable-multi-arch and --with-cpu=power9.  This can be seen with
> test-float64x-compat_totalordermag during the first call into
> totalordermagf64x@GLIBC_2.27.
> 
> Add a guard to fallback to the power8 implementation when building
> power9 strcmp for libraries other than libc.
> ---
>  sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S | 2 ++
>  1 file changed, 2 insertions(+)
>  create mode 100644 sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
> 
> diff --git a/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
> new file mode 100644
> index 0000000000..afdb492b3d
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
> @@ -0,0 +1,2 @@
> +/* Fallback to P8 which does not use vector regs for rtld.  */
> +#include <sysdeps/powerpc/powerpc64/power8/strcmp.S>
>
  
Paul E Murphy May 4, 2020, 6:02 p.m. UTC | #2
On 5/4/20 11:36 AM, Adhemerval Zanella wrote:

> As a side note, is strcmp the only string routine that used in lazy
> resolution?

It was the only one I encountered while debugging this issue.  I won't 
claim it is the only one, I have limited experience with that corner of 
glibc.  Though, I am surprised only one test failed due to the vector 
register clobbering.
  
Adhemerval Zanella May 4, 2020, 6:26 p.m. UTC | #3
On 04/05/2020 15:02, Paul E Murphy wrote:
> 
> 
> On 5/4/20 11:36 AM, Adhemerval Zanella wrote:
> 
>> As a side note, is strcmp the only string routine that used in lazy
>> resolution?
> 
> It was the only one I encountered while debugging this issue.  I won't claim it is the only one, I have limited experience with that corner of glibc.  Though, I am surprised only one test failed due to the vector register clobbering.

I am wondering which would be best way to avoid such issue to happen
with different string routines since some specific loader code need 
to adhere to more strict ABI requirements.

The powerpc Makefile already adds the expected compiler tools to
avoid this very issue:

---
sysdeps/powerpc/powerpc64/Makefile

 13 # These flags prevent FPU or Altivec registers from being used,
 14 # for code called in contexts that is not allowed to touch those registers.
 15 # Stupid GCC requires us to pass all these ridiculous switches.  We need to
 16 # pass the -mno-* switches as well to prevent the compiler from attempting
 17 # to emit altivec or vsx instructions, especially when the registers aren't
 18 # available.
 19 no-special-regs := $(sort $(foreach n,40 41 50 51 60 61 62 63 \
 20                                       $(foreach m,2 3 4 5 6 7 8 9, \
 21                                                   3$m 4$m 5$m),\
 22                                     -ffixed-$n)) \
 23                    $(sort $(foreach n,$(foreach m,0 1 2 3 4 5 6 7 8 9,\
 24                                                 $m 1$m 2$m) 30 31,\
 25                                     -ffixed-v$n)) \
 26                    -ffixed-vrsave -ffixed-vscr -mno-altivec -mno-vsx
[...]
 36 CFLAGS-dl-runtime.os = $(no-special-regs)
 37 CFLAGS-dl-lookup.os = $(no-special-regs)
 38 CFLAGS-dl-misc.os = $(no-special-regs)
 39 CFLAGS-rtld-mempcpy.os = $(no-special-regs)
 40 CFLAGS-rtld-memmove.os = $(no-special-regs)
 41 CFLAGS-rtld-memchr.os = $(no-special-regs)
 42 CFLAGS-rtld-strnlen.os = $(no-special-regs)
---

However it does not help when an arch-specific implementation might ended being
pulled in rtld build and thus making this ineffective (And it seems that 
the strcmp is missing as well for the powerpc case).

And using __builtin_strcmp not -ffreestanding does really help on dl-runtime,
since strcmp call might still being generated by compiler.  

Maybe one option could to enforce all rtld code to use generic C implementation
instead, not allowing sysdeps folder to override it.  It should scale better
than actually testing all possible permutation and require less hacky solutions
like overriding the loader object for an specific build option as this patch.
  
Paul E Murphy May 4, 2020, 9:16 p.m. UTC | #4
On 5/4/20 1:26 PM, Adhemerval Zanella wrote:
> 
> 
> On 04/05/2020 15:02, Paul E Murphy wrote:
>>
>>
>> On 5/4/20 11:36 AM, Adhemerval Zanella wrote:
>>
>>> As a side note, is strcmp the only string routine that used in lazy
>>> resolution?
>>
>> It was the only one I encountered while debugging this issue.  I won't claim it is the only one, I have limited experience with that corner of glibc.  Though, I am surprised only one test failed due to the vector register clobbering.
> 
> I am wondering which would be best way to avoid such issue to happen
> with different string routines since some specific loader code need
> to adhere to more strict ABI requirements.
> 
> The powerpc Makefile already adds the expected compiler tools to
> avoid this very issue:
> 
> ---
> sysdeps/powerpc/powerpc64/Makefile
> 
>   13 # These flags prevent FPU or Altivec registers from being used,
>   14 # for code called in contexts that is not allowed to touch those registers.
>   15 # Stupid GCC requires us to pass all these ridiculous switches.  We need to
>   16 # pass the -mno-* switches as well to prevent the compiler from attempting
>   17 # to emit altivec or vsx instructions, especially when the registers aren't
>   18 # available.
>   19 no-special-regs := $(sort $(foreach n,40 41 50 51 60 61 62 63 \
>   20                                       $(foreach m,2 3 4 5 6 7 8 9, \
>   21                                                   3$m 4$m 5$m),\
>   22                                     -ffixed-$n)) \
>   23                    $(sort $(foreach n,$(foreach m,0 1 2 3 4 5 6 7 8 9,\
>   24                                                 $m 1$m 2$m) 30 31,\
>   25                                     -ffixed-v$n)) \
>   26                    -ffixed-vrsave -ffixed-vscr -mno-altivec -mno-vsx
> [...]
>   36 CFLAGS-dl-runtime.os = $(no-special-regs)
>   37 CFLAGS-dl-lookup.os = $(no-special-regs)
>   38 CFLAGS-dl-misc.os = $(no-special-regs)


>   39 CFLAGS-rtld-mempcpy.os = $(no-special-regs)
>   40 CFLAGS-rtld-memmove.os = $(no-special-regs)
>   41 CFLAGS-rtld-memchr.os = $(no-special-regs)
>   42 CFLAGS-rtld-strnlen.os = $(no-special-regs)

Hrm, all the string functions listed above generate VMX instructions 
when building with --with-cpu=power9 --disable-multi-arch.  I question 
the correctness and completeness of this list as strcmp is missing.

> ---
> 
> However it does not help when an arch-specific implementation might ended being
> pulled in rtld build and thus making this ineffective (And it seems that
> the strcmp is missing as well for the powerpc case).
> 
> And using __builtin_strcmp not -ffreestanding does really help on dl-runtime,
> since strcmp call might still being generated by compiler.
> 
> Maybe one option could to enforce all rtld code to use generic C implementation
> instead, not allowing sysdeps folder to override it.  It should scale better
> than actually testing all possible permutation and require less hacky solutions
> like overriding the loader object for an specific build option as this patch.
> 
I agree with that line of thought.  However, what would the performance 
impact be?  I would hazard to guess it would be limited.
  
Adhemerval Zanella May 5, 2020, 9:01 p.m. UTC | #5
On 04/05/2020 18:16, Paul E Murphy wrote:
> 
> 
> On 5/4/20 1:26 PM, Adhemerval Zanella wrote:
>>
>>
>> On 04/05/2020 15:02, Paul E Murphy wrote:
>>>
>>>
>>> On 5/4/20 11:36 AM, Adhemerval Zanella wrote:
>>>
>>>> As a side note, is strcmp the only string routine that used in lazy
>>>> resolution?
>>>
>>> It was the only one I encountered while debugging this issue.  I won't claim it is the only one, I have limited experience with that corner of glibc.  Though, I am surprised only one test failed due to the vector register clobbering.
>>
>> I am wondering which would be best way to avoid such issue to happen
>> with different string routines since some specific loader code need
>> to adhere to more strict ABI requirements.
>>
>> The powerpc Makefile already adds the expected compiler tools to
>> avoid this very issue:
>>
>> ---
>> sysdeps/powerpc/powerpc64/Makefile
>>
>>   13 # These flags prevent FPU or Altivec registers from being used,
>>   14 # for code called in contexts that is not allowed to touch those registers.
>>   15 # Stupid GCC requires us to pass all these ridiculous switches.  We need to
>>   16 # pass the -mno-* switches as well to prevent the compiler from attempting
>>   17 # to emit altivec or vsx instructions, especially when the registers aren't
>>   18 # available.
>>   19 no-special-regs := $(sort $(foreach n,40 41 50 51 60 61 62 63 \
>>   20                                       $(foreach m,2 3 4 5 6 7 8 9, \
>>   21                                                   3$m 4$m 5$m),\
>>   22                                     -ffixed-$n)) \
>>   23                    $(sort $(foreach n,$(foreach m,0 1 2 3 4 5 6 7 8 9,\
>>   24                                                 $m 1$m 2$m) 30 31,\
>>   25                                     -ffixed-v$n)) \
>>   26                    -ffixed-vrsave -ffixed-vscr -mno-altivec -mno-vsx
>> [...]
>>   36 CFLAGS-dl-runtime.os = $(no-special-regs)
>>   37 CFLAGS-dl-lookup.os = $(no-special-regs)
>>   38 CFLAGS-dl-misc.os = $(no-special-regs)
> 
> 
>>   39 CFLAGS-rtld-mempcpy.os = $(no-special-regs)
>>   40 CFLAGS-rtld-memmove.os = $(no-special-regs)
>>   41 CFLAGS-rtld-memchr.os = $(no-special-regs)
>>   42 CFLAGS-rtld-strnlen.os = $(no-special-regs)
> 
> Hrm, all the string functions listed above generate VMX instructions when building with --with-cpu=power9 --disable-multi-arch.  I question the correctness and completeness of this list as strcmp is missing.

I think that's the problem of having a dissociated rule from the code it actually
uses, since its definition might be unnecessary if the code is refactored or
changed.  I didn't dig into the history of the lazy resolution code, but it
might the case where old implementation did actually used the referenced routines.

What I think would be a better approach is to make no-special-regs a parametric
rule define in the architecture Makefile (which should know which registers
it supports in lazy resolution) and use this rule (no-special-regs) on generic
definition in elf/Makefile.

> 
>> ---
>>
>> However it does not help when an arch-specific implementation might ended being
>> pulled in rtld build and thus making this ineffective (And it seems that
>> the strcmp is missing as well for the powerpc case).
>>
>> And using __builtin_strcmp not -ffreestanding does really help on dl-runtime,
>> since strcmp call might still being generated by compiler.
>>
>> Maybe one option could to enforce all rtld code to use generic C implementation
>> instead, not allowing sysdeps folder to override it.  It should scale better
>> than actually testing all possible permutation and require less hacky solutions
>> like overriding the loader object for an specific build option as this patch.
>>
> I agree with that line of thought.  However, what would the performance impact be?  I would hazard to guess it would be limited.

It would impart mostly the loader code and I would expects mostly in program 
loading (since distros now are moving to make bind now the default build 
option) with large a large number of symbols with large sizes (most likely C++
programs with load of shared libraries).

Using clang default installation from ubuntu-10 (which links to quite large C++
libraries with more than 30k symbols) on x86_64, strcmp consumes about ~3% of 
total runtime. And adding a very crude strcmp profiling using the same strategy
for relocation time from rtld.c at elf/dl-lookup.c it seems strcmp consumes
about 2.2% of total relocation time.

Most of time is in fact spend at _dl_lookup_symbol_x and _dl_relocate_object,
so I am not sure if using the best mem* routines in dynamic loader is indeed
paramount.
  

Patch

diff --git a/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
new file mode 100644
index 0000000000..afdb492b3d
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
@@ -0,0 +1,2 @@ 
+/* Fallback to P8 which does not use vector regs for rtld.  */
+#include <sysdeps/powerpc/powerpc64/power8/strcmp.S>