elf,nptl: Add -z lazy -z norelro to tests that need it

Message ID 20230302112519.914641-1-arsen@gentoo.org
State Superseded
Headers
Series elf,nptl: Add -z lazy -z norelro to tests that need it |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Arsen Arsenović March 2, 2023, 11:25 a.m. UTC
  Some toolchains, such as that used on Gentoo Hardened, set -z now -z
relro out of the box.  These flags break tests that rely on fixups in
underlinked libraries being applied after a dlopen happens.
---
Morning,

This patch fixes a few test failures that we encounter on Gentoo when
using a toolchain that builds packages with full relro.

All of these failures come down to underlinked test objects, which are
inherently in opposition with it.

Tested on x86_64-pc-linux-gnu against 2.37 and
59a6d5e9477695c41d6feef7ef8636f8f744f3c5, so it should be okay for
backporting too.

Have a lovely day!

 elf/Makefile  | 22 +++++++++++++++++++++-
 nptl/Makefile |  2 +-
 2 files changed, 22 insertions(+), 2 deletions(-)
  

Comments

Florian Weimer March 3, 2023, 11:51 a.m. UTC | #1
* Arsen Arsenović via Libc-alpha:

> Some toolchains, such as that used on Gentoo Hardened, set -z now -z
> relro out of the box.  These flags break tests that rely on fixups in
> underlinked libraries being applied after a dlopen happens.

I'm surprised that -z norelro is ever required.  Why isn't -z lazy
enough?  If ld.so crashes because it attempts to apply relocations after
the fact, woudln't that be an ld.so bug (or a linker bug that sets up
the RELRO segment incorrectly)?

Thanks,
Florian
  
Arsen Arsenović March 3, 2023, 9:54 p.m. UTC | #2
Florian Weimer <fweimer@redhat.com> writes:

> * Arsen Arsenović via Libc-alpha:
>
>> Some toolchains, such as that used on Gentoo Hardened, set -z now -z
>> relro out of the box.  These flags break tests that rely on fixups in
>> underlinked libraries being applied after a dlopen happens.
>
> I'm surprised that -z norelro is ever required.  Why isn't -z lazy
> enough?  If ld.so crashes because it attempts to apply relocations after
> the fact, woudln't that be an ld.so bug (or a linker bug that sets up
> the RELRO segment incorrectly)?

Hm.  Something went awry while I was debugging this.  I looked at a test
again just now and noticed that the symbols some of these tests were
crashing on came from libc (dlopen here) while loading constload2 (which
is dlopen'd from constload1).  The backtrace contains a PLT trampoline
which then fixups dlopen inside the RELRO segment.

I take it dlopen@got[plt] is not supposed to be in the RELRO range?

I could have sworn this failed when fixing up bar (void) as a result of
constload2 dlopening constload3... but maybe that was a different
failure.

Let's put this patch on hold while I investigate further.

FWIW, this should be easy to reproduce by building with CC='gcc
-Wl,-z,relro,-z,now' or so, I think.

Thanks, sorry about the fuss.
  
Arsen Arsenović March 4, 2023, 5:46 p.m. UTC | #3
Arsen Arsenović <arsen@gentoo.org> writes:

> Hm.  Something went awry while I was debugging this.  I looked at a test
> again just now and noticed that the symbols some of these tests were
> crashing on came from libc (dlopen here) while loading constload2 (which
> is dlopen'd from constload1).  The backtrace contains a PLT trampoline
> which then fixups dlopen inside the RELRO segment.
>
> I take it dlopen@got[plt] is not supposed to be in the RELRO range?
>
> I could have sworn this failed when fixing up bar (void) as a result of
> constload2 dlopening constload3... but maybe that was a different
> failure.
>
> Let's put this patch on hold while I investigate further.
>
> FWIW, this should be easy to reproduce by building with CC='gcc
> -Wl,-z,relro,-z,now' or so, I think.

Ah, I think I see the issue:

  ~/gnu/glibc/b2$ diff -u0 shlib.lds.-Wl,-z,{lazy,now},-z,relro 
  --- shlib.lds.-Wl,-z,lazy,-z,relro	2023-03-04 19:54:42.977032934 +0100
  +++ shlib.lds.-Wl,-z,now,-z,relro	2023-03-04 18:57:03.195010040 +0100
  @@ -1 +1 @@
  -/* Script for -shared -z combreloc -z separate-code */
  +/* Script for -shared -z combreloc -z separate-code -z relro -z now */
  @@ -153,3 +153,2 @@
  -  .got            : { *(.got) *(.igot) }
  -  . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .);
  -  .got.plt        : { *(.got.plt) *(.igot.plt) }
  +  .got            : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) }
  +  . = DATA_SEGMENT_RELRO_END (0, .);
  ~/gnu/glibc/b2 1 $ 

The builds system assumes that all the flags used while building glibc
use the same linker script, and that this will be the same linker script
as the one that's used initially to generate shlib.lds.  This is not
true when -z relro is set and -z {now,lazy} are being varied.

This also explains why the problem only arose after we introduced
-Wl,-z,now.

Do the tests even need to use this linker script?  If not, it's probably
best to just not use it for tests.  I can wire that case up, if you
think that is sensible.

Thanks in advance.
  
Florian Weimer March 6, 2023, 9:15 a.m. UTC | #4
* Arsen Arsenović:

> Arsen Arsenović <arsen@gentoo.org> writes:
>
>> Hm.  Something went awry while I was debugging this.  I looked at a test
>> again just now and noticed that the symbols some of these tests were
>> crashing on came from libc (dlopen here) while loading constload2 (which
>> is dlopen'd from constload1).  The backtrace contains a PLT trampoline
>> which then fixups dlopen inside the RELRO segment.
>>
>> I take it dlopen@got[plt] is not supposed to be in the RELRO range?
>>
>> I could have sworn this failed when fixing up bar (void) as a result of
>> constload2 dlopening constload3... but maybe that was a different
>> failure.
>>
>> Let's put this patch on hold while I investigate further.
>>
>> FWIW, this should be easy to reproduce by building with CC='gcc
>> -Wl,-z,relro,-z,now' or so, I think.
>
> Ah, I think I see the issue:
>
>   ~/gnu/glibc/b2$ diff -u0 shlib.lds.-Wl,-z,{lazy,now},-z,relro 
>   --- shlib.lds.-Wl,-z,lazy,-z,relro	2023-03-04 19:54:42.977032934 +0100
>   +++ shlib.lds.-Wl,-z,now,-z,relro	2023-03-04 18:57:03.195010040 +0100
>   @@ -1 +1 @@
>   -/* Script for -shared -z combreloc -z separate-code */
>   +/* Script for -shared -z combreloc -z separate-code -z relro -z now */
>   @@ -153,3 +153,2 @@
>   -  .got            : { *(.got) *(.igot) }
>   -  . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .);
>   -  .got.plt        : { *(.got.plt) *(.igot.plt) }
>   +  .got            : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) }
>   +  . = DATA_SEGMENT_RELRO_END (0, .);
>   ~/gnu/glibc/b2 1 $ 
>
> The builds system assumes that all the flags used while building glibc
> use the same linker script, and that this will be the same linker script
> as the one that's used initially to generate shlib.lds.  This is not
> true when -z relro is set and -z {now,lazy} are being varied.
>
> This also explains why the problem only arose after we introduced
> -Wl,-z,now.

Interesting, thanks for looking into this.

Does the issue go away if you configure with --enable-bind-now?

That's what we are doing, and we don't see those test failures.

Florian
  
Arsen Arsenović March 6, 2023, 2:17 p.m. UTC | #5
Florian Weimer <fweimer@redhat.com> writes:

> * Arsen Arsenović:
>
>> Arsen Arsenović <arsen@gentoo.org> writes:
>>
>>> Hm.  Something went awry while I was debugging this.  I looked at a test
>>> again just now and noticed that the symbols some of these tests were
>>> crashing on came from libc (dlopen here) while loading constload2 (which
>>> is dlopen'd from constload1).  The backtrace contains a PLT trampoline
>>> which then fixups dlopen inside the RELRO segment.
>>>
>>> I take it dlopen@got[plt] is not supposed to be in the RELRO range?
>>>
>>> I could have sworn this failed when fixing up bar (void) as a result of
>>> constload2 dlopening constload3... but maybe that was a different
>>> failure.
>>>
>>> Let's put this patch on hold while I investigate further.
>>>
>>> FWIW, this should be easy to reproduce by building with CC='gcc
>>> -Wl,-z,relro,-z,now' or so, I think.
>>
>> Ah, I think I see the issue:
>>
>>   ~/gnu/glibc/b2$ diff -u0 shlib.lds.-Wl,-z,{lazy,now},-z,relro 
>>   --- shlib.lds.-Wl,-z,lazy,-z,relro	2023-03-04 19:54:42.977032934 +0100
>>   +++ shlib.lds.-Wl,-z,now,-z,relro	2023-03-04 18:57:03.195010040 +0100
>>   @@ -1 +1 @@
>>   -/* Script for -shared -z combreloc -z separate-code */
>>   +/* Script for -shared -z combreloc -z separate-code -z relro -z now */
>>   @@ -153,3 +153,2 @@
>>   -  .got            : { *(.got) *(.igot) }
>>   -  . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .);
>>   -  .got.plt        : { *(.got.plt) *(.igot.plt) }
>>   +  .got            : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) }
>>   +  . = DATA_SEGMENT_RELRO_END (0, .);
>>   ~/gnu/glibc/b2 1 $ 
>>
>> The builds system assumes that all the flags used while building glibc
>> use the same linker script, and that this will be the same linker script
>> as the one that's used initially to generate shlib.lds.  This is not
>> true when -z relro is set and -z {now,lazy} are being varied.
>>
>> This also explains why the problem only arose after we introduced
>> -Wl,-z,now.
>
> Interesting, thanks for looking into this.
>
> Does the issue go away if you configure with --enable-bind-now?
>
> That's what we are doing, and we don't see those test failures.

From a brief look, it would seem to still leave some failures which
might be related (specifically the relro hardening tests).  I haven't
investigated.

A patchset by Adhemerval seems to remove the linker script altogether
piqued my interest.  I'll test his patch on our toolchain, I suspect
that it removes this problem altogether.  I'll keep you posted.

https://inbox.sourceware.org/libc-alpha/20221227211145.3765256-1-adhemerval.zanella@linaro.org/

Thanks, have a lovely day.
  
Arsen Arsenović March 6, 2023, 4:42 p.m. UTC | #6
Arsen Arsenović <arsen@gentoo.org> writes:

> Florian Weimer <fweimer@redhat.com> writes:
>
> From a brief look, it would seem to still leave some failures which
> might be related (specifically the relro hardening tests).  I haven't
> investigated.
>
> A patchset by Adhemerval seems to remove the linker script altogether
> piqued my interest.  I'll test his patch on our toolchain, I suspect
> that it removes this problem altogether.  I'll keep you posted.
>
> https://inbox.sourceware.org/libc-alpha/20221227211145.3765256-1-adhemerval.zanella@linaro.org/
>
> Thanks, have a lovely day.

This patchset indeed removes the need for norelro.  Will send a reroll
tonight.
  

Patch

diff --git a/elf/Makefile b/elf/Makefile
index 0d19964d42..3387b7e6c9 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -1190,6 +1190,11 @@  postclean-generated += $(objpfx)/dso-sort-tests-2.generated-makefile \
 ifeq (yes,$(have-tunables))
 $(eval $(call include_dsosort_tests,dso-sort-tests-1.def))
 $(eval $(call include_dsosort_tests,dso-sort-tests-2.def))
+
+# BZ15311 is intentionally underlinked.
+LDFLAGS-tst-bz15311-b.so += -Wl,-z,lazy -Wl,-z,norelro
+LDFLAGS-tst-bz15311-c.so += -Wl,-z,lazy -Wl,-z,norelro
+LDFLAGS-tst-bz15311-d.so += -Wl,-z,lazy -Wl,-z,norelro
 endif
 
 check-abi: $(objpfx)check-abi-ld.out \
@@ -1514,6 +1519,20 @@  LDFLAGS-tst-initorderb2.so = -Wl,--no-as-needed
 LDFLAGS-tst-tlsmod5.so = -nostdlib -Wl,--no-as-needed
 LDFLAGS-tst-tlsmod6.so = -nostdlib -Wl,--no-as-needed
 
+# The following tests are underlinked or rely on fixups being applied after a
+# dlopen call.  On toolchains that set -z now and -z relro by default, this
+# leads to failures to load or fix up the executables being tested.
+LDFLAGS-circlemod2.so = -Wl,-z,lazy
+LDFLAGS-tst-tls20mod-bad.so = -Wl,-z,lazy
+LDFLAGS-reldep6mod1.so += -Wl,-z,lazy -Wl,-z,norelro
+LDFLAGS-constload2.so = -Wl,-z,lazy -Wl,-z,norelro
+LDFLAGS-constload3.so = -Wl,-z,lazy -Wl,-z,norelro
+LDFLAGS-dblloadmod3.so = -Wl,-z,lazy -Wl,-z,norelro
+LDFLAGS-ifuncmod6.so = -Wl,-z,lazy -Wl,-z,norelro
+LDFLAGS-ltglobmod2.so = -Wl,-z,lazy -Wl,-z,norelro
+LDFLAGS-testobj1.so = -Wl,-z,lazy -Wl,-z,norelro
+LDFLAGS-testobj6.so = -Wl,-z,lazy -Wl,-z,norelro
+
 testobj1.so-no-z-defs = yes
 testobj3.so-no-z-defs = yes
 testobj4.so-no-z-defs = yes
@@ -1612,6 +1631,7 @@  $(objpfx)multiload.out: $(objpfx)testobj1.so
 LDFLAGS-origtest = -rdynamic
 $(objpfx)origtest.out: $(objpfx)testobj1.so
 
+$(objpfx)resolvfail.out: $(objpfx)testobj1.so
 ifeq ($(have-thread-library),yes)
 $(objpfx)resolvfail: $(shared-thread-library)
 endif
@@ -2337,7 +2357,7 @@  LDFLAGS-tst-audit25a = -Wl,-z,lazy
 $(objpfx)tst-audit25mod1.so: $(objpfx)tst-audit25mod3.so
 LDFLAGS-tst-audit25mod1.so = -Wl,-z,now
 $(objpfx)tst-audit25mod2.so: $(objpfx)tst-audit25mod4.so
-LDFLAGS-tst-audit25mod2.so = -Wl,-z,lazy
+LDFLAGS-tst-audit25mod2.so = -Wl,-z,lazy -Wl,-z,norelro
 tst-audit25a-ARGS = -- $(host-test-program-cmd)
 
 $(objpfx)tst-audit25b.out: $(objpfx)tst-auditmod25.so
diff --git a/nptl/Makefile b/nptl/Makefile
index 8cec6faee3..140add412c 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -569,7 +569,7 @@  $(objpfx)tst-compat-forwarder: $(objpfx)tst-compat-forwarder-mod.so
 tst-mutex10-ENV = GLIBC_TUNABLES=glibc.elision.enable=1
 
 # Protect against a build using -Wl,-z,now.
-LDFLAGS-tst-audit-threads-mod1.so = -Wl,-z,lazy
+LDFLAGS-tst-audit-threads-mod1.so = -Wl,-z,lazy -Wl,-z,norelro
 LDFLAGS-tst-audit-threads-mod2.so = -Wl,-z,lazy
 LDFLAGS-tst-audit-threads = -Wl,-z,lazy
 $(objpfx)tst-audit-threads: $(objpfx)tst-audit-threads-mod2.so