[02/12] Make __stack_chk_fail() not use other glibc facilities.

Message ID 87io1gjzrr.fsf@esperi.org.uk
State New, archived
Headers

Commit Message

Nix Feb. 22, 2016, 7:44 p.m. UTC
  On 20 Feb 2016, Mike Frysinger said this:
> all that said, i don't think we want to import the Gentoo one anyways.
> it changes a lot of behavior that we shouldn't conflate here -- the
> crash/syslog handling is contentious.  maybe something like (untested):

Thank you for this! :) I was scared of the code to do things like
getting backtraces, but I see now that this is contentious in the
existing cases where it is done in any case :)

However, even once I'd knocked the bugs out of it (notably that (__SSP__
is only defined for -fstack-protector; -strong and -all get different
#defines), it doesn't actually fix the problem:

gcc   -nostdlib -nostartfiles -r -o /home/oranix/oracle/src/foo/elf/librtld.map.o '-Wl,-(' /home/oranix/oracle/src/foo/elf/dl-allobjs.os /home/oranix/oracle/src/foo/libc_pic.a -lgcc '-Wl,-)' -Wl,-Map,/home/oranix/oracle/src/foo/elf/librtld.mapT
/home/oranix/oracle/src/foo/libc_pic.a(init-first.os):(.data+0x0): multiple definition of `__libc_multiple_libcs'
/home/oranix/oracle/src/foo/elf/dl-allobjs.os:(.bss+0xf8): first defined here

So this forced me to actually *debug* the problem rather than flailing
randomly at it until the problem went away as I had been before.


It turns out there were two problems.

Firstly a __stack_chk_fail call was creeping in to rtld itself,
violating the rule that it should remain unprotected and yanking in the
failure machinery from libc (and when that machinery calls on the rest
of libc, all hell then breaks loose and you get, well, this failure:
even if you manage to avoid it, as with my original patch, it's highly
undesirable).

The function in question (__sigjmp_save) lands in rtld on x86-32, but is
not at any point compiled with MODULE_NAME=rtld, nor rebuilt that way
when rtld is rebuilt after the mapfile is constructed. I called this
function out as problematic in the last patch in the series (which I'll
fold this fix into), but never dreamt that it might be causing this
failure. I'll fold a patch turning this off into the last patch in the
series when I repost.


The second problem, with the same symptoms, is an obvious one I was
foolishly assuming glibc was getting around magically because things
seemed to work -- the mapfile command above is using the libc object
files (it has to, because the rtld modules cannot possibly have been
built yet). Those object files have __stack_chk_fail's scattered all
through them, so ld is going to be pulling __stack_chk_fail and all that
it depends upon into the rtld mapfile (and then not actually building
any of that into rtld after all, because the objects that finally end up
in rtld don't call __stack_chk_fail). Fixing that by faking out ld is
fairly simple: see the patch below.

With setjmp/sigjmp.c unconditionally excluded from stack-protection and
__stack_chk_fail faked out at rtld mapfile computation time, rtld links
fine and this patch in the series can be dropped, thank goodness.


To be sure of that, I'm doing another giant pile of test cycles on x86,
x86-64 and (newly) sparc(64) and will post a revised patch series
incorporating all the review-inspired changes after that, and any
further changes the sparc runs require. (Though the patch I'm disturbed
about, and which may imply a need for review and testing of this stuff
on yet more arches, remains the last one in the series.)

Initial signs on x86 and x86-64, at least, look good.


(For debugging this sort of thing in the future, nm'ing dl-allobjs.os
and looking for __stack_chk_fail in there is the thing to do. There
should be none. To be honest, the sigjmp thing looks like an existing
bug in the build system -- if it lands in rtld, it should be being
rebuilt accordingly, but it isn't.)


Here's the patch. I'll add this to the next series right after the patch
to suppress -fstack-protector in rtld. (It may be that I'm putting the
variable in the wrong place and you don't want it in elf/Makefile, or
that you'd prefer the thing to be done unconditionally, or
unconditionally if ssp is available in the compiler: but for now I'm
only doing this when the compiler supports SSP. As usual in this series,
I'm doing it even if glibc is not configured with
--enable-stack-protector, as long as the compiler is capable of it,
because this ensures that even compilers that pass in -fstack-protector
by default will generate a working glibc.)

8>---------------------------------------------------------------------<8
From: Nick Alcock <nick.alcock@oracle.com>
Subject: [PATCH] Prevent the rtld mapfile computation from dragging in
 __stack_chk_fail().

The previous commit prevented rtld itself from being built with
-fstack-protector, but this is not quite enough.  We identify which
objects belong in rtld via a test link and analysis of the resulting
mapfile.  That link is necessarily done against objects that are
stack-protected, so drags in __stack_chk_fail() and all the libc
and libio code it uses.

To stop this happening, use --defsym in the test librtld.map-production
link to force the linker to predefine __stack_chk_fail() (to 0, but it
could be to anything).  (In a real link, this would of course be
catastrophic, but these object files are never used for anything else.)
---
 elf/Makefile | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
  

Patch

diff --git a/elf/Makefile b/elf/Makefile
index 9806648..4a7be71 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -346,9 +346,19 @@  $(objpfx)dl-allobjs.os: $(all-rtld-routines:%=$(objpfx)%.os)
 # are compiled with special flags, and puts these modules into rtld-libc.a
 # for us.  Then we do the real link using rtld-libc.a instead of libc_pic.a.
 
+# If the compiler can do SSP, build the mapfile with a dummy __stack_chk_fail
+# symbol defined, to prevent the real thing being dragged into rtld
+# even though rtld is never built with stack-protection.
+
+ifneq ($(stack-protector),)
+dummy-stack-chk-fail := -Wl,--defsym='__stack_chk_fail=0'
+else
+dummy-stack-chk-fail :=
+endif
+
 $(objpfx)librtld.map: $(objpfx)dl-allobjs.os $(common-objpfx)libc_pic.a
 	@-rm -f $@T
-	$(reloc-link) -o $@.o '-Wl,-(' $^ -lgcc '-Wl,-)' -Wl,-Map,$@T
+	$(reloc-link) -o $@.o $(dummy-stack-chk-fail) '-Wl,-(' $^ -lgcc '-Wl,-)' -Wl,-Map,$@T
 	rm -f $@.o
 	mv -f $@T $@