From patchwork Mon Feb 22 19:44:40 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nix X-Patchwork-Id: 10991 Received: (qmail 49540 invoked by alias); 22 Feb 2016 19:45:00 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 49082 invoked by uid 89); 22 Feb 2016 19:44:59 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.8 required=5.0 tests=BAYES_50, KAM_LAZY_DOMAIN_SECURITY, UNPARSEABLE_RELAY autolearn=no version=3.3.2 spammy=creeping, symptoms, defsym, randomly X-HELO: aserp1040.oracle.com From: Nix To: Mike Frysinger , libc-alpha@sourceware.org Cc: carlos@redhat.com Subject: Re: [PATCH 02/12] Make __stack_chk_fail() not use other glibc facilities. References: <1455963826-21885-1-git-send-email-nix@esperi.org.uk> <1455963826-21885-3-git-send-email-nix@esperi.org.uk> <87r3g7povb.fsf@esperi.org.uk> <20160220171903.GW19841@vapier.lan> Date: Mon, 22 Feb 2016 19:44:40 +0000 In-Reply-To: <20160220171903.GW19841@vapier.lan> (Mike Frysinger's message of "Sat, 20 Feb 2016 12:19:03 -0500") Message-ID: <87io1gjzrr.fsf@esperi.org.uk> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux) MIME-Version: 1.0 On 20 Feb 2016, Mike Frysinger said this: > all that said, i don't think we want to import the Gentoo one anyways. > it changes a lot of behavior that we shouldn't conflate here -- the > crash/syslog handling is contentious. maybe something like (untested): Thank you for this! :) I was scared of the code to do things like getting backtraces, but I see now that this is contentious in the existing cases where it is done in any case :) However, even once I'd knocked the bugs out of it (notably that (__SSP__ is only defined for -fstack-protector; -strong and -all get different #defines), it doesn't actually fix the problem: gcc -nostdlib -nostartfiles -r -o /home/oranix/oracle/src/foo/elf/librtld.map.o '-Wl,-(' /home/oranix/oracle/src/foo/elf/dl-allobjs.os /home/oranix/oracle/src/foo/libc_pic.a -lgcc '-Wl,-)' -Wl,-Map,/home/oranix/oracle/src/foo/elf/librtld.mapT /home/oranix/oracle/src/foo/libc_pic.a(init-first.os):(.data+0x0): multiple definition of `__libc_multiple_libcs' /home/oranix/oracle/src/foo/elf/dl-allobjs.os:(.bss+0xf8): first defined here So this forced me to actually *debug* the problem rather than flailing randomly at it until the problem went away as I had been before. It turns out there were two problems. Firstly a __stack_chk_fail call was creeping in to rtld itself, violating the rule that it should remain unprotected and yanking in the failure machinery from libc (and when that machinery calls on the rest of libc, all hell then breaks loose and you get, well, this failure: even if you manage to avoid it, as with my original patch, it's highly undesirable). The function in question (__sigjmp_save) lands in rtld on x86-32, but is not at any point compiled with MODULE_NAME=rtld, nor rebuilt that way when rtld is rebuilt after the mapfile is constructed. I called this function out as problematic in the last patch in the series (which I'll fold this fix into), but never dreamt that it might be causing this failure. I'll fold a patch turning this off into the last patch in the series when I repost. The second problem, with the same symptoms, is an obvious one I was foolishly assuming glibc was getting around magically because things seemed to work -- the mapfile command above is using the libc object files (it has to, because the rtld modules cannot possibly have been built yet). Those object files have __stack_chk_fail's scattered all through them, so ld is going to be pulling __stack_chk_fail and all that it depends upon into the rtld mapfile (and then not actually building any of that into rtld after all, because the objects that finally end up in rtld don't call __stack_chk_fail). Fixing that by faking out ld is fairly simple: see the patch below. With setjmp/sigjmp.c unconditionally excluded from stack-protection and __stack_chk_fail faked out at rtld mapfile computation time, rtld links fine and this patch in the series can be dropped, thank goodness. To be sure of that, I'm doing another giant pile of test cycles on x86, x86-64 and (newly) sparc(64) and will post a revised patch series incorporating all the review-inspired changes after that, and any further changes the sparc runs require. (Though the patch I'm disturbed about, and which may imply a need for review and testing of this stuff on yet more arches, remains the last one in the series.) Initial signs on x86 and x86-64, at least, look good. (For debugging this sort of thing in the future, nm'ing dl-allobjs.os and looking for __stack_chk_fail in there is the thing to do. There should be none. To be honest, the sigjmp thing looks like an existing bug in the build system -- if it lands in rtld, it should be being rebuilt accordingly, but it isn't.) Here's the patch. I'll add this to the next series right after the patch to suppress -fstack-protector in rtld. (It may be that I'm putting the variable in the wrong place and you don't want it in elf/Makefile, or that you'd prefer the thing to be done unconditionally, or unconditionally if ssp is available in the compiler: but for now I'm only doing this when the compiler supports SSP. As usual in this series, I'm doing it even if glibc is not configured with --enable-stack-protector, as long as the compiler is capable of it, because this ensures that even compilers that pass in -fstack-protector by default will generate a working glibc.) 8>---------------------------------------------------------------------<8 From: Nick Alcock Subject: [PATCH] Prevent the rtld mapfile computation from dragging in __stack_chk_fail(). The previous commit prevented rtld itself from being built with -fstack-protector, but this is not quite enough. We identify which objects belong in rtld via a test link and analysis of the resulting mapfile. That link is necessarily done against objects that are stack-protected, so drags in __stack_chk_fail() and all the libc and libio code it uses. To stop this happening, use --defsym in the test librtld.map-production link to force the linker to predefine __stack_chk_fail() (to 0, but it could be to anything). (In a real link, this would of course be catastrophic, but these object files are never used for anything else.) --- elf/Makefile | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/elf/Makefile b/elf/Makefile index 9806648..4a7be71 100644 --- a/elf/Makefile +++ b/elf/Makefile @@ -346,9 +346,19 @@ $(objpfx)dl-allobjs.os: $(all-rtld-routines:%=$(objpfx)%.os) # are compiled with special flags, and puts these modules into rtld-libc.a # for us. Then we do the real link using rtld-libc.a instead of libc_pic.a. +# If the compiler can do SSP, build the mapfile with a dummy __stack_chk_fail +# symbol defined, to prevent the real thing being dragged into rtld +# even though rtld is never built with stack-protection. + +ifneq ($(stack-protector),) +dummy-stack-chk-fail := -Wl,--defsym='__stack_chk_fail=0' +else +dummy-stack-chk-fail := +endif + $(objpfx)librtld.map: $(objpfx)dl-allobjs.os $(common-objpfx)libc_pic.a @-rm -f $@T - $(reloc-link) -o $@.o '-Wl,-(' $^ -lgcc '-Wl,-)' -Wl,-Map,$@T + $(reloc-link) -o $@.o $(dummy-stack-chk-fail) '-Wl,-(' $^ -lgcc '-Wl,-)' -Wl,-Map,$@T rm -f $@.o mv -f $@T $@