[14/17,v5] Avoid stack-protecting signal-handling functions sibcalled from assembly.

Message ID 1457882222-22599-15-git-send-email-nix@esperi.org.uk
State New, archived
Headers

Commit Message

Nix March 13, 2016, 3:16 p.m. UTC
  From: Nick Alcock <nick.alcock@oracle.com>

Certain signal-handling functions are sibcalled from assembly on
x86, both on Linux and the Hurd.  As such, they depend on having
the same-shaped stack frame, an assumption it seems likely that
-fstack-protector violates.  More worryingly, setjmp/sigjmp.c lands
in the dynamic linker but is overlooked by the machinery that
rebuilds almost everything else, and is never rebuilt: so we
should compile it witout stack-protection for the sake of ld.so.

v2: de-stack-protect setjmp/sigjmp.c.
v3: Use $(no-stack-protector).
v4: Use inhibit_stack_protector.
v5: Remove inhibition of nptl/pthread_mutex_(un)lock.c now that is
    diagnosed and fixed elsewhere, properly; inhibit _hurd_self_sigstate
    instead.

	* hurd/hurd/signal.h (_hurd_self_sigstate): Add
	inhibit_stack_protector.
	* setjmp/Makefile (CFLAGS-sigjmp.c): Add $(no-stack-protector).
---
 hurd/hurd/signal.h | 2 +-
 setjmp/Makefile    | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)
  

Comments

Florian Weimer May 13, 2016, 9:53 a.m. UTC | #1
On 03/13/2016 04:16 PM, Nix wrote:
> Certain signal-handling functions are sibcalled from assembly on
> x86, both on Linux and the Hurd.  As such, they depend on having
> the same-shaped stack frame, an assumption it seems likely that
> -fstack-protector violates.

I think that's not actually true for tail calls to 
stack-protector-enabled functions from those who are not so enabled.

The lack of rebuild is more problematic.  Does it really make a 
difference, considering that the affected function is not active while 
we initialize the stack guard value?

Thanks,
Florian
  
Nix May 13, 2016, 2:38 p.m. UTC | #2
On 13 May 2016, Florian Weimer told this:

> On 03/13/2016 04:16 PM, Nix wrote:
>> Certain signal-handling functions are sibcalled from assembly on
>> x86, both on Linux and the Hurd.  As such, they depend on having
>> the same-shaped stack frame, an assumption it seems likely that
>> -fstack-protector violates.
>
> I think that's not actually true for tail calls to stack-protector-enabled functions from those who are not so enabled.
>
> The lack of rebuild is more problematic.  Does it really make a difference, considering that the affected function is not active
> while we initialize the stack guard value?

I was being paranoid and aiming for stability given that at the time I
didn't know what caused the locking-related crashes on x86-32 when
stack-protection was enabled, and thought this was a plausible victim as
well.

In hindsight, given that we now know it was a missing reload of a
call-clobbered register in that specific case and not something to do
with some sort of stack-frame incompatibility between stack-protected
and non-stack-protected code, it seems likely that we can drop this (and
the corresponding sparc changes).

I'll roll a new patch series with that done and with the changes
requested a few weeks ago by Mike, and sans the stuff that's already
been pulled in, this weekend, and see if they pass their checks. (The
posting of the patches might not follow until early next week, because
all those make checks take a long time, particularly on my little ARM
home cinema box. :) )
  
Nix May 15, 2016, 9:50 a.m. UTC | #3
On 13 May 2016, Florian Weimer told this:

> On 03/13/2016 04:16 PM, Nix wrote:
>> Certain signal-handling functions are sibcalled from assembly on
>> x86, both on Linux and the Hurd.  As such, they depend on having
>> the same-shaped stack frame, an assumption it seems likely that
>> -fstack-protector violates.
[...]
> The lack of rebuild is more problematic. Does it really make a
> difference, considering that the affected function is not active while
> we initialize the stack guard value?

Oh yes it really does. We're not just keeping stack-protection out of
ld.so because that's where the canary is initialized, but also because
ld.so doesn't have __libc_message() et al. If a single reference is
missed we end up dragging all of the __libc_message() machinery, libio,
etc, into ld.so. We could fix this by providing a small stub for
__libc_message() or __fortify_fail() in ld.so, but given that this
behaviour also prevents any references from creeping in to ld.so's more
delicate machinery, this seems like a better first step. We can try to
stack-protect ld.so later :)

To be specific, because of this rebuild error, sigjmp.c stays
stack-protected, so we see this:

make[3]: Leaving directory '/home/oranix/oracle/src/glibc/elf'
gcc   -nostdlib -nostartfiles -r -o /usr/local/tmp/test/1/elf/librtld.os '-Wl,-(' /usr/local/tmp/test/1/elf/dl-allobjs.os /usr/local/tmp/test/1/elf/rtld-libc.a -lgcc '-Wl,-)' \
          -Wl,-Map,/usr/local/tmp/test/1/elf/librtld.os.map
gcc   -nostdlib -nostartfiles -shared -o /usr/local/tmp/test/1/elf/ld.so.new            \
          -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both -Wl,-z,defs -Wl,-z,now    \
          /usr/local/tmp/test/1/elf/librtld.os -Wl,--version-script=/usr/local/tmp/test/1/ld.map                \
          -Wl,-soname=ld-linux-x86-64.so.2                      \
          -Wl,-defsym=_begin=0
/usr/local/tmp/test/1/elf/librtld.os: In function `check_one_fd':
/home/oranix/oracle/src/glibc/csu/check_fds.c:84: undefined reference to `__stack_chk_fail'
/usr/local/tmp/test/1/elf/librtld.os: In function `__libc_check_standard_fds':
/home/oranix/oracle/src/glibc/csu/check_fds.c:100: undefined reference to `__stack_chk_fail'
/usr/local/tmp/test/1/elf/librtld.os: In function `__closedir':
/home/oranix/oracle/src/glibc/dirent/../sysdeps/posix/closedir.c:53: undefined reference to `__stack_chk_fail'
/usr/local/tmp/test/1/elf/librtld.os: In function `__readdir':
/home/oranix/oracle/src/glibc/dirent/../sysdeps/posix/readdir.c:118: undefined reference to `__stack_chk_fail'
/usr/local/tmp/test/1/elf/librtld.os: In function `__rewinddir':
/home/oranix/oracle/src/glibc/dirent/../sysdeps/posix/rewinddir.c:39: undefined reference to `__stack_chk_fail'
/usr/local/tmp/test/1/elf/librtld.os:/home/oranix/oracle/src/glibc/dirent/../sysdeps/unix/sysv/linux/getdents.c:301: more undefined references to `__stack_chk_fail' follow
collect2: error: ld returned 1 exit status

This is obviously a bad thing.

I'll have a hunt for whatever it is that's causing sigjmp.c to not be
rebuilt after this test cycle, because that's the underlying bug here,
really.
  
Andreas Schwab May 15, 2016, 10:35 a.m. UTC | #4
Nix <nix@esperi.org.uk> writes:

> I'll have a hunt for whatever it is that's causing sigjmp.c to not be
> rebuilt after this test cycle, because that's the underlying bug here,
> really.

Just changing CFLAGS-foo doesn't trigger a rebuild of dependent objects.

Andreas.
  
Nix May 15, 2016, 10:39 a.m. UTC | #5
On 15 May 2016, nix@esperi.org.uk told this:
> To be specific, because of this rebuild error, sigjmp.c stays
> stack-protected, so we see this:

As anyone with working eyes should have seen (but I didn't), this was a
*different* problem. (I shouldn't drop the already-merged stuff from my
patch tree and then try building against a tree that doesn't have them
merged in anywhere!)

We do indeed appear to get no problems if dropping the sigjmp stuff,
though I have no idea why not. Clearly the code inspection that led me
to believe I needed to do this was erronous, at least on x86*. Running
tests on other arches now.
  
Nix May 15, 2016, 3:18 p.m. UTC | #6
On 15 May 2016, Andreas Schwab verbalised:

> Nix <nix@esperi.org.uk> writes:
>
>> I'll have a hunt for whatever it is that's causing sigjmp.c to not be
>> rebuilt after this test cycle, because that's the underlying bug here,
>> really.
>
> Just changing CFLAGS-foo doesn't trigger a rebuild of dependent objects.

Well, no, but sigjmp.c should be rebuilt if it's incorporated in ld.so
anyway: we should be getting an rtld-sigjmp.o, and we're not.

(However, I suspect that this is because it's *not* incorporated into
ld.so, and I was misreading the code! So this is all rather moot.
Everything seems to be going fine so far on the testing front with the
CFLAGS-set for sigjmp.c removed...)
  
Andreas Schwab May 15, 2016, 3:49 p.m. UTC | #7
Nix <nix@esperi.org.uk> writes:

> On 15 May 2016, Andreas Schwab verbalised:
>
>> Nix <nix@esperi.org.uk> writes:
>>
>>> I'll have a hunt for whatever it is that's causing sigjmp.c to not be
>>> rebuilt after this test cycle, because that's the underlying bug here,
>>> really.
>>
>> Just changing CFLAGS-foo doesn't trigger a rebuild of dependent objects.
>
> Well, no, but sigjmp.c should be rebuilt if it's incorporated in ld.so
> anyway: we should be getting an rtld-sigjmp.o, and we're not.

Most architectures implement setjmp in assembler anyway.

Andreas.
  
Florian Weimer May 16, 2016, 8:59 a.m. UTC | #8
On 05/15/2016 05:49 PM, Andreas Schwab wrote:
> Nix <nix@esperi.org.uk> writes:
>
>> On 15 May 2016, Andreas Schwab verbalised:
>>
>>> Nix <nix@esperi.org.uk> writes:
>>>
>>>> I'll have a hunt for whatever it is that's causing sigjmp.c to not be
>>>> rebuilt after this test cycle, because that's the underlying bug here,
>>>> really.
>>>
>>> Just changing CFLAGS-foo doesn't trigger a rebuild of dependent objects.
>>
>> Well, no, but sigjmp.c should be rebuilt if it's incorporated in ld.so
>> anyway: we should be getting an rtld-sigjmp.o, and we're not.
>
> Most architectures implement setjmp in assembler anyway.

But a lot (all?) of them use a C tail for some of the work, and the 
patch is about how that is compiled.

Florian
  
Andreas Schwab May 16, 2016, 9:15 a.m. UTC | #9
Florian Weimer <fweimer@redhat.com> writes:

> On 05/15/2016 05:49 PM, Andreas Schwab wrote:
>> Nix <nix@esperi.org.uk> writes:
>>
>>> On 15 May 2016, Andreas Schwab verbalised:
>>>
>>>> Nix <nix@esperi.org.uk> writes:
>>>>
>>>>> I'll have a hunt for whatever it is that's causing sigjmp.c to not be
>>>>> rebuilt after this test cycle, because that's the underlying bug here,
>>>>> really.
>>>>
>>>> Just changing CFLAGS-foo doesn't trigger a rebuild of dependent objects.
>>>
>>> Well, no, but sigjmp.c should be rebuilt if it's incorporated in ld.so
>>> anyway: we should be getting an rtld-sigjmp.o, and we're not.
>>
>> Most architectures implement setjmp in assembler anyway.
>
> But a lot (all?) of them use a C tail for some of the work, and the patch
> is about how that is compiled.

Ahh, yes, sigjmp, not setjmp. :-)

Andreas.
  

Patch

diff --git a/hurd/hurd/signal.h b/hurd/hurd/signal.h
index 85e5152..78a25ae 100644
--- a/hurd/hurd/signal.h
+++ b/hurd/hurd/signal.h
@@ -129,7 +129,7 @@  extern struct hurd_sigstate *_hurd_self_sigstate (void)
 #define _HURD_SIGNAL_H_EXTERN_INLINE __extern_inline
 #endif
 
-_HURD_SIGNAL_H_EXTERN_INLINE struct hurd_sigstate *
+_HURD_SIGNAL_H_EXTERN_INLINE inhibit_stack_protector struct hurd_sigstate *
 _hurd_self_sigstate (void)
 {
   struct hurd_sigstate **location = (struct hurd_sigstate **)
diff --git a/setjmp/Makefile b/setjmp/Makefile
index 5b677cc..b617a84 100644
--- a/setjmp/Makefile
+++ b/setjmp/Makefile
@@ -35,3 +35,7 @@  tests-static	:= tst-setjmp-static
 include ../Rules
 
 $(objpfx)tst-setjmp-fp: $(libm)
+
+# This is sibcalled directly from arch-specific assembly, included in rtld,
+# but never rebuilt, so it must never be built with stack protection.
+CFLAGS-sigjmp.c += $(no-stack-protector)