Patchwork [15/16] Avoid stack-protecting certain functions called from assembly.

login
register
mail settings
Submitter Nix
Date Feb. 28, 2016, 4:41 p.m.
Message ID <1456677695-29778-16-git-send-email-nix@esperi.org.uk>
Download mbox | patch
Permalink /patch/11135/
State New
Headers show

Comments

Nix - Feb. 28, 2016, 4:41 p.m.
From: Nick Alcock <nick.alcock@oracle.com>

This is the problematic part.  Without -fno-stack-protector on
__pthread_mutex_cond_lock_adjust() and __pthread_mutex_unlock_usercnt(),
nptl/tst-cond24 and nptl/tst-cond25 receive a NULL mutex at unlock time
and segfault.  However... I don't understand why.  It is the callee's
responsibility both to add the stack canary and to initialize it, just
like any other local variable.  It has to be, or the ABI for stack-
protected code would be incompatible with that for non-protected code.
But the fact remains that
sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S both explicitly
mentions the stack frame layout and calls this function, and this call
goes wrong if we stack-protect it.

So this is somewhere where I need someone to tell me what's special about
sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S (and in particular
special about priority-inheritance mutexes: everything else works),
before I can be confident that this is even remotely the right thing to
do.

We also de-stack-protect setjmp/sigjmp.c: it receives a sibcall from
sysdeps/x86_64/setjmp.S and lands in rtld, but is *not* rebuilt by
the machinery that rebuilds almost everything else that lands in
rtld with an appropriate MODULE_NAME.

Similar fixups may be required for things called directly from
assembly on other architectures.

v2: de-stack-protect setjmp/sigjmp.c.
v3: Use $(no-stack-protector).
v4: Use inhibit_stack_protector.
---
 nptl/pthread_mutex_lock.c   | 2 +-
 nptl/pthread_mutex_unlock.c | 2 +-
 setjmp/Makefile             | 4 ++++
 3 files changed, 6 insertions(+), 2 deletions(-)
Szabolcs Nagy - March 2, 2016, 6:53 p.m.
On 28/02/16 16:41, Nix wrote:
> From: Nick Alcock <nick.alcock@oracle.com>
> 
> This is the problematic part.  Without -fno-stack-protector on
> __pthread_mutex_cond_lock_adjust() and __pthread_mutex_unlock_usercnt(),
> nptl/tst-cond24 and nptl/tst-cond25 receive a NULL mutex at unlock time
> and segfault.  However... I don't understand why.  It is the callee's
> responsibility both to add the stack canary and to initialize it, just
> like any other local variable.  It has to be, or the ABI for stack-
> protected code would be incompatible with that for non-protected code.
> But the fact remains that
> sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S both explicitly
> mentions the stack frame layout and calls this function, and this call
> goes wrong if we stack-protect it.
> 
> So this is somewhere where I need someone to tell me what's special about
> sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S (and in particular
> special about priority-inheritance mutexes: everything else works),
> before I can be confident that this is even remotely the right thing to
> do.
> 

i think this should be investigated before adding any
workarounds there.

> We also de-stack-protect setjmp/sigjmp.c: it receives a sibcall from
> sysdeps/x86_64/setjmp.S and lands in rtld, but is *not* rebuilt by
> the machinery that rebuilds almost everything else that lands in
> rtld with an appropriate MODULE_NAME.
> 
> Similar fixups may be required for things called directly from
> assembly on other architectures.
> 

i don't understand the details here, but i don't think calling
something from asm should be a problem, only early function calls
(before thread pointer setup) should be problematic (on targets
where the canary is in the tcb).

> v2: de-stack-protect setjmp/sigjmp.c.
> v3: Use $(no-stack-protector).
> v4: Use inhibit_stack_protector.
Nix - March 3, 2016, 4:40 p.m.
On 2 Mar 2016, Szabolcs Nagy said:

> On 28/02/16 16:41, Nix wrote:
>> From: Nick Alcock <nick.alcock@oracle.com>
>> 
>> This is the problematic part.  Without -fno-stack-protector on
>> __pthread_mutex_cond_lock_adjust() and __pthread_mutex_unlock_usercnt(),
>> nptl/tst-cond24 and nptl/tst-cond25 receive a NULL mutex at unlock time
>> and segfault.  However... I don't understand why.  It is the callee's
>> responsibility both to add the stack canary and to initialize it, just
>> like any other local variable.  It has to be, or the ABI for stack-
>> protected code would be incompatible with that for non-protected code.
>> But the fact remains that
>> sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S both explicitly
>> mentions the stack frame layout and calls this function, and this call
>> goes wrong if we stack-protect it.
>> 
>> So this is somewhere where I need someone to tell me what's special about
>> sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S (and in particular
>> special about priority-inheritance mutexes: everything else works),
>> before I can be confident that this is even remotely the right thing to
>> do.
>
> i think this should be investigated before adding any
> workarounds there.

I'm reasonably certain it has to do with pthread_cond_timedwait.S
somehow setting up a stack frame and locals below it in a form which
__pthread_mutex_{cond_lock_adjust,unlock_usercnt} then rely upon. I just
don't understand how they could be doing that, since they seem to be
calling the normal entry-point as usual. It's almost as if the call
skips the usual stack-frame setup for the local variables in that
function (including the canary), only I don't see how that could even be
possible. But de-protecting them clearly works...

>> We also de-stack-protect setjmp/sigjmp.c: it receives a sibcall from
>> sysdeps/x86_64/setjmp.S and lands in rtld, but is *not* rebuilt by
>> the machinery that rebuilds almost everything else that lands in
>> rtld with an appropriate MODULE_NAME.
>> 
>> Similar fixups may be required for things called directly from
>> assembly on other architectures.
>
> i don't understand the details here, but i don't think calling
> something from asm should be a problem, only early function calls
> (before thread pointer setup) should be problematic (on targets
> where the canary is in the tcb).

Well yes, that's what I'd assume too! But that's clearly not so :(
This is the one great mystery, and I'm fairly glad to see that it's
mysterious to other people too so it's not just that I'm missing the
incredibly obvious somehow. Anyone know?
Andreas Schwab - March 3, 2016, 4:55 p.m.
Nix <nix@esperi.org.uk> writes:

> I'm reasonably certain it has to do with pthread_cond_timedwait.S
> somehow setting up a stack frame and locals below it in a form which
> __pthread_mutex_{cond_lock_adjust,unlock_usercnt} then rely upon. I just
> don't understand how they could be doing that, since they seem to be
> calling the normal entry-point as usual. It's almost as if the call
> skips the usual stack-frame setup for the local variables in that
> function (including the canary), only I don't see how that could even be
> possible. But de-protecting them clearly works...

Have you tried stepping through the function to find the place where it
breaks?

Andreas.
Nix - March 3, 2016, 5:41 p.m.
On 3 Mar 2016, Andreas Schwab stated:

> Nix <nix@esperi.org.uk> writes:
>
>> I'm reasonably certain it has to do with pthread_cond_timedwait.S
>> somehow setting up a stack frame and locals below it in a form which
>> __pthread_mutex_{cond_lock_adjust,unlock_usercnt} then rely upon. I just
>> don't understand how they could be doing that, since they seem to be
>> calling the normal entry-point as usual. It's almost as if the call
>> skips the usual stack-frame setup for the local variables in that
>> function (including the canary), only I don't see how that could even be
>> possible. But de-protecting them clearly works...
>
> Have you tried stepping through the function to find the place where it
> breaks?

Yeah, but it was years ago, when x86 asm was even more opaque to me than
it is now. It looked like it was intentionally nulling the mutex
parameter out and then using that null later on -- but if my
vaguely-supported supposition in the message I just sent to Adhemerval
regarding the way the assembly implementation is sorta-sibcalling
functions overlaid on its own stack frame is right, I'm not surprised at
all that it was failing like that: the canary would have thrown off all
the arguments after it, in addition to overwriting something it wasn't
meant to on the stack frame (which seems to have had a hole left in it,
uncommented, for the 'type' local variable), following which the
assignment to the 'type' local variable in
__pthread_mutex_unlock_usercnt() would have blown up the next variable
in the caller's overlaid stack frame... oops.

This is all speculation, really, I don't read x86 asm well enough to be
sure what's going on in there. (If I did, I'd know what I was doing in
this change and this whole conversation would not arise!)

Patch

diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index bdfa529..87a1935 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -520,7 +520,7 @@  hidden_def (__pthread_mutex_lock)
 
 #ifdef NO_INCR
 void
-internal_function
+internal_function inhibit_stack_protector
 __pthread_mutex_cond_lock_adjust (pthread_mutex_t *mutex)
 {
   assert ((mutex->__data.__kind & PTHREAD_MUTEX_PRIO_INHERIT_NP) != 0);
diff --git a/nptl/pthread_mutex_unlock.c b/nptl/pthread_mutex_unlock.c
index 334ce38..5f23af1 100644
--- a/nptl/pthread_mutex_unlock.c
+++ b/nptl/pthread_mutex_unlock.c
@@ -33,7 +33,7 @@  __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr)
      __attribute_noinline__;
 
 int
-internal_function attribute_hidden
+internal_function inhibit_stack_protector attribute_hidden
 __pthread_mutex_unlock_usercnt (pthread_mutex_t *mutex, int decr)
 {
   int type = PTHREAD_MUTEX_TYPE_ELISION (mutex);
diff --git a/setjmp/Makefile b/setjmp/Makefile
index 5b677cc..b617a84 100644
--- a/setjmp/Makefile
+++ b/setjmp/Makefile
@@ -35,3 +35,7 @@  tests-static	:= tst-setjmp-static
 include ../Rules
 
 $(objpfx)tst-setjmp-fp: $(libm)
+
+# This is sibcalled directly from arch-specific assembly, included in rtld,
+# but never rebuilt, so it must never be built with stack protection.
+CFLAGS-sigjmp.c += $(no-stack-protector)