mips/o32: fix internal_syscall5/6/7

Message ID 20170815115055.29375-1-aurelien@aurel32.net
State New, archived
Headers

Commit Message

Aurelien Jarno Aug. 15, 2017, 11:50 a.m. UTC
  The internal_syscall5/6/7 functions use the stack pointer to store
the 5th and following arguments on the stack. In some cases GCC optimize
out the stack pointer, and thus storing the data to the stack causes a
segmentation fault.

Fix that by declaring the sp register as clobbered. Not sure it is the
best way to do that, but it seems to be enough to force GCC to not
optimize it out.

This fixes the nptl/tst-rwlock15 test on mips/o32.

ChangeLog:
	* sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
	(internal_syscall5): Add "$29" to the clobber list.
	(internal_syscall6): Likewise.
	(internal_syscall7): Likewise.
---
 ChangeLog                                    | 7 +++++++
 sysdeps/unix/sysv/linux/mips/mips32/sysdep.h | 6 +++---
 2 files changed, 10 insertions(+), 3 deletions(-)
  

Comments

Andreas Schwab Aug. 15, 2017, noon UTC | #1
On Aug 15 2017, Aurelien Jarno <aurelien@aurel32.net> wrote:

> The internal_syscall5/6/7 functions use the stack pointer to store
> the 5th and following arguments on the stack. In some cases GCC optimize
> out the stack pointer, and thus storing the data to the stack causes a
> segmentation fault.

FORCE_FRAME_POINTER does not work any more?

Andreas.
  
Florian Weimer Aug. 15, 2017, 12:15 p.m. UTC | #2
On 08/15/2017 01:50 PM, Aurelien Jarno wrote:
> The internal_syscall5/6/7 functions use the stack pointer to store
> the 5th and following arguments on the stack. In some cases GCC optimize
> out the stack pointer, and thus storing the data to the stack causes a
> segmentation fault.
> 
> Fix that by declaring the sp register as clobbered. Not sure it is the
> best way to do that, but it seems to be enough to force GCC to not
> optimize it out.

Doesn't the inline assembly *require* that r29 is the stack pointer?  Is
there a way to express this explicitly?

Thanks,
Florian
  
Adhemerval Zanella Aug. 15, 2017, 1:06 p.m. UTC | #3
On 15/08/2017 09:00, Andreas Schwab wrote:
> On Aug 15 2017, Aurelien Jarno <aurelien@aurel32.net> wrote:
> 
>> The internal_syscall5/6/7 functions use the stack pointer to store
>> the 5th and following arguments on the stack. In some cases GCC optimize
>> out the stack pointer, and thus storing the data to the stack causes a
>> segmentation fault.
> 
> FORCE_FRAME_POINTER does not work any more?

Wouldn't a better option and more compiler optimization proof to route
syscall5/6/7 to a out of line symbol call to proper handle the stack
pointer as for ARM and i386 (__libc_do_syscall)?
  
Aurelien Jarno Aug. 15, 2017, 4:14 p.m. UTC | #4
On 2017-08-15 14:00, Andreas Schwab wrote:
> On Aug 15 2017, Aurelien Jarno <aurelien@aurel32.net> wrote:
> 
> > The internal_syscall5/6/7 functions use the stack pointer to store
> > the 5th and following arguments on the stack. In some cases GCC optimize
> > out the stack pointer, and thus storing the data to the stack causes a
> > segmentation fault.
> 
> FORCE_FRAME_POINTER does not work any more?

From what I understand of the generated code, it seems to work at the
function level, but not at the asm code level. The pthread_rwlock_rdlock
adds a loop around the syscall, and it seems that's the code path causing
the issue. Adding $sp as clobbered changes the code to reload the $sp
from the saved value at each loop.
  
Aurelien Jarno Aug. 15, 2017, 4:16 p.m. UTC | #5
On 2017-08-15 10:06, Adhemerval Zanella wrote:
> 
> 
> On 15/08/2017 09:00, Andreas Schwab wrote:
> > On Aug 15 2017, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > 
> >> The internal_syscall5/6/7 functions use the stack pointer to store
> >> the 5th and following arguments on the stack. In some cases GCC optimize
> >> out the stack pointer, and thus storing the data to the stack causes a
> >> segmentation fault.
> > 
> > FORCE_FRAME_POINTER does not work any more?
> 
> Wouldn't a better option and more compiler optimization proof to route
> syscall5/6/7 to a out of line symbol call to proper handle the stack
> pointer as for ARM and i386 (__libc_do_syscall)?

Hmm interesting indeed, though that implies an additional call to a
function instead of being fully inline. Not sure it makes a big
difference performance wise given the syscall a few instructions later.
  
Joseph Myers Aug. 15, 2017, 4:24 p.m. UTC | #6
On Tue, 15 Aug 2017, Adhemerval Zanella wrote:

> Wouldn't a better option and more compiler optimization proof to route
> syscall5/6/7 to a out of line symbol call to proper handle the stack
> pointer as for ARM and i386 (__libc_do_syscall)?

Indeed (and with a bug filed in Bugzilla as usual since this issue was 
user-visible in a release).
  
Aurelien Jarno Aug. 15, 2017, 7:33 p.m. UTC | #7
On 2017-08-15 16:24, Joseph Myers wrote:
> On Tue, 15 Aug 2017, Adhemerval Zanella wrote:
> 
> > Wouldn't a better option and more compiler optimization proof to route
> > syscall5/6/7 to a out of line symbol call to proper handle the stack
> > pointer as for ARM and i386 (__libc_do_syscall)?
> 
> Indeed (and with a bug filed in Bugzilla as usual since this issue was 
> user-visible in a release).

That's one way to do that, however it does seem correct to me that there
is no way to force the stack pointer to be valid in an asm code. The
stack pointer is used in other asm codes in the glibc, so we need a more
global solution.

For the record, here is the corresponding generated code showing the
issue:

 174:	8e020000 	lw	v0,0(s0)
 178:	30420004 	andi	v0,v0,0x4
 17c:	104000a6 	beqz	v0,418 <__GI___pthread_rwlock_rdlock+0x418>
 180:	02c01825 	move	v1,s6
 184:	92050019 	lbu	a1,25(s0)
 188:	27bdfff0 	addiu	sp,sp,-16

 here the stack pointer is changed

 18c:	8fc60024 	lw	a2,36(s8)
 190:	02002025 	move	a0,s0
 194:	02e5180a 	movz	v1,s7,a1
 198:	27a20010 	addiu	v0,sp,16
 19c:	00003825 	move	a3,zero
 1a0:	afc20020 	sw	v0,32(s8)

 and the original value stored in the fp.

 1a4:	00001025 	move	v0,zero
 1a8:	00602825 	move	a1,v1

 --- begin of asm code

 1ac:	27bdffe0 	addiu	sp,sp,-32
 1b0:	afa20010 	sw	v0,16(sp)
 1b4:	afb20014 	sw	s2,20(sp)
 1b8:	2402108e 	li	v0,4238
 1bc:	0000000c 	syscall
 1c0:	27bd0020 	addiu	sp,sp,32

 --- end of asm code

 1c4:	10e0ffeb 	beqz	a3,174 <__GI___pthread_rwlock_rdlock+0x174>
 1c8:	00021823 	negu	v1,v0


When specifying the stack pointer as clobbered, we end up with the
following code:

 174:	8e020000 	lw	v0,0(s0)
 178:	30420004 	andi	v0,v0,0x4
 17c:	104000a6 	beqz	v0,418 <__GI___pthread_rwlock_rdlock+0x418>
 180:	8fc60024 	lw	a2,36(s8)
 184:	02a0e825 	move	sp,s5

 here the stack pointer is reloaded at each loop (the decrease by 16 is
 done earlier before saving it in s5).

 188:	92050019 	lbu	a1,25(s0)
 18c:	27a20010 	addiu	v0,sp,16
 190:	02002025 	move	a0,s0
 194:	afc20020 	sw	v0,32(s8)
 198:	02c01025 	move	v0,s6
 19c:	02e5100a 	movz	v0,s7,a1
 1a0:	00003825 	move	a3,zero
 1a4:	00402825 	move	a1,v0
 1a8:	00001025 	move	v0,zero

 --- begin of asm code

 1ac:	27bdffe0 	addiu	sp,sp,-32
 1b0:	afa20010 	sw	v0,16(sp)
 1b4:	afb20014 	sw	s2,20(sp)
 1b8:	2402108e 	li	v0,4238
 1bc:	0000000c 	syscall
 1c0:	27bd0020 	addiu	sp,sp,32

 --- end of asm code

 1c4:	10e0ffeb 	beqz	a3,174 <__GI___pthread_rwlock_rdlock+0x174>
 1c8:	00021823 	negu	v1,v0
  
Joseph Myers Aug. 15, 2017, 7:51 p.m. UTC | #8
On Tue, 15 Aug 2017, Aurelien Jarno wrote:

> That's one way to do that, however it does seem correct to me that there
> is no way to force the stack pointer to be valid in an asm code. The
> stack pointer is used in other asm codes in the glibc, so we need a more
> global solution.

What's the basis for saying the stack pointer is invalid (as opposed to 
unwind information referring to the original stack pointer, so being 
invalid at the point of the syscall, causing unwinding to crash)?  The 
stack pointer should be unconditionally valid for all asms, on all 
architectures; after all, it's certainly OK to make a function call from 
inside an asm, or for a signal handler (without sigaltstack) to interrupt 
an asm.

I don't think modifying the stack pointer inside an asm can ever be safe 
*in glibc's context* because unwind info might refer to it (even if 
there's a frame pointer), and I don't think making an asm clobber the 
stack pointer is safe either.
  
Aurelien Jarno Aug. 15, 2017, 8:08 p.m. UTC | #9
On 2017-08-15 19:51, Joseph Myers wrote:
> On Tue, 15 Aug 2017, Aurelien Jarno wrote:
> 
> > That's one way to do that, however it does seem correct to me that there
> > is no way to force the stack pointer to be valid in an asm code. The
> > stack pointer is used in other asm codes in the glibc, so we need a more
> > global solution.
> 
> What's the basis for saying the stack pointer is invalid (as opposed to 
> unwind information referring to the original stack pointer, so being 
> invalid at the point of the syscall, causing unwinding to crash)?  The 

1) Looking at the assembly code, the value of the stack pointer around
   the syscall depends on the number of time the loop is executed.
2) The crash happens when reaching the stack guard, with a very simple
   test case not using recursive functions.

> stack pointer should be unconditionally valid for all asms, on all 
> architectures; after all, it's certainly OK to make a function call from 
> inside an asm, or for a signal handler (without sigaltstack) to interrupt 
> an asm.
> 
> I don't think modifying the stack pointer inside an asm can ever be safe 
> *in glibc's context* because unwind info might refer to it (even if 
> there's a frame pointer), and I don't think making an asm clobber the 
> stack pointer is safe either.

That's indeed correct, so in that specific case it indeed make sense
to use an out of line symbol call. I am still worried about other use of
$sp in other asm code.
  
Joseph Myers Aug. 15, 2017, 8:21 p.m. UTC | #10
On Tue, 15 Aug 2017, Aurelien Jarno wrote:

> On 2017-08-15 19:51, Joseph Myers wrote:
> > On Tue, 15 Aug 2017, Aurelien Jarno wrote:
> > 
> > > That's one way to do that, however it does seem correct to me that there
> > > is no way to force the stack pointer to be valid in an asm code. The
> > > stack pointer is used in other asm codes in the glibc, so we need a more
> > > global solution.
> > 
> > What's the basis for saying the stack pointer is invalid (as opposed to 
> > unwind information referring to the original stack pointer, so being 
> > invalid at the point of the syscall, causing unwinding to crash)?  The 
> 
> 1) Looking at the assembly code, the value of the stack pointer around
>    the syscall depends on the number of time the loop is executed.
> 2) The crash happens when reaching the stack guard, with a very simple
>    test case not using recursive functions.

What that says to me is that the alloca (to ensure frame pointer creation) 
is fundamentally problematic if the syscall macro can be used many times 
in a loop within a function, because it will allocate unbounded amounts of 
stack.

In which case having a volatile integer variable with value 4, declaring a 
VLA whose size is that variable, and storing a pointer to that VLA in a 
variable, would be an alternative to alloca to force a frame pointer, but 
with deallocation happening when the scope ends rather than the function 
ending (and the syscall macro has its own scope, so using it inside a loop 
wouldn't be a problem).
  
Aurelien Jarno Aug. 15, 2017, 8:41 p.m. UTC | #11
On 2017-08-15 20:21, Joseph Myers wrote:
> On Tue, 15 Aug 2017, Aurelien Jarno wrote:
> 
> > On 2017-08-15 19:51, Joseph Myers wrote:
> > > On Tue, 15 Aug 2017, Aurelien Jarno wrote:
> > > 
> > > > That's one way to do that, however it does seem correct to me that there
> > > > is no way to force the stack pointer to be valid in an asm code. The
> > > > stack pointer is used in other asm codes in the glibc, so we need a more
> > > > global solution.
> > > 
> > > What's the basis for saying the stack pointer is invalid (as opposed to 
> > > unwind information referring to the original stack pointer, so being 
> > > invalid at the point of the syscall, causing unwinding to crash)?  The 
> > 
> > 1) Looking at the assembly code, the value of the stack pointer around
> >    the syscall depends on the number of time the loop is executed.
> > 2) The crash happens when reaching the stack guard, with a very simple
> >    test case not using recursive functions.
> 
> What that says to me is that the alloca (to ensure frame pointer creation) 
> is fundamentally problematic if the syscall macro can be used many times 
> in a loop within a function, because it will allocate unbounded amounts of 
> stack.

Oh didn't thought about that, thanks for the explanation.
  
Maciej W. Rozycki Aug. 16, 2017, 1:24 p.m. UTC | #12
On Tue, 15 Aug 2017, Joseph Myers wrote:

> In which case having a volatile integer variable with value 4, declaring a 
> VLA whose size is that variable, and storing a pointer to that VLA in a 
> variable, would be an alternative to alloca to force a frame pointer, but 
> with deallocation happening when the scope ends rather than the function 
> ending (and the syscall macro has its own scope, so using it inside a loop 
> wouldn't be a problem).

 I suspect using volatile variables will cause unnecessary memory traffic.  
Passing the size specifier through an empty `asm' might give better code; 
also I think we can use 0 as the size requested, not to decrease the stack 
pointer unnecessarily, e.g.:

  {
    size_t s = 0;

    asm ("" : "+r" (s));
    {
      char vla[s << 3];

      asm ("" : : "p" (vla));
      /* ... */

This seems to produce reasonable code with GCC 8, taking the necessity to 
align stack as per the ABI requirement into account already, and wasting 
two instructions only in addition to the $sp adjustment itself:

	move	$4,$0
	sll	$4,$4,3
# ...
	subu	$sp,$sp,$4
	
 Also I wonder if there's actually a dependable way to have GCC itself 
allocate the argument space we require.  For example if we set `s' to 1 
above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be 
valid to place arguments #5 and #6 at respectively without the subsequent 
$sp adjustment we currently have in the syscall `asm' or would it be UB?

  Maciej
  
Joseph Myers Aug. 16, 2017, 1:44 p.m. UTC | #13
On Wed, 16 Aug 2017, Maciej W. Rozycki wrote:

> On Tue, 15 Aug 2017, Joseph Myers wrote:
> 
> > In which case having a volatile integer variable with value 4, declaring a 
> > VLA whose size is that variable, and storing a pointer to that VLA in a 
> > variable, would be an alternative to alloca to force a frame pointer, but 
> > with deallocation happening when the scope ends rather than the function 
> > ending (and the syscall macro has its own scope, so using it inside a loop 
> > wouldn't be a problem).
> 
>  I suspect using volatile variables will cause unnecessary memory traffic.  
> Passing the size specifier through an empty `asm' might give better code; 
> also I think we can use 0 as the size requested, not to decrease the stack 
> pointer unnecessarily, e.g.:

Sure, as long as (a) the compiler can't know the size is actually constant 
and (b) it can't know the VLA isn't actually used, as if it can tell 
either of those things it can optimize away the variable stack allocation.

>  Also I wonder if there's actually a dependable way to have GCC itself 
> allocate the argument space we require.  For example if we set `s' to 1 
> above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be 
> valid to place arguments #5 and #6 at respectively without the subsequent 
> $sp adjustment we currently have in the syscall `asm' or would it be UB?

You can't tell whether the compiler might have allocated other variables 
on the stack after the dynamic adjustment - that is, whether any 
particular offset from sp is in fact unused or not.
  
Maciej W. Rozycki Aug. 16, 2017, 2:30 p.m. UTC | #14
On Wed, 16 Aug 2017, Joseph Myers wrote:

> >  I suspect using volatile variables will cause unnecessary memory traffic.  
> > Passing the size specifier through an empty `asm' might give better code; 
> > also I think we can use 0 as the size requested, not to decrease the stack 
> > pointer unnecessarily, e.g.:
> 
> Sure, as long as (a) the compiler can't know the size is actually constant 
> and (b) it can't know the VLA isn't actually used, as if it can tell 
> either of those things it can optimize away the variable stack allocation.

 Well, an `asm' is a black box, unless it is known -- under the conditions 
set out in GCC documentation -- to be safe to optimise away regardless.  
Neither `asm' I proposed matches the conditions.

> >  Also I wonder if there's actually a dependable way to have GCC itself 
> > allocate the argument space we require.  For example if we set `s' to 1 
> > above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be 
> > valid to place arguments #5 and #6 at respectively without the subsequent 
> > $sp adjustment we currently have in the syscall `asm' or would it be UB?
> 
> You can't tell whether the compiler might have allocated other variables 
> on the stack after the dynamic adjustment - that is, whether any 
> particular offset from sp is in fact unused or not.

 Hmm, taking the requirement to deallocate the space arranged for a VLA at 
the exit of the containing block into account I think we can eliminate the 
possibility for the compiler to have allocated space for other variables 
as long as no other variable has been declared within the same block (or 
any nested one).

 Or am I missing anything here?  E.g. is the compiler allowed to spill 
random data to the stack at any time even in the absence of a matching 
variable declaration?  Or is it allowed to allocate space for a non-VLA 
variable that has been declared outside the block concerned, but the 
lifespan of which is contained within the block in this stack space rather 
than in the local frame?

 If the answer to any of these questions is "yes", then would factoring 
out the syscall `asm' along with the associated VLA declaration to a 
helper `always_inline' function help or would it not?

 I mean it is a tiny optimisation, but some syscalls are frequently 
called, so if we can avoid a waste of resources, then why not?

  Maciej
  
Joseph Myers Aug. 16, 2017, 2:46 p.m. UTC | #15
On Wed, 16 Aug 2017, Maciej W. Rozycki wrote:

>  Or am I missing anything here?  E.g. is the compiler allowed to spill 
> random data to the stack at any time even in the absence of a matching 
> variable declaration?  Or is it allowed to allocate space for a non-VLA 
> variable that has been declared outside the block concerned, but the 
> lifespan of which is contained within the block in this stack space rather 
> than in the local frame?

Yes, it's allowed to do both of those.

>  If the answer to any of these questions is "yes", then would factoring 
> out the syscall `asm' along with the associated VLA declaration to a 
> helper `always_inline' function help or would it not?

I don't think that would help.  An asm can never make assumptions about 
which parts of the stack are used for what, only use its operands.

>  I mean it is a tiny optimisation, but some syscalls are frequently 
> called, so if we can avoid a waste of resources, then why not?

Are any 5/6/7-argument syscalls frequently called?
  

Patch

diff --git a/ChangeLog b/ChangeLog
index 56540f55a1..5d1a088431 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@ 
+2017-08-15  Aurelien Jarno <aurelien@aurel32.net>
+
+	* sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
+	(internal_syscall5): Add "$29" to the clobber list.
+	(internal_syscall6): Likewise.
+	(internal_syscall7): Likewise.
+
 2017-08-14  Joseph Myers  <joseph@codesourcery.com>
 
 	* conform/data/sys/wait.h-data (si_value): Do not expect for
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
index e9e3ee7e82..0df32c186f 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
+++ b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
@@ -294,7 +294,7 @@ 
 	: "=r" (__v0), "+r" (__a3)					\
 	: input, "r" (__a0), "r" (__a1), "r" (__a2),			\
 	  "r" ((long) (arg5))						\
-	: __SYSCALL_CLOBBERS);						\
+	: __SYSCALL_CLOBBERS, "$29");					\
 	err = __a3;							\
 	_sys_result = __v0;						\
 	}								\
@@ -327,7 +327,7 @@ 
 	: "=r" (__v0), "+r" (__a3)					\
 	: input, "r" (__a0), "r" (__a1), "r" (__a2),			\
 	  "r" ((long) (arg5)), "r" ((long) (arg6))			\
-	: __SYSCALL_CLOBBERS);						\
+	: __SYSCALL_CLOBBERS, "$29");					\
 	err = __a3;							\
 	_sys_result = __v0;						\
 	}								\
@@ -361,7 +361,7 @@ 
 	: "=r" (__v0), "+r" (__a3)					\
 	: input, "r" (__a0), "r" (__a1), "r" (__a2),			\
 	  "r" ((long) (arg5)), "r" ((long) (arg6)), "r" ((long) (arg7))	\
-	: __SYSCALL_CLOBBERS);						\
+	: __SYSCALL_CLOBBERS, "$29");					\
 	err = __a3;							\
 	_sys_result = __v0;						\
 	}								\