Message ID | 20170815115055.29375-1-aurelien@aurel32.net |
---|---|
State | New, archived |
Headers |
Received: (qmail 70967 invoked by alias); 15 Aug 2017 11:53:39 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 46747 invoked by uid 89); 15 Aug 2017 11:52:50 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD, URIBL_RED autolearn=ham version=3.3.2 spammy=conform X-HELO: hall.aurel32.net From: Aurelien Jarno <aurelien@aurel32.net> To: libc-alpha@sourceware.org Cc: Aurelien Jarno <aurelien@aurel32.net> Subject: [PATCH] mips/o32: fix internal_syscall5/6/7 Date: Tue, 15 Aug 2017 13:50:55 +0200 Message-Id: <20170815115055.29375-1-aurelien@aurel32.net> |
Commit Message
Aurelien Jarno
Aug. 15, 2017, 11:50 a.m. UTC
The internal_syscall5/6/7 functions use the stack pointer to store the 5th and following arguments on the stack. In some cases GCC optimize out the stack pointer, and thus storing the data to the stack causes a segmentation fault. Fix that by declaring the sp register as clobbered. Not sure it is the best way to do that, but it seems to be enough to force GCC to not optimize it out. This fixes the nptl/tst-rwlock15 test on mips/o32. ChangeLog: * sysdeps/unix/sysv/linux/mips/mips32/sysdep.h (internal_syscall5): Add "$29" to the clobber list. (internal_syscall6): Likewise. (internal_syscall7): Likewise. --- ChangeLog | 7 +++++++ sysdeps/unix/sysv/linux/mips/mips32/sysdep.h | 6 +++--- 2 files changed, 10 insertions(+), 3 deletions(-)
Comments
On Aug 15 2017, Aurelien Jarno <aurelien@aurel32.net> wrote: > The internal_syscall5/6/7 functions use the stack pointer to store > the 5th and following arguments on the stack. In some cases GCC optimize > out the stack pointer, and thus storing the data to the stack causes a > segmentation fault. FORCE_FRAME_POINTER does not work any more? Andreas.
On 08/15/2017 01:50 PM, Aurelien Jarno wrote: > The internal_syscall5/6/7 functions use the stack pointer to store > the 5th and following arguments on the stack. In some cases GCC optimize > out the stack pointer, and thus storing the data to the stack causes a > segmentation fault. > > Fix that by declaring the sp register as clobbered. Not sure it is the > best way to do that, but it seems to be enough to force GCC to not > optimize it out. Doesn't the inline assembly *require* that r29 is the stack pointer? Is there a way to express this explicitly? Thanks, Florian
On 15/08/2017 09:00, Andreas Schwab wrote: > On Aug 15 2017, Aurelien Jarno <aurelien@aurel32.net> wrote: > >> The internal_syscall5/6/7 functions use the stack pointer to store >> the 5th and following arguments on the stack. In some cases GCC optimize >> out the stack pointer, and thus storing the data to the stack causes a >> segmentation fault. > > FORCE_FRAME_POINTER does not work any more? Wouldn't a better option and more compiler optimization proof to route syscall5/6/7 to a out of line symbol call to proper handle the stack pointer as for ARM and i386 (__libc_do_syscall)?
On 2017-08-15 14:00, Andreas Schwab wrote: > On Aug 15 2017, Aurelien Jarno <aurelien@aurel32.net> wrote: > > > The internal_syscall5/6/7 functions use the stack pointer to store > > the 5th and following arguments on the stack. In some cases GCC optimize > > out the stack pointer, and thus storing the data to the stack causes a > > segmentation fault. > > FORCE_FRAME_POINTER does not work any more? From what I understand of the generated code, it seems to work at the function level, but not at the asm code level. The pthread_rwlock_rdlock adds a loop around the syscall, and it seems that's the code path causing the issue. Adding $sp as clobbered changes the code to reload the $sp from the saved value at each loop.
On 2017-08-15 10:06, Adhemerval Zanella wrote: > > > On 15/08/2017 09:00, Andreas Schwab wrote: > > On Aug 15 2017, Aurelien Jarno <aurelien@aurel32.net> wrote: > > > >> The internal_syscall5/6/7 functions use the stack pointer to store > >> the 5th and following arguments on the stack. In some cases GCC optimize > >> out the stack pointer, and thus storing the data to the stack causes a > >> segmentation fault. > > > > FORCE_FRAME_POINTER does not work any more? > > Wouldn't a better option and more compiler optimization proof to route > syscall5/6/7 to a out of line symbol call to proper handle the stack > pointer as for ARM and i386 (__libc_do_syscall)? Hmm interesting indeed, though that implies an additional call to a function instead of being fully inline. Not sure it makes a big difference performance wise given the syscall a few instructions later.
On Tue, 15 Aug 2017, Adhemerval Zanella wrote: > Wouldn't a better option and more compiler optimization proof to route > syscall5/6/7 to a out of line symbol call to proper handle the stack > pointer as for ARM and i386 (__libc_do_syscall)? Indeed (and with a bug filed in Bugzilla as usual since this issue was user-visible in a release).
On 2017-08-15 16:24, Joseph Myers wrote: > On Tue, 15 Aug 2017, Adhemerval Zanella wrote: > > > Wouldn't a better option and more compiler optimization proof to route > > syscall5/6/7 to a out of line symbol call to proper handle the stack > > pointer as for ARM and i386 (__libc_do_syscall)? > > Indeed (and with a bug filed in Bugzilla as usual since this issue was > user-visible in a release). That's one way to do that, however it does seem correct to me that there is no way to force the stack pointer to be valid in an asm code. The stack pointer is used in other asm codes in the glibc, so we need a more global solution. For the record, here is the corresponding generated code showing the issue: 174: 8e020000 lw v0,0(s0) 178: 30420004 andi v0,v0,0x4 17c: 104000a6 beqz v0,418 <__GI___pthread_rwlock_rdlock+0x418> 180: 02c01825 move v1,s6 184: 92050019 lbu a1,25(s0) 188: 27bdfff0 addiu sp,sp,-16 here the stack pointer is changed 18c: 8fc60024 lw a2,36(s8) 190: 02002025 move a0,s0 194: 02e5180a movz v1,s7,a1 198: 27a20010 addiu v0,sp,16 19c: 00003825 move a3,zero 1a0: afc20020 sw v0,32(s8) and the original value stored in the fp. 1a4: 00001025 move v0,zero 1a8: 00602825 move a1,v1 --- begin of asm code 1ac: 27bdffe0 addiu sp,sp,-32 1b0: afa20010 sw v0,16(sp) 1b4: afb20014 sw s2,20(sp) 1b8: 2402108e li v0,4238 1bc: 0000000c syscall 1c0: 27bd0020 addiu sp,sp,32 --- end of asm code 1c4: 10e0ffeb beqz a3,174 <__GI___pthread_rwlock_rdlock+0x174> 1c8: 00021823 negu v1,v0 When specifying the stack pointer as clobbered, we end up with the following code: 174: 8e020000 lw v0,0(s0) 178: 30420004 andi v0,v0,0x4 17c: 104000a6 beqz v0,418 <__GI___pthread_rwlock_rdlock+0x418> 180: 8fc60024 lw a2,36(s8) 184: 02a0e825 move sp,s5 here the stack pointer is reloaded at each loop (the decrease by 16 is done earlier before saving it in s5). 188: 92050019 lbu a1,25(s0) 18c: 27a20010 addiu v0,sp,16 190: 02002025 move a0,s0 194: afc20020 sw v0,32(s8) 198: 02c01025 move v0,s6 19c: 02e5100a movz v0,s7,a1 1a0: 00003825 move a3,zero 1a4: 00402825 move a1,v0 1a8: 00001025 move v0,zero --- begin of asm code 1ac: 27bdffe0 addiu sp,sp,-32 1b0: afa20010 sw v0,16(sp) 1b4: afb20014 sw s2,20(sp) 1b8: 2402108e li v0,4238 1bc: 0000000c syscall 1c0: 27bd0020 addiu sp,sp,32 --- end of asm code 1c4: 10e0ffeb beqz a3,174 <__GI___pthread_rwlock_rdlock+0x174> 1c8: 00021823 negu v1,v0
On Tue, 15 Aug 2017, Aurelien Jarno wrote: > That's one way to do that, however it does seem correct to me that there > is no way to force the stack pointer to be valid in an asm code. The > stack pointer is used in other asm codes in the glibc, so we need a more > global solution. What's the basis for saying the stack pointer is invalid (as opposed to unwind information referring to the original stack pointer, so being invalid at the point of the syscall, causing unwinding to crash)? The stack pointer should be unconditionally valid for all asms, on all architectures; after all, it's certainly OK to make a function call from inside an asm, or for a signal handler (without sigaltstack) to interrupt an asm. I don't think modifying the stack pointer inside an asm can ever be safe *in glibc's context* because unwind info might refer to it (even if there's a frame pointer), and I don't think making an asm clobber the stack pointer is safe either.
On 2017-08-15 19:51, Joseph Myers wrote: > On Tue, 15 Aug 2017, Aurelien Jarno wrote: > > > That's one way to do that, however it does seem correct to me that there > > is no way to force the stack pointer to be valid in an asm code. The > > stack pointer is used in other asm codes in the glibc, so we need a more > > global solution. > > What's the basis for saying the stack pointer is invalid (as opposed to > unwind information referring to the original stack pointer, so being > invalid at the point of the syscall, causing unwinding to crash)? The 1) Looking at the assembly code, the value of the stack pointer around the syscall depends on the number of time the loop is executed. 2) The crash happens when reaching the stack guard, with a very simple test case not using recursive functions. > stack pointer should be unconditionally valid for all asms, on all > architectures; after all, it's certainly OK to make a function call from > inside an asm, or for a signal handler (without sigaltstack) to interrupt > an asm. > > I don't think modifying the stack pointer inside an asm can ever be safe > *in glibc's context* because unwind info might refer to it (even if > there's a frame pointer), and I don't think making an asm clobber the > stack pointer is safe either. That's indeed correct, so in that specific case it indeed make sense to use an out of line symbol call. I am still worried about other use of $sp in other asm code.
On Tue, 15 Aug 2017, Aurelien Jarno wrote: > On 2017-08-15 19:51, Joseph Myers wrote: > > On Tue, 15 Aug 2017, Aurelien Jarno wrote: > > > > > That's one way to do that, however it does seem correct to me that there > > > is no way to force the stack pointer to be valid in an asm code. The > > > stack pointer is used in other asm codes in the glibc, so we need a more > > > global solution. > > > > What's the basis for saying the stack pointer is invalid (as opposed to > > unwind information referring to the original stack pointer, so being > > invalid at the point of the syscall, causing unwinding to crash)? The > > 1) Looking at the assembly code, the value of the stack pointer around > the syscall depends on the number of time the loop is executed. > 2) The crash happens when reaching the stack guard, with a very simple > test case not using recursive functions. What that says to me is that the alloca (to ensure frame pointer creation) is fundamentally problematic if the syscall macro can be used many times in a loop within a function, because it will allocate unbounded amounts of stack. In which case having a volatile integer variable with value 4, declaring a VLA whose size is that variable, and storing a pointer to that VLA in a variable, would be an alternative to alloca to force a frame pointer, but with deallocation happening when the scope ends rather than the function ending (and the syscall macro has its own scope, so using it inside a loop wouldn't be a problem).
On 2017-08-15 20:21, Joseph Myers wrote: > On Tue, 15 Aug 2017, Aurelien Jarno wrote: > > > On 2017-08-15 19:51, Joseph Myers wrote: > > > On Tue, 15 Aug 2017, Aurelien Jarno wrote: > > > > > > > That's one way to do that, however it does seem correct to me that there > > > > is no way to force the stack pointer to be valid in an asm code. The > > > > stack pointer is used in other asm codes in the glibc, so we need a more > > > > global solution. > > > > > > What's the basis for saying the stack pointer is invalid (as opposed to > > > unwind information referring to the original stack pointer, so being > > > invalid at the point of the syscall, causing unwinding to crash)? The > > > > 1) Looking at the assembly code, the value of the stack pointer around > > the syscall depends on the number of time the loop is executed. > > 2) The crash happens when reaching the stack guard, with a very simple > > test case not using recursive functions. > > What that says to me is that the alloca (to ensure frame pointer creation) > is fundamentally problematic if the syscall macro can be used many times > in a loop within a function, because it will allocate unbounded amounts of > stack. Oh didn't thought about that, thanks for the explanation.
On Tue, 15 Aug 2017, Joseph Myers wrote: > In which case having a volatile integer variable with value 4, declaring a > VLA whose size is that variable, and storing a pointer to that VLA in a > variable, would be an alternative to alloca to force a frame pointer, but > with deallocation happening when the scope ends rather than the function > ending (and the syscall macro has its own scope, so using it inside a loop > wouldn't be a problem). I suspect using volatile variables will cause unnecessary memory traffic. Passing the size specifier through an empty `asm' might give better code; also I think we can use 0 as the size requested, not to decrease the stack pointer unnecessarily, e.g.: { size_t s = 0; asm ("" : "+r" (s)); { char vla[s << 3]; asm ("" : : "p" (vla)); /* ... */ This seems to produce reasonable code with GCC 8, taking the necessity to align stack as per the ABI requirement into account already, and wasting two instructions only in addition to the $sp adjustment itself: move $4,$0 sll $4,$4,3 # ... subu $sp,$sp,$4 Also I wonder if there's actually a dependable way to have GCC itself allocate the argument space we require. For example if we set `s' to 1 above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be valid to place arguments #5 and #6 at respectively without the subsequent $sp adjustment we currently have in the syscall `asm' or would it be UB? Maciej
On Wed, 16 Aug 2017, Maciej W. Rozycki wrote: > On Tue, 15 Aug 2017, Joseph Myers wrote: > > > In which case having a volatile integer variable with value 4, declaring a > > VLA whose size is that variable, and storing a pointer to that VLA in a > > variable, would be an alternative to alloca to force a frame pointer, but > > with deallocation happening when the scope ends rather than the function > > ending (and the syscall macro has its own scope, so using it inside a loop > > wouldn't be a problem). > > I suspect using volatile variables will cause unnecessary memory traffic. > Passing the size specifier through an empty `asm' might give better code; > also I think we can use 0 as the size requested, not to decrease the stack > pointer unnecessarily, e.g.: Sure, as long as (a) the compiler can't know the size is actually constant and (b) it can't know the VLA isn't actually used, as if it can tell either of those things it can optimize away the variable stack allocation. > Also I wonder if there's actually a dependable way to have GCC itself > allocate the argument space we require. For example if we set `s' to 1 > above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be > valid to place arguments #5 and #6 at respectively without the subsequent > $sp adjustment we currently have in the syscall `asm' or would it be UB? You can't tell whether the compiler might have allocated other variables on the stack after the dynamic adjustment - that is, whether any particular offset from sp is in fact unused or not.
On Wed, 16 Aug 2017, Joseph Myers wrote: > > I suspect using volatile variables will cause unnecessary memory traffic. > > Passing the size specifier through an empty `asm' might give better code; > > also I think we can use 0 as the size requested, not to decrease the stack > > pointer unnecessarily, e.g.: > > Sure, as long as (a) the compiler can't know the size is actually constant > and (b) it can't know the VLA isn't actually used, as if it can tell > either of those things it can optimize away the variable stack allocation. Well, an `asm' is a black box, unless it is known -- under the conditions set out in GCC documentation -- to be safe to optimise away regardless. Neither `asm' I proposed matches the conditions. > > Also I wonder if there's actually a dependable way to have GCC itself > > allocate the argument space we require. For example if we set `s' to 1 > > above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be > > valid to place arguments #5 and #6 at respectively without the subsequent > > $sp adjustment we currently have in the syscall `asm' or would it be UB? > > You can't tell whether the compiler might have allocated other variables > on the stack after the dynamic adjustment - that is, whether any > particular offset from sp is in fact unused or not. Hmm, taking the requirement to deallocate the space arranged for a VLA at the exit of the containing block into account I think we can eliminate the possibility for the compiler to have allocated space for other variables as long as no other variable has been declared within the same block (or any nested one). Or am I missing anything here? E.g. is the compiler allowed to spill random data to the stack at any time even in the absence of a matching variable declaration? Or is it allowed to allocate space for a non-VLA variable that has been declared outside the block concerned, but the lifespan of which is contained within the block in this stack space rather than in the local frame? If the answer to any of these questions is "yes", then would factoring out the syscall `asm' along with the associated VLA declaration to a helper `always_inline' function help or would it not? I mean it is a tiny optimisation, but some syscalls are frequently called, so if we can avoid a waste of resources, then why not? Maciej
On Wed, 16 Aug 2017, Maciej W. Rozycki wrote: > Or am I missing anything here? E.g. is the compiler allowed to spill > random data to the stack at any time even in the absence of a matching > variable declaration? Or is it allowed to allocate space for a non-VLA > variable that has been declared outside the block concerned, but the > lifespan of which is contained within the block in this stack space rather > than in the local frame? Yes, it's allowed to do both of those. > If the answer to any of these questions is "yes", then would factoring > out the syscall `asm' along with the associated VLA declaration to a > helper `always_inline' function help or would it not? I don't think that would help. An asm can never make assumptions about which parts of the stack are used for what, only use its operands. > I mean it is a tiny optimisation, but some syscalls are frequently > called, so if we can avoid a waste of resources, then why not? Are any 5/6/7-argument syscalls frequently called?
diff --git a/ChangeLog b/ChangeLog index 56540f55a1..5d1a088431 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,10 @@ +2017-08-15 Aurelien Jarno <aurelien@aurel32.net> + + * sysdeps/unix/sysv/linux/mips/mips32/sysdep.h + (internal_syscall5): Add "$29" to the clobber list. + (internal_syscall6): Likewise. + (internal_syscall7): Likewise. + 2017-08-14 Joseph Myers <joseph@codesourcery.com> * conform/data/sys/wait.h-data (si_value): Do not expect for diff --git a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h index e9e3ee7e82..0df32c186f 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h +++ b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h @@ -294,7 +294,7 @@ : "=r" (__v0), "+r" (__a3) \ : input, "r" (__a0), "r" (__a1), "r" (__a2), \ "r" ((long) (arg5)) \ - : __SYSCALL_CLOBBERS); \ + : __SYSCALL_CLOBBERS, "$29"); \ err = __a3; \ _sys_result = __v0; \ } \ @@ -327,7 +327,7 @@ : "=r" (__v0), "+r" (__a3) \ : input, "r" (__a0), "r" (__a1), "r" (__a2), \ "r" ((long) (arg5)), "r" ((long) (arg6)) \ - : __SYSCALL_CLOBBERS); \ + : __SYSCALL_CLOBBERS, "$29"); \ err = __a3; \ _sys_result = __v0; \ } \ @@ -361,7 +361,7 @@ : "=r" (__v0), "+r" (__a3) \ : input, "r" (__a0), "r" (__a1), "r" (__a2), \ "r" ((long) (arg5)), "r" ((long) (arg6)), "r" ((long) (arg7)) \ - : __SYSCALL_CLOBBERS); \ + : __SYSCALL_CLOBBERS, "$29"); \ err = __a3; \ _sys_result = __v0; \ } \