Message ID | alpine.DEB.2.00.1708171313580.17596@tp.orcam.me.uk |
---|---|
State | Superseded |
Headers |
Date: Thu, 17 Aug 2017 17:17:05 +0100 From: "Maciej W. Rozycki" <macro@imgtec.com> To: Joseph Myers <joseph@codesourcery.com> CC: Aurelien Jarno <aurelien@aurel32.net>, Adhemerval Zanella <adhemerval.zanella@linaro.org>, <libc-alpha@sourceware.org> Subject: Re: [PATCH] mips/o32: fix internal_syscall5/6/7 In-Reply-To: <alpine.DEB.2.20.1708161444290.19544@digraph.polyomino.org.uk> Message-ID: <alpine.DEB.2.00.1708171313580.17596@tp.orcam.me.uk> |
Commit Message
Maciej W. Rozycki
Aug. 17, 2017, 4:17 p.m. UTC
On Wed, 16 Aug 2017, Joseph Myers wrote:

> > If the answer to any of these questions is "yes", then would factoring
> > out the syscall `asm' along with the associated VLA declaration to a
> > helper `always_inline' function help or would it not?
>
> I don't think that would help. An asm can never make assumptions about
> which parts of the stack are used for what, only use its operands.

There may be ABI restrictions however, which could provide guarantees
beyond those resulting from the lone `asm' operands. And it would be
enough if we could prove that a certain arrangement has to be done in
order not to break the ABI. I can't think of anything right now though
and if neither you nor anyone else can, then we'll have to live with what
we have right now.

> > I mean it is a tiny optimisation, but some syscalls are frequently
> > called, so if we can avoid a waste of resources, then why not?
>
> Are any 5/6/7-argument syscalls frequently called?

Good question, however I have no data available.

Anyway, here's my counter-proposal implementing the approach previously
outlined. I have passed it through regular MIPS o32 testing with these
changes in test outputs resulting:

@@ -2575,7 +2575,7 @@
 PASS: nptl/tst-cond22
 PASS: nptl/tst-cond23
 PASS: nptl/tst-cond24
-FAIL: nptl/tst-cond25
+PASS: nptl/tst-cond25
 PASS: nptl/tst-cond3
 PASS: nptl/tst-cond4
 PASS: nptl/tst-cond5
@@ -2704,7 +2704,7 @@
 PASS: nptl/tst-rwlock12
 PASS: nptl/tst-rwlock13
 PASS: nptl/tst-rwlock14
-FAIL: nptl/tst-rwlock15
+PASS: nptl/tst-rwlock15
 PASS: nptl/tst-rwlock16
 PASS: nptl/tst-rwlock17
 PASS: nptl/tst-rwlock18

The drawback is it adds a bit to code generated, e.g. `__libc_pwrite'
(from nptl/pwrite.o and nptl/pwrite.os) grows by 4 and 6 instructions
for non-PIC and PIC code respectively, and the whole libraries:

   text    data     bss     dec     hex filename
1483315   21129   11560 1516004  1721e4 libc.so
 105482     960    8448  114890   1c0ca nptl/libpthread.so

vs:

   text    data     bss     dec     hex filename
1484295   21133   11560 1516988  1725bc libc.so
 105974     960    8448  115382   1c2b6 nptl/libpthread.so

due to the insertion of the VLA size calculation (although GCC is smart
enough to reuse a value of 0 already available, e.g.:

  38:   7c03e83b    rdhwr   v1,$29
  3c:   8c638b70    lw      v1,-29840(v1)
  40:   14600018    bnez    v1,a4 <__libc_pwrite+0xa4>
  44:   000787c3    sra     s0,a3,0x1f
  48:   000318c0    sll     v1,v1,0x3
  4c:   03a08825    move    s1,sp
  50:   03a3e823    subu    sp,sp,v1

and save an instruction) and the use of an extra register to preserve the
value of $sp across the block containing the VLA (as also seen with $s1 in
the disassembly above) even though it could use $fp that holds the same
value instead (e.g. continuing from the above:

  74:   0220e825    move    sp,s1
  78:   03c0e825    move    sp,s8

). It would be good to know how this compares to Adhemerval's proposal.

  Maciej

	* sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
	(FORCE_FRAME_POINTER): Remove macro.
	(internal_syscall5): Use a variable-length array to force the
	use of a frame pointer.
	(internal_syscall6): Likewise.
	(internal_syscall7): Likewise.
---
 sysdeps/unix/sysv/linux/mips/mips32/sysdep.h | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

glibc-mips-o32-syscall-stack.diff
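The technique described in this message can be looked at in isolation. The following is a minimal sketch, not the glibc macro itself: the function name `add_with_dynamic_frame` is hypothetical, the `"r"` constraint is used here in place of the `"p"` constraint from the patch to keep the example portable, and a VLA whose run-time size is 0 is assumed to be accepted as a GNU C extension (ISO C requires a positive size).

```c
#include <stddef.h>

/* Hypothetical helper, not part of glibc.  It declares a VLA whose
   size the compiler cannot know at compile time, so the stack pointer
   has to be adjusted dynamically inside the inner block.  */
long
add_with_dynamic_frame (long a, long b)
{
  size_t s = 0;

  /* Empty asm with a read-write operand: after this point the
     compiler can no longer assume that s is still 0.  */
  __asm__ ("" : "+r" (s));

  {
    /* Run-time sized array; in practice always 0 bytes long.  */
    char vla[s << 3];

    /* Pass the array address to an empty asm so that the array is
       not discarded as unused.  */
    __asm__ ("" : : "r" (vla));

    /* sizeof is evaluated at run time for a VLA and yields 0 here.  */
    return a + b + (long) sizeof vla;
  }
}
```

Compiling this for a MIPS o32 target and inspecting the disassembly should show a sequence similar to the `sll`/`move`/`subu` instructions quoted above, although the exact output depends on the compiler version and flags.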
Comments
On 17/08/2017 13:17, Maciej W. Rozycki wrote:
> On Wed, 16 Aug 2017, Joseph Myers wrote:
>
>>> If the answer to any of these questions is "yes", then would factoring
>>> out the syscall `asm' along with the associated VLA declaration to a
>>> helper `always_inline' function help or would it not?
>>
>> I don't think that would help. An asm can never make assumptions about
>> which parts of the stack are used for what, only use its operands.
>
> There may be ABI restrictions however, which could provide guarantees
> beyond those resulting from the lone `asm' operands. And it would be
> enough if we could prove that a certain arrangement has to be done in
> order not to break the ABI. I can't think of anything right now though
> and if neither you nor anyone else can, then we'll have to live with what
> we have right now.
>
>>> I mean it is a tiny optimisation, but some syscalls are frequently
>>> called, so if we can avoid a waste of resources, then why not?
>>
>> Are any 5/6/7-argument syscalls frequently called?
>
> Good question, however I have no data available.
>
> Anyway, here's my counter-proposal implementing the approach previously
> outlined. I have passed it through regular MIPS o32 testing with these
> changes in test outputs resulting:
>
> @@ -2575,7 +2575,7 @@
>  PASS: nptl/tst-cond22
>  PASS: nptl/tst-cond23
>  PASS: nptl/tst-cond24
> -FAIL: nptl/tst-cond25
> +PASS: nptl/tst-cond25
>  PASS: nptl/tst-cond3
>  PASS: nptl/tst-cond4
>  PASS: nptl/tst-cond5
> @@ -2704,7 +2704,7 @@
>  PASS: nptl/tst-rwlock12
>  PASS: nptl/tst-rwlock13
>  PASS: nptl/tst-rwlock14
> -FAIL: nptl/tst-rwlock15
> +PASS: nptl/tst-rwlock15
>  PASS: nptl/tst-rwlock16
>  PASS: nptl/tst-rwlock17
>  PASS: nptl/tst-rwlock18
>
> The drawback is it adds a bit to code generated, e.g. `__libc_pwrite'
> (from nptl/pwrite.o and nptl/pwrite.os) grows by 4 and 6 instructions
> for non-PIC and PIC code respectively, and the whole libraries:
>
>    text    data     bss     dec     hex filename
> 1483315   21129   11560 1516004  1721e4 libc.so
>  105482     960    8448  114890   1c0ca nptl/libpthread.so
>
> vs:
>
>    text    data     bss     dec     hex filename
> 1484295   21133   11560 1516988  1725bc libc.so
>  105974     960    8448  115382   1c2b6 nptl/libpthread.so
>
> due to the insertion of the VLA size calculation (although GCC is smart
> enough to reuse a value of 0 already available, e.g.:
>
>   38:   7c03e83b    rdhwr   v1,$29
>   3c:   8c638b70    lw      v1,-29840(v1)
>   40:   14600018    bnez    v1,a4 <__libc_pwrite+0xa4>
>   44:   000787c3    sra     s0,a3,0x1f
>   48:   000318c0    sll     v1,v1,0x3
>   4c:   03a08825    move    s1,sp
>   50:   03a3e823    subu    sp,sp,v1
>
> and save an instruction) and the use of an extra register to preserve the
> value of $sp across the block containing the VLA (as also seen with $s1 in
> the disassembly above) even though it could use $fp that holds the same
> value instead (e.g. continuing from the above:
>
>   74:   0220e825    move    sp,s1
>   78:   03c0e825    move    sp,s8
>
> ). It would be good to know how this compares to Adhemerval's proposal.

My point is that I think we should aim for compiler optimization safeness
(to avoid code breakage over compiler-defined default flags), and given
the current approach of *avoiding* VLAs in glibc I do not think it is a
good approach to use one as a bridge to force GCC to generate the expected
code.

I still think trying to optimize for 5/6/7-argument syscalls is
over-engineering in this *specific* case. As I put in my last message,
5/6/7-argument syscalls are used for

pread, pwrite, lseek, llseek, ppoll, posix_fadvise, posix_fallocate,
sync_file_range, fallocate, preadv, pwritev, preadv2, pwritev2, select,
pselect, mmap, readahead, epoll_pwait, splice, recvfrom, sendto, recvmmsg,
msgsnd, msgrcv, msgget, msgctl, semop, semget, semctl, semtimedop, shmat,
shmdt, shmget, and shmctl,

which are the ones generated from the C implementation (some are still
auto-generated). The majority of them are blocking syscalls, so the
context switch plus the work required for syscall completion itself will
take proportionally almost all of the required time. So trying to squeeze
out some cycles doesn't really pay off compared to code maintainability
(all this discussion of which C construct would be safe enough to generate
the correct stack spill, plus the current issue, should indicate that we
should aim for correctness first).

>
>   Maciej
>
> 	* sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
> 	(FORCE_FRAME_POINTER): Remove macro.
> 	(internal_syscall5): Use a variable-length array to force the
> 	use of a frame pointer.
> 	(internal_syscall6): Likewise.
> 	(internal_syscall7): Likewise.
> ---
>  sysdeps/unix/sysv/linux/mips/mips32/sysdep.h | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
>
> glibc-mips-o32-syscall-stack.diff
> Index: glibc/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
> ===================================================================
> --- glibc.orig/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h 2017-04-11 21:27:25.000000000 +0100
> +++ glibc/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h 2017-08-16 20:49:15.758029215 +0100
> @@ -264,18 +264,20 @@
>
>  /* We need to use a frame pointer for the functions in which we
>     adjust $sp around the syscall, or debug information and unwind
> -   information will be $sp relative and thus wrong during the syscall. As
> -   of GCC 4.7, this is sufficient. */
> -#define FORCE_FRAME_POINTER \
> -  void *volatile __fp_force __attribute__ ((unused)) = alloca (4)
> +   information will be $sp relative and thus wrong during the syscall.
> +   We use a variable-length array to persuade GCC to use $fp. */
>
>  #define internal_syscall5(v0_init, input, number, err, \
>                            arg1, arg2, arg3, arg4, arg5) \
>  ({ \
>    long _sys_result; \
>    \
> -  FORCE_FRAME_POINTER; \
> +  size_t s = 0; \
> +  asm ("" : "+r" (s)); \
>    { \
> +  char vla[s << 3]; \
> +  asm ("" : : "p" (vla)); \
> +  \
>    register long __s0 asm ("$16") __attribute__ ((unused)) \
>      = (number); \
>    register long __v0 asm ("$2"); \
> @@ -306,8 +308,12 @@
>  ({ \
>    long _sys_result; \
>    \
> -  FORCE_FRAME_POINTER; \
> +  size_t s = 0; \
> +  asm ("" : "+r" (s)); \
>    { \
> +  char vla[s << 3]; \
> +  asm ("" : : "p" (vla)); \
> +  \
>    register long __s0 asm ("$16") __attribute__ ((unused)) \
>      = (number); \
>    register long __v0 asm ("$2"); \
> @@ -339,8 +345,12 @@
>  ({ \
>    long _sys_result; \
>    \
> -  FORCE_FRAME_POINTER; \
> +  size_t s = 0; \
> +  asm ("" : "+r" (s)); \
>    { \
> +  char vla[s << 3]; \
> +  asm ("" : : "p" (vla)); \
> +  \
>    register long __s0 asm ("$16") __attribute__ ((unused)) \
>      = (number); \
>    register long __v0 asm ("$2"); \
>
On Thu, 17 Aug 2017, Adhemerval Zanella wrote:

> My point is that I think we should aim for compiler optimization safeness
> (to avoid code breakage over compiler-defined default flags), and given
> the current approach of *avoiding* VLAs in glibc I do not think it is a
> good approach to use one as a bridge to force GCC to generate the expected
> code.

I think the point that -Werror=alloca -Werror=vla would be desirable for
building glibc is a good one about why to avoid using the VLA approach.
If you don't have any variable-size stack allocations, you don't need to
worry about problems with unbounded stack allocations, which are always
bad, even given reliable stack checking, because of the inability to
report errors from them.
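As a generic illustration of the point about error reporting, and not something taken from glibc, the sketch below shows one common way to avoid a variable-size stack allocation: a fixed-size stack buffer for small inputs and a `malloc` fallback whose failure can be reported to the caller. The function name `sum_bytes_via_scratch` and the bound `SMALL_BUF` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

enum { SMALL_BUF = 64 };  /* Hypothetical bound, chosen for illustration.  */

/* Copy N bytes from SRC into a scratch buffer and add them up.
   Returns 0 on success and -1 if the scratch buffer could not be
   allocated; a VLA or alloca could not report that condition.  */
int
sum_bytes_via_scratch (const unsigned char *src, size_t n,
                       unsigned long *sum)
{
  unsigned char small[SMALL_BUF];
  unsigned char *scratch = small;

  if (n > sizeof small)
    {
      /* Too large for the fixed stack buffer: use the heap.  */
      scratch = malloc (n);
      if (scratch == NULL)
        return -1;
    }

  memcpy (scratch, src, n);

  unsigned long total = 0;
  for (size_t i = 0; i < n; i++)
    total += scratch[i];

  if (scratch != small)
    free (scratch);

  *sum = total;
  return 0;
}
```

Code written this way builds cleanly with `-Werror=alloca -Werror=vla`, because the stack usage is bounded at compile time.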
On 2017-08-17 17:17, Maciej W. Rozycki wrote:
> The drawback is it adds a bit to code generated, e.g. `__libc_pwrite'
> (from nptl/pwrite.o and nptl/pwrite.os) grows by 4 and 6 instructions
> for non-PIC and PIC code respectively, and the whole libraries:
>
>    text    data     bss     dec     hex filename
> 1483315   21129   11560 1516004  1721e4 libc.so
>  105482     960    8448  114890   1c0ca nptl/libpthread.so
>
> vs:
>
>    text    data     bss     dec     hex filename
> 1484295   21133   11560 1516988  1725bc libc.so
>  105974     960    8448  115382   1c2b6 nptl/libpthread.so
>
> due to the insertion of the VLA size calculation (although GCC is smart
> enough to reuse a value of 0 already available, e.g.:
>
>   38:   7c03e83b    rdhwr   v1,$29
>   3c:   8c638b70    lw      v1,-29840(v1)
>   40:   14600018    bnez    v1,a4 <__libc_pwrite+0xa4>
>   44:   000787c3    sra     s0,a3,0x1f
>   48:   000318c0    sll     v1,v1,0x3
>   4c:   03a08825    move    s1,sp
>   50:   03a3e823    subu    sp,sp,v1
>
> and save an instruction) and the use of an extra register to preserve the
> value of $sp across the block containing the VLA (as also seen with $s1 in
> the disassembly above) even though it could use $fp that holds the same
> value instead (e.g. continuing from the above:
>
>   74:   0220e825    move    sp,s1
>   78:   03c0e825    move    sp,s8
>
> ). It would be good to know how this compares to Adhemerval's proposal.

I have been trying to improve Adhemerval's patches a bit by returning the
error value in v1, in addition to the return code in v0. Here are the
corresponding numbers:

w/o patch:

   text    data     bss     dec     hex filename
1489767   21085   11560 1522412  173aec libc.so
 107908     956    8448  117312   1ca40 nptl/libpthread.so

with patch:

   text    data     bss     dec     hex filename
1488135   21089   11560 1520784  173490 libc.so
 107244     960    8448  116652   1c7ac nptl/libpthread.so

When looking at a given function like `__libc_pwrite', it gets reduced by
13 instructions in both the PIC and non-PIC cases. However, we need to add
the 16 instructions of __libc_do_syscall.

Aurelien
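To make the two sets of `size` figures easier to compare, the text-section differences can be computed directly. The snippet below is only a convenience sketch: the numbers are copied from the tables in this thread, while the struct and function names are hypothetical. Note that the two measurements start from different baselines, so only the deltas are comparable, and the instruction counts assume 4-byte regular MIPS instructions with no padding effects.

```c
#include <stdio.h>

/* One row of `size' output, text column only, before and after a patch.  */
struct size_row
{
  const char *file;
  long text_before;
  long text_after;
};

/* Figures quoted for the VLA-based patch.  */
static const struct size_row vla_patch[] = {
  { "libc.so", 1483315, 1484295 },
  { "nptl/libpthread.so", 105482, 105974 },
};

/* Figures quoted for the __libc_do_syscall stub patch.  */
static const struct size_row stub_patch[] = {
  { "libc.so", 1489767, 1488135 },
  { "nptl/libpthread.so", 107908, 107244 },
};

/* Change in text size in bytes; positive means the code grew.  */
long
text_delta (const struct size_row *row)
{
  return row->text_after - row->text_before;
}

/* Print the delta for each row, in bytes and in 4-byte instructions.  */
void
print_deltas (const char *label, const struct size_row *rows, int n)
{
  printf ("%s\n", label);
  for (int i = 0; i < n; i++)
    printf ("  %-20s %+6ld bytes (%+ld instructions)\n",
            rows[i].file, text_delta (&rows[i]),
            text_delta (&rows[i]) / 4);
}
```

With these inputs the VLA patch grows the text of libc.so by 980 bytes and libpthread.so by 492 bytes, while the stub patch shrinks them by 1632 and 664 bytes respectively.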
On Thu, 17 Aug 2017, Adhemerval Zanella wrote:

> My point is that I think we should aim for compiler optimization safeness
> (to avoid code breakage over compiler-defined default flags), and given
> the current approach of *avoiding* VLAs in glibc I do not think it is a
> good approach to use one as a bridge to force GCC to generate the expected
> code.

You certainly have a point here overall, although I don't think a VLA
whose size is always 0 really hurts. And we've used the approach with
`alloca' since forever with no adverse effects until we added a place
where the caller invokes the syscall wrapper in a loop. So I wouldn't
necessarily call it an issue. Mind that this is target-specific code, so
we can rely on a target-specific execution model rather than limiting
ourselves to what generic ISO C guarantees.

Aurelien's figures indicating a clear size reduction certainly count as a
pro though.

> I still think trying to optimize for 5/6/7-argument syscalls is
> over-engineering in this *specific* case. As I put in my last message,
> 5/6/7-argument syscalls are used for
>
> pread, pwrite, lseek, llseek, ppoll, posix_fadvise, posix_fallocate,
> sync_file_range, fallocate, preadv, pwritev, preadv2, pwritev2, select,
> pselect, mmap, readahead, epoll_pwait, splice, recvfrom, sendto, recvmmsg,
> msgsnd, msgrcv, msgget, msgctl, semop, semget, semctl, semtimedop, shmat,
> shmdt, shmget, and shmctl,
>
> which are the ones generated from the C implementation (some are still
> auto-generated). The majority of them are blocking syscalls, so the
> context switch plus the work required for syscall completion itself will
> take proportionally almost all of the required time. So trying to squeeze
> out some cycles doesn't really pay off compared to code maintainability
> (all this discussion of which C construct would be safe enough to generate
> the correct stack spill, plus the current issue, should indicate that we
> should aim for correctness first).

TBH, I find it questionable whether it's really the approach I proposed
that requires more engineering (and long-term maintenance) effort rather
than using a separate handwritten assembly-language call stub. Especially
if a non-standard calling convention is used.

If everyone but me thinks there's a clear advantage in using such a
handcoded stub though, then as I previously noted please adjust the
affected MIPS16 stubs to avoid the extra indirection, i.e. you can call
`__libc_do_syscall' directly from MIPS16 code as you'd do from regular
MIPS or microMIPS code, as the lone reason for the existence of the MIPS16
stubs is the inexistence of a MIPS16 SYSCALL instruction.

Once you're done with that I can push your proposed change through MIPS16
regression testing if that helped. I can see if I can run microMIPS
testing as well, although I'd have to double-check for an available board
as I don't use one regularly.

  Maciej
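The remark about the wrapper being invoked in a loop relates to a general property of `alloca`: memory it returns is released only when the enclosing function returns, so a macro that expands to `alloca (4)` allocates a fresh block on every iteration of a caller's loop. The sketch below illustrates that property only; it is not the glibc code, and the function name `count_distinct_alloca_blocks` is hypothetical. It assumes a system that provides `<alloca.h>`, as glibc does.

```c
#include <alloca.h>
#include <stddef.h>

enum { ITERATIONS = 4 };

/* Call alloca once per loop iteration and count how many distinct
   blocks were handed out.  All of them stay allocated until this
   function returns, so none of them can overlap.  */
int
count_distinct_alloca_blocks (void)
{
  void *blocks[ITERATIONS];
  int distinct = 0;

  for (int i = 0; i < ITERATIONS; i++)
    blocks[i] = alloca (4);

  for (int i = 0; i < ITERATIONS; i++)
    {
      int seen = 0;

      /* Compare against every block allocated earlier.  */
      for (int j = 0; j < i; j++)
        if (blocks[j] == blocks[i])
          seen = 1;

      if (!seen)
        distinct++;
    }

  return distinct;
}
```

A VLA declared inside the loop body, by contrast, goes out of scope at the end of each iteration, which is why the block-scoped array in the proposed patch does not accumulate stack across iterations.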
On 17/08/2017 17:34, Maciej W. Rozycki wrote:
> On Thu, 17 Aug 2017, Adhemerval Zanella wrote:
>
>> My point is that I think we should aim for compiler optimization safeness
>> (to avoid code breakage over compiler-defined default flags), and given
>> the current approach of *avoiding* VLAs in glibc I do not think it is a
>> good approach to use one as a bridge to force GCC to generate the expected
>> code.
>
> You certainly have a point here overall, although I don't think a VLA
> whose size is always 0 really hurts. And we've used the approach with
> `alloca' since forever with no adverse effects until we added a place
> where the caller invokes the syscall wrapper in a loop. So I wouldn't
> necessarily call it an issue. Mind that this is target-specific code, so
> we can rely on a target-specific execution model rather than limiting
> ourselves to what generic ISO C guarantees.
>
> Aurelien's figures indicating a clear size reduction certainly count as a
> pro though.

Joseph pointed out another advantage of avoiding VLAs (building with
-Werror=alloca -Werror=vla). My main problem here is that we are betting
that the compiler won't mess with our assumptions and will generate the
desired code, without trying to adhere to what it is supposed to provide.
Targeting generic ISO C gives us a better guarantee, and any deviation
then indicates a possible compiler issue, not the other way around (as in
this case). My other point is that we can optimize later if required, and
IMHO this is hardly the case here (at least for latency).

If I understood correctly, Aurelien's suggestion of returning err in v1
is not strictly ABI-conformant, so it will end up calling
__libc_do_syscall with a non-conformant ABI convention (similar to the
pipe implementation, which requires an assembly-specific implementation
for a lot of architectures to get this right). Again, this is something I
would really like to avoid.

>
>> I still think trying to optimize for 5/6/7-argument syscalls is
>> over-engineering in this *specific* case. As I put in my last message,
>> 5/6/7-argument syscalls are used for
>>
>> pread, pwrite, lseek, llseek, ppoll, posix_fadvise, posix_fallocate,
>> sync_file_range, fallocate, preadv, pwritev, preadv2, pwritev2, select,
>> pselect, mmap, readahead, epoll_pwait, splice, recvfrom, sendto, recvmmsg,
>> msgsnd, msgrcv, msgget, msgctl, semop, semget, semctl, semtimedop, shmat,
>> shmdt, shmget, and shmctl,
>>
>> which are the ones generated from the C implementation (some are still
>> auto-generated). The majority of them are blocking syscalls, so the
>> context switch plus the work required for syscall completion itself will
>> take proportionally almost all of the required time. So trying to squeeze
>> out some cycles doesn't really pay off compared to code maintainability
>> (all this discussion of which C construct would be safe enough to generate
>> the correct stack spill, plus the current issue, should indicate that we
>> should aim for correctness first).
>
> TBH, I find it questionable whether it's really the approach I proposed
> that requires more engineering (and long-term maintenance) effort rather
> than using a separate handwritten assembly-language call stub. Especially
> if a non-standard calling convention is used.

IMHO I find the VLA suggestion more fragile in the long term.

>
> If everyone but me thinks there's a clear advantage in using such a
> handcoded stub though, then as I previously noted please adjust the
> affected MIPS16 stubs to avoid the extra indirection, i.e. you can call
> `__libc_do_syscall' directly from MIPS16 code as you'd do from regular
> MIPS or microMIPS code, as the lone reason for the existence of the MIPS16
> stubs is the inexistence of a MIPS16 SYSCALL instruction.

OK, I will try to at least check it on QEMU. If you have any pointers on
how to correctly build a MIPS16 glibc, that would be helpful.

>
> Once you're done with that I can push your proposed change through MIPS16
> regression testing if that helped. I can see if I can run microMIPS
> testing as well, although I'd have to double-check for an available board
> as I don't use one regularly.
>
>   Maciej
>
On 2017-08-17 18:09, Adhemerval Zanella wrote:
>
>
> On 17/08/2017 17:34, Maciej W. Rozycki wrote:
> > On Thu, 17 Aug 2017, Adhemerval Zanella wrote:
> >
> >> My point is that I think we should aim for compiler optimization safeness
> >> (to avoid code breakage over compiler-defined default flags), and given
> >> the current approach of *avoiding* VLAs in glibc I do not think it is a
> >> good approach to use one as a bridge to force GCC to generate the expected
> >> code.
> >
> > You certainly have a point here overall, although I don't think a VLA
> > whose size is always 0 really hurts. And we've used the approach with
> > `alloca' since forever with no adverse effects until we added a place
> > where the caller invokes the syscall wrapper in a loop. So I wouldn't
> > necessarily call it an issue. Mind that this is target-specific code, so
> > we can rely on a target-specific execution model rather than limiting
> > ourselves to what generic ISO C guarantees.
> >
> > Aurelien's figures indicating a clear size reduction certainly count as a
> > pro though.
>
> Joseph pointed out another advantage of avoiding VLAs (building with
> -Werror=alloca -Werror=vla). My main problem here is that we are betting
> that the compiler won't mess with our assumptions and will generate the
> desired code, without trying to adhere to what it is supposed to provide.
> Targeting generic ISO C gives us a better guarantee, and any deviation
> then indicates a possible compiler issue, not the other way around (as in
> this case). My other point is that we can optimize later if required, and
> IMHO this is hardly the case here (at least for latency).
>
> If I understood correctly, Aurelien's suggestion of returning err in v1
> is not strictly ABI-conformant, so it will end up calling
> __libc_do_syscall with a non-conformant ABI convention (similar to the
> pipe implementation, which requires an assembly-specific implementation
> for a lot of architectures to get this right). Again, this is something I
> would really like to avoid.
>

In the ABI, v1 is used together with v0 to return 64-bit values. In my
patch, __libc_do_syscall is declared as returning a long long. The value
is then split using a union, in a similar way to what is already done for
the MIPS16 code.
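The union-based split described here can be sketched in portable C. This is an illustration under stated assumptions, not the code from the patch: the type `syscall_pair`, the stand-in function `fake_do_syscall` (which takes the place of the assembly stub) and `call_and_split` are all hypothetical names, and `long long` is assumed to be 64 bits wide.

```c
#include <stdint.h>

/* Hypothetical layout: two 32-bit register-sized halves overlaid on
   one 64-bit return value.  */
union syscall_pair
{
  long long val;
  struct
  {
    int32_t reg0;   /* Stands for the result half.  */
    int32_t reg1;   /* Stands for the error-flag half.  */
  } reg;
};

/* Stand-in for an assembly stub: packs a result and an error flag
   into a single 64-bit return value.  */
long long
fake_do_syscall (int32_t result, int32_t err)
{
  union syscall_pair ret;

  ret.reg.reg0 = result;
  ret.reg.reg1 = err;
  return ret.val;
}

/* Caller side: receive the 64-bit value and split it back into the
   two halves through the same union.  */
int32_t
call_and_split (int32_t result_in, int32_t err_in, int32_t *err_out)
{
  union syscall_pair ret;

  ret.val = fake_do_syscall (result_in, err_in);
  *err_out = ret.reg.reg1;
  return ret.reg.reg0;
}
```

Because both sides go through the same union, the example does not depend on byte order; on a real target the mapping of the two halves to registers is fixed by the ABI rather than by this code.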
On Thu, 17 Aug 2017, Adhemerval Zanella wrote:

> If I understood correctly, Aurelien's suggestion of returning err in v1
> is not strictly ABI-conformant, so it will end up calling
> __libc_do_syscall with a non-conformant ABI convention (similar to the
> pipe implementation, which requires an assembly-specific implementation
> for a lot of architectures to get this right). Again, this is something I
> would really like to avoid.

Using $v1 is fine, in ABI terms it's just a part of a `long long' result,
and you can access it in plain C in the caller (shifting and masking
individual 32-bit halves if necessary). I've done it myself in the past
in some bare-metal library code.

> > If everyone but me thinks there's a clear advantage in using such a
> > handcoded stub though, then as I previously noted please adjust the
> > affected MIPS16 stubs to avoid the extra indirection, i.e. you can call
> > `__libc_do_syscall' directly from MIPS16 code as you'd do from regular
> > MIPS or microMIPS code, as the lone reason for the existence of the MIPS16
> > stubs is the inexistence of a MIPS16 SYSCALL instruction.
>
> OK, I will try to at least check it on QEMU. If you have any pointers on
> how to correctly build a MIPS16 glibc, that would be helpful.

Just pass `-mips16' along with CFLAGS. You may have to make sure your GCC
configuration includes/supports a suitable MIPS16 multilib though (i.e.
MIPS16 libgcc.a and CRT files of your chosen endianness; check with
`-print-multi-lib' for entries with `@mips16'), to avoid interlinking
scenarios that may not be supported. I don't remember offhand what the
defaults for the individual GCC configurations are, although I'm fairly
sure at least one of `mips-mti-linux-gnu' and `mips-img-linux-gnu'
configurations does have MIPS16 multilibs.

Let me know if you have troubles with that.

  Maciej
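The "shifting and masking" alternative mentioned above can be written in plain C without a union. The helpers below are a hypothetical sketch (the names `low_half`, `high_half` and `join_halves` do not come from glibc), and they deliberately say nothing about which half a given target places in $v0 or $v1, since that depends on the ABI and endianness.

```c
#include <stdint.h>

/* Extract the least significant 32 bits of a 64-bit value.  */
uint32_t
low_half (uint64_t v)
{
  return (uint32_t) (v & 0xffffffffu);
}

/* Extract the most significant 32 bits of a 64-bit value.  */
uint32_t
high_half (uint64_t v)
{
  return (uint32_t) (v >> 32);
}

/* Combine two 32-bit halves into one 64-bit value, as a callee
   returning two results in one `long long' effectively does.  */
uint64_t
join_halves (uint32_t high, uint32_t low)
{
  return ((uint64_t) high << 32) | low;
}
```

Either this form or the union form gives the caller access to both halves from C, which is the basis for the argument that returning the error flag in the second register stays within the normal calling convention.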
Index: glibc/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
===================================================================
--- glibc.orig/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h 2017-04-11 21:27:25.000000000 +0100
+++ glibc/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h 2017-08-16 20:49:15.758029215 +0100
@@ -264,18 +264,20 @@

 /* We need to use a frame pointer for the functions in which we
    adjust $sp around the syscall, or debug information and unwind
-   information will be $sp relative and thus wrong during the syscall. As
-   of GCC 4.7, this is sufficient. */
-#define FORCE_FRAME_POINTER \
-  void *volatile __fp_force __attribute__ ((unused)) = alloca (4)
+   information will be $sp relative and thus wrong during the syscall.
+   We use a variable-length array to persuade GCC to use $fp. */

 #define internal_syscall5(v0_init, input, number, err, \
                           arg1, arg2, arg3, arg4, arg5) \
 ({ \
   long _sys_result; \
   \
-  FORCE_FRAME_POINTER; \
+  size_t s = 0; \
+  asm ("" : "+r" (s)); \
   { \
+  char vla[s << 3]; \
+  asm ("" : : "p" (vla)); \
+  \
   register long __s0 asm ("$16") __attribute__ ((unused)) \
     = (number); \
   register long __v0 asm ("$2"); \
@@ -306,8 +308,12 @@
 ({ \
   long _sys_result; \
   \
-  FORCE_FRAME_POINTER; \
+  size_t s = 0; \
+  asm ("" : "+r" (s)); \
   { \
+  char vla[s << 3]; \
+  asm ("" : : "p" (vla)); \
+  \
   register long __s0 asm ("$16") __attribute__ ((unused)) \
     = (number); \
   register long __v0 asm ("$2"); \
@@ -339,8 +345,12 @@
 ({ \
   long _sys_result; \
   \
-  FORCE_FRAME_POINTER; \
+  size_t s = 0; \
+  asm ("" : "+r" (s)); \
   { \
+  char vla[s << 3]; \
+  asm ("" : : "p" (vla)); \
+  \
   register long __s0 asm ("$16") __attribute__ ((unused)) \
     = (number); \
   register long __v0 asm ("$2"); \