[03/15] sparc: Use Linux kABI for syscall return
Commit Message
On 11/02/2020 08:15, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> + if (__glibc_unlikely (__g1 != 0)) \
>
> This change is inconsistent with the other updates, which use __g1 ==
> -1. Is this deliberate?
>
> Thanks,
> Florian
>
In fact __SYSCALL_STRING already sets the 'o0' to a negative value if
the 'xcc' condition is set (indicating that the syscall has failed).
The 'g1' check is superfluous, it will be always true since 'g1' will
be either 0 or 1.
And both the set and check of 'g1' result is also superfluous, since 'o0'
will already hold all the required information.
Below is an updated patch, checked on sparc64-linux-gnu and
sparcv9-linux-gnu.
--
It changes the sparc internal_syscall* macros to return a negative
value instead the 'g1' register value on 'err' macro argument.
The __SYSCALL_STRING macro is also changed to no set the 'g1'
value, since 'o1' already holds all the required information
to check if syscall has failed.
The macro INTERNAL_SYSCALL_DECL is no longer required, and the
INTERNAL_SYSCALL_ERROR_P follows the other Linux kABIS. The
redefinition of INTERNAL_VSYSCALL_CALL is also no longer
required.
Checked on sparc64-linux-gnu and sparcv9-linux-gnu. It fixes
the sporadic issues on sparc32 where clock_nanosleep does not
act as cancellation entrypoint.
---
.../unix/sysv/linux/sparc/sparc32/sysdep.h | 3 +-
.../unix/sysv/linux/sparc/sparc64/sysdep.h | 3 +-
sysdeps/unix/sysv/linux/sparc/sysdep.h | 80 +++++++++----------
3 files changed, 38 insertions(+), 48 deletions(-)
Comments
* Adhemerval Zanella:
> On 11/02/2020 08:15, Florian Weimer wrote:
>> * Adhemerval Zanella:
>>
>>> + if (__glibc_unlikely (__g1 != 0)) \
>>
>> This change is inconsistent with the other updates, which use __g1 ==
>> -1. Is this deliberate?
>>
>> Thanks,
>> Florian
>>
>
> In fact __SYSCALL_STRING already sets the 'o0' to a negative value if
> the 'xcc' condition is set (indicating that the syscall has failed).
> The 'g1' check is superfluous, it will be always true since 'g1' will
> be either 0 or 1.
>
> And both the set and check of 'g1' result is also superfluous, since 'o0'
> will already hold all the required information.
>
> Below is an updated patch, checked on sparc64-linux-gnu and
> sparcv9-linux-gnu.
I see, nice additional cleanup.
> It changes the sparc internal_syscall* macros to return a negative
> value instead the 'g1' register value on 'err' macro argument.
^ of ^ in the?
> The __SYSCALL_STRING macro is also changed to no set the 'g1'
^ not
> value, since 'o1' already holds all the required information
> to check if syscall has failed.
>
> The macro INTERNAL_SYSCALL_DECL is no longer required, and the
> INTERNAL_SYSCALL_ERROR_P follows the other Linux kABIS. The
^ macro? ^ kABIs
(or drop the “the“ on the preceding line)
> redefinition of INTERNAL_VSYSCALL_CALL is also no longer
> required.
>
> Checked on sparc64-linux-gnu and sparcv9-linux-gnu. It fixes
> the sporadic issues on sparc32 where clock_nanosleep does not
> act as cancellation entrypoint.
I double-checked this against the kernel sources, and entry.S has
this:
ret_sys_call:
ld [%curptr + TI_FLAGS], %l6
cmp %o0, -ERESTART_RESTARTBLOCK
ld [%sp + STACKFRAME_SZ + PT_PSR], %g3
set PSR_C, %g2
bgeu 1f
But ERESTART_RESTARTBLOCK is not 4095, so glibc with this change will
now treat certain internal kernel error codes as errors, while they
were previously reported as success. This looks like a kernel bug, in
that ERESTART_RESTARTBLOCK was not updated when more error codes were
added. On the other hand, these error codes should never leak into
userspace.
To me, your patch looks good.
On 11/02/2020 16:24, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> On 11/02/2020 08:15, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> + if (__glibc_unlikely (__g1 != 0)) \
>>>
>>> This change is inconsistent with the other updates, which use __g1 ==
>>> -1. Is this deliberate?
>>>
>>> Thanks,
>>> Florian
>>>
>>
>> In fact __SYSCALL_STRING already sets the 'o0' to a negative value if
>> the 'xcc' condition is set (indicating that the syscall has failed).
>> The 'g1' check is superfluous, it will be always true since 'g1' will
>> be either 0 or 1.
>>
>> And both the set and check of 'g1' result is also superfluous, since 'o0'
>> will already hold all the required information.
>>
>> Below is an updated patch, checked on sparc64-linux-gnu and
>> sparcv9-linux-gnu.
>
> I see, nice additional cleanup.
>
>> It changes the sparc internal_syscall* macros to return a negative
>> value instead the 'g1' register value on 'err' macro argument.
> ^ of ^ in the?
>
>
Ack.
>> The __SYSCALL_STRING macro is also changed to no set the 'g1'
> ^ not
>> value, since 'o1' already holds all the required information
>> to check if syscall has failed.
>>
>> The macro INTERNAL_SYSCALL_DECL is no longer required, and the
>> INTERNAL_SYSCALL_ERROR_P follows the other Linux kABIS. The
> ^ macro? ^ kABIs
Ack.
> (or drop the “the“ on the preceding line)
>
>> redefinition of INTERNAL_VSYSCALL_CALL is also no longer
>> required.
>>
>> Checked on sparc64-linux-gnu and sparcv9-linux-gnu. It fixes
>> the sporadic issues on sparc32 where clock_nanosleep does not
>> act as cancellation entrypoint.
>
> I double-checked this against the kernel sources, and entry.S has
> this:
>
> ret_sys_call:
> ld [%curptr + TI_FLAGS], %l6
> cmp %o0, -ERESTART_RESTARTBLOCK
> ld [%sp + STACKFRAME_SZ + PT_PSR], %g3
> set PSR_C, %g2
> bgeu 1f
>
> But ERESTART_RESTARTBLOCK is not 4095, so glibc with this change will
> now treat certain internal kernel error codes as errors, while they
> were previously reported as success. This looks like a kernel bug, in
> that ERESTART_RESTARTBLOCK was not updated when more error codes were
> added. On the other hand, these error codes should never leak into
> userspace.
My understanding is such errors should not be visible by the application,
as indicated by include/linux/errno.h comment. And it seems to be the
case for sparc, at least on:
arch/sparc/kernel/signal_64.c
477 static void do_signal(struct pt_regs *regs, unsigned long orig_i0)
[...]
514 restart_syscall = 0;
515 if (pt_regs_is_syscall(regs) &&
516 (regs->tstate & (TSTATE_XCARRY | TSTATE_ICARRY))) {
517 restart_syscall = 1;
518 orig_i0 = regs->u_regs[UREG_G6];
519 }
520
521 if (has_handler) {
522 if (restart_syscall)
523 syscall_restart(orig_i0, regs, &ksig.ka.sa);
524 signal_setup_done(setup_rt_frame(&ksig, regs), &ksig, 0);
525 } else {
526 if (restart_syscall) {
527 switch (regs->u_regs[UREG_I0]) {
528 case ERESTARTNOHAND:
529 case ERESTARTSYS:
530 case ERESTARTNOINTR:
531 /* replay the system call when we are done */
532 regs->u_regs[UREG_I0] = orig_i0;
533 regs->tpc -= 4;
534 regs->tnpc -= 4;
535 pt_regs_clear_syscall(regs);
536 /* fall through */
537 case ERESTART_RESTARTBLOCK:
538 regs->u_regs[UREG_G1] = __NR_restart_syscall;
539 regs->tpc -= 4;
540 regs->tnpc -= 4;
541 pt_regs_clear_syscall(regs);
542 }
543 }
544
If signal has a handler, syscall_restart will either set EINTR or
previous 'o0' value. Otherwise if syscall should be restarted,
either it will be trying again (line 538) or previous error code
would be set.
>
> To me, your patch looks good.
>
* Adhemerval Zanella:
>> But ERESTART_RESTARTBLOCK is not 4095, so glibc with this change will
>> now treat certain internal kernel error codes as errors, while they
>> were previously reported as success. This looks like a kernel bug, in
>> that ERESTART_RESTARTBLOCK was not updated when more error codes were
>> added. On the other hand, these error codes should never leak into
>> userspace.
>
> My understanding is such errors should not be visible by the application,
> as indicated by include/linux/errno.h comment. And it seems to be the
> case for sparc, at least on:
These error codes tend to leak from device drivers and other less
scrutinized parts of the kernel. It's not actually about the
ERESTART_RESTARTBLOCK value as such, there are other values larger
than that:
#define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */
#define EPROBE_DEFER 517 /* Driver requests probe retry */
#define EOPENSTALE 518 /* open found a stale dentry */
#define ENOPARAM 519 /* Parameter not supported */
/* Defined for the NFSv3 protocol */
#define EBADHANDLE 521 /* Illegal NFS file handle */
#define ENOTSYNC 522 /* Update synchronization mismatch */
#define EBADCOOKIE 523 /* Cookie is stale */
#define ENOTSUPP 524 /* Operation is not supported */
And so on.
Like I said, it looks like someone forgot to update this code. It
probably should use the 4095 boundary and not specific error codes
anyway. (We had a similar problem in glibc itself in the s390
socketcall support.)
On 11/02/2020 18:15, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>>> But ERESTART_RESTARTBLOCK is not 4095, so glibc with this change will
>>> now treat certain internal kernel error codes as errors, while they
>>> were previously reported as success. This looks like a kernel bug, in
>>> that ERESTART_RESTARTBLOCK was not updated when more error codes were
>>> added. On the other hand, these error codes should never leak into
>>> userspace.
>>
>> My understanding is such errors should not be visible by the application,
>> as indicated by include/linux/errno.h comment. And it seems to be the
>> case for sparc, at least on:
>
> These error codes tend to leak from device drivers and other less
> scrutinized parts of the kernel. It's not actually about the
> ERESTART_RESTARTBLOCK value as such, there are other values larger
> than that:
>
> #define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */
> #define EPROBE_DEFER 517 /* Driver requests probe retry */
> #define EOPENSTALE 518 /* open found a stale dentry */
> #define ENOPARAM 519 /* Parameter not supported */
>
> /* Defined for the NFSv3 protocol */
> #define EBADHANDLE 521 /* Illegal NFS file handle */
> #define ENOTSYNC 522 /* Update synchronization mismatch */
> #define EBADCOOKIE 523 /* Cookie is stale */
> #define ENOTSUPP 524 /* Operation is not supported */
>
> And so on.
>
> Like I said, it looks like someone forgot to update this code. It
> probably should use the 4095 boundary and not specific error codes
> anyway. (We had a similar problem in glibc itself in the s390
> socketcall support.)
This code seems to come from since initial git repository
(Linux-2.6.12-rc2).
From a glibc standpoint, the error handling will be the same in fact,
since what indicates the syscall has failed is the carry condition code
value, not the syscall returned value ('o0' register).
* Adhemerval Zanella:
> This code seems to come from since initial git repository
> (Linux-2.6.12-rc2).
>
> From a glibc standpoint, the error handling will be the same in fact,
> since what indicates the syscall has failed is the carry condition code
> value, not the syscall returned value ('o0' register).
The kernel will not set the carry condition code for large errors due
to the faulty check. I think before your changes, we would not treat
these cases as errors because the carry condition is not set and we
check the separate err value. After your changes, the carry condition
code is still not set, but the return value looks like an error return
value if unchanged, so we now treat these leaked error codes as
errors.
But I don't think this should block your changes.
On 12/02/2020 09:38, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> This code seems to come from since initial git repository
>> (Linux-2.6.12-rc2).
>>
>> From a glibc standpoint, the error handling will be the same in fact,
>> since what indicates the syscall has failed is the carry condition code
>> value, not the syscall returned value ('o0' register).
>
> The kernel will not set the carry condition code for large errors due
> to the faulty check. I think before your changes, we would not treat
> these cases as errors because the carry condition is not set and we
> check the separate err value. After your changes, the carry condition
> code is still not set, but the return value looks like an error return
> value if unchanged, so we now treat these leaked error codes as
> errors.
No, the sparc 'ret_sys_call' returns abs(errno) for syscall failure,
not the value in the range of the expected Linux failure values.
So, if kernel returns a value large than ERESTART_RESTARTBLOCK *without*
set the carry condition code set current syscall code results in:
o0 = abs (errno)
g1 = 0
And then the variable defined by INTERNAL_SYSCALL_DECL holds '0' and
INTERNAL_SYSCALL_ERROR_P evaluates to false.
With this change:
o0 = abs (errno)
And INTERNAL_SYSCALL_ERROR_P, which now uses the expected Linux kABI,
will evaluate to 0 as well.
>
> But I don't think this should block your changes.
>
@@ -110,9 +110,8 @@ ENTRY(name); \
#define __SYSCALL_STRING \
"ta 0x10;" \
"bcc 1f;" \
- " mov 0, %%g1;" \
+ " nop;" \
"sub %%g0, %%o0, %%o0;" \
- "mov 1, %%g1;" \
"1:"
#define __SYSCALL_CLOBBERS \
@@ -109,9 +109,8 @@ ENTRY(name); \
#define __SYSCALL_STRING \
"ta 0x6d;" \
"bcc,pt %%xcc, 1f;" \
- " mov 0, %%g1;" \
+ " nop;" \
"sub %%g0, %%o0, %%o0;" \
- "mov 1, %%g1;" \
"1:"
#define __SYSCALL_CLOBBERS \
@@ -34,13 +34,6 @@
#else /* __ASSEMBLER__ */
-#define INTERNAL_VSYSCALL_CALL(funcptr, err, nr, args...) \
- ({ \
- long _ret = funcptr (args); \
- err = ((unsigned long) (_ret) >= (unsigned long) -4095L); \
- _ret; \
- })
-
# define VDSO_NAME "LINUX_2.6"
# define VDSO_HASH 61765110
@@ -65,108 +58,107 @@
})
#undef INTERNAL_SYSCALL_DECL
-#define INTERNAL_SYSCALL_DECL(err) \
- register long err __asm__("g1");
+#define INTERNAL_SYSCALL_DECL(err) do { } while (0)
#undef INTERNAL_SYSCALL
#define INTERNAL_SYSCALL(name, err, nr, args...) \
- inline_syscall##nr(__SYSCALL_STRING, err, __NR_##name, args)
+ internal_syscall##nr(__SYSCALL_STRING, err, __NR_##name, args)
#undef INTERNAL_SYSCALL_NCS
#define INTERNAL_SYSCALL_NCS(name, err, nr, args...) \
- inline_syscall##nr(__SYSCALL_STRING, err, name, args)
+ internal_syscall##nr(__SYSCALL_STRING, err, name, args)
#undef INTERNAL_SYSCALL_ERROR_P
#define INTERNAL_SYSCALL_ERROR_P(val, err) \
- ((void) (val), __builtin_expect((err) != 0, 0))
+ ((unsigned long) (val) > -4096UL)
#undef INTERNAL_SYSCALL_ERRNO
#define INTERNAL_SYSCALL_ERRNO(val, err) (-(val))
-#define inline_syscall0(string,err,name,dummy...) \
+#define internal_syscall0(string,err,name,dummy...) \
({ \
+ register long int __g1 __asm__ ("g1") = (name); \
register long __o0 __asm__ ("o0"); \
- err = name; \
- __asm __volatile (string : "=r" (err), "=r" (__o0) : \
- "0" (err) : \
+ __asm __volatile (string : "=r" (__o0) : \
+ "0" (__g1) : \
__SYSCALL_CLOBBERS); \
__o0; \
})
-#define inline_syscall1(string,err,name,arg1) \
+#define internal_syscall1(string,err,name,arg1) \
({ \
+ register long int __g1 __asm__("g1") = (name); \
register long __o0 __asm__ ("o0") = (long)(arg1); \
- err = name; \
- __asm __volatile (string : "=r" (err), "=r" (__o0) : \
- "0" (err), "1" (__o0) : \
+ __asm __volatile (string : "=r" (__o0) : \
+ "r" (__g1), "0" (__o0) : \
__SYSCALL_CLOBBERS); \
__o0; \
})
-#define inline_syscall2(string,err,name,arg1,arg2) \
+#define internal_syscall2(string,err,name,arg1,arg2) \
({ \
+ register long int __g1 __asm__("g1") = (name); \
register long __o0 __asm__ ("o0") = (long)(arg1); \
register long __o1 __asm__ ("o1") = (long)(arg2); \
- err = name; \
- __asm __volatile (string : "=r" (err), "=r" (__o0) : \
- "0" (err), "1" (__o0), "r" (__o1) : \
+ __asm __volatile (string : "=r" (__o0) : \
+ "r" (__g1), "0" (__o0), "r" (__o1) : \
__SYSCALL_CLOBBERS); \
__o0; \
})
-#define inline_syscall3(string,err,name,arg1,arg2,arg3) \
+#define internal_syscall3(string,err,name,arg1,arg2,arg3) \
({ \
+ register long int __g1 __asm__("g1") = (name); \
register long __o0 __asm__ ("o0") = (long)(arg1); \
register long __o1 __asm__ ("o1") = (long)(arg2); \
register long __o2 __asm__ ("o2") = (long)(arg3); \
- err = name; \
- __asm __volatile (string : "=r" (err), "=r" (__o0) : \
- "0" (err), "1" (__o0), "r" (__o1), \
+ __asm __volatile (string : "=r" (__o0) : \
+ "r" (__g1), "0" (__o0), "r" (__o1), \
"r" (__o2) : \
__SYSCALL_CLOBBERS); \
__o0; \
})
-#define inline_syscall4(string,err,name,arg1,arg2,arg3,arg4) \
+#define internal_syscall4(string,err,name,arg1,arg2,arg3,arg4) \
({ \
+ register long int __g1 __asm__("g1") = (name); \
register long __o0 __asm__ ("o0") = (long)(arg1); \
register long __o1 __asm__ ("o1") = (long)(arg2); \
register long __o2 __asm__ ("o2") = (long)(arg3); \
register long __o3 __asm__ ("o3") = (long)(arg4); \
- err = name; \
- __asm __volatile (string : "=r" (err), "=r" (__o0) : \
- "0" (err), "1" (__o0), "r" (__o1), \
+ __asm __volatile (string : "=r" (__o0) : \
+ "r" (__g1), "0" (__o0), "r" (__o1), \
"r" (__o2), "r" (__o3) : \
__SYSCALL_CLOBBERS); \
__o0; \
})
-#define inline_syscall5(string,err,name,arg1,arg2,arg3,arg4,arg5) \
+#define internal_syscall5(string,err,name,arg1,arg2,arg3,arg4,arg5) \
({ \
+ register long int __g1 __asm__("g1") = (name); \
register long __o0 __asm__ ("o0") = (long)(arg1); \
register long __o1 __asm__ ("o1") = (long)(arg2); \
register long __o2 __asm__ ("o2") = (long)(arg3); \
register long __o3 __asm__ ("o3") = (long)(arg4); \
register long __o4 __asm__ ("o4") = (long)(arg5); \
- err = name; \
- __asm __volatile (string : "=r" (err), "=r" (__o0) : \
- "0" (err), "1" (__o0), "r" (__o1), \
+ __asm __volatile (string : "=r" (__o0) : \
+ "r" (__g1), "0" (__o0), "r" (__o1), \
"r" (__o2), "r" (__o3), "r" (__o4) : \
__SYSCALL_CLOBBERS); \
__o0; \
})
-#define inline_syscall6(string,err,name,arg1,arg2,arg3,arg4,arg5,arg6) \
+#define internal_syscall6(string,err,name,arg1,arg2,arg3,arg4,arg5,arg6)\
({ \
+ register long int __g1 __asm__("g1") = (name); \
register long __o0 __asm__ ("o0") = (long)(arg1); \
register long __o1 __asm__ ("o1") = (long)(arg2); \
register long __o2 __asm__ ("o2") = (long)(arg3); \
register long __o3 __asm__ ("o3") = (long)(arg4); \
register long __o4 __asm__ ("o4") = (long)(arg5); \
register long __o5 __asm__ ("o5") = (long)(arg6); \
- err = name; \
- __asm __volatile (string : "=r" (err), "=r" (__o0) : \
- "0" (err), "1" (__o0), "r" (__o1), \
+ __asm __volatile (string : "=r" (__o0) : \
+ "r" (__g1), "0" (__o0), "r" (__o1), \
"r" (__o2), "r" (__o3), "r" (__o4), \
"r" (__o5) : \
__SYSCALL_CLOBBERS); \
@@ -182,13 +174,13 @@
register long __o4 __asm__ ("o4") = (long)(arg5); \
register long __g1 __asm__ ("g1") = __NR_clone; \
__asm __volatile (__SYSCALL_STRING : \
- "=r" (__g1), "=r" (__o0), "=r" (__o1) : \
- "0" (__g1), "1" (__o0), "2" (__o1), \
+ "=r" (__o0), "=r" (__o1) : \
+ "r" (__g1), "0" (__o0), "1" (__o1), \
"r" (__o2), "r" (__o3), "r" (__o4) : \
__SYSCALL_CLOBBERS); \
- if (INTERNAL_SYSCALL_ERROR_P (__o0, __g1)) \
+ if (__glibc_unlikely ((unsigned long int) (__o0) > -4096UL)) \
{ \
- __set_errno (INTERNAL_SYSCALL_ERRNO (__o0, __g1)); \
+ __set_errno (-__o0); \
__o0 = -1L; \
} \
else \