Linux: Inhibit tail calls in cancellable system calls
Checks
Context |
Check |
Description |
redhat-pt-bot/TryBot-apply_patch |
success
|
Patch applied to master at the time it was sent
|
linaro-tcwg-bot/tcwg_glibc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-arm |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 |
success
|
Test passed
|
redhat-pt-bot/TryBot-32bit |
success
|
Build for i686
|
Commit Message
Without this change, the system call wrapper function is not visible
on the stack at the time of the system call, which causes problems
for interception tools such as valgrind.
Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
Race conditions in pthread cancellation [BZ#12683]").
Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
(We're still discussing if valgrind needs this, but if it does, here's a
patch.)
---
sysdeps/unix/sysdep.h | 57 ++++++++++++++++++++++++++++++++-------------------
1 file changed, 36 insertions(+), 21 deletions(-)
base-commit: 3e2be87832781a29ed67f38f87c1ce3dd4c1b866
Comments
On 3/21/25 9:01 AM, Florian Weimer wrote:
> Without this change, the system call wrapper function is not visible
> on the stack at the time of the system call, which causes problems
> for interception tools such as valgrind.
>
> Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> Race conditions in pthread cancellation [BZ#12683]").
>
> Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> (We're still discussing if valgrind needs this, but if it does, here's a
> patch.)
We discussed this on the weekly patch queue review.
Glad to see the Valgrind discussion is ongoing.
This has cross-architecture implications, and can add extra function calls.
I'm marking the patch in patchwork as changes requested since we should review
and discuss Adhemerval's suggestion.
Hi Florian,
On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote:
> Without this change, the system call wrapper function is not visible
> on the stack at the time of the system call, which causes problems
> for interception tools such as valgrind.
>
> Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> Race conditions in pthread cancellation [BZ#12683]").
>
> Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> (We're still discussing if valgrind needs this, but if it does, here's a
> patch.)
I implemented the valgrind part of skipping the syscall_cancel frames
here: https://bugs.kde.org/show_bug.cgi?id=502126#c2
And there is a valgrind package build for fedora rawhide:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393
For ppc64le, s390x and x86_64 that patch seems enough.
For i686 and aarch64 there does seem to be an issue with missing the
glibc calling function because of a tail call.
Also on i686 there is another extra frame on top __libc_do_syscall.
I haven't yet tried on a system with a glibc with this patch applied.
Cheers,
Mark
> ---
> sysdeps/unix/sysdep.h | 57 ++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 36 insertions(+), 21 deletions(-)
>
> diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h
> index 2cc98725c3..8f1e4fc4f9 100644
> --- a/sysdeps/unix/sysdep.h
> +++ b/sysdeps/unix/sysdep.h
> @@ -166,30 +166,45 @@ long int __syscall_cancel (__syscall_arg_t arg1, __syscall_arg_t arg2,
> __SYSCALL_CANCEL7_ARG_DEF
> __syscall_arg_t nr) attribute_hidden;
>
> +/* Inhibit tail call optimization, so that the stack frame of the
> + system-call-implementing function is visible at the time of the
> + system call. */
> +#define __syscall_cancel_barrier(...) \
> + ({ \
> + long int __syscall_cancel_result = __syscall_cancel (__VA_ARGS__); \
> + asm volatile ("" ::: "memory"); \
> + __syscall_cancel_result; \
> + })
> +
> #define __SYSCALL_CANCEL0(name) \
> - __syscall_cancel (0, 0, 0, 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
> + __syscall_cancel_barrier (0, 0, 0, 0, 0, 0, \
> + __SYSCALL_CANCEL7_ARG __NR_##name)
> #define __SYSCALL_CANCEL1(name, a1) \
> - __syscall_cancel (__SSC (a1), 0, 0, 0, 0, 0, \
> - __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL2(name, a1, a2) \
> - __syscall_cancel (__SSC (a1), __SSC (a2), 0, 0, 0, 0, \
> - __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL3(name, a1, a2, a3) \
> - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), 0, 0, 0, \
> - __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \
> - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), \
> - __SSC(a4), 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \
> - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC(a4), \
> - __SSC (a5), 0, __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \
> - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4), \
> - __SSC (a5), __SSC (a6), __SYSCALL_CANCEL7_ARG \
> - __NR_##name)
> + __syscall_cancel_barrier (__SSC (a1), 0, 0, 0, 0, 0, \
> + __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL2(name, a1, a2) \
> + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), 0, 0, 0, 0, \
> + __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL3(name, a1, a2, a3) \
> + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
> + 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \
> + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
> + __SSC(a4), 0, 0, \
> + __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \
> + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
> + __SSC(a4), __SSC (a5), 0, \
> + __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \
> + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
> + __SSC (a4), __SSC (a5), __SSC (a6), \
> + __SYSCALL_CANCEL7_ARG \
> + __NR_##name)
> #define __SYSCALL_CANCEL7(name, a1, a2, a3, a4, a5, a6, a7) \
> - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4), \
> - __SSC (a5), __SSC (a6), __SSC (a7), __NR_##name)
> + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
> + __SSC (a4), __SSC (a5), __SSC (a6), \
> + __SSC (a7), __NR_##name)
>
> #define __SYSCALL_CANCEL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n
> #define __SYSCALL_CANCEL_NARGS(...) \
>
> base-commit: 3e2be87832781a29ed67f38f87c1ce3dd4c1b866
>
Hi,
On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote:
> On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote:
> > Without this change, the system call wrapper function is not visible
> > on the stack at the time of the system call, which causes problems
> > for interception tools such as valgrind.
> >
> > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> > Race conditions in pthread cancellation [BZ#12683]").
> >
> > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> > (We're still discussing if valgrind needs this, but if it does, here's a
> > patch.)
>
> I implemented the valgrind part of skipping the syscall_cancel frames
> here: https://bugs.kde.org/show_bug.cgi?id=502126#c2
> And there is a valgrind package build for fedora rawhide:
> https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393
>
> For ppc64le, s390x and x86_64 that patch seems enough.
>
> For i686 and aarch64 there does seem to be an issue with missing the
> glibc calling function because of a tail call.
>
> Also on i686 there is another extra frame on top __libc_do_syscall.
I extended the patch to cover some extra sycall wrapper function
symbols on i386 and armhf and pushed it to valgrind trunk and
VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and
f42. This does seem to show that only on arm64 the tail calls
obscure observing the full call stack.
Cheers,
Mark
Hi,
On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote:
> On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote:
> > On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote:
> > > Without this change, the system call wrapper function is not visible
> > > on the stack at the time of the system call, which causes problems
> > > for interception tools such as valgrind.
> > >
> > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> > > Race conditions in pthread cancellation [BZ#12683]").
> > >
> > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> > > (We're still discussing if valgrind needs this, but if it does, here's a
> > > patch.)
> >
> > I implemented the valgrind part of skipping the syscall_cancel frames
> > here: https://bugs.kde.org/show_bug.cgi?id=502126#c2
> > And there is a valgrind package build for fedora rawhide:
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393
> >
> > For ppc64le, s390x and x86_64 that patch seems enough.
> >
> > For i686 and aarch64 there does seem to be an issue with missing the
> > glibc calling function because of a tail call.
> >
> > Also on i686 there is another extra frame on top __libc_do_syscall.
>
> I extended the patch to cover some extra sycall wrapper function
> symbols on i386 and armhf and pushed it to valgrind trunk and
> VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and
> f42. This does seem to show that only on arm64 the tail calls
> obscure observing the full call stack.
This has now landed in fedora rawhide and f42. Test results look good,
except for some if the arm64 tests where the tail calls obscure
observing the full call stack. Please let me know if you need any more
input from us to get this fix in glibc.
Cheers,
Mark
Hi,
On Sun, Apr 06, 2025 at 03:23:54PM +0200, Mark Wielaard wrote:
> On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote:
> > On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote:
> > > On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote:
> > > > Without this change, the system call wrapper function is not visible
> > > > on the stack at the time of the system call, which causes problems
> > > > for interception tools such as valgrind.
> > > >
> > > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> > > > Race conditions in pthread cancellation [BZ#12683]").
> > > >
> > > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> > > > (We're still discussing if valgrind needs this, but if it does, here's a
> > > > patch.)
> > >
> > > I implemented the valgrind part of skipping the syscall_cancel frames
> > > here: https://bugs.kde.org/show_bug.cgi?id=502126#c2
> > > And there is a valgrind package build for fedora rawhide:
> > > https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393
> > >
> > > For ppc64le, s390x and x86_64 that patch seems enough.
> > >
> > > For i686 and aarch64 there does seem to be an issue with missing the
> > > glibc calling function because of a tail call.
> > >
> > > Also on i686 there is another extra frame on top __libc_do_syscall.
> >
> > I extended the patch to cover some extra sycall wrapper function
> > symbols on i386 and armhf and pushed it to valgrind trunk and
> > VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and
> > f42. This does seem to show that only on arm64 the tail calls
> > obscure observing the full call stack.
>
> This has now landed in fedora rawhide and f42. Test results look good,
> except for some if the arm64 tests where the tail calls obscure
> observing the full call stack. Please let me know if you need any more
> input from us to get this fix in glibc.
Please let me know. Valgrind test results for syscall backtraces on
anything except arm64 look good. We are working on valgrind 3.25.0
now, to be released around April 24.
Thanks,
Mark
@@ -166,30 +166,45 @@ long int __syscall_cancel (__syscall_arg_t arg1, __syscall_arg_t arg2,
__SYSCALL_CANCEL7_ARG_DEF
__syscall_arg_t nr) attribute_hidden;
+/* Inhibit tail call optimization, so that the stack frame of the
+ system-call-implementing function is visible at the time of the
+ system call. */
+#define __syscall_cancel_barrier(...) \
+ ({ \
+ long int __syscall_cancel_result = __syscall_cancel (__VA_ARGS__); \
+ asm volatile ("" ::: "memory"); \
+ __syscall_cancel_result; \
+ })
+
#define __SYSCALL_CANCEL0(name) \
- __syscall_cancel (0, 0, 0, 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
+ __syscall_cancel_barrier (0, 0, 0, 0, 0, 0, \
+ __SYSCALL_CANCEL7_ARG __NR_##name)
#define __SYSCALL_CANCEL1(name, a1) \
- __syscall_cancel (__SSC (a1), 0, 0, 0, 0, 0, \
- __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL2(name, a1, a2) \
- __syscall_cancel (__SSC (a1), __SSC (a2), 0, 0, 0, 0, \
- __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL3(name, a1, a2, a3) \
- __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), 0, 0, 0, \
- __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \
- __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), \
- __SSC(a4), 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \
- __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC(a4), \
- __SSC (a5), 0, __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \
- __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4), \
- __SSC (a5), __SSC (a6), __SYSCALL_CANCEL7_ARG \
- __NR_##name)
+ __syscall_cancel_barrier (__SSC (a1), 0, 0, 0, 0, 0, \
+ __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL2(name, a1, a2) \
+ __syscall_cancel_barrier (__SSC (a1), __SSC (a2), 0, 0, 0, 0, \
+ __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL3(name, a1, a2, a3) \
+ __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
+ 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \
+ __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
+ __SSC(a4), 0, 0, \
+ __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \
+ __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
+ __SSC(a4), __SSC (a5), 0, \
+ __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \
+ __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
+ __SSC (a4), __SSC (a5), __SSC (a6), \
+ __SYSCALL_CANCEL7_ARG \
+ __NR_##name)
#define __SYSCALL_CANCEL7(name, a1, a2, a3, a4, a5, a6, a7) \
- __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4), \
- __SSC (a5), __SSC (a6), __SSC (a7), __NR_##name)
+ __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \
+ __SSC (a4), __SSC (a5), __SSC (a6), \
+ __SSC (a7), __NR_##name)
#define __SYSCALL_CANCEL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n
#define __SYSCALL_CANCEL_NARGS(...) \