Linux: Inhibit tail calls in cancellable system calls

Message ID 874izmtu4w.fsf@oldenburg.str.redhat.com (mailing list archive)
State Changes Requested
Headers
Series Linux: Inhibit tail calls in cancellable system calls |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed
redhat-pt-bot/TryBot-32bit success Build for i686

Commit Message

Florian Weimer March 21, 2025, 1:01 p.m. UTC
  Without this change, the system call wrapper function is not visible
on the stack at the time of the system call, which causes problems
for interception tools such as valgrind.

Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
Race conditions in pthread cancellation [BZ#12683]").

Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
(We're still discussing if valgrind needs this, but if it does, here's a
patch.)

---
 sysdeps/unix/sysdep.h | 57 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 21 deletions(-)


base-commit: 3e2be87832781a29ed67f38f87c1ce3dd4c1b866
  

Comments

Carlos O'Donell March 24, 2025, 1:37 p.m. UTC | #1
On 3/21/25 9:01 AM, Florian Weimer wrote:
> Without this change, the system call wrapper function is not visible
> on the stack at the time of the system call, which causes problems
> for interception tools such as valgrind.
> 
> Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> Race conditions in pthread cancellation [BZ#12683]").
> 
> Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> (We're still discussing if valgrind needs this, but if it does, here's a
> patch.)

We discussed this on the weekly patch queue review.

Glad to see the Valgrind discussion is ongoing.

This has cross-architecture implications, and can add extra function calls.

I'm marking the patch in patchwork as changes requested since we should review
and discuss Adhemerval's suggestion.
  
Mark Wielaard March 28, 2025, 6:02 p.m. UTC | #2
Hi Florian,

On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote:
> Without this change, the system call wrapper function is not visible
> on the stack at the time of the system call, which causes problems
> for interception tools such as valgrind.
> 
> Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> Race conditions in pthread cancellation [BZ#12683]").
> 
> Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> (We're still discussing if valgrind needs this, but if it does, here's a
> patch.)

I implemented the valgrind part of skipping the syscall_cancel frames
here: https://bugs.kde.org/show_bug.cgi?id=502126#c2
And there is a valgrind package build for fedora rawhide:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393

For ppc64le, s390x and x86_64 that patch seems enough.

For i686 and aarch64 there does seem to be an issue with missing the
glibc calling function because of a tail call.

Also on i686 there is another extra frame on top __libc_do_syscall.

I haven't yet tried on a system with a glibc with this patch applied. 

Cheers,

Mark

> ---
>  sysdeps/unix/sysdep.h | 57 ++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 36 insertions(+), 21 deletions(-)
> 
> diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h
> index 2cc98725c3..8f1e4fc4f9 100644
> --- a/sysdeps/unix/sysdep.h
> +++ b/sysdeps/unix/sysdep.h
> @@ -166,30 +166,45 @@ long int __syscall_cancel (__syscall_arg_t arg1, __syscall_arg_t arg2,
>  			   __SYSCALL_CANCEL7_ARG_DEF
>  			   __syscall_arg_t nr) attribute_hidden;
>  
> +/* Inhibit tail call optimization, so that the stack frame of the
> +   system-call-implementing function is visible at the time of the
> +   system call.  */
> +#define __syscall_cancel_barrier(...)					\
> +  ({									\
> +     long int __syscall_cancel_result = __syscall_cancel (__VA_ARGS__); \
> +     asm volatile ("" ::: "memory");					\
> +     __syscall_cancel_result;						\
> +  })
> +
>  #define __SYSCALL_CANCEL0(name)						\
> -  __syscall_cancel (0, 0, 0, 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
> +  __syscall_cancel_barrier (0, 0, 0, 0, 0, 0,				\
> +			    __SYSCALL_CANCEL7_ARG __NR_##name)
>  #define __SYSCALL_CANCEL1(name, a1)					\
> -  __syscall_cancel (__SSC (a1), 0, 0, 0, 0, 0,				\
> -		    __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL2(name, a1, a2) \
> -  __syscall_cancel (__SSC (a1), __SSC (a2), 0, 0, 0, 0,			\
> -		    __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL3(name, a1, a2, a3) \
> -  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), 0, 0, 0,	\
> -		    __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \
> -  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3),			\
> -		    __SSC(a4), 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \
> -  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC(a4),	\
> -		    __SSC (a5), 0, __SYSCALL_CANCEL7_ARG __NR_##name)
> -#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \
> -  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4),	\
> -		    __SSC (a5), __SSC (a6), __SYSCALL_CANCEL7_ARG	\
> -		    __NR_##name)
> +  __syscall_cancel_barrier (__SSC (a1), 0, 0, 0, 0, 0,			\
> +			    __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL2(name, a1, a2)					\
> +  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), 0, 0, 0, 0,		\
> +			    __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL3(name, a1, a2, a3)				\
> +  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
> +			    0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4)				\
> +  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
> +			    __SSC(a4), 0, 0,				\
> +			    __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5)			\
> +  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
> +			    __SSC(a4), __SSC (a5), 0,			\
> +			    __SYSCALL_CANCEL7_ARG __NR_##name)
> +#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6)			\
> +  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
> +			    __SSC (a4), __SSC (a5), __SSC (a6),		\
> +			    __SYSCALL_CANCEL7_ARG			\
> +			    __NR_##name)
>  #define __SYSCALL_CANCEL7(name, a1, a2, a3, a4, a5, a6, a7)		\
> -  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4),	\
> -		    __SSC (a5), __SSC (a6), __SSC (a7), __NR_##name)
> +  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
> +			    __SSC (a4), __SSC (a5), __SSC (a6),		\
> +			    __SSC (a7), __NR_##name)
>  
>  #define __SYSCALL_CANCEL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n
>  #define __SYSCALL_CANCEL_NARGS(...) \
> 
> base-commit: 3e2be87832781a29ed67f38f87c1ce3dd4c1b866
>
  
Mark Wielaard March 31, 2025, 9:29 a.m. UTC | #3
Hi,

On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote:
> On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote:
> > Without this change, the system call wrapper function is not visible
> > on the stack at the time of the system call, which causes problems
> > for interception tools such as valgrind.
> > 
> > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> > Race conditions in pthread cancellation [BZ#12683]").
> > 
> > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> > (We're still discussing if valgrind needs this, but if it does, here's a
> > patch.)
> 
> I implemented the valgrind part of skipping the syscall_cancel frames
> here: https://bugs.kde.org/show_bug.cgi?id=502126#c2
> And there is a valgrind package build for fedora rawhide:
> https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393
> 
> For ppc64le, s390x and x86_64 that patch seems enough.
> 
> For i686 and aarch64 there does seem to be an issue with missing the
> glibc calling function because of a tail call.
> 
> Also on i686 there is another extra frame on top __libc_do_syscall.

I extended the patch to cover some extra sycall wrapper function
symbols on i386 and armhf and pushed it to valgrind trunk and
VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and
f42. This does seem to show that only on arm64 the tail calls
obscure observing the full call stack.

Cheers,

Mark
  
Mark Wielaard April 6, 2025, 1:23 p.m. UTC | #4
Hi,

On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote:
> On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote:
> > On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote:
> > > Without this change, the system call wrapper function is not visible
> > > on the stack at the time of the system call, which causes problems
> > > for interception tools such as valgrind.
> > > 
> > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> > > Race conditions in pthread cancellation [BZ#12683]").
> > > 
> > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> > > (We're still discussing if valgrind needs this, but if it does, here's a
> > > patch.)
> > 
> > I implemented the valgrind part of skipping the syscall_cancel frames
> > here: https://bugs.kde.org/show_bug.cgi?id=502126#c2
> > And there is a valgrind package build for fedora rawhide:
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393
> > 
> > For ppc64le, s390x and x86_64 that patch seems enough.
> > 
> > For i686 and aarch64 there does seem to be an issue with missing the
> > glibc calling function because of a tail call.
> > 
> > Also on i686 there is another extra frame on top __libc_do_syscall.
> 
> I extended the patch to cover some extra sycall wrapper function
> symbols on i386 and armhf and pushed it to valgrind trunk and
> VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and
> f42. This does seem to show that only on arm64 the tail calls
> obscure observing the full call stack.

This has now landed in fedora rawhide and f42. Test results look good,
except for some if the arm64 tests where the tail calls obscure
observing the full call stack. Please let me know if you need any more
input from us to get this fix in glibc.

Cheers,

Mark
  
Mark Wielaard April 14, 2025, 12:06 p.m. UTC | #5
Hi,

On Sun, Apr 06, 2025 at 03:23:54PM +0200, Mark Wielaard wrote:
> On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote:
> > On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote:
> > > On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote:
> > > > Without this change, the system call wrapper function is not visible
> > > > on the stack at the time of the system call, which causes problems
> > > > for interception tools such as valgrind.
> > > > 
> > > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix
> > > > Race conditions in pthread cancellation [BZ#12683]").
> > > > 
> > > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu.
> > > > (We're still discussing if valgrind needs this, but if it does, here's a
> > > > patch.)
> > > 
> > > I implemented the valgrind part of skipping the syscall_cancel frames
> > > here: https://bugs.kde.org/show_bug.cgi?id=502126#c2
> > > And there is a valgrind package build for fedora rawhide:
> > > https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393
> > > 
> > > For ppc64le, s390x and x86_64 that patch seems enough.
> > > 
> > > For i686 and aarch64 there does seem to be an issue with missing the
> > > glibc calling function because of a tail call.
> > > 
> > > Also on i686 there is another extra frame on top __libc_do_syscall.
> > 
> > I extended the patch to cover some extra sycall wrapper function
> > symbols on i386 and armhf and pushed it to valgrind trunk and
> > VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and
> > f42. This does seem to show that only on arm64 the tail calls
> > obscure observing the full call stack.
> 
> This has now landed in fedora rawhide and f42. Test results look good,
> except for some if the arm64 tests where the tail calls obscure
> observing the full call stack. Please let me know if you need any more
> input from us to get this fix in glibc.

Please let me know. Valgrind test results for syscall backtraces on
anything except arm64 look good. We are working on valgrind 3.25.0
now, to be released around April 24.

Thanks,

Mark
  

Patch

diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h
index 2cc98725c3..8f1e4fc4f9 100644
--- a/sysdeps/unix/sysdep.h
+++ b/sysdeps/unix/sysdep.h
@@ -166,30 +166,45 @@  long int __syscall_cancel (__syscall_arg_t arg1, __syscall_arg_t arg2,
 			   __SYSCALL_CANCEL7_ARG_DEF
 			   __syscall_arg_t nr) attribute_hidden;
 
+/* Inhibit tail call optimization, so that the stack frame of the
+   system-call-implementing function is visible at the time of the
+   system call.  */
+#define __syscall_cancel_barrier(...)					\
+  ({									\
+     long int __syscall_cancel_result = __syscall_cancel (__VA_ARGS__); \
+     asm volatile ("" ::: "memory");					\
+     __syscall_cancel_result;						\
+  })
+
 #define __SYSCALL_CANCEL0(name)						\
-  __syscall_cancel (0, 0, 0, 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
+  __syscall_cancel_barrier (0, 0, 0, 0, 0, 0,				\
+			    __SYSCALL_CANCEL7_ARG __NR_##name)
 #define __SYSCALL_CANCEL1(name, a1)					\
-  __syscall_cancel (__SSC (a1), 0, 0, 0, 0, 0,				\
-		    __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL2(name, a1, a2) \
-  __syscall_cancel (__SSC (a1), __SSC (a2), 0, 0, 0, 0,			\
-		    __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL3(name, a1, a2, a3) \
-  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), 0, 0, 0,	\
-		    __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \
-  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3),			\
-		    __SSC(a4), 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \
-  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC(a4),	\
-		    __SSC (a5), 0, __SYSCALL_CANCEL7_ARG __NR_##name)
-#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \
-  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4),	\
-		    __SSC (a5), __SSC (a6), __SYSCALL_CANCEL7_ARG	\
-		    __NR_##name)
+  __syscall_cancel_barrier (__SSC (a1), 0, 0, 0, 0, 0,			\
+			    __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL2(name, a1, a2)					\
+  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), 0, 0, 0, 0,		\
+			    __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL3(name, a1, a2, a3)				\
+  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
+			    0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4)				\
+  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
+			    __SSC(a4), 0, 0,				\
+			    __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5)			\
+  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
+			    __SSC(a4), __SSC (a5), 0,			\
+			    __SYSCALL_CANCEL7_ARG __NR_##name)
+#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6)			\
+  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
+			    __SSC (a4), __SSC (a5), __SSC (a6),		\
+			    __SYSCALL_CANCEL7_ARG			\
+			    __NR_##name)
 #define __SYSCALL_CANCEL7(name, a1, a2, a3, a4, a5, a6, a7)		\
-  __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4),	\
-		    __SSC (a5), __SSC (a6), __SSC (a7), __NR_##name)
+  __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3),		\
+			    __SSC (a4), __SSC (a5), __SSC (a6),		\
+			    __SSC (a7), __NR_##name)
 
 #define __SYSCALL_CANCEL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n
 #define __SYSCALL_CANCEL_NARGS(...) \