nptl: Fix extraneous testing run by tst-rseq-nptl in the test driver

Message ID e573ada7-7491-86d4-e36d-40032b18c9df@redhat.com
State Committed
Commit 9fb237a1c861f3b6a014f7add3f564452ea23e61
Delegated to: Florian Weimer
Headers
Series nptl: Fix extraneous testing run by tst-rseq-nptl in the test driver |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit success Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed

Commit Message

Maciej W. Rozycki July 11, 2024, 2:07 p.m. UTC
  Fix an issue with commit 8f4632deb354 ("Linux: rseq registration tests") 
and prevent testing from being run in the process of the test driver 
itself rather than just the test child where one has been forked.  The 
problem here is the unguarded use of a destructor to call a part of the 
testing.  The destructor function, 'do_rseq_destructor_test' is called 
implicitly at program completion, however because it is associated with 
the executable itself rather than an individual process, it is called 
both in the test child *and* in the test driver itself.

Prevent this from happening by providing a guard variable that only 
enables test invocation from 'do_rseq_destructor_test' in the process 
that has first run 'do_test'.  Consequently extra testing is invoked 
from 'do_rseq_destructor_test' only once and in the correct process, 
regardless of the use or the lack of of the '--direct' option.  Where 
called in the controlling test driver process that has neved called 
'do_test' the destructor function silently returns right away without 
taking any further actions, letting the test driver fail gracefully 
where applicable.

This arrangement prevents 'tst-rseq-nptl' from ever causing testing to 
hang forever and never complete, such as currently happening with the 
'mips-linux-gnu' (o32 ABI) target.
---
Hi,

 In the course of verifying changes for the BZ #27650 test case, posted
separately, I have come across a couple of tests hanging forever that
cause `mips-linux-gnu' (o32 ABI) testing to never complete unless the
offending tests are killed by hand.

 This patch addresses this issue for the tst-rseq-nptl test.  This was a 
bit tricky to debug as I saw no immediate cause as to the hang *despite* 
the test driver already being used by the test case.  Eventually running 
via `strace' revealed the actual problem:

wait4(16905, 0x7fffa930, 0, NULL)       = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) ---
SYS_4366()                              = 0
kill(-16905, SIGKILL)                   = 0
kill(16905, SIGKILL)                    = 0
wait4(16905, 0x7fffa5d0, WNOHANG|WSTOPPED, NULL) = 0
SYS_4265()                              = -1 ERRNO_516 ((null))
--- SIGCHLD (Child exited) ---
SYS_4253()                              = 0
wait4(16905, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|WSTOPPED, NULL) = 16905
write(1, "Timed out: killed the child proc"..., 35) = 35
write(1, "\n", 1)                       = 1
brk(0)                                  = 0x55591000
brk(0x555b2000)                         = 0x555b2000
SYS_4288()                              = 3
SYS_4366()                              = 0
SYS_4366()                              = 0
read(3, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7\0\0\0\7\0"..., 4096) = 1323
close(3)                                = 0
write(1, "Termination time: 2024-07-06T03:"..., 48) = 48
SYS_4367()                              = -1 EINVAL (Invalid argument)
rt_sigaction(SIGUSR1, {SIG_DFL}, NULL, 16) = 0
SYS_4222()                              = 16904
getpid()                                = 16904
SYS_4266()                              = 0
--- SIGUSR1 (User defined signal 1) ---
sigreturn()                             = 0

This was obtained with a slightly outdated `strace' binary, hence the 
missing decoding of some syscalls, however output produced is clear: the 
test child does get killed, the test driver reports what it is supposed to 
in this case, but then, hold on, what's up with that `rt_sigaction' call?  
OK, that pointed me at the right place, that is `do_rseq_destructor_test' 
having been marked with `__attribute__ ((destructor))'.

 Being a destructor the function is obviously implicitly called at program 
completion, and then by both the test child *and* the test driver process 
itself.  Hence the call to `do_rseq_test' that hangs also causes the test 
driver to hang forever, preventing testing from ever completing.

 OK to apply (either now or past the freeze period)?

  Maciej
---
 sysdeps/unix/sysv/linux/tst-rseq-nptl.c |    9 +++++++++
 1 file changed, 9 insertions(+)

glibc-tst-rseq-nptl-destructor.diff
  

Comments

Maciej W. Rozycki July 29, 2024, 3:21 p.m. UTC | #1
On Thu, 11 Jul 2024, Maciej W. Rozycki wrote:

> Fix an issue with commit 8f4632deb354 ("Linux: rseq registration tests") 
> and prevent testing from being run in the process of the test driver 
> itself rather than just the test child where one has been forked.  The 
> problem here is the unguarded use of a destructor to call a part of the 
> testing.  The destructor function, 'do_rseq_destructor_test' is called 
> implicitly at program completion, however because it is associated with 
> the executable itself rather than an individual process, it is called 
> both in the test child *and* in the test driver itself.

 Ping for:
<https://sourceware.org/pipermail/libc-alpha/2024-July/158277.html>,
<https://patchwork.sourceware.org/project/glibc/patch/e573ada7-7491-86d4-e36d-40032b18c9df@redhat.com/>.

  Maciej
  
Maciej W. Rozycki Aug. 12, 2024, 4:49 p.m. UTC | #2
On Thu, 11 Jul 2024, Maciej W. Rozycki wrote:

> Fix an issue with commit 8f4632deb354 ("Linux: rseq registration tests") 
> and prevent testing from being run in the process of the test driver 
> itself rather than just the test child where one has been forked.  The 
> problem here is the unguarded use of a destructor to call a part of the 
> testing.  The destructor function, 'do_rseq_destructor_test' is called 
> implicitly at program completion, however because it is associated with 
> the executable itself rather than an individual process, it is called 
> both in the test child *and* in the test driver itself.

 Ping for:
<https://sourceware.org/pipermail/libc-alpha/2024-July/158277.html>,
<https://patchwork.sourceware.org/project/glibc/patch/e573ada7-7491-86d4-e36d-40032b18c9df@redhat.com/>.

  Maciej
  
Adhemerval Zanella Netto Aug. 14, 2024, 12:13 p.m. UTC | #3
On 11/07/24 11:07, Maciej W. Rozycki wrote:
> Fix an issue with commit 8f4632deb354 ("Linux: rseq registration tests") 
> and prevent testing from being run in the process of the test driver 
> itself rather than just the test child where one has been forked.  The 
> problem here is the unguarded use of a destructor to call a part of the 
> testing.  The destructor function, 'do_rseq_destructor_test' is called 
> implicitly at program completion, however because it is associated with 
> the executable itself rather than an individual process, it is called 
> both in the test child *and* in the test driver itself.
> 
> Prevent this from happening by providing a guard variable that only 
> enables test invocation from 'do_rseq_destructor_test' in the process 
> that has first run 'do_test'.  Consequently extra testing is invoked 
> from 'do_rseq_destructor_test' only once and in the correct process, 
> regardless of the use or the lack of of the '--direct' option.  Where 
> called in the controlling test driver process that has neved called 
> 'do_test' the destructor function silently returns right away without 
> taking any further actions, letting the test driver fail gracefully 
> where applicable.
> 
> This arrangement prevents 'tst-rseq-nptl' from ever causing testing to 
> hang forever and never complete, such as currently happening with the 
> 'mips-linux-gnu' (o32 ABI) target.
> ---
> Hi,
> 
>  In the course of verifying changes for the BZ #27650 test case, posted
> separately, I have come across a couple of tests hanging forever that
> cause `mips-linux-gnu' (o32 ABI) testing to never complete unless the
> offending tests are killed by hand.
> 
>  This patch addresses this issue for the tst-rseq-nptl test.  This was a 
> bit tricky to debug as I saw no immediate cause as to the hang *despite* 
> the test driver already being used by the test case.  Eventually running 
> via `strace' revealed the actual problem:
> 
> wait4(16905, 0x7fffa930, 0, NULL)       = ? ERESTARTSYS (To be restarted)
> --- SIGALRM (Alarm clock) ---
> SYS_4366()                              = 0
> kill(-16905, SIGKILL)                   = 0
> kill(16905, SIGKILL)                    = 0
> wait4(16905, 0x7fffa5d0, WNOHANG|WSTOPPED, NULL) = 0
> SYS_4265()                              = -1 ERRNO_516 ((null))
> --- SIGCHLD (Child exited) ---
> SYS_4253()                              = 0
> wait4(16905, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|WSTOPPED, NULL) = 16905
> write(1, "Timed out: killed the child proc"..., 35) = 35
> write(1, "\n", 1)                       = 1
> brk(0)                                  = 0x55591000
> brk(0x555b2000)                         = 0x555b2000
> SYS_4288()                              = 3
> SYS_4366()                              = 0
> SYS_4366()                              = 0
> read(3, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7\0\0\0\7\0"..., 4096) = 1323
> close(3)                                = 0
> write(1, "Termination time: 2024-07-06T03:"..., 48) = 48
> SYS_4367()                              = -1 EINVAL (Invalid argument)
> rt_sigaction(SIGUSR1, {SIG_DFL}, NULL, 16) = 0
> SYS_4222()                              = 16904
> getpid()                                = 16904
> SYS_4266()                              = 0
> --- SIGUSR1 (User defined signal 1) ---
> sigreturn()                             = 0
> 
> This was obtained with a slightly outdated `strace' binary, hence the 
> missing decoding of some syscalls, however output produced is clear: the 
> test child does get killed, the test driver reports what it is supposed to 
> in this case, but then, hold on, what's up with that `rt_sigaction' call?  
> OK, that pointed me at the right place, that is `do_rseq_destructor_test' 
> having been marked with `__attribute__ ((destructor))'.
> 
>  Being a destructor the function is obviously implicitly called at program 
> completion, and then by both the test child *and* the test driver process 
> itself.  Hence the call to `do_rseq_test' that hangs also causes the test 
> driver to hang forever, preventing testing from ever completing.
> 
>  OK to apply (either now or past the freeze period)?
> 
>   Maciej


LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  sysdeps/unix/sysv/linux/tst-rseq-nptl.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> glibc-tst-rseq-nptl-destructor.diff
> Index: glibc/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
> ===================================================================
> --- glibc.orig/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
> +++ glibc/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
> @@ -28,6 +28,11 @@
>  #include <sys/rseq.h>
>  #include <unistd.h>
>  
> +/* Set this in 'do_test' only so as to invoke the destructor test in
> +   the test process only and not 'support_test_main' parent.  Otherwise
> +   the test harness may hang in the destructor if something goes wrong.  */
> +static int run_destructor_test;
> +
>  #ifdef RSEQ_SIG
>  # include <array_length.h>
>  # include <errno.h>
> @@ -236,6 +241,9 @@ do_rseq_test (void)
>  static void __attribute__ ((destructor))
>  do_rseq_destructor_test (void)
>  {
> +  if (!run_destructor_test)
> +    return;
> +
>    /* Cannot use deferred failure reporting after main returns.  */
>    if (do_rseq_test ())
>      FAIL_EXIT1 ("rseq not registered within destructor");
> @@ -254,6 +262,7 @@ do_rseq_test (void)
>  static int
>  do_test (void)
>  {
> +  run_destructor_test = 1;
>    return do_rseq_test ();
>  }
>  
>
  
Maciej W. Rozycki Aug. 16, 2024, 1:43 p.m. UTC | #4
On Wed, 14 Aug 2024, Adhemerval Zanella Netto wrote:

> LGTM, thanks.
> 
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

 Applied now, thank you for your review.

  Maciej
  

Patch

Index: glibc/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
===================================================================
--- glibc.orig/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
+++ glibc/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
@@ -28,6 +28,11 @@ 
 #include <sys/rseq.h>
 #include <unistd.h>
 
+/* Set this in 'do_test' only so as to invoke the destructor test in
+   the test process only and not 'support_test_main' parent.  Otherwise
+   the test harness may hang in the destructor if something goes wrong.  */
+static int run_destructor_test;
+
 #ifdef RSEQ_SIG
 # include <array_length.h>
 # include <errno.h>
@@ -236,6 +241,9 @@  do_rseq_test (void)
 static void __attribute__ ((destructor))
 do_rseq_destructor_test (void)
 {
+  if (!run_destructor_test)
+    return;
+
   /* Cannot use deferred failure reporting after main returns.  */
   if (do_rseq_test ())
     FAIL_EXIT1 ("rseq not registered within destructor");
@@ -254,6 +262,7 @@  do_rseq_test (void)
 static int
 do_test (void)
 {
+  run_destructor_test = 1;
   return do_rseq_test ();
 }