nptl: Fix extraneous testing run by tst-rseq-nptl in the test driver
Checks
Context |
Check |
Description |
redhat-pt-bot/TryBot-apply_patch |
success
|
Patch applied to master at the time it was sent
|
redhat-pt-bot/TryBot-32bit |
success
|
Build for i686
|
linaro-tcwg-bot/tcwg_glibc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-arm |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 |
success
|
Test passed
|
Commit Message
Fix an issue with commit 8f4632deb354 ("Linux: rseq registration tests")
and prevent testing from being run in the process of the test driver
itself rather than just the test child where one has been forked. The
problem here is the unguarded use of a destructor to call a part of the
testing. The destructor function, 'do_rseq_destructor_test' is called
implicitly at program completion, however because it is associated with
the executable itself rather than an individual process, it is called
both in the test child *and* in the test driver itself.
Prevent this from happening by providing a guard variable that only
enables test invocation from 'do_rseq_destructor_test' in the process
that has first run 'do_test'. Consequently extra testing is invoked
from 'do_rseq_destructor_test' only once and in the correct process,
regardless of the use or the lack of of the '--direct' option. Where
called in the controlling test driver process that has neved called
'do_test' the destructor function silently returns right away without
taking any further actions, letting the test driver fail gracefully
where applicable.
This arrangement prevents 'tst-rseq-nptl' from ever causing testing to
hang forever and never complete, such as currently happening with the
'mips-linux-gnu' (o32 ABI) target.
---
Hi,
In the course of verifying changes for the BZ #27650 test case, posted
separately, I have come across a couple of tests hanging forever that
cause `mips-linux-gnu' (o32 ABI) testing to never complete unless the
offending tests are killed by hand.
This patch addresses this issue for the tst-rseq-nptl test. This was a
bit tricky to debug as I saw no immediate cause as to the hang *despite*
the test driver already being used by the test case. Eventually running
via `strace' revealed the actual problem:
wait4(16905, 0x7fffa930, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) ---
SYS_4366() = 0
kill(-16905, SIGKILL) = 0
kill(16905, SIGKILL) = 0
wait4(16905, 0x7fffa5d0, WNOHANG|WSTOPPED, NULL) = 0
SYS_4265() = -1 ERRNO_516 ((null))
--- SIGCHLD (Child exited) ---
SYS_4253() = 0
wait4(16905, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|WSTOPPED, NULL) = 16905
write(1, "Timed out: killed the child proc"..., 35) = 35
write(1, "\n", 1) = 1
brk(0) = 0x55591000
brk(0x555b2000) = 0x555b2000
SYS_4288() = 3
SYS_4366() = 0
SYS_4366() = 0
read(3, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7\0\0\0\7\0"..., 4096) = 1323
close(3) = 0
write(1, "Termination time: 2024-07-06T03:"..., 48) = 48
SYS_4367() = -1 EINVAL (Invalid argument)
rt_sigaction(SIGUSR1, {SIG_DFL}, NULL, 16) = 0
SYS_4222() = 16904
getpid() = 16904
SYS_4266() = 0
--- SIGUSR1 (User defined signal 1) ---
sigreturn() = 0
This was obtained with a slightly outdated `strace' binary, hence the
missing decoding of some syscalls, however output produced is clear: the
test child does get killed, the test driver reports what it is supposed to
in this case, but then, hold on, what's up with that `rt_sigaction' call?
OK, that pointed me at the right place, that is `do_rseq_destructor_test'
having been marked with `__attribute__ ((destructor))'.
Being a destructor the function is obviously implicitly called at program
completion, and then by both the test child *and* the test driver process
itself. Hence the call to `do_rseq_test' that hangs also causes the test
driver to hang forever, preventing testing from ever completing.
OK to apply (either now or past the freeze period)?
Maciej
---
sysdeps/unix/sysv/linux/tst-rseq-nptl.c | 9 +++++++++
1 file changed, 9 insertions(+)
glibc-tst-rseq-nptl-destructor.diff
Comments
On Thu, 11 Jul 2024, Maciej W. Rozycki wrote:
> Fix an issue with commit 8f4632deb354 ("Linux: rseq registration tests")
> and prevent testing from being run in the process of the test driver
> itself rather than just the test child where one has been forked. The
> problem here is the unguarded use of a destructor to call a part of the
> testing. The destructor function, 'do_rseq_destructor_test' is called
> implicitly at program completion, however because it is associated with
> the executable itself rather than an individual process, it is called
> both in the test child *and* in the test driver itself.
Ping for:
<https://sourceware.org/pipermail/libc-alpha/2024-July/158277.html>,
<https://patchwork.sourceware.org/project/glibc/patch/e573ada7-7491-86d4-e36d-40032b18c9df@redhat.com/>.
Maciej
On Thu, 11 Jul 2024, Maciej W. Rozycki wrote:
> Fix an issue with commit 8f4632deb354 ("Linux: rseq registration tests")
> and prevent testing from being run in the process of the test driver
> itself rather than just the test child where one has been forked. The
> problem here is the unguarded use of a destructor to call a part of the
> testing. The destructor function, 'do_rseq_destructor_test' is called
> implicitly at program completion, however because it is associated with
> the executable itself rather than an individual process, it is called
> both in the test child *and* in the test driver itself.
Ping for:
<https://sourceware.org/pipermail/libc-alpha/2024-July/158277.html>,
<https://patchwork.sourceware.org/project/glibc/patch/e573ada7-7491-86d4-e36d-40032b18c9df@redhat.com/>.
Maciej
On 11/07/24 11:07, Maciej W. Rozycki wrote:
> Fix an issue with commit 8f4632deb354 ("Linux: rseq registration tests")
> and prevent testing from being run in the process of the test driver
> itself rather than just the test child where one has been forked. The
> problem here is the unguarded use of a destructor to call a part of the
> testing. The destructor function, 'do_rseq_destructor_test' is called
> implicitly at program completion, however because it is associated with
> the executable itself rather than an individual process, it is called
> both in the test child *and* in the test driver itself.
>
> Prevent this from happening by providing a guard variable that only
> enables test invocation from 'do_rseq_destructor_test' in the process
> that has first run 'do_test'. Consequently extra testing is invoked
> from 'do_rseq_destructor_test' only once and in the correct process,
> regardless of the use or the lack of of the '--direct' option. Where
> called in the controlling test driver process that has neved called
> 'do_test' the destructor function silently returns right away without
> taking any further actions, letting the test driver fail gracefully
> where applicable.
>
> This arrangement prevents 'tst-rseq-nptl' from ever causing testing to
> hang forever and never complete, such as currently happening with the
> 'mips-linux-gnu' (o32 ABI) target.
> ---
> Hi,
>
> In the course of verifying changes for the BZ #27650 test case, posted
> separately, I have come across a couple of tests hanging forever that
> cause `mips-linux-gnu' (o32 ABI) testing to never complete unless the
> offending tests are killed by hand.
>
> This patch addresses this issue for the tst-rseq-nptl test. This was a
> bit tricky to debug as I saw no immediate cause as to the hang *despite*
> the test driver already being used by the test case. Eventually running
> via `strace' revealed the actual problem:
>
> wait4(16905, 0x7fffa930, 0, NULL) = ? ERESTARTSYS (To be restarted)
> --- SIGALRM (Alarm clock) ---
> SYS_4366() = 0
> kill(-16905, SIGKILL) = 0
> kill(16905, SIGKILL) = 0
> wait4(16905, 0x7fffa5d0, WNOHANG|WSTOPPED, NULL) = 0
> SYS_4265() = -1 ERRNO_516 ((null))
> --- SIGCHLD (Child exited) ---
> SYS_4253() = 0
> wait4(16905, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|WSTOPPED, NULL) = 16905
> write(1, "Timed out: killed the child proc"..., 35) = 35
> write(1, "\n", 1) = 1
> brk(0) = 0x55591000
> brk(0x555b2000) = 0x555b2000
> SYS_4288() = 3
> SYS_4366() = 0
> SYS_4366() = 0
> read(3, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7\0\0\0\7\0"..., 4096) = 1323
> close(3) = 0
> write(1, "Termination time: 2024-07-06T03:"..., 48) = 48
> SYS_4367() = -1 EINVAL (Invalid argument)
> rt_sigaction(SIGUSR1, {SIG_DFL}, NULL, 16) = 0
> SYS_4222() = 16904
> getpid() = 16904
> SYS_4266() = 0
> --- SIGUSR1 (User defined signal 1) ---
> sigreturn() = 0
>
> This was obtained with a slightly outdated `strace' binary, hence the
> missing decoding of some syscalls, however output produced is clear: the
> test child does get killed, the test driver reports what it is supposed to
> in this case, but then, hold on, what's up with that `rt_sigaction' call?
> OK, that pointed me at the right place, that is `do_rseq_destructor_test'
> having been marked with `__attribute__ ((destructor))'.
>
> Being a destructor the function is obviously implicitly called at program
> completion, and then by both the test child *and* the test driver process
> itself. Hence the call to `do_rseq_test' that hangs also causes the test
> driver to hang forever, preventing testing from ever completing.
>
> OK to apply (either now or past the freeze period)?
>
> Maciej
LGTM, thanks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> sysdeps/unix/sysv/linux/tst-rseq-nptl.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> glibc-tst-rseq-nptl-destructor.diff
> Index: glibc/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
> ===================================================================
> --- glibc.orig/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
> +++ glibc/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
> @@ -28,6 +28,11 @@
> #include <sys/rseq.h>
> #include <unistd.h>
>
> +/* Set this in 'do_test' only so as to invoke the destructor test in
> + the test process only and not 'support_test_main' parent. Otherwise
> + the test harness may hang in the destructor if something goes wrong. */
> +static int run_destructor_test;
> +
> #ifdef RSEQ_SIG
> # include <array_length.h>
> # include <errno.h>
> @@ -236,6 +241,9 @@ do_rseq_test (void)
> static void __attribute__ ((destructor))
> do_rseq_destructor_test (void)
> {
> + if (!run_destructor_test)
> + return;
> +
> /* Cannot use deferred failure reporting after main returns. */
> if (do_rseq_test ())
> FAIL_EXIT1 ("rseq not registered within destructor");
> @@ -254,6 +262,7 @@ do_rseq_test (void)
> static int
> do_test (void)
> {
> + run_destructor_test = 1;
> return do_rseq_test ();
> }
>
>
On Wed, 14 Aug 2024, Adhemerval Zanella Netto wrote:
> LGTM, thanks.
>
> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Applied now, thank you for your review.
Maciej
===================================================================
@@ -28,6 +28,11 @@
#include <sys/rseq.h>
#include <unistd.h>
+/* Set this in 'do_test' only so as to invoke the destructor test in
+ the test process only and not 'support_test_main' parent. Otherwise
+ the test harness may hang in the destructor if something goes wrong. */
+static int run_destructor_test;
+
#ifdef RSEQ_SIG
# include <array_length.h>
# include <errno.h>
@@ -236,6 +241,9 @@ do_rseq_test (void)
static void __attribute__ ((destructor))
do_rseq_destructor_test (void)
{
+ if (!run_destructor_test)
+ return;
+
/* Cannot use deferred failure reporting after main returns. */
if (do_rseq_test ())
FAIL_EXIT1 ("rseq not registered within destructor");
@@ -254,6 +262,7 @@ do_rseq_test (void)
static int
do_test (void)
{
+ run_destructor_test = 1;
return do_rseq_test ();
}