Speedup various nptl/tst-cancel20 and nptl/tst-cancel21 tests.
Commit Message
On 9/12/19 4:07 PM, Carlos O'Donell wrote:
> On 9/12/19 8:15 AM, Stefan Liebler wrote:
>> Hi,
>>
>> each of thease tests - tst-cancel20, tst-cancelx20, tst-cancel21,
>> tst-cancelx21 and tst-cancel21-static - runs for 8s.
>>
>> do_one_test is called 4 times. Either in do_one_test (tst-cancel20.c)
>> or in tf (tst-cancel21.c) "sleep (1)" is called 2 times.
>>
>> This patch reduces the sleep time. Using usleep (5ms) leads to a
>> runtime of one tst-cancel... invocation of roughly 40ms instead of 8s.
>> As the nptl tests run in sequence, this patch saves roughly 39s of
>> runtime for "make check".
>>
>> Bye,
>> Stefan
>>
>> ChangeLog:
>>
>> * nptl/tst-cancel20.c (do_one_test): Reduce sleep times.
>> * nptl/tst-cancel21.c (tf): Likewise.
>
> I see two possible solutions:
>
> (a) Explain in detail why the sleep is needed, and why reducing
> the sleep is safe.
>
> (b) Remove the sleep and use proper barriers to provide the
> synchronization the test is using sleep to accomplish.
>
> Your current patch does (a) without explaining why it is safe
> to reduce the sleep period, and that the test continues to test
> what it was written to test.
>
> Is it safe to reduce the sleep timing?
>
> Why is it there?
>
Hi,
tst-cancel20.c cancels a thread and checks if all cleanup-handlers were
called in the correct order:
if (cleanups != 0x1234L)
{
printf ("called cleanups %lx\n", cleanups);
return 1;
}
These numbers belong to the cleanup-handlers called in the following
functions:
1 => sh_body (sets in_sh_body=1 and blocks in read)
2 => sh (SIGHUP signal-handler; is received after pthread_barrier_wait)
3 => tf_body (waits with pthread_barrier_wait and then blocks in read)
4 => tf (started via pthread_create in do_one_test)
do_one_test starts tf, sends SIGHUP after waiting for the barrier and
cancels the thread as soon as in_sh_body has been set to 1.
tst-cancel21.c differs in the fact, that there a thread sends SIGHUP and
cancels the main thread.
I've tried to just remove the sleeps. This works fine with tst-cancel20.
But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
-fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
called cleanups 124 => cleanup-handler in tf_body was not called.
In the "good" case, SIGHUP is received while tf_body is blocking in read.
In the "error" case, SIGHUP is received while tf_body is waiting in
pthread_barrier_wait.
(This occures on s390x in the same way as on e.g. x86_64; I've used gcc
version 9.1.1)
I've compiled the test with gcc -S -fverbose-asm. Here is the shortened
output:
tf_body:
.LFB28:
...
# tst-cancel20.c:75: int r = pthread_barrier_wait (&b);
brasl %r14,pthread_barrier_wait #,
...
# tst-cancel20.c:82: if (read (fd[0], &c, 1) == 1)
.LEHB4:
brasl %r14,read #,
...
.LEHE4:
...
.L25:
# ../sysdeps/nptl/pthread.h:612: __frame->__cancel_routine
(__frame->__cancel_arg);
lghi %r2,3 #,
brasl %r14,cl #,
...
brasl %r14,_Unwind_Resume #,
.cfi_endproc
.section .gcc_except_table
.LLSDA28:
.byte 0xff
.byte 0xff
.byte 0x1
.uleb128 .LLSDACSE28-.LLSDACSB28
.LLSDACSB28:
.uleb128 .LEHB4-.LFB28
.uleb128 .LEHE4-.LEHB4
.uleb128 .L25-.LFB28
.uleb128 0
.uleb128 .LEHB5-.LFB28
.uleb128 .LEHE5-.LEHB5
.uleb128 0
.uleb128 0
.LLSDACSE28:
According to the .gcc_except_table, the region with our cleanup-handler
starts - after pthread_barrier_wait - directly before the read call,
even though pthread_barrier_wait is within pthread_cleanup_push / pop:
tf_body (void)
{ ...
pthread_cleanup_push (cl, (void *) 3L);
int r = pthread_barrier_wait (&b);
...
if (read (fd[0], &c, 1) == 1)
...
read (fd[0], &c, 1);
pthread_cleanup_pop (0);
}
Is this behaviour intended?
The difference between those calls is "nothrow":
extern ssize_t read (int __fd, void *__buf, size_t __nbytes) ;
vs
extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
__attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
Placing a e.g. a printf or read call just before pthread_barrier_wait
will lead to the inclusion of pthread_barrier_wait and the
cleanup-handler is called!
Now, I've removed the sleeps and also removed the pthread_barrier_wait.
Instead write to and read from a pipe is used to synchronizse tf_body
and do_one_test.
Please have a look at the attached patch.
Bye.
Stefan
Comments
On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
...
> I've tried to just remove the sleeps. This works fine with tst-cancel20.
> But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
> -fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
> called cleanups 124 => cleanup-handler in tf_body was not called.
>
> In the "good" case, SIGHUP is received while tf_body is blocking in read.
> In the "error" case, SIGHUP is received while tf_body is waiting in
> pthread_barrier_wait.
> (This occures on s390x in the same way as on e.g. x86_64; I've used gcc
> version 9.1.1)
...
> According to the .gcc_except_table, the region with our cleanup-handler
> starts - after pthread_barrier_wait - directly before the read call,
> even though pthread_barrier_wait is within pthread_cleanup_push / pop:
...
> Is this behaviour intended?
>
> The difference between those calls is "nothrow":
> extern ssize_t read (int __fd, void *__buf, size_t __nbytes) ;
> vs
> extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
> __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
This looks like GCC is assuming it can hoist a call to a nothrow
function out of a cleanup region that lexically contains it. That's
not true when an exception can be thrown by a signal handler. I can
see an argument that this is an incorrect optimization when
-fasynchronous-unwind-tables is in use, but the documentation makes it
sound like -fasynchronous-unwind-tables is only intended to enable
*stack tracing* from an arbitrary instruction, not exceptions.
(See the documentation for -fnon-call-exceptions as well as for
-fasynchronous-exception-tables.)
It is not entirely clear to me what this test is supposed to test, but
I think the intent is for the SIGHUP to be delivered *only* after we
are sure tf_body is blocked on read. If it's delivered at any other
point, the test might not be testing the right thing. Instead of your
change to use a second pipe, therefore, I suggest use of
pthread_sigmask and ppoll: remove the barriers; block SIGHUP using
pthread_sigmask in do_test (this will inherit to all child threads);
then change tf_body to something like this:
static void __attribute__((noinline))
tf_body (void)
{
char c;
pthread_cleanup_push (cl, (void *) 3L);
sigset_t unblock_SIGHUP;
sigemptyset(&unblock_SIGHUP);
struct pollfd pfds[1];
pfds[0].fd = fd[0];
pfds[0].events = POLLIN;
// this call is expected to be interrupted by a SIGHUP
// before anything is written to the pipe
if (ppoll (pfds, 1, 0, &unblock_SIGHUP) != -1)
{
puts ("read succeeded");
exit (1);
}
// drain the pipe
read (fd[0], &c, 1);
pthread_cleanup_pop (0);
}
ppoll _atomically_ sets the signal mask before waiting, and restores
it afterward, so, this should ensure that the SIGHUP is delivered at
exactly the right point.
zw
* Zack Weinberg:
> On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
> ...
>> I've tried to just remove the sleeps. This works fine with tst-cancel20.
>> But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
>> -fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
>> called cleanups 124 => cleanup-handler in tf_body was not called.
>>
>> In the "good" case, SIGHUP is received while tf_body is blocking in read.
>> In the "error" case, SIGHUP is received while tf_body is waiting in
>> pthread_barrier_wait.
>> (This occures on s390x in the same way as on e.g. x86_64; I've used gcc
>> version 9.1.1)
> ...
>> According to the .gcc_except_table, the region with our cleanup-handler
>> starts - after pthread_barrier_wait - directly before the read call,
>> even though pthread_barrier_wait is within pthread_cleanup_push / pop:
> ...
>> Is this behaviour intended?
>>
>> The difference between those calls is "nothrow":
>> extern ssize_t read (int __fd, void *__buf, size_t __nbytes) ;
>> vs
>> extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
>> __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
>
> This looks like GCC is assuming it can hoist a call to a nothrow
> function out of a cleanup region that lexically contains it. That's
> not true when an exception can be thrown by a signal handler. I can
> see an argument that this is an incorrect optimization when
> -fasynchronous-unwind-tables is in use, but the documentation makes it
> sound like -fasynchronous-unwind-tables is only intended to enable
> *stack tracing* from an arbitrary instruction, not exceptions.
> (See the documentation for -fnon-call-exceptions as well as for
> -fasynchronous-exception-tables.)
I would have expected this to be controlled by -fnon-call-exceptions,
but the GCC 8 documentation says this:
| '-fnon-call-exceptions'
| Generate code that allows trapping instructions to throw
| exceptions. Note that this requires platform-specific runtime
| support that does not exist everywhere. Moreover, it only allows
| _trapping_ instructions to throw exceptions, i.e. memory references
| or floating-point instructions. It does not allow exceptions to be
| thrown from arbitrary signal handlers such as 'SIGALRM'.
So it does not seem to be what we are looking for.
Anyway, based on the posted analysis, the tst-cancelx20 test case is
invalid.
Thanks,
Florian
On 9/20/19 5:17 PM, Zack Weinberg wrote:
> On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
> ...
>> I've tried to just remove the sleeps. This works fine with tst-cancel20.
>> But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
>> -fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
>> called cleanups 124 => cleanup-handler in tf_body was not called.
>>
>> In the "good" case, SIGHUP is received while tf_body is blocking in read.
>> In the "error" case, SIGHUP is received while tf_body is waiting in
>> pthread_barrier_wait.
>> (This occures on s390x in the same way as on e.g. x86_64; I've used gcc
>> version 9.1.1)
> ...
>> According to the .gcc_except_table, the region with our cleanup-handler
>> starts - after pthread_barrier_wait - directly before the read call,
>> even though pthread_barrier_wait is within pthread_cleanup_push / pop:
> ...
>> Is this behaviour intended?
>>
>> The difference between those calls is "nothrow":
>> extern ssize_t read (int __fd, void *__buf, size_t __nbytes) ;
>> vs
>> extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
>> __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
>
> This looks like GCC is assuming it can hoist a call to a nothrow
> function out of a cleanup region that lexically contains it. That's
> not true when an exception can be thrown by a signal handler. I can
> see an argument that this is an incorrect optimization when
> -fasynchronous-unwind-tables is in use, but the documentation makes it
> sound like -fasynchronous-unwind-tables is only intended to enable
> *stack tracing* from an arbitrary instruction, not exceptions.
> (See the documentation for -fnon-call-exceptions as well as for
> -fasynchronous-exception-tables.)
>
> It is not entirely clear to me what this test is supposed to test, but
> I think the intent is for the SIGHUP to be delivered *only* after we
> are sure tf_body is blocked on read. If it's delivered at any other
> point, the test might not be testing the right thing. Instead of your
> change to use a second pipe, therefore, I suggest use of
> pthread_sigmask and ppoll: remove the barriers; block SIGHUP using
> pthread_sigmask in do_test (this will inherit to all child threads);
> then change tf_body to something like this:
>
> static void __attribute__((noinline))
> tf_body (void)
> {
> char c;
>
> pthread_cleanup_push (cl, (void *) 3L);
>
> sigset_t unblock_SIGHUP;
> sigemptyset(&unblock_SIGHUP);
>
> struct pollfd pfds[1];
> pfds[0].fd = fd[0];
> pfds[0].events = POLLIN;
>
> // this call is expected to be interrupted by a SIGHUP
> // before anything is written to the pipe
> if (ppoll (pfds, 1, 0, &unblock_SIGHUP) != -1)
> {
> puts ("read succeeded");
> exit (1);
> }
>
> // drain the pipe
> read (fd[0], &c, 1);
>
> pthread_cleanup_pop (0);
> }
>
> ppoll _atomically_ sets the signal mask before waiting, and restores
> it afterward, so, this should ensure that the SIGHUP is delivered at
> exactly the right point.
>
> zw
>
Sorry for the long delay.
I've give it a try. The signal handler is now called while we are in the
syscall in ppoll called in tf_body.
Before, the signal handler arrived while we were in the syscall in read
called in tf_body.
In both cases (read or ppoll), we are in a syscall surrounded with
__libc_enable_asynccancel and __libc_disable_asynccancel.
If that's okay, I'll proceed with the patch.
Bye,
Stefan
commit 47847907fd2dae3201bf87096886c44d7801768b
Author: Stefan Liebler <stli@linux.ibm.com>
Date: Wed Sep 11 09:43:08 2019 +0200
Speedup various nptl/tst-cancel20 and nptl/tst-cancel21 tests.
Each of thease tests - tst-cancel20, tst-cancelx20, tst-cancel21,
tst-cancelx21 and tst-cancel21-static - runs for 8s.
do_one_test is called 4 times. Either in do_one_test (tst-cancel20.c)
or in tf (tst-cancel21.c) "sleep (1)" is called 2 times.
This patch removes the sleep which leads to a short runtime of one
tst-cancel... invocation and as the nptl tests run in sequence,
this patch saves roughly 39s of runtime for "make check".
Just removing the sleeps leads to fails in tst-cancelx20 / x21
as one cleanup-handler won't be called as pthread_barrier_wait is
marked with attribute ((__nothrow__)). Therefore the pthread_barrier_wait
calls are replaced by writing to and reading from a pipe in order
to synchronize the threads.
ChangeLog:
* nptl/tst-cancel20.c (b): Remove.
(fd): Enhance to 6 elements.
(tf_body): Use xwrite instead of pthread_barrier_wait.
(do_one_test): Remove sleep and use read instead of
pthread_barrier_wait.
* nptl/tst-cancel21.c (b): Remove.
(fd): Enhance to 6 elements.
(tf_body): Use xwrite instead of pthread_barrier_wait.
(tf): Remove sleep and use read instead of
pthread_barrier_wait.
@@ -21,11 +21,10 @@
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
-#include <unistd.h>
+#include <support/xunistd.h>
-static int fd[4];
-static pthread_barrier_t b;
+static int fd[6];
volatile int in_sh_body;
unsigned long cleanups;
@@ -68,17 +67,21 @@ sh (int sig)
static void __attribute__((noinline))
tf_body (void)
{
- char c;
+ char c = 0;
pthread_cleanup_push (cl, (void *) 3L);
- int r = pthread_barrier_wait (&b);
- if (r != 0 && r != PTHREAD_BARRIER_SERIAL_THREAD)
- {
- puts ("child thread: barrier_wait failed");
- exit (1);
- }
-
+ /* Write to the pipe in order to inform do_one_test that we are now within
+ pthread_cleanup_push / pop and do_one_test can send us SIGHUP and cancel
+ this thread.
+ We do not use pthread_barrier_wait here as it is marked with
+ attribute ((__nothrow__)) and the region in .gcc_except_table section
+ which is responsible for calling the cleanup-handler would start just
+ before the next call to read in case of nptl/tst-cancelx20.c where we
+ build with "gcc -fexceptions -fasynchronous-unwind-tables". */
+ xwrite (fd[5], &c, 1);
+
+ /* Blocks until either SIGHUP is received or the thread is canceled. */
if (read (fd[0], &c, 1) == 1)
{
puts ("read succeeded");
@@ -106,7 +109,7 @@ do_one_test (void)
{
in_sh_body = 0;
cleanups = 0;
- if (pipe (fd) != 0 || pipe (fd + 2) != 0)
+ if (pipe (fd) != 0 || pipe (fd + 2) != 0 || pipe (fd + 4) != 0)
{
puts ("pipe failed");
return 1;
@@ -119,16 +122,17 @@ do_one_test (void)
return 1;
}
- int r = pthread_barrier_wait (&b);
- if (r != 0 && r != PTHREAD_BARRIER_SERIAL_THREAD)
+ /* Block until the thread is within pthread_cleanup_push / pop
+ in tf_body which is called by tf. */
+ char c;
+ if (read (fd[4], &c, 1) != 1)
{
- puts ("parent thread: barrier_wait failed");
+ printf ("parent thread: Reading from pipe failed %m\n");
return 1;
}
- sleep (1);
-
- r = pthread_kill (th, SIGHUP);
+ /* Send signal SIGHUP where in_sh_body will be set to 1 in sh_body. */
+ int r = pthread_kill (th, SIGHUP);
if (r)
{
errno = r;
@@ -137,8 +141,9 @@ do_one_test (void)
}
while (in_sh_body == 0)
- sleep (1);
+ ;
+ /* Cancel the thread while the signal handler blocks in read in sh_body. */
if (pthread_cancel (th) != 0)
{
puts ("cancel failed");
@@ -172,6 +177,8 @@ do_one_test (void)
close (fd[1]);
close (fd[2]);
close (fd[3]);
+ close (fd[4]);
+ close (fd[5]);
return 0;
}
@@ -195,12 +202,6 @@ do_test (void)
return 1;
}
- if (pthread_barrier_init (&b, NULL, 2) != 0)
- {
- puts ("barrier_init failed");
- return 1;
- }
-
struct sigaction sa;
sa.sa_handler = sh;
sigemptyset (&sa.sa_mask);
@@ -22,11 +22,10 @@
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
-#include <unistd.h>
+#include <support/xunistd.h>
-static int fd[4];
-static pthread_barrier_t b;
+static int fd[6];
volatile int in_sh_body;
unsigned long cleanups;
@@ -69,17 +68,21 @@ sh (int sig)
static void __attribute__((noinline))
tf_body (void)
{
- char c;
+ char c = 0;
pthread_cleanup_push (cl, (void *) 3L);
- int r = pthread_barrier_wait (&b);
- if (r != 0 && r != PTHREAD_BARRIER_SERIAL_THREAD)
- {
- puts ("child thread: barrier_wait failed");
- exit (1);
- }
-
+ /* Write to the pipe in order to inform tf that we are now within
+ pthread_cleanup_push / pop and tf can send us SIGHUP and cancel
+ this thread.
+ We do not use pthread_barrier_wait here as it is marked with
+ attribute ((__nothrow__)) and the region in .gcc_except_table section
+ which is responsible for calling the cleanup-handler would start just
+ before the next call to read in case of nptl/tst-cancelx21.c where we
+ build with "gcc -fexceptions -fasynchronous-unwind-tables". */
+ xwrite (fd[5], &c, 1);
+
+ /* Blocks until either SIGHUP is received or the thread is canceled. */
if (read (fd[0], &c, 1) == 1)
{
puts ("read succeeded");
@@ -97,16 +100,17 @@ tf (void *arg)
{
pthread_t th = (pthread_t) arg;
- int r = pthread_barrier_wait (&b);
- if (r != 0 && r != PTHREAD_BARRIER_SERIAL_THREAD)
+ /* Block until the main thread is within pthread_cleanup_push / pop
+ in tf_body which is called by do_one_test. */
+ char c;
+ if (read (fd[4], &c, 1) != 1)
{
- puts ("parent thread: barrier_wait failed");
+ printf ("Reading from pipe failed %m\n");
exit (1);
}
- sleep (1);
-
- r = pthread_kill (th, SIGHUP);
+ /* Send signal SIGHUP where in_sh_body will be set to 1 in sh_body. */
+ int r = pthread_kill (th, SIGHUP);
if (r)
{
errno = r;
@@ -115,8 +119,10 @@ tf (void *arg)
}
while (in_sh_body == 0)
- sleep (1);
+ ;
+ /* Cancel the parent thread while the signal handler blocks in read
+ in sh_body. */
if (pthread_cancel (th) != 0)
{
puts ("cancel failed");
@@ -142,11 +148,6 @@ tf (void *arg)
exit (1);
}
- if (pthread_barrier_destroy (&b))
- {
- puts ("barrier destroy failed");
- exit (1);
- }
/* The pipe closing must be issued after the cancellation handling to avoid
a race condition where the cancellation runs after both pipe ends are
@@ -156,6 +157,8 @@ tf (void *arg)
close (fd[1]);
close (fd[2]);
close (fd[3]);
+ close (fd[4]);
+ close (fd[5]);
exit (0);
}
@@ -186,14 +189,8 @@ do_one_test (void)
return !WIFEXITED (status) || WEXITSTATUS (status);
}
- if (pthread_barrier_init (&b, NULL, 2) != 0)
- {
- puts ("barrier_init failed");
- exit (1);
- }
-
cleanups = 0;
- if (pipe (fd) != 0 || pipe (fd + 2) != 0)
+ if (pipe (fd) != 0 || pipe (fd + 2) != 0 || pipe (fd + 4) != 0)
{
puts ("pipe failed");
exit (1);