Patchwork Speedup various nptl/tst-cancel20 and nptl/tst-cancel21 tests.

login
register
mail settings
Submitter Stefan Liebler
Date Sept. 20, 2019, 12:59 p.m.
Message ID <205f17ba-a82b-7806-54fa-49b61268e64d@linux.ibm.com>
Download mbox | patch
Permalink /patch/34613/
State New
Headers show

Comments

Stefan Liebler - Sept. 20, 2019, 12:59 p.m.
On 9/12/19 4:07 PM, Carlos O'Donell wrote:
> On 9/12/19 8:15 AM, Stefan Liebler wrote:
>> Hi,
>>
>> each of thease tests - tst-cancel20, tst-cancelx20, tst-cancel21,
>> tst-cancelx21 and tst-cancel21-static - runs for 8s.
>>
>> do_one_test is called 4 times. Either in do_one_test (tst-cancel20.c)
>> or in tf (tst-cancel21.c) "sleep (1)" is called 2 times.
>>
>> This patch reduces the sleep time. Using usleep (5ms) leads to a
>> runtime of one tst-cancel... invocation of roughly 40ms instead of 8s.
>> As the nptl tests run in sequence, this patch saves roughly 39s of
>> runtime for "make check".
>>
>> Bye,
>> Stefan
>>
>> ChangeLog:
>>
>>      * nptl/tst-cancel20.c (do_one_test): Reduce sleep times.
>>      * nptl/tst-cancel21.c (tf): Likewise.
> 
> I see two possible solutions:
> 
> (a) Explain in detail why the sleep is needed, and why reducing
>      the sleep is safe.
> 
> (b) Remove the sleep and use proper barriers to provide the
>      synchronization the test is using sleep to accomplish.
> 
> Your current patch does (a) without explaining why it is safe
> to reduce the sleep period, and that the test continues to test
> what it was written to test.
> 
> Is it safe to reduce the sleep timing?
> 
> Why is it there?
> 

Hi,

tst-cancel20.c cancels a thread and checks if all cleanup-handlers were 
called in the correct order:
if (cleanups != 0x1234L)
{
    printf ("called cleanups %lx\n", cleanups);
    return 1;
}

These numbers belong to the cleanup-handlers called in the following 
functions:
1 => sh_body (sets in_sh_body=1 and blocks in read)
2 => sh (SIGHUP signal-handler; is received after pthread_barrier_wait)
3 => tf_body (waits with pthread_barrier_wait and then blocks in read)
4 => tf (started via pthread_create in do_one_test)

do_one_test starts tf, sends SIGHUP after waiting for the barrier and 
cancels the thread as soon as in_sh_body has been set to 1.

tst-cancel21.c differs in the fact, that there a thread sends SIGHUP and 
cancels the main thread.

I've tried to just remove the sleeps. This works fine with tst-cancel20.
But with tst-cancelx20 (same code as tst-cancel20.c but compiled with 
-fexceptions -fasynchronous-unwind-tables) it sometimes fails with: 
called cleanups 124 => cleanup-handler in tf_body was not called.

In the "good" case, SIGHUP is received while tf_body is blocking in read.
In the "error" case, SIGHUP is received while tf_body is waiting in 
pthread_barrier_wait.
(This occures on s390x in the same way as on e.g. x86_64; I've used gcc 
version 9.1.1)

I've compiled the test with gcc -S -fverbose-asm. Here is the shortened 
output:
tf_body:
.LFB28:
...
# tst-cancel20.c:75:   int r = pthread_barrier_wait (&b);
	brasl	%r14,pthread_barrier_wait	#,
...
# tst-cancel20.c:82:   if (read (fd[0], &c, 1) == 1)
.LEHB4:
	brasl	%r14,read	#,
...
.LEHE4:
...
.L25:
# ../sysdeps/nptl/pthread.h:612:     __frame->__cancel_routine 
(__frame->__cancel_arg);
	lghi	%r2,3	#,
	brasl	%r14,cl	#,
...
brasl	%r14,_Unwind_Resume	#,
	.cfi_endproc


.section	.gcc_except_table
.LLSDA28:
	.byte	0xff
	.byte	0xff
	.byte	0x1
	.uleb128 .LLSDACSE28-.LLSDACSB28
.LLSDACSB28:
	.uleb128 .LEHB4-.LFB28
	.uleb128 .LEHE4-.LEHB4
	.uleb128 .L25-.LFB28
	.uleb128 0
	.uleb128 .LEHB5-.LFB28
	.uleb128 .LEHE5-.LEHB5
	.uleb128 0
	.uleb128 0
.LLSDACSE28:


According to the .gcc_except_table, the region with our cleanup-handler 
starts - after pthread_barrier_wait - directly before the read call, 
even though pthread_barrier_wait is within pthread_cleanup_push / pop:
tf_body (void)
{ ...
   pthread_cleanup_push (cl, (void *) 3L);

   int r = pthread_barrier_wait (&b);
   ...
   if (read (fd[0], &c, 1) == 1)
      ...
   read (fd[0], &c, 1);

   pthread_cleanup_pop (0);
}
Is this behaviour intended?

The difference between those calls is "nothrow":
extern ssize_t read (int __fd, void *__buf, size_t __nbytes) ;
vs
extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
      __attribute__ ((__nothrow__))  __attribute__ ((__nonnull__ (1)));

Placing a e.g. a printf or read call just before pthread_barrier_wait 
will lead to the inclusion of pthread_barrier_wait and the 
cleanup-handler is called!


Now, I've removed the sleeps and also removed the pthread_barrier_wait. 
Instead write to and read from a pipe is used to synchronizse tf_body 
and do_one_test.
Please have a look at the attached patch.

Bye.
Stefan
Zack Weinberg - Sept. 20, 2019, 3:17 p.m.
On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
...
> I've tried to just remove the sleeps. This works fine with tst-cancel20.
> But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
> -fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
> called cleanups 124 => cleanup-handler in tf_body was not called.
>
> In the "good" case, SIGHUP is received while tf_body is blocking in read.
> In the "error" case, SIGHUP is received while tf_body is waiting in
> pthread_barrier_wait.
> (This occures on s390x in the same way as on e.g. x86_64; I've used gcc
> version 9.1.1)
...
> According to the .gcc_except_table, the region with our cleanup-handler
> starts - after pthread_barrier_wait - directly before the read call,
> even though pthread_barrier_wait is within pthread_cleanup_push / pop:
...
> Is this behaviour intended?
>
> The difference between those calls is "nothrow":
> extern ssize_t read (int __fd, void *__buf, size_t __nbytes) ;
> vs
> extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
>       __attribute__ ((__nothrow__))  __attribute__ ((__nonnull__ (1)));

This looks like GCC is assuming it can hoist a call to a nothrow
function out of a cleanup region that lexically contains it.  That's
not true when an exception can be thrown by a signal handler.  I can
see an argument that this is an incorrect optimization when
-fasynchronous-unwind-tables is in use, but the documentation makes it
sound like -fasynchronous-unwind-tables is only intended to enable
*stack tracing* from an arbitrary instruction, not exceptions.
(See the documentation for -fnon-call-exceptions as well as for
-fasynchronous-exception-tables.)

It is not entirely clear to me what this test is supposed to test, but
I think the intent is for the SIGHUP to be delivered *only* after we
are sure tf_body is blocked on read.  If it's delivered at any other
point, the test might not be testing the right thing.  Instead of your
change to use a second pipe, therefore, I suggest use of
pthread_sigmask and ppoll: remove the barriers; block SIGHUP using
pthread_sigmask in do_test (this will inherit to all child threads);
then change tf_body to something like this:

static void __attribute__((noinline))
tf_body (void)
{
  char c;

  pthread_cleanup_push (cl, (void *) 3L);

  sigset_t unblock_SIGHUP;
  sigemptyset(&unblock_SIGHUP);

  struct pollfd pfds[1];
  pfds[0].fd = fd[0];
  pfds[0].events = POLLIN;

  // this call is expected to be interrupted by a SIGHUP
  // before anything is written to the pipe
  if (ppoll (pfds, 1, 0, &unblock_SIGHUP) != -1)
    {
      puts ("read succeeded");
      exit (1);
    }

  // drain the pipe
  read (fd[0], &c, 1);

  pthread_cleanup_pop (0);
}

ppoll _atomically_ sets the signal mask before waiting, and restores
it afterward, so, this should ensure that the SIGHUP is delivered at
exactly the right point.

zw
Florian Weimer - Sept. 20, 2019, 3:50 p.m.
* Zack Weinberg:

> On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
> ...
>> I've tried to just remove the sleeps. This works fine with tst-cancel20.
>> But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
>> -fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
>> called cleanups 124 => cleanup-handler in tf_body was not called.
>>
>> In the "good" case, SIGHUP is received while tf_body is blocking in read.
>> In the "error" case, SIGHUP is received while tf_body is waiting in
>> pthread_barrier_wait.
>> (This occures on s390x in the same way as on e.g. x86_64; I've used gcc
>> version 9.1.1)
> ...
>> According to the .gcc_except_table, the region with our cleanup-handler
>> starts - after pthread_barrier_wait - directly before the read call,
>> even though pthread_barrier_wait is within pthread_cleanup_push / pop:
> ...
>> Is this behaviour intended?
>>
>> The difference between those calls is "nothrow":
>> extern ssize_t read (int __fd, void *__buf, size_t __nbytes) ;
>> vs
>> extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
>>       __attribute__ ((__nothrow__))  __attribute__ ((__nonnull__ (1)));
>
> This looks like GCC is assuming it can hoist a call to a nothrow
> function out of a cleanup region that lexically contains it.  That's
> not true when an exception can be thrown by a signal handler.  I can
> see an argument that this is an incorrect optimization when
> -fasynchronous-unwind-tables is in use, but the documentation makes it
> sound like -fasynchronous-unwind-tables is only intended to enable
> *stack tracing* from an arbitrary instruction, not exceptions.
> (See the documentation for -fnon-call-exceptions as well as for
> -fasynchronous-exception-tables.)

I would have expected this to be controlled by -fnon-call-exceptions,
but the GCC 8 documentation says this:

| '-fnon-call-exceptions'
|      Generate code that allows trapping instructions to throw
|      exceptions.  Note that this requires platform-specific runtime
|      support that does not exist everywhere.  Moreover, it only allows
|      _trapping_ instructions to throw exceptions, i.e. memory references
|      or floating-point instructions.  It does not allow exceptions to be
|      thrown from arbitrary signal handlers such as 'SIGALRM'.

So it does not seem to be what we are looking for.

Anyway, based on the posted analysis, the tst-cancelx20 test case is
invalid.

Thanks,
Florian

Patch

commit 47847907fd2dae3201bf87096886c44d7801768b
Author: Stefan Liebler <stli@linux.ibm.com>
Date:   Wed Sep 11 09:43:08 2019 +0200

    Speedup various nptl/tst-cancel20 and nptl/tst-cancel21 tests.
    
    Each of thease tests - tst-cancel20, tst-cancelx20, tst-cancel21,
    tst-cancelx21 and tst-cancel21-static - runs for 8s.
    
    do_one_test is called 4 times. Either in do_one_test (tst-cancel20.c)
    or in tf (tst-cancel21.c) "sleep (1)" is called 2 times.
    
    This patch removes the sleep which leads to a short runtime of one
    tst-cancel... invocation and as the nptl tests run in sequence,
    this patch saves roughly 39s of runtime for "make check".
    
    Just removing the sleeps leads to fails in tst-cancelx20 / x21
    as one cleanup-handler won't be called as pthread_barrier_wait is
    marked with attribute ((__nothrow__)).  Therefore the pthread_barrier_wait
    calls are replaced by writing to and reading from a pipe in order
    to synchronize the threads.
    
    ChangeLog:
    
            * nptl/tst-cancel20.c (b): Remove.
            (fd): Enhance to 6 elements.
            (tf_body): Use xwrite instead of pthread_barrier_wait.
            (do_one_test): Remove sleep and use read instead of
            pthread_barrier_wait.
            * nptl/tst-cancel21.c (b): Remove.
            (fd): Enhance to 6 elements.
            (tf_body): Use xwrite instead of pthread_barrier_wait.
            (tf): Remove sleep and use read instead of
            pthread_barrier_wait.

diff --git a/nptl/tst-cancel20.c b/nptl/tst-cancel20.c
index 96dfb71fa9..f82f899a83 100644
--- a/nptl/tst-cancel20.c
+++ b/nptl/tst-cancel20.c
@@ -21,11 +21,10 @@ 
 #include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <unistd.h>
+#include <support/xunistd.h>
 
 
-static int fd[4];
-static pthread_barrier_t b;
+static int fd[6];
 volatile int in_sh_body;
 unsigned long cleanups;
 
@@ -68,17 +67,21 @@  sh (int sig)
 static void __attribute__((noinline))
 tf_body (void)
 {
-  char c;
+  char c = 0;
 
   pthread_cleanup_push (cl, (void *) 3L);
 
-  int r = pthread_barrier_wait (&b);
-  if (r != 0 && r != PTHREAD_BARRIER_SERIAL_THREAD)
-    {
-      puts ("child thread: barrier_wait failed");
-      exit (1);
-    }
-
+  /* Write to the pipe in order to inform do_one_test that we are now within
+     pthread_cleanup_push / pop and do_one_test can send us SIGHUP and cancel
+     this thread.
+     We do not use pthread_barrier_wait here as it is marked with
+     attribute ((__nothrow__)) and the region in .gcc_except_table section
+     which is responsible for calling the cleanup-handler would start just
+     before the next call to read in case of nptl/tst-cancelx20.c where we
+     build with "gcc -fexceptions -fasynchronous-unwind-tables".  */
+  xwrite (fd[5], &c, 1);
+
+  /* Blocks until either SIGHUP is received or the thread is canceled.  */
   if (read (fd[0], &c, 1) == 1)
     {
       puts ("read succeeded");
@@ -106,7 +109,7 @@  do_one_test (void)
 {
   in_sh_body = 0;
   cleanups = 0;
-  if (pipe (fd) != 0 || pipe (fd + 2) != 0)
+  if (pipe (fd) != 0 || pipe (fd + 2) != 0 || pipe (fd + 4) != 0)
     {
       puts ("pipe failed");
       return 1;
@@ -119,16 +122,17 @@  do_one_test (void)
       return 1;
     }
 
-  int r = pthread_barrier_wait (&b);
-  if (r != 0 && r != PTHREAD_BARRIER_SERIAL_THREAD)
+  /* Block until the thread is within pthread_cleanup_push / pop
+     in tf_body which is called by tf.  */
+  char c;
+  if (read (fd[4], &c, 1) != 1)
     {
-      puts ("parent thread: barrier_wait failed");
+      printf ("parent thread: Reading from pipe failed %m\n");
       return 1;
     }
 
-  sleep (1);
-
-  r = pthread_kill (th, SIGHUP);
+  /* Send signal SIGHUP where in_sh_body will be set to 1 in sh_body.  */
+  int r = pthread_kill (th, SIGHUP);
   if (r)
     {
       errno = r;
@@ -137,8 +141,9 @@  do_one_test (void)
     }
 
   while (in_sh_body == 0)
-    sleep (1);
+    ;
 
+  /* Cancel the thread while the signal handler blocks in read in sh_body.  */
   if (pthread_cancel (th) != 0)
     {
       puts ("cancel failed");
@@ -172,6 +177,8 @@  do_one_test (void)
   close (fd[1]);
   close (fd[2]);
   close (fd[3]);
+  close (fd[4]);
+  close (fd[5]);
 
   return 0;
 }
@@ -195,12 +202,6 @@  do_test (void)
       return 1;
     }
 
-  if (pthread_barrier_init (&b, NULL, 2) != 0)
-    {
-      puts ("barrier_init failed");
-      return 1;
-    }
-
   struct sigaction sa;
   sa.sa_handler = sh;
   sigemptyset (&sa.sa_mask);
diff --git a/nptl/tst-cancel21.c b/nptl/tst-cancel21.c
index 361510b027..30e396dee2 100644
--- a/nptl/tst-cancel21.c
+++ b/nptl/tst-cancel21.c
@@ -22,11 +22,10 @@ 
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/wait.h>
-#include <unistd.h>
+#include <support/xunistd.h>
 
 
-static int fd[4];
-static pthread_barrier_t b;
+static int fd[6];
 volatile int in_sh_body;
 unsigned long cleanups;
 
@@ -69,17 +68,21 @@  sh (int sig)
 static void __attribute__((noinline))
 tf_body (void)
 {
-  char c;
+  char c = 0;
 
   pthread_cleanup_push (cl, (void *) 3L);
 
-  int r = pthread_barrier_wait (&b);
-  if (r != 0 && r != PTHREAD_BARRIER_SERIAL_THREAD)
-    {
-      puts ("child thread: barrier_wait failed");
-      exit (1);
-    }
-
+  /* Write to the pipe in order to inform tf that we are now within
+     pthread_cleanup_push / pop and tf can send us SIGHUP and cancel
+     this thread.
+     We do not use pthread_barrier_wait here as it is marked with
+     attribute ((__nothrow__)) and the region in .gcc_except_table section
+     which is responsible for calling the cleanup-handler would start just
+     before the next call to read in case of nptl/tst-cancelx21.c where we
+     build with "gcc -fexceptions -fasynchronous-unwind-tables".  */
+  xwrite (fd[5], &c, 1);
+
+  /* Blocks until either SIGHUP is received or the thread is canceled.  */
   if (read (fd[0], &c, 1) == 1)
     {
       puts ("read succeeded");
@@ -97,16 +100,17 @@  tf (void *arg)
 {
   pthread_t th = (pthread_t) arg;
 
-  int r = pthread_barrier_wait (&b);
-  if (r != 0 && r != PTHREAD_BARRIER_SERIAL_THREAD)
+  /* Block until the main thread is within pthread_cleanup_push / pop
+     in tf_body which is called by do_one_test.  */
+  char c;
+  if (read (fd[4], &c, 1) != 1)
     {
-      puts ("parent thread: barrier_wait failed");
+      printf ("Reading from pipe failed %m\n");
       exit (1);
     }
 
-  sleep (1);
-
-  r = pthread_kill (th, SIGHUP);
+  /* Send signal SIGHUP where in_sh_body will be set to 1 in sh_body.  */
+  int r = pthread_kill (th, SIGHUP);
   if (r)
     {
       errno = r;
@@ -115,8 +119,10 @@  tf (void *arg)
     }
 
   while (in_sh_body == 0)
-    sleep (1);
+    ;
 
+  /* Cancel the parent thread while the signal handler blocks in read
+     in sh_body.  */
   if (pthread_cancel (th) != 0)
     {
       puts ("cancel failed");
@@ -142,11 +148,6 @@  tf (void *arg)
       exit (1);
     }
 
-  if (pthread_barrier_destroy (&b))
-    {
-      puts ("barrier destroy failed");
-      exit (1);
-    }
 
   /* The pipe closing must be issued after the cancellation handling to avoid
      a race condition where the cancellation runs after both pipe ends are
@@ -156,6 +157,8 @@  tf (void *arg)
   close (fd[1]);
   close (fd[2]);
   close (fd[3]);
+  close (fd[4]);
+  close (fd[5]);
 
   exit (0);
 }
@@ -186,14 +189,8 @@  do_one_test (void)
       return !WIFEXITED (status) || WEXITSTATUS (status);
     }
 
-  if (pthread_barrier_init (&b, NULL, 2) != 0)
-    {
-      puts ("barrier_init failed");
-      exit (1);
-    }
-
   cleanups = 0;
-  if (pipe (fd) != 0 || pipe (fd + 2) != 0)
+  if (pipe (fd) != 0 || pipe (fd + 2) != 0 || pipe (fd + 4) != 0)
     {
       puts ("pipe failed");
       exit (1);