[RFAv2] Fix tid-reuse sometimes blocks for a very long (infinite?) time.

Message ID 20181204230917.2245-1-philippe.waroquiers@skynet.be
State New, archived

Commit Message

Philippe Waroquiers Dec. 4, 2018, 11:09 p.m. UTC
The following failure seems to be what makes the test block for a very long
(possibly infinite) time:

For an unclear reason, the tid-reuse.c spawner thread sometimes fails with:
     tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed.

which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the
after_count breakpoint:
  Thread 2 "tid-reuse" received signal SIGABRT, Aborted.
  [Switching to Thread 0x7ffff7518700 (LWP 10368)]
  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
  51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
  (gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count

After that, tid-reuse.exp reads the value of reuse_time, which still has its
initial value of -1 (converted to unsigned):
  print reuse_time
  $1 = 4294967295
  (gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time
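
The printed value is what -1 becomes when it is converted to a 32-bit
unsigned int. A minimal sketch of that conversion, using the pre-patch
initializer; the helper name format_reuse_time is hypothetical and exists
only for illustration:

```c
#include <limits.h>
#include <stdio.h>

/* Pre-patch initializer from tid-reuse.c: -1 converted to unsigned int
   wraps around to UINT_MAX.  */
unsigned int reuse_time = -1;

/* Hypothetical helper: format reuse_time as an unsigned decimal, which
   is how GDB's "print reuse_time" displays it.  Returns the number of
   characters snprintf would write.  */
int
format_reuse_time (char *buf, size_t size)
{
  return snprintf (buf, size, "%u", reuse_time);
}
```

With a 32-bit unsigned int, the formatted string is the 4294967295 seen in
the GDB output above.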

tid-reuse then dies, and the .exp script continues (with some FAILs)
until it executes:
  set timeout [expr $reuse_time * 2]

leading to the error:

  (gdb) ERROR: integer value too large to represent as non-long integer
      while executing
  "expect {
  -i exp8 -timeout 8589934590
          -re ".*A problem internal to GDB has been detected" {
              fail "$message (GDB internal error)"
              gdb_intern..."
      ("uplevel" body line 1)
      invoked from within
  "uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer
  ERROR: GDB process no longer exists
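
The -timeout value in the error above is the unset reuse_time doubled:
4294967295 * 2 = 8589934590, which does not fit in a 32-bit int. A small
sketch of that arithmetic, assuming 32-bit int and unsigned int; both
function names are hypothetical, and the idea that the timeout has to fit
in a plain int is inferred from the error text, not checked against the
expect sources:

```c
#include <limits.h>
#include <stdint.h>

/* Mirrors "expr $reuse_time * 2" from tid-reuse.exp.  The product is
   computed in 64 bits so the multiplication itself cannot overflow.  */
int64_t
doubled_timeout (unsigned int reuse_time)
{
  return (int64_t) reuse_time * 2;
}

/* Hypothetical check: does VALUE fit in a plain int?  */
int
fits_in_int (int64_t value)
{
  return value >= INT_MIN && value <= INT_MAX;
}
```

With reuse_time capped at 60, the doubled timeout is 120 and fits easily.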

and then everything blocks.
This last 'GDB process no longer exists' is strange, as I can still see the
gdb process when everything blocks, e.g.
philippe 16058 31085  0 20:30 pts/15   00:00:00                         /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [
philippe 16386 16058  0 20:30 pts/15   00:00:00                           expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre
philippe 24848 16386  0 20:30 pts/20   00:00:00                             /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip

This patch gives reuse_time a default value of 60, so that if anything goes
wrong in tid-reuse, the value retrieved by the .exp script stays in a
reasonable range.

Simon verified the patch by:
"I replaced the pthread_create call with the value 1 to simulate a
failure, and the test succeeds to fail quickly with your patch applied.
Without your patch, I get the infinite hang that you describe."

Compared to V1:
As suggested by Pedro, this version checks the return codes of the pthread
calls (in particular pthread_create) and reports the failure reason,
instead of just aborting.
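
pthread functions return the error number directly instead of setting
errno, so the return code can be passed straight to strerror. A sketch of
the kind of diagnostic this produces; describe_rc is a hypothetical,
non-aborting variant of the patch's check_rc that writes the message into
a buffer so it can be inspected:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of check_rc: instead of printing to stderr and
   asserting, format the same diagnostic into BUF.  Returns 0 if RC
   indicates success, 1 if a message was written.  */
int
describe_rc (int rc, const char *what, char *buf, size_t size)
{
  if (rc == 0)
    return 0;

  /* RC is the error number itself, as returned by pthread_create or
     pthread_join.  */
  snprintf (buf, size, "unexpected error from %s: %s (%d)",
	    what, strerror (rc), rc);
  return 1;
}
```

For example, describe_rc (EAGAIN, "pthread_create", buf, sizeof buf) fills
buf with a message that names the failing call and the error, where the old
assert only reported `rc == 0' failed.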

gdb/testsuite/ChangeLog

2018-12-04  Philippe Waroquiers  <philippe.waroquiers@skynet.be>

	* gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60.
	(reuse_time): Initialize to REUSE_TIME_CAP.
	(check_rc): New function.
	(main): Use REUSE_TIME_CAP instead of hardcoded 60.
	Check pthread_create rc.
	(spawner_thread_func): Check pthread_create and pthread_join rc.
---
 gdb/testsuite/gdb.threads/tid-reuse.c | 29 ++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)
  

Comments

Simon Marchi Dec. 9, 2018, 12:47 a.m. UTC | #1
On 2018-12-04 18:09, Philippe Waroquiers wrote:
> +void
> +check_rc (int rc, const char *what)

This can be static.

The patch LGTM with that fixed.

Simon
  
Philippe Waroquiers Dec. 9, 2018, 9:05 a.m. UTC | #2
On Sat, 2018-12-08 at 19:47 -0500, Simon Marchi wrote:
> > +void
> > +check_rc (int rc, const char *what)
> 
> This can be static.
> 
> The patch LGTM with that fixed.
> 
> Simon
Thanks, fixed, retested and pushed.

Philippe
  

Patch

diff --git a/gdb/testsuite/gdb.threads/tid-reuse.c b/gdb/testsuite/gdb.threads/tid-reuse.c
index 1741325a5b..523f87bdea 100644
--- a/gdb/testsuite/gdb.threads/tid-reuse.c
+++ b/gdb/testsuite/gdb.threads/tid-reuse.c
@@ -21,6 +21,7 @@ 
 #include <stdlib.h>
 #include <unistd.h>
 #include <stdio.h>
+#include <string.h>
 #include <limits.h>
 
 /* How many threads fit in the target's thread number space.  */
@@ -34,8 +35,11 @@  unsigned long thread_counter;
    incremented, this is enough for the tid numbers to wrap around.  On
    targets that randomize thread IDs, this is enough time to give each
    number in the thread number space some chance of reuse.  It'll be
-   capped to a lower value if we can't compute it.  */
-unsigned int reuse_time = -1;
+   capped to a lower value if we can't compute it.  REUSE_TIME_CAP
+   is the max value, and the default value if ever the program
+   has problem to compute it.  */
+#define REUSE_TIME_CAP 60
+unsigned int reuse_time = REUSE_TIME_CAP;
 
 void *
 do_nothing_thread_func (void *arg)
@@ -44,6 +48,17 @@  do_nothing_thread_func (void *arg)
   return NULL;
 }
 
+void
+check_rc (int rc, const char *what)
+{
+  if (rc != 0)
+    {
+      fprintf (stderr, "unexpected error from %s: %s (%d)\n",
+	       what, strerror (rc), rc);
+      assert (0);
+    }
+}
+
 void *
 spawner_thread_func (void *arg)
 {
@@ -55,10 +70,10 @@  spawner_thread_func (void *arg)
       thread_counter++;
 
       rc = pthread_create (&child, NULL, do_nothing_thread_func, NULL);
-      assert (rc == 0);
+      check_rc (rc, "pthread_create");
 
       rc = pthread_join (child, NULL);
-      assert (rc == 0);
+      check_rc (rc, "pthread_join");
     }
 
   return NULL;
@@ -115,7 +130,7 @@  main (int argc, char *argv[])
   unsigned int reuse_time_raw = 0;
 
   rc = pthread_create (&child, NULL, spawner_thread_func, NULL);
-  assert (rc == 0);
+  check_rc (rc, "pthread_create spawner_thread");
 
 #define COUNT_TIME 2
   sleep (COUNT_TIME);
@@ -138,8 +153,8 @@  main (int argc, char *argv[])
      pid_max=32768.  Going forward, as machines get faster, this will
      need less time, unless pid_max is set to a very high number.  To
      avoid unreasonably long test time, cap to an upper bound.  */
-  if (reuse_time > 60)
-    reuse_time = 60;
+  if (reuse_time > REUSE_TIME_CAP)
+    reuse_time = REUSE_TIME_CAP;
   printf ("thread_counter=%lu, tid_max = %ld, reuse_time_raw=%u, reuse_time=%u\n",
 	  thread_counter, tid_max, reuse_time_raw, reuse_time);
   after_count ();