Message ID | 20181104200048.2463-1-philippe.waroquiers@skynet.be |
---|---|
State | New, archived |
Headers |
Received: (qmail 74798 invoked by alias); 4 Nov 2018 20:01:10 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 74729 invoked by uid 89); 4 Nov 2018 20:01:01 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=philip, cap, switching, D*entry X-HELO: mailsec109.isp.belgacom.be Received: from mailsec109.isp.belgacom.be (HELO mailsec109.isp.belgacom.be) (195.238.20.105) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sun, 04 Nov 2018 20:00:57 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=skynet.be; i=@skynet.be; q=dns/txt; s=securemail; t=1541361657; x=1572897657; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=s8xrXuLbUzjLtZmeFRu+PVo0v/5O0jdpBZlUiBjyjJg=; b=OkYRYcazHzTjWAa+vTXIa71wRQdjr+PMIuWg0N/ZBHfTtyRMBxRNRZ6L kUzZ3UvjxsSqUlSgxXfYq0kHzXikfA==; Received: from 110.212-243-81.adsl-dyn.isp.belgacom.be (HELO md.home) ([81.243.212.110]) by relay.skynet.be with ESMTP/TLS/DHE-RSA-AES128-GCM-SHA256; 04 Nov 2018 21:00:54 +0100 From: Philippe Waroquiers <philippe.waroquiers@skynet.be> To: gdb-patches@sourceware.org Cc: Philippe Waroquiers <philippe.waroquiers@skynet.be> Subject: [RFA] Fix tid-reuse sometimes blocks for a very long (infinite?) time. Date: Sun, 4 Nov 2018 21:00:48 +0100 Message-Id: <20181104200048.2463-1-philippe.waroquiers@skynet.be> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-IsSubscribed: yes |
Commit Message
Philippe Waroquiers
Nov. 4, 2018, 8 p.m. UTC
A failure that seems to cause a long/infinite time is the following: For a not clear reason, tid-reuse.c spawner thread sometimes gets an error: tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed. which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the after_count breakpoint: Thread 2 "tid-reuse" received signal SIGABRT, Aborted. [Switching to Thread 0x7ffff7518700 (LWP 10368)] __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count After that, tid-reuse.exp gets the value of reuse_time, but this one kept its initial value of -1 (as unsigned) : print reuse_time $1 = 4294967295 (gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time tid-reuse then dies, and the .exp script continues (with some FAIL) till it executes: set timeout [expr $reuse_time * 2] leading to the error: (gdb) ERROR: integer value too large to represent as non-long integer while executing "expect { -i exp8 -timeout 8589934590 -re ".*A problem internal to GDB has been detected" { fail "$message (GDB internal error)" gdb_intern..." ("uplevel" body line 1) invoked from within "uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer ERROR: GDB process no longer exists and then everything blocks. This last 'GDB process no longer exists' is strange, as I still see the gdb when this all blocks, e.g. philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [ philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip This patch gives a default value of 60, so that if ever something wrong happens in tid-reuse, then the value retrieved by the .exp script stays in a reasonable range. Note that I could not reproduce this failure often enough to be sure that initializing to 60 ensures it does not block, but in any case, it should not harm. gdb/testsuite/ChangeLog 2018-11-04 Philippe Waroquiers <philippe.waroquiers@skynet.be> * gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60. (reuse_time): Initialize to REUSE_TIME_CAP. (main): Use REUSE_TIME_CAP instead of hardcoded 60. --- gdb/testsuite/gdb.threads/tid-reuse.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
Comments
Thanks Philippe On Sun, 2018-11-04 at 21:00 +0100, Philippe Waroquiers wrote: > A failure that seems to cause a long/infinite time is the following: > > For a not clear reason, tid-reuse.c spawner thread sometimes gets an error: > tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed. > > which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the > after_count breakpoint: > Thread 2 "tid-reuse" received signal SIGABRT, Aborted. > [Switching to Thread 0x7ffff7518700 (LWP 10368)] > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. > (gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count > > After that, tid-reuse.exp gets the value of reuse_time, but this one kept its > initial value of -1 (as unsigned) : > print reuse_time > $1 = 4294967295 > (gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time > > tid-reuse then dies, and the .exp script continues (with some FAIL) > till it executes: > set timeout [expr $reuse_time * 2] > > leading to the error: > > (gdb) ERROR: integer value too large to represent as non-long integer > while executing > "expect { > -i exp8 -timeout 8589934590 > -re ".*A problem internal to GDB has been detected" { > fail "$message (GDB internal error)" > gdb_intern..." > ("uplevel" body line 1) > invoked from within > "uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer > ERROR: GDB process no longer exists > > and then everything blocks. > This last 'GDB process no longer exists' is strange, as I still see the gdb > when this all blocks, e.g. > philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [ > philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre > philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip > > This patch gives a default value of 60, so that if ever something wrong happens > in tid-reuse, then the value retrieved by the .exp script stays in a reasonable > range. > Note that I could not reproduce this failure often enough to be sure that > initializing to 60 ensures it does not block, but in any case, it should > not harm. > > gdb/testsuite/ChangeLog > > 2018-11-04 Philippe Waroquiers <philippe.waroquiers@skynet.be> > > * gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60. > (reuse_time): Initialize to REUSE_TIME_CAP. > (main): Use REUSE_TIME_CAP instead of hardcoded 60. > --- > gdb/testsuite/gdb.threads/tid-reuse.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/gdb/testsuite/gdb.threads/tid-reuse.c b/gdb/testsuite/gdb.threads/tid-reuse.c > index 1741325a5b..0cdd580441 100644 > --- a/gdb/testsuite/gdb.threads/tid-reuse.c > +++ b/gdb/testsuite/gdb.threads/tid-reuse.c > @@ -34,8 +34,11 @@ unsigned long thread_counter; > incremented, this is enough for the tid numbers to wrap around. On > targets that randomize thread IDs, this is enough time to give each > number in the thread number space some chance of reuse. It'll be > - capped to a lower value if we can't compute it. */ > -unsigned int reuse_time = -1; > + capped to a lower value if we can't compute it. REUSE_TIME_CAP > + is the max value, and the default value if ever the program > + has problem to compute it. */ > +#define REUSE_TIME_CAP 60 > +unsigned int reuse_time = REUSE_TIME_CAP; > > void * > do_nothing_thread_func (void *arg) > @@ -138,8 +141,8 @@ main (int argc, char *argv[]) > pid_max=32768. Going forward, as machines get faster, this will > need less time, unless pid_max is set to a very high number. To > avoid unreasonably long test time, cap to an upper bound. */ > - if (reuse_time > 60) > - reuse_time = 60; > + if (reuse_time > REUSE_TIME_CAP) > + reuse_time = REUSE_TIME_CAP; > printf ("thread_counter=%lu, tid_max = %ld, reuse_time_raw=%u, reuse_time=%u\n", > thread_counter, tid_max, reuse_time_raw, reuse_time); > after_count ();
On Sat, 2018-11-24 at 13:58 +0100, Philippe Waroquiers wrote: > Thanks > Philippe > > On Sun, 2018-11-04 at 21:00 +0100, Philippe Waroquiers wrote: > > A failure that seems to cause a long/infinite time is the following: > > > > For a not clear reason, tid-reuse.c spawner thread sometimes gets an error: > > tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed. > > > > which causes a SIGABRT to be trapped by gdb, and tid-reuse does not reach the > > after_count breakpoint: > > Thread 2 "tid-reuse" received signal SIGABRT, Aborted. > > [Switching to Thread 0x7ffff7518700 (LWP 10368)] > > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > > 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. > > (gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_count > > > > After that, tid-reuse.exp gets the value of reuse_time, but this one kept its > > initial value of -1 (as unsigned) : > > print reuse_time > > $1 = 4294967295 > > (gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time > > > > tid-reuse then dies, and the .exp script continues (with some FAIL) > > till it executes: > > set timeout [expr $reuse_time * 2] > > > > leading to the error: > > > > (gdb) ERROR: integer value too large to represent as non-long integer > > while executing > > "expect { > > -i exp8 -timeout 8589934590 > > -re ".*A problem internal to GDB has been detected" { > > fail "$message (GDB internal error)" > > gdb_intern..." > > ("uplevel" body line 1) > > invoked from within > > "uplevel $body" ARITH IOVERFLOW {integer value too large to represent as non-long integer} integer value too large to represent as non-long integer > > ERROR: GDB process no longer exists > > > > and then everything blocks. > > This last 'GDB process no longer exists' is strange, as I still see the gdb > > when this all blocks, e.g. > > philippe 16058 31085 0 20:30 pts/15 00:00:00 /bin/bash -c rootme=`pwd`; export rootme; srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; EXPECT=`if [ > > philippe 16386 16058 0 20:30 pts/15 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/tid-reuse gdb.thre > > philippe 24848 16386 0 20:30 pts/20 00:00:00 /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /bd/home/philip > > > > This patch gives a default value of 60, so that if ever something wrong happens > > in tid-reuse, then the value retrieved by the .exp script stays in a reasonable > > range. > > Note that I could not reproduce this failure often enough to be sure that > > initializing to 60 ensures it does not block, but in any case, it should > > not harm. > > > > gdb/testsuite/ChangeLog > > > > 2018-11-04 Philippe Waroquiers <philippe.waroquiers@skynet.be> > > > > * gdb.threads/tid-reuse.c (REUSE_TIME_CAP): Declare as 60. > > (reuse_time): Initialize to REUSE_TIME_CAP. > > (main): Use REUSE_TIME_CAP instead of hardcoded 60. > > --- > > gdb/testsuite/gdb.threads/tid-reuse.c | 11 +++++++---- > > 1 file changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/gdb/testsuite/gdb.threads/tid-reuse.c b/gdb/testsuite/gdb.threads/tid-reuse.c > > index 1741325a5b..0cdd580441 100644 > > --- a/gdb/testsuite/gdb.threads/tid-reuse.c > > +++ b/gdb/testsuite/gdb.threads/tid-reuse.c > > @@ -34,8 +34,11 @@ unsigned long thread_counter; > > incremented, this is enough for the tid numbers to wrap around. On > > targets that randomize thread IDs, this is enough time to give each > > number in the thread number space some chance of reuse. It'll be > > - capped to a lower value if we can't compute it. */ > > -unsigned int reuse_time = -1; > > + capped to a lower value if we can't compute it. REUSE_TIME_CAP > > + is the max value, and the default value if ever the program > > + has problem to compute it. */ > > +#define REUSE_TIME_CAP 60 > > +unsigned int reuse_time = REUSE_TIME_CAP; > > > > void * > > do_nothing_thread_func (void *arg) > > @@ -138,8 +141,8 @@ main (int argc, char *argv[]) > > pid_max=32768. Going forward, as machines get faster, this will > > need less time, unless pid_max is set to a very high number. To > > avoid unreasonably long test time, cap to an upper bound. */ > > - if (reuse_time > 60) > > - reuse_time = 60; > > + if (reuse_time > REUSE_TIME_CAP) > > + reuse_time = REUSE_TIME_CAP; > > printf ("thread_counter=%lu, tid_max = %ld, reuse_time_raw=%u, reuse_time=%u\n", > > thread_counter, tid_max, reuse_time_raw, reuse_time); > > after_count ();
On 2018-11-04 15:00, Philippe Waroquiers wrote: > A failure that seems to cause a long/infinite time is the following: > > For a not clear reason, tid-reuse.c spawner thread sometimes gets an > error: > tid-reuse: > /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: > spawner_thread_func: Assertion `rc == 0' failed. > > which causes a SIGABRT to be trapped by gdb, and tid-reuse does not > reach the > after_count breakpoint: > Thread 2 "tid-reuse" received signal SIGABRT, Aborted. > [Switching to Thread 0x7ffff7518700 (LWP 10368)] > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. > (gdb) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: > after_count > > After that, tid-reuse.exp gets the value of reuse_time, but this one > kept its > initial value of -1 (as unsigned) : > print reuse_time > $1 = 4294967295 > (gdb) PASS: gdb.threads/tid-reuse.exp: get reuse_time > > tid-reuse then dies, and the .exp script continues (with some FAIL) > till it executes: > set timeout [expr $reuse_time * 2] > > leading to the error: > > (gdb) ERROR: integer value too large to represent as non-long integer > while executing > "expect { > -i exp8 -timeout 8589934590 > -re ".*A problem internal to GDB has been detected" { > fail "$message (GDB internal error)" > gdb_intern..." > ("uplevel" body line 1) > invoked from within > "uplevel $body" ARITH IOVERFLOW {integer value too large to > represent as non-long integer} integer value too large to represent as > non-long integer > ERROR: GDB process no longer exists > > and then everything blocks. > This last 'GDB process no longer exists' is strange, as I still see the > gdb > when this all blocks, e.g. > philippe 16058 31085 0 20:30 pts/15 00:00:00 > /bin/bash -c rootme=`pwd`; export rootme; > srcdir=../../../binutils-gdb/gdb/testsuite ; export srcdir ; > EXPECT=`if [ > philippe 16386 16058 0 20:30 pts/15 00:00:00 > expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes > --outdir=outputs/gdb.threads/tid-reuse gdb.thre > philippe 24848 16386 0 20:30 pts/20 00:00:00 > > /bd/home/philippe/gdb/git/build_binutils-gdb/gdb/testsuite/../../gdb/gdb > -nw -nx -data-directory /bd/home/philip > > This patch gives a default value of 60, so that if ever something wrong > happens > in tid-reuse, then the value retrieved by the .exp script stays in a > reasonable > range. > Note that I could not reproduce this failure often enough to be sure > that > initializing to 60 ensures it does not block, but in any case, it > should > not harm. I replaced the pthread_create call with the value 1 to simulate a failure, and the test succeeds to fail quickly with your patch applied. Without your patch, I get the infinite hang that you describe. The patch LGTM, thanks! Simon
On 11/04/2018 08:00 PM, Philippe Waroquiers wrote: > A failure that seems to cause a long/infinite time is the following: > > For a not clear reason, tid-reuse.c spawner thread sometimes gets an error: > tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed. Odd. It'd be interesting to know what was the value of 'rc'. I suspect it was EAGAIN, but I'm not seeing why you'd get that: the test spawns each thread and waits for it to die before spawning a new one. Maybe you were testing in parallel mode, and some other testcase happens to exhaust the number of threads simultaneously? I'm thinking of perhaps gdb.threads/attach-many-short-lived-threads.exp, for example. Thanks, Pedro Alves
On Tue, 2018-12-04 at 15:23 +0000, Pedro Alves wrote: > On 11/04/2018 08:00 PM, Philippe Waroquiers wrote: > > A failure that seems to cause a long/infinite time is the following: > > > > For a not clear reason, tid-reuse.c spawner thread sometimes gets an error: > > tid-reuse: /bd/home/philippe/gdb/git/build_moreaa/gdb/testsuite/../../../moreaa/gdb/testsuite/gdb.threads/tid-reuse.c:58: spawner_thread_func: Assertion `rc == 0' failed. > > Odd. It'd be interesting to know what was the value of 'rc'. > I suspect it was EAGAIN, but I'm not seeing why you'd get that: > the test spawns each thread and waits for it to die before spawning > a new one. Maybe you were testing in parallel mode, and some other > testcase happens to exhaust the number of threads simultaneously? > I'm thinking of perhaps gdb.threads/attach-many-short-lived-threads.exp, > for example. Yes, I also suspect it will be EAGAIN (at least man pthread_create does not show much other possibilities). I have submitted an RFAv2 : compared to v1, it reports the value of 'rc' before crashing. Thanks Philippe
diff --git a/gdb/testsuite/gdb.threads/tid-reuse.c b/gdb/testsuite/gdb.threads/tid-reuse.c index 1741325a5b..0cdd580441 100644 --- a/gdb/testsuite/gdb.threads/tid-reuse.c +++ b/gdb/testsuite/gdb.threads/tid-reuse.c @@ -34,8 +34,11 @@ unsigned long thread_counter; incremented, this is enough for the tid numbers to wrap around. On targets that randomize thread IDs, this is enough time to give each number in the thread number space some chance of reuse. It'll be - capped to a lower value if we can't compute it. */ -unsigned int reuse_time = -1; + capped to a lower value if we can't compute it. REUSE_TIME_CAP + is the max value, and the default value if ever the program + has problem to compute it. */ +#define REUSE_TIME_CAP 60 +unsigned int reuse_time = REUSE_TIME_CAP; void * do_nothing_thread_func (void *arg) @@ -138,8 +141,8 @@ main (int argc, char *argv[]) pid_max=32768. Going forward, as machines get faster, this will need less time, unless pid_max is set to a very high number. To avoid unreasonably long test time, cap to an upper bound. */ - if (reuse_time > 60) - reuse_time = 60; + if (reuse_time > REUSE_TIME_CAP) + reuse_time = REUSE_TIME_CAP; printf ("thread_counter=%lu, tid_max = %ld, reuse_time_raw=%u, reuse_time=%u\n", thread_counter, tid_max, reuse_time_raw, reuse_time); after_count ();