From patchwork Tue Sep 8 16:30:09 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pedro Alves X-Patchwork-Id: 8600 Received: (qmail 17230 invoked by alias); 8 Sep 2015 16:30:21 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 17211 invoked by uid 89); 8 Sep 2015 16:30:19 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=AWL, BAYES_20, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_PASS, T_RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Tue, 08 Sep 2015 16:30:12 +0000 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (Postfix) with ESMTPS id A52E48EA23; Tue, 8 Sep 2015 16:30:11 +0000 (UTC) Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t88GU9Nl028573; Tue, 8 Sep 2015 12:30:10 -0400 Message-ID: <55EF0D11.2020200@redhat.com> Date: Tue, 08 Sep 2015 17:30:09 +0100 From: Pedro Alves User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Sandra Loosemore , gdb-patches CC: Yao Qi Subject: Re: [RFC] fix gdb.threads/non-stop-fair-events.exp timeouts References: <55E9CCCD.7060604@codesourcery.com> In-Reply-To: <55E9CCCD.7060604@codesourcery.com> On 09/04/2015 05:54 PM, Sandra Loosemore wrote: > While running GDB tests on nios2-linux-gnu with gdbserver and "target > remote", I've been seeing random failures in > gdb.threads/non-stop-fair-events.exp. E.g. in one test run I got > > FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 1 > broke out of loop (timeout) > FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 2 > broke out of loop (timeout) > FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 3 > broke out of loop (timeout) > FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=7: thread 1 > broke out of loop (timeout) > FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=10: thread 1 > broke out of loop (timeout) > FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=10: thread 2 > broke out of loop (timeout) > > and in other test runs I got a different ones. The pattern seemed to be > that sometimes it took an extra long time for the first thread to break > out of the loop, but once that happened they would all stop correctly > and send the expected replies even though GDB had given up on waiting > for the first few already. Yeah, I've seen this before with a local series I use for debugging software single-step things that implements software single-stepping on x86. I just re-tried it now after rebasing that series to current mainline, and I still see the time outs against gdbserver. AFAICS, nios2 is a software single-step target that does not implement displaced stepping either. I had a patch for this that I had never posted. See attached. > > I've come up with the attached patch to factor the timeout for the > failing tests by the number of threads still running, which seems to > take care of the problem. Does this seem reasonable? I'd rather avoid it unconditionally; it's just 10 threads, and if the target supports displaced stepping, if starvation avoidance in gdb is working correctly, the test should complete quickly. I takes a couple seconds on my getting-old x86-64 laptop. > > I'm somewhat confused because, in spite of it sometimes taking at least > 3 times the normal timeout for the first stop message to appear, the > alarm in the test case (which is tied to the normal timeout) was never > triggering. My best theory on that is that the slowness is not in the > test case, but rather in gdbserver. IOW, all the threads are already > stopped by the time the alarm would expire, but gdb and gdbserver > haven't finished all the notifications and requests to print a stop > message for any of the threads yet. Is that plausible? Should the > timeout for the alarm be factored by the number of threads, too, just to > be safe? Or maybe it was, and the SIGALRM never manages to be processed by gdb and passed down to the inferior. > > I'm also not entirely sure what this test case is supposed to test. > From the original commit message and comments in the .exp file it seems > like timeouts were supposed to be a sign of a broken kernel with thread > starvation problems, not bugs in gdb or gdbserver. On the kernel side, "waitpid(-1, ...)" just walks the task list linearly looking for the first that had an event. Say you have two threads, A and B which are constantly hitting events/breakpoints. If A is quick enough, "waitpid(-1, ...)" returns the event for thread A over and over, and thread B is starved. The linux backends in both gdb and gdbserver have code in place that picks an event LWP at random out of all that have had events. A similar problem exists as soon as events are queued out of the target backends into gdb's core run control -- events can end up pending for processing later in gdb's core data structures too, and so if gdb just picked those events by walking its own thread list looking for the first thread that has an event pending, it'd starve some threads. So again infrun.c has similar randomization code to avoid starvation (random_pending_event_thread). > But, don't we > normally just skip tests that the target doesn't support or can't run > properly, rather than report them as FAILs? And, I don't know how to > distinguish timeouts that mean the kernel is broken from timeouts that > mean the target is just slow and you need to set a bigger value in the > test harness. Pedro Alves From ccd1e1cca1409ea8eb6f564561280f345f321077 Mon Sep 17 00:00:00 2001 From: Pedro Alves Date: Tue, 8 Sep 2015 17:26:02 +0100 Subject: [PATCH 2/2] non-stop-fair-events.exp slower on software single-step && !displ-step targets On software single-step targets that don't support displaced stepping, threads keep hitting each other's single-step breakpoints, and then GDB needs to pause all threads to step past those. The end result is that progress in the main thread will be slower and it may take a bit longer for the signal to be queued. This patch bumps the timeout on such targets. gdb/testsuite/ChangeLog: 2015-09-08 Pedro Alves * gdb.threads/non-stop-fair-events.c (timeout): New global. (SECONDS): Redefine. (main): Call pthread_kill and alarm early. * gdb.threads/non-stop-fair-events.exp: Probe displaced stepping support. (test): If the target can't hardware step and doesn't support displaced stepping, increase the timeout. --- gdb/testsuite/gdb.threads/non-stop-fair-events.c | 9 ++- gdb/testsuite/gdb.threads/non-stop-fair-events.exp | 91 +++++++++++++++------- gdb/testsuite/lib/gdb.exp | 20 +++-- 3 files changed, 82 insertions(+), 38 deletions(-) diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.c b/gdb/testsuite/gdb.threads/non-stop-fair-events.c index f82c366..700676b 100644 --- a/gdb/testsuite/gdb.threads/non-stop-fair-events.c +++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.c @@ -24,7 +24,9 @@ const int num_threads = NUM_THREADS; /* Allow for as much timeout as DejaGnu wants, plus a bit of slack. */ -#define SECONDS (TIMEOUT + 20) + +volatile unsigned int timeout = TIMEOUT; +#define SECONDS (timeout + 20) pthread_t child_thread[NUM_THREADS]; volatile pthread_t signal_thread; @@ -69,6 +71,11 @@ main (void) int res; int i; + /* Call these early so that we're sure their PLTs are quickly + resolved now, instead of in the busy threads. */ + pthread_kill (pthread_self (), 0); + alarm (0); + signal (SIGUSR1, handler); for (i = 0; i < NUM_THREADS; i++) diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp index 37f5bcb..ba14f6a 100644 --- a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp +++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp @@ -62,6 +62,19 @@ set NUM_THREADS [get_value "num_threads" "get num_threads"] # Account for the main thread. incr NUM_THREADS +# Probe for displaced stepping support. +set displaced_stepping_enabled 0 +set msg "check displaced-stepping" +gdb_test_no_output "set debug displaced 1" +gdb_test_multiple "si" $msg { + -re "displaced pc to.*$gdb_prompt $" { + set displaced_stepping_enabled 1 + } + -re ".*$gdb_prompt $" { + } +} +gdb_test_no_output "set debug displaced 0" + # Run threads to their start positions. This prepares for a new test # sequence. @@ -127,6 +140,8 @@ proc enable_debug {enable} { proc test {signal_thread} { global gdb_prompt global NUM_THREADS + global timeout + global displaced_stepping_enabled with_test_prefix "signal_thread=$signal_thread" { restart @@ -152,42 +167,58 @@ proc test {signal_thread} { enable_debug 1 - set saw_continuing 0 - set test "continue &" - gdb_test_multiple $test $test { - -re "Continuing.\r\n" { - set saw_continuing 1 - exp_continue - } - -re "$gdb_prompt " { - gdb_assert $saw_continuing $test - } - -re "infrun:" { - exp_continue - } + # On software single-step targets that don't support displaced + # stepping, threads keep hitting each others' single-step + # breakpoints, and then GDB needs to pause all threads to step + # past those. The end result is that progress in the main + # thread will be slower and it may take a bit longer for the + # signal to be queued; bump the timeout. + if {!$displaced_stepping_enabled && ![can_hardware_single_step]} { + set factor 5 + } else { + set factor 1 } - - set gotit 0 - - # Wait for all threads to finish their steps, and for the main - # thread to hit the breakpoint. - for {set i 1} { $i <= $NUM_THREADS } { incr i } { - set test "thread $i broke out of loop" - set gotit 0 - gdb_test_multiple "" $test { - -re "loop_broke" { - # The prompt was already matched in the "continue - # &" test above. We're now consuming asynchronous - # output that comes after the prompt. - set gotit 1 - pass $test + with_timeout_factor $factor { + gdb_test "print timeout = $timeout" " = $timeout" \ + "set timeout in the inferior" + + set saw_continuing 0 + set test "continue &" + gdb_test_multiple $test $test { + -re "Continuing.\r\n" { + set saw_continuing 1 + exp_continue + } + -re "$gdb_prompt " { + gdb_assert $saw_continuing $test } -re "infrun:" { exp_continue } } - if {!$gotit} { - break + + set gotit 0 + + # Wait for all threads to finish their steps, and for the main + # thread to hit the breakpoint. + for {set i 1} { $i <= $NUM_THREADS } { incr i } { + set test "thread $i broke out of loop" + set gotit 0 + gdb_test_multiple "" $test { + -re "loop_broke" { + # The prompt was already matched in the "continue + # &" test above. We're now consuming asynchronous + # output that comes after the prompt. + set gotit 1 + pass $test + } + -re "infrun:" { + exp_continue + } + } + if {!$gotit} { + break + } } } diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp index 56cde7a..9eaf721 100644 --- a/gdb/testsuite/lib/gdb.exp +++ b/gdb/testsuite/lib/gdb.exp @@ -2150,15 +2150,10 @@ proc supports_get_siginfo_type {} { } } -# Return 1 if target hardware or OS supports single stepping to signal -# handler, otherwise, return 0. +# Return 1 if the target supports hardware single stepping. -proc can_single_step_to_signal_handler {} { +proc can_hardware_single_step {} { - # Targets don't have hardware single step. On these targets, when - # a signal is delivered during software single step, gdb is unable - # to determine the next instruction addresses, because start of signal - # handler is one of them. if { [istarget "arm*-*-*"] || [istarget "mips*-*-*"] || [istarget "tic6x-*-*"] || [istarget "sparc*-*-linux*"] || [istarget "nios2-*-*"] } { @@ -2168,6 +2163,17 @@ proc can_single_step_to_signal_handler {} { return 1 } +# Return 1 if target hardware or OS supports single stepping to signal +# handler, otherwise, return 0. + +proc can_single_step_to_signal_handler {} { + # Targets don't have hardware single step. On these targets, when + # a signal is delivered during software single step, gdb is unable + # to determine the next instruction addresses, because start of signal + # handler is one of them. + return [can_hardware_single_step] +} + # Return 1 if target supports process record, otherwise return 0. proc supports_process_record {} { -- 1.9.3