From patchwork Tue Sep  8 16:30:09 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pedro Alves <palves@redhat.com>
X-Patchwork-Id: 8600
Received: (qmail 17230 invoked by alias); 8 Sep 2015 16:30:21 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Delivered-To: mailing list gdb-patches@sourceware.org
Received: (qmail 17211 invoked by uid 89); 8 Sep 2015 16:30:19 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=AWL, BAYES_20,
	KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_PASS,
	T_RP_MATCHES_RCVD autolearn=no version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with
	(AES256-GCM-SHA384 encrypted) ESMTPS;
	Tue, 08 Sep 2015 16:30:12 +0000
Received: from int-mx10.intmail.prod.int.phx2.redhat.com
	(int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])	by
	mx1.redhat.com (Postfix) with ESMTPS id A52E48EA23;
	Tue,  8 Sep 2015 16:30:11 +0000 (UTC)
Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com
	[10.39.146.11])	by int-mx10.intmail.prod.int.phx2.redhat.com
	(8.14.4/8.14.4) with ESMTP id t88GU9Nl028573;
	Tue, 8 Sep 2015 12:30:10 -0400
Message-ID: <55EF0D11.2020200@redhat.com>
Date: Tue, 08 Sep 2015 17:30:09 +0100
From: Pedro Alves <palves@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: Sandra Loosemore <sandra@codesourcery.com>,
	gdb-patches <gdb-patches@sourceware.org>
CC: Yao Qi <yao.qi@linaro.org>
Subject: Re: [RFC] fix gdb.threads/non-stop-fair-events.exp timeouts
References: <55E9CCCD.7060604@codesourcery.com>
In-Reply-To: <55E9CCCD.7060604@codesourcery.com>

On 09/04/2015 05:54 PM, Sandra Loosemore wrote:
> While running GDB tests on nios2-linux-gnu with gdbserver and "target 
> remote", I've been seeing random failures in 
> gdb.threads/non-stop-fair-events.exp.  E.g. in one test run I got
> 
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 1 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 2 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 3 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=7: thread 1 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=10: thread 1 
> broke out of loop (timeout)
> FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=10: thread 2 
> broke out of loop (timeout)
> 
> and in other test runs I got a different ones.  The pattern seemed to be 
> that sometimes it took an extra long time for the first thread to break 
> out of the loop, but once that happened they would all stop correctly 
> and send the expected replies even though GDB had given up on waiting 
> for the first few already.

Yeah, I've seen this before with a local series I use for debugging
software single-step things that implements software single-stepping
on x86.  I just re-tried it now after rebasing that series to
current mainline, and I still see the time outs against gdbserver.

AFAICS, nios2 is a software single-step target that does not implement
displaced stepping either.  I had a patch for this that I had
never posted.  See attached.

> 
> I've come up with the attached patch to factor the timeout for the 
> failing tests by the number of threads still running, which seems to 
> take care of the problem.  Does this seem reasonable?

I'd rather avoid it unconditionally; it's just 10 threads, and if the
target supports displaced stepping, if starvation avoidance in gdb is
working correctly, the test should complete quickly.  I takes a couple
seconds on my getting-old x86-64 laptop.

> 
> I'm somewhat confused because, in spite of it sometimes taking at least 
> 3 times the normal timeout for the first stop message to appear, the 
> alarm in the test case (which is tied to the normal timeout) was never 
> triggering.  My best theory on that is that the slowness is not in the 
> test case, but rather in gdbserver.  IOW, all the threads are already 
> stopped by the time the alarm would expire, but gdb and gdbserver 
> haven't finished all the notifications and requests to print a stop 
> message for any of the threads yet.  Is that plausible?  Should the 
> timeout for the alarm be factored by the number of threads, too, just to 
> be safe?

Or maybe it was, and the SIGALRM never manages to be processed by gdb
and passed down to the inferior.

> 
> I'm also not entirely sure what this test case is supposed to test. 
>  From the original commit message and comments in the .exp file it seems 
> like timeouts were supposed to be a sign of a broken kernel with thread 
> starvation problems, not bugs in gdb or gdbserver.  

On the kernel side, "waitpid(-1, ...)" just walks the task list linearly
looking for the first that had an event.  Say you have two threads, A and
B which are constantly hitting events/breakpoints.  If A is quick enough,
"waitpid(-1, ...)" returns the event for thread A over and over, and thread B is
starved.  The linux backends in both gdb and gdbserver have code in place that
picks an event LWP at random out of all that have had events.  A similar problem
exists as soon as events are queued out of the target backends into gdb's core
run control -- events can end up pending for processing later in gdb's core
data structures too, and so if gdb just picked those events by walking its own
thread list looking for the first thread that has an event pending, it'd starve
some threads.  So again infrun.c has similar randomization code to avoid
starvation (random_pending_event_thread).

> But, don't we 
> normally just skip tests that the target doesn't support or can't run 
> properly, rather than report them as FAILs?  And, I don't know how to 
> distinguish timeouts that mean the kernel is broken from timeouts that 
> mean the target is just slow and you need to set a bigger value in the 
> test harness.

Pedro Alves

From ccd1e1cca1409ea8eb6f564561280f345f321077 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 8 Sep 2015 17:26:02 +0100
Subject: [PATCH 2/2] non-stop-fair-events.exp slower on software single-step
 && !displ-step targets

On software single-step targets that don't support displaced stepping,
threads keep hitting each other's single-step breakpoints, and then
GDB needs to pause all threads to step past those.  The end result is
that progress in the main thread will be slower and it may take a bit
longer for the signal to be queued.  This patch bumps the timeout on
such targets.

gdb/testsuite/ChangeLog:
2015-09-08  Pedro Alves  <palves@redhat.com>

	* gdb.threads/non-stop-fair-events.c (timeout): New global.
	(SECONDS): Redefine.
	(main): Call pthread_kill and alarm early.
	* gdb.threads/non-stop-fair-events.exp: Probe displaced stepping
	support.
	(test): If the target can't hardware step and doesn't support
	displaced stepping, increase the timeout.
---
 gdb/testsuite/gdb.threads/non-stop-fair-events.c   |  9 ++-
 gdb/testsuite/gdb.threads/non-stop-fair-events.exp | 91 +++++++++++++++-------
 gdb/testsuite/lib/gdb.exp                          | 20 +++--
 3 files changed, 82 insertions(+), 38 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.c b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
index f82c366..700676b 100644
--- a/gdb/testsuite/gdb.threads/non-stop-fair-events.c
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.c
@@ -24,7 +24,9 @@
 const int num_threads = NUM_THREADS;
 /* Allow for as much timeout as DejaGnu wants, plus a bit of
    slack.  */
-#define SECONDS (TIMEOUT + 20)
+
+volatile unsigned int timeout = TIMEOUT;
+#define SECONDS (timeout + 20)
 
 pthread_t child_thread[NUM_THREADS];
 volatile pthread_t signal_thread;
@@ -69,6 +71,11 @@ main (void)
   int res;
   int i;
 
+  /* Call these early so that we're sure their PLTs are quickly
+     resolved now, instead of in the busy threads.  */
+  pthread_kill (pthread_self (), 0);
+  alarm (0);
+
   signal (SIGUSR1, handler);
 
   for (i = 0; i < NUM_THREADS; i++)
diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
index 37f5bcb..ba14f6a 100644
--- a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
@@ -62,6 +62,19 @@ set NUM_THREADS [get_value "num_threads" "get num_threads"]
 # Account for the main thread.
 incr NUM_THREADS
 
+# Probe for displaced stepping support.
+set displaced_stepping_enabled 0
+set msg "check displaced-stepping"
+gdb_test_no_output "set debug displaced 1"
+gdb_test_multiple "si" $msg {
+    -re "displaced pc to.*$gdb_prompt $" {
+	set displaced_stepping_enabled 1
+    }
+    -re ".*$gdb_prompt $" {
+    }
+}
+gdb_test_no_output "set debug displaced 0"
+
 # Run threads to their start positions.  This prepares for a new test
 # sequence.
 
@@ -127,6 +140,8 @@ proc enable_debug {enable} {
 proc test {signal_thread} {
     global gdb_prompt
     global NUM_THREADS
+    global timeout
+    global displaced_stepping_enabled
 
     with_test_prefix "signal_thread=$signal_thread" {
 	restart
@@ -152,42 +167,58 @@ proc test {signal_thread} {
 
 	enable_debug 1
 
-	set saw_continuing 0
-	set test "continue &"
-	gdb_test_multiple $test $test {
-	    -re "Continuing.\r\n" {
-		set saw_continuing 1
-		exp_continue
-	    }
-	    -re "$gdb_prompt " {
-		gdb_assert $saw_continuing $test
-	    }
-	    -re "infrun:" {
-		exp_continue
-	    }
+	# On software single-step targets that don't support displaced
+	# stepping, threads keep hitting each others' single-step
+	# breakpoints, and then GDB needs to pause all threads to step
+	# past those.  The end result is that progress in the main
+	# thread will be slower and it may take a bit longer for the
+	# signal to be queued; bump the timeout.
+	if {!$displaced_stepping_enabled && ![can_hardware_single_step]} {
+	    set factor 5
+	} else {
+	    set factor 1
 	}
-
-	set gotit 0
-
-	# Wait for all threads to finish their steps, and for the main
-	# thread to hit the breakpoint.
-	for {set i 1} { $i <= $NUM_THREADS } { incr i } {
-	    set test "thread $i broke out of loop"
-	    set gotit 0
-	    gdb_test_multiple "" $test {
-		-re "loop_broke" {
-		    # The prompt was already matched in the "continue
-		    # &" test above.  We're now consuming asynchronous
-		    # output that comes after the prompt.
-		    set gotit 1
-		    pass $test
+	with_timeout_factor $factor {
+	    gdb_test "print timeout = $timeout" " = $timeout" \
+		"set timeout in the inferior"
+
+	    set saw_continuing 0
+	    set test "continue &"
+	    gdb_test_multiple $test $test {
+		-re "Continuing.\r\n" {
+		    set saw_continuing 1
+		    exp_continue
+		}
+		-re "$gdb_prompt " {
+		    gdb_assert $saw_continuing $test
 		}
 		-re "infrun:" {
 		    exp_continue
 		}
 	    }
-	    if {!$gotit} {
-		break
+
+	    set gotit 0
+
+	    # Wait for all threads to finish their steps, and for the main
+	    # thread to hit the breakpoint.
+	    for {set i 1} { $i <= $NUM_THREADS } { incr i } {
+		set test "thread $i broke out of loop"
+		set gotit 0
+		gdb_test_multiple "" $test {
+		    -re "loop_broke" {
+			# The prompt was already matched in the "continue
+			# &" test above.  We're now consuming asynchronous
+			# output that comes after the prompt.
+			set gotit 1
+			pass $test
+		    }
+		    -re "infrun:" {
+			exp_continue
+		    }
+		}
+		if {!$gotit} {
+		    break
+		}
 	    }
 	}
 
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 56cde7a..9eaf721 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -2150,15 +2150,10 @@ proc supports_get_siginfo_type {} {
     }
 }
 
-# Return 1 if target hardware or OS supports single stepping to signal
-# handler, otherwise, return 0.
+# Return 1 if the target supports hardware single stepping.
 
-proc can_single_step_to_signal_handler {} {
+proc can_hardware_single_step {} {
 
-    # Targets don't have hardware single step.  On these targets, when
-    # a signal is delivered during software single step, gdb is unable
-    # to determine the next instruction addresses, because start of signal
-    # handler is one of them.
     if { [istarget "arm*-*-*"] || [istarget "mips*-*-*"]
 	 || [istarget "tic6x-*-*"] || [istarget "sparc*-*-linux*"]
 	 || [istarget "nios2-*-*"] } {
@@ -2168,6 +2163,17 @@ proc can_single_step_to_signal_handler {} {
     return 1
 }
 
+# Return 1 if target hardware or OS supports single stepping to signal
+# handler, otherwise, return 0.
+
+proc can_single_step_to_signal_handler {} {
+    # Targets don't have hardware single step.  On these targets, when
+    # a signal is delivered during software single step, gdb is unable
+    # to determine the next instruction addresses, because start of signal
+    # handler is one of them.
+    return [can_hardware_single_step]
+}
+
 # Return 1 if target supports process record, otherwise return 0.
 
 proc supports_process_record {} {
-- 
1.9.3