From patchwork Fri Sep 4 16:54:37 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sandra Loosemore
X-Patchwork-Id: 8583
Message-ID: <55E9CCCD.7060604@codesourcery.com>
Date: Fri, 4 Sep 2015 10:54:37 -0600
From: Sandra Loosemore
To: gdb-patches
CC: Pedro Alves, Yao Qi
Subject: [RFC] fix gdb.threads/non-stop-fair-events.exp timeouts

While running GDB tests on nios2-linux-gnu with gdbserver and "target
remote", I've been seeing random failures in
gdb.threads/non-stop-fair-events.exp.  E.g.
in one test run I got

FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 1 broke out of loop (timeout)
FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 2 broke out of loop (timeout)
FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=6: thread 3 broke out of loop (timeout)
FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=7: thread 1 broke out of loop (timeout)
FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=10: thread 1 broke out of loop (timeout)
FAIL: gdb.threads/non-stop-fair-events.exp: signal_thread=10: thread 2 broke out of loop (timeout)

and in other test runs I got different ones.  The pattern seemed to be
that sometimes it took an extra long time for the first thread to break
out of the loop, but once that happened they would all stop correctly
and send the expected replies, even though GDB had already given up on
waiting for the first few.

I've come up with the attached patch to factor the timeout for the
failing tests by the number of threads still running, which seems to
take care of the problem.  Does this seem reasonable?

I'm somewhat confused because, in spite of it sometimes taking at least
3 times the normal timeout for the first stop message to appear, the
alarm in the test case (which is tied to the normal timeout) was never
triggering.  My best theory on that is that the slowness is not in the
test case, but rather in gdbserver.  IOW, all the threads are already
stopped by the time the alarm would expire, but gdb and gdbserver
haven't yet finished all the notifications and requests needed to print
a stop message for any of the threads.  Is that plausible?  Should the
timeout for the alarm be factored by the number of threads, too, just to
be safe?

I'm also not entirely sure what this test case is supposed to test.
From the original commit message and comments in the .exp file it seems
like timeouts were supposed to be a sign of a broken kernel with thread
starvation problems, not bugs in gdb or gdbserver.  But, don't we
normally just skip tests that the target doesn't support or can't run
properly, rather than report them as FAILs?  And, I don't know how to
distinguish timeouts that mean the kernel is broken from timeouts that
mean the target is just slow and you need to set a bigger value in the
test harness.

-Sandra the confused

diff --git a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
index e2d3f7d..1570d3f 100644
--- a/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
+++ b/gdb/testsuite/gdb.threads/non-stop-fair-events.exp
@@ -135,16 +135,22 @@ proc test {signal_thread} {
 
     # Wait for all threads to finish their steps, and for the main
     # thread to hit the breakpoint.
+    # Running this many threads may be quite slow on remote targets,
+    # so factor the timeout according to how many threads are running.
+    set max_timeout $NUM_THREADS
     for {set i 1} { $i <= $NUM_THREADS } { incr i } {
         set test "thread $i broke out of loop"
-        gdb_test_multiple "" $test {
-            -re "loop_broke" {
-                # The prompt was already matched in the "continue
-                # &" test above. We're now consuming asynchronous
-                # output that comes after the prompt.
-                pass $test
+        with_timeout_factor $max_timeout {
+            gdb_test_multiple "" $test {
+                -re "loop_broke" {
+                    # The prompt was already matched in the "continue
+                    # &" test above. We're now consuming asynchronous
+                    # output that comes after the prompt.
+                    pass $test
+                }
             }
         }
+        set max_timeout [expr $max_timeout - 1]
     }
 
     # It's helpful to have this in the log if the test ever