Improve analysis of racy testcases

Message ID 87r3gcgm91.fsf@redhat.com
State New, archived

Commit Message

Sergio Durigan Junior Feb. 17, 2016, 1:26 a.m. UTC
  This patch is a proposal to introduce some mechanisms to identify racy
testcases present in our testsuite.  As can be seen in previous
discussions, racy tests are really bothersome and cause our BuildBot to
pollute the gdb-testers mailing list with hundreds of false-positives
messages every month.  Hopefully, by identifying these racy tests in
advance (and automatically) will contribute to the reduction of noise
traffic to gdb-testers, maybe to the point where we will be able to send
the failure messages directly to the authors of the commits.

I spent some time trying to decide the best way to tackle this problem,
and decided that there is no silver bullet.  Racy tests are tricky and
it is difficult to catch them, so the best solution I could find (for
now?) is to run our testsuite a number of times in a row, and then
compare the results (i.e., the gdb.sum files generated during each run).
The more times you run the tests, the more racy tests you are likely to
detect (at the expense of waiting longer and longer).  You can also run
the tests in parallel, which makes things faster (and contributes to
catching more racy tests, because your machine will have fewer resources
available for each test, and some tests are likely to fail when this
happens).  I did some tests on my machine (8-core i7, 16GB RAM), and
running the whole GDB testsuite 5 times using -j6 took 23 minutes.  Not
bad.

In order to run the racy test machinery, you need to specify the
RACY_ITER environment variable.  You will assign a number to this
variable, which represents the number of times you want to run the
tests.  So, for example, if you want to run the whole testsuite 3 times
in parallel (using 2 cores), you will do:

  make check RACY_ITER=3 -j2

It is also possible to use the TESTS variable and specify which tests
you want to run:

  make check TESTS='gdb.base/default.exp' RACY_ITER=3 -j2

And so on.  The output files will be put in the directory
gdb/testsuite/racy_outputs/.

After make invokes the necessary rules to run the tests, it finally runs
a Python script that will analyze the resulting gdb.sum files.  This
Python script will read each file, and construct a series of sets based
on the results of the tests (one set for FAILs, one for PASSes, one
for KFAILs, etc.).  It will then do some set operations and come up
with a list of unique, sorted testcases that are racy.  The algorithm
behind this is:

  for state in PASS, FAIL, XFAIL, XPASS...; do
    if a test's state in every sumfile is $state; then
      it is not racy
    else
      it is racy

IOW, a test must have the same state in every sumfile.
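The comparison above could be sketched roughly like this (illustrative
Python only; this is not the actual analyze-racy-logs.py, and the
helper names are invented for the example):

```python
# Illustrative sketch of the set-based comparison described above.
import re

STATES = ("PASS", "FAIL", "XFAIL", "XPASS", "KFAIL", "KPASS",
          "UNTESTED", "UNSUPPORTED")
STATE_RE = re.compile(r"(%s): (.*)" % "|".join(STATES))

def read_sum(lines):
    """Map each test message in a gdb.sum file to its state."""
    results = {}
    for line in lines:
        m = STATE_RE.match(line)
        if m:
            results[m.group(2)] = m.group(1)
    return results

def racy_tests(sumfiles):
    """Return the sorted list of tests whose state differs across runs."""
    runs = [read_sum(lines) for lines in sumfiles]
    all_tests = set().union(*runs)
    racy = set()
    for test in all_tests:
        # A test is non-racy only if every run reports it with the same
        # state; a differing state, or absence from some run, makes it
        # racy.
        if len({run.get(test) for run in runs}) != 1:
            racy.add(test)
    return sorted(racy)
```

A test seen as PASS in one run and FAIL (or missing) in another ends up
in the racy set; a test with the same state everywhere does not.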

After processing everything, the script prints the racy tests it could
identify on stdout.  I am redirecting this to a file named racy.sum.

Something else that I wasn't sure how to deal with was non-unique
messages in our testsuite.  I decided to do the same thing I do in our
BuildBot: include a unique identifier at the end of the message, like:

  gdb.base/xyz.exp: non-unique message
  gdb.base/xyz.exp: non-unique message <<2>>
  
This means that you will have to be careful about them when you use the
racy.sum file.
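
The disambiguation scheme could be sketched as follows (an assumed
implementation for illustration, not the actual BuildBot code):

```python
# Append a <<N>> suffix to the second and later occurrences of a
# repeated message, so every line in the .sum output becomes unique.
import collections

def uniquify(messages):
    seen = collections.Counter()
    out = []
    for msg in messages:
        seen[msg] += 1
        out.append(msg if seen[msg] == 1
                   else "%s <<%d>>" % (msg, seen[msg]))
    return out
```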

I ran the script several times here, and it did a good job catching some
well-known racy tests.  Overall, I am satisfied with this approach and I
think it will be helpful to have it upstreamed.  I also intend to
extend our BuildBot and create new, specialized builders that will be
responsible for detecting the racy tests every X number of days, but
that will only be done when this patch is accepted.

Comments are welcome, as usual.  Thanks.
  

Comments

Pedro Alves Feb. 25, 2016, 5:01 p.m. UTC | #1
On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
> This patch is a proposal to introduce some mechanisms to identify racy
> testcases present in our testsuite.  As can be seen in previous
> discussions, racy tests are really bothersome and cause our BuildBot to
> pollute the gdb-testers mailing list with hundreds of false-positives
> messages every month.  Hopefully, by identifying these racy tests in
> advance (and automatically) will contribute to the reduction of noise
> traffic to gdb-testers, maybe to the point where we will be able to send
> the failure messages directly to the authors of the commits.
> 
> I spent some time trying to decide the best way to tackle this problem,
> and decided that there is no silver bullet.  Racy tests are tricky and
> it is difficult to catch them, so the best solution I could find (for
> now?) is to run our testsuite a number of times in a row, and then
> compare the results (i.e., the gdb.sum files generated during each run).
> The more times you run the tests, the more racy tests you are likely to
> detect (at the expense of waiting longer and longer).  You can also run
> the tests in parallel, which makes things faster (and contribute to
> catching more racy tests, because your machine will have less resources
> for each test and some of them are likely to fail when this happens).  I
> did some tests in my machine (8-core i7, 16GB RAM), and running the
> whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.
> 
> In order to run the racy test machinery, you need to specify the
> RACY_ITER environment variable.  You will assign a number to this
> variable, which represents the number of times you want to run the
> tests.  So, for example, if you want to run the whole testsuite 3 times
> in parallel (using 2 cores), you will do:
> 
>   make check RACY_ITER=3 -j2
> 
> It is also possible to use the TESTS variable and specify which tests
> you want to run:
> 
>   make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2

I did this, and noticed:

$ ls testsuite/racy_outputs/
gdb1.log  gdb1.sum  gdb2.log  gdb2.sum  gdb3.log  gdb3.sum

I think it'd make more sense, and it'd be easier to copy
test run results around, diff result dirs etc., if these
were put in subdirs instead:

 testsuite/racy_outputs/1/gdb.sum
 testsuite/racy_outputs/1/gdb.log
 testsuite/racy_outputs/2/gdb.sum
 testsuite/racy_outputs/2/gdb.log

etc.  Any reason not to?

> 
> And so on.  The output files will be put at the directory
> gdb/testsuite/racy_outputs/.
> 
> After make invokes the necessary rules to run the tests, it finally runs
> a Python script that will analyze the resulting gdb.sum files.  This
> Python script will read each file, and construct a series of sets based
> on the results of the tests (one set for FAIL's, one for PASS'es, one
> for KFAIL's, etc.).  It will then do some set operations and come up
> with a list of unique, sorted testcases that are racy.  The algorithm
> behind this is:
> 
>   for state in PASS, FAIL, XFAIL, XPASS...; do
>     if a test's state in every sumfile is $state; then
>       it is not racy
>     else
>       it is racy
> 
> IOW, a test must have the same state in every sumfile.

Is it easy to start with preexisting dirs with gdb.sum/gdb.log
files, and run this script on them?

E.g., say I've already run the testsuite three times, and now
I want to check for racy tests, using the test results I already
have:

 testruns/2015-02-21/1/gdb.sum
 testruns/2015-02-21/2/gdb.sum
 testruns/2015-02-21/3/gdb.sum

I think we should document that.

> 
> After processing everything, the script prints the racy tests it could
> identify on stdout.  I am redirecting this to a file named racy.sum.

I tried:
 make check TESTS='gdb.base/default.exp' RACY_ITER=3

and was surprised to see nothing output at the end.  Then I found that
the racy.sum file is completely empty.  How about we instead make sure
that something is always printed?  Maybe follow dejagnu's output style,
and print a summary at the end?

I've now run "make check -j8 RACY_ITER=3" and got this:

$ cat testsuite/racy.sum
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork

The gdb.threads/process-dies-while-handling-bp.exp ones are actually:

 -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
 +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)

Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.
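
Such normalization could be sketched as (a hypothetical helper, not
part of the posted patch):

```python
# Repeatedly strip a trailing "(...)" group, e.g. "(timeout)" or
# "(PRMS: gdb/18749)", so PASS/KFAIL variants of the same message
# compare equal when diffing sum files.
import re

def strip_tail_parens(message):
    prev = None
    while prev != message:
        prev = message
        # Drop one final parenthesized tail plus the space before it.
        message = re.sub(r"\s*\([^()]*\)$", "", message)
    return message
```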


> 
> Something else that I wasn't sure how to deal with was non-unique
> messages in our testsuite.  I decided to do the same thing I do in our
> BuildBot: include a unique identifier in the end of message, like:
> 
>   gdb.base/xyz.exp: non-unique message
>   gdb.base/xyz.exp: non-unique message <<2>>
>   
> This means that you will have to be careful about them when you use the
> racy.sum file.
> 
> I ran the script several times here, and it did a good job catching some
> well-known racy tests.  Overall, I am satisfied with this approach and I
> think it will be helpful to have it upstream'ed. 

I think so too.  We don't have to fix everything before it's put in either.
And having this doesn't preclude exploring other approaches either.

The only thing I do wish we'd do, is use the fruits of this
to somehow mark racy tests in the testsuite itself, instead of only
making the buildbot ignore them, so that local development benefits
as well.

> I also intend to
> extend our BuildBot and create new, specialized builders that will be
> responsible for detecting the racy tests every X number of days, but
> that will only be done when this patch is accepted.

That'll be great.

> Comments are welcome, as usual.  Thanks.

On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
> diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
> index 38c3052..4bf6191 100644
> --- a/gdb/testsuite/Makefile.in
> +++ b/gdb/testsuite/Makefile.in
> @@ -52,6 +52,8 @@ RUNTESTFLAGS =
>
>  FORCE_PARALLEL =
>
> +DEFAULT_RACY_ITER = 3

Are users supposed to be able to trigger the default?  I didn't
see it documented.


> +Racy testcases
> +**************
> +
> +Unfortunately, GDB has a number of testcases that can randomly pass or
> +fail.  We call them "racy testcases", and they can be bothersome when
> +one is comparing different testsuite runs.  In order to help

I'd suggest future-proofing this, by rephrasing it as if we were at a
stage where the testsuite is stable, and new racy tests occasionally are
added to the testsuite.

>
> diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
> new file mode 100755
> index 0000000..4f05185
> --- /dev/null
> +++ b/gdb/testsuite/analyze-racy-logs.py
> @@ -0,0 +1,131 @@
> +#!/usr/bin/env python
> +
> +# Copyright (C) 2016 Free Software Foundation, Inc.
> +#
> +# This file is part of GDB.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +
> +# This program is used to analyze the test results (i.e., *.sum files)
> +# generated by GDB's testsuite, and print the testcases that are found
> +# to be racy.
> +#
> +# Racy testcases are considered as being testcases which can
> +# intermitently FAIL (or PASS) when run two or more times

intermittently

I think you should put this in as is, and we can keep working on
it on top.

Thanks,
Pedro Alves
  
Antoine Tremblay Feb. 25, 2016, 6:33 p.m. UTC | #2
Sergio Durigan Junior writes:

> This patch is a proposal to introduce some mechanisms to identify racy
> testcases present in our testsuite.  As can be seen in previous
> discussions, racy tests are really bothersome and cause our BuildBot to
> pollute the gdb-testers mailing list with hundreds of false-positives
> messages every month.  Hopefully, by identifying these racy tests in
> advance (and automatically) will contribute to the reduction of noise
> traffic to gdb-testers, maybe to the point where we will be able to send
> the failure messages directly to the authors of the commits.
>
> I spent some time trying to decide the best way to tackle this problem,
> and decided that there is no silver bullet.  Racy tests are tricky and
> it is difficult to catch them, so the best solution I could find (for
> now?) is to run our testsuite a number of times in a row, and then
> compare the results (i.e., the gdb.sum files generated during each run).
> The more times you run the tests, the more racy tests you are likely to
> detect (at the expense of waiting longer and longer).  You can also run
> the tests in parallel, which makes things faster (and contribute to
> catching more racy tests, because your machine will have less resources
> for each test and some of them are likely to fail when this happens).  I
> did some tests in my machine (8-core i7, 16GB RAM), and running the
> whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.
>
> In order to run the racy test machinery, you need to specify the
> RACY_ITER environment variable.  You will assign a number to this
> variable, which represents the number of times you want to run the
> tests.  So, for example, if you want to run the whole testsuite 3 times
> in parallel (using 2 cores), you will do:
>
>   make check RACY_ITER=3 -j2
>
> It is also possible to use the TESTS variable and specify which tests
> you want to run:
>
>   make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2
>
> And so on.  The output files will be put at the directory
> gdb/testsuite/racy_outputs/.
>
> After make invokes the necessary rules to run the tests, it finally runs
> a Python script that will analyze the resulting gdb.sum files.  This
> Python script will read each file, and construct a series of sets based
> on the results of the tests (one set for FAIL's, one for PASS'es, one
> for KFAIL's, etc.).  It will then do some set operations and come up
> with a list of unique, sorted testcases that are racy.  The algorithm
> behind this is:
>
>   for state in PASS, FAIL, XFAIL, XPASS...; do
>     if a test's state in every sumfile is $state; then
>       it is not racy
>     else
>       it is racy
>
> IOW, a test must have the same state in every sumfile.
>
> After processing everything, the script prints the racy tests it could
> identify on stdout.  I am redirecting this to a file named racy.sum.
>
> Something else that I wasn't sure how to deal with was non-unique
> messages in our testsuite.  I decided to do the same thing I do in our
> BuildBot: include a unique identifier in the end of message, like:
>
>   gdb.base/xyz.exp: non-unique message
>   gdb.base/xyz.exp: non-unique message <<2>>
>   
> This means that you will have to be careful about them when you use the
> racy.sum file.
>
> I ran the script several times here, and it did a good job catching some
> well-known racy tests.  Overall, I am satisfied with this approach and I
> think it will be helpful to have it upstream'ed.  I also intend to
> extend our BuildBot and create new, specialized builders that will be
> responsible for detecting the racy tests every X number of days, but
> that will only be done when this patch is accepted.
>
> Comments are welcome, as usual.  Thanks.

Thanks for this!  This was quite a problem for me while testing on arm.
I'm testing it now...

One note: maybe it would be nice to output the list of non-racy tests
too, so one could auto-build a list of tests to run out of this, since
I'm not sure you can set an exclusion list?

Regards,
Antoine
  
Sergio Durigan Junior Feb. 28, 2016, 9:44 p.m. UTC | #3
On Thursday, February 25 2016, Pedro Alves wrote:

> On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
>> This patch is a proposal to introduce some mechanisms to identify racy
>> testcases present in our testsuite.  As can be seen in previous
>> discussions, racy tests are really bothersome and cause our BuildBot to
>> pollute the gdb-testers mailing list with hundreds of false-positives
>> messages every month.  Hopefully, by identifying these racy tests in
>> advance (and automatically) will contribute to the reduction of noise
>> traffic to gdb-testers, maybe to the point where we will be able to send
>> the failure messages directly to the authors of the commits.
>> 
>> I spent some time trying to decide the best way to tackle this problem,
>> and decided that there is no silver bullet.  Racy tests are tricky and
>> it is difficult to catch them, so the best solution I could find (for
>> now?) is to run our testsuite a number of times in a row, and then
>> compare the results (i.e., the gdb.sum files generated during each run).
>> The more times you run the tests, the more racy tests you are likely to
>> detect (at the expense of waiting longer and longer).  You can also run
>> the tests in parallel, which makes things faster (and contribute to
>> catching more racy tests, because your machine will have less resources
>> for each test and some of them are likely to fail when this happens).  I
>> did some tests in my machine (8-core i7, 16GB RAM), and running the
>> whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.
>> 
>> In order to run the racy test machinery, you need to specify the
>> RACY_ITER environment variable.  You will assign a number to this
>> variable, which represents the number of times you want to run the
>> tests.  So, for example, if you want to run the whole testsuite 3 times
>> in parallel (using 2 cores), you will do:
>> 
>>   make check RACY_ITER=3 -j2
>> 
>> It is also possible to use the TESTS variable and specify which tests
>> you want to run:
>> 
>>   make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2
>
> I did this, and noticed:
>
> $ ls testsuite/racy_outputs/
> gdb1.log  gdb1.sum  gdb2.log  gdb2.sum  gdb3.log  gdb3.sum
>
> I think it'd make more sense, and it'd be easier to copy
> test run results around, diff result dirs etc., if these
> were put in subdirs instead:
>
>  testsuite/racy_outputs/1/gdb.sum
>  testsuite/racy_outputs/1/gdb.sum
>  testsuite/race_outputs/1/gdb.log
>  testsuite/race_outputs/2/gdb.sum
>  testsuite/race_outputs/2/gdb.log
>
> etc.  Any reason not to?

Not really.  I think it's a good idea, so I modified the Makefile to
implement it.

>> And so on.  The output files will be put at the directory
>> gdb/testsuite/racy_outputs/.
>> 
>> After make invokes the necessary rules to run the tests, it finally runs
>> a Python script that will analyze the resulting gdb.sum files.  This
>> Python script will read each file, and construct a series of sets based
>> on the results of the tests (one set for FAIL's, one for PASS'es, one
>> for KFAIL's, etc.).  It will then do some set operations and come up
>> with a list of unique, sorted testcases that are racy.  The algorithm
>> behind this is:
>> 
>>   for state in PASS, FAIL, XFAIL, XPASS...; do
>>     if a test's state in every sumfile is $state; then
>>       it is not racy
>>     else
>>       it is racy
>> 
>> IOW, a test must have the same state in every sumfile.
>
> Is it easy to start with preexisting dirs with gdb.sum/gdb.log
> files, and run this script on them?

Yes, you just need to provide two or more .sum files to the script.

> E.g., say I've already run the testsuite three times, and now
> I want to check for racy tests, using the test results I already
> have:
>
>  testruns/2015-02-21/1/gdb.sum
>  testruns/2015-02-21/2/gdb.sum
>  testruns/2015-02-21/3/gdb.sum
>
> I think we should document that.

Good idea.  I will write more about how to use the script if you already
have the .sum files.  I assume that gdb/testsuite/README is a good place
to put this information.

>> After processing everything, the script prints the racy tests it could
>> identify on stdout.  I am redirecting this to a file named racy.sum.
>
> I tried:
>  make check TESTS='gdb.base/default.exp' RACY_ITER=3
>
> and was surprised to see nothing output at the end.  Then I found that
> the racy.sum file is completely empty.  How about we instead make sure
> that something is already printed?  Maybe follow dejagnu's output style,
> and print a summary at the end?

Alright, sounds doable.

> I've now run "make check -j8 RACY_ITER=3" and got this:
>
> $ cat testsuite/racy.sum
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
> gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
> gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
> gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork
>
> The gdb.threads/process-dies-while-handling-bp.exp ones are actually:
>
>  -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>  +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)
>
> Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.

I thought about stripping tail ()s away before comparing the names, but
the problem is that maybe we'll miss a test that actually writes
something meaningful inside the parentheses.  There isn't a strong
convention advising testcase writers not to do that.  What do you think?

As for the PASS->KFAIL->PASS, that's a good idea indeed.  I'll implement
that.

>> 
>> Something else that I wasn't sure how to deal with was non-unique
>> messages in our testsuite.  I decided to do the same thing I do in our
>> BuildBot: include a unique identifier in the end of message, like:
>> 
>>   gdb.base/xyz.exp: non-unique message
>>   gdb.base/xyz.exp: non-unique message <<2>>
>>   
>> This means that you will have to be careful about them when you use the
>> racy.sum file.
>> 
>> I ran the script several times here, and it did a good job catching some
>> well-known racy tests.  Overall, I am satisfied with this approach and I
>> think it will be helpful to have it upstream'ed. 
>
> I think so too.  We don't have to fix everything before its put in either.
> And having this doesn't preclude exploring other approaches either.
>
> The only thing I do wish we should do, is use the fruits of this
> to somehow mark racy tests in the testsuite itself, instead of only
> making the buildbot ignore them, so that local development benefits
> as well.

I totally agree, and also spent some time thinking about this problem,
but I don't see an easy solution for that.  Racy testcases vary wildly
between targets, GDB vs. gdbserver, CFLAGS, etc.  We would have to
maintain several lists of racy tests, and even this wouldn't be 100%
trustworthy because the racy tests also vary depending on the machine
load...

I'm eager to see what you/others think about this specific problem.

>> I also intend to
>> extend our BuildBot and create new, specialized builders that will be
>> responsible for detecting the racy tests every X number of days, but
>> that will only be done when this patch is accepted.
>
> That'll be great.
>
>> Comments are welcome, as usual.  Thanks.
>
> On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
>> diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
>> index 38c3052..4bf6191 100644
>> --- a/gdb/testsuite/Makefile.in
>> +++ b/gdb/testsuite/Makefile.in
>> @@ -52,6 +52,8 @@ RUNTESTFLAGS =
>>
>>  FORCE_PARALLEL =
>>
>> +DEFAULT_RACY_ITER = 3
>
> Are users supposed to be able to trigger the default?  I didn't
> see it documented.

Only if they invoke the rule by hand without specifying RACY_ITER, like:

  make check-racy-parallel -j4

In this case, DEFAULT_RACY_ITER would be used.  And yes, I did not
document it, but I should.  Thanks for mentioning.

>> +Racy testcases
>> +**************
>> +
>> +Unfortunately, GDB has a number of testcases that can randomly pass or
>> +fail.  We call them "racy testcases", and they can be bothersome when
>> +one is comparing different testsuite runs.  In order to help
>
> I'd suggest future-proofing this, by rephrasing it as if we were at a
> stage where the testsuite is stable, and new racy tests occasionally are
> added to the testsuite.

Gotcha.

>> diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
>> new file mode 100755
>> index 0000000..4f05185
>> --- /dev/null
>> +++ b/gdb/testsuite/analyze-racy-logs.py
>> @@ -0,0 +1,131 @@
>> +#!/usr/bin/env python
>> +
>> +# Copyright (C) 2016 Free Software Foundation, Inc.
>> +#
>> +# This file is part of GDB.
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +
>> +# This program is used to analyze the test results (i.e., *.sum files)
>> +# generated by GDB's testsuite, and print the testcases that are found
>> +# to be racy.
>> +#
>> +# Racy testcases are considered as being testcases which can
>> +# intermitently FAIL (or PASS) when run two or more times
>
> intermittently

Oops, thanks.

> I think you should put this in as is, and we can keep working on
> it on top.

I'll send a v2 in a few minutes with your suggestions addressed, and
then I'll push it if you're OK with it.

Thanks a lot!
  
Sergio Durigan Junior March 1, 2016, 5:58 a.m. UTC | #4
On Thursday, February 25 2016, Antoine Tremblay wrote:

> Sergio Durigan Junior writes:
>
>> This patch is a proposal to introduce some mechanisms to identify racy
>> testcases present in our testsuite.  As can be seen in previous
>> discussions, racy tests are really bothersome and cause our BuildBot to
>> pollute the gdb-testers mailing list with hundreds of false-positives
>> messages every month.  Hopefully, by identifying these racy tests in
>> advance (and automatically) will contribute to the reduction of noise
>> traffic to gdb-testers, maybe to the point where we will be able to send
>> the failure messages directly to the authors of the commits.
>> [...]
> Thanks for this ! This was quite a problem for me while testing on arm.
> I'm testing it now...

Thanks!  Please let me know if you find anything wrong with the script.

> One note maybe it would be nice output the list of unracy tests too to
> be able to auto-build a list of tests to run out of this since I'm not
> sure you can set an exclusion list ?

Hm, it would be possible to output the non-racy tests, but only to a
different file (instead of outputting to stdout, as I'm doing with the
racy tests).  Perhaps this could be a separate option to the script?
I'm not sure if the users would always want this information...
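
As a sketch of what such an option could compute (this is not the patch's
code, and the result data below is made up for illustration), a test would
count as non-racy only if it appears with the same result in every run:

```python
# Hypothetical per-run results, keyed by result, as parse_sum_line would
# collect them from each gdb.sum file.
run1 = {'PASS': {'test1', 'test2', 'test3'}, 'FAIL': {'test4'}}
run2 = {'PASS': {'test1', 'test3'}, 'FAIL': {'test2', 'test4'}}

# A test is non-racy for a given result if it has that result in *all*
# runs; intersecting the per-run sets expresses exactly that.
non_racy = set()
for result in set(run1) & set(run2):
    non_racy |= run1[result] & run2[result]

print(sorted(non_racy))  # -> ['test1', 'test3', 'test4']
```

Here test2 flips between PASS and FAIL across the two runs, so it is the
only racy test.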

As for the exclusion list you mentioned, DejaGNU's runtest allows one to
specify a --ignore flag with the names of the tests you don't want to
run.  It should be possible to pass this via RUNTESTFLAGS, but I haven't
tried.  I'll give it a go tomorrow.

Cheers,
  
Pedro Alves March 1, 2016, 11:56 a.m. UTC | #5
On 02/28/2016 09:44 PM, Sergio Durigan Junior wrote:
> On Thursday, February 25 2016, Pedro Alves wrote:

>
>> I've now run "make check -j8 RACY_ITER=3" and got this:
>>
>> $ cat testsuite/racy.sum
>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
>> gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
>> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
>> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
>> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
>> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
>> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
>> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
>> gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
>> gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
>> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
>> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork
>>

>> The gdb.threads/process-dies-while-handling-bp.exp ones are actually:
>>
>>   -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>>   +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)
>>
>> Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.
>
> I thought about stripping tail ()s away before comparing the names, but
> the problem is that maybe we'll miss a test that actually writes
> something meaningful inside the parentheses.  There isn't a strong
> convention advising testcase writers to not do that.  What do you think?

I think we should have that convention.  We already largely implicitly
have it, exactly because of "(PRMS: gdb/NNN)", "(timeout)" or "(eof)",
etc.

IOW, I think this should be interpreted as a regression in the
"whatever test" test:

  -PASS: gdb.base/foo.exp: whatever test
  +FAIL: gdb.base/foo.exp: whatever test (timeout)

If that actually strips something meaningful, I'd just file it under
the same bucket as non-unique test names, and fix it by tweaking the
test message.
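
In Python, the stripping could look like this minimal sketch (not part of
the patch; the helper name is invented here):

```python
import re

def strip_tail_parens(message):
    """Repeatedly strip trailing parenthesized annotations such as
    '(timeout)' or '(PRMS: gdb/18749)' from a test message, so that
    'whatever test (timeout)' compares equal to 'whatever test'."""
    while True:
        stripped = re.sub(r'\s*\([^()]*\)$', '', message)
        if stripped == message:
            return message
        message = stripped

print(strip_tail_parens('whatever test (timeout)'))
# -> whatever test
print(strip_tail_parens('inferior 1 exited (prompt) (PRMS: gdb/18749)'))
# -> inferior 1 exited
```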

>> The only thing I do wish we should do, is use the fruits of this
>> to somehow mark racy tests in the testsuite itself, instead of only
>> making the buildbot ignore them, so that local development benefits
>> as well.
>
> I totally agree, and also spent some time thinking about this problem,
> but I don't see an easy solution for that.  Racy testcases vary wildly
> between targets, GDB vs. gdbserver, CFLAGS, etc.
> We would have to maintain several lists of racy tests,

I wouldn't want to maintain separate lists at all.  Instead, I'd want
to mark testcases themselves with something like setup_kfail.  Testcases 
already call those depending on target/arch, I see no
difference.

Perhaps a problem with setup_kfail is that it generates a KPASS when
the race doesn't trigger.  We could add a new setup_racy_kfail that
generates a PASS on success and KFAIL on failure, but never a KPASS.
Perhaps we could have a racy_test_scope in the spirit of 
with_test_prefix, which would automatically mark all tests in the scope
as racy, but I'm not sure we do need or want that.

> and even this wouldn't be 100%
> trustworthy because the racy tests also vary depending on the machine
> load...

Varying the load does not make the _set_ of racy tests vary.  It only
changes the chances that a racy test in that set actually triggers the
race.  A test in that set is still racy regardless.

Thanks,
Pedro Alves
  
Sergio Durigan Junior March 6, 2016, 1:29 a.m. UTC | #6
On Tuesday, March 01 2016, Pedro Alves wrote:

> On 02/28/2016 09:44 PM, Sergio Durigan Junior wrote:
>> On Thursday, February 25 2016, Pedro Alves wrote:
>
>>
>>> I've now run "make check -j8 RACY_ITER=3" and got this:
>>>
>>> $ cat testsuite/racy.sum
>>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
>>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
>>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
>>> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
>>> gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
>>> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
>>> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
>>> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
>>> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
>>> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
>>> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
>>> gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
>>> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
>>> gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
>>> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
>>> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork
>>>
>
>>> The gdb.threads/process-dies-while-handling-bp.exp ones are actually:
>>>
>>>   -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>>>   +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)
>>>
>>> Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.
>>
>> I thought about stripping tail ()s away before comparing the names, but
>> the problem is that maybe we'll miss a test that actually writes
>> something meaningful inside the parentheses.  There isn't a strong
>> convention advising testcase writers to not do that.  What do you think?
>
> I think we should have that convention.  We already largely implicitly
> have it, exactly because of "(PRMS: gdb/NNN)", "(timeout)" or "(eof)",
> etc.
>
> IOW, I think this should be interpreted as a regression in the
> "whatever test" test:
>
>  -PASS: gdb.base/foo.exp: whatever test
>  +FAIL: gdb.base/foo.exp: whatever test (timeout)
>
> If that actually strips something meaningful, I'd just file it under
> the same bucket as non-unique test names, and fix it by tweaking the
> test message.

Alright, makes sense to me.  I went ahead and also updated the wiki with
this rule:

  <https://sourceware.org/gdb/wiki/GDBTestcaseCookbook#Do_not_use_.22tail_parentheses.22_on_test_messages>

>>> The only thing I do wish we should do, is use the fruits of this
>>> to somehow mark racy tests in the testsuite itself, instead of only
>>> making the buildbot ignore them, so that local development benefits
>>> as well.
>>
>> I totally agree, and also spent some time thinking about this problem,
>> but I don't see an easy solution for that.  Racy testcases vary wildly
>> between targets, GDB vs. gdbserver, CFLAGS, etc.
>> We would have to maintain several lists of racy tests,
>
> I wouldn't want to maintain separate lists at all.  Instead, I'd want
> to mark testcases themselves with something like setup_kfail.
> Testcases already call those depending on target/arch, I see no
> difference.
>
> Perhaps a problem with setup_kfail is that it generates a KPASS when
> the race doesn't trigger.  We could add a new setup_racy_kfail that
> generates a PASS on success and KFAIL on failure, but never a KPASS.
> Perhaps we could have a racy_test_scope in the spirit of
> with_test_prefix, which would automatically mark all tests in the
> scope
> as racy, but I'm not sure we do need or want that.

I liked the idea of a setup_racy_kfail (not sure about the
racy_test_scope thing...).  I will take a look at implementing that
later.
  
Antoine Tremblay March 14, 2016, 12:31 p.m. UTC | #7
Sergio Durigan Junior writes:

> On Thursday, February 25 2016, Antoine Tremblay wrote:
>
>> Sergio Durigan Junior writes:
>>
>>> This patch is a proposal to introduce some mechanisms to identify racy
>>> testcases present in our testsuite.  As can be seen in previous
>>> discussions, racy tests are really bothersome and cause our BuildBot to
>>> pollute the gdb-testers mailing list with hundreds of false-positives
>>> messages every month.  Hopefully, by identifying these racy tests in
>>> advance (and automatically) will contribute to the reduction of noise
>>> traffic to gdb-testers, maybe to the point where we will be able to send
>>> the failure messages directly to the authors of the commits.
>>> [...]
>> Thanks for this ! This was quite a problem for me while testing on arm.
>> I'm testing it now...
>
> Thanks!  Please let me know if you find anything wrong with the script.

Just to let you know that the script worked as expected during my
testing, and it's quite nice to see.  The tests, however, were more
flaky than expected, so I can't use the output directly; I guess I'll
need a "never ignore this test" list... (like base/break.exp).  Still,
it's very helpful.

>
>> One note: maybe it would be nice to output the list of non-racy tests
>> too, to be able to auto-build a list of tests to run out of this, since
>> I'm not sure you can set an exclusion list?
>
> Hm, it would be possible to output the non-racy tests, but only to a
> different file (instead of outputting to stdout, as I'm doing with the
> racy tests).  Perhaps this could be a separate option to the script?
> I'm not sure if the users would always want this information...
>
> As for the exclusion list you mentioned, DejaGNU's runtest allows one to
> specify a --ignore flag with the names of the tests you don't want to
> run.  It should be possible to pass this via RUNTESTFLAGS, but I haven't
> tried.  I'll give it a go tomorrow.

Great, I did not know that.  I'll give it a try, and if that works
there's no need to output the non-racy tests...

Thanks,
Antoine
  
Pedro Alves March 14, 2016, 12:45 p.m. UTC | #8
On 03/14/2016 12:31 PM, Antoine Tremblay wrote:

> Just to let you know that the script worked as expected during my
> testing, and it's quite nice to see.  The tests, however, were more
> flaky than expected, so I can't use the output directly; I guess I'll
> need a "never ignore this test" list... (like base/break.exp).  Still,
> it's very helpful.

Could you share a bit more on what you found?  I wouldn't expect break.exp
to be flaky.  I think some tests that use gdb_test_stdio are still
flaky with gdbserver, but break.exp does not seem to use it.

Thanks,
Pedro Alves
  
Antoine Tremblay March 14, 2016, 2:04 p.m. UTC | #9
Pedro Alves writes:

> On 03/14/2016 12:31 PM, Antoine Tremblay wrote:
>
>> Just to let you know that the script worked as expected during my
>> testing, and it's quite nice to see.  The tests, however, were more
>> flaky than expected, so I can't use the output directly; I guess I'll
>> need a "never ignore this test" list... (like base/break.exp).  Still,
>> it's very helpful.
>
> Could you share a bit more on what you found?  I wouldn't expect break.exp
> to be flaky.  I think some tests that use gdb_test_stdio are still
> flaky with gdbserver, but break.exp does not seem to use it.
>

It's flaky because of a weird problem I have while running GDBServer on
ARM: sometimes, out of nowhere, I will get a SIGILL just from running a
program on GDBServer.  With no breakpoints or anything, just loading the
program in GDBServer and doing a continue can trigger it.

See this thread: https://www.sourceware.org/ml/gdb/2015-11/msg00030.html

So it's not that the test itself is flaky, but more the platform in general...

Regards,
Antoine
  
Antoine Tremblay April 27, 2016, 5:18 p.m. UTC | #10
Antoine Tremblay writes:

>> As for the exclusion list you mentioned, DejaGNU's runtest allows one to
>> specify a --ignore flag with the names of the tests you don't want to
>> run.  It should be possible to pass this via RUNTESTFLAGS, but I haven't
>> tried.  I'll give it a go tomorrow.
>
> Great, I did not know that.  I'll give it a try, and if that works
> there's no need to output the non-racy tests...

Just FYI, I just had a chance to try this and it works, except that you
need to forgo the path of the test.  So, for example, to ignore
gdb.trace/actions-changed.exp you need:

  RUNTESTFLAGS='--ignore actions-changed.exp'

Not sure what happens if 2 tests have the same name but a different
directory...
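
Given that caveat, the ignore list could even be auto-built from racy.sum,
along the lines of this hypothetical Python sketch (the sample lines are
made up, and the resulting command is only printed, not run):

```python
import os

# Made-up racy.sum lines; real ones come from "make check RACY_ITER=N".
# The .exp path sits before the first colon of each line.
racy_sum_lines = [
    'gdb.base/break.exp: some racy test',
    'gdb.trace/actions-changed.exp: another racy test',
]

# --ignore wants bare file names, so drop the directory part.
ignore = sorted({os.path.basename(l.split(':', 1)[0])
                 for l in racy_sum_lines})

print("make check RUNTESTFLAGS=\"--ignore '%s'\"" % ' '.join(ignore))
# prints: make check RUNTESTFLAGS="--ignore 'actions-changed.exp break.exp'"
```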

Regards,
Antoine
  

Patch

diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
index 38c3052..4bf6191 100644
--- a/gdb/testsuite/Makefile.in
+++ b/gdb/testsuite/Makefile.in
@@ -52,6 +52,8 @@  RUNTESTFLAGS =
 
 FORCE_PARALLEL =
 
+DEFAULT_RACY_ITER = 3
+
 RUNTEST_FOR_TARGET = `\
   if [ -f $${srcdir}/../../dejagnu/runtest ]; then \
     echo $${srcdir}/../../dejagnu/runtest; \
@@ -140,7 +142,8 @@  installcheck:
 # given.  If RUNTESTFLAGS is not empty, then by default the tests will
 # be serialized.  This can be overridden by setting FORCE_PARALLEL to
 # any non-empty value.  For a non-GNU make, do not parallelize.
-@GMAKE_TRUE@CHECK_TARGET = $(if $(FORCE_PARALLEL),check-parallel,$(if $(RUNTESTFLAGS),check-single,$(if $(saw_dash_j),check-parallel,check-single)))
+@GMAKE_TRUE@CHECK_TARGET_TMP = $(if $(FORCE_PARALLEL),check-parallel,$(if $(RUNTESTFLAGS),check-single,$(if $(saw_dash_j),check-parallel,check-single)))
+@GMAKE_TRUE@CHECK_TARGET = $(if $(RACY_ITER),$(addsuffix -racy,$(CHECK_TARGET_TMP)),$(CHECK_TARGET_TMP))
 @GMAKE_FALSE@CHECK_TARGET = check-single
 
 # Note that we must resort to a recursive make invocation here,
@@ -190,6 +193,26 @@  DO_RUNTEST = \
 check-single:
 	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)
 
+check-single-racy:
+	-rm -rf cache racy_outputs temp
+	mkdir -p racy_outputs; \
+	racyiter="$(RACY_ITER)"; \
+	test "x$$racyiter" == "x" && \
+	  racyiter=$(DEFAULT_RACY_ITER); \
+	if test $$racyiter -lt 2 ; then \
+	  echo "RACY_ITER must be at least 2."; \
+	  exit 1; \
+	fi; \
+	trap "exit" INT; \
+	for n in `seq $$racyiter` ; do \
+	  $(DO_RUNTEST) --outdir=racy_outputs/ $(RUNTESTFLAGS) \
+	    $(expanded_tests_or_none); \
+	  mv -f racy_outputs/gdb.sum racy_outputs/gdb$$n.sum; \
+	  mv -f racy_outputs/gdb.log racy_outputs/gdb$$n.log; \
+	done; \
+	$(srcdir)/analyze-racy-logs.py \
+	  `ls racy_outputs/gdb*.sum` > racy.sum
+
 check-parallel:
 	-rm -rf cache outputs temp
 	$(MAKE) -k do-check-parallel; \
@@ -201,6 +224,30 @@  check-parallel:
 	sed -n '/=== gdb Summary ===/,$$ p' gdb.sum; \
 	exit $$result
 
+check-parallel-racy:
+	-rm -rf cache racy_outputs temp
+	racyiter="$(RACY_ITER)"; \
+	test "x$$racyiter" == "x" && \
+	  racyiter=$(DEFAULT_RACY_ITER); \
+	if test $$racyiter -lt 2 ; then \
+	  echo "RACY_ITER must be at least 2."; \
+	  exit 1; \
+	fi; \
+	trap "exit" INT; \
+	for n in `seq $$racyiter` ; do \
+	  $(MAKE) -k do-check-parallel-racy \
+	    RACY_OUTPUT_N=$$n; \
+	  $(SHELL) $(srcdir)/dg-extract-results.sh \
+	    `find racy_outputs/racy$$n -name gdb.sum -print` > \
+	    racy_outputs/gdb$$n.sum; \
+	  $(SHELL) $(srcdir)/dg-extract-results.sh -L \
+	    `find racy_outputs/racy$$n -name gdb.log -print` > \
+	    gdb$$n.log; \
+	  sed -n '/=== gdb Summary ===/,$$ p' racy_outputs/gdb$$n.sum; \
+	done; \
+	$(srcdir)/analyze-racy-logs.py \
+	  `ls racy_outputs/gdb*.sum` > racy.sum
+
 # Turn a list of .exp files into "check/" targets.  Only examine .exp
 # files appearing in a gdb.* directory -- we don't want to pick up
 # lib/ by mistake.  For example, gdb.linespec/linespec.exp becomes
@@ -214,9 +261,9 @@  slow_tests = gdb.base/break-interp.exp gdb.base/interp.exp \
 	gdb.base/multi-forks.exp
 @GMAKE_TRUE@all_tests := $(shell cd $(srcdir) && find gdb.* -name '*.exp' -print)
 @GMAKE_TRUE@reordered_tests := $(slow_tests) $(filter-out $(slow_tests),$(all_tests))
-@GMAKE_TRUE@TEST_TARGETS := $(addprefix check/,$(reordered_tests))
+@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(reordered_tests))
 @GMAKE_TRUE@else
-@GMAKE_TRUE@TEST_TARGETS := $(addprefix check/,$(expanded_tests_or_none))
+@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(expanded_tests_or_none))
 @GMAKE_TRUE@endif
 
 do-check-parallel: $(TEST_TARGETS)
@@ -226,6 +273,15 @@  do-check-parallel: $(TEST_TARGETS)
 @GMAKE_TRUE@	-mkdir -p outputs/$*
 @GMAKE_TRUE@	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)
 
+do-check-parallel-racy: $(TEST_TARGETS)
+	@:
+
+@GMAKE_TRUE@check-racy/%.exp:
+@GMAKE_TRUE@	-mkdir -p racy_outputs/racy$(RACY_OUTPUT_N)/$*
+@GMAKE_TRUE@	$(DO_RUNTEST) GDB_PARALLEL=yes \
+@GMAKE_TRUE@	  --outdir=racy_outputs/racy$(RACY_OUTPUT_N)/$* $*.exp \
+@GMAKE_TRUE@	  $(RUNTESTFLAGS)
+
 check/no-matching-tests-found:
 	@echo ""
 	@echo "No matching tests found."
diff --git a/gdb/testsuite/README b/gdb/testsuite/README
index 6b59027..efcd459 100644
--- a/gdb/testsuite/README
+++ b/gdb/testsuite/README
@@ -50,6 +50,29 @@  to any non-empty value:
 If you want to use runtest directly instead of using the Makefile, see
 the description of GDB_PARALLEL below.
 
+Racy testcases
+**************
+
+Unfortunately, GDB has a number of testcases that can randomly pass or
+fail.  We call them "racy testcases", and they can be bothersome when
+one is comparing different testsuite runs.  In order to help identify
+them, it is possible to run the tests several times in a
+row and ask the testsuite machinery to analyze the results.  To do
+that, you need to specify the RACY_ITER environment variable to make:
+
+	make check RACY_ITER=5 -j4
+
+The value assigned to RACY_ITER represents the number of times you
+wish to run the tests in sequence (in the example above, the entire
+testsuite will be executed 5 times in a row, in parallel).  It is also
+possible to check just a specific test:
+
+	make check TESTS='gdb.base/default.exp' RACY_ITER=3
+
+After running the tests, you should see a file named 'racy.sum' in the
+gdb/testsuite directory.  You can also inspect the generated *.log and
+*.sum files by looking into the gdb/testsuite/racy_outputs directory.
+
 Running the Performance Tests
 *****************************
 
diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
new file mode 100755
index 0000000..4f05185
--- /dev/null
+++ b/gdb/testsuite/analyze-racy-logs.py
@@ -0,0 +1,131 @@ 
+#!/usr/bin/env python
+
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GDB.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+# This program is used to analyze the test results (i.e., *.sum files)
+# generated by GDB's testsuite, and print the testcases that are found
+# to be racy.
+#
+# Racy testcases are considered as being testcases which can
+# intermitently FAIL (or PASS) when run two or more times
+# consecutively, i.e., tests whose results are not deterministic.
+#
+# This program is invoked when the user runs "make check" and
+# specifies the RACY_ITER environment variable.
+
+import sys
+import os
+import re
+
+# The (global) dictionary that stores the associations between a *.sum
+# file and its results.  The data inside it will be stored as:
+#
+# files_and_tests = { 'file1.sum' : { 'PASS' : { 'test1', 'test2' ... },
+#                                     'FAIL' : { 'test5', 'test6' ... },
+#                                     ...
+#                                   },
+#                   { 'file2.sum' : { 'PASS' : { 'test1', 'test3' ... },
+#                                   ...
+#                                   }
+#                   }
+
+files_and_tests = dict ()
+
+# We are interested in lines that start with '.?(PASS|FAIL)'.  In
+# other words, we don't process errors (maybe we should).
+
+sum_matcher = re.compile('^(.?(PASS|FAIL)): (.*)$')
+
+def parse_sum_line (line, dic):
+    """Parse a single LINE from a sumfile, and store the results in the
+dictionary referenced by DIC."""
+    global sum_matcher
+
+    line = line.rstrip ()
+    m = re.match (sum_matcher, line)
+    if m:
+        result = m.group (1)
+        test_name = m.group (3)
+        if result not in dic.keys ():
+            dic[result] = set ()
+        if test_name in dic[result]:
+            # If the line is already present in the dictionary, then
+            # we include a unique identifier at the end of it, in the
+            # form of '<<N>>' (where N is a number >= 2).  This is
+            # useful because the GDB testsuite is full of non-unique
+            # test messages; however, if you process the racy summary
+            # file you will also need to perform this same operation
+            # in order to identify the racy test.
+            i = 2
+            while True:
+                nname = test_name + ' <<' + str (i) + '>>'
+                if nname not in dic[result]:
+                    break
+                i += 1
+            test_name = nname
+        dic[result].add (test_name)
+
+def read_sum_files (files):
+    """Read the sumfiles (passed as a list in the FILES variable), and
+process each one, filling the FILES_AND_TESTS global dictionary with
+information about them. """
+    global files_and_tests
+
+    for x in files:
+        with open (x, 'r') as f:
+            files_and_tests[x] = dict ()
+            for line in f.readlines ():
+                parse_sum_line (line, files_and_tests[x])
+
+def identify_racy_tests ():
+    """Identify and print the racy tests.  This function basically works
+on sets, and the idea behind it is simple.  It takes all the sets that
+refer to the same result (for example, all the sets that contain PASS
+tests), and compare them.  If a test is present in all PASS sets, then
+it is not racy.  Otherwise, it is.
+
+This function does that for all sets (PASS, FAIL, KPASS, KFAIL, etc.),
+and then print a sorted list (without duplicates) of all the tests
+that were found to be racy."""
+    global files_and_tests
+
+    total_set = dict ()
+    for item in files_and_tests:
+        for key in files_and_tests[item]:
+            if key not in total_set:
+                total_set[key] = files_and_tests[item][key]
+            else:
+                total_set[key] &= files_and_tests[item][key]
+
+    racy_tests = set ()
+    for item in files_and_tests:
+        for key in files_and_tests[item]:
+            racy_tests |= files_and_tests[item][key] - total_set[key]
+
+    for line in sorted (racy_tests):
+        print line
+
+if __name__ == '__main__':
+    if len (sys.argv) < 3:
+        # It only makes sense to invoke this program if you pass two
+        # or more files to be analyzed.
+        sys.exit ("Usage: %s [FILE] [FILE] ..." % sys.argv[0])
+    read_sum_files (sys.argv[1:])
+    identify_racy_tests ()
+    exit (0)