From patchwork Tue Mar  1 05:51:30 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sergio Durigan Junior <sergiodj@redhat.com>
X-Patchwork-Id: 11151
Received: (qmail 41328 invoked by alias); 1 Mar 2016 05:51:37 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Delivered-To: mailing list gdb-patches@sourceware.org
Received: (qmail 41305 invoked by uid 89); 1 Mar 2016 05:51:35 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,
	RP_MATCHES_RCVD,
	SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=dictionary,
	non-empty, test1, 1906
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with
	(AES256-GCM-SHA384 encrypted) ESMTPS;
	Tue, 01 Mar 2016 05:51:32 +0000
Received: from int-mx13.intmail.prod.int.phx2.redhat.com
	(int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])	by
	mx1.redhat.com (Postfix) with ESMTPS id 88A0F821C3	for
	<gdb-patches@sourceware.org>; Tue,  1 Mar 2016 05:51:31 +0000 (UTC)
Received: from localhost (unused-10-15-17-51.yyz.redhat.com
	[10.15.17.51])	by int-mx13.intmail.prod.int.phx2.redhat.com
	(8.14.4/8.14.4) with ESMTP id u215pURQ014854	(version=TLSv1/SSLv3
	cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
	Tue, 1 Mar 2016 00:51:31 -0500
From: Sergio Durigan Junior <sergiodj@redhat.com>
To: GDB Patches <gdb-patches@sourceware.org>
Cc: Pedro Alves <palves@redhat.com>
Subject: [PATCH v2] Improve analysis of racy testcases
References: <87r3gcgm91.fsf@redhat.com>
X-URL: http://blog.sergiodj.net
Date: Tue, 01 Mar 2016 00:51:30 -0500
In-Reply-To: <87r3gcgm91.fsf@redhat.com> (Sergio Durigan Junior's message
	of	"Tue, 16 Feb 2016 20:26:50 -0500")
Message-ID: <87d1rewxst.fsf@redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
X-IsSubscribed: yes

This is the second version of the patch.  It addresses the following
comments from Pedro:

- Document the DEFAULT_RACY_ITER variable (on gdb/testsuite/Makefile.in
  and gdb/testsuite/README)

- Rephrase the "Racy testcases" section on gdb/testsuite/README

- Improve the output of the racy analyzer in order to make it look more
  like DejaGNU's output.

- Implement a mechanism for ignoring specific testcases based on their
  results (e.g., PASS -> KFAIL).


This patch is a proposal to introduce some mechanisms to identify racy
testcases present in our testsuite.  As can be seen in previous
discussions, racy tests are really bothersome and cause our BuildBot to
pollute the gdb-testers mailing list with hundreds of false-positives
messages every month.  Hopefully, by identifying these racy tests in
advance (and automatically) will contribute to the reduction of noise
traffic to gdb-testers, maybe to the point where we will be able to send
the failure messages directly to the authors of the commits.

I spent some time trying to decide the best way to tackle this problem,
and decided that there is no silver bullet.  Racy tests are tricky and
it is difficult to catch them, so the best solution I could find (for
now?) is to run our testsuite a number of times in a row, and then
compare the results (i.e., the gdb.sum files generated during each run).
The more times you run the tests, the more racy tests you are likely to
detect (at the expense of waiting longer and longer).  You can also run
the tests in parallel, which makes things faster (and contribute to
catching more racy tests, because your machine will have less resources
for each test and some of them are likely to fail when this happens).  I
did some tests in my machine (8-core i7, 16GB RAM), and running the
whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.

In order to run the racy test machinery, you need to specify the
RACY_ITER environment variable.  You will assign a number to this
variable, which represents the number of times you want to run the
tests.  So, for example, if you want to run the whole testsuite 3 times
in parallel (using 2 cores), you will do:

  make check RACY_ITER=3 -j2

It is also possible to use the TESTS variable and specify which tests
you want to run:

  make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2

And so on.  The output files will be put at the directory
gdb/testsuite/racy_outputs/.

After make invokes the necessary rules to run the tests, it finally runs
a Python script that will analyze the resulting gdb.sum files.  This
Python script will read each file, and construct a series of sets based
on the results of the tests (one set for FAIL's, one for PASS'es, one
for KFAIL's, etc.).  It will then do some set operations and come up
with a list of unique, sorted testcases that are racy.  The algorithm
behind this is:

  for state in PASS, FAIL, XFAIL, XPASS...; do
    if a test's state in every sumfile is $state; then
      it is not racy
    else
      it is racy

IOW, a test must have the same state in every sumfile.

After processing everything, the script prints the racy tests it could
identify on stdout.  I am redirecting this to a file named racy.sum.

Something else that I wasn't sure how to deal with was non-unique
messages in our testsuite.  I decided to do the same thing I do in our
BuildBot: include a unique identifier in the end of message, like:

  gdb.base/xyz.exp: non-unique message
  gdb.base/xyz.exp: non-unique message <<2>>
  
This means that you will have to be careful about them when you use the
racy.sum file.

I ran the script several times here, and it did a good job catching some
well-known racy tests.  Overall, I am satisfied with this approach and I
think it will be helpful to have it upstream'ed.  I also intend to
extend our BuildBot and create new, specialized builders that will be
responsible for detecting the racy tests every X number of days, but
that will only be done when this patch is accepted.

Comments are welcome, as usual.  Thanks.

diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
index 38c3052..b5e53ad 100644
--- a/gdb/testsuite/Makefile.in
+++ b/gdb/testsuite/Makefile.in
@@ -52,6 +52,12 @@ RUNTESTFLAGS =
 
 FORCE_PARALLEL =
 
+# Default number of iterations that we will use to run the testsuite
+# if the user does not specify the RACY_ITER environment variable
+# (e.g., when the user calls the make rule directly from the command
+# line).
+DEFAULT_RACY_ITER = 3
+
 RUNTEST_FOR_TARGET = `\
   if [ -f $${srcdir}/../../dejagnu/runtest ]; then \
     echo $${srcdir}/../../dejagnu/runtest; \
@@ -140,7 +146,8 @@ installcheck:
 # given.  If RUNTESTFLAGS is not empty, then by default the tests will
 # be serialized.  This can be overridden by setting FORCE_PARALLEL to
 # any non-empty value.  For a non-GNU make, do not parallelize.
-@GMAKE_TRUE@CHECK_TARGET = $(if $(FORCE_PARALLEL),check-parallel,$(if $(RUNTESTFLAGS),check-single,$(if $(saw_dash_j),check-parallel,check-single)))
+@GMAKE_TRUE@CHECK_TARGET_TMP = $(if $(FORCE_PARALLEL),check-parallel,$(if $(RUNTESTFLAGS),check-single,$(if $(saw_dash_j),check-parallel,check-single)))
+@GMAKE_TRUE@CHECK_TARGET = $(if $(RACY_ITER),$(addsuffix -racy,$(CHECK_TARGET_TMP)),$(CHECK_TARGET_TMP))
 @GMAKE_FALSE@CHECK_TARGET = check-single
 
 # Note that we must resort to a recursive make invocation here,
@@ -190,6 +197,26 @@ DO_RUNTEST = \
 check-single:
 	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)
 
+check-single-racy:
+	-rm -rf cache racy_outputs temp
+	mkdir -p racy_outputs; \
+	racyiter="$(RACY_ITER)"; \
+	test "x$$racyiter" == "x" && \
+	  racyiter=$(DEFAULT_RACY_ITER); \
+	if test $$racyiter -lt 2 ; then \
+	  echo "RACY_ITER must be at least 2."; \
+	  exit 1; \
+	fi; \
+	trap "exit" INT; \
+	for n in `seq $$racyiter` ; do \
+	  mkdir -p racy_outputs/$$n; \
+	  $(DO_RUNTEST) --outdir=racy_outputs/$$n $(RUNTESTFLAGS) \
+	    $(expanded_tests_or_none); \
+	done; \
+	$(srcdir)/analyze-racy-logs.py \
+	  `ls racy_outputs/gdb*.sum` > racy.sum; \
+	sed -n '/=== gdb Summary ===/,$$ p' racy.sum
+
 check-parallel:
 	-rm -rf cache outputs temp
 	$(MAKE) -k do-check-parallel; \
@@ -201,6 +228,31 @@ check-parallel:
 	sed -n '/=== gdb Summary ===/,$$ p' gdb.sum; \
 	exit $$result
 
+check-parallel-racy:
+	-rm -rf cache racy_outputs temp
+	racyiter="$(RACY_ITER)"; \
+	test "x$$racyiter" == "x" && \
+	  racyiter=$(DEFAULT_RACY_ITER); \
+	if test $$racyiter -lt 2 ; then \
+	  echo "RACY_ITER must be at least 2."; \
+	  exit 1; \
+	fi; \
+	trap "exit" INT; \
+	for n in `seq $$racyiter` ; do \
+	  $(MAKE) -k do-check-parallel-racy \
+	    RACY_OUTPUT_N=$$n; \
+	  $(SHELL) $(srcdir)/dg-extract-results.sh \
+	    `find racy_outputs/$$n -name gdb.sum -print` > \
+	    racy_outputs/$$n/gdb.sum; \
+	  $(SHELL) $(srcdir)/dg-extract-results.sh -L \
+	    `find racy_outputs/$$n -name gdb.log -print` > \
+	    racy_outputs/$$n/gdb.log; \
+	  sed -n '/=== gdb Summary ===/,$$ p' racy_outputs/$$n/gdb.sum; \
+	done; \
+	$(srcdir)/analyze-racy-logs.py \
+	  `ls racy_outputs/*/gdb.sum` > racy.sum; \
+	sed -n '/=== gdb Summary ===/,$$ p' racy.sum
+
 # Turn a list of .exp files into "check/" targets.  Only examine .exp
 # files appearing in a gdb.* directory -- we don't want to pick up
 # lib/ by mistake.  For example, gdb.linespec/linespec.exp becomes
@@ -214,9 +266,9 @@ slow_tests = gdb.base/break-interp.exp gdb.base/interp.exp \
 	gdb.base/multi-forks.exp
 @GMAKE_TRUE@all_tests := $(shell cd $(srcdir) && find gdb.* -name '*.exp' -print)
 @GMAKE_TRUE@reordered_tests := $(slow_tests) $(filter-out $(slow_tests),$(all_tests))
-@GMAKE_TRUE@TEST_TARGETS := $(addprefix check/,$(reordered_tests))
+@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(reordered_tests))
 @GMAKE_TRUE@else
-@GMAKE_TRUE@TEST_TARGETS := $(addprefix check/,$(expanded_tests_or_none))
+@GMAKE_TRUE@TEST_TARGETS := $(addprefix $(if $(RACY_ITER),check-racy,check)/,$(expanded_tests_or_none))
 @GMAKE_TRUE@endif
 
 do-check-parallel: $(TEST_TARGETS)
@@ -226,6 +278,15 @@ do-check-parallel: $(TEST_TARGETS)
 @GMAKE_TRUE@	-mkdir -p outputs/$*
 @GMAKE_TRUE@	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)
 
+do-check-parallel-racy: $(TEST_TARGETS)
+	@:
+
+@GMAKE_TRUE@check-racy/%.exp:
+@GMAKE_TRUE@	-mkdir -p racy_outputs/$(RACY_OUTPUT_N)/$*
+@GMAKE_TRUE@	$(DO_RUNTEST) GDB_PARALLEL=yes \
+@GMAKE_TRUE@	  --outdir=racy_outputs/$(RACY_OUTPUT_N)/$* $*.exp \
+@GMAKE_TRUE@	  $(RUNTESTFLAGS)
+
 check/no-matching-tests-found:
 	@echo ""
 	@echo "No matching tests found."
diff --git a/gdb/testsuite/README b/gdb/testsuite/README
index 6b59027..043a8bd 100644
--- a/gdb/testsuite/README
+++ b/gdb/testsuite/README
@@ -50,6 +50,51 @@ to any non-empty value:
 If you want to use runtest directly instead of using the Makefile, see
 the description of GDB_PARALLEL below.
 
+Racy testcases
+**************
+
+Sometimes, new testcases are added to the testsuite that are not
+entirely deterministic, and can randomly pass or fail.  We call them
+"racy testcases", and they can be bothersome when one is comparing
+different testsuite runs.  In order to help identifying them, it is
+possible to run the tests several times in a row and ask the testsuite
+machinery to analyze the results.  To do that, you need to specify the
+RACY_ITER environment variable to make:
+
+	make check RACY_ITER=5 -j4
+
+The value assigned to RACY_ITER represents the number of times you
+wish to run the tests in sequence (in the example above, the entire
+testsuite will be executed 5 times in a row, in parallel).  It is also
+possible to check just a specific test:
+
+	make check TESTS='gdb.base/default.exp' RACY_ITER=3
+
+One can also decide to call the Makefile rules by hand inside the
+gdb/testsuite directory, e.g.:
+
+	make check-paralell-racy -j4
+
+In which case the value of the DEFAULT_RACY_ITER variable (inside
+gdb/testsuite/Makefile.in) will be used to determine how many
+iterations will be run.
+
+After running the tests, you shall see a file name 'racy.sum' in the
+gdb/testsuite directory.  You can also inspect the generated *.log and
+*.sum files by looking into the gdb/testsuite/racy_ouputs directory.
+
+If you already have *.sum files generated from previous testsuite runs
+and you would like to analyze them without having to run the testsuite
+again, you can also use the 'analyze-racy-logs.py' script directly.
+It is located in the gdb/testsuite/ directory, and it expects a list
+of two or more *.sum files to be provided as its argument.  For
+example:
+
+	./gdb/testsuite/analyze-racy-logs.py testsuite-01/gdb.sum \
+	  testsuite-02/gdb.sum testsuite-03/gdb.sum
+
+The script will output its analysis report to the standard output.
+
 Running the Performance Tests
 *****************************
 
diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
new file mode 100755
index 0000000..1083b51
--- /dev/null
+++ b/gdb/testsuite/analyze-racy-logs.py
@@ -0,0 +1,177 @@
+#!/usr/bin/env python
+
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GDB.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+# This program is used to analyze the test results (i.e., *.sum files)
+# generated by GDB's testsuite, and print the testcases that are found
+# to be racy.
+#
+# Racy testcases are considered as being testcases which can
+# intermittently FAIL (or PASS) when run two or more times
+# consecutively, i.e., tests whose results are not deterministic.
+#
+# This program is invoked when the user runs "make check" and
+# specifies the RACY_ITER environment variable.
+
+import sys
+import os
+import re
+
+# The (global) dictionary that stores the associations between a *.sum
+# file and its results.  The data inside it will be stored as:
+#
+# files_and_tests = { 'file1.sum' : { 'PASS' : { 'test1', 'test2' ... },
+#                                     'FAIL' : { 'test5', 'test6' ... },
+#                                     ...
+#                                   },
+#                   { 'file2.sum' : { 'PASS' : { 'test1', 'test3' ... },
+#                                   ...
+#                                   }
+#                   }
+
+files_and_tests = dict ()
+
+# The relatioships between various states of the same tests that
+# should be ignored.  For example, if the same test PASSes on a
+# testcase run but KFAILs on another, this test should be considered
+# racy because a known-failure is...  known.
+
+ignore_relations = { 'PASS' : 'KFAIL' }
+
+# We are interested in lines that start with '.?(PASS|FAIL)'.  In
+# other words, we don't process errors (maybe we should).
+
+sum_matcher = re.compile('^(.?(PASS|FAIL)): (.*)$')
+
+def parse_sum_line (line, dic):
+    """Parse a single LINE from a sumfile, and store the results in the
+dictionary referenced by DIC."""
+    global sum_matcher
+
+    line = line.rstrip ()
+    m = re.match (sum_matcher, line)
+    if m:
+        result = m.group (1)
+        test_name = m.group (3)
+        # Remove tail parentheses.  These are likely to be '(timeout)'
+        # and other extra information that will only confuse us.
+        test_name = re.sub ('(\s+)?\(.*$', '', test_name)
+        if result not in dic.keys ():
+            dic[result] = set ()
+        if test_name in dic[result]:
+            # If the line is already present in the dictionary, then
+            # we include a unique identifier in the end of it, in the
+            # form or '<<N>>' (where N is a number >= 2).  This is
+            # useful because the GDB testsuite is full of non-unique
+            # test messages; however, if you process the racy summary
+            # file you will also need to perform this same operation
+            # in order to identify the racy test.
+            i = 2
+            while True:
+                nname = test_name + ' <<' + str (i) + '>>'
+                if nname not in dic[result]:
+                    break
+                i += 1
+            test_name = nname
+        dic[result].add (test_name)
+
+def read_sum_files (files):
+    """Read the sumfiles (passed as a list in the FILES variable), and
+process each one, filling the FILES_AND_TESTS global dictionary with
+information about them. """
+    global files_and_tests
+
+    for x in files:
+        with open (x, 'r') as f:
+            files_and_tests[x] = dict ()
+            for line in f.readlines ():
+                parse_sum_line (line, files_and_tests[x])
+
+def identify_racy_tests ():
+    """Identify and print the racy tests.  This function basically works
+on sets, and the idea behind it is simple.  It takes all the sets that
+refer to the same result (for example, all the sets that contain PASS
+tests), and compare them.  If a test is present in all PASS sets, then
+it is not racy.  Otherwise, it is.
+
+This function does that for all sets (PASS, FAIL, KPASS, KFAIL, etc.),
+and then print a sorted list (without duplicates) of all the tests
+that were found to be racy."""
+    global files_and_tests
+
+    # First, construct two dictionaries that will hold one set of
+    # testcases for each state (PASS, FAIL, etc.).
+    #
+    # Each set in NONRACY_TESTS will contain only the non-racy
+    # testcases for that state.  A non-racy testcase is a testcase
+    # that has the same state in all test runs.
+    #
+    # Each set in ALL_TESTS will contain all tests, racy or not, for
+    # that state.
+    nonracy_tests = dict ()
+    all_tests = dict ()
+    for f in files_and_tests:
+        for state in files_and_tests[f]:
+            try:
+                nonracy_tests[state] &= files_and_tests[f][state].copy ()
+            except KeyError:
+                nonracy_tests[state] = files_and_tests[f][state].copy ()
+
+            try:
+                all_tests[state] |= files_and_tests[f][state].copy ()
+            except KeyError:
+                all_tests[state] = files_and_tests[f][state].copy ()
+
+    # Now, we eliminate the tests that are present in states that need
+    # to be ignored.  For example, tests both in the PASS and KFAIL
+    # states should not be considered racy.
+    ignored_tests = set ()
+    for s1, s2 in ignore_relations.iteritems ():
+        try:
+            ignored_tests |= (all_tests[s1] & all_tests[s2])
+        except:
+            continue
+
+    racy_tests = set ()
+    for f in files_and_tests:
+        for state in files_and_tests[f]:
+            racy_tests |= files_and_tests[f][state] - nonracy_tests[state]
+
+    racy_tests = racy_tests - ignored_tests
+
+    # Print the header.
+    print "\t\t=== gdb tests ===\n"
+
+    # Print each test.
+    for line in sorted (racy_tests):
+        print line
+
+    # Print the summary.
+    print "\n"
+    print "\t\t=== gdb Summary ===\n"
+    print "# of racy testcases:\t\t%d" % len (racy_tests)
+
+if __name__ == '__main__':
+    if len (sys.argv) < 3:
+        # It only makes sense to invoke this program if you pass two
+        # or more files to be analyzed.
+        sys.exit ("Usage: %s [FILE] [FILE] ..." % sys.argv[0])
+    read_sum_files (sys.argv[1:])
+    identify_racy_tests ()
+    exit (0)