Do not terminate default test runs on test failure

Message ID Pine.LNX.4.64.1403071741020.6302@digraph.polyomino.org.uk
State Superseded
Headers

Commit Message

Joseph Myers March 7, 2014, 5:42 p.m. UTC
  This patch is an updated version of
<https://sourceware.org/ml/libc-alpha/2014-01/msg00198.html> now
proposed for inclusion in glibc.

Normal practice for software testsuites is that rather than
terminating immediately when a test fails, they continue running and
report at the end on how many tests passed or failed.

The principle behind the glibc testsuite stopping on failure was
probably that the expected state is no failures and so any failure
indicates a problem such as miscompilation.  In practice, while this
is fairly close to true for native testing on x86_64 and x86 (kernel
bugs and race conditions can still cause intermittent failures), it's
less likely to be the case on other platforms, and so people testing
glibc run the testsuite with "make -k" and then examine the logs to
determine whether the failures are what they expect to fail on that
platform, possibly with some automation for the comparison.

This patch switches the glibc testsuite to the normal convention of
not stopping on failure - unless you use stop-on-test-failure=y, in
which case it behaves essentially as it did before (and does not
generate overall test summaries on failure).  Instead, the summary
tests.sum may contain tests that FAILed.  At the end of the test run,
any FAIL or ERROR lines from tests.sum are printed, and then it exits
with error status if there were any ERROR lines (meaning a directory
had no test results).  In addition, build failures will also cause the
test run to stop - this has the justification that those *do* indicate
serious problems that should be promptly fixed and aren't generally
hard to fix (but apart from that, avoiding the build stopping on those
failures seems harder).

Note that unlike the previous patches in this series, this *does*
require people with automation around testing glibc to change their
processes - either to start using tests.sum / xtests.sum to track
failures and compare them with expectations (with or without also
using "make -k" and examining "make" logs to identify build failures),
or else to use stop-on-test-failure=y and ignore the new tests.sum /
xtests.sum mechanism.

Tested x86_64.

Compared to the previous version, this includes changes to the manual
text as suggested by Brooks in
<https://sourceware.org/ml/libc-alpha/2014-02/msg00301.html>.

This concludes the present series of patches to generate PASS / FAIL
test results and fix associated testsuite issues.  Once it's in, I
intend to put the remaining TODO list items from the discussions of
these patches on the wiki.  I may return later to further testsuite
infrastructure improvements (such as supporting installed testing),
but also encourage other people interested in testsuite issues to work
on issues of interest to them (whether or not also listed on the wiki
todo list or in Bugzilla).

2014-03-07  Joseph Myers  <joseph@codesourcery.com>

	* scripts/evaluate-test.sh: Handle fourth argument to determine
	whether test run should stop on failure.
	* Makeconfig (stop-on-test-failure): New variable.
	(evaluate-test): Pass fourth argument to evaluate-test.sh based on
	$(stop-on-test-failure).
	* Makefile (tests): Give a summary of results from testing and
	exit with failure status if they include an ERROR.
	(xtests): Likewise.
	* manual/install.texi (Configuring and compiling): Mention
	stop-on-test-failure=y.
	* INSTALL: Regenerated.
  

Comments

Brooks Moses March 10, 2014, 9:27 p.m. UTC | #1
On 03/07/2014 09:42 AM, Joseph S. Myers wrote:
> This patch is an updated version of
> <https://sourceware.org/ml/libc-alpha/2014-01/msg00198.html> now
> proposed for inclusion in glibc.
[...]
> Note that unlike the previous patches in this series, this *does*
> require people with automation around testing glibc to change their
> processes - either to start using tests.sum / xtests.sum to track
> failures and compare them with expectations (with or without also
> using "make -k" and examining "make" logs to identify build failures),
> or else to use stop-on-test-failure=y and ignore the new tests.sum /
> xtests.sum mechanism.
>
> Tested x86_64.
>
> Compared to the previous version, this includes changes to the manual
> text as suggested by Brooks in
> <https://sourceware.org/ml/libc-alpha/2014-02/msg00301.html>.

This version looks good to me.  Thanks!

I don't think my approval of the process changes is sufficient to indicate 
consensus by itself, but IMO it's a very significant improvement and I am 
strongly in favor.

- Brooks
  
Mike Frysinger March 11, 2014, 6 a.m. UTC | #2
looks OK
-mike
  
Roland McGrath March 11, 2014, 6:28 p.m. UTC | #3
This was always handled before by just passing -k.
  
Joseph Myers March 11, 2014, 6:41 p.m. UTC | #4
On Tue, 11 Mar 2014, Roland McGrath wrote:

> This was always handled before by just passing -k.

Apart from needing make -k being unnecessarily different from normal 
conventions for how testsuites work, it also doesn't work for getting the 
test summaries: if any tests (dependencies of the tests target) fail then 
the makefile rule for the tests target (which in a subdirectory generates 
subdir-tests.sum, and at toplevel also generates tests.sum) doesn't get 
run.  The point of this patch is to generate a normal summary file of test 
results at toplevel, regardless of whether individual tests passed or 
failed - so you don't need to use -k and postprocess make output to 
identify failures.
  
Roland McGrath March 11, 2014, 7:14 p.m. UTC | #5
> Apart from needing make -k being unnecessarily different from normal 
> conventions for how testsuites work, it also doesn't work for getting the 
> test summaries: if any tests (dependencies of the tests target) fail then 
> the makefile rule for the tests target (which in a subdirectory generates 
> subdir-tests.sum, and at toplevel also generates tests.sum) doesn't get 
> run.  The point of this patch is to generate a normal summary file of test 
> results at toplevel, regardless of whether individual tests passed or 
> failed - so you don't need to use -k and postprocess make output to 
> identify failures.

Yeah, that's sensible enough.  Just so long as the top-level make exits
nonzero when there were failures.
  
Joseph Myers March 11, 2014, 9:14 p.m. UTC | #6
On Tue, 11 Mar 2014, Roland McGrath wrote:

> > Apart from needing make -k being unnecessarily different from normal 
> > conventions for how testsuites work, it also doesn't work for getting the 
> > test summaries: if any tests (dependencies of the tests target) fail then 
> > the makefile rule for the tests target (which in a subdirectory generates 
> > subdir-tests.sum, and at toplevel also generates tests.sum) doesn't get 
> > run.  The point of this patch is to generate a normal summary file of test 
> > results at toplevel, regardless of whether individual tests passed or 
> > failed - so you don't need to use -k and postprocess make output to 
> > identify failures.
> 
> Yeah, that's sensible enough.  Just so long as the top-level make exits
> nonzero when there were failures.

Whether to exit with error status if there were FAILs, or only if there 
were ERRORs, was one of my questions on the first version of this patch, 
where the only previous comment was from Brooks preferring FAILs not to 
cause an error status 
<https://sourceware.org/ml/libc-alpha/2014-02/msg00301.html>.  I'm happy 
either way; error exit for any FAILs is just two extra grep lines like 
those ensuring error exit for any ERRORs.
  
Roland McGrath March 11, 2014, 9:24 p.m. UTC | #7
> Whether to exit with error status if there were FAILs, or only if there 
> were ERRORs, was one of my questions on the first version of this patch, 
> where the only previous comment was from Brooks preferring FAILs not to 
> cause an error status 
> <https://sourceware.org/ml/libc-alpha/2014-02/msg00301.html>.  I'm happy 
> either way; error exit for any FAILs is just two extra grep lines like 
> those ensuring error exit for any ERRORs.

The status quo is that 'make check' only exits zero if all tests passed.
So please preserve that.
  
Brooks Moses March 11, 2014, 10:18 p.m. UTC | #8
On 03/11/2014 02:24 PM, Roland McGrath wrote:
>> Whether to exit with error status if there were FAILs, or only if there
>> were ERRORs, was one of my questions on the first version of this patch,
>> where the only previous comment was from Brooks preferring FAILs not to
>> cause an error status
>> <https://sourceware.org/ml/libc-alpha/2014-02/msg00301.html>.  I'm happy
>> either way; error exit for any FAILs is just two extra grep lines like
>> those ensuring error exit for any ERRORs.
>
> The status quo is that 'make check' only exits zero if all tests passed.
> So please preserve that.

I won't sustain an objection to this decision.  Although I find it more 
convenient to only have an error exit in case of ERROR, it's a fairly minor 
thing to work around.

- Brooks
  
Roland McGrath March 11, 2014, 10:20 p.m. UTC | #9
> I won't sustain an objection to this decision.  Although I find it more 
> convenient to only have an error exit in case of ERROR, it's a fairly minor 
> thing to work around.

That would be inconsistent with how other GNU packages' 'make check' works.
  
Brooks Moses March 11, 2014, 10:26 p.m. UTC | #10
On Tue, Mar 11, 2014 at 3:20 PM, Roland McGrath <roland@hack.frob.com> wrote:
>> I won't sustain an objection to this decision.  Although I find it more
>> convenient to only have an error exit in case of ERROR, it's a fairly minor
>> thing to work around.
>
> That would be inconsistent with how other GNU packages' 'make check' works.

Huh.  Somehow I missed that GCC was doing that.  If that's the case,
then I will not merely withdraw my objection but reverse it;
consistency is good.

- Brooks
  
Carlos O'Donell March 13, 2014, 6:29 a.m. UTC | #11
On 03/07/2014 12:42 PM, Joseph S. Myers wrote:
> This patch is an updated version of
> <https://sourceware.org/ml/libc-alpha/2014-01/msg00198.html> now
> proposed for inclusion in glibc.
> 
> Normal practice for software testsuites is that rather than
> terminating immediately when a test fails, they continue running and
> report at the end on how many tests passed or failed.
> 
> The principle behind the glibc testsuite stopping on failure was
> probably that the expected state is no failures and so any failure
> indicates a problem such as miscompilation.  In practice, while this
> is fairly close to true for native testing on x86_64 and x86 (kernel
> bugs and race conditions can still cause intermittent failures), it's
> less likely to be the case on other platforms, and so people testing
> glibc run the testsuite with "make -k" and then examine the logs to
> determine whether the failures are what they expect to fail on that
> platform, possibly with some automation for the comparison.
> 
> This patch switches the glibc testsuite to the normal convention of
> not stopping on failure - unless you use stop-on-test-failure=y, in
> which case it behaves essentially as it did before (and does not
> generate overall test summaries on failure).  Instead, the summary
> tests.sum may contain tests that FAILed.  At the end of the test run,
> any FAIL or ERROR lines from tests.sum are printed, and then it exits
> with error status if there were any ERROR lines (meaning a directory
> had no test results).  In addition, build failures will also cause the
> test run to stop - this has the justification that those *do* indicate
> serious problems that should be promptly fixed and aren't generally
> hard to fix (but apart from that, avoiding the build stopping on those
> failures seems harder).
> 
> Note that unlike the previous patches in this series, this *does*
> require people with automation around testing glibc to change their
> processes - either to start using tests.sum / xtests.sum to track
> failures and compare them with expectations (with or without also
> using "make -k" and examining "make" logs to identify build failures),
> or else to use stop-on-test-failure=y and ignore the new tests.sum /
> xtests.sum mechanism.
> 
> Tested x86_64.
> 
> Compared to the previous version, this includes changes to the manual
> text as suggested by Brooks in
> <https://sourceware.org/ml/libc-alpha/2014-02/msg00301.html>.
> 
> This concludes the present series of patches to generate PASS / FAIL
> test results and fix associated testsuite issues.  Once it's in, I
> intend to put the remaining TODO list items from the discussions of
> these patches on the wiki.  I may return later to further testsuite
> infrastructure improvements (such as supporting installed testing),
> but also encourage other people interested in testsuite issues to work
> on issues of interest to them (whether or not also listed on the wiki
> todo list or in Bugzilla).
> 
> 2014-03-07  Joseph Myers  <joseph@codesourcery.com>
> 
> 	* scripts/evaluate-test.sh: Handle fourth argument to determine
> 	whether test run should stop on failure.
> 	* Makeconfig (stop-on-test-failure): New variable.
> 	(evaluate-test): Pass fourth argument to evaluate-test.sh based on
> 	$(stop-on-test-failure).
> 	* Makefile (tests): Give a summary of results from testing and
> 	exit with failure status if they include an ERROR.
> 	(xtests): Likewise.
> 	* manual/install.texi (Configuring and compiling): Mention
> 	stop-on-test-failure=y.
> 	* INSTALL: Regenerated.

OK, modulo parenthetical nit below.

I would prefer you split this into two patches, a minimal patch
to support running without `-k' and another for all the other
changes, but I'm not opposed to one patch for this change.
The first patch would change nothing, but add the variable,
and the second patch changes the default and includes the summary
changes.

> diff --git a/INSTALL b/INSTALL
> index bfa692d..bcb53b8 100644
> --- a/INSTALL
> +++ b/INSTALL
> @@ -154,7 +154,7 @@ will be used, and CFLAGS sets optimization options for the compiler.
>  
>       If you only specify `--host', `configure' will prepare for a
>       native compile but use what you specify instead of guessing what
> -     your system is. This is most useful to change the CPU submodel.
> +     your system is.  This is most useful to change the CPU submodel.

OK. Split into another patch?

>       For example, if `configure' guesses your machine as
>       `i686-pc-linux-gnu' but you want to compile a library for 586es,
>       give `--host=i586-pc-linux-gnu' or just `--host=i586-linux' and add
> @@ -192,11 +192,16 @@ an appropriate numeric parameter to `make'.  You need a recent GNU
>  
>     To build and run test programs which exercise some of the library
>  facilities, type `make check'.  If it does not complete successfully,
> -do not use the built library, and report a bug after verifying that the
> -problem is not already known.  *Note Reporting Bugs::, for instructions
> -on reporting bugs.  Note that some of the tests assume they are not
> -being run by `root'.  We recommend you compile and test the GNU C
> -Library as an unprivileged user.
> +or if it reports any unexpected failures or errors in its final summary
> +of results, do not use the built library, and report a bug after
> +verifying that the problem is not already known.  (You can specify
> +`stop-on-test-failure=y' when running `make check' to make the test run
> +stop and exit with an error status immediately when a failure occurs,
> +rather than completing the test run and reporting all problems found.)

Remove parenthetical. This is a fine statement to make at this point in
the document.

> +*Note Reporting Bugs::, for instructions on reporting bugs.  Note that
> +some of the tests assume they are not being run by `root'.  We
> +recommend you compile and test the GNU C Library as an unprivileged
> +user.

OK.

>  
>     Before reporting bugs make sure there is no problem with your system.
>  The tests (and later installation) use some pre-existing files of the
> diff --git a/Makeconfig b/Makeconfig
> index 9078b29..0a2e12b 100644
> --- a/Makeconfig
> +++ b/Makeconfig
> @@ -601,6 +601,12 @@ run-built-tests = yes
>  endif
>  endif
>  
> +# Whether to stop immediately when a test fails.  Nonempty means to

OK.

> +# stop, empty means not to stop.
> +ifndef stop-on-test-failure
> +stop-on-test-failure =
> +endif
> +
>  # How to run a program we just linked with our library.
>  # The program binary is assumed to be $(word 2,$^).
>  built-program-file = $(dir $(word 2,$^))$(notdir $(word 2,$^))
> @@ -1091,6 +1097,7 @@ test-xfail-name = $(strip $(patsubst %.out, %, $(patsubst $(objpfx)%, %, $@)))
>  # XPASS or XFAIL rather than PASS or FAIL.
>  evaluate-test = $(..)scripts/evaluate-test.sh $(test-name) $$? \
>  		  $(if $(test-xfail-$(test-xfail-name)),true,false) \
> +		  $(if $(stop-on-test-failure),true,false) \

OK.

>  		  > $(common-objpfx)$(test-name).test-result
>  
>  endif # Makeconfig not yet included
> diff --git a/Makefile b/Makefile
> index 8214dda..bdece42 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -324,10 +324,20 @@ tests: $(tests-special)
>  	$(..)scripts/merge-test-results.sh -t $(objpfx) subdir-tests.sum \
>  	  $(sort $(subdirs) .) \
>  	  > $(objpfx)tests.sum
> +	@grep '^ERROR:' $(objpfx)tests.sum || true
> +	@grep '^FAIL:' $(objpfx)tests.sum || true
> +	@echo "Summary of test results:"
> +	@sed 's/:.*//' < $(objpfx)tests.sum | sort | uniq -c
> +	@if grep -q '^ERROR:' $(objpfx)tests.sum; then exit 1; fi
>  xtests:
>  	$(..)scripts/merge-test-results.sh -t $(objpfx) subdir-xtests.sum \
>  	  $(sort $(subdirs)) \
>  	  > $(objpfx)xtests.sum
> +	@grep '^ERROR:' $(objpfx)xtests.sum || true
> +	@grep '^FAIL:' $(objpfx)xtests.sum || true
> +	@echo "Summary of test results for extra tests:"
> +	@sed 's/:.*//' < $(objpfx)xtests.sum | sort | uniq -c
> +	@if grep -q '^ERROR:' $(objpfx)xtests.sum; then exit 1; fi

OK. Split into another patch?

>  
>  # The realclean target is just like distclean for the parent, but we want
>  # the subdirs to know the difference in case they care.
> diff --git a/NEWS b/NEWS
> index 0c5b39a..381b8a5 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -12,6 +12,14 @@ Version 2.20
>    15347, 15804, 15894, 16447, 16532, 16545, 16574, 16600, 16609, 16610,
>    16611, 16613, 16623, 16632.
>  
> +* Running the testsuite no longer terminates as soon as a test fails.
> +  Instead, a file tests.sum (xtests.sum from "make xcheck") is generated,
> +  with PASS or FAIL lines for individual tests.  A summary of the results is
> +  printed, including a list of failing lists, and "make check" exits with
> +  error status only if test results for a subdirectory are completely
> +  missing, or if a test failed to compile.  "make check
> +  stop-on-test-failure=y" may be used to keep the old behavior.
> +

OK. Split into another patch?

>  * The am33 port, which had not worked for several years, has been removed
>    from ports.
>  
> diff --git a/manual/install.texi b/manual/install.texi
> index 8562bdc..fa44519 100644
> --- a/manual/install.texi
> +++ b/manual/install.texi
> @@ -224,9 +224,14 @@ GNU @code{make} version, though.
>  
>  To build and run test programs which exercise some of the library
>  facilities, type @code{make check}.  If it does not complete
> -successfully, do not use the built library, and report a bug after
> -verifying that the problem is not already known.  @xref{Reporting Bugs},
> -for instructions on reporting bugs.  Note that some of the tests assume
> +successfully, or if it reports any unexpected failures or errors in
> +its final summary of results, do not use the built library, and report
> +a bug after verifying that the problem is not already known.  (You can
> +specify @samp{stop-on-test-failure=y} when running @code{make check}
> +to make the test run stop and exit with an error status immediately
> +when a failure occurs, rather than completing the test run and
> +reporting all problems found.)  @xref{Reporting Bugs}, for
> +instructions on reporting bugs.  Note that some of the tests assume

OK.

>  they are not being run by @code{root}.  We recommend you compile and
>  test @theglibc{} as an unprivileged user.
>  
> diff --git a/scripts/evaluate-test.sh b/scripts/evaluate-test.sh
> index c8f5012..2a5c156 100755
> --- a/scripts/evaluate-test.sh
> +++ b/scripts/evaluate-test.sh
> @@ -17,12 +17,13 @@
>  # License along with the GNU C Library; if not, see
>  # <http://www.gnu.org/licenses/>.
>  
> -# usage: evaluate-test.sh test_name rc xfail
> +# usage: evaluate-test.sh test_name rc xfail stop_on_failure

OK.

>  
>  test_name=$1
>  rc=$2
>  orig_rc=$rc
>  xfail=$3
> +stop_on_failure=$4
>  
>  if [ $rc -eq 0 ]; then
>    result="PASS"
> @@ -37,4 +38,8 @@ fi
>  
>  echo "$result: $test_name"
>  echo "original exit status $orig_rc"
> -exit $rc
> +if $stop_on_failure; then
> +  exit $rc
> +else
> +  exit 0
> +fi
> 

Cheers,
Carlos.
  
Andreas Schwab March 31, 2014, 3:23 p.m. UTC | #12
"Joseph S. Myers" <joseph@codesourcery.com> writes:

> At the end of the test run, any FAIL or ERROR lines from tests.sum are
> printed

Are they?  I don't see that here.

Andreas.
  
Joseph Myers March 31, 2014, 5:13 p.m. UTC | #13
On Mon, 31 Mar 2014, Andreas Schwab wrote:

> "Joseph S. Myers" <joseph@codesourcery.com> writes:
> 
> > At the end of the test run, any FAIL or ERROR lines from tests.sum are
> > printed
> 
> Are they?  I don't see that here.

I've certainly seem them printed.  That's what

        @grep '^ERROR:' $(objpfx)tests.sum || true
        @grep '^FAIL:' $(objpfx)tests.sum || true

does.
  
Andreas Schwab March 31, 2014, 8:18 p.m. UTC | #14
"Joseph S. Myers" <joseph@codesourcery.com> writes:

> On Mon, 31 Mar 2014, Andreas Schwab wrote:
>
>> "Joseph S. Myers" <joseph@codesourcery.com> writes:
>> 
>> > At the end of the test run, any FAIL or ERROR lines from tests.sum are
>> > printed
>> 
>> Are they?  I don't see that here.
>
> I've certainly seem them printed.  That's what
>
>         @grep '^ERROR:' $(objpfx)tests.sum || true
>         @grep '^FAIL:' $(objpfx)tests.sum || true

This is the tail of make check:

../scripts/merge-test-results.sh -s /home/abuild/rpmbuild/BUILD/glibc-2.19.90/cc-base/elf/ elf \
  check-abi-ld check-abi-libc check-execstack check-localplt check-textrel circleload1 constload1 dblload dblunload filter global ifuncmain1 ifuncmain1pic ifuncmain1picstatic ifuncmain1pie ifuncmain1static ifuncmain1staticpic ifuncmain1staticpie ifuncmain1vis ifuncmain1vispic ifuncmain1vispie ifuncmain2 ifuncmain2pic ifuncmain2picstatic ifuncmain2static ifuncmain3 ifuncmain4 ifuncmain4picstatic ifuncmain4static ifuncmain5 ifuncmain5pic ifuncmain5picstatic ifuncmain5pie ifuncmain5static ifuncmain5staticpic ifuncmain6pie ifuncmain7 ifuncmain7pic ifuncmain7picstatic ifuncmain7pie ifuncmain7static initfirst lateglobal loadfail loadtest multiload neededtest neededtest2 neededtest3 neededtest4 next nodelete nodelete2 nodlopen nodlopen2 noload noload-mem order order-cmp order2 order2-cmp origtest preloadtest reldep reldep2 reldep3 reldep4 reldep5 reldep6 reldep7 reldep8 resolvfail restest1 restest2 tst-addr1 tst-align tst-align2 tst-array1 tst-array1-cmp tst-array1-static tst-array1-static-cmp tst-array2 tst-array2-cmp tst-array3 tst-array3-cmp tst-array4 tst-array4-cmp tst-array5 tst-array5-cmp tst-array5-static tst-array5-static-cmp tst-audit1 tst-audit2 tst-audit8 tst-audit9 tst-auxv tst-deep1 tst-dlmodcount tst-dlmopen1 tst-dlmopen2 tst-dlmopen3 tst-dlopen-aout tst-dlopenrpath tst-execstack tst-execstack-needed tst-execstack-prog tst-global1 tst-initorder tst-initorder-cmp tst-initorder2 tst-initorder2-cmp tst-leaks1 tst-leaks1-mem tst-leaks1-static tst-leaks1-static-mem tst-null-argv tst-pathopt tst-pie1 tst-pie2 tst-ptrguard1 tst-ptrguard1-static tst-relsort1 tst-rtld-load-self tst-stackguard1 tst-stackguard1-static tst-thrlock tst-tls-dlinfo tst-tls1 tst-tls1-static tst-tls10 tst-tls11 tst-tls12 tst-tls13 tst-tls14 tst-tls15 tst-tls16 tst-tls17 tst-tls18 tst-tls19 tst-tls2 tst-tls2-static tst-tls3 tst-tls4 tst-tls5 tst-tls6 tst-tls7 tst-tls8 tst-tls9 tst-tls9-static tst-unique1 tst-unique2 tst-unique3 tst-unique4 tst-unused-dep tst-unused-dep-cmp unload unload2 unload3 unload4 unload5 unload6 unload7 unload8 vismain \
  > /home/abuild/rpmbuild/BUILD/glibc-2.19.90/cc-base/elf/subdir-tests.sum
rm /home/abuild/rpmbuild/BUILD/glibc-2.19.90/cc-base/libc.dynsym /home/abuild/rpmbuild/BUILD/glibc-2.19.90/cc-base/elf/ld.dynsym
make[2]: Leaving directory '/home/abuild/rpmbuild/BUILD/glibc-2.19.90/elf'
make[1]: Target 'check' not remade because of errors.
make[1]: Leaving directory '/home/abuild/rpmbuild/BUILD/glibc-2.19.90'
Makefile:9: recipe for target 'check' failed
make: *** [check] Error 2
make: Leaving directory '/home/abuild/rpmbuild/BUILD/glibc-2.19.90/cc-base'

As you can see: you see nothing.

Andreas.
  
Joseph Myers March 31, 2014, 8:29 p.m. UTC | #15
On Mon, 31 Mar 2014, Andreas Schwab wrote:

> make[1]: Target 'check' not remade because of errors.

That means that some dependency of the toplevel tests target failed 
(possibly because a test failed to compile), in which case the commands 
for that target don't get run.  Generation of summaries relies on 
dependencies not failing - there is a presumption that while some tests 
may fail on execution (which doesn't translate to a make-level failure 
unless stop-on-test-failure is set) for reasons that are hard to fix, any 
failure to compile is a serious problem that should also be 
straightforward to fix.

So find what the first failure at make level was and fix it (or add it to 
$(tests-special) and use $(evaluate-test) in the rules, if it really needs 
to be counted as a test that's allowed to fail).
  
Andreas Schwab March 31, 2014, 10:33 p.m. UTC | #16
"Joseph S. Myers" <joseph@codesourcery.com> writes:

> (possibly because a test failed to compile)

That's currently guaranteed due to some tests using -Werror.

Andreas.
  
Joseph Myers March 31, 2014, 10:55 p.m. UTC | #17
On Tue, 1 Apr 2014, Andreas Schwab wrote:

> "Joseph S. Myers" <joseph@codesourcery.com> writes:
> 
> > (possibly because a test failed to compile)
> 
> That's currently guaranteed due to some tests using -Werror.

Well, the patch that introduced warnings for those tests should not have 
gone in without adding -Wno-error=whatever for them (unless the warnings 
for those tests are architecture-specific).
  
Andreas Schwab March 31, 2014, 11:06 p.m. UTC | #18
"Joseph S. Myers" <joseph@codesourcery.com> writes:

> Well, the patch that introduced warnings for those tests should not have 
> gone in without adding -Wno-error=whatever for them (unless the warnings 
> for those tests are architecture-specific).

So we are removing -Wundef again?

Andreas.
  
Joseph Myers March 31, 2014, 11:21 p.m. UTC | #19
On Tue, 1 Apr 2014, Andreas Schwab wrote:

> "Joseph S. Myers" <joseph@codesourcery.com> writes:
> 
> > Well, the patch that introduced warnings for those tests should not have 
> > gone in without adding -Wno-error=whatever for them (unless the warnings 
> > for those tests are architecture-specific).
> 
> So we are removing -Wundef again?

I'd say the affected tests should use -Wno-error=undef *with a comment by 
the Makefile setting listing what the warnings are that need to be fixed 
for -Wno-error=undef to be removed*.
  
Andreas Schwab April 1, 2014, 7:41 a.m. UTC | #20
"Joseph S. Myers" <joseph@codesourcery.com> writes:

> That means that some dependency of the toplevel tests target failed 
> (possibly because a test failed to compile), in which case the commands 
> for that target don't get run.  Generation of summaries relies on 
> dependencies not failing

This is a serious regression, please fix ASAP.

Andreas.
  
Joseph Myers April 1, 2014, 11:50 a.m. UTC | #21
On Tue, 1 Apr 2014, Andreas Schwab wrote:

> "Joseph S. Myers" <joseph@codesourcery.com> writes:
> 
> > That means that some dependency of the toplevel tests target failed 
> > (possibly because a test failed to compile), in which case the commands 
> > for that target don't get run.  Generation of summaries relies on 
> > dependencies not failing
> 
> This is a serious regression, please fix ASAP.

I see no regression from my patch.

Suppose there is a build failure for some test (and you need to make clear 
exactly what errors you see, on what architecture, before we can start to 
work out the correct way to address them).

If you don't use -k, then, both before and after my patch, the test run 
stops with error status when that build failure occurs, without printing a 
summary of results.  No regression.

If you do use -k, then, both before and after my patch, the test run 
continues past the failure to run as many tests as possible, and once the 
possible tests have run it finishes with error status, without printing a 
summary of results.  No regression.

Maybe some other patch - say, the -Wundef patch - caused a regression, but 
you haven't given sufficient information about the actual failures you see 
and the configuration in which you see them to determine the right fix.
  
Andreas Schwab April 1, 2014, 1:09 p.m. UTC | #22
"Joseph S. Myers" <joseph@codesourcery.com> writes:

> On Tue, 1 Apr 2014, Andreas Schwab wrote:
>
>> "Joseph S. Myers" <joseph@codesourcery.com> writes:
>> 
>> > That means that some dependency of the toplevel tests target failed 
>> > (possibly because a test failed to compile), in which case the commands 
>> > for that target don't get run.  Generation of summaries relies on 
>> > dependencies not failing
>> 
>> This is a serious regression, please fix ASAP.
>
> I see no regression from my patch.

The regression is the ability to see all testsuite failures in the build
log.

Andreas.
  
Joseph Myers April 1, 2014, 2:12 p.m. UTC | #23
On Tue, 1 Apr 2014, Andreas Schwab wrote:

> "Joseph S. Myers" <joseph@codesourcery.com> writes:
> 
> > On Tue, 1 Apr 2014, Andreas Schwab wrote:
> >
> >> "Joseph S. Myers" <joseph@codesourcery.com> writes:
> >> 
> >> > That means that some dependency of the toplevel tests target failed 
> >> > (possibly because a test failed to compile), in which case the commands 
> >> > for that target don't get run.  Generation of summaries relies on 
> >> > dependencies not failing
> >> 
> >> This is a serious regression, please fix ASAP.
> >
> > I see no regression from my patch.
> 
> The regression is the ability to see all testsuite failures in the build
> log.

Use cases that involve identifying a list of failures from a build log are 
expected to involve changes to the system you use to list failures - that 
is, this is working as designed.  Either set stop-on-test-failure or 
combine build failures from the build log with test failures from 
.test-result files, if there is some good reason the build failures can't 
be fixed first before looking at other issues.

You still haven't said what the build failures are or what platform you 
see them on.  I don't see build failures in a testsuite run on x86_64.  
(I do see elf/tst-dlopen-aout failing - contrary to the comment saying it 
only fails if --enable-hardcoded-path-in-tests - and consider it rather 
dubious to have a test like that that fails in a default testsuite 
configuration; a test depending on --enable-hardcoded-path-in-tests should 
be disabled completely when the testsuite is run differently.)
  
Andreas Schwab April 1, 2014, 2:18 p.m. UTC | #24
"Joseph S. Myers" <joseph@codesourcery.com> writes:

> Use cases that involve identifying a list of failures from a build log are 
> expected to involve changes to the system you use to list failures - that 
> is, this is working as designed.

It is unacceptable that there is no summary just because one of the test
cases fails to build, since there is no other way to see which test
cases have failed if the only source of information is the build log.
This is never a problem with dejagnu, for example.

Andreas.
  
Joseph Myers April 1, 2014, 2:27 p.m. UTC | #25
On Tue, 1 Apr 2014, Andreas Schwab wrote:

> "Joseph S. Myers" <joseph@codesourcery.com> writes:
> 
> > Use cases that involve identifying a list of failures from a build log are 
> > expected to involve changes to the system you use to list failures - that 
> > is, this is working as designed.
> 
> It is unacceptable that there is no summary just because one of the test
> cases fails to build, since there is no other way to see which test
> cases have failed if the only source of information is the build log.
> This is never a problem with dejagnu, for example.

It is the case with DejaGnu.  The equivalent in DejaGnu is that a problem 
in a .exp file cause an ERROR that terminates the test run early or 
otherwise prevents some tests from being run (ERRORs in DejaGnu can't be 
reliably associated with individual tests).  I've seen that plenty of 
times.

The output of "make" is not a stable interface (the cases in which the 
exit status is nonzero are such an interface).  If you have automation to 
list failures from the output of "make", it needs changing.  If you want 
to use -k (and I discourage using -k with "make check" after my changes) 
and allow build failures, either use stop-on-test-failure and otherwise 
keep your old automation, or combine failures from the make logs with a 
concatentation of all .test-result files from the tree where you ran "make 
-k check".

I don't object to counting the builds of testcases as tests themselves and 
using $(evaluate-test) in the makefile rules (obviously you need to avoid 
conflicts in the PASS/FAIL lines between the entry for the build and the 
entry for the execution), but that would be a further incremental 
improvement to remove more cases for use of -k or external automation to 
identify failures.
  

Patch

diff --git a/INSTALL b/INSTALL
index bfa692d..bcb53b8 100644
--- a/INSTALL
+++ b/INSTALL
@@ -154,7 +154,7 @@  will be used, and CFLAGS sets optimization options for the compiler.
 
      If you only specify `--host', `configure' will prepare for a
      native compile but use what you specify instead of guessing what
-     your system is. This is most useful to change the CPU submodel.
+     your system is.  This is most useful to change the CPU submodel.
      For example, if `configure' guesses your machine as
      `i686-pc-linux-gnu' but you want to compile a library for 586es,
      give `--host=i586-pc-linux-gnu' or just `--host=i586-linux' and add
@@ -192,11 +192,16 @@  an appropriate numeric parameter to `make'.  You need a recent GNU
 
    To build and run test programs which exercise some of the library
 facilities, type `make check'.  If it does not complete successfully,
-do not use the built library, and report a bug after verifying that the
-problem is not already known.  *Note Reporting Bugs::, for instructions
-on reporting bugs.  Note that some of the tests assume they are not
-being run by `root'.  We recommend you compile and test the GNU C
-Library as an unprivileged user.
+or if it reports any unexpected failures or errors in its final summary
+of results, do not use the built library, and report a bug after
+verifying that the problem is not already known.  (You can specify
+`stop-on-test-failure=y' when running `make check' to make the test run
+stop and exit with an error status immediately when a failure occurs,
+rather than completing the test run and reporting all problems found.)
+*Note Reporting Bugs::, for instructions on reporting bugs.  Note that
+some of the tests assume they are not being run by `root'.  We
+recommend you compile and test the GNU C Library as an unprivileged
+user.
 
    Before reporting bugs make sure there is no problem with your system.
 The tests (and later installation) use some pre-existing files of the
diff --git a/Makeconfig b/Makeconfig
index 9078b29..0a2e12b 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -601,6 +601,12 @@  run-built-tests = yes
 endif
 endif
 
+# Whether to stop immediately when a test fails.  Nonempty means to
+# stop, empty means not to stop.
+ifndef stop-on-test-failure
+stop-on-test-failure =
+endif
+
 # How to run a program we just linked with our library.
 # The program binary is assumed to be $(word 2,$^).
 built-program-file = $(dir $(word 2,$^))$(notdir $(word 2,$^))
@@ -1091,6 +1097,7 @@  test-xfail-name = $(strip $(patsubst %.out, %, $(patsubst $(objpfx)%, %, $@)))
 # XPASS or XFAIL rather than PASS or FAIL.
 evaluate-test = $(..)scripts/evaluate-test.sh $(test-name) $$? \
 		  $(if $(test-xfail-$(test-xfail-name)),true,false) \
+		  $(if $(stop-on-test-failure),true,false) \
 		  > $(common-objpfx)$(test-name).test-result
 
 endif # Makeconfig not yet included
diff --git a/Makefile b/Makefile
index 8214dda..bdece42 100644
--- a/Makefile
+++ b/Makefile
@@ -324,10 +324,20 @@  tests: $(tests-special)
 	$(..)scripts/merge-test-results.sh -t $(objpfx) subdir-tests.sum \
 	  $(sort $(subdirs) .) \
 	  > $(objpfx)tests.sum
+	@grep '^ERROR:' $(objpfx)tests.sum || true
+	@grep '^FAIL:' $(objpfx)tests.sum || true
+	@echo "Summary of test results:"
+	@sed 's/:.*//' < $(objpfx)tests.sum | sort | uniq -c
+	@if grep -q '^ERROR:' $(objpfx)tests.sum; then exit 1; fi
 xtests:
 	$(..)scripts/merge-test-results.sh -t $(objpfx) subdir-xtests.sum \
 	  $(sort $(subdirs)) \
 	  > $(objpfx)xtests.sum
+	@grep '^ERROR:' $(objpfx)xtests.sum || true
+	@grep '^FAIL:' $(objpfx)xtests.sum || true
+	@echo "Summary of test results for extra tests:"
+	@sed 's/:.*//' < $(objpfx)xtests.sum | sort | uniq -c
+	@if grep -q '^ERROR:' $(objpfx)xtests.sum; then exit 1; fi
 
 # The realclean target is just like distclean for the parent, but we want
 # the subdirs to know the difference in case they care.
diff --git a/NEWS b/NEWS
index 0c5b39a..381b8a5 100644
--- a/NEWS
+++ b/NEWS
@@ -12,6 +12,14 @@  Version 2.20
   15347, 15804, 15894, 16447, 16532, 16545, 16574, 16600, 16609, 16610,
   16611, 16613, 16623, 16632.
 
+* Running the testsuite no longer terminates as soon as a test fails.
+  Instead, a file tests.sum (xtests.sum from "make xcheck") is generated,
+  with PASS or FAIL lines for individual tests.  A summary of the results is
+  printed, including a list of failing lists, and "make check" exits with
+  error status only if test results for a subdirectory are completely
+  missing, or if a test failed to compile.  "make check
+  stop-on-test-failure=y" may be used to keep the old behavior.
+
 * The am33 port, which had not worked for several years, has been removed
   from ports.
 
diff --git a/manual/install.texi b/manual/install.texi
index 8562bdc..fa44519 100644
--- a/manual/install.texi
+++ b/manual/install.texi
@@ -224,9 +224,14 @@  GNU @code{make} version, though.
 
 To build and run test programs which exercise some of the library
 facilities, type @code{make check}.  If it does not complete
-successfully, do not use the built library, and report a bug after
-verifying that the problem is not already known.  @xref{Reporting Bugs},
-for instructions on reporting bugs.  Note that some of the tests assume
+successfully, or if it reports any unexpected failures or errors in
+its final summary of results, do not use the built library, and report
+a bug after verifying that the problem is not already known.  (You can
+specify @samp{stop-on-test-failure=y} when running @code{make check}
+to make the test run stop and exit with an error status immediately
+when a failure occurs, rather than completing the test run and
+reporting all problems found.)  @xref{Reporting Bugs}, for
+instructions on reporting bugs.  Note that some of the tests assume
 they are not being run by @code{root}.  We recommend you compile and
 test @theglibc{} as an unprivileged user.
 
diff --git a/scripts/evaluate-test.sh b/scripts/evaluate-test.sh
index c8f5012..2a5c156 100755
--- a/scripts/evaluate-test.sh
+++ b/scripts/evaluate-test.sh
@@ -17,12 +17,13 @@ 
 # License along with the GNU C Library; if not, see
 # <http://www.gnu.org/licenses/>.
 
-# usage: evaluate-test.sh test_name rc xfail
+# usage: evaluate-test.sh test_name rc xfail stop_on_failure
 
 test_name=$1
 rc=$2
 orig_rc=$rc
 xfail=$3
+stop_on_failure=$4
 
 if [ $rc -eq 0 ]; then
   result="PASS"
@@ -37,4 +38,8 @@  fi
 
 echo "$result: $test_name"
 echo "original exit status $orig_rc"
-exit $rc
+if $stop_on_failure; then
+  exit $rc
+else
+  exit 0
+fi