Implement timestamp'ed output on "make check"

Message ID 20181122221240.15354-1-sergiodj@redhat.com
State New, archived
Headers

Commit Message

Sergio Durigan Junior Nov. 22, 2018, 10:12 p.m. UTC
  It is unfortunately not uncommon to have tests hanging on some of the
BuildBot workers.  For example, the ppc64be/ppc64le+gdbserver builders
are especially in a bad state when it comes to testing GDB/gdbserver,
and we can have builds that take an absurd amount of time to
finish (almost 1 week for one single build, for example).

It may be hard to diagnose these failures, because sometimes we don't
have access to the faulty systems, and other times we're just too busy
to wait and check which test is actually hanging.  During one of our
conversations about the topic, someone proposed that it would be a
good idea to have a timestamp put together with stdout output, so that
we can come back later and examine which tests are taking too long to
complete.

Here's my proposal to do this.  The very first thing I tried to do was
to use "ts(1)" to achieve this feature, and it obviously worked, but
the problem is that I'm afraid "ts(1)" may not be widely available on
every system we support.  Therefore, I decided to implement a *very*
simple version of "ts(1)", in Python 3, which basically does the same
thing: iterate over the stdin lines, and prepend a timestamp onto
them.

As for testsuite/Makefile.in, the user can now specify two new
variables to enable timestamp'ed output: TS (which enables the
output), and TS_FORMAT (optional, used to specify another timestamp
format according to "strftime").

Here's an example of how the output looks like:

  ...
  [Nov 22 17:07:19] Running binutils-gdb/gdb/testsuite/gdb.base/call-strs.exp ...
  [Nov 22 17:07:19] Running binutils-gdb/gdb/testsuite/gdb.base/step-over-no-symbols.exp ...
  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/all-architectures-6.exp ...
  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/hashline3.exp ...
  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/max-value-size.exp ...
  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/quit-live.exp ...
  [Nov 22 17:07:46] Running binutils-gdb/gdb/testsuite/gdb.base/paginate-bg-execution.exp ...
  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/gcore-buffer-overflow.exp ...
  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/gcore-relro.exp ...
  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/watchpoint-delete.exp ...
  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/breakpoint-in-ro-region.exp ...
  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/vla-sideeffect.exp ...
  [Nov 22 17:07:57] Running binutils-gdb/gdb/testsuite/gdb.base/unload.exp ...
  ...

(What, gdb.base/quit-live.exp is taking 26 seconds to complete?!)

I've also tested this with FORCE_PARALLEL=1 -j8, and the output still
looks clean (i.e., not intertwined).

Output to stderr is not timestamp'ed, but I don't think that will be a
problem for us.  If it is, we can revisit the solution and extend it.

gdb/testsuite/ChangeLog:
2018-11-22  Sergio Durigan Junior  <sergiodj@redhat.com>

	* Makefile.in (TIMESTAMP): New variable.
	(check-single): Add $(TIMESTAMP) to the end of $(DO_RUNTEST)
	command.
	(check-single-racy): Likewise.
	(check/%.exp): Likewise.
	(check-racy/%.exp): Likewise.
	(workers/%.worker): Likewise.
	(build-perf): Likewise.
	(check-perf): Likewise.
	* README: Describe new "TS" and "TS_FORMAT" variables.
	* print-ts.py: New file.
---
 gdb/testsuite/Makefile.in | 20 ++++++++++-------
 gdb/testsuite/README      | 20 +++++++++++++++++
 gdb/testsuite/print-ts.py | 45 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 8 deletions(-)
 create mode 100755 gdb/testsuite/print-ts.py
  

Comments

Alan Hayward Nov. 23, 2018, 2:42 p.m. UTC | #1
> On 22 Nov 2018, at 22:12, Sergio Durigan Junior <sergiodj@redhat.com> wrote:

> 

> It is unfortunately not uncommon to have tests hanging on some of the

> BuildBot workers.  For example, the ppc64be/ppc64le+gdbserver builders

> are especially in a bad state when it comes to testing GDB/gdbserver,

> and we can have builds that take an absurd amount of time to

> finish (almost 1 week for one single build, for example).

> 

> It may be hard to diagnose these failures, because sometimes we don't

> have access to the faulty systems, and other times we're just too busy

> to wait and check which test is actually hanging.  During one of our

> conversations about the topic, someone proposed that it would be a

> good idea to have a timestamp put together with stdout output, so that

> we can come back later and examine which tests are taking too long to

> complete.


Having encountered the same issue, I really like this patch.

> 

> Here's my proposal to do this.  The very first thing I tried to do was

> to use "ts(1)" to achieve this feature, and it obviously worked, but

> the problem is that I'm afraid "ts(1)" may not be widely available on

> every system we support.  Therefore, I decided to implement a *very*

> simple version of "ts(1)", in Python 3, which basically does the same

> thing: iterate over the stdin lines, and prepend a timestamp onto

> them.

> 

> As for testsuite/Makefile.in, the user can now specify two new

> variables to enable timestamp'ed output: TS (which enables the

> output), and TS_FORMAT (optional, used to specify another timestamp

> format according to "strftime").

> 

> Here's an example of how the output looks like:

> 

>  ...

>  [Nov 22 17:07:19] Running binutils-gdb/gdb/testsuite/gdb.base/call-strs.exp ...

>  [Nov 22 17:07:19] Running binutils-gdb/gdb/testsuite/gdb.base/step-over-no-symbols.exp ...

>  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/all-architectures-6.exp ...

>  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/hashline3.exp ...

>  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/max-value-size.exp ...

>  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/quit-live.exp ...

>  [Nov 22 17:07:46] Running binutils-gdb/gdb/testsuite/gdb.base/paginate-bg-execution.exp ...

>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/gcore-buffer-overflow.exp ...

>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/gcore-relro.exp ...

>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/watchpoint-delete.exp ...

>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/breakpoint-in-ro-region.exp ...

>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/vla-sideeffect.exp ...

>  [Nov 22 17:07:57] Running binutils-gdb/gdb/testsuite/gdb.base/unload.exp ...

>  ...

> 

> (What, gdb.base/quit-live.exp is taking 26 seconds to complete?!)

> 

> I've also tested this with FORCE_PARALLEL=1 -j8, and the output still

> looks clean (i.e., not intertwined).


I tried adding a pid to the output (with os.getpid()) and ran with FORCE_PARALLEL=1 -j55

Sample output:

[Nov 23 14:23:31] [32975] /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../../gdb/gdb version  8.2.50.20181123-git -nw -nx -data-directory /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../data-directory
[Nov 23 14:23:31] [32975]
[Nov 23 14:23:34] [32485] FAIL: gdb.threads/create-fail.exp: iteration 4: run till end (timeout)
[Nov 23 14:23:39] [28592] FAIL: gdb.threads/process-dies-while-detaching.exp: single-process: detach: killed outside: continue to breakpoint: _exit (timeout)
[Nov 23 14:23:44] [32485] FAIL: gdb.threads/create-fail.exp: iteration 5: run till end (timeout)
[Nov 23 14:23:49] [28592] FAIL: gdb.threads/process-dies-while-detaching.exp: single-process: detach: killed outside: get integer valueof "mypid" (timeout)
[Nov 23 14:23:55] [32485] FAIL: gdb.threads/create-fail.exp: iteration 6: run till end (timeout)
[Nov 23 14:23:55] [39624]
[Nov 23 14:23:55] [39624] 		=== gdb Summary ===
[Nov 23 14:23:55] [39624]
[Nov 23 14:23:55] [39624] # of expected passes		769
[Nov 23 14:23:55] [39624] /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../../gdb/gdb version  8.2.50.20181123-git -nw -nx -data-directory /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../data-directory
[Nov 23 14:23:55] [39624]
[Nov 23 14:23:58] [34046]
[Nov 23 14:23:58] [34046] 		=== gdb Summary ===
[Nov 23 14:23:58] [34046]
[Nov 23 14:23:58] [34046] # of expected passes		6
[Nov 23 14:23:58] [34046] /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../../gdb/gdb version  8.2.50.20181123-git -nw -nx -data-directory /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../data-directory
[Nov 23 14:23:58] [34046]
[Nov 23 14:24:01] [28592] FAIL: gdb.threads/process-dies-while-detaching.exp: single-process: continue: detach: continue to breakpoint: _exit (timeout)
[Nov 23 14:24:05] [32485] FAIL: gdb.threads/create-fail.exp: iteration 7: run till end (timeout)
[Nov 23 14:24:11] [28592] FAIL: gdb.threads/process-dies-while-detaching.exp: single-process: continue: detach: continue (timeout)
[Nov 23 14:24:13] [31641]
[Nov 23 14:24:13] [31641] 		=== gdb Summary ===
[Nov 23 14:24:13] [31641]
[Nov 23 14:24:13] [31641] # of expected passes		33
[Nov 23 14:24:13] [31641] # of unexpected failures	4
[Nov 23 14:24:13] [33745]
[Nov 23 14:24:13] [33745] 		=== gdb Summary ===
[Nov 23 14:24:13] [33745]

Look how spaced apart the 32485 entries are. I think it needs a pid in the output.


> 

> Output to stderr is not timestamp'ed, but I don't think that will be a

> problem for us.  If it is, we can revisit the solution and extend it.

> 

> gdb/testsuite/ChangeLog:

> 2018-11-22  Sergio Durigan Junior  <sergiodj@redhat.com>

> 

> 	* Makefile.in (TIMESTAMP): New variable.

> 	(check-single): Add $(TIMESTAMP) to the end of $(DO_RUNTEST)

> 	command.

> 	(check-single-racy): Likewise.

> 	(check/%.exp): Likewise.

> 	(check-racy/%.exp): Likewise.

> 	(workers/%.worker): Likewise.

> 	(build-perf): Likewise.

> 	(check-perf): Likewise.

> 	* README: Describe new "TS" and "TS_FORMAT" variables.

> 	* print-ts.py: New file.

> ---

> gdb/testsuite/Makefile.in | 20 ++++++++++-------

> gdb/testsuite/README      | 20 +++++++++++++++++

> gdb/testsuite/print-ts.py | 45 +++++++++++++++++++++++++++++++++++++++

> 3 files changed, 77 insertions(+), 8 deletions(-)

> create mode 100755 gdb/testsuite/print-ts.py

> 

> diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in

> index ece7e252c5..b526f579be 100644

> --- a/gdb/testsuite/Makefile.in

> +++ b/gdb/testsuite/Makefile.in

> @@ -158,6 +158,10 @@ check: all $(abs_builddir)/site.exp

> check-read1:

> 	$(MAKE) READ1="1" check

> 

> +# Check whether we need to print the timestamp for each line of

> +# status.

> +TIMESTAMP = $(if $(TS),| $(srcdir)/print-ts.py $(if $(TS_FORMAT),$(TS_FORMAT),),)

> +

> # All the hair to invoke dejagnu.  A given invocation can just append

> # $(RUNTESTFLAGS)

> DO_RUNTEST = \

> @@ -192,7 +196,7 @@ check-gdb.%:

> 	$(MAKE) check TESTS="gdb.$*/*.exp"

> 

> check-single:

> -	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)

> +	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none) $(TIMESTAMP)

> 

> check-single-racy:

> 	-rm -rf cache racy_outputs temp

> @@ -208,7 +212,7 @@ check-single-racy:

> 	for n in `seq $$racyiter` ; do \

> 	  mkdir -p racy_outputs/$$n; \

> 	  $(DO_RUNTEST) --outdir=racy_outputs/$$n $(RUNTESTFLAGS) \

> -	    $(expanded_tests_or_none); \

> +	    $(expanded_tests_or_none) $(TIMESTAMP); \

> 	done; \

> 	$(srcdir)/analyze-racy-logs.py \

> 	  `ls racy_outputs/*/gdb.sum` > racy.sum; \

> @@ -273,7 +277,7 @@ do-check-parallel: $(TEST_TARGETS)

> 

> check/%.exp:

> 	-mkdir -p outputs/$*

> -	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)

> +	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS) $(TIMESTAMP)

> 

> do-check-parallel-racy: $(TEST_TARGETS)

> 	@:

> @@ -282,7 +286,7 @@ check-racy/%.exp:

> 	-mkdir -p racy_outputs/$(RACY_OUTPUT_N)/$*

> 	$(DO_RUNTEST) GDB_PARALLEL=yes \

> 	  --outdir=racy_outputs/$(RACY_OUTPUT_N)/$* $*.exp \

> -	  $(RUNTESTFLAGS)

> +	  $(RUNTESTFLAGS) $(TIMESTAMP)

> 

> check/no-matching-tests-found:

> 	@echo ""

> @@ -292,7 +296,7 @@ check/no-matching-tests-found:

> # Utility rule invoked by step 2 of the build-perf rule.

> workers/%.worker:

> 	mkdir -p gdb.perf/outputs/$*

> -	$(DO_RUNTEST) --outdir=gdb.perf/outputs/$* lib/build-piece.exp WORKER=$* GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=build-pieces

> +	$(DO_RUNTEST) --outdir=gdb.perf/outputs/$* lib/build-piece.exp WORKER=$* GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=build-pieces $(TIMESTAMP)

> 

> # Utility rule to build tests that support it in parallel.

> # The build is broken into 3 steps distinguished by GDB_PERFTEST_SUBMODE:

> @@ -312,18 +316,18 @@ build-perf: $(abs_builddir)/site.exp

> 	rm -rf gdb.perf/workers

> 	mkdir -p gdb.perf/workers

> 	@: Step 1: Generate the build .worker files.

> -	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf/workers GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=gen-workers

> +	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf/workers GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=gen-workers $(TIMESTAMP)

> 	@: Step 2: Compile the pieces.  Here is the build parallelism.

> 	$(MAKE) $$(cd gdb.perf && echo workers/*/*.worker)

> 	@: Step 3: Do the final link.

> -	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=final

> +	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=final $(TIMESTAMP)

> 

> # The default is to both compile and run the tests.

> GDB_PERFTEST_MODE = both

> 

> check-perf: all $(abs_builddir)/site.exp

> 	@if test ! -d gdb.perf; then mkdir gdb.perf; fi

> -	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PERFTEST_MODE=$(GDB_PERFTEST_MODE) $(RUNTESTFLAGS)

> +	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PERFTEST_MODE=$(GDB_PERFTEST_MODE) $(RUNTESTFLAGS) $(TIMESTAMP)

> 

> force:;

> 

> diff --git a/gdb/testsuite/README b/gdb/testsuite/README

> index 55abfb3254..723d8ba5eb 100644

> --- a/gdb/testsuite/README

> +++ b/gdb/testsuite/README

> @@ -273,6 +273,26 @@ can do:

> Note that only a hostname/address can be provided, without a port

> number.

> 

> +TS

> +

> +This variable turns on the timestamp printing for each line of "make

> +check".  Note that the timestamp will be printed on stdout output

> +only.  In other words, there will be no timestamp output on either

> +gdb.sum and gdb.log files.  If you would like to enable timestamp

> +printing, you can do:

> +

> +	make check TS=1

> +

> +TS_FORMAT

> +

> +You can provide a custom format for timestamp printing with this

> +variable.  The format must be a string compatible with "strftime".

> +This variable is only useful when the TS variable is also provided.

> +If you would like to change the output format of the timestamp, you

> +can do:

> +

> +	make check TS=1 TS_FORMAT='[%b %H:%S]'

> +

> Race detection

> **************

> 

> diff --git a/gdb/testsuite/print-ts.py b/gdb/testsuite/print-ts.py

> new file mode 100755

> index 0000000000..58127f5716

> --- /dev/null

> +++ b/gdb/testsuite/print-ts.py

> @@ -0,0 +1,45 @@

> +#!/usr/bin/env python3


Is it correct to force to python3? Everything else in gdb uses “python”.

> +

> +# Copyright (C) 2018 Free Software Foundation, Inc.

> +#

> +# This file is part of GDB.

> +#

> +# This program is free software; you can redistribute it and/or modify

> +# it under the terms of the GNU General Public License as published by

> +# the Free Software Foundation; either version 3 of the License, or

> +# (at your option) any later version.

> +#

> +# This program is distributed in the hope that it will be useful,

> +# but WITHOUT ANY WARRANTY; without even the implied warranty of

> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> +# GNU General Public License for more details.

> +#

> +# You should have received a copy of the GNU General Public License

> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.

> +

> +# This is a simple program that can be used to print timestamps on

> +# standard output.  The inspiration for it was ts(1)

> +# (<https://joeyh.name/code/moreutils/>).  We have our own version

> +# because unfortunately ts(1) is or may not be available on all

> +# systems that GDB supports.

> +#

> +# The usage is simple:

> +#

> +#   #> some_command | print-ts.py [FORMAT]

> +#

> +# FORMAT must be a string compatible with "strftime".  If nothing is

> +# provided, we choose a reasonable format.

> +

> +import fileinput

> +import datetime

> +import sys

> +

> +if len (sys.argv) > 1:

> +    fmt = sys.argv[1]

> +else:

> +    fmt = '[%b %d %H:%M:%S]'

> +

> +for line in fileinput.input ('-'):

> +    print ("{} {}".format (datetime.datetime.now ().strftime (fmt),

> +                           line),

> +           end = '', flush = True)

> -- 

> 2.17.2

>
  
Sergio Durigan Junior Nov. 23, 2018, 2:48 p.m. UTC | #2
On Friday, November 23 2018, Alan Hayward wrote:

>> On 22 Nov 2018, at 22:12, Sergio Durigan Junior <sergiodj@redhat.com> wrote:
>> 
>> It is unfortunately not uncommon to have tests hanging on some of the
>> BuildBot workers.  For example, the ppc64be/ppc64le+gdbserver builders
>> are especially in a bad state when it comes to testing GDB/gdbserver,
>> and we can have builds that take an absurd amount of time to
>> finish (almost 1 week for one single build, for example).
>> 
>> It may be hard to diagnose these failures, because sometimes we don't
>> have access to the faulty systems, and other times we're just too busy
>> to wait and check which test is actually hanging.  During one of our
>> conversations about the topic, someone proposed that it would be a
>> good idea to have a timestamp put together with stdout output, so that
>> we can come back later and examine which tests are taking too long to
>> complete.
>
> Having encountered the same issue, I really like this patch.

Thanks!

>> 
>> Here's my proposal to do this.  The very first thing I tried to do was
>> to use "ts(1)" to achieve this feature, and it obviously worked, but
>> the problem is that I'm afraid "ts(1)" may not be widely available on
>> every system we support.  Therefore, I decided to implement a *very*
>> simple version of "ts(1)", in Python 3, which basically does the same
>> thing: iterate over the stdin lines, and prepend a timestamp onto
>> them.
>> 
>> As for testsuite/Makefile.in, the user can now specify two new
>> variables to enable timestamp'ed output: TS (which enables the
>> output), and TS_FORMAT (optional, used to specify another timestamp
>> format according to "strftime").
>> 
>> Here's an example of how the output looks like:
>> 
>>  ...
>>  [Nov 22 17:07:19] Running binutils-gdb/gdb/testsuite/gdb.base/call-strs.exp ...
>>  [Nov 22 17:07:19] Running binutils-gdb/gdb/testsuite/gdb.base/step-over-no-symbols.exp ...
>>  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/all-architectures-6.exp ...
>>  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/hashline3.exp ...
>>  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/max-value-size.exp ...
>>  [Nov 22 17:07:20] Running binutils-gdb/gdb/testsuite/gdb.base/quit-live.exp ...
>>  [Nov 22 17:07:46] Running binutils-gdb/gdb/testsuite/gdb.base/paginate-bg-execution.exp ...
>>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/gcore-buffer-overflow.exp ...
>>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/gcore-relro.exp ...
>>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/watchpoint-delete.exp ...
>>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/breakpoint-in-ro-region.exp ...
>>  [Nov 22 17:07:56] Running binutils-gdb/gdb/testsuite/gdb.base/vla-sideeffect.exp ...
>>  [Nov 22 17:07:57] Running binutils-gdb/gdb/testsuite/gdb.base/unload.exp ...
>>  ...
>> 
>> (What, gdb.base/quit-live.exp is taking 26 seconds to complete?!)
>> 
>> I've also tested this with FORCE_PARALLEL=1 -j8, and the output still
>> looks clean (i.e., not intertwined).
>
> I tried adding a pid to the output (with os.getpid()) and ran with FORCE_PARALLEL=1 -j55
>
> Sample output:
>
> [Nov 23 14:23:31] [32975]
> /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../../gdb/gdb
> version 8.2.50.20181123-git -nw -nx -data-directory
> /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../data-directory
> [Nov 23 14:23:31] [32975]
> [Nov 23 14:23:34] [32485] FAIL: gdb.threads/create-fail.exp: iteration 4: run till end (timeout)
> [Nov 23 14:23:39] [28592] FAIL: gdb.threads/process-dies-while-detaching.exp: single-process: detach: killed outside: continue to breakpoint: _exit (timeout)
> [Nov 23 14:23:44] [32485] FAIL: gdb.threads/create-fail.exp: iteration 5: run till end (timeout)
> [Nov 23 14:23:49] [28592] FAIL: gdb.threads/process-dies-while-detaching.exp: single-process: detach: killed outside: get integer valueof "mypid" (timeout)
> [Nov 23 14:23:55] [32485] FAIL: gdb.threads/create-fail.exp: iteration 6: run till end (timeout)
> [Nov 23 14:23:55] [39624]
> [Nov 23 14:23:55] [39624] 		=== gdb Summary ===
> [Nov 23 14:23:55] [39624]
> [Nov 23 14:23:55] [39624] # of expected passes		769
> [Nov 23 14:23:55] [39624]
> /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../../gdb/gdb
> version 8.2.50.20181123-git -nw -nx -data-directory
> /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../data-directory
> [Nov 23 14:23:55] [39624]
> [Nov 23 14:23:58] [34046]
> [Nov 23 14:23:58] [34046] 		=== gdb Summary ===
> [Nov 23 14:23:58] [34046]
> [Nov 23 14:23:58] [34046] # of expected passes		6
> [Nov 23 14:23:58] [34046]
> /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../../gdb/gdb
> version 8.2.50.20181123-git -nw -nx -data-directory
> /work/alahay01/gdb-HEAD/build-aarch64/gdb/testsuite/../data-directory
> [Nov 23 14:23:58] [34046]
> [Nov 23 14:24:01] [28592] FAIL: gdb.threads/process-dies-while-detaching.exp: single-process: continue: detach: continue to breakpoint: _exit (timeout)
> [Nov 23 14:24:05] [32485] FAIL: gdb.threads/create-fail.exp: iteration 7: run till end (timeout)
> [Nov 23 14:24:11] [28592] FAIL: gdb.threads/process-dies-while-detaching.exp: single-process: continue: detach: continue (timeout)
> [Nov 23 14:24:13] [31641]
> [Nov 23 14:24:13] [31641] 		=== gdb Summary ===
> [Nov 23 14:24:13] [31641]
> [Nov 23 14:24:13] [31641] # of expected passes		33
> [Nov 23 14:24:13] [31641] # of unexpected failures	4
> [Nov 23 14:24:13] [33745]
> [Nov 23 14:24:13] [33745] 		=== gdb Summary ===
> [Nov 23 14:24:13] [33745]
>
> Look how spaced apart the 32485 entries are. I think it needs a pid in the output.

Good point.  I guess my -j8 test wasn't enough to trigger this
problem.  I'll adjust the patch to include the PID.

>> 
>> Output to stderr is not timestamp'ed, but I don't think that will be a
>> problem for us.  If it is, we can revisit the solution and extend it.
>> 
>> gdb/testsuite/ChangeLog:
>> 2018-11-22  Sergio Durigan Junior  <sergiodj@redhat.com>
>> 
>> 	* Makefile.in (TIMESTAMP): New variable.
>> 	(check-single): Add $(TIMESTAMP) to the end of $(DO_RUNTEST)
>> 	command.
>> 	(check-single-racy): Likewise.
>> 	(check/%.exp): Likewise.
>> 	(check-racy/%.exp): Likewise.
>> 	(workers/%.worker): Likewise.
>> 	(build-perf): Likewise.
>> 	(check-perf): Likewise.
>> 	* README: Describe new "TS" and "TS_FORMAT" variables.
>> 	* print-ts.py: New file.
>> ---
>> gdb/testsuite/Makefile.in | 20 ++++++++++-------
>> gdb/testsuite/README      | 20 +++++++++++++++++
>> gdb/testsuite/print-ts.py | 45 +++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 77 insertions(+), 8 deletions(-)
>> create mode 100755 gdb/testsuite/print-ts.py
>> 
>> diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
>> index ece7e252c5..b526f579be 100644
>> --- a/gdb/testsuite/Makefile.in
>> +++ b/gdb/testsuite/Makefile.in
>> @@ -158,6 +158,10 @@ check: all $(abs_builddir)/site.exp
>> check-read1:
>> 	$(MAKE) READ1="1" check
>> 
>> +# Check whether we need to print the timestamp for each line of
>> +# status.
>> +TIMESTAMP = $(if $(TS),| $(srcdir)/print-ts.py $(if $(TS_FORMAT),$(TS_FORMAT),),)
>> +
>> # All the hair to invoke dejagnu.  A given invocation can just append
>> # $(RUNTESTFLAGS)
>> DO_RUNTEST = \
>> @@ -192,7 +196,7 @@ check-gdb.%:
>> 	$(MAKE) check TESTS="gdb.$*/*.exp"
>> 
>> check-single:
>> -	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)
>> +	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none) $(TIMESTAMP)
>> 
>> check-single-racy:
>> 	-rm -rf cache racy_outputs temp
>> @@ -208,7 +212,7 @@ check-single-racy:
>> 	for n in `seq $$racyiter` ; do \
>> 	  mkdir -p racy_outputs/$$n; \
>> 	  $(DO_RUNTEST) --outdir=racy_outputs/$$n $(RUNTESTFLAGS) \
>> -	    $(expanded_tests_or_none); \
>> +	    $(expanded_tests_or_none) $(TIMESTAMP); \
>> 	done; \
>> 	$(srcdir)/analyze-racy-logs.py \
>> 	  `ls racy_outputs/*/gdb.sum` > racy.sum; \
>> @@ -273,7 +277,7 @@ do-check-parallel: $(TEST_TARGETS)
>> 
>> check/%.exp:
>> 	-mkdir -p outputs/$*
>> -	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)
>> +	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS) $(TIMESTAMP)
>> 
>> do-check-parallel-racy: $(TEST_TARGETS)
>> 	@:
>> @@ -282,7 +286,7 @@ check-racy/%.exp:
>> 	-mkdir -p racy_outputs/$(RACY_OUTPUT_N)/$*
>> 	$(DO_RUNTEST) GDB_PARALLEL=yes \
>> 	  --outdir=racy_outputs/$(RACY_OUTPUT_N)/$* $*.exp \
>> -	  $(RUNTESTFLAGS)
>> +	  $(RUNTESTFLAGS) $(TIMESTAMP)
>> 
>> check/no-matching-tests-found:
>> 	@echo ""
>> @@ -292,7 +296,7 @@ check/no-matching-tests-found:
>> # Utility rule invoked by step 2 of the build-perf rule.
>> workers/%.worker:
>> 	mkdir -p gdb.perf/outputs/$*
>> -	$(DO_RUNTEST) --outdir=gdb.perf/outputs/$* lib/build-piece.exp WORKER=$* GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=build-pieces
>> +	$(DO_RUNTEST) --outdir=gdb.perf/outputs/$* lib/build-piece.exp WORKER=$* GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=build-pieces $(TIMESTAMP)
>> 
>> # Utility rule to build tests that support it in parallel.
>> # The build is broken into 3 steps distinguished by GDB_PERFTEST_SUBMODE:
>> @@ -312,18 +316,18 @@ build-perf: $(abs_builddir)/site.exp
>> 	rm -rf gdb.perf/workers
>> 	mkdir -p gdb.perf/workers
>> 	@: Step 1: Generate the build .worker files.
>> -	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf/workers GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=gen-workers
>> +	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf/workers GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=gen-workers $(TIMESTAMP)
>> 	@: Step 2: Compile the pieces.  Here is the build parallelism.
>> 	$(MAKE) $$(cd gdb.perf && echo workers/*/*.worker)
>> 	@: Step 3: Do the final link.
>> -	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=final
>> +	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=final $(TIMESTAMP)
>> 
>> # The default is to both compile and run the tests.
>> GDB_PERFTEST_MODE = both
>> 
>> check-perf: all $(abs_builddir)/site.exp
>> 	@if test ! -d gdb.perf; then mkdir gdb.perf; fi
>> -	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PERFTEST_MODE=$(GDB_PERFTEST_MODE) $(RUNTESTFLAGS)
>> +	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PERFTEST_MODE=$(GDB_PERFTEST_MODE) $(RUNTESTFLAGS) $(TIMESTAMP)
>> 
>> force:;
>> 
>> diff --git a/gdb/testsuite/README b/gdb/testsuite/README
>> index 55abfb3254..723d8ba5eb 100644
>> --- a/gdb/testsuite/README
>> +++ b/gdb/testsuite/README
>> @@ -273,6 +273,26 @@ can do:
>> Note that only a hostname/address can be provided, without a port
>> number.
>> 
>> +TS
>> +
>> +This variable turns on the timestamp printing for each line of "make
>> +check".  Note that the timestamp will be printed on stdout output
>> +only.  In other words, there will be no timestamp output on either
>> +gdb.sum and gdb.log files.  If you would like to enable timestamp
>> +printing, you can do:
>> +
>> +	make check TS=1
>> +
>> +TS_FORMAT
>> +
>> +You can provide a custom format for timestamp printing with this
>> +variable.  The format must be a string compatible with "strftime".
>> +This variable is only useful when the TS variable is also provided.
>> +If you would like to change the output format of the timestamp, you
>> +can do:
>> +
>> +	make check TS=1 TS_FORMAT='[%b %H:%S]'
>> +
>> Race detection
>> **************
>> 
>> diff --git a/gdb/testsuite/print-ts.py b/gdb/testsuite/print-ts.py
>> new file mode 100755
>> index 0000000000..58127f5716
>> --- /dev/null
>> +++ b/gdb/testsuite/print-ts.py
>> @@ -0,0 +1,45 @@
>> +#!/usr/bin/env python3
>
> Is it correct to force to python3? Everything else in gdb uses “python”.

That's something I wondered myself.  I wrote the script with Python 3 in
mind, but I guess I can make it compatible with Python 2 as well since
it's pretty simple.

I'll send a v2 soon.  Thanks for the review.

>> +
>> +# Copyright (C) 2018 Free Software Foundation, Inc.
>> +#
>> +# This file is part of GDB.
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +# This is a simple program that can be used to print timestamps on
>> +# standard output.  The inspiration for it was ts(1)
>> +# (<https://joeyh.name/code/moreutils/>).  We have our own version
>> +# because unfortunately ts(1) is or may not be available on all
>> +# systems that GDB supports.
>> +#
>> +# The usage is simple:
>> +#
>> +#   #> some_command | print-ts.py [FORMAT]
>> +#
>> +# FORMAT must be a string compatible with "strftime".  If nothing is
>> +# provided, we choose a reasonable format.
>> +
>> +import fileinput
>> +import datetime
>> +import sys
>> +
>> +if len (sys.argv) > 1:
>> +    fmt = sys.argv[1]
>> +else:
>> +    fmt = '[%b %d %H:%M:%S]'
>> +
>> +for line in fileinput.input ('-'):
>> +    print ("{} {}".format (datetime.datetime.now ().strftime (fmt),
>> +                           line),
>> +           end = '', flush = True)
>> -- 
>> 2.17.2
>>
  
Pedro Franco de Carvalho Dec. 5, 2018, 7:38 p.m. UTC | #3
Sergio Durigan Junior <sergiodj@redhat.com> writes:

> It is unfortunately not uncommon to have tests hanging on some of the
> BuildBot workers.  For example, the ppc64be/ppc64le+gdbserver builders
> are especially in a bad state when it comes to testing GDB/gdbserver,
> and we can have builds that take an absurd amount of time to
> finish (almost 1 week for one single build, for example).
>
> It may be hard to diagnose these failures, because sometimes we don't
> have access to the faulty systems, and other times we're just too busy
> to wait and check which test is actually hanging.  During one of our
> conversations about the topic, someone proposed that it would be a
> good idea to have a timestamp put together with stdout output, so that
> we can come back later and examine which tests are taking too long to
> complete.

Hello Sergio,

I've looked into the ppc64le native-gdbserver/native-extended-gdbserver
buildbot workers, it seems that they started getting slower with this
patch:

f19c7ff839d7a32ebb48482ae7d318fb46ca823d
GDBSERVER: Listen on a unix domain (instead of TCP) socket if requested.

I believe this patch has been reverted, but the builders haven't reached
the reverted point yet.  Would it make sense to make the workers resume
after the reverted patch?  This would be at commit
80e24d09860dbeba7d435b4a4f0990f85dbc084e.

They did also get even slower after this patch:

fe1a5cad302b5535030cdf62895e79512713d738
[gdb/testsuite] Log wait status on process no longer exists error

But I think it only happens in combination with the patch above (see
https://sourceware.org/ml/gdb-patches/2018-12/msg00060.html for a more
detailed explanation).

Thanks!!

--
Pedro Franco de Carvalho
  
Sergio Durigan Junior Dec. 5, 2018, 7:48 p.m. UTC | #4
On Wednesday, December 05 2018, Pedro Franco de Carvalho wrote:

> Sergio Durigan Junior <sergiodj@redhat.com> writes:
>
>> It is unfortunately not uncommon to have tests hanging on some of the
>> BuildBot workers.  For example, the ppc64be/ppc64le+gdbserver builders
>> are especially in a bad state when it comes to testing GDB/gdbserver,
>> and we can have builds that take an absurd amount of time to
>> finish (almost 1 week for one single build, for example).
>>
>> It may be hard to diagnose these failures, because sometimes we don't
>> have access to the faulty systems, and other times we're just too busy
>> to wait and check which test is actually hanging.  During one of our
>> conversations about the topic, someone proposed that it would be a
>> good idea to have a timestamp put together with stdout output, so that
>> we can come back later and examine which tests are taking too long to
>> complete.
>
> Hello Sergio,

Hey Pedro,

> I've looked into the ppc64le native-gdbserver/native-extended-gdbserver
> buildbot workers, it seems that they started getting slower with this
> patch:
>
> f19c7ff839d7a32ebb48482ae7d318fb46ca823d
> GDBSERVER: Listen on a unix domain (instead of TCP) socket if requested.

From my recollection, ISTR that the gdbserver builders were *very* slow
even before this commit.  I was even talking to Edjunior about disabling
the gdbserver builders, because they were just contributing to making
the queue larger for the native builder.

> I believe this patch has been reverted, but the builders haven't reached
> the reverted point yet.  Would it make sense to make the workers resume
> after the reverted patch?  This would be at commit
> 80e24d09860dbeba7d435b4a4f0990f85dbc084e.
>
> They did also get even slower after this patch:
>
> fe1a5cad302b5535030cdf62895e79512713d738
> [gdb/testsuite] Log wait status on process no longer exists error
>
> But I think it only happens in combination with the patch above (see
> https://sourceware.org/ml/gdb-patches/2018-12/msg00060.html for a more
> detailed explanation).

Alright.  I'll try to cancel the builds up until
80e24d09860dbeba7d435b4a4f0990f85dbc084e and see how things go.

Thanks for taking the time to investigate this!
  
Pedro Franco de Carvalho Dec. 5, 2018, 8:07 p.m. UTC | #5
Sergio Durigan Junior <sergiodj@redhat.com> writes:

> From my recollection, ISTR that the gdbserver builders were *very* slow
> even before this commit.  I was even talking to Edjunior about disabling
> the gdbserver builders, because they were just contributing to making
> the queue larger for the native builder.

It seems they were taking about one hour before, at least on the LE
builders, although I think the BE ones were taking longer.  Is one hour
more than expected for the gdbserver builders?  If not, I'll see if I
can figure out what's taking long.

> Alright.  I'll try to cancel the builds up until
> 80e24d09860dbeba7d435b4a4f0990f85dbc084e and see how things go.
>
> Thanks for taking the time to investigate this!

Thank you!

--
Pedro Franco de Carvalho
  
Sergio Durigan Junior Dec. 5, 2018, 8:26 p.m. UTC | #6
On Wednesday, December 05 2018, Pedro Franco de Carvalho wrote:

> Sergio Durigan Junior <sergiodj@redhat.com> writes:
>
>> From my recollection, ISTR that the gdbserver builders were *very* slow
>> even before this commit.  I was even talking to Edjunior about disabling
>> the gdbserver builders, because they were just contributing to making
>> the queue larger for the native builder.
>
> It seems they were taking about one hour before, at least on the LE
> builders, although I think the BE ones were taking longer.  Is one hour
> more than expected for the gdbserver builders?  If not, I'll see if I
> can figure out what's taking long.

Hm, on the native builder (CentOS-ppc64le-m64), the builds were taking
about 30 minutes.  However, on
CentOS-ppc64le-native-extended-gdbserver-m64 and
CentOS-ppc64le-native-gdbserver-m64, the builds were taking much
longer.  For example:

  https://gdb-build.sergiodj.net/builders/CentOS-ppc64le-native-gdbserver-m64/builds/1988
  Elapsed	6 hrs, 30 mins, 12 secs

  https://gdb-build.sergiodj.net/builders/CentOS-ppc64le-native-gdbserver-m64/builds/1997
  Elapsed	20 hrs, 52 mins, 15 secs

Things get much worse with native-extended:

  https://gdb-build.sergiodj.net/builders/CentOS-ppc64le-native-extended-gdbserver-m64/builds/1992
  Elapsed	37 hrs, 4 mins, 29 secs

  https://gdb-build.sergiodj.net/builders/CentOS-ppc64le-native-extended-gdbserver-m64/builds/1988
  Elapsed	37 hrs, 6 mins, 19 secs

Unfortunately they haven't reached the commit which introduces the
timestamp feature, so we still don't know exactly where they're hanging.

>> Alright.  I'll try to cancel the builds up until
>> 80e24d09860dbeba7d435b4a4f0990f85dbc084e and see how things go.
>>
>> Thanks for taking the time to investigate this!
>
> Thank you!

No problem.

I've now cancelled the builds; let's see how things go.

Cheers,
  
Pedro Alves Dec. 6, 2018, 3:29 p.m. UTC | #7
On 12/05/2018 08:26 PM, Sergio Durigan Junior wrote:

> Unfortunately they haven't reached the commit which introduces the
> timestamp feature, so we still don't know exactly where they're hanging.

Actually we could know even without the timestamp feature -- all the
testsuite/*/gdb.log files have something like:

  Test run by pedro on Tue Dec  4 18:15:17 2018

at the top, and ...

  runtest completed at Tue Dec  4 18:15:18 2018

... at the bottom.

So a simple script could extract that info and present it
in a convenient way.

Thanks,
Pedro Alves
  
Pedro Alves Dec. 6, 2018, 3:53 p.m. UTC | #8
On 12/06/2018 03:29 PM, Pedro Alves wrote:
> On 12/05/2018 08:26 PM, Sergio Durigan Junior wrote:
> 
>> Unfortunately they haven't reached the commit which introduces the
>> timestamp feature, so we still don't know exactly where they're hanging.
> 
> Actually we could know even without the timestamp feature -- all the
> testsuite/*/gdb.log files have something like:
> 
>   Test run by pedro on Tue Dec  4 18:15:17 2018
> 
> at the top, and ...
> 
>   runtest completed at Tue Dec  4 18:15:18 2018
> 
> ... at the bottom.
> 
> So a simple script could extract that info and present it
> in a convenient way.

Here's a quick bash POC.

Make sure to test in parallel mode (make check-parallel),
so that each testcase gets its own gdb.log file
and then run the script, like:

 $ cd gdb
 $ make -j8 check-parallel
 $ extract-times testsuite/outputs/ | sort -n | tail -n 10
 33 seconds for testsuite/outputs/gdb.base/whatis-ptype-typedefs/gdb.log
 35 seconds for testsuite/outputs/gdb.base/quit-live/gdb.log
 35 seconds for testsuite/outputs/gdb.base/structs/gdb.log
 37 seconds for testsuite/outputs/gdb.base/gdb-sigterm/gdb.log
 39 seconds for testsuite/outputs/gdb.threads/attach-many-short-lived-threads/gdb.log
 43 seconds for testsuite/outputs/gdb.base/multi-forks/gdb.log
 48 seconds for testsuite/outputs/gdb.linespec/cpls-ops/gdb.log
 64 seconds for testsuite/outputs/gdb.linespec/cpcompletion/gdb.log
 78 seconds for testsuite/outputs/gdb.base/break-interp/gdb.log
 149 seconds for testsuite/outputs/gdb.base/sigstep/gdb.log

(that was run against an older test dir I had handy.)
#!/bin/bash

find $1 -name "gdb.log" | \
    while read line;
    do
	#echo $line
	start=$(head -n1 $line | sed 's/Test run by .* on //g')
	end=$(tail -n1 $line | sed 's/.*completed at //g')
	#echo start=$start
	#echo end=$end
	start_sec=$(date --date="$start" '+%s')
	end_sec=$(date --date="$end" '+%s')
	duration_sec=$((end_sec - start_sec))
	echo "$duration_sec seconds for $line"
    done
  
Pedro Franco de Carvalho Dec. 6, 2018, 7:32 p.m. UTC | #9
Pedro Alves <palves@redhat.com> writes:

> Here's a quick bash POC.

Thanks for the script!  However, for the buildbots I don't have access
to the parallel outputs folder through the webpage (or I don't know
where to find them).

Still, the merged gdb.log file seems to already have the completion
times, e.g.:

testcase [...]gdb.trace/unavailable.exp completed in 28 seconds

--
Pedro Franco de Carvalho
  
Pedro Franco de Carvalho Dec. 6, 2018, 7:41 p.m. UTC | #10
Sergio Durigan Junior <sergiodj@redhat.com> writes:

> I've now cancelled the builds; let's see how things go.

Thanks!  The LE builder seem to be going faster now (a bit more than one
hour).

They still fail sometimes with a timeout for make.  From the logs of the
ones that didn't fail it seems that gdb.threads/trace-condition.exp and
gdb.threads/tspeed.exp are the ones taking long, and could be causing
this, but I haven't looked into them yet.

--
Pedro Franco de Carvalho
  
Sergio Durigan Junior Dec. 6, 2018, 7:52 p.m. UTC | #11
On Thursday, December 06 2018, Pedro Franco de Carvalho wrote:

> Pedro Alves <palves@redhat.com> writes:
>
>> Here's a quick bash POC.
>
> Thanks for the script!  However, for the buildbots I don't have access
> to the parallel outputs folder through the webpage (or I don't know
> where to find them).
>
> Still, the merged gdb.log file seems to already have the completion
> times, e.g.:
>
> testcase [...]gdb.trace/unavailable.exp completed in 28 seconds

The results can be found at <https://gdb-build.sergiodj.net/results/>,
but you're, you just have access to the consolidated gdb.log, not the
different logs under output/.

Thanks,
  
Pedro Alves Dec. 7, 2018, 8:05 p.m. UTC | #12
On 12/06/2018 07:52 PM, Sergio Durigan Junior wrote:
> On Thursday, December 06 2018, Pedro Franco de Carvalho wrote:
> 
>> Pedro Alves <palves@redhat.com> writes:
>>
>>> Here's a quick bash POC.
>>
>> Thanks for the script!  However, for the buildbots I don't have access
>> to the parallel outputs folder through the webpage (or I don't know
>> where to find them).
>>
>> Still, the merged gdb.log file seems to already have the completion
>> times, e.g.:
>>
>> testcase [...]gdb.trace/unavailable.exp completed in 28 seconds

Indeed!

> 
> The results can be found at <https://gdb-build.sergiodj.net/results/>,
> but you're, you just have access to the consolidated gdb.log, not the
> different logs under output/.

Given the above, then I have to ask: what's the advantage of the
new timestamped output mode?  Why not just extract the info out
of the gdb.log file?  It's less code to maintain, and you don't
have to remember to run the testsuite in a special mode, nor
record stdout of the test run.

Thanks,
Pedro Alves
  
Sergio Durigan Junior Dec. 7, 2018, 10:09 p.m. UTC | #13
On Friday, December 07 2018, Pedro Alves wrote:

> On 12/06/2018 07:52 PM, Sergio Durigan Junior wrote:
>> The results can be found at <https://gdb-build.sergiodj.net/results/>,
>> but you're, you just have access to the consolidated gdb.log, not the
>> different logs under output/.
>
> Given the above, then I have to ask: what's the advantage of the
> new timestamped output mode?  Why not just extract the info out
> of the gdb.log file?  It's less code to maintain, and you don't
> have to remember to run the testsuite in a special mode, nor
> record stdout of the test run.

When I was writing the timestamp script, I didn't think about extracting
the info from gdb.log.  So there's that.  I agree that it's easier to
have the consolidated info at the end, and I can propose a patch for
that based on your shell script, but I also think it makes sense to have
the special "timestamp'd" mode, because sometimes (when a build takes
more than 120 hours, for example, which was happening until recently
with the ppc64le builders) it's useful to have the information on the
screen without having to wait, I think.

Cheers,
  
Pedro Alves Dec. 8, 2018, 12:58 p.m. UTC | #14
On 12/07/2018 10:09 PM, Sergio Durigan Junior wrote:
> On Friday, December 07 2018, Pedro Alves wrote:
> 
>> On 12/06/2018 07:52 PM, Sergio Durigan Junior wrote:
>>> The results can be found at <https://gdb-build.sergiodj.net/results/>,
>>> but you're, you just have access to the consolidated gdb.log, not the
>>> different logs under output/.
>>
>> Given the above, then I have to ask: what's the advantage of the
>> new timestamped output mode?  Why not just extract the info out
>> of the gdb.log file?  It's less code to maintain, and you don't
>> have to remember to run the testsuite in a special mode, nor
>> record stdout of the test run.
> 
> When I was writing the timestamp script, I didn't think about extracting
> the info from gdb.log.  So there's that.  I agree that it's easier to
> have the consolidated info at the end, and I can propose a patch for
> that based on your shell script, 

I think you're missing that we don't even need my previous script.  As Pedro said:

"Still, the merged gdb.log file seems to already have the completion
            ^^^^^^
 times, e.g.:

 testcase [...]gdb.trace/unavailable.exp completed in 28 seconds"


So all we need is something like this:

 $ grep "completed in " testsuite/gdb.log | awk '{print $5 " seconds for " $2}' | sort -n | tail -n 10
 20 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/fork-running-state.exp
 23 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/gnu-ifunc.exp
 28 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/attach-pie-misread.exp
 28 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/foll-vfork.exp
 29 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/attach.exp
 31 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/checkpoint.exp
 32 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/checkpoint-ns.exp
 37 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/dprintf-detach.exp
 49 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/gdb-sigterm.exp
 78 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/break-interp.exp

Thanks,
Pedro Alves
  
Sergio Durigan Junior Dec. 8, 2018, 11:16 p.m. UTC | #15
On Saturday, December 08 2018, Pedro Alves wrote:

> On 12/07/2018 10:09 PM, Sergio Durigan Junior wrote:
>> On Friday, December 07 2018, Pedro Alves wrote:
>> 
>>> On 12/06/2018 07:52 PM, Sergio Durigan Junior wrote:
>>>> The results can be found at <https://gdb-build.sergiodj.net/results/>,
>>>> but you're, you just have access to the consolidated gdb.log, not the
>>>> different logs under output/.
>>>
>>> Given the above, then I have to ask: what's the advantage of the
>>> new timestamped output mode?  Why not just extract the info out
>>> of the gdb.log file?  It's less code to maintain, and you don't
>>> have to remember to run the testsuite in a special mode, nor
>>> record stdout of the test run.
>> 
>> When I was writing the timestamp script, I didn't think about extracting
>> the info from gdb.log.  So there's that.  I agree that it's easier to
>> have the consolidated info at the end, and I can propose a patch for
>> that based on your shell script, 
>
> I think you're missing that we don't even need my previous script.  As Pedro said:
>
> "Still, the merged gdb.log file seems to already have the completion
>             ^^^^^^
>  times, e.g.:
>
>  testcase [...]gdb.trace/unavailable.exp completed in 28 seconds"
>
>
> So all we need is something like this:
>
>  $ grep "completed in " testsuite/gdb.log | awk '{print $5 " seconds for " $2}' | sort -n | tail -n 10
>  20 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/fork-running-state.exp
>  23 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/gnu-ifunc.exp
>  28 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/attach-pie-misread.exp
>  28 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/foll-vfork.exp
>  29 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/attach.exp
>  31 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/checkpoint.exp
>  32 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/checkpoint-ns.exp
>  37 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/dprintf-detach.exp
>  49 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/gdb-sigterm.exp
>  78 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/break-interp.exp

Ah, sorry, indeed I was missing this point.  Well, in any case, we need
some kind of script, right?  I think it'd be a good idea to generate
this report at the end of the test run, and put it into a file under the
testsuite dir, WDYT?

Thanks,
  
Philippe Waroquiers Dec. 9, 2018, 8:57 a.m. UTC | #16
On Sat, 2018-12-08 at 18:16 -0500, Sergio Durigan Junior wrote:
> On Saturday, December 08 2018, Pedro Alves wrote:
> 
> > On 12/07/2018 10:09 PM, Sergio Durigan Junior wrote:
> > > On Friday, December 07 2018, Pedro Alves wrote:
> > > 
> > > > On 12/06/2018 07:52 PM, Sergio Durigan Junior wrote:
> > > > > The results can be found at <https://gdb-build.sergiodj.net/results/>,
> > > > > but you're, you just have access to the consolidated gdb.log, not the
> > > > > different logs under output/.
> > > > 
> > > > Given the above, then I have to ask: what's the advantage of the
> > > > new timestamped output mode?  Why not just extract the info out
> > > > of the gdb.log file?  It's less code to maintain, and you don't
> > > > have to remember to run the testsuite in a special mode, nor
> > > > record stdout of the test run.
> > > 
> > > When I was writing the timestamp script, I didn't think about extracting
> > > the info from gdb.log.  So there's that.  I agree that it's easier to
> > > have the consolidated info at the end, and I can propose a patch for
> > > that based on your shell script, 
> > 
> > I think you're missing that we don't even need my previous script.  As Pedro said:
> > 
> > "Still, the merged gdb.log file seems to already have the completion
> >             ^^^^^^
> >  times, e.g.:
> > 
> >  testcase [...]gdb.trace/unavailable.exp completed in 28 seconds"
> > 
> > 
> > So all we need is something like this:
> > 
> >  $ grep "completed in " testsuite/gdb.log | awk '{print $5 " seconds for " $2}' | sort -n | tail -n 10
> >  20 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/fork-running-state.exp
> >  23 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/gnu-ifunc.exp
> >  28 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/attach-pie-misread.exp
> >  28 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/foll-vfork.exp
> >  29 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/attach.exp
> >  31 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/checkpoint.exp
> >  32 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/checkpoint-ns.exp
> >  37 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/dprintf-detach.exp
> >  49 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/gdb-sigterm.exp
> >  78 seconds for /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.base/break-interp.exp
> 
> Ah, sorry, indeed I was missing this point.  Well, in any case, we need
> some kind of script, right?  I think it'd be a good idea to generate
> this report at the end of the test run, and put it into a file under the
> testsuite dir, WDYT?

As suggested by Pedro in  https://sourceware.org/ml/gdb-patches/2018-12/msg00022.html,
I am busy measuring how much the patch
   [RFA] Add FORCE_LOCAL_GDB_QUIT_WAIT testsuite parameter.
slows down the regression test, if always activated.

For this, I am using the extract times script Pedro did.
So, IMO, such a script is very useful.

I am also post-processing its output with the below 'avg' script:
   awk '{ sum = sum + $1;  n = n + 1}   END { printf("sum %d N %d avg %f\n",sum,n,sum/n) }' "$@"

e.g. doing
    extract-times ... | avg


Note that regarding the slowdown by the FORCE_LOCAL_GDB_QUIT_WAIT:
it seems that a significant nr of tests are somewhat faster when
activating this FORCE QUIT WAIT, but that is all based on
a second precision, e.g. 30 tests or so take now 0 seconds
rather than 1 second, or 1 second instead of 2 seconds.

So, if easy to do, it would be nice to have a precision better than a second,
to have a better idea of small improvement or degradation.

For the patch 'slow down/speed up' : in total, the patch is still significantly
slowing down the testsuite, as a bunch of tests are taking
about one minute more due to timeouts doing the quit.

I am trying to have a better logic in FORCE QUIT WAIT to solve that :
it is not impossible that at the end, doing a clean QUIT+WAIT might slightly
speed up the testsuite
(measured doing 
  make check RUNTESTFLAGS="GDB=$PWD/gdb FORCE_LOCAL_GDB_QUIT_WAIT=1" FORCE_PARALLEL="1" -j1
).

Thanks

Philippe
  

Patch

diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
index ece7e252c5..b526f579be 100644
--- a/gdb/testsuite/Makefile.in
+++ b/gdb/testsuite/Makefile.in
@@ -158,6 +158,10 @@  check: all $(abs_builddir)/site.exp
 check-read1:
 	$(MAKE) READ1="1" check
 
+# Check whether we need to print the timestamp for each line of
+# status.
+TIMESTAMP = $(if $(TS),| $(srcdir)/print-ts.py $(if $(TS_FORMAT),$(TS_FORMAT),),)
+
 # All the hair to invoke dejagnu.  A given invocation can just append
 # $(RUNTESTFLAGS)
 DO_RUNTEST = \
@@ -192,7 +196,7 @@  check-gdb.%:
 	$(MAKE) check TESTS="gdb.$*/*.exp"
 
 check-single:
-	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none)
+	$(DO_RUNTEST) $(RUNTESTFLAGS) $(expanded_tests_or_none) $(TIMESTAMP)
 
 check-single-racy:
 	-rm -rf cache racy_outputs temp
@@ -208,7 +212,7 @@  check-single-racy:
 	for n in `seq $$racyiter` ; do \
 	  mkdir -p racy_outputs/$$n; \
 	  $(DO_RUNTEST) --outdir=racy_outputs/$$n $(RUNTESTFLAGS) \
-	    $(expanded_tests_or_none); \
+	    $(expanded_tests_or_none) $(TIMESTAMP); \
 	done; \
 	$(srcdir)/analyze-racy-logs.py \
 	  `ls racy_outputs/*/gdb.sum` > racy.sum; \
@@ -273,7 +277,7 @@  do-check-parallel: $(TEST_TARGETS)
 
 check/%.exp:
 	-mkdir -p outputs/$*
-	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)
+	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS) $(TIMESTAMP)
 
 do-check-parallel-racy: $(TEST_TARGETS)
 	@:
@@ -282,7 +286,7 @@  check-racy/%.exp:
 	-mkdir -p racy_outputs/$(RACY_OUTPUT_N)/$*
 	$(DO_RUNTEST) GDB_PARALLEL=yes \
 	  --outdir=racy_outputs/$(RACY_OUTPUT_N)/$* $*.exp \
-	  $(RUNTESTFLAGS)
+	  $(RUNTESTFLAGS) $(TIMESTAMP)
 
 check/no-matching-tests-found:
 	@echo ""
@@ -292,7 +296,7 @@  check/no-matching-tests-found:
 # Utility rule invoked by step 2 of the build-perf rule.
 workers/%.worker:
 	mkdir -p gdb.perf/outputs/$*
-	$(DO_RUNTEST) --outdir=gdb.perf/outputs/$* lib/build-piece.exp WORKER=$* GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=build-pieces
+	$(DO_RUNTEST) --outdir=gdb.perf/outputs/$* lib/build-piece.exp WORKER=$* GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=build-pieces $(TIMESTAMP)
 
 # Utility rule to build tests that support it in parallel.
 # The build is broken into 3 steps distinguished by GDB_PERFTEST_SUBMODE:
@@ -312,18 +316,18 @@  build-perf: $(abs_builddir)/site.exp
 	rm -rf gdb.perf/workers
 	mkdir -p gdb.perf/workers
 	@: Step 1: Generate the build .worker files.
-	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf/workers GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=gen-workers
+	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf/workers GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=gen-workers $(TIMESTAMP)
 	@: Step 2: Compile the pieces.  Here is the build parallelism.
 	$(MAKE) $$(cd gdb.perf && echo workers/*/*.worker)
 	@: Step 3: Do the final link.
-	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=final
+	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile GDB_PERFTEST_SUBMODE=final $(TIMESTAMP)
 
 # The default is to both compile and run the tests.
 GDB_PERFTEST_MODE = both
 
 check-perf: all $(abs_builddir)/site.exp
 	@if test ! -d gdb.perf; then mkdir gdb.perf; fi
-	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PERFTEST_MODE=$(GDB_PERFTEST_MODE) $(RUNTESTFLAGS)
+	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PERFTEST_MODE=$(GDB_PERFTEST_MODE) $(RUNTESTFLAGS) $(TIMESTAMP)
 
 force:;
 
diff --git a/gdb/testsuite/README b/gdb/testsuite/README
index 55abfb3254..723d8ba5eb 100644
--- a/gdb/testsuite/README
+++ b/gdb/testsuite/README
@@ -273,6 +273,26 @@  can do:
 Note that only a hostname/address can be provided, without a port
 number.
 
+TS
+
+This variable turns on the timestamp printing for each line of "make
+check".  Note that the timestamp will be printed on stdout output
+only.  In other words, there will be no timestamp output on either
+gdb.sum and gdb.log files.  If you would like to enable timestamp
+printing, you can do:
+
+	make check TS=1
+
+TS_FORMAT
+
+You can provide a custom format for timestamp printing with this
+variable.  The format must be a string compatible with "strftime".
+This variable is only useful when the TS variable is also provided.
+If you would like to change the output format of the timestamp, you
+can do:
+
+	make check TS=1 TS_FORMAT='[%b %H:%S]'
+
 Race detection
 **************
 
diff --git a/gdb/testsuite/print-ts.py b/gdb/testsuite/print-ts.py
new file mode 100755
index 0000000000..58127f5716
--- /dev/null
+++ b/gdb/testsuite/print-ts.py
@@ -0,0 +1,45 @@ 
+#!/usr/bin/env python3
+
+# Copyright (C) 2018 Free Software Foundation, Inc.
+#
+# This file is part of GDB.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This is a simple program that can be used to print timestamps on
+# standard output.  The inspiration for it was ts(1)
+# (<https://joeyh.name/code/moreutils/>).  We have our own version
+# because unfortunately ts(1) is or may not be available on all
+# systems that GDB supports.
+#
+# The usage is simple:
+#
+#   #> some_command | print-ts.py [FORMAT]
+#
+# FORMAT must be a string compatible with "strftime".  If nothing is
+# provided, we choose a reasonable format.
+
+import fileinput
+import datetime
+import sys
+
+if len (sys.argv) > 1:
+    fmt = sys.argv[1]
+else:
+    fmt = '[%b %d %H:%M:%S]'
+
+for line in fileinput.input ('-'):
+    print ("{} {}".format (datetime.datetime.now ().strftime (fmt),
+                           line),
+           end = '', flush = True)