[v1] nptl: smarter not-parallel-ing
Commit Message
On my i7-4790K (4 real, 8 HT, 4.0 GHz) machine, this reduces "make -j8
check" time from 15m36 to 15m10, saving 26 seconds.
From 722d522f3145736fced14a2f7b57adfb8ecc2f30 Mon Sep 17 00:00:00 2001
From: DJ Delorie <dj@redhat.com>
Date: Tue, 13 Aug 2019 17:34:55 -0400
Subject: Optimize nptl test sequencing.
Remove .NOTPARALLEL, which forces tests to be *built* in series.
Replace with order-only dependencies between targets which only
affects the tests themselves.
Comments
* DJ Delorie:
> diff --git a/nptl/Makefile b/nptl/Makefile
> index 0567e77a79..d7a3857c95 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -666,7 +666,7 @@ $(objpfx)tst-cleanup0.out: /dev/null $(objpfx)tst-cleanup0
> $(evaluate-test)
>
> $(objpfx)tst-cleanup0-cmp.out: tst-cleanup0.expect $(objpfx)tst-cleanup0.out
> - cmp $^ > $@; \
> + cmp tst-cleanup0.expect $(objpfx)tst-cleanup0.out > $@; \
> $(evaluate-test)
Why is this needed if | prerequisites are not listed in $^?
> +ifeq ($(run-built-tests),yes)
> +# The tests in this subdir should not be run in parallel.
> +#
> +# The following will create rules like "foo2.out :| foo1.out" for all
> +# tests, which forces the tests to be run serially, but does not force
> +# a test to be run just because some other test was run.
> +#
> +# Caveat: the :|-style dependencies won't be listed in $^, so avoid
> +# using $^ to depend on test result files.
> +
> +ALLTESTS = $(tests:%=$(objpfx)%.out) $(tests-internal:%=$(objpfx)%.out) \
> + $(tests-container:%=$(objpfx)%.out) \
> + $(tests-special) $(tests-printers-out) \
> + $(xtests:%=$(objpfx)%.out) $(xtests-special)
> +
> +ALLBUTFIRSTTEST = $(filter-out $(firstword $(ALLTESTS)), $(ALLTESTS))
> +ALLBUTLASTTEST = $(filter-out $(lastword $(ALLTESTS)), $(ALLTESTS))
> +TESTPAIRS = $(join $(ALLBUTFIRSTTEST),$(addprefix :|,$(ALLBUTLASTTEST)))
> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
> endif
I would have used different variable names (lowercase, perhaps with an
nptl- prefix), but that's just a detail. The $(filter-out …)
construct is awkward, but I don't see a better way.
I did some benchmarks with this patch.
* Cold case.
Newly build source tree, “make subdirs=login check” (to populate the
testroot), followed by “make subdirs=nptl check”, timings for the
latter. All tests were for x86-64, timings in seconds (wall clock).
Appropriate -j options supplied.
Before:
8-thread system: 399 399 397 400 398 400 400
256-thread system: 392 389 389 391 394 393 392
After:
8-thread system: 372 375 373 376 373 374 375
256-thread system: 357 357 356 361 358 358 357
* Warm case (incremental make).
Timings of “make subdirs=nptl check” after the first test suite run.
Appropriate -j options supplied.
After:
8-thread system: 1.541 1.014 1.025 1.016 1.030 1.029
256-thread system: 1.789 1.720 1.142 1.119 1.133 1.143
(I didn't bother to time the before case, considering these low
numbers.)
So it seems to me that performance-wise, this is a small overall win.
Thanks,
Florian
* DJ Delorie:
> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
One more comment: In the past, there was a sustained objection against
using $(eval …) in makefiles. If I recall correctly, the person who
voiced the objection no longer participates in the project.
Thanks,
Florian
On 8/14/19 8:46 AM, Florian Weimer wrote:
> * DJ Delorie:
>
>> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
>
> One more comment: In the past, there was a sustained objection against
> using $(eval …) in makefiles. If I recall correctly, the person who
> voiced the objection no longer participates in the project.
Do we remember why?
* Carlos O'Donell:
> On 8/14/19 8:46 AM, Florian Weimer wrote:
>> * DJ Delorie:
>>
>>> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
>>
>> One more comment: In the past, there was a sustained objection against
>> using $(eval …) in makefiles. If I recall correctly, the person who
>> voiced the objection no longer participates in the project.
>
> Do we remember why?
Which part? No deep technical reason was given at the time for the
$(eval …) objection, and I did not make an attempt do understand this
position.
For target-generating loops, the workaround has always been to use
o-iterator.mk, which I personally do not consider an improvement.
Thanks,
Florian
On Wed, 14 Aug 2019, Florian Weimer wrote:
> * Carlos O'Donell:
>
> > On 8/14/19 8:46 AM, Florian Weimer wrote:
> >> * DJ Delorie:
> >>
> >>> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
> >>
> >> One more comment: In the past, there was a sustained objection against
> >> using $(eval …) in makefiles. If I recall correctly, the person who
> >> voiced the objection no longer participates in the project.
> >
> > Do we remember why?
>
> Which part? No deep technical reason was given at the time for the
> $(eval …) objection, and I did not make an attempt do understand this
> position.
>
> For target-generating loops, the workaround has always been to use
> o-iterator.mk, which I personally do not consider an improvement.
We could revisit Andreas's patch
<https://sourceware.org/ml/libc-alpha/2012-07/msg00250.html>, updated for
subsequently added uses of o-iterator.mk, and see if it still gives the
reported speedup.
On Aug 14 2019, Joseph Myers <joseph@codesourcery.com> wrote:
> We could revisit Andreas's patch
> <https://sourceware.org/ml/libc-alpha/2012-07/msg00250.html>, updated for
> subsequently added uses of o-iterator.mk, and see if it still gives the
> reported speedup.
It doesn't.
Andreas.
@@ -1,3 +1,8 @@
+2019-08-13 DJ Delorie <dj@delorie.com>
+
+ * nptl/Makefile (.NOTPARALLEL): Remove. Replace with computed
+ sequence dependencies instead.
+
2019-08-13 Florian Weimer <fweimer@redhat.com>
* login/utmp_file.c (LOCK_FILE, LOCKING_FAILED, UNLOCK_FILE):
@@ -666,7 +666,7 @@ $(objpfx)tst-cleanup0.out: /dev/null $(objpfx)tst-cleanup0
$(evaluate-test)
$(objpfx)tst-cleanup0-cmp.out: tst-cleanup0.expect $(objpfx)tst-cleanup0.out
- cmp $^ > $@; \
+ cmp tst-cleanup0.expect $(objpfx)tst-cleanup0.out > $@; \
$(evaluate-test)
$(objpfx)crti.o: $(objpfx)pt-crti.o
@@ -723,7 +723,23 @@ tst-audit-threads-ENV = LD_AUDIT=$(objpfx)tst-audit-threads-mod1.so
CFLAGS-tst-unwind-thread.c += -funwind-tables
-# The tests here better do not run in parallel
-ifneq ($(filter %tests,$(MAKECMDGOALS)),)
-.NOTPARALLEL:
+ifeq ($(run-built-tests),yes)
+# The tests in this subdir should not be run in parallel.
+#
+# The following will create rules like "foo2.out :| foo1.out" for all
+# tests, which forces the tests to be run serially, but does not force
+# a test to be run just because some other test was run.
+#
+# Caveat: the :|-style dependencies won't be listed in $^, so avoid
+# using $^ to depend on test result files.
+
+ALLTESTS = $(tests:%=$(objpfx)%.out) $(tests-internal:%=$(objpfx)%.out) \
+ $(tests-container:%=$(objpfx)%.out) \
+ $(tests-special) $(tests-printers-out) \
+ $(xtests:%=$(objpfx)%.out) $(xtests-special)
+
+ALLBUTFIRSTTEST = $(filter-out $(firstword $(ALLTESTS)), $(ALLTESTS))
+ALLBUTLASTTEST = $(filter-out $(lastword $(ALLTESTS)), $(ALLTESTS))
+TESTPAIRS = $(join $(ALLBUTFIRSTTEST),$(addprefix :|,$(ALLBUTLASTTEST)))
+$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
endif