[RFC] Monster testcase generator for performance testsuite

Message ID m3lhllpkd6.fsf@seba.sebabeach.org
State New, archived

Commit Message

Doug Evans Jan. 2, 2015, 10:06 a.m. UTC
  Hi.

This patch adds preliminary support for generating large programs.
"Large" as in 10000 compunits or 5000 shared libraries or 3M ELF symbols.

There's still a bit more I want to add to this, but it's at a point
where I can use it, and thus now's a good time to get some feedback.

One difference between these tests and current perf tests is that
one .exp is used to build the program and another .exp is used to
run the test.  These programs take a while to compile and link.
Generating the sources for these monster testcases takes hardly any time
at all relative to the amount of time to compile them.  I measured 13.5
minutes to compile the included gmonster1 benchmark (with -j6!), and about
an equivalent amount of time to run the benchmark.  Therefore it makes
sense to be able to use one program in multiple performance tests, and
therefore it makes sense to separate the build from the test run.
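
To illustrate the split, here is a rough, abridged sketch based on the
gmonster1 files in the patch below (not their exact contents):

  # gmonster1.exp describes how to build the program; several benchmark
  # .exp files can share it.
  proc make_testcase_config { } {
      array set testcase [GenPerfTest::init_testcase "gmonster1"]
      set testcase(language) cplus
      set testcase(run_names) { 1-cu 10-cus 100-cus 1000-cus 10000-cus }
      set testcase(nr_compunits) { 1 10 100 1000 10000 }
      return [array get testcase]
  }

  # gmonster1-ptype.exp only runs gdb on the already-built binaries.
  load_lib perftest.exp
  GenPerfTest::load_test_description gmonster1.exp
  array set testcase [make_testcase_config]
  # ... PerfTest::assemble then drives the Python benchmark against
  # $testcase(binfile) for each run name ...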

These tests currently require separate build-perf and check-perf steps,
which is different from normal perf tests.  However, due to the time
it takes to build the program I've added support for building the pieces
of the test in parallel, and hooking this parallel build support into
the existing framework required some pragmatic compromise.

Running the gmonster1-ptype benchmark requires about 8G to link the program,
and 11G to run it under gdb.  I still need to add the ability to
have a small version enabled by default, and turn on the bigger version
from the command line.  I don't expect everyone to have a big enough
machine to run the test configuration that I do.

I don't expect the gmonster1-ptype test to remain as is.
I'm still playing with it.

I wanted the generated files from the parallel build to appear in the
gdb.perf directory, so I enhanced GDB_PARALLEL support to let one specify
the location of the outputs/cache/temp directories.
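
For illustration, here is a rough sketch of the parallel-build case of
standard_output_file after this change (taken from the lib/gdb.exp hunk in
the patch below); GDB_PARALLEL now names the directory holding those trees
instead of just being set to "yes":

  # With e.g. GDB_PARALLEL=gdb.perf, output files land under
  # $objdir/gdb.perf/outputs/... rather than $objdir/outputs/...
  if {[info exists GDB_PARALLEL]} {
      set dir [file join $objdir $GDB_PARALLEL outputs $subdir $gdb_test_file_name]
      file mkdir $dir
      return [file join $dir $basename]
  }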

In order to add parallel build support I needed a way to step
through the phases of the build:

1) To generate the build .exp files
   GDB_PERFTEST_MODE=gen-build-exps
   This step allows for parallel builds of the majority of pieces of the
   test binary and shlibs.
2) To compile the "pieces" of the binary and shlibs.
   "Pieces" are the bulk of the machine-generated sources of the test.
   This step is driven by lib/build-piece.exp.
   GDB_PERFTEST_MODE=build-pieces
3) To perform the final link of the binary.
   GDB_PERFTEST_MODE=compile
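
As an illustration of step 1's output, each worker gets a small
machine-generated .exp file, roughly like the following (the worker number
and names here are made up; see GenPerfTest::gen_build_exp_files and
lib/build-piece.exp in the patch below):

  # gdb.perf/pieces/gmonster1/gmonster1-3.exp -- DO NOT EDIT, machine generated.
  set worker_nr 3

  # The body below is supplied by the test description file.
  load_lib perftest.exp
  GenPerfTest::load_test_description gmonster1.exp
  array set testcase [make_testcase_config]
  if { [GenPerfTest::compile_pieces testcase $worker_nr] < 0 } {
      return -1
  }
  return 0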

Going this route makes the "both" value of GDB_PERFTEST_MODE
(which means compile+run) a bit confusing.  I'm open to suggestions
for how one would want this done differently.  I'm used to the meaning
of "both" now so I don't mind this, but I think the thing to do is rename
"both".  Another possibility is using a different variable than
GDB_PERFTEST_MODE to step through the three phases of the parallel build.
Given the size of these programs and the time it takes to compile them,
I think having parallel build support up front is important.

Also, I still need to do some regression tests to make sure I haven't
broken anything. :-)

Example:

bash$ cd testsuite ; make site.exp
bash$ make -j6 build-perf RUNTESTFLAGS=gmonster1.exp
... wait awhile ...
bash$ make check-perf GDB_PERFTEST_MODE=run RUNTESTFLAGS=gmonster1-ptype.exp
... wait awhile ...
  

Comments

Yao Qi Jan. 5, 2015, 1:32 p.m. UTC | #1
Doug Evans <xdje42@gmail.com> writes:

Doug,
First of all, it is great to have such a generator for performance testing,
but it doesn't have to be a monster, and we don't need a parallel build so
far.  The parallel build will make the generator over-complicated.  See
more below:

> This patch adds preliminary support for generating large programs.
> "Large" as in 10000 compunits or 5000 shared libraries or 3M ELF symbols.
>

Is there any reason to define the workload like this?  Does it
represent a typical, practical, super-large program?  I feel that the
workload you defined is too heavy to be practical, and that excess size
causes the long compilation time you mentioned below.

> There's still a bit more I want to add to this, but it's at a point
> where I can use it, and thus now's a good time to get some feedback.
>
> One difference between these tests and current perf tests is that
> one .exp is used to build the program and another .exp is used to
> run the test.  These programs take awhile to compile and link.
> Generating the sources for these monster testcases takes hardly any time
> at all relative to the amount of time to compile them.  I measured 13.5
> minutes to compile the included gmonster1 benchmark (with -j6!), and about
> an equivalent amount of time to run the benchmark.  Therefore it makes
> sense to be able to use one program in multiple performance tests, and
> therefore it makes sense to separate the build from the test run.

Compilation and the run each take about 10 minutes.  However, I
don't understand the importance of making tests run for 10
minutes, which is too long for a perf test case.  IMO, a two-minute run
should be representative enough...

>
> These tests currently require separate build-perf and check-perf steps,
> which is different from normal perf tests.  However, due to the time
> it takes to build the program I've added support for building the pieces
> of the test in parallel, and hooking this parallel build support into
> the existing framework required some pragmatic compromise.

... so the parallel build part may not be needed.

> Running the gmonster1-ptype benchmark requires about 8G to link the program,
> and 11G to run it under gdb.  I still need to add the ability to
> have a small version enabled by default, and turn on the bigger version
> from the command line.  I don't expect everyone to have a big enough
> machine to run the test configuration that I do.

It looks like a monster rather than a perf test case :)  It would be good to
have a small version enabled by default, one that requires less than 1 G,
for example, to run under GDB.  How much time does it take to compile
(with a sequential build) and run the small version?
  
Doug Evans Jan. 6, 2015, 12:54 a.m. UTC | #2
On Mon, Jan 5, 2015 at 5:32 AM, Yao Qi <yao@codesourcery.com> wrote:
> Doug Evans <xdje42@gmail.com> writes:
>
> Doug,
> First of all, it is great to have such generator for performance testing,
> but it doesn't have to be a monster and we don't need parallel build so
> far.  The parallel build will get the generator over-complicated.  See
> more below.
>
>> This patch adds preliminary support for generating large programs.
>> "Large" as in 10000 compunits or 5000 shared libraries or 3M ELF symbols.
>>
>
> Is there any reason we define the workload like this?  Can they
> represent the typical and practical super large program?  I feel that the
> workload you defined is too heavy to be practical, and the overweight
> causes the long compilation time you mentioned below.

Those are just loose (i.e., informal) characterizations of real programs
my users run gdb on.
And that's an incomplete list btw.
So, yes, they do represent practical super large programs.
The programs these benchmarks will be based on are as real as it gets.
As for whether they're typical ... depends on what you're used to I guess. :-)

>> There's still a bit more I want to add to this, but it's at a point
>> where I can use it, and thus now's a good time to get some feedback.
>>
>> One difference between these tests and current perf tests is that
>> one .exp is used to build the program and another .exp is used to
>> run the test.  These programs take awhile to compile and link.
>> Generating the sources for these monster testcases takes hardly any time
>> at all relative to the amount of time to compile them.  I measured 13.5
>> minutes to compile the included gmonster1 benchmark (with -j6!), and about
>> an equivalent amount of time to run the benchmark.  Therefore it makes
>> sense to be able to use one program in multiple performance tests, and
>> therefore it makes sense to separate the build from the test run.
>
> Compilation and run takes about 10 minutes respectively.  However, I
> don't understand the importance that making tests running for 10
> minutes, which is too long for a perf test case.  IMO, a-two-minute-run
> program should be representative enough...

Depends.
I'm not suggesting compile/run time is the defining characteristic
that makes them useful. gmonster1 (and others) are intended to be
representative of real programs (gmonster1 isn't there yet, but it's
not because it's too big ..., I still have to tweak the kind of bigness
it has, as well as add more specially crafted code to exercise real issues).
Its compile time is what it is. The program is that big.
As for test run time, that depends on the test.
At the moment it's still early, and I'm still writing tests and
calibrating them.

As for general importance,

If a change to gdb increases the time it takes to run a particular command
by one second is that ok? Maybe. And if my users see the increase
become ten seconds is that still ok? Also maybe, but I'd like to make the
case that it'd be preferable to have mechanisms in place to find out sooner
than later.

Similarly, if a change to gdb increases memory usage by 40MB is that ok?
Maybe. And if my users see that increase become 400MB is that still ok?
Possibly (depending on the nature of the change). But, again, one of my
goals here is to have in place mechanisms to find out sooner than later.

Note that, as I said, there's more I wish to add here.
For example, it's not enough to just machine generate a bunch of generic
code. We also need the ability to add specific cases that trip gdb up,
and thus I also plan to add the ability to add hand-written code to
these benchmarks.
Plus, my plan is to make gmonster1 contain a variety of such cases
and use it in multiple benchmarks. Otherwise we're compiling/linking
multiple programs and I *am* trying to cut down on build times here! :-)

>> These tests currently require separate build-perf and check-perf steps,
>> which is different from normal perf tests.  However, due to the time
>> it takes to build the program I've added support for building the pieces
>> of the test in parallel, and hooking this parallel build support into
>> the existing framework required some pragmatic compromise.
>
> ... so the parallel build part may not be needed.

I'm not sure what the hangup is on supporting parallel builds here.
Can you elaborate? It's really not that much code, and while I could
have done things differently, I'm just using mechanisms that are
already in place. The only real "complexity" is that the existing
mechanism is per-.exp-file based, so I needed one .exp file per worker.
I think we could simplify this with some cleverness, but this isn't
what I want to focus on right now. Any change will just be to the
infrastructure, not to the tests. If someone wants to propose a different
mechanism to achieve the parallelism go for it. OTOH, there is value
in using existing mechanisms. Another way to go (and I'm not suggesting
this is a better or worse way, it's just an example) would be to have
hand-written worker .exp files and check those in. I don't have a
strong opinion on that, machine generating them is easy enough and
gives me some flexibility (which is nice) in these early stages.

>> Running the gmonster1-ptype benchmark requires about 8G to link the program,
>> and 11G to run it under gdb.  I still need to add the ability to
>> have a small version enabled by default, and turn on the bigger version
>> from the command line.  I don't expect everyone to have a big enough
>> machine to run the test configuration that I do.
>
> It looks like a monster rather than a perf test case :)

Depends.  How long do your users still wait for gdb to do something?
My users are still waiting too long for several things (e.g., startup time).
And I want to be able to measure what my users see.
And I want to be able to provide upstream with demonstrations of that.

> It is good to
> have a small version enabled by default, which requires less than 1 G,
> for example, to run it under GDB.  How much time it takes to compile
> (sequential build) and run the small version?

There are mechanisms in place to control the amount of parallelism.
One could make it part of the test spec, but I'm not sure it'd be useful
enough.  Thus I think there's no need to compile small testcases
serially.

As for what upstream wants the "default" to be, I don't have
a strong opinion, beyond it being minimally useful.  If the default isn't
useful to me, it's easy enough to tweak the test with a local change
to cover what I need.

Note that I'm not expecting the default to be these
super long times, which I noted in my original email. OTOH, I do want
the harness to be able to usefully handle (as in not wait an hour for the
testcase to be built) the kind of large programs that I need to run the
tests on.  Thus my plan is to have a harness that can handle what
I need, but have defaults that don't impose that on everyone.
Given appropriate knobs it will be easy enough to have useful
defaults and still be able to run the tests with larger programs.
And then if my runs find a problem, it will be straightforward for
me to provide a demonstration of what I'm seeing (which is part
of what I want to accomplish here).
  
Yao Qi Jan. 7, 2015, 9:39 a.m. UTC | #3
Doug Evans <dje@google.com> writes:

> If a change to gdb increases the time it takes to run a particular command
> by one second is that ok? Maybe. And if my users see the increase
> become ten seconds is that still ok? Also maybe, but I'd like to make the
> case that it'd be preferable to have mechanisms in place to find out sooner
> than later.
>

Yeah, I agree that it is better to find out problems sooner than later.
That is why we create perf test cases.  If a one-second time increase is
sufficient to find the performance problem, isn't that good enough?  Why do we
still need to run a bigger version that demonstrates a ten-second increase?

> Similarly, if a change to gdb increases memory usage by 40MB is that ok?
> Maybe. And if my users see that increase become 400MB is that still ok?
> Possibly (depending on the nature of the change). But, again, one of my
> goals here is to have in place mechanisms to find out sooner than later.
>

Similarly, if a 40MB memory usage increase is sufficient to show the
performance problem, why do we still have to use a bigger one?

A perf test case is used to demonstrate the real performance problems seen in
some super-large programs, but that doesn't mean the perf test case should
be as big as those super-large programs.

> Note that, as I said, there's more I wish to add here.
> For example, it's not enough to just machine generate a bunch of generic
> code. We also need the ability to add specific cases that trip gdb up,
> and thus I also plan to add the ability to add hand-written code to
> these benchmarks.
> Plus, my plan is to make gmonster1 contain a variety of such cases
> and use it in multiple benchmarks. Otherwise we're compiling/linking
> multiple programs and I *am* trying to cut down on build times here! :-)
>

That sounds interesting...

>>> These tests currently require separate build-perf and check-perf steps,
>>> which is different from normal perf tests.  However, due to the time
>>> it takes to build the program I've added support for building the pieces
>>> of the test in parallel, and hooking this parallel build support into
>>> the existing framework required some pragmatic compromise.
>>
>> ... so the parallel build part may not be needed.
>
> I'm not sure what the hangup is on supporting parallel builds here.
> Can you elaborate? It's really not that much code, and while I could

I'd like to keep the gdb perf tests simple.

> have done things differently, I'm just using mechanisms that are
> already in place. The only real "complexity" is that the existing
> mechanism is per-.exp-file based, so I needed one .exp file per worker.
> I think we could simplify this with some cleverness, but this isn't
> what I want to focus on right now. Any change will just be to the
> infrastructure, not to the tests. If someone wants to propose a different
> mechanism to achieve the parallelism go for it. OTOH, there is value
> in using existing mechanisms. Another way to go (and I'm not suggesting
> this is a better or worse way, it's just an example) would be to have
> hand-written worker .exp files and check those in. I don't have a
> strong opinion on that, machine generating them is easy enough and
> gives me some flexibility (which is nice) in these early stages.
>
>>> Running the gmonster1-ptype benchmark requires about 8G to link the program,
>>> and 11G to run it under gdb.  I still need to add the ability to
>>> have a small version enabled by default, and turn on the bigger version
>>> from the command line.  I don't expect everyone to have a big enough
>>> machine to run the test configuration that I do.
>>
>> It looks like a monster rather than a perf test case :)
>
> Depends.  How long do your users still wait for gdb to do something?
> My users are still waiting too long for several things (e.g., startup time).
> And I want to be able to measure what my users see.
> And I want to be able to provide upstream with demonstrations of that.
>

IMO, your expectation is beyond the scope or purpose of a perf test
case.  The purpose of each perf test case is to make sure there is no
performance regression and to expose performance problems as the code
evolves.  It is not reasonable to me that we measure what users see by
running our perf test cases.  Each perf test case measures the
performance of gdb on a certain path, so it doesn't have to behave
exactly the same as the application users are debugging.

>> It is good to
>> have a small version enabled by default, which requires less than 1 G,
>> for example, to run it under GDB.  How much time it takes to compile
>> (sequential build) and run the small version?
>
> There are mechanisms in place to control the amount of parallelism.
> One could make it part of the test spec, but I'm not sure it'd be useful
> enough.  Thus I think there's no need to compile small testcases
> serially.
>

Is it possible (or necessary) to divide this into two parts: 1) the perf
test case generator and 2) the parallel build?  As we increase the size of the
generated perf test cases, the long compilation time can justify having a
parallel build.

> As for what upstream wants the "default" to be, I don't have
> a strong opinion, beyond it being minimally useful.  If the default isn't
> useful to me, it's easy enough to tweak the test with a local change
> to cover what I need.
>
> Note that I'm not expecting the default to be these
> super long times, which I noted in my original email. OTOH, I do want
> the harness to be able to usefully handle (as in not wait an hour for the
> testcase to be built) the kind of large programs that I need to run the
> tests on.  Thus my plan is to have a harness that can handle what
> I need, but have defaults that don't impose that on everyone.
> Given appropriate knobs it will be easy enough to have useful
> defaults and still be able to run the tests with larger programs.
> And then if my runs find a problem, it will be straightforward for
> me to provide a demonstration of what I'm seeing (which is part
> of what I want to accomplish here).

Yeah, I agree.
  
Doug Evans Jan. 7, 2015, 10:33 p.m. UTC | #4
On Wed, Jan 7, 2015 at 1:39 AM, Yao Qi <yao@codesourcery.com> wrote:
> Doug Evans <dje@google.com> writes:
>
>> If a change to gdb increases the time it takes to run a particular command
>> by one second is that ok? Maybe. And if my users see the increase
>> become ten seconds is that still ok? Also maybe, but I'd like to make the
>> case that it'd be preferable to have mechanisms in place to find out sooner
>> than later.
>>
>
> Yeah, I agree that it is better to find out problems sooner than later.
> That is why we create perf test cases.  If one second time increase is
> sufficient to find the performance problem, isn't it good?  Why do we
> still need to run a bigger version which demonstrated ten seconds increase?

Some performance problems only present themselves at scale.
We need a perf test framework that lets us explore such things.

The point of the 1 second vs 10 second scenario is that the community
may find that 1 second is acceptable (IOW *not* a performance problem
significant enough to address).  It'll depend on the situation.
But at scale the performance may be untenable, causing one to want
to rethink one's algorithm or data structure or whatever.

Similar issues arise elsewhere btw.
E.g., gdb may handle 10 or 100 threads ok, but how about 1000 threads?

>> Similarly, if a change to gdb increases memory usage by 40MB is that ok?
>> Maybe. And if my users see that increase become 400MB is that still ok?
>> Possibly (depending on the nature of the change). But, again, one of my
>> goals here is to have in place mechanisms to find out sooner than later.
>>
>
> Similarly, if 40MB memory usage increase is sufficient to show the
> performance problem, why do we still have to use a bigger one?
>
> Perf test case is used to demonstrate the real performance problems in
> some super large programs, but it doesn't mean the perf test case should
> be as big as these super large programs.

One may think 40MB is a reasonable price to pay for some change
or some new feature.  But at scale that price may become unbearable.
So, yes, we do need perf testcases that let one exercise gdb at scale.

>>>> These tests currently require separate build-perf and check-perf steps,
>>>> which is different from normal perf tests.  However, due to the time
>>>> it takes to build the program I've added support for building the pieces
>>>> of the test in parallel, and hooking this parallel build support into
>>>> the existing framework required some pragmatic compromise.
>>>
>>> ... so the parallel build part may not be needed.
>>
>> I'm not sure what the hangup is on supporting parallel builds here.
>> Can you elaborate? It's really not that much code, and while I could
>
> I'd like keep gdb perf test simple.

How simple?  What about parallel builds adds too much complexity?
make check-parallel adds complexity, but I'm guessing no one is
advocating removing it, or was advocating against checking it in.

>>> It looks like a monster rather than a perf test case :)
>>
>> Depends.  How long do your users still wait for gdb to do something?
>> My users are still waiting too long for several things (e.g., startup time).
>> And I want to be able to measure what my users see.
>> And I want to be able to provide upstream with demonstrations of that.
>>
>
> IMO, your expectation is beyond the scope or the purpose perf test
> case.  The purpose of each perf test case is to make sure there is no
> performance regression and to expose performance problems as code
> evolves.

It's precisely within the scope and purpose of the perf testsuite!
We need to measure how well gdb will work on real programs,
and make sure changes introduced don't adversely affect such programs.
How do you know a feature/change/improvement will work at scale unless
you test it at scale?

> It is not reasonable to me that we measure what users see by
> running our perf test cases.

Perf test cases aren't an end unto themselves.
They exist to help serve our users.  If we're not able to measure
what our users see, how do we know what their gdb experience is?

> Each perf test case is to measure the
> performance on gdb on a certain path, so it doesn't have to behave
> exactly the same as the application users are debugging.
>
>>> It is good to
>>> have a small version enabled by default, which requires less than 1 G,
>>> for example, to run it under GDB.  How much time it takes to compile
>>> (sequential build) and run the small version?
>>
>> There are mechanisms in place to control the amount of parallelism.
>> One could make it part of the test spec, but I'm not sure it'd be useful
>> enough.  Thus I think there's no need to compile small testcases
>> serially.
>>
>
> Is it possible (or necessary) that we divide it to two parts, 1) perf
> test case generator and 2) parallel build?  As we increase the size
> generated perf test cases, the long compilation time can justify having
> parallel build.

I'm not sure what you're advocating for here.
Can you rephrase/elaborate?

>> As for what upstream wants the "default" to be, I don't have
>> a strong opinion, beyond it being minimally useful.  If the default isn't
>> useful to me, it's easy enough to tweak the test with a local change
>> to cover what I need.
>>
>> Note that I'm not expecting the default to be these
>> super long times, which I noted in my original email. OTOH, I do want
>> the harness to be able to usefully handle (as in not wait an hour for the
>> testcase to be built) the kind of large programs that I need to run the
>> tests on.  Thus my plan is to have a harness that can handle what
>> I need, but have defaults that don't impose that on everyone.
>> Given appropriate knobs it will be easy enough to have useful
>> defaults and still be able to run the tests with larger programs.
>> And then if my runs find a problem, it will be straightforward for
>> me to provide a demonstration of what I'm seeing (which is part
>> of what I want to accomplish here).
>
> Yeah, I agree.
>
> --
> Yao (齐尧)
  
Yao Qi Jan. 8, 2015, 1:55 a.m. UTC | #5
Doug Evans <dje@google.com> writes:

> The point of the 1 second vs 10 second scenario is that the community
> may find that 1 second is acceptable (IOW *not* a performance problem
> significant enough to address).  It'll depend on the situation.
> But at scale the performance may be untenable, causing one to want
> to rethink one's algorithm or data structure or whatever.

Right, the algorithm may be reconsidered when the program grows to a large
scale.
>
> Similar issues arise elsewhere btw.
> E.g., gdb may handle 10 or 100 threads ok, but how about 1000 threads?

Then, I have to run the program with 1000 threads.

>>> Similarly, if a change to gdb increases memory usage by 40MB is that ok?
>>> Maybe. And if my users see that increase become 400MB is that still ok?
>>> Possibly (depending on the nature of the change). But, again, one of my
>>> goals here is to have in place mechanisms to find out sooner than later.
>>>
>>
>> Similarly, if 40MB memory usage increase is sufficient to show the
>> performance problem, why do we still have to use a bigger one?
>>
>> Perf test case is used to demonstrate the real performance problems in
>> some super large programs, but it doesn't mean the perf test case should
>> be as big as these super large programs.
>
> One may think 40MB is a reasonable price to pay for some change
> or some new feature.  But at scale that price may become unbearable.
> So, yes, we do need perf testcases that let one exercise gdb at scale.

Hmmm, that makes sense to me.

>
>>>>> These tests currently require separate build-perf and check-perf steps,
>>>>> which is different from normal perf tests.  However, due to the time
>>>>> it takes to build the program I've added support for building the pieces
>>>>> of the test in parallel, and hooking this parallel build support into
>>>>> the existing framework required some pragmatic compromise.
>>>>
>>>> ... so the parallel build part may not be needed.
>>>
>>> I'm not sure what the hangup is on supporting parallel builds here.
>>> Can you elaborate? It's really not that much code, and while I could
>>
>> I'd like keep gdb perf test simple.
>
> How simple?  What about parallel builds adds too much complexity?
> make check-parallel adds complexity, but I'm guessing no one is
> advocating removing it, or was advocating against checking it in.
>

Well, 'make check-parallel' is useful, and parallel build in the perf test
case generator is useful too.  However, my initial feeling is that parallel
build in the perf test case generator is a plus, not a must.  I thought we
could have a perf test case generator without parallel build.

>>>> It looks like a monster rather than a perf test case :)
>>>
>>> Depends.  How long do your users still wait for gdb to do something?
>>> My users are still waiting too long for several things (e.g., startup time).
>>> And I want to be able to measure what my users see.
>>> And I want to be able to provide upstream with demonstrations of that.
>>>
>>
>> IMO, your expectation is beyond the scope or the purpose perf test
>> case.  The purpose of each perf test case is to make sure there is no
>> performance regression and to expose performance problems as code
>> evolves.
>
> It's precisely within the scope and purpose of the perf testsuite!
> We need to measure how well gdb will work on real programs,
> and make sure changes introduced don't adversely affect such programs.
> How do you know a feature/change/improvement will work at scale unless
> you test it at scale?
>

We should test it at scale.

>> Each perf test case is to measure the
>> performance on gdb on a certain path, so it doesn't have to behave
>> exactly the same as the application users are debugging.
>>
>>>> It is good to
>>>> have a small version enabled by default, which requires less than 1 G,
>>>> for example, to run it under GDB.  How much time it takes to compile
>>>> (sequential build) and run the small version?
>>>
>>> There are mechanisms in place to control the amount of parallelism.
>>> One could make it part of the test spec, but I'm not sure it'd be useful
>>> enough.  Thus I think there's no need to compile small testcases
>>> serially.
>>>
>>
>> Is it possible (or necessary) that we divide it to two parts, 1) perf
>> test case generator and 2) parallel build?  As we increase the size
>> generated perf test cases, the long compilation time can justify having
>> parallel build.
>
> I'm not sure what you're advocating for here.
> Can you rephrase/elaborate?

Can we have a perf test case generator without parallel build, and then
add building perf test cases in parallel as the next step?  I'd like
to add new things gradually.

If you think it isn't necessary to do things in these two steps, I am
OK with that too.  I don't have a strong opinion on this now.  I'll take a look
at your patch in detail.
  

Patch

diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
index 07d3942..5a32d02 100644
--- a/gdb/testsuite/Makefile.in
+++ b/gdb/testsuite/Makefile.in
@@ -227,13 +227,30 @@  do-check-parallel: $(TEST_TARGETS)
 
 @GMAKE_TRUE@check/%.exp:
 @GMAKE_TRUE@	-mkdir -p outputs/$*
-@GMAKE_TRUE@	@$(DO_RUNTEST) GDB_PARALLEL=yes --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)
+@GMAKE_TRUE@	@$(DO_RUNTEST) GDB_PARALLEL=. --outdir=outputs/$* $*.exp $(RUNTESTFLAGS)
 
 check/no-matching-tests-found:
 	@echo ""
 	@echo "No matching tests found."
 	@echo ""
 
+@GMAKE_TRUE@pieces/%.exp:
+@GMAKE_TRUE@	mkdir -p gdb.perf/outputs/$*
+@GMAKE_TRUE@	$(DO_RUNTEST) --status --outdir=gdb.perf/outputs/$* lib/build-piece.exp PIECE=gdb.perf/pieces/$*.exp WORKER=$* GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=build-pieces
+
+# GDB_PERFTEST_MODE appears *after* RUNTESTFLAGS here because we don't want
+# anything in RUNTESTFLAGS to override it.
+@GMAKE_TRUE@build-perf: $(abs_builddir)/site.exp
+@GMAKE_TRUE@	rm -rf gdb.perf/pieces
+@GMAKE_TRUE@	rm -rf gdb.perf/cache gdb.perf/outputs gdb.perf/temp
+@GMAKE_TRUE@	mkdir -p gdb.perf/pieces
+@GMAKE_TRUE@	@: Step 1: Generate the build .exp files.
+@GMAKE_TRUE@	$(DO_RUNTEST) --status --directory=gdb.perf --outdir gdb.perf/pieces GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=gen-build-exps
+@GMAKE_TRUE@	@: Step 2: Compile the pieces.
+@GMAKE_TRUE@	$(MAKE) $$(cd gdb.perf && echo pieces/*/*.exp)
+@GMAKE_TRUE@	@: Step 3: Do the final link.
+@GMAKE_TRUE@	$(DO_RUNTEST) --status --directory=gdb.perf --outdir gdb.perf GDB_PARALLEL=gdb.perf $(RUNTESTFLAGS) GDB_PERFTEST_MODE=compile
+
 check-perf: all $(abs_builddir)/site.exp
 	@if test ! -d gdb.perf; then mkdir gdb.perf; fi
 	$(DO_RUNTEST) --directory=gdb.perf --outdir gdb.perf GDB_PERFTEST_MODE=both $(RUNTESTFLAGS)
diff --git a/gdb/testsuite/gdb.perf/gmonster1-ptype.exp b/gdb/testsuite/gdb.perf/gmonster1-ptype.exp
new file mode 100644
index 0000000..e4fde74
--- /dev/null
+++ b/gdb/testsuite/gdb.perf/gmonster1-ptype.exp
@@ -0,0 +1,42 @@ 
+# Copyright (C) 2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Measure speed of ptype on a simple class.
+# Test parameters are the standard GenPerfTest parameters.
+
+load_lib perftest.exp
+
+if [skip_perf_tests] {
+    return 0
+}
+
+GenPerfTest::load_test_description gmonster1.exp
+
+# This variable is required by perftest.exp.
+# This isn't the name of the test program, it's the name of the test.
+# The harness assumes they are the same, which is not the case here.
+set testfile "gmonster1-ptype"
+
+array set testcase [make_testcase_config]
+
+PerfTest::assemble {
+    # Compilation is handled by gmonster1.exp.
+    return 0
+} {
+    clean_restart
+} {
+    global testcase
+    gdb_test "python Gmonster1Ptype('$testfile', [tcl_string_list_to_python_list $testcase(run_names)], '$testcase(binfile)').run()"
+}
diff --git a/gdb/testsuite/gdb.perf/gmonster1-ptype.py b/gdb/testsuite/gdb.perf/gmonster1-ptype.py
new file mode 100644
index 0000000..041d7e4
--- /dev/null
+++ b/gdb/testsuite/gdb.perf/gmonster1-ptype.py
@@ -0,0 +1,72 @@ 
+# Copyright (C) 2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Measure speed of ptype on a simple class.
+
+from perftest import perftest
+from perftest import measure
+
+class Gmonster1Ptype(perftest.TestCaseWithBasicMeasurements):
+    def __init__(self, name, run_names, binfile):
+        # We want to measure time in this test.
+        super(Gmonster1Ptype, self).__init__(name)
+        self.run_names = run_names
+        self.binfile = binfile
+
+    def warm_up(self):
+        pass
+
+    @staticmethod
+    def _safe_execute(command):
+        try:
+            gdb.execute(command)
+        except gdb.error:
+            pass
+
+    @staticmethod
+    def _convert_spaces(file_name):
+        return file_name.replace(" ", "-")
+
+    @staticmethod
+    def _select_file(file_name):
+        gdb.execute("file %s" % (file_name))
+
+    @staticmethod
+    def _runto_main():
+        gdb.execute("tbreak main")
+        gdb.execute("run")
+
+    def execute_test(self):
+        self._safe_execute("set confirm off")
+        class_to_print = { "1-cu": "class_0_0",
+                           "10-cus": "class_9_0",
+                           "100-cus": "class_99_0",
+                           "1000-cus": "class_999_0",
+                           "10000-cus": "class_9999_0" }
+        for run in self.run_names:
+            class_name = "ns_0::ns_1::%s" % (class_to_print[run])
+            this_run_binfile = "%s-%s" % (self.binfile,
+                                          self._convert_spaces(run))
+            self._select_file(this_run_binfile)
+            self._runto_main()
+            self._safe_execute("mt expand-symtabs")
+            self._safe_execute("set $this = (%s*) 0" % (class_name))
+            self._safe_execute("break %s::method_0" % (class_name))
+            self._safe_execute("call $this->method_0()")
+            iteration = 5
+            while iteration > 0:
+                func = lambda: self._safe_execute("ptype %s" % (class_name))
+                self.measure.measure(func, run)
+                iteration -= 1
diff --git a/gdb/testsuite/gdb.perf/gmonster1.exp b/gdb/testsuite/gdb.perf/gmonster1.exp
new file mode 100644
index 0000000..f1e6c2a
--- /dev/null
+++ b/gdb/testsuite/gdb.perf/gmonster1.exp
@@ -0,0 +1,84 @@ 
+# Copyright (C) 2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Perftest description file for building the "gmonster1" benchmark.
+#
+# Perftest descriptions are loaded thrice:
+# 1) To generate the build .exp files
+#    GDB_PERFTEST_MODE=gen-build-exps
+#    This step allows for parallel builds of the majority of pieces of the
+#    test binary and shlibs.
+# 2) To compile the "pieces" of the binary and shlibs.
+#    "Pieces" are the bulk of the machine-generated sources of the test.
+#    This step is driven by lib/build-piece.exp.
+#    GDB_PERFTEST_MODE=build-pieces
+# 3) To perform the final link of the binary and shlibs.
+#    GDB_PERFTEST_MODE=compile
+
+load_lib perftest.exp
+
+if [skip_perf_tests] {
+    return 0
+}
+
+proc make_testcase_config { } {
+    set program_name "gmonster1"
+    array set testcase [GenPerfTest::init_testcase $program_name]
+
+    set testcase(language) cplus
+    set testcase(run_names) { 1-cu 10-cus 100-cus 1000-cus 10000-cus }
+    set testcase(nr_shlibs) { 0 }
+    set testcase(nr_compunits) { 1 10 100 1000 10000 }
+    set testcase(nr_extern_functions) 10
+    set testcase(nr_static_functions) 10
+    # class_specs needs to be embedded in an outer list because remember each
+    # element of the outer list is for each run, and here we want to use the
+    # same value for all runs.
+    set testcase(class_specs) { { { 0 10 } { 1 10 } { 2 10 } } }
+    set testcase(nr_members) 10
+    set testcase(nr_static_members) 10
+    set testcase(nr_methods) 10
+    set testcase(nr_static_methods) 10
+
+    return [array get testcase]
+}
+
+verbose -log "gmonster1: $GDB_PERFTEST_MODE"
+
+switch $GDB_PERFTEST_MODE {
+    gen-build-exps {
+	if { [GenPerfTest::gen_build_exp_files gmonster1.exp make_testcase_config] < 0 } {
+	    return -1
+	}
+    }
+    build-pieces {
+	;# Nothing to do.
+    }
+    compile {
+	array set testcase [make_testcase_config]
+	if { [GenPerfTest::compile testcase] < 0 } {
+	    return -1
+	}
+    }
+    run {
+	;# Nothing to do.
+    }
+    both {
+	;# Don't do anything here.  Tests that use us must have explicitly
+	;# separate compile/run steps.
+    }
+}
+
+return 0
diff --git a/gdb/testsuite/lib/build-piece.exp b/gdb/testsuite/lib/build-piece.exp
new file mode 100644
index 0000000..c48774c
--- /dev/null
+++ b/gdb/testsuite/lib/build-piece.exp
@@ -0,0 +1,36 @@ 
+# Copyright (C) 2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Utility to bootstrap building a piece of a performance test in a
+# parallel build.
+# See testsuite/Makefile.in:pieces/%.exp.
+
+# Dejagnu presents a kind of API to .exp files, but using this file to
+# bootstrap the parallel build process breaks that.  Before invoking $PIECE
+# set various globals to their expected values.  The tests may not use these
+# today, but if/when they do the error modes are confusing, so fix it now.
+
+# $subdir is set to "lib", because that is where this file lives,
+# which is not what tests expect.  The makefile sets WORKER for us.
+# Its value is <name>/<name>-<number>.
+set subdir [file dirname $WORKER]
+
+# $gdb_test_file_name is set to this file, build-piece, which is not what
+# tests expect.  This assumes each piece's build .exp file lives in
+# $objdir/gdb.perf/pieces/<name>.
+# See perftest.exp:GenPerfTest::gen_build_exp_files.
+set gdb_test_file_name [file tail [file dirname $PIECE]]
+
+source $PIECE
diff --git a/gdb/testsuite/lib/cache.exp b/gdb/testsuite/lib/cache.exp
index 2f4a34e..d33d1cb 100644
--- a/gdb/testsuite/lib/cache.exp
+++ b/gdb/testsuite/lib/cache.exp
@@ -35,7 +35,7 @@  proc gdb_do_cache {name} {
     }
 
     if {[info exists GDB_PARALLEL]} {
-	set cache_filename [file join $objdir cache $cache_name]
+	set cache_filename [file join $objdir $GDB_PARALLEL cache $cache_name]
 	if {[file exists $cache_filename]} {
 	    set fd [open $cache_filename]
 	    set gdb_data_cache($cache_name) [read -nonewline $fd]
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 08087f2..7f5dd81 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -3729,7 +3729,7 @@  proc standard_output_file {basename} {
     global objdir subdir gdb_test_file_name GDB_PARALLEL
 
     if {[info exists GDB_PARALLEL]} {
-	set dir [file join $objdir outputs $subdir $gdb_test_file_name]
+	set dir [file join $objdir $GDB_PARALLEL outputs $subdir $gdb_test_file_name]
 	file mkdir $dir
 	return [file join $dir $basename]
     } else {
@@ -3743,7 +3743,7 @@  proc standard_temp_file {basename} {
     global objdir GDB_PARALLEL
 
     if {[info exists GDB_PARALLEL]} {
-	return [file join $objdir temp $basename]
+	return [file join $objdir $GDB_PARALLEL temp $basename]
     } else {
 	return $basename
     }
@@ -4645,17 +4645,27 @@  proc build_executable { testname executable {sources ""} {options {debug}} } {
     return [eval build_executable_from_specs $arglist]
 }
 
-# Starts fresh GDB binary and loads EXECUTABLE into GDB. EXECUTABLE is
-# the basename of the binary.
-proc clean_restart { executable } {
+# Starts fresh GDB binary and loads an optional executable into GDB.
+# Usage: clean_restart [executable]
+# EXECUTABLE is the basename of the binary.
+
+proc clean_restart { args } {
     global srcdir
     global subdir
-    set binfile [standard_output_file ${executable}]
+
+    if { [llength $args] > 1 } {
+	error "bad number of args: [llength $args]"
+    }
 
     gdb_exit
     gdb_start
     gdb_reinitialize_dir $srcdir/$subdir
-    gdb_load ${binfile}
+
+    if { [llength $args] >= 1 } {
+	set executable [lindex $args 0]
+	set binfile [standard_output_file ${executable}]
+	gdb_load ${binfile}
+    }
 }
 
 # Prepares for testing by calling build_executable_full, then
@@ -4859,7 +4869,10 @@  if {[info exists GDB_PARALLEL]} {
     if {[is_remote host]} {
 	unset GDB_PARALLEL
     } else {
-	file mkdir outputs temp cache
+	file mkdir \
+	    [file join $GDB_PARALLEL outputs] \
+	    [file join $GDB_PARALLEL temp] \
+	    [file join $GDB_PARALLEL cache]
     }
 }
 
diff --git a/gdb/testsuite/lib/perftest.exp b/gdb/testsuite/lib/perftest.exp
index 6b1cab4..f9c9e11 100644
--- a/gdb/testsuite/lib/perftest.exp
+++ b/gdb/testsuite/lib/perftest.exp
@@ -12,6 +12,10 @@ 
 #
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+# Notes:
+# 1) This follows a Python convention for marking internal vs public functions.
+# Internal functions are prefixed with "_".
 
 namespace eval PerfTest {
     # The name of python file on build.
@@ -42,14 +46,7 @@  namespace eval PerfTest {
     # actual compilation.  Return zero if compilation is successful,
     # otherwise return non-zero.
     proc compile {body} {
-	global GDB_PERFTEST_MODE
-
-	if { [info exists GDB_PERFTEST_MODE]
-	     && [string compare $GDB_PERFTEST_MODE "run"] } {
-	    return [uplevel 2 $body]
-	}
-
-	return 0
+	return [uplevel 2 $body]
     }
 
     # Start up GDB.
@@ -82,14 +79,24 @@  namespace eval PerfTest {
     proc assemble {compile startup run} {
 	global GDB_PERFTEST_MODE
 
-	if { [eval compile {$compile}] } {
-	    untested "Could not compile source files."
+	if ![info exists GDB_PERFTEST_MODE] {
 	    return
 	}
 
+	if { "$GDB_PERFTEST_MODE" == "gen-build-exps"
+	     || "$GDB_PERFTEST_MODE" == "build-pieces" } {
+	    return
+	}
+
+	if { [string compare $GDB_PERFTEST_MODE "run"] } {
+	    if { [eval compile {$compile}] } {
+		untested "Could not compile source files."
+		return
+	    }
+	}
+
 	# Don't execute the run if GDB_PERFTEST_MODE=compile.
-	if { [info exists GDB_PERFTEST_MODE]
-	     && [string compare $GDB_PERFTEST_MODE "compile"] == 0} {
+	if { [string compare $GDB_PERFTEST_MODE "compile"] == 0} {
 	    return
 	}
 
@@ -110,10 +117,11 @@  proc skip_perf_tests { } {
 
     if [info exists GDB_PERFTEST_MODE] {
 
-	if { "$GDB_PERFTEST_MODE" != "compile"
+	if { "$GDB_PERFTEST_MODE" != "gen-build-exps"
+	     && "$GDB_PERFTEST_MODE" != "build-pieces"
+	     && "$GDB_PERFTEST_MODE" != "compile"
 	     && "$GDB_PERFTEST_MODE" != "run"
 	     && "$GDB_PERFTEST_MODE" != "both" } {
-	    # GDB_PERFTEST_MODE=compile|run|both is allowed.
 	    error "Unknown value of GDB_PERFTEST_MODE."
 	    return 1
 	}
@@ -123,3 +131,771 @@  proc skip_perf_tests { } {
 
     return 1
 }
+
+# Given a list of tcl strings, return the same list as the text form of a
+# python list.
+
+proc tcl_string_list_to_python_list { l } {
+    proc quote { text } {
+	return "\"$text\""
+    }
+    set quoted_list ""
+    foreach elm $l {
+	lappend quoted_list [quote $elm]
+    }
+    return "([join $quoted_list {, }])"
+}
+
+# A simple testcase generator.
+#
+# Usage Notes:
+#
+# 1) The length of each parameter list must either be one, in which case the
+# same value is used for each run, or the length must match all other
+# parameters of length greater than one.
+#
+# 2) Values for parameters that vary across runs must appear in increasing
+# order.  E.g. nr_shlibs = { 0 1 10 } is good, { 1 0 10 } is bad.
+# This rule simplifies the code a bit, without being onerous on the user:
+#  a) Report generation doesn't have to sort the output by run, it'll already
+#  be sorted.
+#  b) In the static object file case, the last run can be used to generate
+#  all the source files.
+#
+# TODO:
+# 1) Lots.  E.g., having functions call each other within an objfile and across
+# objfiles to measure things like backtrace times.
+# 2) Lots.  E.g., inline methods.
+#
+# Implementation Notes:
+#
+# 1) The implementation would be a bit simpler if we could assume Tcl 8.5.
+# Then we could use a dictionary to record the testcase instead of an array.
+# With the array we use here, there is only one copy of it and instead of
+# passing its value we pass its name.  Yay Tcl.
+#
+# 2) Array members cannot (apparently) be referenced in the conditional
+# expression of a for loop (-> variable not found error).  That is why they're
+# all extracted before the for loop.
+
+namespace eval GenPerfTest {
+
+    # The default level of compilation parallelism we support.
+    set DEFAULT_PERF_TEST_COMPILE_PARALLELISM 10
+
+    # The language of the test.
+    set DEFAULT_LANGUAGE "c"
+
+    # The number of shared libraries to create.
+    set DEFAULT_NR_SHLIBS 0
+
+    # The number of compunits in each objfile.
+    set DEFAULT_NR_COMPUNITS 1
+
+    # The number of public globals in each compunit.
+    set DEFAULT_NR_EXTERN_GLOBALS 1
+
+    # The number of static globals in each compunit.
+    set DEFAULT_NR_STATIC_GLOBALS 1
+
+    # The number of public functions in each compunit.
+    set DEFAULT_NR_EXTERN_FUNCTIONS 1
+
+    # The number of static functions in each compunit.
+    set DEFAULT_NR_STATIC_FUNCTIONS 1
+
+    # List of pairs of class depth and number of classes at that depth.
+    # By "depth" here we mean nesting within a namespace.
+    # E.g.,
+    # class foo {};
+    # namespace n { class foo {}; class bar {}; }
+    # would be represented as { { 0 1 } { 1 2 } }.
+    # This is only used if the selected language permits it.
+    set DEFAULT_CLASS_SPECS {}
+
+    # Number of members in each class.
+    # This is only used if classes are enabled.
+    set DEFAULT_NR_MEMBERS 0
+
+    # Number of static members in each class.
+    # This is only used if classes are enabled.
+    set DEFAULT_NR_STATIC_MEMBERS 0
+
+    # Number of methods in each class.
+    # This is only used if classes are enabled.
+    set DEFAULT_NR_METHODS 0
+
+    # Number of static methods in each class.
+    # This is only used if classes are enabled.
+    set DEFAULT_NR_STATIC_METHODS 0
+
+    set suffixes(c) "c"
+    set suffixes(cplus) "cc"
+
+    # Helper function to generate .exp build files.
+
+    proc _gen_build_exp_files { program_name nr_workers output_dir code } {
+	verbose -log "_gen_build_exp_files: $nr_workers workers"
+	for { set i 0 } { $i < $nr_workers } { incr i } {
+	    set file_name "$output_dir/${program_name}-${i}.exp"
+	    verbose -log "_gen_build_exp_files: Generating $file_name"
+	    set f [open $file_name "w"]
+	    puts $f "# DO NOT EDIT, machine generated file."
+	    puts $f "# See perftest.exp:GenPerfTest::gen_build_exp_files."
+	    puts $f ""
+	    puts $f "set worker_nr $i"
+	    puts $f ""
+	    puts $f "# The body of the file is supplied by the test."
+	    puts $f ""
+	    puts $f $code
+	    close $f
+	}
+	return 0
+    }
+
+    # Generate .exp files to build all the "pieces" of the testcase.
+    # This doesn't include "main" or any test-specific stuff.
+    # This mostly consists of the "bulk" (aka "crap" :-)) of the testcase to
+    # give gdb something meaty to chew on.
+    # The result is 0 for success, -1 for failure.
+    #
+    # Benchmarks generated by some of the tests are big.  I mean really big.
+    # And it's a pain to build one piece at a time, we need a parallel build.
+    # To achieve this, given the framework we're working with, we generate
+    # several .exp files, and then let testsuite/Makefile.in's support for
+    # parallel runs of the testsuite do its thing.
+
+    proc gen_build_exp_files { test_description_exp make_config_thunk_name } {
+	global objdir PERF_TEST_COMPILE_PARALLELISM
+
+	if { [file tail $test_description_exp] != $test_description_exp } {
+	    error "test description file contains directory name"
+	}
+
+	set program_name [file rootname $test_description_exp]
+
+	set output_dir "$objdir/gdb.perf/pieces/$program_name"
+	file mkdir $output_dir
+
+	# N.B. The generation code below cannot reference anything that exists
+	# here, the code isn't run until later, in another process.  That is
+	# why we split up the assignment to $code.
+	# TODO(dje): Not the cleanest way, but simple enough for now.
+	set code {
+	    # This code is put in each copy of the generated .exp file.
+
+	    load_lib perftest.exp
+
+	    GenPerfTest::load_test_description}
+	append code " $test_description_exp"
+	append code {
+
+	    array set testcase [}
+	append code "$make_config_thunk_name"
+	append code {]
+
+	    if { [GenPerfTest::compile_pieces testcase $worker_nr] < 0 } {
+		return -1
+	    }
+
+	    return 0
+	}
+
+	return [_gen_build_exp_files $program_name $PERF_TEST_COMPILE_PARALLELISM $output_dir $code]
+    }
+
+    # Load a perftest description.
+    # Test descriptions are used to build the input files (binary + shlibs)
+    # of one or more performance tests.
+
+    proc load_test_description { basename } {
+	global srcdir
+
+	if { [file tail $basename] != $basename } {
+	    error "test description file contains directory name"
+	}
+
+	verbose -log "load_file $srcdir/gdb.perf/$basename"
+	if { [load_file $srcdir/gdb.perf/$basename] == 0 } {
+	    error "Unable to load test description $basename"
+	}
+    }
+
+    # Create a testcase object for test NAME.
+    # The caller must call this as:
+    # array set my_test [GenPerfTest::init_testcase $name]
+
+    proc init_testcase { name } {
+	set testcase(name) $name
+	set testcase(language) $GenPerfTest::DEFAULT_LANGUAGE
+	set testcase(run_names) [list $name]
+	set testcase(nr_shlibs) $GenPerfTest::DEFAULT_NR_SHLIBS
+	set testcase(nr_compunits) $GenPerfTest::DEFAULT_NR_COMPUNITS
+
+	set testcase(nr_extern_globals) $GenPerfTest::DEFAULT_NR_EXTERN_GLOBALS
+	set testcase(nr_static_globals) $GenPerfTest::DEFAULT_NR_STATIC_GLOBALS
+	set testcase(nr_extern_functions) $GenPerfTest::DEFAULT_NR_EXTERN_FUNCTIONS
+	set testcase(nr_static_functions) $GenPerfTest::DEFAULT_NR_STATIC_FUNCTIONS
+
+	set testcase(class_specs) $GenPerfTest::DEFAULT_CLASS_SPECS
+	set testcase(nr_members) $GenPerfTest::DEFAULT_NR_MEMBERS
+	set testcase(nr_static_members) $GenPerfTest::DEFAULT_NR_STATIC_MEMBERS
+	set testcase(nr_methods) $GenPerfTest::DEFAULT_NR_METHODS
+	set testcase(nr_static_methods) $GenPerfTest::DEFAULT_NR_STATIC_METHODS
+
+	# The location of this file drives the location of all other files.
+	# The choice is derived from standard_output_file.  We don't use it
+	# because of the parallel build support, we want each worker's log/sum
+	# files to go in different directories, but we don't want their output
+	# to go in different directories.
+	# N.B. The value here must be kept in sync with Makefile.in.
+	global objdir
+	set name_no_spaces [_convert_spaces $name]
+	set testcase(binfile) "$objdir/gdb.perf/outputs/$name_no_spaces/$name_no_spaces"
+
+	return [array get testcase]
+    }
+
+    proc _verify_parameter_lengths { self_var } {
+	upvar 1 $self_var self
+	set params {
+	    nr_shlibs nr_compunits
+	    nr_extern_globals nr_static_globals
+	    nr_extern_functions nr_static_functions
+	    class_specs
+	    nr_members nr_static_members
+	    nr_methods nr_static_methods
+	}
+	set nr_runs [llength $self(run_names)]
+	foreach p $params {
+	    set n [llength $self($p)]
+	    if { $n > 1 } {
+		if { $n != $nr_runs } {
+		    error "Bad number of values for parameter $p"
+		}
+		set values $self($p)
+		for { set i 0 } { $i < $n - 1 } { incr i } {
+		    if { [lindex $values $i] > [lindex $values [expr $i + 1]] } {
+			error "Values of parameter $p are not increasing"
+		    }
+		}
+	    }
+	}
+    }
+
+    # Verify the testcase is valid (as best we can, this isn't exhaustive).
+
+    proc _verify_testcase { self_var } {
+	upvar 1 $self_var self
+	_verify_parameter_lengths self
+    }
+
+    # Return the value of parameter PARAM for run RUN_NR.
+
+    proc _get_param { param run_nr } {
+	if { [llength $param] == 1 } {
+	    # Since PARAM may be a list of lists we need to use lindex.  This
+	    # also works for scalars (scalars are degenerate lists).
+	    return [lindex $param 0]
+	}
+	return [lindex $param $run_nr]
+    }
+
+    # Return non-zero if all files (binaries + shlibs) can be compiled from
+    # one set of object files.  This is a simple optimization to speed up
+    # test build times.  This happens if the only variation among runs is
+    # nr_shlibs or nr_compunits.
+
+    proc _static_object_files_p { self_var } {
+	upvar 1 $self_var self
+	set object_file_params {
+	    nr_extern_globals nr_static_globals
+	    nr_extern_functions nr_static_functions
+	}
+	set static 1
+	foreach p $object_file_params {
+	    if { [llength $self($p)] > 1 } {
+		set static 0
+	    }
+	}
+	return $static
+    }
+
+    # Return non-zero if classes are enabled.
+
+    proc _classes_enabled_p { self_var run_nr } {
+	upvar 1 $self_var self
+	set class_specs [_get_param $self(class_specs) $run_nr]
+	foreach elm $class_specs {
+	    if { [llength $elm] != 2 } {
+		error "Bad class spec: $elm"
+	    }
+	    if { [lindex $elm 1] > 0 } {
+		return 1
+	    }
+	}
+	return 0
+    }
+
+    # Spaces in file names are a pain, so remove them.
+    # They appear if the user puts spaces in the test name or run name.
+
+    proc _convert_spaces { file_name } {
+	return [regsub -all " " $file_name "-"]
+    }
+
+    # Return the path to put source/object files in for run number RUN_NR.
+
+    proc _make_object_dir_name { self_var static run_nr } {
+	upvar 1 $self_var self
+	# Note: The output directory already includes the name of the test
+	# description file.
+	set bindir [file dirname $self(binfile)]
+	# Put the pieces in a subdirectory; there are a lot of them.
+	if $static {
+	    return "$bindir/pieces"
+	} else {
+	    set run_name [_convert_spaces [lindex $self(run_names) $run_nr]]
+	    return "$bindir/pieces/$run_name"
+	}
+    }
+
+    # CU_NR is either the compilation unit number or "main".
+    # RUN_NR is ignored if STATIC is non-zero.
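+    # The generated name has the form <run-or-test-name>-<CU_NR>.<suffix>,
+    # where the suffix depends on the testcase's language.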
+
+    proc _make_binary_source_name { self_var static run_nr cu_nr } {
+	upvar 1 $self_var self
+	set source_suffix $GenPerfTest::suffixes($self(language))
+	if { !$static } {
+	    set run_name [_get_param $self(run_names) $run_nr]
+	    set source_name "${run_name}-${cu_nr}.$source_suffix"
+	} else {
+	    set source_name "$self(name)-${cu_nr}.$source_suffix"
+	}
+	return "[_make_object_dir_name self $static $run_nr]/[_convert_spaces $source_name]"
+    }
+
+    proc _make_binary_main_source_name { self_var static run_nr } {
+	upvar 1 $self_var self
+	return [_make_binary_source_name self $static $run_nr "main"]
+    }
+
+    # Generated object files are put in the same directory as their source.
+
+    proc _make_binary_object_name { self_var static run_nr cu_nr } {
+	upvar 1 $self_var self
+	set source_name [_make_binary_source_name self $static $run_nr $cu_nr]
+	return [file rootname $source_name].o
+    }
+
+    proc _make_shlib_source_name { self_var static run_nr so_nr cu_nr } {
+	upvar 1 $self_var self
+	set source_suffix $GenPerfTest::suffixes($self(language))
+	if { !$static } {
+	    set run_name [_get_param $self(run_names) $run_nr]
+	    set source_name "$self(name)-${run_name}-lib${so_nr}-${cu_nr}.$source_suffix"
+	} else {
+	    set source_name "$self(name)-lib${so_nr}-${cu_nr}.$source_suffix"
+	}
+	return "[_make_object_dir_name self $static $run_nr]/[_convert_spaces $source_name]"
+    }
+
+    # Return the list of source/object files for the binary.
+    # The source file for main() is returned, as well as the names of all the
+    # object file "pieces".
+    # If STATIC is non-zero the source files are unchanged for each run.
+
+    proc _make_binary_input_file_names { self_var static run_nr } {
+	upvar 1 $self_var self
+	set nr_compunits [_get_param $self(nr_compunits) $run_nr]
+	set result [_make_binary_main_source_name self $static $run_nr]
+	for { set cu_nr 0 } { $cu_nr < $nr_compunits } { incr cu_nr } {
+	    lappend result [_make_binary_object_name self $static $run_nr $cu_nr]
+	}
+	return $result
+    }
+
+    proc _make_binary_name { self_var run_nr } {
+	upvar 1 $self_var self
+	set run_name [_get_param $self(run_names) $run_nr]
+	set exe_name "$self(binfile)-[_convert_spaces ${run_name}]"
+	return $exe_name
+    }
+
+    proc _make_shlib_name { self_var static run_nr so_nr } {
+	upvar 1 $self_var self
+	if { !$static } {
+	    set run_name [_get_param $self(run_names) $run_nr]
+	    set lib_name "$self(name)-${run_name}-lib${so_nr}"
+	} else {
+	    set lib_name "$self(name)-lib${so_nr}"
+	}
+	return "[_make_object_dir_name self $static $run_nr]/[_convert_spaces $lib_name]"
+    }
+
+    proc _create_file { self_var path } {
+	upvar 1 $self_var self
+	verbose -log "Creating file: $path"
+	set f [open $path "w"]
+	return $f
+    }
+
+    proc _write_header { self_var f } {
+	upvar 1 $self_var self
+	puts $f "// DO NOT EDIT, machine generated file."
+	puts $f "// See perftest.exp:GenPerfTest."
+    }
+
+    proc _write_static_globals { self_var f run_nr } {
+	upvar 1 $self_var self
+	puts $f ""
+	set nr_static_globals [_get_param $self(nr_static_globals) $run_nr]
+	# Rather than parameterize the number of const/non-const globals,
+	# and their types, we keep it simple for now.  Parameterizing the
+	# mix of bss/non-bss globals may be useful later, if warranted.
+	for { set i 0 } { $i < $nr_static_globals } { incr i } {
+	    if { $i % 2 == 0 } {
+		set const "const "
+	    } else {
+		set const ""
+	    }
+	    puts $f "static ${const}int static_global_$i = $i;"
+	}
+    }
+
+    # ID is "" for the binary, and a unique symbol prefix for each SO.
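+    # E.g. (illustrative) an ID of "shlib0_" with CU_NR 2 produces names
+    # such as shlib0_global_2_0; see the shlib source generator below.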
+
+    proc _write_extern_globals { self_var f run_nr id cu_nr } {
+	upvar 1 $self_var self
+	puts $f ""
+	set nr_extern_globals [_get_param $self(nr_extern_globals) $run_nr]
+	# Rather than parameterize the number of const/non-const globals,
+	# and their types, we keep it simple for now.  Parameterizing the
+	# mix of bss/non-bss globals may be useful later, if warranted.
+	for { set i 0 } { $i < $nr_extern_globals } { incr i } {
+	    if { $i % 2 == 0 } {
+		set const "const "
+	    } else {
+		set const ""
+	    }
+	    puts $f "${const}int ${id}global_${cu_nr}_$i = $cu_nr * 1000 + $i;"
+	}
+    }
+
+    proc _write_static_functions { self_var f run_nr } {
+	upvar 1 $self_var self
+	set nr_static_functions [_get_param $self(nr_static_functions) $run_nr]
+	for { set i 0 } { $i < $nr_static_functions } { incr i } {
+	    puts $f ""
+	    puts $f "static void"
+	    puts $f "static_function_$i (void)"
+	    puts $f "{"
+	    puts $f "}"
+	}
+    }
+
+    # ID is "" for the binary, and a unique symbol prefix for each SO.
+
+    proc _write_extern_functions { self_var f run_nr id cu_nr } {
+	upvar 1 $self_var self
+	set nr_extern_functions [_get_param $self(nr_extern_functions) $run_nr]
+	for { set i 0 } { $i < $nr_extern_functions } { incr i } {
+	    puts $f ""
+	    puts $f "void"
+	    puts $f "${id}function_${cu_nr}_$i (void)"
+	    puts $f "{"
+	    puts $f "}"
+	}
+    }
+
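+    # Write the class definitions for one compunit, as specified by
+    # class_specs.  For illustration, a spec of {1 2} with one member
+    # produces roughly:
+    #   namespace ns_0 {
+    #   class class_<CU>_0 { public: int member_0; ... };
+    #   class class_<CU>_1 { public: int member_0; ... };
+    #   }
+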
+    proc _write_classes { self_var f run_nr cu_nr } {
+	upvar 1 $self_var self
+	set class_specs [_get_param $self(class_specs) $run_nr]
+	set nr_members [_get_param $self(nr_members) $run_nr]
+	set nr_static_members [_get_param $self(nr_static_members) $run_nr]
+	set nr_methods [_get_param $self(nr_methods) $run_nr]
+	set nr_static_methods [_get_param $self(nr_static_methods) $run_nr]
+	foreach spec $class_specs {
+	    set depth [lindex $spec 0]
+	    set nr_classes [lindex $spec 1]
+	    puts $f ""
+	    for { set i 0 } { $i < $depth } { incr i } {
+		puts $f "namespace ns_${i}"
+		puts $f "\{"
+	    }
+	    for { set c 0 } { $c < $nr_classes } { incr c } {
+		set class_name "class_${cu_nr}_${c}"
+		puts $f "class $class_name"
+		puts $f "\{"
+		puts $f " public:"
+		for { set i 0 } { $i < $nr_members } { incr i } {
+		    puts $f "  int member_$i;"
+		}
+		for { set i 0 } { $i < $nr_static_members } { incr i } {
+		    # Rather than parameterize the number of const/non-const
+		    # members, and their types, we keep it simple for now.
+		    if { $i % 2 == 0 } {
+			puts $f "  static const int static_member_$i = $i;"
+		    } else {
+			puts $f "  static int static_member_$i;"
+		    }
+		}
+		for { set i 0 } { $i < $nr_methods } { incr i } {
+		    puts $f "  void method_$i (void);"
+		}
+		for { set i 0 } { $i < $nr_static_methods } { incr i } {
+		    puts $f "  static void static_method_$i (void);"
+		}
+		puts $f "\};"
+		_write_static_members self $f $run_nr $class_name
+		_write_methods self $f $run_nr $class_name
+		_write_static_methods self $f $run_nr $class_name
+	    }
+	    for { set i 0 } { $i < $depth } { incr i } {
+		puts $f "\}"
+	    }
+	}
+    }
+
+    proc _write_static_members { self_var f run_nr class_name } {
+	upvar 1 $self_var self
+	puts $f ""
+	set nr_static_members [_get_param $self(nr_static_members) $run_nr]
+	# Rather than parameterize the number of const/non-const
+	# members, and their types, we keep it simple for now.
+	for { set i 0 } { $i < $nr_static_members } { incr i } {
+	    # Static const members (the even-numbered ones) are initialized
+	    # inline in the class definition, so only emit definitions for
+	    # the non-const members.
+	    if { $i % 2 != 0 } {
+		puts $f "int ${class_name}::static_member_$i = $i;"
+	    }
+	}
+    }
+
+    proc _write_methods { self_var f run_nr class_name } {
+	upvar 1 $self_var self
+	set nr_methods [_get_param $self(nr_methods) $run_nr]
+	for { set i 0 } { $i < $nr_methods } { incr i } {
+	    puts $f ""
+	    puts $f "void"
+	    puts $f "${class_name}::method_$i (void)"
+	    puts $f "{"
+	    puts $f "}"
+	}
+    }
+
+    proc _write_static_methods { self_var f run_nr class_name } {
+	upvar 1 $self_var self
+	set nr_static_methods [_get_param $self(nr_static_methods) $run_nr]
+	for { set i 0 } { $i < $nr_static_methods } { incr i } {
+	    puts $f ""
+	    puts $f "void"
+	    puts $f "${class_name}::static_method_$i (void)"
+	    puts $f "{"
+	    puts $f "}"
+	}
+    }
+
+    proc _gen_binary_compunit_source { self_var static run_nr cu_nr } {
+	upvar 1 $self_var self
+	set source_file [_make_binary_source_name self $static $run_nr $cu_nr]
+	set f [_create_file self $source_file]
+	_write_header self $f
+	_write_static_globals self $f $run_nr
+	_write_extern_globals self $f $run_nr "" $cu_nr
+	_write_static_functions self $f $run_nr
+	_write_extern_functions self $f $run_nr "" $cu_nr
+	if [_classes_enabled_p self $run_nr] {
+	    _write_classes self $f $run_nr $cu_nr
+	}
+	close $f
+	return $source_file
+    }
+
+    proc _gen_shlib_compunit_source { self_var static run_nr so_nr cu_nr } {
+	upvar 1 $self_var self
+	set source_file [_make_shlib_source_name self $static $run_nr $so_nr $cu_nr]
+	set f [_create_file self $source_file]
+	_write_header self $f
+	_write_static_globals self $f $run_nr
+	_write_extern_globals self $f $run_nr "shlib${so_nr}_" $cu_nr
+	_write_static_functions self $f $run_nr
+	_write_extern_functions self $f $run_nr "shlib${so_nr}_" $cu_nr
+	if [_classes_enabled_p self $run_nr] {
+	    _write_classes self $f $run_nr $cu_nr
+	}
+	close $f
+	return $source_file
+    }
+
+    proc _gen_shlib_source { self_var static run_nr so_nr } {
+	upvar 1 $self_var self
+	set result ""
+	set nr_compunits [_get_param $self(nr_compunits) $run_nr]
+	for { set cu_nr 0 } { $cu_nr < $nr_compunits } { incr cu_nr } {
+	    lappend result [_gen_shlib_compunit_source self $static $run_nr $so_nr $cu_nr]
+	}
+	return $result
+    }
+
+    proc _compile_binary_pieces { self_var worker_nr static run_nr } {
+	upvar 1 $self_var self
+	set object_dir [_make_object_dir_name self $static $run_nr]
+	file mkdir $object_dir
+	set compile_flags {debug}
+	set nr_compunits [_get_param $self(nr_compunits) $run_nr]
+	global PERF_TEST_COMPILE_PARALLELISM
+	set nr_workers $PERF_TEST_COMPILE_PARALLELISM
+	for { set cu_nr $worker_nr } { $cu_nr < $nr_compunits } { incr cu_nr $nr_workers } {
+	    set source_file [_gen_binary_compunit_source self $static $run_nr $cu_nr]
+	    set object_file [_make_binary_object_name self $static $run_nr $cu_nr]
+	    if { [gdb_compile $source_file $object_file object $compile_flags] != "" } {
+		return -1
+	    }
+	}
+	return 0
+    }
+
+    # Helper function to compile the pieces of a shlib.
+    # Note: gdb_compile_shlib{,_pthreads} don't support first building object
+    # files and then building the shlib.  Therefore our hands are tied, and we
+    # just build the shlib in one step.  This is less of a parallelization
+    # problem if there are multiple shlibs: Each worker can build a different
+    # shlib.
+
+    proc _compile_shlib { self_var static run_nr so_nr } {
+	upvar 1 $self_var self
+	set source_files [_gen_shlib_source self $static $run_nr $so_nr]
+	set shlib_file [_make_shlib_name self $static $run_nr $so_nr]
+	set compile_flags {debug}
+	if { [gdb_compile_shlib $source_files $shlib_file $compile_flags] != "" } {
+	    return -1
+	}
+	return 0
+    }
+
+    # Compile the pieces of the binary, and any shlibs, for the test.
+    # The result is 0 for success, -1 for failure.
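+    # Work is distributed round-robin over the workers: with N workers,
+    # worker W compiles shlib/compunit numbers W, W+N, W+2N, ...
+    # (e.g. with three workers, worker 0 handles pieces 0, 3, 6, ...).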
+
+    proc _compile_pieces { self_var worker_nr } {
+	upvar 1 $self_var self
+	global PERF_TEST_COMPILE_PARALLELISM
+	set nr_workers $PERF_TEST_COMPILE_PARALLELISM
+	set nr_runs [llength $self(run_names)]
+	set static [_static_object_files_p self]
+	file mkdir "[file dirname $self(binfile)]/pieces"
+	if $static {
+	    # All the pieces look the same (run over run) so just build all the
+	    # shlibs of the last run (which is the largest).
+	    set last_run [expr $nr_runs - 1]
+	    set nr_shlibs [_get_param $self(nr_shlibs) $last_run]
+	    for { set so_nr $worker_nr } { $so_nr < $nr_shlibs } { incr so_nr $nr_workers } {
+		if { [_compile_shlib self $static $last_run $so_nr] < 0 } {
+		    return -1
+		}
+	    }
+	    if { [_compile_binary_pieces self $worker_nr $static $last_run] < 0 } {
+		return -1
+	    }
+	} else {
+	    for { set run_nr 0 } { $run_nr < $nr_runs } { incr run_nr } {
+		set nr_shlibs [_get_param $self(nr_shlibs) $run_nr]
+		for { set so_nr $worker_nr } { $so_nr < $nr_shlibs } { incr so_nr $nr_workers } {
+		    if { [_compile_shlib self $static $run_nr $so_nr] < 0 } {
+			return -1
+		    }
+		}
+		if { [_compile_binary_pieces self $worker_nr $static $run_nr] < 0 } {
+		    return -1
+		}
+	    }
+	}
+	return 0
+    }
+
+    proc compile_pieces { self_var worker_nr } {
+	upvar 1 $self_var self
+	verbose -log "GenPerfTest::compile_pieces, started worker $worker_nr [timestamp -format %c]"
+	verbose -log "self: [array get self]"
+	_verify_testcase self
+	if { [_compile_pieces self $worker_nr] < 0 } {
+	    verbose -log "GenPerfTest::compile_pieces, worker $worker_nr failed [timestamp -format %c]"
+	    return -1
+	}
+	verbose -log "GenPerfTest::compile_pieces, worker $worker_nr done [timestamp -format %c]"
+	return 0
+    }
+
+    proc _generate_main_source { self_var static run_nr } {
+	upvar 1 $self_var self
+	set main_source_file [_make_binary_main_source_name self $static $run_nr]
+	set f [_create_file self $main_source_file]
+	_write_header self $f
+	puts $f ""
+	puts $f "int"
+	puts $f "main (void)"
+	puts $f "{"
+	puts $f "  return 0;"
+	puts $f "}"
+	close $f
+    }
+
+    proc _make_shlib_flags { self_var static run_nr } {
+	upvar 1 $self_var self
+	set nr_shlibs [_get_param $self(nr_shlibs) $run_nr]
+	set result ""
+	for { set i 0 } { $i < $nr_shlibs } { incr i } {
+	    lappend result "shlib=[_make_shlib_name self $static $run_nr $i]"
+	}
+	return $result
+    }
+
+    proc _compile_binary { self_var static run_nr } {
+	upvar 1 $self_var self
+	set input_files [_make_binary_input_file_names self $static $run_nr]
+	set binary_file [_make_binary_name self $run_nr]
+	set compile_flags "debug [_make_shlib_flags self $static $run_nr]"
+	if { [gdb_compile $input_files $binary_file executable $compile_flags] != "" } {
+	    return -1
+	}
+	return 0
+    }
+
+    # Compile the binary for the test.
+    # This assumes the pieces of the binary (all the .o's, except for main())
+    # have already been built with compile_pieces.
+    # There's no need to compile any shlibs here, as compile_pieces will have
+    # already built them too.
+    # The result is 0 for success, -1 for failure.
+
+    proc _compile { self_var } {
+	upvar 1 $self_var self
+	set nr_runs [llength $self(run_names)]
+	set static [_static_object_files_p self]
+	for { set run_nr 0 } { $run_nr < $nr_runs } { incr run_nr } {
+	    _generate_main_source self $static $run_nr
+	    if { [_compile_binary self $static $run_nr] < 0 } {
+		return -1
+	    }
+	}
+	return 0
+    }
+
+    proc compile { self_var } {
+	upvar 1 $self_var self
+	verbose -log "GenPerfTest::compile, started [timestamp -format %c]"
+	verbose -log "self: [array get self]"
+	_verify_testcase self
+	if { [_compile self] < 0 } {
+	    verbose -log "GenPerfTest::compile, failed [timestamp -format %c]"
+	    return -1
+	}
+	verbose -log "GenPerfTest::compile, done [timestamp -format %c]"
+	return 0
+    }
+}
+
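+# Number of workers used to compile the test "pieces" in parallel.
+# The user may set this (e.g. in site.exp) before this file is loaded;
+# otherwise the GenPerfTest default is used.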
+if ![info exists PERF_TEST_COMPILE_PARALLELISM] {
+    set PERF_TEST_COMPILE_PARALLELISM $GenPerfTest::DEFAULT_PERF_TEST_COMPILE_PARALLELISM
+}