[v3,01/17] Fix and test "checkpoint" in non-stop mode

Message ID 5537DEDD.5000103@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves April 22, 2015, 5:48 p.m. UTC
  Hi Doug,

Thanks for the review.

On 04/21/2015 03:35 AM, Doug Evans wrote:
> Pedro Alves writes:
>  > Letting a "checkpoint" run to exit with "set non-stop on" behaves
>  > differently compared to the default all-stop mode ("set non-stop
>  > off").
>  > 
>  > Currently, in non-stop mode:
>  > 
>  >   (gdb) start
>  >   Temporary breakpoint 1 at 0x40086b: file src/gdb/testsuite/gdb.base/checkpoint.c, line 28.
>  >   Starting program: build/gdb/testsuite/gdb.base/checkpoint
>  > 
>  >   Temporary breakpoint 1, main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
>  >   28        char *tmp = &linebuf[0];
>  >   (gdb) checkpoint
>  >   checkpoint 1: fork returned pid 24948.
>  >   (gdb) c
>  >   Continuing.
>  >   Copy complete.
>  >   Deleting copy.
>  >   [Inferior 1 (process 24944) exited normally]
>  >   [Switching to process 24948]
>  >   (gdb) info threads
>  >     Id   Target Id         Frame
>  >     1    process 24948 "checkpoint" (running)
>  > 
>  >   No selected thread.  See `help thread'.
>  >   (gdb) c
>  >   The program is not being run.
>  >   (gdb)
>  > 
>  > Two issues above:
>  > 
>  >  1. Thread 1 got stuck in "(running)" state (it isn't really running)
>  > 
>  >  2. While checkpoints try to preserve the illusion that the thread is
>  >     still the same when the process exits, GDB switched to "No thread
>  >     selected." instead of staying with thread 1 selected.
>  > 
>  > Problem #1 is caused by handle_inferior_event and normal_stop not
>  > considering that when a
>  > TARGET_WAITKIND_SIGNALLED/TARGET_WAITKIND_EXITED event is reported,
>  > and the inferior is mourned, the target may still have execution.
> 
> Hi.
> 
> What does "target may still have execution" mean here?

That "target_has_execution" returns true.  (But I
extended/changed the comment, see below.)

> Is it that there can be another inferior running?

It's the same inferior, but a different fork/process.

The checkpoints that the "checkpoint" code makes are really
forked processes.  This is done by linux-fork.c.  Each fork
naturally is a different process, but that code tries to hide
that from the user and from core gdb.  When the user switches
to a different checkpoint, the inferior is still the same,
though behind the scenes, the process/thread associated with
the inferior changed:

(top-gdb) checkpoint 
checkpoint 1: fork returned pid 19228.
(top-gdb) info checkpoints 
  1 process 19228 at 0x4677bf, file /home/pedro/gdb/mygit/src/gdb/gdb.c, line 28
* 0 Thread 0x7ffff7c2c7c0 (LWP 19224) (main process) at 0x4677bf, file /home/pedro/gdb/mygit/src/gdb/gdb.c, line 28
(top-gdb) info inferiors 
  Num  Description       Executable        
* 1    process 19224     /home/pedro/gdb/mygit/build/gdb/gdb 
(top-gdb) restart 1
Switching to Thread 0x7ffff7c2c7c0 (LWP 19228)
#0  main (argc=1, argv=0x7fffffffd8f8) at /home/pedro/gdb/mygit/src/gdb/gdb.c:28
28        memset (&args, 0, sizeof args);
(top-gdb) info inferiors 
  Num  Description       Executable        
* 1    process 19228     /home/pedro/gdb/mygit/build/gdb/gdb 
(top-gdb)

So if you let one of the checkpoints run to exit, gdb switches
to another checkpoint:

(gdb) info checkpoints 
  1 process 19265 at 0x4004cf, file main.c, line 5
* 0 process 19264 (main process) at 0x4004cf, file main.c, line 5
(gdb) c
Continuing.
[Inferior 1 (process 19264) exited normally]
[Switching to process 19265]
(gdb) 
(gdb) info threads 
  Id   Target Id         Frame 
* 1    process 19265 "main" main (argc=1, argv=0x7fffffffd908) at main.c:5

again, it's still the same gdb thread/inferior object.

That is done at target_mourn_inferior time, in
linux-fork.c:linux_fork_mourn_inferior:

/* The current inferior_ptid has exited, but there are other viable
   forks to debug.  Delete the exiting one and context-switch to the
   first available.  */

void
linux_fork_mourn_inferior (void)
{

which ends up calling:

void
linux_nat_switch_fork (ptid_t new_ptid)
{
  struct lwp_info *lp;

  purge_lwp_list (ptid_get_pid (inferior_ptid));

  lp = add_lwp (new_ptid);
  lp->stopped = 1;

  /* This changes the thread's ptid while preserving the gdb thread
     num.  Also changes the inferior pid, while preserving the
     inferior num.  */
  thread_change_ptid (inferior_ptid, new_ptid);

  /* We've just told GDB core that the thread changed target id, but,
     in fact, it really is a different thread, with different register
     contents.  */
  registers_changed ();
}


Note this is whole checkpoints feature is old code, that predates
the multi-inferior support for a long while.  It has many
limitations (e.g., relies on fork, so can't use it on multi-threaded
programs), but I/we've kept it limping along, at least in
the single-inferior case.

> 
>  > Problem #2 is caused by the make_cleanup_restore_current_thread
>  > cleanup installed by fetch_inferior_event not being able to find the
>  > original thread 1's ptid in the thread list, thus not being able to
>  > restore thread 1 as selected thread.  The fix is to make the cleanup
>  > installed by make_cleanup_restore_current_thread aware of thread ptid
>  > changes, by installing a thread_ptid_changed observer that adjusts the
>  > cleanup's data.
> 
> I'm guessing this is less hacky than it sounds :-),

:-)  I'd say this is just using the thread_changed_ptid observer
hook as intended.

> but it does make me want to understand why this only comes up
> now since threads/inferiors have always been able to "go away"
> while gdb is doing something.

See above.  It's just that non-stop mode was never adjusted
to provide the same illusion that all-stop always provided.
I first noticed it by forcing the target always in non-stop
mode, but the problem exists in current master, with "set non-stop on".

This issue wasn't visible in all-stop, because:

 - in all-stop we always call set_executing(..., 0) / set_running (..., 0)
   with minus_one_ptid (all threads), even on process exits.

 - fetch_inferior_event does not try to preserve the selected
   thread in all-stop.  (although as discussed a while ago, it should,
   if the command was a background command, but that's another story).

> 
>  > After the patch, we get the same in all-stop and non-stop modes:
>  > 
>  >   (gdb) c
>  >   Continuing.
>  >   Copy complete.
>  >   Deleting copy.
>  >   [Inferior 1 (process 25109) exited normally]
>  >   [Switching to process 25113]
>  >   (gdb) info threads
>  >     Id   Target Id         Frame
>  >   * 1    process 25113 "checkpoint" main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
>  >   (gdb)
>  > 
>  > Turns out the whole checkpoints.exp file can run in non-stop mode
>  > unmodified.  I thought of moving most of the test file's contents to a
>  > procedure that can be called twice, once in non-stop mode and another
>  > in all-stop mode.  But then, the test already takes over 30 seconds to
>  > run on my machine, so I thought it'd be nicer to run all-stop and
>  > non-stop mode in parallel.  Thus I added a new checkpoint-ns.exp file
>  > that just sources checkpoint.exp, and sets a knob that checkpoint.exp
>  > reads to know it should test non-stop mode.  No other test in the tree
>  > currently uses this mechanism, but I can't see a reason we shouldn't
>  > do this.
> 
> Re: checkpoint-ns.exp:
> If we're going to have one "make check" involve both all-stop
> and non-stop testing (as opposed to doing one "make check" with all-stop
> and another one with non-stop), I'd rather see a more table-driven approach
> (the details don't matter to me much other than I'd rather have one file
> than a proliferation of foo-ns.exp files).
> Parallelizing that may involve an extra step, but that's ok.

Yes, it sounds like something we may want to get at at some point,
but I can't invest time right now on designing something
more elaborate.  I suggest letting this first example go in,
and see where it takes us, what patterns start emerging.
For example, I realized that we don't really need the extra
variable in checkpoint/checkpoint-ns.exp; we can use GDBFLAGS
instead, though I'm not sure that will/would be sufficient for
other uses.

If I understand your "extra step" correctly, you're suggesting a generated
file.  So there's still an extra file anyway?  I just think of the separate
file as moving the test's body to a helper "library" instead of a
procedure, with the bonus that we can call either variant from the command line,
instead of being forced to run both variants in sequence, as in
the foreach+procedure way that we've been using in many files.

Whether separate file or foreach, it seems to be that it's the same thing
wrt to table-driven vs procedural, so I can't see how an extra file makes
things any worse.  I will personally keep tending to pick the "foreach
variant" route as first choice, as done in numerous tests now.  It's just
that checkpoint.exp is longish in both line number and time it takes to run.

>  > --- a/gdb/infrun.c
>  > +++ b/gdb/infrun.c
>  > @@ -3804,8 +3804,18 @@ handle_inferior_event (struct execution_control_state *ecs)
>  >       any other process were left running.  */
>  >    if (!non_stop)
>  >      set_executing (minus_one_ptid, 0);
>  > -  else if (ecs->ws.kind != TARGET_WAITKIND_SIGNALLED
>  > -	   && ecs->ws.kind != TARGET_WAITKIND_EXITED)
>  > +  else if (ecs->ws.kind == TARGET_WAITKIND_SIGNALLED
>  > +	   && ecs->ws.kind == TARGET_WAITKIND_EXITED)
> 
> pasto: s/&&/||/
> 

Whoops, fixed.  Checkpoints being single-threaded, the else
branch ended up working...

>  > +    {
>  > +      ptid_t pid_ptid;
>  > +
>  > +      /* Some targets still have execution when a process exits.
>  > +	 E.g., for "checkpoint", when when a fork exits and is
> 
> s/when when/when/
> 
>  > +	 mourned, linux-fork.c switches to another fork.  */
>  > +      pid_ptid = pid_to_ptid (ptid_get_pid (ecs->ptid));
>  > +      set_executing (pid_ptid, 0);
> 
> This is a bit confusing. I'm guessing ecs->ptid is for the inferior
> that just exited/signalled, but we're not changing pids here.
> I'm guessing we'll mourn the inferior later, but IWBN to elaborate
> on the comment a little, e.g., to say we always need to call set_executing
> here, even if the inferior exited/signalled (which the code didn't
> previously do). But it's not clear why "switches to another fork" comes
> into play here.

I've changed it this way:
  

Comments

Doug Evans April 28, 2015, 4:06 p.m. UTC | #1
Pedro Alves writes:
 > Note this is whole checkpoints feature is old code, that predates
 > the multi-inferior support for a long while.  It has many
 > limitations (e.g., relies on fork, so can't use it on multi-threaded
 > programs), but I/we've kept it limping along, at least in
 > the single-inferior case.

Hmmm.  Got it.

 > > Re: checkpoint-ns.exp:
 > > If we're going to have one "make check" involve both all-stop
 > > and non-stop testing (as opposed to doing one "make check" with all-stop
 > > and another one with non-stop), I'd rather see a more table-driven approach
 > > (the details don't matter to me much other than I'd rather have one file
 > > than a proliferation of foo-ns.exp files).
 > > Parallelizing that may involve an extra step, but that's ok.
 > 
 > Yes, it sounds like something we may want to get at at some point,
 > but I can't invest time right now on designing something
 > more elaborate.  I suggest letting this first example go in,
 > and see where it takes us, what patterns start emerging.
 > For example, I realized that we don't really need the extra
 > variable in checkpoint/checkpoint-ns.exp; we can use GDBFLAGS
 > instead, though I'm not sure that will/would be sufficient for
 > other uses.
 > 
 > If I understand your "extra step" correctly, you're suggesting a generated
 > file.  So there's still an extra file anyway?  I just think of the separate
 > file as moving the test's body to a helper "library" instead of a
 > procedure, with the bonus that we can call either variant from the command line,
 > instead of being forced to run both variants in sequence, as in
 > the foreach+procedure way that we've been using in many files.
 > 
 > Whether separate file or foreach, it seems to be that it's the same thing
 > wrt to table-driven vs procedural, so I can't see how an extra file makes
 > things any worse.  I will personally keep tending to pick the "foreach
 > variant" route as first choice, as done in numerous tests now.  It's just
 > that checkpoint.exp is longish in both line number and time it takes to run.

Ah.
The axis on which I'm worried is whether this will turn out to be
1 extra file or 1000. One extra file per test doesn't scale.
And trying to reign in 100 files, or whatever the threshold is
before it's clear that we need to change course, will be more
painful than having a table-driven approach that can be simple
now and still grow as needed. (*1)

Another way to go would be really just have 1 extra file (or 1 extra
file per gdb.foo subdir or some such), and this file
documents all the tests intended to be run in different modes
within one "make check". One could even extend
this to be more than just all-stop/non-stop.

Another way, though, is to have two "make check" runs:
One with all-stop and one with non-stop.
Tests that can't handle one or the other should be marked
as such anyway. How a test chooses to handle the choice
is an internal implementation decision regardless of which
route is chosen. Yeah, running some tests twice this way
won't enhance coverage (e.g., no point in running help.exp in
all-stop and non-stop), but we should be almost within epsilon
of being able to do this today (I know I've done it in the past),
and complete test runs only take a few minutes on any reasonably
beefy system (at least they do on mine).

(*1): A hybrid table driven approach could be to (effectively)
run the test twice using whatever mechanism that running
"make check" twice would use. This way only the specified
set of tests would be run twice (instead of running the whole testsuite
twice), and one can still manually run individual tests in whatever
mode (all-stop/non-stop/whatever) one chooses.

My counter-proposal would be to see the extent to which we can
just run the testsuite in all-stop and non-stop, and do that for now
while we work towards something that scales better.
Someone could get a basic table-driven scheme going in a day.

 > >  > +    {
 > >  > +      ptid_t pid_ptid;
 > >  > +
 > >  > +      /* Some targets still have execution when a process exits.
 > >  > +	 E.g., for "checkpoint", when when a fork exits and is
 > > 
 > > s/when when/when/
 > > 
 > >  > +	 mourned, linux-fork.c switches to another fork.  */
 > >  > +      pid_ptid = pid_to_ptid (ptid_get_pid (ecs->ptid));
 > >  > +      set_executing (pid_ptid, 0);
 > > 
 > > This is a bit confusing. I'm guessing ecs->ptid is for the inferior
 > > that just exited/signalled, but we're not changing pids here.
 > > I'm guessing we'll mourn the inferior later, but IWBN to elaborate
 > > on the comment a little, e.g., to say we always need to call set_executing
 > > here, even if the inferior exited/signalled (which the code didn't
 > > previously do). But it's not clear why "switches to another fork" comes
 > > into play here.
 > 
 > I've changed it this way:
 > 
 > diff --git c/gdb/infrun.c w/gdb/infrun.c
 > index 9c1fdc5..caf1ff5 100644
 > --- c/gdb/infrun.c
 > +++ w/gdb/infrun.c
 > @@ -3798,20 +3798,27 @@ handle_inferior_event (struct execution_control_state *ecs)
 >  
 >    /* Mark the non-executing threads accordingly.  In all-stop, all
 >       threads of all processes are stopped when we get any event
 > -     reported.  In non-stop mode, only the event thread stops.  If
 > -     we're handling a process exit in non-stop mode, there's nothing
 > -     to do, as threads of the dead process are gone, and threads of
 > -     any other process were left running.  */
 > +     reported.  In non-stop mode, only the event thread stops.  */
 >    if (!non_stop)
 >      set_executing (minus_one_ptid, 0);
 >    else if (ecs->ws.kind == TARGET_WAITKIND_SIGNALLED
 > -	   && ecs->ws.kind == TARGET_WAITKIND_EXITED)
 > +	   || ecs->ws.kind == TARGET_WAITKIND_EXITED)
 >      {
 >        ptid_t pid_ptid;
 >  
 > -      /* Some targets still have execution when a process exits.
 > -	 E.g., for "checkpoint", when when a fork exits and is
 > -	 mourned, linux-fork.c switches to another fork.  */
 > +      /* If we're handling a process exit in non-stop mode, even
 > +	 though threads haven't been deleted yet, one would think that
 > +	 there is nothing to do, as threads of the dead process will
 > +	 be soon deleted, and threads of any other process were left
 > +	 running.  However, on some targets, threads survive a process
 > +	 exit event.  E.g., for the "checkpoint" command, when the
 > +	 current checkpoint/fork exits, linux-fork.c automatically
 > +	 switches to another fork from within target_mourn_inferior,
 > +	 by associating the same inferior/thread to another fork.  We
 > +	 haven't mourned yet at this point, but we must mark any
 > +	 threads left in the process as not-executing so that
 > +	 finish_thread_state marks them stopped (in the user's
 > +	 perspective) if/when we present the stop to the user.  */
 >        pid_ptid = pid_to_ptid (ptid_get_pid (ecs->ptid));
 >        set_executing (pid_ptid, 0);
 >      }

Works for me.
Thanks!

 > >From 5932f834e3b4feb2f0771d7bb344ad8f07ed98e1 Mon Sep 17 00:00:00 2001
 > From: Pedro Alves <palves@redhat.com>
 > Date: Wed, 22 Apr 2015 18:25:29 +0100
 > Subject: [PATCH] Fix and test "checkpoint" in non-stop mode
 > 
 > Letting a "checkpoint" run to exit with "set non-stop on" behaves
 > differently compared to the default all-stop mode ("set non-stop
 > off").
 > 
 > Currently, in non-stop mode:
 > 
 >   (gdb) start
 >   Temporary breakpoint 1 at 0x40086b: file src/gdb/testsuite/gdb.base/checkpoint.c, line 28.
 >   Starting program: build/gdb/testsuite/gdb.base/checkpoint
 > 
 >   Temporary breakpoint 1, main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
 >   28        char *tmp = &linebuf[0];
 >   (gdb) checkpoint
 >   checkpoint 1: fork returned pid 24948.
 >   (gdb) c
 >   Continuing.
 >   Copy complete.
 >   Deleting copy.
 >   [Inferior 1 (process 24944) exited normally]
 >   [Switching to process 24948]
 >   (gdb) info threads
 >     Id   Target Id         Frame
 >     1    process 24948 "checkpoint" (running)
 > 
 >   No selected thread.  See `help thread'.
 >   (gdb) c
 >   The program is not being run.
 >   (gdb)
 > 
 > Two issues above:
 > 
 >  1. Thread 1 got stuck in "(running)" state (it isn't really running)
 > 
 >  2. While checkpoints try to preserve the illusion that the thread is
 >     still the same when the process exits, GDB switched to "No thread
 >     selected." instead of staying with thread 1 selected.
 > 
 > Problem #1 is caused by handle_inferior_event and normal_stop not
 > considering that when a
 > TARGET_WAITKIND_SIGNALLED/TARGET_WAITKIND_EXITED event is reported,
 > and the inferior is mourned, the target may still have execution.
 > 
 > Problem #2 is caused by the make_cleanup_restore_current_thread
 > cleanup installed by fetch_inferior_event not being able to find the
 > original thread 1's ptid in the thread list, thus not being able to
 > restore thread 1 as selected thread.  The fix is to make the cleanup
 > installed by make_cleanup_restore_current_thread aware of thread ptid
 > changes, by installing a thread_ptid_changed observer that adjusts the
 > cleanup's data.
 > 
 > After the patch, we get the same in all-stop and non-stop modes:
 > 
 >   (gdb) c
 >   Continuing.
 >   Copy complete.
 >   Deleting copy.
 >   [Inferior 1 (process 25109) exited normally]
 >   [Switching to process 25113]
 >   (gdb) info threads
 >     Id   Target Id         Frame
 >   * 1    process 25113 "checkpoint" main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
 >   (gdb)
 > 
 > Turns out the whole checkpoints.exp file can run in non-stop mode
 > unmodified.  I thought of moving most of the test file's contents to a
 > procedure that can be called twice, once in non-stop mode and another
 > in all-stop mode.  But then, the test already takes over 30 seconds to
 > run on my machine, so I thought it'd be nicer to run all-stop and
 > non-stop mode in parallel.  Thus I added a new checkpoint-ns.exp file
 > that just appends "set non-stop on" to GDBFLAGS and sources
 > checkpoint.exp.
 > 
 > gdb/ChangeLog:
 > 2015-04-22  Pedro Alves  <palves@redhat.com>
 > 
 > 	* infrun.c (handle_inferior_event): If we get
 > 	TARGET_WAITKIND_SIGNALLED or TARGET_WAITKIND_EXITED in non-stop
 > 	mode, mark all threads of the exiting process as not-executing.
 > 	(normal_stop): If we get TARGET_WAITKIND_SIGNALLED or
 > 	TARGET_WAITKIND_EXITED in non-stop mode, finish all threads of the
 > 	exiting process, if inferior_ptid still points at a process.
 > 	* thread.c (struct current_thread_cleanup) <next>: New field.
 > 	(current_thread_cleanup_chain): New global.
 > 	(restore_current_thread_ptid_changed): New function.
 > 	(restore_current_thread_cleanup_dtor): Remove the cleanup from the
 > 	current_thread_cleanup_chain list.
 > 	(make_cleanup_restore_current_thread): Add the cleanup data to the
 > 	current_thread_cleanup_chain list.
 > 	(_initialize_thread): Install restore_current_thread_ptid_changed
 > 	as thread_ptid_changed observer.
 > 
 > gdb/testsuite/ChangeLog:
 > 2015-04-22  Pedro Alves  <palves@redhat.com>
 > 
 > 	* gdb.base/checkpoint-ns.exp: New file.
 > 	* gdb.base/checkpoint.exp: Pass explicit "checkpoint.c" to
 > 	standard_testfile.
 > 
 > v3.1:
 > 
 > 	Extend comments.
 > 	In test, use GDBFLAGS instead of new variable.
 > 
 > v3:
 > 	No changes.
 > ---
 >  gdb/infrun.c                             | 48 ++++++++++++++++++++++++++------
 >  gdb/testsuite/gdb.base/checkpoint-ns.exp | 26 +++++++++++++++++
 >  gdb/testsuite/gdb.base/checkpoint.exp    |  6 ++--
 >  gdb/thread.c                             | 38 +++++++++++++++++++++++++
 >  4 files changed, 107 insertions(+), 11 deletions(-)
 >  create mode 100644 gdb/testsuite/gdb.base/checkpoint-ns.exp
 > 
 > diff --git a/gdb/infrun.c b/gdb/infrun.c
 > index 7870f70..caf1ff5 100644
 > --- a/gdb/infrun.c
 > +++ b/gdb/infrun.c
 > @@ -3798,14 +3798,31 @@ handle_inferior_event (struct execution_control_state *ecs)
 >  
 >    /* Mark the non-executing threads accordingly.  In all-stop, all
 >       threads of all processes are stopped when we get any event
 > -     reported.  In non-stop mode, only the event thread stops.  If
 > -     we're handling a process exit in non-stop mode, there's nothing
 > -     to do, as threads of the dead process are gone, and threads of
 > -     any other process were left running.  */
 > +     reported.  In non-stop mode, only the event thread stops.  */
 >    if (!non_stop)
 >      set_executing (minus_one_ptid, 0);
 > -  else if (ecs->ws.kind != TARGET_WAITKIND_SIGNALLED
 > -	   && ecs->ws.kind != TARGET_WAITKIND_EXITED)
 > +  else if (ecs->ws.kind == TARGET_WAITKIND_SIGNALLED
 > +	   || ecs->ws.kind == TARGET_WAITKIND_EXITED)
 > +    {
 > +      ptid_t pid_ptid;
 > +
 > +      /* If we're handling a process exit in non-stop mode, even
 > +	 though threads haven't been deleted yet, one would think that
 > +	 there is nothing to do, as threads of the dead process will
 > +	 be soon deleted, and threads of any other process were left
 > +	 running.  However, on some targets, threads survive a process
 > +	 exit event.  E.g., for the "checkpoint" command, when the
 > +	 current checkpoint/fork exits, linux-fork.c automatically
 > +	 switches to another fork from within target_mourn_inferior,
 > +	 by associating the same inferior/thread to another fork.  We
 > +	 haven't mourned yet at this point, but we must mark any
 > +	 threads left in the process as not-executing so that
 > +	 finish_thread_state marks them stopped (in the user's
 > +	 perspective) if/when we present the stop to the user.  */
 > +      pid_ptid = pid_to_ptid (ptid_get_pid (ecs->ptid));
 > +      set_executing (pid_ptid, 0);
 > +    }
 > +  else
 >      set_executing (ecs->ptid, 0);
 >  
 >    switch (ecs->ws.kind)
 > @@ -6550,6 +6567,7 @@ normal_stop (void)
 >    struct target_waitstatus last;
 >    ptid_t last_ptid;
 >    struct cleanup *old_chain = make_cleanup (null_cleanup, NULL);
 > +  ptid_t pid_ptid;
 >  
 >    get_last_target_status (&last_ptid, &last);
 >  
 > @@ -6559,9 +6577,21 @@ normal_stop (void)
 >       here, so do this before any filtered output.  */
 >    if (!non_stop)
 >      make_cleanup (finish_thread_state_cleanup, &minus_one_ptid);
 > -  else if (last.kind != TARGET_WAITKIND_SIGNALLED
 > -	   && last.kind != TARGET_WAITKIND_EXITED
 > -	   && last.kind != TARGET_WAITKIND_NO_RESUMED)
 > +  else if (last.kind == TARGET_WAITKIND_SIGNALLED
 > +	   || last.kind == TARGET_WAITKIND_EXITED)
 > +    {
 > +      /* On some targets, we may still have live threads in the
 > +	 inferior when we get a process exit event.  E.g., for
 > +	 "checkpoint", when the current checkpoint/fork exits,
 > +	 linux-fork.c automatically switches to another fork from
 > +	 within target_mourn_inferior.  */
 > +      if (!ptid_equal (inferior_ptid, null_ptid))
 > +	{
 > +	  pid_ptid = pid_to_ptid (ptid_get_pid (inferior_ptid));
 > +	  make_cleanup (finish_thread_state_cleanup, &pid_ptid);
 > +	}
 > +    }
 > +  else if (last.kind != TARGET_WAITKIND_NO_RESUMED)
 >      make_cleanup (finish_thread_state_cleanup, &inferior_ptid);
 >  
 >    /* As we're presenting a stop, and potentially removing breakpoints,
 > diff --git a/gdb/testsuite/gdb.base/checkpoint-ns.exp b/gdb/testsuite/gdb.base/checkpoint-ns.exp
 > new file mode 100644
 > index 0000000..d3698ba
 > --- /dev/null
 > +++ b/gdb/testsuite/gdb.base/checkpoint-ns.exp
 > @@ -0,0 +1,26 @@
 > +# Copyright 2015 Free Software Foundation, Inc.
 > +
 > +# This program is free software; you can redistribute it and/or modify
 > +# it under the terms of the GNU General Public License as published by
 > +# the Free Software Foundation; either version 3 of the License, or
 > +# (at your option) any later version.
 > +#
 > +# This program is distributed in the hope that it will be useful,
 > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 > +# GNU General Public License for more details.
 > +#
 > +# You should have received a copy of the GNU General Public License
 > +# along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
 > +
 > +# Test gdb checkpoint and restart in non-stop mode.
 > +
 > +# We drive non-stop mode from a separate file because the whole test
 > +# takes a while to run.  This way, we can test both modes in parallel.
 > +
 > +set saved_gdbflags $GDBFLAGS
 > +append GDBFLAGS " -ex \"set non-stop on\""
 > +
 > +source $srcdir/$subdir/checkpoint.exp
 > +
 > +set GDBFLAGS $saved_gdbflags
 > diff --git a/gdb/testsuite/gdb.base/checkpoint.exp b/gdb/testsuite/gdb.base/checkpoint.exp
 > index 6d94ab6..4a476be 100644
 > --- a/gdb/testsuite/gdb.base/checkpoint.exp
 > +++ b/gdb/testsuite/gdb.base/checkpoint.exp
 > @@ -24,8 +24,10 @@ if {![istarget "*-*-linux*"]} then {
 >      continue
 >  }
 >  
 > -
 > -standard_testfile .c
 > +# Must name the source file explicitly, otherwise when driven by
 > +# checkpoints-ns.exp, we'd try compiling checkpoints-ns.c, which
 > +# doesn't exist.
 > +standard_testfile checkpoint.c
 >  
 >  set pi_txt [gdb_remote_download host ${srcdir}/${subdir}/pi.txt]
 >  if {[is_remote host]} {
 > diff --git a/gdb/thread.c b/gdb/thread.c
 > index 23dfcc9..46b5947 100644
 > --- a/gdb/thread.c
 > +++ b/gdb/thread.c
 > @@ -1279,8 +1279,16 @@ restore_selected_frame (struct frame_id a_frame_id, int frame_level)
 >      }
 >  }
 >  
 > +/* Data used by the cleanup installed by
 > +   'make_cleanup_restore_current_thread'.  */
 > +
 >  struct current_thread_cleanup
 >  {
 > +  /* Next in list of currently installed 'struct
 > +     current_thread_cleanup' cleanups.  See
 > +     'current_thread_cleanup_chain' below.  */
 > +  struct current_thread_cleanup *next;
 > +
 >    ptid_t inferior_ptid;
 >    struct frame_id selected_frame_id;
 >    int selected_frame_level;
 > @@ -1289,6 +1297,29 @@ struct current_thread_cleanup
 >    int was_removable;
 >  };
 >  
 > +/* A chain of currently installed 'struct current_thread_cleanup'
 > +   cleanups.  Restoring the previously selected thread looks up the
 > +   old thread in the thread list by ptid.  If the thread changes ptid,
 > +   we need to update the cleanup's thread structure so the look up
 > +   succeeds.  */
 > +static struct current_thread_cleanup *current_thread_cleanup_chain;
 > +
 > +/* A thread_ptid_changed observer.  Update all currently installed
 > +   current_thread_cleanup cleanups that want to switch back to
 > +   OLD_PTID to switch back to NEW_PTID instead.  */
 > +
 > +static void
 > +restore_current_thread_ptid_changed (ptid_t old_ptid, ptid_t new_ptid)
 > +{
 > +  struct current_thread_cleanup *it;
 > +
 > +  for (it = current_thread_cleanup_chain; it != NULL; it = it->next)
 > +    {
 > +      if (ptid_equal (it->inferior_ptid, old_ptid))
 > +	it->inferior_ptid = new_ptid;
 > +    }
 > +}
 > +
 >  static void
 >  do_restore_current_thread_cleanup (void *arg)
 >  {
 > @@ -1329,6 +1360,8 @@ restore_current_thread_cleanup_dtor (void *arg)
 >    struct thread_info *tp;
 >    struct inferior *inf;
 >  
 > +  current_thread_cleanup_chain = current_thread_cleanup_chain->next;
 > +
 >    tp = find_thread_ptid (old->inferior_ptid);
 >    if (tp)
 >      tp->refcount--;
 > @@ -1362,6 +1395,9 @@ make_cleanup_restore_current_thread (void)
 >    old->inf_id = current_inferior ()->num;
 >    old->was_removable = current_inferior ()->removable;
 >  
 > +  old->next = current_thread_cleanup_chain;
 > +  current_thread_cleanup_chain = old;
 > +
 >    if (!ptid_equal (inferior_ptid, null_ptid))
 >      {
 >        old->was_stopped = is_stopped (inferior_ptid);
 > @@ -1815,4 +1851,6 @@ Show printing of thread events (such as thread start and exit)."), NULL,
 >           &setprintlist, &showprintlist);
 >  
 >    create_internalvar_type_lazy ("_thread", &thread_funcs, NULL);
 > +
 > +  observer_attach_thread_ptid_changed (restore_current_thread_ptid_changed);
 >  }
 > -- 
 > 1.9.3
 > 
 >
  
Doug Evans April 28, 2015, 9:50 p.m. UTC | #2
On Tue, Apr 28, 2015 at 9:06 AM, Doug Evans <dje@google.com> wrote:
> The axis on which I'm worried is whether this will turn out to be
> 1 extra file or 1000. One extra file per test doesn't scale.
> And trying to reign in 100 files, or whatever the threshold is
> before it's clear that we need to change course, will be more
> painful than having a table-driven approach that can be simple
> now and still grow as needed. (*1)
>
> Another way to go would be really just have 1 extra file (or 1 extra
> file per gdb.foo subdir or some such), and this file
> documents all the tests intended to be run in different modes
> within one "make check". One could even extend
> this to be more than just all-stop/non-stop.
>
> Another way, though, is to have two "make check" runs:
> One with all-stop and one with non-stop.
> Tests that can't handle one or the other should be marked
> as such anyway. How a test chooses to handle the choice
> is an internal implementation decision regardless of which
> route is chosen. Yeah, running some tests twice this way
> won't enhance coverage (e.g., no point in running help.exp in
> all-stop and non-stop), but we should be almost within epsilon
> of being able to do this today (I know I've done it in the past),
> and complete test runs only take a few minutes on any reasonably
> beefy system (at least they do on mine).
>
> (*1): A hybrid table driven approach could be to (effectively)
> run the test twice using whatever mechanism that running
> "make check" twice would use. This way only the specified
> set of tests would be run twice (instead of running the whole testsuite
> twice), and one can still manually run individual tests in whatever
> mode (all-stop/non-stop/whatever) one chooses.
>
> My counter-proposal would be to see the extent to which we can
> just run the testsuite in all-stop and non-stop, and do that for now
> while we work towards something that scales better.
> Someone could get a basic table-driven scheme going in a day.

[Sorry for the resend ... cursed text/html.]

Another thought was that if this is just a one-off and you're manually
splitting up non-stop from all-stop to maintain parallelizability of the tests
then I might not mind the new file.

IOW, tests that want to handle both all-stop and non-stop in one
"make check" can do so in one file. Any loss in parallelization
is probably minimal (modulo really big tests).

Btw, checkpoint.exp completes in 9 seconds on my system.
I suspect doubling that and doing both all-stop and non-stop in
the one file won't increase "make check-parallel" noticeably.
  
Pedro Alves May 19, 2015, 6:07 p.m. UTC | #3
On 04/28/2015 10:50 PM, Doug Evans wrote:

> Another thought was that if this is just a one-off and you're manually
> splitting up non-stop from all-stop to maintain parallelizability of the tests
> then I might not mind the new file.

I don't have plans to do this in any other test, so for now at least,
this is really just a one-off.  It's special because "checkpoint"
only really works with single-threaded programs (because it uses
fork), while non-stop/all-stop mostly matters about threaded programs.

> 
> IOW, tests that want to handle both all-stop and non-stop in one
> "make check" can do so in one file. Any loss in parallelization
> is probably minimal (modulo really big tests).

Yes, and we already do that in some tests.

> 
> Btw, checkpoint.exp completes in 9 seconds on my system.
> I suspect doubling that and doing both all-stop and non-stop in
> the one file won't increase "make check-parallel" noticeably.

It makes a larger difference for me:

$ time make check RUNTESTFLAGS="checkpoint.exp"

real    0m24.716s
user    0m22.676s
sys     0m2.168s

$ time make check RUNTESTFLAGS="checkpoint.exp checkpoint-ns.exp"

real    0m49.241s
user    0m45.183s
sys     0m4.306s

$ time make check TESTS="gdb.base/checkpoint.exp gdb.base/checkpoint-ns.exp" -j

real    0m28.606s
user    0m49.684s
sys     0m5.590s

Thanks,
Pedro Alves
  

Patch

diff --git c/gdb/infrun.c w/gdb/infrun.c
index 9c1fdc5..caf1ff5 100644
--- c/gdb/infrun.c
+++ w/gdb/infrun.c
@@ -3798,20 +3798,27 @@  handle_inferior_event (struct execution_control_state *ecs)
 
   /* Mark the non-executing threads accordingly.  In all-stop, all
      threads of all processes are stopped when we get any event
-     reported.  In non-stop mode, only the event thread stops.  If
-     we're handling a process exit in non-stop mode, there's nothing
-     to do, as threads of the dead process are gone, and threads of
-     any other process were left running.  */
+     reported.  In non-stop mode, only the event thread stops.  */
   if (!non_stop)
     set_executing (minus_one_ptid, 0);
   else if (ecs->ws.kind == TARGET_WAITKIND_SIGNALLED
-	   && ecs->ws.kind == TARGET_WAITKIND_EXITED)
+	   || ecs->ws.kind == TARGET_WAITKIND_EXITED)
     {
       ptid_t pid_ptid;
 
-      /* Some targets still have execution when a process exits.
-	 E.g., for "checkpoint", when when a fork exits and is
-	 mourned, linux-fork.c switches to another fork.  */
+      /* If we're handling a process exit in non-stop mode, even
+	 though threads haven't been deleted yet, one would think that
+	 there is nothing to do, as threads of the dead process will
+	 be soon deleted, and threads of any other process were left
+	 running.  However, on some targets, threads survive a process
+	 exit event.  E.g., for the "checkpoint" command, when the
+	 current checkpoint/fork exits, linux-fork.c automatically
+	 switches to another fork from within target_mourn_inferior,
+	 by associating the same inferior/thread to another fork.  We
+	 haven't mourned yet at this point, but we must mark any
+	 threads left in the process as not-executing so that
+	 finish_thread_state marks them stopped (in the user's
+	 perspective) if/when we present the stop to the user.  */
       pid_ptid = pid_to_ptid (ptid_get_pid (ecs->ptid));
       set_executing (pid_ptid, 0);
     }
@@ -6573,9 +6580,11 @@  normal_stop (void)
   else if (last.kind == TARGET_WAITKIND_SIGNALLED
 	   || last.kind == TARGET_WAITKIND_EXITED)
     {
-      /* Some targets still have execution when a process exits.
-	 E.g., for "checkpoint", when when a fork exits and is
-	 mourned, linux-fork.c switches to another fork.  */
+      /* On some targets, we may still have live threads in the
+	 inferior when we get a process exit event.  E.g., for
+	 "checkpoint", when the current checkpoint/fork exits,
+	 linux-fork.c automatically switches to another fork from
+	 within target_mourn_inferior.  */
       if (!ptid_equal (inferior_ptid, null_ptid))
 	{
 	  pid_ptid = pid_to_ptid (ptid_get_pid (inferior_ptid));

Below's the resulting patch

WDYT?

From 5932f834e3b4feb2f0771d7bb344ad8f07ed98e1 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 22 Apr 2015 18:25:29 +0100
Subject: [PATCH] Fix and test "checkpoint" in non-stop mode

Letting a "checkpoint" run to exit with "set non-stop on" behaves
differently compared to the default all-stop mode ("set non-stop
off").

Currently, in non-stop mode:

  (gdb) start
  Temporary breakpoint 1 at 0x40086b: file src/gdb/testsuite/gdb.base/checkpoint.c, line 28.
  Starting program: build/gdb/testsuite/gdb.base/checkpoint

  Temporary breakpoint 1, main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
  28        char *tmp = &linebuf[0];
  (gdb) checkpoint
  checkpoint 1: fork returned pid 24948.
  (gdb) c
  Continuing.
  Copy complete.
  Deleting copy.
  [Inferior 1 (process 24944) exited normally]
  [Switching to process 24948]
  (gdb) info threads
    Id   Target Id         Frame
    1    process 24948 "checkpoint" (running)

  No selected thread.  See `help thread'.
  (gdb) c
  The program is not being run.
  (gdb)

Two issues above:

 1. Thread 1 got stuck in "(running)" state (it isn't really running)

 2. While checkpoints try to preserve the illusion that the thread is
    still the same when the process exits, GDB switched to "No thread
    selected." instead of staying with thread 1 selected.

Problem #1 is caused by handle_inferior_event and normal_stop not
considering that when a
TARGET_WAITKIND_SIGNALLED/TARGET_WAITKIND_EXITED event is reported,
and the inferior is mourned, the target may still have execution.

Problem #2 is caused by the make_cleanup_restore_current_thread
cleanup installed by fetch_inferior_event not being able to find the
original thread 1's ptid in the thread list, thus not being able to
restore thread 1 as selected thread.  The fix is to make the cleanup
installed by make_cleanup_restore_current_thread aware of thread ptid
changes, by installing a thread_ptid_changed observer that adjusts the
cleanup's data.

After the patch, we get the same in all-stop and non-stop modes:

  (gdb) c
  Continuing.
  Copy complete.
  Deleting copy.
  [Inferior 1 (process 25109) exited normally]
  [Switching to process 25113]
  (gdb) info threads
    Id   Target Id         Frame
  * 1    process 25113 "checkpoint" main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
  (gdb)

Turns out the whole checkpoints.exp file can run in non-stop mode
unmodified.  I thought of moving most of the test file's contents to a
procedure that can be called twice, once in non-stop mode and another
in all-stop mode.  But then, the test already takes over 30 seconds to
run on my machine, so I thought it'd be nicer to run all-stop and
non-stop mode in parallel.  Thus I added a new checkpoint-ns.exp file
that just appends "set non-stop on" to GDBFLAGS and sources
checkpoint.exp.

gdb/ChangeLog:
2015-04-22  Pedro Alves  <palves@redhat.com>

	* infrun.c (handle_inferior_event): If we get
	TARGET_WAITKIND_SIGNALLED or TARGET_WAITKIND_EXITED in non-stop
	mode, mark all threads of the exiting process as not-executing.
	(normal_stop): If we get TARGET_WAITKIND_SIGNALLED or
	TARGET_WAITKIND_EXITED in non-stop mode, finish all threads of the
	exiting process, if inferior_ptid still points at a process.
	* thread.c (struct current_thread_cleanup) <next>: New field.
	(current_thread_cleanup_chain): New global.
	(restore_current_thread_ptid_changed): New function.
	(restore_current_thread_cleanup_dtor): Remove the cleanup from the
	current_thread_cleanup_chain list.
	(make_cleanup_restore_current_thread): Add the cleanup data to the
	current_thread_cleanup_chain list.
	(_initialize_thread): Install restore_current_thread_ptid_changed
	as thread_ptid_changed observer.

gdb/testsuite/ChangeLog:
2015-04-22  Pedro Alves  <palves@redhat.com>

	* gdb.base/checkpoint-ns.exp: New file.
	* gdb.base/checkpoint.exp: Pass explicit "checkpoint.c" to
	standard_testfile.

v3.1:

	Extend comments.
	In test, use GDBFLAGS instead of new variable.

v3:
	No changes.
---
 gdb/infrun.c                             | 48 ++++++++++++++++++++++++++------
 gdb/testsuite/gdb.base/checkpoint-ns.exp | 26 +++++++++++++++++
 gdb/testsuite/gdb.base/checkpoint.exp    |  6 ++--
 gdb/thread.c                             | 38 +++++++++++++++++++++++++
 4 files changed, 107 insertions(+), 11 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/checkpoint-ns.exp

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 7870f70..caf1ff5 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -3798,14 +3798,31 @@  handle_inferior_event (struct execution_control_state *ecs)
 
   /* Mark the non-executing threads accordingly.  In all-stop, all
      threads of all processes are stopped when we get any event
-     reported.  In non-stop mode, only the event thread stops.  If
-     we're handling a process exit in non-stop mode, there's nothing
-     to do, as threads of the dead process are gone, and threads of
-     any other process were left running.  */
+     reported.  In non-stop mode, only the event thread stops.  */
   if (!non_stop)
     set_executing (minus_one_ptid, 0);
-  else if (ecs->ws.kind != TARGET_WAITKIND_SIGNALLED
-	   && ecs->ws.kind != TARGET_WAITKIND_EXITED)
+  else if (ecs->ws.kind == TARGET_WAITKIND_SIGNALLED
+	   || ecs->ws.kind == TARGET_WAITKIND_EXITED)
+    {
+      ptid_t pid_ptid;
+
+      /* If we're handling a process exit in non-stop mode, even
+	 though threads haven't been deleted yet, one would think that
+	 there is nothing to do, as threads of the dead process will
+	 be soon deleted, and threads of any other process were left
+	 running.  However, on some targets, threads survive a process
+	 exit event.  E.g., for the "checkpoint" command, when the
+	 current checkpoint/fork exits, linux-fork.c automatically
+	 switches to another fork from within target_mourn_inferior,
+	 by associating the same inferior/thread to another fork.  We
+	 haven't mourned yet at this point, but we must mark any
+	 threads left in the process as not-executing so that
+	 finish_thread_state marks them stopped (in the user's
+	 perspective) if/when we present the stop to the user.  */
+      pid_ptid = pid_to_ptid (ptid_get_pid (ecs->ptid));
+      set_executing (pid_ptid, 0);
+    }
+  else
     set_executing (ecs->ptid, 0);
 
   switch (ecs->ws.kind)
@@ -6550,6 +6567,7 @@  normal_stop (void)
   struct target_waitstatus last;
   ptid_t last_ptid;
   struct cleanup *old_chain = make_cleanup (null_cleanup, NULL);
+  ptid_t pid_ptid;
 
   get_last_target_status (&last_ptid, &last);
 
@@ -6559,9 +6577,21 @@  normal_stop (void)
      here, so do this before any filtered output.  */
   if (!non_stop)
     make_cleanup (finish_thread_state_cleanup, &minus_one_ptid);
-  else if (last.kind != TARGET_WAITKIND_SIGNALLED
-	   && last.kind != TARGET_WAITKIND_EXITED
-	   && last.kind != TARGET_WAITKIND_NO_RESUMED)
+  else if (last.kind == TARGET_WAITKIND_SIGNALLED
+	   || last.kind == TARGET_WAITKIND_EXITED)
+    {
+      /* On some targets, we may still have live threads in the
+	 inferior when we get a process exit event.  E.g., for
+	 "checkpoint", when the current checkpoint/fork exits,
+	 linux-fork.c automatically switches to another fork from
+	 within target_mourn_inferior.  */
+      if (!ptid_equal (inferior_ptid, null_ptid))
+	{
+	  pid_ptid = pid_to_ptid (ptid_get_pid (inferior_ptid));
+	  make_cleanup (finish_thread_state_cleanup, &pid_ptid);
+	}
+    }
+  else if (last.kind != TARGET_WAITKIND_NO_RESUMED)
     make_cleanup (finish_thread_state_cleanup, &inferior_ptid);
 
   /* As we're presenting a stop, and potentially removing breakpoints,
diff --git a/gdb/testsuite/gdb.base/checkpoint-ns.exp b/gdb/testsuite/gdb.base/checkpoint-ns.exp
new file mode 100644
index 0000000..d3698ba
--- /dev/null
+++ b/gdb/testsuite/gdb.base/checkpoint-ns.exp
@@ -0,0 +1,26 @@ 
+# Copyright 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+# Test gdb checkpoint and restart in non-stop mode.
+
+# We drive non-stop mode from a separate file because the whole test
+# takes a while to run.  This way, we can test both modes in parallel.
+
+set saved_gdbflags $GDBFLAGS
+append GDBFLAGS " -ex \"set non-stop on\""
+
+source $srcdir/$subdir/checkpoint.exp
+
+set GDBFLAGS $saved_gdbflags
diff --git a/gdb/testsuite/gdb.base/checkpoint.exp b/gdb/testsuite/gdb.base/checkpoint.exp
index 6d94ab6..4a476be 100644
--- a/gdb/testsuite/gdb.base/checkpoint.exp
+++ b/gdb/testsuite/gdb.base/checkpoint.exp
@@ -24,8 +24,10 @@  if {![istarget "*-*-linux*"]} then {
     continue
 }
 
-
-standard_testfile .c
+# Must name the source file explicitly, otherwise when driven by
+# checkpoints-ns.exp, we'd try compiling checkpoints-ns.c, which
+# doesn't exist.
+standard_testfile checkpoint.c
 
 set pi_txt [gdb_remote_download host ${srcdir}/${subdir}/pi.txt]
 if {[is_remote host]} {
diff --git a/gdb/thread.c b/gdb/thread.c
index 23dfcc9..46b5947 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1279,8 +1279,16 @@  restore_selected_frame (struct frame_id a_frame_id, int frame_level)
     }
 }
 
+/* Data used by the cleanup installed by
+   'make_cleanup_restore_current_thread'.  */
+
 struct current_thread_cleanup
 {
+  /* Next in list of currently installed 'struct
+     current_thread_cleanup' cleanups.  See
+     'current_thread_cleanup_chain' below.  */
+  struct current_thread_cleanup *next;
+
   ptid_t inferior_ptid;
   struct frame_id selected_frame_id;
   int selected_frame_level;
@@ -1289,6 +1297,29 @@  struct current_thread_cleanup
   int was_removable;
 };
 
+/* A chain of currently installed 'struct current_thread_cleanup'
+   cleanups.  Restoring the previously selected thread looks up the
+   old thread in the thread list by ptid.  If the thread changes ptid,
+   we need to update the cleanup's thread structure so the look up
+   succeeds.  */
+static struct current_thread_cleanup *current_thread_cleanup_chain;
+
+/* A thread_ptid_changed observer.  Update all currently installed
+   current_thread_cleanup cleanups that want to switch back to
+   OLD_PTID to switch back to NEW_PTID instead.  */
+
+static void
+restore_current_thread_ptid_changed (ptid_t old_ptid, ptid_t new_ptid)
+{
+  struct current_thread_cleanup *it;
+
+  for (it = current_thread_cleanup_chain; it != NULL; it = it->next)
+    {
+      if (ptid_equal (it->inferior_ptid, old_ptid))
+	it->inferior_ptid = new_ptid;
+    }
+}
+
 static void
 do_restore_current_thread_cleanup (void *arg)
 {
@@ -1329,6 +1360,8 @@  restore_current_thread_cleanup_dtor (void *arg)
   struct thread_info *tp;
   struct inferior *inf;
 
+  current_thread_cleanup_chain = current_thread_cleanup_chain->next;
+
   tp = find_thread_ptid (old->inferior_ptid);
   if (tp)
     tp->refcount--;
@@ -1362,6 +1395,9 @@  make_cleanup_restore_current_thread (void)
   old->inf_id = current_inferior ()->num;
   old->was_removable = current_inferior ()->removable;
 
+  old->next = current_thread_cleanup_chain;
+  current_thread_cleanup_chain = old;
+
   if (!ptid_equal (inferior_ptid, null_ptid))
     {
       old->was_stopped = is_stopped (inferior_ptid);
@@ -1815,4 +1851,6 @@  Show printing of thread events (such as thread start and exit)."), NULL,
          &setprintlist, &showprintlist);
 
   create_internalvar_type_lazy ("_thread", &thread_funcs, NULL);
+
+  observer_attach_thread_ptid_changed (restore_current_thread_ptid_changed);
 }