[v3,01/17] Fix and test "checkpoint" in non-stop mode

  Hi Doug,

Thanks for the review.

On 04/21/2015 03:35 AM, Doug Evans wrote:
> Pedro Alves writes:
>  > Letting a "checkpoint" run to exit with "set non-stop on" behaves
>  > differently compared to the default all-stop mode ("set non-stop
>  > off").
>  > 
>  > Currently, in non-stop mode:
>  > 
>  >   (gdb) start
>  >   Temporary breakpoint 1 at 0x40086b: file src/gdb/testsuite/gdb.base/checkpoint.c, line 28.
>  >   Starting program: build/gdb/testsuite/gdb.base/checkpoint
>  > 
>  >   Temporary breakpoint 1, main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
>  >   28        char *tmp = &linebuf[0];
>  >   (gdb) checkpoint
>  >   checkpoint 1: fork returned pid 24948.
>  >   (gdb) c
>  >   Continuing.
>  >   Copy complete.
>  >   Deleting copy.
>  >   [Inferior 1 (process 24944) exited normally]
>  >   [Switching to process 24948]
>  >   (gdb) info threads
>  >     Id   Target Id         Frame
>  >     1    process 24948 "checkpoint" (running)
>  > 
>  >   No selected thread.  See `help thread'.
>  >   (gdb) c
>  >   The program is not being run.
>  >   (gdb)
>  > 
>  > Two issues above:
>  > 
>  >  1. Thread 1 got stuck in "(running)" state (it isn't really running)
>  > 
>  >  2. While checkpoints try to preserve the illusion that the thread is
>  >     still the same when the process exits, GDB switched to "No thread
>  >     selected." instead of staying with thread 1 selected.
>  > 
>  > Problem #1 is caused by handle_inferior_event and normal_stop not
>  > considering that when a
>  > TARGET_WAITKIND_SIGNALLED/TARGET_WAITKIND_EXITED event is reported,
>  > and the inferior is mourned, the target may still have execution.
> 
> Hi.
> 
> What does "target may still have execution" mean here?

That "target_has_execution" returns true.  (But I
extended/changed the comment, see below.)

> Is it that there can be another inferior running?

It's the same inferior, but a different fork/process.

The checkpoints that the "checkpoint" code makes are really
forked processes.  This is done by linux-fork.c.  Each fork
naturally is a different process, but that code tries to hide
that from the user and from core gdb.  When the user switches
to a different checkpoint, the inferior is still the same,
though behind the scenes, the process/thread associated with
the inferior changed:

(top-gdb) checkpoint 
checkpoint 1: fork returned pid 19228.
(top-gdb) info checkpoints 
  1 process 19228 at 0x4677bf, file /home/pedro/gdb/mygit/src/gdb/gdb.c, line 28
* 0 Thread 0x7ffff7c2c7c0 (LWP 19224) (main process) at 0x4677bf, file /home/pedro/gdb/mygit/src/gdb/gdb.c, line 28
(top-gdb) info inferiors 
  Num  Description       Executable        
* 1    process 19224     /home/pedro/gdb/mygit/build/gdb/gdb 
(top-gdb) restart 1
Switching to Thread 0x7ffff7c2c7c0 (LWP 19228)
#0  main (argc=1, argv=0x7fffffffd8f8) at /home/pedro/gdb/mygit/src/gdb/gdb.c:28
28        memset (&args, 0, sizeof args);
(top-gdb) info inferiors 
  Num  Description       Executable        
* 1    process 19228     /home/pedro/gdb/mygit/build/gdb/gdb 
(top-gdb)
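
To illustrate only the fork semantics the feature relies on, here is a minimal sketch.  It is not how linux-fork.c is written (the real forks are kept stopped under GDB's control rather than exiting), and toy_checkpoint_sees is a made-up name.  It shows that a forked copy keeps the state from the moment of the fork, regardless of what the original process does afterwards:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of GDB.  Fork a copy of the caller
   while *VALUE holds its current contents, then overwrite *VALUE in
   the parent.  The child reports the value it still sees through its
   exit status, so the value must fit in 0..255.  Returns the value
   seen by the child, or -1 on error.  */
static int
toy_checkpoint_sees (int *value, int new_parent_value)
{
  int status;
  pid_t child;

  assert (*value >= 0 && *value <= 255);

  child = fork ();
  if (child < 0)
    return -1;

  if (child == 0)
    _exit (*value);		/* The child has its own copy of *VALUE.  */

  /* Only the parent's copy changes here.  */
  *value = new_parent_value;

  if (waitpid (child, &status, 0) != child || !WIFEXITED (status))
    return -1;

  return WEXITSTATUS (status);
}
```

Calling it with a variable holding 5 and a new parent value of 9 should return 5 from the child while the parent's variable ends up as 9.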

So if you let one of the checkpoints run to exit, gdb switches
to another checkpoint:

(gdb) info checkpoints 
  1 process 19265 at 0x4004cf, file main.c, line 5
* 0 process 19264 (main process) at 0x4004cf, file main.c, line 5
(gdb) c
Continuing.
[Inferior 1 (process 19264) exited normally]
[Switching to process 19265]
(gdb) 
(gdb) info threads 
  Id   Target Id         Frame 
* 1    process 19265 "main" main (argc=1, argv=0x7fffffffd908) at main.c:5

Again, it's still the same gdb thread/inferior object.

That is done at target_mourn_inferior time, in
linux-fork.c:linux_fork_mourn_inferior:

/* The current inferior_ptid has exited, but there are other viable
   forks to debug.  Delete the exiting one and context-switch to the
   first available.  */

void
linux_fork_mourn_inferior (void)
{

which ends up calling:

void
linux_nat_switch_fork (ptid_t new_ptid)
{
  struct lwp_info *lp;

  purge_lwp_list (ptid_get_pid (inferior_ptid));

  lp = add_lwp (new_ptid);
  lp->stopped = 1;

  /* This changes the thread's ptid while preserving the gdb thread
     num.  Also changes the inferior pid, while preserving the
     inferior num.  */
  thread_change_ptid (inferior_ptid, new_ptid);

  /* We've just told GDB core that the thread changed target id, but,
     in fact, it really is a different thread, with different register
     contents.  */
  registers_changed ();
}
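
The "change the ptid, keep the thread num" idea can be sketched with a toy thread table.  The struct and function names below are hypothetical and much simpler than GDB's thread_info and thread_change_ptid; the sketch only shows that the user-visible number stays put while the backing process changes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified thread record; GDB's real thread_info
   carries a full ptid_t and much more state.  */
struct toy_thread
{
  int num;	/* User-visible thread number.  */
  int pid;	/* Process currently backing this thread.  */
};

/* Retarget the thread backed by OLD_PID to NEW_PID, leaving its
   number untouched.  Returns the thread, or NULL if no thread in
   THREADS is backed by OLD_PID.  */
static struct toy_thread *
toy_change_pid (struct toy_thread *threads, size_t count,
		int old_pid, int new_pid)
{
  size_t i;

  for (i = 0; i < count; i++)
    if (threads[i].pid == old_pid)
      {
	threads[i].pid = new_pid;
	return &threads[i];
      }

  return NULL;
}
```

With a single thread numbered 1 and backed by process 19264, retargeting it to 19265 leaves it as thread 1, which matches what "info threads" shows in the session above.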

Note that this whole checkpoints feature is old code that predates
multi-inferior support by a long while.  It has many
limitations (e.g., it relies on fork, so it can't be used on multi-threaded
programs), but I/we've kept it limping along, at least in
the single-inferior case.

> 
>  > Problem #2 is caused by the make_cleanup_restore_current_thread
>  > cleanup installed by fetch_inferior_event not being able to find the
>  > original thread 1's ptid in the thread list, thus not being able to
>  > restore thread 1 as selected thread.  The fix is to make the cleanup
>  > installed by make_cleanup_restore_current_thread aware of thread ptid
>  > changes, by installing a thread_ptid_changed observer that adjusts the
>  > cleanup's data.
> 
> I'm guessing this is less hacky than it sounds :-),

:-)  I'd say this is just using the thread_ptid_changed observer
hook as intended.
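
Roughly, the idea looks like the sketch below.  This is a toy model with made-up names (toy_saved_selection, toy_on_pid_changed), not the patch itself: the cleanup remembers which thread was selected, and the observer callback updates that record whenever the thread it points at is retargeted.

```c
#include <assert.h>

/* Hypothetical stand-in for the data that the restore-current-thread
   cleanup keeps; the real structure holds a ptid_t and more.  */
struct toy_saved_selection
{
  int saved_pid;
};

/* What a ptid-changed observer would do: if the remembered thread is
   the one being retargeted, follow it to its new id.  Otherwise leave
   the record alone.  */
static void
toy_on_pid_changed (struct toy_saved_selection *saved,
		    int old_pid, int new_pid)
{
  if (saved->saved_pid == old_pid)
    saved->saved_pid = new_pid;
}
```

Without such a fixup, the cleanup would later look up the old id, fail to find it in the thread list, and leave no thread selected.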

> but it does make me want to understand why this only comes up
> now since threads/inferiors have always been able to "go away"
> while gdb is doing something.

See above.  It's just that non-stop mode was never adjusted
to provide the same illusion that all-stop always provided.
I first noticed it by forcing the target always in non-stop
mode, but the problem exists in current master, with "set non-stop on".

This issue wasn't visible in all-stop, because:

 - in all-stop we always call set_executing(..., 0) / set_running (..., 0)
   with minus_one_ptid (all threads), even on process exits.

 - fetch_inferior_event does not try to preserve the selected
   thread in all-stop.  (although as discussed a while ago, it should,
   if the command was a background command, but that's another story).
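
The first point can be sketched with a toy model (hypothetical names, not GDB's set_executing): a wildcard clear reaches the thread whichever process currently backs it, whereas a clear keyed on one specific pid only helps if the thread is still associated with that pid.

```c
#include <assert.h>
#include <stddef.h>

/* Plays the role of minus_one_ptid in this toy model.  */
#define TOY_ALL_PIDS (-1)

/* Hypothetical, simplified thread record.  */
struct toy_thread
{
  int pid;
  int executing;
};

/* Clear the executing flag on every thread whose pid matches PID, or
   on all threads when PID is TOY_ALL_PIDS.  Returns the number of
   threads that matched.  */
static size_t
toy_clear_executing (struct toy_thread *threads, size_t count, int pid)
{
  size_t i, matched = 0;

  for (i = 0; i < count; i++)
    if (pid == TOY_ALL_PIDS || threads[i].pid == pid)
      {
	threads[i].executing = 0;
	matched++;
      }

  return matched;
}
```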

> 
>  > After the patch, we get the same in all-stop and non-stop modes:
>  > 
>  >   (gdb) c
>  >   Continuing.
>  >   Copy complete.
>  >   Deleting copy.
>  >   [Inferior 1 (process 25109) exited normally]
>  >   [Switching to process 25113]
>  >   (gdb) info threads
>  >     Id   Target Id         Frame
>  >   * 1    process 25113 "checkpoint" main () at src/gdb/testsuite/gdb.base/checkpoint.c:28
>  >   (gdb)
>  > 
>  > Turns out the whole checkpoints.exp file can run in non-stop mode
>  > unmodified.  I thought of moving most of the test file's contents to a
>  > procedure that can be called twice, once in non-stop mode and another
>  > in all-stop mode.  But then, the test already takes over 30 seconds to
>  > run on my machine, so I thought it'd be nicer to run all-stop and
>  > non-stop mode in parallel.  Thus I added a new checkpoint-ns.exp file
>  > that just sources checkpoint.exp, and sets a knob that checkpoint.exp
>  > reads to know it should test non-stop mode.  No other test in the tree
>  > currently uses this mechanism, but I can't see a reason we shouldn't
>  > do this.
> 
> Re: checkpoint-ns.exp:
> If we're going to have one "make check" involve both all-stop
> and non-stop testing (as opposed to doing one "make check" with all-stop
> and another one with non-stop), I'd rather see a more table-driven approach
> (the details don't matter to me much other than I'd rather have one file
> than a proliferation of foo-ns.exp files).
> Parallelizing that may involve an extra step, but that's ok.

Yes, it sounds like something we may want to get to at some point,
but I can't invest time right now on designing something
more elaborate.  I suggest letting this first example go in,
and see where it takes us, what patterns start emerging.
For example, I realized that we don't really need the extra
variable in checkpoint/checkpoint-ns.exp; we can use GDBFLAGS
instead, though I'm not sure that will/would be sufficient for
other uses.

If I understand your "extra step" correctly, you're suggesting a generated
file.  So there's still an extra file anyway?  I just think of the separate
file as moving the test's body to a helper "library" instead of a
procedure, with the bonus that we can call either variant from the command line,
instead of being forced to run both variants in sequence, as in
the foreach+procedure way that we've been using in many files.

Whether separate file or foreach, it seems to me that it's the same thing
wrt table-driven vs procedural, so I can't see how an extra file makes
things any worse.  I will personally keep tending to pick the "foreach
variant" route as first choice, as done in numerous tests now.  It's just
that checkpoint.exp is longish in both line count and the time it takes to run.

>  > --- a/gdb/infrun.c
>  > +++ b/gdb/infrun.c
>  > @@ -3804,8 +3804,18 @@ handle_inferior_event (struct execution_control_state *ecs)
>  >       any other process were left running.  */
>  >    if (!non_stop)
>  >      set_executing (minus_one_ptid, 0);
>  > -  else if (ecs->ws.kind != TARGET_WAITKIND_SIGNALLED
>  > -	   && ecs->ws.kind != TARGET_WAITKIND_EXITED)
>  > +  else if (ecs->ws.kind == TARGET_WAITKIND_SIGNALLED
>  > +	   && ecs->ws.kind == TARGET_WAITKIND_EXITED)
> 
> pasto: s/&&/||/
> 

Whoops, fixed.  Checkpoints being single-threaded, the else
branch ended up working...
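
To spell out the pasto: a single wait kind can never equal two different constants at once, so the "&&" form is false for every event and control always falls through to the else branch.  A minimal sketch, using a toy enum rather than GDB's real target_waitkind:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for a few wait kinds; not GDB's real enum.  */
enum toy_waitkind { TOY_STOPPED, TOY_SIGNALLED, TOY_EXITED };

/* The pasted condition: one value cannot equal two different
   constants, so this is false for every kind.  */
static bool
toy_exited_with_and (enum toy_waitkind kind)
{
  return kind == TOY_SIGNALLED && kind == TOY_EXITED;
}

/* The intended condition: true for either kind of process exit.  */
static bool
toy_exited_with_or (enum toy_waitkind kind)
{
  return kind == TOY_SIGNALLED || kind == TOY_EXITED;
}
```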

>  > +    {
>  > +      ptid_t pid_ptid;
>  > +
>  > +      /* Some targets still have execution when a process exits.
>  > +	 E.g., for "checkpoint", when when a fork exits and is
> 
> s/when when/when/
> 
>  > +	 mourned, linux-fork.c switches to another fork.  */
>  > +      pid_ptid = pid_to_ptid (ptid_get_pid (ecs->ptid));
>  > +      set_executing (pid_ptid, 0);
> 
> This is a bit confusing. I'm guessing ecs->ptid is for the inferior
> that just exited/signalled, but we're not changing pids here.
> I'm guessing we'll mourn the inferior later, but IWBN to elaborate
> on the comment a little, e.g., to say we always need to call set_executing
> here, even if the inferior exited/signalled (which the code didn't
> previously do). But it's not clear why "switches to another fork" comes
> into play here.

I've changed it this way:
