[PATCHv2,3/8] gdb: fix an issue with vfork in non-stop mode

Message ID 1320f2998985fa621cb3dfab0e3143fa2ba99568.1688484032.git.aburgess@redhat.com
State New
Headers
Series Some vfork related fixes |

Commit Message

Andrew Burgess July 4, 2023, 3:22 p.m. UTC
  This commit fixes a bug introduced by this commit:

  commit d8bbae6ea080249c05ca90b1f8640fde48a18301
  Date:   Fri Jan 14 15:40:59 2022 -0500

      gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on)

The problem can be seen in this GDB session:

  $ gdb -q
  (gdb) set non-stop on
  (gdb) file ./gdb/testsuite/outputs/gdb.base/foll-vfork/foll-vfork
  Reading symbols from ./gdb/testsuite/outputs/gdb.base/foll-vfork/foll-vfork...
  (gdb) tcatch vfork
  Catchpoint 1 (vfork)
  (gdb) run
  Starting program: /tmp/gdb/testsuite/outputs/gdb.base/foll-vfork/foll-vfork

  Temporary catchpoint 1 (vforked process 1375914), 0x00007ffff7d5043c in vfork () from /lib64/libc.so.6
  (gdb) bt
  #0  0x00007ffff7d5043c in vfork () from /lib64/libc.so.6
  #1  0x00000000004011af in main (argc=1, argv=0x7fffffffad88) at .../gdb/testsuite/gdb.base/foll-vfork.c:32
  (gdb) finish
  Run till exit from #0  0x00007ffff7d5043c in vfork () from /lib64/libc.so.6
  [Detaching after vfork from child process 1375914]
  No unwaited-for children left.
  (gdb)

Notice the "No unwaited-for children left." error.  This is incorrect,
given where we are stopped there's no reason why we shouldn't be able
to use "finish" to return to the main frame.

When the inferior is stopped as a result of the 'tcatch vfork', the
inferior is in the process of performing the vfork, that is, GDB has
seen the VFORKED event, but has not yet attached to the new child
process, nor has the child process been resumed.

However, GDB has seen the VFORKED, and, as we are going to follow the
parent process, the inferior for the vfork parent will have its
thread_waiting_for_vfork_done member variable set, this will point to
the one and only thread that makes up the vfork parent process.

When the "finish" command is used GDB eventually ends up in the
proceed function (in infrun.c), in here we pass through all the
function until we eventually encounter this 'else if' condition:

   else if (!cur_thr->resumed ()
	     && !thread_is_in_step_over_chain (cur_thr)
	     /* In non-stop, forbid resuming a thread if some other thread of
		that inferior is waiting for a vfork-done event (this means
		breakpoints are out for this inferior).  */
	     && !(non_stop
		  && cur_thr->inf->thread_waiting_for_vfork_done != nullptr))
      {

The first two of these conditions will both be true, the thread is not
already resumed, and is not in the step-over chain, however, the third
condition, this one:

	     && !(non_stop
		  && cur_thr->inf->thread_waiting_for_vfork_done != nullptr))

is false, and this prevents the thread we are trying to finish from
being resumed.  This condition is false because (a) non_stop is true,
and (b) cur_thr->inf->thread_waiting_for_vfork_done is not
nullptr (see above for why).

Now, if we check the comment embedded within the condition it says:

     /* In non-stop, forbid resuming a thread if some other thread of
        that inferior is waiting for a vfork-done event (this means
        breakpoints are out for this inferior).  */

And this makes sense, if we have a vfork parent with two thread, and
one thread has performed a vfork, then we shouldn't try to resume the
second thread.

However, if we are trying to resume the thread that actually performed
a vfork, then this is fine.  If we never resume the vfork parent then
we'll never get a VFORK_DONE event, and so the vfork will never
complete.

Thus, the condition should actually be:

     && !(non_stop
	  && cur_thr->inf->thread_waiting_for_vfork_done != nullptr
	  && cur_thr->inf->thread_waiting_for_vfork_done != cur_thr))

This extra check will allow the vfork parent thread to resume, but
prevent any other thread in the vfork parent process from resuming.
This is the same condition that already exists in the all-stop on a
non-stop-target block earlier in the proceed function.

My actual fix is slightly different to the above, first, I've chosen
to use a nested 'if' check instead of extending the original 'else if'
check, this makes it easier to write a longer comment explaining
what's going on, and second, instead of checking 'non_stop' I've
switched to checking 'target_is_non_stop_p'.  In this context this is
effectively the same thing, a previous 'else if' block in proceed
already handles '!non_stop && target_is_non_stop_p ()', so by the time
we get here, if 'target_is_non_stop_p ()' then we must be running in
non_stop mode.

Both of these tweaks will make the next patch easier, which is a
refactor to merge two parts of the proceed function, so this nested
'if' block is not going to exist for long.

For testing, there is no test included with this commit.  The test was
exposed when using a modified version of the gdb.base/foll-vfork.exp
test script, however, there are other bugs that are exposed when using
the modified test script.  These bugs will be addressed in subsequent
commits, and then I'll add the updated gdb.base/foll-vfork.exp.

If you wish to reproduce this failure then grab the updates to
gdb.base/foll-vfork.exp from the later commit and run this test, the
failure is always reproducible.
---
 gdb/infrun.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)
  

Patch

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 58da1cef29e..5b0257076f0 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -3503,19 +3503,29 @@  proceed (CORE_ADDR addr, enum gdb_signal siggnal)
 	  }
       }
     else if (!cur_thr->resumed ()
-	     && !thread_is_in_step_over_chain (cur_thr)
-	     /* In non-stop, forbid resuming a thread if some other thread of
-		that inferior is waiting for a vfork-done event (this means
-		breakpoints are out for this inferior).  */
-	     && !(non_stop
-		  && cur_thr->inf->thread_waiting_for_vfork_done != nullptr))
+	     && !thread_is_in_step_over_chain (cur_thr))
       {
-	/* The thread wasn't started, and isn't queued, run it now.  */
-	execution_control_state ecs (cur_thr);
-	switch_to_thread (cur_thr);
-	keep_going_pass_signal (&ecs);
-	if (!ecs.wait_some_more)
-	  error (_("Command aborted."));
+	/* In non-stop mode, if a there exists a thread waiting for a vfork
+	   then only allow that thread to resume (breakpoints are removed
+	   from an inferior when handling a vfork).
+
+	   We check target_is_non_stop_p here rather than just checking the
+	   non-stop flag, though these are equivalent (all-stop on a
+	   non-stop target was handled in a previous block, so at this
+	   point, if target_is_non_stop_p then GDB must be running on
+	   non-stop mode).  By using target_is_non_stop_p it will be easier
+	   to merge this block with the previous in a later commit.  */
+	if (!(target_is_non_stop_p ()
+	      && cur_thr->inf->thread_waiting_for_vfork_done != nullptr
+	      && cur_thr->inf->thread_waiting_for_vfork_done != cur_thr))
+	  {
+	    /* The thread wasn't started, and isn't queued, run it now.  */
+	    execution_control_state ecs (cur_thr);
+	    switch_to_thread (cur_thr);
+	    keep_going_pass_signal (&ecs);
+	    if (!ecs.wait_some_more)
+	      error (_("Command aborted."));
+	  }
       }
 
     disable_commit_resumed.reset_and_commit ();