[v2,5/6] gdb, infrun: fix silent inferior switch in do_target_wait()

Message ID 20240411052604.87893-6-markus.t.metzger@intel.com
State New
Series fix PR gdb/19340

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed

Commit Message

Metzger, Markus T April 11, 2024, 5:26 a.m. UTC
  In do_target_wait(), we iterate over inferiors and call
do_target_wait_1(), which eventually calls target_wait() per inferior.
Each time, we wait for minus_one_ptid.

In some cases (e.g. gdb.threads/detach-step-over.exp), we ask to wait for
one inferior but get back an event from a different inferior, without
noticing the inferior switch.

Wait for a single inferior, instead.  Since we iterate over all inferiors,
we still cover everything.
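To illustrate why waiting for minus_one_ptid on behalf of one inferior can
return another inferior's event, here is a minimal sketch of the ptid
matching rules.  It assumes a simplified stand-in for GDB's ptid_t; the
`ptid` struct, `minus_one` and `matches()` are hypothetical names meant to
mirror the three matching cases, not GDB's real API.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for GDB's ptid_t: (pid, lwp, tid).
struct ptid
{
  int pid = 0;
  long lwp = 0;
  long tid = 0;

  bool operator== (const ptid &o) const
  {
    return pid == o.pid && lwp == o.lwp && tid == o.tid;
  }

  // A ptid with only a (positive) pid set denotes a whole process.
  bool is_pid () const { return pid > 0 && lwp == 0 && tid == 0; }
};

// Stand-in for minus_one_ptid: "any process, any thread".
const ptid minus_one {-1, 0, 0};

// Does the thread THREAD pass the wait filter FILTER?
bool
matches (const ptid &thread, const ptid &filter)
{
  if (filter == minus_one)
    return true;                      // accepts a thread of any inferior
  if (filter.is_pid ())
    return thread.pid == filter.pid;  // any thread of that process
  return thread == filter;            // exactly that thread
}
```

In this model, a per-inferior filter such as `ptid {pid, 0, 0}` (the shape
of `ptid_t ptid { inf->pid }` in the patch) only accepts threads of that
inferior, while `minus_one` accepts a thread of any inferior.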

This exposes another bug with STOP_QUIETLY_NO_SIGSTOP handling.

After attaching, we interrupt all threads in the new inferior, then call
do_target_wait() to receive the stopped events.  This randomly selects an
inferior to start waiting for and iterates over all inferiors starting
from there.
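The search order described above can be sketched as a wrap-around loop.
This is a simplified model, assuming inferiors are numbered 0..n-1 and that
a hypothetical `wait_one` callback stands in for do_wait(); in infrun.c the
same shape is written as two loops over inferior_list, the first from the
selected inferior to the end and the second from the beginning back to it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical model of the search order in do_target_wait(): visit every
// inferior exactly once, starting at START and wrapping around.  WAIT_ONE
// returns true if that inferior reported an event.  Returns the index of
// the first inferior that reported an event, or -1 if none did.
int
wait_round_robin (std::size_t n, std::size_t start,
                  const std::function<bool (std::size_t)> &wait_one)
{
  assert (n == 0 || start < n);

  // First pass: START .. N-1.
  for (std::size_t i = start; i < n; ++i)
    if (wait_one (i))
      return static_cast<int> (i);

  // Second pass: 0 .. START-1.
  for (std::size_t i = 0; i < start; ++i)
    if (wait_one (i))
      return static_cast<int> (i);

  return -1;
}
```

Whichever inferior the search starts at is the first one we wait() on, so
the starting point decides whether a queued event is found before any
actual wait() happens.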

The initial stop event for the main thread is already queued up, so we
would not actually have called wait() if we had started with the new
inferior, or if we had waited for minus_one_ptid, which would then have
silently switched inferiors.

Since we no longer allow such a silent switch, we may actually wait() for
the new inferior and find other events to report, out of which we randomly
select one.

If we selected an event for another thread, e.g. one that had been
interrupted as part of non-stop attach, STOP_QUIETLY_NO_SIGSTOP would be
applied to that thread (unnecessarily), leaving the main thread with a
SIGSTOP event but last_resume_kind = 0 (resume_continue).

When the main thread is later selected, SIGSTOP is reported to the user.

Normally, linux-nat's wait() turns the SIGSTOP it uses for interrupting
threads into GDB_SIGNAL_0.  This is based on last_resume_kind, which is
set to 2 (resume_stop) when sending SIGSTOP to interrupt a thread.
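A minimal sketch of that filtering rule, assuming a simplified LWP record.
`lwp_model` and `reported_signal()` are hypothetical names, and returning 0
stands in for GDB_SIGNAL_0.  The enum order follows the numbering quoted
above (resume_continue = 0, resume_stop = 2); SIGSTOP comes from the POSIX
signal header.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical, simplified model of the resume kinds.
enum resume_kind { resume_continue, resume_step, resume_stop };

struct lwp_model
{
  resume_kind last_resume_kind = resume_continue;
};

// Signal to report for a stop caused by SIG.  A SIGSTOP that was sent to
// interrupt the thread (last_resume_kind == resume_stop) is reported as
// "no signal"; anything else is reported as-is.
int
reported_signal (const lwp_model &lp, int sig)
{
  if (sig == SIGSTOP && lp.last_resume_kind == resume_stop)
    return 0;  // stands in for GDB_SIGNAL_0
  return sig;
}
```

In this model, a thread whose last_resume_kind is still resume_continue has
its SIGSTOP passed through, which is the symptom described above for the
main thread.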

We do this for all threads of the new inferior when interrupting them as
part of non-stop attach, except for the main thread, whose stop we expect
to be reported before the first wait().

Set last_resume_kind to resume_stop for the main thread after attaching.
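The effect of the fix can be sketched with a toy model of attach, assuming
the main thread's stop from attaching is saved as a pending SIGSTOP and the
other threads are interrupted with SIGSTOP.  `model_attach()` and
`leaked_sigstops()` are hypothetical helpers; the latter only counts LWPs
whose pending SIGSTOP the last_resume_kind rule would not hide.

```cpp
#include <cassert>
#include <csignal>
#include <vector>

enum resume_kind { resume_continue, resume_step, resume_stop };

struct lwp_model
{
  long lwp;
  resume_kind last_resume_kind = resume_continue;
  int pending_sig = 0;  // signal of a queued, not-yet-reported stop
};

// Hypothetical model of attaching: the main thread has a queued SIGSTOP
// from the attach itself; the other threads are interrupted with SIGSTOP.
std::vector<lwp_model>
model_attach (long pid, const std::vector<long> &other_lwps, bool with_fix)
{
  std::vector<lwp_model> lwps;

  lwp_model main_lwp {pid};
  main_lwp.pending_sig = SIGSTOP;
  if (with_fix)
    main_lwp.last_resume_kind = resume_stop;  // what the patch adds
  lwps.push_back (main_lwp);

  for (long lwp : other_lwps)
    {
      lwp_model lp {lwp};
      lp.last_resume_kind = resume_stop;  // set when interrupting
      lp.pending_sig = SIGSTOP;
      lwps.push_back (lp);
    }
  return lwps;
}

// Count LWPs whose pending SIGSTOP would not be turned into "no signal".
int
leaked_sigstops (const std::vector<lwp_model> &lwps)
{
  int n = 0;
  for (const lwp_model &lp : lwps)
    if (lp.pending_sig == SIGSTOP && lp.last_resume_kind != resume_stop)
      ++n;
  return n;
}
```

Without the fix, the main thread is the one LWP whose SIGSTOP depends on
STOP_QUIETLY_NO_SIGSTOP being applied to it; with last_resume_kind set to
resume_stop, it is filtered like every other interrupted thread.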
---
 gdb/infrun.c    | 46 +++++++++++++++++++++++++++++++++++++++++++---
 gdb/linux-nat.c | 17 ++++++++++++-----
 gdb/remote.c    | 22 +++++++++++++++++-----
 3 files changed, 72 insertions(+), 13 deletions(-)
  

Patch

diff --git a/gdb/infrun.c b/gdb/infrun.c
index a5030b16376..9ca0571065c 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4198,7 +4198,23 @@  do_target_wait (ptid_t wait_ptid, execution_control_state *ecs,
 
   auto do_wait = [&] (inferior *inf)
   {
-    ecs->ptid = do_target_wait_1 (inf, wait_ptid, &ecs->ws, options);
+    ptid_t ptid { inf->pid };
+
+    /* Make sure we're not widening WAIT_PTID.  */
+    if (!ptid.matches (wait_ptid)
+	/* Targets that cannot async will be asked for a blocking wait.
+
+	   Blocking wait does not work inferior-by-inferior if the target
+	   provides more than one inferior.  Fall back to waiting for
+	   WAIT_PTID in that case.  */
+	|| !target_can_async_p () || ((options & TARGET_WNOHANG) == 0)
+	/* FIXME: I don't see why we should have inferiors with zero pid,
+	   which indicates that the respective ptid is not a process.
+	   They do exist, though, and we cannot wait for them.  */
+	|| !ptid.is_pid ())
+      ptid = wait_ptid;
+
+    ecs->ptid = do_target_wait_1 (inf, ptid, &ecs->ws, options);
     ecs->target = inf->process_target ();
     return (ecs->ws.kind () != TARGET_WAITKIND_IGNORE);
   };
@@ -4208,6 +4224,12 @@  do_target_wait (ptid_t wait_ptid, execution_control_state *ecs,
      reported the stop to the user, polling for events.  */
   scoped_restore_current_thread restore_thread;
 
+  /* The first TARGET_WAITKIND_NO_RESUMED execution state.
+
+     If we do not find a more interesting event, we will report that.  */
+  execution_control_state no_resumed {};
+  no_resumed.ptid = null_ptid;
+
   intrusive_list_iterator<inferior> start
     = inferior_list.iterator_to (*selected);
 
@@ -4218,7 +4240,13 @@  do_target_wait (ptid_t wait_ptid, execution_control_state *ecs,
       inferior *inf = &*it;
 
       if (inferior_matches (inf) && do_wait (inf))
-	return true;
+	{
+	  if (ecs->ws.kind () != TARGET_WAITKIND_NO_RESUMED)
+	    return true;
+
+	  if (no_resumed.ptid == null_ptid)
+	    no_resumed = *ecs;
+	}
     }
 
   for (intrusive_list_iterator<inferior> it = inferior_list.begin ();
@@ -4228,7 +4256,19 @@  do_target_wait (ptid_t wait_ptid, execution_control_state *ecs,
       inferior *inf = &*it;
 
       if (inferior_matches (inf) && do_wait (inf))
-	return true;
+	{
+	  if (ecs->ws.kind () != TARGET_WAITKIND_NO_RESUMED)
+	    return true;
+
+	  if (no_resumed.ptid == null_ptid)
+	    no_resumed = *ecs;
+	}
+    }
+
+  if (no_resumed.ptid != null_ptid)
+    {
+      *ecs = no_resumed;
+      return true;
     }
 
   ecs->ws.set_ignore ();
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 2602e1f240d..06b39d67a72 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1154,6 +1154,7 @@  linux_nat_target::attach (const char *args, int from_tty)
 
   /* Add the initial process as the first LWP to the list.  */
   lp = add_initial_lwp (ptid);
+  lp->last_resume_kind = resume_stop;
 
   status = linux_nat_post_attach_wait (lp->ptid, &lp->signalled);
   if (!WIFSTOPPED (status))
@@ -3329,12 +3330,18 @@  linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
      moment at which we know its PID.  */
   if (ptid.is_pid () && find_lwp_pid (ptid) == nullptr)
     {
-      ptid_t lwp_ptid (ptid.pid (), ptid.pid ());
+      /* Unless we already did and this is simply a request to wait for a
+	 particular inferior.  */
+      inferior *inf = find_inferior_ptid (linux_target, ptid);
+      if (inf && inf->find_thread (ptid))
+	{
+	  ptid_t lwp_ptid (ptid.pid (), ptid.pid ());
 
-      /* Upgrade the main thread's ptid.  */
-      thread_change_ptid (linux_target, ptid, lwp_ptid);
-      lp = add_initial_lwp (lwp_ptid);
-      lp->resumed = 1;
+	  /* Upgrade the main thread's ptid.  */
+	  thread_change_ptid (linux_target, ptid, lwp_ptid);
+	  lp = add_initial_lwp (lwp_ptid);
+	  lp->resumed = 1;
+	}
     }
 
   /* Make sure SIGCHLD is blocked until the sigsuspend below.  */
diff --git a/gdb/remote.c b/gdb/remote.c
index a09ba4d715d..49abd4e4376 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -7825,12 +7825,24 @@  remote_target::remote_notif_remove_queued_reply (ptid_t ptid)
 {
   remote_state *rs = get_remote_state ();
 
+  auto pred = [=] (const stop_reply_up &event)
+  {
+    /* A null ptid should only happen if we have a single process.  It
+       wouldn't match the process ptid, though, so let's check this case
+       separately.  */
+    if ((event->ptid == null_ptid) && ptid.is_pid ())
+      return true;
+
+    /* A minus one ptid should only happen for events that match
+       everything.  It wouldn't match a process or thread ptid, though, so
+       let's check this case separately.  */
+    if (event->ptid == minus_one_ptid)
+      return true;
+
+    return event->ptid.matches (ptid);
+  };
   auto iter = std::find_if (rs->stop_reply_queue.begin (),
-			    rs->stop_reply_queue.end (),
-			    [=] (const stop_reply_up &event)
-			    {
-			      return event->ptid.matches (ptid);
-			    });
+			    rs->stop_reply_queue.end (), pred);
   stop_reply_up result;
   if (iter != rs->stop_reply_queue.end ())
     {