[v2,00/23] All-stop on top of non-stop

Message ID 552BE0A3.5040001@redhat.com
State New, archived

Commit Message

Pedro Alves April 13, 2015, 3:28 p.m. UTC
  On 04/10/2015 10:26 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> Fun.   TBC, that was only with gdbserver, right?
>>
> 
> Right, I just ran non-stop-fair-events.exp on native aarch64-linux gdb,
> and they all pass.
> 
>> I suspect the test was only passing by chance before though.
> 
> They pass before this change.
> 
>> AFAICS, aarch64 doesn't have a displaced stepping implementation.
> 
> No, aarch64 doesn't have one.
> 
>> I'd suspect current master fails other non-stop tests? (and hopefully
>> this series fixes them).
> 
> Current master doesn't fail other non-stop tests on aarch64-linux.  This
> series doesn't change the test result except non-stop-fair-events.exp.
> 
>>
>> So GDB should now be falling back to stopping all threads to
>> step past the breakpoint on aarch64, whereas before, threads were
>> just missing breakpoints.  Likely something is still wrong with
>> that on remote targets.
> 
> I'll take a look, and see if I can find anything wrong.

With displaced stepping disabled on x86, I can reproduce it
sometimes here too.  The issue seems to be that we're constantly
bouncing between the events of the same two threads over and over,
and thus the signal event is never processed.  Exactly the sort
of issue the test is meant to catch.   Hurray, I guess?  :-)
I've been testing with the patch below, and it seems to fix it.
Testing against gdbserver + software single-step + displaced
stepping disabled caught a couple other issues too.  I'm testing
this further and cleaning it all up now.
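
For illustration only, here is a toy model in plain C (unrelated to
GDB's actual sources) contrasting "always take the first pending
event" with "pick one pending event at random", which is the approach
the patch below takes:

#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 3
#define ITERATIONS 1000

/* Always take the first thread with a pending event; in this model
   the later threads are never serviced.  */

static int
pick_first (const int *pending)
{
  int i;

  for (i = 0; i < NTHREADS; i++)
    if (pending[i])
      return i;
  return -1;
}

/* Take one pending thread uniformly at random (reservoir sampling),
   so every pending thread is eventually serviced.  */

static int
pick_random (const int *pending)
{
  int i, n = 0, choice = -1;

  for (i = 0; i < NTHREADS; i++)
    if (pending[i] && rand () % ++n == 0)
      choice = i;
  return choice;
}

int
main (void)
{
  /* Every thread always has another event pending, like the test's
     threads re-hitting the same breakpoint over and over.  */
  int pending[NTHREADS] = { 1, 1, 1 };
  int serviced_first[NTHREADS] = { 0 };
  int serviced_random[NTHREADS] = { 0 };
  int i;

  srand (1);
  for (i = 0; i < ITERATIONS; i++)
    {
      serviced_first[pick_first (pending)]++;
      serviced_random[pick_random (pending)]++;
    }

  for (i = 0; i < NTHREADS; i++)
    printf ("thread %d: first-pending %4d, random %4d\n",
            i, serviced_first[i], serviced_random[i]);
  return 0;
}

With the first strategy thread 0 is serviced every time and threads 1
and 2 never are; with the random pick each thread gets roughly a
third of the events.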

Thanks,
Pedro Alves

commit 9112338313db3a23b9abb4d2176ce6f62498dc08
Author:     Pedro Alves <palves@redhat.com>
AuthorDate: Mon Apr 13 14:16:27 2015 +0100

    better randomization
  

Comments

Yao Qi April 13, 2015, 4:16 p.m. UTC | #1
Pedro Alves <palves@redhat.com> writes:

Pedro,
I can't apply this patch cleanly, so I'm unable to test it for
aarch64-linux.  I'll give it a test once your V3 is posted, in which
this patch is included, I assume.

> +      /* If we have events pending, go through handle_inferior_event
> +	 again, picking up a pending event at random.  This avoids
> +	 thread starvation.  */
> +      pending = iterate_over_threads (resumed_thread_with_pending_status,
> +				      NULL);

A quick question: I don't see how pending events are picked up at
random.  Do you mean target_ops->to_wait, such as linux_nat_wait, can
get an event at random?
  
Pedro Alves April 13, 2015, 4:22 p.m. UTC | #2
On 04/13/2015 05:16 PM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
> Pedro,
> I can't apply this patch cleanly, so I'm unable to test it for
> aarch64-linux.  I'll give it a test once your V3 is posted, in which
> this patch is included, I assume.

Oh, sorry for not being clearer; I didn't mean for you to try
the patch, only to say that I've got a handle on the issue, to save
you from wasting any time trying to debug it, in case you felt
compelled to do that.  I'll post something testable once I clean
things up a bit more.

> 
>> +      /* If we have events pending, go through handle_inferior_event
>> +	 again, picking up a pending event at random.  This avoids
>> +	 thread starvation.  */
>> +      pending = iterate_over_threads (resumed_thread_with_pending_status,
>> +				      NULL);
> 
> A quick question: I don't see how pending events are picked up at
> random.  Do you mean target_ops->to_wait, such as linux_nat_wait, can
> get an event at random?
> 

It's done in do_target_wait.  So we go back to the event loop,
and end up in fetch_inferior_event again.  In the series/branch, that
fetch_inferior_event calls do_target_wait instead of target_wait directly:

/* Wrapper for target_wait that first checks whether threads have
   pending status to report before actually asking the target for more
   events.  */

static ptid_t
do_target_wait (ptid_t ptid, struct target_waitstatus *status, int options)
{

which calls random_pending_event_thread, and that is what takes care
of the randomization.
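
For reference, the shape of such a helper could be roughly as
follows.  random_pending_event_thread itself isn't quoted in this
mail, so the body below is only a sketch, guessed from the fields the
patch adds (tp->resumed and tp->suspend.waitstatus_pending_p); see
the series for the real implementation:

/* Sketch only: return one resumed thread with a pending waitstatus,
   chosen uniformly at random, or NULL if there is none.  */

static struct thread_info *
random_pending_event_thread (ptid_t waiton_ptid)
{
  struct thread_info *tp;
  int num_events = 0;
  int random_selector;

  /* First pass: count how many candidate threads there are.  */
  ALL_NON_EXITED_THREADS (tp)
    if (ptid_match (tp->ptid, waiton_ptid)
        && tp->resumed
        && tp->suspend.waitstatus_pending_p)
      num_events++;

  if (num_events == 0)
    return NULL;

  /* Pick a candidate at random: 0 .. num_events - 1.  */
  random_selector = (int) ((num_events * (double) rand ())
                           / (RAND_MAX + 1.0));

  /* Second pass: return the randomly selected candidate.  */
  ALL_NON_EXITED_THREADS (tp)
    if (ptid_match (tp->ptid, waiton_ptid)
        && tp->resumed
        && tp->suspend.waitstatus_pending_p
        && random_selector-- == 0)
      return tp;

  return NULL;
}

A two-pass count-then-pick scheme like this is similar in spirit to
the random LWP selection the native Linux target already does when
several LWPs have events pending.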

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 4dd25d6..334d153 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4154,6 +4154,82 @@  switch_to_thread_cleanup (void *ptid_p)
   switch_to_thread (ptid);
 }
 
+/* Save the thread's event and stop reason to process it later.  */
+
+static void
+save_waitstatus (struct thread_info *tp, struct target_waitstatus *ws)
+{
+  struct regcache *regcache;
+  struct address_space *aspace;
+
+  if (debug_infrun)
+    {
+      char *statstr;
+
+      statstr = target_waitstatus_to_string (ws);
+      fprintf_unfiltered (gdb_stdlog,
+			  "infrun: saving status %s for %d.%ld.%ld\n",
+			  statstr,
+			  ptid_get_pid (tp->ptid),
+			  ptid_get_lwp (tp->ptid),
+			  ptid_get_tid (tp->ptid));
+      xfree (statstr);
+    }
+
+  /* Record for later.  */
+  tp->suspend.waitstatus = *ws;
+  tp->suspend.waitstatus_pending_p = 1;
+
+  regcache = get_thread_regcache (tp->ptid);
+  aspace = get_regcache_aspace (regcache);
+
+  if (ws->kind == TARGET_WAITKIND_STOPPED
+      && ws->value.sig == GDB_SIGNAL_TRAP)
+    {
+      CORE_ADDR pc = regcache_read_pc (regcache);
+
+      adjust_pc_after_break (tp, &tp->suspend.waitstatus);
+
+      if (thread_stopped_by_watchpoint (tp->ptid))
+	{
+	  tp->suspend.stop_reason
+	    = TARGET_STOPPED_BY_WATCHPOINT;
+	}
+      else if (target_supports_stopped_by_sw_breakpoint ()
+	       && thread_stopped_by_sw_breakpoint (tp->ptid))
+	{
+	  tp->suspend.stop_reason
+	    = TARGET_STOPPED_BY_SW_BREAKPOINT;
+	}
+      else if (target_supports_stopped_by_hw_breakpoint ()
+	       && thread_stopped_by_hw_breakpoint (tp->ptid))
+	{
+	  tp->suspend.stop_reason
+	    = TARGET_STOPPED_BY_HW_BREAKPOINT;
+	}
+      else if (!target_supports_stopped_by_hw_breakpoint ()
+	       && hardware_breakpoint_inserted_here_p (aspace,
+						       pc))
+	{
+	  tp->suspend.stop_reason
+	    = TARGET_STOPPED_BY_HW_BREAKPOINT;
+	}
+      else if (!target_supports_stopped_by_sw_breakpoint ()
+	       && software_breakpoint_inserted_here_p (aspace,
+						       pc))
+	{
+	  tp->suspend.stop_reason
+	    = TARGET_STOPPED_BY_SW_BREAKPOINT;
+	}
+      else if (!thread_has_single_step_breakpoints_set (tp)
+	       && currently_stepping (tp))
+	{
+	  tp->suspend.stop_reason
+	    = TARGET_STOPPED_BY_SINGLE_STEP;
+	}
+    }
+}
+
 /* Stop all threads.  */
 
 static void
@@ -4317,61 +4393,11 @@  stop_all_threads (void)
 		    }
 
 		  /* Record for later.  */
-		  t->suspend.waitstatus = ws;
-		  t->suspend.waitstatus_pending_p = 1;
+		  save_waitstatus (t, &ws);
 
 		  sig = (ws.kind == TARGET_WAITKIND_STOPPED
 			 ? ws.value.sig : GDB_SIGNAL_0);
 
-		  regcache = get_thread_regcache (t->ptid);
-		  aspace = get_regcache_aspace (regcache);
-
-		  if (ws.kind == TARGET_WAITKIND_STOPPED
-		      && ws.value.sig == GDB_SIGNAL_TRAP)
-		    {
-		      CORE_ADDR pc = regcache_read_pc (regcache);
-
-		      adjust_pc_after_break (t, &t->suspend.waitstatus);
-
-		      if (thread_stopped_by_watchpoint (t->ptid))
-			{
-			  t->suspend.stop_reason
-			    = TARGET_STOPPED_BY_WATCHPOINT;
-			}
-		      else if (target_supports_stopped_by_sw_breakpoint ()
-			       && thread_stopped_by_sw_breakpoint (t->ptid))
-			{
-			  t->suspend.stop_reason
-			    = TARGET_STOPPED_BY_SW_BREAKPOINT;
-			}
-		      else if (target_supports_stopped_by_hw_breakpoint ()
-			       && thread_stopped_by_hw_breakpoint (t->ptid))
-			{
-			  t->suspend.stop_reason
-			    = TARGET_STOPPED_BY_HW_BREAKPOINT;
-			}
-		      else if (!target_supports_stopped_by_hw_breakpoint ()
-			       && hardware_breakpoint_inserted_here_p (aspace,
-								       pc))
-			{
-			  t->suspend.stop_reason
-			    = TARGET_STOPPED_BY_HW_BREAKPOINT;
-			}
-		      else if (!target_supports_stopped_by_sw_breakpoint ()
-			       && software_breakpoint_inserted_here_p (aspace,
-								       pc))
-			{
-			  t->suspend.stop_reason
-			    = TARGET_STOPPED_BY_SW_BREAKPOINT;
-			}
-		      else if (!thread_has_single_step_breakpoints_set (t)
-			       && currently_stepping (t))
-			{
-			  t->suspend.stop_reason
-			    = TARGET_STOPPED_BY_SINGLE_STEP;
-			}
-		    }
-
 		  if (displaced_step_fixup (t->ptid, sig) < 0)
 		    {
 		      /* Add it back to the step-over queue.  */
@@ -4379,6 +4405,7 @@  stop_all_threads (void)
 		      thread_step_over_chain_enqueue (t);
 		    }
 
+		  regcache = get_thread_regcache (t->ptid);
 		  t->suspend.stop_pc = regcache_read_pc (regcache);
 
 		  if (debug_infrun)
@@ -5033,7 +5060,13 @@  restart_threads (struct thread_info *event_thread)
       /* If some thread needs to start a step-over at this point, it
 	 should still be in the step-over queue, and thus skipped
 	 above.  */
-      gdb_assert (!thread_still_needs_step_over (tp));
+      if (thread_still_needs_step_over (tp))
+	{
+	  internal_error (__FILE__, __LINE__,
+			  "thread [%s] needs a step-over, but not in "
+			  "step-over queue\n",
+			  target_pid_to_str (tp->ptid));
+	}
 
       if (currently_stepping (tp))
 	{
@@ -5056,10 +5089,24 @@  restart_threads (struct thread_info *event_thread)
     }
 }
 
+/* Callback for iterate_over_threads.  Find a resumed thread that has
+   a pending waitstatus.  */
+
+static int
+resumed_thread_with_pending_status (struct thread_info *tp,
+				    void *arg)
+{
+  return (tp->resumed
+	  && tp->suspend.waitstatus_pending_p);
+}
+
 /* Called when we get an event that may finish an in-line or
-   out-of-line (displaced stepping) step-over started previously.  */
+   out-of-line (displaced stepping) step-over started previously.
+   Return true if the event is processed and we should go back to the
+   event loop; false if the caller should continue processing the
+   event.  */
 
-static void
+static int
 finish_step_over (struct execution_control_state *ecs)
 {
   int had_step_over_info;
@@ -5081,7 +5128,7 @@  finish_step_over (struct execution_control_state *ecs)
     }
 
   if (!target_is_non_stop_p ())
-    return;
+    return 0;
 
   /* Start a new step-over in another thread if there's one that
      needs it.  */
@@ -5094,10 +5141,67 @@  finish_step_over (struct execution_control_state *ecs)
      these other threads stop.  */
   if (had_step_over_info && !step_over_info_valid_p ())
     {
+      struct thread_info *pending;
+
       restart_threads (ecs->event_thread);
 
-      gdb_assert (!ecs->event_thread->resumed);
+      /* If we have events pending, go through handle_inferior_event
+	 again, picking up a pending event at random.  This avoids
+	 thread starvation.  */
+      pending = iterate_over_threads (resumed_thread_with_pending_status,
+				      NULL);
+      if (pending != NULL)
+	{
+	  struct thread_info *tp = ecs->event_thread;
+	  struct regcache *regcache;
+
+	  if (debug_infrun)
+	    {
+	      fprintf_unfiltered (gdb_stdlog,
+				  "infrun: found resumed threads with "
+				  "pending events, saving status\n");
+	    }
+
+	  gdb_assert (pending != tp);
+
+	  /* Record the event thread's event for later.  */
+	  save_waitstatus (tp, &ecs->ws);
+	  /* This was cleared early, by handle_inferior_event.  Set it
+	     so this pending event is considered by
+	     do_target_wait.  */
+	  tp->resumed = 1;
+
+	  gdb_assert (!tp->executing);
+
+	  regcache = get_thread_regcache (tp->ptid);
+	  tp->suspend.stop_pc = regcache_read_pc (regcache);
+
+	  if (debug_infrun)
+	    {
+	      fprintf_unfiltered (gdb_stdlog,
+				  "infrun: saved stop_pc=%s for %s "
+				  "(currently_stepping=%d)\n",
+				  paddress (target_gdbarch (),
+					    tp->suspend.stop_pc),
+				  target_pid_to_str (tp->ptid),
+				  currently_stepping (tp));
+	    }
+
+	  /* This in-line step-over finished; clear this so we won't
+	     start a new one.  This is what handle_signal_stop would
+	     do, if we returned false.  */
+	  tp->stepping_over_breakpoint = 0;
+	  tp->stepping_over_watchpoint = 0;
+
+	  /* Wake up the event loop again.  */
+	  mark_async_event_handler (infrun_async_inferior_event_token);
+
+	  prepare_to_wait (ecs);
+	  return 1;
+	}
     }
+
+  return 0;
 }
 
 /* Come here when the program has stopped with a signal.  */
@@ -5116,7 +5220,8 @@  handle_signal_stop (struct execution_control_state *ecs)
   /* Do we need to clean up the state of a thread that has
      completed a displaced single-step?  (Doing so usually affects
      the PC, so do it here, before we set stop_pc.)  */
-  finish_step_over (ecs);
+  if (finish_step_over (ecs))
+    return;
 
   /* If we either finished a single-step or hit a breakpoint, but
      the user wanted this thread to be stopped, pretend we got a