[2/2] Avoid step-over infinite loop in GDBServer

Message ID 20161129120702.9490-2-antoine.tremblay@ericsson.com
State New, archived
Headers

Commit Message

Antoine Tremblay Nov. 29, 2016, 12:07 p.m. UTC
  Before this patch, GDBServer always executed a step-over if it found a
thread that needed one.

This could be a problem in a situation exposed by non-stop-fair-events.exp
where the code and the breakpoint placement is like so:

instruction A : has a single-step breakpoint installed for thread 1 and 2
instruction B : has a single-step breakpoint installed for thread 3
and is a branch to A.

In this particular case:

 - GDBServer stops on instruction A in thread 1.
 - Deletes thread 1 single-step breakpoint.
 - Starts a step-over of thread 1 to step-over the thread 2 breakpoint.
 - GDBServer finishes a step-over and is at instruction B.
 - GDBserver starts a step-over of thread 1 to step-over the thread 3
   breakpoint at instruction B.
 - GDBServer stops on instuction A in thread 1.
 - GDBServer is now in an infinite loop.

This patch avoids this issue by counting the number of times a thread does
a step-over consecutively.  If the thread reaches a threshold, which is
currently 32, GDBServer will not step-over but rather restart all the
threads.

I chose a threshold of 32, so to trigger this there needs to be 32
consecutive instructions with breakpoints installed that one thread needs
to step over. I think this should be rare enought to trigger only in this
case but suggestions on the threshold value are welcome.

If the threshold is hit and the step-over is delayed, when the thread
that needed a step-over runs it will simply hit the breakpoint it needed
to step-over and retry to start a step-over.

This makes it possible for other threads to run and start a step-over
dance of their own or pending events like signals to be handled.

This patch fixes the intermittent FAILs for
gdb.threads/non-stop-fair-events.exp on ARM like discussed here:
https://sourceware.org/ml/gdb-patches/2016-11/msg00132.html

No regressions, tested on ubuntu 14.04 ARMv7.
With gdbserver-native/-m{arm,thumb}

gdb/gdbserver/ChangeLog:

	* linux-low.c (class step_over_limiter): New class.
	(_step_over_limiter): New global variable.
	(linux_wait_1): Count step-overs.
	(proceed_all_lwps): Delay step-over if threshold is reached.
---
 gdb/gdbserver/linux-low.c | 76 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 70 insertions(+), 6 deletions(-)
  

Comments

Antoine Tremblay Jan. 16, 2017, 5:27 p.m. UTC | #1
Ping.

Antoine Tremblay writes:

> Before this patch, GDBServer always executed a step-over if it found a
> thread that needed one.
>
> This could be a problem in a situation exposed by non-stop-fair-events.exp
> where the code and the breakpoint placement is like so:
>
> instruction A : has a single-step breakpoint installed for thread 1 and 2
> instruction B : has a single-step breakpoint installed for thread 3
> and is a branch to A.
>
> In this particular case:
>
>  - GDBServer stops on instruction A in thread 1.
>  - Deletes thread 1 single-step breakpoint.
>  - Starts a step-over of thread 1 to step-over the thread 2 breakpoint.
>  - GDBServer finishes a step-over and is at instruction B.
>  - GDBserver starts a step-over of thread 1 to step-over the thread 3
>    breakpoint at instruction B.
>  - GDBServer stops on instuction A in thread 1.
>  - GDBServer is now in an infinite loop.
>
> This patch avoids this issue by counting the number of times a thread does
> a step-over consecutively.  If the thread reaches a threshold, which is
> currently 32, GDBServer will not step-over but rather restart all the
> threads.
>
> I chose a threshold of 32, so to trigger this there needs to be 32
> consecutive instructions with breakpoints installed that one thread needs
> to step over. I think this should be rare enought to trigger only in this
> case but suggestions on the threshold value are welcome.
>
> If the threshold is hit and the step-over is delayed, when the thread
> that needed a step-over runs it will simply hit the breakpoint it needed
> to step-over and retry to start a step-over.
>
> This makes it possible for other threads to run and start a step-over
> dance of their own or pending events like signals to be handled.
>
> This patch fixes the intermittent FAILs for
> gdb.threads/non-stop-fair-events.exp on ARM like discussed here:
> https://sourceware.org/ml/gdb-patches/2016-11/msg00132.html
>
> No regressions, tested on ubuntu 14.04 ARMv7.
> With gdbserver-native/-m{arm,thumb}
>
> gdb/gdbserver/ChangeLog:
>
> 	* linux-low.c (class step_over_limiter): New class.
> 	(_step_over_limiter): New global variable.
> 	(linux_wait_1): Count step-overs.
> 	(proceed_all_lwps): Delay step-over if threshold is reached.
> ---
>  gdb/gdbserver/linux-low.c | 76 +++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 70 insertions(+), 6 deletions(-)
>
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 15fb726..b84b62a 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -282,6 +282,53 @@ static int proceed_one_lwp (struct inferior_list_entry *entry, void *except);
>     being stepped.  */
>  ptid_t step_over_bkpt;
>  
> +/* Class limiting the number of consecutive step-overs for a thread.  */
> +
> +class step_over_limiter
> +{
> + public:
> +
> +  step_over_limiter () : m_ptid (null_ptid), m_count (0), m_max_count (32) {}
> +
> +  void step_over_done (struct lwp_info *lwp)
> +  {
> +    ptid_t ptid = lwp->thread->entry.id;
> +
> +    if (!ptid_equal (ptid, m_ptid))
> +      {
> +	m_ptid = ptid;
> +	m_count = 0;
> +      }
> +
> +    m_count++;
> +  }
> +
> +  bool can_step_over (struct lwp_info *lwp)
> +  {
> +    if (!ptid_equal(lwp->thread->entry.id, m_ptid)
> +	|| m_count < m_max_count)
> +      return true;
> +    else
> +      {
> +	/* Reset here so that the step_over is retried.  */
> +	m_ptid = null_ptid;
> +	m_count = 0;
> +	return false;
> +      }
> +  }
> +
> + private:
> +
> +  ptid_t m_ptid;
> +  int m_count;
> +
> +  /* Maximum step overs for a thread, before all threads are run instead of
> +     stepping over.  */
> +  const int m_max_count;
> +};
> +
> +step_over_limiter _step_over_limiter;
> +
>  /* True if the low target can hardware single-step.  */
>  
>  static int
> @@ -3607,6 +3654,8 @@ linux_wait_1 (ptid_t ptid,
>  	     doesn't lose it.  */
>  	  enqueue_pending_signal (event_child, WSTOPSIG (w), info_p);
>  
> +	  _step_over_limiter.step_over_done (event_child);
> +
>  	  proceed_all_lwps ();
>  	}
>        else
> @@ -3694,6 +3743,8 @@ linux_wait_1 (ptid_t ptid,
>  	     We're going to keep waiting, so use proceed, which
>  	     handles stepping over the next breakpoint.  */
>  	  unsuspend_all_lwps (event_child);
> +
> +	  _step_over_limiter.step_over_done (event_child);
>  	}
>        else
>  	{
> @@ -5400,13 +5451,26 @@ proceed_all_lwps (void)
>  
>        if (need_step_over != NULL)
>  	{
> -	  if (debug_threads)
> -	    debug_printf ("proceed_all_lwps: found "
> -			  "thread %ld needing a step-over\n",
> -			  lwpid_of (need_step_over));
> +	  /* Don't step over if we're looping too much.  */
> +	  if (_step_over_limiter.can_step_over
> +	      (get_thread_lwp (need_step_over)))
> +	    {
> +	      if (debug_threads)
> +		debug_printf ("proceed_all_lwps: found "
> +			      "thread %ld needing a step-over\n",
> +			      lwpid_of (need_step_over));
>  
> -	  start_step_over (get_thread_lwp (need_step_over));
> -	  return;
> +	      start_step_over (get_thread_lwp (need_step_over));
> +	      return;
> +	    }
> +	  else
> +	    {
> +	      if (debug_threads)
> +		debug_printf ("proceed_all_lwps: found "
> +			      "thread %ld needing a step-over "
> +			      "but max consecutive step-overs reached\n",
> +			      lwpid_of (need_step_over));
> +	    }
>  	}
>      }
  
Antoine Tremblay Jan. 18, 2017, 4:30 p.m. UTC | #2
Antoine Tremblay writes:

> Ping.
>

FYI, I just realised that this can break traceframes and breakpoint counts,
I'll try to fix that or find a better way.
  
Pedro Alves Feb. 3, 2017, 4:21 p.m. UTC | #3
On 11/29/2016 12:07 PM, Antoine Tremblay wrote:
>  - GDBServer stops on instruction A in thread 1.
>  - Deletes thread 1 single-step breakpoint.
>  - Starts a step-over of thread 1 to step-over the thread 2 breakpoint.
>  - GDBServer finishes a step-over and is at instruction B.
>  - GDBserver starts a step-over of thread 1 to step-over the thread 3
>    breakpoint at instruction B.
>  - GDBServer stops on instuction A in thread 1.
>  - GDBServer is now in an infinite loop.

This sounds to me very much like a fairness issue.  There were
three threads stopped that needed to move past a breakpoint, but
gdbserver always picks thread 1.  Why?

Thanks,
Pedro Alves
  
Antoine Tremblay Feb. 17, 2017, 3:34 a.m. UTC | #4
Pedro Alves writes:

> On 11/29/2016 12:07 PM, Antoine Tremblay wrote:
>>  - GDBServer stops on instruction A in thread 1.
>>  - Deletes thread 1 single-step breakpoint.
>>  - Starts a step-over of thread 1 to step-over the thread 2 breakpoint.
>>  - GDBServer finishes a step-over and is at instruction B.
>>  - GDBserver starts a step-over of thread 1 to step-over the thread 3
>>    breakpoint at instruction B.
>>  - GDBServer stops on instuction A in thread 1.
>>  - GDBServer is now in an infinite loop.
>
> This sounds to me very much like a fairness issue.  There were
> three threads stopped that needed to move past a breakpoint, but
> gdbserver always picks thread 1.  Why?

It is a fairness issue but not between the stepping over threads,
gdbserver could pick any thread rather than thread 1 and still get into
the same loop.

The problem is one of fairness between stepping over threads and
non-stepping-over threads.

For example if all the instructions in the thread's execution
path have a breakpoint on it such that this thread always needs to
step-over and that the only way to break that loop is to let a thread
that doesn't need a step-over run then we're in an infinite loop since
GDBServer always gives precedence to the threads needing a step-over.

Thus this patch created more fairness by giving non-stepping over
threads a chance to run.

But it doesn't work... I'm not sure this can be fixed it might be more
a problem like putting a breakpoint on a mutex and wanting a thread that
depends on that mutex to continue... so more a user problem... but
non-stop-fair-events is done in a way that this can happen...

Ideas are welcome :)
  
Yao Qi Feb. 22, 2017, 10:15 a.m. UTC | #5
On Tue, Nov 29, 2016 at 12:07 PM, Antoine Tremblay
<antoine.tremblay@ericsson.com> wrote:
> Before this patch, GDBServer always executed a step-over if it found a
> thread that needed one.
>
> This could be a problem in a situation exposed by non-stop-fair-events.exp
> where the code and the breakpoint placement is like so:
>
> instruction A : has a single-step breakpoint installed for thread 1 and 2
> instruction B : has a single-step breakpoint installed for thread 3
> and is a branch to A.
>

Is instruction B following instruction A?  Is it like

.L1:
 nop
 b .L1

> In this particular case:
>
>  - GDBServer stops on instruction A in thread 1.
>  - Deletes thread 1 single-step breakpoint.
>  - Starts a step-over of thread 1 to step-over the thread 2 breakpoint.
>  - GDBServer finishes a step-over and is at instruction B.
>  - GDBserver starts a step-over of thread 1 to step-over the thread 3
>    breakpoint at instruction B.

Why does GDBserver starts a step-over again?  is it because
need_step_over_p doing checks like this,

  if (breakpoint_here (pc) || fast_tracepoint_jump_here (pc))
    {
      /* Don't step over a breakpoint that GDB expects to hit
         though.  If the condition is being evaluated on the target's side
         and it evaluate to false, step over this breakpoint as well.  */
      if (gdb_breakpoint_here (pc)
          && gdb_condition_true_at_breakpoint (pc)
          && gdb_no_commands_at_breakpoint (pc))
        {
          if (debug_threads)
            debug_printf ("Need step over [LWP %ld]? yes, but found"
                          " GDB breakpoint at 0x%s; skipping step over\n",
                          lwpid_of (thread), paddress (pc));

          current_thread = saved_thread;
          return 0;
        }
      else
        {
          if (debug_threads)
            debug_printf ("Need step over [LWP %ld]? yes, "
                          "found breakpoint at 0x%s\n",
                          lwpid_of (thread), paddress (pc));

          /* We've found an lwp that needs stepping over --- return 1 so
             that find_inferior stops looking.  */
          current_thread = saved_thread;

          return 1;
        }
    }

there is a single step breakpoint on pc, and it is obviously not a
gdb breakpoint, so 1 is returned.

>  - GDBServer stops on instuction A in thread 1.
>  - GDBServer is now in an infinite loop.
>

I am wondering can we take the information that we've already step
over a breakpoint for thread A into need_step_over_p, if we see pc
is on another single step breakpoint for thread B, don't do step over.
  
Antoine Tremblay March 27, 2017, 1:28 p.m. UTC | #6
Yao Qi writes:

> On Tue, Nov 29, 2016 at 12:07 PM, Antoine Tremblay
> <antoine.tremblay@ericsson.com> wrote:
>> Before this patch, GDBServer always executed a step-over if it found a
>> thread that needed one.
>>
>> This could be a problem in a situation exposed by non-stop-fair-events.exp
>> where the code and the breakpoint placement is like so:
>>
>> instruction A : has a single-step breakpoint installed for thread 1 and 2
>> instruction B : has a single-step breakpoint installed for thread 3
>> and is a branch to A.
>>
>
> Is instruction B following instruction A?  Is it like
>
> .L1:
>  nop
>  b .L1
>

Yes, assuming A is nop an B is b .L1.

I reduced it to that from the real world case in non-stop-fair-events.c
with :

0x0000000000400767 <+38>:	mov    0x200963(%rip),%eax        # 0x6010d0 <got_sig>
0x000000000040076d <+44>:	test   %eax,%eax
0x000000000040076f <+46>:	je     0x400767 <child_function+38>

And a single-step breakpoint on all instructions.

>> In this particular case:
>>
>>  - GDBServer stops on instruction A in thread 1.
>>  - Deletes thread 1 single-step breakpoint.
>>  - Starts a step-over of thread 1 to step-over the thread 2 breakpoint.
>>  - GDBServer finishes a step-over and is at instruction B.
>>  - GDBserver starts a step-over of thread 1 to step-over the thread 3
>>    breakpoint at instruction B.
>
> Why does GDBserver starts a step-over again?  is it because
> need_step_over_p doing checks like this,
>
>   if (breakpoint_here (pc) || fast_tracepoint_jump_here (pc))
>     {
>       /* Don't step over a breakpoint that GDB expects to hit
>          though.  If the condition is being evaluated on the target's side
>          and it evaluate to false, step over this breakpoint as well.  */
>       if (gdb_breakpoint_here (pc)
>           && gdb_condition_true_at_breakpoint (pc)
>           && gdb_no_commands_at_breakpoint (pc))
>         {
>           if (debug_threads)
>             debug_printf ("Need step over [LWP %ld]? yes, but found"
>                           " GDB breakpoint at 0x%s; skipping step over\n",
>                           lwpid_of (thread), paddress (pc));
>
>           current_thread = saved_thread;
>           return 0;
>         }
>       else
>         {
>           if (debug_threads)
>             debug_printf ("Need step over [LWP %ld]? yes, "
>                           "found breakpoint at 0x%s\n",
>                           lwpid_of (thread), paddress (pc));
>
>           /* We've found an lwp that needs stepping over --- return 1 so
>              that find_inferior stops looking.  */
>           current_thread = saved_thread;
>
>           return 1;
>         }
>     }
>
> there is a single step breakpoint on pc, and it is obviously not a
> gdb breakpoint, so 1 is returned.
>

Yes.

>>  - GDBServer stops on instuction A in thread 1.
>>  - GDBServer is now in an infinite loop.
>>
>
> I am wondering can we take the information that we've already step
> over a breakpoint for thread A into need_step_over_p, if we see pc
> is on another single step breakpoint for thread B, don't do step over.

There's 3 parts to consider here:

 - What to restart ?

My first thought with this was step-over the other threads that needed
to step-over in a queue like fashion, but this may not be enough since we
may depend on another thread to be able to make progress on the single
stepped threads like is the case in non-stop-fair-events.

So like this patch does I though it was better to restart all
threads...

 - What does it mean to not step-over ?

The don't do step over part is more tricky then it seems since
let's say GDBServer:

 - hits a breakpoint on A with thread 1
 - doesn't step-over and restart all threads
 - any thread hits the breakpoint on A

At this point thread 1 has already hit the breakpoint on A even if we do
not allow it to step-over, so it's stuck there and could hit the
breakpoint a number of times, thus breaking the breakpoints/trace frames
counts and maybe some other stuff... I'm not sure about the solution to
that, ideas ? How to "undo" a breakpoint hit ? Or some other solution...

- When not to step-over ?

After some thought I think we could make it so that we expect the
single-step breakpoints for PC to be hit in the order of insertion.

Since the breakpoints are already a linked list we could make it so that
if you hit a single step breakpoint and that this breakpoint is
installed on multiple threads then single-step only if the thread stopped
matches the thread of the first breakpoint for that PC.

Otherwise continue all threads.

I think this would be simpler than to record the last executed step-over
for all threads and check/reset that flag so that all threads wait for
each other, but that may be possible too.

WDYT ?
  

Patch

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 15fb726..b84b62a 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -282,6 +282,53 @@  static int proceed_one_lwp (struct inferior_list_entry *entry, void *except);
    being stepped.  */
 ptid_t step_over_bkpt;
 
+/* Class limiting the number of consecutive step-overs for a thread.  */
+
+class step_over_limiter
+{
+ public:
+
+  step_over_limiter () : m_ptid (null_ptid), m_count (0), m_max_count (32) {}
+
+  void step_over_done (struct lwp_info *lwp)
+  {
+    ptid_t ptid = lwp->thread->entry.id;
+
+    if (!ptid_equal (ptid, m_ptid))
+      {
+	m_ptid = ptid;
+	m_count = 0;
+      }
+
+    m_count++;
+  }
+
+  bool can_step_over (struct lwp_info *lwp)
+  {
+    if (!ptid_equal(lwp->thread->entry.id, m_ptid)
+	|| m_count < m_max_count)
+      return true;
+    else
+      {
+	/* Reset here so that the step_over is retried.  */
+	m_ptid = null_ptid;
+	m_count = 0;
+	return false;
+      }
+  }
+
+ private:
+
+  ptid_t m_ptid;
+  int m_count;
+
+  /* Maximum step overs for a thread, before all threads are run instead of
+     stepping over.  */
+  const int m_max_count;
+};
+
+step_over_limiter _step_over_limiter;
+
 /* True if the low target can hardware single-step.  */
 
 static int
@@ -3607,6 +3654,8 @@  linux_wait_1 (ptid_t ptid,
 	     doesn't lose it.  */
 	  enqueue_pending_signal (event_child, WSTOPSIG (w), info_p);
 
+	  _step_over_limiter.step_over_done (event_child);
+
 	  proceed_all_lwps ();
 	}
       else
@@ -3694,6 +3743,8 @@  linux_wait_1 (ptid_t ptid,
 	     We're going to keep waiting, so use proceed, which
 	     handles stepping over the next breakpoint.  */
 	  unsuspend_all_lwps (event_child);
+
+	  _step_over_limiter.step_over_done (event_child);
 	}
       else
 	{
@@ -5400,13 +5451,26 @@  proceed_all_lwps (void)
 
       if (need_step_over != NULL)
 	{
-	  if (debug_threads)
-	    debug_printf ("proceed_all_lwps: found "
-			  "thread %ld needing a step-over\n",
-			  lwpid_of (need_step_over));
+	  /* Don't step over if we're looping too much.  */
+	  if (_step_over_limiter.can_step_over
+	      (get_thread_lwp (need_step_over)))
+	    {
+	      if (debug_threads)
+		debug_printf ("proceed_all_lwps: found "
+			      "thread %ld needing a step-over\n",
+			      lwpid_of (need_step_over));
 
-	  start_step_over (get_thread_lwp (need_step_over));
-	  return;
+	      start_step_over (get_thread_lwp (need_step_over));
+	      return;
+	    }
+	  else
+	    {
+	      if (debug_threads)
+		debug_printf ("proceed_all_lwps: found "
+			      "thread %ld needing a step-over "
+			      "but max consecutive step-overs reached\n",
+			      lwpid_of (need_step_over));
+	    }
 	}
     }