[3/8] Deliver signal in hardware single step

Message ID 1457088276-1170-4-git-send-email-yao.qi@linaro.org
State New, archived
Headers

Commit Message

Yao Qi March 4, 2016, 10:44 a.m. UTC
  GDBserver doesn't deliver signal when stepping over a breakpoint even
hardware single step is used.  When GDBserver started to step over
(thread creation) breakpoint for mutlit-threaded debugging in 2002 [1],
GDBserver behaves this way.

This behaviour gets trouble on conditional breakpoints on branch to
self instruction like this,

   0x00000000004005b6 <+29>:	jmp    0x4005b6 <main+29>

and I set breakpoint

$(gdb) break branch-to-self.c:43 if counter > 3

and the variable counter will be set to 5 in SIGALRM signal handler.
Since GDBserver keeps stepping over breakpoint, the SIGALRM can never
be dequeued and delivered to the inferior, so the program can't stop.
The test can be found in gdb.base/branch-to-self.exp.

I can understand why does GDBserver queue signal for software single
step, but I can't figure out a reason we should queue signal for
hardware single step.  With this patch applied, GDBserver forward the
signal to inferior and the program can stop correctly.

[1] PATCH: Multithreaded debugging for gdbserver
    https://sourceware.org/ml/gdb-patches/2002-06/msg00157.html

gdb/gdbserver:

2016-03-04  Yao Qi  <yao.qi@linaro.org>

	* linux-low.c (LWP_SIGNAL_CAN_BE_DELIVERED): Don't deliver
	signal when stepping over breakpoint with software single
	step.
---
 gdb/gdbserver/linux-low.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
  

Comments

Pedro Alves March 11, 2016, 11:05 a.m. UTC | #1
On 03/04/2016 10:44 AM, Yao Qi wrote:
> GDBserver doesn't deliver signal when stepping over a breakpoint even
> hardware single step is used.  When GDBserver started to step over
> (thread creation) breakpoint for mutlit-threaded debugging in 2002 [1],
> GDBserver behaves this way.
> 
> This behaviour gets trouble on conditional breakpoints on branch to
> self instruction like this,
> 
>     0x00000000004005b6 <+29>:	jmp    0x4005b6 <main+29>
> 
> and I set breakpoint
> 
> $(gdb) break branch-to-self.c:43 if counter > 3
> 
> and the variable counter will be set to 5 in SIGALRM signal handler.
> Since GDBserver keeps stepping over breakpoint, the SIGALRM can never
> be dequeued and delivered to the inferior, so the program can't stop.
> The test can be found in gdb.base/branch-to-self.exp.
> 
> I can understand why does GDBserver queue signal for software single
> step, but I can't figure out a reason we should queue signal for
> hardware single step.  With this patch applied, GDBserver forward the
> signal to inferior and the program can stop correctly.
> 
> [1] PATCH: Multithreaded debugging for gdbserver
>      https://sourceware.org/ml/gdb-patches/2002-06/msg00157.html
> 

Because the signal handler might recurse and call the same code
that had the breakpoint (or some other removed breakpoint), and thus
we'd miss a breakpoint hit in the signal handler.

GDB / infrun.c handles it here:

      if (ecs->event_thread->prev_pc == stop_pc
	  && ecs->event_thread->control.trap_expected
	  && ecs->event_thread->control.step_resume_breakpoint == NULL)
	{
	  int was_in_line;

	  /* We were just starting a new sequence, attempting to
	     single-step off of a breakpoint and expecting a SIGTRAP.
	     Instead this signal arrives.  This signal will take us out
	     of the stepping range so GDB needs to remember to, when
	     the signal handler returns, resume stepping off that
	     breakpoint.  */
	  /* To simplify things, "continue" is forced to use the same
	     code paths as single-step - set a breakpoint at the
	     signal return address and then, once hit, step off that
	     breakpoint.  */

IIRC, some of sigstep.exp, signull.exp, signest.exp exercise this.

Note that this also lets all threads run while the signal
handler runs.

Thanks,
Pedro Alves
  
Pedro Alves March 11, 2016, 11:09 a.m. UTC | #2
On 03/11/2016 11:05 AM, Pedro Alves wrote:
> On 03/04/2016 10:44 AM, Yao Qi wrote:
>> GDBserver doesn't deliver signal when stepping over a breakpoint even
>> hardware single step is used.  When GDBserver started to step over
>> (thread creation) breakpoint for mutlit-threaded debugging in 2002 [1],
>> GDBserver behaves this way.
>>
>> This behaviour gets trouble on conditional breakpoints on branch to
>> self instruction like this,
>>
>>      0x00000000004005b6 <+29>:	jmp    0x4005b6 <main+29>
>>
>> and I set breakpoint
>>
>> $(gdb) break branch-to-self.c:43 if counter > 3
>>
>> and the variable counter will be set to 5 in SIGALRM signal handler.
>> Since GDBserver keeps stepping over breakpoint, the SIGALRM can never
>> be dequeued and delivered to the inferior, so the program can't stop.
>> The test can be found in gdb.base/branch-to-self.exp.
>>
>> I can understand why does GDBserver queue signal for software single
>> step, but I can't figure out a reason we should queue signal for
>> hardware single step.  With this patch applied, GDBserver forward the
>> signal to inferior and the program can stop correctly.
>>
>> [1] PATCH: Multithreaded debugging for gdbserver
>>       https://sourceware.org/ml/gdb-patches/2002-06/msg00157.html
>>
>
> Because the signal handler might recurse and call the same code
> that had the breakpoint (or some other removed breakpoint), and thus
> we'd miss a breakpoint hit in the signal handler.

Hmm, no, I got confused.  We'll stop in first instruction in the signal 
handler.  Let me go back and take a fresh look.

Thanks,
Pedro Alves
  
Pedro Alves March 11, 2016, 11:37 a.m. UTC | #3
On 03/11/2016 11:09 AM, Pedro Alves wrote:
> On 03/11/2016 11:05 AM, Pedro Alves wrote:
>> On 03/04/2016 10:44 AM, Yao Qi wrote:
>>> GDBserver doesn't deliver signal when stepping over a breakpoint even
>>> hardware single step is used.  When GDBserver started to step over
>>> (thread creation) breakpoint for mutlit-threaded debugging in 2002 [1],
>>> GDBserver behaves this way.
>>>
>>> This behaviour gets trouble on conditional breakpoints on branch to
>>> self instruction like this,
>>>
>>>      0x00000000004005b6 <+29>:    jmp    0x4005b6 <main+29>
>>>
>>> and I set breakpoint
>>>
>>> $(gdb) break branch-to-self.c:43 if counter > 3
>>>
>>> and the variable counter will be set to 5 in SIGALRM signal handler.
>>> Since GDBserver keeps stepping over breakpoint, the SIGALRM can never
>>> be dequeued and delivered to the inferior, so the program can't stop.
>>> The test can be found in gdb.base/branch-to-self.exp.
>>>
>>> I can understand why does GDBserver queue signal for software single
>>> step, but I can't figure out a reason we should queue signal for
>>> hardware single step.  With this patch applied, GDBserver forward the
>>> signal to inferior and the program can stop correctly.
>>>
>>> [1] PATCH: Multithreaded debugging for gdbserver
>>>       https://sourceware.org/ml/gdb-patches/2002-06/msg00157.html
>>>
>>
>> Because the signal handler might recurse and call the same code
>> that had the breakpoint (or some other removed breakpoint), and thus
>> we'd miss a breakpoint hit in the signal handler.
> 
> Hmm, no, I got confused.  We'll stop in first instruction in the signal 
> handler.  Let me go back and take a fresh look.

OK, trying to think of scenarios that might break.  Haven't actually
tried them, I may have missed something.

- If there's no handler installed at all (SIG_IGN), then we'll
  manage to step over the breakpoint successfully and stop at the
  next instruction.  So this case is not an issue.

- If there's a signal handler installed, we'll stop at its entry,
  but we haven't stepped over the original breakpoint yet.  If we
  were already stopped at the entry of the signal handler, and it
  nested, we'll find the program stopped at the same PC it had
  started at.  But we won't know whether the step-over finished
  successfully (could be a "jmp $pc" instruction), or if instead we're
  in a new handler invocation, and thus this is a new,
  separate breakpoint hit.

  A signal handler that segfaults in the first instruction
  would be the easiest way to reproduce this that I can think of.
  (If it's crashing, you'd expect the user might try putting a
  breakpoint there.)

  Now, if after stepping into the handler, we immediately reinsert the 
  breakpoint and continue, we'll retrap it, it ends up OK.

  But if we immediately instead start a new step-over for the same 
  breakpoint address, we will miss the new hit, and nest a new handler,
  on and on.

  Not sure which happens, but I suspect the latter.

  Maybe we could detect this by comparing before / after
  stack pointer.

Thanks,
Pedro Alves
  
Yao Qi March 16, 2016, 10:47 a.m. UTC | #4
Pedro Alves <palves@redhat.com> writes:

Hi Pedro,

> - If there's a signal handler installed, we'll stop at its entry,
>   but we haven't stepped over the original breakpoint yet.  If we
>   were already stopped at the entry of the signal handler, and it
>   nested, we'll find the program stopped at the same PC it had
>   started at.  But we won't know whether the step-over finished
>   successfully (could be a "jmp $pc" instruction), or if instead we're
>   in a new handler invocation, and thus this is a new,
>   separate breakpoint hit.

GDBserver doesn't have to tell the program stopped at the same PC
it had started at when step over is finished.  When step over, GDBserver
uninsert breakpoint, resumes the lwp, and wait the next event from the
lwp.  Once the event from the lwp arrives, GDBserver reinsert breakpoint
and thinks step over is finished.  That is all ...

>
>   A signal handler that segfaults in the first instruction
>   would be the easiest way to reproduce this that I can think of.
>   (If it's crashing, you'd expect the user might try putting a
>   breakpoint there.)

I write a case like this, set the conditional breakpoint on the first
instruction of handler, and looks breakpoint condition in target side
still works properly.

(gdb) disassemble handler
Dump of assembler code for function handler:
   0x0000000000400586 <+0>:     movb   $0x0,0x0
   0x000000000040058e <+8>:     retq

(gdb) p/x $rsp
$1 = 0x7fffffffde00
(gdb) b *handler if $rsp < 0x7fffffffde00 - 3300

the condition like this can guarantee that the condition will be true
after several recursions.

>
>   Now, if after stepping into the handler, we immediately reinsert the 
>   breakpoint and continue, we'll retrap it, it ends up OK.

breakpoint is reinserted after stepping into the handler, because
GDBserver thinks step over is finished, as I said above.

>   But if we immediately instead start a new step-over for the same 
>   breakpoint address, we will miss the new hit, and nest a new handler,
>   on and on.

This won't happen.  After hardware single step with signal delivered,
the program stops at the entry of handler, and GDBserver gets an event.
Then, GDBserver thinks step over is finished, and then proceed all lwps.
However, the thread still stopped at the breakpoint (of different
stack frames), so it needs step over again.  The loop like this will go
on until the breakpoint condition is true, need_step_over_p decides to
resume rather than step over.  The program will hit the breakpoint, and
GDBserver reports the event back to GDB.

Note in my experiment with the test case above, I don't let GDBserver
treat SIGSEGV as SIGTRAP, otherwise SIGSEGV will be reported back to GDB.
  
Pedro Alves March 17, 2016, 12:12 p.m. UTC | #5
On 03/16/2016 10:47 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
> Hi Pedro,
> 
>> - If there's a signal handler installed, we'll stop at its entry,
>>    but we haven't stepped over the original breakpoint yet.  If we
>>    were already stopped at the entry of the signal handler, and it
>>    nested, we'll find the program stopped at the same PC it had
>>    started at.  But we won't know whether the step-over finished
>>    successfully (could be a "jmp $pc" instruction), or if instead we're
>>    in a new handler invocation, and thus this is a new,
>>    separate breakpoint hit.
> 
> GDBserver doesn't have to tell the program stopped at the same PC
> it had started at when step over is finished.  When step over, GDBserver
> uninsert breakpoint, resumes the lwp, and wait the next event from the
> lwp.  Once the event from the lwp arrives, GDBserver reinsert breakpoint
> and thinks step over is finished.  That is all ...
> 
>>
>>    A signal handler that segfaults in the first instruction
>>    would be the easiest way to reproduce this that I can think of.
>>    (If it's crashing, you'd expect the user might try putting a
>>    breakpoint there.)
> 
> I write a case like this, set the conditional breakpoint on the first
> instruction of handler, and looks breakpoint condition in target side
> still works properly.
> 
> (gdb) disassemble handler
> Dump of assembler code for function handler:
>     0x0000000000400586 <+0>:     movb   $0x0,0x0
>     0x000000000040058e <+8>:     retq
> 
> (gdb) p/x $rsp
> $1 = 0x7fffffffde00
> (gdb) b *handler if $rsp < 0x7fffffffde00 - 3300
> 
> the condition like this can guarantee that the condition will be true
> after several recursions.
> 
>>
>>    Now, if after stepping into the handler, we immediately reinsert the
>>    breakpoint and continue, we'll retrap it, it ends up OK.
> 
> breakpoint is reinserted after stepping into the handler, because
> GDBserver thinks step over is finished, as I said above.
> 
>>    But if we immediately instead start a new step-over for the same
>>    breakpoint address, we will miss the new hit, and nest a new handler,
>>    on and on.
> 
> This won't happen.  After hardware single step with signal delivered,
> the program stops at the entry of handler, and GDBserver gets an event.
> Then, GDBserver thinks step over is finished, and then proceed all lwps.
> However, the thread still stopped at the breakpoint (of different
> stack frames), so it needs step over again. 

> The loop like this will go
> on until the breakpoint condition is true, need_step_over_p decides to
> resume rather than step over.  The program will hit the breakpoint, and
> GDBserver reports the event back to GDB.
> 
> Note in my experiment with the test case above, I don't let GDBserver
> treat SIGSEGV as SIGTRAP, otherwise SIGSEGV will be reported back to GDB.
> 

OK.

We need to think of unconditional tracepoints as well,
not just conditional breakpoints, however.

Here's a likely scenario, that I think will happen:

#1 - tracepoint set at 0xf00.
#2 - Program stops at 0xf00, tracepoint is collected.
#3 - gdbserver starts a step-over
#4 - instead, a signal arrives, step-over cancelled.
#5 - signal should be passed, not reported to gdb.
#6 - gdbserver starts a new step-over, passing signal as well.
#7 - step lands in handler. step over finished.
#8 - gdbserver proceeds/continues
#9 - signal handler returns, again to 0xf00, tracepoint
     traps again and is thus collected again. This is a
     spurious collection.

gdb avoids the spurious double-hit in #9 by remembering that
it was stepping over a breakpoint when a signal arrived, in
order to continue the step over once the handler returns
(tp->step_after_step_resume_breakpoint).  This only works if the handler 
doesn't cause some other unrelated user-visible stop -- in that case,
then the next time the user continues the program, and the
handler returns to the original breakpoint address, we'll still
report a new breakpoint hit.  I think.

Maybe we can just not bother and live with the spurious double
tracepoint hit/collect in gdbserver.  (Likewise dprintf.)
Dunno, but I'm now starting to lean toward that.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index dcf58db..2330e67 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -4119,11 +4119,12 @@  single_step (struct lwp_info* lwp)
 }
 
 /* The signal can not be delivered to the inferior if we are trying to
-   reinsert a breakpoint or we're trying to finish a fast tracepoint
-   collect.  */
+   reinsert a breakpoint for software single step or we're trying to
+   finish a fast tracepoint collect.  */
 
-#define LWP_SIGNAL_CAN_BE_DELIVERED(LWP) \
-  !((LWP)->bp_reinsert != 0 || (LWP)->collecting_fast_tracepoint)
+#define LWP_SIGNAL_CAN_BE_DELIVERED(LWP)			\
+  !(((LWP)->bp_reinsert != 0 && can_software_single_step ())	\
+    || (LWP)->collecting_fast_tracepoint)
 
 /* Resume execution of LWP.  If STEP is nonzero, single-step it.  If
    SIGNAL is nonzero, give it that signal.  */