RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS

Message ID 20141213154638.GK5457@adacore.com
State New, archived
Headers

Commit Message

Joel Brobecker Dec. 13, 2014, 3:46 p.m. UTC
  Hi Pedro,

> The main issue is that we're trying to move the thread past a
> breakpoint.  Barring displaced stepping support, to move the
> thread past the breakpoint, we have to remove the breakpoint from
> the target temporarily.  But then we _cannot_ resume other threads
> but the one that is stopped at the breakpoint, because then those
> other threads could fly by the removed breakpoint and miss it.

Attached is a patch that does just that, tested on ppc-lynx5 and
ppc-lynx178.  I waited a while before posting it here, because
I wanted to put it in observation for a while first...

gdb/gdbserver/ChangeLog:

        * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
        Remove FIXME comment about assumption about N.

OK to commit?

Note that parallel to that, I came across another issue, which I am
going to call a limitation for now: consider the case where we have
2 threads, A and B, and we are tring to next/step some code in thread
A. While doing so, thread B receives a signal, and therefore reports
it to GDB. GDB sees that this signal is configured as
nostop/noprint/pass, so presumably, you would think that we'd resume
the inferior passing that signal to thread B. However, how do you do
that while at the same time stepping thread A?

IIRC, what happens currently in this case is that GDB keeps trying
to resume/step thread A, and the kernel keeps telling GDB "no,
thread B just received a signal", and so GDB and the kernel go
into that infinite loop where nothing advances. I'm not quite sure
why we keep getting the signal for thread B, if it's a new signal
each time, or if it's about the signal not being passed back (the
program I saw this in is fairly large and complicated).

In any case, I don't see how we could improve this situation
without settting sss-like breakpoints... Something I'm not really
eager to do, at least for now, since "set scheduler-locking step"
seems to work around the issue.

Thanks!
  

Comments

Pedro Alves Dec. 15, 2014, 1:11 p.m. UTC | #1
On 12/13/2014 03:46 PM, Joel Brobecker wrote:
> Hi Pedro,
> 
>> The main issue is that we're trying to move the thread past a
>> breakpoint.  Barring displaced stepping support, to move the
>> thread past the breakpoint, we have to remove the breakpoint from
>> the target temporarily.  But then we _cannot_ resume other threads
>> but the one that is stopped at the breakpoint, because then those
>> other threads could fly by the removed breakpoint and miss it.
> 
> Attached is a patch that does just that, tested on ppc-lynx5 and
> ppc-lynx178.  I waited a while before posting it here, because
> I wanted to put it in observation for a while first...
> 
> gdb/gdbserver/ChangeLog:
> 
>         * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
>         Remove FIXME comment about assumption about N.
> 
> OK to commit?

Sure, OK.

> 
> Note that parallel to that, I came across another issue, which I am
> going to call a limitation for now: consider the case where we have
> 2 threads, A and B, and we are tring to next/step some code in thread
> A. While doing so, thread B receives a signal, and therefore reports
> it to GDB. GDB sees that this signal is configured as
> nostop/noprint/pass, so presumably, you would think that we'd resume
> the inferior passing that signal to thread B. However, how do you do
> that while at the same time stepping thread A?

GDB nowadays sends a single vCont packet that both steps thread A,
continues thread B with a signal and continues all other threads with
no signal (previously in some cases it'd just lose control of the
inferior, or deliver the signal to the wrong thread).  Something like:

  vCont;s:A;C SIG:B;c

See the switch_back_to_stepped_thread calls within:

  if (random_signal)
    {

at the tail end of handle_signal_stop, and
remote.c:append_pending_thread_resumptions.

There are tests in the testsuite that result in packets
just like that.

> 
> IIRC, what happens currently in this case is that GDB keeps trying
> to resume/step thread A, and the kernel keeps telling GDB "no,
> thread B just received a signal", and so GDB and the kernel go
> into that infinite loop where nothing advances. I'm not quite sure
> why we keep getting the signal for thread B, if it's a new signal
> each time, or if it's about the signal not being passed back (the
> program I saw this in is fairly large and complicated).
> 
> In any case, I don't see how we could improve this situation
> without settting sss-like breakpoints... Something I'm not really
> eager to do, at least for now, since "set scheduler-locking step"
> seems to work around the issue.

Couldn't you iterate over the threads, and use PTRACE_STEP_ONE
for the stepped threads, and PTRACE_CONT_ONE for the others,
instead of PTRACE_CONT ?  For the case above, lynx_resume would
end up issuing:

 PTRACE_STEP_ONE, thread A, sig 0
 PTRACE_CONT_ONE, thread B, sig SIG
 PTRACE_CONT_ONE, thread C, sig 0
 PTRACE_CONT_ONE, thread D, sig 0
 ...

Otherwise, yeah, sounds like handling the step request with
breakpoints instead might be the solution.

Thanks,
Pedro Alves
  
Joel Brobecker Dec. 15, 2014, 2:58 p.m. UTC | #2
> > gdb/gdbserver/ChangeLog:
> > 
> >         * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
> >         Remove FIXME comment about assumption about N.
> > 
> > OK to commit?
> 
> Sure, OK.

Thank you, pushed!

> GDB nowadays sends a single vCont packet that both steps thread A,
> continues thread B with a signal and continues all other threads with
> no signal (previously in some cases it'd just lose control of the
> inferior, or deliver the signal to the wrong thread).  Something like:
> 
>   vCont;s:A;C SIG:B;c
[...]
> Couldn't you iterate over the threads, and use PTRACE_STEP_ONE
> for the stepped threads, and PTRACE_CONT_ONE for the others,
> instead of PTRACE_CONT ?  For the case above, lynx_resume would
> end up issuing:
> 
>  PTRACE_STEP_ONE, thread A, sig 0
>  PTRACE_CONT_ONE, thread B, sig SIG
>  PTRACE_CONT_ONE, thread C, sig 0
>  PTRACE_CONT_ONE, thread D, sig 0

Interesting. Do you mean sending those requests without waiting
for the inferior to stop? I'd have to verify that it's possible
to send ptrace requests while the inferior is "in flight", but
wouldn't you then have possible race conditions?
  
Pedro Alves Dec. 15, 2014, 4:01 p.m. UTC | #3
On 12/15/2014 02:58 PM, Joel Brobecker wrote:

>> GDB nowadays sends a single vCont packet that both steps thread A,
>> continues thread B with a signal and continues all other threads with
>> no signal (previously in some cases it'd just lose control of the
>> inferior, or deliver the signal to the wrong thread).  Something like:
>>
>>   vCont;s:A;C SIG:B;c
> [...]
>> Couldn't you iterate over the threads, and use PTRACE_STEP_ONE
>> for the stepped threads, and PTRACE_CONT_ONE for the others,
>> instead of PTRACE_CONT ?  For the case above, lynx_resume would
>> end up issuing:
>>
>>  PTRACE_STEP_ONE, thread A, sig 0
>>  PTRACE_CONT_ONE, thread B, sig SIG
>>  PTRACE_CONT_ONE, thread C, sig 0
>>  PTRACE_CONT_ONE, thread D, sig 0
> 
> Interesting. Do you mean sending those requests without waiting
> for the inferior to stop?

Yes.  This is what we do e.g., on Linux.  It just sounds like
Lynx's PTRACE_CONT_ONE is like Linux's PTRACE_CONT.  Linux has
no equivalent of Lynx's PTRACE_CONT (resume all threads with
a single request).

> I'd have to verify that it's possible
> to send ptrace requests while the inferior is "in flight", but
> wouldn't you then have possible race conditions?

Not sure what sort of race conditions you mean, but keep in mind
that I'm pretty clueless about Lynx.  :-)

Thanks,
Pedro Alves
  

Patch

From ea7e173463120d24417a7706f98fff850f9aaa1a Mon Sep 17 00:00:00 2001
From: Joel Brobecker <brobecker@adacore.com>
Date: Tue, 25 Nov 2014 11:12:10 -0500
Subject: [PATCH] [gdbserver/lynxos] Use PTRACE_SINGLESTEP_ONE when
 single-stepping one thread.

Currently, when we receive a request to single-step one single thread
(Eg, when single-stepping out of a breakpoint), we use the
PTRACE_SINGLESTEP pthread request, which does single-step
the corresponding thread, but also resumes execution of all
other threads in the inferior.

This causes problems when debugging programs where another thread
receives multiple debug events while trying to single-step a specific
thread out of a breakpoint (with infrun traces turned on):

    (gdb) continue
    Continuing.
    infrun: clear_proceed_status_thread (Thread 126)
    [...]
    infrun: clear_proceed_status_thread (Thread 142)
    [...]
    infrun: clear_proceed_status_thread (Thread 146)
    infrun: clear_proceed_status_thread (Thread 125)
    infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
    infrun: wait_for_inferior ()
    infrun: target_wait (-1, status) =
    infrun:   42000 [Thread 146],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x10a187f4
    infrun: context switch
    infrun: Switching context from Thread 142 to Thread 146
    infrun: random signal (GDB_SIGNAL_REALTIME_34)
    infrun: switching back to stepped thread
    infrun: Switching context from Thread 146 to Thread 142
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
    infrun: prepare_to_wait
    [...handling of similar events for threads 145, 144 and 143 snipped...]
    infrun: prepare_to_wait
    infrun: target_wait (-1, status) =
    infrun:   42000 [Thread 146],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x10a187f4
    infrun: context switch
    infrun: Switching context from Thread 142 to Thread 146
    ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.

What happens is that GDB keeps sending requests to resume one specific
thread, and keeps receiving debugging events for other threads.
Things break down when the one of the other threads receives a debug
event for the second time (thread 146 in the example above).

This patch fixes the problem by making sure that only one thread
gets resumed, thus preventing the other threads from generating
an unexpected event.

gdb/gdbserver/ChangeLog:

        * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
        Remove FIXME comment about assumption about N.
---
 gdb/gdbserver/lynx-low.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gdb/gdbserver/lynx-low.c b/gdb/gdbserver/lynx-low.c
index 6178e03..3b83669 100644
--- a/gdb/gdbserver/lynx-low.c
+++ b/gdb/gdbserver/lynx-low.c
@@ -320,10 +320,11 @@  lynx_attach (unsigned long pid)
 static void
 lynx_resume (struct thread_resume *resume_info, size_t n)
 {
-  /* FIXME: Assume for now that n == 1.  */
   ptid_t ptid = resume_info[0].thread;
-  const int request = (resume_info[0].kind == resume_step
-                       ? PTRACE_SINGLESTEP : PTRACE_CONT);
+  const int request
+    = (resume_info[0].kind == resume_step
+       ? (n == 1 ? PTRACE_SINGLESTEP_ONE : PTRACE_SINGLESTEP)
+       : PTRACE_CONT);
   const int signal = resume_info[0].sig;
 
   /* If given a minus_one_ptid, then try using the current_process'
-- 
1.9.1