[RFA/7.8] user breakpoint not inserted if software-single-step at same location

Message ID 1401394280-14999-1-git-send-email-brobecker@adacore.com
State Superseded
Headers

Commit Message

Joel Brobecker May 29, 2014, 8:11 p.m. UTC
  Hello,

with the following code...

    12    Nested;   -- break #1
    13    return I; -- break #2
    14  end;

(line 12 is a call to function Nested)

... we have noticed the following errorneous behavior on ppc-aix,
where, after having inserted a breakpoint at line 12 and line 13,
and continuing from the breakpoint at line 12, the program never
stops at line 13, running away until the program terminates:

    % gdb -q func
    (gdb) b func.adb:12
    Breakpoint 1 at 0x10000a24: file func.adb, line 12.
    (gdb) b func.adb:13
    Breakpoint 2 at 0x10000a28: file func.adb, line 13.
    (gdb) run
    Starting program: /[...]/func

    Breakpoint 1, func () at func.adb:12
    12        Nested;   -- break #1
    (gdb) c
    Continuing.
    [Inferior 1 (process 4128872) exited with code 02]

When resuming from the first breakpoint, GDB first tries to step
out of that first breakpoint. We rely on software single-stepping
on this platform, and it just so happens that the address of the
first software single-step breakpoint is the same as the user's
breakpoint #2 (0x10000a28). So, with infrun and target traces
turned on (but uninteresting traces snip'ed off), the "continue"
operation looks like this:

    (gdb) c
    ### First, we insert the user breakpoints (the second one is an internal
    ### breakpoint on __pthread_init). The first user breakpoint is not
    ### inserted as we need to step out of it first.
    target_insert_breakpoint (0x0000000010000a28, xxx) = 0
    target_insert_breakpoint (0x00000000d03f3800, xxx) = 0
    ### Then we proceed with the step-out-of-breakpoint...
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [process 15335610] at 0x10000a24
    ### That's when we insert the SSS breakpoints...
    target_insert_breakpoint (0x0000000010000a28, xxx) = 0
    target_insert_breakpoint (0x00000000100009ac, xxx) = 0
    ### ... then let the inferior resume...
    target_resume (15335610, continue, 0)
    infrun: wait_for_inferior ()
    target_wait (-1, status, options={}) = 15335610,   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: target_wait (-1, status) =
    infrun:   15335610 [process 15335610],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x100009ac
    ### At this point, we stopped at the second SSS breakpoint...
    target_stopped_by_watchpoint () = 0
    ### We remove the SSS breakpoints...
    target_remove_breakpoint (0x0000000010000a28, xxx) = 0
    target_remove_breakpoint (0x00000000100009ac, xxx) = 0
    target_stopped_by_watchpoint () = 0
    ### We find that we're not done, so we resume....
    infrun: no stepping, continue
    ### And thus insert the user breakpoints again, except we're not
    ### inserting the second breakpoint?!?
    target_insert_breakpoint (0x0000000010000a24, xxx) = 0
    infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [process 15335610] at 0x100009ac
    target_resume (-1, continue, 0)
    infrun: prepare_to_wait
    target_wait (-1, status, options={}) = 15335610,   status->kind = exited, status = 2

What happens is that the removal of the software single-step breakpoints
effectively removed the breakpoint instruction from inferior memory.
But because such breakpoints are inserted directly as raw breakpoints
rather than through the normal chain of breakpoints, we fail to notice
that one of the user breakpoints points to the same address and that
this user breakpoint is therefore effectively un-inserted. When
resuming after the single-step, GDB thinks that the user breakpoint
is still inserted and therefore does not need to insert it again.

This patch fixes the problem by avoiding the creation of a raw
breakpoint for any of the software single-step breakpoints when
their address correspond to a user breakpoint which has already
been inserted and whose address matches. The rest of the patch
adjusts the code removing the software single-step breakpoints
to take this possibility into account. As it happens, it simplifies
the code a little bit.

This patch does make one assumption, however, which is the fact
that user breakpoints get inserted before software single-step
breakpoints. I think this is a fair assumption, as software
single-step breakpoints get created as part of the target-step
operation (resume with step=1), which logically can only happen
after we've inserted all breakpoints.

For the record, this behavior started appearing after commit
31e77af205cf6564c2bf4c18400b4ca16bdf92cd (PR breakpoints/7143 -
Watchpoint does not trigger when first set) was applied, but
I have not checked whether it was really directly related to
that commit, or a latent issue.

gdb/ChangeLog:

        PR breakpoints/7143
        * breakpoint.c (non_sss_software_breakpoint_inserted_here_p):
        New function, extracted from software_breakpoint_inserted_here_p.
        (software_breakpoint_inserted_here_p): Remove factored out code
        by call to non_sss_software_breakpoint_inserted_here_p.
        (insert_single_step_breakpoint): Do nothing if a non-software-
        single-step breakpoint was already inserted at the same
        address.
        (remove_single_step_breakpoints): Adjust to take into account
        the fact that the first software single-step may not have been
        inserted.

Tested on ppc-aix with AdaCore's testsuite. Tested on x86_64-linux
with the official testsuite.

OK to commit?

Also, due to the nature of the regression (compared to 7.7), I think
we should wait for a resolution of this issue before creating the
gdb 7.8 release branch.

Thanks!
  

Comments

Pedro Alves May 29, 2014, 11:17 p.m. UTC | #1
On 05/29/2014 09:11 PM, Joel Brobecker wrote:
> Hello,
> 
> with the following code...
> 
>     12    Nested;   -- break #1
>     13    return I; -- break #2
>     14  end;
> 
> (line 12 is a call to function Nested)
> 
> ... we have noticed the following errorneous behavior on ppc-aix,

"erroneous"

> This patch does make one assumption, however, which is the fact
> that user breakpoints get inserted before software single-step
> breakpoints. I think this is a fair assumption, as software
> single-step breakpoints get created as part of the target-step
> operation (resume with step=1), which logically can only happen
> after we've inserted all breakpoints.

Async/background execution breaks that assumption though.

E.g., say you do:

 (gdb) b *0xsame_addr_as_sss_bkpt
 (gdb) s &
 (gdb) del

That "del" runs while the target is single-stepping.  And it
might just delete the breakpoint that was at the same address
as a single-step breakpoint, before the single-step breakpoint
traps or gdb collects the trap.

If you try that on native GNU/Linux, it'll fail because
then GDB can't poke at memory while the program is running.
You can still trigger it with using two threads:

 (gdb) b *0xsame_addr_as_sss_bkpt
 (gdb) set scheduler-locking on
 (gdb) thread 1
 (gdb) s &
 (gdb) thread 2  // this thread is stopped, so we can poke at memory.
 (gdb) del

That's why I thought it'd be easier to do this in the "remove" path.

We can still bypass actually inserting the sss breakpoint in
the "insert" path if there's already another breakpoint there,
but we'll need to create/clone the location and its shadow buffer,
and then still handle the issue in the "remove" path.  I'm now
thinking that indeed we should implement that optimization, but not
for efficiency (this being a corner case, it's doubtful it
matters), but to cater for remote targets that might not handle
duplicate Z0 packets well.  They're supposed to be handled in
an idempotent manner, but even GDBserver had related issues
recently.

Oh, _but_!  If the target supports breakpoint evaluating
breakpoint conditions itself (target_supports_evaluation_of_breakpoint_conditions),
such as GDBserver, then we _need_ to send the duplicate
software single-step Z0 breakpoint, in case the non-sss breakpoint
that is already inserted at the same address was conditional, otherwise
the target is only going to report the breakpoint hit if the
non-sss breakpoint's condition evaluates true, while we need the
sss breakpoint to be unconditional.  (In this case we know for
sure that the target knows what to do with the duplicate Z0
packets, it's a requirement of the feature.)

In sum, in the "insert" path:

 - if !target_supports_evaluation_of_breakpoint_conditions; then
     optimize out the sss breakpoint if there's already a
     non-sss breakpoint inserted at the same address.
   else
     make sure to resend/reinsert the breakpoint sss breakpoint,
     even if there's already a non-sss in place, in case that other
     breakpoint was conditional.
   fi

And in the "remove" path:

 - if there's still a non-sss breakpoint inserted at the
   same address, then don't actually remove the breakpoint
   off of the target, just wipe it from gdb's list.

> Also, due to the nature of the regression (compared to 7.7), I think
> we should wait for a resolution of this issue before creating the
> gdb 7.8 release branch.

I agree.

Thanks,
  
Joel Brobecker May 30, 2014, 12:22 p.m. UTC | #2
Hi Pedro,

Thanks for the super quick review!

> Async/background execution breaks that assumption though.

Hmmm, you are right. I had a feeling that this assumption was
going to come back to bite us one day. I didn't realize that
it was going to be today! ;-)

> We can still bypass actually inserting the sss breakpoint in
> the "insert" path if there's already another breakpoint there,
> but we'll need to create/clone the location and its shadow buffer,
> and then still handle the issue in the "remove" path.
[...]
> In sum, in the "insert" path:
> 
>  - if !target_supports_evaluation_of_breakpoint_conditions; then
>      optimize out the sss breakpoint if there's already a
>      non-sss breakpoint inserted at the same address.
>    else
>      make sure to resend/reinsert the breakpoint sss breakpoint,
>      even if there's already a non-sss in place, in case that other
>      breakpoint was conditional.
>    fi
> 
> And in the "remove" path:
> 
>  - if there's still a non-sss breakpoint inserted at the
>    same address, then don't actually remove the breakpoint
>    off of the target, just wipe it from gdb's list.

It seems to me that we'd need to merge your initial recommendation
into your summary above, right? Otherwise, wouldn't we fail in
the async example you provided? Actually, wouldn't it fail
regardless? Even if we inserted the SSS breakpoint, when the user
deletes his breakpoints, since the breakpoint chain doesn't know
about the SSS breakpoint, wouldn't it remove our SSS breakpoint
insn?

I am wondering whether the simpler approach that you initially
suggested, which is to just handle the issue in the "remove"
path for 7.8 wouldn't be a little safer, while we also look
at actually enhancing SSS breakpoints via the normal breakpoint
chain. I am wondering what's going to be needed for that...

WDYT?

Thanks!
  
Pedro Alves May 30, 2014, 12:51 p.m. UTC | #3
On 05/30/2014 01:22 PM, Joel Brobecker wrote:

>> And in the "remove" path:
>>
>>  - if there's still a non-sss breakpoint inserted at the
>>    same address, then don't actually remove the breakpoint
>>    off of the target, just wipe it from gdb's list.
> 
> It seems to me that we'd need to merge your initial recommendation
> into your summary above, right?

I admit I don't know what recommendation you're referring to.  :-)

 Otherwise, wouldn't we fail in
> the async example you provided? Actually, wouldn't it fail
> regardless? Even if we inserted the SSS breakpoint, when the user
> deletes his breakpoints, since the breakpoint chain doesn't know
> about the SSS breakpoint, wouldn't it remove our SSS breakpoint
> insn?

Ah, yes.  We'd need the mirror treatment in bkpt_remove_location.

> 
> I am wondering whether the simpler approach that you initially
> suggested, which is to just handle the issue in the "remove"
> path for 7.8 wouldn't be a little safer

For 7.8, I'm thinking it's really safer to avoid resending
duplicate Z0 packets to stubs that don't support conditional
breakpoint evaluation on the target end.  So I think we should
handle the "insert" path too.

BTW, did you try creating a test for the issue you discovered?
I don't think we have anything that triggers this already in
the tree, otherwise I think I'd have seen it with my
software-single-step-on-x86 branch.

> , while we also look
> at actually enhancing SSS breakpoints via the normal breakpoint
> chain. I am wondering what's going to be needed for that...

I have done that at least two or three times before, and I was
never that happy with the result.  This was before the patch
that led to this regression, and that one and the surrounding
patches were actually result of exactly wanting to modernize
software single-step.  :-)  The main issue is that SSS breakpoints
and all other breakpoints need to be inserted/removed at
different times, and that is surprisingly tricky to handle.
But I'm hoping/assuming the codebase is a little bit more
ready now for the next attempt.  My main motivation is to
be able to enable non-stop without forcing displaced-stepping
for all single-steps (even those that don't step over a bkpt).
  
Joel Brobecker May 30, 2014, 1:26 p.m. UTC | #4
> >>  - if there's still a non-sss breakpoint inserted at the
> >>    same address, then don't actually remove the breakpoint
> >>    off of the target, just wipe it from gdb's list.
> > 
> > It seems to me that we'd need to merge your initial recommendation
> > into your summary above, right?
> 
> I admit I don't know what recommendation you're referring to.  :-)

Sorry! This one:

    | but we'll need to create/clone the location and its shadow buffer,
    | and then still handle the issue in the "remove" path.

> For 7.8, I'm thinking it's really safer to avoid resending
> duplicate Z0 packets to stubs that don't support conditional
> breakpoint evaluation on the target end.  So I think we should
> handle the "insert" path too.

OK - I will take care of that.

> BTW, did you try creating a test for the issue you discovered?
> I don't think we have anything that triggers this already in
> the tree, otherwise I think I'd have seen it with my
> software-single-step-on-x86 branch.

I am wondering how to create that test, because it would be
a little tricky. We need to set ourselves into a situation
where we single-step out of a breakpoint with the second SSS
breakpoint being at the same address as one of the user breakpoints,
that second SSS not being the one that gets hit during that
first single-step-out-of-breakpoint.  The only way I can see
to achieve that, at the moment, is with a function call.
The only reliable way to do that, I think, is with an assembly
file, which means it'd be limited to a certain architecture.

> I have done that at least two or three times before, and I was
> never that happy with the result.  This was before the patch
> that led to this regression, and that one and the surrounding
> patches were actually result of exactly wanting to modernize
> software single-step.  :-)  The main issue is that SSS breakpoints
> and all other breakpoints need to be inserted/removed at
> different times, and that is surprisingly tricky to handle.
> But I'm hoping/assuming the codebase is a little bit more
> ready now for the next attempt.  My main motivation is to
> be able to enable non-stop without forcing displaced-stepping
> for all single-steps (even those that don't step over a bkpt).

Ugh! Much trickier than I thought! :-(

So, to summarize, I'll start by working on a new patch, which I'll
send here. I'll also try to put the new testcase on the list, but
today is a little bit of a full day for me, so that part might have
to wait until Monday depending on how well my day goes.

Thanks, Pedro!
  

Patch

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index 676c7b8..f9dc413 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -4153,12 +4153,12 @@  breakpoint_inserted_here_p (struct address_space *aspace, CORE_ADDR pc)
   return 0;
 }
 
-/* This function returns non-zero iff there is a software breakpoint
-   inserted at PC.  */
+/* Ignoring software single-step breakpoints, return non-zero iff
+   there is a software breakpoint inserted at PC.  */
 
-int
-software_breakpoint_inserted_here_p (struct address_space *aspace,
-				     CORE_ADDR pc)
+static int
+non_sss_software_breakpoint_inserted_here_p (struct address_space *aspace,
+					     CORE_ADDR pc)
 {
   struct bp_location *bl, **blp_tmp;
 
@@ -4180,6 +4180,19 @@  software_breakpoint_inserted_here_p (struct address_space *aspace,
 	}
     }
 
+  return 0;
+}
+
+/* This function returns non-zero iff there is a software breakpoint
+   inserted at PC.  */
+
+int
+software_breakpoint_inserted_here_p (struct address_space *aspace,
+				     CORE_ADDR pc)
+{
+  if (non_sss_software_breakpoint_inserted_here_p (aspace, pc))
+    return 1;
+
   /* Also check for software single-step breakpoints.  */
   if (single_step_breakpoint_inserted_here_p (aspace, pc))
     return 1;
@@ -15149,6 +15162,13 @@  insert_single_step_breakpoint (struct gdbarch *gdbarch,
 {
   void **bpt_p;
 
+  /* If a non-software-single-step breakpoint is inserted at the same
+     location as our next_pc, no need to insert an additional
+     raw breakpoint.  */
+
+  if (non_sss_software_breakpoint_inserted_here_p (aspace, next_pc))
+    return;
+
   if (single_step_breakpoints[0] == NULL)
     {
       bpt_p = &single_step_breakpoints[0];
@@ -15189,22 +15209,18 @@  single_step_breakpoints_inserted (void)
 void
 remove_single_step_breakpoints (void)
 {
-  gdb_assert (single_step_breakpoints[0] != NULL);
-
-  /* See insert_single_step_breakpoint for more about this deprecated
-     call.  */
-  deprecated_remove_raw_breakpoint (single_step_gdbarch[0],
-				    single_step_breakpoints[0]);
-  single_step_gdbarch[0] = NULL;
-  single_step_breakpoints[0] = NULL;
+  int i;
 
-  if (single_step_breakpoints[1] != NULL)
-    {
-      deprecated_remove_raw_breakpoint (single_step_gdbarch[1],
-					single_step_breakpoints[1]);
-      single_step_gdbarch[1] = NULL;
-      single_step_breakpoints[1] = NULL;
-    }
+  for (i = 0; i < 2; i++)
+    if (single_step_breakpoints[i] != NULL)
+      {
+	/* See insert_single_step_breakpoint for more about
+	   this deprecated call.  */
+	deprecated_remove_raw_breakpoint (single_step_gdbarch[i],
+					  single_step_breakpoints[i]);
+	single_step_gdbarch[i] = NULL;
+	single_step_breakpoints[i] = NULL;
+      }
 }
 
 /* Delete software single step breakpoints without removing them from