Fix problems with finishing a dummy function call on simulators.

Message ID 1433862056-18237-1-git-send-email-lgustavo@codesourcery.com
State Superseded
Headers

Commit Message

Luis Machado June 9, 2015, 3 p.m. UTC
  This is in line with what was done by Joel's patch here:

https://sourceware.org/ml/gdb-patches/2014-11/msg00478.html

And it also answers Pedro's question about whether this is specific to SPARC
QEMU or not.  This indeed seems to affect multiple QEMU targets and also other
simulators (proprietary).

I ran into this weird issue of not being able to "finish" an inferior function
call. It looks as if the program is running away, but it really is stuck
somewhere.  "finish" still works fine for regular functions not called manually
by GDB.

I tracked this failure down to GDB having both a bp_call_dummy and bp_finish in
its breakpoint list. As a result of one not being considered permanent and the
other considered permanent, GDB will not issue a Z packet to force the insertion
of that location's breakpoint, confusing the simulator that does not know how
to deal properly with these permanent breakpoints that GDB inserted beforehand.

The attached patch fixes this, though i'm inclined to say we could probably
check if both bp_call_dummy and bp_finish are present and force the
insertion of that location's breakpoint. It isn't clear to me where exactly that
check would go or if it would be cleaner than checking that information in
the same function Joel used.

I see no regressions on x86-64 and it fixes a bunch of failures for simulator
targets we use (MIPS and PowerPC to name two).

gdb/ChangeLog:

2015-06-09  Luis Machado  <lgustavo@codesourcery.com>

	* breakpoint.c (bp_loc_is_permanent): Return 0 for bp_finish
	as well.
---
 gdb/breakpoint.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
  

Comments

Pedro Alves June 9, 2015, 5:51 p.m. UTC | #1
On 06/09/2015 04:00 PM, Luis Machado wrote:
> This is in line with what was done by Joel's patch here:
> 
> https://sourceware.org/ml/gdb-patches/2014-11/msg00478.html
> 
> And it also answers Pedro's question about whether this is specific to SPARC
> QEMU or not.  This indeed seems to affect multiple QEMU targets and also other
> simulators (proprietary).

Sounds like a different issue, although related.

> 
> I ran into this weird issue of not being able to "finish" an inferior function
> call. It looks as if the program is running away, but it really is stuck
> somewhere.  "finish" still works fine for regular functions not called manually
> by GDB.

Sounds like that would fail on SPARC qemu as well.

> 
> I tracked this failure down to GDB having both a bp_call_dummy and bp_finish in
> its breakpoint list. As a result of one not being considered permanent and the
> other considered permanent, GDB will not issue a Z packet to force the insertion
> of that location's breakpoint, confusing the simulator that does not know how
> to deal properly with these permanent breakpoints that GDB inserted beforehand.
> 
> The attached patch fixes this, though i'm inclined to say we could probably
> check if both bp_call_dummy and bp_finish are present and force the
> insertion of that location's breakpoint.  It isn't clear to me where exactly that
> check would go or if it would be cleaner than checking that information in
> the same function Joel used.
> 
> I see no regressions on x86-64 and it fixes a bunch of failures for simulator
> targets we use (MIPS and PowerPC to name two).

If it happens that you "finish" from a normal function, and the finish
breakpoint ends up on top of a real permanent breakpoint, then this patch
will make us end up inserting a breakpoint on top of that permanent
breakpoint.  I don't see what's special about finish breakpoints;
it's the address (dummy breakpoint location) that is special.  It very much
sounds like that any kind of breakpoint that is placed on top of the dummy
breakpoint ends up with the same issue.  E.g., if you stepi out of
the called function, with a software single-step breakpoint, sounds like
GDB will miss inserting the software step breakpoint because that's
at the same address as the dummy breakpoint.

As a data point, I assume that GDB is considering the non-permanent
dummy breakpoint a duplicate of the permanent finish breakpoint and
then none ends up inserted.  Is that right?

Not exactly sure what to do here.  Maybe we should stop considering
permanent and non-permanent breakpoints at the same address as
duplicates.  That should result in GDB inserting the non-permanent
one, I think.  Or we could get stop marking permanent breakpoints
as always inserted, and let normal breakpoints insert on top of
permanent breakpoints normally.  See also:
 https://sourceware.org/ml/gdb-patches/2015-03/msg00174.html

Thanks,
Pedro Alves
  
Luis Machado June 9, 2015, 6:10 p.m. UTC | #2
On 06/09/2015 02:51 PM, Pedro Alves wrote:
> On 06/09/2015 04:00 PM, Luis Machado wrote:
>> This is in line with what was done by Joel's patch here:
>>
>> https://sourceware.org/ml/gdb-patches/2014-11/msg00478.html
>>
>> And it also answers Pedro's question about whether this is specific to SPARC
>> QEMU or not.  This indeed seems to affect multiple QEMU targets and also other
>> simulators (proprietary).
>
> Sounds like a different issue, although related.
>
>>
>> I ran into this weird issue of not being able to "finish" an inferior function
>> call. It looks as if the program is running away, but it really is stuck
>> somewhere.  "finish" still works fine for regular functions not called manually
>> by GDB.
>
> Sounds like that would fail on SPARC qemu as well.
>
>>
>> I tracked this failure down to GDB having both a bp_call_dummy and bp_finish in
>> its breakpoint list. As a result of one not being considered permanent and the
>> other considered permanent, GDB will not issue a Z packet to force the insertion
>> of that location's breakpoint, confusing the simulator that does not know how
>> to deal properly with these permanent breakpoints that GDB inserted beforehand.
>>
>> The attached patch fixes this, though i'm inclined to say we could probably
>> check if both bp_call_dummy and bp_finish are present and force the
>> insertion of that location's breakpoint.  It isn't clear to me where exactly that
>> check would go or if it would be cleaner than checking that information in
>> the same function Joel used.
>>
>> I see no regressions on x86-64 and it fixes a bunch of failures for simulator
>> targets we use (MIPS and PowerPC to name two).
>
> If it happens that you "finish" from a normal function, and the finish
> breakpoint ends up on top of a real permanent breakpoint, then this patch
> will make us end up inserting a breakpoint on top of that permanent
> breakpoint.  I don't see what's special about finish breakpoints;
> it's the address (dummy breakpoint location) that is special.  It very much
> sounds like that any kind of breakpoint that is placed on top of the dummy
> breakpoint ends up with the same issue.  E.g., if you stepi out of
> the called function, with a software single-step breakpoint, sounds like
> GDB will miss inserting the software step breakpoint because that's
> at the same address as the dummy breakpoint.

Yes, i meant breakpoints as the address themselves, so a location. It is 
probably the case that using permanent breakpoints and mixing them with 
other types of non-permanent breakpoints is causing issues, though the 
only well-exercised testcase is the finish-after-dummy-call one.

I do recall once getting stuck with a stepi inside a dummy function 
call, so i may have hit what you suggested here.

>
> As a data point, I assume that GDB is considering the non-permanent
> dummy breakpoint a duplicate of the permanent finish breakpoint and
> then none ends up inserted.  Is that right?

That is correct. And one of them is already considered inserted.

>
> Not exactly sure what to do here.  Maybe we should stop considering
> permanent and non-permanent breakpoints at the same address as
> duplicates.  That should result in GDB inserting the non-permanent
> one, I think.  Or we could get stop marking permanent breakpoints
> as always inserted, and let normal breakpoints insert on top of
> permanent breakpoints normally.  See also:
>   https://sourceware.org/ml/gdb-patches/2015-03/msg00174.html

That sounds a bit hacky. Doesn't that defeat the purpose of having 
permanent breakpoints in the first place?

It looks like non-gdbserver targets are not ready to support these 
tricky constructs/optimizations unfortunately. I'm afraid adding more 
hacks here and there will cause the code to get even more confusing 
without a generous amount of code comments. And i'm not even sure the 
bp_finish check is the best solution either. After all, there is the 
stepi case too.

We could probably fix the simulators, but then again there are 
proprietary ones we cannot easily fix.
  
Pedro Alves June 9, 2015, 6:13 p.m. UTC | #3
On 06/09/2015 07:10 PM, Luis Machado wrote:

>> Not exactly sure what to do here.  Maybe we should stop considering
>> permanent and non-permanent breakpoints at the same address as
>> duplicates.  That should result in GDB inserting the non-permanent
>> one, I think.  Or we could get stop marking permanent breakpoints
>> as always inserted, and let normal breakpoints insert on top of
>> permanent breakpoints normally.  See also:
>>   https://sourceware.org/ml/gdb-patches/2015-03/msg00174.html
> 
> That sounds a bit hacky.

Can you clarify?  There are two suggestions above, in addition
to a url showing even more ideas.  So I don't know what you're
referring to.  :-)

Thanks,
Pedro Alves

> Doesn't that defeat the purpose of having 
> permanent breakpoints in the first place?
> 
> It looks like non-gdbserver targets are not ready to support these 
> tricky constructs/optimizations unfortunately. I'm afraid adding more 
> hacks here and there will cause the code to get even more confusing 
> without a generous amount of code comments. And i'm not even sure the 
> bp_finish check is the best solution either. After all, there is the 
> stepi case too.
> 
> We could probably fix the simulators, but then again there are 
> proprietary ones we cannot easily fix.
  
Luis Machado June 9, 2015, 6:22 p.m. UTC | #4
On 06/09/2015 03:13 PM, Pedro Alves wrote:
> On 06/09/2015 07:10 PM, Luis Machado wrote:
>
>>> Not exactly sure what to do here.  Maybe we should stop considering
>>> permanent and non-permanent breakpoints at the same address as
>>> duplicates.  That should result in GDB inserting the non-permanent
>>> one, I think.  Or we could get stop marking permanent breakpoints
>>> as always inserted, and let normal breakpoints insert on top of
>>> permanent breakpoints normally.  See also:
>>>    https://sourceware.org/ml/gdb-patches/2015-03/msg00174.html
>>
>> That sounds a bit hacky.
>
> Can you clarify?  There are two suggestions above, in addition
> to a url showing even more ideas.  So I don't know what you're
> referring to.  :-)

Both the above and the mail sound like workaround ideas. You mentioned 
even more special casing in the mail. It is the amount of special casing 
that i'm afraid of.

Having more special cases feel like they defeat the purpose of having 
those permanent breakpoints if we have to IF our way through them. Am i 
missing something?
  
Luis Machado June 9, 2015, 6:34 p.m. UTC | #5
On 06/09/2015 03:13 PM, Pedro Alves wrote:
> On 06/09/2015 07:10 PM, Luis Machado wrote:
>
>>> Not exactly sure what to do here.  Maybe we should stop considering
>>> permanent and non-permanent breakpoints at the same address as
>>> duplicates.  That should result in GDB inserting the non-permanent
>>> one, I think.  Or we could get stop marking permanent breakpoints
>>> as always inserted, and let normal breakpoints insert on top of
>>> permanent breakpoints normally.  See also:
>>>    https://sourceware.org/ml/gdb-patches/2015-03/msg00174.html
>>
>> That sounds a bit hacky.
>
> Can you clarify?  There are two suggestions above, in addition
> to a url showing even more ideas.  So I don't know what you're
> referring to.  :-)
>
> Thanks,
> Pedro Alves
>
>> Doesn't that defeat the purpose of having
>> permanent breakpoints in the first place?
>>
>> It looks like non-gdbserver targets are not ready to support these
>> tricky constructs/optimizations unfortunately. I'm afraid adding more
>> hacks here and there will cause the code to get even more confusing
>> without a generous amount of code comments. And i'm not even sure the
>> bp_finish check is the best solution either. After all, there is the
>> stepi case too.
>>
>> We could probably fix the simulators, but then again there are
>> proprietary ones we cannot easily fix.
>
>

For the record, i'm fine with any of those workarounds if there is no 
reasonable fix for the fact that permanent breakpoints don't work as GDB 
expects on some targets. :-)
  

Patch

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index 657c58e..eb3df02 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -8984,8 +8984,16 @@  bp_loc_is_permanent (struct bp_location *loc)
      0x02 while interrupts disabled, Error state) instead of reporting
      a SIGTRAP.  QEMU should probably be fixed, but in the interest of
      compatibility with versions that behave this way, we always consider
-     bp_call_dummy breakpoint locations as non-permanent.  */
-  if (loc->owner->type == bp_call_dummy)
+     bp_call_dummy breakpoint locations as non-permanent.
+
+     Another situation arises when we have a bp_call_dummy breakpoint inserted
+     and then the user issues a finish, triggering GDB to create a bp_finish
+     breakpoint to handle the return from the inferior function call.  When
+     both bp_call_dummy and bp_finish breakpoints are present, GDB will not
+     force the insertion of these locations, triggering, once again, the
+     problem described above.  Therefore we check for bp_finish here as
+     well.  */
+  if (loc->owner->type == bp_call_dummy || loc->owner->type == bp_finish)
     return 0;
 
   cleanup = save_current_space_and_thread ();