Expect SI_KERNEL si_code for a MIPS software breakpoint trap

Message ID 1442592647-3051-1-git-send-email-lgustavo@codesourcery.com
State Superseded
Headers

Commit Message

Luis Machado Sept. 18, 2015, 4:10 p.m. UTC
  While doing some MIPS/Linux tests, i've found a number of threading tests
failing due to spurious SIGTRAP's. Turns out those spurious SIGTRAP's were
actually software breakpoint hits that were supposed to be handled silently by
GDB/GDBserver, returning a swbreak event.

gdb.threads/continue-pending-status.exp is one of the testcases that show this
behavior.

--

Breakpoint 1, main () at gdb.threads/continue-pending-status.c:44^M
44        pthread_barrier_init (&barrier, NULL, NUM_THREADS);^M
(gdb) b continue-pending-status.c:36^M
Breakpoint 2 at 0x400a04: file gdb.threads/continue-pending-status.c, line 36.^M
(gdb) PASS: gdb.threads/continue-pending-status.exp: attempt 0: set break in tight loop
continue^M
Continuing.^M
[New Thread 5850]^M
[New Thread 5851]^M
[Switching to Thread 5850]^M
^M
Breakpoint 2, thread_function (arg=0x0) at gdb.threads/continue-pending-status.c:36^M
36        while (1); /* break here */^M
(gdb) PASS: gdb.threads/continue-pending-status.exp: attempt 0: continue to tight loop
print /x $_thread^M
$1 = 0x2^M
(gdb) PASS: gdb.threads/continue-pending-status.exp: attempt 0: get thread number
thread 3^M
[Switching to thread 3 (Thread 5851)]^M
36        while (1); /* break here */^M
(gdb) PASS: gdb.threads/continue-pending-status.exp: attempt 0: switch to non-event thread
delete breakpoints^M
Delete all breakpoints? (y or n) y^M
(gdb) info breakpoints^M
No breakpoints or watchpoints.^M
(gdb) continue^M
Continuing.^M
^M
Program received signal SIGTRAP, Trace/breakpoint trap.^M

<<<< This SIGTRAP was a pending breakpoint event that wasn't supposed to cause
<<<< a stop, but gdbserver did not figure out this was a breakpoint hit.

PASS: gdb.threads/continue-pending-status.exp: attempt 0: continue for ctrl-c
thread_function (arg=0x0) at gdb.threads/continue-pending-status.c:36^M
36        while (1); /* break here */^M
(gdb) FAIL: gdb.threads/continue-pending-status.exp: attempt 0: caught interrupt

--

I tracked this down to the lack of a proper definition of what MIPS' kernel
returns in the si_code for a software breakpoint trap.

Though i did not find documentation about this, tests showed that we should
check for SI_KERNEL, just like i386. I've cc-ed Maciej, just to be sure this
is indeed correct.

With the following patch i have cleaner results for thread tests on
MIPS/Linux.

Regression-tested on a few MIPS boards.

OK?

gdb/ChangeLog:

2015-09-18  Luis Machado  <lgustavo@codesourcery.com>

	* nat/linux-ptrace.h: Check for SI_KERNEL si_code for MIPS.
---
 gdb/nat/linux-ptrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Maciej W. Rozycki Sept. 18, 2015, 4:56 p.m. UTC | #1
Hi Luis,

> I tracked this down to the lack of a proper definition of what MIPS' kernel
> returns in the si_code for a software breakpoint trap.
> 
> Though i did not find documentation about this, tests showed that we should
> check for SI_KERNEL, just like i386. I've cc-ed Maciej, just to be sure this
> is indeed correct.

 Hmm, the MIPS/Linux port does not set any particular code for SIGTRAP, 
all such signals will have the SI_KERNEL default, so you may well return 
TRUE unconditionally.

 I'm not convinced however that it is safe to assume all SIGTRAPs come 
from breakpoints -- this signal is sent by the kernel for both BREAK and 
trap (multiple mnemonics, e.g. TEQ, TGEI, etc.) instructions which may 
have been placed throughout code for some reason, for example to serve as 
cheap assertion checks.

 Is there a separate check made afterwards like `bpstat_explains_signal' 
to validate the source of the signal here?

 Perhaps we should make it a part of the ABI and teach MIPS/Linux about 
the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP 
in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to 
TRAP_BRKPT, as expected.  This won't fix history of course, but at least 
it will make debugging a little bit easier to handle in the future.  
Cc-ing `linux-mips' for further input.

 I was wondering where these SIGTRAPs come from too BTW, thanks for 
investigating it.  And thanks for the heads-up!

  Maciej
  
David Daney Sept. 18, 2015, 5:04 p.m. UTC | #2
We have to be very careful changing the ABI here.

This is used by almost all userspace code to detect integer division by 
zero.  Many things like the libgcj runtime use this to generate runtime 
exceptions, we don't want to break them.

David Daney



On 09/18/2015 09:56 AM, Maciej W. Rozycki wrote:
> Hi Luis,
>
>> I tracked this down to the lack of a proper definition of what MIPS' kernel
>> returns in the si_code for a software breakpoint trap.
>>
>> Though i did not find documentation about this, tests showed that we should
>> check for SI_KERNEL, just like i386. I've cc-ed Maciej, just to be sure this
>> is indeed correct.
>
>   Hmm, the MIPS/Linux port does not set any particular code for SIGTRAP,
> all such signals will have the SI_KERNEL default, so you may well return
> TRUE unconditionally.
>
>   I'm not convinced however that it is safe to assume all SIGTRAPs come
> from breakpoints -- this signal is sent by the kernel for both BREAK and
> trap (multiple mnemonics, e.g. TEQ, TGEI, etc.) instructions which may
> have been placed throughout code for some reason, for example to serve as
> cheap assertion checks.
>
>   Is there a separate check made afterwards like `bpstat_explains_signal'
> to validate the source of the signal here?
>
>   Perhaps we should make it a part of the ABI and teach MIPS/Linux about
> the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP
> in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to
> TRAP_BRKPT, as expected.  This won't fix history of course, but at least
> it will make debugging a little bit easier to handle in the future.
> Cc-ing `linux-mips' for further input.
>
>   I was wondering where these SIGTRAPs come from too BTW, thanks for
> investigating it.  And thanks for the heads-up!
>
>    Maciej
>
>
  
Maciej W. Rozycki Sept. 18, 2015, 6:36 p.m. UTC | #3
On Fri, 18 Sep 2015, David Daney wrote:

> We have to be very careful changing the ABI here.
> 
> This is used by almost all userspace code to detect integer division by zero.
> Many things like the libgcj runtime use this to generate runtime exceptions,
> we don't want to break them.

 No worries here, integer division by 0 and overflow use `BREAK 7' and 
`BREAK 6' respectively (or corresponding trap instructions) and these 
cases are already handled correctly, as I implemented many years ago for 
regular MIPS user code and recently fixed for MIPS16 code (and less 
recently for microMIPS code as well).  Have a look at `do_trap_or_bp' in 
arch/mips/kernel/traps.c for details.

 There's no collision with `BREAK 5'; Linux code would of course have to 
treat 5 as an unrecognised for traps and would therefore handle the case 
right in `do_bp' like with kprobe breakpoints.

 I hope this clears your concerns.

  Maciej
  
Luis Machado Feb. 9, 2016, 2:29 p.m. UTC | #4
Hi Maciej,

I'm finally getting back to this. Sorry for the delay.

On 09/18/2015 01:56 PM, Maciej W. Rozycki wrote:
> Hi Luis,
>
>> I tracked this down to the lack of a proper definition of what MIPS' kernel
>> returns in the si_code for a software breakpoint trap.
>>
>> Though i did not find documentation about this, tests showed that we should
>> check for SI_KERNEL, just like i386. I've cc-ed Maciej, just to be sure this
>> is indeed correct.
>
>   Hmm, the MIPS/Linux port does not set any particular code for SIGTRAP,
> all such signals will have the SI_KERNEL default, so you may well return
> TRUE unconditionally.
>

That is unfortunate. It would be nice to have a well defined standard of 
communicating these events from kernels to debuggers. All is not lost 
though, since that can be improved.

>   I'm not convinced however that it is safe to assume all SIGTRAPs come
> from breakpoints -- this signal is sent by the kernel for both BREAK and
> trap (multiple mnemonics, e.g. TEQ, TGEI, etc.) instructions which may
> have been placed throughout code for some reason, for example to serve as
> cheap assertion checks.
>
>   Is there a separate check made afterwards like `bpstat_explains_signal'
> to validate the source of the signal here?
>

In my specific example we are dealing with two breakpoint hits by 
different threads. The first one is reported just fine and the one we 
have problems with is the queued one that is reported afterwards when we 
attempt to move the other thread.

We do rely on bpstat_explains_signal when gdbserver/the remote target 
notifies gdb about a stop. In the case of a queued breakpoint hit, it 
doesn't even get reported back to gdb and is just ignored by gdbserver 
if it is recognized as a breakpoint hit.

With the proposed change, even the hardcoded traps will initially be 
recognized as breakpoints, albeit maybe recognized as permanent 
breakpoints for some opcodes. It may cause gdbserver to ignore a second 
hardcoded trap hit, which is not desirable.

>   Perhaps we should make it a part of the ABI and teach MIPS/Linux about
> the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP
> in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to
> TRAP_BRKPT, as expected.  This won't fix history of course, but at least
> it will make debugging a little bit easier to handle in the future.
> Cc-ing `linux-mips' for further input.

This is the best solution in my opinion and will definitely make the 
debugger's life easier if it can tell the difference between multiple 
seemingly equivalent SIGTRAP's.

Does this involve handling BRK_SSTEPBP inside 
arch/mips/kernel/traps.c:do_trap_or_bp?

>
>   I was wondering where these SIGTRAPs come from too BTW, thanks for
> investigating it.  And thanks for the heads-up!

You're welcome.
  
Maciej W. Rozycki Feb. 9, 2016, 8:55 p.m. UTC | #5
On Tue, 9 Feb 2016, Luis Machado wrote:

> I'm finally getting back to this. Sorry for the delay.

 No problem, thanks for keeping a track of it.

> >   I'm not convinced however that it is safe to assume all SIGTRAPs come
> > from breakpoints -- this signal is sent by the kernel for both BREAK and
> > trap (multiple mnemonics, e.g. TEQ, TGEI, etc.) instructions which may
> > have been placed throughout code for some reason, for example to serve as
> > cheap assertion checks.
> > 
> >   Is there a separate check made afterwards like `bpstat_explains_signal'
> > to validate the source of the signal here?
> 
> In my specific example we are dealing with two breakpoint hits by different
> threads. The first one is reported just fine and the one we have problems with
> is the queued one that is reported afterwards when we attempt to move the
> other thread.
> 
> We do rely on bpstat_explains_signal when gdbserver/the remote target notifies
> gdb about a stop. In the case of a queued breakpoint hit, it doesn't even get
> reported back to gdb and is just ignored by gdbserver if it is recognized as a
> breakpoint hit.

 I'm not sure I understand what's going on here, why is a breakpoint hit
required to be ignored by gdbserver?

 From what you say I infer GDB controls a multi-threaded program and there 
it sets a software breakpoint which, by its nature, is global to all the 
threads.  Then multiple threads hit the breakpoint simultaneously (or 
nearly simultaneously) and the hits are delivered to GDB one by one, via 
gdbserver.  So why is GDB not prepared for that?

 It set the breakpoint itself so it should expect it to hit sometime, and 
if there are multiple threads, then there can be multiple hits at once or 
almost at once because, even in the all-stop mode, there is no guaranteed 
way to stop the threads all at once.  The threads may be spread across 
different processors in an SMP system for example, all trapping literally 
at once -- and then the kernel queueing the signals according to its own 
internal schedule before delivering them to the debugger (that, from the 
kernel's point of view, being gdbserver in this case).

> With the proposed change, even the hardcoded traps will initially be
> recognized as breakpoints, albeit maybe recognized as permanent breakpoints
> for some opcodes. It may cause gdbserver to ignore a second hardcoded trap
> hit, which is not desirable.

 So why does gdbserver have to be taught which breakpoints may have 
potentially been set by GDB and which may have not?  Why not to deliver 
them all and leave it up to GDB to decide?  I believe it will be the right 
thing to let GDB know that more than one thread has hit the same 
breakpoint.

 Did I miss anything?  How is this situation handled in a native debug 
scenario?

> >   Perhaps we should make it a part of the ABI and teach MIPS/Linux about
> > the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP
> > in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to
> > TRAP_BRKPT, as expected.  This won't fix history of course, but at least
> > it will make debugging a little bit easier to handle in the future.
> > Cc-ing `linux-mips' for further input.
> 
> This is the best solution in my opinion and will definitely make the
> debugger's life easier if it can tell the difference between multiple
> seemingly equivalent SIGTRAP's.
> 
> Does this involve handling BRK_SSTEPBP inside
> arch/mips/kernel/traps.c:do_trap_or_bp?

 No, as I noted in my reply to David elsewhere in this thread, this would 
have to be in `do_bp' instead, to exclude trap instructions (e.g. TEQ, 
etc.) from being treated as breakpoints.  I can implement this change 
myself for you, but we need to agree first what the right solution for GDB 
is.  So far it looks to me we'd be only papering over a problem elsewhere.

  Maciej
  
Luis Machado Feb. 10, 2016, 12:52 p.m. UTC | #6
On 02/09/2016 06:55 PM, Maciej W. Rozycki wrote:
> On Tue, 9 Feb 2016, Luis Machado wrote:
>
>> I'm finally getting back to this. Sorry for the delay.
>
>   No problem, thanks for keeping a track of it.
>
>>>    I'm not convinced however that it is safe to assume all SIGTRAPs come
>>> from breakpoints -- this signal is sent by the kernel for both BREAK and
>>> trap (multiple mnemonics, e.g. TEQ, TGEI, etc.) instructions which may
>>> have been placed throughout code for some reason, for example to serve as
>>> cheap assertion checks.
>>>
>>>    Is there a separate check made afterwards like `bpstat_explains_signal'
>>> to validate the source of the signal here?
>>
>> In my specific example we are dealing with two breakpoint hits by different
>> threads. The first one is reported just fine and the one we have problems with
>> is the queued one that is reported afterwards when we attempt to move the
>> other thread.
>>
>> We do rely on bpstat_explains_signal when gdbserver/the remote target notifies
>> gdb about a stop. In the case of a queued breakpoint hit, it doesn't even get
>> reported back to gdb and is just ignored by gdbserver if it is recognized as a
>> breakpoint hit.
>
>   I'm not sure I understand what's going on here, why is a breakpoint hit
> required to be ignored by gdbserver?
>
>   From what you say I infer GDB controls a multi-threaded program and there
> it sets a software breakpoint which, by its nature, is global to all the
> threads.  Then multiple threads hit the breakpoint simultaneously (or
> nearly simultaneously) and the hits are delivered to GDB one by one, via
> gdbserver.  So why is GDB not prepared for that?
>
>   It set the breakpoint itself so it should expect it to hit sometime, and
> if there are multiple threads, then there can be multiple hits at once or
> almost at once because, even in the all-stop mode, there is no guaranteed
> way to stop the threads all at once.  The threads may be spread across
> different processors in an SMP system for example, all trapping literally
> at once -- and then the kernel queueing the signals according to its own
> internal schedule before delivering them to the debugger (that, from the
> kernel's point of view, being gdbserver in this case).
>

I went through the data again and i was partially mistaken about the 
above. Here's a detailed description of the events that take place in 
the reported situation.

1 - A breakpoint is inserted by GDB at some code that is executed by 
multiple threads.
2 - Two threads, let us call them 1 and 2, happen to hit the same 
software breakpoint, so both SIGTRAP's get sent by the kernel and 
gdbserver picks one of them to process.
3 - gdbserver figures out this is from a breakpoint hit, since it knows 
there is a breakpoint inserted at that PC, and sends a swbreak stop 
reply back to gdb.
4 - GDB gets the swbreak stop reply and notifies the user about the 
breakpoint hit for thread 1, displays the frame information etc.

Now, the user goes and deletes that specific breakpoint that triggered 
the previous event and switches the context to thread 2 and then 
attempts to continue execution.

5 - In gdbserver, thread 2 still has a pending SIGTRAP that was not yet 
handled.
6 - gdbserver proceeds to handle it, sees it is a SIGTRAP but cannot map 
that event back to a software breakpoint hit due to the removal of such 
breakpoint and because gdbserver doesn't expect SI_KERNEL to mean 
"software breakpoint hit".
7 - gdbserver then assumes this is a generic trap and reports it as such 
to GDB, in a new stop reply.
8 - GDB receives the stop reply and displays a generic trace/breakpoint 
SIGTRAP since it also cannot map the trap back to a software breakpoint, 
i.e. bpstat_explains_signal returns 0 and random_signal is non-zero.

Patching gdbserver to recognize a si_code of SI_KERNEL as a software 
breakpoint trap causes changes starting from 6.

6 - gdbserver proceeds to handle it, sees it is a SIGTRAP and that its 
si_code is SI_KERNEL, meaning a software breakpoint hit now, even though 
we can't recognize a breakpoint hit by checking for an underlying 
breakpoint instruction.
7 - gdbserver sends a swbreak stop reply back to GDB.
8 - GDB receives the stop reply and notices it is a delayed breakpoint 
hit. According to its logic, GDB discards this as useless and the 
program continues its execution properly. Here random_signal is non-zero 
and target_stopped_by_sw_breakpoint () returns 1 (because gdbserver told 
GDB so).

The problem of forcing gdbserver to recognize all traps with 
si_code==SI_KERNEL is that even hardcoded traps will be reported back to 
GDB as a swbreak event, which is not ideal.

But currently there is no easy way to tell a software breakpoint hit and 
a hardcoded trap (and maybe even a hardware breakpoint hit?) apart.

>> With the proposed change, even the hardcoded traps will initially be
>> recognized as breakpoints, albeit maybe recognized as permanent breakpoints
>> for some opcodes. It may cause gdbserver to ignore a second hardcoded trap
>> hit, which is not desirable.
>
>   So why does gdbserver have to be taught which breakpoints may have
> potentially been set by GDB and which may have not?  Why not to deliver
> them all and leave it up to GDB to decide?  I believe it will be the right
> thing to let GDB know that more than one thread has hit the same
> breakpoint.
>
>   Did I miss anything?  How is this situation handled in a native debug
> scenario?
>

I take it native debugging will display the same sympthoms because the 
definitions of si_code are shared between GDB and gdbserver, from 
nat/linux-ptrace.h. Native also uses a similar function to check for 
breakpoint hits, namely gdb/linux-nat.c:check_stopped_by_breakpoint. I 
did not test this with a native debugger though.

>>>    Perhaps we should make it a part of the ABI and teach MIPS/Linux about
>>> the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP
>>> in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to
>>> TRAP_BRKPT, as expected.  This won't fix history of course, but at least
>>> it will make debugging a little bit easier to handle in the future.
>>> Cc-ing `linux-mips' for further input.
>>
>> This is the best solution in my opinion and will definitely make the
>> debugger's life easier if it can tell the difference between multiple
>> seemingly equivalent SIGTRAP's.
>>
>> Does this involve handling BRK_SSTEPBP inside
>> arch/mips/kernel/traps.c:do_trap_or_bp?
>
>   No, as I noted in my reply to David elsewhere in this thread, this would
> have to be in `do_bp' instead, to exclude trap instructions (e.g. TEQ,
> etc.) from being treated as breakpoints.  I can implement this change
> myself for you, but we need to agree first what the right solution for GDB
> is.  So far it looks to me we'd be only papering over a problem elsewhere.

Hopefully the above makes what we're facing more clear.
  
Maciej W. Rozycki Feb. 15, 2016, 11:50 p.m. UTC | #7
On Wed, 10 Feb 2016, Luis Machado wrote:

> I went through the data again and i was partially mistaken about the above.
> Here's a detailed description of the events that take place in the reported
> situation.
> 
> 1 - A breakpoint is inserted by GDB at some code that is executed by multiple
> threads.
> 2 - Two threads, let us call them 1 and 2, happen to hit the same software
> breakpoint, so both SIGTRAP's get sent by the kernel and gdbserver picks one
> of them to process.
> 3 - gdbserver figures out this is from a breakpoint hit, since it knows there
> is a breakpoint inserted at that PC, and sends a swbreak stop reply back to
> gdb.
> 4 - GDB gets the swbreak stop reply and notifies the user about the breakpoint
> hit for thread 1, displays the frame information etc.
> 
> Now, the user goes and deletes that specific breakpoint that triggered the
> previous event and switches the context to thread 2 and then attempts to
> continue execution.
> 
> 5 - In gdbserver, thread 2 still has a pending SIGTRAP that was not yet
> handled.
> 6 - gdbserver proceeds to handle it, sees it is a SIGTRAP but cannot map that
> event back to a software breakpoint hit due to the removal of such breakpoint
> and because gdbserver doesn't expect SI_KERNEL to mean "software breakpoint
> hit".
> 7 - gdbserver then assumes this is a generic trap and reports it as such to
> GDB, in a new stop reply.
> 8 - GDB receives the stop reply and displays a generic trace/breakpoint
> SIGTRAP since it also cannot map the trap back to a software breakpoint, i.e.
> bpstat_explains_signal returns 0 and random_signal is non-zero.
> 
> Patching gdbserver to recognize a si_code of SI_KERNEL as a software
> breakpoint trap causes changes starting from 6.
> 
> 6 - gdbserver proceeds to handle it, sees it is a SIGTRAP and that its si_code
> is SI_KERNEL, meaning a software breakpoint hit now, even though we can't
> recognize a breakpoint hit by checking for an underlying breakpoint
> instruction.
> 7 - gdbserver sends a swbreak stop reply back to GDB.
> 8 - GDB receives the stop reply and notices it is a delayed breakpoint hit.
> According to its logic, GDB discards this as useless and the program continues
> its execution properly. Here random_signal is non-zero and
> target_stopped_by_sw_breakpoint () returns 1 (because gdbserver told GDB so).
> 
> The problem of forcing gdbserver to recognize all traps with
> si_code==SI_KERNEL is that even hardcoded traps will be reported back to GDB
> as a swbreak event, which is not ideal.
> 
> But currently there is no easy way to tell a software breakpoint hit and a
> hardcoded trap (and maybe even a hardware breakpoint hit?) apart.

 Thanks for the detailed explanation.

 FWIW, I maintain it's GDB that should be handling it.  What if TRAP_BRKPT 
is reported for a breakpoint that has not been set by GDB in the first 
place and is still there in code?  I take it either GDB or gdbserver, as 
applicable, will just sit there looping indefinitely in an attempt to 
discard the event while executing the breakpoint instruction over and over 
again.  There's nothing stopping the user from having a MIPS `BREAK 5' 
instruction or say INT3 for the x86 target already present in the 
executable being debugged.

 What I think GDB ought to be doing here is caching addresses for recently 
removed breakpoints and discarding spurious hits in that cache.  It may 
actually be worth verifying, before discarding such a hit, that there is 
no breakpoint instruction there anymore in target memory too -- a clever 
user might have set a breakpoint on a breakpoint instruction already there 
in code!

 It seems to me it'll be enough if the cache is only retained over a 
single resumption step, per thread of course, as it does not appear to me 
that the kernel might queue more than one software breakpoint signal at 
once.

 NB conceptually this is similar to handling spurious hardware interrupts 
in the kernel, where the kernel interrogates all possible interrupt 
sources for the reporting interrupt line, which might be a physical shared 
wire, a shared vector, etc., before discarding them as invalid.  Such 
interrupts are sometimes delivered when the originating device pulls its 
request away before it's been handled.

> >   Did I miss anything?  How is this situation handled in a native debug
> > scenario?
> 
> I take it native debugging will display the same sympthoms because the
> definitions of si_code are shared between GDB and gdbserver, from
> nat/linux-ptrace.h. Native also uses a similar function to check for
> breakpoint hits, namely gdb/linux-nat.c:check_stopped_by_breakpoint. I did not
> test this with a native debugger though.

 OK, noted.  Thanks!

> > > >    Perhaps we should make it a part of the ABI and teach MIPS/Linux
> > > > about
> > > > the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP
> > > > in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to
> > > > TRAP_BRKPT, as expected.  This won't fix history of course, but at least
> > > > it will make debugging a little bit easier to handle in the future.
> > > > Cc-ing `linux-mips' for further input.
> > > 
> > > This is the best solution in my opinion and will definitely make the
> > > debugger's life easier if it can tell the difference between multiple
> > > seemingly equivalent SIGTRAP's.
> > > 
> > > Does this involve handling BRK_SSTEPBP inside
> > > arch/mips/kernel/traps.c:do_trap_or_bp?
> > 
> >   No, as I noted in my reply to David elsewhere in this thread, this would
> > have to be in `do_bp' instead, to exclude trap instructions (e.g. TEQ,
> > etc.) from being treated as breakpoints.  I can implement this change
> > myself for you, but we need to agree first what the right solution for GDB
> > is.  So far it looks to me we'd be only papering over a problem elsewhere.
> 
> Hopefully the above makes what we're facing more clear.

 I think we could do this as well, so as to save GDB from the need to 
verify truly irrelevant traps against its cache.  Though I don't think it 
buys us anything short-term as we'll have to continue support kernels 
which have no notion of TRAP_BRKPT.  So no need to rush doing this IMHO.

  Maciej
  
Pedro Alves Feb. 16, 2016, 12:30 a.m. UTC | #8
On 02/15/2016 11:50 PM, Maciej W. Rozycki wrote:

>  FWIW, I maintain it's GDB that should be handling it.  What if TRAP_BRKPT 
> is reported for a breakpoint that has not been set by GDB in the first 
> place and is still there in code?  I take it either GDB or gdbserver, as 
> applicable, will just sit there looping indefinitely in an attempt to 
> discard the event while executing the breakpoint instruction over and over 
> again.

Nope.

> There's nothing stopping the user from having a MIPS `BREAK 5' 
> instruction or say INT3 for the x86 target already present in the 
> executable being debugged.

GDB only ignores the TRAP_BRKPT event if there's no "BREAK 5" instruction
hardcoded in the binary.  If there is, then the program is stopped and
a SIGTRAP is reported to the user.

>  What I think GDB ought to be doing here is caching addresses for recently 
> removed breakpoints and discarding spurious hits in that cache.

That does not work in general.  The most problematic archs are those that
leave the PC pointing after the breakpoint instruction, such as x86.
See more below.

> It may 
> actually be worth verifying, before discarding such a hit, that there is 
> no breakpoint instruction there anymore in target memory too -- a clever 
> user might have set a breakpoint on a breakpoint instruction already there 
> in code!

Yep, GDB does that already, and we have tests that cover this.
See gdb.base/bp-permanent.exp.

> 
>  It seems to me it'll be enough if the cache is only retained over a 
> single resumption step, per thread of course, as it does not appear to me 
> that the kernel might queue more than one software breakpoint signal at 
> once.

That wouldn't work, as a new thread GDB doesn't know about yet may report
a stop for the PC where a breakpoint used to be, and then you don't
know whether you need to adjust its PC.

Remembering whether a breakpoint was inserted was what GDB used to
do, and it was because it was problematic that "swbreak" was
invented.  See:

 https://sourceware.org/ml/gdb-patches/2015-02/msg00726.html

Particularly:

 https://sourceware.org/ml/gdb-patches/2015-02/msg00730.html

This was a previous attempt that tried to preserve moribund
locations, but was still not sufficient:
 https://sourceware.org/ml/gdb-patches/2014-10/msg00781.html

I'm really hoping that at some point all archs implement
TRAP_BRKPT and the moribund locations heuristic can be removed
from gdb.

Thanks,
Pedro Alves
  
Pedro Alves Feb. 16, 2016, 12:57 a.m. UTC | #9
On 02/10/2016 12:52 PM, Luis Machado wrote:
> The problem of forcing gdbserver to recognize all traps with 
> si_code==SI_KERNEL is that even hardcoded traps will be reported back to 
> GDB as a swbreak event, which is not ideal.

That's how swbreak is defined:

 @item swbreak
 @anchor{swbreak stop reason}
 The packet indicates a memory breakpoint instruction was executed,
 irrespective of whether it was @value{GDBN} that planted the
 breakpoint or the breakpoint is hardcoded in the program.

> 
> But currently there is no easy way to tell a software breakpoint hit and 
> a hardcoded trap (and maybe even a hardware breakpoint hit?) apart.

Software breakpoint hits or hardcoded traps are handled the same.  Even if GDB
plants the breakpoint instruction itself with direct memory pokes (instead of
z0 packets), the target should report "swbreak" stops, so that gdb can do
the right thing.

GDB knows whether to discard the hit as a delayed breakpoint hit event by
checking whether the thread's PC points at an hardcoded trap.  If it does,
the event is not discarded, but instead reported to the user as a SIGTRAP.

Hardware breakpoint hits are distinguished from software breakpoint hits,
because they're reported with "hwbreak", not "swbreak":

 @item hwbreak
 The packet indicates the target stopped for a hardware breakpoint.
 The @var{r} part must be left empty.

Thanks,
Pedro Alves
  
Maciej W. Rozycki Feb. 19, 2016, 4:21 p.m. UTC | #10
On Tue, 16 Feb 2016, Pedro Alves wrote:

> >  FWIW, I maintain it's GDB that should be handling it.  What if TRAP_BRKPT 
> > is reported for a breakpoint that has not been set by GDB in the first 
> > place and is still there in code?  I take it either GDB or gdbserver, as 
> > applicable, will just sit there looping indefinitely in an attempt to 
> > discard the event while executing the breakpoint instruction over and over 
> > again.
> 
> Nope.
> 
> > There's nothing stopping the user from having a MIPS `BREAK 5' 
> > instruction or say INT3 for the x86 target already present in the 
> > executable being debugged.
> 
> GDB only ignores the TRAP_BRKPT event if there's no "BREAK 5" instruction
> hardcoded in the binary.  If there is, then the program is stopped and
> a SIGTRAP is reported to the user.

 Ah, that makes a fundamental difference to me!  I missed this detail from 
Luis's description and concluded that GDB/gdbserver never passes along 
TRAP_BRKPT events it cannot associate with a breakpoint set by itself.

> >  What I think GDB ought to be doing here is caching addresses for recently 
> > removed breakpoints and discarding spurious hits in that cache.
> 
> That does not work in general.  The most problematic archs are those that
> leave the PC pointing after the breakpoint instruction, such as x86.
> See more below.

 [etc.] Fair enough.  Thanks for the extra explanation.

> Remembering whether a breakpoint was inserted was what GDB used to
> do, and it was because it was problematic that "swbreak" was
> invented.  See:
> 
>  https://sourceware.org/ml/gdb-patches/2015-02/msg00726.html
> 
> Particularly:
> 
>  https://sourceware.org/ml/gdb-patches/2015-02/msg00730.html

 OK, I can see you peek at target text in `program_breakpoint_here_p' to 
see if there's a breakpoint there.  With this in mind I think Luis's 
change is conceptually good as it stands, although I think we should move 
the MIPS case along PowerPC right away, because TRAP_BRKPT is the right 
code to use here and the MIPS port has never produced it so far and 
therefore there's no backwards compatibility concern if we accept both 
SI_KERNEL and TRAP_BRKPT.

 Luis, such an updated change is preapproved as far as I am concerned, 
feel free to commit it without another review round.

 As to the kernel side with the observations made in this discussion I 
think we should set the trap code for SIGTRAP signals issued with BREAK 
instructions to TRAP_BRKPT unconditionally, regardless of the code used.  
This won't of course affect encodings which send a different signal such 
as SIGFPE.

 We're lacking a code suitable for (conditional) trap instructions.  I 
think TRAP_TRAP or suchlike needs to be added.

> > But currently there is no easy way to tell a software breakpoint hit and 
> > a hardcoded trap (and maybe even a hardware breakpoint hit?) apart.
> 
> Software breakpoint hits or hardcoded traps are handled the same.  Even if GDB
> plants the breakpoint instruction itself with direct memory pokes (instead of
> z0 packets), the target should report "swbreak" stops, so that gdb can do
> the right thing.

 Manual planting is historical AFAIK, gdbserver and other stubs used not 
to support `z' packets.  Now we have the right infrastructure in place and 
GDB passes MIPS breakpoint encoding requirements along (although I note 
that the encoding for microMIPS BREAK16 has changed with R6, so this will 
have to be updated) so we could have software breakpoint support added to 
MIPS gdbserver as well.

> Hardware breakpoint hits are distinguished from software breakpoint hits,
> because they're reported with "hwbreak", not "swbreak":
> 
>  @item hwbreak
>  The packet indicates the target stopped for a hardware breakpoint.
>  The @var{r} part must be left empty.

 Umm, any requirements for this?  We have MIPS data hardware breakpoint 
support in the Linux kernel (regrettably not for instructions, but that 
could be added sometime), but I can't see TRAP_HWBKPT being set for them, 
they just use generic SI_KERNEL as everything else right now.

 Thanks for your patience in explaining this all to me.

  Maciej
  
Pedro Alves Feb. 24, 2016, 11:52 a.m. UTC | #11
Hi Maciej,

>  As to the kernel side with the observations made in this discussion I 
> think we should set the trap code for SIGTRAP signals issued with BREAK 
> instructions to TRAP_BRKPT unconditionally, regardless of the code used.  
> This won't of course affect encodings which send a different signal such 
> as SIGFPE.
> 
>  We're lacking a code suitable for (conditional) trap instructions.  I 
> think TRAP_TRAP or suchlike needs to be added.

Yeah, looks like it.

>> Hardware breakpoint hits are distinguished from software breakpoint hits,
>> because they're reported with "hwbreak", not "swbreak":
>>
>>  @item hwbreak
>>  The packet indicates the target stopped for a hardware breakpoint.
>>  The @var{r} part must be left empty.
> 
>  Umm, any requirements for this?  We have MIPS data hardware breakpoint 
> support in the Linux kernel (regrettably not for instructions, but that 
> could be added sometime), but I can't see TRAP_HWBKPT being set for them, 
> they just use generic SI_KERNEL as everything else right now.

Can userspace/ptrace still tell whether a hardware breakpoint triggered by
consulting debug registers, similarly to how it can for watchpoints,
with PTRACE_GET_WATCH_REGS, and looking at watchhi ?

(assuming this comment is correct):

/* Target to_stopped_by_watchpoint implementation.  Return 1 if
   stopped by watchpoint.  The watchhi R and W bits indicate the watch
   register triggered.  */

static int
mips_linux_stopped_by_watchpoint (struct target_ops *ops)
{


This is not reachable today, due to lack of TRAP_* in si_code, but I
think that can be fixed.


The only use for hwbreak currently is to know whether to ignore hardware
breakpoint traps gdb can't explain (gdb assumes they're a delayed event for
a hw breakpoint that has since been removed):

  /* Maybe this was a trap for a hardware breakpoint/watchpoint that
     has since been removed.  */
  if (random_signal && target_stopped_by_hw_breakpoint ())
    {
      /* A delayed hardware breakpoint event.  Ignore the trap.  */
      if (debug_infrun)
	fprintf_unfiltered (gdb_stdlog,
			    "infrun: delayed hardware breakpoint/watchpoint "
			    "trap, ignoring\n");
      random_signal = 0;
    }

So if the server claims it supports this stop reason, but then doesn't
send it for hw breakpoint trap, users will see their programs
occasionally stop for random spurious SIGTRAPs (if they use hardware
breapoints).

If the server does _not_ claim support for the swbreak/hwbreak stop
reason, then the old moribund breakpoints heuristic kicks in:

  /* Check if a moribund breakpoint explains the stop.  */
  if (!target_supports_stopped_by_sw_breakpoint ()
      || !target_supports_stopped_by_hw_breakpoint ())
    {
      for (ix = 0; VEC_iterate (bp_location_p, moribund_locations, ix, loc); ++ix)
	{
	  if (breakpoint_location_address_match (loc, aspace, bp_addr)
	      && need_moribund_for_location_type (loc))
	    {
	      bs = bpstat_alloc (loc, &bs_link);
	      /* For hits of moribund locations, we should just proceed.  */
	      bs->stop = 0;
	      bs->print = 0;
	      bs->print_it = print_it_noop;
	    }
	}
    }

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/nat/linux-ptrace.h b/gdb/nat/linux-ptrace.h
index 1be38fe..09c8bc5 100644
--- a/gdb/nat/linux-ptrace.h
+++ b/gdb/nat/linux-ptrace.h
@@ -142,7 +142,7 @@  struct buffer;
 
    The generic Linux target code should use GDB_ARCH_IS_TRAP_BRKPT
    instead of TRAP_BRKPT to abstract out these peculiarities.  */
-#if defined __i386__ || defined __x86_64__
+#if defined __i386__ || defined __x86_64__ || defined __mips__
 # define GDB_ARCH_IS_TRAP_BRKPT(X) ((X) == SI_KERNEL)
 #elif defined __powerpc__
 # define GDB_ARCH_IS_TRAP_BRKPT(X) ((X) == SI_KERNEL || (X) == TRAP_BRKPT)