Message ID | 1442592647-3051-1-git-send-email-lgustavo@codesourcery.com |
---|---|
State | Superseded |
Headers |
From: Luis Machado <lgustavo@codesourcery.com> To: <gdb-patches@sourceware.org> CC: <macro@linux-mips.org> Subject: [PATCH] Expect SI_KERNEL si_code for a MIPS software breakpoint trap Date: Fri, 18 Sep 2015 13:10:47 -0300 Message-ID: <1442592647-3051-1-git-send-email-lgustavo@codesourcery.com> |
Commit Message
Luis Machado
Sept. 18, 2015, 4:10 p.m. UTC
While doing some MIPS/Linux tests, I've found a number of threading tests failing due to spurious SIGTRAP's. Turns out those spurious SIGTRAP's were actually software breakpoint hits that were supposed to be handled silently by GDB/GDBserver, returning a swbreak event.

gdb.threads/continue-pending-status.exp is one of the testcases that show this behavior.

--

Breakpoint 1, main () at gdb.threads/continue-pending-status.c:44
44 pthread_barrier_init (&barrier, NULL, NUM_THREADS);
(gdb) b continue-pending-status.c:36
Breakpoint 2 at 0x400a04: file gdb.threads/continue-pending-status.c, line 36.
(gdb) PASS: gdb.threads/continue-pending-status.exp: attempt 0: set break in tight loop
continue
Continuing.
[New Thread 5850]
[New Thread 5851]
[Switching to Thread 5850]

Breakpoint 2, thread_function (arg=0x0) at gdb.threads/continue-pending-status.c:36
36 while (1); /* break here */
(gdb) PASS: gdb.threads/continue-pending-status.exp: attempt 0: continue to tight loop
print /x $_thread
$1 = 0x2
(gdb) PASS: gdb.threads/continue-pending-status.exp: attempt 0: get thread number
thread 3
[Switching to thread 3 (Thread 5851)]
36 while (1); /* break here */
(gdb) PASS: gdb.threads/continue-pending-status.exp: attempt 0: switch to non-event thread
delete breakpoints
Delete all breakpoints? (y or n) y
(gdb) info breakpoints
No breakpoints or watchpoints.
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.

<<<< This SIGTRAP was a pending breakpoint event that wasn't supposed to cause
<<<< a stop, but gdbserver did not figure out this was a breakpoint hit.

PASS: gdb.threads/continue-pending-status.exp: attempt 0: continue for ctrl-c
thread_function (arg=0x0) at gdb.threads/continue-pending-status.c:36
36 while (1); /* break here */
(gdb) FAIL: gdb.threads/continue-pending-status.exp: attempt 0: caught interrupt

--

I tracked this down to the lack of a proper definition of what MIPS' kernel returns in the si_code for a software breakpoint trap.

Though I did not find documentation about this, tests showed that we should check for SI_KERNEL, just like i386. I've cc-ed Maciej, just to be sure this is indeed correct.

With the following patch I have cleaner results for thread tests on MIPS/Linux.

Regression-tested on a few MIPS boards.

OK?

gdb/ChangeLog:

2015-09-18 Luis Machado <lgustavo@codesourcery.com>

* nat/linux-ptrace.h: Check for SI_KERNEL si_code for MIPS.

---
gdb/nat/linux-ptrace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
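The check the commit message describes can be sketched roughly as follows. This is only an illustrative model, not the contents of nat/linux-ptrace.h: the `demo_` names and the architecture tags are hypothetical, the numeric constants are local copies of the Linux `SI_KERNEL` and `TRAP_BRKPT` values, and the fallback branch assumes a kernel that reports `TRAP_BRKPT` for breakpoints.

```c
#include <stdbool.h>

/* Local, illustrative copies of the Linux si_code values, so that the
   sketch does not depend on which feature macros expose them.  */
enum demo_si_code
{
  DEMO_TRAP_BRKPT = 1,     /* TRAP_BRKPT: process breakpoint.  */
  DEMO_SI_KERNEL = 0x80    /* SI_KERNEL: sent by the kernel, no specific code.  */
};

/* Hypothetical architecture tags; these are not GDB identifiers.  */
enum demo_arch { DEMO_ARCH_GENERIC, DEMO_ARCH_I386, DEMO_ARCH_MIPS };

/* Return true if the SI_CODE of a SIGTRAP should be read as "software
   breakpoint hit" on ARCH.  */
bool
demo_sigtrap_is_swbreak (enum demo_arch arch, int si_code)
{
  switch (arch)
    {
    case DEMO_ARCH_I386:
      /* The commit message states that i386 checks for SI_KERNEL.  */
      return si_code == DEMO_SI_KERNEL;
    case DEMO_ARCH_MIPS:
      /* Behaviour proposed by this patch.  */
      return si_code == DEMO_SI_KERNEL;
    default:
      /* Assumption: the kernel reports TRAP_BRKPT for breakpoints.  */
      return si_code == DEMO_TRAP_BRKPT;
    }
}
```

With such a predicate, a MIPS SIGTRAP carrying `SI_KERNEL` would be classified as a software breakpoint hit even when no breakpoint is currently inserted at the reported PC.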
Comments
Hi Luis,

> I tracked this down to the lack of a proper definition of what MIPS' kernel
> returns in the si_code for a software breakpoint trap.
>
> Though I did not find documentation about this, tests showed that we should
> check for SI_KERNEL, just like i386. I've cc-ed Maciej, just to be sure this
> is indeed correct.

Hmm, the MIPS/Linux port does not set any particular code for SIGTRAP, all such signals will have the SI_KERNEL default, so you may well return TRUE unconditionally.

I'm not convinced however that it is safe to assume all SIGTRAPs come from breakpoints -- this signal is sent by the kernel for both BREAK and trap (multiple mnemonics, e.g. TEQ, TGEI, etc.) instructions which may have been placed throughout code for some reason, for example to serve as cheap assertion checks.

Is there a separate check made afterwards like `bpstat_explains_signal' to validate the source of the signal here?

Perhaps we should make it a part of the ABI and teach MIPS/Linux about the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to TRAP_BRKPT, as expected. This won't fix history of course, but at least it will make debugging a little bit easier to handle in the future. Cc-ing `linux-mips' for further input.

I was wondering where these SIGTRAPs come from too BTW, thanks for investigating it. And thanks for the heads-up!

Maciej
We have to be very careful changing the ABI here.

This is used by almost all userspace code to detect integer division by zero. Many things like the libgcj runtime use this to generate runtime exceptions, we don't want to break them.

David Daney
On Fri, 18 Sep 2015, David Daney wrote:

> We have to be very careful changing the ABI here.
>
> This is used by almost all userspace code to detect integer division by zero.
> Many things like the libgcj runtime use this to generate runtime exceptions,
> we don't want to break them.

No worries here, integer division by 0 and overflow use `BREAK 7' and `BREAK 6' respectively (or corresponding trap instructions) and these cases are already handled correctly, as I implemented many years ago for regular MIPS user code and recently fixed for MIPS16 code (and less recently for microMIPS code as well). Have a look at `do_trap_or_bp' in arch/mips/kernel/traps.c for details.

There's no collision with `BREAK 5'; Linux code would of course have to treat 5 as an unrecognised code for traps and would therefore handle the case right in `do_bp' like with kprobe breakpoints.

I hope this clears your concerns.

Maciej
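As a rough user-space model of the code assignment described in this reply: the BREAK codes 5, 6 and 7 come from the thread, while the `demo_` names are hypothetical and the mapping of the division and overflow codes to SIGFPE is stated here as an assumption about the kernel's existing behaviour rather than quoted from it. The sketch is not kernel code.

```c
#include <signal.h>

/* BREAK codes mentioned in the thread; the enumerator names are
   illustrative only.  */
enum demo_break_code
{
  DEMO_BRK_SSTEPBP = 5,    /* Breakpoint encoding used by GDB.  */
  DEMO_BRK_OVERFLOW = 6,   /* Integer overflow.  */
  DEMO_BRK_DIVZERO = 7     /* Integer division by zero.  */
};

/* Return the signal a BREAK instruction with CODE leads to in this
   model.  Assumption: the arithmetic codes raise SIGFPE, everything
   else raises SIGTRAP.  */
int
demo_signal_for_break (unsigned int code)
{
  switch (code)
    {
    case DEMO_BRK_OVERFLOW:
    case DEMO_BRK_DIVZERO:
      return SIGFPE;
    default:
      return SIGTRAP;
    }
}

/* Return nonzero if, under the proposal discussed in this thread, CODE
   would be reported with si_code TRAP_BRKPT instead of the SI_KERNEL
   default.  */
int
demo_would_report_trap_brkpt (unsigned int code)
{
  return code == DEMO_BRK_SSTEPBP;
}
```

In this model the arithmetic codes never reach the SIGTRAP path, which is why tagging `BREAK 5' with TRAP_BRKPT would not collide with the division-by-zero detection David is concerned about.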
Hi Maciej,

I'm finally getting back to this. Sorry for the delay.

On 09/18/2015 01:56 PM, Maciej W. Rozycki wrote:
> Hi Luis,
>
>> I tracked this down to the lack of a proper definition of what MIPS' kernel
>> returns in the si_code for a software breakpoint trap.
>>
>> Though I did not find documentation about this, tests showed that we should
>> check for SI_KERNEL, just like i386. I've cc-ed Maciej, just to be sure this
>> is indeed correct.
>
> Hmm, the MIPS/Linux port does not set any particular code for SIGTRAP,
> all such signals will have the SI_KERNEL default, so you may well return
> TRUE unconditionally.

That is unfortunate. It would be nice to have a well defined standard of communicating these events from kernels to debuggers. All is not lost though, since that can be improved.

> I'm not convinced however that it is safe to assume all SIGTRAPs come
> from breakpoints -- this signal is sent by the kernel for both BREAK and
> trap (multiple mnemonics, e.g. TEQ, TGEI, etc.) instructions which may
> have been placed throughout code for some reason, for example to serve as
> cheap assertion checks.
>
> Is there a separate check made afterwards like `bpstat_explains_signal'
> to validate the source of the signal here?

In my specific example we are dealing with two breakpoint hits by different threads. The first one is reported just fine and the one we have problems with is the queued one that is reported afterwards when we attempt to move the other thread.

We do rely on bpstat_explains_signal when gdbserver/the remote target notifies gdb about a stop. In the case of a queued breakpoint hit, it doesn't even get reported back to gdb and is just ignored by gdbserver if it is recognized as a breakpoint hit.

With the proposed change, even the hardcoded traps will initially be recognized as breakpoints, albeit maybe recognized as permanent breakpoints for some opcodes. It may cause gdbserver to ignore a second hardcoded trap hit, which is not desirable.

> Perhaps we should make it a part of the ABI and teach MIPS/Linux about
> the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP
> in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to
> TRAP_BRKPT, as expected. This won't fix history of course, but at least
> it will make debugging a little bit easier to handle in the future.
> Cc-ing `linux-mips' for further input.

This is the best solution in my opinion and will definitely make the debugger's life easier if it can tell the difference between multiple seemingly equivalent SIGTRAP's.

Does this involve handling BRK_SSTEPBP inside arch/mips/kernel/traps.c:do_trap_or_bp?

> I was wondering where these SIGTRAPs come from too BTW, thanks for
> investigating it. And thanks for the heads-up!

You're welcome.
On Tue, 9 Feb 2016, Luis Machado wrote:

> I'm finally getting back to this. Sorry for the delay.

No problem, thanks for keeping track of it.

> > I'm not convinced however that it is safe to assume all SIGTRAPs come
> > from breakpoints -- this signal is sent by the kernel for both BREAK and
> > trap (multiple mnemonics, e.g. TEQ, TGEI, etc.) instructions which may
> > have been placed throughout code for some reason, for example to serve as
> > cheap assertion checks.
> >
> > Is there a separate check made afterwards like `bpstat_explains_signal'
> > to validate the source of the signal here?
>
> In my specific example we are dealing with two breakpoint hits by different
> threads. The first one is reported just fine and the one we have problems with
> is the queued one that is reported afterwards when we attempt to move the
> other thread.
>
> We do rely on bpstat_explains_signal when gdbserver/the remote target notifies
> gdb about a stop. In the case of a queued breakpoint hit, it doesn't even get
> reported back to gdb and is just ignored by gdbserver if it is recognized as a
> breakpoint hit.

I'm not sure I understand what's going on here, why is a breakpoint hit required to be ignored by gdbserver?

From what you say I infer GDB controls a multi-threaded program and there it sets a software breakpoint which, by its nature, is global to all the threads. Then multiple threads hit the breakpoint simultaneously (or nearly simultaneously) and the hits are delivered to GDB one by one, via gdbserver. So why is GDB not prepared for that?

It set the breakpoint itself so it should expect it to hit sometime, and if there are multiple threads, then there can be multiple hits at once or almost at once because, even in the all-stop mode, there is no guaranteed way to stop the threads all at once. The threads may be spread across different processors in an SMP system for example, all trapping literally at once -- and then the kernel queueing the signals according to its own internal schedule before delivering them to the debugger (that, from the kernel's point of view, being gdbserver in this case).

> With the proposed change, even the hardcoded traps will initially be
> recognized as breakpoints, albeit maybe recognized as permanent breakpoints
> for some opcodes. It may cause gdbserver to ignore a second hardcoded trap
> hit, which is not desirable.

So why does gdbserver have to be taught which breakpoints may have potentially been set by GDB and which may have not? Why not deliver them all and leave it up to GDB to decide? I believe it will be the right thing to let GDB know that more than one thread has hit the same breakpoint.

Did I miss anything? How is this situation handled in a native debug scenario?

> > Perhaps we should make it a part of the ABI and teach MIPS/Linux about
> > the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP
> > in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to
> > TRAP_BRKPT, as expected. This won't fix history of course, but at least
> > it will make debugging a little bit easier to handle in the future.
> > Cc-ing `linux-mips' for further input.
>
> This is the best solution in my opinion and will definitely make the
> debugger's life easier if it can tell the difference between multiple
> seemingly equivalent SIGTRAP's.
>
> Does this involve handling BRK_SSTEPBP inside
> arch/mips/kernel/traps.c:do_trap_or_bp?

No, as I noted in my reply to David elsewhere in this thread, this would have to be in `do_bp' instead, to exclude trap instructions (e.g. TEQ, etc.) from being treated as breakpoints. I can implement this change myself for you, but we need to agree first what the right solution for GDB is. So far it looks to me we'd be only papering over a problem elsewhere.

Maciej
On 02/09/2016 06:55 PM, Maciej W. Rozycki wrote:
> On Tue, 9 Feb 2016, Luis Machado wrote:
>
>> I'm finally getting back to this. Sorry for the delay.
>
> No problem, thanks for keeping track of it.
>
>>> I'm not convinced however that it is safe to assume all SIGTRAPs come
>>> from breakpoints -- this signal is sent by the kernel for both BREAK and
>>> trap (multiple mnemonics, e.g. TEQ, TGEI, etc.) instructions which may
>>> have been placed throughout code for some reason, for example to serve as
>>> cheap assertion checks.
>>>
>>> Is there a separate check made afterwards like `bpstat_explains_signal'
>>> to validate the source of the signal here?
>>
>> In my specific example we are dealing with two breakpoint hits by different
>> threads. The first one is reported just fine and the one we have problems with
>> is the queued one that is reported afterwards when we attempt to move the
>> other thread.
>>
>> We do rely on bpstat_explains_signal when gdbserver/the remote target notifies
>> gdb about a stop. In the case of a queued breakpoint hit, it doesn't even get
>> reported back to gdb and is just ignored by gdbserver if it is recognized as a
>> breakpoint hit.
>
> I'm not sure I understand what's going on here, why is a breakpoint hit
> required to be ignored by gdbserver?
>
> From what you say I infer GDB controls a multi-threaded program and there
> it sets a software breakpoint which, by its nature, is global to all the
> threads. Then multiple threads hit the breakpoint simultaneously (or
> nearly simultaneously) and the hits are delivered to GDB one by one, via
> gdbserver. So why is GDB not prepared for that?
>
> It set the breakpoint itself so it should expect it to hit sometime, and
> if there are multiple threads, then there can be multiple hits at once or
> almost at once because, even in the all-stop mode, there is no guaranteed
> way to stop the threads all at once. The threads may be spread across
> different processors in an SMP system for example, all trapping literally
> at once -- and then the kernel queueing the signals according to its own
> internal schedule before delivering them to the debugger (that, from the
> kernel's point of view, being gdbserver in this case).

I went through the data again and I was partially mistaken about the above. Here's a detailed description of the events that take place in the reported situation.

1 - A breakpoint is inserted by GDB at some code that is executed by multiple threads.
2 - Two threads, let us call them 1 and 2, happen to hit the same software breakpoint, so both SIGTRAP's get sent by the kernel and gdbserver picks one of them to process.
3 - gdbserver figures out this is from a breakpoint hit, since it knows there is a breakpoint inserted at that PC, and sends a swbreak stop reply back to gdb.
4 - GDB gets the swbreak stop reply and notifies the user about the breakpoint hit for thread 1, displays the frame information etc.

Now, the user goes and deletes that specific breakpoint that triggered the previous event and switches the context to thread 2 and then attempts to continue execution.

5 - In gdbserver, thread 2 still has a pending SIGTRAP that was not yet handled.
6 - gdbserver proceeds to handle it, sees it is a SIGTRAP but cannot map that event back to a software breakpoint hit due to the removal of such breakpoint and because gdbserver doesn't expect SI_KERNEL to mean "software breakpoint hit".
7 - gdbserver then assumes this is a generic trap and reports it as such to GDB, in a new stop reply.
8 - GDB receives the stop reply and displays a generic trace/breakpoint SIGTRAP since it also cannot map the trap back to a software breakpoint, i.e. bpstat_explains_signal returns 0 and random_signal is non-zero.

Patching gdbserver to recognize a si_code of SI_KERNEL as a software breakpoint trap causes changes starting from 6.

6 - gdbserver proceeds to handle it, sees it is a SIGTRAP and that its si_code is SI_KERNEL, meaning a software breakpoint hit now, even though we can't recognize a breakpoint hit by checking for an underlying breakpoint instruction.
7 - gdbserver sends a swbreak stop reply back to GDB.
8 - GDB receives the stop reply and notices it is a delayed breakpoint hit. According to its logic, GDB discards this as useless and the program continues its execution properly. Here random_signal is non-zero and target_stopped_by_sw_breakpoint () returns 1 (because gdbserver told GDB so).

The problem of forcing gdbserver to recognize all traps with si_code==SI_KERNEL is that even hardcoded traps will be reported back to GDB as a swbreak event, which is not ideal.

But currently there is no easy way to tell a software breakpoint hit and a hardcoded trap (and maybe even a hardware breakpoint hit?) apart.

>> With the proposed change, even the hardcoded traps will initially be
>> recognized as breakpoints, albeit maybe recognized as permanent breakpoints
>> for some opcodes. It may cause gdbserver to ignore a second hardcoded trap
>> hit, which is not desirable.
>
> So why does gdbserver have to be taught which breakpoints may have
> potentially been set by GDB and which may have not? Why not deliver
> them all and leave it up to GDB to decide? I believe it will be the right
> thing to let GDB know that more than one thread has hit the same
> breakpoint.
>
> Did I miss anything? How is this situation handled in a native debug
> scenario?

I take it native debugging will display the same symptoms because the definitions of si_code are shared between GDB and gdbserver, from nat/linux-ptrace.h. Native also uses a similar function to check for breakpoint hits, namely gdb/linux-nat.c:check_stopped_by_breakpoint. I did not test this with a native debugger though.

>>> Perhaps we should make it a part of the ABI and teach MIPS/Linux about
>>> the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP
>>> in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to
>>> TRAP_BRKPT, as expected. This won't fix history of course, but at least
>>> it will make debugging a little bit easier to handle in the future.
>>> Cc-ing `linux-mips' for further input.
>>
>> This is the best solution in my opinion and will definitely make the
>> debugger's life easier if it can tell the difference between multiple
>> seemingly equivalent SIGTRAP's.
>>
>> Does this involve handling BRK_SSTEPBP inside
>> arch/mips/kernel/traps.c:do_trap_or_bp?
>
> No, as I noted in my reply to David elsewhere in this thread, this would
> have to be in `do_bp' instead, to exclude trap instructions (e.g. TEQ,
> etc.) from being treated as breakpoints. I can implement this change
> myself for you, but we need to agree first what the right solution for GDB
> is. So far it looks to me we'd be only papering over a problem elsewhere.

Hopefully the above makes what we're facing more clear.
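The difference between the unpatched and patched steps 6 and 7 in the message above can be modelled with a small decision function. This is a simplified sketch and not gdbserver code: the `demo_` types, fields and the boolean switch standing in for the patch are all hypothetical.

```c
#include <stdbool.h>

/* What the stub tells GDB in its stop reply, in this model.  */
enum demo_stop_reply { DEMO_STOP_SWBREAK, DEMO_STOP_SIGTRAP };

/* Facts known about a pending SIGTRAP when it is finally processed.  */
struct demo_pending_trap
{
  bool breakpoint_inserted_at_pc;  /* Is a GDB breakpoint still inserted?  */
  bool si_code_is_si_kernel;       /* Did the kernel report SI_KERNEL?  */
};

/* Classify TRAP.  SI_KERNEL_MEANS_SWBREAK models whether the patch is
   applied.  */
enum demo_stop_reply
demo_classify_pending_trap (const struct demo_pending_trap *trap,
                            bool si_kernel_means_swbreak)
{
  /* Step 3: a breakpoint known at the PC explains the trap.  */
  if (trap->breakpoint_inserted_at_pc)
    return DEMO_STOP_SWBREAK;

  /* Patched steps 6 and 7: trust the si_code even though the
     breakpoint has since been removed.  */
  if (si_kernel_means_swbreak && trap->si_code_is_si_kernel)
    return DEMO_STOP_SWBREAK;

  /* Unpatched steps 6 and 7: report a generic SIGTRAP.  */
  return DEMO_STOP_SIGTRAP;
}
```

For thread 2 in the scenario (breakpoint already deleted, si_code of SI_KERNEL), the model yields a generic SIGTRAP without the patch and a swbreak reply with it.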
On Wed, 10 Feb 2016, Luis Machado wrote:

> I went through the data again and I was partially mistaken about the above.
> Here's a detailed description of the events that take place in the reported
> situation.
>
> 1 - A breakpoint is inserted by GDB at some code that is executed by multiple
> threads.
> 2 - Two threads, let us call them 1 and 2, happen to hit the same software
> breakpoint, so both SIGTRAP's get sent by the kernel and gdbserver picks one
> of them to process.
> 3 - gdbserver figures out this is from a breakpoint hit, since it knows there
> is a breakpoint inserted at that PC, and sends a swbreak stop reply back to
> gdb.
> 4 - GDB gets the swbreak stop reply and notifies the user about the breakpoint
> hit for thread 1, displays the frame information etc.
>
> Now, the user goes and deletes that specific breakpoint that triggered the
> previous event and switches the context to thread 2 and then attempts to
> continue execution.
>
> 5 - In gdbserver, thread 2 still has a pending SIGTRAP that was not yet
> handled.
> 6 - gdbserver proceeds to handle it, sees it is a SIGTRAP but cannot map that
> event back to a software breakpoint hit due to the removal of such breakpoint
> and because gdbserver doesn't expect SI_KERNEL to mean "software breakpoint
> hit".
> 7 - gdbserver then assumes this is a generic trap and reports it as such to
> GDB, in a new stop reply.
> 8 - GDB receives the stop reply and displays a generic trace/breakpoint
> SIGTRAP since it also cannot map the trap back to a software breakpoint, i.e.
> bpstat_explains_signal returns 0 and random_signal is non-zero.
>
> Patching gdbserver to recognize a si_code of SI_KERNEL as a software
> breakpoint trap causes changes starting from 6.
>
> 6 - gdbserver proceeds to handle it, sees it is a SIGTRAP and that its si_code
> is SI_KERNEL, meaning a software breakpoint hit now, even though we can't
> recognize a breakpoint hit by checking for an underlying breakpoint
> instruction.
> 7 - gdbserver sends a swbreak stop reply back to GDB.
> 8 - GDB receives the stop reply and notices it is a delayed breakpoint hit.
> According to its logic, GDB discards this as useless and the program continues
> its execution properly. Here random_signal is non-zero and
> target_stopped_by_sw_breakpoint () returns 1 (because gdbserver told GDB so).
>
> The problem of forcing gdbserver to recognize all traps with
> si_code==SI_KERNEL is that even hardcoded traps will be reported back to GDB
> as a swbreak event, which is not ideal.
>
> But currently there is no easy way to tell a software breakpoint hit and a
> hardcoded trap (and maybe even a hardware breakpoint hit?) apart.

Thanks for the detailed explanation.

FWIW, I maintain it's GDB that should be handling it. What if TRAP_BRKPT is reported for a breakpoint that has not been set by GDB in the first place and is still there in code? I take it either GDB or gdbserver, as applicable, will just sit there looping indefinitely in an attempt to discard the event while executing the breakpoint instruction over and over again. There's nothing stopping the user from having a MIPS `BREAK 5' instruction or say INT3 for the x86 target already present in the executable being debugged.

What I think GDB ought to be doing here is caching addresses for recently removed breakpoints and discarding spurious hits in that cache. It may actually be worth verifying, before discarding such a hit, that there is no breakpoint instruction there anymore in target memory too -- a clever user might have set a breakpoint on a breakpoint instruction already there in code!

It seems to me it'll be enough if the cache is only retained over a single resumption step, per thread of course, as it does not appear to me that the kernel might queue more than one software breakpoint signal at once.

NB conceptually this is similar to handling spurious hardware interrupts in the kernel, where the kernel interrogates all possible interrupt sources for the reporting interrupt line, which might be a physical shared wire, a shared vector, etc., before discarding them as invalid. Such interrupts are sometimes delivered when the originating device pulls its request away before it's been handled.

> > Did I miss anything? How is this situation handled in a native debug
> > scenario?
>
> I take it native debugging will display the same symptoms because the
> definitions of si_code are shared between GDB and gdbserver, from
> nat/linux-ptrace.h. Native also uses a similar function to check for
> breakpoint hits, namely gdb/linux-nat.c:check_stopped_by_breakpoint. I did not
> test this with a native debugger though.

OK, noted. Thanks!

> > > > Perhaps we should make it a part of the ABI and teach MIPS/Linux about
> > > > the breakpoint encoding used by GDB, which is `BREAK 5' (aka BRK_SSTEPBP
> > > > in kernel-speak, a misnomer I'm afraid), and make it set `si_code' to
> > > > TRAP_BRKPT, as expected. This won't fix history of course, but at least
> > > > it will make debugging a little bit easier to handle in the future.
> > > > Cc-ing `linux-mips' for further input.
> > >
> > > This is the best solution in my opinion and will definitely make the
> > > debugger's life easier if it can tell the difference between multiple
> > > seemingly equivalent SIGTRAP's.
> > >
> > > Does this involve handling BRK_SSTEPBP inside
> > > arch/mips/kernel/traps.c:do_trap_or_bp?
> >
> > No, as I noted in my reply to David elsewhere in this thread, this would
> > have to be in `do_bp' instead, to exclude trap instructions (e.g. TEQ,
> > etc.) from being treated as breakpoints. I can implement this change
> > myself for you, but we need to agree first what the right solution for GDB
> > is. So far it looks to me we'd be only papering over a problem elsewhere.
>
> Hopefully the above makes what we're facing more clear.

I think we could do this as well, so as to save GDB from the need to verify truly irrelevant traps against its cache. Though I don't think it buys us anything short-term as we'll have to continue to support kernels which have no notion of TRAP_BRKPT. So no need to rush doing this IMHO.

Maciej
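The cache suggested in this message could look roughly like the sketch below. It models only the suggestion as worded here (remember recently removed breakpoint addresses, drop them after one resumption, and check target memory before discarding a hit); every `demo_` name is hypothetical, the memory check is reduced to a boolean supplied by the caller, and nothing here is taken from GDB's sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_SLOTS 8

/* Per-thread record of addresses whose breakpoints were just removed.  */
struct demo_removed_cache
{
  uint64_t addr[DEMO_CACHE_SLOTS];
  size_t count;
};

/* Remember that the breakpoint at ADDR has been removed.  Entries
   beyond the fixed capacity are silently dropped in this sketch.  */
void
demo_cache_note_removed (struct demo_removed_cache *cache, uint64_t addr)
{
  if (cache->count < DEMO_CACHE_SLOTS)
    cache->addr[cache->count++] = addr;
}

/* Forget everything, e.g. after a single resumption step.  */
void
demo_cache_clear (struct demo_removed_cache *cache)
{
  cache->count = 0;
}

/* Return true if a SIGTRAP at PC may be discarded as a stale hit:
   memory at PC no longer holds a breakpoint instruction and PC is the
   address of a recently removed breakpoint.  */
bool
demo_hit_is_spurious (const struct demo_removed_cache *cache, uint64_t pc,
                      bool insn_at_pc_is_breakpoint)
{
  if (insn_at_pc_is_breakpoint)
    return false;               /* A hardcoded trap is a real event.  */

  for (size_t i = 0; i < cache->count; i++)
    if (cache->addr[i] == pc)
      return true;

  return false;
}
```

The sketch compares the reported PC directly with the breakpoint address, which only holds on targets where the PC is left pointing at the breakpoint instruction.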
On 02/15/2016 11:50 PM, Maciej W. Rozycki wrote:

> FWIW, I maintain it's GDB that should be handling it. What if TRAP_BRKPT
> is reported for a breakpoint that has not been set by GDB in the first
> place and is still there in code? I take it either GDB or gdbserver, as
> applicable, will just sit there looping indefinitely in an attempt to
> discard the event while executing the breakpoint instruction over and over
> again.

Nope.

> There's nothing stopping the user from having a MIPS `BREAK 5'
> instruction or say INT3 for the x86 target already present in the
> executable being debugged.

GDB only ignores the TRAP_BRKPT event if there's no "BREAK 5" instruction hardcoded in the binary. If there is, then the program is stopped and a SIGTRAP is reported to the user.

> What I think GDB ought to be doing here is caching addresses for recently
> removed breakpoints and discarding spurious hits in that cache.

That does not work in general. The most problematic archs are those that leave the PC pointing after the breakpoint instruction, such as x86. See more below.

> It may
> actually be worth verifying, before discarding such a hit, that there is
> no breakpoint instruction there anymore in target memory too -- a clever
> user might have set a breakpoint on a breakpoint instruction already there
> in code!

Yep, GDB does that already, and we have tests that cover this. See gdb.base/bp-permanent.exp.

> It seems to me it'll be enough if the cache is only retained over a
> single resumption step, per thread of course, as it does not appear to me
> that the kernel might queue more than one software breakpoint signal at
> once.

That wouldn't work, as a new thread GDB doesn't know about yet may report a stop for the PC where a breakpoint used to be, and then you don't know whether you need to adjust its PC.

Remembering whether a breakpoint was inserted was what GDB used to do, and it was because it was problematic that "swbreak" was invented. See:

https://sourceware.org/ml/gdb-patches/2015-02/msg00726.html

Particularly:

https://sourceware.org/ml/gdb-patches/2015-02/msg00730.html

This was a previous attempt that tried to preserve moribund locations, but was still not sufficient:

https://sourceware.org/ml/gdb-patches/2014-10/msg00781.html

I'm really hoping that at some point all archs implement TRAP_BRKPT and the moribund locations heuristic can be removed from gdb.

Thanks,
Pedro Alves
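The PC adjustment mentioned in this message can be illustrated with a tiny helper. It is a sketch under stated assumptions, not GDB code: the function and parameter names are hypothetical, and the per-architecture offset is simply passed in (1 for x86, whose one-byte INT3 leaves the PC just past the instruction, and 0 for targets that leave the PC on the breakpoint).

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the address of the instruction that caused a stop.
   DECR_PC_AFTER_BREAK is how far past a software breakpoint the target
   leaves the PC.  The adjustment is only valid when the stop is known
   to come from a software breakpoint.  */
uint64_t
demo_stop_address (uint64_t reported_pc, unsigned int decr_pc_after_break,
                   bool stopped_by_swbreak)
{
  if (stopped_by_swbreak)
    return reported_pc - decr_pc_after_break;
  return reported_pc;
}
```

Without a reliable "stopped by software breakpoint" indication the debugger has to guess whether to apply the offset, and a wrong guess leaves the thread's PC off by the size of the breakpoint instruction; that guess is the one a target-reported swbreak stop is meant to remove.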
On 02/10/2016 12:52 PM, Luis Machado wrote:

> The problem of forcing gdbserver to recognize all traps with
> si_code==SI_KERNEL is that even hardcoded traps will be reported back to
> GDB as a swbreak event, which is not ideal.

That's how swbreak is defined:

  @item swbreak
  @anchor{swbreak stop reason}
  The packet indicates a memory breakpoint instruction was executed,
  irrespective of whether it was @value{GDBN} that planted the breakpoint
  or the breakpoint is hardcoded in the program.

> But currently there is no easy way to tell a software breakpoint hit and
> a hardcoded trap (and maybe even a hardware breakpoint hit?) apart.

Software breakpoint hits or hardcoded traps are handled the same. Even if GDB plants the breakpoint instruction itself with direct memory pokes (instead of z0 packets), the target should report "swbreak" stops, so that gdb can do the right thing.

GDB knows whether to discard the hit as a delayed breakpoint hit event by checking whether the thread's PC points at a hardcoded trap. If it does, the event is not discarded, but instead reported to the user as a SIGTRAP.

Hardware breakpoint hits are distinguished from software breakpoint hits, because they're reported with "hwbreak", not "swbreak":

  @item hwbreak
  The packet indicates the target stopped for a hardware breakpoint.
  The @var{r} part must be left empty.

Thanks,
Pedro Alves
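The handling of a swbreak stop described in this message can be summarised as a three-way decision. The following is an illustrative model only: the `demo_` names are hypothetical, and both inputs are plain booleans standing in for GDB's breakpoint lookup and its read of target memory at the thread's PC.

```c
#include <stdbool.h>

/* What the debugger does with a "swbreak" stop in this model.  */
enum demo_swbreak_action
{
  DEMO_REPORT_BREAKPOINT_HIT,   /* A breakpoint GDB knows about.  */
  DEMO_REPORT_SIGTRAP,          /* A trap hardcoded in the program.  */
  DEMO_DISCARD_AND_RESUME       /* Delayed hit of a removed breakpoint.  */
};

/* Decide how to treat a swbreak stop.  GDB_BREAKPOINT_AT_PC says
   whether GDB currently has a breakpoint at the thread's PC;
   BREAKPOINT_INSN_AT_PC says whether program memory at that PC
   contains a breakpoint instruction of its own.  */
enum demo_swbreak_action
demo_handle_swbreak (bool gdb_breakpoint_at_pc, bool breakpoint_insn_at_pc)
{
  if (gdb_breakpoint_at_pc)
    return DEMO_REPORT_BREAKPOINT_HIT;
  if (breakpoint_insn_at_pc)
    return DEMO_REPORT_SIGTRAP;
  return DEMO_DISCARD_AND_RESUME;
}
```

Because the last branch is reached only when memory at the PC holds no breakpoint instruction, a trap hardcoded in the program is reported to the user rather than discarded, so execution does not loop over it.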
On Tue, 16 Feb 2016, Pedro Alves wrote:

> > FWIW, I maintain it's GDB that should be handling it.  What if TRAP_BRKPT
> > is reported for a breakpoint that has not been set by GDB in the first
> > place and is still there in code?  I take it either GDB or gdbserver, as
> > applicable, will just sit there looping indefinitely in an attempt to
> > discard the event while executing the breakpoint instruction over and over
> > again.
>
> Nope.
>
> > There's nothing stopping the user from having a MIPS `BREAK 5'
> > instruction or say INT3 for the x86 target already present in the
> > executable being debugged.
>
> GDB only ignores the TRAP_BRKPT event if there's no "BREAK 5" instruction
> hardcoded in the binary.  If there is, then the program is stopped and
> a SIGTRAP is reported to the user.

 Ah, that makes a fundamental difference to me!  I missed this detail from
Luis's description and concluded that GDB/gdbserver never passes along
TRAP_BRKPT events it cannot associate with a breakpoint set by itself.

> > What I think GDB ought to be doing here is caching addresses for recently
> > removed breakpoints and discarding spurious hits in that cache.
>
> That does not work in general.  The most problematic archs are those that
> leave the PC pointing after the breakpoint instruction, such as x86.
> See more below.
[etc.]

 Fair enough.  Thanks for the extra explanation.

> Remembering whether a breakpoint was inserted was what GDB used to
> do, and it was because it was problematic that "swbreak" was
> invented.  See:
>
> https://sourceware.org/ml/gdb-patches/2015-02/msg00726.html
>
> Particularly:
>
> https://sourceware.org/ml/gdb-patches/2015-02/msg00730.html

 OK, I can see you peek at target text in `program_breakpoint_here_p' to
see if there's a breakpoint there.
 With this in mind I think Luis's change is conceptually good as it
stands, although I think we should move the MIPS case along PowerPC right
away, because TRAP_BRKPT is the right code to use here and the MIPS port
has never produced it so far and therefore there's no backwards
compatibility concern if we accept both SI_KERNEL and TRAP_BRKPT.

 Luis, such an updated change is preapproved as far as I am concerned,
feel free to commit it without another review round.

 As to the kernel side with the observations made in this discussion I
think we should set the trap code for SIGTRAP signals issued with BREAK
instructions to TRAP_BRKPT unconditionally, regardless of the code used.
This won't of course affect encodings which send a different signal such
as SIGFPE.

 We're lacking a code suitable for (conditional) trap instructions.  I
think TRAP_TRAP or suchlike needs to be added.

> > But currently there is no easy way to tell a software breakpoint hit and
> > a hardcoded trap (and maybe even a hardware breakpoint hit?) apart.
>
> Software breakpoint hits or hardcoded traps are handled the same.  Even if GDB
> plants the breakpoint instruction itself with direct memory pokes (instead of
> z0 packets), the target should report "swbreak" stops, so that gdb can do
> the right thing.

 Manual planting is historical AFAIK, gdbserver and other stubs used not
to support `z' packets.  Now we have the right infrastructure in place and
GDB passes MIPS breakpoint encoding requirements along (although I note
that the encoding for microMIPS BREAK16 has changed with R6, so this will
have to be updated) so we could have software breakpoint support added to
MIPS gdbserver as well.

> Hardware breakpoint hits are distinguished from software breakpoint hits,
> because they're reported with "hwbreak", not "swbreak":
>
> @item hwbreak
> The packet indicates the target stopped for a hardware breakpoint.
> The @var{r} part must be left empty.

 Umm, any requirements for this?  We have MIPS data hardware breakpoint
support in the Linux kernel (regrettably not for instructions, but that
could be added sometime), but I can't see TRAP_HWBKPT being set for them,
they just use generic SI_KERNEL as everything else right now.

 Thanks for your patience in explaining this all to me.

  Maciej
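The suggestion above, to treat MIPS like PowerPC and accept both SI_KERNEL and TRAP_BRKPT, amounts to a check along these lines. This is a sketch of the idea only, written as a function rather than as the GDB_ARCH_IS_TRAP_BRKPT macro; `mips_is_trap_brkpt` is a hypothetical name, and the fallback definitions assume the Linux values (0x80 and 1) in case the C library headers hide them under strict feature-test settings.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Fallbacks with the Linux values, only used if <signal.h> did not
   expose these names (assumption: Linux target).  */
#ifndef SI_KERNEL
# define SI_KERNEL 0x80
#endif
#ifndef TRAP_BRKPT
# define TRAP_BRKPT 1
#endif

static_assert (SI_KERNEL != TRAP_BRKPT, "the two codes must be distinct");

/* Accept the si_code current MIPS kernels report for a BREAK
   instruction (SI_KERNEL) as well as the one a future kernel might
   report (TRAP_BRKPT).  */
static bool
mips_is_trap_brkpt (int si_code)
{
  return si_code == SI_KERNEL || si_code == TRAP_BRKPT;
}
```

Accepting both codes keeps the check working on existing kernels while staying correct if the kernel later starts reporting TRAP_BRKPT.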
Hi Maciej,

> As to the kernel side with the observations made in this discussion I
> think we should set the trap code for SIGTRAP signals issued with BREAK
> instructions to TRAP_BRKPT unconditionally, regardless of the code used.
> This won't of course affect encodings which send a different signal such
> as SIGFPE.
>
> We're lacking a code suitable for (conditional) trap instructions.  I
> think TRAP_TRAP or suchlike needs to be added.

Yeah, looks like it.

>> Hardware breakpoint hits are distinguished from software breakpoint hits,
>> because they're reported with "hwbreak", not "swbreak":
>>
>> @item hwbreak
>> The packet indicates the target stopped for a hardware breakpoint.
>> The @var{r} part must be left empty.
>
> Umm, any requirements for this?  We have MIPS data hardware breakpoint
> support in the Linux kernel (regrettably not for instructions, but that
> could be added sometime), but I can't see TRAP_HWBKPT being set for them,
> they just use generic SI_KERNEL as everything else right now.

Can userspace/ptrace still tell whether a hardware breakpoint triggered
by consulting debug registers, similarly to how it can for watchpoints,
with PTRACE_GET_WATCH_REGS, and looking at watchhi?

(assuming this comment is correct):

  /* Target to_stopped_by_watchpoint implementation.  Return 1 if
     stopped by watchpoint.  The watchhi R and W bits indicate the watch
     register triggered.  */

  static int
  mips_linux_stopped_by_watchpoint (struct target_ops *ops)
  {

This is not reachable today, due to lack of TRAP_* in si_code, but I think
that can be fixed.

The only use for hwbreak currently is to know whether to ignore
hardware breakpoint traps gdb can't explain (gdb assumes they're a delayed
event for a hw breakpoint that has since been removed):

  /* Maybe this was a trap for a hardware breakpoint/watchpoint that
     has since been removed.  */
  if (random_signal && target_stopped_by_hw_breakpoint ())
    {
      /* A delayed hardware breakpoint event.  Ignore the trap.  */
      if (debug_infrun)
        fprintf_unfiltered (gdb_stdlog,
                            "infrun: delayed hardware breakpoint/watchpoint "
                            "trap, ignoring\n");
      random_signal = 0;
    }

So if the server claims it supports this stop reason, but then doesn't
send it for hw breakpoint trap, users will see their programs
occasionally stop for random spurious SIGTRAPs (if they use hardware
breakpoints).

If the server does _not_ claim support for the swbreak/hwbreak stop reason,
then the old moribund breakpoints heuristic kicks in:

  /* Check if a moribund breakpoint explains the stop.  */
  if (!target_supports_stopped_by_sw_breakpoint ()
      || !target_supports_stopped_by_hw_breakpoint ())
    {
      for (ix = 0; VEC_iterate (bp_location_p, moribund_locations, ix, loc); ++ix)
        {
          if (breakpoint_location_address_match (loc, aspace, bp_addr)
              && need_moribund_for_location_type (loc))
            {
              bs = bpstat_alloc (loc, &bs_link);
              /* For hits of moribund locations, we should just proceed.  */
              bs->stop = 0;
              bs->print = 0;
              bs->print_it = print_it_noop;
            }
        }
    }

Thanks,
Pedro Alves
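The two paths in this message can be condensed into one simplified decision function. This is a sketch under stated assumptions, not the infrun.c or breakpoint.c logic: `struct trap_info` and `should_ignore_unexplained_trap` are hypothetical names, the moribund list is reduced to a plain array of addresses, and address-space and location-type checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a SIGTRAP that no current breakpoint
   explains.  */
struct trap_info
{
  bool target_reports_stop_reason;  /* Server advertised swbreak/hwbreak.  */
  bool stopped_by_hw_breakpoint;    /* Stop reply carried "hwbreak".  */
  uint64_t pc;                      /* Address the trap was reported at.  */
};

/* Return true if the trap should be silently ignored as a delayed
   event for a breakpoint that has since been removed.  */
static bool
should_ignore_unexplained_trap (const struct trap_info *trap,
                                const uint64_t *moribund_addrs,
                                size_t n_moribund)
{
  assert (trap != NULL);

  /* With stop reasons available, trust what the target said.  */
  if (trap->target_reports_stop_reason)
    return trap->stopped_by_hw_breakpoint;

  /* Otherwise fall back to the heuristic: was a breakpoint recently
     removed from this address?  */
  for (size_t i = 0; i < n_moribund; i++)
    if (moribund_addrs[i] == trap->pc)
      return true;
  return false;
}
```

The first branch also shows why a server that advertises the stop reason but never sends it causes spurious SIGTRAPs: the fallback list is never consulted in that case.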
diff --git a/gdb/nat/linux-ptrace.h b/gdb/nat/linux-ptrace.h
index 1be38fe..09c8bc5 100644
--- a/gdb/nat/linux-ptrace.h
+++ b/gdb/nat/linux-ptrace.h
@@ -142,7 +142,7 @@ struct buffer;
    The generic Linux target code should use GDB_ARCH_IS_TRAP_BRKPT
    instead of TRAP_BRKPT to abstract out these peculiarities.  */
 
-#if defined __i386__ || defined __x86_64__
+#if defined __i386__ || defined __x86_64__ || defined __mips__
 # define GDB_ARCH_IS_TRAP_BRKPT(X) ((X) == SI_KERNEL)
 #elif defined __powerpc__
 # define GDB_ARCH_IS_TRAP_BRKPT(X) ((X) == SI_KERNEL || (X) == TRAP_BRKPT)