[RFA/commit] arm-tdep.c: Do not single-step after hitting a watchpoint.

Message ID 1410786062-19274-1-git-send-email-brobecker@adacore.com
State New, archived
Headers

Commit Message

Joel Brobecker Sept. 15, 2014, 1:01 p.m. UTC
  Hello!

Re: question about ARM watchpoints
    https://www.sourceware.org/ml/gdb/2014-09/msg00000.html

This patch fixes an issue with watchpoints on ARM targets, where
the debugger stops 2 instructions after the instruction causing
the watchpoint. GDB is expected to stop at the next instruction.

The problem is caused by the fact that GDB does an extra single-step
after receiving the watchpoint notification, because the
have_nonsteppable_watchpoint gdbarch attribute is set for ARM
targets. Our experiments indicate that this is incorrect, at
least for the versions of ARM that we tested on (ARMv7). We tried
to get confirmation of this through the ARM documentation, but
did not manage to get a clear answer. So, in light of evidence
that the current code is wrong for some versions of ARM, and
with the lack of evidence regarding the versions of ARM we could
not test on, this patch simply unsets this gdbarch attribute
for all versions of ARM.  The plan is to refine this later on
if/when we find that some systems behave differently.

gdb/ChangeLog:

        * arm-tdep.c (arm_gdbarch_init): Remove call to
        set_gdbarch_have_nonsteppable_watchpoint.

Unless there are comments, I will commit the patch in a week.

Thank you!

---
 gdb/arm-tdep.c | 3 ---
 1 file changed, 3 deletions(-)
  

Comments

Yao Qi Sept. 16, 2014, 11:08 a.m. UTC | #1
Joel Brobecker <brobecker@adacore.com> writes:

> Hello!
>
> Re: question about ARM watchpoints
>     https://www.sourceware.org/ml/gdb/2014-09/msg00000.html
>
> This patch fixes an issue with watchpoints on ARM targets, where
> the debugger stops 2 instructions after the instruction causing
> the watchpoint. GDB is expected to stop at the next instruction.
>
> The problem is caused by the fact that GDB does an extra single-step
> after receiving the watchpoint notification, because the
> have_nonsteppable_watchpoint gdbarch attribute is set for ARM
> targets. Our experiments indicate that this is incorrect, at
> least for the versions of ARM that we tested on (ARMv7). We tried

Joel,
Can you elaborate your experiments?  Do you do experiments on qemu, arm
bare metal targets or arm linux targets?

I find Peter tries to fix the same problem we encounter in qemu side,

  [Qemu-devel] [PATCH] gdbstub: Allow target CPUs to specify watchpoint STOP_BEFORE_ACCESS flag
  http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg02665.html

and this patch isn't accepted yet.

Without this patch, program stops two instructions after the variable is
updated on qemu trunk,

   0x000001ae <+10>:    str     r3, [r7, #12]
   0x000001b0 <+12>:    ldr     r3, [r7, #4]
=> 0x000001b2 <+14>:    cmp     r3, #1
   0x000001b4 <+16>:    bne.n   0x1ba <recurse+22>

however, with Peter's patch applied, program stops one instruction after
the variable is updated,

(gdb) watch b
Hardware watchpoint 3: b
(gdb) c
Continuing.
Hardware watchpoint 3: b

Old value = 1283
New value = 0
recurse (a=10) at ../../../../git/gdb/testsuite/gdb.base/recurse.c:15
15        if (a == 1)
(gdb) disassemble recurse
Dump of assembler code for function recurse:
   0x000001a4 <+0>:     push    {r7, lr}
   0x000001a6 <+2>:     sub     sp, #16
   0x000001a8 <+4>:     add     r7, sp, #0
   0x000001aa <+6>:     str     r0, [r7, #4]
   0x000001ac <+8>:     movs    r3, #0
   0x000001ae <+10>:    str     r3, [r7, #12]
=> 0x000001b0 <+12>:    ldr     r3, [r7, #4]
   0x000001b2 <+14>:    cmp     r3, #1

Note that with patched qemu, two fails in gdb.base/recurse.exp are
fixed.  At least, gdb and qemu should be in sync on this.
  
Joel Brobecker Sept. 16, 2014, 11:59 a.m. UTC | #2
> Can you elaborate your experiments?  Do you do experiments on qemu, arm
> bare metal targets or arm linux targets?

My experiments were on QEMU, but others tried on bare-metal. I tried
on GNU/Linux targets as well, but hardware watchpoints simply did
not work (no signal).

> I find Peter tries to fix the same problem we encounter in qemu side,
> 
>   [Qemu-devel] [PATCH] gdbstub: Allow target CPUs to specify watchpoint STOP_BEFORE_ACCESS flag
>   http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg02665.html
> 
> and this patch isn't accepted yet.
[...]
> Note that with patched qemu, two fails in gdb.base/recurse.exp are
> fixed.  At least, gdb and qemu should be in sync on this.

I think the experiments that were run showed that QEMU is in fact
correct and should NOT be changed.
  
Luis Machado Sept. 16, 2014, 12:05 p.m. UTC | #3
On 09/16/2014 08:59 AM, Joel Brobecker wrote:
>> Can you elaborate your experiments?  Do you do experiments on qemu, arm
>> bare metal targets or arm linux targets?
>
> My experiments were on QEMU, but others tried on bare-metal. I tried
> on GNU/Linux targets as well, but hardware watchpoints simply did
> not work (no signal).
>
>> I find Peter tries to fix the same problem we encounter in qemu side,
>>
>>    [Qemu-devel] [PATCH] gdbstub: Allow target CPUs to specify watchpoint STOP_BEFORE_ACCESS flag
>>    http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg02665.html
>>
>> and this patch isn't accepted yet.
> [...]
>> Note that with patched qemu, two fails in gdb.base/recurse.exp are
>> fixed.  At least, gdb and qemu should be in sync on this.
>
> I think the experiments that were run showed that QEMU is in fact
> correct and should NOT be changed.
>

Do we know what the Linux kernel's behavior on this one is? I wonder 
what the stopped data address shows.

Someone with access to a board with a relatively new kernel could try 
that and rule it out, otherwise we risk fixing something for QEMU/bare 
metal and breaking things for Linux.

Luis
  
Joel Brobecker Sept. 16, 2014, 12:48 p.m. UTC | #4
> >I think the experiments that were run showed that QEMU is in fact
> >correct and should NOT be changed.
> 
> Do we know what the Linux kernel's behavior on this one is? I wonder
> what the stopped data address shows.
> 
> Someone with access to a board with a relatively new kernel could
> try that and rule it out, otherwise we risk fixing something for
> QEMU/bare metal and breaking things for Linux.

When I tested on GNU/Linux, watchpoints simply did not work
(silently ignored, no signal).  I was using an old kernel (2012),
though; but that's all I had access to.  But, all in all, baremetal
should be our most reliable source of info, though,no? - no software
layer to murky the waters.
  
Luis Machado Sept. 16, 2014, 1:09 p.m. UTC | #5
On 09/16/2014 09:48 AM, Joel Brobecker wrote:
>>> I think the experiments that were run showed that QEMU is in fact
>>> correct and should NOT be changed.
>>
>> Do we know what the Linux kernel's behavior on this one is? I wonder
>> what the stopped data address shows.
>>
>> Someone with access to a board with a relatively new kernel could
>> try that and rule it out, otherwise we risk fixing something for
>> QEMU/bare metal and breaking things for Linux.
>
> When I tested on GNU/Linux, watchpoints simply did not work
> (silently ignored, no signal).  I was using an old kernel (2012),
> though; but that's all I had access to.  But, all in all, baremetal
> should be our most reliable source of info, though,no? - no software
> layer to murky the waters.
>

It is hard to tell. ARM's documentation is not clear. For example, this 
is probably where the stopped data address should be coming from:

--

WFAR - Watchpoint Fault Address Register

The WFAR is updated to indicate the address of the instruction that 
accessed the watchpointed address:

- the address of the instruction + 8 in ARM state
- the address of the instruction + 4 in Thumb® state

--

So it seems in line with what we are seeing? The program being trapped 
two instructions after the data fault?

If it stops just a single instruction after the data fault, then someone 
(probe, emulator or kernel) may be trying to help GDB by decrementing 
the data fault address.

Luis
  
Pedro Alves Sept. 16, 2014, 3:21 p.m. UTC | #6
Hi Terry, Marcus,

Can someone at ARM shed some light on this, please?

This thread is here:

 https://sourceware.org/ml/gdb-patches/2014-09/msg00498.html

And the discussion started in another thread here:

  https://sourceware.org/ml/gdb/2014-09/msg00000.html

I've just added a test that hopefully helps with this, btw:

 https://sourceware.org/ml/gdb-patches/2014-09/msg00535.html

I'm also wondering whether Aarch64 needs adjustment as well.

Thanks,
Pedro Alves

On 09/16/2014 02:09 PM, Luis Machado wrote:
> On 09/16/2014 09:48 AM, Joel Brobecker wrote:
>>>> I think the experiments that were run showed that QEMU is in fact
>>>> correct and should NOT be changed.
>>>
>>> Do we know what the Linux kernel's behavior on this one is? I wonder
>>> what the stopped data address shows.
>>>
>>> Someone with access to a board with a relatively new kernel could
>>> try that and rule it out, otherwise we risk fixing something for
>>> QEMU/bare metal and breaking things for Linux.
>>
>> When I tested on GNU/Linux, watchpoints simply did not work
>> (silently ignored, no signal).  I was using an old kernel (2012),
>> though; but that's all I had access to.  But, all in all, baremetal
>> should be our most reliable source of info, though,no? - no software
>> layer to murky the waters.
>>
> 
> It is hard to tell. ARM's documentation is not clear. For example, this 
> is probably where the stopped data address should be coming from:
> 
> --
> 
> WFAR - Watchpoint Fault Address Register
> 
> The WFAR is updated to indicate the address of the instruction that 
> accessed the watchpointed address:
> 
> - the address of the instruction + 8 in ARM state
> - the address of the instruction + 4 in Thumb® state
> 
> --
> 
> So it seems in line with what we are seeing? The program being trapped 
> two instructions after the data fault?
> 
> If it stops just a single instruction after the data fault, then someone 
> (probe, emulator or kernel) may be trying to help GDB by decrementing 
> the data fault address.
> 
> Luis
>
  
Marcus Shawcroft Sept. 18, 2014, 11:40 a.m. UTC | #7
On 16 September 2014 16:21, Pedro Alves <palves@redhat.com> wrote:
> Hi Terry, Marcus,
>
> Can someone at ARM shed some light on this, please?
>
> This thread is here:
>
>  https://sourceware.org/ml/gdb-patches/2014-09/msg00498.html
>
> And the discussion started in another thread here:
>
>   https://sourceware.org/ml/gdb/2014-09/msg00000.html
>
> I've just added a test that hopefully helps with this, btw:
>
>  https://sourceware.org/ml/gdb-patches/2014-09/msg00535.html
>
> I'm also wondering whether Aarch64 needs adjustment as well.
>
> Thanks,
> Pedro Alves


Hi,

In aarch32 execution state a watch point event is taken as a data
abort with the PC containing the address of the faulting instruction +
8 irrespective of thumb mode.

The linux kernel adjusts the reported PC by subtracting 8 such that
the ptrace interface will indicate the address of the faulting
instruction.

Peter Maydell's proposed qemu patch referenced in the thread above
appears to me to align the gdbstub behaviour in qemu with the linux
kernel ptrace() interface behaviour.

w.r.t DBGWFAR, it's use is described as deprecated in  ARM ARMv7-A&R
Issue C.c  c11.11.45. It is not used by linux kernel.

Cheers
/Marcus
  
Pedro Alves Sept. 19, 2014, 5:31 p.m. UTC | #8
Hi Marcus,

On 09/18/2014 12:40 PM, Marcus Shawcroft wrote:
> On 16 September 2014 16:21, Pedro Alves <palves@redhat.com> wrote:
>> Hi Terry, Marcus,
>>
>> Can someone at ARM shed some light on this, please?
>>
>> This thread is here:
>>
>>  https://sourceware.org/ml/gdb-patches/2014-09/msg00498.html
>>
>> And the discussion started in another thread here:
>>
>>   https://sourceware.org/ml/gdb/2014-09/msg00000.html
>>
>> I've just added a test that hopefully helps with this, btw:
>>
>>  https://sourceware.org/ml/gdb-patches/2014-09/msg00535.html
>>
>> I'm also wondering whether Aarch64 needs adjustment as well.
>>

> In aarch32 execution state a watch point event is taken as a data
> abort with the PC containing the address of the faulting instruction +
> 8 irrespective of thumb mode.

So the documentation is actually wrong for thumb?  Or you're
saying that for Thumb2, while it'd be 4 for Thumb 1?

If you're talking about something else, not DBGWFAR (what I think
Luis pointed out before), then can you give a pointer to where these
other watch point events are documented?

> 
> The linux kernel adjusts the reported PC by subtracting 8 such that
> the ptrace interface will indicate the address of the faulting
> instruction.

Hmm.  So when the data abort triggers at fault+8, the instruction
that triggered the abort hasn't actually completed, right?  No memory
has changed yet.

So if nothing does the adjustment, like Gareth found out happens with
the Black Magic Probe, then we'll resume execution from the
wrong address/instruction (with the effects of the skipped instructions
missing, including the memory write...).  Did I understand that
right?  (Gareth, is that what you see?)

> Peter Maydell's proposed qemu patch referenced in the thread above
> appears to me to align the gdbstub behaviour in qemu with the linux
> kernel ptrace() interface behaviour.
> 
> w.r.t DBGWFAR, it's use is described as deprecated in  ARM ARMv7-A&R
> Issue C.c  c11.11.45. It is not used by linux kernel.

Thanks,
Pedro Alves
  
Joel Brobecker Sept. 29, 2014, 5:51 p.m. UTC | #9
Hello all,

> Hmm.  So when the data abort triggers at fault+8, the instruction
> that triggered the abort hasn't actually completed, right?  No memory
> has changed yet.
> 
> So if nothing does the adjustment, like Gareth found out happens with
> the Black Magic Probe, then we'll resume execution from the
> wrong address/instruction (with the effects of the skipped instructions
> missing, including the memory write...).  Did I understand that
> right?  (Gareth, is that what you see?)

I have been trying to understand the various contributions, and
I admit I am still not quite sure...

Does it look like the patch I proposed is correct? It seems to be
supported by Terry Guo's experiments as well...

Thanks!
  
Luis Machado Sept. 29, 2014, 5:57 p.m. UTC | #10
On 09/29/2014 02:51 PM, Joel Brobecker wrote:
> Hello all,
>
>> Hmm.  So when the data abort triggers at fault+8, the instruction
>> that triggered the abort hasn't actually completed, right?  No memory
>> has changed yet.
>>
>> So if nothing does the adjustment, like Gareth found out happens with
>> the Black Magic Probe, then we'll resume execution from the
>> wrong address/instruction (with the effects of the skipped instructions
>> missing, including the memory write...).  Did I understand that
>> right?  (Gareth, is that what you see?)
>
> I have been trying to understand the various contributions, and
> I admit I am still not quite sure...
>
> Does it look like the patch I proposed is correct? It seems to be
> supported by Terry Guo's experiments as well...
>
> Thanks!
>

 From previous mails, it does not seem to be correct for Linux, where 
the ptrace interface adjusts the data fault address to point to the 
address of the instruction that caused the trigger. So it looks like the 
current behavior of GDB is correct for Linux, though it may not be 
correct for QEMU or bare metal.

Luis
  
Pedro Alves Sept. 29, 2014, 9:04 p.m. UTC | #11
On 09/29/2014 06:51 PM, Joel Brobecker wrote:
> Hello all,
> 
>> Hmm.  So when the data abort triggers at fault+8, the instruction
>> that triggered the abort hasn't actually completed, right?  No memory
>> has changed yet.
>>
>> So if nothing does the adjustment, like Gareth found out happens with
>> the Black Magic Probe, then we'll resume execution from the
>> wrong address/instruction (with the effects of the skipped instructions
>> missing, including the memory write...).  Did I understand that
>> right?  (Gareth, is that what you see?)
> 
> I have been trying to understand the various contributions, and
> I admit I am still not quite sure...
> 
> Does it look like the patch I proposed is correct? It seems to be
> supported by Terry Guo's experiments as well...

Nope, Terry's experiments supported the current code.

The experiments (which were on Linux) showed that the watchpoint was
reported to GDB first with the PC pointing at the instruction that
accessed memory, and then GDB single-stepped once, and the PC ends up
pointing at one instruction after the instruction that changed memory.

Thanks,
Pedro Alves
  
Will Deacon Sept. 30, 2014, 8:53 a.m. UTC | #12
On Mon, Sep 29, 2014 at 10:04:11PM +0100, Pedro Alves wrote:
> On 09/29/2014 06:51 PM, Joel Brobecker wrote:
> > Hello all,
> > 
> >> Hmm.  So when the data abort triggers at fault+8, the instruction
> >> that triggered the abort hasn't actually completed, right?  No memory
> >> has changed yet.
> >>
> >> So if nothing does the adjustment, like Gareth found out happens with
> >> the Black Magic Probe, then we'll resume execution from the
> >> wrong address/instruction (with the effects of the skipped instructions
> >> missing, including the memory write...).  Did I understand that
> >> right?  (Gareth, is that what you see?)
> > 
> > I have been trying to understand the various contributions, and
> > I admit I am still not quite sure...
> > 
> > Does it look like the patch I proposed is correct? It seems to be
> > supported by Terry Guo's experiments as well...
> 
> Nope, Terry's experiments supported the current code.
> 
> The experiments (which were on Linux) showed that the watchpoint was
> reported to GDB first with the PC pointing at the instruction that
> accessed memory, and then GDB single-stepped once, and the PC ends up
> pointing at one instruction after the instruction that changed memory.

FWIW, that also matches the intention of the kernel-side code. The same
logic applies to arm64, despite the availability of hardware single-step
there (the PTRACE_SINGLESTEP request can be used to access that feature).

Furthermore, this also matches the ARMv7/8 debug architectures; a
watchpoint data abort will be taken before the faulting instruction has
executed.

Will
  

Patch

diff --git a/gdb/arm-tdep.c b/gdb/arm-tdep.c
index f9feb52..990c4ad 100644
--- a/gdb/arm-tdep.c
+++ b/gdb/arm-tdep.c
@@ -10446,9 +10446,6 @@  arm_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)
   if (tdep->arm_abi == ARM_ABI_AUTO)
     tdep->arm_abi = ARM_ABI_APCS;
 
-  /* Watchpoints are not steppable.  */
-  set_gdbarch_have_nonsteppable_watchpoint (gdbarch, 1);
-
   /* We used to default to FPA for generic ARM, but almost nobody
      uses that now, and we now provide a way for the user to force
      the model.  So default to the most useful variant.  */