[RFA,v2] (x86) Fix watchpoint using hardware breakpoint for some distro

Message ID 9c7c8586-2940-bea9-d3fb-a13b0d38a32e@adacore.com
State New, archived
Headers

Commit Message

Xavier Roirand March 20, 2018, 2:28 p.m. UTC
  Hello Pedro,

I've replied to your comments and attached a v2 patch.

Le 3/19/18 à 3:28 PM, Pedro Alves a écrit :
> A few things are missing here:
> 
> #1 - kernel versions where this was observed.
> 

CentOS: 2.6.18-419.el5
Suse: 2.6.27.19-5-pae

> #2 - If it's not equal to TRAP_HWBKPT, then what's it equal to?
>       I assume zero?

No, it's equal to 1.

> Take a look at the big comment and table in nat/linux-ptrace.h -- is
> this is the only case that is different on these kernels?
> 

AFAIK, yes.

> I think that we should update the table a bit here, at least
> something like:
> 
>   -   | hardware breakpoints/watchpoints         | TRAP_HWBKPT |
>   +   | hardware breakpoints/watchpoints         | TRAP_HWBKPT (*) |
> 
>   (*) - Kernels x.y.z (CentOS 5, Suse 11) leave this as zero > If other cases are different, then that might affect how to best
> address this.
> 

Agree.

> This comment only makes complete sense with the context in the
> git log in mind:
> 
> - This code is run by all architectures, so the comment should mention x86.
> - The comment reads a bit backwards to me -- talks about what it should
>    be before talking about watchpoints.
> 
> I'd suggest something like this:
> 
>     /* On some kernels (such as x86-64 x.y.z/CentOS 5, x.y.z/Suse 11),
>        when we continue into a watchpoint, si_code indicates 0 instead of
>        TRAP_HWBKPT so we need to check debug registers separately.  */
> 
Agree.

> Does the step-into-watchpoint case result in TRAP_TRACE, or does
> that result in 0 too?  That affects the "continue" in the comment above.

This results to a value of 1 too.

> Thanks,
> Pedro Alves
> 

Regards
From 995662a4ffd5901511053c228e4ad6b396de9197 Mon Sep 17 00:00:00 2001
From: Xavier Roirand <roirand@adacore.com>
Date: Tue, 13 Mar 2018 03:52:14 +0100
Subject: [PATCH] (x86) Fix watchpoint using hardware breakpoint for some
 distro

Running watch*.exp tests in gdb.base shows this:

on x86_64/Ubuntu 16.04:

 # of expected passes            2631
 # of unexpected failures        0

on x86_64/Ubuntu CentOS 5.11:

 # of expected passes            2535
 # of unexpected failures        96

The problem can be easily shown in a debug session:

(gdb) watch val
Hardware watchpoint 2: val
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
...

Whereas it should be:

(gdb) watch val
Hardware watchpoint 2: val
(gdb) c
Continuing.
val before change = 0

Hardware watchpoint 2: val

Old value = ...
New value = ...

The Linux target and gdbserver now check the siginfo si_code
reported on a SIGTRAP to detect whether the trap indicates
a hardware breakpoint was hit.

Unfortunately, on some distro (CentOS 5, Suse 11) the returned
si_code value is not equal to TRAP_HWBKPT when a hardware breakpoint
is hit thus the hardware breakpoint is not handled as it should
be, which is also the case for watchpoint when based on hardware
breakpoint.

This patch adds an additional check when the inferior reported
a SIGTRAP in order to detect this case.

No test have been created since all the existing ones are enough
to validate the fix. BTW, with this fix, the tests results for
the watchpoint tests are (for CentOS 5.11):

 # of expected passes            2630
 # of unexpected failures        1

The remaining failure is located in watch-vfork test which explicitly
disable the use of hardware breakpoint which is out of scope here.

gdb/ChangeLog:

        * linux-nat.c (save_stop_reason): Add an additional check
        to detect hardware watchpoint.

nat/ChangeLog:
        * linux-ptrace.h: Update comment before GDB_ARCH_IS_TRAP_XX
        macros to reflect CentOS 5 & Suse 11 specific case.

gdbserver/ChangeLog:

        * linux-low.c (save_stop_reason): Add an additional check
        to detect hardware watchpoint.

For R309-004

Test: x86_64/gdb      /ubuntu 16.04
      x86_64/gdbserver/ubuntu 16.04
      x86   /gdbs     /ubuntu 16.04
      x86   /gdbserver/ubuntu 16.04
      x86_64/gdb      /centos 5.11
      x86_64/gdbserver/centos 5.11
      x86   /gdb      /centos 5.11
      x86   /gdbserver/centos 5.11

Change-Id: I2546aca9827d9ae12ab86deb7aa4acc60c82b4b4
---
 gdb/gdbserver/linux-low.c |  8 ++++++++
 gdb/linux-nat.c           |  8 ++++++++
 gdb/nat/linux-ptrace.h    | 18 ++++++++++--------
 3 files changed, 26 insertions(+), 8 deletions(-)
  

Comments

Pedro Alves March 20, 2018, 3:11 p.m. UTC | #1
On 03/20/2018 02:28 PM, Xavier Roirand wrote:
> Hello Pedro,
> 
> I've replied to your comments and attached a v2 patch.
> 
> Le 3/19/18 à 3:28 PM, Pedro Alves a écrit :
>> A few things are missing here:
>>
>> #1 - kernel versions where this was observed.
>>
> 
> CentOS: 2.6.18-419.el5
> Suse: 2.6.27.19-5-pae
> 
>> #2 - If it's not equal to TRAP_HWBKPT, then what's it equal to?
>>       I assume zero?
> 
> No, it's equal to 1.

Hmm, that's TRAP_BRKPT nowadays.  What was '1' supposed to mean in
kernels of such vintage?  What was it's symbolic name back then?

In the table in linux-ptrace.h, we see that modern kernels report
TRAP_BRKPT/1 for the "single-stepping a syscall" case.  What do
those older kernels report in that case then?

>> Does the step-into-watchpoint case result in TRAP_TRACE, or does
>> that result in 0 too?  That affects the "continue" in the comment above.
> 
> This results to a value of 1 too.

What do those kernels report for hardware _breakpoints_?  Is it 1 too?
Wondering whether we should make GDB_ARCH_IS_TRAP_HWBKPT return
true for 1 too...

Please provide a more complete picture.

Thanks,
Pedro Alves
  
Xavier Roirand March 21, 2018, 4:17 p.m. UTC | #2
Hello Pedro,

Le 3/20/18 à 4:11 PM, Pedro Alves a écrit :
> On 03/20/2018 02:28 PM, Xavier Roirand wrote:
> Hmm, that's TRAP_BRKPT nowadays.  What was '1' supposed to mean in
> kernels of such vintage?  What was it's symbolic name back then?

As far as I can see in header file (/usr/include/bits/siginfo.h),
this is the same value, it's:

TRAP_BRKPT = 1,		/* Process breakpoint.  */

> In the table in linux-ptrace.h, we see that modern kernels report
> TRAP_BRKPT/1 for the "single-stepping a syscall" case.  What do
> those older kernels report in that case then?

When "single-stepping" syscall, the same value (TRAP_BRKPT) is
reported.

> What do those kernels report for hardware _breakpoints_?  Is it 1 too?

Yes, I've checked this and it's 1 also.

> Wondering whether we should make GDB_ARCH_IS_TRAP_HWBKPT return
> true for 1 too...
> 
> Please provide a more complete picture.
> 

Is it enough or do you think I need to provide more info ?

> Thanks,
> Pedro Alves
> 

Regards.
  
Pedro Alves March 26, 2018, 11:38 a.m. UTC | #3
On 03/21/2018 04:17 PM, Xavier Roirand wrote:
> Hello Pedro,
> 
> Le 3/20/18 à 4:11 PM, Pedro Alves a écrit :
>> On 03/20/2018 02:28 PM, Xavier Roirand wrote:
>> Hmm, that's TRAP_BRKPT nowadays.  What was '1' supposed to mean in
>> kernels of such vintage?  What was it's symbolic name back then?
> 
> As far as I can see in header file (/usr/include/bits/siginfo.h),
> this is the same value, it's:
> 
> TRAP_BRKPT = 1,        /* Process breakpoint.  */
> 
>> In the table in linux-ptrace.h, we see that modern kernels report
>> TRAP_BRKPT/1 for the "single-stepping a syscall" case.  What do
>> those older kernels report in that case then?
> 
> When "single-stepping" syscall, the same value (TRAP_BRKPT) is
> reported.
> 
>> What do those kernels report for hardware _breakpoints_?  Is it 1 too?
> 
> Yes, I've checked this and it's 1 also.
> 
>> Wondering whether we should make GDB_ARCH_IS_TRAP_HWBKPT return
>> true for 1 too...
>>
>> Please provide a more complete picture.
>>
> 
> Is it enough or do you think I need to provide more info ?
> 
Earlier you said that TRAP_HWBKPT was the only case that was different
from current kernels, but turns out that's not accurate.  And now
I'm left wondering what do those kernels report for the other cases
you haven't confirmed yet, like hw breakpoints, software breakpoints,
and regular single steps.  We need a complete picture to figure out
what the best fix is.

E.g., if hw breakpoints and watchpoints report the same si_code,
how do hw breakpoints work with your patch?

Thanks,
Pedro Alves
  
Xavier Roirand March 27, 2018, 1:18 p.m. UTC | #4
Hello,

> Earlier you said that TRAP_HWBKPT was the only case that was different
> from current kernels, but turns out that's not accurate.  And now
> I'm left wondering what do those kernels report for the other cases
> you haven't confirmed yet, like hw breakpoints, software breakpoints,
> and regular single steps.  We need a complete picture to figure out
> what the best fix is.
> 

Here's a more complete picture:

Ubuntu 16.04

| what                                     | si_code     | value |
|------------------------------------------+-------------|-------|
| software breakpoints (int3)              | SI_KERNEL   |  0x80 |
| single-steps                             | TRAP_TRACE  |     2 |
| single-stepping a syscall                | TRAP_BRKPT  |     1 |
| user sent SIGTRAP                        | 0           |     0 |
| exec SIGTRAP (when no PTRACE_EVENT_EXEC) | 0           |     0 |
| hardware breakpoints/watchpoints         | TRAP_HWBKPT |     4 |

CentOs 5.11

| what                                     | si_code     | value |
|------------------------------------------+-------------|-------|
| software breakpoints (int3)              | SI_KERNEL   |  0x80 |
| single-steps                             | TRAP_TRACE  |     2 |
| single-stepping a syscall                | TRAP_BRKPT  |     1 |
| user sent SIGTRAP                        | 0           |     0 |
| exec SIGTRAP (when no PTRACE_EVENT_EXEC) | 0           |     0 |
| hardware breakpoints/watchpoints         | TRAP_HWBKPT |     1 |


>> E.g., if hw breakpoints and watchpoints report the same si_code,
>> how do hw breakpoints work with your patch?
>>

In handle_signal_stop function, there's a check done in order to see if 
there's a breakpoint/watchpoint/catchpoint/etc which handle this stop 
event and check if there's a breakpoint/watchpoint/catchpoint where the 
inferior is stopped:

   /* See if there is a breakpoint/watchpoint/catchpoint/etc. that
      handles this event.  */
   ecs->event_thread->control.stop_bpstat
     = bpstat_stop_status (get_current_regcache ()->aspace (),
                           stop_pc, ecs->ptid, &ecs->ws);

Do you think that having a fix handling both cases in save_stop_reason 
would be better (hardware breakpoint & watchpoint) ?

           else
             {
               if (hardware_breakpoint_inserted_here_p (regcache->aspace 
(), pc))
                 lp->stop_reason = TARGET_STOPPED_BY_HW_BREAKPOINT;
               else
                 check_stopped_by_watchpoint (lp);
             }

Rather than my initial proposal:

           else
             {
                 check_stopped_by_watchpoint (lp);
             }

Regards

> Thanks,
> Pedro Alves
>
  

Patch

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 2e5e19d..3fee99a 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -866,6 +866,14 @@  save_stop_reason (struct lwp_info *lwp)
 	      if (!check_stopped_by_watchpoint (lwp))
 		lwp->stop_reason = TARGET_STOPPED_BY_SINGLE_STEP;
 	    }
+	  else
+	    {
+	      /* On some kernels (such as x86-64 2.6.18/CentOS 5,
+	         2.6.27/Suse 11), when we continue into a watchpoint,
+	         si_code indicates 1 instead of TRAP_HWBKPT so we
+	         need to check debug registers separately.  */
+	      check_stopped_by_watchpoint (lwp);
+	    }
 	}
     }
 #else
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 1bbad7b..2e3fa29 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2798,6 +2798,14 @@  save_stop_reason (struct lwp_info *lp)
 		 the debug registers separately.  */
 	      check_stopped_by_watchpoint (lp);
 	    }
+	  else
+	    {
+	      /* On some kernels (such as x86-64 2.6.18/CentOS 5,
+	         2.6.27/Suse 11), when we continue into a watchpoint,
+	         si_code indicates 1 instead of TRAP_HWBKPT so we
+	         need to check debug registers separately.  */
+	      check_stopped_by_watchpoint (lp);
+	    }
 	}
     }
 #else
diff --git a/gdb/nat/linux-ptrace.h b/gdb/nat/linux-ptrace.h
index dc180fb..7e10fb3 100644
--- a/gdb/nat/linux-ptrace.h
+++ b/gdb/nat/linux-ptrace.h
@@ -120,14 +120,16 @@  struct buffer;
 /* The x86 kernel gets some of the si_code values backwards, like
    this:
 
-   | what                                     | si_code     |
-   |------------------------------------------+-------------|
-   | software breakpoints (int3)              | SI_KERNEL   |
-   | single-steps                             | TRAP_TRACE  |
-   | single-stepping a syscall                | TRAP_BRKPT  |
-   | user sent SIGTRAP                        | 0           |
-   | exec SIGTRAP (when no PTRACE_EVENT_EXEC) | 0           |
-   | hardware breakpoints/watchpoints         | TRAP_HWBKPT |
+   | what                                     | si_code         |
+   |------------------------------------------+-----------------|
+   | software breakpoints (int3)              | SI_KERNEL       |
+   | single-steps                             | TRAP_TRACE      |
+   | single-stepping a syscall                | TRAP_BRKPT      |
+   | user sent SIGTRAP                        | 0               |
+   | exec SIGTRAP (when no PTRACE_EVENT_EXEC) | 0               |
+   | hardware breakpoints/watchpoints         | TRAP_HWBKPT (*) |
+
+   (*) - Kernels 2.6.18 (CentOS 5), 2.6.27 (Suse 11) set this to 1.
 
    That is, it reports SI_KERNEL for software breakpoints (and only
    for those), and TRAP_BRKPT for single-stepping a syscall...  If the