[RFA] (x86) Fix watchpoint using hardware breakpoint for some distro

Message ID 1521209212-11264-1-git-send-email-roirand@adacore.com
State New, archived
Headers

Commit Message

Xavier Roirand March 16, 2018, 2:06 p.m. UTC
  Running watch*.exp tests in gdb.base shows this:

on x86_64/Ubuntu 16.04:

 # of expected passes            2631
 # of unexpected failures        0

on x86_64/Ubuntu CentOS 5.11:

 # of expected passes            2535
 # of unexpected failures        96

The problem can be easily shown in a debug session:

(gdb) watch val
Hardware watchpoint 2: val
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
...

Whereas it should be:

(gdb) watch val
Hardware watchpoint 2: val
(gdb) c
Continuing.
val before change = 0

Hardware watchpoint 2: val

Old value = ...
New value = ...

The Linux target and gdbserver now check the siginfo si_code
reported on a SIGTRAP to detect whether the trap indicates
a hardware breakpoint was hit.

Unfortunately, on some distro (CentOS 5, Suse 11) the returned
si_code value is not equal to TRAP_HWBKPT when a hardware breakpoint
is hit thus the hardware breakpoint is not handled as it should
be, which is also the case for watchpoint when based on hardware
breakpoint.

This patch adds an additional check when the inferior reported
a SIGTRAP in order to detect this case.

No test have been created since all the existing ones are enough
to validate the fix. BTW, with this fix, the tests results for
the watchpoint tests are (for CentOS 5.11):

 # of expected passes            2630
 # of unexpected failures        1

The remaining failure is located in watch-vfork test which explicitly
disable the use of hardware breakpoint which is out of scope here.

gdb/ChangeLog:

        * linux-nat.c (save_stop_reason): Add an additional check
        to detect hardware watchpoint.

gdbserver/ChangeLog:

        * linux-low.c (save_stop_reason): Add an additional check
        to detect hardware watchpoint.

For R309-004

Test: x86_64/gdb      /ubuntu 16.04
      x86_64/gdbserver/ubuntu 16.04
      x86   /gdbs     /ubuntu 16.04
      x86   /gdbserver/ubuntu 16.04
      x86_64/gdb      /centos 5.11
      x86_64/gdbserver/centos 5.11
      x86   /gdb      /centos 5.11
      x86   /gdbserver/centos 5.11

Change-Id: I2546aca9827d9ae12ab86deb7aa4acc60c82b4b4
---
 gdb/gdbserver/linux-low.c | 7 +++++++
 gdb/linux-nat.c           | 7 +++++++
 2 files changed, 14 insertions(+)
  

Comments

Pedro Alves March 19, 2018, 2:28 p.m. UTC | #1
Hi Xavier,

On 03/16/2018 02:06 PM, Xavier Roirand wrote:
> Running watch*.exp tests in gdb.base shows this:
> 
> on x86_64/Ubuntu 16.04:
> 
>  # of expected passes            2631
>  # of unexpected failures        0
> 
> on x86_64/Ubuntu CentOS 5.11:
> 
>  # of expected passes            2535
>  # of unexpected failures        96
> 
> The problem can be easily shown in a debug session:
> 
> (gdb) watch val
> Hardware watchpoint 2: val
> (gdb) c
> Continuing.
> Program received signal SIGTRAP, Trace/breakpoint trap.
> ...
> 
> Whereas it should be:
> 
> (gdb) watch val
> Hardware watchpoint 2: val
> (gdb) c
> Continuing.
> val before change = 0
> 
> Hardware watchpoint 2: val
> 
> Old value = ...
> New value = ...
> 
> The Linux target and gdbserver now check the siginfo si_code
> reported on a SIGTRAP to detect whether the trap indicates
> a hardware breakpoint was hit.
> 
> Unfortunately, on some distro (CentOS 5, Suse 11) the returned
> si_code value is not equal to TRAP_HWBKPT when a hardware breakpoint
> is hit thus the hardware breakpoint is not handled as it should
> be, which is also the case for watchpoint when based on hardware
> breakpoint.

A few things are missing here:

#1 - kernel versions where this was observed.

#2 - If it's not equal to TRAP_HWBKPT, then what's it equal to?
     I assume zero?

Take a look at the big comment and table in nat/linux-ptrace.h -- is
this is the only case that is different on these kernels?  

I think that we should update the table a bit here, at least
something like:

 -   | hardware breakpoints/watchpoints         | TRAP_HWBKPT |
 +   | hardware breakpoints/watchpoints         | TRAP_HWBKPT (*) |

 (*) - Kernels x.y.z (CentOS 5, Suse 11) leave this as zero.

If other cases are different, then that might affect how to best
address this.

> 
> This patch adds an additional check when the inferior reported
> a SIGTRAP in order to detect this case.
> 
> No test have been created since all the existing ones are enough
> to validate the fix. BTW, with this fix, the tests results for
> the watchpoint tests are (for CentOS 5.11):
> 
>  # of expected passes            2630
>  # of unexpected failures        1
> 
> The remaining failure is located in watch-vfork test which explicitly
> disable the use of hardware breakpoint which is out of scope here.
> 
> gdb/ChangeLog:
> 
>         * linux-nat.c (save_stop_reason): Add an additional check
>         to detect hardware watchpoint.
> 
> gdbserver/ChangeLog:
> 
>         * linux-low.c (save_stop_reason): Add an additional check
>         to detect hardware watchpoint.
> 
> For R309-004
> 
> Test: x86_64/gdb      /ubuntu 16.04
>       x86_64/gdbserver/ubuntu 16.04
>       x86   /gdbs     /ubuntu 16.04
>       x86   /gdbserver/ubuntu 16.04
>       x86_64/gdb      /centos 5.11
>       x86_64/gdbserver/centos 5.11
>       x86   /gdb      /centos 5.11
>       x86   /gdbserver/centos 5.11
> 
> Change-Id: I2546aca9827d9ae12ab86deb7aa4acc60c82b4b4
> ---
>  gdb/gdbserver/linux-low.c | 7 +++++++
>  gdb/linux-nat.c           | 7 +++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 2e5e19d..fe61026 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -866,6 +866,13 @@ save_stop_reason (struct lwp_info *lwp)
>  	      if (!check_stopped_by_watchpoint (lwp))
>  		lwp->stop_reason = TARGET_STOPPED_BY_SINGLE_STEP;
>  	    }
> +	  else
> +	    {
> +	      /* The si_code is not TRAP_HWBKPT whereas it should
> +	         on some distro (CentOS 5, Suse 11) so still check
> +	         if stopped due to watchpoint.  */
> +	      check_stopped_by_watchpoint (lwp);
> +	    }
>  	}
>      }
>  #else
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 1bbad7b..0d97aa1 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2798,6 +2798,13 @@ save_stop_reason (struct lwp_info *lp)
>  		 the debug registers separately.  */
>  	      check_stopped_by_watchpoint (lp);
>  	    }
> +	  else
> +	    {
> +	      /* The si_code is not TRAP_HWBKPT whereas it should
> +	         on some distro (CentOS 5, Suse 11) so still check
> +	         if stopped due to watchpoint.  */

This comment only makes complete sense with the context in the
git log in mind:

- This code is run by all architectures, so the comment should mention x86.
- The comment reads a bit backwards to me -- talks about what it should
  be before talking about watchpoints.

I'd suggest something like this:

   /* On some kernels (such as x86-64 x.y.z/CentOS 5, x.y.z/Suse 11),
      when we continue into a watchpoint, si_code indicates 0 instead of
      TRAP_HWBKPT so we need to check debug registers separately.  */

Does the step-into-watchpoint case result in TRAP_TRACE, or does
that result in 0 too?  That affects the "continue" in the comment above.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 2e5e19d..fe61026 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -866,6 +866,13 @@  save_stop_reason (struct lwp_info *lwp)
 	      if (!check_stopped_by_watchpoint (lwp))
 		lwp->stop_reason = TARGET_STOPPED_BY_SINGLE_STEP;
 	    }
+	  else
+	    {
+	      /* The si_code is not TRAP_HWBKPT whereas it should
+	         on some distro (CentOS 5, Suse 11) so still check
+	         if stopped due to watchpoint.  */
+	      check_stopped_by_watchpoint (lwp);
+	    }
 	}
     }
 #else
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 1bbad7b..0d97aa1 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2798,6 +2798,13 @@  save_stop_reason (struct lwp_info *lp)
 		 the debug registers separately.  */
 	      check_stopped_by_watchpoint (lp);
 	    }
+	  else
+	    {
+	      /* The si_code is not TRAP_HWBKPT whereas it should
+	         on some distro (CentOS 5, Suse 11) so still check
+	         if stopped due to watchpoint.  */
+	      check_stopped_by_watchpoint (lp);
+	    }
 	}
     }
 #else