diff mbox

aarch64: PR 19806: watchpoints: false negatives -> false positives

Message ID 20160606075945.GA19395@host1.jankratochvil.net
State New
Headers show

Commit Message

Jan Kratochvil June 6, 2016, 7:59 a.m. UTC
Hi,

Aarch64: watchpoints set on non-8-byte-aligned addresses are always missed
https://sourceware.org/bugzilla/show_bug.cgi?id=19806

some unaligned watchpoints are currently missed.  After this patch some other
unaligned watchpoints will get reported as false positives.

It could be probably all fixed on the kernel side, filed a RFE for it:
	kernel RFE: aarch64: ptrace: BAS: Support any contiguous range
	https://sourceware.org/bugzilla/show_bug.cgi?id=20207
->
	https://bugzilla.redhat.com/show_bug.cgi?id=1342821
I have no idea if/when the kernel part gets fixed so I am posting this
hopefully-temporary GDB change.

No regressions on aarch64-fedora23-linux-gnu in native mode.
No regressions on {x86_64,x86_64-m32,i686}-fedorarawhide-linux-gnu in native
and gdbserver mode.


Jan
gdb/ChangeLog
2016-06-06  Jan Kratochvil  <jan.kratochvil@redhat.com>

	PR breakpoints/19806
	* aarch64-linux-nat.c (aarch64_linux_stopped_data_address): Use
	dr_aligned_addr_wp and dr_orig_addr_wp.  Return dr_orig_addr_wp instead
	of addr_trap.
	* nat/aarch64-linux-hw-point.c (aarch64_dr_state_insert_one_point): Add
	parameter orig_addr.  Use dr_aligned_addr_wp and dr_orig_addr_wp.
	(aarch64_dr_state_remove_one_point): Use dr_aligned_addr_wp.
	(aarch64_handle_breakpoint, aarch64_handle_aligned_watchpoint)
	(aarch64_handle_unaligned_watchpoint): Update
	aarch64_dr_state_insert_one_point caller.
	(aarch64_linux_set_debug_regs): Use dr_aligned_addr_wp.
	(aarch64_show_debug_reg_state): Use dr_aligned_addr_wp and
	dr_orig_addr_wp.
	* nat/aarch64-linux-hw-point.h (struct aarch64_debug_reg_state): Rename
	dr_addr_wp to dr_orig_addr_wp, add dr_orig_addr_wp.

gdb/gdbserver/ChangeLog
2016-06-06  Jan Kratochvil  <jan.kratochvil@redhat.com>

	PR breakpoints/19806
	* linux-aarch64-low.c (aarch64_init_debug_reg_state): Use
	dr_aligned_addr_wp and dr_orig_addr_wp.
	(aarch64_stopped_data_address): Use dr_aligned_addr_wp and
	dr_orig_addr_wp.  Return dr_orig_addr_wp instead of addr_trap.

gdb/testsuite/ChangeLog
2016-06-06  Jan Kratochvil  <jan.kratochvil@redhat.com>

	PR breakpoints/19806
	* gdb.base/watchpoint-unaligned.c: New file.
	* gdb.base/watchpoint-unaligned.exp: New file.

Comments

Yao Qi June 7, 2016, 1:23 p.m. UTC | #1
Jan Kratochvil <jan.kratochvil@redhat.com> writes:

Hi Jan,

> Aarch64: watchpoints set on non-8-byte-aligned addresses are always missed
> https://sourceware.org/bugzilla/show_bug.cgi?id=19806
>
> some unaligned watchpoints are currently missed.  After this patch some other
> unaligned watchpoints will get reported as false positives.
>
> It could be probably all fixed on the kernel side, filed a RFE for it:
> 	kernel RFE: aarch64: ptrace: BAS: Support any contiguous range
> 	https://sourceware.org/bugzilla/show_bug.cgi?id=20207
> ->
> 	https://bugzilla.redhat.com/show_bug.cgi?id=1342821
> I have no idea if/when the kernel part gets fixed so I am posting this
> hopefully-temporary GDB change.
>

I agree that false positives are better than false negatives.

I don't think this problem is aarch64-linux specific, because other
targets, such as mips and ppc have the similar restrictions.  IMO, we
need to fix it in a target-independent way.  I've had some ideas, but
still need some experiments to see how good/bad they are.

In general, the trouble maker is target_stopped_data_address in
watchpoints_triggered, because it returns the 8-byte-aligned address,
which doesn't falls in the range of [loc->address, loc->address + loc->length).
However, watchpoints_triggered calls three target hooks and does two
things,

 - return true if target is stopped by watchpoints, and return false
   otherwise,
 - update watchpoint.watchpoint_triggered,

this leads me thinking that why do we need to get "inaccurate address"
from target_stopped_data_address, and pass it to
target_watchpoint_addr_within_range.  Instead, we can pass the
watchpoint to the (new) target hook, and set
watchpoint.watchpoint_triggered in different target implementations.  In
each target implementation, we can set .watchpoint_triggered to
watch_triggered_{yes,no,unknown} according to its hardware feature or
capability.

I'll give a try this way.

> diff --git a/gdb/testsuite/gdb.base/watchpoint-unaligned.exp b/gdb/testsuite/gdb.base/watchpoint-unaligned.exp
> new file mode 100644
> index 0000000..623314a
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/watchpoint-unaligned.exp
> @@ -0,0 +1,82 @@
> +# Copyright 2016 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> +#
> +# This file is part of the gdb testsuite.
> +
> +if {![is_aarch64_target] && ![istarget "x86_64-*-*"] && ![istarget i?86-*]} {
> +    verbose "Skipping ${gdb_test_file_name}."
> +    return
> +}

Can we generalize this test to any targets supports watchpoint?

> +
> +standard_testfile
> +if { [prepare_for_testing ${testfile}.exp ${testfile} ${srcfile}] } {
> +    return -1
> +}
> +
> +if ![runto_main] {
> +    untested "could not run to main"
> +    return -1
> +}
> +
> +gdb_breakpoint "[gdb_get_line_number "again_start"]" "Breakpoint $decimal at $hex" "again_start"
> +
> +set sizes {1 2 4 8}
> +array set alignedend {1 1  2 2  3 4  4 4  5 8  6 8  7 8  8 8}
> +
> +foreach wpsize $sizes {
> +    for {set wpoffset 0} {$wpoffset < 8/$wpsize} {incr wpoffset} {
> +	set wpstart [expr $wpoffset * $wpsize]
> +	set wpend [expr ($wpoffset + 1) * $wpsize]
> +	set wpendaligned $alignedend($wpend)
> +	foreach rdsize $sizes {
> +	    for {set rdoffset 0} {$rdoffset < 8/$rdsize} {incr rdoffset} {
> +		set rdstart [expr $rdoffset * $rdsize]
> +		set rdend [expr ($rdoffset + 1) * $rdsize]
> +		set expect_hit [expr max($wpstart,$rdstart) < min($wpend,$rdend)]
> +		set test "rwatch data.u.size$wpsize\[$wpoffset\]"

Can we test both read watch and write watch?  Secondly, can you add some
comments on the code?

> +		set wpnum ""
> +		gdb_test_multiple $test $test {
> +		    -re "Hardware read watchpoint (\[0-9\]+): .*\r\n$gdb_prompt $" {
> +			set wpnum $expect_out(1,string)
> +		    }
> +		}
> +		gdb_test_no_output "set variable size = $rdsize" ""
> +		gdb_test_no_output "set variable offset = $rdoffset" ""
> +		set test "continue"
> +		set did_hit 0
> +		gdb_test_multiple $test $test {
> +		    -re "Hardware read watchpoint $wpnum:.*Value = .*\r\n$gdb_prompt $" {
> +			set did_hit 1
> +			send_gdb "continue\n"
> +			exp_continue
> +		    }
> +		    -re " again_start .*\r\n$gdb_prompt $" {
> +		    }
> +		}
> +		gdb_test_no_output "delete $wpnum" ""
> +		set test "wp(size=$wpsize offset=$wpoffset) rd(size=$rdsize offset=$rdoffset) expect=$expect_hit"
> +		if {$expect_hit == 0 && $rdstart < $wpendaligned} {
> +		    setup_kfail external/20207 "aarch64-*-*"

It should be xfail rather than kfail.
Pedro Alves June 7, 2016, 1:41 p.m. UTC | #2
On 06/07/2016 02:23 PM, Yao Qi wrote:

> this leads me thinking that why do we need to get "inaccurate address"
> from target_stopped_data_address, and pass it to
> target_watchpoint_addr_within_range.  Instead, we can pass the
> watchpoint to the (new) target hook, and set
> watchpoint.watchpoint_triggered in different target implementations.  In
> each target implementation, we can set .watchpoint_triggered to
> watch_triggered_{yes,no,unknown} according to its hardware feature or
> capability.
> 
> I'll give a try this way.

How do you plan on handling remote targets though?  Done that way, it 
sounds to me like the alignment restrictions should either be a gdbarch
property, or you need some RSP extension, e.g., extend the "watch" stop
reply to indicate an stop data address range instead of a sole address,
or make the stub report back the alignment restriction when GDB tells it
to insert the watchpoint in the first place, instead of just saying "OK".

A gdbarch method poses problems for remote stubs that are actually emulators,
and thus can support hardware watchpoints without these restrictions.
I think it's actually problematic for real machines, as the restrictions
will often depend on process revisions/models.  So a gdbarch approach
would be undesirable, IMO.

An RSP extension approach would work, though exactly because it
needs some extension, I'm not sure is worth it.

See:

 https://sourceware.org/bugzilla/show_bug.cgi?id=19806#c1

Thanks,
Pedro Alves
Yao Qi June 7, 2016, 3:25 p.m. UTC | #3
Pedro Alves <palves@redhat.com> writes:

> How do you plan on handling remote targets though?  Done that way, it 
> sounds to me like the alignment restrictions should either be a gdbarch
> property, or you need some RSP extension, e.g., extend the "watch" stop
> reply to indicate an stop data address range instead of a sole address,
> or make the stub report back the alignment restriction when GDB tells it
> to insert the watchpoint in the first place, instead of just saying
> "OK".
>

I want to use a gdbarch method to decide whether a watchpoint is
triggered in remote target, like this, [note that all the code below is
not compiled at all]

static enum watchpoint_triggered
remote_watchpoint_triggered (struct target_ops *target, struct watchpoint *w)
{
  struct thread_info *thread = inferior_thread ();

  if (thread->priv != NULL
      && thread->priv->stop_reason == TARGET_STOPPED_BY_WATCHPOINT)
    {
      return gdbarch_watchpoint_triggered (w->base.gdbarch, w,
					   thread->priv->watch_data_address);
    }

  return watchpoint_triggered_unknown;
}

The default implementation of gdbarch_watchpoint_triggered returns
watchpoint_triggered_yes if address is in the range of
loc->address,+loc->length.  Otherwise return watchpoint_triggered_no.
Then, we can rewrite breakpoint.c:watchpoints_triggered like this,

int
watchpoints_triggered (void)
{
  int ret = 0;

  ALL_BREAKPOINTS (b)
    if (is_hardware_watchpoint (b))
      {
	struct watchpoint *w = (struct watchpoint *) b;

	w->watchpoint_triggered = target_watchpoint_triggered (w);

	if (w->watchpoint_triggered != watch_triggered_no)
	  ret = 1;
      }

  return ret;
}

> A gdbarch method poses problems for remote stubs that are actually emulators,
> and thus can support hardware watchpoints without these restrictions.

Then, the address reported by the target should fall in the range of
loc->address,+loc->length.

> I think it's actually problematic for real machines, as the restrictions
> will often depend on process revisions/models.  So a gdbarch approach
> would be undesirable, IMO.

On the real machine, nowadays, the restriction is that address must be
8-byte-aligned on linux.  The restriction can only be relaxed and
may be removed finally in the future, IOW, the restriction won't become
16-byte aligned, so we can write the gdbarch method for aarch64-linux
like this,

static enum watchpoint_triggered
aarch64_linux_watchpoint_triggered (struct gdbarch *gdbarch,
				    struct watchpoint *w,
				    CORE_ADDR addr)
{
  struct bp_location *loc;

  for (loc = w->base.loc; loc; loc = loc->next)
    {
      if (addr >= loc->address && addr < loc->address + loc->length)
	{
	  /* If the address reported by the target falls in the memory
	     range that watchpoint W is monitoring, the watchpoint is
	     definitely hit.  */
	  return watchpoint_triggered_yes;
	}
      else if (addr >= align_down (loc->address, 8) && addr < loc->address)
	{
	  /* The debugging stub, like GDBserver, may adjust the address
	     to meet kernel's alignment requirements (8-aligned for
	     example), we are not sure watchpoint W is hit or not.  */
	  return watchpoint_triggered_unknown;
	}
    }

  return watchpoint_triggered_no;
}
Pedro Alves June 7, 2016, 4:04 p.m. UTC | #4
On 06/07/2016 04:25 PM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:

>> A gdbarch method poses problems for remote stubs that are actually emulators,
>> and thus can support hardware watchpoints without these restrictions.
> 
> Then, the address reported by the target should fall in the range of
> loc->address,+loc->length.

Isn't that what Jan's patch did?

In any case, with the gdbarch method, if the target supports watching all
kinds of unaligned addresses/ranges, and reports the correct address in
"T05 watch" stop replies, it then faces the problem that gdb has hardcoded
knowledge of watch region alignment restrictions, and then e.g., watchpoints
on consecutive addresses are misinterpreted, even though there's
no reason for that.

E.g., say gdb believes the machine only supports watching 32-bit-aligned
words.  Then with:

union
{
  char buf[4];
  uint32_t force_align;
} global;

(gdb) watch global.buf[1];
Hardware watchpoint 1 ...
(gdb) watch global.buf[3];
Hardware watchpoint 2 ...

... if the program writes to global.buf[1], and the target reports
a memory access to 'global.buf + 1', gdb will believe that
watchpoint 2 _also_ triggered, when it did not.  That's a false positive you
can't help with with real machines, but there's no reason a
simulator/emulator has to suffer from that.

> 
>> I think it's actually problematic for real machines, as the restrictions
>> will often depend on process revisions/models.  So a gdbarch approach
>> would be undesirable, IMO.
> 
> On the real machine, nowadays, the restriction is that address must be
> 8-byte-aligned on linux. 

I think we need to consider all architectures, and the design going
forward, not just Aarch64.  For example, PPC has:

static int
ppc_linux_watchpoint_addr_within_range (struct target_ops *target,
					CORE_ADDR addr,
					CORE_ADDR start, int length)
{
  int mask;

  if (have_ptrace_hwdebug_interface ()
      && ppc_linux_get_hwcap () & PPC_FEATURE_BOOKE)
    return start <= addr && start + length >= addr;
  else if (ppc_linux_get_hwcap () & PPC_FEATURE_BOOKE)
    mask = 3;
  else
    mask = 7;

So e.g., here, the alignment restrictions depend on both
the processor model and kernel.

(I'll guess that other embedded architectures that gdb supports probably
have similar restrictions that gdb was never taught about.)

> The restriction can only be relaxed and
> may be removed finally in the future, IOW, the restriction won't become
> 16-byte aligned, so we can write the gdbarch method for aarch64-linux
> like this,

How can gdb determine whether the restriction has been lifted?
The way to do it is probably either by checking kernel version or having the
ptrace code that "inserts" the watchpoint to first try watching the unaligned
region exactly as gdb requested, and if that doesn't work, try a wider,
aligned region.  Only that target-side ptrace code is aware of these finer details
and the correct restrictions in effect for the running system.  If we put this
in a gdbarch method, how can the gdbarch method maintain compatibility with
older kernels and at the same time reflect that newer kernels no longer
impose the restriction?

It's actually not just simulators/emulators that can have different
watchpoint restrictions from the machine architecture's debug hardware
limitations.  This is in good part a debug API issue as well --
a target may well support watchpoints implemented in a totally different
way -- for example, I believe Solaris supports "unlimited" watchpoints and
address ranges by implementing watchpoints not by using debug registers, but
instead by changing memory page protections and trapping faults internally,
all invisibly to userspace.

All this is why I believe that hardcoding this knowledge in gdb, which is
what a gdbarch method does, is not the best approach. 

Thanks,
Pedro Alves
Yao Qi June 8, 2016, 4:42 p.m. UTC | #5
Pedro Alves <palves@redhat.com> writes:

> In any case, with the gdbarch method, if the target supports watching all
> kinds of unaligned addresses/ranges, and reports the correct address in
> "T05 watch" stop replies, it then faces the problem that gdb has hardcoded
> knowledge of watch region alignment restrictions, and then e.g., watchpoints
> on consecutive addresses are misinterpreted, even though there's
> no reason for that.
>

The gdbarch method should first check if reported address is within the
range of watchpoint location, if yes, return watch_triggered_yes, tell
GDB that the watchpoint is definitely hit.

If we can't find any one range of watchpoint location that the report
address is within, we'll guess that the target may be monitoring aligned
address.  We check watchpoint locations again, to see if address is
within [align_down (loc->address, 8), loc->address], if yes, return
watch_triggered_unknown, tell GDB that watchpoint is hit, but target
doesn't tell the address.  [note that we'll miss read watchpoint in this
approach].  If no luck, return watch_triggered_no.

static enum watchpoint_triggered
aarch64_linux_watchpoint_triggered (struct gdbarch *gdbarch, CORE_ADDR addr,
				    struct breakpoint *b)
{
  struct bp_location *loc;
  int unknown = 0;

  for (loc = b->loc; loc != NULL; loc = loc->next)
    {
      if (addr >= loc->address && addr < loc->address + loc->length)
	{
	  /* If the address reported by the target falls in the memory
	     range that watchpoint W is monitoring, the watchpoint is
	     definitely hit.  */
	  return watch_triggered_yes;
	}
      else if (addr >= align_down (loc->address, 8) && addr < loc->address)
	{
	  /* The debugging stub, like GDBserver, may adjust the address
	     to meet kernel's alignment requirements (8-aligned for
	     example), we are not sure watchpoint W is hit or not.  */
	  unknown = 1;
	}
    }

  if (unknown)
    return watch_triggered_unknown;
  else
    return watch_triggered_no;
}

> E.g., say gdb believes the machine only supports watching 32-bit-aligned
> words.  Then with:
>
> union
> {
>   char buf[4];
>   uint32_t force_align;
> } global;
>
> (gdb) watch global.buf[1];
> Hardware watchpoint 1 ...
> (gdb) watch global.buf[3];
> Hardware watchpoint 2 ...
>
> ... if the program writes to global.buf[1], and the target reports
> a memory access to 'global.buf + 1', gdb will believe that
> watchpoint 2 _also_ triggered, when it did not.  That's a false positive you
> can't help with with real machines, but there's no reason a
> simulator/emulator has to suffer from that.

As I described above, gdbarch method will return watch_triggered_yes for
watchpoint 1, and watch_triggered_unknown for watchpoint 2.  After GDB
checks the value change, it ignores the latter, which is good.  The
false positive can be eliminated.

>
>> 
>>> I think it's actually problematic for real machines, as the restrictions
>>> will often depend on process revisions/models.  So a gdbarch approach
>>> would be undesirable, IMO.
>> 
>> On the real machine, nowadays, the restriction is that address must be
>> 8-byte-aligned on linux. 
>
> I think we need to consider all architectures, and the design going
> forward, not just Aarch64.  For example, PPC has:

We put things into gdbarch method, which is to handle the differences of
different archs.

>
> static int
> ppc_linux_watchpoint_addr_within_range (struct target_ops *target,
> 					CORE_ADDR addr,
> 					CORE_ADDR start, int length)
> {
>   int mask;
>
>   if (have_ptrace_hwdebug_interface ()
>       && ppc_linux_get_hwcap () & PPC_FEATURE_BOOKE)
>     return start <= addr && start + length >= addr;
>   else if (ppc_linux_get_hwcap () & PPC_FEATURE_BOOKE)
>     mask = 3;
>   else
>     mask = 7;
>
> So e.g., here, the alignment restrictions depend on both
> the processor model and kernel.

This can be in the gdbarch method for ppc-linux.

>
> (I'll guess that other embedded architectures that gdb supports probably
> have similar restrictions that gdb was never taught about.)
>

Yes, but we should teach GDB to do what we already know of.

>> The restriction can only be relaxed and
>> may be removed finally in the future, IOW, the restriction won't become
>> 16-byte aligned, so we can write the gdbarch method for aarch64-linux
>> like this,
>
> How can gdb determine whether the restriction has been lifted?

In the step of checking whether a given address is within the range
monitored by watchpoint, gdb doesn't need to determine that.  Once the
restriction is removed, the address reported by the target is exactly
the address of watchpoint, the gdbarch method returns watch_triggered_yes.

> The way to do it is probably either by checking kernel version or having the
> ptrace code that "inserts" the watchpoint to first try watching the unaligned
> region exactly as gdb requested, and if that doesn't work, try a wider,
> aligned region.  Only that target-side ptrace code is aware of these
> finer details
> and the correct restrictions in effect for the running system.  If we put this
> in a gdbarch method, how can the gdbarch method maintain compatibility with
> older kernels and at the same time reflect that newer kernels no longer
> impose the restriction?

The gdbarch method only does some complementary guess if reported
address doesn't fall in the range of watchpoint locations.  If there is
no such restriction, the guess is not used at all.

>
> It's actually not just simulators/emulators that can have different
> watchpoint restrictions from the machine architecture's debug hardware
> limitations.  This is in good part a debug API issue as well --
> a target may well support watchpoints implemented in a totally different
> way -- for example, I believe Solaris supports "unlimited" watchpoints and
> address ranges by implementing watchpoints not by using debug registers, but
> instead by changing memory page protections and trapping faults internally,
> all invisibly to userspace.

If they are invisible to user space, we can do nothing, but I don't know
how is it related to this issue.

>
> All this is why I believe that hardcoding this knowledge in gdb, which is
> what a gdbarch method does, is not the best approach. 

These knowledge is only used when there are some limitations in the
debugging api, and these knowledge are not harmful once the limitations
are fixed.
Pedro Alves June 8, 2016, 5:54 p.m. UTC | #6
On 06/08/2016 05:42 PM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> In any case, with the gdbarch method, if the target supports watching all
>> kinds of unaligned addresses/ranges, and reports the correct address in
>> "T05 watch" stop replies, it then faces the problem that gdb has hardcoded
>> knowledge of watch region alignment restrictions, and then e.g., watchpoints
>> on consecutive addresses are misinterpreted, even though there's
>> no reason for that.
>>
> 
> The gdbarch method should first check if reported address is within the
> range of watchpoint location, if yes, return watch_triggered_yes, tell
> GDB that the watchpoint is definitely hit.
> 
> If we can't find any one range of watchpoint location that the report
> address is within, we'll guess that the target may be monitoring aligned
> address.  We check watchpoint locations again, to see if address is
> within [align_down (loc->address, 8), loc->address], if yes, return
> watch_triggered_unknown, tell GDB that watchpoint is hit, but target
> doesn't tell the address.  [note that we'll miss read watchpoint in this
> approach].  If no luck, return watch_triggered_no.
> 

But this way, once the kernel is fixed, we'll continue reporting false
positives for read watchpoints.  And we'll unnecessarily report false
positives with any emulator that doesn't have this restriction (think
Valgrind, etc.).

How is this better than making it the responsibility of the
stub to report a stop data address that is within the address
range that gdb requested to watch, in the first place?

By making it the target's responsibility, the false positives will
disappear once the kernel (and the target) is fixed.  With a gdbarch
approach, we will always have false positives, as long as we
support older kernels.  And as long as we carry the workaround,
we will always have false positives against simulators/emulators
that really only trap the addresses gdb requested.

> 
>> E.g., say gdb believes the machine only supports watching 32-bit-aligned
>> words.  Then with:
>>
>> union
>> {
>>   char buf[4];
>>   uint32_t force_align;
>> } global;
>>
>> (gdb) watch global.buf[1];
>> Hardware watchpoint 1 ...
>> (gdb) watch global.buf[3];
>> Hardware watchpoint 2 ...
>>
>> ... if the program writes to global.buf[1], and the target reports
>> a memory access to 'global.buf + 1', gdb will believe that
>> watchpoint 2 _also_ triggered, when it did not.  That's a false positive you
>> can't help with with real machines, but there's no reason a
>> simulator/emulator has to suffer from that.
> 
> As I described above, gdbarch method will return watch_triggered_yes for
> watchpoint 1, and watch_triggered_unknown for watchpoint 2.  After GDB
> checks the value change, it ignores the latter, which is good.  The
> false positive can be eliminated.

Only for regular watchpoints...  Read/access watchpoints are not
so lucky.

> 
>>
>>>
>>>> I think it's actually problematic for real machines, as the restrictions
>>>> will often depend on process revisions/models.  So a gdbarch approach
>>>> would be undesirable, IMO.
>>>
>>> On the real machine, nowadays, the restriction is that address must be
>>> 8-byte-aligned on linux. 
>>
>> I think we need to consider all architectures, and the design going
>> forward, not just Aarch64.  For example, PPC has:
> 
> We put things into gdbarch method, which is to handle the differences of
> different archs.
> 
>>
>> static int
>> ppc_linux_watchpoint_addr_within_range (struct target_ops *target,
>> 					CORE_ADDR addr,
>> 					CORE_ADDR start, int length)
>> {
>>   int mask;
>>
>>   if (have_ptrace_hwdebug_interface ()
>>       && ppc_linux_get_hwcap () & PPC_FEATURE_BOOKE)
>>     return start <= addr && start + length >= addr;
>>   else if (ppc_linux_get_hwcap () & PPC_FEATURE_BOOKE)
>>     mask = 3;
>>   else
>>     mask = 7;
>>
>> So e.g., here, the alignment restrictions depend on both
>> the processor model and kernel.
> 
> This can be in the gdbarch method for ppc-linux.

No, because a gdbarch method can't tell whether have_ptrace_hwdebug_interface()
returns true.

> 
>>
>> (I'll guess that other embedded architectures that gdb supports probably
>> have similar restrictions that gdb was never taught about.)
>>
> 
> Yes, but we should teach GDB to do what we already know of.
> 
>>> The restriction can only be relaxed and
>>> may be removed finally in the future, IOW, the restriction won't become
>>> 16-byte aligned, so we can write the gdbarch method for aarch64-linux
>>> like this,
>>
>> How can gdb determine whether the restriction has been lifted?
> 
> In the step of checking whether a given address is within the range
> monitored by watchpoint, gdb doesn't need to determine that.  Once the
> restriction is removed, the address reported by the target is exactly
> the address of watchpoint, the gdbarch method returns watch_triggered_yes.

Only if you ignore read watchpoints.  But those are real things.  We
shouldn't ignore them.

> 
>> The way to do it is probably either by checking kernel version or having the
>> ptrace code that "inserts" the watchpoint to first try watching the unaligned
>> region exactly as gdb requested, and if that doesn't work, try a wider,
>> aligned region.  Only that target-side ptrace code is aware of these
>> finer details
>> and the correct restrictions in effect for the running system.  If we put this
>> in a gdbarch method, how can the gdbarch method maintain compatibility with
>> older kernels and at the same time reflect that newer kernels no longer
>> impose the restriction?
> 
> The gdbarch method only does some complementary guess if reported
> address doesn't fall in the range of watchpoint locations.  If there is
> no such restriction, the guess is not used at all.
> 
>>
>> It's actually not just simulators/emulators that can have different
>> watchpoint restrictions from the machine architecture's debug hardware
>> limitations.  This is in good part a debug API issue as well --
>> a target may well support watchpoints implemented in a totally different
>> way -- for example, I believe Solaris supports "unlimited" watchpoints and
>> address ranges by implementing watchpoints not by using debug registers, but
>> instead by changing memory page protections and trapping faults internally,
>> all invisibly to userspace.
> 
> If they are invisible to user space, we can do nothing, but I don't know
> how is it related to this issue.

I'm saying that even if a machine/cpu has some restriction that only
8-byte aligned addresses can be watched, a stub is free to implement
watchpoints some other way that gets over that limitation.  By hardcoding
the alignment assumptions in gdb, you're setting up for false positives
which could be entirely avoided if gdb wasn't in the business of
hardcoding them.

> 
>>
>> All this is why I believe that hardcoding this knowledge in gdb, which is
>> what a gdbarch method does, is not the best approach. 
> 
> These knowledge is only used when there are some limitations in the
> debugging api, and these knowledge are not harmful once the limitations
> are fixed.
> 

They _are_ harmful, because they prevent fixing read watchpoint false
positives.

Thanks,
Pedro Alves
Pedro Alves June 8, 2016, 6:46 p.m. UTC | #7
On 06/08/2016 06:54 PM, Pedro Alves wrote:
> How is this better than making it the responsibility of the
> stub to report a stop data address that is within the address
> range that gdb requested to watch, in the first place?

I thought of a case where this is the wrong thing to do.
(Alternative below.)

The same example as before: e.g., a machine that only supports
watching 32-bit-aligned words.  Then with:

union
{
  char buf[4];
  uint32_t force_align;
} global;

(gdb) watch global.buf[1];
Hardware watchpoint 1 ...
(gdb) watch global.buf[3];
Hardware watchpoint 2 ...

... if the program writes to global.buf[3], and the target
reports a memory access to 'global.buf + 1' (because that's the
first watchpoint in its own watchpoint list that overlaps
the global.buf[0]..global.buf[3] range (what is really being
watched)), gdb will believe that watchpoint 1 triggered, notice
the value didn't change, and thus incorrectly ignore the watchpoint hit.

So I'm now thinking that the best is to extend the RSP somehow.

I think that my preference is for the target to report a
memory range instead of a single address in the watchpoint 
stop reply.  Like, e.g., if an access to address 10000006 triggers,
but all the target knows is that some address between 10000004
and 10000008 was accessed, it reports:

  T05 watch:10000004-10000008

instead of:

  T05 watch:10000006

( or maybe T05 watch:10000004;watch-end:10000008 )

This should be easy to implement in the target side, and
is practically stateless (a stub that just forwards requests
to a lower level debug api doesn't have to remember
the addresses gdb requested).

Then the corresponding target methods in gdb would work with an
address range instead of a single address too.  E.g.,
target_watchpoint_addr_within_range would be replaced by
a range overlap check.

If the target knows the process stopped for a watchpoint, but
doesn't know the address that triggered, then it could report
the whole address space as range:

  T05 watch:00000000-ffffffff

Note that it's not possible currently for a remote stub to tell gdb
that it doesn't know the address that trapped, even though the
target_ops:target_stopped_data_address method supports
returning false, and some native targets do make use of it.

Alternatively, since we're extending the packet anyway, we
can make the address optional, thus making these equivalent:

  T05 watch:00000000-ffffffff
  T05 watch:

Alternatively, I though we could add a new qSupported feature value
that informs gdb of what is the watchpoint alignment and size
restriction, but I'm not so keen on that since the alignment
restriction may depend on mode (e.g., 32-bit vs 64-bit inferior),
or on the size of the area being watched, or some more complicated rule.

Thanks,
Pedro Alves
Yao Qi June 10, 2016, 8:11 a.m. UTC | #8
Pedro Alves <palves@redhat.com> writes:

> So I'm now thinking that the best is to extend the RSP somehow.
>

I think it is a good idea to extend the RSP...

> I think that my preference is for the target to report a
> memory range instead of a single address in the watchpoint 
> stop reply.  Like, e.g., if an access to address 10000006 triggers,
> but all the target knows is that some address between 10000004
> and 10000008 was accessed, it reports:
>
>   T05 watch:10000004-10000008
>
> instead of:
>
>   T05 watch:10000006
>
> ( or maybe T05 watch:10000004;watch-end:10000008 )
>
> This should be easy to implement in the target side, and
> is practically stateless (a stub that just forwards requests
> to a lower level debug api doesn't have to remember
> the addresses gdb requested).
>

Right.

> Then the corresponding target methods in gdb would work with an
> address range instead of a single address too.  E.g.,
> target_watchpoint_addr_within_range would be replaced by
> a range overlap check.
>
> If the target knows the process stopped for a watchpoint, but
> doesn't know the address that triggered, then it could report
> the whole address space as range:
>
>   T05 watch:00000000-ffffffff
>
> Note that it's not possible currently for a remote stub to tell gdb
> that it doesn't know the address that trapped, even though the
> target_ops:target_stopped_data_address method supports
> returning false, and some native targets do make use of it.
>
> Alternatively, since we're extending the packet anyway, we
> can make the address optional, thus making these equivalent:
>
>   T05 watch:00000000-ffffffff
>   T05 watch:

I prefer the latter, T05 watch:.

> Alternatively, I though we could add a new qSupported feature value
> that informs gdb of what is the watchpoint alignment and size
> restriction, but I'm not so keen on that since the alignment
> restriction may depend on mode (e.g., 32-bit vs 64-bit inferior),
> or on the size of the area being watched, or some more complicated rule.

Yes, there are different watchpoint alignment restrictions on arm and
aarch64.
Jan Kratochvil June 19, 2016, 6:29 p.m. UTC | #9
On Wed, 08 Jun 2016 20:46:35 +0200, Pedro Alves wrote:
> I thought of a case where this is the wrong thing to do.
> (Alternative below.)
> 
> The same example as before: e.g., a machine that only supports
> watching 32-bit-aligned words.  Then with:
> 
> union
> {
>   char buf[4];
>   uint32_t force_align;
> } global;
> 
> (gdb) watch global.buf[1];
> Hardware watchpoint 1 ...
> (gdb) watch global.buf[3];
> Hardware watchpoint 2 ...
> 
> ... if the program writes to global.buf[3], and the target
> reports a memory access to 'global.buf + 1' (because that's the
> first watchpoint in its own watchpoint list that overlaps
> the global.buf[0]..global.buf[3] range (what is really being
> watched)), gdb will believe that watchpoint 1 triggered, notice
> the value didn't change, and thus incorrectly ignore the watchpoint hit.

Here if the program really does 'global.buf[3] = 0xff;' then it still does
work as despite GDB places watchpoint at &global.buf[0] for 4 bytes aarch64
still reports the exact 1-byte location &global.buf[3]:
	echo -e 'union { char buf[8]; unsigned long ul; } u; int main(void) {\n u.buf[3]=0xff;\n return 0; }'|gcc -Wall -g -x c -;./gdb -data-directory ./data-directory/ ./a.out -ex start -ex 'watch u.buf[1]' -ex 'watch u.buf[3]' -ex c -ex 'p u.buf[1]' -ex 'p u.buf[3]'
	Hardware watchpoint 2: u.buf[1]
	Hardware watchpoint 3: u.buf[3]
	Continuing.
	Hardware watchpoint 3: u.buf[3]
	Old value = 0 '\000'
	New value = 255 '\377'
	main () at <stdin>:3
	3	in <stdin>
	$1 = 0 '\000'
	$2 = 255 '\377'

But you are right the watchpoint gets missed if the program writes to
&global.buf[0] a 4-byte value not modifying the contents of global.buf[1] but
modifying the contents of global.buf[3]:
	echo -e 'union { char buf[8]; unsigned long ul; } u; int main(void) {\n u.ul=0xff000000;\n return 0; }'|gcc -Wall -g -x c -;./gdb -data-directory ./data-directory/ ./a.out -ex start -ex 'watch u.buf[1]' -ex 'watch u.buf[3]' -ex 'b 3' -ex c -ex 'p u.buf[1]' -ex 'p u.buf[3]'
	Hardware watchpoint 2: u.buf[1]
	Hardware watchpoint 3: u.buf[3]
	Breakpoint 4 at 0x400570: file <stdin>, line 3.
	Continuing.
	Breakpoint 4, main () at <stdin>:3
	3	in <stdin>
	$1 = 0 '\000'
	$2 = 255 '\377'

I will follow with an add-on patch as I guess you think it makes sense to fix
GDB even for older kernels (assuming aarch64 kernel gets fixed in some time).


Thanks,
Jan
Pedro Alves June 20, 2016, 11:47 a.m. UTC | #10
On 06/19/2016 07:29 PM, Jan Kratochvil wrote:
> On Wed, 08 Jun 2016 20:46:35 +0200, Pedro Alves wrote:
>> I thought of a case where this is the wrong thing to do.
>> (Alternative below.)
>>
>> The same example as before: e.g., a machine that only supports
>> watching 32-bit-aligned words.  Then with:
>>
>> union
>> {
>>   char buf[4];
>>   uint32_t force_align;
>> } global;
>>
>> (gdb) watch global.buf[1];
>> Hardware watchpoint 1 ...
>> (gdb) watch global.buf[3];
>> Hardware watchpoint 2 ...
>>
>> ... if the program writes to global.buf[3], and the target
>> reports a memory access to 'global.buf + 1' (because that's the
>> first watchpoint in its own watchpoint list that overlaps
>> the global.buf[0]..global.buf[3] range (what is really being
>> watched)), gdb will believe that watchpoint 1 triggered, notice
>> the value didn't change, and thus incorrectly ignore the watchpoint hit.
> 
> Here if the program really does 'global.buf[3] = 0xff;' then it still does
> work as despite GDB places watchpoint at &global.buf[0] for 4 bytes aarch64
> still reports the exact 1-byte location &global.buf[3]:
> 	echo -e 'union { char buf[8]; unsigned long ul; } u; int main(void) {\n u.buf[3]=0xff;\n return 0; }'|gcc -Wall -g -x c -;./gdb -data-directory ./data-directory/ ./a.out -ex start -ex 'watch u.buf[1]' -ex 'watch u.buf[3]' -ex c -ex 'p u.buf[1]' -ex 'p u.buf[3]'
> 	Hardware watchpoint 2: u.buf[1]
> 	Hardware watchpoint 3: u.buf[3]
> 	Continuing.
> 	Hardware watchpoint 3: u.buf[3]
> 	Old value = 0 '\000'
> 	New value = 255 '\377'

Sounds what would happen when gdbserver looks for the first watchpoint
that overlaps the kernel-reported siginfo.si_addr trap address, it finds
watchpoint 3 first.  Which is entirely plausible because gdbserver actually
iterates backwards:

  /* Check if the address matches any watched address.  */
  state = aarch64_get_debug_reg_state (pid_of (current_thread));
  for (i = aarch64_num_wp_regs - 1; i >= 0; --i)
    {
      const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);

So I guess that if you manage to reverse the order of the watchpoints
in gdbserver, gdbserver will find u.buf[1] as first overlapping watchpoint,
and then we'd see what I described.  Or if you reverse the iteration
order in that for loop.

Maybe just setting the watchpoint at buf[3] first would be sufficient.

 (gdb) watch global.buf[3];
 Hardware watchpoint 1 ...
 (gdb) watch global.buf[1];
 Hardware watchpoint 2 ...
 ...

Thanks,
Pedro Alves
Jan Kratochvil June 20, 2016, 2:12 p.m. UTC | #11
On Mon, 20 Jun 2016 13:47:12 +0200, Pedro Alves wrote:
> On 06/19/2016 07:29 PM, Jan Kratochvil wrote:
> > On Wed, 08 Jun 2016 20:46:35 +0200, Pedro Alves wrote:
> >> ... if the program writes to global.buf[3], and the target
> >> reports a memory access to 'global.buf + 1' (because that's the
> >> first watchpoint in its own watchpoint list that overlaps
> >> the global.buf[0]..global.buf[3] range (what is really being
> >> watched)), gdb will believe that watchpoint 1 triggered, notice
> >> the value didn't change, and thus incorrectly ignore the watchpoint hit.
> > 
> > Here if the program really does 'global.buf[3] = 0xff;' then it still does
> > work as despite GDB places watchpoint at &global.buf[0] for 4 bytes aarch64
> > still reports the exact 1-byte location &global.buf[3]:
> > 	echo -e 'union { char buf[8]; unsigned long ul; } u; int main(void) {\n u.buf[3]=0xff;\n return 0; }'|gcc -Wall -g -x c -;./gdb -data-directory ./data-directory/ ./a.out -ex start -ex 'watch u.buf[1]' -ex 'watch u.buf[3]' -ex c -ex 'p u.buf[1]' -ex 'p u.buf[3]'
> > 	Hardware watchpoint 2: u.buf[1]
> > 	Hardware watchpoint 3: u.buf[3]
> > 	Continuing.
> > 	Hardware watchpoint 3: u.buf[3]
> > 	Old value = 0 '\000'
> > 	New value = 255 '\377'
> 
> Sounds what would happen when gdbserver looks for the first watchpoint
> that overlaps the kernel-reported siginfo.si_addr trap address, it finds
> watchpoint 3 first.

What I was saying is that it is really not possible because for
	u.buf[3]=0xff;
kernel does report
	siginfo.si_addr=0x4109bb
And 'Hardware watchpoint 2' is only 2 bytes long (0x4109b8..0x4109b9
inclusive).


> Which is entirely plausible because gdbserver actually iterates backwards:

It really does not matter in which order the watchpoints are because only one
watchpoint range (Hardware watchpoint 3) does match the reported address.

Please verify the technical topics you are unsure about before correcting
people.


Thanks,
Jan
Pedro Alves June 20, 2016, 2:40 p.m. UTC | #12
On 06/20/2016 03:12 PM, Jan Kratochvil wrote:
> It really does not matter in which order the watchpoints are because only one
> watchpoint range (Hardware watchpoint 3) does match the reported address.

OK.

> 
> Please verify the technical topics you are unsure about before correcting
> people.

Wow, seriously.  That's truly uncalled for, and not very constructive.

Thanks,
Pedro Alves
Jan Kratochvil March 27, 2017, 9:11 p.m. UTC | #13
obsoleted by:
	[patch] aarch64: PR 19806: watchpoints: false negatives + PR 20207 contiguous ones
	https://sourceware.org/ml/gdb-patches/2017-03/msg00470.html
	Message-ID: <20170327210753.GA29656@host1.jankratochvil.net>
diff mbox

Patch

diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
index fe1631d..33a0a4f 100644
--- a/gdb/aarch64-linux-nat.c
+++ b/gdb/aarch64-linux-nat.c
@@ -765,14 +765,15 @@  aarch64_linux_stopped_data_address (struct target_ops *target,
     {
       const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
       const CORE_ADDR addr_trap = (CORE_ADDR) siginfo.si_addr;
-      const CORE_ADDR addr_watch = state->dr_addr_wp[i];
+      const CORE_ADDR addr_aligned_watch = state->dr_aligned_addr_wp[i];
+      const CORE_ADDR addr_orig = state->dr_orig_addr_wp[i];
 
       if (state->dr_ref_count_wp[i]
 	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
-	  && addr_trap >= addr_watch
-	  && addr_trap < addr_watch + len)
+	  && addr_trap >= addr_aligned_watch
+	  && addr_trap < addr_aligned_watch + len)
 	{
-	  *addr_p = addr_trap;
+	  *addr_p = addr_orig;
 	  return 1;
 	}
     }
diff --git a/gdb/gdbserver/linux-aarch64-low.c b/gdb/gdbserver/linux-aarch64-low.c
index d237bde..d9edd36 100644
--- a/gdb/gdbserver/linux-aarch64-low.c
+++ b/gdb/gdbserver/linux-aarch64-low.c
@@ -220,7 +220,8 @@  aarch64_init_debug_reg_state (struct aarch64_debug_reg_state *state)
 
   for (i = 0; i < AARCH64_HWP_MAX_NUM; ++i)
     {
-      state->dr_addr_wp[i] = 0;
+      state->dr_aligned_addr_wp[i] = 0;
+      state->dr_orig_addr_wp[i] = 0;
       state->dr_ctrl_wp[i] = 0;
       state->dr_ref_count_wp[i] = 0;
     }
@@ -376,12 +377,13 @@  aarch64_stopped_data_address (void)
     {
       const unsigned int len = aarch64_watchpoint_length (state->dr_ctrl_wp[i]);
       const CORE_ADDR addr_trap = (CORE_ADDR) siginfo.si_addr;
-      const CORE_ADDR addr_watch = state->dr_addr_wp[i];
+      const CORE_ADDR addr_aligned_watch = state->dr_aligned_addr_wp[i];
+      const CORE_ADDR addr_orig = state->dr_orig_addr_wp[i];
       if (state->dr_ref_count_wp[i]
 	  && DR_CONTROL_ENABLED (state->dr_ctrl_wp[i])
-	  && addr_trap >= addr_watch
-	  && addr_trap < addr_watch + len)
-	return addr_trap;
+	  && addr_trap >= addr_aligned_watch
+	  && addr_trap < addr_aligned_watch + len)
+	return addr_orig;
     }
 
   return (CORE_ADDR) 0;
diff --git a/gdb/nat/aarch64-linux-hw-point.c b/gdb/nat/aarch64-linux-hw-point.c
index a06a6e6..ddba270 100644
--- a/gdb/nat/aarch64-linux-hw-point.c
+++ b/gdb/nat/aarch64-linux-hw-point.c
@@ -330,11 +330,11 @@  aarch64_notify_debug_reg_change (const struct aarch64_debug_reg_state *state,
 static int
 aarch64_dr_state_insert_one_point (struct aarch64_debug_reg_state *state,
 				   enum target_hw_bp_type type,
-				   CORE_ADDR addr, int len)
+				   CORE_ADDR addr, int len, CORE_ADDR orig_addr)
 {
   int i, idx, num_regs, is_watchpoint;
   unsigned int ctrl, *dr_ctrl_p, *dr_ref_count;
-  CORE_ADDR *dr_addr_p;
+  CORE_ADDR *dr_aligned_addr_p, *dr_orig_addr_p;
 
   /* Set up state pointers.  */
   is_watchpoint = (type != hw_execute);
@@ -342,14 +342,16 @@  aarch64_dr_state_insert_one_point (struct aarch64_debug_reg_state *state,
   if (is_watchpoint)
     {
       num_regs = aarch64_num_wp_regs;
-      dr_addr_p = state->dr_addr_wp;
+      dr_aligned_addr_p = state->dr_aligned_addr_wp;
+      dr_orig_addr_p = state->dr_orig_addr_wp;
       dr_ctrl_p = state->dr_ctrl_wp;
       dr_ref_count = state->dr_ref_count_wp;
     }
   else
     {
       num_regs = aarch64_num_bp_regs;
-      dr_addr_p = state->dr_addr_bp;
+      dr_aligned_addr_p = state->dr_addr_bp;
+      dr_orig_addr_p = NULL;
       dr_ctrl_p = state->dr_ctrl_bp;
       dr_ref_count = state->dr_ref_count_bp;
     }
@@ -366,7 +368,7 @@  aarch64_dr_state_insert_one_point (struct aarch64_debug_reg_state *state,
 	  idx = i;
 	  /* no break; continue hunting for an exising one.  */
 	}
-      else if (dr_addr_p[i] == addr && dr_ctrl_p[i] == ctrl)
+      else if (dr_aligned_addr_p[i] == addr && dr_ctrl_p[i] == ctrl)
 	{
 	  gdb_assert (dr_ref_count[i] != 0);
 	  idx = i;
@@ -382,7 +384,9 @@  aarch64_dr_state_insert_one_point (struct aarch64_debug_reg_state *state,
   if ((dr_ctrl_p[idx] & 1) == 0)
     {
       /* new entry */
-      dr_addr_p[idx] = addr;
+      dr_aligned_addr_p[idx] = addr;
+      if (dr_orig_addr_p != NULL)
+	dr_orig_addr_p[idx] = orig_addr;
       dr_ctrl_p[idx] = ctrl;
       dr_ref_count[idx] = 1;
       /* Notify the change.  */
@@ -414,7 +418,7 @@  aarch64_dr_state_remove_one_point (struct aarch64_debug_reg_state *state,
   if (is_watchpoint)
     {
       num_regs = aarch64_num_wp_regs;
-      dr_addr_p = state->dr_addr_wp;
+      dr_addr_p = state->dr_aligned_addr_wp;
       dr_ctrl_p = state->dr_ctrl_wp;
       dr_ref_count = state->dr_ref_count_wp;
     }
@@ -472,7 +476,7 @@  aarch64_handle_breakpoint (enum target_hw_bp_type type, CORE_ADDR addr,
       if (!aarch64_point_is_aligned (0 /* is_watchpoint */ , addr, len))
 	return -1;
 
-      return aarch64_dr_state_insert_one_point (state, type, addr, len);
+      return aarch64_dr_state_insert_one_point (state, type, addr, len, 0);
     }
   else
     return aarch64_dr_state_remove_one_point (state, type, addr, len);
@@ -487,7 +491,7 @@  aarch64_handle_aligned_watchpoint (enum target_hw_bp_type type,
 				   struct aarch64_debug_reg_state *state)
 {
   if (is_insert)
-    return aarch64_dr_state_insert_one_point (state, type, addr, len);
+    return aarch64_dr_state_insert_one_point (state, type, addr, len, addr);
   else
     return aarch64_dr_state_remove_one_point (state, type, addr, len);
 }
@@ -504,6 +508,8 @@  aarch64_handle_unaligned_watchpoint (enum target_hw_bp_type type,
 				     CORE_ADDR addr, int len, int is_insert,
 				     struct aarch64_debug_reg_state *state)
 {
+  const CORE_ADDR orig_addr = addr;
+
   while (len > 0)
     {
       CORE_ADDR aligned_addr;
@@ -514,7 +520,7 @@  aarch64_handle_unaligned_watchpoint (enum target_hw_bp_type type,
 
       if (is_insert)
 	ret = aarch64_dr_state_insert_one_point (state, type, aligned_addr,
-						 aligned_len);
+						 aligned_len, orig_addr);
       else
 	ret = aarch64_dr_state_remove_one_point (state, type, aligned_addr,
 						 aligned_len);
@@ -564,7 +570,7 @@  aarch64_linux_set_debug_regs (const struct aarch64_debug_reg_state *state,
   memset (&regs, 0, sizeof (regs));
   iov.iov_base = &regs;
   count = watchpoint ? aarch64_num_wp_regs : aarch64_num_bp_regs;
-  addr = watchpoint ? state->dr_addr_wp : state->dr_addr_bp;
+  addr = watchpoint ? state->dr_aligned_addr_wp : state->dr_addr_bp;
   ctrl = watchpoint ? state->dr_ctrl_wp : state->dr_ctrl_bp;
   if (count == 0)
     return;
@@ -611,8 +617,9 @@  aarch64_show_debug_reg_state (struct aarch64_debug_reg_state *state,
 
   debug_printf ("\tWATCHPOINTs:\n");
   for (i = 0; i < aarch64_num_wp_regs; i++)
-    debug_printf ("\tWP%d: addr=%s, ctrl=0x%08x, ref.count=%d\n",
-		  i, core_addr_to_string_nz (state->dr_addr_wp[i]),
+    debug_printf ("\tWP%d: addr=%s (orig=%s), ctrl=0x%08x, ref.count=%d\n",
+		  i, core_addr_to_string_nz (state->dr_aligned_addr_wp[i]),
+		  core_addr_to_string_nz (state->dr_orig_addr_wp[i]),
 		  state->dr_ctrl_wp[i], state->dr_ref_count_wp[i]);
 }
 
diff --git a/gdb/nat/aarch64-linux-hw-point.h b/gdb/nat/aarch64-linux-hw-point.h
index acf0a49..5d17f92 100644
--- a/gdb/nat/aarch64-linux-hw-point.h
+++ b/gdb/nat/aarch64-linux-hw-point.h
@@ -143,7 +143,8 @@  struct aarch64_debug_reg_state
   unsigned int dr_ref_count_bp[AARCH64_HBP_MAX_NUM];
 
   /* hardware watchpoint */
-  CORE_ADDR dr_addr_wp[AARCH64_HWP_MAX_NUM];
+  CORE_ADDR dr_aligned_addr_wp[AARCH64_HWP_MAX_NUM];
+  CORE_ADDR dr_orig_addr_wp[AARCH64_HWP_MAX_NUM];
   unsigned int dr_ctrl_wp[AARCH64_HWP_MAX_NUM];
   unsigned int dr_ref_count_wp[AARCH64_HWP_MAX_NUM];
 };
diff --git a/gdb/testsuite/gdb.base/watchpoint-unaligned.c b/gdb/testsuite/gdb.base/watchpoint-unaligned.c
new file mode 100644
index 0000000..b76bc1c
--- /dev/null
+++ b/gdb/testsuite/gdb.base/watchpoint-unaligned.c
@@ -0,0 +1,68 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2016 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include <assert.h>
+
+static int again;
+
+static volatile struct
+{
+  uint64_t alignment;
+  union
+    {
+      uint64_t size8[1];
+      uint32_t size4[2];
+      uint16_t size2[4];
+      uint8_t size1[8];
+    }
+  u;
+} data;
+
+static int size = 0;
+static int offset;
+
+int
+main (void)
+{
+  volatile uint64_t local;
+
+  assert (sizeof (data) == 8 + 8);
+  while (size)
+    {
+      switch (size)
+	{
+	case 8:
+	  local = data.u.size8[offset];
+	  break;
+	case 4:
+	  local = data.u.size4[offset];
+	  break;
+	case 2:
+	  local = data.u.size2[offset];
+	  break;
+	case 1:
+	  local = data.u.size1[offset];
+	  break;
+	default:
+	  assert (0);
+	}
+      size = 0;
+      size = size; /* again_start */
+    }
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/watchpoint-unaligned.exp b/gdb/testsuite/gdb.base/watchpoint-unaligned.exp
new file mode 100644
index 0000000..623314a
--- /dev/null
+++ b/gdb/testsuite/gdb.base/watchpoint-unaligned.exp
@@ -0,0 +1,82 @@ 
+# Copyright 2016 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+#
+# This file is part of the gdb testsuite.
+
+if {![is_aarch64_target] && ![istarget "x86_64-*-*"] && ![istarget i?86-*]} {
+    verbose "Skipping ${gdb_test_file_name}."
+    return
+}
+
+standard_testfile
+if { [prepare_for_testing ${testfile}.exp ${testfile} ${srcfile}] } {
+    return -1
+}
+
+if ![runto_main] {
+    untested "could not run to main"
+    return -1
+}
+
+gdb_breakpoint "[gdb_get_line_number "again_start"]" "Breakpoint $decimal at $hex" "again_start"
+
+set sizes {1 2 4 8}
+array set alignedend {1 1  2 2  3 4  4 4  5 8  6 8  7 8  8 8}
+
+foreach wpsize $sizes {
+    for {set wpoffset 0} {$wpoffset < 8/$wpsize} {incr wpoffset} {
+	set wpstart [expr $wpoffset * $wpsize]
+	set wpend [expr ($wpoffset + 1) * $wpsize]
+	set wpendaligned $alignedend($wpend)
+	foreach rdsize $sizes {
+	    for {set rdoffset 0} {$rdoffset < 8/$rdsize} {incr rdoffset} {
+		set rdstart [expr $rdoffset * $rdsize]
+		set rdend [expr ($rdoffset + 1) * $rdsize]
+		set expect_hit [expr max($wpstart,$rdstart) < min($wpend,$rdend)]
+		set test "rwatch data.u.size$wpsize\[$wpoffset\]"
+		set wpnum ""
+		gdb_test_multiple $test $test {
+		    -re "Hardware read watchpoint (\[0-9\]+): .*\r\n$gdb_prompt $" {
+			set wpnum $expect_out(1,string)
+		    }
+		}
+		gdb_test_no_output "set variable size = $rdsize" ""
+		gdb_test_no_output "set variable offset = $rdoffset" ""
+		set test "continue"
+		set did_hit 0
+		gdb_test_multiple $test $test {
+		    -re "Hardware read watchpoint $wpnum:.*Value = .*\r\n$gdb_prompt $" {
+			set did_hit 1
+			send_gdb "continue\n"
+			exp_continue
+		    }
+		    -re " again_start .*\r\n$gdb_prompt $" {
+		    }
+		}
+		gdb_test_no_output "delete $wpnum" ""
+		set test "wp(size=$wpsize offset=$wpoffset) rd(size=$rdsize offset=$rdoffset) expect=$expect_hit"
+		if {$expect_hit == 0 && $rdstart < $wpendaligned} {
+		    setup_kfail external/20207 "aarch64-*-*"
+		}
+		if {$expect_hit == $did_hit} {
+		    pass $test
+		} else {
+		    fail $test
+		}
+	    }
+	}
+    }
+}