[RFC,gdb/testsuite] Add xfail in gdb.base/hbreak.exp

Message ID 20240717151055.21480-1-tdevries@suse.de
State Superseded
Headers
Series [RFC,gdb/testsuite] Add xfail in gdb.base/hbreak.exp |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed

Commit Message

Tom de Vries July 17, 2024, 3:10 p.m. UTC
  On an aarch64-linux system with 32-bit userland running in a chroot, and using
target board unix/mthumb I get:
...
(gdb) hbreak hbreak.c:27^M
Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
(gdb) PASS: gdb.base/hbreak.exp: hbreak
continue^M
Continuing.^M
Unexpected error setting breakpoint: Invalid argument.^M
(gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
...
due to this call in arm_linux_nat_target::low_prepare_to_resume:
...
          if (ptrace (PTRACE_SETHBPREGS, pid,
              (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
            perror_with_name (_("Unexpected error setting breakpoint"));
...

This problem does not happen if instead we use a 4-byte aligned address.

I'm not sure if this is simply unsupported or if there's a kernel bug of some
sort, but I don't see what gdb can do about this.

Tentatively mark this as xfail.

Tested on aarch64-linux.
---
 gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 6 deletions(-)


base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
  

Comments

Luis Machado July 17, 2024, 3:14 p.m. UTC | #1
On 7/17/24 16:10, Tom de Vries wrote:
> On an aarch64-linux system with 32-bit userland running in a chroot, and using
> target board unix/mthumb I get:
> ...
> (gdb) hbreak hbreak.c:27^M
> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M

That is a bit odd, but it goes through the compat layer, which is not exercised
as often as the 32-bit code.

Let me see if I can reproduce this one on my end.


> (gdb) PASS: gdb.base/hbreak.exp: hbreak
> continue^M
> Continuing.^M
> Unexpected error setting breakpoint: Invalid argument.^M
> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
> ...
> due to this call in arm_linux_nat_target::low_prepare_to_resume:
> ...
>           if (ptrace (PTRACE_SETHBPREGS, pid,
>               (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>             perror_with_name (_("Unexpected error setting breakpoint"));
> ...
> 
> This problem does not happen if instead we use a 4-byte aligned address.
> 
> I'm not sure if this is simply unsupported or if there's a kernel bug of some
> sort, but I don't see what gdb can do about this.
> 
> Tentatively mark this as xfail.
> 
> Tested on aarch64-linux.
> ---
>  gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>  1 file changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
> index 73a3e2afb67..b140257a23e 100644
> --- a/gdb/testsuite/gdb.base/hbreak.exp
> +++ b/gdb/testsuite/gdb.base/hbreak.exp
> @@ -27,10 +27,38 @@ if ![runto_main] {
>  
>  set breakline [gdb_get_line_number "break-at-exit"]
>  
> -gdb_test "hbreak ${srcfile}:${breakline}" \
> -	 "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
> -	 "hbreak"
> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
> +set re_dot [string_to_regexp .]
>  
> -gdb_test "continue" \
> -	 "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
> -	 "continue to break-at-exit after hbreak"
> +set addr 0x0
> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
> +	set addr $expect_out(1,string)
> +	pass $gdb_test_name
> +    }
> +}
> +
> +set have_xfail 0
> +if { [istarget arm*-*-*] } {
> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
> +    # are not 4-byte aligned may not be supported for setting a hardware
> +    # breakpoint on.
> +    set have_xfail [expr ($addr & 0x2) == 2]
> +}
> +
> +set re_xfail \
> +    [string_to_regexp \
> +	 "Unexpected error setting breakpoint: Invalid argument."]
> +
> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
> +	pass $gdb_test_name
> +    }
> +    -re -wrap $re_xfail {
> +	if { $have_xfail } {
> +	    xfail $gdb_test_name
> +	} else {
> +	    fail $gdb_test_name
> +	}
> +    }
> +}
> 
> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
  
Luis Machado July 23, 2024, 10:02 a.m. UTC | #2
On 7/17/24 16:14, Luis Machado wrote:
> On 7/17/24 16:10, Tom de Vries wrote:
>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>> target board unix/mthumb I get:
>> ...
>> (gdb) hbreak hbreak.c:27^M
>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
> 
> That is a bit odd, but it goes through the compat layer, which is not exercised
> as often as the 32-bit code.
> 
> Let me see if I can reproduce this one on my end.

I managed to reproduce this. I checked with the kernel folks and this should
work, but I'm not sure where the error is coming from.

> 
> 
>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>> continue^M
>> Continuing.^M
>> Unexpected error setting breakpoint: Invalid argument.^M
>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>> ...
>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>> ...
>>           if (ptrace (PTRACE_SETHBPREGS, pid,
>>               (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>             perror_with_name (_("Unexpected error setting breakpoint"));
>> ...
>>
>> This problem does not happen if instead we use a 4-byte aligned address.
>>
>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>> sort, but I don't see what gdb can do about this.
>>
>> Tentatively mark this as xfail.
>>
>> Tested on aarch64-linux.
>> ---
>>  gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>  1 file changed, 34 insertions(+), 6 deletions(-)
>>
>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>> index 73a3e2afb67..b140257a23e 100644
>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>  
>>  set breakline [gdb_get_line_number "break-at-exit"]
>>  
>> -gdb_test "hbreak ${srcfile}:${breakline}" \
>> -	 "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>> -	 "hbreak"
>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>> +set re_dot [string_to_regexp .]
>>  
>> -gdb_test "continue" \
>> -	 "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>> -	 "continue to break-at-exit after hbreak"
>> +set addr 0x0
>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>> +	set addr $expect_out(1,string)
>> +	pass $gdb_test_name
>> +    }
>> +}
>> +
>> +set have_xfail 0
>> +if { [istarget arm*-*-*] } {
>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>> +    # are not 4-byte aligned may not be supported for setting a hardware
>> +    # breakpoint on.
>> +    set have_xfail [expr ($addr & 0x2) == 2]
>> +}
>> +
>> +set re_xfail \
>> +    [string_to_regexp \
>> +	 "Unexpected error setting breakpoint: Invalid argument."]
>> +
>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>> +	pass $gdb_test_name
>> +    }
>> +    -re -wrap $re_xfail {
>> +	if { $have_xfail } {
>> +	    xfail $gdb_test_name
>> +	} else {
>> +	    fail $gdb_test_name
>> +	}
>> +    }
>> +}
>>
>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
> 

In any case, I agree gdb doesn't have a better way to deal with it.

Approved-By: Luis Machado <luis.machado@arm.com>
  
Tom de Vries July 24, 2024, 5:25 a.m. UTC | #3
On 7/23/24 12:02, Luis Machado wrote:
> On 7/17/24 16:14, Luis Machado wrote:
>> On 7/17/24 16:10, Tom de Vries wrote:
>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>> target board unix/mthumb I get:
>>> ...
>>> (gdb) hbreak hbreak.c:27^M
>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>
>> That is a bit odd, but it goes through the compat layer, which is not exercised
>> as often as the 32-bit code.
>>
>> Let me see if I can reproduce this one on my end.
> 
> I managed to reproduce this. I checked with the kernel folks and this should
> work, but I'm not sure where the error is coming from.
> 

Hi Luis,

thanks for looking into this, and the approval, committed.

Are you or the kernel folks following up on this, in terms of a linux 
kernel PR or some such?  It would be nice to add some sort of reference 
to the xfail.

Thanks,
- Tom

>>
>>
>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>> continue^M
>>> Continuing.^M
>>> Unexpected error setting breakpoint: Invalid argument.^M
>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>> ...
>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>> ...
>>>            if (ptrace (PTRACE_SETHBPREGS, pid,
>>>                (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>              perror_with_name (_("Unexpected error setting breakpoint"));
>>> ...
>>>
>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>
>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>> sort, but I don't see what gdb can do about this.
>>>
>>> Tentatively mark this as xfail.
>>>
>>> Tested on aarch64-linux.
>>> ---
>>>   gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>   1 file changed, 34 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>> index 73a3e2afb67..b140257a23e 100644
>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>   
>>>   set breakline [gdb_get_line_number "break-at-exit"]
>>>   
>>> -gdb_test "hbreak ${srcfile}:${breakline}" \
>>> -	 "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>> -	 "hbreak"
>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>> +set re_dot [string_to_regexp .]
>>>   
>>> -gdb_test "continue" \
>>> -	 "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>> -	 "continue to break-at-exit after hbreak"
>>> +set addr 0x0
>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>> +	set addr $expect_out(1,string)
>>> +	pass $gdb_test_name
>>> +    }
>>> +}
>>> +
>>> +set have_xfail 0
>>> +if { [istarget arm*-*-*] } {
>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>> +    # breakpoint on.
>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>> +}
>>> +
>>> +set re_xfail \
>>> +    [string_to_regexp \
>>> +	 "Unexpected error setting breakpoint: Invalid argument."]
>>> +
>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>> +	pass $gdb_test_name
>>> +    }
>>> +    -re -wrap $re_xfail {
>>> +	if { $have_xfail } {
>>> +	    xfail $gdb_test_name
>>> +	} else {
>>> +	    fail $gdb_test_name
>>> +	}
>>> +    }
>>> +}
>>>
>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>
> 
> In any case, I agree gdb doesn't have a better way to deal with it.
> 
> Approved-By: Luis Machado <luis.machado@arm.com>
  
Luis Machado July 24, 2024, 6:53 a.m. UTC | #4
On 7/24/24 06:25, Tom de Vries wrote:
> On 7/23/24 12:02, Luis Machado wrote:
>> On 7/17/24 16:14, Luis Machado wrote:
>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>> target board unix/mthumb I get:
>>>> ...
>>>> (gdb) hbreak hbreak.c:27^M
>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>
>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>> as often as the 32-bit code.
>>>
>>> Let me see if I can reproduce this one on my end.
>>
>> I managed to reproduce this. I checked with the kernel folks and this should
>> work, but I'm not sure where the error is coming from.
>>
> 
> Hi Luis,
> 
> thanks for looking into this, and the approval, committed.
> 
> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.

It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.

> 
> Thanks,
> - Tom
> 
>>>
>>>
>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>> continue^M
>>>> Continuing.^M
>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>> ...
>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>> ...
>>>>            if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>                (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>              perror_with_name (_("Unexpected error setting breakpoint"));
>>>> ...
>>>>
>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>
>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>> sort, but I don't see what gdb can do about this.
>>>>
>>>> Tentatively mark this as xfail.
>>>>
>>>> Tested on aarch64-linux.
>>>> ---
>>>>   gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>   1 file changed, 34 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>> index 73a3e2afb67..b140257a23e 100644
>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>     set breakline [gdb_get_line_number "break-at-exit"]
>>>>   -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>> -     "hbreak"
>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>> +set re_dot [string_to_regexp .]
>>>>   -gdb_test "continue" \
>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>> -     "continue to break-at-exit after hbreak"
>>>> +set addr 0x0
>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>> +    set addr $expect_out(1,string)
>>>> +    pass $gdb_test_name
>>>> +    }
>>>> +}
>>>> +
>>>> +set have_xfail 0
>>>> +if { [istarget arm*-*-*] } {
>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>> +    # breakpoint on.
>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>> +}
>>>> +
>>>> +set re_xfail \
>>>> +    [string_to_regexp \
>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>> +
>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>> +    pass $gdb_test_name
>>>> +    }
>>>> +    -re -wrap $re_xfail {
>>>> +    if { $have_xfail } {
>>>> +        xfail $gdb_test_name
>>>> +    } else {
>>>> +        fail $gdb_test_name
>>>> +    }
>>>> +    }
>>>> +}
>>>>
>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>
>>
>> In any case, I agree gdb doesn't have a better way to deal with it.
>>
>> Approved-By: Luis Machado <luis.machado@arm.com>
>
  
Tom de Vries July 24, 2024, 9:28 a.m. UTC | #5
On 7/24/24 08:53, Luis Machado wrote:
> On 7/24/24 06:25, Tom de Vries wrote:
>> On 7/23/24 12:02, Luis Machado wrote:
>>> On 7/17/24 16:14, Luis Machado wrote:
>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>> target board unix/mthumb I get:
>>>>> ...
>>>>> (gdb) hbreak hbreak.c:27^M
>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>
>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>> as often as the 32-bit code.
>>>>
>>>> Let me see if I can reproduce this one on my end.
>>>
>>> I managed to reproduce this. I checked with the kernel folks and this should
>>> work, but I'm not sure where the error is coming from.
>>>
>>
>> Hi Luis,
>>
>> thanks for looking into this, and the approval, committed.
>>
>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
> 
> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
> 

Ok, I spent some more time debugging this issue this morning.

After reading kernel sources for a while, I tried out reversing the 
order in which the Breakpoint Register Pair is written in 
arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.

My theory at this point is that the following happens in the failing case:
- PTRACE_SETHBPREGS with address 0x4004e2
- compat_arch_ptrace
- compat_ptrace_sethbpregs
- compat_ptrace_hbp_set
- ptrace_hbp_set_addr
- ptrace_hbp_get_initialised_bp
- ptrace_hbp_create
- /* Initialise fields to sane defaults
      (i.e. values that will pass validation).  */
   attr.bp_len = HW_BREAKPOINT_LEN_4;
- attr.bp_addr = 0x4004e2;
- modify_user_hw_breakpoint
- modify_user_hw_breakpoint_check
- hw_breakpoint_parse
- hw_breakpoint_arch_parse
- case is_compat_bp(bp)
- offset = 2;
- fallthrough to default
- return -EINVAL

In short, we try to validate:
- attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
and fail.

By reversing the order, we validate:
- attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
- attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
which both succeed.

So, my questions at this point are:
- is this a problem limited to aarch64 32-bit mode, or does it also
   occur for native 32-bit arm?
- is this a kernel bug?
- if this is a kernel bug, is there a workaround we can use?
- if this is not a kernel bug, is this because gdb is writing the
   Breakpoint Register Pair in the wrong order?

Thanks,
- Tom

>>
>> Thanks,
>> - Tom
>>
>>>>
>>>>
>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>> continue^M
>>>>> Continuing.^M
>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>> ...
>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>> ...
>>>>>             if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>                 (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>               perror_with_name (_("Unexpected error setting breakpoint"));
>>>>> ...
>>>>>
>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>
>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>> sort, but I don't see what gdb can do about this.
>>>>>
>>>>> Tentatively mark this as xfail.
>>>>>
>>>>> Tested on aarch64-linux.
>>>>> ---
>>>>>    gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>    1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>      set breakline [gdb_get_line_number "break-at-exit"]
>>>>>    -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>> -     "hbreak"
>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>> +set re_dot [string_to_regexp .]
>>>>>    -gdb_test "continue" \
>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>> -     "continue to break-at-exit after hbreak"
>>>>> +set addr 0x0
>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>> +    set addr $expect_out(1,string)
>>>>> +    pass $gdb_test_name
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +set have_xfail 0
>>>>> +if { [istarget arm*-*-*] } {
>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>> +    # breakpoint on.
>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>> +}
>>>>> +
>>>>> +set re_xfail \
>>>>> +    [string_to_regexp \
>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>> +
>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>> +    pass $gdb_test_name
>>>>> +    }
>>>>> +    -re -wrap $re_xfail {
>>>>> +    if { $have_xfail } {
>>>>> +        xfail $gdb_test_name
>>>>> +    } else {
>>>>> +        fail $gdb_test_name
>>>>> +    }
>>>>> +    }
>>>>> +}
>>>>>
>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>
>>>
>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>
>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>
>
  
Luis Machado July 24, 2024, 10:45 a.m. UTC | #6
On 7/24/24 10:28, Tom de Vries wrote:
> On 7/24/24 08:53, Luis Machado wrote:
>> On 7/24/24 06:25, Tom de Vries wrote:
>>> On 7/23/24 12:02, Luis Machado wrote:
>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>> target board unix/mthumb I get:
>>>>>> ...
>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>
>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>> as often as the 32-bit code.
>>>>>
>>>>> Let me see if I can reproduce this one on my end.
>>>>
>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>> work, but I'm not sure where the error is coming from.
>>>>
>>>
>>> Hi Luis,
>>>
>>> thanks for looking into this, and the approval, committed.
>>>
>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>
>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>
> 
> Ok, I spent some more time debugging this issue this morning.
> 
> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
> 

But what would change with reversing writing to the control registers, from gdb's perspective?

> My theory at this point is that the following happens in the failing case:
> - PTRACE_SETHBPREGS with address 0x4004e2
> - compat_arch_ptrace
> - compat_ptrace_sethbpregs
> - compat_ptrace_hbp_set
> - ptrace_hbp_set_addr
> - ptrace_hbp_get_initialised_bp
> - ptrace_hbp_create
> - /* Initialise fields to sane defaults
>      (i.e. values that will pass validation).  */
>   attr.bp_len = HW_BREAKPOINT_LEN_4;


The default starts as a 4-byte breakpoint, but is supposed to be adjusted later on to 2 bytes. If this isn't happening, I think we have a bug somewhere.

> - attr.bp_addr = 0x4004e2;
> - modify_user_hw_breakpoint
> - modify_user_hw_breakpoint_check
> - hw_breakpoint_parse
> - hw_breakpoint_arch_parse
> - case is_compat_bp(bp)
> - offset = 2;
> - fallthrough to default
> - return -EINVAL
> 
> In short, we try to validate:
> - attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
> and fail.
> 
> By reversing the order, we validate:
> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
> which both succeed.

Why do we have HW_BREAKPOINT_LEN_2 above while the first case has HW_BREAKPOINT_LEN_4?

> 
> So, my questions at this point are:
> - is this a problem limited to aarch64 32-bit mode, or does it also
>   occur for native 32-bit arm?

I'm not sure at this point. They are two separate code bases, but it is probably reasonable to assume the compat layer of aarch64 was based on the
original 32-bit arm code. Checking hw_breakpoint_arch_parse for arm, it does seem fairly similar.

> - is this a kernel bug?

Potentially, if it is assuming a length that is not correct.

> - if this is a kernel bug, is there a workaround we can use?
> - if this is not a kernel bug, is this because gdb is writing the
>   Breakpoint Register Pair in the wrong order?

I don't think we have a specific order to write things, but if it is a bug that arises from a specific order of commands, we could potentially
work around it.

> 
> Thanks,
> - Tom
> 
>>>
>>> Thanks,
>>> - Tom
>>>
>>>>>
>>>>>
>>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>>> continue^M
>>>>>> Continuing.^M
>>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>>> ...
>>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>>> ...
>>>>>>             if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>>                 (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>>               perror_with_name (_("Unexpected error setting breakpoint"));
>>>>>> ...
>>>>>>
>>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>>
>>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>>> sort, but I don't see what gdb can do about this.
>>>>>>
>>>>>> Tentatively mark this as xfail.
>>>>>>
>>>>>> Tested on aarch64-linux.
>>>>>> ---
>>>>>>    gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>>    1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>>      set breakline [gdb_get_line_number "break-at-exit"]
>>>>>>    -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>>> -     "hbreak"
>>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>>> +set re_dot [string_to_regexp .]
>>>>>>    -gdb_test "continue" \
>>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>>> -     "continue to break-at-exit after hbreak"
>>>>>> +set addr 0x0
>>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>>> +    set addr $expect_out(1,string)
>>>>>> +    pass $gdb_test_name
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +set have_xfail 0
>>>>>> +if { [istarget arm*-*-*] } {
>>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>>> +    # breakpoint on.
>>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>>> +}
>>>>>> +
>>>>>> +set re_xfail \
>>>>>> +    [string_to_regexp \
>>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>>> +
>>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>>> +    pass $gdb_test_name
>>>>>> +    }
>>>>>> +    -re -wrap $re_xfail {
>>>>>> +    if { $have_xfail } {
>>>>>> +        xfail $gdb_test_name
>>>>>> +    } else {
>>>>>> +        fail $gdb_test_name
>>>>>> +    }
>>>>>> +    }
>>>>>> +}
>>>>>>
>>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>>
>>>>
>>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>>
>>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>>
>>
>
  
Tom de Vries July 24, 2024, 11:56 a.m. UTC | #7
On 7/24/24 12:45, Luis Machado wrote:
> On 7/24/24 10:28, Tom de Vries wrote:
>> On 7/24/24 08:53, Luis Machado wrote:
>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>> target board unix/mthumb I get:
>>>>>>> ...
>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>
>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>> as often as the 32-bit code.
>>>>>>
>>>>>> Let me see if I can reproduce this one on my end.
>>>>>
>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>> work, but I'm not sure where the error is coming from.
>>>>>
>>>>
>>>> Hi Luis,
>>>>
>>>> thanks for looking into this, and the approval, committed.
>>>>
>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>
>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>>
>>
>> Ok, I spent some more time debugging this issue this morning.
>>
>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>
> 
> But what would change with reversing writing to the control registers, from gdb's perspective?
> 

Well, from gdb's perspective, the only difference is that both ptrace 
calls succeed, while with the original order the first one fails (and 
consequently there's no second call).

>> My theory at this point is that the following happens in the failing case:
>> - PTRACE_SETHBPREGS with address 0x4004e2
>> - compat_arch_ptrace
>> - compat_ptrace_sethbpregs
>> - compat_ptrace_hbp_set
>> - ptrace_hbp_set_addr
>> - ptrace_hbp_get_initialised_bp
>> - ptrace_hbp_create
>> - /* Initialise fields to sane defaults
>>       (i.e. values that will pass validation).  */
>>    attr.bp_len = HW_BREAKPOINT_LEN_4;
> 
> 
> The default starts as a 4-byte breakpoint, but is supposed to be adjusted later on to 2 bytes. If this isn't happening, I think we have a bug somewhere.
> 

Agreed, you could frame that as a kernel bug.  It would be good to known 
whether the kernel developers agree with that assessment.

>> - attr.bp_addr = 0x4004e2;
>> - modify_user_hw_breakpoint
>> - modify_user_hw_breakpoint_check
>> - hw_breakpoint_parse
>> - hw_breakpoint_arch_parse
>> - case is_compat_bp(bp)
>> - offset = 2;
>> - fallthrough to default
>> - return -EINVAL
>>
>> In short, we try to validate:
>> - attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
>> and fail.
>>
>> By reversing the order, we validate:
>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
>> which both succeed.
> 
> Why do we have HW_BREAKPOINT_LEN_2 above while the first case has HW_BREAKPOINT_LEN_4?
> 

Well, because we reversed the order of the two ptrace calls.

So, in the original case, the first call to ptrace uses the default 
bp_len (HW_BREAKPOINT_LEN_4) and the actual address (0x4004e2), which fails.

And in the reversed order case, the first call to ptrace uses the 
default address (0x0) and the actual bp_len (HW_BREAKPOINT_LEN_2).

[ With "default" meaning, as set by ptrace_hbp_create, and "actual", as 
set by the ptrace calls. ]

>>
>> So, my questions at this point are:
>> - is this a problem limited to aarch64 32-bit mode, or does it also
>>    occur for native 32-bit arm?
> 
> I'm not sure at this point. They are two separate code bases, but it is probably reasonable to assume the compat layer of aarch64 was based on the
> original 32-bit arm code. Checking hw_breakpoint_arch_parse for arm, it does seem fairly similar.
> 

I also observed that they're very similar.

>> - is this a kernel bug?
> 
> Potentially, if it is assuming a length that is not correct.
> 
>> - if this is a kernel bug, is there a workaround we can use?
>> - if this is not a kernel bug, is this because gdb is writing the
>>    Breakpoint Register Pair in the wrong order?
> 
> I don't think we have a specific order to write things, but if it is a bug that arises from a specific order of commands, we could potentially
> work around it.
> 

OK, I'm currently testing that approach.

Thanks,
- Tom

>>
>> Thanks,
>> - Tom
>>
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>>>>
>>>>>>
>>>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>>>> continue^M
>>>>>>> Continuing.^M
>>>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>>>> ...
>>>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>>>> ...
>>>>>>>              if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>>>                  (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>>>                perror_with_name (_("Unexpected error setting breakpoint"));
>>>>>>> ...
>>>>>>>
>>>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>>>
>>>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>>>> sort, but I don't see what gdb can do about this.
>>>>>>>
>>>>>>> Tentatively mark this as xfail.
>>>>>>>
>>>>>>> Tested on aarch64-linux.
>>>>>>> ---
>>>>>>>     gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>>>     1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>>>       set breakline [gdb_get_line_number "break-at-exit"]
>>>>>>>     -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>>>> -     "hbreak"
>>>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>>>> +set re_dot [string_to_regexp .]
>>>>>>>     -gdb_test "continue" \
>>>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>>>> -     "continue to break-at-exit after hbreak"
>>>>>>> +set addr 0x0
>>>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>>>> +    set addr $expect_out(1,string)
>>>>>>> +    pass $gdb_test_name
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +set have_xfail 0
>>>>>>> +if { [istarget arm*-*-*] } {
>>>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>>>> +    # breakpoint on.
>>>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>>>> +}
>>>>>>> +
>>>>>>> +set re_xfail \
>>>>>>> +    [string_to_regexp \
>>>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>>>> +
>>>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>>>> +    pass $gdb_test_name
>>>>>>> +    }
>>>>>>> +    -re -wrap $re_xfail {
>>>>>>> +    if { $have_xfail } {
>>>>>>> +        xfail $gdb_test_name
>>>>>>> +    } else {
>>>>>>> +        fail $gdb_test_name
>>>>>>> +    }
>>>>>>> +    }
>>>>>>> +}
>>>>>>>
>>>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>>>
>>>>>
>>>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>>>
>>>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>>>
>>>
>>
>
  
Luis Machado July 24, 2024, 10:59 p.m. UTC | #8
On 7/24/24 12:56, Tom de Vries wrote:
> On 7/24/24 12:45, Luis Machado wrote:
>> On 7/24/24 10:28, Tom de Vries wrote:
>>> On 7/24/24 08:53, Luis Machado wrote:
>>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>>> target board unix/mthumb I get:
>>>>>>>> ...
>>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>>
>>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>>> as often as the 32-bit code.
>>>>>>>
>>>>>>> Let me see if I can reproduce this one on my end.
>>>>>>
>>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>>> work, but I'm not sure where the error is coming from.
>>>>>>
>>>>>
>>>>> Hi Luis,
>>>>>
>>>>> thanks for looking into this, and the approval, committed.
>>>>>
>>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>>
>>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>>>
>>>
>>> Ok, I spent some more time debugging this issue this morning.
>>>
>>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>>
>>
>> But what would change with reversing writing to the control registers, from gdb's perspective?
>>
> 
> Well, from gdb's perspective, the only difference is that both ptrace calls succeed, while with the original order the first one fails (and consequently there's no second call).> 

I've investigated this further, and I think I see the reason why reversing works. It seems handling of hardware breakpoints is slightly different between aarch64 compat and
32-bit arm.

In summary, it seems aarch64 compat attempts to set the address before doing anything with the passed control register value, in arch/arm64/kernel/ptrace.c:hw_break_set.

We can see it punts if ptrace_hbp_set_addr returns an error, which is where we're failing with EINVAL.

For 32-bit arm, in arch/arm/kernel/ptrace.c:ptrace_sethbpregs, we do things in a different way. The important bit is that we only call modify_user_hw_breakpoint after
we're done setting both the address and the control register.

For aarch64 compat we call modify_user_hw_breakpoint for both ptrace_hbp_set_addr and ptrace_hbp_set_ctrl.

When we have a new task and attempt to use a hw break, the kernel initializes the hw break slots. It does so in arch/arm64/kernel/ptrace.c:ptrace_hbp_get_initialised_bp.

Once a slot is initialized, it isn't initialized again it seems. We will only reuse the slot (with whatever information it has, since we will replace it anyway).

With the above context, it seems we're running into trouble when we have an unaligned thumb address (offset == 2) and the slot's bp_len is set to ARM_BREAKPOINT_LEN_4,
which is the default (or it is there because we previously set a 4-byte hw break on this slot).

We can confirm that setting a 2-byte hw break works if we use an aligned thumb address (offset == 0), because we use a different switch case entry. It also works if we first manage to insert
a 2-byte hw break on an aligned thumb address, delete it and then try to insert the hw break on the unaligned thumb address. This is because inserting a hw break on an
aligned thumb address sets the bp_len to ARM_BREAKPOINT_LEN_2, and we eventually reuse that slot during ptrace_hbp_set_addr, which, I think, is the bug we have in the kernel.

We shouldn't be reusing that information. Instead we should use whatever the user passed as the control register value to the ptrace call.

For a potential workaround, I think we'll need to check for attempts at inserting a hw break at an unaligned thumb address (offset == 2). If so, then we do a dance of
inserting the hw break at the aligned version of that address (offset == 0), only to make sure the slot's bp_len is correctly set to ARM_BREAKPOINT_LEN_2, and then
proceed to insert the hw break on the original requested unaligned thumb address.

Off the top of my head I can't think of potential issues with this approach. I don't think the kernel checks if we insert two hw break's at the same address, so that
corner case might not be an issue.

Thoughts?

>>> My theory at this point is that the following happens in the failing case:
>>> - PTRACE_SETHBPREGS with address 0x4004e2
>>> - compat_arch_ptrace
>>> - compat_ptrace_sethbpregs
>>> - compat_ptrace_hbp_set
>>> - ptrace_hbp_set_addr
>>> - ptrace_hbp_get_initialised_bp
>>> - ptrace_hbp_create
>>> - /* Initialise fields to sane defaults
>>>       (i.e. values that will pass validation).  */
>>>    attr.bp_len = HW_BREAKPOINT_LEN_4;
>>
>>
>> The default starts as a 4-byte breakpoint, but is supposed to be adjusted later on to 2 bytes. If this isn't happening, I think we have a bug somewhere.
>>
> 
> Agreed, you could frame that as a kernel bug.  It would be good to known whether the kernel developers agree with that assessment.
> 
>>> - attr.bp_addr = 0x4004e2;
>>> - modify_user_hw_breakpoint
>>> - modify_user_hw_breakpoint_check
>>> - hw_breakpoint_parse
>>> - hw_breakpoint_arch_parse
>>> - case is_compat_bp(bp)
>>> - offset = 2;
>>> - fallthrough to default
>>> - return -EINVAL
>>>
>>> In short, we try to validate:
>>> - attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
>>> and fail.
>>>
>>> By reversing the order, we validate:
>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
>>> which both succeed.
>>
>> Why do we have HW_BREAKPOINT_LEN_2 above while the first case has HW_BREAKPOINT_LEN_4?
>>
> 
> Well, because we reversed the order of the two ptrace calls.
> 
> So, in the original case, the first call to ptrace uses the default bp_len (HW_BREAKPOINT_LEN_4) and the actual address (0x4004e2), which fails.
> 
> And in the reversed order case, the first call to ptrace uses the default address (0x0) and the actual bp_len (HW_BREAKPOINT_LEN_2).
> 
> [ With "default" meaning, as set by ptrace_hbp_create, and "actual", as set by the ptrace calls. ]
> 
>>>
>>> So, my questions at this point are:
>>> - is this a problem limited to aarch64 32-bit mode, or does it also
>>>    occur for native 32-bit arm?
>>
>> I'm not sure at this point. They are two separate code bases, but it is probably reasonable to assume the compat layer of aarch64 was based on the
>> original 32-bit arm code. Checking hw_breakpoint_arch_parse for arm, it does seem fairly similar.
>>
> 
> I also observed that they're very similar.
> 
>>> - is this a kernel bug?
>>
>> Potentially, if it is assuming a length that is not correct.
>>
>>> - if this is a kernel bug, is there a workaround we can use?
>>> - if this is not a kernel bug, is this because gdb is writing the
>>>    Breakpoint Register Pair in the wrong order?
>>
>> I don't think we have a specific order to write things, but if it is a bug that arises from a specific order of commands, we could potentially
>> work around it.
>>
> 
> OK, I'm currently testing that approach.
> 
> Thanks,
> - Tom
> 
>>>
>>> Thanks,
>>> - Tom
>>>
>>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>>>>> continue^M
>>>>>>>> Continuing.^M
>>>>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>>>>> ...
>>>>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>>>>> ...
>>>>>>>>              if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>>>>                  (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>>>>                perror_with_name (_("Unexpected error setting breakpoint"));
>>>>>>>> ...
>>>>>>>>
>>>>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>>>>
>>>>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>>>>> sort, but I don't see what gdb can do about this.
>>>>>>>>
>>>>>>>> Tentatively mark this as xfail.
>>>>>>>>
>>>>>>>> Tested on aarch64-linux.
>>>>>>>> ---
>>>>>>>>     gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>>>>     1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>>>>       set breakline [gdb_get_line_number "break-at-exit"]
>>>>>>>>     -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>>>>> -     "hbreak"
>>>>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>>>>> +set re_dot [string_to_regexp .]
>>>>>>>>     -gdb_test "continue" \
>>>>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>>>>> -     "continue to break-at-exit after hbreak"
>>>>>>>> +set addr 0x0
>>>>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>>>>> +    set addr $expect_out(1,string)
>>>>>>>> +    pass $gdb_test_name
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +set have_xfail 0
>>>>>>>> +if { [istarget arm*-*-*] } {
>>>>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>>>>> +    # breakpoint on.
>>>>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +set re_xfail \
>>>>>>>> +    [string_to_regexp \
>>>>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>>>>> +
>>>>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>>>>> +    pass $gdb_test_name
>>>>>>>> +    }
>>>>>>>> +    -re -wrap $re_xfail {
>>>>>>>> +    if { $have_xfail } {
>>>>>>>> +        xfail $gdb_test_name
>>>>>>>> +    } else {
>>>>>>>> +        fail $gdb_test_name
>>>>>>>> +    }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>>
>>>>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>>>>
>>>>>>
>>>>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>>>>
>>>>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>>>>
>>>>
>>>
>>
>
  
Tom de Vries July 25, 2024, 1:52 p.m. UTC | #9
On 7/25/24 00:59, Luis Machado wrote:
> On 7/24/24 12:56, Tom de Vries wrote:
>> On 7/24/24 12:45, Luis Machado wrote:
>>> On 7/24/24 10:28, Tom de Vries wrote:
>>>> On 7/24/24 08:53, Luis Machado wrote:
>>>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>>>> target board unix/mthumb I get:
>>>>>>>>> ...
>>>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>>>
>>>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>>>> as often as the 32-bit code.
>>>>>>>>
>>>>>>>> Let me see if I can reproduce this one on my end.
>>>>>>>
>>>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>>>> work, but I'm not sure where the error is coming from.
>>>>>>>
>>>>>>
>>>>>> Hi Luis,
>>>>>>
>>>>>> thanks for looking into this, and the approval, committed.
>>>>>>
>>>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>>>
>>>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>>>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>>>>
>>>>
>>>> Ok, I spent some more time debugging this issue this morning.
>>>>
>>>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>>>
>>>
>>> But what would change with reversing writing to the control registers, from gdb's perspective?
>>>
>>
>> Well, from gdb's perspective, the only difference is that both ptrace calls succeed, while with the original order the first one fails (and consequently there's no second call).>
> 
> I've investigated this further, and I think I see the reason why reversing works. It seems handling of hardware breakpoints is slightly different between aarch64 compat and
> 32-bit arm.
> 
> In summary, it seems aarch64 compat attempts to set the address before doing anything with the passed control register value, in arch/arm64/kernel/ptrace.c:hw_break_set.
> 
> We can see it punts if ptrace_hbp_set_addr returns an error, which is where we're failing with EINVAL.
> 
> For 32-bit arm, in arch/arm/kernel/ptrace.c:ptrace_sethbpregs, we do things in a different way. The important bit is that we only call modify_user_hw_breakpoint after
> we're done setting both the address and the control register.
> 

I'm looking at that code, and it seems obvious to me that 
modify_user_hw_breakpoint is called both after setting the address 
register and after setting the control register.  Could you double-check 
your observation?

> For aarch64 compat we call modify_user_hw_breakpoint for both ptrace_hbp_set_addr and ptrace_hbp_set_ctrl.
> 
> When we have a new task and attempt to use a hw break, the kernel initializes the hw break slots. It does so in arch/arm64/kernel/ptrace.c:ptrace_hbp_get_initialised_bp.
> 
> Once a slot is initialized, it isn't initialized again it seems. We will only reuse the slot (with whatever information it has, since we will replace it anyway).
> 
> With the above context, it seems we're running into trouble when we have an unaligned thumb address (offset == 2) and the slot's bp_len is set to ARM_BREAKPOINT_LEN_4,
> which is the default (or it is there because we previously set a 4-byte hw break on this slot).
> 
> We can confirm that setting a 2-byte hw break works if we use an aligned thumb address (offset == 0), because we use a different switch case entry. It also works if we first manage to insert
> a 2-byte hw break on an aligned thumb address, delete it and then try to insert the hw break on the unaligned thumb address.

Confirmed, that also works for me.

> This is because inserting a hw break on an
> aligned thumb address sets the bp_len to ARM_BREAKPOINT_LEN_2, and we eventually reuse that slot during ptrace_hbp_set_addr, which, I think, is the bug we have in the kernel.
> 
> We shouldn't be reusing that information. Instead we should use whatever the user passed as the control register value to the ptrace call.
> 

IIUC, your hypothesis is that the kernel bug is that the check for 
address vs breakpoint length should only happen when writing the control 
register?

> For a potential workaround, I think we'll need to check for attempts at inserting a hw break at an unaligned thumb address (offset == 2). If so, then we do a dance of
> inserting the hw break at the aligned version of that address (offset == 0), only to make sure the slot's bp_len is correctly set to ARM_BREAKPOINT_LEN_2, and then
> proceed to insert the hw break on the original requested unaligned thumb address.
> 
> Off the top of my head I can't think of potential issues with this approach. I don't think the kernel checks if we insert two hw break's at the same address, so that
> corner case might not be an issue.
> 

I've submitted a patch implementing that approach ( 
https://sourceware.org/pipermail/gdb-patches/2024-July/210681.html ), 
basically doing the following ptrace calls:
...
	     1. address_reg = bpts[i].address & ~0x7U
	     2. control_reg = bpts[i].control
	     3. address_reg = bpts[i].address
...

[ Note that a fix for the kernel bug formulated above would mean that 
the address vs breakpoint length check in step 3 would stop happening, 
and we'd need to write the control register again in a step 4, to get 
that check back... ]

Thanks,
- Tom

> Thoughts?
> 
>>>> My theory at this point is that the following happens in the failing case:
>>>> - PTRACE_SETHBPREGS with address 0x4004e2
>>>> - compat_arch_ptrace
>>>> - compat_ptrace_sethbpregs
>>>> - compat_ptrace_hbp_set
>>>> - ptrace_hbp_set_addr
>>>> - ptrace_hbp_get_initialised_bp
>>>> - ptrace_hbp_create
>>>> - /* Initialise fields to sane defaults
>>>>        (i.e. values that will pass validation).  */
>>>>     attr.bp_len = HW_BREAKPOINT_LEN_4;
>>>
>>>
>>> The default starts as a 4-byte breakpoint, but is supposed to be adjusted later on to 2 bytes. If this isn't happening, I think we have a bug somewhere.
>>>
>>
>> Agreed, you could frame that as a kernel bug.  It would be good to known whether the kernel developers agree with that assessment.
>>
>>>> - attr.bp_addr = 0x4004e2;
>>>> - modify_user_hw_breakpoint
>>>> - modify_user_hw_breakpoint_check
>>>> - hw_breakpoint_parse
>>>> - hw_breakpoint_arch_parse
>>>> - case is_compat_bp(bp)
>>>> - offset = 2;
>>>> - fallthrough to default
>>>> - return -EINVAL
>>>>
>>>> In short, we try to validate:
>>>> - attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
>>>> and fail.
>>>>
>>>> By reversing the order, we validate:
>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
>>>> which both succeed.
>>>
>>> Why do we have HW_BREAKPOINT_LEN_2 above while the first case has HW_BREAKPOINT_LEN_4?
>>>
>>
>> Well, because we reversed the order of the two ptrace calls.
>>
>> So, in the original case, the first call to ptrace uses the default bp_len (HW_BREAKPOINT_LEN_4) and the actual address (0x4004e2), which fails.
>>
>> And in the reversed order case, the first call to ptrace uses the default address (0x0) and the actual bp_len (HW_BREAKPOINT_LEN_2).
>>
>> [ With "default" meaning, as set by ptrace_hbp_create, and "actual", as set by the ptrace calls. ]
>>
>>>>
>>>> So, my questions at this point are:
>>>> - is this a problem limited to aarch64 32-bit mode, or does it also
>>>>     occur for native 32-bit arm?
>>>
>>> I'm not sure at this point. They are two separate code bases, but it is probably reasonable to assume the compat layer of aarch64 was based on the
>>> original 32-bit arm code. Checking hw_breakpoint_arch_parse for arm, it does seem fairly similar.
>>>
>>
>> I also observed that they're very similar.
>>
>>>> - is this a kernel bug?
>>>
>>> Potentially, if it is assuming a length that is not correct.
>>>
>>>> - if this is a kernel bug, is there a workaround we can use?
>>>> - if this is not a kernel bug, is this because gdb is writing the
>>>>     Breakpoint Register Pair in the wrong order?
>>>
>>> I don't think we have a specific order to write things, but if it is a bug that arises from a specific order of commands, we could potentially
>>> work around it.
>>>
>>
>> OK, I'm currently testing that approach.
>>
>> Thanks,
>> - Tom
>>
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> - Tom
>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>>>>>> continue^M
>>>>>>>>> Continuing.^M
>>>>>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>>>>>> ...
>>>>>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>>>>>> ...
>>>>>>>>>               if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>>>>>                   (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>>>>>                 perror_with_name (_("Unexpected error setting breakpoint"));
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>>>>>
>>>>>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>>>>>> sort, but I don't see what gdb can do about this.
>>>>>>>>>
>>>>>>>>> Tentatively mark this as xfail.
>>>>>>>>>
>>>>>>>>> Tested on aarch64-linux.
>>>>>>>>> ---
>>>>>>>>>      gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>>>>>      1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>>>>>        set breakline [gdb_get_line_number "break-at-exit"]
>>>>>>>>>      -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>>>>>> -     "hbreak"
>>>>>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>>>>>> +set re_dot [string_to_regexp .]
>>>>>>>>>      -gdb_test "continue" \
>>>>>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>>>>>> -     "continue to break-at-exit after hbreak"
>>>>>>>>> +set addr 0x0
>>>>>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>>>>>> +    set addr $expect_out(1,string)
>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +set have_xfail 0
>>>>>>>>> +if { [istarget arm*-*-*] } {
>>>>>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>>>>>> +    # breakpoint on.
>>>>>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +set re_xfail \
>>>>>>>>> +    [string_to_regexp \
>>>>>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>>>>>> +
>>>>>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>> +    }
>>>>>>>>> +    -re -wrap $re_xfail {
>>>>>>>>> +    if { $have_xfail } {
>>>>>>>>> +        xfail $gdb_test_name
>>>>>>>>> +    } else {
>>>>>>>>> +        fail $gdb_test_name
>>>>>>>>> +    }
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>>>>>
>>>>>>>
>>>>>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>>>>>
>>>>>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>>>>>
>>>>>
>>>>
>>>
>>
>
  
Luis Machado July 25, 2024, 3:22 p.m. UTC | #10
On 7/25/24 14:52, Tom de Vries wrote:
> On 7/25/24 00:59, Luis Machado wrote:
>> On 7/24/24 12:56, Tom de Vries wrote:
>>> On 7/24/24 12:45, Luis Machado wrote:
>>>> On 7/24/24 10:28, Tom de Vries wrote:
>>>>> On 7/24/24 08:53, Luis Machado wrote:
>>>>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>>>>> target board unix/mthumb I get:
>>>>>>>>>> ...
>>>>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>>>>
>>>>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>>>>> as often as the 32-bit code.
>>>>>>>>>
>>>>>>>>> Let me see if I can reproduce this one on my end.
>>>>>>>>
>>>>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>>>>> work, but I'm not sure where the error is coming from.
>>>>>>>>
>>>>>>>
>>>>>>> Hi Luis,
>>>>>>>
>>>>>>> thanks for looking into this, and the approval, committed.
>>>>>>>
>>>>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>>>>
>>>>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>>>>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>>>>>
>>>>>
>>>>> Ok, I spent some more time debugging this issue this morning.
>>>>>
>>>>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>>>>
>>>>
>>>> But what would change with reversing writing to the control registers, from gdb's perspective?
>>>>
>>>
>>> Well, from gdb's perspective, the only difference is that both ptrace calls succeed, while with the original order the first one fails (and consequently there's no second call).>
>>
>> I've investigated this further, and I think I see the reason why reversing works. It seems handling of hardware breakpoints is slightly different between aarch64 compat and
>> 32-bit arm.
>>
>> In summary, it seems aarch64 compat attempts to set the address before doing anything with the passed control register value, in arch/arm64/kernel/ptrace.c:hw_break_set.
>>
>> We can see it punts if ptrace_hbp_set_addr returns an error, which is where we're failing with EINVAL.
>>
>> For 32-bit arm, in arch/arm/kernel/ptrace.c:ptrace_sethbpregs, we do things in a different way. The important bit is that we only call modify_user_hw_breakpoint after
>> we're done setting both the address and the control register.
>>
> 
> I'm looking at that code, and it seems obvious to me that modify_user_hw_breakpoint is called both after setting the address register and after setting the control register.  Could you double-check your observation?
> 

You're right. It gets called twice, one for setting the address and the other for setting the control register. I missed that when reading through it.

So it may still be the case this is also an issue with the 32-bit arm code. I'll have to boot a 32-bit kernel to check.

>> For aarch64 compat we call modify_user_hw_breakpoint for both ptrace_hbp_set_addr and ptrace_hbp_set_ctrl.
>>
>> When we have a new task and attempt to use a hw break, the kernel initializes the hw break slots. It does so in arch/arm64/kernel/ptrace.c:ptrace_hbp_get_initialised_bp.
>>
>> Once a slot is initialized, it isn't initialized again it seems. We will only reuse the slot (with whatever information it has, since we will replace it anyway).
>>
>> With the above context, it seems we're running into trouble when we have an unaligned thumb address (offset == 2) and the slot's bp_len is set to ARM_BREAKPOINT_LEN_4,
>> which is the default (or it is there because we previously set a 4-byte hw break on this slot).
>>
>> We can confirm that setting a 2-byte hw break works if we use an aligned thumb address (offset == 0), because we use a different switch case entry. It also works if we first manage to insert
>> a 2-byte hw break on an aligned thumb address, delete it and then try to insert the hw break on the unaligned thumb address.
> 
> Confirmed, that also works for me.
> 
>> This is because inserting a hw break on an
>> aligned thumb address sets the bp_len to ARM_BREAKPOINT_LEN_2, and we eventually reuse that slot during ptrace_hbp_set_addr, which, I think, is the bug we have in the kernel.
>>
>> We shouldn't be reusing that information. Instead we should use whatever the user passed as the control register value to the ptrace call.
>>
> 
> IIUC, your hypothesis is that the kernel bug is that the check for address vs breakpoint length should only happen when writing the control register?

I think so, because we need to validate the address against the length of the breakpoint that is being requested. And that data is part of the control register.

The problem seems to arise from the fact we need to do two ptrace calls to set things up, and we're trying to validate both calls.

> 
>> For a potential workaround, I think we'll need to check for attempts at inserting a hw break at an unaligned thumb address (offset == 2). If so, then we do a dance of
>> inserting the hw break at the aligned version of that address (offset == 0), only to make sure the slot's bp_len is correctly set to ARM_BREAKPOINT_LEN_2, and then
>> proceed to insert the hw break on the original requested unaligned thumb address.
>>
>> Off the top of my head I can't think of potential issues with this approach. I don't think the kernel checks if we insert two hw break's at the same address, so that
>> corner case might not be an issue.
>>
> 
> I've submitted a patch implementing that approach ( https://sourceware.org/pipermail/gdb-patches/2024-July/210681.html ), basically doing the following ptrace calls:
> ...
>          1. address_reg = bpts[i].address & ~0x7U
>          2. control_reg = bpts[i].control
>          3. address_reg = bpts[i].address
> ...
> 
> [ Note that a fix for the kernel bug formulated above would mean that the address vs breakpoint length check in step 3 would stop happening, and we'd need to write the control register again in a step 4, to get that check back... ]

That makes sense, as we need two ptrace calls for this operation

> 
> Thanks,
> - Tom
> 
>> Thoughts?
>>
>>>>> My theory at this point is that the following happens in the failing case:
>>>>> - PTRACE_SETHBPREGS with address 0x4004e2
>>>>> - compat_arch_ptrace
>>>>> - compat_ptrace_sethbpregs
>>>>> - compat_ptrace_hbp_set
>>>>> - ptrace_hbp_set_addr
>>>>> - ptrace_hbp_get_initialised_bp
>>>>> - ptrace_hbp_create
>>>>> - /* Initialise fields to sane defaults
>>>>>        (i.e. values that will pass validation).  */
>>>>>     attr.bp_len = HW_BREAKPOINT_LEN_4;
>>>>
>>>>
>>>> The default starts as a 4-byte breakpoint, but is supposed to be adjusted later on to 2 bytes. If this isn't happening, I think we have a bug somewhere.
>>>>
>>>
>>> Agreed, you could frame that as a kernel bug.  It would be good to known whether the kernel developers agree with that assessment.
>>>
>>>>> - attr.bp_addr = 0x4004e2;
>>>>> - modify_user_hw_breakpoint
>>>>> - modify_user_hw_breakpoint_check
>>>>> - hw_breakpoint_parse
>>>>> - hw_breakpoint_arch_parse
>>>>> - case is_compat_bp(bp)
>>>>> - offset = 2;
>>>>> - fallthrough to default
>>>>> - return -EINVAL
>>>>>
>>>>> In short, we try to validate:
>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
>>>>> and fail.
>>>>>
>>>>> By reversing the order, we validate:
>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
>>>>> which both succeed.
>>>>
>>>> Why do we have HW_BREAKPOINT_LEN_2 above while the first case has HW_BREAKPOINT_LEN_4?
>>>>
>>>
>>> Well, because we reversed the order of the two ptrace calls.
>>>
>>> So, in the original case, the first call to ptrace uses the default bp_len (HW_BREAKPOINT_LEN_4) and the actual address (0x4004e2), which fails.
>>>
>>> And in the reversed order case, the first call to ptrace uses the default address (0x0) and the actual bp_len (HW_BREAKPOINT_LEN_2).
>>>
>>> [ With "default" meaning, as set by ptrace_hbp_create, and "actual", as set by the ptrace calls. ]
>>>
>>>>>
>>>>> So, my questions at this point are:
>>>>> - is this a problem limited to aarch64 32-bit mode, or does it also
>>>>>     occur for native 32-bit arm?
>>>>
>>>> I'm not sure at this point. They are two separate code bases, but it is probably reasonable to assume the compat layer of aarch64 was based on the
>>>> original 32-bit arm code. Checking hw_breakpoint_arch_parse for arm, it does seem fairly similar.
>>>>
>>>
>>> I also observed that they're very similar.
>>>
>>>>> - is this a kernel bug?
>>>>
>>>> Potentially, if it is assuming a length that is not correct.
>>>>
>>>>> - if this is a kernel bug, is there a workaround we can use?
>>>>> - if this is not a kernel bug, is this because gdb is writing the
>>>>>     Breakpoint Register Pair in the wrong order?
>>>>
>>>> I don't think we have a specific order to write things, but if it is a bug that arises from a specific order of commands, we could potentially
>>>> work around it.
>>>>
>>>
>>> OK, I'm currently testing that approach.
>>>
>>> Thanks,
>>> - Tom
>>>
>>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Tom
>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>>>>>>> continue^M
>>>>>>>>>> Continuing.^M
>>>>>>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>>>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>>>>>>> ...
>>>>>>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>>>>>>> ...
>>>>>>>>>>               if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>>>>>>                   (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>>>>>>                 perror_with_name (_("Unexpected error setting breakpoint"));
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>>>>>>
>>>>>>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>>>>>>> sort, but I don't see what gdb can do about this.
>>>>>>>>>>
>>>>>>>>>> Tentatively mark this as xfail.
>>>>>>>>>>
>>>>>>>>>> Tested on aarch64-linux.
>>>>>>>>>> ---
>>>>>>>>>>      gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>>>>>>      1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>>>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>>>>>>        set breakline [gdb_get_line_number "break-at-exit"]
>>>>>>>>>>      -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>>>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>>>>>>> -     "hbreak"
>>>>>>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>>>>>>> +set re_dot [string_to_regexp .]
>>>>>>>>>>      -gdb_test "continue" \
>>>>>>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>>>>>>> -     "continue to break-at-exit after hbreak"
>>>>>>>>>> +set addr 0x0
>>>>>>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>>>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>>>>>>> +    set addr $expect_out(1,string)
>>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>>> +    }
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +set have_xfail 0
>>>>>>>>>> +if { [istarget arm*-*-*] } {
>>>>>>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>>>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>>>>>>> +    # breakpoint on.
>>>>>>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +set re_xfail \
>>>>>>>>>> +    [string_to_regexp \
>>>>>>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>>>>>>> +
>>>>>>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>>>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>>> +    }
>>>>>>>>>> +    -re -wrap $re_xfail {
>>>>>>>>>> +    if { $have_xfail } {
>>>>>>>>>> +        xfail $gdb_test_name
>>>>>>>>>> +    } else {
>>>>>>>>>> +        fail $gdb_test_name
>>>>>>>>>> +    }
>>>>>>>>>> +    }
>>>>>>>>>> +}
>>>>>>>>>>
>>>>>>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>>>>>>
>>>>>>>>
>>>>>>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>>>>>>
>>>>>>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
  
Tom de Vries July 26, 2024, 6:28 a.m. UTC | #11
On 7/25/24 17:22, Luis Machado wrote:
> On 7/25/24 14:52, Tom de Vries wrote:
>> On 7/25/24 00:59, Luis Machado wrote:
>>> On 7/24/24 12:56, Tom de Vries wrote:
>>>> On 7/24/24 12:45, Luis Machado wrote:
>>>>> On 7/24/24 10:28, Tom de Vries wrote:
>>>>>> On 7/24/24 08:53, Luis Machado wrote:
>>>>>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>>>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>>>>>> target board unix/mthumb I get:
>>>>>>>>>>> ...
>>>>>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>>>>>
>>>>>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>>>>>> as often as the 32-bit code.
>>>>>>>>>>
>>>>>>>>>> Let me see if I can reproduce this one on my end.
>>>>>>>>>
>>>>>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>>>>>> work, but I'm not sure where the error is coming from.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Luis,
>>>>>>>>
>>>>>>>> thanks for looking into this, and the approval, committed.
>>>>>>>>
>>>>>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>>>>>
>>>>>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>>>>>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>>>>>>
>>>>>>
>>>>>> Ok, I spent some more time debugging this issue this morning.
>>>>>>
>>>>>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>>>>>
>>>>>
>>>>> But what would change with reversing writing to the control registers, from gdb's perspective?
>>>>>
>>>>
>>>> Well, from gdb's perspective, the only difference is that both ptrace calls succeed, while with the original order the first one fails (and consequently there's no second call).>
>>>
>>> I've investigated this further, and I think I see the reason why reversing works. It seems handling of hardware breakpoints is slightly different between aarch64 compat and
>>> 32-bit arm.
>>>
>>> In summary, it seems aarch64 compat attempts to set the address before doing anything with the passed control register value, in arch/arm64/kernel/ptrace.c:hw_break_set.
>>>
>>> We can see it punts if ptrace_hbp_set_addr returns an error, which is where we're failing with EINVAL.
>>>
>>> For 32-bit arm, in arch/arm/kernel/ptrace.c:ptrace_sethbpregs, we do things in a different way. The important bit is that we only call modify_user_hw_breakpoint after
>>> we're done setting both the address and the control register.
>>>
>>
>> I'm looking at that code, and it seems obvious to me that modify_user_hw_breakpoint is called both after setting the address register and after setting the control register.  Could you double-check your observation?
>>
> 
> You're right. It gets called twice, one for setting the address and the other for setting the control register. I missed that when reading through it.
> 
> So it may still be the case this is also an issue with the 32-bit arm code. I'll have to boot a 32-bit kernel to check.
> 
>>> For aarch64 compat we call modify_user_hw_breakpoint for both ptrace_hbp_set_addr and ptrace_hbp_set_ctrl.
>>>
>>> When we have a new task and attempt to use a hw break, the kernel initializes the hw break slots. It does so in arch/arm64/kernel/ptrace.c:ptrace_hbp_get_initialised_bp.
>>>
>>> Once a slot is initialized, it isn't initialized again it seems. We will only reuse the slot (with whatever information it has, since we will replace it anyway).
>>>
>>> With the above context, it seems we're running into trouble when we have an unaligned thumb address (offset == 2) and the slot's bp_len is set to ARM_BREAKPOINT_LEN_4,
>>> which is the default (or it is there because we previously set a 4-byte hw break on this slot).
>>>
>>> We can confirm that setting a 2-byte hw break works if we use an aligned thumb address (offset == 0), because we use a different switch case entry. It also works if we first manage to insert
>>> a 2-byte hw break on an aligned thumb address, delete it and then try to insert the hw break on the unaligned thumb address.
>>
>> Confirmed, that also works for me.
>>
>>> This is because inserting a hw break on an
>>> aligned thumb address sets the bp_len to ARM_BREAKPOINT_LEN_2, and we eventually reuse that slot during ptrace_hbp_set_addr, which, I think, is the bug we have in the kernel.
>>>
>>> We shouldn't be reusing that information. Instead we should use whatever the user passed as the control register value to the ptrace call.
>>>
>>
>> IIUC, your hypothesis is that the kernel bug is that the check for address vs breakpoint length should only happen when writing the control register?
> 
> I think so, because we need to validate the address against the length of the breakpoint that is being requested. And that data is part of the control register.
> 
> The problem seems to arise from the fact we need to do two ptrace calls to set things up, and we're trying to validate both calls.
> 
>>
>>> For a potential workaround, I think we'll need to check for attempts at inserting a hw break at an unaligned thumb address (offset == 2). If so, then we do a dance of
>>> inserting the hw break at the aligned version of that address (offset == 0), only to make sure the slot's bp_len is correctly set to ARM_BREAKPOINT_LEN_2, and then
>>> proceed to insert the hw break on the original requested unaligned thumb address.
>>>
>>> Off the top of my head I can't think of potential issues with this approach. I don't think the kernel checks if we insert two hw break's at the same address, so that
>>> corner case might not be an issue.
>>>
>>
>> I've submitted a patch implementing that approach ( https://sourceware.org/pipermail/gdb-patches/2024-July/210681.html ), basically doing the following ptrace calls:
>> ...
>>           1. address_reg = bpts[i].address & ~0x7U
>>           2. control_reg = bpts[i].control
>>           3. address_reg = bpts[i].address
>> ...
>>
>> [ Note that a fix for the kernel bug formulated above would mean that the address vs breakpoint length check in step 3 would stop happening, and we'd need to write the control register again in a step 4, to get that check back... ]
> 
> That makes sense, as we need two ptrace calls for this operation
> 

OK, I'll send a v2.

I also think that it's probably a good idea to do the first control_reg 
write with the enabled bit switched off.

That way we're not actually enabling a hw breakpoint on the wrong address.

Thanks,
- Tom


>>
>> Thanks,
>> - Tom
>>
>>> Thoughts?
>>>
>>>>>> My theory at this point is that the following happens in the failing case:
>>>>>> - PTRACE_SETHBPREGS with address 0x4004e2
>>>>>> - compat_arch_ptrace
>>>>>> - compat_ptrace_sethbpregs
>>>>>> - compat_ptrace_hbp_set
>>>>>> - ptrace_hbp_set_addr
>>>>>> - ptrace_hbp_get_initialised_bp
>>>>>> - ptrace_hbp_create
>>>>>> - /* Initialise fields to sane defaults
>>>>>>         (i.e. values that will pass validation).  */
>>>>>>      attr.bp_len = HW_BREAKPOINT_LEN_4;
>>>>>
>>>>>
>>>>> The default starts as a 4-byte breakpoint, but is supposed to be adjusted later on to 2 bytes. If this isn't happening, I think we have a bug somewhere.
>>>>>
>>>>
>>>> Agreed, you could frame that as a kernel bug.  It would be good to known whether the kernel developers agree with that assessment.
>>>>
>>>>>> - attr.bp_addr = 0x4004e2;
>>>>>> - modify_user_hw_breakpoint
>>>>>> - modify_user_hw_breakpoint_check
>>>>>> - hw_breakpoint_parse
>>>>>> - hw_breakpoint_arch_parse
>>>>>> - case is_compat_bp(bp)
>>>>>> - offset = 2;
>>>>>> - fallthrough to default
>>>>>> - return -EINVAL
>>>>>>
>>>>>> In short, we try to validate:
>>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
>>>>>> and fail.
>>>>>>
>>>>>> By reversing the order, we validate:
>>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
>>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
>>>>>> which both succeed.
>>>>>
>>>>> Why do we have HW_BREAKPOINT_LEN_2 above while the first case has HW_BREAKPOINT_LEN_4?
>>>>>
>>>>
>>>> Well, because we reversed the order of the two ptrace calls.
>>>>
>>>> So, in the original case, the first call to ptrace uses the default bp_len (HW_BREAKPOINT_LEN_4) and the actual address (0x4004e2), which fails.
>>>>
>>>> And in the reversed order case, the first call to ptrace uses the default address (0x0) and the actual bp_len (HW_BREAKPOINT_LEN_2).
>>>>
>>>> [ With "default" meaning, as set by ptrace_hbp_create, and "actual", as set by the ptrace calls. ]
>>>>
>>>>>>
>>>>>> So, my questions at this point are:
>>>>>> - is this a problem limited to aarch64 32-bit mode, or does it also
>>>>>>      occur for native 32-bit arm?
>>>>>
>>>>> I'm not sure at this point. They are two separate code bases, but it is probably reasonable to assume the compat layer of aarch64 was based on the
>>>>> original 32-bit arm code. Checking hw_breakpoint_arch_parse for arm, it does seem fairly similar.
>>>>>
>>>>
>>>> I also observed that they're very similar.
>>>>
>>>>>> - is this a kernel bug?
>>>>>
>>>>> Potentially, if it is assuming a length that is not correct.
>>>>>
>>>>>> - if this is a kernel bug, is there a workaround we can use?
>>>>>> - if this is not a kernel bug, is this because gdb is writing the
>>>>>>      Breakpoint Register Pair in the wrong order?
>>>>>
>>>>> I don't think we have a specific order to write things, but if it is a bug that arises from a specific order of commands, we could potentially
>>>>> work around it.
>>>>>
>>>>
>>>> OK, I'm currently testing that approach.
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> - Tom
>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> - Tom
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>>>>>>>> continue^M
>>>>>>>>>>> Continuing.^M
>>>>>>>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>>>>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>>>>>>>> ...
>>>>>>>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>>>>>>>> ...
>>>>>>>>>>>                if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>>>>>>>                    (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>>>>>>>                  perror_with_name (_("Unexpected error setting breakpoint"));
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>>>>>>>> sort, but I don't see what gdb can do about this.
>>>>>>>>>>>
>>>>>>>>>>> Tentatively mark this as xfail.
>>>>>>>>>>>
>>>>>>>>>>> Tested on aarch64-linux.
>>>>>>>>>>> ---
>>>>>>>>>>>       gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>>>>>>>       1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>>>>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>>>>>>>         set breakline [gdb_get_line_number "break-at-exit"]
>>>>>>>>>>>       -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>>>>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>>>>>>>> -     "hbreak"
>>>>>>>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>>>>>>>> +set re_dot [string_to_regexp .]
>>>>>>>>>>>       -gdb_test "continue" \
>>>>>>>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>>>>>>>> -     "continue to break-at-exit after hbreak"
>>>>>>>>>>> +set addr 0x0
>>>>>>>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>>>>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>>>>>>>> +    set addr $expect_out(1,string)
>>>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>>>> +    }
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +set have_xfail 0
>>>>>>>>>>> +if { [istarget arm*-*-*] } {
>>>>>>>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>>>>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>>>>>>>> +    # breakpoint on.
>>>>>>>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +set re_xfail \
>>>>>>>>>>> +    [string_to_regexp \
>>>>>>>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>>>>>>>> +
>>>>>>>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>>>>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>>>> +    }
>>>>>>>>>>> +    -re -wrap $re_xfail {
>>>>>>>>>>> +    if { $have_xfail } {
>>>>>>>>>>> +        xfail $gdb_test_name
>>>>>>>>>>> +    } else {
>>>>>>>>>>> +        fail $gdb_test_name
>>>>>>>>>>> +    }
>>>>>>>>>>> +    }
>>>>>>>>>>> +}
>>>>>>>>>>>
>>>>>>>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>>>>>>>
>>>>>>>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
  
Luis Machado July 26, 2024, 6:30 a.m. UTC | #12
On 7/26/24 07:28, Tom de Vries wrote:
> On 7/25/24 17:22, Luis Machado wrote:
>> On 7/25/24 14:52, Tom de Vries wrote:
>>> On 7/25/24 00:59, Luis Machado wrote:
>>>> On 7/24/24 12:56, Tom de Vries wrote:
>>>>> On 7/24/24 12:45, Luis Machado wrote:
>>>>>> On 7/24/24 10:28, Tom de Vries wrote:
>>>>>>> On 7/24/24 08:53, Luis Machado wrote:
>>>>>>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>>>>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>>>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>>>>>>> target board unix/mthumb I get:
>>>>>>>>>>>> ...
>>>>>>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>>>>>>
>>>>>>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>>>>>>> as often as the 32-bit code.
>>>>>>>>>>>
>>>>>>>>>>> Let me see if I can reproduce this one on my end.
>>>>>>>>>>
>>>>>>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>>>>>>> work, but I'm not sure where the error is coming from.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Luis,
>>>>>>>>>
>>>>>>>>> thanks for looking into this, and the approval, committed.
>>>>>>>>>
>>>>>>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>>>>>>
>>>>>>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>>>>>>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>>>>>>>
>>>>>>>
>>>>>>> Ok, I spent some more time debugging this issue this morning.
>>>>>>>
>>>>>>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>>>>>>
>>>>>>
>>>>>> But what would change with reversing writing to the control registers, from gdb's perspective?
>>>>>>
>>>>>
>>>>> Well, from gdb's perspective, the only difference is that both ptrace calls succeed, while with the original order the first one fails (and consequently there's no second call).>
>>>>
>>>> I've investigated this further, and I think I see the reason why reversing works. It seems handling of hardware breakpoints is slightly different between aarch64 compat and
>>>> 32-bit arm.
>>>>
>>>> In summary, it seems aarch64 compat attempts to set the address before doing anything with the passed control register value, in arch/arm64/kernel/ptrace.c:hw_break_set.
>>>>
>>>> We can see it punts if ptrace_hbp_set_addr returns an error, which is where we're failing with EINVAL.
>>>>
>>>> For 32-bit arm, in arch/arm/kernel/ptrace.c:ptrace_sethbpregs, we do things in a different way. The important bit is that we only call modify_user_hw_breakpoint after
>>>> we're done setting both the address and the control register.
>>>>
>>>
>>> I'm looking at that code, and it seems obvious to me that modify_user_hw_breakpoint is called both after setting the address register and after setting the control register.  Could you double-check your observation?
>>>
>>
>> You're right. It gets called twice, one for setting the address and the other for setting the control register. I missed that when reading through it.
>>
>> So it may still be the case this is also an issue with the 32-bit arm code. I'll have to boot a 32-bit kernel to check.
>>
>>>> For aarch64 compat we call modify_user_hw_breakpoint for both ptrace_hbp_set_addr and ptrace_hbp_set_ctrl.
>>>>
>>>> When we have a new task and attempt to use a hw break, the kernel initializes the hw break slots. It does so in arch/arm64/kernel/ptrace.c:ptrace_hbp_get_initialised_bp.
>>>>
>>>> Once a slot is initialized, it isn't initialized again it seems. We will only reuse the slot (with whatever information it has, since we will replace it anyway).
>>>>
>>>> With the above context, it seems we're running into trouble when we have an unaligned thumb address (offset == 2) and the slot's bp_len is set to ARM_BREAKPOINT_LEN_4,
>>>> which is the default (or it is there because we previously set a 4-byte hw break on this slot).
>>>>
>>>> We can confirm that setting a 2-byte hw break works if we use an aligned thumb address (offset == 0), because we use a different switch case entry. It also works if we first manage to insert
>>>> a 2-byte hw break on an aligned thumb address, delete it and then try to insert the hw break on the unaligned thumb address.
>>>
>>> Confirmed, that also works for me.
>>>
>>>> This is because inserting a hw break on an
>>>> aligned thumb address sets the bp_len to ARM_BREAKPOINT_LEN_2, and we eventually reuse that slot during ptrace_hbp_set_addr, which, I think, is the bug we have in the kernel.
>>>>
>>>> We shouldn't be reusing that information. Instead we should use whatever the user passed as the control register value to the ptrace call.
>>>>
>>>
>>> IIUC, your hypothesis is that the kernel bug is that the check for address vs breakpoint length should only happen when writing the control register?
>>
>> I think so, because we need to validate the address against the length of the breakpoint that is being requested. And that data is part of the control register.
>>
>> The problem seems to arise from the fact we need to do two ptrace calls to set things up, and we're trying to validate both calls.
>>
>>>
>>>> For a potential workaround, I think we'll need to check for attempts at inserting a hw break at an unaligned thumb address (offset == 2). If so, then we do a dance of
>>>> inserting the hw break at the aligned version of that address (offset == 0), only to make sure the slot's bp_len is correctly set to ARM_BREAKPOINT_LEN_2, and then
>>>> proceed to insert the hw break on the original requested unaligned thumb address.
>>>>
>>>> Off the top of my head I can't think of potential issues with this approach. I don't think the kernel checks if we insert two hw break's at the same address, so that
>>>> corner case might not be an issue.
>>>>
>>>
>>> I've submitted a patch implementing that approach ( https://sourceware.org/pipermail/gdb-patches/2024-July/210681.html ), basically doing the following ptrace calls:
>>> ...
>>>           1. address_reg = bpts[i].address & ~0x7U
>>>           2. control_reg = bpts[i].control
>>>           3. address_reg = bpts[i].address
>>> ...
>>>
>>> [ Note that a fix for the kernel bug formulated above would mean that the address vs breakpoint length check in step 3 would stop happening, and we'd need to write the control register again in a step 4, to get that check back... ]
>>
>> That makes sense, as we need two ptrace calls for this operation
>>
> 
> OK, I'll send a v2.
> 
> I also think that it's probably a good idea to do the first control_reg write with the enabled bit switched off.
> 
> That way we're not actually enabling a hw breakpoint on the wrong address.

Makes sense to me.

I'm in the process of booting a 32-bit kernel to check what is the situation on 32-bit arm.

> 
> Thanks,
> - Tom
> 
> 
>>>
>>> Thanks,
>>> - Tom
>>>
>>>> Thoughts?
>>>>
>>>>>>> My theory at this point is that the following happens in the failing case:
>>>>>>> - PTRACE_SETHBPREGS with address 0x4004e2
>>>>>>> - compat_arch_ptrace
>>>>>>> - compat_ptrace_sethbpregs
>>>>>>> - compat_ptrace_hbp_set
>>>>>>> - ptrace_hbp_set_addr
>>>>>>> - ptrace_hbp_get_initialised_bp
>>>>>>> - ptrace_hbp_create
>>>>>>> - /* Initialise fields to sane defaults
>>>>>>>         (i.e. values that will pass validation).  */
>>>>>>>      attr.bp_len = HW_BREAKPOINT_LEN_4;
>>>>>>
>>>>>>
>>>>>> The default starts as a 4-byte breakpoint, but is supposed to be adjusted later on to 2 bytes. If this isn't happening, I think we have a bug somewhere.
>>>>>>
>>>>>
>>>>> Agreed, you could frame that as a kernel bug.  It would be good to known whether the kernel developers agree with that assessment.
>>>>>
>>>>>>> - attr.bp_addr = 0x4004e2;
>>>>>>> - modify_user_hw_breakpoint
>>>>>>> - modify_user_hw_breakpoint_check
>>>>>>> - hw_breakpoint_parse
>>>>>>> - hw_breakpoint_arch_parse
>>>>>>> - case is_compat_bp(bp)
>>>>>>> - offset = 2;
>>>>>>> - fallthrough to default
>>>>>>> - return -EINVAL
>>>>>>>
>>>>>>> In short, we try to validate:
>>>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
>>>>>>> and fail.
>>>>>>>
>>>>>>> By reversing the order, we validate:
>>>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
>>>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
>>>>>>> which both succeed.
>>>>>>
>>>>>> Why do we have HW_BREAKPOINT_LEN_2 above while the first case has HW_BREAKPOINT_LEN_4?
>>>>>>
>>>>>
>>>>> Well, because we reversed the order of the two ptrace calls.
>>>>>
>>>>> So, in the original case, the first call to ptrace uses the default bp_len (HW_BREAKPOINT_LEN_4) and the actual address (0x4004e2), which fails.
>>>>>
>>>>> And in the reversed order case, the first call to ptrace uses the default address (0x0) and the actual bp_len (HW_BREAKPOINT_LEN_2).
>>>>>
>>>>> [ With "default" meaning, as set by ptrace_hbp_create, and "actual", as set by the ptrace calls. ]
>>>>>
>>>>>>>
>>>>>>> So, my questions at this point are:
>>>>>>> - is this a problem limited to aarch64 32-bit mode, or does it also
>>>>>>>      occur for native 32-bit arm?
>>>>>>
>>>>>> I'm not sure at this point. They are two separate code bases, but it is probably reasonable to assume the compat layer of aarch64 was based on the
>>>>>> original 32-bit arm code. Checking hw_breakpoint_arch_parse for arm, it does seem fairly similar.
>>>>>>
>>>>>
>>>>> I also observed that they're very similar.
>>>>>
>>>>>>> - is this a kernel bug?
>>>>>>
>>>>>> Potentially, if it is assuming a length that is not correct.
>>>>>>
>>>>>>> - if this is a kernel bug, is there a workaround we can use?
>>>>>>> - if this is not a kernel bug, is this because gdb is writing the
>>>>>>>      Breakpoint Register Pair in the wrong order?
>>>>>>
>>>>>> I don't think we have a specific order to write things, but if it is a bug that arises from a specific order of commands, we could potentially
>>>>>> work around it.
>>>>>>
>>>>>
>>>>> OK, I'm currently testing that approach.
>>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Tom
>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> - Tom
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>>>>>>>>> continue^M
>>>>>>>>>>>> Continuing.^M
>>>>>>>>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>>>>>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>>>>>>>>> ...
>>>>>>>>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>>>>>>>>> ...
>>>>>>>>>>>>                if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>>>>>>>>                    (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>>>>>>>>                  perror_with_name (_("Unexpected error setting breakpoint"));
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>>>>>>>>> sort, but I don't see what gdb can do about this.
>>>>>>>>>>>>
>>>>>>>>>>>> Tentatively mark this as xfail.
>>>>>>>>>>>>
>>>>>>>>>>>> Tested on aarch64-linux.
>>>>>>>>>>>> ---
>>>>>>>>>>>>       gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>>>>>>>>       1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>>>>>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>>>>>>>>         set breakline [gdb_get_line_number "break-at-exit"]
>>>>>>>>>>>>       -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>>>>>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>>>>>>>>> -     "hbreak"
>>>>>>>>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>>>>>>>>> +set re_dot [string_to_regexp .]
>>>>>>>>>>>>       -gdb_test "continue" \
>>>>>>>>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>>>>>>>>> -     "continue to break-at-exit after hbreak"
>>>>>>>>>>>> +set addr 0x0
>>>>>>>>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>>>>>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>>>>>>>>> +    set addr $expect_out(1,string)
>>>>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +set have_xfail 0
>>>>>>>>>>>> +if { [istarget arm*-*-*] } {
>>>>>>>>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>>>>>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>>>>>>>>> +    # breakpoint on.
>>>>>>>>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +set re_xfail \
>>>>>>>>>>>> +    [string_to_regexp \
>>>>>>>>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>>>>>>>>> +
>>>>>>>>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>>>>>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    -re -wrap $re_xfail {
>>>>>>>>>>>> +    if { $have_xfail } {
>>>>>>>>>>>> +        xfail $gdb_test_name
>>>>>>>>>>>> +    } else {
>>>>>>>>>>>> +        fail $gdb_test_name
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>>
>>>>>>>>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>>>>>>>>
>>>>>>>>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
  
Tom de Vries July 26, 2024, 10:01 a.m. UTC | #13
On 7/26/24 08:30, Luis Machado wrote:
> On 7/26/24 07:28, Tom de Vries wrote:
>> On 7/25/24 17:22, Luis Machado wrote:
>>> On 7/25/24 14:52, Tom de Vries wrote:
>>>> On 7/25/24 00:59, Luis Machado wrote:
>>>>> On 7/24/24 12:56, Tom de Vries wrote:
>>>>>> On 7/24/24 12:45, Luis Machado wrote:
>>>>>>> On 7/24/24 10:28, Tom de Vries wrote:
>>>>>>>> On 7/24/24 08:53, Luis Machado wrote:
>>>>>>>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>>>>>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>>>>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>>>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>>>>>>>> target board unix/mthumb I get:
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>>>>>>>
>>>>>>>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>>>>>>>> as often as the 32-bit code.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me see if I can reproduce this one on my end.
>>>>>>>>>>>
>>>>>>>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>>>>>>>> work, but I'm not sure where the error is coming from.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Luis,
>>>>>>>>>>
>>>>>>>>>> thanks for looking into this, and the approval, committed.
>>>>>>>>>>
>>>>>>>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>>>>>>>
>>>>>>>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>>>>>>>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Ok, I spent some more time debugging this issue this morning.
>>>>>>>>
>>>>>>>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>>>>>>>
>>>>>>>
>>>>>>> But what would change with reversing writing to the control registers, from gdb's perspective?
>>>>>>>
>>>>>>
>>>>>> Well, from gdb's perspective, the only difference is that both ptrace calls succeed, while with the original order the first one fails (and consequently there's no second call).>
>>>>>
>>>>> I've investigated this further, and I think I see the reason why reversing works. It seems handling of hardware breakpoints is slightly different between aarch64 compat and
>>>>> 32-bit arm.
>>>>>
>>>>> In summary, it seems aarch64 compat attempts to set the address before doing anything with the passed control register value, in arch/arm64/kernel/ptrace.c:hw_break_set.
>>>>>
>>>>> We can see it punts if ptrace_hbp_set_addr returns an error, which is where we're failing with EINVAL.
>>>>>
>>>>> For 32-bit arm, in arch/arm/kernel/ptrace.c:ptrace_sethbpregs, we do things in a different way. The important bit is that we only call modify_user_hw_breakpoint after
>>>>> we're done setting both the address and the control register.
>>>>>
>>>>
>>>> I'm looking at that code, and it seems obvious to me that modify_user_hw_breakpoint is called both after setting the address register and after setting the control register.  Could you double-check your observation?
>>>>
>>>
>>> You're right. It gets called twice, one for setting the address and the other for setting the control register. I missed that when reading through it.
>>>
>>> So it may still be the case this is also an issue with the 32-bit arm code. I'll have to boot a 32-bit kernel to check.
>>>
>>>>> For aarch64 compat we call modify_user_hw_breakpoint for both ptrace_hbp_set_addr and ptrace_hbp_set_ctrl.
>>>>>
>>>>> When we have a new task and attempt to use a hw break, the kernel initializes the hw break slots. It does so in arch/arm64/kernel/ptrace.c:ptrace_hbp_get_initialised_bp.
>>>>>
>>>>> Once a slot is initialized, it isn't initialized again it seems. We will only reuse the slot (with whatever information it has, since we will replace it anyway).
>>>>>
>>>>> With the above context, it seems we're running into trouble when we have an unaligned thumb address (offset == 2) and the slot's bp_len is set to ARM_BREAKPOINT_LEN_4,
>>>>> which is the default (or it is there because we previously set a 4-byte hw break on this slot).
>>>>>
>>>>> We can confirm that setting a 2-byte hw break works if we use an aligned thumb address (offset == 0), because we use a different switch case entry. It also works if we first manage to insert
>>>>> a 2-byte hw break on an aligned thumb address, delete it and then try to insert the hw break on the unaligned thumb address.
>>>>
>>>> Confirmed, that also works for me.
>>>>
>>>>> This is because inserting a hw break on an
>>>>> aligned thumb address sets the bp_len to ARM_BREAKPOINT_LEN_2, and we eventually reuse that slot during ptrace_hbp_set_addr, which, I think, is the bug we have in the kernel.
>>>>>
>>>>> We shouldn't be reusing that information. Instead we should use whatever the user passed as the control register value to the ptrace call.
>>>>>
>>>>
>>>> IIUC, your hypothesis is that the kernel bug is that the check for address vs breakpoint length should only happen when writing the control register?
>>>
>>> I think so, because we need to validate the address against the length of the breakpoint that is being requested. And that data is part of the control register.
>>>
>>> The problem seems to arise from the fact we need to do two ptrace calls to set things up, and we're trying to validate both calls.
>>>
>>>>
>>>>> For a potential workaround, I think we'll need to check for attempts at inserting a hw break at an unaligned thumb address (offset == 2). If so, then we do a dance of
>>>>> inserting the hw break at the aligned version of that address (offset == 0), only to make sure the slot's bp_len is correctly set to ARM_BREAKPOINT_LEN_2, and then
>>>>> proceed to insert the hw break on the original requested unaligned thumb address.
>>>>>
>>>>> Off the top of my head I can't think of potential issues with this approach. I don't think the kernel checks if we insert two hw break's at the same address, so that
>>>>> corner case might not be an issue.
>>>>>
>>>>
>>>> I've submitted a patch implementing that approach ( https://sourceware.org/pipermail/gdb-patches/2024-July/210681.html ), basically doing the following ptrace calls:
>>>> ...
>>>>            1. address_reg = bpts[i].address & ~0x7U
>>>>            2. control_reg = bpts[i].control
>>>>            3. address_reg = bpts[i].address
>>>> ...
>>>>
>>>> [ Note that a fix for the kernel bug formulated above would mean that the address vs breakpoint length check in step 3 would stop happening, and we'd need to write the control register again in a step 4, to get that check back... ]
>>>
>>> That makes sense, as we need two ptrace calls for this operation
>>>
>>
>> OK, I'll send a v2.
>>
>> I also think that it's probably a good idea to do the first control_reg write with the enabled bit switched off.
>>
>> That way we're not actually enabling a hw breakpoint on the wrong address.
> 
> Makes sense to me.
> 

Well, it turns out that that doesn't work, so I've left that part as is.

V2 submitted here ( 
https://sourceware.org/pipermail/gdb-patches/2024-July/210709.html ).

> I'm in the process of booting a 32-bit kernel to check what is the situation on 32-bit arm.
> 

Great.  Unfortunately I don't have a setup where I can try that.

Thanks,
- Tom

>>
>> Thanks,
>> - Tom
>>
>>
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>>> Thoughts?
>>>>>
>>>>>>>> My theory at this point is that the following happens in the failing case:
>>>>>>>> - PTRACE_SETHBPREGS with address 0x4004e2
>>>>>>>> - compat_arch_ptrace
>>>>>>>> - compat_ptrace_sethbpregs
>>>>>>>> - compat_ptrace_hbp_set
>>>>>>>> - ptrace_hbp_set_addr
>>>>>>>> - ptrace_hbp_get_initialised_bp
>>>>>>>> - ptrace_hbp_create
>>>>>>>> - /* Initialise fields to sane defaults
>>>>>>>>          (i.e. values that will pass validation).  */
>>>>>>>>       attr.bp_len = HW_BREAKPOINT_LEN_4;
>>>>>>>
>>>>>>>
>>>>>>> The default starts as a 4-byte breakpoint, but is supposed to be adjusted later on to 2 bytes. If this isn't happening, I think we have a bug somewhere.
>>>>>>>
>>>>>>
>>>>>> Agreed, you could frame that as a kernel bug.  It would be good to known whether the kernel developers agree with that assessment.
>>>>>>
>>>>>>>> - attr.bp_addr = 0x4004e2;
>>>>>>>> - modify_user_hw_breakpoint
>>>>>>>> - modify_user_hw_breakpoint_check
>>>>>>>> - hw_breakpoint_parse
>>>>>>>> - hw_breakpoint_arch_parse
>>>>>>>> - case is_compat_bp(bp)
>>>>>>>> - offset = 2;
>>>>>>>> - fallthrough to default
>>>>>>>> - return -EINVAL
>>>>>>>>
>>>>>>>> In short, we try to validate:
>>>>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_4 && attr.bp_addr == 0x4004e2
>>>>>>>> and fail.
>>>>>>>>
>>>>>>>> By reversing the order, we validate:
>>>>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0, and then
>>>>>>>> - attr.bp_len == HW_BREAKPOINT_LEN_2 && attr.bp_addr == 0x4004e2
>>>>>>>> which both succeed.
>>>>>>>
>>>>>>> Why do we have HW_BREAKPOINT_LEN_2 above while the first case has HW_BREAKPOINT_LEN_4?
>>>>>>>
>>>>>>
>>>>>> Well, because we reversed the order of the two ptrace calls.
>>>>>>
>>>>>> So, in the original case, the first call to ptrace uses the default bp_len (HW_BREAKPOINT_LEN_4) and the actual address (0x4004e2), which fails.
>>>>>>
>>>>>> And in the reversed order case, the first call to ptrace uses the default address (0x0) and the actual bp_len (HW_BREAKPOINT_LEN_2).
>>>>>>
>>>>>> [ With "default" meaning, as set by ptrace_hbp_create, and "actual", as set by the ptrace calls. ]
>>>>>>
>>>>>>>>
>>>>>>>> So, my questions at this point are:
>>>>>>>> - is this a problem limited to aarch64 32-bit mode, or does it also
>>>>>>>>       occur for native 32-bit arm?
>>>>>>>
>>>>>>> I'm not sure at this point. They are two separate code bases, but it is probably reasonable to assume the compat layer of aarch64 was based on the
>>>>>>> original 32-bit arm code. Checking hw_breakpoint_arch_parse for arm, it does seem fairly similar.
>>>>>>>
>>>>>>
>>>>>> I also observed that they're very similar.
>>>>>>
>>>>>>>> - is this a kernel bug?
>>>>>>>
>>>>>>> Potentially, if it is assuming a length that is not correct.
>>>>>>>
>>>>>>>> - if this is a kernel bug, is there a workaround we can use?
>>>>>>>> - if this is not a kernel bug, is this because gdb is writing the
>>>>>>>>       Breakpoint Register Pair in the wrong order?
>>>>>>>
>>>>>>> I don't think we have a specific order to write things, but if it is a bug that arises from a specific order of commands, we could potentially
>>>>>>> work around it.
>>>>>>>
>>>>>>
>>>>>> OK, I'm currently testing that approach.
>>>>>>
>>>>>> Thanks,
>>>>>> - Tom
>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> - Tom
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> - Tom
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> (gdb) PASS: gdb.base/hbreak.exp: hbreak
>>>>>>>>>>>>> continue^M
>>>>>>>>>>>>> Continuing.^M
>>>>>>>>>>>>> Unexpected error setting breakpoint: Invalid argument.^M
>>>>>>>>>>>>> (gdb) FAIL: gdb.base/hbreak.exp: continue to break-at-exit after hbreak
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> due to this call in arm_linux_nat_target::low_prepare_to_resume:
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>                 if (ptrace (PTRACE_SETHBPREGS, pid,
>>>>>>>>>>>>>                     (PTRACE_TYPE_ARG3) ((i << 1) + 1), &bpts[i].address) < 0)
>>>>>>>>>>>>>                   perror_with_name (_("Unexpected error setting breakpoint"));
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> This problem does not happen if instead we use a 4-byte aligned address.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not sure if this is simply unsupported or if there's a kernel bug of some
>>>>>>>>>>>>> sort, but I don't see what gdb can do about this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tentatively mark this as xfail.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tested on aarch64-linux.
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>        gdb/testsuite/gdb.base/hbreak.exp | 40 ++++++++++++++++++++++++++-----
>>>>>>>>>>>>>        1 file changed, 34 insertions(+), 6 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>>>>> index 73a3e2afb67..b140257a23e 100644
>>>>>>>>>>>>> --- a/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>>>>> +++ b/gdb/testsuite/gdb.base/hbreak.exp
>>>>>>>>>>>>> @@ -27,10 +27,38 @@ if ![runto_main] {
>>>>>>>>>>>>>          set breakline [gdb_get_line_number "break-at-exit"]
>>>>>>>>>>>>>        -gdb_test "hbreak ${srcfile}:${breakline}" \
>>>>>>>>>>>>> -     "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
>>>>>>>>>>>>> -     "hbreak"
>>>>>>>>>>>>> +set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
>>>>>>>>>>>>> +set re_dot [string_to_regexp .]
>>>>>>>>>>>>>        -gdb_test "continue" \
>>>>>>>>>>>>> -     "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
>>>>>>>>>>>>> -     "continue to break-at-exit after hbreak"
>>>>>>>>>>>>> +set addr 0x0
>>>>>>>>>>>>> +gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
>>>>>>>>>>>>> +    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
>>>>>>>>>>>>> +    set addr $expect_out(1,string)
>>>>>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +set have_xfail 0
>>>>>>>>>>>>> +if { [istarget arm*-*-*] } {
>>>>>>>>>>>>> +    # When running 32-bit userland on aarch64 kernel, thumb instructions that
>>>>>>>>>>>>> +    # are not 4-byte aligned may not be supported for setting a hardware
>>>>>>>>>>>>> +    # breakpoint on.
>>>>>>>>>>>>> +    set have_xfail [expr ($addr & 0x2) == 2]
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +set re_xfail \
>>>>>>>>>>>>> +    [string_to_regexp \
>>>>>>>>>>>>> +     "Unexpected error setting breakpoint: Invalid argument."]
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
>>>>>>>>>>>>> +    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
>>>>>>>>>>>>> +    pass $gdb_test_name
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +    -re -wrap $re_xfail {
>>>>>>>>>>>>> +    if { $have_xfail } {
>>>>>>>>>>>>> +        xfail $gdb_test_name
>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>> +        fail $gdb_test_name
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +}
>>>>>>>>>>>>>
>>>>>>>>>>>>> base-commit: 0ed152c5c6b3c72fc505b331ed77e08b438d643a
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In any case, I agree gdb doesn't have a better way to deal with it.
>>>>>>>>>>>
>>>>>>>>>>> Approved-By: Luis Machado <luis.machado@arm.com>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
  
Luis Machado July 26, 2024, 3:46 p.m. UTC | #14
On 7/26/24 11:01, Tom de Vries wrote:
> On 7/26/24 08:30, Luis Machado wrote:
>> On 7/26/24 07:28, Tom de Vries wrote:
>>> On 7/25/24 17:22, Luis Machado wrote:
>>>> On 7/25/24 14:52, Tom de Vries wrote:
>>>>> On 7/25/24 00:59, Luis Machado wrote:
>>>>>> On 7/24/24 12:56, Tom de Vries wrote:
>>>>>>> On 7/24/24 12:45, Luis Machado wrote:
>>>>>>>> On 7/24/24 10:28, Tom de Vries wrote:
>>>>>>>>> On 7/24/24 08:53, Luis Machado wrote:
>>>>>>>>>> On 7/24/24 06:25, Tom de Vries wrote:
>>>>>>>>>>> On 7/23/24 12:02, Luis Machado wrote:
>>>>>>>>>>>> On 7/17/24 16:14, Luis Machado wrote:
>>>>>>>>>>>>> On 7/17/24 16:10, Tom de Vries wrote:
>>>>>>>>>>>>>> On an aarch64-linux system with 32-bit userland running in a chroot, and using
>>>>>>>>>>>>>> target board unix/mthumb I get:
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> (gdb) hbreak hbreak.c:27^M
>>>>>>>>>>>>>> Hardware assisted breakpoint 2 at 0x4004e2: file hbreak.c, line 27.^M
>>>>>>>>>>>>>
>>>>>>>>>>>>> That is a bit odd, but it goes through the compat layer, which is not exercised
>>>>>>>>>>>>> as often as the 32-bit code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me see if I can reproduce this one on my end.
>>>>>>>>>>>>
>>>>>>>>>>>> I managed to reproduce this. I checked with the kernel folks and this should
>>>>>>>>>>>> work, but I'm not sure where the error is coming from.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Luis,
>>>>>>>>>>>
>>>>>>>>>>> thanks for looking into this, and the approval, committed.
>>>>>>>>>>>
>>>>>>>>>>> Are you or the kernel folks following up on this, in terms of a linux kernel PR or some such?  It would be nice to add some sort of reference to the xfail.
>>>>>>>>>>
>>>>>>>>>> It's in my TODO. I'm still investigating to understand where the error is coming from. Once located, I plan to check with them for their thoughts and a possible
>>>>>>>>>> fix. I don't think the kernel folks use the PR process much. We could probably ammend this commit later on once we have more information though.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ok, I spent some more time debugging this issue this morning.
>>>>>>>>>
>>>>>>>>> After reading kernel sources for a while, I tried out reversing the order in which the Breakpoint Register Pair is written in arm_linux_nat_target::low_prepare_to_resume, and ... the test-case passes.
>>>>>>>>>
>>>>>>>>
>>>>>>>> But what would change with reversing writing to the control registers, from gdb's perspective?
>>>>>>>>
>>>>>>>
>>>>>>> Well, from gdb's perspective, the only difference is that both ptrace calls succeed, while with the original order the first one fails (and consequently there's no second call).>
>>>>>>
>>>>>> I've investigated this further, and I think I see the reason why reversing works. It seems handling of hardware breakpoints is slightly different between aarch64 compat and
>>>>>> 32-bit arm.
>>>>>>
>>>>>> In summary, it seems aarch64 compat attempts to set the address before doing anything with the passed control register value, in arch/arm64/kernel/ptrace.c:hw_break_set.
>>>>>>
>>>>>> We can see it punts if ptrace_hbp_set_addr returns an error, which is where we're failing with EINVAL.
>>>>>>
>>>>>> For 32-bit arm, in arch/arm/kernel/ptrace.c:ptrace_sethbpregs, we do things in a different way. The important bit is that we only call modify_user_hw_breakpoint after
>>>>>> we're done setting both the address and the control register.
>>>>>>
>>>>>
>>>>> I'm looking at that code, and it seems obvious to me that modify_user_hw_breakpoint is called both after setting the address register and after setting the control register.  Could you double-check your observation?
>>>>>
>>>>
>>>> You're right. It gets called twice, one for setting the address and the other for setting the control register. I missed that when reading through it.
>>>>
>>>> So it may still be the case this is also an issue with the 32-bit arm code. I'll have to boot a 32-bit kernel to check.
>>>>
>>>>>> For aarch64 compat we call modify_user_hw_breakpoint for both ptrace_hbp_set_addr and ptrace_hbp_set_ctrl.
>>>>>>
>>>>>> When we have a new task and attempt to use a hw break, the kernel initializes the hw break slots. It does so in arch/arm64/kernel/ptrace.c:ptrace_hbp_get_initialised_bp.
>>>>>>
>>>>>> Once a slot is initialized, it isn't initialized again it seems. We will only reuse the slot (with whatever information it has, since we will replace it anyway).
>>>>>>
>>>>>> With the above context, it seems we're running into trouble when we have an unaligned thumb address (offset == 2) and the slot's bp_len is set to ARM_BREAKPOINT_LEN_4,
>>>>>> which is the default (or it is there because we previously set a 4-byte hw break on this slot).
>>>>>>
>>>>>> We can confirm that setting a 2-byte hw break works if we use an aligned thumb address (offset == 0), because we use a different switch case entry. It also works if we first manage to insert
>>>>>> a 2-byte hw break on an aligned thumb address, delete it and then try to insert the hw break on the unaligned thumb address.
>>>>>
>>>>> Confirmed, that also works for me.
>>>>>
>>>>>> This is because inserting a hw break on an
>>>>>> aligned thumb address sets the bp_len to ARM_BREAKPOINT_LEN_2, and we eventually reuse that slot during ptrace_hbp_set_addr, which, I think, is the bug we have in the kernel.
>>>>>>
>>>>>> We shouldn't be reusing that information. Instead we should use whatever the user passed as the control register value to the ptrace call.
>>>>>>
>>>>>
>>>>> IIUC, your hypothesis is that the kernel bug is that the check for address vs breakpoint length should only happen when writing the control register?
>>>>
>>>> I think so, because we need to validate the address against the length of the breakpoint that is being requested. And that data is part of the control register.
>>>>
>>>> The problem seems to arise from the fact we need to do two ptrace calls to set things up, and we're trying to validate both calls.
>>>>
>>>>>
>>>>>> For a potential workaround, I think we'll need to check for attempts at inserting a hw break at an unaligned thumb address (offset == 2). If so, then we do a dance of
>>>>>> inserting the hw break at the aligned version of that address (offset == 0), only to make sure the slot's bp_len is correctly set to ARM_BREAKPOINT_LEN_2, and then
>>>>>> proceed to insert the hw break on the original requested unaligned thumb address.
>>>>>>
>>>>>> Off the top of my head I can't think of potential issues with this approach. I don't think the kernel checks if we insert two hw break's at the same address, so that
>>>>>> corner case might not be an issue.
>>>>>>
>>>>>
>>>>> I've submitted a patch implementing that approach ( https://sourceware.org/pipermail/gdb-patches/2024-July/210681.html ), basically doing the following ptrace calls:
>>>>> ...
>>>>>            1. address_reg = bpts[i].address & ~0x7U
>>>>>            2. control_reg = bpts[i].control
>>>>>            3. address_reg = bpts[i].address
>>>>> ...
>>>>>
>>>>> [ Note that a fix for the kernel bug formulated above would mean that the address vs breakpoint length check in step 3 would stop happening, and we'd need to write the control register again in a step 4, to get that check back... ]
>>>>
>>>> That makes sense, as we need two ptrace calls for this operation
>>>>
>>>
>>> OK, I'll send a v2.
>>>
>>> I also think that it's probably a good idea to do the first control_reg write with the enabled bit switched off.
>>>
>>> That way we're not actually enabling a hw breakpoint on the wrong address.
>>
>> Makes sense to me.
>>
> 
> Well, it turns out that that doesn't work, so I've left that part as is.
> 
> V2 submitted here ( https://sourceware.org/pipermail/gdb-patches/2024-July/210709.html ).
> 
>> I'm in the process of booting a 32-bit kernel to check what is the situation on 32-bit arm.
>>
> 
> Great.  Unfortunately I don't have a setup where I can try that.

Ok. I've confirmed the problem also exists for 32-bit arm, so I also see the ptrace
error there.
  

Patch

diff --git a/gdb/testsuite/gdb.base/hbreak.exp b/gdb/testsuite/gdb.base/hbreak.exp
index 73a3e2afb67..b140257a23e 100644
--- a/gdb/testsuite/gdb.base/hbreak.exp
+++ b/gdb/testsuite/gdb.base/hbreak.exp
@@ -27,10 +27,38 @@  if ![runto_main] {
 
 set breakline [gdb_get_line_number "break-at-exit"]
 
-gdb_test "hbreak ${srcfile}:${breakline}" \
-	 "Hardware assisted breakpoint \[0-9\]+ at 0x\[0-9a-f\]+: file .*${srcfile}, line ${breakline}\\." \
-	 "hbreak"
+set re_loc "file \[^\r\n\]*$srcfile, line $breakline"
+set re_dot [string_to_regexp .]
 
-gdb_test "continue" \
-	 "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" \
-	 "continue to break-at-exit after hbreak"
+set addr 0x0
+gdb_test_multiple "hbreak ${srcfile}:${breakline}" "hbreak" {
+    -re -wrap "Hardware assisted breakpoint $decimal at ($hex): $re_loc$re_dot" {
+	set addr $expect_out(1,string)
+	pass $gdb_test_name
+    }
+}
+
+set have_xfail 0
+if { [istarget arm*-*-*] } {
+    # When running 32-bit userland on aarch64 kernel, thumb instructions that
+    # are not 4-byte aligned may not be supported for setting a hardware
+    # breakpoint on.
+    set have_xfail [expr ($addr & 0x2) == 2]
+}
+
+set re_xfail \
+    [string_to_regexp \
+	 "Unexpected error setting breakpoint: Invalid argument."]
+
+gdb_test_multiple "continue" "continue to break-at-exit after hbreak" {
+    -re -wrap "Continuing\\.\[ \r\n\]+Breakpoint \[0-9\]+, .*break-at-exit.*" {
+	pass $gdb_test_name
+    }
+    -re -wrap $re_xfail {
+	if { $have_xfail } {
+	    xfail $gdb_test_name
+	} else {
+	    fail $gdb_test_name
+	}
+    }
+}