[aarch64] Fix gdbserver crashes on SVE/SME-enabled systems

Message ID 20250228110513.2449558-1-luis.machado@arm.com
State New
Headers
Series [aarch64] Fix gdbserver crashes on SVE/SME-enabled systems |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed

Commit Message

Luis Machado Feb. 28, 2025, 11:05 a.m. UTC
  Commit 51e6b8cfd649013ae16a3d00f1451b2531ba6bc9 fixed a
regression for SVE/SME registers on gdb's side by using a <= comparison for
regcache's raw_compare assertion check. We seem to have failed to do the same
for gdbserver's raw_compare counterpart.

With the code as it is, I'm seeing a lot of crashes for gdbserver on a machine
with SVE enabled. For instance, with the following invocation:

make check-gdb RUNTESTFLAGS="--target_board=native-gdbserver" TESTS=gdb.base/break.exp

Running /work/builds/binutils-gdb/gdb/testsuite/../../../../repos/binutils-gdb/gdb/testsuite/gdb.base/break.exp ...
FAIL: gdb.base/break.exp: test_break: run until function breakpoint
FAIL: gdb.base/break.exp: test_break: run until breakpoint set at a line number (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: run until file:function(6) breakpoint (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: run until file:function(5) breakpoint (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: run until file:function(4) breakpoint (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: run until file:function(3) breakpoint (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: run until file:function(2) breakpoint (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: run until file:function(1) breakpoint (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: run until quoted breakpoint (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: run until file:linenum breakpoint (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: breakpoint offset +1
FAIL: gdb.base/break.exp: test_break: step onto breakpoint (the program is no longer running)
FAIL: gdb.base/break.exp: test_break: setting breakpoint at }
FAIL: gdb.base/break.exp: test_break: continue to breakpoint at } (the program is no longer running)
FAIL: gdb.base/break.exp: test_no_break_on_catchpoint: runto: run to main
FAIL: gdb.base/break.exp: test_break_nonexistent_line: runto: run to main
FAIL: gdb.base/break.exp: test_break_default: runto: run to main
FAIL: gdb.base/break.exp: test_break_silent_and_more: runto: run to main
FAIL: gdb.base/break.exp: test_break_line_convenience_var: runto: run to main
FAIL: gdb.base/break.exp: test_break_user_call: runto: run to main
FAIL: gdb.base/break.exp: test_finish_arguments: runto: run to main
FAIL: gdb.base/break.exp: test_next_with_recursion: kill program
FAIL: gdb.base/break.exp: test_next_with_recursion: run to factorial(6)
FAIL: gdb.base/break.exp: test_next_with_recursion: continue to factorial(5) (the program is no longer running)
FAIL: gdb.base/break.exp: test_next_with_recursion: backtrace from factorial(5)
FAIL: gdb.base/break.exp: test_next_with_recursion: next to recursive call (the program is no longer running)
FAIL: gdb.base/break.exp: test_next_with_recursion: next over recursive call (the program is no longer running)
FAIL: gdb.base/break.exp: test_next_with_recursion: backtrace from factorial(5.1)
FAIL: gdb.base/break.exp: test_next_with_recursion: continue until exit at recursive next test (the program is no longer running)
FAIL: gdb.base/break.exp: test_break_optimized_prologue: run until function breakpoint, optimized file
FAIL: gdb.base/break.exp: test_break_optimized_prologue: run until breakpoint set at small function, optimized file (the program is no longer running)
FAIL: gdb.base/break.exp: test_rbreak_shlib: rbreak junk

Adjusting the regcache raw_compare assertion check to use <= fixes
the problem on aarch64-linux on a SVE-capable system.

I plan to backport this fix to GDB 16 as well, if that's OK.
---
 gdbserver/regcache.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Simon Marchi March 3, 2025, 3:16 p.m. UTC | #1
On 2/28/25 6:05 AM, Luis Machado wrote:
> Commit 51e6b8cfd649013ae16a3d00f1451b2531ba6bc9 fixed a
> regression for SVE/SME registers on gdb's side by using a <= comparison for
> regcache's raw_compare assertion check. We seem to have failed to do the same
> for gdbserver's raw_compare counterpart.
> 
> With the code as it is, I'm seeing a lot of crashes for gdbserver on a machine
> with SVE enabled. For instance, with the following invocation:
> 
> make check-gdb RUNTESTFLAGS="--target_board=native-gdbserver" TESTS=gdb.base/break.exp
> 
> Running /work/builds/binutils-gdb/gdb/testsuite/../../../../repos/binutils-gdb/gdb/testsuite/gdb.base/break.exp ...
> FAIL: gdb.base/break.exp: test_break: run until function breakpoint
> FAIL: gdb.base/break.exp: test_break: run until breakpoint set at a line number (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: run until file:function(6) breakpoint (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: run until file:function(5) breakpoint (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: run until file:function(4) breakpoint (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: run until file:function(3) breakpoint (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: run until file:function(2) breakpoint (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: run until file:function(1) breakpoint (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: run until quoted breakpoint (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: run until file:linenum breakpoint (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: breakpoint offset +1
> FAIL: gdb.base/break.exp: test_break: step onto breakpoint (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break: setting breakpoint at }
> FAIL: gdb.base/break.exp: test_break: continue to breakpoint at } (the program is no longer running)
> FAIL: gdb.base/break.exp: test_no_break_on_catchpoint: runto: run to main
> FAIL: gdb.base/break.exp: test_break_nonexistent_line: runto: run to main
> FAIL: gdb.base/break.exp: test_break_default: runto: run to main
> FAIL: gdb.base/break.exp: test_break_silent_and_more: runto: run to main
> FAIL: gdb.base/break.exp: test_break_line_convenience_var: runto: run to main
> FAIL: gdb.base/break.exp: test_break_user_call: runto: run to main
> FAIL: gdb.base/break.exp: test_finish_arguments: runto: run to main
> FAIL: gdb.base/break.exp: test_next_with_recursion: kill program
> FAIL: gdb.base/break.exp: test_next_with_recursion: run to factorial(6)
> FAIL: gdb.base/break.exp: test_next_with_recursion: continue to factorial(5) (the program is no longer running)
> FAIL: gdb.base/break.exp: test_next_with_recursion: backtrace from factorial(5)
> FAIL: gdb.base/break.exp: test_next_with_recursion: next to recursive call (the program is no longer running)
> FAIL: gdb.base/break.exp: test_next_with_recursion: next over recursive call (the program is no longer running)
> FAIL: gdb.base/break.exp: test_next_with_recursion: backtrace from factorial(5.1)
> FAIL: gdb.base/break.exp: test_next_with_recursion: continue until exit at recursive next test (the program is no longer running)
> FAIL: gdb.base/break.exp: test_break_optimized_prologue: run until function breakpoint, optimized file
> FAIL: gdb.base/break.exp: test_break_optimized_prologue: run until breakpoint set at small function, optimized file (the program is no longer running)
> FAIL: gdb.base/break.exp: test_rbreak_shlib: rbreak junk
> 
> Adjusting the regcache raw_compare assertion check to use <= fixes
> the problem on aarch64-linux on a SVE-capable system.
> 
> I plan to backport this fix to GDB 16 as well, if that's OK.
> ---
>  gdbserver/regcache.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gdbserver/regcache.cc b/gdbserver/regcache.cc
> index ee0c1b363b0..c08c9ae06c1 100644
> --- a/gdbserver/regcache.cc
> +++ b/gdbserver/regcache.cc
> @@ -503,7 +503,7 @@ regcache::raw_compare (int regnum, const void *buf, int offset) const
>    gdb_assert (buf != NULL);
>  
>    gdb::array_view<const gdb_byte> regbuf = register_data (this, regnum);
> -  gdb_assert (offset < regbuf.size ());
> +  gdb_assert (offset <= regbuf.size ());
>    regbuf = regbuf.slice (offset);
>  
>    return memcmp (buf, regbuf.data (), regbuf.size ()) == 0;
> -- 
> 2.25.1
> 

In my commit message in 51e6b8cfd649013ae16a3d00f1451b2531ba6bc9, I seem
pretty convinced that comparing 0 bytes was needed for SVE, but I have
no idea why it was needed.  But if we did it for GDB, it makes sense to
do it for GDBserver.

I didn't look into it, but can you check if it would be easy to add a
GDBserver selftest to check this, like I added in GDB?  I don't want you
to spend an eternity on it, but if it's easy it would be nice.

Simon
  
Luis Machado March 3, 2025, 3:19 p.m. UTC | #2
Hi,

On 3/3/25 15:16, Simon Marchi wrote:
> On 2/28/25 6:05 AM, Luis Machado wrote:
>> Commit 51e6b8cfd649013ae16a3d00f1451b2531ba6bc9 fixed a
>> regression for SVE/SME registers on gdb's side by using a <= comparison for
>> regcache's raw_compare assertion check. We seem to have failed to do the same
>> for gdbserver's raw_compare counterpart.
>>
>> With the code as it is, I'm seeing a lot of crashes for gdbserver on a machine
>> with SVE enabled. For instance, with the following invocation:
>>
>> make check-gdb RUNTESTFLAGS="--target_board=native-gdbserver" TESTS=gdb.base/break.exp
>>
>> Running /work/builds/binutils-gdb/gdb/testsuite/../../../../repos/binutils-gdb/gdb/testsuite/gdb.base/break.exp ...
>> FAIL: gdb.base/break.exp: test_break: run until function breakpoint
>> FAIL: gdb.base/break.exp: test_break: run until breakpoint set at a line number (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: run until file:function(6) breakpoint (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: run until file:function(5) breakpoint (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: run until file:function(4) breakpoint (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: run until file:function(3) breakpoint (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: run until file:function(2) breakpoint (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: run until file:function(1) breakpoint (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: run until quoted breakpoint (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: run until file:linenum breakpoint (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: breakpoint offset +1
>> FAIL: gdb.base/break.exp: test_break: step onto breakpoint (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break: setting breakpoint at }
>> FAIL: gdb.base/break.exp: test_break: continue to breakpoint at } (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_no_break_on_catchpoint: runto: run to main
>> FAIL: gdb.base/break.exp: test_break_nonexistent_line: runto: run to main
>> FAIL: gdb.base/break.exp: test_break_default: runto: run to main
>> FAIL: gdb.base/break.exp: test_break_silent_and_more: runto: run to main
>> FAIL: gdb.base/break.exp: test_break_line_convenience_var: runto: run to main
>> FAIL: gdb.base/break.exp: test_break_user_call: runto: run to main
>> FAIL: gdb.base/break.exp: test_finish_arguments: runto: run to main
>> FAIL: gdb.base/break.exp: test_next_with_recursion: kill program
>> FAIL: gdb.base/break.exp: test_next_with_recursion: run to factorial(6)
>> FAIL: gdb.base/break.exp: test_next_with_recursion: continue to factorial(5) (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_next_with_recursion: backtrace from factorial(5)
>> FAIL: gdb.base/break.exp: test_next_with_recursion: next to recursive call (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_next_with_recursion: next over recursive call (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_next_with_recursion: backtrace from factorial(5.1)
>> FAIL: gdb.base/break.exp: test_next_with_recursion: continue until exit at recursive next test (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_break_optimized_prologue: run until function breakpoint, optimized file
>> FAIL: gdb.base/break.exp: test_break_optimized_prologue: run until breakpoint set at small function, optimized file (the program is no longer running)
>> FAIL: gdb.base/break.exp: test_rbreak_shlib: rbreak junk
>>
>> Adjusting the regcache raw_compare assertion check to use <= fixes
>> the problem on aarch64-linux on a SVE-capable system.
>>
>> I plan to backport this fix to GDB 16 as well, if that's OK.
>> ---
>>  gdbserver/regcache.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gdbserver/regcache.cc b/gdbserver/regcache.cc
>> index ee0c1b363b0..c08c9ae06c1 100644
>> --- a/gdbserver/regcache.cc
>> +++ b/gdbserver/regcache.cc
>> @@ -503,7 +503,7 @@ regcache::raw_compare (int regnum, const void *buf, int offset) const
>>    gdb_assert (buf != NULL);
>>  
>>    gdb::array_view<const gdb_byte> regbuf = register_data (this, regnum);
>> -  gdb_assert (offset < regbuf.size ());
>> +  gdb_assert (offset <= regbuf.size ());
>>    regbuf = regbuf.slice (offset);
>>  
>>    return memcmp (buf, regbuf.data (), regbuf.size ()) == 0;
>> -- 
>> 2.25.1
>>
> 
> In my commit message in 51e6b8cfd649013ae16a3d00f1451b2531ba6bc9, I seem
> pretty convinced that comparing 0 bytes was needed for SVE, but I have
> no idea why it was needed.  But if we did it for GDB, it makes sense to
> do it for GDBserver.
> 
> I didn't look into it, but can you check if it would be easy to add a
> GDBserver selftest to check this, like I added in GDB?  I don't want you
> to spend an eternity on it, but if it's easy it would be nice.
> 
> Simon

Sure, I can do that. I tried running the selftest hoping it would transfer
things over RSP and, as a consequence, test gdbserver. But that did not work
obviously. Let me come up with something.

Luis
  
Simon Marchi March 3, 2025, 3:36 p.m. UTC | #3
On 3/3/25 10:19 AM, Luis Machado wrote:
> Sure, I can do that. I tried running the selftest hoping it would transfer
> things over RSP and, as a consequence, test gdbserver. But that did not work
> obviously. Let me come up with something.

No, the selftests are really just unit tests instantiating objects and
poking at them.  The difficulty is when some object has a dependency.
Sometimes, it is easy to mock the dependency, sometimes not.

Simon
  
Luis Machado March 5, 2025, 3:56 p.m. UTC | #4
Hi,

On 3/3/25 15:19, Luis Machado wrote:
> Hi,
> 
> On 3/3/25 15:16, Simon Marchi wrote:
>> On 2/28/25 6:05 AM, Luis Machado wrote:
>>> Commit 51e6b8cfd649013ae16a3d00f1451b2531ba6bc9 fixed a
>>> regression for SVE/SME registers on gdb's side by using a <= comparison for
>>> regcache's raw_compare assertion check. We seem to have failed to do the same
>>> for gdbserver's raw_compare counterpart.
>>>
>>> With the code as it is, I'm seeing a lot of crashes for gdbserver on a machine
>>> with SVE enabled. For instance, with the following invocation:
>>>
>>> make check-gdb RUNTESTFLAGS="--target_board=native-gdbserver" TESTS=gdb.base/break.exp
>>>
>>> Running /work/builds/binutils-gdb/gdb/testsuite/../../../../repos/binutils-gdb/gdb/testsuite/gdb.base/break.exp ...
>>> FAIL: gdb.base/break.exp: test_break: run until function breakpoint
>>> FAIL: gdb.base/break.exp: test_break: run until breakpoint set at a line number (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: run until file:function(6) breakpoint (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: run until file:function(5) breakpoint (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: run until file:function(4) breakpoint (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: run until file:function(3) breakpoint (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: run until file:function(2) breakpoint (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: run until file:function(1) breakpoint (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: run until quoted breakpoint (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: run until file:linenum breakpoint (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: breakpoint offset +1
>>> FAIL: gdb.base/break.exp: test_break: step onto breakpoint (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break: setting breakpoint at }
>>> FAIL: gdb.base/break.exp: test_break: continue to breakpoint at } (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_no_break_on_catchpoint: runto: run to main
>>> FAIL: gdb.base/break.exp: test_break_nonexistent_line: runto: run to main
>>> FAIL: gdb.base/break.exp: test_break_default: runto: run to main
>>> FAIL: gdb.base/break.exp: test_break_silent_and_more: runto: run to main
>>> FAIL: gdb.base/break.exp: test_break_line_convenience_var: runto: run to main
>>> FAIL: gdb.base/break.exp: test_break_user_call: runto: run to main
>>> FAIL: gdb.base/break.exp: test_finish_arguments: runto: run to main
>>> FAIL: gdb.base/break.exp: test_next_with_recursion: kill program
>>> FAIL: gdb.base/break.exp: test_next_with_recursion: run to factorial(6)
>>> FAIL: gdb.base/break.exp: test_next_with_recursion: continue to factorial(5) (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_next_with_recursion: backtrace from factorial(5)
>>> FAIL: gdb.base/break.exp: test_next_with_recursion: next to recursive call (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_next_with_recursion: next over recursive call (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_next_with_recursion: backtrace from factorial(5.1)
>>> FAIL: gdb.base/break.exp: test_next_with_recursion: continue until exit at recursive next test (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_break_optimized_prologue: run until function breakpoint, optimized file
>>> FAIL: gdb.base/break.exp: test_break_optimized_prologue: run until breakpoint set at small function, optimized file (the program is no longer running)
>>> FAIL: gdb.base/break.exp: test_rbreak_shlib: rbreak junk
>>>
>>> Adjusting the regcache raw_compare assertion check to use <= fixes
>>> the problem on aarch64-linux on a SVE-capable system.
>>>
>>> I plan to backport this fix to GDB 16 as well, if that's OK.
>>> ---
>>>  gdbserver/regcache.cc | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gdbserver/regcache.cc b/gdbserver/regcache.cc
>>> index ee0c1b363b0..c08c9ae06c1 100644
>>> --- a/gdbserver/regcache.cc
>>> +++ b/gdbserver/regcache.cc
>>> @@ -503,7 +503,7 @@ regcache::raw_compare (int regnum, const void *buf, int offset) const
>>>    gdb_assert (buf != NULL);
>>>  
>>>    gdb::array_view<const gdb_byte> regbuf = register_data (this, regnum);
>>> -  gdb_assert (offset < regbuf.size ());
>>> +  gdb_assert (offset <= regbuf.size ());
>>>    regbuf = regbuf.slice (offset);
>>>  
>>>    return memcmp (buf, regbuf.data (), regbuf.size ()) == 0;
>>> -- 
>>> 2.25.1
>>>
>>
>> In my commit message in 51e6b8cfd649013ae16a3d00f1451b2531ba6bc9, I seem
>> pretty convinced that comparing 0 bytes was needed for SVE, but I have
>> no idea why it was needed.  But if we did it for GDB, it makes sense to
>> do it for GDBserver.
>>
>> I didn't look into it, but can you check if it would be easy to add a
>> GDBserver selftest to check this, like I added in GDB?  I don't want you
>> to spend an eternity on it, but if it's easy it would be nice.
>>
>> Simon
> 
> Sure, I can do that. I tried running the selftest hoping it would transfer
> things over RSP and, as a consequence, test gdbserver. But that did not work
> obviously. Let me come up with something.
> 
> Luis

Before venturing into creating a more substantial check for this, I took a step back to
understand what was going on.

I think there is an oversight in the code that handles SVE comparisons. It is a corner case
and only happens when we have a vector length of 128-bit, meaning the Z registers and the V
registers have the same size. So it is a complete overlap.

That isn't handled correctly in the code and we pass that down to raw_compare, which has to deal
with an empty comparison. It shouldn't, in my opinion.

What we should do instead is to check if the Z and V registers have the same size and skip this
comparison. Otherwise, if their sizes differ, we should go ahead and compare the contents of the
part that doesn't overlap with the V register.

If we fix the AArch64 code, there isn't a need to consider the 0-byte comparison case in raw_compare,
which means we can use a < comparison as opposed to <= for the offset.

I'll run a few more checks and will send an updated patch.
  
Simon Marchi March 5, 2025, 4:15 p.m. UTC | #5
On 3/5/25 10:56 AM, Luis Machado wrote:
> Before venturing into creating a more substantial check for this, I took a step back to
> understand what was going on.
> 
> I think there is an oversight in the code that handles SVE comparisons. It is a corner case
> and only happens when we have a vector length of 128-bit, meaning the Z registers and the V
> registers have the same size. So it is a complete overlap.
> 
> That isn't handled correctly in the code and we pass that down to raw_compare, which has to deal
> with an empty comparison. It shouldn't, in my opinion.
> 
> What we should do instead is to check if the Z and V registers have the same size and skip this
> comparison. Otherwise, if their sizes differ, we should go ahead and compare the contents of the
> part that doesn't overlap with the V register.
> 
> If we fix the AArch64 code, there isn't a need to consider the 0-byte comparison case in raw_compare,
> which means we can use a < comparison as opposed to <= for the offset.

Thanks for digging into this.

I think that both approaches have their merits.  If you allow a 0-byte
comparison, it can make caller code simpler.  For example, in the SVE
case, you can avoid that "are registers the same size" check.  On the
other hand, disallowing 0-byte comparison could find some bugs where we
messed up our offsets and we don't really want to compare 0 bytes.

I don't have a strong preference, as long as it's consistent between GDB
and GDBserver and well documented.

Simon
  
Luis Machado March 5, 2025, 4:59 p.m. UTC | #6
On 3/5/25 16:15, Simon Marchi wrote:
> On 3/5/25 10:56 AM, Luis Machado wrote:
>> Before venturing into creating a more substantial check for this, I took a step back to
>> understand what was going on.
>>
>> I think there is an oversight in the code that handles SVE comparisons. It is a corner case
>> and only happens when we have a vector length of 128-bit, meaning the Z registers and the V
>> registers have the same size. So it is a complete overlap.
>>
>> That isn't handled correctly in the code and we pass that down to raw_compare, which has to deal
>> with an empty comparison. It shouldn't, in my opinion.
>>
>> What we should do instead is to check if the Z and V registers have the same size and skip this
>> comparison. Otherwise, if their sizes differ, we should go ahead and compare the contents of the
>> part that doesn't overlap with the V register.
>>
>> If we fix the AArch64 code, there isn't a need to consider the 0-byte comparison case in raw_compare,
>> which means we can use a < comparison as opposed to <= for the offset.
> 
> Thanks for digging into this.
> 
> I think that both approaches have their merits.  If you allow a 0-byte
> comparison, it can make caller code simpler.  For example, in the SVE
> case, you can avoid that "are registers the same size" check.  On the
> other hand, disallowing 0-byte comparison could find some bugs where we
> messed up our offsets and we don't really want to compare 0 bytes.
> 
> I don't have a strong preference, as long as it's consistent between GDB
> and GDBserver and well documented.

Alright. That's a reasonable assessment. The standard says memcmp is supposed to return 0 for a 0-length
comparison. So that's one more in favor of keeping that behavior.

Let me think about this a bit more.
  
Joel Brobecker March 22, 2025, 12:43 p.m. UTC | #7
Hi Luis,

On Wed, Mar 05, 2025 at 04:59:28PM +0000, Luis Machado wrote:
> On 3/5/25 16:15, Simon Marchi wrote:
> > On 3/5/25 10:56 AM, Luis Machado wrote:
> >> Before venturing into creating a more substantial check for this, I took a step back to
> >> understand what was going on.
> >>
> >> I think there is an oversight in the code that handles SVE comparisons. It is a corner case
> >> and only happens when we have a vector length of 128-bit, meaning the Z registers and the V
> >> registers have the same size. So it is a complete overlap.
> >>
> >> That isn't handled correctly in the code and we pass that down to raw_compare, which has to deal
> >> with an empty comparison. It shouldn't, in my opinion.
> >>
> >> What we should do instead is to check if the Z and V registers have the same size and skip this
> >> comparison. Otherwise, if their sizes differ, we should go ahead and compare the contents of the
> >> part that doesn't overlap with the V register.
> >>
> >> If we fix the AArch64 code, there isn't a need to consider the 0-byte comparison case in raw_compare,
> >> which means we can use a < comparison as opposed to <= for the offset.
> > 
> > Thanks for digging into this.
> > 
> > I think that both approaches have their merits.  If you allow a 0-byte
> > comparison, it can make caller code simpler.  For example, in the SVE
> > case, you can avoid that "are registers the same size" check.  On the
> > other hand, disallowing 0-byte comparison could find some bugs where we
> > messed up our offsets and we don't really want to compare 0 bytes.
> > 
> > I don't have a strong preference, as long as it's consistent between GDB
> > and GDBserver and well documented.
> 
> Alright. That's a reasonable assessment. The standard says memcmp is supposed to return 0 for a 0-length
> comparison. So that's one more in favor of keeping that behavior.
> 
> Let me think about this a bit more.

Just checking in on this patch, as this one is in the list of potential
fixes for the 16.3 release. Have you made a decision?

Thank you!
  
Luis Machado March 24, 2025, 8:18 a.m. UTC | #8
On 3/22/25 12:43, Joel Brobecker wrote:
> Hi Luis,
> 
> On Wed, Mar 05, 2025 at 04:59:28PM +0000, Luis Machado wrote:
>> On 3/5/25 16:15, Simon Marchi wrote:
>>> On 3/5/25 10:56 AM, Luis Machado wrote:
>>>> Before venturing into creating a more substantial check for this, I took a step back to
>>>> understand what was going on.
>>>>
>>>> I think there is an oversight in the code that handles SVE comparisons. It is a corner case
>>>> and only happens when we have a vector length of 128-bit, meaning the Z registers and the V
>>>> registers have the same size. So it is a complete overlap.
>>>>
>>>> That isn't handled correctly in the code and we pass that down to raw_compare, which has to deal
>>>> with an empty comparison. It shouldn't, in my opinion.
>>>>
>>>> What we should do instead is to check if the Z and V registers have the same size and skip this
>>>> comparison. Otherwise, if their sizes differ, we should go ahead and compare the contents of the
>>>> part that doesn't overlap with the V register.
>>>>
>>>> If we fix the AArch64 code, there isn't a need to consider the 0-byte comparison case in raw_compare,
>>>> which means we can use a < comparison as opposed to <= for the offset.
>>>
>>> Thanks for digging into this.
>>>
>>> I think that both approaches have their merits.  If you allow a 0-byte
>>> comparison, it can make caller code simpler.  For example, in the SVE
>>> case, you can avoid that "are registers the same size" check.  On the
>>> other hand, disallowing 0-byte comparison could find some bugs where we
>>> messed up our offsets and we don't really want to compare 0 bytes.
>>>
>>> I don't have a strong preference, as long as it's consistent between GDB
>>> and GDBserver and well documented.
>>
>> Alright. That's a reasonable assessment. The standard says memcmp is supposed to return 0 for a 0-length
>> comparison. So that's one more in favor of keeping that behavior.
>>
>> Let me think about this a bit more.
> 
> Just checking in on this patch, as this one is in the list of potential
> fixes for the 16.3 release. Have you made a decision?
> 
> Thank you!

Yes, I want to try to have a gdbserver-side testcase for this, which I'm still working on.

I'm planning on having something for next week if that's fine for the release schedule.
  
Joel Brobecker March 25, 2025, 5:46 a.m. UTC | #9
> Yes, I want to try to have a gdbserver-side testcase for this, which
> I'm still working on.
> 
> I'm planning on having something for next week if that's fine for the
> release schedule.

Thanks for the update, Luis. I confirm this fits with the release
schedule (which I try to keep flexible anyway, owing to the voluntary
nature of our project).
  

Patch

diff --git a/gdbserver/regcache.cc b/gdbserver/regcache.cc
index ee0c1b363b0..c08c9ae06c1 100644
--- a/gdbserver/regcache.cc
+++ b/gdbserver/regcache.cc
@@ -503,7 +503,7 @@  regcache::raw_compare (int regnum, const void *buf, int offset) const
   gdb_assert (buf != NULL);
 
   gdb::array_view<const gdb_byte> regbuf = register_data (this, regnum);
-  gdb_assert (offset < regbuf.size ());
+  gdb_assert (offset <= regbuf.size ());
   regbuf = regbuf.slice (offset);
 
   return memcmp (buf, regbuf.data (), regbuf.size ()) == 0;