[v7,15/18,gdb/generic] corefile/bug: Add hook to control the use of target description notes from corefiles

Message ID 20230918212651.660141-16-luis.machado@arm.com
State New
Headers
Series SME support for AArch64 gdb/gdbserver on Linux |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 pending Test started
linaro-tcwg-bot/tcwg_gdb_check--master-arm pending Test started
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 pending Test started
linaro-tcwg-bot/tcwg_gdb_build--master-arm pending Test started

Commit Message

Luis Machado Sept. 18, 2023, 9:26 p.m. UTC
  New entry in v7.

---

Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
can potentially have distinct target descriptions/gdbarches.

When loading a gcore-generated core file, at the moment GDB gives priority
to the target description dumped to NT_GDB_TDESC.  Though technically correct
for most targets, it doesn't work correctly for AArch64 with SVE or SME
support.

The correct approach for AArch64/Linux is to either have per-thread target
description notes in the corefiles or to rely on the
gdbarch_core_read_description hook, so it can figure out the proper target
description for a given thread based on the various available register notes.

The former, although more correct, doesn't address the case of existing gdb's
that only output a single target description note.

This patch goes for the latter, and adds a new gdbarch hook to conditionalize
the use of the corefile target description note. The hook is called
use_target_description_from_corefile_notes.

The hook defaults to returning true, meaning targets will use the corefile
target description note.  AArch64 Linux overrides the hook to return false
when it detects any of the SVE or SME register notes in the corefile.

Otherwise it should be fine for AArch64 Linux to use the corefile target
description note.

When we support per-thread target description notes, then we can augment
the AArch64 Linux hook to rely on those notes.

Regression-tested on aarch64-linux Ubuntu 22.04/20.04.
---
 gdb/aarch64-linux-tdep.c  | 33 ++++++++++++++++++++++++++
 gdb/arch-utils.c          | 10 ++++++++
 gdb/arch-utils.h          |  6 +++++
 gdb/corelow.c             | 50 ++++++++++++++++++++++++---------------
 gdb/gdbarch-gen.h         | 14 +++++++++++
 gdb/gdbarch.c             | 22 +++++++++++++++++
 gdb/gdbarch_components.py | 19 +++++++++++++++
 7 files changed, 135 insertions(+), 19 deletions(-)
  

Comments

Simon Marchi Sept. 19, 2023, 8:49 p.m. UTC | #1
On 9/18/23 17:26, Luis Machado wrote:
> New entry in v7.
> 
> ---

Note: when you use `---` in a commit message, git-am drops whatever
comes after that, so here we lose the commit message when applying.  I
think the intention is that you'd put whatever you don't want to appear
in the commit message (like the "new in version X" notes) after the
`---`.  I'll try to do that from now on.

> 
> Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
> can potentially have distinct target descriptions/gdbarches.
> 
> When loading a gcore-generated core file, at the moment GDB gives priority
> to the target description dumped to NT_GDB_TDESC.  Though technically correct
> for most targets, it doesn't work correctly for AArch64 with SVE or SME
> support.
> 
> The correct approach for AArch64/Linux is to either have per-thread target
> description notes in the corefiles or to rely on the
> gdbarch_core_read_description hook, so it can figure out the proper target
> description for a given thread based on the various available register notes.
> 
> The former, although more correct, doesn't address the case of existing gdb's
> that only output a single target description note.
> 
> This patch goes for the latter, and adds a new gdbarch hook to conditionalize
> the use of the corefile target description note. The hook is called
> use_target_description_from_corefile_notes.
> 
> The hook defaults to returning true, meaning targets will use the corefile
> target description note.  AArch64 Linux overrides the hook to return false
> when it detects any of the SVE or SME register notes in the corefile.
> 
> Otherwise it should be fine for AArch64 Linux to use the corefile target
> description note.
> 
> When we support per-thread target description notes, then we can augment
> the AArch64 Linux hook to rely on those notes.
> 
> Regression-tested on aarch64-linux Ubuntu 22.04/20.04.

The question that came to mind when reading this is (I already asked you
on IRC, but perhaps someone else has an idea): if GDB has to know how to
infer target descriptions from the core file contents (the
kernel-generated core files don't include target description notes, only
GDB-generated core files do), what advantage do we have of storing and
reading the target description in the core?  Are there any scenarios
where the target description note conveys something that GDB wouldn't
figure out by analyzing the core file content?

> diff --git a/gdb/gdbarch_components.py b/gdb/gdbarch_components.py
> index 846467b8d83..bbb9b188286 100644
> --- a/gdb/gdbarch_components.py
> +++ b/gdb/gdbarch_components.py
> @@ -2732,3 +2732,22 @@ Read core file mappings
>      predefault="default_read_core_file_mappings",
>      invalid=False,
>  )
> +
> +Method(
> +    comment="""
> +Return true if the target description for all threads should be read from the
> +target description core file note(s).  Return false if the target description
> +for all threads should be inferred from the core file contents/sections.
> +
> +The corefile's bfd is passed through OBFD.
> +
> +This hook should be used by targets that can have distinct target descriptions
> +for each thread when the core file only holds a single target description
> +note.

I think the last paragraph should be left out.  An arch could decide not
to use the target description note for whatever reason it sees fit, this
is just one example.  The reason for AArch64 appears to be sufficiently
described in the AArch64-specific files.

> +""",
> +    type="bool",
> +    name="use_target_description_from_corefile_notes",
> +    params=[("struct bfd *", "obfd")],

Can you name the parameter core_bfd maybe?  Just a bit more precise.

Otherwise:

Approved-By: Simon Marchi <simon.marchi@efficios.com>

Simon
  
Luis Machado Sept. 20, 2023, 5:49 a.m. UTC | #2
On 9/19/23 21:49, Simon Marchi wrote:
> On 9/18/23 17:26, Luis Machado wrote:
>> New entry in v7.
>>
>> ---
> 
> Note: when you use `---` in a commit message, git-am drops whatever
> comes after that, so here we lose the commit message when applying.  I
> think the intention is that you'd put whatever you don't want to appear
> in the commit message (like the "new in version X" notes) after the
> `---`.  I'll try to do that from now on.
> 

Yeah, that's a bad marker indeed. Is that configuration-specific though?

>>
>> Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
>> can potentially have distinct target descriptions/gdbarches.
>>
>> When loading a gcore-generated core file, at the moment GDB gives priority
>> to the target description dumped to NT_GDB_TDESC.  Though technically correct
>> for most targets, it doesn't work correctly for AArch64 with SVE or SME
>> support.
>>
>> The correct approach for AArch64/Linux is to either have per-thread target
>> description notes in the corefiles or to rely on the
>> gdbarch_core_read_description hook, so it can figure out the proper target
>> description for a given thread based on the various available register notes.
>>
>> The former, although more correct, doesn't address the case of existing gdb's
>> that only output a single target description note.
>>
>> This patch goes for the latter, and adds a new gdbarch hook to conditionalize
>> the use of the corefile target description note. The hook is called
>> use_target_description_from_corefile_notes.
>>
>> The hook defaults to returning true, meaning targets will use the corefile
>> target description note.  AArch64 Linux overrides the hook to return false
>> when it detects any of the SVE or SME register notes in the corefile.
>>
>> Otherwise it should be fine for AArch64 Linux to use the corefile target
>> description note.
>>
>> When we support per-thread target description notes, then we can augment
>> the AArch64 Linux hook to rely on those notes.
>>
>> Regression-tested on aarch64-linux Ubuntu 22.04/20.04.
> 
> The question that came to mind when reading this is (I already asked you
> on IRC, but perhaps someone else has an idea): if GDB has to know how to
> infer target descriptions from the core file contents (the
> kernel-generated core files don't include target description notes, only
> GDB-generated core files do), what advantage do we have of storing and
> reading the target description in the core?  Are there any scenarios
> where the target description note conveys something that GDB wouldn't
> figure out by analyzing the core file content?
> 

According to the original commit that added NT_GDB_TDESC, there are potential use case around
dumping a core file from a remote session. That remote target may potentially have more registers
than what gdb might be able to infer.

>> diff --git a/gdb/gdbarch_components.py b/gdb/gdbarch_components.py
>> index 846467b8d83..bbb9b188286 100644
>> --- a/gdb/gdbarch_components.py
>> +++ b/gdb/gdbarch_components.py
>> @@ -2732,3 +2732,22 @@ Read core file mappings
>>      predefault="default_read_core_file_mappings",
>>      invalid=False,
>>  )
>> +
>> +Method(
>> +    comment="""
>> +Return true if the target description for all threads should be read from the
>> +target description core file note(s).  Return false if the target description
>> +for all threads should be inferred from the core file contents/sections.
>> +
>> +The corefile's bfd is passed through OBFD.
>> +
>> +This hook should be used by targets that can have distinct target descriptions
>> +for each thread when the core file only holds a single target description
>> +note.
> 
> I think the last paragraph should be left out.  An arch could decide not
> to use the target description note for whatever reason it sees fit, this
> is just one example.  The reason for AArch64 appears to be sufficiently
> described in the AArch64-specific files.
> 

Will do, thanks.

>> +""",
>> +    type="bool",
>> +    name="use_target_description_from_corefile_notes",
>> +    params=[("struct bfd *", "obfd")],
> 
> Can you name the parameter core_bfd maybe?  Just a bit more precise.
> 

That was my first try, but unfortunately core_bfd seems to be a macro, and it doesn't
play well in this context. Maybe a longer name that dodges the core_bfd macro?

/* Binary File Diddler for the core file.  */

#define core_bfd (current_program_space->cbfd.get ())

> Otherwise:
> 
> Approved-By: Simon Marchi <simon.marchi@efficios.com>
> 
> Simon
  
Luis Machado Sept. 20, 2023, 2:01 p.m. UTC | #3
On 9/20/23 06:49, Luis Machado wrote:
> On 9/19/23 21:49, Simon Marchi wrote:
>> On 9/18/23 17:26, Luis Machado wrote:
>>> New entry in v7.
>>>
>>> ---
>>
>> Note: when you use `---` in a commit message, git-am drops whatever
>> comes after that, so here we lose the commit message when applying.  I
>> think the intention is that you'd put whatever you don't want to appear
>> in the commit message (like the "new in version X" notes) after the
>> `---`.  I'll try to do that from now on.
>>
> 
> Yeah, that's a bad marker indeed. Is that configuration-specific though?
> 

Since I started playing with b4 only recently, I think I didn't run into a case of
trying to apply a series containing this marker. I get the same, and what you mentioned
makes sense.

It is pretty handy, when trying to push a series, for git to take care of removing
the update entries. Might be useful to do it as part of the gdb development workflow.

>>>
>>> Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
>>> can potentially have distinct target descriptions/gdbarches.
>>>
>>> When loading a gcore-generated core file, at the moment GDB gives priority
>>> to the target description dumped to NT_GDB_TDESC.  Though technically correct
>>> for most targets, it doesn't work correctly for AArch64 with SVE or SME
>>> support.
>>>
>>> The correct approach for AArch64/Linux is to either have per-thread target
>>> description notes in the corefiles or to rely on the
>>> gdbarch_core_read_description hook, so it can figure out the proper target
>>> description for a given thread based on the various available register notes.
>>>
>>> The former, although more correct, doesn't address the case of existing gdb's
>>> that only output a single target description note.
>>>
>>> This patch goes for the latter, and adds a new gdbarch hook to conditionalize
>>> the use of the corefile target description note. The hook is called
>>> use_target_description_from_corefile_notes.
>>>
>>> The hook defaults to returning true, meaning targets will use the corefile
>>> target description note.  AArch64 Linux overrides the hook to return false
>>> when it detects any of the SVE or SME register notes in the corefile.
>>>
>>> Otherwise it should be fine for AArch64 Linux to use the corefile target
>>> description note.
>>>
>>> When we support per-thread target description notes, then we can augment
>>> the AArch64 Linux hook to rely on those notes.
>>>
>>> Regression-tested on aarch64-linux Ubuntu 22.04/20.04.
>>
>> The question that came to mind when reading this is (I already asked you
>> on IRC, but perhaps someone else has an idea): if GDB has to know how to
>> infer target descriptions from the core file contents (the
>> kernel-generated core files don't include target description notes, only
>> GDB-generated core files do), what advantage do we have of storing and
>> reading the target description in the core?  Are there any scenarios
>> where the target description note conveys something that GDB wouldn't
>> figure out by analyzing the core file content?
>>
> 
> According to the original commit that added NT_GDB_TDESC, there are potential use case around
> dumping a core file from a remote session. That remote target may potentially have more registers
> than what gdb might be able to infer.
> 
>>> diff --git a/gdb/gdbarch_components.py b/gdb/gdbarch_components.py
>>> index 846467b8d83..bbb9b188286 100644
>>> --- a/gdb/gdbarch_components.py
>>> +++ b/gdb/gdbarch_components.py
>>> @@ -2732,3 +2732,22 @@ Read core file mappings
>>>      predefault="default_read_core_file_mappings",
>>>      invalid=False,
>>>  )
>>> +
>>> +Method(
>>> +    comment="""
>>> +Return true if the target description for all threads should be read from the
>>> +target description core file note(s).  Return false if the target description
>>> +for all threads should be inferred from the core file contents/sections.
>>> +
>>> +The corefile's bfd is passed through OBFD.
>>> +
>>> +This hook should be used by targets that can have distinct target descriptions
>>> +for each thread when the core file only holds a single target description
>>> +note.
>>
>> I think the last paragraph should be left out.  An arch could decide not
>> to use the target description note for whatever reason it sees fit, this
>> is just one example.  The reason for AArch64 appears to be sufficiently
>> described in the AArch64-specific files.
>>
> 
> Will do, thanks.
> 
>>> +""",
>>> +    type="bool",
>>> +    name="use_target_description_from_corefile_notes",
>>> +    params=[("struct bfd *", "obfd")],
>>
>> Can you name the parameter core_bfd maybe?  Just a bit more precise.
>>
> 
> That was my first try, but unfortunately core_bfd seems to be a macro, and it doesn't
> play well in this context. Maybe a longer name that dodges the core_bfd macro?
> 
> /* Binary File Diddler for the core file.  */
> 
> #define core_bfd (current_program_space->cbfd.get ())
> 

FTR, I went with corefile_bfd to dodge the macro name.

>> Otherwise:
>>
>> Approved-By: Simon Marchi <simon.marchi@efficios.com>
>>
>> Simon
> 

I plan to push both these series (sme1/sme2) towards the end of the week.

Thanks,
Luis
  
Andrew Burgess Sept. 20, 2023, 2:22 p.m. UTC | #4
Luis Machado via Gdb-patches <gdb-patches@sourceware.org> writes:

> New entry in v7.
>
> ---
>
> Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
> can potentially have distinct target descriptions/gdbarches.
>
> When loading a gcore-generated core file, at the moment GDB gives priority
> to the target description dumped to NT_GDB_TDESC.  Though technically correct
> for most targets, it doesn't work correctly for AArch64 with SVE or SME
> support.
>
> The correct approach for AArch64/Linux is to either have per-thread target
> description notes in the corefiles or to rely on the
> gdbarch_core_read_description hook, so it can figure out the proper target
> description for a given thread based on the various available register notes.
>
> The former, although more correct, doesn't address the case of existing gdb's
> that only output a single target description note.

Do those existing GDB's support per-thread target descriptions though?
I thought this series was the where the per-thread target description
feature was being added ... so shouldn't core files generated by
previous GDB's only support a single target description?  Or am I
missing something.

>
> This patch goes for the latter, and adds a new gdbarch hook to conditionalize
> the use of the corefile target description note. The hook is called
> use_target_description_from_corefile_notes.
>
> The hook defaults to returning true, meaning targets will use the corefile
> target description note.  AArch64 Linux overrides the hook to return false
> when it detects any of the SVE or SME register notes in the corefile.
>
> Otherwise it should be fine for AArch64 Linux to use the corefile target
> description note.
>
> When we support per-thread target description notes, then we can augment
> the AArch64 Linux hook to rely on those notes.

I guess I was a little surprised that I couldn't see anywhere in this
series that you _stop_ adding the NT_GDB_TDESC note to core files
generated from within GDB.

I guess I was expecting to see some logic in gcore_elf_make_tdesc_note
that tried to figure out if we had per-thread tdesc, and if so, at a
minimum, didn't add the NT_GDB_TDESC.

If we did that, then, I'm thinking:

  - Previous GDB's only supported a single tdesc, and so are correct,

  - New GDB's support per-thread tdesc, but don't emit NT_GDB_TDESC, so
    we don't need to guard against loading them

Maybe I'm missing something though.

Thanks,
Andrew


>
> Regression-tested on aarch64-linux Ubuntu 22.04/20.04.
> ---
>  gdb/aarch64-linux-tdep.c  | 33 ++++++++++++++++++++++++++
>  gdb/arch-utils.c          | 10 ++++++++
>  gdb/arch-utils.h          |  6 +++++
>  gdb/corelow.c             | 50 ++++++++++++++++++++++++---------------
>  gdb/gdbarch-gen.h         | 14 +++++++++++
>  gdb/gdbarch.c             | 22 +++++++++++++++++
>  gdb/gdbarch_components.py | 19 +++++++++++++++
>  7 files changed, 135 insertions(+), 19 deletions(-)
>
> diff --git a/gdb/aarch64-linux-tdep.c b/gdb/aarch64-linux-tdep.c
> index 47e5e1db641..21ac7ebdc56 100644
> --- a/gdb/aarch64-linux-tdep.c
> +++ b/gdb/aarch64-linux-tdep.c
> @@ -2199,6 +2199,33 @@ aarch64_linux_decode_memtag_section (struct gdbarch *gdbarch,
>    return tags;
>  }
>  
> +/* AArch64 Linux implementation of the
> +   gdbarch_use_target_description_from_corefile_notes hook.  */
> +
> +static bool
> +aarch64_use_target_description_from_corefile_notes (gdbarch *gdbarch,
> +						    bfd *obfd)
> +{
> +  /* Sanity check.  */
> +  gdb_assert (obfd != nullptr);
> +
> +  /* If the corefile contains any SVE or SME register data, we don't want to
> +     use the target description note, as it may be incorrect.
> +
> +     Currently the target description note contains a potentially incorrect
> +     target description if the originating program changed the SVE or SME
> +     vector lengths mid-execution.
> +
> +     Once we support per-thread target description notes in the corefiles, we
> +     can always trust those notes whenever they are available.  */
> +  if (bfd_get_section_by_name (obfd, ".reg-aarch-sve") != nullptr
> +      || bfd_get_section_by_name (obfd, ".reg-aarch-za") != nullptr
> +      || bfd_get_section_by_name (obfd, ".reg-aarch-zt") != nullptr)
> +    return false;
> +
> +  return true;
> +}
> +
>  static void
>  aarch64_linux_init_abi (struct gdbarch_info info, struct gdbarch *gdbarch)
>  {
> @@ -2469,6 +2496,12 @@ aarch64_linux_init_abi (struct gdbarch_info info, struct gdbarch *gdbarch)
>  					    aarch64_displaced_step_hw_singlestep);
>  
>    set_gdbarch_gcc_target_options (gdbarch, aarch64_linux_gcc_target_options);
> +
> +  /* Hook to decide if the target description should be obtained from
> +     corefile target description note(s) or inferred from the corefile
> +     sections.  */
> +  set_gdbarch_use_target_description_from_corefile_notes (gdbarch,
> +			    aarch64_use_target_description_from_corefile_notes);
>  }
>  
>  #if GDB_SELF_TEST
> diff --git a/gdb/arch-utils.c b/gdb/arch-utils.c
> index ee34fc07d33..c1d2c939eb9 100644
> --- a/gdb/arch-utils.c
> +++ b/gdb/arch-utils.c
> @@ -1092,6 +1092,16 @@ default_read_core_file_mappings
>  {
>  }
>  
> +/* See arch-utils.h.  */
> +bool
> +default_use_target_description_from_corefile_notes (struct gdbarch *gdbarch,
> +						    struct bfd *obfd)
> +{
> +  /* Always trust the corefile target description contained in the target
> +     description note.  */
> +  return true;
> +}
> +
>  CORE_ADDR
>  default_get_return_buf_addr (struct type *val_type, frame_info_ptr cur_frame)
>  {
> diff --git a/gdb/arch-utils.h b/gdb/arch-utils.h
> index 2bdc3251c9c..5df3de7b5d9 100644
> --- a/gdb/arch-utils.h
> +++ b/gdb/arch-utils.h
> @@ -305,6 +305,12 @@ extern void default_read_core_file_mappings
>     read_core_file_mappings_pre_loop_ftype pre_loop_cb,
>     read_core_file_mappings_loop_ftype loop_cb);
>  
> +/* Default implementation of gdbarch
> +   use_target_description_from_corefile_notes.  */
> +extern bool default_use_target_description_from_corefile_notes
> +  (struct gdbarch *gdbarch,
> +  struct bfd *obfd);
> +
>  /* Default implementation of gdbarch default_get_return_buf_addr method.  */
>  extern CORE_ADDR default_get_return_buf_addr (struct type *val_typegdbarch,
>  					      frame_info_ptr cur_frame);
> diff --git a/gdb/corelow.c b/gdb/corelow.c
> index 439270f5559..114ce3054d5 100644
> --- a/gdb/corelow.c
> +++ b/gdb/corelow.c
> @@ -1229,35 +1229,47 @@ core_target::thread_alive (ptid_t ptid)
>  const struct target_desc *
>  core_target::read_description ()
>  {
> -  /* If the core file contains a target description note then we will use
> -     that in preference to anything else.  */
> -  bfd_size_type tdesc_note_size = 0;
> -  struct bfd_section *tdesc_note_section
> -    = bfd_get_section_by_name (core_bfd, ".gdb-tdesc");
> -  if (tdesc_note_section != nullptr)
> -    tdesc_note_size = bfd_section_size (tdesc_note_section);
> -  if (tdesc_note_size > 0)
> +  /* First check whether the target wants us to use the corefile target
> +     description notes.  */
> +  if (gdbarch_use_target_description_from_corefile_notes (m_core_gdbarch,
> +							  core_bfd))
>      {
> -      gdb::char_vector contents (tdesc_note_size + 1);
> -      if (bfd_get_section_contents (core_bfd, tdesc_note_section,
> -				    contents.data (), (file_ptr) 0,
> -				    tdesc_note_size))
> +      /* If the core file contains a target description note then go ahead and
> +	 use that.  */
> +      bfd_size_type tdesc_note_size = 0;
> +      struct bfd_section *tdesc_note_section
> +	= bfd_get_section_by_name (core_bfd, ".gdb-tdesc");
> +      if (tdesc_note_section != nullptr)
> +	tdesc_note_size = bfd_section_size (tdesc_note_section);
> +      if (tdesc_note_size > 0)
>  	{
> -	  /* Ensure we have a null terminator.  */
> -	  contents[tdesc_note_size] = '\0';
> -	  const struct target_desc *result
> -	    = string_read_description_xml (contents.data ());
> -	  if (result != nullptr)
> -	    return result;
> +	  gdb::char_vector contents (tdesc_note_size + 1);
> +	  if (bfd_get_section_contents (core_bfd, tdesc_note_section,
> +					contents.data (), (file_ptr) 0,
> +					tdesc_note_size))
> +	    {
> +	      /* Ensure we have a null terminator.  */
> +	      contents[tdesc_note_size] = '\0';
> +	      const struct target_desc *result
> +		= string_read_description_xml (contents.data ());
> +	      if (result != nullptr)
> +		return result;
> +	    }
>  	}
>      }
>  
> +  /* If the architecture provides a corefile target description hook, use
> +     it now.  Even if the core file contains a target description in a note
> +     section, it is not useful for targets that can potentially have distinct
> +     descriptions for each thread.  One example is AArch64's SVE/SME
> +     extensions that allow per-thread vector length changes, resulting in
> +     registers with different sizes.  */
>    if (m_core_gdbarch && gdbarch_core_read_description_p (m_core_gdbarch))
>      {
>        const struct target_desc *result;
>  
>        result = gdbarch_core_read_description (m_core_gdbarch, this, core_bfd);
> -      if (result != NULL)
> +      if (result != nullptr)
>  	return result;
>      }
>  
> diff --git a/gdb/gdbarch-gen.h b/gdb/gdbarch-gen.h
> index d62eefa1c5b..33276aa1c43 100644
> --- a/gdb/gdbarch-gen.h
> +++ b/gdb/gdbarch-gen.h
> @@ -1717,3 +1717,17 @@ extern void set_gdbarch_get_pc_address_flags (struct gdbarch *gdbarch, gdbarch_g
>  typedef void (gdbarch_read_core_file_mappings_ftype) (struct gdbarch *gdbarch, struct bfd *cbfd, read_core_file_mappings_pre_loop_ftype pre_loop_cb, read_core_file_mappings_loop_ftype loop_cb);
>  extern void gdbarch_read_core_file_mappings (struct gdbarch *gdbarch, struct bfd *cbfd, read_core_file_mappings_pre_loop_ftype pre_loop_cb, read_core_file_mappings_loop_ftype loop_cb);
>  extern void set_gdbarch_read_core_file_mappings (struct gdbarch *gdbarch, gdbarch_read_core_file_mappings_ftype *read_core_file_mappings);
> +
> +/* Return true if the target description for all threads should be read from the
> +   target description core file note(s).  Return false if the target description
> +   for all threads should be inferred from the core file contents/sections.
> +
> +   The corefile's bfd is passed through OBFD.
> +
> +   This hook should be used by targets that can have distinct target descriptions
> +   for each thread when the core file only holds a single target description
> +   note. */
> +
> +typedef bool (gdbarch_use_target_description_from_corefile_notes_ftype) (struct gdbarch *gdbarch, struct bfd *obfd);
> +extern bool gdbarch_use_target_description_from_corefile_notes (struct gdbarch *gdbarch, struct bfd *obfd);
> +extern void set_gdbarch_use_target_description_from_corefile_notes (struct gdbarch *gdbarch, gdbarch_use_target_description_from_corefile_notes_ftype *use_target_description_from_corefile_notes);
> diff --git a/gdb/gdbarch.c b/gdb/gdbarch.c
> index 1fc254d3d6e..ee868908598 100644
> --- a/gdb/gdbarch.c
> +++ b/gdb/gdbarch.c
> @@ -256,6 +256,7 @@ struct gdbarch
>    gdbarch_type_align_ftype *type_align = default_type_align;
>    gdbarch_get_pc_address_flags_ftype *get_pc_address_flags = default_get_pc_address_flags;
>    gdbarch_read_core_file_mappings_ftype *read_core_file_mappings = default_read_core_file_mappings;
> +  gdbarch_use_target_description_from_corefile_notes_ftype *use_target_description_from_corefile_notes = default_use_target_description_from_corefile_notes;
>  };
>  
>  /* Create a new ``struct gdbarch'' based on information provided by
> @@ -523,6 +524,7 @@ verify_gdbarch (struct gdbarch *gdbarch)
>    /* Skip verify of type_align, invalid_p == 0 */
>    /* Skip verify of get_pc_address_flags, invalid_p == 0 */
>    /* Skip verify of read_core_file_mappings, invalid_p == 0 */
> +  /* Skip verify of use_target_description_from_corefile_notes, invalid_p == 0 */
>    if (!log.empty ())
>      internal_error (_("verify_gdbarch: the following are invalid ...%s"),
>  		    log.c_str ());
> @@ -1373,6 +1375,9 @@ gdbarch_dump (struct gdbarch *gdbarch, struct ui_file *file)
>    gdb_printf (file,
>  	      "gdbarch_dump: read_core_file_mappings = <%s>\n",
>  	      host_address_to_string (gdbarch->read_core_file_mappings));
> +  gdb_printf (file,
> +	      "gdbarch_dump: use_target_description_from_corefile_notes = <%s>\n",
> +	      host_address_to_string (gdbarch->use_target_description_from_corefile_notes));
>    if (gdbarch->dump_tdep != NULL)
>      gdbarch->dump_tdep (gdbarch, file);
>  }
> @@ -5409,3 +5414,20 @@ set_gdbarch_read_core_file_mappings (struct gdbarch *gdbarch,
>  {
>    gdbarch->read_core_file_mappings = read_core_file_mappings;
>  }
> +
> +bool
> +gdbarch_use_target_description_from_corefile_notes (struct gdbarch *gdbarch, struct bfd *obfd)
> +{
> +  gdb_assert (gdbarch != NULL);
> +  gdb_assert (gdbarch->use_target_description_from_corefile_notes != NULL);
> +  if (gdbarch_debug >= 2)
> +    gdb_printf (gdb_stdlog, "gdbarch_use_target_description_from_corefile_notes called\n");
> +  return gdbarch->use_target_description_from_corefile_notes (gdbarch, obfd);
> +}
> +
> +void
> +set_gdbarch_use_target_description_from_corefile_notes (struct gdbarch *gdbarch,
> +							gdbarch_use_target_description_from_corefile_notes_ftype use_target_description_from_corefile_notes)
> +{
> +  gdbarch->use_target_description_from_corefile_notes = use_target_description_from_corefile_notes;
> +}
> diff --git a/gdb/gdbarch_components.py b/gdb/gdbarch_components.py
> index 846467b8d83..bbb9b188286 100644
> --- a/gdb/gdbarch_components.py
> +++ b/gdb/gdbarch_components.py
> @@ -2732,3 +2732,22 @@ Read core file mappings
>      predefault="default_read_core_file_mappings",
>      invalid=False,
>  )
> +
> +Method(
> +    comment="""
> +Return true if the target description for all threads should be read from the
> +target description core file note(s).  Return false if the target description
> +for all threads should be inferred from the core file contents/sections.
> +
> +The corefile's bfd is passed through OBFD.
> +
> +This hook should be used by targets that can have distinct target descriptions
> +for each thread when the core file only holds a single target description
> +note.
> +""",
> +    type="bool",
> +    name="use_target_description_from_corefile_notes",
> +    params=[("struct bfd *", "obfd")],
> +    predefault="default_use_target_description_from_corefile_notes",
> +    invalid=False,
> +)
> -- 
> 2.25.1
  
Andrew Burgess Sept. 20, 2023, 3:26 p.m. UTC | #5
Andrew Burgess <aburgess@redhat.com> writes:

> Luis Machado via Gdb-patches <gdb-patches@sourceware.org> writes:
>
>> New entry in v7.
>>
>> ---
>>
>> Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
>> can potentially have distinct target descriptions/gdbarches.
>>
>> When loading a gcore-generated core file, at the moment GDB gives priority
>> to the target description dumped to NT_GDB_TDESC.  Though technically correct
>> for most targets, it doesn't work correctly for AArch64 with SVE or SME
>> support.
>>
>> The correct approach for AArch64/Linux is to either have per-thread target
>> description notes in the corefiles or to rely on the
>> gdbarch_core_read_description hook, so it can figure out the proper target
>> description for a given thread based on the various available register notes.
>>
>> The former, although more correct, doesn't address the case of existing gdb's
>> that only output a single target description note.
>
> Do those existing GDB's support per-thread target descriptions though?
> I thought this series was the where the per-thread target description
> feature was being added ... so shouldn't core files generated by
> previous GDB's only support a single target description?  Or am I
> missing something.
>
>>
>> This patch goes for the latter, and adds a new gdbarch hook to conditionalize
>> the use of the corefile target description note. The hook is called
>> use_target_description_from_corefile_notes.
>>
>> The hook defaults to returning true, meaning targets will use the corefile
>> target description note.  AArch64 Linux overrides the hook to return false
>> when it detects any of the SVE or SME register notes in the corefile.
>>
>> Otherwise it should be fine for AArch64 Linux to use the corefile target
>> description note.
>>
>> When we support per-thread target description notes, then we can augment
>> the AArch64 Linux hook to rely on those notes.
>
> I guess I was a little surprised that I couldn't see anywhere in this
> series that you _stop_ adding the NT_GDB_TDESC note to core files
> generated from within GDB.
>
> I guess I was expecting to see some logic in gcore_elf_make_tdesc_note
> that tried to figure out if we had per-thread tdesc, and if so, at a
> minimum, didn't add the NT_GDB_TDESC.
>
> If we did that, then, I'm thinking:
>
>   - Previous GDB's only supported a single tdesc, and so are correct,
>
>   - New GDB's support per-thread tdesc, but don't emit NT_GDB_TDESC, so
>     we don't need to guard against loading them
>
> Maybe I'm missing something though.

Luis,

After our discussion on IRC I think I understand this a bit more, so
thanks for that.

So, here's what I think is true:

  - AArch64 supports per-thread gdbarch since before this patch series
    (implements ::thread_architecture),

  - GDB by default writes out a single NT_GDB_TDESC which describes the
    "per-inferior" gdbarch, but this doesn't correctly represent the
    per-thread gdbarch/tdesc,

  - If a NT_GDB_TDESC is present then GDB is going to try and load it
    back in and use it,

  - This is a problem that has existed for a while, but you've only just
    become aware, and so you're fixing it here,

  - The new gdbarch hook allows an architecture to avoid loading a
    single NT_GDB_TDESC note, for AArch64 this is when we see registers
    that are (likely) going to require per-thread gdbarch/tdesc,

I'm hoping we both agree on the above.  So from there I think I has two
concerns:

 1. GDB should avoid writing out the NT_GDB_TDESC is the situation where
    it know that a single such note is not correct, and

 2. I wonder if there's a better solution than the new hook.

For point (1) the patch below is a rough draft of what I'm thinking
about: by looking at each of the gdbarch pointers we can spot if an
inferior uses multiple different gdbarches -- if it does then we know
that a single NT_GDB_TDESC is going to be wrong, so lets not write it
out.

As was mentioned on IRC, longer term, it might be better if we wrote out
one tdesc for each thread, but that feels like a bigger job, and I think
stopping GDB doing something that's just *wrong* is pretty easy, so why
wouldn't we do that?

For point (2) I agree with the premise that we need a mechanism to stop
AArch64 loading the incorrect single NT_GDB_TDESC.  This is a slight
change in stance from what I originally wrote, but our IRC conversation
showed me I was wrong originally.  I don't have time this evening to
look at this, but will follow up again tomorrow with more thoughts.

Thanks,
Andrew

---

diff --git a/gdb/linux-tdep.c b/gdb/linux-tdep.c
index 2885afd60c7..f1f7675eb37 100644
--- a/gdb/linux-tdep.c
+++ b/gdb/linux-tdep.c
@@ -2077,6 +2077,8 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
   else
     stop_signal = GDB_SIGNAL_0;
 
+  bool multiple_arches = false;
+  struct gdbarch *some_arch = nullptr;
   if (signalled_thr != nullptr)
     {
       /* On some architectures, like AArch64, each thread can have a distinct
@@ -2085,8 +2087,8 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
 
 	 Fetch each thread's gdbarch and pass it down to the lower layers so
 	 we can dump the right set of registers.  */
-      linux_corefile_thread (signalled_thr,
-			     target_thread_architecture (signalled_thr->ptid),
+      some_arch = target_thread_architecture (signalled_thr->ptid);
+      linux_corefile_thread (signalled_thr, some_arch,
 			     obfd, note_data, note_size, stop_signal);
     }
   for (thread_info *thr : current_inferior ()->non_exited_threads ())
@@ -2100,7 +2102,10 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
 
 	 Fetch each thread's gdbarch and pass it down to the lower layers so
 	 we can dump the right set of registers.  */
-      linux_corefile_thread (thr, target_thread_architecture (thr->ptid),
+      struct gdbarch *tmp = target_thread_architecture (thr->ptid);
+      if (tmp != some_arch)
+	multiple_arches = true;
+      linux_corefile_thread (thr, tmp,
 			     obfd, note_data, note_size, stop_signal);
     }
 
@@ -2125,7 +2130,8 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
   linux_make_mappings_corefile_notes (gdbarch, obfd, note_data, note_size);
 
   /* Target description.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  if (!multiple_arches)
+    gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
 
   return note_data;
 }
  
Luis Machado Sept. 20, 2023, 11:35 p.m. UTC | #6
On 9/20/23 16:26, Andrew Burgess wrote:
> Andrew Burgess <aburgess@redhat.com> writes:
> 
>> Luis Machado via Gdb-patches <gdb-patches@sourceware.org> writes:
>>
>>> New entry in v7.
>>>
>>> ---
>>>
>>> Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
>>> can potentially have distinct target descriptions/gdbarches.
>>>
>>> When loading a gcore-generated core file, at the moment GDB gives priority
>>> to the target description dumped to NT_GDB_TDESC.  Though technically correct
>>> for most targets, it doesn't work correctly for AArch64 with SVE or SME
>>> support.
>>>
>>> The correct approach for AArch64/Linux is to either have per-thread target
>>> description notes in the corefiles or to rely on the
>>> gdbarch_core_read_description hook, so it can figure out the proper target
>>> description for a given thread based on the various available register notes.
>>>
>>> The former, although more correct, doesn't address the case of existing gdb's
>>> that only output a single target description note.
>>
>> Do those existing GDB's support per-thread target descriptions though?
>> I thought this series was the where the per-thread target description
>> feature was being added ... so shouldn't core files generated by
>> previous GDB's only support a single target description?  Or am I
>> missing something.
>>
>>>
>>> This patch goes for the latter, and adds a new gdbarch hook to conditionalize
>>> the use of the corefile target description note. The hook is called
>>> use_target_description_from_corefile_notes.
>>>
>>> The hook defaults to returning true, meaning targets will use the corefile
>>> target description note.  AArch64 Linux overrides the hook to return false
>>> when it detects any of the SVE or SME register notes in the corefile.
>>>
>>> Otherwise it should be fine for AArch64 Linux to use the corefile target
>>> description note.
>>>
>>> When we support per-thread target description notes, then we can augment
>>> the AArch64 Linux hook to rely on those notes.
>>
>> I guess I was a little surprised that I couldn't see anywhere in this
>> series that you _stop_ adding the NT_GDB_TDESC note to core files
>> generated from within GDB.
>>
>> I guess I was expecting to see some logic in gcore_elf_make_tdesc_note
>> that tried to figure out if we had per-thread tdesc, and if so, at a
>> minimum, didn't add the NT_GDB_TDESC.
>>
>> If we did that, then, I'm thinking:
>>
>>   - Previous GDB's only supported a single tdesc, and so are correct,
>>
>>   - New GDB's support per-thread tdesc, but don't emit NT_GDB_TDESC, so
>>     we don't need to guard against loading them
>>
>> Maybe I'm missing something though.
> 
> Luis,
> 
> After our discussion on IRC I think I understand this a bit more, so
> thanks for that.

You're welcome. Hopefully I managed to clarify most of it.

> 
> So, here's what I think is true:
> 
>   - AArch64 supports per-thread gdbarch since before this patch series
>     (implements ::thread_architecture),
> 
>   - GDB by default writes out a single NT_GDB_TDESC which describes the
>     "per-inferior" gdbarch, but this doesn't correctly represent the
>     per-thread gdbarch/tdesc,
> 
>   - If a NT_GDB_TDESC is present then GDB is going to try and load it
>     back in and use it,
> 
>   - This is a problem that has existed for a while, but you've only just
>     become aware, and so you're fixing it here,
> 
>   - The new gdbarch hook allows an architecture to avoid loading a
>     single NT_GDB_TDESC note, for AArch64 this is when we see registers
>     that are (likely) going to require per-thread gdbarch/tdesc,
> 

All of the above are correct.

> I'm hoping we both agree on the above.  So from there I think I has two
> concerns:
> 
>  1. GDB should avoid writing out the NT_GDB_TDESC is the situation where
>     it know that a single such note is not correct, and

I see your point now. Not outputting NT_GDB_TDESC when it is incorrect sounds
reasonable, even if the incorrect NT_GDB_TDESC note won't be used.

It is also reasonable to do it until the point where we support per-thread
NT_GDB_TDESC's, which is, I think, the proper solution.

> 
>  2. I wonder if there's a better solution than the new hook.
> 
> For point (1) the patch below is a rough draft of what I'm thinking
> about: by looking at each of the gdbarch pointers we can spot if an
> inferior uses multiple different gdbarches -- if it does then we know
> that a single NT_GDB_TDESC is going to be wrong, so lets not write it
> out.
> 
> As was mentioned on IRC, longer term, it might be better if we wrote out
> one tdesc for each thread, but that feels like a bigger job, and I think
> stopping GDB doing something that's just *wrong* is pretty easy, so why
> wouldn't we do that?

I think outputting per-thread NT_GDB_TDESC's is fairly easy. Probably a matter of
calling the function that outputs NT_GDB_TDESC while we're dumping register sets for
each thread.

I haven't investigated in-depth how to use that information when loading a core file, but
there might be some complexity because we read the target description before we load the threads.

If the target description we read first is *always* the one for the signalled thread, then we might
be ok and it should make the implementation easier. Anyway, that's a bit off-topic for this patch.

> 
> For point (2) I agree with the premise that we need a mechanism to stop
> AArch64 loading the incorrect single NT_GDB_TDESC.  This is a slight
> change in stance from what I originally wrote, but our IRC conversation
> showed me I was wrong originally.  I don't have time this evening to
> look at this, but will follow up again tomorrow with more thoughts.

Except for some different names and tweaks, your proposed approach works correctly.

I tested this and noticed the lack of NT_GDB_TDESC for a AArch64 Linux target
with sve/sme support and one thread with a differing gdbarch. I also saw the note for
a AArch64 Linux target without sve/sme support, or with sve/sme support but all threads
having the exact same gdbarch. So that looks good to me.

Would you like this extra code to be included as part of this particular patch?

By the way, thanks for the review.

> 
> Thanks,
> Andrew
> 
> ---
> 
> diff --git a/gdb/linux-tdep.c b/gdb/linux-tdep.c
> index 2885afd60c7..f1f7675eb37 100644
> --- a/gdb/linux-tdep.c
> +++ b/gdb/linux-tdep.c
> @@ -2077,6 +2077,8 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
>    else
>      stop_signal = GDB_SIGNAL_0;
>  
> +  bool multiple_arches = false;> +  struct gdbarch *some_arch = nullptr;
>    if (signalled_thr != nullptr)
>      {
>        /* On some architectures, like AArch64, each thread can have a distinct
> @@ -2085,8 +2087,8 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
>  
>  	 Fetch each thread's gdbarch and pass it down to the lower layers so
>  	 we can dump the right set of registers.  */
> -      linux_corefile_thread (signalled_thr,
> -			     target_thread_architecture (signalled_thr->ptid),
> +      some_arch = target_thread_architecture (signalled_thr->ptid);
> +      linux_corefile_thread (signalled_thr, some_arch,
>  			     obfd, note_data, note_size, stop_signal);
>      }
>    for (thread_info *thr : current_inferior ()->non_exited_threads ())
> @@ -2100,7 +2102,10 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
>  
>  	 Fetch each thread's gdbarch and pass it down to the lower layers so
>  	 we can dump the right set of registers.  */
> -      linux_corefile_thread (thr, target_thread_architecture (thr->ptid),
> +      struct gdbarch *tmp = target_thread_architecture (thr->ptid);
> +      if (tmp != some_arch)
> +	multiple_arches = true;
> +      linux_corefile_thread (thr, tmp,
>  			     obfd, note_data, note_size, stop_signal);
>      }
>  
> @@ -2125,7 +2130,8 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
>    linux_make_mappings_corefile_notes (gdbarch, obfd, note_data, note_size);
>  
>    /* Target description.  */
> -  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
> +  if (!multiple_arches)
> +    gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
>  
>    return note_data;
>  }
>
  
Andrew Burgess Sept. 21, 2023, 10:02 a.m. UTC | #7
Luis Machado <luis.machado@arm.com> writes:

> On 9/20/23 16:26, Andrew Burgess wrote:
>> Andrew Burgess <aburgess@redhat.com> writes:
>> 
>>> Luis Machado via Gdb-patches <gdb-patches@sourceware.org> writes:
>>>
>>>> New entry in v7.
>>>>
>>>> ---
>>>>
>>>> Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
>>>> can potentially have distinct target descriptions/gdbarches.
>>>>
>>>> When loading a gcore-generated core file, at the moment GDB gives priority
>>>> to the target description dumped to NT_GDB_TDESC.  Though technically correct
>>>> for most targets, it doesn't work correctly for AArch64 with SVE or SME
>>>> support.
>>>>
>>>> The correct approach for AArch64/Linux is to either have per-thread target
>>>> description notes in the corefiles or to rely on the
>>>> gdbarch_core_read_description hook, so it can figure out the proper target
>>>> description for a given thread based on the various available register notes.
>>>>
>>>> The former, although more correct, doesn't address the case of existing gdb's
>>>> that only output a single target description note.
>>>
>>> Do those existing GDB's support per-thread target descriptions though?
>>> I thought this series was the where the per-thread target description
>>> feature was being added ... so shouldn't core files generated by
>>> previous GDB's only support a single target description?  Or am I
>>> missing something.
>>>
>>>>
>>>> This patch goes for the latter, and adds a new gdbarch hook to conditionalize
>>>> the use of the corefile target description note. The hook is called
>>>> use_target_description_from_corefile_notes.
>>>>
>>>> The hook defaults to returning true, meaning targets will use the corefile
>>>> target description note.  AArch64 Linux overrides the hook to return false
>>>> when it detects any of the SVE or SME register notes in the corefile.
>>>>
>>>> Otherwise it should be fine for AArch64 Linux to use the corefile target
>>>> description note.
>>>>
>>>> When we support per-thread target description notes, then we can augment
>>>> the AArch64 Linux hook to rely on those notes.
>>>
>>> I guess I was a little surprised that I couldn't see anywhere in this
>>> series that you _stop_ adding the NT_GDB_TDESC note to core files
>>> generated from within GDB.
>>>
>>> I guess I was expecting to see some logic in gcore_elf_make_tdesc_note
>>> that tried to figure out if we had per-thread tdesc, and if so, at a
>>> minimum, didn't add the NT_GDB_TDESC.
>>>
>>> If we did that, then, I'm thinking:
>>>
>>>   - Previous GDB's only supported a single tdesc, and so are correct,
>>>
>>>   - New GDB's support per-thread tdesc, but don't emit NT_GDB_TDESC, so
>>>     we don't need to guard against loading them
>>>
>>> Maybe I'm missing something though.
>> 
>> Luis,
>> 
>> After our discussion on IRC I think I understand this a bit more, so
>> thanks for that.
>
> You're welcome. Hopefully I managed to clarify most of it.
>
>> 
>> So, here's what I think is true:
>> 
>>   - AArch64 supports per-thread gdbarch since before this patch series
>>     (implements ::thread_architecture),
>> 
>>   - GDB by default writes out a single NT_GDB_TDESC which describes the
>>     "per-inferior" gdbarch, but this doesn't correctly represent the
>>     per-thread gdbarch/tdesc,
>> 
>>   - If a NT_GDB_TDESC is present then GDB is going to try and load it
>>     back in and use it,
>> 
>>   - This is a problem that has existed for a while, but you've only just
>>     become aware, and so you're fixing it here,
>> 
>>   - The new gdbarch hook allows an architecture to avoid loading a
>>     single NT_GDB_TDESC note, for AArch64 this is when we see registers
>>     that are (likely) going to require per-thread gdbarch/tdesc,
>> 
>
> All of the above are correct.
>
>> I'm hoping we both agree on the above.  So from there I think I has two
>> concerns:
>> 
>>  1. GDB should avoid writing out the NT_GDB_TDESC is the situation where
>>     it know that a single such note is not correct, and
>
> I see your point now. Not outputting NT_GDB_TDESC when it is incorrect sounds
> reasonable, even if the incorrect NT_GDB_TDESC note won't be used.
>
> It is also reasonable to do it until the point where we support per-thread
> NT_GDB_TDESC's, which is, I think, the proper solution.
>
>> 
>>  2. I wonder if there's a better solution than the new hook.
>> 
>> For point (1) the patch below is a rough draft of what I'm thinking
>> about: by looking at each of the gdbarch pointers we can spot if an
>> inferior uses multiple different gdbarches -- if it does then we know
>> that a single NT_GDB_TDESC is going to be wrong, so lets not write it
>> out.
>> 
>> As was mentioned on IRC, longer term, it might be better if we wrote out
>> one tdesc for each thread, but that feels like a bigger job, and I think
>> stopping GDB doing something that's just *wrong* is pretty easy, so why
>> wouldn't we do that?
>
> I think outputting per-thread NT_GDB_TDESC's is fairly easy. Probably a matter of
> calling the function that outputs NT_GDB_TDESC while we're dumping register sets for
> each thread.

Indeed, I have this bit done locally.  What isn't clear to me is how
having multiple NT_GDB_TDESC will help -- does
core_file::read_description get called multiple times when using AArch64
with sve/sme?

>
> I haven't investigated in-depth how to use that information when loading a core file, but
> there might be some complexity because we read the target description before we load the threads.
>
> If the target description we read first is *always* the one for the signalled thread, then we might
> be ok and it should make the implementation easier. Anyway, that's a
> bit off-topic for this patch.

I mean, maybe this is a better solution?  What if
gcore_elf_make_tdesc_note took a 'struct gdbarch *' argument, and it was
the callers to chose which gdbarch to pass.  We would then spec in the
comments of gcore_elf_make_tdesc_note that the passed in gdbarch should
be the gdbarch of the signalled thread.  See the patch below.

Could I ask: would you mind sending me a small application and corefile
for AArch64 with sve/sme please.  I'd love to have a play to better
understand how GDB sets up the per-thread gdbarch in that case.  I'd be
very grateful.

>
>> 
>> For point (2) I agree with the premise that we need a mechanism to stop
>> AArch64 loading the incorrect single NT_GDB_TDESC.  This is a slight
>> change in stance from what I originally wrote, but our IRC conversation
>> showed me I was wrong originally.  I don't have time this evening to
>> look at this, but will follow up again tomorrow with more thoughts.

I wonder, instead of adding the new hook, maybe we should just reorder
the checks in core_target::read_description -- ask the gdbarch to grok a
tdesc from the .regs (etc) sections, and if that fails, check for an
encoded tdesc.

When I originally added this code my problem case was RISC-V, where the
CSRs (control regs) for embedded targets can be different for each
target.  The CSRs are just encoded as a blob, so unless you already know
which regs are present you're stuck.  For real h/w the controller
(e.g. openocd) provides a tdesc (via target.xml), but for a core file we
needed a way to reacquire the tdesc.

My assumption at the time that the tdesc we wrote would always be
correct, but clearly this isn't the case.

Right now RISC-V doesn't event override gdbarch_core_read_description,
so if we read the encoded tdesc after checking that gdbarch hook nothing
would change.

For other architectures, if gdbarch_core_read_description is implemented
then we'll be back to using that, and that worked fine before.

... And if, in some future world we do implement
gdbarch_core_read_description for RISC-V then the solution seems
obvious, if there's a section containing CSRs, and there's also a
section containing a tdesc, then the RISC-V
gdbarch_core_read_description can just return nullptr, safe in the
knowledge the generic GDB code will load the encoded tdesc.

>
> Except for some different names and tweaks, your proposed approach works correctly.
>
> I tested this and noticed the lack of NT_GDB_TDESC for a AArch64 Linux target
> with sve/sme support and one thread with a differing gdbarch. I also saw the note for
> a AArch64 Linux target without sve/sme support, or with sve/sme support but all threads
> having the exact same gdbarch. So that looks good to me.
>
> Would you like this extra code to be included as part of this
> particular patch?

Yeah I think something like this should be added to this patch, we
should stop GDB creating incorrect core files I think.

Thanks,
Andrew

----

diff --git a/gdb/elf-none-tdep.c b/gdb/elf-none-tdep.c
index 460f02e7dbb..ad31d3c820a 100644
--- a/gdb/elf-none-tdep.c
+++ b/gdb/elf-none-tdep.c
@@ -111,7 +111,9 @@ elf_none_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd,
 
 
   /* Target description.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  if (signalled_thr != nullptr)
+    gdbarch = target_thread_architecture (signalled_thr->ptid);
+  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
 
   return note_data;
 }
diff --git a/gdb/fbsd-tdep.c b/gdb/fbsd-tdep.c
index dc5020d92d2..c44dc0871c6 100644
--- a/gdb/fbsd-tdep.c
+++ b/gdb/fbsd-tdep.c
@@ -774,7 +774,9 @@ fbsd_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
     }
 
   /* Include the target description when possible.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  if (signalled_thr != nullptr)
+    gdbarch = target_thread_architecture (signalled_thr->ptid);
+  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
 
   return note_data;
 }
diff --git a/gdb/gcore-elf.c b/gdb/gcore-elf.c
index 643426d911b..0142db82670 100644
--- a/gdb/gcore-elf.c
+++ b/gdb/gcore-elf.c
@@ -139,12 +139,12 @@ gcore_elf_build_thread_register_notes
 /* See gcore-elf.h.  */
 
 void
-gcore_elf_make_tdesc_note (bfd *obfd,
+gcore_elf_make_tdesc_note (struct gdbarch *gdbarch, bfd *obfd,
 			   gdb::unique_xmalloc_ptr<char> *note_data,
 			   int *note_size)
 {
   /* Append the target description to the core file.  */
-  const struct target_desc *tdesc = gdbarch_target_desc (target_gdbarch ());
+  const struct target_desc *tdesc = gdbarch_target_desc (gdbarch);
   const char *tdesc_xml
     = tdesc == nullptr ? nullptr : tdesc_get_features_xml (tdesc);
   if (tdesc_xml != nullptr && *tdesc_xml != '\0')
diff --git a/gdb/gcore-elf.h b/gdb/gcore-elf.h
index 2cfeb3e8550..e23fe5e7162 100644
--- a/gdb/gcore-elf.h
+++ b/gdb/gcore-elf.h
@@ -42,6 +42,7 @@ extern void gcore_elf_build_thread_register_notes
    set to nullptr.  */
 
 extern void gcore_elf_make_tdesc_note
-  (bfd *obfd, gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
+  (struct gdbarch *gdbarch, bfd *obfd,
+   gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
 
 #endif /* GCORE_ELF_H */
diff --git a/gdb/linux-tdep.c b/gdb/linux-tdep.c
index 2885afd60c7..24ae1d711d1 100644
--- a/gdb/linux-tdep.c
+++ b/gdb/linux-tdep.c
@@ -1836,6 +1836,10 @@ linux_corefile_thread (struct thread_info *info,
      such a core file is useless.  */
   if (note_data != nullptr)
     {
+      /* If we want per-thread NT_GDB_TDESC then at this point we should
+	 do this:  */
+      // gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
+
       gdb::byte_vector siginfo_data
 	= linux_get_siginfo_data (info, gdbarch);
       if (!siginfo_data.empty ())
@@ -2125,7 +2129,9 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
   linux_make_mappings_corefile_notes (gdbarch, obfd, note_data, note_size);
 
   /* Target description.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  if (signalled_thr != nullptr)
+    gdbarch = target_thread_architecture (signalled_thr->ptid);
+  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
 
   return note_data;
 }

---
  
Luis Machado Sept. 21, 2023, 10:44 a.m. UTC | #8
Hi Andrew,

On 9/21/23 11:02, Andrew Burgess wrote:
> Luis Machado <luis.machado@arm.com> writes:
> 
>> On 9/20/23 16:26, Andrew Burgess wrote:
>>> Andrew Burgess <aburgess@redhat.com> writes:
>>>
>>>> Luis Machado via Gdb-patches <gdb-patches@sourceware.org> writes:
>>>>
>>>>> New entry in v7.
>>>>>
>>>>> ---
>>>>>
>>>>> Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
>>>>> can potentially have distinct target descriptions/gdbarches.
>>>>>
>>>>> When loading a gcore-generated core file, at the moment GDB gives priority
>>>>> to the target description dumped to NT_GDB_TDESC.  Though technically correct
>>>>> for most targets, it doesn't work correctly for AArch64 with SVE or SME
>>>>> support.
>>>>>
>>>>> The correct approach for AArch64/Linux is to either have per-thread target
>>>>> description notes in the corefiles or to rely on the
>>>>> gdbarch_core_read_description hook, so it can figure out the proper target
>>>>> description for a given thread based on the various available register notes.
>>>>>
>>>>> The former, although more correct, doesn't address the case of existing gdb's
>>>>> that only output a single target description note.
>>>>
>>>> Do those existing GDB's support per-thread target descriptions though?
>>>> I thought this series was the where the per-thread target description
>>>> feature was being added ... so shouldn't core files generated by
>>>> previous GDB's only support a single target description?  Or am I
>>>> missing something.
>>>>
>>>>>
>>>>> This patch goes for the latter, and adds a new gdbarch hook to conditionalize
>>>>> the use of the corefile target description note. The hook is called
>>>>> use_target_description_from_corefile_notes.
>>>>>
>>>>> The hook defaults to returning true, meaning targets will use the corefile
>>>>> target description note.  AArch64 Linux overrides the hook to return false
>>>>> when it detects any of the SVE or SME register notes in the corefile.
>>>>>
>>>>> Otherwise it should be fine for AArch64 Linux to use the corefile target
>>>>> description note.
>>>>>
>>>>> When we support per-thread target description notes, then we can augment
>>>>> the AArch64 Linux hook to rely on those notes.
>>>>
>>>> I guess I was a little surprised that I couldn't see anywhere in this
>>>> series that you _stop_ adding the NT_GDB_TDESC note to core files
>>>> generated from within GDB.
>>>>
>>>> I guess I was expecting to see some logic in gcore_elf_make_tdesc_note
>>>> that tried to figure out if we had per-thread tdesc, and if so, at a
>>>> minimum, didn't add the NT_GDB_TDESC.
>>>>
>>>> If we did that, then, I'm thinking:
>>>>
>>>>   - Previous GDB's only supported a single tdesc, and so are correct,
>>>>
>>>>   - New GDB's support per-thread tdesc, but don't emit NT_GDB_TDESC, so
>>>>     we don't need to guard against loading them
>>>>
>>>> Maybe I'm missing something though.
>>>
>>> Luis,
>>>
>>> After our discussion on IRC I think I understand this a bit more, so
>>> thanks for that.
>>
>> You're welcome. Hopefully I managed to clarify most of it.
>>
>>>
>>> So, here's what I think is true:
>>>
>>>   - AArch64 supports per-thread gdbarch since before this patch series
>>>     (implements ::thread_architecture),
>>>
>>>   - GDB by default writes out a single NT_GDB_TDESC which describes the
>>>     "per-inferior" gdbarch, but this doesn't correctly represent the
>>>     per-thread gdbarch/tdesc,
>>>
>>>   - If a NT_GDB_TDESC is present then GDB is going to try and load it
>>>     back in and use it,
>>>
>>>   - This is a problem that has existed for a while, but you've only just
>>>     become aware, and so you're fixing it here,
>>>
>>>   - The new gdbarch hook allows an architecture to avoid loading a
>>>     single NT_GDB_TDESC note, for AArch64 this is when we see registers
>>>     that are (likely) going to require per-thread gdbarch/tdesc,
>>>
>>
>> All of the above are correct.
>>
>>> I'm hoping we both agree on the above.  So from there I think I has two
>>> concerns:
>>>
>>>  1. GDB should avoid writing out the NT_GDB_TDESC is the situation where
>>>     it know that a single such note is not correct, and
>>
>> I see your point now. Not outputting NT_GDB_TDESC when it is incorrect sounds
>> reasonable, even if the incorrect NT_GDB_TDESC note won't be used.
>>
>> It is also reasonable to do it until the point where we support per-thread
>> NT_GDB_TDESC's, which is, I think, the proper solution.
>>
>>>
>>>  2. I wonder if there's a better solution than the new hook.
>>>
>>> For point (1) the patch below is a rough draft of what I'm thinking
>>> about: by looking at each of the gdbarch pointers we can spot if an
>>> inferior uses multiple different gdbarches -- if it does then we know
>>> that a single NT_GDB_TDESC is going to be wrong, so lets not write it
>>> out.
>>>
>>> As was mentioned on IRC, longer term, it might be better if we wrote out
>>> one tdesc for each thread, but that feels like a bigger job, and I think
>>> stopping GDB doing something that's just *wrong* is pretty easy, so why
>>> wouldn't we do that?
>>
>> I think outputting per-thread NT_GDB_TDESC's is fairly easy. Probably a matter of
>> calling the function that outputs NT_GDB_TDESC while we're dumping register sets for
>> each thread.
> 
> Indeed, I have this bit done locally.  What isn't clear to me is how
> having multiple NT_GDB_TDESC will help -- does
> core_file::read_description get called multiple times when using AArch64
> with sve/sme?
> 

No, it doesn't get called multiple times. And to be honest, I don't think it would help at
the moment, even if we did output multiple NT_GDB_TDESC entries.

The reason being, IIRC, that we lack a thread_architecture hook. That's available for the
AArch64 Linux native target and there have been attempts at having it for the remote
target as well (which would fix the problem of communicating vector length changes over RSP).

So, similarly, we'd need something like a thread_architecture implementation for core files,
otherwise it wouldn't solve the problem completely.

To clarify, the more generic issue of per-thread target descriptions (with or without NT_GDB_TDESC's)
from threaded core files isn't completely solved by this patch either.  I haven't gone through that path
yet, as it is a bit outside the scope of sme1/sme2 enablement. It is an orthogonal problem, that I happened
to have a pass at improving (not completely fixing).

With this patch, at least gdb can load the core file and display sane values for single-threaded
core files with sme data.

>>
>> I haven't investigated in-depth how to use that information when loading a core file, but
>> there might be some complexity because we read the target description before we load the threads.
>>
>> If the target description we read first is *always* the one for the signalled thread, then we might
>> be ok and it should make the implementation easier. Anyway, that's a
>> bit off-topic for this patch.
> 
> I mean, maybe this is a better solution?  What if
> gcore_elf_make_tdesc_note took a 'struct gdbarch *' argument, and it was
> the callers to chose which gdbarch to pass.  We would then spec in the
> comments of gcore_elf_make_tdesc_note that the passed in gdbarch should
> be the gdbarch of the signalled thread.  See the patch below.

Sure, that makes sense, and I don't disagree with that. But as I mentioned above, it is an
orthogonal problem that might be better suited as its own patch/series. A more complete patch
addressing this particular situation I stumbled upon.

> 
> Could I ask: would you mind sending me a small application and corefile
> for AArch64 with sve/sme please.  I'd love to have a play to better
> understand how GDB sets up the per-thread gdbarch in that case.  I'd be
> very grateful.
> 

Absolutely, that would be no problem. I can send you some of those. You will need more than
one, as the different states might influence things. But, as I explained earlier, gdb won't
setup the per-thread gdbarch in that case.

If you'd still like to have a go, let me know which ones you want (no sve, sve,
sve+sme, threads).

>>
>>>
>>> For point (2) I agree with the premise that we need a mechanism to stop
>>> AArch64 loading the incorrect single NT_GDB_TDESC.  This is a slight
>>> change in stance from what I originally wrote, but our IRC conversation
>>> showed me I was wrong originally.  I don't have time this evening to
>>> look at this, but will follow up again tomorrow with more thoughts.
> 
> I wonder, instead of adding the new hook, maybe we should just reorder
> the checks in core_target::read_description -- ask the gdbarch to grok a
> tdesc from the .regs (etc) sections, and if that fails, check for an
> encoded tdesc.
> 

That's exactly what I did a few versions earlier. But Simon pointed out it would
effectively be a conflicting situation given our choice of defaulting to reading the
target description from the NT_GDB_TDESC. So I went with the hook, which, to be honest,
seems cleaner.

> When I originally added this code my problem case was RISC-V, where the
> CSRs (control regs) for embedded targets can be different for each
> target.  The CSRs are just encoded as a blob, so unless you already know
> which regs are present you're stuck.  For real h/w the controller
> (e.g. openocd) provides a tdesc (via target.xml), but for a core file we
> needed a way to reacquire the tdesc.
> 
> My assumption at the time that the tdesc we wrote would always be
> correct, but clearly this isn't the case.

That's true. With the introduction of dynamic target descriptions like sve and sme, this
no longer holds true.

> 
> Right now RISC-V doesn't event override gdbarch_core_read_description,
> so if we read the encoded tdesc after checking that gdbarch hook nothing
> would change.

That was my assessment back in the earlier versions of this series.

> 
> For other architectures, if gdbarch_core_read_description is implemented
> then we'll be back to using that, and that worked fine before.
> 
> ... And if, in some future world we do implement
> gdbarch_core_read_description for RISC-V then the solution seems
> obvious, if there's a section containing CSRs, and there's also a
> section containing a tdesc, then the RISC-V
> gdbarch_core_read_description can just return nullptr, safe in the
> knowledge the generic GDB code will load the encoded tdesc.
> 
>>
>> Except for some different names and tweaks, your proposed approach works correctly.
>>
>> I tested this and noticed the lack of NT_GDB_TDESC for a AArch64 Linux target
>> with sve/sme support and one thread with a differing gdbarch. I also saw the note for
>> a AArch64 Linux target without sve/sme support, or with sve/sme support but all threads
>> having the exact same gdbarch. So that looks good to me.
>>
>> Would you like this extra code to be included as part of this
>> particular patch?
> 
> Yeah I think something like this should be added to this patch, we
> should stop GDB creating incorrect core files I think.

Got it.

I'm afraid I'm going to need some clarification on ways forward with regards to this particular
series. Do you still want this (or other) potential changes as part of this series or would you
be ok with a follow-up patch/series to (try to, haven't gone in-depth yet) address this particular
core file/gdbarch/threads situation?

I'd like to have the sme support in for soon-to-branch gdb 14. I don't know how long the community
would be willing to wait for a potential fix for this problem before branching/releasing.

I don't see it as a critical problem, to be honest. Most of the concern is around core files generated
by threaded programs for which some threads have changed the vector length mid-execution. That is
an uncommon case. The most common case is that of all threads having the same gdbarch, because they
are all using the same vector length (for sve, sme etc).

Thanks!
Luis

> 
> Thanks,
> Andrew
> 
> ----
> 
> diff --git a/gdb/elf-none-tdep.c b/gdb/elf-none-tdep.c
> index 460f02e7dbb..ad31d3c820a 100644
> --- a/gdb/elf-none-tdep.c
> +++ b/gdb/elf-none-tdep.c
> @@ -111,7 +111,9 @@ elf_none_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd,
>  
>  
>    /* Target description.  */
> -  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
> +  if (signalled_thr != nullptr)
> +    gdbarch = target_thread_architecture (signalled_thr->ptid);
> +  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
>  
>    return note_data;
>  }
> diff --git a/gdb/fbsd-tdep.c b/gdb/fbsd-tdep.c
> index dc5020d92d2..c44dc0871c6 100644
> --- a/gdb/fbsd-tdep.c
> +++ b/gdb/fbsd-tdep.c
> @@ -774,7 +774,9 @@ fbsd_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
>      }
>  
>    /* Include the target description when possible.  */
> -  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
> +  if (signalled_thr != nullptr)
> +    gdbarch = target_thread_architecture (signalled_thr->ptid);
> +  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
>  
>    return note_data;
>  }
> diff --git a/gdb/gcore-elf.c b/gdb/gcore-elf.c
> index 643426d911b..0142db82670 100644
> --- a/gdb/gcore-elf.c
> +++ b/gdb/gcore-elf.c
> @@ -139,12 +139,12 @@ gcore_elf_build_thread_register_notes
>  /* See gcore-elf.h.  */
>  
>  void
> -gcore_elf_make_tdesc_note (bfd *obfd,
> +gcore_elf_make_tdesc_note (struct gdbarch *gdbarch, bfd *obfd,
>  			   gdb::unique_xmalloc_ptr<char> *note_data,
>  			   int *note_size)
>  {
>    /* Append the target description to the core file.  */
> -  const struct target_desc *tdesc = gdbarch_target_desc (target_gdbarch ());
> +  const struct target_desc *tdesc = gdbarch_target_desc (gdbarch);
>    const char *tdesc_xml
>      = tdesc == nullptr ? nullptr : tdesc_get_features_xml (tdesc);
>    if (tdesc_xml != nullptr && *tdesc_xml != '\0')
> diff --git a/gdb/gcore-elf.h b/gdb/gcore-elf.h
> index 2cfeb3e8550..e23fe5e7162 100644
> --- a/gdb/gcore-elf.h
> +++ b/gdb/gcore-elf.h
> @@ -42,6 +42,7 @@ extern void gcore_elf_build_thread_register_notes
>     set to nullptr.  */
>  
>  extern void gcore_elf_make_tdesc_note
> -  (bfd *obfd, gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
> +  (struct gdbarch *gdbarch, bfd *obfd,
> +   gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
>  
>  #endif /* GCORE_ELF_H */
> diff --git a/gdb/linux-tdep.c b/gdb/linux-tdep.c
> index 2885afd60c7..24ae1d711d1 100644
> --- a/gdb/linux-tdep.c
> +++ b/gdb/linux-tdep.c
> @@ -1836,6 +1836,10 @@ linux_corefile_thread (struct thread_info *info,
>       such a core file is useless.  */
>    if (note_data != nullptr)
>      {
> +      /* If we want per-thread NT_GDB_TDESC then at this point we should
> +	 do this:  */
> +      // gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
> +
>        gdb::byte_vector siginfo_data
>  	= linux_get_siginfo_data (info, gdbarch);
>        if (!siginfo_data.empty ())
> @@ -2125,7 +2129,9 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
>    linux_make_mappings_corefile_notes (gdbarch, obfd, note_data, note_size);
>  
>    /* Target description.  */
> -  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
> +  if (signalled_thr != nullptr)
> +    gdbarch = target_thread_architecture (signalled_thr->ptid);
> +  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
>  
>    return note_data;
>  }
> 
> ---
>
  
Andrew Burgess Sept. 25, 2023, 9:57 a.m. UTC | #9
Luis Machado <luis.machado@arm.com> writes:

> Hi Andrew,
>
> On 9/21/23 11:02, Andrew Burgess wrote:
>> Luis Machado <luis.machado@arm.com> writes:
>> 
>>> On 9/20/23 16:26, Andrew Burgess wrote:
>>>> Andrew Burgess <aburgess@redhat.com> writes:
>>>>
>>>>> Luis Machado via Gdb-patches <gdb-patches@sourceware.org> writes:
>>>>>
>>>>>> New entry in v7.
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Due to the nature of the AArch64 SVE/SME extensions in GDB, each thread
>>>>>> can potentially have distinct target descriptions/gdbarches.
>>>>>>
>>>>>> When loading a gcore-generated core file, at the moment GDB gives priority
>>>>>> to the target description dumped to NT_GDB_TDESC.  Though technically correct
>>>>>> for most targets, it doesn't work correctly for AArch64 with SVE or SME
>>>>>> support.
>>>>>>
>>>>>> The correct approach for AArch64/Linux is to either have per-thread target
>>>>>> description notes in the corefiles or to rely on the
>>>>>> gdbarch_core_read_description hook, so it can figure out the proper target
>>>>>> description for a given thread based on the various available register notes.
>>>>>>
>>>>>> The former, although more correct, doesn't address the case of existing gdb's
>>>>>> that only output a single target description note.
>>>>>
>>>>> Do those existing GDB's support per-thread target descriptions though?
>>>>> I thought this series was the where the per-thread target description
>>>>> feature was being added ... so shouldn't core files generated by
>>>>> previous GDB's only support a single target description?  Or am I
>>>>> missing something.
>>>>>
>>>>>>
>>>>>> This patch goes for the latter, and adds a new gdbarch hook to conditionalize
>>>>>> the use of the corefile target description note. The hook is called
>>>>>> use_target_description_from_corefile_notes.
>>>>>>
>>>>>> The hook defaults to returning true, meaning targets will use the corefile
>>>>>> target description note.  AArch64 Linux overrides the hook to return false
>>>>>> when it detects any of the SVE or SME register notes in the corefile.
>>>>>>
>>>>>> Otherwise it should be fine for AArch64 Linux to use the corefile target
>>>>>> description note.
>>>>>>
>>>>>> When we support per-thread target description notes, then we can augment
>>>>>> the AArch64 Linux hook to rely on those notes.
>>>>>
>>>>> I guess I was a little surprised that I couldn't see anywhere in this
>>>>> series that you _stop_ adding the NT_GDB_TDESC note to core files
>>>>> generated from within GDB.
>>>>>
>>>>> I guess I was expecting to see some logic in gcore_elf_make_tdesc_note
>>>>> that tried to figure out if we had per-thread tdesc, and if so, at a
>>>>> minimum, didn't add the NT_GDB_TDESC.
>>>>>
>>>>> If we did that, then, I'm thinking:
>>>>>
>>>>>   - Previous GDB's only supported a single tdesc, and so are correct,
>>>>>
>>>>>   - New GDB's support per-thread tdesc, but don't emit NT_GDB_TDESC, so
>>>>>     we don't need to guard against loading them
>>>>>
>>>>> Maybe I'm missing something though.
>>>>
>>>> Luis,
>>>>
>>>> After our discussion on IRC I think I understand this a bit more, so
>>>> thanks for that.
>>>
>>> You're welcome. Hopefully I managed to clarify most of it.
>>>
>>>>
>>>> So, here's what I think is true:
>>>>
>>>>   - AArch64 supports per-thread gdbarch since before this patch series
>>>>     (implements ::thread_architecture),
>>>>
>>>>   - GDB by default writes out a single NT_GDB_TDESC which describes the
>>>>     "per-inferior" gdbarch, but this doesn't correctly represent the
>>>>     per-thread gdbarch/tdesc,
>>>>
>>>>   - If a NT_GDB_TDESC is present then GDB is going to try and load it
>>>>     back in and use it,
>>>>
>>>>   - This is a problem that has existed for a while, but you've only just
>>>>     become aware, and so you're fixing it here,
>>>>
>>>>   - The new gdbarch hook allows an architecture to avoid loading a
>>>>     single NT_GDB_TDESC note, for AArch64 this is when we see registers
>>>>     that are (likely) going to require per-thread gdbarch/tdesc,
>>>>
>>>
>>> All of the above are correct.
>>>
>>>> I'm hoping we both agree on the above.  So from there I think I has two
>>>> concerns:
>>>>
>>>>  1. GDB should avoid writing out the NT_GDB_TDESC is the situation where
>>>>     it know that a single such note is not correct, and
>>>
>>> I see your point now. Not outputting NT_GDB_TDESC when it is incorrect sounds
>>> reasonable, even if the incorrect NT_GDB_TDESC note won't be used.
>>>
>>> It is also reasonable to do it until the point where we support per-thread
>>> NT_GDB_TDESC's, which is, I think, the proper solution.
>>>
>>>>
>>>>  2. I wonder if there's a better solution than the new hook.
>>>>
>>>> For point (1) the patch below is a rough draft of what I'm thinking
>>>> about: by looking at each of the gdbarch pointers we can spot if an
>>>> inferior uses multiple different gdbarches -- if it does then we know
>>>> that a single NT_GDB_TDESC is going to be wrong, so lets not write it
>>>> out.
>>>>
>>>> As was mentioned on IRC, longer term, it might be better if we wrote out
>>>> one tdesc for each thread, but that feels like a bigger job, and I think
>>>> stopping GDB doing something that's just *wrong* is pretty easy, so why
>>>> wouldn't we do that?
>>>
>>> I think outputting per-thread NT_GDB_TDESC's is fairly easy. Probably a matter of
>>> calling the function that outputs NT_GDB_TDESC while we're dumping register sets for
>>> each thread.
>> 
>> Indeed, I have this bit done locally.  What isn't clear to me is how
>> having multiple NT_GDB_TDESC will help -- does
>> core_file::read_description get called multiple times when using AArch64
>> with sve/sme?
>> 
>
> No, it doesn't get called multiple times. And to be honest, I don't think it would help at
> the moment, even if we did output multiple NT_GDB_TDESC entries.
>
> The reason being, IIRC, that we lack a thread_architecture hook. That's available for the
> AArch64 Linux native target and there have been attempts at having it for the remote
> target as well (which would fix the problem of communicating vector length changes over RSP).
>
> So, similarly, we'd need something like a thread_architecture implementation for core files,
> otherwise it wouldn't solve the problem completely.
>
> To clarify, the more generic issue of per-thread target descriptions (with or without NT_GDB_TDESC's)
> from threaded core files isn't completely solved by this patch either.  I haven't gone through that path
> yet, as it is a bit outside the scope of sme1/sme2 enablement. It is an orthogonal problem, that I happened
> to have a pass at improving (not completely fixing).
>
> With this patch, at least gdb can load the core file and display sane values for single-threaded
> core files with sme data.

OK, that makes so much sense -- I assumed multi-threaded stuff was
handled, but I couldn't figure out how it was supposed to work!

>
>>>
>>> I haven't investigated in-depth how to use that information when loading a core file, but
>>> there might be some complexity because we read the target description before we load the threads.
>>>
>>> If the target description we read first is *always* the one for the signalled thread, then we might
>>> be ok and it should make the implementation easier. Anyway, that's a
>>> bit off-topic for this patch.
>> 
>> I mean, maybe this is a better solution?  What if
>> gcore_elf_make_tdesc_note took a 'struct gdbarch *' argument, and it was
>> the callers to chose which gdbarch to pass.  We would then spec in the
>> comments of gcore_elf_make_tdesc_note that the passed in gdbarch should
>> be the gdbarch of the signalled thread.  See the patch below.
>
> Sure, that makes sense, and I don't disagree with that. But as I mentioned above, it is an
> orthogonal problem that might be better suited as its own patch/series. A more complete patch
> addressing this particular situation I stumbled upon.

That's fine.  I understand the urgency you have.  For me, the only issue
I have here is that the new hook you add is to prevent GDB loading a
"broken" NT_GDB_TDESC. I'd also like this patch to include code that
prevents GDB from writing out "broken" NT_GDB_TDESC.

I understand we're likely stuck with the "don't load the NT_GDB_TDESC"
hook due to broken GDB's already existing in the wild.  That sucks, but
it is what it is.

However, having identified some broken behaviour I think we really
should fix that at the same time as adding the work around.

>
>> 
>> Could I ask: would you mind sending me a small application and corefile
>> for AArch64 with sve/sme please.  I'd love to have a play to better
>> understand how GDB sets up the per-thread gdbarch in that case.  I'd be
>> very grateful.
>> 
>
> Absolutely, that would be no problem. I can send you some of those. You will need more than
> one, as the different states might influence things. But, as I explained earlier, gdb won't
> setup the per-thread gdbarch in that case.
>
> If you'd still like to have a go, let me know which ones you want (no sve, sve,
> sve+sme, threads).

OK, that's probably not necessary, I was mostly trying to figure out how
the multi-threaded core-file support worked, and I guess the answer is
that right now, it doesn't in the general case, but, as you mention, if
all threads use the same tdesc, then we're probably fine.  Knowing that,
things make much more sense.

>
>>>
>>>>
>>>> For point (2) I agree with the premise that we need a mechanism to stop
>>>> AArch64 loading the incorrect single NT_GDB_TDESC.  This is a slight
>>>> change in stance from what I originally wrote, but our IRC conversation
>>>> showed me I was wrong originally.  I don't have time this evening to
>>>> look at this, but will follow up again tomorrow with more thoughts.
>> 
>> I wonder, instead of adding the new hook, maybe we should just reorder
>> the checks in core_target::read_description -- ask the gdbarch to grok a
>> tdesc from the .regs (etc) sections, and if that fails, check for an
>> encoded tdesc.
>> 
>
> That's exactly what I did a few versions earlier. But Simon pointed out it would
> effectively be a conflicting situation given our choice of defaulting to reading the
> target description from the NT_GDB_TDESC. So I went with the hook, which, to be honest,
> seems cleaner.
>
>> When I originally added this code my problem case was RISC-V, where the
>> CSRs (control regs) for embedded targets can be different for each
>> target.  The CSRs are just encoded as a blob, so unless you already know
>> which regs are present you're stuck.  For real h/w the controller
>> (e.g. openocd) provides a tdesc (via target.xml), but for a core file we
>> needed a way to reacquire the tdesc.
>> 
>> My assumption at the time that the tdesc we wrote would always be
>> correct, but clearly this isn't the case.
>
> That's true. With the introduction of dynamic target descriptions like sve and sme, this
> no longer holds true.

OK. But is it really the case that the assumption is wrong, or is it
that the current code (that I added) is just using the wrong tdesc?

I know you described the idea of updating gcore_elf_make_tdesc_note to
take a gdbarch* as orthogonal, but I really think this might be the way
to solve this problem.

I've attached a patch at the end of this email that can be applied to
head of master BEFORE your patch series.  This patch updates the
NT_GDB_TDESC generation to use the gdbarch of the signalled thread.

Now, if I've understood how everything works, I believe that the
NT_GDB_TDESC that is emitted _with_ my patch should be good enough -- it
should be the correct tdesc for the signalled thread, which, if all
threads use the same tdesc, should be good enough.

As a test, you can apply my patch, and then TEMPORARILY update
aarch64_use_target_description_from_corefile_notes to always return
true.  Using such a GDB you should be able to create a core file, and
then load it back in correctly.

As I already mentioned, I acknowledge that we are stuck with the
aarch64_use_target_description_from_corefile_notes hook -- there are
broken GDBs in the wild.

But, if I'm correct, then I think this is the best solution for now,
we're emitting a NT_GDB_TDESC that is mostly correct, but AArch64 will
continue to ignore it, just as you originally proposed.

>
>> 
>> Right now RISC-V doesn't event override gdbarch_core_read_description,
>> so if we read the encoded tdesc after checking that gdbarch hook nothing
>> would change.
>
> That was my assessment back in the earlier versions of this series.
>
>> 
>> For other architectures, if gdbarch_core_read_description is implemented
>> then we'll be back to using that, and that worked fine before.
>> 
>> ... And if, in some future world we do implement
>> gdbarch_core_read_description for RISC-V then the solution seems
>> obvious, if there's a section containing CSRs, and there's also a
>> section containing a tdesc, then the RISC-V
>> gdbarch_core_read_description can just return nullptr, safe in the
>> knowledge the generic GDB code will load the encoded tdesc.
>> 
>>>
>>> Except for some different names and tweaks, your proposed approach works correctly.
>>>
>>> I tested this and noticed the lack of NT_GDB_TDESC for a AArch64 Linux target
>>> with sve/sme support and one thread with a differing gdbarch. I also saw the note for
>>> a AArch64 Linux target without sve/sme support, or with sve/sme support but all threads
>>> having the exact same gdbarch. So that looks good to me.
>>>
>>> Would you like this extra code to be included as part of this
>>> particular patch?
>> 
>> Yeah I think something like this should be added to this patch, we
>> should stop GDB creating incorrect core files I think.
>
> Got it.
>
> I'm afraid I'm going to need some clarification on ways forward with regards to this particular
> series. Do you still want this (or other) potential changes as part of this series or would you
> be ok with a follow-up patch/series to (try to, haven't gone in-depth yet) address this particular
> core file/gdbarch/threads situation?

Here's what I propose:

  1. Try the patch I've supplied here, if this works for you (see above)
  then I think this is the best solution, we merge this, then go with v7
  of this patch unmodified,

  2. If the patch above doesn't work, then we go with the dynamic
  gdbarch detection I proposed, which I think you said worked fine to
  prevent "broken" NT_GDB_TDESC from being emitted.

>
> I'd like to have the sme support in for soon-to-branch gdb 14. I don't know how long the community
> would be willing to wait for a potential fix for this problem before
> branching/releasing.

Understood.  I think we have a plan.

> I don't see it as a critical problem, to be honest. Most of the concern is around core files generated
> by threaded programs for which some threads have changed the vector length mid-execution. That is
> an uncommon case. The most common case is that of all threads having the same gdbarch, because they
> are all using the same vector length (for sve, sme etc).

Even signal threaded programs are going to emit the wrong NT_GDB_TDESC
right now though, so I'd really like to at least solve that problem.

I'm 100% NOT suggesting we need to solve the general, many threads with
many architectures problem right now.  I'm not even suggesting that you
need to promise to immediately start looking at it.  I just want GDB to
not emit wrong information in the simple case.  That's all.

---

commit 06546f13ffc5b71fdc65190b529b29575d4f4865
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Mon Sep 25 10:32:14 2023 +0100

    gdb/corefile: write NT_GDB_TDESC based on signalled thread
    
    When creating a core file from within GDB we include a NT_GDB_TDESC
    that includes the target description of the architecture in use.
    
    For architectures with dynamic architectures (e.g. AArch64 with
    sve/sme) the original architecture, calculated from the original
    target description, might not match the per-thread architecture.
    
    In the general case, where each thread has a different architecture,
    then we really need a separate NT_GDB_TDESC for each thread, however,
    there's currently no way to read in multiple NT_GDB_TDESC.
    
    This commit is a step towards per-thread NT_GDB_TDESC.  In this commit
    I have updated the function that writes the NT_GDB_TDESC to accept a
    gdbarch (rather than calling target_gdbarch() to find a gdbarch), and
    I now pass in the gdbarch of the signalled thread.
    
    In many cases (though NOT all) targets with dynamic architectures
    really only use a single architecture, even when there are multiple
    threads, so in the common case, this should ensure that GDB emits an
    architecture that is more likely to be correct.
    
    Additional work will be needed in order to support corefiles with
    truly per-thread architectures, but that will need to be done in the
    future.

diff --git a/gdb/elf-none-tdep.c b/gdb/elf-none-tdep.c
index 460f02e7dbb..1e99b2fbb37 100644
--- a/gdb/elf-none-tdep.c
+++ b/gdb/elf-none-tdep.c
@@ -110,8 +110,12 @@ elf_none_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd,
     }
 
 
-  /* Target description.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  /* Include the target description when possible.  Some architectures
+     allow for per-thread gdbarch so we should really be emitting a tdesc
+     per-thread, however, we don't currently support reading in a
+     per-thread tdesc, so just emit the tdesc for the signalled thread.  */
+  gdbarch = target_thread_architecture (signalled_thr->ptid);
+  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
 
   return note_data;
 }
diff --git a/gdb/fbsd-tdep.c b/gdb/fbsd-tdep.c
index dc5020d92d2..d166d785736 100644
--- a/gdb/fbsd-tdep.c
+++ b/gdb/fbsd-tdep.c
@@ -773,8 +773,12 @@ fbsd_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
 	return NULL;
     }
 
-  /* Include the target description when possible.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  /* Include the target description when possible.  Some architectures
+     allow for per-thread gdbarch so we should really be emitting a tdesc
+     per-thread, however, we don't currently support reading in a
+     per-thread tdesc, so just emit the tdesc for the signalled thread.  */
+  gdbarch = target_thread_architecture (signalled_thr->ptid);
+  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
 
   return note_data;
 }
diff --git a/gdb/gcore-elf.c b/gdb/gcore-elf.c
index 643426d911b..0142db82670 100644
--- a/gdb/gcore-elf.c
+++ b/gdb/gcore-elf.c
@@ -139,12 +139,12 @@ gcore_elf_build_thread_register_notes
 /* See gcore-elf.h.  */
 
 void
-gcore_elf_make_tdesc_note (bfd *obfd,
+gcore_elf_make_tdesc_note (struct gdbarch *gdbarch, bfd *obfd,
 			   gdb::unique_xmalloc_ptr<char> *note_data,
 			   int *note_size)
 {
   /* Append the target description to the core file.  */
-  const struct target_desc *tdesc = gdbarch_target_desc (target_gdbarch ());
+  const struct target_desc *tdesc = gdbarch_target_desc (gdbarch);
   const char *tdesc_xml
     = tdesc == nullptr ? nullptr : tdesc_get_features_xml (tdesc);
   if (tdesc_xml != nullptr && *tdesc_xml != '\0')
diff --git a/gdb/gcore-elf.h b/gdb/gcore-elf.h
index 2cfeb3e8550..826e3bebcee 100644
--- a/gdb/gcore-elf.h
+++ b/gdb/gcore-elf.h
@@ -37,11 +37,12 @@ extern void gcore_elf_build_thread_register_notes
    bfd *obfd, gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
 
 /* Add content to *NOTE_DATA (and update *NOTE_SIZE) to include a note
-   containing the current target's target description.  The core file is
+   containing the target description for GDBARCH.  The core file is
    being written to OBFD.  If something goes wrong then *NOTE_DATA can be
    set to nullptr.  */
 
 extern void gcore_elf_make_tdesc_note
-  (bfd *obfd, gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
+  (struct gdbarch *gdbarch, bfd *obfd,
+   gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
 
 #endif /* GCORE_ELF_H */
diff --git a/gdb/linux-tdep.c b/gdb/linux-tdep.c
index b5eee5e108c..f82a3e44839 100644
--- a/gdb/linux-tdep.c
+++ b/gdb/linux-tdep.c
@@ -2128,8 +2128,12 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
   /* File mappings.  */
   linux_make_mappings_corefile_notes (gdbarch, obfd, note_data, note_size);
 
-  /* Target description.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  /* Include the target description when possible.  Some architectures
+     allow for per-thread gdbarch so we should really be emitting a tdesc
+     per-thread, however, we don't currently support reading in a
+     per-thread tdesc, so just emit the tdesc for the signalled thread.  */
+  gdbarch = target_thread_architecture (signalled_thr->ptid);
+  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
 
   return note_data;
 }
  
Simon Marchi Sept. 25, 2023, 3:41 p.m. UTC | #10
On 9/21/23 06:44, Luis Machado via Gdb-patches wrote:
>>>> For point (2) I agree with the premise that we need a mechanism to stop
>>>> AArch64 loading the incorrect single NT_GDB_TDESC.  This is a slight
>>>> change in stance from what I originally wrote, but our IRC conversation
>>>> showed me I was wrong originally.  I don't have time this evening to
>>>> look at this, but will follow up again tomorrow with more thoughts.
>>
>> I wonder, instead of adding the new hook, maybe we should just reorder
>> the checks in core_target::read_description -- ask the gdbarch to grok a
>> tdesc from the .regs (etc) sections, and if that fails, check for an
>> encoded tdesc.
>>
> 
> That's exactly what I did a few versions earlier. But Simon pointed out it would
> effectively be a conflicting situation given our choice of defaulting to reading the
> target description from the NT_GDB_TDESC. So I went with the hook, which, to be honest,
> seems cleaner.

Hi,

I did not have the complete historical picture when I suggested this,
Andrew later explained that this was added for some RISC-V target for
which a tdesc could not be reconstructed from just the dumped register
state.  I thought that the NT_GDB_TDESC note was a better source of
truth than just the register state.  My understanding now is that for
most targets (including AArch64 with SME and friends), reconstructing
the tdesc from the register state works equally well as reading an
NT_GDB_TDESC note (that holds the right tdesc).  So, we can say that
having the NT_GDB_TDESC note there is useless in these cases.

So now, my question is: when we generate a core, should we only generate
an NT_GDB_TDESC note for those targets for which we know a tdesc note
will be necessary?  That would involve a gdbarch hook where the arch can
say "yes please generate a tdesc note in this core", which would default
to false.

Simon
  
Luis Machado Sept. 26, 2023, 12:39 p.m. UTC | #11
Hi Andrew,

On 9/25/23 10:57, Andrew Burgess wrote:
> Luis Machado <luis.machado@arm.com> writes:
>> Sure, that makes sense, and I don't disagree with that. But as I mentioned above, it is an
>> orthogonal problem that might be better suited as its own patch/series. A more complete patch
>> addressing this particular situation I stumbled upon.
> 
> That's fine.  I understand the urgency you have.  For me, the only issue
> I have here is that the new hook you add is to prevent GDB loading a
> "broken" NT_GDB_TDESC. I'd also like this patch to include code that
> prevents GDB from writing out "broken" NT_GDB_TDESC.
> 
> I understand we're likely stuck with the "don't load the NT_GDB_TDESC"
> hook due to broken GDB's already existing in the wild.  That sucks, but
> it is what it is.
> 
> However, having identified some broken behaviour I think we really
> should fix that at the same time as adding the work around.
> 

Yes, I see your point and think it's reasonable.

>>> When I originally added this code my problem case was RISC-V, where the
>>> CSRs (control regs) for embedded targets can be different for each
>>> target.  The CSRs are just encoded as a blob, so unless you already know
>>> which regs are present you're stuck.  For real h/w the controller
>>> (e.g. openocd) provides a tdesc (via target.xml), but for a core file we
>>> needed a way to reacquire the tdesc.
>>>
>>> My assumption at the time that the tdesc we wrote would always be
>>> correct, but clearly this isn't the case.
>>
>> That's true. With the introduction of dynamic target descriptions like sve and sme, this
>> no longer holds true.
> 
> OK. But is it really the case that the assumption is wrong, or is it
> that the current code (that I added) is just using the wrong tdesc?

For the general case, it is the wrong assumption because you can have threads
with different tdesc's/gdbarches, and using a single NT_GDB_TDESC doesn't cover
that part.

Putting aside the threaded case, it can also be incorrect for the single-threaded
case.

Imagine you start up with a vector length of 256 bits (vg == 4). At this point the
gdbarch/target description that gets stored in the inferior (the tdesc is still a
per-inferior entry IMO) says the vector length is 256 bits.

Now imagine we executed a few instructions and now the vector length is 128 bits
(vg == 2). Though gdb will cope with it just fine during debugging, because
aarch64/linux implements the thread_architecture hook, if we attempt to output
a gcore core file, the current code (without your proposed patch) will dump
the target description/gdbarch cached in the inferior. And that one says we
have a vector length of 256 bits.

So the tdesc is incorrect when we try to load the gcore file, if we trust the
NT_GDB_TDESC note. The register data in the notes has the correct information
though.

Your attached patch fixes this because it uses the signalled thread's
gdbarch/tdesc instead of the cached inferior gdbarch/tdesc.

target_gdbarch can be a bit dangerous unfortunately, but we still have to use
it in some places.

> 
> I know you described the idea of updating gcore_elf_make_tdesc_note to
> take a gdbarch* as orthogonal, but I really think this might be the way
> to solve this problem.
> 
> I've attached a patch at the end of this email that can be applied to
> head of master BEFORE your patch series.  This patch updates the
> NT_GDB_TDESC generation to use the gdbarch of the signalled thread.
> 
> Now, if I've understood how everything works, I believe that the
> NT_GDB_TDESC that is emitted _with_ my patch should be good enough -- it
> should be the correct tdesc for the signalled thread, which, if all
> threads use the same tdesc, should be good enough.
> 
> As a test, you can apply my patch, and then TEMPORARILY update
> aarch64_use_target_description_from_corefile_notes to always return
> true.  Using such a GDB you should be able to create a core file, and
> then load it back in correctly.

Yes, that works correctly from testing on my end. So at a minimum that's
already an improvement, as changing the vector length for one (or all threads,
but with all of them still using the same vector length) during execution is a
useful use case. Like a program setting up for some sve operation, and all of a
someone wants to dump a core file.

> 
> As I already mentioned, I acknowledge that we are stuck with the
> aarch64_use_target_description_from_corefile_notes hook -- there are
> broken GDBs in the wild.

Though unfortunate, the most common core files are likely coming from crashes
outside a debugging session (so not gcore).

> 
> But, if I'm correct, then I think this is the best solution for now,
> we're emitting a NT_GDB_TDESC that is mostly correct, but AArch64 will
> continue to ignore it, just as you originally proposed.
> 

Sorry, it took me a little bit to get some tests running on the simulator. Happy to
report that I see no regressions in terms of testing with the changes you proposed for
handling the signalled thread.

Also, I checked the case I described above about a program starting with vg == 4 and then
changing it to vg == 2 before we output a gcore file.

When I load the corefile back in a patched gdb (and forcing gdb to use the tdesc, by always
returning true in the aarch64-linux hook), I can see that the tdesc is the correct
one (vector length indicating vg == 2).

Without your patch, gdb used the inferior's cached gdbarch/tdesc to dump the NT_GDB_TDESC
note, which means the vector length was vg == 4, when in reality it should've been
vg == 2, as we had changed it before dumping the core file.

>>
>>>
>>> Right now RISC-V doesn't event override gdbarch_core_read_description,
>>> so if we read the encoded tdesc after checking that gdbarch hook nothing
>>> would change.
>>
>> That was my assessment back in the earlier versions of this series.
>>
>>>
>>> For other architectures, if gdbarch_core_read_description is implemented
>>> then we'll be back to using that, and that worked fine before.
>>>
>>> ... And if, in some future world we do implement
>>> gdbarch_core_read_description for RISC-V then the solution seems
>>> obvious, if there's a section containing CSRs, and there's also a
>>> section containing a tdesc, then the RISC-V
>>> gdbarch_core_read_description can just return nullptr, safe in the
>>> knowledge the generic GDB code will load the encoded tdesc.
>>>
>>>>
>>>> Except for some different names and tweaks, your proposed approach works correctly.
>>>>
>>>> I tested this and noticed the lack of NT_GDB_TDESC for a AArch64 Linux target
>>>> with sve/sme support and one thread with a differing gdbarch. I also saw the note for
>>>> a AArch64 Linux target without sve/sme support, or with sve/sme support but all threads
>>>> having the exact same gdbarch. So that looks good to me.
>>>>
>>>> Would you like this extra code to be included as part of this
>>>> particular patch?
>>>
>>> Yeah I think something like this should be added to this patch, we
>>> should stop GDB creating incorrect core files I think.
>>
>> Got it.
>>
>> I'm afraid I'm going to need some clarification on ways forward with regards to this particular
>> series. Do you still want this (or other) potential changes as part of this series or would you
>> be ok with a follow-up patch/series to (try to, haven't gone in-depth yet) address this particular
>> core file/gdbarch/threads situation?
> 
> Here's what I propose:to output the cached gdbarch/
> 
>   1. Try the patch I've supplied here, if this works for you (see above)
>   then I think this is the best solution, we merge this, then go with v7
>   of this patch unmodified,

It did work. Thanks!

But should we keep your suggested changes to check for distinct gdbarches in the threaded case? And
not output NT_GDB_TDESC if we find multiple distinct gdbarch entries?
  
Andrew Burgess Sept. 27, 2023, 5:44 p.m. UTC | #12
Simon Marchi <simon.marchi@polymtl.ca> writes:

> On 9/21/23 06:44, Luis Machado via Gdb-patches wrote:
>>>>> For point (2) I agree with the premise that we need a mechanism to stop
>>>>> AArch64 loading the incorrect single NT_GDB_TDESC.  This is a slight
>>>>> change in stance from what I originally wrote, but our IRC conversation
>>>>> showed me I was wrong originally.  I don't have time this evening to
>>>>> look at this, but will follow up again tomorrow with more thoughts.
>>>
>>> I wonder, instead of adding the new hook, maybe we should just reorder
>>> the checks in core_target::read_description -- ask the gdbarch to grok a
>>> tdesc from the .regs (etc) sections, and if that fails, check for an
>>> encoded tdesc.
>>>
>> 
>> That's exactly what I did a few versions earlier. But Simon pointed out it would
>> effectively be a conflicting situation given our choice of defaulting to reading the
>> target description from the NT_GDB_TDESC. So I went with the hook, which, to be honest,
>> seems cleaner.
>
> Hi,
>
> I did not have the complete historical picture when I suggested this,
> Andrew later explained that this was added for some RISC-V target for
> which a tdesc could not be reconstructed from just the dumped register
> state.  I thought that the NT_GDB_TDESC note was a better source of
> truth than just the register state.  My understanding now is that for
> most targets (including AArch64 with SME and friends), reconstructing
> the tdesc from the register state works equally well as reading an
> NT_GDB_TDESC note (that holds the right tdesc).  So, we can say that
> having the NT_GDB_TDESC note there is useless in these cases.
>
> So now, my question is: when we generate a core, should we only generate
> an NT_GDB_TDESC note for those targets for which we know a tdesc note
> will be necessary?  That would involve a gdbarch hook where the arch can
> say "yes please generate a tdesc note in this core", which would default
> to false.

I would not oppose such a change, but the problem here really was that
GDB was just writing out the wrong thing -- the generic, incorrect,
global tdesc, rather than the correct thread specific tdesc.

But you are correct, for hosted environments (Linux, FreeBSD), where the
OS can create a core file without a NT_GDB_TDESC, we really do need to
be able to figure out the architecture just from the core file contents,
so dropping the NT_GDB_TDESC for these cases really shouldn't be a
problem.

Thanks,
Andrew
  
Andrew Burgess Sept. 27, 2023, 5:56 p.m. UTC | #13
Luis Machado <luis.machado@arm.com> writes:

> Hi Andrew,
>
> On 9/25/23 10:57, Andrew Burgess wrote:
>> Luis Machado <luis.machado@arm.com> writes:
>>> Sure, that makes sense, and I don't disagree with that. But as I mentioned above, it is an
>>> orthogonal problem that might be better suited as its own patch/series. A more complete patch
>>> addressing this particular situation I stumbled upon.
>> 
>> That's fine.  I understand the urgency you have.  For me, the only issue
>> I have here is that the new hook you add is to prevent GDB loading a
>> "broken" NT_GDB_TDESC. I'd also like this patch to include code that
>> prevents GDB from writing out "broken" NT_GDB_TDESC.
>> 
>> I understand we're likely stuck with the "don't load the NT_GDB_TDESC"
>> hook due to broken GDB's already existing in the wild.  That sucks, but
>> it is what it is.
>> 
>> However, having identified some broken behaviour I think we really
>> should fix that at the same time as adding the work around.
>> 
>
> Yes, I see your point and think it's reasonable.
>
>>>> When I originally added this code my problem case was RISC-V, where the
>>>> CSRs (control regs) for embedded targets can be different for each
>>>> target.  The CSRs are just encoded as a blob, so unless you already know
>>>> which regs are present you're stuck.  For real h/w the controller
>>>> (e.g. openocd) provides a tdesc (via target.xml), but for a core file we
>>>> needed a way to reacquire the tdesc.
>>>>
>>>> My assumption at the time that the tdesc we wrote would always be
>>>> correct, but clearly this isn't the case.
>>>
>>> That's true. With the introduction of dynamic target descriptions like sve and sme, this
>>> no longer holds true.
>> 
>> OK. But is it really the case that the assumption is wrong, or is it
>> that the current code (that I added) is just using the wrong tdesc?
>
> For the general case, it is the wrong assumption because you can have threads
> with different tdesc's/gdbarches, and using a single NT_GDB_TDESC doesn't cover
> that part.
>
> Putting aside the threaded case, it can also be incorrect for the single-threaded
> case.
>
> Imagine you start up with a vector length of 256 bits (vg == 4). At this point the
> gdbarch/target description that gets stored in the inferior (the tdesc is still a
> per-inferior entry IMO) says the vector length is 256 bits.
>
> Now imagine we executed a few instructions and now the vector length is 128 bits
> (vg == 2). Though gdb will cope with it just fine during debugging, because
> aarch64/linux implements the thread_architecture hook, if we attempt to output
> a gcore core file, the current code (without your proposed patch) will dump
> the target description/gdbarch cached in the inferior. And that one says we
> have a vector length of 256 bits.
>
> So the tdesc is incorrect when we try to load the gcore file, if we trust the
> NT_GDB_TDESC note. The register data in the notes has the correct information
> though.

I think we agree on the above.

> Your attached patch fixes this because it uses the signalled thread's
> gdbarch/tdesc instead of the cached inferior gdbarch/tdesc.
>
> target_gdbarch can be a bit dangerous unfortunately, but we still have to use
> it in some places.

That's unfortunate.  I guess that might be because in some places we
have an inferior selected but no thread?

It feels like the right solution is to transition GDB to only having a
per-thread gdbarch/tdesc, which would avoid problems like this in the
future.

>
>> 
>> I know you described the idea of updating gcore_elf_make_tdesc_note to
>> take a gdbarch* as orthogonal, but I really think this might be the way
>> to solve this problem.
>> 
>> I've attached a patch at the end of this email that can be applied to
>> head of master BEFORE your patch series.  This patch updates the
>> NT_GDB_TDESC generation to use the gdbarch of the signalled thread.
>> 
>> Now, if I've understood how everything works, I believe that the
>> NT_GDB_TDESC that is emitted _with_ my patch should be good enough -- it
>> should be the correct tdesc for the signalled thread, which, if all
>> threads use the same tdesc, should be good enough.
>> 
>> As a test, you can apply my patch, and then TEMPORARILY update
>> aarch64_use_target_description_from_corefile_notes to always return
>> true.  Using such a GDB you should be able to create a core file, and
>> then load it back in correctly.
>
> Yes, that works correctly from testing on my end. So at a minimum that's
> already an improvement, as changing the vector length for one (or all threads,
> but with all of them still using the same vector length) during execution is a
> useful use case. Like a program setting up for some sve operation, and all of a
> someone wants to dump a core file.
>
>> 
>> As I already mentioned, I acknowledge that we are stuck with the
>> aarch64_use_target_description_from_corefile_notes hook -- there are
>> broken GDBs in the wild.
>
> Though unfortunate, the most common core files are likely coming from crashes
> outside a debugging session (so not gcore).
>
>> 
>> But, if I'm correct, then I think this is the best solution for now,
>> we're emitting a NT_GDB_TDESC that is mostly correct, but AArch64 will
>> continue to ignore it, just as you originally proposed.
>> 
>
> Sorry, it took me a little bit to get some tests running on the simulator. Happy to
> report that I see no regressions in terms of testing with the changes you proposed for
> handling the signalled thread.
>
> Also, I checked the case I described above about a program starting with vg == 4 and then
> changing it to vg == 2 before we output a gcore file.
>
> When I load the corefile back in a patched gdb (and forcing gdb to use the tdesc, by always
> returning true in the aarch64-linux hook), I can see that the tdesc is the correct
> one (vector length indicating vg == 2).
>
> Without your patch, gdb used the inferior's cached gdbarch/tdesc to dump the NT_GDB_TDESC
> note, which means the vector length was vg == 4, when in reality it should've been
> vg == 2, as we had changed it before dumping the core file.

This all sounds positive.

>
>>>
>>>>
>>>> Right now RISC-V doesn't event override gdbarch_core_read_description,
>>>> so if we read the encoded tdesc after checking that gdbarch hook nothing
>>>> would change.
>>>
>>> That was my assessment back in the earlier versions of this series.
>>>
>>>>
>>>> For other architectures, if gdbarch_core_read_description is implemented
>>>> then we'll be back to using that, and that worked fine before.
>>>>
>>>> ... And if, in some future world we do implement
>>>> gdbarch_core_read_description for RISC-V then the solution seems
>>>> obvious, if there's a section containing CSRs, and there's also a
>>>> section containing a tdesc, then the RISC-V
>>>> gdbarch_core_read_description can just return nullptr, safe in the
>>>> knowledge the generic GDB code will load the encoded tdesc.
>>>>
>>>>>
>>>>> Except for some different names and tweaks, your proposed approach works correctly.
>>>>>
>>>>> I tested this and noticed the lack of NT_GDB_TDESC for a AArch64 Linux target
>>>>> with sve/sme support and one thread with a differing gdbarch. I also saw the note for
>>>>> a AArch64 Linux target without sve/sme support, or with sve/sme support but all threads
>>>>> having the exact same gdbarch. So that looks good to me.
>>>>>
>>>>> Would you like this extra code to be included as part of this
>>>>> particular patch?
>>>>
>>>> Yeah I think something like this should be added to this patch, we
>>>> should stop GDB creating incorrect core files I think.
>>>
>>> Got it.
>>>
>>> I'm afraid I'm going to need some clarification on ways forward with regards to this particular
>>> series. Do you still want this (or other) potential changes as part of this series or would you
>>> be ok with a follow-up patch/series to (try to, haven't gone in-depth yet) address this particular
>>> core file/gdbarch/threads situation?
>> 
>> Here's what I propose:to output the cached gdbarch/
>> 
>>   1. Try the patch I've supplied here, if this works for you (see above)
>>   then I think this is the best solution, we merge this, then go with v7
>>   of this patch unmodified,
>
> It did work. Thanks!
>
> But should we keep your suggested changes to check for distinct gdbarches in the threaded case? And
> not output NT_GDB_TDESC if we find multiple distinct gdbarch entries?

I don't think so.  As you've already said, in the core file case we
already don't support per-thread tdesc, even if we use the
aarch64_linux_core_read_description hook - that's only called once, and
will just generate a tdesc based on the main (signalled) thread as far
as I can tell.

Telling GDB to dump per-thread NT_GDB_TDESC is trivial, but I haven't
looked at updating the core file code to read in per-thread tdesc, but I
suspect you'll (eventually) need to solve that problem even when using
the aarch64_linux_core_read_description hook.

I know Simon has mentioned changing GDB to not write out NT_GDB_TDESC,
and I think that would be fine, however, I think we would still want my
proposed patch -- we should always write out the thread tdesc, not the
cached global one, as that is always going to be more accurate.

My proposal then would be: we merge my patch to write NT_GDB_TDESC using
the per-thread tdesc, then you merge your original V7 version of this
patch with no other changes.

Later, if we want, we can add a new hook to avoid writing the
NT_GDB_TDESC at all -- but that can come later, and would be in addition
to my proposed patch, not instead of it.

If you and Simon are happy with this plan then I'll go ahead and push my
patch, this should unblock you, and you'll be able to merge this series.

Let me know what you think,

Thanks,
Andrew
  
Luis Machado Sept. 28, 2023, 8:23 a.m. UTC | #14
On 9/27/23 18:56, Andrew Burgess wrote:
> Luis Machado <luis.machado@arm.com> writes:
> 
>> Hi Andrew,
>>
>> On 9/25/23 10:57, Andrew Burgess wrote:
>>> Luis Machado <luis.machado@arm.com> writes:
>>>> Sure, that makes sense, and I don't disagree with that. But as I mentioned above, it is an
>>>> orthogonal problem that might be better suited as its own patch/series. A more complete patch
>>>> addressing this particular situation I stumbled upon.
>>>
>>> That's fine.  I understand the urgency you have.  For me, the only issue
>>> I have here is that the new hook you add is to prevent GDB loading a
>>> "broken" NT_GDB_TDESC. I'd also like this patch to include code that
>>> prevents GDB from writing out "broken" NT_GDB_TDESC.
>>>
>>> I understand we're likely stuck with the "don't load the NT_GDB_TDESC"
>>> hook due to broken GDB's already existing in the wild.  That sucks, but
>>> it is what it is.
>>>
>>> However, having identified some broken behaviour I think we really
>>> should fix that at the same time as adding the work around.
>>>
>>
>> Yes, I see your point and think it's reasonable.
>>
>>>>> When I originally added this code my problem case was RISC-V, where the
>>>>> CSRs (control regs) for embedded targets can be different for each
>>>>> target.  The CSRs are just encoded as a blob, so unless you already know
>>>>> which regs are present you're stuck.  For real h/w the controller
>>>>> (e.g. openocd) provides a tdesc (via target.xml), but for a core file we
>>>>> needed a way to reacquire the tdesc.
>>>>>
>>>>> My assumption at the time that the tdesc we wrote would always be
>>>>> correct, but clearly this isn't the case.
>>>>
>>>> That's true. With the introduction of dynamic target descriptions like sve and sme, this
>>>> no longer holds true.
>>>
>>> OK. But is it really the case that the assumption is wrong, or is it
>>> that the current code (that I added) is just using the wrong tdesc?
>>
>> For the general case, it is the wrong assumption because you can have threads
>> with different tdesc's/gdbarches, and using a single NT_GDB_TDESC doesn't cover
>> that part.
>>
>> Putting aside the threaded case, it can also be incorrect for the single-threaded
>> case.
>>
>> Imagine you start up with a vector length of 256 bits (vg == 4). At this point the
>> gdbarch/target description that gets stored in the inferior (the tdesc is still a
>> per-inferior entry IMO) says the vector length is 256 bits.
>>
>> Now imagine we executed a few instructions and now the vector length is 128 bits
>> (vg == 2). Though gdb will cope with it just fine during debugging, because
>> aarch64/linux implements the thread_architecture hook, if we attempt to output
>> a gcore core file, the current code (without your proposed patch) will dump
>> the target description/gdbarch cached in the inferior. And that one says we
>> have a vector length of 256 bits.
>>
>> So the tdesc is incorrect when we try to load the gcore file, if we trust the
>> NT_GDB_TDESC note. The register data in the notes has the correct information
>> though.
> 
> I think we agree on the above.
> 
>> Your attached patch fixes this because it uses the signalled thread's
>> gdbarch/tdesc instead of the cached inferior gdbarch/tdesc.
>>
>> target_gdbarch can be a bit dangerous unfortunately, but we still have to use
>> it in some places.
> 
> That's unfortunate.  I guess that might be because in some places we
> have an inferior selected but no thread?
> 
> It feels like the right solution is to transition GDB to only having a
> per-thread gdbarch/tdesc, which would avoid problems like this in the
> future.
> 
>>
>>>
>>> I know you described the idea of updating gcore_elf_make_tdesc_note to
>>> take a gdbarch* as orthogonal, but I really think this might be the way
>>> to solve this problem.
>>>
>>> I've attached a patch at the end of this email that can be applied to
>>> head of master BEFORE your patch series.  This patch updates the
>>> NT_GDB_TDESC generation to use the gdbarch of the signalled thread.
>>>
>>> Now, if I've understood how everything works, I believe that the
>>> NT_GDB_TDESC that is emitted _with_ my patch should be good enough -- it
>>> should be the correct tdesc for the signalled thread, which, if all
>>> threads use the same tdesc, should be good enough.
>>>
>>> As a test, you can apply my patch, and then TEMPORARILY update
>>> aarch64_use_target_description_from_corefile_notes to always return
>>> true.  Using such a GDB you should be able to create a core file, and
>>> then load it back in correctly.
>>
>> Yes, that works correctly from testing on my end. So at a minimum that's
>> already an improvement, as changing the vector length for one (or all threads,
>> but with all of them still using the same vector length) during execution is a
>> useful use case. Like a program setting up for some sve operation, and all of a
>> someone wants to dump a core file.
>>
>>>
>>> As I already mentioned, I acknowledge that we are stuck with the
>>> aarch64_use_target_description_from_corefile_notes hook -- there are
>>> broken GDBs in the wild.
>>
>> Though unfortunate, the most common core files are likely coming from crashes
>> outside a debugging session (so not gcore).
>>
>>>
>>> But, if I'm correct, then I think this is the best solution for now,
>>> we're emitting a NT_GDB_TDESC that is mostly correct, but AArch64 will
>>> continue to ignore it, just as you originally proposed.
>>>
>>
>> Sorry, it took me a little bit to get some tests running on the simulator. Happy to
>> report that I see no regressions in terms of testing with the changes you proposed for
>> handling the signalled thread.
>>
>> Also, I checked the case I described above about a program starting with vg == 4 and then
>> changing it to vg == 2 before we output a gcore file.
>>
>> When I load the corefile back in a patched gdb (and forcing gdb to use the tdesc, by always
>> returning true in the aarch64-linux hook), I can see that the tdesc is the correct
>> one (vector length indicating vg == 2).
>>
>> Without your patch, gdb used the inferior's cached gdbarch/tdesc to dump the NT_GDB_TDESC
>> note, which means the vector length was vg == 4, when in reality it should've been
>> vg == 2, as we had changed it before dumping the core file.
> 
> This all sounds positive.
> 
>>
>>>>
>>>>>
>>>>> Right now RISC-V doesn't event override gdbarch_core_read_description,
>>>>> so if we read the encoded tdesc after checking that gdbarch hook nothing
>>>>> would change.
>>>>
>>>> That was my assessment back in the earlier versions of this series.
>>>>
>>>>>
>>>>> For other architectures, if gdbarch_core_read_description is implemented
>>>>> then we'll be back to using that, and that worked fine before.
>>>>>
>>>>> ... And if, in some future world we do implement
>>>>> gdbarch_core_read_description for RISC-V then the solution seems
>>>>> obvious, if there's a section containing CSRs, and there's also a
>>>>> section containing a tdesc, then the RISC-V
>>>>> gdbarch_core_read_description can just return nullptr, safe in the
>>>>> knowledge the generic GDB code will load the encoded tdesc.
>>>>>
>>>>>>
>>>>>> Except for some different names and tweaks, your proposed approach works correctly.
>>>>>>
>>>>>> I tested this and noticed the lack of NT_GDB_TDESC for a AArch64 Linux target
>>>>>> with sve/sme support and one thread with a differing gdbarch. I also saw the note for
>>>>>> a AArch64 Linux target without sve/sme support, or with sve/sme support but all threads
>>>>>> having the exact same gdbarch. So that looks good to me.
>>>>>>
>>>>>> Would you like this extra code to be included as part of this
>>>>>> particular patch?
>>>>>
>>>>> Yeah I think something like this should be added to this patch, we
>>>>> should stop GDB creating incorrect core files I think.
>>>>
>>>> Got it.
>>>>
>>>> I'm afraid I'm going to need some clarification on ways forward with regards to this particular
>>>> series. Do you still want this (or other) potential changes as part of this series or would you
>>>> be ok with a follow-up patch/series to (try to, haven't gone in-depth yet) address this particular
>>>> core file/gdbarch/threads situation?
>>>
>>> Here's what I propose:to output the cached gdbarch/
>>>
>>>   1. Try the patch I've supplied here, if this works for you (see above)
>>>   then I think this is the best solution, we merge this, then go with v7
>>>   of this patch unmodified,
>>
>> It did work. Thanks!
>>
>> But should we keep your suggested changes to check for distinct gdbarches in the threaded case? And
>> not output NT_GDB_TDESC if we find multiple distinct gdbarch entries?
> 
> I don't think so.  As you've already said, in the core file case we
> already don't support per-thread tdesc, even if we use the
> aarch64_linux_core_read_description hook - that's only called once, and
> will just generate a tdesc based on the main (signalled) thread as far
> as I can tell.
> 
> Telling GDB to dump per-thread NT_GDB_TDESC is trivial, but I haven't
> looked at updating the core file code to read in per-thread tdesc, but I
> suspect you'll (eventually) need to solve that problem even when using
> the aarch64_linux_core_read_description hook.
> 
> I know Simon has mentioned changing GDB to not write out NT_GDB_TDESC,
> and I think that would be fine, however, I think we would still want my
> proposed patch -- we should always write out the thread tdesc, not the
> cached global one, as that is always going to be more accurate.
> 
> My proposal then would be: we merge my patch to write NT_GDB_TDESC using
> the per-thread tdesc, then you merge your original V7 version of this
> patch with no other changes.
> 
> Later, if we want, we can add a new hook to avoid writing the
> NT_GDB_TDESC at all -- but that can come later, and would be in addition
> to my proposed patch, not instead of it.
> 
> If you and Simon are happy with this plan then I'll go ahead and push my
> patch, this should unblock you, and you'll be able to merge this series.
> 
> Let me know what you think,

That sounds good to me. Thanks for going over this.

Please let me know when you have pushed the NT_GDB_TDESC fixup.

Regards,
Luis
  
Andrew Burgess Oct. 3, 2023, 5:23 p.m. UTC | #15
Luis Machado <luis.machado@arm.com> writes:

> On 9/27/23 18:56, Andrew Burgess wrote:
>> Luis Machado <luis.machado@arm.com> writes:
>> 
>>> Hi Andrew,
>>>
>>> On 9/25/23 10:57, Andrew Burgess wrote:
>>>> Luis Machado <luis.machado@arm.com> writes:
>>>>> Sure, that makes sense, and I don't disagree with that. But as I mentioned above, it is an
>>>>> orthogonal problem that might be better suited as its own patch/series. A more complete patch
>>>>> addressing this particular situation I stumbled upon.
>>>>
>>>> That's fine.  I understand the urgency you have.  For me, the only issue
>>>> I have here is that the new hook you add is to prevent GDB loading a
>>>> "broken" NT_GDB_TDESC. I'd also like this patch to include code that
>>>> prevents GDB from writing out "broken" NT_GDB_TDESC.
>>>>
>>>> I understand we're likely stuck with the "don't load the NT_GDB_TDESC"
>>>> hook due to broken GDB's already existing in the wild.  That sucks, but
>>>> it is what it is.
>>>>
>>>> However, having identified some broken behaviour I think we really
>>>> should fix that at the same time as adding the work around.
>>>>
>>>
>>> Yes, I see your point and think it's reasonable.
>>>
>>>>>> When I originally added this code my problem case was RISC-V, where the
>>>>>> CSRs (control regs) for embedded targets can be different for each
>>>>>> target.  The CSRs are just encoded as a blob, so unless you already know
>>>>>> which regs are present you're stuck.  For real h/w the controller
>>>>>> (e.g. openocd) provides a tdesc (via target.xml), but for a core file we
>>>>>> needed a way to reacquire the tdesc.
>>>>>>
>>>>>> My assumption at the time that the tdesc we wrote would always be
>>>>>> correct, but clearly this isn't the case.
>>>>>
>>>>> That's true. With the introduction of dynamic target descriptions like sve and sme, this
>>>>> no longer holds true.
>>>>
>>>> OK. But is it really the case that the assumption is wrong, or is it
>>>> that the current code (that I added) is just using the wrong tdesc?
>>>
>>> For the general case, it is the wrong assumption because you can have threads
>>> with different tdesc's/gdbarches, and using a single NT_GDB_TDESC doesn't cover
>>> that part.
>>>
>>> Putting aside the threaded case, it can also be incorrect for the single-threaded
>>> case.
>>>
>>> Imagine you start up with a vector length of 256 bits (vg == 4). At this point the
>>> gdbarch/target description that gets stored in the inferior (the tdesc is still a
>>> per-inferior entry IMO) says the vector length is 256 bits.
>>>
>>> Now imagine we executed a few instructions and now the vector length is 128 bits
>>> (vg == 2). Though gdb will cope with it just fine during debugging, because
>>> aarch64/linux implements the thread_architecture hook, if we attempt to output
>>> a gcore core file, the current code (without your proposed patch) will dump
>>> the target description/gdbarch cached in the inferior. And that one says we
>>> have a vector length of 256 bits.
>>>
>>> So the tdesc is incorrect when we try to load the gcore file, if we trust the
>>> NT_GDB_TDESC note. The register data in the notes has the correct information
>>> though.
>> 
>> I think we agree on the above.
>> 
>>> Your attached patch fixes this because it uses the signalled thread's
>>> gdbarch/tdesc instead of the cached inferior gdbarch/tdesc.
>>>
>>> target_gdbarch can be a bit dangerous unfortunately, but we still have to use
>>> it in some places.
>> 
>> That's unfortunate.  I guess that might be because in some places we
>> have an inferior selected but no thread?
>> 
>> It feels like the right solution is to transition GDB to only having a
>> per-thread gdbarch/tdesc, which would avoid problems like this in the
>> future.
>> 
>>>
>>>>
>>>> I know you described the idea of updating gcore_elf_make_tdesc_note to
>>>> take a gdbarch* as orthogonal, but I really think this might be the way
>>>> to solve this problem.
>>>>
>>>> I've attached a patch at the end of this email that can be applied to
>>>> head of master BEFORE your patch series.  This patch updates the
>>>> NT_GDB_TDESC generation to use the gdbarch of the signalled thread.
>>>>
>>>> Now, if I've understood how everything works, I believe that the
>>>> NT_GDB_TDESC that is emitted _with_ my patch should be good enough -- it
>>>> should be the correct tdesc for the signalled thread, which, if all
>>>> threads use the same tdesc, should be good enough.
>>>>
>>>> As a test, you can apply my patch, and then TEMPORARILY update
>>>> aarch64_use_target_description_from_corefile_notes to always return
>>>> true.  Using such a GDB you should be able to create a core file, and
>>>> then load it back in correctly.
>>>
>>> Yes, that works correctly from testing on my end. So at a minimum that's
>>> already an improvement, as changing the vector length for one (or all threads,
>>> but with all of them still using the same vector length) during execution is a
>>> useful use case. Like a program setting up for some sve operation, and all of a
>>> someone wants to dump a core file.
>>>
>>>>
>>>> As I already mentioned, I acknowledge that we are stuck with the
>>>> aarch64_use_target_description_from_corefile_notes hook -- there are
>>>> broken GDBs in the wild.
>>>
>>> Though unfortunate, the most common core files are likely coming from crashes
>>> outside a debugging session (so not gcore).
>>>
>>>>
>>>> But, if I'm correct, then I think this is the best solution for now,
>>>> we're emitting a NT_GDB_TDESC that is mostly correct, but AArch64 will
>>>> continue to ignore it, just as you originally proposed.
>>>>
>>>
>>> Sorry, it took me a little bit to get some tests running on the simulator. Happy to
>>> report that I see no regressions in terms of testing with the changes you proposed for
>>> handling the signalled thread.
>>>
>>> Also, I checked the case I described above about a program starting with vg == 4 and then
>>> changing it to vg == 2 before we output a gcore file.
>>>
>>> When I load the corefile back in a patched gdb (and forcing gdb to use the tdesc, by always
>>> returning true in the aarch64-linux hook), I can see that the tdesc is the correct
>>> one (vector length indicating vg == 2).
>>>
>>> Without your patch, gdb used the inferior's cached gdbarch/tdesc to dump the NT_GDB_TDESC
>>> note, which means the vector length was vg == 4, when in reality it should've been
>>> vg == 2, as we had changed it before dumping the core file.
>> 
>> This all sounds positive.
>> 
>>>
>>>>>
>>>>>>
>>>>>> Right now RISC-V doesn't event override gdbarch_core_read_description,
>>>>>> so if we read the encoded tdesc after checking that gdbarch hook nothing
>>>>>> would change.
>>>>>
>>>>> That was my assessment back in the earlier versions of this series.
>>>>>
>>>>>>
>>>>>> For other architectures, if gdbarch_core_read_description is implemented
>>>>>> then we'll be back to using that, and that worked fine before.
>>>>>>
>>>>>> ... And if, in some future world we do implement
>>>>>> gdbarch_core_read_description for RISC-V then the solution seems
>>>>>> obvious, if there's a section containing CSRs, and there's also a
>>>>>> section containing a tdesc, then the RISC-V
>>>>>> gdbarch_core_read_description can just return nullptr, safe in the
>>>>>> knowledge the generic GDB code will load the encoded tdesc.
>>>>>>
>>>>>>>
>>>>>>> Except for some different names and tweaks, your proposed approach works correctly.
>>>>>>>
>>>>>>> I tested this and noticed the lack of NT_GDB_TDESC for a AArch64 Linux target
>>>>>>> with sve/sme support and one thread with a differing gdbarch. I also saw the note for
>>>>>>> a AArch64 Linux target without sve/sme support, or with sve/sme support but all threads
>>>>>>> having the exact same gdbarch. So that looks good to me.
>>>>>>>
>>>>>>> Would you like this extra code to be included as part of this
>>>>>>> particular patch?
>>>>>>
>>>>>> Yeah I think something like this should be added to this patch, we
>>>>>> should stop GDB creating incorrect core files I think.
>>>>>
>>>>> Got it.
>>>>>
>>>>> I'm afraid I'm going to need some clarification on ways forward with regards to this particular
>>>>> series. Do you still want this (or other) potential changes as part of this series or would you
>>>>> be ok with a follow-up patch/series to (try to, haven't gone in-depth yet) address this particular
>>>>> core file/gdbarch/threads situation?
>>>>
>>>> Here's what I propose:to output the cached gdbarch/
>>>>
>>>>   1. Try the patch I've supplied here, if this works for you (see above)
>>>>   then I think this is the best solution, we merge this, then go with v7
>>>>   of this patch unmodified,
>>>
>>> It did work. Thanks!
>>>
>>> But should we keep your suggested changes to check for distinct gdbarches in the threaded case? And
>>> not output NT_GDB_TDESC if we find multiple distinct gdbarch entries?
>> 
>> I don't think so.  As you've already said, in the core file case we
>> already don't support per-thread tdesc, even if we use the
>> aarch64_linux_core_read_description hook - that's only called once, and
>> will just generate a tdesc based on the main (signalled) thread as far
>> as I can tell.
>> 
>> Telling GDB to dump per-thread NT_GDB_TDESC is trivial, but I haven't
>> looked at updating the core file code to read in per-thread tdesc, but I
>> suspect you'll (eventually) need to solve that problem even when using
>> the aarch64_linux_core_read_description hook.
>> 
>> I know Simon has mentioned changing GDB to not write out NT_GDB_TDESC,
>> and I think that would be fine, however, I think we would still want my
>> proposed patch -- we should always write out the thread tdesc, not the
>> cached global one, as that is always going to be more accurate.
>> 
>> My proposal then would be: we merge my patch to write NT_GDB_TDESC using
>> the per-thread tdesc, then you merge your original V7 version of this
>> patch with no other changes.
>> 
>> Later, if we want, we can add a new hook to avoid writing the
>> NT_GDB_TDESC at all -- but that can come later, and would be in addition
>> to my proposed patch, not instead of it.
>> 
>> If you and Simon are happy with this plan then I'll go ahead and push my
>> patch, this should unblock you, and you'll be able to merge this series.
>> 
>> Let me know what you think,
>
> That sounds good to me. Thanks for going over this.
>
> Please let me know when you have pushed the NT_GDB_TDESC fixup.

I've pushed the patch.

I've included it again below, just for the record.

Thanks,
Andrew

---

commit c14993e9dc598a5ca126121f7d2a0898c26908bc
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Mon Sep 25 10:32:14 2023 +0100

    gdb/corefile: write NT_GDB_TDESC based on signalled thread
    
    When creating a core file from within GDB we include a NT_GDB_TDESC
    that includes the target description of the architecture in use.
    
    For architectures with dynamic architectures (e.g. AArch64 with
    sve/sme) the original architecture, calculated from the original
    target description, might not match the per-thread architecture.
    
    In the general case, where each thread has a different architecture,
    then we really need a separate NT_GDB_TDESC for each thread, however,
    there's currently no way to read in multiple NT_GDB_TDESC.
    
    This commit is a step towards per-thread NT_GDB_TDESC.  In this commit
    I have updated the function that writes the NT_GDB_TDESC to accept a
    gdbarch (rather than calling target_gdbarch() to find a gdbarch), and
    I now pass in the gdbarch of the signalled thread.
    
    In many cases (though NOT all) targets with dynamic architectures
    really only use a single architecture, even when there are multiple
    threads, so in the common case, this should ensure that GDB emits an
    architecture that is more likely to be correct.
    
    Additional work will be needed in order to support corefiles with
    truly per-thread architectures, but that will need to be done in the
    future.

diff --git a/gdb/elf-none-tdep.c b/gdb/elf-none-tdep.c
index ce2db02aa9d..cae66004a26 100644
--- a/gdb/elf-none-tdep.c
+++ b/gdb/elf-none-tdep.c
@@ -110,8 +110,12 @@ elf_none_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd,
     }
 
 
-  /* Target description.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  /* Include the target description when possible.  Some architectures
+     allow for per-thread gdbarch so we should really be emitting a tdesc
+     per-thread, however, we don't currently support reading in a
+     per-thread tdesc, so just emit the tdesc for the signalled thread.  */
+  gdbarch = target_thread_architecture (signalled_thr->ptid);
+  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
 
   return note_data;
 }
diff --git a/gdb/fbsd-tdep.c b/gdb/fbsd-tdep.c
index dc5020d92d2..d166d785736 100644
--- a/gdb/fbsd-tdep.c
+++ b/gdb/fbsd-tdep.c
@@ -773,8 +773,12 @@ fbsd_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
 	return NULL;
     }
 
-  /* Include the target description when possible.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  /* Include the target description when possible.  Some architectures
+     allow for per-thread gdbarch so we should really be emitting a tdesc
+     per-thread, however, we don't currently support reading in a
+     per-thread tdesc, so just emit the tdesc for the signalled thread.  */
+  gdbarch = target_thread_architecture (signalled_thr->ptid);
+  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
 
   return note_data;
 }
diff --git a/gdb/gcore-elf.c b/gdb/gcore-elf.c
index 643426d911b..0142db82670 100644
--- a/gdb/gcore-elf.c
+++ b/gdb/gcore-elf.c
@@ -139,12 +139,12 @@ gcore_elf_build_thread_register_notes
 /* See gcore-elf.h.  */
 
 void
-gcore_elf_make_tdesc_note (bfd *obfd,
+gcore_elf_make_tdesc_note (struct gdbarch *gdbarch, bfd *obfd,
 			   gdb::unique_xmalloc_ptr<char> *note_data,
 			   int *note_size)
 {
   /* Append the target description to the core file.  */
-  const struct target_desc *tdesc = gdbarch_target_desc (target_gdbarch ());
+  const struct target_desc *tdesc = gdbarch_target_desc (gdbarch);
   const char *tdesc_xml
     = tdesc == nullptr ? nullptr : tdesc_get_features_xml (tdesc);
   if (tdesc_xml != nullptr && *tdesc_xml != '\0')
diff --git a/gdb/gcore-elf.h b/gdb/gcore-elf.h
index 2cfeb3e8550..826e3bebcee 100644
--- a/gdb/gcore-elf.h
+++ b/gdb/gcore-elf.h
@@ -37,11 +37,12 @@ extern void gcore_elf_build_thread_register_notes
    bfd *obfd, gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
 
 /* Add content to *NOTE_DATA (and update *NOTE_SIZE) to include a note
-   containing the current target's target description.  The core file is
+   containing the target description for GDBARCH.  The core file is
    being written to OBFD.  If something goes wrong then *NOTE_DATA can be
    set to nullptr.  */
 
 extern void gcore_elf_make_tdesc_note
-  (bfd *obfd, gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
+  (struct gdbarch *gdbarch, bfd *obfd,
+   gdb::unique_xmalloc_ptr<char> *note_data, int *note_size);
 
 #endif /* GCORE_ELF_H */
diff --git a/gdb/linux-tdep.c b/gdb/linux-tdep.c
index 22bb9686e8d..98658fa5a3f 100644
--- a/gdb/linux-tdep.c
+++ b/gdb/linux-tdep.c
@@ -2128,8 +2128,12 @@ linux_make_corefile_notes (struct gdbarch *gdbarch, bfd *obfd, int *note_size)
   /* File mappings.  */
   linux_make_mappings_corefile_notes (gdbarch, obfd, note_data, note_size);
 
-  /* Target description.  */
-  gcore_elf_make_tdesc_note (obfd, &note_data, note_size);
+  /* Include the target description when possible.  Some architectures
+     allow for per-thread gdbarch so we should really be emitting a tdesc
+     per-thread, however, we don't currently support reading in a
+     per-thread tdesc, so just emit the tdesc for the signalled thread.  */
+  gdbarch = target_thread_architecture (signalled_thr->ptid);
+  gcore_elf_make_tdesc_note (gdbarch, obfd, &note_data, note_size);
 
   return note_data;
 }
  
Luis Machado Oct. 4, 2023, 3:27 p.m. UTC | #16
On 10/3/23 18:23, Andrew Burgess wrote:
> Luis Machado <luis.machado@arm.com> writes:
> 
>> On 9/27/23 18:56, Andrew Burgess wrote:
>>> Luis Machado <luis.machado@arm.com> writes:
>>>
>>>> Hi Andrew,
>>>>
>>>> On 9/25/23 10:57, Andrew Burgess wrote:
>>>>> Luis Machado <luis.machado@arm.com> writes:
>>>>>> Sure, that makes sense, and I don't disagree with that. But as I mentioned above, it is an
>>>>>> orthogonal problem that might be better suited as its own patch/series. A more complete patch
>>>>>> addressing this particular situation I stumbled upon.
>>>>>
>>>>> That's fine.  I understand the urgency you have.  For me, the only issue
>>>>> I have here is that the new hook you add is to prevent GDB loading a
>>>>> "broken" NT_GDB_TDESC. I'd also like this patch to include code that
>>>>> prevents GDB from writing out "broken" NT_GDB_TDESC.
>>>>>
>>>>> I understand we're likely stuck with the "don't load the NT_GDB_TDESC"
>>>>> hook due to broken GDB's already existing in the wild.  That sucks, but
>>>>> it is what it is.
>>>>>
>>>>> However, having identified some broken behaviour I think we really
>>>>> should fix that at the same time as adding the work around.
>>>>>
>>>>
>>>> Yes, I see your point and think it's reasonable.
>>>>
>>>>>>> When I originally added this code my problem case was RISC-V, where the
>>>>>>> CSRs (control regs) for embedded targets can be different for each
>>>>>>> target.  The CSRs are just encoded as a blob, so unless you already know
>>>>>>> which regs are present you're stuck.  For real h/w the controller
>>>>>>> (e.g. openocd) provides a tdesc (via target.xml), but for a core file we
>>>>>>> needed a way to reacquire the tdesc.
>>>>>>>
>>>>>>> My assumption at the time that the tdesc we wrote would always be
>>>>>>> correct, but clearly this isn't the case.
>>>>>>
>>>>>> That's true. With the introduction of dynamic target descriptions like sve and sme, this
>>>>>> no longer holds true.
>>>>>
>>>>> OK. But is it really the case that the assumption is wrong, or is it
>>>>> that the current code (that I added) is just using the wrong tdesc?
>>>>
>>>> For the general case, it is the wrong assumption because you can have threads
>>>> with different tdesc's/gdbarches, and using a single NT_GDB_TDESC doesn't cover
>>>> that part.
>>>>
>>>> Putting aside the threaded case, it can also be incorrect for the single-threaded
>>>> case.
>>>>
>>>> Imagine you start up with a vector length of 256 bits (vg == 4). At this point the
>>>> gdbarch/target description that gets stored in the inferior (the tdesc is still a
>>>> per-inferior entry IMO) says the vector length is 256 bits.
>>>>
>>>> Now imagine we executed a few instructions and now the vector length is 128 bits
>>>> (vg == 2). Though gdb will cope with it just fine during debugging, because
>>>> aarch64/linux implements the thread_architecture hook, if we attempt to output
>>>> a gcore core file, the current code (without your proposed patch) will dump
>>>> the target description/gdbarch cached in the inferior. And that one says we
>>>> have a vector length of 256 bits.
>>>>
>>>> So the tdesc is incorrect when we try to load the gcore file, if we trust the
>>>> NT_GDB_TDESC note. The register data in the notes has the correct information
>>>> though.
>>>
>>> I think we agree on the above.
>>>
>>>> Your attached patch fixes this because it uses the signalled thread's
>>>> gdbarch/tdesc instead of the cached inferior gdbarch/tdesc.
>>>>
>>>> target_gdbarch can be a bit dangerous unfortunately, but we still have to use
>>>> it in some places.
>>>
>>> That's unfortunate.  I guess that might be because in some places we
>>> have an inferior selected but no thread?
>>>
>>> It feels like the right solution is to transition GDB to only having a
>>> per-thread gdbarch/tdesc, which would avoid problems like this in the
>>> future.
>>>
>>>>
>>>>>
>>>>> I know you described the idea of updating gcore_elf_make_tdesc_note to
>>>>> take a gdbarch* as orthogonal, but I really think this might be the way
>>>>> to solve this problem.
>>>>>
>>>>> I've attached a patch at the end of this email that can be applied to
>>>>> head of master BEFORE your patch series.  This patch updates the
>>>>> NT_GDB_TDESC generation to use the gdbarch of the signalled thread.
>>>>>
>>>>> Now, if I've understood how everything works, I believe that the
>>>>> NT_GDB_TDESC that is emitted _with_ my patch should be good enough -- it
>>>>> should be the correct tdesc for the signalled thread, which, if all
>>>>> threads use the same tdesc, should be good enough.
>>>>>
>>>>> As a test, you can apply my patch, and then TEMPORARILY update
>>>>> aarch64_use_target_description_from_corefile_notes to always return
>>>>> true.  Using such a GDB you should be able to create a core file, and
>>>>> then load it back in correctly.
>>>>
>>>> Yes, that works correctly from testing on my end. So at a minimum that's
>>>> already an improvement, as changing the vector length for one (or all threads,
>>>> but with all of them still using the same vector length) during execution is a
>>>> useful use case. Like a program setting up for some sve operation, and all of a
>>>> someone wants to dump a core file.
>>>>
>>>>>
>>>>> As I already mentioned, I acknowledge that we are stuck with the
>>>>> aarch64_use_target_description_from_corefile_notes hook -- there are
>>>>> broken GDBs in the wild.
>>>>
>>>> Though unfortunate, the most common core files are likely coming from crashes
>>>> outside a debugging session (so not gcore).
>>>>
>>>>>
>>>>> But, if I'm correct, then I think this is the best solution for now,
>>>>> we're emitting a NT_GDB_TDESC that is mostly correct, but AArch64 will
>>>>> continue to ignore it, just as you originally proposed.
>>>>>
>>>>
>>>> Sorry, it took me a little bit to get some tests running on the simulator. Happy to
>>>> report that I see no regressions in terms of testing with the changes you proposed for
>>>> handling the signalled thread.
>>>>
>>>> Also, I checked the case I described above about a program starting with vg == 4 and then
>>>> changing it to vg == 2 before we output a gcore file.
>>>>
>>>> When I load the corefile back in a patched gdb (and forcing gdb to use the tdesc, by always
>>>> returning true in the aarch64-linux hook), I can see that the tdesc is the correct
>>>> one (vector length indicating vg == 2).
>>>>
>>>> Without your patch, gdb used the inferior's cached gdbarch/tdesc to dump the NT_GDB_TDESC
>>>> note, which means the vector length was vg == 4, when in reality it should've been
>>>> vg == 2, as we had changed it before dumping the core file.
>>>
>>> This all sounds positive.
>>>
>>>>
>>>>>>
>>>>>>>
>>>>>>> Right now RISC-V doesn't event override gdbarch_core_read_description,
>>>>>>> so if we read the encoded tdesc after checking that gdbarch hook nothing
>>>>>>> would change.
>>>>>>
>>>>>> That was my assessment back in the earlier versions of this series.
>>>>>>
>>>>>>>
>>>>>>> For other architectures, if gdbarch_core_read_description is implemented
>>>>>>> then we'll be back to using that, and that worked fine before.
>>>>>>>
>>>>>>> ... And if, in some future world we do implement
>>>>>>> gdbarch_core_read_description for RISC-V then the solution seems
>>>>>>> obvious, if there's a section containing CSRs, and there's also a
>>>>>>> section containing a tdesc, then the RISC-V
>>>>>>> gdbarch_core_read_description can just return nullptr, safe in the
>>>>>>> knowledge the generic GDB code will load the encoded tdesc.
>>>>>>>
>>>>>>>>
>>>>>>>> Except for some different names and tweaks, your proposed approach works correctly.
>>>>>>>>
>>>>>>>> I tested this and noticed the lack of NT_GDB_TDESC for a AArch64 Linux target
>>>>>>>> with sve/sme support and one thread with a differing gdbarch. I also saw the note for
>>>>>>>> a AArch64 Linux target without sve/sme support, or with sve/sme support but all threads
>>>>>>>> having the exact same gdbarch. So that looks good to me.
>>>>>>>>
>>>>>>>> Would you like this extra code to be included as part of this
>>>>>>>> particular patch?
>>>>>>>
>>>>>>> Yeah I think something like this should be added to this patch, we
>>>>>>> should stop GDB creating incorrect core files I think.
>>>>>>
>>>>>> Got it.
>>>>>>
>>>>>> I'm afraid I'm going to need some clarification on ways forward with regards to this particular
>>>>>> series. Do you still want this (or other) potential changes as part of this series or would you
>>>>>> be ok with a follow-up patch/series to (try to, haven't gone in-depth yet) address this particular
>>>>>> core file/gdbarch/threads situation?
>>>>>
>>>>> Here's what I propose:to output the cached gdbarch/
>>>>>
>>>>>   1. Try the patch I've supplied here, if this works for you (see above)
>>>>>   then I think this is the best solution, we merge this, then go with v7
>>>>>   of this patch unmodified,
>>>>
>>>> It did work. Thanks!
>>>>
>>>> But should we keep your suggested changes to check for distinct gdbarches in the threaded case? And
>>>> not output NT_GDB_TDESC if we find multiple distinct gdbarch entries?
>>>
>>> I don't think so.  As you've already said, in the core file case we
>>> already don't support per-thread tdesc, even if we use the
>>> aarch64_linux_core_read_description hook - that's only called once, and
>>> will just generate a tdesc based on the main (signalled) thread as far
>>> as I can tell.
>>>
>>> Telling GDB to dump per-thread NT_GDB_TDESC is trivial, but I haven't
>>> looked at updating the core file code to read in per-thread tdesc, but I
>>> suspect you'll (eventually) need to solve that problem even when using
>>> the aarch64_linux_core_read_description hook.
>>>
>>> I know Simon has mentioned changing GDB to not write out NT_GDB_TDESC,
>>> and I think that would be fine, however, I think we would still want my
>>> proposed patch -- we should always write out the thread tdesc, not the
>>> cached global one, as that is always going to be more accurate.
>>>
>>> My proposal then would be: we merge my patch to write NT_GDB_TDESC using
>>> the per-thread tdesc, then you merge your original V7 version of this
>>> patch with no other changes.
>>>
>>> Later, if we want, we can add a new hook to avoid writing the
>>> NT_GDB_TDESC at all -- but that can come later, and would be in addition
>>> to my proposed patch, not instead of it.
>>>
>>> If you and Simon are happy with this plan then I'll go ahead and push my
>>> patch, this should unblock you, and you'll be able to merge this series.
>>>
>>> Let me know what you think,
>>
>> That sounds good to me. Thanks for going over this.
>>
>> Please let me know when you have pushed the NT_GDB_TDESC fixup.
> 
> I've pushed the patch.
> 
> I've included it again below, just for the record.
> 
> Thanks,
> Andrew

Great. Thanks for addressing this.
  

Patch

diff --git a/gdb/aarch64-linux-tdep.c b/gdb/aarch64-linux-tdep.c
index 47e5e1db641..21ac7ebdc56 100644
--- a/gdb/aarch64-linux-tdep.c
+++ b/gdb/aarch64-linux-tdep.c
@@ -2199,6 +2199,33 @@  aarch64_linux_decode_memtag_section (struct gdbarch *gdbarch,
   return tags;
 }
 
+/* AArch64 Linux implementation of the
+   gdbarch_use_target_description_from_corefile_notes hook.  */
+
+static bool
+aarch64_use_target_description_from_corefile_notes (gdbarch *gdbarch,
+						    bfd *obfd)
+{
+  /* Sanity check.  */
+  gdb_assert (obfd != nullptr);
+
+  /* If the corefile contains any SVE or SME register data, we don't want to
+     use the target description note, as it may be incorrect.
+
+     Currently the target description note contains a potentially incorrect
+     target description if the originating program changed the SVE or SME
+     vector lengths mid-execution.
+
+     Once we support per-thread target description notes in the corefiles, we
+     can always trust those notes whenever they are available.  */
+  if (bfd_get_section_by_name (obfd, ".reg-aarch-sve") != nullptr
+      || bfd_get_section_by_name (obfd, ".reg-aarch-za") != nullptr
+      || bfd_get_section_by_name (obfd, ".reg-aarch-zt") != nullptr)
+    return false;
+
+  return true;
+}
+
 static void
 aarch64_linux_init_abi (struct gdbarch_info info, struct gdbarch *gdbarch)
 {
@@ -2469,6 +2496,12 @@  aarch64_linux_init_abi (struct gdbarch_info info, struct gdbarch *gdbarch)
 					    aarch64_displaced_step_hw_singlestep);
 
   set_gdbarch_gcc_target_options (gdbarch, aarch64_linux_gcc_target_options);
+
+  /* Hook to decide if the target description should be obtained from
+     corefile target description note(s) or inferred from the corefile
+     sections.  */
+  set_gdbarch_use_target_description_from_corefile_notes (gdbarch,
+			    aarch64_use_target_description_from_corefile_notes);
 }
 
 #if GDB_SELF_TEST
diff --git a/gdb/arch-utils.c b/gdb/arch-utils.c
index ee34fc07d33..c1d2c939eb9 100644
--- a/gdb/arch-utils.c
+++ b/gdb/arch-utils.c
@@ -1092,6 +1092,16 @@  default_read_core_file_mappings
 {
 }
 
+/* See arch-utils.h.  */
+bool
+default_use_target_description_from_corefile_notes (struct gdbarch *gdbarch,
+						    struct bfd *obfd)
+{
+  /* Always trust the corefile target description contained in the target
+     description note.  */
+  return true;
+}
+
 CORE_ADDR
 default_get_return_buf_addr (struct type *val_type, frame_info_ptr cur_frame)
 {
diff --git a/gdb/arch-utils.h b/gdb/arch-utils.h
index 2bdc3251c9c..5df3de7b5d9 100644
--- a/gdb/arch-utils.h
+++ b/gdb/arch-utils.h
@@ -305,6 +305,12 @@  extern void default_read_core_file_mappings
    read_core_file_mappings_pre_loop_ftype pre_loop_cb,
    read_core_file_mappings_loop_ftype loop_cb);
 
+/* Default implementation of gdbarch
+   use_target_description_from_corefile_notes.  */
+extern bool default_use_target_description_from_corefile_notes
+  (struct gdbarch *gdbarch,
+  struct bfd *obfd);
+
 /* Default implementation of gdbarch default_get_return_buf_addr method.  */
 extern CORE_ADDR default_get_return_buf_addr (struct type *val_typegdbarch,
 					      frame_info_ptr cur_frame);
diff --git a/gdb/corelow.c b/gdb/corelow.c
index 439270f5559..114ce3054d5 100644
--- a/gdb/corelow.c
+++ b/gdb/corelow.c
@@ -1229,35 +1229,47 @@  core_target::thread_alive (ptid_t ptid)
 const struct target_desc *
 core_target::read_description ()
 {
-  /* If the core file contains a target description note then we will use
-     that in preference to anything else.  */
-  bfd_size_type tdesc_note_size = 0;
-  struct bfd_section *tdesc_note_section
-    = bfd_get_section_by_name (core_bfd, ".gdb-tdesc");
-  if (tdesc_note_section != nullptr)
-    tdesc_note_size = bfd_section_size (tdesc_note_section);
-  if (tdesc_note_size > 0)
+  /* First check whether the target wants us to use the corefile target
+     description notes.  */
+  if (gdbarch_use_target_description_from_corefile_notes (m_core_gdbarch,
+							  core_bfd))
     {
-      gdb::char_vector contents (tdesc_note_size + 1);
-      if (bfd_get_section_contents (core_bfd, tdesc_note_section,
-				    contents.data (), (file_ptr) 0,
-				    tdesc_note_size))
+      /* If the core file contains a target description note then go ahead and
+	 use that.  */
+      bfd_size_type tdesc_note_size = 0;
+      struct bfd_section *tdesc_note_section
+	= bfd_get_section_by_name (core_bfd, ".gdb-tdesc");
+      if (tdesc_note_section != nullptr)
+	tdesc_note_size = bfd_section_size (tdesc_note_section);
+      if (tdesc_note_size > 0)
 	{
-	  /* Ensure we have a null terminator.  */
-	  contents[tdesc_note_size] = '\0';
-	  const struct target_desc *result
-	    = string_read_description_xml (contents.data ());
-	  if (result != nullptr)
-	    return result;
+	  gdb::char_vector contents (tdesc_note_size + 1);
+	  if (bfd_get_section_contents (core_bfd, tdesc_note_section,
+					contents.data (), (file_ptr) 0,
+					tdesc_note_size))
+	    {
+	      /* Ensure we have a null terminator.  */
+	      contents[tdesc_note_size] = '\0';
+	      const struct target_desc *result
+		= string_read_description_xml (contents.data ());
+	      if (result != nullptr)
+		return result;
+	    }
 	}
     }
 
+  /* If the architecture provides a corefile target description hook, use
+     it now.  Even if the core file contains a target description in a note
+     section, it is not useful for targets that can potentially have distinct
+     descriptions for each thread.  One example is AArch64's SVE/SME
+     extensions that allow per-thread vector length changes, resulting in
+     registers with different sizes.  */
   if (m_core_gdbarch && gdbarch_core_read_description_p (m_core_gdbarch))
     {
       const struct target_desc *result;
 
       result = gdbarch_core_read_description (m_core_gdbarch, this, core_bfd);
-      if (result != NULL)
+      if (result != nullptr)
 	return result;
     }
 
diff --git a/gdb/gdbarch-gen.h b/gdb/gdbarch-gen.h
index d62eefa1c5b..33276aa1c43 100644
--- a/gdb/gdbarch-gen.h
+++ b/gdb/gdbarch-gen.h
@@ -1717,3 +1717,17 @@  extern void set_gdbarch_get_pc_address_flags (struct gdbarch *gdbarch, gdbarch_g
 typedef void (gdbarch_read_core_file_mappings_ftype) (struct gdbarch *gdbarch, struct bfd *cbfd, read_core_file_mappings_pre_loop_ftype pre_loop_cb, read_core_file_mappings_loop_ftype loop_cb);
 extern void gdbarch_read_core_file_mappings (struct gdbarch *gdbarch, struct bfd *cbfd, read_core_file_mappings_pre_loop_ftype pre_loop_cb, read_core_file_mappings_loop_ftype loop_cb);
 extern void set_gdbarch_read_core_file_mappings (struct gdbarch *gdbarch, gdbarch_read_core_file_mappings_ftype *read_core_file_mappings);
+
+/* Return true if the target description for all threads should be read from the
+   target description core file note(s).  Return false if the target description
+   for all threads should be inferred from the core file contents/sections.
+
+   The corefile's bfd is passed through OBFD.
+
+   This hook should be used by targets that can have distinct target descriptions
+   for each thread when the core file only holds a single target description
+   note. */
+
+typedef bool (gdbarch_use_target_description_from_corefile_notes_ftype) (struct gdbarch *gdbarch, struct bfd *obfd);
+extern bool gdbarch_use_target_description_from_corefile_notes (struct gdbarch *gdbarch, struct bfd *obfd);
+extern void set_gdbarch_use_target_description_from_corefile_notes (struct gdbarch *gdbarch, gdbarch_use_target_description_from_corefile_notes_ftype *use_target_description_from_corefile_notes);
diff --git a/gdb/gdbarch.c b/gdb/gdbarch.c
index 1fc254d3d6e..ee868908598 100644
--- a/gdb/gdbarch.c
+++ b/gdb/gdbarch.c
@@ -256,6 +256,7 @@  struct gdbarch
   gdbarch_type_align_ftype *type_align = default_type_align;
   gdbarch_get_pc_address_flags_ftype *get_pc_address_flags = default_get_pc_address_flags;
   gdbarch_read_core_file_mappings_ftype *read_core_file_mappings = default_read_core_file_mappings;
+  gdbarch_use_target_description_from_corefile_notes_ftype *use_target_description_from_corefile_notes = default_use_target_description_from_corefile_notes;
 };
 
 /* Create a new ``struct gdbarch'' based on information provided by
@@ -523,6 +524,7 @@  verify_gdbarch (struct gdbarch *gdbarch)
   /* Skip verify of type_align, invalid_p == 0 */
   /* Skip verify of get_pc_address_flags, invalid_p == 0 */
   /* Skip verify of read_core_file_mappings, invalid_p == 0 */
+  /* Skip verify of use_target_description_from_corefile_notes, invalid_p == 0 */
   if (!log.empty ())
     internal_error (_("verify_gdbarch: the following are invalid ...%s"),
 		    log.c_str ());
@@ -1373,6 +1375,9 @@  gdbarch_dump (struct gdbarch *gdbarch, struct ui_file *file)
   gdb_printf (file,
 	      "gdbarch_dump: read_core_file_mappings = <%s>\n",
 	      host_address_to_string (gdbarch->read_core_file_mappings));
+  gdb_printf (file,
+	      "gdbarch_dump: use_target_description_from_corefile_notes = <%s>\n",
+	      host_address_to_string (gdbarch->use_target_description_from_corefile_notes));
   if (gdbarch->dump_tdep != NULL)
     gdbarch->dump_tdep (gdbarch, file);
 }
@@ -5409,3 +5414,20 @@  set_gdbarch_read_core_file_mappings (struct gdbarch *gdbarch,
 {
   gdbarch->read_core_file_mappings = read_core_file_mappings;
 }
+
+bool
+gdbarch_use_target_description_from_corefile_notes (struct gdbarch *gdbarch, struct bfd *obfd)
+{
+  gdb_assert (gdbarch != NULL);
+  gdb_assert (gdbarch->use_target_description_from_corefile_notes != NULL);
+  if (gdbarch_debug >= 2)
+    gdb_printf (gdb_stdlog, "gdbarch_use_target_description_from_corefile_notes called\n");
+  return gdbarch->use_target_description_from_corefile_notes (gdbarch, obfd);
+}
+
+void
+set_gdbarch_use_target_description_from_corefile_notes (struct gdbarch *gdbarch,
+							gdbarch_use_target_description_from_corefile_notes_ftype use_target_description_from_corefile_notes)
+{
+  gdbarch->use_target_description_from_corefile_notes = use_target_description_from_corefile_notes;
+}
diff --git a/gdb/gdbarch_components.py b/gdb/gdbarch_components.py
index 846467b8d83..bbb9b188286 100644
--- a/gdb/gdbarch_components.py
+++ b/gdb/gdbarch_components.py
@@ -2732,3 +2732,22 @@  Read core file mappings
     predefault="default_read_core_file_mappings",
     invalid=False,
 )
+
+Method(
+    comment="""
+Return true if the target description for all threads should be read from the
+target description core file note(s).  Return false if the target description
+for all threads should be inferred from the core file contents/sections.
+
+The corefile's bfd is passed through OBFD.
+
+This hook should be used by targets that can have distinct target descriptions
+for each thread when the core file only holds a single target description
+note.
+""",
+    type="bool",
+    name="use_target_description_from_corefile_notes",
+    params=[("struct bfd *", "obfd")],
+    predefault="default_use_target_description_from_corefile_notes",
+    invalid=False,
+)