[3/3] Add name canonicalization for C

Message ID 20221107162356.3175221-4-tromey@adacore.com
State Committed
Series Fix over-eager CU expansion with new DWARF reader

Commit Message

Tom Tromey Nov. 7, 2022, 4:23 p.m. UTC
PR symtab/29105 shows a number of situations where symbol lookup can
result in the expansion of too many CUs.

What happens is that lookup_signed_typename will try to look up a type
like "signed int".  In cooked_index_functions::expand_symtabs_matching,
when looping over languages, the C++ case will canonicalize this type
name to be "int" instead.  Then this method will proceed to expand
every CU that has an entry for "int" -- i.e., nearly all of them.  A
crucial component of this is that the caller, objfile::lookup_symbol,
does not do this canonicalization, so when it tries to find the symbol
for "signed int", it fails -- causing the loop to continue.
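
The failure mode described above can be sketched with a toy model.  This
is not gdb code: toy_cu and toy_expand_matching are hypothetical names,
and the "index" is just a vector of CUs with their recorded type names.
The assumption being illustrated is only that the index matches on one
spelling of the name while the caller checks for another.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a compilation unit: a file name plus the
// type names recorded for it.
struct toy_cu
{
  std::string file;
  std::vector<std::string> type_names;
};

// Return true if CU records a type called NAME.
bool
toy_cu_has_name (const toy_cu &cu, const std::string &name)
{
  for (const std::string &t : cu.type_names)
    if (t == name)
      return true;
  return false;
}

// Toy model of the expansion loop.  A CU is "expanded" when it has an
// entry for INDEX_KEY; the loop stops early only when the caller finds
// CALLER_KEY in the CU it just expanded.  Returns how many CUs were
// expanded.
std::size_t
toy_expand_matching (const std::vector<toy_cu> &cus,
                     const std::string &index_key,
                     const std::string &caller_key)
{
  assert (!index_key.empty ());
  std::size_t expanded = 0;

  for (const toy_cu &cu : cus)
    {
      if (!toy_cu_has_name (cu, index_key))
        continue;

      ++expanded;

      // The caller is satisfied only by the spelling it asked for.
      if (toy_cu_has_name (cu, caller_key))
        break;
    }
  return expanded;
}
```

In this model, calling toy_expand_matching with index_key "int" and
caller_key "signed int" walks every CU that mentions "int", whereas
passing "int" for both stops at the first match.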

This patch fixes the problem by introducing name canonicalization for
C.  The idea here is that, by making C and C++ agree on the canonical
name when a symbol name can have multiple spellings, we avoid the bad
behavior in objfile::lookup_symbol (and any other such code -- I don't
know if there is any).

Unlike C++, C only has a few situations where canonicalization is
needed.  And, in particular, due to the lack of overloading (thus
avoiding any issues in linespec) and due to the way c-exp.y works, I
think that no canonicalization is needed during symbol lookup -- only
during symtab construction.  This explains why lookup_name_info is not
touched.
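
As a rough illustration of how narrow the C case is, here is a
self-contained sketch of the guard used by c_canonicalize_name.  The
toy_ names are hypothetical, and toy_cp_canonicalize is only a lookup
table standing in for cp_canonicalize_string; the "signed int" and
"unsigned integer" entries come from this message, while the bare
"signed" and "unsigned" entries are assumptions made for the example.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for cp_canonicalize_string: returns the
// canonical spelling, or nullptr if NAME is not in the toy table.
std::unique_ptr<std::string>
toy_cp_canonicalize (const std::string &name)
{
  static const std::map<std::string, std::string> table = {
    {"signed int", "int"},                         // from the message
    {"unsigned integer", "unsigned int integer"},  // from the message
    {"signed", "int"},                             // assumed
    {"unsigned", "unsigned int"},                  // assumed
  };

  auto it = table.find (name);
  if (it == table.end ())
    return nullptr;
  return std::unique_ptr<std::string> (new std::string (it->second));
}

// Same shape as the guard in the patch: only names containing a space,
// or the bare words "signed" and "unsigned", reach the canonicalizer.
// Everything else is treated as already canonical (nullptr).
std::unique_ptr<std::string>
toy_c_canonicalize_name (const char *name)
{
  assert (name != nullptr);
  if (std::strchr (name, ' ') != nullptr
      || std::strcmp (name, "signed") == 0
      || std::strcmp (name, "unsigned") == 0)
    return toy_cp_canonicalize (name);
  return nullptr;
}
```

An ordinary identifier such as "integer" never reaches the canonicalizer
in this sketch, which keeps the common case cheap.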

The stabs reader is modified on a "best effort" basis.

The DWARF reader needed one small tweak in dwarf2_name to avoid a
regression in dw2-unusual-field-names.exp.  I think this is adequately
explained by the comment, but basically this is a scenario that should
not occur in real code, only the gdb test suite.

lookup_signed_typename is simplified.  It used to search for two
different type names, but now gdb can search just for the canonical
form.
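
To show why one lookup is enough once names are canonical, here is a
small sketch comparing the old two-step search with the new one.  It is
not the gdb implementation: toy_type_table is a hypothetical map from
type name to an integer id, and it is assumed to hold canonical names
only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical type table: canonical type name -> some type id.
using toy_type_table = std::map<std::string, int>;

// Return a pointer to the id for NAME, or nullptr if it is absent.
const int *
toy_lookup_typename (const toy_type_table &types, const std::string &name)
{
  auto it = types.find (name);
  return it == types.end () ? nullptr : &it->second;
}

// Old approach: try "signed NAME" first, then fall back to NAME.
const int *
toy_lookup_signed_old (const toy_type_table &types, const std::string &name)
{
  assert (!name.empty ());
  if (const int *t = toy_lookup_typename (types, "signed " + name))
    return t;
  return toy_lookup_typename (types, name);
}

// New approach: "char" is the one name whose signed variant is a
// distinct type, so map it explicitly and do a single lookup.
const int *
toy_lookup_signed_new (const toy_type_table &types, std::string name)
{
  assert (!name.empty ());
  if (name == "char")
    name = "signed char";
  return toy_lookup_typename (types, name);
}
```

With a table containing "int", "char" and "signed char", both functions
return the same entry for "int" and for "char"; the new one simply never
issues a lookup for a non-canonical spelling such as "signed int".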

gdb.dwarf2/enum-type.exp needed a small tweak, because the
canonicalizer turns "unsigned integer" into "unsigned int integer".
It seems better here to use the correct C type name.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
---
 gdb/c-lang.c                           | 14 ++++++++++++
 gdb/c-lang.h                           |  5 +++++
 gdb/dbxread.c                          | 13 +++++++++++
 gdb/dwarf2/cooked-index.c              |  8 +++++--
 gdb/dwarf2/read.c                      | 18 +++++++++++++++-
 gdb/gdbtypes.c                         | 12 +++--------
 gdb/stabsread.c                        | 30 ++++++++++++++++----------
 gdb/testsuite/gdb.dwarf2/enum-type.exp |  4 ++--
 8 files changed, 79 insertions(+), 25 deletions(-)
  

Comments

Simon Marchi Dec. 1, 2022, 4:06 p.m. UTC | #1
On 11/7/22 11:23, Tom Tromey via Gdb-patches wrote:
> PR symtab/29105 shows a number of situations where symbol lookup can
> result in the expansion of too many CUs.
> 
> What happens is that lookup_signed_typename will try to look up a type
> like "signed int".  In cooked_index_functions::expand_symtabs_matching,
> when looping over languages, the C++ case will canonicalize this type
> name to be "int" instead.  Then this method will proceed to expand
> every CU that has an entry for "int" -- i.e., nearly all of them.  A
> crucial component of this is that the caller, objfile::lookup_symbol,
> does not do this canonicalization, so when it tries to find the symbol
> for "signed int", it fails -- causing the loop to continue.
> 
> This patch fixes the problem by introducing name canonicalization for
> C.  The idea here is that, by making C and C++ agree on the canonical
> name when a symbol name can have multiple spellings, we avoid the bad
> behavior in objfile::lookup_symbol (and any other such code -- I don't
> know if there is any).
> 
> Unlike C++, C only has a few situations where canonicalization is
> needed.  And, in particular, due to the lack of overloading (thus
> avoiding any issues in linespec) and due to the way c-exp.y works, I
> think that no canonicalization is needed during symbol lookup -- only
> during symtab construction.  This explains why lookup_name_info is not
> touched.
> 
> The stabs reader is modified on a "best effort" basis.
> 
> The DWARF reader needed one small tweak in dwarf2_name to avoid a
> regression in dw2-unusual-field-names.exp.  I think this is adequately
> explained by the comment, but basically this is a scenario that should
> not occur in real code, only the gdb test suite.
> 
> lookup_signed_typename is simplified.  It used to search for two
> different type names, but now gdb can search just for the canonical
> form.
> 
> gdb.dwarf2/enum-type.exp needed a small tweak, because the
> canonicalizer turns "unsigned integer" into "unsigned int integer".
> It seems better here to use the correct C type name.
> 
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105

What's actually happening in the code is a bit over my head; I don't
think I could properly review this without spending several days diving
into it.  But I tested the cases I reported in the bug, and can confirm
that the over-expansion does not happen with the patch applied.

Simon
  
Andrew Burgess Dec. 1, 2022, 4:29 p.m. UTC | #2
Tom Tromey via Gdb-patches <gdb-patches@sourceware.org> writes:

> PR symtab/29105 shows a number of situations where symbol lookup can
> result in the expansion of too many CUs.
>
> What happens is that lookup_signed_typename will try to look up a type
> like "signed int".  In cooked_index_functions::expand_symtabs_matching,
> when looping over languages, the C++ case will canonicalize this type
> name to be "int" instead.  Then this method will proceed to expand
> every CU that has an entry for "int" -- i.e., nearly all of them.  A
> crucial component of this is that the caller, objfile::lookup_symbol,
> does not do this canonicalization, so when it tries to find the symbol
> for "signed int", it fails -- causing the loop to continue.
>
> This patch fixes the problem by introducing name canonicalization for
> C.  The idea here is that, by making C and C++ agree on the canonical
> name when a symbol name can have multiple spellings, we avoid the bad
> behavior in objfile::lookup_symbol (and any other such code -- I don't
> know if there is any).
>
> Unlike C++, C only has a few situations where canonicalization is
> needed.  And, in particular, due to the lack of overloading (thus
> avoiding any issues in linespec) and due to the way c-exp.y works, I
> think that no canonicalization is needed during symbol lookup -- only
> during symtab construction.  This explains why lookup_name_info is not
> touched.
>
> The stabs reader is modified on a "best effort" basis.
>
> The DWARF reader needed one small tweak in dwarf2_name to avoid a
> regression in dw2-unusual-field-names.exp.  I think this is adequately
> explained by the comment, but basically this is a scenario that should
> not occur in real code, only the gdb test suite.
>
> lookup_signed_typename is simplified.  It used to search for two
> different type names, but now gdb can search just for the canonical
> form.
>
> gdb.dwarf2/enum-type.exp needed a small tweak, because the
> canonicalizer turns "unsigned integer" into "unsigned int integer".
> It seems better here to use the correct C type name.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
> ---
>  gdb/c-lang.c                           | 14 ++++++++++++
>  gdb/c-lang.h                           |  5 +++++
>  gdb/dbxread.c                          | 13 +++++++++++
>  gdb/dwarf2/cooked-index.c              |  8 +++++--
>  gdb/dwarf2/read.c                      | 18 +++++++++++++++-
>  gdb/gdbtypes.c                         | 12 +++--------
>  gdb/stabsread.c                        | 30 ++++++++++++++++----------
>  gdb/testsuite/gdb.dwarf2/enum-type.exp |  4 ++--
>  8 files changed, 79 insertions(+), 25 deletions(-)
>
> diff --git a/gdb/c-lang.c b/gdb/c-lang.c
> index 36b4d1ae3dd..bfad7aeee60 100644
> --- a/gdb/c-lang.c
> +++ b/gdb/c-lang.c
> @@ -727,6 +727,20 @@ c_is_string_type_p (struct type *type)
>  
>  
>  
> +/* See c-lang.h.  */
> +
> +gdb::unique_xmalloc_ptr<char>
> +c_canonicalize_name (const char *name)
> +{
> +  if (strchr (name, ' ') != nullptr
> +      || streq (name, "signed")
> +      || streq (name, "unsigned"))
> +    return cp_canonicalize_string (name);
> +  return nullptr;
> +}
> +
> +
> +
>  void
>  c_language_arch_info (struct gdbarch *gdbarch,
>  		      struct language_arch_info *lai)
> diff --git a/gdb/c-lang.h b/gdb/c-lang.h
> index 93515671d80..652f147f656 100644
> --- a/gdb/c-lang.h
> +++ b/gdb/c-lang.h
> @@ -167,4 +167,9 @@ extern std::string cplus_compute_program (compile_instance *inst,
>  					  const struct block *expr_block,
>  					  CORE_ADDR expr_pc);
>  
> +/* Return the canonical form of the C symbol NAME.  If NAME is already
> +   canonical, return nullptr.  */
> +
> +extern gdb::unique_xmalloc_ptr<char> c_canonicalize_name (const char *name);
> +
>  #endif /* !defined (C_LANG_H) */
> diff --git a/gdb/dbxread.c b/gdb/dbxread.c
> index b0047cf0e79..ae726bdfcc6 100644
> --- a/gdb/dbxread.c
> +++ b/gdb/dbxread.c
> @@ -48,6 +48,7 @@
>  #include "complaints.h"
>  #include "cp-abi.h"
>  #include "cp-support.h"
> +#include "c-lang.h"
>  #include "psympriv.h"
>  #include "block.h"
>  #include "aout/aout64.h"
> @@ -1444,6 +1445,18 @@ read_dbx_symtab (minimal_symbol_reader &reader,
>  					     new_name.get ());
>  		}
>  	    }
> +	  else if (psymtab_language == language_c)
> +	    {
> +	      std::string name (namestring, p - namestring);
> +	      gdb::unique_xmalloc_ptr<char> new_name
> +		= c_canonicalize_name (name.c_str ());
> +	      if (new_name != nullptr)
> +		{
> +		  sym_len = strlen (new_name.get ());
> +		  sym_name = obstack_strdup (&objfile->objfile_obstack,
> +					     new_name.get ());
> +		}
> +	    }
>  
>  	  if (sym_len == 0)
>  	    {
> diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
> index a580d549d0d..0aa026c7779 100644
> --- a/gdb/dwarf2/cooked-index.c
> +++ b/gdb/dwarf2/cooked-index.c
> @@ -21,6 +21,7 @@
>  #include "dwarf2/cooked-index.h"
>  #include "dwarf2/read.h"
>  #include "cp-support.h"
> +#include "c-lang.h"
>  #include "ada-lang.h"
>  #include "split-name.h"
>  #include <algorithm>
> @@ -210,14 +211,17 @@ cooked_index::do_finalize ()
>  	      m_names.push_back (std::move (canon_name));
>  	    }
>  	}
> -      else if (entry->per_cu->lang () == language_cplus)
> +      else if (entry->per_cu->lang () == language_cplus
> +	       || entry->per_cu->lang () == language_c)
>  	{
>  	  void **slot = htab_find_slot (seen_names.get (), entry,
>  					INSERT);
>  	  if (*slot == nullptr)
>  	    {
>  	      gdb::unique_xmalloc_ptr<char> canon_name
> -		= cp_canonicalize_string (entry->name);
> +		= (entry->per_cu->lang () == language_cplus
> +		   ? cp_canonicalize_string (entry->name)
> +		   : c_canonicalize_name (entry->name));
>  	      if (canon_name == nullptr)
>  		entry->canonical = entry->name;
>  	      else
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 978dd4d0bb9..0826f907800 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -22001,7 +22001,10 @@ static const char *
>  dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
>  			  struct objfile *objfile)
>  {
> -  if (name && cu->lang () == language_cplus)
> +  if (name == nullptr)
> +    return name;
> +
> +  if (cu->lang () == language_cplus)
>      {
>        gdb::unique_xmalloc_ptr<char> canon_name
>  	= cp_canonicalize_string (name);
> @@ -22009,6 +22012,14 @@ dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
>        if (canon_name != nullptr)
>  	name = objfile->intern (canon_name.get ());
>      }
> +  else if (cu->lang () == language_c)
> +    {
> +      gdb::unique_xmalloc_ptr<char> canon_name
> +	= c_canonicalize_name (name);
> +
> +      if (canon_name != nullptr)
> +	name = objfile->intern (canon_name.get ());
> +    }
>  
>    return name;
>  }
> @@ -22037,6 +22048,11 @@ dwarf2_name (struct die_info *die, struct dwarf2_cu *cu)
>  
>    switch (die->tag)
>      {
> +      /* A member's name should not be canonicalized.  This is a bit
> +	 of a hack, in that normally it should not be possible to run
> +	 into this situation; however, the dw2-unusual-field-names.exp
> +	 test creates custom DWARF that does.  */
> +    case DW_TAG_member:
>      case DW_TAG_compile_unit:
>      case DW_TAG_partial_unit:
>        /* Compilation units have a DW_AT_name that is a filename, not
> diff --git a/gdb/gdbtypes.c b/gdb/gdbtypes.c
> index a43d9265ad2..d6e8109a95c 100644
> --- a/gdb/gdbtypes.c
> +++ b/gdb/gdbtypes.c
> @@ -1721,15 +1721,9 @@ lookup_unsigned_typename (const struct language_defn *language,
>  struct type *
>  lookup_signed_typename (const struct language_defn *language, const char *name)
>  {
> -  struct type *t;
> -  char *uns = (char *) alloca (strlen (name) + 8);
> -
> -  strcpy (uns, "signed ");
> -  strcpy (uns + 7, name);
> -  t = lookup_typename (language, uns, NULL, 1);
> -  /* If we don't find "signed FOO" just try again with plain "FOO".  */
> -  if (t != NULL)
> -    return t;
> +  /* In C and C++, "char" and "signed char" are distinct types.  */
> +  if (streq (name, "char"))
> +    name = "signed char";
>    return lookup_typename (language, name, NULL, 0);
>  }
>  
> diff --git a/gdb/stabsread.c b/gdb/stabsread.c
> index 612443557b5..74d0885fa71 100644
> --- a/gdb/stabsread.c
> +++ b/gdb/stabsread.c
> @@ -736,11 +736,13 @@ define_symbol (CORE_ADDR valu, const char *string, int desc, int type,
>  
>        if (sym->language () == language_cplus)
>  	{
> -	  char *name = (char *) alloca (p - string + 1);
> -
> -	  memcpy (name, string, p - string);
> -	  name[p - string] = '\0';
> -	  new_name = cp_canonicalize_string (name);
> +	  std::string name (string, p - string);
> +	  new_name = cp_canonicalize_string (name.c_str ());
> +	}
> +      else if (sym->language () == language_c)
> +	{
> +	  std::string name (string, p - string);
> +	  new_name = c_canonicalize_name (name.c_str ());
>  	}
>        if (new_name != nullptr)
>  	sym->compute_and_set_names (new_name.get (), true, objfile->per_bfd);
> @@ -1592,12 +1594,18 @@ read_type (const char **pp, struct objfile *objfile)
>  	  type_name = NULL;
>  	  if (get_current_subfile ()->language == language_cplus)
>  	    {
> -	      char *name = (char *) alloca (p - *pp + 1);
> -
> -	      memcpy (name, *pp, p - *pp);
> -	      name[p - *pp] = '\0';
> -
> -	      gdb::unique_xmalloc_ptr<char> new_name = cp_canonicalize_string (name);
> +	      std::string name (*pp, p - *pp);
> +	      gdb::unique_xmalloc_ptr<char> new_name
> +		= cp_canonicalize_string (name.c_str ());
> +	      if (new_name != nullptr)
> +		type_name = obstack_strdup (&objfile->objfile_obstack,
> +					    new_name.get ());
> +	    }
> +	  else if (get_current_subfile ()->language == language_c)
> +	    {
> +	      std::string name (*pp, p - *pp);
> +	      gdb::unique_xmalloc_ptr<char> new_name
> +		= c_canonicalize_name (name.c_str ());
>  	      if (new_name != nullptr)
>  		type_name = obstack_strdup (&objfile->objfile_obstack,
>  					    new_name.get ());
> diff --git a/gdb/testsuite/gdb.dwarf2/enum-type.exp b/gdb/testsuite/gdb.dwarf2/enum-type.exp
> index ed8e3a35d69..6ebaefa6fb1 100644
> --- a/gdb/testsuite/gdb.dwarf2/enum-type.exp
> +++ b/gdb/testsuite/gdb.dwarf2/enum-type.exp
> @@ -43,7 +43,7 @@ Dwarf::assemble $asm_file {
>              uinteger_label: DW_TAG_base_type {
>                  {DW_AT_byte_size 4 DW_FORM_sdata}
>                  {DW_AT_encoding  @DW_ATE_unsigned}
> -                {DW_AT_name      {unsigned integer}}
> +		{DW_AT_name      {unsigned int}}

I notice that a few lines above this we also have:

     integer_label: DW_TAG_base_type {
         {DW_AT_byte_size 4 DW_FORM_sdata}
         {DW_AT_encoding  @DW_ATE_signed}
         {DW_AT_name      integer}
     }

which seems to have the same int/integer misnaming, though I guess it
isn't causing any problems.  Maybe we should fix this anyway though,
just for consistency?

Thanks,
Andrew

>              }
>  
>  	    DW_TAG_enumeration_type {
> @@ -79,5 +79,5 @@ gdb_test "print sizeof(enum E)" " = 4"
>  gdb_test "ptype enum EU" "type = enum EU {TWO = 2}" \
>      "ptype EU in enum C"
>  gdb_test_no_output "set lang c++"
> -gdb_test "ptype enum EU" "type = enum EU : unsigned integer {TWO = 2}" \
> +gdb_test "ptype enum EU" "type = enum EU : unsigned int {TWO = 2}" \
>      "ptype EU in C++"
> -- 
> 2.34.3
  
Tom Tromey Dec. 1, 2022, 5:56 p.m. UTC | #3
>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

>> -                {DW_AT_name      {unsigned integer}}
>> +		{DW_AT_name      {unsigned int}}

Andrew> I notice that a few lines above this we also have:

Andrew>      integer_label: DW_TAG_base_type {
Andrew>          {DW_AT_byte_size 4 DW_FORM_sdata}
Andrew>          {DW_AT_encoding  @DW_ATE_signed}
Andrew>          {DW_AT_name      integer}
Andrew>      }

Andrew> which seems to have the same int/integer misnaming, though I guess it
Andrew> isn't causing any problems.  Maybe we should fix this anyway though,
Andrew> just for consistency?

Sure, no problem.

The reason "unsigned integer" causes a problem is that this
canonicalizes to "unsigned int integer".  Maybe this is a bug in the
canonicalizer, though it's also weird/"impossible" input.

"integer" doesn't have this problem because it's just an ordinary
identifier.

Thanks for your reviews.

Tom
  
Tom Tromey Dec. 1, 2022, 6:16 p.m. UTC | #4
Tom> Thanks for your reviews.

I'm checking these in now.

Tom
  
Andrew Burgess Dec. 1, 2022, 11:23 p.m. UTC | #5
* Tom Tromey via Gdb-patches <gdb-patches@sourceware.org> [2022-11-07 09:23:56 -0700]:

> PR symtab/29105 shows a number of situations where symbol lookup can
> result in the expansion of too many CUs.
> 
> What happens is that lookup_signed_typename will try to look up a type
> like "signed int".  In cooked_index_functions::expand_symtabs_matching,
> when looping over languages, the C++ case will canonicalize this type
> name to be "int" instead.  Then this method will proceed to expand
> every CU that has an entry for "int" -- i.e., nearly all of them.  A
> crucial component of this is that the caller, objfile::lookup_symbol,
> does not do this canonicalization, so when it tries to find the symbol
> for "signed int", it fails -- causing the loop to continue.
> 
> This patch fixes the problem by introducing name canonicalization for
> C.  The idea here is that, by making C and C++ agree on the canonical
> name when a symbol name can have multiple spellings, we avoid the bad
> behavior in objfile::lookup_symbol (and any other such code -- I don't
> know if there is any).
> 
> Unlike C++, C only has a few situations where canonicalization is
> needed.  And, in particular, due to the lack of overloading (thus
> avoiding any issues in linespec) and due to the way c-exp.y works, I
> think that no canonicalization is needed during symbol lookup -- only
> during symtab construction.  This explains why lookup_name_info is not
> touched.
> 
> The stabs reader is modified on a "best effort" basis.
> 
> The DWARF reader needed one small tweak in dwarf2_name to avoid a
> regression in dw2-unusual-field-names.exp.  I think this is adequately
> explained by the comment, but basically this is a scenario that should
> not occur in real code, only the gdb test suite.
> 
> lookup_signed_typename is simplified.  It used to search for two
> different type names, but now gdb can search just for the canonical
> form.
> 
> gdb.dwarf2/enum-type.exp needed a small tweak, because the
> canonicalizer turns "unsigned integer" into "unsigned int integer".
> It seems better here to use the correct C type name.
> 
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
> ---
>  gdb/c-lang.c                           | 14 ++++++++++++
>  gdb/c-lang.h                           |  5 +++++
>  gdb/dbxread.c                          | 13 +++++++++++
>  gdb/dwarf2/cooked-index.c              |  8 +++++--
>  gdb/dwarf2/read.c                      | 18 +++++++++++++++-
>  gdb/gdbtypes.c                         | 12 +++--------
>  gdb/stabsread.c                        | 30 ++++++++++++++++----------
>  gdb/testsuite/gdb.dwarf2/enum-type.exp |  4 ++--
>  8 files changed, 79 insertions(+), 25 deletions(-)
> 
> diff --git a/gdb/c-lang.c b/gdb/c-lang.c
> index 36b4d1ae3dd..bfad7aeee60 100644
> --- a/gdb/c-lang.c
> +++ b/gdb/c-lang.c
> @@ -727,6 +727,20 @@ c_is_string_type_p (struct type *type)
>  
>  
>  
> +/* See c-lang.h.  */
> +
> +gdb::unique_xmalloc_ptr<char>
> +c_canonicalize_name (const char *name)
> +{
> +  if (strchr (name, ' ') != nullptr
> +      || streq (name, "signed")
> +      || streq (name, "unsigned"))
> +    return cp_canonicalize_string (name);
> +  return nullptr;
> +}
> +
> +
> +
>  void
>  c_language_arch_info (struct gdbarch *gdbarch,
>  		      struct language_arch_info *lai)
> diff --git a/gdb/c-lang.h b/gdb/c-lang.h
> index 93515671d80..652f147f656 100644
> --- a/gdb/c-lang.h
> +++ b/gdb/c-lang.h
> @@ -167,4 +167,9 @@ extern std::string cplus_compute_program (compile_instance *inst,
>  					  const struct block *expr_block,
>  					  CORE_ADDR expr_pc);
>  
> +/* Return the canonical form of the C symbol NAME.  If NAME is already
> +   canonical, return nullptr.  */
> +
> +extern gdb::unique_xmalloc_ptr<char> c_canonicalize_name (const char *name);
> +
>  #endif /* !defined (C_LANG_H) */
> diff --git a/gdb/dbxread.c b/gdb/dbxread.c
> index b0047cf0e79..ae726bdfcc6 100644
> --- a/gdb/dbxread.c
> +++ b/gdb/dbxread.c
> @@ -48,6 +48,7 @@
>  #include "complaints.h"
>  #include "cp-abi.h"
>  #include "cp-support.h"
> +#include "c-lang.h"
>  #include "psympriv.h"
>  #include "block.h"
>  #include "aout/aout64.h"
> @@ -1444,6 +1445,18 @@ read_dbx_symtab (minimal_symbol_reader &reader,
>  					     new_name.get ());
>  		}
>  	    }
> +	  else if (psymtab_language == language_c)
> +	    {
> +	      std::string name (namestring, p - namestring);
> +	      gdb::unique_xmalloc_ptr<char> new_name
> +		= c_canonicalize_name (name.c_str ());
> +	      if (new_name != nullptr)
> +		{
> +		  sym_len = strlen (new_name.get ());
> +		  sym_name = obstack_strdup (&objfile->objfile_obstack,
> +					     new_name.get ());
> +		}
> +	    }
>  
>  	  if (sym_len == 0)
>  	    {
> diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
> index a580d549d0d..0aa026c7779 100644
> --- a/gdb/dwarf2/cooked-index.c
> +++ b/gdb/dwarf2/cooked-index.c
> @@ -21,6 +21,7 @@
>  #include "dwarf2/cooked-index.h"
>  #include "dwarf2/read.h"
>  #include "cp-support.h"
> +#include "c-lang.h"
>  #include "ada-lang.h"
>  #include "split-name.h"
>  #include <algorithm>
> @@ -210,14 +211,17 @@ cooked_index::do_finalize ()
>  	      m_names.push_back (std::move (canon_name));
>  	    }
>  	}
> -      else if (entry->per_cu->lang () == language_cplus)
> +      else if (entry->per_cu->lang () == language_cplus
> +	       || entry->per_cu->lang () == language_c)
>  	{
>  	  void **slot = htab_find_slot (seen_names.get (), entry,
>  					INSERT);
>  	  if (*slot == nullptr)
>  	    {
>  	      gdb::unique_xmalloc_ptr<char> canon_name
> -		= cp_canonicalize_string (entry->name);
> +		= (entry->per_cu->lang () == language_cplus
> +		   ? cp_canonicalize_string (entry->name)
> +		   : c_canonicalize_name (entry->name));
>  	      if (canon_name == nullptr)
>  		entry->canonical = entry->name;
>  	      else
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 978dd4d0bb9..0826f907800 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -22001,7 +22001,10 @@ static const char *
>  dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
>  			  struct objfile *objfile)
>  {
> -  if (name && cu->lang () == language_cplus)
> +  if (name == nullptr)
> +    return name;
> +
> +  if (cu->lang () == language_cplus)
>      {
>        gdb::unique_xmalloc_ptr<char> canon_name
>  	= cp_canonicalize_string (name);
> @@ -22009,6 +22012,14 @@ dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
>        if (canon_name != nullptr)
>  	name = objfile->intern (canon_name.get ());
>      }
> +  else if (cu->lang () == language_c)
> +    {
> +      gdb::unique_xmalloc_ptr<char> canon_name
> +	= c_canonicalize_name (name);
> +
> +      if (canon_name != nullptr)
> +	name = objfile->intern (canon_name.get ());
> +    }
>  
>    return name;
>  }
> @@ -22037,6 +22048,11 @@ dwarf2_name (struct die_info *die, struct dwarf2_cu *cu)
>  
>    switch (die->tag)
>      {
> +      /* A member's name should not be canonicalized.  This is a bit
> +	 of a hack, in that normally it should not be possible to run
> +	 into this situation; however, the dw2-unusual-field-names.exp
> +	 test creates custom DWARF that does.  */
> +    case DW_TAG_member:
>      case DW_TAG_compile_unit:
>      case DW_TAG_partial_unit:
>        /* Compilation units have a DW_AT_name that is a filename, not
> diff --git a/gdb/gdbtypes.c b/gdb/gdbtypes.c
> index a43d9265ad2..d6e8109a95c 100644
> --- a/gdb/gdbtypes.c
> +++ b/gdb/gdbtypes.c
> @@ -1721,15 +1721,9 @@ lookup_unsigned_typename (const struct language_defn *language,
>  struct type *
>  lookup_signed_typename (const struct language_defn *language, const char *name)
>  {
> -  struct type *t;
> -  char *uns = (char *) alloca (strlen (name) + 8);
> -
> -  strcpy (uns, "signed ");
> -  strcpy (uns + 7, name);
> -  t = lookup_typename (language, uns, NULL, 1);
> -  /* If we don't find "signed FOO" just try again with plain "FOO".  */
> -  if (t != NULL)
> -    return t;
> +  /* In C and C++, "char" and "signed char" are distinct types.  */
> +  if (streq (name, "char"))
> +    name = "signed char";

I wondered why this "char" -> "signed char" conversion is done
unconditionally for all languages, when the comment hints that the
conversion only applies for C/C++?  I guess I would have expected a
language check here.

Thanks,
Andrew



>    return lookup_typename (language, name, NULL, 0);
>  }
>  
> diff --git a/gdb/stabsread.c b/gdb/stabsread.c
> index 612443557b5..74d0885fa71 100644
> --- a/gdb/stabsread.c
> +++ b/gdb/stabsread.c
> @@ -736,11 +736,13 @@ define_symbol (CORE_ADDR valu, const char *string, int desc, int type,
>  
>        if (sym->language () == language_cplus)
>  	{
> -	  char *name = (char *) alloca (p - string + 1);
> -
> -	  memcpy (name, string, p - string);
> -	  name[p - string] = '\0';
> -	  new_name = cp_canonicalize_string (name);
> +	  std::string name (string, p - string);
> +	  new_name = cp_canonicalize_string (name.c_str ());
> +	}
> +      else if (sym->language () == language_c)
> +	{
> +	  std::string name (string, p - string);
> +	  new_name = c_canonicalize_name (name.c_str ());
>  	}
>        if (new_name != nullptr)
>  	sym->compute_and_set_names (new_name.get (), true, objfile->per_bfd);
> @@ -1592,12 +1594,18 @@ read_type (const char **pp, struct objfile *objfile)
>  	  type_name = NULL;
>  	  if (get_current_subfile ()->language == language_cplus)
>  	    {
> -	      char *name = (char *) alloca (p - *pp + 1);
> -
> -	      memcpy (name, *pp, p - *pp);
> -	      name[p - *pp] = '\0';
> -
> -	      gdb::unique_xmalloc_ptr<char> new_name = cp_canonicalize_string (name);
> +	      std::string name (*pp, p - *pp);
> +	      gdb::unique_xmalloc_ptr<char> new_name
> +		= cp_canonicalize_string (name.c_str ());
> +	      if (new_name != nullptr)
> +		type_name = obstack_strdup (&objfile->objfile_obstack,
> +					    new_name.get ());
> +	    }
> +	  else if (get_current_subfile ()->language == language_c)
> +	    {
> +	      std::string name (*pp, p - *pp);
> +	      gdb::unique_xmalloc_ptr<char> new_name
> +		= c_canonicalize_name (name.c_str ());
>  	      if (new_name != nullptr)
>  		type_name = obstack_strdup (&objfile->objfile_obstack,
>  					    new_name.get ());
> diff --git a/gdb/testsuite/gdb.dwarf2/enum-type.exp b/gdb/testsuite/gdb.dwarf2/enum-type.exp
> index ed8e3a35d69..6ebaefa6fb1 100644
> --- a/gdb/testsuite/gdb.dwarf2/enum-type.exp
> +++ b/gdb/testsuite/gdb.dwarf2/enum-type.exp
> @@ -43,7 +43,7 @@ Dwarf::assemble $asm_file {
>              uinteger_label: DW_TAG_base_type {
>                  {DW_AT_byte_size 4 DW_FORM_sdata}
>                  {DW_AT_encoding  @DW_ATE_unsigned}
> -                {DW_AT_name      {unsigned integer}}
> +		{DW_AT_name      {unsigned int}}
>              }
>  
>  	    DW_TAG_enumeration_type {
> @@ -79,5 +79,5 @@ gdb_test "print sizeof(enum E)" " = 4"
>  gdb_test "ptype enum EU" "type = enum EU {TWO = 2}" \
>      "ptype EU in enum C"
>  gdb_test_no_output "set lang c++"
> -gdb_test "ptype enum EU" "type = enum EU : unsigned integer {TWO = 2}" \
> +gdb_test "ptype enum EU" "type = enum EU : unsigned int {TWO = 2}" \
>      "ptype EU in C++"
> -- 
> 2.34.3
>
  
Tom Tromey Dec. 2, 2022, 2:39 p.m. UTC | #6
>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

>> struct type *
>> lookup_signed_typename (const struct language_defn *language, const char *name)
>> {
>> -  struct type *t;
>> -  char *uns = (char *) alloca (strlen (name) + 8);
>> -
>> -  strcpy (uns, "signed ");
>> -  strcpy (uns + 7, name);
>> -  t = lookup_typename (language, uns, NULL, 1);
>> -  /* If we don't find "signed FOO" just try again with plain "FOO".  */
>> -  if (t != NULL)
>> -    return t;
>> +  /* In C and C++, "char" and "signed char" are distinct types.  */
>> +  if (streq (name, "char"))
>> +    name = "signed char";

Andrew> I wondered why this "char" -> "signed char" conversion is done
Andrew> unconditionally for all languages, when the comment hints that the
Andrew> conversion only applies for C/C++?  I guess I would have expected a
Andrew> language check here.

lookup_signed_typename is only used by C and C-like languages.  For
non-C-like languages, sticking a "signed" (or "unsigned", see
lookup_unsigned_typename) prefix on a type name doesn't really make
sense anyway.

The uses outside c-exp.y are, IMNSHO, just leftover code from the bad
old days.  For instance, I suspect there's no reason to have
binop_promote at all; instead, this could be an explicit node in the
expression tree, and rather than having a big 'switch' on the language,
each language could simply make a different node.

Tom
  

Patch

diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index 36b4d1ae3dd..bfad7aeee60 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -727,6 +727,20 @@  c_is_string_type_p (struct type *type)
 
 
 
+/* See c-lang.h.  */
+
+gdb::unique_xmalloc_ptr<char>
+c_canonicalize_name (const char *name)
+{
+  if (strchr (name, ' ') != nullptr
+      || streq (name, "signed")
+      || streq (name, "unsigned"))
+    return cp_canonicalize_string (name);
+  return nullptr;
+}
+
+
+
 void
 c_language_arch_info (struct gdbarch *gdbarch,
 		      struct language_arch_info *lai)
diff --git a/gdb/c-lang.h b/gdb/c-lang.h
index 93515671d80..652f147f656 100644
--- a/gdb/c-lang.h
+++ b/gdb/c-lang.h
@@ -167,4 +167,9 @@  extern std::string cplus_compute_program (compile_instance *inst,
 					  const struct block *expr_block,
 					  CORE_ADDR expr_pc);
 
+/* Return the canonical form of the C symbol NAME.  If NAME is already
+   canonical, return nullptr.  */
+
+extern gdb::unique_xmalloc_ptr<char> c_canonicalize_name (const char *name);
+
 #endif /* !defined (C_LANG_H) */
diff --git a/gdb/dbxread.c b/gdb/dbxread.c
index b0047cf0e79..ae726bdfcc6 100644
--- a/gdb/dbxread.c
+++ b/gdb/dbxread.c
@@ -48,6 +48,7 @@ 
 #include "complaints.h"
 #include "cp-abi.h"
 #include "cp-support.h"
+#include "c-lang.h"
 #include "psympriv.h"
 #include "block.h"
 #include "aout/aout64.h"
@@ -1444,6 +1445,18 @@  read_dbx_symtab (minimal_symbol_reader &reader,
 					     new_name.get ());
 		}
 	    }
+	  else if (psymtab_language == language_c)
+	    {
+	      std::string name (namestring, p - namestring);
+	      gdb::unique_xmalloc_ptr<char> new_name
+		= c_canonicalize_name (name.c_str ());
+	      if (new_name != nullptr)
+		{
+		  sym_len = strlen (new_name.get ());
+		  sym_name = obstack_strdup (&objfile->objfile_obstack,
+					     new_name.get ());
+		}
+	    }
 
 	  if (sym_len == 0)
 	    {
diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
index a580d549d0d..0aa026c7779 100644
--- a/gdb/dwarf2/cooked-index.c
+++ b/gdb/dwarf2/cooked-index.c
@@ -21,6 +21,7 @@ 
 #include "dwarf2/cooked-index.h"
 #include "dwarf2/read.h"
 #include "cp-support.h"
+#include "c-lang.h"
 #include "ada-lang.h"
 #include "split-name.h"
 #include <algorithm>
@@ -210,14 +211,17 @@  cooked_index::do_finalize ()
 	      m_names.push_back (std::move (canon_name));
 	    }
 	}
-      else if (entry->per_cu->lang () == language_cplus)
+      else if (entry->per_cu->lang () == language_cplus
+	       || entry->per_cu->lang () == language_c)
 	{
 	  void **slot = htab_find_slot (seen_names.get (), entry,
 					INSERT);
 	  if (*slot == nullptr)
 	    {
 	      gdb::unique_xmalloc_ptr<char> canon_name
-		= cp_canonicalize_string (entry->name);
+		= (entry->per_cu->lang () == language_cplus
+		   ? cp_canonicalize_string (entry->name)
+		   : c_canonicalize_name (entry->name));
 	      if (canon_name == nullptr)
 		entry->canonical = entry->name;
 	      else
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 978dd4d0bb9..0826f907800 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -22001,7 +22001,10 @@  static const char *
 dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
 			  struct objfile *objfile)
 {
-  if (name && cu->lang () == language_cplus)
+  if (name == nullptr)
+    return name;
+
+  if (cu->lang () == language_cplus)
     {
       gdb::unique_xmalloc_ptr<char> canon_name
 	= cp_canonicalize_string (name);
@@ -22009,6 +22012,14 @@  dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
       if (canon_name != nullptr)
 	name = objfile->intern (canon_name.get ());
     }
+  else if (cu->lang () == language_c)
+    {
+      gdb::unique_xmalloc_ptr<char> canon_name
+	= c_canonicalize_name (name);
+
+      if (canon_name != nullptr)
+	name = objfile->intern (canon_name.get ());
+    }
 
   return name;
 }
@@ -22037,6 +22048,11 @@  dwarf2_name (struct die_info *die, struct dwarf2_cu *cu)
 
   switch (die->tag)
     {
+      /* A member's name should not be canonicalized.  This is a bit
+	 of a hack, in that normally it should not be possible to run
+	 into this situation; however, the dw2-unusual-field-names.exp
+	 test creates custom DWARF that does.  */
+    case DW_TAG_member:
     case DW_TAG_compile_unit:
     case DW_TAG_partial_unit:
       /* Compilation units have a DW_AT_name that is a filename, not
diff --git a/gdb/gdbtypes.c b/gdb/gdbtypes.c
index a43d9265ad2..d6e8109a95c 100644
--- a/gdb/gdbtypes.c
+++ b/gdb/gdbtypes.c
@@ -1721,15 +1721,9 @@  lookup_unsigned_typename (const struct language_defn *language,
 struct type *
 lookup_signed_typename (const struct language_defn *language, const char *name)
 {
-  struct type *t;
-  char *uns = (char *) alloca (strlen (name) + 8);
-
-  strcpy (uns, "signed ");
-  strcpy (uns + 7, name);
-  t = lookup_typename (language, uns, NULL, 1);
-  /* If we don't find "signed FOO" just try again with plain "FOO".  */
-  if (t != NULL)
-    return t;
+  /* In C and C++, "char" and "signed char" are distinct types.  */
+  if (streq (name, "char"))
+    name = "signed char";
   return lookup_typename (language, name, NULL, 0);
 }
 
diff --git a/gdb/stabsread.c b/gdb/stabsread.c
index 612443557b5..74d0885fa71 100644
--- a/gdb/stabsread.c
+++ b/gdb/stabsread.c
@@ -736,11 +736,13 @@  define_symbol (CORE_ADDR valu, const char *string, int desc, int type,
 
       if (sym->language () == language_cplus)
 	{
-	  char *name = (char *) alloca (p - string + 1);
-
-	  memcpy (name, string, p - string);
-	  name[p - string] = '\0';
-	  new_name = cp_canonicalize_string (name);
+	  std::string name (string, p - string);
+	  new_name = cp_canonicalize_string (name.c_str ());
+	}
+      else if (sym->language () == language_c)
+	{
+	  std::string name (string, p - string);
+	  new_name = c_canonicalize_name (name.c_str ());
 	}
       if (new_name != nullptr)
 	sym->compute_and_set_names (new_name.get (), true, objfile->per_bfd);
@@ -1592,12 +1594,18 @@  read_type (const char **pp, struct objfile *objfile)
 	  type_name = NULL;
 	  if (get_current_subfile ()->language == language_cplus)
 	    {
-	      char *name = (char *) alloca (p - *pp + 1);
-
-	      memcpy (name, *pp, p - *pp);
-	      name[p - *pp] = '\0';
-
-	      gdb::unique_xmalloc_ptr<char> new_name = cp_canonicalize_string (name);
+	      std::string name (*pp, p - *pp);
+	      gdb::unique_xmalloc_ptr<char> new_name
+		= cp_canonicalize_string (name.c_str ());
+	      if (new_name != nullptr)
+		type_name = obstack_strdup (&objfile->objfile_obstack,
+					    new_name.get ());
+	    }
+	  else if (get_current_subfile ()->language == language_c)
+	    {
+	      std::string name (*pp, p - *pp);
+	      gdb::unique_xmalloc_ptr<char> new_name
+		= c_canonicalize_name (name.c_str ());
 	      if (new_name != nullptr)
 		type_name = obstack_strdup (&objfile->objfile_obstack,
 					    new_name.get ());
diff --git a/gdb/testsuite/gdb.dwarf2/enum-type.exp b/gdb/testsuite/gdb.dwarf2/enum-type.exp
index ed8e3a35d69..6ebaefa6fb1 100644
--- a/gdb/testsuite/gdb.dwarf2/enum-type.exp
+++ b/gdb/testsuite/gdb.dwarf2/enum-type.exp
@@ -43,7 +43,7 @@  Dwarf::assemble $asm_file {
             uinteger_label: DW_TAG_base_type {
                 {DW_AT_byte_size 4 DW_FORM_sdata}
                 {DW_AT_encoding  @DW_ATE_unsigned}
-                {DW_AT_name      {unsigned integer}}
+		{DW_AT_name      {unsigned int}}
             }
 
 	    DW_TAG_enumeration_type {
@@ -79,5 +79,5 @@  gdb_test "print sizeof(enum E)" " = 4"
 gdb_test "ptype enum EU" "type = enum EU {TWO = 2}" \
     "ptype EU in enum C"
 gdb_test_no_output "set lang c++"
-gdb_test "ptype enum EU" "type = enum EU : unsigned integer {TWO = 2}" \
+gdb_test "ptype enum EU" "type = enum EU : unsigned int {TWO = 2}" \
     "ptype EU in C++"