[3/3] Add name canonicalization for C
Commit Message
PR symtab/29105 shows a number of situations where symbol lookup can
result in the expansion of too many CUs.
What happens is that lookup_signed_typename will try to look up a type
like "signed int". In cooked_index_functions::expand_symtabs_matching,
when looping over languages, the C++ case will canonicalize this type
name to be "int" instead. Then this method will proceed to expand
every CU that has an entry for "int" -- i.e., nearly all of them. A
crucial component of this is that the caller, objfile::lookup_symbol,
does not do this canonicalization, so when it tries to find the symbol
for "signed int", it fails -- causing the loop to continue.
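A deliberately simplified model of this failure may help. It is not gdb code. The index, the canonicalizer, and the function names (`toy_cplus_canonicalize`, `expand_matching`, `lookup_in_expanded`) are hypothetical. The matching step canonicalizes the query the way the C++ path does. The final lookup compares the spelling it was given. As a result, every CU listed under "int" is expanded and the symbol is still not found.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for C++ canonicalization: it only knows that
// "signed int" is canonically spelled "int".
std::string toy_cplus_canonicalize (const std::string &name)
{
  return name == "signed int" ? std::string ("int") : name;
}

// Hypothetical index: canonical name -> ids of the CUs that mention it.
using toy_index = std::map<std::string, std::vector<int>>;

// Matching step: canonicalize the query C++-style, then "expand"
// (here: collect) every CU listed under the resulting name.
std::set<int> expand_matching (const toy_index &index,
                               const std::string &query)
{
  assert (!query.empty ());
  std::set<int> expanded;
  auto it = index.find (toy_cplus_canonicalize (query));
  if (it != index.end ())
    expanded.insert (it->second.begin (), it->second.end ());
  return expanded;
}

// Lookup step: search the expanded symtabs for the query exactly as
// spelled, without canonicalizing it first.
bool lookup_in_expanded (const std::set<std::string> &symtab_names,
                         const std::string &query)
{
  return symtab_names.count (query) != 0;
}
```

Suppose 100 hypothetical CUs all contain "int". Querying "signed int" expands all 100 and then misses, while querying the canonical "int" finds the symbol.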
This patch fixes the problem by introducing name canonicalization for
C. The idea here is that, by making C and C++ agree on the canonical
name when a symbol name can have multiple spellings, we avoid the bad
behavior in objfile::lookup_symbol (and any other such code -- I don't
know if there is any).
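The sketch below illustrates, under simplifying assumptions, why it matters that C and C++ agree when the index is finalized. It is loosely modeled on the language dispatch this patch adds to `cooked_index::do_finalize`. However, `toy_entry`, `toy_finalize`, and the spelling table in `toy_canonicalize` are invented for illustration. The table is not a claim about exactly what `cp_canonicalize_string` produces.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class toy_lang { c, cplus, other };

struct toy_entry
{
  std::string name;       // Name as spelled in the debug info.
  toy_lang lang;          // Language of the owning CU.
  std::string canonical;  // Filled in by toy_finalize.
};

// Hypothetical shared spelling table standing in for the real canonicalizers.
std::string toy_canonicalize (const std::string &name)
{
  static const std::map<std::string, std::string> table = {
    {"signed int", "int"},
    {"long int", "long"},
    {"unsigned", "unsigned int"},
  };
  auto it = table.find (name);
  return it == table.end () ? name : it->second;
}

// Give every C and C++ entry a canonical name, canonicalizing each
// distinct (language, spelling) pair only once; other languages keep
// their spelling unchanged.
void toy_finalize (std::vector<toy_entry> &entries)
{
  std::map<std::pair<toy_lang, std::string>, std::string> seen;
  for (toy_entry &e : entries)
    {
      assert (!e.name.empty ());
      if (e.lang != toy_lang::c && e.lang != toy_lang::cplus)
        {
          e.canonical = e.name;
          continue;
        }
      auto key = std::make_pair (e.lang, e.name);
      auto it = seen.find (key);
      if (it == seen.end ())
        it = seen.emplace (key, toy_canonicalize (e.name)).first;
      e.canonical = it->second;
    }
}
```

Marking a C entry as `toy_lang::other` approximates the pre-patch situation. The C spelling `long int` is then kept verbatim and no longer shares a key with the C++ entry `long`.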
Unlike C++, C only has a few situations where canonicalization is
needed. And, in particular, due to the lack of overloading (thus
avoiding any issues in linespec) and due to the way c-exp.y works, I
think that no canonicalization is needed during symbol lookup -- only
during symtab construction. This explains why lookup_name_info is not
touched.
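The patch's `c_canonicalize_name` forwards a name to `cp_canonicalize_string` only in two cases: when the name contains a space, or when it is exactly `signed` or `unsigned`. For every other name it returns nullptr, meaning "already canonical". The predicate below restates that condition as standalone C++, using `std::strcmp` in place of gdb's `streq`. The function name is hypothetical. One plausible motivation, which the patch does not state, is letting ordinary identifiers skip the full parser.

```cpp
#include <cassert>
#include <cstring>

// Same condition as in c_canonicalize_name: only names containing a
// space, or the bare keywords "signed" / "unsigned", need the
// canonicalizer; plain identifiers such as "int" or "foo" do not.
bool toy_c_needs_canonicalization (const char *name)
{
  assert (name != nullptr);
  return (std::strchr (name, ' ') != nullptr
          || std::strcmp (name, "signed") == 0
          || std::strcmp (name, "unsigned") == 0);
}
```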
The stabs reader is modified on a "best effort" basis.
The DWARF reader needed one small tweak in dwarf2_name to avoid a
regression in dw2-unusual-field-names.exp. I think this is adequately
explained by the comment, but basically this is a scenario that should
not occur in real code, only the gdb test suite.
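A rough illustration of the `dwarf2_name` tweak follows. The patch adds `DW_TAG_member` to the group of tags whose names are not canonicalized, alongside `DW_TAG_compile_unit` and `DW_TAG_partial_unit`. This model uses a hypothetical `toy_tag` enum instead of the real `DW_TAG_*` constants. It lists only the three tags visible in the hunk and uses a one-rule stand-in canonicalizer. The member named `unsigned` in the checks is a made-up example, not taken from dw2-unusual-field-names.exp.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of DWARF tags; real code uses the DW_TAG_* constants.
enum class toy_tag { compile_unit, partial_unit, member, base_type };

// Names of these tags bypass canonicalization, mirroring the case labels
// visible in the patched dwarf2_name (the real switch may list others).
bool toy_should_canonicalize (toy_tag tag)
{
  switch (tag)
    {
    case toy_tag::member:
    case toy_tag::compile_unit:
    case toy_tag::partial_unit:
      return false;
    default:
      return true;
    }
}

// One-rule stand-in for c_canonicalize_name.
std::string toy_c_canonicalize (const std::string &name)
{
  return name == "unsigned" ? std::string ("unsigned int") : name;
}

// Return the name gdb would record for a DIE with tag TAG and raw name RAW.
std::string toy_dwarf2_name (toy_tag tag, const std::string &raw)
{
  assert (!raw.empty ());
  return toy_should_canonicalize (tag) ? toy_c_canonicalize (raw) : raw;
}
```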
lookup_signed_typename is simplified. It used to search for two
different type names, but now gdb can search just for the canonical
form.
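The sketch below compares the old two-step strategy of `lookup_signed_typename` with the new one, using a hypothetical type table that counts lookups. In gdb, the failed lookup is the expensive case, because it is what can drive CU expansion. The counter here is only a proxy for that cost. `toy_types`, `old_lookup_signed`, and `new_lookup_signed` are invented names. The real function calls `lookup_typename` rather than consulting a set.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical type table standing in for lookup_typename; LOOKUPS counts
// how many lookups were attempted.
struct toy_types
{
  std::set<std::string> names;
  mutable int lookups = 0;

  bool lookup (const std::string &name) const
  {
    ++lookups;
    return names.count (name) != 0;
  }
};

// Old strategy: try "signed NAME" first, then fall back to plain NAME.
// Returns the name that was found, or "" if neither exists.
std::string old_lookup_signed (const toy_types &t, const std::string &name)
{
  std::string prefixed = "signed " + name;
  if (t.lookup (prefixed))
    return prefixed;
  return t.lookup (name) ? name : "";
}

// New strategy: only "char" is rewritten, because "char" and
// "signed char" are distinct types; everything else is looked up as-is.
std::string new_lookup_signed (const toy_types &t, const std::string &name)
{
  assert (!name.empty ());
  std::string target = (name == "char") ? std::string ("signed char") : name;
  return t.lookup (target) ? target : "";
}
```

For "int", the old strategy performs two lookups: a miss on "signed int", then a hit. The new strategy performs one. Both map "char" to "signed char".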
gdb.dwarf2/enum-type.exp needed a small tweak, because the
canonicalizer turns "unsigned integer" into "unsigned int integer".
It seems better here to use the correct C type name.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
---
gdb/c-lang.c | 14 ++++++++++++
gdb/c-lang.h | 5 +++++
gdb/dbxread.c | 13 +++++++++++
gdb/dwarf2/cooked-index.c | 8 +++++--
gdb/dwarf2/read.c | 18 +++++++++++++++-
gdb/gdbtypes.c | 12 +++--------
gdb/stabsread.c | 30 ++++++++++++++++----------
gdb/testsuite/gdb.dwarf2/enum-type.exp | 4 ++--
8 files changed, 79 insertions(+), 25 deletions(-)
Comments
On 11/7/22 11:23, Tom Tromey via Gdb-patches wrote:
> PR symtab/29105 shows a number of situations where symbol lookup can
> result in the expansion of too many CUs.
>
> What happens is that lookup_signed_typename will try to look up a type
> like "signed int". In cooked_index_functions::expand_symtabs_matching,
> when looping over languages, the C++ case will canonicalize this type
> name to be "int" instead. Then this method will proceed to expand
> every CU that has an entry for "int" -- i.e., nearly all of them. A
> crucial component of this is that the caller, objfile::lookup_symbol,
> does not do this canonicalization, so when it tries to find the symbol
> for "signed int", it fails -- causing the loop to continue.
>
> This patch fixes the problem by introducing name canonicalization for
> C. The idea here is that, by making C and C++ agree on the canonical
> name when a symbol name can have multiple spellings, we avoid the bad
> behavior in objfile::lookup_symbol (and any other such code -- I don't
> know if there is any).
>
> Unlike C++, C only has a few situations where canonicalization is
> needed. And, in particular, due to the lack of overloading (thus
> avoiding any issues in linespec) and due to the way c-exp.y works, I
> think that no canonicalization is needed during symbol lookup -- only
> during symtab construction. This explains why lookup_name_info is not
> touched.
>
> The stabs reader is modified on a "best effort" basis.
>
> The DWARF reader needed one small tweak in dwarf2_name to avoid a
> regression in dw2-unusual-field-names.exp. I think this is adequately
> explained by the comment, but basically this is a scenario that should
> not occur in real code, only the gdb test suite.
>
> lookup_signed_typename is simplified. It used to search for two
> different type names, but now gdb can search just for the canonical
> form.
>
> gdb.dwarf2/enum-type.exp needed a small tweak, because the
> canonicalizer turns "unsigned integer" into "unsigned int integer".
> It seems better here to use the correct C type name.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
What's actually happening in the code is a bit over my head; I don't
think I could properly review this without spending several days diving
into it. But I tested the cases I reported on the bug and can confirm
that the over-expansion does not happen with the patch applied.
Simon
Tom Tromey via Gdb-patches <gdb-patches@sourceware.org> writes:
> PR symtab/29105 shows a number of situations where symbol lookup can
> result in the expansion of too many CUs.
>
> What happens is that lookup_signed_typename will try to look up a type
> like "signed int". In cooked_index_functions::expand_symtabs_matching,
> when looping over languages, the C++ case will canonicalize this type
> name to be "int" instead. Then this method will proceed to expand
> every CU that has an entry for "int" -- i.e., nearly all of them. A
> crucial component of this is that the caller, objfile::lookup_symbol,
> does not do this canonicalization, so when it tries to find the symbol
> for "signed int", it fails -- causing the loop to continue.
>
> This patch fixes the problem by introducing name canonicalization for
> C. The idea here is that, by making C and C++ agree on the canonical
> name when a symbol name can have multiple spellings, we avoid the bad
> behavior in objfile::lookup_symbol (and any other such code -- I don't
> know if there is any).
>
> Unlike C++, C only has a few situations where canonicalization is
> needed. And, in particular, due to the lack of overloading (thus
> avoiding any issues in linespec) and due to the way c-exp.y works, I
> think that no canonicalization is needed during symbol lookup -- only
> during symtab construction. This explains why lookup_name_info is not
> touched.
>
> The stabs reader is modified on a "best effort" basis.
>
> The DWARF reader needed one small tweak in dwarf2_name to avoid a
> regression in dw2-unusual-field-names.exp. I think this is adequately
> explained by the comment, but basically this is a scenario that should
> not occur in real code, only the gdb test suite.
>
> lookup_signed_typename is simplified. It used to search for two
> different type names, but now gdb can search just for the canonical
> form.
>
> gdb.dwarf2/enum-type.exp needed a small tweak, because the
> canonicalizer turns "unsigned integer" into "unsigned int integer".
> It seems better here to use the correct C type name.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
> ---
> gdb/c-lang.c | 14 ++++++++++++
> gdb/c-lang.h | 5 +++++
> gdb/dbxread.c | 13 +++++++++++
> gdb/dwarf2/cooked-index.c | 8 +++++--
> gdb/dwarf2/read.c | 18 +++++++++++++++-
> gdb/gdbtypes.c | 12 +++--------
> gdb/stabsread.c | 30 ++++++++++++++++----------
> gdb/testsuite/gdb.dwarf2/enum-type.exp | 4 ++--
> 8 files changed, 79 insertions(+), 25 deletions(-)
>
> diff --git a/gdb/c-lang.c b/gdb/c-lang.c
> index 36b4d1ae3dd..bfad7aeee60 100644
> --- a/gdb/c-lang.c
> +++ b/gdb/c-lang.c
> @@ -727,6 +727,20 @@ c_is_string_type_p (struct type *type)
>
>
>
> +/* See c-lang.h. */
> +
> +gdb::unique_xmalloc_ptr<char>
> +c_canonicalize_name (const char *name)
> +{
> + if (strchr (name, ' ') != nullptr
> + || streq (name, "signed")
> + || streq (name, "unsigned"))
> + return cp_canonicalize_string (name);
> + return nullptr;
> +}
> +
> +
> +
> void
> c_language_arch_info (struct gdbarch *gdbarch,
> struct language_arch_info *lai)
> diff --git a/gdb/c-lang.h b/gdb/c-lang.h
> index 93515671d80..652f147f656 100644
> --- a/gdb/c-lang.h
> +++ b/gdb/c-lang.h
> @@ -167,4 +167,9 @@ extern std::string cplus_compute_program (compile_instance *inst,
> const struct block *expr_block,
> CORE_ADDR expr_pc);
>
> +/* Return the canonical form of the C symbol NAME. If NAME is already
> + canonical, return nullptr. */
> +
> +extern gdb::unique_xmalloc_ptr<char> c_canonicalize_name (const char *name);
> +
> #endif /* !defined (C_LANG_H) */
> diff --git a/gdb/dbxread.c b/gdb/dbxread.c
> index b0047cf0e79..ae726bdfcc6 100644
> --- a/gdb/dbxread.c
> +++ b/gdb/dbxread.c
> @@ -48,6 +48,7 @@
> #include "complaints.h"
> #include "cp-abi.h"
> #include "cp-support.h"
> +#include "c-lang.h"
> #include "psympriv.h"
> #include "block.h"
> #include "aout/aout64.h"
> @@ -1444,6 +1445,18 @@ read_dbx_symtab (minimal_symbol_reader &reader,
> new_name.get ());
> }
> }
> + else if (psymtab_language == language_c)
> + {
> + std::string name (namestring, p - namestring);
> + gdb::unique_xmalloc_ptr<char> new_name
> + = c_canonicalize_name (name.c_str ());
> + if (new_name != nullptr)
> + {
> + sym_len = strlen (new_name.get ());
> + sym_name = obstack_strdup (&objfile->objfile_obstack,
> + new_name.get ());
> + }
> + }
>
> if (sym_len == 0)
> {
> diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
> index a580d549d0d..0aa026c7779 100644
> --- a/gdb/dwarf2/cooked-index.c
> +++ b/gdb/dwarf2/cooked-index.c
> @@ -21,6 +21,7 @@
> #include "dwarf2/cooked-index.h"
> #include "dwarf2/read.h"
> #include "cp-support.h"
> +#include "c-lang.h"
> #include "ada-lang.h"
> #include "split-name.h"
> #include <algorithm>
> @@ -210,14 +211,17 @@ cooked_index::do_finalize ()
> m_names.push_back (std::move (canon_name));
> }
> }
> - else if (entry->per_cu->lang () == language_cplus)
> + else if (entry->per_cu->lang () == language_cplus
> + || entry->per_cu->lang () == language_c)
> {
> void **slot = htab_find_slot (seen_names.get (), entry,
> INSERT);
> if (*slot == nullptr)
> {
> gdb::unique_xmalloc_ptr<char> canon_name
> - = cp_canonicalize_string (entry->name);
> + = (entry->per_cu->lang () == language_cplus
> + ? cp_canonicalize_string (entry->name)
> + : c_canonicalize_name (entry->name));
> if (canon_name == nullptr)
> entry->canonical = entry->name;
> else
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 978dd4d0bb9..0826f907800 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -22001,7 +22001,10 @@ static const char *
> dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
> struct objfile *objfile)
> {
> - if (name && cu->lang () == language_cplus)
> + if (name == nullptr)
> + return name;
> +
> + if (cu->lang () == language_cplus)
> {
> gdb::unique_xmalloc_ptr<char> canon_name
> = cp_canonicalize_string (name);
> @@ -22009,6 +22012,14 @@ dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
> if (canon_name != nullptr)
> name = objfile->intern (canon_name.get ());
> }
> + else if (cu->lang () == language_c)
> + {
> + gdb::unique_xmalloc_ptr<char> canon_name
> + = c_canonicalize_name (name);
> +
> + if (canon_name != nullptr)
> + name = objfile->intern (canon_name.get ());
> + }
>
> return name;
> }
> @@ -22037,6 +22048,11 @@ dwarf2_name (struct die_info *die, struct dwarf2_cu *cu)
>
> switch (die->tag)
> {
> + /* A member's name should not be canonicalized. This is a bit
> + of a hack, in that normally it should not be possible to run
> + into this situation; however, the dw2-unusual-field-names.exp
> + test creates custom DWARF that does. */
> + case DW_TAG_member:
> case DW_TAG_compile_unit:
> case DW_TAG_partial_unit:
> /* Compilation units have a DW_AT_name that is a filename, not
> diff --git a/gdb/gdbtypes.c b/gdb/gdbtypes.c
> index a43d9265ad2..d6e8109a95c 100644
> --- a/gdb/gdbtypes.c
> +++ b/gdb/gdbtypes.c
> @@ -1721,15 +1721,9 @@ lookup_unsigned_typename (const struct language_defn *language,
> struct type *
> lookup_signed_typename (const struct language_defn *language, const char *name)
> {
> - struct type *t;
> - char *uns = (char *) alloca (strlen (name) + 8);
> -
> - strcpy (uns, "signed ");
> - strcpy (uns + 7, name);
> - t = lookup_typename (language, uns, NULL, 1);
> - /* If we don't find "signed FOO" just try again with plain "FOO". */
> - if (t != NULL)
> - return t;
> + /* In C and C++, "char" and "signed char" are distinct types. */
> + if (streq (name, "char"))
> + name = "signed char";
> return lookup_typename (language, name, NULL, 0);
> }
>
> diff --git a/gdb/stabsread.c b/gdb/stabsread.c
> index 612443557b5..74d0885fa71 100644
> --- a/gdb/stabsread.c
> +++ b/gdb/stabsread.c
> @@ -736,11 +736,13 @@ define_symbol (CORE_ADDR valu, const char *string, int desc, int type,
>
> if (sym->language () == language_cplus)
> {
> - char *name = (char *) alloca (p - string + 1);
> -
> - memcpy (name, string, p - string);
> - name[p - string] = '\0';
> - new_name = cp_canonicalize_string (name);
> + std::string name (string, p - string);
> + new_name = cp_canonicalize_string (name.c_str ());
> + }
> + else if (sym->language () == language_c)
> + {
> + std::string name (string, p - string);
> + new_name = c_canonicalize_name (name.c_str ());
> }
> if (new_name != nullptr)
> sym->compute_and_set_names (new_name.get (), true, objfile->per_bfd);
> @@ -1592,12 +1594,18 @@ read_type (const char **pp, struct objfile *objfile)
> type_name = NULL;
> if (get_current_subfile ()->language == language_cplus)
> {
> - char *name = (char *) alloca (p - *pp + 1);
> -
> - memcpy (name, *pp, p - *pp);
> - name[p - *pp] = '\0';
> -
> - gdb::unique_xmalloc_ptr<char> new_name = cp_canonicalize_string (name);
> + std::string name (*pp, p - *pp);
> + gdb::unique_xmalloc_ptr<char> new_name
> + = cp_canonicalize_string (name.c_str ());
> + if (new_name != nullptr)
> + type_name = obstack_strdup (&objfile->objfile_obstack,
> + new_name.get ());
> + }
> + else if (get_current_subfile ()->language == language_c)
> + {
> + std::string name (*pp, p - *pp);
> + gdb::unique_xmalloc_ptr<char> new_name
> + = c_canonicalize_name (name.c_str ());
> if (new_name != nullptr)
> type_name = obstack_strdup (&objfile->objfile_obstack,
> new_name.get ());
> diff --git a/gdb/testsuite/gdb.dwarf2/enum-type.exp b/gdb/testsuite/gdb.dwarf2/enum-type.exp
> index ed8e3a35d69..6ebaefa6fb1 100644
> --- a/gdb/testsuite/gdb.dwarf2/enum-type.exp
> +++ b/gdb/testsuite/gdb.dwarf2/enum-type.exp
> @@ -43,7 +43,7 @@ Dwarf::assemble $asm_file {
> uinteger_label: DW_TAG_base_type {
> {DW_AT_byte_size 4 DW_FORM_sdata}
> {DW_AT_encoding @DW_ATE_unsigned}
> - {DW_AT_name {unsigned integer}}
> + {DW_AT_name {unsigned int}}
I notice that a few lines above this we also have:
integer_label: DW_TAG_base_type {
{DW_AT_byte_size 4 DW_FORM_sdata}
{DW_AT_encoding @DW_ATE_signed}
{DW_AT_name integer}
}
which seems to have the same int/integer misnaming, though I guess it
isn't causing any problems. Maybe we should fix this anyway, just for
consistency?
Thanks,
Andrew
> }
>
> DW_TAG_enumeration_type {
> @@ -79,5 +79,5 @@ gdb_test "print sizeof(enum E)" " = 4"
> gdb_test "ptype enum EU" "type = enum EU {TWO = 2}" \
> "ptype EU in enum C"
> gdb_test_no_output "set lang c++"
> -gdb_test "ptype enum EU" "type = enum EU : unsigned integer {TWO = 2}" \
> +gdb_test "ptype enum EU" "type = enum EU : unsigned int {TWO = 2}" \
> "ptype EU in C++"
> --
> 2.34.3
>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:
>> - {DW_AT_name {unsigned integer}}
>> + {DW_AT_name {unsigned int}}
Andrew> I notice that a few lines above this we also have:
Andrew> integer_label: DW_TAG_base_type {
Andrew> {DW_AT_byte_size 4 DW_FORM_sdata}
Andrew> {DW_AT_encoding @DW_ATE_signed}
Andrew> {DW_AT_name integer}
Andrew> }
Andrew> which seems to have the same int/integer misnaming, though I guess it
Andrew> isn't causing any problems. Maybe we should fix this anyway though,
Andrew> just for consistency?
Sure, no problem.
The reason "unsigned integer" causes a problem is that this
canonicalizes to "unsigned int integer". Maybe this is a bug in the
canonicalizer, though it's also weird/"impossible" input.
"integer" doesn't have this problem because it's just an ordinary
identifier.
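Tom's explanation can be illustrated with a toy declaration parser. This is a hypothetical model, not the real canonicalizer, which uses a full grammar. It treats leading type keywords as the type specifier and anything after them as a declarator name. Under that reading, "unsigned integer" is the type "unsigned" applied to a name "integer", which prints as "unsigned int integer". A lone "integer" contains no keyword and passes through unchanged.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model: canonicalize the leading type keywords and keep any
// trailing words as a declarator name.
std::string toy_canonicalize_decl (const std::string &name)
{
  assert (!name.empty ());
  static const std::set<std::string> keywords
    = {"signed", "unsigned", "short", "long", "int", "char"};
  // Illustrative canonical spellings for a few keyword sequences.
  static const std::map<std::string, std::string> canonical = {
    {"signed", "int"}, {"signed int", "int"}, {"int", "int"},
    {"unsigned", "unsigned int"}, {"unsigned int", "unsigned int"},
  };

  std::istringstream in (name);
  std::vector<std::string> words;
  for (std::string w; in >> w; )
    words.push_back (w);

  // Leading keywords form the type specifier.
  std::string spec;
  std::size_t i = 0;
  for (; i < words.size () && keywords.count (words[i]) != 0; ++i)
    {
      if (!spec.empty ())
        spec += ' ';
      spec += words[i];
    }
  if (spec.empty ())
    return name;  // Ordinary identifier such as "integer": unchanged.

  auto it = canonical.find (spec);
  std::string result = it != canonical.end () ? it->second : spec;

  // Whatever follows the specifier is treated as a declarator name.
  for (; i < words.size (); ++i)
    result += " " + words[i];
  return result;
}
```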
Thanks for your reviews.
Tom
Tom> Thanks for your reviews.
I'm checking these in now.
Tom
* Tom Tromey via Gdb-patches <gdb-patches@sourceware.org> [2022-11-07 09:23:56 -0700]:
> PR symtab/29105 shows a number of situations where symbol lookup can
> result in the expansion of too many CUs.
>
> What happens is that lookup_signed_typename will try to look up a type
> like "signed int". In cooked_index_functions::expand_symtabs_matching,
> when looping over languages, the C++ case will canonicalize this type
> name to be "int" instead. Then this method will proceed to expand
> every CU that has an entry for "int" -- i.e., nearly all of them. A
> crucial component of this is that the caller, objfile::lookup_symbol,
> does not do this canonicalization, so when it tries to find the symbol
> for "signed int", it fails -- causing the loop to continue.
>
> This patch fixes the problem by introducing name canonicalization for
> C. The idea here is that, by making C and C++ agree on the canonical
> name when a symbol name can have multiple spellings, we avoid the bad
> behavior in objfile::lookup_symbol (and any other such code -- I don't
> know if there is any).
>
> Unlike C++, C only has a few situations where canonicalization is
> needed. And, in particular, due to the lack of overloading (thus
> avoiding any issues in linespec) and due to the way c-exp.y works, I
> think that no canonicalization is needed during symbol lookup -- only
> during symtab construction. This explains why lookup_name_info is not
> touched.
>
> The stabs reader is modified on a "best effort" basis.
>
> The DWARF reader needed one small tweak in dwarf2_name to avoid a
> regression in dw2-unusual-field-names.exp. I think this is adequately
> explained by the comment, but basically this is a scenario that should
> not occur in real code, only the gdb test suite.
>
> lookup_signed_typename is simplified. It used to search for two
> different type names, but now gdb can search just for the canonical
> form.
>
> gdb.dwarf2/enum-type.exp needed a small tweak, because the
> canonicalizer turns "unsigned integer" into "unsigned int integer".
> It seems better here to use the correct C type name.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
> ---
> gdb/c-lang.c | 14 ++++++++++++
> gdb/c-lang.h | 5 +++++
> gdb/dbxread.c | 13 +++++++++++
> gdb/dwarf2/cooked-index.c | 8 +++++--
> gdb/dwarf2/read.c | 18 +++++++++++++++-
> gdb/gdbtypes.c | 12 +++--------
> gdb/stabsread.c | 30 ++++++++++++++++----------
> gdb/testsuite/gdb.dwarf2/enum-type.exp | 4 ++--
> 8 files changed, 79 insertions(+), 25 deletions(-)
>
> diff --git a/gdb/c-lang.c b/gdb/c-lang.c
> index 36b4d1ae3dd..bfad7aeee60 100644
> --- a/gdb/c-lang.c
> +++ b/gdb/c-lang.c
> @@ -727,6 +727,20 @@ c_is_string_type_p (struct type *type)
>
>
>
> +/* See c-lang.h. */
> +
> +gdb::unique_xmalloc_ptr<char>
> +c_canonicalize_name (const char *name)
> +{
> + if (strchr (name, ' ') != nullptr
> + || streq (name, "signed")
> + || streq (name, "unsigned"))
> + return cp_canonicalize_string (name);
> + return nullptr;
> +}
> +
> +
> +
> void
> c_language_arch_info (struct gdbarch *gdbarch,
> struct language_arch_info *lai)
> diff --git a/gdb/c-lang.h b/gdb/c-lang.h
> index 93515671d80..652f147f656 100644
> --- a/gdb/c-lang.h
> +++ b/gdb/c-lang.h
> @@ -167,4 +167,9 @@ extern std::string cplus_compute_program (compile_instance *inst,
> const struct block *expr_block,
> CORE_ADDR expr_pc);
>
> +/* Return the canonical form of the C symbol NAME. If NAME is already
> + canonical, return nullptr. */
> +
> +extern gdb::unique_xmalloc_ptr<char> c_canonicalize_name (const char *name);
> +
> #endif /* !defined (C_LANG_H) */
> diff --git a/gdb/dbxread.c b/gdb/dbxread.c
> index b0047cf0e79..ae726bdfcc6 100644
> --- a/gdb/dbxread.c
> +++ b/gdb/dbxread.c
> @@ -48,6 +48,7 @@
> #include "complaints.h"
> #include "cp-abi.h"
> #include "cp-support.h"
> +#include "c-lang.h"
> #include "psympriv.h"
> #include "block.h"
> #include "aout/aout64.h"
> @@ -1444,6 +1445,18 @@ read_dbx_symtab (minimal_symbol_reader &reader,
> new_name.get ());
> }
> }
> + else if (psymtab_language == language_c)
> + {
> + std::string name (namestring, p - namestring);
> + gdb::unique_xmalloc_ptr<char> new_name
> + = c_canonicalize_name (name.c_str ());
> + if (new_name != nullptr)
> + {
> + sym_len = strlen (new_name.get ());
> + sym_name = obstack_strdup (&objfile->objfile_obstack,
> + new_name.get ());
> + }
> + }
>
> if (sym_len == 0)
> {
> diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
> index a580d549d0d..0aa026c7779 100644
> --- a/gdb/dwarf2/cooked-index.c
> +++ b/gdb/dwarf2/cooked-index.c
> @@ -21,6 +21,7 @@
> #include "dwarf2/cooked-index.h"
> #include "dwarf2/read.h"
> #include "cp-support.h"
> +#include "c-lang.h"
> #include "ada-lang.h"
> #include "split-name.h"
> #include <algorithm>
> @@ -210,14 +211,17 @@ cooked_index::do_finalize ()
> m_names.push_back (std::move (canon_name));
> }
> }
> - else if (entry->per_cu->lang () == language_cplus)
> + else if (entry->per_cu->lang () == language_cplus
> + || entry->per_cu->lang () == language_c)
> {
> void **slot = htab_find_slot (seen_names.get (), entry,
> INSERT);
> if (*slot == nullptr)
> {
> gdb::unique_xmalloc_ptr<char> canon_name
> - = cp_canonicalize_string (entry->name);
> + = (entry->per_cu->lang () == language_cplus
> + ? cp_canonicalize_string (entry->name)
> + : c_canonicalize_name (entry->name));
> if (canon_name == nullptr)
> entry->canonical = entry->name;
> else
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 978dd4d0bb9..0826f907800 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -22001,7 +22001,10 @@ static const char *
> dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
> struct objfile *objfile)
> {
> - if (name && cu->lang () == language_cplus)
> + if (name == nullptr)
> + return name;
> +
> + if (cu->lang () == language_cplus)
> {
> gdb::unique_xmalloc_ptr<char> canon_name
> = cp_canonicalize_string (name);
> @@ -22009,6 +22012,14 @@ dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
> if (canon_name != nullptr)
> name = objfile->intern (canon_name.get ());
> }
> + else if (cu->lang () == language_c)
> + {
> + gdb::unique_xmalloc_ptr<char> canon_name
> + = c_canonicalize_name (name);
> +
> + if (canon_name != nullptr)
> + name = objfile->intern (canon_name.get ());
> + }
>
> return name;
> }
> @@ -22037,6 +22048,11 @@ dwarf2_name (struct die_info *die, struct dwarf2_cu *cu)
>
> switch (die->tag)
> {
> + /* A member's name should not be canonicalized. This is a bit
> + of a hack, in that normally it should not be possible to run
> + into this situation; however, the dw2-unusual-field-names.exp
> + test creates custom DWARF that does. */
> + case DW_TAG_member:
> case DW_TAG_compile_unit:
> case DW_TAG_partial_unit:
> /* Compilation units have a DW_AT_name that is a filename, not
> diff --git a/gdb/gdbtypes.c b/gdb/gdbtypes.c
> index a43d9265ad2..d6e8109a95c 100644
> --- a/gdb/gdbtypes.c
> +++ b/gdb/gdbtypes.c
> @@ -1721,15 +1721,9 @@ lookup_unsigned_typename (const struct language_defn *language,
> struct type *
> lookup_signed_typename (const struct language_defn *language, const char *name)
> {
> - struct type *t;
> - char *uns = (char *) alloca (strlen (name) + 8);
> -
> - strcpy (uns, "signed ");
> - strcpy (uns + 7, name);
> - t = lookup_typename (language, uns, NULL, 1);
> - /* If we don't find "signed FOO" just try again with plain "FOO". */
> - if (t != NULL)
> - return t;
> + /* In C and C++, "char" and "signed char" are distinct types. */
> + if (streq (name, "char"))
> + name = "signed char";
I wondered why this "char" -> "signed char" conversion is done
unconditionally for all languages, when the comment suggests that the
conversion only applies to C/C++. I would have expected a language
check here.
Thanks,
Andrew
> return lookup_typename (language, name, NULL, 0);
> }
>
> diff --git a/gdb/stabsread.c b/gdb/stabsread.c
> index 612443557b5..74d0885fa71 100644
> --- a/gdb/stabsread.c
> +++ b/gdb/stabsread.c
> @@ -736,11 +736,13 @@ define_symbol (CORE_ADDR valu, const char *string, int desc, int type,
>
> if (sym->language () == language_cplus)
> {
> - char *name = (char *) alloca (p - string + 1);
> -
> - memcpy (name, string, p - string);
> - name[p - string] = '\0';
> - new_name = cp_canonicalize_string (name);
> + std::string name (string, p - string);
> + new_name = cp_canonicalize_string (name.c_str ());
> + }
> + else if (sym->language () == language_c)
> + {
> + std::string name (string, p - string);
> + new_name = c_canonicalize_name (name.c_str ());
> }
> if (new_name != nullptr)
> sym->compute_and_set_names (new_name.get (), true, objfile->per_bfd);
> @@ -1592,12 +1594,18 @@ read_type (const char **pp, struct objfile *objfile)
> type_name = NULL;
> if (get_current_subfile ()->language == language_cplus)
> {
> - char *name = (char *) alloca (p - *pp + 1);
> -
> - memcpy (name, *pp, p - *pp);
> - name[p - *pp] = '\0';
> -
> - gdb::unique_xmalloc_ptr<char> new_name = cp_canonicalize_string (name);
> + std::string name (*pp, p - *pp);
> + gdb::unique_xmalloc_ptr<char> new_name
> + = cp_canonicalize_string (name.c_str ());
> + if (new_name != nullptr)
> + type_name = obstack_strdup (&objfile->objfile_obstack,
> + new_name.get ());
> + }
> + else if (get_current_subfile ()->language == language_c)
> + {
> + std::string name (*pp, p - *pp);
> + gdb::unique_xmalloc_ptr<char> new_name
> + = c_canonicalize_name (name.c_str ());
> if (new_name != nullptr)
> type_name = obstack_strdup (&objfile->objfile_obstack,
> new_name.get ());
> diff --git a/gdb/testsuite/gdb.dwarf2/enum-type.exp b/gdb/testsuite/gdb.dwarf2/enum-type.exp
> index ed8e3a35d69..6ebaefa6fb1 100644
> --- a/gdb/testsuite/gdb.dwarf2/enum-type.exp
> +++ b/gdb/testsuite/gdb.dwarf2/enum-type.exp
> @@ -43,7 +43,7 @@ Dwarf::assemble $asm_file {
> uinteger_label: DW_TAG_base_type {
> {DW_AT_byte_size 4 DW_FORM_sdata}
> {DW_AT_encoding @DW_ATE_unsigned}
> - {DW_AT_name {unsigned integer}}
> + {DW_AT_name {unsigned int}}
> }
>
> DW_TAG_enumeration_type {
> @@ -79,5 +79,5 @@ gdb_test "print sizeof(enum E)" " = 4"
> gdb_test "ptype enum EU" "type = enum EU {TWO = 2}" \
> "ptype EU in enum C"
> gdb_test_no_output "set lang c++"
> -gdb_test "ptype enum EU" "type = enum EU : unsigned integer {TWO = 2}" \
> +gdb_test "ptype enum EU" "type = enum EU : unsigned int {TWO = 2}" \
> "ptype EU in C++"
> --
> 2.34.3
>
>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:
>> struct type *
>> lookup_signed_typename (const struct language_defn *language, const char *name)
>> {
>> - struct type *t;
>> - char *uns = (char *) alloca (strlen (name) + 8);
>> -
>> - strcpy (uns, "signed ");
>> - strcpy (uns + 7, name);
>> - t = lookup_typename (language, uns, NULL, 1);
>> - /* If we don't find "signed FOO" just try again with plain "FOO". */
>> - if (t != NULL)
>> - return t;
>> + /* In C and C++, "char" and "signed char" are distinct types. */
>> + if (streq (name, "char"))
>> + name = "signed char";
Andrew> I wondered why this "char" -> "signed char" conversion is done
Andrew> unconditionally for all languages, when the comment hints that the
Andrew> conversion only applies for C/C++? I guess I would have expected a
Andrew> language check here.
lookup_signed_typename is only used by C and C-like languages. For
non-C-like languages, sticking a "signed" (or "unsigned", see
lookup_unsigned_typename) prefix on a type name doesn't really make
sense anyway.
The uses outside c-exp.y are, IMNSHO, just leftover code from the bad
old days. For example, I suspect there's no reason to have binop_promote
at all; instead, this could be an explicit node in the expression tree,
and rather than having a big 'switch' on the language, each language
could simply make a different node.
Tom