From patchwork Mon Nov 7 16:23:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Tromey X-Patchwork-Id: 60127 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 53383385843A for ; Mon, 7 Nov 2022 16:24:44 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 53383385843A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1667838284; bh=itoYXqDduYwg/M4zX0GOkRlfbCtPz0g22F1QrUcNVC0=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=Hrmgltv3REVCLyx5F89h4u6PbDRgGehv5Xgr1KNB0P4cc874d+o8aBBfWIUR3PXg9 tm0hSOQ9mxCKCsvWGZreRmaATY9yEO+IoJO5zxhZV5IGghfzkNEvsR15OLjzH9O9HK d2/JyEni1thWdU3uwCI75BoxLjerAV+MCMRgQMzI= X-Original-To: gdb-patches@sourceware.org Delivered-To: gdb-patches@sourceware.org Received: from mail-io1-xd34.google.com (mail-io1-xd34.google.com [IPv6:2607:f8b0:4864:20::d34]) by sourceware.org (Postfix) with ESMTPS id E39043858C36 for ; Mon, 7 Nov 2022 16:24:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E39043858C36 Received: by mail-io1-xd34.google.com with SMTP id z3so9276689iof.3 for ; Mon, 07 Nov 2022 08:24:10 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=itoYXqDduYwg/M4zX0GOkRlfbCtPz0g22F1QrUcNVC0=; b=ehy3XVTUTpL5Rlpvd+To3ITjZNXuSmuJFj6lTN+cmX1kv10zAPcGcgsKlOWyqogwwK cav8JO+2R38F5a3nzzXcn9djZiDad7rN/ZdtzuTc28oGIenPvcy4pG41L9QokBCu2wfI 2KX03nyiKKYCNZMrgMalBot8gTUq0/haSv8SkHdieRReQ3/agz28cHvvWBtLfllfk1re 0fe0subvTQOQcY8m73kkSSOIfMum/VYsmu8/0X1uorkWv4yx704BMNgPhOeBC3iPjPhQ w6uuUs//icD+EHHO2MFhgtdhYdn8Jy+NsqEjggvQnk0eCi2YuwIbC76c1ThhULPEX14O PPUw== X-Gm-Message-State: ACrzQf2eAdKUYopGBBtI2XRUZHLKdrqixUzZs/U47+K4bh0E84+eE/XP 0epNSKIXBHbAXHVicIWYQpgflhWWUp1pOA== X-Google-Smtp-Source: AMsMyM6/WLZdjxkc/5qNZblwlLPqRrAnA8l52FshmbSxY3esE9smJ2pDk60tFlYMwyaMAKpj4xBZ5A== X-Received: by 2002:a05:6638:2384:b0:375:2a77:5188 with SMTP id q4-20020a056638238400b003752a775188mr29976394jat.39.1667838250119; Mon, 07 Nov 2022 08:24:10 -0800 (PST) Received: from localhost.localdomain (97-122-76-186.hlrn.qwest.net. [97.122.76.186]) by smtp.gmail.com with ESMTPSA id i12-20020a023b4c000000b003750aa3c5c1sm2790492jaf.31.2022.11.07.08.24.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 08:24:09 -0800 (PST) To: gdb-patches@sourceware.org Cc: Tom Tromey Subject: [PATCH 3/3] Add name canonicalization for C Date: Mon, 7 Nov 2022 09:23:56 -0700 Message-Id: <20221107162356.3175221-4-tromey@adacore.com> X-Mailer: git-send-email 2.34.3 In-Reply-To: <20221107162356.3175221-1-tromey@adacore.com> References: <20221107162356.3175221-1-tromey@adacore.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom Tromey via Gdb-patches From: Tom Tromey Reply-To: Tom Tromey Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org Sender: "Gdb-patches" PR symtab/29105 shows a number of situations where symbol lookup can result in the expansion of too many CUs. What happens is that lookup_signed_typename will try to look up a type like "signed int". In cooked_index_functions::expand_symtabs_matching, when looping over languages, the C++ case will canonicalize this type name to be "int" instead. Then this method will proceed to expand every CU that has an entry for "int" -- i.e., nearly all of them. A crucial component of this is that the caller, objfile::lookup_symbol, does not do this canonicalization, so when it tries to find the symbol for "signed int", it fails -- causing the loop to continue. This patch fixes the problem by introducing name canonicalization for C. The idea here is that, by making C and C++ agree on the canonical name when a symbol name can have multiple spellings, we avoid the bad behavior in objfile::lookup_symbol (and any other such code -- I don't know if there is any). Unlike C++, C only has a few situations where canonicalization is needed. And, in particular, due to the lack of overloading (thus avoiding any issues in linespec) and due to the way c-exp.y works, I think that no canonicalization is needed during symbol lookup -- only during symtab construction. This explains why lookup_name_info is not touched. The stabs reader is modified on a "best effort" basis. The DWARF reader needed one small tweak in dwarf2_name to avoid a regression in dw2-unusual-field-names.exp. I think this is adequately explained by the comment, but basically this is a scenario that should not occur in real code, only the gdb test suite. lookup_signed_typename is simplified. It used to search for two different type names, but now gdb can search just for the canonical form. gdb.dwarf2/enum-type.exp needed a small tweak, because the canonicalizer turns "unsigned integer" into "unsigned int integer". It seems better here to use the correct C type name. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105 --- gdb/c-lang.c | 14 ++++++++++++ gdb/c-lang.h | 5 +++++ gdb/dbxread.c | 13 +++++++++++ gdb/dwarf2/cooked-index.c | 8 +++++-- gdb/dwarf2/read.c | 18 +++++++++++++++- gdb/gdbtypes.c | 12 +++-------- gdb/stabsread.c | 30 ++++++++++++++++---------- gdb/testsuite/gdb.dwarf2/enum-type.exp | 4 ++-- 8 files changed, 79 insertions(+), 25 deletions(-) diff --git a/gdb/c-lang.c b/gdb/c-lang.c index 36b4d1ae3dd..bfad7aeee60 100644 --- a/gdb/c-lang.c +++ b/gdb/c-lang.c @@ -727,6 +727,20 @@ c_is_string_type_p (struct type *type) +/* See c-lang.h. */ + +gdb::unique_xmalloc_ptr +c_canonicalize_name (const char *name) +{ + if (strchr (name, ' ') != nullptr + || streq (name, "signed") + || streq (name, "unsigned")) + return cp_canonicalize_string (name); + return nullptr; +} + + + void c_language_arch_info (struct gdbarch *gdbarch, struct language_arch_info *lai) diff --git a/gdb/c-lang.h b/gdb/c-lang.h index 93515671d80..652f147f656 100644 --- a/gdb/c-lang.h +++ b/gdb/c-lang.h @@ -167,4 +167,9 @@ extern std::string cplus_compute_program (compile_instance *inst, const struct block *expr_block, CORE_ADDR expr_pc); +/* Return the canonical form of the C symbol NAME. If NAME is already + canonical, return nullptr. */ + +extern gdb::unique_xmalloc_ptr c_canonicalize_name (const char *name); + #endif /* !defined (C_LANG_H) */ diff --git a/gdb/dbxread.c b/gdb/dbxread.c index b0047cf0e79..ae726bdfcc6 100644 --- a/gdb/dbxread.c +++ b/gdb/dbxread.c @@ -48,6 +48,7 @@ #include "complaints.h" #include "cp-abi.h" #include "cp-support.h" +#include "c-lang.h" #include "psympriv.h" #include "block.h" #include "aout/aout64.h" @@ -1444,6 +1445,18 @@ read_dbx_symtab (minimal_symbol_reader &reader, new_name.get ()); } } + else if (psymtab_language == language_c) + { + std::string name (namestring, p - namestring); + gdb::unique_xmalloc_ptr new_name + = c_canonicalize_name (name.c_str ()); + if (new_name != nullptr) + { + sym_len = strlen (new_name.get ()); + sym_name = obstack_strdup (&objfile->objfile_obstack, + new_name.get ()); + } + } if (sym_len == 0) { diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c index a580d549d0d..0aa026c7779 100644 --- a/gdb/dwarf2/cooked-index.c +++ b/gdb/dwarf2/cooked-index.c @@ -21,6 +21,7 @@ #include "dwarf2/cooked-index.h" #include "dwarf2/read.h" #include "cp-support.h" +#include "c-lang.h" #include "ada-lang.h" #include "split-name.h" #include @@ -210,14 +211,17 @@ cooked_index::do_finalize () m_names.push_back (std::move (canon_name)); } } - else if (entry->per_cu->lang () == language_cplus) + else if (entry->per_cu->lang () == language_cplus + || entry->per_cu->lang () == language_c) { void **slot = htab_find_slot (seen_names.get (), entry, INSERT); if (*slot == nullptr) { gdb::unique_xmalloc_ptr canon_name - = cp_canonicalize_string (entry->name); + = (entry->per_cu->lang () == language_cplus + ? cp_canonicalize_string (entry->name) + : c_canonicalize_name (entry->name)); if (canon_name == nullptr) entry->canonical = entry->name; else diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c index 978dd4d0bb9..0826f907800 100644 --- a/gdb/dwarf2/read.c +++ b/gdb/dwarf2/read.c @@ -22001,7 +22001,10 @@ static const char * dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu, struct objfile *objfile) { - if (name && cu->lang () == language_cplus) + if (name == nullptr) + return name; + + if (cu->lang () == language_cplus) { gdb::unique_xmalloc_ptr canon_name = cp_canonicalize_string (name); @@ -22009,6 +22012,14 @@ dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu, if (canon_name != nullptr) name = objfile->intern (canon_name.get ()); } + else if (cu->lang () == language_c) + { + gdb::unique_xmalloc_ptr canon_name + = c_canonicalize_name (name); + + if (canon_name != nullptr) + name = objfile->intern (canon_name.get ()); + } return name; } @@ -22037,6 +22048,11 @@ dwarf2_name (struct die_info *die, struct dwarf2_cu *cu) switch (die->tag) { + /* A member's name should not be canonicalized. This is a bit + of a hack, in that normally it should not be possible to run + into this situation; however, the dw2-unusual-field-names.exp + test creates custom DWARF that does. */ + case DW_TAG_member: case DW_TAG_compile_unit: case DW_TAG_partial_unit: /* Compilation units have a DW_AT_name that is a filename, not diff --git a/gdb/gdbtypes.c b/gdb/gdbtypes.c index a43d9265ad2..d6e8109a95c 100644 --- a/gdb/gdbtypes.c +++ b/gdb/gdbtypes.c @@ -1721,15 +1721,9 @@ lookup_unsigned_typename (const struct language_defn *language, struct type * lookup_signed_typename (const struct language_defn *language, const char *name) { - struct type *t; - char *uns = (char *) alloca (strlen (name) + 8); - - strcpy (uns, "signed "); - strcpy (uns + 7, name); - t = lookup_typename (language, uns, NULL, 1); - /* If we don't find "signed FOO" just try again with plain "FOO". */ - if (t != NULL) - return t; + /* In C and C++, "char" and "signed char" are distinct types. */ + if (streq (name, "char")) + name = "signed char"; return lookup_typename (language, name, NULL, 0); } diff --git a/gdb/stabsread.c b/gdb/stabsread.c index 612443557b5..74d0885fa71 100644 --- a/gdb/stabsread.c +++ b/gdb/stabsread.c @@ -736,11 +736,13 @@ define_symbol (CORE_ADDR valu, const char *string, int desc, int type, if (sym->language () == language_cplus) { - char *name = (char *) alloca (p - string + 1); - - memcpy (name, string, p - string); - name[p - string] = '\0'; - new_name = cp_canonicalize_string (name); + std::string name (string, p - string); + new_name = cp_canonicalize_string (name.c_str ()); + } + else if (sym->language () == language_c) + { + std::string name (string, p - string); + new_name = c_canonicalize_name (name.c_str ()); } if (new_name != nullptr) sym->compute_and_set_names (new_name.get (), true, objfile->per_bfd); @@ -1592,12 +1594,18 @@ read_type (const char **pp, struct objfile *objfile) type_name = NULL; if (get_current_subfile ()->language == language_cplus) { - char *name = (char *) alloca (p - *pp + 1); - - memcpy (name, *pp, p - *pp); - name[p - *pp] = '\0'; - - gdb::unique_xmalloc_ptr new_name = cp_canonicalize_string (name); + std::string name (*pp, p - *pp); + gdb::unique_xmalloc_ptr new_name + = cp_canonicalize_string (name.c_str ()); + if (new_name != nullptr) + type_name = obstack_strdup (&objfile->objfile_obstack, + new_name.get ()); + } + else if (get_current_subfile ()->language == language_c) + { + std::string name (*pp, p - *pp); + gdb::unique_xmalloc_ptr new_name + = c_canonicalize_name (name.c_str ()); if (new_name != nullptr) type_name = obstack_strdup (&objfile->objfile_obstack, new_name.get ()); diff --git a/gdb/testsuite/gdb.dwarf2/enum-type.exp b/gdb/testsuite/gdb.dwarf2/enum-type.exp index ed8e3a35d69..6ebaefa6fb1 100644 --- a/gdb/testsuite/gdb.dwarf2/enum-type.exp +++ b/gdb/testsuite/gdb.dwarf2/enum-type.exp @@ -43,7 +43,7 @@ Dwarf::assemble $asm_file { uinteger_label: DW_TAG_base_type { {DW_AT_byte_size 4 DW_FORM_sdata} {DW_AT_encoding @DW_ATE_unsigned} - {DW_AT_name {unsigned integer}} + {DW_AT_name {unsigned int}} } DW_TAG_enumeration_type { @@ -79,5 +79,5 @@ gdb_test "print sizeof(enum E)" " = 4" gdb_test "ptype enum EU" "type = enum EU {TWO = 2}" \ "ptype EU in enum C" gdb_test_no_output "set lang c++" -gdb_test "ptype enum EU" "type = enum EU : unsigned integer {TWO = 2}" \ +gdb_test "ptype enum EU" "type = enum EU : unsigned int {TWO = 2}" \ "ptype EU in C++"