From patchwork Tue Mar 11 14:12:38 2025
X-Patchwork-Submitter: Tom Tromey
X-Patchwork-Id: 107667
From: Tom Tromey
Date: Tue, 11 Mar 2025 08:12:38 -0600
Subject: [PATCH 02/28] Change ada_decode to preserve upper-case in some
 situations
Message-Id: <20250311-search-in-psyms-v1-2-d73d9be20983@tromey.com>
References: <20250311-search-in-psyms-v1-0-d73d9be20983@tromey.com>
In-Reply-To: <20250311-search-in-psyms-v1-0-d73d9be20983@tromey.com>
To: gdb-patches@sourceware.org
Cc: Tom Tromey

This patch is needed to avoid regressions later in the series.

The issue here is that ada_decode, when called with wide=false, would
act as though the input needed verbatim quoting.  That would happen
because the 'W' character would be passed through; and then a later
loop would reject the result due to that character.

Similarly, with operators=false the upper-case-checking loop would be
skipped, but then some names that did need verbatim quoting would pass
through.

Furthermore I noticed that there isn't a need to distinguish between
the "wide" and "operators" cases -- all callers pass identical values
to both.

This patch cleans up the above, consolidating the parameters and
changing how upper-case detection is handled, so that both the
operator and wide cases pass through without issue.
I've added new unit tests for this.
---
 gdb/ada-lang.c            | 83 ++++++++++++++++++++++++++++++++++-------------
 gdb/ada-lang.h            | 15 ++++-----
 gdb/dwarf2/cooked-index.c |  2 +-
 gdb/symtab.h              |  2 +-
 4 files changed, 68 insertions(+), 34 deletions(-)

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index a55ee12ce70d02082e64d85634b87dd27f5a0670..4bb6a808fd8c1a7f8e4b2344fdf935f94c602ed1 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -1308,7 +1308,7 @@ convert_from_hex_encoded (std::string &out, const char *str, int n)
 /* See ada-lang.h.  */
 
 std::string
-ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
+ada_decode (const char *encoded, bool wrap, bool translate)
 {
   int i;
   int len0;
@@ -1403,7 +1403,7 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
   while (i < len0)
     {
       /* Is this a symbol function?  */
-      if (operators && at_start_name && encoded[i] == 'O')
+      if (at_start_name && encoded[i] == 'O')
         {
           int k;
 
@@ -1414,7 +1414,10 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
                             op_len - 1) == 0)
                   && !isalnum (encoded[i + op_len]))
                 {
-                  decoded.append (ada_opname_table[k].decoded);
+                  if (translate)
+                    decoded.append (ada_opname_table[k].decoded);
+                  else
+                    decoded.append (ada_opname_table[k].encoded);
                   at_start_name = 0;
                   i += op_len;
                   break;
@@ -1502,28 +1505,59 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
           i++;
         }
 
-      if (wide && i < len0 + 3 && encoded[i] == 'U' && isxdigit (encoded[i + 1]))
+      /* Handle wide characters while respecting the arguments to the
+         function: we may want to copy them verbatim, but in this case
+         we do not want to register that we've copied an upper-case
+         character.  */
+      if (i < len0 + 3 && encoded[i] == 'U' && isxdigit (encoded[i + 1]))
         {
-          if (convert_from_hex_encoded (decoded, &encoded[i + 1], 2))
+          if (translate)
             {
-              i += 3;
+              if (convert_from_hex_encoded (decoded, &encoded[i + 1], 2))
+                {
+                  i += 3;
+                  continue;
+                }
+            }
+          else
+            {
+              decoded.push_back (encoded[i]);
+              ++i;
               continue;
             }
         }
-      else if (wide && i < len0 + 5 && encoded[i] == 'W' && isxdigit (encoded[i + 1]))
+      else if (i < len0 + 5 && encoded[i] == 'W' && isxdigit (encoded[i + 1]))
         {
-          if (convert_from_hex_encoded (decoded, &encoded[i + 1], 4))
+          if (translate)
+            {
+              if (convert_from_hex_encoded (decoded, &encoded[i + 1], 4))
+                {
+                  i += 5;
+                  continue;
+                }
+            }
+          else
             {
-              i += 5;
+              decoded.push_back (encoded[i]);
+              ++i;
               continue;
             }
         }
-      else if (wide && i < len0 + 10 && encoded[i] == 'W' && encoded[i + 1] == 'W'
+      else if (i < len0 + 10 && encoded[i] == 'W' && encoded[i + 1] == 'W'
                && isxdigit (encoded[i + 2]))
         {
-          if (convert_from_hex_encoded (decoded, &encoded[i + 2], 8))
+          if (translate)
             {
-              i += 10;
+              if (convert_from_hex_encoded (decoded, &encoded[i + 2], 8))
+                {
+                  i += 10;
+                  continue;
+                }
+            }
+          else
+            {
+              decoded.push_back (encoded[i]);
+              ++i;
               continue;
             }
         }
@@ -1550,6 +1584,12 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
           at_start_name = 1;
           i += 2;
         }
+      else if (isupper (encoded[i]) || encoded[i] == ' ')
+        {
+          /* Decoded names should never contain any uppercase
+             character.  */
+          goto Suppress;
+        }
       else
         {
           /* It's a character part of the decoded name, so just copy it
@@ -1559,16 +1599,6 @@ ada_decode (const char *encoded, bool wrap, bool operators, bool wide)
         }
     }
 
-  /* Decoded names should never contain any uppercase character.
-     Double-check this, and abort the decoding if we find one.  */
-
-  if (operators)
-    {
-      for (i = 0; i < decoded.length(); ++i)
-        if (isupper (decoded[i]) || decoded[i] == ' ')
-          goto Suppress;
-    }
-
   /* If the compiler added a suffix, append it now.  */
   if (suffix >= 0)
     decoded = decoded + "[" + &encoded[suffix] + "]";
@@ -1594,6 +1624,13 @@ ada_decode_tests ()
   /* This isn't valid, but used to cause a crash.  PR gdb/30639.  The
      result does not really matter very much.  */
   SELF_CHECK (ada_decode ("44") == "44");
+
+  /* Check that the settings used by the DWARF reader have the desired
+     effect.  */
+  SELF_CHECK (ada_decode ("symada__cS", false, false) == "");
+  SELF_CHECK (ada_decode ("pkg__Oxor", false, false) == "pkg.Oxor");
+  SELF_CHECK (ada_decode ("pack__func_W017b", false, false)
+              == "pack.func_W017b");
 }
 
 #endif
@@ -13311,7 +13348,7 @@ ada_lookup_name_info::ada_lookup_name_info (const lookup_name_info &lookup_name)
       else
         m_standard_p = false;
 
-      m_decoded_name = ada_decode (m_encoded_name.c_str (), true, false, false);
+      m_decoded_name = ada_decode (m_encoded_name.c_str (), true, false);
 
       /* If the name contains a ".", then the user is entering a fully
          qualified entity name, and the match must not be done in wild
diff --git a/gdb/ada-lang.h b/gdb/ada-lang.h
index 3582082a1a1b702595b803072ff9c345b7f3e0f7..a96a1f6e01737b03c6e6dea5024fbdd253647201 100644
--- a/gdb/ada-lang.h
+++ b/gdb/ada-lang.h
@@ -218,16 +218,13 @@ extern const char *ada_decode_symbol (const struct general_symbol_info *);
    simply wrapped in <...>.  If WRAP is false, then the empty string
    will be returned.
 
-   When OPERATORS is false, operator names will not be decoded.  By
-   default, they are decoded, e.g., 'Oadd' will be transformed to
-   '"+"'.
-
-   When WIDE is false, wide characters will be left as-is.  By
-   default, they converted from their hex encoding to the host
-   charset.  */
+   TRANSLATE has two effects.  When true (the default), operator names
+   and wide characters will be decoded.  E.g., 'Oadd' will be
+   transformed to '"+"', and wide characters converted from their hex
+   encoding to the host charset.  When false, these will be left
+   alone.  */
 extern std::string ada_decode (const char *name, bool wrap = true,
-                               bool operators = true,
-                               bool wide = true);
+                               bool translate = true);
 
 extern std::vector ada_lookup_symbol_list
      (const char *, const struct block *, domain_search_flags);
diff --git a/gdb/dwarf2/cooked-index.c b/gdb/dwarf2/cooked-index.c
index 9533a20e6c48cd164f1de853f1071ce5cb00ca88..427b9bbb2f6ce7a9e1339a729b58d2f64286677d 100644
--- a/gdb/dwarf2/cooked-index.c
+++ b/gdb/dwarf2/cooked-index.c
@@ -359,7 +359,7 @@ cooked_index_shard::handle_gnat_encoded_entry
      characters are left as-is.  This is done to make name matching a
      bit simpler; and for wide characters, it means the choice of Ada
      source charset does not affect the indexer directly.  */
-  std::string canonical = ada_decode (entry->name, false, false, false);
+  std::string canonical = ada_decode (entry->name, false, false);
   if (canonical.empty ())
     {
       entry->canonical = entry->name;
diff --git a/gdb/symtab.h b/gdb/symtab.h
index 7927380fca3f115fd43ecdaf683ecc07a0ff22e0..83913b1806f4a5fe39987978bb7059efc606a594 100644
--- a/gdb/symtab.h
+++ b/gdb/symtab.h
@@ -145,7 +145,7 @@ class ada_lookup_name_info final
   std::string m_encoded_name;
 
   /* The decoded lookup name.  This is formed by calling ada_decode
-     with both 'operators' and 'wide' set to false.  */
+     with 'translate' set to false.  */
   std::string m_decoded_name;
 
   /* Whether the user-provided lookup name was Ada encoded.  If so,