From patchwork Sat Oct 6 00:16:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joel Brobecker X-Patchwork-Id: 29663 Received: (qmail 5754 invoked by alias); 6 Oct 2018 00:16:54 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 4813 invoked by uid 89); 6 Oct 2018 00:16:39 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS, TIME_LIMIT_EXCEEDED autolearn=unavailable version=3.3.2 spammy=hitch, Beta, brain, comprehensive X-HELO: rock.gnat.com Received: from rock.gnat.com (HELO rock.gnat.com) (205.232.38.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 06 Oct 2018 00:16:25 +0000 Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id CED21560BA for ; Fri, 5 Oct 2018 20:16:23 -0400 (EDT) Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id ZSYUiVFYXbRC for ; Fri, 5 Oct 2018 20:16:23 -0400 (EDT) Received: from joel.gnat.com (localhost.localdomain [127.0.0.1]) by rock.gnat.com (Postfix) with ESMTP id 8253B560B7 for ; Fri, 5 Oct 2018 20:16:23 -0400 (EDT) Received: by joel.gnat.com (Postfix, from userid 1000) id BE41382C67; Fri, 5 Oct 2018 17:16:21 -0700 (PDT) Date: Fri, 5 Oct 2018 17:16:21 -0700 From: Joel Brobecker To: gdb-patches@sourceware.org Subject: [RFC] partial symbol name matching vs regexp Message-ID: <20181006001621.GA6358@adacore.com> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.9.4 (2018-02-28) Hello, First of all, apologies for the long-ish email below; I was just trying to be thorough and go slow in my explanation. I hope nothing's too convoluted, and that the email can be read without too much brain intensity.... (the fixing, on the other hand... ;-)) One of our users reported that info types was sometimes not working for Ada types. Consider for instance the following code: package Alpha is type Beta is record B : Integer; end record; end Alpha; This package defines a structure type called "Beta" inside package Alpha. The fully qualified (natural) name of the type is "Alpha.Beta" and underneath, its linkage name is alpha__beta. What the user is trying to do is use the "info types" command with the fully qualified name, something like: (gdb) info types alpha.beta Unfortunately, as of today, this does not always work: $ gdb -q foo Reading symbols from foo... (gdb) info types alpha.beta All types matching regular expression "alpha.beta": (gdb) The same query but with just the type's name, not qualified, does work; and once that's done, package Alpha's partial symtab is now expanded, so do the initial query above also starts working: $ gdb -q foo Reading symbols from foo... (gdb) info types beta All types matching regular expression "beta": File alpha.ads: 2: alpha.beta; (gdb) info types alpha.beta All types matching regular expression "alpha.beta": File alpha.ads: 2: alpha.beta; So, the problem in the matching of the partial symtabs we want to expand. The function that searches for matching symbols is in symtab.c... | std::vector | search_symbols (const char *regexp, enum search_domain kind, | int nfiles, const char *files[]) ... and the expansion is done via a call to expand_symtabs_matching: | /* Search through the partial symtabs *first* for all symbols | matching the regexp. That way we don't have to reproduce all of | the machinery below. */ | expand_symtabs_matching ([file_matcher snip'ed], | lookup_name_info::match_any (), | | [&] (const char *symname) | | { | [symbol_matcher]---+ return (!preg || preg->exec (symname, | | 0, NULL, 0) == 0); | | }, | NULL, | kind); The part of interest is the symbol_matcher argument, passed via a lambda function which evaluates the regexp against the given symname; I've kind of highlighted that argument in the copy/paste above, but here it is again, for the avoidance of doubt: | [&] (const char *symname) | { | return (!preg || preg->exec (symname, | 0, NULL, 0) == 0); | }, As you can see, it is a fairly straightforward implementation that runs the given symbol name against the regexp. In our case, that symbol_matchers doesn't match alpha.beta, because SYMNAME is the symbol's search name, and in Ada, we search by encoded name. Thus, for the symbol we were hoping to match, it gets called with "alpha__beta". We can verify that the search_name gets used by looking at the caller of that lambda function (recursively_search_psymtabs), which iterates over all the symbols of a partial symtab with: | && (sym_matcher == NULL || sym_matcher (symbol_search_name (*psym)))) That's why it doesn't work. After pondering about it a bit, my overall thinking is that regular expressions used in the context of matching symbol name have to be coming from a user, who normally reasons in terms of natural name; and therefore these regexps should be evaluated against the symbols' natural name, rather than their search name. With that assumption in mind, I think the way to make it work is to change the symbol_matcher's signature to receive more information - so that each caller of expand_symtabs_matching can then decide which symbol name it needs to look at; in most cases, it will be the symbol_search_name, but in the particular case of search_symbols where we're comparing against a regexp that we assume comes from the user, we can decide to use the symbol_natural_name instead. This would actually be consistent with the rest of search_symbols's implementation; as you can see, once the partial symbol expansion is performed, it iterates over minimal symbols and full symbols, and selects them based on the symbol's natural name as well: | ALL_MSYMBOLS (objfile, msymbol) | [...] | if (!preg | || preg->exec (MSYMBOL_NATURAL_NAME (msymbol), 0, ... and ... | ALL_COMPUNITS (objfile, cust) | [...] | && ((!preg | || preg->exec (SYMBOL_NATURAL_NAME (sym), 0, | NULL, 0) == 0) Following this, I thought the simplest would probably to have the symbol_matcher receive a general_symbol_info instead of just the name. I tried writing a prototype that does just this, and it seems to be giving the results I hoped, without any regression on x86_64-linux, using both the official testsuite as well as AdaCore's testsuite. See the attached patch that modifies expand_symtabs_symbol_matcher_ftype to take a general_symbol_info instead of a name, and then adjusts all users. Unfortunately, while working on that prototype, I realized we have .gdb_index support in dwarf2read.c that uses that infrastructure too. Given that gdb_index is just an elaborate index (and therefore just a very partial view of the debuging information), I don't think the entries in the gdb_index symbol table have enough information to allow us to construct a complete general_symbol_info object. The prototype I attached side-steps the question by just creating partial general_symbol_info objects where only the name is set. It allows me to get identical results for any language but Ada, knowing that it doesn't interfere with my Ada testing, since we do not support gdb_index with Ada yet. Trying to resolve that hitch with the gdb_index handling, and after having read the description of what's in the gdb_index section, I think there should be a way to link a symbol in the gdb_index back to the DWARF CU, and therefore get the symbol's language. Once we have the language in addition to the symbol name, we should have all we need for our immediate purposes. Thus, a second option would be to keep the name parameter in expand_symtabs_symbol_matcher_ftype, but then add the symbol language as a second parameter; we would then modify the symbol_matcher in search_symbols to check the language, and if the language is Ada, then decode the name before evaluating the regexp. Outside of adjusting the prototype of a couple of lambda functions, everything else would remain the same. But I'm not really satisfied with the second option (two parameters). What if a search routine wanted to expand based on the linkage name, one day, even for C++ symbols. I don't know if mangling a symbol back would be possible or not, but regardless of that, it would feel wrong to compute a mangled name when the caller had it. For that reason, I tend to still think that changing expand_symtabs_symbol_matcher_ftype to receive a general_symbol_info is better. It's not perfect in the case of gdb_index handling, but I think that the consequences of that peculiarity would be contained within the gdb_index handling in dwarf2read.c. So, at least, it wouldn't be a caller in dwarf2read.c passing an unsuspecting symbol_matcher function defined elsewhere an incomplete general_symbol_info. We could possibly mitigate the problem by documenting that only the name and language of the general_symbol_info parameter in expand_symtabs_symbol_matcher_ftype function should be accessed, but I think it's too easy for people to miss that and access the fields anyway. Better put the documentation in dwarf2read's particular implementation, IMO, and make sure that it remains contained and consistent within that unit. Attached is my prototype patch, with a small test that has one fail because the patch is applied. The test is very simple, and I intend to make a more complex one, but it should make it easier for you to try this example should you like to. Please let me know what you think! Thank you, From 10c678ea79793b3bcf039b8053bd3d0b70e71715 Mon Sep 17 00:00:00 2001 From: Joel Brobecker Date: Fri, 5 Oct 2018 19:53:17 -0400 Subject: [PATCH] missing Ada symbol listed by info types with some regular expressions *** FIXME *** write a revision log and ChangeLog *** FIXME *** provide a more comprehensive testcase. --- gdb/ada-lang.c | 4 +++- gdb/dwarf2read.c | 4 +++- gdb/psymtab.c | 2 +- gdb/symfile.h | 7 ++++--- gdb/symmisc.c | 2 +- gdb/symtab.c | 4 +++- gdb/testsuite/gdb.ada/info_types_ada.exp | 29 ++++++++++++++++++++++++++ gdb/testsuite/gdb.ada/info_types_ada/alpha.adb | 6 ++++++ gdb/testsuite/gdb.ada/info_types_ada/alpha.ads | 7 +++++++ gdb/testsuite/gdb.ada/info_types_ada/foo.adb | 6 ++++++ 10 files changed, 63 insertions(+), 8 deletions(-) create mode 100644 gdb/testsuite/gdb.ada/info_types_ada.exp create mode 100644 gdb/testsuite/gdb.ada/info_types_ada/alpha.adb create mode 100644 gdb/testsuite/gdb.ada/info_types_ada/alpha.ads create mode 100644 gdb/testsuite/gdb.ada/info_types_ada/foo.adb diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c index 1462271..9723161 100644 --- a/gdb/ada-lang.c +++ b/gdb/ada-lang.c @@ -13611,8 +13611,10 @@ ada_add_global_exceptions (compiled_regex *preg, name. So match against the decoded name. */ expand_symtabs_matching (NULL, lookup_name_info::match_any (), - [&] (const char *search_name) + [&] (const struct general_symbol_info *gsymbol) { + const char *search_name + = symbol_search_name (gsymbol); const char *decoded = ada_decode (search_name); return name_matches_regex (decoded, preg); }, diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c index e0fd565..3eabe3b 100644 --- a/gdb/dwarf2read.c +++ b/gdb/dwarf2read.c @@ -4540,9 +4540,11 @@ dw2_expand_symtabs_matching_symbol for (; bounds.first != bounds.second; ++bounds.first) { const char *qualified = index.symbol_name_at (bounds.first->idx); + struct general_symbol_info ginfo = {0}; + ginfo.name = qualified; if (!lookup_name_matcher.matches (qualified) - || (symbol_matcher != NULL && !symbol_matcher (qualified))) + || (symbol_matcher != NULL && !symbol_matcher (&ginfo))) continue; matches.push_back (bounds.first->idx); diff --git a/gdb/psymtab.c b/gdb/psymtab.c index 915e4fb..74b1589 100644 --- a/gdb/psymtab.c +++ b/gdb/psymtab.c @@ -1298,7 +1298,7 @@ recursively_search_psymtabs || (domain == TYPES_DOMAIN && (*psym)->aclass == LOC_TYPEDEF)) && psymbol_name_matches (*psym, lookup_name) - && (sym_matcher == NULL || sym_matcher (symbol_search_name (*psym)))) + && (sym_matcher == NULL || sym_matcher (*psym))) { /* Found a match, so notify our caller. */ result = PST_SEARCHED_AND_FOUND; diff --git a/gdb/symfile.h b/gdb/symfile.h index 5b8b781..abed56c 100644 --- a/gdb/symfile.h +++ b/gdb/symfile.h @@ -114,7 +114,8 @@ typedef bool (expand_symtabs_file_matcher_ftype) (const char *filename, /* Callback for quick_symbol_functions->expand_symtabs_matching to match a symbol name. */ -typedef bool (expand_symtabs_symbol_matcher_ftype) (const char *name); +typedef bool (expand_symtabs_symbol_matcher_ftype) + (const struct general_symbol_info *gsymbol); /* Callback for quick_symbol_functions->expand_symtabs_matching to be called after a symtab has been expanded. */ @@ -245,8 +246,8 @@ struct quick_symbol_functions Otherwise, if KIND does not match, this symbol is skipped. If even KIND matches, SYMBOL_MATCHER is called for each symbol - defined in the file. The symbol "search" name is passed to - SYMBOL_MATCHER. + defined in the file, with the symbol's general_symbol_info + being passed as the argument. If SYMBOL_MATCHER returns false, then the symbol is skipped. diff --git a/gdb/symmisc.c b/gdb/symmisc.c index d30a354..4bcdb7a 100644 --- a/gdb/symmisc.c +++ b/gdb/symmisc.c @@ -951,7 +951,7 @@ maintenance_expand_symtabs (const char *args, int from_tty) && (regexp == NULL || re_exec (filename))); }, lookup_name_info::match_any (), - [] (const char *symname) + [] (const struct general_symbol_info *gsymbol) { /* Since we're not searching on symbols, just return true. */ return true; diff --git a/gdb/symtab.c b/gdb/symtab.c index 2e48d65..7b02d4c 100644 --- a/gdb/symtab.c +++ b/gdb/symtab.c @@ -4375,8 +4375,10 @@ search_symbols (const char *regexp, enum search_domain kind, basenames); }, lookup_name_info::match_any (), - [&] (const char *symname) + [&] (const struct general_symbol_info *gsymbol) { + const char *symname + = symbol_natural_name (gsymbol); return (!preg || preg->exec (symname, 0, NULL, 0) == 0); }, diff --git a/gdb/testsuite/gdb.ada/info_types_ada.exp b/gdb/testsuite/gdb.ada/info_types_ada.exp new file mode 100644 index 0000000..e5952cd --- /dev/null +++ b/gdb/testsuite/gdb.ada/info_types_ada.exp @@ -0,0 +1,29 @@ +# Copyright 2018 Free Software Foundation, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +load_lib "ada.exp" + +standard_ada_testfile foo + +if {[gdb_compile_ada "${srcfile}" "${binfile}" executable [list debug]] != "" } { + return -1 +} + +clean_restart ${testfile} + +set eol "\[\r\n\]+" + +gdb_test "info types alpha.beta" \ + "All types matching regular expression \"alpha\\.beta\":${eol}File .*alpha.ads:${eol}${decimal}:\talpha\.beta;" diff --git a/gdb/testsuite/gdb.ada/info_types_ada/alpha.adb b/gdb/testsuite/gdb.ada/info_types_ada/alpha.adb new file mode 100644 index 0000000..756269a --- /dev/null +++ b/gdb/testsuite/gdb.ada/info_types_ada/alpha.adb @@ -0,0 +1,6 @@ +package body Alpha is + procedure Do_Nothing (B : in out Beta) is + begin + null; + end Do_Nothing; +end Alpha; diff --git a/gdb/testsuite/gdb.ada/info_types_ada/alpha.ads b/gdb/testsuite/gdb.ada/info_types_ada/alpha.ads new file mode 100644 index 0000000..1b10513 --- /dev/null +++ b/gdb/testsuite/gdb.ada/info_types_ada/alpha.ads @@ -0,0 +1,7 @@ +package Alpha is + type Beta is record + B : Integer; + end record; + + procedure Do_Nothing (B : in out Beta); +end Alpha; diff --git a/gdb/testsuite/gdb.ada/info_types_ada/foo.adb b/gdb/testsuite/gdb.ada/info_types_ada/foo.adb new file mode 100644 index 0000000..a154552 --- /dev/null +++ b/gdb/testsuite/gdb.ada/info_types_ada/foo.adb @@ -0,0 +1,6 @@ +with Alpha; use Alpha; +procedure Foo is + My_Beta : Beta := (B => 7484); +begin + Do_Nothing (My_Beta); +end Foo; -- 2.1.4