[RFC] partial symbol name matching vs regexp

Message ID 20181006001621.GA6358@adacore.com
State New, archived
Headers

Commit Message

Joel Brobecker Oct. 6, 2018, 12:16 a.m. UTC
  Hello,

First of all, apologies for the long-ish email below; I was just
trying to be thorough and go slow in my explanation. I hope nothing's
too convoluted, and that the email can be read without too much
brain intensity.... (the fixing, on the other hand... ;-))

One of our users reported that info types was sometimes not working
for Ada types. Consider for instance the following code:

    package Alpha is
       type Beta is record
          B : Integer;
       end record;
    end Alpha;

This package defines a structure type called "Beta" inside package
Alpha. The fully qualified (natural) name of the type is "Alpha.Beta"
and underneath, its linkage name is alpha__beta.

What the user is trying to do is use the "info types" command with
the fully qualified name, something like:

    (gdb) info types alpha.beta

Unfortunately, as of today, this does not always work:

    $ gdb -q foo
    Reading symbols from foo...
    (gdb) info types alpha.beta
    All types matching regular expression "alpha.beta":
    (gdb)

The same query but with just the type's name, not qualified,
does work; and once that's done, package Alpha's partial symtab
is now expanded, so do the initial query above also starts working:

    $ gdb -q foo
    Reading symbols from foo...
    (gdb) info types beta
    All types matching regular expression "beta":

    File alpha.ads:
    2:	alpha.beta;
    (gdb) info types alpha.beta
    All types matching regular expression "alpha.beta":

    File alpha.ads:
    2:	alpha.beta;

So, the problem in the matching of the partial symtabs we want to expand.

The function that searches for matching symbols is in symtab.c...

    | std::vector<symbol_search>
    | search_symbols (const char *regexp, enum search_domain kind,
    |                 int nfiles, const char *files[])

... and the expansion is done via a call to expand_symtabs_matching:

    | /* Search through the partial symtabs *first* for all symbols
    |    matching the regexp.  That way we don't have to reproduce all of
    |    the machinery below.  */
    | expand_symtabs_matching ([file_matcher snip'ed],
    |                          lookup_name_info::match_any (),
    |                       |  [&] (const char *symname)
    |                       |  {
    |    [symbol_matcher]---+    return (!preg || preg->exec (symname,
    |                       |                                 0, NULL, 0) == 0);
    |                       |  },
    |                          NULL,
    |                          kind);

The part of interest is the symbol_matcher argument, passed via
a lambda function which evaluates the regexp against the given
symname; I've kind of highlighted that argument in the copy/paste
above, but here it is again, for the avoidance of doubt:

    |  [&] (const char *symname)
    |  {
    |    return (!preg || preg->exec (symname,
    |                                 0, NULL, 0) == 0);
    |  },

As you can see, it is a fairly straightforward implementation
that runs the given symbol name against the regexp.

In our case, that symbol_matchers doesn't match alpha.beta,
because SYMNAME is the symbol's search name, and in Ada,
we search by encoded name. Thus, for the symbol we were
hoping to match, it gets called with "alpha__beta".

We can verify that the search_name gets used by looking at
the caller of that lambda function (recursively_search_psymtabs),
which iterates over all the symbols of a partial symtab with:

    | && (sym_matcher == NULL || sym_matcher (symbol_search_name (*psym))))

That's why it doesn't work.

After pondering about it a bit, my overall thinking is that regular
expressions used in the context of matching symbol name have to be
coming from a user, who normally reasons in terms of natural name;
and therefore these regexps should be evaluated against the symbols'
natural name, rather than their search name.

With that assumption in mind, I think the way to make it work
is to change the symbol_matcher's signature to receive more
information - so that each caller of expand_symtabs_matching can
then decide which symbol name it needs to look at; in most cases,
it will be the symbol_search_name, but in the particular case of
search_symbols where we're comparing against a regexp that we assume
comes from the user, we can decide to use the symbol_natural_name
instead.

This would actually be consistent with the rest of search_symbols's
implementation; as you can see, once the partial symbol expansion
is performed, it iterates over minimal symbols and full symbols,
and selects them based on the symbol's natural name as well:

    | ALL_MSYMBOLS (objfile, msymbol)
    | [...]
    |       if (!preg
    |           || preg->exec (MSYMBOL_NATURAL_NAME (msymbol), 0,


... and ...

    | ALL_COMPUNITS (objfile, cust)
    | [...]
    |      && ((!preg
    |           || preg->exec (SYMBOL_NATURAL_NAME (sym), 0,
    |                          NULL, 0) == 0)

Following this, I thought the simplest would probably to have
the symbol_matcher receive a general_symbol_info instead of
just the name.

I tried writing a prototype that does just this, and it seems to be
giving the results I hoped, without any regression on x86_64-linux,
using both the official testsuite as well as AdaCore's testsuite.
See the attached patch that modifies expand_symtabs_symbol_matcher_ftype
to take a general_symbol_info instead of a name, and then adjusts
all users.

Unfortunately, while working on that prototype, I realized we have
.gdb_index support in dwarf2read.c that uses that infrastructure too.
Given that gdb_index is just an elaborate index (and therefore just
a very partial view of the debuging information), I don't think
the entries in the gdb_index symbol table have enough information
to allow us to construct a complete general_symbol_info object.
The prototype I attached side-steps the question by just creating
partial general_symbol_info objects where only the name is set.
It allows me to get identical results for any language but Ada,
knowing that it doesn't interfere with my Ada testing, since we
do not support gdb_index with Ada yet.

Trying to resolve that hitch with the gdb_index handling, and
after having read the description of what's in the gdb_index
section, I think there should be a way to link a symbol in
the gdb_index back to the DWARF CU, and therefore get the symbol's
language. Once we have the language in addition to the symbol name,
we should have all we need for our immediate purposes.

Thus, a second option would be to keep the name parameter
in expand_symtabs_symbol_matcher_ftype, but then add the symbol
language as a second parameter; we would then modify the symbol_matcher
in search_symbols to check the language, and if the language is Ada,
then decode the name before evaluating the regexp. Outside of adjusting
the prototype of a couple of lambda functions, everything else would
remain the same.

But I'm not really satisfied with the second option (two parameters).
What if a search routine wanted to expand based on the linkage name,
one day, even for C++ symbols. I don't know if mangling a symbol back
would be possible or not, but regardless of that, it would feel wrong
to compute a mangled name when the caller had it.

For that reason, I tend to still think that changing
expand_symtabs_symbol_matcher_ftype to receive a general_symbol_info
is better. It's not perfect in the case of gdb_index handling,
but I think that the consequences of that peculiarity would be
contained within the gdb_index handling in dwarf2read.c. So,
at least, it wouldn't be a caller in dwarf2read.c passing an
unsuspecting symbol_matcher function defined elsewhere an incomplete
general_symbol_info.

We could possibly mitigate the problem by documenting that only
the name and language of the general_symbol_info parameter in
expand_symtabs_symbol_matcher_ftype function  should be accessed,
but I think it's too easy for people to miss that and access
the fields anyway. Better put the documentation in dwarf2read's
particular implementation, IMO, and make sure that it remains
contained and consistent within that unit.

Attached is my prototype patch, with a small test that has one
fail because the patch is applied. The test is very simple,
and I intend to make a more complex one, but it should make it
easier for you to try this example should you like to.

Please let me know what you think!

Thank you,
  

Comments

Tom Tromey Aug. 8, 2019, 7:33 p.m. UTC | #1
>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:

Replying to email from October of 2018:

https://www.sourceware.org/ml/gdb-patches/2018-10/msg00140.html

Joel> One of our users reported that info types was sometimes not working
Joel> for Ada types. Consider for instance the following code:
Joel>     package Alpha is
Joel>        type Beta is record
Joel>           B : Integer;
Joel>        end record;
Joel>     end Alpha;
[...]

Joel> With that assumption in mind, I think the way to make it work
Joel> is to change the symbol_matcher's signature to receive more
Joel> information - so that each caller of expand_symtabs_matching can
Joel> then decide which symbol name it needs to look at; in most cases,
Joel> it will be the symbol_search_name, but in the particular case of
Joel> search_symbols where we're comparing against a regexp that we assume
Joel> comes from the user, we can decide to use the symbol_natural_name
Joel> instead.

This makes sense to me, especially given what you found:

Joel> This would actually be consistent with the rest of search_symbols's
Joel> implementation; as you can see, once the partial symbol expansion
Joel> is performed, it iterates over minimal symbols and full symbols,
Joel> and selects them based on the symbol's natural name as well:
[...]
Joel>     | ALL_COMPUNITS (objfile, cust)
Joel>     | [...]
Joel>     |      && ((!preg
Joel>     |           || preg->exec (SYMBOL_NATURAL_NAME (sym), 0,
Joel>     |                          NULL, 0) == 0)

Having the expansion step using one symbol name and then the actual
search use another seems like a bug.

[...]
Joel> It allows me to get identical results for any language but Ada,
Joel> knowing that it doesn't interfere with my Ada testing, since we
Joel> do not support gdb_index with Ada yet.

We'll have to revisit this now that we have patches to support
.debug_names with Ada.

In this situation, we'll have "quasi encoded" names in the index -- that
is, Ada names that were decoded and then re-encoded, to drop the various
suffixes.

I am not totally sure what to do here.  Reading the CU DIEs to find the
language might be acceptable, but it might be ok to just try decoding
the name as well.

I think some kind of hack is appropriate given that .debug_names doesn't
even hold the correct names currently -- because when it does, we are
going to need some entirely different approach here, like reconstructing
full names as we search.

Joel> It's not perfect in the case of gdb_index handling,
Joel> but I think that the consequences of that peculiarity would be
Joel> contained within the gdb_index handling in dwarf2read.c. So,
Joel> at least, it wouldn't be a caller in dwarf2read.c passing an
Joel> unsuspecting symbol_matcher function defined elsewhere an incomplete
Joel> general_symbol_info.

I didn't understand this, because the symbol_matcher callback is passed
in to dw2_expand_symtabs_matching_symbol.

So, I think every symbol matcher needs to be prepared to handle this
case.

I think it's fine to just document that the symbol might be "fake".
However, if we really want to be robust here, I guess could supply some
wrapper object that only reveals the bits we want revealed.

Joel> Attached is my prototype patch, with a small test that has one
Joel> fail because the patch is applied. The test is very simple,
Joel> and I intend to make a more complex one, but it should make it
Joel> easier for you to try this example should you like to.

What would the complex test do that this one does not do?

FWIW I've updated the patch to the latest source (actually I rebased it
on top of the .debug_names patch, to see how hard that would be -- not
very as it turns out).

Anyway, I'd like to move forward with this.

Tom
  

Patch

From 10c678ea79793b3bcf039b8053bd3d0b70e71715 Mon Sep 17 00:00:00 2001
From: Joel Brobecker <brobecker@adacore.com>
Date: Fri, 5 Oct 2018 19:53:17 -0400
Subject: [PATCH] missing Ada symbol listed by info types with some regular
 expressions

*** FIXME *** write a revision log and ChangeLog
*** FIXME *** provide a more comprehensive testcase.
---
 gdb/ada-lang.c                                 |  4 +++-
 gdb/dwarf2read.c                               |  4 +++-
 gdb/psymtab.c                                  |  2 +-
 gdb/symfile.h                                  |  7 ++++---
 gdb/symmisc.c                                  |  2 +-
 gdb/symtab.c                                   |  4 +++-
 gdb/testsuite/gdb.ada/info_types_ada.exp       | 29 ++++++++++++++++++++++++++
 gdb/testsuite/gdb.ada/info_types_ada/alpha.adb |  6 ++++++
 gdb/testsuite/gdb.ada/info_types_ada/alpha.ads |  7 +++++++
 gdb/testsuite/gdb.ada/info_types_ada/foo.adb   |  6 ++++++
 10 files changed, 63 insertions(+), 8 deletions(-)
 create mode 100644 gdb/testsuite/gdb.ada/info_types_ada.exp
 create mode 100644 gdb/testsuite/gdb.ada/info_types_ada/alpha.adb
 create mode 100644 gdb/testsuite/gdb.ada/info_types_ada/alpha.ads
 create mode 100644 gdb/testsuite/gdb.ada/info_types_ada/foo.adb

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index 1462271..9723161 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -13611,8 +13611,10 @@  ada_add_global_exceptions (compiled_regex *preg,
      name.  So match against the decoded name.  */
   expand_symtabs_matching (NULL,
 			   lookup_name_info::match_any (),
-			   [&] (const char *search_name)
+			   [&] (const struct general_symbol_info *gsymbol)
 			   {
+			     const char *search_name
+			       = symbol_search_name (gsymbol);
 			     const char *decoded = ada_decode (search_name);
 			     return name_matches_regex (decoded, preg);
 			   },
diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index e0fd565..3eabe3b 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -4540,9 +4540,11 @@  dw2_expand_symtabs_matching_symbol
   for (; bounds.first != bounds.second; ++bounds.first)
     {
       const char *qualified = index.symbol_name_at (bounds.first->idx);
+      struct general_symbol_info ginfo = {0};
+      ginfo.name = qualified;
 
       if (!lookup_name_matcher.matches (qualified)
-	  || (symbol_matcher != NULL && !symbol_matcher (qualified)))
+	  || (symbol_matcher != NULL && !symbol_matcher (&ginfo)))
 	continue;
 
       matches.push_back (bounds.first->idx);
diff --git a/gdb/psymtab.c b/gdb/psymtab.c
index 915e4fb..74b1589 100644
--- a/gdb/psymtab.c
+++ b/gdb/psymtab.c
@@ -1298,7 +1298,7 @@  recursively_search_psymtabs
 	       || (domain == TYPES_DOMAIN
 		   && (*psym)->aclass == LOC_TYPEDEF))
 	      && psymbol_name_matches (*psym, lookup_name)
-	      && (sym_matcher == NULL || sym_matcher (symbol_search_name (*psym))))
+	      && (sym_matcher == NULL || sym_matcher (*psym)))
 	    {
 	      /* Found a match, so notify our caller.  */
 	      result = PST_SEARCHED_AND_FOUND;
diff --git a/gdb/symfile.h b/gdb/symfile.h
index 5b8b781..abed56c 100644
--- a/gdb/symfile.h
+++ b/gdb/symfile.h
@@ -114,7 +114,8 @@  typedef bool (expand_symtabs_file_matcher_ftype) (const char *filename,
 /* Callback for quick_symbol_functions->expand_symtabs_matching
    to match a symbol name.  */
 
-typedef bool (expand_symtabs_symbol_matcher_ftype) (const char *name);
+typedef bool (expand_symtabs_symbol_matcher_ftype)
+  (const struct general_symbol_info *gsymbol);
 
 /* Callback for quick_symbol_functions->expand_symtabs_matching
    to be called after a symtab has been expanded.  */
@@ -245,8 +246,8 @@  struct quick_symbol_functions
      Otherwise, if KIND does not match, this symbol is skipped.
 
      If even KIND matches, SYMBOL_MATCHER is called for each symbol
-     defined in the file.  The symbol "search" name is passed to
-     SYMBOL_MATCHER.
+     defined in the file, with the symbol's general_symbol_info
+     being passed as the argument.
 
      If SYMBOL_MATCHER returns false, then the symbol is skipped.
 
diff --git a/gdb/symmisc.c b/gdb/symmisc.c
index d30a354..4bcdb7a 100644
--- a/gdb/symmisc.c
+++ b/gdb/symmisc.c
@@ -951,7 +951,7 @@  maintenance_expand_symtabs (const char *args, int from_tty)
 		       && (regexp == NULL || re_exec (filename)));
 	     },
 	     lookup_name_info::match_any (),
-	     [] (const char *symname)
+	     [] (const struct general_symbol_info *gsymbol)
 	     {
 	       /* Since we're not searching on symbols, just return true.  */
 	       return true;
diff --git a/gdb/symtab.c b/gdb/symtab.c
index 2e48d65..7b02d4c 100644
--- a/gdb/symtab.c
+++ b/gdb/symtab.c
@@ -4375,8 +4375,10 @@  search_symbols (const char *regexp, enum search_domain kind,
 						  basenames);
 			   },
 			   lookup_name_info::match_any (),
-			   [&] (const char *symname)
+			   [&] (const struct general_symbol_info *gsymbol)
 			   {
+			     const char *symname
+			       = symbol_natural_name (gsymbol);
 			     return (!preg || preg->exec (symname,
 							  0, NULL, 0) == 0);
 			   },
diff --git a/gdb/testsuite/gdb.ada/info_types_ada.exp b/gdb/testsuite/gdb.ada/info_types_ada.exp
new file mode 100644
index 0000000..e5952cd
--- /dev/null
+++ b/gdb/testsuite/gdb.ada/info_types_ada.exp
@@ -0,0 +1,29 @@ 
+# Copyright 2018 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+load_lib "ada.exp"
+
+standard_ada_testfile foo
+
+if {[gdb_compile_ada "${srcfile}" "${binfile}" executable [list debug]] != "" } {
+  return -1
+}
+
+clean_restart ${testfile}
+
+set eol "\[\r\n\]+"
+
+gdb_test "info types alpha.beta" \
+         "All types matching regular expression \"alpha\\.beta\":${eol}File .*alpha.ads:${eol}${decimal}:\talpha\.beta;"
diff --git a/gdb/testsuite/gdb.ada/info_types_ada/alpha.adb b/gdb/testsuite/gdb.ada/info_types_ada/alpha.adb
new file mode 100644
index 0000000..756269a
--- /dev/null
+++ b/gdb/testsuite/gdb.ada/info_types_ada/alpha.adb
@@ -0,0 +1,6 @@ 
+package body Alpha is
+   procedure Do_Nothing (B : in out Beta) is
+   begin
+      null;
+   end Do_Nothing;
+end Alpha;
diff --git a/gdb/testsuite/gdb.ada/info_types_ada/alpha.ads b/gdb/testsuite/gdb.ada/info_types_ada/alpha.ads
new file mode 100644
index 0000000..1b10513
--- /dev/null
+++ b/gdb/testsuite/gdb.ada/info_types_ada/alpha.ads
@@ -0,0 +1,7 @@ 
+package Alpha is
+   type Beta is record
+      B : Integer;
+   end record;
+
+   procedure Do_Nothing (B : in out Beta);
+end Alpha;
diff --git a/gdb/testsuite/gdb.ada/info_types_ada/foo.adb b/gdb/testsuite/gdb.ada/info_types_ada/foo.adb
new file mode 100644
index 0000000..a154552
--- /dev/null
+++ b/gdb/testsuite/gdb.ada/info_types_ada/foo.adb
@@ -0,0 +1,6 @@ 
+with Alpha; use Alpha;
+procedure Foo is
+   My_Beta : Beta := (B => 7484);
+begin
+   Do_Nothing (My_Beta);
+end Foo;
-- 
2.1.4