Patchwork Fix gdb.ada/bp_c_mixed_case.exp (PR gdb/22670) (Re: [PATCH 3/3] Add new gdb.ada/bp_c_mixed_case testcase for PR gdb/22670)

login
register
mail settings
Submitter Pedro Alves
Date Jan. 5, 2018, 4:34 p.m.
Message ID <f6c80127-ef1e-7381-d7e5-3f2f0dc3a02c@redhat.com>
Download mbox | patch
Permalink /patch/25234/
State New
Headers show

Comments

Pedro Alves - Jan. 5, 2018, 4:34 p.m.
On 01/04/2018 08:35 AM, Joel Brobecker wrote:
> This patch adds a new testcase to demonstrate a regression introduced by:
> 
>     commit b5ec771e60c1a0863e51eb491c85c674097e9e13
>     Date:   Wed Nov 8 14:22:32 2017 +0000
>     Subject: Introduce lookup_name_info and generalize Ada's FULL/WILD name matching
> 
> The purpose of the testcase is to verify that a user can insert
> a breakpoint on a C function while debugging Ada, even if the name
> of the function includes uppercase letters, requiring us to use
> Ada's "<...>" notation to tell the GDB that the symbol name should
> be looked up verbatim.
> 
> As of the commit above, GDB is no longer finding the function:
> 
>     (gdb) break <MixedCaseFunc>
>     Function "<MixedCaseFunc>" not defined.
>     Make breakpoint pending on future shared library load? (y or [n])
> 
> Before the patch, the breakpoint was inserted without problem.
> 

Below's a fix for this one.

I've force-pushed this one along with the following on to the
users/palves/literal-matching branch, rebased on master to pick up the
other fixes that went in meanwhile.

From 439f8c51ff8f6cd9fb3bbc330a40492a15992add Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Fri, 5 Jan 2018 00:17:19 +0000
Subject: [PATCH 1/2] Fix gdb.ada/bp_c_mixed_case.exp (PR gdb/22670)

The problem here is that we were using the user-provided lookup name
literally for linkage name comparisons.  I.e., "<MixedCase>" with the
"<>"s included.  That obviously can't work since the "<>" are not
really part of the linkage name.  The original idea was that we'd use
the symbol's language to select the right symbol name matching
algorithm, but that doesn't work for Ada because it's not really
possible to unambiguously tell from the linkage name alone whether
we're dealing with Ada symbols, so Ada minsyms end up with no language
set, or sometimes C++ set.  So fix this by treating Ada mode specially
when determining the linkage name to match against.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/22670
	* minsyms.c (linkage_name_str): New function.
	(iterate_over_minimal_symbols): Use it.

gdb/testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	PR gdb/22670
	* gdb.ada/bp_c_mixed_case.exp: Remove setup_kfail calls.
---
 gdb/minsyms.c                             | 21 ++++++++++++++++++++-
 gdb/testsuite/gdb.ada/bp_c_mixed_case.exp |  4 +---
 2 files changed, 21 insertions(+), 4 deletions(-)
Joel Brobecker - Jan. 8, 2018, 3:57 a.m.
Hi Pedro,

On Fri, Jan 05, 2018 at 04:34:39PM +0000, Pedro Alves wrote:
> On 01/04/2018 08:35 AM, Joel Brobecker wrote:
> > This patch adds a new testcase to demonstrate a regression introduced by:
> > 
> >     commit b5ec771e60c1a0863e51eb491c85c674097e9e13
> >     Date:   Wed Nov 8 14:22:32 2017 +0000
> >     Subject: Introduce lookup_name_info and generalize Ada's FULL/WILD name matching
> > 
> > The purpose of the testcase is to verify that a user can insert
> > a breakpoint on a C function while debugging Ada, even if the name
> > of the function includes uppercase letters, requiring us to use
> > Ada's "<...>" notation to tell the GDB that the symbol name should
> > be looked up verbatim.
> > 
> > As of the commit above, GDB is no longer finding the function:
> > 
> >     (gdb) break <MixedCaseFunc>
> >     Function "<MixedCaseFunc>" not defined.
> >     Make breakpoint pending on future shared library load? (y or [n])
> > 
> > Before the patch, the breakpoint was inserted without problem.
> > 
> 
> Below's a fix for this one.

Thanks!

I confirm the test now passes for me as well :). I have a question
though:

> >From 439f8c51ff8f6cd9fb3bbc330a40492a15992add Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Fri, 5 Jan 2018 00:17:19 +0000
> Subject: [PATCH 1/2] Fix gdb.ada/bp_c_mixed_case.exp (PR gdb/22670)
> 
> The problem here is that we were using the user-provided lookup name
> literally for linkage name comparisons.  I.e., "<MixedCase>" with the
> "<>"s included.  That obviously can't work since the "<>" are not
> really part of the linkage name.  The original idea was that we'd use
> the symbol's language to select the right symbol name matching
> algorithm, but that doesn't work for Ada because it's not really
> possible to unambiguously tell from the linkage name alone whether
> we're dealing with Ada symbols, so Ada minsyms end up with no language
> set, or sometimes C++ set.  So fix this by treating Ada mode specially
> when determining the linkage name to match against.

I am wondering why minimal symbols are involved in this case,
considering that the C file was build with debugging information.
Shouldn't we be getting the function's address from the partial/full
symtabs instead?
Pedro Alves - Jan. 8, 2018, 3 p.m.
On 01/08/2018 03:57 AM, Joel Brobecker wrote:
> On Fri, Jan 05, 2018 at 04:34:39PM +0000, Pedro Alves wrote:

>> Below's a fix for this one.
> 
> Thanks!
> 
> I confirm the test now passes for me as well :). 

Great!

> I have a question though:
> 
>> >From 439f8c51ff8f6cd9fb3bbc330a40492a15992add Mon Sep 17 00:00:00 2001
>> From: Pedro Alves <palves@redhat.com>
>> Date: Fri, 5 Jan 2018 00:17:19 +0000
>> Subject: [PATCH 1/2] Fix gdb.ada/bp_c_mixed_case.exp (PR gdb/22670)
>>
>> The problem here is that we were using the user-provided lookup name
>> literally for linkage name comparisons.  I.e., "<MixedCase>" with the
>> "<>"s included.  That obviously can't work since the "<>" are not
>> really part of the linkage name.  The original idea was that we'd use
>> the symbol's language to select the right symbol name matching
>> algorithm, but that doesn't work for Ada because it's not really
>> possible to unambiguously tell from the linkage name alone whether
>> we're dealing with Ada symbols, so Ada minsyms end up with no language
>> set, or sometimes C++ set.  So fix this by treating Ada mode specially
>> when determining the linkage name to match against.
> 
> I am wondering why minimal symbols are involved in this case,
> considering that the C file was build with debugging information.
> Shouldn't we be getting the function's address from the partial/full
> symtabs instead?

AFAIK, GDB always worked this way for linespecs, even before my C++
wildmatching patches -- we collect symbols from both debug info and
minsyms, and coalesce them by address to avoid duplicates
(linespec.c:add_matching_symbols_to_info).  

The completion paths then try to do the same thing (though implemented
differently) [1].

I think it makes sense because:

- even if you have debug information in the binary, the debug information
  won't cover all function symbols.  Some may be internal linker-/compiler-
  generated symbols.  Or..

- ..there may be multiple symbols with the same name in different
  compilation units that all end up in the same binary/objfile.  Some may
  have debug info while others not.  E.g. (C, but applies to any language,
  or mixed languages):

static void function () {}                // in file1.c, compiled with -g
static int function (int p) { return p; } // in file2.c, compiled WITHOUT -g

  I could easily see the 'with'/'without -g' functions ending up both in
  the same objfile via static linking, for example.  We want "b <function>" to
  set a breakpoint on all the functions.

[1] - the C++ wildmatching series eliminated the divergence a bit, but
there's still a lot more duplication that there should ideally be.
One of the points of the gdb.linespec/cpcompletion.exp and
gdb.linespec/cpls-*.exp testcases is making sure that completion
and actually setting a breakpoint finds the same locations.

Let me know if this answers your question.

Thanks,
Pedro Alves
Joel Brobecker - Jan. 9, 2018, 9:46 a.m.
> > I am wondering why minimal symbols are involved in this case,
> > considering that the C file was build with debugging information.
> > Shouldn't we be getting the function's address from the partial/full
> > symtabs instead?
> 
> AFAIK, GDB always worked this way for linespecs, even before my C++
> wildmatching patches -- we collect symbols from both debug info and
> minsyms, and coalesce them by address to avoid duplicates
> (linespec.c:add_matching_symbols_to_info).  

That's true.

What surprises me is that, before your patch, we were finding
no symbol at all. So we were failing the lookup both with minimal
symbols, and within the partial/full symtab.

Your patch, IIUC, handles the lookup at the minimal symbol level,
which is indeed a good thing. But shouldn't we also be finding
that same symbol through the partial/full symtab search? I have
a feeling that your minimal symbol patch might be hiding a bug
in the search for the symbol, at least from the linespec module.

I did a bit of debugging this morning, first with the following
snapshot, which is shortly before the wild-matching patch series:

    commit b346cb961f729e2955391513a5b05eaf02b308ea
    Author: GDB Administrator <gdbadmin@sourceware.org>
    Date:   Wed Nov 8 00:00:20 2017 +0000

The function iterate_over_all_matching_symtabs finds the function
in the bar.c's partial symtab because the matching function is...

           [&] (const char *symbol_name)
           {
             return symbol_name_cmp (symbol_name, name) == 0;
           },

... where name, in this case is "MixedCaseFunc" -- The "<>" has been
stripped. They got stripped by linespec.c::find_linespec_symbols
when it took that name and converted it to a lookup name via:

  if (state->language->la_language == language_ada)
    {
      /* In Ada, the symbol lookups are performed using the encoded
         name rather than the demangled name.  */
      ada_lookup_storage = ada_name_for_lookup (name);
      lookup_name = ada_lookup_storage.c_str ();
    }
  else
    {
      lookup_name = demangle_for_lookup (name,
                                         state->language->la_language,
                                         demangle_storage);
    }

In the newer version, find_linespec_symbols gets passed the lookup_name
directly, and that lookup_name is now "<MixedCaseFunc>". Those extra
"<...>" are what eventually gets in the way when we compare this
lookup_name against the partial's symbols name (in
default_symbol_name_matcher, which does an strncmp_iw_with_mode
comparison, IIUC).

The call to find_linespec_symbols comes from linespace_parse_basic,
which has:

  /* Try looking it up as a function/method.  */
  find_linespec_symbols (PARSER_STATE (parser),
                         PARSER_RESULT (parser)->file_symtabs, name,
                         PARSER_EXPLICIT (parser)->func_name_match_type,
                         &symbols, &minimal_symbols);

I really hate to be stopping the investigation at this point, as
I feel I am onto something, but I am running out of time for today.

The part where I am not sure yet is whether we should be transforming
"name" into a "lookup_name" before calling find_linespec_symbols, or
whether we should be handling the angle brackets during the symbol
comparison... Or something else entirely! This is still all fairly
new to me...

Note that I was thinkg we would need to be stripping the executable
for us to demonstrate an error, but in fact, this is what happens
if I use "print" instead of "break":

    (gdb) p <MixedCaseFunc>
    $1 = {<text variable, no debug info>} 0x4024dc <MixedCaseFunc>

With the snapshot prior to the patch series, GDB knows that
MixedCaseFunc is a function without parameters, and the expression
above means calling it. As I was debugging without having started
the inferior, I got the following (expected) error:

    (gdb) print  <MixedCaseFunc>
    You can't do that without a process to debug.

in the bp_c_mixed_case.exp, we should see GDB telling us that
we stopped on our MixedCaseFunc breakpoint while evaluating
a function call...

Does this make some kind of sense to you? I can get back to this
for more digging again tomorrow.

Thanks!
Pedro Alves - Jan. 9, 2018, 2:59 p.m.
On 01/09/2018 09:46 AM, Joel Brobecker wrote:
>>> I am wondering why minimal symbols are involved in this case,
>>> considering that the C file was build with debugging information.
>>> Shouldn't we be getting the function's address from the partial/full
>>> symtabs instead?
>>
>> AFAIK, GDB always worked this way for linespecs, even before my C++
>> wildmatching patches -- we collect symbols from both debug info and
>> minsyms, and coalesce them by address to avoid duplicates
>> (linespec.c:add_matching_symbols_to_info).  
> 
> That's true.
> 
> What surprises me is that, before your patch, we were finding
> no symbol at all. So we were failing the lookup both with minimal
> symbols, and within the partial/full symtab.

Indeed, good point.  I don't know what I did not think of that.

> 
> Your patch, IIUC, handles the lookup at the minimal symbol level,
> which is indeed a good thing. But shouldn't we also be finding
> that same symbol through the partial/full symtab search? I have
> a feeling that your minimal symbol patch might be hiding a bug
> in the search for the symbol, at least from the linespec module.
> 
> I did a bit of debugging this morning, first with the following
> snapshot, which is shortly before the wild-matching patch series:
> 
>     commit b346cb961f729e2955391513a5b05eaf02b308ea
>     Author: GDB Administrator <gdbadmin@sourceware.org>
>     Date:   Wed Nov 8 00:00:20 2017 +0000
> 
> The function iterate_over_all_matching_symtabs finds the function
> in the bar.c's partial symtab because the matching function is...
> 
>            [&] (const char *symbol_name)
>            {
>              return symbol_name_cmp (symbol_name, name) == 0;
>            },
> 
> ... where name, in this case is "MixedCaseFunc" -- The "<>" has been
> stripped. They got stripped by linespec.c::find_linespec_symbols
> when it took that name and converted it to a lookup name via:
> 
>   if (state->language->la_language == language_ada)
>     {
>       /* In Ada, the symbol lookups are performed using the encoded
>          name rather than the demangled name.  */
>       ada_lookup_storage = ada_name_for_lookup (name);
>       lookup_name = ada_lookup_storage.c_str ();
>     }
>   else
>     {
>       lookup_name = demangle_for_lookup (name,
>                                          state->language->la_language,
>                                          demangle_storage);
>     }
> 
> In the newer version, find_linespec_symbols gets passed the lookup_name
> directly, and that lookup_name is now "<MixedCaseFunc>". Those extra
> "<...>" are what eventually gets in the way when we compare this
> lookup_name against the partial's symbols name (in
> default_symbol_name_matcher, which does an strncmp_iw_with_mode
> comparison, IIUC).
> 
> The call to find_linespec_symbols comes from linespace_parse_basic,
> which has:
> 
>   /* Try looking it up as a function/method.  */
>   find_linespec_symbols (PARSER_STATE (parser),
>                          PARSER_RESULT (parser)->file_symtabs, name,
>                          PARSER_EXPLICIT (parser)->func_name_match_type,
>                          &symbols, &minimal_symbols);
> 
> I really hate to be stopping the investigation at this point, as
> I feel I am onto something, but I am running out of time for today.
> 
> The part where I am not sure yet is whether we should be transforming
> "name" into a "lookup_name" before calling find_linespec_symbols, or
> whether we should be handling the angle brackets during the symbol
> comparison... Or something else entirely! This is still all fairly
> new to me...
> 
> Note that I was thinkg we would need to be stripping the executable
> for us to demonstrate an error, but in fact, this is what happens
> if I use "print" instead of "break":
> 
>     (gdb) p <MixedCaseFunc>
>     $1 = {<text variable, no debug info>} 0x4024dc <MixedCaseFunc>
> 
> With the snapshot prior to the patch series, GDB knows that
> MixedCaseFunc is a function without parameters, and the expression
> above means calling it. As I was debugging without having started
> the inferior, I got the following (expected) error:
> 
>     (gdb) print  <MixedCaseFunc>
>     You can't do that without a process to debug.
> 
> in the bp_c_mixed_case.exp, we should see GDB telling us that
> we stopped on our MixedCaseFunc breakpoint while evaluating
> a function call...
> 
> Does this make some kind of sense to you? 

Yes it does.  I played with this a bit, and am testing a patch.
Stay tuned.

> I can get back to this
> for more digging again tomorrow.
Thanks,
Pedro Alves

Patch

diff --git a/gdb/minsyms.c b/gdb/minsyms.c
index 26c91ec8819..fded0d65e93 100644
--- a/gdb/minsyms.c
+++ b/gdb/minsyms.c
@@ -447,6 +447,25 @@  find_minimal_symbol_address (const char *name, CORE_ADDR *addr,
   return sym.minsym == NULL;
 }
 
+/* Get the lookup name form best suitable for linkage name
+   matching.  */
+
+static const char *
+linkage_name_str (const lookup_name_info &lookup_name)
+{
+  /* Unlike most languages (including C++), Ada uses the
+     encoded/linkage name as the search name recorded in symbols.  So
+     if debugging in Ada mode, prefer the Ada-encoded name.  This also
+     makes Ada's verbatim match syntax ("<...>") work, because
+     "lookup_name.name()" includes the "<>"s, while
+     "lookup_name.ada().lookup_name()" is the encoded name with "<>"s
+     stripped.  */
+  if (current_language->la_language == language_ada)
+    return lookup_name.ada ().lookup_name ().c_str ();
+
+  return lookup_name.name ().c_str ();
+}
+
 /* See minsyms.h.  */
 
 void
@@ -459,7 +478,7 @@  iterate_over_minimal_symbols (struct objfile *objf,
 
   /* The first pass is over the ordinary hash table.  */
     {
-      const char *name = lookup_name.name ().c_str ();
+      const char *name = linkage_name_str (lookup_name);
       unsigned int hash = msymbol_hash (name) % MINIMAL_SYMBOL_HASH_SIZE;
       auto *mangled_cmp
 	= (case_sensitivity == case_sensitive_on
diff --git a/gdb/testsuite/gdb.ada/bp_c_mixed_case.exp b/gdb/testsuite/gdb.ada/bp_c_mixed_case.exp
index 54c61e3a8e8..7787646c67f 100644
--- a/gdb/testsuite/gdb.ada/bp_c_mixed_case.exp
+++ b/gdb/testsuite/gdb.ada/bp_c_mixed_case.exp
@@ -40,13 +40,11 @@  gdb_test "show lang" \
 # Try inserting a breakpoint inside a C function. Because the function's
 # name has some uppercase letters, we need to use the "<...>" notation.
 # The purpose of this testcase is to verify that we can in fact do so
-# and that it inserts the breakpoint at the expected location.
-setup_kfail gdb/22670 "*-*-*"
+# and that it inserts the breakpoint at the expected location.  See gdb/22670.
 gdb_test "break <MixedCaseFunc>" \
          "Breakpoint $decimal at $hex: file .*bar.c, line $decimal\\."
 
 # Resume the program's execution, verifying that it lands at the expected
 # location.
-setup_kfail gdb/22670 "*-*-*"
 gdb_test "continue" \
          "Breakpoint $decimal, MixedCaseFunc \\(\\) at .*bar\\.c:$decimal.*"