c++/8218: Destructors w/arguments.

Message ID e51beafa-eee2-b598-927c-2ac961c8e3b1@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves March 9, 2017, 3:04 p.m. UTC
  On 03/09/2017 01:58 AM, Keith Seitz wrote:

>> Can we trust that FIELD_ARTIFICIAL is always set on the implicit this
>> argument on all debug formats?  I.e., does it work with stabs?
> 
> No, only DWARF, I think. I've tested gdb.cp/*.exp for -gdwarf-2 and
> -gstabs+ for both 64-bit and -m32. In all cases, with or without my v1
> proposed patch, logs show no differences whatsoever, except in the
> -gdwarf-2 cases where the modified templates.exp tests now pass.

That's quite surprising.  There's nothing AFAICS in the stabs
reader setting the artificial flag, so a quick manual test
of "ptype SomeStructWithMethods" against a program built with stabs,
using your patch, "should" incorrectly show the "this" parameter
of all method prototypes.  I tried it myself now, and indeed
GDB still manages to NOT print the "this" parameter.  Using the
gdb.cp/method.cc testcase:

(gdb) ptype A
type = struct A {
    int x;
    int y;
  public:
    int foo(int);
    int bar(int) const;
    int baz(int, char) volatile;
    int qux(int, float) const volatile;
}

So I debugged a little.

Turns out that cp_type_print_method_args is _never_ called for stabs.
But, we ptype still prints the methods just fine.  So... huh?

I stepped through two instances of GDB side by side, one against
the method.cc binary built with stabs, and another with dwarf.

The difference is here, in c-typeprint.c:c_type_print_base :

1243                      demangled_name =
1244                        gdb_demangle (mangled_name,
1245                                      DMGL_ANSI | DMGL_PARAMS);
1246                      if (demangled_name == NULL)
1247                        {
1248                          /* In some cases (for instance with the HP
1249                             demangling), if a function has more than 10
1250                             arguments, the demangling will fail.
1251                             Let's try to reconstruct the function
1252                             signature from the symbol information.  */
1253                          if (!TYPE_FN_FIELD_STUB (f, j))
1254                            {
1255                              int staticp = TYPE_FN_FIELD_STATIC_P (f, j);
1256                              struct type *mtype = TYPE_FN_FIELD_TYPE (f, j);
1257
1258                              cp_type_print_method_args (mtype,
1259                                                         "",
1260                                                         method_name,
1261                                                         staticp,
1262                                                         stream, &local_flags);
1263                            }
1264                          else
1265                            fprintf_filtered (stream,
1266                                              _("<badly mangled name '%s'>"),
1267                                              mangled_name);
1268                        }
1269                      else
1270                        {
1271                          char *p;
1272                          char *demangled_no_class
1273                            = remove_qualifiers (demangled_name);

Note how the "In some cases ..." comment above makes it sounds like that
is a fallback path that we shouldn't really be hitting on modern toolchains.

But, in reality, it's the opposite.  The difference between stabs and dwarf,
is that stabs goes through the (demangled_name != NULL) path while DWARF
goes through the "fallback" (demangled_name == NULL) path.  Because:

Stabs:

 (top-gdb) p mangled_name
 $1 = 0x2fb3940 "_ZN1A3fooEi"
 (top-gdb) p demangled_name
 $2 = 0x2fc80b0 "A::foo(int)"
 (top-gdb) 

DWARF:

 (top-gdb) p mangled_name
 $1 = 0x2fcdfe0 "A::foo(int)"
 (top-gdb) p demangled_name
 $2 = 0x0

Eh, "mangled_name" is fetched from "TYPE_FN_FIELD_PHYSNAME (f, j)"
just above.  And PHYSNAME is the demangled name in DWARF/C++...

I never quite got why that is (why PHYSNAME is the demangled
name in C++), and there are still comments all over the place that
suggest that PHYSNAME is the mangled name.  Plus the variables names
suggest that too.  It's ... confusing.

So just for entertainment, I hacked GDB like this to always force the
cp_type_print_method_args path in stabs too:


And now with stabs, plus your original patch, we get the "expected"
artificial "this" parameters incorrectly printed:

(gdb) ptype A
type = struct A {
    int x;
    int y;
  public:
    int foo(A *, int);
    int bar(const A *, int) const;
    int baz(volatile A *, int, char) volatile;
    int qux(const volatile A *, int, float) const volatile;
}

>> Can you comment on the treatment c_type_print_args gives to
>> artificial arguments that Tom pointed out in the PR?
> 
> As Tom mentions, c_type_print_args will output the types of function
> arguments /unless/ it is being called from the dwarf reader (to
> construct physnames).  This function is called when someone does "ptype
> A::~A". It is /not/ used when one types "ptype A". The dtor in that
> instance is handled by cp_type_print_method_args.
> 
> Detailed explanation of why we don't see the artificial "int" parameter
> in both cases (before my patch):
> 
> Because of how the compiler is outputting dtor debug info, we see
> multiple DIEs describing the destructor.

This got to be the multiple destructors.  (not-in-charge / charge)

See the ABI at <http://mentorembedded.github.io/cxx-abi/abi.html>:

		   ::= D0	# deleting destructor
		   ::= D1	# complete object destructor
		   ::= D2	# base object destructor

Notice, same demangled name:

 $ nm -A testsuite/outputs/gdb.cp/templates/templates | c++filt | grep "T5<int>" | grep "~T5"
 testsuite/outputs/gdb.cp/templates/templates:000000000040184a W T5<int>::~T5()
 testsuite/outputs/gdb.cp/templates/templates:000000000040184a W T5<int>::~T5()

But different mangled names:

 $ nm -A testsuite/outputs/gdb.cp/templates/templates | grep 000000000040184a
 testsuite/outputs/gdb.cp/templates/templates:000000000040184a W _ZN2T5IiED1Ev
 testsuite/outputs/gdb.cp/templates/templates:000000000040184a W _ZN2T5IiED2Ev

In this case, they're aliased, but they'll have different addresses when
virtual inheritance is involved.  Vis:

$ cat virtual.cc
struct A { ~A() {} };
struct VA { virtual ~VA() {} };

struct VA2 : public virtual VA { virtual ~VA2() {} };
struct VA3 : public virtual VA { virtual ~VA3() {} };
struct VA2_3 : public VA2, public VA3 { virtual ~VA2_3() {} };

A a;
VA va;
VA2 va2;
VA3 va3;
VA2_3 va2_3;

int main () {}

$ g++ virtual.cc -o virtual -g3 -O0

$ nm -A virtual | c++filt | grep "~" | grep -v thunk
virtual:000000000040081a W A::~A()
virtual:000000000040081a W A::~A()
virtual:0000000000400856 W VA::~VA()
virtual:0000000000400826 W VA::~VA()
virtual:0000000000400826 W VA::~VA()
virtual:000000000040094c W VA2::~VA2()
virtual:00000000004008ea W VA2::~VA2()
virtual:000000000040087c W VA2::~VA2()
virtual:0000000000400a4c W VA3::~VA3()
virtual:00000000004009ea W VA3::~VA3()
virtual:000000000040097c W VA3::~VA3()
virtual:0000000000400b22 W VA2_3::~VA2_3()
virtual:0000000000400a7c W VA2_3::~VA2_3()

$ nm -A virtual | grep "_ZN.*D[0-2]"
virtual:000000000040081a W _ZN1AD1Ev
virtual:000000000040081a W _ZN1AD2Ev
virtual:0000000000400856 W _ZN2VAD0Ev
virtual:0000000000400826 W _ZN2VAD1Ev
virtual:0000000000400826 W _ZN2VAD2Ev
virtual:000000000040094c W _ZN3VA2D0Ev
virtual:00000000004008ea W _ZN3VA2D1Ev
virtual:000000000040087c W _ZN3VA2D2Ev
virtual:0000000000400a4c W _ZN3VA3D0Ev
virtual:00000000004009ea W _ZN3VA3D1Ev
virtual:000000000040097c W _ZN3VA3D2Ev
virtual:0000000000400b22 W _ZN5VA2_3D0Ev
virtual:0000000000400a7c W _ZN5VA2_3D1Ev

> When one types "ptype A", this uses one of the DIEs. This particular one
> mentions the two artificial parameters. That's why we see (before this
> patch) the "int" parameter.
> 
> When one types "ptype A::~A", this uses a different DIE, which has valid
> PC bounds, but only mentions the first formal parameter. This is why
> when we see only the "const A *" in the description of the parameters.

Yeah.  If there are multiple versions of the "same" symbol, guess ptype
just runs into the first.

(gdb) ptype A::~A
type = void (A * const)
(gdb) ptype VA2_3::~VA2_3
type = void (VA2_3 * const)
(gdb) ptype A::~A
type = void (A * const)
(gdb) ptype VA::~VA
type = void (VA * const)
(gdb) ptype VA2::~VA2
type = void (VA2 * const, const void ** const)
(gdb) ptype VA3::~VA3
type = void (VA3 * const, const void ** const)
(gdb) ptype VA2_3::~VA2_3
type = void (VA2_3 * const)

> 
> I've no explanation as to why we don't hide these implementation details
> other than to offer a truly pedantic view of the code.

I can see value in being able to refer to to the artificial arguments
and distinguish the different overloads.  E.g., if one is trying to debug
one of the multiple ctor/dtors:

(gdb) b VA3::~VA3
Breakpoint 2 at 0x40098c: VA3::~VA3. (3 locations)
(gdb) info breakpoints 
Num     Type           Disp Enb Address            What
2       breakpoint     keep y   <MULTIPLE>         
2.1                         y     0x000000000040098c in VA3::~VA3() at virtual.cc:5
2.2                         y     0x00000000004009f6 in VA3::~VA3() at virtual.cc:5
2.3                         y     0x0000000000400a58 in VA3::~VA3() at virtual.cc:5
(gdb) r
Starting program: /home/pedro/tmp/virtual 

Breakpoint 2, VA3::~VA3 (this=0x602128 <va2_3+8>, __vtt_parm=0x400de8 <VTT for VA2_3+24>, __in_chrg=<optimized out>) at virtual.cc:5
5       struct VA3 : public virtual VA { virtual ~VA3() {} };
(gdb) c
Continuing.

Breakpoint 2, VA3::~VA3 (this=0x602118 <va3>, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) at virtual.cc:5
5       struct VA3 : public virtual VA { virtual ~VA3() {} };
(gdb) c
Continuing.
[Inferior 1 (process 30836) exited normally]

But that's more of a run-time engineer thing.  And such users can
always use the mangled names directly, for example:

 (gdb) ptype _ZN1AD0Ev
 type = void (A * const)

A regular user wouldn't ever want to see these artificial things,
I suppose.

Maybe it'd be preferable to print the type of non-static methods
In a more C++-ish way, even, like, given:

 struct A { void foo(); };

instead of the current:

 (gdb) ptype A::foo
 type = void (A * const)

print:

 (gdb) ptype A::foo
 type = void (A::)();

Likewise for method pointers, while at it.  I.e., print this:

 (gdb) ptype &A::foo
 type = void (A::*)()

instead of this:

 (gdb) ptype &A::foo
 type = void (A::*)(A * const)

which just looks incorrect, since it no longer has the excuse
of looking like a function pointer "as seen" through a C lens.

> 
> As best I can describe it, then, "ptype A::~A" (aka c_type_print_args)
> attempts to be as tools-developer-oriented as possible while "ptype
> CLASS" attempts to be more user-centric, hiding internal details like
> artificial constructs.

Yeah...  I suspect it's more an "historic accident" than by design
though.

> 
> Did any of that actually answer your question?

It sure helped, but I for me it was important to understand
why _didn't_ stabs get broken.

This new version is OK.

Thanks,
Pedro Alves
  

Comments

Keith Seitz March 10, 2017, 6:33 p.m. UTC | #1
On 03/09/2017 07:04 AM, Pedro Alves wrote:
> That's quite surprising. 

Sorry, I meant there are no differences in the test suite. I did not
test by hand, believing also that it would manifest as you indicated.

> But, in reality, it's the opposite.  The difference between stabs and dwarf,
> is that stabs goes through the (demangled_name != NULL) path while DWARF
> goes through the "fallback" (demangled_name == NULL) path.  Because:
> 
> Stabs:
> 
>  (top-gdb) p mangled_name
>  $1 = 0x2fb3940 "_ZN1A3fooEi"
>  (top-gdb) p demangled_name
>  $2 = 0x2fc80b0 "A::foo(int)"
>  (top-gdb) 
> 
> DWARF:
> 
>  (top-gdb) p mangled_name
>  $1 = 0x2fcdfe0 "A::foo(int)"
>  (top-gdb) p demangled_name
>  $2 = 0x0
> 
> Eh, "mangled_name" is fetched from "TYPE_FN_FIELD_PHYSNAME (f, j)"
> just above.  And PHYSNAME is the demangled name in DWARF/C++...
> 
> I never quite got why that is (why PHYSNAME is the demangled
> name in C++), and there are still comments all over the place that
> suggest that PHYSNAME is the mangled name.  Plus the variables names
> suggest that too.  It's ... confusing.

Long story short: the symbol tables only stored linkage names. That
means every time we attempt to match a name to a symbol (every time we
call strcmp_iw essentially), we need to demangle. Every time we print
out a symbol, we demangle. Every time we do anything with any symbol, we
demangle it. I originally had a working code base to do this. As you can
imagine, it was terribly slow.

Users don't typically type "b _ZN1AD0Ev". They type "b A::~A". If we
could convince users to use only linkage names, we'd be in great shape.

If we didn't care about memory footprint, we'd be in great shape.

If we had access to the same mangler the compiler uses, we'd be in great
shape.

But we have none of those things.

Do maintainers want to reassess this?

> This got to be the multiple destructors.  (not-in-charge / charge)

Sure is. [I apologize, my post was not explicit on that point.]

>> I've no explanation as to why we don't hide these implementation details
>> other than to offer a truly pedantic view of the code.
> 
> I can see value in being able to refer to to the artificial arguments
> and distinguish the different overloads.  E.g., if one is trying to debug
> one of the multiple ctor/dtors:
> 
> (gdb) b VA3::~VA3
> Breakpoint 2 at 0x40098c: VA3::~VA3. (3 locations)
> (gdb) info breakpoints 
> Num     Type           Disp Enb Address            What
> 2       breakpoint     keep y   <MULTIPLE>         
> 2.1                         y     0x000000000040098c in VA3::~VA3() at virtual.cc:5
> 2.2                         y     0x00000000004009f6 in VA3::~VA3() at virtual.cc:5
> 2.3                         y     0x0000000000400a58 in VA3::~VA3() at virtual.cc:5
> (gdb) r
> Starting program: /home/pedro/tmp/virtual 
> 
> Breakpoint 2, VA3::~VA3 (this=0x602128 <va2_3+8>, __vtt_parm=0x400de8 <VTT for VA2_3+24>, __in_chrg=<optimized out>) at virtual.cc:5
> 5       struct VA3 : public virtual VA { virtual ~VA3() {} };
> (gdb) c
> Continuing.
> 
> Breakpoint 2, VA3::~VA3 (this=0x602118 <va3>, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) at virtual.cc:5
> 5       struct VA3 : public virtual VA { virtual ~VA3() {} };
> (gdb) c
> Continuing.
> [Inferior 1 (process 30836) exited normally]
> 
> But that's more of a run-time engineer thing.  And such users can
> always use the mangled names directly, for example:
> 
>  (gdb) ptype _ZN1AD0Ev
>  type = void (A * const)
> 
> A regular user wouldn't ever want to see these artificial things,
> I suppose.

Yeah, I figure the way forward would really be to offer all-or-nothing.
Hide all artificial members or hide none. We now do a mish-mash of both,
confusing nearly everybody. /me notes future cleanup/feature

> Maybe it'd be preferable to print the type of non-static methods
> In a more C++-ish way, even, like, given:
> 
>  struct A { void foo(); };
> 
> instead of the current:
> 
>  (gdb) ptype A::foo
>  type = void (A * const)
> 
> print:
> 
>  (gdb) ptype A::foo
>  type = void (A::)();

Ooohh... That would be nice...

>> Did any of that actually answer your question?
> 
> It sure helped, but I for me it was important to understand
> why _didn't_ stabs get broken.

Sorry about that. I should have asked for further clarification.

> This new version is OK.

Thank you, committed.

Keith
  
Pedro Alves March 10, 2017, 7:23 p.m. UTC | #2
On 03/10/2017 06:33 PM, Keith Seitz wrote:

>> Eh, "mangled_name" is fetched from "TYPE_FN_FIELD_PHYSNAME (f, j)"
>> just above.  And PHYSNAME is the demangled name in DWARF/C++...
>>
>> I never quite got why that is (why PHYSNAME is the demangled
>> name in C++), and there are still comments all over the place that
>> suggest that PHYSNAME is the mangled name.  Plus the variables names
>> suggest that too.  It's ... confusing.
> 
> Long story short: the symbol tables only stored linkage names. That
> means every time we attempt to match a name to a symbol (every time we
> call strcmp_iw essentially), we need to demangle. 

Or mangle the lookup name and compare that.  But it wouldn't work
for my WIP linespec/completion branch, so I'm not proposing that.  :-)

> Every time we print
> out a symbol, we demangle. Every time we do anything with any symbol, we
> demangle it. I originally had a working code base to do this. As you can
> imagine, it was terribly slow.

Yeah, understood, thanks.  Ada still does something like that, by
using "encoded names" as symbol search names.

Regardless, I think a global demangled<->mangled map/hash
to avoid doing the work twice would help.  I've been pondering that
on and off.  I think there's a PR open about how we end up demangling
the same symbols more than once, due to minsyms vs symbols.
 
> Users don't typically type "b _ZN1AD0Ev". They type "b A::~A". If we
> could convince users to use only linkage names, we'd be in great shape.
> 
> If we didn't care about memory footprint, we'd be in great shape.
> 
> If we had access to the same mangler the compiler uses, we'd be in great
> shape.
> 
> But we have none of those things.

Understood, thanks.

> Do maintainers want to reassess this?

No, no, no.  I was honestly confused by this, not calling for
a redesign.  I think the main problem is one of nomenclature
(what exactly is a physical name), of old stale comments and
old variable names.  E.g., comments in gdbtypes.h about physname
things only talk about mangled names.

> Yeah, I figure the way forward would really be to offer all-or-nothing.
> Hide all artificial members or hide none. We now do a mish-mash of both,
> confusing nearly everybody. /me notes future cleanup/feature

Yeah.  We could have a "show me artificial things" / "give me the ABI view"
switches or format parameter to print/ptype, or some such.

Thanks,
Pedro Alves
  

Patch

diff --git c/gdb/c-typeprint.c w/gdb/c-typeprint.c
index e504618..e750c84 100644
--- c/gdb/c-typeprint.c
+++ w/gdb/c-typeprint.c
@@ -1243,7 +1243,7 @@  c_type_print_base (struct type *type, struct ui_file *stream,
                  demangled_name =
                    gdb_demangle (mangled_name,
                                  DMGL_ANSI | DMGL_PARAMS);
-                 if (demangled_name == NULL)
+                 if (1 || demangled_name == NULL)
                    {
                      /* In some cases (for instance with the HP
                         demangling), if a function has more than 10