[RFC,0/3] Pretty-printing for errno

Message ID 3a7946e9-d178-f878-9774-64ff44bcf5df@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves June 29, 2017, 4:53 p.m. UTC
  On 06/29/2017 04:48 PM, Phil Muldoon wrote:
> 
> (gdb) p (test_i) i
> $2 = ('typename: ', 'int')
> 
> Which is most puzzling.

Indeed.  No need for printers to see this, actually.

With this code:

 typedef int zzz;
 zzz z;

gdb:

 (gdb) whatis z
 type = zzz
 (gdb) whatis (zzz) z
 type = int

Eh.

I suspect this is the result of a bogus implementation of
only stripping one level of typedef if the argument to
whatis is a type name.  From the manual:

 If @var{arg} is a variable or an expression, @code{whatis} prints its
 literal type as it is used in the source code.  If the type was
 defined using a @code{typedef}, @code{whatis} will @emph{not} print
 the data type underlying the @code{typedef}.
 (...)
 If @var{arg} is a type name that was defined using @code{typedef},
 @code{whatis} @dfn{unrolls} only one level of that @code{typedef}.


That one-level stripping comes from here, in
gdb/eval.c:evaluate_subexp_standard, handling OP_TYPE:

...
     else if (noside == EVAL_AVOID_SIDE_EFFECTS)
	{
	  struct type *type = exp->elts[pc + 1].type;

	  /* If this is a typedef, then find its immediate target.  We
	     use check_typedef to resolve stubs, but we ignore its
	     result because we do not want to dig past all
	     typedefs.  */
	  check_typedef (type);
	  if (TYPE_CODE (type) == TYPE_CODE_TYPEDEF)
	    type = TYPE_TARGET_TYPE (type);
	  return allocate_value (type);
	}

However, this is reachable in both:

#1 - (gdb) whatis (zzz)0
#2 - (gdb) whatis zzz

While only case #2 should strip the typedef.  Removing that "find immediate
target" code above help with fixing #1.  We then run into the fact that
value_cast also drops typedefs.  Fixing that (see patch below) makes
whatis (zzz)0 work as expected:

 (gdb) whatis (zzz)0
 type = zzz

however, we'd need to still make "whatis" strip one level of
typedefs when the argument is a type, somehow, because we now
get this:

 (gdb) whatis zzz
 type = zzz
From 21f3a88611fad22f73aff2217c93d99971dd076f Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 29 Jun 2017 17:18:32 +0100
Subject: [PATCH] whatis

---
 gdb/eval.c   |  2 ++
 gdb/valops.c | 28 +++++++++++++++-------------
 2 files changed, 17 insertions(+), 13 deletions(-)
  

Comments

Pedro Alves June 29, 2017, 5:02 p.m. UTC | #1
On 06/29/2017 05:53 PM, Pedro Alves wrote:

> however, we'd need to still make "whatis" strip one level of
> typedefs when the argument is a type, somehow, because we now
> get this:
> 
>  (gdb) whatis zzz
>  type = zzz

I suspect we can do that by making "whatis" look at the top of
the expression tree, see if it's an OP_TYPE:

(gdb) set debug expression 1
(gdb) whatis (zzz)0
...
Dump of expression @ 0x1bc6af0, after conversion to prefix form:
Expression: `(zzz) 0'
        Language c, 8 elements, 16 bytes each.

            0  UNOP_CAST_TYPE         (
            1    OP_TYPE               Type @0x18e7920 (zzz))
            4    OP_LONG               Type @0x19a6650 (int), value 0 (0x0)
type = zzz
(gdb) whatis zzz
...
Dump of expression @ 0x1bc87b0, after conversion to prefix form:
Expression: `zzz'
        Language c, 3 elements, 16 bytes each.

            0  OP_TYPE               Type @0x18e7920 (zzz)
type = zzz
(gdb) whatis z
...
Dump of expression @ 0x1bc6af0, after conversion to prefix form:
Expression: `z'
        Language c, 4 elements, 16 bytes each.

            0  OP_VAR_VALUE          Block @0x19e8ee0, symbol @0x19e8e10 (z)
type = zzz
(gdb) 

There may be a helper for this already somewhere.  I've never dug
that much deeply into this area of the code.

Thanks,
Pedro Alves
  
Pedro Alves June 29, 2017, 5:28 p.m. UTC | #2
On 06/29/2017 06:02 PM, Pedro Alves wrote:
> I suspect we can do that by making "whatis" look at the top of
> the expression tree, see if it's an OP_TYPE:

That works:

With this code:

 typedef int zzz;
 zzz z;

gdb:

 (gdb) whatis z
 type = zzz
 (gdb) whatis (zzz)0
 type = zzz
 (gdb) whatis zzz
 type = int

and at least

 make check TESTS="*/whatis*.exp */*ptype*.exp"

passes, which is promising.

Haven't tried this against a printer, but I think
it should start working.

We may need OP_SCOPE too.

I'll try running this against the full gdb testsuite,
and write some test.

Thanks,
Pedro Alves
  
Zack Weinberg June 30, 2017, 12:28 a.m. UTC | #3
On Thu, Jun 29, 2017 at 1:28 PM, Pedro Alves <palves@redhat.com> wrote:
> With this code:
>
>  typedef int zzz;
>  zzz z;
>
> gdb:
>
>  (gdb) whatis z
>  type = zzz
>  (gdb) whatis (zzz)0
>  type = zzz
>  (gdb) whatis zzz
>  type = int

For the glibc patch to go through, this case also needs to work:

typedef int error_t;
error_t var;
extern error_t *errloc(void);

int main(void)
{
    return *errloc();
}

(compiled to .o file)

(gdb) ptype errloc
No symbol "errloc" in current context.

This might be a problem with inadequate debugging information
generated by the compiler -- it does work correctly if a function
*definition* is visible.  But we need it to work given only the above,
possibly with no debug symbols in the shared library defining
"errloc".

zw
  
Pedro Alves June 30, 2017, 4:37 p.m. UTC | #4
On 06/30/2017 01:28 AM, Zack Weinberg wrote:
> On Thu, Jun 29, 2017 at 1:28 PM, Pedro Alves <palves@redhat.com> wrote:
>> With this code:
>>
>>  typedef int zzz;
>>  zzz z;
>>
>> gdb:
>>
>>  (gdb) whatis z
>>  type = zzz
>>  (gdb) whatis (zzz)0
>>  type = zzz
>>  (gdb) whatis zzz
>>  type = int
> 

I've now sent the patch to gdb-patches:

 https://sourceware.org/ml/gdb-patches/2017-06/msg00827.html

Could one of you check whether the type printer kicks in with that,
in the problematic cast case?

Phil, might it be a good idea to convert what you were
using to a real gdb testsuite test?

> For the glibc patch to go through, this case also needs to work:
> 
> typedef int error_t;
> error_t var;
> extern error_t *errloc(void);
> 
> int main(void)
> {
>     return *errloc();
> }
> 
> (compiled to .o file)
> 
> (gdb) ptype errloc
> No symbol "errloc" in current context.
> 
> This might be a problem with inadequate debugging information
> generated by the compiler 

Debug info is generated for function definitions, not declarations.
I.e., that declaration for "errloc" alone produces no debug info
at all.  Try 'readelf -dw' on the .o file.

But why is this a valid scenario?  Even if you have no debug info,
GDB should be able to find the function's address in the elf dynamic
symbol table?

I'm not sure what you mean by "glibc patch to go through".
If you mean acceptance upstream, then I'd consider this a separate
issue, because now you're talking about being able to print the
"errno" variable in the first place, even as a plain int with no
printer?  That is related, but orthogonal from the error_t type printer?
Otherwise, can you clarify?

-- it does work correctly if a function
> *definition* is visible.  But we need it to work given only the above,
> possibly with no debug symbols in the shared library defining
> "errloc".

Won't __errno_location be always present in the dynamic symbol table?
If you strip everything from libc.so.6 / libpthread.so.0 including
dynamic symbols, leaving gdb with no clue about "__errno_location",
let alone its address, there's no way that gdb can call it.
(Unless you call it by address manually.)

The next problem is that without debug info for __errno_location,
gdb has no clue of its prototype, only that its a function, and so
it assumes that it has type "int()", i.e., that it returns int,
while in reality it returns int * / __error_t *.  (Falling back
to assuming "int" is IMO not the best idea, but I don't know
the history behind it.)

I.e., with your example .o + a separate .o file defining
errloc (with no debug info), we now get:

 (gdb) ptype errloc
 type = int ()

and then calling "p *errloc()" (the equivalent of
'#define errno (*__errno_location ())' silently subtly does
the wrong thing.

One dirty way around it would be for the printer to
re-define the errno macro (using  to cast __errno_location to
the correct type before calling it, I guess:

(gdb) macro define errno  *(*(__error_t *(*) (void)) __errno_location) ()

That's make "errno" available when you compile with levels
lower than -g3, too.

Fedora gdb has a morally equivalent hack in the source code
directly.

Ideally, we'd find a clean way to make this all Just Work.

Thanks,
Pedro Alves
  
Pedro Alves June 30, 2017, 4:47 p.m. UTC | #5
On 06/30/2017 05:37 PM, Pedro Alves wrote:

> I've now sent the patch to gdb-patches:
> 
>  https://sourceware.org/ml/gdb-patches/2017-06/msg00827.html
> 
> Could one of you check whether the type printer kicks in with that,
> in the problematic cast case?

I put it on the users/palves/whatis branch as well, btw.

Thanks,
Pedro Alves
  
Zack Weinberg June 30, 2017, 5:27 p.m. UTC | #6
On Fri, Jun 30, 2017 at 12:37 PM, Pedro Alves <palves@redhat.com> wrote:
> On 06/30/2017 01:28 AM, Zack Weinberg wrote:
>> For the glibc patch to go through, this case also needs to work:
>>
>> typedef int error_t;
>> error_t var;
>> extern error_t *errloc(void);
>>
>> int main(void)
>> {
>>     return *errloc();
>> }
>>
>> (compiled to .o file)
>>
>> (gdb) ptype errloc
>> No symbol "errloc" in current context.
>>
>> This might be a problem with inadequate debugging information
>> generated by the compiler
>
> Debug info is generated for function definitions, not declarations.
> I.e., that declaration for "errloc" alone produces no debug info
> at all.  Try 'readelf -dw' on the .o file.
>
> But why is this a valid scenario?  Even if you have no debug info,
> GDB should be able to find the function's address in the elf dynamic
> symbol table?
>
> I'm not sure what you mean by "glibc patch to go through".
> If you mean acceptance upstream, then I'd consider this a separate
> issue, because now you're talking about being able to print the
> "errno" variable in the first place, even as a plain int with no
> printer?  That is related, but orthogonal from the error_t type printer?
> Otherwise, can you clarify?

The _primary_ goal of the glibc patches is to trigger a pretty-printer
when someone does

(gdb) p errno

with no further adornment. Since pretty-printers are keyed by type
(and since people do store errno codes in other places than errno),
this involves a type 'error_t' (and its implementation-namespace alias
'__error_t'), but if we can't get 'p errno' to work, we're probably
not going to bother.

In all currently-supported configurations, this is glibc's definition of errno:

extern int *__errno_location (void) __THROW __attribute__ ((__const__));
#define errno (*__errno_location ())

(__THROW is a macro expanding to 'throw ()' when compiling C++.)

The patches change that to

extern __error_t *__errno_location (void) __THROW __attribute__ ((__const__));

which should be sufficient to get the pretty-printing happening.  But
obviously that's only going to work if GDB actually knows that the
return type of __errno_location is __error_t*, and it doesn't seem to,
*even when debug information for the definition is available*:

 $ gdb /lib/x86_64-linux-gnu/libc.so.6
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
...

Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols
from /usr/lib/debug/.build-id/cc/80584889db7a969292959a46c718a2b1500702.debug...done.
done.
(gdb) ptype __errno_location
type = int ()
(gdb) p __errno_location
$1 = {<text variable, no debug info>} 0x20590 <__GI___errno_location>

... a suspicion occurs...

(gdb) ptype __GI___errno_location
type = int *(void)
(gdb) p __GI___errno_location
$2 = {int *(void)} 0x20590 <__GI___errno_location>

... so I guess this is a problem with the __GI_ indirection, which
*may* be a thing we can resolve on our end.  I don't fully understand
that stuff.

Still, It Would Be Nice™ if you _didn't_ have to have the debug
symbols for libc.so installed for this to work.  Probably a lot of
people think you only need those if you are debugging libc itself.

> The next problem is that without debug info for __errno_location,
> gdb has no clue of its prototype, only that its a function, and so
> it assumes that it has type "int()", i.e., that it returns int,
> while in reality it returns int * / __error_t *.  (Falling back
> to assuming "int" is IMO not the best idea, but I don't know
> the history behind it.)

Probably because that's the pre-C89 legacy default function signature?

> One dirty way around it would be for the printer to
> re-define the errno macro (using  to cast __errno_location to
> the correct type before calling it, I guess:
>
> (gdb) macro define errno  *(*(__error_t *(*) (void)) __errno_location) ()
>
> That's make "errno" available when you compile with levels
> lower than -g3, too.

Hmm.  How would one do that from inside Python?

zw
  
Pedro Alves June 30, 2017, 6:11 p.m. UTC | #7
On 06/30/2017 06:27 PM, Zack Weinberg wrote:

> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols
> from /usr/lib/debug/.build-id/cc/80584889db7a969292959a46c718a2b1500702.debug...done.
> done.
> (gdb) ptype __errno_location
> type = int ()
> (gdb) p __errno_location
> $1 = {<text variable, no debug info>} 0x20590 <__GI___errno_location>
> 
> ... a suspicion occurs...
> 
> (gdb) ptype __GI___errno_location
> type = int *(void)
> (gdb) p __GI___errno_location
> $2 = {int *(void)} 0x20590 <__GI___errno_location>
> 
> ... so I guess this is a problem with the __GI_ indirection, which
> *may* be a thing we can resolve on our end.  I don't fully understand
> that stuff.

OK, thanks, that's clear.  I don't know much about __GI_ indirection
either.  Redefining errno to call the __GI_ version would probably
be too easy.  :-)

> Still, It Would Be Nice™ if you _didn't_ have to have the debug
> symbols for libc.so installed for this to work.  Probably a lot of
> people think you only need those if you are debugging libc itself.

I guess that that could also be seen as a packaging issue.  May
be it'd be possible to (always) have some kind of minimal debug
info available for libc, with only the definitions of public API
functions, describing their prototypes, but not their internals?
See:

  https://fedoraproject.org/wiki/Features/MiniDebugInfo

> 
>> The next problem is that without debug info for __errno_location,
>> gdb has no clue of its prototype, only that its a function, and so
>> it assumes that it has type "int()", i.e., that it returns int,
>> while in reality it returns int * / __error_t *.  (Falling back
>> to assuming "int" is IMO not the best idea, but I don't know
>> the history behind it.)
> 
> Probably because that's the pre-C89 legacy default function signature?

Yes, most likely.

>> One dirty way around it would be for the printer to
>> re-define the errno macro (using  to cast __errno_location to
>> the correct type before calling it, I guess:
>>
>> (gdb) macro define errno  *(*(__error_t *(*) (void)) __errno_location) ()
>>
>> That's make "errno" available when you compile with levels
>> lower than -g3, too.
> 
> Hmm.  How would one do that from inside Python?

There's no direct Python API, I believe.  You'd just call the CLI
command directly, with gdb.execute.

xmethods sounds like something that maybe might be useful here:
 https://sourceware.org/gdb/onlinedocs/gdb/Xmethods-In-Python.html
though from the docs it sounds like you can only replace class
methods, not free functions, currently.  Not sure, have never written any.

Thanks,
Pedro Alves
  
Pedro Alves July 1, 2017, 11:56 a.m. UTC | #8
On 06/30/2017 07:11 PM, Pedro Alves wrote:
> On 06/30/2017 06:27 PM, Zack Weinberg wrote:
> 

>>> One dirty way around it would be for the printer to
>>> re-define the errno macro (using  to cast __errno_location to
>>> the correct type before calling it, I guess:
>>>
>>> (gdb) macro define errno  *(*(__error_t *(*) (void)) __errno_location) ()
>>>
>>> That's make "errno" available when you compile with levels
>>> lower than -g3, too.
>>
>> Hmm.  How would one do that from inside Python?
> 
> There's no direct Python API, I believe.  You'd just call the CLI
> command directly, with gdb.execute.
> 
> xmethods sounds like something that maybe might be useful here:
>  https://sourceware.org/gdb/onlinedocs/gdb/Xmethods-In-Python.html
> though from the docs it sounds like you can only replace class
> methods, not free functions, currently.  Not sure, have never written any.

BTW, it'd be very nice if the printer could replace
the "#define errno *__errno_location()" with an alternative
implementation that would avoid the function call, so that
"print errno":

 #1 - would also work when debugging core dumps

 #2 - is just plain safer.  Having gdb call functions in
      the inferior always carries the risk of corrupting an
      already corrupt inferior even more.

Thanks,
Pedro Alves
  
Siddhesh Poyarekar July 1, 2017, 2:35 p.m. UTC | #9
On Friday 30 June 2017 10:57 PM, Zack Weinberg wrote:
> The _primary_ goal of the glibc patches is to trigger a pretty-printer
> when someone does
> 
> (gdb) p errno
> 
> with no further adornment. Since pretty-printers are keyed by type
> (and since people do store errno codes in other places than errno),
> this involves a type 'error_t' (and its implementation-namespace alias
> '__error_t'), but if we can't get 'p errno' to work, we're probably
> not going to bother.
> 
> In all currently-supported configurations, this is glibc's definition of errno:
> 
> extern int *__errno_location (void) __THROW __attribute__ ((__const__));
> #define errno (*__errno_location ())
> 
> (__THROW is a macro expanding to 'throw ()' when compiling C++.)
> 
> The patches change that to
> 
> extern __error_t *__errno_location (void) __THROW __attribute__ ((__const__));
> 
> which should be sufficient to get the pretty-printing happening.  But
> obviously that's only going to work if GDB actually knows that the
> return type of __errno_location is __error_t*, and it doesn't seem to,
> *even when debug information for the definition is available*:
> 
>  $ gdb /lib/x86_64-linux-gnu/libc.so.6
> GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
> ...
> 
> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols
> from /usr/lib/debug/.build-id/cc/80584889db7a969292959a46c718a2b1500702.debug...done.
> done.
> (gdb) ptype __errno_location
> type = int ()
> (gdb) p __errno_location
> $1 = {<text variable, no debug info>} 0x20590 <__GI___errno_location>
> 
> ... a suspicion occurs...
> 
> (gdb) ptype __GI___errno_location
> type = int *(void)
> (gdb) p __GI___errno_location
> $2 = {int *(void)} 0x20590 <__GI___errno_location>
> 
> ... so I guess this is a problem with the __GI_ indirection, which
> *may* be a thing we can resolve on our end.  I don't fully understand
> that stuff.
> 
> Still, It Would Be Nice™ if you _didn't_ have to have the debug
> symbols for libc.so installed for this to work.  Probably a lot of
> people think you only need those if you are debugging libc itself.

The __GI_* alias is an internal alias of __errno_location and I've seen
this before with other symbols, where a function address resolves to the
internal alias instead of the public one in gdb as well as other places
like objdump.  It might make sense to turn this around, but I suspect
there may be a reason for it that I am unaware of.

Siddhesh
  
Pedro Alves July 4, 2017, 3:54 p.m. UTC | #10
On 07/01/2017 03:35 PM, Siddhesh Poyarekar wrote:
> On Friday 30 June 2017 10:57 PM, Zack Weinberg wrote:

>> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols
>> from /usr/lib/debug/.build-id/cc/80584889db7a969292959a46c718a2b1500702.debug...done.
>> done.
>> (gdb) ptype __errno_location
>> type = int ()
>> (gdb) p __errno_location
>> $1 = {<text variable, no debug info>} 0x20590 <__GI___errno_location>
>>
>> ... a suspicion occurs...
>>
>> (gdb) ptype __GI___errno_location
>> type = int *(void)
>> (gdb) p __GI___errno_location
>> $2 = {int *(void)} 0x20590 <__GI___errno_location>
>>
>> ... so I guess this is a problem with the __GI_ indirection, which
>> *may* be a thing we can resolve on our end.  I don't fully understand
>> that stuff.
> 
> The __GI_* alias is an internal alias of __errno_location and I've seen
> this before with other symbols, where a function address resolves to the
> internal alias instead of the public one in gdb as well as other places
> like objdump.  It might make sense to turn this around, but I suspect
> there may be a reason for it that I am unaware of.
> 

I look at this a bit today, and sent a gdb patch to handle this
better now:
  https://sourceware.org/ml/gdb-patches/2017-07/msg00018.html

I pushed it to the same branch as the other one: users/palves/whatis.

Thanks,
Pedro Alves
  
Pedro Alves July 13, 2017, 2:30 a.m. UTC | #11
On 06/30/2017 07:11 PM, Pedro Alves wrote:
> On 06/30/2017 06:27 PM, Zack Weinberg wrote:
>> Pedro Alves wrote:
>>> The next problem is that without debug info for __errno_location,
>>> gdb has no clue of its prototype, only that its a function, and so
>>> it assumes that it has type "int()", i.e., that it returns int,
>>> while in reality it returns int * / __error_t *.  (Falling back
>>> to assuming "int" is IMO not the best idea, but I don't know
>>> the history behind it.)
>>
>> Probably because that's the pre-C89 legacy default function signature?
> 
> Yes, most likely.

FYI, shortly after this discussion, yet another user showed up
on #gdb confused by "p getenv("PATH") returning weird negative
integer numbers [because he had no debug info for getenv...], so I
decided to do something about it.  I've now sent a patch series
that stops GDB from assuming no-debug-info symbols have
type int:

 [PATCH 00/13] No-debug-info debugging improvements
 https://sourceware.org/ml/gdb-patches/2017-07/msg00112.html

Comments welcome.

Thanks,
Pedro Alves
  
Pedro Alves Sept. 4, 2017, 9:25 p.m. UTC | #12
On 07/13/2017 03:30 AM, Pedro Alves wrote:
> On 06/30/2017 07:11 PM, Pedro Alves wrote:
>> On 06/30/2017 06:27 PM, Zack Weinberg wrote:
>>> Pedro Alves wrote:
>>>> The next problem is that without debug info for __errno_location,
>>>> gdb has no clue of its prototype, only that its a function, and so
>>>> it assumes that it has type "int()", i.e., that it returns int,
>>>> while in reality it returns int * / __error_t *.  (Falling back
>>>> to assuming "int" is IMO not the best idea, but I don't know
>>>> the history behind it.)
>>>
>>> Probably because that's the pre-C89 legacy default function signature?
>>
>> Yes, most likely.
> 
> FYI, shortly after this discussion, yet another user showed up
> on #gdb confused by "p getenv("PATH") returning weird negative
> integer numbers [because he had no debug info for getenv...], so I
> decided to do something about it.  I've now sent a patch series
> that stops GDB from assuming no-debug-info symbols have
> type int:
> 
>  [PATCH 00/13] No-debug-info debugging improvements
>  https://sourceware.org/ml/gdb-patches/2017-07/msg00112.html
> 
> Comments welcome.

FYI, this is now all in gdb master.  I believe all the gdb issues
uncovered by the errno printer have been addressed.  Let me know
if you're aware of something still not working properly.

Thanks,
Pedro Alves
  
Zack Weinberg Sept. 5, 2017, 9:15 p.m. UTC | #13
On Mon, Sep 4, 2017 at 5:25 PM, Pedro Alves <palves@redhat.com> wrote:
>
> FYI, this is now all in gdb master.  I believe all the gdb issues
> uncovered by the errno printer have been addressed.  Let me know
> if you're aware of something still not working properly.

I'm sorry I never got around to experimenting with your patches before now.

With gdb master as of earlier today, and my patched glibc that tries
to pretty-print errno, I can confirm that nearly everything works as
desired. `p (error_t) 0` invokes the pretty-printer, and when
preprocessor macro bodies are available to the debugger (-ggdb3) so
does `p errno`. Unfortunately I am still getting this error message
when I try to print the underlying thread-local errno variable (which
is what `p errno` does if macro bodies are not available):

Cannot find thread-local storage for process 4367, executable file
/home/zack/projects/glibc/glibc-build/stdlib/test-errno-printer:
Cannot find thread-local variables on this target

I don't understand why thread-local variables are inaccessible on my
perfectly ordinary x86_64-unknown-linux-gnu workstation (the base OS
is Debian 'stretch').  Do you have any idea what might be wrong?

zw
  
Pedro Alves Sept. 5, 2017, 10:31 p.m. UTC | #14
On 09/05/2017 10:15 PM, Zack Weinberg wrote:
> On Mon, Sep 4, 2017 at 5:25 PM, Pedro Alves <palves@redhat.com> wrote:
>>
>> FYI, this is now all in gdb master.  I believe all the gdb issues
>> uncovered by the errno printer have been addressed.  Let me know
>> if you're aware of something still not working properly.
> 
> I'm sorry I never got around to experimenting with your patches before now.

Really no worries.

> With gdb master as of earlier today, and my patched glibc that tries
> to pretty-print errno, I can confirm that nearly everything works as
> desired. `p (error_t) 0` invokes the pretty-printer, and when
> preprocessor macro bodies are available to the debugger (-ggdb3) so
> does `p errno`. Unfortunately I am still getting this error message
> when I try to print the underlying thread-local errno variable (which
> is what `p errno` does if macro bodies are not available):
> 
> Cannot find thread-local storage for process 4367, executable file
> /home/zack/projects/glibc/glibc-build/stdlib/test-errno-printer:
> Cannot find thread-local variables on this target
> 
> I don't understand why thread-local variables are inaccessible on my
> perfectly ordinary x86_64-unknown-linux-gnu workstation (the base OS
> is Debian 'stretch').  Do you have any idea what might be wrong?

I assume your test program isn't actually linked with -pthread?

When you do "print errno" in this situation, because there's no
"#define errno ..." in sight, gdb ends up finding the real "errno" symbol,
which, even though the program isn't threaded, is a TLS symbol, and as such has
a DWARF location expression describing its location as an offset into the
thread-local block for the current thread.  GDB needs to resolve that address, and
for threaded programs that is normally done with assistance from libthread_db.so.
The problem is then that libthread_db.so only works with programs that
link with libpthread.so, and if your test program is actually non-threaded,
it doesn't link with libpthread.so,  So without libthread_db.so's assistance,
gdb cannot "find [the address of] thread-local variables on this target".  The
error message is coming from generic GDB "get address of tls var" code several
layers above GNU/Linux-specific libthread_db.so-interaction code, but still
it could probably be made clearer, maybe by adding "the address of".

A workaround specifically for errno, and only for live-process debugging [*]
is the "macro define" trick I had suggested before:

 (gdb) macro define errno (*__errno_location ())

After that, "p errno" ends up calling __errno_location just
like when you compile the test program with -g3.

[*] doesn't work with core file / postmortem debugging.

I don't know whether GDB could be able to resolve the location
of the errno variable for the main thread (for single-threaded
programs):

 - without libthread_db.so assistance and
 - without baking in awareness of glibc internal data structures

If there is I'd absolutely love to learn about it.

Thanks,
Pedro Alves
  
Zack Weinberg Sept. 6, 2017, 1:05 p.m. UTC | #15
On Tue, Sep 5, 2017 at 6:31 PM, Pedro Alves <palves@redhat.com> wrote:
> On 09/05/2017 10:15 PM, Zack Weinberg wrote:
>>
>> I don't understand why thread-local variables are inaccessible on my
>> perfectly ordinary x86_64-unknown-linux-gnu workstation (the base OS
>> is Debian 'stretch').  Do you have any idea what might be wrong?
>
> I assume your test program isn't actually linked with -pthread?

That is correct.

> When you do "print errno" in this situation, because there's no
> "#define errno ..." in sight, gdb ends up finding the real "errno" symbol,
> which, even though the program isn't threaded, is a TLS symbol, and as such has
> a DWARF location expression describing its location as an offset into the
> thread-local block for the current thread.  GDB needs to resolve that address, and
> for threaded programs that is normally done with assistance from libthread_db.so.
> The problem is then that libthread_db.so only works with programs that
> link with libpthread.so, and if your test program is actually non-threaded,
> it doesn't link with libpthread.so.

I am not familiar with the glibc-side TLS implementation, nor with
libthread_db.so, nor the code in GDB that uses libthread_db.so.
However, reading the implementation of td_thr_tls_get_addr leads me to
believe that that function is *supposed* to work even if libpthread.so
has not been loaded into the 'inferior'.  If it doesn't, perhaps that
is a bug on our side.  Do you know if GDB even tries? It's not obvious
to me looking at linux-thread-db.c.

> A workaround specifically for errno, and only for live-process debugging [*]
> is the "macro define" trick I had suggested before:
>
>  (gdb) macro define errno (*__errno_location ())
>
> After that, "p errno" ends up calling __errno_location just
> like when you compile the test program with -g3.

Again, is it possible to do (the equivalent of) this from the
initialization code of a pretty-printer module?  Specifically, there
already exists a Python function that's doing this:

def register(objfile):
    """Register pretty printers for the current objfile."""

    printer = gdb.printing.RegexpCollectionPrettyPrinter("glibc-errno")
    printer.add_printer('error_t', r'^(?:__)?error_t', ErrnoPrinter)

    if objfile == None:
        objfile = gdb

    gdb.printing.register_pretty_printer(objfile, printer)

called when the module is loaded; what would I need to add to that so
that the macro is defined (if it isn't already)?

zw
  

Patch

diff --git a/gdb/eval.c b/gdb/eval.c
index 2a39774..6aa2499 100644
--- a/gdb/eval.c
+++ b/gdb/eval.c
@@ -2735,8 +2735,10 @@  evaluate_subexp_standard (struct type *expect_type,
 	     result because we do not want to dig past all
 	     typedefs.  */
 	  check_typedef (type);
+#if 0
 	  if (TYPE_CODE (type) == TYPE_CODE_TYPEDEF)
 	    type = TYPE_TARGET_TYPE (type);
+#endif
 	  return allocate_value (type);
 	}
       else
diff --git a/gdb/valops.c b/gdb/valops.c
index 8675e6c..058fe1e 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -378,6 +378,8 @@  value_cast (struct type *type, struct value *arg2)
     /* We deref the value and then do the cast.  */
     return value_cast (type, coerce_ref (arg2)); 
 
+  struct type *org_type = type;
+
   type = check_typedef (type);
   code1 = TYPE_CODE (type);
   arg2 = coerce_ref (arg2);
@@ -452,14 +454,14 @@  value_cast (struct type *type, struct value *arg2)
       && (code2 == TYPE_CODE_STRUCT || code2 == TYPE_CODE_UNION)
       && TYPE_NAME (type) != 0)
     {
-      struct value *v = value_cast_structs (type, arg2);
+      struct value *v = value_cast_structs (org_type, arg2);
 
       if (v)
 	return v;
     }
 
   if (code1 == TYPE_CODE_FLT && scalar)
-    return value_from_double (type, value_as_double (arg2));
+    return value_from_double (org_type, value_as_double (arg2));
   else if (code1 == TYPE_CODE_DECFLOAT && scalar)
     {
       enum bfd_endian byte_order = gdbarch_byte_order (get_type_arch (type));
@@ -475,7 +477,7 @@  value_cast (struct type *type, struct value *arg2)
 	/* The only option left is an integral type.  */
 	decimal_from_integral (arg2, dec, dec_len, byte_order);
 
-      return value_from_decfloat (type, dec);
+      return value_from_decfloat (org_type, dec);
     }
   else if ((code1 == TYPE_CODE_INT || code1 == TYPE_CODE_ENUM
 	    || code1 == TYPE_CODE_RANGE)
@@ -496,7 +498,7 @@  value_cast (struct type *type, struct value *arg2)
 		     gdbarch_byte_order (get_type_arch (type2)));
       else
         longest = value_as_long (arg2);
-      return value_from_longest (type, convert_to_boolean ?
+      return value_from_longest (org_type, convert_to_boolean ?
 				 (LONGEST) (longest ? 1 : 0) : longest);
     }
   else if (code1 == TYPE_CODE_PTR && (code2 == TYPE_CODE_INT  
@@ -522,14 +524,14 @@  value_cast (struct type *type, struct value *arg2)
 	      || longest <= -((LONGEST) 1 << addr_bit))
 	    warning (_("value truncated"));
 	}
-      return value_from_longest (type, longest);
+      return value_from_longest (org_type, longest);
     }
   else if (code1 == TYPE_CODE_METHODPTR && code2 == TYPE_CODE_INT
 	   && value_as_long (arg2) == 0)
     {
-      struct value *result = allocate_value (type);
+      struct value *result = allocate_value (org_type);
 
-      cplus_make_method_ptr (type, value_contents_writeable (result), 0, 0);
+      cplus_make_method_ptr (org_type, value_contents_writeable (result), 0, 0);
       return result;
     }
   else if (code1 == TYPE_CODE_MEMBERPTR && code2 == TYPE_CODE_INT
@@ -537,7 +539,7 @@  value_cast (struct type *type, struct value *arg2)
     {
       /* The Itanium C++ ABI represents NULL pointers to members as
 	 minus one, instead of biasing the normal case.  */
-      return value_from_longest (type, -1);
+      return value_from_longest (org_type, -1);
     }
   else if (code1 == TYPE_CODE_ARRAY && TYPE_VECTOR (type)
 	   && code2 == TYPE_CODE_ARRAY && TYPE_VECTOR (type2)
@@ -548,21 +550,21 @@  value_cast (struct type *type, struct value *arg2)
     error (_("can only cast scalar to vector of same size"));
   else if (code1 == TYPE_CODE_VOID)
     {
-      return value_zero (type, not_lval);
+      return value_zero (org_type, not_lval);
     }
   else if (TYPE_LENGTH (type) == TYPE_LENGTH (type2))
     {
       if (code1 == TYPE_CODE_PTR && code2 == TYPE_CODE_PTR)
-	return value_cast_pointers (type, arg2, 0);
+	return value_cast_pointers (org_type, arg2, 0);
 
       arg2 = value_copy (arg2);
-      deprecated_set_value_type (arg2, type);
-      set_value_enclosing_type (arg2, type);
+      deprecated_set_value_type (arg2, org_type);
+      set_value_enclosing_type (arg2, org_type);
       set_value_pointed_to_offset (arg2, 0);	/* pai: chk_val */
       return arg2;
     }
   else if (VALUE_LVAL (arg2) == lval_memory)
-    return value_at_lazy (type, value_address (arg2));
+    return value_at_lazy (org_type, value_address (arg2));
   else
     {
       error (_("Invalid cast."));