[3/9] Handle TLS variable lookups when using separate debug object files.

Message ID 6f24b813adb0155b499d6e2265a6f15a2db4e6ca.1548180889.git.jhb@FreeBSD.org
State New, archived
Headers

Commit Message

John Baldwin Jan. 22, 2019, 6:42 p.m. UTC
  The object files stored in the shared object list are the original
object files, not the separate debug object files.  However, when
resolving a TLS variable, the variable is resolved against a separate
debug object file if it exists.  In this case,
svr4_fetch_objfile_link_map was failing to find the link map entry
since the debug object file is not in its internal list, only the
original object file.

gdb/ChangeLog:

	* solib-svr4.c (svr4_fetch_objfile_link_map): Look for
	objfile->separate_debug_objfile_backlink if not NULL.
---
 gdb/ChangeLog    | 5 +++++
 gdb/solib-svr4.c | 5 +++++
 2 files changed, 10 insertions(+)
  

Comments

Simon Marchi Feb. 2, 2019, 3:52 p.m. UTC | #1
On 2019-01-22 13:42, John Baldwin wrote:
> The object files stored in the shared object list are the original
> object files, not the separate debug object files.  However, when
> resolving a TLS variable, the variable is resolved against a separate
> debug object file if it exists.  In this case,
> svr4_fetch_objfile_link_map was failing to find the link map entry
> since the debug object file is not in its internal list, only the
> original object file.

Does this solve an existing issue, or an issue that would come up with 
the following patches?

Simon
  
John Baldwin Feb. 4, 2019, 8:02 p.m. UTC | #2
On 2/2/19 7:52 AM, Simon Marchi wrote:
> On 2019-01-22 13:42, John Baldwin wrote:
>> The object files stored in the shared object list are the original
>> object files, not the separate debug object files.  However, when
>> resolving a TLS variable, the variable is resolved against a separate
>> debug object file if it exists.  In this case,
>> svr4_fetch_objfile_link_map was failing to find the link map entry
>> since the debug object file is not in its internal list, only the
>> original object file.
> 
> Does this solve an existing issue, or an issue that would come up with 
> the following patches?

I only noticed while working on these patches, but I believe it is a
generic issue.  I tried to reproduce on a Linux box by compiling a small
library with separate debug symbols and a program that linked against it
and running it under gdb, but TLS variables didn't work for me even without
separate debug symbols in my testing. :(

$ cat foo.c
#include "foo.h"

static __thread int foo_id;

void
set_foo_id(int id)
{
  foo_id = id;
}

int
get_foo_id(void)
{
  return foo_id;
}
$ cat foo.h
void set_foo_id(int id);
int get_foo_id(void);
$ cat main.c
#include <stdio.h>

#include "foo.h"

int
main(void)
{

  set_foo_id(47);
  printf("foo id = %d\n", get_foo_id());
  return (0);
}
$ cc -g -fPIC -shared foo.c -o libfoo.so.full
$ objcopy --only-keep-debug libfoo.so.full libfoo.so.debug
$ objcopy --strip-debug --add-gnu-debuglink=libfoo.so.debug libfoo.so.full libfoo.so
$ cc -g main.c -o foo -lfoo -L.
$ gdb foo
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
...
(gdb) start
Temporary breakpoint 1 at 0x7ae: file main.c, line 9.
Starting program: /home/john/tls_lib/foo 

Temporary breakpoint 1, main () at main.c:9
9         set_foo_id(47);
(gdb) p foo_id
Cannot find thread-local storage for process 3970, shared library libfoo.so:
Cannot find thread-local variables on this target

Then tried it without separate debug file, but that didn't work either:

$ cc -g -fPIC -shared foo.c -o libfoo.so
$ gdb foo
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
...
(gdb) start
Temporary breakpoint 1 at 0x7ae: file main.c, line 9.
Starting program: /home/john/tls_lib/foo 

Temporary breakpoint 1, main () at main.c:9
9         set_foo_id(47);
(gdb) p foo_id
Cannot find thread-local storage for process 3982, shared library libfoo.so:
Cannot find thread-local variables on this target
(gdb) n
10        printf("foo id = %d\n", get_foo_id());
(gdb) 
foo id = 47
11        return (0);
(gdb) p foo_id
Cannot find thread-local storage for process 3982, shared library libfoo.so:
Cannot find thread-local variables on this target

I would have expected the second case to work and the first to fail and for
the patch to fix the first case.
  
Simon Marchi Feb. 5, 2019, 8:06 p.m. UTC | #3
On 2019-02-04 15:02, John Baldwin wrote:
> On 2/2/19 7:52 AM, Simon Marchi wrote:
>> On 2019-01-22 13:42, John Baldwin wrote:
>>> The object files stored in the shared object list are the original
>>> object files, not the separate debug object files.  However, when
>>> resolving a TLS variable, the variable is resolved against a separate
>>> debug object file if it exists.  In this case,
>>> svr4_fetch_objfile_link_map was failing to find the link map entry
>>> since the debug object file is not in its internal list, only the
>>> original object file.
>> 
>> Does this solve an existing issue, or an issue that would come up with
>> the following patches?
> 
> I only noticed while working on these patches, but I believe it is a
> generic issue.  I tried to reproduce on a Linux box by compiling a 
> small
> library with separate debug symbols and a program that linked against 
> it
> and running it under gdb, but TLS variables didn't work for me even 
> without
> separate debug symbols in my testing. :(
> 
> $ cat foo.c
> #include "foo.h"
> 
> static __thread int foo_id;
> 
> void
> set_foo_id(int id)
> {
>   foo_id = id;
> }
> 
> int
> get_foo_id(void)
> {
>   return foo_id;
> }
> $ cat foo.h
> void set_foo_id(int id);
> int get_foo_id(void);
> $ cat main.c
> #include <stdio.h>
> 
> #include "foo.h"
> 
> int
> main(void)
> {
> 
>   set_foo_id(47);
>   printf("foo id = %d\n", get_foo_id());
>   return (0);
> }
> $ cc -g -fPIC -shared foo.c -o libfoo.so.full
> $ objcopy --only-keep-debug libfoo.so.full libfoo.so.debug
> $ objcopy --strip-debug --add-gnu-debuglink=libfoo.so.debug
> libfoo.so.full libfoo.so
> $ cc -g main.c -o foo -lfoo -L.
> $ gdb foo
> GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
> ...
> (gdb) start
> Temporary breakpoint 1 at 0x7ae: file main.c, line 9.
> Starting program: /home/john/tls_lib/foo
> 
> Temporary breakpoint 1, main () at main.c:9
> 9         set_foo_id(47);
> (gdb) p foo_id
> Cannot find thread-local storage for process 3970, shared library 
> libfoo.so:
> Cannot find thread-local variables on this target
> 
> Then tried it without separate debug file, but that didn't work either:
> 
> $ cc -g -fPIC -shared foo.c -o libfoo.so
> $ gdb foo
> GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
> ...
> (gdb) start
> Temporary breakpoint 1 at 0x7ae: file main.c, line 9.
> Starting program: /home/john/tls_lib/foo
> 
> Temporary breakpoint 1, main () at main.c:9
> 9         set_foo_id(47);
> (gdb) p foo_id
> Cannot find thread-local storage for process 3982, shared library 
> libfoo.so:
> Cannot find thread-local variables on this target
> (gdb) n
> 10        printf("foo id = %d\n", get_foo_id());
> (gdb)
> foo id = 47
> 11        return (0);
> (gdb) p foo_id
> Cannot find thread-local storage for process 3982, shared library 
> libfoo.so:
> Cannot find thread-local variables on this target
> 
> I would have expected the second case to work and the first to fail and 
> for
> the patch to fix the first case.

I did a similar test and hit the same message.  I figured it's because 
you need to build with -pthread.  With -pthread, GDB manages to print 
the value in both cases (with and without separate debug info).  That's 
why I ended up asking you this question, I thought that maybe I just 
hadn't reproduced the issue correctly.

I have attached my test case (so we don't have to rewrite the same test 
over and over), which you can easily build with and without debug info 
(see the Makefile).  Does it trigger the problem on FreeBSD?  Does it 
work for you in both cases on Linux, like it does for me?

Now that I take a look, there might be the gdb.threads/tls-shared.exp 
test that does the exact same thing.  But there doesn't seem to be a 
board file to test with separate debug files out of the box...

Simon
  
John Baldwin Feb. 5, 2019, 10:21 p.m. UTC | #4
On 2/5/19 12:06 PM, Simon Marchi wrote:
> On 2019-02-04 15:02, John Baldwin wrote:
>> On 2/2/19 7:52 AM, Simon Marchi wrote:
>>> On 2019-01-22 13:42, John Baldwin wrote:
>>>> The object files stored in the shared object list are the original
>>>> object files, not the separate debug object files.  However, when
>>>> resolving a TLS variable, the variable is resolved against a separate
>>>> debug object file if it exists.  In this case,
>>>> svr4_fetch_objfile_link_map was failing to find the link map entry
>>>> since the debug object file is not in its internal list, only the
>>>> original object file.
>>>
>>> Does this solve an existing issue, or an issue that would come up with
>>> the following patches?
>>
>> I only noticed while working on these patches, but I believe it is a
>> generic issue.  I tried to reproduce on a Linux box by compiling a 
>> small
>> library with separate debug symbols and a program that linked against 
>> it
>> and running it under gdb, but TLS variables didn't work for me even 
>> without
>> separate debug symbols in my testing. :(
>>
>> $ cat foo.c
>> #include "foo.h"
>>
>> static __thread int foo_id;
>>
>> void
>> set_foo_id(int id)
>> {
>>   foo_id = id;
>> }
>>
>> int
>> get_foo_id(void)
>> {
>>   return foo_id;
>> }
>> $ cat foo.h
>> void set_foo_id(int id);
>> int get_foo_id(void);
>> $ cat main.c
>> #include <stdio.h>
>>
>> #include "foo.h"
>>
>> int
>> main(void)
>> {
>>
>>   set_foo_id(47);
>>   printf("foo id = %d\n", get_foo_id());
>>   return (0);
>> }
>> $ cc -g -fPIC -shared foo.c -o libfoo.so.full
>> $ objcopy --only-keep-debug libfoo.so.full libfoo.so.debug
>> $ objcopy --strip-debug --add-gnu-debuglink=libfoo.so.debug
>> libfoo.so.full libfoo.so
>> $ cc -g main.c -o foo -lfoo -L.
>> $ gdb foo
>> GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
>> ...
>> (gdb) start
>> Temporary breakpoint 1 at 0x7ae: file main.c, line 9.
>> Starting program: /home/john/tls_lib/foo
>>
>> Temporary breakpoint 1, main () at main.c:9
>> 9         set_foo_id(47);
>> (gdb) p foo_id
>> Cannot find thread-local storage for process 3970, shared library 
>> libfoo.so:
>> Cannot find thread-local variables on this target
>>
>> Then tried it without separate debug file, but that didn't work either:
>>
>> $ cc -g -fPIC -shared foo.c -o libfoo.so
>> $ gdb foo
>> GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
>> ...
>> (gdb) start
>> Temporary breakpoint 1 at 0x7ae: file main.c, line 9.
>> Starting program: /home/john/tls_lib/foo
>>
>> Temporary breakpoint 1, main () at main.c:9
>> 9         set_foo_id(47);
>> (gdb) p foo_id
>> Cannot find thread-local storage for process 3982, shared library 
>> libfoo.so:
>> Cannot find thread-local variables on this target
>> (gdb) n
>> 10        printf("foo id = %d\n", get_foo_id());
>> (gdb)
>> foo id = 47
>> 11        return (0);
>> (gdb) p foo_id
>> Cannot find thread-local storage for process 3982, shared library 
>> libfoo.so:
>> Cannot find thread-local variables on this target
>>
>> I would have expected the second case to work and the first to fail and 
>> for
>> the patch to fix the first case.
> 
> I did a similar test and hit the same message.  I figured it's because 
> you need to build with -pthread.  With -pthread, GDB manages to print 
> the value in both cases (with and without separate debug info).  That's 
> why I ended up asking you this question, I thought that maybe I just 
> hadn't reproduced the issue correctly.
> 
> I have attached my test case (so we don't have to rewrite the same test 
> over and over), which you can easily build with and without debug info 
> (see the Makefile).  Does it trigger the problem on FreeBSD?  Does it 
> work for you in both cases on Linux, like it does for me?
> 
> Now that I take a look, there might be the gdb.threads/tls-shared.exp 
> test that does the exact same thing.  But there doesn't seem to be a 
> board file to test with separate debug files out of the box...

So, it works fine for me without this patch on FreeBSD.  However, I dug
around a bit more as I had only noticed this before when trying to debug
a core dump for riscv (as FreeBSD/riscv64 core dumps have the tp register
in them).  That is when I get a separate objfile passed to
svr4_fetch_objfile_link_map:

Breakpoint 3, svr4_fetch_objfile_link_map (objfile=0x8038a1600) at ../../gdb/solib-svr4.c:1541
1541      struct svr4_info *info = get_svr4_info ();
(top-gdb) p *objfile
$3 = {next = 0x8038a1480, 
  original_name = 0x8038df010 "/usr/lib/debug//ufs/riscv64/rootfs/lib/libc.so.7.debug", addr_low = 0x0, flags = {
...
(top-gdb) where
#0  svr4_fetch_objfile_link_map (objfile=0x8038a1600)
    at ../../gdb/solib-svr4.c:1541
#1  0x00000000015085e7 in gdbarch_fetch_tls_load_module_address (gdbarch=0x803800010, objfile=0x8038a1600) at ../../gdb/gdbarch.c:3019
#2  0x00000000019ba415 in target_translate_tls_address (objfile=0x8038a1600, 
    offset=0x1800) at ../../gdb/target.c:705
#3  0x00000000016ea2a1 in find_minsym_type_and_address (msymbol=0x803cfcb18, 
    objfile=0x8038a1600, address_p=0x7fffffffd2e8) at ../../gdb/parse.c:500
#4  0x00000000014afeb7 in evaluate_var_msym_value (noside=EVAL_NORMAL, 
    objfile=0x8038a1600, msymbol=0x803cfcb18) at ../../gdb/eval.c:743
#5  0x00000000014b0290 in evaluate_subexp_standard (expect_type=0x0, 
    exp=0x8034f0430, pos=0x7fffffffde34, noside=EVAL_NORMAL)
    at ../../gdb/eval.c:1326
#6  0x00000000012a8562 in evaluate_subexp_c (expect_type=0x0, exp=0x8034f0430, 
    pos=0x7fffffffde34, noside=EVAL_NORMAL) at ../../gdb/c-lang.c:704
#7  0x00000000014ae17b in evaluate_subexp (expect_type=0x0, exp=0x8034f0430, 
    pos=0x7fffffffde34, noside=EVAL_NORMAL) at ../../gdb/eval.c:80
#8  0x00000000014ae48b in evaluate_expression (exp=0x8034f0430)
    at ../../gdb/eval.c:141
#9  0x000000000173184f in print_command_1 (
    exp=0x7fffffffe101 "__je_tsd_initialized", voidprint=1)
    at ../../gdb/printcmd.c:1187
#10 0x000000000172e5ff in print_command (
    exp=0x7fffffffe101 "__je_tsd_initialized", from_tty=1)
    at ../../gdb/printcmd.c:1200
(top-gdb) frame 5
#5  0x00000000014b0290 in evaluate_subexp_standard (expect_type=0x0, 
    exp=0x8034f0430, pos=0x7fffffffde34, noside=EVAL_NORMAL)
    at ../../gdb/eval.c:1326
1326            value *val = evaluate_var_msym_value (noside,
(top-gdb) l
1321        case OP_VAR_MSYM_VALUE:
1322          {
1323            (*pos) += 3;
1324
1325            minimal_symbol *msymbol = exp->elts[pc + 2].msymbol;
1326            value *val = evaluate_var_msym_value (noside,
1327                                                  exp->elts[pc + 1].objfile,
1328                                                  msymbol);
1329
1330            type = value_type (val);

whereas on a live amd64 process this was called from a different place:

(top-gdb) where
#0  svr4_fetch_objfile_link_map (objfile=0x803879380)
    at ../../gdb/solib-svr4.c:1541
#1  0x00000000015085e7 in gdbarch_fetch_tls_load_module_address (gdbarch=0x8039b7010, objfile=0x803879380) at ../../gdb/gdbarch.c:3019
#2  0x00000000019ba3f5 in target_translate_tls_address (objfile=0x803879380, 
    offset=6168) at ../../gdb/target.c:705
#3  0x000000000141f361 in dwarf_evaluate_loc_desc::get_tls_address (
    this=0x7fffffffc578, offset=6168) at ../../gdb/dwarf2loc.c:609
#4  0x00000000014077dd in dwarf_expr_context::execute_stack_op (
    this=0x7fffffffc578, op_ptr=0x803dea6fb "\002S\177\001", 
    op_end=0x803dea6fb "\002S\177\001") at ../../gdb/dwarf2expr.c:1175
#5  0x00000000014056b1 in dwarf_expr_context::eval (this=0x7fffffffc578, 
    addr=0x803dea6f1 "\016\030\030", len=10) at ../../gdb/dwarf2expr.c:301
#6  0x000000000140e99d in dwarf2_evaluate_loc_desc_full (type=0x8040ad930, 
    frame=0x0, data=0x803dea6f1 "\016\030\030", size=10, per_cu=0x8039cb190, 
    subobj_type=0x8040ad930, subobj_byte_offset=0)
    at ../../gdb/dwarf2loc.c:2170
#7  0x000000000140e7d2 in dwarf2_evaluate_loc_desc (type=0x8040ad930, 
    frame=0x0, data=0x803dea6f1 "\016\030\030", size=10, per_cu=0x8039cb190)
    at ../../gdb/dwarf2loc.c:2352
#8  0x0000000001412794 in locexpr_read_variable (symbol=0x80416c930, frame=0x0)
    at ../../gdb/dwarf2loc.c:3503
#9  0x00000000014e3743 in default_read_var_value (var=0x80416c930, 
    var_block=0x8041724e0, frame=0x0) at ../../gdb/findvar.c:610
#10 0x00000000014e4881 in read_var_value (var=0x80416c930, 
    var_block=0x8041724e0, frame=0x0) at ../../gdb/findvar.c:815
#11 0x0000000001a99abe in value_of_variable (var=0x80416c930, b=0x8041724e0) at ../../gdb/valops.c:1292
#12 0x00000000014afd7b in evaluate_var_value (noside=EVAL_NORMAL, blk=0x8041724e0, var=0x80416c930) at ../../gdb/eval.c:721
#13 0x00000000014b0217 in evaluate_subexp_standard (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/eval.c:1312
#14 0x00000000012a8562 in evaluate_subexp_c (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/c-lang.c:704
#15 0x00000000014ae17b in evaluate_subexp (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/eval.c:80
#16 0x00000000014ae48b in evaluate_expression (exp=0x803507830) at ../../gdb/eval.c:141
#17 0x000000000173184f in print_command_1 (exp=0x7fffffffd7d1 "__je_tsd_initialized", voidprint=1) at ../../gdb/printcmd.c:1187
#18 0x000000000172e5ff in print_command (exp=0x7fffffffd7d1 "__je_tsd_initialized", from_tty=1) at ../../gdb/printcmd.c:1200
(top-gdb) frame 13
#13 0x00000000014b0217 in evaluate_subexp_standard (expect_type=0x0, 
    exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL)
    at ../../gdb/eval.c:1312
1312                return evaluate_var_value (noside, exp->elts[pc + 1].block, var);
(top-gdb) l
1307            (*pos) += 3;
1308            symbol *var = exp->elts[pc + 2].symbol;
1309            if (TYPE_CODE (SYMBOL_TYPE (var)) == TYPE_CODE_ERROR)
1310              error_unknown_type (SYMBOL_PRINT_NAME (var));
1311            if (noside != EVAL_SKIP)
1312                return evaluate_var_value (noside, exp->elts[pc + 1].block, var);
1313            else
1314              {
1315                /* Return a dummy value of the correct type when skipping, so
1316                   that parent functions know what is to be skipped.  */

So it seems that the OP_VAR_VALUE path calls down into the dwarf bits that
get the "original" objfile to pass to target_translate_tls_address, whereas
the OP_VAR_MSYM_VALUE case ends up using a separate object file.  This might
in fact be due to bugs in the RISCV GCC backend as the TLS symbols for RISCV
don't seem quite right.  I have to cast TLS variables to their types for
example:

(gdb) p __je_tsd_initialized
'__je_tsd_initialized` has unknown type; cast it to its declared type
(gdb) p (bool)__je_tsd_initialized
$2 = 1 '\001'

Also, for me TLS variables in the main executable did not work for me on
RISCV, only TLS variables in shared libraries, unlike on other architectures
I tested where TLS variables in both the executable and shared libraries
worked.
  
John Baldwin Feb. 5, 2019, 10:33 p.m. UTC | #5
On 2/5/19 2:21 PM, John Baldwin wrote:
> So it seems that the OP_VAR_VALUE path calls down into the dwarf bits that
> get the "original" objfile to pass to target_translate_tls_address, whereas
> the OP_VAR_MSYM_VALUE case ends up using a separate object file.  This might
> in fact be due to bugs in the RISCV GCC backend as the TLS symbols for RISCV
> don't seem quite right.  I have to cast TLS variables to their types for
> example:
> 
> (gdb) p __je_tsd_initialized
> '__je_tsd_initialized` has unknown type; cast it to its declared type
> (gdb) p (bool)__je_tsd_initialized
> $2 = 1 '\001'
> 
> Also, for me TLS variables in the main executable did not work for me on
> RISCV, only TLS variables in shared libraries, unlike on other architectures
> I tested where TLS variables in both the executable and shared libraries
> worked.

I should have said: I think this means I probably need to rework the commit
message a bit to explain that this doesn't always happen with separate
debug files, but it can, and if so this fix is needed.
  
Simon Marchi Feb. 7, 2019, 4:05 a.m. UTC | #6
On 2019-02-05 17:33, John Baldwin wrote:
> On 2/5/19 2:21 PM, John Baldwin wrote:
>> So it seems that the OP_VAR_VALUE path calls down into the dwarf bits 
>> that
>> get the "original" objfile to pass to target_translate_tls_address, 
>> whereas
>> the OP_VAR_MSYM_VALUE case ends up using a separate object file.  This 
>> might
>> in fact be due to bugs in the RISCV GCC backend as the TLS symbols for 
>> RISCV
>> don't seem quite right.  I have to cast TLS variables to their types 
>> for
>> example:
>> 
>> (gdb) p __je_tsd_initialized
>> '__je_tsd_initialized` has unknown type; cast it to its declared type
>> (gdb) p (bool)__je_tsd_initialized
>> $2 = 1 '\001'
>> 
>> Also, for me TLS variables in the main executable did not work for me 
>> on
>> RISCV, only TLS variables in shared libraries, unlike on other 
>> architectures
>> I tested where TLS variables in both the executable and shared 
>> libraries
>> worked.
> 
> I should have said: I think this means I probably need to rework the 
> commit
> message a bit to explain that this doesn't always happen with separate
> debug files, but it can, and if so this fix is needed.

After discussing with John on IRC and trying by all means to trigger 
this bug, this is what I ended up with.  On x86/GNU/Linux, it is 
possible to trigger this bug in a rather contrived way, but at least it 
shows that there is indeed a bug in GDB.  The requirements are:

1. Have a TLS variable in a shared library ("libfoo.so")
2. The .o containing the TLS variable should not be described with 
DWARF.  This can be done simply by compiling it without -g.
3. The shared lib should still have a separate debug info file 
("libfoo.so.debug").
4. The shared lib should be stripped of its unnecessary ELF symbols (ran 
through the strip utility)
5. The shared library should not be the first object file in the program 
to have TLS.  This can be done by adding a TLS variable in the main 
executable ("main").
6. Because we don't have debug info for the variable we try to print, we 
need to cast it to its expected type, e.g. "print (int)foo_id".

With all this, when parsing "(int)foo_id", we find a minimal symbol 
matching foo_id in the objfile representing libfoo.so.debug.  
find_minsym_type_and_address is eventually called with that objfile, 
which calls target_translate_tls_address, which calls 
svr4_fetch_objfile_link_map.  The latter obviously can't find a link_map 
matching the separate-debug-info objfile, and wrong things happen after.

Without #2 above, the DWARF symbol is found and some other code path is 
taken, where the separate-debug-info objfile is replaced with the "real" 
objfile at some point.

Without #3, we obviously wouldn't end up with a separate-debug-info 
objfile in svr4_fetch_objfile_link_map.

Without #4 above, we would find the objfile for libfoo.so first, so we 
wouldn't end up with the seaprate-debug-info objfile in 
svr4_fetch_objfile_link_map.

Without #5 we actually get the right result by chance.  This is because 
we end up in the special case "else" of 
thread_db_target::get_thread_local_address.  That special case hardcodes 
the module id to read from to 1.  If the shared lib is the only module 
to have TLS, then it happens to be the right one.  By making the main 
executable have some TLS, we will end up reading the TLS for the wrong 
module, and thus trigger the bug.

I have not yet found the motivation to write a proper test for this (in 
particular, I am not sure how to build the lib with separate debug info 
in the testsuite).  But I attached a reproducer for future reference.  
You should just need to "make" and "gdb -x run.gdb".  The wrong value is 
printed with today's GDB.

In John's specific case, apparently the DWARF debug info for TLS 
variables on riscv is broken, which brought him to roughly the same 
state.  I don't have any more info about that.

All this to say that I think this patch is fine.  I wondered whether it 
was better instead to make svr4_fetch_objfile_link_map assert that it 
receives a "real" objfile (objfile->separate_debug_objfile_backlink == 
NULL) and push the responsibility to the caller to provide a correct 
objfile.  But in the end I didn't find a compelling to do this rather 
than what John did in this patch.  So therefore, the patch LGTM.

Simon
  
Simon Marchi Feb. 7, 2019, 4:08 a.m. UTC | #7
On 2019-02-06 23:05, Simon Marchi wrote:
> I have not yet found the motivation to write a proper test for this
> (in particular, I am not sure how to build the lib with separate debug
> info in the testsuite).  But I attached a reproducer for future
> reference.  You should just need to "make" and "gdb -x run.gdb".  The
> wrong value is printed with today's GDB.

And of course I forgot the attachment... here it is.

Simon
  

Patch

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 056a60fa23..c02e534fe5 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,8 @@ 
+2019-01-22  John Baldwin  <jhb@FreeBSD.org>
+
+	* solib-svr4.c (svr4_fetch_objfile_link_map): Look for
+	objfile->separate_debug_objfile_backlink if not NULL.
+
 2019-01-22  John Baldwin  <jhb@FreeBSD.org>
 
 	* amd64-bsd-nat.c (amd64bsd_fetch_inferior_registers): Use
diff --git a/gdb/solib-svr4.c b/gdb/solib-svr4.c
index 787b98ba26..d7a792580d 100644
--- a/gdb/solib-svr4.c
+++ b/gdb/solib-svr4.c
@@ -1548,6 +1548,11 @@  svr4_fetch_objfile_link_map (struct objfile *objfile)
   if (objfile == symfile_objfile)
     return info->main_lm_addr;
 
+  /* If OBJFILE is a separate debug object file, look for the
+     original object file.  */
+  if (objfile->separate_debug_objfile_backlink != NULL)
+    objfile = objfile->separate_debug_objfile_backlink;
+
   /* The other link map addresses may be found by examining the list
      of shared libraries.  */
   for (so = master_so_list (); so; so = so->next)