Message ID | 6f24b813adb0155b499d6e2265a6f15a2db4e6ca.1548180889.git.jhb@FreeBSD.org |
---|---|
State | New, archived |
Headers |
Received: (qmail 85469 invoked by alias); 22 Jan 2019 18:43:45 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 85447 invoked by uid 89); 22 Jan 2019 18:43:45 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_HELO_PASS, SPF_SOFTFAIL autolearn=ham version=3.3.2 spammy=John, HContent-Transfer-Encoding:8bit X-HELO: mail.baldwin.cx Received: from bigwig.baldwin.cx (HELO mail.baldwin.cx) (96.47.65.170) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 22 Jan 2019 18:43:43 +0000 Received: from ralph.com (ralph.baldwin.cx [66.234.199.215]) by mail.baldwin.cx (Postfix) with ESMTPSA id BC70210B4CE for <gdb-patches@sourceware.org>; Tue, 22 Jan 2019 13:43:41 -0500 (EST) From: John Baldwin <jhb@FreeBSD.org> To: gdb-patches@sourceware.org Subject: [PATCH 3/9] Handle TLS variable lookups when using separate debug object files. Date: Tue, 22 Jan 2019 10:42:56 -0800 Message-Id: <6f24b813adb0155b499d6e2265a6f15a2db4e6ca.1548180889.git.jhb@FreeBSD.org> In-Reply-To: <cover.1548180889.git.jhb@FreeBSD.org> References: <cover.1548180889.git.jhb@FreeBSD.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-IsSubscribed: yes |
Commit Message
John Baldwin
Jan. 22, 2019, 6:42 p.m. UTC
The object files stored in the shared object list are the original object files, not the separate debug object files. However, when resolving a TLS variable, the variable is resolved against a separate debug object file if it exists. In this case, svr4_fetch_objfile_link_map was failing to find the link map entry since the debug object file is not in its internal list, only the original object file. gdb/ChangeLog: * solib-svr4.c (svr4_fetch_objfile_link_map): Look for objfile->separate_debug_objfile_backlink if not NULL. --- gdb/ChangeLog | 5 +++++ gdb/solib-svr4.c | 5 +++++ 2 files changed, 10 insertions(+)
Comments
On 2019-01-22 13:42, John Baldwin wrote: > The object files stored in the shared object list are the original > object files, not the separate debug object files. However, when > resolving a TLS variable, the variable is resolved against a separate > debug object file if it exists. In this case, > svr4_fetch_objfile_link_map was failing to find the link map entry > since the debug object file is not in its internal list, only the > original object file. Does this solve an existing issue, or an issue that would come up with the following patches? Simon
On 2/2/19 7:52 AM, Simon Marchi wrote: > On 2019-01-22 13:42, John Baldwin wrote: >> The object files stored in the shared object list are the original >> object files, not the separate debug object files. However, when >> resolving a TLS variable, the variable is resolved against a separate >> debug object file if it exists. In this case, >> svr4_fetch_objfile_link_map was failing to find the link map entry >> since the debug object file is not in its internal list, only the >> original object file. > > Does this solve an existing issue, or an issue that would come up with > the following patches? I only noticed while working on these patches, but I believe it is a generic issue. I tried to reproduce on a Linux box by compiling a small library with separate debug symbols and a program that linked against it and running it under gdb, but TLS variables didn't work for me even without separate debug symbols in my testing. :( $ cat foo.c #include "foo.h" static __thread int foo_id; void set_foo_id(int id) { foo_id = id; } int get_foo_id(void) { return foo_id; } $ cat foo.h void set_foo_id(int id); int get_foo_id(void); $ cat main.c #include <stdio.h> #include "foo.h" int main(void) { set_foo_id(47); printf("foo id = %d\n", get_foo_id()); return (0); } $ cc -g -fPIC -shared foo.c -o libfoo.so.full $ objcopy --only-keep-debug libfoo.so.full libfoo.so.debug $ objcopy --strip-debug --add-gnu-debuglink=libfoo.so.debug libfoo.so.full libfoo.so $ cc -g main.c -o foo -lfoo -L. $ gdb foo GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git ... (gdb) start Temporary breakpoint 1 at 0x7ae: file main.c, line 9. Starting program: /home/john/tls_lib/foo Temporary breakpoint 1, main () at main.c:9 9 set_foo_id(47); (gdb) p foo_id Cannot find thread-local storage for process 3970, shared library libfoo.so: Cannot find thread-local variables on this target Then tried it without separate debug file, but that didn't work either: $ cc -g -fPIC -shared foo.c -o libfoo.so $ gdb foo GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git ... (gdb) start Temporary breakpoint 1 at 0x7ae: file main.c, line 9. Starting program: /home/john/tls_lib/foo Temporary breakpoint 1, main () at main.c:9 9 set_foo_id(47); (gdb) p foo_id Cannot find thread-local storage for process 3982, shared library libfoo.so: Cannot find thread-local variables on this target (gdb) n 10 printf("foo id = %d\n", get_foo_id()); (gdb) foo id = 47 11 return (0); (gdb) p foo_id Cannot find thread-local storage for process 3982, shared library libfoo.so: Cannot find thread-local variables on this target I would have expected the second case to work and the first to fail and for the patch to fix the first case.
On 2019-02-04 15:02, John Baldwin wrote: > On 2/2/19 7:52 AM, Simon Marchi wrote: >> On 2019-01-22 13:42, John Baldwin wrote: >>> The object files stored in the shared object list are the original >>> object files, not the separate debug object files. However, when >>> resolving a TLS variable, the variable is resolved against a separate >>> debug object file if it exists. In this case, >>> svr4_fetch_objfile_link_map was failing to find the link map entry >>> since the debug object file is not in its internal list, only the >>> original object file. >> >> Does this solve an existing issue, or an issue that would come up with >> the following patches? > > I only noticed while working on these patches, but I believe it is a > generic issue. I tried to reproduce on a Linux box by compiling a > small > library with separate debug symbols and a program that linked against > it > and running it under gdb, but TLS variables didn't work for me even > without > separate debug symbols in my testing. :( > > $ cat foo.c > #include "foo.h" > > static __thread int foo_id; > > void > set_foo_id(int id) > { > foo_id = id; > } > > int > get_foo_id(void) > { > return foo_id; > } > $ cat foo.h > void set_foo_id(int id); > int get_foo_id(void); > $ cat main.c > #include <stdio.h> > > #include "foo.h" > > int > main(void) > { > > set_foo_id(47); > printf("foo id = %d\n", get_foo_id()); > return (0); > } > $ cc -g -fPIC -shared foo.c -o libfoo.so.full > $ objcopy --only-keep-debug libfoo.so.full libfoo.so.debug > $ objcopy --strip-debug --add-gnu-debuglink=libfoo.so.debug > libfoo.so.full libfoo.so > $ cc -g main.c -o foo -lfoo -L. > $ gdb foo > GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git > ... > (gdb) start > Temporary breakpoint 1 at 0x7ae: file main.c, line 9. > Starting program: /home/john/tls_lib/foo > > Temporary breakpoint 1, main () at main.c:9 > 9 set_foo_id(47); > (gdb) p foo_id > Cannot find thread-local storage for process 3970, shared library > libfoo.so: > Cannot find thread-local variables on this target > > Then tried it without separate debug file, but that didn't work either: > > $ cc -g -fPIC -shared foo.c -o libfoo.so > $ gdb foo > GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git > ... > (gdb) start > Temporary breakpoint 1 at 0x7ae: file main.c, line 9. > Starting program: /home/john/tls_lib/foo > > Temporary breakpoint 1, main () at main.c:9 > 9 set_foo_id(47); > (gdb) p foo_id > Cannot find thread-local storage for process 3982, shared library > libfoo.so: > Cannot find thread-local variables on this target > (gdb) n > 10 printf("foo id = %d\n", get_foo_id()); > (gdb) > foo id = 47 > 11 return (0); > (gdb) p foo_id > Cannot find thread-local storage for process 3982, shared library > libfoo.so: > Cannot find thread-local variables on this target > > I would have expected the second case to work and the first to fail and > for > the patch to fix the first case. I did a similar test and hit the same message. I figured it's because you need to build with -pthread. With -pthread, GDB manages to print the value in both cases (with and without separate debug info). That's why I ended up asking you this question, I thought that maybe I just hadn't reproduced the issue correctly. I have attached my test case (so we don't have to rewrite the same test over and over), which you can easily build with and without debug info (see the Makefile). Does it trigger the problem on FreeBSD? Does it work for you in both cases on Linux, like it does for me? Now that I take a look, there might be the gdb.threads/tls-shared.exp test that does the exact same thing. But there doesn't seem to be a board file to test with separate debug files out of the box... Simon
On 2/5/19 12:06 PM, Simon Marchi wrote: > On 2019-02-04 15:02, John Baldwin wrote: >> On 2/2/19 7:52 AM, Simon Marchi wrote: >>> On 2019-01-22 13:42, John Baldwin wrote: >>>> The object files stored in the shared object list are the original >>>> object files, not the separate debug object files. However, when >>>> resolving a TLS variable, the variable is resolved against a separate >>>> debug object file if it exists. In this case, >>>> svr4_fetch_objfile_link_map was failing to find the link map entry >>>> since the debug object file is not in its internal list, only the >>>> original object file. >>> >>> Does this solve an existing issue, or an issue that would come up with >>> the following patches? >> >> I only noticed while working on these patches, but I believe it is a >> generic issue. I tried to reproduce on a Linux box by compiling a >> small >> library with separate debug symbols and a program that linked against >> it >> and running it under gdb, but TLS variables didn't work for me even >> without >> separate debug symbols in my testing. :( >> >> $ cat foo.c >> #include "foo.h" >> >> static __thread int foo_id; >> >> void >> set_foo_id(int id) >> { >> foo_id = id; >> } >> >> int >> get_foo_id(void) >> { >> return foo_id; >> } >> $ cat foo.h >> void set_foo_id(int id); >> int get_foo_id(void); >> $ cat main.c >> #include <stdio.h> >> >> #include "foo.h" >> >> int >> main(void) >> { >> >> set_foo_id(47); >> printf("foo id = %d\n", get_foo_id()); >> return (0); >> } >> $ cc -g -fPIC -shared foo.c -o libfoo.so.full >> $ objcopy --only-keep-debug libfoo.so.full libfoo.so.debug >> $ objcopy --strip-debug --add-gnu-debuglink=libfoo.so.debug >> libfoo.so.full libfoo.so >> $ cc -g main.c -o foo -lfoo -L. >> $ gdb foo >> GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git >> ... >> (gdb) start >> Temporary breakpoint 1 at 0x7ae: file main.c, line 9. >> Starting program: /home/john/tls_lib/foo >> >> Temporary breakpoint 1, main () at main.c:9 >> 9 set_foo_id(47); >> (gdb) p foo_id >> Cannot find thread-local storage for process 3970, shared library >> libfoo.so: >> Cannot find thread-local variables on this target >> >> Then tried it without separate debug file, but that didn't work either: >> >> $ cc -g -fPIC -shared foo.c -o libfoo.so >> $ gdb foo >> GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git >> ... >> (gdb) start >> Temporary breakpoint 1 at 0x7ae: file main.c, line 9. >> Starting program: /home/john/tls_lib/foo >> >> Temporary breakpoint 1, main () at main.c:9 >> 9 set_foo_id(47); >> (gdb) p foo_id >> Cannot find thread-local storage for process 3982, shared library >> libfoo.so: >> Cannot find thread-local variables on this target >> (gdb) n >> 10 printf("foo id = %d\n", get_foo_id()); >> (gdb) >> foo id = 47 >> 11 return (0); >> (gdb) p foo_id >> Cannot find thread-local storage for process 3982, shared library >> libfoo.so: >> Cannot find thread-local variables on this target >> >> I would have expected the second case to work and the first to fail and >> for >> the patch to fix the first case. > > I did a similar test and hit the same message. I figured it's because > you need to build with -pthread. With -pthread, GDB manages to print > the value in both cases (with and without separate debug info). That's > why I ended up asking you this question, I thought that maybe I just > hadn't reproduced the issue correctly. > > I have attached my test case (so we don't have to rewrite the same test > over and over), which you can easily build with and without debug info > (see the Makefile). Does it trigger the problem on FreeBSD? Does it > work for you in both cases on Linux, like it does for me? > > Now that I take a look, there might be the gdb.threads/tls-shared.exp > test that does the exact same thing. But there doesn't seem to be a > board file to test with separate debug files out of the box... So, it works fine for me without this patch on FreeBSD. However, I dug around a bit more as I had only noticed this before when trying to debug a core dump for riscv (as FreeBSD/riscv64 core dumps have the tp register in them). That is when I get a separate objfile passed to svr4_fetch_objfile_link_map: Breakpoint 3, svr4_fetch_objfile_link_map (objfile=0x8038a1600) at ../../gdb/solib-svr4.c:1541 1541 struct svr4_info *info = get_svr4_info (); (top-gdb) p *objfile $3 = {next = 0x8038a1480, original_name = 0x8038df010 "/usr/lib/debug//ufs/riscv64/rootfs/lib/libc.so.7.debug", addr_low = 0x0, flags = { ... (top-gdb) where #0 svr4_fetch_objfile_link_map (objfile=0x8038a1600) at ../../gdb/solib-svr4.c:1541 #1 0x00000000015085e7 in gdbarch_fetch_tls_load_module_address (gdbarch=0x803800010, objfile=0x8038a1600) at ../../gdb/gdbarch.c:3019 #2 0x00000000019ba415 in target_translate_tls_address (objfile=0x8038a1600, offset=0x1800) at ../../gdb/target.c:705 #3 0x00000000016ea2a1 in find_minsym_type_and_address (msymbol=0x803cfcb18, objfile=0x8038a1600, address_p=0x7fffffffd2e8) at ../../gdb/parse.c:500 #4 0x00000000014afeb7 in evaluate_var_msym_value (noside=EVAL_NORMAL, objfile=0x8038a1600, msymbol=0x803cfcb18) at ../../gdb/eval.c:743 #5 0x00000000014b0290 in evaluate_subexp_standard (expect_type=0x0, exp=0x8034f0430, pos=0x7fffffffde34, noside=EVAL_NORMAL) at ../../gdb/eval.c:1326 #6 0x00000000012a8562 in evaluate_subexp_c (expect_type=0x0, exp=0x8034f0430, pos=0x7fffffffde34, noside=EVAL_NORMAL) at ../../gdb/c-lang.c:704 #7 0x00000000014ae17b in evaluate_subexp (expect_type=0x0, exp=0x8034f0430, pos=0x7fffffffde34, noside=EVAL_NORMAL) at ../../gdb/eval.c:80 #8 0x00000000014ae48b in evaluate_expression (exp=0x8034f0430) at ../../gdb/eval.c:141 #9 0x000000000173184f in print_command_1 ( exp=0x7fffffffe101 "__je_tsd_initialized", voidprint=1) at ../../gdb/printcmd.c:1187 #10 0x000000000172e5ff in print_command ( exp=0x7fffffffe101 "__je_tsd_initialized", from_tty=1) at ../../gdb/printcmd.c:1200 (top-gdb) frame 5 #5 0x00000000014b0290 in evaluate_subexp_standard (expect_type=0x0, exp=0x8034f0430, pos=0x7fffffffde34, noside=EVAL_NORMAL) at ../../gdb/eval.c:1326 1326 value *val = evaluate_var_msym_value (noside, (top-gdb) l 1321 case OP_VAR_MSYM_VALUE: 1322 { 1323 (*pos) += 3; 1324 1325 minimal_symbol *msymbol = exp->elts[pc + 2].msymbol; 1326 value *val = evaluate_var_msym_value (noside, 1327 exp->elts[pc + 1].objfile, 1328 msymbol); 1329 1330 type = value_type (val); whereas on a live amd64 process this was called from a different place: (top-gdb) where #0 svr4_fetch_objfile_link_map (objfile=0x803879380) at ../../gdb/solib-svr4.c:1541 #1 0x00000000015085e7 in gdbarch_fetch_tls_load_module_address (gdbarch=0x8039b7010, objfile=0x803879380) at ../../gdb/gdbarch.c:3019 #2 0x00000000019ba3f5 in target_translate_tls_address (objfile=0x803879380, offset=6168) at ../../gdb/target.c:705 #3 0x000000000141f361 in dwarf_evaluate_loc_desc::get_tls_address ( this=0x7fffffffc578, offset=6168) at ../../gdb/dwarf2loc.c:609 #4 0x00000000014077dd in dwarf_expr_context::execute_stack_op ( this=0x7fffffffc578, op_ptr=0x803dea6fb "\002S\177\001", op_end=0x803dea6fb "\002S\177\001") at ../../gdb/dwarf2expr.c:1175 #5 0x00000000014056b1 in dwarf_expr_context::eval (this=0x7fffffffc578, addr=0x803dea6f1 "\016\030\030", len=10) at ../../gdb/dwarf2expr.c:301 #6 0x000000000140e99d in dwarf2_evaluate_loc_desc_full (type=0x8040ad930, frame=0x0, data=0x803dea6f1 "\016\030\030", size=10, per_cu=0x8039cb190, subobj_type=0x8040ad930, subobj_byte_offset=0) at ../../gdb/dwarf2loc.c:2170 #7 0x000000000140e7d2 in dwarf2_evaluate_loc_desc (type=0x8040ad930, frame=0x0, data=0x803dea6f1 "\016\030\030", size=10, per_cu=0x8039cb190) at ../../gdb/dwarf2loc.c:2352 #8 0x0000000001412794 in locexpr_read_variable (symbol=0x80416c930, frame=0x0) at ../../gdb/dwarf2loc.c:3503 #9 0x00000000014e3743 in default_read_var_value (var=0x80416c930, var_block=0x8041724e0, frame=0x0) at ../../gdb/findvar.c:610 #10 0x00000000014e4881 in read_var_value (var=0x80416c930, var_block=0x8041724e0, frame=0x0) at ../../gdb/findvar.c:815 #11 0x0000000001a99abe in value_of_variable (var=0x80416c930, b=0x8041724e0) at ../../gdb/valops.c:1292 #12 0x00000000014afd7b in evaluate_var_value (noside=EVAL_NORMAL, blk=0x8041724e0, var=0x80416c930) at ../../gdb/eval.c:721 #13 0x00000000014b0217 in evaluate_subexp_standard (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/eval.c:1312 #14 0x00000000012a8562 in evaluate_subexp_c (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/c-lang.c:704 #15 0x00000000014ae17b in evaluate_subexp (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/eval.c:80 #16 0x00000000014ae48b in evaluate_expression (exp=0x803507830) at ../../gdb/eval.c:141 #17 0x000000000173184f in print_command_1 (exp=0x7fffffffd7d1 "__je_tsd_initialized", voidprint=1) at ../../gdb/printcmd.c:1187 #18 0x000000000172e5ff in print_command (exp=0x7fffffffd7d1 "__je_tsd_initialized", from_tty=1) at ../../gdb/printcmd.c:1200 (top-gdb) frame 13 #13 0x00000000014b0217 in evaluate_subexp_standard (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/eval.c:1312 1312 return evaluate_var_value (noside, exp->elts[pc + 1].block, var); (top-gdb) l 1307 (*pos) += 3; 1308 symbol *var = exp->elts[pc + 2].symbol; 1309 if (TYPE_CODE (SYMBOL_TYPE (var)) == TYPE_CODE_ERROR) 1310 error_unknown_type (SYMBOL_PRINT_NAME (var)); 1311 if (noside != EVAL_SKIP) 1312 return evaluate_var_value (noside, exp->elts[pc + 1].block, var); 1313 else 1314 { 1315 /* Return a dummy value of the correct type when skipping, so 1316 that parent functions know what is to be skipped. */ So it seems that the OP_VAR_VALUE path calls down into the dwarf bits that get the "original" objfile to pass to target_translate_tls_address, whereas the OP_VAR_MSYM_VALUE case ends up using a separate object file. This might in fact be due to bugs in the RISCV GCC backend as the TLS symbols for RISCV don't seem quite right. I have to cast TLS variables to their types for example: (gdb) p __je_tsd_initialized '__je_tsd_initialized` has unknown type; cast it to its declared type (gdb) p (bool)__je_tsd_initialized $2 = 1 '\001' Also, for me TLS variables in the main executable did not work for me on RISCV, only TLS variables in shared libraries, unlike on other architectures I tested where TLS variables in both the executable and shared libraries worked.
On 2/5/19 2:21 PM, John Baldwin wrote: > So it seems that the OP_VAR_VALUE path calls down into the dwarf bits that > get the "original" objfile to pass to target_translate_tls_address, whereas > the OP_VAR_MSYM_VALUE case ends up using a separate object file. This might > in fact be due to bugs in the RISCV GCC backend as the TLS symbols for RISCV > don't seem quite right. I have to cast TLS variables to their types for > example: > > (gdb) p __je_tsd_initialized > '__je_tsd_initialized` has unknown type; cast it to its declared type > (gdb) p (bool)__je_tsd_initialized > $2 = 1 '\001' > > Also, for me TLS variables in the main executable did not work for me on > RISCV, only TLS variables in shared libraries, unlike on other architectures > I tested where TLS variables in both the executable and shared libraries > worked. I should have said: I think this means I probably need to rework the commit message a bit to explain that this doesn't always happen with separate debug files, but it can, and if so this fix is needed.
On 2019-02-05 17:33, John Baldwin wrote: > On 2/5/19 2:21 PM, John Baldwin wrote: >> So it seems that the OP_VAR_VALUE path calls down into the dwarf bits >> that >> get the "original" objfile to pass to target_translate_tls_address, >> whereas >> the OP_VAR_MSYM_VALUE case ends up using a separate object file. This >> might >> in fact be due to bugs in the RISCV GCC backend as the TLS symbols for >> RISCV >> don't seem quite right. I have to cast TLS variables to their types >> for >> example: >> >> (gdb) p __je_tsd_initialized >> '__je_tsd_initialized` has unknown type; cast it to its declared type >> (gdb) p (bool)__je_tsd_initialized >> $2 = 1 '\001' >> >> Also, for me TLS variables in the main executable did not work for me >> on >> RISCV, only TLS variables in shared libraries, unlike on other >> architectures >> I tested where TLS variables in both the executable and shared >> libraries >> worked. > > I should have said: I think this means I probably need to rework the > commit > message a bit to explain that this doesn't always happen with separate > debug files, but it can, and if so this fix is needed. After discussing with John on IRC and trying by all means to trigger this bug, this is what I ended up with. On x86/GNU/Linux, it is possible to trigger this bug in a rather contrived way, but at least it shows that there is indeed a bug in GDB. The requirements are: 1. Have a TLS variable in a shared library ("libfoo.so") 2. The .o containing the TLS variable should not be described with DWARF. This can be done simply by compiling it without -g. 3. The shared lib should still have a separate debug info file ("libfoo.so.debug"). 4. The shared lib should be stripped of its unnecessary ELF symbols (ran through the strip utility) 5. The shared library should not be the first object file in the program to have TLS. This can be done by adding a TLS variable in the main executable ("main"). 6. Because we don't have debug info for the variable we try to print, we need to cast it to its expected type, e.g. "print (int)foo_id". With all this, when parsing "(int)foo_id", we find a minimal symbol matching foo_id in the objfile representing libfoo.so.debug. find_minsym_type_and_address is eventually called with that objfile, which calls target_translate_tls_address, which calls svr4_fetch_objfile_link_map. The latter obviously can't find a link_map matching the separate-debug-info objfile, and wrong things happen after. Without #2 above, the DWARF symbol is found and some other code path is taken, where the separate-debug-info objfile is replaced with the "real" objfile at some point. Without #3, we obviously wouldn't end up with a separate-debug-info objfile in svr4_fetch_objfile_link_map. Without #4 above, we would find the objfile for libfoo.so first, so we wouldn't end up with the seaprate-debug-info objfile in svr4_fetch_objfile_link_map. Without #5 we actually get the right result by chance. This is because we end up in the special case "else" of thread_db_target::get_thread_local_address. That special case hardcodes the module id to read from to 1. If the shared lib is the only module to have TLS, then it happens to be the right one. By making the main executable have some TLS, we will end up reading the TLS for the wrong module, and thus trigger the bug. I have not yet found the motivation to write a proper test for this (in particular, I am not sure how to build the lib with separate debug info in the testsuite). But I attached a reproducer for future reference. You should just need to "make" and "gdb -x run.gdb". The wrong value is printed with today's GDB. In John's specific case, apparently the DWARF debug info for TLS variables on riscv is broken, which brought him to roughly the same state. I don't have any more info about that. All this to say that I think this patch is fine. I wondered whether it was better instead to make svr4_fetch_objfile_link_map assert that it receives a "real" objfile (objfile->separate_debug_objfile_backlink == NULL) and push the responsibility to the caller to provide a correct objfile. But in the end I didn't find a compelling to do this rather than what John did in this patch. So therefore, the patch LGTM. Simon
On 2019-02-06 23:05, Simon Marchi wrote: > I have not yet found the motivation to write a proper test for this > (in particular, I am not sure how to build the lib with separate debug > info in the testsuite). But I attached a reproducer for future > reference. You should just need to "make" and "gdb -x run.gdb". The > wrong value is printed with today's GDB. And of course I forgot the attachment... here it is. Simon
diff --git a/gdb/ChangeLog b/gdb/ChangeLog index 056a60fa23..c02e534fe5 100644 --- a/gdb/ChangeLog +++ b/gdb/ChangeLog @@ -1,3 +1,8 @@ +2019-01-22 John Baldwin <jhb@FreeBSD.org> + + * solib-svr4.c (svr4_fetch_objfile_link_map): Look for + objfile->separate_debug_objfile_backlink if not NULL. + 2019-01-22 John Baldwin <jhb@FreeBSD.org> * amd64-bsd-nat.c (amd64bsd_fetch_inferior_registers): Use diff --git a/gdb/solib-svr4.c b/gdb/solib-svr4.c index 787b98ba26..d7a792580d 100644 --- a/gdb/solib-svr4.c +++ b/gdb/solib-svr4.c @@ -1548,6 +1548,11 @@ svr4_fetch_objfile_link_map (struct objfile *objfile) if (objfile == symfile_objfile) return info->main_lm_addr; + /* If OBJFILE is a separate debug object file, look for the + original object file. */ + if (objfile->separate_debug_objfile_backlink != NULL) + objfile = objfile->separate_debug_objfile_backlink; + /* The other link map addresses may be found by examining the list of shared libraries. */ for (so = master_so_list (); so; so = so->next)