Using libthread_db.so with single-threaded programs, for TLS access (was: Re: [RFC PATCH 0/3] Pretty-printing for errno)

Message ID 72a8ac9b-2429-c8bd-83b9-d758224571c5@redhat.com
State New, archived

Commit Message

Pedro Alves Sept. 13, 2017, 11:22 a.m. UTC
  On 09/07/2017 12:34 PM, Pedro Alves wrote:
> On 09/06/2017 10:03 PM, Zack Weinberg wrote:
> 
>> So, changes to both gdb and libthread_db seem to be required here.  I
>> do think that _in principle_ it ought to be possible to use
>> libthread_db to retrieve the address of thread-local data even if the
>> inferior is not linked with libpthread; glibc has quite a few
>> thread-specific variables (errno most prominent, of course, but also
>> h_errno, _res, etc), and so might any library which can be used from
>> both single- and multithreaded programs.
>>
>> This is really not code I feel comfortable hacking up, though, and
>> it's probably more of a project than I have time for, in any case.
> 
> Sounds like a promising approach though.  I'd like to see this path
> explored a bit more.  I'll keep this in my TODO, even though it's
> not likely to bubble up very soon.  Thanks for the discussion/ideas!

So I played with this a bit more on the plane back from Cauldron,
to try to see if we'd hit some major roadblock.  I also chatted
with Carlos a bit about this back at the Cauldron, and seemingly
there's no major reason this can't be made to work,
TLS-internals-wise.

It seems to be mainly a matter of moving the libthread_db.so-related
symbols out of libpthread.so and into some other library.  More below.

I hacked libthread_db.so to disable the nptl_version check, so that
it always successfully loads with non-threaded programs.  And then
I tweaked GDB enough to make it actually reach libthread_db.so's
td_thr_tls_get_addr in that scenario too.  That's when I hit
another snag: the symbols that libthread_db.so needs which describe
the necessary offsets of internal data structures for getting at the
TLS blocks are also in libpthread.so...  In particular, the first one
we stumble on is "_thread_db_link_map_l_tls_modid".  I made GDB
print the symbol lookups to make it easier to debug.  Viz.:

 (gdb) p errno
 ps_pglobal_lookup: name="__stack_user" => PS_NOSYM
 warning: Cannot find user-level thread for LWP 31772: generic error
 ps_pglobal_lookup: name="_thread_db_link_map_l_tls_modid" => PS_NOSYM
 Cannot find thread-local storage for process 31772, shared library /lib64/libc.so.6:
 operation not applicable to
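
For illustration only, a logging lookup in the spirit of the trace above
could look roughly like the sketch below.  The names (demo_lookup,
demo_symtab, the DEMO_PS_* codes, "errno_placeholder") are made up for the
example and this is not the code in gdb/proc-service.c; the table stands in
for an inferior that only has libc loaded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes, loosely modeled on proc_service's.  */
typedef enum { DEMO_PS_OK, DEMO_PS_NOSYM } demo_ps_err;

/* Stand-in for the inferior's symbols: object name, symbol name,
   address.  Only a libc entry is present, as in a non-threaded
   program.  */
struct demo_sym
{
  const char *obj;
  const char *name;
  unsigned long addr;
};

static const struct demo_sym demo_symtab[] = {
  { "libc.so.6", "errno_placeholder", 0x1000UL },
};

/* Look NAME up in object OBJ, store its address in *ADDR on success,
   and log the outcome in the same shape as the trace above.  */
static demo_ps_err
demo_lookup (const char *obj, const char *name, unsigned long *addr)
{
  assert (obj != NULL && name != NULL && addr != NULL);

  demo_ps_err result = DEMO_PS_NOSYM;
  for (size_t i = 0; i < sizeof demo_symtab / sizeof demo_symtab[0]; i++)
    if (strcmp (demo_symtab[i].obj, obj) == 0
        && strcmp (demo_symtab[i].name, name) == 0)
      {
        *addr = demo_symtab[i].addr;
        result = DEMO_PS_OK;
        break;
      }

  fprintf (stderr, "demo_lookup: name=\"%s\" => %s\n", name,
           result == DEMO_PS_OK ? "PS_OK" : "PS_NOSYM");
  return result;
}
```

Asking this sketch for "_thread_db_link_map_l_tls_modid" in
"libpthread.so.0" logs a PS_NOSYM line, because the stand-in table has no
libpthread entries at all.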

The lookup is coming from here:

 (top-gdb) bt
 #0  ps_pglobal_lookup (ph=0x1f65fe0, obj=0x7fffe58f93ae "libpthread.so.0", name=0x7fffe58f9e48 "_thread_db_link_map_l_tls_modid", sym_addr=0x7fffffffc428) at src/gdb/proc-service.c:115
 #1  0x00007fffe58f88a8 in td_mod_lookup (ps=<optimized out>, mod=mod@entry=0x7fffe58f93ae "libpthread.so.0", idx=<optimized out>, sym_addr=sym_addr@entry=0x7fffffffc428)
     at td_symbol_list.c:48
 #2  0x00007fffe58f8f45 in _td_locate_field (ta=ta@entry=0x1f84df0, desc=desc@entry=0x1f84fbc, descriptor_name=descriptor_name@entry=43, idx=idx@entry=0x0, 
     address=address@entry=0x7fffffffc458) at fetch-value.c:54
 #3  0x00007fffe58f8ff0 in _td_fetch_value (ta=0x1f84df0, desc=0x1f84fbc, descriptor_name=descriptor_name@entry=43, idx=idx@entry=0x0, address=0x7ffff7ff7658, 
     result=result@entry=0x7fffffffc498) at fetch-value.c:94
 #4  0x00007fffe58f8ddf in td_thr_tls_get_addr (th=0x7fffffffc4e0, map_address=<optimized out>, offset=16, address=0x7fffffffc4f8) at td_thr_tls_get_addr.c:31
 ...

So we'd need to move that symbol (and maybe others) to one of ld.so/libc.so
instead.  AFAICT, those magic symbols are described in nptl_db/structs.def.
I haven't looked enough to figure out what ends up expanding those macros
in libpthread.so.  This is where I stopped.
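
As a rough illustration of how a .def-style list can be expanded into
descriptor symbols, the sketch below uses the usual X-macro pattern to emit
one descriptor per (struct, field) pair.  Everything in it is an assumption
made for the example: demo_link_map, DEMO_STRUCTS, the demo_db_* names and
the { size in bits, element count, byte offset } layout are invented, and
this is not a copy of what glibc does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an internal structure; not glibc's
   struct link_map.  */
struct demo_link_map
{
  void *l_addr;
  size_t l_tls_modid;
  size_t l_tls_offset;
};

/* In a real tree this list would live in a separate .def file that is
   included several times with different definitions of DEMO_FIELD.  */
#define DEMO_STRUCTS \
  DEMO_FIELD (demo_link_map, l_tls_modid) \
  DEMO_FIELD (demo_link_map, l_tls_offset)

/* Expansion: one exported descriptor per field, holding
   { size in bits, element count, byte offset }.  */
#define DEMO_FIELD(type, field) \
  const uint32_t demo_db_##type##_##field[3] = { \
    (uint32_t) (sizeof (((struct type *) 0)->field) * 8), \
    1, \
    (uint32_t) offsetof (struct type, field) \
  };
DEMO_STRUCTS
#undef DEMO_FIELD

/* Debugger side: read a size_t-wide field out of a raw copy of the
   structure using only the descriptor, without knowing the layout.  */
static size_t
demo_fetch (const void *raw, const uint32_t desc[3])
{
  size_t value = 0;
  assert (desc[0] / 8 == sizeof value);
  memcpy (&value, (const char *) raw + desc[2], sizeof value);
  return value;
}
```

The point of the pattern is that whichever object file expands the list
ends up owning the descriptor symbols, so moving them to libc.so or ld.so
would presumably come down to expanding the list there instead.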

I'm attaching the gdb and libthread_db.so patches I used, both against current
master in their respective projects.  See comments within the patches.
I've also pushed the gdb patch to a "users/palves/tls-nonthreaded" branch.
(I don't think I have write access to glibc's git.)

Thanks,
Pedro Alves
  

Comments

Philippe Waroquiers Sept. 13, 2017, 7:26 p.m. UTC | #1
On Wed, 2017-09-13 at 12:22 +0100, Pedro Alves wrote:
> On 09/07/2017 12:34 PM, Pedro Alves wrote:
> > On 09/06/2017 10:03 PM, Zack Weinberg wrote:
> > 
> > > So, changes to both gdb and libthread_db seem to be required
> > > here.  I
> > > do think that _in principle_ it ought to be possible to use
> > > libthread_db to retrieve the address of thread-local data even if
> > > the
> > > inferior is not linked with libpthread; glibc has quite a few
> > > thread-specific variables (errno most prominent, of course, but
> > > also
> > > h_errno, _res, etc), and so might any library which can be used
> > > from
> > > both single- and multithreaded programs.
> > > 
> > > This is really not code I feel comfortable hacking up, though,
> > > and
> > > it's probably more of a project than I have time for, in any
> > > case.
> > 
> > Sounds like a promising approach though.  I'd like to see this path
> > explored a bit more.  I'll keep this in my TODO, even though it's
> > not likely to bubble up very soon.  Thanks for the
> > discussion/ideas!
> 
> So I played with this a bit more on the plane back from Cauldron,
> to try to see if we'd hit some major roadblock.  I also chatted
> with Carlos a bit about this back at the Cauldron, and seemingly
> there's no major reason this can't be made to work,
> TLS-internals-wise.
> 
> Seems like that it's mainly a case of moving libthread_db.so-related
> symbols from libpthread.so elsewhere.  More below.
Note that in the valgrind gdbserver, I had to handle the same problem
i.e. find the address of a tls variable without access to any
library (valgrind cannot make use of any library including glibc).

So, I finally ended up implementing the minimum logic for that.
It is based on some really ugly hacks, e.g. to get the offset of
lm_modid in struct link_map.
There are also one or two lines of arch-dependent code to get the dtv.
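
Conceptually, a lookup of this kind boils down to something like the
sketch below: take the module id from the link map, index the thread's dtv
with it, and add the variable's offset.  This is a simplified in-process
model with invented types (demo_dtv_entry, demo_thread, demo_module), not
valgrind's or glibc's actual layout; a real implementation has to read
these values from the inferior's memory and locate the dtv in an
arch-specific way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the per-thread dtv: one slot per
   loaded module, pointing at that module's TLS block (or NULL if the
   block has not been allocated yet).  */
struct demo_dtv_entry
{
  unsigned char *tls_block;
};

struct demo_thread
{
  struct demo_dtv_entry *dtv;  /* Indexed by module id.  */
  size_t dtv_len;              /* Number of slots in DTV.  */
};

/* Hypothetical stand-in for the one link_map field that matters here.  */
struct demo_module
{
  size_t tls_modid;
};

/* Return the address of the TLS variable at OFFSET inside MOD's TLS
   block for thread THR, or NULL if the block is not available.  */
static void *
demo_tls_get_addr (const struct demo_thread *thr,
                   const struct demo_module *mod, size_t offset)
{
  assert (thr != NULL && mod != NULL);

  if (mod->tls_modid == 0 || mod->tls_modid >= thr->dtv_len)
    return NULL;  /* Module has no TLS, or the dtv is too small.  */

  unsigned char *block = thr->dtv[mod->tls_modid].tls_block;
  if (block == NULL)
    return NULL;  /* TLS block not allocated for this thread yet.  */

  return block + offset;
}
```

The fragile parts in practice are the two inputs this model takes for
granted: where the module id sits inside struct link_map, and how to find
the dtv for a given thread on each architecture.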


This is all somewhat fragile.  It was done in 2014 and, as far as I
knew, had not broken since.  But some more recent changes might have
broken the hack, as I have a test failing after upgrading to Debian 9.

See the handling of qGetTLSAddr in valgrind's
coregrind/m_gdbserver/server.c for the gory/hacky details.

Better (even partial) support for such things without the need of a
library would significantly improve my life :)

Philippe
  

Patch

From 386b8dc8ef16197b3efa38f4bbbc98833ce7c2c6 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Mon, 11 Sep 2017 13:48:04 +0100
Subject: [PATCH] remove version checks hack

---
 nptl_db/td_ta_map_lwp2thr.c | 10 ++++++++--
 nptl_db/td_ta_new.c         |  9 ++++++++-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/nptl_db/td_ta_map_lwp2thr.c b/nptl_db/td_ta_map_lwp2thr.c
index d34711d..85620cb 100644
--- a/nptl_db/td_ta_map_lwp2thr.c
+++ b/nptl_db/td_ta_map_lwp2thr.c
@@ -185,11 +185,17 @@  td_ta_map_lwp2thr (const td_thragent_t *ta_arg,
      sometimes contain garbage that would confuse us, left by the kernel
      at exec.  So if it looks like initialization is incomplete, we only
      fake a special descriptor for the initial thread.  */
-
   psaddr_t list;
   td_err_e err = DB_GET_SYMBOL (list, ta, __stack_user);
   if (err != TD_OK)
-    return err;
+    {
+      /* '__stack_user' is in libpthread.so, so this always fails with
+	 non-threaded programs.  GDB hardcodes/assumes th_unique==0
+	 for the main thread - maybe we should instead return the fake
+	 special descriptor for the initial thread here too.  See
+	 below.  */
+      return err;
+    }
 
   err = DB_GET_FIELD (list, ta, list, list_t, next, 0);
   if (err != TD_OK)
diff --git a/nptl_db/td_ta_new.c b/nptl_db/td_ta_new.c
index aec2356..40424ad 100644
--- a/nptl_db/td_ta_new.c
+++ b/nptl_db/td_ta_new.c
@@ -33,12 +33,18 @@  LIST_HEAD (__td_agent_list);
 td_err_e
 td_ta_new (struct ps_prochandle *ps, td_thragent_t **ta)
 {
+#if 0
   psaddr_t versaddr;
   char versbuf[sizeof (VERSION)];
+#endif
 
   LOG ("td_ta_new");
 
-  /* Check whether the versions match.  */
+  /* Check whether the versions match.
+
+     XXX: Disabled because "nptl_version" currently lives in
+     libpthread.so.  */
+#if 0
   if (td_lookup (ps, SYM_nptl_version, &versaddr) != PS_OK)
     return TD_NOLIBTHREAD;
   if (ps_pdread (ps, versaddr, versbuf, sizeof (versbuf)) != PS_OK)
@@ -47,6 +53,7 @@  td_ta_new (struct ps_prochandle *ps, td_thragent_t **ta)
   if (memcmp (versbuf, VERSION, sizeof VERSION) != 0)
     /* Not the right version.  */
     return TD_VERSION;
+#endif
 
   /* Fill in the appropriate information.  */
   *ta = (td_thragent_t *) calloc (1, sizeof (td_thragent_t));
-- 
2.5.5