Using libthread_db.so with single-threaded programs, for TLS access

Message ID f55d2a4d-9e07-5be2-5669-c0e75eaec6e7@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves Sept. 14, 2017, 12:02 a.m. UTC
  On 09/13/2017 12:22 PM, Pedro Alves wrote:
> On 09/07/2017 12:34 PM, Pedro Alves wrote:
>> On 09/06/2017 10:03 PM, Zack Weinberg wrote:
>>
>>> So, changes to both gdb and libthread_db seem to be required here.  I
>>> do think that _in principle_ it ought to be possible to use
>>> libthread_db to retrieve the address of thread-local data even if the
>>> inferior is not linked with libpthread; glibc has quite a few
>>> thread-specific variables (errno most prominent, of course, but also
>>> h_errno, _res, etc), and so might any library which can be used from
>>> both single- and multithreaded programs.
>>>
>>> This is really not code I feel comfortable hacking up, though, and
>>> it's probably more of a project than I have time for, in any case.
>>
>> Sounds like a promising approach though.  I'd like to see this path
>> explored a bit more.  I'll keep this in my TODO, even though it's
>> not likely to bubble up very soon.  Thanks for the discussion/ideas!
> 
> So I played with this a bit more on the plane back from Cauldron,
> to try to see if we'd hit some major roadblock.  I also chatted
> with Carlos a bit about this back at the Cauldron, and seemingly
> there's no major reason this can't be made to work,
> TLS-internals-wise.
> 
> Seems like that it's mainly a case of moving libthread_db.so-related
> symbols from libpthread.so elsewhere.  More below.
> 
> I hacked libthread_db.so to disable the nptl_version check, so that
> it always successfully loads with non-threaded programs.  And then
> I tweaked GDB enough to make it actually reach libthread_db.so's
> td_thr_tls_get_addr in that scenario too.  That's when I hit
> another snag: the symbols that libthread_db.so needs which describe
> the necessary offsets of internal data structures for getting at the
> TLS blocks are also in libpthread.so...  In particular, the first
> we stumble on is "_thread_db_link_map_l_tls_modid".  I made GDB
> print the symbol lookups to make it easier to debug.  Vis:
> 
>  (gdb) p errno
>  ps_pglobal_lookup: name="__stack_user" => PS_NOSYM
>  warning: Cannot find user-level thread for LWP 31772: generic error
>  ps_pglobal_lookup: name="_thread_db_link_map_l_tls_modid" => PS_NOSYM
>  Cannot find thread-local storage for process 31772, shared library /lib64/libc.so.6:
>  operation not applicable to
> 
> The lookup is coming from here:
> 
>  (top-gdb) bt
>  #0  ps_pglobal_lookup (ph=0x1f65fe0, obj=0x7fffe58f93ae "libpthread.so.0", name=0x7fffe58f9e48 "_thread_db_link_map_l_tls_modid", sym_addr=0x7fffffffc428) at src/gdb/proc-service.c:115
>  #1  0x00007fffe58f88a8 in td_mod_lookup (ps=<optimized out>, mod=mod@entry=0x7fffe58f93ae "libpthread.so.0", idx=<optimized out>, sym_addr=sym_addr@entry=0x7fffffffc428)
>      at td_symbol_list.c:48
>  #2  0x00007fffe58f8f45 in _td_locate_field (ta=ta@entry=0x1f84df0, desc=desc@entry=0x1f84fbc, descriptor_name=descriptor_name@entry=43, idx=idx@entry=0x0, 
>      address=address@entry=0x7fffffffc458) at fetch-value.c:54
>  #3  0x00007fffe58f8ff0 in _td_fetch_value (ta=0x1f84df0, desc=0x1f84fbc, descriptor_name=descriptor_name@entry=43, idx=idx@entry=0x0, address=0x7ffff7ff7658, 
>      result=result@entry=0x7fffffffc498) at fetch-value.c:94
>  #4  0x00007fffe58f8ddf in td_thr_tls_get_addr (th=0x7fffffffc4e0, map_address=<optimized out>, offset=16, address=0x7fffffffc4f8) at td_thr_tls_get_addr.c:31
>  ...
> 
> So we'd need to move that symbol (and maybe others) to one of ld.so/libc.so
> instead.  AFAICT, those magic symbols are described in nptl_db/structs.def.
> I haven't looked enough to figure out what ends up expanding those macros
> in libpthread.so.  This is where I stopped.
> 
> I'm attaching the gdb and libthread_db.so patches I used, both against current
> master in their respective projects.  See comments within the patches.
> I've also pushed the gdb patch to a "users/palves/tls-nonthreaded" branch.
> (I don't think I have write access to glibc's git.)

I played some more with this tonight, and got it working.
I found that libpthread.so gets the symbols from structs.def
via this bit in pthread_create.c:

~~~
 /* Information for libthread_db.  */

 #include "../nptl_db/db_info.c"
~~~

So I added the same thing to elf/dl-tls.c (because TLS stuff),
with a tweak to structs.def to make it skip some symbols.

Compared to the other version, I also defined nptl_version so that
libthread_db.so's version check works.  And I made td_ta_map_lwp2thr
return the initial thread's fake descriptor when __stack_used isn't
found, so that gdb gets back a descriptor instead of an error.

I haven't tried running glibc's testsuite, nor gdb's.

But, with:

 GLIBC=/home/pedro/src/glibc/build/ gcc \
   -Wl,-rpath=${GLIBC}:${GLIBC}/math:${GLIBC}/elf:${GLIBC}/dlfcn:${GLIBC}/nss:${GLIBC}/nis:${GLIBC}/rt:${GLIBC}/resolv:${GLIBC}/crypt:${GLIBC}/nptl:${GLIBC}/dfp \
   -Wl,--dynamic-linker=${GLIBC}/elf/ld.so \
   -g -O0 \
   errno.c -o errno

 $ gdb 
    -ex "set debug libthread-db 1" \
    -ex "set auto-load safe-path /home/pedro/src/glibc/build/nptl_db/" \
    -ex "set libthread-db-search-path /home/pedro/src/glibc/build/nptl_db/" \
   ~/tmp/errno

I get:

 ...
 Temporary breakpoint 1, main () at errno.c:6
 6         errno = 2;
 (gdb) p errno
 $1 = 0
 (gdb) n
 7         return errno;
 (gdb) p errno
 $2 = 2
 (gdb) p &errno
 $3 = (int *) 0x7ffff7ff36c0

So WDYT?  Do you think there's a chance we could get something
like this merged?

Thanks,
Pedro Alves
  

Comments

Carlos O'Donell Sept. 18, 2017, 1:17 p.m. UTC | #1
On 09/13/2017 06:02 PM, Pedro Alves wrote:
> On 09/13/2017 12:22 PM, Pedro Alves wrote:
>> On 09/07/2017 12:34 PM, Pedro Alves wrote:
>>> On 09/06/2017 10:03 PM, Zack Weinberg wrote:
>>>
>>>> So, changes to both gdb and libthread_db seem to be required here.  I
>>>> do think that _in principle_ it ought to be possible to use
>>>> libthread_db to retrieve the address of thread-local data even if the
>>>> inferior is not linked with libpthread; glibc has quite a few
>>>> thread-specific variables (errno most prominent, of course, but also
>>>> h_errno, _res, etc), and so might any library which can be used from
>>>> both single- and multithreaded programs.
>>>>
>>>> This is really not code I feel comfortable hacking up, though, and
>>>> it's probably more of a project than I have time for, in any case.
>>>
>>> Sounds like a promising approach though.  I'd like to see this path
>>> explored a bit more.  I'll keep this in my TODO, even though it's
>>> not likely to bubble up very soon.  Thanks for the discussion/ideas!
>>
>> So I played with this a bit more on the plane back from Cauldron,
>> to try to see if we'd hit some major roadblock.  I also chatted
>> with Carlos a bit about this back at the Cauldron, and seemingly
>> there's no major reason this can't be made to work,
>> TLS-internals-wise.
>>
>> Seems like that it's mainly a case of moving libthread_db.so-related
>> symbols from libpthread.so elsewhere.  More below.
>>
>> I hacked libthread_db.so to disable the nptl_version check, so that
>> it always successfully loads with non-threaded programs.  And then
>> I tweaked GDB enough to make it actually reach libthread_db.so's
>> td_thr_tls_get_addr in that scenario too.  That's when I hit
>> another snag: the symbols that libthread_db.so needs which describe
>> the necessary offsets of internal data structures for getting at the
>> TLS blocks are also in libpthread.so...  In particular, the first
>> we stumble on is "_thread_db_link_map_l_tls_modid".  I made GDB
>> print the symbol lookups to make it easier to debug.  Vis:
>>
>>  (gdb) p errno
>>  ps_pglobal_lookup: name="__stack_user" => PS_NOSYM
>>  warning: Cannot find user-level thread for LWP 31772: generic error
>>  ps_pglobal_lookup: name="_thread_db_link_map_l_tls_modid" => PS_NOSYM
>>  Cannot find thread-local storage for process 31772, shared library /lib64/libc.so.6:
>>  operation not applicable to
>>
>> The lookup is coming from here:
>>
>>  (top-gdb) bt
>>  #0  ps_pglobal_lookup (ph=0x1f65fe0, obj=0x7fffe58f93ae "libpthread.so.0", name=0x7fffe58f9e48 "_thread_db_link_map_l_tls_modid", sym_addr=0x7fffffffc428) at src/gdb/proc-service.c:115
>>  #1  0x00007fffe58f88a8 in td_mod_lookup (ps=<optimized out>, mod=mod@entry=0x7fffe58f93ae "libpthread.so.0", idx=<optimized out>, sym_addr=sym_addr@entry=0x7fffffffc428)
>>      at td_symbol_list.c:48
>>  #2  0x00007fffe58f8f45 in _td_locate_field (ta=ta@entry=0x1f84df0, desc=desc@entry=0x1f84fbc, descriptor_name=descriptor_name@entry=43, idx=idx@entry=0x0, 
>>      address=address@entry=0x7fffffffc458) at fetch-value.c:54
>>  #3  0x00007fffe58f8ff0 in _td_fetch_value (ta=0x1f84df0, desc=0x1f84fbc, descriptor_name=descriptor_name@entry=43, idx=idx@entry=0x0, address=0x7ffff7ff7658, 
>>      result=result@entry=0x7fffffffc498) at fetch-value.c:94
>>  #4  0x00007fffe58f8ddf in td_thr_tls_get_addr (th=0x7fffffffc4e0, map_address=<optimized out>, offset=16, address=0x7fffffffc4f8) at td_thr_tls_get_addr.c:31
>>  ...
>>
>> So we'd need to move that symbol (and maybe others) to one of ld.so/libc.so
>> instead.  AFAICT, those magic symbols are described in nptl_db/structs.def.
>> I haven't looked enough to figure out what ends up expanding those macros
>> in libpthread.so.  This is where I stopped.
>>
>> I'm attaching the gdb and libthread_db.so patches I used, both against current
>> master in their respective projects.  See comments within the patches.
>> I've also pushed the gdb patch to a "users/palves/tls-nonthreaded" branch.
>> (I don't think I have write access to glibc's git.)
> 
> I played some more with this tonight, and got it working.
> I found that libpthread.so gets the symbols from structs.def
> via this bit in pthread_create.c:
> 
> ~~~
>  /* Information for libthread_db.  */
> 
>  #include "../nptl_db/db_info.c"
> ~~~
> 
> So I added the same thing to elf/dl-tls.c (because TLS stuff),
> with a tweak to structs.def to make it skip some symbols.
> 
> Compared to the other version, I also defined nptl_version so that
> libthread_db.so's version check works.  And I made td_ta_map_lwp2thr
> return the initial thread's fake descriptor when __stack_used isn't
> found, so that gdb gets back a descriptor instead of an error.
> 
> I haven't tried running glibc's testsuite, nor gdb's.
> 
> But, with:
> 
>  GLIBC=/home/pedro/src/glibc/build/ gcc \
>    -Wl,-rpath=${GLIBC}:${GLIBC}/math:${GLIBC}/elf:${GLIBC}/dlfcn:${GLIBC}/nss:${GLIBC}/nis:${GLIBC}/rt:${GLIBC}/resolv:${GLIBC}/crypt:${GLIBC}/nptl:${GLIBC}/dfp \
>    -Wl,--dynamic-linker=${GLIBC}/elf/ld.so \
>    -g -O0 \
>    errno.c -o errno
> 
>  $ gdb 
>     -ex "set debug libthread-db 1" \
>     -ex "set auto-load safe-path /home/pedro/src/glibc/build/nptl_db/" \
>     -ex "set libthread-db-search-path /home/pedro/src/glibc/build/nptl_db/" \
>    ~/tmp/errno
> 
> I get:
> 
>  ...
>  Temporary breakpoint 1, main () at errno.c:6
>  6         errno = 2;
>  (gdb) p errno
>  $1 = 0
>  (gdb) n
>  7         return errno;
>  (gdb) p errno
>  $2 = 2
>  (gdb) p &errno
>  $3 = (int *) 0x7ffff7ff36c0
> 
> So WDYT?  Do you think there's a chance we could get something
> like this merged?

From the glibc side I think your patch goes in a very desirable direction.
We want this to look the same if there is one thread (main thread 0) or
more than one thread. Therefore the abstraction is valuable.

At the implementation level I would want to see the changes to dl-tls.c
much more restricted. Including db_info.c seems like just an expedient
airplane hack. We need a cleaner way to define the symbols you need.

So the patch isn't ready, but the idea is solid. I don't have time to look
at this right now, but I will in maybe a month (yes my queue is that deep
right now).
  
Pedro Alves Sept. 18, 2017, 2:28 p.m. UTC | #2
Hi Carlos,

On 09/18/2017 02:17 PM, Carlos O'Donell wrote:
> On 09/13/2017 06:02 PM, Pedro Alves wrote:

> From the glibc side I think your patch goes in a very desirable direction.
> We want this to look the same if there is one thread (main thread 0) or
> more than one thread. Therefore the abstraction is valuable.

Great to hear, thanks!

> At the implementation level I would want to see the changes to dl-tls.c
> much more restricted. Including db_info.c seems like just an expedient
> airplane hack. We need a cleaner way to define the symbols you need.

Yeah, I was aiming for the "minimal viable product", and just
followed what nptl/pthread_create.c does.  It's possible that we
could trim the defined symbols, I haven't looked much beyond
the surface.

I suspect that nptl/pthread_create.c includes ../nptl_db/db_info.c
directly for static links (-static).  I.e., so that the
thread_db-related symbols are defined (and are only defined)
with programs that call pthread_create.  It's possible that
my prototype breaks -static linking with multiple definition
errors, I haven't tried it.

> 
> So the patch isn't ready, but the idea is solid. I don't have time to look
> at this right now, but I will in maybe a month (yes my queue is that deep
> right now).

Likewise.  If someone more familiar with glibc internals wants to
run with this, they're more than welcome.

Thanks,
Pedro Alves
  

Patch

From 77e31bd04a937264436ddaa63ebceeb9a6c9b906 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 13 Sep 2017 23:58:03 +0100
Subject: [PATCH] Allow loading libthread_db.so with non-threaded processes

---
 elf/dl-tls.c                | 13 +++++++++++++
 nptl_db/structs.def         |  6 ++++--
 nptl_db/td_ta_map_lwp2thr.c | 27 +++++++++++++++++++--------
 nptl_db/td_thr_get_info.c   | 12 ++++++++++--
 4 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 5aba33b..8d43e38 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -951,3 +951,16 @@  cannot create TLS data structures"));
   listp->slotinfo[idx].map = l;
   listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
 }
+
+/* Information for libthread_db.  */
+
+#include <version.h>
+#include <ldsodefs.h>
+
+/* Version of the library, used in libthread_db to detect mismatches.  */
+static const char nptl_version[] __attribute_used__ = VERSION;
+
+/* Likely could have better #if checks in structs.def instead.  */
+#define DB_RTLD_SYMS 1
+#include "../nptl_db/db_info.c"
+
diff --git a/nptl_db/structs.def b/nptl_db/structs.def
index 8a65afd..f9f9ab7 100644
--- a/nptl_db/structs.def
+++ b/nptl_db/structs.def
@@ -31,7 +31,7 @@ 
 #endif
 
 #ifndef DB_RTLD_GLOBAL_FIELD
-# if !IS_IN (libpthread)
+# if !IS_IN (libpthread) && !defined DB_RTLD_SYMS
 #  define DB_RTLD_GLOBAL_FIELD(field)		\
   DB_STRUCT_FIELD (rtld_global, _##field)	\
   DB_MAIN_VARIABLE (_##field)
@@ -76,8 +76,10 @@  DB_FUNCTION (__nptl_create_event)
 DB_FUNCTION (__nptl_death_event)
 DB_SYMBOL (__nptl_threads_events)
 DB_VARIABLE (__nptl_nthreads)
+#if !defined DB_RTLD_SYMS
 DB_VARIABLE (__nptl_last_event)
 DB_VARIABLE (__nptl_initial_report_events)
+#endif
 
 DB_ARRAY_VARIABLE (__pthread_keys)
 DB_STRUCT (pthread_key_struct)
@@ -101,7 +103,7 @@  DB_STRUCT_FIELD (dtv_t, counter)
 DB_STRUCT_FIELD (pthread, dtvp)
 #endif
 
-#if !(IS_IN (libpthread) && !defined SHARED)
+#if !(IS_IN (libpthread) && !defined SHARED) && !defined DB_RTLD_SYMS
 DB_STRUCT (rtld_global)
 DB_RTLD_VARIABLE (_rtld_global)
 #endif
diff --git a/nptl_db/td_ta_map_lwp2thr.c b/nptl_db/td_ta_map_lwp2thr.c
index d34711d..798c6e9 100644
--- a/nptl_db/td_ta_map_lwp2thr.c
+++ b/nptl_db/td_ta_map_lwp2thr.c
@@ -168,6 +168,20 @@  __td_ta_lookup_th_unique (const td_thragent_t *ta_arg,
   return TD_OK;
 }
 
+/* If LWPID is the thread group leader, return the fake special
+   descriptor for the initial thread.  Otherwise, return error.  */
+
+static td_err_e
+initial_thread_desc (td_thragent_t *const ta,
+		     lwpid_t lwpid, td_thrhandle_t *th)
+{
+  if (ps_getpid (ta->ph) != lwpid)
+    return TD_ERR;
+  th->th_ta_p = ta;
+  th->th_unique = 0;
+  return TD_OK;
+}
+
 td_err_e
 td_ta_map_lwp2thr (const td_thragent_t *ta_arg,
 		   lwpid_t lwpid, td_thrhandle_t *th)
@@ -189,20 +203,17 @@  td_ta_map_lwp2thr (const td_thragent_t *ta_arg,
   psaddr_t list;
   td_err_e err = DB_GET_SYMBOL (list, ta, __stack_user);
   if (err != TD_OK)
-    return err;
+    {
+      /* Looks like this is a non-threaded program.  */
+      return initial_thread_desc (ta, lwpid, th);
+    }
 
   err = DB_GET_FIELD (list, ta, list, list_t, next, 0);
   if (err != TD_OK)
     return err;
 
   if (list == 0)
-    {
-      if (ps_getpid (ta->ph) != lwpid)
-	return TD_ERR;
-      th->th_ta_p = ta;
-      th->th_unique = 0;
-      return TD_OK;
-    }
+    return initial_thread_desc (ta, lwpid, th);
 
   return __td_ta_lookup_th_unique (ta_arg, lwpid, th);
 }
diff --git a/nptl_db/td_thr_get_info.c b/nptl_db/td_thr_get_info.c
index 48979b8..e92bd1f 100644
--- a/nptl_db/td_thr_get_info.c
+++ b/nptl_db/td_thr_get_info.c
@@ -43,6 +43,14 @@  td_thr_get_info (const td_thrhandle_t *th, td_thrinfo_t *infop)
       tid = 0;
       err = DB_GET_VALUE (report_events, th->th_ta_p,
 			  __nptl_initial_report_events, 0);
+      if (err != TD_OK)
+	{
+	  /* If the program is non-threaded,
+	     __nptl_initial_report_events is nowhere to be found (it
+	     only exists in libpthread.so).  */
+	  report_events = 0;
+	  err = TD_OK;
+	}
     }
   else
     {
@@ -75,9 +83,9 @@  td_thr_get_info (const td_thrhandle_t *th, td_thrinfo_t *infop)
 	return err;
       err = DB_GET_FIELD_LOCAL (report_events, th->th_ta_p, copy, pthread,
 				report_events, 0);
+      if (err != TD_OK)
+	return err;
     }
-  if (err != TD_OK)
-    return err;
 
   /* Fill in information.  Clear first to provide reproducable
      results for the fields we do not fill in.  */
-- 
2.5.5