Cell multi-arch broken (Re: [PATCH 2/2] GNU/Linux: Stop using libthread_db/td_ta_thr_iter)

Message ID 20150826173903.3ADD439FA@oc7340732750.ibm.com
State New, archived

Commit Message

Ulrich Weigand Aug. 26, 2015, 5:39 p.m. UTC
  Pedro Alves wrote:

> gdb/ChangeLog:
> 2015-02-08  Pedro Alves  <palves@redhat.com>

> @@ -1719,6 +1696,9 @@ find_new_threads_once (struct thread_db_info *info, int iteration,
>    data.info = info;
>    data.new_threads = 0;
>  
> +  /* See comment in thread_db_update_thread_list.  */
> +  gdb_assert (!target_has_execution || thread_db_use_events ());
> +


Now that I've had the occasion to run some tests on Cell/B.E. again,
I've tried to verify combined multi-arch debugging still works, but
noticed that it is currently completely broken; GDB will immediately
run into the assertion you've added above.

In fact, it is so broken that the test suite assumes we're not even
on a Cell/B.E. (since it can't debug the trivial test program), and
silently skips all Cell tests, so I didn't notice in the daily build
reports ...

The reason why we're running into the abort is that the multi-arch
debugging logic attempts to resolve a thread-local variable from
inside the frame unwinders (which is probably not done elsewhere).
This uncovers a code path where the above assertion is wrong:

In thread_db_get_thread_local_address, we have:

  /* If we have not discovered any threads yet, check now.  */
  if (!have_threads (ptid))
    thread_db_find_new_threads_1 (ptid);

Now, note that thread_db_get_thread_local_address is one of the few
remaining routines that always uses the thread DB, even if we do
not use thread events.  However, thread_db_find_new_threads_1 now
assumes it gets only ever called when using thread events, which
ultimately leads to the assert.
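
To make the failing combination concrete, here is a toy model of the
condition, not GDB code. The struct and function names are hypothetical;
only the asserted expression is taken from the hunk quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant session state; these fields
   are illustrative and are not GDB's real data structures.  */
struct toy_state
{
  bool target_has_execution;  /* the inferior is live */
  bool use_events;            /* thread event breakpoints are in use */
  bool have_threads;          /* at least one thread already discovered */
};

/* Mirrors the condition asserted in find_new_threads_once:
   !target_has_execution || thread_db_use_events ().  */
static bool
toy_iteration_allowed (const struct toy_state *s)
{
  assert (s != NULL);
  return !s->target_has_execution || s->use_events;
}

/* Returns true if a TLS lookup in this state would reach the
   thread-list walk and fail the assertion.  The walk is only
   attempted when no threads have been discovered yet.  */
static bool
toy_tls_lookup_trips_assert (const struct toy_state *s)
{
  assert (s != NULL);
  if (s->have_threads)
    return false;
  return !toy_iteration_allowed (s);
}
```

In this model the only failing state is a live inferior, no thread
events, and no threads discovered yet, which is the situation the
TLS lookup from the unwinders runs into.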

As a quick fix, the patch appended below makes it work again;
but this may be a bit overkill since thread_db_update_thread_list
no longer realizes that we're only interested in process ptid,
and updates all inferiors.  (Maybe the to_update_thread_list
target callback should get a ptid argument?)

Any suggestions on how best to fix this?

Bye,
Ulrich

P.S. With this fix, the Cell multi-arch tests now run again.
As probably to be expected, this uncovers a series of other
problems where code has bit-rotten in the meantime.  I'll
have other patches to fix this again ...
  

Comments

Pedro Alves Aug. 26, 2015, 6:23 p.m. UTC | #1
On 08/26/2015 06:39 PM, Ulrich Weigand wrote:

> In fact, it is so broken that the test suite assumes we're not even
> on a Cell/B.E. (since it can't debug the trivial test program), and
> silently skips all Cell tests, so I didn't notice in the daily build
> reports ...

(Sounds like the testsuite could be improved to better detect this.)

> 
> The reason why we're running into the abort is that the multi-arch
> debugging logic attempts to resolve a thread-local variable from
> inside the frame unwinders (which is probably not done elsewhere).
> This uncovers a code path where the above assertion is wrong:

Curious.  Could you point me at this code path?  I can't seem
to find it.  I wonder whether we can trigger this assertion
by stopping the inferior before thread_db is initialized (e.g.,
entry point), and then trying to print a tls variable?  (In order
to construct a test case).

> 
> In thread_db_get_thread_local_address, we have:
> 
>   /* If we have not discovered any threads yet, check now.  */
>   if (!have_threads (ptid))
>     thread_db_find_new_threads_1 (ptid);
> 
> Now, note that thread_db_get_thread_local_address is one of the few
> remaining routines that always uses the thread DB, even if we do
> not use thread events.  However, thread_db_find_new_threads_1 now
> assumes it gets only ever called when using thread events, which
> ultimately leads to the assert.
> 
> As a quick fix, the patch appended below makes it work again;
> but this may be a bit overkill since thread_db_update_thread_list
> no longer realizes that we're only interested in process ptid,
> and updates all inferiors.  (Maybe the to_update_thread_list
> target callback should get a ptid argument?)
> 
> Any suggestions on how best to fix this?

Try doing it like gdbserver's thread_db_get_tls_address.

...
  lwp = get_thread_lwp (thread);
  if (!lwp->thread_known)
    find_one_thread (thread->entry.id);
...

That is, here what we're really after is the td_thrhandle_t
of the current thread, in order to be able to call
td_thr_tls_get_addr.  There's no need to walk thread_db's
thread list to find that for a single thread.

find_one_thread makes use of td_ta_map_lwp2thr for the
mapping we're after.
On the GDB side, the equivalent is linux-thread-db.c:thread_from_lwp.
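
A rough sketch of the shape of that approach, as a self-contained toy
model rather than real gdbserver or GDB code.  All names below are
hypothetical, and toy_find_one_thread only simulates what a
td_ta_map_lwp2thr-based lookup would do for a single LWP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy thread record; not GDB's or gdbserver's struct.  */
struct toy_thread
{
  long lwp;          /* kernel-level thread id */
  int handle_known;  /* nonzero once the thread_db handle is cached */
};

/* Counts simulated libthread_db lookups, for illustration only.  */
static int toy_map_calls;

/* Find the debugger's own record for LWP.  This walks the debugger's
   list, not thread_db's thread list.  */
static struct toy_thread *
toy_lookup (struct toy_thread *threads, size_t n, long lwp)
{
  for (size_t i = 0; i < n; i++)
    if (threads[i].lwp == lwp)
      return &threads[i];
  return NULL;
}

/* Simulates resolving the handle for exactly one thread.  */
static void
toy_find_one_thread (struct toy_thread *t)
{
  assert (t != NULL);
  toy_map_calls++;
  t->handle_known = 1;
}

/* Resolve the handle lazily, and only for the thread whose TLS
   address is being requested.  */
static struct toy_thread *
toy_get_tls_thread (struct toy_thread *threads, size_t n, long lwp)
{
  struct toy_thread *t = toy_lookup (threads, n, lwp);

  if (t != NULL && !t->handle_known)
    toy_find_one_thread (t);
  return t;
}
```

The point of the sketch is that a request for one thread resolves
one handle and leaves every other record untouched, and a repeated
request reuses the cached result.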

Thanks,
Pedro Alves
  

Patch

Index: binutils-gdb/gdb/linux-thread-db.c
===================================================================
--- binutils-gdb.orig/gdb/linux-thread-db.c
+++ binutils-gdb/gdb/linux-thread-db.c
@@ -1853,7 +1853,7 @@  thread_db_get_thread_local_address (stru
 
   /* If we have not discovered any threads yet, check now.  */
   if (!have_threads (ptid))
-    thread_db_find_new_threads_1 (ptid);
+    thread_db_update_thread_list (ops);
 
   /* Find the matching thread.  */
   thread_info = find_thread_ptid (ptid);