[3/3] gdb: improve error handling for thread-db thread adoption

Message ID 63e2a1079d44737c020efc5bc177da21feab0ce5.1775557188.git.aburgess@redhat.com
State New
Series Better error handling when creating threads with clone()

Checks

Context                                         Check    Description
linaro-tcwg-bot/tcwg_gdb_build--master-arm      success  Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64  success  Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm      success  Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64  success  Test passed

Commit Message

Andrew Burgess April 7, 2026, 10:22 a.m. UTC
This work started with a bug report against Fedora GDB.  The bug link
is included at the end of this message, but it isn't very informative
and there was no follow-up, so all we have is the original report.

The bug was that GDB crashed with this backtrace:

 #4 dump_core at ../../gdb/utils.c:225
 #5 internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) at ../../gdb/utils.c:477
 #6 internal_verror at ../../gdb/utils.c:503
 #7 internal_error_loc at ../../gdbsupport/errors.cc:57
 #8 report_exit_events_for at ../../gdb/linux-nat.c:331
 #10 linux_nat_filter_event at ../../gdb/linux-nat.c:3063
 #11 linux_nat_wait_1 at ../../gdb/linux-nat.c:3377
 #12 linux_nat_target::wait at ../../gdb/linux-nat.c:3603
 #13 thread_db_target::wait at ../../gdb/linux-thread-db.c:1401
 #14 target_wait at ../../gdb/target.c:2582
 #15 do_target_wait_1 at ../../gdb/infrun.c:4144
 #16 operator() at ../../gdb/infrun.c:4206
 #17 do_target_wait at ../../gdb/infrun.c:4225
 #18 fetch_inferior_event at ../../gdb/infrun.c:4666
 #19 gdb_wait_for_event at ../../gdbsupport/event-loop.cc:694
 #20 gdb_do_one_event at ../../gdbsupport/event-loop.cc:216
 #21 start_event_loop at ../../gdb/main.c:402
 #22 captured_command_loop at ../../gdb/main.c:466
 #23 captured_main at ../../gdb/main.c:1343
 #24 gdb_main at ../../gdb/main.c:1362

Frame #8 is where things start going wrong, and the line in question
is this gdb_assert:

  thread_info *thr = linux_target->find_thread (lp->ptid);
  gdb_assert (thr != nullptr);		<---- This is line 331.

So what we know is that while reporting a thread exit event GDB failed
to find the thread_info for a thread.

We can imagine many bugs that might cause this to occur, some of which
could even be outside GDB.  But without additional information I'm not
inclined to start looking for bugs outside of GDB.  So I started
looking for ways that the bug could be triggered from within GDB.

One way the bug could occur is if GDB fails to create a thread_info
for a new thread.

This led me to look at linux_nat_target::follow_clone, which led to
looking at thread_db_notice_clone.  In ::follow_clone GDB already has
code to handle the case where thread_db_notice_clone fails to add a
thread:

  if (!thread_db_notice_clone (inferior_ptid, new_lp->ptid))
    {
      /* The process is not using thread_db.  Add the LWP to
         GDB's list.  */
      add_thread (linux_target, new_lp->ptid);
    }

But does this always do what we need?  In thread_db_notice_clone, if
the CHILD thread was not adopted by thread-db then thread_from_lwp
will return NULL.  We don't currently check for this; instead,
thread_db_notice_clone always returns true.  This is an easy fix: we
can check the return value of thread_from_lwp in
thread_db_notice_clone, and return false when it is NULL.

Next, looking through thread_from_lwp, we can see several calls to
`error`.  If any of these trigger then GDB will throw an exception, in
which case we'll unwind back through ::follow_clone without adding the
thread to GDB's data structures.  I think this can be fixed by using a
FORWARD_SCOPE_EXIT in ::follow_clone, which adds the thread to GDB's
data structures when the scope is left unless thread-db has already
adopted it.  Now, if an exception triggers, we will still record the
thread.
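
To illustrate the idea, here is a minimal, self-contained sketch of a
scope-exit fallback that can be released on success.  This is not
GDB's code: fallback_guard, notice_clone_model, follow_clone_model and
thread_table are hypothetical stand-ins invented for this example, and
the real FORWARD_SCOPE_EXIT from gdbsupport is not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <stdexcept>
#include <utility>

// Hypothetical guard: runs FN when destroyed unless release () was called.
class fallback_guard
{
public:
  explicit fallback_guard (std::function<void ()> fn)
    : m_fn (std::move (fn))
  {}

  ~fallback_guard ()
  {
    if (m_armed)
      m_fn ();
  }

  fallback_guard (const fallback_guard &) = delete;
  fallback_guard &operator= (const fallback_guard &) = delete;

  // Cancel the fallback once the primary path has succeeded.
  void release () { m_armed = false; }

private:
  std::function<void ()> m_fn;
  bool m_armed = true;
};

enum class outcome { adopted, declined, threw };

// Toy thread table standing in for GDB's list of threads.
static std::set<int> thread_table;

// Toy model of the thread-db layer: on success it records the thread
// itself and returns true, otherwise it returns false or throws.
static bool
notice_clone_model (int lwp, outcome what)
{
  if (what == outcome::threw)
    throw std::runtime_error ("lookup failed");
  if (what == outcome::declined)
    return false;
  thread_table.insert (lwp);
  return true;
}

// Toy model of the caller: whichever way the call above ends, LWP is
// in the table by the time this function is left.
static void
follow_clone_model (int lwp, outcome what)
{
  fallback_guard add_fallback ([lwp] () { thread_table.insert (lwp); });

  if (notice_clone_model (lwp, what))
    add_fallback.release ();
}

// Exercise all three paths and check each LWP was recorded.
static void
check_all_paths ()
{
  thread_table.clear ();
  follow_clone_model (1, outcome::adopted);
  follow_clone_model (2, outcome::declined);
  try
    {
      follow_clone_model (3, outcome::threw);
    }
  catch (const std::runtime_error &)
    {
      // The error still propagates, but LWP 3 was recorded on the way out.
    }
  assert (thread_table.size () == 3);
}
```

The point of the sketch is that the caller needs no separate handling
for the "returned false" and "threw" cases; both leave the guard armed.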

Initially, I thought that just adding the thread on the exception path
would be enough.  On x86-64, the target I initially tested on, I
wasn't able to trigger the error cases, so I was writing code to cover
this path "just in case".

But when I started testing on AArch64 and PPC64le, I found that my
same test case (more details below) was triggering the error path.

What I found is that throwing an exception from thread_from_lwp was
causing GDB to stop the inferior unnecessarily.  GDB would see a CLONE
event from the inferior, try to add the new thread to GDB, but this
would result in the `error` path, and an exception being thrown.  The
exception was not handled anywhere, so GDB would drop back to the
prompt, leaving the inferior stopped.  This happened even though the
fallback code described above already copes with thread-db failing to
adopt the thread.

My solution for this is to convert the `error` calls into warnings,
and just have thread_from_lwp return NULL if thread-db cannot adopt
the thread.
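
As a rough sketch of the shape of this change (not the actual GDB
code), a lookup that used to throw can instead print a warning and
return a null pointer, and its caller can turn that into a boolean
"adopted or not" result.  The names thread_record, known_threads,
lookup_thread_model and notice_child_model are hypothetical, made up
for this example, and plain fprintf stands in for GDB's `warning`.

```cpp
#include <cassert>
#include <cstdio>
#include <map>

// Hypothetical record standing in for GDB's thread_info.
struct thread_record
{
  long lwp;
};

// LWPs the toy "thread library" knows about (an assumption for the demo).
static std::map<long, thread_record> known_threads
  = { {100, thread_record {100}} };

// Number of warnings printed, kept only so the behaviour can be checked.
static int warning_count = 0;

// Toy lookup: warn and return nullptr on failure instead of throwing,
// so the caller is not unwound and can carry on.
static thread_record *
lookup_thread_model (long lwp)
{
  auto it = known_threads.find (lwp);
  if (it == known_threads.end ())
    {
      std::fprintf (stderr,
		    "warning: Cannot find user-level thread for LWP %ld\n",
		    lwp);
      ++warning_count;
      return nullptr;
    }
  return &it->second;
}

// Toy caller: report whether the child was adopted, rather than
// unconditionally returning true.
static bool
notice_child_model (long child_lwp)
{
  return lookup_thread_model (child_lwp) != nullptr;
}

// A known LWP is adopted silently; an unknown one warns and is declined.
static void
check_lookup_paths ()
{
  assert (notice_child_model (100));
  assert (!notice_child_model (200));
  assert (warning_count == 1);
}
```

With this shape a failed lookup is reported to the user but does not
interrupt whatever the caller was doing.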

This potentially makes the use of FORWARD_SCOPE_EXIT in ::follow_clone
unnecessary, as I no longer expect any exceptions to be thrown from
thread_db_notice_clone.  However, I think maintaining the use of
FORWARD_SCOPE_EXIT is still a good idea.  If I've missed any exception
routes, or any are added in the future, then this will ensure the
threads are always added to GDB correctly.

The testing for this change is a little weird.  It involves creating a
thread using a direct clone call, and manually setting the thread's
TLS pointer to NULL.  So long as we don't do anything that might
require TLS access within the thread then this works on at least some
target architectures (tested on x86-64), but the NULL TLS pointer is
enough to prevent adoption of the thread by thread-db.  We do already
have some tests in gdb.threads/ that test using raw clone calls, so
this isn't the first test in this area.

Without this patch in place the new test triggers the assertion that
was the cause of the original bug report, and also triggers a couple
of segfaults from within GDB, in places where we assume we should have
a thread_info available.  With the patch in place the test works as
expected.

I also tested against gdbserver.  I did need to perform a similar
error to warning conversion there as I have in GDB; with that done,
gdbserver testing works fine too.

There are some more warnings printed by gdbserver that we don't see
when doing native GDB testing.  The warnings can be found in the
gdb.log after running with the native-gdbserver or
native-extended-gdbserver boards, and look like this:

  gdbserver: PID mismatch!  Expected 633347, got 633343
  gdbserver: Cannot find thread after clone.
  gdbserver: PID mismatch!  Expected 633347, got 633343
  gdbserver: PID mismatch!  Expected 633347, got 633343

These warnings are just a consequence of gdbserver not using thread-db
to manage the threads.  Maybe we could/should silence these warnings
as gdbserver is still happily managing the threads, just outside of
thread-db, but I don't plan to do anything about these right now.

Tested on GNU/Linux on x86-64, AArch64, RISC-V, and PPC64le using
unix, native-gdbserver, and native-extended-gdbserver boards.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2414004
---
 gdb/linux-nat.c                             |  25 +-
 gdb/linux-thread-db.c                       |  18 +-
 gdb/testsuite/gdb.threads/clone-bad-tls.c   | 193 +++++++++++++
 gdb/testsuite/gdb.threads/clone-bad-tls.exp | 294 ++++++++++++++++++++
 gdbserver/thread-db.cc                      |  14 +-
 5 files changed, 525 insertions(+), 19 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/clone-bad-tls.c
 create mode 100644 gdb/testsuite/gdb.threads/clone-bad-tls.exp
  

Patch

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 1d0a4609a3a..bdb948aebaf 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1971,15 +1971,22 @@  linux_nat_target::follow_clone (ptid_t child_ptid)
   lwp_info *new_lp = add_lwp (child_ptid);
   new_lp->stopped = 1;
 
-  /* If the thread_db layer is active, let it record the user
-     level thread id and status, and add the thread to GDB's
-     list.  */
-  if (!thread_db_notice_clone (inferior_ptid, new_lp->ptid))
-    {
-      /* The process is not using thread_db.  Add the LWP to
-	 GDB's list.  */
-      add_thread (linux_target, new_lp->ptid);
-    }
+  /* If the thread_db layer is active, let it record the user level thread
+     id and status, and add the thread to GDB's list.  */
+  {
+    /* If the thread_db layer fails to record the thread, including if it
+       throws an error, then we should make sure GDB always adds the thread
+       to its data structures otherwise lots of things can go wrong.  For
+       example, GDB will not resume the thread, or if/when the thread
+       exits, GDB will be surprised and raise an assertion.  */
+    using scoped_add_thread = FORWARD_SCOPE_EXIT (add_thread);
+    scoped_add_thread add_thread_fallback (linux_target, new_lp->ptid);
+
+    /* If the thread_db layer successfully adopts the thread then we no
+       longer need our fallback to add the thread for us.  */
+    if (thread_db_notice_clone (inferior_ptid, new_lp->ptid))
+      add_thread_fallback.release ();
+  }
 
   /* We just created NEW_LP so it cannot yet contain STATUS.  */
   gdb_assert (new_lp->status == 0);
diff --git a/gdb/linux-thread-db.c b/gdb/linux-thread-db.c
index 1807352ed76..1ae9323108a 100644
--- a/gdb/linux-thread-db.c
+++ b/gdb/linux-thread-db.c
@@ -413,13 +413,19 @@  thread_from_lwp (thread_info *stopped, ptid_t ptid)
   err = info->td_ta_map_lwp2thr_p (info->thread_agent, ptid.lwp (),
 				   &th);
   if (err != TD_OK)
-    error (_("Cannot find user-level thread for LWP %ld: %s"),
-	   ptid.lwp (), thread_db_err_str (err));
+    {
+      warning (_("Cannot find user-level thread for LWP %ld: %s"),
+	       ptid.lwp (), thread_db_err_str (err));
+      return nullptr;
+    }
 
   err = info->td_thr_get_info_p (&th, &ti);
   if (err != TD_OK)
-    error (_("thread_get_info_callback: cannot get thread info: %s"),
-	   thread_db_err_str (err));
+    {
+      warning (_("thread_get_info_callback: cannot get thread info: %s"),
+	       thread_db_err_str (err));
+      return nullptr;
+    }
 
   /* Fill the cache.  */
   tp = stopped->inf->process_target ()->find_thread (ptid);
@@ -450,8 +456,8 @@  thread_db_notice_clone (ptid_t parent, ptid_t child)
   thread_info *parent_info = thread_from_lwp (stopped, parent);
   gdb_assert (parent_info == stopped);
 
-  thread_from_lwp (stopped, child);
-  return true;
+  thread_info *child_info = thread_from_lwp (stopped, child);
+  return child_info != nullptr;
 }
 
 static void *
diff --git a/gdb/testsuite/gdb.threads/clone-bad-tls.c b/gdb/testsuite/gdb.threads/clone-bad-tls.c
new file mode 100644
index 00000000000..dfb3ff79c17
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/clone-bad-tls.c
@@ -0,0 +1,193 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2026 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+   Test that GDB doesn't lose an event for a thread it didn't know
+   about, until an event is reported for it.  */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <linux/futex.h>
+#include <sys/syscall.h>
+
+/* Global synchronization variables.  */
+pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
+int ready = 0;
+
+#define STACK_SIZE 0x1000
+
+/* Wrap futex syscall.  See 'man 2 futex'.  */
+
+static int
+futex (int *uaddr, int futex_op, int val, const struct timespec *timeout,
+       int *uaddr2, int val3)
+{
+  return syscall (SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
+}
+
+/* Function to run in the raw clone thread.  This just spins until the
+   integer flag pointed to by ARG is set to non-zero.  */
+
+static int
+clone_fn (void *arg)
+{
+  volatile int *thread_exit = (volatile int *) arg;	/* Break inside raw thread.  */
+  *thread_exit = 0;
+  while (*thread_exit == 0)
+    ;
+  return 0;
+}
+
+/* A variable into which the kernel will write the child thread-id.  This
+   will be cleared when the thread exits.  This only gets used when
+   USE_FUTEX is defined.  */
+
+static int thread_tid;
+
+/* Create a thread using a raw clone call.  */
+
+void *
+do_raw_clone ()
+{
+  unsigned char *stack;
+  int res;
+  int clone_pid;
+
+  /* This flag is shared with the raw thread.  It is initially -1, then
+     set to 0 in the raw thread to show that the thread has started up.
+     This flag is then set to 1 by the main thread to indicate that the
+     raw thread should exit.  Using a volatile for thread synchronisation
+     is not great, but this avoids having to make library calls from the
+     raw thread, which might trigger a need for TLS to be setup correctly
+     on some targets.  */
+  volatile int thread_exit = -1;
+
+  stack = malloc (STACK_SIZE);
+  assert (stack != NULL);	/* Break before raw thread created.  */
+
+#ifdef USE_FUTEX
+#define TID_FLAGS (CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID)
+#else
+#define TID_FLAGS (0)
+#endif
+
+#define CLONE_FLAGS (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM	\
+		     | CLONE_SETTLS | TID_FLAGS)
+
+  clone_pid = clone (clone_fn, stack + STACK_SIZE, CLONE_FLAGS,
+		     (void *) &thread_exit, &thread_tid,
+		     NULL, &thread_tid);
+
+  assert (clone_pid > 0);	/* Immediately after the clone.  */
+
+  while (thread_exit != 0)
+    usleep (1000);
+
+  /* Trigger the clone thread to exit.  */
+  thread_exit = 1;	/* Break after raw thread created.  */
+
+#ifdef USE_FUTEX
+  /* In this mode we rely on the futex to notify us when the thread has
+     exited.  In a broken GDB we used to deadlock at this point as GDB
+     would fail to restart the raw thread, and so the kernel would never
+     wake this futex.  */
+  while (thread_tid != 0)
+    futex (&thread_tid, FUTEX_WAIT, clone_pid, NULL, NULL, 0);
+
+  /* We do know that the raw thread has completed at this point, so we
+     could free its stack.  We don't though, just to be consistent with the
+     no futex path.  This doesn't really matter, this is just a small
+     test.  */
+#else
+  /* There's no attempt to synchronise with the raw clone thread on this
+     path.  In a broken GDB the raw thread might not be resumed, in which
+     case this timeout will expire.  */
+  sleep (2);
+
+  /* We cannot be sure that the raw thread has finished by this point, so
+     we cannot free the stack.  This isn't critical, this is just a small
+     test case.  */
+#endif
+
+  return NULL;			/* Break after raw thread exits.  */
+}
+
+/* Something for our pthread thread to do.  This just blocks until the main
+   thread releases it, at which point this thread will exit.  */
+
+void *
+worker_thread (void *arg)
+{
+  pthread_mutex_lock (&mutex);
+
+  /* Let the main thread know we are live by clearing the ready flag.  */
+  ready = 0;
+  pthread_cond_signal (&cond);
+
+  /* Now spin until the main thread sets this flag back to non-zero.  */
+  while (ready == 0)
+    {
+      /* The thread will block here until main signals 'cond'.  */
+      pthread_cond_wait (&cond, &mutex);
+    }
+
+  pthread_mutex_unlock (&mutex);
+
+  return NULL;
+}
+
+int
+main (void)
+{
+  pthread_t thread_id;
+  volatile int flag = 0;
+  int res;
+
+  /* The pthread will set the READY flag back to zero.  */
+  ready = 1;
+
+  alarm (300);
+
+  res = pthread_create (&thread_id, NULL, worker_thread, (void *)&flag);
+  assert (res == 0);
+
+  /* Wait for the pthread to set READY to zero.  */
+  pthread_mutex_lock (&mutex);
+  while (ready == 1)
+    pthread_cond_wait (&cond, &mutex);
+  pthread_mutex_unlock (&mutex);
+
+  /* Create a thread using a raw clone call.  */
+  do_raw_clone ();		/* Break before do_raw_clone.  */
+
+  /* Unblock the pthread.  */
+  pthread_mutex_lock (&mutex);
+  ready = 1;
+  pthread_cond_signal (&cond);
+  pthread_mutex_unlock (&mutex);
+
+  /* Wait for the worker to finish.  */
+  pthread_join (thread_id, NULL);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/clone-bad-tls.exp b/gdb/testsuite/gdb.threads/clone-bad-tls.exp
new file mode 100644
index 00000000000..2de16c63b6e
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/clone-bad-tls.exp
@@ -0,0 +1,294 @@ 
+# This testcase is part of GDB, the GNU debugger.
+
+# Copyright 2026 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This test creates a raw thread using a clone() call without going
+# through the pthread library.  Further, the thread that is created
+# deliberately sets its TLS pointer to NULL.  This used to cause
+# problems for GDB as GDB would fail to recognize the new thread,
+# which means no thread_info would be created, which would lead to
+# segfaults and assertion failures.
+#
+# When running this test on a remote target we do see some warnings
+# from gdbserver about failure to find the raw thread.  This is
+# because the thread is never owned by thread-db.  In a GDB only world
+# we don't warn about such threads, we just silently handle them.
+# Maybe we should do the same in gdbserver?  For now though, we just
+# ignore the warnings.  Functionally, gdbserver handles the raw thread
+# just fine, it just gives a warning each time we try to look it up in
+# thread-db (and fail).
+
+# This only works on targets with the Linux kernel, and the source
+# makes use of the clone call, which has a different API on IA64.
+require {!istarget "ia64-*-*"}
+require {istarget "*-*-linux*"}
+
+standard_testfile
+
+# When debugging on a remote target GDB sees the thread as a full
+# thread, so we get the 'Thread ...' type message.  When using the
+# native linux target within GDB, as this thread is not owned by
+# thread-db we get a 'LWP ...' type message.  Setup a global to be
+# used in the test patterns below.
+if {[gdb_protocol_is_remote]} {
+    set raw_thread_re "Thread $::decimal\\.$::decimal"
+} else {
+    set raw_thread_re "LWP $::decimal"
+}
+
+# Restart GDB using TESTFILE, the basename of an executable.  Run to
+# the function 'do_raw_clone', which is where we start all the tests.
+# Then use BREAK_LINE_PATTERN to create a breakpoint on the matching
+# line.
+proc setup_for_test { testfile break_line_pattern } {
+    clean_restart $testfile
+
+    # Run until the do_raw_clone function.  By this point the pthread
+    # will have been created.
+    if {![runto do_raw_clone]} {
+	return
+    }
+
+    gdb_breakpoint [gdb_get_line_number $break_line_pattern]
+}
+
+# Use TESTFILE as the basename of the executable to run.  Place a
+# breakpoint after the creation of a raw thread, in the parent thread,
+# not the raw thread.  Run until the breakpoint is hit, check we see
+# the raw thread announced, and that the thread shows up in the 'info
+# threads' output.
+proc_with_prefix stop_after_creating_raw_thread { testfile } {
+    # Start GDB.  Run to do_raw_clone.  Place a breakpoint after the
+    # raw thread has been created.
+    setup_for_test $testfile "Break after raw thread created."
+
+    # Continue until the raw thread has been created.  Check we see
+    # the raw thread announced, and that we hit the expected breakpoint.
+    set saw_breakpoint_line false
+    set saw_source_line false
+    set saw_new_thread_line false
+    gdb_test_multiple "continue" "" {
+	-re "^\\\[New $::raw_thread_re \\(id 3\\)\\\]\r\n" {
+	    set saw_new_thread_line true
+	    exp_continue
+	}
+
+	-re "^Thread 1 \[^\r\n\]+ hit Breakpoint $::decimal, do_raw_clone \\(\\)\[^\r\n\]+\r\n" {
+	    set saw_breakpoint_line true
+	    exp_continue
+	}
+
+	-re "^$::decimal\\s+\[^\r\n\]+Break after raw thread created\\.\[^\r\n\]+\r\n" {
+	    set saw_source_line true
+	    exp_continue
+	}
+
+	-re "^$::gdb_prompt $" {
+	    gdb_assert { $saw_breakpoint_line && $saw_source_line \
+			     && $saw_new_thread_line }  $gdb_test_name
+	}
+
+	-re "^\[^\r\n\]*\r\n" {
+	    exp_continue
+	}
+    }
+
+    # Ensure the raw thread shows up in the 'info threads' output.
+    gdb_test "info threads" \
+	"\r\n\\s+3\\s+$::raw_thread_re \[^\r\n\]+ clone_fn \\(arg=$::hex\\) \[^\r\n\]+"
+
+    gdb_test "thread 3"
+    gdb_test "bt"
+}
+
+# Use TESTFILE as the basename of the executable to run.  Place a
+# breakpoint within the raw thread function.  Run until the breakpoint
+# is hit, check we see the raw thread announced, the switch to the raw
+# thread, and that the thread shows up in the 'info threads' output.
+proc_with_prefix stop_in_raw_thread { testfile } {
+    setup_for_test $testfile "Break inside raw thread."
+
+    # Continue until the raw thread has been created.  Check we see
+    # the raw thread announced, and that we hit the expected breakpoint.
+    set saw_breakpoint_line false
+    set saw_source_line false
+    set saw_new_thread_line false
+    set saw_thread_switch false
+    gdb_test_multiple "continue" "" {
+	-re "^\\\[New $::raw_thread_re \\(id 3\\)\\\]\r\n" {
+	    set saw_new_thread_line true
+	    exp_continue
+	}
+
+	-re "^\\\[Switching to $::raw_thread_re\\\]\r\n" {
+	    set saw_thread_switch true
+	    exp_continue
+	}
+
+	-re "^Thread 3 \[^\r\n\]+ hit Breakpoint $::decimal, clone_fn \\(arg=$::hex\\)\[^\r\n\]+\r\n" {
+	    set saw_breakpoint_line true
+	    exp_continue
+	}
+
+	-re "^$::decimal\\s+\[^\r\n\]+Break inside raw thread\\.\[^\r\n\]+\r\n" {
+	    set saw_source_line true
+	    exp_continue
+	}
+
+	-re "^$::gdb_prompt $" {
+	    gdb_assert { $saw_breakpoint_line && $saw_source_line \
+			     && $saw_thread_switch && $saw_new_thread_line } \
+		$gdb_test_name
+	}
+
+	-re "^\[^\r\n\]*\r\n" {
+	    exp_continue
+	}
+    }
+
+    # Ensure the raw thread shows up in the 'info threads' output.
+    gdb_test "info threads" \
+	"\r\n\\*\\s+3\\s+$::raw_thread_re \[^\r\n\]+ clone_fn \\(arg=$::hex\\) \[^\r\n\]+"
+}
+
+# Use TESTFILE as the basename of the executable to run.  Break before
+# the raw thread is created, set scheduler-locking mode on to stop GDB
+# setting the raw thread running, then step (using 'next') over the
+# creation of the raw thread.  Once we have stepped past the clone
+# turn scheduler-locking off again and run until the inferior exits,
+# we expect to see both the pthread and the raw thread exit.
+#
+# In broken GDB, in the delay build (no futex) when the inferior
+# exits, the raw thread will also exit.  GDB knows nothing about the
+# raw thread, and an assertion is triggered.
+#
+# In broken GDB, in futex mode the inferior will deadlock waiting on
+# the futex, this is because GDB doesn't know about the raw thread,
+# and never set it running again.  As such the futex condition
+# variable is never cleared, and so the inferior never exits.
+#
+# This should all now be fixed, GDB should see the raw thread
+# correctly, and the inferior should exit in both modes.
+proc_with_prefix next_over_raw_thread_creation { testfile } {
+    setup_for_test $testfile "Break before raw thread created."
+
+    gdb_continue_to_breakpoint "before raw thread created"
+
+    gdb_test_no_output "set scheduler-locking on"
+
+    set saw_thread_announced false
+    set reached_line_after_clone false
+    for { set i 0 } { $i < 10 } { incr i } {
+	gdb_test_multiple "next" "next, $i" {
+	    -re "^\\\[New $::raw_thread_re \\(id 3\\)\\\]\r\n" {
+		set saw_thread_announced true
+		exp_continue
+	    }
+
+	    -re "^$::decimal\\s+\[^\r\n\]+Immediately after the clone\\.\[^\r\n\]+\r\n" {
+		set reached_line_after_clone true
+		exp_continue
+	    }
+
+	    -re "^$::gdb_prompt " {
+		# Nothing.
+	    }
+
+	    -re "^\[^\r\n\]*\r\n" {
+		exp_continue
+	    }
+	}
+
+	if { $reached_line_after_clone } {
+	    break
+	}
+    }
+
+    gdb_assert { $reached_line_after_clone && $saw_thread_announced } \
+	"have passed clone and seen raw thread"
+
+    # Ensure the raw thread shows up in the 'info threads' output.
+    # Due to scheduler-locking the new thread will not have made any
+    # progress yet.  Having 'clone ()' in the expected output might be
+    # risky; on some targets it's possible this could show up as some
+    # other function name.
+    gdb_test "info threads" \
+	"\r\n\\s+3\\s+$::raw_thread_re \[^\r\n\]+ clone \\(\\) \[^\r\n\]+"
+
+    gdb_test_no_output "set scheduler-locking off"
+
+    # Run until the inferior exits.  We want to do this because if GDB
+    # failed to track the raw thread, then when the inferior exits the
+    # raw thread will also exit, and this thread exit event used to
+    # trigger an assertion in GDB.
+    set saw_raw_thread_exit false
+    set saw_pthread_exit false
+    set saw_inferior_exit false
+    gdb_test_multiple "continue" "continue to exit" {
+	-re "^\\\[$::raw_thread_re \\(id 3\\) exited\\\]\r\n" {
+	    set saw_raw_thread_exit true
+	    exp_continue
+	}
+
+	-re "^\\\[Thread \[^\r\n\]+ \\(id 2\\) exited\\\]\r\n" {
+	    set saw_pthread_exit true
+	    exp_continue
+	}
+
+	-re "^\\\[Inferior 1 \[^\r\n\]+ exited normally\\\]\r\n" {
+	    set saw_inferior_exit true
+	    exp_continue
+	}
+
+	-re "^$::gdb_prompt " {
+	    # Remote targets don't send thread exit events for every
+	    # thread when the entire inferior exits.  Maybe GDB itself
+	    # should synthesize these events?  For now though just
+	    # invert the state of the thread exit flags when testing
+	    # with a remote protocol.
+	    if {[gdb_protocol_is_remote]} {
+		set saw_raw_thread_exit [expr {!$saw_raw_thread_exit}]
+		set saw_pthread_exit [expr {!$saw_pthread_exit}]
+	    }
+
+	    gdb_assert { $saw_raw_thread_exit && $saw_pthread_exit \
+			     && $saw_inferior_exit } $gdb_test_name
+	}
+
+	-re "^\[^\r\n\]*\r\n" {
+	    exp_continue
+	}
+    }
+}
+
+
+# Run the tests.
+foreach_with_prefix mode { delay futex } {
+    set the_testfile $testfile-$mode
+
+    set flags {debug pthreads}
+    if { $mode eq "futex" } {
+	lappend flags "additional_flags=-DUSE_FUTEX"
+    }
+
+    if {[build_executable "failed to build" $the_testfile $srcfile $flags]} {
+	return
+    }
+
+    stop_after_creating_raw_thread $the_testfile
+    stop_in_raw_thread $the_testfile
+    next_over_raw_thread_creation $the_testfile
+}
diff --git a/gdbserver/thread-db.cc b/gdbserver/thread-db.cc
index 0bdcda2ced8..b15d5e472a0 100644
--- a/gdbserver/thread-db.cc
+++ b/gdbserver/thread-db.cc
@@ -178,14 +178,20 @@  find_one_thread (ptid_t ptid)
   td_err_e err = thread_db->td_ta_map_lwp2thr_p (thread_db->thread_agent, lwpid,
 						 &th);
   if (err != TD_OK)
-    error ("Cannot get thread handle for LWP %d: %s",
-	   lwpid, thread_db_err_str (err));
+    {
+      warning ("Cannot get thread handle for LWP %d: %s",
+	       lwpid, thread_db_err_str (err));
+      return 0;
+    }
 
   td_thrinfo_t ti;
   err = thread_db->td_thr_get_info_p (&th, &ti);
   if (err != TD_OK)
-    error ("Cannot get thread info for LWP %d: %s",
-	   lwpid, thread_db_err_str (err));
+    {
+      warning ("Cannot get thread info for LWP %d: %s",
+	       lwpid, thread_db_err_str (err));
+      return 0;
+    }
 
   threads_debug_printf ("Found thread %ld (LWP %d)",
 			(unsigned long) ti.ti_tid, ti.ti_lid);