[v2] gdb/Windows: Fix detach while running

Message ID 625b2dbb-6cb8-40dd-b7d0-fbce48cb5436@palves.net
State New
Headers
Series [v2] gdb/Windows: Fix detach while running |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed

Commit Message

Pedro Alves April 12, 2024, 1:41 p.m. UTC
  Hi!

On 2024-04-12 11:56, Hannes Domani wrote:

> I tried this, and it works for me too for Windows native.
> 
> Tested-By: Hannes Domani <ssbssa@yahoo.de>
> 

Thanks, added.


>> +
>> +#include <assert.h>
>> +#include <unistd.h>
>> +
>> +int
>> +main (int argc, char **argv)
>> +{
>> +  alarm (30);
>> +
>> +  while (1)
>> +    sleep (1);
>> +
>> +  return 0;
>> +}
> 
> The test passes for me, but I had to #ifdef the alarm() function out,
> since it doesn't exist on Windows.

I removed it, and make the loop sleep for 30 seconds instead.

>> +  for (;;)
>> +    {
>> +      DEBUG_EVENT current_event;
>> +      wait_for_debug_event_main_thread (&current_event);
>> +
>> +      if (current_event.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
>> +    {
>> +      DEBUG_EVENTS ("got EXIT_PROCESS_DEBUG_EVENT");
>> +      process_alive = false;
>> +      break;
> 
> Not sure how likely it is to get here, 

Thanks for bringing it up, it made me double check my assumptions and improve the code
a little.

The inferior is running free when we get to this function, so we can see a
process exit event if the process exits just before we manage to inject
the thread.  I tried that by setting a breakpoint on windows_nat_target::detach,
doing "c& ; detach", and then waiting until the child exits after the 30s,
and then stepping through the detach code.

I've now added explicit error handling and logging for CreateRemoteThread failing.

I manually tested the new error/exception path.

> but injected_thread_handle leaks in this case.

Indeed.  I had noticed it too after posting.  I fixed it by moving the CloseHandle call
after the loop.  And I shouldn't try to close the injected thread's handle when CreateRemoteThread
failed.  Fixed too.

Also, I remembered a couple things that I intended to do in the testcase that I had forgotten in v1:

 - Re-attach again after the detach.  This would catch the bad scenario with DebugBreakProcess
   described in the comments.

 - Move the testcase to gdb.base/ instead of gdb.threads/.

All fixed now.

Here's a v2 addressing all these issues.

From 0aa7be888aa17776aa1fd58eb8798181c80aa1bf Mon Sep 17 00:00:00 2001
From: Pedro Alves <pedro@palves.net>
Date: Fri, 12 Apr 2024 14:18:22 +0100
Subject: [PATCH] gdb/Windows: Fix detach while running

While testing a WIP Cygwin GDB that supports non-stop, I noticed that
gdb.threads/attach-non-stop.exp exposes that this:

 (gdb) attach PID&
 ...
 (gdb) detach

... hangs.

And it turns out that it hangs in all-stop as well.  This commits
fixes that.

After "attach &", the target is set running, we've called
ContinueDebugEvent and the process_thread thread is waiting for
WaitForDebugEvent events.  It is the equivalent of "attach; c&".

In windows_nat_target::detach, the first thing we do is
unconditionally call windows_continue (for ContinueDebugEvent), which
blocks in do_synchronously, until the process_thread sees an event out
of WaitForDebugEvent.  Unless the inferior happens to run into a
breakpoint, etc., then this hangs indefinitely.

If we've already called ContinueDebugEvent earlier, then we shouldn't
be calling it again in ::detach.

Still in windows_nat_target::detach, we have an interesting issue that
ends up being the bulk of the patch -- only the process_thread thread
can call DebugActiveProcessStop, but if it is blocked in
WaitForDebugEvent, we need to somehow force it to break out of it.
The only way to do that, is to force the inferior to do something that
causes WaitForDebugEvent to return some event.

This patch uses CreateRemoteThread to do it, which results in
WaitForDebugEvent reporting CREATE_THREAD_DEBUG_EVENT.  We then
terminate the injected thread before it has a chance to run any
userspace code.

Note that Win32 functions like DebugBreakProcess and
GenerateConsoleCtrlEvent would also inject a new thread in the
inferior.  I first used DebugBreakProcess, but that is actually more
complicated to use, because we'd have to be sure to consume the
breakpoint event before detaching, otherwise the inferior would likely
die due a breakpoint exception being raised with no debugger around to
intercept it.

See the new break_out_process_thread method.

So the fix has two parts:

 - Keep track of whether we've called ContinueDebugEvent and the
   process_thread thread is waiting for events, or whether
   WaitForDebugEvent already returned an event.

 - In windows_nat_target::detach, if the process_thread thread is
   waiting for events, unblock out of its WaitForDebugEvent, before
   proceeding with the actual detach.

New test included.  Passes cleanly on GNU/Linux native and gdbserver,
and also passes cleanly on Cygwin and MinGW, with the fix.  Before the
fix, it would hang and fail with a timeout.

Tested-By: Hannes Domani <ssbssa@yahoo.de>
Change-Id: Ifb91c58c08af1a9bcbafecedc93dfce001040905
---
 gdb/testsuite/gdb.base/detach-while-running.c |  27 ++++
 .../gdb.base/detach-while-running.exp         |  95 +++++++++++
 gdb/windows-nat.c                             | 153 +++++++++++++++++-
 3 files changed, 267 insertions(+), 8 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/detach-while-running.c
 create mode 100644 gdb/testsuite/gdb.base/detach-while-running.exp


base-commit: 02d02fc7924992ddd98073b95810b957efdc421a
  

Comments

Tom Tromey April 15, 2024, 5:02 p.m. UTC | #1
>>>>> "Pedro" == Pedro Alves <pedro@palves.net> writes:

Pedro> Still in windows_nat_target::detach, we have an interesting issue that
Pedro> ends up being the bulk of the patch -- only the process_thread thread
Pedro> can call DebugActiveProcessStop, but if it is blocked in
Pedro> WaitForDebugEvent, we need to somehow force it to break out of it.
Pedro> The only way to do that, is to force the inferior to do something that
Pedro> causes WaitForDebugEvent to return some event.

I generally like the Windows debug API but I can't really understand why
they didn't integrate WaitForDebugEvent with WaitForMultipleObjects --
it seems like such an obvious thing to do and it would have saved so
many headaches.

I didn't read the patch in too much detail, but I did look over it and I
read through the explanation.  It all makes sense to me.

I was wondering if something similar is needed for gdbserver?
I assume not though.

Reviewed-By: Tom Tromey <tom@tromey.com>

Tom
  
Pedro Alves April 17, 2024, 5:57 p.m. UTC | #2
On 2024-04-15 18:02, Tom Tromey wrote:
>>>>>> "Pedro" == Pedro Alves <pedro@palves.net> writes:
> 
> Pedro> Still in windows_nat_target::detach, we have an interesting issue that
> Pedro> ends up being the bulk of the patch -- only the process_thread thread
> Pedro> can call DebugActiveProcessStop, but if it is blocked in
> Pedro> WaitForDebugEvent, we need to somehow force it to break out of it.
> Pedro> The only way to do that, is to force the inferior to do something that
> Pedro> causes WaitForDebugEvent to return some event.
> 
> I generally like the Windows debug API but I can't really understand why
> they didn't integrate WaitForDebugEvent with WaitForMultipleObjects --
> it seems like such an obvious thing to do and it would have saved so
> many headaches.
> 
> I didn't read the patch in too much detail, but I did look over it and I
> read through the explanation.  It all makes sense to me.
> 
> I was wondering if something similar is needed for gdbserver?
> I assume not though.

It doesn't for two reasons:

#1 - gdbserver doesn't wait for Windows debug events with a separate thread.

#2 - in order to detach while the inferior is running with a remote target, we must
     be using the non-stop variant of the RSP (so we can send a packet other than
     \03 while the target is running), which the Windows gdbserver port does not support
     (and would need #1).

ATM, I can only focus mostly on native non-stop, other than making sure gdbserver
doesn't break.  But after the non-stop work, the gdb and gdbserver backends end up a
bit more similar in some areas, hopefully we'll end up being able to share more code
later on.

> 
> Reviewed-By: Tom Tromey <tom@tromey.com>
> 

Thanks.  I've added this and merged the patch.
  
Pedro Alves April 17, 2024, 6:18 p.m. UTC | #3
I don't know why I forgot to reply to this part, sorry about that.
I had intended to.  Correcting now.

On 2024-04-15 18:02, Tom Tromey wrote:

> I generally like the Windows debug API but I can't really understand why
> they didn't integrate WaitForDebugEvent with WaitForMultipleObjects --
> it seems like such an obvious thing to do and it would have saved so
> many headaches.

Totally agreed!  It just seems so obvious.  I even put that in a slide in my FOSDEM presentation this
last February:

 https://fosdem.org/2024/schedule/event/fosdem-2024-2796-gdb-on-windows-status-plans/
 https://fosdem.org/2024/events/attachments/fosdem-2024-2796-gdb-on-windows-status-plans/slides/22389/FOSDEM-2024-gdb-on-windows_IWzsEfL.pdf

Slides 22-25.  Unfortunately I didn't have time to go through them in the actual talk.

If ever I have a chance of asking for debug features from MSFT, this will be one of them.

Pedro Alves
  

Patch

diff --git a/gdb/testsuite/gdb.base/detach-while-running.c b/gdb/testsuite/gdb.base/detach-while-running.c
new file mode 100644
index 00000000000..9c038f4af18
--- /dev/null
+++ b/gdb/testsuite/gdb.base/detach-while-running.c
@@ -0,0 +1,27 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2024 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <unistd.h>
+
+int
+main (int argc, char **argv)
+{
+  for (int i = 0; i < 30; i++)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/detach-while-running.exp b/gdb/testsuite/gdb.base/detach-while-running.exp
new file mode 100644
index 00000000000..4ce6c7045ea
--- /dev/null
+++ b/gdb/testsuite/gdb.base/detach-while-running.exp
@@ -0,0 +1,95 @@ 
+# Copyright 2024 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test detaching while the inferior is running.  Basically:
+#
+#  (gdb) attach PID
+#  (gdb) c&
+#  (gdb) detach
+
+require can_spawn_for_attach
+
+standard_testfile
+
+if {[build_executable "failed to prepare" $testfile $srcfile {debug}] == -1} {
+    return
+}
+
+# The test proper.  See description above.
+
+proc test {} {
+    global binfile gdb_prompt
+
+    # This test requires executing commands while the target is
+    # running, which, when testing with the remote target, requires
+    # non-stop remote protocol.  Until that variant of the RSP is the
+    # default, force target non-stop mode on.
+    set is_remote \
+	[expr {[target_info exists gdb_protocol] \
+		   && ([target_info gdb_protocol] == "remote" \
+			   || [target_info gdb_protocol] == "extended-remote")}]
+
+    save_vars { ::GDBFLAGS } {
+	if {$is_remote} {
+	    append ::GDBFLAGS " -ex \"maint set target-non-stop on\""
+	}
+	clean_restart ${binfile}
+    }
+
+    set test_spawn_id [spawn_wait_for_attach $binfile]
+    set testpid [spawn_id_get_pid $test_spawn_id]
+
+    set any "\[^\r\n\]*"
+
+    # Iterate more than once so that we test re-attaching after
+    # detaching, in case GDB incorrectly detaches and the process
+    # crashes after the detach.
+    set n_iters 2
+    for {set iter 1} {$iter <= $n_iters} {incr iter} {
+	with_test_prefix "iter=$iter" {
+	    set attached 0
+
+	    gdb_test_multiple "attach $testpid" "attach" {
+		-re "Attaching to program:${any}process $testpid\r\n.*$gdb_prompt " {
+		    pass $gdb_test_name
+		    set attached 1
+		}
+	    }
+
+	    if {!$attached} {
+		break
+	    }
+
+	    gdb_test_multiple "continue &" "" {
+		-re "Continuing\.\r\n$::gdb_prompt " {
+		    pass $gdb_test_name
+		}
+	    }
+
+	    gdb_test "detach" "Detaching from.*"
+
+	    # Sleep a bit before reattaching to let the detached
+	    # process crash and exit if e.g., GDB managed to leave
+	    # breakpoint traps behind.
+	    if {$iter != $n_iters} {
+		sleep 1
+	    }
+	}
+    }
+
+    kill_wait_spawned_process $test_spawn_id
+}
+
+test
diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c
index b123a66ef0f..0d8484c832a 100644
--- a/gdb/windows-nat.c
+++ b/gdb/windows-nat.c
@@ -357,6 +357,13 @@  struct windows_nat_target final : public x86_nat_target<inf_child_target>
      needed.  */
   void wait_for_debug_event_main_thread (DEBUG_EVENT *event);
 
+  /* Force the process_thread thread to return from WaitForDebugEvent.
+     PROCESS_ALIVE is set to false if the inferior process exits while
+     we're trying to break out the process_thread thread.  This can
+     happen because this is called while all threads are running free,
+     while we're trying to detach.  */
+  void break_out_process_thread (bool &process_alive);
+
   /* Queue used to send requests to process_thread.  This is
      implicitly locked.  */
   std::queue<gdb::function_view<bool ()>> m_queue;
@@ -379,6 +386,12 @@  struct windows_nat_target final : public x86_nat_target<inf_child_target>
 
   /* True if currently in async mode.  */
   bool m_is_async = false;
+
+  /* True if we last called ContinueDebugEvent and the process_thread
+     thread is now waiting for events.  False if WaitForDebugEvent
+     already returned an event, and we need to ContinueDebugEvent
+     again to restart the inferior.  */
+  bool m_continued = false;
 };
 
 static void
@@ -498,6 +511,8 @@  windows_nat_target::wait_for_debug_event_main_thread (DEBUG_EVENT *event)
 	wait_for_debug_event (event, INFINITE);
       return false;
     });
+
+  m_continued = false;
 }
 
 /* See nat/windows-nat.h.  */
@@ -1352,6 +1367,8 @@  windows_nat_target::windows_continue (DWORD continue_status, int id,
 				" - ContinueDebugEvent failed"),
 			      *err);
 
+  m_continued = !last_call;
+
   return TRUE;
 }
 
@@ -2072,20 +2089,140 @@  windows_nat_target::attach (const char *args, int from_tty)
   target_terminal::ours ();
 }
 
+void
+windows_nat_target::break_out_process_thread (bool &process_alive)
+{
+  /* This is called when the process_thread thread is blocked in
+     WaitForDebugEvent (unless it already returned some event we
+     haven't consumed yet), and we need to unblock it so that we can
+     have it call DebugActiveProcessStop.
+
+     To make WaitForDebugEvent return, we need to force some event in
+     the inferior.  Any method that lets us do that (without
+     disturbing the other threads), injects a new thread in the
+     inferior.
+
+     We don't use DebugBreakProcess for this, because that injects a
+     thread that ends up executing a breakpoint instruction.  We can't
+     let the injected thread hit that breakpoint _after_ we've
+     detached.  Consuming events until we see a breakpoint trap isn't
+     100% reliable, because we can't distinguish it from some other
+     thread itself deciding to call int3 while we're detaching, unless
+     we temporarily suspend all threads.  It's just a lot of
+     complication, and there's an easier way.
+
+     Important observation: the thread creation event for the newly
+     injected thread is sufficient to unblock WaitForDebugEvent.
+
+     Instead of DebugBreakProcess, we can instead use
+     CreateRemoteThread to control the code that the injected thread
+     runs ourselves.  We could consider pointing the injected thread
+     at some side-effect-free Win32 function as entry point.  However,
+     finding the address of such a function requires having at least
+     minimal symbols loaded for ntdll.dll.  Having a way that avoids
+     that is better, so that detach always works correctly even when
+     we don't have any symbols loaded.
+
+     So what we do is inject a thread that doesn't actually run ANY
+     userspace code, because we force-terminate it as soon as we see
+     its corresponding thread creation event.  CreateRemoteThread
+     gives us the new thread's ID, which we can match with the thread
+     associated with the CREATE_THREAD_DEBUG_EVENT event.  */
+
+  DWORD injected_thread_id = 0;
+  HANDLE injected_thread_handle
+    = CreateRemoteThread (windows_process.handle, NULL,
+			  0, (LPTHREAD_START_ROUTINE) 0,
+			  NULL, 0, &injected_thread_id);
+
+  if (injected_thread_handle == NULL)
+    {
+      DWORD err = GetLastError ();
+
+      DEBUG_EVENTS ("CreateRemoteThread failed with %u", err);
+
+      if (err == ERROR_ACCESS_DENIED)
+	{
+	  /* Creating the remote thread fails with ERROR_ACCESS_DENIED
+	     if the process exited before we had a chance to inject
+	     the thread.  Continue with the loop below and consume the
+	     process exit event anyhow, so that our caller can always
+	     call windows_continue.  */
+	}
+      else
+	throw_winerror_with_name (_("Can't detach from running process.  "
+				    "Interrupt it first."),
+				  err);
+    }
+
+  process_alive = true;
+
+  /* At this point, the user has declared that they want to detach, so
+     any event that happens from this point on should be forwarded to
+     the inferior.  */
+
+  for (;;)
+    {
+      DEBUG_EVENT current_event;
+      wait_for_debug_event_main_thread (&current_event);
+
+      if (current_event.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
+	{
+	  DEBUG_EVENTS ("got EXIT_PROCESS_DEBUG_EVENT");
+	  process_alive = false;
+	  break;
+	}
+
+      if (current_event.dwDebugEventCode == CREATE_THREAD_DEBUG_EVENT
+	  && current_event.dwThreadId == injected_thread_id)
+	{
+	  DEBUG_EVENTS ("got CREATE_THREAD_DEBUG_EVENT for injected thread");
+
+	  /* Terminate the injected thread, so it doesn't run any code
+	     at all.  All we wanted was some event, and
+	     CREATE_THREAD_DEBUG_EVENT is sufficient.  */
+	  CHECK (TerminateThread (injected_thread_handle, 0));
+	  break;
+	}
+
+      DEBUG_EVENTS ("got unrelated event, code %u",
+		    current_event.dwDebugEventCode);
+      windows_continue (DBG_CONTINUE, -1, 0);
+    }
+
+  if (injected_thread_handle != NULL)
+    CHECK (CloseHandle (injected_thread_handle));
+}
+
 void
 windows_nat_target::detach (inferior *inf, int from_tty)
 {
+  /* If we see the process exit while unblocking the process_thread
+     helper thread, then we should skip the actual
+     DebugActiveProcessStop call.  But don't report an error.  Just
+     pretend the process exited shortly after the detach.  */
+  bool process_alive = true;
+
+  /* The process_thread helper thread will be blocked in
+     WaitForDebugEvent waiting for events if we've resumed the target
+     before we get here, e.g., with "attach&" or "c&".  We need to
+     unblock it so that we can have it call DebugActiveProcessStop
+     below, in the do_synchronously block.  */
+  if (m_continued)
+    break_out_process_thread (process_alive);
+
   windows_continue (DBG_CONTINUE, -1, 0, true);
 
   std::optional<unsigned> err;
-  do_synchronously ([&] ()
-    {
-      if (!DebugActiveProcessStop (windows_process.current_event.dwProcessId))
-	err = (unsigned) GetLastError ();
-      else
-	DebugSetProcessKillOnExit (FALSE);
-      return false;
-    });
+  if (process_alive)
+    do_synchronously ([&] ()
+      {
+	if (!DebugActiveProcessStop (windows_process.current_event.dwProcessId))
+	  err = (unsigned) GetLastError ();
+	else
+	  DebugSetProcessKillOnExit (FALSE);
+	return false;
+      });
 
   if (err.has_value ())
     {