[RFA/commit] GDB crash re-running program on Windows (native)

Message ID 20181231035750.8063-1-brobecker@adacore.com
State New, archived
Headers

Commit Message

Joel Brobecker Dec. 31, 2018, 3:57 a.m. UTC
  Hello,

Running any program twice on Windows current results in GDB crashing:

    $ gdb -q any_program
    (gdb) run
    $ gdb dummy -batch -ex run -ex run
    [New Thread 684960.0xe5878]
    [New Thread 684960.0xd75ac]
    [New Thread 684960.0xddac8]
    [New Thread 684960.0xc1f50]
    [Thread 684960.0xd75ac exited with code 0]
    [Thread 684960.0xddac8 exited with code 0]
    [Thread 684960.0xc1f50 exited with code 0]
    [Inferior 1 (process 684960) exited normally]
    (gdb) run
    Segmentation fault

The crash happens while processing the CREATE_PROCESS_DEBUG_EVENT
for  the second run; in particular, we have in get_windows_debug_event:

    | case CREATE_PROCESS_DEBUG_EVENT:
    |   [...]
    |   if (main_thread_id)
    |     windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
    |                                    main_thread_id),
    |                            0);

The problem is that main_thread_id is the TID of the main thread from
the *previous* inferior, and this code is trying to delete that
thread. The problem is that it is constructing a PTID by pairing
the TID of the previous inferior with the PID of the new inferior.
As a result, when we dig inside windows_delete_thread to see
how it would handle that, we see...

    | delete_thread (find_thread_ptid (ptid));

Since the PTID is bogus, we end up calling delete_thread with
a NULL thread_info. It used to be harmless, turning the delete_thread
into a nop, but the following change...

    | commit 080363310650c93ad8e93018bcb6760ba5d32d1c
    | Date:   Thu Nov 22 16:09:14 2018 +0000
    | Subject: Per-inferior thread list, thread ranges/iterators, down with ALL_THREADS, etc.

... changed delete_thread to get the list of threads from
the inferior, which itself is now accessed via the given
thread_info. This is the corresponding diff that shows the change:

    | -  for (tp = thread_list; tp; tpprev = tp, tp = tp->next)
    | +  for (tp = thr->inf->thread_list; tp; tpprev = tp, tp = tp->next)

As a result of this, passing a NULL thread_info is no longer
an option!

Stepping back a bit, the reason behind deleting the thread late
could be found in a patch from Dec 2003, which laconically explains:

    | commit 87a45c96062d658ca83b50aa060a648bf5f5f1ff
    | Date:   Fri Dec 26 00:39:04 2003 +0000
    |
    | * win32-nat.c (get_child_debug_event): Keep main thread id around
    | even after thread exits since Windows insists on continuing to
    | report events against it.

A look at the gdb-patches archives did not provide any additional
clues (https://www.sourceware.org/ml/gdb-patches/2003-12/msg00478.html).
It is not clear whether this is still needed or not. This patch
assumes that whatever isue there was, the versions of Windows
we currently support no longer have it.

With that in mind, this commit fixes the issue by deleting the thread
when the inferior sends the exit-process event as opposed to deleting it
later, while starting a new inferior.

This restores also restores the printing of the thread-exit notification
for the main thread, which was missing before. Looking at the transcript
of the example show above, we can see 4 thread creation notifications,
and only 3 notifications for thread exits. Now creation and exit
notifications are balanced.

In the handling of EXIT_THREAD_DEBUG_EVENT, the main_thread_id
check is removed because deemed unnecessary: The main thread was
introduced by a CREATE_THREAD_DEBUG_EVENT, and thus the kernel
is expected to report its death via EXIT_PROCESS_DEBUG_EVENT.

And finally, because the behavior of delete_thread did change
(albeit when getting a value we probably never expected to receive),
this patch also adds a gdb_assert. The purpose is to provide some
immediate information in case there are other callers that mistakenly
call delete_thread with a NULL thread info. This can be useful
information when direct debugging of GDB isn't an option.

gdb/ChangeLog:

	* thread.c (delete_thread_1): Add gdb_assert that THR is not
	NULL. Initialize tpprev to NULL instead of assigning it
	to NULL on the next statement.
	* windows-nat.c (windows_delete_thread): Remove check for
	main_thread_id before printing thread exit notifications.
	(get_windows_debug_event) <EXIT_THREAD_DEBUG_EVENT>:
	Remove thread ID check against main_thread_id.
	<CREATE_PROCESS_DEBUG_EVENT>: Remove call to
	windows_delete_thread.
	<EXIT_PROCESS_DEBUG_EVENT>: Add call to windows_delete_thread.

Tested on x86-windows and x86_64-windows using AdaCore's testsuite.
I will push this patch in a couple of weeks or so unless there are
comments.

Thank you,
  

Comments

Simon Marchi Jan. 2, 2019, 11:03 p.m. UTC | #1
On 2018-12-30 22:57, Joel Brobecker wrote:
> Hello,
> 
> Running any program twice on Windows current results in GDB crashing:
> 
>     $ gdb -q any_program
>     (gdb) run
>     $ gdb dummy -batch -ex run -ex run
>     [New Thread 684960.0xe5878]
>     [New Thread 684960.0xd75ac]
>     [New Thread 684960.0xddac8]
>     [New Thread 684960.0xc1f50]
>     [Thread 684960.0xd75ac exited with code 0]
>     [Thread 684960.0xddac8 exited with code 0]
>     [Thread 684960.0xc1f50 exited with code 0]
>     [Inferior 1 (process 684960) exited normally]
>     (gdb) run
>     Segmentation fault
> 
> The crash happens while processing the CREATE_PROCESS_DEBUG_EVENT
> for  the second run; in particular, we have in get_windows_debug_event:
> 
>     | case CREATE_PROCESS_DEBUG_EVENT:
>     |   [...]
>     |   if (main_thread_id)
>     |     windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
>     |                                    main_thread_id),
>     |                            0);
> 
> The problem is that main_thread_id is the TID of the main thread from
> the *previous* inferior, and this code is trying to delete that
> thread. The problem is that it is constructing a PTID by pairing
> the TID of the previous inferior with the PID of the new inferior.
> As a result, when we dig inside windows_delete_thread to see
> how it would handle that, we see...
> 
>     | delete_thread (find_thread_ptid (ptid));
> 
> Since the PTID is bogus, we end up calling delete_thread with
> a NULL thread_info. It used to be harmless, turning the delete_thread
> into a nop, but the following change...
> 
>     | commit 080363310650c93ad8e93018bcb6760ba5d32d1c
>     | Date:   Thu Nov 22 16:09:14 2018 +0000
>     | Subject: Per-inferior thread list, thread ranges/iterators, down
> with ALL_THREADS, etc.
> 
> ... changed delete_thread to get the list of threads from
> the inferior, which itself is now accessed via the given
> thread_info. This is the corresponding diff that shows the change:
> 
>     | -  for (tp = thread_list; tp; tpprev = tp, tp = tp->next)
>     | +  for (tp = thr->inf->thread_list; tp; tpprev = tp, tp = 
> tp->next)
> 
> As a result of this, passing a NULL thread_info is no longer
> an option!
> 
> Stepping back a bit, the reason behind deleting the thread late
> could be found in a patch from Dec 2003, which laconically explains:
> 
>     | commit 87a45c96062d658ca83b50aa060a648bf5f5f1ff
>     | Date:   Fri Dec 26 00:39:04 2003 +0000
>     |
>     | * win32-nat.c (get_child_debug_event): Keep main thread id around
>     | even after thread exits since Windows insists on continuing to
>     | report events against it.
> 
> A look at the gdb-patches archives did not provide any additional
> clues 
> (https://www.sourceware.org/ml/gdb-patches/2003-12/msg00478.html).
> It is not clear whether this is still needed or not. This patch
> assumes that whatever isue there was, the versions of Windows
> we currently support no longer have it.

This seems reasonable to me, if the testsuite shows no regression with 
the hack removed, I'm confident enough that this is ok.

> 
> With that in mind, this commit fixes the issue by deleting the thread
> when the inferior sends the exit-process event as opposed to deleting 
> it
> later, while starting a new inferior.
> 
> This restores also restores the printing of the thread-exit 
> notification

"This restores also restores"

> for the main thread, which was missing before. Looking at the 
> transcript
> of the example show above, we can see 4 thread creation notifications,
> and only 3 notifications for thread exits. Now creation and exit
> notifications are balanced.

Another choice is to not show the main thread's creation and exit (as is 
done on Linux), since it's kind of redundant with the process creation 
and exit.

> 
> In the handling of EXIT_THREAD_DEBUG_EVENT, the main_thread_id
> check is removed because deemed unnecessary: The main thread was
> introduced by a CREATE_THREAD_DEBUG_EVENT, and thus the kernel
> is expected to report its death via EXIT_PROCESS_DEBUG_EVENT.

Should that last EXIT_PROCESS_DEBUG_EVENT actually be 
EXIT_THREAD_DEBUG_EVENT?

> @@ -1607,6 +1599,9 @@ get_windows_debug_event (struct target_ops *ops,
>  	}
>        else if (saw_create == 1)
>  	{
> +	  windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
> +					 main_thread_id),
> +				 0);
>  	  ourstatus->kind = TARGET_WAITKIND_EXITED;
>  	  ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode;
>  	  thread_id = main_thread_id;

If what you said above is right (that the kernel reports the main 
thread's death through an EXIT_THREAD_DEBUG_EVENT), why is this new call 
to windows_delete_thread needed?  Shouldn't it already be deleted at 
this point?

Simon
  
Joel Brobecker Jan. 3, 2019, 3:33 a.m. UTC | #2
Hi Simon,

Thanks for the review!

> > With that in mind, this commit fixes the issue by deleting the thread
> > when the inferior sends the exit-process event as opposed to deleting it
> > later, while starting a new inferior.
> > 
> > This restores also restores the printing of the thread-exit notification
> 
> "This restores also restores"
> 
> > for the main thread, which was missing before. Looking at the transcript
> > of the example show above, we can see 4 thread creation notifications,
> > and only 3 notifications for thread exits. Now creation and exit
> > notifications are balanced.
> 
> Another choice is to not show the main thread's creation and exit (as is
> done on Linux), since it's kind of redundant with the process creation and
> exit.

Indeed, that's a good idea. I propose to work on that as a followup
patch, as I am a little short on time these next couple of weeks
(broken hand :-().

> > In the handling of EXIT_THREAD_DEBUG_EVENT, the main_thread_id
> > check is removed because deemed unnecessary: The main thread was
> > introduced by a CREATE_THREAD_DEBUG_EVENT, and thus the kernel
> > is expected to report its death via EXIT_PROCESS_DEBUG_EVENT.
> 
> Should that last EXIT_PROCESS_DEBUG_EVENT actually be
> EXIT_THREAD_DEBUG_EVENT?

Not quite, but actually a good question nonetheless. There was
an auto-completion error in the text above. New text:

    | In the handling of EXIT_THREAD_DEBUG_EVENT, the main_thread_id
    | check is removed because deemed unnecessary: The main thread was
    | introduced by a CREATE_PROCESS_DEBUG_EVENT, and thus the kernel
    | is expected to report its death via EXIT_PROCESS_DEBUG_EVENT.

In other words, we don't expect to receive an EXIT_THREAD_DEBUG_EVENT
for the main thread, because, at the Windows level, that thread really
isn't a thread, but really a process.


> > @@ -1607,6 +1599,9 @@ get_windows_debug_event (struct target_ops *ops,
> >  	}
> >        else if (saw_create == 1)
> >  	{
> > +	  windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
> > +					 main_thread_id),
> > +				 0);
> >  	  ourstatus->kind = TARGET_WAITKIND_EXITED;
> >  	  ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode;
> >  	  thread_id = main_thread_id;
> 
> If what you said above is right (that the kernel reports the main thread's
> death through an EXIT_THREAD_DEBUG_EVENT), why is this new call to
> windows_delete_thread needed?  Shouldn't it already be deleted at this
> point?

Does the answer above allow us to make sense of this hunk?
  
Simon Marchi Jan. 3, 2019, 3:48 a.m. UTC | #3
On 2019-01-02 22:33, Joel Brobecker wrote:
> Hi Simon,
> 
> Thanks for the review!
> 
>> > With that in mind, this commit fixes the issue by deleting the thread
>> > when the inferior sends the exit-process event as opposed to deleting it
>> > later, while starting a new inferior.
>> >
>> > This restores also restores the printing of the thread-exit notification
>> 
>> "This restores also restores"
>> 
>> > for the main thread, which was missing before. Looking at the transcript
>> > of the example show above, we can see 4 thread creation notifications,
>> > and only 3 notifications for thread exits. Now creation and exit
>> > notifications are balanced.
>> 
>> Another choice is to not show the main thread's creation and exit (as 
>> is
>> done on Linux), since it's kind of redundant with the process creation 
>> and
>> exit.
> 
> Indeed, that's a good idea. I propose to work on that as a followup
> patch, as I am a little short on time these next couple of weeks
> (broken hand :-().
> 
>> > In the handling of EXIT_THREAD_DEBUG_EVENT, the main_thread_id
>> > check is removed because deemed unnecessary: The main thread was
>> > introduced by a CREATE_THREAD_DEBUG_EVENT, and thus the kernel
>> > is expected to report its death via EXIT_PROCESS_DEBUG_EVENT.
>> 
>> Should that last EXIT_PROCESS_DEBUG_EVENT actually be
>> EXIT_THREAD_DEBUG_EVENT?
> 
> Not quite, but actually a good question nonetheless. There was
> an auto-completion error in the text above. New text:
> 
>     | In the handling of EXIT_THREAD_DEBUG_EVENT, the main_thread_id
>     | check is removed because deemed unnecessary: The main thread was
>     | introduced by a CREATE_PROCESS_DEBUG_EVENT, and thus the kernel
>     | is expected to report its death via EXIT_PROCESS_DEBUG_EVENT.
> 
> In other words, we don't expect to receive an EXIT_THREAD_DEBUG_EVENT
> for the main thread, because, at the Windows level, that thread really
> isn't a thread, but really a process.
> 
> 
>> > @@ -1607,6 +1599,9 @@ get_windows_debug_event (struct target_ops *ops,
>> >  	}
>> >        else if (saw_create == 1)
>> >  	{
>> > +	  windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
>> > +					 main_thread_id),
>> > +				 0);
>> >  	  ourstatus->kind = TARGET_WAITKIND_EXITED;
>> >  	  ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode;
>> >  	  thread_id = main_thread_id;
>> 
>> If what you said above is right (that the kernel reports the main 
>> thread's
>> death through an EXIT_THREAD_DEBUG_EVENT), why is this new call to
>> windows_delete_thread needed?  Shouldn't it already be deleted at this
>> point?
> 
> Does the answer above allow us to make sense of this hunk?

Yes, thanks for the explanation, everything LGTM.

Simon
  
Joel Brobecker Jan. 5, 2019, 7:57 a.m. UTC | #4
>>>    gdb/ChangeLog:
>>>
>>>            * thread.c (delete_thread_1): Add gdb_assert that THR is not
>>>            NULL. Initialize tpprev to NULL instead of assigning it
>>>            to NULL on the next statement.
>>>            * windows-nat.c (windows_delete_thread): Remove check for
>>>            main_thread_id before printing thread exit notifications.
>>>            (get_windows_debug_event) <EXIT_THREAD_DEBUG_EVENT>:
>>>            Remove thread ID check against main_thread_id.
>>>            <CREATE_PROCESS_DEBUG_EVENT>: Remove call to
>>>            windows_delete_thread.
>>>            <EXIT_PROCESS_DEBUG_EVENT>: Add call to windows_delete_thread.

> Yes, thanks for the explanation, everything LGTM.

Thank you Simon; I just pushed the patch.

I am also happy to say that I have a patch for the secondary
suggestion of avoiding the thread creation/exit notification
for the main thread. I will let it run through our nightly testing
for a couple of days and send it here for review afterwards.
  

Patch

diff --git a/gdb/thread.c b/gdb/thread.c
index 48d605e612a..539d39c63f1 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -442,15 +442,17 @@  thread_step_over_chain_remove (struct thread_info *tp)
   step_over_chain_remove (&step_over_queue_head, tp);
 }
 
-/* Delete thread TP.  If SILENT, don't notify the observer of this
-   exit.  */
+/* Delete the thread referenced by THR.  If SILENT, don't notifyi
+   the observer of this exit.
+   
+   THR must not be NULL or a failed assertion will be raised.  */
 
 static void
 delete_thread_1 (thread_info *thr, bool silent)
 {
-  struct thread_info *tp, *tpprev;
+  gdb_assert (thr != nullptr);
 
-  tpprev = NULL;
+  struct thread_info *tp, *tpprev = NULL;
 
   for (tp = thr->inf->thread_list; tp; tpprev = tp, tp = tp->next)
     if (tp == thr)
diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c
index 8292cf42122..9b7b11e411f 100644
--- a/gdb/windows-nat.c
+++ b/gdb/windows-nat.c
@@ -496,7 +496,7 @@  windows_delete_thread (ptid_t ptid, DWORD exit_code)
 
   if (info_verbose)
     printf_unfiltered ("[Deleting %s]\n", target_pid_to_str (ptid));
-  else if (print_thread_events && id != main_thread_id)
+  else if (print_thread_events)
     printf_unfiltered (_("[%s exited with code %u]\n"),
 		       target_pid_to_str (ptid), (unsigned) exit_code);
   delete_thread (find_thread_ptid (ptid));
@@ -1560,14 +1560,10 @@  get_windows_debug_event (struct target_ops *ops,
 		     (unsigned) current_event.dwProcessId,
 		     (unsigned) current_event.dwThreadId,
 		     "EXIT_THREAD_DEBUG_EVENT"));
-
-      if (current_event.dwThreadId != main_thread_id)
-	{
-	  windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
-					 current_event.dwThreadId),
-				 current_event.u.ExitThread.dwExitCode);
-	  th = &dummy_thread_info;
-	}
+      windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
+				     current_event.dwThreadId),
+			     current_event.u.ExitThread.dwExitCode);
+      th = &dummy_thread_info;
       break;
 
     case CREATE_PROCESS_DEBUG_EVENT:
@@ -1580,10 +1576,6 @@  get_windows_debug_event (struct target_ops *ops,
 	break;
 
       current_process_handle = current_event.u.CreateProcessInfo.hProcess;
-      if (main_thread_id)
-	windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
-				       main_thread_id),
-			       0);
       main_thread_id = current_event.dwThreadId;
       /* Add the main thread.  */
       th = windows_add_thread (ptid_t (current_event.dwProcessId, 0,
@@ -1607,6 +1599,9 @@  get_windows_debug_event (struct target_ops *ops,
 	}
       else if (saw_create == 1)
 	{
+	  windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
+					 main_thread_id),
+				 0);
 	  ourstatus->kind = TARGET_WAITKIND_EXITED;
 	  ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode;
 	  thread_id = main_thread_id;