Fixes to Cygwin-specific signal handling

Message ID 1429009382-21040-1-git-send-email-jon.turney@dronecode.org.uk
State New, archived
Headers

Commit Message

Jon Turney April 14, 2015, 11:03 a.m. UTC
  Originally by cgf, this patch has been carried in Cygwin's gdb package for a few
years.  I've cleaned it up a bit and revised it for master.

Without this patch, it's impossible to usefully run the testsuite on Cygwin.

gdb/ChangeLog:

2015-04-11  Jon Turney  <jon.turney@dronecode.org.uk>

	* windows-nat.c: Replace have_saved_context with signal_thread_id
	throughout.
	(thread_rec): Don't retrieve context if we have a saved one.
	Ignore 'Invalid Handle' errors.
	(handle_output_debug_string): Mark signal context as not to be
	written to inferior by windows_continue() or windows_resume().
	(get_windows_debug_event): Replace retval with thread_id
	throughout.  Don't clear any saved context.
---
 gdb/ChangeLog     | 11 +++++++++++
 gdb/windows-nat.c | 55 +++++++++++++++++++++++++++++++------------------------
 2 files changed, 42 insertions(+), 24 deletions(-)
  

Comments

Joel Brobecker April 14, 2015, 1:16 p.m. UTC | #1
Jon,

On Tue, Apr 14, 2015 at 12:03:02PM +0100, Jon Turney wrote:
> Originally by cgf, this patch has been carried in Cygwin's gdb package
> for a few years.  I've cleaned it up a bit and revised it for master.
> 
> Without this patch, it's impossible to usefully run the testsuite on
> Cygwin.
> 
> gdb/ChangeLog:
> 
> 2015-04-11  Jon Turney  <jon.turney@dronecode.org.uk>
> 
> 	* windows-nat.c: Replace have_saved_context with signal_thread_id
> 	throughout.
> 	(thread_rec): Don't retrieve context if we have a saved one.
> 	Ignore 'Invalid Handle' errors.
> 	(handle_output_debug_string): Mark signal context as not to be
> 	written to inferior by windows_continue() or windows_resume().
> 	(get_windows_debug_event): Replace retval with thread_id
> 	throughout.  Don't clear any saved context.

Overall, the patch looks reasonable to me. But I think there are
at least 3 independent changes, and it would be nice to split those
two out, for a couple of reasons:
  1. It allows you to explain the nature of the problem, from the user's
     standpoint, that the patch is fixing (ie, what user-visible
     symptoms it fixes);
  2. it allows us to see how each problem is fixed, and to deal with
     each issue separately.

The three issues I view as independent:
  a. ignoring "invalid handle" errors;
  b. unsetting saved_context.ContextFlags
  c. the renaming of have_saved_context into signal_thread_id
     so you can compare the current thread id with the saved
     signal_thread_id.

A few minor comments below.

> @@ -849,8 +853,12 @@ handle_output_debug_string (struct target_waitstatus *ourstatus)
>  					 &saved_context,
>  					 __COPY_CONTEXT_SIZE, &n)
>  		   && n == __COPY_CONTEXT_SIZE)
> -	    have_saved_context = 1;
> -	  current_event.dwThreadId = retval;
> +	    {
> +	      signal_thread_id = retval;
> +	      saved_context.ContextFlags = 0;  /* Don't attempt to call SetThreadContext */

Can you move the comment just above the statement so as to avoid
exceeding the maximum line length, and also explain why we should
not call SetThreadContext?

> @@ -1330,7 +1338,6 @@ get_windows_debug_event (struct target_ops *ops,
>    event_code = current_event.dwDebugEventCode;
>    ourstatus->kind = TARGET_WAITKIND_SPURIOUS;
>    th = NULL;
> -  have_saved_context = 0;

Normally, there is a corresponding addition that sets signal_thread_id
to zero.  This leads me to be believe that there might be an additional
user-visible symptom that this patch fixes?

> @@ -1501,12 +1508,12 @@ get_windows_debug_event (struct target_ops *ops,
>    else
>      {
>        inferior_ptid = ptid_build (current_event.dwProcessId, 0,
> -				  retval);
> -      current_thread = th ?: thread_rec (current_event.dwThreadId, TRUE);
> +				  thread_id);
> +      current_thread = th ?: thread_rec (thread_id, TRUE);

I know you are just repeating what was there before, but I don't think
this is the clearest way to do things. GDB Coding Standards also do not
allow assignments within conditions. I suggest instead:

        current_thread = th;
        if (!current_thread)
          thread_rec (thread_id, TRUE);

>  out:
> -  return retval;
> +  return (int) thread_id;

I'm wondering whether the cast is actually necessary?

Also, it looks like the function's comment might be stale:

    /* Get the next event from the child.  Return 1 if the event requires
       handling by WFI (or whatever).  */
    static int
    get_windows_debug_event (struct target_ops *ops,
                             int pid, struct target_waitstatus *ourstatus)

Would you mind fixing that?

Thanks,
  
Eli Zaretskii April 14, 2015, 2:38 p.m. UTC | #2
> Date: Tue, 14 Apr 2015 06:16:15 -0700
> From: Joel Brobecker <brobecker@adacore.com>
> Cc: gdb-patches@sourceware.org
> 
> Overall, the patch looks reasonable to me. But I think there are
> at least 3 independent changes, and it would be nice to split those
> two out, for a couple of reasons:
>   1. It allows you to explain the nature of the problem, from the user's
>      standpoint, that the patch is fixing (ie, what user-visible
>      symptoms it fixes);
>   2. it allows us to see how each problem is fixed, and to deal with
>      each issue separately.
> 
> The three issues I view as independent:
>   a. ignoring "invalid handle" errors;
>   b. unsetting saved_context.ContextFlags
>   c. the renaming of have_saved_context into signal_thread_id
>      so you can compare the current thread id with the saved
>      signal_thread_id.

Also, quite a few of the changes are in non-Cygwin parts, so some
explanation of the issues would be appreciated, to make sure the
changes are safe for the native builds as well.

Thanks.
  
Jon Turney April 16, 2015, 7:23 p.m. UTC | #3
Some cleanups in windows-nat.c and a fix to Cygwin-specific signal handling.

Using the 'catch-signal' test from the testsuite, on x86_64 Cygwin:

$ ./gdb testsuite/outputs/gdb.base/catch-signal/catch-signal.exe
[...]
(gdb) catch signal
Catchpoint 1 (standard signals)
(gdb) r
[...]
Catchpoint 1 (signal SIGHUP), main () at
../../../gdb/testsuite/gdb.base/catch-signal.c:40
40        raise (SIGHUP);               /* second HUP */
(gdb) c
Continuing.
[hangs]

with up to [4/5] applied:

$ ./gdb testsuite/outputs/gdb.base/catch-signal/catch-signal.exe
[...]
(gdb) catch signal
Catchpoint 1 (standard signals)
(gdb) r
[...]
Catchpoint 1 (signal SIGHUP), main () at
../../../gdb/testsuite/gdb.base/catch-signal.c:40
40        raise (SIGHUP);               /* second HUP */
(gdb) c
Continuing.
main () at ../../../gdb/testsuite/gdb.base/catch-signal.c:40
40        raise (SIGHUP);               /* second HUP */
ContinueDebugEvent failed, GetLastError = 87
(gdb)

It can be seen that changing current_event.dwThreadId as 
handle_output_debug_string does when processing a Cygwin signal message, leads 
to ContinueDebugEvent() being applied to the wrong thead, with tragic 
consequences.

with [5/5] applied:

$ ./gdb testsuite/outputs/gdb.base/catch-signal/catch-signal.exe
[...]
(gdb) catch signal
Catchpoint 1 (standard signals)
(gdb) r
[...]
Catchpoint 1 (signal SIGHUP), main () at
../../../gdb/testsuite/gdb.base/catch-signal.c:40
40        raise (SIGHUP);               /* second HUP */
(gdb) c
Continuing.
Catchpoint 1 (signal SIGHUP), main () at
../../../gdb/testsuite/gdb.base/catch-signal.c:42
42        raise (SIGHUP);               /* third HUP */
(gdb)

It looks like tests don't finish with an unresponsive gdb, and there are more 
tests in the testsuite which have this problem than my computer has cores, so 
eventually it ends up stuck on those tests, and the test run doesn't finish.

After these patches, there still remain some problems with Cygwin-specific 
signal handling, which hopefully I can address in future patches.

Jon Turney (5):
  windows-nat: Don't use ternary conditional operator in
    get_windows_debug_event
  windows-nat: Cleanups in get_windows_debug_event
  windows-nat: Fix misspelling in debug output
  windows-nat: Report an error if ContinueDebugEvent() fails
  windows-nat: Don't change current_event.dwThreadId in
    handle_output_debug_string()

 gdb/ChangeLog     | 26 ++++++++++++++++++++++++++
 gdb/windows-nat.c | 41 +++++++++++++++++++++++------------------
 2 files changed, 49 insertions(+), 18 deletions(-)
  
Jon Turney April 16, 2015, 7:24 p.m. UTC | #4
Thanks for taking the time to look at this.

On 14/04/2015 14:16, Joel Brobecker wrote:
>
> Overall, the patch looks reasonable to me. But I think there are
> at least 3 independent changes, and it would be nice to split those
> two out, for a couple of reasons:
>    1. It allows you to explain the nature of the problem, from the user's
>       standpoint, that the patch is fixing (ie, what user-visible
>       symptoms it fixes);
>    2. it allows us to see how each problem is fixed, and to deal with
>       each issue separately.

Ok.  Unfortunately I don't have access to the original problems which 
inspired these patches, so I have reduced the scope to a problem I can 
easily demonstrate and split things up further.

>
> The three issues I view as independent:
>    a. ignoring "invalid handle" errors;

Yes, this is an unrelated change I should have excluded.

This was discussed somewhat in the thread starting at 
https://cygwin.com/ml/gdb-patches/2013-06/msg00237.html

>    b. unsetting saved_context.ContextFlags
>    c. the renaming of have_saved_context into signal_thread_id
>       so you can compare the current thread id with the saved
>       signal_thread_id.
>
> A few minor comments below.
>
>> @@ -849,8 +853,12 @@ handle_output_debug_string (struct target_waitstatus *ourstatus)
>>   					 &saved_context,
>>   					 __COPY_CONTEXT_SIZE, &n)
>>   		   && n == __COPY_CONTEXT_SIZE)
>> -	    have_saved_context = 1;
>> -	  current_event.dwThreadId = retval;
>> +	    {
>> +	      signal_thread_id = retval;
>> +	      saved_context.ContextFlags = 0;  /* Don't attempt to call SetThreadContext */
>
> Can you move the comment just above the statement so as to avoid
> exceeding the maximum line length, and also explain why we should
> not call SetThreadContext?

I've dropped this change for the moment.

>> @@ -1330,7 +1338,6 @@ get_windows_debug_event (struct target_ops *ops,
>>     event_code = current_event.dwDebugEventCode;
>>     ourstatus->kind = TARGET_WAITKIND_SPURIOUS;
>>     th = NULL;
>> -  have_saved_context = 0;
>
> Normally, there is a corresponding addition that sets signal_thread_id
> to zero.  This leads me to be believe that there might be an additional
> user-visible symptom that this patch fixes?

I'm sorry I don't follow you.

have_saved_context is reset in do_windows_fetch_inferior_registers() 
when the context is actually retrieved.

It seems odd to reset it in get_windows_debug_event() as it seems that 
any other Windows debug event between the Cygwin signal message and the 
do_windows_fetch_inferior_registers() will prevent the saved context 
being used, if it's possible for that to happen.

>> @@ -1501,12 +1508,12 @@ get_windows_debug_event (struct target_ops *ops,
>>     else
>>       {
>>         inferior_ptid = ptid_build (current_event.dwProcessId, 0,
>> -				  retval);
>> -      current_thread = th ?: thread_rec (current_event.dwThreadId, TRUE);
>> +				  thread_id);
>> +      current_thread = th ?: thread_rec (thread_id, TRUE);
>
> I know you are just repeating what was there before, but I don't think
> this is the clearest way to do things. GDB Coding Standards also do not
> allow assignments within conditions. I suggest instead:
>
>          current_thread = th;
>          if (!current_thread)
>            thread_rec (thread_id, TRUE);

I've done this (with the last line as an assignment), but you'll have to 
help me find where the Coding Standard says that.

[1] mentions not using assignments inside if conditions, but I don't see 
any mention of ?:

[1] http://www.gnu.org/prep/standards/standards.html#Syntactic-Conventions

>>   out:
>> -  return retval;
>> +  return (int) thread_id;
>
> I'm wondering whether the cast is actually necessary?

On third thoughts, I think not.

> Also, it looks like the function's comment might be stale:
>
>      /* Get the next event from the child.  Return 1 if the event requires
>         handling by WFI (or whatever).  */
>      static int
>      get_windows_debug_event (struct target_ops *ops,
>                               int pid, struct target_waitstatus *ourstatus)
>
> Would you mind fixing that?

Done.
  
Joel Brobecker April 22, 2015, 2:23 p.m. UTC | #5
> >         current_thread = th;
> >         if (!current_thread)
> >           thread_rec (thread_id, TRUE);
> 
> I've done this (with the last line as an assignment), but you'll
> have to help me find where the Coding Standard says that.
> 
> [1] mentions not using assignments inside if conditions, but I don't
> see any mention of ?:
> 
> [1] http://www.gnu.org/prep/standards/standards.html#Syntactic-Conventions

I think I got slightly confused by that one, as to what the "?"
applied to. But I guess that could be seen as a good reason for
avoiding it in this case :-P. Thanks for doing it anyways, even
though the reasons for it were not as strong as I thought.
  
Jon Turney June 10, 2015, 1:06 p.m. UTC | #6
On 16/04/2015 20:23, Jon Turney wrote:
> After these patches, there still remain some problems with Cygwin-specific
> signal handling, which hopefully I can address in future patches.

The remaining problem appears to be that since the signal context
reported by Cygwin is the context which will be resumed after any signal
handler has been run, to restore it to the inferior (as master currently
does) skips over the actual signal delivery and handling.

Cygwin currently carries a patch which clears the ContextFlags in the
signal context, so we never attempt to restore it to the inferior.

The test-suite reveals that isn't quite right, either, unfortunately.

In some cases, GDB decides it needs to single-step when continuing from
the signal, which requires it update the inferior's context with
FLAG_TRACE_BIT set, which can only currently happen when ContextFlags
has the CONTEXT_CONTROL flag set.

This seems to me to be not straightforward to fix. It's not entirely
clear to me that Cygwin's current behaviour is correct, either.
  

Patch

diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c
index fd31083..5c191de 100644
--- a/gdb/windows-nat.c
+++ b/gdb/windows-nat.c
@@ -118,8 +118,8 @@  static COORD WINAPI (*GetConsoleFontSize) (HANDLE, DWORD);
 #   define bad_GetModuleFileNameEx bad_GetModuleFileNameExW
 #endif
 
-static int have_saved_context;	/* True if we've saved context from a
-				   cygwin signal.  */
+static DWORD  signal_thread_id;	/* Non-zero thread id if we have a saved
+				   context from a cygwin signal.  */
 static CONTEXT saved_context;	/* Containes the saved context from a
 				   cygwin signal.  */
 
@@ -301,7 +301,8 @@  thread_rec (DWORD id, int get_context)
       {
 	if (!th->suspended && get_context)
 	  {
-	    if (get_context > 0 && id != current_event.dwThreadId)
+	    if (get_context > 0 && id != current_event.dwThreadId
+		&& id != signal_thread_id)
 	      {
 		if (SuspendThread (th->h) == (DWORD) -1)
 		  {
@@ -310,8 +311,11 @@  thread_rec (DWORD id, int get_context)
 		    /* We get Access Denied (5) when trying to suspend
 		       threads that Windows started on behalf of the
 		       debuggee, usually when those threads are just
-		       about to exit.  */
-		    if (err != ERROR_ACCESS_DENIED)
+		       about to exit.
+		       We can get Invalid Handle (6) if the main thread
+		       has exited. */
+		    if (err != ERROR_INVALID_HANDLE
+			&& err != ERROR_ACCESS_DENIED)
 		      warning (_("SuspendThread (tid=0x%x) failed."
 				 " (winerr %u)"),
 			       (unsigned) id, (unsigned) err);
@@ -433,7 +437,7 @@  do_windows_fetch_inferior_registers (struct regcache *regcache, int r)
   if (current_thread->reload_context)
     {
 #ifdef __COPY_CONTEXT_SIZE
-      if (have_saved_context)
+      if (signal_thread_id)
 	{
 	  /* Lie about where the program actually is stopped since
 	     cygwin has informed us that we should consider the signal
@@ -441,7 +445,7 @@  do_windows_fetch_inferior_registers (struct regcache *regcache, int r)
 	     "saved_context.  */
 	  memcpy (&current_thread->context, &saved_context,
 		  __COPY_CONTEXT_SIZE);
-	  have_saved_context = 0;
+	  signal_thread_id = 0;
 	}
       else
 #endif
@@ -849,8 +853,12 @@  handle_output_debug_string (struct target_waitstatus *ourstatus)
 					 &saved_context,
 					 __COPY_CONTEXT_SIZE, &n)
 		   && n == __COPY_CONTEXT_SIZE)
-	    have_saved_context = 1;
-	  current_event.dwThreadId = retval;
+	    {
+	      signal_thread_id = retval;
+	      saved_context.ContextFlags = 0;  /* Don't attempt to call SetThreadContext */
+	    }
+	  else
+	    retval = 0;
 	}
     }
 #endif
@@ -1317,7 +1325,7 @@  get_windows_debug_event (struct target_ops *ops,
   DWORD continue_status, event_code;
   windows_thread_info *th;
   static windows_thread_info dummy_thread_info;
-  int retval = 0;
+  DWORD thread_id = 0;
 
   last_sig = GDB_SIGNAL_0;
 
@@ -1330,7 +1338,6 @@  get_windows_debug_event (struct target_ops *ops,
   event_code = current_event.dwDebugEventCode;
   ourstatus->kind = TARGET_WAITKIND_SPURIOUS;
   th = NULL;
-  have_saved_context = 0;
 
   switch (event_code)
     {
@@ -1348,14 +1355,14 @@  get_windows_debug_event (struct target_ops *ops,
 	      /* Kludge around a Windows bug where first event is a create
 		 thread event.  Caused when attached process does not have
 		 a main thread.  */
-	      retval = fake_create_process ();
-	      if (retval)
+	      thread_id = fake_create_process ();
+	      if (thread_id)
 		saw_create++;
 	    }
 	  break;
 	}
       /* Record the existence of this thread.  */
-      retval = current_event.dwThreadId;
+      thread_id = current_event.dwThreadId;
       th = windows_add_thread (ptid_build (current_event.dwProcessId, 0,
 					 current_event.dwThreadId),
 			     current_event.u.CreateThread.hThread,
@@ -1398,7 +1405,7 @@  get_windows_debug_event (struct target_ops *ops,
 					   current_event.dwThreadId),
 	     current_event.u.CreateProcessInfo.hThread,
 	     current_event.u.CreateProcessInfo.lpThreadLocalBase);
-      retval = current_event.dwThreadId;
+      thread_id = current_event.dwThreadId;
       break;
 
     case EXIT_PROCESS_DEBUG_EVENT:
@@ -1417,7 +1424,7 @@  get_windows_debug_event (struct target_ops *ops,
 	{
 	  ourstatus->kind = TARGET_WAITKIND_EXITED;
 	  ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode;
-	  retval = main_thread_id;
+	  thread_id = main_thread_id;
 	}
       break;
 
@@ -1432,7 +1439,7 @@  get_windows_debug_event (struct target_ops *ops,
       catch_errors (handle_load_dll, NULL, (char *) "", RETURN_MASK_ALL);
       ourstatus->kind = TARGET_WAITKIND_LOADED;
       ourstatus->value.integer = 0;
-      retval = main_thread_id;
+      thread_id = main_thread_id;
       break;
 
     case UNLOAD_DLL_DEBUG_EVENT:
@@ -1445,7 +1452,7 @@  get_windows_debug_event (struct target_ops *ops,
       catch_errors (handle_unload_dll, NULL, (char *) "", RETURN_MASK_ALL);
       ourstatus->kind = TARGET_WAITKIND_LOADED;
       ourstatus->value.integer = 0;
-      retval = main_thread_id;
+      thread_id = main_thread_id;
       break;
 
     case EXCEPTION_DEBUG_EVENT:
@@ -1461,7 +1468,7 @@  get_windows_debug_event (struct target_ops *ops,
 	  continue_status = DBG_EXCEPTION_NOT_HANDLED;
 	  break;
 	case 1:
-	  retval = current_event.dwThreadId;
+	  thread_id = current_event.dwThreadId;
 	  break;
 	case -1:
 	  last_sig = 1;
@@ -1477,7 +1484,7 @@  get_windows_debug_event (struct target_ops *ops,
 		     "OUTPUT_DEBUG_STRING_EVENT"));
       if (saw_create != 1)
 	break;
-      retval = handle_output_debug_string (ourstatus);
+      thread_id = handle_output_debug_string (ourstatus);
       break;
 
     default:
@@ -1491,7 +1498,7 @@  get_windows_debug_event (struct target_ops *ops,
       break;
     }
 
-  if (!retval || saw_create != 1)
+  if (!thread_id || saw_create != 1)
     {
       if (continue_status == -1)
 	windows_resume (ops, minus_one_ptid, 0, 1);
@@ -1501,12 +1508,12 @@  get_windows_debug_event (struct target_ops *ops,
   else
     {
       inferior_ptid = ptid_build (current_event.dwProcessId, 0,
-				  retval);
-      current_thread = th ?: thread_rec (current_event.dwThreadId, TRUE);
+				  thread_id);
+      current_thread = th ?: thread_rec (thread_id, TRUE);
     }
 
 out:
-  return retval;
+  return (int) thread_id;
 }
 
 /* Wait for interesting events to occur in the target process.  */