diff mbox

Fix "PC register is not available" issue

Message ID 834n2kztfw.fsf@gnu.org
State Superseded
Headers show

Commit Message

Eli Zaretskii March 26, 2014, 6:49 p.m. UTC
This describes the results of my looking into this issue, given the
comments and suggestions by Joel and Pedro.  Sorry about the length.

> I didn't mean to change the behavior - only hide the warning.
> In this case, if it is normal that we can't suspend the thread,
> then there is no point in warning (scaring) the user about it.
> I would only generate a warning if something abnormal that we should
> fix occured.

The patch near the end of this message indeed includes code to ignore
the warning in these cases.

> I see that the GetThreadContext call (do_windows_fetch_inferior_registers)
> doesn't check for errors (I think it should (*)).  It'd be interesting to know whether gdb can
> actually read the registers off of this thread, and if so, what's the
> thread's backtrace like.

I added CHECK to that call to GetThreadContext.  It never produced a
warning in all my testing, and it looks like we do succeed to get the
registers.  At least the registers of 2 such threads show different
contents, and the EIP value is consistent with what "info threads"
displays.

> >> and if so, what's the thread's backtrace like.
> > 
> > Why would we be interested in that info?
> 
> It'll likely show us the thread is stopped at some ntdll.dll function
> or some such, and from the function name we will likely
> be able to infer what/which thread is this, like, e.g., whether
> it's a thread injected with DebugBreakProcess or some such
> (internally by one of the system dlls or the dlls your app
> links with).

I can show you 2 typical examples.  This is from Emacs, where the
application has 3 threads, and one more is started by the debugger.
The rest, threads 5 and 6 in these examples, are those mysterious
threads we are talking about.

  (gdb) info threads
    Id   Target Id         Frame
    6    Thread 15492.0x1f28 0x77a41f46 in ntdll!ZwWaitForWorkViaWorkerFactory
      () from C:\Windows\system32\ntdll.dll
    5    Thread 15492.0x73c0 0x77a41f46 in ntdll!ZwWaitForWorkViaWorkerFactory
      () from C:\Windows\system32\ntdll.dll
    4    Thread 15492.0x2300 0x75ac78d7 in USER32!DispatchMessageW ()
     from C:\Windows\syswow64\user32.dll
    3    Thread 15492.0x1860 0x77a3fd91 in ntdll!ZwDelayExecution ()
     from C:\Windows\system32\ntdll.dll
    2    Thread 15492.0x2410 0x77a4015d in ntdll!ZwWaitForMultipleObjects ()
     from C:\Windows\system32\ntdll.dll
  * 1    Thread 15492.0x44a0 cleanup_vector (vector=0x62daeb0) at alloc.c:2917

  (gdb) info threads
    Id   Target Id         Frame
    6    Thread 15492.0x1f28 0x77a3f8d1 in ntdll!ZwWaitForSingleObject ()
     from C:\Windows\system32\ntdll.dll
    5    Thread 15492.0x73c0 0x77a72880 in ntdll!RtlFillMemoryUlong ()
     from C:\Windows\system32\ntdll.dll
    4    Thread 15492.0x2300 0x75ac78d7 in USER32!DispatchMessageW ()
     from C:\Windows\syswow64\user32.dll
    3    Thread 15492.0x1860 0x77a3fd91 in ntdll!ZwDelayExecution ()
     from C:\Windows\system32\ntdll.dll
    2    Thread 15492.0x2410 0x77a4015d in ntdll!ZwWaitForMultipleObjects ()
     from C:\Windows\system32\ntdll.dll
  * 1    Thread 15492.0x44a0 cleanup_vector (vector=0x388ca58) at alloc.c:2917

The first display is what I usually see: several (I've seen up to 4)
threads waiting inside ZwWaitForWorkViaWorkerFactory.  But sometimes
they do perform some work, as can be seen from the second display.

My theory is that we get those Access Denied (winerr = 5) errors when
we try to suspend these threads when they are about to exit.  That's
because I normally see a "Thread exited" message immediately after the
warning with the error code.  However, testing whether the thread
exited, and avoiding the SuspendThread call if it did, didn't stop the
warnings, so I guess the issue is more subtle.  E.g., it could be that
during the final stages of thread shutdown, the thread runs some
privileged code or something, which precludes suspension.  It might
also be related to the fact that we are debugging 32-bit threads on a
64-bit OS, so WOW64 is at work here.

In any case, the patch below ignores Access Denied errors from
SuspendThread.

In addition, I tried to solve warnings like these:

  error return windows-nat.c:1306 was 5
  [Thread 15492.0x68a0 exited with code 0]
  (gdb) q
  A debugging session is active.

	  Inferior 1 [process 15492] will be killed.

  Quit anyway? (y or n) y
  error return windows-nat.c:1306 was 5

These come from SetThreadContext in windows_continue.  The second
occurrence is because we already killed the inferior by calling
TerminateProcess, so its threads begin to shut down, and trying to set
their context might indeed fail.  The first warning is about one of
those same threads, and again happens just before the thread exit is
announced.

My solution, which you can see below, is (a) pass an additional
parameter to windows_continue telling it that the inferior was killed,
in which case it doesn't bother checking errors from the
SetThreadContext call; and (b) check if the thread already exited, and
if so, avoid calling SetThreadContext on it.  This completely avoids
the warning when killing the inferior, and removes most (but not all)
of the warnings for the other threads.

Finally, here's the full patch.  I hope this research answered all the
questions, and we can now get the patch in.

Comments

Joel Brobecker March 27, 2014, 12:56 p.m. UTC | #1
Hi Eli,

Thanks for doing all this.

> I added CHECK to that call to GetThreadContext.  It never produced a
> warning in all my testing, and it looks like we do succeed to get the
> registers.  At least the registers of 2 such threads show different
> contents, and the EIP value is consistent with what "info threads"
> displays.

I think "info threads" uses those register values to produce that
listing, so the consistency would be expected. But as long as you
have meaningful values, I think that's some evidence of success.

> My theory is that we get those Access Denied (winerr = 5) errors when
> we try to suspend these threads when they are about to exit.  That's
> because I normally see a "Thread exited" message immediately after the
> warning with the error code.  However, testing whether the thread
> exited, and avoiding the SuspendThread call if it did, didn't stop the
> warnings, so I guess the issue is more subtle.  E.g., it could be that
> during the final stages of thread shutdown, the thread runs some
> privileged code or something, which precludes suspension.

Perhaps a simpler theory is that these threads are typically
short-lived, and so you'd be always be seeing their exit soon
after they are created.

> In any case, the patch below ignores Access Denied errors from
> SuspendThread.

I think that makes sense.

> In addition, I tried to solve warnings like these:
> 
>   error return windows-nat.c:1306 was 5
>   [Thread 15492.0x68a0 exited with code 0]
>   (gdb) q
>   A debugging session is active.
> 
> 	  Inferior 1 [process 15492] will be killed.
> 
>   Quit anyway? (y or n) y
>   error return windows-nat.c:1306 was 5

Yay! :)

> These come from SetThreadContext in windows_continue.  The second
> occurrence is because we already killed the inferior by calling
> TerminateProcess, so its threads begin to shut down, and trying to set
> their context might indeed fail.  The first warning is about one of
> those same threads, and again happens just before the thread exit is
> announced.
> 
> My solution, which you can see below, is (a) pass an additional
> parameter to windows_continue telling it that the inferior was killed,
> in which case it doesn't bother checking errors from the
> SetThreadContext call; and (b) check if the thread already exited, and
> if so, avoid calling SetThreadContext on it.  This completely avoids
> the warning when killing the inferior, and removes most (but not all)
> of the warnings for the other threads.

I am missing the command that caused the first error. From what you
are saying, was it "kill"? If it was, it seems to me that if we
cured the first error, the second one would not happen. Otherwise,
it'd be interesting to understand why we send a TerminateProcess
first, and then try to SetThreadContext later on. It does not seem
to make sense.

Perhaps one way to address the problem more globally is to have
a version of the CHECK macro that ignores access-denied errors,
and use this one on operations where we know we might get that
error?

> Finally, here's the full patch.  I hope this research answered all the
> questions, and we can now get the patch in.

I'm good with the first half of the patch that handles SuspendThread
more gracefully. For the additional argument to windows_continue,
I propose we handle that as a separate patch. It's the right thing
to do procedurally anyway, and it'll give us a chance to investigate
more without holding the first half up.

Pedro, WDYT?

> 
> --- gdb/windows-nat.c~0	2014-02-06 04:21:29.000000000 +0200
> +++ gdb/windows-nat.c	2014-03-25 19:18:16.814250000 +0200
> @@ -310,12 +310,18 @@ thread_rec (DWORD id, int get_context)
>  		  {
>  		    DWORD err = GetLastError ();
>  
> -		    warning (_("SuspendThread (tid=0x%x) failed."
> -			       " (winerr %u)"),
> -			     (unsigned) id, (unsigned) err);
> -		    return NULL;
> +		    /* We get Access Denied (5) when trying to suspend
> +		       threads that Windows started on behalf of the
> +		       debuggee, usually when those threads are just
> +		       about to exit.  */
> +		    if (err != ERROR_ACCESS_DENIED)
> +		      warning (_("SuspendThread (tid=0x%x) failed."
> +				 " (winerr %u)"),
> +			       (unsigned) id, (unsigned) err);
> +		    th->suspended = -1;
>  		  }
> -		th->suspended = 1;
> +		else
> +		  th->suspended = 1;
>  	      }
>  	    else if (get_context < 0)
>  	      th->suspended = -1;
> @@ -445,7 +451,7 @@ do_windows_fetch_inferior_registers (str
>  	{
>  	  thread_info *th = current_thread;
>  	  th->context.ContextFlags = CONTEXT_DEBUGGER_DR;
> -	  GetThreadContext (th->h, &th->context);
> +	  CHECK (GetThreadContext (th->h, &th->context));
>  	  /* Copy dr values from that thread.
>  	     But only if there were not modified since last stop.
>  	     PR gdb/2388 */
> @@ -1266,10 +1272,12 @@ handle_exception (struct target_waitstat
>    return 1;
>  }
>  
> -/* Resume all artificially suspended threads if we are continuing
> -   execution.  */
> +/* Resume thread specified by ID, or all artificially suspended
> +   threads, if we are continuing execution.  KILLED non-zero means we
> +   have killed the inferior, so we should ignore weird errors due to
> +   threads shutting down.  */
>  static BOOL
> -windows_continue (DWORD continue_status, int id)
> +windows_continue (DWORD continue_status, int id, int killed)
>  {
>    int i;
>    thread_info *th;
> @@ -1297,7 +1305,16 @@ windows_continue (DWORD continue_status,
>  	  }
>  	if (th->context.ContextFlags)
>  	  {
> -	    CHECK (SetThreadContext (th->h, &th->context));
> +	    DWORD ec = 0;
> +
> +	    if (GetExitCodeThread (th->h, &ec)
> +		&& ec == STILL_ACTIVE)
> +	      {
> +		if (killed)
> +		  SetThreadContext (th->h, &th->context);
> +		else
> +		  CHECK (SetThreadContext (th->h, &th->context));
> +	      }
>  	    th->context.ContextFlags = 0;
>  	  }
>  	if (th->suspended > 0)
> @@ -1425,9 +1442,9 @@ windows_resume (struct target_ops *ops,
>       Otherwise complain.  */
>  
>    if (resume_all)
> -    windows_continue (continue_status, -1);
> +    windows_continue (continue_status, -1, 0);
>    else
> -    windows_continue (continue_status, ptid_get_tid (ptid));
> +    windows_continue (continue_status, ptid_get_tid (ptid), 0);
>  }
>  
>  /* Ctrl-C handler used when the inferior is not run in the same console.  The
> @@ -1645,7 +1662,7 @@ get_windows_debug_event (struct target_o
>        if (continue_status == -1)
>  	windows_resume (ops, minus_one_ptid, 0, 1);
>        else
> -	CHECK (windows_continue (continue_status, -1));
> +	CHECK (windows_continue (continue_status, -1, 0));
>      }
>    else
>      {
> @@ -2382,13 +2399,13 @@ windows_create_inferior (struct target_o
>  
>    do_initial_windows_stuff (ops, pi.dwProcessId, 0);
>  
> -  /* windows_continue (DBG_CONTINUE, -1); */
> +  /* windows_continue (DBG_CONTINUE, -1, 0); */
>  }
>  
>  static void
>  windows_mourn_inferior (struct target_ops *ops)
>  {
> -  (void) windows_continue (DBG_CONTINUE, -1);
> +  (void) windows_continue (DBG_CONTINUE, -1, 0);
>    i386_cleanup_dregs();
>    if (open_process_used)
>      {
> @@ -2456,7 +2473,7 @@ windows_kill_inferior (struct target_ops
>  
>    for (;;)
>      {
> -      if (!windows_continue (DBG_CONTINUE, -1))
> +      if (!windows_continue (DBG_CONTINUE, -1, 1))
>  	break;
>        if (!WaitForDebugEvent (&current_event, INFINITE))
>  	break;
>
Eli Zaretskii March 27, 2014, 5:41 p.m. UTC | #2
> Date: Thu, 27 Mar 2014 05:56:46 -0700
> From: Joel Brobecker <brobecker@adacore.com>
> Cc: Pedro Alves <palves@redhat.com>, gdb-patches@sourceware.org
> 
> > My theory is that we get those Access Denied (winerr = 5) errors when
> > we try to suspend these threads when they are about to exit.  That's
> > because I normally see a "Thread exited" message immediately after the
> > warning with the error code.  However, testing whether the thread
> > exited, and avoiding the SuspendThread call if it did, didn't stop the
> > warnings, so I guess the issue is more subtle.  E.g., it could be that
> > during the final stages of thread shutdown, the thread runs some
> > privileged code or something, which precludes suspension.
> 
> Perhaps a simpler theory is that these threads are typically
> short-lived, and so you'd be always be seeing their exit soon
> after they are created.

They live for several minutes, so not so short-lived.

> > In addition, I tried to solve warnings like these:
> > 
> >   error return windows-nat.c:1306 was 5
> >   [Thread 15492.0x68a0 exited with code 0]
> >   (gdb) q
> >   A debugging session is active.
> > 
> > 	  Inferior 1 [process 15492] will be killed.
> > 
> >   Quit anyway? (y or n) y
> >   error return windows-nat.c:1306 was 5
> 
> Yay! :)
> 
> > These come from SetThreadContext in windows_continue.  The second
> > occurrence is because we already killed the inferior by calling
> > TerminateProcess, so its threads begin to shut down, and trying to set
> > their context might indeed fail.  The first warning is about one of
> > those same threads, and again happens just before the thread exit is
> > announced.
> > 
> > My solution, which you can see below, is (a) pass an additional
> > parameter to windows_continue telling it that the inferior was killed,
> > in which case it doesn't bother checking errors from the
> > SetThreadContext call; and (b) check if the thread already exited, and
> > if so, avoid calling SetThreadContext on it.  This completely avoids
> > the warning when killing the inferior, and removes most (but not all)
> > of the warnings for the other threads.
> 
> I am missing the command that caused the first error. From what you
> are saying, was it "kill"?

No, it was "continue", which caused the inferior to run until a
breakpoint, at which point I typed "q".  The thread exit event
happened while the inferior was running.

> it'd be interesting to understand why we send a TerminateProcess
> first, and then try to SetThreadContext later on. It does not seem
> to make sense.

That happens because windows_kill_inferior does this:

  static void
  windows_kill_inferior (struct target_ops *ops)
  {
    CHECK (TerminateProcess (current_process_handle, 0));

    for (;;)
      {
	if (!windows_continue (DBG_CONTINUE, -1, 1))
	  break;
	if (!WaitForDebugEvent (&current_event, INFINITE))
	  break;
	if (current_event.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
	  break;
      }

    target_mourn_inferior ();	/* Or just windows_mourn_inferior?  */
  }

IOW, we resume the inferior (perhaps to collect all the debug
events?).

> Perhaps one way to address the problem more globally is to have
> a version of the CHECK macro that ignores access-denied errors,
> and use this one on operations where we know we might get that
> error?

We only have one or 2 places for that right now, so I wouldn't think a
separate macro is justified.

> > Finally, here's the full patch.  I hope this research answered all the
> > questions, and we can now get the patch in.
> 
> I'm good with the first half of the patch that handles SuspendThread
> more gracefully. For the additional argument to windows_continue,
> I propose we handle that as a separate patch. It's the right thing
> to do procedurally anyway, and it'll give us a chance to investigate
> more without holding the first half up.

Given what I told above, what additional investigations are needed?

Note that the second part is not entirely separate: those phantom
threads hit the problem with SetThreadContext as well, and checking
whether the thread already exited does let through fewer of those
warnings.
Joel Brobecker March 28, 2014, 12:59 p.m. UTC | #3
> > it'd be interesting to understand why we send a TerminateProcess
> > first, and then try to SetThreadContext later on. It does not seem
> > to make sense.
> 
> That happens because windows_kill_inferior does this:
> 
>   static void
>   windows_kill_inferior (struct target_ops *ops)
>   {
>     CHECK (TerminateProcess (current_process_handle, 0));
> 
>     for (;;)
>       {
> 	if (!windows_continue (DBG_CONTINUE, -1, 1))
> 	  break;
> 	if (!WaitForDebugEvent (&current_event, INFINITE))
> 	  break;
> 	if (current_event.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
> 	  break;
>       }
> 
>     target_mourn_inferior ();	/* Or just windows_mourn_inferior?  */
>   }
> 
> IOW, we resume the inferior (perhaps to collect all the debug
> events?).

I see - I didn't realize we were doing that, but a quick read
of the corresponding MSDN page confirms that this is necessary:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686714(v=vs.85).aspx

> Given what I told above, what additional investigations are needed?
> 
> Note that the second part is not entirely separate: those phantom
> threads hit the problem with SetThreadContext as well, and checking
> whether the thread already exited does let through fewer of those
> warnings.

In light of everything, I have no further comment at the moment
regarding your latest patch. Perhaps one tiny detail:

> +             if (killed)
> +               SetThreadContext (th->h, &th->context);
> +             else
> +               CHECK (SetThreadContext (th->h, &th->context));

Rather than duplicate the call to SetThreadContext, perhaps
another way of doing it would be:

    DWORD status;

    status = SetThreadContext (th->h, &th->context);
    if (!killed)
      CHECK (status)

(not a strong suggestion, feel free to ignore)
Pedro Alves March 28, 2014, 2:50 p.m. UTC | #4
On 03/26/2014 06:49 PM, Eli Zaretskii wrote:
> This describes the results of my looking into this issue, given the
> comments and suggestions by Joel and Pedro.  Sorry about the length.
> 
>> I didn't mean to change the behavior - only hide the warning.
>> In this case, if it is normal that we can't suspend the thread,
>> then there is no point in warning (scaring) the user about it.
>> I would only generate a warning if something abnormal that we should
>> fix occured.
> 
> The patch near the end of this message indeed includes code to ignore
> the warning in these cases.
> 
>> I see that the GetThreadContext call (do_windows_fetch_inferior_registers)
>> doesn't check for errors (I think it should (*)).  It'd be interesting to know whether gdb can
>> actually read the registers off of this thread, and if so, what's the
>> thread's backtrace like.
> 
> I added CHECK to that call to GetThreadContext.  It never produced a
> warning in all my testing, and it looks like we do succeed to get the
> registers.  At least the registers of 2 such threads show different
> contents, and the EIP value is consistent with what "info threads"
> displays.

It isn't clear to me whether you're saying that you saw the
SuspendThread failure trigger in all your new testing, so that
we'd know for sure whether GetThreadContext suceeds in that case,
or whether it might have been that you just were "lucky" enough
to not trigger the SuspendThread failure issue.

Does your patch fix the test case in PR14018, without producing
a CHECK warning from the new CHECK in GetThreadContext you've
added?  That'd confirm it, I think.

> I can show you 2 typical examples.  This is from Emacs, where the
> application has 3 threads, and one more is started by the debugger.
> The rest, threads 5 and 6 in these examples, are those mysterious
> threads we are talking about.
> 
>   (gdb) info threads
>     Id   Target Id         Frame
>     6    Thread 15492.0x1f28 0x77a41f46 in ntdll!ZwWaitForWorkViaWorkerFactory
>       () from C:\Windows\system32\ntdll.dll
>     5    Thread 15492.0x73c0 0x77a41f46 in ntdll!ZwWaitForWorkViaWorkerFactory
>       () from C:\Windows\system32\ntdll.dll
>     4    Thread 15492.0x2300 0x75ac78d7 in USER32!DispatchMessageW ()
>      from C:\Windows\syswow64\user32.dll
>     3    Thread 15492.0x1860 0x77a3fd91 in ntdll!ZwDelayExecution ()
>      from C:\Windows\system32\ntdll.dll
>     2    Thread 15492.0x2410 0x77a4015d in ntdll!ZwWaitForMultipleObjects ()
>      from C:\Windows\system32\ntdll.dll
>   * 1    Thread 15492.0x44a0 cleanup_vector (vector=0x62daeb0) at alloc.c:2917
> 
>   (gdb) info threads
>     Id   Target Id         Frame
>     6    Thread 15492.0x1f28 0x77a3f8d1 in ntdll!ZwWaitForSingleObject ()
>      from C:\Windows\system32\ntdll.dll
>     5    Thread 15492.0x73c0 0x77a72880 in ntdll!RtlFillMemoryUlong ()
>      from C:\Windows\system32\ntdll.dll
>     4    Thread 15492.0x2300 0x75ac78d7 in USER32!DispatchMessageW ()
>      from C:\Windows\syswow64\user32.dll
>     3    Thread 15492.0x1860 0x77a3fd91 in ntdll!ZwDelayExecution ()
>      from C:\Windows\system32\ntdll.dll
>     2    Thread 15492.0x2410 0x77a4015d in ntdll!ZwWaitForMultipleObjects ()
>      from C:\Windows\system32\ntdll.dll
>   * 1    Thread 15492.0x44a0 cleanup_vector (vector=0x388ca58) at alloc.c:2917
>
> The first display is what I usually see: several (I've seen up to 4)
> threads waiting inside ZwWaitForWorkViaWorkerFactory.  But sometimes
> they do perform some work, as can be seen from the second display.

OK, but these don't appear to be backtraces taken right after
SuspendThread failed.  That is, you're showing that these threads
exist during your normal emacs debugging.  IOW, you're not showing
where the thread is actually stopped when the SuspendThread failed.

> In any case, the patch below ignores Access Denied errors from
> SuspendThread.
> 
> In addition, I tried to solve warnings like these:
> 
>   error return windows-nat.c:1306 was 5
>   [Thread 15492.0x68a0 exited with code 0]
>   (gdb) q
>   A debugging session is active.
> 
> 	  Inferior 1 [process 15492] will be killed.
> 
>   Quit anyway? (y or n) y
>   error return windows-nat.c:1306 was 5
> 
> These come from SetThreadContext in windows_continue.  The second
> occurrence is because we already killed the inferior by calling
> TerminateProcess, so its threads begin to shut down, and trying to set
> their context might indeed fail.  The first warning is about one of
> those same threads, and again happens just before the thread exit is
> announced.
> 
> My solution, which you can see below, is (a) pass an additional
> parameter to windows_continue telling it that the inferior was killed,
> in which case it doesn't bother checking errors from the
> SetThreadContext call; and (b) check if the thread already exited, and
> if so, avoid calling SetThreadContext on it.  This completely avoids
> the warning when killing the inferior, and removes most (but not all)
> of the warnings for the other threads.

Why bother calling SetThreadContext at all if we just killed
the process?  That is, it seems to me this would be little
less magical:

	    if (!killed
		&& GetExitCodeThread (th->h, &ec)
		&& ec == STILL_ACTIVE)
  	     CHECK (SetThreadContext (th->h, &th->context));

> Finally, here's the full patch.  I hope this research answered all the
> questions, and we can now get the patch in.

I'm not sure it did, but in any case the patch looks good to me.

Sounds like GDBserver might have this problem too.

Thanks,
Eli Zaretskii March 28, 2014, 5:29 p.m. UTC | #5
> Date: Fri, 28 Mar 2014 05:59:55 -0700
> From: Joel Brobecker <brobecker@adacore.com>
> Cc: palves@redhat.com, gdb-patches@sourceware.org
> 
> > +             if (killed)
> > +               SetThreadContext (th->h, &th->context);
> > +             else
> > +               CHECK (SetThreadContext (th->h, &th->context));
> 
> Rather than duplicate the call to SetThreadContext, perhaps
> another way of doing it would be:
> 
>     DWORD status;
> 
>     status = SetThreadContext (th->h, &th->context);
>     if (!killed)
>       CHECK (status)

Fine with me.
Eli Zaretskii March 28, 2014, 5:35 p.m. UTC | #6
> Date: Fri, 28 Mar 2014 14:50:31 +0000
> From: Pedro Alves <palves@redhat.com>
> CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
> 
> On 03/26/2014 06:49 PM, Eli Zaretskii wrote:
> > This describes the results of my looking into this issue, given the
> > comments and suggestions by Joel and Pedro.  Sorry about the length.
> > 
> >> I didn't mean to change the behavior - only hide the warning.
> >> In this case, if it is normal that we can't suspend the thread,
> >> then there is no point in warning (scaring) the user about it.
> >> I would only generate a warning if something abnormal that we should
> >> fix occured.
> > 
> > The patch near the end of this message indeed includes code to ignore
> > the warning in these cases.
> > 
> >> I see that the GetThreadContext call (do_windows_fetch_inferior_registers)
> >> doesn't check for errors (I think it should (*)).  It'd be interesting to know whether gdb can
> >> actually read the registers off of this thread, and if so, what's the
> >> thread's backtrace like.
> > 
> > I added CHECK to that call to GetThreadContext.  It never produced a
> > warning in all my testing, and it looks like we do succeed to get the
> > registers.  At least the registers of 2 such threads show different
> > contents, and the EIP value is consistent with what "info threads"
> > displays.
> 
> It isn't clear to me whether you're saying that you saw the
> SuspendThread failure trigger in all your new testing, so that
> we'd know for sure whether GetThreadContext suceeds in that case,
> or whether it might have been that you just were "lucky" enough
> to not trigger the SuspendThread failure issue.

The former.

> Does your patch fix the test case in PR14018, without producing
> a CHECK warning from the new CHECK in GetThreadContext you've
> added?

Yes.

> > I can show you 2 typical examples.  This is from Emacs, where the
> > application has 3 threads, and one more is started by the debugger.
> > The rest, threads 5 and 6 in these examples, are those mysterious
> > threads we are talking about.
> > 
> >   (gdb) info threads
> >     Id   Target Id         Frame
> >     6    Thread 15492.0x1f28 0x77a41f46 in ntdll!ZwWaitForWorkViaWorkerFactory
> >       () from C:\Windows\system32\ntdll.dll
> >     5    Thread 15492.0x73c0 0x77a41f46 in ntdll!ZwWaitForWorkViaWorkerFactory
> >       () from C:\Windows\system32\ntdll.dll
> >     4    Thread 15492.0x2300 0x75ac78d7 in USER32!DispatchMessageW ()
> >      from C:\Windows\syswow64\user32.dll
> >     3    Thread 15492.0x1860 0x77a3fd91 in ntdll!ZwDelayExecution ()
> >      from C:\Windows\system32\ntdll.dll
> >     2    Thread 15492.0x2410 0x77a4015d in ntdll!ZwWaitForMultipleObjects ()
> >      from C:\Windows\system32\ntdll.dll
> >   * 1    Thread 15492.0x44a0 cleanup_vector (vector=0x62daeb0) at alloc.c:2917
> > 
> >   (gdb) info threads
> >     Id   Target Id         Frame
> >     6    Thread 15492.0x1f28 0x77a3f8d1 in ntdll!ZwWaitForSingleObject ()
> >      from C:\Windows\system32\ntdll.dll
> >     5    Thread 15492.0x73c0 0x77a72880 in ntdll!RtlFillMemoryUlong ()
> >      from C:\Windows\system32\ntdll.dll
> >     4    Thread 15492.0x2300 0x75ac78d7 in USER32!DispatchMessageW ()
> >      from C:\Windows\syswow64\user32.dll
> >     3    Thread 15492.0x1860 0x77a3fd91 in ntdll!ZwDelayExecution ()
> >      from C:\Windows\system32\ntdll.dll
> >     2    Thread 15492.0x2410 0x77a4015d in ntdll!ZwWaitForMultipleObjects ()
> >      from C:\Windows\system32\ntdll.dll
> >   * 1    Thread 15492.0x44a0 cleanup_vector (vector=0x388ca58) at alloc.c:2917
> >
> > The first display is what I usually see: several (I've seen up to 4)
> > threads waiting inside ZwWaitForWorkViaWorkerFactory.  But sometimes
> > they do perform some work, as can be seen from the second display.
> 
> OK, but these don't appear to be backtraces taken right after
> SuspendThread failed.

Yes, they are after SuspendThread failed.

> Why bother calling SetThreadContext at all if we just killed
> the process?

See my other mail and Joel's response.

> > Finally, here's the full patch.  I hope this research answered all the
> > questions, and we can now get the patch in.
> 
> I'm not sure it did, but in any case the patch looks good to me.

If that's an approval, I will happily commit the changes.

> Sounds like GDBserver might have this problem too.

If there's an easy way to verify that, without having 2 systems
talking via some communications line, please tell how, and I will try
that.

Thanks.
Pedro Alves March 28, 2014, 5:49 p.m. UTC | #7
On 03/28/2014 05:35 PM, Eli Zaretskii wrote:
>> Date: Fri, 28 Mar 2014 14:50:31 +0000
>> From: Pedro Alves <palves@redhat.com>
>> CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
>>
>> On 03/26/2014 06:49 PM, Eli Zaretskii wrote:
>>> This describes the results of my looking into this issue, given the
>>> comments and suggestions by Joel and Pedro.  Sorry about the length.
>>>
>>>> I didn't mean to change the behavior - only hide the warning.
>>>> In this case, if it is normal that we can't suspend the thread,
>>>> then there is no point in warning (scaring) the user about it.
>>>> I would only generate a warning if something abnormal that we should
>>>> fix occured.
>>>
>>> The patch near the end of this message indeed includes code to ignore
>>> the warning in these cases.
>>>
>>>> I see that the GetThreadContext call (do_windows_fetch_inferior_registers)
>>>> doesn't check for errors (I think it should (*)).  It'd be interesting to know whether gdb can
>>>> actually read the registers off of this thread, and if so, what's the
>>>> thread's backtrace like.
>>>
>>> I added CHECK to that call to GetThreadContext.  It never produced a
>>> warning in all my testing, and it looks like we do succeed to get the
>>> registers.  At least the registers of 2 such threads show different
>>> contents, and the EIP value is consistent with what "info threads"
>>> displays.
>>
>> It isn't clear to me whether you're saying that you saw the
>> SuspendThread failure trigger in all your new testing, so that
>> we'd know for sure whether GetThreadContext suceeds in that case,
>> or whether it might have been that you just were "lucky" enough
>> to not trigger the SuspendThread failure issue.
> 
> The former.
>
>> Does your patch fix the test case in PR14018, without producing
>> a CHECK warning from the new CHECK in GetThreadContext you've
>> added?
> 
> Yes.

Ah, excellent!  Thank you.  Please remember adding the PR number
to the ChangeLog entry then.


>> Why bother calling SetThreadContext at all if we just killed
>> the process?
> 
> See my other mail and Joel's response.

Not sure what you mean.  TerminateProcess is asynchronous, and
we need to resume the inferior and collect the debug events
until we see the process terminate.  But, my question is
why would we write the thread's registers at all if we
just told it to die?  Seems to be we could just skip
calling SetThreadContext instead of calling it but
ignoring the result.

> 
>>> Finally, here's the full patch.  I hope this research answered all the
>>> questions, and we can now get the patch in.
>>
>> I'm not sure it did, but in any case the patch looks good to me.
> 
> If that's an approval, I will happily commit the changes.

It's an approval from my end.

>> Sounds like GDBserver might have this problem too.
> 
> If there's an easy way to verify that, without having 2 systems
> talking via some communications line, please tell how, and I will try
> that.

Sure, you can run gdbserver and gdb on the same machine, and connect
with tcp.  Just:

 $ gdbserver :9999 myprogram.exe

in one terminal, and:

 $ gdb myprogram.exe -ex "tar rem :9999" -ex "b main" -ex "c"

in another.
Eli Zaretskii March 28, 2014, 6:30 p.m. UTC | #8
> Date: Fri, 28 Mar 2014 17:49:13 +0000
> From: Pedro Alves <palves@redhat.com>
> CC: brobecker@adacore.com, gdb-patches@sourceware.org
> 
> >> Why bother calling SetThreadContext at all if we just killed
> >> the process?
> > 
> > See my other mail and Joel's response.
> 
> Not sure what you mean.  TerminateProcess is asynchronous, and
> we need to resume the inferior and collect the debug events
> until we see the process terminate.  But, my question is
> why would we write the thread's registers at all if we
> just told it to die?  Seems to be we could just skip
> calling SetThreadContext instead of calling it but
> ignoring the result.

If you say so, I don't know enough about this stuff.

> >> Sounds like GDBserver might have this problem too.
> > 
> > If there's an easy way to verify that, without having 2 systems
> > talking via some communications line, please tell how, and I will try
> > that.
> 
> Sure, you can run gdbserver and gdb on the same machine, and connect
> with tcp.  Just:
> 
>  $ gdbserver :9999 myprogram.exe
> 
> in one terminal, and:
> 
>  $ gdb myprogram.exe -ex "tar rem :9999" -ex "b main" -ex "c"
> 
> in another.

OK, will try that.
Eli Zaretskii March 31, 2014, 3:31 p.m. UTC | #9
> Date: Fri, 28 Mar 2014 21:30:10 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: brobecker@adacore.com, gdb-patches@sourceware.org
> 
> > Date: Fri, 28 Mar 2014 17:49:13 +0000
> > From: Pedro Alves <palves@redhat.com>
> > CC: brobecker@adacore.com, gdb-patches@sourceware.org
> > 
> > >> Why bother calling SetThreadContext at all if we just killed
> > >> the process?
> > > 
> > > See my other mail and Joel's response.
> > 
> > Not sure what you mean.  TerminateProcess is asynchronous, and
> > we need to resume the inferior and collect the debug events
> > until we see the process terminate.  But, my question is
> > why would we write the thread's registers at all if we
> > just told it to die?  Seems to be we could just skip
> > calling SetThreadContext instead of calling it but
> > ignoring the result.
> 
> If you say so, I don't know enough about this stuff.

Actually, upon second thought: we continue the inferior after
TerminateProcess call to let it be killed, right?  If so, shouldn't we
continue it with the right context?

> > >> Sounds like GDBserver might have this problem too.
> > > 
> > > If there's an easy way to verify that, without having 2 systems
> > > talking via some communications line, please tell how, and I will try
> > > that.
> > 
> > Sure, you can run gdbserver and gdb on the same machine, and connect
> > with tcp.  Just:
> > 
> >  $ gdbserver :9999 myprogram.exe
> > 
> > in one terminal, and:
> > 
> >  $ gdb myprogram.exe -ex "tar rem :9999" -ex "b main" -ex "c"
> > 
> > in another.
> 
> OK, will try that.

Funnily enough, I cannot get GDBserver to emit similar warnings in the
same situation.  I don't understand the reasons for that, since the
code is very similar, and with a single exception, we do check the
return values of calls to GetThreadContext, SetThreadContext, and
SuspendThread in GDBserver.  But the fact remains that no warnings
about these threads are ever seen when debugging remotely.  I do see
the extra threads under GDBserver as well.

Does anyone have any further ideas?
Eli Zaretskii April 5, 2014, 9:07 a.m. UTC | #10
Ping!  If there are no further suggestions, I'd like to commit the
changes posted in this thread.

> Date: Mon, 31 Mar 2014 18:31:43 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: brobecker@adacore.com, gdb-patches@sourceware.org
> 
> > Date: Fri, 28 Mar 2014 21:30:10 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: brobecker@adacore.com, gdb-patches@sourceware.org
> > 
> > > Date: Fri, 28 Mar 2014 17:49:13 +0000
> > > From: Pedro Alves <palves@redhat.com>
> > > CC: brobecker@adacore.com, gdb-patches@sourceware.org
> > > 
> > > >> Why bother calling SetThreadContext at all if we just killed
> > > >> the process?
> > > > 
> > > > See my other mail and Joel's response.
> > > 
> > > Not sure what you mean.  TerminateProcess is asynchronous, and
> > > we need to resume the inferior and collect the debug events
> > > until we see the process terminate.  But, my question is
> > > why would we write the thread's registers at all if we
> > > just told it to die?  Seems to be we could just skip
> > > calling SetThreadContext instead of calling it but
> > > ignoring the result.
> > 
> > If you say so, I don't know enough about this stuff.
> 
> Actually, upon second thought: we continue the inferior after
> TerminateProcess call to let it be killed, right?  If so, shouldn't we
> continue it with the right context?
> 
> > > >> Sounds like GDBserver might have this problem too.
> > > > 
> > > > If there's an easy way to verify that, without having 2 systems
> > > > talking via some communications line, please tell how, and I will try
> > > > that.
> > > 
> > > Sure, you can run gdbserver and gdb on the same machine, and connect
> > > with tcp.  Just:
> > > 
> > >  $ gdbserver :9999 myprogram.exe
> > > 
> > > in one terminal, and:
> > > 
> > >  $ gdb myprogram.exe -ex "tar rem :9999" -ex "b main" -ex "c"
> > > 
> > > in another.
> > 
> > OK, will try that.
> 
> Funnily enough, I cannot get GDBserver to emit similar warnings in the
> same situation.  I don't understand the reasons for that, since the
> code is very similar, and with a single exception, we do check the
> return values of calls to GetThreadContext, SetThreadContext, and
> SuspendThread in GDBserver.  But the fact remains that no warnings
> about these threads are ever seen when debugging remotely.  I do see
> the extra threads under GDBserver as well.
> 
> Does anyone have any further ideas?
Joel Brobecker April 7, 2014, 4:58 p.m. UTC | #11
> Ping!  If there are no further suggestions, I'd like to commit the
> changes posted in this thread.

FWIW, I think you can go ahead. Your patches seem to be an improvement,
and I don't see how they could make things worse should we want to
work on another way to implement this part of the code again.

> 
> > Date: Mon, 31 Mar 2014 18:31:43 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: brobecker@adacore.com, gdb-patches@sourceware.org
> > 
> > > Date: Fri, 28 Mar 2014 21:30:10 +0300
> > > From: Eli Zaretskii <eliz@gnu.org>
> > > Cc: brobecker@adacore.com, gdb-patches@sourceware.org
> > > 
> > > > Date: Fri, 28 Mar 2014 17:49:13 +0000
> > > > From: Pedro Alves <palves@redhat.com>
> > > > CC: brobecker@adacore.com, gdb-patches@sourceware.org
> > > > 
> > > > >> Why bother calling SetThreadContext at all if we just killed
> > > > >> the process?
> > > > > 
> > > > > See my other mail and Joel's response.
> > > > 
> > > > Not sure what you mean.  TerminateProcess is asynchronous, and
> > > > we need to resume the inferior and collect the debug events
> > > > until we see the process terminate.  But, my question is
> > > > why would we write the thread's registers at all if we
> > > > just told it to die?  Seems to be we could just skip
> > > > calling SetThreadContext instead of calling it but
> > > > ignoring the result.
> > > 
> > > If you say so, I don't know enough about this stuff.
> > 
> > Actually, upon second thought: we continue the inferior after
> > TerminateProcess call to let it be killed, right?  If so, shouldn't we
> > continue it with the right context?

Sure, but what context would that be?

> > 
> > > > >> Sounds like GDBserver might have this problem too.
> > > > > 
> > > > > If there's an easy way to verify that, without having 2 systems
> > > > > talking via some communications line, please tell how, and I will try
> > > > > that.
> > > > 
> > > > Sure, you can run gdbserver and gdb on the same machine, and connect
> > > > with tcp.  Just:
> > > > 
> > > >  $ gdbserver :9999 myprogram.exe
> > > > 
> > > > in one terminal, and:
> > > > 
> > > >  $ gdb myprogram.exe -ex "tar rem :9999" -ex "b main" -ex "c"
> > > > 
> > > > in another.
> > > 
> > > OK, will try that.
> > 
> > Funnily enough, I cannot get GDBserver to emit similar warnings in the
> > same situation.  I don't understand the reasons for that, since the
> > code is very similar, and with a single exception, we do check the
> > return values of calls to GetThreadContext, SetThreadContext, and
> > SuspendThread in GDBserver.  But the fact remains that no warnings
> > about these threads are ever seen when debugging remotely.  I do see
> > the extra threads under GDBserver as well.
> > 
> > Does anyone have any further ideas?
Pedro Alves April 7, 2014, 5:09 p.m. UTC | #12
On 03/31/2014 04:31 PM, Eli Zaretskii wrote:
>> Date: Fri, 28 Mar 2014 21:30:10 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: brobecker@adacore.com, gdb-patches@sourceware.org
>>
>>> Date: Fri, 28 Mar 2014 17:49:13 +0000
>>> From: Pedro Alves <palves@redhat.com>
>>> CC: brobecker@adacore.com, gdb-patches@sourceware.org
>>>
>>>>> Why bother calling SetThreadContext at all if we just killed
>>>>> the process?
>>>>
>>>> See my other mail and Joel's response.
>>>
>>> Not sure what you mean.  TerminateProcess is asynchronous, and
>>> we need to resume the inferior and collect the debug events
>>> until we see the process terminate.  But, my question is
>>> why would we write the thread's registers at all if we
>>> just told it to die?  Seems to be we could just skip
>>> calling SetThreadContext instead of calling it but
>>> ignoring the result.
>>
>> If you say so, I don't know enough about this stuff.
> 
> Actually, upon second thought: we continue the inferior after
> TerminateProcess call to let it be killed, right?  If so, shouldn't we
> continue it with the right context?

I don't think the threads are going to run whatever context
you set them them to.  They'll surely die before ever getting
scheduled to run any further userspace code?

> 
>>>>> Sounds like GDBserver might have this problem too.
>>>>
>>>> If there's an easy way to verify that, without having 2 systems
>>>> talking via some communications line, please tell how, and I will try
>>>> that.
>>>
>>> Sure, you can run gdbserver and gdb on the same machine, and connect
>>> with tcp.  Just:
>>>
>>>  $ gdbserver :9999 myprogram.exe
>>>
>>> in one terminal, and:
>>>
>>>  $ gdb myprogram.exe -ex "tar rem :9999" -ex "b main" -ex "c"
>>>
>>> in another.
>>
>> OK, will try that.
> 
> Funnily enough, I cannot get GDBserver to emit similar warnings in the
> same situation.  I don't understand the reasons for that, since the
> code is very similar, and with a single exception, we do check the
> return values of calls to GetThreadContext, SetThreadContext, and
> SuspendThread in GDBserver.  But the fact remains that no warnings
> about these threads are ever seen when debugging remotely.  I do see
> the extra threads under GDBserver as well.

GDBserver's warnings are guarded by 'if (debug_threads)' (see OUTMSG2).
That means you'll need to start gdbserver with --debug to see them.
Did you do that?
Eli Zaretskii April 7, 2014, 6:24 p.m. UTC | #13
> Date: Mon, 07 Apr 2014 18:09:16 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: brobecker@adacore.com, gdb-patches@sourceware.org
> 
> > Funnily enough, I cannot get GDBserver to emit similar warnings in the
> > same situation.  I don't understand the reasons for that, since the
> > code is very similar, and with a single exception, we do check the
> > return values of calls to GetThreadContext, SetThreadContext, and
> > SuspendThread in GDBserver.  But the fact remains that no warnings
> > about these threads are ever seen when debugging remotely.  I do see
> > the extra threads under GDBserver as well.
> 
> GDBserver's warnings are guarded by 'if (debug_threads)' (see OUTMSG2).

But the warnings I was talking about are output with OUTMSG, which
doesn't depend on debug_threads.  Here's an example:

  static void
  suspend_one_thread (struct inferior_list_entry *entry)
  {
    struct thread_info *thread = (struct thread_info *) entry;
    win32_thread_info *th = inferior_target_data (thread);

    if (!th->suspended)
      {
	if (SuspendThread (th->h) == (DWORD) -1)
	  {
	    DWORD err = GetLastError ();
	    OUTMSG (("warning: SuspendThread failed in suspend_one_thread, "
		     "(error %d): %s\n", (int) err, strwinerror (err)));

Did I miss something?
Joel Brobecker April 7, 2014, 9:39 p.m. UTC | #14
> But the warnings I was talking about are output with OUTMSG, which
> doesn't depend on debug_threads.  Here's an example:
> 
>   static void
>   suspend_one_thread (struct inferior_list_entry *entry)
>   {
>     struct thread_info *thread = (struct thread_info *) entry;
>     win32_thread_info *th = inferior_target_data (thread);
> 
>     if (!th->suspended)
>       {
> 	if (SuspendThread (th->h) == (DWORD) -1)
> 	  {
> 	    DWORD err = GetLastError ();
> 	    OUTMSG (("warning: SuspendThread failed in suspend_one_thread, "
> 		     "(error %d): %s\n", (int) err, strwinerror (err)));
> 
> Did I miss something?

This code not going to be used at all if you debug through GDBserver
(it's the code in remote.c that does). But if the GDBserver code does
the same thing as windows-nat in terms of inferior management,
it might trigger the same error and output them on GDBserver's
stdout/stderr if --debug is enabled. Usually, we try to keep the
GDB and GDBserver code in sync, and that means that we fix a problem
in inferior management, it's a good bet that the same problem exists
on both sides.

I might be able to take a look sometime next week. I think the issue
we'll have is with testing, since the problem is not necessarily always
reproduceable in Eli's scenario, and he would need to debug repeatedly
with GDBserver in order to gain a good level of confidence that the
problem is gone for good.
Eli Zaretskii April 8, 2014, 2:44 a.m. UTC | #15
> Date: Mon, 7 Apr 2014 14:39:13 -0700
> From: Joel Brobecker <brobecker@adacore.com>
> Cc: Pedro Alves <palves@redhat.com>, gdb-patches@sourceware.org
> 
> > But the warnings I was talking about are output with OUTMSG, which
> > doesn't depend on debug_threads.  Here's an example:
> > 
> >   static void
> >   suspend_one_thread (struct inferior_list_entry *entry)
> >   {
> >     struct thread_info *thread = (struct thread_info *) entry;
> >     win32_thread_info *th = inferior_target_data (thread);
> > 
> >     if (!th->suspended)
> >       {
> > 	if (SuspendThread (th->h) == (DWORD) -1)
> > 	  {
> > 	    DWORD err = GetLastError ();
> > 	    OUTMSG (("warning: SuspendThread failed in suspend_one_thread, "
> > 		     "(error %d): %s\n", (int) err, strwinerror (err)));
> > 
> > Did I miss something?
> 
> This code not going to be used at all if you debug through GDBserver
> (it's the code in remote.c that does).

Sorry, I don't understand: remote.c is not specific to Windows, so it
cannot include any Windows-specific calls like SuspendThread.

> But if the GDBserver code does the same thing as windows-nat in
> terms of inferior management, it might trigger the same error and
> output them on GDBserver's stdout/stderr if --debug is enabled.

Again, I don't see where --debug comes into play here.  Can you tell
me what am I missing?

> Usually, we try to keep the GDB and GDBserver code in sync, and that
> means that we fix a problem in inferior management, it's a good bet
> that the same problem exists on both sides.

As I said, I don't see the same problem in the server.

> I might be able to take a look sometime next week. I think the issue
> we'll have is with testing, since the problem is not necessarily always
> reproduceable in Eli's scenario, and he would need to debug repeatedly
> with GDBserver in order to gain a good level of confidence that the
> problem is gone for good.

I can reproduce it with some effort, so verifying a fix is not a
problem.
Joel Brobecker April 8, 2014, 4:23 a.m. UTC | #16
> Sorry, I don't understand: remote.c is not specific to Windows, so it
> cannot include any Windows-specific calls like SuspendThread.

I think the easiest is probably for you to take a look at the code
in gdb/gdbserver/win32-low.c. This file actually also has a routine
called thread_rec, and generally speaking, the code in gdb/*-nat.c
and the corresponding gdb/gdbserver/*-low.c can (and probably should)
be very similar - at least until we can factorize that code and reuse
the gdbserver code in GDB.

The reason I was mentioning remote.c is because, when you use GDBserver,
GDBserver does all the inferior control for GDB. It's like a mini
debugger, except that it does not have a user interface, only a serial
interface that GDB can used to communicate with it and send it orders
such as read or write memory, next/step operations, continue, kill,
etc. So, even if you are using a Windows native debugger, as soon as
you type the "target remote [...]" command, the windows-nat target
gets swapped out in favor of the "remote" target, which then delegates
the inferior control to the agent (gdbserver). The code that does that
delegation is in remote.c.

So, to try to reproduce with GDBserver, you'd have to do the following.
In one terminal, start your program with gdbserver. For instance:

    % gdbserver --debug :4567 YOUR_PROGRAM

It'll write a message saying that the program has been started,
and that it's now waiting for a connection on port 4567. The --debug
switch is not normally necessary, but allows us to turn warnings on,
which then allows us to determine whether or not we reproduced the
problem in GDB.

Next, start GDB in a second terminal, and connect to it via:

    % gdb YOUR_PROGRAM
    (gdb) target remote :4567

The "target remote [...]" command replaces the "run" command.
>From there, everything else should be the same as the reproducer
you have for the case where GDBserver isn't involved.

And instead of seeing the warning in the GDB console, you would
see it in the terminal where gdbserver was started. The idea for
solving the issue on the gdbserver side would be to apply the very
same sort of change you applied to windows-nat, but to win32-low,
essentially keeping the implementations in sync.
Pedro Alves April 8, 2014, 11:32 a.m. UTC | #17
On 04/07/2014 07:24 PM, Eli Zaretskii wrote:
>> Date: Mon, 07 Apr 2014 18:09:16 +0100
>> From: Pedro Alves <palves@redhat.com>
>> CC: brobecker@adacore.com, gdb-patches@sourceware.org
>>
>>> Funnily enough, I cannot get GDBserver to emit similar warnings in the
>>> same situation.  I don't understand the reasons for that, since the
>>> code is very similar, and with a single exception, we do check the
>>> return values of calls to GetThreadContext, SetThreadContext, and
>>> SuspendThread in GDBserver.  But the fact remains that no warnings
>>> about these threads are ever seen when debugging remotely.  I do see
>>> the extra threads under GDBserver as well.
>>
>> GDBserver's warnings are guarded by 'if (debug_threads)' (see OUTMSG2).
> 
> But the warnings I was talking about are output with OUTMSG, which
> doesn't depend on debug_threads.  Here's an example:
> 
>   static void
>   suspend_one_thread (struct inferior_list_entry *entry)
>   {
>     struct thread_info *thread = (struct thread_info *) entry;
>     win32_thread_info *th = inferior_target_data (thread);
> 
>     if (!th->suspended)
>       {
> 	if (SuspendThread (th->h) == (DWORD) -1)
> 	  {
> 	    DWORD err = GetLastError ();
> 	    OUTMSG (("warning: SuspendThread failed in suspend_one_thread, "
> 		     "(error %d): %s\n", (int) err, strwinerror (err)));
> 
> Did I miss something?

No, it was I who missed that.  I found a Windows machine and am taking
a look.
Eli Zaretskii April 8, 2014, 3:17 p.m. UTC | #18
> Date: Mon, 7 Apr 2014 21:23:31 -0700
> From: Joel Brobecker <brobecker@adacore.com>
> Cc: palves@redhat.com, gdb-patches@sourceware.org
> 
> > Sorry, I don't understand: remote.c is not specific to Windows, so it
> > cannot include any Windows-specific calls like SuspendThread.
> 
> I think the easiest is probably for you to take a look at the code
> in gdb/gdbserver/win32-low.c.

I altready did, but you seemed to say that code will not be used,
which confused me.

> This file actually also has a routine
> called thread_rec, and generally speaking, the code in gdb/*-nat.c
> and the corresponding gdb/gdbserver/*-low.c can (and probably should)
> be very similar - at least until we can factorize that code and reuse
> the gdbserver code in GDB.

Yes, I know.  I've read the code.

> The reason I was mentioning remote.c is because, when you use GDBserver,
> GDBserver does all the inferior control for GDB. It's like a mini
> debugger, except that it does not have a user interface, only a serial
> interface that GDB can used to communicate with it and send it orders
> such as read or write memory, next/step operations, continue, kill,
> etc. So, even if you are using a Windows native debugger, as soon as
> you type the "target remote [...]" command, the windows-nat target
> gets swapped out in favor of the "remote" target, which then delegates
> the inferior control to the agent (gdbserver). The code that does that
> delegation is in remote.c.

I know that, too.  What I don't understand is how all this is relevant
to the issue at hand, which is why the same system calls in
win32-low.c as we have in windows-nat.c don't trigger similar warning
messages.

> So, to try to reproduce with GDBserver, you'd have to do the following.
> In one terminal, start your program with gdbserver. For instance:
> 
>     % gdbserver --debug :4567 YOUR_PROGRAM

Why do I need --debug?  The warnings in question are displayed by
OUTMSG, which AFAIU does not need --debug to print its messages.

> It'll write a message saying that the program has been started,
> and that it's now waiting for a connection on port 4567. The --debug
> switch is not normally necessary, but allows us to turn warnings on,
> which then allows us to determine whether or not we reproduced the
> problem in GDB.
> 
> Next, start GDB in a second terminal, and connect to it via:
> 
>     % gdb YOUR_PROGRAM
>     (gdb) target remote :4567
> 
> The "target remote [...]" command replaces the "run" command.
> >From there, everything else should be the same as the reproducer
> you have for the case where GDBserver isn't involved.

I already did all that, per Pedro's instructions here:

  https://sourceware.org/ml/gdb-patches/2014-03/msg00668.html

> And instead of seeing the warning in the GDB console, you would
> see it in the terminal where gdbserver was started.

But that's it: I see no warnings when I run GDBserver like that.
diff mbox

Patch

--- gdb/windows-nat.c~0	2014-02-06 04:21:29.000000000 +0200
+++ gdb/windows-nat.c	2014-03-25 19:18:16.814250000 +0200
@@ -310,12 +310,18 @@  thread_rec (DWORD id, int get_context)
 		  {
 		    DWORD err = GetLastError ();
 
-		    warning (_("SuspendThread (tid=0x%x) failed."
-			       " (winerr %u)"),
-			     (unsigned) id, (unsigned) err);
-		    return NULL;
+		    /* We get Access Denied (5) when trying to suspend
+		       threads that Windows started on behalf of the
+		       debuggee, usually when those threads are just
+		       about to exit.  */
+		    if (err != ERROR_ACCESS_DENIED)
+		      warning (_("SuspendThread (tid=0x%x) failed."
+				 " (winerr %u)"),
+			       (unsigned) id, (unsigned) err);
+		    th->suspended = -1;
 		  }
-		th->suspended = 1;
+		else
+		  th->suspended = 1;
 	      }
 	    else if (get_context < 0)
 	      th->suspended = -1;
@@ -445,7 +451,7 @@  do_windows_fetch_inferior_registers (str
 	{
 	  thread_info *th = current_thread;
 	  th->context.ContextFlags = CONTEXT_DEBUGGER_DR;
-	  GetThreadContext (th->h, &th->context);
+	  CHECK (GetThreadContext (th->h, &th->context));
 	  /* Copy dr values from that thread.
 	     But only if there were not modified since last stop.
 	     PR gdb/2388 */
@@ -1266,10 +1272,12 @@  handle_exception (struct target_waitstat
   return 1;
 }
 
-/* Resume all artificially suspended threads if we are continuing
-   execution.  */
+/* Resume thread specified by ID, or all artificially suspended
+   threads, if we are continuing execution.  KILLED non-zero means we
+   have killed the inferior, so we should ignore weird errors due to
+   threads shutting down.  */
 static BOOL
-windows_continue (DWORD continue_status, int id)
+windows_continue (DWORD continue_status, int id, int killed)
 {
   int i;
   thread_info *th;
@@ -1297,7 +1305,16 @@  windows_continue (DWORD continue_status,
 	  }
 	if (th->context.ContextFlags)
 	  {
-	    CHECK (SetThreadContext (th->h, &th->context));
+	    DWORD ec = 0;
+
+	    if (GetExitCodeThread (th->h, &ec)
+		&& ec == STILL_ACTIVE)
+	      {
+		if (killed)
+		  SetThreadContext (th->h, &th->context);
+		else
+		  CHECK (SetThreadContext (th->h, &th->context));
+	      }
 	    th->context.ContextFlags = 0;
 	  }
 	if (th->suspended > 0)
@@ -1425,9 +1442,9 @@  windows_resume (struct target_ops *ops,
      Otherwise complain.  */
 
   if (resume_all)
-    windows_continue (continue_status, -1);
+    windows_continue (continue_status, -1, 0);
   else
-    windows_continue (continue_status, ptid_get_tid (ptid));
+    windows_continue (continue_status, ptid_get_tid (ptid), 0);
 }
 
 /* Ctrl-C handler used when the inferior is not run in the same console.  The
@@ -1645,7 +1662,7 @@  get_windows_debug_event (struct target_o
       if (continue_status == -1)
 	windows_resume (ops, minus_one_ptid, 0, 1);
       else
-	CHECK (windows_continue (continue_status, -1));
+	CHECK (windows_continue (continue_status, -1, 0));
     }
   else
     {
@@ -2382,13 +2399,13 @@  windows_create_inferior (struct target_o
 
   do_initial_windows_stuff (ops, pi.dwProcessId, 0);
 
-  /* windows_continue (DBG_CONTINUE, -1); */
+  /* windows_continue (DBG_CONTINUE, -1, 0); */
 }
 
 static void
 windows_mourn_inferior (struct target_ops *ops)
 {
-  (void) windows_continue (DBG_CONTINUE, -1);
+  (void) windows_continue (DBG_CONTINUE, -1, 0);
   i386_cleanup_dregs();
   if (open_process_used)
     {
@@ -2456,7 +2473,7 @@  windows_kill_inferior (struct target_ops
 
   for (;;)
     {
-      if (!windows_continue (DBG_CONTINUE, -1))
+      if (!windows_continue (DBG_CONTINUE, -1, 1))
 	break;
       if (!WaitForDebugEvent (&current_event, INFINITE))
 	break;