Fix "PC register is not available" issue

Message ID 53442710.60104@redhat.com
State Not applicable
Headers

Commit Message

Pedro Alves April 8, 2014, 4:42 p.m. UTC
  On 04/08/2014 12:32 PM, Pedro Alves wrote:
> On 04/07/2014 07:24 PM, Eli Zaretskii wrote:
>>> Date: Mon, 07 Apr 2014 18:09:16 +0100
>>> From: Pedro Alves <palves@redhat.com>
>>> CC: brobecker@adacore.com, gdb-patches@sourceware.org
>>>
>>>> Funnily enough, I cannot get GDBserver to emit similar warnings in the
>>>> same situation.  I don't understand the reasons for that, since the
>>>> code is very similar, and with a single exception, we do check the
>>>> return values of calls to GetThreadContext, SetThreadContext, and
>>>> SuspendThread in GDBserver.  But the fact remains that no warnings
>>>> about these threads are ever seen when debugging remotely.  I do see
>>>> the extra threads under GDBserver as well.
>>>
>>> GDBserver's warnings are guarded by 'if (debug_threads)' (see OUTMSG2).
>>
>> But the warnings I was talking about are output with OUTMSG, which
>> doesn't depend on debug_threads.  Here's an example:
>>
>>   static void
>>   suspend_one_thread (struct inferior_list_entry *entry)
>>   {
>>     struct thread_info *thread = (struct thread_info *) entry;
>>     win32_thread_info *th = inferior_target_data (thread);
>>
>>     if (!th->suspended)
>>       {
>> 	if (SuspendThread (th->h) == (DWORD) -1)
>> 	  {
>> 	    DWORD err = GetLastError ();
>> 	    OUTMSG (("warning: SuspendThread failed in suspend_one_thread, "
>> 		     "(error %d): %s\n", (int) err, strwinerror (err)));
>>
>> Did I miss something?
> 
> No, it was I who missed that.  I found a Windows machine and am taking
> a look.

D**n, I had forgotten how slow Windows is...


OK, so I looked for a Windows machine to try it.  I can definitely
reproduce the warning with gdbserver, using the testcase from PR14018.
Like:

$ cat test.c
#include <windows.h>

int main()
{    
    while (1)
        DebugBreakProcess(GetCurrentProcess());
}

$ gcc win32_suspend_issue.c -o win32_suspend_issue.exe -g3 -O0
$
$ ./gdb -data-directory=data-directory --args ./win32_suspend_issue.exe
[snip]
Reading symbols from ./win32_suspend_issue.exe...done.
(gdb) tbreak main
Temporary breakpoint 1 at 0x4010ba: file win32_suspend_issue.c, line 6.
(gdb) tar remote :9999
Remote debugging using :9999
0x770c1004 in ntdll!RtlUpcaseUnicodeToOemN () from /cygdrive/c/Windows/SysWOW64/ntdll.dll
(gdb) c
Continuing.

Temporary breakpoint 1, main () at win32_suspend_issue.c:6
6               DebugBreakProcess(GetCurrentProcess());
(gdb) b 6
Breakpoint 2 at 0x4010ba: file win32_suspend_issue.c, line 6.
(gdb) commands 
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>continue
>end
(gdb) handle SIGTRAP nostop
SIGTRAP is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal        Stop      Print   Pass to program Description
SIGTRAP       No        Yes     No              Trace/breakpoint trap
(gdb) set pagination off 
(gdb) c

Breakpoint 2, main () at win32_suspend_issue.c:6
6               DebugBreakProcess(GetCurrentProcess());
[New Thread 3156]

Program received signal SIGTRAP, Trace/breakpoint trap.

Breakpoint 2, main () at win32_suspend_issue.c:6
6               DebugBreakProcess(GetCurrentProcess());

Breakpoint 2, main () at win32_suspend_issue.c:6
6               DebugBreakProcess(GetCurrentProcess());

Breakpoint 2, main () at win32_suspend_issue.c:6
6               DebugBreakProcess(GetCurrentProcess());

Breakpoint 2, main () at win32_suspend_issue.c:6
6               DebugBreakProcess(GetCurrentProcess());

Breakpoint 2, main () at win32_suspend_issue.c:6
6               DebugBreakProcess(GetCurrentProcess());

Breakpoint 2, main () at win32_suspend_issue.c:6
6               DebugBreakProcess(GetCurrentProcess());

Breakpoint 2, main () at win32_suspend_issue.c:6
6               DebugBreakProcess(GetCurrentProcess());
[New Thread 3524]

Program received signal SIGTRAP, Trace/breakpoint trap.

[snip lots of the same, forever]


While on the gdbserver terminal:

$ gdbserver/gdbserver.exe :9999 ./win32_suspend_issue.exe
Process ./win32_suspend_issue.exe created; pid = 5888
Listening on port 9999
Remote debugging from host 127.0.0.1
warning: SuspendThread failed in thread_rec, (error 5): Acesso negado.
warning: SuspendThread failed in thread_rec, (error 5): Acesso negado.
warning: SuspendThread failed in thread_rec, (error 5): Acesso negado.
warning: SuspendThread failed in thread_rec, (error 5): Acesso negado.
warning: SuspendThread failed in thread_rec, (error 5): Acesso negado.
warning: SuspendThread failed in thread_rec, (error 5): Acesso negado.
warning: SuspendThread failed in thread_rec, (error 5): Acesso negado.
warning: SuspendThread failed in thread_rec, (error 5): Acesso negado.
...

Now, unlike in the GDB case,  gdbserver reads registers
from the thread anyway, so we don't see the "PC register is not
available" error.  So I went a step further, and patched gdbserver
like this, to force it to error out on the subsequent resume,
if previously suspending the thread failed:

---



That is, again, if SuspendThread fails, GDB still fetches registers,
like in your patch, but in addition, prevent resuming (once)
so we can get the failing thread's backtrace.

Still using the test case from the PR, I see, with the native
gdb:

Program received signal SIGTRAP, Trace/breakpoint trap.
warning: SuspendThread (tid=0x1318) failed. (winerr 5)

Breakpoint 4, main () at /home/pedro/win32_suspend_issue.c:6
6       in /home/pedro/win32_suspend_issue.c
0x0000000000401514 in main () at /home/pedro/win32_suspend_issue.c:6
6       in /home/pedro/win32_suspend_issue.c
SuspendThread had failed.


And tid=0x1318 is inside ZwTerminateThread:

(gdb) info threads
  Id   Target Id         Frame 
  123  Thread 5316.0x1318 0x0000000076e90aea in ntdll!ZwTerminateThread () from C:\Windows\SYSTEM32\ntdll.dll
* 1    Thread 5316.0x1004 0x0000000000401514 in main () at /home/pedro/win32_suspend_issue.c:6
(gdb) t 123
[Switching to thread 123 (Thread 5316.0x1318)]
#0  0x0000000076e90aea in ntdll!ZwTerminateThread () from C:\Windows\SYSTEM32\ntdll.dll
(gdb) bt
#0  0x0000000076e90aea in ntdll!ZwTerminateThread () from C:\Windows\SYSTEM32\ntdll.dll
#1  0x0000000076e85c78 in ntdll!RtlExitUserThread () from C:\Windows\SYSTEM32\ntdll.dll
#2  0x0000000076f37a61 in ntdll!DbgUiRemoteBreakin () from C:\Windows\SYSTEM32\ntdll.dll
#3  0x0000000076d3651d in KERNEL32!BaseThreadInitThunk () from C:\Windows\system32\kernel32.dll
#4  0x0000000076e6ba01 in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
#5  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)


With the native debugger, the fail always happens in 
ZwTerminateThread (and I tried both 64-bit and 32-bit native
gdbs), while with gdbserver the fail always happens in
RtlDosApplyFileIsolationRedirection.  My currently hypothesis for this is
that gdbserver affects timings differently then when running under
gdb (remote protocol traffic, etc.).

(Oh this is a Windows 7 box, btw)

Then I tried simplifying the test case, and remove
remote thread injection from the picture (DebugBreakProcess
uses remote thread injection, and I don't know whether
that's special in terms of SuspendThread permissions, or whether
that's somehow short circuited in terms of breaking
GetCurrentProcess()), with this alternative test case:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <windows.h>

static DWORD WINAPI
thread_entry (void *arg)
{
  return 0;
}

int
main ()
{
  DWORD threadId;

	
  while (1)
    {
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
      CreateThread (NULL, 0, thread_entry, NULL, 0, &threadId);
    }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And sourcing this GDB script:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
start
tb main
continue

break 17
command
	continue
end

set pagination off

continue
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And sure enough, eventually, I get:

(...)
[Thread 11432.0x1208 exited with code 0]
[Thread 11432.0x813c exited with code 0]
[New Thread 11432.0x8290]
[Thread 11432.0x8290 exited with code 0]
[New Thread 11432.0x78d4]
[Thread 11432.0x78d4 exited with code 0]
[New Thread 11432.0x6ed8]
[Thread 11432.0x6ed8 exited with code 0]

Breakpoint 1, main () at win32_spawn_threads.c:17
17      in win32_spawn_threads.c
[New Thread 11432.0x8334]
warning: SuspendThread (tid=0x8334) failed. (winerr 5)
0x0000000000401520 in main () at win32_spawn_threads.c:17
17      in win32_spawn_threads.c
./win32_spawn_threads.gdb:14: Error in sourced command file:
SuspendThread had failed.

(gdb) info threads
  Id   Target Id         Frame 
  6870 Thread 11432.0x8334 0x0000000076e90aea in ntdll!ZwTerminateThread () from C:\Windows\SYSTEM32\ntdll.dll
  6863 Thread 11432.0x7030 0x0000000076e905fa in ntdll!ZwWaitForSingleObject () from C:\Windows\SYSTEM32\ntdll.dll
  6862 Thread 11432.0x8294 0x0000000076e905fa in ntdll!ZwWaitForSingleObject () from C:\Windows\SYSTEM32\ntdll.dll
...
(gdb) t 6870
[Switching to thread 6870 (Thread 11432.0x8334)]
#0  0x0000000076e90aea in ntdll!ZwTerminateThread () from C:\Windows\SYSTEM32\ntdll.dll
(gdb) bt
#0  0x0000000076e90aea in ntdll!ZwTerminateThread () from C:\Windows\SYSTEM32\ntdll.dll
#1  0x0000000076e85c78 in ntdll!RtlExitUserThread () from C:\Windows\SYSTEM32\ntdll.dll
#2  0x0000000076d36525 in KERNEL32!BaseThreadInitThunk () from C:\Windows\system32\kernel32.dll
#3  0x0000000076e6ba01 in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
#4  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)


and in 32-bit mode, the equivalent:

[New Thread 13192.0x8248]
[New Thread 13192.0x82d0]
[New Thread 13192.0x718c]
[New Thread 13192.0x7214]
[New Thread 13192.0x8238]
warning: SuspendThread (tid=0x8238) failed. (winerr 5)

Breakpoint 3, main () at win32_spawn_threads.c:17
17      in win32_spawn_threads.c
0x00401585 in main () at win32_spawn_threads.c:17
17      in win32_spawn_threads.c
./win32_spawn_threads.gdb:14: Error in sourced command file:
SuspendThread had failed.

(gdb) info threads
  Id   Target Id         Frame 
  635  Thread 13192.0x8238 0x77040096 in ntdll!ZwTerminateThread () from C:\Windows\SysWOW64\ntdll.dll
  634  Thread 13192.0x7214 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  633  Thread 13192.0x718c 0x770301b4 in ntdll!RtlUserThreadStart () from C:\Windows\SysWOW64\ntdll.dll
  632  Thread 13192.0x82d0 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  631  Thread 13192.0x8248 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  630  Thread 13192.0x8258 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  629  Thread 13192.0x82a0 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  628  Thread 13192.0x6544 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  627  Thread 13192.0x8280 0x770301b4 in ntdll!RtlUserThreadStart () from C:\Windows\SysWOW64\ntdll.dll
  616  Thread 13192.0x71a4 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  590  Thread 13192.0x8298 0x770301b4 in ntdll!RtlUserThreadStart () from C:\Windows\SysWOW64\ntdll.dll
  578  Thread 13192.0x2ba0 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  574  Thread 13192.0x2560 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  558  Thread 13192.0x6a94 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  557  Thread 13192.0x15dc 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
  544  Thread 13192.0x2d90 0x7703f9d9 in ntdll!ZwSetEvent () from C:\Windows\SysWOW64\ntdll.dll
  519  Thread 13192.0x118c 0x7703f8d1 in ntdll!ZwWaitForSingleObject () from C:\Windows\SysWOW64\ntdll.dll
* 1    Thread 13192.0x3390 0x00401585 in main () at win32_spawn_threads.c:17
(gdb) t 635
[Switching to thread 635 (Thread 13192.0x8238)]
#0  0x77040096 in ntdll!ZwTerminateThread () from C:\Windows\SysWOW64\ntdll.dll
(gdb) bt
#0  0x77040096 in ntdll!ZwTerminateThread () from C:\Windows\SysWOW64\ntdll.dll
#1  0x77040096 in ntdll!ZwTerminateThread () from C:\Windows\SysWOW64\ntdll.dll
#2  0x77076795 in ntdll!RtlExitUserThread () from C:\Windows\SysWOW64\ntdll.dll
#3  0x76903371 in KERNEL32!BaseThreadInitThunk () from C:\Windows\syswow64\kernel32.dll
#4  0x7705bf32 in ntdll!RtlInitializeExceptionChain () from C:\Windows\SysWOW64\ntdll.dll
#5  0x7705bf05 in ntdll!RtlInitializeExceptionChain () from C:\Windows\SysWOW64\ntdll.dll
#6  0x00000000 in ?? ()
(gdb) 


So at least this test case shows that the issue very much looks
like indeed caused by trying to suspend threads that are more dead
than alive.  I'd be very curious to see the backtrace you get
for the failing thread in your test case (I guess emacs?).

+		    /* We get Access Denied (5) when trying to suspend
+		       threads that Windows started on behalf of the
+		       debuggee, usually when those threads are just
+		       about to exit.  */
+		    if (err != ERROR_ACCESS_DENIED)

I've shown above that whether it was Windows or the program
itself that started the threads is irrelevant, it'd be good to 
reword this comment.
  

Comments

Eli Zaretskii April 8, 2014, 5:10 p.m. UTC | #1
> Date: Tue, 08 Apr 2014 17:42:56 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: brobecker@adacore.com, gdb-patches@sourceware.org
> 
> I'd be very curious to see the backtrace you get
> for the failing thread in your test case (I guess emacs?).

Yes, it's Emacs.  Do you mean the backtrace I see when debugging
natively?  Because when debugging Emacs with gdbserver, I cannot
reproduce the problem with SuspendThread.

> +		    /* We get Access Denied (5) when trying to suspend
> +		       threads that Windows started on behalf of the
> +		       debuggee, usually when those threads are just
> +		       about to exit.  */
> +		    if (err != ERROR_ACCESS_DENIED)
> 
> I've shown above that whether it was Windows or the program
> itself that started the threads is irrelevant, it'd be good to 
> reword this comment.

OK.  But now I'm confused: what is the conclusion from what you saw?
  
Pedro Alves April 8, 2014, 5:36 p.m. UTC | #2
On 04/08/2014 06:10 PM, Eli Zaretskii wrote:
>> Date: Tue, 08 Apr 2014 17:42:56 +0100
>> From: Pedro Alves <palves@redhat.com>
>> CC: brobecker@adacore.com, gdb-patches@sourceware.org
>>
>> I'd be very curious to see the backtrace you get
>> for the failing thread in your test case (I guess emacs?).
> 
> Yes, it's Emacs.  Do you mean the backtrace I see when debugging
> natively?

Yes, but you'll need to use the patch I attached, not yours.
When SuspendThread fails, the warning says which thread failed.
The next "continue", "next", whatever will fail when that
happens (just once, so you can continue debugging as usual
in case you're in the middle of a real debug session).  At
that point, "info threads", see which is the gdb thread id
for the thread that failed, switch to it, and get a backtrace,
like I showed in the previous email.

> Because when debugging Emacs with gdbserver, I cannot
> reproduce the problem with SuspendThread.

You mean you can't get the warning in GDBserver's console,
or you don't see the "PC register is not available" issue?
Even if the exact test case as in the PR (and as I attached
to the previous email) -- that is, one that continues banging
in a loop until the failure triggers?  Or you mean with
emacs?

As I mentioned in the previous email, you won't get the
"PC register not available" issue with GDBserver because GDBserver's
thread_rec returns the thread's info structure anyway even if
SuspendThread failed, unlike GDB, which returns NULL.  If
the SuspendThread issue triggers, then the
GetThreadContext/SetThreadContext issue should trigger too.
But, GDBserver actually currently ignores SetThreadContext
fails ...:

static void
i386_set_thread_context (win32_thread_info *th, DEBUG_EVENT* current_event)
{
  if (debug_registers_changed)
    {
      struct i386_debug_reg_state *dr = &debug_reg_state;
      th->context.Dr0 = dr->dr_mirror[0];
      th->context.Dr1 = dr->dr_mirror[1];
      th->context.Dr2 = dr->dr_mirror[2];
      th->context.Dr3 = dr->dr_mirror[3];
      /* th->context.Dr6 = dr->dr_status_mirror;
	 FIXME: should we set dr6 also ?? */
      th->context.Dr7 = dr->dr_control_mirror;
    }

  SetThreadContext (th->h, &th->context);
}



Given I see different backtraces on the same machine in native vs
gdbserver debugging of the same test, and that in gdbserver's case the
failing thread appears to always be further down into thread termination,
it may just be that your machine is a little slower or faster than
mine, and it's harder to stop threads at exactly within the time window
when the SuspendThread problem can trigger.

>> +		    /* We get Access Denied (5) when trying to suspend
>> +		       threads that Windows started on behalf of the
>> +		       debuggee, usually when those threads are just
>> +		       about to exit.  */
>> +		    if (err != ERROR_ACCESS_DENIED)
>>
>> I've shown above that whether it was Windows or the program
>> itself that started the threads is irrelevant, it'd be good to 
>> reword this comment.
> 
> OK.  But now I'm confused: what is the conclusion from what you saw?

My conclusion so far is that this happens exactly when we try to
suspend a thread that is already half-dead, no matter who started it,
and that we do need your patch to GDB, and that GDBserver will also
need to the patched.  I'm just curious to see your emacs backtrace
to 99% confirm that it's the same thread-exiting scenario, mainly
to put any doubts to rest, for us and for future generations (the
archives).
  
Eli Zaretskii April 8, 2014, 5:54 p.m. UTC | #3
> Date: Tue, 08 Apr 2014 18:36:37 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: brobecker@adacore.com, gdb-patches@sourceware.org
> 
> On 04/08/2014 06:10 PM, Eli Zaretskii wrote:
> >> Date: Tue, 08 Apr 2014 17:42:56 +0100
> >> From: Pedro Alves <palves@redhat.com>
> >> CC: brobecker@adacore.com, gdb-patches@sourceware.org
> >>
> >> I'd be very curious to see the backtrace you get
> >> for the failing thread in your test case (I guess emacs?).
> > 
> > Yes, it's Emacs.  Do you mean the backtrace I see when debugging
> > natively?
> 
> Yes, but you'll need to use the patch I attached, not yours.
> When SuspendThread fails, the warning says which thread failed.
> The next "continue", "next", whatever will fail when that
> happens (just once, so you can continue debugging as usual
> in case you're in the middle of a real debug session).  At
> that point, "info threads", see which is the gdb thread id
> for the thread that failed, switch to it, and get a backtrace,
> like I showed in the previous email.

I can know which thread's suspension fails even without the patch,
because GDB immediately announces its exit.  Is the patch just for
showing the thread ID, or is it needed for something else?

> > Because when debugging Emacs with gdbserver, I cannot
> > reproduce the problem with SuspendThread.
> 
> You mean you can't get the warning in GDBserver's console,
> or you don't see the "PC register is not available" issue?

The former, of course.

> Even if the exact test case as in the PR (and as I attached
> to the previous email) -- that is, one that continues banging
> in a loop until the failure triggers?  Or you mean with
> emacs?

I mean with Emacs.  The PR test case is an artificial toy program
which also floods the system with threads.  I wouldn't consider it a
serious real-life use case.

> Given I see different backtraces on the same machine in native vs
> gdbserver debugging of the same test, and that in gdbserver's case the
> failing thread appears to always be further down into thread termination,
> it may just be that your machine is a little slower or faster than
> mine, and it's harder to stop threads at exactly within the time window
> when the SuspendThread problem can trigger.

I think multiple cores also come into play here.  (My machine is Core
i7.)

> > OK.  But now I'm confused: what is the conclusion from what you saw?
> 
> My conclusion so far is that this happens exactly when we try to
> suspend a thread that is already half-dead, no matter who started it,
> and that we do need your patch to GDB, and that GDBserver will also
> need to the patched.  I'm just curious to see your emacs backtrace
> to 99% confirm that it's the same thread-exiting scenario, mainly
> to put any doubts to rest, for us and for future generations (the
> archives).

OK, I will try to get that info.
  
Joel Brobecker April 11, 2014, 8:06 p.m. UTC | #4
Hello,

I was wondering what the status of the GDB-side of the patch was.
I thought we were all set to commit it, while also looking at
applying the same patch on the GDBserver side? I don't think we are
in any hurry to get anything committed, but I'd had to see all
this work get forgotten and lost... Also, I don't think the patch
is all that risky, but the earlier we put it in, the more exposure
it'll get before we release 7.8!

Thanks,
  

Patch

>From d935c6d43e035c15ba64c1e8094943ac63d8e711 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 8 Apr 2014 11:37:26 +0100
Subject: [PATCH] Force the next resume to fail when SuspendThread fails

---
 gdb/gdbserver/win32-low.c | 19 +++++++++++++++----
 gdb/windows-nat.c         | 11 +++++++++--
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/gdb/gdbserver/win32-low.c b/gdb/gdbserver/win32-low.c
index bc2506c..ea3d70b 100644
--- a/gdb/gdbserver/win32-low.c
+++ b/gdb/gdbserver/win32-low.c
@@ -177,8 +177,11 @@  thread_rec (ptid_t ptid, int get_context)
 	  if (SuspendThread (th->h) == (DWORD) -1)
 	    {
 	      DWORD err = GetLastError ();
-	      OUTMSG (("warning: SuspendThread failed in thread_rec, "
-		       "(error %d): %s\n", (int) err, strwinerror (err)));
+	      OUTMSG (("warning: SuspendThread failed in thread_rec(%s), "
+		       "(error %d): %s\n",
+		       target_pid_to_str (ptid),
+		       (int) err, strwinerror (err)));
+	      th->suspended = -1;
 	    }
 	  else
 	    th->suspended = 1;
@@ -420,8 +423,15 @@  continue_one_thread (struct inferior_list_entry *this_thread, void *id_ptr)
   int thread_id = * (int *) id_ptr;
   win32_thread_info *th = inferior_target_data (thread);
 
-  if ((thread_id == -1 || thread_id == th->tid)
-      && th->suspended)
+  if ((thread_id == -1 || thread_id == th->tid))
+    {
+      if (th->suspended == -1)
+	{
+	  th->suspended = 1;
+	  error ("SuspendThread had failed for thread %d.\n", (int) th->tid);
+	}
+
+      if (th->suspended)
     {
       if (th->context.ContextFlags)
 	{
@@ -437,6 +447,7 @@  continue_one_thread (struct inferior_list_entry *this_thread, void *id_ptr)
 	}
       th->suspended = 0;
     }
+    }
 
   return 0;
 }
diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c
index fe40c4d..6a0c861 100644
--- a/gdb/windows-nat.c
+++ b/gdb/windows-nat.c
@@ -312,9 +312,10 @@  thread_rec (DWORD id, int get_context)
 		    warning (_("SuspendThread (tid=0x%x) failed."
 			       " (winerr %u)"),
 			     (unsigned) id, (unsigned) err);
-		    return NULL;
+		    th->suspended = -2;
 		  }
-		th->suspended = 1;
+		else
+		  th->suspended = 1;
 	      }
 	    else if (get_context < 0)
 	      th->suspended = -1;
@@ -1200,6 +1201,12 @@  windows_continue (DWORD continue_status, int id)
     if ((id == -1 || id == (int) th->id)
 	&& th->suspended)
       {
+	if (th->suspended == -2)
+	  {
+	    th->suspended = 1;
+	    error ("SuspendThread had failed.\n");
+	  }
+
 	if (debug_registers_changed)
 	  {
 	    th->context.ContextFlags |= CONTEXT_DEBUG_REGISTERS;
-- 
1.7.11.7