Fix "PC register is not available" issue

Message ID 83txawa9wk.fsf@gnu.org
State Superseded
Headers

Commit Message

Eli Zaretskii March 17, 2014, 7:43 p.m. UTC
  This problem was raised and mentioned several times over the last few
years.  It was discussed here in the thread which started with this
message:

  https://sourceware.org/ml/gdb-patches/2013-06/msg00237.html

In the ensuing discussion, there was a consensus that these failures
shouldn't be fatal, and perhaps we should even ignore them entirely.

The most annoying problem with this is that after the message about
failure to suspend a thread, there's another message:

  warning: SuspendThread (tid=0x720) failed. (winerr 5)
  PC register is not available  <<<<<<<<<<<<<<<<<<<<<<<<<

and the debug session becomes useless after that: the debuggee stops,
and you cannot continue it.  You can only examine memory.

A very similar, perhaps the same, problem is described here:

  https://sourceware.org/bugzilla/show_bug.cgi?id=14018

There you can find a short test program that demonstrates the issue,
and some analysis.  After hitting this problem myself many times, I
came to the same conclusion: namely, that GDB is trying to suspend a
thread for which it doesn't have the necessary privileges.  My
hypothesis is that those are the threads that Windows starts in the
context of the process being debugged, perhaps for the purposes of
handling some debug events.  (I can see the "New thread" messages from
time to time, although I know for certain that the inferior didn't
start any application threads.)

So I came up with the patch below.  The idea is that setting
th->suspended to -1 just signals to GDB that the thread isn't
suspended, and shouldn't be resumed; otherwise, we simply ignore the
failure to suspend the thread, although the warning is still printed.

I am running with this patch for almost a month, and the dreaded "PC
register is not available" message didn't appear even once.  No more
botched debugging sessions!  The test program in PR/14018 also runs
indefinitely until interrupted, instead of stopping.

So I suggest to install this.  OK?  Should I also close the Bugzilla
PR?
  

Comments

Joel Brobecker March 18, 2014, 4:16 p.m. UTC | #1
Hi Eli,

On Mon, Mar 17, 2014 at 09:43:23PM +0200, Eli Zaretskii wrote:
> This problem was raised and mentioned several times over the last few
> years.  It was discussed here in the thread which started with this
> message:
> 
>   https://sourceware.org/ml/gdb-patches/2013-06/msg00237.html

Thanks for looking into this!

> So I came up with the patch below.  The idea is that setting
> th->suspended to -1 just signals to GDB that the thread isn't
> suspended, and shouldn't be resumed; otherwise, we simply ignore the
> failure to suspend the thread, although the warning is still printed.
> 
> I am running with this patch for almost a month, and the dreaded "PC
> register is not available" message didn't appear even once.  No more
> botched debugging sessions!  The test program in PR/14018 also runs
> indefinitely until interrupted, instead of stopping.
> 
> So I suggest to install this.  OK?  Should I also close the Bugzilla
> PR?
> 
> --- gdb/windows-nat.c~0	2014-02-06 04:21:29.000000000 +0200
> +++ gdb/windows-nat.c	2014-02-26 22:27:10.225625000 +0200
> @@ -313,9 +313,10 @@ thread_rec (DWORD id, int get_context)
>  		    warning (_("SuspendThread (tid=0x%x) failed."
>  			       " (winerr %u)"),
>  			     (unsigned) id, (unsigned) err);
> -		    return NULL;
> +		    th->suspended = -1;
>  		  }
> -		th->suspended = 1;
> +		else
> +		  th->suspended = 1;
>  	      }
>  	    else if (get_context < 0)
>  	      th->suspended = -1;

Generally speaking, it seems to make sense to me that we would
mark as unsuspended threads that we cannot suspend. But one question
that rises from doing that is: how does the rest of GDB handle
this thread? In particular, does the thread show up in "info thread"
and is the user able to switch to that thread? etc? Certainly, if
the current situation is to leave the user stranded, the suggested
approach cannot only be an improvement...

Another thought I had on your patch is that we might want to limit
the warning to situation where the return code is not a permission
denied.

It would also have been nice to double-check that the thread in
question, when the error shows up, is indeed on a system thread
unrelated to our program (Eg: the thread created when we try to
interrupt the program with ctrl-c or ctrl-break). Not sure if
there is a way to determine that, though.

I would have looked into that, but I don't have much time this week, and
might be equally busy the following 2 weeks, so I didn't want to delay
my answer any further. Overall, your patch looks very promising.

Sorry I can't be anymore support for the moment.
  
Eli Zaretskii March 18, 2014, 4:36 p.m. UTC | #2
> Date: Tue, 18 Mar 2014 09:16:08 -0700
> From: Joel Brobecker <brobecker@adacore.com>
> Cc: gdb-patches@sourceware.org
> 
> Generally speaking, it seems to make sense to me that we would
> mark as unsuspended threads that we cannot suspend. But one question
> that rises from doing that is: how does the rest of GDB handle
> this thread? In particular, does the thread show up in "info thread"
> and is the user able to switch to that thread? etc?

I can try finding out, but it's not easy: since I made that change,
these situations became so much rarer that it's a challenge to bump
into them.  I'll see what I can do.

> Certainly, if the current situation is to leave the user stranded,
> the suggested approach cannot only be an improvement...

Exactly.

> Another thought I had on your patch is that we might want to limit
> the warning to situation where the return code is not a permission
> denied.

I'm not sure we should bother.  After all, if the problem is real, we
will get an error further down the line, when we use the handle to
that thread to do something with it.

IOW, I see no need to thrash the entire session because of something
that isn't fatal.

> It would also have been nice to double-check that the thread in
> question, when the error shows up, is indeed on a system thread
> unrelated to our program (Eg: the thread created when we try to
> interrupt the program with ctrl-c or ctrl-break). Not sure if
> there is a way to determine that, though.

I never use Ctrl-C/Ctrl/BREAK during debugging.  The threads that I
was talking about just appear out of thin air, when the debuggee hits
a breakpoint, for example.

> I would have looked into that, but I don't have much time this week, and
> might be equally busy the following 2 weeks, so I didn't want to delay
> my answer any further. Overall, your patch looks very promising.
> 
> Sorry I can't be anymore support for the moment.

There's no rush.  My problem is solved, so I can wait patiently ;-)

Thanks.
  
Joel Brobecker March 18, 2014, 4:54 p.m. UTC | #3
> > Another thought I had on your patch is that we might want to limit
> > the warning to situation where the return code is not a permission
> > denied.
> 
> I'm not sure we should bother.  After all, if the problem is real, we
> will get an error further down the line, when we use the handle to
> that thread to do something with it.
> 
> IOW, I see no need to thrash the entire session because of something
> that isn't fatal.

I didn't mean to change the behavior - only hide the warning.
In this case, if it is normal that we can't suspend the thread,
then there is no point in warning (scaring) the user about it.
I would only generate a warning if something abnormal that we should
fix occured.

> > It would also have been nice to double-check that the thread in
> > question, when the error shows up, is indeed on a system thread
> > unrelated to our program (Eg: the thread created when we try to
> > interrupt the program with ctrl-c or ctrl-break). Not sure if
> > there is a way to determine that, though.
> 
> I never use Ctrl-C/Ctrl/BREAK during debugging.  The threads that I
> was talking about just appear out of thin air, when the debuggee hits
> a breakpoint, for example.

I think we've seen that as well, I was just citing this thread as
an example, because that's a case where I know that an unrelated
thread gets created.

> There's no rush.  My problem is solved, so I can wait patiently ;-)

OK :-). but don't let me stop you. I think you have as good a handle
as I do, and perhaps others will comment as well. I think the only
part we need to look at before putting your patch in is analyzing
its side-effects. It cannot be all that bad, since you've been running
with the patch for quite a while, now.
  
Eli Zaretskii March 18, 2014, 5:12 p.m. UTC | #4
> Date: Tue, 18 Mar 2014 09:54:13 -0700
> From: Joel Brobecker <brobecker@adacore.com>
> Cc: gdb-patches@sourceware.org
> 
> > > Another thought I had on your patch is that we might want to limit
> > > the warning to situation where the return code is not a permission
> > > denied.
> > 
> > I'm not sure we should bother.  After all, if the problem is real, we
> > will get an error further down the line, when we use the handle to
> > that thread to do something with it.
> > 
> > IOW, I see no need to thrash the entire session because of something
> > that isn't fatal.
> 
> I didn't mean to change the behavior - only hide the warning.

Ah, OK.  I think I'll do that in the next version of the patch.

> I think the only part we need to look at before putting your patch
> in is analyzing its side-effects.

I'll look into that and post the results.

Thanks.
  
Pedro Alves March 18, 2014, 5:33 p.m. UTC | #5
On 03/18/2014 05:12 PM, Eli Zaretskii wrote:
>> Date: Tue, 18 Mar 2014 09:54:13 -0700
>> From: Joel Brobecker <brobecker@adacore.com>
>> Cc: gdb-patches@sourceware.org
>>
>>>> Another thought I had on your patch is that we might want to limit
>>>> the warning to situation where the return code is not a permission
>>>> denied.
>>>
>>> I'm not sure we should bother.  After all, if the problem is real, we
>>> will get an error further down the line, when we use the handle to
>>> that thread to do something with it.
>>>
>>> IOW, I see no need to thrash the entire session because of something
>>> that isn't fatal.
>>
>> I didn't mean to change the behavior - only hide the warning.
> 
> Ah, OK.  I think I'll do that in the next version of the patch.
> 
>> I think the only part we need to look at before putting your patch
>> in is analyzing its side-effects.
> 
> I'll look into that and post the results.

I see that the GetThreadContext call (do_windows_fetch_inferior_registers)
doesn't check for errors (I think it should (*)).  It'd be interesting to know whether gdb can
actually read the registers off of this thread, and if so, what's the
thread's backtrace like.

(*) - note the other place that calls GetThreadContext does it under
a CHECK:
      CHECK (GetThreadContext (th->h, &th->context));
  
Eli Zaretskii March 19, 2014, 3:40 a.m. UTC | #6
> Date: Tue, 18 Mar 2014 17:33:16 +0000
> From: Pedro Alves <palves@redhat.com>
> CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
> 
> I see that the GetThreadContext call (do_windows_fetch_inferior_registers)
> doesn't check for errors (I think it should (*)).  It'd be interesting to know whether gdb can
> actually read the registers off of this thread

How to see those registers?

> and if so, what's the thread's backtrace like.

Why would we be interested in that info?
  
Pedro Alves March 19, 2014, 10:06 a.m. UTC | #7
On 03/19/2014 03:40 AM, Eli Zaretskii wrote:
>> Date: Tue, 18 Mar 2014 17:33:16 +0000
>> From: Pedro Alves <palves@redhat.com>
>> CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
>>
>> I see that the GetThreadContext call (do_windows_fetch_inferior_registers)
>> doesn't check for errors (I think it should (*)).  It'd be interesting to know whether gdb can
>> actually read the registers off of this thread
> 
> How to see those registers?

Just "info registers" ?

If we can't even read registers off of it, and GetThreadContext
is failing, it means after your patch we'll be showing bogus
register contents for these threads.  But I think GetThreadContext
will indeed succeed for these threads.

AFAIK, we don't really need the SuspendThread calls when handling
a debug event, given that when WaitForDebugEvent returns a
stop event, all threads have already been stopped by the OS for us.
We really only need to SuspendThread threads when we might want
to leave most threads paused on the next resume, for e.g., when
stepping over a breakpoint.  The suspend count handling in
windows-nat.c is quite messy, and looking at the code, it doesn't
look like we actually get that right, given we only SuspendThread
threads if we try to read their registers, and so if nothing reads
registers off all threads when e.g., handling a breakpoint that
we decide needs to be stepped over (which we don't), then we end
up resuming threads we shouldn't.  But it might be I'm missing
something.  I wonder whether schedlock.exp is passing on
Windows/Cygwin cleanly.  IMO, this whole SuspendThread business
is ripe for simplification/cleanup.

>> and if so, what's the thread's backtrace like.
> 
> Why would we be interested in that info?

It'll likely show us the thread is stopped at some ntdll.dll function
or some such, and from the function name we will likely
be able to infer what/which thread is this, like, e.g., whether
it's a thread injected with DebugBreakProcess or some such
(internally by one of the system dlls or the dlls your app
links with).
  
Eli Zaretskii March 19, 2014, 4:24 p.m. UTC | #8
> Date: Wed, 19 Mar 2014 10:06:51 +0000
> From: Pedro Alves <palves@redhat.com>
> CC: brobecker@adacore.com, gdb-patches@sourceware.org
> 
> On 03/19/2014 03:40 AM, Eli Zaretskii wrote:
> >> Date: Tue, 18 Mar 2014 17:33:16 +0000
> >> From: Pedro Alves <palves@redhat.com>
> >> CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
> >>
> >> I see that the GetThreadContext call (do_windows_fetch_inferior_registers)
> >> doesn't check for errors (I think it should (*)).  It'd be interesting to know whether gdb can
> >> actually read the registers off of this thread
> > 
> > How to see those registers?
> 
> Just "info registers" ?

That's what I thought, but ...

> If we can't even read registers off of it, and GetThreadContext
> is failing, it means after your patch we'll be showing bogus
> register contents for these threads.

...how do you tell bogus register contents from correct contents?
It's not like I know which register should have what value at any
given time, do I?

> But I think GetThreadContext will indeed succeed for these threads.

Well, at least MSDN begs to differ:

  You cannot get a valid context for a running thread. Use the
  SuspendThread function to suspend the thread before calling
  GetThreadContext.

> AFAIK, we don't really need the SuspendThread calls when handling
> a debug event, given that when WaitForDebugEvent returns a
> stop event, all threads have already been stopped by the OS for us.

Yes, AFAIK that's true.

> We really only need to SuspendThread threads when we might want
> to leave most threads paused on the next resume, for e.g., when
> stepping over a breakpoint.  The suspend count handling in
> windows-nat.c is quite messy, and looking at the code, it doesn't
> look like we actually get that right, given we only SuspendThread
> threads if we try to read their registers, and so if nothing reads
> registers off all threads when e.g., handling a breakpoint that
> we decide needs to be stepped over (which we don't), then we end
> up resuming threads we shouldn't.

That's assuming that stepping resumes threads.  I'm not sure, but I
really don't know enough about debugging APIs on Windows.

> It'll likely show us the thread is stopped at some ntdll.dll function
> or some such, and from the function name we will likely
> be able to infer what/which thread is this, like, e.g., whether
> it's a thread injected with DebugBreakProcess or some such
> (internally by one of the system dlls or the dlls your app
> links with).

I'll see what I can find about that, but I doubt you'd see something
telltale in the backtrace.  (The thread started by Windows for
debugging is not part of this issue; I never saw the threads that are
to have any debug-related functions on their callstacks.)
  
Pedro Alves March 19, 2014, 4:41 p.m. UTC | #9
On 03/19/2014 04:24 PM, Eli Zaretskii wrote:
>> Date: Wed, 19 Mar 2014 10:06:51 +0000
>> From: Pedro Alves <palves@redhat.com>
>> CC: brobecker@adacore.com, gdb-patches@sourceware.org
>>
>> On 03/19/2014 03:40 AM, Eli Zaretskii wrote:
>>>> Date: Tue, 18 Mar 2014 17:33:16 +0000
>>>> From: Pedro Alves <palves@redhat.com>
>>>> CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
>>>>
>>>> I see that the GetThreadContext call (do_windows_fetch_inferior_registers)
>>>> doesn't check for errors (I think it should (*)).  It'd be interesting to know whether gdb can
>>>> actually read the registers off of this thread
>>>
>>> How to see those registers?
>>
>> Just "info registers" ?
> 
> That's what I thought, but ...
> 
>> If we can't even read registers off of it, and GetThreadContext
>> is failing, it means after your patch we'll be showing bogus
>> register contents for these threads.
> 
> ...how do you tell bogus register contents from correct contents?
> It's not like I know which register should have what value at any
> given time, do I?

The point is that GDB ignores GetThreadContext errors, and so
if indeed GetThreadContext fails, GDB happily proceeds
decoding a bogus th->context.  I mean, we should do this
in do_windows_fetch_inferior_registers:

-      GetThreadContext (th->h, &th->context);
+      CHECK (GetThreadContext (th->h, &th->context));

So that GetThreadContext fails, we at least see a warning.
I assume that if GetThreadContext does not fail, then the
register contents are correct.

> 
>> But I think GetThreadContext will indeed succeed for these threads.
> 
> Well, at least MSDN begs to differ:
> 
>   You cannot get a valid context for a running thread. Use the
>   SuspendThread function to suspend the thread before calling
>   GetThreadContext.

I mean it'll succeed because we only ever read registers when
threads are stopped for debug event.  I don't mean to imply
that those threads are special WRT to GetThreadContext.  It's
not valid to get a context for a _running_ thread.  But after a
debug event, no thread is running at all.  The OS already
stopped threads for us.

> 
>> AFAIK, we don't really need the SuspendThread calls when handling
>> a debug event, given that when WaitForDebugEvent returns a
>> stop event, all threads have already been stopped by the OS for us.
> 
> Yes, AFAIK that's true.

Alright, we were talking past each other then.

I did a little websearch, and I found evidence of other debuggers
also not using SuspendThread after events:

 http://www.ollydbg.de/Help/t_run.htm

"indebugevent
Application is paused on debug event, therefore Suspendallthreads() does not need to call SuspendThread()"

> 
>> We really only need to SuspendThread threads when we might want
>> to leave most threads paused on the next resume, for e.g., when
>> stepping over a breakpoint.  The suspend count handling in
>> windows-nat.c is quite messy, and looking at the code, it doesn't
>> look like we actually get that right, given we only SuspendThread
>> threads if we try to read their registers, and so if nothing reads
>> registers off all threads when e.g., handling a breakpoint that
>> we decide needs to be stepped over (which we don't), then we end
>> up resuming threads we shouldn't.
> 
> That's assuming that stepping resumes threads.  I'm not sure, but I
> really don't know enough about debugging APIs on Windows.

There's no special step request in the debug API.  The way to set
a thread stepping is to enable the trace flag in eflags:

      if (step)
	{
	  /* Single step by setting t bit.  */
	  struct regcache *regcache = get_current_regcache ();
	  struct gdbarch *gdbarch = get_regcache_arch (regcache);
	  windows_fetch_inferior_registers (ops, regcache,
					    gdbarch_ps_regnum (gdbarch));
	  th->context.EFlags |= FLAG_TRACE_BIT;
	}

>> It'll likely show us the thread is stopped at some ntdll.dll function
>> or some such, and from the function name we will likely
>> be able to infer what/which thread is this, like, e.g., whether
>> it's a thread injected with DebugBreakProcess or some such
>> (internally by one of the system dlls or the dlls your app
>> links with).
> 
> I'll see what I can find about that, but I doubt you'd see something
> telltale in the backtrace.  (The thread started by Windows for
> debugging is not part of this issue; I never saw the threads that are
> to have any debug-related functions on their callstacks.)

Thanks!
  

Patch

--- gdb/windows-nat.c~0	2014-02-06 04:21:29.000000000 +0200
+++ gdb/windows-nat.c	2014-02-26 22:27:10.225625000 +0200
@@ -313,9 +313,10 @@  thread_rec (DWORD id, int get_context)
 		    warning (_("SuspendThread (tid=0x%x) failed."
 			       " (winerr %u)"),
 			     (unsigned) id, (unsigned) err);
-		    return NULL;
+		    th->suspended = -1;
 		  }
-		th->suspended = 1;
+		else
+		  th->suspended = 1;
 	      }
 	    else if (get_context < 0)
 	      th->suspended = -1;