Message ID | 83txawa9wk.fsf@gnu.org |
---|---|
State | Superseded |
Headers |
Return-Path: <x14314964@homiemail-mx22.g.dreamhost.com> X-Original-To: siddhesh@wilcox.dreamhost.com Delivered-To: siddhesh@wilcox.dreamhost.com Received: from homiemail-mx22.g.dreamhost.com (caibbdcaabij.dreamhost.com [208.113.200.189]) by wilcox.dreamhost.com (Postfix) with ESMTP id 16C22360084 for <siddhesh@wilcox.dreamhost.com>; Mon, 17 Mar 2014 12:43:35 -0700 (PDT) Received: by homiemail-mx22.g.dreamhost.com (Postfix, from userid 14314964) id BBCE34FB6673; Mon, 17 Mar 2014 12:43:34 -0700 (PDT) X-Original-To: gdb@patchwork.siddhesh.in Delivered-To: x14314964@homiemail-mx22.g.dreamhost.com Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by homiemail-mx22.g.dreamhost.com (Postfix) with ESMTPS id 978ED4FB6642 for <gdb@patchwork.siddhesh.in>; Mon, 17 Mar 2014 12:43:34 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:from:subject:to:reply-to:message-id; q= dns; s=default; b=GyADE1zXtMfqhCDw8HlFCWDhBlvNkgCVPaznS6x5wld3pj RlH2nq+xUt62x1sjJLP28l7KwTY63ScXhLlBT67qRQU3NiS79uoRr1QI1Kd+tsM5 QR0yIMDI7wwy47t0o0SHqUj+le29TsW6k3strQFjA9REp/u+S/SsC0D833g4k= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:from:subject:to:reply-to:message-id; s= default; bh=IqVu1BhIzuPckrX82+WNq/hxJ2w=; b=mJ9XCI90CoUTZxgu4Jtx B5cDhPnnoK7iOZal69jPCOVX21mWfL5/b4I7vkcncbZi+7EQNpkWYxaSYo9YFUL3 4TgIqLy+LizhXwAmGvSqyjapgmNJ6o2ZRhb724XXpC+VUd9LIaylQwqyB4ClKypY faR4EB92ngGbgMo/cNKgG04= Received: (qmail 4827 invoked by alias); 17 Mar 2014 19:43:33 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-gdb=patchwork.siddhesh.in@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 4817 invoked by uid 89); 17 Mar 2014 19:43:32 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.9 required=5.0 tests=AWL, BAYES_00, SPF_SOFTFAIL autolearn=no version=3.3.2 X-HELO: mtaout27.012.net.il Received: from mtaout27.012.net.il (HELO mtaout27.012.net.il) (80.179.55.183) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 17 Mar 2014 19:43:30 +0000 Received: from conversion-daemon.mtaout27.012.net.il by mtaout27.012.net.il (HyperSendmail v2007.08) id <0N2L00G00IL3NK00@mtaout27.012.net.il> for gdb-patches@sourceware.org; Mon, 17 Mar 2014 21:40:40 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by mtaout27.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0N2L0085EINSTL70@mtaout27.012.net.il> for gdb-patches@sourceware.org; Mon, 17 Mar 2014 21:40:40 +0200 (IST) Date: Mon, 17 Mar 2014 21:43:23 +0200 From: Eli Zaretskii <eliz@gnu.org> Subject: [PATCH] Fix "PC register is not available" issue To: gdb-patches@sourceware.org Reply-to: Eli Zaretskii <eliz@gnu.org> Message-id: <83txawa9wk.fsf@gnu.org> X-IsSubscribed: yes X-DH-Original-To: gdb@patchwork.siddhesh.in |
Commit Message
Eli Zaretskii
March 17, 2014, 7:43 p.m. UTC
This problem was raised and mentioned several times over the last few years. It was discussed here in the thread which started with this message: https://sourceware.org/ml/gdb-patches/2013-06/msg00237.html In the ensuing discussion, there was a consensus that these failures shouldn't be fatal, and perhaps we should even ignore them entirely. The most annoying problem with this is that after the message about failure to suspend a thread, there's another message: warning: SuspendThread (tid=0x720) failed. (winerr 5) PC register is not available <<<<<<<<<<<<<<<<<<<<<<<<< and the debug session becomes useless after that: the debuggee stops, and you cannot continue it. You can only examine memory. A very similar, perhaps the same, problem is described here: https://sourceware.org/bugzilla/show_bug.cgi?id=14018 There you can find a short test program that demonstrates the issue, and some analysis. After hitting this problem myself many times, I came to the same conclusion: namely, that GDB is trying to suspend a thread for which it doesn't have the necessary privileges. My hypothesis is that those are the threads that Windows starts in the context of the process being debugged, perhaps for the purposes of handling some debug events. (I can see the "New thread" messages from time to time, although I know for certain that the inferior didn't start any application threads.) So I came up with the patch below. The idea is that setting th->suspended to -1 just signals to GDB that the thread isn't suspended, and shouldn't be resumed; otherwise, we simply ignore the failure to suspend the thread, although the warning is still printed. I am running with this patch for almost a month, and the dreaded "PC register is not available" message didn't appear even once. No more botched debugging sessions! The test program in PR/14018 also runs indefinitely until interrupted, instead of stopping. So I suggest to install this. OK? Should I also close the Bugzilla PR?
Comments
Hi Eli, On Mon, Mar 17, 2014 at 09:43:23PM +0200, Eli Zaretskii wrote: > This problem was raised and mentioned several times over the last few > years. It was discussed here in the thread which started with this > message: > > https://sourceware.org/ml/gdb-patches/2013-06/msg00237.html Thanks for looking into this! > So I came up with the patch below. The idea is that setting > th->suspended to -1 just signals to GDB that the thread isn't > suspended, and shouldn't be resumed; otherwise, we simply ignore the > failure to suspend the thread, although the warning is still printed. > > I am running with this patch for almost a month, and the dreaded "PC > register is not available" message didn't appear even once. No more > botched debugging sessions! The test program in PR/14018 also runs > indefinitely until interrupted, instead of stopping. > > So I suggest to install this. OK? Should I also close the Bugzilla > PR? > > --- gdb/windows-nat.c~0 2014-02-06 04:21:29.000000000 +0200 > +++ gdb/windows-nat.c 2014-02-26 22:27:10.225625000 +0200 > @@ -313,9 +313,10 @@ thread_rec (DWORD id, int get_context) > warning (_("SuspendThread (tid=0x%x) failed." > " (winerr %u)"), > (unsigned) id, (unsigned) err); > - return NULL; > + th->suspended = -1; > } > - th->suspended = 1; > + else > + th->suspended = 1; > } > else if (get_context < 0) > th->suspended = -1; Generally speaking, it seems to make sense to me that we would mark as unsuspended threads that we cannot suspend. But one question that rises from doing that is: how does the rest of GDB handle this thread? In particular, does the thread show up in "info thread" and is the user able to switch to that thread? etc? Certainly, if the current situation is to leave the user stranded, the suggested approach cannot only be an improvement... Another thought I had on your patch is that we might want to limit the warning to situation where the return code is not a permission denied. It would also have been nice to double-check that the thread in question, when the error shows up, is indeed on a system thread unrelated to our program (Eg: the thread created when we try to interrupt the program with ctrl-c or ctrl-break). Not sure if there is a way to determine that, though. I would have looked into that, but I don't have much time this week, and might be equally busy the following 2 weeks, so I didn't want to delay my answer any further. Overall, your patch looks very promising. Sorry I can't be anymore support for the moment.
> Date: Tue, 18 Mar 2014 09:16:08 -0700 > From: Joel Brobecker <brobecker@adacore.com> > Cc: gdb-patches@sourceware.org > > Generally speaking, it seems to make sense to me that we would > mark as unsuspended threads that we cannot suspend. But one question > that rises from doing that is: how does the rest of GDB handle > this thread? In particular, does the thread show up in "info thread" > and is the user able to switch to that thread? etc? I can try finding out, but it's not easy: since I made that change, these situations became so much rarer that it's a challenge to bump into them. I'll see what I can do. > Certainly, if the current situation is to leave the user stranded, > the suggested approach cannot only be an improvement... Exactly. > Another thought I had on your patch is that we might want to limit > the warning to situation where the return code is not a permission > denied. I'm not sure we should bother. After all, if the problem is real, we will get an error further down the line, when we use the handle to that thread to do something with it. IOW, I see no need to thrash the entire session because of something that isn't fatal. > It would also have been nice to double-check that the thread in > question, when the error shows up, is indeed on a system thread > unrelated to our program (Eg: the thread created when we try to > interrupt the program with ctrl-c or ctrl-break). Not sure if > there is a way to determine that, though. I never use Ctrl-C/Ctrl/BREAK during debugging. The threads that I was talking about just appear out of thin air, when the debuggee hits a breakpoint, for example. > I would have looked into that, but I don't have much time this week, and > might be equally busy the following 2 weeks, so I didn't want to delay > my answer any further. Overall, your patch looks very promising. > > Sorry I can't be anymore support for the moment. There's no rush. My problem is solved, so I can wait patiently ;-) Thanks.
> > Another thought I had on your patch is that we might want to limit > > the warning to situation where the return code is not a permission > > denied. > > I'm not sure we should bother. After all, if the problem is real, we > will get an error further down the line, when we use the handle to > that thread to do something with it. > > IOW, I see no need to thrash the entire session because of something > that isn't fatal. I didn't mean to change the behavior - only hide the warning. In this case, if it is normal that we can't suspend the thread, then there is no point in warning (scaring) the user about it. I would only generate a warning if something abnormal that we should fix occured. > > It would also have been nice to double-check that the thread in > > question, when the error shows up, is indeed on a system thread > > unrelated to our program (Eg: the thread created when we try to > > interrupt the program with ctrl-c or ctrl-break). Not sure if > > there is a way to determine that, though. > > I never use Ctrl-C/Ctrl/BREAK during debugging. The threads that I > was talking about just appear out of thin air, when the debuggee hits > a breakpoint, for example. I think we've seen that as well, I was just citing this thread as an example, because that's a case where I know that an unrelated thread gets created. > There's no rush. My problem is solved, so I can wait patiently ;-) OK :-). but don't let me stop you. I think you have as good a handle as I do, and perhaps others will comment as well. I think the only part we need to look at before putting your patch in is analyzing its side-effects. It cannot be all that bad, since you've been running with the patch for quite a while, now.
> Date: Tue, 18 Mar 2014 09:54:13 -0700 > From: Joel Brobecker <brobecker@adacore.com> > Cc: gdb-patches@sourceware.org > > > > Another thought I had on your patch is that we might want to limit > > > the warning to situation where the return code is not a permission > > > denied. > > > > I'm not sure we should bother. After all, if the problem is real, we > > will get an error further down the line, when we use the handle to > > that thread to do something with it. > > > > IOW, I see no need to thrash the entire session because of something > > that isn't fatal. > > I didn't mean to change the behavior - only hide the warning. Ah, OK. I think I'll do that in the next version of the patch. > I think the only part we need to look at before putting your patch > in is analyzing its side-effects. I'll look into that and post the results. Thanks.
On 03/18/2014 05:12 PM, Eli Zaretskii wrote: >> Date: Tue, 18 Mar 2014 09:54:13 -0700 >> From: Joel Brobecker <brobecker@adacore.com> >> Cc: gdb-patches@sourceware.org >> >>>> Another thought I had on your patch is that we might want to limit >>>> the warning to situation where the return code is not a permission >>>> denied. >>> >>> I'm not sure we should bother. After all, if the problem is real, we >>> will get an error further down the line, when we use the handle to >>> that thread to do something with it. >>> >>> IOW, I see no need to thrash the entire session because of something >>> that isn't fatal. >> >> I didn't mean to change the behavior - only hide the warning. > > Ah, OK. I think I'll do that in the next version of the patch. > >> I think the only part we need to look at before putting your patch >> in is analyzing its side-effects. > > I'll look into that and post the results. I see that the GetThreadContext call (do_windows_fetch_inferior_registers) doesn't check for errors (I think it should (*)). It'd be interesting to know whether gdb can actually read the registers off of this thread, and if so, what's the thread's backtrace like. (*) - note the other place that calls GetThreadContext does it under a CHECK: CHECK (GetThreadContext (th->h, &th->context));
> Date: Tue, 18 Mar 2014 17:33:16 +0000 > From: Pedro Alves <palves@redhat.com> > CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org > > I see that the GetThreadContext call (do_windows_fetch_inferior_registers) > doesn't check for errors (I think it should (*)). It'd be interesting to know whether gdb can > actually read the registers off of this thread How to see those registers? > and if so, what's the thread's backtrace like. Why would we be interested in that info?
On 03/19/2014 03:40 AM, Eli Zaretskii wrote: >> Date: Tue, 18 Mar 2014 17:33:16 +0000 >> From: Pedro Alves <palves@redhat.com> >> CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org >> >> I see that the GetThreadContext call (do_windows_fetch_inferior_registers) >> doesn't check for errors (I think it should (*)). It'd be interesting to know whether gdb can >> actually read the registers off of this thread > > How to see those registers? Just "info registers" ? If we can't even read registers off of it, and GetThreadContext is failing, it means after your patch we'll be showing bogus register contents for these threads. But I think GetThreadContext will indeed succeed for these threads. AFAIK, we don't really need the SuspendThread calls when handling a debug event, given that when WaitForDebugEvent returns a stop event, all threads have already been stopped by the OS for us. We really only need to SuspendThread threads when we might want to leave most threads paused on the next resume, for e.g., when stepping over a breakpoint. The suspend count handling in windows-nat.c is quite messy, and looking at the code, it doesn't look like we actually get that right, given we only SuspendThread threads if we try to read their registers, and so if nothing reads registers off all threads when e.g., handling a breakpoint that we decide needs to be stepped over (which we don't), then we end up resuming threads we shouldn't. But it might be I'm missing something. I wonder whether schedlock.exp is passing on Windows/Cygwin cleanly. IMO, this whole SuspendThread business is ripe for simplification/cleanup. >> and if so, what's the thread's backtrace like. > > Why would we be interested in that info? It'll likely show us the thread is stopped at some ntdll.dll function or some such, and from the function name we will likely be able to infer what/which thread is this, like, e.g., whether it's a thread injected with DebugBreakProcess or some such (internally by one of the system dlls or the dlls your app links with).
> Date: Wed, 19 Mar 2014 10:06:51 +0000 > From: Pedro Alves <palves@redhat.com> > CC: brobecker@adacore.com, gdb-patches@sourceware.org > > On 03/19/2014 03:40 AM, Eli Zaretskii wrote: > >> Date: Tue, 18 Mar 2014 17:33:16 +0000 > >> From: Pedro Alves <palves@redhat.com> > >> CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org > >> > >> I see that the GetThreadContext call (do_windows_fetch_inferior_registers) > >> doesn't check for errors (I think it should (*)). It'd be interesting to know whether gdb can > >> actually read the registers off of this thread > > > > How to see those registers? > > Just "info registers" ? That's what I thought, but ... > If we can't even read registers off of it, and GetThreadContext > is failing, it means after your patch we'll be showing bogus > register contents for these threads. ...how do you tell bogus register contents from correct contents? It's not like I know which register should have what value at any given time, do I? > But I think GetThreadContext will indeed succeed for these threads. Well, at least MSDN begs to differ: You cannot get a valid context for a running thread. Use the SuspendThread function to suspend the thread before calling GetThreadContext. > AFAIK, we don't really need the SuspendThread calls when handling > a debug event, given that when WaitForDebugEvent returns a > stop event, all threads have already been stopped by the OS for us. Yes, AFAIK that's true. > We really only need to SuspendThread threads when we might want > to leave most threads paused on the next resume, for e.g., when > stepping over a breakpoint. The suspend count handling in > windows-nat.c is quite messy, and looking at the code, it doesn't > look like we actually get that right, given we only SuspendThread > threads if we try to read their registers, and so if nothing reads > registers off all threads when e.g., handling a breakpoint that > we decide needs to be stepped over (which we don't), then we end > up resuming threads we shouldn't. That's assuming that stepping resumes threads. I'm not sure, but I really don't know enough about debugging APIs on Windows. > It'll likely show us the thread is stopped at some ntdll.dll function > or some such, and from the function name we will likely > be able to infer what/which thread is this, like, e.g., whether > it's a thread injected with DebugBreakProcess or some such > (internally by one of the system dlls or the dlls your app > links with). I'll see what I can find about that, but I doubt you'd see something telltale in the backtrace. (The thread started by Windows for debugging is not part of this issue; I never saw the threads that are to have any debug-related functions on their callstacks.)
On 03/19/2014 04:24 PM, Eli Zaretskii wrote: >> Date: Wed, 19 Mar 2014 10:06:51 +0000 >> From: Pedro Alves <palves@redhat.com> >> CC: brobecker@adacore.com, gdb-patches@sourceware.org >> >> On 03/19/2014 03:40 AM, Eli Zaretskii wrote: >>>> Date: Tue, 18 Mar 2014 17:33:16 +0000 >>>> From: Pedro Alves <palves@redhat.com> >>>> CC: Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org >>>> >>>> I see that the GetThreadContext call (do_windows_fetch_inferior_registers) >>>> doesn't check for errors (I think it should (*)). It'd be interesting to know whether gdb can >>>> actually read the registers off of this thread >>> >>> How to see those registers? >> >> Just "info registers" ? > > That's what I thought, but ... > >> If we can't even read registers off of it, and GetThreadContext >> is failing, it means after your patch we'll be showing bogus >> register contents for these threads. > > ...how do you tell bogus register contents from correct contents? > It's not like I know which register should have what value at any > given time, do I? The point is that GDB ignores GetThreadContext errors, and so if indeed GetThreadContext fails, GDB happily proceeds decoding a bogus th->context. I mean, we should do this in do_windows_fetch_inferior_registers: - GetThreadContext (th->h, &th->context); + CHECK (GetThreadContext (th->h, &th->context)); So that GetThreadContext fails, we at least see a warning. I assume that if GetThreadContext does not fail, then the register contents are correct. > >> But I think GetThreadContext will indeed succeed for these threads. > > Well, at least MSDN begs to differ: > > You cannot get a valid context for a running thread. Use the > SuspendThread function to suspend the thread before calling > GetThreadContext. I mean it'll succeed because we only ever read registers when threads are stopped for debug event. I don't mean to imply that those threads are special WRT to GetThreadContext. It's not valid to get a context for a _running_ thread. But after a debug event, no thread is running at all. The OS already stopped threads for us. > >> AFAIK, we don't really need the SuspendThread calls when handling >> a debug event, given that when WaitForDebugEvent returns a >> stop event, all threads have already been stopped by the OS for us. > > Yes, AFAIK that's true. Alright, we were talking past each other then. I did a little websearch, and I found evidence of other debuggers also not using SuspendThread after events: http://www.ollydbg.de/Help/t_run.htm "indebugevent Application is paused on debug event, therefore Suspendallthreads() does not need to call SuspendThread()" > >> We really only need to SuspendThread threads when we might want >> to leave most threads paused on the next resume, for e.g., when >> stepping over a breakpoint. The suspend count handling in >> windows-nat.c is quite messy, and looking at the code, it doesn't >> look like we actually get that right, given we only SuspendThread >> threads if we try to read their registers, and so if nothing reads >> registers off all threads when e.g., handling a breakpoint that >> we decide needs to be stepped over (which we don't), then we end >> up resuming threads we shouldn't. > > That's assuming that stepping resumes threads. I'm not sure, but I > really don't know enough about debugging APIs on Windows. There's no special step request in the debug API. The way to set a thread stepping is to enable the trace flag in eflags: if (step) { /* Single step by setting t bit. */ struct regcache *regcache = get_current_regcache (); struct gdbarch *gdbarch = get_regcache_arch (regcache); windows_fetch_inferior_registers (ops, regcache, gdbarch_ps_regnum (gdbarch)); th->context.EFlags |= FLAG_TRACE_BIT; } >> It'll likely show us the thread is stopped at some ntdll.dll function >> or some such, and from the function name we will likely >> be able to infer what/which thread is this, like, e.g., whether >> it's a thread injected with DebugBreakProcess or some such >> (internally by one of the system dlls or the dlls your app >> links with). > > I'll see what I can find about that, but I doubt you'd see something > telltale in the backtrace. (The thread started by Windows for > debugging is not part of this issue; I never saw the threads that are > to have any debug-related functions on their callstacks.) Thanks!
--- gdb/windows-nat.c~0 2014-02-06 04:21:29.000000000 +0200 +++ gdb/windows-nat.c 2014-02-26 22:27:10.225625000 +0200 @@ -313,9 +313,10 @@ thread_rec (DWORD id, int get_context) warning (_("SuspendThread (tid=0x%x) failed." " (winerr %u)"), (unsigned) id, (unsigned) err); - return NULL; + th->suspended = -1; } - th->suspended = 1; + else + th->suspended = 1; } else if (get_context < 0) th->suspended = -1;