Message ID | 20161223212842.42715-1-jhb@FreeBSD.org |
---|---|
State | New, archived |
Headers |
Received: (qmail 96316 invoked by alias); 23 Dec 2016 21:30:26 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 84720 invoked by uid 89); 23 Dec 2016 21:30:15 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.1 required=5.0 tests=AWL, BAYES_00, SPF_HELO_PASS, SPF_SOFTFAIL autolearn=no version=3.3.2 spammy=baldwin, Baldwin, D*FreeBSD.org, U*jhb X-HELO: mail.baldwin.cx Received: from bigwig.baldwin.cx (HELO mail.baldwin.cx) (96.47.65.170) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 23 Dec 2016 21:30:03 +0000 Received: from ralph.com (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by mail.baldwin.cx (Postfix) with ESMTPSA id ECC3910AA64 for <gdb-patches@sourceware.org>; Fri, 23 Dec 2016 16:30:01 -0500 (EST) From: John Baldwin <jhb@FreeBSD.org> To: gdb-patches@sourceware.org Subject: [PATCH] PR threads/20743: Don't attempt to suspend or resume exited threads. Date: Fri, 23 Dec 2016 13:28:42 -0800 Message-Id: <20161223212842.42715-1-jhb@FreeBSD.org> X-IsSubscribed: yes |
Commit Message
John Baldwin
Dec. 23, 2016, 9:28 p.m. UTC
When resuming a native FreeBSD process, ignore exited threads when suspending/resuming individual threads prior to continuing the process. gdb/ChangeLog: PR threads/20743 * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. (resume_all_threads_cb): Likewise. (fbsd_resume): Assert resuming thread has not exited. --- gdb/ChangeLog | 7 +++++++ gdb/fbsd-nat.c | 7 +++++++ 2 files changed, 14 insertions(+)
Comments
On 12/23/2016 03:28 PM, John Baldwin wrote: > When resuming a native FreeBSD process, ignore exited threads when > suspending/resuming individual threads prior to continuing the process. > > gdb/ChangeLog: > > PR threads/20743 > * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. > (resume_all_threads_cb): Likewise. > (fbsd_resume): Assert resuming thread has not exited. > --- > gdb/ChangeLog | 7 +++++++ > gdb/fbsd-nat.c | 7 +++++++ > 2 files changed, 14 insertions(+) > > diff --git a/gdb/ChangeLog b/gdb/ChangeLog > index db6e913..4fb3732 100644 > --- a/gdb/ChangeLog > +++ b/gdb/ChangeLog > @@ -1,3 +1,10 @@ > +2016-12-23 John Baldwin <jhb@FreeBSD.org> > + > + PR threads/20743 > + * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. > + (resume_all_threads_cb): Likewise. > + (fbsd_resume): Assert resuming thread has not exited. > + > 2016-12-22 Doug Evans <xdje42@gmail.com> > > * infrun.c (set_step_over_info): Add comment. > diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c > index ade62f1..7cd08c6 100644 > --- a/gdb/fbsd-nat.c > +++ b/gdb/fbsd-nat.c > @@ -662,6 +662,9 @@ resume_one_thread_cb (struct thread_info *tp, void *data) > if (ptid_get_pid (tp->ptid) != ptid_get_pid (*ptid)) > return 0; > > + if (is_exited (tp->ptid)) > + return 0; > + > if (ptid_get_lwp (tp->ptid) == ptid_get_lwp (*ptid)) > request = PT_RESUME; > else > @@ -680,6 +683,9 @@ resume_all_threads_cb (struct thread_info *tp, void *data) > if (!ptid_match (tp->ptid, *filter)) > return 0; > > + if (is_exited (tp->ptid)) > + return 0; > + > if (ptrace (PT_RESUME, ptid_get_lwp (tp->ptid), NULL, 0) == -1) > perror_with_name (("ptrace")); > return 0; > @@ -711,6 +717,7 @@ fbsd_resume (struct target_ops *ops, > if (ptid_lwp_p (ptid)) > { > /* If ptid is a specific LWP, suspend all other LWPs in the process. */ > + gdb_assert (!is_exited (ptid)); If we're asserting on this (since supposedly it shouldn't happen), do we need to check for is_exited on the two functions above? Also, is there a reason why we're not detecting a thread that has exited? Aren't all threads stopped at this point (for all-stop mode at least)? > iterate_over_threads (resume_one_thread_cb, &ptid); > } > else >
On Fri, Dec 23, 2016 at 15:43:19 -0600, Luis Machado wrote: > On 12/23/2016 03:28 PM, John Baldwin wrote: > > When resuming a native FreeBSD process, ignore exited threads when > > suspending/resuming individual threads prior to continuing the process. > > > > gdb/ChangeLog: > > > > PR threads/20743 > > * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. > > (resume_all_threads_cb): Likewise. > > (fbsd_resume): Assert resuming thread has not exited. > > --- > > gdb/ChangeLog | 7 +++++++ > > gdb/fbsd-nat.c | 7 +++++++ > > 2 files changed, 14 insertions(+) > > > > diff --git a/gdb/ChangeLog b/gdb/ChangeLog > > index db6e913..4fb3732 100644 > > --- a/gdb/ChangeLog > > +++ b/gdb/ChangeLog > > @@ -1,3 +1,10 @@ > > +2016-12-23 John Baldwin <jhb@FreeBSD.org> > > + > > + PR threads/20743 > > + * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. > > + (resume_all_threads_cb): Likewise. > > + (fbsd_resume): Assert resuming thread has not exited. > > + > > 2016-12-22 Doug Evans <xdje42@gmail.com> > > > > * infrun.c (set_step_over_info): Add comment. > > diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c > > index ade62f1..7cd08c6 100644 > > --- a/gdb/fbsd-nat.c > > +++ b/gdb/fbsd-nat.c > > @@ -662,6 +662,9 @@ resume_one_thread_cb (struct thread_info *tp, void *data) > > if (ptid_get_pid (tp->ptid) != ptid_get_pid (*ptid)) > > return 0; > > > > + if (is_exited (tp->ptid)) > > + return 0; > > + > > if (ptid_get_lwp (tp->ptid) == ptid_get_lwp (*ptid)) > > request = PT_RESUME; > > else > > @@ -680,6 +683,9 @@ resume_all_threads_cb (struct thread_info *tp, void *data) > > if (!ptid_match (tp->ptid, *filter)) > > return 0; > > > > + if (is_exited (tp->ptid)) > > + return 0; > > + > > if (ptrace (PT_RESUME, ptid_get_lwp (tp->ptid), NULL, 0) == -1) > > perror_with_name (("ptrace")); > > return 0; > > @@ -711,6 +717,7 @@ fbsd_resume (struct target_ops *ops, > > if (ptid_lwp_p (ptid)) > > { > > /* If ptid is a specific LWP, suspend all other LWPs in the process. */ > > + gdb_assert (!is_exited (ptid)); > > If we're asserting on this (since supposedly it shouldn't happen), do we > need to check for is_exited on the two functions above? > > Also, is there a reason why we're not detecting a thread that has > exited? Aren't all threads stopped at this point (for all-stop mode at > least)? [...] Hello, I just nailed this down after it has been annoying me for some time, fixed it with a similar patch as the one submitted by John, and came here to report it. The reason that we are "not detecting" an exited thread (at least in the scenario I got is), gdb/thread.c: --- cut --- static void delete_thread_1 (ptid_t ptid, int silent) { ... /* If this is the current thread, or there's code out there that relies on it existing (refcount > 0) we can't delete yet. Mark it as exited, and notify it. */ if (tp->refcount > 0 || ptid_equal (tp->ptid, inferior_ptid)) { ... /* Will be really deleted some other time. */ printf_unfiltered ("========== Will be really deleted some other time %u\n", ptid); return; } ... if (tpprev) tpprev->next = tp->next; else thread_list = tp->next; --- cut --- In my scenario tp->refcount is 0, but "ptid_equal (tp->ptid, inferior_ptid)" is true, so the thread's entry is not removed from the global "threads_list". The gdb output (with "set debug fbsd-lwp" enabled): --- cut --- FLWP: adding thread for LWP 102009 [New LWP 102009 of process 40304] FLWP: fbsd_resume for ptid (-1, 0, 0) FLWP: fbsd_resume for ptid (40304, 102009, 0) FLWP: fbsd_resume for ptid (-1, 0, 0) FLWP: fbsd_resume for ptid (40304, 102009, 0) FLWP: fbsd_resume for ptid (-1, 0, 0) FLWP: deleting thread for LWP 102009 [LWP 102009 of process 40304 exited] ... ptrace: No such process. --- cut --- Hope this helps.
On Tuesday, December 27, 2016 05:43:29 PM Vasil Dimov wrote: > On Fri, Dec 23, 2016 at 15:43:19 -0600, Luis Machado wrote: > > On 12/23/2016 03:28 PM, John Baldwin wrote: > > > When resuming a native FreeBSD process, ignore exited threads when > > > suspending/resuming individual threads prior to continuing the process. > > > > > > gdb/ChangeLog: > > > > > > PR threads/20743 > > > * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. > > > (resume_all_threads_cb): Likewise. > > > (fbsd_resume): Assert resuming thread has not exited. > > > --- > > > gdb/ChangeLog | 7 +++++++ > > > gdb/fbsd-nat.c | 7 +++++++ > > > 2 files changed, 14 insertions(+) > > > > > > diff --git a/gdb/ChangeLog b/gdb/ChangeLog > > > index db6e913..4fb3732 100644 > > > --- a/gdb/ChangeLog > > > +++ b/gdb/ChangeLog > > > @@ -1,3 +1,10 @@ > > > +2016-12-23 John Baldwin <jhb@FreeBSD.org> > > > + > > > + PR threads/20743 > > > + * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. > > > + (resume_all_threads_cb): Likewise. > > > + (fbsd_resume): Assert resuming thread has not exited. > > > + > > > 2016-12-22 Doug Evans <xdje42@gmail.com> > > > > > > * infrun.c (set_step_over_info): Add comment. > > > diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c > > > index ade62f1..7cd08c6 100644 > > > --- a/gdb/fbsd-nat.c > > > +++ b/gdb/fbsd-nat.c > > > @@ -662,6 +662,9 @@ resume_one_thread_cb (struct thread_info *tp, void *data) > > > if (ptid_get_pid (tp->ptid) != ptid_get_pid (*ptid)) > > > return 0; > > > > > > + if (is_exited (tp->ptid)) > > > + return 0; > > > + > > > if (ptid_get_lwp (tp->ptid) == ptid_get_lwp (*ptid)) > > > request = PT_RESUME; > > > else > > > @@ -680,6 +683,9 @@ resume_all_threads_cb (struct thread_info *tp, void *data) > > > if (!ptid_match (tp->ptid, *filter)) > > > return 0; > > > > > > + if (is_exited (tp->ptid)) > > > + return 0; > > > + > > > if (ptrace (PT_RESUME, ptid_get_lwp (tp->ptid), NULL, 0) == -1) > > > perror_with_name (("ptrace")); > > > return 0; > > > @@ -711,6 +717,7 @@ fbsd_resume (struct target_ops *ops, > > > if (ptid_lwp_p (ptid)) > > > { > > > /* If ptid is a specific LWP, suspend all other LWPs in the process. */ > > > + gdb_assert (!is_exited (ptid)); > > > > If we're asserting on this (since supposedly it shouldn't happen), do we > > need to check for is_exited on the two functions above? > > > > Also, is there a reason why we're not detecting a thread that has > > exited? Aren't all threads stopped at this point (for all-stop mode at > > least)? > [...] > > Hello, > > I just nailed this down after it has been annoying me for some time, > fixed it with a similar patch as the one submitted by John, and came > here to report it. > > The reason that we are "not detecting" an exited thread (at least in the > scenario I got is), gdb/thread.c: > > --- cut --- > static void > delete_thread_1 (ptid_t ptid, int silent) > { > ... > /* If this is the current thread, or there's code out there that > relies on it existing (refcount > 0) we can't delete yet. Mark > it as exited, and notify it. */ > if (tp->refcount > 0 > || ptid_equal (tp->ptid, inferior_ptid)) > { > ... > /* Will be really deleted some other time. */ > printf_unfiltered ("========== Will be really deleted some other time %u\n", ptid); > return; > } > ... > if (tpprev) > tpprev->next = tp->next; > else > thread_list = tp->next; > --- cut --- > > In my scenario tp->refcount is 0, but > "ptid_equal (tp->ptid, inferior_ptid)" is true, so the thread's entry is > not removed from the global "threads_list". > > The gdb output (with "set debug fbsd-lwp" enabled): > > --- cut --- > FLWP: adding thread for LWP 102009 > [New LWP 102009 of process 40304] > FLWP: fbsd_resume for ptid (-1, 0, 0) > FLWP: fbsd_resume for ptid (40304, 102009, 0) > FLWP: fbsd_resume for ptid (-1, 0, 0) > FLWP: fbsd_resume for ptid (40304, 102009, 0) > FLWP: fbsd_resume for ptid (-1, 0, 0) > FLWP: deleting thread for LWP 102009 > [LWP 102009 of process 40304 exited] > ... > ptrace: No such process. > --- cut --- > > Hope this helps. In particular, the sequence of events is this: - an LWP (T1) reports a "normal" event (in the test case it is hitting a breakpoint). This is reported to the core and sets the current thread (and thus inferior_ptid) to T1. - the same LWP (T1) then exits and a thread exit event is reported via ptrace() to the native target. The native target calls delete_thread, but the thread is not removed, just marked EXITING since it == inferior_ptid as Vasil noted. The native target just continues the process explicitly via ptrace() without reporting any event to the core aside from the call to delete_thread(). - some other LWP (T2) reports an event (in the test case it is a breakpoint). - the user continues which invokes fbsd_resume() which wants to resume all threads. Here iterate_over_threads() in fbsd_resume() will encounters the exited thread for T1 since nothing has called thread_update_list() (which would invoke delete_exited_threads() from fbsd_update_thread_list()). Since the thread is exited, trying to manipulate it via ptrace() results in an error. I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS instead of explicitly continuing the process, but that doesn't help, and it means that the ptid being returned is still T1 in that case. I'm not sure if I should explicitly be calling delete_exited_threads() in fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume() could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't clear to me which of these is preferred since both are in use). I added the assertion for my own sanity. I suspect gdb should never try to invoke target_resume() with a ptid of an exited thread, but if for some reason it did the effect on FreeBSD would be a hang since we would suspend all the other threads and when the process was continued via PT_CONTINUE it would have nothing to do and would never return from wait(). I'd rather have gdb fail an assertion in that case rather than hang.
On Tue, Dec 27, 2016 at 13:03:27 -0800, John Baldwin wrote: [...] > I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS > instead of explicitly continuing the process, but that doesn't help, and it > means that the ptid being returned is still T1 in that case. > > I'm not sure if I should explicitly be calling delete_exited_threads() in > fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume() > could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't > clear to me which of these is preferred since both are in use). > > I added the assertion for my own sanity. I suspect gdb should never try to > invoke target_resume() with a ptid of an exited thread, but if for some > reason it did the effect on FreeBSD would be a hang since we would suspend > all the other threads and when the process was continued via PT_CONTINUE it > would have nothing to do and would never return from wait(). I'd rather have > gdb fail an assertion in that case rather than hang. [...] Hi, I am not sure if this is related, but since I get a hang I would rather mention it: with the John's patch (including the assert) gdb does not emit the "ptrace: No such process" error, but when I attempt to quit, it hangs: --- cut --- ^C Thread 1 received signal SIGINT, Interrupt. [Switching to LWP 100746 of process 3453] 0x0000000804e59c7a in _poll () from /lib/libc.so.7 (gdb) q A debugging session is active. Inferior 1 [process 3453] will be killed. Quit anyway? (y or n) y (hangs here) --- cut --- It has hung here: --- cut --- (top-gdb) bt #0 0x0000000803526df8 in _wait4 () from /lib/libc.so.7 During symbol reading, cannot get low and high bounds for subprogram DIE at 76931. During symbol reading, incomplete CFI data; unspecified registers (e.g., rax) at 0x803a53a8e. #1 0x0000000803a53abc in __thr_wait4 (pid=3453, pid@entry=<optimized out>, status=0x7fffffffe304, status@entry=<optimized out>, options=0, options@entry=<optimized out>, rusage=0x0, rusage@entry=<optimized out>) at /usr/src/lib/libthr/thread/thr_syscalls.c:563 #2 0x0000000000540d85 in inf_ptrace_mourn_inferior(target_ops*) () #3 0x000000000067c28f in target_mourn_inferior() () #4 0x0000000000540c6d in inf_ptrace_kill(target_ops*) () #5 0x000000000072769a in kill_or_detach(inferior*, void*) () #6 0x000000000074bdc9 in iterate_over_inferiors(int (*)(inferior*, void*), void*) () #7 0x00000000007272b2 in quit_force(char*, int) () #8 0x0000000000576f74 in cmd_func(cmd_list_element*, char*, int) () #9 0x000000000072698d in execute_command(char*, int) () #10 0x000000000065b2d0 in command_line_handler(char*) () #11 0x000000000065aaf2 in gdb_rl_callback_handler(char*) () #12 0x0000000801c41d7a in rl_callback_read_char () from /usr/local/lib/libreadline.so.6 #13 0x000000000065a880 in gdb_rl_callback_read_char_wrapper(void*) () #14 0x000000000065ae80 in stdin_event_handler(int, void*) () #15 0x0000000000659cf3 in gdb_wait_for_event(int) () #16 0x00000000006598a3 in gdb_do_one_event() () #17 0x0000000000659df0 in start_event_loop() () #18 0x0000000000654c5c in captured_command_loop(void*) () #19 0x0000000000650037 in catch_errors(int (*)(void*), void*, char*, return_mask) () #20 0x0000000000654696 in gdb_main(captured_main_args*) () #21 0x0000000000408643 in main () (top-gdb) --- cut --- Calling delete_exited_threads before iterate_over_threads in fbsd_resume does not fix the hang. inf_ptrace_mourn_inferior calls waitpid and it hangs on the pid of the program being debugged, which is in TX state: T Marks a stopped process. X The process is being traced or debugged.
On Wednesday, December 28, 2016 09:07:07 AM Vasil Dimov wrote: > On Tue, Dec 27, 2016 at 13:03:27 -0800, John Baldwin wrote: > [...] > > I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS > > instead of explicitly continuing the process, but that doesn't help, and it > > means that the ptid being returned is still T1 in that case. > > > > I'm not sure if I should explicitly be calling delete_exited_threads() in > > fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume() > > could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't > > clear to me which of these is preferred since both are in use). > > > > I added the assertion for my own sanity. I suspect gdb should never try to > > invoke target_resume() with a ptid of an exited thread, but if for some > > reason it did the effect on FreeBSD would be a hang since we would suspend > > all the other threads and when the process was continued via PT_CONTINUE it > > would have nothing to do and would never return from wait(). I'd rather have > > gdb fail an assertion in that case rather than hang. > [...] > > Hi, > > I am not sure if this is related, but since I get a hang I would rather > mention it: with the John's patch (including the assert) gdb does not > emit the "ptrace: No such process" error, but when I attempt to quit, > it hangs: No, this is a separate bug in the kernel whereby a process doesn't treat PT_KILL as a detach-like event but incorrectly expects to keep getting PT_CONTINUE events for a while until it finally exits. I'm working on writing up regression/unit tests for PT_KILL and then fixing the bug.
On Tuesday, December 27, 2016 01:03:27 PM John Baldwin wrote: > On Tuesday, December 27, 2016 05:43:29 PM Vasil Dimov wrote: > > On Fri, Dec 23, 2016 at 15:43:19 -0600, Luis Machado wrote: > > > On 12/23/2016 03:28 PM, John Baldwin wrote: > > > > When resuming a native FreeBSD process, ignore exited threads when > > > > suspending/resuming individual threads prior to continuing the process. > > > > > > > > gdb/ChangeLog: > > > > > > > > PR threads/20743 > > > > * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. > > > > (resume_all_threads_cb): Likewise. > > > > (fbsd_resume): Assert resuming thread has not exited. > > > > --- > > > > gdb/ChangeLog | 7 +++++++ > > > > gdb/fbsd-nat.c | 7 +++++++ > > > > 2 files changed, 14 insertions(+) > > > > > > > > diff --git a/gdb/ChangeLog b/gdb/ChangeLog > > > > index db6e913..4fb3732 100644 > > > > --- a/gdb/ChangeLog > > > > +++ b/gdb/ChangeLog > > > > @@ -1,3 +1,10 @@ > > > > +2016-12-23 John Baldwin <jhb@FreeBSD.org> > > > > + > > > > + PR threads/20743 > > > > + * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. > > > > + (resume_all_threads_cb): Likewise. > > > > + (fbsd_resume): Assert resuming thread has not exited. > > > > + > > > > 2016-12-22 Doug Evans <xdje42@gmail.com> > > > > > > > > * infrun.c (set_step_over_info): Add comment. > > > > diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c > > > > index ade62f1..7cd08c6 100644 > > > > --- a/gdb/fbsd-nat.c > > > > +++ b/gdb/fbsd-nat.c > > > > @@ -662,6 +662,9 @@ resume_one_thread_cb (struct thread_info *tp, void *data) > > > > if (ptid_get_pid (tp->ptid) != ptid_get_pid (*ptid)) > > > > return 0; > > > > > > > > + if (is_exited (tp->ptid)) > > > > + return 0; > > > > + > > > > if (ptid_get_lwp (tp->ptid) == ptid_get_lwp (*ptid)) > > > > request = PT_RESUME; > > > > else > > > > @@ -680,6 +683,9 @@ resume_all_threads_cb (struct thread_info *tp, void *data) > > > > if (!ptid_match (tp->ptid, *filter)) > > > > return 0; > > > > > > > > + if (is_exited (tp->ptid)) > > > > + return 0; > > > > + > > > > if (ptrace (PT_RESUME, ptid_get_lwp (tp->ptid), NULL, 0) == -1) > > > > perror_with_name (("ptrace")); > > > > return 0; > > > > @@ -711,6 +717,7 @@ fbsd_resume (struct target_ops *ops, > > > > if (ptid_lwp_p (ptid)) > > > > { > > > > /* If ptid is a specific LWP, suspend all other LWPs in the process. */ > > > > + gdb_assert (!is_exited (ptid)); > > > > > > If we're asserting on this (since supposedly it shouldn't happen), do we > > > need to check for is_exited on the two functions above? > > > > > > Also, is there a reason why we're not detecting a thread that has > > > exited? Aren't all threads stopped at this point (for all-stop mode at > > > least)? > > [...] > > > > Hello, > > > > I just nailed this down after it has been annoying me for some time, > > fixed it with a similar patch as the one submitted by John, and came > > here to report it. > > > > The reason that we are "not detecting" an exited thread (at least in the > > scenario I got is), gdb/thread.c: > > > > --- cut --- > > static void > > delete_thread_1 (ptid_t ptid, int silent) > > { > > ... > > /* If this is the current thread, or there's code out there that > > relies on it existing (refcount > 0) we can't delete yet. Mark > > it as exited, and notify it. */ > > if (tp->refcount > 0 > > || ptid_equal (tp->ptid, inferior_ptid)) > > { > > ... > > /* Will be really deleted some other time. */ > > printf_unfiltered ("========== Will be really deleted some other time %u\n", ptid); > > return; > > } > > ... > > if (tpprev) > > tpprev->next = tp->next; > > else > > thread_list = tp->next; > > --- cut --- > > > > In my scenario tp->refcount is 0, but > > "ptid_equal (tp->ptid, inferior_ptid)" is true, so the thread's entry is > > not removed from the global "threads_list". > > > > The gdb output (with "set debug fbsd-lwp" enabled): > > > > --- cut --- > > FLWP: adding thread for LWP 102009 > > [New LWP 102009 of process 40304] > > FLWP: fbsd_resume for ptid (-1, 0, 0) > > FLWP: fbsd_resume for ptid (40304, 102009, 0) > > FLWP: fbsd_resume for ptid (-1, 0, 0) > > FLWP: fbsd_resume for ptid (40304, 102009, 0) > > FLWP: fbsd_resume for ptid (-1, 0, 0) > > FLWP: deleting thread for LWP 102009 > > [LWP 102009 of process 40304 exited] > > ... > > ptrace: No such process. > > --- cut --- > > > > Hope this helps. > > In particular, the sequence of events is this: > > - an LWP (T1) reports a "normal" event (in the test case it is hitting a > breakpoint). This is reported to the core and sets the current thread > (and thus inferior_ptid) to T1. > - the same LWP (T1) then exits and a thread exit event is reported via > ptrace() to the native target. The native target calls delete_thread, > but the thread is not removed, just marked EXITING since it == > inferior_ptid as Vasil noted. The native target just > continues the process explicitly via ptrace() without reporting any > event to the core aside from the call to delete_thread(). > - some other LWP (T2) reports an event (in the test case it is a > breakpoint). > - the user continues which invokes fbsd_resume() which wants to resume > all threads. Here iterate_over_threads() in fbsd_resume() will > encounters the exited thread for T1 since nothing has called > thread_update_list() (which would invoke delete_exited_threads() from > fbsd_update_thread_list()). Since the thread is exited, trying to > manipulate it via ptrace() results in an error. > > I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS > instead of explicitly continuing the process, but that doesn't help, and it > means that the ptid being returned is still T1 in that case. > > I'm not sure if I should explicitly be calling delete_exited_threads() in > fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume() > could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't > clear to me which of these is preferred since both are in use). > > I added the assertion for my own sanity. I suspect gdb should never try to > invoke target_resume() with a ptid of an exited thread, but if for some > reason it did the effect on FreeBSD would be a hang since we would suspend > all the other threads and when the process was continued via PT_CONTINUE it > would have nothing to do and would never return from wait(). I'd rather have > gdb fail an assertion in that case rather than hang. Luis (or anyone else), any thoughts on the above or if there is a better way to solve this (e.g. calling delete_exited_threads() explicitly in fbsd_resume() before iterate_threads() and/or using ALL_NONEXITED_THREADS instead of iterate_threads)?
On 12/28/2016 11:37 AM, John Baldwin wrote: > On Wednesday, December 28, 2016 09:07:07 AM Vasil Dimov wrote: >> On Tue, Dec 27, 2016 at 13:03:27 -0800, John Baldwin wrote: >> [...] >>> I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS >>> instead of explicitly continuing the process, but that doesn't help, and it >>> means that the ptid being returned is still T1 in that case. >>> >>> I'm not sure if I should explicitly be calling delete_exited_threads() in >>> fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume() >>> could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't >>> clear to me which of these is preferred since both are in use). >>> >>> I added the assertion for my own sanity. I suspect gdb should never try to >>> invoke target_resume() with a ptid of an exited thread, but if for some >>> reason it did the effect on FreeBSD would be a hang since we would suspend >>> all the other threads and when the process was continued via PT_CONTINUE it >>> would have nothing to do and would never return from wait(). I'd rather have >>> gdb fail an assertion in that case rather than hang. >> [...] >> >> Hi, >> >> I am not sure if this is related, but since I get a hang I would rather >> mention it: with the John's patch (including the assert) gdb does not >> emit the "ptrace: No such process" error, but when I attempt to quit, >> it hangs: > > No, this is a separate bug in the kernel whereby a process doesn't > treat PT_KILL as a detach-like event but incorrectly expects to keep > getting PT_CONTINUE events for a while until it finally exits. I'm > working on writing up regression/unit tests for PT_KILL and then > fixing the bug. > I think the patch is mainly papering over a bigger problem. My guess is that the native fbsd backend is not doing something it should. I'd check how linux-nat.c is doing things and then try to confirm the fbsd behavior is sane. For example, i noticed linux-nat.c has exit_lwp (...) that handles deletion of both thread information and the thread itself (lwp). Even if it is the currently-selected thread, we *will* get the lwp removed from the list of existing lwp's. It doesn't make sense to keep a thread that has already exitted in the list of threads we are manipulating.
On Thursday, January 12, 2017 10:29:00 AM Luis Machado wrote: > On 12/28/2016 11:37 AM, John Baldwin wrote: > > On Wednesday, December 28, 2016 09:07:07 AM Vasil Dimov wrote: > >> On Tue, Dec 27, 2016 at 13:03:27 -0800, John Baldwin wrote: > >> [...] > >>> I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS > >>> instead of explicitly continuing the process, but that doesn't help, and it > >>> means that the ptid being returned is still T1 in that case. > >>> > >>> I'm not sure if I should explicitly be calling delete_exited_threads() in > >>> fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume() > >>> could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't > >>> clear to me which of these is preferred since both are in use). > >>> > >>> I added the assertion for my own sanity. I suspect gdb should never try to > >>> invoke target_resume() with a ptid of an exited thread, but if for some > >>> reason it did the effect on FreeBSD would be a hang since we would suspend > >>> all the other threads and when the process was continued via PT_CONTINUE it > >>> would have nothing to do and would never return from wait(). I'd rather have > >>> gdb fail an assertion in that case rather than hang. > >> [...] > >> > >> Hi, > >> > >> I am not sure if this is related, but since I get a hang I would rather > >> mention it: with the John's patch (including the assert) gdb does not > >> emit the "ptrace: No such process" error, but when I attempt to quit, > >> it hangs: > > > > No, this is a separate bug in the kernel whereby a process doesn't > > treat PT_KILL as a detach-like event but incorrectly expects to keep > > getting PT_CONTINUE events for a while until it finally exits. I'm > > working on writing up regression/unit tests for PT_KILL and then > > fixing the bug. > > > > I think the patch is mainly papering over a bigger problem. My guess is > that the native fbsd backend is not doing something it should. > > I'd check how linux-nat.c is doing things and then try to confirm the > fbsd behavior is sane. > > For example, i noticed linux-nat.c has exit_lwp (...) that handles > deletion of both thread information and the thread itself (lwp). Even if > it is the currently-selected thread, we *will* get the lwp removed from > the list of existing lwp's. FreeBSD's backend doesn't maintain a separate lwp list, it just uses the existing GDB thread list. For FreeBSD's backend the two lists would simply mirror each other so it seems a bit of a waste to maintain a duplicate list. exit_lwp() calls delete_thread() which is the same thing the FreeBSD backend is doing, so if that is the current thread in inferior_ptid, the Linux backend will also being leaving the exited thread around in GDB's list until some future call to delete_exited_threads(). I think the thing that makes Linux work is that it doesn't use GDB's thread list. Meaning, it doesn't walk over GDB's thread list, but instead iterates over its private LWP list via iterate_over_lwps(). It would seem that GDB's thread list is designed so that backends shouldn't need their own thread list (you can add target-specific data with a custom destructor that gets invoked when freeing a thread for example), but the Linux backend doesn't choose to use it that way? Looking at some other threaded backends: - aix-thread.c relies on custom ptrace ops that resume a single thread - darwin-nat.c uses its own thread list (stored in the inferior's private data) instead of GDB's thread list. - gnu-nat.c uses its own thread list instead of GDB's thread list. - obsd-nat.c uses GDB's thread list but doesn't seem to support resuming individual threads (only entire processes). - procfs.c maintains its own thread list, but it doesn't seem to use it for resume but relies on the associated kernel resuming either an entire process or a single thread in a process via different ioctls. - remote.c:remote_resume() uses ALL_NON_EXITED_THREADS - windows_nat.cwindows_resume() calls windows_continue() which uses a target-internal thread list rather than GDB's thread list. > It doesn't make sense to keep a thread that has already exitted in the > list of threads we are manipulating. FreeBSD's backend isn't making that choice. delete_thread() in threads.c is the one making that choice. If FreeBSD's backend were to define its own thread list, the contents would be identical except it would not include any exited threads, so skipping exited threads gives the same result as walking a hypothetical private list.
On 01/12/2017 01:16 PM, John Baldwin wrote: > On Thursday, January 12, 2017 10:29:00 AM Luis Machado wrote: >> On 12/28/2016 11:37 AM, John Baldwin wrote: >>> On Wednesday, December 28, 2016 09:07:07 AM Vasil Dimov wrote: >>>> On Tue, Dec 27, 2016 at 13:03:27 -0800, John Baldwin wrote: >>>> [...] >>>>> I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS >>>>> instead of explicitly continuing the process, but that doesn't help, and it >>>>> means that the ptid being returned is still T1 in that case. >>>>> >>>>> I'm not sure if I should explicitly be calling delete_exited_threads() in >>>>> fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume() >>>>> could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't >>>>> clear to me which of these is preferred since both are in use). >>>>> >>>>> I added the assertion for my own sanity. I suspect gdb should never try to >>>>> invoke target_resume() with a ptid of an exited thread, but if for some >>>>> reason it did the effect on FreeBSD would be a hang since we would suspend >>>>> all the other threads and when the process was continued via PT_CONTINUE it >>>>> would have nothing to do and would never return from wait(). I'd rather have >>>>> gdb fail an assertion in that case rather than hang. >>>> [...] >>>> >>>> Hi, >>>> >>>> I am not sure if this is related, but since I get a hang I would rather >>>> mention it: with the John's patch (including the assert) gdb does not >>>> emit the "ptrace: No such process" error, but when I attempt to quit, >>>> it hangs: >>> >>> No, this is a separate bug in the kernel whereby a process doesn't >>> treat PT_KILL as a detach-like event but incorrectly expects to keep >>> getting PT_CONTINUE events for a while until it finally exits. I'm >>> working on writing up regression/unit tests for PT_KILL and then >>> fixing the bug. >>> >> >> I think the patch is mainly papering over a bigger problem. My guess is >> that the native fbsd backend is not doing something it should. >> >> I'd check how linux-nat.c is doing things and then try to confirm the >> fbsd behavior is sane. >> >> For example, i noticed linux-nat.c has exit_lwp (...) that handles >> deletion of both thread information and the thread itself (lwp). Even if >> it is the currently-selected thread, we *will* get the lwp removed from >> the list of existing lwp's. > > FreeBSD's backend doesn't maintain a separate lwp list, it just uses > the existing GDB thread list. For FreeBSD's backend the two lists would > simply mirror each other so it seems a bit of a waste to maintain a > duplicate list. exit_lwp() calls delete_thread() which is the same thing > the FreeBSD backend is doing, so if that is the current thread in > inferior_ptid, the Linux backend will also being leaving the exited > thread around in GDB's list until some future call to delete_exited_threads(). > > I think the thing that makes Linux work is that it doesn't use GDB's > thread list. Meaning, it doesn't walk over GDB's thread list, but instead > iterates over its private LWP list via iterate_over_lwps(). It would seem > that GDB's thread list is designed so that backends shouldn't need their > own thread list (you can add target-specific data with a custom destructor > that gets invoked when freeing a thread for example), but the Linux backend > doesn't choose to use it that way? > > Looking at some other threaded backends: > > - aix-thread.c relies on custom ptrace ops that resume a single thread > - darwin-nat.c uses its own thread list (stored in the inferior's > private data) instead of GDB's thread list. > - gnu-nat.c uses its own thread list instead of GDB's thread list. > - obsd-nat.c uses GDB's thread list but doesn't seem to support resuming > individual threads (only entire processes). > - procfs.c maintains its own thread list, but it doesn't seem to use it > for resume but relies on the associated kernel resuming either an > entire process or a single thread in a process via different ioctls. > - remote.c:remote_resume() uses ALL_NON_EXITED_THREADS > - windows_nat.cwindows_resume() calls windows_continue() which uses a > target-internal thread list rather than GDB's thread list. > >> It doesn't make sense to keep a thread that has already exitted in the >> list of threads we are manipulating. > > FreeBSD's backend isn't making that choice. delete_thread() in threads.c > is the one making that choice. If FreeBSD's backend were to define its > own thread list, the contents would be identical except it would not > include any exited threads, so skipping exited threads gives the same > result as walking a hypothetical private list. > So i take it using ALL_NON_EXITED_THREADS is something that would seem reasonable to use in this case and not iterate through all threads (even ones marked exitting)?
On Thursday, January 12, 2017 07:27:43 PM Luis Machado wrote: > On 01/12/2017 01:16 PM, John Baldwin wrote: > > On Thursday, January 12, 2017 10:29:00 AM Luis Machado wrote: > >> On 12/28/2016 11:37 AM, John Baldwin wrote: > >>> On Wednesday, December 28, 2016 09:07:07 AM Vasil Dimov wrote: > >>>> On Tue, Dec 27, 2016 at 13:03:27 -0800, John Baldwin wrote: > >>>> [...] > >>>>> I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS > >>>>> instead of explicitly continuing the process, but that doesn't help, and it > >>>>> means that the ptid being returned is still T1 in that case. > >>>>> > >>>>> I'm not sure if I should explicitly be calling delete_exited_threads() in > >>>>> fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume() > >>>>> could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't > >>>>> clear to me which of these is preferred since both are in use). > >>>>> > >>>>> I added the assertion for my own sanity. I suspect gdb should never try to > >>>>> invoke target_resume() with a ptid of an exited thread, but if for some > >>>>> reason it did the effect on FreeBSD would be a hang since we would suspend > >>>>> all the other threads and when the process was continued via PT_CONTINUE it > >>>>> would have nothing to do and would never return from wait(). I'd rather have > >>>>> gdb fail an assertion in that case rather than hang. > >>>> [...] > >>>> > >>>> Hi, > >>>> > >>>> I am not sure if this is related, but since I get a hang I would rather > >>>> mention it: with the John's patch (including the assert) gdb does not > >>>> emit the "ptrace: No such process" error, but when I attempt to quit, > >>>> it hangs: > >>> > >>> No, this is a separate bug in the kernel whereby a process doesn't > >>> treat PT_KILL as a detach-like event but incorrectly expects to keep > >>> getting PT_CONTINUE events for a while until it finally exits. I'm > >>> working on writing up regression/unit tests for PT_KILL and then > >>> fixing the bug. > >>> > >> > >> I think the patch is mainly papering over a bigger problem. My guess is > >> that the native fbsd backend is not doing something it should. > >> > >> I'd check how linux-nat.c is doing things and then try to confirm the > >> fbsd behavior is sane. > >> > >> For example, i noticed linux-nat.c has exit_lwp (...) that handles > >> deletion of both thread information and the thread itself (lwp). Even if > >> it is the currently-selected thread, we *will* get the lwp removed from > >> the list of existing lwp's. > > > > FreeBSD's backend doesn't maintain a separate lwp list, it just uses > > the existing GDB thread list. For FreeBSD's backend the two lists would > > simply mirror each other so it seems a bit of a waste to maintain a > > duplicate list. exit_lwp() calls delete_thread() which is the same thing > > the FreeBSD backend is doing, so if that is the current thread in > > inferior_ptid, the Linux backend will also being leaving the exited > > thread around in GDB's list until some future call to delete_exited_threads(). > > > > I think the thing that makes Linux work is that it doesn't use GDB's > > thread list. Meaning, it doesn't walk over GDB's thread list, but instead > > iterates over its private LWP list via iterate_over_lwps(). It would seem > > that GDB's thread list is designed so that backends shouldn't need their > > own thread list (you can add target-specific data with a custom destructor > > that gets invoked when freeing a thread for example), but the Linux backend > > doesn't choose to use it that way? > > > > Looking at some other threaded backends: > > > > - aix-thread.c relies on custom ptrace ops that resume a single thread > > - darwin-nat.c uses its own thread list (stored in the inferior's > > private data) instead of GDB's thread list. > > - gnu-nat.c uses its own thread list instead of GDB's thread list. > > - obsd-nat.c uses GDB's thread list but doesn't seem to support resuming > > individual threads (only entire processes). > > - procfs.c maintains its own thread list, but it doesn't seem to use it > > for resume but relies on the associated kernel resuming either an > > entire process or a single thread in a process via different ioctls. > > - remote.c:remote_resume() uses ALL_NON_EXITED_THREADS > > - windows_nat.cwindows_resume() calls windows_continue() which uses a > > target-internal thread list rather than GDB's thread list. > > > >> It doesn't make sense to keep a thread that has already exitted in the > >> list of threads we are manipulating. > > > > FreeBSD's backend isn't making that choice. delete_thread() in threads.c > > is the one making that choice. If FreeBSD's backend were to define its > > own thread list, the contents would be identical except it would not > > include any exited threads, so skipping exited threads gives the same > > result as walking a hypothetical private list. > > > > So i take it using ALL_NON_EXITED_THREADS is something that would seem > reasonable to use in this case and not iterate through all threads (even > ones marked exitting)? Yes, that would work for me.
On 01/13/2017 01:44 AM, John Baldwin wrote: >> So i take it using ALL_NON_EXITED_THREADS is something that would seem >> reasonable to use in this case and not iterate through all threads (even >> ones marked exitting)? > > Yes, that would work for me. > Agree, that should be fine. Thanks, Pedro Alves
On 12/27/2016 09:03 PM, John Baldwin wrote: > I added the assertion for my own sanity. I suspect gdb should never try to > invoke target_resume() with a ptid of an exited thread, but if for some > reason it did the effect on FreeBSD would be a hang since we would suspend > all the other threads and when the process was continued via PT_CONTINUE it > would have nothing to do and would never return from wait(). I'd rather have > gdb fail an assertion in that case rather than hang. OOC, what happens on FreeBSD on the gdb.threads/no-unwaited-for-left.exp test? That test: - spawns a thread - gdb switches to that thread, and, - enables "set scheduler-locking on", and "continue"s. - the child thread exits - gdb should not hang. It used to hang on Linux, until TARGET_WAITKIND_NO_RESUMED was added. Thanks, Pedro Alves
diff --git a/gdb/ChangeLog b/gdb/ChangeLog index db6e913..4fb3732 100644 --- a/gdb/ChangeLog +++ b/gdb/ChangeLog @@ -1,3 +1,10 @@ +2016-12-23 John Baldwin <jhb@FreeBSD.org> + + PR threads/20743 + * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads. + (resume_all_threads_cb): Likewise. + (fbsd_resume): Assert resuming thread has not exited. + 2016-12-22 Doug Evans <xdje42@gmail.com> * infrun.c (set_step_over_info): Add comment. diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c index ade62f1..7cd08c6 100644 --- a/gdb/fbsd-nat.c +++ b/gdb/fbsd-nat.c @@ -662,6 +662,9 @@ resume_one_thread_cb (struct thread_info *tp, void *data) if (ptid_get_pid (tp->ptid) != ptid_get_pid (*ptid)) return 0; + if (is_exited (tp->ptid)) + return 0; + if (ptid_get_lwp (tp->ptid) == ptid_get_lwp (*ptid)) request = PT_RESUME; else @@ -680,6 +683,9 @@ resume_all_threads_cb (struct thread_info *tp, void *data) if (!ptid_match (tp->ptid, *filter)) return 0; + if (is_exited (tp->ptid)) + return 0; + if (ptrace (PT_RESUME, ptid_get_lwp (tp->ptid), NULL, 0) == -1) perror_with_name (("ptrace")); return 0; @@ -711,6 +717,7 @@ fbsd_resume (struct target_ops *ops, if (ptid_lwp_p (ptid)) { /* If ptid is a specific LWP, suspend all other LWPs in the process. */ + gdb_assert (!is_exited (ptid)); iterate_over_threads (resume_one_thread_cb, &ptid); } else