Message ID | 55B1308E.4020700@redhat.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 5919 invoked by alias); 23 Jul 2015 18:21:08 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 5779 invoked by uid 89); 23 Jul 2015 18:21:08 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL, BAYES_00, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 23 Jul 2015 18:21:06 +0000 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (Postfix) with ESMTPS id 04A4D2B7847; Thu, 23 Jul 2015 18:21:04 +0000 (UTC) Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t6NIL219012670; Thu, 23 Jul 2015 14:21:03 -0400 Message-ID: <55B1308E.4020700@redhat.com> Date: Thu, 23 Jul 2015 19:21:02 +0100 From: Pedro Alves <palves@redhat.com> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: "Breazeal, Don" <donb@codesourcery.com>, Simon Marchi <simon.marchi@ericsson.com> CC: GDB Patches <gdb-patches@sourceware.org> Subject: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode References: <1437672294-29351-1-git-send-email-palves@redhat.com> In-Reply-To: <1437672294-29351-1-git-send-email-palves@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit |
Commit Message
Pedro Alves
July 23, 2015, 6:21 p.m. UTC
So I managed to extract out this smaller patch from the gdbserver fixes I mentioned. I think this one looks safe enough for 7.10. WDYT? ----------- From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001 From: Pedro Alves <palves@redhat.com> Date: Thu, 23 Jul 2015 18:49:51 +0100 Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode Running gdb.threads/fork-plus-threads.exp against gdbserver in extended-remote mode, even though the test passes, we still see broken behavior: Running gdb.threads/fork-plus-threads.exp against gdbserver in extended-remote mode, even though the test passes, we still see broken behavior: (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off continue & Continuing. (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue & [New Thread 28092.28092] [Thread 28092.28092] #2 stopped. [New Thread 28094.28094] [Inferior 2 (process 28092) exited normally] [New Thread 28094.28105] [New Thread 28094.28109] ... [Thread 28174.28174] #18 stopped. [New Thread 28185.28185] [Inferior 10 (process 28174) exited normally] [New Thread 28185.28196] [Thread 28185.28185] #20 stopped. Cannot remove breakpoints because program is no longer writable. Further execution is probably impossible. [Inferior 11 (process 28185) exited normally] [Inferior 1 (process 28091) exited normally] PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint info threads No threads. (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left info inferiors Num Description Executable * 1 <null> /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left All the "[Thread FOO] #NN stopped." above are bogus, as well as the "Cannot remove breakpoints because program is no longer writable.", which is a consequence. The problem is that when we intercept a fork event, we should report the event for the parent, only, and leave the child stopped, but not report its stop event. GDB later decides whether to follow the parent or the child. But because handle_extended_wait does not set the child's last_status.kind to TARGET_WAITKIND_STOPPED, a stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access memory) by mistake ends up queueing a SIGSTOP on the child, resuming it, and then when that SIGSTOP is intercepted, because the LWP has last_resume_kind set to resume_stop, gdbserver reports the stop to GDB, as GDB_SIGNAL_0: ... >>>> entering unstop_all_lwps unstopping all lwps proceed_one_lwp: lwp 1600 client wants LWP to remain 1600 stopped proceed_one_lwp: lwp 1828 Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Sending sigstop to lwp 1828 pc is 0x3615ebc7cc Resuming lwp 1828 (continue, signal 0, stop expected) continue from pc 0x3615ebc7cc unstop_all_lwps done sigchld_handler <<<< exiting unstop_all_lwps handling possible target event >>>> entering linux_wait_1 linux_wait_1: [<all threads>] my_waitpid (-1, 0x40000001) my_waitpid (-1, 0x1): status(137f), 1828 LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK LLW: waitpid 1828 received Stopped (signal) (stopped) pc is 0x3615ebc7cc Expected stop. LLW: resume_stop SIGSTOP caught for LWP 1828.1828. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ... linux_wait_1 ret = LWP 1828.1828, 1, 0 <<<< exiting linux_wait_1 Writing resume reply for LWP 1828.1828:1 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ By inspection, I also noticed that we miss leaving the child with the suspend count incremented if stopping threads, like we do for clone threads. Tested on x86_64 Fedora 20, extended-remote. gdb/gdbserver/ChangeLog: 2015-07-23 Pedro Alves <palves@redhat.com> * linux-low.c (handle_extended_wait): Set the child's last reported status to TARGET_WAITKIND_STOPPED. --- gdb/gdbserver/linux-low.c | 7 ++++++ gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++ 2 files changed, 37 insertions(+)
Comments
On 15-07-23 02:21 PM, Pedro Alves wrote: > So I managed to extract out this smaller patch from the > gdbserver fixes I mentioned. I think this one looks safe enough > for 7.10. WDYT? > > ----------- > From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001 > From: Pedro Alves <palves@redhat.com> > Date: Thu, 23 Jul 2015 18:49:51 +0100 > Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode > > Running gdb.threads/fork-plus-threads.exp against gdbserver in > extended-remote mode, even though the test passes, we still see broken > behavior: > > Running gdb.threads/fork-plus-threads.exp against gdbserver in > extended-remote mode, even though the test passes, we still see broken > behavior: > > (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off > continue & > Continuing. > (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue & > [New Thread 28092.28092] > > [Thread 28092.28092] #2 stopped. > [New Thread 28094.28094] > [Inferior 2 (process 28092) exited normally] > [New Thread 28094.28105] > [New Thread 28094.28109] > > ... > > [Thread 28174.28174] #18 stopped. > [New Thread 28185.28185] > [Inferior 10 (process 28174) exited normally] > [New Thread 28185.28196] > > [Thread 28185.28185] #20 stopped. > Cannot remove breakpoints because program is no longer writable. > Further execution is probably impossible. > [Inferior 11 (process 28185) exited normally] > [Inferior 1 (process 28091) exited normally] > PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint > info threads > No threads. > (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left > info inferiors > Num Description Executable > * 1 <null> /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads > (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left > > All the "[Thread FOO] #NN stopped." above are bogus, as well as the > "Cannot remove breakpoints because program is no longer writable.", > which is a consequence. > > The problem is that when we intercept a fork event, we should report > the event for the parent, only, and leave the child stopped, but not > report its stop event. GDB later decides whether to follow the parent > or the child. But because handle_extended_wait does not set the > child's last_status.kind to TARGET_WAITKIND_STOPPED, a > stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access > memory) by mistake ends up queueing a SIGSTOP on the child, resuming > it, and then when that SIGSTOP is intercepted, because the LWP has > last_resume_kind set to resume_stop, gdbserver reports the stop to > GDB, as GDB_SIGNAL_0: > > ... > >>>> entering unstop_all_lwps > unstopping all lwps > proceed_one_lwp: lwp 1600 > client wants LWP to remain 1600 stopped > proceed_one_lwp: lwp 1828 > Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > Sending sigstop to lwp 1828 > pc is 0x3615ebc7cc > Resuming lwp 1828 (continue, signal 0, stop expected) > continue from pc 0x3615ebc7cc > unstop_all_lwps done > sigchld_handler > <<<< exiting unstop_all_lwps > handling possible target event > >>>> entering linux_wait_1 > linux_wait_1: [<all threads>] > my_waitpid (-1, 0x40000001) > my_waitpid (-1, 0x1): status(137f), 1828 > LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK > LLW: waitpid 1828 received Stopped (signal) (stopped) > pc is 0x3615ebc7cc > Expected stop. > LLW: resume_stop SIGSTOP caught for LWP 1828.1828. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > ... > linux_wait_1 ret = LWP 1828.1828, 1, 0 > <<<< exiting linux_wait_1 > Writing resume reply for LWP 1828.1828:1 > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > By inspection, I also noticed that we miss leaving the child with the > suspend count incremented if stopping threads, like we do for clone > threads. > > Tested on x86_64 Fedora 20, extended-remote. > > gdb/gdbserver/ChangeLog: > 2015-07-23 Pedro Alves <palves@redhat.com> > > * linux-low.c (handle_extended_wait): Set the child's last > reported status to TARGET_WAITKIND_STOPPED. > --- > gdb/gdbserver/linux-low.c | 7 ++++++ > gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++ > 2 files changed, 37 insertions(+) > > diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c > index 17b2a51..56a33ff 100644 > --- a/gdb/gdbserver/linux-low.c > +++ b/gdb/gdbserver/linux-low.c > @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat) > child_lwp->status_pending_p = 0; > child_thr = get_lwp_thread (child_lwp); > child_thr->last_resume_kind = resume_stop; > + child_thr->last_status.kind = TARGET_WAITKIND_STOPPED; > + > + /* If we're suspending all threads, leave this one suspended > + too. */ > + if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS) > + child_lwp->suspended = 1; > + > parent_proc = get_thread_process (event_thr); > child_proc->attached = parent_proc->attached; > clone_all_breakpoints (&child_proc->breakpoints, > diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp > index f44dd76..80d2464 100644 > --- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp > +++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp > @@ -48,13 +48,43 @@ gdb_test_multiple $test $test { > } > } > > +# gdbserver had a bug that resulted in reporting the fork child's > +# initial stop to gdb, which gdb does not expect, in turn resulting in > +# a broken session, like: > +# > +# [Thread 31536.31536] #16 stopped. <== BAD > +# [New Thread 31547.31547] > +# [Inferior 10 (process 31536) exited normally] > +# [New Thread 31547.31560] > +# > +# [Thread 31547.31547] #18 stopped. <== BAD > +# Cannot remove breakpoints because program is no longer writable. <== BAD > +# Further execution is probably impossible. <== BAD > +# [Inferior 11 (process 31547) exited normally] > +# [Inferior 1 (process 31454) exited normally] > +# > +# These variables track whether we see such broken behavior. > +set saw_cannot_remove_breakpoints 0 > +set saw_thread_stopped 0 > + > set test "reached breakpoint" > gdb_test_multiple "" $test { > + -re "Cannot remove breakpoints" { > + set saw_cannot_remove_breakpoints 1 > + exp_continue > + } > + -re "Thread \[^\r\n\]+ stopped\\." { > + set saw_thread_stopped 1 > + exp_continue > + } > -re "Inferior 1 \(\[^\r\n\]+\) exited normally" { > pass $test > } > } > > +gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints" > +gdb_assert !$saw_thread_stopped "no spurious thread stop" > + > gdb_test "info threads" "No threads\." \ > "no threads left" I tried it and it works as expected. If you try the same test program in all-stop though, fork childs are left stopped. Is it expected? I am not sure how forking interacts with all-stop. ----------------------- $ ./gdb -q -nx -ex "set detach-on-fork off" testsuite/gdb.threads/fork-plus-threads Reading symbols from testsuite/gdb.threads/fork-plus-threads...done. (gdb) r & Starting program: /home/emaisin/src/binutils-gdb/gdb/testsuite/gdb.threads/fork-plus-threads (gdb) [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5304] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5305] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5306] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5307] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5308] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5309] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5310] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5311] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5312] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New process 5313] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". i th Id Target Id Frame 11 process 5313 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 10 process 5312 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 9 process 5311 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 8 process 5310 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 7 process 5309 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 6 process 5308 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 5 process 5307 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 4 process 5306 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 3 process 5305 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 2 process 5304 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6 * 1 Thread 0x7ffff7fc9740 (LWP 5300) "fork-plus-threa" (running)
On 07/24/2015 07:05 PM, Simon Marchi wrote: > I tried it and it works as expected. If you try the same test program in all-stop > though, fork childs are left stopped. Is it expected? I am not sure how forking > interacts with all-stop. Yeah, in all-stop, you need "set schedule-multiple on" to let all processes run. That seems to trip on more breakage: ... [Thread 0x7ffff57bc700 (LWP 11703) exited] [Thread 0x7ffff7fc1700 (LWP 11700) exited] [New Thread 0x7ffff77c0700 (LWP 11710)] [New Thread 0x7ffff67be700 (LWP 11709)] [New Thread 0x7ffff57bc700 (LWP 11711)] [New Thread 0x7ffff4fbb700 (LWP 11712)] [New Thread 0x7ffff3fb9700 (LWP 11713)] [New process 11702] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". [Thread 0x7ffff3fb9700 (LWP 11713) exited] [Thread 0x7ffff4fbb700 (LWP 11712) exited] [Thread 0x7ffff57bc700 (LWP 11711) exited] [Thread 0x7ffff77c0700 (LWP 11710) exited] [Thread 0x7ffff67be700 (LWP 11709) exited] [New Thread 0x7ffff7fc1700 (LWP 11714)] [New Thread 0x7ffff6fbf700 (LWP 11716)] [New Thread 0x7ffff5fbd700 (LWP 11715)] [New Thread 0x7ffff37b8700 (LWP 11717)] [Inferior 3 (process 11634) exited normally] Cannot find new threads: generic error (gdb) info threads Cannot find new threads: generic error (gdb) info threads Cannot find new threads: generic error (gdb) q ISTR that's not a new bug, but I haven't tried older releases. Thanks, Pedro Alves
On 7/23/2015 11:21 AM, Pedro Alves wrote: > So I managed to extract out this smaller patch from the > gdbserver fixes I mentioned. I think this one looks safe enough > for 7.10. WDYT? > > ----------- > From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001 > From: Pedro Alves <palves@redhat.com> > Date: Thu, 23 Jul 2015 18:49:51 +0100 > Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode > > Running gdb.threads/fork-plus-threads.exp against gdbserver in > extended-remote mode, even though the test passes, we still see broken > behavior: > > Running gdb.threads/fork-plus-threads.exp against gdbserver in > extended-remote mode, even though the test passes, we still see broken > behavior: > > (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off > continue & > Continuing. > (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue & > [New Thread 28092.28092] > > [Thread 28092.28092] #2 stopped. > [New Thread 28094.28094] > [Inferior 2 (process 28092) exited normally] > [New Thread 28094.28105] > [New Thread 28094.28109] > > ... > > [Thread 28174.28174] #18 stopped. > [New Thread 28185.28185] > [Inferior 10 (process 28174) exited normally] > [New Thread 28185.28196] > > [Thread 28185.28185] #20 stopped. > Cannot remove breakpoints because program is no longer writable. > Further execution is probably impossible. > [Inferior 11 (process 28185) exited normally] > [Inferior 1 (process 28091) exited normally] > PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint > info threads > No threads. > (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left > info inferiors > Num Description Executable > * 1 <null> /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads > (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left > > All the "[Thread FOO] #NN stopped." above are bogus, as well as the > "Cannot remove breakpoints because program is no longer writable.", > which is a consequence. > > The problem is that when we intercept a fork event, we should report > the event for the parent, only, and leave the child stopped, but not > report its stop event. GDB later decides whether to follow the parent > or the child. But because handle_extended_wait does not set the > child's last_status.kind to TARGET_WAITKIND_STOPPED, a > stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access > memory) by mistake ends up queueing a SIGSTOP on the child, resuming > it, and then when that SIGSTOP is intercepted, because the LWP has > last_resume_kind set to resume_stop, gdbserver reports the stop to > GDB, as GDB_SIGNAL_0: > > ... > >>>> entering unstop_all_lwps > unstopping all lwps > proceed_one_lwp: lwp 1600 > client wants LWP to remain 1600 stopped > proceed_one_lwp: lwp 1828 > Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > Sending sigstop to lwp 1828 > pc is 0x3615ebc7cc > Resuming lwp 1828 (continue, signal 0, stop expected) > continue from pc 0x3615ebc7cc > unstop_all_lwps done > sigchld_handler > <<<< exiting unstop_all_lwps > handling possible target event > >>>> entering linux_wait_1 > linux_wait_1: [<all threads>] > my_waitpid (-1, 0x40000001) > my_waitpid (-1, 0x1): status(137f), 1828 > LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK > LLW: waitpid 1828 received Stopped (signal) (stopped) > pc is 0x3615ebc7cc > Expected stop. > LLW: resume_stop SIGSTOP caught for LWP 1828.1828. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > ... > linux_wait_1 ret = LWP 1828.1828, 1, 0 > <<<< exiting linux_wait_1 > Writing resume reply for LWP 1828.1828:1 > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > By inspection, I also noticed that we miss leaving the child with the > suspend count incremented if stopping threads, like we do for clone > threads. > > Tested on x86_64 Fedora 20, extended-remote. > > gdb/gdbserver/ChangeLog: > 2015-07-23 Pedro Alves <palves@redhat.com> > > * linux-low.c (handle_extended_wait): Set the child's last > reported status to TARGET_WAITKIND_STOPPED. > --- > gdb/gdbserver/linux-low.c | 7 ++++++ > gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++ > 2 files changed, 37 insertions(+) > > diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c > index 17b2a51..56a33ff 100644 > --- a/gdb/gdbserver/linux-low.c > +++ b/gdb/gdbserver/linux-low.c > @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat) > child_lwp->status_pending_p = 0; > child_thr = get_lwp_thread (child_lwp); > child_thr->last_resume_kind = resume_stop; > + child_thr->last_status.kind = TARGET_WAITKIND_STOPPED; This makes perfect sense to me. > + > + /* If we're suspending all threads, leave this one suspended > + too. */ > + if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS) > + child_lwp->suspended = 1; I have a question about this. In the definition of struct lwp_info in linux-low.h, it has this comment: /* When this is true, we shall not try to resume this thread, even if last_resume_kind isn't resume_stop. */ int suspended; Since we are setting last_resume_kind to resume_stop here, is this unnecessary? Thanks, --Don > + > parent_proc = get_thread_process (event_thr); > child_proc->attached = parent_proc->attached; > clone_all_breakpoints (&child_proc->breakpoints, > diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp > index f44dd76..80d2464 100644 > --- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp > +++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp > @@ -48,13 +48,43 @@ gdb_test_multiple $test $test { > } > } > > +# gdbserver had a bug that resulted in reporting the fork child's > +# initial stop to gdb, which gdb does not expect, in turn resulting in > +# a broken session, like: > +# > +# [Thread 31536.31536] #16 stopped. <== BAD > +# [New Thread 31547.31547] > +# [Inferior 10 (process 31536) exited normally] > +# [New Thread 31547.31560] > +# > +# [Thread 31547.31547] #18 stopped. <== BAD > +# Cannot remove breakpoints because program is no longer writable. <== BAD > +# Further execution is probably impossible. <== BAD > +# [Inferior 11 (process 31547) exited normally] > +# [Inferior 1 (process 31454) exited normally] > +# > +# These variables track whether we see such broken behavior. > +set saw_cannot_remove_breakpoints 0 > +set saw_thread_stopped 0 > + > set test "reached breakpoint" > gdb_test_multiple "" $test { > + -re "Cannot remove breakpoints" { > + set saw_cannot_remove_breakpoints 1 > + exp_continue > + } > + -re "Thread \[^\r\n\]+ stopped\\." { > + set saw_thread_stopped 1 > + exp_continue > + } > -re "Inferior 1 \(\[^\r\n\]+\) exited normally" { > pass $test > } > } > > +gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints" > +gdb_assert !$saw_thread_stopped "no spurious thread stop" > + > gdb_test "info threads" "No threads\." \ > "no threads left" > >
Hi Don, Sorry for the delay. On 07/24/2015 07:43 PM, Don Breazeal wrote: >> index 17b2a51..56a33ff 100644 >> --- a/gdb/gdbserver/linux-low.c >> +++ b/gdb/gdbserver/linux-low.c >> @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat) >> child_lwp->status_pending_p = 0; >> child_thr = get_lwp_thread (child_lwp); >> child_thr->last_resume_kind = resume_stop; >> + child_thr->last_status.kind = TARGET_WAITKIND_STOPPED; > > This makes perfect sense to me. > Great. >> + >> + /* If we're suspending all threads, leave this one suspended >> + too. */ >> + if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS) >> + child_lwp->suspended = 1; > > I have a question about this. In the definition of struct lwp_info in > linux-low.h, it has this comment: > > /* When this is true, we shall not try to resume this thread, even > if last_resume_kind isn't resume_stop. */ > int suspended; > > Since we are setting last_resume_kind to resume_stop here, is this > unnecessary? We still need it, because otherwise we'd decrement the suspend count below 0: static int unsuspend_and_proceed_one_lwp (struct inferior_list_entry *entry, void *except) { struct thread_info *thread = (struct thread_info *) entry; struct lwp_info *lwp = get_thread_lwp (thread); if (lwp == except) return 0; lwp->suspended--; gdb_assert (lwp->suspended >= 0); return proceed_one_lwp (entry, except); } It's proceed_one_lwp that skips resuming if the client wants the lwp stopped: static int proceed_one_lwp (struct inferior_list_entry *entry, void *except) { ... if (thread->last_resume_kind == resume_stop && thread->last_status.kind != TARGET_WAITKIND_IGNORE) { if (debug_threads) debug_printf (" client wants LWP to remain %ld stopped\n", lwpid_of (thread)); return 0; } I tried writing a test for this, by making a multithreaded program have all its threads but the main continuously fork (see attached), while the main thread continuously steps over a breakpoint (a conditional breakpoint with condition "0" should do it, as gdbserver handles that breakpoint itself), but that stumbles on yet more problems... :-/ $ ./gdb ./testsuite/gdb.threads/fork-plus-threads-2 -ex "set non-stop on" -ex "set detach-on-fork off" -ex "tar extended-rem :9999" ... Remote debugging using :9999 (gdb) [Thread 24971.24971] #1 stopped. 0x0000003615a011f0 in ?? () c& Continuing. (gdb) [New Thread 24971.24981] [New Thread 24983.24983] [New Thread 24971.24982] [Thread 24983.24983] #3 stopped. 0x0000003615ebc7cc in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:130 130 pid = ARCH_FORK (); [New Thread 24984.24984] Error in re-setting breakpoint -16: PC register is not available Error in re-setting breakpoint -17: PC register is not available Error in re-setting breakpoint -18: PC register is not available Error in re-setting breakpoint -19: PC register is not available Error in re-setting breakpoint -24: PC register is not available Error in re-setting breakpoint -25: PC register is not available Error in re-setting breakpoint -26: PC register is not available Error in re-setting breakpoint -27: PC register is not available Error in re-setting breakpoint -28: PC register is not available Error in re-setting breakpoint -29: PC register is not available Error in re-setting breakpoint -30: PC register is not available PC register is not available (gdb) >> set test "reached breakpoint" BTW, I noticed that this test message is stale from my previous attempt at running to a breakpoint instead of to exit. I changed it to: set test "inferior 1 exited" in patch 1/2. >> gdb_test_multiple "" $test { >> + -re "Cannot remove breakpoints" { >> + set saw_cannot_remove_breakpoints 1 >> + exp_continue >> + } >> + -re "Thread \[^\r\n\]+ stopped\\." { >> + set saw_thread_stopped 1 >> + exp_continue >> + } >> -re "Inferior 1 \(\[^\r\n\]+\) exited normally" { >> pass $test >> } >> } Thanks, Pedro Alves /* This testcase is part of GDB, the GNU debugger. Copyright 2015 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ #include <assert.h> #include <pthread.h> #include <unistd.h> #include <stdio.h> #include <sys/types.h> #include <sys/wait.h> #include <stdlib.h> /* Number of threads. Each thread continuously spawns a fork and wait for it. If we have another thread continuously start a step over, gdbserver should end up finding new forks while suspending threads. */ #define NTHREADS 10 pthread_t threads[NTHREADS]; static void * thread_func (void *arg) { while (1) { pid_t pid; pid = fork (); if (pid > 0) { int status; /* Parent. */ pid = waitpid (pid, &status, 0); if (pid == -1) { perror ("wait"); exit (1); } if (!WIFEXITED (status)) { printf ("Unexpected wait status 0x%x from child %d\n", status, pid); } } else if (pid == 0) { /* Child. */ exit (0); } else { perror ("fork"); exit (1); } } } int main (void) { int i; int ret; for (i = 0; i < NTHREADS; i++) { ret = pthread_create (&threads[i], NULL, thread_func, NULL); assert (ret == 0); } for (i = 0; i < NTHREADS; i++) { ret = pthread_join (threads[i], NULL); assert (ret == 0); } /* Don't run forever. */ sleep (180); return 0; }
On 07/29/2015 02:21 PM, Pedro Alves wrote: > > I tried writing a test for this, by making a multithreaded program > have all its threads but the main continuously fork (see attached), while > the main thread continuously steps over a breakpoint (a conditional > breakpoint with condition "0" should do it, as gdbserver handles > that breakpoint itself), but that stumbles on yet more problems... :-/ > > $ ./gdb ./testsuite/gdb.threads/fork-plus-threads-2 -ex "set non-stop on" -ex "set detach-on-fork off" -ex "tar extended-rem :9999" > ... > Remote debugging using :9999 > (gdb) > [Thread 24971.24971] #1 stopped. > 0x0000003615a011f0 in ?? () > c& > Continuing. > (gdb) [New Thread 24971.24981] > [New Thread 24983.24983] > [New Thread 24971.24982] > > [Thread 24983.24983] #3 stopped. > 0x0000003615ebc7cc in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:130 > 130 pid = ARCH_FORK (); > [New Thread 24984.24984] > Error in re-setting breakpoint -16: PC register is not available > Error in re-setting breakpoint -17: PC register is not available > Error in re-setting breakpoint -18: PC register is not available > Error in re-setting breakpoint -19: PC register is not available > Error in re-setting breakpoint -24: PC register is not available > Error in re-setting breakpoint -25: PC register is not available > Error in re-setting breakpoint -26: PC register is not available > Error in re-setting breakpoint -27: PC register is not available > Error in re-setting breakpoint -28: PC register is not available > Error in re-setting breakpoint -29: PC register is not available > Error in re-setting breakpoint -30: PC register is not available > PC register is not available > (gdb) > Hmm, gdbserver's logs (for a different run) show: ... HEW: Got clone event from LWP 25962, new child is LWP 25989 my_waitpid (-1, 0x40000001) my_waitpid (-1, 0x1): status(137f), 25990 LWFE: waitpid(-1, ...) returned 25990, ERRNO-OK LLW: waitpid 25990 received Stopped (signal) (stopped) my_waitpid (-1, 0x40000001) my_waitpid (-1, 0x1): status(0), 25988 LWFE: waitpid(-1, ...) returned 25988, ERRNO-OK LLW: waitpid 25988 received 0 (exited) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ LLFE: 25988 exited. ^^^^^^^^^^^^^^^^^^^ my_waitpid (-1, 0x40000001) my_waitpid (-1, 0x80000001): status(1057f), 25973 LWFE: waitpid(-1, ...) returned 25973, ERRNO-OK LLW: waitpid 25973 received Trace/breakpoint trap (stopped) pc is 0x3615ebc7cc HEW: Got fork event from LWP 25973, new child is 25990 pc is 0x3615ebc7cc pc is 0x3615ebc7cc my_waitpid (-1, 0x40000001) my_waitpid (-1, 0x80000001): status(117f), 25972 LWFE: waitpid(-1, ...) returned 25972, ERRNO-OK LLW: waitpid 25972 received Child exited (stopped) pc is 0x3616a0f279 my_waitpid (-1, 0x40000001) my_waitpid (-1, 0x80000001): status(117f), 0 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK RSRL: resuming stopped-resumed LWP LWP 25962.25962 at 3615ef4ce1: step=0 pc is 0x3615ef4ce1 Resuming lwp 25962 (continue, signal 0, stop not expected) continue from pc 0x3615ef4ce1 RSRL: resuming stopped-resumed LWP LWP 25962.25989 at 0: step=0 pc is 0x3615ef4ce1 Resuming lwp 25989 (continue, signal 0, stop not expected) continue from pc 0x3615ef4ce1 sigchld_handler Ignored signal 17 for LWP 25972. pc is 0x3616a0f279 Resuming lwp 25972 (continue, signal 17, stop not expected) continue from pc 0x3616a0f279 handling possible target event >>>> entering linux_wait_1 linux_wait_1: [<all threads>] Got a pending child 25973 Got an event from pending child 25973 (1057f) Hit a non-gdbserver trap event. SEL: Found 2 SIGTRAP events, selecting #1 linux_wait_1 ret = LWP 25988.25988, 1, 0 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ("1" is TARGET_WAITKIND_STOPPED) <<<< exiting linux_wait_1 Writing resume reply for LWP 25988.25988:1 ptrace(regsets_fetch_inferior_registers) PID=25988: No such process ptrace(regsets_fetch_inferior_registers) PID=25988: No such process ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Ignore the "SIGTRAP" mention in "SEL: Found 2 SIGTRAP events", it's "two events". And the one that was picked was a process exit. But the tail end of linux_wait_1 isn't expecting that can happen. Thanks, Pedro Alves
diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c index 17b2a51..56a33ff 100644 --- a/gdb/gdbserver/linux-low.c +++ b/gdb/gdbserver/linux-low.c @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat) child_lwp->status_pending_p = 0; child_thr = get_lwp_thread (child_lwp); child_thr->last_resume_kind = resume_stop; + child_thr->last_status.kind = TARGET_WAITKIND_STOPPED; + + /* If we're suspending all threads, leave this one suspended + too. */ + if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS) + child_lwp->suspended = 1; + parent_proc = get_thread_process (event_thr); child_proc->attached = parent_proc->attached; clone_all_breakpoints (&child_proc->breakpoints, diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp index f44dd76..80d2464 100644 --- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp +++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp @@ -48,13 +48,43 @@ gdb_test_multiple $test $test { } } +# gdbserver had a bug that resulted in reporting the fork child's +# initial stop to gdb, which gdb does not expect, in turn resulting in +# a broken session, like: +# +# [Thread 31536.31536] #16 stopped. <== BAD +# [New Thread 31547.31547] +# [Inferior 10 (process 31536) exited normally] +# [New Thread 31547.31560] +# +# [Thread 31547.31547] #18 stopped. <== BAD +# Cannot remove breakpoints because program is no longer writable. <== BAD +# Further execution is probably impossible. <== BAD +# [Inferior 11 (process 31547) exited normally] +# [Inferior 1 (process 31454) exited normally] +# +# These variables track whether we see such broken behavior. +set saw_cannot_remove_breakpoints 0 +set saw_thread_stopped 0 + set test "reached breakpoint" gdb_test_multiple "" $test { + -re "Cannot remove breakpoints" { + set saw_cannot_remove_breakpoints 1 + exp_continue + } + -re "Thread \[^\r\n\]+ stopped\\." { + set saw_thread_stopped 1 + exp_continue + } -re "Inferior 1 \(\[^\r\n\]+\) exited normally" { pass $test } } +gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints" +gdb_assert !$saw_thread_stopped "no spurious thread stop" + gdb_test "info threads" "No threads\." \ "no threads left"