Message ID | 55A3E23C.8020101@gmail.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 114396 invoked by alias); 13 Jul 2015 16:07:37 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 114380 invoked by uid 89); 13 Jul 2015 16:07:36 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-pa0-f53.google.com Received: from mail-pa0-f53.google.com (HELO mail-pa0-f53.google.com) (209.85.220.53) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Mon, 13 Jul 2015 16:07:34 +0000 Received: by padck2 with SMTP id ck2so44959330pad.0 for <gdb-patches@sourceware.org>; Mon, 13 Jul 2015 09:07:32 -0700 (PDT) X-Received: by 10.66.221.226 with SMTP id qh2mr71971672pac.64.1436803652356; Mon, 13 Jul 2015 09:07:32 -0700 (PDT) Received: from [127.0.0.1] (gcc1-power7.osuosl.org. [140.211.15.137]) by smtp.gmail.com with ESMTPSA id ky17sm12791576pab.31.2015.07.13.09.07.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 13 Jul 2015 09:07:31 -0700 (PDT) Message-ID: <55A3E23C.8020101@gmail.com> Date: Mon, 13 Jul 2015 17:07:24 +0100 From: Yao Qi <qiyaoltc@gmail.com> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Pedro Alves <palves@redhat.com>, gdb-patches ml <gdb-patches@sourceware.org> Subject: Re: [PATCH v2] GDBserver crashes when killing a multi-thread process References: <510f2362-8d33-4c3c-9a13-5d187f26abdf@SVR-ORW-FEM-04.mgc.mentorg.com> <53AF87EB.60703@mentor.com> <53B3CBDB.5030207@redhat.com> <53BEAE5E.7030209@redhat.com> In-Reply-To: <53BEAE5E.7030209@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-IsSubscribed: yes |
Commit Message
Yao Qi
July 13, 2015, 4:07 p.m. UTC
On 10/07/14 16:16, Pedro Alves wrote: > +static void > +kill_wait_lwp (struct lwp_info *lwp) > +{ > + struct thread_info *thr = get_lwp_thread (lwp); > + int pid = ptid_get_pid (ptid_of (thr)); > + int lwpid = ptid_get_lwp (ptid_of (thr)); > + int wstat; > + int res; > + > + if (debug_threads) > + debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid); > + > + do > + { > + linux_kill_one_lwp (lwp); > + > + /* Make sure it died. Notes: > + > + - The loop is most likely unnecessary. > + > + - We don't use linux_wait_for_event as that could delete lwps > + while we're iterating over them. We're not interested in > + any pending status at this point, only in making sure all > + wait status on the kernel side are collected until the > + process is reaped. > + > + - We don't use __WALL here as the __WALL emulation relies on > + SIGCHLD, and killing a stopped process doesn't generate > + one, nor an exit status. > + */ > + res = my_waitpid (lwpid, &wstat, 0); > + if (res == -1 && errno == ECHILD) > + res = my_waitpid (lwpid, &wstat, __WCLONE); > + } while (res > 0 && WIFSTOPPED (wstat)); > + > + gdb_assert (res > 0); > +} Hi Pedro, do you still remember why did you add this assert? It wasn't mentioned in the mail https://sourceware.org/ml/gdb-patches/2014-07/msg00206.html I am looking at a GDBserver internal error on x86_64 when I run gdb.threads/thread-unwindonsignal.exp with GDBserver, continue^M Continuing.^M warning: Remote failure reply: E.No unwaited-for children left.^M PC register is not available^M (gdb) FAIL: gdb.threads/thread-unwindonsignal.exp: continue until exit Remote debugging from host 127.0.0.1^M ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M monitor exit^M Killing process(es): 30694^M (gdb) /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:1106: A problem internal to GDBserver has been detected.^M kill_wait_lwp: Assertion `res > 0' failed. After your patch https://sourceware.org/ml/gdb-patches/2015-03/msg00597.html GDBserver starts to swallows errors if the LWP is gone. Then, when GDBservers kills non-exist LWP, the assert will be triggered. Why don't we implement kill_wait_lwp like its counterpart in GDB linux-nat.c:kill_wait_callback? we can loop and assert like this patch below, (note that this patch fixes the internal error, and the FAIL is still there).
Comments
On 07/13/2015 05:07 PM, Yao Qi wrote: > Hi Pedro, > do you still remember why did you add this assert? It wasn't > mentioned in the mail > https://sourceware.org/ml/gdb-patches/2014-07/msg00206.html > Simply because getting here was supposed to indicate something went wrong elsewhere, but at the time I didn't consider that the child could die while ptrace-stopped. > I am looking at a GDBserver internal error on x86_64 when I run > gdb.threads/thread-unwindonsignal.exp with GDBserver, > > continue^M > Continuing.^M > warning: Remote failure reply: E.No unwaited-for children left.^M > PC register is not available^M > (gdb) FAIL: gdb.threads/thread-unwindonsignal.exp: continue until exit > Remote debugging from host 127.0.0.1^M > ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M > ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M > ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M > ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M > monitor exit^M > Killing process(es): 30694^M > (gdb) /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:1106: A > problem internal to GDBserver has been detected.^M > kill_wait_lwp: Assertion `res > 0' failed. > > After your patch https://sourceware.org/ml/gdb-patches/2015-03/msg00597.html > GDBserver starts to swallows errors if the LWP is gone. Then, when > GDBservers kills non-exist LWP, the assert will be triggered. > Looks like I forgot to push the rest of that series: https://sourceware.org/ml/gdb-patches/2015-03/msg00182.html What do you think of that one? > Why don't we implement kill_wait_lwp like its counterpart in GDB > linux-nat.c:kill_wait_callback? we can loop and assert like this > patch below, (note that this patch fixes the internal error, and > the FAIL is still there). > Seems to me it's not 100% correct to waitpid the pid one more time after we've already reaped it, because there's a minuscule chance another process that we're debugging could clone a new lwp that reuses the PID of the one we've just killed/reaped, and then another iteration could collect the initial SIGSTOP of the wrong LWP and we'd kill it: -> kill (pid1, SIGKILL); <- waitpid (pid1) returns pid1/WSIGNALLED -> on iteration1: new pid1 clone lwp is spawned -> ret==pid1, continue iterating -> kill (pid1, SIGKILL); // killing wrong process <- waitpid (pid1) returns either SIGSTOP or WSIGNALLED ... Thanks, Pedro Alves
Pedro Alves <palves@redhat.com> writes: > Looks like I forgot to push the rest of that series: > > https://sourceware.org/ml/gdb-patches/2015-03/msg00182.html > > What do you think of that one? Yes, it looks good to me. We also need it on 7.10 branch. > >> Why don't we implement kill_wait_lwp like its counterpart in GDB >> linux-nat.c:kill_wait_callback? we can loop and assert like this >> patch below, (note that this patch fixes the internal error, and >> the FAIL is still there). >> > > Seems to me it's not 100% correct to waitpid the pid one more time > after we've already reaped it, because there's a minuscule chance > another process that we're debugging could clone a new lwp that reuses > the PID of the one we've just killed/reaped, and then another iteration > could collect the initial SIGSTOP of the wrong LWP and we'd kill it: > > -> kill (pid1, SIGKILL); > <- waitpid (pid1) returns pid1/WSIGNALLED > -> on iteration1: new pid1 clone lwp is spawned > -> ret==pid1, continue iterating > -> kill (pid1, SIGKILL); // killing wrong process > <- waitpid (pid1) returns either SIGSTOP or WSIGNALLED > ... Yes, that is possible.
On 07/14/2015 09:00 AM, Yao Qi wrote: > Pedro Alves <palves@redhat.com> writes: > >> Looks like I forgot to push the rest of that series: >> >> https://sourceware.org/ml/gdb-patches/2015-03/msg00182.html >> >> What do you think of that one? > > Yes, it looks good to me. We also need it on 7.10 branch. OK, I'll push it. Thanks, Pedro Alves
diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c index 7bb9f7f..07d051a 100644 --- a/gdb/gdbserver/linux-low.c +++ b/gdb/gdbserver/linux-low.c @@ -1101,9 +1101,9 @@ kill_wait_lwp (struct lwp_info *lwp) res = my_waitpid (lwpid, &wstat, 0); if (res == -1 && errno == ECHILD) res = my_waitpid (lwpid, &wstat, __WCLONE); - } while (res > 0 && WIFSTOPPED (wstat)); + } while (res == lwpid); - gdb_assert (res > 0); + gdb_assert (res == -1 && errno == ECHILD); } /* Callback for `find_inferior'. Kills an lwp of a given process,