From patchwork Thu Jul 10 15:16:46 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pedro Alves X-Patchwork-Id: 1999 Received: (qmail 28120 invoked by alias); 10 Jul 2014 15:17:25 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 28089 invoked by uid 89); 10 Jul 2014 15:17:22 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD, SPF_HELO_PASS, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 10 Jul 2014 15:16:52 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s6AFGmrU013814 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 Jul 2014 11:16:49 -0400 Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id s6AFGkZ2026640; Thu, 10 Jul 2014 11:16:47 -0400 Message-ID: <53BEAE5E.7030209@redhat.com> Date: Thu, 10 Jul 2014 16:16:46 +0100 From: Pedro Alves User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Hui Zhu , gdb-patches ml Subject: [PATCH v2] GDBserver crashes when killing a multi-thread process References: <510f2362-8d33-4c3c-9a13-5d187f26abdf@SVR-ORW-FEM-04.mgc.mentorg.com> <53AF87EB.60703@mentor.com> <53B3CBDB.5030207@redhat.com> In-Reply-To: <53B3CBDB.5030207@redhat.com> Hi Hui, On 07/02/2014 10:07 AM, Pedro Alves wrote: >> This issue is introduced by the patches for PR 12702 that was pushed by you. >> Maybe you could take a look on this patch. > > Thanks. I'll need to think a bit about whether this is the right > solution. > > I've added it to the 7.8 regression queue at: > > https://sourceware.org/gdb/wiki/GDB_7.8_Release > > I'll look better at it as soon as I have a chance. > First, it's really annoying that the testsuite doesn't catch this! We're really missing such a basic test, so I went ahead and wrote one. I was actually quite surprised that we didn't have any test that makes sure "kill" works without complaints/spurious output. However, unfortunately, the new test revealed that the fix isn't right. :-/ Sometimes the test passed, but other times (that'll depend on machine, of course), I'd get: kill Kill the program being debugged? (y or n) y Ignoring packet error, continuing... (gdb) FAIL: gdb.threads/kill.exp: kill That means GDBserver didn't reply to the vKill packet. Attaching to GDBserver at this point shows that it's stuck forever in the new loop: + /* Keep call kill_one_lwp_callback until find_inferior cannot find any + lwps that is for pid. */ + while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL); The reason was that the leader lwp goes zombie before the other lwps are reaped, and linux_wait_for_event would delete it. We'd be left with e.g., only one lwp that is _not_ the leader. That lwp had disappeared already (due to SIGKILL), but there's no status to collect (waitpid would return -1). Nothing in linux-low.c would notice the lwp is already gone. Because that lwp has lwp->stopped == 0, linux_wait_for_event would return, here: if ((find_inferior (&all_threads, not_stopped_callback, &wait_ptid) == NULL)) { if (debug_threads) debug_printf ("LLW: exit (no unwaited-for LWP)\n"); sigprocmask (SIG_SETMASK, &prev_mask, NULL); return -1; } So this would return true: --- a/gdb/gdbserver/linux-low.c +++ b/gdb/gdbserver/linux-low.c @@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL); } while (pid > 0 && WIFSTOPPED (wstat)); - return 0; + /* Let find_inferior return because maybe other lwp in the list will + be deleted by delete_lwp. */ + return 1; } So the new loop would never break. Here's a different patch that passes the test cleanly for me. Let me know if it works for you. 8<------------------- From ed3e42e5447e0276d6fc759a27408d06e9ccf0b7 Mon Sep 17 00:00:00 2001 From: Pedro Alves Date: Thu, 10 Jul 2014 15:13:26 +0100 Subject: [PATCH] GDBserver crashes when killing a multi-thread process Here's an example, with the new test: gdbserver :9999 gdb.threads/kill gdb gdb.threads/kill (gdb) b 52 Breakpoint 1 at 0x4007f4: file kill.c, line 52. Continuing. Breakpoint 1, main () at kill.c:52 52 return 0; /* set break here */ (gdb) k Kill the program being debugged? (y or n) y gdbserver :9999 gdb.threads/kill Process gdb.base/watch_thread_num created; pid = 9719 Listening on port 1234 Remote debugging from host 127.0.0.1 Killing all inferiors Segmentation fault (core dumped) Backtrace: (gdb) bt #0 0x00000000004068a0 in find_inferior (list=0x66b060 , func=0x427637 , arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199 #1 0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966 #2 0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163 #3 0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934 #4 0x0000000000406522 in for_each_inferior (list=0x66b050 , action=0x4107a6 ) at src/gdb/gdbserver/inferiors.c:57 #5 0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767 #6 0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880 #7 0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434 #8 0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189 #9 0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552 #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283 The problem is that linux_wait_for_event deletes lwps that have exited (even those not passed in as lwps of interest), while the lwp/thread list is being walked on with find_inferior. find_inferior can handle the current iterated inferior being deleted, but not others. When killing lwps, we don't really care about any of the pending status handling of linux_wait_for_event. We can just waitpid the lwps directly, which is also what GDB does (see linux-nat.c:kill_wait_callback). This way the lwps are not deleted while we're walking the list. They'll be deleted by linux_mourn afterwards. This crash triggers several times when running the testsuite against GDBserver with the native-gdbserver board (target remote), but as GDB can't distinguish between GDBserver crashing and "kill" being sucessful, as in both cases the connection is closed (the 'k' packet doesn't require a reply), and the inferior is gone, that results in no FAIL. The patch adds a generic test that catches the issue with extended-remote mode (and works fine with native testing too). Here's how it fails with the native-extended-gdbserver board without the fix: (gdb) info threads Id Target Id Frame 6 Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 5 Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 4 Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 3 Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 2 Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 * 1 Thread 15367.15367 main () at .../gdb.threads/kill.c:52 (gdb) kill Kill the program being debugged? (y or n) y Remote connection closed ^^^^^^^^^^^^^^^^^^^^^^^^ (gdb) FAIL: gdb.threads/kill.exp: kill Extended remote should remain connected after the kill. gdb/gdbserver/ 2014-07-10 Pedro Alves * linux-low.c (kill_wait_lwp): New function, based on kill_one_lwp_callback, but use my_waitpid directly. (kill_one_lwp_callback, linux_kill): Use it. gdb/testsuite/ 2014-07-10 Pedro Alves * gdb.threads/kill.c: New file. * gdb.threads/kill.exp: New file. --- gdb/gdbserver/linux-low.c | 68 ++++++++++++++++++++------------- gdb/testsuite/gdb.threads/kill.c | 64 +++++++++++++++++++++++++++++++ gdb/testsuite/gdb.threads/kill.exp | 77 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 183 insertions(+), 26 deletions(-) create mode 100644 gdb/testsuite/gdb.threads/kill.c create mode 100644 gdb/testsuite/gdb.threads/kill.exp diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c index 61552f4..bdc9aaa 100644 --- a/gdb/gdbserver/linux-low.c +++ b/gdb/gdbserver/linux-low.c @@ -909,6 +909,46 @@ linux_kill_one_lwp (struct lwp_info *lwp) errno ? strerror (errno) : "OK"); } +/* Kill LWP and wait for it to die. */ + +static void +kill_wait_lwp (struct lwp_info *lwp) +{ + struct thread_info *thr = get_lwp_thread (lwp); + int pid = ptid_get_pid (ptid_of (thr)); + int lwpid = ptid_get_lwp (ptid_of (thr)); + int wstat; + int res; + + if (debug_threads) + debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid); + + do + { + linux_kill_one_lwp (lwp); + + /* Make sure it died. Notes: + + - The loop is most likely unnecessary. + + - We don't use linux_wait_for_event as that could delete lwps + while we're iterating over them. We're not interested in + any pending status at this point, only in making sure all + wait status on the kernel side are collected until the + process is reaped. + + - We don't use __WALL here as the __WALL emulation relies on + SIGCHLD, and killing a stopped process doesn't generate + one, nor an exit status. + */ + res = my_waitpid (lwpid, &wstat, 0); + if (res == -1 && errno == ECHILD) + res = my_waitpid (lwpid, &wstat, __WCLONE); + } while (res > 0 && WIFSTOPPED (wstat)); + + gdb_assert (res > 0); +} + /* Callback for `find_inferior'. Kills an lwp of a given process, except the leader. */ @@ -917,7 +957,6 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args) { struct thread_info *thread = (struct thread_info *) entry; struct lwp_info *lwp = get_thread_lwp (thread); - int wstat; int pid = * (int *) args; if (ptid_get_pid (entry->id) != pid) @@ -936,14 +975,7 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args) return 0; } - do - { - linux_kill_one_lwp (lwp); - - /* Make sure it died. The loop is most likely unnecessary. */ - pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL); - } while (pid > 0 && WIFSTOPPED (wstat)); - + kill_wait_lwp (lwp); return 0; } @@ -952,8 +984,6 @@ linux_kill (int pid) { struct process_info *process; struct lwp_info *lwp; - int wstat; - int lwpid; process = find_process_pid (pid); if (process == NULL) @@ -976,21 +1006,7 @@ linux_kill (int pid) pid); } else - { - struct thread_info *thr = get_lwp_thread (lwp); - - if (debug_threads) - debug_printf ("lk_1: killing lwp %ld, for pid: %d\n", - lwpid_of (thr), pid); - - do - { - linux_kill_one_lwp (lwp); - - /* Make sure it died. The loop is most likely unnecessary. */ - lwpid = linux_wait_for_event (thr->entry.id, &wstat, __WALL); - } while (lwpid > 0 && WIFSTOPPED (wstat)); - } + kill_wait_lwp (lwp); the_target->mourn (process); diff --git a/gdb/testsuite/gdb.threads/kill.c b/gdb/testsuite/gdb.threads/kill.c new file mode 100644 index 0000000..aefbb06 --- /dev/null +++ b/gdb/testsuite/gdb.threads/kill.c @@ -0,0 +1,64 @@ +/* This testcase is part of GDB, the GNU debugger. + + Copyright 2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . */ + +#ifdef USE_THREADS + +#include +#include + +#define NUM 5 + +pthread_barrier_t barrier; + +void * +thread_function (void *arg) +{ + volatile unsigned int counter = 1; + + pthread_barrier_wait (&barrier); + + while (counter > 0) + { + counter++; + usleep (1); + } + + pthread_exit (NULL); +} + +#endif /* USE_THREADS */ + +void +setup (void) +{ +#ifdef USE_THREADS + pthread_t threads[NUM]; + int i; + + pthread_barrier_init (&barrier, NULL, NUM + 1); + for (i = 0; i < NUM; i++) + pthread_create (&threads[i], NULL, thread_function, NULL); + pthread_barrier_wait (&barrier); +#endif /* USE_THREADS */ +} + +int +main (void) +{ + setup (); + return 0; /* set break here */ +} diff --git a/gdb/testsuite/gdb.threads/kill.exp b/gdb/testsuite/gdb.threads/kill.exp new file mode 100644 index 0000000..a3f921f --- /dev/null +++ b/gdb/testsuite/gdb.threads/kill.exp @@ -0,0 +1,77 @@ +# This testcase is part of GDB, the GNU debugger. + +# Copyright 2014 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . */ + +standard_testfile + +# Run the test proper. THREADED indicates whether to build a threaded +# program and spawn several threads before trying to kill the program. + +proc test {threaded} { + global testfile srcfile + + with_test_prefix [expr ($threaded)?"threaded":"non-threaded"] { + + set options {debug} + if {$threaded} { + lappend options "pthreads" + lappend options "additional_flags=-DUSE_THREADS" + set prog ${testfile}_threads + } else { + set prog ${testfile}_nothreads + } + + if {[prepare_for_testing "failed to prepare" $prog $srcfile $options] == -1} { + return -1 + } + + if { ![runto main] } then { + fail "run to main" + return + } + + set linenum [gdb_get_line_number "set break here"] + gdb_breakpoint "$srcfile:$linenum" + gdb_continue_to_breakpoint "break here" ".*break here.*" + + if {$threaded} { + gdb_test "info threads" "6.*5.*4.*3.*2.*1.*" "all threads started" + } + + # This kills and ensures no output other than the prompt comes out, + # like: + # + # (gdb) kill + # Kill the program being debugged? (y or n) y + # (gdb) + # + # If we instead saw more output, like e.g., with an extended-remote + # connection: + # + # (gdb) kill + # Kill the program being debugged? (y or n) y + # Remote connection closed + # (gdb) + # + # the above would mean that the remote end crashed. + + gdb_test "kill" "^y" "kill program" "Kill the program being debugged\\? \\(y or n\\) $" "y" + } +} + +foreach threaded {true false} { + test $threaded +}