From patchwork Fri Oct 31 23:28:49 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Don Breazeal X-Patchwork-Id: 3526 Received: (qmail 30880 invoked by alias); 31 Oct 2014 23:30:34 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 30867 invoked by uid 89); 31 Oct 2014 23:30:33 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 31 Oct 2014 23:30:30 +0000 Received: from svr-orw-fem-06.mgc.mentorg.com ([147.34.97.120]) by relay1.mentorg.com with esmtp id 1XkLeF-0002DY-GV from Don_Breazeal@mentor.com for gdb-patches@sourceware.org; Fri, 31 Oct 2014 16:30:27 -0700 Received: from build4-lucid-cs (147.34.91.1) by SVR-ORW-FEM-06.mgc.mentorg.com (147.34.97.120) with Microsoft SMTP Server id 14.3.181.6; Fri, 31 Oct 2014 16:30:26 -0700 Received: by build4-lucid-cs (Postfix, from userid 1905) id DC8AD40DDD; Fri, 31 Oct 2014 16:30:26 -0700 (PDT) From: Don Breazeal To: Subject: [PATCH 11/16 v3] Extended-remote Linux exit events Date: Fri, 31 Oct 2014 16:28:49 -0700 Message-ID: <1414798134-11536-9-git-send-email-donb@codesourcery.com> In-Reply-To: <1408580964-27916-1-git-send-email-donb@codesourcery.com> References: <1408580964-27916-1-git-send-email-donb@codesourcery.com> MIME-Version: 1.0 X-IsSubscribed: yes This patch implements support for the extended ptrace event PTRACE_EVENT_EXIT on Linux. This is a preparatory patch for exec event support. The use of this event is entirely internal to gdbserver; these events are not reported to GDB or the user. When this event occurs, if the reporting thread is not the last thread in a process, its lwp entry is simply deleted, since this is what happens in the absence of exit events. If it is the last thread of a process, the wait status is set to the actual wait status of the thread, retrieved by PTRACE_O_GETMESSAGE, instead of the status that indicates the extended event, and the existing mechanisms for handling thread exit proceed as usual. The only purpose in using the exit events instead of the existing wait mechanisms is to ensure that the exit of a thread group leader is detected reliably when a non-leader thread calls exec. This was tested on x64 Ubuntu using extended-remote. Rationale for Using Exit Events -------------------------------- In brief, there is a race condition in the current implementation that can leave a dangling entry in the lwp list (an entry that doesn't have a corresponding actual lwp). In this case gdbserver will hang waiting for the non-existent lwp to stop. Using the exit events eliminates this race condition. The same race may exist in the native implementation, since the two implementations are similar, but I haven't verified that. It may be difficult to concoct a test case that demonstrates the race since the window is so small. Now for the long answer: in my testing I ran into a race condition in check_zombie_leaders, which detects when a thread group leader has exited and other threads still exist. On the Linux kernel, ptrace/waitpid don't allow reaping the leader thread until all other threads in the group are reaped. When the leader exits, it goes zombie, but waitpid will not return an exit status until the other threads are gone. When a non-leader thread calls exec, all other non-leader threads are destroyed, the leader becomes a zombie, and once the "other" threads have been reaped, the execing thread takes over the leader's pid (tgid) and appears to vanish. In order to handle this situation in the current implementation, check_zombie_leaders polls the process state in /proc and deletes thread group leaders that are in a zombie state. The replacement is added to the lwp list when the exec event is reported. See https://sourceware.org/ml/gdb-patches/2011-10/msg00704.html for a more detailed explanation of how this works. Here is the relevant part of check_zombie_leaders: if (leader_lp != NULL /* Check if there are other threads in the group, as we may have raced with the inferior simply exiting. */ && !last_thread_of_process_p (leader_pid) && linux_proc_pid_is_zombie (leader_pid)) { /* ...large informative comment block... */ delete_lwp (leader_lp); The race occurred when there were two threads in the program, and the non-leader thread called exec. In this case the leader thread passed through a very brief zombie state before being replaced by the exec'ing thread as the thread group leader. This state transition was asynchronous, with no dependency on anything gdbserver did. Because there were no other threads, there were no thread exit events, and thus there was no synchronization with the leader passing through the zombie state and the exec completing. If there had been more threads, the leader would remain in the zombie state until they were waited for. In the two-thread case, sometimes the leader exit was detected and sometimes it wasn't. (Recall that check_zombie_leaders is polling the state, via linux_proc_pid_is_zombie. The race is between the leader thread passing through the zombie state and check_zombie_leaders testing for zombie state.) If leader exit wasn't detected, gdbserver would end up with a dangling lwp entry that didn't correspond to any real lwp, and would hang waiting for that lwp to stop. Using PTRACE_EVENT_EXIT guarantees that the leader exit will be detected. Note that check_zombie_leaders works just fine for the scenarios where the leader thread exits and the other threads continue to run, with no exec calls. It is required for systems that don't support the extended ptrace events. The sequence of events resulting in the race condition was this: 1) In the program, a CLONE event for a new thread occurs. 2) In the program, both threads are resumed once gdbserver has completed the new thread processing. 3) In gdbserver, the function linux_wait_for_event_filtered loops until waitpid returns "no more events" for the SIGCHLD generated by the CLONE event. Then linux_wait_for_event_filtered calls check_zombie_leaders. 4) In the program, the new thread is doing the exec. During the exec the leader thread will pass through a transitory zombie state. If there were more than two threads, the leader thread would remain a zombie until all the non-leader, non-exec'ing threads were reaped by gdbserver. Since there are no such threads to reap, the leader just becomes a zombie and is replaced by the exec'ing thread on-the-fly. (Note that it appears that the leader thread is a zombie just for a very brief instant.) 5) In gdbserver, check_zombie_leaders checks whether an lwp entry corresponds to a zombie leader thread, and if so, deletes it. Here is the race: in (4) above, the leader may or may not be in the transitory zombie state. In the case where a zombie isn't detected, delete_lwp is not called. 6) In gdbserver, an EXEC event is detected and processed. When it gets ready to report the event to GDB, it calls stop_all_lwps, which sends a SIGSTOP to each lwp in the list and the waits until all the lwps in the list have reported a stop event. If the zombie leader wasn't detected and processed in step (5), gdbserver blocks forever in linux_wait_for_event_filtered, waiting for the undeleted lwp to be stopped, which will never happen. Thanks --Don gdb/gdbserver/ 2014-10-31 Don Breazeal * linux-low.c (handle_extended_wait): Support PTRACE_EVENT_EXIT. (linux_low_filter_event): Change wstat argument to a pointer, making it an input/output argument. (linux_wait_for_event_filtered): Handle wstat point argument in call to linux_low_filter_event. (initialize_low): Add PTRACE_O_TRACEEXIT. gdb/ 2014-10-31 Don Breazeal * nat/linux-ptrace.c (linux_ptrace_check_options): Call linux_test_for_traceexit. (linux_test_for_traceexit): New function. --- gdb/gdbserver/linux-low.c | 68 +++++++++++++++++++++++++++++++++++++------- gdb/nat/linux-ptrace.c | 50 +++++++++++++++++++++++++++++++++ 2 files changed, 107 insertions(+), 11 deletions(-) diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c index 369bfbe..d5f6fe0 100644 --- a/gdb/gdbserver/linux-low.c +++ b/gdb/gdbserver/linux-low.c @@ -222,6 +222,7 @@ static int finish_step_over (struct lwp_info *lwp); static CORE_ADDR get_stop_pc (struct lwp_info *lwp); static void async_file_mark (void); static int kill_lwp (unsigned long lwpid, int signo); +static int num_lwps (int pid); /* True if the low target can hardware single-step. Such targets don't need a BREAKPOINT_REINSERT_ADDR callback. */ @@ -366,12 +367,17 @@ linux_add_process (int pid, int attached) } /* Handle a GNU/Linux extended wait response. If we see a clone - event, we need to add the new LWP to our list (and return 0 so as - not to report the trap to higher layers). */ + event, we need to add the new lwp EVENT_CHILD to our list (and not + report the trap to higher layers). This function returns non-zero + if the event should be ignored and we should wait again. If an + exit event occurred in the last thread of a process, the wait + status in WSTATP will be updated with the exit status of the + process. */ static int -handle_extended_wait (struct lwp_info *event_child, int wstat) +handle_extended_wait (struct lwp_info *event_child, int *wstatp) { + int wstat = *wstatp; int event = linux_ptrace_get_extended_event (wstat); struct thread_info *event_thr = get_lwp_thread (event_child); struct lwp_info *new_lwp; @@ -524,6 +530,38 @@ handle_extended_wait (struct lwp_info *event_child, int wstat) event_child->waitstatus.kind = TARGET_WAITKIND_VFORK_DONE; return 0; } + else if (event == PTRACE_EVENT_EXIT) + { + unsigned long exit_status; + unsigned long lwpid = lwpid_of (event_thr); + int ret; + + if (debug_threads) + debug_printf ("HEW: Got exit event from LWP %ld\n", lwpid ); + + ptrace (PTRACE_GETEVENTMSG, lwpid, (PTRACE_TYPE_ARG3) 0, &exit_status); + + if (num_lwps (pid_of (event_thr)) > 1) + { + /* If there is at least one more LWP, then the program still + exists and the exit should not be reported to GDB. */ + delete_lwp (event_child); + ret = 1; + } + else + { + /* Set the exit status to the actual exit status, so normal + WIFEXITED/WIFSIGNALLED processing and reporting for the + last lwp in the process can proceed from here. */ + *wstatp = exit_status; + ret = 0; + } + + /* Resume the thread so that it actually exits. Subsequent exit + events for LWPs that were deleted above will be ignored. */ + ptrace (PTRACE_CONT, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0); + return ret; + } internal_error (__FILE__, __LINE__, _("unknown ptrace event %d"), event); } @@ -2001,8 +2039,9 @@ lp_status_maybe_breakpoint (struct lwp_info *lp) NULL otherwise. */ static struct lwp_info * -linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat) +linux_low_filter_event (ptid_t filter_ptid, int lwpid, int *wstatp) { + int wstat = *wstatp; struct lwp_info *child; struct thread_info *thread; @@ -2054,7 +2093,8 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat) architecture being defined already (so that CHILD has a valid regcache), and on LAST_STATUS being set (to check for SIGTRAP or not). */ - if (WIFSTOPPED (wstat)) + if (WIFSTOPPED (wstat) + && (linux_ptrace_get_extended_event (wstat) != PTRACE_EVENT_EXIT)) { if (debug_threads && the_low_target.get_pc != NULL) @@ -2090,7 +2130,8 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat) changes the debug registers meanwhile, we have the cached data we can rely on. */ - if (WIFSTOPPED (wstat) && WSTOPSIG (wstat) == SIGTRAP) + if (WIFSTOPPED (wstat) && WSTOPSIG (wstat) == SIGTRAP + && !linux_is_extended_waitstatus (wstat)) { if (the_low_target.stopped_by_watchpoint == NULL) { @@ -2128,10 +2169,15 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int wstat) if (WIFSTOPPED (wstat) && WSTOPSIG (wstat) == SIGTRAP && linux_is_extended_waitstatus (wstat)) { - if (handle_extended_wait (child, wstat)) + if (handle_extended_wait (child, &wstat)) return NULL; else - return child; + { + /* Update caller's wstat in case handle_extended_wait + changed it. */ + *wstatp = wstat; + return child; + } } if (WIFSTOPPED (wstat) && WSTOPSIG (wstat) == SIGSTOP @@ -2354,8 +2400,7 @@ linux_wait_for_event_filtered (ptid_t wait_ptid, ptid_t filter_ptid, (long) ret, status_to_str (*wstatp)); } - event_child = linux_low_filter_event (filter_ptid, - ret, *wstatp); + event_child = linux_low_filter_event (filter_ptid, ret, wstatp); if (event_child != NULL) { /* We got an event to report to the core. */ @@ -6484,6 +6529,7 @@ initialize_low (void) /* Enable extended events. */ linux_ptrace_set_requested_options (PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK - | PTRACE_O_TRACEVFORKDONE); + | PTRACE_O_TRACEVFORKDONE + | PTRACE_O_TRACEEXIT); linux_ptrace_check_options (); } diff --git a/gdb/nat/linux-ptrace.c b/gdb/nat/linux-ptrace.c index 5f3f123..e8fd58c 100644 --- a/gdb/nat/linux-ptrace.c +++ b/gdb/nat/linux-ptrace.c @@ -313,6 +313,7 @@ linux_child_function (gdb_byte *child_stack) static void linux_test_for_tracesysgood (int child_pid); static void linux_test_for_tracefork (int child_pid); +static void linux_test_for_traceexit (int child_pid); /* Determine ptrace features available on this target. */ @@ -350,6 +351,8 @@ linux_ptrace_check_options (void) linux_test_for_tracefork (child_pid); + linux_test_for_traceexit (child_pid); + /* Clean things up and kill any pending children. */ do { @@ -464,6 +467,53 @@ linux_test_for_tracefork (int child_pid) "(%d, status 0x%x)"), ret, status); } +/* Determine if PTRACE_O_TRACEEXIT can be used to follow exit + events. */ + +static void +linux_test_for_traceexit (int child_pid) +{ + int ret; + int status; + int i; + + /* Check if the caller wants PTRACE_O_TRACEEXIT. */ + if ((requested_ptrace_options & PTRACE_O_TRACEEXIT) == 0) + return; + + /* Test for PTRACE_O_TRACEEXIT. First try to set the option. + If this fails, we know for sure that it is not supported. */ + ret = ptrace (PTRACE_SETOPTIONS, child_pid, (PTRACE_TYPE_ARG3) 0, + (PTRACE_TYPE_ARG4) PTRACE_O_TRACEEXIT); + if (ret != 0) + return; + + /* We don't know for sure that the feature is available; old + versions of PTRACE_SETOPTIONS ignored unknown options. So + see if the process exit will generate a PTRACE_EVENT_EXIT. + + We try twice (at most) to handle the case where the grandchild + process exits, causing a SIGCHLD to be delivered to the child + process, stopping it before exit. */ + for (i = 0; i < 2; i++) + { + ret = ptrace (PTRACE_CONT, child_pid, (PTRACE_TYPE_ARG3) 0, + (PTRACE_TYPE_ARG4) 0); + if (ret != 0) + warning (_("linux_test_for_traceexit: failed to resume child")); + + ret = my_waitpid (child_pid, &status, 0); + + /* Check if we received an exit event notification. */ + if (ret == child_pid && WIFSTOPPED (status) + && linux_ptrace_get_extended_event (status) == PTRACE_EVENT_EXIT) + { + /* PTRACE_O_TRACEEXIT is supported. */ + available_ptrace_options |= PTRACE_O_TRACEEXIT; + } + } +} + /* Enable reporting of supported ptrace events. If USE_AVAILABLE_OPTIONS is false, then exclude the events specified in available_ptrace_options. If PID is non-zero,