[11/16,v2] Extended-remote Linux exit events

This patch implements support for the extended ptrace event
PTRACE_EVENT_EXIT on Linux.  This is a preparatory patch for exec event
support.

The use of this event is entirely internal to gdbserver; these events are
not reported to GDB or the user.  When this event occurs, if the reporting
thread is not the last thread in a process, its lwp entry is simply
deleted, since this is what happens in the absence of exit events.  If it
is the last thread of a process, the wait status is set to the actual wait
status of the thread, retrieved by PTRACE_GETEVENTMSG, instead of the
status that indicates the extended event, and the existing mechanisms for
handling thread exit proceed as usual.
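
The decision described above can be sketched in isolation as follows.  This
is only an illustration, not the code in the patch: the names
sketch_event_of, sketch_filter_exit_event and enum exit_action are
hypothetical, the event code is repeated locally so the sketch does not
depend on <sys/ptrace.h>, and the caller is assumed to have already fetched
the real wait status with PTRACE_GETEVENTMSG and to pass it in as
event_msg.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/wait.h>

/* Event code of PTRACE_EVENT_EXIT in the Linux headers, repeated here
   (assumption: hypothetical local name) to keep the sketch standalone.  */
#define SKETCH_PTRACE_EVENT_EXIT 6

enum exit_action { NOT_EXIT_EVENT, DELETE_LWP, REPORT_EXIT };

/* Extended ptrace events are stored above the low 16 bits of the
   wait status.  */
static int
sketch_event_of (int wstat)
{
  return (wstat >> 16) & 0xffff;
}

/* Decide what to do with *WSTAT.  If it is an exit event for the last
   thread of a process, overwrite *WSTAT with EVENT_MSG, the real wait
   status, so the normal exit handling can take over.  */
enum exit_action
sketch_filter_exit_event (int *wstat, int last_thread,
                          unsigned long event_msg)
{
  assert (wstat != NULL);

  if (!WIFSTOPPED (*wstat)
      || WSTOPSIG (*wstat) != SIGTRAP
      || sketch_event_of (*wstat) != SKETCH_PTRACE_EVENT_EXIT)
    return NOT_EXIT_EVENT;

  if (!last_thread)
    return DELETE_LWP;          /* Just drop the lwp entry.  */

  *wstat = (int) event_msg;     /* Substitute the real exit status.  */
  return REPORT_EXIT;
}
```

Making wstat a pointer mirrors the input/output argument mentioned in the
ChangeLog for linux_low_filter_event; everything else here is assumed for
the sake of the example.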

The only purpose in using the exit events instead of the existing wait
mechanisms is to ensure that the exit of a thread group leader is detected
reliably when a non-leader thread calls exec.

This was tested on x86_64 Ubuntu using extended-remote.

Rationale for Using Exit Events
--------------------------------
In brief, there is a race condition in the current implementation that can
leave a dangling entry in the lwp list (an entry that doesn't have a
corresponding actual lwp).  In this case gdbserver will hang waiting for
the non-existent lwp to stop.  Using the exit events eliminates this race
condition.

The same race may exist in the native implementation, since the two
implementations are similar, but I haven't verified that.  It may be
difficult to concoct a test case that demonstrates the race since the
window is so small.

Now for the long answer: in my testing I ran into a race condition in
check_zombie_leaders, which detects when a thread group leader has exited
and other threads still exist.  In the Linux kernel, ptrace/waitpid don't
allow reaping the leader thread until all other threads in the group are
reaped.  When the leader exits, it becomes a zombie, but waitpid will not
return an exit status until the other threads are gone.  When a non-leader
thread calls exec, all other non-leader threads are destroyed, the leader
becomes a zombie, and once the "other" threads have been reaped, the
exec'ing thread takes over the leader's pid (tgid) and appears to vanish.
To handle this situation in the current implementation,
check_zombie_leaders polls the process state in /proc and deletes the lwp
entries of thread group leaders that are in a zombie state.  A replacement
entry for the leader's pid, now owned by the exec'ing thread, is added to
the lwp list when the exec event is reported.

See https://sourceware.org/ml/gdb-patches/2011-10/msg00704.html for a more
detailed explanation of how this works.
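
To make the polling concrete, here is a minimal sketch of a zombie check
based on the "State:" line of /proc/PID/status.  It is not claimed to match
the implementation of linux_proc_pid_is_zombie; the function names are
hypothetical and error handling is reduced to "treat as not a zombie".

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if LINE is a "State:" line from /proc/PID/status that
   reports the zombie state, 0 otherwise.  */
int
sketch_state_line_is_zombie (const char *line)
{
  char state;

  if (sscanf (line, "State: %c", &state) != 1)
    return 0;
  return state == 'Z';
}

/* Poll /proc once for PID.  Return 1 if it is currently a zombie,
   0 if it is not or if the status file cannot be read.  */
int
sketch_pid_is_zombie (pid_t pid)
{
  char path[64], line[256];
  FILE *f;
  int zombie = 0;

  snprintf (path, sizeof path, "/proc/%ld/status", (long) pid);
  f = fopen (path, "r");
  if (f == NULL)
    return 0;

  while (fgets (line, sizeof line, f) != NULL)
    if (strncmp (line, "State:", 6) == 0)
      {
        zombie = sketch_state_line_is_zombie (line);
        break;
      }

  fclose (f);
  return zombie;
}
```

A check like this only samples the state at one instant, which is why a
very short-lived zombie state can be missed.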

Here is the relevant part of check_zombie_leaders:

      if (leader_lp != NULL
          /* Check if there are other threads in the group, as we may
             have raced with the inferior simply exiting.  */
          && !last_thread_of_process_p (leader_pid)
          && linux_proc_pid_is_zombie (leader_pid))
        {
          /* ...large informative comment block... */
          delete_lwp (leader_lp);
        }

The race occurred when there were two threads in the program, and the
non-leader thread called exec.  In this case the leader thread passed
through a very brief zombie state before being replaced by the exec'ing
thread as the thread group leader.  This state transition was asynchronous,
with no dependency on anything gdbserver did.  Because there were no other
threads, there were no thread exit events, and thus nothing synchronized
gdbserver with the leader passing through the zombie state and the exec
completing.  If there had been more threads, the leader would have remained
in the zombie state until they were waited for.  In the two-thread case,
sometimes the leader exit was detected and sometimes it wasn't. (Recall
that check_zombie_leaders is polling the state, via
linux_proc_pid_is_zombie.  The race is between the leader thread passing
through the zombie state and check_zombie_leaders testing for zombie
state.)  If leader exit wasn't detected, gdbserver would end up with a
dangling lwp entry that didn't correspond to any real lwp, and would hang
waiting for that lwp to stop.  Using PTRACE_EVENT_EXIT guarantees that the
leader exit will be detected.
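
As a standalone illustration of why the exit event is reliable, the sketch
below traces a single forked child with PTRACE_O_TRACEEXIT.  The child
stops at the exit event before it finishes exiting, so the tracer sees the
exit synchronously instead of having to catch a transient state.  This is
not gdbserver code: the function name is hypothetical, it uses a plain
child process rather than a thread group, and it assumes a Linux system
where ptrace is permitted.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with EXIT_CODE, trace it with
   PTRACE_O_TRACEEXIT, and return the exit code as reported through
   the exit event.  Return -1 if any step does not go as expected.  */
int
sketch_observe_exit_event (int exit_code)
{
  int status, reported;
  unsigned long msg = 0;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Child: request tracing, stop so the parent can set options,
         then exit.  */
      if (ptrace (PTRACE_TRACEME, 0, NULL, NULL) != 0)
        _exit (127);
      raise (SIGSTOP);
      _exit (exit_code);
    }

  /* Wait for the child's initial SIGSTOP.  */
  if (waitpid (pid, &status, 0) != pid || !WIFSTOPPED (status))
    return -1;

  if (ptrace (PTRACE_SETOPTIONS, pid, NULL,
              (void *) (long) PTRACE_O_TRACEEXIT) != 0
      || ptrace (PTRACE_CONT, pid, NULL, NULL) != 0
      || waitpid (pid, &status, 0) != pid
      || !WIFSTOPPED (status)
      || (status >> 16) != PTRACE_EVENT_EXIT
      || ptrace (PTRACE_GETEVENTMSG, pid, NULL, &msg) != 0)
    {
      kill (pid, SIGKILL);
      waitpid (pid, NULL, 0);
      return -1;
    }

  /* Let the child finish exiting, then reap it.  */
  ptrace (PTRACE_CONT, pid, NULL, NULL);
  waitpid (pid, &status, 0);

  /* The event message is the real wait status of the child.  */
  reported = (int) msg;
  return WIFEXITED (reported) ? WEXITSTATUS (reported) : -1;
}
```

For example, sketch_observe_exit_event (7) is expected to return 7 when
tracing is available.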

Note that check_zombie_leaders works just fine for the scenarios where the
leader thread exits and the other threads continue to run, with no exec
calls.  It is required for systems that don't support the extended ptrace
events.

The sequence of events resulting in the race condition was this:

    1) In the program, a CLONE event for a new thread occurs.

    2) In the program, both threads are resumed once gdbserver has
       completed the new thread processing.

    3) In gdbserver, the function linux_wait_for_event_filtered loops until
       waitpid returns "no more events" for the SIGCHLD generated by the
       CLONE event.  Then linux_wait_for_event_filtered calls
       check_zombie_leaders.

    4) In the program, the new thread is doing the exec.  During the exec
       the leader thread will pass through a transitory zombie state.  If
       there were more than two threads, the leader thread would remain a
       zombie until all the non-leader, non-exec'ing threads were reaped by
       gdbserver.  Since there are no such threads to reap, the leader just
       becomes a zombie and is replaced by the exec'ing thread on-the-fly.
       (Note that it appears that the leader thread is a zombie just for a
       very brief instant.)

    5) In gdbserver, check_zombie_leaders checks whether an lwp entry
       corresponds to a zombie leader thread, and if so, deletes it.  Here
       is the race: at the moment of this check, the leader from (4) may or
       may not be in the transitory zombie state.  If a zombie isn't
       detected, delete_lwp is not called and the stale entry remains.

    6) In gdbserver, an EXEC event is detected and processed.  When it gets
       ready to report the event to GDB, it calls stop_all_lwps, which sends
       a SIGSTOP to each lwp in the list and then waits until all the lwps in
       the list have reported a stop event.  If the zombie leader wasn't
       detected and processed in step (5), gdbserver blocks forever in
       linux_wait_for_event_filtered, waiting for the undeleted lwp to be
       stopped, which will never happen.

Thanks
--Don

gdb/
2014-08-20  Don Breazeal  <donb@codesourcery.com>

	* nat/linux-ptrace.c (linux_ptrace_check_options): Call
	linux_test_for_traceexit.
	(linux_test_for_traceexit): New function.

gdb/gdbserver/
2014-08-20  Don Breazeal  <donb@codesourcery.com>

	* linux-low.c (handle_extended_wait): Support PTRACE_EVENT_EXIT.
	(linux_low_filter_event): Change wstat argument to a pointer,
	making it an input/output argument.
	(linux_wait_for_event_filtered): Handle wstat pointer argument in
	call to linux_low_filter_event.
	(initialize_low): Add PTRACE_O_TRACEEXIT.

---
 gdb/gdbserver/linux-low.c |   68 +++++++++++++++++++++++++++++++++++++-------
 gdb/nat/linux-ptrace.c    |   50 +++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+), 11 deletions(-)
