follow-exec: handle targets that don't have thread exit events

Message ID 1415905375-29865-1-git-send-email-palves@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves Nov. 13, 2014, 7:02 p.m. UTC
  ... such as remote.

Ref: https://sourceware.org/ml/gdb-patches/2014-11/msg00268.html

This fixes invalid reads Valgrind caught when debugging against a
GDBserver patched with a series that adds exec events to the remote
protocol.  Like these, using the gdb.threads/thread-execl.exp test:

$ valgrind ./gdb -data-directory=data-directory ./testsuite/gdb.threads/thread-execl  -ex "tar extended-remote :9999" -ex "b thread_execler" -ex "c" -ex "set scheduler-locking on"
...
Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
29        if (execl (image, image, NULL) == -1)
(gdb) n
Thread 32509.32509 is executing new program: build/gdb/testsuite/gdb.threads/thread-execl
[New Thread 32509.32532]
==32510== Invalid read of size 4
==32510==    at 0x5AA7D8: delete_breakpoint (breakpoint.c:13989)
==32510==    by 0x6285D3: delete_thread_breakpoint (thread.c:100)
==32510==    by 0x628603: delete_step_resume_breakpoint (thread.c:109)
==32510==    by 0x61622B: delete_thread_infrun_breakpoints (infrun.c:2928)
==32510==    by 0x6162EF: for_each_just_stopped_thread (infrun.c:2958)
==32510==    by 0x616311: delete_just_stopped_threads_infrun_breakpoints (infrun.c:2969)
==32510==    by 0x616C96: fetch_inferior_event (infrun.c:3267)
==32510==    by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
==32510==    by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
==32510==    by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
==32510==    by 0x4AF6F0: fd_event (ser-base.c:182)
==32510==    by 0x63806D: handle_file_event (event-loop.c:762)
==32510==  Address 0xcf333e0 is 16 bytes inside a block of size 200 free'd
==32510==    at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==32510==    by 0x77CB74: xfree (common-utils.c:98)
==32510==    by 0x5AA954: delete_breakpoint (breakpoint.c:14056)
==32510==    by 0x5988BD: update_breakpoints_after_exec (breakpoint.c:3765)
==32510==    by 0x61360F: follow_exec (infrun.c:1091)
==32510==    by 0x6186FA: handle_inferior_event (infrun.c:4061)
==32510==    by 0x616C55: fetch_inferior_event (infrun.c:3261)
==32510==    by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
==32510==    by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
==32510==    by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
==32510==    by 0x4AF6F0: fd_event (ser-base.c:182)
==32510==    by 0x63806D: handle_file_event (event-loop.c:762)
==32510==
[Switching to Thread 32509.32532]

Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
29        if (execl (image, image, NULL) == -1)
(gdb)

The breakpoint in question is the step-resume breakpoint of the
non-main thread, the one that was "next"ed.

Tested on x86_64 Fedora 20.

gdb/
2014-11-13  Pedro Alves  <palves@redhat.com>

	* infrun.c (follow_exec): Delete all threads of the process except
	the event thread.  Extended comments.
---
 gdb/infrun.c | 44 ++++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 12 deletions(-)
  

Comments

Don Breazeal Dec. 6, 2014, 12:55 a.m. UTC | #1
On 11/13/2014 11:02 AM, Pedro Alves wrote:
> ... such as remote.
> 
> Ref: https://sourceware.org/ml/gdb-patches/2014-11/msg00268.html
> 
> This fixes invalid reads Valgrind caught when debugging against a
> GDBserver patched with a series that adds exec events to the remote
> protocol.  Like these, using the gdb.threads/thread-execl.exp test:
> 
> $ valgrind ./gdb -data-directory=data-directory ./testsuite/gdb.threads/thread-execl  -ex "tar extended-remote :9999" -ex "b thread_execler" -ex "c" -ex "set scheduler-locking on"
> ...
> Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
> 29        if (execl (image, image, NULL) == -1)
> (gdb) n
> Thread 32509.32509 is executing new program: build/gdb/testsuite/gdb.threads/thread-execl
> [New Thread 32509.32532]
> ==32510== Invalid read of size 4
> ==32510==    at 0x5AA7D8: delete_breakpoint (breakpoint.c:13989)
> ==32510==    by 0x6285D3: delete_thread_breakpoint (thread.c:100)
> ==32510==    by 0x628603: delete_step_resume_breakpoint (thread.c:109)
> ==32510==    by 0x61622B: delete_thread_infrun_breakpoints (infrun.c:2928)
> ==32510==    by 0x6162EF: for_each_just_stopped_thread (infrun.c:2958)
> ==32510==    by 0x616311: delete_just_stopped_threads_infrun_breakpoints (infrun.c:2969)
> ==32510==    by 0x616C96: fetch_inferior_event (infrun.c:3267)
> ==32510==    by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
> ==32510==    by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
> ==32510==    by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
> ==32510==    by 0x4AF6F0: fd_event (ser-base.c:182)
> ==32510==    by 0x63806D: handle_file_event (event-loop.c:762)
> ==32510==  Address 0xcf333e0 is 16 bytes inside a block of size 200 free'd
> ==32510==    at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==32510==    by 0x77CB74: xfree (common-utils.c:98)
> ==32510==    by 0x5AA954: delete_breakpoint (breakpoint.c:14056)
> ==32510==    by 0x5988BD: update_breakpoints_after_exec (breakpoint.c:3765)
> ==32510==    by 0x61360F: follow_exec (infrun.c:1091)
> ==32510==    by 0x6186FA: handle_inferior_event (infrun.c:4061)
> ==32510==    by 0x616C55: fetch_inferior_event (infrun.c:3261)
> ==32510==    by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
> ==32510==    by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
> ==32510==    by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
> ==32510==    by 0x4AF6F0: fd_event (ser-base.c:182)
> ==32510==    by 0x63806D: handle_file_event (event-loop.c:762)
> ==32510==
> [Switching to Thread 32509.32532]
> 
> Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
> 29        if (execl (image, image, NULL) == -1)
> (gdb)
> 
> The breakpoint in question is the step-resume breakpoint of the
> non-main thread, the one that was "next"ed.
> 
> Tested on x86_64 Fedora 20.
> 
> gdb/
> 2014-11-13  Pedro Alves  <palves@redhat.com>
> 
> 	* infrun.c (follow_exec): Delete all threads of the process except
> 	the event thread.  Extended comments.
> ---
>  gdb/infrun.c | 44 ++++++++++++++++++++++++++++++++------------
>  1 file changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 7e59f55..0532d3e 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -1060,10 +1060,11 @@ show_follow_exec_mode_string (struct ui_file *file, int from_tty,
>  /* EXECD_PATHNAME is assumed to be non-NULL.  */
>  
>  static void
> -follow_exec (ptid_t pid, char *execd_pathname)
> +follow_exec (ptid_t ptid, char *execd_pathname)
>  {
> -  struct thread_info *th = inferior_thread ();
> +  struct thread_info *th, *tmp;
>    struct inferior *inf = current_inferior ();
> +  int pid = ptid_get_pid (ptid);
>  
>    /* This is an exec event that we actually wish to pay attention to.
>       Refresh our symbol table to the newly exec'd program, remove any
> @@ -1088,24 +1089,43 @@ follow_exec (ptid_t pid, char *execd_pathname)
>  
>    mark_breakpoints_out ();
>  
> -  update_breakpoints_after_exec ();
> -
> -  /* If there was one, it's gone now.  We cannot truly step-to-next
> -     statement through an exec().  */
> +  /* The target reports the exec event to the main thread, even if
> +     some other thread does the exec, and even if the main thread was
> +     stopped or already gone.  On targets that don't have thread exit
> +     events (like remote), we may still have non-leader threads of the
> +     process on our list.  When debugging remotely, it's best to avoid
> +     extra traffic, when possible, so avoid syncing the thread list
> +     with the target, and instead go ahead and delete all threads of
> +     the process but one that reported the event.  Note this must be
> +     done before calling update_breakpoints_after_exec, as otherwise
> +     clearing the threads' resources would reference stale thread
> +     breakpoints -- it may have been one of these threads that stepped
> +     across the exec.  We could just clear their stepping states, but
> +     as long as we're iterating, might as well delete them.  Deleting
> +     them now rather than at the next user-visible stop provides a
> +     nicer sequence of events for user and MI notifications.  */
> +  ALL_NON_EXITED_THREADS_SAFE (th, tmp)
> +    if (ptid_get_pid (th->ptid) == pid && !ptid_equal (th->ptid, ptid))
> +      delete_thread (th->ptid);
> +
> +  /* We also need to clear any left over stale state for the
> +     leader/event thread.  E.g., if there was any step-resume
> +     breakpoint or similar, it's gone now.  We cannot truly
> +     step-to-next statement through an exec().  */
> +  th = inferior_thread ();
>    th->control.step_resume_breakpoint = NULL;
>    th->control.exception_resume_breakpoint = NULL;
>    th->control.single_step_breakpoints = NULL;
>    th->control.step_range_start = 0;
>    th->control.step_range_end = 0;
>  
> -  /* The target reports the exec event to the main thread, even if
> -     some other thread does the exec, and even if the main thread was
> -     already stopped --- if debugging in non-stop mode, it's possible
> -     the user had the main thread held stopped in the previous image
> -     --- release it now.  This is the same behavior as step-over-exec
> -     with scheduler-locking on in all-stop mode.  */
> +  /* The user may have had the main thread held stopped in the
> +     previous image (e.g., schedlock on, or non-stop).  Release
> +     it now.  */
>    th->stop_requested = 0;
>  
> +  update_breakpoints_after_exec ();
> +
>    /* What is this a.out's name?  */
>    printf_unfiltered (_("%s is executing new program: %s\n"),
>  		     target_pid_to_str (inferior_ptid),
> 
Hi Pedro,

I walked through this, and it makes sense to me.  We know that
on entry to follow_exec inferior_thread() is the event thread,
which is also the leader thread, right? Thanks for digging into this.

I haven't had a chance yet to look at the exec event /
check_zombie_leaders race condition issue.  I'll do
that once I get through all of the fork event issues.

--Don
  

Patch

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 7e59f55..0532d3e 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1060,10 +1060,11 @@  show_follow_exec_mode_string (struct ui_file *file, int from_tty,
 /* EXECD_PATHNAME is assumed to be non-NULL.  */
 
 static void
-follow_exec (ptid_t pid, char *execd_pathname)
+follow_exec (ptid_t ptid, char *execd_pathname)
 {
-  struct thread_info *th = inferior_thread ();
+  struct thread_info *th, *tmp;
   struct inferior *inf = current_inferior ();
+  int pid = ptid_get_pid (ptid);
 
   /* This is an exec event that we actually wish to pay attention to.
      Refresh our symbol table to the newly exec'd program, remove any
@@ -1088,24 +1089,43 @@  follow_exec (ptid_t pid, char *execd_pathname)
 
   mark_breakpoints_out ();
 
-  update_breakpoints_after_exec ();
-
-  /* If there was one, it's gone now.  We cannot truly step-to-next
-     statement through an exec().  */
+  /* The target reports the exec event to the main thread, even if
+     some other thread does the exec, and even if the main thread was
+     stopped or already gone.  On targets that don't have thread exit
+     events (like remote), we may still have non-leader threads of the
+     process on our list.  When debugging remotely, it's best to avoid
+     extra traffic, when possible, so avoid syncing the thread list
+     with the target, and instead go ahead and delete all threads of
+     the process but one that reported the event.  Note this must be
+     done before calling update_breakpoints_after_exec, as otherwise
+     clearing the threads' resources would reference stale thread
+     breakpoints -- it may have been one of these threads that stepped
+     across the exec.  We could just clear their stepping states, but
+     as long as we're iterating, might as well delete them.  Deleting
+     them now rather than at the next user-visible stop provides a
+     nicer sequence of events for user and MI notifications.  */
+  ALL_NON_EXITED_THREADS_SAFE (th, tmp)
+    if (ptid_get_pid (th->ptid) == pid && !ptid_equal (th->ptid, ptid))
+      delete_thread (th->ptid);
+
+  /* We also need to clear any left over stale state for the
+     leader/event thread.  E.g., if there was any step-resume
+     breakpoint or similar, it's gone now.  We cannot truly
+     step-to-next statement through an exec().  */
+  th = inferior_thread ();
   th->control.step_resume_breakpoint = NULL;
   th->control.exception_resume_breakpoint = NULL;
   th->control.single_step_breakpoints = NULL;
   th->control.step_range_start = 0;
   th->control.step_range_end = 0;
 
-  /* The target reports the exec event to the main thread, even if
-     some other thread does the exec, and even if the main thread was
-     already stopped --- if debugging in non-stop mode, it's possible
-     the user had the main thread held stopped in the previous image
-     --- release it now.  This is the same behavior as step-over-exec
-     with scheduler-locking on in all-stop mode.  */
+  /* The user may have had the main thread held stopped in the
+     previous image (e.g., schedlock on, or non-stop).  Release
+     it now.  */
   th->stop_requested = 0;
 
+  update_breakpoints_after_exec ();
+
   /* What is this a.out's name?  */
   printf_unfiltered (_("%s is executing new program: %s\n"),
 		     target_pid_to_str (inferior_ptid),