[3/N] remote follow fork and spurious child stops in non-stop mode

Message ID 55B1308E.4020700@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves July 23, 2015, 6:21 p.m. UTC
  So I managed to extract out this smaller patch from the
gdbserver fixes I mentioned.  I think this one looks safe enough
for 7.10.  WDYT?

-----------
From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 23 Jul 2015 18:49:51 +0100
Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode

Running gdb.threads/fork-plus-threads.exp against gdbserver in
extended-remote mode, even though the test passes, we still see broken
behavior:

Running gdb.threads/fork-plus-threads.exp against gdbserver in
extended-remote mode, even though the test passes, we still see broken
behavior:

 (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
 continue &
 Continuing.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
 [New Thread 28092.28092]

 [Thread 28092.28092] #2 stopped.
 [New Thread 28094.28094]
 [Inferior 2 (process 28092) exited normally]
 [New Thread 28094.28105]
 [New Thread 28094.28109]

...

[Thread 28174.28174] #18 stopped.
 [New Thread 28185.28185]
 [Inferior 10 (process 28174) exited normally]
 [New Thread 28185.28196]

 [Thread 28185.28185] #20 stopped.
 Cannot remove breakpoints because program is no longer writable.
 Further execution is probably impossible.
 [Inferior 11 (process 28185) exited normally]
 [Inferior 1 (process 28091) exited normally]
 PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
 info threads
 No threads.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
 info inferiors
   Num  Description       Executable
 * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left

All the "[Thread FOO] #NN stopped." above are bogus, as well as the
"Cannot remove breakpoints because program is no longer writable.",
which is a consequence.

The problem is that when we intercept a fork event, we should report
the event for the parent, only, and leave the child stopped, but not
report its stop event.  GDB later decides whether to follow the parent
or the child.  But because handle_extended_wait does not set the
child's last_status.kind to TARGET_WAITKIND_STOPPED, a
stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
memory) by mistake ends up queueing a SIGSTOP on the child, resuming
it, and then when that SIGSTOP is intercepted, because the LWP has
last_resume_kind set to resume_stop, gdbserver reports the stop to
GDB, as GDB_SIGNAL_0:

...
 >>>> entering unstop_all_lwps
 unstopping all lwps
 proceed_one_lwp: lwp 1600
    client wants LWP to remain 1600 stopped
 proceed_one_lwp: lwp 1828
 Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Sending sigstop to lwp 1828
 pc is 0x3615ebc7cc
 Resuming lwp 1828 (continue, signal 0, stop expected)
   continue from pc 0x3615ebc7cc
 unstop_all_lwps done
 sigchld_handler
 <<<< exiting unstop_all_lwps
 handling possible target event
 >>>> entering linux_wait_1
 linux_wait_1: [<all threads>]
 my_waitpid (-1, 0x40000001)
 my_waitpid (-1, 0x1): status(137f), 1828
 LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
 LLW: waitpid 1828 received Stopped (signal) (stopped)
 pc is 0x3615ebc7cc
 Expected stop.
 LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
 linux_wait_1 ret = LWP 1828.1828, 1, 0
 <<<< exiting linux_wait_1
 Writing resume reply for LWP 1828.1828:1
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By inspection, I also noticed that we miss leaving the child with the
suspend count incremented if stopping threads, like we do for clone
threads.

Tested on x86_64 Fedora 20, extended-remote.

gdb/gdbserver/ChangeLog:
2015-07-23  Pedro Alves  <palves@redhat.com>

	* linux-low.c (handle_extended_wait): Set the child's last
	reported status to TARGET_WAITKIND_STOPPED.
---
 gdb/gdbserver/linux-low.c                       |  7 ++++++
 gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++
 2 files changed, 37 insertions(+)
  

Comments

Simon Marchi July 24, 2015, 6:05 p.m. UTC | #1
On 15-07-23 02:21 PM, Pedro Alves wrote:
> So I managed to extract out this smaller patch from the
> gdbserver fixes I mentioned.  I think this one looks safe enough
> for 7.10.  WDYT?
> 
> -----------
> From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 23 Jul 2015 18:49:51 +0100
> Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode
> 
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
> 
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
> 
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
>  continue &
>  Continuing.
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
>  [New Thread 28092.28092]
> 
>  [Thread 28092.28092] #2 stopped.
>  [New Thread 28094.28094]
>  [Inferior 2 (process 28092) exited normally]
>  [New Thread 28094.28105]
>  [New Thread 28094.28109]
> 
> ...
> 
> [Thread 28174.28174] #18 stopped.
>  [New Thread 28185.28185]
>  [Inferior 10 (process 28174) exited normally]
>  [New Thread 28185.28196]
> 
>  [Thread 28185.28185] #20 stopped.
>  Cannot remove breakpoints because program is no longer writable.
>  Further execution is probably impossible.
>  [Inferior 11 (process 28185) exited normally]
>  [Inferior 1 (process 28091) exited normally]
>  PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
>  info threads
>  No threads.
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
>  info inferiors
>    Num  Description       Executable
>  * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left
> 
> All the "[Thread FOO] #NN stopped." above are bogus, as well as the
> "Cannot remove breakpoints because program is no longer writable.",
> which is a consequence.
> 
> The problem is that when we intercept a fork event, we should report
> the event for the parent, only, and leave the child stopped, but not
> report its stop event.  GDB later decides whether to follow the parent
> or the child.  But because handle_extended_wait does not set the
> child's last_status.kind to TARGET_WAITKIND_STOPPED, a
> stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
> memory) by mistake ends up queueing a SIGSTOP on the child, resuming
> it, and then when that SIGSTOP is intercepted, because the LWP has
> last_resume_kind set to resume_stop, gdbserver reports the stop to
> GDB, as GDB_SIGNAL_0:
> 
> ...
>  >>>> entering unstop_all_lwps
>  unstopping all lwps
>  proceed_one_lwp: lwp 1600
>     client wants LWP to remain 1600 stopped
>  proceed_one_lwp: lwp 1828
>  Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  Sending sigstop to lwp 1828
>  pc is 0x3615ebc7cc
>  Resuming lwp 1828 (continue, signal 0, stop expected)
>    continue from pc 0x3615ebc7cc
>  unstop_all_lwps done
>  sigchld_handler
>  <<<< exiting unstop_all_lwps
>  handling possible target event
>  >>>> entering linux_wait_1
>  linux_wait_1: [<all threads>]
>  my_waitpid (-1, 0x40000001)
>  my_waitpid (-1, 0x1): status(137f), 1828
>  LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
>  LLW: waitpid 1828 received Stopped (signal) (stopped)
>  pc is 0x3615ebc7cc
>  Expected stop.
>  LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ...
>  linux_wait_1 ret = LWP 1828.1828, 1, 0
>  <<<< exiting linux_wait_1
>  Writing resume reply for LWP 1828.1828:1
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> By inspection, I also noticed that we miss leaving the child with the
> suspend count incremented if stopping threads, like we do for clone
> threads.
> 
> Tested on x86_64 Fedora 20, extended-remote.
> 
> gdb/gdbserver/ChangeLog:
> 2015-07-23  Pedro Alves  <palves@redhat.com>
> 
> 	* linux-low.c (handle_extended_wait): Set the child's last
> 	reported status to TARGET_WAITKIND_STOPPED.
> ---
>  gdb/gdbserver/linux-low.c                       |  7 ++++++
>  gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 17b2a51..56a33ff 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
>  	  child_lwp->status_pending_p = 0;
>  	  child_thr = get_lwp_thread (child_lwp);
>  	  child_thr->last_resume_kind = resume_stop;
> +	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;
> +
> +	  /* If we're suspending all threads, leave this one suspended
> +	     too.  */
> +	  if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
> +	    child_lwp->suspended = 1;
> +
>  	  parent_proc = get_thread_process (event_thr);
>  	  child_proc->attached = parent_proc->attached;
>  	  clone_all_breakpoints (&child_proc->breakpoints,
> diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> index f44dd76..80d2464 100644
> --- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> +++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> @@ -48,13 +48,43 @@ gdb_test_multiple $test $test {
>      }
>  }
>  
> +# gdbserver had a bug that resulted in reporting the fork child's
> +# initial stop to gdb, which gdb does not expect, in turn resulting in
> +# a broken session, like:
> +#
> +#  [Thread 31536.31536] #16 stopped.                                   <== BAD
> +#  [New Thread 31547.31547]
> +#  [Inferior 10 (process 31536) exited normally]
> +#  [New Thread 31547.31560]
> +#
> +#  [Thread 31547.31547] #18 stopped.                                   <== BAD
> +#  Cannot remove breakpoints because program is no longer writable.    <== BAD
> +#  Further execution is probably impossible.                           <== BAD
> +#  [Inferior 11 (process 31547) exited normally]
> +#  [Inferior 1 (process 31454) exited normally]
> +#
> +# These variables track whether we see such broken behavior.
> +set saw_cannot_remove_breakpoints 0
> +set saw_thread_stopped 0
> +
>  set test "reached breakpoint"
>  gdb_test_multiple "" $test {
> +    -re "Cannot remove breakpoints" {
> +	set saw_cannot_remove_breakpoints 1
> +	exp_continue
> +    }
> +    -re "Thread \[^\r\n\]+ stopped\\." {
> +	set saw_thread_stopped 1
> +	exp_continue
> +    }
>      -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
>  	pass $test
>      }
>  }
>  
> +gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints"
> +gdb_assert !$saw_thread_stopped "no spurious thread stop"
> +
>  gdb_test "info threads" "No threads\." \
>      "no threads left"

I tried it and it works as expected.  If you try the same test program in all-stop
though, fork childs are left stopped.  Is it expected?  I am not sure how forking
interacts with all-stop.

-----------------------

$ ./gdb -q -nx -ex "set detach-on-fork off"  testsuite/gdb.threads/fork-plus-threads
Reading symbols from testsuite/gdb.threads/fork-plus-threads...done.
(gdb) r &
Starting program: /home/emaisin/src/binutils-gdb/gdb/testsuite/gdb.threads/fork-plus-threads
(gdb) [Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5304]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5305]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5306]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5307]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5308]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5309]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5310]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5311]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5312]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 5313]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
i th
  Id   Target Id         Frame
  11   process 5313 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  10   process 5312 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  9    process 5311 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  8    process 5310 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  7    process 5309 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  6    process 5308 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  5    process 5307 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  4    process 5306 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  3    process 5305 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
  2    process 5304 "fork-plus-threa" 0x00007ffff78b8025 in fork () from /lib/x86_64-linux-gnu/libc.so.6
* 1    Thread 0x7ffff7fc9740 (LWP 5300) "fork-plus-threa" (running)
  
Pedro Alves July 24, 2015, 6:17 p.m. UTC | #2
On 07/24/2015 07:05 PM, Simon Marchi wrote:

> I tried it and it works as expected.  If you try the same test program in all-stop
> though, fork childs are left stopped.  Is it expected?  I am not sure how forking
> interacts with all-stop.

Yeah, in all-stop, you need "set schedule-multiple on" to let all processes run.
That seems to trip on more breakage:

...
[Thread 0x7ffff57bc700 (LWP 11703) exited]
[Thread 0x7ffff7fc1700 (LWP 11700) exited]
[New Thread 0x7ffff77c0700 (LWP 11710)]
[New Thread 0x7ffff67be700 (LWP 11709)]
[New Thread 0x7ffff57bc700 (LWP 11711)]
[New Thread 0x7ffff4fbb700 (LWP 11712)]
[New Thread 0x7ffff3fb9700 (LWP 11713)]
[New process 11702]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread 0x7ffff3fb9700 (LWP 11713) exited]
[Thread 0x7ffff4fbb700 (LWP 11712) exited]
[Thread 0x7ffff57bc700 (LWP 11711) exited]
[Thread 0x7ffff77c0700 (LWP 11710) exited]
[Thread 0x7ffff67be700 (LWP 11709) exited]
[New Thread 0x7ffff7fc1700 (LWP 11714)]
[New Thread 0x7ffff6fbf700 (LWP 11716)]
[New Thread 0x7ffff5fbd700 (LWP 11715)]
[New Thread 0x7ffff37b8700 (LWP 11717)]
[Inferior 3 (process 11634) exited normally]
Cannot find new threads: generic error
(gdb) info threads
Cannot find new threads: generic error
(gdb) info threads
Cannot find new threads: generic error
(gdb) q

ISTR that's not a new bug, but I haven't tried older releases.

Thanks,
Pedro Alves
  
Don Breazeal July 24, 2015, 6:43 p.m. UTC | #3
On 7/23/2015 11:21 AM, Pedro Alves wrote:
> So I managed to extract out this smaller patch from the
> gdbserver fixes I mentioned.  I think this one looks safe enough
> for 7.10.  WDYT?
> 
> -----------
> From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 23 Jul 2015 18:49:51 +0100
> Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode
> 
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
> 
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
> 
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
>  continue &
>  Continuing.
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
>  [New Thread 28092.28092]
> 
>  [Thread 28092.28092] #2 stopped.
>  [New Thread 28094.28094]
>  [Inferior 2 (process 28092) exited normally]
>  [New Thread 28094.28105]
>  [New Thread 28094.28109]
> 
> ...
> 
> [Thread 28174.28174] #18 stopped.
>  [New Thread 28185.28185]
>  [Inferior 10 (process 28174) exited normally]
>  [New Thread 28185.28196]
> 
>  [Thread 28185.28185] #20 stopped.
>  Cannot remove breakpoints because program is no longer writable.
>  Further execution is probably impossible.
>  [Inferior 11 (process 28185) exited normally]
>  [Inferior 1 (process 28091) exited normally]
>  PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
>  info threads
>  No threads.
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
>  info inferiors
>    Num  Description       Executable
>  * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
>  (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left
> 
> All the "[Thread FOO] #NN stopped." above are bogus, as well as the
> "Cannot remove breakpoints because program is no longer writable.",
> which is a consequence.
> 
> The problem is that when we intercept a fork event, we should report
> the event for the parent, only, and leave the child stopped, but not
> report its stop event.  GDB later decides whether to follow the parent
> or the child.  But because handle_extended_wait does not set the
> child's last_status.kind to TARGET_WAITKIND_STOPPED, a
> stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
> memory) by mistake ends up queueing a SIGSTOP on the child, resuming
> it, and then when that SIGSTOP is intercepted, because the LWP has
> last_resume_kind set to resume_stop, gdbserver reports the stop to
> GDB, as GDB_SIGNAL_0:
> 
> ...
>  >>>> entering unstop_all_lwps
>  unstopping all lwps
>  proceed_one_lwp: lwp 1600
>     client wants LWP to remain 1600 stopped
>  proceed_one_lwp: lwp 1828
>  Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  Sending sigstop to lwp 1828
>  pc is 0x3615ebc7cc
>  Resuming lwp 1828 (continue, signal 0, stop expected)
>    continue from pc 0x3615ebc7cc
>  unstop_all_lwps done
>  sigchld_handler
>  <<<< exiting unstop_all_lwps
>  handling possible target event
>  >>>> entering linux_wait_1
>  linux_wait_1: [<all threads>]
>  my_waitpid (-1, 0x40000001)
>  my_waitpid (-1, 0x1): status(137f), 1828
>  LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
>  LLW: waitpid 1828 received Stopped (signal) (stopped)
>  pc is 0x3615ebc7cc
>  Expected stop.
>  LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ...
>  linux_wait_1 ret = LWP 1828.1828, 1, 0
>  <<<< exiting linux_wait_1
>  Writing resume reply for LWP 1828.1828:1
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> By inspection, I also noticed that we miss leaving the child with the
> suspend count incremented if stopping threads, like we do for clone
> threads.
> 
> Tested on x86_64 Fedora 20, extended-remote.
> 
> gdb/gdbserver/ChangeLog:
> 2015-07-23  Pedro Alves  <palves@redhat.com>
> 
> 	* linux-low.c (handle_extended_wait): Set the child's last
> 	reported status to TARGET_WAITKIND_STOPPED.
> ---
>  gdb/gdbserver/linux-low.c                       |  7 ++++++
>  gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 17b2a51..56a33ff 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
>  	  child_lwp->status_pending_p = 0;
>  	  child_thr = get_lwp_thread (child_lwp);
>  	  child_thr->last_resume_kind = resume_stop;
> +	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;

This makes perfect sense to me.

> +
> +	  /* If we're suspending all threads, leave this one suspended
> +	     too.  */
> +	  if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
> +	    child_lwp->suspended = 1;

I have a question about this.  In the definition of struct lwp_info in
linux-low.h, it has this comment:

  /* When this is true, we shall not try to resume this thread, even
     if last_resume_kind isn't resume_stop.  */
  int suspended;

Since we are setting last_resume_kind to resume_stop here, is this
unnecessary?

Thanks,
--Don

> +
>  	  parent_proc = get_thread_process (event_thr);
>  	  child_proc->attached = parent_proc->attached;
>  	  clone_all_breakpoints (&child_proc->breakpoints,
> diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> index f44dd76..80d2464 100644
> --- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> +++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> @@ -48,13 +48,43 @@ gdb_test_multiple $test $test {
>      }
>  }
>  
> +# gdbserver had a bug that resulted in reporting the fork child's
> +# initial stop to gdb, which gdb does not expect, in turn resulting in
> +# a broken session, like:
> +#
> +#  [Thread 31536.31536] #16 stopped.                                   <== BAD
> +#  [New Thread 31547.31547]
> +#  [Inferior 10 (process 31536) exited normally]
> +#  [New Thread 31547.31560]
> +#
> +#  [Thread 31547.31547] #18 stopped.                                   <== BAD
> +#  Cannot remove breakpoints because program is no longer writable.    <== BAD
> +#  Further execution is probably impossible.                           <== BAD
> +#  [Inferior 11 (process 31547) exited normally]
> +#  [Inferior 1 (process 31454) exited normally]
> +#
> +# These variables track whether we see such broken behavior.
> +set saw_cannot_remove_breakpoints 0
> +set saw_thread_stopped 0
> +
>  set test "reached breakpoint"
>  gdb_test_multiple "" $test {
> +    -re "Cannot remove breakpoints" {
> +	set saw_cannot_remove_breakpoints 1
> +	exp_continue
> +    }
> +    -re "Thread \[^\r\n\]+ stopped\\." {
> +	set saw_thread_stopped 1
> +	exp_continue
> +    }
>      -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
>  	pass $test
>      }
>  }
>  
> +gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints"
> +gdb_assert !$saw_thread_stopped "no spurious thread stop"
> +
>  gdb_test "info threads" "No threads\." \
>      "no threads left"
>  
>
  
Pedro Alves July 29, 2015, 1:21 p.m. UTC | #4
Hi Don,

Sorry for the delay.

On 07/24/2015 07:43 PM, Don Breazeal wrote:

>> index 17b2a51..56a33ff 100644
>> --- a/gdb/gdbserver/linux-low.c
>> +++ b/gdb/gdbserver/linux-low.c
>> @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
>>  	  child_lwp->status_pending_p = 0;
>>  	  child_thr = get_lwp_thread (child_lwp);
>>  	  child_thr->last_resume_kind = resume_stop;
>> +	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;
> 
> This makes perfect sense to me.
> 

Great.

>> +
>> +	  /* If we're suspending all threads, leave this one suspended
>> +	     too.  */
>> +	  if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
>> +	    child_lwp->suspended = 1;
> 
> I have a question about this.  In the definition of struct lwp_info in
> linux-low.h, it has this comment:
> 
>   /* When this is true, we shall not try to resume this thread, even
>      if last_resume_kind isn't resume_stop.  */
>   int suspended;
> 
> Since we are setting last_resume_kind to resume_stop here, is this
> unnecessary?

We still need it, because otherwise we'd decrement the suspend count
below 0:

static int
unsuspend_and_proceed_one_lwp (struct inferior_list_entry *entry, void *except)
{
  struct thread_info *thread = (struct thread_info *) entry;
  struct lwp_info *lwp = get_thread_lwp (thread);

  if (lwp == except)
    return 0;

  lwp->suspended--;
  gdb_assert (lwp->suspended >= 0);

  return proceed_one_lwp (entry, except);
}


It's proceed_one_lwp that skips resuming if the client wants the
lwp stopped:

static int
proceed_one_lwp (struct inferior_list_entry *entry, void *except)
{
...
  if (thread->last_resume_kind == resume_stop
      && thread->last_status.kind != TARGET_WAITKIND_IGNORE)
    {
      if (debug_threads)
	debug_printf ("   client wants LWP to remain %ld stopped\n",
		      lwpid_of (thread));
      return 0;
    }




I tried writing a test for this, by making a multithreaded program
have all its threads but the main continuously fork (see attached), while
the main thread continuously steps over a breakpoint (a conditional
breakpoint with condition "0" should do it, as gdbserver handles
that breakpoint itself), but that stumbles on yet more problems...  :-/

$ ./gdb ./testsuite/gdb.threads/fork-plus-threads-2 -ex "set non-stop on" -ex "set detach-on-fork off" -ex "tar extended-rem :9999"
...
Remote debugging using :9999
(gdb)
[Thread 24971.24971] #1 stopped.
0x0000003615a011f0 in ?? ()
c&
Continuing.
(gdb) [New Thread 24971.24981]
[New Thread 24983.24983]
[New Thread 24971.24982]

[Thread 24983.24983] #3 stopped.
0x0000003615ebc7cc in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:130
130       pid = ARCH_FORK ();
[New Thread 24984.24984]
Error in re-setting breakpoint -16: PC register is not available
Error in re-setting breakpoint -17: PC register is not available
Error in re-setting breakpoint -18: PC register is not available
Error in re-setting breakpoint -19: PC register is not available
Error in re-setting breakpoint -24: PC register is not available
Error in re-setting breakpoint -25: PC register is not available
Error in re-setting breakpoint -26: PC register is not available
Error in re-setting breakpoint -27: PC register is not available
Error in re-setting breakpoint -28: PC register is not available
Error in re-setting breakpoint -29: PC register is not available
Error in re-setting breakpoint -30: PC register is not available
PC register is not available
(gdb)

>>  set test "reached breakpoint"

BTW, I noticed that this test message is stale from my previous attempt
at running to a breakpoint instead of to exit.  I changed it to:

 set test "inferior 1 exited"

in patch 1/2.

>>  gdb_test_multiple "" $test {
>> +    -re "Cannot remove breakpoints" {
>> +	set saw_cannot_remove_breakpoints 1
>> +	exp_continue
>> +    }
>> +    -re "Thread \[^\r\n\]+ stopped\\." {
>> +	set saw_thread_stopped 1
>> +	exp_continue
>> +    }
>>      -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
>>  	pass $test
>>      }
>>  }

Thanks,
Pedro Alves
/* This testcase is part of GDB, the GNU debugger.

   Copyright 2015 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

#include <assert.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdlib.h>

/* Number of threads.  Each thread continuously spawns a fork and wait
   for it.  If we have another thread continuously start a step over,
   gdbserver should end up finding new forks while suspending
   threads.  */
#define NTHREADS 10

pthread_t threads[NTHREADS];

static void *
thread_func (void *arg)
{
  while (1)
    {
      pid_t pid;

      pid = fork ();

      if (pid > 0)
	{
	  int status;

	  /* Parent.  */
	  pid = waitpid (pid, &status, 0);
	  if (pid == -1)
	    {
	      perror ("wait");
	      exit (1);
	    }

	  if (!WIFEXITED (status))
	    {
	      printf ("Unexpected wait status 0x%x from child %d\n",
		      status, pid);
	    }
	}
      else if (pid == 0)
	{
	  /* Child.  */
	  exit (0);
	}
      else
	{
	  perror ("fork");
	  exit (1);
	}
    }
}

int
main (void)
{
  int i;
  int ret;

  for (i = 0; i < NTHREADS; i++)
    {
      ret = pthread_create (&threads[i], NULL, thread_func, NULL);
      assert (ret == 0);
    }

  for (i = 0; i < NTHREADS; i++)
    {
      ret = pthread_join (threads[i], NULL);
      assert (ret == 0);
    }

  /* Don't run forever.  */
  sleep (180);

  return 0;
}
  
Pedro Alves July 29, 2015, 1:38 p.m. UTC | #5
On 07/29/2015 02:21 PM, Pedro Alves wrote:

> 
> I tried writing a test for this, by making a multithreaded program
> have all its threads but the main continuously fork (see attached), while
> the main thread continuously steps over a breakpoint (a conditional
> breakpoint with condition "0" should do it, as gdbserver handles
> that breakpoint itself), but that stumbles on yet more problems...  :-/
> 
> $ ./gdb ./testsuite/gdb.threads/fork-plus-threads-2 -ex "set non-stop on" -ex "set detach-on-fork off" -ex "tar extended-rem :9999"
> ...
> Remote debugging using :9999
> (gdb)
> [Thread 24971.24971] #1 stopped.
> 0x0000003615a011f0 in ?? ()
> c&
> Continuing.
> (gdb) [New Thread 24971.24981]
> [New Thread 24983.24983]
> [New Thread 24971.24982]
> 
> [Thread 24983.24983] #3 stopped.
> 0x0000003615ebc7cc in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:130
> 130       pid = ARCH_FORK ();
> [New Thread 24984.24984]
> Error in re-setting breakpoint -16: PC register is not available
> Error in re-setting breakpoint -17: PC register is not available
> Error in re-setting breakpoint -18: PC register is not available
> Error in re-setting breakpoint -19: PC register is not available
> Error in re-setting breakpoint -24: PC register is not available
> Error in re-setting breakpoint -25: PC register is not available
> Error in re-setting breakpoint -26: PC register is not available
> Error in re-setting breakpoint -27: PC register is not available
> Error in re-setting breakpoint -28: PC register is not available
> Error in re-setting breakpoint -29: PC register is not available
> Error in re-setting breakpoint -30: PC register is not available
> PC register is not available
> (gdb)
> 

Hmm, gdbserver's logs (for a different run) show:

...
HEW: Got clone event from LWP 25962, new child is LWP 25989
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x1): status(137f), 25990
LWFE: waitpid(-1, ...) returned 25990, ERRNO-OK
LLW: waitpid 25990 received Stopped (signal) (stopped)
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x1): status(0), 25988
LWFE: waitpid(-1, ...) returned 25988, ERRNO-OK
LLW: waitpid 25988 received 0 (exited)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
LLFE: 25988 exited.
^^^^^^^^^^^^^^^^^^^
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x80000001): status(1057f), 25973
LWFE: waitpid(-1, ...) returned 25973, ERRNO-OK
LLW: waitpid 25973 received Trace/breakpoint trap (stopped)
pc is 0x3615ebc7cc
HEW: Got fork event from LWP 25973, new child is 25990
pc is 0x3615ebc7cc
pc is 0x3615ebc7cc
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x80000001): status(117f), 25972
LWFE: waitpid(-1, ...) returned 25972, ERRNO-OK
LLW: waitpid 25972 received Child exited (stopped)
pc is 0x3616a0f279
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x80000001): status(117f), 0
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: resuming stopped-resumed LWP LWP 25962.25962 at 3615ef4ce1: step=0
pc is 0x3615ef4ce1
Resuming lwp 25962 (continue, signal 0, stop not expected)
  continue from pc 0x3615ef4ce1
RSRL: resuming stopped-resumed LWP LWP 25962.25989 at 0: step=0
pc is 0x3615ef4ce1
Resuming lwp 25989 (continue, signal 0, stop not expected)
  continue from pc 0x3615ef4ce1
sigchld_handler
Ignored signal 17 for LWP 25972.
pc is 0x3616a0f279
Resuming lwp 25972 (continue, signal 17, stop not expected)
  continue from pc 0x3616a0f279
handling possible target event
>>>> entering linux_wait_1
linux_wait_1: [<all threads>]
Got a pending child 25973
Got an event from pending child 25973 (1057f)
Hit a non-gdbserver trap event.
SEL: Found 2 SIGTRAP events, selecting #1
linux_wait_1 ret = LWP 25988.25988, 1, 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  ("1" is TARGET_WAITKIND_STOPPED)
<<<< exiting linux_wait_1
Writing resume reply for LWP 25988.25988:1
ptrace(regsets_fetch_inferior_registers) PID=25988: No such process
ptrace(regsets_fetch_inferior_registers) PID=25988: No such process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ignore the "SIGTRAP" mention in "SEL: Found 2 SIGTRAP events",
it's "two events".  And the one that was picked was a process
exit.  But the tail end of linux_wait_1 isn't expecting that
can happen.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 17b2a51..56a33ff 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -488,6 +488,13 @@  handle_extended_wait (struct lwp_info *event_lwp, int wstat)
 	  child_lwp->status_pending_p = 0;
 	  child_thr = get_lwp_thread (child_lwp);
 	  child_thr->last_resume_kind = resume_stop;
+	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;
+
+	  /* If we're suspending all threads, leave this one suspended
+	     too.  */
+	  if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
+	    child_lwp->suspended = 1;
+
 	  parent_proc = get_thread_process (event_thr);
 	  child_proc->attached = parent_proc->attached;
 	  clone_all_breakpoints (&child_proc->breakpoints,
diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
index f44dd76..80d2464 100644
--- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
+++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
@@ -48,13 +48,43 @@  gdb_test_multiple $test $test {
     }
 }
 
+# gdbserver had a bug that resulted in reporting the fork child's
+# initial stop to gdb, which gdb does not expect, in turn resulting in
+# a broken session, like:
+#
+#  [Thread 31536.31536] #16 stopped.                                   <== BAD
+#  [New Thread 31547.31547]
+#  [Inferior 10 (process 31536) exited normally]
+#  [New Thread 31547.31560]
+#
+#  [Thread 31547.31547] #18 stopped.                                   <== BAD
+#  Cannot remove breakpoints because program is no longer writable.    <== BAD
+#  Further execution is probably impossible.                           <== BAD
+#  [Inferior 11 (process 31547) exited normally]
+#  [Inferior 1 (process 31454) exited normally]
+#
+# These variables track whether we see such broken behavior.
+set saw_cannot_remove_breakpoints 0
+set saw_thread_stopped 0
+
 set test "reached breakpoint"
 gdb_test_multiple "" $test {
+    -re "Cannot remove breakpoints" {
+	set saw_cannot_remove_breakpoints 1
+	exp_continue
+    }
+    -re "Thread \[^\r\n\]+ stopped\\." {
+	set saw_thread_stopped 1
+	exp_continue
+    }
     -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
 	pass $test
     }
 }
 
+gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints"
+gdb_assert !$saw_thread_stopped "no spurious thread stop"
+
 gdb_test "info threads" "No threads\." \
     "no threads left"