Should this be on the blocker list for the 7.10 release?

Message ID 559C268E.4050706@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves July 7, 2015, 7:20 p.m. UTC
  On 07/07/2015 08:04 PM, Pedro Alves wrote:

> ...
> WL: waitpid Thread 0x7ffff7fc2740 (LWP 9513) received Trace/breakpoint trap (stopped)
> WL: Handling extended status 0x03057f
> LHEW: Got clone event from LWP 9513, new child is LWP 9579
> [New Thread 0x7ffff37b8700 (LWP 9579)]
> WL: waitpid Thread 0x7ffff7fc2740 (LWP 9508) received 0 (exited)
> WL: Thread 0x7ffff7fc2740 (LWP 9508) exited.
>                            ^^^^^^^^
> [Thread 0x7ffff7fc2740 (LWP 9508) exited]
> WL: waitpid Thread 0x7ffff7fc2740 (LWP 9499) received 0 (exited)
> WL: Thread 0x7ffff7fc2740 (LWP 9499) exited.
> [Thread 0x7ffff7fc2740 (LWP 9499) exited]
> RSRL: resuming stopped-resumed LWP Thread 0x7ffff37b8700 (LWP 9579) at 0x3615ef4ce1: step=0
> ...
> (gdb) info inferiors
>   Num  Description       Executable
>   5    process 9508      /home/pedro/bugs/src/test
>                ^^^^
>   4    process 9503      /home/pedro/bugs/src/test
>   3    process 9500      /home/pedro/bugs/src/test
>   2    process 9499      /home/pedro/bugs/src/test
> * 1    <null>            /home/pedro/bugs/src/test
> (gdb)
> ...
> 
> Note the "Thread 0x7ffff7fc2740 (LWP 9508) exited." line.
> That's this in wait_lwp:
> 
>       /* Check if the thread has exited.  */
>       if (WIFEXITED (status) || WIFSIGNALED (status))
> 	{
> 	  thread_dead = 1;
> 	  if (debug_linux_nat)
> 	    fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n",
> 				target_pid_to_str (lp->ptid));
> 	}
>     }
> 
> This code doesn't understand that an WIFEXITED status
> of the last lwp of the process should be reported as
> process exit.  Haven't tried to fix it yet.

This one (on top of the other) fixes this for me.  No
testsuite regressions on x86_64 F20.

-----
From 1290101d792c0e1d8c4e202cd7d900837db0ee84 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 7 Jul 2015 19:50:38 +0100
Subject: [PATCH] missing exit event

---
 gdb/linux-nat.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
  

Comments

Simon Marchi July 7, 2015, 8:38 p.m. UTC | #1
On 15-07-07 03:20 PM, Pedro Alves wrote:
> This one (on top of the other) fixes this for me.  No
> testsuite regressions on x86_64 F20.
> 
> -----
> From 1290101d792c0e1d8c4e202cd7d900837db0ee84 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Tue, 7 Jul 2015 19:50:38 +0100
> Subject: [PATCH] missing exit event
> 
> ---
>  gdb/linux-nat.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index ea38ebb..281a270 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2275,6 +2275,20 @@ wait_lwp (struct lwp_info *lp)
>        /* Check if the thread has exited.  */
>        if (WIFEXITED (status) || WIFSIGNALED (status))
>  	{
> +	  if (ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
> +	    {
> +	      if (debug_linux_nat)
> +		fprintf_unfiltered (gdb_stdlog, "WL: Process %d exited.\n",
> +				    ptid_get_pid (lp->ptid));
> +
> +	      /* This is the leader exiting, it means the whole
> +		 process is gone.  Store the status to report to the
> +		 core.  Store it in the lp->waitstatus, because
> +		 W_EXITCODE(0,0) == 0.  */
> +	      store_waitstatus (&lp->waitstatus, status);
> +	      return 0;
> +	    }
> +
>  	  thread_dead = 1;
>  	  if (debug_linux_nat)
>  	    fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n",
> 

Indeed it looks good.  I'll work on the test.
  
Simon Marchi July 7, 2015, 10:04 p.m. UTC | #2
On 15-07-07 03:20 PM, Pedro Alves wrote:
> This one (on top of the other) fixes this for me.  No
> testsuite regressions on x86_64 F20.
> 
> -----
> From 1290101d792c0e1d8c4e202cd7d900837db0ee84 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Tue, 7 Jul 2015 19:50:38 +0100
> Subject: [PATCH] missing exit event
> 
> ---
>  gdb/linux-nat.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index ea38ebb..281a270 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2275,6 +2275,20 @@ wait_lwp (struct lwp_info *lp)
>        /* Check if the thread has exited.  */
>        if (WIFEXITED (status) || WIFSIGNALED (status))
>  	{
> +	  if (ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
> +	    {
> +	      if (debug_linux_nat)
> +		fprintf_unfiltered (gdb_stdlog, "WL: Process %d exited.\n",
> +				    ptid_get_pid (lp->ptid));
> +
> +	      /* This is the leader exiting, it means the whole
> +		 process is gone.  Store the status to report to the
> +		 core.  Store it in the lp->waitstatus, because
> +		 W_EXITCODE(0,0) == 0.  */
> +	      store_waitstatus (&lp->waitstatus, status);
> +	      return 0;
> +	    }
> +
>  	  thread_dead = 1;
>  	  if (debug_linux_nat)
>  	    fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n",
> 

Another question.  This is probably unrelated to the current issue, since
the behavior was the same in 7.9, but I am encountering it while writing
the test.  When running the program without the &, when should the prompt
come back?

See this transcript: http://pastebin.com/qyba8Ucd

We see that the prompt comes back the first time an inferior exits, well
before the end of the execution of inferior #1.  I would think that this
is not what we want.  If I run my program synchronously, then the inferior
forks and then the fork child exits, do we really want to bring back the
prompt at this point?  I would think that we should only show it when
the initial inferior that was ran exits (or if some other significant
event, such as a breakpoint hit, of course).

Would you agree?
  
Pedro Alves July 13, 2015, 7:20 p.m. UTC | #3
On 07/07/2015 11:04 PM, Simon Marchi wrote:

> Another question.  This is probably unrelated to the current issue, since
> the behavior was the same in 7.9, but I am encountering it while writing
> the test.  When running the program without the &, when should the prompt
> come back?

Yes, it's an issue currently that the prompt returns too eagerly.
It's "always" been that way though, not just in 7.9.

Your example was non-stop mode, but I'd say all-stop mode is even worse.
Consider the scenario of debugging the canonical multi-process
example (debug "make check" until some child process crashes).
Currently everything stops as soon as any child process exits...

> 
> See this transcript: http://pastebin.com/qyba8Ucd
> 
> We see that the prompt comes back the first time an inferior exits, well
> before the end of the execution of inferior #1.  I would think that this
> is not what we want.  If I run my program synchronously, then the inferior
> forks and then the fork child exits, do we really want to bring back the
> prompt at this point?  I would think that we should only show it when
> the initial inferior that was ran exits (or if some other significant
> event, such as a breakpoint hit, of course).

Yeah, I think we need something like this.  But should the prompt
be shown if the parent exits but the child carries on?  Why?
How do you define which process is special?

I think we should think of "run" as "create process" + "continue".
Because, say, you do "start", followed by "continue", and
then the inferior forks, and then child exits.  It's the same
situation.

Or any execution command really.  E.g., what if you already have two
processes under control of the debugger, and step a thread of inferior 2,
while running all threads of all processes
free (all-stop + schedule-multi on), and inferior 1 exits while the
step is still in progress?

It's not even clear if the prompt should be returned when a breakpoint
hits.  Since I know you've played with my itsets branch, let me put
this in terms of that branch:

Say you do "itfocus it1.1 next".  While "next" is in progress, a thread
from some other inferior (that was already running) hits a breakpoint, and
that breakpoint's suspend set does _not_ stop thread 1.1.  It would seem
that it would be reasonable to say that the "next" really hasn't finished yet,
and thread 1.1 should continue stepping?  It would follow then that
the prompt shouldn't be returned to the user yet.

Maybe what we need is for synchronous commands to define
the set of threads they're waiting for.  In "itfocus it1.1 next",
it would wait until any thread in inferior 1 reports a stop.
In "itfocus tt1.1 next", it would wait until thread 1.1 reports
a stop.  In "itfocus at1.1 next", it would wait until any thread
reports a stop.  Etc.
Since you can only run one synchronous command at a time,
the set of what to wait before the prompt is given back to the
user would be a property of the interpreter, not the thread.

> 
> Would you agree?
> 

I agree, though the devil is in the details.  I'd rather not
try to fix this for 7.10.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index ea38ebb..281a270 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2275,6 +2275,20 @@  wait_lwp (struct lwp_info *lp)
       /* Check if the thread has exited.  */
       if (WIFEXITED (status) || WIFSIGNALED (status))
 	{
+	  if (ptid_get_pid (lp->ptid) == ptid_get_lwp (lp->ptid))
+	    {
+	      if (debug_linux_nat)
+		fprintf_unfiltered (gdb_stdlog, "WL: Process %d exited.\n",
+				    ptid_get_pid (lp->ptid));
+
+	      /* This is the leader exiting, it means the whole
+		 process is gone.  Store the status to report to the
+		 core.  Store it in the lp->waitstatus, because
+		 W_EXITCODE(0,0) == 0.  */
+	      store_waitstatus (&lp->waitstatus, status);
+	      return 0;
+	    }
+
 	  thread_dead = 1;
 	  if (debug_linux_nat)
 	    fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n",