[review] infrun: mark an exited thread non-executing when attempting to stop

Message ID gerrit.1571405222000.I7cec98f40283773b79255d998511da434e9cd408@gnutoolchain-gerrit.osci.io
State New, archived
Headers

Commit Message

Simon Marchi (Code Review) Oct. 18, 2019, 1:27 p.m. UTC
  Tankut Baris Aktemur has uploaded a new change for review.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................

infrun: mark an exited thread non-executing when attempting to stop

In stop_all_threads, GDB sends signals to other threads in an attempt
to stop them.  While in a typical scenario the expected wait status is
TARGET_WAITKIND_STOPPED, it is possible that the thread GDB attempted
to stop has already terminated.  If so, a waitstatus other than
TARGET_WAITKIND_STOPPED would be received.  In that case, mark the
thread as not-executing and set its state to THREAD_EXITED.

If a wait status that denotes thread termination is ignored, GDB goes
into an infinite loop in stop_all_threads.
E.g.:

~~~
$ gdb ./a.out
(gdb) start
...
(gdb) add-inferior -exec ./a.out
...
(gdb) inferior 2
...
(gdb) start
...
(gdb) set schedule-multiple on
(gdb) set debug infrun 2
(gdb) continue
Continuing.
infrun: clear_proceed_status_thread (process 23419)
infrun: clear_proceed_status_thread (process 23703)
infrun: proceed (addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT)
infrun: proceed: resuming process 23419
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [process 23419] at 0x55555555514e
infrun: infrun_async(1)
infrun: prepare_to_wait
infrun: proceed: resuming process 23703
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [process 23703] at 0x55555555514e
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   23703.23703.0 [process 23703],
infrun:   status->kind = exited, status = 42
infrun: handle_inferior_event status->kind = exited, status = 42
[Inferior 2 (process 23703) exited with code 052]
infrun: stop_waiting
infrun: stop_all_threads
infrun: stop_all_threads, pass=0, iterations=0
infrun:   process 23419 executing, need stop
infrun: target_wait (-1.0.0, status) =
infrun:   23419.23419.0 [process 23419],
infrun:   status->kind = exited, status = 42
infrun: stop_all_threads status->kind = exited, status = 42 process 23419
infrun:   process 23419 executing, already stopping
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: stop_all_threads status->kind = no-resumed process -1
infrun:   process 23419 executing, already stopping
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: stop_all_threads status->kind = no-resumed process -1
infrun:   process 23419 executing, already stopping
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: stop_all_threads status->kind = no-resumed process -1
infrun:   process 23419 executing, already stopping
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: stop_all_threads status->kind = no-resumed process -1
infrun:   process 23419 executing, already stopping
~~~

And this polling goes on forever.  This patch prevents the infinite
looping behavior.

gdb/ChangeLog:
2019-10-18  Tankut Baris Aktemur  <tankut.baris.aktemur@intel.com>

	* infrun.c (stop_all_threads): Do NOT ignore
	TARGET_WAITKIND_NO_RESUMED, TARGET_WAITKIND_THREAD_EXITED,
	TARGET_WAITKIND_EXITED, TARGET_WAITKIND_SIGNALLED wait statuses
	received from threads we attempt to stop; mark the corresponding
	thread as THREAD_EXITED and not-executing.

Change-Id: I7cec98f40283773b79255d998511da434e9cd408
---
M gdb/infrun.c
1 file changed, 9 insertions(+), 0 deletions(-)
  

Comments

Simon Marchi (Code Review) Nov. 4, 2019, 8:35 a.m. UTC | #1
Tankut Baris Aktemur has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 1:

> Patch Set 1: Code-Review+1
> 
> This looks like an old oversight when handling the case of exited threads when we're attempting to stop all of them. Looking at older code, we used to have a message saying a thread had exited while we were stopping it, but it was removed by a cleanup.
> 
> The change looks good for me.

Thank you.  I noticed that the patch has a problem.  It leaves the exited inferior in an alive state with no threads.  It becomes not possible to re-run the program.  I will send a revision, together with updated tests.
  
Simon Marchi (Code Review) Dec. 4, 2019, 12:02 p.m. UTC | #2
Tankut Baris Aktemur has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

Kindly pinging.
Thanks.

-Baris
  
Simon Marchi (Code Review) Dec. 5, 2019, 7:25 p.m. UTC | #3
Pedro Alves has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

> Kindly pinging.
> Thanks.

Thanks.  I'm taking a look.

(FWIW, I'm skimmed this before and put it off because it seemingly didn't come with a testcase.
But today, while digging deeper, I see that this is actually part of a patch series, which _does_ include tests.  Thanks for that.  This is an instance of my annoyance with gerrit's lack of cover letter & threading.)
  
Simon Marchi (Code Review) Dec. 6, 2019, 4:19 p.m. UTC | #4
Pedro Alves has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

(1 comment)

| --- gdb/infrun.c
| +++ gdb/infrun.c
| @@ -4494,7 +4509,19 @@ stop_all_threads (void)
| +		  if (ws.kind == TARGET_WAITKIND_THREAD_EXITED)
| +		    delete_thread (t);
| +		  else
| +		    {
| +		      /* TARGET_WAITKIND_EXITED or
| +			 TARGET_WAITKIND_SIGNALLED.  */
| +		      /* Need to restore the context because
| +			 handle_inferior_exit switches it.  */
| +		      scoped_restore_current_pspace_and_thread restore;
| +		      handle_inferior_exit (event_ptid, ws);

PS2, Line 4518:

Sorry, but this doesn't look right.

We're inside stop_all_threads, processing some other event, and a
process exit event for one of the processes we're trying to stop comes
along, and this processes it right away.  Once the stop_all_threads
dance is done, we go back to handling the original event, and possibly
reporting a stop to the user.  Meanwhile, whatever state that was set
by handle_inferior_exit, like e.g. $_exitcode, is lost, or now
incorrect for the reported stop.  We also never present a stop on the
CLI for that "spurious" process exit.  Here:

 (gdb)
 continue
 Continuing.
 Executing on build: kill -9 29412 29417    (timeout = 300)
 spawn -ignore SIGHUP kill -9 29412 29417

 Program terminated with signal SIGKILL, Killed.
 The program no longer exists.
     <<<<<<<<<<< no user-visible stop / prompt here. <<<<<<<<<<<<<<<
 Program terminated with signal SIGKILL, Killed.
 The program no longer exists.
 (gdb) PASS: gdb.multi/multi-kill.exp: iteration 1: back to gdb prompt

The fix for this I think must be around leaving the
TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED
event pending, so that it is processed later when we're out of the
stop_all_threads loop and back to
dequeuing the next event.

gdb/linux-nat.c also has its own "stop all threads temporarily" logic,
and that does that -- leaves process exits pending.  See wait_lwp:

              /* If this is the leader exiting, it means the whole
                 process is gone.  Store the status to report to the
                 core.  Store it in lp->waitstatus, because lp->status
                 would be ambiguous (W_EXITCODE(0,0) == 0).  */
              store_waitstatus (&lp->waitstatus, status);
              return 0;

| +		    }
| +		}
|  	    }
|  	  else
|  	    {
|  	      thread_info *t = find_thread_ptid (event_ptid);
|  	      if (t == NULL)
|  		t = add_thread (event_ptid);
|
  
Simon Marchi (Code Review) Dec. 6, 2019, 5:41 p.m. UTC | #5
Tankut Baris Aktemur has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

(1 comment)

| --- gdb/infrun.c
| +++ gdb/infrun.c
| @@ -4494,7 +4509,19 @@ stop_all_threads (void)
| +		  if (ws.kind == TARGET_WAITKIND_THREAD_EXITED)
| +		    delete_thread (t);
| +		  else
| +		    {
| +		      /* TARGET_WAITKIND_EXITED or
| +			 TARGET_WAITKIND_SIGNALLED.  */
| +		      /* Need to restore the context because
| +			 handle_inferior_exit switches it.  */
| +		      scoped_restore_current_pspace_and_thread restore;
| +		      handle_inferior_exit (event_ptid, ws);

PS2, Line 4518:

Thanks for your comment.  I'll work on making the event pending and
then submit a revision.

| +		    }
| +		}
|  	    }
|  	  else
|  	    {
|  	      thread_info *t = find_thread_ptid (event_ptid);
|  	      if (t == NULL)
|  		t = add_thread (event_ptid);
|
  
Simon Marchi (Code Review) Dec. 9, 2019, 3:09 p.m. UTC | #6
Tankut Baris Aktemur has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

(1 comment)

> Patch Set 2:
> 
> (1 comment)

| --- gdb/infrun.c
| +++ gdb/infrun.c
| @@ -4494,7 +4509,19 @@ stop_all_threads (void)
| +		  if (ws.kind == TARGET_WAITKIND_THREAD_EXITED)
| +		    delete_thread (t);
| +		  else
| +		    {
| +		      /* TARGET_WAITKIND_EXITED or
| +			 TARGET_WAITKIND_SIGNALLED.  */
| +		      /* Need to restore the context because
| +			 handle_inferior_exit switches it.  */
| +		      scoped_restore_current_pspace_and_thread restore;
| +		      handle_inferior_exit (event_ptid, ws);

PS2, Line 4518:

> The fix for this I think must be around leaving the TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED
event pending, so that it is processed later when we're out of the
stop_all_threads loop and back to
dequeuing the next event.


I'd like to ask for your opinion on making the second exit event
pending.  One problem is, because the event has not been reported to
the user yet, the user still thinks that the inferior is alive.  So,
after getting the prompt because of the first exit event, they may be
tempted to do "info threads" or switch to the not-yet-reported-
inferior and inspect its state.  This triggers a query (e.g. of
registers) on the process that is already gone.  I tried the following
scenario with the current master branch (the patch that I proposed was
not applied):

~~~
$ gdb ./a.out
(gdb) maint set target-non-stop off
(gdb) start
...
(gdb) add-inferior -exec ./a.out
[New inferior 2]
Added inferior 2
...
(gdb) inferior 2
[Switching to inferior 2 [<null>] (/tmp/a.out)]
(gdb) start
...
(gdb) set schedule-multiple on
(gdb) c
Continuing.
[Inferior 2 (process 16331) exited normally]
(gdb) i inferiors
  Num  Description       Executable
  1    process 16137     /tmp/a.out
* 2    <null>            /tmp/a.out
(gdb) inferior 1
[Switching to inferior 1 [process 16137] (/tmp/a.out)]
[Switching to thread 1.1 (process 16137)]
Couldn't get registers: No such process.
(gdb) i threads
  Id   Target Id         Frame
Couldn't get registers: No such process.
(gdb) c
Continuing.
Couldn't get registers: No such process.
~~~

If I save the exit event in my patch as a pending event (and omit
'maint set target-non-stop off'), I get essentially the same problem.
What is the expected GDB behavior here?  Would it be alright to
actually print both exit events, followed by the gdb-prompt, where the
user can now query $_exitcode or $_exitsignal by switching between
inferiors, assuming those special variables are set correctly per
inferior?

| +		    }
| +		}
|  	    }
|  	  else
|  	    {
|  	      thread_info *t = find_thread_ptid (event_ptid);
|  	      if (t == NULL)
|  		t = add_thread (event_ptid);
|
  
Simon Marchi (Code Review) Jan. 8, 2020, 3:57 p.m. UTC | #7
Tankut Baris Aktemur has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

(1 comment)

| --- gdb/infrun.c
| +++ gdb/infrun.c
| @@ -4494,7 +4509,19 @@ stop_all_threads (void)
| +		  if (ws.kind == TARGET_WAITKIND_THREAD_EXITED)
| +		    delete_thread (t);
| +		  else
| +		    {
| +		      /* TARGET_WAITKIND_EXITED or
| +			 TARGET_WAITKIND_SIGNALLED.  */
| +		      /* Need to restore the context because
| +			 handle_inferior_exit switches it.  */
| +		      scoped_restore_current_pspace_and_thread restore;
| +		      handle_inferior_exit (event_ptid, ws);

PS2, Line 4518:

> > The fix for this I think must be around leaving the TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED
> event pending, so that it is processed later when we're out of the stop_all_threads loop and back to
> dequeuing the next event.
> 
> 
> I'd like to ask for your opinion on making the second exit event pending.
> One problem is, because the event has not been reported to the user yet,
> the user still thinks that the inferior is alive.  So, after getting the
> prompt because of the first exit event, they may be tempted to do "info
> threads" or switch to the not-yet-reported-inferior and inspect its state.  
> This triggers a query (e.g. of registers) on the process that is already 
> gone.  I tried the following scenario with the current master branch
> (the patch that I proposed was not applied):

Kindly pinging.

Thanks,

-Baris

| +		    }
| +		}
|  	    }
|  	  else
|  	    {
|  	      thread_info *t = find_thread_ptid (event_ptid);
|  	      if (t == NULL)
|  		t = add_thread (event_ptid);
|
  
Pedro Alves Jan. 9, 2020, 4:58 p.m. UTC | #8
Hi,

(I tried to reply via gerrit, but I couldn't find how to hit "Quote"
to quote/reply to this message of yours -- seems like via the web ui you
can only reply to the last comment in a thread, which in this case is a
ping, not the message I intended to quote...)

Sorry for the delay, and thanks for the ping.  I've been thinking on and off
about this.  Below you'll find my current thoughts.

On 12/9/19 3:09 PM, Tankut Baris Aktemur (Code Review) wrote:

> I'd like to ask for your opinion on making the second exit event
> pending.  One problem is, because the event has not been reported to
> the user yet, the user still thinks that the inferior is alive.  So,
> after getting the prompt because of the first exit event, they may be
> tempted to do "info threads" or switch to the not-yet-reported-
> inferior and inspect its state.  This triggers a query (e.g. of
> registers) on the process that is already gone.  I tried the following
> scenario with the current master branch (the patch that I proposed was
> not applied):
> 
> ~~~
> $ gdb ./a.out
> (gdb) maint set target-non-stop off
> (gdb) start
> ...
> (gdb) add-inferior -exec ./a.out
> [New inferior 2]
> Added inferior 2
> ...
> (gdb) inferior 2
> [Switching to inferior 2 [<null>] (/tmp/a.out)]
> (gdb) start
> ...
> (gdb) set schedule-multiple on
> (gdb) c
> Continuing.
> [Inferior 2 (process 16331) exited normally]
> (gdb) i inferiors
>   Num  Description       Executable
>   1    process 16137     /tmp/a.out
> * 2    <null>            /tmp/a.out
> (gdb) inferior 1
> [Switching to inferior 1 [process 16137] (/tmp/a.out)]
> [Switching to thread 1.1 (process 16137)]
> Couldn't get registers: No such process.
> (gdb) i threads
>   Id   Target Id         Frame
> Couldn't get registers: No such process.
> (gdb) c
> Continuing.
> Couldn't get registers: No such process.
> ~~~
> 
> If I save the exit event in my patch as a pending event (and omit
> 'maint set target-non-stop off'), I get essentially the same problem.
> What is the expected GDB behavior here?  Would it be alright to
> actually print both exit events, followed by the gdb-prompt, where the
> user can now query $_exitcode or $_exitsignal by switching between
> inferiors, assuming those special variables are set correctly per
> inferior?

I'm really not sure about that.  

As you've seen, this happens in true all-stop too, which can't report
multiple events at the same time, so I think from that angle alone,
GDB should cope better with it.

Plus, this can happen even if an inferior stopped for some other event
while at the same time some other inferior exits.  

Say, inferior 1 hits a breakpoint, and while stopping everything,
inferior 2 exits.  And GDB happens to report the breakpoint hit
first.  And now the user does "info threads" and sees the "No such process"
errors.

You could maybe think, that then maybe we should prioritize
inferior exits over breakpoint hits.  But then, what if inferior 1
stops for a breakpoint, gdb manages to stop all threads without
inferior 2 exiting, and then a SIGKILL is sent to the
supposedly-stopped inferior, from outside GDB?  

Or to make it even simpler, that SIGKILL use case can even
happen in single-inferior debugging.  

Or, in "set non-stop on" mode, the inferior is running and you so
"info threads" just between the process dying, and GDB getting
the SIGCHLD and collecting the ptrace event.

So I think that the state gets into where the inferior dies
before the inferior exit event is reported to the user is just
something that GDB needs to cope with well.

I.e., report the failures to read registers, in "info threads",
"print" etc., and importantly -- _be sure to not get into a state
where the user is stuck_.

The "not get stuck" part is where I think we should improve things.

Your example already shows where we need improvement, in the
the last "c".  A simplified version, using an external SIGKILL is:

 $ gdb --args /usr/bin/tail -f /dev/null
 GNU gdb (GDB) 10.0.50.20200106-git
 ...
 Program received signal SIGINT, Interrupt.
 0x00007ffff7b08881 in __GI___libc_read (fd=4, buf=0x555555766410, nbytes=26) at ../sysdeps/unix/sysv/linux/read.c:26
 26        return SYSCALL_CANCEL (read, fd, buf, nbytes);
 (gdb) info inferiors 
   Num  Description       Executable        
 * 1    process 9425      /usr/bin/tail     
 (gdb) shell kill -9 9425
 (gdb) flushregs 
 Register cache flushed.
 Couldn't get registers: No such process.
 (gdb) info threads
   Id   Target Id           Frame 
 Couldn't get registers: No such process.
 (gdb) c
 Continuing.
 Couldn't get registers: No such process.
 (gdb) 

This error comes from the regcache_read_pc call from
within infrun.c:proceed:

 ...
 #4  0x000000000097e04e in perror_with_name (string=0xb213e9 "Couldn't get registers") at /home/pedro/gdb/binutils-gdb/src/gdb/utils.c:612
 #5  0x0000000000452156 in amd64_linux_nat_target::fetch_registers (this=0x11a64b0 <the_amd64_linux_nat_target>, regcache=0x1eac160, regnum=16) at /home/pedro/gdb/binutils-gdb/src/gdb/amd64-linux-nat.c:225
 #6  0x0000000000901cc4 in target_fetch_registers (regcache=0x1eac160, regno=16) at /home/pedro/gdb/binutils-gdb/src/gdb/target.c:3427
 #7  0x0000000000829f94 in regcache::raw_update (this=0x1eac160, regnum=16) at /home/pedro/gdb/binutils-gdb/src/gdb/regcache.c:471
 #8  0x000000000082a039 in readable_regcache::raw_read (this=0x1eac160, regnum=16, buf=0x7fffffffcc00 "\302%") at /home/pedro/gdb/binutils-gdb/src/gdb/regcache.c:485
 #9  0x000000000082a371 in readable_regcache::cooked_read (this=0x1eac160, regnum=16, buf=0x7fffffffcc00 "\302%") at /home/pedro/gdb/binutils-gdb/src/gdb/regcache.c:577
 #10 0x000000000082eefd in readable_regcache::cooked_read<unsigned long, void> (this=0x1eac160, regnum=16, val=0x7fffffffcca8) at /home/pedro/gdb/binutils-gdb/src/gdb/regcache.c:664
 #11 0x000000000082a7d8 in regcache_cooked_read_unsigned (regcache=0x1eac160, regnum=16, val=0x7fffffffcca8) at /home/pedro/gdb/binutils-gdb/src/gdb/regcache.c:678
 #12 0x000000000082bcf5 in regcache_read_pc (regcache=0x1eac160) at /home/pedro/gdb/binutils-gdb/src/gdb/regcache.c:1182
 #13 0x00000000006e6f62 in proceed (addr=0xffffffffffffffff, siggnal=GDB_SIGNAL_DEFAULT) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:2855
 #14 0x00000000006d7af6 in continue_1 (all_threads=0) at /home/pedro/gdb/binutils-gdb/src/gdb/infcmd.c:804
 ...


I think PTRACE_EVENT_EXIT on Linux could help with at least some of
the use cases on Linux, but still, GDB should cope better on systems
that do not have that feature.

Thanks,
Pedro Alves
  
Simon Marchi (Code Review) Jan. 29, 2020, 4:19 p.m. UTC | #9
Tom de Vries has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

> I'd like to ask for your opinion on making the second exit event pending

Discussion continued here: https://sourceware.org/ml/gdb-patches/2020-01/msg00212.html .
  
Simon Marchi (Code Review) Jan. 30, 2020, 9:01 a.m. UTC | #10
Tom de Vries has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

FTR, I build patchset 2 and tested the example from PR25478, and hit an assert:
...
Thread 1 "gdb" hit Breakpoint 1, stop_all_threads () at /data/gdb_versions/devel/src/gdb/infrun.c:4395
4395      int iterations = 0;
(master-gdb) c
Continuing.
infrun: stop_all_threads
infrun: stop_all_threads, pass=0, iterations=0
infrun:   Thread 0x7ffff7fa6740 (LWP 32692) not executing
infrun:   Thread 0x7ffff77fe700 (LWP 32696) executing, need stop
infrun: target_wait (-1.0.0, status) =
infrun:   32692.32696.0 [Thread 0x7ffff77fe700 (LWP 32696)],
infrun:   status->kind = signalled, signal = GDB_SIGNAL_KILL
infrun: stop_all_threads status->kind = signalled, signal = GDB_SIGNAL_KILL Thread 0x7ffff77fe700 (LWP 32696)

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
infrun: stop_all_threads, pass=1, iterations=1
infrun: stop_all_threads done
thread.c:95: internal-error: thread_info* inferior_thread(): Assertion `tp' failed.
...
  
Simon Marchi (Code Review) Jan. 30, 2020, 9:10 a.m. UTC | #11
Tom de Vries has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

(1 comment)

| --- gdb/infrun.c
| +++ gdb/infrun.c
| @@ -4494,0 +4500,19 @@ stop_all_threads (void)
| +
| +		  /* Set the threads as non-executing to avoid infinitely
| +		     waiting for them to stop.  */
| +		  mark_non_executing_threads (event_ptid, ws);
| +
| +		  if (ws.kind == TARGET_WAITKIND_NO_RESUMED)
| +		    {
| +		      /* Do nothing.  Already marked the threads.  */
| +		    }
| +		  if (ws.kind == TARGET_WAITKIND_THREAD_EXITED)

PS2, Line 4509:

This doesn't look right. Did you mean to write:
 else if (ws.kind == TARGET_WAITKIND_THREAD_EXITED)
?

| +		    delete_thread (t);
| +		  else
| +		    {
| +		      /* TARGET_WAITKIND_EXITED or
| +			 TARGET_WAITKIND_SIGNALLED.  */
| +		      /* Need to restore the context because
| +			 handle_inferior_exit switches it.  */
| +		      scoped_restore_current_pspace_and_thread restore;
| +		      handle_inferior_exit (event_ptid, ws);
  
Simon Marchi (Code Review) Jan. 30, 2020, 4:30 p.m. UTC | #12
Tankut Baris Aktemur has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

(1 comment)

> Patch Set 2:
> 
> FTR, I build patchset 2 and tested the example from PR25478, and hit an assert:
> ...

I was not able to repeat the problematic scenario with the current master.  The multi-target feature seems to have changed the behavior.  I'll need to first look into whether I can reproduce the infinite loop.

| --- gdb/infrun.c
| +++ gdb/infrun.c
| @@ -4494,0 +4500,19 @@ stop_all_threads (void)
| +
| +		  /* Set the threads as non-executing to avoid infinitely
| +		     waiting for them to stop.  */
| +		  mark_non_executing_threads (event_ptid, ws);
| +
| +		  if (ws.kind == TARGET_WAITKIND_NO_RESUMED)
| +		    {
| +		      /* Do nothing.  Already marked the threads.  */
| +		    }
| +		  if (ws.kind == TARGET_WAITKIND_THREAD_EXITED)

PS2, Line 4509:

Oops, yes.  Thanks for spotting that.

| +		    delete_thread (t);
| +		  else
| +		    {
| +		      /* TARGET_WAITKIND_EXITED or
| +			 TARGET_WAITKIND_SIGNALLED.  */
| +		      /* Need to restore the context because
| +			 handle_inferior_exit switches it.  */
| +		      scoped_restore_current_pspace_and_thread restore;
| +		      handle_inferior_exit (event_ptid, ws);
  
Simon Marchi (Code Review) Jan. 30, 2020, 5:52 p.m. UTC | #13
Tom de Vries has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

> Patch Set 2:
> 
> (1 comment)
> 
> > Patch Set 2:
> > 
> > FTR, I build patchset 2 and tested the example from PR25478, and hit an assert:
> > ...
> 
> I was not able to repeat the problematic scenario with the current master. 

Do you mean with 'problematic scenario' PR25478 as such, or the assert?
  
Simon Marchi (Code Review) Feb. 3, 2020, 3:13 p.m. UTC | #14
Tom de Vries has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

> Patch Set 2:
> 
> FTR, I build patchset 2 and tested the example from PR25478, and hit an assert:
> ...
> Thread 1 "gdb" hit Breakpoint 1, stop_all_threads () at /data/gdb_versions/devel/src/gdb/infrun.c:4395
> 4395      int iterations = 0;
> (master-gdb) c
> Continuing.
> infrun: stop_all_threads
> infrun: stop_all_threads, pass=0, iterations=0
> infrun:   Thread 0x7ffff7fa6740 (LWP 32692) not executing
> infrun:   Thread 0x7ffff77fe700 (LWP 32696) executing, need stop
> infrun: target_wait (-1.0.0, status) =
> infrun:   32692.32696.0 [Thread 0x7ffff77fe700 (LWP 32696)],
> infrun:   status->kind = signalled, signal = GDB_SIGNAL_KILL
> infrun: stop_all_threads status->kind = signalled, signal = GDB_SIGNAL_KILL Thread 0x7ffff77fe700 (LWP 32696)
> 
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> infrun: stop_all_threads, pass=1, iterations=1
> infrun: stop_all_threads done
> thread.c:95: internal-error: thread_info* inferior_thread(): Assertion `tp' failed.
> ...

I've:
- ported patch set 2 to master
- integrated the fix in patch set 3 of
  https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 such that the assert above is
  fixed.
- build and reg-tested on x86_64-linux

Available at https://github.com/vries/gdb/tree/handle-already-exited-threads-when-attempting-to-stop-try-master-2 .

Note: this does not address all the comments on patch set 2.
  
Simon Marchi (Code Review) Feb. 3, 2020, 4:02 p.m. UTC | #15
Tankut Baris Aktemur has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

> > > 
> > > FTR, I build patchset 2 and tested the example from PR25478, and hit an assert:
> > > ...
> > 
> > I was not able to repeat the problematic scenario with the current master. 
> 
> Do you mean with 'problematic scenario' PR25478 as such, or the assert?

Sorry for the vagueness.  I was referring to the test scenario I gave in the patch
description.  Like, scenarios that can be used for testing as in
https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/134
without having to stop GDB from outside at 'stop_all_threads'.

I just saw that you posted a rebased version of this patch.  I'll look at that,
thank you.

Btw, I think this patch is kind of obsolete after Pedro's comments in
https://sourceware.org/ml/gdb-patches/2020-01/msg00212.html
because the patch causes two stop event reports at once.

-Baris
  
Simon Marchi (Code Review) Feb. 3, 2020, 4:26 p.m. UTC | #16
Tom de Vries has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

> Patch Set 2:
> I just saw that you posted a rebased version of this patch.  I'll look at that,
> thank you.

That would be great, thanks.

BTW, do you happen to have something more up-to-date than patch set 2? If so, could you share this?
  
Simon Marchi (Code Review) Feb. 4, 2020, 9:06 a.m. UTC | #17
Tankut Baris Aktemur has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 2:

> BTW, do you happen to have something more up-to-date than patch set 2? If so, could you share this?

Yes.  I need to revise the associated tests, and will post it as soon as I can -- hopefully today.
  
Simon Marchi (Code Review) Feb. 5, 2020, 1:24 p.m. UTC | #18
Tankut Baris Aktemur has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 3:

> BTW, do you happen to have something more up-to-date than patch set 2? If so, could you share this?

I've uploaded patchset 3.  This version uses pending waitstatuses and also includes fixes from
https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759

I've not fully updated the testcases in
https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/143.
There are still unaddressed comments there.

This patchset assumes that an existing problem that is explained in
https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/766
is fixed.  Otherwise the scenario described in the commit message is not reproduced.

Thanks
-Baris
  
Simon Marchi (Code Review) Feb. 5, 2020, 10:59 p.m. UTC | #19
Tom de Vries has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133
......................................................................


Patch Set 3:

> Patch Set 3:
> 
> > BTW, do you happen to have something more up-to-date than patch set 2? If so, could you share this?
> 
> I've uploaded patchset 3.  This version uses pending waitstatuses and also includes fixes from
> https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759
> 

FTR, this also fixes the gdb/testsuite/gdb.threads/kill-in-stop-all-threads.exp test-case in patch set 4 of https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 .
  
Tankut Baris Aktemur Feb. 7, 2020, 12:04 p.m. UTC | #20
Hi,

GDB goes into an infinite loop when it attempts to stop a thread in
'stop_all_threads', if the thread has already died.  This patch aims
to fix this by handling waitstatuses that denote exiting.

This is a port of

  https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133

from Gerrit to email, because Gerrit is planned to be retired.
This version of the patch is a revision of patchset 3 in Gerrit,
hence I enumerated it as v4.  The new revision simplifies the
waitstatus handling in PATCH 2/2 and also merges the tests into
that same patch (because it is more desirable that the tests
accompany the fix rather than coming as a separate commit).

PATCH 1/2 was already approved on Gerrit.

For a previous discussion, see

  https://sourceware.org/ml/gdb-patches/2020-01/msg00212.html

And also see

  https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759

The problematic scenario explained and tested in PATCH 2/2 is not
reproducible if the bug mentioned in

  https://sourceware.org/ml/gdb-patches/2020-02/msg00122.html

is not addressed.  Therefore the patch proposed here has to be applied
on top of a solution for that.  Such a branch is available at

  https://github.com/barisaktemur/gdb/tree/thread-exit-in-stop-all-threads-v4

Thanks.
Baris

Tankut Baris Aktemur (2):
  gdb/infrun: extract out a code piece into 'mark_non_executing_threads'
    function
  gdb/infrun: handle already-exited threads when attempting to stop

 gdb/infrun.c                           | 115 ++++++++++++++++---------
 gdb/testsuite/gdb.multi/multi-exit.c   |  22 +++++
 gdb/testsuite/gdb.multi/multi-exit.exp |  81 +++++++++++++++++
 gdb/testsuite/gdb.multi/multi-kill.c   |  34 ++++++++
 gdb/testsuite/gdb.multi/multi-kill.exp |  84 ++++++++++++++++++
 5 files changed, 296 insertions(+), 40 deletions(-)
 create mode 100644 gdb/testsuite/gdb.multi/multi-exit.c
 create mode 100644 gdb/testsuite/gdb.multi/multi-exit.exp
 create mode 100644 gdb/testsuite/gdb.multi/multi-kill.c
 create mode 100644 gdb/testsuite/gdb.multi/multi-kill.exp
  
Simon Marchi (Code Review) Feb. 7, 2020, 12:10 p.m. UTC | #21
Tankut Baris Aktemur has abandoned this change. ( https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133 )

Change subject: infrun: handle already-exited threads when attempting to stop
......................................................................


Abandoned

Migrated to https://sourceware.org/ml/gdb-patches/2020-02/msg00153.html
  

Patch

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 66a066f..01fcbf6 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4383,6 +4383,15 @@ 
 	    {
 	      /* All resumed threads exited
 		 or one thread/process exited/signalled.  */
+	      thread_info *t = find_thread_ptid (event_ptid);
+	      if (t != nullptr)
+		{
+		  t->stop_requested = 0;
+		  t->executing = 0;
+		  t->resumed = 0;
+		  t->control.may_range_step = 0;
+		  t->state = THREAD_EXITED;
+		}
 	    }
 	  else
 	    {