[v2,2/4] Unregister the last inferior from the event loop

Message ID alpine.LFD.2.21.1911051717500.13542@redsun52.ssa.fujisawa.hgst.com
State New, archived

Commit Message

Maciej W. Rozycki Nov. 6, 2019, 8:51 p.m. UTC
  Fix an issue where GDB starts to effectively busy-loop (though still 
accepting and interpreting commands), consuming all available processing 
power, when a remote connection is forcibly terminated while the target is 
running.

This was observed with RISC-V/Linux `gdbserver' modified to hang after a 
successful acceptance of a `vCont' packet, as follows:

(gdb) set debug infrun 2
(gdb) set debug remote 2
(gdb) continue
Continuing.
infrun: clear_proceed_status_thread (Thread 22854.22854)
infrun: proceed (addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT)
Sending packet: $Z0,10430,2#0c...Packet received:
Packet Z0 (software-breakpoint) is NOT supported
Sending packet: $m10430,2#c3...Packet received: 8147
Sending packet: $X10430,0:#e6...Packet received: OK
binary downloading supported by target
Sending packet: $X10430,2:\002\220#7a...Packet received: OK
Sending packet: $m15555607b8,2#07...Packet received: 8280
Sending packet: $X15555607b8,2:\002\220#be...Packet received: OK
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 22854.22854] at 0x1555556ef0
Sending packet: $QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;2c;4c;97;#0a...Packet received: OK
Sending packet: $vCont?#49...Packet received: vCont;c;C;t
Packet vCont (verbose-resume) is supported
Sending packet: $vCont;c:p5946.-1#b6...infrun: infrun_async(1)
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = ignore
infrun: handle_inferior_event status->kind = ignore
infrun: prepare_to_wait
^Cremote_pass_ctrlc called
remote_interrupt called
^Cremote_pass_ctrlc called
infrun: infrun_async(0)
The target is not responding to interrupt requests.
Stop debugging it? (y or n) y
infrun: infrun_async(1)
Disconnected from target.
(gdb) infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = ignore
infrun: handle_inferior_event status->kind = ignore
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = ignore
infrun: handle_inferior_event status->kind = ignore
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = ignore
infrun: handle_inferior_event status->kind = ignore
infrun: prepare_to_wait
[...]

This is because `remote_target::resume' enables the asynchronous event 
loop, as indicated by the first `infrun: infrun_async(1)' record above, 
and then the confirmation dialogue temporarily disables it and then 
reenables it, as indicated by the second `infrun: infrun_async(1)' record 
above.  The problem with that approach is that the reenabling also marks 
the handler for the `infrun_async_inferior_event_token' event ready, 
even though it was not before the temporary disabling, by calling 
`mark_async_event_handler' on it, and that triggers the infinite loop as 
there's no inferior anymore and consequently no event will ever arrive 
that would stop it.
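
For reference, `infrun_async' only acts on state transitions and marks or 
clears the handler accordingly; roughly (a paraphrased sketch of the 
gdb/infrun.c logic, not a verbatim quote):

/* Paraphrased sketch of infrun_async; see gdb/infrun.c for the real code.  */
void
infrun_async (int enable)
{
  if (infrun_is_async != enable)
    {
      infrun_is_async = enable;

      if (debug_infrun)
        fprintf_unfiltered (gdb_stdlog,
                            "infrun: infrun_async(%d)\n", enable);

      /* Reenabling marks the handler ready unconditionally, even if it
         was not ready when the matching infrun_async (0) call was made.  */
      if (enable)
        mark_async_event_handler (infrun_async_inferior_event_token);
      else
        clear_async_event_handler (infrun_async_inferior_event_token);
    }
}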

Unregister the `infrun_async_inferior_event_token' event from the event 
loop, then, when the last inferior has gone.  The final part of the session 
now 
looks like:

^Cremote_pass_ctrlc called
remote_interrupt called
^Cremote_pass_ctrlc called
infrun: infrun_async(0)
The target is not responding to interrupt requests.
Stop debugging it? (y or n) y
infrun: infrun_async(1)
infrun: infrun_async(0)
Disconnected from target.
(gdb) 

and the looping does not happen anymore.

	gdb/
	* target.c (target_stack::unpush): Unregister the last inferior 
	from the event loop.
---
New change in v2.
---
 gdb/target.c |    6 ++++++
 1 file changed, 6 insertions(+)

gdb-unpush-target-async.diff
  

Comments

Simon Marchi Jan. 21, 2020, 5:41 a.m. UTC | #1
On 2019-11-06 3:51 p.m., Maciej W. Rozycki wrote:
> Fix an issue with GDB starting to effectively busy-loop (though still 
> accepting and interpreting commands) using the full processing power 
> available when a remote connection is forcibly terminated while the 
> target is running.
> 
> This was observed with RISC-V/Linux `gdbserver' modified to hang after a 
> successful acceptance of a `vCont' packet, as follows:
> 
> (gdb) set debug infrun 2
> (gdb) set debug remote 2
> (gdb) continue
> Continuing.
> infrun: clear_proceed_status_thread (Thread 22854.22854)
> infrun: proceed (addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT)
> Sending packet: $Z0,10430,2#0c...Packet received:
> Packet Z0 (software-breakpoint) is NOT supported
> Sending packet: $m10430,2#c3...Packet received: 8147
> Sending packet: $X10430,0:#e6...Packet received: OK
> binary downloading supported by target
> Sending packet: $X10430,2:\002\220#7a...Packet received: OK
> Sending packet: $m15555607b8,2#07...Packet received: 8280
> Sending packet: $X15555607b8,2:\002\220#be...Packet received: OK
> infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 22854.22854] at 0x1555556ef0
> Sending packet: $QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;2c;4c;97;#0a...Packet received: OK
> Sending packet: $vCont?#49...Packet received: vCont;c;C;t
> Packet vCont (verbose-resume) is supported
> Sending packet: $vCont;c:p5946.-1#b6...infrun: infrun_async(1)
> infrun: prepare_to_wait
> infrun: target_wait (-1.0.0, status) =
> infrun:   -1.0.0 [process -1],
> infrun:   status->kind = ignore
> infrun: handle_inferior_event status->kind = ignore
> infrun: prepare_to_wait
> ^Cremote_pass_ctrlc called
> remote_interrupt called
> ^Cremote_pass_ctrlc called
> infrun: infrun_async(0)
> The target is not responding to interrupt requests.
> Stop debugging it? (y or n) y
> infrun: infrun_async(1)
> Disconnected from target.
> (gdb) infrun: target_wait (-1.0.0, status) =
> infrun:   -1.0.0 [process -1],
> infrun:   status->kind = ignore
> infrun: handle_inferior_event status->kind = ignore
> infrun: prepare_to_wait
> infrun: target_wait (-1.0.0, status) =
> infrun:   -1.0.0 [process -1],
> infrun:   status->kind = ignore
> infrun: handle_inferior_event status->kind = ignore
> infrun: prepare_to_wait
> infrun: target_wait (-1.0.0, status) =
> infrun:   -1.0.0 [process -1],
> infrun:   status->kind = ignore
> infrun: handle_inferior_event status->kind = ignore
> infrun: prepare_to_wait
> [...]
> 
> This is because `remote_target::resume' enables the asynchronous event 
> loop, as indicated by the first `infrun: infrun_async(1)' record above, 
> and then the confirmation dialogue temporarily disables it and then 
> reenables, as indicated by the second `infrun: infrun_async(1)' record 
> above.  The problem with that approach is that the reenabling also marks 
> the handler for the `infrun_async_inferior_event_token' event ready, 
> even though it was not before the temporary disabling, by calling 
> `mark_async_event_handler' on it, and that triggers the infinite loop as 
> there's no inferior anymore and consequently no event waiting that would 
> stop it.

I don't understand this description.  Let's assume that the second call to
infrun_async indeed left infrun_async_inferior_event_token.ready to true.
Then I would expect the event loop to call it once, setting the ready flag
to false.  After that, the ready being false, I don't see why the callback
of infrun_async_inferior_event_token would get called in a loop.

Simon
  
Maciej W. Rozycki Jan. 21, 2020, 8:29 a.m. UTC | #2
Hi Simon,

> > This is because `remote_target::resume' enables the asynchronous event 
> > loop, as indicated by the first `infrun: infrun_async(1)' record above, 
> > and then the confirmation dialogue temporarily disables it and then 
> > reenables, as indicated by the second `infrun: infrun_async(1)' record 
> > above.  The problem with that approach is that the reenabling also marks 
> > the handler for the `infrun_async_inferior_event_token' event ready, 
> > even though it was not before the temporary disabling, by calling 
> > `mark_async_event_handler' on it, and that triggers the infinite loop as 
> > there's no inferior anymore and consequently no event waiting that would 
> > stop it.
> 
> I don't understand this description.  Let's assume that the second call to
> infrun_async indeed left infrun_async_inferior_event_token.ready to true.
> Then I would expect the event loop to call it once, setting the ready flag
> to false.  After that, the ready being false, I don't see why the callback
> of infrun_async_inferior_event_token would get called in a loop.

 Thanks for looking into it.  It's been a while however, so my memory has 
become fuzzy on this.

 I have therefore gone reading through the code again and what I can see 
is that in the async mode the `ready' (readiness) flag is never cleared: 
see `infrun_async' and `remote_target::async', which are the only places 
to call `clear_async_event_handler' and then only when the async mode is 
being disabled.

 On the other hand waiting for an inferior event does get disabled in 
`handle_inferior_event' regardless of the readiness flag, by calling 
`stop_waiting', for certain events, but not for TARGET_WAITKIND_IGNORE.  
Instead for that event `prepare_to_wait' is called, which makes sense to 
me because such an event does not indicate whether waiting should or 
should not be disabled, and with an asynchronous target you can normally 
(i.e. if not indicated by a specific event received otherwise, e.g. 
TARGET_WAITKIND_EXITED) expect a further event to be received anytime.

 Does this clarify the problematic scenario to you?

  Maciej
  
Simon Marchi Jan. 21, 2020, 5:11 p.m. UTC | #3
On 2020-01-21 3:29 a.m., Maciej W. Rozycki wrote:
> Hi Simon,
> 
>>> This is because `remote_target::resume' enables the asynchronous event 
>>> loop, as indicated by the first `infrun: infrun_async(1)' record above, 
>>> and then the confirmation dialogue temporarily disables it and then 
>>> reenables, as indicated by the second `infrun: infrun_async(1)' record 
>>> above.  The problem with that approach is that the reenabling also marks 
>>> the handler for the `infrun_async_inferior_event_token' event ready, 
>>> even though it was not before the temporary disabling, by calling 
>>> `mark_async_event_handler' on it, and that triggers the infinite loop as 
>>> there's no inferior anymore and consequently no event waiting that would 
>>> stop it.
>>
>> I don't understand this description.  Let's assume that the second call to
>> infrun_async indeed left infrun_async_inferior_event_token.ready to true.
>> Then I would expect the event loop to call it once, setting the ready flag
>> to false.  After that, the ready being false, I don't see why the callback
>> of infrun_async_inferior_event_token would get called in a loop.
> 
>  Thanks for looking into it.  It's been a while however, so my memory has 
> become fuzzy on this.
> 
>  I have therefore gone reading through the code again and what I can see 
> is that in the async mode the `ready' (readiness) flag is never cleared: 
> see `infrun_async' and `remote_target::async', which are the only places 
> to call `clear_async_event_handler' and then only when the async mode is 
> being disabled.

The ready flag is also cleared in `check_async_event_handlers`, just before
invoking the callback.  So I was thinking, why would the callback get called
repeatedly if the ready flag is cleared just before it gets called, and
there's nothing setting it back to true.  The answer is probably that the
busy loop is within that callback, as seen below?
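
For reference, that dispatch looks roughly like this (paraphrased from 
gdb/event-loop.c, from memory rather than quoted exactly):

/* Paraphrased sketch of check_async_event_handlers.  */
static int
check_async_event_handlers (void)
{
  for (async_event_handler *hdl = async_event_handler_list.first_handler;
       hdl != NULL;
       hdl = hdl->next_handler)
    {
      if (hdl->ready)
        {
          /* The flag is cleared here, before the callback runs...  */
          hdl->ready = 0;
          /* ...but nothing prevents the callback from setting it again.  */
          (*hdl->proc) (hdl->client_data);
          return 1;
        }
    }

  return 0;
}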

>  On the other hand waiting for an inferior event does get disabled in 
> `handle_inferior_event' regardless of the readiness flag, by calling 
> `stop_waiting', for certain events, but not for TARGET_WAITKIND_IGNORE.  
> Instead for that event `prepare_to_wait' is called, which makes sense to 
> me because such an event does not indicate whether waiting should or 
> should not be disabled, and with an asynchronous target you can normally 
> (i.e. if not indicated by a specific event received otherwise, e.g. 
> TARGET_WAITKIND_EXITED) expect a further event to be received anytime.
> 
>  Does this clarify the problematic scenario to you?

Ok, so if I understand, the infinite loop is this one, inside wait_for_inferior?

  while (1)
    {
      ...

      /* Now figure out what to do with the result of the result.  */
      handle_inferior_event (ecs);

      if (!ecs->wait_some_more)
	break;
    }

After the remote target has been unpushed, the remaining target is probably
just the "exec file" target, which does not provide a ::wait implementation,
and therefore inherits default_target_wait:

ptid_t
default_target_wait (struct target_ops *ops,
		     ptid_t ptid, struct target_waitstatus *status,
		     int options)
{
  status->kind = TARGET_WAITKIND_IGNORE;
  return minus_one_ptid;
}


And because that returns TARGET_WAITKIND_IGNORE, which results in
ecs->wait_some_more getting set by handle_inferior_event/prepare_to_wait,
it results in the infinite loop in wait_for_inferior.

Does that look accurate?

Simon
  
Maciej W. Rozycki Jan. 22, 2020, 5:06 p.m. UTC | #4
On Tue, 21 Jan 2020, Simon Marchi wrote:

> >  I have therefore gone reading through the code again and what I can see 
> > is that in the async mode the `ready' (readiness) flag is never cleared: 
> > see `infrun_async' and `remote_target::async', which are the only places 
> > to call `clear_async_event_handler' and then only when the async mode is 
> > being disabled.
> 
> The ready flag is also cleared in `check_async_event_handlers`, just before
> invoking the callback.  So I was thinking, why would the callback get called
> repeatedly if the ready flag is cleared just before it gets called, and
> there's nothing setting it back to true.  The answer is probably that the
> busy loop is within that callback, as seen below?

 Indeed it was cleared in `check_async_event_handlers' and then set again 
via `mark_infrun_async_event_handler' in `prepare_to_wait' called from 
`handle_inferior_event'.
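
 To illustrate, `prepare_to_wait' both requests more events and marks the 
handler ready again; roughly (paraphrased from gdb/infrun.c, from memory, 
and the exact guard may differ between versions):

/* Paraphrased sketch of prepare_to_wait.  */
static void
prepare_to_wait (struct execution_control_state *ecs)
{
  if (debug_infrun)
    fprintf_unfiltered (gdb_stdlog, "infrun: prepare_to_wait\n");

  ecs->wait_some_more = 1;

  /* With no async target left to poll, mark the infrun handler ready by
     hand so that the event loop comes back to us.  */
  if (!target_is_async_p ())
    mark_infrun_async_event_handler ();
}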

> >  On the other hand waiting for an inferior event does get disabled in 
> > `handle_inferior_event' regardless of the readiness flag, by calling 
> > `stop_waiting', for certain events, but not for TARGET_WAITKIND_IGNORE.  
> > Instead for that event `prepare_to_wait' is called, which makes sense to 
> > me because such an event does not indicate whether waiting should or 
> > should not be disabled, and with an asynchronous target you can normally 
> > (i.e. if not indicated by a specific event received otherwise, e.g. 
> > TARGET_WAITKIND_EXITED) expect a further event to be received anytime.
> > 
> >  Does this clarify the problematic scenario to you?
> 
> Ok, so if I understand, the infinite loop is this one, inside 
> wait_for_inferior?
> 
>   while (1)
>     {
>       ...
> 
>       /* Now figure out what to do with the result of the result.  */
>       handle_inferior_event (ecs);
> 
>       if (!ecs->wait_some_more)
> 	break;
>     }

 Nope, it was in `start_event_loop':

  while (1)
    {
      ...
	  result = gdb_do_one_event ();
      ...
      if (result < 0)
	break;
    }

calling `check_async_event_handlers', 
`infrun_async_inferior_event_handler', `inferior_event_handler' and 
ultimately `fetch_inferior_event':

    /* Now figure out what to do with the result of the result.  */
    handle_inferior_event (ecs);

    if (!ecs->wait_some_more)
      ...
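
 The handler invoked from `check_async_event_handlers' in this chain is 
just a thin shim (paraphrased from gdb/infrun.c):

/* Paraphrased sketch of infrun_async_inferior_event_handler.  */
static void
infrun_async_inferior_event_handler (gdb_client_data data)
{
  inferior_event_handler (INF_REG_EVENT, NULL);
}

so each time it fires, `inferior_event_handler' runs `fetch_inferior_event', 
`prepare_to_wait' marks the handler ready again, and the next event-loop 
iteration dispatches it once more.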

> After the remote target has been unpushed, the remaining target is probably
> just the "exec file" target, which does not provide a ::wait implementation,
> and therefore inherits default_target_wait:
> 
> ptid_t
> default_target_wait (struct target_ops *ops,
> 		     ptid_t ptid, struct target_waitstatus *status,
> 		     int options)
> {
>   status->kind = TARGET_WAITKIND_IGNORE;
>   return minus_one_ptid;
> }

 And it was `dummy_target::wait' indeed invoking `default_target_wait'.

> And because that returns TARGET_WAITKIND_IGNORE, which results in
> ecs->wait_some_more getting set by handle_inferior_event/prepare_to_wait,
> it results in the infinite loop in wait_for_inferior.

 Right.

> Does that look accurate?

 Almost, except that the code in question has since been heavily refactored 
and the call to `target_wait' does not happen anymore.  Now I need to 
bisect the tree, find the offending commit and figure out whether it has 
actually provided an alternative fix for the issue I have observed or just 
papered it over by chance somehow.

 Thanks for your input, I wish it had happened earlier, before the code was 
rearranged, sigh...

  Maciej
  
Maciej W. Rozycki Jan. 23, 2020, 1:12 a.m. UTC | #5
On Wed, 22 Jan 2020, Maciej W. Rozycki wrote:

> > And because that returns TARGET_WAITKIND_IGNORE, which results in
> > ecs->wait_some_more getting set by handle_inferior_event/prepare_to_wait,
> > it results in the infinite loop in wait_for_inferior.
> 
>  Right.
> 
> > Does that look accurate?
> 
>  Almost, except that code in question has since been heavily refactored 
> and the call to `target_wait' does not happen anymore.  Now I need to 
> bisect the tree, find the offending commit and figure out whether it has 
> actually provided an alternative fix for the issue I have observed or just 
> papered it over by chance somehow.

 OK, I have tracked it down now.  The semantics have changed with commit 
5b6d1e4fa4fc ("Multi-target support"), and in particular this change:

@@ -3719,17 +3910,28 @@ fetch_inferior_event (void *client_data)
       = make_scoped_restore (&execution_direction,
 			     target_execution_direction ());

-    ecs->ptid = do_target_wait (waiton_ptid, &ecs->ws,
-				target_can_async_p () ? TARGET_WNOHANG : 0);
+    if (!do_target_wait (minus_one_ptid, ecs, TARGET_WNOHANG))
+      return;
+
+    gdb_assert (ecs->ws.kind != TARGET_WAITKIND_IGNORE);
+
[...]

combined with the fact that the new `do_target_wait' wrapper (original 
`do_target_wait' has been renamed to `do_target_wait_1') returns false if 
(ecs->ws.kind == TARGET_WAITKIND_IGNORE) means that the looping no longer 
happens and my problematic scenario has been taken care of, 
as a side effect if not as an intended one.
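
 Schematically the new wrapper behaves like this (a heavily simplified 
sketch; the real code also handles selecting among multiple targets and 
inferiors):

/* Simplified sketch of the post-5b6d1e4fa4fc do_target_wait wrapper.  */
static bool
do_target_wait (ptid_t wait_ptid, execution_control_state *ecs, int options)
{
  /* ... pick an inferior INF to wait on (selection logic elided) ...  */
  ecs->ptid = do_target_wait_1 (inf, wait_ptid, &ecs->ws, options);
  return ecs->ws.kind != TARGET_WAITKIND_IGNORE;
}

so with only TARGET_WAITKIND_IGNORE on offer `fetch_inferior_event' now 
simply returns instead of looping.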

 This change (2/4) can therefore be dropped as no longer required.  Again, 
thank you, Simon, for your assistance!

  Maciej
  
Maciej W. Rozycki Jan. 23, 2020, 2:33 a.m. UTC | #6
On Thu, 23 Jan 2020, Maciej W. Rozycki wrote:

>  This change (2/4) can therefore be dropped as no longer required.  Again, 
> thank you, Simon, for your assistance!

 On second thoughts and having experimented a bit I take my conclusion 
back.

 I think we do want to include my proposed change so as not to have the 
value of 1 linger in `infrun_is_async' after an abrupt termination of the 
target connection.  This lingering does not happen if a remote stub behaves 
in an orderly manner, where `infrun_is_async' is reset to 0 every time 
execution stops in the debuggee (in the scenario concerned, that is).

 Therefore I think we want the state of GDB to be consistently the same 
regardless of whether the remote connection has been ended with `detach', 
`disconnect', `kill' or abruptly, even though we currently do not know 
(anymore) of any observable difference in behaviour resulting from this 
discrepancy.

 OK to go ahead with this change then?

 NB the current session log looks like:

(gdb) continue
Continuing.
infrun: clear_proceed_status_thread (Thread 2061.2061)
infrun: proceed (addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT)
Sending packet: $Z0,1555560ad8,2#7c...Packet received:
Packet Z0 (software-breakpoint) is NOT supported
Sending packet: $m1555560ad8,2#33...Packet received: 8280
Sending packet: $X1555560ad8,0:#56...Packet received: OK
binary downloading supported by target
Sending packet: $X1555560ad8,2:\002\220#ea...Packet received: OK
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 2061.2061] at 0x1555556ef0
Sending packet: $QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;2c;4c;97;#0a...Packet received: OK
Sending packet: $vCont?#49...Packet received: vCont;c;C;t
Packet vCont (verbose-resume) is supported
Sending packet: $vCont;c:p80d.-1#aa...infrun: infrun_async(1)
infrun: prepare_to_wait
^Cremote_pass_ctrlc called
remote_interrupt called
^Cremote_pass_ctrlc called
infrun: infrun_async(0)
The target is not responding to interrupt requests.
Stop debugging it? (y or n) y
infrun: infrun_async(1)
Disconnected from target.
(gdb) 

  Maciej
  
Simon Marchi Jan. 23, 2020, 2:58 p.m. UTC | #7
On 2020-01-22 9:33 p.m., Maciej W. Rozycki wrote:
> On Thu, 23 Jan 2020, Maciej W. Rozycki wrote:
> 
>>  This change (2/4) can therefore be dropped as no longer required.  Again, 
>> thank you, Simon, for your assistance!
> 
>  On second thoughts and having experimented a bit I take my conclusion 
> back.
> 
>  I think we do want to include my proposed change so as not to have the 
> value of 1 linger in `infrun_is_async' after an abrupt termination of the 
> target connection.  This is because it is not the case if a remote stub 
> behaves in an orderly manner, where `infrun_is_async' is reset to 0 every 
> time execution stops in the debuggee (in the scenario concerned, that is).
> 
>  Therefore I think we want the state of GDB to be consistently the same 
> regardless of whether the remote connection has been ended with `detach', 
> `disconnect', `kill' or abruptly, even though we currently do not know 
> (anymore) of any observable difference in behaviour resulting from this 
> discrepancy.
> 
>  OK to go ahead with this change then?
> 
>  NB the current session log looks like:
> 
> (gdb) continue
> Continuing.
> infrun: clear_proceed_status_thread (Thread 2061.2061)
> infrun: proceed (addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT)
> Sending packet: $Z0,1555560ad8,2#7c...Packet received:
> Packet Z0 (software-breakpoint) is NOT supported
> Sending packet: $m1555560ad8,2#33...Packet received: 8280
> Sending packet: $X1555560ad8,0:#56...Packet received: OK
> binary downloading supported by target
> Sending packet: $X1555560ad8,2:\002\220#ea...Packet received: OK
> infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current 
> thread [Thread 2061.2061] at 0x1555556ef0
> Sending packet: $QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;2c;4c;97;#0a...Packet received: OK
> Sending packet: $vCont?#49...Packet received: vCont;c;C;t
> Packet vCont (verbose-resume) is supported
> Sending packet: $vCont;c:p80d.-1#aa...infrun: infrun_async(1)
> infrun: prepare_to_wait
> ^Cremote_pass_ctrlc called
> remote_interrupt called
> ^Cremote_pass_ctrlc called
> infrun: infrun_async(0)
> The target is not responding to interrupt requests.
> Stop debugging it? (y or n) y
> infrun: infrun_async(1)
> Disconnected from target.
> (gdb) 
> 
>   Maciej
> 

I think it would make sense.  I was wondering about how your change would
behave with multiple targets, but it only triggers when there are no more
live inferiors across all targets, so I think it would be fine.

But in all honesty, I don't really understand what the infrun_is_async
variable is for.  Clearly, it makes it so this:

      if (enable)
	mark_async_event_handler (infrun_async_inferior_event_token);
      else
	clear_async_event_handler (infrun_async_inferior_event_token);

is only called when there is a transition (async becomes enabled or
becomes disabled), but I don't see why that is particularly important.

Simon
  

Patch

Index: binutils-gdb/gdb/target.c
===================================================================
--- binutils-gdb.orig/gdb/target.c
+++ binutils-gdb/gdb/target.c
@@ -634,6 +634,12 @@  target_stack::unpush (target_ops *t)
      implementation don't end up in T anymore.  */
   target_close (t);
 
+  /* If this was the last inferior, then make sure it's been unregistered
+     from the event loop so that we don't get distracted by spurious
+     inferior output.  */
+  if (!have_live_inferiors ())
+    infrun_async (0);
+
   return true;
 }