[v2,4/4] Unregister the inferior from the event loop if failed to resume

Message ID alpine.LFD.2.21.1911061926080.13542@redsun52.ssa.fujisawa.hgst.com
State New, archived
Headers

Commit Message

Maciej W. Rozycki Nov. 6, 2019, 8:52 p.m. UTC
  Fix an issue with GDB starting to effectively busy-loop (though still 
accepting and interpreting commands) using the full processing power 
available when a remote stub has gone closing the connection while GDB 
is waiting for user input once the inferior has failed to resume.

This was observed with a `riscv64-linux-gnu' cross-GDB and RISC-V/Linux
`gdbserver' being developed and an attempt to step over the `ret' aka
`c.jr ra' instruction where the value held in the `ra' aka `x1' register
and therefore the address of a software-stepping breakpoint to insert is
0.  Here's a record of a remote debug session leading to the issue:

Sending packet: $vCont?#49...Packet received: vCont;c;C;t
Packet vCont (verbose-resume) is supported
Sending packet: $vCont;c:p6857.-1#b8...infrun: infrun_async(1)
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = ignore
infrun: handle_inferior_event status->kind = ignore
infrun: prepare_to_wait
Packet received: T05swbreak:;02:f0f8ffff3f000000;20:b807565515000000;thread:p6857.6857;core:1;
infrun: target_wait (-1.0.0, status) =
infrun:   26711.26711.0 [Thread 26711.26711],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: handle_inferior_event status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: stop_pc = 0x15555607b8
Sending packet: $qXfer:libraries-svr4:read::0,1000#20...Packet received: l<library-list-svr4 version="1.0" main-lm="0x155556f160"><library name=".../sysroot/lib/ld-linux-riscv64-lp64d.so.1" lm="0x155556ea18" l_addr="0x1555556000" l_ld="0x155556de90"/><library name="linux-vdso.so.1" lm="0x155556f6f0" l_addr="0x1555570000" l_ld="0x1555570350"/></library-list-svr4>
infrun: BPSTAT_WHAT_SINGLE
infrun: thread [Thread 26711.26711] still needs step-over
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
Sending packet: $X15555607b8,2:\202\200#2e...Packet received: OK
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 26711.26711] at 0x15555607b8
Sending packet: $m15555607b8,2#07...Packet received: 8280
Sending packet: $m15555607b8,2#07...Packet received: 8280
Sending packet: $g#67...Packet received: 00000000000000000000000000000000f0f8ffff3f000000d0c1b9aa2a00000020f76d55150000000f00000000000000ec6e555515000000ffffff6f00000000f0faffff3f00000090f0565515000000000000000000000052e57464000000000500000000000000887801000000000028f1565515000000010000000000000008f8ffff3f000000722f6c696236342f60f156551500000000000000000000000000000000000000000000000000000008f75655150000000100000000000000f5feffefffffffff80de56551500000050f9ffff3f00000068f9ffff3f000000ae585655150000000b00000000000000fffdff6f00000000fcffffffffffffff[552 bytes omitted]
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
infrun: clear_step_over_info
Cannot access memory at address 0x0
(gdb) 

At this point a command was issued to kill `gdbserver' and GDB responded 
immediately by entering the event loop:

infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: handle_inferior_event status->kind = no-resumed
infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg)
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: handle_inferior_event status->kind = no-resumed
infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg)
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [process -1],
infrun:   status->kind = no-resumed
infrun: handle_inferior_event status->kind = no-resumed
infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg)
infrun: prepare_to_wait
[...]

This is because `remote_target::resume' enables the asynchronous event
loop, as indicated by the first `infrun: infrun_async(1)' record above, 
and then the EOF condition on the remote connection is delivered as an 
event, but there are no resumed children to be waited for and no other 
event waiting that would stop the loop.

Correct that then by disabling the asynchronous event as execution 
completion would, where applicable, i.e. in the all-stop mode.  This 
makes the final sequence of the session look like:

Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
infrun: infrun_async(0)
infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
infrun: clear_step_over_info
Cannot access memory at address 0x0
(gdb)

and GDB does not trigger the loop at this point if `gdbserver' goes away 
as the asynchronous event loop has already been disabled.

	gdb/
	* infrun.c (resume): In the `catch' clause also unregister the 
	inferior from the event loop.
---
Hi,

 It might be that the non-stop mode requires additional clean-up here, 
however I have no way to work with that at the moment, so I'll be leaving 
any investigation to someone who has and will be inclined to look into it.

  Maciej

New change in v2.
---
 gdb/infrun.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

gdb-resume-async.diff
  

Comments

Luis Machado Feb. 19, 2020, 1:40 p.m. UTC | #1
On 11/6/19 5:52 PM, Maciej W. Rozycki wrote:
> Fix an issue with GDB starting to effectively busy-loop (though still
> accepting and interpreting commands) using the full processing power
> available when a remote stub has gone closing the connection while GDB
> is waiting for user input once the inferior has failed to resume.
> 
> This was observed with a `riscv64-linux-gnu' cross-GDB and RISC-V/Linux
> `gdbserver' being developed and an attempt to step over the `ret' aka
> `c.jr ra' instruction where the value held in the `ra' aka `x1' register
> and therefore the address of a software-stepping breakpoint to insert is
> 0.  Here's a record of a remote debug session leading to the issue:
> 
> Sending packet: $vCont?#49...Packet received: vCont;c;C;t
> Packet vCont (verbose-resume) is supported
> Sending packet: $vCont;c:p6857.-1#b8...infrun: infrun_async(1)
> infrun: prepare_to_wait
> infrun: target_wait (-1.0.0, status) =
> infrun:   -1.0.0 [process -1],
> infrun:   status->kind = ignore
> infrun: handle_inferior_event status->kind = ignore
> infrun: prepare_to_wait
> Packet received: T05swbreak:;02:f0f8ffff3f000000;20:b807565515000000;thread:p6857.6857;core:1;
> infrun: target_wait (-1.0.0, status) =
> infrun:   26711.26711.0 [Thread 26711.26711],
> infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
> infrun: handle_inferior_event status->kind = stopped, signal = GDB_SIGNAL_TRAP
> infrun: stop_pc = 0x15555607b8
> Sending packet: $qXfer:libraries-svr4:read::0,1000#20...Packet received: l<library-list-svr4 version="1.0" main-lm="0x155556f160"><library name=".../sysroot/lib/ld-linux-riscv64-lp64d.so.1" lm="0x155556ea18" l_addr="0x1555556000" l_ld="0x155556de90"/><library name="linux-vdso.so.1" lm="0x155556f6f0" l_addr="0x1555570000" l_ld="0x1555570350"/></library-list-svr4>
> infrun: BPSTAT_WHAT_SINGLE
> infrun: thread [Thread 26711.26711] still needs step-over
> infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
> Sending packet: $X15555607b8,2:\202\200#2e...Packet received: OK
> infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
> infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
> infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 26711.26711] at 0x15555607b8
> Sending packet: $m15555607b8,2#07...Packet received: 8280
> Sending packet: $m15555607b8,2#07...Packet received: 8280
> Sending packet: $g#67...Packet received: 00000000000000000000000000000000f0f8ffff3f000000d0c1b9aa2a00000020f76d55150000000f00000000000000ec6e555515000000ffffff6f00000000f0faffff3f00000090f0565515000000000000000000000052e57464000000000500000000000000887801000000000028f1565515000000010000000000000008f8ffff3f000000722f6c696236342f60f156551500000000000000000000000000000000000000000000000000000008f75655150000000100000000000000f5feffefffffffff80de56551500000050f9ffff3f00000068f9ffff3f000000ae585655150000000b00000000000000fffdff6f00000000fcffffffffffffff[552 bytes omitted]
> Sending packet: $m0,40#2d...Packet received: E01
> Sending packet: $m0,1#fa...Packet received: E01
> Sending packet: $m0,40#2d...Packet received: E01
> Sending packet: $m0,1#fa...Packet received: E01
> infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
> infrun: clear_step_over_info
> Cannot access memory at address 0x0
> (gdb)
> 
> At this point a command was issued to kill `gdbserver' and GDB responded
> immediately by entering the event loop:
> 
> infrun: target_wait (-1.0.0, status) =
> infrun:   -1.0.0 [process -1],
> infrun:   status->kind = no-resumed
> infrun: handle_inferior_event status->kind = no-resumed
> infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg)
> infrun: prepare_to_wait
> infrun: target_wait (-1.0.0, status) =
> infrun:   -1.0.0 [process -1],
> infrun:   status->kind = no-resumed
> infrun: handle_inferior_event status->kind = no-resumed
> infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg)
> infrun: prepare_to_wait
> infrun: target_wait (-1.0.0, status) =
> infrun:   -1.0.0 [process -1],
> infrun:   status->kind = no-resumed
> infrun: handle_inferior_event status->kind = no-resumed
> infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg)
> infrun: prepare_to_wait
> [...]
> 
> This is because `remote_target::resume' enables the asynchronous event
> loop, as indicated by the first `infrun: infrun_async(1)' record above,
> and then the EOF condition on the remote connection is delivered as an
> event, but there are no resumed children to be waited for and no other
> event waiting that would stop the loop.
> 
> Correct that then by disabling the asynchronous event as execution
> completion would, where applicable, i.e. in the all-stop mode.  This
> makes the final sequence of the session look like:
> 
> Sending packet: $m0,40#2d...Packet received: E01
> Sending packet: $m0,1#fa...Packet received: E01
> Sending packet: $m0,40#2d...Packet received: E01
> Sending packet: $m0,1#fa...Packet received: E01
> infrun: infrun_async(0)
> infrun: skipping breakpoint: stepping past insn at: 0x15555607b8
> infrun: clear_step_over_info
> Cannot access memory at address 0x0
> (gdb)
> 
> and GDB does not trigger the loop at this point if `gdbserver' goes away
> as the asynchronous event loop has already been disabled.
> 
> 	gdb/
> 	* infrun.c (resume): In the `catch' clause also unregister the
> 	inferior from the event loop.
> ---
> Hi,
> 
>   It might be that the non-stop mode requires additional clean-up here,
> however I have no way to work with that at the moment, so I'll be leaving
> any investigation to someone who has and will be inclined to look into it.
> 
>    Maciej
> 
> New change in v2.
> ---
>   gdb/infrun.c |   16 +++++++++++-----
>   1 file changed, 11 insertions(+), 5 deletions(-)
> 
> gdb-resume-async.diff
> Index: binutils-gdb/gdb/infrun.c
> ===================================================================
> --- binutils-gdb.orig/gdb/infrun.c
> +++ binutils-gdb/gdb/infrun.c
> @@ -2614,11 +2614,17 @@ resume (gdb_signal sig)
>       }
>     catch (const gdb_exception &ex)
>       {
> -      /* If resuming is being aborted for any reason, delete any
> -	 single-step breakpoint resume_1 may have created, to avoid
> -	 confusing the following resumption, and to avoid leaving
> -	 single-step breakpoints perturbing other threads, in case
> -	 we're running in non-stop mode.  */
> +      /* If resuming is being aborted for any reason, and we are in
> +	 the all-stop mode, then unregister the inferior from the event
> +	 loop.  This is done so that when the inferior is not running
> +	 we don't get distracted by spurious inferior output.  */
> +      if (!non_stop && target_has_execution && target_can_async_p ())
> +	target_async (0);
> +
> +      /* Also delete any single-step breakpoint resume_1 may have
> +	 created, to avoid confusing the following resumption, and to
> +	 avoid leaving single-step breakpoints perturbing other threads,
> +	 in case we're running in non-stop mode.  */
>         if (inferior_ptid != null_ptid)
>   	{
>   	  thread_info *tp = inferior_thread ();
> 

This looks reasonable, though the call to target_async (0) is always a 
bit cryptic to me. Makes me wonder if we're missing such invocations in 
other places to make GDB more recoverable on failures.

Does this fix also address 02/04? Since 02/04 is also a failure during 
resume? Not sure if it goes through the catch block.
  

Patch

Index: binutils-gdb/gdb/infrun.c
===================================================================
--- binutils-gdb.orig/gdb/infrun.c
+++ binutils-gdb/gdb/infrun.c
@@ -2614,11 +2614,17 @@  resume (gdb_signal sig)
     }
   catch (const gdb_exception &ex)
     {
-      /* If resuming is being aborted for any reason, delete any
-	 single-step breakpoint resume_1 may have created, to avoid
-	 confusing the following resumption, and to avoid leaving
-	 single-step breakpoints perturbing other threads, in case
-	 we're running in non-stop mode.  */
+      /* If resuming is being aborted for any reason, and we are in
+	 the all-stop mode, then unregister the inferior from the event
+	 loop.  This is done so that when the inferior is not running
+	 we don't get distracted by spurious inferior output.  */
+      if (!non_stop && target_has_execution && target_can_async_p ())
+	target_async (0);
+
+      /* Also delete any single-step breakpoint resume_1 may have
+	 created, to avoid confusing the following resumption, and to
+	 avoid leaving single-step breakpoints perturbing other threads,
+	 in case we're running in non-stop mode.  */
       if (inferior_ptid != null_ptid)
 	{
 	  thread_info *tp = inferior_thread ();