Message ID | alpine.LFD.2.21.1911061926080.13542@redsun52.ssa.fujisawa.hgst.com |
---|---|
State | New |
Headers | show |
On 11/6/19 5:52 PM, Maciej W. Rozycki wrote: > Fix an issue with GDB starting to effectively busy-loop (though still > accepting and interpreting commands) using the full processing power > available when a remote stub has gone closing the connection while GDB > is waiting for user input once the inferior has failed to resume. > > This was observed with a `riscv64-linux-gnu' cross-GDB and RISC-V/Linux > `gdbserver' being developed and an attempt to step over the `ret' aka > `c.jr ra' instruction where the value held in the `ra' aka `x1' register > and therefore the address of a software-stepping breakpoint to insert is > 0. Here's a record of a remote debug session leading to the issue: > > Sending packet: $vCont?#49...Packet received: vCont;c;C;t > Packet vCont (verbose-resume) is supported > Sending packet: $vCont;c:p6857.-1#b8...infrun: infrun_async(1) > infrun: prepare_to_wait > infrun: target_wait (-1.0.0, status) = > infrun: -1.0.0 [process -1], > infrun: status->kind = ignore > infrun: handle_inferior_event status->kind = ignore > infrun: prepare_to_wait > Packet received: T05swbreak:;02:f0f8ffff3f000000;20:b807565515000000;thread:p6857.6857;core:1; > infrun: target_wait (-1.0.0, status) = > infrun: 26711.26711.0 [Thread 26711.26711], > infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP > infrun: handle_inferior_event status->kind = stopped, signal = GDB_SIGNAL_TRAP > infrun: stop_pc = 0x15555607b8 > Sending packet: $qXfer:libraries-svr4:read::0,1000#20...Packet received: l<library-list-svr4 version="1.0" main-lm="0x155556f160"><library name=".../sysroot/lib/ld-linux-riscv64-lp64d.so.1" lm="0x155556ea18" l_addr="0x1555556000" l_ld="0x155556de90"/><library name="linux-vdso.so.1" lm="0x155556f6f0" l_addr="0x1555570000" l_ld="0x1555570350"/></library-list-svr4> > infrun: BPSTAT_WHAT_SINGLE > infrun: thread [Thread 26711.26711] still needs step-over > infrun: skipping breakpoint: stepping past insn at: 0x15555607b8 > Sending packet: $X15555607b8,2:\202\200#2e...Packet received: OK > infrun: skipping breakpoint: stepping past insn at: 0x15555607b8 > infrun: skipping breakpoint: stepping past insn at: 0x15555607b8 > infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 26711.26711] at 0x15555607b8 > Sending packet: $m15555607b8,2#07...Packet received: 8280 > Sending packet: $m15555607b8,2#07...Packet received: 8280 > Sending packet: $g#67...Packet received: 00000000000000000000000000000000f0f8ffff3f000000d0c1b9aa2a00000020f76d55150000000f00000000000000ec6e555515000000ffffff6f00000000f0faffff3f00000090f0565515000000000000000000000052e57464000000000500000000000000887801000000000028f1565515000000010000000000000008f8ffff3f000000722f6c696236342f60f156551500000000000000000000000000000000000000000000000000000008f75655150000000100000000000000f5feffefffffffff80de56551500000050f9ffff3f00000068f9ffff3f000000ae585655150000000b00000000000000fffdff6f00000000fcffffffffffffff[552 bytes omitted] > Sending packet: $m0,40#2d...Packet received: E01 > Sending packet: $m0,1#fa...Packet received: E01 > Sending packet: $m0,40#2d...Packet received: E01 > Sending packet: $m0,1#fa...Packet received: E01 > infrun: skipping breakpoint: stepping past insn at: 0x15555607b8 > infrun: clear_step_over_info > Cannot access memory at address 0x0 > (gdb) > > At this point a command was issued to kill `gdbserver' and GDB responded > immediately by entering the event loop: > > infrun: target_wait (-1.0.0, status) = > infrun: -1.0.0 [process -1], > infrun: status->kind = no-resumed > infrun: handle_inferior_event status->kind = no-resumed > infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg) > infrun: prepare_to_wait > infrun: target_wait (-1.0.0, status) = > infrun: -1.0.0 [process -1], > infrun: status->kind = no-resumed > infrun: handle_inferior_event status->kind = no-resumed > infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg) > infrun: prepare_to_wait > infrun: target_wait (-1.0.0, status) = > infrun: -1.0.0 [process -1], > infrun: status->kind = no-resumed > infrun: handle_inferior_event status->kind = no-resumed > infrun: TARGET_WAITKIND_NO_RESUMED (ignoring: bg) > infrun: prepare_to_wait > [...] > > This is because `remote_target::resume' enables the asynchronous event > loop, as indicated by the first `infrun: infrun_async(1)' record above, > and then the EOF condition on the remote connection is delivered as an > event, but there are no resumed children to be waited for and no other > event waiting that would stop the loop. > > Correct that then by disabling the asynchronous event as execution > completion would, where applicable, i.e. in the all-stop mode. This > makes the final sequence of the session look like: > > Sending packet: $m0,40#2d...Packet received: E01 > Sending packet: $m0,1#fa...Packet received: E01 > Sending packet: $m0,40#2d...Packet received: E01 > Sending packet: $m0,1#fa...Packet received: E01 > infrun: infrun_async(0) > infrun: skipping breakpoint: stepping past insn at: 0x15555607b8 > infrun: clear_step_over_info > Cannot access memory at address 0x0 > (gdb) > > and GDB does not trigger the loop at this point if `gdbserver' goes away > as the asynchronous event loop has already been disabled. > > gdb/ > * infrun.c (resume): In the `catch' clause also unregister the > inferior from the event loop. > --- > Hi, > > It might be that the non-stop mode requires additional clean-up here, > however I have no way to work with that at the moment, so I'll be leaving > any investigation to someone who has and will be inclined to look into it. > > Maciej > > New change in v2. > --- > gdb/infrun.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > gdb-resume-async.diff > Index: binutils-gdb/gdb/infrun.c > =================================================================== > --- binutils-gdb.orig/gdb/infrun.c > +++ binutils-gdb/gdb/infrun.c > @@ -2614,11 +2614,17 @@ resume (gdb_signal sig) > } > catch (const gdb_exception &ex) > { > - /* If resuming is being aborted for any reason, delete any > - single-step breakpoint resume_1 may have created, to avoid > - confusing the following resumption, and to avoid leaving > - single-step breakpoints perturbing other threads, in case > - we're running in non-stop mode. */ > + /* If resuming is being aborted for any reason, and we are in > + the all-stop mode, then unregister the inferior from the event > + loop. This is done so that when the inferior is not running > + we don't get distracted by spurious inferior output. */ > + if (!non_stop && target_has_execution && target_can_async_p ()) > + target_async (0); > + > + /* Also delete any single-step breakpoint resume_1 may have > + created, to avoid confusing the following resumption, and to > + avoid leaving single-step breakpoints perturbing other threads, > + in case we're running in non-stop mode. */ > if (inferior_ptid != null_ptid) > { > thread_info *tp = inferior_thread (); > This looks reasonable, though the call to target_async (0) is always a bit cryptic to me. Makes me wonder if we're missing such invocations in other places to make GDB more recoverable on failures. Does this fix also address 02/04? Since 02/04 is also a failure during resume? Not sure if it goes through the catch block.
Index: binutils-gdb/gdb/infrun.c =================================================================== --- binutils-gdb.orig/gdb/infrun.c +++ binutils-gdb/gdb/infrun.c @@ -2614,11 +2614,17 @@ resume (gdb_signal sig) } catch (const gdb_exception &ex) { - /* If resuming is being aborted for any reason, delete any - single-step breakpoint resume_1 may have created, to avoid - confusing the following resumption, and to avoid leaving - single-step breakpoints perturbing other threads, in case - we're running in non-stop mode. */ + /* If resuming is being aborted for any reason, and we are in + the all-stop mode, then unregister the inferior from the event + loop. This is done so that when the inferior is not running + we don't get distracted by spurious inferior output. */ + if (!non_stop && target_has_execution && target_can_async_p ()) + target_async (0); + + /* Also delete any single-step breakpoint resume_1 may have + created, to avoid confusing the following resumption, and to + avoid leaving single-step breakpoints perturbing other threads, + in case we're running in non-stop mode. */ if (inferior_ptid != null_ptid) { thread_info *tp = inferior_thread ();