[v2,3/4] Remove breakpoint step-over information if failed to resume

Message ID alpine.LFD.2.21.1911051718280.13542@redsun52.ssa.fujisawa.hgst.com
State New, archived

Commit Message

Maciej W. Rozycki Nov. 6, 2019, 8:52 p.m. UTC
Complement commit 31e77af205cf ("PR breakpoints/7143 - Watchpoint does 
not trigger when first set") and commit 71d378ae60a4 
("gdb.base/breakpoint-in-ro-region.exp regression on sss targets (PR 
gdb/22583)") by also removing breakpoint step-over information if the 
inferior has failed to resume, which may happen, for example, due to a 
failure to insert a software breakpoint at an inaccessible location.  If 
the information is not removed, GDB's state becomes inconsistent and 
further attempts to resume execution hang.

This was observed with a `riscv64-linux-gnu' cross-GDB and a RISC-V/Linux 
`gdbserver' port under development, when attempting to step over a `ret' 
(aka `c.jr ra') instruction where the value held in the `ra' (aka `x1') 
register, and therefore the address at which to insert the software 
single-step breakpoint, was 0.  Here's a record of a remote debug session 
leading to the issue:

Sending packet: $vCont?#49...Packet received: vCont;c;C;t
Packet vCont (verbose-resume) is supported
Sending packet: $vCont;c:p23cb.-1#08...Packet received: T05swbreak:;02:20da080000000000;20:b807565515000000;thread:p23cb.23cb;core:3;
Sending packet: $qXfer:libraries-svr4:read::0,1000#20...Packet received: l<library-list-svr4 version="1.0" main-lm="0x155556f160"><library name=".../sysroot/lib/ld-linux-riscv64-lp64d.so.1" lm="0x155556ea18" l_addr="0x1555556000" l_ld="0x155556de90"/><library name="linux-vdso.so.1" lm="0x155556f6f0" l_addr="0x1555570000" l_ld="0x1555570350"/></library-list-svr4>
Sending packet: $X15555607b8,2:\202\200#2e...Packet received: OK
Sending packet: $m15555607b8,2#07...Packet received: 8280
Sending packet: $m15555607b8,2#07...Packet received: 8280
Sending packet: $g#67...Packet received: 0000000000000000000000000000000020da08000000000000000000000000000000000000000000d0ae040000000000696e74000000000000000000000000000100000000000000020000000000000080d708000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000000e408000000000020e908000000000000000000000000000000000000000000d0ae04000000000072697363765f646f00000000000000000100000000000000030000000000000000e408000000000020f208000000000000000000000000000000000000000000d0ae04000000000072697363765f646f0000000000000000[552 bytes omitted]
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
Cannot access memory at address 0x0
(gdb) stepi
Sending packet: $m15555607b8,4#09...Packet received: 82802a87
Sending packet: $m15555607b4,4#05...Packet received: 01452dbd
^Cremote_pass_ctrlc called
remote_interrupt called
^Cremote_pass_ctrlc called
Interrupted while waiting for the program.
Give up waiting? (y or n) y
Quit
(gdb) 

As observed here, the `stepi' command does not even attempt to reinsert 
the breakpoint, let alone resume execution.
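For context, the address 0 follows from how software single-stepping 
handles `c.jr ra': the instruction jumps to the address held in `ra', so 
that is where the single-step breakpoint has to go.  The minimal C++ 
sketch below decodes the instruction read back in the log (bytes 82 80, 
i.e. 0x8082) and computes that target, using `x1' = 0 as shown in the `g' 
reply.  `c_jr_target' and `failing_session_target' are hypothetical 
helpers written for illustration, not GDB's actual RISC-V next-PC code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: if INSN is the RVC instruction C.JR (of which
// `c.jr ra', aka `ret', is one instance), return the address execution
// continues at, i.e. where a software single-step breakpoint must be
// placed.  Otherwise return an empty optional.
static std::optional<uint64_t>
c_jr_target (uint16_t insn, const uint64_t regs[32])
{
  unsigned funct4 = (insn >> 12) & 0xf;
  unsigned rs1 = (insn >> 7) & 0x1f;
  unsigned rs2 = (insn >> 2) & 0x1f;
  unsigned op = insn & 0x3;

  // C.JR: funct4 == 0b1000, rs2 == 0, op == 0b10; rs1 == 0 is reserved.
  if (funct4 != 0x8 || rs2 != 0 || op != 0x2 || rs1 == 0)
    return std::nullopt;

  // C.JR expands to `jalr x0, 0(rs1)', which clears bit 0 of the target.
  return regs[rs1] & ~uint64_t (1);
}

// The situation from the session log: the instruction at the PC reads
// back as bytes 82 80 (little-endian 0x8082, `c.jr ra') and the `g'
// reply shows `x1' (`ra') as 0.
static uint64_t
failing_session_target ()
{
  uint64_t regs[32] = {};  // all zero, including x1 (ra)
  regs[2] = 0x8da20;       // x2 (sp), as reported by the stub
  std::optional<uint64_t> target = c_jr_target (0x8082, regs);
  assert (target.has_value ());
  return *target;  // 0: the breakpoint would have to go at address 0
}
```

Any attempt to plant a breakpoint at that address then fails, which 
matches the `m0,40' and `m0,1' requests answered with `E01' above.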

Correct the issue by also calling `clear_step_over_info' to clear the 
global breakpoint step-over information in the exception `catch' clause 
in `resume'.
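The failure mode and the fix can be illustrated with a heavily 
simplified model, assuming the step-over bookkeeping amounts to a single 
global record of which thread is stepping over a breakpoint.  All names 
below are hypothetical stand-ins that only echo those in infrun.c, and 
`resume' takes a `clear_on_failure' flag purely so that the behaviour 
before and after the change can be compared.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of the global step-over record; -1 means none.
struct step_over_record
{
  int thread = -1;
};

static step_over_record step_over_info;

static void
set_step_over_info (int thread)
{
  step_over_info.thread = thread;
}

static void
clear_step_over_info ()
{
  step_over_info.thread = -1;
}

static bool
step_over_info_valid_p ()
{
  return step_over_info.thread != -1;
}

// Stand-in for software single-step breakpoint insertion; as in the
// session log, it fails for address 0.
static void
insert_single_step_breakpoint (unsigned long addr)
{
  if (addr == 0)
    throw std::runtime_error ("Cannot access memory at address 0x0");
}

// On success the record stays in place, since the step-over is now in
// progress; this model does not cover its completion.
static void
resume (int thread, unsigned long next_pc, bool clear_on_failure)
{
  // Starting a new step-over while a stale one is still recorded is the
  // inconsistent state that makes GDB hang; here it trips an assert.
  assert (!step_over_info_valid_p ());

  set_step_over_info (thread);
  try
    {
      insert_single_step_breakpoint (next_pc);
    }
  catch (...)
    {
      // The change under review: drop the step-over record as well, so
      // that a later attempt starts from a clean state.
      if (clear_on_failure)
	clear_step_over_info ();
      throw;
    }
}
```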

With the change applied, further `stepi' commands correctly retry 
breakpoint insertion:

(gdb) stepi
Sending packet: $m15555607b8,4#09...Packet received: 82802a87
Sending packet: $m15555607b4,4#05...Packet received: 01452dbd
Sending packet: $m15555607b8,2#07...Packet received: 8280
Sending packet: $m15555607b8,2#07...Packet received: 8280
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
Sending packet: $m0,40#2d...Packet received: E01
Sending packet: $m0,1#fa...Packet received: E01
Cannot access memory at address 0x0
(gdb)

and changing the value of the `pc' register so that it no longer points 
at a `ret' instruction allows execution to actually resume.

	gdb/
	* infrun.c (resume): Also call `clear_step_over_info' in the 
	`catch' clause.
---
Changes from v1:

- Add a thread number argument to `clear_step_over_info'.
---
 gdb/infrun.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

gdb-resume-clear-step-over-info.diff
  

Comments

Simon Marchi Jan. 21, 2020, 5:47 a.m. UTC | #1
On 2019-11-06 3:52 p.m., Maciej W. Rozycki wrote:
> [...]

That looks ok to me, but again, I'd prefer if Pedro could take a look.

Simon
  
Luis Machado Feb. 19, 2020, 1:30 p.m. UTC | #2
On 11/6/19 5:52 PM, Maciej W. Rozycki wrote:
> [...]
> Index: binutils-gdb/gdb/infrun.c
> ===================================================================
> --- binutils-gdb.orig/gdb/infrun.c
> +++ binutils-gdb/gdb/infrun.c
> @@ -2620,7 +2620,12 @@ resume (gdb_signal sig)
>   	 single-step breakpoints perturbing other threads, in case
>   	 we're running in non-stop mode.  */
>         if (inferior_ptid != null_ptid)
> -	delete_single_step_breakpoints (inferior_thread ());
> +	{
> +	  thread_info *tp = inferior_thread ();
> +
> +	  delete_single_step_breakpoints (tp);
> +	  clear_step_over_info (tp->global_num);
> +	}
>         throw;
>       }
>   }
> 

I'd suggest the same approach of checking whether the thread has 
step-over information and, if so, clearing the state, without the need 
to add a global thread number parameter to clear_step_over_info.
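The suggested alternative could look roughly like this: keep 
`clear_step_over_info' parameterless and have the `catch' clause call it 
only when the failing thread is the one recorded as stepping over a 
breakpoint.  This is a simplified, self-contained model; 
`thread_is_stepping_over_breakpoint' and `cleanup_after_failed_resume' 
are illustrative names, and the structures are reduced to the single 
field needed here.

```cpp
#include <cassert>

// Hypothetical, reduced thread descriptor.
struct thread_info
{
  int global_num;
};

// Hypothetical global step-over record; -1 means none.
struct step_over_record
{
  int thread = -1;
};

static step_over_record step_over_info;

static void
clear_step_over_info ()
{
  step_over_info.thread = -1;
}

// True if THREAD is the one whose step-over is currently recorded.
static bool
thread_is_stepping_over_breakpoint (int thread)
{
  return step_over_info.thread != -1 && step_over_info.thread == thread;
}

// What the `catch' clause in `resume' could do under this approach.
static void
cleanup_after_failed_resume (thread_info *tp)
{
  assert (tp != nullptr);
  if (thread_is_stepping_over_breakpoint (tp->global_num))
    clear_step_over_info ();
}
```

In this model a record belonging to a different thread is left alone, 
and the clearing function keeps its original, argument-less signature.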

Otherwise this looks reasonable to me.

It would've been nice to have more robust rollback logic to handle such 
cases of failure.
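One general way to get such rollback in C++ is an RAII guard that undoes 
a state change on scope exit unless the operation is explicitly 
committed, so that every exit path through an exception is covered 
without a hand-written `catch' clause.  The sketch below is plain 
standard C++ written for illustration; `rollback_guard', 
`step_over_thread' and `resume_with_rollback' are hypothetical names, 
not GDB facilities.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Runs an undo action when it goes out of scope, unless release () has
// been called first.  The undo action must not throw, since it runs from
// a destructor, possibly during stack unwinding.
class rollback_guard
{
public:
  explicit rollback_guard (std::function<void ()> undo)
    : m_undo (std::move (undo))
  {
    assert (m_undo);
  }

  ~rollback_guard ()
  {
    if (m_undo)
      m_undo ();
  }

  rollback_guard (const rollback_guard &) = delete;
  rollback_guard &operator= (const rollback_guard &) = delete;

  // Commit: keep the effects of the guarded operation.
  void release ()
  {
    m_undo = nullptr;
  }

private:
  std::function<void ()> m_undo;
};

// Hypothetical global step-over record; -1 means none.
static int step_over_thread = -1;

static void
resume_with_rollback (int thread, bool insertion_fails)
{
  step_over_thread = thread;
  rollback_guard guard ([] { step_over_thread = -1; });

  if (insertion_fails)
    throw std::runtime_error ("Cannot access memory at address 0x0");

  guard.release ();  // success: keep the step-over record
}
```

Each additional piece of state set up before the risky call would get 
its own guard, so the undo steps run in reverse order of setup.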
  

Patch

Index: binutils-gdb/gdb/infrun.c
===================================================================
--- binutils-gdb.orig/gdb/infrun.c
+++ binutils-gdb/gdb/infrun.c
@@ -2620,7 +2620,12 @@ resume (gdb_signal sig)
 	 single-step breakpoints perturbing other threads, in case
 	 we're running in non-stop mode.  */
       if (inferior_ptid != null_ptid)
-	delete_single_step_breakpoints (inferior_thread ());
+	{
+	  thread_info *tp = inferior_thread ();
+
+	  delete_single_step_breakpoints (tp);
+	  clear_step_over_info (tp->global_num);
+	}
       throw;
     }
 }