gdbserver-support: Handle gdbserver start failures

Message ID alpine.DEB.1.10.1408212053570.2958@tp.orcam.me.uk
State Committed
Headers

Commit Message

Maciej W. Rozycki Aug. 26, 2014, 5:14 p.m. UTC
  On Tue, 29 Jul 2014, Pedro Alves wrote:

> On 07/28/2014 08:34 PM, Maciej W. Rozycki wrote:
> 
> > gdb-test-gdbserver-start-timeout.diff
> > Index: gdb-fsf-trunk-quilt/gdb/testsuite/lib/gdbserver-support.exp
> > ===================================================================
> > --- gdb-fsf-trunk-quilt.orig/gdb/testsuite/lib/gdbserver-support.exp	2014-05-13 02:52:11.347706187 +0100
> > +++ gdb-fsf-trunk-quilt/gdb/testsuite/lib/gdbserver-support.exp	2014-05-30 01:45:51.658977074 +0100
> > @@ -275,6 +275,7 @@ proc gdbserver_start { options arguments
> >  	# Wait for the server to open its TCP socket, so that GDB can connect.
> >  	expect {
> >  	    -i $server_spawn_id
> > +	    -timeout 120
> >  	    -notransfer
> >  	    -re "Listening on" { }
> >  	    -re "Can't bind address: Address already in use\\.\r\n" {
> > 
> 
> OK.
> 
> Wouldn't it be good to add a 'timeout {...}' case that emits a
> warning or some such?

 Good point.  I actually went further than that.

 As it happens we have a board that fails a gdb.base/gcore-relro.exp test 
case reproducibly and moreover the case appears to trigger a kernel bug 
making the it less than usable.  Specifically the board remains responsive 
to some extent, however processes do not appear to be able to successfully 
complete termination anymore and perhaps more importantly further 
gdbserver processes can be started, but they never reach the stage of 
listening on the RSP socket.

 This change handles timeouts in gdbserver start properly, by throwing a 
TCL error exception when gdbserver does not report listening on the RSP 
socket in time.  This is then caught at the outer level and reported, and 
2 rather than 1 is returned so that the caller may tell the failure to 
start gdbserver and other issues apart and act accordingly (or do 
nothing).

 I thought letting the exception unwind further on might be a good idea 
for any test harnesses out there to break outright where a gdbserver start 
error is silently ignored right now, however I figured out the calls to 
gdbserver-support.exp are buried down too deep in the GDB test suite for 
such a change to be made easily.  I think returning a distinct return 
value is good enough (the API says "non-zero", so 2 is as good as 1) and 
we can always make the error harder in a later step if required.

 With config/gdbserver.exp being used this change remains transparent to 
the target board, the return value is passed up by gdb_reload and the 
error exception unwinds through gdbserver_gdb_load and is caught and 
handled by mi_gdb_target_load.  A call to perror is still made, reporting 
the timeout, and in the case of mi_gdb_target_load the procedure returns a 
value denoting unsuccessful completion.  An unsuccessful completion of 
gdb_reload is already handled elsewhere.

 An alternative gdbserver board configuration can interpret the return 
value in its gdb_reload implementation and catch the error in 
gdbserver_gdb_load in an attempt to recover a target board that has gone 
astray, for example by rebooting the board somehow.  This has proved 
effective with our failing board, that now completes the remaining test 
cases with no further hiccups.

 I pushed it through regression testing with the powerpc-linux-gnu target 
and a some half a dozen of multilibs (including ones that trip over the 
faulty kernel on the problematic board) and there were no issues.  The 
full list of the offending test cases is as follows:

FAIL: gdb.base/gcore-relro.exp: save a corefile
FAIL: gdb.base/print-symbol-loading.exp: save a corefile
FAIL: gdb.threads/gcore-thread.exp: save a corefile

so essentially the same stuff in different places.

 OK to apply?

2014-08-26  Maciej W. Rozycki  <macro@codesourcery.com>

	gdb/testsuite/
	* lib/gdbserver-support.exp (gdbserver_start): Throw an error
	exception on timeout.
	(gdbserver_run): Catch any `gdbserver_spawn' error exceptions.
	(gdbserver_start_extended): Catch any `gdbserver_start' error
	exceptions.
	(gdbserver_start_multi, mi_gdbserver_start_multi): Likewise.
	* lib/mi-support.exp (mi_gdb_target_load): Catch any
	`gdbserver_gdb_load' error exceptions.

  Maciej

gdb-test-gdbserver-start-timeout-handle.diff
  

Comments

Maciej W. Rozycki Sept. 3, 2014, 12:39 p.m. UTC | #1
On Tue, 26 Aug 2014, Maciej W. Rozycki wrote:

> > Wouldn't it be good to add a 'timeout {...}' case that emits a
> > warning or some such?
> 
>  Good point.  I actually went further than that.
> 
>  As it happens we have a board that fails a gdb.base/gcore-relro.exp test 
> case reproducibly and moreover the case appears to trigger a kernel bug 
> making the it less than usable.  Specifically the board remains responsive 
> to some extent, however processes do not appear to be able to successfully 
> complete termination anymore and perhaps more importantly further 
> gdbserver processes can be started, but they never reach the stage of 
> listening on the RSP socket.
> 
>  This change handles timeouts in gdbserver start properly, by throwing a 
> TCL error exception when gdbserver does not report listening on the RSP 
> socket in time.  This is then caught at the outer level and reported, and 
> 2 rather than 1 is returned so that the caller may tell the failure to 
> start gdbserver and other issues apart and act accordingly (or do 
> nothing).
> 
>  I thought letting the exception unwind further on might be a good idea 
> for any test harnesses out there to break outright where a gdbserver start 
> error is silently ignored right now, however I figured out the calls to 
> gdbserver-support.exp are buried down too deep in the GDB test suite for 
> such a change to be made easily.  I think returning a distinct return 
> value is good enough (the API says "non-zero", so 2 is as good as 1) and 
> we can always make the error harder in a later step if required.
> 
>  With config/gdbserver.exp being used this change remains transparent to 
> the target board, the return value is passed up by gdb_reload and the 
> error exception unwinds through gdbserver_gdb_load and is caught and 
> handled by mi_gdb_target_load.  A call to perror is still made, reporting 
> the timeout, and in the case of mi_gdb_target_load the procedure returns a 
> value denoting unsuccessful completion.  An unsuccessful completion of 
> gdb_reload is already handled elsewhere.
> 
>  An alternative gdbserver board configuration can interpret the return 
> value in its gdb_reload implementation and catch the error in 
> gdbserver_gdb_load in an attempt to recover a target board that has gone 
> astray, for example by rebooting the board somehow.  This has proved 
> effective with our failing board, that now completes the remaining test 
> cases with no further hiccups.
> 
>  I pushed it through regression testing with the powerpc-linux-gnu target 
> and a some half a dozen of multilibs (including ones that trip over the 
> faulty kernel on the problematic board) and there were no issues.  The 
> full list of the offending test cases is as follows:
> 
> FAIL: gdb.base/gcore-relro.exp: save a corefile
> FAIL: gdb.base/print-symbol-loading.exp: save a corefile
> FAIL: gdb.threads/gcore-thread.exp: save a corefile
> 
> so essentially the same stuff in different places.
> 
>  OK to apply?
> 
> 2014-08-26  Maciej W. Rozycki  <macro@codesourcery.com>
> 
> 	gdb/testsuite/
> 	* lib/gdbserver-support.exp (gdbserver_start): Throw an error
> 	exception on timeout.
> 	(gdbserver_run): Catch any `gdbserver_spawn' error exceptions.
> 	(gdbserver_start_extended): Catch any `gdbserver_start' error
> 	exceptions.
> 	(gdbserver_start_multi, mi_gdbserver_start_multi): Likewise.
> 	* lib/mi-support.exp (mi_gdb_target_load): Catch any
> 	`gdbserver_gdb_load' error exceptions.

 Ping!

  Maciej
  
Pedro Alves Sept. 4, 2014, 9:30 a.m. UTC | #2
Hi Maciej,

On 08/26/2014 06:14 PM, Maciej W. Rozycki wrote:

>>> Index: gdb-fsf-trunk-quilt/gdb/testsuite/lib/gdbserver-support.exp
>>> ===================================================================
>>> --- gdb-fsf-trunk-quilt.orig/gdb/testsuite/lib/gdbserver-support.exp	2014-05-13 02:52:11.347706187 +0100
>>> +++ gdb-fsf-trunk-quilt/gdb/testsuite/lib/gdbserver-support.exp	2014-05-30 01:45:51.658977074 +0100
>>> @@ -275,6 +275,7 @@ proc gdbserver_start { options arguments
>>>  	# Wait for the server to open its TCP socket, so that GDB can connect.
>>>  	expect {
>>>  	    -i $server_spawn_id
>>> +	    -timeout 120
>>>  	    -notransfer
>>>  	    -re "Listening on" { }
>>>  	    -re "Can't bind address: Address already in use\\.\r\n" {
>>>
>>
>> OK.
>>
>> Wouldn't it be good to add a 'timeout {...}' case that emits a
>> warning or some such?
> 
>  Good point.  I actually went further than that.
> 
>  As it happens we have a board that fails a gdb.base/gcore-relro.exp test 
> case reproducibly and moreover the case appears to trigger a kernel bug 
> making the it less than usable.  Specifically the board remains responsive 
> to some extent, however processes do not appear to be able to successfully 
> complete termination anymore and perhaps more importantly further 
> gdbserver processes can be started, but they never reach the stage of 
> listening on the RSP socket.
> 
>  This change handles timeouts in gdbserver start properly, by throwing a 
> TCL error exception when gdbserver does not report listening on the RSP 
> socket in time.  This is then caught at the outer level and reported, and 
> 2 rather than 1 is returned so that the caller may tell the failure to 
> start gdbserver and other issues apart and act accordingly (or do 
> nothing).
> 
>  I thought letting the exception unwind further on might be a good idea 
> for any test harnesses out there to break outright where a gdbserver start 
> error is silently ignored right now, however I figured out the calls to 
> gdbserver-support.exp are buried down too deep in the GDB test suite for 
> such a change to be made easily.  I think returning a distinct return 
> value is good enough (the API says "non-zero", so 2 is as good as 1) and 
> we can always make the error harder in a later step if required.
> 
>  With config/gdbserver.exp being used this change remains transparent to 
> the target board, the return value is passed up by gdb_reload and the 
> error exception unwinds through gdbserver_gdb_load and is caught and 
> handled by mi_gdb_target_load.  A call to perror is still made, reporting 
> the timeout, and in the case of mi_gdb_target_load the procedure returns a 
> value denoting unsuccessful completion.  An unsuccessful completion of 
> gdb_reload is already handled elsewhere.
> 
>  An alternative gdbserver board configuration can interpret the return 
> value in its gdb_reload implementation and catch the error in 
> gdbserver_gdb_load in an attempt to recover a target board that has gone 
> astray, for example by rebooting the board somehow.  This has proved 
> effective with our failing board, that now completes the remaining test 
> cases with no further hiccups.
> 
>  I pushed it through regression testing with the powerpc-linux-gnu target 
> and a some half a dozen of multilibs (including ones that trip over the 
> faulty kernel on the problematic board) and there were no issues.  The 
> full list of the offending test cases is as follows:
> 
> FAIL: gdb.base/gcore-relro.exp: save a corefile
> FAIL: gdb.base/print-symbol-loading.exp: save a corefile
> FAIL: gdb.threads/gcore-thread.exp: save a corefile
> 
> so essentially the same stuff in different places.
> 
>  OK to apply?

OK.

Thanks,
Pedro Alves
  
Maciej W. Rozycki Sept. 9, 2014, 3:24 p.m. UTC | #3
On Thu, 4 Sep 2014, Pedro Alves wrote:

> >  I pushed it through regression testing with the powerpc-linux-gnu target 
> > and a some half a dozen of multilibs (including ones that trip over the 
> > faulty kernel on the problematic board) and there were no issues.  The 
> > full list of the offending test cases is as follows:
> > 
> > FAIL: gdb.base/gcore-relro.exp: save a corefile
> > FAIL: gdb.base/print-symbol-loading.exp: save a corefile
> > FAIL: gdb.threads/gcore-thread.exp: save a corefile
> > 
> > so essentially the same stuff in different places.
> > 
> >  OK to apply?
> 
> OK.

 Thanks for your review, I have applied this change now too.

  Maciej
  

Patch

Index: gdb-fsf-trunk-quilt/gdb/testsuite/lib/gdbserver-support.exp
===================================================================
--- gdb-fsf-trunk-quilt.orig/gdb/testsuite/lib/gdbserver-support.exp	2014-08-23 01:40:41.758957007 +0100
+++ gdb-fsf-trunk-quilt/gdb/testsuite/lib/gdbserver-support.exp	2014-08-23 01:45:06.788973638 +0100
@@ -287,6 +287,9 @@  proc gdbserver_start { options arguments
 		    continue
 		}
 	    }
+	    timeout {
+		error "Timeout waiting for gdbserver response."
+	    }
 	}
 	break
     }
@@ -326,7 +329,8 @@  proc gdbserver_spawn { child_args } {
 }
 
 # Start a gdbserver process running HOST_EXEC and pass CHILD_ARGS
-# to it.  Return 0 on success, or non-zero on failure.
+# to it.  Return 0 on success, or non-zero on failure: 2 if gdbserver
+# failed to start or 1 if we couldn't connect to it.
 
 proc gdbserver_run { child_args } {
     global gdbserver_protocol
@@ -347,7 +351,10 @@  proc gdbserver_run { child_args } {
 	}
     }
 
-    set res [gdbserver_spawn $child_args]
+    if { [catch { gdbserver_spawn $child_args } res] == 1 } {
+	perror $res
+	return 2
+    }
     set gdbserver_protocol [lindex $res 0]
     set gdbserver_gdbport [lindex $res 1]
 
@@ -377,7 +384,10 @@  proc gdbserver_start_extended { } {
     global gdbserver_gdbport
     global use_gdb_stub
 
-    set res [gdbserver_start "--multi" ""]
+    if { [catch { gdbserver_start "--multi" "" } res] == 1 } {
+	perror $res
+	return 2
+    }
     set gdbserver_protocol [lindex $res 0]
     if { [string first "extended-" $gdbserver_protocol] != 0} {
 	set gdbserver_protocol "extended-$gdbserver_protocol"
@@ -399,7 +409,10 @@  proc gdbserver_start_multi { } {
     global gdbserver_protocol
     global gdbserver_gdbport
 
-    set res [gdbserver_start "--multi" ""]
+    if { [catch { gdbserver_start "--multi" "" } res] == 1 } {
+	perror $res
+	return 2
+    }
     set gdbserver_protocol [lindex $res 0]
     set gdbserver_gdbport [lindex $res 1]
 
@@ -414,7 +427,10 @@  proc mi_gdbserver_start_multi { } {
     global gdbserver_protocol
     global gdbserver_gdbport
 
-    set res [gdbserver_start "--multi" ""]
+    if { [catch { gdbserver_start "--multi" "" } res] == 1 } {
+	perror $res
+	return 2
+    }
     set gdbserver_protocol [lindex $res 0]
     set gdbserver_gdbport [lindex $res 1]
 
Index: gdb-fsf-trunk-quilt/gdb/testsuite/lib/mi-support.exp
===================================================================
--- gdb-fsf-trunk-quilt.orig/gdb/testsuite/lib/mi-support.exp	2014-08-23 01:11:09.000000000 +0100
+++ gdb-fsf-trunk-quilt/gdb/testsuite/lib/mi-support.exp	2014-08-23 01:45:06.788973638 +0100
@@ -477,7 +477,10 @@  proc mi_gdb_target_load { } {
 
     if { [info procs gdbserver_gdb_load] != "" } {
 	mi_gdb_test "kill" ".*" ""
-	set res [gdbserver_gdb_load]
+	if { [catch gdbserver_gdb_load res] == 1 } {
+	    perror $res
+	    return -1
+	}
 	set protocol [lindex $res 0]
 	set gdbport [lindex $res 1]