Testsuite: Catch gdbserver socket listen errors

Message ID 20190226142856.35527-1-alan.hayward@arm.com
State New, archived

Commit Message

Alan Hayward Feb. 26, 2019, 2:28 p.m. UTC
  [This is a replacement for the patch "gdbserver: Add option to ignore
SO_REUSEADDR".  The issue is the same, but the root cause is different].

When launching gdbserver, the testsuite checks for a bind failure but
does not check for a failure to listen on the socket (which can happen
when another gdbserver binds to the same port at the same time).
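
The bind/listen distinction can be reproduced outside gdbserver.  The
following is a minimal Python sketch, not gdbserver code: it assumes both
servers set SO_REUSEADDR before binding (as the title of the earlier patch
suggests gdbserver does), and the function name is made up for
illustration.  On Linux the second socket is expected to bind without
error and only fail once it calls listen; other systems may already
reject the second bind.

```python
import errno
import socket


def second_server_failure(host="127.0.0.1"):
    """Return (step, errno) for the step at which a second server fails.

    step is "bind", "listen", or "none" if both sockets ended up listening.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Assumption: both servers enable SO_REUSEADDR before binding.
        for sock in (first, second):
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

        # Let the kernel pick a free port, then aim both sockets at it.
        first.bind((host, 0))
        port = first.getsockname()[1]
        try:
            second.bind((host, port))
        except OSError as exc:
            return ("bind", exc.errno)

        # Only one of the two bound sockets can move to the listening state.
        first.listen(1)
        try:
            second.listen(1)
        except OSError as exc:
            return ("listen", exc.errno)
        return ("none", 0)
    finally:
        first.close()
        second.close()


step, err = second_server_failure()
print(step, errno.errorcode.get(err))
```

Which step fails is platform dependent, so the sketch reports it rather
than assuming it.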

When this error occurs, the test will ignore the error and connect GDB
to the failed port.  This may succeed and GDB will now be connected to
the gdbserver from another test.  This eventually causes both tests to
fail.

When running the testsuite with native-gdbserver across many cores,
this issue may happen once or twice, each occurrence causing random
failures in two .exp test scripts.

Example gdb.log output for the failure:

The testsuite successfully notices the failure to bind to port 2348.
It launches again with port 2349, which also fails, this time in listen.
The testsuite ignores this error and uses GDB to connect to the port,
which succeeds.

spawn /work/build/gdb/testsuite/../gdbserver/gdbserver --once localhost:2348 /work/build/gdb/testsuite/outputs/gdb.ada/arrayidx/p^M
Can't bind address: Address already in use.^M
Exiting^M
Port 2348 is already in use.
spawn /work/build/gdb/testsuite/../gdbserver/gdbserver --once localhost:2349 /work/build/gdb/testsuite/outputs/gdb.ada/arrayidx/p^M
Can't listen on socket: Address already in use.^M
Exiting^M
target remote localhost:2349^M
Remote debugging using localhost:2349^M
Reading /lib/ld-linux-aarch64.so.1 from remote target...^M
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.^M
Reading /lib/ld-linux-aarch64.so.1 from remote target...^M
Reading symbols from target:/lib/ld-linux-aarch64.so.1...^M
Reading /lib/ld-2.23.so from remote target...^M
Reading /lib/.debug/ld-2.23.so from remote target...^M
Reading /work/build/install/lib/debug//lib/ld-2.23.so from remote target...^M
Reading /work/build/install/lib/debug/lib//ld-2.23.so from remote target...^M
Reading target:/work/build/install/lib/debug/lib//ld-2.23.so from remote target...^M
(No debugging symbols found in target:/lib/ld-linux-aarch64.so.1)^M
0x0000ffffbf6d2cc0 in ?? () from target:/lib/ld-linux-aarch64.so.1^M
(gdb) continue^M
Continuing.^M
Reading /lib/aarch64-linux-gnu/libc.so.6 from remote target...^M
Reading /lib/aarch64-linux-gnu/libc-2.23.so from remote target...^M
Reading /lib/aarch64-linux-gnu/.debug/libc-2.23.so from remote target...^M
Reading /work/build/install/lib/debug//lib/aarch64-linux-gnu/libc-2.23.so from remote target...^M
Reading /work/build/install/lib/debug/lib/aarch64-linux-gnu//libc-2.23.so from remote target...^M
Reading target:/work/build/install/lib/debug/lib/aarch64-linux-gnu//libc-2.23.so from remote target...^M
[Inferior 1 (process 35351) exited normally]^M
(gdb) FAIL: gdb.ada/arrayidx.exp: can't run to main
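
To find other occurrences of this pattern in an existing gdb.log, a small
scanner along the following lines may help.  It is a rough sketch: the
regular expressions are modeled only on the excerpt above, and the
function name is hypothetical rather than part of the testsuite.

```python
import re

# Patterns modeled on the gdb.log excerpt above; real logs may differ.
SPAWN = re.compile(r"spawn \S*gdbserver .*\blocalhost:(\d+)\b")
CONNECT = re.compile(r"target remote localhost:(\d+)")
LISTEN_FAIL = "Can't listen on socket: Address already in use."


def ports_used_after_listen_failure(log_text):
    """Return ports GDB connected to although gdbserver failed to listen."""
    failed = set()   # ports whose most recent gdbserver spawn failed in listen
    current = None   # port of the most recent gdbserver spawn
    hits = []
    for line in log_text.splitlines():
        spawn = SPAWN.match(line)
        if spawn:
            current = int(spawn.group(1))
            failed.discard(current)  # a fresh spawn resets the port's state
            continue
        if line.startswith(LISTEN_FAIL) and current is not None:
            failed.add(current)
            continue
        connect = CONNECT.search(line)
        if connect and int(connect.group(1)) in failed:
            hits.append(int(connect.group(1)))
    return hits
```

Called on the text of a gdb.log, it returns the ports for which a
"target remote" followed a failed listen; an empty list means the symptom
was not seen.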

Meanwhile, in another test, gdbserver successfully starts listening on
port 2349.  GDB then tries to connect to the port, but it times out
because the GDB in the test above has already connected to it.

spawn /work/build/gdb/testsuite/../gdbserver/gdbserver --once localhost:2348 /work/build/gdb/testsuite/outputs/gdb.ada/rdv_wait/foo^M
Can't bind address: Address already in use.^M
Exiting^M
Port 2348 is already in use.
spawn /work/build/gdb/testsuite/../gdbserver/gdbserver --once localhost:2349 /work/build/gdb/testsuite/outputs/gdb.ada/rdv_wait/foo^M
Process /work/build/gdb/testsuite/outputs/gdb.ada/rdv_wait/foo created; pid = 65162^M
Listening on port 2349^M
Remote debugging from host 127.0.0.1, port 45154^M
target remote localhost:2349^M
localhost:2349: Connection timed out.^M
(gdb) ^CQuit^M
(gdb) task 2^M
Cannot inspect Ada tasks when program is not running^M

2019-02-26  Alan Hayward  <alan.hayward@arm.com>

	* lib/gdbserver-support.exp (gdbserver_start): Check for listen
	failure.
---
 gdb/testsuite/lib/gdbserver-support.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Sergio Durigan Junior Feb. 27, 2019, 4:20 p.m. UTC | #1
On Tuesday, February 26 2019, Alan Hayward wrote:

> [This is a replacement for the patch "gdbserver: Add option to ignore
> SO_REUSEADDR".  The issue is the same, but the root cause is different].
>
> When launching gdbserver, the testsuite checks for binding failure but
> does not check for failure to listen to socket error (which can happen
> due to another gdbserver binding to the socket at the same time).
>
> When this error occurs, the test will ignore the error and connect GDB
> to the failed port.  This may succeed and GDB will now be connected to
> the gdbserver from another test.  This eventually causes both tests to
> fail.
>
> When running the tests suite with native-gdbserver across many cores,
> this issue may happen once or twice, each causing random failures for
> two .exp testscripts.

I remember having to touch this code when I was implementing the IPv6
feature, and I think your patch makes perfect sense.  It LGTM, but I'm
not a global maintainer.

Pedro Alves Feb. 27, 2019, 7:32 p.m. UTC | #2
On 02/26/2019 02:28 PM, Alan Hayward wrote:
> [This is a replacement for the patch "gdbserver: Add option to ignore
> SO_REUSEADDR".  The issue is the same, but the root cause is different].

Great, I was scratching my head over that one.

> Meanwhile, at the same time, in another test, gdbserver successfully
> connects to port 2349.  GDB then trys to connect to the port, but it

Typo: "tries" :-D

Patch is OK.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/testsuite/lib/gdbserver-support.exp b/gdb/testsuite/lib/gdbserver-support.exp
index 05234c43bd..dbd885aa22 100644
--- a/gdb/testsuite/lib/gdbserver-support.exp
+++ b/gdb/testsuite/lib/gdbserver-support.exp
@@ -317,7 +317,7 @@  proc gdbserver_start { options arguments } {
 	    -timeout 120
 	    -notransfer
 	    -re "Listening on" { }
-	    -re "Can't bind address: Address already in use\\.\r\n" {
+	    -re "Can't (bind address|listen on socket): Address already in use\\.\r\n" {
 		verbose -log "Port $portnum is already in use."
 		if ![target_info exists gdb,socketport] {
 		    # Bump the port number to avoid the conflict.
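
The effect of the one-line pattern change on port selection can be
illustrated with a small simulation.  This is a Python sketch, not the
Expect code: the two regular expressions are the Tcl patterns from the
diff rewritten as Python raw strings, while pick_port, its arguments and
the third simulated output are assumptions made for the example.  Output
that does not match the conflict pattern is treated as "proceed with this
port", which mirrors what the log in the commit message shows.

```python
import re

# The old and new conflict patterns from the diff, as Python raw strings.
OLD = re.compile(r"Can't bind address: Address already in use\.\r\n")
NEW = re.compile(
    r"Can't (bind address|listen on socket): Address already in use\.\r\n"
)


def pick_port(outputs, conflict_re, first_port=2348):
    """Return the port the harness would hand to GDB.

    outputs holds one simulated gdbserver output per spawn attempt.
    """
    port = first_port
    for out in outputs:
        if conflict_re.search(out):
            port += 1      # bump the port number and spawn again
            continue
        return port        # no conflict recognized: use this port
    return None            # ran out of simulated spawn attempts


outputs = [
    "Can't bind address: Address already in use.\r\nExiting\r\n",
    "Can't listen on socket: Address already in use.\r\nExiting\r\n",
    # Hypothetical successful third attempt, not taken from the log.
    "Process p created; pid = 1\r\nListening on port 2350\r\n",
]
print(pick_port(outputs, OLD), pick_port(outputs, NEW))
```

With the old pattern the simulated harness settles on the port whose
listen failed; with the new pattern it moves on to the next port.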