[RFC] fix gdbserver channel leaks

Message ID 15b31a68-82f4-b9e9-7b9a-d9a82f8903d9@codesourcery.com
State New, archived
Headers

Commit Message

Sandra Loosemore Jan. 16, 2019, 9:18 p.m. UTC
  I've been trying to remove some ancient cruft in our local remote-target 
gdbserver test harness that has caused problems with some newer GDB 
tests.  This has exposed what I think is a problem in GDB's own 
gdbserver-support.exp:  having opened a channel to the gdbserver spawn 
in gdbserver_start, in many cases it is failing to close it cleanly, 
causing it to leak so many open channels that eventually TCL runs out 
and starts giving ERRORs.  Specifically, gdbserver_start is discarding 
any already-open server_spawn_id without closing it when opening a new 
channel, and the logic in gdbserver_gdb_exit that tries to clean up the 
gdbserver connection is failing to actually close the channel before 
discarding server_spawn_id.

I've thought it best to consolidate the logic to manage server_spawn_id 
in start_gdbserver and close_gdbserver.  It did seem plausible to put 
the logic to shut down any old server_spawn_id channel before opening a 
new one in gdbserver_run instead of in start_gdbserver, but there are 
places that bypass that function (e.g., in the mi test support).

The alternative to fixing this in GDB's own test support seems to be 
doing it at the DejaGnu board level instead, which is what I'm trying to 
get rid of in our local test harness.  There's no documentation about 
the expectations and it seems pretty hacky and a violation of 
abstractions to do it in the low-level board support.

I'm wondering, can other people who do remote-target gdbserver testing 
help try out this patch and see if it works for them?  I've gotten good 
results with my nios2-linux-gnu test setup but I don't know what kind of 
board support other people might be using.  Also, assuming the patch is 
OK, I don't know if it's inappropriate or too late for the 8.3 release.

-Sandra
  

Comments

Tom Tromey Jan. 17, 2019, 4:02 p.m. UTC | #1
>>>>> "Sandra" == Sandra Loosemore <sandra@codesourcery.com> writes:

Sandra> I've thought it best to consolidate the logic to manage
Sandra> server_spawn_id in start_gdbserver and close_gdbserver.  It did seem
Sandra> plausible to put the logic to shut down any old server_spawn_id
Sandra> channel before opening a new one in gdbserver_run instead of in
Sandra> start_gdbserver, but there are places that bypass that function (e.g.,
Sandra> in the mi test support).

FWIW this patch looks reasonable to me.

Sandra> I'm wondering, can other people who do remote-target gdbserver testing
Sandra> help try out this patch and see if it works for them?

I can't really help with that, though, I'm afraid.

Tom
  
Pedro Alves Jan. 17, 2019, 4:12 p.m. UTC | #2
On 01/16/2019 09:18 PM, Sandra Loosemore wrote:
> I've been trying to remove some ancient cruft in our local remote-target gdbserver test harness that has caused problems with some newer GDB tests.  This has exposed what I think is a problem in GDB's own gdbserver-support.exp:  having opened a channel to the gdbserver spawn in gdbserver_start, in many cases it is failing to close it cleanly, causing it to leak so many open channels that eventually TCL runs out and starts giving ERRORs.  Specifically, gdbserver_start is discarding any already-open server_spawn_id without closing it when opening a new channel, and the logic in gdbserver_gdb_exit that tries to clean up the gdbserver connection is failing to actually close the channel before discarding server_spawn_id.

When I read this, a question comes to mind:

How does it happen that we're starting a new gdbserver
without closing the original one?  It sounds like this may
be papering over a bug elsewhere?  Shouldn't that be
an error instead?  Why are we "failing to close it cleanly"?

I mean, default_gdb_spawn doesn't start a new gdb if
there's already one either.

Thanks,
Pedro Alves

> 
> I've thought it best to consolidate the logic to manage server_spawn_id in start_gdbserver and close_gdbserver.  It did seem plausible to put the logic to shut down any old server_spawn_id channel before opening a new one in gdbserver_run instead of in start_gdbserver, but there are places that bypass that function (e.g., in the mi test support).
> 
> The alternative to fixing this in GDB's own test support seems to be doing it at the DejaGnu board level instead, which is what I'm trying to get rid of in our local test harness.  There's no documentation about the expectations and it seems pretty hacky and a violation of abstractions to do it in the low-level board support.
> 
> I'm wondering, can other people who do remote-target gdbserver testing help try out this patch and see if it works for them?  I've gotten good results with my nios2-linux-gnu test setup but I don't know what kind of board support other people might be using.  Also, assuming the patch is OK, I don't know if it's inappropriate or too late for the 8.3 release.
  
Sandra Loosemore Jan. 17, 2019, 7:32 p.m. UTC | #3
On 1/17/19 9:12 AM, Pedro Alves wrote:
> On 01/16/2019 09:18 PM, Sandra Loosemore wrote:
>> I've been trying to remove some ancient cruft in our local remote-target gdbserver test harness that has caused problems with some newer GDB tests.  This has exposed what I think is a problem in GDB's own gdbserver-support.exp:  having opened a channel to the gdbserver spawn in gdbserver_start, in many cases it is failing to close it cleanly, causing it to leak so many open channels that eventually TCL runs out and starts giving ERRORs.  Specifically, gdbserver_start is discarding any already-open server_spawn_id without closing it when opening a new channel, and the logic in gdbserver_gdb_exit that tries to clean up the gdbserver connection is failing to actually close the channel before discarding server_spawn_id.
> 
> When I read this, a question comes to mind:
> 
> How does it happen that we're starting a new gdbserver
> without closing the original one?  It sounds like this may
> be papering over a bug elsewhere?  Shouldn't that be
> an error instead?  Why are we "failing to close it cleanly"?
> 
> I mean, default_gdb_spawn doesn't start a new gdb if
> there's already one either.

Well, I initially thought this should be an error too.  But here is one 
example I tracked down where the error was triggering.  (There might be 
others as well.)

gdbserver_gdb_load (in config/gdbserver.exp) calls gdbserver_spawn. 
We're using our own hacked-up version but it was derived from from this 
source; presumably this behavior is part of the interface description of 
what this function is supposed to do.

mi_gdb_target_load calls gdbserver_gdb_load.  It issues the "kill" 
command to kill any already-running program first, but doesn't do 
anything to close the open gdbserver channel.

mi_run_cmd calls mi_gdb_target_load indirectly via mi_run_cmd_full.

mi_runto calls mi_run_cmd indirectly via mi_runto_helper.

There are a lot of individual tests that call mi_runto without doing 
anything to shut down gdbserver first.

-Sandra
  
Sandra Loosemore Jan. 25, 2019, 6:36 p.m. UTC | #4
Ping.  Any additional thoughts on this?

On 1/17/19 12:32 PM, Sandra Loosemore wrote:
> On 1/17/19 9:12 AM, Pedro Alves wrote:
>> On 01/16/2019 09:18 PM, Sandra Loosemore wrote:
>>> I've been trying to remove some ancient cruft in our local 
>>> remote-target gdbserver test harness that has caused problems with 
>>> some newer GDB tests.  This has exposed what I think is a problem in 
>>> GDB's own gdbserver-support.exp:  having opened a channel to the 
>>> gdbserver spawn in gdbserver_start, in many cases it is failing to 
>>> close it cleanly, causing it to leak so many open channels that 
>>> eventually TCL runs out and starts giving ERRORs.  Specifically, 
>>> gdbserver_start is discarding any already-open server_spawn_id 
>>> without closing it when opening a new channel, and the logic in 
>>> gdbserver_gdb_exit that tries to clean up the gdbserver connection is 
>>> failing to actually close the channel before discarding server_spawn_id.
>>
>> When I read this, a question comes to mind:
>>
>> How does it happen that we're starting a new gdbserver
>> without closing the original one?  It sounds like this may
>> be papering over a bug elsewhere?  Shouldn't that be
>> an error instead?  Why are we "failing to close it cleanly"?
>>
>> I mean, default_gdb_spawn doesn't start a new gdb if
>> there's already one either.
> 
> Well, I initially thought this should be an error too.  But here is one 
> example I tracked down where the error was triggering.  (There might be 
> others as well.)
> 
> gdbserver_gdb_load (in config/gdbserver.exp) calls gdbserver_spawn. 
> We're using our own hacked-up version but it was derived from from this 
> source; presumably this behavior is part of the interface description of 
> what this function is supposed to do.
> 
> mi_gdb_target_load calls gdbserver_gdb_load.  It issues the "kill" 
> command to kill any already-running program first, but doesn't do 
> anything to close the open gdbserver channel.
> 
> mi_run_cmd calls mi_gdb_target_load indirectly via mi_run_cmd_full.
> 
> mi_runto calls mi_run_cmd indirectly via mi_runto_helper.
> 
> There are a lot of individual tests that call mi_runto without doing 
> anything to shut down gdbserver first.
> 
> -Sandra
  

Patch

diff --git a/gdb/testsuite/lib/gdbserver-support.exp b/gdb/testsuite/lib/gdbserver-support.exp
index 05234c4..42ccbfa 100644
--- a/gdb/testsuite/lib/gdbserver-support.exp
+++ b/gdb/testsuite/lib/gdbserver-support.exp
@@ -303,7 +303,12 @@  proc gdbserver_start { options arguments } {
 	    append gdbserver_command " $arguments"
 	}
 
+	# Cleanly shut down any existing gdbserver spawn before
+	# starting a new one.
 	global server_spawn_id
+	if { [info exists server_spawn_id] } {
+	    close_gdbserver
+	}
 	set server_spawn_id [remote_spawn target $gdbserver_command]
 
 	# GDBserver doesn't do inferior I/O through GDB.  But we can
@@ -422,8 +427,7 @@  proc gdbserver_gdb_exit { is_mi } {
 		    exp_continue
 		}
 		-i "$server_spawn_id" eof {
-		    wait -i $expect_out(spawn_id)
-		    unset server_spawn_id
+		    # close_gdbserver below will clean up the spawn state.
 		}
 	    }
 	}