[0/2] Better handling of slow remote transfers

Message ID 55D3DB83.4050204@redhat.com

Commit Message

Pedro Alves Aug. 19, 2015, 1:27 a.m. UTC
  On 08/18/2015 05:50 PM, Sandra Loosemore wrote:

> I would be happy to try to help debug what is going wrong here, but I'm
> not a GDB expert by any means and I'm totally unfamiliar with this part 
> of the code and don't know where to start.  If somebody has a theory on 
> this, is there someplace I could put in printfs to trace what is 
> happening?  From the previous discussion, it seemed like the flow 
> through the different layers of abstraction to what the RSP was actually 
> doing was very obscure, in terms of where the chunking is happening, etc.

That was just a misunderstanding.  It's not really obscure.  GDB has a file I/O
abstraction that wires the open/close/pread/pwrite syscalls to RSP packets.
Callers just need to use the target_fileio_open/pread etc. functions instead
of the usual syscalls.  Whenever BFD or GDB accesses a file whose name is
prefixed with "target:" or "remote:", the open/read/write requests go through
those functions.  For remote targets, those end up in remote.c:remote_hostio_pread,
etc.  The only chunking going on is the server trimming the pread results to the
maximum packet size it can return to gdb.  Both gdb and gdbserver have that
maximum buffer size hardcoded at 16384 bytes for historical reasons.

>> Your PandaBoard takes 8 seconds.  That doesn't seem so bad to me.
>> If this Altera board is the only one with the massive slowdown then
>> I don't think we should delay 7.10 any further on this issue--and I
>> certainly don't think we should lose the functionality that the
>> default sysroot of "target:" brings.
> 
> FWIW, I have just tried on a random sampling of our MIPS target boards 
> as well, with the same test program and same scenario of continuing to a 
> breakpoint on main.  The timings were 1:40 (SEAD-3 LX110 board, M14Kc 
> microMips), 1:38 (MALTA 74Kc MIPS16), and 1:20 (MALTA 5Kef MIPS64).  I 
> could try some PowerPC boards as well if we need more datapoints.

Thanks for doing this.  As yet another data point, it'd be interesting to
know whether Gary's rate-limit patch speeds up any other of these boards
as well.

Did you try getting the nfs mount out of the picture?  That is, copy the
debug-build files to the target, and have gdb pull them automatically.

It's expected that reading a file out of a remote filesystem
is going to be slower than reading a local copy.  Obviously some slowdown
is unavoidable.  There's no free lunch.

The fact that Gary's chunk-size-limiting patch made things much
faster on the nios2 board is still mysterious to me.  I'd expect the
slowness to be latency bound, given the request/response nature of the RSP,
but then I'd expect more chunking to slow things down, not speed them up.

I'm trying to build a more complete picture of the potential for
trouble.  The more data, the better...

So tonight I played with this.  To amplify latency issues against a remote
target, I built gdbserver on gcc76 on the GCC compile farm.  (Anyone can
request access to the compile farm machines to reproduce this.)  That's an
x86-64 machine in France, almost 2000km away from me.  I've got a ping time
of ~85ms to that machine, and I'm behind a ~16Mbit ADSL line.

As an illustration, here's what it takes to download the gdbserver binary using ssh/scp.
Not the fastest file transfer method, but the most convenient:

$ time scp gcc76.fsffrance.org:/home/palves/gdb/build/gdb/gdbserver/gdbserver /tmp/gdbserver
gdbserver             100% 1451KB   1.4MB/s   00:01

real    0m1.296s
user    0m0.015s
sys     0m0.013s

Now let's compare with GDB's file retrieval:

[palves@gcc76] $ ./gdbserver --multi :9999

[palves@home] $ ssh -L 9999:localhost:9999 gcc76.fsffrance.org
[palves@home] $ cat /tmp/remote-get
define getfile
       shell date
       remote get /home/palves/gdb/build/gdb/gdbserver/gdbserver /tmp/gdbserver
       shell date
       end
[palves@home] $ ./gdb -data-directory=data-directory -ex "tar extended-rem :9999" -x /tmp/remote-get

(gdb) getfile
Detaching after fork from child process 31567.
Wed Aug 19 01:04:13 WEST 2015
Detaching after fork from child process 31568.
Wed Aug 19 01:04:23 WEST 2015
(gdb)

~10s GDB/vFile vs ~1.3s scp/ssh.  Not that impressive.

So then I tried bumping the RSP max packet buffer size.  That required
a patch to gdb and another to gdbserver.  I bumped the buffer size by a
factor of 16.  The result was:

(gdb) getfile
Wed Aug 19 01:13:01 WEST 2015
Wed Aug 19 01:13:04 WEST 2015
(gdb) getfile
Wed Aug 19 01:13:12 WEST 2015
Wed Aug 19 01:13:14 WEST 2015
(gdb) getfile
Wed Aug 19 01:13:19 WEST 2015
Wed Aug 19 01:13:21 WEST 2015

So 2-3s.  Not bad.

With the packet buffer size bumped to 4MB, so that a single
vFile:pread covers the ~1.5MB file, I consistently
get ~1s:

(gdb) getfile
Wed Aug 19 01:17:52 WEST 2015
Wed Aug 19 01:17:53 WEST 2015
(gdb) getfile
Wed Aug 19 01:17:55 WEST 2015
Wed Aug 19 01:17:56 WEST 2015
(gdb) getfile
Wed Aug 19 01:17:57 WEST 2015
Wed Aug 19 01:17:58 WEST 2015
(gdb)

which is then in the ballpark of scp.  It seems clear to me that for
whole-file transfers, the bottleneck (in this use case) is the number
of chunks / round trips.  For file transfer, a stream-based protocol
like scp's, which avoids round-trip hiccups, obviously wins.

For the use case of connecting to gdbserver with "target remote",
and having gdb fetch the binary out of gdbserver with vFile, I tried
something else.

This time, I'll simply debug gdbserver itself remotely, letting gdb
fetch the gdbserver binary, and run to main.  Once main is reached,
quit:

[palves@gcc76] $ ./gdbserver :9999 ./gdbserver
[palves@home] $ ssh -L 9999:localhost:9999 gcc76.fsffrance.org
[palves@home] time ./gdb -data-directory=data-directory -ex "tar rem :9999" -ex "b main" -ex "c" -ex "set confirm off" -ex "quit"

Pristine gdb 7.10.50.20150816-cvs gets me:
...
Remote debugging using :9999
Reading symbols from target:/home/palves/gdb/build/gdb/gdbserver/gdbserver...done.
Reading symbols from target:/lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
0x00007ffff7ddd190 in ?? () from target:/lib64/ld-linux-x86-64.so.2
Breakpoint 1 at 0x41200c: file ../../../src/gdb/gdbserver/server.c, line 3635.
Continuing.

Breakpoint 1, main (argc=1, argv=0x7fffffffe3d8) at ../../../src/gdb/gdbserver/server.c:3635
3635    ../../../src/gdb/gdbserver/server.c: No such file or directory.
/home/palves/gdb/build/gdb/gdbserver/gdbserver: No such file or directory.

real    0m55.315s
user    0m0.400s
sys     0m0.087s

Bumping the remote max packet buffer size to (16384*16) helps a bit,
though the improvement is not impressive:

real    0m51.600s
user    0m0.404s
sys     0m0.085s

Bumping to 0x100000 helps a bit further:

real    0m48.335s
user    0m0.421s
sys     0m0.091s

Looking at "set debug remote 1" output, I noticed that gdb often issues
a sequence of small vFile:pread requests for the same file.  Ctrl-c'ing
gdb while it was doing that and getting a backtrace shows that those requests
often came from bfd.  So I thought that maybe adding a readahead buffer/cache
to gdb's vFile:pread requests could help.  And indeed, that dropped the
time for the same use case further down to:

real    0m29.055s
user    0m2.625s
sys     0m0.090s

I added a few counters to show cache hit/miss, and got:

 readahead cache miss 44
 readahead cache hit 377

I'm not sure exactly what makes the "user" time go up that much (maybe
reading too far ahead causes too much RSP escaping, etc.?).  If I repeat
the same run leaving gdbserver's maximum buffer size
set at the original 16384, I still get <30s:

real    0m27.169s
user    0m0.464s
sys     0m0.061s

though notice the user time goes down again.  Maybe this indicates
that bigger chunk sizes indeed increase CPU usage, which on slower boards
would have a more profound effect than on the x86-64 machines I'm using.
That would be in line with Sandra's observation of Gary's patch speeding
up the nios2 board.  Or maybe it's just a coincidence.

BTW, the transfers seem to be always interruptible for me, even without
Gary's patch, and even the slow ones.

And finally, here's the time using a local sysroot and specifying a local
program:

$ time $g -ex "set sysroot /" -ex "tar rem :9999" -ex "b main" -ex "c" -ex "set confirm off" -ex "quit" /tmp/gdbserver
Reading symbols from /tmp/gdbserver...done.
Remote debugging using :9999
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.18.so.debug...done.
done.
0x00007ffff7ddd190 in ?? ()
Breakpoint 1 at 0x41200c: file ../../../src/gdb/gdbserver/server.c, line 3635.
Continuing.

Breakpoint 1, main (argc=1, argv=0x7fffffffe3d8) at ../../../src/gdb/gdbserver/server.c:3635
3635    ../../../src/gdb/gdbserver/server.c: No such file or directory.

real    0m7.385s
user    0m0.132s
sys     0m0.028s

I'm attaching the 3 patches that I used.  I wouldn't be surprised
if they contain obvious blunders; it's getting quite late here...

Thanks,
Pedro Alves
  

Comments

Gary Benson Aug. 19, 2015, 1:42 p.m. UTC | #1
Pedro Alves wrote:
> BTW, the transfers seem to be always interruptible for me, even without
> Gary's patch, and even the slow ones.

Ok, I'm glad I'm not the only one :)

> From d426ce0a36830378a8ec2e2cbdd94d9fcb47b891 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Tue, 18 Aug 2015 23:27:20 +0100
> Subject: [PATCH 3/3] add readahead cache to gdb's vFile:pread

I tried this out on its own, and got similar hit/miss ratios, so it
looks like a good addition.  Comments below.

> diff --git a/gdb/remote.c b/gdb/remote.c
> index a839adf..8a739c8 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -10311,6 +10311,26 @@ remote_hostio_send_command (int command_bytes, int which_packet,
>    return ret;
>  }
>  
> +struct readahead_cache
> +{
> +  int fd;
> +  gdb_byte *buf;
> +  ULONGEST offset;
> +  size_t bufsize;
> +};
> +
> +static struct readahead_cache *readahead_cache;

Would this be better in struct remote_state (and maybe not allocated,
just inlined in the main remote_state--it's only 16 or 32 bytes)?

> @@ -10325,6 +10345,8 @@ remote_hostio_set_filesystem
>    char arg[9];
>    int ret;
>  
> +  readahead_cache_invalidate ();
> +
>    if (packet_support (PACKET_vFile_setfs) == PACKET_DISABLE)
>      return 0;
>  

This isn't necessary, file descriptors persist across setns calls.

> +      if (remote_debug)
> +	fprintf_unfiltered (gdb_stdlog,
> +			    "readahead cache hit %d\n",
> +			    ++readahead_cache_hit_count);
and
> +  if (remote_debug)
> +    fprintf_unfiltered (gdb_stdlog, "readahead cache miss %d\n",
> +			++readahead_cache_miss_count);

These only update the counters when debug printing is switched on.
Is this what you intended?

> Looking at "set debug remote 1" output, I noticed that gdb often
> issues a sequence of small vFile:pread requests for the same file.
> Ctrl-c'ing gdb while it was doing that and getting a backtrace shows
> that those requests often came from bfd.  So I thought that maybe
> adding a readahead buffer/cache to gdb's vFile:pread requests could
> help.  And indeed, that dropped the time for the same use case
> further down to:

Another thing I noticed when I was testing the warning messages patch
is that GDB seems to (always? often?) open each file twice:

    Remote debugging using :9999
  * Reading /home/gary/work/archer/fast-transfer/build64/gdb/gdbserver/gdbserver from remote target...
    warning: File transfers from remote targets can be slow. Use "set sysroot" with no arguments to access files locally instead.
  * Reading /home/gary/work/archer/fast-transfer/build64/gdb/gdbserver/gdbserver from remote target...
    Reading symbols from target:/home/gary/work/archer/fast-transfer/build64/gdb/gdbserver/gdbserver...done.
  * Reading /lib64/ld-linux-x86-64.so.2 from remote target...
  * Reading /lib64/ld-linux-x86-64.so.2 from remote target...
    ...etc

Figuring out why this is happening could save us some more time.

Cheers,
Gary

--
http://gbenson.net/
  

Patch

From d426ce0a36830378a8ec2e2cbdd94d9fcb47b891 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 18 Aug 2015 23:27:20 +0100
Subject: [PATCH 3/3] add readahead cache to gdb's vFile:pread

---
 gdb/remote.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 99 insertions(+), 3 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index a839adf..8a739c8 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -10311,6 +10311,26 @@  remote_hostio_send_command (int command_bytes, int which_packet,
   return ret;
 }
 
+struct readahead_cache
+{
+  int fd;
+  gdb_byte *buf;
+  ULONGEST offset;
+  size_t bufsize;
+};
+
+static struct readahead_cache *readahead_cache;
+
+static void
+readahead_cache_invalidate (void)
+{
+  if (readahead_cache != NULL)
+    readahead_cache->fd = -1;
+}
+
+static unsigned int readahead_cache_hit_count;
+static unsigned int readahead_cache_miss_count;
+
 /* Set the filesystem remote_hostio functions that take FILENAME
    arguments will use.  Return 0 on success, or -1 if an error
    occurs (and set *REMOTE_ERRNO).  */
@@ -10325,6 +10345,8 @@  remote_hostio_set_filesystem (struct inferior *inf, int *remote_errno)
   char arg[9];
   int ret;
 
+  readahead_cache_invalidate ();
+
   if (packet_support (PACKET_vFile_setfs) == PACKET_DISABLE)
     return 0;
 
@@ -10389,6 +10411,9 @@  remote_hostio_pwrite (struct target_ops *self,
   int left = get_remote_packet_size ();
   int out_len;
 
+  if (readahead_cache != NULL && readahead_cache->fd == fd)
+    readahead_cache_invalidate ();
+
   remote_buffer_add_string (&p, &left, "vFile:pwrite:");
 
   remote_buffer_add_int (&p, &left, fd);
@@ -10407,9 +10432,9 @@  remote_hostio_pwrite (struct target_ops *self,
 /* Implementation of to_fileio_pread.  */
 
 static int
-remote_hostio_pread (struct target_ops *self,
-		     int fd, gdb_byte *read_buf, int len,
-		     ULONGEST offset, int *remote_errno)
+remote_hostio_pread_1 (struct target_ops *self,
+		       int fd, gdb_byte *read_buf, int len,
+		       ULONGEST offset, int *remote_errno)
 {
   struct remote_state *rs = get_remote_state ();
   char *p = rs->buf;
@@ -10443,6 +10468,74 @@  remote_hostio_pread (struct target_ops *self,
   return ret;
 }
 
+static int
+remote_hostio_pread_from_cache (struct readahead_cache *cache,
+				int fd, gdb_byte *read_buf, int len,
+				ULONGEST offset)
+{
+  if (cache->fd == fd
+      && cache->offset <= offset
+      && offset < cache->offset + cache->bufsize)
+    {
+      ULONGEST max = cache->offset + cache->bufsize;
+
+      if (offset + len > max)
+	len = max - offset;
+
+      if (remote_debug)
+	fprintf_unfiltered (gdb_stdlog,
+			    "readahead cache hit %d\n",
+			    ++readahead_cache_hit_count);
+      memcpy (read_buf, cache->buf + offset - cache->offset, len);
+      return len;
+    }
+
+  return -1;
+}
+
+static int
+remote_hostio_pread (struct target_ops *self,
+		     int fd, gdb_byte *read_buf, int len,
+		     ULONGEST offset, int *remote_errno)
+{
+  int ret;
+
+  if (readahead_cache != NULL)
+    {
+      ret = remote_hostio_pread_from_cache (readahead_cache, fd,
+					    read_buf, len, offset);
+      if (ret >= 0)
+	return ret;
+
+      readahead_cache_invalidate ();
+    }
+
+  if (remote_debug)
+    fprintf_unfiltered (gdb_stdlog, "readahead cache miss %d\n",
+			++readahead_cache_miss_count);
+
+  if (readahead_cache == NULL)
+    readahead_cache = XCNEW (struct readahead_cache);
+  readahead_cache->fd = fd;
+  readahead_cache->offset = offset;
+  readahead_cache->bufsize = get_remote_packet_size ();
+  readahead_cache->buf = xrealloc (readahead_cache->buf, readahead_cache->bufsize);
+
+  ret = remote_hostio_pread_1 (self,
+			       readahead_cache->fd,
+			       readahead_cache->buf, readahead_cache->bufsize,
+			       readahead_cache->offset, remote_errno);
+  if (ret <= 0)
+    {
+      readahead_cache_invalidate ();
+      return ret;
+    }
+
+  readahead_cache->bufsize = ret;
+  return remote_hostio_pread_from_cache (readahead_cache, fd,
+					 read_buf, len, offset);
+}
+
 /* Implementation of to_fileio_close.  */
 
 static int
@@ -10452,6 +10545,9 @@  remote_hostio_close (struct target_ops *self, int fd, int *remote_errno)
   char *p = rs->buf;
   int left = get_remote_packet_size () - 1;
 
+  if (readahead_cache != NULL && readahead_cache->fd == fd)
+    readahead_cache_invalidate ();
+
   remote_buffer_add_string (&p, &left, "vFile:close:");
 
   remote_buffer_add_int (&p, &left, fd);
-- 
1.9.3