[v2] Optimize memory_xfer_partial for remote

  On 6/24/2016 3:23 PM, Pedro Alves wrote:
> On 06/24/2016 10:21 PM, Don Breazeal wrote:
>>
>> and with commit hash: 67c059c29e1fb0cdeacdd2005f955514d8d1fb34
>>
> 
> Write:
> 
>  ... with commit 67c059c29e1f ("Improve performance of large restore
>  commands") ...
> 
> so the reader has a clue what the commit is about without having
> to check.

Hi Pedro,
Thanks for your comments; sorry about the coding convention and clarity
issues. The one above and the others are fixed in the attached patch below.

> 
> 
>> gdb/ChangeLog:
>> 2016-06-24  Don Breazeal  <donb@codesourcery.com>
>>
>> 	* remote.c (remote_get_memory_xfer_limit): New function.
>> 	* target-delegates.c (delegate_get_memory_xfer_limit,
>> 	debug_get_memory_xfer_limit, install_delegators,
>> 	install_dummy_methods, init_debug_target): New functions
>> 	and target_ops initialization from regenerating the file.
> 
> The standard practice is to just say:
> 
>  	* target-delegates.c: Regenerate.

Fixed in the attached patch.

> 
>> 	* target.c (default_get_memory_xfer_limit): New function and
>> 	forward declaration.
>> 	(memory_xfer_partial): Call target_ops.to_get_memory_xfer_limit.
>> 	* target.h (struct target_ops)<to_get_memory_xfer_limit>: New
>> 	member.
> 
> Space between ")<".

Fixed in the attached patch.

> 
>>
>> diff --git a/gdb/remote.c b/gdb/remote.c
>> index 501f3c6..03c7ab7 100644
>> --- a/gdb/remote.c
>> +++ b/gdb/remote.c
>> @@ -10160,6 +10160,12 @@ remote_xfer_partial (struct target_ops *ops, enum target_object object,
>>    return TARGET_XFER_OK;
>>  }
>>  
>> +static ULONGEST
>> +remote_get_memory_xfer_limit (struct target_ops *ops)
> 
> Intro comment, something like "Implementation of ... method.".

Fixed in the attached patch.

> 
>>  
>> +/* The default implementation for the to_get_memory_xfer_limit method.
>> +   The hard-coded limit here was determined to be a reasonable default
>> +   that eliminated exponential slowdown on very large transfers without
>> +   unduly compromising performance on smaller transfers.  */
> 
> Where's this coming from?  Is this new experimentation you did,
> or are you talking about Anton's patch?

Both.  I did some experimentation to verify that things were significantly
slower without any memory transfer limit, which they were, although I never
reproduced the extreme scenario Anton had reported.  Presumably the
performance differences were due to hardware and environment differences.
Regarding the comment, I thought some explanation of the hard-coded number
was appropriate.  Is there a better or more preferable way to do this, e.g.
refer to the commit hash, or does it just seem superfluous?

> 
>> @@ -1301,8 +1314,9 @@ memory_xfer_partial (struct target_ops *ops, enum target_object object,
>>  	 by memory_xfer_partial_1.  We will continually malloc
>>  	 and free a copy of the entire write request for breakpoint
>>  	 shadow handling even though we only end up writing a small
>> -	 subset of it.  Cap writes to 4KB to mitigate this.  */
>> -      len = min (4096, len);
>> +	 subset of it.  Cap writes to a limit specified by the target
>> +	 to mitigate this.  */
>> +      len = min (ops->to_get_memory_xfer_limit (ops), len);
>>  
> 
> Does this still work if remote is not the top-most target?

Yes, see next comment.

> 
> E.g., what happens if you do "record" to push a record_statum
> target on top?  Do we get the 4KB default limit, or the
> remote limit?

I ran this experiment and verified that the target delegation mechanism
worked as expected in this case. It walks down the target stack, skipping
targets that don't implement to_get_memory_xfer_limit, and returning after
calling the function in the first target that does implement it. I also
verified that it walks all the way down the target stack and calls the
default function if no target has defined its own version of
to_get_memory_xfer_limit.

Output of the experiment with record and remote is below, followed by the
revised patch.

Thanks
--Don

[build5-trusty-cs(5.68) 694] bin$ gdb ./gdb
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
... GDB banner stuff...
Reading symbols from ./gdb...done.
(gdb) set prompt (top)
(top) b target.c:1319
Breakpoint 1 at 0x65985b: file ../../binutils-gdb/gdb/target.c, line 1319.
(top) run ~/test/big
Starting program: /scratch/dbreazea/gdb-5301/install/bin/gdb ~/test/big
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
GNU gdb (GDB) 7.11.50.20160624-git
...GDB banner stuff...
Reading symbols from /home/dbreazea/test/big...done.
(gdb) tar rem localhost:51111
Remote debugging using localhost:51111
...Reading symbols msgs...
0x00007ffff7ddb2d0 in ?? () from target:/lib64/ld-linux-x86-64.so.2
(gdb) b main
Breakpoint 1 at 0x40051e: file /home/dbreazea/test/big.c, line 9.
(gdb) c
Continuing.
Reading /lib/x86_64-linux-gnu/libc.so.6 from remote target...
Reading /lib/x86_64-linux-gnu/libc-2.19.so from remote target...
Reading /lib/x86_64-linux-gnu/.debug/libc-2.19.so from remote target...

Breakpoint 1, main () at /home/dbreazea/test/big.c:9
9 printf ("starting...\n");
(gdb) record full <---<<< push the target on top of the remote target
(gdb) b 12
Breakpoint 2 at 0x40054d: file /home/dbreazea/test/big.c, line 12.
(gdb) c
Continuing.

Breakpoint 2, main () at /home/dbreazea/test/big.c:12
12 printf ("done, a[%d] is %d\n", 42, a[42]);
(gdb) restore ../../build/gdb/a.srec <---<<< 'restore' for big writes
Restoring section .sec1 (0x6009a0 to 0x6089a0)

Breakpoint 1, memory_xfer_partial (ops=0xe07e20 <record_full_ops>,
object=TARGET_OBJECT_MEMORY, readbuf=0x0, writebuf=0x212c480 "",
memaddr=6293920, len=32768, xfered_len=0x7fffffffdea8)
at ../../binutils-gdb/gdb/target.c:1319
1319 len = min (ops->to_get_memory_xfer_limit (ops), len);
(top) p ops->to_longname
$1 = 0x9ddc00 "Process record and replay target"
(top) s
delegate_get_memory_xfer_limit (self=0xe07e20 <record_full_ops>)
at ../../binutils-gdb/gdb/target-delegates.c:2070
2070 self = self->beneath;
(top) display self->to_longname
1: self->to_longname = 0x9ddc00 "Process record and replay target"
(top) s
2071 return self->to_get_memory_xfer_limit (self);
1: self->to_longname = 0x8fcd00 "Remote serial target in gdb-specific protocol"
(top) s
remote_get_memory_xfer_limit (ops=0xdf0360 <remote_ops>)
at ../../binutils-gdb/gdb/remote.c:10166
10166 return get_memory_write_packet_size ();
(top) n
10167 }
(top) n
delegate_get_memory_xfer_limit (self=0xdf0360 <remote_ops>)
at ../../binutils-gdb/gdb/target-delegates.c:2072
2072 }
1: self->to_longname = 0x8fcd00 "Remote serial target in gdb-specific protocol"
(top) n
memory_xfer_partial (ops=0xe07e20 <record_full_ops>,
object=TARGET_OBJECT_MEMORY, readbuf=0x0, writebuf=0x212c480 "",
memaddr=6293920, len=16383, xfered_len=0x7fffffffdea8)
at ../../binutils-gdb/gdb/target.c:1321
1321 buf = (gdb_byte *) xmalloc (len);
(top) p len
$2 = 16383 <---<<< 16K (approx) len from remote target function

---------- revised patch -----------
Some analysis we did here showed that increasing the cap on the
transfer size in target.c:memory_xfer_partial could give 20% or more
improvement in remote load across JTAG.  Transfer sizes are capped
to 4K bytes because of performance problems encountered with the
restore command, documented here:

https://sourceware.org/ml/gdb-patches/2013-07/msg00611.html

and in commit 67c059c29e1f ("Improve performance of large restore
commands").

The 4K cap was introduced because in a case where the restore command
requested a 100MB transfer, memory_xfer_partial would repeatedy
allocate and copy an entire 100MB buffer in order to properly handle
breakpoint shadow instructions, even though memory_xfer_partial would
actually only write a small portion of the buffer contents.

A couple of alternative solutions were suggested:
* change the algorithm for handling the breakpoint shadow instructions
* throttle the transfer size up or down based on the previous actual
  transfer size

I tried implementing the throttling approach, and my implementation
reduced the performance in some cases.

This patch implements a new target function that returns that target's
limit on memory transfer size.  It defaults to 4K bytes as before, but
for remote it uses remote.c:get_memory_write_packet_size.

The performance differences that I saw were as follows (in seconds),
using an artificially large application and a 100MB srec file:

USB  load:     orig   53.2 patch  18.9
USB  restore:  orig 1522.4 patch 543.6
Enet load:     orig   12.2 patch  10.0
Enet restore:  orig  368.0 patch 294.3

Tested on x86_64 Linux with native and native-gdbserver, and manually
tested 'load' and 'restore' on a Windows 7 host with a bare-metal ARM
board.

gdb/ChangeLog:
2016-06-27  Don Breazeal  <donb@codesourcery.com>

	* remote.c (remote_get_memory_xfer_limit): New function.
	* target-delegates.c: Regenerate.
	* target.c (default_get_memory_xfer_limit): New function and
	forward declaration.
	(memory_xfer_partial): Call target_ops.to_get_memory_xfer_limit.
	* target.h (struct target_ops) <to_get_memory_xfer_limit>: New
	member.

---
 gdb/remote.c           |  9 +++++++++
 gdb/target-delegates.c | 25 +++++++++++++++++++++++++
 gdb/target.c           | 18 ++++++++++++++++--
 gdb/target.h           |  6 ++++++
 4 files changed, 56 insertions(+), 2 deletions(-)

[v2] Optimize memory_xfer_partial for remote

Commit Message

Comments

Patch