diff mbox

[v2] Optimize memory_xfer_partial for remote

Message ID 1466803274-62026-1-git-send-email-donb@codesourcery.com
State New
Headers show

Commit Message

Don Breazeal June 24, 2016, 9:21 p.m. UTC
Pedro, Yao,
Here is version two of the patch to optimize the limit on the size of
memory transfers, specifically memory writes.  Sorry it took a little
while to turn this around.  The patch has been rewritten to use a
target function as Pedro suggested.  Responses to both Pedro and Yao's
comments follow, then the revised patch.

Note that performance numbers in the patch description were generated on
hardware different than the hardware used for the previous patch, so
one can't compare them with the previous patch's performance numbers.

Thanks,
--Don

On 6/20/2016 12:25 PM, Pedro Alves wrote:
> On 06/03/2016 08:02 PM, Don Breazeal wrote:
> 
>> I considered making target_set_memory_xfer_limit a function in the target
>> vector, but concluded that was overkill.  In this patch it is an external
>> function in target.c.
> 
> Sorry, but that doesn't make sense.  If in the same session you
> switch to another target (e.g., core or native debugging), you'll 
> continue using the limit set up by the previous remote connection.
> 
> There's also an effort to teach gdb about connecting to 
> multiple remote targets at the same time.  A global like this
> would need to be adjusted to per-connection anyway.
> 
> So seems to be we should have a real target_get_memory_xfer_limit()
> (or some such) target method.

Ah, I see.  The patch has been revised to do this.

On 6/21/2016 3:15 AM, Yao Qi wrote:
> Alternatively, we can move breakpoint_xfer_memory down to each target
> (remote, native, etc), on which we know what limit it is.

I went with the get_memory_xfer_limit approach, lacking any technical
reason to choose one approach over the other.

>> +  /* Set the cap on memory transfer requests to our packet size.  */
>> +  target_set_memory_xfer_limit (get_remote_packet_size ());
> 
> Shouldn't this be based on get_memory_write_packet_size() instead?

The patch has been revised to use this as well.

Thanks,
--Don
----------
Some analysis we did here showed that increasing the cap on the
transfer size in target.c:memory_xfer_partial could give 20% or more
improvement in remote load across JTAG.  Transfer sizes are capped
to 4K bytes because of performance problems encountered with the
restore command, documented here:

https://sourceware.org/ml/gdb-patches/2013-07/msg00611.html

and with commit hash: 67c059c29e1fb0cdeacdd2005f955514d8d1fb34

The 4K cap was introduced because in a case where the restore command
requested a 100MB transfer, memory_xfer_partial would allocate and copy
an entire 100MB buffer in order to properly handle breakpoint shadow
instructions, even though memory_xfer_partial would actually only
write a small portion of the buffer contents.

A couple of alternative solutions were suggested:
* change the algorithm for handling the breakpoint shadow instructions
* throttle the transfer size up or down based on the previous actual
  transfer size

I tried implementing the throttling approach, and my implementation
reduced the performance in some cases.

This patch implements a new target function that returns that target's
limit on memory transfer size.  It defaults to 4K bytes as before, but
for remote it uses remote.c:get_memory_write_packet_size.

The performance differences that I saw were as follows (in seconds),
using an artificially large application and a 100MB srec file:

USB  load:     orig   53.2 patch  18.9
USB  restore:  orig 1522.4 patch 543.6
Enet load:     orig   12.2 patch  10.0
Enet restore:  orig  368.0 patch 294.3

Tested on x86_64 Linux with native and native-gdbserver, and manually
tested 'load' and 'restore' on a Windows 7 host with a bare-metal ARM
board.

gdb/ChangeLog:
2016-06-24  Don Breazeal  <donb@codesourcery.com>

	* remote.c (remote_get_memory_xfer_limit): New function.
	* target-delegates.c (delegate_get_memory_xfer_limit,
	debug_get_memory_xfer_limit, install_delegators,
	install_dummy_methods, init_debug_target): New functions
	and target_ops initialization from regenerating the file.
	* target.c (default_get_memory_xfer_limit): New function and
	forward declaration.
	(memory_xfer_partial): Call target_ops.to_get_memory_xfer_limit.
	* target.h (struct target_ops)<to_get_memory_xfer_limit>: New
	member.

---
 gdb/remote.c           |  7 +++++++
 gdb/target-delegates.c | 25 +++++++++++++++++++++++++
 gdb/target.c           | 18 ++++++++++++++++--
 gdb/target.h           |  6 ++++++
 4 files changed, 54 insertions(+), 2 deletions(-)

Comments

Pedro Alves June 24, 2016, 10:23 p.m. UTC | #1
On 06/24/2016 10:21 PM, Don Breazeal wrote:
> 
> and with commit hash: 67c059c29e1fb0cdeacdd2005f955514d8d1fb34
> 

Write:

 ... with commit 67c059c29e1f ("Improve performance of large restore
 commands") ...

so the reader has a clue what the commit is about without having
to check.


> gdb/ChangeLog:
> 2016-06-24  Don Breazeal  <donb@codesourcery.com>
> 
> 	* remote.c (remote_get_memory_xfer_limit): New function.
> 	* target-delegates.c (delegate_get_memory_xfer_limit,
> 	debug_get_memory_xfer_limit, install_delegators,
> 	install_dummy_methods, init_debug_target): New functions
> 	and target_ops initialization from regenerating the file.

The standard practice is to just say:

 	* target-delegates.c: Regenerate.

> 	* target.c (default_get_memory_xfer_limit): New function and
> 	forward declaration.
> 	(memory_xfer_partial): Call target_ops.to_get_memory_xfer_limit.
> 	* target.h (struct target_ops)<to_get_memory_xfer_limit>: New
> 	member.

Space between ")<".

> 
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 501f3c6..03c7ab7 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -10160,6 +10160,12 @@ remote_xfer_partial (struct target_ops *ops, enum target_object object,
>    return TARGET_XFER_OK;
>  }
>  
> +static ULONGEST
> +remote_get_memory_xfer_limit (struct target_ops *ops)

Intro comment, something like "Implementation of ... method.".

>  
> +/* The default implementation for the to_get_memory_xfer_limit method.
> +   The hard-coded limit here was determined to be a reasonable default
> +   that eliminated exponential slowdown on very large transfers without
> +   unduly compromising performance on smaller transfers.  */

Where's this coming from?  Is this new experimentation you did,
or are you talking about Anton's patch?

> @@ -1301,8 +1314,9 @@ memory_xfer_partial (struct target_ops *ops, enum target_object object,
>  	 by memory_xfer_partial_1.  We will continually malloc
>  	 and free a copy of the entire write request for breakpoint
>  	 shadow handling even though we only end up writing a small
> -	 subset of it.  Cap writes to 4KB to mitigate this.  */
> -      len = min (4096, len);
> +	 subset of it.  Cap writes to a limit specified by the target
> +	 to mitigate this.  */
> +      len = min (ops->to_get_memory_xfer_limit (ops), len);
>  

Does this still work if remote is not the top-most target?

E.g., what happens if you do "record" to push a record_statum
target on top?  Do we get the 4KB default limit, or the
remote limit?

Thanks,
Pedro Alves
diff mbox

Patch

diff --git a/gdb/remote.c b/gdb/remote.c
index 501f3c6..03c7ab7 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -10160,6 +10160,12 @@  remote_xfer_partial (struct target_ops *ops, enum target_object object,
   return TARGET_XFER_OK;
 }
 
+static ULONGEST
+remote_get_memory_xfer_limit (struct target_ops *ops)
+{
+  return get_memory_write_packet_size ();
+}
+
 static int
 remote_search_memory (struct target_ops* ops,
 		      CORE_ADDR start_addr, ULONGEST search_space_len,
@@ -13073,6 +13079,7 @@  Specify the serial device it is connected to\n\
   remote_ops.to_interrupt = remote_interrupt;
   remote_ops.to_pass_ctrlc = remote_pass_ctrlc;
   remote_ops.to_xfer_partial = remote_xfer_partial;
+  remote_ops.to_get_memory_xfer_limit = remote_get_memory_xfer_limit;
   remote_ops.to_rcmd = remote_rcmd;
   remote_ops.to_pid_to_exec_file = remote_pid_to_exec_file;
   remote_ops.to_log_command = serial_log_command;
diff --git a/gdb/target-delegates.c b/gdb/target-delegates.c
index 03aa2cc..18f22b5 100644
--- a/gdb/target-delegates.c
+++ b/gdb/target-delegates.c
@@ -2064,6 +2064,27 @@  debug_xfer_partial (struct target_ops *self, enum target_object arg1, const char
   return result;
 }
 
+static ULONGEST
+delegate_get_memory_xfer_limit (struct target_ops *self)
+{
+  self = self->beneath;
+  return self->to_get_memory_xfer_limit (self);
+}
+
+static ULONGEST
+debug_get_memory_xfer_limit (struct target_ops *self)
+{
+  ULONGEST result;
+  fprintf_unfiltered (gdb_stdlog, "-> %s->to_get_memory_xfer_limit (...)\n", debug_target.to_shortname);
+  result = debug_target.to_get_memory_xfer_limit (&debug_target);
+  fprintf_unfiltered (gdb_stdlog, "<- %s->to_get_memory_xfer_limit (", debug_target.to_shortname);
+  target_debug_print_struct_target_ops_p (&debug_target);
+  fputs_unfiltered (") = ", gdb_stdlog);
+  target_debug_print_ULONGEST (result);
+  fputs_unfiltered ("\n", gdb_stdlog);
+  return result;
+}
+
 static VEC(mem_region_s) *
 delegate_memory_map (struct target_ops *self)
 {
@@ -4223,6 +4244,8 @@  install_delegators (struct target_ops *ops)
     ops->to_get_thread_local_address = delegate_get_thread_local_address;
   if (ops->to_xfer_partial == NULL)
     ops->to_xfer_partial = delegate_xfer_partial;
+  if (ops->to_get_memory_xfer_limit == NULL)
+    ops->to_get_memory_xfer_limit = delegate_get_memory_xfer_limit;
   if (ops->to_memory_map == NULL)
     ops->to_memory_map = delegate_memory_map;
   if (ops->to_flash_erase == NULL)
@@ -4454,6 +4477,7 @@  install_dummy_methods (struct target_ops *ops)
   ops->to_goto_bookmark = tdefault_goto_bookmark;
   ops->to_get_thread_local_address = tdefault_get_thread_local_address;
   ops->to_xfer_partial = tdefault_xfer_partial;
+  ops->to_get_memory_xfer_limit = default_get_memory_xfer_limit;
   ops->to_memory_map = tdefault_memory_map;
   ops->to_flash_erase = tdefault_flash_erase;
   ops->to_flash_done = tdefault_flash_done;
@@ -4610,6 +4634,7 @@  init_debug_target (struct target_ops *ops)
   ops->to_goto_bookmark = debug_goto_bookmark;
   ops->to_get_thread_local_address = debug_get_thread_local_address;
   ops->to_xfer_partial = debug_xfer_partial;
+  ops->to_get_memory_xfer_limit = debug_get_memory_xfer_limit;
   ops->to_memory_map = debug_memory_map;
   ops->to_flash_erase = debug_flash_erase;
   ops->to_flash_done = debug_flash_done;
diff --git a/gdb/target.c b/gdb/target.c
index 6f69ac3..57202b4 100644
--- a/gdb/target.c
+++ b/gdb/target.c
@@ -60,6 +60,8 @@  static int default_region_ok_for_hw_watchpoint (struct target_ops *,
 
 static void default_rcmd (struct target_ops *, const char *, struct ui_file *);
 
+static ULONGEST default_get_memory_xfer_limit (struct target_ops *self);
+
 static ptid_t default_get_ada_task_ptid (struct target_ops *self,
 					 long lwp, long tid);
 
@@ -623,6 +625,17 @@  default_terminal_info (struct target_ops *self, const char *args, int from_tty)
   printf_unfiltered (_("No saved terminal information.\n"));
 }
 
+/* The default implementation for the to_get_memory_xfer_limit method.
+   The hard-coded limit here was determined to be a reasonable default
+   that eliminated exponential slowdown on very large transfers without
+   unduly compromising performance on smaller transfers.  */
+
+static ULONGEST
+default_get_memory_xfer_limit (struct target_ops *self)
+{
+  return 4096;
+}
+
 /* A default implementation for the to_get_ada_task_ptid target method.
 
    This function builds the PTID by using both LWP and TID as part of
@@ -1301,8 +1314,9 @@  memory_xfer_partial (struct target_ops *ops, enum target_object object,
 	 by memory_xfer_partial_1.  We will continually malloc
 	 and free a copy of the entire write request for breakpoint
 	 shadow handling even though we only end up writing a small
-	 subset of it.  Cap writes to 4KB to mitigate this.  */
-      len = min (4096, len);
+	 subset of it.  Cap writes to a limit specified by the target
+	 to mitigate this.  */
+      len = min (ops->to_get_memory_xfer_limit (ops), len);
 
       buf = (gdb_byte *) xmalloc (len);
       old_chain = make_cleanup (xfree, buf);
diff --git a/gdb/target.h b/gdb/target.h
index 6b5b6e0..84f12a9 100644
--- a/gdb/target.h
+++ b/gdb/target.h
@@ -745,6 +745,12 @@  struct target_ops
 						ULONGEST *xfered_len)
       TARGET_DEFAULT_RETURN (TARGET_XFER_E_IO);
 
+    /* Return the limit on the size of any single memory transfer
+       for the target.  */
+
+    ULONGEST (*to_get_memory_xfer_limit) (struct target_ops *)
+      TARGET_DEFAULT_FUNC (default_get_memory_xfer_limit);
+
     /* Returns the memory map for the target.  A return value of NULL
        means that no memory map is available.  If a memory address
        does not fall within any returned regions, it's assumed to be