gdb, gdbserver: introduce the 'e' RSP packet

Message ID 20250320181134.2608606-1-tankut.baris.aktemur@intel.com
State New
Headers
Series gdb, gdbserver: introduce the 'e' RSP packet |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed

Commit Message

Aktemur, Tankut Baris March 20, 2025, 6:11 p.m. UTC
  Introduce a new Remote Serial Protocol (RSP) packet: 'e'

This packet fetches a subset of registers from the remote target and
is an alternative to the 'g' packet, which fetches all the registers.
The motivation is to reduce the data transfer overhead between GDB and
the remote target.  Using the 'e' packet instead of 'g' can make a
difference especially when the target has a large number of threads
with big register files, such as GPUs.

Which subset of registers to send to GDB is up to the remote target,
but typically it would be the list of "expedite" registers as defined
by the specific target.  We implement the 'e' packet at the gdbserver
side that way.

Suppose we are debugging a multi-threaded program over a remote target
connection.  When a thread hits a breakpoint, its expedited registers
are included in the stop reply packet, e.g.:

  [remote] Packet received: T05swbreak:;06:c0ddffffff7f0000;07:20ddffffff7f0000;10:1e55555555550000;thread:p30382.30382;core:6;

With these expedited registers, it may be sufficient to accomplish
fundamental debug actions such as computing the frame or even the
backtrace of the eventing thread, if the frame arguments do not depend
on non-expedited registers (the user can avoid this dependency with
`set print frame-arguments none`) without having to fetch anything
more with the 'g' packet.

However, if the user wants to inspect the other threads, GDB
immediately sends a 'g' packet for them to fetch all their registers,
because the stop reply packet contains the expedited registers of the
eventing thread only.  As an example, suppose there are 1000 threads
in a program and we hit a breakpoint.  If the user runs `info
threads`, the whole register sets of all the 999 threads would be
fetched by sending a 'g' packet for each.  For targets that have even
larger number of threads with big register files (e.g. GPUs), this
could be costly.  The 'e' packet approach in this patch aims to reduce
the costs by first fetching the expedited registers and resorting to
'g' if the expedited registers are not sufficient.  This means,
however, the 'e' packet approach does not guarantee speedups -- in the
worst case, GDB would first send an 'e' packet followed by a 'g'
packet, instead of sending 'g' only.  However, I think that in the
typical user scenarios improvements are expected.

Here is an experiment with a program that has 2000 threads.  After
hitting a BP, we do "maint set per-command time on" and measure the
cost of the following commands:

 * info threads

   In the experiment I did, there are no arguments of the frame.
   Expedited registers are sufficient to calculate the frame and print
   the thread rows in the output.

   Without this patch: 2.601 s
   With this patch:    0.856 s

 * thread apply all info reg rip

   The 'rip' register is part of expedited registers.  Hence, for each
   thread, no further 'g' packet needs to be sent.  Without the patch,
   a 'g' is sent for each thread except the one that hit the
   breakpoint.

   Without this patch: 2.317 s
   With this patch:    1.406 s

 * thread apply all info reg rip r15

   The 'r15' register is not part of the expedited set.  Therefore,
   after sending an 'e' to fetch rip, also a 'g' is sent to fetch r15.
   This benchmark represents the worst case.

   Without this patch: 2.495 s
   With this patch:    3.332 s

In our (Intel) downstream debugger for our GPU targets, we see much
more significant improvements and the worst case scenario is not seen
in typical user scenarios (we recommend the users to run "info
threads" with "set print frame-arguments none").

Regression-tested on X86-64 Linux using the native-gdbserver and
native-extended-gdbserver board files.
---
 gdb/NEWS                  |   8 +++
 gdb/doc/gdb.texinfo       |  24 +++++++++
 gdb/remote.c              | 103 +++++++++++++++++++++++++++++++++++++-
 gdbserver/remote-utils.cc |  14 ++++++
 gdbserver/remote-utils.h  |   4 ++
 gdbserver/server.cc       |  15 ++++++
 6 files changed, 167 insertions(+), 1 deletion(-)
  

Comments

Eli Zaretskii March 20, 2025, 8:49 p.m. UTC | #1
> From: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
> Date: Thu, 20 Mar 2025 19:11:34 +0100
> 
>  gdb/NEWS                  |   8 +++
>  gdb/doc/gdb.texinfo       |  24 +++++++++
>  gdb/remote.c              | 103 +++++++++++++++++++++++++++++++++++++-
>  gdbserver/remote-utils.cc |  14 ++++++
>  gdbserver/remote-utils.h  |   4 ++
>  gdbserver/server.cc       |  15 ++++++
>  6 files changed, 167 insertions(+), 1 deletion(-)

Thanks, the documentation parts are approved.

Reviewed-By: Eli Zaretskii <eliz@gnu.org>
  
Aktemur, Tankut Baris April 4, 2025, 1:45 p.m. UTC | #2
Kindly pinging.

-Baris

On Thursday, March 20, 2025 7:12 PM, Aktemur, Tankut Baris wrote:
> Introduce a new Remote Serial Protocol (RSP) packet: 'e'
> 
> This packet fetches a subset of registers from the remote target and
> is an alternative to the 'g' packet, which fetches all the registers.
> The motivation is to reduce the data transfer overhead between GDB and
> the remote target.  Using the 'e' packet instead of 'g' can make a
> difference especially when the target has a large number of threads
> with big register files, such as GPUs.
> 
> Which subset of registers to send to GDB is up to the remote target,
> but typically it would be the list of "expedite" registers as defined
> by the specific target.  We implement the 'e' packet at the gdbserver
> side that way.
> 
> Suppose we are debugging a multi-threaded program over a remote target
> connection.  When a thread hits a breakpoint, its expedited registers
> are included in the stop reply packet, e.g.:
> 
>   [remote] Packet received:
> T05swbreak:;06:c0ddffffff7f0000;07:20ddffffff7f0000;10:1e55555555550000;thread:p30382.30382;
> core:6;
> 
> With these expedited registers, it may be sufficient to accomplish
> fundamental debug actions such as computing the frame or even the
> backtrace of the eventing thread, if the frame arguments do not depend
> on non-expedited registers (the user can avoid this dependency with
> `set print frame-arguments none`) without having to fetch anything
> more with the 'g' packet.
> 
> However, if the user wants to inspect the other threads, GDB
> immediately sends a 'g' packet for them to fetch all their registers,
> because the stop reply packet contains the expedited registers of the
> eventing thread only.  As an example, suppose there are 1000 threads
> in a program and we hit a breakpoint.  If the user runs `info
> threads`, the whole register sets of all the 999 threads would be
> fetched by sending a 'g' packet for each.  For targets that have even
> larger number of threads with big register files (e.g. GPUs), this
> could be costly.  The 'e' packet approach in this patch aims to reduce
> the costs by first fetching the expedited registers and resorting to
> 'g' if the expedited registers are not sufficient.  This means,
> however, the 'e' packet approach does not guarantee speedups -- in the
> worst case, GDB would first send an 'e' packet followed by a 'g'
> packet, instead of sending 'g' only.  However, I think that in the
> typical user scenarios improvements are expected.
> 
> Here is an experiment with a program that has 2000 threads.  After
> hitting a BP, we do "maint set per-command time on" and measure the
> cost of the following commands:
> 
>  * info threads
> 
>    In the experiment I did, there are no arguments of the frame.
>    Expedited registers are sufficient to calculate the frame and print
>    the thread rows in the output.
> 
>    Without this patch: 2.601 s
>    With this patch:    0.856 s
> 
>  * thread apply all info reg rip
> 
>    The 'rip' register is part of expedited registers.  Hence, for each
>    thread, no further 'g' packet needs to be sent.  Without the patch,
>    a 'g' is sent for each thread except the one that hit the
>    breakpoint.
> 
>    Without this patch: 2.317 s
>    With this patch:    1.406 s
> 
>  * thread apply all info reg rip r15
> 
>    The 'r15' register is not part of the expedited set.  Therefore,
>    after sending an 'e' to fetch rip, also a 'g' is sent to fetch r15.
>    This benchmark represents the worst case.
> 
>    Without this patch: 2.495 s
>    With this patch:    3.332 s
> 
> In our (Intel) downstream debugger for our GPU targets, we see much
> more significant improvements and the worst case scenario is not seen
> in typical user scenarios (we recommend the users to run "info
> threads" with "set print frame-arguments none").
> 
> Regression-tested on X86-64 Linux using the native-gdbserver and
> native-extended-gdbserver board files.
> ---
>  gdb/NEWS                  |   8 +++
>  gdb/doc/gdb.texinfo       |  24 +++++++++
>  gdb/remote.c              | 103 +++++++++++++++++++++++++++++++++++++-
>  gdbserver/remote-utils.cc |  14 ++++++
>  gdbserver/remote-utils.h  |   4 ++
>  gdbserver/server.cc       |  15 ++++++
>  6 files changed, 167 insertions(+), 1 deletion(-)
> 
> diff --git a/gdb/NEWS b/gdb/NEWS
> index 0aac7a7b13a..2d54b64586b 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -73,6 +73,14 @@ binary-upload in qSupported reply
>    stub doesn't report this feature supported, then GDB will not use
>    the 'x' packet.
> 
> +e
> +  Fetch the "expedited" registers from the remote target.  These are
> +  the registers that are deemed the most important by the target and
> +  are typically the same as those sent in a 'T' stop reply packet.
> +  The 'e' packet can be an alternative to the 'g' packet to reduce
> +  the transfer overhead between GDB and the remote target when the
> +  complete register file of a thread is large.
> +
>  * Changed remote packets
> 
>  qXfer:threads:read
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 473431011d1..b86bfd17fc8 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -43125,6 +43125,30 @@ Reply:
>  for success
>  @end table
> 
> +@item e
> +@anchor{read expedited registers packet}
> +@cindex @samp{e} packet
> +Read expedited registers.
> +
> +Reply:
> +@table @samp
> +@item @var{n1}:@var{r1};@var{n2}:@var{r2};@dots{}
> +The reply contains a list of @samp{@var{n}:@var{r}} pairs, where each
> +pair carries the value @var{r} for the register numbered @var{n},
> +where @var{n} is a hexadecimal number.  The corresponding @var{r}
> +gives the register's value as a series of bytes in target byte order,
> +with each byte given by a two-digit hex number.
> +@end table
> +
> +The registers sent by the remote target are those that are deemed
> +important (e.g.@: most frequently accessed) registers.  The list is
> +determined by the target and is typically the same as the list sent in
> +a stop reply packet.  @xref{Stop Reply Packets}.
> +
> +The @samp{e} packet can be an alternative to the @samp{g} packet to
> +reduce the transfer overhead between @value{GDBN} and the remote
> +target when the complete register file of a thread is large.
> +
>  @item F @var{RC},@var{EE},@var{CF};@var{XX}
>  @cindex @samp{F} packet
>  A reply from @value{GDBN} to an @samp{F} packet sent by the target.
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 66c58c884d3..f9d9325da23 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -236,6 +236,7 @@ enum {
>    PACKET_qSymbol,
>    PACKET_P,
>    PACKET_p,
> +  PACKET_e,
>    PACKET_Z0,
>    PACKET_Z1,
>    PACKET_Z2,
> @@ -466,6 +467,7 @@ struct packet_reg
>    long regnum; /* GDB's internal register number.  */
>    LONGEST pnum; /* Remote protocol register number.  */
>    bool in_g_packet; /* Always part of G packet.  */
> +  bool in_e_packet; /* Part of the E packet (expedited registers).  */
>    /* long size in bytes;  == register_size (arch, regnum);
>       at present.  */
>    /* char *name; == gdbarch_register_name (arch, regnum);
> @@ -1321,6 +1323,9 @@ class remote_target : public process_stratum_target
>    int send_g_packet ();
>    void process_g_packet (struct regcache *regcache);
>    void fetch_registers_using_g (struct regcache *regcache);
> +
> +  void fetch_registers_using_e (regcache *regcache);
> +
>    int store_register_using_P (const struct regcache *regcache,
>  			      packet_reg *reg);
>    void store_registers_using_G (const struct regcache *regcache);
> @@ -8240,6 +8245,7 @@ Packet: '%s'\n"),
>  			   hex_string (pnum), p, buf);
> 
>  		  int reg_size = register_size (event->arch, reg->regnum);
> +		  reg->in_e_packet = true;
>  		  cached_reg.num = reg->regnum;
>  		  cached_reg.data.resize (reg_size);
> 
> @@ -9081,6 +9087,89 @@ remote_target::fetch_registers_using_g (struct regcache *regcache)
>    process_g_packet (regcache);
>  }
> 
> +/* Fetch the expedited registers.  */
> +
> +void
> +remote_target::fetch_registers_using_e (regcache *regcache)
> +{
> +  if (m_features.packet_support (PACKET_e) == PACKET_DISABLE)
> +    return;
> +
> +  remote_state *rs = get_remote_state ();
> +
> +  char *wbuf = rs->buf.data ();
> +  wbuf[0] = 'e';
> +  wbuf[1] = '\0';
> +
> +  putpkt (rs->buf);
> +  getpkt (&rs->buf, 0);
> +
> +  packet_result result = m_features.packet_ok (rs->buf, PACKET_e);
> +  switch (result.status ())
> +    {
> +    case PACKET_OK:
> +      break;
> +    case PACKET_UNKNOWN:
> +    case PACKET_ERROR:
> +      /* We do not raise an error here, so that we try 'g'
> +	 and 'p' further.  */
> +      return;
> +    }
> +
> +  gdbarch *gdbarch = regcache->arch ();
> +  remote_arch_state *rsa = rs->get_remote_arch_state (gdbarch);
> +  const char *buf = rs->buf.data ();
> +
> +  while (*buf != '\0')
> +    {
> +      /* Parse a register value.  The format is 'n...:r...;', where
> +	 n... = register number
> +	 r... = register contents.  */
> +      const char *buf_col = strchr (buf, ':');
> +      if ((buf_col == nullptr) || (buf_col == buf))
> +	error (_("Remote response is badly formatted: %s"
> +		 "\nhere: %s"), rs->buf.data (), buf);
> +
> +      ULONGEST pnum;
> +      const char *buf_temp = unpack_varlen_hex (buf, &pnum);
> +      if (buf_temp != buf_col)
> +	{
> +	  /* Not a number.  */
> +	  error (_("Remote response does not contain a number at: %s"),
> +		 buf);
> +	}
> +
> +      packet_reg *reg = packet_reg_from_pnum (gdbarch, rsa, pnum);
> +      if (reg == nullptr)
> +	error (_("Remote sent bad register number %s at %s"),
> +	       hex_string (pnum), buf);
> +
> +      reg->in_e_packet = true;
> +
> +      int reg_size = register_size (gdbarch, reg->regnum);
> +      buf = buf_temp + 1;
> +      if (*buf == 'x')
> +	regcache->raw_supply (reg->regnum, nullptr);
> +      else
> +	{
> +	  std::vector<gdb_byte> value;
> +	  value.reserve (reg_size);
> +
> +	  int data_size = hex2bin (buf, value.data (), reg_size);
> +	  if (data_size < reg_size)
> +	    error (_("Remote reply is too short: %s"), buf);
> +
> +	  regcache->raw_supply (reg->regnum, value.data ());
> +	}
> +
> +      buf += 2 * reg_size;
> +      if (*buf != ';')
> +	error (_("Remote response does not contain a ';' at %s"), buf);
> +
> +      buf++;
> +    }
> +}
> +
>  /* Make the remote selected traceframe match GDB's selected
>     traceframe.  */
> 
> @@ -9121,7 +9210,17 @@ remote_target::fetch_registers (struct regcache *regcache, int
> regnum)
> 
>        gdb_assert (reg != NULL);
> 
> -      /* If this register might be in the 'g' packet, try that first -
> +      /* If we believe this register is included in an 'e' packet, try
> +	 it first.  */
> +      if (reg->in_e_packet)
> +	{
> +	  reg->in_e_packet = false;
> +	  fetch_registers_using_e (regcache);
> +	  if (reg->in_e_packet)
> +	    return;
> +	}
> +
> +      /* If this register might be in the 'g' packet, try that next;
>  	 we are likely to read more than one register.  If this is the
>  	 first 'g' packet, we might be overly optimistic about its
>  	 contents, so fall back to 'p'.  */
> @@ -16338,6 +16437,8 @@ Show the maximum size of the address (in bits) in a memory
> packet."), NULL,
> 
>    add_packet_config_cmd (PACKET_p, "p", "fetch-register", 1);
> 
> +  add_packet_config_cmd (PACKET_e, "e", "fetch-expedited-registers", 1);
> +
>    add_packet_config_cmd (PACKET_Z0, "Z0", "software-breakpoint", 0);
> 
>    add_packet_config_cmd (PACKET_Z1, "Z1", "hardware-breakpoint", 0);
> diff --git a/gdbserver/remote-utils.cc b/gdbserver/remote-utils.cc
> index 67225c50f81..7a1dc4aaadb 100644
> --- a/gdbserver/remote-utils.cc
> +++ b/gdbserver/remote-utils.cc
> @@ -1295,6 +1295,20 @@ prepare_resume_reply (char *buf, ptid_t ptid, const target_waitstatus
> &status)
> 
>  /* See remote-utils.h.  */
> 
> +void
> +output_expedite_registers (char *buf)
> +{
> +  regcache *regcache = get_thread_regcache (current_thread);
> +
> +  for (const std::string &expedited_reg : regcache->tdesc->expedite_regs)
> +    buf = outreg (regcache, find_regno (regcache->tdesc,
> +					expedited_reg.c_str ()), buf);
> +
> +  *buf = '\0';
> +}
> +
> +/* See remote-utils.h.  */
> +
>  const char *
>  decode_m_packet_params (const char *from, CORE_ADDR *mem_addr_ptr,
>  			unsigned int *len_ptr, const char end_marker)
> diff --git a/gdbserver/remote-utils.h b/gdbserver/remote-utils.h
> index 65ce604f9dc..dd939240a01 100644
> --- a/gdbserver/remote-utils.h
> +++ b/gdbserver/remote-utils.h
> @@ -43,6 +43,10 @@ void check_remote_input_interrupt_request (void);
>  void prepare_resume_reply (char *buf, ptid_t ptid,
>  			   const target_waitstatus &status);
> 
> +/* Output the expedited registers of the current thread into BUF
> +   as a list of 'regno:value;' pairs.  */
> +void output_expedite_registers (char *buf);
> +
>  const char *decode_address_to_semicolon (CORE_ADDR *addrp, const char *start);
>  void decode_address (CORE_ADDR *addrp, const char *start, int len);
> 
> diff --git a/gdbserver/server.cc b/gdbserver/server.cc
> index def01c1ee80..880cc9ffc13 100644
> --- a/gdbserver/server.cc
> +++ b/gdbserver/server.cc
> @@ -4677,6 +4677,21 @@ process_serial_event (void)
>  	  cs.own_buf[0] = '\0';
>  	}
>        break;
> +    case 'e':
> +      require_running_or_break (cs.own_buf);
> +      if (cs.current_traceframe >= 0)
> +	{
> +	  /* This packet is not supported in this mode.  */
> +	  write_enn (cs.own_buf);
> +	}
> +      else
> +	{
> +	  if (!set_desired_thread ())
> +	    write_enn (cs.own_buf);
> +	  else
> +	    output_expedite_registers (cs.own_buf);
> +	}
> +      break;
>      case 'g':
>        require_running_or_break (cs.own_buf);
>        if (cs.current_traceframe >= 0)
> --
> 2.34.1

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
  
Thiago Jung Bauermann April 4, 2025, 11:40 p.m. UTC | #3
Hello Baris,

Tankut Baris Aktemur <tankut.baris.aktemur@intel.com> writes:

> Here is an experiment with a program that has 2000 threads.  After
> hitting a BP, we do "maint set per-command time on" and measure the
> cost of the following commands:
>
>  * info threads
>
>    In the experiment I did, there are no arguments of the frame.
>    Expedited registers are sufficient to calculate the frame and print
>    the thread rows in the output.
>
>    Without this patch: 2.601 s
>    With this patch:    0.856 s
>
>  * thread apply all info reg rip
>
>    The 'rip' register is part of expedited registers.  Hence, for each
>    thread, no further 'g' packet needs to be sent.  Without the patch,
>    a 'g' is sent for each thread except the one that hit the
>    breakpoint.
>
>    Without this patch: 2.317 s
>    With this patch:    1.406 s
>
>  * thread apply all info reg rip r15
>
>    The 'r15' register is not part of the expedited set.  Therefore,
>    after sending an 'e' to fetch rip, also a 'g' is sent to fetch r15.
>    This benchmark represents the worst case.
>
>    Without this patch: 2.495 s
>    With this patch:    3.332 s
>
> In our (Intel) downstream debugger for our GPU targets, we see much
> more significant improvements and the worst case scenario is not seen
> in typical user scenarios (we recommend the users to run "info
> threads" with "set print frame-arguments none").

These are great results.

> Regression-tested on X86-64 Linux using the native-gdbserver and
> native-extended-gdbserver board files.
> ---
>  gdb/NEWS                  |   8 +++
>  gdb/doc/gdb.texinfo       |  24 +++++++++
>  gdb/remote.c              | 103 +++++++++++++++++++++++++++++++++++++-
>  gdbserver/remote-utils.cc |  14 ++++++
>  gdbserver/remote-utils.h  |   4 ++
>  gdbserver/server.cc       |  15 ++++++
>  6 files changed, 167 insertions(+), 1 deletion(-)

Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>

Just a couple of minor suggestions below.

> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 473431011d1..b86bfd17fc8 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -43125,6 +43125,30 @@ Reply:
>  for success
>  @end table
>  
> +@item e
> +@anchor{read expedited registers packet}
> +@cindex @samp{e} packet
> +Read expedited registers.
> +
> +Reply:
> +@table @samp
> +@item @var{n1}:@var{r1};@var{n2}:@var{r2};@dots{}
> +The reply contains a list of @samp{@var{n}:@var{r}} pairs, where each
> +pair carries the value @var{r} for the register numbered @var{n},
> +where @var{n} is a hexadecimal number.  The corresponding @var{r}
> +gives the register's value as a series of bytes in target byte order,
> +with each byte given by a two-digit hex number.
> +@end table
> +
> +The registers sent by the remote target are those that are deemed
> +important (e.g.@: most frequently accessed) registers.  The list is
> +determined by the target and is typically the same as the list sent in
> +a stop reply packet.  @xref{Stop Reply Packets}.

If I understood the patch correctly, when receiving a stop reply packet
GDB assumes that the expedited registers will also be present in the
reply to an 'e' packet.

So if there's a register that's in the stop reply but not in the 'e'
reply then GDB will fall into the worst case that you describe in the
commit message. I suggest adding some explanation about this here and
also a recommendation (if you agree with such a recommendation) that
it's better to have all expedited registers also present in the 'e'
reply.

Another alternative would be to only set packet_reg::in_e_packet to true
the first time a stop reply with expedited registers is received,
instead of in every one. In that case, GDB will only fall into this
specific worst case situation one time, and from then on learn which
registers are actually in the 'e' reply. Though perhaps the additional
complexity might not be worth it.

> +The @samp{e} packet can be an alternative to the @samp{g} packet to
> +reduce the transfer overhead between @value{GDBN} and the remote
> +target when the complete register file of a thread is large.
> +
>  @item F @var{RC},@var{EE},@var{CF};@var{XX}
>  @cindex @samp{F} packet
>  A reply from @value{GDBN} to an @samp{F} packet sent by the target.

<snip>

> @@ -9081,6 +9087,89 @@ remote_target::fetch_registers_using_g (struct regcache *regcache)
>    process_g_packet (regcache);
>  }
>  
> +/* Fetch the expedited registers.  */
> +
> +void
> +remote_target::fetch_registers_using_e (regcache *regcache)
> +{
> +  if (m_features.packet_support (PACKET_e) == PACKET_DISABLE)
> +    return;
> +
> +  remote_state *rs = get_remote_state ();
> +
> +  char *wbuf = rs->buf.data ();
> +  wbuf[0] = 'e';
> +  wbuf[1] = '\0';
> +
> +  putpkt (rs->buf);
> +  getpkt (&rs->buf, 0);
> +
> +  packet_result result = m_features.packet_ok (rs->buf, PACKET_e);
> +  switch (result.status ())
> +    {
> +    case PACKET_OK:
> +      break;
> +    case PACKET_UNKNOWN:
> +    case PACKET_ERROR:
> +      /* We do not raise an error here, so that we try 'g'
> +	 and 'p' further.  */
> +      return;
> +    }
> +
> +  gdbarch *gdbarch = regcache->arch ();
> +  remote_arch_state *rsa = rs->get_remote_arch_state (gdbarch);
> +  const char *buf = rs->buf.data ();
> +
> +  while (*buf != '\0')
> +    {
> +      /* Parse a register value.  The format is 'n...:r...;', where
> +	 n... = register number
> +	 r... = register contents.  */
> +      const char *buf_col = strchr (buf, ':');
> +      if ((buf_col == nullptr) || (buf_col == buf))
> +	error (_("Remote response is badly formatted: %s"
> +		 "\nhere: %s"), rs->buf.data (), buf);
> +
> +      ULONGEST pnum;
> +      const char *buf_temp = unpack_varlen_hex (buf, &pnum);
> +      if (buf_temp != buf_col)
> +	{
> +	  /* Not a number.  */
> +	  error (_("Remote response does not contain a number at: %s"),
> +		 buf);
> +	}
> +
> +      packet_reg *reg = packet_reg_from_pnum (gdbarch, rsa, pnum);
> +      if (reg == nullptr)
> +	error (_("Remote sent bad register number %s at %s"),
> +	       hex_string (pnum), buf);
> +
> +      reg->in_e_packet = true;
> +
> +      int reg_size = register_size (gdbarch, reg->regnum);
> +      buf = buf_temp + 1;
> +      if (*buf == 'x')
> +	regcache->raw_supply (reg->regnum, nullptr);
> +      else
> +	{
> +	  std::vector<gdb_byte> value;
> +	  value.reserve (reg_size);
> +
> +	  int data_size = hex2bin (buf, value.data (), reg_size);
> +	  if (data_size < reg_size)
> +	    error (_("Remote reply is too short: %s"), buf);
> +
> +	  regcache->raw_supply (reg->regnum, value.data ());

I suggest passing value instead of value.data () to raw_supply so that
the gdb::array_view version of the method is used.

> +	}
> +
> +      buf += 2 * reg_size;
> +      if (*buf != ';')
> +	error (_("Remote response does not contain a ';' at %s"), buf);
> +
> +      buf++;
> +    }
> +}
  
Aktemur, Tankut Baris April 23, 2025, 7:55 a.m. UTC | #4
On Saturday, April 5, 2025 1:41 AM, Thiago Jung Bauermann wrote:
> Hello Baris,
> 
> Tankut Baris Aktemur <tankut.baris.aktemur@intel.com> writes:
> 
> > Here is an experiment with a program that has 2000 threads.  After
> > hitting a BP, we do "maint set per-command time on" and measure the
> > cost of the following commands:
> >
> >  * info threads
> >
> >    In the experiment I did, there are no arguments of the frame.
> >    Expedited registers are sufficient to calculate the frame and print
> >    the thread rows in the output.
> >
> >    Without this patch: 2.601 s
> >    With this patch:    0.856 s
> >
> >  * thread apply all info reg rip
> >
> >    The 'rip' register is part of expedited registers.  Hence, for each
> >    thread, no further 'g' packet needs to be sent.  Without the patch,
> >    a 'g' is sent for each thread except the one that hit the
> >    breakpoint.
> >
> >    Without this patch: 2.317 s
> >    With this patch:    1.406 s
> >
> >  * thread apply all info reg rip r15
> >
> >    The 'r15' register is not part of the expedited set.  Therefore,
> >    after sending an 'e' to fetch rip, also a 'g' is sent to fetch r15.
> >    This benchmark represents the worst case.
> >
> >    Without this patch: 2.495 s
> >    With this patch:    3.332 s
> >
> > In our (Intel) downstream debugger for our GPU targets, we see much
> > more significant improvements and the worst case scenario is not seen
> > in typical user scenarios (we recommend the users to run "info
> > threads" with "set print frame-arguments none").
> 
> These are great results.
> 
> > Regression-tested on X86-64 Linux using the native-gdbserver and
> > native-extended-gdbserver board files.
> > ---
> >  gdb/NEWS                  |   8 +++
> >  gdb/doc/gdb.texinfo       |  24 +++++++++
> >  gdb/remote.c              | 103 +++++++++++++++++++++++++++++++++++++-
> >  gdbserver/remote-utils.cc |  14 ++++++
> >  gdbserver/remote-utils.h  |   4 ++
> >  gdbserver/server.cc       |  15 ++++++
> >  6 files changed, 167 insertions(+), 1 deletion(-)
> 
> Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
> 
> Just a couple of minor suggestions below.

Thank you for the review.

> > diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> > index 473431011d1..b86bfd17fc8 100644
> > --- a/gdb/doc/gdb.texinfo
> > +++ b/gdb/doc/gdb.texinfo
> > @@ -43125,6 +43125,30 @@ Reply:
> >  for success
> >  @end table
> >
> > +@item e
> > +@anchor{read expedited registers packet}
> > +@cindex @samp{e} packet
> > +Read expedited registers.
> > +
> > +Reply:
> > +@table @samp
> > +@item @var{n1}:@var{r1};@var{n2}:@var{r2};@dots{}
> > +The reply contains a list of @samp{@var{n}:@var{r}} pairs, where each
> > +pair carries the value @var{r} for the register numbered @var{n},
> > +where @var{n} is a hexadecimal number.  The corresponding @var{r}
> > +gives the register's value as a series of bytes in target byte order,
> > +with each byte given by a two-digit hex number.
> > +@end table
> > +
> > +The registers sent by the remote target are those that are deemed
> > +important (e.g.@: most frequently accessed) registers.  The list is
> > +determined by the target and is typically the same as the list sent in
> > +a stop reply packet.  @xref{Stop Reply Packets}.
> 
> If I understood the patch correctly, when receiving a stop reply packet
> GDB assumes that the expedited registers will also be present in the
> reply to an 'e' packet.
>
> So if there's a register that's in the stop reply but not in the 'e'
> reply then GDB will fall into the worst case that you describe in the
> commit message.

This is correct.  In fact RSP does not really define expedite registers.
It just allows a stop reply to contain some register values.  I think in
practice we can expect the server to be consistent in the list of expedite
registers it sends each time with a stop reply, but the definition in RSP
as well as the implementation in GDB allow flexibility to the server to send
a different list with each stop reply.

Please note that while what you described above can lead to the worst
case scenario of having to send both 'e' and 'g', it is not the only way.
Suppose the remote target has register R1 defined as an expedite register
and R2 is not.  The stop reply packet for thread T1 contains R1, making
GDB mark this register as expedited.  Next the user executes an operation
in the context of thread T2, for which no registers are fetched so far, and 
the operation depends on R1 only.  GDB would then fetch R1 of T2 using 'e'.
So far so good.  The user then executes another operation in the context of
T2, but this time the operation depends on R2.  GDB uses 'g' to fetch R2's
value (and all the other registers', including R1).  If GDB used 'g' in the
first place, we would have avoided the previous 'e' packet.  This scenario
can happen even when the target is consistent in the list of expedite
registers it sends and is the scenario that I aimed at explaining and
benchmarking in the commit message.

> I suggest adding some explanation about this here and
> also a recommendation (if you agree with such a recommendation) that
> it's better to have all expedited registers also present in the 'e'
> reply.

I added the following text:

  If the remote target
  does not send the same list of registers in a stop reply packet and
  in the reply to the @samp{e} packet, @value{GDBN} may have to make
  additional queries to fetch register values; e.g.@: first try the
  @samp{e} packet and then the @samp{g} packet.  Therefore, it is
  recommended that the remote target consistently sends the same list
  of registers in stop reply packets and in @samp{e} packet replies.

> Another alternative would be to only set packet_reg::in_e_packet to true
> the first time a stop reply with expedited registers is received,
> instead of in every one. In that case, GDB will only fall into this
> specific worst case situation one time, and from then on learn which
> registers are actually in the 'e' reply. Though perhaps the additional
> complexity might not be worth it.

It is I think a good idea to let the remote target announce its expedite
registers in the target XML from the very beginning, but let me keep the
current patch more concise and explore that direction as a follow up.

> > +The @samp{e} packet can be an alternative to the @samp{g} packet to
> > +reduce the transfer overhead between @value{GDBN} and the remote
> > +target when the complete register file of a thread is large.
> > +
> >  @item F @var{RC},@var{EE},@var{CF};@var{XX}
> >  @cindex @samp{F} packet
> >  A reply from @value{GDBN} to an @samp{F} packet sent by the target.
> 
> <snip>
> 
> > @@ -9081,6 +9087,89 @@ remote_target::fetch_registers_using_g (struct regcache *regcache)
> >    process_g_packet (regcache);
> >  }
> >
> > +/* Fetch the expedited registers.  */
> > +
> > +void
> > +remote_target::fetch_registers_using_e (regcache *regcache)
> > +{
> > +  if (m_features.packet_support (PACKET_e) == PACKET_DISABLE)
> > +    return;
> > +
> > +  remote_state *rs = get_remote_state ();
> > +
> > +  char *wbuf = rs->buf.data ();
> > +  wbuf[0] = 'e';
> > +  wbuf[1] = '\0';
> > +
> > +  putpkt (rs->buf);
> > +  getpkt (&rs->buf, 0);
> > +
> > +  packet_result result = m_features.packet_ok (rs->buf, PACKET_e);
> > +  switch (result.status ())
> > +    {
> > +    case PACKET_OK:
> > +      break;
> > +    case PACKET_UNKNOWN:
> > +    case PACKET_ERROR:
> > +      /* We do not raise an error here, so that we try 'g'
> > +	 and 'p' further.  */
> > +      return;
> > +    }
> > +
> > +  gdbarch *gdbarch = regcache->arch ();
> > +  remote_arch_state *rsa = rs->get_remote_arch_state (gdbarch);
> > +  const char *buf = rs->buf.data ();
> > +
> > +  while (*buf != '\0')
> > +    {
> > +      /* Parse a register value.  The format is 'n...:r...;', where
> > +	 n... = register number
> > +	 r... = register contents.  */
> > +      const char *buf_col = strchr (buf, ':');
> > +      if ((buf_col == nullptr) || (buf_col == buf))
> > +	error (_("Remote response is badly formatted: %s"
> > +		 "\nhere: %s"), rs->buf.data (), buf);
> > +
> > +      ULONGEST pnum;
> > +      const char *buf_temp = unpack_varlen_hex (buf, &pnum);
> > +      if (buf_temp != buf_col)
> > +	{
> > +	  /* Not a number.  */
> > +	  error (_("Remote response does not contain a number at: %s"),
> > +		 buf);
> > +	}
> > +
> > +      packet_reg *reg = packet_reg_from_pnum (gdbarch, rsa, pnum);
> > +      if (reg == nullptr)
> > +	error (_("Remote sent bad register number %s at %s"),
> > +	       hex_string (pnum), buf);
> > +
> > +      reg->in_e_packet = true;
> > +
> > +      int reg_size = register_size (gdbarch, reg->regnum);
> > +      buf = buf_temp + 1;
> > +      if (*buf == 'x')
> > +	regcache->raw_supply (reg->regnum, nullptr);
> > +      else
> > +	{
> > +	  std::vector<gdb_byte> value;
> > +	  value.reserve (reg_size);
> > +
> > +	  int data_size = hex2bin (buf, value.data (), reg_size);
> > +	  if (data_size < reg_size)
> > +	    error (_("Remote reply is too short: %s"), buf);
> > +
> > +	  regcache->raw_supply (reg->regnum, value.data ());
> 
> I suggest passing value instead of value.data () to raw_supply so that
> the gdb::array_view version of the method is used.

Thanks.  I made this change in v2.

> > +	}
> > +
> > +      buf += 2 * reg_size;
> > +      if (*buf != ';')
> > +	error (_("Remote response does not contain a ';' at %s"), buf);
> > +
> > +      buf++;
> > +    }
> > +}
> 
> --
> Thiago

Regards
-Baris


Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
  

Patch

diff --git a/gdb/NEWS b/gdb/NEWS
index 0aac7a7b13a..2d54b64586b 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -73,6 +73,14 @@  binary-upload in qSupported reply
   stub doesn't report this feature supported, then GDB will not use
   the 'x' packet.
 
+e
+  Fetch the "expedited" registers from the remote target.  These are
+  the registers that are deemed the most important by the target and
+  are typically the same as those sent in a 'T' stop reply packet.
+  The 'e' packet can be an alternative to the 'g' packet to reduce
+  the transfer overhead between GDB and the remote target when the
+  complete register file of a thread is large.
+
 * Changed remote packets
 
 qXfer:threads:read
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 473431011d1..b86bfd17fc8 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -43125,6 +43125,30 @@  Reply:
 for success
 @end table
 
+@item e
+@anchor{read expedited registers packet}
+@cindex @samp{e} packet
+Read expedited registers.
+
+Reply:
+@table @samp
+@item @var{n1}:@var{r1};@var{n2}:@var{r2};@dots{}
+The reply contains a list of @samp{@var{n}:@var{r}} pairs, where each
+pair carries the value @var{r} for the register numbered @var{n},
+where @var{n} is a hexadecimal number.  The corresponding @var{r}
+gives the register's value as a series of bytes in target byte order,
+with each byte given by a two-digit hex number.
+@end table
+
+The registers sent by the remote target are those that are deemed
+important (e.g.@: most frequently accessed) registers.  The list is
+determined by the target and is typically the same as the list sent in
+a stop reply packet.  @xref{Stop Reply Packets}.
+
+The @samp{e} packet can be an alternative to the @samp{g} packet to
+reduce the transfer overhead between @value{GDBN} and the remote
+target when the complete register file of a thread is large.
+
 @item F @var{RC},@var{EE},@var{CF};@var{XX}
 @cindex @samp{F} packet
 A reply from @value{GDBN} to an @samp{F} packet sent by the target.
diff --git a/gdb/remote.c b/gdb/remote.c
index 66c58c884d3..f9d9325da23 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -236,6 +236,7 @@  enum {
   PACKET_qSymbol,
   PACKET_P,
   PACKET_p,
+  PACKET_e,
   PACKET_Z0,
   PACKET_Z1,
   PACKET_Z2,
@@ -466,6 +467,7 @@  struct packet_reg
   long regnum; /* GDB's internal register number.  */
   LONGEST pnum; /* Remote protocol register number.  */
   bool in_g_packet; /* Always part of G packet.  */
+  bool in_e_packet; /* Part of the E packet (expedited registers).  */
   /* long size in bytes;  == register_size (arch, regnum);
      at present.  */
   /* char *name; == gdbarch_register_name (arch, regnum);
@@ -1321,6 +1323,9 @@  class remote_target : public process_stratum_target
   int send_g_packet ();
   void process_g_packet (struct regcache *regcache);
   void fetch_registers_using_g (struct regcache *regcache);
+
+  void fetch_registers_using_e (regcache *regcache);
+
   int store_register_using_P (const struct regcache *regcache,
 			      packet_reg *reg);
   void store_registers_using_G (const struct regcache *regcache);
@@ -8240,6 +8245,7 @@  Packet: '%s'\n"),
 			   hex_string (pnum), p, buf);
 
 		  int reg_size = register_size (event->arch, reg->regnum);
+		  reg->in_e_packet = true;
 		  cached_reg.num = reg->regnum;
 		  cached_reg.data.resize (reg_size);
 
@@ -9081,6 +9087,89 @@  remote_target::fetch_registers_using_g (struct regcache *regcache)
   process_g_packet (regcache);
 }
 
+/* Fetch the expedited registers.  */
+
+void
+remote_target::fetch_registers_using_e (regcache *regcache)
+{
+  if (m_features.packet_support (PACKET_e) == PACKET_DISABLE)
+    return;
+
+  remote_state *rs = get_remote_state ();
+
+  char *wbuf = rs->buf.data ();
+  wbuf[0] = 'e';
+  wbuf[1] = '\0';
+
+  putpkt (rs->buf);
+  getpkt (&rs->buf, 0);
+
+  packet_result result = m_features.packet_ok (rs->buf, PACKET_e);
+  switch (result.status ())
+    {
+    case PACKET_OK:
+      break;
+    case PACKET_UNKNOWN:
+    case PACKET_ERROR:
+      /* We do not raise an error here, so that we try 'g'
+	 and 'p' further.  */
+      return;
+    }
+
+  gdbarch *gdbarch = regcache->arch ();
+  remote_arch_state *rsa = rs->get_remote_arch_state (gdbarch);
+  const char *buf = rs->buf.data ();
+
+  while (*buf != '\0')
+    {
+      /* Parse a register value.  The format is 'n...:r...;', where
+	 n... = register number
+	 r... = register contents.  */
+      const char *buf_col = strchr (buf, ':');
+      if ((buf_col == nullptr) || (buf_col == buf))
+	error (_("Remote response is badly formatted: %s"
+		 "\nhere: %s"), rs->buf.data (), buf);
+
+      ULONGEST pnum;
+      const char *buf_temp = unpack_varlen_hex (buf, &pnum);
+      if (buf_temp != buf_col)
+	{
+	  /* Not a number.  */
+	  error (_("Remote response does not contain a number at: %s"),
+		 buf);
+	}
+
+      packet_reg *reg = packet_reg_from_pnum (gdbarch, rsa, pnum);
+      if (reg == nullptr)
+	error (_("Remote sent bad register number %s at %s"),
+	       hex_string (pnum), buf);
+
+      reg->in_e_packet = true;
+
+      int reg_size = register_size (gdbarch, reg->regnum);
+      buf = buf_temp + 1;
+      if (*buf == 'x')
+	regcache->raw_supply (reg->regnum, nullptr);
+      else
+	{
+	  std::vector<gdb_byte> value;
+	  value.reserve (reg_size);
+
+	  int data_size = hex2bin (buf, value.data (), reg_size);
+	  if (data_size < reg_size)
+	    error (_("Remote reply is too short: %s"), buf);
+
+	  regcache->raw_supply (reg->regnum, value.data ());
+	}
+
+      buf += 2 * reg_size;
+      if (*buf != ';')
+	error (_("Remote response does not contain a ';' at %s"), buf);
+
+      buf++;
+    }
+}
+
 /* Make the remote selected traceframe match GDB's selected
    traceframe.  */
 
@@ -9121,7 +9210,17 @@  remote_target::fetch_registers (struct regcache *regcache, int regnum)
 
       gdb_assert (reg != NULL);
 
-      /* If this register might be in the 'g' packet, try that first -
+      /* If we believe this register is included in an 'e' packet, try
+	 it first.  */
+      if (reg->in_e_packet)
+	{
+	  reg->in_e_packet = false;
+	  fetch_registers_using_e (regcache);
+	  if (reg->in_e_packet)
+	    return;
+	}
+
+      /* If this register might be in the 'g' packet, try that next;
 	 we are likely to read more than one register.  If this is the
 	 first 'g' packet, we might be overly optimistic about its
 	 contents, so fall back to 'p'.  */
@@ -16338,6 +16437,8 @@  Show the maximum size of the address (in bits) in a memory packet."), NULL,
 
   add_packet_config_cmd (PACKET_p, "p", "fetch-register", 1);
 
+  add_packet_config_cmd (PACKET_e, "e", "fetch-expedited-registers", 1);
+
   add_packet_config_cmd (PACKET_Z0, "Z0", "software-breakpoint", 0);
 
   add_packet_config_cmd (PACKET_Z1, "Z1", "hardware-breakpoint", 0);
diff --git a/gdbserver/remote-utils.cc b/gdbserver/remote-utils.cc
index 67225c50f81..7a1dc4aaadb 100644
--- a/gdbserver/remote-utils.cc
+++ b/gdbserver/remote-utils.cc
@@ -1295,6 +1295,20 @@  prepare_resume_reply (char *buf, ptid_t ptid, const target_waitstatus &status)
 
 /* See remote-utils.h.  */
 
+void
+output_expedite_registers (char *buf)
+{
+  regcache *regcache = get_thread_regcache (current_thread);
+
+  for (const std::string &expedited_reg : regcache->tdesc->expedite_regs)
+    buf = outreg (regcache, find_regno (regcache->tdesc,
+					expedited_reg.c_str ()), buf);
+
+  *buf = '\0';
+}
+
+/* See remote-utils.h.  */
+
 const char *
 decode_m_packet_params (const char *from, CORE_ADDR *mem_addr_ptr,
 			unsigned int *len_ptr, const char end_marker)
diff --git a/gdbserver/remote-utils.h b/gdbserver/remote-utils.h
index 65ce604f9dc..dd939240a01 100644
--- a/gdbserver/remote-utils.h
+++ b/gdbserver/remote-utils.h
@@ -43,6 +43,10 @@  void check_remote_input_interrupt_request (void);
 void prepare_resume_reply (char *buf, ptid_t ptid,
 			   const target_waitstatus &status);
 
+/* Output the expedited registers of the current thread into BUF
+   as a list of 'regno:value;' pairs.  */
+void output_expedite_registers (char *buf);
+
 const char *decode_address_to_semicolon (CORE_ADDR *addrp, const char *start);
 void decode_address (CORE_ADDR *addrp, const char *start, int len);
 
diff --git a/gdbserver/server.cc b/gdbserver/server.cc
index def01c1ee80..880cc9ffc13 100644
--- a/gdbserver/server.cc
+++ b/gdbserver/server.cc
@@ -4677,6 +4677,21 @@  process_serial_event (void)
 	  cs.own_buf[0] = '\0';
 	}
       break;
+    case 'e':
+      require_running_or_break (cs.own_buf);
+      if (cs.current_traceframe >= 0)
+	{
+	  /* This packet is not supported in this mode.  */
+	  write_enn (cs.own_buf);
+	}
+      else
+	{
+	  if (!set_desired_thread ())
+	    write_enn (cs.own_buf);
+	  else
+	    output_expedite_registers (cs.own_buf);
+	}
+      break;
     case 'g':
       require_running_or_break (cs.own_buf);
       if (cs.current_traceframe >= 0)