fix segfault in tfile_write_status if ts->stop_desc is null

Message ID hwjgyafrj2xxhp.fsf@brandonb.zetier.com
State New
Headers
Series fix segfault in tfile_write_status if ts->stop_desc is null |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed

Commit Message

Brandon Belew March 24, 2025, 3:09 p.m. UTC
  If the trace_status ts->stop_desc member is null, the 'tsave' command
would segfault within tfile_write_status. Add a null-check for it. 

---

 gdb/tracefile-tfile.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
  

Comments

Tom Tromey March 26, 2025, 4:01 p.m. UTC | #1
>>>>> "Brandon" == Brandon Belew <brandon.belew@zetier.com> writes:

Brandon> If the trace_status ts->stop_desc member is null, the 'tsave' command
Brandon> would segfault within tfile_write_status. Add a null-check for it. 

I don't have any objection to the patch but I'm wondering how this
occurs and/or whether we can get a test case for it.

Tom
  
Brandon Belew April 9, 2025, 12:01 p.m. UTC | #2
Tom Tromey <tom@tromey.com> writes:

>>>>>> "Brandon" == Brandon Belew <brandon.belew@zetier.com> writes:
>
> Brandon> If the trace_status ts->stop_desc member is null, the 'tsave' command
> Brandon> would segfault within tfile_write_status. Add a null-check for it. 
>
> I don't have any objection to the patch but I'm wondering how this
> occurs and/or whether we can get a test case for it.

Tom, sorry for the delayed response. I'm on a team working on a custom
gdbserver implementation for an emulated device, and we are trying to
add support for the various tracepoint remote packets. I hit this bug
trying to do a 'tsave' command after gdb had connected to our
server. I'd set a tracepoint with 'actions' to 'collect $regs' (which is
the only action our stub supports currently), tstart, tstop, and tfind
to get to the single trace snapshot our stub returned.

I haven't really tracked down why the field is null - our server may not
be sending back all the data that gdb assumes will be present - but did
enough debugging to find the site of the segfault and it was pretty
clearly a null deref which was easy enough to fix. Sorry I'm not sure
how to provide a better test case for this.

    ~Brandon
  
Luis Machado April 9, 2025, 12:27 p.m. UTC | #3
On 4/9/25 13:01, Brandon Belew wrote:
> Tom Tromey <tom@tromey.com> writes:
> 
>>>>>>> "Brandon" == Brandon Belew <brandon.belew@zetier.com> writes:
>>
>> Brandon> If the trace_status ts->stop_desc member is null, the 'tsave' command
>> Brandon> would segfault within tfile_write_status. Add a null-check for it. 
>>
>> I don't have any objection to the patch but I'm wondering how this
>> occurs and/or whether we can get a test case for it.
> 
> Tom, sorry for the delayed response. I'm on a team working on a custom
> gdbserver implementation for an emulated device, and we are trying to
> add support for the various tracepoint remote packets. I hit this bug
> trying to do a 'tsave' command after gdb had connected to our
> server. I'd set a tracepoint with 'actions' to 'collect $regs' (which is
> the only action our stub supports currently), tstart, tstop, and tfind
> to get to the single trace snapshot our stub returned.
> 
> I haven't really tracked down why the field is null - our server may not
> be sending back all the data that gdb assumes will be present - but did
> enough debugging to find the site of the segfault and it was pretty
> clearly a null deref which was easy enough to fix. Sorry I'm not sure
> how to provide a better test case for this.
> 
>     ~Brandon 

gdb/tracepoint.c:parse_trace_status should take care of setting stop_desc to nullptr
before filling it with the stop description. Maybe the trace status reply from your
custom gdbserver is not returning what gdb wants to see for the stop description.

With that said, the issue here might be the implementation of strlen not accepting a
nullptr gracefully and crashing. I guess that depends on your libraries. Seems worth
guarding against it anyway.

Out of curiosity, could you describe what sort of use you plan to make out of the
gdb tracepoint machinery? And would it be just regular tracepoint or also fast
tracepoints (where we JIT instructions on the target).

I ask because this feature barely sees any publicly-visible use. It helps us plan
for maintenance and potential deprecation of features.
  
Brandon Belew April 10, 2025, 1:04 p.m. UTC | #4
Luis Machado <luis.machado@arm.com> writes:
> gdb/tracepoint.c:parse_trace_status should take care of setting stop_desc to nullptr
> before filling it with the stop description. Maybe the trace status reply from your
> custom gdbserver is not returning what gdb wants to see for the stop description.

According to [[info:gdb#Tracepoint Packets]], the 'qTStatus' packet can
contain a tstop field described as follows:

     'tstop[:TEXT]:0'
          The trace was stopped by a user-originated stop command.  The
          optional TEXT field is a user-supplied string supplied as part
          of the stop command (for instance, an explanation of why the
          trace was stopped manually).  It is hex-encoded.

Our implementation sends back 'tstop:0', so we were not providing an
explanation. The documentation says this is optional, but the crash I
encountered seems to indicate GDB was assuming it was present - hence my
patch to add a nullptr check before accessing 'ts->stop_desc'.

> With that said, the issue here might be the implementation of strlen
> not accepting a nullptr gracefully and crashing. I guess that depends
> on your libraries. Seems worth guarding against it anyway.

Yes, nearly all libc implementations of strlen, including glibc 2.27
which is what I'm using on Ubuntu 18.04, will segfault if provided a
NULL pointer.

> Out of curiosity, could you describe what sort of use you plan to make
> out of the gdb tracepoint machinery? And would it be just regular
> tracepoint or also fast tracepoints (where we JIT instructions on the
> target).
>
> I ask because this feature barely sees any publicly-visible use. It
> helps us plan for maintenance and potential deprecation of features.

My team is working on emulating and debugging some embedded devices,
where behavior is very timing-sensitive. Tracepoints are great because
they collect data quickly on-device, without tripping things like
watchdog timers. We have been using the gdbstub[0] rust library for
implementing the GDB remote serial protocol. We recently contributed
basic tracepoint support to this library[1] so we could use it on our
project. Many other projects[2] use the gdbstub library, so this
contribution may lead to more public real-world tracepoint users.

We probably will not use fast tracepoint packets. Our implementation
under the hood may end up using something very similar to the inserted
branch instructions mentioned on the FastTracepoints wiki[3], but we
consider that an implementation detail and don't intend to distinguish
between regular tracepoints and fast tracepoints. 


[0]: https://github.com/daniel5151/gdbstub.git
[1]: https://github.com/daniel5151/gdbstub/pull/160
[2]: https://github.com/daniel5151/gdbstub?tab=readme-ov-file#real-world-examples
[3]: https://www.sourceware.org/gdb/wiki/FastTracepoints
  
Brandon Belew April 14, 2025, 12:33 p.m. UTC | #5
Any other info you want about the tsave crash? I don't have commit
rights so can someone merge this if it looks good?

Tom previously said

> I don't have any objection to the patch but I'm wondering how this
> occurs and/or whether we can get a test case for it.

and I think I've at least addressed how it occurs (our tstop response in
the qTStatus packet, 'tstop[:TEXT]:0', elides the intended-to-be-optional
[:TEXT] and just sends 'tstop:0'), although I'm not sure how to add a
test case.

Original patch follows below.
---

If the trace_status ts->stop_desc member is null, the 'tsave' command
would segfault within tfile_write_status. Add a null-check for it. 

---

 gdb/tracefile-tfile.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gdb/tracefile-tfile.c b/gdb/tracefile-tfile.c
index b59b5c73325..d6e9dbeaffa 100644
--- a/gdb/tracefile-tfile.c
+++ b/gdb/tracefile-tfile.c
@@ -170,8 +170,8 @@ tfile_write_status (struct trace_file_writer *self,
 
   fprintf (writer->fp, "status %c;%s",
 	   (ts->running ? '1' : '0'), stop_reason_names[ts->stop_reason]);
-  if (ts->stop_reason == tracepoint_error
-      || ts->stop_reason == trace_stop_command)
+  if (ts->stop_desc != nullptr && (ts->stop_reason == tracepoint_error
+				   || ts->stop_reason == trace_stop_command))
     {
       char *buf = (char *) alloca (strlen (ts->stop_desc) * 2 + 1);
 
@@ -201,14 +201,14 @@ tfile_write_status (struct trace_file_writer *self,
       fprintf (writer->fp, ";stoptime:%s",
       phex_nz (ts->stop_time, sizeof (ts->stop_time)));
     }
-  if (ts->notes != NULL)
+  if (ts->notes != nullptr)
     {
       char *buf = (char *) alloca (strlen (ts->notes) * 2 + 1);
 
       bin2hex ((gdb_byte *) ts->notes, buf, strlen (ts->notes));
       fprintf (writer->fp, ";notes:%s", buf);
     }
-  if (ts->user_name != NULL)
+  if (ts->user_name != nullptr)
     {
       char *buf = (char *) alloca (strlen (ts->user_name) * 2 + 1);
  
Simon Marchi April 17, 2025, 2:37 a.m. UTC | #6
On 3/24/25 11:09 AM, Brandon Belew wrote:
> If the trace_status ts->stop_desc member is null, the 'tsave' command
> would segfault within tfile_write_status. Add a null-check for it. 
> 
> ---
> 
>  gdb/tracefile-tfile.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gdb/tracefile-tfile.c b/gdb/tracefile-tfile.c
> index b59b5c73325..d6e9dbeaffa 100644
> --- a/gdb/tracefile-tfile.c
> +++ b/gdb/tracefile-tfile.c
> @@ -170,8 +170,8 @@ tfile_write_status (struct trace_file_writer *self,
>  
>    fprintf (writer->fp, "status %c;%s",
>  	   (ts->running ? '1' : '0'), stop_reason_names[ts->stop_reason]);
> -  if (ts->stop_reason == tracepoint_error
> -      || ts->stop_reason == trace_stop_command)
> +  if (ts->stop_desc != nullptr && (ts->stop_reason == tracepoint_error
> +				   || ts->stop_reason == trace_stop_command))
>      {
>        char *buf = (char *) alloca (strlen (ts->stop_desc) * 2 + 1);

So, this code generates the "status" line in a trace, which the doc says
is the same format as the qTStatus packet, documented here:

https://sourceware.org/gdb/current/onlinedocs/gdb.html/Tracepoint-Packets.html#Tracepoint-Packets

If the stop reason is tstop, then the following text is indeed optional:

  tstop[:text]:0

However, for the error case, it's not:

  terror:text:tpnum

So I would write that as:

if (ts->stop_reason == tracepoint_error
    || (ts->stop_reason == trace_stop_command && ts->stop_desc != nullptr)

Hopefully, the rest of GDB agrees with that and always sets a stop_desc
when the stop_reason is tracepoint_error.  From what I read in function
parse_trace_status, it does.

The documentation of the trace_status::stop_desc field is:

  /* If stop_reason is tstop_command or tracepoint_error, this is an
     arbitrary string that may describe the reason for the stop in
     more detail.  */

(I guess that tstop_command should be trace_stop_command in that text)

This doc implies that if stop_reason is one of those two, then this
field does contain a string.  That comment would probably need to be
changed, to say that if stop_reason is trace_stop_command, then
stop_desc may contain a string.  And if stop_reason is tracepoint_error,
then stop_desc always contains a string.

Ultimately, I think that the reason why you see the crash is that your
stub replies with "stop:0", whereas even when there is no stop
description, gdbserver replies with "stop::0", resulting in an empty but
non-nullptr string.

I agree with Tom that enhancing a test case to cover this would be nice.
The only way I see would be to add a special switch to gdbserver to make
it behave like your stub and return "stop:0" instead of "stop::0".
Perhaps it could reuse the existing --disable-packet flag, which is used
for this kind of thing.  Then, a test case can be written around that.

Simon
  

Patch

diff --git a/gdb/tracefile-tfile.c b/gdb/tracefile-tfile.c
index b59b5c73325..d6e9dbeaffa 100644
--- a/gdb/tracefile-tfile.c
+++ b/gdb/tracefile-tfile.c
@@ -170,8 +170,8 @@  tfile_write_status (struct trace_file_writer *self,
 
   fprintf (writer->fp, "status %c;%s",
 	   (ts->running ? '1' : '0'), stop_reason_names[ts->stop_reason]);
-  if (ts->stop_reason == tracepoint_error
-      || ts->stop_reason == trace_stop_command)
+  if (ts->stop_desc != nullptr && (ts->stop_reason == tracepoint_error
+				   || ts->stop_reason == trace_stop_command))
     {
       char *buf = (char *) alloca (strlen (ts->stop_desc) * 2 + 1);
 
@@ -201,14 +201,14 @@  tfile_write_status (struct trace_file_writer *self,
       fprintf (writer->fp, ";stoptime:%s",
       phex_nz (ts->stop_time, sizeof (ts->stop_time)));
     }
-  if (ts->notes != NULL)
+  if (ts->notes != nullptr)
     {
       char *buf = (char *) alloca (strlen (ts->notes) * 2 + 1);
 
       bin2hex ((gdb_byte *) ts->notes, buf, strlen (ts->notes));
       fprintf (writer->fp, ";notes:%s", buf);
     }
-  if (ts->user_name != NULL)
+  if (ts->user_name != nullptr)
     {
       char *buf = (char *) alloca (strlen (ts->user_name) * 2 + 1);