[V4,5/9] New probe type: DTrace USDT probes.

Message ID 20150806213103.GC14992@adacore.com
State New, archived
Headers

Commit Message

Joel Brobecker Aug. 6, 2015, 9:31 p.m. UTC
  I finally had a bit of time to look into this again. I concluded
that the DOF data is invalid, or follows a format that's different
from the one that GDB expected, and so added some code to detect
and ignore... Attached is the patch with full explanation. Does it
make sense to you?

gdb/ChangeLog:

        * dtrace-probe.c (dtrace_process_dof): Ignore the objfile's DOF
        data if a DTRACE_DOF_SECT_TYPE_PROVIDER section is found to be
        smaller than expected.

Tested on x86_64-linux (official GDB testsuite) + x86-solaris
(AdaCore's testsuite).

Thanks,
  

Comments

Sergio Durigan Junior Aug. 6, 2015, 10:17 p.m. UTC | #1
On Thursday, August 06 2015, Joel Brobecker wrote:

> I finally had a bit of time to look into this again. I concluded
> that the DOF data is invalid, or follows a format that's different
> from the one that GDB expected, and so added some code to detect
> and ignore... Attached is the patch with full explanation. Does it
> make sense to you?
>
> gdb/ChangeLog:
>
>         * dtrace-probe.c (dtrace_process_dof): Ignore the objfile's DOF
>         data if a DTRACE_DOF_SECT_TYPE_PROVIDER section is found to be
>         smaller than expected.
>
> Tested on x86_64-linux (official GDB testsuite) + x86-solaris
> (AdaCore's testsuite).

Pretty nice investigation, Joel!

This patch makes sense to me and is OK to go in (it just has a small
typo below).

Something I don't like about this goto scheme is that it doesn't contain
useful information as to why the section has been discarded ("... does
not contain valid DOF data" is too generic).  But I'm just speaking my
mind here; this has nothing to do with your patch :-).

> Thanks,
> -- 
> Joel
>
> From 84179e3ce4008f36cbdbb4edf040d94b5f03b4c2 Mon Sep 17 00:00:00 2001
> From: Joel Brobecker <brobecker@adacore.com>
> Date: Thu, 6 Aug 2015 22:13:32 +0200
> Subject: [PATCH] [DTRACE] ignore invalid DOF provider sections
>
> On x86-solaris 10, we noticed that starting a program would sometimes
> cause the debugger to crash. For instance:
>
>     % gdb a
>     (gdb) break adainit
>     Breakpoint 1 at 0x8051f03
>     (gdb) run
>     Starting program: /[...]/a
>     [Thread debugging using libthread_db enabled]
>     zsh: 24398 segmentation fault (core dumped)  /[...]/gdb a
>
> The exception occurs in dtrace_process_dof_probe, while trying
> to process each probe referenced by a DTRACE_DOF_SECT_TYPE_PROVIDER
> DOF section from /lib/libc.so.1. For reference, the ELF section
> in that shared library providing the DOF data has the following
> characteristics:
>
>     Idx Name          Size      VMA       LMA       File off  Algn
>      14 .SUNW_dof     0000109d  000b4398  000b4398  000b4398  2**3
>                       CONTENTS, ALLOC, LOAD, READONLY, DATA
>
> The function dtrace_process_dof gets passed the contents of that
> ELF section, which allows is to determine the location of the table
                            ^^

"it" ?

> where all DOF sections are described. I dumped the contents of
> each DOF section as seen by GDB, and it seemed to be plausible,
> because the offset of each DOF section was pretty much equal to
> the sum of the offset and size of the previous DOF section. Also,
> the offset + sum of the last section corresponds to the size of
> the .SUNW_dof section.
>
> Things start to break down when processing one of the DOF sections
> that has a type of DTRACE_DOF_SECT_TYPE_PROVIDER. It gets the contents
> of this DOF section via:
>
>         struct dtrace_dof_provider *provider = (struct dtrace_dof_provider *)
>           DTRACE_DOF_PTR (dof, DOF_UINT (dof, section->dofs_offset));
>
> Said more simply, the struct dtrace_dof_provider data is at
> section->dofs_offset of the entire DOF contents. Given that
> the contents of SECTION seemed to make sense, so far so good.
>
> However, what SECTION tells us is that our DOF provider section
> is 40 bytes long:
>
>     (gdb) print *section
>     $36 = {dofs_type = 15, dofs_align = 4, dofs_flags = 1,
>            dofs_entsize = 0, dofs_offset = 3264, dofs_size = 40}
>                                                  ^^^^^^^^^^^^^^
>
> But on the other hand:
>
>     (gdb) p sizeof (struct dtrace_dof_provider)
>     $54 = 44
>
> In other words GDB expected a bigger DOF section and when we try to
> fetch the value of the last field of that DOF section (dofpv_prenoffs)...
>
>     eoffsets_s = DTRACE_DOF_SECT (dof,
>                                   DOF_UINT (dof, provider->dofpv_prenoffs));
>
> ... we end up reading data that actually belongs to another DOF
> section, and therefore irrelevant. This in turn means that the value
> of eofftab gets incorrectly set, since it depends on eoffsets_s:
>
>     eofftab = DTRACE_DOF_PTR (dof, DOF_UINT (dof, eoffsets_s->dofs_offset));
>
> This invalid address quickly catches up to us when we pass it to
> dtrace_process_dof_probe shortly after, where we crash because
> we try to subscript it:
>
>     Program received signal SIGSEGV, Segmentation fault.
>     0x08155bba in dtrace_process_dof_probe ([...]) at [...]/dtrace-probe.c:378
>     378             = ((uint32_t *) eofftab)[...];
>
> This patch fixes the issue by detecting provider DOF sections
> that are smaller than expected, and discarding the DOF data.
>
> gdb/ChangeLog:
>
>         * dtrace-probe.c (dtrace_process_dof): Ignore the objfile's DOF
>         data if a DTRACE_DOF_SECT_TYPE_PROVIDER section is found to be
>         smaller than expected.
> ---
>  gdb/dtrace-probe.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/gdb/dtrace-probe.c b/gdb/dtrace-probe.c
> index 3f2548d..9816f07 100644
> --- a/gdb/dtrace-probe.c
> +++ b/gdb/dtrace-probe.c
> @@ -519,6 +519,14 @@ dtrace_process_dof (asection *sect, struct objfile *objfile,
>  	unsigned int entsize = DOF_UINT (dof, probes_s->dofs_entsize);
>  	int num_probes;
>  
> +	if (DOF_UINT (dof, section->dofs_size)
> +	    < sizeof (struct dtrace_dof_provider))
> +	  {
> +	    /* The section is smaller than expected, so do not use it.
> +	       This has been observed on x86-solaris 10.  */
> +	    goto invalid_dof_data;
> +	  }
> +
>  	/* Very, unlikely, but could crash gdb if not handled
>  	   properly.  */
>  	if (entsize == 0)
> -- 
> 1.7.10.4
>

Thanks,
  
Jose E. Marchesi Aug. 7, 2015, 1:11 p.m. UTC | #2
Hi Joel.

    I finally had a bit of time to look into this again. I concluded
    that the DOF data is invalid, or follows a format that's different
    from the one that GDB expected, and so added some code to detect
    and ignore... Attached is the patch with full explanation. Does it
    make sense to you?

Yes, definitely, thanks for the fix!

I will pass this information to the DTrace hackers.  It would be nice to
figure out why such an inconsistent DOF data is being generated in
solarix-x86.
  
Jose E. Marchesi Aug. 7, 2015, 1:20 p.m. UTC | #3
+	if (DOF_UINT (dof, section->dofs_size)
    +	    < sizeof (struct dtrace_dof_provider))
    +	  {
    +	    /* The section is smaller than expected, so do not use it.
    +	       This has been observed on x86-solaris 10.  */
    +	    goto invalid_dof_data;
    +	  }
    +

It looks to me that at some point a new field was probably added to the
struct dtrace_dof_provider, changing its size.  If the DOF sections
found in the solaris-x86 binaries were generated before the change, that
would explain the discrepancy in size.
  
Jose E. Marchesi Aug. 7, 2015, 2:17 p.m. UTC | #4
+	if (DOF_UINT (dof, section->dofs_size)
        +	    < sizeof (struct dtrace_dof_provider))
        +	  {
        +	    /* The section is smaller than expected, so do not use it.
        +	       This has been observed on x86-solaris 10.  */
        +	    goto invalid_dof_data;
        +	  }
        +
    
    It looks to me that at some point a new field was probably added to the
    struct dtrace_dof_provider, changing its size.

Confirmed: DOF version 2 added support for is-enabled probes, and the
struct dtrace_dof_provider got a new 4 bytes field:

@@ -719,6 +722,7 @@ typedef struct dof_provider {
        dof_attr_t dofpv_funcattr;      /* function attributes */
        dof_attr_t dofpv_nameattr;      /* name attributes */
        dof_attr_t dofpv_argsattr;      /* args attributes */
+       dof_secidx_t dofpv_prenoffs;    /* link to DOF_SECT_PRENOFFS section */
 } dof_provider_t;

Looks like your solaris-x86 binaries are embedding DOFv1 programs.  You
can check that by looking at dof->dofh_ident[6]: it will be 1 in a DOFv1
program and 2 in a DOFv2 program.

So... we can make GDB to look at the DOF version and then use a
different struct (struct dof_provider_v1 ?) when handling embedded
programs using the old format.

I can prepare a patch for this.  For testing purposes pdtrace can be
made to generate DOFv1 programs even in systems lacking an old enough
dtrace.
  
Joel Brobecker Aug. 7, 2015, 3:12 p.m. UTC | #5
> Confirmed: DOF version 2 added support for is-enabled probes, and the
> struct dtrace_dof_provider got a new 4 bytes field:
> 
> @@ -719,6 +722,7 @@ typedef struct dof_provider {
>         dof_attr_t dofpv_funcattr;      /* function attributes */
>         dof_attr_t dofpv_nameattr;      /* name attributes */
>         dof_attr_t dofpv_argsattr;      /* args attributes */
> +       dof_secidx_t dofpv_prenoffs;    /* link to DOF_SECT_PRENOFFS section */
>  } dof_provider_t;
> 
> Looks like your solaris-x86 binaries are embedding DOFv1 programs.  You
> can check that by looking at dof->dofh_ident[6]: it will be 1 in a DOFv1
> program and 2 in a DOFv2 program.

I think I can confirm that. This is the contents of the .SUNW_dof
section for the .so file:

    Contents of section .SUNW_dof:
     b4398 7f444f46 01010102 08080000 00000000  .DOF............
                        ^^
                        ||

> So... we can make GDB to look at the DOF version and then use a
> different struct (struct dof_provider_v1 ?) when handling embedded
> programs using the old format.
>
> I can prepare a patch for this.  For testing purposes pdtrace can be
> made to generate DOFv1 programs even in systems lacking an old enough
> dtrace.

Sure. I would perhaps also consider the idea of not support DOF v1,
if v1 support is only of marginal value.

Also, I didn't say it before, I wasn't so sure that casting a binary
stream to a structure to read its contents was the best way to do
things. It can of works, but makes the code a little harder to read,
IMO, because accessing its contents requires the accessor macros.
Perhaps it might be better to just have decoding routines that take
the binary data and produce the corresponding struct, with the data
properly decoded. That way, you can add DOF version checks and decode
the data differently, without the rest of the code having to worry
about which version of the type they should be using. It's not obvious
to me that this way is better than the current way, but that's how
I would have tried it first... (just my 2 cents)
  
Joel Brobecker Aug. 7, 2015, 3:20 p.m. UTC | #6
> Pretty nice investigation, Joel!
> 
> This patch makes sense to me and is OK to go in (it just has a small
> typo below).

Thanks! Patch just pushed with the tiny correction you indicated.

> Something I don't like about this goto scheme is that it doesn't contain
> useful information as to why the section has been discarded ("... does
> not contain valid DOF data" is too generic).  But I'm just speaking my
> mind here; this has nothing to do with your patch :-).

We could improve it indeed. For a rainy day...
  
Sergio Durigan Junior Aug. 10, 2015, 3:21 a.m. UTC | #7
On Friday, August 07 2015, Joel Brobecker wrote:

>> So... we can make GDB to look at the DOF version and then use a
>> different struct (struct dof_provider_v1 ?) when handling embedded
>> programs using the old format.
>>
>> I can prepare a patch for this.  For testing purposes pdtrace can be
>> made to generate DOFv1 programs even in systems lacking an old enough
>> dtrace.

Thanks, Jose.  That would be much appreciated.

> Also, I didn't say it before, I wasn't so sure that casting a binary
> stream to a structure to read its contents was the best way to do
> things. It can of works, but makes the code a little harder to read,
> IMO, because accessing its contents requires the accessor macros.
> Perhaps it might be better to just have decoding routines that take
> the binary data and produce the corresponding struct, with the data
> properly decoded. That way, you can add DOF version checks and decode
> the data differently, without the rest of the code having to worry
> about which version of the type they should be using. It's not obvious
> to me that this way is better than the current way, but that's how
> I would have tried it first... (just my 2 cents)

I agree.  Something else really worth doing is creating a debug setting
for DTrace probes, and print useful information while decoding them.
Almost all (if not all) the information that Joel gathered using
external tools could have been printed from inside GDB.

Cheers,
  
Jose E. Marchesi Aug. 10, 2015, 2:37 p.m. UTC | #8
> Also, I didn't say it before, I wasn't so sure that casting a binary
    > stream to a structure to read its contents was the best way to do
    > things. It can of works, but makes the code a little harder to read,
    > IMO, because accessing its contents requires the accessor macros.
    > Perhaps it might be better to just have decoding routines that take
    > the binary data and produce the corresponding struct, with the data
    > properly decoded. That way, you can add DOF version checks and decode
    > the data differently, without the rest of the code having to worry
    > about which version of the type they should be using. It's not obvious
    > to me that this way is better than the current way, but that's how
    > I would have tried it first... (just my 2 cents)
    
    I agree.  Something else really worth doing is creating a debug setting
    for DTrace probes, and print useful information while decoding them.

That sounds like a good idea.  Let me work on that...
  

Patch

From 84179e3ce4008f36cbdbb4edf040d94b5f03b4c2 Mon Sep 17 00:00:00 2001
From: Joel Brobecker <brobecker@adacore.com>
Date: Thu, 6 Aug 2015 22:13:32 +0200
Subject: [PATCH] [DTRACE] ignore invalid DOF provider sections

On x86-solaris 10, we noticed that starting a program would sometimes
cause the debugger to crash. For instance:

    % gdb a
    (gdb) break adainit
    Breakpoint 1 at 0x8051f03
    (gdb) run
    Starting program: /[...]/a
    [Thread debugging using libthread_db enabled]
    zsh: 24398 segmentation fault (core dumped)  /[...]/gdb a

The exception occurs in dtrace_process_dof_probe, while trying
to process each probe referenced by a DTRACE_DOF_SECT_TYPE_PROVIDER
DOF section from /lib/libc.so.1. For reference, the ELF section
in that shared library providing the DOF data has the following
characteristics:

    Idx Name          Size      VMA       LMA       File off  Algn
     14 .SUNW_dof     0000109d  000b4398  000b4398  000b4398  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA

The function dtrace_process_dof gets passed the contents of that
ELF section, which allows is to determine the location of the table
where all DOF sections are described. I dumped the contents of
each DOF section as seen by GDB, and it seemed to be plausible,
because the offset of each DOF section was pretty much equal to
the sum of the offset and size of the previous DOF section. Also,
the offset + sum of the last section corresponds to the size of
the .SUNW_dof section.

Things start to break down when processing one of the DOF sections
that has a type of DTRACE_DOF_SECT_TYPE_PROVIDER. It gets the contents
of this DOF section via:

        struct dtrace_dof_provider *provider = (struct dtrace_dof_provider *)
          DTRACE_DOF_PTR (dof, DOF_UINT (dof, section->dofs_offset));

Said more simply, the struct dtrace_dof_provider data is at
section->dofs_offset of the entire DOF contents. Given that
the contents of SECTION seemed to make sense, so far so good.

However, what SECTION tells us is that our DOF provider section
is 40 bytes long:

    (gdb) print *section
    $36 = {dofs_type = 15, dofs_align = 4, dofs_flags = 1,
           dofs_entsize = 0, dofs_offset = 3264, dofs_size = 40}
                                                 ^^^^^^^^^^^^^^

But on the other hand:

    (gdb) p sizeof (struct dtrace_dof_provider)
    $54 = 44

In other words GDB expected a bigger DOF section and when we try to
fetch the value of the last field of that DOF section (dofpv_prenoffs)...

    eoffsets_s = DTRACE_DOF_SECT (dof,
                                  DOF_UINT (dof, provider->dofpv_prenoffs));

... we end up reading data that actually belongs to another DOF
section, and therefore irrelevant. This in turn means that the value
of eofftab gets incorrectly set, since it depends on eoffsets_s:

    eofftab = DTRACE_DOF_PTR (dof, DOF_UINT (dof, eoffsets_s->dofs_offset));

This invalid address quickly catches up to us when we pass it to
dtrace_process_dof_probe shortly after, where we crash because
we try to subscript it:

    Program received signal SIGSEGV, Segmentation fault.
    0x08155bba in dtrace_process_dof_probe ([...]) at [...]/dtrace-probe.c:378
    378             = ((uint32_t *) eofftab)[...];

This patch fixes the issue by detecting provider DOF sections
that are smaller than expected, and discarding the DOF data.

gdb/ChangeLog:

        * dtrace-probe.c (dtrace_process_dof): Ignore the objfile's DOF
        data if a DTRACE_DOF_SECT_TYPE_PROVIDER section is found to be
        smaller than expected.
---
 gdb/dtrace-probe.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/gdb/dtrace-probe.c b/gdb/dtrace-probe.c
index 3f2548d..9816f07 100644
--- a/gdb/dtrace-probe.c
+++ b/gdb/dtrace-probe.c
@@ -519,6 +519,14 @@  dtrace_process_dof (asection *sect, struct objfile *objfile,
 	unsigned int entsize = DOF_UINT (dof, probes_s->dofs_entsize);
 	int num_probes;
 
+	if (DOF_UINT (dof, section->dofs_size)
+	    < sizeof (struct dtrace_dof_provider))
+	  {
+	    /* The section is smaller than expected, so do not use it.
+	       This has been observed on x86-solaris 10.  */
+	    goto invalid_dof_data;
+	  }
+
 	/* Very, unlikely, but could crash gdb if not handled
 	   properly.  */
 	if (entsize == 0)
-- 
1.7.10.4