[V4,5/9] New probe type: DTrace USDT probes.

Message ID 20150325191418.GA32233@adacore.com
State New, archived
Headers

Commit Message

Joel Brobecker March 25, 2015, 7:14 p.m. UTC
  Jose,

> > 2015-02-02  Jose E. Marchesi  <jose.marchesi@oracle.com>
> >
> > 	* breakpoint.c (BREAK_ARGS_HELP): help string updated to mention
> > 	the -probe-dtrace new vpossible value for PROBE_MODIFIER.
> > 	* configure.ac (CONFIG_OBS): dtrace-probe.o added if BFD can
> > 	handle ELF files.
> > 	* Makefile.in (SFILES): dtrace-probe.c added.
> > 	* configure: Regenerate.
> > 	* dtrace-probe.c: New file.
> > 	(SHT_SUNW_dof): New constant.
> > 	(dtrace_probe_type): New enum.
> > 	(dtrace_probe_arg): New struct.
> > 	(dtrace_probe_arg_s): New typedef.
> > 	(struct dtrace_probe_enabler): New struct.
> > 	(dtrace_probe_enabler_s): New typedef.
> > 	(dtrace_probe): New struct.
> > 	(dtrace_probe_is_linespec): New function.
> > 	(dtrace_dof_sect_type): New enum.
> > 	(dtrace_dof_dofh_ident): Likewise.
> > 	(dtrace_dof_encoding): Likewise.
> > 	(DTRACE_DOF_ENCODE_LSB): Likewise.
> > 	(DTRACE_DOF_ENCODE_MSB): Likewise.
> > 	(dtrace_dof_hdr): New struct.
> > 	(dtrace_dof_sect): Likewise.
> > 	(dtrace_dof_provider): Likewise.
> > 	(dtrace_dof_probe): Likewise.
> > 	(DOF_UINT): New macro.
> > 	(DTRACE_DOF_PTR): Likewise.
> > 	(DTRACE_DOF_SECT): Likewise.
> > 	(dtrace_process_dof_probe): New function.
> > 	(dtrace_process_dof): Likewise.
> > 	(dtrace_build_arg_exprs): Likewise.
> > 	(dtrace_get_arg): Likewise.
> > 	(dtrace_get_probes): Likewise.
> > 	(dtrace_get_probe_argument_count): Likewise.
> > 	(dtrace_can_evaluate_probe_arguments): Likewise.
> > 	(dtrace_evaluate_probe_argument): Likewise.
> > 	(dtrace_compile_to_ax): Likewise.
> > 	(dtrace_probe_destroy): Likewise.
> > 	(dtrace_gen_info_probes_table_header): Likewise.
> > 	(dtrace_gen_info_probes_table_values): Likewise.
> > 	(dtrace_probe_is_enabled): Likewise.
> > 	(dtrace_probe_ops): New variable.
> > 	(info_probes_dtrace_command): New function.
> > 	(_initialize_dtrace_probe): Likewise.
> > 	(dtrace_type_name): Likewise.

Unfortunately, this patch is breaking GDB completely on Solaris.
The first issue I started investigating is the following, which
happens with nearly any unthreaded program:

    (gdb) start
    Temporary breakpoint 1 at 0x80593bc: file simple_main.adb, line 4.
    Starting program: /[...]/simple_main 
    [Thread debugging using libthread_db enabled]
    No definition of "mutex_t" in current context.

The error happens in...

> > +static void
> > +dtrace_process_dof_probe (struct objfile *objfile,

... and more precisely...

> > +      /* Store argument type descriptions.  A description of the type
> > +         of the argument is in the (J+1)th null-terminated string
> > +         starting at 'strtab' + 'probe->dofpr_nargv'.  */
> > +      ret->args = NULL;
> > +      p = strtab + DOF_UINT (dof, probe->dofpr_nargv);
> > +      for (j = 0; j < ret->probe_argc; j++)
> > +	{
> > +	  struct dtrace_probe_arg arg;
> > +	  struct expression *expr;
> > +
> > +	  arg.type_str = xstrdup (p);
> > +
> > +	  /* Use strtab_size as a sentinel.  */
> > +	  while (*p++ != '\0' && p - strtab < strtab_size);
> > +
> > +	  /* Try to parse a type expression from the type string.  If
> > +	     this does not work then we set the type to `long
> > +	     int'.  */
> > +          arg.type = builtin_type (gdbarch)->builtin_long;
> > +	  expr = parse_expression (arg.type_str);
                 ^^^^^^^^^^^^^^^^
                 there

The most obvious issue is that type "mutex_t" does not exist in
my program, so I think that the code should handle that case.

The second, less obvious issue, is that the parsing is done using
the current language, which I don't think is a good idea. Attached
is small prototype that attempts to do something like that.
I think we need a function that parses an expression using
a specific language, but I went for short prototype to test
my theory first.

That solves most of the issues, even if I don't really know if
it's not just sweeping them under the carpet instead.

And once I had that fixed, the next issue that I looked at was:

    (gdb) b adainit
    Breakpoint 1 at 0x8051f03
    (gdb) run
    Starting program: /[...]/a
    [Thread debugging using libthread_db enabled]
    zsh: 12378 segmentation fault (core dumped)  /[...]/gdb -q a

This is where I'm getting even more out of my league, here.
The SEGV happens on the following line:

377           uint32_t enabler_offset
378             = ((uint32_t *) eofftab)[DOF_UINT (dof, probe->dofpr_enoffidx) + i];

... and eofftab seems to have a bogus value, because:

    (gdb) p eofftab
    $6 = 0x9479fb2 <error: Cannot access memory at address 0x9479fb2>

For me to investigate further, I think I'd have to investigate how
probes work and what's where, and I don't really have the time or
interest at the moment. So, I'm hoping that you have access to some
Solaris systems and you could look into this?

I can send you reproducers, if you need. I investigated the issue on
x86-solaris 2.10, but the same happens on sparc-solaris 2.10 and
sparc64-solaris 2.10.

Thank you,
  

Comments

Jose E. Marchesi March 26, 2015, 4:21 p.m. UTC | #1
Hi Joel!
    
    > > +	  /* Try to parse a type expression from the type string.  If
    > > +	     this does not work then we set the type to `long
    > > +	     int'.  */
    > > +          arg.type = builtin_type (gdbarch)->builtin_long;
    > > +	  expr = parse_expression (arg.type_str);
                     ^^^^^^^^^^^^^^^^
                     there
    
    The most obvious issue is that type "mutex_t" does not exist in
    my program, so I think that the code should handle that case.

Certainly.
    
    The second, less obvious issue, is that the parsing is done using
    the current language, which I don't think is a good idea. Attached
    is small prototype that attempts to do something like that.
    I think we need a function that parses an expression using
    a specific language, but I went for short prototype to test
    my theory first.

    That solves most of the issues, even if I don't really know if
    it's not just sweeping them under the carpet instead.

Well, the TRY..CATCH in your prototype would catch any error that may be
thrown in parse_expression, and the `set_language' must take care of
changing the language, so I would say that is enough...
    
    And once I had that fixed, the next issue that I looked at was:
    
        (gdb) b adainit
        Breakpoint 1 at 0x8051f03
        (gdb) run
        Starting program: /[...]/a
        [Thread debugging using libthread_db enabled]
        zsh: 12378 segmentation fault (core dumped)  /[...]/gdb -q a
    
    This is where I'm getting even more out of my league, here.
    The SEGV happens on the following line:
    
    377           uint32_t enabler_offset
    378             = ((uint32_t *) eofftab)[DOF_UINT (dof, probe->dofpr_enoffidx) + i];

I will debug that SIGSEGV in solaris, but the problem seems to be
related to the DOF program embedded in your program, more than to the
platform.

Could you please send me your sparc-solaris reproducer?
Salud!
  
Joel Brobecker March 26, 2015, 5:50 p.m. UTC | #2
> Well, the TRY..CATCH in your prototype would catch any error that may be
> thrown in parse_expression, and the `set_language' must take care of
> changing the language, so I would say that is enough...

OK - I will send an updated patch that makes things a little cleaner.
I didn't know whether it was OK to default to type long makes much
sense when the probe says the parameter should be using type "mutex_t".

>     And once I had that fixed, the next issue that I looked at was:
>     
>         (gdb) b adainit
>         Breakpoint 1 at 0x8051f03
>         (gdb) run
>         Starting program: /[...]/a
>         [Thread debugging using libthread_db enabled]
>         zsh: 12378 segmentation fault (core dumped)  /[...]/gdb -q a
>     
>     This is where I'm getting even more out of my league, here.
>     The SEGV happens on the following line:
>     
>     377           uint32_t enabler_offset
>     378             = ((uint32_t *) eofftab)[DOF_UINT (dof, probe->dofpr_enoffidx) + i];
> 
> I will debug that SIGSEGV in solaris, but the problem seems to be
> related to the DOF program embedded in your program, more than to the
> platform.
> 
> Could you please send me your sparc-solaris reproducer?

Thanks for the offer to help! Sadly, the SEGV doesn't seem to
happen on sparc-solaris, it seems. Once I apply the patch above,
I pretty much get normal results back (yay!).

So, the problem appears to be specific to x86-solaris. I didn't know
the DOF program was embedded in the executable, but I suspect there is
a problem in its contents. How do you think we should proceed?

Thanks!
  
Jose E. Marchesi March 26, 2015, 11:45 p.m. UTC | #3
> Could you please send me your sparc-solaris reproducer?
    
    Thanks for the offer to help! Sadly, the SEGV doesn't seem to
    happen on sparc-solaris, it seems. Once I apply the patch above,
    I pretty much get normal results back (yay!).
    
    So, the problem appears to be specific to x86-solaris. I didn't know
    the DOF program was embedded in the executable, but I suspect there is
    a problem in its contents. How do you think we should proceed?
    
Hmm, this may be an endianness problem:

/* DOF supports two different encodings: MSB (big-endian) and LSB
   (little-endian).  The encoding is itself encoded in the DOF header.
   The following function returns an unsigned value in the host
   endianness.  */

#define DOF_UINT(dof, field)						\
  extract_unsigned_integer ((gdb_byte *) &(field),			\
			    sizeof ((field)),				\
			    (((dof)->dofh_ident[DTRACE_DOF_ID_ENCODING] \
			      == DTRACE_DOF_ENCODE_MSB)			\
			     ? BFD_ENDIAN_BIG : BFD_ENDIAN_LITTLE))


We know for a fact that the implementation works properly on a
little-endian host (x86) and little-endian DOF data (generated by either
pdtrace or dtrace).

I would need to look at the x86-solaris binaries you are using.  Maybe
dtrace on that platform is generating big-endian DOF data in the
little-endian platform and the implementation is not handling that
properly.
  
Jose E. Marchesi March 31, 2015, 5:35 p.m. UTC | #4
Hi Joel.

    >     And once I had that fixed, the next issue that I looked at was:
    >     
    >         (gdb) b adainit
    >         Breakpoint 1 at 0x8051f03
    >         (gdb) run
    >         Starting program: /[...]/a
    >         [Thread debugging using libthread_db enabled]
    >         zsh: 12378 segmentation fault (core dumped)  /[...]/gdb -q a
    >     
    >     This is where I'm getting even more out of my league, here.
    >     The SEGV happens on the following line:
    >     
    >     377           uint32_t enabler_offset
    >     378             = ((uint32_t *) eofftab)[DOF_UINT (dof, probe->dofpr_enoffidx) + i];
    > 
    > I will debug that SIGSEGV in solaris, but the problem seems to be
    > related to the DOF program embedded in your program, more than to the
    > platform.
    > 
    > Could you please send me your sparc-solaris reproducer?
    
    Thanks for the offer to help! Sadly, the SEGV doesn't seem to
    happen on sparc-solaris, it seems. Once I apply the patch above,
    I pretty much get normal results back (yay!).
    
    So, the problem appears to be specific to x86-solaris. I didn't know
    the DOF program was embedded in the executable, but I suspect there is
    a problem in its contents.

I just tried the following in a x86-solaris machine, using todays git
GDB:

$ uname -a
SunOS solaris-x86 5.11 11.1 i86pc i386 i86pc
$ cat > foo.d <<EOF
provider test {
 probe progress__counter (char *, int);
};
EOF
$ cat > foo.c <<EOF
#include "foo.h"

int
main ()
{
  char *name = "application";
  int i = 0;

  while (i < 10)
  {
    i++;
    if (TEST_PROGRESS_COUNTER_ENABLED())
      TEST_PROGRESS_COUNTER (name, i);
  }

  return 0;
}
EOF
$ dtrace -h -s foo.d -o foo.h
$ gcc -c foo.c
$ dtrace -G -s foo.d -o foo-p.o foo.o
$ gcc -o foo foo.o foo-p.o
$ gdb foo
[...]
(gdb) info probes
Type   Provider Name             Where      Enabled Object
dtrace test     progress-counter 0x08051024 unknown /export/home/jose/binutils-gdb/build/foo
(gdb) disas main
Dump of assembler code for function main:
0x08050fec <+0>:     push   %ebp
0x08050fed <+1>:     mov    %esp,%ebp
0x08050fef <+3>:     and    $0xfffffff0,%esp
0x08050ff2 <+6>:     sub    $0x20,%esp
0x08050ff5 <+9>:     movl   $0x80514f0,0x18(%esp)
0x08050ffd <+17>:    movl   $0x0,0x1c(%esp)
0x08051005 <+25>:    jmp    0x8051029 <main+61>
0x08051007 <+27>:    addl   $0x1,0x1c(%esp)
0x0805100c <+32>:    xor    %eax,%eax
0x0805100e <+34>:    nop
0x0805100f <+35>:    nop
0x08051010 <+36>:    nop
0x08051011 <+37>:    test   %eax,%eax
0x08051013 <+39>:    je     0x8051029 <main+61>
0x08051015 <+41>:    mov    0x1c(%esp),%eax
0x08051019 <+45>:    mov    %eax,0x4(%esp)
0x0805101d <+49>:    mov    0x18(%esp),%eax
0x08051021 <+53>:    mov    %eax,(%esp)
0x08051024 <+56>:    nop
0x08051025 <+57>:    nop
0x08051026 <+58>:    nop
0x08051027 <+59>:    nop
0x08051028 <+60>:    nop
0x08051029 <+61>:    cmpl   $0x9,0x1c(%esp)
0x0805102e <+66>:    jle    0x8051007 <main+27>
0x08051030 <+68>:    mov    $0x0,%eax
0x08051035 <+73>:    leave
0x08051036 <+74>:    ret
End of assembler dump.
(gdb)

In this system DTrace generates little-endian DOF data, and it looks
like GDB handles it just fine.  I still think the problem you observed
is related to endianness, but I cannot be sure without looking at the
contents of the DOF program embedded in your binaries...
  
Joel Brobecker March 31, 2015, 6:47 p.m. UTC | #5
Hi Jose,

> I just tried the following in a x86-solaris machine, using todays git
> GDB:
[...]
> In this system DTrace generates little-endian DOF data, and it looks
> like GDB handles it just fine.  I still think the problem you observed
> is related to endianness, but I cannot be sure without looking at the
> contents of the DOF program embedded in your binaries...

Sorry, I was going to send you an email this morning, and then got
carried away by other duties... :-(

I've been able to reproduce the problem, but when I checked again
with a non-AdaCore debugger, the problem went away. I'm not really
sure whether this indicates whether this is an issue caused by
some of our local changes, or whether this is just a latent bug
that gets triggered by those local changes. Either way, it felt like
I would have no choice but to get more familiar with how those probes
work. The problem is that I'm short on time again. But, if you have
any kind of document describing how this data is laid out in the
binary, I'm very interested.

Sorry I waited so long before getting back to you!
  
Jose E. Marchesi March 31, 2015, 8 p.m. UTC | #6
I've been able to reproduce the problem, but when I checked again
    with a non-AdaCore debugger, the problem went away. I'm not really
    sure whether this indicates whether this is an issue caused by
    some of our local changes, or whether this is just a latent bug
    that gets triggered by those local changes. Either way, it felt like
    I would have no choice but to get more familiar with how those probes
    work. The problem is that I'm short on time again. But, if you have
    any kind of document describing how this data is laid out in the
    binary, I'm very interested.
    
I don't know of any document describing the DOF layout.  I would suggest
you to take a look to the script gdb/testsuite/lib/pdtrace.in.  It
creates and populates DOF sections and it documents the structure of the
generated data.

Feel free to send me any question you may have regarding DOF.  I will
try to help even if I am not a DOF expert myself :)
  

Patch

diff --git a/gdb/dtrace-probe.c b/gdb/dtrace-probe.c
index 491d853..5d5c5b2 100644
--- a/gdb/dtrace-probe.c
+++ b/gdb/dtrace-probe.c
@@ -414,6 +414,7 @@  dtrace_process_dof_probe (struct objfile *objfile,
 	{
 	  struct dtrace_probe_arg arg;
 	  struct expression *expr;
+	  struct cleanup *old_chain = NULL;
 
 	  /* Set arg.expr to ensure all fields in expr are initialized and
 	     the compiler will not warn when arg is used.  */
@@ -427,8 +428,24 @@  dtrace_process_dof_probe (struct objfile *objfile,
 	     this does not work then we set the type to `long
 	     int'.  */
           arg.type = builtin_type (gdbarch)->builtin_long;
-	  expr = parse_expression (arg.type_str);
-	  if (expr->elts[0].opcode == OP_TYPE)
+
+	  if (current_language->la_language != language_c)
+	    {
+	      old_chain = make_cleanup_restore_current_language ();
+	      set_language (language_c);
+	    }
+	  TRY
+	    {
+	      expr = parse_expression (arg.type_str);
+	    }
+	  CATCH (ex, RETURN_MASK_ERROR)
+	    {
+	      expr = NULL;
+	    }
+	  END_CATCH
+	  if (old_chain != NULL)
+	    do_cleanups (old_chain);
+	  if (expr != NULL && expr->elts[0].opcode == OP_TYPE)
 	    arg.type = expr->elts[1].type;
 
 	  VEC_safe_push (dtrace_probe_arg_s, ret->args, &arg);