Add a new format letter to dump instructions backward

Message ID 1827952218.466587.1453670934999.JavaMail.yahoo@mail.yahoo.com
State New, archived

Commit Message

Toshihito Kikuchi Jan. 24, 2016, 9:28 p.m. UTC
  Hi all,

With the attached patch, I'd like to add a new format letter "j" to
the "x" command to dump instructions in backward direction.
This feature is basically equivalent to the "ub" command in the Windows debugger.

Below is the sample output:

(gdb) disas main
Dump of assembler code for function main(int, char**):
   0x4040a7 <+0>: push   %rbp
   0x4040a8 <+1>: mov    %rsp,%rbp
   0x4040ab <+4>: sub    $0x10,%rsp
   0x4040af <+8>: mov    %edi,-0x4(%rbp)
   0x4040b2 <+11>:  mov    %rsi,-0x10(%rbp)
=> 0x4040b6 <+15>:  lea    0x116(%rip),%rdi        # 0x4041d3
   0x4040bd <+22>:  callq  0x400e30 <puts@plt>
   0x4040c2 <+27>:  mov    $0x0,%eax
   0x4040c7 <+32>:  leaveq 
   0x4040c8 <+33>:  retq   
End of assembler dump.
(gdb) x/6j $rip
   0x4040a6 <previous_function(...)+119>:  retq   
   0x4040a7 <main(int, char**)>:  push   %rbp
   0x4040a8 <main(int, char**)+1>:  mov    %rsp,%rbp
   0x4040ab <main(int, char**)+4>:  sub    $0x10,%rsp
   0x4040af <main(int, char**)+8>:  mov    %edi,-0x4(%rbp)
   0x4040b2 <main(int, char**)+11>: mov    %rsi,-0x10(%rbp)
(gdb) x/2j
   0x4040a2 <previous_function(...)+115>:  add    %rdx,%rax
   0x4040a5 <previous_function(...)+118>:  leaveq 

Unlike forward disassembly, it is theoretically impossible to determine
the correct instruction boundaries (the "frame") from the code bytes alone.
In the example below, we cannot decide whether the instruction before "pop"
is "nop" or "mov".

<correct frame>
   0x4045bb <+89>:  eb b4  jmp    0x404571
   0x4045bd <+91>:  90     nop
   0x4045be <+92>:  5d     pop    %rbp
   0x4045bf <+93>:  c3     retq   

<incorrect frame>
   0x4045bc <+90>:  b4 90  mov    $0x90,%ah
   0x4045be <+92>:  5d     pop    %rbp
   0x4045bf <+93>:  c3     retq 

To solve this, the change relies on line number information.

I introduced a new function "disassemble_backward" which goes backward
by referring to the start address of each line. If we go out of the symbol
range, we stop backtracking, print a message, and dump from the smallest
address reached so far.

Tested on x86_64 GNU/Linux.


Thanks,
Toshihito
  

Comments

Pedro Alves Jan. 25, 2016, 11:41 a.m. UTC | #1
On 01/24/2016 09:28 PM, Toshihito Kikuchi wrote:
> Hi all,
> 

Hi there.

> With the attached patch, I'd like to add a new format letter "j" to
> the "x" command to dump instructions in backward direction.

Thanks.

What's the rationale for making this a format letter though?
It seems orthogonal to the other format letters to me.

Example, say I dump raw opcodes in hex, with:

 (gdb) x /4x main
 0x400766 <main>:        0x55    0x48    0x89    0xe5
 (gdb) x /4x
 0x40076a <main+4>:      0x48    0x83    0xec    0x30
 (gdb) x
 0x40076e <main+8>:      0x48
 (gdb)
 0x40076f <main+9>:      0xc7

but then I can't go backwards in hex, as /j overrides it,
and always prints like /i:

 (gdb) x /4xj
    0x400766 <main>:     push   %rbp
    0x400767 <main+1>:   mov    %rsp,%rbp
    0x40076a <main+4>:   sub    $0x30,%rsp
 => 0x40076e <main+8>:   movq   $0x0,-0x8(%rbp)


Seems to me that some other UI would be better.  E.g.,:

 #1 - a different switch, like "x -back /i " ?

 #2 - a different command, like "bx" ?

 #3 - negative repeat counts ?

   (gdb) x /4i    // next 4 instructions
   (gdb) x /-4i   // previous 4 instructions

   (gdb) x /4bx    // next 4 bytes, in hex
   (gdb) x /-4bx   // previous 4 bytes, in hex


Added #1 just for completeness, I don't find it very convenient.

#3 feels natural to me.  What do you (and others) think?

Adding some tests would be great.  Also, the documentation
will need to be updated.

If you haven't yet, please take a look at:
 https://sourceware.org/gdb/wiki/ContributionChecklist

Thanks,
Pedro Alves
  
John Baldwin Jan. 25, 2016, 5:52 p.m. UTC | #2
On Monday, January 25, 2016 11:41:27 AM Pedro Alves wrote:
> On 01/24/2016 09:28 PM, Toshihito Kikuchi wrote:
> Seems to me that some other UI would be better.  E.g.,:
> 
>  #1 - a different switch, like "x -back /i " ?
> 
>  #2 - a different command, like "bx" ?
> 
>  #3 - negative repeat counts ?
> 
>    (gdb) x /4i    // next 4 instructions
>    (gdb) x /-4i   // previous 4 instructions
> 
>    (gdb) x /4bx    // next 4 bytes, in hex
>    (gdb) x /-4bx   // previous 4 bytes, in hex
> 
> 
> Added #1 just for completeness, I don't find it very convenient.
> 
> #3 feels natural to me.  What do you (and others) think?

I think #3 is the most natural as well.  I also think this is a
very useful feature.
  
Paul_Koning@Dell.com Jan. 25, 2016, 6:22 p.m. UTC | #3
> On Jan 25, 2016, at 12:52 PM, John Baldwin <jhb@freebsd.org> wrote:
> 
> On Monday, January 25, 2016 11:41:27 AM Pedro Alves wrote:
>> On 01/24/2016 09:28 PM, Toshihito Kikuchi wrote:
>> ...
>> #3 - negative repeat counts ?
>> 
>>   (gdb) x /4i    // next 4 instructions
>>   (gdb) x /-4i   // previous 4 instructions
>> ...
>> #3 feels natural to me.  What do you (and others) think?
> 
> I think #3 is the most natural as well.  I also think this is a
> very useful feature.

Yes, but how do you do instructions backwards if the instruction length is variable?  It is entirely possible that there will be multiple possible answers, and no way to tell which one (if any) is "correct".

	paul
  
Pedro Alves Jan. 25, 2016, 6:27 p.m. UTC | #4
On 01/25/2016 06:22 PM, Paul_Koning@Dell.com wrote:
> 
>> On Jan 25, 2016, at 12:52 PM, John Baldwin <jhb@freebsd.org> wrote:
>>
>> On Monday, January 25, 2016 11:41:27 AM Pedro Alves wrote:
>>> On 01/24/2016 09:28 PM, Toshihito Kikuchi wrote:
>>> ...
>>> #3 - negative repeat counts ?
>>>
>>>   (gdb) x /4i    // next 4 instructions
>>>   (gdb) x /-4i   // previous 4 instructions
>>> ...
>>> #3 feels natural to me.  What do you (and others) think?
>>
>> I think #3 is the most natural as well.  I also think this is a
>> very useful feature.
> 
> Yes, but how do you do instructions backwards if the instruction length is variable?  It is entirely possible that there will be multiple possible answers, and no way to tell which one (if any) is "correct".

You disassemble forward starting from the previous known
instruction boundary, based on symbol/line info.  I haven't looked
at the implementation in detail, but from the patch description, that's
what I assume the patch is doing.

Thanks,
Pedro Alves
  
Paul_Koning@Dell.com Jan. 25, 2016, 8:11 p.m. UTC | #5
> On Jan 25, 2016, at 1:27 PM, Pedro Alves <palves@redhat.com> wrote:
> 
> On 01/25/2016 06:22 PM, Paul_Koning@Dell.com wrote:
>> 
>>> On Jan 25, 2016, at 12:52 PM, John Baldwin <jhb@freebsd.org> wrote:
>>> 
>>> On Monday, January 25, 2016 11:41:27 AM Pedro Alves wrote:
>>>> On 01/24/2016 09:28 PM, Toshihito Kikuchi wrote:
>>>> ...
>>>> #3 - negative repeat counts ?
>>>> 
>>>>  (gdb) x /4i    // next 4 instructions
>>>>  (gdb) x /-4i   // previous 4 instructions
>>>> ...
>>>> #3 feels natural to me.  What do you (and others) think?
>>> 
>>> I think #3 is the most natural as well.  I also think this is a
>>> very useful feature.
>> 
>> Yes, but how do you do instructions backwards if the instruction length is variable?  It is entirely possible that there will be multiple possible answers, and no way to tell which one (if any) is "correct".
> 
> You disassemble forward starting from the previous known
> instruction boundary, based on symbol/line info.  I haven't looked
> at the implementation in detail, but from the patch description, that's
> what I assume the patch is doing.

So unlike the existing x commands this would depend on debug information.  It would be nice if it worked without, on machines where instruction length is fixed (most RISC machines).

	paul
  
Toshihito Kikuchi Jan. 26, 2016, 5:36 a.m. UTC | #6
Thank you for the great suggestion.

I also agree that #3, adding the support for a negative number, looks reasonable. I'll get back to this thread when the code is ready.

To answer the question from Paul, Pedro's guess is correct. I used symbol's line info to solve a correct frame. And yes, this feature does not work without debug information unfortunately. Since I'm not familiar with architectures other than x86, let me finish this work in the current approach first. If there are architectures where instruction length is fixed, I think I can add them later.


> Adding some tests would be great.  Also, the documentation
> will need to be updated.
> 
> If you haven't yet, please take a look at:
> https://sourceware.org/gdb/wiki/ContributionChecklist
Thank you for pointing this out. I had only looked at gdb/CONTRIBUTE and had not found this page. Tests and doc updates will be included in the next patch.


Thanks,
Toshihito
  
Andrew Burgess Jan. 27, 2016, 4:04 p.m. UTC | #7
* Toshihito Kikuchi <k.toshihito@yahoo.de> [2016-01-26 05:36:58 +0000]:

> To answer the question from Paul, Pedro's guess is correct. I used
> symbol's line info to solve a correct frame. And yes, this feature
> does not work without debug information unfortunately.

If there's no debug information then just looking for symbols in the
same section would also be a reasonable guess. Most symbols appear
between instructions, not part way through an instruction.

Sounds like a useful feature.

Thanks,
Andrew
  
Paul_Koning@Dell.com Jan. 27, 2016, 7:49 p.m. UTC | #8
> On Jan 27, 2016, at 11:04 AM, Andrew Burgess <andrew.burgess@embecosm.com> wrote:
> 
> * Toshihito Kikuchi <k.toshihito@yahoo.de> [2016-01-26 05:36:58 +0000]:
> 
>> To answer the question from Paul, Pedro's guess is correct. I used
>> symbol's line info to solve a correct frame. And yes, this feature
>> does not work without debug information unfortunately.
> 
> If there's no debug information then just looking for symbols in the
> same section would also be a reasonable guess. Most symbols appear
> between instructions, not part way through an instruction.

Yes.  But if you have no symbols either, it can still work on Alpha, or on MIPS if you disregard the Thumb instruction set, or on any number of other RISC machines where every instruction is the same size

	paul
  
Pedro Alves Jan. 28, 2016, 11:53 a.m. UTC | #9
On 01/27/2016 07:49 PM, Paul_Koning@Dell.com wrote:
> 
>> On Jan 27, 2016, at 11:04 AM, Andrew Burgess <andrew.burgess@embecosm.com> wrote:
>>
>> * Toshihito Kikuchi <k.toshihito@yahoo.de> [2016-01-26 05:36:58 +0000]:
>>
>>> To answer the question from Paul, Pedro's guess is correct. I used
>>> symbol's line info to solve a correct frame. And yes, this feature
>>> does not work without debug information unfortunately.
>>
>> If there's no debug information then just looking for symbols in the
>> same section would also be a reasonable guess. Most symbols appear
>> between instructions, not part way through an instruction.
> 
> Yes.  But if you have no symbols either, it can still work on Alpha, or on MIPS if you disregard the Thumb instruction set, or on any number of other RISC machines where every instruction is the same size

Agreed.  We'd probably need to add a gdbarch hook to know whether
the target architecture has variable or fixed length instruction set.
That can always be done as a follow up incremental enhancement, though.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/printcmd.c b/gdb/printcmd.c
index f5c4211..a6cf141 100644
--- a/gdb/printcmd.c
+++ b/gdb/printcmd.c
@@ -47,6 +47,7 @@ 
 #include "cli/cli-utils.h"
 #include "format.h"
 #include "source.h"
+#include "linespec.h"
 
 #ifdef TUI
 #include "tui/tui.h"		/* For tui_active et al.   */
@@ -785,6 +786,97 @@  print_address_demangle (const struct value_print_options *opts,
 }
 
 
+/* These empty functions are used in disassemble_backward to
+   suppress standard output during the call to gdbarch_print_insn. */
+
+static int
+null_fprintf (void *stream, const char *format ATTRIBUTE_UNUSED, ...)
+{
+  return 0;
+}
+
+static void
+null_print_address (bfd_vma addr, struct disassemble_info *info)
+{
+  return;
+}
+
+/* Rewind assembly from addr and return the start address after the given
+   number of lines are disassembled. To avoid disassembling in a wrong frame,
+   we get addresses in a correct frame using line information.
+   If we go out of the symbol range during disassembling, we return
+   the smallest address we've got so far. */
+
+static CORE_ADDR
+disassemble_backward(struct gdbarch *gdbarch, CORE_ADDR addr,
+                     int count, int *linesread)
+{
+  char addrstr[64];
+  struct symtabs_and_lines sals;
+  CORE_ADDR start_pc, end_pc, ret;
+
+  start_pc = end_pc = ret = 0;
+  *linesread = 0;
+
+  sprintf (addrstr, "*%p", (void *) (addr - 1));
+  sals = decode_line_with_last_displayed (addrstr, DECODE_LINE_FUNFIRSTLINE);
+  if (sals.nelts >= 1)
+    {
+      if ((!sals.sals[0].line)
+          || (!find_line_pc_range (sals.sals[0], &start_pc, &end_pc)))
+        {
+          printf_filtered (_("No line number information available "
+            "for address "));
+          wrap_here ("  ");
+          print_address (gdbarch, addr - 1, gdb_stdout);
+          printf_filtered ("\n");
+          start_pc = end_pc = 0;
+        }
+      xfree (sals.sals);
+    }
+
+  if (start_pc)
+    {
+      VEC (CORE_ADDR) *pcs = NULL;
+      struct disassemble_info di;
+      CORE_ADDR p;
+      int i;
+
+      VEC_reserve (CORE_ADDR, pcs, count);
+
+      di = gdb_disassemble_info (gdbarch, NULL);
+      di.fprintf_func = null_fprintf;
+      di.print_address_func = null_print_address;
+
+      p = start_pc;
+      for (i = 0; p < addr; ++i)
+        {
+          VEC_safe_push (CORE_ADDR, pcs, p);
+          p += gdbarch_print_insn (gdbarch, (CORE_ADDR)p, &di);
+        }
+
+      if (i >= count)
+        {
+          ret = VEC_index (CORE_ADDR, pcs, i - count);
+          *linesread = count;
+        }
+        else
+        {
+          int linesread_internal = 0;
+          ret = disassemble_backward (gdbarch, VEC_index (CORE_ADDR, pcs, 0),
+               count - i, &linesread_internal);
+          *linesread = i + linesread_internal;
+
+          /* Return the smallest valid address we've got
+             if the recursive call above failed. */
+          if (!ret)
+            ret = VEC_index (CORE_ADDR, pcs, 0);
+        }
+      VEC_free (CORE_ADDR, pcs);
+    }
+  return ret;
+}
+
 /* Examine data at address ADDR in format FMT.
    Fetch it from memory and print on gdb_stdout.  */
 
@@ -798,6 +890,7 @@  do_examine (struct format_data fmt, struct gdbarch *gdbarch, CORE_ADDR addr)
   int i;
   int maxelts;
   struct value_print_options opts;
+  CORE_ADDR addr_rewound = 0;
 
   format = fmt.format;
   size = fmt.size;
@@ -805,6 +898,24 @@  do_examine (struct format_data fmt, struct gdbarch *gdbarch, CORE_ADDR addr)
   next_gdbarch = gdbarch;
   next_address = addr;
 
+  if (format == 'j')
+    {
+      /* If 'j' is given, we get the address after going back
+         by 'count' instructions from addr and dump instructions
+         from it by setting format to 'i'.
+         At the end of this function, we adjust next_address
+         for subsequent command calls. */
+      int linesread = 0;
+      addr_rewound = next_address = disassemble_backward (gdbarch,
+           addr, count, &linesread);
+      if (!next_address)
+        {
+          return;
+        }
+      count = linesread;
+      format = 'i';
+    }
+
   /* Instruction format implies fetch single bytes
      regardless of the specified size.
      The case of strings is handled in decode_format, only explicit
@@ -913,6 +1024,9 @@  do_examine (struct format_data fmt, struct gdbarch *gdbarch, CORE_ADDR addr)
       printf_filtered ("\n");
       gdb_flush (gdb_stdout);
     }
+
+  if (addr_rewound)
+    next_address = addr_rewound;
 }
 
 static void
@@ -2518,8 +2632,8 @@  Examine memory: x/FMT ADDRESS.\n\
 ADDRESS is an expression for the memory address to examine.\n\
 FMT is a repeat count followed by a format letter and a size letter.\n\
 Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),\n\
-  t(binary), f(float), a(address), i(instruction), c(char), s(string)\n\
-  and z(hex, zero padded on the left).\n\
+  t(binary), f(float), a(address), i(instruction), j(instruction backward),\n\
+  c(char), s(string) and z(hex, zero padded on the left).\n\
 Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).\n\
 The specified number of objects of the specified size are printed\n\
 according to the format.\n\n\