[v2,2/5] Restrict use of minsym names when printing addresses in disassembled code

Message ID 20190704045503.1250-3-kevinb@redhat.com
State New, archived
Headers

Commit Message

Kevin Buettner July 4, 2019, 4:55 a.m. UTC
  build_address_symbolic contains some code which causes it to
prefer the minsym over the the function symbol in certain cases.
The cases where this occurs are the same as the "certain pathological
cases" that used to exist in find_frame_funname().

This commit largely disables that code; it will only prefer the
minsym when the address of minsym is identical to that of the address
under consideration AND the function address for the symbtab sym is
not the same as the address under consideration.

So, without this change, when using the dw2-ranges-func-lo-cold
executable from the gdb.dwarf2/dw2-ranges-func.exp test, GDB exhibits
the following behavior:

(gdb) x/5i foo_cold
   0x40110d <foo+4294967277>:	push   %rbp
   0x40110e <foo+4294967278>:	mov    %rsp,%rbp
   0x401111 <foo+4294967281>:	callq  0x401106 <baz>
   0x401116 <foo+4294967286>:	nop
   0x401117 <foo+4294967287>:	pop    %rbp

On the other hand, still without this change, using the
dw2-ranges-func-hi-cold executable from the same test, GDB
does this instead:

(gdb) x/5i foo_cold
   0x401128 <foo_cold>:	push   %rbp
   0x401129 <foo_cold+1>:	mov    %rsp,%rbp
   0x40112c <foo_cold+4>:	callq  0x401134 <baz>
   0x401131 <foo_cold+9>:	nop
   0x401132 <foo_cold+10>:	pop    %rbp

This is inconsistent behavior.  When foo_cold is at a lower
address than the function's entry point, the symtab symbol (foo)
is displayed along with a large positive offset which would wrap
around the address space if the address space were only 32 bits wide.
(A later patch fixes this problem by displaying negative offsets.)

This commit makes the behavior uniform for both the "lo-cold" and
"hi-cold" cases:

lo-cold:

(gdb) x/5i foo_cold
   0x40110d <foo_cold>:	push   %rbp
   0x40110e <foo-18>:	mov    %rsp,%rbp
   0x401111 <foo-15>:	callq  0x401106 <baz>
   0x401116 <foo-10>:	nop
   0x401117 <foo-9>:	pop    %rbp

hi-cold:

(gdb) x/5i foo_cold
   0x401128 <foo_cold>:	push   %rbp
   0x401129 <foo+35>:	mov    %rsp,%rbp
   0x40112c <foo+38>:	callq  0x401134 <baz>
   0x401131 <foo+43>:	nop
   0x401132 <foo+44>:	pop    %rbp

In both cases, the symbol shown for the address at which foo_cold
resides is shown as <foo_cold>.  Subsequent offsets are shown as
either negative or positive offsets from the entry pc for foo.

When disassembling a function, care must be taken to NOT display
<+0> as the offset for the second range.  For this reason, I found
it necessary to add the "prefer_sym_over_minsym" parameter to
build_address_symbolic.  The type of this flag is a bool; do_demangle
ought to be a bool also, so I made this change at the same time.

gdb/ChangeLog:

	* valprint.h (build_address_symbolic): Add "prefer_sym_over_minsym"
	parameter.  Change type of "do_demangle" to bool.
	* disasm.c (gdb_pretty_print_disassembler::pretty_print_insn):
	Pass suitable "prefer_sym_over_minsym" flag to
	build_address_symbolic().  Don't output "+" for negative offsets.
	* printcmd.c (print_address_symbolic): Update invocation of
	build_address_symbolic to include a "prefer_sym_over_minsym"
	flag.
	(build_address_symbolic): Add "prefer_sym_over_minsym" parameter.
	Restrict cases in which use of minimal symbol is preferred to that
	of a found symbol.  Update comments.
---
 gdb/disasm.c   | 12 ++++++++----
 gdb/printcmd.c | 29 +++++++++++++++++++++--------
 gdb/valprint.h | 13 +++++++++----
 3 files changed, 38 insertions(+), 16 deletions(-)
  

Comments

Pedro Alves July 19, 2019, 5:07 p.m. UTC | #1
Below, more of a brain dump than an objection.  FWIW.

On 7/4/19 5:55 AM, Kevin Buettner wrote:
> 
> (gdb) x/5i foo_cold
>    0x401128 <foo_cold>:	push   %rbp
>    0x401129 <foo+35>:	mov    %rsp,%rbp
>    0x40112c <foo+38>:	callq  0x401134 <baz>
>    0x401131 <foo+43>:	nop
>    0x401132 <foo+44>:	pop    %rbp

I admit that it took me a while to reply because I'm still finding
it a bit hard to convince myself that that is the ideal output.
E.g., the first two instructions are obviously part of the same
prologue, I find it a bit odd that the disassembly shows
different function names for that instruction pair.

Maybe this won't matter in practice since IIUC your testcase is a
bit contrived, and real cold functions are just fragments of
functions jmp/ret'ed to/from, without a prologue.

BTW, I noticed (with your series applied), this divergence:

 (gdb) x /5i foo_cold
    0x40048e <foo_cold>: push   %rbp
    0x40048f <foo-18>:   mov    %rsp,%rbp
    0x400492 <foo-15>:   callq  0x400487 <baz>
    0x400497 <foo-10>:   nop
    0x400498 <foo-9>:    pop    %rbp

vs:

 (gdb) info symbol 0x40048f
 foo_cold + 1 in section .text
 (gdb) info symbol 0x400492
 foo_cold + 4 in section .text
 (gdb) info symbol 
 Argument required (address).
 (gdb) info symbol 0x400497
 foo_cold + 9 in section .text

That's of course because "info symbol" only looks at minimal symbols.

On the disassemble side, I think I'd be less confused with:

(gdb) disassemble foo
Dump of assembler code for function foo:
Address range 0x4004a1 to 0x4004bc:
   0x00000000004004a1 <foo+0>:     push   %rbp
   0x00000000004004a2 <foo+1>:     mov    %rsp,%rbp
   0x00000000004004a5 <foo+4>:     callq  0x40049a <bar>
   0x00000000004004aa <foo+9>:     mov    0x200b70(%rip),%eax        # 0x601020 <e>
   0x00000000004004b0 <foo+15>:    test   %eax,%eax
   0x00000000004004b2 <foo+17>:    je     0x4004b9 <foo+24>
   0x00000000004004b4 <foo+19>:    callq  0x40048e <foo_cold>
   0x00000000004004b9 <foo+24>:    nop
   0x00000000004004ba <foo+25>:    pop    %rbp
   0x00000000004004bb <foo+26>:    retq   
Address range 0x40048e to 0x40049a:
   0x000000000040048e <foo.cold+0>:    push   %rbp
   0x000000000040048f <foo.cold+1>:    mov    %rsp,%rbp
   0x0000000000400492 <foo.cold+4>:    callq  0x400487 <baz>
   0x0000000000400497 <foo.cold+9>:    nop
   0x0000000000400498 <foo.cold+10>:   pop    %rbp
   0x0000000000400499 <foo.cold+11>:   retq   
End of assembler dump.

Instead of:

(gdb) disassemble foo
Dump of assembler code for function foo:
Address range 0x4004a1 to 0x4004bc:
   0x00000000004004a1 <+0>:     push   %rbp
   0x00000000004004a2 <+1>:     mov    %rsp,%rbp
   0x00000000004004a5 <+4>:     callq  0x40049a <bar>
   0x00000000004004aa <+9>:     mov    0x200b70(%rip),%eax        # 0x601020 <e>
   0x00000000004004b0 <+15>:    test   %eax,%eax
   0x00000000004004b2 <+17>:    je     0x4004b9 <foo+24>
   0x00000000004004b4 <+19>:    callq  0x40048e <foo_cold>
   0x00000000004004b9 <+24>:    nop
   0x00000000004004ba <+25>:    pop    %rbp
   0x00000000004004bb <+26>:    retq   
Address range 0x40048e to 0x40049a:
   0x000000000040048e <-19>:    push   %rbp
   0x000000000040048f <-18>:    mov    %rsp,%rbp
   0x0000000000400492 <-15>:    callq  0x400487 <baz>
   0x0000000000400497 <-10>:    nop
   0x0000000000400498 <-9>:     pop    %rbp
   0x0000000000400499 <-8>:     retq   
End of assembler dump.

In your examples / testcase, the cold function is adjacent/near
the hot / main entry point of foo.  But I think that
on a real cold function, the offsets between the cold and
hot entry points can potentially be much larger (e.g., place in
different elf sections), as the point is exactly to
move cold code away from the hot path / cache.

That that would mean that we're likely to see output like:

(gdb) disassemble foo
Dump of assembler code for function foo:
Address range 0x6004a1 to 0x6004bc:
   0x00000000006004a1 <+0>:     push   %rbp
   0x00000000006004a2 <+1>:     mov    %rsp,%rbp
   0x00000000006004a5 <+4>:     callq  0x40049a <bar>
   0x00000000006004aa <+9>:     mov    0x200b70(%rip),%eax        # 0x601020 <e>
   0x00000000006004b0 <+15>:    test   %eax,%eax
   0x00000000006004b2 <+17>:    je     0x6004b9 <foo+24>
   0x00000000006004b4 <+19>:    callq  0x40048e <foo_cold>
   0x00000000006004b9 <+24>:    nop
   0x00000000006004ba <+25>:    pop    %rbp
   0x00000000006004bb <+26>:    retq   
Address range 0x40048e to 0x40049a:
   0x000000000040048e <-2097171>:    push   %rbp
   0x000000000040048f <-2097170>:    mov    %rsp,%rbp
   0x0000000000400492 <-2097167>:    callq  0x400487 <baz>
   0x0000000000400497 <-2097162>:    nop
   0x0000000000400498 <-2097161>:    pop    %rbp
   0x0000000000400499 <-2097160>:    retq   
End of assembler dump.

... and those negative offsets kind of look a bit odd.

But maybe it's just that I'm not thinking it right.  

If I think of the offset in terms of offset from the
"foo"'s entry point, which is what it really is, then I can
convince myself that I can explain why that's the right output,
pedantically.

So I guess I'll grow into it.

And with that out of the way, the series looks good to me as is.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 4e58cb67f9..30c3b06936 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -237,16 +237,20 @@  gdb_pretty_print_disassembler::pretty_print_insn (struct ui_out *uiout,
     uiout->field_core_addr ("address", gdbarch, pc);
 
     std::string name, filename;
-    if (!build_address_symbolic (gdbarch, pc, 0, &name, &offset, &filename,
-				 &line, &unmapped))
+    bool omit_fname = ((flags & DISASSEMBLY_OMIT_FNAME) != 0);
+    if (!build_address_symbolic (gdbarch, pc, false, omit_fname, &name,
+                                 &offset, &filename, &line, &unmapped))
       {
 	/* We don't care now about line, filename and unmapped.  But we might in
 	   the future.  */
 	uiout->text (" <");
-	if ((flags & DISASSEMBLY_OMIT_FNAME) == 0)
+	if (!omit_fname)
 	  uiout->field_string ("func-name", name.c_str (),
 			       ui_out_style_kind::FUNCTION);
-	uiout->text ("+");
+	/* For negative offsets, avoid displaying them as +-N; the sign of
+	   the offset takes the place of the "+" here.  */
+	if (offset >= 0)
+	  uiout->text ("+");
 	uiout->field_int ("offset", offset);
 	uiout->text (">:\t");
       }
diff --git a/gdb/printcmd.c b/gdb/printcmd.c
index 0509360581..1109cb3046 100644
--- a/gdb/printcmd.c
+++ b/gdb/printcmd.c
@@ -528,8 +528,8 @@  print_address_symbolic (struct gdbarch *gdbarch, CORE_ADDR addr,
   int offset = 0;
   int line = 0;
 
-  if (build_address_symbolic (gdbarch, addr, do_demangle, &name, &offset,
-			      &filename, &line, &unmapped))
+  if (build_address_symbolic (gdbarch, addr, do_demangle, false, &name,
+                              &offset, &filename, &line, &unmapped))
     return 0;
 
   fputs_filtered (leadin, stream);
@@ -563,7 +563,8 @@  print_address_symbolic (struct gdbarch *gdbarch, CORE_ADDR addr,
 int
 build_address_symbolic (struct gdbarch *gdbarch,
 			CORE_ADDR addr,  /* IN */
-			int do_demangle, /* IN */
+			bool do_demangle, /* IN */
+			bool prefer_sym_over_minsym, /* IN */
 			std::string *name, /* OUT */
 			int *offset,     /* OUT */
 			std::string *filename, /* OUT */
@@ -591,8 +592,10 @@  build_address_symbolic (struct gdbarch *gdbarch,
 	}
     }
 
-  /* First try to find the address in the symbol table, then
-     in the minsyms.  Take the closest one.  */
+  /* Try to find the address in both the symbol table and the minsyms. 
+     In most cases, we'll prefer to use the symbol instead of the
+     minsym.  However, there are cases (see below) where we'll choose
+     to use the minsym instead.  */
 
   /* This is defective in the sense that it only finds text symbols.  So
      really this is kind of pointless--we should make sure that the
@@ -629,7 +632,19 @@  build_address_symbolic (struct gdbarch *gdbarch,
 
   if (msymbol.minsym != NULL)
     {
-      if (BMSYMBOL_VALUE_ADDRESS (msymbol) > name_location || symbol == NULL)
+      /* Use the minsym if no symbol is found.
+      
+	 Additionally, use the minsym instead of a (found) symbol if
+	 the following conditions all hold:
+	   1) The prefer_sym_over_minsym flag is false.
+	   2) The minsym address is identical to that of the address under
+	      consideration.
+	   3) The symbol address is not identical to that of the address
+	      under consideration.  */
+      if (symbol == NULL ||
+           (!prefer_sym_over_minsym
+	    && BMSYMBOL_VALUE_ADDRESS (msymbol) == addr
+	    && name_location != addr))
 	{
 	  /* If this is a function (i.e. a code address), strip out any
 	     non-address bits.  For instance, display a pointer to the
@@ -642,8 +657,6 @@  build_address_symbolic (struct gdbarch *gdbarch,
 	      || MSYMBOL_TYPE (msymbol.minsym) == mst_solib_trampoline)
 	    addr = gdbarch_addr_bits_remove (gdbarch, addr);
 
-	  /* The msymbol is closer to the address than the symbol;
-	     use the msymbol instead.  */
 	  symbol = 0;
 	  name_location = BMSYMBOL_VALUE_ADDRESS (msymbol);
 	  if (do_demangle || asm_demangle)
diff --git a/gdb/valprint.h b/gdb/valprint.h
index 987c534eaf..07014c11b9 100644
--- a/gdb/valprint.h
+++ b/gdb/valprint.h
@@ -255,13 +255,18 @@  extern void print_command_completer (struct cmd_list_element *ignore,
 /* Given an address ADDR return all the elements needed to print the
    address in a symbolic form.  NAME can be mangled or not depending
    on DO_DEMANGLE (and also on the asm_demangle global variable,
-   manipulated via ''set print asm-demangle'').  Return 0 in case of
-   success, when all the info in the OUT paramters is valid.  Return 1
-   otherwise.  */
+   manipulated via ''set print asm-demangle'').  When
+   PREFER_SYM_OVER_MINSYM is true, names (and offsets) from minimal
+   symbols won't be used except in instances where no symbol was
+   found; otherwise, a minsym might be used in some instances (mostly
+   involving function with non-contiguous address ranges).  Return
+   0 in case of success, when all the info in the OUT paramters is
+   valid.  Return 1 otherwise.  */
 
 extern int build_address_symbolic (struct gdbarch *,
 				   CORE_ADDR addr,
-				   int do_demangle,
+				   bool do_demangle,
+				   bool prefer_sym_over_minsym,
 				   std::string *name,
 				   int *offset,
 				   std::string *filename,