[v2,2/5] Restrict use of minsym names when printing addresses in disassembled code
Commit Message
build_address_symbolic contains some code which causes it to
prefer the minsym over the the function symbol in certain cases.
The cases where this occurs are the same as the "certain pathological
cases" that used to exist in find_frame_funname().
This commit largely disables that code; it will only prefer the
minsym when the address of minsym is identical to that of the address
under consideration AND the function address for the symbtab sym is
not the same as the address under consideration.
So, without this change, when using the dw2-ranges-func-lo-cold
executable from the gdb.dwarf2/dw2-ranges-func.exp test, GDB exhibits
the following behavior:
(gdb) x/5i foo_cold
0x40110d <foo+4294967277>: push %rbp
0x40110e <foo+4294967278>: mov %rsp,%rbp
0x401111 <foo+4294967281>: callq 0x401106 <baz>
0x401116 <foo+4294967286>: nop
0x401117 <foo+4294967287>: pop %rbp
On the other hand, still without this change, using the
dw2-ranges-func-hi-cold executable from the same test, GDB
does this instead:
(gdb) x/5i foo_cold
0x401128 <foo_cold>: push %rbp
0x401129 <foo_cold+1>: mov %rsp,%rbp
0x40112c <foo_cold+4>: callq 0x401134 <baz>
0x401131 <foo_cold+9>: nop
0x401132 <foo_cold+10>: pop %rbp
This is inconsistent behavior. When foo_cold is at a lower
address than the function's entry point, the symtab symbol (foo)
is displayed along with a large positive offset which would wrap
around the address space if the address space were only 32 bits wide.
(A later patch fixes this problem by displaying negative offsets.)
This commit makes the behavior uniform for both the "lo-cold" and
"hi-cold" cases:
lo-cold:
(gdb) x/5i foo_cold
0x40110d <foo_cold>: push %rbp
0x40110e <foo-18>: mov %rsp,%rbp
0x401111 <foo-15>: callq 0x401106 <baz>
0x401116 <foo-10>: nop
0x401117 <foo-9>: pop %rbp
hi-cold:
(gdb) x/5i foo_cold
0x401128 <foo_cold>: push %rbp
0x401129 <foo+35>: mov %rsp,%rbp
0x40112c <foo+38>: callq 0x401134 <baz>
0x401131 <foo+43>: nop
0x401132 <foo+44>: pop %rbp
In both cases, the symbol shown for the address at which foo_cold
resides is shown as <foo_cold>. Subsequent offsets are shown as
either negative or positive offsets from the entry pc for foo.
When disassembling a function, care must be taken to NOT display
<+0> as the offset for the second range. For this reason, I found
it necessary to add the "prefer_sym_over_minsym" parameter to
build_address_symbolic. The type of this flag is a bool; do_demangle
ought to be a bool also, so I made this change at the same time.
gdb/ChangeLog:
* valprint.h (build_address_symbolic): Add "prefer_sym_over_minsym"
parameter. Change type of "do_demangle" to bool.
* disasm.c (gdb_pretty_print_disassembler::pretty_print_insn):
Pass suitable "prefer_sym_over_minsym" flag to
build_address_symbolic(). Don't output "+" for negative offsets.
* printcmd.c (print_address_symbolic): Update invocation of
build_address_symbolic to include a "prefer_sym_over_minsym"
flag.
(build_address_symbolic): Add "prefer_sym_over_minsym" parameter.
Restrict cases in which use of minimal symbol is preferred to that
of a found symbol. Update comments.
---
gdb/disasm.c | 12 ++++++++----
gdb/printcmd.c | 29 +++++++++++++++++++++--------
gdb/valprint.h | 13 +++++++++----
3 files changed, 38 insertions(+), 16 deletions(-)
Comments
Below, more of a brain dump than an objection. FWIW.
On 7/4/19 5:55 AM, Kevin Buettner wrote:
>
> (gdb) x/5i foo_cold
> 0x401128 <foo_cold>: push %rbp
> 0x401129 <foo+35>: mov %rsp,%rbp
> 0x40112c <foo+38>: callq 0x401134 <baz>
> 0x401131 <foo+43>: nop
> 0x401132 <foo+44>: pop %rbp
I admit that it took me a while to reply because I'm still finding
it a bit hard to convince myself that that is the ideal output.
E.g., the first two instructions are obviously part of the same
prologue, I find it a bit odd that the disassembly shows
different function names for that instruction pair.
Maybe this won't matter in practice since IIUC your testcase is a
bit contrived, and real cold functions are just fragments of
functions jmp/ret'ed to/from, without a prologue.
BTW, I noticed (with your series applied), this divergence:
(gdb) x /5i foo_cold
0x40048e <foo_cold>: push %rbp
0x40048f <foo-18>: mov %rsp,%rbp
0x400492 <foo-15>: callq 0x400487 <baz>
0x400497 <foo-10>: nop
0x400498 <foo-9>: pop %rbp
vs:
(gdb) info symbol 0x40048f
foo_cold + 1 in section .text
(gdb) info symbol 0x400492
foo_cold + 4 in section .text
(gdb) info symbol
Argument required (address).
(gdb) info symbol 0x400497
foo_cold + 9 in section .text
That's of course because "info symbol" only looks at minimal symbols.
On the disassemble side, I think I'd be less confused with:
(gdb) disassemble foo
Dump of assembler code for function foo:
Address range 0x4004a1 to 0x4004bc:
0x00000000004004a1 <foo+0>: push %rbp
0x00000000004004a2 <foo+1>: mov %rsp,%rbp
0x00000000004004a5 <foo+4>: callq 0x40049a <bar>
0x00000000004004aa <foo+9>: mov 0x200b70(%rip),%eax # 0x601020 <e>
0x00000000004004b0 <foo+15>: test %eax,%eax
0x00000000004004b2 <foo+17>: je 0x4004b9 <foo+24>
0x00000000004004b4 <foo+19>: callq 0x40048e <foo_cold>
0x00000000004004b9 <foo+24>: nop
0x00000000004004ba <foo+25>: pop %rbp
0x00000000004004bb <foo+26>: retq
Address range 0x40048e to 0x40049a:
0x000000000040048e <foo.cold+0>: push %rbp
0x000000000040048f <foo.cold+1>: mov %rsp,%rbp
0x0000000000400492 <foo.cold+4>: callq 0x400487 <baz>
0x0000000000400497 <foo.cold+9>: nop
0x0000000000400498 <foo.cold+10>: pop %rbp
0x0000000000400499 <foo.cold+11>: retq
End of assembler dump.
Instead of:
(gdb) disassemble foo
Dump of assembler code for function foo:
Address range 0x4004a1 to 0x4004bc:
0x00000000004004a1 <+0>: push %rbp
0x00000000004004a2 <+1>: mov %rsp,%rbp
0x00000000004004a5 <+4>: callq 0x40049a <bar>
0x00000000004004aa <+9>: mov 0x200b70(%rip),%eax # 0x601020 <e>
0x00000000004004b0 <+15>: test %eax,%eax
0x00000000004004b2 <+17>: je 0x4004b9 <foo+24>
0x00000000004004b4 <+19>: callq 0x40048e <foo_cold>
0x00000000004004b9 <+24>: nop
0x00000000004004ba <+25>: pop %rbp
0x00000000004004bb <+26>: retq
Address range 0x40048e to 0x40049a:
0x000000000040048e <-19>: push %rbp
0x000000000040048f <-18>: mov %rsp,%rbp
0x0000000000400492 <-15>: callq 0x400487 <baz>
0x0000000000400497 <-10>: nop
0x0000000000400498 <-9>: pop %rbp
0x0000000000400499 <-8>: retq
End of assembler dump.
In your examples / testcase, the cold function is adjacent/near
the hot / main entry point of foo. But I think that
on a real cold function, the offsets between the cold and
hot entry points can potentially be much larger (e.g., place in
different elf sections), as the point is exactly to
move cold code away from the hot path / cache.
That that would mean that we're likely to see output like:
(gdb) disassemble foo
Dump of assembler code for function foo:
Address range 0x6004a1 to 0x6004bc:
0x00000000006004a1 <+0>: push %rbp
0x00000000006004a2 <+1>: mov %rsp,%rbp
0x00000000006004a5 <+4>: callq 0x40049a <bar>
0x00000000006004aa <+9>: mov 0x200b70(%rip),%eax # 0x601020 <e>
0x00000000006004b0 <+15>: test %eax,%eax
0x00000000006004b2 <+17>: je 0x6004b9 <foo+24>
0x00000000006004b4 <+19>: callq 0x40048e <foo_cold>
0x00000000006004b9 <+24>: nop
0x00000000006004ba <+25>: pop %rbp
0x00000000006004bb <+26>: retq
Address range 0x40048e to 0x40049a:
0x000000000040048e <-2097171>: push %rbp
0x000000000040048f <-2097170>: mov %rsp,%rbp
0x0000000000400492 <-2097167>: callq 0x400487 <baz>
0x0000000000400497 <-2097162>: nop
0x0000000000400498 <-2097161>: pop %rbp
0x0000000000400499 <-2097160>: retq
End of assembler dump.
... and those negative offsets kind of look a bit odd.
But maybe it's just that I'm not thinking it right.
If I think of the offset in terms of offset from the
"foo"'s entry point, which is what it really is, then I can
convince myself that I can explain why that's the right output,
pedantically.
So I guess I'll grow into it.
And with that out of the way, the series looks good to me as is.
Thanks,
Pedro Alves
@@ -237,16 +237,20 @@ gdb_pretty_print_disassembler::pretty_print_insn (struct ui_out *uiout,
uiout->field_core_addr ("address", gdbarch, pc);
std::string name, filename;
- if (!build_address_symbolic (gdbarch, pc, 0, &name, &offset, &filename,
- &line, &unmapped))
+ bool omit_fname = ((flags & DISASSEMBLY_OMIT_FNAME) != 0);
+ if (!build_address_symbolic (gdbarch, pc, false, omit_fname, &name,
+ &offset, &filename, &line, &unmapped))
{
/* We don't care now about line, filename and unmapped. But we might in
the future. */
uiout->text (" <");
- if ((flags & DISASSEMBLY_OMIT_FNAME) == 0)
+ if (!omit_fname)
uiout->field_string ("func-name", name.c_str (),
ui_out_style_kind::FUNCTION);
- uiout->text ("+");
+ /* For negative offsets, avoid displaying them as +-N; the sign of
+ the offset takes the place of the "+" here. */
+ if (offset >= 0)
+ uiout->text ("+");
uiout->field_int ("offset", offset);
uiout->text (">:\t");
}
@@ -528,8 +528,8 @@ print_address_symbolic (struct gdbarch *gdbarch, CORE_ADDR addr,
int offset = 0;
int line = 0;
- if (build_address_symbolic (gdbarch, addr, do_demangle, &name, &offset,
- &filename, &line, &unmapped))
+ if (build_address_symbolic (gdbarch, addr, do_demangle, false, &name,
+ &offset, &filename, &line, &unmapped))
return 0;
fputs_filtered (leadin, stream);
@@ -563,7 +563,8 @@ print_address_symbolic (struct gdbarch *gdbarch, CORE_ADDR addr,
int
build_address_symbolic (struct gdbarch *gdbarch,
CORE_ADDR addr, /* IN */
- int do_demangle, /* IN */
+ bool do_demangle, /* IN */
+ bool prefer_sym_over_minsym, /* IN */
std::string *name, /* OUT */
int *offset, /* OUT */
std::string *filename, /* OUT */
@@ -591,8 +592,10 @@ build_address_symbolic (struct gdbarch *gdbarch,
}
}
- /* First try to find the address in the symbol table, then
- in the minsyms. Take the closest one. */
+ /* Try to find the address in both the symbol table and the minsyms.
+ In most cases, we'll prefer to use the symbol instead of the
+ minsym. However, there are cases (see below) where we'll choose
+ to use the minsym instead. */
/* This is defective in the sense that it only finds text symbols. So
really this is kind of pointless--we should make sure that the
@@ -629,7 +632,19 @@ build_address_symbolic (struct gdbarch *gdbarch,
if (msymbol.minsym != NULL)
{
- if (BMSYMBOL_VALUE_ADDRESS (msymbol) > name_location || symbol == NULL)
+ /* Use the minsym if no symbol is found.
+
+ Additionally, use the minsym instead of a (found) symbol if
+ the following conditions all hold:
+ 1) The prefer_sym_over_minsym flag is false.
+ 2) The minsym address is identical to that of the address under
+ consideration.
+ 3) The symbol address is not identical to that of the address
+ under consideration. */
+ if (symbol == NULL ||
+ (!prefer_sym_over_minsym
+ && BMSYMBOL_VALUE_ADDRESS (msymbol) == addr
+ && name_location != addr))
{
/* If this is a function (i.e. a code address), strip out any
non-address bits. For instance, display a pointer to the
@@ -642,8 +657,6 @@ build_address_symbolic (struct gdbarch *gdbarch,
|| MSYMBOL_TYPE (msymbol.minsym) == mst_solib_trampoline)
addr = gdbarch_addr_bits_remove (gdbarch, addr);
- /* The msymbol is closer to the address than the symbol;
- use the msymbol instead. */
symbol = 0;
name_location = BMSYMBOL_VALUE_ADDRESS (msymbol);
if (do_demangle || asm_demangle)
@@ -255,13 +255,18 @@ extern void print_command_completer (struct cmd_list_element *ignore,
/* Given an address ADDR return all the elements needed to print the
address in a symbolic form. NAME can be mangled or not depending
on DO_DEMANGLE (and also on the asm_demangle global variable,
- manipulated via ''set print asm-demangle''). Return 0 in case of
- success, when all the info in the OUT paramters is valid. Return 1
- otherwise. */
+ manipulated via ''set print asm-demangle''). When
+ PREFER_SYM_OVER_MINSYM is true, names (and offsets) from minimal
+ symbols won't be used except in instances where no symbol was
+ found; otherwise, a minsym might be used in some instances (mostly
+ involving function with non-contiguous address ranges). Return
+ 0 in case of success, when all the info in the OUT paramters is
+ valid. Return 1 otherwise. */
extern int build_address_symbolic (struct gdbarch *,
CORE_ADDR addr,
- int do_demangle,
+ bool do_demangle,
+ bool prefer_sym_over_minsym,
std::string *name,
int *offset,
std::string *filename,