Message ID | 20190704045503.1250-3-kevinb@redhat.com |
---|---|
State | New |
Headers | show |
Below, more of a brain dump than an objection. FWIW. On 7/4/19 5:55 AM, Kevin Buettner wrote: > > (gdb) x/5i foo_cold > 0x401128 <foo_cold>: push %rbp > 0x401129 <foo+35>: mov %rsp,%rbp > 0x40112c <foo+38>: callq 0x401134 <baz> > 0x401131 <foo+43>: nop > 0x401132 <foo+44>: pop %rbp I admit that it took me a while to reply because I'm still finding it a bit hard to convince myself that that is the ideal output. E.g., the first two instructions are obviously part of the same prologue, I find it a bit odd that the disassembly shows different function names for that instruction pair. Maybe this won't matter in practice since IIUC your testcase is a bit contrived, and real cold functions are just fragments of functions jmp/ret'ed to/from, without a prologue. BTW, I noticed (with your series applied), this divergence: (gdb) x /5i foo_cold 0x40048e <foo_cold>: push %rbp 0x40048f <foo-18>: mov %rsp,%rbp 0x400492 <foo-15>: callq 0x400487 <baz> 0x400497 <foo-10>: nop 0x400498 <foo-9>: pop %rbp vs: (gdb) info symbol 0x40048f foo_cold + 1 in section .text (gdb) info symbol 0x400492 foo_cold + 4 in section .text (gdb) info symbol Argument required (address). (gdb) info symbol 0x400497 foo_cold + 9 in section .text That's of course because "info symbol" only looks at minimal symbols. On the disassemble side, I think I'd be less confused with: (gdb) disassemble foo Dump of assembler code for function foo: Address range 0x4004a1 to 0x4004bc: 0x00000000004004a1 <foo+0>: push %rbp 0x00000000004004a2 <foo+1>: mov %rsp,%rbp 0x00000000004004a5 <foo+4>: callq 0x40049a <bar> 0x00000000004004aa <foo+9>: mov 0x200b70(%rip),%eax # 0x601020 <e> 0x00000000004004b0 <foo+15>: test %eax,%eax 0x00000000004004b2 <foo+17>: je 0x4004b9 <foo+24> 0x00000000004004b4 <foo+19>: callq 0x40048e <foo_cold> 0x00000000004004b9 <foo+24>: nop 0x00000000004004ba <foo+25>: pop %rbp 0x00000000004004bb <foo+26>: retq Address range 0x40048e to 0x40049a: 0x000000000040048e <foo.cold+0>: push %rbp 0x000000000040048f <foo.cold+1>: mov %rsp,%rbp 0x0000000000400492 <foo.cold+4>: callq 0x400487 <baz> 0x0000000000400497 <foo.cold+9>: nop 0x0000000000400498 <foo.cold+10>: pop %rbp 0x0000000000400499 <foo.cold+11>: retq End of assembler dump. Instead of: (gdb) disassemble foo Dump of assembler code for function foo: Address range 0x4004a1 to 0x4004bc: 0x00000000004004a1 <+0>: push %rbp 0x00000000004004a2 <+1>: mov %rsp,%rbp 0x00000000004004a5 <+4>: callq 0x40049a <bar> 0x00000000004004aa <+9>: mov 0x200b70(%rip),%eax # 0x601020 <e> 0x00000000004004b0 <+15>: test %eax,%eax 0x00000000004004b2 <+17>: je 0x4004b9 <foo+24> 0x00000000004004b4 <+19>: callq 0x40048e <foo_cold> 0x00000000004004b9 <+24>: nop 0x00000000004004ba <+25>: pop %rbp 0x00000000004004bb <+26>: retq Address range 0x40048e to 0x40049a: 0x000000000040048e <-19>: push %rbp 0x000000000040048f <-18>: mov %rsp,%rbp 0x0000000000400492 <-15>: callq 0x400487 <baz> 0x0000000000400497 <-10>: nop 0x0000000000400498 <-9>: pop %rbp 0x0000000000400499 <-8>: retq End of assembler dump. In your examples / testcase, the cold function is adjacent/near the hot / main entry point of foo. But I think that on a real cold function, the offsets between the cold and hot entry points can potentially be much larger (e.g., place in different elf sections), as the point is exactly to move cold code away from the hot path / cache. That that would mean that we're likely to see output like: (gdb) disassemble foo Dump of assembler code for function foo: Address range 0x6004a1 to 0x6004bc: 0x00000000006004a1 <+0>: push %rbp 0x00000000006004a2 <+1>: mov %rsp,%rbp 0x00000000006004a5 <+4>: callq 0x40049a <bar> 0x00000000006004aa <+9>: mov 0x200b70(%rip),%eax # 0x601020 <e> 0x00000000006004b0 <+15>: test %eax,%eax 0x00000000006004b2 <+17>: je 0x6004b9 <foo+24> 0x00000000006004b4 <+19>: callq 0x40048e <foo_cold> 0x00000000006004b9 <+24>: nop 0x00000000006004ba <+25>: pop %rbp 0x00000000006004bb <+26>: retq Address range 0x40048e to 0x40049a: 0x000000000040048e <-2097171>: push %rbp 0x000000000040048f <-2097170>: mov %rsp,%rbp 0x0000000000400492 <-2097167>: callq 0x400487 <baz> 0x0000000000400497 <-2097162>: nop 0x0000000000400498 <-2097161>: pop %rbp 0x0000000000400499 <-2097160>: retq End of assembler dump. ... and those negative offsets kind of look a bit odd. But maybe it's just that I'm not thinking it right. If I think of the offset in terms of offset from the "foo"'s entry point, which is what it really is, then I can convince myself that I can explain why that's the right output, pedantically. So I guess I'll grow into it. And with that out of the way, the series looks good to me as is. Thanks, Pedro Alves
diff --git a/gdb/disasm.c b/gdb/disasm.c index 4e58cb67f9..30c3b06936 100644 --- a/gdb/disasm.c +++ b/gdb/disasm.c @@ -237,16 +237,20 @@ gdb_pretty_print_disassembler::pretty_print_insn (struct ui_out *uiout, uiout->field_core_addr ("address", gdbarch, pc); std::string name, filename; - if (!build_address_symbolic (gdbarch, pc, 0, &name, &offset, &filename, - &line, &unmapped)) + bool omit_fname = ((flags & DISASSEMBLY_OMIT_FNAME) != 0); + if (!build_address_symbolic (gdbarch, pc, false, omit_fname, &name, + &offset, &filename, &line, &unmapped)) { /* We don't care now about line, filename and unmapped. But we might in the future. */ uiout->text (" <"); - if ((flags & DISASSEMBLY_OMIT_FNAME) == 0) + if (!omit_fname) uiout->field_string ("func-name", name.c_str (), ui_out_style_kind::FUNCTION); - uiout->text ("+"); + /* For negative offsets, avoid displaying them as +-N; the sign of + the offset takes the place of the "+" here. */ + if (offset >= 0) + uiout->text ("+"); uiout->field_int ("offset", offset); uiout->text (">:\t"); } diff --git a/gdb/printcmd.c b/gdb/printcmd.c index 0509360581..1109cb3046 100644 --- a/gdb/printcmd.c +++ b/gdb/printcmd.c @@ -528,8 +528,8 @@ print_address_symbolic (struct gdbarch *gdbarch, CORE_ADDR addr, int offset = 0; int line = 0; - if (build_address_symbolic (gdbarch, addr, do_demangle, &name, &offset, - &filename, &line, &unmapped)) + if (build_address_symbolic (gdbarch, addr, do_demangle, false, &name, + &offset, &filename, &line, &unmapped)) return 0; fputs_filtered (leadin, stream); @@ -563,7 +563,8 @@ print_address_symbolic (struct gdbarch *gdbarch, CORE_ADDR addr, int build_address_symbolic (struct gdbarch *gdbarch, CORE_ADDR addr, /* IN */ - int do_demangle, /* IN */ + bool do_demangle, /* IN */ + bool prefer_sym_over_minsym, /* IN */ std::string *name, /* OUT */ int *offset, /* OUT */ std::string *filename, /* OUT */ @@ -591,8 +592,10 @@ build_address_symbolic (struct gdbarch *gdbarch, } } - /* First try to find the address in the symbol table, then - in the minsyms. Take the closest one. */ + /* Try to find the address in both the symbol table and the minsyms. + In most cases, we'll prefer to use the symbol instead of the + minsym. However, there are cases (see below) where we'll choose + to use the minsym instead. */ /* This is defective in the sense that it only finds text symbols. So really this is kind of pointless--we should make sure that the @@ -629,7 +632,19 @@ build_address_symbolic (struct gdbarch *gdbarch, if (msymbol.minsym != NULL) { - if (BMSYMBOL_VALUE_ADDRESS (msymbol) > name_location || symbol == NULL) + /* Use the minsym if no symbol is found. + + Additionally, use the minsym instead of a (found) symbol if + the following conditions all hold: + 1) The prefer_sym_over_minsym flag is false. + 2) The minsym address is identical to that of the address under + consideration. + 3) The symbol address is not identical to that of the address + under consideration. */ + if (symbol == NULL || + (!prefer_sym_over_minsym + && BMSYMBOL_VALUE_ADDRESS (msymbol) == addr + && name_location != addr)) { /* If this is a function (i.e. a code address), strip out any non-address bits. For instance, display a pointer to the @@ -642,8 +657,6 @@ build_address_symbolic (struct gdbarch *gdbarch, || MSYMBOL_TYPE (msymbol.minsym) == mst_solib_trampoline) addr = gdbarch_addr_bits_remove (gdbarch, addr); - /* The msymbol is closer to the address than the symbol; - use the msymbol instead. */ symbol = 0; name_location = BMSYMBOL_VALUE_ADDRESS (msymbol); if (do_demangle || asm_demangle) diff --git a/gdb/valprint.h b/gdb/valprint.h index 987c534eaf..07014c11b9 100644 --- a/gdb/valprint.h +++ b/gdb/valprint.h @@ -255,13 +255,18 @@ extern void print_command_completer (struct cmd_list_element *ignore, /* Given an address ADDR return all the elements needed to print the address in a symbolic form. NAME can be mangled or not depending on DO_DEMANGLE (and also on the asm_demangle global variable, - manipulated via ''set print asm-demangle''). Return 0 in case of - success, when all the info in the OUT paramters is valid. Return 1 - otherwise. */ + manipulated via ''set print asm-demangle''). When + PREFER_SYM_OVER_MINSYM is true, names (and offsets) from minimal + symbols won't be used except in instances where no symbol was + found; otherwise, a minsym might be used in some instances (mostly + involving function with non-contiguous address ranges). Return + 0 in case of success, when all the info in the OUT paramters is + valid. Return 1 otherwise. */ extern int build_address_symbolic (struct gdbarch *, CORE_ADDR addr, - int do_demangle, + bool do_demangle, + bool prefer_sym_over_minsym, std::string *name, int *offset, std::string *filename,