RFC: Annotate immediates in x86 disassembly
Commit Message
Hi Guys,
Attached is a small potential patch that adds a new target specific
option to the x86 and x86_64 disassemblers: "-M annotate-immediates".
If enabled immediate operands which match a symbol's value will have
the name of that symbol displayed alongside them.
For example, without the new option a line of disassembly might look
like this:
400419: bf 18 30 40 00 mov $0x403018,%edi
whereas with the option enabled it could look like this:
400419: bf 18 30 40 00 mov $0x403018 [func1],%edi
I find this helpful when looking at code that loads function
addresses into registers for example.
What do you think ? Would this be a useful addition to the assembler?
Notes:
* The feature is only enabled when disassembling executables as
the symbolic addresses in object files too often match small
ordinary constants.
* The option parser will accept "-M annotate" or "-M
annotate-immediates" or "-M annotate<anything>". I could make it
more strict, but I was not sure if it was worth it.
I did try to make this a generic option, rather than specific to the
x86/x86_64 architecture, but I quickly ran into problems with how
different ISAs handle the loading of constant values. Still if this
kind of feature is considered useful it could be added to more
backends on a per-architecture basis.
Cheers
Nick
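The difference between the permissive option matching described in the notes above and a stricter alternative can be sketched roughly as follows. This is a standalone illustration, not code from the patch: both helper names are hypothetical, and the strict variant only assumes that -M options are separated by commas.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: prefix match, similar in spirit to a startswith()
   test.  Accepts "annotate", "annotate-immediates" and
   "annotate<anything>".  */
static bool
option_matches_prefix (const char *opt, const char *name)
{
  assert (opt != NULL && name != NULL);
  return strncmp (opt, name, strlen (name)) == 0;
}

/* Hypothetical helper: strict match.  OPT must be exactly NAME, followed
   by either the end of the string or the ',' that separates options.  */
static bool
option_matches_exact (const char *opt, const char *name)
{
  size_t len;

  assert (opt != NULL && name != NULL);
  len = strlen (name);
  return strncmp (opt, name, len) == 0
         && (opt[len] == '\0' || opt[len] == ',');
}
```

With the strict variant, "annotate-immediates,intel" is accepted while "annotate" and "annotate-foo" are not.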
Comments
On Wed, Apr 22, 2026 at 5:02 PM Nick Clifton <nickc@redhat.com> wrote:
>
> Hi Guys,
>
> Attached is a small potential patch that adds a new target specific
> option to the x86 and x86_64 disassemblers: "-M annotate-immediates".
> If enabled immediate operands which match a symbol's value will have
> the name of that symbol displayed alongside them.
>
> For example, without the new option a line of disassembly might look
> like this:
>
> 400419: bf 18 30 40 00 mov $0x403018,%edi
>
> whereas with the option enabled it could look like this:
>
> 400419: bf 18 30 40 00 mov $0x403018 [func1],%edi
>
> I find this helpful when looking at code that loads function
> addresses into registers for example.
>
> What do you think ? Would this be a useful addition to the assembler?
>
> Notes:
> * The feature is only enabled when disassembling executables as
> the symbolic addresses in object files too often match small
> ordinary constants.
>
> * The option parser will accept "-M annotate" or "-M
> annotate-immediates" or "-M annotate<anything>". I could make it
> more strict, but I was not sure if it was worth it.
>
> I did try to make this a generic option, rather than specific to the
> x86/x86_64 architecture, but I quickly ran into problems with how
> different ISAs handle the loading of constant values. Still if this
> kind of feature is considered useful it could be added to more
> backends on a per-architecture basis.
>
> Cheers
> Nick
>
> + /* Determine if we can display some more information about this immediate. */
> + if (! annotate_immediates
> + /* Don't bother with xero, even if there is symbol associated
> with it. */
zero.
> + /* For the next tests we need a BFD. If we do not have one
> then do not proceed. */
> + || ins->info->section == NULL
> + || ins->info->section->owner == NULL
> + /* Save time by avoiding immediates that cannot reference part
> of the address space. */
> + || imm < ins->info->section->owner->start_address
> + /* Also skip object files as their symbols have not been resolved. */
> + || (ins->info->section->owner->flags & EXEC_P) == 0)
Should it also handle shared libraries?
> + return;
On 22.04.2026 11:02, Nick Clifton wrote:
> Hi Guys,
>
> Attached is a small potential patch that adds a new target specific
> option to the x86 and x86_64 disassemblers: "-M annotate-immediates".
> If enabled immediate operands which match a symbol's value will have
> the name of that symbol displayed alongside them.
>
> For example, without the new option a line of disassembly might look
> like this:
>
> 400419: bf 18 30 40 00 mov $0x403018,%edi
>
> whereas with the option enabled it could look like this:
>
> 400419: bf 18 30 40 00 mov $0x403018 [func1],%edi
>
> I find this helpful when looking at code that loads function
> addresses into registers for example.
>
> What do you think ? Would this be a useful addition to the assembler?
For the disassembler - yes, sure.
> Notes:
> * The feature is only enabled when disassembling executables as
> the symbolic addresses in object files too often match small
> ordinary constants.
>
> * The option parser will accept "-M annotate" or "-M
> annotate-immediates" or "-M annotate<anything>". I could make it
> more strict, but I was not sure if it was worth it.
Will something equivalent then also work from the gdb prompt? (I expect
so, as I think per-arch option parsing covers both, but I'd like
to double check.)
> I did try to make this a generic option, rather than specific to the
> x86/x86_64 architecture, but I quickly ran into problems with how
> different ISAs handle the loading of constant values. Still if this
> kind of feature is considered useful it could be added to more
> backends on a per-architecture basis.
Many architectures can only load range restricted constants in a single
insn, so there may not be that many where this is easily possible.
Jan
Hello,
On Wed, 22 Apr 2026, Nick Clifton wrote:
> For example, without the new option a line of disassembly might look
> like this:
>
> 400419: bf 18 30 40 00 mov $0x403018,%edi
>
> whereas with the option enabled it could look like this:
>
> 400419: bf 18 30 40 00 mov $0x403018 [func1],%edi
>
> I find this helpful when looking at code that loads function
> addresses into registers for example.
I think it's helpful, but may I use the bike-shed opportunity here? I
would prefer if the disassembler output (even with the option, but with
--no-addresses --show-raw-insn) would be roughly copy-n-pastable to the
assembler. That would mean putting any annotations (and there may be more
in the future?) after a comment-introducer. Like:
400419: bf 18 30 40 00 mov $0x403018,%edi # [func1]
There are only so many immediates^Wmagic numbers per instruction that
warrant a comment, so I don't think we'd lose much by doing all annotation
postfix instead of infix. Plus: we already do output such after-comment
hints for certain magic numbers:
16: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 1c <f+0x1c>
(that is for the expression 0x0+%rip).
Oh, and about .o files: their magic numbers are usually all associated
with a relocation, so a symbol name hint could still be generated. Like I
now usually disassemble .o files with -r to see exactly that. The above
line is (with -r):
16: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 1c <f+0x1c>
18: R_X86_64_PC32 foo-0x4
I wouldn't mind it saying (with the new option, or even by default):
16: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # [foo]
Ciao,
Michael.
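The postfix layout suggested above could be sketched along these lines: operand text and annotations go into separate buffers, and the annotations are only emitted after the comment introducer once the operands are complete. This is a minimal standalone sketch, not i386-dis.c code; struct insn_text, queue_comment() and format_insn() are hypothetical names, and the buffer sizes are arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-instruction state: operand text and queued comment
   text are kept in separate buffers.  */
struct insn_text
{
  char operands[128];
  char comment[128];
};

/* Queue an annotation for the end of the line.  A second annotation is
   separated from the first by ", ".  Output is truncated if it does not
   fit.  */
static void
queue_comment (struct insn_text *t, const char *note)
{
  size_t used;

  assert (t != NULL && note != NULL);
  used = strlen (t->comment);
  snprintf (t->comment + used, sizeof t->comment - used, "%s%s",
            used ? ", " : "", note);
}

/* Emit the finished line: mnemonic and operands first, then any queued
   comments after a '#' introducer.  */
static void
format_insn (const struct insn_text *t, const char *mnemonic,
             char *out, size_t outlen)
{
  if (t->comment[0] != '\0')
    snprintf (out, outlen, "%s %s # %s", mnemonic, t->operands, t->comment);
  else
    snprintf (out, outlen, "%s %s", mnemonic, t->operands);
}
```

Because everything before the '#' is untouched by the annotation, the instruction part of the line stays in a form the assembler can accept.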
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, April 22, 2026 6:44 PM
>
> On 22.04.2026 11:02, Nick Clifton wrote:
> > Hi Guys,
> >
> > Attached is a small potential patch that adds a new target specific
> > option to the x86 and x86_64 disassemblers: "-M annotate-immediates".
> > If enabled immediate operands which match a symbol's value will have
> > the name of that symbol displayed alongside them.
> >
> > For example, without the new option a line of disassembly might look
> > like this:
> >
> > 400419: bf 18 30 40 00 mov $0x403018,%edi
> >
> > whereas with the option enabled it could look like this:
> >
> > 400419: bf 18 30 40 00 mov $0x403018 [func1],%edi
> >
> > I find this helpful when looking at code that loads function
> > addresses into registers for example.
> >
> > What do you think ? Would this be a useful addition to the assembler?
>
> For the disassembler - yes, sure.
It is helpful for sure.
>
> > Notes:
> > * The feature is only enabled when disassembling executables as
> > the symbolic addresses in object files too often match small
> > ordinary constants.
> >
> > * The option parser will accept "-M annotate" or "-M
> > annotate-immediates" or "-M annotate<anything>". I could make it
> > more strict, but I was not sure if it was worth it.
>
> Will something equivalent then also work from the gdb prompt? (I expect
> so, as I think per-arch option parsing covers both, but I'd like
> to double check.)
>
I have added some GDB folks from Intel side to see if they have some opinion
on that.
Thx,
Haochen
Hi H.J.
> + || (ins->info->section->owner->flags & EXEC_P) == 0)
>
> Should it also handle shared libraries?
Yes, yes it should. :-)
I will fix this in the v2 patch.
Cheers
Nick
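One way the check could be widened to cover shared libraries is sketched below. This is only an illustration and not necessarily what the v2 patch does: the two constants are local stand-ins for BFD's EXEC_P and DYNAMIC file flags (their values here are assumptions, and real code would take them from bfd.h), and symbols_are_resolved() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for BFD file flags.  The values are assumptions made
   for this sketch; real code would use the definitions in bfd.h.  */
enum
{
  SKETCH_EXEC_P = 0x02,   /* Stand-in for EXEC_P (executable).  */
  SKETCH_DYNAMIC = 0x40   /* Stand-in for DYNAMIC (shared object).  */
};

static_assert ((SKETCH_EXEC_P & SKETCH_DYNAMIC) == 0,
               "the two flags must be distinct bits");

/* Hypothetical helper: return true if the file is linked, that is either
   an executable or a shared object, so that symbol values are final.  */
static bool
symbols_are_resolved (unsigned int file_flags)
{
  return (file_flags & (SKETCH_EXEC_P | SKETCH_DYNAMIC)) != 0;
}
```

A relocatable object file would have neither bit set and would still be skipped.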
Hi Jan,
>> * The option parser will accept "-M annotate" or "-M
> Will something equivalent then also work from the gdb prompt? (I expect
> so, as I think per-arch option parsing covers both, but I'd like
> to double check.)
Actually it appears not. At least not in the very basic test I ran:
(gdb) show architecture
The target architecture is set to "auto" (currently "i386:x86-64").
(gdb) show disassembler-options
The current disassembler options are ''
There are no disassembler options available for this architecture.
So it looks like it is not possible to set any target specific options in
gdb. At least not for the x86_64 architecture.
Cheers
Nick
Hi Michael,
>> 400419: bf 18 30 40 00 mov $0x403018 [func1],%edi
> I think it's helpful, but may I use the bike-shed opportunity here?
Of course...
> I
> would prefer if the disassembler output (even with the option, but with
> --no-addresses --show-raw-insn) would be roughly copy-n-pastable to the
> assembler. That would mean putting any annotations (and there may be more
> in the future?) after a comment-introducer. Like:
>
> 400419: bf 18 30 40 00 mov $0x403018,%edi # [func1]
Ah - yes I did think about this. But as far as I can see the current code
in i386-dis.c does not have any way to queue up comments to be displayed at
the end of the line. So that would mean either adding the framework to do
so, or adding yet another walk over all of the operands, once the disassembly
proper has been displayed. Since the feature is gated by the -Mannotate option
it seemed to me that doing either of these things would needlessly complicate
the code.
But ... I agree that it would be nice, and if, as you suggest, there might be
more annotations to come in the future then it would make sense to code the
framework for them now. Let me have a little investigation...
Cheers
Nick
Hi Guys,
Attached is a v2 of the proposed patch. It adds:
* The annotation is displayed as a comment at the end of the line.
* The symbol name is displayed in dis_style_symbol rather than dis_style_address.
* Shared object files (type ET_DYN) can also benefit from the new feature.
There are a couple of potential issues with the patch:
* It adds a new buffer to the instr_info structure, making it even bigger.
* There are opportunities for sharing code with the oappend_*() functions
which I did not take. (Mainly to keep the code in the patch simple).
* There are opportunities for other places in the disassembler to add comments
of their own, but I figured that these could be added later, if they
were actually deemed worthwhile.
Thoughts, comments ?
Cheers
Nick
Hey,
On Thu, 23 Apr 2026, Nick Clifton wrote:
> * The annotation is displayed as a comment at the end of the line.
Nice!
> Thoughts, comments ?
I'm unworried about the size increase of instr_info, there exists exactly
one instance of it globally (while print_insn runs, on the stack). It may
be nice to separate multiple annotations by comma, and/or to integrate the
%rip-relative print_address_func() call with the
new number-related-symbol hint print into one loop (so that the order of
comments relative to the printed order of operands is correct), but I
think that that shouldn't hold off the patch.
And leaks when asprintf returns zero... hmm, who cares :)
Ciao,
Michael.
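If the asprintf() question ever did matter, one option would be to format the annotation into a caller-supplied buffer so that nothing is allocated at all. The sketch below is standalone and hypothetical (format_annotation() is not part of the patch), and it trades the leak question for a fixed size limit on the symbol name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: format " [NAME]" into BUF.  Returns false on a
   formatting error or if the result would not fit, in which case the
   caller simply skips the annotation.  */
static bool
format_annotation (char *buf, size_t buflen, const char *sym_name)
{
  int n;

  assert (buf != NULL && sym_name != NULL);
  n = snprintf (buf, buflen, " [%s]", sym_name);
  return n > 0 && (size_t) n < buflen;
}
```

A long C++ mangled name could exceed a small fixed buffer, so the buffer size would need some thought in real code.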
> 16: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 1c <f+0x1c>
> 18: R_X86_64_PC32 foo-0x4
>
> I wouldn't mind it saying (with the new option, or even by default):
>
> 16: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # [foo]
I would just like to stress that, probably mainly for developers,
the point of using the objdump utility with the '-r' option is really to get
a listing of all relocations present inside a given object or executable.
I wouldn't mind the substitution from "# 1c <f+0x1c>" into "# [foo]"
but I would really like that the output still shows the relocation itself
on the next line.
Pierre Muller
Free Pascal developer
On 23/04/2026 10:13, Nick Clifton wrote:
> Hi Michael,
>
>>> 400419: bf 18 30 40 00 mov $0x403018 [func1],%edi
>
>> I think it's helpful, but may I use the bike-shed opportunity here?
>
> Of course...
>
>> I
>> would prefer if the disassembler output (even with the option, but with
>> --no-addresses --show-raw-insn) would be roughly copy-n-pastable to the
>> assembler. That would mean putting any annotations (and there may be more
>> in the future?) after a comment-introducer. Like:
>>
>> 400419: bf 18 30 40 00 mov $0x403018,%edi # [func1]
>
> Ah - yes I did think about this. But as far as I can see the current code
> in i386-dis.c does not have any way to queue up comments to be displayed at
> the end of the line. So that would mean either adding the framework to do
> so, or adding yet another walk over all of the operands, once the disassembly
> proper has been displayed. Since the feature is gated by the -Mannotate option
> it seemed to me that doing either of these things would needlessly complicate
> the code.
>
> But ... I agree that it would be nice, and if, as you suggest, there might be
> more annotations to come in the future then it would make sense to code the
> framework for them now. Let me have a little investigation...
>
> Cheers
> Nick
>
We somehow do this for function calls and jump instructions.
But to bikeshed further, I've sometimes pondered whether the disassembler should produce some structured data output which a generic output formatter can then grok. This would be able to handle this and coloured output more sensibly since we've improved the abstraction.
The problem, of course, is that such an approach would be a LOT more work.
R.
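To make the structured-output idea a little more concrete, here is a very small sketch: the disassembler produces a list of typed tokens, and a separate formatter decides where each kind ends up. Everything here is hypothetical (the token kinds, struct token, append() and format_plain()); a real design would presumably build on the existing dis_style_* classification and would need far more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds for this sketch.  */
enum token_kind
{
  TOK_MNEMONIC, TOK_IMMEDIATE, TOK_REGISTER, TOK_TEXT, TOK_HINT
};

struct token
{
  enum token_kind kind;
  const char *text;
};

/* Append TEXT to the NUL-terminated string in OUT, truncating if needed.  */
static void
append (char *out, size_t outlen, const char *text)
{
  size_t used = strlen (out);

  assert (used < outlen);
  snprintf (out + used, outlen - used, "%s", text);
}

/* One possible formatter: emit the instruction tokens in order, then move
   every hint token behind a '#' comment introducer.  */
static void
format_plain (const struct token *toks, size_t ntoks,
              char *out, size_t outlen)
{
  bool have_hint = false;

  assert (outlen > 0);
  out[0] = '\0';
  for (size_t i = 0; i < ntoks; i++)
    if (toks[i].kind != TOK_HINT)
      append (out, outlen, toks[i].text);
  for (size_t i = 0; i < ntoks; i++)
    if (toks[i].kind == TOK_HINT)
      {
        append (out, outlen, have_hint ? ", " : " # ");
        append (out, outlen, toks[i].text);
        have_hint = true;
      }
}
```

The producer can attach the hint right next to the immediate it belongs to, and whether it is shown infix, postfix, coloured or not at all becomes a formatter decision.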
Hello,
On Thu, 23 Apr 2026, Pierre Muller wrote:
> > 16: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 1c <f+0x1c>
> > 18: R_X86_64_PC32 foo-0x4
> >
> > I wouldn't mind it saying (with the new option, or even by default):
> >
> > 16: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # [foo]
>
> I would just like to stress that, probably mainly for developers,
> the point of using the objdump utility with the '-r' option is really to get
> a listing of all relocations present inside a given object or executable.
> I wouldn't mind the substitution from "# 1c <f+0x1c>" into "# [foo]"
> but I would really like that the output still shows the relocation itself
> on the next line.
Yes, of course. I didn't mean to imply otherwise, only to show that
immediates in .o files can be usefully turned into annotations as well
because the info is all there. -r should continue to behave the way it
does now.
Ciao,
Michael.
On 23.04.2026 13:03, Nick Clifton wrote:
> Hi Guys,
>
> Attached is a v2 of the proposed patch. It adds:
>
> * The annotation is displayed as a comment at the end of the line.
> * The symbol name is displayed in dis_style_symbol rather than dis_style_address.
> * Shared object files (type ET_DYN) can also benefit from the new feature.
>
> There are a couple of potential issues with the patch:
>
> * It adds a new buffer to the instr_info structure, making it even bigger.
> * There are opportunities for sharing code with the oappend_*() functions
> which I did not take. (Mainly to keep the code in the patch simple).
> * There are opportunities for other places in the disassembler to add comments
> of their own, but I figured that these could be added later, if they
> were actually deemed worthwhile.
>
> Thoughts, comments ?
Already when reading Michael's v1 comment I wondered: How's handling of
multiple comments going to work? Surely we want to switch the present
%rip-relative operand printing to the new cappend_*() model. Hence both
that and an immediate may want to have a comment added. Presumably those
would want to appear in the same order as the operands, which are
printed in different order for AT&T vs Intel syntax.
Jan
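The ordering concern could be handled roughly as sketched below: keep at most one comment per operand, and walk the operands in the same direction as they are printed when joining the comments. This is a standalone sketch under stated assumptions: struct operand_comments and join_comments() are hypothetical, and the comments are assumed to be stored in Intel (destination first) order, with AT&T printing simply being the reverse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPERANDS 5

/* Hypothetical: one optional comment per operand, stored in Intel
   (destination first) order.  NULL means no comment for that operand.  */
struct operand_comments
{
  const char *note[MAX_OPERANDS];
  int count;
};

/* Join the comments in the order in which the operands are printed,
   separating them with ", ".  Output is truncated if it does not fit.  */
static void
join_comments (const struct operand_comments *c, bool intel_syntax,
               char *out, size_t outlen)
{
  assert (c != NULL && c->count <= MAX_OPERANDS && outlen > 0);
  out[0] = '\0';
  for (int i = 0; i < c->count; i++)
    {
      /* AT&T prints operands in the reverse of Intel order.  */
      int idx = intel_syntax ? i : c->count - 1 - i;
      const char *note = c->note[idx];
      size_t used = strlen (out);

      if (note == NULL)
        continue;
      snprintf (out + used, outlen - used, "%s%s", used ? ", " : "", note);
    }
}
```

For an instruction storing an immediate to a %rip-relative destination, the AT&T comment would then read "[func1], 1c <f+0x1c>" and the Intel one "1c <f+0x1c>, [func1]".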
Hi Jan,
> Already when reading Michael's v1 comment I wondered: How's handling of
> multiple comments going to work?
Yeah .. I don't know :-)
> Surely we want to switch the present
> %rip-relative operand printing to the new cappend_*() model.
That was going to be my next step.
> Hence both
> that and an immediate may want to have a comment added. Presumably those
> would want to appear in the same order as the operands, which are
> printed in different order for AT&T vs Intel syntax.
Ah, OK, this is beyond my x86 expertise. Maybe a real x86 maintainer
would like to take over ?
Cheers
Nick
On 05.05.2026 10:33, Nick Clifton wrote:
>> Already when reading Michael's v1 comment I wondered: How's handling of
>> multiple comments going to work?
>
> Yeah .. I don't know :-)
>
>> Surely we want to switch the present
>> %rip-relative operand printing to the new cappend_*() model.
>
> That was going to be my next step.
>
>> Hence both
>> that and an immediate may want to have a comment added. Presumably those
>> would want to appear in the same order as the operands, which are
>> printed in different order for AT&T vs Intel syntax.
>
> Ah, OK, this is beyond my x86 expertise. Maybe a real x86 maintainer
> would like to take over ?
Perhaps I can, if this isn't overly urgent. As long as only one comment is
there per line, things are fine (I think/hope). So you could get your work
in, for me to then take it from there. (Please just drop me a note when
you committed your changes, so I'm aware without needing to actively track
things.)
Btw, seeing the Cc list, I can't help finally asking: Shouldn't the x86_64
maintainer list be pruned? Neither Jan nor Andreas have been doing much in
the last so many years, and typically they also aren't Cc-ed (anymore) on
patches afaict. Jan, Andreas - primarily it would be your call, of course.
Jan
On 05/05/2026 10.53, Jan Beulich wrote:
> [...]
> Btw, seeing the Cc list, I can't help finally asking: Shouldn't the x86_64
> maintainer list be pruned? Neither Jan nor Andreas have been doing much in
> the last so many years, and typically they also aren't Cc-ed (anymore) on
> patches afaict. Jan, Andreas - primarily it would be your call, of course.
Thanks for raising this, Jan!
If there's a document that lists me still as maintainer in binutils:
Please remove me as I don't expect to contribute in the future.
Andreas
Hi Andreas,
> If there's a document that lists me still as maintainer in binutils: Please remove me as I don't expect to contribute in the future.
OK, I have done that.
Thank you very much for all of your contributions to the GNU Binutils.
Cheers
Nick
On 23.04.2026 13:03, Nick Clifton wrote:
> Hi Guys,
>
> Attached is a v2 of the proposed patch. It adds:
>
> * The annotation is displayed as a comment at the end of the line.
> * The symbol name is displayed in dis_style_symbol rather than dis_style_address.
> * Shared object files (type ET_DYN) can also benefit from the new feature.
>
> There are a couple of potential issues with the patch:
>
> * It adds a new buffer to the instr_info structure, making it even bigger.
> * There are opportunities for sharing code with the oappend_*() functions
> which I did not take. (Mainly to keep the code in the patch simple).
> * There are opportunities for other places in the disassembler to add comments
> of their own, but I figured that these could be added later, if they
> were actually deemed worthwhile.
>
> Thoughts, comments ?
For me the use of the global "exe" in test_objdump_M_annotate is a
problem when running the testsuite with RUNTESTFLAGS=x86-64.exp:
ERROR: can't read "exe": no such variable
Sadly I have no clue what may need doing to address this.
Jan
@@ -1,5 +1,9 @@
-*- text -*-
+* The x86 and x86_64 disassemblers now accept a command line option of
+ "-M annotate-immediates" which displays the symbol associated with immediate
+ values, should there be one.
+
* Objdump and readelf now have a --debug-dir=<DIR> option which can be used to
tell them where to find separate debug info files.
@@ -2729,6 +2729,9 @@ When in AT&T mode and also for a limited set of instructions when in Intel
mode, instructs the disassembler to print a mnemonic suffix even when the
suffix could be inferred by the operands or, for certain instructions, the
execution mode's defaults.
+
+@item annotate-immediates
+Display the symbol associated with immediate values, should there be one.
@end table
For PowerPC, the @option{-M} argument @option{raw} selects
@@ -323,3 +323,34 @@ if {[catch "system \"bzip2 -dc $t > $obj\""] != 0} {
run_pr33230_test "$testname" $obj "" $run_readelf
run_pr33230_test "$testname" $obj "--input-target=default" $run_readelf
}
+
+# Test objdump -M annotate
+
+proc test_objdump_M_annotate { } {
+ global srcdir
+ global subdir
+ global OBJDUMP
+ global OBJDUMPFLAGS
+ global exe
+
+ set test "objdump -M annotate"
+
+ if { [target_compile $srcdir/$subdir/../testprog.c tmpdir/testprog${exe} executable debug] != "" } {
+ unsupported "$test (build)"
+ return
+ }
+
+ set got [binutils_run $OBJDUMP "$OBJDUMPFLAGS -d -M annotate tmpdir/testprog${exe}"]
+
+ # Look for something like this in the disassembly:
+ # 400419: bf 18 30 40 00 mov $0x403018 [common],%edi
+ set want " \[common\]"
+
+ if [regexp $want $got] then {
+ pass $test
+ } else {
+ fail $test
+ }
+}
+
+test_objdump_M_annotate
@@ -41,6 +41,8 @@
typedef struct instr_info instr_info;
+static bool annotate_immediates = false;
+
static bool dofloat (instr_info *, int);
static int putop (instr_info *, const char *, int);
static void oappend_with_style (instr_info *, const char *,
@@ -9043,6 +9045,7 @@ with the -M switch (multiple options should be separated by commas):\n"));
fprintf (stream, _(" suffix Always display instruction suffix in AT&T syntax\n"));
fprintf (stream, _(" amd64 Display instruction in AMD64 ISA\n"));
fprintf (stream, _(" intel64 Display instruction in Intel64 ISA\n"));
+ fprintf (stream, _(" annotate-immediates Annotate immediate operands that match symbols\n"));
}
/* Bad opcode. */
@@ -9825,6 +9828,9 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
else if (startswith (p, "suffix"))
priv.orig_sizeflag |= SUFFIX_ALWAYS;
+ else if (startswith (p, "annotate"))
+ annotate_immediates = true;
+
p = strchr (p, ',');
if (p != NULL)
p++;
@@ -11637,7 +11643,36 @@ oappend_immediate (instr_info *ins, bfd_vma imm)
{
if (!ins->intel_syntax)
oappend_char_with_style (ins, '$', dis_style_immediate);
+
print_operand_value (ins, imm, dis_style_immediate);
+
+ /* Determine if we can display some more information about this immediate. */
+ if (! annotate_immediates
+ /* Don't bother with xero, even if there is symbol associated with it. */
+ || imm == 0
+ /* For the next tests we need a BFD. If we do not have one then do not proceed. */
+ || ins->info->section == NULL
+ || ins->info->section->owner == NULL
+ /* Save time by avoiding immediates that cannot reference part of the address space. */
+ || imm < ins->info->section->owner->start_address
+ /* Also skip object files as their symbols have not been resolved. */
+ || (ins->info->section->owner->flags & EXEC_P) == 0)
+ return;
+
+ asymbol * sym = ins->info->symbol_at_address_func (imm, ins->info);
+ if (sym == NULL)
+ return;
+
+ char * annotation = NULL;
+
+ /* FIXME: Potential memory leak: strictly speaking asprintf()
+ can return 0 whilst also having allocated some memory. */
+ if (asprintf (& annotation, " [%s]", sym->name) > 0)
+ {
+ /* Display the symbol associated with address 'imm'. */
+ oappend_with_style (ins, annotation, dis_style_address);
+ free (annotation);
+ }
}
/* Put DISP in BUF as signed hex number. */