[v3,0/3] RISC-V: Add overridable "priv-spec" and "arch" disassembler options

Message ID cover.1668910970.git.research_trasio@irq.a4lg.com
Series RISC-V: Add overridable "priv-spec" and "arch" disassembler options


Tsukasa OI Nov. 20, 2022, 2:23 a.m. UTC
PATCH 3/3 contains GDB changes.

This patchset:

-   Adds support for the "arch=ARCH" disassembler option on RISC-V.
-   Makes the existing "priv-spec=SPEC" disassembler option
    overridable on ELF files.
-   Adds GDB testcases.

Tracker on GitHub:
Branch on GitHub:

(proposed "isa" option) PATCH v1:
(proposed "isa" option) PATCH v2:

The idea was reviewed by Palmer Dabbelt, and the issues pointed out are
fixed in this version:

Also, a prerequisite to this patchset has been merged:

** this patchset does not apply to master directly. **

This patchset requires the following patchset to be applied first:
but not my optimization patchsets.

If my optimization patchsets:
are merged first, a slight modification to this patchset
(to the Binutils part) is required.  The GDB side is not affected.

That "slightly modified" branch (optimization first) is managed on GitHub:

Let me recap the background of this patchset.


Normally, libopcodes (used by objdump and GDB) on RISC-V sets:

1. Default ISA and extensions from either:
    -   ELF mapping symbols (with ISA string)
    -   ELF .riscv.attributes section
    -   "rv32gc"/"rv64gc" depending on ELF architecture or configuration
2. Default privileged specification version from:
    -   ELF .riscv.attributes section
    -   Latest privileged specification
        (if no ELF attributes are available)

But there are cases where this automatic detection is not sufficient.
This patchset is going to be particularly useful on:

-   gdb: Baremetal debugging on GDB (e.g. via OpenOCD)
-   objdump: Reverse engineering binary files (without ELF headers)

In both cases, self-describing architectural information is missing (unlike
regular ELF files we generate).  In such cases, it is very helpful to have
an option to specify the ISA and specifications without the help of ELF
attributes.

Another situation where we need this option is when a program is compiled
for a general/old specification but actually uses newer/specialized ones
(as shown in an example below).

This patchset enables setting various ISA extensions / privileged
specifications for debugging / reverse engineering software
on various RISC-V processors:

-   objdump:
    -   -M arch=rv[32|64]...
    -   -M priv-spec=SPEC
-   GDB:
    -   set disassembler-options arch=rv[32|64]...
    -   set disassembler-options priv-spec=SPEC
    Note that all disassembler options must be specified as
    a comma-separated list (if set disassembler-options is issued twice,
    only the options from the last command apply).

[Example: OpenSBI]

This is not hypothetical; the problem occurs even on ELF files.
OpenSBI is the prime example.

OpenSBI can be compiled with the current RISC-V GNU Toolchain (with
privileged specification version 1.11) but uses hypervisor instructions and
privileged specification 1.12 CSRs.  When compiled for RV64, it also has
the ELF arch attribute "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0" (RV64GC with ISA
version 2.2), which contains no reference to the H extension.

An Excerpt from the Source Code (OpenSBI 1.1, lib/sbi/sbi_hfence.S):

        .align 3
        .global __sbi_hfence_gvma_vmid_gpa
    __sbi_hfence_gvma_vmid_gpa:
        /*
         * rs1 = a0 (GPA >> 2)
         * rs2 = a1 (VMID)
         * HFENCE.GVMA a0, a1
         * 0110001 01011 01010 000 00000 1110011
         */
        .word 0x62b50073
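
The bit pattern in the comment above can be checked against the emitted word. The following Python sketch is not part of the patchset or of OpenSBI; the helper name `decode_rtype` is made up for illustration. It splits a 32-bit word into the standard RISC-V R-type fields (funct7, rs2, rs1, funct3, rd, opcode) and confirms that 0x62b50073 carries rs1 = x10 (a0) and rs2 = x11 (a1).

```python
# Illustrative helper (hypothetical name): split a 32-bit RISC-V
# R-type instruction word into its fields.
def decode_rtype(word):
    return {
        "funct7": (word >> 25) & 0x7F,  # bits 31..25
        "rs2": (word >> 20) & 0x1F,     # bits 24..20
        "rs1": (word >> 15) & 0x1F,     # bits 19..15
        "funct3": (word >> 12) & 0x07,  # bits 14..12
        "rd": (word >> 7) & 0x1F,       # bits 11..7
        "opcode": word & 0x7F,          # bits 6..0
    }


fields = decode_rtype(0x62B50073)
# Fields in the same order as the comment in the excerpt:
# funct7  rs2  rs1  funct3  rd  opcode
bits = "{funct7:07b} {rs2:05b} {rs1:05b} {funct3:03b} {rd:05b} {opcode:07b}".format(**fields)
print(bits)
```

The printed string matches the "0110001 01011 01010 000 00000 1110011" line of the excerpt, where register numbers 10 and 11 are a0 and a1 in the standard ABI naming.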


-   Command:
    riscv64-unknown-elf-objdump -d fw_jump.elf
-   Target:
    fw_jump.elf from OpenSBI 1.1 (RV64)
-   Compiler/Toolchain:
    RISC-V GNU Toolchain 2022.06.10

    00000000800097c8 <__sbi_hfence_gvma_vmid_gpa>:
        800097c8:  62b50073                .4byte  0x62b50073
        800097cc:  8082                    ret
        800097ce:  0001                    nop


-   Command:
    riscv64-unknown-elf-objdump -M arch=rv64gch -d fw_jump.elf

    00000000800097c8 <__sbi_hfence_gvma_vmid_gpa>:
        800097c8:  62b50073                hfence.gvma     a0,a1
        800097cc:  8082                    ret
        800097ce:  0001                    nop


Even after this patchset, only 1 of 8 "hfence.*" instructions in OpenSBI
is correctly disassembled by objdump, but the reason is separate: OpenSBI
uses ".word", not ".insn" (the disassembler handles some instructions
emitted with ".word" as data).  In this case, "objdump -D" can
be a workaround.

[PATCH 2/3: RISC-V: Add -M arch disassembler option]

-M arch=ARCH is very simple.  But there are multiple ways to determine the
proper XLEN for a given situation.

In this patch (PATCH 2), I propose the following precedence rules
(on objdump/gdb):

1.  BFD architecture set by either (in the following order):
    a.  Machine architecture
        (-m riscv:rv[32|64] / set arch riscv:rv[32|64])
    b.  ELF class in the ELF header
        (only when disassembling RISC-V ELF files)
    This is effective only if an XLEN-specific RISC-V BFD architecture
    is set.  For instance, if an XLEN-neutral machine is specified by
    "-m riscv", the BFD architecture is ignored for XLEN selection.
2.  ISA string set by either (in the following order):
    a.  ISA string option
        (-M arch=rv[32|64].. / set disassembler-options arch=..)
    b.  ELF mapping symbols with ISA string
        (only when disassembling RISC-V ELF files)
    c.  ELF attributes
        (only when disassembling RISC-V ELF files)
3.  ELF class in the "ELF header"

This enables XLEN switching by the ISA option on architecture riscv but not
on riscv:rv32 or riscv:rv64 (an architecture with fixed XLEN is preferred).
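
The precedence rules above can be sketched in a few lines of Python. This is only a model of the rules as written in this cover letter, not the code in opcodes/riscv-dis.c; the function and parameter names (`select_xlen`, `bfd_arch`, `isa_option`, `mapping_isa`, `attr_isa`, `elf_class`) are hypothetical. It also assumes that an ISA string without an "rv32"/"rv64" prefix is skipped in favor of the next source, which the rules do not spell out.

```python
def xlen_of_isa(isa):
    """Return 32 or 64 from an ISA string prefix, or 0 if it has neither."""
    if isa is not None:
        if isa.startswith("rv32"):
            return 32
        if isa.startswith("rv64"):
            return 64
    return 0


def select_xlen(bfd_arch=None, isa_option=None, mapping_isa=None,
                attr_isa=None, elf_class=None):
    """Pick XLEN following rules 1-3; returns 0 if nothing decides it."""
    # Rule 1: an XLEN-specific BFD architecture wins outright.
    # An XLEN-neutral "riscv" (or no architecture) falls through.
    fixed = {"riscv:rv32": 32, "riscv:rv64": 64}
    if bfd_arch in fixed:
        return fixed[bfd_arch]
    # Rule 2: ISA strings, in order: option, mapping symbol, ELF attributes.
    # Assumption: a string with no usable XLEN prefix is skipped.
    for isa in (isa_option, mapping_isa, attr_isa):
        xlen = xlen_of_isa(isa)
        if xlen:
            return xlen
    # Rule 3: ELF class from the ELF header.
    return elf_class if elf_class in (32, 64) else 0


# "-m riscv:rv32 -M arch=rv64gc": the fixed-XLEN architecture is preferred.
fixed_wins = select_xlen(bfd_arch="riscv:rv32", isa_option="rv64gc", elf_class=64)
# "-m riscv -M arch=rv64gch" on an ELF32 input: the ISA option switches XLEN.
option_wins = select_xlen(bfd_arch="riscv", isa_option="rv64gch", elf_class=32)
print(fixed_wins, option_wins)
```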

I preferred not to generate a warning if the XLENs conflict, because of
the "ELF header" (it may be a dummy while processing a binary file, or not
useful if the input is a non-RISC-V ELF file) and for possible
flexibility when used together with GDB.

Still, adding it might be an option.

Cross-toolchain XLEN precedence rules with some prerequisites would
look like this (of course, the rules above conform to them):

0.  When disassembling, the disassembler MAY infer the target machine
    (with or without specific XLEN) from the input or other conditions
    automatically unless the target machine is explicitly specified.
0.  If the ISA string is not specified, the disassembler SHALL use either
    the architecture from mapping symbols with ISA string (if they exist),
    the architecture tag from ELF attributes or the default ISA and
    extensions matching the target machine (as far as possible).
    This default ISA is normally either "rv32gc" or "rv64gc" but the
    disassembler MAY have its own defaults.

1.  If the target machine is not set or XLEN-neutral (can be either
    RV32 or RV64, "-m riscv" on objdump for example), the XLEN portion of
    the ISA string takes precedence.
2.  If the target machine is XLEN-specific (either RV32 or RV64),
    the target XLEN takes precedence.  If the ISA string is specified but
    differs in XLEN, the disassembler MAY:
    -   raise an error, or
    -   ignore the XLEN part of the ISA string and try to conform to the
        given ISA (except XLEN) as far as possible.
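
The two behaviors allowed by rule 2 can be illustrated with a small Python sketch. It is not taken from any disassembler; `effective_isa`, `XlenConflictError` and the `strict` flag are hypothetical names, and a `target_xlen` of `None` stands for an unset or XLEN-neutral target machine.

```python
class XlenConflictError(ValueError):
    """Raised in strict mode when the ISA string XLEN differs from the target's."""


def effective_isa(target_xlen, isa, strict=False):
    """Return the ISA string a disassembler would end up using.

    target_xlen: 32, 64, or None for an unset/XLEN-neutral target machine.
    strict:      True  -> raise an error on XLEN conflict
                 False -> ignore the XLEN part of the ISA string
    """
    if not (isa.startswith("rv32") or isa.startswith("rv64")):
        raise ValueError("ISA string must start with rv32 or rv64: %r" % isa)
    isa_xlen = int(isa[2:4])
    # Rule 1: no fixed target XLEN, so the ISA string decides.
    if target_xlen is None or target_xlen == isa_xlen:
        return isa
    # Rule 2: the target XLEN takes precedence over the ISA string.
    if strict:
        raise XlenConflictError(
            "target is RV%d but ISA string is %r" % (target_xlen, isa))
    return "rv%d%s" % (target_xlen, isa[4:])


neutral = effective_isa(None, "rv64gch")
lenient = effective_isa(32, "rv64gch")
print(neutral, lenient)
```

In the lenient case the extensions ("gch" here) are kept and only the XLEN prefix is replaced; whether every extension is actually valid for the other XLEN is left out of this sketch.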

[Changes between RFC PATCH v2 and PATCH v3]

-   Rebased
-   Renamed the option from "isa" to "arch" based on feedback
    from Palmer Dabbelt.
-   XLEN precedence rules are updated per actual behavior (auto-setting
    of the BFD architecture from given RISC-V ELF files was not considered
    in RFC PATCH v1 and v2).
-   Added some tests to check architecture override.
-   Added support for mapping symbols with ISA string.

[Changes between RFC PATCH v1 and RFC PATCH v2]

The only functional change in the RFC PATCH v2 is that we reset the xlen
variable to 0 before parsing the ISA string.  It has no effect on objdump
but on GDB, it stops the last XLEN value from being preserved when an
invalid ISA string (not starting with "rv32" or "rv64") is set afterwards.
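
The effect of this reset can be modeled with a tiny Python sketch. It only mimics the behavior described above and is not the C code from the patch; the class `DisState`, its method `set_isa` and the `reset_first` flag are hypothetical, while the attribute name follows the "xlen_by_isa" variable mentioned in this cover letter.

```python
class DisState:
    """Toy model of disassembler state that persists across option changes (as in GDB)."""

    def __init__(self):
        self.xlen_by_isa = 0  # 0 means "XLEN not set by the ISA string"

    def set_isa(self, isa, reset_first=True):
        # reset_first=True models v2; reset_first=False models v1.
        if reset_first:
            self.xlen_by_isa = 0
        if isa.startswith("rv32"):
            self.xlen_by_isa = 32
        elif isa.startswith("rv64"):
            self.xlen_by_isa = 64


v1 = DisState()
v1.set_isa("rv64gc", reset_first=False)
v1.set_isa("invalid", reset_first=False)  # stale 64 survives

v2 = DisState()
v2.set_isa("rv64gc")
v2.set_isa("invalid")                     # stale value is cleared

print(v1.xlen_by_isa, v2.xlen_by_isa)
```

objdump parses its options once per run, so the stale value never becomes visible there; GDB keeps the state between "set disassembler-options" commands, which is why the difference only shows up on the GDB side.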

The rest of the changes are editorial.  My language got broken while
writing v1 but I think most of the problems are fixed in v2.  Also, most
"-M isa" (objdump-only) references are replaced with just "isa" to indicate
both objdump and GDB options.

Renamed "xlen_set_by_option" to "xlen_by_isa" for clarity (meaning XLEN
set by "ISA string" option).

Tsukasa OI (3):
  RISC-V: Make "priv-spec" overridable
  RISC-V: Add "arch" disassembler option
  gdb/testsuite: RISC-V disassembler option tests

 gas/testsuite/gas/riscv/dis-arch-override-1.d |  13 ++
 gas/testsuite/gas/riscv/dis-arch-override-2.d |  13 ++
 gas/testsuite/gas/riscv/dis-arch-override-3.d |  13 ++
 gas/testsuite/gas/riscv/dis-arch-override.s   |  45 ++++++
 .../gas/riscv/dis-priv-spec-override-1.d      |  10 ++
 .../gas/riscv/dis-priv-spec-override-2.d      |  10 ++
 .../gas/riscv/dis-priv-spec-override.s        |   2 +
 .../gdb.arch/riscv-disassembler-options.exp   | 129 ++++++++++++++++++
 .../gdb.arch/riscv-disassembler-options.s     |  29 ++++
 opcodes/riscv-dis.c                           | 100 ++++++++++----
 10 files changed, 337 insertions(+), 27 deletions(-)
 create mode 100644 gas/testsuite/gas/riscv/dis-arch-override-1.d
 create mode 100644 gas/testsuite/gas/riscv/dis-arch-override-2.d
 create mode 100644 gas/testsuite/gas/riscv/dis-arch-override-3.d
 create mode 100644 gas/testsuite/gas/riscv/dis-arch-override.s
 create mode 100644 gas/testsuite/gas/riscv/dis-priv-spec-override-1.d
 create mode 100644 gas/testsuite/gas/riscv/dis-priv-spec-override-2.d
 create mode 100644 gas/testsuite/gas/riscv/dis-priv-spec-override.s
 create mode 100644 gdb/testsuite/gdb.arch/riscv-disassembler-options.exp
 create mode 100644 gdb/testsuite/gdb.arch/riscv-disassembler-options.s

base-commit: f3fcf98b44621fb8768cf11121d3fd77089bca5b