[2/6] Delegate opcodes to select disassembler in GDB

Message ID 867ezk5hea.fsf@gmail.com
State New, archived
Headers

Commit Message

Yao Qi July 7, 2017, 4:41 p.m. UTC
  "Maciej W. Rozycki" <macro@imgtec.com> writes:

>  I can see the assumption from the assertion itself, but what is its 
> purpose?
>

The purpose of this assert is to check that the disassemble_info passed
to opcodes is correctly got from the executable.

>  Normally you place an assertion in code to guard against a case that is 
> not handled (correctly) and if let through it would lead execution to go 
> astray, e.g. because it is a new complicated feature we have not yet got 
> to implementing or because it is a case we think that will never happen, 
> and code that follows has assumptions that will not be met.
>
>  So if you say that you can remove the assertion with no adverse effect on 
> processing, then I think it should not have been placed there in the first 
> place.
>
>  Anyway, if you look at code in `gdb_print_insn_mips', then you'll find 
> this comment:
>
>   /* FIXME: cagney/2003-06-26: Is this even necessary?  The
>      disassembler needs to be able to locally determine the ISA, and
>      not rely on GDB.  Otherwize the stand-alone 'objdump -d' will not
>      work.  */
>

Yes, that is the point of my patch series.

> and it is indeed the case that `objdump' handles this correctly without 
> the need to switch away from the BFD selected for the binary being 
> handled.  However in GDB we have this problem that we do not pass the 
> symbol table down to libopcodes for the disassembler, and in the case of 
> the MIPS target it is the symbol table that carries information about 
> which functions contain regular MIPS code, MIPS16 code and microMIPS code 
> respectively.
>
>  Lacking symbol information we resort to this hack of overriding the BFD 
> for the purpose of disassembly only, and this has the adverse effect of 
> getting instruction subsetting wrong: the `bfd_mach_mips16' and 
> `bfd_mach_mips_micromips' BFDs always choose the maximum instruction set 
> supported possible whereas the binary handled may only support a subset 
> (or worse yet an alternative set), as indicated by the original BFD 
> selected.  This in turn may confuse and mislead the person running a debug 
> session into thinking code disassembled is not the problem they are after.
>
>  Do you think it would be possible as a part of your disassembler rework 
> to make the symbol table available to libopcodes?  Then the hack currently 
> present in `gdb_print_insn_mips' could go.  I remember looking into it a 
> while ago and concluding it would be somewhat tricky because the symbol 
> table format expected by libopcodes is not the same that we use in
> GDB.

We can pass the symbol table to opcodes, so it can choose the right
disassembler.  However, current GDB uses both symbol and address to
determine some address is in a MIPS16 or microMIPS function (done by
mips_pc_is_mips16, mips_pc_is_micromips).  If we want to use symbol
table to tell the address is in microMIPS or MIPS16, and GDB can't find
a symbol for a given address, I suppose we need to create a fake
symbol, and pass it to disassembler.  Why don't we teach GDB
unconditionally create a fake symbol with the right bits set in order to
meet the check in mips-dis.c:is_compressed_mode_p?
arm-tdep.c:gdb_print_insn_arm has already do so to disassemble Thumb
instructions.

What do you think of the patch below?  IMO, change in
is_compressed_mode_p fixes a bug on usage of info->num_symbols vs.
info->symtab_size.  I can't test this patch.
  

Comments

Maciej W. Rozycki Aug. 1, 2017, 4:31 p.m. UTC | #1
On Fri, 7 Jul 2017, Yao Qi wrote:

> >  I can see the assumption from the assertion itself, but what is its 
> > purpose?
> >
> 
> The purpose of this assert is to check that the disassemble_info passed
> to opcodes is correctly got from the executable.

 What is there in opcodes that needs guarding with the assertion though, 
i.e. will anything break if the `disassemble_info' passed is not exactly 
one obtained from the executable?

 I'd at least imagine a need for the user to be able to explicitly 
override the disassembler's machine via `set architecture ...', e.g. to 
get a meaningful disassembly of code that has been built with a temporary 
instruction set override for encodings that are not included in the ISA 
recorded in ELF file structures.

> >  Do you think it would be possible as a part of your disassembler rework 
> > to make the symbol table available to libopcodes?  Then the hack currently 
> > present in `gdb_print_insn_mips' could go.  I remember looking into it a 
> > while ago and concluding it would be somewhat tricky because the symbol 
> > table format expected by libopcodes is not the same that we use in
> > GDB.
> 
> We can pass the symbol table to opcodes, so it can choose the right
> disassembler.  However, current GDB uses both symbol and address to
> determine some address is in a MIPS16 or microMIPS function (done by
> mips_pc_is_mips16, mips_pc_is_micromips).  If we want to use symbol
> table to tell the address is in microMIPS or MIPS16, and GDB can't find
> a symbol for a given address, I suppose we need to create a fake
> symbol, and pass it to disassembler.  Why don't we teach GDB
> unconditionally create a fake symbol with the right bits set in order to
> meet the check in mips-dis.c:is_compressed_mode_p?
> arm-tdep.c:gdb_print_insn_arm has already do so to disassemble Thumb
> instructions.

 We also need symbols for other purposes, specifically telling code and 
data apart in PLT entries, jump tables and constant pools, so a fake 
symbol won't address all cases.  Ideally we'd present output corresponding 
to that produced with `objdump -d', with a way for the user to request 
`objdump -D' equivalent if disassembly of what is supposed to be data is 
required.

> What do you think of the patch below?  IMO, change in
> is_compressed_mode_p fixes a bug on usage of info->num_symbols vs.
> info->symtab_size.  I can't test this patch.

 There is no bug, as it's `info->num_symbols' that tells the disassembler 
the number of symbols starting from `info->symtab_pos' in the symbol table 
affecting the location requested.  Any further symbols are irrelevant (the 
symbol table is sorted by the increasing address, so those will be beyond 
the location disassembled).  You also need to set `info->num_symbols' to 1 
if you're only passing a single symbol in the first place; it's a part of 
this internal API.

 The setting of BSF_SYNTHETIC might be dangerous as code elsewhere assumes 
it's only set for PLT symbols.  Right now it affects `is_mips16_plt_tail', 
causing it to crash, because with your change `asym.section' remains null 
and that function effectively accesses `asym.section->vma'.  I feel uneasy 
about papering it over by adding a check for the section being non-null 
there, I think we need to look for a better solution as this is getting 
too fragile IMO.

 So given that this is a grave regression I think short-term we just need 
to remove the offending assertion checks.

 NB for the purpose of validation I have now bisected the actual 
regression to commit 6394c606997f ("Don't use print_insn_XXX in GDB"), 
rather than commit 39503f82427e ("Delegate opcodes to select disassembler 
in GDB") where the assertion check has been introduced.  There is also 
commit 003ca0fd2286 ("Refactor disassembler selection") on the opcodes' 
side, causing assertion failures triggering once the checks in GDB have 
been removed.  I'll be following up with a fix right away.

 Also in the course of this investigation I have discovered a grave 
regression in compressed breakpoint handling.  I'll be posting a fix 
separately.

  Maciej
  

Patch

diff --git a/gdb/mips-tdep.c b/gdb/mips-tdep.c
index c1800e4..b848015 100644
--- a/gdb/mips-tdep.c
+++ b/gdb/mips-tdep.c
@@ -7007,14 +7007,31 @@  gdb_print_insn_mips (bfd_vma memaddr, struct disassemble_info *info)
      disassembler needs to be able to locally determine the ISA, and
      not rely on GDB.  Otherwize the stand-alone 'objdump -d' will not
      work.  */
-  if (mips_pc_is_mips16 (gdbarch, memaddr))
-    info->mach = bfd_mach_mips16;
-  else if (mips_pc_is_micromips (gdbarch, memaddr))
-    info->mach = bfd_mach_mips_micromips;
-
-  /* Round down the instruction address to the appropriate boundary.  */
-  memaddr &= (info->mach == bfd_mach_mips16
-	      || info->mach == bfd_mach_mips_micromips) ? ~1 : ~3;
+  if (mips_pc_is_mips16 (gdbarch, memaddr)
+      || mips_pc_is_micromips (gdbarch, memaddr))
+    {
+      static asymbol asym;
+      static asymbol *asymp = &asym;
+
+      /* Create a fake symbol to tell disassembler in opcodes that
+	 the code is MIPS16 or miscroMIPS.  */
+      asym.flags = BSF_SYNTHETIC;
+      info->symtab_pos = 0;
+      info->symtab_size = 1;
+      info->symtab = &asymp;
+      info->symbols = &asymp;
+
+      if (mips_pc_is_mips16 (gdbarch, memaddr))
+	asym.udata.i = ELF_ST_SET_MIPS16 (asym.udata.i);
+      else
+	asym.udata.i = ELF_ST_SET_MICROMIPS (asym.udata.i);
+
+      /* Round down the instruction address to the appropriate
+	 boundary.  */
+      memaddr &= ~1;
+    }
+  else
+    memaddr &= ~3;
 
   /* Set the disassembler options.  */
   if (!info->disassembler_options)
diff --git a/opcodes/mips-dis.c b/opcodes/mips-dis.c
index 4519500..d413841 100644
--- a/opcodes/mips-dis.c
+++ b/opcodes/mips-dis.c
@@ -2452,7 +2452,7 @@  is_compressed_mode_p (struct disassemble_info *info, bfd_boolean micromips_p)
   int i;
   int l;
 
-  for (i = info->symtab_pos, l = i + info->num_symbols; i < l; i++)
+  for (i = info->symtab_pos, l = info->symtab_size; i < l; i++)
     if (((info->symtab[i])->flags & BSF_SYNTHETIC) != 0
 	&& ((!micromips_p
 	     && ELF_ST_IS_MIPS16 ((*info->symbols)->udata.i))