[v2] arc: Select CPU model properly before disassembling

Message ID 20170613134951.29359-1-Anton.Kolesov@synopsys.com
State New, archived
Headers

Commit Message

Anton Kolesov June 13, 2017, 1:49 p.m. UTC
  Changes in V2:

* Use xstrprintf instead of string_printf
* Reinstate arc_delayed_print_insn as a print instruction function for ARC.
* Change arc_delayed_print_insn to use default_print_insn.
* Change commit description accordingly.

---

Enforce CPU model for disassembler via its options, if it was specified in XML
target description, otherwise use default method of determining CPU implemented
in disassembler - scanning ELF private header.  The latter requires
disassemble_info->section to be properly initialized.  To make sure that
info->section is set in all cases this patch partially reverts [1] for ARC: it
reinstates arc_delayed_print_insn as a "print_insn" function for ARC, but
now this function only sets disassemble_info->section and then calls
default_print_insn to do the rest of the job.

Support for CPU in disassembler options for ARC has been added in [2].

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=39503f82427e22ed8e04d986ccdc8562091ec62e
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=10045478d984f9924cb945423388ba25b7dd3ffe

gdb/ChangeLog:

yyyy-mm-dd  Anton Kolesov  <anton.kolesov@synopsys.com>

	* arc-tdep.c (arc_disassembler_options): New variable.
	(arc_gdbarch_init): Set and use it. Use arc_delayed_print_insn instead
	of default_print_insn.
	(arc_delayed_print_insn): Set info->section when needed,
	use default_print_insn to retrieve a disassembler.
---
 gdb/arc-tdep.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 6 deletions(-)
  

Comments

Simon Marchi June 13, 2017, 10:16 p.m. UTC | #1
On 2017-06-13 15:49, Anton Kolesov wrote:
> Changes in V2:
> 
> * Use xstrprintf instead of string_printf
> * Reinstate arc_delayed_print_insn as a print instruction function for 
> ARC.
> * Change arc_delayed_print_insn to use default_print_insn.
> * Change commit description accordingly.
> 
> ---
> 
> Enforce CPU model for disassembler via its options, if it was specified 
> in XML
> target description, otherwise use default method of determining CPU 
> implemented
> in disassembler - scanning ELF private header.  The latter requires
> disassemble_info->section to be properly initialized.  To make sure 
> that
> info->section is set in all cases this patch partially reverts [1] for 
> ARC: it
> reinstates arc_delayed_print_insn as a "print_insn" function for ARC, 
> but
> now this function only sets disassemble_info->section and then calls
> default_print_insn to do the rest of the job.
> 
> Support for CPU in disassembler options for ARC has been added in [2].

The patch looks good to me, there's just one formatting nit below, and 
some random thoughts (which doesn't affect the fact that the patch 
LGTM).

> [1]
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=39503f82427e22ed8e04d986ccdc8562091ec62e
> [2]
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=10045478d984f9924cb945423388ba25b7dd3ffe
> 
> gdb/ChangeLog:
> 
> yyyy-mm-dd  Anton Kolesov  <anton.kolesov@synopsys.com>
> 
> 	* arc-tdep.c (arc_disassembler_options): New variable.
> 	(arc_gdbarch_init): Set and use it. Use arc_delayed_print_insn instead
> 	of default_print_insn.
> 	(arc_delayed_print_insn): Set info->section when needed,
> 	use default_print_insn to retrieve a disassembler.
> ---
>  gdb/arc-tdep.c | 51 
> +++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 45 insertions(+), 6 deletions(-)
> 
> diff --git a/gdb/arc-tdep.c b/gdb/arc-tdep.c
> index d9ee5c6..9a5efc1 100644
> --- a/gdb/arc-tdep.c
> +++ b/gdb/arc-tdep.c
> @@ -145,6 +145,8 @@ static const char *const 
> core_arcompact_register_names[] = {
>    "lp_count", "reserved", "limm", "pcl",
>  };
> 
> +static char *arc_disassembler_options = NULL;
> +
>  /* Functions are sorted in the order as they are used in the
>     _initialize_arc_tdep (), which uses the same order as gdbarch.h.  
> Static
>     functions are defined before the first invocation.  */
> @@ -1405,12 +1407,31 @@ arc_skip_prologue (struct gdbarch *gdbarch,
> CORE_ADDR pc)
>  int
>  arc_delayed_print_insn (bfd_vma addr, struct disassemble_info *info)
>  {
> -  int (*print_insn) (bfd_vma, struct disassemble_info *);
> -  /* exec_bfd may be null, if GDB is run without a target BFD file.  
> Opcodes
> -     will handle NULL value gracefully.  */
> -  print_insn = arc_get_disassembler (exec_bfd);
> -  gdb_assert (print_insn != NULL);
> -  return print_insn (addr, info);
> +  /* Standard BFD "machine number" field allows libocodes disassembler 
> to
> +     distinguish ARC 600, 700 and v2 cores, however v2 encompasses 
> both ARC EM
> +     and HS, which have some difference between.  There are two ways 
> to specify
> +     what is the target core:
> +     1) via the disassemble_info->disassembler_options;
> +     2) otherwise libopcodes will use private (architecture-specific) 
> ELF
> +     header.
> +
> +     Using disassembler_options is preferable, because it comes 
> directly from
> +     GDBserver which scanned an actual ARC core identification info.  
> However,
> +     not all GDBservers report core architecture, so as a fallback GDB 
> still
> +     should support analysis of ELF header.  The libopcodes 
> disassembly code
> +     uses the section to find the BFD and the BFD to find the ELF 
> header,
> +     therefore this function should set disassemble_info->section 
> properly.
> +
> +     disassembler_options was already set by non-target specific code 
> with
> +     proper options obtained via gdbarch_disassembler_options ().  */
> +  if (info->disassembler_options == NULL)
> +    {
> +      struct obj_section * s = find_pc_section (addr);

Extra space after the *.

> +      if (s != NULL)
> +	info->section = s->the_bfd_section;
> +    }

When printing multiple instructions in a row, is this function called 
multiple times with the same disassemble_info?  If so, are we going to 
make a find_pc_section call for each instruction?  Is this needed, given 
that the options won't change for a given ELF (on ARC at least)?

> +
> +  return default_print_insn (addr, info);
>  }
> 
>  /* Baremetal breakpoint instructions.
> @@ -2013,6 +2034,8 @@ arc_gdbarch_init (struct gdbarch_info info,
> struct gdbarch_list *arches)
> 
>    set_gdbarch_frame_align (gdbarch, arc_frame_align);
> 
> +  set_gdbarch_print_insn (gdbarch, arc_delayed_print_insn);
> +
>    set_gdbarch_cannot_step_breakpoint (gdbarch, 1);
> 
>    /* "nonsteppable" watchpoint means that watchpoint triggers before
> @@ -2041,6 +2064,22 @@ arc_gdbarch_init (struct gdbarch_info info,
> struct gdbarch_list *arches)
>    if (tdep->jb_pc >= 0)
>      set_gdbarch_get_longjmp_target (gdbarch, arc_get_longjmp_target);
> 
> +  /* Disassembler options.  Enforce CPU if it was specified in XML 
> target
> +     description, otherwise use default method of determining CPU (ELF 
> private
> +     header).  */
> +  if (info.target_desc != NULL)
> +    {
> +      const struct bfd_arch_info *tdesc_arch
> +	= tdesc_architecture (info.target_desc);
> +      if (tdesc_arch != NULL)
> +	{
> +	  arc_disassembler_options = xstrprintf ("cpu=%s",
> +						 tdesc_arch->printable_name);
> +	  set_gdbarch_disassembler_options (gdbarch,
> +					    &arc_disassembler_options);
> +	}
> +    }
> +

I was a bit concerned with the fact that multiple gdbarch objects (that 
represent different architectures) can end up sharing the same pointer 
to arc_disassembler_options, so necessarily some of them will have the 
wrong value at a certain point.  But that doesn't look like an immediate 
problem, since you don't seem to recycle previously created gdbarch 
objects, like some other architectures do (does that mean that the 
gdbarch list keeps growing indefinitely if you connect multiple times, 
load new exec files?).  The gdbarch in use should always be the last 
that was created, so arc_disassembler_options should contain the good 
value for that gdbarch.  I was thinking that a situation like this could 
be problematic, if you were re-using gdbarch objects:

1. You connect to a gdbserver that reports arch A.  A gdbarch is created 
and arc_disassembler_options is set for arch A.
2. You disconnect, and connect to a gdbserver that reports arch B.  A 
gdbarch is created and arc_disassembler_options is set for arch B.
3. You disconnect, and connect again to the first gdbserver.  This time, 
you would return the existing gdbarch for arch A, so the current gdbarch 
(arch A) would still use the disassembler options of arch B.

But like I said, that doesn't seem to be the case right now.

That makes me wonder why we pass a char ** and not simply an allocated 
char *, of which the gdbarch would take ownership, which would avoid 
this problem entirely.

Simon
  
Anton Kolesov June 14, 2017, 2:02 p.m. UTC | #2
Hi Simon,

> > + if (info->disassembler_options == NULL)
> > +    {
> > +      struct obj_section * s = find_pc_section (addr);
> 
> Extra space after the *.

This was copied from mep-tdep.c. Should I make a separate patch that fixes it
there too?

> 
> > +      if (s != NULL)
> > +	info->section = s->the_bfd_section;
> > +    }
> 
> When printing multiple instructions in a row, is this function called multiple
> times with the same disassemble_info?  If so, are we going to make a
> find_pc_section call for each instruction?  Is this needed, given that the
> options won't change for a given ELF (on ARC at least)?

I think it might not work if there is a multi-arch target and disassembler
invocation tries to dump instructions across multiple sections which come from
different ELF files (not sure if shared libraries would count as a separate ELF
files). But usually disassembler works per-function and practically I don't
think that there would be a case of function that crosses section boundary.
I've put this optimization to my v3 patch.

> 
> I was a bit concerned with the fact that multiple gdbarch objects (that
> represent different architectures) can end up sharing the same pointer to
> arc_disassembler_options, so necessarily some of them will have the wrong
> value at a certain point.  But that doesn't look like an immediate problem,
> since you don't seem to recycle previously created gdbarch objects, like
> some other architectures do (does that mean that the gdbarch list keeps
> growing indefinitely if you connect multiple times, load new exec files?).  The
> gdbarch in use should always be the last that was created, so
> arc_disassembler_options should contain the good value for that gdbarch.  I
> was thinking that a situation like this could be problematic, if you were re-
> using gdbarch objects:
> 
> 1. You connect to a gdbserver that reports arch A.  A gdbarch is created and
> arc_disassembler_options is set for arch A.
> 2. You disconnect, and connect to a gdbserver that reports arch B.  A gdbarch
> is created and arc_disassembler_options is set for arch B.
> 3. You disconnect, and connect again to the first gdbserver.  This time, you
> would return the existing gdbarch for arch A, so the current gdbarch (arch A)
> would still use the disassembler options of arch B.
> 
> But like I said, that doesn't seem to be the case right now.
> 
> That makes me wonder why we pass a char ** and not simply an allocated
> char *, of which the gdbarch would take ownership, which would avoid this
> problem entirely.

This is the same thing as in {arm,rs6000,s390-linux}-tdep.c - they all use
global static variable, which would be shared across multiple gdbarch
instances. So when options are changed that would change that for any gdbarch
of that architecture. RS6000 and S390 don't even assign this variable any value
- it completely managed by the disasm.c. Although in ARC case there is a memory
leak, because arc_disassembler_options is only assigned by arc-tdep.c and
never freed - that's because I didn't properly understood what other arches
do - they don't free options as well, but they also don't allocate them in
each gdbarch_init, because there usecase is somewhat different. I'm not sure
how big of a problem this leak is, because outside of a GDB test suite I don't
think there are a lot of practical cases where same GDB instance is reused to
connect/disconnect, so leak wouldn't be noticeable. I think, I can make
things better for ARC, by moving arc_disassembler_options into the
gdbarch_tdep, then each gdbarch for ARC will have its own options, but that
would mean behavior different from other arches which support
disassembler_options - there options are shared across gdbarches of
architecture. Should I do that in V3?

TBH, that would be a moot for ARC right now, because today our disassembler
has a global state and it doesn't support simultaneous disassembly of
different ARC architectures anyway, but it might make sense to try to make
sure that GDB part handles this correctly, if in the future disassembler
would be upgraded to avoid global state.

The reuse of gdbarch instances seems like a good idea, will try to add that
in the future for ARC.

Anton

> 
> Simon
  
Simon Marchi June 14, 2017, 4:27 p.m. UTC | #3
On 2017-06-14 16:02, Anton Kolesov wrote:
> This is the same thing as in {arm,rs6000,s390-linux}-tdep.c - they all 
> use
> global static variable, which would be shared across multiple gdbarch
> instances. So when options are changed that would change that for any 
> gdbarch
> of that architecture. RS6000 and S390 don't even assign this variable 
> any value
> - it completely managed by the disasm.c. Although in ARC case there is 
> a memory
> leak, because arc_disassembler_options is only assigned by arc-tdep.c 
> and
> never freed - that's because I didn't properly understood what other 
> arches
> do - they don't free options as well, but they also don't allocate them 
> in
> each gdbarch_init, because there usecase is somewhat different.

Indeed, set_disassembler_options takes care of the 
allocation/deallocation when the user changes the options.

> I'm not sure
> how big of a problem this leak is, because outside of a GDB test suite 
> I don't
> think there are a lot of practical cases where same GDB instance is 
> reused to
> connect/disconnect, so leak wouldn't be noticeable. I think, I can make
> things better for ARC, by moving arc_disassembler_options into the
> gdbarch_tdep, then each gdbarch for ARC will have its own options, but 
> that
> would mean behavior different from other arches which support
> disassembler_options - there options are shared across gdbarches of
> architecture. Should I do that in V3?

Ah, I did not understand previously that sharing the disassembler 
options between all gdbarches of an architecture is the intended 
behavior.  Having different behavior across architectures for this 
wouldn't be a good idea, I think.  Also, as of today, the disassembler 
options are very much user driven.  The architecture family can set some 
default at startup (ARM, for example, sets "reg-names-std"), but after 
that it's all up to the user.

In your case, it's the opposite along those two axis: you want GDB to 
set disassembler options in the back of the user, and you want 
disassembler options (at least some of them) to be gdbarch-specific.  To 
support that correctly, I think we should first define what behavior we 
want from GDB.  For example:

  - if the user doesn't specify any disassembler-options, it's probably 
right to automatically set it for them.
  - if the user provides a disassembler option other than cpu= and then 
connects to a gdbserver, we probably want to add the cpu= to what they 
have specified.
  - if they connect first and then set another, unrelated option, do we 
keep the cpu= option or does it get lost?
  - if they set a particular cpu= option and then connect to a gdbserver 
that reports something else, which one do we choose?
  - if they connect to a gdbserver that reports an architecture and we 
add a cpu= for it, then they connect to a gdbserver that doesn't report 
an architecture, do we keep the previous cpu= or not?
  - what happens with the various options when we switch gdbarch, do we 
keep the user-provided ones but flush the GDB-inferred ones?

Once we know what behavior we want from GDB, I think it will be easier 
to decide what to put where.

In the mean time, if you think the current patch makes the situation 
better for ARC users in the typical usage scenarios, I'm fine with going 
with it.  Note that there will be a change for your users: connecting to 
a target will overwrite whatever option they might have set before 
connecting.  Also, to avoid the leak, you could xfree the previous 
arc_disassembler_options before assigning it in arc_gdbarch_init.

> TBH, that would be a moot for ARC right now, because today our 
> disassembler
> has a global state and it doesn't support simultaneous disassembly of
> different ARC architectures anyway, but it might make sense to try to 
> make
> sure that GDB part handles this correctly, if in the future 
> disassembler
> would be upgraded to avoid global state.
> 
> The reuse of gdbarch instances seems like a good idea, will try to add 
> that
> in the future for ARC.
> 
> Anton

Thanks,

Simon
  
Pedro Alves June 14, 2017, 4:37 p.m. UTC | #4
On 06/14/2017 05:27 PM, Simon Marchi wrote:
> On 2017-06-14 16:02, Anton Kolesov wrote:
>> This is the same thing as in {arm,rs6000,s390-linux}-tdep.c - they all
>> use
>> global static variable, which would be shared across multiple gdbarch
>> instances. So when options are changed that would change that for any
>> gdbarch
>> of that architecture. RS6000 and S390 don't even assign this variable
>> any value
>> - it completely managed by the disasm.c. Although in ARC case there is
>> a memory
>> leak, because arc_disassembler_options is only assigned by arc-tdep.c and
>> never freed - that's because I didn't properly understood what other
>> arches
>> do - they don't free options as well, but they also don't allocate
>> them in
>> each gdbarch_init, because there usecase is somewhat different.
> 
> Indeed, set_disassembler_options takes care of the
> allocation/deallocation when the user changes the options.
> 
>> I'm not sure
>> how big of a problem this leak is, because outside of a GDB test suite
>> I don't
>> think there are a lot of practical cases where same GDB instance is
>> reused to
>> connect/disconnect, so leak wouldn't be noticeable. I think, I can make
>> things better for ARC, by moving arc_disassembler_options into the
>> gdbarch_tdep, then each gdbarch for ARC will have its own options, but
>> that
>> would mean behavior different from other arches which support
>> disassembler_options - there options are shared across gdbarches of
>> architecture. Should I do that in V3?
> 
> Ah, I did not understand previously that sharing the disassembler
> options between all gdbarches of an architecture is the intended
> behavior.  Having different behavior across architectures for this
> wouldn't be a good idea, I think.  Also, as of today, the disassembler
> options are very much user driven.  The architecture family can set some
> default at startup (ARM, for example, sets "reg-names-std"), but after
> that it's all up to the user.

Right, in a given debug session you can have different
gdbarch instances at play even for the same cpu architecture.
E.g., the gdbarch determined for the binary vs the gdbarch created
for the target description.

> 
> In your case, it's the opposite along those two axis: you want GDB to
> set disassembler options in the back of the user, and you want
> disassembler options (at least some of them) to be gdbarch-specific.  To
> support that correctly, I think we should first define what behavior we
> want from GDB.  For example:
> 
>  - if the user doesn't specify any disassembler-options, it's probably
> right to automatically set it for them.
>  - if the user provides a disassembler option other than cpu= and then
> connects to a gdbserver, we probably want to add the cpu= to what they
> have specified.
>  - if they connect first and then set another, unrelated option, do we
> keep the cpu= option or does it get lost?
>  - if they set a particular cpu= option and then connect to a gdbserver
> that reports something else, which one do we choose?
>  - if they connect to a gdbserver that reports an architecture and we
> add a cpu= for it, then they connect to a gdbserver that doesn't report
> an architecture, do we keep the previous cpu= or not?
>  - what happens with the various options when we switch gdbarch, do we
> keep the user-provided ones but flush the GDB-inferred ones?

I'd think that the "set/show disassembler-options" setting
wouldn't ever be changed by connecting to a target and
that whatever the user explicitly set should be respected.

I.e., gdb / opcodes would choose which cpu to use for disassembly
based on this logical sequence:

 - if user specified cpu, use that,
 - otherwise, if tdesc specified cpu, use that,
 - otherwise, infer cpu from bfd/elf section,
 - otherwise, use default cpu

Thanks,
Pedro Alves
  
Anton Kolesov June 15, 2017, 1:19 p.m. UTC | #5
> 
> I'd think that the "set/show disassembler-options" setting wouldn't ever be
> changed by connecting to a target and that whatever the user explicitly set
> should be respected.
> 
> I.e., gdb / opcodes would choose which cpu to use for disassembly based on
> this logical sequence:
> 
>  - if user specified cpu, use that,
>  - otherwise, if tdesc specified cpu, use that,
>  - otherwise, infer cpu from bfd/elf section,
>  - otherwise, use default cpu

However for ARC we don't support "set disassembler-options", because this
option requires valid disassembler_options to be provided, and arc-tdep.c doesn't
do this - so currently "set disassembler-options ..." reports as unsupported. I
agree that in general this option that comes from XML description probably
should be hidden from the user and not shown via "show disassembler-options".
One possible way is that CPU from target description will be stored in the ARC
gdbarch_tdep, and then will be appended to disassemble_info options in the
arc_delayed_print_insn if cpu= hasn't been specified by the user.  As far as I
can see this situation is pretty specific to ARC, because we have two
bfd_arch_info instances sharing the same machine number (to be compatible with
our proprietary toolchain), so <architecture> tag in XML description doesn't
work the way it should - it selects the first of those two architectures in the
internal BFD list, regardless of which one was actually in the description.

Anton

> 
> Thanks,
> Pedro Alves
  

Patch

diff --git a/gdb/arc-tdep.c b/gdb/arc-tdep.c
index d9ee5c6..9a5efc1 100644
--- a/gdb/arc-tdep.c
+++ b/gdb/arc-tdep.c
@@ -145,6 +145,8 @@  static const char *const core_arcompact_register_names[] = {
   "lp_count", "reserved", "limm", "pcl",
 };
 
+static char *arc_disassembler_options = NULL;
+
 /* Functions are sorted in the order as they are used in the
    _initialize_arc_tdep (), which uses the same order as gdbarch.h.  Static
    functions are defined before the first invocation.  */
@@ -1405,12 +1407,31 @@  arc_skip_prologue (struct gdbarch *gdbarch, CORE_ADDR pc)
 int
 arc_delayed_print_insn (bfd_vma addr, struct disassemble_info *info)
 {
-  int (*print_insn) (bfd_vma, struct disassemble_info *);
-  /* exec_bfd may be null, if GDB is run without a target BFD file.  Opcodes
-     will handle NULL value gracefully.  */
-  print_insn = arc_get_disassembler (exec_bfd);
-  gdb_assert (print_insn != NULL);
-  return print_insn (addr, info);
+  /* Standard BFD "machine number" field allows libocodes disassembler to
+     distinguish ARC 600, 700 and v2 cores, however v2 encompasses both ARC EM
+     and HS, which have some difference between.  There are two ways to specify
+     what is the target core:
+     1) via the disassemble_info->disassembler_options;
+     2) otherwise libopcodes will use private (architecture-specific) ELF
+     header.
+
+     Using disassembler_options is preferable, because it comes directly from
+     GDBserver which scanned an actual ARC core identification info.  However,
+     not all GDBservers report core architecture, so as a fallback GDB still
+     should support analysis of ELF header.  The libopcodes disassembly code
+     uses the section to find the BFD and the BFD to find the ELF header,
+     therefore this function should set disassemble_info->section properly.
+
+     disassembler_options was already set by non-target specific code with
+     proper options obtained via gdbarch_disassembler_options ().  */
+  if (info->disassembler_options == NULL)
+    {
+      struct obj_section * s = find_pc_section (addr);
+      if (s != NULL)
+	info->section = s->the_bfd_section;
+    }
+
+  return default_print_insn (addr, info);
 }
 
 /* Baremetal breakpoint instructions.
@@ -2013,6 +2034,8 @@  arc_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)
 
   set_gdbarch_frame_align (gdbarch, arc_frame_align);
 
+  set_gdbarch_print_insn (gdbarch, arc_delayed_print_insn);
+
   set_gdbarch_cannot_step_breakpoint (gdbarch, 1);
 
   /* "nonsteppable" watchpoint means that watchpoint triggers before
@@ -2041,6 +2064,22 @@  arc_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)
   if (tdep->jb_pc >= 0)
     set_gdbarch_get_longjmp_target (gdbarch, arc_get_longjmp_target);
 
+  /* Disassembler options.  Enforce CPU if it was specified in XML target
+     description, otherwise use default method of determining CPU (ELF private
+     header).  */
+  if (info.target_desc != NULL)
+    {
+      const struct bfd_arch_info *tdesc_arch
+	= tdesc_architecture (info.target_desc);
+      if (tdesc_arch != NULL)
+	{
+	  arc_disassembler_options = xstrprintf ("cpu=%s",
+						 tdesc_arch->printable_name);
+	  set_gdbarch_disassembler_options (gdbarch,
+					    &arc_disassembler_options);
+	}
+    }
+
   tdesc_use_registers (gdbarch, tdesc, tdesc_data);
 
   return gdbarch;