Support multiple CUs in a single DWO file

Message ID CAENS6EvRvjtQ2RA02nCre41sj5+nWP9d4hqhGJaku=skZjWKXg@mail.gmail.com
State New, archived
Headers

Commit Message

David Blaikie April 27, 2017, 6:31 p.m. UTC
  With work on ThinLTO (a form of partial cross-file optimization) in
LLVM I've been looking at support for Fission/Split DWARF in this
scenario.

Basically it seems like the best representation for LTO (Thin (small
clusters of objects) and full (whole libraries/programs in a single
optimization/object file step)) is to produce a .dwo file that matches
the .o file granularity, but that means multiple CUs in that .o file
(because it represents the merging of multiple source files)

So I'd like to contribute patches to GDB to support this - the first
one I've attached (though it still lacks a test case - open to
suggestions, but I can probably figure it out on my own - just wanted
to get some feedback on the general direction & start some
conversation sooner rather than later) addresses the first error
messages about "multiple CUs in a DWO" and "cannot find DWO CU" by
keeping a map of CU signatures, the same as there's a map of TU
signatures.

Does this seem like a reasonable thing to support?
Is this the right way to go about doing it?

[The big thing I'm trying to do after this that seems more difficult &
I'd love to discuss - is supporting DW_FORM_ref_addr in these
situations. This comes up when there's cross-CU inlining (a large part
of the point of LTO-like situations) and an inlined_subroutine in one
CU needs to refer to an abstract_origin subprogram in another CU.
Currently the resolution for finding these CUs, etc, isn't quite right
for Fission]
  

Comments

Pedro Alves April 28, 2017, 2:37 p.m. UTC | #1
Hi David,

On 04/27/2017 07:31 PM, David Blaikie wrote:
> With work on ThinLTO (a form of partial cross-file optimization) in
> LLVM I've been looking at support for Fission/Split DWARF in this
> scenario.
> 
> Basically it seems like the best representation for LTO (Thin (small
> clusters of objects) and full (whole libraries/programs in a single
> optimization/object file step)) is to produce a .dwo file that matches
> the .o file granularity, but that means multiple CUs in that .o file
> (because it represents the merging of multiple source files)
> 
> So I'd like to contribute patches to GDB to support this - the first
> one I've attached (though it still lacks a test case - open to
> suggestions, but I can probably figure it out on my own - just wanted

I'm not sure exactly what sort of suggestion you're after, but ...
... looks like we have some dwo tests under gdb.dwarf2/fission*.exp
that I'd suggest taking a look at.  We also have a couple board
files that allow running the whole testsuite in fission/dwo
mode -- see gdb/testsuite/boards/fission*.exp.  Instructions are in
the files themselves.  Sounds like it'd be useful to add a third one?
(and now that I look at the git logs, I see you've touched those
files in the past, so you probably more about this than me.  :-) )

For DWARF-specific tests, the ideal is to write a test using the
dwarf assembler infrastructure (Dwarf::assemble), but when too difficult
we punt and allow leaving the test be arch specific.  However here it
sounds like all you'd need is a smoke test that runs the C/C++ compiler
with the right flags and makes sure you can debug the result?
Something like gdb.dwarf2/fission-mix.exp, I guess.

In case you haven't seen these yet, these may prove useful too:

 http://sourceware.org/gdb/wiki/ContributionChecklist
 https://sourceware.org/gdb/wiki/GDBTestcaseCookbook

> to get some feedback on the general direction & start some
> conversation sooner rather than later) addresses the first error
> messages about "multiple CUs in a DWO" and "cannot find DWO CU" by
> keeping a map of CU signatures, the same as there's a map of TU
> signatures.
> 
> Does this seem like a reasonable thing to support?
> Is this the right way to go about doing it?

I haven't really been involved in GDB's DWO support, so take it with
a grain of salt, but it sounds reasonable to me.

I'm not sure Doug Evans will have time, but given his earlier
involvement with DWO support in the past I think he'd be the best
reviewer on both direction and patches.

> [The big thing I'm trying to do after this that seems more difficult &
> I'd love to discuss - is supporting DW_FORM_ref_addr in these
> situations. This comes up when there's cross-CU inlining (a large part
> of the point of LTO-like situations) and an inlined_subroutine in one
> CU needs to refer to an abstract_origin subprogram in another CU.
> Currently the resolution for finding these CUs, etc, isn't quite right
> for Fission]

Thanks,
Pedro Alves
  
David Blaikie May 12, 2017, 10:42 p.m. UTC | #2
On Fri, Apr 28, 2017 at 7:37 AM, Pedro Alves <palves@redhat.com> wrote:
> Hi David,
>
> On 04/27/2017 07:31 PM, David Blaikie wrote:
>> With work on ThinLTO (a form of partial cross-file optimization) in
>> LLVM I've been looking at support for Fission/Split DWARF in this
>> scenario.
>>
>> Basically it seems like the best representation for LTO (Thin (small
>> clusters of objects) and full (whole libraries/programs in a single
>> optimization/object file step)) is to produce a .dwo file that matches
>> the .o file granularity, but that means multiple CUs in that .o file
>> (because it represents the merging of multiple source files)
>>
>> So I'd like to contribute patches to GDB to support this - the first
>> one I've attached (though it still lacks a test case - open to
>> suggestions, but I can probably figure it out on my own - just wanted
>
> I'm not sure exactly what sort of suggestion you're after, but ...
> ... looks like we have some dwo tests under gdb.dwarf2/fission*.exp
> that I'd suggest taking a look at.  We also have a couple board
> files that allow running the whole testsuite in fission/dwo
> mode -- see gdb/testsuite/boards/fission*.exp.  Instructions are in
> the files themselves.  Sounds like it'd be useful to add a third one?
> (and now that I look at the git logs, I see you've touched those
> files in the past, so you probably more about this than me.  :-) )

Heh - sure enough. But it's been quite a while - I really do
appreciate the reminders & did eventually find something to model from
(fission-base.exp) in there.

> For DWARF-specific tests, the ideal is to write a test using the
> dwarf assembler infrastructure (Dwarf::assemble),

I'll go see if I can figure that out, but will also start a new thread
with a test case, ChangeLog entry, proper subject line, update on the
issues/resolutions surrounding this, etc.

> but when too difficult
> we punt and allow leaving the test be arch specific.  However here it
> sounds like all you'd need is a smoke test that runs the C/C++ compiler
> with the right flags and makes sure you can debug the result?
> Something like gdb.dwarf2/fission-mix.exp, I guess.

Not sure how practical or useful such a test would be - I'm going to
assume GCC doesn't produce any debug info like this at the moment & I
don't know of anyone who's regularly running the GDB test suite with
Clang. (I mean, there is (I added) support for special casing clang in
test cases: "if {[test_compiler_info {clang-*-*}]} {" in a couple of
places)

> In case you haven't seen these yet, these may prove useful too:
>
>  http://sourceware.org/gdb/wiki/ContributionChecklist
>  https://sourceware.org/gdb/wiki/GDBTestcaseCookbook

Thanks!

>> to get some feedback on the general direction & start some
>> conversation sooner rather than later) addresses the first error
>> messages about "multiple CUs in a DWO" and "cannot find DWO CU" by
>> keeping a map of CU signatures, the same as there's a map of TU
>> signatures.
>>
>> Does this seem like a reasonable thing to support?
>> Is this the right way to go about doing it?
>
> I haven't really been involved in GDB's DWO support, so take it with
> a grain of salt, but it sounds reasonable to me.
>
> I'm not sure Doug Evans will have time, but given his earlier
> involvement with DWO support in the past I think he'd be the best
> reviewer on both direction and patches.

*nod* I'll CC you on the new thread & see what Doug's got to say about
it before trying to search for other reviewers, etc.

>> [The big thing I'm trying to do after this that seems more difficult &
>> I'd love to discuss - is supporting DW_FORM_ref_addr in these
>> situations. This comes up when there's cross-CU inlining (a large part
>> of the point of LTO-like situations) and an inlined_subroutine in one
>> CU needs to refer to an abstract_origin subprogram in another CU.
>> Currently the resolution for finding these CUs, etc, isn't quite right
>> for Fission]

Turns out this ^ is a longer project, since the DWP format can't
doesn't capture enough info to support ref_addr. So for now I've
modified Clang not to do this (but still, multiple CUs per DWO is
nice/handy to have, even without cross-CU references) while I fix up
DWP and consumers - so patches to GDB eventually, but not as
immediate.

Thanks again!
- Dave
  
Terekhov, Mikhail via Gdb-patches May 17, 2017, 7:15 p.m. UTC | #3
On Thu, Apr 27, 2017 at 11:31 AM, David Blaikie <dblaikie@gmail.com> wrote:
>
> With work on ThinLTO (a form of partial cross-file optimization) in
> LLVM I've been looking at support for Fission/Split DWARF in this
> scenario.
>
> Basically it seems like the best representation for LTO (Thin (small
> clusters of objects) and full (whole libraries/programs in a single
> optimization/object file step)) is to produce a .dwo file that matches
> the .o file granularity, but that means multiple CUs in that .o file
> (because it represents the merging of multiple source files)
>
> So I'd like to contribute patches to GDB to support this - the first
> one I've attached (though it still lacks a test case - open to
> suggestions, but I can probably figure it out on my own - just wanted
> to get some feedback on the general direction & start some
> conversation sooner rather than later) addresses the first error
> messages about "multiple CUs in a DWO" and "cannot find DWO CU" by
> keeping a map of CU signatures, the same as there's a map of TU
> signatures.
>
> Does this seem like a reasonable thing to support?
> Is this the right way to go about doing it?



Heya. Getting caught up.
Whether it's reasonable or not I guess is being asked in the context
of not being sure what DWARF will eventually be for this?
Sounds reasonable (within some epsilon of some definition of
"reasonable") to me at first glance.


>
> [The big thing I'm trying to do after this that seems more difficult &
> I'd love to discuss - is supporting DW_FORM_ref_addr in these
> situations. This comes up when there's cross-CU inlining (a large part
> of the point of LTO-like situations) and an inlined_subroutine in one
> CU needs to refer to an abstract_origin subprogram in another CU.
> Currently the resolution for finding these CUs, etc, isn't quite right
> for Fission]



Happy to discuss. Lunch?
  

Patch

diff --git gdb/dwarf2read.c gdb/dwarf2read.c
index b58d0fc16e..29eb5a14b2 100644
--- gdb/dwarf2read.c
+++ gdb/dwarf2read.c
@@ -852,12 +852,9 @@  struct dwo_file
      sections (for lack of a better name).  */
   struct dwo_sections sections;
 
-  /* The CU in the file.
-     We only support one because having more than one requires hacking the
-     dwo_name of each to match, which is highly unlikely to happen.
-     Doing this means all TUs can share comp_dir: We also assume that
-     DW_AT_comp_dir across all TUs in a DWO file will be identical.  */
-  struct dwo_unit *cu;
+  /* The CUs in the file.
+     Each element is a struct dwo_unit. */
+  htab_t cus;
 
   /* Table of TUs in the file.
      Each element is a struct dwo_unit.  */
@@ -9702,72 +9699,75 @@  create_dwo_cu_reader (const struct die_reader_specs *reader,
 			hex_string (dwo_unit->signature));
 }
 
-/* Create the dwo_unit for the lone CU in DWO_FILE.
-   Note: This function processes DWO files only, not DWP files.  */
-
-static struct dwo_unit *
-create_dwo_cu (struct dwo_file *dwo_file)
+static void create_cus_hash_table (struct dwo_file &dwo_file,
+                                   dwarf2_section_info &section,
+                                   htab_t &cus_htab)
 {
   struct objfile *objfile = dwarf2_per_objfile->objfile;
-  struct dwarf2_section_info *section = &dwo_file->sections.info;
+  const struct dwarf2_section_info *abbrev_section = &dwo_file.sections.abbrev;
   const gdb_byte *info_ptr, *end_ptr;
-  struct create_dwo_cu_data create_dwo_cu_data;
-  struct dwo_unit *dwo_unit;
 
-  dwarf2_read_section (objfile, section);
-  info_ptr = section->buffer;
+  dwarf2_read_section (objfile, &section);
+  info_ptr = section.buffer;
 
   if (info_ptr == NULL)
-    return NULL;
+    return;
 
   if (dwarf_read_debug)
     {
       fprintf_unfiltered (gdb_stdlog, "Reading %s for %s:\n",
-			  get_section_name (section),
-			  get_section_file_name (section));
+			  get_section_name (&section),
+			  get_section_file_name (&section));
     }
 
-  create_dwo_cu_data.dwo_file = dwo_file;
-  dwo_unit = NULL;
-
-  end_ptr = info_ptr + section->size;
+  end_ptr = info_ptr  + section.size;
   while (info_ptr < end_ptr)
     {
       struct dwarf2_per_cu_data per_cu;
+      struct create_dwo_cu_data create_dwo_cu_data;
+      struct dwo_unit *dwo_unit;
+      void **slot;
+      sect_offset sect_off = (sect_offset) (info_ptr - section.buffer);
 
       memset (&create_dwo_cu_data.dwo_unit, 0,
 	      sizeof (create_dwo_cu_data.dwo_unit));
       memset (&per_cu, 0, sizeof (per_cu));
       per_cu.objfile = objfile;
       per_cu.is_debug_types = 0;
-      per_cu.sect_off = sect_offset (info_ptr - section->buffer);
-      per_cu.section = section;
+      per_cu.sect_off = sect_offset (info_ptr - section.buffer);
+      per_cu.section = &section;
+      create_dwo_cu_data.dwo_file = &dwo_file;
 
-      init_cutu_and_read_dies_no_follow (&per_cu, dwo_file,
+      init_cutu_and_read_dies_no_follow (&per_cu, &dwo_file,
 					 create_dwo_cu_reader,
 					 &create_dwo_cu_data);
+      info_ptr += per_cu.length;
 
-      if (create_dwo_cu_data.dwo_unit.dwo_file != NULL)
-	{
-	  /* If we've already found one, complain.  We only support one
-	     because having more than one requires hacking the dwo_name of
-	     each to match, which is highly unlikely to happen.  */
-	  if (dwo_unit != NULL)
-	    {
-	      complaint (&symfile_complaints,
-			 _("Multiple CUs in DWO file %s [in module %s]"),
-			 dwo_file->dwo_name, objfile_name (objfile));
-	      break;
-	    }
+      if (create_dwo_cu_data.dwo_unit.dwo_file == NULL)
+        continue;
 
-	  dwo_unit = OBSTACK_ZALLOC (&objfile->objfile_obstack, struct dwo_unit);
-	  *dwo_unit = create_dwo_cu_data.dwo_unit;
-	}
+      if (cus_htab == NULL)
+        {
+          cus_htab = allocate_dwo_unit_table (objfile);
+        }
 
-      info_ptr += per_cu.length;
-    }
+      dwo_unit = OBSTACK_ZALLOC (&objfile->objfile_obstack, struct dwo_unit);
+      *dwo_unit = create_dwo_cu_data.dwo_unit;
+      slot = htab_find_slot (cus_htab, dwo_unit, INSERT);
+      gdb_assert (slot != NULL);
+      if (*slot != NULL)
+        {
+          const struct dwo_unit *dup_cu = (const struct dwo_unit *) *slot;
+          sect_offset dup_sect_off = dup_cu->sect_off;
 
-  return dwo_unit;
+	  complaint (&symfile_complaints,
+		     _("debug cu entry at offset 0x%x is duplicate to"
+		       " the entry at offset 0x%x, signature %s"),
+		     to_underlying (sect_off), to_underlying (dup_sect_off),
+		     hex_string (dwo_unit->signature));
+        }
+      *slot = (void *) dwo_unit;
+    }
 }
 
 /* DWP file .debug_{cu,tu}_index section format:
@@ -10772,7 +10772,7 @@  open_and_init_dwo_file (struct dwarf2_per_cu_data *per_cu,
   bfd_map_over_sections (dwo_file->dbfd, dwarf2_locate_dwo_sections,
 			 &dwo_file->sections);
 
-  dwo_file->cu = create_dwo_cu (dwo_file);
+  create_cus_hash_table (*dwo_file, dwo_file->sections.info, dwo_file->cus);
 
   create_debug_types_hash_table (dwo_file, dwo_file->sections.types,
 				 dwo_file->tus);
@@ -11139,10 +11139,13 @@  lookup_dwo_cutu (struct dwarf2_per_cu_data *this_unit,
 	      dwo_cutu
 		= (struct dwo_unit *) htab_find (dwo_file->tus, &find_dwo_cutu);
 	    }
-	  else if (!is_debug_types && dwo_file->cu)
+	  else if (!is_debug_types && dwo_file->cus)
 	    {
-	      if (signature == dwo_file->cu->signature)
-		dwo_cutu = dwo_file->cu;
+              struct dwo_unit find_dwo_cutu;
+
+              memset (&find_dwo_cutu, 0, sizeof (find_dwo_cutu));
+              find_dwo_cutu.signature = signature;
+              dwo_cutu = (struct dwo_unit *) htab_find (dwo_file->cus, &find_dwo_cutu);
 	    }
 
 	  if (dwo_cutu != NULL)