abg-dwarf-reader.cc: Simplify WIP type handling.

Message ID 20200515174740.36885-1-gprocida@google.com
State Superseded, archived
Headers
Series abg-dwarf-reader.cc: Simplify WIP type handling. |

Commit Message

Giuliano Procida May 15, 2020, 5:47 p.m. UTC
  The DWARF reader tracks incompletely declared class/union and function
types using 6 maps. Supporting incompletely declared enum types in the
same way would require 3 additional maps and more duplicated code.

This commit folds together much of the duplicated WIP type handling.
This comes at the price of a tiny bit of extra dynamic casting to copy
member access specifiers in a couple of places.

There are no behavioural changes.

	* src/abg-dwarf-reader.cc (die_type_map_type): Add new
	convenience typedef. (read_context): Replace *wip_classes_map_
	and *wip_function_map_ members with *die_wip_types_map_ ones,
	replace die_wip_classes_map and die_wip_function_types_map
	getters with die_wip_types_map ones, replace
	is_wip_class_die_offset and is_wip_function_type_die_offset
	with is_wip_type_die_offset and simplify lookup.
	(read_context::initialize): Clear *die_wip_types_map_ instead
	of *wip_classes_map_ *wip_function_map_.
	(read_context::lookup_type_from_die_offset): Fold together
	class/union and function conditionals.
	(read_context::add_or_update_class_type): Replace uses of
	die_wip_classes_map with die_wip_types_map. Do an extra
	dynamic cast to be able to fetch member access specifiers in
	the case the WIP class is a member.
	(read_context::add_or_update_union_type): Mutatis mutandis.
	(read_context::build_function_type): Replace uses of
	die_wip_function_types_map with die_wip_types_map. Simplify
	removal from the map.
	(read_context::maybe_canonicalize_type): Replace use
	is_wip_function_type_die_offset with is_wip_type_die_offset.

Signed-off-by: Giuliano Procida <gprocida@google.com>
---
 src/abg-dwarf-reader.cc | 191 +++++++++++++---------------------------
 1 file changed, 59 insertions(+), 132 deletions(-)
  

Comments

Dodji Seketeli May 19, 2020, 5:22 p.m. UTC | #1
Hello Giuliano,

Giuliano Procida <gprocida@google.com> a écrit:

> The DWARF reader tracks incompletely declared class/union and function
> types using 6 maps.

Actually, those maps are not for incompletely declared types.  They are
for types which internal representation (IR) are being built but are not
yet completely built.  Those are called work-in-progress (WIP) types.

During type IR building, it happens frequently that we need to look-up
the IR representing a given DWARF construct.  Because the WIP types are
set in a different set of maps than the fully built IR types, we avoid
some nasty infinite looping that might happen.  So it's important to
keep the WIP types separate from the fully constructed IR types.

> Supporting incompletely declared enum types in the same way would
> require 3 additional maps and more duplicated code.

I don't think we need to store enums in any of those maps.  I think
enums, just like scalar types, can be constructed atomically.
Therefore, they logically don't need to go through the WIP phase.

> This commit folds together much of the duplicated WIP type handling.

Actually, there is a reason for spreading out classes and function types
in separate maps: performance, basically.  We have encountered
pathologically big binaries where the number of functions and
struct/union types was just *HUGE*.  Painful profiling showed that just
accessing stuff in those huge maps was taking a lot of time, probably
due to the huge space it was taking.  Splitting them in two sets of maps
(function types in one hand, and struct/unions in another) helped quite
a bit, so I did that and never looked back.  So before revisiting that,
I'd make really really sure that we don't regress on those pathological
case.  And sadly, I don't have them off the top of my head right now,
but we could dig that up if that is really important.

So here again, I sadly think we should not apply this patch just for the
sake of simplification because the complexity is there for a reason.

Sorry.

Cheers,
  
Giuliano Procida May 21, 2020, 12:53 p.m. UTC | #2
Hi there.

On Tue, 19 May 2020 at 18:22, Dodji Seketeli <dodji@seketeli.org> wrote:

> Hello Giuliano,
>
> Giuliano Procida <gprocida@google.com> a écrit:
>
> > The DWARF reader tracks incompletely declared class/union and function
> > types using 6 maps.
>
> Actually, those maps are not for incompletely declared types.  They are
> for types which internal representation (IR) are being built but are not
> yet completely built.  Those are called work-in-progress (WIP) types.
>

OK. So where do incompletely declared types and the logic to identify and
merge their declarations live?


> During type IR building, it happens frequently that we need to look-up
> the IR representing a given DWARF construct.  Because the WIP types are
> set in a different set of maps than the fully built IR types, we avoid
> some nasty infinite looping that might happen.  So it's important to
> keep the WIP types separate from the fully constructed IR types.
>
> > Supporting incompletely declared enum types in the same way would
> > require 3 additional maps and more duplicated code.
>
> I don't think we need to store enums in any of those maps.  I think
> enums, just like scalar types, can be constructed atomically.
> Therefore, they logically don't need to go through the WIP phase.
>

I see enum types as potentially existing in 3 states of "completeness":

1. enum foo; # C

This is a forward-declared C enum. Given this is a GCC/Clang extension,
we're in the compilers' hands here. Experimentation shows that GCC at least
won't allow direct use of enum foo until a full declaration is seen. So
this is just like struct/union foo. This may change in the future to be
more like 2.

2. enum foo; # C++11 - an opaque enum declaration.

The standard says the type is a complete type and its size is known. The
enumerators are not known.

3. enum with all the enumerators.

Underlying type and all the valid values are known to the compiler.

If libabigail wants to support 1., 2. or both, then it has to be able merge
declarations of those kinds with kind 3, somewhere.


> > This commit folds together much of the duplicated WIP type handling.
>
> Actually, there is a reason for spreading out classes and function types
> in separate maps: performance, basically.  We have encountered
> pathologically big binaries where the number of functions and
> struct/union types was just *HUGE*.  Painful profiling showed that just
> accessing stuff in those huge maps was taking a lot of time, probably
> due to the huge space it was taking.  Splitting them in two sets of maps
> (function types in one hand, and struct/unions in another) helped quite
> a bit, so I did that and never looked back.


Ah. The code didn't look like a pure performance optimisation as the map
value types were specialised as well. A map of maps or a small array of
maps would have been another way of achieving the same result, but with
uniformity of types.

Anyway, understood.


> So before revisiting that,
> I'd make really really sure that we don't regress on those pathological
> case.  And sadly, I don't have them off the top of my head right now,
> but we could dig that up if that is really important.
>

It would be useful to keep large test cases around. We even created an
internal repo for just such things (that are too big to drop into
sourceware git) but we haven't done much with it yet. If you do run across
the case you had in mind, let us know and, assuming licences are OK, we can
at least add it to our collection.


> So here again, I sadly think we should not apply this patch just for the
> sake of simplification because the complexity is there for a reason.
>

We are dealing with the kernel (which is pretty large) and a variety of
users (who are sensitive to performance issues). However, the ABI
monitoring we do clearly only exercises a subset of libabigail. So
unfortunately, I think this sort of thing will tend to recur, where there's
code tuned for cases we simply never see.


> Sorry.
>

Thank you for the review.

Regards,
Giuliano.


> Cheers,
>
> --
>                 Dodji
>
  
Dodji Seketeli May 21, 2020, 2:53 p.m. UTC | #3
Giuliano Procida <gprocida@google.com> a écrit:

> OK. So where do incompletely declared types and the logic to identify and
> merge their declarations live?

Those, called declaration-only classes in the libabigail lingo are
stored in the map returned by read_context::declaration_only_classes().

And I think you figured that out correctly in the patch.  I am looking
at it right now and I'll get back to you shortly, hopefully.

[...]

> I see enum types as potentially existing in 3 states of "completeness":
>
> 1. enum foo; # C
>
> This is a forward-declared C enum. Given this is a GCC/Clang extension,
> we're in the compilers' hands here. Experimentation shows that GCC at least
> won't allow direct use of enum foo until a full declaration is seen. So
> this is just like struct/union foo. This may change in the future to be
> more like 2.
>
> 2. enum foo; # C++11 - an opaque enum declaration.
>
> The standard says the type is a complete type and its size is known. The
> enumerators are not known.
>
> 3. enum with all the enumerators.
>
> Underlying type and all the valid values are known to the compiler.
>
> If libabigail wants to support 1., 2. or both, then it has to be able merge
> declarations of those kinds with kind 3, somewhere.

I think your patch started addressing that just fine by providing
read_context::resolve_declaration_only_enums().  It's just like what we
do already for declaration-only classes.  I think handling 2/ won't be
an issue.

[...]

> It would be useful to keep large test cases around.

Absolutely.  The only reason why it's not been done yet is for logistics
reasons.  I wanted to have a separate "project" (git repo maybe?) that
hosts just those big pathological tests.

In the Fedora space however, we have this
https://pagure.io/libabigail-selfcheck that we run on the entire distro
of userspace packages every now and then.  It's been done by Sinny
Kumari.  It'd be nice to have it automated, but for now, that's all we
have and it's better than nothing.

So yeah, something along those lines, that is public, automated somehow,
would be nice indeed.

>We even created an internal repo for just such things (that are too big
>to drop into sourceware git) but we haven't done much with it yet. If
>you do run across the case you had in mind, let us know and, assuming
>licences are OK, we can at least add it to our collection.

Noted. Thanks.

[...]

> We are dealing with the kernel (which is pretty large) and a variety of
> users (who are sensitive to performance issues). However, the ABI
> monitoring we do clearly only exercises a subset of libabigail. So
> unfortunately, I think this sort of thing will tend to recur, where there's
> code tuned for cases we simply never see.

I have no problem with that at all.  I guess we just have to communicate
as we are doing :-)

Cheers,
  

Patch

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 5024deb3..41c40603 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -129,6 +129,10 @@  typedef vector<Dwarf_Off> dwarf_offsets_type;
 /// die and which value is the corresponding artefact.
 typedef unordered_map<Dwarf_Off, type_or_decl_base_sptr> die_artefact_map_type;
 
+/// Convenience typedef for a map which key is the offset of a dwarf
+/// die and which value is the corresponding type_base.
+typedef unordered_map<Dwarf_Off, type_base_sptr> die_type_map_type;
+
 /// Convenience typedef for a map which key is the offset of a dwarf
 /// die, (given by dwarf_dieoffset()) and which value is the
 /// corresponding class_decl.
@@ -2270,12 +2274,9 @@  public:
   /// function types, inside a translation unit.
   mutable istring_fn_type_map_type per_tu_repr_to_fn_type_maps_;
 
-  die_class_or_union_map_type	die_wip_classes_map_;
-  die_class_or_union_map_type	alternate_die_wip_classes_map_;
-  die_class_or_union_map_type	type_unit_die_wip_classes_map_;
-  die_function_type_map_type	die_wip_function_types_map_;
-  die_function_type_map_type	alternate_die_wip_function_types_map_;
-  die_function_type_map_type	type_unit_die_wip_function_types_map_;
+  die_type_map_type		die_wip_types_map_;
+  die_type_map_type		alternate_die_wip_types_map_;
+  die_type_map_type		type_unit_die_wip_types_map_;
   die_function_decl_map_type	die_function_with_no_symbol_map_;
   vector<Dwarf_Off>		types_to_canonicalize_;
   vector<Dwarf_Off>		alt_types_to_canonicalize_;
@@ -2447,12 +2448,9 @@  public:
     type_die_artefact_maps_.clear();
     canonical_type_die_offsets_.clear();
     canonical_decl_die_offsets_.clear();
-    die_wip_classes_map_.clear();
-    alternate_die_wip_classes_map_.clear();
-    type_unit_die_wip_classes_map_.clear();
-    die_wip_function_types_map_.clear();
-    alternate_die_wip_function_types_map_.clear();
-    type_unit_die_wip_function_types_map_.clear();
+    die_wip_types_map_.clear();
+    alternate_die_wip_types_map_.clear();
+    type_unit_die_wip_types_map_.clear();
     die_function_with_no_symbol_map_.clear();
     types_to_canonicalize_.clear();
     alt_types_to_canonicalize_.clear();
@@ -4280,21 +4278,9 @@  public:
 
     if (!result)
       {
-	// Maybe we are looking for a class type being constructed?
-	const die_class_or_union_map_type& m = die_wip_classes_map(source);
-	die_class_or_union_map_type::const_iterator i = m.find(die_offset);
-
-	if (i != m.end())
-	  result = i->second;
-      }
-
-    if (!result)
-      {
-	// Maybe we are looking for a function type being constructed?
-	const die_function_type_map_type& m =
-	  die_wip_function_types_map(source);
-	die_function_type_map_type::const_iterator i = m.find(die_offset);
-
+	// Maybe we are looking for a type being constructed?
+	const die_type_map_type& m = die_wip_types_map(source);
+	die_type_map_type::const_iterator i = m.find(die_offset);
 	if (i != m.end())
 	  result = i->second;
       }
@@ -4310,70 +4296,34 @@  public:
   ///
   /// @return the map that associates a DIE to the class that is being
   /// built.
-  const die_class_or_union_map_type&
-  die_wip_classes_map(die_source source) const
-  {return const_cast<read_context*>(this)->die_wip_classes_map(source);}
+  const die_type_map_type&
+  die_wip_types_map(die_source source) const
+  {return const_cast<read_context*>(this)->die_wip_types_map(source);}
 
-  /// Getter of a map that associates a die that represents a
-  /// class/struct with the declaration of the class, while the class
-  /// is being constructed.
+  /// Getter of a map that associates a die that represents a type
+  /// with the declaration of the type, while the type is being
+  /// constructed.
   ///
   /// @param source where the DIE comes from.
   ///
-  /// @return the map that associates a DIE to the class that is being
+  /// @return the map that associates a DIE to the type that is being
   /// built.
-  die_class_or_union_map_type&
-  die_wip_classes_map(die_source source)
+  die_type_map_type&
+  die_wip_types_map(die_source source)
   {
     switch (source)
       {
       case PRIMARY_DEBUG_INFO_DIE_SOURCE:
 	break;
       case ALT_DEBUG_INFO_DIE_SOURCE:
-	return alternate_die_wip_classes_map_;
+	return alternate_die_wip_types_map_;
       case TYPE_UNIT_DIE_SOURCE:
-	return type_unit_die_wip_classes_map_;
+	return type_unit_die_wip_types_map_;
       case NO_DEBUG_INFO_DIE_SOURCE:
       case NUMBER_OF_DIE_SOURCES:
 	ABG_ASSERT_NOT_REACHED;
       }
-    return die_wip_classes_map_;
-  }
-
-  /// Getter for a map that associates a die (that represents a
-  /// function type) whith a function type, while the function type is
-  /// being constructed (WIP == work in progress).
-  ///
-  /// @param source where the DIE comes from.n
-  ///
-  /// @return the map of wip function types.
-  const die_function_type_map_type&
-  die_wip_function_types_map(die_source source) const
-  {return const_cast<read_context*>(this)->die_wip_function_types_map(source);}
-
-  /// Getter for a map that associates a die (that represents a
-  /// function type) whith a function type, while the function type is
-  /// being constructed (WIP == work in progress).
-  ///
-  /// @param source where DIEs of the map come from.
-  ///
-  /// @return the map of wip function types.
-  die_function_type_map_type&
-  die_wip_function_types_map(die_source source)
-  {
-    switch (source)
-      {
-      case PRIMARY_DEBUG_INFO_DIE_SOURCE:
-	break;
-      case ALT_DEBUG_INFO_DIE_SOURCE:
-	return alternate_die_wip_function_types_map_;
-      case TYPE_UNIT_DIE_SOURCE:
-	return type_unit_die_wip_function_types_map_;
-      case NO_DEBUG_INFO_DIE_SOURCE:
-      case NUMBER_OF_DIE_SOURCES:
-	ABG_ASSERT_NOT_REACHED;
-      }
-    return die_wip_function_types_map_;
+    return die_wip_types_map_;
   }
 
   /// Getter for a map that associates a die with a function decl
@@ -4387,40 +4337,20 @@  public:
   die_function_decl_with_no_symbol_map()
   {return die_function_with_no_symbol_map_;}
 
-  /// Return true iff a given offset is for the DIE of a class that is
-  /// being built, but that is not fully built yet.  WIP == "work in
-  /// progress".
-  ///
-  /// @param offset the DIE offset to consider.
-  ///
-  /// @param source where the DIE of the map come from.
-  ///
-  /// @return true iff @p offset is the offset of the DIE of a class
-  /// that is being currently built.
-  bool
-  is_wip_class_die_offset(Dwarf_Off offset, die_source source) const
-  {
-    die_class_or_union_map_type::const_iterator i =
-      die_wip_classes_map(source).find(offset);
-    return (i != die_wip_classes_map(source).end());
-  }
-
-  /// Return true iff a given offset is for the DIE of a function type
-  /// that is being built at the moment, but is not fully built yet.
-  /// WIP == work in progress.
+  /// Return true iff a given offset is for the DIE of a type that is
+  /// being built at the moment, but is not fully built yet.  WIP ==
+  /// work in progress.
   ///
   /// @param offset DIE offset to consider.
   ///
   /// @param source where the DIE comes from.
   ///
-  /// @return true iff @p offset is the offset of the DIE of a
-  /// function type that is being currently built.
+  /// @return true iff @p offset is the offset of the DIE of a type
+  /// that is being currently built.
   bool
-  is_wip_function_type_die_offset(Dwarf_Off offset, die_source source) const
+  is_wip_type_die_offset(Dwarf_Off offset, die_source source) const
   {
-    die_function_type_map_type::const_iterator i =
-      die_wip_function_types_map(source).find(offset);
-    return (i != die_wip_function_types_map(source).end());
+    return die_wip_types_map(source).count(offset);
   }
 
   /// Getter for the map of declaration-only classes that are to be
@@ -13779,9 +13709,9 @@  add_or_update_class_type(read_context&	 ctxt,
     return result;
 
   {
-    die_class_or_union_map_type::const_iterator i =
-      ctxt.die_wip_classes_map(source).find(dwarf_dieoffset(die));
-    if (i != ctxt.die_wip_classes_map(source).end())
+    die_type_map_type::const_iterator i =
+      ctxt.die_wip_types_map(source).find(dwarf_dieoffset(die));
+    if (i != ctxt.die_wip_types_map(source).end())
       {
 	class_decl_sptr class_type = is_class_type(i->second);
 	ABG_ASSERT(class_type);
@@ -13907,7 +13837,7 @@  add_or_update_class_type(read_context&	 ctxt,
     // here.
     return result;
 
-  ctxt.die_wip_classes_map(source)[dwarf_dieoffset(die)] = result;
+  ctxt.die_wip_types_map(source)[dwarf_dieoffset(die)] = result;
 
   scope_decl_sptr scop =
     dynamic_pointer_cast<scope_decl>(res);
@@ -14090,14 +14020,15 @@  add_or_update_class_type(read_context&	 ctxt,
   ctxt.scope_stack().pop();
 
   {
-    die_class_or_union_map_type::const_iterator i =
-      ctxt.die_wip_classes_map(source).find(dwarf_dieoffset(die));
-    if (i != ctxt.die_wip_classes_map(source).end())
+    die_type_map_type::const_iterator i =
+      ctxt.die_wip_types_map(source).find(dwarf_dieoffset(die));
+    if (i != ctxt.die_wip_types_map(source).end())
       {
-	if (is_member_type(i->second))
-	  set_member_access_specifier(res,
-				      get_member_access_specifier(i->second));
-	ctxt.die_wip_classes_map(source).erase(i);
+        const type_base_sptr& type = i->second;
+        const decl_base* decl = is_decl(type.get());
+	if (is_member_type(type) && decl)
+	  set_member_access_specifier(res, get_member_access_specifier(*decl));
+	ctxt.die_wip_types_map(source).erase(i);
       }
   }
 
@@ -14144,9 +14075,9 @@  add_or_update_union_type(read_context&	ctxt,
   die_source source;
   ABG_ASSERT(ctxt.get_die_source(die, source));
   {
-    die_class_or_union_map_type::const_iterator i =
-      ctxt.die_wip_classes_map(source).find(dwarf_dieoffset(die));
-    if (i != ctxt.die_wip_classes_map(source).end())
+    die_type_map_type::const_iterator i =
+      ctxt.die_wip_types_map(source).find(dwarf_dieoffset(die));
+    if (i != ctxt.die_wip_types_map(source).end())
       {
 	union_decl_sptr u = is_union_type(i->second);
 	ABG_ASSERT(u);
@@ -14244,7 +14175,7 @@  add_or_update_union_type(read_context&	ctxt,
   if (!has_child)
     return result;
 
-  ctxt.die_wip_classes_map(source)[dwarf_dieoffset(die)] = result;
+  ctxt.die_wip_types_map(source)[dwarf_dieoffset(die)] = result;
 
   scope_decl_sptr scop =
     dynamic_pointer_cast<scope_decl>(result);
@@ -14326,14 +14257,16 @@  add_or_update_union_type(read_context&	ctxt,
   ctxt.scope_stack().pop();
 
   {
-    die_class_or_union_map_type::const_iterator i =
-      ctxt.die_wip_classes_map(source).find(dwarf_dieoffset(die));
-    if (i != ctxt.die_wip_classes_map(source).end())
+    die_type_map_type::const_iterator i =
+      ctxt.die_wip_types_map(source).find(dwarf_dieoffset(die));
+    if (i != ctxt.die_wip_types_map(source).end())
       {
-	if (is_member_type(i->second))
+        const type_base_sptr& type = i->second;
+        const decl_base* decl = is_decl(type.get());
+	if (is_member_type(type) && decl)
 	  set_member_access_specifier(result,
-				      get_member_access_specifier(i->second));
-	ctxt.die_wip_classes_map(source).erase(i);
+				      get_member_access_specifier(*decl));
+	ctxt.die_wip_types_map(source).erase(i);
       }
   }
 
@@ -14796,7 +14729,7 @@  build_function_type(read_context&	ctxt,
 	       : new function_type(ctxt.env(), tu->get_address_size(),
 				   /*alignment=*/0));
   ctxt.associate_die_to_type(die, result, where_offset);
-  ctxt.die_wip_function_types_map(source)[dwarf_dieoffset(die)] = result;
+  ctxt.die_wip_types_map(source)[dwarf_dieoffset(die)] = result;
   ctxt.associate_die_repr_to_fn_type_per_tu(die, result);
 
   type_base_sptr return_type;
@@ -14872,13 +14805,7 @@  build_function_type(read_context&	ctxt,
 
   tu->bind_function_type_life_time(result);
 
-  {
-    die_function_type_map_type::const_iterator i =
-      ctxt.die_wip_function_types_map(source).
-      find(dwarf_dieoffset(die));
-    if (i != ctxt.die_wip_function_types_map(source).end())
-      ctxt.die_wip_function_types_map(source).erase(i);
-  }
+  ctxt.die_wip_types_map(source).erase(dwarf_dieoffset(die));
 
   maybe_canonicalize_type(result, ctxt);
   return result;
@@ -16259,7 +16186,7 @@  maybe_canonicalize_type(const Dwarf_Die *die, read_context& ctxt)
     // maybe_strip_qualification) after they are initially built.
     ctxt.schedule_type_for_late_canonicalization(die);
   else if ((is_function_type(t)
-	    && ctxt.is_wip_function_type_die_offset(die_offset, source))
+	    && ctxt.is_wip_type_die_offset(die_offset, source))
 	   || type_has_non_canonicalized_subtype(t))
     ctxt.schedule_type_for_late_canonicalization(die);
   else