[1/2,/,RFC] Bug 26646 - unexpected declaration-only types (part 2)

Message ID 87wnhfjpwt.fsf@redhat.com
State New
Headers
Series [1/2,/,RFC] Bug 26646 - unexpected declaration-only types (part 2) |

Commit Message

Dodji Seketeli Feb. 28, 2022, 9:34 a.m. UTC
  Hello Giuliano,

This patch addresses the comment
https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c15 of the
reported problem.

The core of the problem seems to be that when compare_dies compares
two DW_TAG_{structure,union}_type, it "gives up" when the comparison
stack contains more than 5 such DIEs being compared.  Giving up means
that it considers the two DIEs to be equivalent if they have the same
size and kind, basically.  This is to avoid infinite loops or too long
loops that we've seen with artificial DWARF input generated by
fuzzers.

This patch fixes that by better using the "aggregates_being_compared"
data structure that is meant to prevent looping infinitely over
aggregate DIEs that might be recursively defined.  That data structure
now contains the DWARF offset pairs of the DIEs being compared, rather
than their "string representation".  Lookups in that data structure
should now be faster and more precise.

The artificial limit of the comparison stack not having more than 5
DIEs is now lifted.

	* src/abg-dwarf-reader.cc (struct dwarf_offset_pair_hash)
	(dwarf_offset_pair_set_type): Define new type.
	(die_offset, has_offset_pair, insert_offset_pair)
	(erase_offset_pair): Define new static helper functions.
	(compare_dies): Use a set of DWARF offsets for the
	'aggregates_being_compared' data structure, rather than a set of
	string representation of the DIEs.  Always look at the size of the
	types being compared first so that we can get out quickly if they
	differ.  For DIEs of DW_TAG_{union,struct}_type kind, don't limit
	the depth of the stack of DIEs being compared to 5; so we don't
	consider two types as being equal just because the depth of the
	stack being compared is 5 and the two types have the same size and
	are equal.  Hopefully things don't take too long.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-dwarf-reader.cc | 138 +++++++++++++++++++++++++++++-----------
 1 file changed, 101 insertions(+), 37 deletions(-)
  

Comments

Dodji Seketeli March 1, 2022, 5:45 p.m. UTC | #1
Hey Giuliano,

Thank you for reviewing the patch!

Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:

[...]

> 	* src/abg-dwarf-reader.cc (struct dwarf_offset_pair_hash)
> 	(dwarf_offset_pair_set_type): Define new type.
> 	(die_offset, has_offset_pair, insert_offset_pair)
> 	(erase_offset_pair): Define new static helper functions.
> 	(compare_dies): Use a set of DWARF offsets for the
> 	'aggregates_being_compared' data structure, rather than a set of
> 	string representation of the DIEs.  Always look at the size of the
> 	types being compared first so that we can get out quickly if they
> 	differ.  For DIEs of DW_TAG_{union,struct}_type kind, don't limit
> 	the depth of the stack of DIEs being compared to 5; so we don't
> 	consider two types as being equal just because the depth of the
> 	stack being compared is 5 and the two types have the same size and
> 	are equal.  Hopefully things don't take too long.
>

OK, you actually posted your comment as a bugzilla comment which is
fine.  As I'd like the review to be found by people reading the patch
post as well, I am replying it here.

[...]

gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
écrit:

> Two (overlapping) things:
>
> 1. I wouldn't bother with inserting / testing / erasing the reverse
> pair o(p2, p1)
>
> It might cost more than it ever saves. But it would take
> instrumentation to see if it ever makes a difference.
>
> If you make this change then s/unordered pair/ordered pair/ in the doc
> comments.

OK, I am going with your inclination here.  I updated the patch
accordingly.

> 2. There is a typo.
>
> One of the reverse pairs is written as o(p2, 1).

Good catch. Thanks.  Updated.

> Otherwise, it looks good to me.

Thanks.  I am posting an updated version of the patch below.

From 3b67a367ce6b8a141f0cb29a130c002ca7b984eb Mon Sep 17 00:00:00 2001
From: Dodji Seketeli <dodji@redhat.com>
Date: Thu, 10 Feb 2022 17:43:38 +0100
Subject: [PATCH 1/2] Bug 26646 - unexpected declaration-only types (part 2)

This patch addresses the comment
https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c15 of the
reported problem.

The core of the problem seems to be that when compare_dies compares
two DW_TAG_{structure,union}_type, it "gives up" when the comparison
stack contains more than 5 such DIEs being compared.  Giving up means
that it considers the two DIEs to be equivalent if they have the same
size and kind, basically.  This is to avoid infinite loops or too long
loops that we've seen with artificial DWARF input generated by
fuzzers.

This patch fixes that by better using the "aggregates_being_compared"
data structure that is meant to prevent looping infinitely over
aggregate DIEs that might be recursively defined.  That data structure
now contains the DWARF offset pairs of the DIEs being compared, rather
than their "string representation".  Lookups in that data structure
should now be faster and more precise.

The artificial limit of the comparison stack not having more than 5
DIEs is now lifted.

	* src/abg-dwarf-reader.cc (struct dwarf_offset_pair_hash)
	(dwarf_offset_pair_set_type): Define new type.
	(die_offset, has_offset_pair, insert_offset_pair)
	(erase_offset_pair): Define new static helper functions.
	(compare_dies): Use a set of DWARF offsets for the
	'aggregates_being_compared' data structure, rather than a set of
	string representation of the DIEs.  Always look at the size of the
	types being compared first so that we can get out quickly if they
	differ.  For DIEs of DW_TAG_{union,struct}_type kind, don't limit
	the depth of the stack of DIEs being compared to 5; so we don't
	consider two types as being equal just because the depth of the
	stack being compared is 5 and the two types have the same size and
	are equal.  Hopefully things don't take too long.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-dwarf-reader.cc | 135 +++++++++++++++++++++++++++++-----------
 1 file changed, 98 insertions(+), 37 deletions(-)

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 53b5845d..18f49037 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -161,7 +161,18 @@ typedef unordered_map<GElf_Addr, elf_symbol_sptr> addr_elf_symbol_sptr_map_type;
 /// Convenience typedef for a set of ELF addresses.
 typedef unordered_set<GElf_Addr> address_set_type;
 
-typedef unordered_set<interned_string, hash_interned_string> istring_set_type;
+/// A hasher for a pair of Dwarf_Off.  This is used as a hasher for
+/// the type @ref dwarf_offset_pair_set_type.
+struct dwarf_offset_pair_hash
+{
+  size_t
+  operator()(const std::pair<Dwarf_Off, Dwarf_Off>& p) const
+  {return abigail::hashing::combine_hashes(p.first, p.second);}
+};// end struct dwarf_offset_pair_hash
+
+typedef unordered_set<std::pair<Dwarf_Off,
+				Dwarf_Off>,
+		      dwarf_offset_pair_hash> dwarf_offset_pair_set_type;
 
 /// Convenience typedef for a shared pointer to an @ref address_set_type.
 typedef shared_ptr<address_set_type> address_set_sptr;
@@ -297,6 +308,12 @@ get_scope_die(const read_context&	ctxt,
 	      size_t			where_offset,
 	      Dwarf_Die&		scope_die);
 
+static Dwarf_Off
+die_offset(Dwarf_Die* die);
+
+static Dwarf_Off
+die_offset(const Dwarf_Die* die);
+
 static bool
 die_is_anonymous(const Dwarf_Die* die);
 
@@ -6230,6 +6247,24 @@ void
 set_do_log(read_context& ctxt, bool f)
 {ctxt.do_log(f);}
 
+/// Get the offset of a DIE
+///
+/// @param die the DIE to consider.
+///
+/// @return the offset of the DIE.
+static Dwarf_Off
+die_offset(Dwarf_Die* die)
+{return dwarf_dieoffset(die);}
+
+/// Get the offset of a DIE
+///
+/// @param die the DIE to consider.
+///
+/// @return the offset of the DIE.
+static Dwarf_Off
+die_offset(const Dwarf_Die* die)
+{return die_offset(const_cast<Dwarf_Die*>(die));}
+
 /// Test if a given DIE is anonymous
 ///
 /// @param die the DIE to consider.
@@ -10097,6 +10132,51 @@ fn_die_equal_by_linkage_name(const read_context &ctxt,
 	  && llinkage_name == rlinkage_name);
 }
 
+/// Test if the pair of offset {p1,p2} is present in a set.
+///
+/// @param set the set of pairs of DWARF offsets to consider.
+///
+/// @param p1 the first value of the pair.
+///
+/// @param p2 the second value of the pair.
+///
+/// @return if the pair {p1,p2} is present in the set.
+static bool
+has_offset_pair(const dwarf_offset_pair_set_type& set,
+		Dwarf_Off p1, Dwarf_Off p2)
+{
+  std::pair<Dwarf_Off, Dwarf_Off> p(p1, p2);
+  if (set.find(p) != set.end())
+    return true;
+  return false;
+}
+
+/// Insert a new pair of offset into the set of pair.
+///
+/// @param set the set of pairs of DWARF offsets to consider.
+///
+/// @param p1 the first value of the pair.
+///
+/// @param p2 the second value of the pair.
+static void
+insert_offset_pair(dwarf_offset_pair_set_type& set, Dwarf_Off p1, Dwarf_Off p2)
+{set.insert(std::pair<Dwarf_Off, Dwarf_Off>(p1, p2));}
+
+/// Erase a pair of DWARF offset from a set of pairs.
+///
+///
+/// @param set the set of pairs of DWARF offsets to consider.
+///
+/// @param p1 the first value of the pair.
+///
+/// @param p2 the second value of the pair.
+static void
+erase_offset_pair(dwarf_offset_pair_set_type& set, Dwarf_Off p1, Dwarf_Off p2)
+{
+  std::pair<Dwarf_Off, Dwarf_Off> p(p1, p2);
+  set.erase(p);
+}
+
 /// Compare two DIEs emitted by a C compiler.
 ///
 /// @param ctxt the read context used to load the DWARF information.
@@ -10123,7 +10203,7 @@ fn_die_equal_by_linkage_name(const read_context &ctxt,
 static bool
 compare_dies(const read_context& ctxt,
 	     const Dwarf_Die *l, const Dwarf_Die *r,
-	     istring_set_type& aggregates_being_compared,
+	     dwarf_offset_pair_set_type& aggregates_being_compared,
 	     bool update_canonical_dies_on_the_fly)
 {
   ABG_ASSERT(l);
@@ -10268,23 +10348,15 @@ compare_dies(const read_context& ctxt,
     case DW_TAG_structure_type:
     case DW_TAG_union_type:
       {
-	interned_string ln = ctxt.get_die_pretty_type_representation(l, 0);
-	interned_string rn = ctxt.get_die_pretty_type_representation(r, 0);
-
-	if ((aggregates_being_compared.find(ln)
-	     != aggregates_being_compared.end())
-	    || (aggregates_being_compared.find(rn)
-		!= aggregates_being_compared.end()))
+	if (has_offset_pair(aggregates_being_compared,
+			    die_offset(l), die_offset(r)))
 	  result = true;
-	else if (!compare_as_decl_dies(l, r))
-	  result = false;
-	else if (!compare_as_type_dies(l, r))
+	else if (!compare_as_decl_dies(l, r) || !compare_as_type_dies(l, r))
 	  result = false;
 	else
 	  {
-	    aggregates_being_compared.insert(ln);
-	    aggregates_being_compared.insert(rn);
-
+	    insert_offset_pair(aggregates_being_compared,
+			       die_offset(l), die_offset(r));
 	    Dwarf_Die l_member, r_member;
 	    bool found_l_member, found_r_member;
 	    for (found_l_member = dwarf_child(const_cast<Dwarf_Die*>(l),
@@ -10316,8 +10388,8 @@ compare_dies(const read_context& ctxt,
 	    if (found_l_member != found_r_member)
 	      result = false;
 
-	    aggregates_being_compared.erase(ln);
-	    aggregates_being_compared.erase(rn);
+	    erase_offset_pair(aggregates_being_compared,
+			      die_offset(l), die_offset(r));
 	  }
       }
       break;
@@ -10402,10 +10474,8 @@ compare_dies(const read_context& ctxt,
 	interned_string ln = ctxt.get_die_pretty_type_representation(l, 0);
 	interned_string rn = ctxt.get_die_pretty_type_representation(r, 0);
 
-	if ((aggregates_being_compared.find(ln)
-	     != aggregates_being_compared.end())
-	    || (aggregates_being_compared.find(rn)
-		!= aggregates_being_compared.end()))
+	if (has_offset_pair(aggregates_being_compared, die_offset(l),
+			    die_offset(r)))
 	  {
 	    result = true;
 	    break;
@@ -10498,8 +10568,8 @@ compare_dies(const read_context& ctxt,
 	      }
 	  }
 
-	aggregates_being_compared.erase(ln);
-	aggregates_being_compared.erase(rn);
+	erase_offset_pair(aggregates_being_compared,
+			  die_offset(l), die_offset(r));
       }
       break;
 
@@ -10535,19 +10605,10 @@ compare_dies(const read_context& ctxt,
 	      Dwarf_Die l_type, r_type;
 	      ABG_ASSERT(die_die_attribute(l, DW_AT_type, l_type));
 	      ABG_ASSERT(die_die_attribute(r, DW_AT_type, r_type));
-	      if (aggregates_being_compared.size () < 5)
-		{
-		  if (!compare_dies(ctxt, &l_type, &r_type,
-				    aggregates_being_compared,
-				    update_canonical_dies_on_the_fly))
-		    result = false;
-		}
-	      else
-		{
-		  if (!compare_as_type_dies(&l_type, &r_type)
-		      ||!compare_as_decl_dies(&l_type, &r_type))
-		    return false;
-		}
+	      if (!compare_dies(ctxt, &l_type, &r_type,
+				aggregates_being_compared,
+				update_canonical_dies_on_the_fly))
+		result = false;
 	    }
 	}
       else
@@ -10661,7 +10722,7 @@ compare_dies(const read_context& ctxt,
 	     const Dwarf_Die *r,
 	     bool update_canonical_dies_on_the_fly)
 {
-  istring_set_type aggregates_being_compared;
+  dwarf_offset_pair_set_type aggregates_being_compared;
   return compare_dies(ctxt, l, r, aggregates_being_compared,
 		      update_canonical_dies_on_the_fly);
 }
  
Giuliano Procida March 2, 2022, 8:58 a.m. UTC | #2
Answering on phone.

This looks fine. You might be able to shorten things a bit with make_pair
but that's just code style.

Giuliano.

On Tue, 1 Mar 2022, 17:46 Dodji Seketeli, <dodji@seketeli.org> wrote:

> Hey Giuliano,
>
> Thank you for reviewing the patch!
>
> Dodji Seketeli via Libabigail <libabigail@sourceware.org> a écrit:
>
> [...]
>
> >       * src/abg-dwarf-reader.cc (struct dwarf_offset_pair_hash)
> >       (dwarf_offset_pair_set_type): Define new type.
> >       (die_offset, has_offset_pair, insert_offset_pair)
> >       (erase_offset_pair): Define new static helper functions.
> >       (compare_dies): Use a set of DWARF offsets for the
> >       'aggregates_being_compared' data structure, rather than a set of
> >       string representation of the DIEs.  Always look at the size of the
> >       types being compared first so that we can get out quickly if they
> >       differ.  For DIEs of DW_TAG_{union,struct}_type kind, don't limit
> >       the depth of the stack of DIEs being compared to 5; so we don't
> >       consider two types as being equal just because the depth of the
> >       stack being compared is 5 and the two types have the same size and
> >       are equal.  Hopefully things don't take too long.
> >
>
> OK, you actually posted your comment as a bugzilla comment which is
> fine.  As I'd like the review to be found by people reading the patch
> post as well, I am replying it here.
>
> [...]
>
> gprocida at google dot com via Libabigail <libabigail@sourceware.org> a
> écrit:
>
> > Two (overlapping) things:
> >
> > 1. I wouldn't bother with inserting / testing / erasing the reverse
> > pair o(p2, p1)
> >
> > It might cost more than it ever saves. But it would take
> > instrumentation to see if it ever makes a difference.
> >
> > If you make this change then s/unordered pair/ordered pair/ in the doc
> > comments.
>
> OK, I am going with your inclination here.  I updated the patch
> accordingly.
>
> > 2. There is a typo.
> >
> > One of the reverse pairs is written as o(p2, 1).
>
> Good catch. Thanks.  Updated.
>
> > Otherwise, it looks good to me.
>
> Thanks.  I am posting an updated version of the patch below.
>
> From 3b67a367ce6b8a141f0cb29a130c002ca7b984eb Mon Sep 17 00:00:00 2001
> From: Dodji Seketeli <dodji@redhat.com>
> Date: Thu, 10 Feb 2022 17:43:38 +0100
> Subject: [PATCH 1/2] Bug 26646 - unexpected declaration-only types (part 2)
>
> This patch addresses the comment
> https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c15 of the
> reported problem.
>
> The core of the problem seems to be that when compare_dies compares
> two DW_TAG_{structure,union}_type, it "gives up" when the comparison
> stack contains more than 5 such DIEs being compared.  Giving up means
> that it considers the two DIEs to be equivalent if they have the same
> size and kind, basically.  This is to avoid infinite loops or too long
> loops that we've seen with artificial DWARF input generated by
> fuzzers.
>
> This patch fixes that by better using the "aggregates_being_compared"
> data structure that is meant to prevent looping infinitely over
> aggregate DIEs that might be recursively defined.  That data structure
> now contains the DWARF offset pairs of the DIEs being compared, rather
> than their "string representation".  Lookups in that data structure
> should now be faster and more precise.
>
> The artificial limit of the comparison stack not having more than 5
> DIEs is now lifted.
>
>         * src/abg-dwarf-reader.cc (struct dwarf_offset_pair_hash)
>         (dwarf_offset_pair_set_type): Define new type.
>         (die_offset, has_offset_pair, insert_offset_pair)
>         (erase_offset_pair): Define new static helper functions.
>         (compare_dies): Use a set of DWARF offsets for the
>         'aggregates_being_compared' data structure, rather than a set of
>         string representation of the DIEs.  Always look at the size of the
>         types being compared first so that we can get out quickly if they
>         differ.  For DIEs of DW_TAG_{union,struct}_type kind, don't limit
>         the depth of the stack of DIEs being compared to 5; so we don't
>         consider two types as being equal just because the depth of the
>         stack being compared is 5 and the two types have the same size and
>         are equal.  Hopefully things don't take too long.
>
> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
> ---
>  src/abg-dwarf-reader.cc | 135 +++++++++++++++++++++++++++++-----------
>  1 file changed, 98 insertions(+), 37 deletions(-)
>
> diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
> index 53b5845d..18f49037 100644
> --- a/src/abg-dwarf-reader.cc
> +++ b/src/abg-dwarf-reader.cc
> @@ -161,7 +161,18 @@ typedef unordered_map<GElf_Addr, elf_symbol_sptr>
> addr_elf_symbol_sptr_map_type;
>  /// Convenience typedef for a set of ELF addresses.
>  typedef unordered_set<GElf_Addr> address_set_type;
>
> -typedef unordered_set<interned_string, hash_interned_string>
> istring_set_type;
> +/// A hasher for a pair of Dwarf_Off.  This is used as a hasher for
> +/// the type @ref dwarf_offset_pair_set_type.
> +struct dwarf_offset_pair_hash
> +{
> +  size_t
> +  operator()(const std::pair<Dwarf_Off, Dwarf_Off>& p) const
> +  {return abigail::hashing::combine_hashes(p.first, p.second);}
> +};// end struct dwarf_offset_pair_hash
> +
> +typedef unordered_set<std::pair<Dwarf_Off,
> +                               Dwarf_Off>,
> +                     dwarf_offset_pair_hash> dwarf_offset_pair_set_type;
>
>  /// Convenience typedef for a shared pointer to an @ref address_set_type.
>  typedef shared_ptr<address_set_type> address_set_sptr;
> @@ -297,6 +308,12 @@ get_scope_die(const read_context&  ctxt,
>               size_t                    where_offset,
>               Dwarf_Die&                scope_die);
>
> +static Dwarf_Off
> +die_offset(Dwarf_Die* die);
> +
> +static Dwarf_Off
> +die_offset(const Dwarf_Die* die);
> +
>  static bool
>  die_is_anonymous(const Dwarf_Die* die);
>
> @@ -6230,6 +6247,24 @@ void
>  set_do_log(read_context& ctxt, bool f)
>  {ctxt.do_log(f);}
>
> +/// Get the offset of a DIE
> +///
> +/// @param die the DIE to consider.
> +///
> +/// @return the offset of the DIE.
> +static Dwarf_Off
> +die_offset(Dwarf_Die* die)
> +{return dwarf_dieoffset(die);}
> +
> +/// Get the offset of a DIE
> +///
> +/// @param die the DIE to consider.
> +///
> +/// @return the offset of the DIE.
> +static Dwarf_Off
> +die_offset(const Dwarf_Die* die)
> +{return die_offset(const_cast<Dwarf_Die*>(die));}
> +
>  /// Test if a given DIE is anonymous
>  ///
>  /// @param die the DIE to consider.
> @@ -10097,6 +10132,51 @@ fn_die_equal_by_linkage_name(const read_context
> &ctxt,
>           && llinkage_name == rlinkage_name);
>  }
>
> +/// Test if the pair of offset {p1,p2} is present in a set.
> +///
> +/// @param set the set of pairs of DWARF offsets to consider.
> +///
> +/// @param p1 the first value of the pair.
> +///
> +/// @param p2 the second value of the pair.
> +///
> +/// @return if the pair {p1,p2} is present in the set.
> +static bool
> +has_offset_pair(const dwarf_offset_pair_set_type& set,
> +               Dwarf_Off p1, Dwarf_Off p2)
> +{
> +  std::pair<Dwarf_Off, Dwarf_Off> p(p1, p2);
> +  if (set.find(p) != set.end())
> +    return true;
> +  return false;
> +}
> +
> +/// Insert a new pair of offset into the set of pair.
> +///
> +/// @param set the set of pairs of DWARF offsets to consider.
> +///
> +/// @param p1 the first value of the pair.
> +///
> +/// @param p2 the second value of the pair.
> +static void
> +insert_offset_pair(dwarf_offset_pair_set_type& set, Dwarf_Off p1,
> Dwarf_Off p2)
> +{set.insert(std::pair<Dwarf_Off, Dwarf_Off>(p1, p2));}
> +
> +/// Erase a pair of DWARF offset from a set of pairs.
> +///
> +///
> +/// @param set the set of pairs of DWARF offsets to consider.
> +///
> +/// @param p1 the first value of the pair.
> +///
> +/// @param p2 the second value of the pair.
> +static void
> +erase_offset_pair(dwarf_offset_pair_set_type& set, Dwarf_Off p1,
> Dwarf_Off p2)
> +{
> +  std::pair<Dwarf_Off, Dwarf_Off> p(p1, p2);
> +  set.erase(p);
> +}
> +
>  /// Compare two DIEs emitted by a C compiler.
>  ///
>  /// @param ctxt the read context used to load the DWARF information.
> @@ -10123,7 +10203,7 @@ fn_die_equal_by_linkage_name(const read_context
> &ctxt,
>  static bool
>  compare_dies(const read_context& ctxt,
>              const Dwarf_Die *l, const Dwarf_Die *r,
> -            istring_set_type& aggregates_being_compared,
> +            dwarf_offset_pair_set_type& aggregates_being_compared,
>              bool update_canonical_dies_on_the_fly)
>  {
>    ABG_ASSERT(l);
> @@ -10268,23 +10348,15 @@ compare_dies(const read_context& ctxt,
>      case DW_TAG_structure_type:
>      case DW_TAG_union_type:
>        {
> -       interned_string ln = ctxt.get_die_pretty_type_representation(l, 0);
> -       interned_string rn = ctxt.get_die_pretty_type_representation(r, 0);
> -
> -       if ((aggregates_being_compared.find(ln)
> -            != aggregates_being_compared.end())
> -           || (aggregates_being_compared.find(rn)
> -               != aggregates_being_compared.end()))
> +       if (has_offset_pair(aggregates_being_compared,
> +                           die_offset(l), die_offset(r)))
>           result = true;
> -       else if (!compare_as_decl_dies(l, r))
> -         result = false;
> -       else if (!compare_as_type_dies(l, r))
> +       else if (!compare_as_decl_dies(l, r) || !compare_as_type_dies(l,
> r))
>           result = false;
>         else
>           {
> -           aggregates_being_compared.insert(ln);
> -           aggregates_being_compared.insert(rn);
> -
> +           insert_offset_pair(aggregates_being_compared,
> +                              die_offset(l), die_offset(r));
>             Dwarf_Die l_member, r_member;
>             bool found_l_member, found_r_member;
>             for (found_l_member = dwarf_child(const_cast<Dwarf_Die*>(l),
> @@ -10316,8 +10388,8 @@ compare_dies(const read_context& ctxt,
>             if (found_l_member != found_r_member)
>               result = false;
>
> -           aggregates_being_compared.erase(ln);
> -           aggregates_being_compared.erase(rn);
> +           erase_offset_pair(aggregates_being_compared,
> +                             die_offset(l), die_offset(r));
>           }
>        }
>        break;
> @@ -10402,10 +10474,8 @@ compare_dies(const read_context& ctxt,
>         interned_string ln = ctxt.get_die_pretty_type_representation(l, 0);
>         interned_string rn = ctxt.get_die_pretty_type_representation(r, 0);
>
> -       if ((aggregates_being_compared.find(ln)
> -            != aggregates_being_compared.end())
> -           || (aggregates_being_compared.find(rn)
> -               != aggregates_being_compared.end()))
> +       if (has_offset_pair(aggregates_being_compared, die_offset(l),
> +                           die_offset(r)))
>           {
>             result = true;
>             break;
> @@ -10498,8 +10568,8 @@ compare_dies(const read_context& ctxt,
>               }
>           }
>
> -       aggregates_being_compared.erase(ln);
> -       aggregates_being_compared.erase(rn);
> +       erase_offset_pair(aggregates_being_compared,
> +                         die_offset(l), die_offset(r));
>        }
>        break;
>
> @@ -10535,19 +10605,10 @@ compare_dies(const read_context& ctxt,
>               Dwarf_Die l_type, r_type;
>               ABG_ASSERT(die_die_attribute(l, DW_AT_type, l_type));
>               ABG_ASSERT(die_die_attribute(r, DW_AT_type, r_type));
> -             if (aggregates_being_compared.size () < 5)
> -               {
> -                 if (!compare_dies(ctxt, &l_type, &r_type,
> -                                   aggregates_being_compared,
> -                                   update_canonical_dies_on_the_fly))
> -                   result = false;
> -               }
> -             else
> -               {
> -                 if (!compare_as_type_dies(&l_type, &r_type)
> -                     ||!compare_as_decl_dies(&l_type, &r_type))
> -                   return false;
> -               }
> +             if (!compare_dies(ctxt, &l_type, &r_type,
> +                               aggregates_being_compared,
> +                               update_canonical_dies_on_the_fly))
> +               result = false;
>             }
>         }
>        else
> @@ -10661,7 +10722,7 @@ compare_dies(const read_context& ctxt,
>              const Dwarf_Die *r,
>              bool update_canonical_dies_on_the_fly)
>  {
> -  istring_set_type aggregates_being_compared;
> +  dwarf_offset_pair_set_type aggregates_being_compared;
>    return compare_dies(ctxt, l, r, aggregates_being_compared,
>                       update_canonical_dies_on_the_fly);
>  }
> --
> 2.35.0.rc2
>
>
> --
>                 Dodji
>
  
Dodji Seketeli March 2, 2022, 4:31 p.m. UTC | #3
Giuliano Procida <gprocida@google.com> a écrit:

> Answering on phone.
>
> This looks fine. You might be able to shorten things a bit with make_pair
> but that's just code style.

Right, I have updated the patch to use make_pair.  Thanks.

I am thus applying the patch below:

From bfd557040376d4e82e4e684f3745f1e55bafdd9b Mon Sep 17 00:00:00 2001
From: Dodji Seketeli <dodji@redhat.com>
Date: Thu, 10 Feb 2022 17:43:38 +0100
Subject: [PATCH] Bug 26646 - unexpected declaration-only types (part 2)

This patch addresses the comment
https://sourceware.org/bugzilla/show_bug.cgi?id=26646#c15 of the
reported problem.

The core of the problem seems to be that when compare_dies compares
two DW_TAG_{structure,union}_type, it "gives up" when the comparison
stack contains more than 5 such DIEs being compared.  Giving up means
that it considers the two DIEs to be equivalent if they have the same
size and kind, basically.  This is to avoid infinite loops or too long
loops that we've seen with artificial DWARF input generated by
fuzzers.

This patch fixes that by better using the "aggregates_being_compared"
data structure that is meant to prevent looping infinitely over
aggregate DIEs that might be recursively defined.  That data structure
now contains the DWARF offset pairs of the DIEs being compared, rather
than their "string representation".  Lookups in that data structure
should now be faster and more precise.

The artificial limit of the comparison stack not having more than 5
DIEs is now lifted.

	* src/abg-dwarf-reader.cc (struct dwarf_offset_pair_hash)
	(dwarf_offset_pair_set_type): Define new type.
	(die_offset, has_offset_pair, insert_offset_pair)
	(erase_offset_pair): Define new static helper functions.
	(compare_dies): Use a set of DWARF offsets for the
	'aggregates_being_compared' data structure, rather than a set of
	string representation of the DIEs.  Always look at the size of the
	types being compared first so that we can get out quickly if they
	differ.  For DIEs of DW_TAG_{union,struct}_type kind, don't limit
	the depth of the stack of DIEs being compared to 5; so we don't
	consider two types as being equal just because the depth of the
	stack being compared is 5 and the two types have the same size and
	are equal.  Hopefully things don't take too long.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-dwarf-reader.cc | 134 +++++++++++++++++++++++++++++-----------
 1 file changed, 97 insertions(+), 37 deletions(-)

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 53b5845d..b35e6c98 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -161,7 +161,18 @@ typedef unordered_map<GElf_Addr, elf_symbol_sptr> addr_elf_symbol_sptr_map_type;
 /// Convenience typedef for a set of ELF addresses.
 typedef unordered_set<GElf_Addr> address_set_type;
 
-typedef unordered_set<interned_string, hash_interned_string> istring_set_type;
+/// A hasher for a pair of Dwarf_Off.  This is used as a hasher for
+/// the type @ref dwarf_offset_pair_set_type.
+struct dwarf_offset_pair_hash
+{
+  size_t
+  operator()(const std::pair<Dwarf_Off, Dwarf_Off>& p) const
+  {return abigail::hashing::combine_hashes(p.first, p.second);}
+};// end struct dwarf_offset_pair_hash
+
+typedef unordered_set<std::pair<Dwarf_Off,
+				Dwarf_Off>,
+		      dwarf_offset_pair_hash> dwarf_offset_pair_set_type;
 
 /// Convenience typedef for a shared pointer to an @ref address_set_type.
 typedef shared_ptr<address_set_type> address_set_sptr;
@@ -297,6 +308,12 @@ get_scope_die(const read_context&	ctxt,
 	      size_t			where_offset,
 	      Dwarf_Die&		scope_die);
 
+static Dwarf_Off
+die_offset(Dwarf_Die* die);
+
+static Dwarf_Off
+die_offset(const Dwarf_Die* die);
+
 static bool
 die_is_anonymous(const Dwarf_Die* die);
 
@@ -6230,6 +6247,24 @@ void
 set_do_log(read_context& ctxt, bool f)
 {ctxt.do_log(f);}
 
+/// Get the offset of a DIE
+///
+/// @param die the DIE to consider.
+///
+/// @return the offset of the DIE.
+static Dwarf_Off
+die_offset(Dwarf_Die* die)
+{return dwarf_dieoffset(die);}
+
+/// Get the offset of a DIE
+///
+/// @param die the DIE to consider.
+///
+/// @return the offset of the DIE.
+static Dwarf_Off
+die_offset(const Dwarf_Die* die)
+{return die_offset(const_cast<Dwarf_Die*>(die));}
+
 /// Test if a given DIE is anonymous
 ///
 /// @param die the DIE to consider.
@@ -10097,6 +10132,50 @@ fn_die_equal_by_linkage_name(const read_context &ctxt,
 	  && llinkage_name == rlinkage_name);
 }
 
+/// Test if the pair of offset {p1,p2} is present in a set.
+///
+/// @param set the set of pairs of DWARF offsets to consider.
+///
+/// @param p1 the first value of the pair.
+///
+/// @param p2 the second value of the pair.
+///
+/// @return if the pair {p1,p2} is present in the set.
+static bool
+has_offset_pair(const dwarf_offset_pair_set_type& set,
+		Dwarf_Off p1, Dwarf_Off p2)
+{
+  if (set.find(std::make_pair(p1, p2)) != set.end())
+    return true;
+  return false;
+}
+
+/// Insert a new pair of offset into the set of pair.
+///
+/// @param set the set of pairs of DWARF offsets to consider.
+///
+/// @param p1 the first value of the pair.
+///
+/// @param p2 the second value of the pair.
+static void
+insert_offset_pair(dwarf_offset_pair_set_type& set, Dwarf_Off p1, Dwarf_Off p2)
+{set.insert(std::make_pair(p1, p2));}
+
+/// Erase a pair of DWARF offset from a set of pairs.
+///
+///
+/// @param set the set of pairs of DWARF offsets to consider.
+///
+/// @param p1 the first value of the pair.
+///
+/// @param p2 the second value of the pair.
+static void
+erase_offset_pair(dwarf_offset_pair_set_type& set, Dwarf_Off p1, Dwarf_Off p2)
+{
+  std::pair<Dwarf_Off, Dwarf_Off> p(p1, p2);
+  set.erase(p);
+}
+
 /// Compare two DIEs emitted by a C compiler.
 ///
 /// @param ctxt the read context used to load the DWARF information.
@@ -10123,7 +10202,7 @@ fn_die_equal_by_linkage_name(const read_context &ctxt,
 static bool
 compare_dies(const read_context& ctxt,
 	     const Dwarf_Die *l, const Dwarf_Die *r,
-	     istring_set_type& aggregates_being_compared,
+	     dwarf_offset_pair_set_type& aggregates_being_compared,
 	     bool update_canonical_dies_on_the_fly)
 {
   ABG_ASSERT(l);
@@ -10268,23 +10347,15 @@ compare_dies(const read_context& ctxt,
     case DW_TAG_structure_type:
     case DW_TAG_union_type:
       {
-	interned_string ln = ctxt.get_die_pretty_type_representation(l, 0);
-	interned_string rn = ctxt.get_die_pretty_type_representation(r, 0);
-
-	if ((aggregates_being_compared.find(ln)
-	     != aggregates_being_compared.end())
-	    || (aggregates_being_compared.find(rn)
-		!= aggregates_being_compared.end()))
+	if (has_offset_pair(aggregates_being_compared,
+			    die_offset(l), die_offset(r)))
 	  result = true;
-	else if (!compare_as_decl_dies(l, r))
-	  result = false;
-	else if (!compare_as_type_dies(l, r))
+	else if (!compare_as_decl_dies(l, r) || !compare_as_type_dies(l, r))
 	  result = false;
 	else
 	  {
-	    aggregates_being_compared.insert(ln);
-	    aggregates_being_compared.insert(rn);
-
+	    insert_offset_pair(aggregates_being_compared,
+			       die_offset(l), die_offset(r));
 	    Dwarf_Die l_member, r_member;
 	    bool found_l_member, found_r_member;
 	    for (found_l_member = dwarf_child(const_cast<Dwarf_Die*>(l),
@@ -10316,8 +10387,8 @@ compare_dies(const read_context& ctxt,
 	    if (found_l_member != found_r_member)
 	      result = false;
 
-	    aggregates_being_compared.erase(ln);
-	    aggregates_being_compared.erase(rn);
+	    erase_offset_pair(aggregates_being_compared,
+			      die_offset(l), die_offset(r));
 	  }
       }
       break;
@@ -10402,10 +10473,8 @@ compare_dies(const read_context& ctxt,
 	interned_string ln = ctxt.get_die_pretty_type_representation(l, 0);
 	interned_string rn = ctxt.get_die_pretty_type_representation(r, 0);
 
-	if ((aggregates_being_compared.find(ln)
-	     != aggregates_being_compared.end())
-	    || (aggregates_being_compared.find(rn)
-		!= aggregates_being_compared.end()))
+	if (has_offset_pair(aggregates_being_compared, die_offset(l),
+			    die_offset(r)))
 	  {
 	    result = true;
 	    break;
@@ -10498,8 +10567,8 @@ compare_dies(const read_context& ctxt,
 	      }
 	  }
 
-	aggregates_being_compared.erase(ln);
-	aggregates_being_compared.erase(rn);
+	erase_offset_pair(aggregates_being_compared,
+			  die_offset(l), die_offset(r));
       }
       break;
 
@@ -10535,19 +10604,10 @@ compare_dies(const read_context& ctxt,
 	      Dwarf_Die l_type, r_type;
 	      ABG_ASSERT(die_die_attribute(l, DW_AT_type, l_type));
 	      ABG_ASSERT(die_die_attribute(r, DW_AT_type, r_type));
-	      if (aggregates_being_compared.size () < 5)
-		{
-		  if (!compare_dies(ctxt, &l_type, &r_type,
-				    aggregates_being_compared,
-				    update_canonical_dies_on_the_fly))
-		    result = false;
-		}
-	      else
-		{
-		  if (!compare_as_type_dies(&l_type, &r_type)
-		      ||!compare_as_decl_dies(&l_type, &r_type))
-		    return false;
-		}
+	      if (!compare_dies(ctxt, &l_type, &r_type,
+				aggregates_being_compared,
+				update_canonical_dies_on_the_fly))
+		result = false;
 	    }
 	}
       else
@@ -10661,7 +10721,7 @@ compare_dies(const read_context& ctxt,
 	     const Dwarf_Die *r,
 	     bool update_canonical_dies_on_the_fly)
 {
-  istring_set_type aggregates_being_compared;
+  dwarf_offset_pair_set_type aggregates_being_compared;
   return compare_dies(ctxt, l, r, aggregates_being_compared,
 		      update_canonical_dies_on_the_fly);
 }
  

Patch

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 53b5845d..05a7e4d6 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -161,7 +161,18 @@  typedef unordered_map<GElf_Addr, elf_symbol_sptr> addr_elf_symbol_sptr_map_type;
 /// Convenience typedef for a set of ELF addresses.
 typedef unordered_set<GElf_Addr> address_set_type;
 
-typedef unordered_set<interned_string, hash_interned_string> istring_set_type;
+/// A hasher for a pair of Dwarf_Off.  This is used as a hasher for
+/// the type @ref dwarf_offset_pair_set_type.
+struct dwarf_offset_pair_hash
+{
+  size_t
+  operator()(const std::pair<Dwarf_Off, Dwarf_Off>& p) const
+  {return abigail::hashing::combine_hashes(p.first, p.second);}
+};// end struct dwarf_offset_pair_hash
+
+typedef unordered_set<std::pair<Dwarf_Off,
+				Dwarf_Off>,
+		      dwarf_offset_pair_hash> dwarf_offset_pair_set_type;
 
 /// Convenience typedef for a shared pointer to an @ref address_set_type.
 typedef shared_ptr<address_set_type> address_set_sptr;
@@ -297,6 +308,12 @@  get_scope_die(const read_context&	ctxt,
 	      size_t			where_offset,
 	      Dwarf_Die&		scope_die);
 
+static Dwarf_Off
+die_offset(Dwarf_Die* die);
+
+static Dwarf_Off
+die_offset(const Dwarf_Die* die);
+
 static bool
 die_is_anonymous(const Dwarf_Die* die);
 
@@ -6230,6 +6247,24 @@  void
 set_do_log(read_context& ctxt, bool f)
 {ctxt.do_log(f);}
 
+/// Get the offset of a DIE
+///
+/// @param die the DIE to consider.
+///
+/// @return the offset of the DIE.
+static Dwarf_Off
+die_offset(Dwarf_Die* die)
+{return dwarf_dieoffset(die);}
+
+/// Get the offset of a DIE
+///
+/// @param die the DIE to consider.
+///
+/// @return the offset of the DIE.
+static Dwarf_Off
+die_offset(const Dwarf_Die* die)
+{return die_offset(const_cast<Dwarf_Die*>(die));}
+
 /// Test if a given DIE is anonymous
 ///
 /// @param die the DIE to consider.
@@ -10097,6 +10132,54 @@  fn_die_equal_by_linkage_name(const read_context &ctxt,
 	  && llinkage_name == rlinkage_name);
 }
 
+/// Test if the unordered pair of offset {p1,p2} is present in a set.
+///
+/// @param set the set of pairs of DWARF offsets to consider.
+///
+/// @param p1 the first value of the pair.
+///
+/// @param p2 the second value of the pair.
+///
+/// @return if the pair {p1,p2} or {p2,p1} is present in the set.
+static bool
+has_offset_pair(const dwarf_offset_pair_set_type& set,
+		Dwarf_Off p1, Dwarf_Off p2)
+{
+  std::pair<Dwarf_Off, Dwarf_Off> p(p1, p2), o(p2, p1);
+  if ((set.find(p) != set.end())
+      || (set.find(o) != set.end()))
+    return true;
+  return false;
+}
+
+/// Insert a new pair of offset into the set of pair.
+///
+/// @param set the set of pairs of DWARF offsets to consider.
+///
+/// @param p1 the first value of the pair.
+///
+/// @param p2 the second value of the pair.
+static void
+insert_offset_pair(dwarf_offset_pair_set_type& set, Dwarf_Off p1, Dwarf_Off p2)
+{set.insert(std::pair<Dwarf_Off, Dwarf_Off>(p1, p2));}
+
+/// Erase a pair of DWARF offset from a set of pairs.
+///
+/// The pair erased is non ordered.
+///
+/// @param set the set of pairs of DWARF offsets to consider.
+///
+/// @param p1 the first value of the pair.
+///
+/// @param p2 the second value of the pair.
+static void
+erase_offset_pair(dwarf_offset_pair_set_type& set, Dwarf_Off p1, Dwarf_Off p2)
+{
+  std::pair<Dwarf_Off, Dwarf_Off> p(p1, p2), o(p2, 1);
+  if (!set.erase(p))
+    set.erase(o);
+}
+
 /// Compare two DIEs emitted by a C compiler.
 ///
 /// @param ctxt the read context used to load the DWARF information.
@@ -10123,7 +10206,7 @@  fn_die_equal_by_linkage_name(const read_context &ctxt,
 static bool
 compare_dies(const read_context& ctxt,
 	     const Dwarf_Die *l, const Dwarf_Die *r,
-	     istring_set_type& aggregates_being_compared,
+	     dwarf_offset_pair_set_type& aggregates_being_compared,
 	     bool update_canonical_dies_on_the_fly)
 {
   ABG_ASSERT(l);
@@ -10268,23 +10351,15 @@  compare_dies(const read_context& ctxt,
     case DW_TAG_structure_type:
     case DW_TAG_union_type:
       {
-	interned_string ln = ctxt.get_die_pretty_type_representation(l, 0);
-	interned_string rn = ctxt.get_die_pretty_type_representation(r, 0);
-
-	if ((aggregates_being_compared.find(ln)
-	     != aggregates_being_compared.end())
-	    || (aggregates_being_compared.find(rn)
-		!= aggregates_being_compared.end()))
+	if (has_offset_pair(aggregates_being_compared,
+			    die_offset(l), die_offset(r)))
 	  result = true;
-	else if (!compare_as_decl_dies(l, r))
-	  result = false;
-	else if (!compare_as_type_dies(l, r))
+	else if (!compare_as_decl_dies(l, r) || !compare_as_type_dies(l, r))
 	  result = false;
 	else
 	  {
-	    aggregates_being_compared.insert(ln);
-	    aggregates_being_compared.insert(rn);
-
+	    insert_offset_pair(aggregates_being_compared,
+			       die_offset(l), die_offset(r));
 	    Dwarf_Die l_member, r_member;
 	    bool found_l_member, found_r_member;
 	    for (found_l_member = dwarf_child(const_cast<Dwarf_Die*>(l),
@@ -10316,8 +10391,8 @@  compare_dies(const read_context& ctxt,
 	    if (found_l_member != found_r_member)
 	      result = false;
 
-	    aggregates_being_compared.erase(ln);
-	    aggregates_being_compared.erase(rn);
+	    erase_offset_pair(aggregates_being_compared,
+			      die_offset(l), die_offset(r));
 	  }
       }
       break;
@@ -10402,10 +10477,8 @@  compare_dies(const read_context& ctxt,
 	interned_string ln = ctxt.get_die_pretty_type_representation(l, 0);
 	interned_string rn = ctxt.get_die_pretty_type_representation(r, 0);
 
-	if ((aggregates_being_compared.find(ln)
-	     != aggregates_being_compared.end())
-	    || (aggregates_being_compared.find(rn)
-		!= aggregates_being_compared.end()))
+	if (has_offset_pair(aggregates_being_compared, die_offset(l),
+			    die_offset(r)))
 	  {
 	    result = true;
 	    break;
@@ -10498,8 +10571,8 @@  compare_dies(const read_context& ctxt,
 	      }
 	  }
 
-	aggregates_being_compared.erase(ln);
-	aggregates_being_compared.erase(rn);
+	erase_offset_pair(aggregates_being_compared,
+			  die_offset(l), die_offset(r));
       }
       break;
 
@@ -10535,19 +10608,10 @@  compare_dies(const read_context& ctxt,
 	      Dwarf_Die l_type, r_type;
 	      ABG_ASSERT(die_die_attribute(l, DW_AT_type, l_type));
 	      ABG_ASSERT(die_die_attribute(r, DW_AT_type, r_type));
-	      if (aggregates_being_compared.size () < 5)
-		{
-		  if (!compare_dies(ctxt, &l_type, &r_type,
-				    aggregates_being_compared,
-				    update_canonical_dies_on_the_fly))
-		    result = false;
-		}
-	      else
-		{
-		  if (!compare_as_type_dies(&l_type, &r_type)
-		      ||!compare_as_decl_dies(&l_type, &r_type))
-		    return false;
-		}
+	      if (!compare_dies(ctxt, &l_type, &r_type,
+				aggregates_being_compared,
+				update_canonical_dies_on_the_fly))
+		result = false;
 	    }
 	}
       else
@@ -10661,7 +10725,7 @@  compare_dies(const read_context& ctxt,
 	     const Dwarf_Die *r,
 	     bool update_canonical_dies_on_the_fly)
 {
-  istring_set_type aggregates_being_compared;
+  dwarf_offset_pair_set_type aggregates_being_compared;
   return compare_dies(ctxt, l, r, aggregates_being_compared,
 		      update_canonical_dies_on_the_fly);
 }