Structurally compare the few non-canonicalized types in general

Message ID 875z7ebf9d.fsf@redhat.com
State New
Headers
Series Structurally compare the few non-canonicalized types in general |

Commit Message

Dodji Seketeli Oct. 13, 2020, 10:18 a.m. UTC
  Hello,

In the abixml writer, we recently stopped using
type_base::dynamic_hash as a slow path to hash a non-canonicalized
type.  Rather, we use an arbitrary constant as a hash for those
non-canonicalized types.  This amounts to using structural comparison
for those types.  The function that implements this hashing scheme is
hash_as_canonical_type_or_constant.

A potential concern for that approach was the possible negative impact
on speed.  As it turns out since the change went in, there was no
noticeable speed impact raised after testing.  Which is great!

In insight, this is understandable as the number of non-canonicalized
types should be extremely reduced now that we canonicalize pretty much
all types.  As far as I understand it at this point, the only types
that would not be canonicalized are unresolved declaration-only
types.  And, all in all, structurally comparing unresolved
declaration-only types should be "fast enough".

If we ever stumble across any other non-canonicalized type, I think
that would be an artifact of a bug that ought to be fixed.

On that basis, I went ahead and used
hash_as_canonical_type_or_constant throughout the code base and did
away with the use of type_base::dynamic_hash for now, until it's
properly audited, regression tested and got ready for the use cases
where it might make sense.

This patch thus makes hash_type use
hash_as_canonical_type_or_constant.  The writer is then back to using
hash_type again, as it used to; but at the same time, it's still using
structural comparison for non-canonilized types.  So is
hash_type_or_decl, which now uses hash_type to hash types, rather than
using type_base::dynamic_hash.  Note that the comparison engine
heavily uses hash_type_or_decl to hash diff nodes.

So with this small patch the comparison engine is now using structural
comparison of non-canonicalized types (and diff nodes), just as the
abixml writer does.

	* src/abg-hash.cc (type_base::dynamic_hash::operator()): Add a
	comment.
	* src/abg-ir.cc (hash_type_or_decl): Use hash_type for types.
	* src/abg-writer.cc (type_hasher::operator()): Use hash_type.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-hash.cc   |  5 ++++
 src/abg-ir.cc     | 67 ++---------------------------------------------
 src/abg-writer.cc |  2 +-
 3 files changed, 8 insertions(+), 66 deletions(-)
  

Comments

Giuliano Procida Oct. 13, 2020, 2:55 p.m. UTC | #1
Hi Dodji.

This looks good to me. I did a couple of spot checks with this on top of
master and all seemed OK.

Giuliano.


On Tue, 13 Oct 2020 at 11:18, Dodji Seketeli <dodji@redhat.com> wrote:

> Hello,
>
> In the abixml writer, we recently stopped using
> type_base::dynamic_hash as a slow path to hash a non-canonicalized
> type.  Rather, we use an arbitrary constant as a hash for those
> non-canonicalized types.  This amounts to using structural comparison
> for those types.  The function that implements this hashing scheme is
> hash_as_canonical_type_or_constant.
>
> A potential concern for that approach was the possible negative impact
> on speed.  As it turns out since the change went in, there was no
> noticeable speed impact raised after testing.  Which is great!
>
> In insight, this is understandable as the number of non-canonicalized
> types should be extremely reduced now that we canonicalize pretty much
> all types.  As far as I understand it at this point, the only types
> that would not be canonicalized are unresolved declaration-only
> types.  And, all in all, structurally comparing unresolved
> declaration-only types should be "fast enough".
>
> If we ever stumble across any other non-canonicalized type, I think
> that would be an artifact of a bug that ought to be fixed.
>
> On that basis, I went ahead and used
> hash_as_canonical_type_or_constant throughout the code base and did
> away with the use of type_base::dynamic_hash for now, until it's
> properly audited, regression tested and got ready for the use cases
> where it might make sense.
>
> This patch thus makes hash_type use
> hash_as_canonical_type_or_constant.  The writer is then back to using
> hash_type again, as it used to; but at the same time, it's still using
> structural comparison for non-canonilized types.  So is
> hash_type_or_decl, which now uses hash_type to hash types, rather than
> using type_base::dynamic_hash.  Note that the comparison engine
> heavily uses hash_type_or_decl to hash diff nodes.
>
> So with this small patch the comparison engine is now using structural
> comparison of non-canonicalized types (and diff nodes), just as the
> abixml writer does.
>
>         * src/abg-hash.cc (type_base::dynamic_hash::operator()): Add a
>         comment.
>         * src/abg-ir.cc (hash_type_or_decl): Use hash_type for types.
>         * src/abg-writer.cc (type_hasher::operator()): Use hash_type.
>
> Signed-off-by: Dodji Seketeli <dodji@redhat.com>
> ---
>  src/abg-hash.cc   |  5 ++++
>  src/abg-ir.cc     | 67 ++---------------------------------------------
>  src/abg-writer.cc |  2 +-
>  3 files changed, 8 insertions(+), 66 deletions(-)
>
> diff --git a/src/abg-hash.cc b/src/abg-hash.cc
> index 5d2861f3..ca67689c 100644
> --- a/src/abg-hash.cc
> +++ b/src/abg-hash.cc
> @@ -991,6 +991,11 @@ operator()(const shared_ptr<class_tdecl> t) const
>  /// inheritance hierarchy, make sure to handle the most derived type
>  /// first.
>  ///
> +/// FIXME: This hashing function is not maintained and is surely
> +/// broken in subtle ways.  In pratice, the various *::hash functors
> +/// should be audited before they are used here.  They should all
> +/// match what is done in the 'equals' functions in abg-ir.cc.
> +///
>  /// @param t a pointer to the type declaration to be hashed
>  ///
>  /// @return the resulting hash
> diff --git a/src/abg-ir.cc b/src/abg-ir.cc
> index 24f42e54..1b99c18f 100644
> --- a/src/abg-ir.cc
> +++ b/src/abg-ir.cc
> @@ -22977,36 +22977,7 @@ hash_type_or_decl(const type_or_decl_base *tod)
>    if (tod == 0)
>      ;
>    else if (const type_base* t = is_type(tod))
> -    {
> -      // If the type has a canonical type, then use the pointer value
> -      // as a hash.  This is the fastest we can get.
> -      if (type_base* ct = t->get_naked_canonical_type())
> -       result = reinterpret_cast<size_t>(ct);
> -      else if (const class_decl* cl = is_class_type(t))
> -       {
> -         if (cl->get_is_declaration_only()
> -             && cl->get_naked_definition_of_declaration())
> -           // The is a declaration-only class, so it has no canonical
> -           // type; but then it's class definition has one.  Let's
> -           // use that one.
> -           return
> hash_type_or_decl(cl->get_naked_definition_of_declaration());
> -         else
> -           {
> -             // The class really has no canonical type, let's use the
> -             // slow path of hashing the class recursively.  Well
> -             // it's not that slow as the hash value is quickly going
> -             // to result to zero anyway.
> -             type_base::dynamic_hash hash;
> -             result = hash(t);
> -           }
> -       }
> -      else
> -       {
> -         // Let's use the slow path of hashing the class recursively.
> -         type_base::dynamic_hash hash;
> -         result = hash(t);
> -       }
> -    }
> +    result = hash_type(t);
>    else if (const decl_base* d = is_decl(tod))
>      {
>        if (var_decl* v = is_var_decl(d))
> @@ -23079,41 +23050,7 @@ hash_type_or_decl(const type_or_decl_base *tod)
>  /// @return the resulting hash value.
>  size_t
>  hash_type(const type_base *t)
> -{
> -  size_t result = 0;
> -
> -  // If the type has a canonical type, then use the pointer value
> -  // as a hash.  This is the fastest we can get.
> -  if (type_base* ct = t->get_naked_canonical_type())
> -    result = reinterpret_cast<size_t>(ct);
> -  else if (const class_decl* cl = is_class_type(t))
> -    {
> -      if (cl->get_is_declaration_only()
> -         && cl->get_naked_definition_of_declaration())
> -       // The is a declaration-only class, so it has no canonical
> -       // type; but then it's class definition has one.  Let's
> -       // use that one.
> -       return hash_type
> -         (is_class_type(cl->get_naked_definition_of_declaration()));
> -      else
> -       {
> -         // The class really has no canonical type, let's use the
> -         // slow path of hashing the class recursively.  Well
> -         // it's not that slow as the hash value is quickly going
> -         // to result to zero anyway.
> -         type_base::dynamic_hash hash;
> -         result = hash(t);
> -       }
> -    }
> -  else
> -    {
> -      // Let's use the slow path of hashing the class recursively.
> -      type_base::dynamic_hash hash;
> -      result = hash(t);
> -    }
> -
> -  return result;
> -}
> +{return hash_as_canonical_type_or_constant(t);}
>
>  /// Hash an ABI artifact that is either a type of a decl.
>  ///
> diff --git a/src/abg-writer.cc b/src/abg-writer.cc
> index 59060e10..4c751c26 100644
> --- a/src/abg-writer.cc
> +++ b/src/abg-writer.cc
> @@ -136,7 +136,7 @@ struct type_hasher
>  {
>    size_t
>    operator()(const type_base* t) const
> -  {return hash_as_canonical_type_or_constant(t);}
> +  {return hash_type(t);}
>  }; // end struct type_hasher
>
>  /// A convenience typedef for a map that associates a pointer to type
> --
> 2.28.0
>
>
> --
>                 Dodji
>
>
  
Dodji Seketeli Oct. 14, 2020, 11:07 a.m. UTC | #2
Hello Giuliano,

Giuliano Procida <gprocida@google.com> writes:

> This looks good to me. I did a couple of spot checks with this on top of
> master and all seemed OK.

Thank you for taking the time to review this, it's really appreciated.

I have applied to patch to master.

Thanks!
  

Patch

diff --git a/src/abg-hash.cc b/src/abg-hash.cc
index 5d2861f3..ca67689c 100644
--- a/src/abg-hash.cc
+++ b/src/abg-hash.cc
@@ -991,6 +991,11 @@  operator()(const shared_ptr<class_tdecl> t) const
 /// inheritance hierarchy, make sure to handle the most derived type
 /// first.
 ///
+/// FIXME: This hashing function is not maintained and is surely
+/// broken in subtle ways.  In pratice, the various *::hash functors
+/// should be audited before they are used here.  They should all
+/// match what is done in the 'equals' functions in abg-ir.cc.
+///
 /// @param t a pointer to the type declaration to be hashed
 ///
 /// @return the resulting hash
diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index 24f42e54..1b99c18f 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -22977,36 +22977,7 @@  hash_type_or_decl(const type_or_decl_base *tod)
   if (tod == 0)
     ;
   else if (const type_base* t = is_type(tod))
-    {
-      // If the type has a canonical type, then use the pointer value
-      // as a hash.  This is the fastest we can get.
-      if (type_base* ct = t->get_naked_canonical_type())
-	result = reinterpret_cast<size_t>(ct);
-      else if (const class_decl* cl = is_class_type(t))
-	{
-	  if (cl->get_is_declaration_only()
-	      && cl->get_naked_definition_of_declaration())
-	    // The is a declaration-only class, so it has no canonical
-	    // type; but then it's class definition has one.  Let's
-	    // use that one.
-	    return hash_type_or_decl(cl->get_naked_definition_of_declaration());
-	  else
-	    {
-	      // The class really has no canonical type, let's use the
-	      // slow path of hashing the class recursively.  Well
-	      // it's not that slow as the hash value is quickly going
-	      // to result to zero anyway.
-	      type_base::dynamic_hash hash;
-	      result = hash(t);
-	    }
-	}
-      else
-	{
-	  // Let's use the slow path of hashing the class recursively.
-	  type_base::dynamic_hash hash;
-	  result = hash(t);
-	}
-    }
+    result = hash_type(t);
   else if (const decl_base* d = is_decl(tod))
     {
       if (var_decl* v = is_var_decl(d))
@@ -23079,41 +23050,7 @@  hash_type_or_decl(const type_or_decl_base *tod)
 /// @return the resulting hash value.
 size_t
 hash_type(const type_base *t)
-{
-  size_t result = 0;
-
-  // If the type has a canonical type, then use the pointer value
-  // as a hash.  This is the fastest we can get.
-  if (type_base* ct = t->get_naked_canonical_type())
-    result = reinterpret_cast<size_t>(ct);
-  else if (const class_decl* cl = is_class_type(t))
-    {
-      if (cl->get_is_declaration_only()
-	  && cl->get_naked_definition_of_declaration())
-	// The is a declaration-only class, so it has no canonical
-	// type; but then it's class definition has one.  Let's
-	// use that one.
-	return hash_type
-	  (is_class_type(cl->get_naked_definition_of_declaration()));
-      else
-	{
-	  // The class really has no canonical type, let's use the
-	  // slow path of hashing the class recursively.  Well
-	  // it's not that slow as the hash value is quickly going
-	  // to result to zero anyway.
-	  type_base::dynamic_hash hash;
-	  result = hash(t);
-	}
-    }
-  else
-    {
-      // Let's use the slow path of hashing the class recursively.
-      type_base::dynamic_hash hash;
-      result = hash(t);
-    }
-
-  return result;
-}
+{return hash_as_canonical_type_or_constant(t);}
 
 /// Hash an ABI artifact that is either a type of a decl.
 ///
diff --git a/src/abg-writer.cc b/src/abg-writer.cc
index 59060e10..4c751c26 100644
--- a/src/abg-writer.cc
+++ b/src/abg-writer.cc
@@ -136,7 +136,7 @@  struct type_hasher
 {
   size_t
   operator()(const type_base* t) const
-  {return hash_as_canonical_type_or_constant(t);}
+  {return hash_type(t);}
 }; // end struct type_hasher
 
 /// A convenience typedef for a map that associates a pointer to type