[3/5] XML writer: track emitted types by bare pointer

Message ID 20211203114622.2944173-4-maennich@google.com
State New
Headers
Series Improvements for the XML Writer |

Commit Message

Matthias Männich Dec. 3, 2021, 11:46 a.m. UTC
  This is a performance and safety improvement made possible by the
previous changes which ensure that the same pointers are inserted and
looked up.

This essentially removes the now unnecessary deep comparison.

	* src/abg-writer.cc (type_ptr_set_type): Change typedef
	container type to use default equality and hashing for pointer
	keys.
	(fn_type_ptr_set_type): Likewise.

Reviewed-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
---
 src/abg-writer.cc | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
  

Comments

Dodji Seketeli Dec. 10, 2021, 10:50 a.m. UTC | #1
Hello,

Matthias Maennich <maennich@google.com> a écrit:

[...]

> This is a performance and safety improvement made possible by the
> previous changes which ensure that the same pointers are inserted and
> looked up.
>
> This essentially removes the now unnecessary deep comparison.

[...]

> +++ b/src/abg-writer.cc
> @@ -123,14 +123,10 @@ typedef unordered_map<type_base*,
>  		      abigail::diff_utils::deep_ptr_eq_functor> type_ptr_map;
>  
>  // A convenience typedef for a set of type_base*.
> -typedef unordered_set<const type_base*, type_hasher,
> -		      abigail::diff_utils::deep_ptr_eq_functor>
> -type_ptr_set_type;
> +typedef std::unordered_set<const type_base*> type_ptr_set_type;
>  
>  /// A convenience typedef for a set of function type*.
> -typedef unordered_set<function_type*, type_hasher,
> -		      abigail::diff_utils::deep_ptr_eq_functor>
> -fn_type_ptr_set_type;
> +typedef std::unordered_set<function_type*> fn_type_ptr_set_type;

The problem I see with doing this is that it's possible that two
declaration-only classes, that are equivalent but that have different
pointer values get into these sets.

In that case, they would be considered different even though they are
not.

So maybe it would be better have an equality operator that uses
is_non_canonicalized_type() to detect those rare cases and use
structural comparison in those cases?

What do you think?
  
Matthias Männich Dec. 16, 2021, 4:07 p.m. UTC | #2
Hi Dodji!

Thanks for the review and thanks for your thoughts!

On Fri, Dec 10, 2021 at 11:50:49AM +0100, Dodji Seketeli wrote:
>Hello,
>
>Matthias Maennich <maennich@google.com> a écrit:
>
>[...]
>
>> This is a performance and safety improvement made possible by the
>> previous changes which ensure that the same pointers are inserted and
>> looked up.
>>
>> This essentially removes the now unnecessary deep comparison.
>
>[...]
>
>> +++ b/src/abg-writer.cc
>> @@ -123,14 +123,10 @@ typedef unordered_map<type_base*,
>>  		      abigail::diff_utils::deep_ptr_eq_functor> type_ptr_map;
>>
>>  // A convenience typedef for a set of type_base*.
>> -typedef unordered_set<const type_base*, type_hasher,
>> -		      abigail::diff_utils::deep_ptr_eq_functor>
>> -type_ptr_set_type;
>> +typedef std::unordered_set<const type_base*> type_ptr_set_type;
>>
>>  /// A convenience typedef for a set of function type*.
>> -typedef unordered_set<function_type*, type_hasher,
>> -		      abigail::diff_utils::deep_ptr_eq_functor>
>> -fn_type_ptr_set_type;
>> +typedef std::unordered_set<function_type*> fn_type_ptr_set_type;
>
>The problem I see with doing this is that it's possible that two
>declaration-only classes, that are equivalent but that have different
>pointer values get into these sets.
>
>In that case, they would be considered different even though they are
>not.

If the XML writer considers two equivalent declaration-only types to be
different, one question to ask is: what is the real difference, that is,
how will this affect the outcome of abidiff? If the types never change
(kind, name or declaration/definition status), nothing should ever be
reported. If a type does change... there are two possibilities: either
the types were really one type and now perhaps abidiff reports diffs for
the same name in two different ways; or the types were really two
different ones and abidiff has a simpler job. In my experience, abidiff
doesn't always report declaration-only/defined transitions. It doesn't
sound like there will be any really bad impact on diffs from having this
kind of duplication. However, if someone can come up with a test case of
the kind you mention, that would give some extra reassurance.

>
>So maybe it would be better have an equality operator that uses
>is_non_canonicalized_type() to detect those rare cases and use
>structural comparison in those cases?

That might come at higher cost than it is beneficial.

>
>What do you think?

For us specifically - building with clang and for our use cases - if we
keep structural equality of any kind then we need a hash function to go
along with this and, as we've sadly found out, this isn't working well
at the moment. We are currently on a bit dated version of libabigail for
our production use, but would like to close that gap again to come
closer to master.

The risk of infinite loops and the reality of 30x slowdowns for certain
workloads mean we would need to apply these changes to remove structural
equality testing from the XML writer and then maintain an Android
version of libabigail as a more heavily-patched fork, to whatever extent
is feasible. I would rather we find a good solution that works for all
to get again close to upstream and not having to maintain such a fork.

Yet, as an additional piece of assurance: the testing we have done does
not only include kernels, but of course we heavily examined the
libabigail test suite. Additionally, we maintain a large set of small
test cases specifically created for ABI stability testing and to cover
corner cases of all sorts. We are in the process of publishing those as
well. So far, this has served as great input for this patch series as
well.

Does this make sense? What do you think?

Cheers,
Matthias

>
>-- 
>		Dodji
  
Dodji Seketeli Jan. 10, 2022, 5 p.m. UTC | #3
Matthias Maennich <maennich@google.com> a écrit:

[...]

> If the XML writer considers two equivalent declaration-only types to be
> different, one question to ask is: what is the real difference, that is,
> how will this affect the outcome of abidiff?

The problem is not necessarily at the abidiff level per say.

The problem would be duplication of decl-only types in the abixml
output, I think, and maybe infinite loops in those cases.  The infinite
loops are easy to debug, though.  So I am not concerned about them.

> If the types never change
> (kind, name or declaration/definition status), nothing should ever be
> reported. If a type does change... there are two possibilities: either
> the types were really one type and now perhaps abidiff reports diffs for
> the same name in two different ways; or the types were really two
> different ones and abidiff has a simpler job. In my experience, abidiff
> doesn't always report declaration-only/defined transitions. It doesn't
> sound like there will be any really bad impact on diffs from having this
> kind of duplication. However, if someone can come up with a test case of
> the kind you mention, that would give some extra reassurance.

The reason why I was pointing to this "general" issue is to make sure
you are aware of this.  As type duplications in abixml was something you
guys were tracking (and rightly so) I thought I'd point out that we
still have the risk here.

But because the type id map (writer_context::m_type_id_map) is not
affected, the duplicated types will correctly be identified as such by
the reader; thus I don't think abidiff is going to be affected.

>>So maybe it would be better have an equality operator that uses
>>is_non_canonicalized_type() to detect those rare cases and use
>>structural comparison in those cases?
>
> That might come at higher cost than it is beneficial.

I could not tell, as I don't necessarily have the right binaries at
hand.  I trust you.

>
>>
>>What do you think?
>
> For us specifically - building with clang and for our use cases - if we
> keep structural equality of any kind then we need a hash function to go
> along with this and, as we've sadly found out, this isn't working well
> at the moment. We are currently on a bit dated version of libabigail for
> our production use, but would like to close that gap again to come
> closer to master.
>
> The risk of infinite loops and the reality of 30x slowdowns for certain
> workloads mean we would need to apply these changes to remove structural
> equality testing from the XML writer and then maintain an Android
> version of libabigail as a more heavily-patched fork, to whatever extent
> is feasible. I would rather we find a good solution that works for all
> to get again close to upstream and not having to maintain such a fork.
>
> Yet, as an additional piece of assurance: the testing we have done does
> not only include kernels, but of course we heavily examined the
> libabigail test suite. Additionally, we maintain a large set of small
> test cases specifically created for ABI stability testing and to cover
> corner cases of all sorts. We are in the process of publishing those as
> well. So far, this has served as great input for this patch series as
> well.
>
> Does this make sense? What do you think?

If you don't really care about the potential type duplication in the
abixml as stated above, frankly, let's just get this patch in.

Are you okay with that?

Cheers,
  
Matthias Männich Jan. 17, 2022, 6:03 p.m. UTC | #4
Thanks Dodji for having a look and for sharing your thoughts! That is -
as always - very helpful to get a good full picture!

On Mon, Jan 10, 2022 at 06:00:04PM +0100, Dodji Seketeli wrote:
>Matthias Maennich <maennich@google.com> a écrit:
>
>[...]
>
>> If the XML writer considers two equivalent declaration-only types to be
>> different, one question to ask is: what is the real difference, that is,
>> how will this affect the outcome of abidiff?
>
>The problem is not necessarily at the abidiff level per say.
>
>The problem would be duplication of decl-only types in the abixml
>output, I think, and maybe infinite loops in those cases.  The infinite
>loops are easy to debug, though.  So I am not concerned about them.
>

Agreed there might be some duplication. See for example the commentary
about tests/data/test-read-dwarf/PR22122-libftdc.so.abi in PATCH 4/5.

This series is specifically to eliminate the risk of infinite loops in
the libabigail version we have downstream; and also to improve
performance. After these fixes there are some more changes that should
make infinite loops even less likely. In our case the infinite loops
only happened when using Clang's library (hash tables) and were not so
easy to debug!

>> If the types never change
>> (kind, name or declaration/definition status), nothing should ever be
>> reported. If a type does change... there are two possibilities: either
>> the types were really one type and now perhaps abidiff reports diffs for
>> the same name in two different ways; or the types were really two
>> different ones and abidiff has a simpler job. In my experience, abidiff
>> doesn't always report declaration-only/defined transitions. It doesn't
>> sound like there will be any really bad impact on diffs from having this
>> kind of duplication. However, if someone can come up with a test case of
>> the kind you mention, that would give some extra reassurance.
>
>The reason why I was pointing to this "general" issue is to make sure
>you are aware of this.  As type duplications in abixml was something you
>guys were tracking (and rightly so) I thought I'd point out that we
>still have the risk here.
>
>But because the type id map (writer_context::m_type_id_map) is not
>affected, the duplicated types will correctly be identified as such by
>the reader; thus I don't think abidiff is going to be affected.
>

Duplicates with different type ids could still appear after these
changes. But they should not hurt abidiff and may point to problems
earlier in the pipeline (even the compiler - we found a Clang bug during
the investigation).

Duplicates with the same type id can be conflicting or not conflicting.
Not conflicting is not ideal, but abidiff can handle this. Conflicting
means we have some problem interpreting the XML - which definition is
the right one?

PATCH 4/5 does indeed affect the type id map specifically so that we
avoid the risk of conflicting definitions.


>>>So maybe it would be better have an equality operator that uses
>>>is_non_canonicalized_type() to detect those rare cases and use
>>>structural comparison in those cases?
>>
>> That might come at higher cost than it is beneficial.
>
>I could not tell, as I don't necessarily have the right binaries at
>hand.  I trust you.

It is having the binaries but also the tool chain (the prebuilt clang
version that we use to build Android is very close to upstream releases,
but differences can be subtle - as always). Clang produces different
DWARF and has different bugs from GCC but the standard library is also
more sensitive to how unordered_map and unordered_set are used.

>
>>
>>>
>>>What do you think?
>>
>> For us specifically - building with clang and for our use cases - if we
>> keep structural equality of any kind then we need a hash function to go
>> along with this and, as we've sadly found out, this isn't working well
>> at the moment. We are currently on a bit dated version of libabigail for
>> our production use, but would like to close that gap again to come
>> closer to master.
>>
>> The risk of infinite loops and the reality of 30x slowdowns for certain
>> workloads mean we would need to apply these changes to remove structural
>> equality testing from the XML writer and then maintain an Android
>> version of libabigail as a more heavily-patched fork, to whatever extent
>> is feasible. I would rather we find a good solution that works for all
>> to get again close to upstream and not having to maintain such a fork.
>>
>> Yet, as an additional piece of assurance: the testing we have done does
>> not only include kernels, but of course we heavily examined the
>> libabigail test suite. Additionally, we maintain a large set of small
>> test cases specifically created for ABI stability testing and to cover
>> corner cases of all sorts. We are in the process of publishing those as
>> well. So far, this has served as great input for this patch series as
>> well.
>>
>> Does this make sense? What do you think?
>
>If you don't really care about the potential type duplication in the
>abixml as stated above, frankly, let's just get this patch in.
>
>Are you okay with that?

Yes. Though I think it's important you are somewhat happy with PATCH 4/5
as well as they go together.

Cheers,
Matthias

>
>Cheers,
>
>-- 
>		Dodji
  
Dodji Seketeli Jan. 18, 2022, 5:15 p.m. UTC | #5
Matthias Maennich <maennich@google.com> a écrit:

> This is a performance and safety improvement made possible by the
> previous changes which ensure that the same pointers are inserted and
> looked up.
>
> This essentially removes the now unnecessary deep comparison.
>
> 	* src/abg-writer.cc (type_ptr_set_type): Change typedef
> 	container type to use default equality and hashing for pointer
> 	keys.
> 	(fn_type_ptr_set_type): Likewise.
>
> Reviewed-by: Giuliano Procida <gprocida@google.com>
> Signed-off-by: Matthias Maennich <maennich@google.com>

Applied to master, thanks!

[...]

Cheers,
  

Patch

diff --git a/src/abg-writer.cc b/src/abg-writer.cc
index 4cf2a7fb16af..d2ef19eb7095 100644
--- a/src/abg-writer.cc
+++ b/src/abg-writer.cc
@@ -123,14 +123,10 @@  typedef unordered_map<type_base*,
 		      abigail::diff_utils::deep_ptr_eq_functor> type_ptr_map;
 
 // A convenience typedef for a set of type_base*.
-typedef unordered_set<const type_base*, type_hasher,
-		      abigail::diff_utils::deep_ptr_eq_functor>
-type_ptr_set_type;
+typedef std::unordered_set<const type_base*> type_ptr_set_type;
 
 /// A convenience typedef for a set of function type*.
-typedef unordered_set<function_type*, type_hasher,
-		      abigail::diff_utils::deep_ptr_eq_functor>
-fn_type_ptr_set_type;
+typedef std::unordered_set<function_type*> fn_type_ptr_set_type;
 
 typedef unordered_map<shared_ptr<function_tdecl>,
 		      string,