[0/4,applied] Speed up type DIEs canonicalization

Message ID 87tu525ngh.fsf@seketeli.org
Headers
Series Speed up type DIEs canonicalization |

Message

Dodji Seketeli Sept. 20, 2022, 10:50 a.m. UTC
  Hello,

This patch set is a response to the problem entitled "abidw
performance regression on vmlinux", reported at
https://sourceware.org/bugzilla/show_bug.cgi?id=29464.

There was a first version of this patch sent to
https://inbox.sourceware.org/libabigail/87o7vsizdl.fsf@seketeli.org/T/#t.

This patch addresses the comments that were generously submitted in
the problem report.

In that problem report, "abidw --noout vmlinux" would take more than
hour.

Before that performance regression, type DIEs canonicalization by the
DWARF reader was fast, because it was wrong, basically.  During
canonicalization, the reader was using a speed optimization called
"canonical type propagation".  I argue that the optimization was
beeing done too eagerly.  Meaning it was being done even in cases
where it should not have been, often leading to wrong (but very fast)
outcomes.

This patch was:

    commit 7ecef6361799326b99129a479b43b138f0b237ae
    Author: Dodji Seketeli <dodji@redhat.com>
    Date:   Mon Apr 4 15:35:48 2022 +0200

	Canonicalize DIEs w/o assuming ODR & handle typedefs transparently

That patch was mostly meant to affect only C++, Ada and Java binaries.
But then, the patch stopped the reader from doing canonical type
propagation on types that are not aggregates (pointers, typedefs
etc).  Doing type propagation on types that are not aggregate is wrong
because that can lead to cases where two "pointers to aggregate" could
be considered equal where in the end, the aggregates are different.
This can happen for instance on aggregates that are detected as redundant.

Please note that these terms (redundant, canonical type propagation,
etc) are defined in the introductory comments of the first patches
below.

Anyway, that patch restricted canonicalization and canonical type
propagation to aggregate types.  It also restricted the amount
canonical type propagation performed because there are cases where
these should not be performed.

I believe that these changes lead the many more type DIEs comparison
being performed, even for C programs.  On libraries where there tend
to be less deep type trees, comparison time didn't explode.  But on
some applications like the Linux Kernel with deep type trees
containing lots of redundant type the number of DIEs comparison
exploded.

The first patch of the series below tracks all the types that are
subject to the canonical type propagation optimization and tries to
apply that optimization to a maximum of types.  That makes comparisons
be as fast as possible as many types now have a canonical type.  But
whenever it detects that the optimization should not have taken place,
it cancels it on all the types where it happened.  This is like what
we already do when canonicalyzing IR types.

This brought comparison times of vmlinux below 5 minutes (from more
than hour).

The second patch restricts the analysis of DIEs to the interfaces that
have exported symbols.  Otherwise, several other kinds of interfaces
where being analyzed at the DIE level.  It was just later that only
the IR of the exported interfaces where "chosen" to be put into the
final ABI Corpus.  That lead to potentially "too many" DIEs being
analyzed in the first place, leading to many comparisons in the case
of huge applications like vmlinux.  I believe this brings the time to
under a minute or so when analyzing vmlinux.

That second patch thus introduces a new "--exported-interfaces-only"
option supported by the tools abidw, abidiff, abipkgdiff, kmidiff,
etc.  That option triggers the new mode where only exported interfaces
are analyzed at the DIE level.  Note that when looking at the Linux
Kernel, that option is enabled by default.

It also introduces a new "--allow-non-exported-interfaces" for those
tools.  Note that this option is enabled by default when looking at
any binary that is not the Linux Kernel.

The last two patches are additions to the initial series.

Third one fixes issues that were latent in the way we were caching the
result of some comparisons at the IR level.  These issues are
uncovered by the first patch.

The last patch is a return to canonicalizing typedefs in the IR.  I
tried to stop canonicalizing typedefs before to give a change to the
comparison engine to avoid detecting harmless type changes due to
typedef name changes.  It turned out not canonicalizing typedefs
causes more harm than good as it makes comparing and addressing them
(in the abixml writer for instance) much harder and slower.  So I am
reverting back to square one and I am relying on subsequent passes on
the final diff graph to mark diff nodes as being harmless, leaving
leeway to reporters to handle that information at reporting time.

I am applying the patch set to master as the discussion on the problem
was really fruitful.  Many thanks to Giuliano Procida for the feedback.

Dodji Seketeli (4):
  dwarf-reader: Revamp the canonical type DIE propagation algorithm
  Allow restricting analyzed decls to exported symbols
  Fix IR comparison result caching and canonical type propagation tracking
  ir, writer: Go back to canonicalizing typedefs in the IR

 doc/manuals/Makefile.am                       |     3 +-
 doc/manuals/abidiff.rst                       |    52 +
 doc/manuals/abidw.rst                         |    82 +-
 doc/manuals/abipkgdiff.rst                    |    51 +
 doc/manuals/kmidiff.rst                       |    52 +-
 doc/manuals/tools-use-libabigail.txt          |    16 +
 include/abg-fwd.h                             |     5 +
 include/abg-ir.h                              |    67 +-
 src/abg-dwarf-reader.cc                       |  1514 +-
 src/abg-ir-priv.h                             |    64 +-
 src/abg-ir.cc                                 |   552 +-
 src/abg-reader.cc                             |     3 +
 src/abg-writer.cc                             |   166 +-
 .../test-abidiff/test-PR18791-report0.txt     |   238 +-
 tests/data/test-annotate/libtest23.so.abi     |   724 +-
 .../test-annotate/libtest24-drop-fns-2.so.abi |   804 +-
 .../test-annotate/libtest24-drop-fns.so.abi   |   804 +-
 .../test-anonymous-members-0.o.abi            |    96 +-
 tests/data/test-annotate/test0.abi            |     4 +-
 tests/data/test-annotate/test1.abi            |    32 +-
 .../data/test-annotate/test13-pr18894.so.abi  |  2322 +-
 .../data/test-annotate/test14-pr18893.so.abi  | 18390 ++---
 .../data/test-annotate/test15-pr18892.so.abi  | 62714 ++++++++--------
 .../data/test-annotate/test17-pr19027.so.abi  | 46135 ++++++------
 ...st18-pr19037-libvtkRenderingLIC-6.1.so.abi | 18052 +++--
 ...19-pr19023-libtcmalloc_and_profiler.so.abi | 49442 ++++++------
 tests/data/test-annotate/test2.so.abi         |    38 +-
 ...st20-pr19025-libvtkParallelCore-6.1.so.abi | 30100 ++++----
 .../data/test-annotate/test21-pr19092.so.abi  |  7608 +-
 .../PR25058-liblttng-ctl-report-1.txt         |     7 +-
 .../test42-PR21296-clanggcc-report0.txt       |     6 +
 .../test31-pr18535-libstdc++-report-0.txt     |    31 +-
 .../test31-pr18535-libstdc++-report-1.txt     |    31 +-
 .../data/test-diff-filter/test41-report-0.txt |    30 +-
 ...x86_64--2.24.2-30.fc30.x86_64-report-0.txt |     2 +-
 .../PR24690/PR24690-report-0.txt              |     2 +-
 .../nss-3.23.0-1.0.fc23.x86_64-report-0.txt   |     6 +-
 ...l7.x86_64-0.12.8-1.el7.x86_64-report-2.txt |   113 +-
 ...bb-4.3-3.20141204.fc23.x86_64-report-0.txt |    47 +-
 ...bb-4.3-3.20141204.fc23.x86_64-report-1.txt |    13 +-
 .../PR22015-libboost_iostreams.so.abi         |  2858 +-
 .../test-read-dwarf/PR22122-libftdc.so.abi    | 16923 ++++-
 .../data/test-read-dwarf/PR25007-sdhci.ko.abi | 14620 ++--
 .../PR25042-libgdbm-clang-dwarf5.so.6.0.0.abi |   642 +-
 .../test-read-dwarf/PR26261/PR26261-exe.abi   |    10 +-
 .../test-read-dwarf/PR27700/test-PR27700.abi  |     2 +-
 tests/data/test-read-dwarf/libtest23.so.abi   |   626 +-
 .../libtest24-drop-fns-2.so.abi               |   702 +-
 .../test-read-dwarf/libtest24-drop-fns.so.abi |   702 +-
 .../data/test-read-dwarf/test-PR26568-1.o.abi |    16 +-
 .../data/test-read-dwarf/test-PR26568-2.o.abi |    22 +-
 .../test-read-dwarf/test-libaaudio.so.abi     |   238 +-
 .../test-read-dwarf/test-libandroid.so.abi    | 56274 +++++++-------
 tests/data/test-read-dwarf/test0.abi          |     2 +-
 tests/data/test-read-dwarf/test0.hash.abi     |     2 +-
 tests/data/test-read-dwarf/test1.abi          |    26 +-
 tests/data/test-read-dwarf/test1.hash.abi     |     6 +-
 .../test-read-dwarf/test10-pr18818-gcc.so.abi |  7990 +-
 .../test-read-dwarf/test11-pr18828.so.abi     | 23566 +++---
 .../test-read-dwarf/test12-pr18844.so.abi     | 33361 ++++----
 .../test-read-dwarf/test13-pr18894.so.abi     |  2090 +-
 .../test-read-dwarf/test14-pr18893.so.abi     | 15146 ++--
 .../test-read-dwarf/test15-pr18892.so.abi     | 27826 +++----
 .../test-read-dwarf/test16-pr18904.so.abi     | 41341 +++++-----
 .../test-read-dwarf/test17-pr19027.so.abi     | 24602 +++---
 ...st18-pr19037-libvtkRenderingLIC-6.1.so.abi | 14733 ++--
 ...19-pr19023-libtcmalloc_and_profiler.so.abi | 22933 +++---
 tests/data/test-read-dwarf/test2.so.abi       |    34 +-
 tests/data/test-read-dwarf/test2.so.hash.abi  |     4 +-
 ...st20-pr19025-libvtkParallelCore-6.1.so.abi | 21807 +++---
 .../test-read-dwarf/test21-pr19092.so.abi     |  6494 +-
 .../test22-pr19097-libstdc++.so.6.0.17.so.abi | 60456 +++++++--------
 .../test9-pr18818-clang.so.abi                |  5013 +-
 tests/data/test-read-write/test18.xml         |     2 +-
 .../test28-without-std-fns-ref.xml            |   860 +-
 .../test28-without-std-vars-ref.xml           |   780 +-
 tools/abidiff.cc                              |    12 +
 tools/abidw.cc                                |    15 +
 tools/abipkgdiff.cc                           |    21 +
 tools/kmidiff.cc                              |    12 +
 80 files changed, 326407 insertions(+), 316780 deletions(-)
 create mode 100644 doc/manuals/tools-use-libabigail.txt
  

Comments

Dodji Seketeli Sept. 20, 2022, 11:27 a.m. UTC | #1
Hello,

During the type DIEs comparison for canonicalization we need to
perform the optimization called "canonical type propagation".  This is
very similar to what we do for type canonicalization at the IR level.
It's described in-extenso in abg-ir.cc, in the comment that starts
with "@defgroup OnTheFlyCanonicalization".

Basically during canonicalization, suppose we end-up comparing a
sub-type pair {Sl, Sr}.  Suppose Sr does already have a canonical
type.  Suppose also, that {Sl, Sr} ends up comparing equal.  We can
thus deduce that Sl would have the same canonical type as Sr.  So we
'propagate' the canonical type of Sr onto Sl.  Sl is said to have been
canonical-type-propagated from Sr.

Now, I need to introduce the concept of redundant type.

Consider an aggregate type T and its sub-types ST0, ST1 and ST2 which
looks like this:

	T
	+-- ST0
	|
	+-- ST1
	|    +
	|    |
	|    +-- T
	|
	+-- ST2

Notice how the sub-type ST1 itself has a sub-type T.

Now, consider the type T', similar to T, which looks like:

	T'
	+-- ST0'
	|
	+-- ST1'
	|    +
	|    |
	|    +-- T'
	|
	+-- ST2'
	|
	+-- ST'3

Now consider how libabigail compares those two types T and T'
member-wise, aka structurally:

		T                 T'
		+-- ST0           +-- ST0'
		|                 |
		+-- ST1           +-- ST1' <--- Let's call this point #0.
		|    +            |    +
		|    |            |    |
		|    +-- T        |    +-- T'  <--- Let's call this point #1.
		|                 |                 Note that the sub-types of
		+-- ST2           +-- ST2'          T and T' are not
		|                                   represented at this point
		+-- ST'3                            but they do exist!
						    Representing them would lead
						    to a never-ending graph, right?

The properties of T are compared against the properties of T', but to
complete that comparison, the sub-type ST0 is compared against ST0',
etc.  A comparison stack is thus maintained.  Each member of the stack
is the pair of sub-types being compared.  That content changes as
different sub-types are compared. Let's consider the content of the
stack when we reach the point #0 in the graph above. That stack
content would look like:

	[{T,T'} | {ST0, ST0'}]

If we keep going and reach the point #1, the content of the stack
becomes:

	[{T,T'} |{ST0, ST0'} | {T, T'}]

At this point, we see that {T,T'} appears twice in the stack. If we
keep going and explore again the sub-types of {T,T'}, an infinite loop
appears.  The pair {T,T'} is said to be redundant.

To avoid those infinite loops, redundancy detection is done by
compare_dies in abg-dwarf-reader.cc.  When compare_dies detects at #1 that
{T,T'} was already present in the stack then it returns "true" early,
as if in #1, T compares equal to T'.

But then what does that mean for the value of comparing {ST0,ST0'}?
The sub-type {T,T'} in #1 compared equal, so compare_dies assumes that
{ST0,ST0'} compares equal as well, right?  That is what libabigail
used to assume before the commit

	commit 7ecef6361799326b99129a479b43b138f0b237ae
	Author: Dodji Seketeli <dodji@redhat.com>
	Date:   Mon Apr 4 15:35:48 2022 +0200

	    Canonicalize DIEs w/o assuming ODR & handle typedefs transparently

Well, it turns out we need to compare the entire tree of sub-types of
{T,T'} until we reach {ST2, ST2'}.  At that point, compare_dies sees
that ST2' has a sub-type ST3' where ST2 has none.  So {ST2,ST2'}
compares /different/.  So {T,T'} compares different.  So back to #0,
because {ST1,ST2'} has {T,T'} as sub-type (or said differently,
{ST1,ST2'} depends on the redundant pair {T,T'}) then {ST1,ST1'}
compares different.

So {ST1,ST1'} compares different even though it were initially thought to
compare equal because compare_dies had to get out early in #1,
considering that {T,T'} compared equal at that point.

Now, there are two ways to operate when we reach #1:

1/ Avoid performing canonical type propagation as soon as we detect
that {T,T'} is redundant.  That avoidance lasts until we finish
comparing {ST2,ST2'}, that is, until the complete comparison of
{T,T'}.  That means hat comparing every single sub-type is almost
assured to be done structurally, rather than canonically.

2/ Speculate that {T,T'} compare equal, unless proved otherwise.  If
{T,T'} proves to be equal, then things are well and fast.  If they
prove different, then {ST0,ST0'} must be edited afterwards to cancel
the canonical type propagation that might have taken place.  In other
words, sub-types that depends on redundant types pairs must be tracked
until the comparison of those redundant type pairs is done.  If a
redundant type pair compares different then all the sub-types that
depend on it must be walked to have their propagated canonical types
erase as if no canonical type propagation ever took place.

The first approach is the one I initially put in place with the patch
referred to above.  It proved to be super slow.  Analyzing the kernel
was taking literally hours.  Oops.

This patch implements the second approach.  It's more involved than
the first approach.  But it brings the time down to around 2 minutes
on a reasonably fast machine.  This is still slower than what I would
like to see, but it's way better than what had with the first
approach.

A subsequent patch will bring the analysis time for the kernel further
down.  But in the mean time, this one is really needed, I think.

So the patch introduces a new type named offset_pairs_stack_type to
track the pairs of type DIEs being compared.  Each DIE is now
identified by a new offset_type type.  That type  contains the
traditional DIE offset using the Dwarf_Off type from elfutils, but it
also contains the source of that DIE.  The offset_pairs_stack_type
also tracks redundant type pairs and their dependant type pairs.

compare_dies is modified to return a value that is an instance of the
new enum comparison_result.  The results can be
COMPARISON_RESULT_EQUAL when a type pair compares equal,
COMPARISON_RESULT_DIFFERENT when a type pair compares different,
COMPARISON_RESULT_CYCLE_DETECTED when a redundant type is detected,
leading to a comparison cycle, and COMPARISON_RESULT_UNKNOWN when the
outcome of the comparison cannot be known because we are comparing a
pair that depends on a redundant pair.

A new function return_comparison_result is introduced.  It's intended
to be invoked right before returning from compare_dies.  It looks at
the actual comparison result and depending on the state of the stack
of type pairs being compared, handles the book keeping of redundant
types and their dependant types.  It also handles when to propagate
canonical types if necessary and when to cancel the canonical types
that might have been propagated.

The ABG_RETURN macro has been adapted to invoke
return_comparison_result before returning out from compare_dies.

	* src/abg-ir-priv.h (enum comparison_result): Define new enum.
	* src/abg-dwarf-reader.cc (type_comparison_result_to_be_cached)
	(maybe_cache_type_comparison_result)
	(get_cached_type_comparison_result)
	(maybe_get_cached_type_comparison_result)
	(maybe_propagate_canonical_type, propagate_canonical_type)
	(return_comparison_result): Define new static functions.
	(has_offset_pair, insert_offset_pair, erase_offset_pair)
	(have_offset_pair_in_common): Remove static functions.
	(read_context::die_comparison_visits_): Remove data member.  The
	concept supported by this data member is now replaced by caching
	the results of comparing aggregate types, especially those that
	are not yet canonicalized.  This essentially prevents the same
	aggregate type pair to be compared again and again.
	(read_context::{die_comparison_results_, compare_count_,
	canonical_propagated_count_, cancelled_propagation_count_}): New
	data members.
	(read_context::initialize): Initialize the new data members
	compare_count_, canonical_propagated_count_,
	cancelled_propagation_count_ of integral type.
	(read_context::{erase_canonical_die_offset}): New member
	functions.
	(struct offset_pairs_stack_type): Define new type.
	(die_offset): Remove.
	(is_canon_type_to_be_propagated_tag): Add union types to the set
	of types for which canonical type propagation might occur.
	(is_type_die_to_be_canonicalized): Add function types and array
	types to the types to be canonicalized.
	(ABG_RETURN): Change this macro to consider
	COMPARISON_RESULT_DIFFERENT rather  than the "false" boolean.
	Also, it uses the new return_comparison_result function.
	(ABG_RETURN_FALSE): Likewise, use the new return_comparison_result
	function.
	(SET_RESULT_TO_FALSE): Make this return
	COMPARISON_RESULT_DIFFERENT.
	(SET_RESULT_TO, RETURN_IF_COMPARISON_CYCLE_DETECTED): Define new
	macros.
	(compare_dies): Make this return comparison_result rather than
	just a bool.  This is also the core of the overhaul of the
	canonical DIE propagation algorithm.  The algorithm is now similar
	to the one implemented in the equals function for class_or_union
	types in abg-ir.cc.  It's described in the comment that starts
	with '@defgroup OnTheFlyCanonicalization' in abg-ir.cc.  The type
	of the aggregates_being_compared parameter is now
	offset_pairs_stack_type in parameter.  No more need for the
	redundant_aggregates_being_compared parameter.  The new
	offset_type that also encapsulates the source of the offset is now
	used in lieu of the Dwarf_Off type.  Results of comparing
	aggregates being compared are now cached.  When comparing
	aggregates, use the RETURN_IF_COMPARISON_CYCLE_DETECTED to return
	early if a cycle is detected.  The invocation of the ABG_RETURN
	macro (especially the call to return_comparison_result) is where
	the book keeping for canonical types propagation takes place, so
	remove the explicit code that was handling that from the end of
	this function.
	(read_debug_info_into_corpus): Print statistics about the number
	of aggregates compared and canonical-type-propagated.
	* tests/data/test-annotate/test14-pr18893.so.abi: Adjust.
	* tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
	* tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
	* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi:
	Likewise.
	* tests/data/test-diff-pkg/GtkAda-gl-2.24.2-29.fc29.x86_64--2.24.2-30.fc30.x86_64-report-0.txt:
	Likewise.
	* tests/data/test-read-dwarf/PR22122-libftdc.so.abi: Likewise.
	* tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
	* tests/data/test-read-dwarf/test14-pr18893.so.abi: Likewise.
	* tests/data/test-read-dwarf/test15-pr18892.so.abi: Likewise.
	* tests/data/test-read-dwarf/test16-pr18904.so.abi: Likewise.
	* tests/data/test-read-dwarf/test17-pr19027.so.abi: Likewise.
	* tests/data/test-read-dwarf/test19-pr19023-libtcmalloc_and_profiler.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test20-pr19025-libvtkParallelCore-6.1.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test22-pr19097-libstdc++.so.6.0.17.so.abi:
	Likewise.
	* tests/data/test-read-dwarf/test9-pr18818-clang.so.abi: Likewise.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-dwarf-reader.cc                       |  1377 +-
 src/abg-ir-priv.h                             |     9 +
 .../data/test-annotate/test14-pr18893.so.abi  |    52 +-
 .../data/test-annotate/test15-pr18892.so.abi  | 11386 ++++++++--------
 .../data/test-annotate/test17-pr19027.so.abi  |  1224 +-
 ...19-pr19023-libtcmalloc_and_profiler.so.abi | 11323 +++++++--------
 ...st20-pr19025-libvtkParallelCore-6.1.so.abi |  7194 +++++-----
 ...x86_64--2.24.2-30.fc30.x86_64-report-0.txt |     2 +-
 .../test-read-dwarf/PR22122-libftdc.so.abi    |   893 +-
 .../test-read-dwarf/test-libandroid.so.abi    |     8 +
 .../test-read-dwarf/test14-pr18893.so.abi     |    38 +-
 .../test-read-dwarf/test15-pr18892.so.abi     | 10901 ++++++++-------
 .../test-read-dwarf/test16-pr18904.so.abi     |   676 +-
 .../test-read-dwarf/test17-pr19027.so.abi     |  1216 +-
 ...19-pr19023-libtcmalloc_and_profiler.so.abi | 11153 +++++++--------
 ...st20-pr19025-libvtkParallelCore-6.1.so.abi |  7164 +++++-----
 .../test22-pr19097-libstdc++.so.6.0.17.so.abi |    32 +-
 .../test9-pr18818-clang.so.abi                |   127 +-
 18 files changed, 33405 insertions(+), 31370 deletions(-)

The patch is too big for the list so I am attaching it gzipped.