From patchwork Tue May 5 11:30:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dodji Seketeli X-Patchwork-Id: 39217 From: dodji@seketeli.org (Dodji Seketeli) Date: Tue, 05 May 2020 13:30:29 +0200 Subject: [RESEND PATCH] corpus/writer: sort emitted translation units by path name In-Reply-To: <20200430225800.28768-1-maennich@google.com> (Matthias Maennich's message of "Fri, 1 May 2020 00:58:01 +0200") References: <20200430225800.28768-1-maennich@google.com> Message-ID: <87k11q8vqi.fsf@seketeli.org> Hello Matthias, Thank you for working on this! Matthias Maennich a ?crit: > By sorting the corpora output, we achieve determinism for the emitted > XML file across multiple runs of abidw. > > For that to happen, change the collection of translation units to a > std::set (instead of std::vector), sorted by absolute path name. We were using a vector so if the analyzed binary itself didn't change i.e, if the order of the translation units described in the DWARF didn't change, several runs of abidw on that same binary should not have changed. Right? I other words, we were seeing changes in the ordering of translation units because the binary itself (its DWARF at least) changed. If that is right, then I think that even with this change, we still can see non deterministic output in the ordering of type IDs inside a given translation unit (TU) if the order of the definition of types and declarations inside that TU changes. But this is just a side note to raise awareness about this matter. This work is an improvement. And as such I think it's perfectly worth getting in. I have thus applied it to master, with some little changes that I am discussing below. [...] > --- a/include/abg-ir.h > +++ b/include/abg-ir.h > @@ -33,6 +33,7 @@ > #include > #include > #include > +#include > #include "abg-cxx-compat.h" > #include "abg-fwd.h" > #include "abg-hash.h" > @@ -707,6 +708,20 @@ public: > translation_unit& tu); > };//end class translation_unit > > +struct SharedTranslationUnitComparator > +{ In libabigail code base, we try to not use camel case for type and function names. The only exception to that is for template parameters, I believe. So I changed this to shared_translation_unit_comp, to comply with how the other comparator functors are named. I have also added some comments to that type. > + bool > + operator()(const translation_unit_sptr& lhs, > + const translation_unit_sptr& rhs) const > + { I have added a comment for this too. > + return lhs->get_absolute_path() < rhs->get_absolute_path(); > + } > +}; > + > +/// Convenience typedef for a vector of @ref translation_unit_sptr. I changed vector into set here. > +typedef std::set > + translation_units; > + > string > translation_unit_language_to_string(translation_unit::language); [...] > > * include/abg-fwd.h: remove translation_units fwd declaration. > * include/abg-ir.h: add SharedTranslationUnitComparator and add > translation_units fwd declaration as a set using said comparator. > * src/abg-corpus.cc (corpus::add): do checked insert into the > translation_units set (rather than vector::pushback) > * tests/data/test-annotate/test13-pr18894.so.abi: Adjust test data. > * tests/data/test-annotate/test14-pr18893.so.abi: Likewise. > * tests/data/test-annotate/test15-pr18892.so.abi: Likewise. > * tests/data/test-annotate/test17-pr19027.so.abi: Likewise. > * tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi: Likewise. > * tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi: Likewise. > * tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi: Likewise. > * tests/data/test-annotate/test21-pr19092.so.abi: Likewise. > I have adjusted this ChangeLog according to the changes above. [...] Matthias Maennich a ?crit: > On Fri, May 01, 2020 at 03:35:06PM +0100, Giuliano Procida wrote: >>Hi. >> >>I had a patch which sorted cus in read_debug_info_into_corpus but I >>had some concens. This was ages ago. >> >>Anyway... >> >>On Thu, 30 Apr 2020 at 23:58, Matthias Maennich wrote: >>> >>> By sorting the corpora output, we achieve determinism for the emitted >>> XML file across multiple runs of abidw. >>> >>> For that to happen, change the collection of translation units to a >>> std::set (instead of std::vector), sorted by absolute path name. >> >>Would it be excessive paranoia to use a multiset instead? >>Or keep the vector and sort it with the comparator? >> >>This would eliminate an ABG_ASSERT. > > This assert kicks in if we see a translation unit with the same name > twice. While there might be scenarios where the same relative name is > valid, that should still not happen. Hence my assumption that this > should never happen and we can assert there. Allowing multiple values > with the same key likely introduces issues downstream when working on > the set of translation units. I agree. In general, I like having asserts that express a systemic invariant like in this case. They are invaluable in detecting inconsistency in the state of the system. >> >>> Test data needed adjustments, but changes are fully compatible. >>> >>> * include/abg-fwd.h: remove translation_units fwd declaration. >>> * include/abg-ir.h: add SharedTranslationUnitComparator and add >>> translation_units fwd declaration as a set using said comparator. >>> * src/abg-corpus.cc (corpus::add): do checked insert into the >>> translation_units set (rather than vector::pushback) >> >>s/pushback/push_back/ > > Dodji, could you correct this if you agree with the patch and decide to > apply it? I have corrected that, thanks. [...] [...] >>> +++ b/include/abg-ir.h [...] >>> +/// Convenience typedef for a vector of @ref translation_unit_sptr. >> >>vector / set > > Here as well. Done. Please find below the patch that I have applied. I am leaving out the tests adjustment part because of its size. Just for the record, I have applied the patch from the mm/sort-tu-by-name branch and this is what I have applied exactly: d5bdebdb corpus/writer: sort emitted translation units by path name ------------------------->8<-------------------------------------------- commit 246ca2004930b52b2d7e73f2ba9faa7bcbcd3999 Author: Matthias Maennich Date: Sun Jan 12 21:05:33 2020 +0000 corpus/writer: sort emitted translation units by path name By sorting the corpora output, we achieve determinism for the emitted XML file across multiple runs of abidw. For that to happen, change the collection of translation units to a std::set (instead of std::vector), sorted by absolute path name. Test data needed adjustments, but changes are fully compatible. * include/abg-fwd.h: remove translation_units fwd declaration. * include/abg-ir.h (struct shared_translation_unit_comparator): Define new class. (translation_units): Define new typedef. * src/abg-corpus.cc (corpus::add): do checked insert into the translation_units set (rather than vector::push_back) * tests/data/test-annotate/test13-pr18894.so.abi: Adjust test data. * tests/data/test-annotate/test14-pr18893.so.abi: Likewise. * tests/data/test-annotate/test15-pr18892.so.abi: Likewise. * tests/data/test-annotate/test17-pr19027.so.abi: Likewise. * tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi: Likewise. * tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi: Likewise. * tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi: Likewise. * tests/data/test-annotate/test21-pr19092.so.abi: Likewise. Signed-off-by: Matthias Maennich Signed-off-by: Dodji Seketeli ------------------------->8<-------------------------------------------- Cheers, diff --git a/include/abg-fwd.h b/include/abg-fwd.h index 1aab70a6..cffa3f0d 100644 --- a/include/abg-fwd.h +++ b/include/abg-fwd.h @@ -119,8 +119,6 @@ class translation_unit; /// Convenience typedef for a shared pointer on a @ref /// translation_unit type. typedef shared_ptr translation_unit_sptr; -/// Convenience typedef for a vector of @ref translation_unit_sptr. -typedef std::vector translation_units; /// Convenience typedef for a map that associates a string to a /// translation unit. typedef unordered_map string_tu_map_type; diff --git a/include/abg-ir.h b/include/abg-ir.h index fda10de5..4b9626df 100644 --- a/include/abg-ir.h +++ b/include/abg-ir.h @@ -33,6 +33,7 @@ #include #include #include +#include #include "abg-cxx-compat.h" #include "abg-fwd.h" #include "abg-hash.h" @@ -707,6 +708,28 @@ public: translation_unit& tu); };//end class translation_unit +/// A comparison functor to compare translation units based on their +/// absolute paths. +struct shared_translation_unit_comp +{ + /// Compare two translations units based on their absolute paths. + /// + /// @param lhs the first translation unit to consider for the + /// comparison. + /// + /// @param rhs the second translatin unit to consider for the + /// comparison. + bool + operator()(const translation_unit_sptr& lhs, + const translation_unit_sptr& rhs) const + {return lhs->get_absolute_path() < rhs->get_absolute_path();} +}; // end struct shared_translation_unit_comp + +/// Convenience typedef for an ordered set of @ref +/// translation_unit_sptr. +typedef std::set translation_units; + string translation_unit_language_to_string(translation_unit::language); diff --git a/src/abg-corpus.cc b/src/abg-corpus.cc index 2e9f304d..1ca46086 100644 --- a/src/abg-corpus.cc +++ b/src/abg-corpus.cc @@ -560,7 +560,8 @@ corpus::add(const translation_unit_sptr tu) ABG_ASSERT(tu->get_environment() == get_environment()); - priv_->members.push_back(tu); + ABG_ASSERT(priv_->members.insert(tu).second); + if (!tu->get_absolute_path().empty()) { // Update the path -> translation_unit map.