[RESEND] corpus/writer: sort emitted translation units by path name

Message ID 87k11q8vqi.fsf@seketeli.org
State Committed
Headers
Series [RESEND] corpus/writer: sort emitted translation units by path name |

Commit Message

Dodji Seketeli May 5, 2020, 11:30 a.m. UTC
  Hello Matthias,

Thank you for working on this!

Matthias Maennich <maennich@google.com> a ?crit:

> By sorting the corpora output, we achieve determinism for the emitted
> XML file across multiple runs of abidw.
>
> For that to happen, change the collection of translation units to a
> std::set (instead of std::vector), sorted by absolute path name.

We were using a vector so if the analyzed binary itself didn't change
i.e, if the order of the translation units described in the DWARF didn't
change, several runs of abidw on that same binary should not have
changed.  Right?  I other words, we were seeing changes in the ordering
of translation units because the binary itself (its DWARF at least)
changed.

If that is right, then I think that even with this change, we still can
see non deterministic output in the ordering of type IDs inside a given
translation unit (TU) if the order of the definition of types and
declarations inside that TU changes.

But this is just a side note to raise awareness about this matter.

This work is an improvement.  And as such I think it's perfectly worth
getting in.

I have thus applied it to master, with some little changes that I am
discussing below.

[...]

> --- a/include/abg-ir.h
> +++ b/include/abg-ir.h
> @@ -33,6 +33,7 @@
>  #include <assert.h>
>  #include <stdint.h>
>  #include <cstdlib>
> +#include <set>
>  #include "abg-cxx-compat.h"
>  #include "abg-fwd.h"
>  #include "abg-hash.h"
> @@ -707,6 +708,20 @@ public:
>  					translation_unit& tu);
>  };//end class translation_unit
>  
> +struct SharedTranslationUnitComparator
> +{

In libabigail code base, we try to not use camel case for type and
function names.  The only exception to that is for template parameters,
I believe.  So I changed this to shared_translation_unit_comp, to comply
with how the other comparator functors are named.  I have also added
some comments to that type.

> +  bool
> +  operator()(const translation_unit_sptr& lhs,
> +	     const translation_unit_sptr& rhs) const
> +  {

I have added a comment for this too.

> +    return lhs->get_absolute_path() < rhs->get_absolute_path();
> +  }
> +};
> +
> +/// Convenience typedef for a vector of @ref translation_unit_sptr.

I changed vector into set here.

> +typedef std::set<translation_unit_sptr, SharedTranslationUnitComparator>
> +    translation_units;
> +
>  string
>  translation_unit_language_to_string(translation_unit::language);

[...]

>
> 	* include/abg-fwd.h: remove translation_units fwd declaration.
> 	* include/abg-ir.h: add SharedTranslationUnitComparator and add
> 	translation_units fwd declaration as a set using said comparator.
> 	* src/abg-corpus.cc (corpus::add): do checked insert into the
> 	translation_units set (rather than vector::pushback)
> 	* tests/data/test-annotate/test13-pr18894.so.abi: Adjust test data.
> 	* tests/data/test-annotate/test14-pr18893.so.abi: Likewise.
> 	* tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
> 	* tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
> 	* tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi: Likewise.
> 	* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi: Likewise.
> 	* tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi: Likewise.
> 	* tests/data/test-annotate/test21-pr19092.so.abi: Likewise.
>

I have adjusted this ChangeLog according to the changes above.

[...]



Matthias Maennich <maennich@google.com> a ?crit:

> On Fri, May 01, 2020 at 03:35:06PM +0100, Giuliano Procida wrote:
>>Hi.
>>
>>I had a patch which sorted cus in read_debug_info_into_corpus but I
>>had some concens. This was ages ago.
>>
>>Anyway...
>>
>>On Thu, 30 Apr 2020 at 23:58, Matthias Maennich <maennich@google.com> wrote:
>>>
>>> By sorting the corpora output, we achieve determinism for the emitted
>>> XML file across multiple runs of abidw.
>>>
>>> For that to happen, change the collection of translation units to a
>>> std::set (instead of std::vector), sorted by absolute path name.
>>
>>Would it be excessive paranoia to use a multiset instead?
>>Or keep the vector and sort it with the comparator?
>>
>>This would eliminate an ABG_ASSERT.
>
> This assert kicks in if we see a translation unit with the same name
> twice. While there might be scenarios where the same relative name is
> valid, that should still not happen. Hence my assumption that this
> should never happen and we can assert there. Allowing multiple values
> with the same key likely introduces issues downstream when working on
> the set of translation units.

I agree.  In general, I like having asserts that express a systemic
invariant like in this case.  They are invaluable in detecting
inconsistency in the state of the system.

>>
>>> Test data needed adjustments, but changes are fully compatible.
>>>
>>>         * include/abg-fwd.h: remove translation_units fwd declaration.
>>>         * include/abg-ir.h: add SharedTranslationUnitComparator and add
>>>         translation_units fwd declaration as a set using said comparator.
>>>         * src/abg-corpus.cc (corpus::add): do checked insert into the
>>>         translation_units set (rather than vector::pushback)
>>
>>s/pushback/push_back/
>
> Dodji, could you correct this if you agree with the patch and decide to
> apply it?

I have corrected that, thanks.

[...]


[...]

>>> +++ b/include/abg-ir.h

[...]

>>> +/// Convenience typedef for a vector of @ref translation_unit_sptr.
>>
>>vector / set
>
> Here as well.

Done.

Please find below the patch that I have applied.  I am leaving out the
tests adjustment part because of its size.  Just for the record, I have
applied the patch from the mm/sort-tu-by-name branch and this is what I
have applied exactly:

           d5bdebdb corpus/writer: sort emitted translation units by path name

------------------------->8<--------------------------------------------
commit 246ca2004930b52b2d7e73f2ba9faa7bcbcd3999
Author: Matthias Maennich <maennich@google.com>
Date:   Sun Jan 12 21:05:33 2020 +0000

    corpus/writer: sort emitted translation units by path name
    
    By sorting the corpora output, we achieve determinism for the emitted
    XML file across multiple runs of abidw.
    
    For that to happen, change the collection of translation units to a
    std::set (instead of std::vector), sorted by absolute path name.
    
    Test data needed adjustments, but changes are fully compatible.
    
            * include/abg-fwd.h: remove translation_units fwd declaration.
            * include/abg-ir.h (struct shared_translation_unit_comparator):
            Define new class.
            (translation_units): Define new typedef.
            * src/abg-corpus.cc (corpus::add): do checked insert into the
            translation_units set (rather than vector::push_back)
            * tests/data/test-annotate/test13-pr18894.so.abi: Adjust test data.
            * tests/data/test-annotate/test14-pr18893.so.abi: Likewise.
            * tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
            * tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
            * tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi: Likewise.
            * tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi: Likewise.
            * tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi: Likewise.
            * tests/data/test-annotate/test21-pr19092.so.abi: Likewise.
    
    Signed-off-by: Matthias Maennich <maennich@google.com>
    Signed-off-by: Dodji Seketeli <dodji@redhat.com>

------------------------->8<--------------------------------------------

Cheers,
  

Patch

diff --git a/include/abg-fwd.h b/include/abg-fwd.h
index 1aab70a6..cffa3f0d 100644
--- a/include/abg-fwd.h
+++ b/include/abg-fwd.h
@@ -119,8 +119,6 @@  class translation_unit;
 /// Convenience typedef for a shared pointer on a @ref
 /// translation_unit type.
 typedef shared_ptr<translation_unit> translation_unit_sptr;
-/// Convenience typedef for a vector of @ref translation_unit_sptr.
-typedef std::vector<translation_unit_sptr> translation_units;
 /// Convenience typedef for a map that associates a string to a
 /// translation unit.
 typedef unordered_map<string, translation_unit_sptr> string_tu_map_type;
diff --git a/include/abg-ir.h b/include/abg-ir.h
index fda10de5..4b9626df 100644
--- a/include/abg-ir.h
+++ b/include/abg-ir.h
@@ -33,6 +33,7 @@ 
 #include <assert.h>
 #include <stdint.h>
 #include <cstdlib>
+#include <set>
 #include "abg-cxx-compat.h"
 #include "abg-fwd.h"
 #include "abg-hash.h"
@@ -707,6 +708,28 @@  public:
 					translation_unit& tu);
 };//end class translation_unit
 
+/// A comparison functor to compare translation units based on their
+/// absolute paths.
+struct shared_translation_unit_comp
+{
+  /// Compare two translations units based on their absolute paths.
+  ///
+  /// @param lhs the first translation unit to consider for the
+  /// comparison.
+  ///
+  /// @param rhs the second translatin unit to consider for the
+  /// comparison.
+  bool
+  operator()(const translation_unit_sptr& lhs,
+	     const translation_unit_sptr& rhs) const
+  {return lhs->get_absolute_path() < rhs->get_absolute_path();}
+}; // end struct shared_translation_unit_comp
+
+/// Convenience typedef for an ordered set of @ref
+/// translation_unit_sptr.
+typedef std::set<translation_unit_sptr,
+		 shared_translation_unit_comp> translation_units;
+
 string
 translation_unit_language_to_string(translation_unit::language);
 
diff --git a/src/abg-corpus.cc b/src/abg-corpus.cc
index 2e9f304d..1ca46086 100644
--- a/src/abg-corpus.cc
+++ b/src/abg-corpus.cc
@@ -560,7 +560,8 @@  corpus::add(const translation_unit_sptr tu)
 
   ABG_ASSERT(tu->get_environment() == get_environment());
 
-  priv_->members.push_back(tu);
+  ABG_ASSERT(priv_->members.insert(tu).second);
+
   if (!tu->get_absolute_path().empty())
     {
       // Update the path -> translation_unit map.