[RESEND] corpus/writer: sort emitted translation units by path name

Message ID 20200430225800.28768-1-maennich@google.com
State Committed
Headers
Series [RESEND] corpus/writer: sort emitted translation units by path name |

Commit Message

Matthias Männich April 30, 2020, 10:58 p.m. UTC
  By sorting the corpora output, we achieve determinism for the emitted
XML file across multiple runs of abidw.

For that to happen, change the collection of translation units to a
std::set (instead of std::vector), sorted by absolute path name.

Test data needed adjustments, but changes are fully compatible.

	* include/abg-fwd.h: remove translation_units fwd declaration.
	* include/abg-ir.h: add SharedTranslationUnitComparator and add
	translation_units fwd declaration as a set using said comparator.
	* src/abg-corpus.cc (corpus::add): do checked insert into the
	translation_units set (rather than vector::pushback)
	* tests/data/test-annotate/test13-pr18894.so.abi: Adjust test data.
	* tests/data/test-annotate/test14-pr18893.so.abi: Likewise.
	* tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
	* tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
	* tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi: Likewise.
	* tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi: Likewise.
	* tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi: Likewise.
	* tests/data/test-annotate/test21-pr19092.so.abi: Likewise.

Signed-off-by: Matthias Maennich <maennich@google.com>
---

I cut out the test data changes when sending this PATCH to avoid a crazy large
email. The test data can be restored quite easily, but I also staged the change
in the mm/sort-tu-by-name branch [1] easily cherry pickable as

    d5bdebdbd3a8 ("corpus/writer: sort emitted translation units by path name")

Cheers,
Matthias

[1] https://sourceware.org/git/?p=libabigail.git;a=commit;h=d5bdebdbd3a8a5abd804b314ded517ae6fa26d2e

 include/abg-fwd.h                             |     2 -
 include/abg-ir.h                              |    15 +
 src/abg-corpus.cc                             |     3 +-
 .../data/test-annotate/test13-pr18894.so.abi  |   562 +-
 .../data/test-annotate/test14-pr18893.so.abi  | 11708 +--
 .../data/test-annotate/test15-pr18892.so.abi  | 71712 +++++++--------
 .../data/test-annotate/test17-pr19027.so.abi  | 54448 +++++------
 ...st18-pr19037-libvtkRenderingLIC-6.1.so.abi |    58 +-
 ...19-pr19023-libtcmalloc_and_profiler.so.abi | 76289 ++++++++--------
 ...st20-pr19025-libvtkParallelCore-6.1.so.abi |  9220 +-
 .../data/test-annotate/test21-pr19092.so.abi  | 12358 +--
 11 files changed, 118387 insertions(+), 117988 deletions(-)
  

Comments

Giuliano Procida May 1, 2020, 2:35 p.m. UTC | #1
Hi.

I had a patch which sorted cus in read_debug_info_into_corpus but I
had some concens. This was ages ago.

Anyway...

On Thu, 30 Apr 2020 at 23:58, Matthias Maennich <maennich@google.com> wrote:
>
> By sorting the corpora output, we achieve determinism for the emitted
> XML file across multiple runs of abidw.
>
> For that to happen, change the collection of translation units to a
> std::set (instead of std::vector), sorted by absolute path name.

Would it be excessive paranoia to use a multiset instead?
Or keep the vector and sort it with the comparator?

This would eliminate an ABG_ASSERT.

> Test data needed adjustments, but changes are fully compatible.
>
>         * include/abg-fwd.h: remove translation_units fwd declaration.
>         * include/abg-ir.h: add SharedTranslationUnitComparator and add
>         translation_units fwd declaration as a set using said comparator.
>         * src/abg-corpus.cc (corpus::add): do checked insert into the
>         translation_units set (rather than vector::pushback)

s/pushback/push_back/

>         * tests/data/test-annotate/test13-pr18894.so.abi: Adjust test data.
>         * tests/data/test-annotate/test14-pr18893.so.abi: Likewise.
>         * tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
>         * tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
>         * tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi: Likewise.
>         * tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi: Likewise.
>         * tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi: Likewise.
>         * tests/data/test-annotate/test21-pr19092.so.abi: Likewise.
>

Reviewed-by: Giuliano Procida <gprocida@google.com>

> Signed-off-by: Matthias Maennich <maennich@google.com>
> ---
>
> I cut out the test data changes when sending this PATCH to avoid a crazy large
> email. The test data can be restored quite easily, but I also staged the change
> in the mm/sort-tu-by-name branch [1] easily cherry pickable as
>
>     d5bdebdbd3a8 ("corpus/writer: sort emitted translation units by path name")
>
> Cheers,
> Matthias
>
> [1] https://sourceware.org/git/?p=libabigail.git;a=commit;h=d5bdebdbd3a8a5abd804b314ded517ae6fa26d2e
>
>  include/abg-fwd.h                             |     2 -
>  include/abg-ir.h                              |    15 +
>  src/abg-corpus.cc                             |     3 +-
>  .../data/test-annotate/test13-pr18894.so.abi  |   562 +-
>  .../data/test-annotate/test14-pr18893.so.abi  | 11708 +--
>  .../data/test-annotate/test15-pr18892.so.abi  | 71712 +++++++--------
>  .../data/test-annotate/test17-pr19027.so.abi  | 54448 +++++------
>  ...st18-pr19037-libvtkRenderingLIC-6.1.so.abi |    58 +-
>  ...19-pr19023-libtcmalloc_and_profiler.so.abi | 76289 ++++++++--------
>  ...st20-pr19025-libvtkParallelCore-6.1.so.abi |  9220 +-
>  .../data/test-annotate/test21-pr19092.so.abi  | 12358 +--
>  11 files changed, 118387 insertions(+), 117988 deletions(-)
>
> diff --git a/include/abg-fwd.h b/include/abg-fwd.h
> index 1aab70a663de..cffa3f0d1aa3 100644
> --- a/include/abg-fwd.h
> +++ b/include/abg-fwd.h
> @@ -119,8 +119,6 @@ class translation_unit;
>  /// Convenience typedef for a shared pointer on a @ref
>  /// translation_unit type.
>  typedef shared_ptr<translation_unit> translation_unit_sptr;
> -/// Convenience typedef for a vector of @ref translation_unit_sptr.
> -typedef std::vector<translation_unit_sptr> translation_units;
>  /// Convenience typedef for a map that associates a string to a
>  /// translation unit.
>  typedef unordered_map<string, translation_unit_sptr> string_tu_map_type;
> diff --git a/include/abg-ir.h b/include/abg-ir.h
> index fda10de5c537..83dbe02f605d 100644
> --- a/include/abg-ir.h
> +++ b/include/abg-ir.h
> @@ -33,6 +33,7 @@
>  #include <assert.h>
>  #include <stdint.h>
>  #include <cstdlib>
> +#include <set>
>  #include "abg-cxx-compat.h"
>  #include "abg-fwd.h"
>  #include "abg-hash.h"
> @@ -707,6 +708,20 @@ public:
>                                         translation_unit& tu);
>  };//end class translation_unit
>
> +struct SharedTranslationUnitComparator
> +{
> +  bool
> +  operator()(const translation_unit_sptr& lhs,
> +            const translation_unit_sptr& rhs) const
> +  {
> +    return lhs->get_absolute_path() < rhs->get_absolute_path();
> +  }
> +};
> +
> +/// Convenience typedef for a vector of @ref translation_unit_sptr.

vector / set

> +typedef std::set<translation_unit_sptr, SharedTranslationUnitComparator>
> +    translation_units;
> +
>  string
>  translation_unit_language_to_string(translation_unit::language);
>
> diff --git a/src/abg-corpus.cc b/src/abg-corpus.cc
> index 12f44fd1e4cf..e01c20f75b4d 100644
> --- a/src/abg-corpus.cc
> +++ b/src/abg-corpus.cc
> @@ -560,7 +560,8 @@ corpus::add(const translation_unit_sptr tu)
>
>    ABG_ASSERT(tu->get_environment() == get_environment());
>
> -  priv_->members.push_back(tu);
> +  ABG_ASSERT(priv_->members.insert(tu).second);
> +
>    if (!tu->get_absolute_path().empty())
>      {
>        // Update the path -> translation_unit map.
> --
> 2.26.2.526.g744177e7f7-goog
>
  
Matthias Männich May 4, 2020, 1:25 p.m. UTC | #2
On Fri, May 01, 2020 at 03:35:06PM +0100, Giuliano Procida wrote:
>Hi.
>
>I had a patch which sorted cus in read_debug_info_into_corpus but I
>had some concens. This was ages ago.
>
>Anyway...
>
>On Thu, 30 Apr 2020 at 23:58, Matthias Maennich <maennich@google.com> wrote:
>>
>> By sorting the corpora output, we achieve determinism for the emitted
>> XML file across multiple runs of abidw.
>>
>> For that to happen, change the collection of translation units to a
>> std::set (instead of std::vector), sorted by absolute path name.
>
>Would it be excessive paranoia to use a multiset instead?
>Or keep the vector and sort it with the comparator?
>
>This would eliminate an ABG_ASSERT.

This assert kicks in if we see a translation unit with the same name
twice. While there might be scenarios where the same relative name is
valid, that should still not happen. Hence my assumption that this
should never happen and we can assert there. Allowing multiple values
with the same key likely introduces issues downstream when working on
the set of translation units.

>
>> Test data needed adjustments, but changes are fully compatible.
>>
>>         * include/abg-fwd.h: remove translation_units fwd declaration.
>>         * include/abg-ir.h: add SharedTranslationUnitComparator and add
>>         translation_units fwd declaration as a set using said comparator.
>>         * src/abg-corpus.cc (corpus::add): do checked insert into the
>>         translation_units set (rather than vector::pushback)
>
>s/pushback/push_back/

Dodji, could you correct this if you agree with the patch and decide to
apply it?

>
>>         * tests/data/test-annotate/test13-pr18894.so.abi: Adjust test data.
>>         * tests/data/test-annotate/test14-pr18893.so.abi: Likewise.
>>         * tests/data/test-annotate/test15-pr18892.so.abi: Likewise.
>>         * tests/data/test-annotate/test17-pr19027.so.abi: Likewise.
>>         * tests/data/test-annotate/test18-pr19037-libvtkRenderingLIC-6.1.so.abi: Likewise.
>>         * tests/data/test-annotate/test19-pr19023-libtcmalloc_and_profiler.so.abi: Likewise.
>>         * tests/data/test-annotate/test20-pr19025-libvtkParallelCore-6.1.so.abi: Likewise.
>>         * tests/data/test-annotate/test21-pr19092.so.abi: Likewise.
>>
>
>Reviewed-by: Giuliano Procida <gprocida@google.com>
>
>> Signed-off-by: Matthias Maennich <maennich@google.com>
>> ---
>>
>> I cut out the test data changes when sending this PATCH to avoid a crazy large
>> email. The test data can be restored quite easily, but I also staged the change
>> in the mm/sort-tu-by-name branch [1] easily cherry pickable as
>>
>>     d5bdebdbd3a8 ("corpus/writer: sort emitted translation units by path name")
>>
>> Cheers,
>> Matthias
>>
>> [1] https://sourceware.org/git/?p=libabigail.git;a=commit;h=d5bdebdbd3a8a5abd804b314ded517ae6fa26d2e
>>
>>  include/abg-fwd.h                             |     2 -
>>  include/abg-ir.h                              |    15 +
>>  src/abg-corpus.cc                             |     3 +-
>>  .../data/test-annotate/test13-pr18894.so.abi  |   562 +-
>>  .../data/test-annotate/test14-pr18893.so.abi  | 11708 +--
>>  .../data/test-annotate/test15-pr18892.so.abi  | 71712 +++++++--------
>>  .../data/test-annotate/test17-pr19027.so.abi  | 54448 +++++------
>>  ...st18-pr19037-libvtkRenderingLIC-6.1.so.abi |    58 +-
>>  ...19-pr19023-libtcmalloc_and_profiler.so.abi | 76289 ++++++++--------
>>  ...st20-pr19025-libvtkParallelCore-6.1.so.abi |  9220 +-
>>  .../data/test-annotate/test21-pr19092.so.abi  | 12358 +--
>>  11 files changed, 118387 insertions(+), 117988 deletions(-)
>>
>> diff --git a/include/abg-fwd.h b/include/abg-fwd.h
>> index 1aab70a663de..cffa3f0d1aa3 100644
>> --- a/include/abg-fwd.h
>> +++ b/include/abg-fwd.h
>> @@ -119,8 +119,6 @@ class translation_unit;
>>  /// Convenience typedef for a shared pointer on a @ref
>>  /// translation_unit type.
>>  typedef shared_ptr<translation_unit> translation_unit_sptr;
>> -/// Convenience typedef for a vector of @ref translation_unit_sptr.
>> -typedef std::vector<translation_unit_sptr> translation_units;
>>  /// Convenience typedef for a map that associates a string to a
>>  /// translation unit.
>>  typedef unordered_map<string, translation_unit_sptr> string_tu_map_type;
>> diff --git a/include/abg-ir.h b/include/abg-ir.h
>> index fda10de5c537..83dbe02f605d 100644
>> --- a/include/abg-ir.h
>> +++ b/include/abg-ir.h
>> @@ -33,6 +33,7 @@
>>  #include <assert.h>
>>  #include <stdint.h>
>>  #include <cstdlib>
>> +#include <set>
>>  #include "abg-cxx-compat.h"
>>  #include "abg-fwd.h"
>>  #include "abg-hash.h"
>> @@ -707,6 +708,20 @@ public:
>>                                         translation_unit& tu);
>>  };//end class translation_unit
>>
>> +struct SharedTranslationUnitComparator
>> +{
>> +  bool
>> +  operator()(const translation_unit_sptr& lhs,
>> +            const translation_unit_sptr& rhs) const
>> +  {
>> +    return lhs->get_absolute_path() < rhs->get_absolute_path();
>> +  }
>> +};
>> +
>> +/// Convenience typedef for a vector of @ref translation_unit_sptr.
>
>vector / set

Here as well. Thanks for catching that!

Cheers,
Matthias

>
>> +typedef std::set<translation_unit_sptr, SharedTranslationUnitComparator>
>> +    translation_units;
>> +
>>  string
>>  translation_unit_language_to_string(translation_unit::language);
>>
>> diff --git a/src/abg-corpus.cc b/src/abg-corpus.cc
>> index 12f44fd1e4cf..e01c20f75b4d 100644
>> --- a/src/abg-corpus.cc
>> +++ b/src/abg-corpus.cc
>> @@ -560,7 +560,8 @@ corpus::add(const translation_unit_sptr tu)
>>
>>    ABG_ASSERT(tu->get_environment() == get_environment());
>>
>> -  priv_->members.push_back(tu);
>> +  ABG_ASSERT(priv_->members.insert(tu).second);
>> +
>>    if (!tu->get_absolute_path().empty())
>>      {
>>        // Update the path -> translation_unit map.
>> --
>> 2.26.2.526.g744177e7f7-goog
>>
  

Patch

diff --git a/include/abg-fwd.h b/include/abg-fwd.h
index 1aab70a663de..cffa3f0d1aa3 100644
--- a/include/abg-fwd.h
+++ b/include/abg-fwd.h
@@ -119,8 +119,6 @@  class translation_unit;
 /// Convenience typedef for a shared pointer on a @ref
 /// translation_unit type.
 typedef shared_ptr<translation_unit> translation_unit_sptr;
-/// Convenience typedef for a vector of @ref translation_unit_sptr.
-typedef std::vector<translation_unit_sptr> translation_units;
 /// Convenience typedef for a map that associates a string to a
 /// translation unit.
 typedef unordered_map<string, translation_unit_sptr> string_tu_map_type;
diff --git a/include/abg-ir.h b/include/abg-ir.h
index fda10de5c537..83dbe02f605d 100644
--- a/include/abg-ir.h
+++ b/include/abg-ir.h
@@ -33,6 +33,7 @@ 
 #include <assert.h>
 #include <stdint.h>
 #include <cstdlib>
+#include <set>
 #include "abg-cxx-compat.h"
 #include "abg-fwd.h"
 #include "abg-hash.h"
@@ -707,6 +708,20 @@  public:
 					translation_unit& tu);
 };//end class translation_unit
 
+struct SharedTranslationUnitComparator
+{
+  bool
+  operator()(const translation_unit_sptr& lhs,
+	     const translation_unit_sptr& rhs) const
+  {
+    return lhs->get_absolute_path() < rhs->get_absolute_path();
+  }
+};
+
+/// Convenience typedef for a vector of @ref translation_unit_sptr.
+typedef std::set<translation_unit_sptr, SharedTranslationUnitComparator>
+    translation_units;
+
 string
 translation_unit_language_to_string(translation_unit::language);
 
diff --git a/src/abg-corpus.cc b/src/abg-corpus.cc
index 12f44fd1e4cf..e01c20f75b4d 100644
--- a/src/abg-corpus.cc
+++ b/src/abg-corpus.cc
@@ -560,7 +560,8 @@  corpus::add(const translation_unit_sptr tu)
 
   ABG_ASSERT(tu->get_environment() == get_environment());
 
-  priv_->members.push_back(tu);
+  ABG_ASSERT(priv_->members.insert(tu).second);
+
   if (!tu->get_absolute_path().empty())
     {
       // Update the path -> translation_unit map.