[1/3] Stabilise sort of canonical types

Message ID 20201021110815.4025428-1-maennich@google.com
State New
Headers
Series [1/3] Stabilise sort of canonical types |

Commit Message

Matthias Männich Oct. 21, 2020, 11:08 a.m. UTC
  From: Giuliano Procida <gprocida@google.com>

Some types like unnamed-enum-underlying-type are not distinguished by
type_topo_comp. This can result in nondeterministic output and flakey
tests.

While a more complete ordering from type_topo_comp would be nice, the
nondeterminism can reduced by preserving the relative order of
identically-named types.

	* src/abg-ir.cc (scope_decl::get_sorted_canonical_types): Sort
	canonical types with std::stable_sort(..., type_topo_comp()).

Signed-off-by: Giuliano Procida <gprocida@google.com>
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
---
 src/abg-ir.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
  

Comments

Giuliano Procida Oct. 21, 2020, 3:49 p.m. UTC | #1
Hi.

On Wed, 21 Oct 2020 at 12:09, Matthias Maennich <maennich@google.com> wrote:

> Location expressions might occasionally be unset or invalid. E.g. a
> reason for that are thread local variables that do not exactly have a
> location to refer to. Compilers/Linkers may choose an odd
> representation. E.g. see the dwarfdump output for the added testcase
> based on libandroid.so (from AOSP).
>
> $ dwarfdump libandroid.so|egrep -B9 "gChoreographerE$"
>
> LOCAL_SYMBOLS:
> < 1><0x00000022> DW_TAG_namespace
>                    DW_AT_name           android
> < 2><0x00000027>   DW_TAG_variable
>                      DW_AT_name           gChoreographer
>                      DW_AT_type           <0x00000b65>
>                      DW_AT_decl_file      0x00000003 .../choreographer.cpp
>                      DW_AT_decl_line      0x00000059
>                      DW_AT_location       len 0x0000: :
>                      DW_AT_linkage_name   _ZN7androidL14gChoreographerE
>
> The DW_AT_location is properly read by elfutils' dwarf_location(), but
> is not useful for us to proceed with. Hence early exit on this.
>
>         * src/abg-dwarf-reader.cc (die_location_address): Ignore invalid
>         location expressions.
>         * tests/data/Makefile.am: Add new test files.
>         * tests/data/test-read-dwarf/test-libandroid.so: New test file.
>         * tests/data/test-read-dwarf/test-libandroid.so.abi: Likewise.
>         * tests/test-read-dwarf.cc: Add new test case.
>
> Reported-by: Dan Albert <danalbert@google.com>
> Signed-off-by: Matthias Maennich <maennich@google.com>
> ---
>  src/abg-dwarf-reader.cc                       |     5 +
>  tests/data/Makefile.am                        |     2 +
>  tests/data/test-read-dwarf/test-libandroid.so |   Bin 0 -> 2579424 bytes
>  .../test-read-dwarf/test-libandroid.so.abi    | 86949 ++++++++++++++++
>  tests/test-read-dwarf.cc                      |     7 +
>  5 files changed, 86963 insertions(+)
>  create mode 100644 tests/data/test-read-dwarf/test-libandroid.so
>  create mode 100644 tests/data/test-read-dwarf/test-libandroid.so.abi
>
> diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
> index 7257052e6e45..c65e1b921945 100644
> --- a/src/abg-dwarf-reader.cc
> +++ b/src/abg-dwarf-reader.cc
> @@ -10358,6 +10358,11 @@ die_location_address(Dwarf_Die*        die,
>    if (!die_location_expr(die, DW_AT_location, &expr, &expr_len))
>      return false;
>
> +  // Ignore invalid location expressions where reading them succeeded,
> but they
> +  // are set to 0x0 in DWARF.
> +  if (!expr)
> +    return false;
> +
>    int64_t addr = 0;
>    if (!eval_last_constant_dwarf_sub_expr(expr, expr_len, addr,
> is_tls_address))
>      return false;
> diff --git a/tests/data/Makefile.am b/tests/data/Makefile.am
> index 40a575faa60b..34e7a0f4212f 100644
> --- a/tests/data/Makefile.am
> +++ b/tests/data/Makefile.am
> @@ -520,6 +520,8 @@ test-read-dwarf/test-PR26568-2.o     \
>  test-read-dwarf/test-PR26568-1.o     \
>  test-read-dwarf/test-PR26568-1.o.abi \
>  test-read-dwarf/test-PR26568-2.o.abi \
> +test-read-dwarf/test-libandroid.so \
> +test-read-dwarf/test-libandroid.so.abi \
>  \
>  test-annotate/test0.abi                        \
>  test-annotate/test1.abi                        \
> [snip due to size]


My only concern is that this is a large test case. I hope we get more use
out of it than just this one issue.

Reviewed-by: Giuliano Procida <gprocida@google.com>

Regards,
Giuliano.
  
Mark Wielaard Oct. 21, 2020, 6:39 p.m. UTC | #2
Hi,

On Wed, 2020-10-21 at 16:49 +0100, Giuliano Procida via Libabigail wrote:
> On Wed, 21 Oct 2020 at 12:09, Matthias Maennich <maennich@google.com>
> wrote:
> 
> > Location expressions might occasionally be unset or invalid. E.g. a
> > reason for that are thread local variables that do not exactly have a
> > location to refer to. Compilers/Linkers may choose an odd
> > representation. E.g. see the dwarfdump output for the added testcase
> > based on libandroid.so (from AOSP).
> > 
> > $ dwarfdump libandroid.so|egrep -B9 "gChoreographerE$"
> > 
> > LOCAL_SYMBOLS:
> > < 1><0x00000022> DW_TAG_namespace
> >                    DW_AT_name           android
> > < 2><0x00000027>   DW_TAG_variable
> >                      DW_AT_name           gChoreographer
> >                      DW_AT_type           <0x00000b65>
> >                      DW_AT_decl_file      0x00000003 .../choreographer.cpp
> >                      DW_AT_decl_line      0x00000059
> >                      DW_AT_location       len 0x0000: :
> >                      DW_AT_linkage_name   _ZN7androidL14gChoreographerE
> > 
> > The DW_AT_location is properly read by elfutils' dwarf_location(), but
> > is not useful for us to proceed with. Hence early exit on this.

For some reason I missed the original post, so I don't fully understand
what is going on here. Would it be possible to make this libandroid.so
available somewhere?

Thanks,

Mark
  
Matthias Männich Oct. 22, 2020, 8:50 a.m. UTC | #3
Hi!

On Wed, Oct 21, 2020 at 08:39:16PM +0200, Mark Wielaard wrote:
>Hi,
>
>On Wed, 2020-10-21 at 16:49 +0100, Giuliano Procida via Libabigail wrote:
>> On Wed, 21 Oct 2020 at 12:09, Matthias Maennich <maennich@google.com>
>> wrote:
>>
>> > Location expressions might occasionally be unset or invalid. E.g. a
>> > reason for that are thread local variables that do not exactly have a
>> > location to refer to. Compilers/Linkers may choose an odd
>> > representation. E.g. see the dwarfdump output for the added testcase
>> > based on libandroid.so (from AOSP).
>> >
>> > $ dwarfdump libandroid.so|egrep -B9 "gChoreographerE$"
>> >
>> > LOCAL_SYMBOLS:
>> > < 1><0x00000022> DW_TAG_namespace
>> >                    DW_AT_name           android
>> > < 2><0x00000027>   DW_TAG_variable
>> >                      DW_AT_name           gChoreographer
>> >                      DW_AT_type           <0x00000b65>
>> >                      DW_AT_decl_file      0x00000003 .../choreographer.cpp
>> >                      DW_AT_decl_line      0x00000059
>> >                      DW_AT_location       len 0x0000: :
>> >                      DW_AT_linkage_name   _ZN7androidL14gChoreographerE
>> >
>> > The DW_AT_location is properly read by elfutils' dwarf_location(), but
>> > is not useful for us to proceed with. Hence early exit on this.
>
>For some reason I missed the original post, so I don't fully understand
>what is going on here. Would it be possible to make this libandroid.so
>available somewhere?

Patch 2/3 got stuck in the moderation queue as it exceeded the message
size limit. I staged all patches at

   https://github.com/metti/libabigail/commits/libandroid

   1/3 https://github.com/metti/libabigail/commit/b8a067e1d2c90bd4677cab53a4e01208f3a78f37
   2/3 https://github.com/metti/libabigail/commit/e001e06a37036429450b3802b8c26b820b2e464f
   3/3 https://github.com/metti/libabigail/commit/4af2c640c9ce76027d1617d5ed61a8df05ee8aee

Cheers,
Matthias

>
>Thanks,
>
>Mark
  
Mark Wielaard Oct. 22, 2020, 12:25 p.m. UTC | #4
Hi Matthias,

On Thu, 2020-10-22 at 09:50 +0100, Matthias Maennich wrote:
> On Wed, Oct 21, 2020 at 08:39:16PM +0200, Mark Wielaard wrote:
> > On Wed, 2020-10-21 at 16:49 +0100, Giuliano Procida via Libabigail wrote:
> > > On Wed, 21 Oct 2020 at 12:09, Matthias Maennich <maennich@google.com>
> > > wrote:
> > > 
> > > > Location expressions might occasionally be unset or invalid. E.g. a
> > > > reason for that are thread local variables that do not exactly have a
> > > > location to refer to. Compilers/Linkers may choose an odd
> > > > representation. E.g. see the dwarfdump output for the added testcase
> > > > based on libandroid.so (from AOSP).
> > > > 
> > > > $ dwarfdump libandroid.so|egrep -B9 "gChoreographerE$"
> > > > 
> > > > LOCAL_SYMBOLS:
> > > > < 1><0x00000022> DW_TAG_namespace
> > > >                    DW_AT_name           android
> > > > < 2><0x00000027>   DW_TAG_variable
> > > >                      DW_AT_name           gChoreographer
> > > >                      DW_AT_type           <0x00000b65>
> > > >                      DW_AT_decl_file      0x00000003 .../choreographer.cpp
> > > >                      DW_AT_decl_line      0x00000059
> > > >                      DW_AT_location       len 0x0000: :
> > > >                      DW_AT_linkage_name   _ZN7androidL14gChoreographerE
> > > > 
> > > > The DW_AT_location is properly read by elfutils' dwarf_location(), but
> > > > is not useful for us to proceed with. Hence early exit on this.
> > 
> > For some reason I missed the original post, so I don't fully understand
> > what is going on here. Would it be possible to make this libandroid.so
> > available somewhere?
> 
> Patch 2/3 got stuck in the moderation queue as it exceeded the message
> size limit. I staged all patches at
> 
>    https://github.com/metti/libabigail/commits/libandroid
> 
>    1/3 https://github.com/metti/libabigail/commit/b8a067e1d2c90bd4677cab53a4e01208f3a78f37
>    2/3 https://github.com/metti/libabigail/commit/e001e06a37036429450b3802b8c26b820b2e464f
>    3/3 https://github.com/metti/libabigail/commit/4af2c640c9ce76027d1617d5ed61a8df05ee8aee

Thanks, I believe I understand now what is going on. gChoreographer has
a empty location expression (zero length block).

It is correct to interpret these as not having a location expression at
all. The spec says "An empty location description consists of a DWARF
expression containing no operations. It represents a piece or all of an
object that is present in the source but not in the object code"

But I think it would be better to make die_location_expr () return
false in case expr_len == 0, instead of your check. The commit message
and comment make it sound like the expression isn't invalid, but it is
actually well defined.

BTW. I think that if this variable is actually a thread local variable
then it would be better for the DWARF producer to produce an actual
(non-empty) location expression using DW_OP_GNU_push_tls_address or
DW_OP_form_tls_address, which abg-dwarf-reader should handle correctly.

Cheers,

Mark
  
Matthias Männich Oct. 22, 2020, 1:46 p.m. UTC | #5
Hi Mark,

On Thu, Oct 22, 2020 at 02:25:20PM +0200, Mark Wielaard wrote:
>Hi Matthias,
>
>On Thu, 2020-10-22 at 09:50 +0100, Matthias Maennich wrote:
>> On Wed, Oct 21, 2020 at 08:39:16PM +0200, Mark Wielaard wrote:
>> > On Wed, 2020-10-21 at 16:49 +0100, Giuliano Procida via Libabigail wrote:
>> > > On Wed, 21 Oct 2020 at 12:09, Matthias Maennich <maennich@google.com>
>> > > wrote:
>> > >
>> > > > Location expressions might occasionally be unset or invalid. E.g. a
>> > > > reason for that are thread local variables that do not exactly have a
>> > > > location to refer to. Compilers/Linkers may choose an odd
>> > > > representation. E.g. see the dwarfdump output for the added testcase
>> > > > based on libandroid.so (from AOSP).
>> > > >
>> > > > $ dwarfdump libandroid.so|egrep -B9 "gChoreographerE$"
>> > > >
>> > > > LOCAL_SYMBOLS:
>> > > > < 1><0x00000022> DW_TAG_namespace
>> > > >                    DW_AT_name           android
>> > > > < 2><0x00000027>   DW_TAG_variable
>> > > >                      DW_AT_name           gChoreographer
>> > > >                      DW_AT_type           <0x00000b65>
>> > > >                      DW_AT_decl_file      0x00000003 .../choreographer.cpp
>> > > >                      DW_AT_decl_line      0x00000059
>> > > >                      DW_AT_location       len 0x0000: :
>> > > >                      DW_AT_linkage_name   _ZN7androidL14gChoreographerE
>> > > >
>> > > > The DW_AT_location is properly read by elfutils' dwarf_location(), but
>> > > > is not useful for us to proceed with. Hence early exit on this.
>> >
>> > For some reason I missed the original post, so I don't fully understand
>> > what is going on here. Would it be possible to make this libandroid.so
>> > available somewhere?
>>
>> Patch 2/3 got stuck in the moderation queue as it exceeded the message
>> size limit. I staged all patches at
>>
>>    https://github.com/metti/libabigail/commits/libandroid
>>
>>    1/3 https://github.com/metti/libabigail/commit/b8a067e1d2c90bd4677cab53a4e01208f3a78f37
>>    2/3 https://github.com/metti/libabigail/commit/e001e06a37036429450b3802b8c26b820b2e464f
>>    3/3 https://github.com/metti/libabigail/commit/4af2c640c9ce76027d1617d5ed61a8df05ee8aee
>
>Thanks, I believe I understand now what is going on. gChoreographer has
>a empty location expression (zero length block).
>
>It is correct to interpret these as not having a location expression at
>all. The spec says "An empty location description consists of a DWARF
>expression containing no operations. It represents a piece or all of an
>object that is present in the source but not in the object code"
>
>But I think it would be better to make die_location_expr () return
>false in case expr_len == 0, instead of your check. The commit message
>and comment make it sound like the expression isn't invalid, but it is
>actually well defined.

I will attempt that. Thanks for the hint.

>
>BTW. I think that if this variable is actually a thread local variable
>then it would be better for the DWARF producer to produce an actual
>(non-empty) location expression using DW_OP_GNU_push_tls_address or
>DW_OP_form_tls_address, which abg-dwarf-reader should handle correctly.

I never got to reproduce this with any other binary other than this
particular version of the library. A simple C++ example:

   | static thread_local int i = 1;
   | void test() { i++; }

produces

   DW_AT_location
   len 0x000a: 0e0000000000000000e0: DW_OP_const8u 0 DW_OP_GNU_push_tls_address

for clang/gcc 10 as you expected and causes no issue.

Cheers,
Matthias

>
>Cheers,
>
>Mark
  
Dodji Seketeli Oct. 28, 2020, 3:28 p.m. UTC | #6
Matthias Maennich <maennich@google.com> a écrit:

> From: Giuliano Procida <gprocida@google.com>
>
> Some types like unnamed-enum-underlying-type are not distinguished by
> type_topo_comp. This can result in nondeterministic output and flakey
> tests.
>
> While a more complete ordering from type_topo_comp would be nice, the
> nondeterminism can reduced by preserving the relative order of
> identically-named types.
>
> 	* src/abg-ir.cc (scope_decl::get_sorted_canonical_types): Sort
> 	canonical types with std::stable_sort(..., type_topo_comp()).

Interesting.  Did this reduce the non-deterministic result you were
seeing?

I think a follow-up patch to this one could be to change
enum_type_decl::get_pretty_representation to make it use a "flat
representation" of enums when the enums is anonymous, like what we do in
union_decl::get_pretty_representation.  That should reduce the variance
we see in the result of sorting.

In the mean time, I guess this change shouldn't hurt.  Thanks for
looking into it!

> Signed-off-by: Giuliano Procida <gprocida@google.com>
> Reviewed-by: Matthias Maennich <maennich@google.com>
> Signed-off-by: Matthias Maennich <maennich@google.com>

Applied to master.

Cheers,

[...]
  
Giuliano Procida Oct. 29, 2020, 8:33 a.m. UTC | #7
Hi Dodji.

On Wed, 28 Oct 2020 at 15:28, Dodji Seketeli <dodji@seketeli.org> wrote:

> Matthias Maennich <maennich@google.com> a écrit:
>
> > From: Giuliano Procida <gprocida@google.com>
> >
> > Some types like unnamed-enum-underlying-type are not distinguished by
> > type_topo_comp. This can result in nondeterministic output and flakey
> > tests.
> >
> > While a more complete ordering from type_topo_comp would be nice, the
> > nondeterminism can reduced by preserving the relative order of
> > identically-named types.
> >
> >       * src/abg-ir.cc (scope_decl::get_sorted_canonical_types): Sort
> >       canonical types with std::stable_sort(..., type_topo_comp()).
>
> Interesting.  Did this reduce the non-deterministic result you were
> seeing?
>
>
It reduced, but did not eliminate, the non-determinism.


> I think a follow-up patch to this one could be to change
> enum_type_decl::get_pretty_representation to make it use a "flat
> representation" of enums when the enums is anonymous, like what we do in
> union_decl::get_pretty_representation.  That should reduce the variance
> we see in the result of sorting.
>
>
Interesting. Currently all *underlying* types of enums are considered
anonymous. It looks like this was done to address a canonicalisation issue
with such types all having the same name. However, there's also dormant
code to add the enum name (if non anonymous) to the underlying type name.

I was looking at a patch to add the bit size to the end of the name of
enum-underlying types and in fact it's something we're likely to use
locally as it does eliminate the test non-determinism. This is not ideal as
it causes one-off churn to XML and the name switch is picked up as a
(normally filtered) harmless change.

Finally, I did a brief experiment which failed to cause any test failures.

--- a/src/abg-comparison.cc
+++ b/src/abg-comparison.cc
@@ -4199,7 +4199,7 @@ compute_diff(const enum_type_decl_sptr first,
     ABG_ASSERT(first->get_environment() == second->get_environment());

   diff_sptr ud = compute_diff_for_types(first->get_underlying_type(),
-                                       second->get_underlying_type(),
+                                       first->get_underlying_type(),
                                        ctxt);
   enum_diff_sptr d(new enum_diff(first, second, ud, ctxt));

I did this due to dubious diff output with my own testing. I'll try to
contribute some enum test cases and perhaps even a diff presentation fix.

In the mean time, I guess this change shouldn't hurt.  Thanks for
> looking into it!
>

You're welcome.

Giuliano.


>
> > Signed-off-by: Giuliano Procida <gprocida@google.com>
> > Reviewed-by: Matthias Maennich <maennich@google.com>
> > Signed-off-by: Matthias Maennich <maennich@google.com>
>
> Applied to master.
>
> Cheers,
>
> [...]
>
> --
>                 Dodji
>
  

Patch

diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index 8c04797f574c..e491528922d3 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -6008,9 +6008,9 @@  scope_decl::get_sorted_canonical_types() const
 	priv_->sorted_canonical_types_.push_back(*e);
 
       type_topo_comp comp;
-      std::sort(priv_->sorted_canonical_types_.begin(),
-		priv_->sorted_canonical_types_.end(),
-		comp);
+      std::stable_sort(priv_->sorted_canonical_types_.begin(),
+		       priv_->sorted_canonical_types_.end(),
+		       comp);
     }
   return priv_->sorted_canonical_types_;
 }