Linux kernel type equality

Message ID CAGvU0HkcV=BL3BV1Vyu3yFG+e2qPs6DA223qtwR0MhiDArTZ8Q@mail.gmail.com
State Rejected, archived
Headers
Series Linux kernel type equality |

Commit Message

Giuliano Procida June 18, 2020, 1:45 p.m. UTC
  Hi Dodji.

Users have reported issues with kernel ABI monitoring where there are
unexpected declaration-only vs definition diffs (between XML ABIs).

We're looking at one of these cases in some detail. The DWARF reader
does associate the struct definition with the declaration-only type,
so I'm taking a look at get_canonical_type_for.

I was checking the behaviour of
types_defined_same_linux_kernel_corpus_public (as it's described as an
optimisation) with the patch below (together with my
"get_canonical_type_for: restore environment better" patch).

The check triggers with existing tests. I think these are all typedef
struct { } foo.

$ build/tests/runtestreaddwarf
type equality discrepancy with struct {pgdval_t pgd;} and struct {pgdval_t pgd;}
type equality discrepancy with struct {s64 counter;} and struct {s64 counter;}
type equality discrepancy with struct {atomic64_t id; void* vdso;
unsigned long int flags;} and struct {atomic64_t id; void* vdso;
unsigned long int flags;}
type equality discrepancy with struct {unsigned long int sig[1];} and
struct {unsigned long int sig[1];}
type equality discrepancy with struct {uid_t val;} and struct {uid_t val;}
type equality discrepancy with struct {unsigned long int bits[1];} and
struct {unsigned long int bits[1];}
type equality discrepancy with struct {pteval_t pgprot;} and struct
{pteval_t pgprot;}
type equality discrepancy with struct {gid_t val;} and struct {gid_t val;}

My question: does this point to a problem with type equality elsewhere
in libabigail?

Giuliano.

From dd249b7dc83df3609c23ec237a7ddb94e29abf44 Mon Sep 17 00:00:00 2001
From: Giuliano Procida <gprocida@google.com>
Date: Thu, 18 Jun 2020 13:50:59 +0100
Subject: [PATCH] check linux optimisation

---
 src/abg-ir.cc | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

          // flags.
  

Comments

Dodji Seketeli June 24, 2020, 5:27 p.m. UTC | #1
Hello Giuliano,

Giuliano Procida <gprocida@google.com> a écrit:

> Users have reported issues with kernel ABI monitoring where there are
> unexpected declaration-only vs definition diffs (between XML ABIs).

Ok.

>
> We're looking at one of these cases in some detail. The DWARF reader
> does associate the struct definition with the declaration-only type,
> so I'm taking a look at get_canonical_type_for.
>
> I was checking the behaviour of
> types_defined_same_linux_kernel_corpus_public (as it's described as an
> optimisation) with the patch below (together with my
> "get_canonical_type_for: restore environment better" patch).
>
> The check triggers with existing tests. I think these are all typedef
> struct { } foo.
>
> $ build/tests/runtestreaddwarf
> type equality discrepancy with struct {pgdval_t pgd;} and struct {pgdval_t pgd;}
> type equality discrepancy with struct {s64 counter;} and struct {s64 counter;}
> type equality discrepancy with struct {atomic64_t id; void* vdso;
> unsigned long int flags;} and struct {atomic64_t id; void* vdso;
> unsigned long int flags;}
> type equality discrepancy with struct {unsigned long int sig[1];} and
> struct {unsigned long int sig[1];}
> type equality discrepancy with struct {uid_t val;} and struct {uid_t val;}
> type equality discrepancy with struct {unsigned long int bits[1];} and
> struct {unsigned long int bits[1];}
> type equality discrepancy with struct {pteval_t pgprot;} and struct
> {pteval_t pgprot;}
> type equality discrepancy with struct {gid_t val;} and struct {gid_t val;}
>
> My question: does this point to a problem with type equality elsewhere
> in libabigail?

I think it does, yes.  As you have filed a problem report for this
particular issue, I have responded to it at
https://sourceware.org/bugzilla/show_bug.cgi?id=26135#c1.  We can
continue the discussion related to that problem report there, I guess.

As for the initial unexpected diff between decl-only and its
fully-defined counterpart that you talked about at the beginning of your
message, I am not sure the issue raised in PR26135 is related.  To say
for sure, I guess I'd need to have a reproducer, probably as part of a
problem report?

Cheers,
  
Giuliano Procida June 25, 2020, 3:13 p.m. UTC | #2
On Wed, 24 Jun 2020 at 18:27, Dodji Seketeli <dodji@seketeli.org> wrote:
>
> Hello Giuliano,
>
> Giuliano Procida <gprocida@google.com> a écrit:
>
> > Users have reported issues with kernel ABI monitoring where there are
> > unexpected declaration-only vs definition diffs (between XML ABIs).
>
> Ok.
>
> >
> > We're looking at one of these cases in some detail. The DWARF reader
> > does associate the struct definition with the declaration-only type,
> > so I'm taking a look at get_canonical_type_for.
> >
> > I was checking the behaviour of
> > types_defined_same_linux_kernel_corpus_public (as it's described as an
> > optimisation) with the patch below (together with my
> > "get_canonical_type_for: restore environment better" patch).
> >
> > The check triggers with existing tests. I think these are all typedef
> > struct { } foo.
> >
> > $ build/tests/runtestreaddwarf
> > type equality discrepancy with struct {pgdval_t pgd;} and struct {pgdval_t pgd;}
> > type equality discrepancy with struct {s64 counter;} and struct {s64 counter;}
> > type equality discrepancy with struct {atomic64_t id; void* vdso;
> > unsigned long int flags;} and struct {atomic64_t id; void* vdso;
> > unsigned long int flags;}
> > type equality discrepancy with struct {unsigned long int sig[1];} and
> > struct {unsigned long int sig[1];}
> > type equality discrepancy with struct {uid_t val;} and struct {uid_t val;}
> > type equality discrepancy with struct {unsigned long int bits[1];} and
> > struct {unsigned long int bits[1];}
> > type equality discrepancy with struct {pteval_t pgprot;} and struct
> > {pteval_t pgprot;}
> > type equality discrepancy with struct {gid_t val;} and struct {gid_t val;}
> >
> > My question: does this point to a problem with type equality elsewhere
> > in libabigail?
>
> I think it does, yes.  As you have filed a problem report for this
> particular issue, I have responded to it at
> https://sourceware.org/bugzilla/show_bug.cgi?id=26135#c1.  We can
> continue the discussion related to that problem report there, I guess.
>
> As for the initial unexpected diff between decl-only and its
> fully-defined counterpart that you talked about at the beginning of your
> message, I am not sure the issue raised in PR26135 is related.  To say
> for sure, I guess I'd need to have a reproducer, probably as part of a
> problem report?
>

This appears to definitely be a separate issue. I'll continue working
on isolating a smaller example.

FYI, I analysed some recent kernel ABI XML diffs and used that to come
up with the following ordering of "things which cause churn", most
significant first. For a small change, the rough amount of churn is in
[ ] below.

1 type-id renumbering - fixed by hash-based type-ids [~20000 line changes]
2 Types moving between translation units, particularly when
translation units are added or removed [~400 line moves]
3 Type id hash collision linear probing (almost always due to
anonymous types and internal name collisions; anonymous kernel unions
in particular never seem to have naming typedefs) [~200 line changes]
4a Location (line and column) changes. [<20 line changes]
4b External name renumbering due to sequential anon type names. [<10
line changes]

In terms of code review, line moves are worse than changes, so item 2
is worse than twice as bad as item 3.

mjw had a patch out addressing item 2 by folding everything into one
translation unit. This is what I'm currently looking at, in terms of
XML diff sizes at least.

Regards,
Giuliano.

> Cheers,
>
> --
>                 Dodji
  
Giuliano Procida June 29, 2020, 2:54 p.m. UTC | #3
Hi Dodji.

On Thu, 25 Jun 2020 at 16:13, Giuliano Procida <gprocida@google.com> wrote:
>
> On Wed, 24 Jun 2020 at 18:27, Dodji Seketeli <dodji@seketeli.org> wrote:
> >
> > Hello Giuliano,
> >
> > Giuliano Procida <gprocida@google.com> a écrit:
> >
> > > Users have reported issues with kernel ABI monitoring where there are
> > > unexpected declaration-only vs definition diffs (between XML ABIs).
> >
> > Ok.
> >
> > >
> > > We're looking at one of these cases in some detail. The DWARF reader
> > > does associate the struct definition with the declaration-only type,
> > > so I'm taking a look at get_canonical_type_for.
> > >
> > > I was checking the behaviour of
> > > types_defined_same_linux_kernel_corpus_public (as it's described as an
> > > optimisation) with the patch below (together with my
> > > "get_canonical_type_for: restore environment better" patch).
> > >
> > > The check triggers with existing tests. I think these are all typedef
> > > struct { } foo.
> > >
> > > $ build/tests/runtestreaddwarf
> > > type equality discrepancy with struct {pgdval_t pgd;} and struct {pgdval_t pgd;}
> > > type equality discrepancy with struct {s64 counter;} and struct {s64 counter;}
> > > type equality discrepancy with struct {atomic64_t id; void* vdso;
> > > unsigned long int flags;} and struct {atomic64_t id; void* vdso;
> > > unsigned long int flags;}
> > > type equality discrepancy with struct {unsigned long int sig[1];} and
> > > struct {unsigned long int sig[1];}
> > > type equality discrepancy with struct {uid_t val;} and struct {uid_t val;}
> > > type equality discrepancy with struct {unsigned long int bits[1];} and
> > > struct {unsigned long int bits[1];}
> > > type equality discrepancy with struct {pteval_t pgprot;} and struct
> > > {pteval_t pgprot;}
> > > type equality discrepancy with struct {gid_t val;} and struct {gid_t val;}
> > >
> > > My question: does this point to a problem with type equality elsewhere
> > > in libabigail?
> >
> > I think it does, yes.  As you have filed a problem report for this
> > particular issue, I have responded to it at
> > https://sourceware.org/bugzilla/show_bug.cgi?id=26135#c1.  We can
> > continue the discussion related to that problem report there, I guess.
> >
> > As for the initial unexpected diff between decl-only and its
> > fully-defined counterpart that you talked about at the beginning of your
> > message, I am not sure the issue raised in PR26135 is related.  To say
> > for sure, I guess I'd need to have a reproducer, probably as part of a
> > problem report?
> >
>
> This appears to definitely be a separate issue. I'll continue working
> on isolating a smaller example.

I've opened https://sourceware.org/bugzilla/show_bug.cgi?id=26182 for this.

When it comes to the Linux kernel, small examples are still rather large.

Regards,
Giuliano.

> FYI, I analysed some recent kernel ABI XML diffs and used that to come
> up with the following ordering of "things which cause churn", most
> significant first. For a small change, the rough amount of churn is in
> [ ] below.
>
> 1 type-id renumbering - fixed by hash-based type-ids [~20000 line changes]
> 2 Types moving between translation units, particularly when
> translation units are added or removed [~400 line moves]
> 3 Type id hash collision linear probing (almost always due to
> anonymous types and internal name collisions; anonymous kernel unions
> in particular never seem to have naming typedefs) [~200 line changes]
> 4a Location (line and column) changes. [<20 line changes]
> 4b External name renumbering due to sequential anon type names. [<10
> line changes]
>
> In terms of code review, line moves are worse than changes, so item 2
> is worse than twice as bad as item 3.
>
> mjw had a patch out addressing item 2 by folding everything into one
> translation unit. This is what I'm currently looking at, in terms of
> XML diff sizes at least.
>
> Regards,
> Giuliano.
>
> > Cheers,
> >
> > --
> >                 Dodji
  

Patch

diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index 41a8e3f9..a8096690 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -11904,8 +11904,18 @@  type_base::get_canonical_type_for(type_base_sptr t)
          // Compare types by considering that decl-only classes don't
          // equal their definition.
          env->decl_only_class_equals_definition(false);
-         bool equal = types_defined_same_linux_kernel_corpus_public(**it, *t)
-                      || *it == t;
+         bool linux_eq =
types_defined_same_linux_kernel_corpus_public(**it, *t);
+         bool plain_eq = *it == t;
+         // is it really just an optimisation?
+         if (linux_eq > plain_eq)
+           {
+             std::cerr << "type equality discrepancy with "
+                       << t->get_pretty_representation()
+                       << " and "
+                       << (*it)->get_pretty_representation()
+                       << std::endl;
+           }
+         bool equal = linux_eq || plain_eq;
          // Restore the state of the on-the-fly-canonicalization and
          // the decl-only-class-being-equal-to-a-matching-definition