[v2,0/2] Validating UAPI backwards compatibility

Message ID 20230301075402.4578-1-quic_johmoo@quicinc.com
Headers
Series Validating UAPI backwards compatibility |

Message

John Moon March 1, 2023, 7:54 a.m. UTC
  Hi all,

The kernel community has rigorously enforced a policy of backwards
compatibility in its UAPI headers for a long time. This has allowed user
applications to enjoy stability across kernel upgrades without
recompiling.

In the vendor driver community (out-of-tree modules), there's been a
lack of discipline when it comes to maintaining UAPI backwards
compatibility. This has been a maintenance burden and limits our options
for long-term support of older devices.

Our goal is to add tooling for vendor driver developers because the
upstream model of expert maintainer code review can be difficult to
replicate in-house. Tools may help developers catch simple UAPI
incompatibilities that could be easily overlooked by in-house review.

We see in the kernel documentation:
"Kernel headers are backwards compatible, but not forwards compatible.
This means that a program built against a C library using older kernel
headers should run on a newer kernel (although it may not have access
to new features), but a program built against newer kernel headers may
not work on an older kernel."[1]

How does the kernel enforce this guarantee? We would be interested to
learn about any tools or methods used by kernel developers to make sure
the above statement remains true.

Could the documentation on UAPI maintenance (from a developer's point of
view) be expanded? Internally, we have a set of guidelines for our kernel
developers regarding UAPI compatibility techniques. If there's interest
in supplying a document on this topic with the kernel, we'd be happy to
submit a draft detailing what we have so far as a jumping off point.

Additionally, I've attached a shell script we've been using internally
to validate changes to our UAPI headers are backwards compatible. The
script uses libabigail's[2] tool abidiff[3] to compare a modified
header's ABI before and after a patch is applied. If an existing UAPI is
modified, the script exits non-zero. We use this script in our CI system
to block changes that fail the check.

Currently, the script works with gcc. It generates output like this when
a backwards-incompatible change is made to a UAPI header:

 !!! ABI differences detected in include/uapi/linux/acct.h (compared to
 file at HEAD^1) !!!

     [C] 'struct acct' changed:
       type size changed from 512 to 544 (in bits)
       1 data member insertion:
         '__u32 new_val', at offset 512 (in bits) at acct.h:71:1

 0/1 UAPI header file changes are backwards compatible
 UAPI header ABI check failed

However, we have not had success with clang. It seems clang is more
aggressive in optimizing dead code away (no matter which options we
pass). Therefore, no ABI differences are found.

We wanted to share with the community to receive feedback and any advice
when it comes to tooling/policy surrounding this issue. Our hope is that
the script will help all kernel UAPI authors (even those that haven't
upstreamed yet) maintain good discipline and avoid breaking userspace.

In v2, we've expanded the functionality quite a bit. We've added a
document for the tool which includes some examples and explanations of
possible false positives. We've also made it easier to examine changes
across a large range of commits (e.g. v6.0 -> v6.1). This provided a
great testbed of a wide variety of changes to examine.

It would be very helpful for the expert maintainers to look over the
tool's output and describe why or why not certain changes are being
incorrectly flagged. This could lead the way towards another document
that describes the kernel's UAPI policy more formally.

As our tooling improves, we may be able to effectively filter out the
false positives so the tool fits the kernel's UAPI policy neatly.

Thanks for the help on v1 and thanks in advance for the help on v2!

[1] Documentation/kbuild/headers_install.rst
[2] https://sourceware.org/libabigail/manual/libabigail-overview.html
[3] https://sourceware.org/libabigail/manual/abidiff.html

P.S. While at Qualcomm, Jordan Crouse <jorcrous@amazon.com> authored the
original version of the UAPI checker script. Thanks Jordan!

John Moon (2):
  check-uapi: Introduce check-uapi.sh
  docs: dev-tools: Add UAPI checker documentation

 Documentation/dev-tools/checkuapi.rst | 258 +++++++++++++++
 Documentation/dev-tools/index.rst     |   1 +
 scripts/check-uapi.sh                 | 452 ++++++++++++++++++++++++++
 3 files changed, 711 insertions(+)
 create mode 100644 Documentation/dev-tools/checkuapi.rst
 create mode 100755 scripts/check-uapi.sh


base-commit: e492250d5252635b6c97d52eddf2792ec26f1ec1
--
2.17.1
  

Comments

Nick Desaulniers March 1, 2023, 5:50 p.m. UTC | #1
On Tue, Feb 28, 2023 at 11:54 PM John Moon <quic_johmoo@quicinc.com> wrote:
>
> Currently, the script works with gcc. It generates output like this when
> a backwards-incompatible change is made to a UAPI header:
>
>  !!! ABI differences detected in include/uapi/linux/acct.h (compared to
>  file at HEAD^1) !!!
>
>      [C] 'struct acct' changed:
>        type size changed from 512 to 544 (in bits)
>        1 data member insertion:
>          '__u32 new_val', at offset 512 (in bits) at acct.h:71:1
>
>  0/1 UAPI header file changes are backwards compatible
>  UAPI header ABI check failed
>
> However, we have not had success with clang. It seems clang is more
> aggressive in optimizing dead code away (no matter which options we
> pass). Therefore, no ABI differences are found.

Hi John,
Do you have the list of bugs you've filed upstream against clang wrt.
information missing when using `-fno-eliminate-unused-debug-types`?

https://github.com/llvm/llvm-project/issues is the issue tracker.

Seeing a strong participant in both the Android and LLVM ecosystems
supply scripts that lack clang support...raises eyebrows.
  
John Moon March 1, 2023, 6:03 p.m. UTC | #2
On 3/1/2023 9:50 AM, Nick Desaulniers wrote:
> On Tue, Feb 28, 2023 at 11:54 PM John Moon <quic_johmoo@quicinc.com> wrote:
>>
>> Currently, the script works with gcc. It generates output like this when
>> a backwards-incompatible change is made to a UAPI header:
>>
>>   !!! ABI differences detected in include/uapi/linux/acct.h (compared to
>>   file at HEAD^1) !!!
>>
>>       [C] 'struct acct' changed:
>>         type size changed from 512 to 544 (in bits)
>>         1 data member insertion:
>>           '__u32 new_val', at offset 512 (in bits) at acct.h:71:1
>>
>>   0/1 UAPI header file changes are backwards compatible
>>   UAPI header ABI check failed
>>
>> However, we have not had success with clang. It seems clang is more
>> aggressive in optimizing dead code away (no matter which options we
>> pass). Therefore, no ABI differences are found.
> 
> Hi John,
> Do you have the list of bugs you've filed upstream against clang wrt.
> information missing when using `-fno-eliminate-unused-debug-types`?
> 
> https://github.com/llvm/llvm-project/issues is the issue tracker.
> 
> Seeing a strong participant in both the Android and LLVM ecosystems
> supply scripts that lack clang support...raises eyebrows.

We have not filed a bug with upstream clang since we're not sure it's 
*not* and issue on our end. Giuliano Procida (CC'd) has tested the 
script with clang 13 and 14 and GCC 10, 11 and 12 and got the expected 
diff. If it's convenient for anyone testing this script to give it a 
whirl with clang and report back, it could help us determine if there's 
a real issue with clang support. :)
  
John Moon March 1, 2023, 7:24 p.m. UTC | #3
On 3/1/2023 10:03 AM, John Moon wrote:
> On 3/1/2023 9:50 AM, Nick Desaulniers wrote:
>> On Tue, Feb 28, 2023 at 11:54 PM John Moon <quic_johmoo@quicinc.com> 
>> wrote:
>>>
>>> Currently, the script works with gcc. It generates output like this when
>>> a backwards-incompatible change is made to a UAPI header:
>>>
>>>   !!! ABI differences detected in include/uapi/linux/acct.h (compared to
>>>   file at HEAD^1) !!!
>>>
>>>       [C] 'struct acct' changed:
>>>         type size changed from 512 to 544 (in bits)
>>>         1 data member insertion:
>>>           '__u32 new_val', at offset 512 (in bits) at acct.h:71:1
>>>
>>>   0/1 UAPI header file changes are backwards compatible
>>>   UAPI header ABI check failed
>>>
>>> However, we have not had success with clang. It seems clang is more
>>> aggressive in optimizing dead code away (no matter which options we
>>> pass). Therefore, no ABI differences are found.
>>
>> Hi John,
>> Do you have the list of bugs you've filed upstream against clang wrt.
>> information missing when using `-fno-eliminate-unused-debug-types`?
>>
>> https://github.com/llvm/llvm-project/issues is the issue tracker.
>>
>> Seeing a strong participant in both the Android and LLVM ecosystems
>> supply scripts that lack clang support...raises eyebrows.
> 
> We have not filed a bug with upstream clang since we're not sure it's 
> *not* and issue on our end. Giuliano Procida (CC'd) has tested the 
> script with clang 13 and 14 and GCC 10, 11 and 12 and got the expected 
> diff. If it's convenient for anyone testing this script to give it a 
> whirl with clang and report back, it could help us determine if there's 
> a real issue with clang support. :)

With some additional internal testing, we've found that clang does not 
work with this script on Ubuntu 18.04, but does work on Ubuntu 20.04. 
This is controlling for the clang version and different installation 
sources. The same clang-15 binary run on an 18.04 host fails while 
working on 20.04.

We'll investigate some more internally and potentially file a bug with 
upstream clang.
  
John Moon March 1, 2023, 10:33 p.m. UTC | #4
On 3/1/2023 11:24 AM, John Moon wrote:
> On 3/1/2023 10:03 AM, John Moon wrote:
>> On 3/1/2023 9:50 AM, Nick Desaulniers wrote:
>>> On Tue, Feb 28, 2023 at 11:54 PM John Moon <quic_johmoo@quicinc.com> 
>>> wrote:
>>>>
>>>> Currently, the script works with gcc. It generates output like this 
>>>> when
>>>> a backwards-incompatible change is made to a UAPI header:
>>>>
>>>>   !!! ABI differences detected in include/uapi/linux/acct.h 
>>>> (compared to
>>>>   file at HEAD^1) !!!
>>>>
>>>>       [C] 'struct acct' changed:
>>>>         type size changed from 512 to 544 (in bits)
>>>>         1 data member insertion:
>>>>           '__u32 new_val', at offset 512 (in bits) at acct.h:71:1
>>>>
>>>>   0/1 UAPI header file changes are backwards compatible
>>>>   UAPI header ABI check failed
>>>>
>>>> However, we have not had success with clang. It seems clang is more
>>>> aggressive in optimizing dead code away (no matter which options we
>>>> pass). Therefore, no ABI differences are found.
>>>
>>> Hi John,
>>> Do you have the list of bugs you've filed upstream against clang wrt.
>>> information missing when using `-fno-eliminate-unused-debug-types`?
>>>
>>> https://github.com/llvm/llvm-project/issues is the issue tracker.
>>>
>>> Seeing a strong participant in both the Android and LLVM ecosystems
>>> supply scripts that lack clang support...raises eyebrows.
>>
>> We have not filed a bug with upstream clang since we're not sure it's 
>> *not* and issue on our end. Giuliano Procida (CC'd) has tested the 
>> script with clang 13 and 14 and GCC 10, 11 and 12 and got the expected 
>> diff. If it's convenient for anyone testing this script to give it a 
>> whirl with clang and report back, it could help us determine if 
>> there's a real issue with clang support. :)
> 
> With some additional internal testing, we've found that clang does not 
> work with this script on Ubuntu 18.04, but does work on Ubuntu 20.04. 
> This is controlling for the clang version and different installation 
> sources. The same clang-15 binary run on an 18.04 host fails while 
> working on 20.04.
> 
> We'll investigate some more internally and potentially file a bug with 
> upstream clang.

With some additional help from Nick offline, we determined that the 
issue isn't with clang, but with libdw (from elfutils). You need at 
least libdw version 0.171 for the abidiff tool to work correctly with 
clang (in this particular case). Ubuntu 18.04 ships with version 0.170.

If there's any interest, it'd be fairly easy to add a check for this 
condition under the check_deps() function in the script.
  
Nick Desaulniers March 1, 2023, 11:33 p.m. UTC | #5
On Wed, Mar 1, 2023 at 2:33 PM John Moon <quic_johmoo@quicinc.com> wrote:
>
> With some additional help from Nick offline, we determined that the
> issue isn't with clang, but with libdw (from elfutils). You need at
> least libdw version 0.171 for the abidiff tool to work correctly with
> clang (in this particular case). Ubuntu 18.04 ships with version 0.170.
>
> If there's any interest, it'd be fairly easy to add a check for this
> condition under the check_deps() function in the script.

Good job John; mind sending a v3 with that addition, and the commit
message updated?
  
Trilok Soni March 1, 2023, 11:35 p.m. UTC | #6
On 3/1/2023 3:33 PM, Nick Desaulniers wrote:
> On Wed, Mar 1, 2023 at 2:33 PM John Moon <quic_johmoo@quicinc.com> wrote:
>>
>> With some additional help from Nick offline, we determined that the
>> issue isn't with clang, but with libdw (from elfutils). You need at
>> least libdw version 0.171 for the abidiff tool to work correctly with
>> clang (in this particular case). Ubuntu 18.04 ships with version 0.170.
>>
>> If there's any interest, it'd be fairly easy to add a check for this
>> condition under the check_deps() function in the script.
> 
> Good job John; mind sending a v3 with that addition, and the commit
> message updated?

I would prefer to wait to get more reviews on v2 from Greg/Masahiro and 
then combine above checks in the v3. If there are no comments by next 
Monday then we can send v3, sounds good?

---Trilok Soni
  
Mark Wielaard March 1, 2023, 11:52 p.m. UTC | #7
Hi John,

On Wed, Mar 01, 2023 at 02:33:00PM -0800, John Moon via Libabigail wrote:
> With some additional help from Nick offline, we determined that the
> issue isn't with clang, but with libdw (from elfutils). You need at
> least libdw version 0.171 for the abidiff tool to work correctly
> with clang (in this particular case). Ubuntu 18.04 ships with
> version 0.170.

I don't remember any specific fixes for clang in libdw for elfutils
0.171. But elfutils 0.171 was the first release that supported most of
DWARF5 (including GNU DebugFission and split dwarf).

> If there's any interest, it'd be fairly easy to add a check for this
> condition under the check_deps() function in the script.

Please do add this check. elfutils 0.170 is almost 6 years old now,
there have been many, many bug fixes since then (current release is
0.188 from Nov 2022).

Thanks,

Mark
  
John Moon March 1, 2023, 11:59 p.m. UTC | #8
On 3/1/2023 3:52 PM, Mark Wielaard wrote:
> Hi John,
> 
> On Wed, Mar 01, 2023 at 02:33:00PM -0800, John Moon via Libabigail wrote:
>> With some additional help from Nick offline, we determined that the
>> issue isn't with clang, but with libdw (from elfutils). You need at
>> least libdw version 0.171 for the abidiff tool to work correctly
>> with clang (in this particular case). Ubuntu 18.04 ships with
>> version 0.170.
> 
> I don't remember any specific fixes for clang in libdw for elfutils
> 0.171. But elfutils 0.171 was the first release that supported most of
> DWARF5 (including GNU DebugFission and split dwarf).
> 
>> If there's any interest, it'd be fairly easy to add a check for this
>> condition under the check_deps() function in the script.
> 
> Please do add this check. elfutils 0.170 is almost 6 years old now,
> there have been many, many bug fixes since then (current release is
> 0.188 from Nov 2022).
> 
> Thanks,
> 
> Mark

Okay, I'll add the check in v3 after giving a chance for some other 
reviews. Thanks!
  
Christoph Hellwig March 10, 2023, 8:09 a.m. UTC | #9
On Tue, Feb 28, 2023 at 11:54:00PM -0800, John Moon wrote:
> Our goal is to add tooling for vendor driver developers because the
> upstream model of expert maintainer code review can be difficult to
> replicate in-house. Tools may help developers catch simple UAPI
> incompatibilities that could be easily overlooked by in-house review.

Why would this matter in any way for the kernel?  If you tool is useful
for in-kernel usage it should be added to the tree and documented as
such, but ouf of tree crap simply does not matter.
  
Trilok Soni March 10, 2023, 6:20 p.m. UTC | #10
On 3/10/2023 12:09 AM, Christoph Hellwig wrote:
> On Tue, Feb 28, 2023 at 11:54:00PM -0800, John Moon wrote:
>> Our goal is to add tooling for vendor driver developers because the
>> upstream model of expert maintainer code review can be difficult to
>> replicate in-house. Tools may help developers catch simple UAPI
>> incompatibilities that could be easily overlooked by in-house review.
> 
> Why would this matter in any way for the kernel?  If you tool is useful
> for in-kernel usage it should be added to the tree and documented as
> such, but ouf of tree crap simply does not matter.

This tool will be helpful for the kernel maintainers and reviewers as 
well if it can detect potential UAPI backward compatibilities. Even for 
the developers while changing UAPI interfaces at kernel.org before 
submission.

John is trying to highlight also that this tool can be useful for 
downstream users who want to keep the UAPI backward compatibility like 
we do at upstream. We can remove the above text, since we would like to 
mainline it at kernel.org.

---Trilok Soni
  
Christoph Hellwig March 20, 2023, 6:26 a.m. UTC | #11
On Fri, Mar 10, 2023 at 10:20:14AM -0800, Trilok Soni wrote:
> This tool will be helpful for the kernel maintainers and reviewers as well
> if it can detect potential UAPI backward compatibilities. Even for the
> developers while changing UAPI interfaces at kernel.org before submission.

So document it as that.