[v2,1/2] check-uapi: Introduce check-uapi.sh

Message ID 20230301075402.4578-2-quic_johmoo@quicinc.com
State New
Headers
Series Validating UAPI backwards compatibility |

Commit Message

John Moon March 1, 2023, 7:54 a.m. UTC
  While the kernel community has been good at maintaining backwards
compatibility with kernel UAPIs, it would be helpful to have a tool
to check if a commit introduces changes that break backwards
compatibility.

To that end, introduce check-uapi.sh: a simple shell script that
checks for changes to UAPI headers using libabigail.

libabigail is "a framework which aims at helping developers and
software distributors to spot some ABI-related issues like interface
incompatibility in ELF shared libraries by performing a static
analysis of the ELF binaries at hand."

The script uses one of libabigail's tools, "abidiff", to compile the
changed header before and after the commit to detect any changes.

abidiff "compares the ABI of two shared libraries in ELF format. It
emits a meaningful report describing the differences between the two
ABIs."

The script also includes the ability to check the compatibilty of
all UAPI headers across commits. This allows developers to inspect
the stability of the UAPIs over time.

Signed-off-by: John Moon <quic_johmoo@quicinc.com>
---
   - Fixed issue where system UAPI headers were used instead of kernel
     source headers. Now, script will include all UAPI headers from
     source instead of just target ones.
   - Added logic to install all requires sanitized headers to the tmp
     directory up front (including across git commits)
   - Added filter for changes which only add types (which should be
     allowed for UAPI headers)
   - Added -b flag to specify "base commit" to compare against.
   - Modified logic to check for any changed UAPI files between the
     base commit and reference commit.
   - Added -a flag to check compatibility of all UAPI files.
   - Added threaded execution and -j flag to control number of jobs.
   - Added error log and -l option to control output path.
   - Added support for checking dirty git workspaces.
   - Added summary file output with header diffs
   - Added "arch/*/include/uapi" to list of files to check.
   - Addressed misc code review findings.

 scripts/check-uapi.sh | 451 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 451 insertions(+)
 create mode 100755 scripts/check-uapi.sh

--
2.17.1
  

Comments

Dodji Seketeli March 3, 2023, 9:09 p.m. UTC | #1
Hello John,

John Moon via Libabigail <libabigail@sourceware.org> a écrit:

> While the kernel community has been good at maintaining backwards
> compatibility with kernel UAPIs, it would be helpful to have a tool
> to check if a commit introduces changes that break backwards
> compatibility.
>
> To that end, introduce check-uapi.sh: a simple shell script that
> checks for changes to UAPI headers using libabigail.
>
> libabigail is "a framework which aims at helping developers and
> software distributors to spot some ABI-related issues like interface
> incompatibility in ELF shared libraries by performing a static
> analysis of the ELF binaries at hand."
>
> The script uses one of libabigail's tools, "abidiff", to compile the
> changed header before and after the commit to detect any changes.
>
> abidiff "compares the ABI of two shared libraries in ELF format. It
> emits a meaningful report describing the differences between the two
> ABIs."
>
> The script also includes the ability to check the compatibilty of
> all UAPI headers across commits. This allows developers to inspect
> the stability of the UAPIs over time.

Thank you for working on this.

The libabigail bits look good to me, for what it's worth.  I just have
some general considerations to discuss.

[...]

> +# Perform the A/B compilation and compare output ABI
> +compare_abi() {

[...]

> +	if "$ABIDIFF" --non-reachable-types "${ref_header}.bin" "${base_header}.bin" > "$log"; then
> +		printf "No ABI differences detected in %s from %s -> %s\n" "$file" "$ref" "${base_ref:-dirty tree}"
> +	else
> +		# If the only changes were additions (not modifications to existing APIs), then
> +		# there's no problem. Ignore these diffs.
> +		if grep "Unreachable types summary" "$log" | grep -q "0 removed" &&
> +		   grep "Unreachable types summary" "$log" | grep -q "0 changed"; then
> +			return 0

There is no problem in parsing the output of the tool like this.
However, the return code of the tool has been designed as a bit field that
could be analysed to know more about the kind of changes that were
reported: https://sourceware.org/libabigail/manual/abidiff.html#return-values.

Right now, there is no bit assigned to detect new types (or interface)
addition, but do you think that it would be a helpful new feature to add
to abidiff for this use case?  We can discuss this in a separate thread
if you prefer, so that we don't pollute others with this minutiae.

> +		fi
> +		{
> +			printf "!!! ABI differences detected in %s from %s -> %s !!!\n\n" "$file" "$ref" "${base_ref:-dirty tree}"
> +			sed  -e '/summary:/d' -e '/changed type/d' -e '/^$/d' -e 's/^/  /g' "$log"

Here again, if you'd like to have a particular output format emitted by
the tool, we'd be glad to discuss how to improve the plasticity of the
tool enough to emit the right output for you.  For instance, we could
add a new --no-summary that would let the tool display the change
directly without the summary header that you are strimming out with this
sed script.

[...]

Thanks again for this tool that I think might be very useful.

Cheers,
  
Masahiro Yamada March 5, 2023, 4:22 a.m. UTC | #2
On Wed, Mar 1, 2023 at 4:54 PM John Moon <quic_johmoo@quicinc.com> wrote:
>
> While the kernel community has been good at maintaining backwards
> compatibility with kernel UAPIs, it would be helpful to have a tool
> to check if a commit introduces changes that break backwards
> compatibility.
>
> To that end, introduce check-uapi.sh: a simple shell script that
> checks for changes to UAPI headers using libabigail.
>
> libabigail is "a framework which aims at helping developers and
> software distributors to spot some ABI-related issues like interface
> incompatibility in ELF shared libraries by performing a static
> analysis of the ELF binaries at hand."
>
> The script uses one of libabigail's tools, "abidiff", to compile the
> changed header before and after the commit to detect any changes.
>
> abidiff "compares the ABI of two shared libraries in ELF format. It
> emits a meaningful report describing the differences between the two
> ABIs."
>
> The script also includes the ability to check the compatibilty of
> all UAPI headers across commits. This allows developers to inspect
> the stability of the UAPIs over time.


Let's see more test cases.


[Case 1]

I think d759be8953febb6e5b5376c7d9bbf568864c6e2d
is a trivial/good cleanup.
Apparently, it still exports equivalent headers,
but this tool reports "incorrectly removed".



$ ./scripts/check-uapi.sh -b d759be8953
Saving current tree state... OK
Installing sanitized UAPI headers from d759be8953... OK
Installing sanitized UAPI headers from d759be8953^1... OK
Restoring current tree state... OK
Checking changes to UAPI headers starting from d759be8953
error - UAPI header arch/alpha/include/uapi/asm/poll.h was incorrectly removed
error - UAPI header arch/ia64/include/uapi/asm/poll.h was incorrectly removed
error - UAPI header arch/x86/include/uapi/asm/poll.h was incorrectly removed
/tmp/tmp.ixUIBlntUP/d759be8953/x86/usr/include/asm/Kbuild does not
exist - cannot compare ABI
/tmp/tmp.ixUIBlntUP/d759be8953/alpha/usr/include/asm/Kbuild does not
exist - cannot compare ABI
/tmp/tmp.ixUIBlntUP/d759be8953/ia64/usr/include/asm/Kbuild does not
exist - cannot compare ABI
error - 6/6 UAPI headers modified between d759be8953^1 and d759be8953
are not backwards compatible
error - UAPI header ABI check failed
Failure summary saved to /home/masahiro/ref/linux/abi_error_log.txt



[Case 2]

This tool compiles only changed headers.
Does it detect ABI change?

I believe the users of the headers must be compiled.



Think about this case.


$ cat foo-typedef.h
typedef int foo_cap_type;


$ cat foo.h
#include "foo-typedef.h"

struct foo {
       foo_cap_type capability;
};



Then, change the first header to
  typedef long long foo_cap_type;

abidiff will never notice the ABI change
until it compiles "foo.h" instead of "foo-typedef.h"



For testing, I applied the following patch.


 --- a/include/uapi/linux/types.h
 +++ b/include/uapi/linux/types.h
 @@ -52,7 +52,7 @@ typedef __u32 __bitwise __wsum;
  #define __aligned_be64 __be64 __attribute__((aligned(8)))
  #define __aligned_le64 __le64 __attribute__((aligned(8)))

 -typedef unsigned __bitwise __poll_t;
 +typedef unsigned short __bitwise __poll_t;

  #endif /*  __ASSEMBLY__ */
  #endif /* _UAPI_LINUX_TYPES_H */




I believe this is an ABI change because this will change
'struct epoll_event' in the include/uapi/linux/eventpoll.h
but the tool happily reports it is backwards compatible.


$ ./scripts/check-uapi.sh
Saving current tree state... OK
Installing sanitized UAPI headers from HEAD... OK
Installing sanitized UAPI headers from HEAD^1... OK
Restoring current tree state... OK
Checking changes to UAPI headers starting from HEAD
No ABI differences detected in include/uapi/linux/types.h from HEAD^1 -> HEAD
All 1 UAPI headers modified between HEAD^1 and HEAD are backwards compatible!





I would not use such a tool that contains both false positives
and false negatives, but you may notice this is more difficult
than you had expected.

I do not know if further review is worthwhile since this does not work
but I added some more in-line comments.






> +
> +# Some UAPI headers require an architecture-specific compiler to build properly.
> +ARCH_SPECIFIC_CC_NEEDED=(
> +       "arch/hexagon/include/uapi/asm/sigcontext.h"
> +       "arch/ia64/include/uapi/asm/intel_intrin.h"
> +       "arch/ia64/include/uapi/asm/setup.h"
> +       "arch/ia64/include/uapi/asm/sigcontext.h"
> +       "arch/mips/include/uapi/asm/bitfield.h"
> +       "arch/mips/include/uapi/asm/byteorder.h"
> +       "arch/mips/include/uapi/asm/inst.h"
> +       "arch/sparc/include/uapi/asm/fbio.h"
> +       "arch/sparc/include/uapi/asm/uctx.h"
> +       "arch/xtensa/include/uapi/asm/byteorder.h"
> +       "arch/xtensa/include/uapi/asm/msgbuf.h"
> +       "arch/xtensa/include/uapi/asm/sembuf.h"
> +)


Yes, arch/*/include/ must be compiled by the target compiler.
If you compile them by the host compiler, it is unpredictable (i.e. wrong).

BTW, was this blacklist detected on a x86 host?

If you do this on an ARM/ARM64 host, some headers
under arch/x86/include/uapi/ might be blacklisted?



> +# Compile the simple test app
> +do_compile() {
> +       local -r inc_dir="$1"
> +       local -r header="$2"
> +       local -r out="$3"
> +       printf "int main(void) { return 0; }\n" | \
> +               "${CC:-gcc}" -c \
> +                 -o "$out" \
> +                 -x c \
> +                 -O0 \
> +                 -std=c90 \
> +                 -fno-eliminate-unused-debug-types \
> +                 -g \
> +                 "-I${inc_dir}" \
> +                 -include "$header" \
> +                 -
> +}
> +
> +# Print the list of incompatible headers from the usr/include Makefile
> +get_no_header_list() {
> +       {
> +               # shellcheck disable=SC2016
> +               printf 'all: ; @echo $(no-header-test)\n'
> +               cat "usr/include/Makefile"

You must pass SRCARCH=$arch.

Otherwise,

ifeq ($(SRCARCH),...)
  ...
endif

are all skipped.





> +       } | make -f - | tr " " "\n" | grep -v "asm-generic"
> +
> +       # One additional header file is not building correctly
> +       # with this method.
> +       # TODO: why can't we build this one?
> +       printf "asm-generic/ucontext.h\n"


Answer - it is not intended for standalone compiling in the first place.

<asm-generic/*.h> should be included from <asm/*.h>.

Userspace never ever includes <asm-generic/*.h> directly.
(If it does, it is a bug in the userspace program)

I am afraid you read user/include/Makefile wrongly.




> +
> +# Install headers for every arch and ref we need
> +install_headers() {
> +       local -r check_all="$1"
> +       local -r base_ref="$2"
> +       local -r ref="$3"
> +
> +       local arch_list=()
> +       while read -r status file; do
> +               if arch="$(printf "%s" "$file" | grep -o 'arch/.*/uapi' | cut -d '/' -f 2)"; then
> +                       # shellcheck disable=SC2076
> +                       if ! [[ " ${arch_list[*]} " =~ " $arch " ]]; then
> +                               arch_list+=("$arch")
> +                       fi
> +               fi
> +       done < <(get_uapi_files "$check_all" "$base_ref" "$ref")
> +
> +       deviated_from_current_tree="false"
> +       for inst_ref in "$base_ref" "$ref"; do
> +               if [ -n "$inst_ref" ]; then
> +                       if [ "$deviated_from_current_tree" = "false" ]; then
> +                               save_tree_state
> +                               trap 'rm -rf "$tmp_dir"; restore_tree_state;' EXIT
> +                               deviated_from_current_tree="true"
> +                       fi
> +                       git checkout --quiet "$(git rev-parse "$inst_ref")"


I might be wrong, but I was worried when I looked at this line
because git-checkout may change the running code
if check-uapi.sh is changed between ref and base_ref.

If bash always loads all code into memory before running
it is safe but I do not know how it works.


If this is safe, some comments might be worthwhile:

    # 'git checkout' may update this script itself while running,
    # but it is OK because ...





> +
> +# Make sure we have the tools we need
> +check_deps() {
> +       export ABIDIFF="${ABIDIFF:-abidiff}"
> +
> +       if ! command -v "$ABIDIFF" > /dev/null 2>&1; then
> +               eprintf "error - abidiff not found!\n"
> +               eprintf "Please install abigail-tools (version 1.7 or greater)\n"
> +               eprintf "See: https://sourceware.org/libabigail/manual/libabigail-overview.html\n"
> +               exit 1
> +       fi
> +
> +       read -r abidiff_maj abidiff_min _unused < <("$ABIDIFF" --version | cut -d ' ' -f 2 | tr '.' ' ')
> +       if [ "$abidiff_maj" -lt 1 ] || { [ "$abidiff_maj" -eq 1 ] && [ "$abidiff_min" -lt 7 ]; }; then


This is up to you, but I think "sort -V" would be cleaner.
(see Documentation/devicetree/bindings/Makefile for example)




> +       fi
> +
> +       if [ ! -x "scripts/unifdef" ]; then
> +               if ! make -f /dev/null scripts/unifdef; then

Previously, I wanted to point out that using Make is meaningless,
and using gcc directly is better.


But, is this still necessary?

V2 uses 'make headers_install' to install all headers.
scripts/unifdef is not used anywhere in this script.






> +
> +       abi_error_log="${abi_error_log:-${KERNEL_SRC}/abi_error_log.txt}"
> +
> +       check_deps
> +
> +       tmp_dir=$(mktemp -d)
> +       trap 'rm -rf "$tmp_dir"' EXIT
> +
> +       # Set of UAPI directories to check by default
> +       UAPI_DIRS=(include/uapi arch/*/include/uapi)
> +
> +       if ! git rev-parse --is-inside-work-tree > /dev/null 2>&1; then
> +               eprintf "error - this script requires the kernel tree to be initialized with Git\n"
> +               exit 1
> +       fi
> +
> +       # If there are no dirty UAPI files, use HEAD as base_ref
> +       if [ -z "$base_ref" ] && [ "$(get_uapi_files "" "" | wc -l)" -eq 0 ]; then
> +               base_ref="HEAD"
> +       fi
> +
> +       if [ -z "$ref_to_check" ]; then
> +               if [ -n "$base_ref" ]; then
> +                       ref_to_check="${base_ref}^1"
> +               else
> +                       ref_to_check="HEAD"
> +               fi
> +       fi


I think this is because I am not good at English, but
I was so confused between 'base_ref' vs 'ref_to_check'.
I do not get which one is the ancestor from the names.

I thought 'ref_a' and 'ref_b' would be less confusing,
but I hope somebody will come up with better naming
than that.




--
Best Regards





Masahiro Yamada
  
John Moon March 5, 2023, 5:08 p.m. UTC | #3
On 3/4/2023 8:22 PM, Masahiro Yamada wrote:
> On Wed, Mar 1, 2023 at 4:54 PM John Moon <quic_johmoo@quicinc.com> wrote:
>>
>> While the kernel community has been good at maintaining backwards
>> compatibility with kernel UAPIs, it would be helpful to have a tool
>> to check if a commit introduces changes that break backwards
>> compatibility.
>>
>> To that end, introduce check-uapi.sh: a simple shell script that
>> checks for changes to UAPI headers using libabigail.
>>
>> libabigail is "a framework which aims at helping developers and
>> software distributors to spot some ABI-related issues like interface
>> incompatibility in ELF shared libraries by performing a static
>> analysis of the ELF binaries at hand."
>>
>> The script uses one of libabigail's tools, "abidiff", to compile the
>> changed header before and after the commit to detect any changes.
>>
>> abidiff "compares the ABI of two shared libraries in ELF format. It
>> emits a meaningful report describing the differences between the two
>> ABIs."
>>
>> The script also includes the ability to check the compatibilty of
>> all UAPI headers across commits. This allows developers to inspect
>> the stability of the UAPIs over time.
> 
> 
> Let's see more test cases.
> 
> 
> [Case 1]
> 
> I think d759be8953febb6e5b5376c7d9bbf568864c6e2d
> is a trivial/good cleanup.
> Apparently, it still exports equivalent headers,
> but this tool reports "incorrectly removed".
> 
> 
> 
> $ ./scripts/check-uapi.sh -b d759be8953
> Saving current tree state... OK
> Installing sanitized UAPI headers from d759be8953... OK
> Installing sanitized UAPI headers from d759be8953^1... OK
> Restoring current tree state... OK
> Checking changes to UAPI headers starting from d759be8953
> error - UAPI header arch/alpha/include/uapi/asm/poll.h was incorrectly removed
> error - UAPI header arch/ia64/include/uapi/asm/poll.h was incorrectly removed
> error - UAPI header arch/x86/include/uapi/asm/poll.h was incorrectly removed
> /tmp/tmp.ixUIBlntUP/d759be8953/x86/usr/include/asm/Kbuild does not
> exist - cannot compare ABI
> /tmp/tmp.ixUIBlntUP/d759be8953/alpha/usr/include/asm/Kbuild does not
> exist - cannot compare ABI
> /tmp/tmp.ixUIBlntUP/d759be8953/ia64/usr/include/asm/Kbuild does not
> exist - cannot compare ABI
> error - 6/6 UAPI headers modified between d759be8953^1 and d759be8953
> are not backwards compatible
> error - UAPI header ABI check failed
> Failure summary saved to /home/masahiro/ref/linux/abi_error_log.txt
> 
> 

This is an interesting test case. Thanks for bringing it up. I don't 
know if there's a way for the script to filter out these kinds of 
changes, so it may just need to be noted under possible false positives 
in the document.

It also reveals that the script isn't filtering out non-headers from the 
git diffs... I'll fix that in v3.

> 
> [Case 2]
> 
> This tool compiles only changed headers.
> Does it detect ABI change?
> 
> I believe the users of the headers must be compiled.
> 
> 
> 
> Think about this case.
> 
> 
> $ cat foo-typedef.h
> typedef int foo_cap_type;
> 
> 
> $ cat foo.h
> #include "foo-typedef.h"
> 
> struct foo {
>         foo_cap_type capability;
> };
> 
> 
> 
> Then, change the first header to
>    typedef long long foo_cap_type;
> 
> abidiff will never notice the ABI change
> until it compiles "foo.h" instead of "foo-typedef.h"
>  >
> 
> For testing, I applied the following patch.
> 
> 
>   --- a/include/uapi/linux/types.h
>   +++ b/include/uapi/linux/types.h
>   @@ -52,7 +52,7 @@ typedef __u32 __bitwise __wsum;
>    #define __aligned_be64 __be64 __attribute__((aligned(8)))
>    #define __aligned_le64 __le64 __attribute__((aligned(8)))
> 
>   -typedef unsigned __bitwise __poll_t;
>   +typedef unsigned short __bitwise __poll_t;
> 
>    #endif /*  __ASSEMBLY__ */
>    #endif /* _UAPI_LINUX_TYPES_H */
> 
> 
> 
> 
> I believe this is an ABI change because this will change
> 'struct epoll_event' in the include/uapi/linux/eventpoll.h
> but the tool happily reports it is backwards compatible.
> 
> 
> $ ./scripts/check-uapi.sh
> Saving current tree state... OK
> Installing sanitized UAPI headers from HEAD... OK
> Installing sanitized UAPI headers from HEAD^1... OK
> Restoring current tree state... OK
> Checking changes to UAPI headers starting from HEAD
> No ABI differences detected in include/uapi/linux/types.h from HEAD^1 -> HEAD
> All 1 UAPI headers modified between HEAD^1 and HEAD are backwards compatible!
> 
> 

You're correct, this case is missed when only checking modified headers. 
With the same patch, if I pass "-a" to the script, it does catch the change:

% ./scripts/check-uapi.sh -a
--- snip ---
error - 1/1328 UAPI headers modified between HEAD and dirty tree are not 
backwards compatible
error - UAPI header ABI check failed


% cat abi_error_log.txt
Generated by "./scripts/check-uapi.sh -a" from git ref 
94b1166f7954f1136f307dafbaad5f9d871b73bf

!!! ABI differences detected in include/uapi/linux/eventpoll.h from HEAD 
-> dirty tree !!!

     [C] 'struct epoll_event' changed:
       type size changed from 96 to 80 (in bits)
       2 data member changes:
         type of '__poll_t events' changed:
           underlying type 'unsigned int' changed:
             type name changed from 'unsigned int' to 'unsigned short int'
             type size changed from 32 to 16 (in bits)
         '__u64 data' offset changed from 32 to 16 (in bits) (by -16 bits)

Perhaps in the next revision we could add some way to detect these 
dependencies (e.g. if foo.h includes bar.h, and bar.h was modified, we 
should check foo.h). However, the time savings may not be worth the 
complicated and potentially fragile dependency detection.

For now, "-a" should catch this, and it only took about 1 minute to run 
through all the headers on my 8-core machine, so it should be a 
resonable test step for a CI system.

> 
> 
> 
> I would not use such a tool that contains both false positives
> and false negatives, but you may notice this is more difficult
> than you had expected.
> 

Right, it certainly has its shortcomings. I appreciate you helping us 
find and address them! Even in its current state, I believe the script 
has value for developers and reviewers. :)

> I do not know if further review is worthwhile since this does not work
> but I added some more in-line comments.
> 
> 
> 
> 
> 
> 
>> +
>> +# Some UAPI headers require an architecture-specific compiler to build properly.
>> +ARCH_SPECIFIC_CC_NEEDED=(
>> +       "arch/hexagon/include/uapi/asm/sigcontext.h"
>> +       "arch/ia64/include/uapi/asm/intel_intrin.h"
>> +       "arch/ia64/include/uapi/asm/setup.h"
>> +       "arch/ia64/include/uapi/asm/sigcontext.h"
>> +       "arch/mips/include/uapi/asm/bitfield.h"
>> +       "arch/mips/include/uapi/asm/byteorder.h"
>> +       "arch/mips/include/uapi/asm/inst.h"
>> +       "arch/sparc/include/uapi/asm/fbio.h"
>> +       "arch/sparc/include/uapi/asm/uctx.h"
>> +       "arch/xtensa/include/uapi/asm/byteorder.h"
>> +       "arch/xtensa/include/uapi/asm/msgbuf.h"
>> +       "arch/xtensa/include/uapi/asm/sembuf.h"
>> +)
> 
> 
> Yes, arch/*/include/ must be compiled by the target compiler.
> If you compile them by the host compiler, it is unpredictable (i.e. wrong).
> 
> BTW, was this blacklist detected on a x86 host?
> 

Yes.

> If you do this on an ARM/ARM64 host, some headers
> under arch/x86/include/uapi/ might be blacklisted?
> 

Good point - I missed those!

> 
> 
>> +# Compile the simple test app
>> +do_compile() {
>> +       local -r inc_dir="$1"
>> +       local -r header="$2"
>> +       local -r out="$3"
>> +       printf "int main(void) { return 0; }\n" | \
>> +               "${CC:-gcc}" -c \
>> +                 -o "$out" \
>> +                 -x c \
>> +                 -O0 \
>> +                 -std=c90 \
>> +                 -fno-eliminate-unused-debug-types \
>> +                 -g \
>> +                 "-I${inc_dir}" \
>> +                 -include "$header" \
>> +                 -
>> +}
>> +
>> +# Print the list of incompatible headers from the usr/include Makefile
>> +get_no_header_list() {
>> +       {
>> +               # shellcheck disable=SC2016
>> +               printf 'all: ; @echo $(no-header-test)\n'
>> +               cat "usr/include/Makefile"
> 
> You must pass SRCARCH=$arch.
> 
> Otherwise,
> 
> ifeq ($(SRCARCH),...)
>    ...
> endif
> 
> are all skipped.
> 
> 

Thanks for the tip, that explains it. Should be able to address this in v3.

> 
> 
> 
>> +       } | make -f - | tr " " "\n" | grep -v "asm-generic"
>> +
>> +       # One additional header file is not building correctly
>> +       # with this method.
>> +       # TODO: why can't we build this one?
>> +       printf "asm-generic/ucontext.h\n"
> 
> 
> Answer - it is not intended for standalone compiling in the first place.
> 
> <asm-generic/*.h> should be included from <asm/*.h>.
> 
> Userspace never ever includes <asm-generic/*.h> directly.
> (If it does, it is a bug in the userspace program)
> 
> I am afraid you read user/include/Makefile wrongly.
> 
> 

Understood. I think I had misinterpreted one of your comments on v1, but 
now I'm clear. Will address in v3.

> 
> 
>> +
>> +# Install headers for every arch and ref we need
>> +install_headers() {
>> +       local -r check_all="$1"
>> +       local -r base_ref="$2"
>> +       local -r ref="$3"
>> +
>> +       local arch_list=()
>> +       while read -r status file; do
>> +               if arch="$(printf "%s" "$file" | grep -o 'arch/.*/uapi' | cut -d '/' -f 2)"; then
>> +                       # shellcheck disable=SC2076
>> +                       if ! [[ " ${arch_list[*]} " =~ " $arch " ]]; then
>> +                               arch_list+=("$arch")
>> +                       fi
>> +               fi
>> +       done < <(get_uapi_files "$check_all" "$base_ref" "$ref")
>> +
>> +       deviated_from_current_tree="false"
>> +       for inst_ref in "$base_ref" "$ref"; do
>> +               if [ -n "$inst_ref" ]; then
>> +                       if [ "$deviated_from_current_tree" = "false" ]; then
>> +                               save_tree_state
>> +                               trap 'rm -rf "$tmp_dir"; restore_tree_state;' EXIT
>> +                               deviated_from_current_tree="true"
>> +                       fi
>> +                       git checkout --quiet "$(git rev-parse "$inst_ref")"
> 
> 
> I might be wrong, but I was worried when I looked at this line
> because git-checkout may change the running code
> if check-uapi.sh is changed between ref and base_ref.
> 
> If bash always loads all code into memory before running
> it is safe but I do not know how it works.
> 
> 
> If this is safe, some comments might be worthwhile:
> 
>      # 'git checkout' may update this script itself while running,
>      # but it is OK because ...
> 

Yes, my understanding is that since the script is all encapsulated in 
functions, the shell has loaded all of the functions before execution 
starts. My testing has shown this to be safe as well. Will add a comment 
in v3.

> 
> 
> 
> 
>> +
>> +# Make sure we have the tools we need
>> +check_deps() {
>> +       export ABIDIFF="${ABIDIFF:-abidiff}"
>> +
>> +       if ! command -v "$ABIDIFF" > /dev/null 2>&1; then
>> +               eprintf "error - abidiff not found!\n"
>> +               eprintf "Please install abigail-tools (version 1.7 or greater)\n"
>> +               eprintf "See: https://sourceware.org/libabigail/manual/libabigail-overview.html\n"
>> +               exit 1
>> +       fi
>> +
>> +       read -r abidiff_maj abidiff_min _unused < <("$ABIDIFF" --version | cut -d ' ' -f 2 | tr '.' ' ')
>> +       if [ "$abidiff_maj" -lt 1 ] || { [ "$abidiff_maj" -eq 1 ] && [ "$abidiff_min" -lt 7 ]; }; then
> 
> 
> This is up to you, but I think "sort -V" would be cleaner.
> (see Documentation/devicetree/bindings/Makefile for example)
> 
> 

Noted.

> 
> 
>> +       fi
>> +
>> +       if [ ! -x "scripts/unifdef" ]; then
>> +               if ! make -f /dev/null scripts/unifdef; then
> 
> Previously, I wanted to point out that using Make is meaningless,
> and using gcc directly is better.
> 
> 
> But, is this still necessary?
> 
> V2 uses 'make headers_install' to install all headers.
> scripts/unifdef is not used anywhere in this script.
> 
> 

Ah, you're right it is not necessary. Previously, we were calling 
headers_install.sh directly, so make wasn't there to supply the unifdef 
dependency. Will remove this in v3.

> 
> 
> 
> 
>> +
>> +       abi_error_log="${abi_error_log:-${KERNEL_SRC}/abi_error_log.txt}"
>> +
>> +       check_deps
>> +
>> +       tmp_dir=$(mktemp -d)
>> +       trap 'rm -rf "$tmp_dir"' EXIT
>> +
>> +       # Set of UAPI directories to check by default
>> +       UAPI_DIRS=(include/uapi arch/*/include/uapi)
>> +
>> +       if ! git rev-parse --is-inside-work-tree > /dev/null 2>&1; then
>> +               eprintf "error - this script requires the kernel tree to be initialized with Git\n"
>> +               exit 1
>> +       fi
>> +
>> +       # If there are no dirty UAPI files, use HEAD as base_ref
>> +       if [ -z "$base_ref" ] && [ "$(get_uapi_files "" "" | wc -l)" -eq 0 ]; then
>> +               base_ref="HEAD"
>> +       fi
>> +
>> +       if [ -z "$ref_to_check" ]; then
>> +               if [ -n "$base_ref" ]; then
>> +                       ref_to_check="${base_ref}^1"
>> +               else
>> +                       ref_to_check="HEAD"
>> +               fi
>> +       fi
> 
> 
> I think this is because I am not good at English, but
> I was so confused between 'base_ref' vs 'ref_to_check'.
> I do not get which one is the ancestor from the names.
> 
> I thought 'ref_a' and 'ref_b' would be less confusing,
> but I hope somebody will come up with better naming
> than that.
> 

Agreed, I think this is a confusing case for native English-speakers too. :)

I want to indicate that one ref has to come after the other in the Git 
tree. Maybe "base_ref" and "past_ref"? We'll think on it.

> 
> 
> 
> --
> Best Regards
> 
> 
> 
> 
> 
> Masahiro Yamada

Thank you again for the detailed review!
  

Patch

diff --git a/scripts/check-uapi.sh b/scripts/check-uapi.sh
new file mode 100755
index 000000000000..022cc7f8a2a9
--- /dev/null
+++ b/scripts/check-uapi.sh
@@ -0,0 +1,451 @@ 
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+# Script to check commits for UAPI backwards compatibility
+
+set -o errexit
+set -o pipefail
+
+print_usage() {
+	name=$(basename "$0")
+	cat << EOF
+$name - check for UAPI header stability across Git commits
+
+By default, the script will check to make sure the latest commit (or current
+dirty changes) did not introduce ABI changes when compared to HEAD^1. You can
+check against additional commit ranges with the -b and -r options.
+
+To force the script to check compatibility of all UAPI headers, use the -a
+option.
+
+Usage: $name [-b BASE_REF] [-r COMP_REF] [-a] [-j N] [-l ERROR_LOG]
+
+Options:
+    -b BASE_REF    Base git reference to use for comparison. If unspecified or empty,
+                   will use any dirty changes in tree to UAPI files. If there are no
+                   dirty changes, HEAD will be used.
+    -r COMP_REF    Compare BASE_REF to COMP_REF (e.g. -r v6.1). If unspecified or empty,
+                   will use BASE_REF^1.
+    -a             Check all UAPI headers for backwards compatibility.
+    -j JOBS        Number of checks to run in parallel (default: number of CPU cores)
+    -l ERROR_LOG   Write error log to file (default: "\$KERNEL_SOURCE/abi_error_log.txt")
+
+Environmental args:
+    ABIDIFF  Custom path to abidiff binary
+    CC       C compiler (default is "gcc")
+EOF
+}
+
+# Some UAPI headers require an architecture-specific compiler to build properly.
+ARCH_SPECIFIC_CC_NEEDED=(
+	"arch/hexagon/include/uapi/asm/sigcontext.h"
+	"arch/ia64/include/uapi/asm/intel_intrin.h"
+	"arch/ia64/include/uapi/asm/setup.h"
+	"arch/ia64/include/uapi/asm/sigcontext.h"
+	"arch/mips/include/uapi/asm/bitfield.h"
+	"arch/mips/include/uapi/asm/byteorder.h"
+	"arch/mips/include/uapi/asm/inst.h"
+	"arch/sparc/include/uapi/asm/fbio.h"
+	"arch/sparc/include/uapi/asm/uctx.h"
+	"arch/xtensa/include/uapi/asm/byteorder.h"
+	"arch/xtensa/include/uapi/asm/msgbuf.h"
+	"arch/xtensa/include/uapi/asm/sembuf.h"
+)
+
+# Print to stderr
+eprintf() {
+	# shellcheck disable=SC2059
+	printf "$@" >&2
+}
+
+# Print list of UAPI files to operate on
+get_uapi_files() {
+	local -r check_all="$1"
+	local -r base_ref="$2"
+	local -r ref="$3"
+	local -r args="--name-status --no-renames --format= --diff-filter=a -- ${UAPI_DIRS[*]}"
+
+	if [ "$check_all" = "true" ]; then
+		# Use find to print all of the UAPI files as if git had detected they were modified
+		# Ignore the headers that need an arch-specific compiler because we can't build all
+		# of those in one run (as only one CC can be passed).
+		# shellcheck disable=SC2046,SC2048,SC2086
+		find "${UAPI_DIRS[@]}" -type f -name '*.h' -printf 'M\t%p\n' \
+			| grep -v $(get_no_header_list | xargs -- printf '-e %s ') \
+				  ${ARCH_SPECIFIC_CC_NEEDED[*]/#/-e }
+	else
+		if [ -z "$base_ref" ] || [ -z "$ref" ]; then
+			# shellcheck disable=SC2086
+			git diff $args
+		else
+			# shellcheck disable=SC2086
+			git diff "$ref" "$base_ref" $args
+		fi
+	fi
+}
+
+# Compile the simple test app
+do_compile() {
+	local -r inc_dir="$1"
+	local -r header="$2"
+	local -r out="$3"
+	printf "int main(void) { return 0; }\n" | \
+		"${CC:-gcc}" -c \
+		  -o "$out" \
+		  -x c \
+		  -O0 \
+		  -std=c90 \
+		  -fno-eliminate-unused-debug-types \
+		  -g \
+		  "-I${inc_dir}" \
+		  -include "$header" \
+		  -
+}
+
+# Print the list of incompatible headers from the usr/include Makefile
+get_no_header_list() {
+	{
+		# shellcheck disable=SC2016
+		printf 'all: ; @echo $(no-header-test)\n'
+		cat "usr/include/Makefile"
+	} | make -f - | tr " " "\n" | grep -v "asm-generic"
+
+	# One additional header file is not building correctly
+	# with this method.
+	# TODO: why can't we build this one?
+	printf "asm-generic/ucontext.h\n"
+}
+
+# Save the current git tree state, stashing if needed
+save_tree_state() {
+	printf "Saving current tree state... "
+	current_ref="$(git rev-parse HEAD)"
+	readonly current_ref
+	if ! git diff-index --quiet HEAD; then
+		unstash="true"
+		git stash push --quiet
+	fi
+	printf "OK\n"
+}
+
+# Restore the git tree state, unstashing if needed
+restore_tree_state() {
+	printf "Restoring current tree state... "
+	git checkout --quiet "$current_ref"
+	if [ "$unstash" = "true" ]; then
+		git stash pop --quiet
+		unstash="false"
+	fi
+	printf "OK\n"
+}
+
+# Install headers for every arch and ref we need
+install_headers() {
+	local -r check_all="$1"
+	local -r base_ref="$2"
+	local -r ref="$3"
+
+	local arch_list=()
+	while read -r status file; do
+		if arch="$(printf "%s" "$file" | grep -o 'arch/.*/uapi' | cut -d '/' -f 2)"; then
+			# shellcheck disable=SC2076
+			if ! [[ " ${arch_list[*]} " =~ " $arch " ]]; then
+				arch_list+=("$arch")
+			fi
+		fi
+	done < <(get_uapi_files "$check_all" "$base_ref" "$ref")
+
+	deviated_from_current_tree="false"
+	for inst_ref in "$base_ref" "$ref"; do
+		if [ -n "$inst_ref" ]; then
+			if [ "$deviated_from_current_tree" = "false" ]; then
+				save_tree_state
+				trap 'rm -rf "$tmp_dir"; restore_tree_state;' EXIT
+				deviated_from_current_tree="true"
+			fi
+			git checkout --quiet "$(git rev-parse "$inst_ref")"
+		fi
+
+		printf "Installing sanitized UAPI headers from %s... " "${inst_ref:-dirty tree}"
+		make INSTALL_HDR_PATH="${tmp_dir}/${inst_ref}/usr" headers_install > /dev/null 2>&1
+		for arch in "${arch_list[@]}"; do
+			make ARCH="$arch" INSTALL_HDR_PATH="${tmp_dir}/${inst_ref}/${arch}/usr" \
+				headers_install > /dev/null 2>&1
+		done
+		printf "OK\n"
+	done
+
+	restore_tree_state
+	trap 'rm -rf "$tmp_dir"' EXIT
+}
+
+# Check file list for UAPI compatibility
+check_uapi_files() {
+	local -r check_all="$1"
+	local -r base_ref="$2"
+	local -r ref="$3"
+
+	install_headers "$check_all" "$base_ref" "$ref"
+
+	local passed=0;
+	local failed=0;
+	local -a threads=()
+
+	printf "Checking changes to UAPI headers starting from %s\n" "${base_ref:-dirty tree}"
+	while read -r status file; do
+		if [ "${#threads[@]}" -ge "$MAX_THREADS" ]; then
+			if wait "${threads[0]}"; then
+				passed=$((passed + 1))
+			else
+				failed=$((failed + 1))
+			fi
+			threads=("${threads[@]:1}")
+		fi
+
+		check_individual_file "$base_ref" "$ref" "$status" "$file" &
+		threads+=("$!")
+	done < <(get_uapi_files "$check_all" "$base_ref" "$ref")
+
+	for t in "${threads[@]}"; do
+		if wait "$t"; then
+			passed=$((passed + 1))
+		else
+			failed=$((failed + 1))
+		fi
+	done
+
+	total="$((passed + failed))"
+	if [ "$failed" -gt 0 ]; then
+		eprintf "error - %d/%d UAPI headers modified between %s and %s are not backwards compatible\n" \
+			"$failed" "$total" "$ref" "${base_ref:-dirty tree}"
+	else
+		printf "All %d UAPI headers modified between %s and %s are backwards compatible!\n" \
+			"$total" "$ref" "${base_ref:-dirty tree}"
+	fi
+
+	return "$failed"
+}
+
+# Print the path to a give header in the tmp_dir
+get_header() {
+	local -r ref="$1"
+	local -r arch="$2"
+	local -r base="$3"
+
+	if [ -z "$arch" ]; then
+		printf "%s" "${tmp_dir}/${ref}/usr/${base}"
+	else
+		printf "%s" "${tmp_dir}/${ref}/${arch}/usr/$(printf "%s" "$base" | cut -d '/' -f 3-)"
+	fi
+}
+
+# Check an individual file for UAPI compatibility
+check_individual_file() {
+	local -r base_ref="$1"
+	local -r ref="$2"
+	local -r status="$3"
+	local -r file="$4"
+
+	if [ "$status" = "D" ]; then
+		eprintf "error - UAPI header %s was incorrectly removed\n" "$file"
+		return 1
+	fi
+
+	local -r base=${file/uapi\//}
+	local -r uapi_arch="$(printf "%s" "$file" | grep -o 'arch/.*/uapi' | cut -d '/' -f 2)"
+	local -r base_header=$(get_header "$base_ref" "$uapi_arch" "$base")
+	local -r ref_header=$(get_header "$ref" "$uapi_arch" "$base")
+	local -r installed_base="$(printf "%s" "$base_header" | grep -o "usr/include/.*" | cut -d '/' -f 3-)"
+
+	# shellcheck disable=SC2076
+	if [[ " $(get_no_header_list | xargs) " =~ " $installed_base " ]]; then
+		eprintf "%s cannot be tested by this script (see usr/include/Makefile)\n" "$file"
+		return 1
+	fi
+
+	for h in "$base_header" "$ref_header"; do
+		if [ ! -f "$h" ]; then
+			eprintf "%s does not exist - cannot compare ABI\n" "$h"
+			return 1
+		fi
+	done
+
+	compare_abi "$file" "$base_header" "$ref_header" "$base_ref" "$ref" "$uapi_arch"
+}
+
+# Perform the A/B compilation and compare output ABI
+compare_abi() {
+	local -r file="$1"
+	local -r base_header="$2"
+	local -r ref_header="$3"
+	local -r base_ref="$4"
+	local -r ref="$5"
+	local -r uapi_arch="$6"
+	local -r log="${tmp_dir}/log/$(basename "$file").log"
+
+	mkdir -p "$(dirname "$log")"
+
+	if ! do_compile "${tmp_dir}/${base_ref}/${uapi_arch}/usr/include" "$base_header" "${base_header}.bin" 2> "$log"; then
+		eprintf "error - couldn't compile current version of UAPI header %s\n" "$file"
+		# shellcheck disable=SC2076
+		if [[ " ${ARCH_SPECIFIC_CC_NEEDED[*]} " =~ " $file " ]]; then
+			eprintf "warning - this file needs to be built with a %s compiler. Are you using one?\n" "$uapi_arch"
+		fi
+		cat "$log" >&2
+		exit 1
+	fi
+
+	if ! do_compile "${tmp_dir}/${ref}/${uapi_arch}/usr/include" "$ref_header" "${ref_header}.bin" 2> "$log"; then
+		eprintf "error - couldn't compile version of UAPI header %s at %s\n" "$file" "$ref"
+		# shellcheck disable=SC2076
+		if [[ " ${ARCH_SPECIFIC_CC_NEEDED[*]} " =~ " $file " ]]; then
+			eprintf "warning - this file needs to be built with a %s compiler. Are you using one?\n" "$uapi_arch"
+		fi
+
+		cat "$log" >&2
+		exit 1
+	fi
+
+	if "$ABIDIFF" --non-reachable-types "${ref_header}.bin" "${base_header}.bin" > "$log"; then
+		printf "No ABI differences detected in %s from %s -> %s\n" "$file" "$ref" "${base_ref:-dirty tree}"
+	else
+		# If the only changes were additions (not modifications to existing APIs), then
+		# there's no problem. Ignore these diffs.
+		if grep "Unreachable types summary" "$log" | grep -q "0 removed" &&
+		   grep "Unreachable types summary" "$log" | grep -q "0 changed"; then
+			return 0
+		fi
+		{
+			printf "!!! ABI differences detected in %s from %s -> %s !!!\n\n" "$file" "$ref" "${base_ref:-dirty tree}"
+			sed  -e '/summary:/d' -e '/changed type/d' -e '/^$/d' -e 's/^/  /g' "$log"
+			printf "\nHeader file diff (after headers_install):\n"
+			diff -Naur "$ref_header" "$base_header" \
+				| sed -e "s|${ref_header}|${ref}/${file}|g" \
+				      -e "s|${base_header}|${base_ref:-dirty}/${file}|g"
+			printf "\n"
+		} | tee "${base_header}.error" >&2
+		return 1
+	fi
+}
+
+# Make sure we have the tools we need
+check_deps() {
+	export ABIDIFF="${ABIDIFF:-abidiff}"
+
+	if ! command -v "$ABIDIFF" > /dev/null 2>&1; then
+		eprintf "error - abidiff not found!\n"
+		eprintf "Please install abigail-tools (version 1.7 or greater)\n"
+		eprintf "See: https://sourceware.org/libabigail/manual/libabigail-overview.html\n"
+		exit 1
+	fi
+
+	read -r abidiff_maj abidiff_min _unused < <("$ABIDIFF" --version | cut -d ' ' -f 2 | tr '.' ' ')
+	if [ "$abidiff_maj" -lt 1 ] || { [ "$abidiff_maj" -eq 1 ] && [ "$abidiff_min" -lt 7 ]; }; then
+		eprintf "error - abidiff version too old: %s\n" "$("$ABIDIFF" --version)"
+		eprintf "Please install abigail-tools (version 1.7 or greater)\n"
+		eprintf "See: https://sourceware.org/libabigail/manual/libabigail-overview.html\n"
+		exit 1
+	fi
+
+	if [ ! -x "scripts/unifdef" ]; then
+		if ! make -f /dev/null scripts/unifdef; then
+			eprintf 'error - failed to build required dependency "scripts/unifdef"\n'
+			exit 1
+		fi
+	fi
+}
+
+main() {
+	MAX_THREADS=$(nproc)
+
+	base_ref=""
+	check_all="false"
+	while getopts "hb:r:aj:l:" opt; do
+		case $opt in
+		h)
+			print_usage
+			exit 0
+			;;
+		b)
+			base_ref="$OPTARG"
+			;;
+		r)
+			ref_to_check="$OPTARG"
+			;;
+		a)
+			check_all="true"
+			;;
+		j)
+			MAX_THREADS="$OPTARG"
+			;;
+		l)
+			abi_error_log="$OPTARG"
+			;;
+		*)
+			eprintf "error - invalid option %s\n" "$opt"
+			exit 1
+		esac
+	done
+
+	if [ -z "$KERNEL_SRC" ]; then
+		KERNEL_SRC="$(realpath "$(dirname "$0")"/..)"
+	fi
+
+	cd "$KERNEL_SRC"
+
+	abi_error_log="${abi_error_log:-${KERNEL_SRC}/abi_error_log.txt}"
+
+	check_deps
+
+	tmp_dir=$(mktemp -d)
+	trap 'rm -rf "$tmp_dir"' EXIT
+
+	# Set of UAPI directories to check by default
+	UAPI_DIRS=(include/uapi arch/*/include/uapi)
+
+	if ! git rev-parse --is-inside-work-tree > /dev/null 2>&1; then
+		eprintf "error - this script requires the kernel tree to be initialized with Git\n"
+		exit 1
+	fi
+
+	# If there are no dirty UAPI files, use HEAD as base_ref
+	if [ -z "$base_ref" ] && [ "$(get_uapi_files "" "" | wc -l)" -eq 0 ]; then
+		base_ref="HEAD"
+	fi
+
+	if [ -z "$ref_to_check" ]; then
+		if [ -n "$base_ref" ]; then
+			ref_to_check="${base_ref}^1"
+		else
+			ref_to_check="HEAD"
+		fi
+	fi
+
+	if [ -n "$ref_to_check" ] && ! git rev-parse --verify "$ref_to_check" > /dev/null 2>&1; then
+		printf 'error - invalid git reference "%s"\n' "$ref_to_check"
+		exit 1
+	fi
+
+	if [ -n "$base_ref" ]; then
+		if ! git merge-base --is-ancestor "$ref_to_check" "$base_ref" > /dev/null 2>&1; then
+			printf 'error - "%s" is not an ancestor of base ref "%s"\n' "$ref_to_check" "$base_ref"
+			exit 1
+		fi
+	fi
+
+	if [ "$check_all" != "true" ] && [ "$(get_uapi_files "$check_all" "$base_ref" "$ref_to_check" | wc -l)" -eq 0 ]; then
+		printf "No changes to UAPI headers were applied between %s and %s\n" "$ref_to_check" "$base_ref"
+		exit 0
+	fi
+
+	if ! check_uapi_files "$check_all" "$base_ref" "$ref_to_check"; then
+		eprintf "error - UAPI header ABI check failed\n"
+		{
+			printf 'Generated by "%s %s" from git ref %s\n\n' "$0" "$*" "$(git rev-parse "HEAD")"
+			find "$tmp_dir" -type f -name '*.error' -exec cat {} \;
+		} > "$abi_error_log"
+		eprintf "Failure summary saved to %s\n" "$abi_error_log"
+		exit 1
+	fi
+}
+
+main "$@"