[2/2] scripts: Add sort-makefile-lines.py to sort Makefile variables.

Message ID 20230428114811.4129539-3-carlos@redhat.com
State Superseded
Delegated to: Siddhesh Poyarekar
Headers
Series Standardize the sorting of Makefile variables |

Checks

Context Check Description
dj/TryBot-32bit success Build for i686

Commit Message

Carlos O'Donell April 28, 2023, 11:48 a.m. UTC
  The scripts/sort-makefile-lines.py script sorts Makefile variables
according to project expected order.

The script is used like this:

$ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
$ mv elf/Makefile.tmp elf/Makefile
---
 scripts/sort-makefile-lines.py | 217 +++++++++++++++++++++++++++++++++
 1 file changed, 217 insertions(+)
 create mode 100755 scripts/sort-makefile-lines.py
  

Comments

Siddhesh Poyarekar May 8, 2023, 5:47 p.m. UTC | #1
On 2023-04-28 07:48, Carlos O'Donell via Libc-alpha wrote:
> The scripts/sort-makefile-lines.py script sorts Makefile variables
> according to project expected order.
> 
> The script is used like this:
> 
> $ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
> $ mv elf/Makefile.tmp elf/Makefile

Should we have a convenience make target like `make sort-makefile-lines` 
to allow folks to reflow files?  I'm trying to think of how we could 
make it easy for users to consume this.

> ---
>   scripts/sort-makefile-lines.py | 217 +++++++++++++++++++++++++++++++++
>   1 file changed, 217 insertions(+)
>   create mode 100755 scripts/sort-makefile-lines.py
> 
> diff --git a/scripts/sort-makefile-lines.py b/scripts/sort-makefile-lines.py
> new file mode 100755
> index 0000000000..06c0b3b3a2
> --- /dev/null
> +++ b/scripts/sort-makefile-lines.py
> @@ -0,0 +1,217 @@
> +#!/usr/bin/python3
> +# Sort Makefile lines as expected by project policy.
> +# Copyright (C) 2022-2023 Free Software Foundation, Inc.
> +# Copyright The GNU Toolchain Authors.

Why both?  Does it include code that was written by someone who does not 
have copyright assignment on file with the FSF?

> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <https://www.gnu.org/licenses/>.
> +
> +# The project consensus is to split Makefile variable assignment
> +# across multiple lines with one value per line.  The values are
> +# then sorted as described below, and terminated with a special
> +# list termination marker.  This splitting makes it much easier
> +# to add new tests to the list since they become just a single
> +# line insertion.  It also makes backports and merges easier
> +# since the new test may not conflict due to the ordering.
> +#
> +# Consensus discussion:
> +# https://public-inbox.org/libc-alpha/f6406204-84f5-adb1-d00e-979ebeebbbde@redhat.com/
> +#
> +# To support cleaning up Makefiles we created this program to
> +# help sort existing lists converted to the new format.
> +#
> +# The program takes as input the Makefile to sort correctly,
> +# and the output file to write the correctly sorted output.
> +#
> +# Sorting is only carried out between two special markers:
> +# (a) Marker start is '<variable> += \'
> +# (b) Marker end is '  # <variable>'
> +# With everthing between (a) and (b) being sorted.
> +#
> +# You can use it like this:
> +# $ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
> +# $ mv elf/Makefile.tmp elf/Makefile
> +#
> +# The Makefile lines in the project are sorted using the
> +# following rules:
> +# - First all lines are sorted as-if `LC_COLLATE=C sort`
> +# - Then all entries by group are sorted against the last digits
> +#   of the test.
> +#
> +# For example:
> +# ~~~
> +# tests += \
> +#   test-a \
> +#   test-b \
> +#   test-b1 \
> +#   test-b2 \
> +#   test-b10 \
> +#   test-b20 \
> +#   test-b100 \
> +#   # tests
> +# ~~~
> +# This example shows tests sorted alphabetically, followed
> +# by a numeric suffix sort in increasing numeric order.
> +#
> +# Required cleanups:
> +# - Tests that end in "a" or "b" variants must be renamed to
> +#   end in just the numerical value. For example 'tst-mutex7robust'
> +#   should be renamed to 'tst-mutex12' (the highest numbered test)
> +#   or 'tst-robust11' (the highest numbered test).
> +# - Modules that end in "mod" or "mod1" should be renamed. For
> +#   example 'tst-atfork2mod' should be renamed to 'tst-mod-atfork2'
> +#   (test module for atfork2). If there are more than one module
> +#   then they should be named with a suffix that uses [0-9] first
> +#   then [A-Z] next for a total of 36 possible modules per test.
> +#   No manually listed test currently uses more than that (though
> +#   automatically generated tests may; they don't need sorting).
> +# - Avoid including another test and instead refactor into common
> +#   code with all tests including hte common code, then give the
> +#   tests unique names.
> +#
> +# If you have a Makefile that needs converting, then you can
> +# quickly split the values into one-per-line, ensure the start
> +# and end markers are in place, and then run the script to
> +# sort the values.

I'm not going to block on this if you don't want to do it, but have you 
thought about making this even simpler by, e.g. only looking for the 
start and end marker, splitting everything in between and then sorting 
the values?  Basically don't require developers to break the lists into 
one value per line and do it through the script.

> +
> +import argparse
> +import sys
> +import locale
> +import re
> +import functools
> +
> +def numeric_key(line):
> +    # Turn a line into a numeric sort value by fetching
> +    # the ending number and using that as a key.
> +    var = re.search(r'([0-9]+) \\$', line)
> +    if var == None:
> +        print ("Error: Test line is currently: \"", end='')
> +        print (line, end='')
> +        print ("\"")
> +        print (
> +        '''
> +Test name does not match expected pattern.
> +Rename to match pattern e.g. tst-name[0-9]+.
> +        '''
> +        )
> +        raise Exception ("Invalid test name.")
> +    # Return the numeric value as the key or throws because
> +    # var is None.
> +    return int(var.group(1))
> +
> +def sort_lines(lines):
> +
> +    # Use the C locale for language independent collation.
> +    locale.setlocale (locale.LC_ALL, "C")

Will we ever have non-ASCII names for tests, routines, etc?

> +
> +    # Sort with strcoll initially.  The tests ending in numeric
> +    # names will not sort correctly, but we will adjust that next.
> +    lines = sorted(lines, key=functools.cmp_to_key(locale.strcoll))

I wonder if you could, instead of simply passing strcoll, use a custom 
function which calls strcoll (or a simple sort if we decide we'll never 
have non-ascii file names) and also accounts for the suffix, returning, 
e.g. -1 for cmp('tst-mutex9', 'tst-mutex10').

> +
> +    # We could use a trie here, but since the problem is restricted
> +    # to just numeric suffix we sort by group with a unique key
> +    # function.
> +
> +    # Build a list of all start markers (tuple includes prefix)
> +    prefixes = []
> +    groups = []
> +    for i in range(len(lines)):
> +        # Look for things like "  tst-foo1 \" to start the numbered list.
> +        var = re.search(r'([0-9]+) \\$', lines[i])
> +        if var:
> +            prefix = lines[i][0:var.span()[0]]
> +            if prefix in prefixes:
> +                continue
> +            prefixes.append(prefix)
> +            groups.append((prefix, i))
> +
> +    # For each prefix find the range it covers that needs numeric sorting.
> +    numgroups = []
> +    for group in groups:
> +        for j in range(group[1] + 1,len(lines)):
> +            if not lines[j].startswith(group[0]):
> +                # If it doesn't start with the prefix, then we're on to
> +                # to the next group so mark the last entry as the end
> +                # of the group.
> +                numgroups.append((group[0], group[1], j - 1))
> +                break
> +
> +    # We now have a list of groups to sort.
> +    for ng in numgroups:
> +        # Note slices exclude nth element, so we must add one to right side.
> +        lines[ng[1]:ng[2]+1] = sorted(lines[ng[1]:ng[2]+1], key=numeric_key)
> +
> +    # Return a sorted list with numeric tests sorted by number.
> +    return lines
> +
> +def sort_makefile_lines(infile, outfile):
> +

Maybe add a check here to ensure infile != outfile?  Or do we want to 
support that use case?  It should be doable since you're reading the 
entire file into a list at once.

> +    # Read the whole Makefile.
> +    mfile = open(infile)
> +    lines = mfile.readlines()
> +    mfile.close()
> +
> +    # We will output the Makefile here. Open it early to check
> +    # for any errors.
> +    ofile = open(outfile, "w")
> +
> +    # Build a list of all start markers (tuple includes name).
> +    startmarks = []
> +    for i in range(len(lines)):
> +        # Look for things like "var += \" to start the sorted list.
> +        var = re.search(r'^([a-zA-Z0-9]*) \+\= \\$', lines[i])
> +        if var:
> +            # Remember the index and the name.
> +            startmarks.append((i, var.group(1)))
> +
> +    # For each start marker try to find a matching end mark
> +    # and build a block that needs sorting.  The end marker
> +    # must have the matching comment name for it to be valid.
> +    rangemarks = []
> +    for sm in startmarks:
> +        # Look for things like "  # var" to end the sorted list.
> +        reg = r'^  # ' + sm[1] + r'$'
> +        for j in range(sm[0] + 1, len(lines)):
> +            if re.search(reg, lines[j]):
> +                # Rembember the block to sort (inclusive).
> +                rangemarks.append((sm[0] + 1, j - 1))
> +                break
> +
> +    # We now have a list of all ranges that need sorting.
> +    # Sort those ranges.
> +    for r in rangemarks:
> +        lines[r[0]:r[1]] = sort_lines(lines[r[0]:r[1]])
> +
> +    # Output the whole list with sorted lines.
> +    for line in lines:
> +        ofile.write(line)
> +
> +    ofile.close()
> +
> +def get_parser():
> +    parser = argparse.ArgumentParser(description=__doc__)
> +    parser.add_argument('-i', dest='infile',
> +                        help='Input Makefile to read lines from')
> +    parser.add_argument('-o', dest='outfile',
> +                        help='Output Makefile to write sorted lines to')
> +    return parser
> +
> +def main(argv):
> +    parser = get_parser()
> +    opts = parser.parse_args(argv)
> +    sort_makefile_lines (opts.infile, opts.outfile)
> +
> +if __name__ == '__main__':
> +    main(sys.argv[1:])
  
Carlos O'Donell May 9, 2023, 5:58 p.m. UTC | #2
On 5/8/23 13:47, Siddhesh Poyarekar wrote:
> On 2023-04-28 07:48, Carlos O'Donell via Libc-alpha wrote:
>> The scripts/sort-makefile-lines.py script sorts Makefile variables
>> according to project expected order.
>>
>> The script is used like this:
>>
>> $ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
>> $ mv elf/Makefile.tmp elf/Makefile
> 
> Should we have a convenience make target like `make
> sort-makefile-lines` to allow folks to reflow files?  I'm trying to
> think of how we could make it easy for users to consume this.

The reflow process is manual and NRE, so it doesn't deserve a target IMO.

We *do* need to either:
(a) Add a linter in make check.
(b) Add a pre-commit CI linter check.

Today we can't do this until we convert more of the Makefiles and it is my intent
to convert them all.

It *almost* passes today, we need to cleanup some half-dozen Makefiles.

>> ---
>>   scripts/sort-makefile-lines.py | 217 +++++++++++++++++++++++++++++++++
>>   1 file changed, 217 insertions(+)
>>   create mode 100755 scripts/sort-makefile-lines.py
>>
>> diff --git a/scripts/sort-makefile-lines.py b/scripts/sort-makefile-lines.py
>> new file mode 100755
>> index 0000000000..06c0b3b3a2
>> --- /dev/null
>> +++ b/scripts/sort-makefile-lines.py
>> @@ -0,0 +1,217 @@
>> +#!/usr/bin/python3
>> +# Sort Makefile lines as expected by project policy.
>> +# Copyright (C) 2022-2023 Free Software Foundation, Inc.
>> +# Copyright The GNU Toolchain Authors.
> 
> Why both?  Does it include code that was written by someone who does
> not have copyright assignment on file with the FSF?

That's a mistake I forgot to cleanup.

I'll post a v2.

In v2 I'm changing a number of things:

(a) Fold all the complex double-block sorting into a single sort routine.
(b) Stop issuing errors and just sort according to the rules.

> 
>> +# This file is part of the GNU C Library.
>> +#
>> +# The GNU C Library is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU Lesser General Public
>> +# License as published by the Free Software Foundation; either
>> +# version 2.1 of the License, or (at your option) any later version.
>> +#
>> +# The GNU C Library is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +# Lesser General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU Lesser General Public
>> +# License along with the GNU C Library; if not, see
>> +# <https://www.gnu.org/licenses/>.
>> +
>> +# The project consensus is to split Makefile variable assignment
>> +# across multiple lines with one value per line.  The values are
>> +# then sorted as described below, and terminated with a special
>> +# list termination marker.  This splitting makes it much easier
>> +# to add new tests to the list since they become just a single
>> +# line insertion.  It also makes backports and merges easier
>> +# since the new test may not conflict due to the ordering.
>> +#
>> +# Consensus discussion:
>> +# https://public-inbox.org/libc-alpha/f6406204-84f5-adb1-d00e-979ebeebbbde@redhat.com/
>> +#
>> +# To support cleaning up Makefiles we created this program to
>> +# help sort existing lists converted to the new format.
>> +#
>> +# The program takes as input the Makefile to sort correctly,
>> +# and the output file to write the correctly sorted output.
>> +#
>> +# Sorting is only carried out between two special markers:
>> +# (a) Marker start is '<variable> += \'
>> +# (b) Marker end is '  # <variable>'
>> +# With everthing between (a) and (b) being sorted.
>> +#
>> +# You can use it like this:
>> +# $ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
>> +# $ mv elf/Makefile.tmp elf/Makefile
>> +#
>> +# The Makefile lines in the project are sorted using the
>> +# following rules:
>> +# - First all lines are sorted as-if `LC_COLLATE=C sort`
>> +# - Then all entries by group are sorted against the last digits
>> +#   of the test.
>> +#
>> +# For example:
>> +# ~~~
>> +# tests += \
>> +#   test-a \
>> +#   test-b \
>> +#   test-b1 \
>> +#   test-b2 \
>> +#   test-b10 \
>> +#   test-b20 \
>> +#   test-b100 \
>> +#   # tests
>> +# ~~~
>> +# This example shows tests sorted alphabetically, followed
>> +# by a numeric suffix sort in increasing numeric order.
>> +#
>> +# Required cleanups:
>> +# - Tests that end in "a" or "b" variants must be renamed to
>> +#   end in just the numerical value. For example 'tst-mutex7robust'
>> +#   should be renamed to 'tst-mutex12' (the highest numbered test)
>> +#   or 'tst-robust11' (the highest numbered test).
>> +# - Modules that end in "mod" or "mod1" should be renamed. For
>> +#   example 'tst-atfork2mod' should be renamed to 'tst-mod-atfork2'
>> +#   (test module for atfork2). If there are more than one module
>> +#   then they should be named with a suffix that uses [0-9] first
>> +#   then [A-Z] next for a total of 36 possible modules per test.
>> +#   No manually listed test currently uses more than that (though
>> +#   automatically generated tests may; they don't need sorting).
>> +# - Avoid including another test and instead refactor into common
>> +#   code with all tests including hte common code, then give the
>> +#   tests unique names.
>> +#
>> +# If you have a Makefile that needs converting, then you can
>> +# quickly split the values into one-per-line, ensure the start
>> +# and end markers are in place, and then run the script to
>> +# sort the values.
> 
> I'm not going to block on this if you don't want to do it, but have
> you thought about making this even simpler by, e.g. only looking for
> the start and end marker, splitting everything in between and then
> sorting the values?  Basically don't require developers to break the
> lists into one value per line and do it through the script.

I have, but that means writing code that only gets used once in the conversion
process and then we delete it because it's never needed again.

I want to keep this file as simple as possible with as little code as possible
for the future scenario where all the files are already converted and we're
just checking their sorting.

Notice I don't enforce any whitespace either here just the sorting.

The intent is to lint only the sorting of lines in a specific layout.

>> +
>> +import argparse
>> +import sys
>> +import locale
>> +import re
>> +import functools
>> +
>> +def numeric_key(line):
>> +    # Turn a line into a numeric sort value by fetching
>> +    # the ending number and using that as a key.
>> +    var = re.search(r'([0-9]+) \\$', line)
>> +    if var == None:
>> +        print ("Error: Test line is currently: \"", end='')
>> +        print (line, end='')
>> +        print ("\"")
>> +        print (
>> +        '''
>> +Test name does not match expected pattern.
>> +Rename to match pattern e.g. tst-name[0-9]+.
>> +        '''
>> +        )
>> +        raise Exception ("Invalid test name.")
>> +    # Return the numeric value as the key or throws because
>> +    # var is None.
>> +    return int(var.group(1))
>> +
>> +def sort_lines(lines):
>> +
>> +    # Use the C locale for language independent collation.
>> +    locale.setlocale (locale.LC_ALL, "C")
> 
> Will we ever have non-ASCII names for tests, routines, etc?

Yes.

With C we'll sort by code point order (byte-by-byte).

It won't match the sorting of any *language* but it will sort by UTF-8 code point order.

>> +
>> +    # Sort with strcoll initially.  The tests ending in numeric
>> +    # names will not sort correctly, but we will adjust that next.
>> +    lines = sorted(lines, key=functools.cmp_to_key(locale.strcoll))
> 
> I wonder if you could, instead of simply passing strcoll, use a
> custom function which calls strcoll (or a simple sort if we decide
> we'll never have non-ascii file names) and also accounts for the
> suffix, returning, e.g. -1 for cmp('tst-mutex9', 'tst-mutex10').

Fixed in v2.

> 
>> +
>> +    # We could use a trie here, but since the problem is restricted
>> +    # to just numeric suffix we sort by group with a unique key
>> +    # function.
>> +
>> +    # Build a list of all start markers (tuple includes prefix)
>> +    prefixes = []
>> +    groups = []
>> +    for i in range(len(lines)):
>> +        # Look for things like "  tst-foo1 \" to start the numbered list.
>> +        var = re.search(r'([0-9]+) \\$', lines[i])
>> +        if var:
>> +            prefix = lines[i][0:var.span()[0]]
>> +            if prefix in prefixes:
>> +                continue
>> +            prefixes.append(prefix)
>> +            groups.append((prefix, i))
>> +
>> +    # For each prefix find the range it covers that needs numeric sorting.
>> +    numgroups = []
>> +    for group in groups:
>> +        for j in range(group[1] + 1,len(lines)):
>> +            if not lines[j].startswith(group[0]):
>> +                # If it doesn't start with the prefix, then we're on to
>> +                # to the next group so mark the last entry as the end
>> +                # of the group.
>> +                numgroups.append((group[0], group[1], j - 1))
>> +                break
>> +
>> +    # We now have a list of groups to sort.
>> +    for ng in numgroups:
>> +        # Note slices exclude nth element, so we must add one to right side.
>> +        lines[ng[1]:ng[2]+1] = sorted(lines[ng[1]:ng[2]+1], key=numeric_key)
>> +
>> +    # Return a sorted list with numeric tests sorted by number.
>> +    return lines
>> +
>> +def sort_makefile_lines(infile, outfile):
>> +
> 
> Maybe add a check here to ensure infile != outfile?  Or do we want to
> support that use case?  It should be doable since you're reading the
> entire file into a list at once.

It should be allowed, we open, read, close, and then open, write close.

I've adjusted all the uses to avoid showing a temp file.
 
>> +    # Read the whole Makefile.
>> +    mfile = open(infile)
>> +    lines = mfile.readlines()
>> +    mfile.close()
>> +
>> +    # We will output the Makefile here. Open it early to check
>> +    # for any errors.
>> +    ofile = open(outfile, "w")
>> +
>> +    # Build a list of all start markers (tuple includes name).
>> +    startmarks = []
>> +    for i in range(len(lines)):
>> +        # Look for things like "var += \" to start the sorted list.
>> +        var = re.search(r'^([a-zA-Z0-9]*) \+\= \\$', lines[i])
>> +        if var:
>> +            # Remember the index and the name.
>> +            startmarks.append((i, var.group(1)))
>> +
>> +    # For each start marker try to find a matching end mark
>> +    # and build a block that needs sorting.  The end marker
>> +    # must have the matching comment name for it to be valid.
>> +    rangemarks = []
>> +    for sm in startmarks:
>> +        # Look for things like "  # var" to end the sorted list.
>> +        reg = r'^  # ' + sm[1] + r'$'
>> +        for j in range(sm[0] + 1, len(lines)):
>> +            if re.search(reg, lines[j]):
>> +                # Rembember the block to sort (inclusive).
>> +                rangemarks.append((sm[0] + 1, j - 1))
>> +                break
>> +
>> +    # We now have a list of all ranges that need sorting.
>> +    # Sort those ranges.
>> +    for r in rangemarks:
>> +        lines[r[0]:r[1]] = sort_lines(lines[r[0]:r[1]])
>> +
>> +    # Output the whole list with sorted lines.
>> +    for line in lines:
>> +        ofile.write(line)
>> +
>> +    ofile.close()
>> +
>> +def get_parser():
>> +    parser = argparse.ArgumentParser(description=__doc__)
>> +    parser.add_argument('-i', dest='infile',
>> +                        help='Input Makefile to read lines from')
>> +    parser.add_argument('-o', dest='outfile',
>> +                        help='Output Makefile to write sorted lines to')
>> +    return parser
>> +
>> +def main(argv):
>> +    parser = get_parser()
>> +    opts = parser.parse_args(argv)
>> +    sort_makefile_lines (opts.infile, opts.outfile)
>> +
>> +if __name__ == '__main__':
>> +    main(sys.argv[1:])
>
  

Patch

diff --git a/scripts/sort-makefile-lines.py b/scripts/sort-makefile-lines.py
new file mode 100755
index 0000000000..06c0b3b3a2
--- /dev/null
+++ b/scripts/sort-makefile-lines.py
@@ -0,0 +1,217 @@ 
+#!/usr/bin/python3
+# Sort Makefile lines as expected by project policy.
+# Copyright (C) 2022-2023 Free Software Foundation, Inc.
+# Copyright The GNU Toolchain Authors.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+
+# The project consensus is to split Makefile variable assignment
+# across multiple lines with one value per line.  The values are
+# then sorted as described below, and terminated with a special
+# list termination marker.  This splitting makes it much easier
+# to add new tests to the list since they become just a single
+# line insertion.  It also makes backports and merges easier
+# since the new test may not conflict due to the ordering.
+#
+# Consensus discussion:
+# https://public-inbox.org/libc-alpha/f6406204-84f5-adb1-d00e-979ebeebbbde@redhat.com/
+#
+# To support cleaning up Makefiles we created this program to
+# help sort existing lists converted to the new format.
+#
+# The program takes as input the Makefile to sort correctly,
+# and the output file to write the correctly sorted output.
+#
+# Sorting is only carried out between two special markers:
+# (a) Marker start is '<variable> += \'
+# (b) Marker end is '  # <variable>'
+# With everthing between (a) and (b) being sorted.
+#
+# You can use it like this:
+# $ scripts/sort-makefile-lines.py -i elf/Makefile -o elf/Makefile.tmp
+# $ mv elf/Makefile.tmp elf/Makefile
+#
+# The Makefile lines in the project are sorted using the
+# following rules:
+# - First all lines are sorted as-if `LC_COLLATE=C sort`
+# - Then all entries by group are sorted against the last digits
+#   of the test.
+#
+# For example:
+# ~~~
+# tests += \
+#   test-a \
+#   test-b \
+#   test-b1 \
+#   test-b2 \
+#   test-b10 \
+#   test-b20 \
+#   test-b100 \
+#   # tests
+# ~~~
+# This example shows tests sorted alphabetically, followed
+# by a numeric suffix sort in increasing numeric order.
+#
+# Required cleanups:
+# - Tests that end in "a" or "b" variants must be renamed to
+#   end in just the numerical value. For example 'tst-mutex7robust'
+#   should be renamed to 'tst-mutex12' (the highest numbered test)
+#   or 'tst-robust11' (the highest numbered test).
+# - Modules that end in "mod" or "mod1" should be renamed. For
+#   example 'tst-atfork2mod' should be renamed to 'tst-mod-atfork2'
+#   (test module for atfork2). If there are more than one module
+#   then they should be named with a suffix that uses [0-9] first
+#   then [A-Z] next for a total of 36 possible modules per test.
+#   No manually listed test currently uses more than that (though
+#   automatically generated tests may; they don't need sorting).
+# - Avoid including another test and instead refactor into common
+#   code with all tests including hte common code, then give the
+#   tests unique names.
+#
+# If you have a Makefile that needs converting, then you can
+# quickly split the values into one-per-line, ensure the start
+# and end markers are in place, and then run the script to
+# sort the values.
+
+import argparse
+import sys
+import locale
+import re
+import functools
+
+def numeric_key(line):
+    # Turn a line into a numeric sort value by fetching
+    # the ending number and using that as a key.
+    var = re.search(r'([0-9]+) \\$', line)
+    if var == None:
+        print ("Error: Test line is currently: \"", end='')
+        print (line, end='')
+        print ("\"")
+        print (
+        '''
+Test name does not match expected pattern.
+Rename to match pattern e.g. tst-name[0-9]+.
+        '''
+        )
+        raise Exception ("Invalid test name.")
+    # Return the numeric value as the key or throws because
+    # var is None.
+    return int(var.group(1))
+
+def sort_lines(lines):
+
+    # Use the C locale for language independent collation.
+    locale.setlocale (locale.LC_ALL, "C")
+
+    # Sort with strcoll initially.  The tests ending in numeric
+    # names will not sort correctly, but we will adjust that next.
+    lines = sorted(lines, key=functools.cmp_to_key(locale.strcoll))
+
+    # We could use a trie here, but since the problem is restricted
+    # to just numeric suffix we sort by group with a unique key
+    # function.
+
+    # Build a list of all start markers (tuple includes prefix)
+    prefixes = []
+    groups = []
+    for i in range(len(lines)):
+        # Look for things like "  tst-foo1 \" to start the numbered list.
+        var = re.search(r'([0-9]+) \\$', lines[i])
+        if var:
+            prefix = lines[i][0:var.span()[0]]
+            if prefix in prefixes:
+                continue
+            prefixes.append(prefix)
+            groups.append((prefix, i))
+
+    # For each prefix find the range it covers that needs numeric sorting.
+    numgroups = []
+    for group in groups:
+        for j in range(group[1] + 1,len(lines)):
+            if not lines[j].startswith(group[0]):
+                # If it doesn't start with the prefix, then we're on to
+                # to the next group so mark the last entry as the end
+                # of the group.
+                numgroups.append((group[0], group[1], j - 1))
+                break
+
+    # We now have a list of groups to sort.
+    for ng in numgroups:
+        # Note slices exclude nth element, so we must add one to right side.
+        lines[ng[1]:ng[2]+1] = sorted(lines[ng[1]:ng[2]+1], key=numeric_key)
+
+    # Return a sorted list with numeric tests sorted by number.
+    return lines
+
+def sort_makefile_lines(infile, outfile):
+
+    # Read the whole Makefile.
+    mfile = open(infile)
+    lines = mfile.readlines()
+    mfile.close()
+
+    # We will output the Makefile here. Open it early to check
+    # for any errors.
+    ofile = open(outfile, "w")
+
+    # Build a list of all start markers (tuple includes name).
+    startmarks = []
+    for i in range(len(lines)):
+        # Look for things like "var += \" to start the sorted list.
+        var = re.search(r'^([a-zA-Z0-9]*) \+\= \\$', lines[i])
+        if var:
+            # Remember the index and the name.
+            startmarks.append((i, var.group(1)))
+
+    # For each start marker try to find a matching end mark
+    # and build a block that needs sorting.  The end marker
+    # must have the matching comment name for it to be valid.
+    rangemarks = []
+    for sm in startmarks:
+        # Look for things like "  # var" to end the sorted list.
+        reg = r'^  # ' + sm[1] + r'$'
+        for j in range(sm[0] + 1, len(lines)):
+            if re.search(reg, lines[j]):
+                # Rembember the block to sort (inclusive).
+                rangemarks.append((sm[0] + 1, j - 1))
+                break
+
+    # We now have a list of all ranges that need sorting.
+    # Sort those ranges.
+    for r in rangemarks:
+        lines[r[0]:r[1]] = sort_lines(lines[r[0]:r[1]])
+
+    # Output the whole list with sorted lines.
+    for line in lines:
+        ofile.write(line)
+
+    ofile.close()
+
+def get_parser():
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument('-i', dest='infile',
+                        help='Input Makefile to read lines from')
+    parser.add_argument('-o', dest='outfile',
+                        help='Output Makefile to write sorted lines to')
+    return parser
+
+def main(argv):
+    parser = get_parser()
+    opts = parser.parse_args(argv)
+    sort_makefile_lines (opts.infile, opts.outfile)
+
+if __name__ == '__main__':
+    main(sys.argv[1:])