[0/5] argp: Fix several cases of undefined behaviour

Message ID 1609981580-17229-1-git-send-email-bruno@clisp.org
Headers
Series argp: Fix several cases of undefined behaviour |

Message

Bruno Haible Jan. 7, 2021, 1:06 a.m. UTC
  Here is a proposed patch series to merge changes in the argp-help.c file,
done in Gnulib, back to glibc.

The argp-help.c file is what determines the output of the --help option of
a program that uses the 'argp' facility
 <https://www.gnu.org/software/libc/manual/html_node/Argp.html>.

The purpose of these patches is to fix three different kinds of what ISO C
calls "undefined behaviour", and thus achieve the --help output on
different operating systems. Without the patches, the --help output was
different (and nonsensical) on macOS and FreeBSD.

I am not an expert in the argp facility, but I am pretty confident about
the changes as
  - they are limited to a single file, without affecting the other parts
    of the 'argp' facility,
  - I spent several days reading and understanding this file and the
    associated documentation,
  - the glibc tests of the 'argp' facility pass, without requiring changes.

Patch 1/5 is by Paul Eggert
 <https://lists.gnu.org/archive/html/bug-gnulib/2017-05/msg00176.html>
It fixes an alert from GCC's -fcheck-pointer-bounds instrumentation.

Patch 2/5 was originally by Eric Blake
 <https://lists.gnu.org/archive/html/bug-gnulib/2009-09/msg00287.html>.
It fixes an undefined behaviour: The code used the _tolower function
on arbitrary ASCII letters. But this function is documented, e.g. in
 <https://linux.die.net/man/3/_tolower>, to take only uppercase letter
as arguments.
Inside glibc it might not be undefined behaviour if glibc's _tolower
functions has more guarantees. But tolower() is not that much slower.
This part of glibc is not performance-critical: Producing the --help
output occurs at most once per program run, and there are rarely more
than 1000 command-line options for which the help output needs to be
produced.

Patch 3/5 fixes undefined behaviour caused by calling the functions
isspace(), isalpha(), isalnum(), isdigit() on values of type 'char',
not 'unsigned char'. See e.g. POSIX
 <https://pubs.opengroup.org/onlinepubs/9699919799/functions/isalnum_l.html>
This matters only on platforms for which 'char' is signed (e.g. not
on powerpc), and only for options that contain non-ASCII characters.
Admittedly, this is rare nowadays that ISO-8859-1 is no longer in wide
use. But it costs nothing to fix this: Fetching a byte from memory
and zero-extending it takes as many CPU instructions as fetching a byte
from memory and sign-extending it.

Patch 4/5 are cosmetic / readability changes that I felt most helpful
when understanding this code.
  - Add sectioning comments. Reading a 2000-lines file without
    structure is like reading a 200-pages novel with no chapter titles.
    The file has page breaks, but they don't carry any information,
    and some editors don't display page breaks.
  - Write NULL to designate a null pointer. It does help readability
    to write 0 for integers and NULL for pointers.
  - Fix wrong comments.
  - Move two functions so that they appear in proximity of related code,
    instead of interrupting the sequence of functions that deal with
    something completely different.

Patch 5/5 fixes undefined behaviour caused by calling the function
qsort() with a sort predicate that is not a total order. POSIX
 <https://pubs.opengroup.org/onlinepubs/9699919799/functions/qsort.html>
says:
  "When the same objects ... are passed more than once to the
   comparison function, the results shall be consistent with one
   another. That is, they shall define a total ordering on the array."

I added some debugging before the qsort() call and printed out
the results of this comparison function
  1) as a matrix,
  2) as a list of violations of the total order rule.

For example, the output on the gnulib test suite example was:

hol_sort: entries = {
  [0] = [] [Main options]
  [1] = [test] []
  [2] = [] [Option Group 1]
  [3] = [verbose] [Simple option without arguments]
  [4] = [file] [Option with a mandatory argument]
  [5] = [hidden] [Hidden option]
  [6] = [] [Option Group 1.1]
  [7] = [cantiga] [create a cantiga]
  [8] = [sonet] [create a sonet]
  [9] = [] [Option Group 2]
  [10] = [option] [An option]
  [11] = [optional] [Option with an optional argument. ARG is one of the following:]
  [12] = [one] [one unit]
  [13] = [two] [two units]
  [14] = [many] [many units]
  [15] = [] [Option Group 2.1]
  [16] = [poem] [create a poem]
  [17] = [limerick] [create a limerick]
  [18] = [help] [give this help list]
  [19] = [usage] [give a short usage message]
  [20] = [program-name] [set the program name]
  [21] = [HANG] [hang for SECS seconds (default 3600)]
  [22] = [version] [print program version]
}
hol_sort: comparisons =
   . - - - - - - - - - - - - - - - - - - - - - -
   + . - - - - - - - - - - - - - - - - - - - - -
   + + . - - . . . . - - - - - - - - - - - - - -
   + + + . + + . . . - - - - - - - - - - - - - -
   + + + - . + . . . - - - - - - - - - - - - - -
   + + . - - . . . . - - - - - - - - - - - - - -
   + + . . . . . - - - - - - - - - - - - - - - -
   + + . . . . + . - - - - - - - - - - - - - - -
   + + . . . . + + . - - - - - - - - - - - - - -
   + + + + + + + + + . - - - - - . . . - - - - -
   + + + + + + + + + + . - - - - . . . - - - - -
   + + + + + + + + + + + . - - - . . . - - - - -
   + + + + + + + + + + + + . - + . . . - - - - -
   + + + + + + + + + + + + + . + . . . - - - - -
   + + + + + + + + + + + + - - . . . . - - - - -
   + + + + + + + + + . . . . . . . - - - - - - -
   + + + + + + + + + . . . . . . + . + - - - - -
   + + + + + + + + + . . . . . . + - . - - - - -
   + + + + + + + + + + + + + + + + + + . - + + -
   + + + + + + + + + + + + + + + + + + + . + + -
   + + + + + + + + + + + + + + + + + + - - . . -
   + + + + + + + + + + + + + + + + + + - - . . -
   + + + + + + + + + + + + + + + + + + + + + + .
hol_sort: transitivity violated for [2] [3] [6]
hol_sort: transitivity violated for [2] [3] [7]
...

"transitivity violated for [2] [3] [6]" means that
  compare([2], [3]) < 0
  compare([3], [6]) = 0
  compare([2], [6]) = 0
and similarly for the other violations.

More details in
 <https://lists.gnu.org/archive/html/bug-gnulib/2020-12/msg00088.html>.

So, I rewrote the comparison function 'hol_entry_cmp' in such a way that
  - it reflects the original programmer's intent (as best as can
    understand from the previous code and its comments),
  - it is a total order (this is guaranteed by defining it a lexicographic
    comparison function, based on several more elementary total orderings),
  - it passes the Gnulib test suite.
It then also passes the glibc test suite without further adjustments.


Bruno Haible (4):
  argp: Don't rely on undefined behaviour of _tolower().
  argp: Don't pass invalid arguments to isspace, isalnum, isalpha,
    isdigit.
  argp: Improve comments.
  argp: Avoid undefined behaviour when invoking qsort().

Paul Eggert (1):
  argp: fix pointer-subtraction bug

 argp/argp-help.c | 379 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 246 insertions(+), 133 deletions(-)
  

Comments

Adhemerval Zanella Feb. 4, 2021, 7:46 p.m. UTC | #1
On 06/01/2021 22:06, Bruno Haible wrote:
> Here is a proposed patch series to merge changes in the argp-help.c file,
> done in Gnulib, back to glibc.

Thanks for working on this, I have push it upstream. Do you plan to sync
back the rest of argp code? I noticed there is extra changes for argp-help,
but the rest is mainly code style, comments, and gnulib specific fixes.