Message ID | 1609981580-17229-1-git-send-email-bruno@clisp.org |
---|---|
Headers |
Return-Path: <libc-alpha-bounces@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 761F0384640D; Thu, 7 Jan 2021 01:07:02 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mo4-p00-ob.smtp.rzone.de (mo4-p00-ob.smtp.rzone.de [81.169.146.218]) by sourceware.org (Postfix) with ESMTPS id 742653865470 for <libc-alpha@sourceware.org>; Thu, 7 Jan 2021 01:06:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 742653865470 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=clisp.org Authentication-Results: sourceware.org; spf=none smtp.mailfrom=bruno@clisp.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1609981617; s=strato-dkim-0002; d=clisp.org; h=Message-Id:Date:Subject:Cc:To:From:From:Subject:Sender; bh=Y7wZvznANxQk/jsk6TGaxGXQaOgnq32nhKWQCC0chmw=; b=fctncReakfFjnDOykWanFQhT+aMOSU4Dz+Wg87QNFAYCVxAGhRVz+KEguDD3ubcnqo QRbUSv9+/zjVbauyrOR/Acpu1HxoviPF2QKSYL+PcmET89OJw0C4ISEcgTzcRI2fKAYv x9BwauFrkmuspbRfW8CA4OSchfhlXZylwnvZmHfklDjLnDtXHOLnmCChiH/fUjHRLGw+ j1wIh87BRczk49V9fR0oiZucnLZIl1nc8JLsH9+wkG2qL0NPAqHoFhvnrX3rF7ZY2OZT RJBpL+ArVEB975GHY4ZFgZKoURLEO48gSi+tclyqF0SHw434AWVkJkeLH9bZACX2N2Bc Gr9w== X-RZG-AUTH: ":Ln4Re0+Ic/6oZXR1YgKryK8brlshOcZlIWs+iCP5vnk6shH+AHjwLuWOHqf3yZdW" X-RZG-CLASS-ID: mo00 Received: from omega.bruno.haible.de by smtp.strato.de (RZmta 47.12.1 DYNA|AUTH) with ESMTPSA id u0aa20x0716m31p (using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA256 (curve X9_62_prime256v1 with 256 ECDH bits, eq. 3072 bits RSA)) (Client did not present a certificate); Thu, 7 Jan 2021 02:06:48 +0100 (CET) From: Bruno Haible <bruno@clisp.org> To: libc-alpha@sourceware.org, Paul Eggert <eggert@cs.ucla.edu> Subject: [PATCH 0/5] argp: Fix several cases of undefined behaviour Date: Thu, 7 Jan 2021 02:06:15 +0100 Message-Id: <1609981580-17229-1-git-send-email-bruno@clisp.org> X-Mailer: git-send-email 2.7.4 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> Cc: Bruno Haible <bruno@clisp.org> Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org> |
Series |
argp: Fix several cases of undefined behaviour
|
|
Message
Bruno Haible
Jan. 7, 2021, 1:06 a.m. UTC
Here is a proposed patch series to merge changes in the argp-help.c file, done in Gnulib, back to glibc. The argp-help.c file is what determines the output of the --help option of a program that uses the 'argp' facility <https://www.gnu.org/software/libc/manual/html_node/Argp.html>. The purpose of these patches is to fix three different kinds of what ISO C calls "undefined behaviour", and thus achieve the --help output on different operating systems. Without the patches, the --help output was different (and nonsensical) on macOS and FreeBSD. I am not an expert in the argp facility, but I am pretty confident about the changes as - they are limited to a single file, without affecting the other parts of the 'argp' facility, - I spent several days reading and understanding this file and the associated documentation, - the glibc tests of the 'argp' facility pass, without requiring changes. Patch 1/5 is by Paul Eggert <https://lists.gnu.org/archive/html/bug-gnulib/2017-05/msg00176.html> It fixes an alert from GCC's -fcheck-pointer-bounds instrumentation. Patch 2/5 was originally by Eric Blake <https://lists.gnu.org/archive/html/bug-gnulib/2009-09/msg00287.html>. It fixes an undefined behaviour: The code used the _tolower function on arbitrary ASCII letters. But this function is documented, e.g. in <https://linux.die.net/man/3/_tolower>, to take only uppercase letter as arguments. Inside glibc it might not be undefined behaviour if glibc's _tolower functions has more guarantees. But tolower() is not that much slower. This part of glibc is not performance-critical: Producing the --help output occurs at most once per program run, and there are rarely more than 1000 command-line options for which the help output needs to be produced. Patch 3/5 fixes undefined behaviour caused by calling the functions isspace(), isalpha(), isalnum(), isdigit() on values of type 'char', not 'unsigned char'. See e.g. POSIX <https://pubs.opengroup.org/onlinepubs/9699919799/functions/isalnum_l.html> This matters only on platforms for which 'char' is signed (e.g. not on powerpc), and only for options that contain non-ASCII characters. Admittedly, this is rare nowadays that ISO-8859-1 is no longer in wide use. But it costs nothing to fix this: Fetching a byte from memory and zero-extending it takes as many CPU instructions as fetching a byte from memory and sign-extending it. Patch 4/5 are cosmetic / readability changes that I felt most helpful when understanding this code. - Add sectioning comments. Reading a 2000-lines file without structure is like reading a 200-pages novel with no chapter titles. The file has page breaks, but they don't carry any information, and some editors don't display page breaks. - Write NULL to designate a null pointer. It does help readability to write 0 for integers and NULL for pointers. - Fix wrong comments. - Move two functions so that they appear in proximity of related code, instead of interrupting the sequence of functions that deal with something completely different. Patch 5/5 fixes undefined behaviour caused by calling the function qsort() with a sort predicate that is not a total order. POSIX <https://pubs.opengroup.org/onlinepubs/9699919799/functions/qsort.html> says: "When the same objects ... are passed more than once to the comparison function, the results shall be consistent with one another. That is, they shall define a total ordering on the array." I added some debugging before the qsort() call and printed out the results of this comparison function 1) as a matrix, 2) as a list of violations of the total order rule. For example, the output on the gnulib test suite example was: hol_sort: entries = { [0] = [] [Main options] [1] = [test] [] [2] = [] [Option Group 1] [3] = [verbose] [Simple option without arguments] [4] = [file] [Option with a mandatory argument] [5] = [hidden] [Hidden option] [6] = [] [Option Group 1.1] [7] = [cantiga] [create a cantiga] [8] = [sonet] [create a sonet] [9] = [] [Option Group 2] [10] = [option] [An option] [11] = [optional] [Option with an optional argument. ARG is one of the following:] [12] = [one] [one unit] [13] = [two] [two units] [14] = [many] [many units] [15] = [] [Option Group 2.1] [16] = [poem] [create a poem] [17] = [limerick] [create a limerick] [18] = [help] [give this help list] [19] = [usage] [give a short usage message] [20] = [program-name] [set the program name] [21] = [HANG] [hang for SECS seconds (default 3600)] [22] = [version] [print program version] } hol_sort: comparisons = . - - - - - - - - - - - - - - - - - - - - - - + . - - - - - - - - - - - - - - - - - - - - - + + . - - . . . . - - - - - - - - - - - - - - + + + . + + . . . - - - - - - - - - - - - - - + + + - . + . . . - - - - - - - - - - - - - - + + . - - . . . . - - - - - - - - - - - - - - + + . . . . . - - - - - - - - - - - - - - - - + + . . . . + . - - - - - - - - - - - - - - - + + . . . . + + . - - - - - - - - - - - - - - + + + + + + + + + . - - - - - . . . - - - - - + + + + + + + + + + . - - - - . . . - - - - - + + + + + + + + + + + . - - - . . . - - - - - + + + + + + + + + + + + . - + . . . - - - - - + + + + + + + + + + + + + . + . . . - - - - - + + + + + + + + + + + + - - . . . . - - - - - + + + + + + + + + . . . . . . . - - - - - - - + + + + + + + + + . . . . . . + . + - - - - - + + + + + + + + + . . . . . . + - . - - - - - + + + + + + + + + + + + + + + + + + . - + + - + + + + + + + + + + + + + + + + + + + . + + - + + + + + + + + + + + + + + + + + + - - . . - + + + + + + + + + + + + + + + + + + - - . . - + + + + + + + + + + + + + + + + + + + + + + . hol_sort: transitivity violated for [2] [3] [6] hol_sort: transitivity violated for [2] [3] [7] ... "transitivity violated for [2] [3] [6]" means that compare([2], [3]) < 0 compare([3], [6]) = 0 compare([2], [6]) = 0 and similarly for the other violations. More details in <https://lists.gnu.org/archive/html/bug-gnulib/2020-12/msg00088.html>. So, I rewrote the comparison function 'hol_entry_cmp' in such a way that - it reflects the original programmer's intent (as best as can understand from the previous code and its comments), - it is a total order (this is guaranteed by defining it a lexicographic comparison function, based on several more elementary total orderings), - it passes the Gnulib test suite. It then also passes the glibc test suite without further adjustments. Bruno Haible (4): argp: Don't rely on undefined behaviour of _tolower(). argp: Don't pass invalid arguments to isspace, isalnum, isalpha, isdigit. argp: Improve comments. argp: Avoid undefined behaviour when invoking qsort(). Paul Eggert (1): argp: fix pointer-subtraction bug argp/argp-help.c | 379 ++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 246 insertions(+), 133 deletions(-)
Comments
On 06/01/2021 22:06, Bruno Haible wrote: > Here is a proposed patch series to merge changes in the argp-help.c file, > done in Gnulib, back to glibc. Thanks for working on this, I have push it upstream. Do you plan to sync back the rest of argp code? I noticed there is extra changes for argp-help, but the rest is mainly code style, comments, and gnulib specific fixes.