From patchwork Thu Aug 19 23:50:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Martin Sebor X-Patchwork-Id: 44719 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 81DC0388B800 for ; Thu, 19 Aug 2021 23:50:28 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 81DC0388B800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1629417028; bh=HchPjyXDsOSlnmEkW85/OTsvXvbZR2gfMiaqSbmxuxI=; h=Subject:To:References:Date:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=M7e6pRs95ZCx9jAfg/R7g4L+6/UkToFcnX4osdkJdNlKu2ZuRIirjuDxU4LmkrFl2 0hXP6cVOz6aTHte8e41MKvcrE3XRoa4N+ep/KqY3ms5zQwOeN1GhEhA7X0q34MjelU 91pWt499WP5N4QFpTS8fwjUNvJ/kTXIh/wWd9UUM= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qt1-x82b.google.com (mail-qt1-x82b.google.com [IPv6:2607:f8b0:4864:20::82b]) by sourceware.org (Postfix) with ESMTPS id 4D54F3857433 for ; Thu, 19 Aug 2021 23:50:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4D54F3857433 Received: by mail-qt1-x82b.google.com with SMTP id g11so6100254qtk.5 for ; Thu, 19 Aug 2021 16:50:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-language; bh=HchPjyXDsOSlnmEkW85/OTsvXvbZR2gfMiaqSbmxuxI=; b=gcs9rtg9yEtI/QjIAvm4kskUCq6sZaPJCLvukV01wwX1s4Vwgu8SVWV0BAJuV9eB/1 KR/E7CVzhuXkSaYqmcLrMsczpA9zNKXR06ORWe5whLf7r5zZJdeVAdWp8lgOPR7nO61d GcyEMbIdvZB0b40Sdw8Y5GwR1gioYCs6IoWT70Xz9o5II9NMCXB8JYsWkP75/+gTVpKf LnuOU+YRHeW59PqS0LvXfMmH8k+ay9+iQaVYxzZyDboRFr9X5kJJxGm2dxcRoSBjwMPw P0MORIp5n4fm+D4nKzNpgG+GAe2KbvHzNkaAy2FkH+S0mWFl14S34bK6uygKkuroCYP5 9XWg== X-Gm-Message-State: AOAM530q2vqVBKCnOVu4G0/WeAe5/p7RBKXoX9fuZ5c6H4tKWDJc8wT4 UtqOov9+ZPhcQteH0HagloZmn6KWbcE= X-Google-Smtp-Source: ABdhPJyquGYdET6+J2yIPYf279m34b4A+dz9S45DEQAXxWAZFZsNTA5BJObQu9CA81HjyQt/YUUMiQ== X-Received: by 2002:ac8:4e24:: with SMTP id d4mr15079745qtw.277.1629417005210; Thu, 19 Aug 2021 16:50:05 -0700 (PDT) Received: from [192.168.0.41] (97-118-104-61.hlrn.qwest.net. [97.118.104.61]) by smtp.gmail.com with ESMTPSA id c27sm2354211qkp.5.2021.08.19.16.50.03 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 19 Aug 2021 16:50:03 -0700 (PDT) Subject: [PATCH v3] remove attribute access from regexec To: Paul Eggert References: <15a32181-8060-4135-cb72-9e79831697d5@gmail.com> <4251e9f2-d4d1-b649-8e86-a4336164a5b1@cs.ucla.edu> <1024a9e9-a880-7da2-7b99-3e8b8012a94a@cs.ucla.edu> Message-ID: Date: Thu, 19 Aug 2021 17:50:02 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: <1024a9e9-a880-7da2-7b99-3e8b8012a94a@cs.ucla.edu> Content-Language: en-US X-Spam-Status: No, score=-8.8 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Martin Sebor via Libc-alpha From: Martin Sebor Reply-To: Martin Sebor Cc: GNU C Library Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" On 8/18/21 1:52 PM, Paul Eggert wrote: > On 8/14/21 1:08 PM, Martin Sebor wrote: >> The VLA bound by itself doesn't affect codegen.  I suspect you're >> thinking of a[static n]?  With just a[n], without static, there >> is no requirement that a point to an array with n elements.  It >> simply declares an ordinary pointer, same as [] or *. > > Thanks for clarifying. > > I tried using a patch like that on coreutils, but it caused the build to > fail like this: > >   In file included from lib/exclude.c:35: >   ./lib/regex.h:661:7: error: ISO C90 forbids variable length array > '__pmatch' [-Werror=vla] >     661 |       regmatch_t __pmatch[_Restrict_arr_ _VLA_ARG (__nmatch)], >         |       ^~~~~~~~~~ >   cc1: all warnings being treated as errors >   make[2]: *** [Makefile:10648: lib/exclude.o] Error 1 > > This is because coreutils is compiled with -Wvla -Werror, to catch > inadvertent attempts to use VLAs in local variables (this helps avoid > stack-overflow problems). It'd be unfortunate if we had to give that > useful diagnostic up simply due to this declaration, as there's no > stack-overflow problem here. > > If you can think of a way around this issue, here are some other things > I ran into while trying this idea out on Coreutils. Thanks the for the additional testing! I wouldn't expect to see -Wvla for a Glibc declaration outside of a Glibc build. As a lexical warning, -Wvla shouldn't (and in my tests doesn't) trigger for code in system headers unless it's enabled by -Wsystem-headers. Is for some reason not considered a system header in your test environment? > > * Other cdefs.h macros (__NTH, __REDIRECT, etc.) start with two > underscores, so shouldn't this new macro too? They're both reserved but I'm happy to go with whatever convention is preferred in Glibc. > > * Come to think of it, the name _VLA_ARG could be improved, as its > argument is not actually a VLA; it's the number of elements in a > VLA-like array. Also, its formal-parameter "arg" is confusingly-named, > as it's an arbitrary integer expression and need not be a function > parameter name. How about naming the macro __ARG_NELTS instead? That works for me. > > * regex.h cannot use __ARG_NELTS directly, for the same reason it can't > use __restrict_arr directly: regex.h is shared with Gnulib and can't > assume that a glibc-like sys/cdefs.h is present. I suppose regex.h would > need something like this: > >   #ifndef _ARG_NELTS_ >   # ifdef __ARG_NELTS >   #  define _ARG_NELTS_(arg) __ARG_NELTS (arg) >   # elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ >       && !defined __STDC_NO_VLA__) >   #  define _ARG_NELTS_(n) n >   # else >   #  define _ARG_NELTS_(n) >   # endif >   #endif > > and then use _ARG_NELTS_ later. I didn't know mixing and matching two implementations like this was even possible. Thanks for explaining it (though it seems like a pretty cumbersome arrangement). I've made the suggested change. > > * The cdefs.h comment has a stray 'n', its wording could be improved (I > misread "variable bound" as a variable that's bound to something), > there's a stray empty line, and it's nicer to put the comment in front > of all the lines that define the macro. Perhaps something like this: > >   /* Specify the number of elements of a function's array parameter, >      as in 'int f (int n, int a[__ARG_NELTS (n)]);'.  */ >   #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ >        && !defined __STDC_NO_VLA__) >   # define __ARG_NELTS(n) n >   #else >   # define __ARG_NELTS(n) >   #endif I've changed the macro to the above. > > Though again, it's not clear to me that this idea will fly at all, due > to the -Wvla issue. > > Maybe GCC's -Wvla should be fixed to not report an error in this case? > It's actually not a VLA if you ask me (the C standard is unclear). I agree. Someone else made the same suggestion in GCC bug 98217 (and I even offered to handle it). I'll try to remember to get to it but as I said above, I don't think it should be necessary for this change. Attached is yet another revision of this patch (v3 to let the patch tester pick it up). Martin diff --git a/include/regex.h b/include/regex.h index 24eca2c297..76fa798861 100644 --- a/include/regex.h +++ b/include/regex.h @@ -36,8 +36,24 @@ extern void __re_set_registers extern int __regcomp (regex_t *__preg, const char *__pattern, int __cflags); libc_hidden_proto (__regcomp) + +#ifndef __ARG_NELTS +# ifdef __ARG_NELTS +/* Same as the corresponding cdefs.h macro. Defined for builds with + no cdefs.h. */ +# define __ARG_NELTS(arg) __ARG_NELTS (arg) +# elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ + && !defined __STDC_NO_VLA__) +# define __ARG_NELTS(n) n +# else +# define __ARG_NELTS(n) +# endif +#endif + extern int __regexec (const regex_t *__preg, const char *__string, - size_t __nmatch, regmatch_t __pmatch[], int __eflags); + size_t __nmatch, + regmatch_t __pmatch[__ARG_NELTS (__nmatch)], + int __eflags); libc_hidden_proto (__regexec) extern size_t __regerror (int __errcode, const regex_t *__preg, diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h index e490fc1aeb..64e46df190 100644 --- a/misc/sys/cdefs.h +++ b/misc/sys/cdefs.h @@ -632,4 +632,17 @@ _Static_assert (0, "IEEE 128-bits long double requires redirection on this platf # define __attribute_returns_twice__ /* Ignore. */ #endif +#ifndef __ARG_NELTS +# ifdef __ARG_NELTS +/* Used to specify a variable bound in a declaration of a function + VLA-like parameter, as in 'int f (int n, int[__ARG_NELTS (n)]);' */ +# define __ARG_NELTS(arg) __ARG_NELTS (arg) +# elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ + && !defined __STDC_NO_VLA__) +# define __ARG_NELTS(n) n +# else +# define __ARG_NELTS(n) +# endif +#endif + #endif /* sys/cdefs.h */ diff --git a/posix/regex.h b/posix/regex.h index 14fb1d8364..5b44f8e52b 100644 --- a/posix/regex.h +++ b/posix/regex.h @@ -654,9 +654,8 @@ extern int regcomp (regex_t *_Restrict_ __preg, extern int regexec (const regex_t *_Restrict_ __preg, const char *_Restrict_ __String, size_t __nmatch, - regmatch_t __pmatch[_Restrict_arr_], - int __eflags) - __attr_access ((__write_only__, 4, 3)); + regmatch_t __pmatch[_Restrict_arr_ __ARG_NELTS (__nmatch)], + int __eflags); extern size_t regerror (int __errcode, const regex_t *_Restrict_ __preg, char *_Restrict_ __errbuf, size_t __errbuf_size) diff --git a/posix/regexec.c b/posix/regexec.c index f7b4f9cfc3..bec2fdfe39 100644 --- a/posix/regexec.c +++ b/posix/regexec.c @@ -190,7 +190,7 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len); int regexec (const regex_t *__restrict preg, const char *__restrict string, - size_t nmatch, regmatch_t pmatch[], int eflags) + size_t nmatch, regmatch_t pmatch[__ARG_NELTS (nmatch)], int eflags) { reg_errcode_t err; Idx start, length;