From patchwork Wed Aug 18 15:53:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Martin Sebor X-Patchwork-Id: 44695 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 261A3385701E for ; Wed, 18 Aug 2021 15:54:24 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 261A3385701E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1629302064; bh=n4+ITe5ZVUBZ67ACHQPl6E9guEnDr70xIsWz4lYHwkA=; h=Subject:To:References:Date:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=hTkjz2yd47tYv3qSmE8Z4hdZo+zpH1RbCu30Bx9gLm2qrPyR3OXvQMySlXyDZapcw jTOsEX2Pc5mOfpFCDG7hoW1EQyFHo/IPTCjwbT1ojUInyBKEn+dj88P7bRTil4FhOB TmGURxnCgfAi2VqFbTz8utHng3M+v1G7842eV+nA= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qv1-xf2e.google.com (mail-qv1-xf2e.google.com [IPv6:2607:f8b0:4864:20::f2e]) by sourceware.org (Postfix) with ESMTPS id 2E9C6385701E for ; Wed, 18 Aug 2021 15:53:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2E9C6385701E Received: by mail-qv1-xf2e.google.com with SMTP id jv8so1913031qvb.3 for ; Wed, 18 Aug 2021 08:53:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-language; bh=n4+ITe5ZVUBZ67ACHQPl6E9guEnDr70xIsWz4lYHwkA=; b=Hm6OArl2cGycJ3GiFPF8pRfiF92z/HJJyENCqZxszHLTOjdPWGepdu45/d2xIqNwc/ 6rH8kQAASz6Lfs12Ku04R1tyP3CQ7qym+c0hBq5nuBE2Bw18wY+zkJjlb9OzlUSMK2X7 ZgqD+dWCb57wa5dWLi5eO1rMjvfLTFdpmXkfwXIaUNP9SfwLmvLQ+EeQTI4+FmYTpyzd OuZTnfOi/UdPrB1cK4jE1mAE9/FSDDQ97JZ8U2CJZ0Y6r9t1AxT0DhZjG/ATAMM6LMFA iY+Lujepa+BXcCAyv0nNC8y+e0O9HoUUFW3mCz1RCn1IjGUToAiisBOUN3NE1KgdxnWu 9n0Q== X-Gm-Message-State: AOAM53268uYS36ggNHG9V9bSBS2sjykrb0rMenLFZodJBCZrMUW32HTe tW8vsuA1zZim0eYnpfoEcjk= X-Google-Smtp-Source: ABdhPJzC4eCUuZInywBWyrKj4RfMGMK3pqabmi6Y6W0t0djMAybbgzw5c0BxTOdWRV0oRZmkOaiPeA== X-Received: by 2002:ad4:5bc8:: with SMTP id t8mr9215762qvt.18.1629302032809; Wed, 18 Aug 2021 08:53:52 -0700 (PDT) Received: from [192.168.0.41] (97-118-104-61.hlrn.qwest.net. [97.118.104.61]) by smtp.gmail.com with ESMTPSA id w20sm78137qkj.116.2021.08.18.08.53.51 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 18 Aug 2021 08:53:52 -0700 (PDT) Subject: [PATCH v2] remove attribute access from regexec To: Paul Eggert References: <15a32181-8060-4135-cb72-9e79831697d5@gmail.com> <4251e9f2-d4d1-b649-8e86-a4336164a5b1@cs.ucla.edu> Message-ID: Date: Wed, 18 Aug 2021 09:53:50 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-Spam-Status: No, score=-8.7 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Martin Sebor via Libc-alpha From: Martin Sebor Reply-To: Martin Sebor Cc: GNU C Library Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" Let me repost an updated patch as v2 to let the patch tester know about the revision, and to also include the corresponding change to regexec.c needed to avoid the -Wvla-parameter I mentioned. Apparently the first patch failed to apply and the second one wasn't picked up because the subject line hasn't changed. Thanks for letting me know, Carlos! Here's a link to the Patchwork check: http://patchwork.sourceware.org/project/glibc/patch/15a32181-8060-4135-cb72-9e79831697d5@gmail.com/ On 8/14/21 2:08 PM, Martin Sebor wrote: > On 8/13/21 4:34 PM, Paul Eggert wrote: >> On 8/13/21 2:30 PM, Martin Sebor wrote: >>> Attached is a revised patch with this approach. >> >> The revised patch is to include/regex.h but the original patch was to >> posix/regex.h. Is that intentional? > > Yes, they need to be consistent, otherwise GCC issues -Wvla-parameter. > (That's to help detect inadvertent array/VLA mismatches as well as > mismatches in the VLA parameter bounds.) > >> >> We need to check whether __STDC_VERSION__ is defined. Also, no need >> for parens around arg of 'defined'. Something like this perhaps: >> >>    #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ >>         && !defined __STDC_NO_VLA__) >> >> Also, the duplication of the declarations make the headers harder to >> read and encourage typos (I noticed one typo: "_Restrict_arr" without >> the trailing "_"). Instead, I suggest something like this: >> >>    #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ >>         && !defined __STDC_NO_VLA__) >>    # define _REGEX_VLA(arg) arg >>    #else >>    # define _REGEX_VLA(arg) >>    #endif >> >> That way, we can simply change "regmatch_t __pmatch[_Restrict_arr_]" >> to "regmatch_t __pmatch[_Restrict_arr_ _REGEX_VLA (__nmatch)]" without >> having to duplicate the entire function declaration. > > Sounds good.  I've defined the macro in cdefs.h and mamed it _VLA_ARG > to make it usable in other contexts.  Please see the attached revision. > >> >>> PS POSIX says regexec() ignores pnmatch when REG_NOSUB is set, so >>> strictly speaking, warning for such calls to it in that case is >>> also a false positive. >> >> Ouch, this casts doubt on the entire exercise. It's not simply about >> warnings: it's about the code being generated for the matcher. For >> example, for: >> >> int >> f (_Bool flag, unsigned long n, int a[n]) >> { >>    return n == 0 ? 0 : flag ? a[n - 1] : a[0]; >> } >> >> a compiler is allowed to generate code that loads a[n - 1] even when >> FLAG is false. Similarly, if we add this VLA business to regexec, the >> generated machine code could dereference pmatch unconditionally even >> if our source code makes the dereferencing conditional on REG_NOSUB, >> and the resulting behavior would fail to conform to POSIX. > > The VLA bound by itself doesn't affect codegen.  I suspect you're > thinking of a[static n]?  With just a[n], without static, there > is no requirement that a point to an array with n elements.  It > simply declares an ordinary pointer, same as [] or *. > > Martin diff --git a/include/regex.h b/include/regex.h index 24eca2c297..5cf3ef7636 100644 --- a/include/regex.h +++ b/include/regex.h @@ -37,7 +37,8 @@ extern int __regcomp (regex_t *__preg, const char *__pattern, int __cflags); libc_hidden_proto (__regcomp) extern int __regexec (const regex_t *__preg, const char *__string, - size_t __nmatch, regmatch_t __pmatch[], int __eflags); + size_t __nmatch, regmatch_t __pmatch[_VLA_ARG (__nmatch)], + int __eflags); libc_hidden_proto (__regexec) extern size_t __regerror (int __errcode, const regex_t *__preg, diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h index e490fc1aeb..6b6e4c233c 100644 --- a/misc/sys/cdefs.h +++ b/misc/sys/cdefs.h @@ -632,4 +632,14 @@ _Static_assert (0, "IEEE 128-bits long double requires redirection on this platf # define __attribute_returns_twice__ /* Ignore. */ #endif + +#if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ + && !defined __STDC_NO_VLA__) +/* Used to specify a variable bound in a declaration of a function + VLA-like parameter, as in 'int f (int n, int[_VLA_ARG (n)n]);' */ +# define _VLA_ARG(arg) arg +#else +# define _VLA_ARG(arg) +#endif + #endif /* sys/cdefs.h */ diff --git a/posix/regex.h b/posix/regex.h index 14fb1d8364..52ccc4b577 100644 --- a/posix/regex.h +++ b/posix/regex.h @@ -654,9 +654,8 @@ extern int regcomp (regex_t *_Restrict_ __preg, extern int regexec (const regex_t *_Restrict_ __preg, const char *_Restrict_ __String, size_t __nmatch, - regmatch_t __pmatch[_Restrict_arr_], - int __eflags) - __attr_access ((__write_only__, 4, 3)); + regmatch_t __pmatch[_Restrict_arr_ _VLA_ARG (__nmatch)], + int __eflags); extern size_t regerror (int __errcode, const regex_t *_Restrict_ __preg, char *_Restrict_ __errbuf, size_t __errbuf_size) diff --git a/posix/regexec.c b/posix/regexec.c index f7b4f9cfc3..7b9d0ee65f 100644 --- a/posix/regexec.c +++ b/posix/regexec.c @@ -190,7 +190,7 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len); int regexec (const regex_t *__restrict preg, const char *__restrict string, - size_t nmatch, regmatch_t pmatch[], int eflags) + size_t nmatch, regmatch_t pmatch[_VLA_ARG (nmatch)], int eflags) { reg_errcode_t err; Idx start, length;