[v3] remove attribute access from regexec

Message ID ce165a21-3b3a-baf3-f745-36ffdb243310@gmail.com
State Dropped
Headers
Series [v3] remove attribute access from regexec |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Martin Sebor Aug. 19, 2021, 11:50 p.m. UTC
  On 8/18/21 1:52 PM, Paul Eggert wrote:
> On 8/14/21 1:08 PM, Martin Sebor wrote:
>> The VLA bound by itself doesn't affect codegen.  I suspect you're
>> thinking of a[static n]?  With just a[n], without static, there
>> is no requirement that a point to an array with n elements.  It
>> simply declares an ordinary pointer, same as [] or *.
> 
> Thanks for clarifying.
> 
> I tried using a patch like that on coreutils, but it caused the build to 
> fail like this:
> 
>    In file included from lib/exclude.c:35:
>    ./lib/regex.h:661:7: error: ISO C90 forbids variable length array 
> '__pmatch' [-Werror=vla]
>      661 |       regmatch_t __pmatch[_Restrict_arr_ _VLA_ARG (__nmatch)],
>          |       ^~~~~~~~~~
>    cc1: all warnings being treated as errors
>    make[2]: *** [Makefile:10648: lib/exclude.o] Error 1
> 
> This is because coreutils is compiled with -Wvla -Werror, to catch 
> inadvertent attempts to use VLAs in local variables (this helps avoid 
> stack-overflow problems). It'd be unfortunate if we had to give that 
> useful diagnostic up simply due to this declaration, as there's no 
> stack-overflow problem here.
> 
> If you can think of a way around this issue, here are some other things 
> I ran into while trying this idea out on Coreutils.

Thanks the for the additional testing!  I wouldn't expect to see
-Wvla for a Glibc declaration outside of a Glibc build.   As
a lexical warning, -Wvla shouldn't (and in my tests doesn't) trigger
for code in system headers unless it's enabled by -Wsystem-headers.
Is <regex.h> for some reason not considered a system header in your
test environment?

> 
> * Other cdefs.h macros (__NTH, __REDIRECT, etc.) start with two 
> underscores, so shouldn't this new macro too?

They're both reserved but I'm happy to go with whatever convention
is preferred in Glibc.

> 
> * Come to think of it, the name _VLA_ARG could be improved, as its 
> argument is not actually a VLA; it's the number of elements in a 
> VLA-like array. Also, its formal-parameter "arg" is confusingly-named, 
> as it's an arbitrary integer expression and need not be a function 
> parameter name. How about naming the macro __ARG_NELTS instead?

That works for me.

> 
> * regex.h cannot use __ARG_NELTS directly, for the same reason it can't 
> use __restrict_arr directly: regex.h is shared with Gnulib and can't 
> assume that a glibc-like sys/cdefs.h is present. I suppose regex.h would 
> need something like this:
> 
>    #ifndef _ARG_NELTS_
>    # ifdef __ARG_NELTS
>    #  define _ARG_NELTS_(arg) __ARG_NELTS (arg)
>    # elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
>        && !defined __STDC_NO_VLA__)
>    #  define _ARG_NELTS_(n) n
>    # else
>    #  define _ARG_NELTS_(n)
>    # endif
>    #endif
> 
> and then use _ARG_NELTS_ later.

I didn't know mixing and matching two implementations like this
was even possible.  Thanks for explaining it (though it seems
like a pretty cumbersome arrangement).  I've made the suggested
change.

> 
> * The cdefs.h comment has a stray 'n', its wording could be improved (I 
> misread "variable bound" as a variable that's bound to something), 
> there's a stray empty line, and it's nicer to put the comment in front 
> of all the lines that define the macro. Perhaps something like this:
> 
>    /* Specify the number of elements of a function's array parameter,
>       as in 'int f (int n, int a[__ARG_NELTS (n)]);'.  */
>    #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
>         && !defined __STDC_NO_VLA__)
>    # define __ARG_NELTS(n) n
>    #else
>    # define __ARG_NELTS(n)
>    #endif

I've changed the macro to the above.

> 
> Though again, it's not clear to me that this idea will fly at all, due 
> to the -Wvla issue.
> 
> Maybe GCC's -Wvla should be fixed to not report an error in this case? 
> It's actually not a VLA if you ask me (the C standard is unclear).

I agree.  Someone else made the same suggestion in GCC bug 98217 (and
I even offered to handle it).  I'll try to remember to get to it but
as I said above, I don't think it should be necessary for this change.

Attached is yet another revision of this patch (v3 to let the patch
tester pick it up).

Martin
  

Comments

Paul Eggert Aug. 22, 2021, 5:06 a.m. UTC | #1
On 8/19/21 4:50 PM, Martin Sebor wrote:

> Is <regex.h> for some reason not considered a system header in your
> test environment?

Yes, because Coreutils etc. currently use Gnulib regex.h instead of 
/usr/include/regex.h, as Glibc regex has a bug and the fix (in Gnulib) 
hasn't made its way back to Glibc yet.

This sort of thing is all too common, as regex development (such as it 
is) often occurs in Gnulib first, which means it's common for 
'configure' to discover that the current Glibc has a bug and to 
substitute Gnulib regex instead. And when we compile Gnulib headers we 
don't consider them to be system headers, because we want the extra 
static checking that gcc does for non system headers.

I think the simplest workaround for the warning is to disable -Wvla for 
that particular declaration. Please see attached Gnulib patch for how I 
suggest doing this.

> I didn't know mixing and matching two implementations like this
> was even possible.  Thanks for explaining it (though it seems
> like a pretty cumbersome arrangement).

Yes, it is a bit awkward. I am tempted to clean it up but that really 
should be a separate patch and so I will focus only on this thread's issue.

> +#ifndef __ARG_NELTS
> +# ifdef __ARG_NELTS

A clear typo. The suggestion was to define and use _ARG_NELTS_ instead.

Some other thoughts.

If we're going to change regexec, we should change __compat_regex to be 
consistent.

We should also change regexec.c's internal functions to be consistent 
with regexec. This includes re_search_internal, update_regs, etc.

These internal functions don't have regexec's quirk that pmatch is 
ignored when not substituting, so their parameters can use the syntax 
"regmatch_t pmatch[__ARG_NELTS (static nmatch)]" to let the compiler 
know that the array must be of the given size.

Proposed Gnulib patch attached. It should be easy to aplly to glibc 
though glibc also needs its own regex.h patched. I have not installed 
this into Gnulib yet.

At some point we should sync Gnulib and Glibc regex of course.
  
Martin Sebor Aug. 26, 2021, 3:06 p.m. UTC | #2
On 8/21/21 11:06 PM, Paul Eggert wrote:
> On 8/19/21 4:50 PM, Martin Sebor wrote:
> 
>> Is <regex.h> for some reason not considered a system header in your
>> test environment?
> 
> Yes, because Coreutils etc. currently use Gnulib regex.h instead of 
> /usr/include/regex.h, as Glibc regex has a bug and the fix (in Gnulib) 
> hasn't made its way back to Glibc yet.
> 
> This sort of thing is all too common, as regex development (such as it 
> is) often occurs in Gnulib first, which means it's common for 
> 'configure' to discover that the current Glibc has a bug and to 
> substitute Gnulib regex instead. And when we compile Gnulib headers we 
> don't consider them to be system headers, because we want the extra 
> static checking that gcc does for non system headers.
> 
> I think the simplest workaround for the warning is to disable -Wvla for 
> that particular declaration. Please see attached Gnulib patch for how I 
> suggest doing this.
> 
>> I didn't know mixing and matching two implementations like this
>> was even possible.  Thanks for explaining it (though it seems
>> like a pretty cumbersome arrangement).
> 
> Yes, it is a bit awkward. I am tempted to clean it up but that really 
> should be a separate patch and so I will focus only on this thread's issue.
> 
>> +#ifndef __ARG_NELTS
>> +# ifdef __ARG_NELTS
> 
> A clear typo. The suggestion was to define and use _ARG_NELTS_ instead.

__ARG_NELTS is  the name of macro in cdefs.h that you suggested so
this should presumably be the same, no?  (I.e., it's not a typo but
I could be missing something.)

> 
> Some other thoughts.
> 
> If we're going to change regexec, we should change __compat_regex to be 
> consistent.
> 
> We should also change regexec.c's internal functions to be consistent 
> with regexec. This includes re_search_internal, update_regs, etc.

That sounds like a good idea for a follow-up.  My immediate goal is
to avoid the current warning so Glibc can build, and I prefer to make
that change independently of other improvements.

I plan to commit the last revision of the patch as is unless there's
some reason I'm missing that the name of the regex.h macro shouldn't
be the same as that in cdefs.h.  I see you have both __ARG_NELTS and
_ARG_NELTS_ in your Gnulib patch though I don't understand why.

Thanks for all your comments!

Martin

> 
> These internal functions don't have regexec's quirk that pmatch is 
> ignored when not substituting, so their parameters can use the syntax 
> "regmatch_t pmatch[__ARG_NELTS (static nmatch)]" to let the compiler 
> know that the array must be of the given size.
> 
> Proposed Gnulib patch attached. It should be easy to aplly to glibc 
> though glibc also needs its own regex.h patched. I have not installed 
> this into Gnulib yet.
> 
> At some point we should sync Gnulib and Glibc regex of course.
  
Paul Eggert Aug. 26, 2021, 4:24 p.m. UTC | #3
On 8/26/21 8:06 AM, Martin Sebor wrote:
>>
>>> +#ifndef __ARG_NELTS
>>> +# ifdef __ARG_NELTS
>>
>> A clear typo. The suggestion was to define and use _ARG_NELTS_ instead.
> 
> __ARG_NELTS is  the name of macro in cdefs.h that you suggested so
> this should presumably be the same, no?

No matter what the macro name is (let's call it N), it cannot be right 
to do this:

    #ifndef N
    # ifdef N
    ... whatever ...

because the "whatever" cannot be reached. Unfortunately this is what 
your patch does, with N being __ARG_NELTS.

> I see you have both __ARG_NELTS and
> _ARG_NELTS_ in your Gnulib patch though I don't understand why.

It's for the same reason that regex.h already has both __restrict_arr 
(the normal glibc practice) and  _Restrict_arr_ (an alternative spelling 
used only in regex.h). This is because regex.h is used both in 
/usr/include on GNU systems (where __restrict_arr is used) and in Gnulib 
on non-GNU systems that may not have __restrict_arr.

regex.h does not simply #define __restrict_arr if it's not already 
defined, because any such definition might collide with a later 
GNU-system header that defines __restrict_arr in a different way (this 
could happen if Gnulib regex.h is used on a older GNUish platform).

Obviously this is a messy situation and I'd like to clean it up, but as 
I mentioned in my previous email any such cleanup something better left 
for a later patch. In the meantime we should just continue existing 
practice when we add a new macro like __ARG_NELTS.

> I plan to commit the last revision of the patch as is

Please don't do that. That patch still needs work. Instead, please start 
with the patch I sent, and understand what it does. Unless I'm missing 
something it should be able to go into glibc as-is (though you may need 
to edit glibc files that are not in gnulib).
  
Martin Sebor Aug. 26, 2021, 4:47 p.m. UTC | #4
On 8/26/21 10:24 AM, Paul Eggert wrote:
> On 8/26/21 8:06 AM, Martin Sebor wrote:
>>>
>>>> +#ifndef __ARG_NELTS
>>>> +# ifdef __ARG_NELTS
>>>
>>> A clear typo. The suggestion was to define and use _ARG_NELTS_ instead.
>>
>> __ARG_NELTS is  the name of macro in cdefs.h that you suggested so
>> this should presumably be the same, no?
> 
> No matter what the macro name is (let's call it N), it cannot be right 
> to do this:
> 
>     #ifndef N
>     # ifdef N
>     ... whatever ...
> 
> because the "whatever" cannot be reached. Unfortunately this is what 
> your patch does, with N being __ARG_NELTS.

Aaah.  I took what you had in your email and changed it to
__ARG_NELTS thinking you had a typo there because you had
suggested __ARG_NELTS before, and introduced a typo of my own.

> 
>> I see you have both __ARG_NELTS and
>> _ARG_NELTS_ in your Gnulib patch though I don't understand why.
> 
> It's for the same reason that regex.h already has both __restrict_arr 
> (the normal glibc practice) and  _Restrict_arr_ (an alternative spelling 
> used only in regex.h). This is because regex.h is used both in 
> /usr/include on GNU systems (where __restrict_arr is used) and in Gnulib 
> on non-GNU systems that may not have __restrict_arr.
> 
> regex.h does not simply #define __restrict_arr if it's not already 
> defined, because any such definition might collide with a later 
> GNU-system header that defines __restrict_arr in a different way (this 
> could happen if Gnulib regex.h is used on a older GNUish platform).
> 
> Obviously this is a messy situation and I'd like to clean it up, but as 
> I mentioned in my previous email any such cleanup something better left 
> for a later patch. In the meantime we should just continue existing 
> practice when we add a new macro like __ARG_NELTS.
> 
>> I plan to commit the last revision of the patch as is
> 
> Please don't do that. That patch still needs work. Instead, please start 
> with the patch I sent, and understand what it does. Unless I'm missing 
> something it should be able to go into glibc as-is (though you may need 
> to edit glibc files that are not in gnulib).

I think I'd prefer to let you do this.  It's far more complicated
and extensive than it needs to be to silence just the one warning.

Martin
  
Paul Eggert Aug. 27, 2021, 8:52 a.m. UTC | #5
On 8/26/21 9:47 AM, Martin Sebor wrote:
> I think I'd prefer to let you do this.

OK, proposed glibc patch attached. As I lack the time to shepherd each 
regex fix into Glibc separately, I simply copied regex-relevant source 
files from Gnulib to Glibc, after merging recent Glibc changes (plus 
your proposal) into Gnulib.

Please give the patch a try.
  
Martin Sebor Aug. 27, 2021, 4:34 p.m. UTC | #6
On 8/27/21 2:52 AM, Paul Eggert wrote:
> On 8/26/21 9:47 AM, Martin Sebor wrote:
>> I think I'd prefer to let you do this.
> 
> OK, proposed glibc patch attached. As I lack the time to shepherd each 
> regex fix into Glibc separately, I simply copied regex-relevant source 
> files from Gnulib to Glibc, after merging recent Glibc changes (plus 
> your proposal) into Gnulib.
> 
> Please give the patch a try.

The patch compiles fine with the latest GCC trunk (12.0) but it
seems to cause failures in the conformance tests that I don't think
were there before:

FAIL: conform/POSIX/regex.h/conform
FAIL: conform/POSIX2008/regex.h/conform
FAIL: conform/UNIX98/regex.h/conform
FAIL: conform/XOPEN2K/regex.h/conform
FAIL: conform/XOPEN2K8/regex.h/conform
FAIL: conform/XPG4/regex.h/conform
FAIL: conform/XPG42/regex.h/conform

The first one fails because of this (I suspect the others are
the same):

     Namespace violation: "GCC"
     Namespace violation: "diagnostic"
     Namespace violation: "ignored"
     Namespace violation: "pop"
     Namespace violation: "pragma"
     Namespace violation: "push"
FAIL: Namespace of <regex.h>

I'm guessing it's due to a limitation of the conformance test script
that considers the GCC diagnostic pragmas unsafe to use because they
are in the user-namespace (even though they're not subject to macro
expansion).  I'm still not sure I understand why the #pragma is
needed since -Wvla shouldn't trigger in a system header.

Beyond that I only skimmed your patch.  It includes quite a few
changes that don't seem necessary to avoid the GCC warning (BZ #28170)
and that I wouldn't be comfortable making at the same time.  This is
not an objection, just an explanation why I was reluctant to accept
some of your suggestions.

But just to clarify, I meant I would have preferred to just fix
the Glibc warning and let you change the other Glibc internal APIs
and handle the Gnulib integration.  Looks like you decided against
the former in the end but thanks for taking care of the latter.

Martin
  
Paul Eggert Aug. 27, 2021, 6:57 p.m. UTC | #7
On 8/27/21 9:34 AM, Martin Sebor wrote:

> It includes quite a few
> changes that don't seem necessary to avoid the GCC warning (BZ #28170)

True. However, the changes have been tested on the Gnulib side, and I 
feel more comfortable applying them all than doing just some of them. 
It's less work anyway; and the more-work option feels like make-work.

I'll give people more time to comment before installing.
  
Joseph Myers Sept. 20, 2021, 8:46 p.m. UTC | #8
On Fri, 27 Aug 2021, Paul Eggert wrote:

> On 8/27/21 9:34 AM, Martin Sebor wrote:
> 
> > It includes quite a few
> > changes that don't seem necessary to avoid the GCC warning (BZ #28170)
> 
> True. However, the changes have been tested on the Gnulib side, and I feel
> more comfortable applying them all than doing just some of them. It's less
> work anyway; and the more-work option feels like make-work.
> 
> I'll give people more time to comment before installing.

Any news on this?  Builds with GCC mainline are still falling over with 
the regexec issue (thus hiding any other issues with GCC mainline that 
might have appeared since then and making them harder to bisect).
  
Paul Eggert Sept. 21, 2021, 6:52 a.m. UTC | #9
On 9/20/21 1:46 PM, Joseph Myers wrote:
> Any news on this?  Builds with GCC mainline are still falling over with
> the regexec issue (thus hiding any other issues with GCC mainline that
> might have appeared since then and making them harder to bisect).

Sounds like I should install that August 27 patch then. It would sync 
those files with Gnulib. OK with you?
  
Joseph Myers Sept. 21, 2021, 1:48 p.m. UTC | #10
On Tue, 21 Sep 2021, Paul Eggert wrote:

> On 9/20/21 1:46 PM, Joseph Myers wrote:
> > Any news on this?  Builds with GCC mainline are still falling over with
> > the regexec issue (thus hiding any other issues with GCC mainline that
> > might have appeared since then and making them harder to bisect).
> 
> Sounds like I should install that August 27 patch then. It would sync those
> files with Gnulib. OK with you?

Yes.
  
Paul Eggert Sept. 21, 2021, 3 p.m. UTC | #11
On 9/21/21 6:48 AM, Joseph Myers wrote:

>> Sounds like I should install that August 27 patch then. It would sync those
>> files with Gnulib. OK with you?
> 
> Yes.

OK, done. I built with gcc 11.2.1 20210728 (Red Hat 11.2.1-1). GCC 
mainline builds should work now, though I haven't tested that.
  
Carlos O'Donell Oct. 19, 2021, 4:39 p.m. UTC | #12
On 8/19/21 19:50, Martin Sebor via Libc-alpha wrote:
> Attached is yet another revision of this patch (v3 to let the patch
> tester pick it up).

Is your v3 resolved given the merge from gnulib?

commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Tue Sep 21 07:47:45 2021 -0700

    regex: copy back from Gnulib
  
Martin Sebor Oct. 19, 2021, 5:06 p.m. UTC | #13
On 10/19/21 10:39 AM, Carlos O'Donell wrote:
> On 8/19/21 19:50, Martin Sebor via Libc-alpha wrote:
>> Attached is yet another revision of this patch (v3 to let the patch
>> tester pick it up).
> 
> Is your v3 resolved given the merge from gnulib?
> 
> commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82
> Author: Paul Eggert <eggert@cs.ucla.edu>
> Date:   Tue Sep 21 07:47:45 2021 -0700
> 
>      regex: copy back from Gnulib
> 

Yes, I believe so.  I have just rebuilt the latest Glibc with
the latest GCC on x86_64-linux.  The build shows no warnings.

Thanks
Martin
  

Patch

diff --git a/include/regex.h b/include/regex.h
index 24eca2c297..76fa798861 100644
--- a/include/regex.h
+++ b/include/regex.h
@@ -36,8 +36,24 @@  extern void __re_set_registers
 extern int __regcomp (regex_t *__preg, const char *__pattern, int __cflags);
 libc_hidden_proto (__regcomp)
 
+
+#ifndef __ARG_NELTS
+# ifdef __ARG_NELTS
+/* Same as the corresponding cdefs.h macro.  Defined for builds with
+   no cdefs.h.  */
+#  define __ARG_NELTS(arg) __ARG_NELTS (arg)
+# elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
+        && !defined __STDC_NO_VLA__)
+#  define __ARG_NELTS(n) n
+# else
+#  define __ARG_NELTS(n)
+# endif
+#endif
+
 extern int __regexec (const regex_t *__preg, const char *__string,
-		      size_t __nmatch, regmatch_t __pmatch[], int __eflags);
+		      size_t __nmatch,
+		      regmatch_t __pmatch[__ARG_NELTS (__nmatch)],
+		      int __eflags);
 libc_hidden_proto (__regexec)
 
 extern size_t __regerror (int __errcode, const regex_t *__preg,
diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h
index e490fc1aeb..64e46df190 100644
--- a/misc/sys/cdefs.h
+++ b/misc/sys/cdefs.h
@@ -632,4 +632,17 @@  _Static_assert (0, "IEEE 128-bits long double requires redirection on this platf
 # define __attribute_returns_twice__ /* Ignore.  */
 #endif
 
+#ifndef __ARG_NELTS
+# ifdef __ARG_NELTS
+/* Used to specify a variable bound in a declaration of a function
+   VLA-like parameter, as in 'int f (int n, int[__ARG_NELTS (n)]);'  */
+#  define __ARG_NELTS(arg) __ARG_NELTS (arg)
+# elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
+        && !defined __STDC_NO_VLA__)
+#  define __ARG_NELTS(n) n
+# else
+#  define __ARG_NELTS(n)
+# endif
+#endif
+
 #endif	 /* sys/cdefs.h */
diff --git a/posix/regex.h b/posix/regex.h
index 14fb1d8364..5b44f8e52b 100644
--- a/posix/regex.h
+++ b/posix/regex.h
@@ -654,9 +654,8 @@  extern int regcomp (regex_t *_Restrict_ __preg,
 
 extern int regexec (const regex_t *_Restrict_ __preg,
 		    const char *_Restrict_ __String, size_t __nmatch,
-		    regmatch_t __pmatch[_Restrict_arr_],
-		    int __eflags)
-    __attr_access ((__write_only__, 4, 3));
+		    regmatch_t __pmatch[_Restrict_arr_ __ARG_NELTS (__nmatch)],
+		    int __eflags);
 
 extern size_t regerror (int __errcode, const regex_t *_Restrict_ __preg,
 			char *_Restrict_ __errbuf, size_t __errbuf_size)
diff --git a/posix/regexec.c b/posix/regexec.c
index f7b4f9cfc3..bec2fdfe39 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -190,7 +190,7 @@  static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len);
 
 int
 regexec (const regex_t *__restrict preg, const char *__restrict string,
-	 size_t nmatch, regmatch_t pmatch[], int eflags)
+	 size_t nmatch, regmatch_t pmatch[__ARG_NELTS (nmatch)], int eflags)
 {
   reg_errcode_t err;
   Idx start, length;