[v3] remove attribute access from regexec
Checks
Context |
Check |
Description |
dj/TryBot-apply_patch |
success
|
Patch applied to master at the time it was sent
|
dj/TryBot-32bit |
success
|
Build for i686
|
Commit Message
On 8/18/21 1:52 PM, Paul Eggert wrote:
> On 8/14/21 1:08 PM, Martin Sebor wrote:
>> The VLA bound by itself doesn't affect codegen. I suspect you're
>> thinking of a[static n]? With just a[n], without static, there
>> is no requirement that a point to an array with n elements. It
>> simply declares an ordinary pointer, same as [] or *.
>
> Thanks for clarifying.
>
> I tried using a patch like that on coreutils, but it caused the build to
> fail like this:
>
> In file included from lib/exclude.c:35:
> ./lib/regex.h:661:7: error: ISO C90 forbids variable length array
> '__pmatch' [-Werror=vla]
> 661 | regmatch_t __pmatch[_Restrict_arr_ _VLA_ARG (__nmatch)],
> | ^~~~~~~~~~
> cc1: all warnings being treated as errors
> make[2]: *** [Makefile:10648: lib/exclude.o] Error 1
>
> This is because coreutils is compiled with -Wvla -Werror, to catch
> inadvertent attempts to use VLAs in local variables (this helps avoid
> stack-overflow problems). It'd be unfortunate if we had to give that
> useful diagnostic up simply due to this declaration, as there's no
> stack-overflow problem here.
>
> If you can think of a way around this issue, here are some other things
> I ran into while trying this idea out on Coreutils.
Thanks the for the additional testing! I wouldn't expect to see
-Wvla for a Glibc declaration outside of a Glibc build. As
a lexical warning, -Wvla shouldn't (and in my tests doesn't) trigger
for code in system headers unless it's enabled by -Wsystem-headers.
Is <regex.h> for some reason not considered a system header in your
test environment?
>
> * Other cdefs.h macros (__NTH, __REDIRECT, etc.) start with two
> underscores, so shouldn't this new macro too?
They're both reserved but I'm happy to go with whatever convention
is preferred in Glibc.
>
> * Come to think of it, the name _VLA_ARG could be improved, as its
> argument is not actually a VLA; it's the number of elements in a
> VLA-like array. Also, its formal-parameter "arg" is confusingly-named,
> as it's an arbitrary integer expression and need not be a function
> parameter name. How about naming the macro __ARG_NELTS instead?
That works for me.
>
> * regex.h cannot use __ARG_NELTS directly, for the same reason it can't
> use __restrict_arr directly: regex.h is shared with Gnulib and can't
> assume that a glibc-like sys/cdefs.h is present. I suppose regex.h would
> need something like this:
>
> #ifndef _ARG_NELTS_
> # ifdef __ARG_NELTS
> # define _ARG_NELTS_(arg) __ARG_NELTS (arg)
> # elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
> && !defined __STDC_NO_VLA__)
> # define _ARG_NELTS_(n) n
> # else
> # define _ARG_NELTS_(n)
> # endif
> #endif
>
> and then use _ARG_NELTS_ later.
I didn't know mixing and matching two implementations like this
was even possible. Thanks for explaining it (though it seems
like a pretty cumbersome arrangement). I've made the suggested
change.
>
> * The cdefs.h comment has a stray 'n', its wording could be improved (I
> misread "variable bound" as a variable that's bound to something),
> there's a stray empty line, and it's nicer to put the comment in front
> of all the lines that define the macro. Perhaps something like this:
>
> /* Specify the number of elements of a function's array parameter,
> as in 'int f (int n, int a[__ARG_NELTS (n)]);'. */
> #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
> && !defined __STDC_NO_VLA__)
> # define __ARG_NELTS(n) n
> #else
> # define __ARG_NELTS(n)
> #endif
I've changed the macro to the above.
>
> Though again, it's not clear to me that this idea will fly at all, due
> to the -Wvla issue.
>
> Maybe GCC's -Wvla should be fixed to not report an error in this case?
> It's actually not a VLA if you ask me (the C standard is unclear).
I agree. Someone else made the same suggestion in GCC bug 98217 (and
I even offered to handle it). I'll try to remember to get to it but
as I said above, I don't think it should be necessary for this change.
Attached is yet another revision of this patch (v3 to let the patch
tester pick it up).
Martin
Comments
On 8/19/21 4:50 PM, Martin Sebor wrote:
> Is <regex.h> for some reason not considered a system header in your
> test environment?
Yes, because Coreutils etc. currently use Gnulib regex.h instead of
/usr/include/regex.h, as Glibc regex has a bug and the fix (in Gnulib)
hasn't made its way back to Glibc yet.
This sort of thing is all too common, as regex development (such as it
is) often occurs in Gnulib first, which means it's common for
'configure' to discover that the current Glibc has a bug and to
substitute Gnulib regex instead. And when we compile Gnulib headers we
don't consider them to be system headers, because we want the extra
static checking that gcc does for non system headers.
I think the simplest workaround for the warning is to disable -Wvla for
that particular declaration. Please see attached Gnulib patch for how I
suggest doing this.
> I didn't know mixing and matching two implementations like this
> was even possible. Thanks for explaining it (though it seems
> like a pretty cumbersome arrangement).
Yes, it is a bit awkward. I am tempted to clean it up but that really
should be a separate patch and so I will focus only on this thread's issue.
> +#ifndef __ARG_NELTS
> +# ifdef __ARG_NELTS
A clear typo. The suggestion was to define and use _ARG_NELTS_ instead.
Some other thoughts.
If we're going to change regexec, we should change __compat_regex to be
consistent.
We should also change regexec.c's internal functions to be consistent
with regexec. This includes re_search_internal, update_regs, etc.
These internal functions don't have regexec's quirk that pmatch is
ignored when not substituting, so their parameters can use the syntax
"regmatch_t pmatch[__ARG_NELTS (static nmatch)]" to let the compiler
know that the array must be of the given size.
Proposed Gnulib patch attached. It should be easy to aplly to glibc
though glibc also needs its own regex.h patched. I have not installed
this into Gnulib yet.
At some point we should sync Gnulib and Glibc regex of course.
On 8/21/21 11:06 PM, Paul Eggert wrote:
> On 8/19/21 4:50 PM, Martin Sebor wrote:
>
>> Is <regex.h> for some reason not considered a system header in your
>> test environment?
>
> Yes, because Coreutils etc. currently use Gnulib regex.h instead of
> /usr/include/regex.h, as Glibc regex has a bug and the fix (in Gnulib)
> hasn't made its way back to Glibc yet.
>
> This sort of thing is all too common, as regex development (such as it
> is) often occurs in Gnulib first, which means it's common for
> 'configure' to discover that the current Glibc has a bug and to
> substitute Gnulib regex instead. And when we compile Gnulib headers we
> don't consider them to be system headers, because we want the extra
> static checking that gcc does for non system headers.
>
> I think the simplest workaround for the warning is to disable -Wvla for
> that particular declaration. Please see attached Gnulib patch for how I
> suggest doing this.
>
>> I didn't know mixing and matching two implementations like this
>> was even possible. Thanks for explaining it (though it seems
>> like a pretty cumbersome arrangement).
>
> Yes, it is a bit awkward. I am tempted to clean it up but that really
> should be a separate patch and so I will focus only on this thread's issue.
>
>> +#ifndef __ARG_NELTS
>> +# ifdef __ARG_NELTS
>
> A clear typo. The suggestion was to define and use _ARG_NELTS_ instead.
__ARG_NELTS is the name of macro in cdefs.h that you suggested so
this should presumably be the same, no? (I.e., it's not a typo but
I could be missing something.)
>
> Some other thoughts.
>
> If we're going to change regexec, we should change __compat_regex to be
> consistent.
>
> We should also change regexec.c's internal functions to be consistent
> with regexec. This includes re_search_internal, update_regs, etc.
That sounds like a good idea for a follow-up. My immediate goal is
to avoid the current warning so Glibc can build, and I prefer to make
that change independently of other improvements.
I plan to commit the last revision of the patch as is unless there's
some reason I'm missing that the name of the regex.h macro shouldn't
be the same as that in cdefs.h. I see you have both __ARG_NELTS and
_ARG_NELTS_ in your Gnulib patch though I don't understand why.
Thanks for all your comments!
Martin
>
> These internal functions don't have regexec's quirk that pmatch is
> ignored when not substituting, so their parameters can use the syntax
> "regmatch_t pmatch[__ARG_NELTS (static nmatch)]" to let the compiler
> know that the array must be of the given size.
>
> Proposed Gnulib patch attached. It should be easy to aplly to glibc
> though glibc also needs its own regex.h patched. I have not installed
> this into Gnulib yet.
>
> At some point we should sync Gnulib and Glibc regex of course.
On 8/26/21 8:06 AM, Martin Sebor wrote:
>>
>>> +#ifndef __ARG_NELTS
>>> +# ifdef __ARG_NELTS
>>
>> A clear typo. The suggestion was to define and use _ARG_NELTS_ instead.
>
> __ARG_NELTS is the name of macro in cdefs.h that you suggested so
> this should presumably be the same, no?
No matter what the macro name is (let's call it N), it cannot be right
to do this:
#ifndef N
# ifdef N
... whatever ...
because the "whatever" cannot be reached. Unfortunately this is what
your patch does, with N being __ARG_NELTS.
> I see you have both __ARG_NELTS and
> _ARG_NELTS_ in your Gnulib patch though I don't understand why.
It's for the same reason that regex.h already has both __restrict_arr
(the normal glibc practice) and _Restrict_arr_ (an alternative spelling
used only in regex.h). This is because regex.h is used both in
/usr/include on GNU systems (where __restrict_arr is used) and in Gnulib
on non-GNU systems that may not have __restrict_arr.
regex.h does not simply #define __restrict_arr if it's not already
defined, because any such definition might collide with a later
GNU-system header that defines __restrict_arr in a different way (this
could happen if Gnulib regex.h is used on a older GNUish platform).
Obviously this is a messy situation and I'd like to clean it up, but as
I mentioned in my previous email any such cleanup something better left
for a later patch. In the meantime we should just continue existing
practice when we add a new macro like __ARG_NELTS.
> I plan to commit the last revision of the patch as is
Please don't do that. That patch still needs work. Instead, please start
with the patch I sent, and understand what it does. Unless I'm missing
something it should be able to go into glibc as-is (though you may need
to edit glibc files that are not in gnulib).
On 8/26/21 10:24 AM, Paul Eggert wrote:
> On 8/26/21 8:06 AM, Martin Sebor wrote:
>>>
>>>> +#ifndef __ARG_NELTS
>>>> +# ifdef __ARG_NELTS
>>>
>>> A clear typo. The suggestion was to define and use _ARG_NELTS_ instead.
>>
>> __ARG_NELTS is the name of macro in cdefs.h that you suggested so
>> this should presumably be the same, no?
>
> No matter what the macro name is (let's call it N), it cannot be right
> to do this:
>
> #ifndef N
> # ifdef N
> ... whatever ...
>
> because the "whatever" cannot be reached. Unfortunately this is what
> your patch does, with N being __ARG_NELTS.
Aaah. I took what you had in your email and changed it to
__ARG_NELTS thinking you had a typo there because you had
suggested __ARG_NELTS before, and introduced a typo of my own.
>
>> I see you have both __ARG_NELTS and
>> _ARG_NELTS_ in your Gnulib patch though I don't understand why.
>
> It's for the same reason that regex.h already has both __restrict_arr
> (the normal glibc practice) and _Restrict_arr_ (an alternative spelling
> used only in regex.h). This is because regex.h is used both in
> /usr/include on GNU systems (where __restrict_arr is used) and in Gnulib
> on non-GNU systems that may not have __restrict_arr.
>
> regex.h does not simply #define __restrict_arr if it's not already
> defined, because any such definition might collide with a later
> GNU-system header that defines __restrict_arr in a different way (this
> could happen if Gnulib regex.h is used on a older GNUish platform).
>
> Obviously this is a messy situation and I'd like to clean it up, but as
> I mentioned in my previous email any such cleanup something better left
> for a later patch. In the meantime we should just continue existing
> practice when we add a new macro like __ARG_NELTS.
>
>> I plan to commit the last revision of the patch as is
>
> Please don't do that. That patch still needs work. Instead, please start
> with the patch I sent, and understand what it does. Unless I'm missing
> something it should be able to go into glibc as-is (though you may need
> to edit glibc files that are not in gnulib).
I think I'd prefer to let you do this. It's far more complicated
and extensive than it needs to be to silence just the one warning.
Martin
On 8/26/21 9:47 AM, Martin Sebor wrote:
> I think I'd prefer to let you do this.
OK, proposed glibc patch attached. As I lack the time to shepherd each
regex fix into Glibc separately, I simply copied regex-relevant source
files from Gnulib to Glibc, after merging recent Glibc changes (plus
your proposal) into Gnulib.
Please give the patch a try.
On 8/27/21 2:52 AM, Paul Eggert wrote:
> On 8/26/21 9:47 AM, Martin Sebor wrote:
>> I think I'd prefer to let you do this.
>
> OK, proposed glibc patch attached. As I lack the time to shepherd each
> regex fix into Glibc separately, I simply copied regex-relevant source
> files from Gnulib to Glibc, after merging recent Glibc changes (plus
> your proposal) into Gnulib.
>
> Please give the patch a try.
The patch compiles fine with the latest GCC trunk (12.0) but it
seems to cause failures in the conformance tests that I don't think
were there before:
FAIL: conform/POSIX/regex.h/conform
FAIL: conform/POSIX2008/regex.h/conform
FAIL: conform/UNIX98/regex.h/conform
FAIL: conform/XOPEN2K/regex.h/conform
FAIL: conform/XOPEN2K8/regex.h/conform
FAIL: conform/XPG4/regex.h/conform
FAIL: conform/XPG42/regex.h/conform
The first one fails because of this (I suspect the others are
the same):
Namespace violation: "GCC"
Namespace violation: "diagnostic"
Namespace violation: "ignored"
Namespace violation: "pop"
Namespace violation: "pragma"
Namespace violation: "push"
FAIL: Namespace of <regex.h>
I'm guessing it's due to a limitation of the conformance test script
that considers the GCC diagnostic pragmas unsafe to use because they
are in the user-namespace (even though they're not subject to macro
expansion). I'm still not sure I understand why the #pragma is
needed since -Wvla shouldn't trigger in a system header.
Beyond that I only skimmed your patch. It includes quite a few
changes that don't seem necessary to avoid the GCC warning (BZ #28170)
and that I wouldn't be comfortable making at the same time. This is
not an objection, just an explanation why I was reluctant to accept
some of your suggestions.
But just to clarify, I meant I would have preferred to just fix
the Glibc warning and let you change the other Glibc internal APIs
and handle the Gnulib integration. Looks like you decided against
the former in the end but thanks for taking care of the latter.
Martin
On 8/27/21 9:34 AM, Martin Sebor wrote:
> It includes quite a few
> changes that don't seem necessary to avoid the GCC warning (BZ #28170)
True. However, the changes have been tested on the Gnulib side, and I
feel more comfortable applying them all than doing just some of them.
It's less work anyway; and the more-work option feels like make-work.
I'll give people more time to comment before installing.
On Fri, 27 Aug 2021, Paul Eggert wrote:
> On 8/27/21 9:34 AM, Martin Sebor wrote:
>
> > It includes quite a few
> > changes that don't seem necessary to avoid the GCC warning (BZ #28170)
>
> True. However, the changes have been tested on the Gnulib side, and I feel
> more comfortable applying them all than doing just some of them. It's less
> work anyway; and the more-work option feels like make-work.
>
> I'll give people more time to comment before installing.
Any news on this? Builds with GCC mainline are still falling over with
the regexec issue (thus hiding any other issues with GCC mainline that
might have appeared since then and making them harder to bisect).
On 9/20/21 1:46 PM, Joseph Myers wrote:
> Any news on this? Builds with GCC mainline are still falling over with
> the regexec issue (thus hiding any other issues with GCC mainline that
> might have appeared since then and making them harder to bisect).
Sounds like I should install that August 27 patch then. It would sync
those files with Gnulib. OK with you?
On Tue, 21 Sep 2021, Paul Eggert wrote:
> On 9/20/21 1:46 PM, Joseph Myers wrote:
> > Any news on this? Builds with GCC mainline are still falling over with
> > the regexec issue (thus hiding any other issues with GCC mainline that
> > might have appeared since then and making them harder to bisect).
>
> Sounds like I should install that August 27 patch then. It would sync those
> files with Gnulib. OK with you?
Yes.
On 9/21/21 6:48 AM, Joseph Myers wrote:
>> Sounds like I should install that August 27 patch then. It would sync those
>> files with Gnulib. OK with you?
>
> Yes.
OK, done. I built with gcc 11.2.1 20210728 (Red Hat 11.2.1-1). GCC
mainline builds should work now, though I haven't tested that.
On 8/19/21 19:50, Martin Sebor via Libc-alpha wrote:
> Attached is yet another revision of this patch (v3 to let the patch
> tester pick it up).
Is your v3 resolved given the merge from gnulib?
commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82
Author: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue Sep 21 07:47:45 2021 -0700
regex: copy back from Gnulib
On 10/19/21 10:39 AM, Carlos O'Donell wrote:
> On 8/19/21 19:50, Martin Sebor via Libc-alpha wrote:
>> Attached is yet another revision of this patch (v3 to let the patch
>> tester pick it up).
>
> Is your v3 resolved given the merge from gnulib?
>
> commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82
> Author: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue Sep 21 07:47:45 2021 -0700
>
> regex: copy back from Gnulib
>
Yes, I believe so. I have just rebuilt the latest Glibc with
the latest GCC on x86_64-linux. The build shows no warnings.
Thanks
Martin
@@ -36,8 +36,24 @@ extern void __re_set_registers
extern int __regcomp (regex_t *__preg, const char *__pattern, int __cflags);
libc_hidden_proto (__regcomp)
+
+#ifndef __ARG_NELTS
+# ifdef __ARG_NELTS
+/* Same as the corresponding cdefs.h macro. Defined for builds with
+ no cdefs.h. */
+# define __ARG_NELTS(arg) __ARG_NELTS (arg)
+# elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
+ && !defined __STDC_NO_VLA__)
+# define __ARG_NELTS(n) n
+# else
+# define __ARG_NELTS(n)
+# endif
+#endif
+
extern int __regexec (const regex_t *__preg, const char *__string,
- size_t __nmatch, regmatch_t __pmatch[], int __eflags);
+ size_t __nmatch,
+ regmatch_t __pmatch[__ARG_NELTS (__nmatch)],
+ int __eflags);
libc_hidden_proto (__regexec)
extern size_t __regerror (int __errcode, const regex_t *__preg,
@@ -632,4 +632,17 @@ _Static_assert (0, "IEEE 128-bits long double requires redirection on this platf
# define __attribute_returns_twice__ /* Ignore. */
#endif
+#ifndef __ARG_NELTS
+# ifdef __ARG_NELTS
+/* Used to specify a variable bound in a declaration of a function
+ VLA-like parameter, as in 'int f (int n, int[__ARG_NELTS (n)]);' */
+# define __ARG_NELTS(arg) __ARG_NELTS (arg)
+# elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
+ && !defined __STDC_NO_VLA__)
+# define __ARG_NELTS(n) n
+# else
+# define __ARG_NELTS(n)
+# endif
+#endif
+
#endif /* sys/cdefs.h */
@@ -654,9 +654,8 @@ extern int regcomp (regex_t *_Restrict_ __preg,
extern int regexec (const regex_t *_Restrict_ __preg,
const char *_Restrict_ __String, size_t __nmatch,
- regmatch_t __pmatch[_Restrict_arr_],
- int __eflags)
- __attr_access ((__write_only__, 4, 3));
+ regmatch_t __pmatch[_Restrict_arr_ __ARG_NELTS (__nmatch)],
+ int __eflags);
extern size_t regerror (int __errcode, const regex_t *_Restrict_ __preg,
char *_Restrict_ __errbuf, size_t __errbuf_size)
@@ -190,7 +190,7 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len);
int
regexec (const regex_t *__restrict preg, const char *__restrict string,
- size_t nmatch, regmatch_t pmatch[], int eflags)
+ size_t nmatch, regmatch_t pmatch[__ARG_NELTS (nmatch)], int eflags)
{
reg_errcode_t err;
Idx start, length;