remove attribute access from regexec
Checks
Commit Message
A recent GCC enhancement to detect accesses by functions declared
with attribute access that are in conflict with the mode specified
by the attribute triggers a warning for the definition of regexec:
regexec.c: In function ‘__regexec’:
regexec.c:204:13: warning: ‘*pmatch.rm_so’ may be used uninitialized
[-Wmaybe-uninitialized]
regexec.c:192:101: note: accessing argument 4 of a function declared
with attribute ‘access (write_only, 4, 3)’
The attribute was added based on the POSIX description of
the function that implies the pmatch array referenced by it is
write-only. However, when the REG_STARTEND bit is set in flags,
Glibc also reads from the object. This seems to be an extension
to POSIX that's not mentioned in the Glibc manual but that is
documented in the Linux man pages:
https://man7.org/linux/man-pages/man3/regcomp.3.html
The patch below changes the attribute's mode to read_write to
reflect this extension.
Martin
Comments
On 8/13/21 11:26 AM, Martin Sebor via Libc-alpha wrote:
> - __attr_access ((__write_only__, 4, 3));
> + __attr_access ((__read_write__, 4, 3));
Wouldn't it be simpler to remove the __attr_access instead?
What utility does __attr_access ((__read_write__, 4, 3)) have? FWIW,
Glibc currently does not use such an attribute anywhere.
On 8/13/21 2:11 PM, Paul Eggert wrote:
> On 8/13/21 11:26 AM, Martin Sebor via Libc-alpha wrote:
>> - __attr_access ((__write_only__, 4, 3));
>> + __attr_access ((__read_write__, 4, 3));
>
> Wouldn't it be simpler to remove the __attr_access instead?
>
> What utility does __attr_access ((__read_write__, 4, 3)) have? FWIW,
> Glibc currently does not use such an attribute anywhere.
The attribute serves two purposes: it specifies 1) how the function
might access the array (to determine whether it should be initialized
by the caller) and 2) the minimum number of elements the caller must
provide and the maximum number of elements the function definition
might access.
GCC checks to make sure these constraints are satisfied, both at call
sites and in the definitions of these functions.
The read_write mode means that a function may both read from and write
to the array, and expects it to be initialized. But since regexec
might just write to the array and not read from it, depending on
flags, the read_write mode wouldn't be correct either.
There is no attribute access mode that would capture this but in C99
(though sadly not in C++) and thanks to the nmatch argument preceding
pmatch, the VLA argument syntax does make the checking possible. It's
like attribute access with an unspecified mode. (It might be worth
extending the attribute to express this mode so it can be used when
the bound doesn't precede the VLA argument and in C++.)
Attached is a revised patch with this approach.
Martin
PS POSIX says regexec() ignores pnmatch when REG_NOSUB is set, so
strictly speaking, warning for such calls to it in that case is
also a false positive. I think that's an acceptable trade-off for
the buffer overflow detection (passing a zero nmatch in that case
is a trivial way to avoid the warning).
On 8/13/21 2:30 PM, Martin Sebor wrote:
> Attached is a revised patch with this approach.
The revised patch is to include/regex.h but the original patch was to
posix/regex.h. Is that intentional?
We need to check whether __STDC_VERSION__ is defined. Also, no need for
parens around arg of 'defined'. Something like this perhaps:
#if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
&& !defined __STDC_NO_VLA__)
Also, the duplication of the declarations make the headers harder to
read and encourage typos (I noticed one typo: "_Restrict_arr" without
the trailing "_"). Instead, I suggest something like this:
#if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
&& !defined __STDC_NO_VLA__)
# define _REGEX_VLA(arg) arg
#else
# define _REGEX_VLA(arg)
#endif
That way, we can simply change "regmatch_t __pmatch[_Restrict_arr_]" to
"regmatch_t __pmatch[_Restrict_arr_ _REGEX_VLA (__nmatch)]" without
having to duplicate the entire function declaration.
> PS POSIX says regexec() ignores pnmatch when REG_NOSUB is set, so
> strictly speaking, warning for such calls to it in that case is
> also a false positive.
Ouch, this casts doubt on the entire exercise. It's not simply about
warnings: it's about the code being generated for the matcher. For
example, for:
int
f (_Bool flag, unsigned long n, int a[n])
{
return n == 0 ? 0 : flag ? a[n - 1] : a[0];
}
a compiler is allowed to generate code that loads a[n - 1] even when
FLAG is false. Similarly, if we add this VLA business to regexec, the
generated machine code could dereference pmatch unconditionally even if
our source code makes the dereferencing conditional on REG_NOSUB, and
the resulting behavior would fail to conform to POSIX.
On 8/13/21 4:34 PM, Paul Eggert wrote:
> On 8/13/21 2:30 PM, Martin Sebor wrote:
>> Attached is a revised patch with this approach.
>
> The revised patch is to include/regex.h but the original patch was to
> posix/regex.h. Is that intentional?
Yes, they need to be consistent, otherwise GCC issues -Wvla-parameter.
(That's to help detect inadvertent array/VLA mismatches as well as
mismatches in the VLA parameter bounds.)
>
> We need to check whether __STDC_VERSION__ is defined. Also, no need for
> parens around arg of 'defined'. Something like this perhaps:
>
> #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
> && !defined __STDC_NO_VLA__)
>
> Also, the duplication of the declarations make the headers harder to
> read and encourage typos (I noticed one typo: "_Restrict_arr" without
> the trailing "_"). Instead, I suggest something like this:
>
> #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
> && !defined __STDC_NO_VLA__)
> # define _REGEX_VLA(arg) arg
> #else
> # define _REGEX_VLA(arg)
> #endif
>
> That way, we can simply change "regmatch_t __pmatch[_Restrict_arr_]" to
> "regmatch_t __pmatch[_Restrict_arr_ _REGEX_VLA (__nmatch)]" without
> having to duplicate the entire function declaration.
Sounds good. I've defined the macro in cdefs.h and mamed it _VLA_ARG
to make it usable in other contexts. Please see the attached revision.
>
>> PS POSIX says regexec() ignores pnmatch when REG_NOSUB is set, so
>> strictly speaking, warning for such calls to it in that case is
>> also a false positive.
>
> Ouch, this casts doubt on the entire exercise. It's not simply about
> warnings: it's about the code being generated for the matcher. For
> example, for:
>
> int
> f (_Bool flag, unsigned long n, int a[n])
> {
> return n == 0 ? 0 : flag ? a[n - 1] : a[0];
> }
>
> a compiler is allowed to generate code that loads a[n - 1] even when
> FLAG is false. Similarly, if we add this VLA business to regexec, the
> generated machine code could dereference pmatch unconditionally even if
> our source code makes the dereferencing conditional on REG_NOSUB, and
> the resulting behavior would fail to conform to POSIX.
The VLA bound by itself doesn't affect codegen. I suspect you're
thinking of a[static n]? With just a[n], without static, there
is no requirement that a point to an array with n elements. It
simply declares an ordinary pointer, same as [] or *.
Martin
On 8/14/21 1:08 PM, Martin Sebor wrote:
> The VLA bound by itself doesn't affect codegen. I suspect you're
> thinking of a[static n]? With just a[n], without static, there
> is no requirement that a point to an array with n elements. It
> simply declares an ordinary pointer, same as [] or *.
Thanks for clarifying.
I tried using a patch like that on coreutils, but it caused the build to
fail like this:
In file included from lib/exclude.c:35:
./lib/regex.h:661:7: error: ISO C90 forbids variable length array
'__pmatch' [-Werror=vla]
661 | regmatch_t __pmatch[_Restrict_arr_ _VLA_ARG (__nmatch)],
| ^~~~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [Makefile:10648: lib/exclude.o] Error 1
This is because coreutils is compiled with -Wvla -Werror, to catch
inadvertent attempts to use VLAs in local variables (this helps avoid
stack-overflow problems). It'd be unfortunate if we had to give that
useful diagnostic up simply due to this declaration, as there's no
stack-overflow problem here.
If you can think of a way around this issue, here are some other things
I ran into while trying this idea out on Coreutils.
* Other cdefs.h macros (__NTH, __REDIRECT, etc.) start with two
underscores, so shouldn't this new macro too?
* Come to think of it, the name _VLA_ARG could be improved, as its
argument is not actually a VLA; it's the number of elements in a
VLA-like array. Also, its formal-parameter "arg" is confusingly-named,
as it's an arbitrary integer expression and need not be a function
parameter name. How about naming the macro __ARG_NELTS instead?
* regex.h cannot use __ARG_NELTS directly, for the same reason it can't
use __restrict_arr directly: regex.h is shared with Gnulib and can't
assume that a glibc-like sys/cdefs.h is present. I suppose regex.h would
need something like this:
#ifndef _ARG_NELTS_
# ifdef __ARG_NELTS
# define _ARG_NELTS_(arg) __ARG_NELTS (arg)
# elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
&& !defined __STDC_NO_VLA__)
# define _ARG_NELTS_(n) n
# else
# define _ARG_NELTS_(n)
# endif
#endif
and then use _ARG_NELTS_ later.
* The cdefs.h comment has a stray 'n', its wording could be improved (I
misread "variable bound" as a variable that's bound to something),
there's a stray empty line, and it's nicer to put the comment in front
of all the lines that define the macro. Perhaps something like this:
/* Specify the number of elements of a function's array parameter,
as in 'int f (int n, int a[__ARG_NELTS (n)]);'. */
#if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
&& !defined __STDC_NO_VLA__)
# define __ARG_NELTS(n) n
#else
# define __ARG_NELTS(n)
#endif
Though again, it's not clear to me that this idea will fly at all, due
to the -Wvla issue.
Maybe GCC's -Wvla should be fixed to not report an error in this case?
It's actually not a VLA if you ask me (the C standard is unclear).
@@ -656,7 +656,7 @@ extern int regexec (const regex_t *_Restrict_ __preg,
const char *_Restrict_ __String, size_t __nmatch,
regmatch_t __pmatch[_Restrict_arr_],
int __eflags)
- __attr_access ((__write_only__, 4, 3));
+ __attr_access ((__read_write__, 4, 3));
extern size_t regerror (int __errcode, const regex_t *_Restrict_ __preg,
char *_Restrict_ __errbuf, size_t __errbuf_size)