posix: Sync gnulib regex implementation

  On 28/06/2018 17:21, Paul Eggert wrote:
> Adhemerval Zanella wrote:
>> This patch syncs regex code with gnulib (commit 16aa5a2).
> 
> Thanks for doing all that work. I merged some of the changes embodied in that patch back to Gnulib; see <https://lists.gnu.org/r/bug-gnulib/2018-06/msg00095.html>. Here are some comments on the changes that I had questions about:
> 
>>    * regcomp: add a __libc_lock_init init for dfa lock.
> 
> Why is this needed? The lock has already been initialized by the call to lock_init about 15 lines earlier. Please see attached patch, which reverts this change.

I think we can indeed remove it.  I did not change it because on external
project syncs I try to avoid deviating from the original code, and add
extra changes in subsequent patches instead.

> 
>>    * regexec: instead of using intprops.h and its macros, this patch
>>      uses the check_add_overflow_idx function to check for addition
>>      overflow.  The function needs to be defined in the file because
>>      the file will be built multiple times with the 'Idx' type being
>>      redefined (I think the intprops.h addition should be done in a
>>      separate patch).
> 
> When used with Gnulib, that implementation won't work with ICC 17.0.4 20170411, which defines __GNUC__ to be 5 but does not support __builtin_add_overflow. Also, the __GNUC__ < 5 code doesn't look right for multiple reasons: (1) it doesn't set *r when returning false, contrary to the comment describing the code (presumably the comment needs to be fixed?), (2) it incorrectly uses SSIZE_MAX instead of INT_MAX when _REGEX_LARGE_OFFSETS is not defined, and (3) it has an unnecessary test for a > 0 (you need to test only whether a < 0).
> 
> Although these issues could be fixed individually, why waste time? Let's just use intprops.h and avoid the hassle of hacking on the bugs of a partial replacement.

I really want to make this change independent of the intprops.h addition,
since in previous attempts to add it some maintainers raised concerns
about its inclusion.  I think it would be better to propose its addition
independently and without any conditionality.  To reiterate, although its
macro machinery is somewhat complex, I do think its addition would make
it easier to add overflow checks, instead of splitting the logic over
different places.

For this change I have fixed the points you raised (as you pointed out,
the semantics are indeed to wrap over values).
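
To make the three points concrete, here is a minimal sketch of such a
helper.  It is not the code from the patch: the signature, the 'Idx'
typedef and the IDX_MIN/IDX_MAX names are assumptions made for the
illustration (the real 'Idx' depends on _REGEX_LARGE_OFFSETS), and the
__INTEL_COMPILER test is just one possible way to avoid the ICC problem
you mention.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the regex 'Idx' type, assumed here to be
   'int' (the case where _REGEX_LARGE_OFFSETS is not defined).  */
typedef int Idx;
#define IDX_MAX INT_MAX
#define IDX_MIN INT_MIN

/* Store the wrapped sum of A and B in *R and return true if the
   mathematical sum does not fit in Idx.  *R is set in both cases.  */
static bool
check_add_overflow_idx (Idx a, Idx b, Idx *r)
{
#if defined __GNUC__ && __GNUC__ >= 5 && !defined __INTEL_COMPILER
  return __builtin_add_overflow (a, b, r);
#else
  /* Wrap through unsigned arithmetic.  Converting the out-of-range
     result back to a signed type is implementation-defined; GCC
     documents it as reduction modulo 2^N.  */
  *r = (Idx) ((unsigned int) a + (unsigned int) b);
  /* Only the sign of A needs testing to pick the bound to compare.  */
  return a < 0 ? b < IDX_MIN - a : IDX_MAX - a < b;
#endif
}
```

With this shape a caller can test the return value and still rely on *r
holding the wrapped result, which matches the comment describing the
function.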

> 
> Attached please see an intprops.h addition that is done as a separate patch. The second attached patch goes back to using INT_ADD_WRAPV instead of check_add_overflow.
> 
>>    * regex.h: Remove _Restrict_ and _Restrict_arr_ definition based
>>      on __STDC_VERSION__ because its usage leads to failures on
>>      posix/check-installed-headers-c{xx}.
> 
> That implementation assumes glibc, so won't work in Gnulib. I attempted to work around the check-installed-headers issue in the attached patch; if this doesn't work please let me know what the issue is.

Thanks for checking on it.  I synced with gnulib and the changes do not
seem to trigger any check-installed-headers issues.

> 
>>    * regex_internal.h: Define lock_fini to empty macro because setting
>>      it to 0 leads to build issues (error: statement with no effect
>>      [-Werror=unused-value]).
> 
> Making it empty is problematic, since lock_fini is supposed to be usable wherever a function call could appear. Instead, let's define it to ((void) 0). That should fix the -Wunused-value issue in a better way. See attached patch.

Ack, I changed it on the glibc side as well.
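
For the record, a small sketch of why the ((void) 0) form behaves
better than an empty macro.  The 'cleanup_demo' function and its 'lock'
parameter are hypothetical and only illustrate the no-lock
configuration.

```c
/* No-op definition for builds without locking, as suggested in the
   review.  */
#define lock_fini(lock) ((void) 0)

/* Hypothetical caller, for illustration only.  */
static int
cleanup_demo (int lock)
{
  /* Used as a plain statement: the cast to void keeps -Wunused-value
     quiet, unlike a bare 0.  */
  lock_fini (lock);

  /* Used inside an expression: this would not compile if lock_fini
     expanded to nothing.  */
  return (lock_fini (lock), lock);
}
```

Both uses compile, so the macro stays usable wherever a function call
could appear.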

> 
>>    2. posix/PCRE.tests: the test '(a)|\1' uses a backreference along
>>       with group creation and I am not sure if it is the correct
>>       behavior to accept it with regcomp (REG_EXTENDED).  The GNU grep
>>       accepts it with ERE option though.
> 
> POSIX allows regcomp and grep to accept this pattern as an extension. I dunno what it means, though, and no doubt regcomp and grep should report an error instead of accepting it silently. That would be a different bugfix, though.

Just to be clear, the current glibc regex does accept this pattern with
REG_EXTENDED, while the gnulib code reports an error instead of
accepting it.  So it is not clear to me, from the glibc side, whether we
should keep the old behaviour and continue to accept it, or follow gnulib
and reject it.
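
A quick way to compare the two implementations is a helper along these
lines.  The name 'try_compile_ere' is hypothetical; it only wraps the
standard regcomp/regfree calls and returns whatever regcomp reports.

```c
#include <regex.h>

/* Hypothetical helper: compile PATTERN as an ERE and return regcomp's
   result, which is 0 on success and a nonzero REG_* code otherwise.  */
static int
try_compile_ere (const char *pattern)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED);
  if (rc == 0)
    regfree (&re);
  return rc;
}
```

Calling try_compile_ere ("(a)|\\1") should return 0 where the pattern is
accepted and a nonzero error code where it is rejected, so the same call
can be run against the code before and after the sync.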

> 
> One other thing: let's use https: instead of http: in URLs, as per Gnulib style. We should be using these everywhere in Glibc, of course, but one step at a time.

Ack.

> 
> See attached patches for what the above comments boil down to. They are intended to be applied after the patch you posted. The patches could all be squashed.

Since you already pushed the changes on gnulib, I synced again.  I am just
not sure how we should handle the 'posix/PCRE.tests' change.

--

This patch syncs regex code with gnulib (commit 16aa5a2).  The main
differences are:

  * regcomp: add a __libc_lock_init init for dfa lock.

  * regexec: instead of using intprops.h and its macros, this patch
    uses the check_add_overflow_idx function to check for addition
    overflow.  The function needs to be defined in the file because
    the file will be built multiple times with the 'Idx' type being
    redefined (I think the intprops.h addition should be done in a
    separate patch).

  * regex.h: Remove _Restrict_ and _Restrict_arr_ definition based
    on __STDC_VERSION__ because its usage leads to failures on
    posix/check-installed-headers-c{xx}.

  * regex_internal.h: Define lock_fini to empty macro because setting
    it to 0 leads to build issues (error: statement with no effect
    [-Werror=unused-value]).  Also the prototypes of
    're_string_realloc_buffers', 'build_wcs_buffer',
    'build_wcs_upper_buffer', 'build_upper_buffer',
    're_string_translate_buffer', 're_string_context_at' are kept in
    their implementation counterpart because of internal tests building.
    The same internal testing requires the 'bitset_*' macros to be
    'static inline' instead of just 'static' plus the attribute used in
    gnulib code.

Only two changes in GLIBC regex testing are required:

  1. posix/bug-regex28.c: as previously discussed [1] the change of
     expected results on the pattern should be safe.

  2. posix/PCRE.tests: the test '(a)|\1' uses a backreference along
     with group creation and I am not sure if it is the correct
     behavior to accept it with regcomp (REG_EXTENDED).  The GNU grep
     accepts it with ERE option though.

This sync contains some patches from the thread 'Regex: Make libc regex
more usable outside GLIBC.' [2] which have been pushed upstream in
gnulib.  This patch also fixes some regex issues (BZ #23233,
BZ #21163, BZ #18986, BZ #13762).  I did not add test cases for
#23233 and #13762 because I could not think of a simple way to
trigger the expected failure paths.

Checked on x86_64-linux-gnu and i686-linux-gnu.

	[BZ #23233]
	[BZ #21163]
	[BZ #18986]
	[BZ #13762]
	* posix/Makefile (tests): Add bug-regex37 and bug-regex38.
	* posix/PCRE.tests: Remove invalid test.
	* posix/bug-regex28.c: Fix expected values for used syntax.
	* posix/bug-regex37.c: New file.
	* posix/bug-regex38.c: Likewise.
	* posix/regcomp.c: Sync with gnulib.
	* posix/regex.c: Likewise.
	* posix/regex.h: Likewise.
	* posix/regex_internal.c: Likewise.
	* posix/regex_internal.h: Likewise.
	* posix/regexec.c: Likewise.

[1] https://sourceware.org/ml/libc-alpha/2017-12/msg00807.html
[2] https://sourceware.org/ml/libc-alpha/2017-12/msg00237.html
---
 ChangeLog              |  18 +
 posix/Makefile         |   3 +-
 posix/PCRE.tests       |  13 -
 posix/bug-regex28.c    |  46 +--
 posix/bug-regex37.c    |  32 ++
 posix/bug-regex38.c    |  32 ++
 posix/regcomp.c        | 601 ++++++++++++++++++-------------
 posix/regex.c          |  21 +-
 posix/regex.h          | 335 ++++++++++-------
 posix/regex_internal.c | 295 ++++++++-------
 posix/regex_internal.h | 426 ++++++++++++++--------
 posix/regexec.c        | 952 ++++++++++++++++++++++++++-----------------------
 12 files changed, 1612 insertions(+), 1162 deletions(-)
 create mode 100644 posix/bug-regex37.c
 create mode 100644 posix/bug-regex38.c
