Adhemerval Zanella wrote:
> This patch syncs regex code with gnulib (commit 16aa5a2).
Thanks for doing all that work. I merged some of the changes embodied in that
patch back to Gnulib; see
<https://lists.gnu.org/r/bug-gnulib/2018-06/msg00095.html>. Here are some
comments on the changes that I had questions about:
> * regcomp: add a __libc_lock_init init for dfa lock.
Why is this needed? The lock has already been initialized by the call to
lock_init about 15 lines earlier. Please see attached patch, which reverts this
change.
> * regexec: instead of use intprops.h and its macros, this patch
> the check_add_overflow_idx function to check for addition
> overflow. The function need to be defined on the file because
> the file will be build multiple times wit 'Idx' type being
> redefined (I think intprops.h addition should be done in a
> separate patch).
When used with Gnulib, that implementation won't work with ICC 17.0.4 20170411,
which defines __GNUC__ to be 5 but does not support __builtin_add_overflow.
Also, the __GNUC__ < 5 code doesn't look right for multiple reasons: (1) it
doesn't set *r when returning false, contrary to the comment describing the code
(presumably the comment needs to be fixed?), (2) it incorrectly uses SSIZE_MAX
instead of INT_MAX when _REGEX_LARGE_OFFSETS is not defined, and (3) it has an
unnecessary test for a > 0 (you need to test only whether a < 0).
Although these issues could be fixed individually, why waste time? Let's just
use intprops.h and avoid the hassle of hacking on the bugs of a partial replacement.
Attached please see an intprops.h addition that is done as a separate patch. The
second attached patch goes back to using INT_ADD_WRAPV instead of
check_add_overflow.
> * regex.h: Remove _Restrict_ and _Restrict_arr_ definition based
> on __STDC_VERSION__ because its usage leads to failures on
> posix/check-installed-headers-c{xx}.
That implementation assumes glibc, so won't work in Gnulib. I attempted to work
around the check-installed-headers issue in the attached patch; if this doesn't
work please let me know what the issue is.
> * regex_internal.h: Define lock_fini to empty macro because setting
> to 0 lead to build issues (error: statement with no effect
> [-Werror=unused-value]).
Making it empty is problematic, since lock_fini is supposed to be usable
wherever a function call could appear. Instead, let's define it to ((void) 0).
That should fix the -Wunused-value issue in a better way. See attached patch.
> 2. posix/PCRE.tests: the test '(a)|\1' uses a backreference along
> with group creation and I am not sure if it is the correct
> behavior to accept it with regcomp (REG_EXTENDED). The GNU grep
> accepts it with ERE option though.
POSIX allows recomp and grep to accept this pattern as an extension. I dunno
what it means, though, and no doubt regcomp and grep should report an error
instead of accepting it silently. That would be a different bugfix, though.
One other thing: let's use https: intead of http: in URLs, as per Gnulib style.
We should be using these everywhere in Glibc, of course, but one step at a time.
See attached patches for what the above comments boil down to. They are intended
to be applied after the patch you posted. The patches could all be squashed.
From 86022a73a20f701ec54eb9baa5d963d3c665051d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 28 Jun 2018 12:49:07 -0700
Subject: [PATCH 2/2] posix: finish merge with Gnulib regex
* posix/regex_internal.h: Include intprops.h.
* posix/regcomp.c (re_compile_internal): Omit unnecessary call to
__libc_lock_init, as the previous lock_init has already
initialized the lock.
* posix/regex.h (_Restrict_, _Restrict_arr_):
Port to Gnulib case, where __restrict and __restrict_arr
might not work. Do this in a better way than Gnulib
did until today.
* posix/regexec.c (check_add_overflow_idx):
Remove, as it was not portable to ICC, and had some problems
when __GNUC__ < 5. Use replaced by intprops.h's INT_ADD_WRAPV.
---
ChangeLog | 13 +++++++++++++
posix/regcomp.c | 4 +---
posix/regex.c | 2 +-
posix/regex.h | 27 ++++++++++++++++++++-------
posix/regex_internal.c | 2 +-
posix/regex_internal.h | 6 ++++--
posix/regexec.c | 21 ++-------------------
7 files changed, 42 insertions(+), 33 deletions(-)
@@ -1,5 +1,18 @@
2018-06-28 Paul Eggert <eggert@cs.ucla.edu>
+ posix: finish merge with Gnulib regex
+ * posix/regex_internal.h: Include intprops.h.
+ * posix/regcomp.c (re_compile_internal): Omit unnecessary call to
+ __libc_lock_init, as the previous lock_init has already
+ initialized the lock.
+ * posix/regex.h (_Restrict_, _Restrict_arr_):
+ Port to Gnulib case, where __restrict and __restrict_arr
+ might not work. Do this in a better way than Gnulib
+ did until today.
+ * posix/regexec.c (check_add_overflow_idx):
+ Remove, as it was not portable to ICC, and had some problems
+ when __GNUC__ < 5. Use replaced by intprops.h's INT_ADD_WRAPV.
+
posix: add intprops.h to prepare for regex sync
* include/intprops.h: New file, taken from Gnulib except
with reworded header comments. This will be used in a
@@ -15,7 +15,7 @@
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
- <http://www.gnu.org/licenses/>. */
+ <https://www.gnu.org/licenses/>. */
#ifdef _LIBC
# include <locale/weight.h>
@@ -784,8 +784,6 @@ re_compile_internal (regex_t *preg, const char * pattern, size_t length,
strncpy (dfa->re_str, pattern, length + 1);
#endif
- __libc_lock_init (dfa->lock);
-
err = re_string_construct (®exp, pattern, length, preg->translate,
(syntax & RE_ICASE) != 0, dfa);
if (BE (err != REG_NOERROR, 0))
@@ -15,7 +15,7 @@
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
- <http://www.gnu.org/licenses/>. */
+ <https://www.gnu.org/licenses/>. */
#ifndef _LIBC
# include <config.h>
@@ -15,7 +15,7 @@
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
- <http://www.gnu.org/licenses/>. */
+ <https://www.gnu.org/licenses/>. */
#ifndef _REGEX_H
#define _REGEX_H 1
@@ -607,19 +607,32 @@ extern int re_exec (const char *);
# endif
#endif
-/* GCC 2.95 and later have "__restrict"; C99 compilers have
+/* For plain 'restrict', use glibc's __restrict if defined.
+ Otherwise, GCC 2.95 and later have "__restrict"; C99 compilers have
"restrict", and "configure" may have defined "restrict".
Other compilers use __restrict, __restrict__, and _Restrict, and
'configure' might #define 'restrict' to those words, so pick a
different name. */
#ifndef _Restrict_
-# define _Restrict_ __restrict
+# if defined __restrict || 2 < __GNUC__ + (95 <= __GNUC_MINOR__)
+# define _Restrict_ __restrict
+# elif 199901L <= __STDC_VERSION__ || defined restrict
+# define _Restrict_ restrict
+# else
+# define _Restrict_
+# endif
#endif
-/* gcc 3.1 and up support the [restrict] syntax. Don't trust
- sys/cdefs.h's definition of __restrict_arr, though, as it
- mishandles gcc -ansi -pedantic. */
+/* For [restrict], use glibc's __restrict_arr if available.
+ Otherwise, GCC 3.1 (not in C++ mode) and C99 support [restrict]. */
#ifndef _Restrict_arr_
-# define _Restrict_arr_ __restrict_arr
+# ifdef __restrict_arr
+# define _Restrict_arr_ __restrict_arr
+# elif ((199901L <= __STDC_VERSION__ || 3 < __GNUC__ + (1 <= __GNUC_MINOR__)) \
+ && !defined __GNUG__)
+# define _Restrict_arr_ _Restrict_
+# else
+# define _Restrict_arr_
+# endif
#endif
/* POSIX compatibility. */
@@ -15,7 +15,7 @@
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
- <http://www.gnu.org/licenses/>. */
+ <https://www.gnu.org/licenses/>. */
static void re_string_construct_common (const char *str, Idx len,
re_string_t *pstr,
@@ -15,7 +15,7 @@
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
- <http://www.gnu.org/licenses/>. */
+ <https://www.gnu.org/licenses/>. */
#ifndef _REGEX_INTERNAL_H
#define _REGEX_INTERNAL_H 1
@@ -33,11 +33,13 @@
#include <stdbool.h>
#include <stdint.h>
+#include "intprops.h"
+
#ifdef _LIBC
# include <libc-lock.h>
# define lock_define(name) __libc_lock_define (, name)
# define lock_init(lock) (__libc_lock_init (lock), 0)
-# define lock_fini(lock)
+# define lock_fini(lock) ((void) 0)
# define lock_lock(lock) __libc_lock_lock (lock)
# define lock_unlock(lock) __libc_lock_unlock (lock)
#elif defined GNULIB_LOCK && !defined USE_UNLOCKED_IO
@@ -15,7 +15,7 @@
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
- <http://www.gnu.org/licenses/>. */
+ <https://www.gnu.org/licenses/>. */
static reg_errcode_t match_ctx_init (re_match_context_t *cache, int eflags,
Idx n);
@@ -317,23 +317,6 @@ re_search_2 (struct re_pattern_buffer *bufp, const char *string1, Idx length1,
weak_alias (__re_search_2, re_search_2)
#endif
-/* Set *R = A + B. Return true if the answer is mathematically incorrect due
- to overflow; in this case, *R is the low order bits of the correct
- answer. */
-static inline bool
-check_add_overflow_idx (Idx a, Idx b, Idx *r)
-{
-#if __GNUC__ >= 5
- return __builtin_add_overflow (a, b, r);
-#else
- if ((a > 0 && b > SSIZE_MAX - a)
- || (a < 0 && b < SSIZE_MIN -a)
- return false;
- *r = a + b;
- return true;
-#endif
-}
-
static regoff_t
re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
Idx length1, const char *string2, Idx length2, Idx start,
@@ -346,7 +329,7 @@ re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
char *s = NULL;
if (BE ((length1 < 0 || length2 < 0 || stop < 0
- || check_add_overflow_idx (length1, length2, &len)),
+ || INT_ADD_WRAPV (length1, length2, &len)),
0))
return -2;
--
2.17.1