posix: Sync gnulib regex implementation

Message ID d5a88131-c50f-868d-b8b0-6115c5aa00ef@cs.ucla.edu
State New, archived
Headers

Commit Message

Paul Eggert June 28, 2018, 8:21 p.m. UTC
  Adhemerval Zanella wrote:
> This patch syncs regex code with gnulib (commit 16aa5a2).

Thanks for doing all that work. I merged some of the changes embodied in that 
patch back to Gnulib; see 
<https://lists.gnu.org/r/bug-gnulib/2018-06/msg00095.html>. Here are some 
comments on the changes that I had questions about:

>    * regcomp: add a __libc_lock_init init for dfa lock.

Why is this needed? The lock has already been initialized by the call to 
lock_init about 15 lines earlier. Please see attached patch, which reverts this 
change.

>    * regexec: instead of use intprops.h and its macros, this patch
>      the check_add_overflow_idx function to check for addition
>      overflow.  The function need to be defined on the file because
>      the file will be build multiple times wit 'Idx' type being
>      redefined (I think intprops.h addition should be done in a
>      separate patch).

When used with Gnulib, that implementation won't work with ICC 17.0.4 20170411, 
which defines __GNUC__ to be 5 but does not support __builtin_add_overflow. 
Also, the __GNUC__ < 5 code doesn't look right for multiple reasons: (1) it 
doesn't set *r when returning false, contrary to the comment describing the code 
(presumably the comment needs to be fixed?), (2) it incorrectly uses SSIZE_MAX 
instead of INT_MAX when _REGEX_LARGE_OFFSETS is not defined, and (3) it has an 
unnecessary test for a > 0 (you need to test only whether a < 0).

Although these issues could be fixed individually, why waste time? Let's just 
use intprops.h and avoid the hassle of hacking on the bugs of a partial replacement.

Attached please see an intprops.h addition that is done as a separate patch. The 
second attached patch goes back to using INT_ADD_WRAPV instead of 
check_add_overflow.

>    * regex.h: Remove _Restrict_ and _Restrict_arr_ definition based
>      on __STDC_VERSION__ because its usage leads to failures on
>      posix/check-installed-headers-c{xx}.

That implementation assumes glibc, so won't work in Gnulib. I attempted to work 
around the check-installed-headers issue in the attached patch; if this doesn't 
work please let me know what the issue is.

>    * regex_internal.h: Define lock_fini to empty macro because setting
>      to 0 lead to build issues (error: statement with no effect
>      [-Werror=unused-value]).

Making it empty is problematic, since lock_fini is supposed to be usable 
wherever a function call could appear. Instead, let's define it to ((void) 0). 
That should fix the -Wunused-value issue in a better way. See attached patch.

>    2. posix/PCRE.tests: the test '(a)|\1' uses a backreference along
>       with group creation and I am not sure if it is the correct
>       behavior to accept it with regcomp (REG_EXTENDED).  The GNU grep
>       accepts it with ERE option though.

POSIX allows recomp and grep to accept this pattern as an extension. I dunno 
what it means, though, and no doubt regcomp and grep should report an error 
instead of accepting it silently. That would be a different bugfix, though.

One other thing: let's use https: intead of http: in URLs, as per Gnulib style. 
We should be using these everywhere in Glibc, of course, but one step at a time.

See attached patches for what the above comments boil down to. They are intended 
to be applied after the patch you posted. The patches could all be squashed.
  

Patch

From 86022a73a20f701ec54eb9baa5d963d3c665051d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 28 Jun 2018 12:49:07 -0700
Subject: [PATCH 2/2] posix: finish merge with Gnulib regex

* posix/regex_internal.h: Include intprops.h.
* posix/regcomp.c (re_compile_internal): Omit unnecessary call to
__libc_lock_init, as the previous lock_init has already
initialized the lock.
* posix/regex.h (_Restrict_, _Restrict_arr_):
Port to Gnulib case, where __restrict and __restrict_arr
might not work.  Do this in a better way than Gnulib
did until today.
* posix/regexec.c (check_add_overflow_idx):
Remove, as it was not portable to ICC, and had some problems
when __GNUC__ < 5.  Use replaced by intprops.h's INT_ADD_WRAPV.
---
 ChangeLog              | 13 +++++++++++++
 posix/regcomp.c        |  4 +---
 posix/regex.c          |  2 +-
 posix/regex.h          | 27 ++++++++++++++++++++-------
 posix/regex_internal.c |  2 +-
 posix/regex_internal.h |  6 ++++--
 posix/regexec.c        | 21 ++-------------------
 7 files changed, 42 insertions(+), 33 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 27793d1456..38935253b0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,18 @@ 
 2018-06-28  Paul Eggert  <eggert@cs.ucla.edu>
 
+	posix: finish merge with Gnulib regex
+	* posix/regex_internal.h: Include intprops.h.
+	* posix/regcomp.c (re_compile_internal): Omit unnecessary call to
+	__libc_lock_init, as the previous lock_init has already
+	initialized the lock.
+	* posix/regex.h (_Restrict_, _Restrict_arr_):
+	Port to Gnulib case, where __restrict and __restrict_arr
+	might not work.  Do this in a better way than Gnulib
+	did until today.
+	* posix/regexec.c (check_add_overflow_idx):
+	Remove, as it was not portable to ICC, and had some problems
+	when __GNUC__ < 5.  Use replaced by intprops.h's INT_ADD_WRAPV.
+
 	posix: add intprops.h to prepare for regex sync
 	* include/intprops.h: New file, taken from Gnulib except
 	with reworded header comments.  This will be used in a
diff --git a/posix/regcomp.c b/posix/regcomp.c
index 8431268b1c..6aaa54f66d 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -15,7 +15,7 @@ 
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 #ifdef _LIBC
 # include <locale/weight.h>
@@ -784,8 +784,6 @@  re_compile_internal (regex_t *preg, const char * pattern, size_t length,
   strncpy (dfa->re_str, pattern, length + 1);
 #endif
 
-  __libc_lock_init (dfa->lock);
-
   err = re_string_construct (&regexp, pattern, length, preg->translate,
 			     (syntax & RE_ICASE) != 0, dfa);
   if (BE (err != REG_NOERROR, 0))
diff --git a/posix/regex.c b/posix/regex.c
index cfadc6b096..d6591e8670 100644
--- a/posix/regex.c
+++ b/posix/regex.c
@@ -15,7 +15,7 @@ 
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 #ifndef _LIBC
 # include <config.h>
diff --git a/posix/regex.h b/posix/regex.h
index eef262981c..32933bc6d5 100644
--- a/posix/regex.h
+++ b/posix/regex.h
@@ -15,7 +15,7 @@ 
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 #ifndef _REGEX_H
 #define _REGEX_H 1
@@ -607,19 +607,32 @@  extern int re_exec (const char *);
 # endif
 #endif
 
-/* GCC 2.95 and later have "__restrict"; C99 compilers have
+/* For plain 'restrict', use glibc's __restrict if defined.
+   Otherwise, GCC 2.95 and later have "__restrict"; C99 compilers have
    "restrict", and "configure" may have defined "restrict".
    Other compilers use __restrict, __restrict__, and _Restrict, and
    'configure' might #define 'restrict' to those words, so pick a
    different name.  */
 #ifndef _Restrict_
-# define _Restrict_ __restrict
+# if defined __restrict || 2 < __GNUC__ + (95 <= __GNUC_MINOR__)
+#  define _Restrict_ __restrict
+# elif 199901L <= __STDC_VERSION__ || defined restrict
+#  define _Restrict_ restrict
+# else
+#  define _Restrict_
+# endif
 #endif
-/* gcc 3.1 and up support the [restrict] syntax.  Don't trust
-   sys/cdefs.h's definition of __restrict_arr, though, as it
-   mishandles gcc -ansi -pedantic.  */
+/* For [restrict], use glibc's __restrict_arr if available.
+   Otherwise, GCC 3.1 (not in C++ mode) and C99 support [restrict].  */
 #ifndef _Restrict_arr_
-# define _Restrict_arr_ __restrict_arr
+# ifdef __restrict_arr
+#  define _Restrict_arr_ __restrict_arr
+# elif ((199901L <= __STDC_VERSION__ || 3 < __GNUC__ + (1 <= __GNUC_MINOR__)) \
+        && !defined __GNUG__)
+#  define _Restrict_arr_ _Restrict_
+# else
+#  define _Restrict_arr_
+# endif
 #endif
 
 /* POSIX compatibility.  */
diff --git a/posix/regex_internal.c b/posix/regex_internal.c
index 2a0c01556e..7f0083b918 100644
--- a/posix/regex_internal.c
+++ b/posix/regex_internal.c
@@ -15,7 +15,7 @@ 
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 static void re_string_construct_common (const char *str, Idx len,
 					re_string_t *pstr,
diff --git a/posix/regex_internal.h b/posix/regex_internal.h
index e621b3eead..b27afbda2a 100644
--- a/posix/regex_internal.h
+++ b/posix/regex_internal.h
@@ -15,7 +15,7 @@ 
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 #ifndef _REGEX_INTERNAL_H
 #define _REGEX_INTERNAL_H 1
@@ -33,11 +33,13 @@ 
 #include <stdbool.h>
 #include <stdint.h>
 
+#include "intprops.h"
+
 #ifdef _LIBC
 # include <libc-lock.h>
 # define lock_define(name) __libc_lock_define (, name)
 # define lock_init(lock) (__libc_lock_init (lock), 0)
-# define lock_fini(lock)
+# define lock_fini(lock) ((void) 0)
 # define lock_lock(lock) __libc_lock_lock (lock)
 # define lock_unlock(lock) __libc_lock_unlock (lock)
 #elif defined GNULIB_LOCK && !defined USE_UNLOCKED_IO
diff --git a/posix/regexec.c b/posix/regexec.c
index 573fc877e1..63aef97535 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -15,7 +15,7 @@ 
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
+   <https://www.gnu.org/licenses/>.  */
 
 static reg_errcode_t match_ctx_init (re_match_context_t *cache, int eflags,
 				     Idx n);
@@ -317,23 +317,6 @@  re_search_2 (struct re_pattern_buffer *bufp, const char *string1, Idx length1,
 weak_alias (__re_search_2, re_search_2)
 #endif
 
-/* Set *R = A + B.  Return true if the answer is mathematically incorrect due
-   to overflow; in this case, *R is the low order bits of the correct
-   answer.  */
-static inline bool
-check_add_overflow_idx (Idx a, Idx b, Idx *r)
-{
-#if __GNUC__ >= 5
-  return __builtin_add_overflow (a, b, r);
-#else
-  if ((a > 0 && b > SSIZE_MAX - a)
-      || (a < 0 && b < SSIZE_MIN -a)
-    return false;
-  *r = a + b;
-  return true;
-#endif
-}
-
 static regoff_t
 re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
 		  Idx length1, const char *string2, Idx length2, Idx start,
@@ -346,7 +329,7 @@  re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
   char *s = NULL;
 
   if (BE ((length1 < 0 || length2 < 0 || stop < 0
-           || check_add_overflow_idx (length1, length2, &len)),
+           || INT_ADD_WRAPV (length1, length2, &len)),
           0))
     return -2;
 
-- 
2.17.1