From patchwork Thu Jun 28 20:21:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paul Eggert X-Patchwork-Id: 28114 Received: (qmail 103098 invoked by alias); 28 Jun 2018 20:22:05 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 103058 invoked by uid 89); 28 Jun 2018 20:22:03 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-24.7 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, SPF_PASS autolearn=ham version=3.3.2 spammy=Act, tight, representing, 3.1 X-HELO: zimbra.cs.ucla.edu Subject: Re: [PATCH] posix: Sync gnulib regex implementation To: Adhemerval Zanella , libc-alpha@sourceware.org References: <1530204275-31537-1-git-send-email-adhemerval.zanella@linaro.org> From: Paul Eggert Openpgp: preference=signencrypt Message-ID: Date: Thu, 28 Jun 2018 13:21:50 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <1530204275-31537-1-git-send-email-adhemerval.zanella@linaro.org> Adhemerval Zanella wrote: > This patch syncs regex code with gnulib (commit 16aa5a2). Thanks for doing all that work. I merged some of the changes embodied in that patch back to Gnulib; see . Here are some comments on the changes that I had questions about: > * regcomp: add a __libc_lock_init init for dfa lock. Why is this needed? The lock has already been initialized by the call to lock_init about 15 lines earlier. Please see attached patch, which reverts this change. > * regexec: instead of use intprops.h and its macros, this patch > the check_add_overflow_idx function to check for addition > overflow. The function need to be defined on the file because > the file will be build multiple times wit 'Idx' type being > redefined (I think intprops.h addition should be done in a > separate patch). When used with Gnulib, that implementation won't work with ICC 17.0.4 20170411, which defines __GNUC__ to be 5 but does not support __builtin_add_overflow. Also, the __GNUC__ < 5 code doesn't look right for multiple reasons: (1) it doesn't set *r when returning false, contrary to the comment describing the code (presumably the comment needs to be fixed?), (2) it incorrectly uses SSIZE_MAX instead of INT_MAX when _REGEX_LARGE_OFFSETS is not defined, and (3) it has an unnecessary test for a > 0 (you need to test only whether a < 0). Although these issues could be fixed individually, why waste time? Let's just use intprops.h and avoid the hassle of hacking on the bugs of a partial replacement. Attached please see an intprops.h addition that is done as a separate patch. The second attached patch goes back to using INT_ADD_WRAPV instead of check_add_overflow. > * regex.h: Remove _Restrict_ and _Restrict_arr_ definition based > on __STDC_VERSION__ because its usage leads to failures on > posix/check-installed-headers-c{xx}. That implementation assumes glibc, so won't work in Gnulib. I attempted to work around the check-installed-headers issue in the attached patch; if this doesn't work please let me know what the issue is. > * regex_internal.h: Define lock_fini to empty macro because setting > to 0 lead to build issues (error: statement with no effect > [-Werror=unused-value]). Making it empty is problematic, since lock_fini is supposed to be usable wherever a function call could appear. Instead, let's define it to ((void) 0). That should fix the -Wunused-value issue in a better way. See attached patch. > 2. posix/PCRE.tests: the test '(a)|\1' uses a backreference along > with group creation and I am not sure if it is the correct > behavior to accept it with regcomp (REG_EXTENDED). The GNU grep > accepts it with ERE option though. POSIX allows recomp and grep to accept this pattern as an extension. I dunno what it means, though, and no doubt regcomp and grep should report an error instead of accepting it silently. That would be a different bugfix, though. One other thing: let's use https: intead of http: in URLs, as per Gnulib style. We should be using these everywhere in Glibc, of course, but one step at a time. See attached patches for what the above comments boil down to. They are intended to be applied after the patch you posted. The patches could all be squashed. From 86022a73a20f701ec54eb9baa5d963d3c665051d Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 28 Jun 2018 12:49:07 -0700 Subject: [PATCH 2/2] posix: finish merge with Gnulib regex * posix/regex_internal.h: Include intprops.h. * posix/regcomp.c (re_compile_internal): Omit unnecessary call to __libc_lock_init, as the previous lock_init has already initialized the lock. * posix/regex.h (_Restrict_, _Restrict_arr_): Port to Gnulib case, where __restrict and __restrict_arr might not work. Do this in a better way than Gnulib did until today. * posix/regexec.c (check_add_overflow_idx): Remove, as it was not portable to ICC, and had some problems when __GNUC__ < 5. Use replaced by intprops.h's INT_ADD_WRAPV. --- ChangeLog | 13 +++++++++++++ posix/regcomp.c | 4 +--- posix/regex.c | 2 +- posix/regex.h | 27 ++++++++++++++++++++------- posix/regex_internal.c | 2 +- posix/regex_internal.h | 6 ++++-- posix/regexec.c | 21 ++------------------- 7 files changed, 42 insertions(+), 33 deletions(-) diff --git a/ChangeLog b/ChangeLog index 27793d1456..38935253b0 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,18 @@ 2018-06-28 Paul Eggert + posix: finish merge with Gnulib regex + * posix/regex_internal.h: Include intprops.h. + * posix/regcomp.c (re_compile_internal): Omit unnecessary call to + __libc_lock_init, as the previous lock_init has already + initialized the lock. + * posix/regex.h (_Restrict_, _Restrict_arr_): + Port to Gnulib case, where __restrict and __restrict_arr + might not work. Do this in a better way than Gnulib + did until today. + * posix/regexec.c (check_add_overflow_idx): + Remove, as it was not portable to ICC, and had some problems + when __GNUC__ < 5. Use replaced by intprops.h's INT_ADD_WRAPV. + posix: add intprops.h to prepare for regex sync * include/intprops.h: New file, taken from Gnulib except with reworded header comments. This will be used in a diff --git a/posix/regcomp.c b/posix/regcomp.c index 8431268b1c..6aaa54f66d 100644 --- a/posix/regcomp.c +++ b/posix/regcomp.c @@ -15,7 +15,7 @@ You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, see - . */ + . */ #ifdef _LIBC # include @@ -784,8 +784,6 @@ re_compile_internal (regex_t *preg, const char * pattern, size_t length, strncpy (dfa->re_str, pattern, length + 1); #endif - __libc_lock_init (dfa->lock); - err = re_string_construct (®exp, pattern, length, preg->translate, (syntax & RE_ICASE) != 0, dfa); if (BE (err != REG_NOERROR, 0)) diff --git a/posix/regex.c b/posix/regex.c index cfadc6b096..d6591e8670 100644 --- a/posix/regex.c +++ b/posix/regex.c @@ -15,7 +15,7 @@ You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, see - . */ + . */ #ifndef _LIBC # include diff --git a/posix/regex.h b/posix/regex.h index eef262981c..32933bc6d5 100644 --- a/posix/regex.h +++ b/posix/regex.h @@ -15,7 +15,7 @@ You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, see - . */ + . */ #ifndef _REGEX_H #define _REGEX_H 1 @@ -607,19 +607,32 @@ extern int re_exec (const char *); # endif #endif -/* GCC 2.95 and later have "__restrict"; C99 compilers have +/* For plain 'restrict', use glibc's __restrict if defined. + Otherwise, GCC 2.95 and later have "__restrict"; C99 compilers have "restrict", and "configure" may have defined "restrict". Other compilers use __restrict, __restrict__, and _Restrict, and 'configure' might #define 'restrict' to those words, so pick a different name. */ #ifndef _Restrict_ -# define _Restrict_ __restrict +# if defined __restrict || 2 < __GNUC__ + (95 <= __GNUC_MINOR__) +# define _Restrict_ __restrict +# elif 199901L <= __STDC_VERSION__ || defined restrict +# define _Restrict_ restrict +# else +# define _Restrict_ +# endif #endif -/* gcc 3.1 and up support the [restrict] syntax. Don't trust - sys/cdefs.h's definition of __restrict_arr, though, as it - mishandles gcc -ansi -pedantic. */ +/* For [restrict], use glibc's __restrict_arr if available. + Otherwise, GCC 3.1 (not in C++ mode) and C99 support [restrict]. */ #ifndef _Restrict_arr_ -# define _Restrict_arr_ __restrict_arr +# ifdef __restrict_arr +# define _Restrict_arr_ __restrict_arr +# elif ((199901L <= __STDC_VERSION__ || 3 < __GNUC__ + (1 <= __GNUC_MINOR__)) \ + && !defined __GNUG__) +# define _Restrict_arr_ _Restrict_ +# else +# define _Restrict_arr_ +# endif #endif /* POSIX compatibility. */ diff --git a/posix/regex_internal.c b/posix/regex_internal.c index 2a0c01556e..7f0083b918 100644 --- a/posix/regex_internal.c +++ b/posix/regex_internal.c @@ -15,7 +15,7 @@ You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, see - . */ + . */ static void re_string_construct_common (const char *str, Idx len, re_string_t *pstr, diff --git a/posix/regex_internal.h b/posix/regex_internal.h index e621b3eead..b27afbda2a 100644 --- a/posix/regex_internal.h +++ b/posix/regex_internal.h @@ -15,7 +15,7 @@ You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, see - . */ + . */ #ifndef _REGEX_INTERNAL_H #define _REGEX_INTERNAL_H 1 @@ -33,11 +33,13 @@ #include #include +#include "intprops.h" + #ifdef _LIBC # include # define lock_define(name) __libc_lock_define (, name) # define lock_init(lock) (__libc_lock_init (lock), 0) -# define lock_fini(lock) +# define lock_fini(lock) ((void) 0) # define lock_lock(lock) __libc_lock_lock (lock) # define lock_unlock(lock) __libc_lock_unlock (lock) #elif defined GNULIB_LOCK && !defined USE_UNLOCKED_IO diff --git a/posix/regexec.c b/posix/regexec.c index 573fc877e1..63aef97535 100644 --- a/posix/regexec.c +++ b/posix/regexec.c @@ -15,7 +15,7 @@ You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, see - . */ + . */ static reg_errcode_t match_ctx_init (re_match_context_t *cache, int eflags, Idx n); @@ -317,23 +317,6 @@ re_search_2 (struct re_pattern_buffer *bufp, const char *string1, Idx length1, weak_alias (__re_search_2, re_search_2) #endif -/* Set *R = A + B. Return true if the answer is mathematically incorrect due - to overflow; in this case, *R is the low order bits of the correct - answer. */ -static inline bool -check_add_overflow_idx (Idx a, Idx b, Idx *r) -{ -#if __GNUC__ >= 5 - return __builtin_add_overflow (a, b, r); -#else - if ((a > 0 && b > SSIZE_MAX - a) - || (a < 0 && b < SSIZE_MIN -a) - return false; - *r = a + b; - return true; -#endif -} - static regoff_t re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1, Idx length1, const char *string2, Idx length2, Idx start, @@ -346,7 +329,7 @@ re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1, char *s = NULL; if (BE ((length1 < 0 || length2 < 0 || stop < 0 - || check_add_overflow_idx (length1, length2, &len)), + || INT_ADD_WRAPV (length1, length2, &len)), 0)) return -2; -- 2.17.1