posix: Sync gnulib regex implementation

Message ID 09fa88fe-87e5-2c98-43e7-2dfeffabffbf@cs.ucla.edu
State New, archived
Headers

Commit Message

Paul Eggert June 29, 2018, 11:10 p.m. UTC
  On 06/29/2018 09:00 AM, Adhemerval Zanella wrote:
> Paul, could you check what I am missing on the overflow check so we can
> move forward? I really don't want to get this sync stalled because of
> intprops.h addition as a pre-requisite.

I can add comments to  INT_ADD_WRAPV, if that would help. Please ask 
questions about it; that would help me understand what parts need better 
commenting.

I'm afraid that Florian's comment about testability is a bit of a red 
herring, as intprops.h is an internal header, not a published one. Glibc 
typically doesn't have test cases for macros defined in internal include 
files, and there's no need to test intprops.h directly. Besides, 
intprops.h and INT_ADD_WRAPV have been tested widely in commonly-used 
GNU applications like Coreutils, so it's not like I'm proposing anything 
flaky here.

That being said, this use of INT_ADD_WRAPV is unimportant (as far as I 
know, no Gnu apps use this two-buffer regex searching code any more), so 
I guess it's OK here to write a simple substitute that's good enough for 
regex even if it's not robust or general. The goal should be simplicity, 
portability, and meeting regex's needs. Proposed further Glibc patch 
attached (it assumes your patch). I've installed a similar patch into 
Gnulib.

I really don't want to get into the habit of doing this sort of thing, 
though. There's a lot of hard-won integer overflow expertise in 
intprops.h and it's a waste of time to continue to maintain poor 
imitations of it.


> I tried to check the pos-processed implementation using intprops.h for
> the snippet ... And its results are pretty much unreadable.
Trying to audit this by looking at the preprocessed output is a bit like 
auditing glibc by looking at the generated machine code. It can be done, 
but you can't expect it to be easy. For your particular compiler I 
expect that the assembly-language output is more readable than the 
preprocessor output, so if you want to look at low level stuff I suggest 
looking at assembly language. But really, you should be better off 
reading the original source (and if it needs more comments, please say 
where and why and l'll add them).
  

Patch

From 614946f802c3a7c2b11be14f85d3ed80f5e7ab57 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 29 Jun 2018 15:42:16 -0700
Subject: [PATCH] Merge better with Gnulib

* posix/regcomp.c (TYPE_SIGNED): Remove; now done by regex_internal.h.
* posix/regex_internal.h [_LIBC]: Do not include intprops.h.
(TYPE_SIGNED, INT_ADD_WRAPV) [_LIBC]: New macros.
* posix/regexec.c (check_add_overflow_idx): Remove.
(re_search_2_stub): Use INT_ADD_WRAPV insead of check_add_overflow_idx.
---
 ChangeLog              |  9 +++++++++
 posix/regcomp.c        |  4 ----
 posix/regex_internal.h | 18 ++++++++++++++++++
 posix/regexec.c        | 18 +-----------------
 4 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index c51c651220..4e665873ff 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@ 
+2018-06-29  Paul Eggert  <eggert@cs.ucla.edu>
+
+	Merge better with Gnulib
+	* posix/regcomp.c (TYPE_SIGNED): Remove; now done by regex_internal.h.
+	* posix/regex_internal.h [_LIBC]: Do not include intprops.h.
+	(TYPE_SIGNED, INT_ADD_WRAPV) [_LIBC]: New macros.
+	* posix/regexec.c (check_add_overflow_idx): Remove.
+	(re_search_2_stub): Use INT_ADD_WRAPV insead of check_add_overflow_idx.
+
 2018-06-29  Adhemerval Zanella  <adhemerval.zanella@linaro.org>
 
 	[BZ #23233]
diff --git a/posix/regcomp.c b/posix/regcomp.c
index 6aaa54f66d..7b5ddaad0c 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -2650,10 +2650,6 @@  parse_dup_op (bin_tree_t *elem, re_string_t *regexp, re_dfa_t *dfa,
   if (BE (tree == NULL, 0))
     goto parse_dup_op_espace;
 
-/* From gnulib's "intprops.h":
-   True if the arithmetic type T is signed.  */
-#define TYPE_SIGNED(t) (! ((t) 0 < (t) -1))
-
   /* This loop is actually executed only when end != -1,
      to rewrite <re>{0,n} as (<re>(<re>...<re>?)?)?...  We have
      already created the start+1-th copy.  */
diff --git a/posix/regex_internal.h b/posix/regex_internal.h
index f38f6b3d80..3b836ed206 100644
--- a/posix/regex_internal.h
+++ b/posix/regex_internal.h
@@ -33,6 +33,24 @@ 
 #include <stdbool.h>
 #include <stdint.h>
 
+/* Properties of integers.  Although Gnulib has intprops.h, glibc does
+   without for now.  */
+#ifndef _LIBC
+# include "intprops.h"
+#else
+/* True if the real type T is signed.  */
+# define TYPE_SIGNED(t) (! ((t) 0 < (t) -1))
+
+/* True if adding the nonnegative Idx values A and B would overflow.
+   If false, set *R to A + B.  A, B, and R may be evaluated more than
+   once, or zero times.  Although this is not a full implementation of
+   Gnulib INT_ADD_WRAPV, it is good enough for glibc regex code.
+   FIXME: This implementation is a fragile stopgap, and this file would
+   be simpler and more robust if intprops.h were migrated into glibc.  */
+# define INT_ADD_WRAPV(a, b, r) \
+   (IDX_MAX - (a) < (b) ? true : (*(r) = (a) + (b), false))
+#endif
+
 #ifdef _LIBC
 # include <libc-lock.h>
 # define lock_define(name) __libc_lock_define (, name)
diff --git a/posix/regexec.c b/posix/regexec.c
index d64b025d12..63aef97535 100644
--- a/posix/regexec.c
+++ b/posix/regexec.c
@@ -317,22 +317,6 @@  re_search_2 (struct re_pattern_buffer *bufp, const char *string1, Idx length1,
 weak_alias (__re_search_2, re_search_2)
 #endif
 
-/* Set *R = A + B.  Return true if the answer is mathematically incorrect due
-   to overflow; in this case, *R is the low order bits of the correct
-   answer.  */
-static inline bool
-check_add_overflow_idx (Idx a, Idx b, Idx *r)
-{
-#if __GNUC__ >= 5
-  return __builtin_add_overflow (a, b, r);
-#else
-  *r = a + b;
-  if (a < 0 && b < IDX_MAX -a)
-    return false;
-  return true;
-#endif
-}
-
 static regoff_t
 re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
 		  Idx length1, const char *string2, Idx length2, Idx start,
@@ -345,7 +329,7 @@  re_search_2_stub (struct re_pattern_buffer *bufp, const char *string1,
   char *s = NULL;
 
   if (BE ((length1 < 0 || length2 < 0 || stop < 0
-           || check_add_overflow_idx (length1, length2, &len)),
+           || INT_ADD_WRAPV (length1, length2, &len)),
           0))
     return -2;
 
-- 
2.17.1