[2/2] posix: Remove alloca usage for internal fnmatch implementation

Message ID 20210104202528.1228255-2-adhemerval.zanella@linaro.org
State Superseded
Headers
Series [1/2] posix: User scratch_buffer on fnmatch |

Commit Message

Adhemerval Zanella Netto Jan. 4, 2021, 8:25 p.m. UTC
  This patch replaces the internal fnmatch pattern list generation
to use a dynamic array.

Checked on x86_64-linux-gnu.
---
 posix/fnmatch.c      |  24 +-----
 posix/fnmatch_loop.c | 190 +++++++++++++++++++------------------------
 2 files changed, 87 insertions(+), 127 deletions(-)
  

Comments

Florian Weimer March 8, 2021, 12:59 p.m. UTC | #1
* Adhemerval Zanella via Libc-alpha:

> -    else if (*p == L_('|'))
> +    else if (*p == L_(')') || *p == L_('|'))
>        {
>          if (level == 0)
>            {
> -            NEW_PATTERN;
> -            startp = p + 1;
> +            size_t slen = opt == L_('?') || opt == L_('@')
> +			  ? pattern_len : p - startp + 1;
> +            CHAR *newp = malloc (slen * sizeof (CHAR));
> +            if (newp != NULL)
> +              {
> +                *((CHAR *) MEMPCPY (newp, startp, p - startp)) = L_('\0');
> +                PASTE (PATTERN_PREFIX,_add) (&list, newp);
> +              }
> +            if (newp == NULL || PASTE (PATTERN_PREFIX, _has_failed) (&list))
> +              {
> +                retval = -2;
> +                goto out;
> +              }
> +
> +            if (*p == L_('|'))
> +              startp = p + 1;
>            }

slen seems to be the wrong variable name.  But I don't know wh the
original code computes plen conditionally and then uses p - startp
unconditionally.  That seems wrong.  The discrepancy goes back to
821a6bb4360.  Do you see a case where the difference matters?

The == 0 checks for the recursive FCT calls are wrong because they treat
match failure the same as OOM and other errors (the -2 return value),
but that also is a pre-existing issue.

The conversation itself appears to be faithful.

Thanks,
Florian
  
Adhemerval Zanella Netto Oct. 20, 2021, 3:12 p.m. UTC | #2
On 08/03/2021 09:59, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> -    else if (*p == L_('|'))
>> +    else if (*p == L_(')') || *p == L_('|'))
>>        {
>>          if (level == 0)
>>            {
>> -            NEW_PATTERN;
>> -            startp = p + 1;
>> +            size_t slen = opt == L_('?') || opt == L_('@')
>> +			  ? pattern_len : p - startp + 1;
>> +            CHAR *newp = malloc (slen * sizeof (CHAR));
>> +            if (newp != NULL)
>> +              {
>> +                *((CHAR *) MEMPCPY (newp, startp, p - startp)) = L_('\0');
>> +                PASTE (PATTERN_PREFIX,_add) (&list, newp);
>> +              }
>> +            if (newp == NULL || PASTE (PATTERN_PREFIX, _has_failed) (&list))
>> +              {
>> +                retval = -2;
>> +                goto out;
>> +              }
>> +
>> +            if (*p == L_('|'))
>> +              startp = p + 1;
>>            }
> 
> slen seems to be the wrong variable name.  But I don't know wh the
> original code computes plen conditionally and then uses p - startp
> unconditionally.  That seems wrong.  The discrepancy goes back to
> 821a6bb4360.  Do you see a case where the difference matters?
> 
> The == 0 checks for the recursive FCT calls are wrong because they treat
> match failure the same as OOM and other errors (the -2 return value),
> but that also is a pre-existing issue.
> 
> The conversation itself appears to be faithful.

Hi Florian,

I noted this patch [1] is marked accepted, was you the one that
accepted it? In any case, are you still ok with the change?


[1] https://patchwork.sourceware.org/project/glibc/patch/20210202130804.1920933-2-adhemerval.zanella@linaro.org/
  
Florian Weimer Oct. 21, 2021, 9:54 a.m. UTC | #3
* Adhemerval Zanella:

> On 08/03/2021 09:59, Florian Weimer wrote:
>> * Adhemerval Zanella via Libc-alpha:
>> 
>>> -    else if (*p == L_('|'))
>>> +    else if (*p == L_(')') || *p == L_('|'))
>>>        {
>>>          if (level == 0)
>>>            {
>>> -            NEW_PATTERN;
>>> -            startp = p + 1;
>>> +            size_t slen = opt == L_('?') || opt == L_('@')
>>> +			  ? pattern_len : p - startp + 1;
>>> +            CHAR *newp = malloc (slen * sizeof (CHAR));
>>> +            if (newp != NULL)
>>> +              {
>>> +                *((CHAR *) MEMPCPY (newp, startp, p - startp)) = L_('\0');
>>> +                PASTE (PATTERN_PREFIX,_add) (&list, newp);
>>> +              }
>>> +            if (newp == NULL || PASTE (PATTERN_PREFIX, _has_failed) (&list))
>>> +              {
>>> +                retval = -2;
>>> +                goto out;
>>> +              }
>>> +
>>> +            if (*p == L_('|'))
>>> +              startp = p + 1;
>>>            }
>> 
>> slen seems to be the wrong variable name.  But I don't know wh the
>> original code computes plen conditionally and then uses p - startp
>> unconditionally.  That seems wrong.  The discrepancy goes back to
>> 821a6bb4360.  Do you see a case where the difference matters?
>> 
>> The == 0 checks for the recursive FCT calls are wrong because they treat
>> match failure the same as OOM and other errors (the -2 return value),
>> but that also is a pre-existing issue.
>> 
>> The conversation itself appears to be faithful.
>
> Hi Florian,
>
> I noted this patch [1] is marked accepted, was you the one that
> accepted it? In any case, are you still ok with the change?
>
>
> [1] https://patchwork.sourceware.org/project/glibc/patch/20210202130804.1920933-2-adhemerval.zanella@linaro.org/

I think all the issues I identified are pre-existing, and as I said, the
conversion to remove alloca appears to be correct.

Thanks,
Florian
  

Patch

diff --git a/posix/fnmatch.c b/posix/fnmatch.c
index ac254fc9ac..2a6186b594 100644
--- a/posix/fnmatch.c
+++ b/posix/fnmatch.c
@@ -31,9 +31,6 @@ 
 #include <ctype.h>
 #include <string.h>
 #include <stdlib.h>
-#if defined _LIBC || HAVE_ALLOCA
-# include <alloca.h>
-#endif
 #include <wchar.h>
 #include <wctype.h>
 #include <stddef.h>
@@ -87,22 +84,6 @@  typedef ptrdiff_t idx_t;
 #define NO_LEADING_PERIOD(flags) \
   ((flags & (FNM_FILE_NAME | FNM_PERIOD)) == (FNM_FILE_NAME | FNM_PERIOD))
 
-#ifndef _LIBC
-# if HAVE_ALLOCA
-/* The OS usually guarantees only one guard page at the bottom of the stack,
-   and a page size can be as small as 4096 bytes.  So we cannot safely
-   allocate anything larger than 4096 bytes.  Also care for the possibility
-   of a few compiler-allocated temporary stack slots.  */
-#  define __libc_use_alloca(n) ((n) < 4032)
-# else
-/* Just use malloc.  */
-#  define __libc_use_alloca(n) false
-#  undef alloca
-#  define alloca(n) malloc (n)
-# endif
-# define alloca_account(size, avar) ((avar) += (size), alloca (size))
-#endif
-
 /* Provide support for user-defined character classes, based on the functions
    from ISO C 90 amendment 1.  */
 #ifdef CHARCLASS_NAME_MAX
@@ -293,8 +274,7 @@  fnmatch (const char *pattern, const char *string, int flags)
 
       int res = internal_fnwmatch (wpattern.data, wstring.data,
                                    (wchar_t *) wstring.data + n,
-                                   flags & FNM_PERIOD, flags, NULL,
-                                   false);
+                                   flags & FNM_PERIOD, flags, NULL);
 
       scratch_buffer_free (&wstring);
       scratch_buffer_free (&wpattern);
@@ -303,7 +283,7 @@  fnmatch (const char *pattern, const char *string, int flags)
     }
 
   return internal_fnmatch (pattern, string, string + strlen (string),
-                           flags & FNM_PERIOD, flags, NULL, 0);
+                           flags & FNM_PERIOD, flags, NULL);
 }
 
 #undef fnmatch
diff --git a/posix/fnmatch_loop.c b/posix/fnmatch_loop.c
index 7f938af590..69f78f0fd8 100644
--- a/posix/fnmatch_loop.c
+++ b/posix/fnmatch_loop.c
@@ -30,15 +30,14 @@  struct STRUCT
    it matches, nonzero if not.  */
 static int FCT (const CHAR *pattern, const CHAR *string,
                 const CHAR *string_end, bool no_leading_period, int flags,
-                struct STRUCT *ends, size_t alloca_used);
+                struct STRUCT *ends);
 static int EXT (INT opt, const CHAR *pattern, const CHAR *string,
-                const CHAR *string_end, bool no_leading_period, int flags,
-                size_t alloca_used);
+                const CHAR *string_end, bool no_leading_period, int flags);
 static const CHAR *END (const CHAR *patternp);
 
 static int
 FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
-     bool no_leading_period, int flags, struct STRUCT *ends, size_t alloca_used)
+     bool no_leading_period, int flags, struct STRUCT *ends)
 {
   const CHAR *p = pattern, *n = string;
   UCHAR c;
@@ -62,8 +61,7 @@  FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
         case L_('?'):
           if (__glibc_unlikely (flags & FNM_EXTMATCH) && *p == '(')
             {
-              int res = EXT (c, p, n, string_end, no_leading_period,
-                             flags, alloca_used);
+              int res = EXT (c, p, n, string_end, no_leading_period, flags);
               if (res != -1)
                 return res;
             }
@@ -92,8 +90,7 @@  FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
         case L_('*'):
           if (__glibc_unlikely (flags & FNM_EXTMATCH) && *p == '(')
             {
-              int res = EXT (c, p, n, string_end, no_leading_period,
-                             flags, alloca_used);
+              int res = EXT (c, p, n, string_end, no_leading_period, flags);
               if (res != -1)
                 return res;
             }
@@ -182,7 +179,7 @@  FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
 
                   for (--p; n < endp; ++n, no_leading_period = false)
                     if (FCT (p, n, string_end, no_leading_period, flags2,
-                             &end, alloca_used) == 0)
+                             &end) == 0)
                       goto found;
                 }
               else if (c == L_('/') && (flags & FNM_FILE_NAME))
@@ -191,7 +188,7 @@  FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
                     ++n;
                   if (n < string_end && *n == L_('/')
                       && (FCT (p, n + 1, string_end, flags & FNM_PERIOD, flags,
-                               NULL, alloca_used) == 0))
+                               NULL) == 0))
                     return 0;
                 }
               else
@@ -205,7 +202,7 @@  FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
                   for (--p; n < endp; ++n, no_leading_period = false)
                     if (FOLD ((UCHAR) *n) == c
                         && (FCT (p, n, string_end, no_leading_period, flags2,
-                                 &end, alloca_used) == 0))
+                                 &end) == 0))
                       {
                       found:
                         if (end.pattern == NULL)
@@ -892,8 +889,7 @@  FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end,
         case L_('!'):
           if (__glibc_unlikely (flags & FNM_EXTMATCH) && *p == '(')
             {
-              int res = EXT (c, p, n, string_end, no_leading_period, flags,
-                             alloca_used);
+              int res = EXT (c, p, n, string_end, no_leading_period, flags);
               if (res != -1)
                 return res;
             }
@@ -972,26 +968,37 @@  END (const CHAR *pattern)
   return p + 1;
 }
 
+#if WIDE_CHAR_VERSION
+# define PATTERN_PREFIX pattern_list
+#else
+# define PATTERN_PREFIX wpattern_list
+#endif
+
+#define PASTE(a,b)                 PASTE1(a,b)
+#define PASTE1(a,b)                a##b
+
+#define DYNARRAY_STRUCT            PATTERN_PREFIX
+#define DYNARRAY_ELEMENT_FREE(ptr) free (*ptr)
+#define DYNARRAY_ELEMENT           CHAR *
+#define DYNARRAY_PREFIX            PASTE(PATTERN_PREFIX,_)
+#define DYNARRAY_INITIAL_SIZE      8
+#include <malloc/dynarray-skeleton.c>
 
 static int
 EXT (INT opt, const CHAR *pattern, const CHAR *string, const CHAR *string_end,
-     bool no_leading_period, int flags, size_t alloca_used)
+     bool no_leading_period, int flags)
 {
   const CHAR *startp;
   ptrdiff_t level;
-  struct patternlist
-  {
-    struct patternlist *next;
-    CHAR malloced;
-    CHAR str __flexarr;
-  } *list = NULL;
-  struct patternlist **lastp = &list;
+  struct PATTERN_PREFIX list;
   size_t pattern_len = STRLEN (pattern);
-  bool any_malloced = false;
+  size_t pattern_i = 0;
   const CHAR *p;
   const CHAR *rs;
   int retval = 0;
 
+  PASTE (PATTERN_PREFIX, _init) (&list);
+
   /* Parse the pattern.  Store the individual parts in the list.  */
   level = 0;
   for (startp = p = pattern + 1; level >= 0; ++p)
@@ -1027,74 +1034,48 @@  EXT (INT opt, const CHAR *pattern, const CHAR *string, const CHAR *string_end,
               || *p == L_('!')) && p[1] == L_('('))
       /* Remember the nesting level.  */
       ++level;
-    else if (*p == L_(')'))
-      {
-        if (level-- == 0)
-          {
-            /* This means we found the end of the pattern.  */
-#define NEW_PATTERN \
-            struct patternlist *newp;                                         \
-            size_t plen = (opt == L_('?') || opt == L_('@')                   \
-                           ? pattern_len : (p - startp + 1UL));               \
-            idx_t slen = FLEXSIZEOF (struct patternlist, str, 0);             \
-            idx_t new_used = alloca_used + slen;                              \
-            idx_t plensize;                                                   \
-            if (INT_MULTIPLY_WRAPV (plen, sizeof (CHAR), &plensize)           \
-                || INT_ADD_WRAPV (new_used, plensize, &new_used))             \
-              {                                                               \
-                retval = -2;                                                  \
-                goto out;                                                     \
-              }                                                               \
-            slen += plensize;                                                 \
-            bool malloced = ! __libc_use_alloca (new_used);                   \
-            if (__glibc_unlikely (malloced))                                  \
-              {                                                               \
-                newp = malloc (slen);                                         \
-                if (newp == NULL)                                             \
-                  {                                                           \
-                    retval = -2;                                              \
-                    goto out;                                                 \
-                  }                                                           \
-                any_malloced = true;                                          \
-              }                                                               \
-            else                                                              \
-              newp = alloca_account (slen, alloca_used);                      \
-            newp->next = NULL;                                                \
-            newp->malloced = malloced;                                        \
-            *((CHAR *) MEMPCPY (newp->str, startp, p - startp)) = L_('\0');   \
-            *lastp = newp;                                                    \
-            lastp = &newp->next
-            NEW_PATTERN;
-          }
-      }
-    else if (*p == L_('|'))
+    else if (*p == L_(')') || *p == L_('|'))
       {
         if (level == 0)
           {
-            NEW_PATTERN;
-            startp = p + 1;
+            size_t slen = opt == L_('?') || opt == L_('@')
+			  ? pattern_len : p - startp + 1;
+            CHAR *newp = malloc (slen * sizeof (CHAR));
+            if (newp != NULL)
+              {
+                *((CHAR *) MEMPCPY (newp, startp, p - startp)) = L_('\0');
+                PASTE (PATTERN_PREFIX,_add) (&list, newp);
+              }
+            if (newp == NULL || PASTE (PATTERN_PREFIX, _has_failed) (&list))
+              {
+                retval = -2;
+                goto out;
+              }
+
+            if (*p == L_('|'))
+              startp = p + 1;
           }
+        if (*p == L_(')'))
+	  level--;
       }
-  assert (list != NULL);
   assert (p[-1] == L_(')'));
-#undef NEW_PATTERN
 
   switch (opt)
     {
     case L_('*'):
-      if (FCT (p, string, string_end, no_leading_period, flags, NULL,
-               alloca_used) == 0)
+      if (FCT (p, string, string_end, no_leading_period, flags, NULL) == 0)
         goto success;
       FALLTHROUGH;
     case L_('+'):
-      do
+      for (; pattern_i < PASTE (PATTERN_PREFIX, _size)(&list); pattern_i++)
         {
           for (rs = string; rs <= string_end; ++rs)
             /* First match the prefix with the current pattern with the
                current pattern.  */
-            if (FCT (list->str, string, rs, no_leading_period,
+            if (FCT (*PASTE (PATTERN_PREFIX, _at) (&list, pattern_i), string,
+                     rs, no_leading_period,
                      flags & FNM_FILE_NAME ? flags : flags & ~FNM_PERIOD,
-                     NULL, alloca_used) == 0
+                     NULL) == 0
                 /* This was successful.  Now match the rest with the rest
                    of the pattern.  */
                 && (FCT (p, rs, string_end,
@@ -1102,7 +1083,7 @@  EXT (INT opt, const CHAR *pattern, const CHAR *string, const CHAR *string_end,
                          ? no_leading_period
                          : rs[-1] == '/' && NO_LEADING_PERIOD (flags),
                          flags & FNM_FILE_NAME
-                         ? flags : flags & ~FNM_PERIOD, NULL, alloca_used) == 0
+                         ? flags : flags & ~FNM_PERIOD, NULL) == 0
                     /* This didn't work.  Try the whole pattern.  */
                     || (rs != string
                         && FCT (pattern - 1, rs, string_end,
@@ -1110,35 +1091,33 @@  EXT (INT opt, const CHAR *pattern, const CHAR *string, const CHAR *string_end,
                                 ? no_leading_period
                                 : rs[-1] == '/' && NO_LEADING_PERIOD (flags),
                                 flags & FNM_FILE_NAME
-                                ? flags : flags & ~FNM_PERIOD, NULL,
-                                alloca_used) == 0)))
+                                ? flags : flags & ~FNM_PERIOD, NULL) == 0)))
               /* It worked.  Signal success.  */
               goto success;
         }
-      while ((list = list->next) != NULL);
 
       /* None of the patterns lead to a match.  */
       retval = FNM_NOMATCH;
       break;
 
     case L_('?'):
-      if (FCT (p, string, string_end, no_leading_period, flags, NULL,
-               alloca_used) == 0)
+      if (FCT (p, string, string_end, no_leading_period, flags, NULL) == 0)
         goto success;
       FALLTHROUGH;
     case L_('@'):
-      do
-        /* I cannot believe it but 'strcat' is actually acceptable
-           here.  Match the entire string with the prefix from the
-           pattern list and the rest of the pattern following the
-           pattern list.  */
-        if (FCT (STRCAT (list->str, p), string, string_end,
-                 no_leading_period,
-                 flags & FNM_FILE_NAME ? flags : flags & ~FNM_PERIOD,
-                 NULL, alloca_used) == 0)
-          /* It worked.  Signal success.  */
-          goto success;
-      while ((list = list->next) != NULL);
+      for (; pattern_i < PASTE (PATTERN_PREFIX, _size) (&list); pattern_i++)
+        {
+          /* I cannot believe it but `strcat' is actually acceptable
+             here.  Match the entire string with the prefix from the
+             pattern list and the rest of the pattern following the
+             pattern list.  */
+          if (FCT (STRCAT (*PASTE (PATTERN_PREFIX, _at) (&list, pattern_i), p),
+                   string, string_end, no_leading_period,
+                   flags & FNM_FILE_NAME ? flags : flags & ~FNM_PERIOD,
+                   NULL) == 0)
+            /* It worked.  Signal success.  */
+            goto success;
+        }
 
       /* None of the patterns lead to a match.  */
       retval = FNM_NOMATCH;
@@ -1147,22 +1126,27 @@  EXT (INT opt, const CHAR *pattern, const CHAR *string, const CHAR *string_end,
     case L_('!'):
       for (rs = string; rs <= string_end; ++rs)
         {
-          struct patternlist *runp;
+	  size_t runp_i;
 
-          for (runp = list; runp != NULL; runp = runp->next)
-            if (FCT (runp->str, string, rs,  no_leading_period,
-                     flags & FNM_FILE_NAME ? flags : flags & ~FNM_PERIOD,
-                     NULL, alloca_used) == 0)
+          for (runp_i = pattern_i;
+               runp_i != PASTE (PATTERN_PREFIX, _size) (&list);
+               runp_i++)
+            {
+              if (FCT (*PASTE (PATTERN_PREFIX, _at) (&list, runp_i), string, rs,
+                       no_leading_period,
+                       flags & FNM_FILE_NAME ? flags : flags & ~FNM_PERIOD,
+                       NULL) == 0)
               break;
+            }
 
           /* If none of the patterns matched see whether the rest does.  */
-          if (runp == NULL
+          if (runp_i == PASTE (PATTERN_PREFIX, _size) (&list)
               && (FCT (p, rs, string_end,
                        rs == string
                        ? no_leading_period
                        : rs[-1] == '/' && NO_LEADING_PERIOD (flags),
                        flags & FNM_FILE_NAME ? flags : flags & ~FNM_PERIOD,
-                       NULL, alloca_used) == 0))
+                       NULL) == 0))
             /* This is successful.  */
             goto success;
         }
@@ -1180,18 +1164,14 @@  EXT (INT opt, const CHAR *pattern, const CHAR *string, const CHAR *string_end,
 
  success:
  out:
-  if (any_malloced)
-    while (list != NULL)
-      {
-        struct patternlist *old = list;
-        list = list->next;
-        if (old->malloced)
-          free (old);
-      }
+  PASTE (PATTERN_PREFIX, _free) (&list);
 
   return retval;
 }
 
+#undef PATTERN_PREFIX
+#undef PASTE
+#undef PASTE1
 
 #undef FOLD
 #undef CHAR