Patchwork [07/17] malloc: Add specialized dynarray for C strings

login
register
mail settings
Submitter Adhemerval Zanella Netto
Date June 13, 2017, 2:24 p.m.
Message ID <5881a348-d176-0a53-f970-361d200f5602@linaro.org>
Download mbox | patch
Permalink /patch/20989/
State New
Headers show

Comments

Adhemerval Zanella Netto - June 13, 2017, 2:24 p.m.
On 13/06/2017 09:17, Adhemerval Zanella wrote:
> 
> 
> On 13/06/2017 06:46, Florian Weimer wrote:
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>>
>>> The provided function are not extensive and meant mainly to be use in
>>> subsequent glob implementation cleanup.  For internal object
>>> consistency only the function provided by char_array.c should be used,
>>> including internal object manipulation.
>>
>> It's surprising that glob needs these many functions.
>>
>> I'm not sure if the automatic NUL termination makes sense if we ever
>> want to generalize this interface.  How ingrained is that to the glob
>> use case?
> 
> glob internally prepends the search directory by replacing the tilde with
> a external home directory (GLOB_TILDE, either by environment variable or
> through getXXX functions) or append new data by calling new patterns
> recursively (GLOB_BRACE).
> 
> It is not an specific usage to glob, but rather a generic one which with 
> more higher level languages all the memory buffer bikeshedding will be 
> avoided by a string object. The idea is to keep a consistent C string buffer
> stack-allocated (for general use) or heap based (to not bound due the
> internal expansion) which can be used as constant argument for other
> function call (fnmatch or recursively in case of GLOB_BRACE). 
> 
>>
>>> +++ b/malloc/char_array.c
>>
>> Should this be malloc/char_array-skeleton.c, to indicate that this file
>> is parameterized and intended for inclusion into another .c file?
> 
> I am not sure, the idea of char_array is to be self-contained, there is
> no need to define anything in the file to include it.  Also, if we see
> other usage of dynamic C string inside glibc it would be a good idea to
> add internal symbols for common symbols.
> 
>>
>>> +#define DYNARRAY_ELEMENT_INIT(__e) (*__e = '\0')
>>
>> Why is this needed?  I don't think the code cares for the contents of
>> freshly added bytes.  For the trailing NUL byte, it seems better to add
>> that separately from the initialization operations.
> 
> It does not, it is an artefact I added and it should be removed.
> 
>>
>>> +static bool __attribute_used__
>>> +char_array_replace_str_pos (struct char_array *array, size_t pos,
>>> +                            const char *str, size_t len)
>>> +{
>>> +  if (pos > array->dynarray_header.used)
>>> +    return false;
>>> +
>>> +  size_t newsize;
>>> +  if (check_add_wrapv_size_t (pos, len, &newsize)
>>> +      || check_add_wrapv_size_t (newsize, 1, &newsize)
>>> +      || !char_array_resize (array, newsize))
>>> +    return false;
>>
>> I don't think it is a good idea to mix the reporting of usage errors
>> (index out of bounds) with environmental conditions (memory allocation
>> failure).  Have you considered calling __libc_fatal if pos is out of
>> range, similar to the existing *_at function?
> 
> Indeed, I will change to call __libc_fatal in such cases.
> 
>>
>>> +++ b/malloc/malloc-internal.h
>>
>>> +/* Set *R = A + B.  Return true if the answer is mathematically incorrect due
>>> +   to overflow; in this case, *R is the low order bits of the correct
>>> +   answer.  */
>>> +static inline bool
>>> +check_add_wrapv_size_t (size_t a, size_t b, size_t *r)
>>
>> Why not use check_add_overflow_size_t, for consistency with
>> check_mul_overflow_size_t?
>>
> 
> I will change it.
> 

Here is an updated patch for the specialized dynarray.  I have incorporated all
your suggestion but the skeleton name change (which I am not sure if it should
follow the idea since it should be a complete 'api').  I also added some nonull
attribute as for dynarray implementation.

----

Patch

diff --git a/malloc/Makefile b/malloc/Makefile
index 14c13f1..323b3fb 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -46,6 +46,7 @@  tests-internal += \
 	 tst-dynarray \
 	 tst-dynarray-fail \
 	 tst-dynarray-at-fail \
+	 tst-char_array
 
 ifneq (no,$(have-tunables))
 tests += tst-malloc-usable-tunables
@@ -58,7 +59,7 @@  test-srcs = tst-mtrace
 routines = malloc morecore mcheck mtrace obstack reallocarray \
   scratch_buffer_grow scratch_buffer_grow_preserve \
   scratch_buffer_set_array_size \
-  dynarray_at_failure \
+  dynarray_at_failure dynarray_overflow_failure \
   dynarray_emplace_enlarge \
   dynarray_finalize \
   dynarray_resize \
diff --git a/malloc/char_array.c b/malloc/char_array.c
new file mode 100644
index 0000000..4d73203
--- /dev/null
+++ b/malloc/char_array.c
@@ -0,0 +1,281 @@ 
+/* Specialized dynarray for C strings.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file provides a dynamic C string with an initial stack allocated
+   buffer.  Since it is based on dynarray, it provided dynamic size
+   expansion and heap usage for large strings.
+
+   The following parameters are optional:
+
+   CHAR_ARRAY_INITIAL_SIZE
+      The size of the statically allocated array (default is 256).  It will
+      be used to define DYNARRAY_INITIAL_SIZE.
+
+   The following functions are provided:
+
+   bool char_array_init_empty (struct char_array *);
+   bool char_array_init_str (struct char_array *, const char *);
+   bool char_array_init_str_size (struct char_array *, const char *, size_t);
+   bool char_array_is_empty (struct char_array *);
+   const char *char_array_str (struct char_array *);
+   char char_array_pos (struct char_array *, size_t);
+   size_t char_array_length (struct char_array *);
+   bool char_array_set_str (struct char_array *, const char *);
+   bool char_array_set_str_size (struct char_array *, const char *, size_t);
+   void char_array_erase (struct char_array *, size_t);
+   bool char_array_resize_str (struct char_array *, size_t);
+   bool char_array_prepend_str (struct char_array *, const char *);
+   bool char_array_append_str (struct char_array *, const char *);
+   bool char_array_replace_str_pos (struct char_array *, size_t, const char *,
+				    size_t);
+
+   For instance:
+
+   struct char_array str;
+   // str == "testing";
+   char_array_init_str (&str, "testing");
+   // c == 's'
+   char c = char_array_pos (&str, 2);
+   // str = "testing2";
+   char_array_set_str (&str, "testing2");
+   // str = "testi";
+   char_array_erase (&str, 5);
+   // str = "123testi";
+   char_array_prepend_str (&str, "123");
+   // len = 8;
+   size_t len = char_array_length (&str);
+   // str = "123testi456";
+   char_array_append_str (&str, "456");
+   // str = "123testi789";
+   char_array_replace_str_pos (&str, 7, "789", 3);
+ */
+
+#define DYNARRAY_STRUCT            char_array
+#define DYNARRAY_ELEMENT           char
+#define DYNARRAY_PREFIX            char_array_
+#ifndef CHAR_ARRAY_INITIAL_SIZE
+# define CHAR_ARRAY_INITIAL_SIZE 256
+#endif
+#define DYNARRAY_INITIAL_SIZE  CHAR_ARRAY_INITIAL_SIZE
+#include <malloc/dynarray-skeleton.c>
+#include <malloc/malloc-internal.h>
+
+/* Return a const char for the internal C string handled by 'array'.  */
+__attribute__ ((nonnull (1)))
+static const char *
+char_array_str (struct char_array *array)
+{
+  return char_array_at (array, 0);
+}
+
+/* Return the character at position 'pos' from the char_array 'array'.  */
+__attribute__ ((nonnull (1)))
+static char
+char_array_pos (struct char_array *array, size_t pos)
+{
+  return *char_array_at (array, pos);
+}
+
+/* Calculate the length of the string, excluding the terminating null.  */
+__attribute__ ((unused, nonnull (1)))
+static size_t
+char_array_length (struct char_array *array)
+{
+  /* Exclude the final '\0'.  */
+  return array->dynarray_header.used - 1;
+}
+
+/* Copy the contents of string 'str' to char_array 'array', including the
+   final '\0'.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_set_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str) + 1;
+  if (!char_array_resize (array, size))
+    return false;
+  memcpy (array->dynarray_header.array, str, size);
+  array->dynarray_header.used = size;
+  return true;
+}
+
+/* Copy up 'size' bytes from string 'str' to char_array 'array'.  A final
+   '\0' is appended in the char_array.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_set_str_size (struct char_array *array, const char *str,
+			 size_t size)
+{
+  size_t newsize;
+  if (check_add_wrapv_size_t (size, 1, &newsize))
+    __libc_dynarray_overflow_failure (size, 1);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  *((char *) mempcpy (array->dynarray_header.array, str, size)) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Initialize the char_array 'array' and sets it to an empty string ("").  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_init_empty (struct char_array *array)
+{
+  char_array_init (array);
+  return char_array_set_str (array, "");
+}
+
+/* Initialize the char_array 'array' and copy the content of string 'str'.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_init_str (struct char_array *array, const char *str)
+{
+  char_array_init (array);
+  return char_array_set_str (array, str);
+}
+
+/* Initialize the char_array 'array' and copy the content of string 'str'
+   up to 'size' characteres.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_init_str_size (struct char_array *array, const char *str,
+			  size_t size)
+{
+  char_array_init (array);
+  return char_array_set_str_size (array, str, size);
+}
+
+/* Return if the char_array contain any characteres.  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_is_empty (struct char_array *array)
+{
+  return *char_array_at (array, 0) == '\0';
+}
+
+/* Remove the byte at position 'pos' from char_array 'array'.  The contents
+   are moved internally if the position is not at the end of the internal
+   buffer.  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_erase (struct char_array *array, size_t pos)
+{
+  if (pos >= array->dynarray_header.used - 1)
+    return false;
+
+  char *ppos = char_array_at (array, pos);
+  char *lpos = array->dynarray_header.array + array->dynarray_header.used;
+  ptrdiff_t size = lpos - ppos;
+  memmove (ppos, ppos + 1, size);
+  array->dynarray_header.used--;
+  return true;
+}
+
+/* Resize the char_array 'array' to size 'count' maintaining the ending
+   '\0' byte.  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_crop (struct char_array *array, size_t size)
+{
+  if (size >= (array->dynarray_header.used - 1)
+      || !char_array_resize (array, size + 1))
+    return false;
+
+  array->dynarray_header.array[array->dynarray_header.used] = '\0';
+  return true;
+}
+
+/* Prepend the contents of string 'str' to char_array 'array', including the
+   final '\0' byte.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_prepend_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str);
+  /* Resizing the array might change its used elements and we need below
+     to correct copy the elements.  */
+  size_t used = array->dynarray_header.used;
+
+  size_t newsize;
+  if (check_add_wrapv_size_t (used, size, &newsize))
+    __libc_dynarray_overflow_failure (used, size);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  /* Make room for the string and copy it.  */
+  memmove (array->dynarray_header.array + size, array->dynarray_header.array,
+           used);
+  memcpy (array->dynarray_header.array, str, size);
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Append the contents of string 'str' to char_array 'array, including the
+   final '\0' byte.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_append_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str);
+  /* Resizing the array might change its used elements and it used it below
+     to correct copy the elements.  */
+  size_t used = array->dynarray_header.used - 1;
+
+  /* 'used' does account for final '\0', so there is no need to add
+     an extra element to calculate the final required size.  */
+  size_t newsize;
+  if (check_add_wrapv_size_t (used + 1, size, &newsize))
+    __libc_dynarray_overflow_failure (used + 1, size);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  /* Start to append at '\0' up to string length and add a final '\0'.  */
+  *(char*) mempcpy (array->dynarray_header.array + used, str, size) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Replace the contents starting of position 'pos' of char_array 'array'
+   with the contents of string 'str' up to 'len' bytes.  A final '\0'
+   is appended in the string.  */
+__attribute__ ((nonnull (1, 3)))
+static bool
+char_array_replace_str_pos (struct char_array *array, size_t pos,
+                            const char *str, size_t len)
+{
+  if (pos > array->dynarray_header.used)
+    return false;
+
+  size_t newsize;
+  if (check_add_wrapv_size_t (pos, len, &newsize))
+    __libc_dynarray_overflow_failure (pos, len);
+  if (check_add_wrapv_size_t (newsize, 1, &newsize))
+    __libc_dynarray_overflow_failure (newsize, 1);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  char *start = char_array_at (array, pos);
+  *(char *) mempcpy (start, str, len) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
diff --git a/malloc/dynarray.h b/malloc/dynarray.h
index c73e08b..5a491b4 100644
--- a/malloc/dynarray.h
+++ b/malloc/dynarray.h
@@ -173,4 +173,12 @@  void __libc_dynarray_at_failure (size_t size, size_t index)
   __attribute__ ((noreturn));
 libc_hidden_proto (__libc_dynarray_at_failure)
 
+/* Internal function.  TErminate the process after an overflow in
+   new size allocation.  SIZE is the current number of elements in
+   dynamic array and INCR is the new elements to add on current
+   size.  */
+void __libc_dynarray_overflow_failure (size_t size, size_t incr)
+  __attribute__ ((noreturn));
+libc_hidden_proto (__libc_dynarray_overflow_failure)
+
 #endif /* _DYNARRAY_H */
diff --git a/malloc/dynarray_overflow_failure.c b/malloc/dynarray_overflow_failure.c
new file mode 100644
index 0000000..14936b0
--- /dev/null
+++ b/malloc/dynarray_overflow_failure.c
@@ -0,0 +1,31 @@ 
+/* Report an dynamic array size overflow condition.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dynarray.h>
+#include <stdio.h>
+
+void
+__libc_dynarray_overflow_failure (size_t size, size_t incr)
+{
+  char buf[200];
+  __snprintf (buf, sizeof (buf), "Fatal glibc error: "
+              "new size overflows (old %zu and increment %zu)\n",
+              size, incr);
+ __libc_fatal (buf);
+}
+libc_hidden_def (__libc_dynarray_overflow_failure)
diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
index dbd801a..3066cd3 100644
--- a/malloc/malloc-internal.h
+++ b/malloc/malloc-internal.h
@@ -101,4 +101,18 @@  check_mul_overflow_size_t (size_t left, size_t right, size_t *result)
 #endif
 }
 
+/* Set *R = A + B.  Return true if the answer is mathematically incorrect due
+   to overflow; in this case, *R is the low order bits of the correct
+   answer.  */
+static inline bool
+check_add_wrapv_size_t (size_t a, size_t b, size_t *r)
+{
+#if 5 <= __GNUC__
+  return __builtin_add_overflow (a, b, r);
+#else
+  *r = a + b;
+  return *r < a;
+#endif
+}
+
 #endif /* _MALLOC_INTERNAL_H */
diff --git a/malloc/tst-char_array.c b/malloc/tst-char_array.c
new file mode 100644
index 0000000..53f9482
--- /dev/null
+++ b/malloc/tst-char_array.c
@@ -0,0 +1,107 @@ 
+/* Test for char_array.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+#include <malloc/char_array.c>
+
+#include <malloc.h>
+#include <mcheck.h>
+#include <stdint.h>
+#include <support/check.h>
+#include <support/support.h>
+
+static int
+do_test (void)
+{
+  mtrace ();
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_empty (&str) == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == 0);
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == true);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing"));
+    TEST_VERIFY_EXIT (char_array_pos (&str, 2) == 's');
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == false);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testing") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str_size (&str, "testing", 4));
+    TEST_VERIFY_EXIT (char_array_length (&str) == 4);
+    TEST_VERIFY_EXIT (char_array_pos (&str, 2) == 's');
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == false);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "test") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_set_str (&str, "abcdef"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "abcdef") == 0);
+    TEST_VERIFY_EXIT (char_array_set_str_size (&str, "abcdef", 4));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "abcd") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_erase (&str, 4) == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 1);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testng") == 0);
+    TEST_VERIFY_EXIT (char_array_erase (&str, char_array_length (&str))
+		      == false);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 1);
+    TEST_VERIFY_EXIT (char_array_erase (&str, char_array_length (&str) - 1)
+		      == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 2);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testn") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "test"));
+    TEST_VERIFY_EXIT (char_array_prepend_str (&str, "123"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test"));
+    TEST_VERIFY_EXIT (char_array_append_str (&str, "456"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test456") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test456"));
+    TEST_VERIFY_EXIT (char_array_replace_str_pos (&str, 7, "789", 3));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test789") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test789"));
+    char_array_free (&str);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>