Patchwork benchtests: New strlen-struct benchmark

login
register
mail settings
Submitter Siddhesh Poyarekar
Date Aug. 7, 2018, 7:50 p.m.
Message ID <20180807195022.29149-1-siddhesh@sourceware.org>
Download mbox | patch
Permalink /patch/28776/
State New
Headers show

Comments

Siddhesh Poyarekar - Aug. 7, 2018, 7:50 p.m.
A typical strlen use is on string attributes in a list of structures
(sort of like a tuple) and is also often the first operation on that
string.  The current string benchmark has a static list of lengths and
alignments and runs repeatedly on the same input, thus failing to
emulate this scenario.

This benchmark adds this by allocating a linked list of strings and
then walking through it, calling strlen on each of the elements.  I've
done the following things to take care of consistency of numbers:

- Called mallopt to ensure that the malloc threshold is static and the
  two larger sizes (16k and 32k) are allocated using mmap.

- Trimmed the heap on every iteration.  This should not have a direct
  impact but I'm being cautious.

- strlen is called exactly once on every string

Performance numbers were verified to be stable on an idle aarch64
(falkor) system.

	* benchtests/bench-strlen-struct.c: New benchmark.
	* benchtests/Makefile (string-benchset): Add it.
	* benchtests/bench-string.h: Include malloc.h.
	[LINKED_STRINGS](struct linked_strings): New struct.
	[LINKED_STRINGS](alloc_strings, alloc_elem, free_strings): New
	functions.

CC: Wilco.Dijkstra@arm.com
CC: carlos@redhat.com
---
 benchtests/Makefile              |   2 +-
 benchtests/bench-string.h        |  64 +++++++++++++
 benchtests/bench-strlen-struct.c | 152 +++++++++++++++++++++++++++++++
 3 files changed, 217 insertions(+), 1 deletion(-)
 create mode 100644 benchtests/bench-strlen-struct.c
Wilco Dijkstra - Aug. 8, 2018, 3:43 p.m.
Hi Siddhesh,

So my first question is, what are you trying to benchmark here?
The strings are all identically sized so it won't stress strlen. The
layout of the structs and strings is such that it ends up as one linear 
sweep of memory (which any prefetcher should be able to spot).
Would better results on this test mean real code gets faster too?

If you want to benchmark strlen in a more realistic way then it would
make sense to do something like the random memcpy (create an
array of strings based on a measured distribution and ensure the
strings are not laid out linearly in memory).

If you have a specific workload you want to mimic then it may be
better to keep it closer to the original code. Traversing a linked list of
10000 entries with 32KB strings seems quite unusual...

Cheers,
Wilco
Siddhesh Poyarekar - Aug. 8, 2018, 4:03 p.m.
On 08/08/2018 09:13 PM, Wilco Dijkstra wrote:
> So my first question is, what are you trying to benchmark here?
> The strings are all identically sized so it won't stress strlen. The
> layout of the structs and strings is such that it ends up as one linear
> sweep of memory (which any prefetcher should be able to spot).
> Would better results on this test mean real code gets faster too?

Yeah, I added a larger padding in the struct to make the access stride a 
bit uneven, but I suspect some prefetchers may still be able to do 
something useful with it.  I played around with the string and list 
lengths to stabilize the benchmark numbers further, but I suppose stable 
numbers don't necessarily mean useful numbers.

> If you want to benchmark strlen in a more realistic way then it would
> make sense to do something like the random memcpy (create an
> array of strings based on a measured distribution and ensure the
> strings are not laid out linearly in memory).
> 
> If you have a specific workload you want to mimic then it may be
> better to keep it closer to the original code. Traversing a linked list of
> 10000 entries with 32KB strings seems quite unusual...

TBH I couldn't find any real workloads that use strlen that I could 
emulate, which is why I did this weird attempt at emulating list 
traversal.  I was trying to stay away from inventing my own size 
distribution because it becomes just another arbitrary data point that I 
won't even understand fully and it's also harder to stabilize the 
results; memcpy-random has that problem for example.

Do you have any specific ideas based on your experience with workloads 
that you've seen in the past?

Thanks,
Siddhesh

Patch

diff --git a/benchtests/Makefile b/benchtests/Makefile
index bcd6a9c26d..318365fcad 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -43,7 +43,7 @@  string-benchset := bcopy bzero memccpy memchr memcmp memcpy memmem memmove \
 		   strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
 		   strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok \
 		   strcoll memcpy-large memcpy-random memmove-large memset-large \
-		   memcpy-walk memset-walk memmove-walk
+		   memcpy-walk memset-walk memmove-walk strlen-struct
 
 # Build and run locale-dependent benchmarks only if we're building natively.
 ifeq (no,$(cross-compiling))
diff --git a/benchtests/bench-string.h b/benchtests/bench-string.h
index f8389982e9..c4c780f40c 100644
--- a/benchtests/bench-string.h
+++ b/benchtests/bench-string.h
@@ -18,6 +18,7 @@ 
 
 #include <getopt.h>
 #include <sys/cdefs.h>
+#include <malloc.h>
 
 /* We are compiled under _ISOMAC, so libc-symbols.h does not do this
    for us.  */
@@ -252,4 +253,67 @@  test_init (void)
     }
 }
 
+
+/* String function testing on strings inside linked structures.  */
+# ifdef LINKED_STRINGS
+
+struct linked_strings
+{
+  char *str;
+  int dummy[16];
+  struct linked_strings *next;
+};
+
+
+static struct linked_strings *
+alloc_elem (size_t len)
+{
+  struct linked_strings *e = malloc (sizeof (struct linked_strings));
+
+  if (e == NULL)
+    error (EXIT_FAILURE, errno, "struct alloc: malloc failed");
+
+  e->str = malloc (len + 1);
+  if (e->str == NULL)
+    error (EXIT_FAILURE, errno, "struct alloc: malloc failed");
+
+  memset (e->str, 'a', len);
+  e->str[len] = '\0';
+  e->next = NULL;
+
+  return e;
+}
+
+static struct linked_strings *
+alloc_strings (size_t len, size_t num_elem)
+{
+  /* Fix the mmap threshold so that we (1) don't end up influencing allocations
+     of later iterations and (2) we test performance on mmapped regions.  */
+  mallopt (M_MMAP_THRESHOLD, 8 * 1024);
+
+  struct linked_strings *head = alloc_elem (len);
+  struct linked_strings *list = head;
+
+  for (size_t i = 0; i < num_elem; i++)
+    {
+      list->next = alloc_elem (len);
+      list = list->next;
+    }
+  return head;
+}
+
+static void
+free_strings (struct linked_strings *head)
+{
+  while (head)
+    {
+      struct linked_strings *cur = head;
+      head = head->next;
+      free (cur->str);
+      free (cur);
+    }
+  malloc_trim (0);
+}
+
+# endif /* LINKED_STRINGS */
 #endif /* TEST_MAIN */
diff --git a/benchtests/bench-strlen-struct.c b/benchtests/bench-strlen-struct.c
new file mode 100644
index 0000000000..4d3cbb57de
--- /dev/null
+++ b/benchtests/bench-strlen-struct.c
@@ -0,0 +1,152 @@ 
+/* Measure STRLEN functions - walk through a list of elements and measure
+   string lengths.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* RATIONALE
+   ---------
+
+   This test measures the latency of measuring a string length in a typical
+   setting where a dynamically allocated string is part of a structure, which
+   in turn is allocated in a linked list.  This puts the strings in different
+   parts of the heap (and in larger string sizes, different maps) and thus
+   attempts to approximate the caching behavior of such code.  */
+
+#define TEST_MAIN
+#define LINKED_STRINGS
+#ifndef WIDE
+# define TEST_NAME "strlen"
+#else
+# define TEST_NAME "wcslen"
+#endif
+#include "bench-string.h"
+
+#ifndef WIDE
+# define STRLEN strlen
+# define CHAR char
+# define MAX_CHAR CHAR_MAX
+#else
+# include <wchar.h>
+# define STRLEN wcslen
+# define CHAR wchar_t
+# define MAX_CHAR WCHAR_MAX
+#endif
+
+#include "json-lib.h"
+
+typedef size_t (*proto_t) (const CHAR *);
+
+size_t
+simple_STRLEN (const CHAR *s)
+{
+  const CHAR *p;
+
+  for (p = s; *p; ++p);
+  return p - s;
+}
+
+#ifndef WIDE
+size_t
+builtin_strlen (const CHAR *p)
+{
+  return __builtin_strlen (p);
+}
+IMPL (builtin_strlen, 0)
+#endif
+
+IMPL (simple_STRLEN, 0)
+IMPL (STRLEN, 1)
+
+#define NUM_STRINGS 10000
+static void
+do_one_test (json_ctx_t *json_ctx, impl_t *impl, struct linked_strings *head)
+{
+  timing_t start, stop, cur;
+
+  TIMING_NOW (start);
+  while (head)
+    {
+      CALL (impl, head->str);
+      head = head->next;
+    }
+  TIMING_NOW (stop);
+
+  TIMING_DIFF (cur, start, stop);
+
+  json_element_double (json_ctx, (double) cur / (double) NUM_STRINGS);
+}
+
+static void
+do_test (json_ctx_t *json_ctx, size_t len)
+{
+  json_element_object_begin (json_ctx);
+  json_attr_uint (json_ctx, "length", len);
+  json_array_begin (json_ctx, "timings");
+
+
+  FOR_EACH_IMPL (impl, 0)
+    {
+      struct linked_strings *head = alloc_strings (len, NUM_STRINGS);
+      do_one_test (json_ctx, impl, head);
+      free_strings (head);
+    }
+
+  json_array_end (json_ctx);
+  json_element_object_end (json_ctx);
+}
+
+int
+test_main (void)
+{
+  json_ctx_t json_ctx;
+  size_t i;
+
+  test_init ();
+
+  json_init (&json_ctx, 0, stdout);
+
+  json_document_begin (&json_ctx);
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
+
+  json_attr_object_begin (&json_ctx, "functions");
+  json_attr_object_begin (&json_ctx, TEST_NAME);
+  json_attr_string (&json_ctx, "bench-variant", "random");
+
+  json_array_begin (&json_ctx, "ifuncs");
+  FOR_EACH_IMPL (impl, 0)
+    json_element_string (&json_ctx, impl->name);
+  json_array_end (&json_ctx);
+
+  json_array_begin (&json_ctx, "results");
+
+  for (i = 1; i <= 32*1024; i = i << 1)
+  {
+    do_test (&json_ctx, i);
+    do_test (&json_ctx, i + 1);
+    do_test (&json_ctx, i + 2);
+    do_test (&json_ctx, i + 3);
+  }
+
+  json_array_end (&json_ctx);
+  json_attr_object_end (&json_ctx);
+  json_attr_object_end (&json_ctx);
+  json_document_end (&json_ctx);
+
+  return ret;
+}
+
+#include <support/test-driver.c>