[v2,BZ,15884] strcoll: improve performance by removing the cache

Message ID 543D20F8.9000308@web.de
State Superseded
Headers

Commit Message

Leonhard Holz Oct. 14, 2014, 1:11 p.m. UTC
  Hello everybody,

this is a path that should solve bug 15884. It complains about the 
performance of strcoll(). It was found out that the runtime of strcoll() 
is actually bound to strlen which is needed for calculating the size of 
a cache that was installed to improve the comparison performance.

The idea for this patch was that the cache is only useful in rare cases 
(strings of same length and same first-level-chars) and that it would be 
better to avoid memory allocation at all. To prove this I wrote a 
performance test bench-strcoll.c with test data in 
benchtests-strcoll.tar.gz. Also modifications in benchtests/Makefile and 
localedata/Makefile are necessary to make it work.

After removing the cache the strcoll method showed the predicted 
behavior (getting slightly faster) in all but the test case for hindi 
word sorting. This was due the hindi text having much more equal words 
than the other ones. For equal strings the performance was worse since 
all comparison levels were run through and from the second level on the 
cache improved the comparison performance of the original version.

Therefore I added a bytewise test via strcmp iff the first level 
comparison found that both strings did match because in this case it is 
very likely that equal strings are compared. This solved the problem 
with the hindi test case and improved the performance of the others.

The output of the benchmark is:

   "strcoll": {
    "": {
     "locale": "en_US.UTF-8",
     "duration": 7.41506e+09,
     "iterations": 16,
     "mean": 4.63441e+08
    },
    "vi_VN.UTF-8": {
     "duration": 7.4185e+07,
     "iterations": 16,
     "mean": 4.63656e+06
    },
    "en_US.UTF-8": {
     "duration": 7.59759e+07,
     "iterations": 16,
     "mean": 4.7485e+06
    },
    "ar_SA.UTF-8": {
     "duration": 7.61479e+07,
     "iterations": 16,
     "mean": 4.75924e+06
    },
    "zh_CN.UTF-8": {
     "duration": 1.37535e+07,
     "iterations": 16,
     "mean": 859594
    },
    "cs_CZ.UTF-8": {
     "duration": 7.90032e+07,
     "iterations": 16,
     "mean": 4.9377e+06
    },
    "en_GB.UTF-8": {
     "duration": 7.4454e+07,
     "iterations": 16,
     "mean": 4.65338e+06
    },
    "da_DK.UTF-8": {
     "duration": 7.265e+07,
     "iterations": 16,
     "mean": 4.54062e+06
    },
    "pl_PL.UTF-8": {
     "duration": 7.28964e+07,
     "iterations": 16,
     "mean": 4.55602e+06
    },
    "fr_FR.UTF-8": {
     "duration": 7.99239e+07,
     "iterations": 16,
     "mean": 4.99524e+06
    },
    "pt_PT.UTF-8": {
     "duration": 7.78347e+07,
     "iterations": 16,
     "mean": 4.86467e+06
    },
    "el_GR.UTF-8": {
     "duration": 1.0959e+08,
     "iterations": 16,
     "mean": 6.84937e+06
    },
    "ru_RU.UTF-8": {
     "duration": 9.39324e+07,
     "iterations": 16,
     "mean": 5.87077e+06
    },
    "iw_IL.UTF-8": {
     "duration": 8.8769e+07,
     "iterations": 16,
     "mean": 5.54806e+06
    },
    "es_ES.UTF-8": {
     "duration": 8.0782e+07,
     "iterations": 16,
     "mean": 5.04888e+06
    },
    "hi_IN.UTF-8": {
     "duration": 3.66962e+09,
     "iterations": 16,
     "mean": 2.29351e+08
    },
    "sv_SE.UTF-8": {
     "duration": 7.41934e+07,
     "iterations": 16,
     "mean": 4.63709e+06
    },
    "hu_HU.UTF-8": {
     "duration": 9.04538e+07,
     "iterations": 16,
     "mean": 5.65336e+06
    },
    "tr_TR.UTF-8": {
     "duration": 7.25579e+07,
     "iterations": 16,
     "mean": 4.53487e+06
    },
    "is_IS.UTF-8": {
     "duration": 6.83783e+07,
     "iterations": 16,
     "mean": 4.27364e+06
    },
    "it_IT.UTF-8": {
     "duration": 7.50307e+07,
     "iterations": 16,
     "mean": 4.68942e+06
    },
    "sr_RS.UTF-8": {
     "duration": 8.14996e+07,
     "iterations": 16,
     "mean": 5.09373e+06
    },
    "ja_JP.UTF-8": {
     "duration": 1.5325e+07,
     "iterations": 16,
     "mean": 957814
    }
   }

That is compared to the previous version:

glibc files	-33.77%
vi_VN.UTF-8	-34.12%
en_US.UTF-8	-42.42%
ar_SA.UTF-8	-27.49%
zh_CN.UTF-8	+07.90%
cs_CZ.UTF-8	-29.67%
en_GB.UTF-8	-28.50%
da_DK.UTF-8	-36.57%
pl_PL.UTF-8	-39.31%
fr_FR.UTF-8	-28.57%
pt_PT.UTF-8	-22.82%
el_GR.UTF-8	-26.77%
ru_RU.UTF-8	-35.81%
iw_IL.UTF-8	-35.34%
es_ES.UTF-8	-34.46%
hi_IN.UTF-8	-00.38%
sv_SE.UTF-8	-36.99%
hu_HU.UTF-8	-16.35%
tr_TR.UTF-8	-27.80%
is_IS.UTF-8	-33.24%
it_IT.UTF-8	-24.39%
sr_RS.UTF-8	-37.55%
ja_JP.UTF-8	+02.84%

Best,
Leonhard


2014-10-14  Leonhard Holz  <leonhard.holz@web.de>

     [BZ #15884]
     * string/strcoll_l.c (STRCOLL): Remove weight and rules cache.
     * benchtests/bench-strcoll.c: New benchmark.
     * benchtests/Makefile (bench-string): Use it.
     * benchtests/strcoll-inputs/lorem_ipsum_ar_SA New file.
     * benchtests/strcoll-inputs/lorem_ipsum_cs_CZ Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_da_DK Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_el_GR Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_en_GB Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_en_US Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_es_ES Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_fr_FR Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_hi_IN Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_hu_HU Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_is_IS Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_it_IT Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_iw_IL Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_ja_JP Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_pl_PL Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_pt_PT Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_ru_RU Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_sr_RS Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_sv_SE Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_tr_TR Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_vi_VN Likewise.
     * benchtests/strcoll-inputs/lorem_ipsum_zh_CN Likewise.
     * localedata/Makefile (LOCALES): Generate needed locales.

diff --git a/localedata/Makefile b/localedata/Makefile
index b6235f2..cb24974 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -106,7 +106,10 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
 	   hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
 	   nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
 	   zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
-	   tr_TR.ISO-8859-9 en_GB.UTF-8
+	   tr_TR.ISO-8859-9 en_GB.UTF-8 vi_VN.UTF-8 ar_SA.UTF-8 zh_CN.UTF-8 \
+	   da_DK.UTF-8 pl_PL.UTF-8 pt_PT.UTF-8 el_GR.UTF-8 ru_RU.UTF-8 \
+	   iw_IL.UTF-8 es_ES.UTF-8 hi_IN.UTF-8 sv_SE.UTF-8 hu_HU.UTF-8 \
+	   is_IS.UTF-8 it_IT.UTF-8 sr_RS.UTF-8
 LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g')
 CHARMAPS := $(shell echo "$(LOCALES)" | \
 		    sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g)
diff --git a/string/strcoll_l.c b/string/strcoll_l.c
index d4f42a3..6677eaf
--- a/string/strcoll_l.c
+++ b/string/strcoll_l.c
@@ -22,7 +22,6 @@
 #include <locale.h>
 #include <stddef.h>
 #include <stdint.h>
-#include <stdlib.h>
 #include <string.h>
 #include <sys/param.h>
 
@@ -55,8 +54,6 @@ typedef struct
   size_t backw;			/* Current Backward sequence index.  */
   size_t backw_stop;		/* Index where the backward sequences stop.  */
   const USTRING_TYPE *us;	/* The string.  */
-  int32_t *idxarr;		/* Array to cache weight indices.  */
-  unsigned char *rulearr;	/* Array to cache rules.  */
   unsigned char rule;		/* Saved rule for the first sequence.  */
   int32_t idx;			/* Index to weight of the current sequence.  */
   int32_t save_idx;		/* Save looked up index of a forward
@@ -65,179 +62,9 @@ typedef struct
   const USTRING_TYPE *back_us;	/* Beginning of the backward sequence.  */
 } coll_seq;
 
-/* Get next sequence.  The weight indices are cached, so we don't need to
-   traverse the string.  */
-static void
-get_next_seq_cached (coll_seq *seq, int nrules, int pass,
-		     const unsigned char *rulesets,
-		     const USTRING_TYPE *weights)
-{
-  size_t val = seq->val = 0;
-  int len = seq->len;
-  size_t backw_stop = seq->backw_stop;
-  size_t backw = seq->backw;
-  size_t idxcnt = seq->idxcnt;
-  size_t idxmax = seq->idxmax;
-  size_t idxnow = seq->idxnow;
-  unsigned char *rulearr = seq->rulearr;
-  int32_t *idxarr = seq->idxarr;
-
-  while (len == 0)
-    {
-      ++val;
-      if (backw_stop != ~0ul)
-	{
-	  /* There is something pushed.  */
-	  if (backw == backw_stop)
-	    {
-	      /* The last pushed character was handled.  Continue
-		 with forward characters.  */
-	      if (idxcnt < idxmax)
-		{
-		  idxnow = idxcnt;
-		  backw_stop = ~0ul;
-		}
-	      else
-		{
-		  /* Nothing any more.  The backward sequence
-		     ended with the last sequence in the string.  */
-		  idxnow = ~0ul;
-		  break;
-		}
-	    }
-	  else
-	    idxnow = --backw;
-	}
-      else
-	{
-	  backw_stop = idxcnt;
-
-	  while (idxcnt < idxmax)
-	    {
-	      if ((rulesets[rulearr[idxcnt] * nrules + pass]
-		   & sort_backward) == 0)
-		/* No more backward characters to push.  */
-		break;
-	      ++idxcnt;
-	    }
-
-	  if (backw_stop == idxcnt)
-	    {
-	      /* No sequence at all or just one.  */
-	      if (idxcnt == idxmax)
-		/* Note that LEN is still zero.  */
-		break;
-
-	      backw_stop = ~0ul;
-	      idxnow = idxcnt++;
-	    }
-	  else
-	    /* We pushed backward sequences.  */
-	    idxnow = backw = idxcnt - 1;
-	}
-      len = weights[idxarr[idxnow]++];
-    }
-
-  /* Update the structure.  */
-  seq->val = val;
-  seq->len = len;
-  seq->backw_stop = backw_stop;
-  seq->backw = backw;
-  seq->idxcnt = idxcnt;
-  seq->idxnow = idxnow;
-}
-
 /* Get next sequence.  Traverse the string as required.  */
 static void
 get_next_seq (coll_seq *seq, int nrules, const unsigned char *rulesets,
-	      const USTRING_TYPE *weights, const int32_t *table,
-	      const USTRING_TYPE *extra, const int32_t *indirect)
-{
-  size_t val = seq->val = 0;
-  int len = seq->len;
-  size_t backw_stop = seq->backw_stop;
-  size_t backw = seq->backw;
-  size_t idxcnt = seq->idxcnt;
-  size_t idxmax = seq->idxmax;
-  size_t idxnow = seq->idxnow;
-  unsigned char *rulearr = seq->rulearr;
-  int32_t *idxarr = seq->idxarr;
-  const USTRING_TYPE *us = seq->us;
-
-  while (len == 0)
-    {
-      ++val;
-      if (backw_stop != ~0ul)
-	{
-	  /* There is something pushed.  */
-	  if (backw == backw_stop)
-	    {
-	      /* The last pushed character was handled.  Continue
-		 with forward characters.  */
-	      if (idxcnt < idxmax)
-		{
-		  idxnow = idxcnt;
-		  backw_stop = ~0ul;
-		}
-	      else
-		/* Nothing any more.  The backward sequence ended with
-		   the last sequence in the string.  Note that LEN
-		   is still zero.  */
-		break;
-	    }
-	  else
-	    idxnow = --backw;
-	}
-      else
-	{
-	  backw_stop = idxmax;
-
-	  while (*us != L('\0'))
-	    {
-	      int32_t tmp = findidx (table, indirect, extra, &us, -1);
-	      rulearr[idxmax] = tmp >> 24;
-	      idxarr[idxmax] = tmp & 0xffffff;
-	      idxcnt = idxmax++;
-
-	      if ((rulesets[rulearr[idxcnt] * nrules]
-		   & sort_backward) == 0)
-		/* No more backward characters to push.  */
-		break;
-	      ++idxcnt;
-	    }
-
-	  if (backw_stop >= idxcnt)
-	    {
-	      /* No sequence at all or just one.  */
-	      if (idxcnt == idxmax || backw_stop > idxcnt)
-		/* Note that LEN is still zero.  */
-		break;
-
-	      backw_stop = ~0ul;
-	      idxnow = idxcnt;
-	    }
-	  else
-	    /* We pushed backward sequences.  */
-	    idxnow = backw = idxcnt - 1;
-	}
-      len = weights[idxarr[idxnow]++];
-    }
-
-  /* Update the structure.  */
-  seq->val = val;
-  seq->len = len;
-  seq->backw_stop = backw_stop;
-  seq->backw = backw;
-  seq->idxcnt = idxcnt;
-  seq->idxmax = idxmax;
-  seq->idxnow = idxnow;
-  seq->us = us;
-}
-
-/* Get next sequence.  Traverse the string as required.  This function does not
-   set or use any index or rule cache.  */
-static void
-get_next_seq_nocache (coll_seq *seq, int nrules, const unsigned char *rulesets,
 		      const USTRING_TYPE *weights, const int32_t *table,
 		      const USTRING_TYPE *extra, const int32_t *indirect,
 		      int pass)
@@ -366,10 +193,9 @@ get_next_seq_nocache (coll_seq *seq, int nrules, const unsigned char *rulesets,
   seq->idx = idx;
 }
 
-/* Compare two sequences.  This version does not use the index and rules
-   cache.  */
+/* Compare two sequences.  */
 static int
-do_compare_nocache (coll_seq *seq1, coll_seq *seq2, int position,
+do_compare (coll_seq *seq1, coll_seq *seq2, int position,
 		    const USTRING_TYPE *weights)
 {
   int seq1len = seq1->len;
@@ -417,56 +243,6 @@ out:
   return result;
 }
 
-/* Compare two sequences using the index cache.  */
-static int
-do_compare (coll_seq *seq1, coll_seq *seq2, int position,
-	    const USTRING_TYPE *weights)
-{
-  int seq1len = seq1->len;
-  int seq2len = seq2->len;
-  size_t val1 = seq1->val;
-  size_t val2 = seq2->val;
-  int32_t *idx1arr = seq1->idxarr;
-  int32_t *idx2arr = seq2->idxarr;
-  int idx1now = seq1->idxnow;
-  int idx2now = seq2->idxnow;
-  int result = 0;
-
-  /* Test for position if necessary.  */
-  if (position && val1 != val2)
-    {
-      result = val1 > val2 ? 1 : -1;
-      goto out;
-    }
-
-  /* Compare the two sequences.  */
-  do
-    {
-      if (weights[idx1arr[idx1now]] != weights[idx2arr[idx2now]])
-	{
-	  /* The sequences differ.  */
-	  result = weights[idx1arr[idx1now]] - weights[idx2arr[idx2now]];
-	  goto out;
-	}
-
-      /* Increment the offsets.  */
-      ++idx1arr[idx1now];
-      ++idx2arr[idx2now];
-
-      --seq1len;
-      --seq2len;
-    }
-  while (seq1len > 0 && seq2len > 0);
-
-  if (position && seq1len != seq2len)
-    result = seq1len - seq2len;
-
-out:
-  seq1->len = seq1len;
-  seq2->len = seq2len;
-  return result;
-}
-
 int
 STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l)
 {
@@ -483,6 +259,10 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l)
   if (nrules == 0)
     return STRCMP (s1, s2);
 
+  /* Catch empty strings.  */
+  if (__glibc_unlikely (*s1 == '\0') || __glibc_unlikely (*s2 == '\0'))
+    return (*s1 != '\0') - (*s2 != '\0');
+
   rulesets = (const unsigned char *)
     current->values[_NL_ITEM_INDEX (_NL_COLLATE_RULESETS)].string;
   table = (const int32_t *)
@@ -499,65 +279,12 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l)
   assert (((uintptr_t) extra) % __alignof__ (extra[0]) == 0);
   assert (((uintptr_t) indirect) % __alignof__ (indirect[0]) == 0);
 
-  /* We need this a few times.  */
-  size_t s1len = STRLEN (s1);
-  size_t s2len = STRLEN (s2);
-
-  /* Catch empty strings.  */
-  if (__glibc_unlikely (s1len == 0) || __glibc_unlikely (s2len == 0))
-    return (s1len != 0) - (s2len != 0);
-
-  /* Perform the first pass over the string and while doing this find
-     and store the weights for each character.  Since we want this to
-     be as fast as possible we are using `alloca' to store the temporary
-     values.  But since there is no limit on the length of the string
-     we have to use `malloc' if the string is too long.  We should be
-     very conservative here.
-
-     Please note that the localedef programs makes sure that `position'
-     is not used at the first level.  */
+  int result = 0, rule = 0;
 
   coll_seq seq1, seq2;
-  bool use_malloc = false;
-  int result = 0;
-
   memset (&seq1, 0, sizeof (seq1));
   seq2 = seq1;
 
-  size_t size_max = SIZE_MAX / (sizeof (int32_t) + 1);
-
-  if (MIN (s1len, s2len) > size_max
-      || MAX (s1len, s2len) > size_max - MIN (s1len, s2len))
-    {
-      /* If the strings are long enough to cause overflow in the size request,
-         then skip the allocation and proceed with the non-cached routines.  */
-    }
-  else if (! __libc_use_alloca ((s1len + s2len) * (sizeof (int32_t) + 1)))
-    {
-      seq1.idxarr = (int32_t *) malloc ((s1len + s2len) * (sizeof (int32_t) + 1));
-
-      /* If we failed to allocate memory, we leave everything as NULL so that
-	 we use the nocache version of traversal and comparison functions.  */
-      if (seq1.idxarr != NULL)
-	{
-	  seq2.idxarr = &seq1.idxarr[s1len];
-	  seq1.rulearr = (unsigned char *) &seq2.idxarr[s2len];
-	  seq2.rulearr = &seq1.rulearr[s1len];
-	  use_malloc = true;
-	}
-    }
-  else
-    {
-      seq1.idxarr = (int32_t *) alloca (s1len * sizeof (int32_t));
-      seq2.idxarr = (int32_t *) alloca (s2len * sizeof (int32_t));
-      seq1.rulearr = (unsigned char *) alloca (s1len);
-      seq2.rulearr = (unsigned char *) alloca (s2len);
-    }
-
-  int rule = 0;
-
-  /* Cache values in the first pass and if needed, use them in subsequent
-     passes.  */
   for (int pass = 0; pass < nrules; ++pass)
     {
       seq1.idxcnt = 0;
@@ -575,64 +302,45 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l)
       seq2.us = (const USTRING_TYPE *) s2;
 
       /* We assume that if a rule has defined `position' in one section
-	 this is true for all of them.  */
+	 this is true for all of them.  Please note that the localedef programs
+	 makes sure that `position' is not used at the first level.  */
+
       int position = rulesets[rule * nrules + pass] & sort_position;
 
       while (1)
 	{
-	  if (__glibc_unlikely (seq1.idxarr == NULL))
-	    {
-	      get_next_seq_nocache (&seq1, nrules, rulesets, weights, table,
+	  get_next_seq (&seq1, nrules, rulesets, weights, table,
 				    extra, indirect, pass);
-	      get_next_seq_nocache (&seq2, nrules, rulesets, weights, table,
+	  get_next_seq (&seq2, nrules, rulesets, weights, table,
 				    extra, indirect, pass);
-	    }
-	  else if (pass == 0)
-	    {
-	      get_next_seq (&seq1, nrules, rulesets, weights, table, extra,
-			    indirect);
-	      get_next_seq (&seq2, nrules, rulesets, weights, table, extra,
-			    indirect);
-	    }
-	  else
-	    {
-	      get_next_seq_cached (&seq1, nrules, pass, rulesets, weights);
-	      get_next_seq_cached (&seq2, nrules, pass, rulesets, weights);
-	    }
-
 	  /* See whether any or both strings are empty.  */
 	  if (seq1.len == 0 || seq2.len == 0)
 	    {
 	      if (seq1.len == seq2.len)
-		/* Both ended.  So far so good, both strings are equal
-		   at this level.  */
-		break;
+		{
+		  /* Both strings ended and are equal at this level.  Do a
+		     byte-level comparison to ensure that we don't waste time
+		     going through multiple passes for totally equal strings
+		     before proceeding to subsequent passes.  */
+		  if (pass == 0 && STRCMP (s1, s2) == 0)
+		    return result;
+		  else
+		    break;
+	        }
 
 	      /* This means one string is shorter than the other.  Find out
 		 which one and return an appropriate value.  */
-	      result = seq1.len == 0 ? -1 : 1;
-	      goto free_and_return;
+	      return seq1.len == 0 ? -1 : 1;
 	    }
 
-	  if (__glibc_unlikely (seq1.idxarr == NULL))
-	    result = do_compare_nocache (&seq1, &seq2, position, weights);
-	  else
-	    result = do_compare (&seq1, &seq2, position, weights);
+	  result = do_compare (&seq1, &seq2, position, weights);
 	  if (result != 0)
-	    goto free_and_return;
+	    return result;
 	}
 
-      if (__glibc_likely (seq1.rulearr != NULL))
-	rule = seq1.rulearr[0];
-      else
-	rule = seq1.rule;
+      rule = seq1.rule;
     }
 
-  /* Free the memory if needed.  */
- free_and_return:
-  if (use_malloc)
-    free (seq1.idxarr);
-
   return result;
 }
 libc_hidden_def (STRCOLL)
/* Measure strcoll implementation.
   Copyright (C) 2014 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#define TEST_MAIN
#define TEST_NAME "strcoll"

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <locale.h>
#include <unistd.h>
#include <dirent.h>
#include "bench-timing.h"
#include "json-lib.h"

/* many thanks to http://generator.lorem-ipsum.info/ */
#define CHARSET "UTF-8"
#define INPUT_FOLDER "strcoll-inputs"
#define INPUT_PREFIX "lorem_ipsum_"

const char *locales[] = {
  "vi_VN",
  "en_US",
  "ar_SA",
  "en_US",
  "zh_CN",
  "cs_CZ",
  "en_GB",
  "da_DK",
  "pl_PL",
  "fr_FR",
  "pt_PT",
  "el_GR",
  "ru_RU",
  "iw_IL",
  "es_ES",
  "hi_IN",
  "sv_SE",
  "hu_HU",
  "tr_TR",
  "is_IS",
  "it_IT",
  "sr_RS",
  "ja_JP"
};

#define TEXTFILE_DELIMITER " \n\r\t.,?!"
#define FILENAMES_DELIMITER "\n\r"

int
is_dir (const char *filename)
{
  int is_dir = 0;
  struct stat stats;

  if (stat (filename, &stats) == 0)
    is_dir = stats.st_mode & S_IFDIR;

  return is_dir;
}

char *
concat_path (const char *dirname, const char *filename)
{
  size_t d_len = strlen (dirname);
  size_t f_len = strlen (filename);
  char * path = malloc (d_len + f_len + 2);

  memcpy (path, dirname, d_len);
  * (path + d_len) = '/';
  memcpy (path + d_len + 1, filename, f_len);
  * (path + d_len + f_len + 1) = '\0';

  return path;
}

char *
read_file (const char *filename)
{
  struct stat stats;
  char *buffer = NULL;
  int fd = open (filename, O_RDONLY);

  if (fd >= 0)
    {
      if (fstat (fd, &stats) == 0)
	{
	  buffer = malloc (stats.st_size + 1);
	  if (buffer)
	    {
	      read (fd, buffer, stats.st_size);
	      *(buffer + stats.st_size) = '\0';
	    }
	}
      close (fd);
    }

  return buffer;
}

size_t
count_words (const char *text, const char *delim)
{
  size_t wordcount = 0;
  char *tmp = strdup (text);

  char *token = strtok (tmp, delim);
  while (token != NULL)
    {
      if (*token != '\0')
	wordcount++;
      token = strtok (NULL, delim);
    }

  free (tmp);
  return wordcount;
}

typedef struct
{
  size_t size;
  char **words;
} word_list;

word_list *
new_word_list (size_t size)
{
  word_list *list = malloc (sizeof (word_list));
  list->size = size;
  list->words = malloc (size * sizeof (char *));
  return list;
}

word_list *
merge_word_list (word_list * dst, word_list * src)
{
  size_t i, n = dst->size;
  dst->size += src->size;
  dst->words = realloc (dst->words, dst->size * sizeof (char *));

  for (i = 0; i < src->size; i++)
    dst->words[i + n] = src->words[i];

  free (src->words);
  free (src);

  return dst;
}

word_list *
file_word_list (const char *dirname)
{
  DIR *dir;
  struct dirent *ent;
  word_list *list = NULL;

  if ((dir = opendir (dirname)) != NULL)
    {
      size_t ent_cnt = 0, i = 0;
      word_list *sublist = new_word_list (0);

      while ((ent = readdir (dir)) != NULL)
	if (strcmp (".", ent->d_name) != 0 && strcmp ("..", ent->d_name) != 0)
	  {
	    char *subdirname = concat_path (dirname, ent->d_name);
	    if (is_dir (subdirname))
		merge_word_list (sublist, file_word_list (subdirname));
	    free (subdirname);

	    ent_cnt++;
	  }

      rewinddir (dir);
      list = new_word_list (ent_cnt);
      while ((ent = readdir (dir)) != NULL)
	if (strcmp (".", ent->d_name) != 0 && strcmp ("..", ent->d_name) != 0)
	  {
	    list->words[i] = strdup (ent->d_name);
	    i++;
	  }

      merge_word_list (list, sublist);
      closedir (dir);
    }

  return list;
}

word_list *
str_word_list (const char *str, const char *delim)
{
  size_t n = 0;
  word_list *list = new_word_list (count_words (str, delim));

  char *toks = strdup (str);
  char *word = strtok (toks, delim);
  while (word != NULL && n < list->size)
    {
      if (*word != '\0')
	list->words[n++] = strdup (word);
      word = strtok (NULL, delim);
    }

  free (toks);
  return list;
}

word_list *
copy_word_list (const word_list *list)
{
  size_t i;
  word_list *copy = new_word_list (list->size);

  for (i = 0; i < list->size; i++)
    copy->words[i] = strdup (list->words[i]);

  return copy;
}

void
free_word_list (word_list *list)
{
  size_t i;
  for (i = 0; i < list->size; i++)
    free (list->words[i]);

  free (list->words);
  free (list);
}

int
compare_words (const void *a, const void *b)
{
  const char *s1 = *(char **) a;
  const char *s2 = *(char **) b;
  return strcoll (s1, s2);
}

#undef INNER_LOOP_ITERS
#define INNER_LOOP_ITERS 16

void
bench_list (json_ctx_t *json_ctx, word_list *list)
{
  size_t i;
  timing_t start, stop, cur;

  word_list **tests = malloc (INNER_LOOP_ITERS * sizeof (word_list *));
  for (i = 0; i < INNER_LOOP_ITERS; i++)
    tests[i] = copy_word_list (list);

  TIMING_NOW (start);
  for (i = 0; i < INNER_LOOP_ITERS; i++)
    qsort (tests[i]->words, tests[i]->size, sizeof (char *), compare_words);
  TIMING_NOW (stop);

  TIMING_DIFF (cur, start, stop);
  setlocale (LC_ALL, "en_US.UTF-8");
  json_attr_double (json_ctx, "duration", cur);
  json_attr_double (json_ctx, "iterations", i);
  json_attr_double (json_ctx, "mean", cur / i);

  for (i = 0; i < INNER_LOOP_ITERS; i++)
    free_word_list (tests[i]);
  free (tests);
}

#define OK 0
#define ERROR_LOCALE 1
#define ERROR_IO 2

int
bench_file (json_ctx_t *json_ctx, const char *filename, const char *delim,
  const char *locale)
{
  if (setlocale (LC_ALL, locale) == NULL)
    return ERROR_LOCALE;

  char *text = read_file (filename);
  if (text == NULL)
    return ERROR_IO;

  word_list *list = str_word_list (text, delim);

  json_attr_object_begin (json_ctx, locale);
  bench_list (json_ctx, list);
  json_attr_object_end (json_ctx);

  free_word_list (list);
  free (text);
  return OK;
}

int
bench_file_list (json_ctx_t *json_ctx, const char *dirname, const char *locale)
{
  if (setlocale (LC_ALL, locale) == NULL)
    return ERROR_LOCALE;

  word_list *list = file_word_list (dirname);
  if (list == NULL)
    return ERROR_IO;

  json_attr_object_begin (json_ctx, "");
  json_attr_string (json_ctx, "locale", locale);
  bench_list (json_ctx, list);
  json_attr_object_end (json_ctx);

  free_word_list (list);
  return OK;
}

int
do_bench (void)
{
  timing_t res __attribute__ ((unused));
  TIMING_INIT (res);

  json_ctx_t *json_ctx = malloc (sizeof (json_ctx_t));
  json_init (json_ctx, 2, stdout);
  json_attr_object_begin (json_ctx, "strcoll");

  int result = bench_file_list (json_ctx, "..", "en_US.UTF-8");
  if (result == ERROR_LOCALE)
    {
      printf ("Failed to set locale en_US.UTF-8, aborting!\n");
      return result;
    }
  if (result == ERROR_IO)
    {
      printf ("Failed to read libc directory, aborting!\n");
      return result;
    }

  size_t i;
  char filename[40], locale[15];
  for (i = 0; i < (sizeof (locales) / sizeof (locales[0])); i++)
    {
      sprintf (locale, "%s." CHARSET, locales[i]);
      sprintf (filename, INPUT_FOLDER "/" INPUT_PREFIX "%s", locales[i]);
      result = bench_file (json_ctx, filename, TEXTFILE_DELIMITER, locale);
      if (result == ERROR_LOCALE)
	{
	  printf ("Failed to set locale %s, aborting!\n", locale);
	  return result;
	}
      if (result == ERROR_IO)
	{
	  printf ("Failed to read file %s, aborting!\n", filename);
	  return result;
	}
    }

  json_attr_object_end (json_ctx);
  free (json_ctx);
  return OK;
}

#define TEST_FUNCTION do_bench ()

/* On slower platforms this test needs more than the default 2 seconds.  */
#define TIMEOUT 10

#include "../test-skeleton.c"
  

Comments

Siddhesh Poyarekar Oct. 17, 2014, 10:02 a.m. UTC | #1
On Tue, Oct 14, 2014 at 03:11:20PM +0200, Leonhard Holz wrote:
> The output of the benchmark is:

Thanks, the outputs look good to me.  I'm going to push your strcoll
changes, but the benchmark changes need more work.  I hope you'll
continue to work on them :)

> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index fd3036d..e79ceee 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -34,7 +34,7 @@ string-bench := bcopy bzero memccpy memchr memcmp memcpy memmem memmove \
>  		mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \
>  		strcat strchr strchrnul strcmp strcpy strcspn strlen \
>  		strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
> -		strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok
> +		strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok strcoll
>  string-bench-all := $(string-bench)
>  
>  stdlib-bench := strtod

You need to ensure that the locales are generated first.  The tests
target does this already.  Also my fault that I didn't point out
earlier that you'll need to set GCONV_PATH and LOCPATH to actually
make the test use the generated locales.  Otherwise they'll just use
the system locales.

> diff --git a/localedata/Makefile b/localedata/Makefile
> index b6235f2..cb24974 100644
> --- a/localedata/Makefile
> +++ b/localedata/Makefile
> @@ -106,7 +106,10 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
>  	   hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
>  	   nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
>  	   zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
> -	   tr_TR.ISO-8859-9 en_GB.UTF-8
> +	   tr_TR.ISO-8859-9 en_GB.UTF-8 vi_VN.UTF-8 ar_SA.UTF-8 zh_CN.UTF-8 \
> +	   da_DK.UTF-8 pl_PL.UTF-8 pt_PT.UTF-8 el_GR.UTF-8 ru_RU.UTF-8 \
> +	   iw_IL.UTF-8 es_ES.UTF-8 hi_IN.UTF-8 sv_SE.UTF-8 hu_HU.UTF-8 \
> +	   is_IS.UTF-8 it_IT.UTF-8 sr_RS.UTF-8
>  LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g')
>  CHARMAPS := $(shell echo "$(LOCALES)" | \
>  		    sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g)

> /* Measure strcoll implementation.
>    Copyright (C) 2014 Free Software Foundation, Inc.
>    This file is part of the GNU C Library.
> 
>    The GNU C Library is free software; you can redistribute it and/or
>    modify it under the terms of the GNU Lesser General Public
>    License as published by the Free Software Foundation; either
>    version 2.1 of the License, or (at your option) any later version.
> 
>    The GNU C Library is distributed in the hope that it will be useful,
>    but WITHOUT ANY WARRANTY; without even the implied warranty of
>    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>    Lesser General Public License for more details.
> 
>    You should have received a copy of the GNU Lesser General Public
>    License along with the GNU C Library; if not, see
>    <http://www.gnu.org/licenses/>.  */
> 
> #define TEST_MAIN
> #define TEST_NAME "strcoll"
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <fcntl.h>
> #include <locale.h>
> #include <unistd.h>
> #include <dirent.h>
> #include "bench-timing.h"
> #include "json-lib.h"
> 
> /* many thanks to http://generator.lorem-ipsum.info/ */

Capitalize first letter of the first word.

> #define CHARSET "UTF-8"
> #define INPUT_FOLDER "strcoll-inputs"
> #define INPUT_PREFIX "lorem_ipsum_"
> 
> const char *locales[] = {

Make variable static.

>   "vi_VN",
>   "en_US",
>   "ar_SA",
>   "en_US",
>   "zh_CN",
>   "cs_CZ",
>   "en_GB",
>   "da_DK",
>   "pl_PL",
>   "fr_FR",
>   "pt_PT",
>   "el_GR",
>   "ru_RU",
>   "iw_IL",
>   "es_ES",
>   "hi_IN",
>   "sv_SE",
>   "hu_HU",
>   "tr_TR",
>   "is_IS",
>   "it_IT",
>   "sr_RS",
>   "ja_JP"
> };
> 
> #define TEXTFILE_DELIMITER " \n\r\t.,?!"
> #define FILENAMES_DELIMITER "\n\r"
> 
> int

Make function static.

> is_dir (const char *filename)
> {
>   int is_dir = 0;
>   struct stat stats;
> 
>   if (stat (filename, &stats) == 0)
>     is_dir = stats.st_mode & S_IFDIR;
> 
>   return is_dir;
> }
> 
> char *

Make function static. 

> concat_path (const char *dirname, const char *filename)
> {
>   size_t d_len = strlen (dirname);
>   size_t f_len = strlen (filename);
>   char * path = malloc (d_len + f_len + 2);
> 
>   memcpy (path, dirname, d_len);
>   * (path + d_len) = '/';
>   memcpy (path + d_len + 1, filename, f_len);
>   * (path + d_len + f_len + 1) = '\0';

Use asprintf:

  int ret = asprintf (&path, "%s/%s", dirname, filename);
  if (ret < 0)
    return NULL;

> 
>   return path;
> }
> 
> char *

Make function static.

> read_file (const char *filename)
> {
>   struct stat stats;
>   char *buffer = NULL;
>   int fd = open (filename, O_RDONLY);
> 
>   if (fd >= 0)
>     {
>       if (fstat (fd, &stats) == 0)
> 	{
> 	  buffer = malloc (stats.st_size + 1);
> 	  if (buffer)
> 	    {
> 	      read (fd, buffer, stats.st_size);

Check for read failure and free buffer and return if there's an error.

> 	      *(buffer + stats.st_size) = '\0';
> 	    }
> 	}
>       close (fd);
>     }
> 
>   return buffer;
> }
> 
> size_t

Make function static.

> count_words (const char *text, const char *delim)
> {
>   size_t wordcount = 0;
>   char *tmp = strdup (text);
> 
>   char *token = strtok (tmp, delim);
>   while (token != NULL)
>     {
>       if (*token != '\0')
> 	wordcount++;
>       token = strtok (NULL, delim);
>     }
> 
>   free (tmp);
>   return wordcount;
> }
> 
> typedef struct
> {
>   size_t size;
>   char **words;
> } word_list;
> 
> word_list *
> new_word_list (size_t size)
> {
>   word_list *list = malloc (sizeof (word_list));
>   list->size = size;
>   list->words = malloc (size * sizeof (char *));
>   return list;
> }
> 
> word_list *

Make function static.

> merge_word_list (word_list * dst, word_list * src)
> {
>   size_t i, n = dst->size;
>   dst->size += src->size;
>   dst->words = realloc (dst->words, dst->size * sizeof (char *));
> 
>   for (i = 0; i < src->size; i++)
>     dst->words[i + n] = src->words[i];
> 
>   free (src->words);
>   free (src);
> 
>   return dst;
> }
> 
> word_list *

Make function static.

> file_word_list (const char *dirname)
> {
>   DIR *dir;
>   struct dirent *ent;
>   word_list *list = NULL;
> 
>   if ((dir = opendir (dirname)) != NULL)
>     {
>       size_t ent_cnt = 0, i = 0;
>       word_list *sublist = new_word_list (0);
> 
>       while ((ent = readdir (dir)) != NULL)
> 	if (strcmp (".", ent->d_name) != 0 && strcmp ("..", ent->d_name) != 0)
> 	  {
> 	    char *subdirname = concat_path (dirname, ent->d_name);
> 	    if (is_dir (subdirname))
> 		merge_word_list (sublist, file_word_list (subdirname));
> 	    free (subdirname);
> 
> 	    ent_cnt++;
> 	  }
> 
>       rewinddir (dir);
>       list = new_word_list (ent_cnt);
>       while ((ent = readdir (dir)) != NULL)
> 	if (strcmp (".", ent->d_name) != 0 && strcmp ("..", ent->d_name) != 0)
> 	  {
> 	    list->words[i] = strdup (ent->d_name);
> 	    i++;
> 	  }
> 
>       merge_word_list (list, sublist);
>       closedir (dir);
>     }
> 
>   return list;
> }
> 
> word_list *

Make function static.

> str_word_list (const char *str, const char *delim)
> {
>   size_t n = 0;
>   word_list *list = new_word_list (count_words (str, delim));
> 
>   char *toks = strdup (str);
>   char *word = strtok (toks, delim);
>   while (word != NULL && n < list->size)
>     {
>       if (*word != '\0')
> 	list->words[n++] = strdup (word);
>       word = strtok (NULL, delim);
>     }

You're doing the same thing twice, which is suboptimal.  It's fine for
now I guess.

> 
>   free (toks);
>   return list;
> }
> 
> word_list *

Make the function static.

> copy_word_list (const word_list *list)
> {
>   size_t i;
>   word_list *copy = new_word_list (list->size);
> 
>   for (i = 0; i < list->size; i++)
>     copy->words[i] = strdup (list->words[i]);
> 
>   return copy;
> }
> 
> void

Make the function static.

> free_word_list (word_list *list)
> {
>   size_t i;
>   for (i = 0; i < list->size; i++)
>     free (list->words[i]);
> 
>   free (list->words);
>   free (list);
> }
> 
> int

Make the function static.

> compare_words (const void *a, const void *b)
> {
>   const char *s1 = *(char **) a;
>   const char *s2 = *(char **) b;
>   return strcoll (s1, s2);
> }
> 
> #undef INNER_LOOP_ITERS
> #define INNER_LOOP_ITERS 16
> 
> void

Make the function static.

> bench_list (json_ctx_t *json_ctx, word_list *list)
> {
>   size_t i;
>   timing_t start, stop, cur;
> 
>   word_list **tests = malloc (INNER_LOOP_ITERS * sizeof (word_list *));
>   for (i = 0; i < INNER_LOOP_ITERS; i++)
>     tests[i] = copy_word_list (list);
> 
>   TIMING_NOW (start);
>   for (i = 0; i < INNER_LOOP_ITERS; i++)
>     qsort (tests[i]->words, tests[i]->size, sizeof (char *), compare_words);
>   TIMING_NOW (stop);
> 
>   TIMING_DIFF (cur, start, stop);
>   setlocale (LC_ALL, "en_US.UTF-8");
>   json_attr_double (json_ctx, "duration", cur);
>   json_attr_double (json_ctx, "iterations", i);
>   json_attr_double (json_ctx, "mean", cur / i);
> 
>   for (i = 0; i < INNER_LOOP_ITERS; i++)
>     free_word_list (tests[i]);
>   free (tests);
> }
> 
> #define OK 0
> #define ERROR_LOCALE 1
> #define ERROR_IO 2
> 
> int

Make the function static.

> bench_file (json_ctx_t *json_ctx, const char *filename, const char *delim,
>   const char *locale)
> {
>   if (setlocale (LC_ALL, locale) == NULL)
>     return ERROR_LOCALE;
> 
>   char *text = read_file (filename);
>   if (text == NULL)
>     return ERROR_IO;
> 
>   word_list *list = str_word_list (text, delim);
> 
>   json_attr_object_begin (json_ctx, locale);
>   bench_list (json_ctx, list);
>   json_attr_object_end (json_ctx);
> 
>   free_word_list (list);
>   free (text);
>   return OK;
> }
> 
> int

Make the function static.

> bench_file_list (json_ctx_t *json_ctx, const char *dirname, const char *locale)
> {
>   if (setlocale (LC_ALL, locale) == NULL)
>     return ERROR_LOCALE;
> 
>   word_list *list = file_word_list (dirname);
>   if (list == NULL)
>     return ERROR_IO;
> 
>   json_attr_object_begin (json_ctx, "");
>   json_attr_string (json_ctx, "locale", locale);
>   bench_list (json_ctx, list);
>   json_attr_object_end (json_ctx);
> 
>   free_word_list (list);
>   return OK;
> }
> 
> int
> do_bench (void)
> {
>   timing_t res __attribute__ ((unused));
>   TIMING_INIT (res);
> 
>   json_ctx_t *json_ctx = malloc (sizeof (json_ctx_t));
>   json_init (json_ctx, 2, stdout);
>   json_attr_object_begin (json_ctx, "strcoll");
> 
>   int result = bench_file_list (json_ctx, "..", "en_US.UTF-8");
>   if (result == ERROR_LOCALE)
>     {
>       printf ("Failed to set locale en_US.UTF-8, aborting!\n");
>       return result;
>     }
>   if (result == ERROR_IO)
>     {
>       printf ("Failed to read libc directory, aborting!\n");
>       return result;
>     }
> 
>   size_t i;
>   char filename[40], locale[15];
>   for (i = 0; i < (sizeof (locales) / sizeof (locales[0])); i++)
>     {
>       sprintf (locale, "%s." CHARSET, locales[i]);
>       sprintf (filename, INPUT_FOLDER "/" INPUT_PREFIX "%s", locales[i]);
>       result = bench_file (json_ctx, filename, TEXTFILE_DELIMITER, locale);
>       if (result == ERROR_LOCALE)
> 	{
> 	  printf ("Failed to set locale %s, aborting!\n", locale);
> 	  return result;
> 	}
>       if (result == ERROR_IO)
> 	{
> 	  printf ("Failed to read file %s, aborting!\n", filename);
> 	  return result;
> 	}
>     }
> 
>   json_attr_object_end (json_ctx);
>   free (json_ctx);
>   return OK;
> }
> 
> #define TEST_FUNCTION do_bench ()
> 
> /* On slower platforms this test needs more than the default 2 seconds.  */
> #define TIMEOUT 10
> 
> #include "../test-skeleton.c"

Siddhesh

PS: I'm out for two weeks, so I'll be able to do the next review once
    I'm back in November.
  
Leonhard Holz Oct. 20, 2014, 8:24 p.m. UTC | #2
Am 17.10.2014 12:02, schrieb Siddhesh Poyarekar:
> On Tue, Oct 14, 2014 at 03:11:20PM +0200, Leonhard Holz wrote:
>> The output of the benchmark is:
>
> Thanks, the outputs look good to me.  I'm going to push your strcoll
> changes, but the benchmark changes need more work.  I hope you'll
> continue to work on them :)

:)

>
>> diff --git a/benchtests/Makefile b/benchtests/Makefile
>> index fd3036d..e79ceee 100644
>> --- a/benchtests/Makefile
>> +++ b/benchtests/Makefile
>> @@ -34,7 +34,7 @@ string-bench := bcopy bzero memccpy memchr memcmp memcpy memmem memmove \
>>   		mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \
>>   		strcat strchr strchrnul strcmp strcpy strcspn strlen \
>>   		strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
>> -		strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok
>> +		strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok strcoll
>>   string-bench-all := $(string-bench)
>>
>>   stdlib-bench := strtod
>
> You need to ensure that the locales are generated first.  The tests
> target does this already.  Also my fault that I didn't point out
> earlier that you'll need to set GCONV_PATH and LOCPATH to actually
> make the test use the generated locales.  Otherwise they'll just use
> the system locales.
>

I am not sure if that is the case because the used locales are not 
installed on my system and before I added them to the Makefile they were 
not found, but after adding and executing "make tests" it did work.

Anyhow I feel a bit overstrained by those Makefiles. I understand that 
the wirings for generating the needed locales in localedata/Makefile 
should be applied to benchtests/Makefile but unfortunately I do not 
understand how the mechanics in localedata/Makefile actually work. Maybe 
someone here can help?

Best,
Leonhard
  
Stefan Liebler Oct. 22, 2014, 3:13 p.m. UTC | #3
Hi Leonhard,

great job.
If I run make xcheck on s390x, I get a failure in 
string/tst-strcoll-overflow testcase. See tst-strcoll-overflow.out:
0
Expected signal 'Alarm clock' from child, got none

I measured the execution time of this testcase between 35...60s
which is smaller than the defined timeout of 300s.
Can you change the behaviour of the testcase that it is passing
if it was timed out or if it returns successfully without timeout.

Bye
Stefan

On 10/14/2014 03:11 PM, Leonhard Holz wrote:
> Hello everybody,
>
> this is a path that should solve bug 15884. It complains about the
> performance of strcoll(). It was found out that the runtime of strcoll()
> is actually bound to strlen which is needed for calculating the size of
> a cache that was installed to improve the comparison performance.
>
> The idea for this patch was that the cache is only useful in rare cases
> (strings of same length and same first-level-chars) and that it would be
> better to avoid memory allocation at all. To prove this I wrote a
> performance test bench-strcoll.c with test data in
> benchtests-strcoll.tar.gz. Also modifications in benchtests/Makefile and
> localedata/Makefile are necessary to make it work.
>
> After removing the cache the strcoll method showed the predicted
> behavior (getting slightly faster) in all but the test case for hindi
> word sorting. This was due the hindi text having much more equal words
> than the other ones. For equal strings the performance was worse since
> all comparison levels were run through and from the second level on the
> cache improved the comparison performance of the original version.
>
> Therefore I added a bytewise test via strcmp iff the first level
> comparison found that both strings did match because in this case it is
> very likely that equal strings are compared. This solved the problem
> with the hindi test case and improved the performance of the others.
>
> The output of the benchmark is:
>
>    "strcoll": {
>     "": {
>      "locale": "en_US.UTF-8",
>      "duration": 7.41506e+09,
>      "iterations": 16,
>      "mean": 4.63441e+08
>     },
>     "vi_VN.UTF-8": {
>      "duration": 7.4185e+07,
>      "iterations": 16,
>      "mean": 4.63656e+06
>     },
>     "en_US.UTF-8": {
>      "duration": 7.59759e+07,
>      "iterations": 16,
>      "mean": 4.7485e+06
>     },
>     "ar_SA.UTF-8": {
>      "duration": 7.61479e+07,
>      "iterations": 16,
>      "mean": 4.75924e+06
>     },
>     "zh_CN.UTF-8": {
>      "duration": 1.37535e+07,
>      "iterations": 16,
>      "mean": 859594
>     },
>     "cs_CZ.UTF-8": {
>      "duration": 7.90032e+07,
>      "iterations": 16,
>      "mean": 4.9377e+06
>     },
>     "en_GB.UTF-8": {
>      "duration": 7.4454e+07,
>      "iterations": 16,
>      "mean": 4.65338e+06
>     },
>     "da_DK.UTF-8": {
>      "duration": 7.265e+07,
>      "iterations": 16,
>      "mean": 4.54062e+06
>     },
>     "pl_PL.UTF-8": {
>      "duration": 7.28964e+07,
>      "iterations": 16,
>      "mean": 4.55602e+06
>     },
>     "fr_FR.UTF-8": {
>      "duration": 7.99239e+07,
>      "iterations": 16,
>      "mean": 4.99524e+06
>     },
>     "pt_PT.UTF-8": {
>      "duration": 7.78347e+07,
>      "iterations": 16,
>      "mean": 4.86467e+06
>     },
>     "el_GR.UTF-8": {
>      "duration": 1.0959e+08,
>      "iterations": 16,
>      "mean": 6.84937e+06
>     },
>     "ru_RU.UTF-8": {
>      "duration": 9.39324e+07,
>      "iterations": 16,
>      "mean": 5.87077e+06
>     },
>     "iw_IL.UTF-8": {
>      "duration": 8.8769e+07,
>      "iterations": 16,
>      "mean": 5.54806e+06
>     },
>     "es_ES.UTF-8": {
>      "duration": 8.0782e+07,
>      "iterations": 16,
>      "mean": 5.04888e+06
>     },
>     "hi_IN.UTF-8": {
>      "duration": 3.66962e+09,
>      "iterations": 16,
>      "mean": 2.29351e+08
>     },
>     "sv_SE.UTF-8": {
>      "duration": 7.41934e+07,
>      "iterations": 16,
>      "mean": 4.63709e+06
>     },
>     "hu_HU.UTF-8": {
>      "duration": 9.04538e+07,
>      "iterations": 16,
>      "mean": 5.65336e+06
>     },
>     "tr_TR.UTF-8": {
>      "duration": 7.25579e+07,
>      "iterations": 16,
>      "mean": 4.53487e+06
>     },
>     "is_IS.UTF-8": {
>      "duration": 6.83783e+07,
>      "iterations": 16,
>      "mean": 4.27364e+06
>     },
>     "it_IT.UTF-8": {
>      "duration": 7.50307e+07,
>      "iterations": 16,
>      "mean": 4.68942e+06
>     },
>     "sr_RS.UTF-8": {
>      "duration": 8.14996e+07,
>      "iterations": 16,
>      "mean": 5.09373e+06
>     },
>     "ja_JP.UTF-8": {
>      "duration": 1.5325e+07,
>      "iterations": 16,
>      "mean": 957814
>     }
>    }
>
> That is compared to the previous version:
>
> glibc files    -33.77%
> vi_VN.UTF-8    -34.12%
> en_US.UTF-8    -42.42%
> ar_SA.UTF-8    -27.49%
> zh_CN.UTF-8    +07.90%
> cs_CZ.UTF-8    -29.67%
> en_GB.UTF-8    -28.50%
> da_DK.UTF-8    -36.57%
> pl_PL.UTF-8    -39.31%
> fr_FR.UTF-8    -28.57%
> pt_PT.UTF-8    -22.82%
> el_GR.UTF-8    -26.77%
> ru_RU.UTF-8    -35.81%
> iw_IL.UTF-8    -35.34%
> es_ES.UTF-8    -34.46%
> hi_IN.UTF-8    -00.38%
> sv_SE.UTF-8    -36.99%
> hu_HU.UTF-8    -16.35%
> tr_TR.UTF-8    -27.80%
> is_IS.UTF-8    -33.24%
> it_IT.UTF-8    -24.39%
> sr_RS.UTF-8    -37.55%
> ja_JP.UTF-8    +02.84%
>
> Best,
> Leonhard
>
>
> 2014-10-14  Leonhard Holz  <leonhard.holz@web.de>
>
>      [BZ #15884]
>      * string/strcoll_l.c (STRCOLL): Remove weight and rules cache.
>      * benchtests/bench-strcoll.c: New benchmark.
>      * benchtests/Makefile (bench-string): Use it.
>      * benchtests/strcoll-inputs/lorem_ipsum_ar_SA New file.
>      * benchtests/strcoll-inputs/lorem_ipsum_cs_CZ Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_da_DK Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_el_GR Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_en_GB Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_en_US Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_es_ES Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_fr_FR Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_hi_IN Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_hu_HU Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_is_IS Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_it_IT Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_iw_IL Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_ja_JP Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_pl_PL Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_pt_PT Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_ru_RU Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_sr_RS Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_sv_SE Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_tr_TR Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_vi_VN Likewise.
>      * benchtests/strcoll-inputs/lorem_ipsum_zh_CN Likewise.
>      * localedata/Makefile (LOCALES): Generate needed locales.
>
  
Leonhard Holz Oct. 22, 2014, 8:39 p.m. UTC | #4
Hi Stefan,

is this a new problem or did it occure before the patch?

Best,
Leonhard

Am 22.10.2014 17:13, schrieb Stefan Liebler:
> Hi Leonhard,
>
> great job.
> If I run make xcheck on s390x, I get a failure in
> string/tst-strcoll-overflow testcase. See tst-strcoll-overflow.out:
> 0
> Expected signal 'Alarm clock' from child, got none
>
> I measured the execution time of this testcase between 35...60s
> which is smaller than the defined timeout of 300s.
> Can you change the behaviour of the testcase that it is passing
> if it was timed out or if it returns successfully without timeout.
>
> Bye
> Stefan
>
> On 10/14/2014 03:11 PM, Leonhard Holz wrote:
>> Hello everybody,
>>
>> this is a path that should solve bug 15884. It complains about the
>> performance of strcoll(). It was found out that the runtime of strcoll()
>> is actually bound to strlen which is needed for calculating the size of
>> a cache that was installed to improve the comparison performance.
>>
>> The idea for this patch was that the cache is only useful in rare cases
>> (strings of same length and same first-level-chars) and that it would be
>> better to avoid memory allocation at all. To prove this I wrote a
>> performance test bench-strcoll.c with test data in
>> benchtests-strcoll.tar.gz. Also modifications in benchtests/Makefile and
>> localedata/Makefile are necessary to make it work.
>>
>> After removing the cache the strcoll method showed the predicted
>> behavior (getting slightly faster) in all but the test case for hindi
>> word sorting. This was due the hindi text having much more equal words
>> than the other ones. For equal strings the performance was worse since
>> all comparison levels were run through and from the second level on the
>> cache improved the comparison performance of the original version.
>>
>> Therefore I added a bytewise test via strcmp iff the first level
>> comparison found that both strings did match because in this case it is
>> very likely that equal strings are compared. This solved the problem
>> with the hindi test case and improved the performance of the others.
>>
>> The output of the benchmark is:
>>
>>    "strcoll": {
>>     "": {
>>      "locale": "en_US.UTF-8",
>>      "duration": 7.41506e+09,
>>      "iterations": 16,
>>      "mean": 4.63441e+08
>>     },
>>     "vi_VN.UTF-8": {
>>      "duration": 7.4185e+07,
>>      "iterations": 16,
>>      "mean": 4.63656e+06
>>     },
>>     "en_US.UTF-8": {
>>      "duration": 7.59759e+07,
>>      "iterations": 16,
>>      "mean": 4.7485e+06
>>     },
>>     "ar_SA.UTF-8": {
>>      "duration": 7.61479e+07,
>>      "iterations": 16,
>>      "mean": 4.75924e+06
>>     },
>>     "zh_CN.UTF-8": {
>>      "duration": 1.37535e+07,
>>      "iterations": 16,
>>      "mean": 859594
>>     },
>>     "cs_CZ.UTF-8": {
>>      "duration": 7.90032e+07,
>>      "iterations": 16,
>>      "mean": 4.9377e+06
>>     },
>>     "en_GB.UTF-8": {
>>      "duration": 7.4454e+07,
>>      "iterations": 16,
>>      "mean": 4.65338e+06
>>     },
>>     "da_DK.UTF-8": {
>>      "duration": 7.265e+07,
>>      "iterations": 16,
>>      "mean": 4.54062e+06
>>     },
>>     "pl_PL.UTF-8": {
>>      "duration": 7.28964e+07,
>>      "iterations": 16,
>>      "mean": 4.55602e+06
>>     },
>>     "fr_FR.UTF-8": {
>>      "duration": 7.99239e+07,
>>      "iterations": 16,
>>      "mean": 4.99524e+06
>>     },
>>     "pt_PT.UTF-8": {
>>      "duration": 7.78347e+07,
>>      "iterations": 16,
>>      "mean": 4.86467e+06
>>     },
>>     "el_GR.UTF-8": {
>>      "duration": 1.0959e+08,
>>      "iterations": 16,
>>      "mean": 6.84937e+06
>>     },
>>     "ru_RU.UTF-8": {
>>      "duration": 9.39324e+07,
>>      "iterations": 16,
>>      "mean": 5.87077e+06
>>     },
>>     "iw_IL.UTF-8": {
>>      "duration": 8.8769e+07,
>>      "iterations": 16,
>>      "mean": 5.54806e+06
>>     },
>>     "es_ES.UTF-8": {
>>      "duration": 8.0782e+07,
>>      "iterations": 16,
>>      "mean": 5.04888e+06
>>     },
>>     "hi_IN.UTF-8": {
>>      "duration": 3.66962e+09,
>>      "iterations": 16,
>>      "mean": 2.29351e+08
>>     },
>>     "sv_SE.UTF-8": {
>>      "duration": 7.41934e+07,
>>      "iterations": 16,
>>      "mean": 4.63709e+06
>>     },
>>     "hu_HU.UTF-8": {
>>      "duration": 9.04538e+07,
>>      "iterations": 16,
>>      "mean": 5.65336e+06
>>     },
>>     "tr_TR.UTF-8": {
>>      "duration": 7.25579e+07,
>>      "iterations": 16,
>>      "mean": 4.53487e+06
>>     },
>>     "is_IS.UTF-8": {
>>      "duration": 6.83783e+07,
>>      "iterations": 16,
>>      "mean": 4.27364e+06
>>     },
>>     "it_IT.UTF-8": {
>>      "duration": 7.50307e+07,
>>      "iterations": 16,
>>      "mean": 4.68942e+06
>>     },
>>     "sr_RS.UTF-8": {
>>      "duration": 8.14996e+07,
>>      "iterations": 16,
>>      "mean": 5.09373e+06
>>     },
>>     "ja_JP.UTF-8": {
>>      "duration": 1.5325e+07,
>>      "iterations": 16,
>>      "mean": 957814
>>     }
>>    }
>>
>> That is compared to the previous version:
>>
>> glibc files    -33.77%
>> vi_VN.UTF-8    -34.12%
>> en_US.UTF-8    -42.42%
>> ar_SA.UTF-8    -27.49%
>> zh_CN.UTF-8    +07.90%
>> cs_CZ.UTF-8    -29.67%
>> en_GB.UTF-8    -28.50%
>> da_DK.UTF-8    -36.57%
>> pl_PL.UTF-8    -39.31%
>> fr_FR.UTF-8    -28.57%
>> pt_PT.UTF-8    -22.82%
>> el_GR.UTF-8    -26.77%
>> ru_RU.UTF-8    -35.81%
>> iw_IL.UTF-8    -35.34%
>> es_ES.UTF-8    -34.46%
>> hi_IN.UTF-8    -00.38%
>> sv_SE.UTF-8    -36.99%
>> hu_HU.UTF-8    -16.35%
>> tr_TR.UTF-8    -27.80%
>> is_IS.UTF-8    -33.24%
>> it_IT.UTF-8    -24.39%
>> sr_RS.UTF-8    -37.55%
>> ja_JP.UTF-8    +02.84%
>>
>> Best,
>> Leonhard
>>
>>
>> 2014-10-14  Leonhard Holz  <leonhard.holz@web.de>
>>
>>      [BZ #15884]
>>      * string/strcoll_l.c (STRCOLL): Remove weight and rules cache.
>>      * benchtests/bench-strcoll.c: New benchmark.
>>      * benchtests/Makefile (bench-string): Use it.
>>      * benchtests/strcoll-inputs/lorem_ipsum_ar_SA New file.
>>      * benchtests/strcoll-inputs/lorem_ipsum_cs_CZ Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_da_DK Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_el_GR Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_en_GB Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_en_US Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_es_ES Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_fr_FR Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_hi_IN Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_hu_HU Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_is_IS Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_it_IT Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_iw_IL Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_ja_JP Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_pl_PL Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_pt_PT Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_ru_RU Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_sr_RS Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_sv_SE Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_tr_TR Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_vi_VN Likewise.
>>      * benchtests/strcoll-inputs/lorem_ipsum_zh_CN Likewise.
>>      * localedata/Makefile (LOCALES): Generate needed locales.
>>
>
  
Stefan Liebler Oct. 23, 2014, 10:18 a.m. UTC | #5
Hi Leonhard,
the testcase timed out after 300s before the patch,
which is interpreted as passing.
Bye

On 10/22/2014 10:39 PM, Leonhard Holz wrote:
> Hi Stefan,
>
> is this a new problem or did it occure before the patch?
>
> Best,
> Leonhard
>
> Am 22.10.2014 17:13, schrieb Stefan Liebler:
>> Hi Leonhard,
>>
>> great job.
>> If I run make xcheck on s390x, I get a failure in
>> string/tst-strcoll-overflow testcase. See tst-strcoll-overflow.out:
>> 0
>> Expected signal 'Alarm clock' from child, got none
>>
>> I measured the execution time of this testcase between 35...60s
>> which is smaller than the defined timeout of 300s.
>> Can you change the behaviour of the testcase that it is passing
>> if it was timed out or if it returns successfully without timeout.
>>
>> Bye
>> Stefan
>>
>> On 10/14/2014 03:11 PM, Leonhard Holz wrote:
>>> Hello everybody,
>>>
>>> this is a path that should solve bug 15884. It complains about the
>>> performance of strcoll(). It was found out that the runtime of strcoll()
>>> is actually bound to strlen which is needed for calculating the size of
>>> a cache that was installed to improve the comparison performance.
>>>
>>> The idea for this patch was that the cache is only useful in rare cases
>>> (strings of same length and same first-level-chars) and that it would be
>>> better to avoid memory allocation at all. To prove this I wrote a
>>> performance test bench-strcoll.c with test data in
>>> benchtests-strcoll.tar.gz. Also modifications in benchtests/Makefile and
>>> localedata/Makefile are necessary to make it work.
>>>
>>> After removing the cache the strcoll method showed the predicted
>>> behavior (getting slightly faster) in all but the test case for hindi
>>> word sorting. This was due the hindi text having much more equal words
>>> than the other ones. For equal strings the performance was worse since
>>> all comparison levels were run through and from the second level on the
>>> cache improved the comparison performance of the original version.
>>>
>>> Therefore I added a bytewise test via strcmp iff the first level
>>> comparison found that both strings did match because in this case it is
>>> very likely that equal strings are compared. This solved the problem
>>> with the hindi test case and improved the performance of the others.
>>>
>>> The output of the benchmark is:
>>>
>>>    "strcoll": {
>>>     "": {
>>>      "locale": "en_US.UTF-8",
>>>      "duration": 7.41506e+09,
>>>      "iterations": 16,
>>>      "mean": 4.63441e+08
>>>     },
>>>     "vi_VN.UTF-8": {
>>>      "duration": 7.4185e+07,
>>>      "iterations": 16,
>>>      "mean": 4.63656e+06
>>>     },
>>>     "en_US.UTF-8": {
>>>      "duration": 7.59759e+07,
>>>      "iterations": 16,
>>>      "mean": 4.7485e+06
>>>     },
>>>     "ar_SA.UTF-8": {
>>>      "duration": 7.61479e+07,
>>>      "iterations": 16,
>>>      "mean": 4.75924e+06
>>>     },
>>>     "zh_CN.UTF-8": {
>>>      "duration": 1.37535e+07,
>>>      "iterations": 16,
>>>      "mean": 859594
>>>     },
>>>     "cs_CZ.UTF-8": {
>>>      "duration": 7.90032e+07,
>>>      "iterations": 16,
>>>      "mean": 4.9377e+06
>>>     },
>>>     "en_GB.UTF-8": {
>>>      "duration": 7.4454e+07,
>>>      "iterations": 16,
>>>      "mean": 4.65338e+06
>>>     },
>>>     "da_DK.UTF-8": {
>>>      "duration": 7.265e+07,
>>>      "iterations": 16,
>>>      "mean": 4.54062e+06
>>>     },
>>>     "pl_PL.UTF-8": {
>>>      "duration": 7.28964e+07,
>>>      "iterations": 16,
>>>      "mean": 4.55602e+06
>>>     },
>>>     "fr_FR.UTF-8": {
>>>      "duration": 7.99239e+07,
>>>      "iterations": 16,
>>>      "mean": 4.99524e+06
>>>     },
>>>     "pt_PT.UTF-8": {
>>>      "duration": 7.78347e+07,
>>>      "iterations": 16,
>>>      "mean": 4.86467e+06
>>>     },
>>>     "el_GR.UTF-8": {
>>>      "duration": 1.0959e+08,
>>>      "iterations": 16,
>>>      "mean": 6.84937e+06
>>>     },
>>>     "ru_RU.UTF-8": {
>>>      "duration": 9.39324e+07,
>>>      "iterations": 16,
>>>      "mean": 5.87077e+06
>>>     },
>>>     "iw_IL.UTF-8": {
>>>      "duration": 8.8769e+07,
>>>      "iterations": 16,
>>>      "mean": 5.54806e+06
>>>     },
>>>     "es_ES.UTF-8": {
>>>      "duration": 8.0782e+07,
>>>      "iterations": 16,
>>>      "mean": 5.04888e+06
>>>     },
>>>     "hi_IN.UTF-8": {
>>>      "duration": 3.66962e+09,
>>>      "iterations": 16,
>>>      "mean": 2.29351e+08
>>>     },
>>>     "sv_SE.UTF-8": {
>>>      "duration": 7.41934e+07,
>>>      "iterations": 16,
>>>      "mean": 4.63709e+06
>>>     },
>>>     "hu_HU.UTF-8": {
>>>      "duration": 9.04538e+07,
>>>      "iterations": 16,
>>>      "mean": 5.65336e+06
>>>     },
>>>     "tr_TR.UTF-8": {
>>>      "duration": 7.25579e+07,
>>>      "iterations": 16,
>>>      "mean": 4.53487e+06
>>>     },
>>>     "is_IS.UTF-8": {
>>>      "duration": 6.83783e+07,
>>>      "iterations": 16,
>>>      "mean": 4.27364e+06
>>>     },
>>>     "it_IT.UTF-8": {
>>>      "duration": 7.50307e+07,
>>>      "iterations": 16,
>>>      "mean": 4.68942e+06
>>>     },
>>>     "sr_RS.UTF-8": {
>>>      "duration": 8.14996e+07,
>>>      "iterations": 16,
>>>      "mean": 5.09373e+06
>>>     },
>>>     "ja_JP.UTF-8": {
>>>      "duration": 1.5325e+07,
>>>      "iterations": 16,
>>>      "mean": 957814
>>>     }
>>>    }
>>>
>>> That is compared to the previous version:
>>>
>>> glibc files    -33.77%
>>> vi_VN.UTF-8    -34.12%
>>> en_US.UTF-8    -42.42%
>>> ar_SA.UTF-8    -27.49%
>>> zh_CN.UTF-8    +07.90%
>>> cs_CZ.UTF-8    -29.67%
>>> en_GB.UTF-8    -28.50%
>>> da_DK.UTF-8    -36.57%
>>> pl_PL.UTF-8    -39.31%
>>> fr_FR.UTF-8    -28.57%
>>> pt_PT.UTF-8    -22.82%
>>> el_GR.UTF-8    -26.77%
>>> ru_RU.UTF-8    -35.81%
>>> iw_IL.UTF-8    -35.34%
>>> es_ES.UTF-8    -34.46%
>>> hi_IN.UTF-8    -00.38%
>>> sv_SE.UTF-8    -36.99%
>>> hu_HU.UTF-8    -16.35%
>>> tr_TR.UTF-8    -27.80%
>>> is_IS.UTF-8    -33.24%
>>> it_IT.UTF-8    -24.39%
>>> sr_RS.UTF-8    -37.55%
>>> ja_JP.UTF-8    +02.84%
>>>
>>> Best,
>>> Leonhard
>>>
>>>
>>> 2014-10-14  Leonhard Holz  <leonhard.holz@web.de>
>>>
>>>      [BZ #15884]
>>>      * string/strcoll_l.c (STRCOLL): Remove weight and rules cache.
>>>      * benchtests/bench-strcoll.c: New benchmark.
>>>      * benchtests/Makefile (bench-string): Use it.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_ar_SA New file.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_cs_CZ Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_da_DK Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_el_GR Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_en_GB Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_en_US Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_es_ES Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_fr_FR Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_hi_IN Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_hu_HU Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_is_IS Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_it_IT Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_iw_IL Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_ja_JP Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_pl_PL Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_pt_PT Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_ru_RU Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_sr_RS Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_sv_SE Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_tr_TR Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_vi_VN Likewise.
>>>      * benchtests/strcoll-inputs/lorem_ipsum_zh_CN Likewise.
>>>      * localedata/Makefile (LOCALES): Generate needed locales.
>>>
>>
>
  
Stefan Liebler Oct. 23, 2014, 3:13 p.m. UTC | #6
Hi Leonhard,

i filed a bug.
See: https://sourceware.org/bugzilla/show_bug.cgi?id=17506

Bye
Stefan

On 10/23/2014 02:27 PM, Leonhard Holz wrote:> Hi Stefan,
 >
 > looks like this has to be changed in test-skeleton.c. Can you file a bug
 > report for the problem?
 >
 > Best,
 > Leonhard
 >
 > Am 23.10.2014 12:18, schrieb Stefan Liebler:
 >> Hi Leonhard,
 >> the testcase timed out after 300s before the patch,
 >> which is interpreted as passing.
 >> Bye
 >>
 >> On 10/22/2014 10:39 PM, Leonhard Holz wrote:
 >>> Hi Stefan,
 >>>
 >>> is this a new problem or did it occure before the patch?
 >>>
 >>> Best,
 >>> Leonhard
 >>>
 >>> Am 22.10.2014 17:13, schrieb Stefan Liebler:
 >>>> Hi Leonhard,
 >>>>
 >>>> great job.
 >>>> If I run make xcheck on s390x, I get a failure in
 >>>> string/tst-strcoll-overflow testcase. See tst-strcoll-overflow.out:
 >>>> 0
 >>>> Expected signal 'Alarm clock' from child, got none
 >>>>
 >>>> I measured the execution time of this testcase between 35...60s
 >>>> which is smaller than the defined timeout of 300s.
 >>>> Can you change the behaviour of the testcase that it is passing
 >>>> if it was timed out or if it returns successfully without timeout.
 >>>>
 >>>> Bye
 >>>> Stefan
 >>>>
 >>>> On 10/14/2014 03:11 PM, Leonhard Holz wrote:
 >>>>> Hello everybody,
 >>>>>
 >>>>> this is a path that should solve bug 15884. It complains about the
 >>>>> performance of strcoll(). It was found out that the runtime of
 >>>>> strcoll()
 >>>>> is actually bound to strlen which is needed for calculating the
 >>>>> size of
 >>>>> a cache that was installed to improve the comparison performance.
 >>>>>
 >>>>> The idea for this patch was that the cache is only useful in rare
 >>>>> cases
 >>>>> (strings of same length and same first-level-chars) and that it
 >>>>> would be
 >>>>> better to avoid memory allocation at all. To prove this I wrote a
 >>>>> performance test bench-strcoll.c with test data in
 >>>>> benchtests-strcoll.tar.gz. Also modifications in benchtests/Makefile
 >>>>> and
 >>>>> localedata/Makefile are necessary to make it work.
 >>>>>
 >>>>> After removing the cache the strcoll method showed the predicted
 >>>>> behavior (getting slightly faster) in all but the test case for hindi
 >>>>> word sorting. This was due the hindi text having much more equal 
words
 >>>>> than the other ones. For equal strings the performance was worse 
since
 >>>>> all comparison levels were run through and from the second level on
 >>>>> the
 >>>>> cache improved the comparison performance of the original version.
 >>>>>
 >>>>> Therefore I added a bytewise test via strcmp iff the first level
 >>>>> comparison found that both strings did match because in this case
 >>>>> it is
 >>>>> very likely that equal strings are compared. This solved the problem
 >>>>> with the hindi test case and improved the performance of the others.
 >>>>>
 >>>>
 >>>
 >>
 >
  
Siddhesh Poyarekar Nov. 4, 2014, 4:44 p.m. UTC | #7
On Mon, Oct 20, 2014 at 10:24:49PM +0200, Leonhard Holz wrote:
> I am not sure if that is the case because the used locales are not installed
> on my system and before I added them to the Makefile they were not found,
> but after adding and executing "make tests" it did work.
> 
> Anyhow I feel a bit overstrained by those Makefiles. I understand that the

Yeah, legend has it that those who've read those Makefiles for too
long have needed some sort of medical help...

> wirings for generating the needed locales in localedata/Makefile should be
> applied to benchtests/Makefile but unfortunately I do not understand how the
> mechanics in localedata/Makefile actually work. Maybe someone here can help?

I'll take a look at them later and ping you again.

Siddhesh
  

Patch

diff --git a/benchtests/Makefile b/benchtests/Makefile
index fd3036d..e79ceee 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -34,7 +34,7 @@  string-bench := bcopy bzero memccpy memchr memcmp memcpy memmem memmove \
 		mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \
 		strcat strchr strchrnul strcmp strcpy strcspn strlen \
 		strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
-		strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok
+		strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok strcoll
 string-bench-all := $(string-bench)
 
 stdlib-bench := strtod