From patchwork Mon Sep 29 10:30:18 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonhard Holz X-Patchwork-Id: 3017 Received: (qmail 15037 invoked by alias); 29 Sep 2014 10:30:33 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 15010 invoked by uid 89); 29 Sep 2014 10:30:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, KAM_INFOUSME, RCVD_IN_DNSWL_NONE, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mout.web.de Message-ID: <542934BA.9020000@web.de> Date: Mon, 29 Sep 2014 12:30:18 +0200 From: Leonhard Holz User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: libc-alpha@sourceware.org Subject: Re: [Patch] [BZ 15884] strcoll: improve performance by removing the cache References: <542282C1.2080504@web.de> <20140924100247.GN1716@spoyarek.pnq.redhat.com> In-Reply-To: <20140924100247.GN1716@spoyarek.pnq.redhat.com> X-UI-Out-Filterresults: notjunk:1; Ok, according to Siddhesh remarks I splitted the patch in two, this which removes the cache and a following one that introduces inline-ing. Formating should now be better and I tried to minimize the changed lines. 2014-09-29 Leonhard Holz [BZ #15884] * string/strcoll_l.c: Remove weight and rules cache. * benchtests/bench-strcoll.c: Benchmark for strcoll(). * benchtests/Makefile: Likewise. * benchtests/strcoll-inputs/glibc_files.txt: Benchmark data. * benchtests/strcoll-inputs/lorem_ipsum_vietnamese.txt * benchtests/strcoll-inputs/lorem_ipsum_latin.txt * benchtests/strcoll-inputs/lorem_ipsum_arabic.txt * benchtests/strcoll-inputs/lorem_ipsum_l33tspeak.txt * benchtests/strcoll-inputs/lorem_ipsum_chinese.txt * benchtests/strcoll-inputs/lorem_ipsum_czech.txt * benchtests/strcoll-inputs/lorem_ipsum_old_english.txt * benchtests/strcoll-inputs/lorem_ipsum_danish.txt * benchtests/strcoll-inputs/lorem_ipsum_polish.txt * benchtests/strcoll-inputs/lorem_ipsum_french.txt * benchtests/strcoll-inputs/lorem_ipsum_portugese.txt * benchtests/strcoll-inputs/lorem_ipsum_greek.txt * benchtests/strcoll-inputs/lorem_ipsum_russian.txt * benchtests/strcoll-inputs/lorem_ipsum_hebrew.txt * benchtests/strcoll-inputs/lorem_ipsum_spain.txt * benchtests/strcoll-inputs/lorem_ipsum_hindi.txt * benchtests/strcoll-inputs/lorem_ipsum_swedish.txt * benchtests/strcoll-inputs/lorem_ipsum_hungarian.txt * benchtests/strcoll-inputs/lorem_ipsum_turkish.txt * benchtests/strcoll-inputs/lorem_ipsum_icelandic.txt * benchtests/strcoll-inputs/lorem_ipsum_italian.txt * benchtests/strcoll-inputs/lorem_ipsum_yugoslavian.txt * benchtests/strcoll-inputs/lorem_ipsum_japanese.txt * localedata/Makefile: Generate locales needed for benchtests. The numbers are: glibc_files.txt en_US.UTF-8 -46.72% lorem_ipsum_vietnamese.txt vi_VN.UTF-8 -36.60% lorem_ipsum_latin.txt en_US.UTF-8 -45.83% lorem_ipsum_arabic.txt ar_SA.UTF-8 -34.11% lorem_ipsum_l33tspeak.txt en_US.UTF-8 -46.28% lorem_ipsum_chinese.txt zh_CN.UTF-8 +30.95% lorem_ipsum_czech.txt cs_CZ.UTF-8 -36.17% lorem_ipsum_old_english.txt en_GB.UTF-8 -35.22% lorem_ipsum_danish.txt da_DK.UTF-8 -39.22% lorem_ipsum_polish.txt pl_PL.UTF-8 -42.62% lorem_ipsum_french.txt fr_FR.UTF-8 -31.09% lorem_ipsum_portugese.txt pt_PT.UTF-8 -30.27% lorem_ipsum_greek.txt el_GR.UTF-8 -32.07% lorem_ipsum_russian.txt ru_RU.UTF-8 -36.00% lorem_ipsum_hebrew.txt iw_IL.UTF-8 -41.44% lorem_ipsum_spain.txt es_ES.UTF-8 -35.64% lorem_ipsum_hindi.txt hi_IN.UTF-8 -00.17% lorem_ipsum_swedish.txt sv_SE.UTF-8 -38.85% lorem_ipsum_hungarian.txt hu_HU.UTF-8 -21.82% lorem_ipsum_turkish.txt tr_TR.UTF-8 -38.08% lorem_ipsum_icelandic.txt is_IS.UTF-8 -43.40% lorem_ipsum_italian.txt it_IT.UTF-8 -30.52% lorem_ipsum_yugoslavian.txt sr_RS.UTF-8 -36.41% lorem_ipsum_japanese.txt ja_JP.UTF-8 +18.00% Chinese and japanese are a bit special as AFAIK in these languages every character is a word and the benchmark is probably comparing sentences. Also theses language complete much faster in absolute numbers, about 1e6 vs. 3e6 (new) / 5e6 (old) for alphabetic languages. Best, Leonhard Am 24.09.2014 12:02, schrieb Siddhesh Poyarekar: > On Wed, Sep 24, 2014 at 10:37:21AM +0200, Leonhard Holz wrote: >> Hello everybody, >> >> this is a path that should solve bug 15884. It complains about the >> performance of strcoll(). It was found out that the runtime of strcoll() is >> actually bound to strlen which is needed for calculating the size of a cache >> that was installed to improve the comparison performance. >> >> The idea for this patch was that the cache is only useful in rare cases >> (strings of same length and same first-level-chars) and that it would be >> better to avoid memory allocation at all. To prove this I wrote a >> performance test that is found in benchtests-strcoll.tar.bz2. Also >> modifications in benchtests/Makefile and localedata/Makefile are necessary >> to make it work. >> >> After removing the cache the strcoll method showed the predicted behavior >> (getting slightly faster) in all but the test case for hindi word sorting. >> This was due the hindi text having much more equal words than the other >> ones. For equal strings the performance was worse since all comparison >> levels were run through and from the second level on the cache improved the >> comparison performance of the original version. >> >> Therefore I added a bytewise test via strcmp iff the first level comparison >> found that both strings did match because in this case it is very likely >> that equal strings are compared. This solved the problem with the hindi test >> case and improved the performance of the others. > > Thanks for working on this and also writing a benchmark for it. The > general approach seems sound to me (I haven't done a deep review yet), > but there are quite a few nits that will need to be worked out, most > of them covered in the contributor checklist[1]. > > - There are a lot of unrelated whitespace and formatting changes in > the patch. Most of them seem to have been made using the GNU indent > program, which is mostly accurate, but not completely. Please > review and fix them up. > > - The change needs a changelog which mentions all your changes, > including all the new files. > > - Please include bench-strcoll.c in the patch as well. It's OK if you > post the input files in the tarball but the source needs to be > reviewed. > > - bench-strcoll.c has some code formatting issues, especially > unnecessary braces around single line for/if blocks. > >> Another improvement was achieved by inlineing both static subroutines. > > - Please post the inlining change separately with separate numbers for > it. In general we stay away from inlining functions and just let > the compiler do its job. However if there is a case where such > inlining is especially useful, it needs to be accompanied with > numbers. So a separate patch with separate numbers for the change > would be helpful. > > - Finally, I don't know if you have signed a copyright assignment with > the FSF for your changes. Carlos seems to have mentioned that in > your previous email thread, but I don't know if you've followed > through on it since I am not an FSF maintainer. Maybe one of the > FSF maintainers can confirm that. > > Siddhesh > > [1] https://sourceware.org/glibc/wiki/Contribution%20checklist > /* Measure strcoll implementation. Copyright (C) 2014 Free Software Foundation, Inc. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, see . */ #define TEST_MAIN #define TEST_NAME "strcoll" #include #include #include #include "bench-timing.h" // many thanks to http://generator.lorem-ipsum.info/ const char *li_files[] = { "strcoll-inputs/lorem_ipsum_vietnamese.txt", "strcoll-inputs/lorem_ipsum_latin.txt", "strcoll-inputs/lorem_ipsum_arabic.txt", "strcoll-inputs/lorem_ipsum_l33tspeak.txt", "strcoll-inputs/lorem_ipsum_chinese.txt", "strcoll-inputs/lorem_ipsum_czech.txt", "strcoll-inputs/lorem_ipsum_old_english.txt", "strcoll-inputs/lorem_ipsum_danish.txt", "strcoll-inputs/lorem_ipsum_polish.txt", "strcoll-inputs/lorem_ipsum_french.txt", "strcoll-inputs/lorem_ipsum_portugese.txt", "strcoll-inputs/lorem_ipsum_greek.txt", "strcoll-inputs/lorem_ipsum_russian.txt", "strcoll-inputs/lorem_ipsum_hebrew.txt", "strcoll-inputs/lorem_ipsum_spain.txt", "strcoll-inputs/lorem_ipsum_hindi.txt", "strcoll-inputs/lorem_ipsum_swedish.txt", "strcoll-inputs/lorem_ipsum_hungarian.txt", "strcoll-inputs/lorem_ipsum_turkish.txt", "strcoll-inputs/lorem_ipsum_icelandic.txt", "strcoll-inputs/lorem_ipsum_italian.txt", "strcoll-inputs/lorem_ipsum_yugoslavian.txt", "strcoll-inputs/lorem_ipsum_japanese.txt" }; const char *li_locales[] = { "vi_VN.UTF-8", "en_US.UTF-8", "ar_SA.UTF-8", "en_US.UTF-8", "zh_CN.UTF-8", "cs_CZ.UTF-8", "en_GB.UTF-8", "da_DK.UTF-8", "pl_PL.UTF-8", "fr_FR.UTF-8", "pt_PT.UTF-8", "el_GR.UTF-8", "ru_RU.UTF-8", "iw_IL.UTF-8", "es_ES.UTF-8", "hi_IN.UTF-8", "sv_SE.UTF-8", "hu_HU.UTF-8", "tr_TR.UTF-8", "is_IS.UTF-8", "it_IT.UTF-8", "sr_RS.UTF-8", "ja_JP.UTF-8" }; const char *filenames_file = "strcoll-inputs/glibc_files.txt"; #define LI_DELIMITER " \n\r\t.,?!" #define FILENAMES_DELIMITER "\n\r" char * read_file (const char *filename) { char *buffer = NULL; FILE *f = fopen (filename, "rb"); if (f) { fseek (f, 0, SEEK_END); size_t length = ftell (f); fseek (f, 0, SEEK_SET); buffer = malloc (length + 1); if (buffer) { fread (buffer, 1, length, f); *(buffer + length) = '\0'; } fclose (f); } return buffer; } size_t count_words (const char *text, const char *delim) { size_t wordcount = 0; char *tmp = strdup (text); char *token = strtok (tmp, delim); while (token != NULL) { if (*token != '\0') wordcount++; token = strtok (NULL, delim); } free (tmp); return wordcount; } typedef struct { char *str; size_t strlen; size_t size; char **words; } word_list; word_list tokenize_string (const char *str, const char *delim) { word_list list; list.strlen = strlen (str); list.size = count_words (str, delim); list.words = malloc (list.size * sizeof (char *)); size_t n = 0; list.str = strdup (str); char *word = strtok (list.str, delim); while (word != NULL && n < list.size) { if (*word != '\0') list.words[n++] = word; word = strtok (NULL, delim); } return list; } word_list copy_word_list (const word_list list) { size_t i; word_list copy; copy.strlen = list.strlen; copy.str = malloc (list.strlen + 1); memcpy (copy.str, list.str, list.strlen + 1); copy.size = list.size; copy.words = malloc (list.size * sizeof (char *)); for (i = 0; i < list.size; i++) copy.words[i] = list.words[i] - list.str + copy.str; return copy; } int compare_words (const void *a, const void *b) { const char *s1 = *(char **) a; const char *s2 = *(char **) b; return strcoll (s1, s2); } word_list sort_word_list (const word_list list) { word_list sorted = copy_word_list (list); qsort (sorted.words, sorted.size, sizeof (char *), compare_words); return sorted; } void print_word_list (word_list list) { size_t i; for (i = 0; i < list.size; i++) printf ("%s\n", list.words[i]); } void free_word_list (word_list list) { free (list.words); free (list.str); } #undef INNER_LOOP_ITERS #define INNER_LOOP_ITERS 16 void test_file (const char *filename, const char *delim, const char *locale) { setlocale (LC_ALL, locale); timing_t start, stop, cur; size_t i, iters = INNER_LOOP_ITERS; printf ("%-50s %-10s", filename, setlocale (LC_ALL, NULL)); char *text = read_file (filename); word_list list = tokenize_string (text, delim); word_list *tests = malloc (INNER_LOOP_ITERS * sizeof (word_list)); for (i = 0; i < INNER_LOOP_ITERS; i++) tests[i] = copy_word_list (list); TIMING_NOW (start); for (i = 0; i < INNER_LOOP_ITERS; i++) qsort (tests[i].words, tests[i].size, sizeof (char *), compare_words); TIMING_NOW (stop); setlocale (LC_ALL, "en_US.UTF-8"); TIMING_DIFF (cur, start, stop); TIMING_PRINT_MEAN ((double) cur, (double) iters); putchar ('\n'); for (i = 0; i < INNER_LOOP_ITERS; i++) free_word_list (tests[i]); free (tests); free_word_list (list); free (text); } int do_bench (void) { if (setlocale (LC_ALL, "en_US.UTF-8") == NULL) { printf ("Failed to set default locale."); return 1; } timing_t res __attribute__ ((unused)); TIMING_INIT (res); test_file (filenames_file, FILENAMES_DELIMITER, "en_US.UTF-8"); size_t i; for (i = 0; i < (sizeof (li_files) / sizeof (li_files[0])); i++) test_file (li_files[i], LI_DELIMITER, li_locales[i]); return 0; } #define TEST_FUNCTION do_bench () /* On slower platforms this test needs more than the default 2 seconds. */ #define TIMEOUT 10 #include "../test-skeleton.c" diff --git a/localedata/Makefile b/localedata/Makefile index b6235f2..cb24974 100644 --- a/localedata/Makefile +++ b/localedata/Makefile @@ -106,7 +106,10 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \ hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \ nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \ zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \ - tr_TR.ISO-8859-9 en_GB.UTF-8 + tr_TR.ISO-8859-9 en_GB.UTF-8 vi_VN.UTF-8 ar_SA.UTF-8 zh_CN.UTF-8 \ + da_DK.UTF-8 pl_PL.UTF-8 pt_PT.UTF-8 el_GR.UTF-8 ru_RU.UTF-8 \ + iw_IL.UTF-8 es_ES.UTF-8 hi_IN.UTF-8 sv_SE.UTF-8 hu_HU.UTF-8 \ + is_IS.UTF-8 it_IT.UTF-8 sr_RS.UTF-8 LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g') CHARMAPS := $(shell echo "$(LOCALES)" | \ sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g) diff --git a/string/strcoll_l.c b/string/strcoll_l.c index d4f42a3..dbda23c 100644 --- a/string/strcoll_l.c +++ b/string/strcoll_l.c @@ -22,7 +22,6 @@ #include #include #include -#include #include #include @@ -55,8 +54,6 @@ typedef struct size_t backw; /* Current Backward sequence index. */ size_t backw_stop; /* Index where the backward sequences stop. */ const USTRING_TYPE *us; /* The string. */ - int32_t *idxarr; /* Array to cache weight indices. */ - unsigned char *rulearr; /* Array to cache rules. */ unsigned char rule; /* Saved rule for the first sequence. */ int32_t idx; /* Index to weight of the current sequence. */ int32_t save_idx; /* Save looked up index of a forward @@ -65,179 +62,9 @@ typedef struct const USTRING_TYPE *back_us; /* Beginning of the backward sequence. */ } coll_seq; -/* Get next sequence. The weight indices are cached, so we don't need to - traverse the string. */ -static void -get_next_seq_cached (coll_seq *seq, int nrules, int pass, - const unsigned char *rulesets, - const USTRING_TYPE *weights) -{ - size_t val = seq->val = 0; - int len = seq->len; - size_t backw_stop = seq->backw_stop; - size_t backw = seq->backw; - size_t idxcnt = seq->idxcnt; - size_t idxmax = seq->idxmax; - size_t idxnow = seq->idxnow; - unsigned char *rulearr = seq->rulearr; - int32_t *idxarr = seq->idxarr; - - while (len == 0) - { - ++val; - if (backw_stop != ~0ul) - { - /* There is something pushed. */ - if (backw == backw_stop) - { - /* The last pushed character was handled. Continue - with forward characters. */ - if (idxcnt < idxmax) - { - idxnow = idxcnt; - backw_stop = ~0ul; - } - else - { - /* Nothing any more. The backward sequence - ended with the last sequence in the string. */ - idxnow = ~0ul; - break; - } - } - else - idxnow = --backw; - } - else - { - backw_stop = idxcnt; - - while (idxcnt < idxmax) - { - if ((rulesets[rulearr[idxcnt] * nrules + pass] - & sort_backward) == 0) - /* No more backward characters to push. */ - break; - ++idxcnt; - } - - if (backw_stop == idxcnt) - { - /* No sequence at all or just one. */ - if (idxcnt == idxmax) - /* Note that LEN is still zero. */ - break; - - backw_stop = ~0ul; - idxnow = idxcnt++; - } - else - /* We pushed backward sequences. */ - idxnow = backw = idxcnt - 1; - } - len = weights[idxarr[idxnow]++]; - } - - /* Update the structure. */ - seq->val = val; - seq->len = len; - seq->backw_stop = backw_stop; - seq->backw = backw; - seq->idxcnt = idxcnt; - seq->idxnow = idxnow; -} - /* Get next sequence. Traverse the string as required. */ static void get_next_seq (coll_seq *seq, int nrules, const unsigned char *rulesets, - const USTRING_TYPE *weights, const int32_t *table, - const USTRING_TYPE *extra, const int32_t *indirect) -{ - size_t val = seq->val = 0; - int len = seq->len; - size_t backw_stop = seq->backw_stop; - size_t backw = seq->backw; - size_t idxcnt = seq->idxcnt; - size_t idxmax = seq->idxmax; - size_t idxnow = seq->idxnow; - unsigned char *rulearr = seq->rulearr; - int32_t *idxarr = seq->idxarr; - const USTRING_TYPE *us = seq->us; - - while (len == 0) - { - ++val; - if (backw_stop != ~0ul) - { - /* There is something pushed. */ - if (backw == backw_stop) - { - /* The last pushed character was handled. Continue - with forward characters. */ - if (idxcnt < idxmax) - { - idxnow = idxcnt; - backw_stop = ~0ul; - } - else - /* Nothing any more. The backward sequence ended with - the last sequence in the string. Note that LEN - is still zero. */ - break; - } - else - idxnow = --backw; - } - else - { - backw_stop = idxmax; - - while (*us != L('\0')) - { - int32_t tmp = findidx (table, indirect, extra, &us, -1); - rulearr[idxmax] = tmp >> 24; - idxarr[idxmax] = tmp & 0xffffff; - idxcnt = idxmax++; - - if ((rulesets[rulearr[idxcnt] * nrules] - & sort_backward) == 0) - /* No more backward characters to push. */ - break; - ++idxcnt; - } - - if (backw_stop >= idxcnt) - { - /* No sequence at all or just one. */ - if (idxcnt == idxmax || backw_stop > idxcnt) - /* Note that LEN is still zero. */ - break; - - backw_stop = ~0ul; - idxnow = idxcnt; - } - else - /* We pushed backward sequences. */ - idxnow = backw = idxcnt - 1; - } - len = weights[idxarr[idxnow]++]; - } - - /* Update the structure. */ - seq->val = val; - seq->len = len; - seq->backw_stop = backw_stop; - seq->backw = backw; - seq->idxcnt = idxcnt; - seq->idxmax = idxmax; - seq->idxnow = idxnow; - seq->us = us; -} - -/* Get next sequence. Traverse the string as required. This function does not - set or use any index or rule cache. */ -static void -get_next_seq_nocache (coll_seq *seq, int nrules, const unsigned char *rulesets, const USTRING_TYPE *weights, const int32_t *table, const USTRING_TYPE *extra, const int32_t *indirect, int pass) @@ -366,10 +193,9 @@ get_next_seq_nocache (coll_seq *seq, int nrules, const unsigned char *rulesets, seq->idx = idx; } -/* Compare two sequences. This version does not use the index and rules - cache. */ +/* Compare two sequences. */ static int -do_compare_nocache (coll_seq *seq1, coll_seq *seq2, int position, +do_compare (coll_seq *seq1, coll_seq *seq2, int position, const USTRING_TYPE *weights) { int seq1len = seq1->len; @@ -417,56 +243,6 @@ out: return result; } -/* Compare two sequences using the index cache. */ -static int -do_compare (coll_seq *seq1, coll_seq *seq2, int position, - const USTRING_TYPE *weights) -{ - int seq1len = seq1->len; - int seq2len = seq2->len; - size_t val1 = seq1->val; - size_t val2 = seq2->val; - int32_t *idx1arr = seq1->idxarr; - int32_t *idx2arr = seq2->idxarr; - int idx1now = seq1->idxnow; - int idx2now = seq2->idxnow; - int result = 0; - - /* Test for position if necessary. */ - if (position && val1 != val2) - { - result = val1 > val2 ? 1 : -1; - goto out; - } - - /* Compare the two sequences. */ - do - { - if (weights[idx1arr[idx1now]] != weights[idx2arr[idx2now]]) - { - /* The sequences differ. */ - result = weights[idx1arr[idx1now]] - weights[idx2arr[idx2now]]; - goto out; - } - - /* Increment the offsets. */ - ++idx1arr[idx1now]; - ++idx2arr[idx2now]; - - --seq1len; - --seq2len; - } - while (seq1len > 0 && seq2len > 0); - - if (position && seq1len != seq2len) - result = seq1len - seq2len; - -out: - seq1->len = seq1len; - seq2->len = seq2len; - return result; -} - int STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l) { @@ -483,6 +259,10 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l) if (nrules == 0) return STRCMP (s1, s2); + /* Catch empty strings. */ + if (__glibc_unlikely (*s1 == '\0') || __glibc_unlikely (*s2 == '\0')) + return (*s1 != '\0') - (*s2 != '\0'); + rulesets = (const unsigned char *) current->values[_NL_ITEM_INDEX (_NL_COLLATE_RULESETS)].string; table = (const int32_t *) @@ -499,65 +279,12 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l) assert (((uintptr_t) extra) % __alignof__ (extra[0]) == 0); assert (((uintptr_t) indirect) % __alignof__ (indirect[0]) == 0); - /* We need this a few times. */ - size_t s1len = STRLEN (s1); - size_t s2len = STRLEN (s2); - - /* Catch empty strings. */ - if (__glibc_unlikely (s1len == 0) || __glibc_unlikely (s2len == 0)) - return (s1len != 0) - (s2len != 0); - - /* Perform the first pass over the string and while doing this find - and store the weights for each character. Since we want this to - be as fast as possible we are using `alloca' to store the temporary - values. But since there is no limit on the length of the string - we have to use `malloc' if the string is too long. We should be - very conservative here. - - Please note that the localedef programs makes sure that `position' - is not used at the first level. */ + int result = 0, rule = 0; coll_seq seq1, seq2; - bool use_malloc = false; - int result = 0; - memset (&seq1, 0, sizeof (seq1)); seq2 = seq1; - size_t size_max = SIZE_MAX / (sizeof (int32_t) + 1); - - if (MIN (s1len, s2len) > size_max - || MAX (s1len, s2len) > size_max - MIN (s1len, s2len)) - { - /* If the strings are long enough to cause overflow in the size request, - then skip the allocation and proceed with the non-cached routines. */ - } - else if (! __libc_use_alloca ((s1len + s2len) * (sizeof (int32_t) + 1))) - { - seq1.idxarr = (int32_t *) malloc ((s1len + s2len) * (sizeof (int32_t) + 1)); - - /* If we failed to allocate memory, we leave everything as NULL so that - we use the nocache version of traversal and comparison functions. */ - if (seq1.idxarr != NULL) - { - seq2.idxarr = &seq1.idxarr[s1len]; - seq1.rulearr = (unsigned char *) &seq2.idxarr[s2len]; - seq2.rulearr = &seq1.rulearr[s1len]; - use_malloc = true; - } - } - else - { - seq1.idxarr = (int32_t *) alloca (s1len * sizeof (int32_t)); - seq2.idxarr = (int32_t *) alloca (s2len * sizeof (int32_t)); - seq1.rulearr = (unsigned char *) alloca (s1len); - seq2.rulearr = (unsigned char *) alloca (s2len); - } - - int rule = 0; - - /* Cache values in the first pass and if needed, use them in subsequent - passes. */ for (int pass = 0; pass < nrules; ++pass) { seq1.idxcnt = 0; @@ -575,64 +302,44 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l) seq2.us = (const USTRING_TYPE *) s2; /* We assume that if a rule has defined `position' in one section - this is true for all of them. */ + this is true for all of them. Please note that the localedef programs + makes sure that `position' is not used at the first level. */ + int position = rulesets[rule * nrules + pass] & sort_position; while (1) { - if (__glibc_unlikely (seq1.idxarr == NULL)) - { - get_next_seq_nocache (&seq1, nrules, rulesets, weights, table, + get_next_seq (&seq1, nrules, rulesets, weights, table, extra, indirect, pass); - get_next_seq_nocache (&seq2, nrules, rulesets, weights, table, + get_next_seq (&seq2, nrules, rulesets, weights, table, extra, indirect, pass); - } - else if (pass == 0) - { - get_next_seq (&seq1, nrules, rulesets, weights, table, extra, - indirect); - get_next_seq (&seq2, nrules, rulesets, weights, table, extra, - indirect); - } - else - { - get_next_seq_cached (&seq1, nrules, pass, rulesets, weights); - get_next_seq_cached (&seq2, nrules, pass, rulesets, weights); - } - /* See whether any or both strings are empty. */ if (seq1.len == 0 || seq2.len == 0) { if (seq1.len == seq2.len) - /* Both ended. So far so good, both strings are equal - at this level. */ - break; + { + /* Both ended. Both strings are equal at this level. */ + if (pass == 0 && STRCMP(s1, s2) == 0) + /* Shortcut to avoid looping through all levels + for totally equal strings. */ + return result; + else + break; + } /* This means one string is shorter than the other. Find out which one and return an appropriate value. */ - result = seq1.len == 0 ? -1 : 1; - goto free_and_return; + return seq1.len == 0 ? -1 : 1; } - if (__glibc_unlikely (seq1.idxarr == NULL)) - result = do_compare_nocache (&seq1, &seq2, position, weights); - else - result = do_compare (&seq1, &seq2, position, weights); + result = do_compare (&seq1, &seq2, position, weights); if (result != 0) - goto free_and_return; + return result; } - if (__glibc_likely (seq1.rulearr != NULL)) - rule = seq1.rulearr[0]; - else - rule = seq1.rule; + rule = seq1.rule; } - /* Free the memory if needed. */ - free_and_return: - if (use_malloc) - free (seq1.idxarr); - return result; } libc_hidden_def (STRCOLL) diff --git a/benchtests/Makefile b/benchtests/Makefile index fd3036d..e79ceee 100644 --- a/benchtests/Makefile +++ b/benchtests/Makefile @@ -34,7 +34,7 @@ string-bench := bcopy bzero memccpy memchr memcmp memcpy memmem memmove \ mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \ strcat strchr strchrnul strcmp strcpy strcspn strlen \ strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \ - strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok + strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok strcoll string-bench-all := $(string-bench) stdlib-bench := strtod