From patchwork Wed Jun 3 10:52:14 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ondrej Bilka X-Patchwork-Id: 7024 Received: (qmail 54867 invoked by alias); 3 Jun 2015 10:53:43 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 54828 invoked by uid 89); 3 Jun 2015 10:53:42 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=2.1 required=5.0 tests=AWL, BAYES_99, BAYES_999, FREEMAIL_FROM, SPF_NEUTRAL autolearn=no version=3.3.2 X-HELO: popelka.ms.mff.cuni.cz Date: Wed, 3 Jun 2015 12:52:14 +0200 From: =?utf-8?B?T25kxZllaiBCw61sa2E=?= To: libc-alpha@sourceware.org Subject: [PATCH v6 4/*] Generic string search functions (strstr, strcasestr, memmem) Message-ID: <20150603105214.GA31807@domone> References: <20150531190013.GA8530@domone> <20150601131146.GA22285@domone> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20150601131146.GA22285@domone> User-Agent: Mutt/1.5.20 (2009-06-14) Hi, I collected strstr profile trace and it was surprising. The biggest one is that strstr has in 26.5% needle of size 1. I expected that as gcc already transforms strstr(x,"c") to strchr that it would be rare but no, that made me optimize these cases as likely. Then as I mentioned in tile optimization thread I somewhat though that tolower will also ignore diacritics to make search also diacritic-insensitive. That isn't case which allow me to optimize strcasestr more. I use that only two characters x for which tolower(x) == tolower(c) are tolower(c) and toupper(c). This doesn't have to hold for all encodings, I add _NL_CTYPE_NONBIJECTIVE_CASE to test that property. For my computer there are following average values in bytes: average size 24.06 needle size 4.31 comparisons 25.07 digraphs 0.10 trigraphs 0.08 calls 969516 succeed 40.7% haystack aligned to 4 bytes 74.9% aligned to 8 bytes 71.7% aligned to 16 bytes 62.8% needle aligned to 4 bytes 48.6% aligned to 8 bytes 29.7% aligned to 16 bytes 29.1% needle found in n bytes: n <= 0: 6.2% n <= 1: 7.3% n <= 2: 9.7% n <= 3: 12.8% n <= 4: 13.6% n <= 8: 16.4% n <= 16: 55.2% n <= 32: 72.3% n <= 64: 96.4% needle size: n <= 0: 1.7% n <= 1: 27.2% n <= 2: 60.3% n <= 3: 62.4% n <= 4: 64.2% n <= 8: 84.5% n <= 16: 98.8% n <= 32: 99.9% n <= 64: 100.0% * benchtests/bench-strcasestr.c: Remove simple_strcasestr. * string/test-strcasestr.c: Likewise. * locale/categories.def: Add _NL_CTYPE_NONBIJECTIVE_CASE. * locale/C-ctype.c: Likewise. * locale/langinfo.h (enum): Likewise. * locale/programs/ld-ctype.c: Detect _NL_CTYPE_NONBIJECTIVE_CASE. * string/memmem.c (struct): Use string skeleton. * string/strcasestr.c (struct): Likewise. * string/strstr.c (struct): Likewise. * sysdeps/generic/string_vector_search.h: New file. diff --git a/benchtests/bench-strcasestr.c b/benchtests/bench-strcasestr.c index 33531a4..b13efe0 100644 --- a/benchtests/bench-strcasestr.c +++ b/benchtests/bench-strcasestr.c @@ -20,11 +20,7 @@ #define TEST_NAME "strcasestr" #include "bench-string.h" - -#define STRCASESTR simple_strcasestr -#define NO_ALIAS -#define __strncasecmp strncasecmp -#include "../string/strcasestr.c" +#include static char * @@ -53,7 +49,6 @@ stupid_strcasestr (const char *s1, const char *s2) typedef char *(*proto_t) (const char *, const char *); IMPL (stupid_strcasestr, 0) -IMPL (simple_strcasestr, 0) IMPL (strcasestr, 1) @@ -86,10 +81,9 @@ do_test (size_t align1, size_t align2, size_t len1, size_t len2, static const char d[] = "1234567890abcxyz"; #define dl (sizeof (d) - 1) char *ss2 = s2; - for (size_t l = len2; l > 0; l = l > dl ? l - dl : 0) + for (size_t l = 0; l < len2; l++) { - size_t t = l > dl ? dl : l; - ss2 = mempcpy (ss2, d, t); + ss2[l] = ((unsigned)rand())%16 + 1; } s2[len2] = '\0'; diff --git a/locale/C-ctype.c b/locale/C-ctype.c index 7c616d8..61ee2a7 100644 --- a/locale/C-ctype.c +++ b/locale/C-ctype.c @@ -528,7 +528,7 @@ _nl_C_LC_CTYPE_width attribute_hidden = }; /* Number of fields with fixed meanings, starting at 0. */ -#define NR_FIXED 72 +#define NR_FIXED 73 /* Number of class fields, starting at CLASS_OFFSET. */ #define NR_CLASSES 12 /* Number of map fields, starting at MAP_OFFSET. */ @@ -669,6 +669,8 @@ const struct __locale_data _nl_C_LC_CTYPE attribute_hidden = { .word = 0 }, /* _NL_CTYPE_NONASCII_CASE */ { .word = 0 }, + /* _NL_CTYPE_NONBIJECTIVE_CASE */ + { .word = 0 }, /* NR_CLASSES wctype_tables */ { .string = (const char *) _nl_C_LC_CTYPE_class_upper.header }, { .string = (const char *) _nl_C_LC_CTYPE_class_lower.header }, diff --git a/locale/categories.def b/locale/categories.def index 045489d..338ca36 100644 --- a/locale/categories.def +++ b/locale/categories.def @@ -135,6 +135,7 @@ DEFINE_CATEGORY DEFINE_ELEMENT (_NL_CTYPE_TRANSLIT_IGNORE, "ctype-translit-ignore", std, string) DEFINE_ELEMENT (_NL_CTYPE_MAP_TO_NONASCII, "map-to-nonascii", std, word) DEFINE_ELEMENT (_NL_CTYPE_NONASCII_CASE, "nonascii-case", std, word) + DEFINE_ELEMENT (_NL_CTYPE_NONBIJECTIVE_CASE, "nonbijective-case", std, word) ), _nl_postload_ctype) diff --git a/locale/langinfo.h b/locale/langinfo.h index ffc5c7f..093b840 100644 --- a/locale/langinfo.h +++ b/locale/langinfo.h @@ -335,6 +335,7 @@ enum _NL_CTYPE_TRANSLIT_IGNORE, _NL_CTYPE_MAP_TO_NONASCII, _NL_CTYPE_NONASCII_CASE, + _NL_CTYPE_NONBIJECTIVE_CASE, _NL_CTYPE_EXTRA_MAP_1, _NL_CTYPE_EXTRA_MAP_2, _NL_CTYPE_EXTRA_MAP_3, diff --git a/locale/programs/ld-ctype.c b/locale/programs/ld-ctype.c index e8690f3..3968809 100644 --- a/locale/programs/ld-ctype.c +++ b/locale/programs/ld-ctype.c @@ -227,6 +227,8 @@ struct locale_ctype_t uint32_t to_nonascii; uint32_t nonascii_case; + uint32_t nonbijective_case; + /* The arrays for the binary representation. */ char_class_t *ctype_b; @@ -691,6 +693,13 @@ character not defined in character map"))); || ctype->map256_collection[1][cnt] != cnt) ctype->nonascii_case = 1; + + for (cnt = 0; cnt < 256; ++cnt) + if (ctype->map256_collection[0][ctype->map256_collection[1][cnt]] != cnt && + ctype->map256_collection[1][ctype->map256_collection[0][cnt]] != cnt) + ctype->nonbijective_case = 1; + + /* Now that the tests are done make sure the name array contains all characters which are handled in the WIDTH section of the character set definition file. */ @@ -1063,6 +1072,9 @@ ctype_output (struct localedef_t *locale, const struct charmap_t *charmap, CTYPE_UINT32 (_NL_CTYPE_NONASCII_CASE, ctype->nonascii_case); + CTYPE_UINT32 (_NL_CTYPE_NONBIJECTIVE_CASE, ctype->nonbijective_case); + + case _NL_ITEM_INDEX (_NL_CTYPE_INDIGITS_MB_LEN): add_locale_uint32 (&file, ctype->mbdigits_act / 10); break; diff --git a/string/memmem.c b/string/memmem.c index 8a81f65..9add37e 100644 --- a/string/memmem.c +++ b/string/memmem.c @@ -35,6 +35,45 @@ #undef memmem +#include + +struct cmask +{ + unsigned long int n0, n1; + size_t needle_size; +}; +#define CUSTOM_CMASK +#define CMASK_PARAM struct cmask cmask + +#define LOOKAHEAD 1 +#define EXPRESSION_NOCARRY(x, cmask) \ + ( bytes_equal_nocarry (BYTE (0), cmask.n0) \ + & bytes_equal_nocarry (BYTE (1), cmask.n1)) + +/* We calculate mask, not look for first byte and carry could corrupt + following bytes. Only possible optimization would be use + contains_zero (x) as it must end there. */ +#define EXPRESSION(x, cmask) EXPRESSION_NOCARRY(x, cmask) + +static int +check (char *s, char *n, unsigned long *rent, struct cmask cmask) +{ + /* First two characters were already checked by vector loop. */ + size_t i = 2; + + while (i < cmask.needle_size && s[i] == n[i]) + i++; + + if (i == cmask.needle_size) + return 1; + + rent += i + 10; + return 0; +} + +#define CHECK_N +#include + /* Return the first occurrence of NEEDLE in HAYSTACK. Return HAYSTACK if NEEDLE_LEN is 0, otherwise NULL if NEEDLE is not found in HAYSTACK. */ @@ -46,17 +85,43 @@ __memmem (const void *haystack_start, size_t haystack_len, not an array of 'char' values. See ISO C 99 section 6.2.6.1. */ const unsigned char *haystack = (const unsigned char *) haystack_start; const unsigned char *needle = (const unsigned char *) needle_start; + char *ret; + unsigned char *n = (unsigned char *) needle; if (needle_len == 0) /* The first occurrence of the empty string is deemed to occur at the beginning of the string. */ return (void *) haystack; + if (needle_len == 1) + return memchr (haystack, n[0], haystack_len); + /* Sanity check, otherwise the loop might search through the whole memory. */ if (__glibc_unlikely (haystack_len < needle_len)) return NULL; + if (__BYTE_ORDER == __LITTLE_ENDIAN) + { + struct cmask cmask; + cmask.n0 = broadcast (n[0]); + cmask.n1 = broadcast (n[1]); + cmask.needle_size = needle_len; + size_t search_size = haystack_len - needle_len + 1; + if (vector_search ((char *) haystack, (char *) needle, + cmask, search_size, &ret)) + return (void *) ret; + else + { + haystack_len -= (char *) ret - (char *) haystack; + haystack = (const unsigned char *) ret; + + if (ret == NULL || haystack_len < needle_len) + return NULL; + } + } + + /* Use optimizations in memchr when possible, to reduce the search size of haystack using a linear algorithm with a smaller coefficient. However, avoid memchr for long needles, since we diff --git a/string/strcasestr.c b/string/strcasestr.c index 400fab8..db7f1cd 100644 --- a/string/strcasestr.c +++ b/string/strcasestr.c @@ -57,6 +57,60 @@ #define STRCASESTR __strcasestr #endif +/* A initial fast vectorized loop looking for first digraph. */ + +#include + +struct cmask +{ + vector_int l0, u0, l1, u1; +}; +#define CUSTOM_CMASK +#define CMASK_PARAM struct cmask cmask + + /* We use trick that for when character is in ascii and character + 0 <= c <= 127 could be caselessly equal only one of characters + tolower (c), toupper (c) or a character x in range 128 <= x <= 255 + As these are exactly characters have highest bit set to 1 we adjust + a expression from strstr and filter lower bits by anding with high_bits. + */ + +#define LOOKAHEAD 1 +#define BYTE_EXPRESSION(x, l, u) \ + ( bytes_equal_nocarry (x, l) \ + | bytes_equal_nocarry (x, u)) + +#define EXPRESSION_NOCARRY(x, cmask) (( \ + ( BYTE_EXPRESSION (BYTE(0), cmask.l0, cmask.u0) \ + & BYTE_EXPRESSION (BYTE(1), cmask.l1, cmask.u1) \ + ) | contains_zero_nocarry (BYTE (1)))) + +/* We calculate mask, not look for first byte and carry could corrupt + following bytes. Only possible optimization would be use + contains_zero (x) as it must end there. */ +#define EXPRESSION(x, cmask) EXPRESSION_NOCARRY (x, cmask) + +static int +check (char *s, char *n, unsigned long *rent, struct cmask cmask) +{ + size_t i = 2; + + if (!s[1]) + return 1; + + while (n[i] && tolower (s[i]) == tolower (n[i])) + i++; + + if (!n[i]) + return 1; + + rent += i + 10; + return 0; +} + +#include + +#include "../locale/localeinfo.h" /* Find the first occurrence of NEEDLE in HAYSTACK, using case-insensitive comparison. This function gives unspecified @@ -66,9 +120,46 @@ STRCASESTR (const char *haystack_start, const char *needle_start) { const char *haystack = haystack_start; const char *needle = needle_start; + char *ret; size_t needle_len; /* Length of NEEDLE. */ size_t haystack_len; /* Known minimum length of HAYSTACK. */ bool ok = true; /* True if NEEDLE is prefix of HAYSTACK. */ + unsigned char *n = (unsigned char *) needle; + + __locale_t loc = _NL_CURRENT_LOCALE; + struct __locale_data *ctype = loc->__locales[LC_CTYPE]; + int bijective = !ctype->values[_NL_ITEM_INDEX (_NL_CTYPE_NONBIJECTIVE_CASE)].word; + + if (!n[0]) + return (char *) haystack; + if (!haystack[0]) + return NULL; + + /* TODO jump to special case of strpbrk with size 2. */ + if (!n[1] && bijective) + { + char ac[3]; + ac[0]=tolower(n[0]); + ac[1]=toupper(n[0]); + ac[2]=0; + return strpbrk (haystack, ac); + } + + if (__BYTE_ORDER == __LITTLE_ENDIAN && bijective) + { + struct cmask cmask; + cmask.l0 = broadcast (tolower(n[0])); + cmask.u0 = broadcast (toupper(n[0])); + cmask.l1 = broadcast (tolower(n[1])); + cmask.u1 = broadcast (toupper(n[1])); + + if (vector_search ((char *) haystack, (char *) needle, cmask,\ + 0, &ret)) + return (ret[1] != '\0') ? ret : NULL; + else + haystack = ret; + } + /* Determine length of NEEDLE, and in the process, make sure HAYSTACK is at least as long (no point processing all of a long diff --git a/string/strstr.c b/string/strstr.c index 045e878..f1047e9 100644 --- a/string/strstr.c +++ b/string/strstr.c @@ -45,6 +45,48 @@ #define STRSTR strstr #endif +#include + +struct cmask +{ + vector_int n0, n1; +}; +#define CUSTOM_CMASK +#define CMASK_PARAM struct cmask cmask + +#define LOOKAHEAD 1 +#define EXPRESSION_NOCARRY(x, cmask) (\ + ( bytes_equal_nocarry (BYTE (0), cmask.n0) \ + & bytes_equal_nocarry (BYTE (1), cmask.n1)) \ + | contains_zero_nocarry (BYTE (1))) + +/* We calculate mask, not look for first byte and carry could corrupt + following bytes. Only possible optimization would be use + contains_zero (x) as it must end there. */ +#define EXPRESSION(x, cmask) EXPRESSION_NOCARRY(x, cmask) + +static int +check (char *s, char *n, unsigned long *rent, struct cmask cmask) +{ + /* First two characters were already checked by vector loop. */ + size_t i = 2; + if (!s[1]) + return 1; + + while (n[i] && s[i] == n[i]) + i++; + + if (!n[i]) + return 1; + + rent += i + 10; + return 0; +} + +#include + + + /* Return the first occurrence of NEEDLE in HAYSTACK. Return HAYSTACK if NEEDLE is empty, otherwise NULL if NEEDLE is not found in HAYSTACK. */ @@ -53,9 +95,34 @@ STRSTR (const char *haystack_start, const char *needle_start) { const char *haystack = haystack_start; const char *needle = needle_start; + char *ret; size_t needle_len; /* Length of NEEDLE. */ size_t haystack_len; /* Known minimum length of HAYSTACK. */ bool ok = true; /* True if NEEDLE is prefix of HAYSTACK. */ + unsigned char *n = (unsigned char *) needle; + + if (!n[0]) + return (char *) haystack; + if (!n[1]) + return strchr (haystack, n[0]); + + if (!haystack[0]) + return NULL; + + if (__BYTE_ORDER == __LITTLE_ENDIAN) + { + struct cmask cmask; + cmask.n0 = broadcast (n[0]); + cmask.n1 = broadcast (n[1]); + + if (vector_search ((char *) haystack, (char *) needle, cmask,\ + 0, &ret)) + return ret[1] != '\0' ? ret : NULL; + else + haystack = ret; + } + + /* Determine length of NEEDLE, and in the process, make sure HAYSTACK is at least as long (no point processing all of a long diff --git a/string/test-strcasestr.c b/string/test-strcasestr.c index 489dc84..f8a7ee9 100644 --- a/string/test-strcasestr.c +++ b/string/test-strcasestr.c @@ -21,12 +21,7 @@ #define TEST_NAME "strcasestr" #include "test-string.h" - -#define STRCASESTR simple_strcasestr -#define NO_ALIAS -#define __strncasecmp strncasecmp -#include "strcasestr.c" - +#include static char * stupid_strcasestr (const char *s1, const char *s2) @@ -54,7 +49,6 @@ stupid_strcasestr (const char *s1, const char *s2) typedef char *(*proto_t) (const char *, const char *); IMPL (stupid_strcasestr, 0) -IMPL (simple_strcasestr, 0) IMPL (strcasestr, 1) diff --git a/sysdeps/generic/string_vector_search.h b/sysdeps/generic/string_vector_search.h new file mode 100644 index 0000000..263fc25 --- /dev/null +++ b/sysdeps/generic/string_vector_search.h @@ -0,0 +1,89 @@ +/* Algorithm to select a faster string search implementation. + Copyright (C) 2015 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* + For string search algorithms there are two ways how you could look + at performance. + + From practical standpoint a naive algorithm is baseline. Most theoretical + algorithms (BW, KMP, ...) are slower because they need to do expensive + random access. This makes prohibitive constant cost per character while naive + algorithm is linear on random inputs as most of time you don't match string. + Also potential gains are nonexistent, user supplies needle so you could be + happy when its at least 8 characters long and most haystacks are less than 64 + bytes large. + + On other hand we need to ensure a linear worst-case of our algorithm. + + We do that with bit of accounting. We count number of comparisons and when + it exceeds a size of haystack times some constant we switch to two-way + algorithm with guaranteed linear time. + + Main performance boost is gained by vectorizing naive algorithm checks. + We will check in parallel if first digraph matches. That should be quite rare, + in english most frequent digraph is th with frequency around 1%. + + Then after each digraph found we face decision if to keep looking for + digraphs or switch to two-way algorithm. These are covered as common + problem in online algorithm setting: buy-or-rent problem. + A precomputation in two-way algorithm with needle size n takes + around same time as 20 character comparisons so in worst case a two-way + algorithm would be twenty times slower than naive for haystacks of size n+1. + A solution would be pay higher rent rate until it accumulates to buy cost. + Then we would in worst case be twice slower than selecting optimal + implementation from start. + + That would work except it needs strlen (needle) which is unnecessary + in practice. To show that haystack cannot match needle it suffices to know + that there is no leading triplet from needle in haystack. As unless there + was a planted needle in haystack and false positive is unlikely we likely + don't have to inspect more than three or four characters from needle. Also + correct accounting takes time so we approximate cost based on number of + comparisons and vector searchs. + + */ + +#include + +static int +vector_search (char *haystack, char *needle, CMASK_PARAM, + size_t check_n, char **ret) +{ + char *s = haystack; + char *end = s + check_n; + unsigned long rent = 0; + while (rent < 256 + (s - haystack)) + { + s = string_skeleton (s, cmask, end - s); + if (s == NULL) + { + *ret = s; + return 1; + } + if (check (s, needle, &rent, cmask)) + { + *ret = s; + return 1; + } + else + s++; + } + /* Superlinear behaviour detected, switch to two-way algorithm. */ + *ret = s; + return 0; +}