Message ID | d242d10d-8f83-1913-44d3-4c9fc49e0477@arm.com |
---|---|
State | Committed |
Headers | show |
* Szabolcs Nagy: > Passing a second argument to the ifunc resolver allows accessing > AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2 > on linux because for ilp32 to remain compatible with lp64 ABI no more > than 32bit hwcap flags can be in AT_HWCAP which is already used up. Is the expectation that ilp32 will always have the second argument, and therefore will not need the flag? > Old call ABI passed hwcap as uint64_t, new abi sets the > _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument > that's a pointer to an extendible struct. A resolver has to check > the _IFUNC_ARG_HWCAP flag before accessing the second argument. The _IFUNC_ARG_HWCAP flag is not reserved on the kernel side. I think it should be. Thanks, Florian
On 12/06/2019 11:51, Florian Weimer wrote: > * Szabolcs Nagy: > >> Passing a second argument to the ifunc resolver allows accessing >> AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2 >> on linux because for ilp32 to remain compatible with lp64 ABI no more >> than 32bit hwcap flags can be in AT_HWCAP which is already used up. > > Is the expectation that ilp32 will always have the second argument, and > therefore will not need the flag? i guess there is no backward compat issue on ilp32 because it's not merged yet upstream, but currently ilp32 resolver takes uint64_t argument while AT_HWCAP is 32bit, so ilp32 resolver code can do the same _IFUNC_ARG_HWCAP flag check as lp64, before it uses the second argument. >> Old call ABI passed hwcap as uint64_t, new abi sets the >> _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument >> that's a pointer to an extendible struct. A resolver has to check >> the _IFUNC_ARG_HWCAP flag before accessing the second argument. > > The _IFUNC_ARG_HWCAP flag is not reserved on the kernel side. I think > it should be. the kernel patch added documentation about it http://lists.infradead.org/pipermail/linux-arm-kernel/2019-April/646891.html the top two bits of a 64bit AT_HWCAP is guaranteed to be 0, right now only the bottom 32bits are used and new bits go to AT_HWCAP2 (but later if ilp32 support is dropped from the kernel they may use some of the top bits of AT_HWCAP)
* Szabolcs Nagy: > On 12/06/2019 11:51, Florian Weimer wrote: >> * Szabolcs Nagy: >> >>> Passing a second argument to the ifunc resolver allows accessing >>> AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2 >>> on linux because for ilp32 to remain compatible with lp64 ABI no more >>> than 32bit hwcap flags can be in AT_HWCAP which is already used up. >> >> Is the expectation that ilp32 will always have the second argument, and >> therefore will not need the flag? > > i guess there is no backward compat issue on ilp32 > because it's not merged yet upstream, Right. > but currently ilp32 resolver takes uint64_t argument > while AT_HWCAP is 32bit, so ilp32 resolver code can > do the same _IFUNC_ARG_HWCAP flag check as lp64, > before it uses the second argument. I see. >>> Old call ABI passed hwcap as uint64_t, new abi sets the >>> _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument >>> that's a pointer to an extendible struct. A resolver has to check >>> the _IFUNC_ARG_HWCAP flag before accessing the second argument. >> >> The _IFUNC_ARG_HWCAP flag is not reserved on the kernel side. I think >> it should be. > > the kernel patch added documentation about it > http://lists.infradead.org/pipermail/linux-arm-kernel/2019-April/646891.html > > the top two bits of a 64bit AT_HWCAP is guaranteed to be 0, > right now only the bottom 32bits are used and new bits go > to AT_HWCAP2 (but later if ilp32 support is dropped from > the kernel they may use some of the top bits of AT_HWCAP) Ah, I had only looked at the UAPI header. | For interoperation with userspace, the kernel guarantees that bits 62 | and 63 of AT_HWCAP will always be returned as 0. It's not as explicit as I would like (kind of like “reserved, must be zero“ in the Internet RFCs), but it's better than nothing. Thanks, Florian
On 12/06/2019 11:09, Szabolcs Nagy wrote: > Passing a second argument to the ifunc resolver allows accessing > AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2 > on linux because for ilp32 to remain compatible with lp64 ABI no more > than 32bit hwcap flags can be in AT_HWCAP which is already used up. for the record i'm still planning to commit this for 2.30. will try to do it next week. > > Currently the relocation ordering logic does not guarantee that ifunc > resolvers can call libc apis or access libc objects, so only the > resolver arguments and runtime environment dependent instructions can > be used to do the dispatch (this affects ifunc resolvers outside of > the libc). > > Since ifunc resolver is target specific and only supposed to be > called by the dynamic linker, the call ABI can be changed in a > backward compatible way: > > Old call ABI passed hwcap as uint64_t, new abi sets the > _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument > that's a pointer to an extendible struct. A resolver has to check > the _IFUNC_ARG_HWCAP flag before accessing the second argument. > > The new sys/ifunc.h installed header has the definitions for the > new ABI, everything is in the implementation reserved namespace. > > An alternative approach is to try to support extern calls from ifunc > resolvers such as getauxval, but that seems non-trivial > https://sourceware.org/ml/libc-alpha/2017-01/msg00468.html > > 2019-06-12 Szabolcs Nagy <szabolcs.nagy@arm.com> > > * sysdeps/aarch64/Makefile: Install sys/ifunc.h and add tests. > * sysdeps/aarch64/dl-irel.h (elf_ifunc_invoke): Update to new ABI. > * sysdeps/aarch64/sys/ifunc.h: New file. > * sysdeps/aarch64/tst-ifunc-arg-1.c: New file. > * sysdeps/aarch64/tst-ifunc-arg-2.c: New file. >
On 28/06/2019 17:35, Szabolcs Nagy wrote: > On 12/06/2019 11:09, Szabolcs Nagy wrote: >> Passing a second argument to the ifunc resolver allows accessing >> AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2 >> on linux because for ilp32 to remain compatible with lp64 ABI no more >> than 32bit hwcap flags can be in AT_HWCAP which is already used up. > > for the record i'm still planning to commit this for 2.30. > will try to do it next week. committed. >>> Currently the relocation ordering logic does not guarantee that ifunc >> resolvers can call libc apis or access libc objects, so only the >> resolver arguments and runtime environment dependent instructions can >> be used to do the dispatch (this affects ifunc resolvers outside of >> the libc). >> >> Since ifunc resolver is target specific and only supposed to be >> called by the dynamic linker, the call ABI can be changed in a >> backward compatible way: >> >> Old call ABI passed hwcap as uint64_t, new abi sets the >> _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument >> that's a pointer to an extendible struct. A resolver has to check >> the _IFUNC_ARG_HWCAP flag before accessing the second argument. >> >> The new sys/ifunc.h installed header has the definitions for the >> new ABI, everything is in the implementation reserved namespace. >> >> An alternative approach is to try to support extern calls from ifunc >> resolvers such as getauxval, but that seems non-trivial >> https://sourceware.org/ml/libc-alpha/2017-01/msg00468.html >> >> 2019-06-12 Szabolcs Nagy <szabolcs.nagy@arm.com> >> >> * sysdeps/aarch64/Makefile: Install sys/ifunc.h and add tests. >> * sysdeps/aarch64/dl-irel.h (elf_ifunc_invoke): Update to new ABI. >> * sysdeps/aarch64/sys/ifunc.h: New file. >> * sysdeps/aarch64/tst-ifunc-arg-1.c: New file. >> * sysdeps/aarch64/tst-ifunc-arg-2.c: New file. >> >
diff --git a/string/str-two-way.h b/string/str-two-way.h index b5011baafa..50fe6b4684 100644 --- a/string/str-two-way.h +++ b/string/str-two-way.h @@ -221,7 +221,7 @@ critical_factorization (const unsigned char *needle, size_t needle_len, most 2 * HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching. If AVAILABLE modifies HAYSTACK_LEN (as in strstr), then at most 3 * HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching. */ -static RETURN_TYPE +static inline RETURN_TYPE two_way_short_needle (const unsigned char *haystack, size_t haystack_len, const unsigned char *needle, size_t needle_len) { @@ -383,7 +383,7 @@ two_way_short_needle (const unsigned char *haystack, size_t haystack_len, If AVAILABLE modifies HAYSTACK_LEN (as in strstr), then at most 3 * HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching, and sublinear performance is not possible. */ -static RETURN_TYPE +__attribute__((noinline)) static RETURN_TYPE two_way_long_needle (const unsigned char *haystack, size_t haystack_len, const unsigned char *needle, size_t needle_len) { diff --git a/string/strstr.c b/string/strstr.c index 64e478b9e7..737585fff2 100644 --- a/string/strstr.c +++ b/string/strstr.c @@ -16,21 +16,12 @@ License along with the GNU C Library; if not, see <http://www.gnu.org/licenses/>. */ -/* This particular implementation was written by Eric Blake, 2008. */ - #ifndef _LIBC # include <config.h> #endif -/* Specification of strstr. */ #include <string.h> -#include <stdbool.h> - -#ifndef _LIBC -# define __builtin_expect(expr, val) (expr) -#endif - #define RETURN_TYPE char * #define AVAILABLE(h, h_l, j, n_l) \ (((j) + (n_l) <= (h_l)) \ @@ -47,47 +38,122 @@ #define STRSTR strstr #endif -/* Return the first occurrence of NEEDLE in HAYSTACK. Return HAYSTACK - if NEEDLE is empty, otherwise NULL if NEEDLE is not found in - HAYSTACK. */ -char * -STRSTR (const char *haystack, const char *needle) +static inline char * +strstr2 (const unsigned char *hs, const unsigned char *ne) { - size_t needle_len; /* Length of NEEDLE. */ - size_t haystack_len; /* Known minimum length of HAYSTACK. */ - - /* Handle empty NEEDLE special case. */ - if (needle[0] == '\0') - return (char *) haystack; + uint32_t h1 = (ne[0] << 16) | ne[1]; + uint32_t h2 = 0; + for (int c = hs[0]; h1 != h2 && c != 0; c = *++hs) + h2 = (h2 << 16) | c; + return h1 == h2 ? (char *)hs - 2 : NULL; +} - /* Skip until we find the first matching char from NEEDLE. */ - haystack = strchr (haystack, needle[0]); - if (haystack == NULL || needle[1] == '\0') - return (char *) haystack; +static inline char * +strstr3 (const unsigned char *hs, const unsigned char *ne) +{ + uint32_t h1 = (ne[0] << 24) | (ne[1] << 16) | (ne[2] << 8); + uint32_t h2 = 0; + for (int c = hs[0]; h1 != h2 && c != 0; c = *++hs) + h2 = (h2 | c) << 8; + return h1 == h2 ? (char *)hs - 3 : NULL; +} - /* Ensure HAYSTACK length is at least as long as NEEDLE length. - Since a match may occur early on in a huge HAYSTACK, use strnlen +#define hash2(p) (((size_t)(p)[0] - ((size_t)(p)[-1] << 3)) % sizeof (shift)) + +/* Fast strstr algorithm with guaranteed linear-time performance. + Small needles up to size 2 use a dedicated linear search. Longer needles + up to size 256 use a novel modified Horspool algorithm. It hashes pairs + of characters to quickly skip past mismatches. The main search loop only + exits if the last 2 characters match, avoiding unnecessary calls to memcmp + and allowing for a larger skip if there is no match. A self-adapting + filtering check is used to quickly detect mismatches in long needles. + By limiting the needle length to 256, the shift table can be reduced to 8 + bits per entry, lowering preprocessing overhead and minimizing cache effects. + The limit also implies worst-case performance is linear. + Needles larger than 256 characters use the linear-time Two-Way algorithm. */ +char * +STRSTR (const char *haystack, const char *needle) +{ + const unsigned char *hs = (const unsigned char *) haystack; + const unsigned char *ne = (const unsigned char *) needle; + + /* Handle short needle special cases first. */ + if (ne[0] == '\0') + return (char *)hs; + hs = (const unsigned char *)strchr ((const char*)hs, ne[0]); + if (hs == NULL || ne[1] == '\0') + return (char*)hs; + if (ne[2] == '\0') + return strstr2 (hs, ne); + if (ne[3] == '\0') + return strstr3 (hs, ne); + + /* Ensure haystack length is at least as long as needle length. + Since a match may occur early on in a huge haystack, use strnlen and read ahead a few cachelines for improved performance. */ - needle_len = strlen (needle); - haystack_len = __strnlen (haystack, needle_len + 256); - if (haystack_len < needle_len) + size_t ne_len = strlen ((const char*)ne); + size_t hs_len = __strnlen ((const char*)hs, ne_len | 512); + if (hs_len < ne_len) return NULL; - /* Check whether we have a match. This improves performance since we avoid - the initialization overhead of the two-way algorithm. */ - if (memcmp (haystack, needle, needle_len) == 0) - return (char *) haystack; - - /* Perform the search. Abstract memory is considered to be an array - of 'unsigned char' values, not an array of 'char' values. See - ISO C 99 section 6.2.6.1. */ - if (needle_len < LONG_NEEDLE_THRESHOLD) - return two_way_short_needle ((const unsigned char *) haystack, - haystack_len, - (const unsigned char *) needle, needle_len); - return two_way_long_needle ((const unsigned char *) haystack, haystack_len, - (const unsigned char *) needle, needle_len); + /* Check whether we have a match. This improves performance since we + avoid initialization overheads. */ + if (memcmp (hs, ne, ne_len) == 0) + return (char *) hs; + + /* Use Two-Way algorithm for very long needles. */ + if (__glibc_unlikely (ne_len > 256)) + return two_way_long_needle (hs, hs_len, ne, ne_len); + + const unsigned char *end = hs + hs_len - ne_len; + uint8_t shift[256]; + size_t tmp, shift1; + size_t m1 = ne_len - 1; + size_t offset = 0; + + /* Initialize bad character shift hash table. */ + memset (shift, 0, sizeof (shift)); + for (int i = 1; i < m1; i++) + shift[hash2 (ne + i)] = i; + shift1 = m1 - shift[hash2 (ne + m1)]; + shift[hash2 (ne + m1)] = m1; + + while (1) + { + if (__glibc_unlikely (hs > end)) + { + end += __strnlen ((const char*)end + m1 + 1, 2048); + if (hs > end) + return NULL; + } + + /* Skip past character pairs not in the needle. */ + do + { + hs += m1; + tmp = shift[hash2 (hs)]; + } + while (hs <= end && tmp == 0); + + /* If the match is not at the end of the needle, shift to the end + and continue until we match the last 2 characters. */ + hs -= tmp; + if (tmp < m1) + continue; + + /* The last 2 characters match. If the needle is long, check a + fixed number of characters first to quickly filter out mismatches. */ + if (memcmp (hs + offset, ne + offset, sizeof (int)) == 0) + { + if (memcmp (hs, ne, m1) == 0) + return (void *) hs; + + /* Adjust filter offset when it doesn't find the mismatch. */ + offset = (offset >= sizeof (int) ? offset : m1) - sizeof (int); + } + + /* Skip based on matching the last 2 characters. */ + hs += shift1; + } } libc_hidden_builtin_def (strstr) - -#undef LONG_NEEDLE_THRESHOLD