[RFC] aarch64: new ifunc resolver ABI

Message ID d242d10d-8f83-1913-44d3-4c9fc49e0477@arm.com
State Committed
Headers

Commit Message

Szabolcs Nagy June 12, 2019, 10:09 a.m. UTC
  Passing a second argument to the ifunc resolver allows accessing
AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2
on linux because for ilp32 to remain compatible with lp64 ABI no more
than 32bit hwcap flags can be in AT_HWCAP which is already used up.

Currently the relocation ordering logic does not guarantee that ifunc
resolvers can call libc apis or access libc objects, so only the
resolver arguments and runtime environment dependent instructions can
be used to do the dispatch (this affects ifunc resolvers outside of
the libc).

Since ifunc resolver is target specific and only supposed to be
called by the dynamic linker, the call ABI can be changed in a
backward compatible way:

Old call ABI passed hwcap as uint64_t, new abi sets the
_IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument
that's a pointer to an extendible struct. A resolver has to check
the _IFUNC_ARG_HWCAP flag before accessing the second argument.

The new sys/ifunc.h installed header has the definitions for the
new ABI, everything is in the implementation reserved namespace.

An alternative approach is to try to support extern calls from ifunc
resolvers such as getauxval, but that seems non-trivial
https://sourceware.org/ml/libc-alpha/2017-01/msg00468.html

2019-06-12  Szabolcs Nagy  <szabolcs.nagy@arm.com>

	* sysdeps/aarch64/Makefile: Install sys/ifunc.h and add tests.
	* sysdeps/aarch64/dl-irel.h (elf_ifunc_invoke): Update to new ABI.
	* sysdeps/aarch64/sys/ifunc.h: New file.
	* sysdeps/aarch64/tst-ifunc-arg-1.c: New file.
	* sysdeps/aarch64/tst-ifunc-arg-2.c: New file.
  

Comments

Florian Weimer June 12, 2019, 10:51 a.m. UTC | #1
* Szabolcs Nagy:

> Passing a second argument to the ifunc resolver allows accessing
> AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2
> on linux because for ilp32 to remain compatible with lp64 ABI no more
> than 32bit hwcap flags can be in AT_HWCAP which is already used up.

Is the expectation that ilp32 will always have the second argument, and
therefore will not need the flag?

> Old call ABI passed hwcap as uint64_t, new abi sets the
> _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument
> that's a pointer to an extendible struct. A resolver has to check
> the _IFUNC_ARG_HWCAP flag before accessing the second argument.

The _IFUNC_ARG_HWCAP flag is not reserved on the kernel side.  I think
it should be.

Thanks,
Florian
  
Szabolcs Nagy June 12, 2019, 11:14 a.m. UTC | #2
On 12/06/2019 11:51, Florian Weimer wrote:
> * Szabolcs Nagy:

> 

>> Passing a second argument to the ifunc resolver allows accessing

>> AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2

>> on linux because for ilp32 to remain compatible with lp64 ABI no more

>> than 32bit hwcap flags can be in AT_HWCAP which is already used up.

> 

> Is the expectation that ilp32 will always have the second argument, and

> therefore will not need the flag?


i guess there is no backward compat issue on ilp32
because it's not merged yet upstream,

but currently ilp32 resolver takes uint64_t argument
while AT_HWCAP is 32bit, so ilp32 resolver code can
do the same _IFUNC_ARG_HWCAP flag check as lp64,
before it uses the second argument.

>> Old call ABI passed hwcap as uint64_t, new abi sets the

>> _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument

>> that's a pointer to an extendible struct. A resolver has to check

>> the _IFUNC_ARG_HWCAP flag before accessing the second argument.

> 

> The _IFUNC_ARG_HWCAP flag is not reserved on the kernel side.  I think

> it should be.


the kernel patch added documentation about it
http://lists.infradead.org/pipermail/linux-arm-kernel/2019-April/646891.html

the top two bits of a 64bit AT_HWCAP is guaranteed to be 0,
right now only the bottom 32bits are used and new bits go
to AT_HWCAP2 (but later if ilp32 support is dropped from
the kernel they may use some of the top bits of AT_HWCAP)
  
Florian Weimer June 12, 2019, 11:20 a.m. UTC | #3
* Szabolcs Nagy:

> On 12/06/2019 11:51, Florian Weimer wrote:
>> * Szabolcs Nagy:
>> 
>>> Passing a second argument to the ifunc resolver allows accessing
>>> AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2
>>> on linux because for ilp32 to remain compatible with lp64 ABI no more
>>> than 32bit hwcap flags can be in AT_HWCAP which is already used up.
>> 
>> Is the expectation that ilp32 will always have the second argument, and
>> therefore will not need the flag?
>
> i guess there is no backward compat issue on ilp32
> because it's not merged yet upstream,

Right.

> but currently ilp32 resolver takes uint64_t argument
> while AT_HWCAP is 32bit, so ilp32 resolver code can
> do the same _IFUNC_ARG_HWCAP flag check as lp64,
> before it uses the second argument.

I see.

>>> Old call ABI passed hwcap as uint64_t, new abi sets the
>>> _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument
>>> that's a pointer to an extendible struct. A resolver has to check
>>> the _IFUNC_ARG_HWCAP flag before accessing the second argument.
>> 
>> The _IFUNC_ARG_HWCAP flag is not reserved on the kernel side.  I think
>> it should be.
>
> the kernel patch added documentation about it
> http://lists.infradead.org/pipermail/linux-arm-kernel/2019-April/646891.html
>
> the top two bits of a 64bit AT_HWCAP is guaranteed to be 0,
> right now only the bottom 32bits are used and new bits go
> to AT_HWCAP2 (but later if ilp32 support is dropped from
> the kernel they may use some of the top bits of AT_HWCAP)

Ah, I had only looked at the UAPI header.

| For interoperation with userspace, the kernel guarantees that bits 62
| and 63 of AT_HWCAP will always be returned as 0.

It's not as explicit as I would like (kind of like “reserved, must be
zero“ in the Internet RFCs), but it's better than nothing.

Thanks,
Florian
  
Szabolcs Nagy June 28, 2019, 4:35 p.m. UTC | #4
On 12/06/2019 11:09, Szabolcs Nagy wrote:
> Passing a second argument to the ifunc resolver allows accessing

> AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2

> on linux because for ilp32 to remain compatible with lp64 ABI no more

> than 32bit hwcap flags can be in AT_HWCAP which is already used up.


for the record i'm still planning to commit this for 2.30.
will try to do it next week.

> > Currently the relocation ordering logic does not guarantee that ifunc

> resolvers can call libc apis or access libc objects, so only the

> resolver arguments and runtime environment dependent instructions can

> be used to do the dispatch (this affects ifunc resolvers outside of

> the libc).

> 

> Since ifunc resolver is target specific and only supposed to be

> called by the dynamic linker, the call ABI can be changed in a

> backward compatible way:

> 

> Old call ABI passed hwcap as uint64_t, new abi sets the

> _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument

> that's a pointer to an extendible struct. A resolver has to check

> the _IFUNC_ARG_HWCAP flag before accessing the second argument.

> 

> The new sys/ifunc.h installed header has the definitions for the

> new ABI, everything is in the implementation reserved namespace.

> 

> An alternative approach is to try to support extern calls from ifunc

> resolvers such as getauxval, but that seems non-trivial

> https://sourceware.org/ml/libc-alpha/2017-01/msg00468.html

> 

> 2019-06-12  Szabolcs Nagy  <szabolcs.nagy@arm.com>

> 

> 	* sysdeps/aarch64/Makefile: Install sys/ifunc.h and add tests.

> 	* sysdeps/aarch64/dl-irel.h (elf_ifunc_invoke): Update to new ABI.

> 	* sysdeps/aarch64/sys/ifunc.h: New file.

> 	* sysdeps/aarch64/tst-ifunc-arg-1.c: New file.

> 	* sysdeps/aarch64/tst-ifunc-arg-2.c: New file.

>
  
Szabolcs Nagy July 4, 2019, 10:16 a.m. UTC | #5
On 28/06/2019 17:35, Szabolcs Nagy wrote:
> On 12/06/2019 11:09, Szabolcs Nagy wrote:

>> Passing a second argument to the ifunc resolver allows accessing

>> AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2

>> on linux because for ilp32 to remain compatible with lp64 ABI no more

>> than 32bit hwcap flags can be in AT_HWCAP which is already used up.

> 

> for the record i'm still planning to commit this for 2.30.

> will try to do it next week.


committed.

>>> Currently the relocation ordering logic does not guarantee that ifunc

>> resolvers can call libc apis or access libc objects, so only the

>> resolver arguments and runtime environment dependent instructions can

>> be used to do the dispatch (this affects ifunc resolvers outside of

>> the libc).

>>

>> Since ifunc resolver is target specific and only supposed to be

>> called by the dynamic linker, the call ABI can be changed in a

>> backward compatible way:

>>

>> Old call ABI passed hwcap as uint64_t, new abi sets the

>> _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument

>> that's a pointer to an extendible struct. A resolver has to check

>> the _IFUNC_ARG_HWCAP flag before accessing the second argument.

>>

>> The new sys/ifunc.h installed header has the definitions for the

>> new ABI, everything is in the implementation reserved namespace.

>>

>> An alternative approach is to try to support extern calls from ifunc

>> resolvers such as getauxval, but that seems non-trivial

>> https://sourceware.org/ml/libc-alpha/2017-01/msg00468.html

>>

>> 2019-06-12  Szabolcs Nagy  <szabolcs.nagy@arm.com>

>>

>> 	* sysdeps/aarch64/Makefile: Install sys/ifunc.h and add tests.

>> 	* sysdeps/aarch64/dl-irel.h (elf_ifunc_invoke): Update to new ABI.

>> 	* sysdeps/aarch64/sys/ifunc.h: New file.

>> 	* sysdeps/aarch64/tst-ifunc-arg-1.c: New file.

>> 	* sysdeps/aarch64/tst-ifunc-arg-2.c: New file.

>>

>
  

Patch

diff --git a/string/str-two-way.h b/string/str-two-way.h
index b5011baafa..50fe6b4684 100644
--- a/string/str-two-way.h
+++ b/string/str-two-way.h
@@ -221,7 +221,7 @@  critical_factorization (const unsigned char *needle, size_t needle_len,
    most 2 * HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching.
    If AVAILABLE modifies HAYSTACK_LEN (as in strstr), then at most 3 *
    HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching.  */
-static RETURN_TYPE
+static inline RETURN_TYPE
 two_way_short_needle (const unsigned char *haystack, size_t haystack_len,
 		      const unsigned char *needle, size_t needle_len)
 {
@@ -383,7 +383,7 @@  two_way_short_needle (const unsigned char *haystack, size_t haystack_len,
    If AVAILABLE modifies HAYSTACK_LEN (as in strstr), then at most 3 *
    HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching, and
    sublinear performance is not possible.  */
-static RETURN_TYPE
+__attribute__((noinline)) static RETURN_TYPE
 two_way_long_needle (const unsigned char *haystack, size_t haystack_len,
 		     const unsigned char *needle, size_t needle_len)
 {
diff --git a/string/strstr.c b/string/strstr.c
index 64e478b9e7..737585fff2 100644
--- a/string/strstr.c
+++ b/string/strstr.c
@@ -16,21 +16,12 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-/* This particular implementation was written by Eric Blake, 2008.  */
-
 #ifndef _LIBC
 # include <config.h>
 #endif
 
-/* Specification of strstr.  */
 #include <string.h>
 
-#include <stdbool.h>
-
-#ifndef _LIBC
-# define __builtin_expect(expr, val)   (expr)
-#endif
-
 #define RETURN_TYPE char *
 #define AVAILABLE(h, h_l, j, n_l)			\
   (((j) + (n_l) <= (h_l)) \
@@ -47,47 +38,122 @@ 
 #define STRSTR strstr
 #endif
 
-/* Return the first occurrence of NEEDLE in HAYSTACK.  Return HAYSTACK
-   if NEEDLE is empty, otherwise NULL if NEEDLE is not found in
-   HAYSTACK.  */
-char *
-STRSTR (const char *haystack, const char *needle)
+static inline char *
+strstr2 (const unsigned char *hs, const unsigned char *ne)
 {
-  size_t needle_len; /* Length of NEEDLE.  */
-  size_t haystack_len; /* Known minimum length of HAYSTACK.  */
-
-  /* Handle empty NEEDLE special case.  */
-  if (needle[0] == '\0')
-    return (char *) haystack;
+  uint32_t h1 = (ne[0] << 16) | ne[1];
+  uint32_t h2 = 0;
+  for (int c = hs[0]; h1 != h2 && c != 0; c = *++hs)
+      h2 = (h2 << 16) | c;
+  return h1 == h2 ? (char *)hs - 2 : NULL;
+}
 
-  /* Skip until we find the first matching char from NEEDLE.  */
-  haystack = strchr (haystack, needle[0]);
-  if (haystack == NULL || needle[1] == '\0')
-    return (char *) haystack;
+static inline char *
+strstr3 (const unsigned char *hs, const unsigned char *ne)
+{
+  uint32_t h1 = (ne[0] << 24) | (ne[1] << 16) | (ne[2] << 8);
+  uint32_t h2 = 0;
+  for (int c = hs[0]; h1 != h2 && c != 0; c = *++hs)
+      h2 = (h2 | c) << 8;
+  return h1 == h2 ? (char *)hs - 3 : NULL;
+}
 
-  /* Ensure HAYSTACK length is at least as long as NEEDLE length.
-     Since a match may occur early on in a huge HAYSTACK, use strnlen
+#define hash2(p) (((size_t)(p)[0] - ((size_t)(p)[-1] << 3)) % sizeof (shift))
+
+/* Fast strstr algorithm with guaranteed linear-time performance.
+   Small needles up to size 2 use a dedicated linear search.  Longer needles
+   up to size 256 use a novel modified Horspool algorithm.  It hashes pairs
+   of characters to quickly skip past mismatches.  The main search loop only
+   exits if the last 2 characters match, avoiding unnecessary calls to memcmp
+   and allowing for a larger skip if there is no match.  A self-adapting
+   filtering check is used to quickly detect mismatches in long needles.
+   By limiting the needle length to 256, the shift table can be reduced to 8
+   bits per entry, lowering preprocessing overhead and minimizing cache effects.
+   The limit also implies worst-case performance is linear.
+   Needles larger than 256 characters use the linear-time Two-Way algorithm.  */
+char *
+STRSTR (const char *haystack, const char *needle)
+{
+  const unsigned char *hs = (const unsigned char *) haystack;
+  const unsigned char *ne = (const unsigned char *) needle;
+
+  /* Handle short needle special cases first.  */
+  if (ne[0] == '\0')
+    return (char *)hs;
+  hs = (const unsigned char *)strchr ((const char*)hs, ne[0]);
+  if (hs == NULL || ne[1] == '\0')
+    return (char*)hs;
+  if (ne[2] == '\0')
+    return strstr2 (hs, ne);
+  if (ne[3] == '\0')
+    return strstr3 (hs, ne);
+
+  /* Ensure haystack length is at least as long as needle length.
+     Since a match may occur early on in a huge haystack, use strnlen
      and read ahead a few cachelines for improved performance.  */
-  needle_len = strlen (needle);
-  haystack_len = __strnlen (haystack, needle_len + 256);
-  if (haystack_len < needle_len)
+  size_t ne_len = strlen ((const char*)ne);
+  size_t hs_len = __strnlen ((const char*)hs, ne_len | 512);
+  if (hs_len < ne_len)
     return NULL;
 
-  /* Check whether we have a match.  This improves performance since we avoid
-     the initialization overhead of the two-way algorithm.  */
-  if (memcmp (haystack, needle, needle_len) == 0)
-    return (char *) haystack;
-
-  /* Perform the search.  Abstract memory is considered to be an array
-     of 'unsigned char' values, not an array of 'char' values.  See
-     ISO C 99 section 6.2.6.1.  */
-  if (needle_len < LONG_NEEDLE_THRESHOLD)
-    return two_way_short_needle ((const unsigned char *) haystack,
-				 haystack_len,
-				 (const unsigned char *) needle, needle_len);
-  return two_way_long_needle ((const unsigned char *) haystack, haystack_len,
-			      (const unsigned char *) needle, needle_len);
+  /* Check whether we have a match.  This improves performance since we
+     avoid initialization overheads.  */
+  if (memcmp (hs, ne, ne_len) == 0)
+    return (char *) hs;
+
+  /* Use Two-Way algorithm for very long needles.  */
+  if (__glibc_unlikely (ne_len > 256))
+    return two_way_long_needle (hs, hs_len, ne, ne_len);
+
+  const unsigned char *end = hs + hs_len - ne_len;
+  uint8_t shift[256];
+  size_t tmp, shift1;
+  size_t m1 = ne_len - 1;
+  size_t offset = 0;
+
+  /* Initialize bad character shift hash table.  */
+  memset (shift, 0, sizeof (shift));
+  for (int i = 1; i < m1; i++)
+    shift[hash2 (ne + i)] = i;
+  shift1 = m1 - shift[hash2 (ne + m1)];
+  shift[hash2 (ne + m1)] = m1;
+
+  while (1)
+    {
+      if (__glibc_unlikely (hs > end))
+       {
+         end += __strnlen ((const char*)end + m1 + 1, 2048);
+         if (hs > end)
+           return NULL;
+       }
+
+      /* Skip past character pairs not in the needle.  */
+      do
+       {
+         hs += m1;
+         tmp = shift[hash2 (hs)];
+       }
+      while (hs <= end && tmp == 0);
+
+      /* If the match is not at the end of the needle, shift to the end
+        and continue until we match the last 2 characters.  */
+      hs -= tmp;
+      if (tmp < m1)
+       continue;
+
+      /* The last 2 characters match.  If the needle is long, check a
+        fixed number of characters first to quickly filter out mismatches.  */
+      if (memcmp (hs + offset, ne + offset, sizeof (int)) == 0)
+       {
+         if (memcmp (hs, ne, m1) == 0)
+           return (void *) hs;
+
+         /* Adjust filter offset when it doesn't find the mismatch.  */
+         offset = (offset >= sizeof (int) ? offset : m1) - sizeof (int);
+       }
+
+      /* Skip based on matching the last 2 characters.  */
+      hs += shift1;
+    }
 }
 libc_hidden_builtin_def (strstr)
-
-#undef LONG_NEEDLE_THRESHOLD