[AArch64] Add ifunc support for Ares
Commit Message
Add Ares to the MIDR_EL1 list and support ifunc dispatch for it. Since Ares
supports two 128-bit loads/stores, use Neon registers for memcpy by
selecting __memcpy_falkor by default (we should rename it to
__memcpy_simd or similar).
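In the patch, the memcpy variant is chosen by a conditional expression over the MIDR value that is passed to glibc's internal libc_ifunc macro, so the choice is made when the IFUNC symbol is resolved rather than on every call. The sketch below models that chain as a plain C function, under these assumptions: the IS_PHECDA and IS_ARES tests are copied from the diff, the ThunderX2 tests are derived from the thunderx2t99 and thunderx2t99p1 cpu_list values, the first-generation ThunderX and Falkor checks are left out because their MIDR values are not shown here, and the stub functions plus the final generic fallback (the diff truncates the tail of the chain) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MIDR_EL1 fields: implementer in bits [31:24], part number in [15:4].  */
#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 0xff)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfff)

/* IS_PHECDA and IS_ARES as in the patch; the ThunderX2 checks are
   derived from the thunderx2t99 / thunderx2t99p1 cpu_list values.  */
#define IS_PHECDA(midr)      (MIDR_IMPLEMENTOR (midr) == 'h' \
                              && MIDR_PARTNUM (midr) == 0x000)
#define IS_ARES(midr)        (MIDR_IMPLEMENTOR (midr) == 'A' \
                              && MIDR_PARTNUM (midr) == 0xd0c)
#define IS_THUNDERX2(midr)   (MIDR_IMPLEMENTOR (midr) == 'C' \
                              && MIDR_PARTNUM (midr) == 0x0af)
#define IS_THUNDERX2PA(midr) (MIDR_IMPLEMENTOR (midr) == 'B' \
                              && MIDR_PARTNUM (midr) == 0x516)

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical stand-ins for __memcpy_falkor, __memcpy_thunderx2 and a
   generic fallback; each simply forwards to memcpy in this sketch.  */
static void *memcpy_falkor_stub (void *d, const void *s, size_t n)
{ return memcpy (d, s, n); }
static void *memcpy_thunderx2_stub (void *d, const void *s, size_t n)
{ return memcpy (d, s, n); }
static void *memcpy_generic_stub (void *d, const void *s, size_t n)
{ return memcpy (d, s, n); }

/* Same shape as the patched selection expression, minus the ThunderX
   and Falkor checks, which are not reproduced here.  */
static memcpy_fn
select_memcpy (uint64_t midr)
{
  return (IS_PHECDA (midr) || IS_ARES (midr)
          ? memcpy_falkor_stub
          : (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr)
             ? memcpy_thunderx2_stub
             : memcpy_generic_stub));
}
```

With this sketch, select_memcpy (0x411FD0C0), the Ares value from cpu_list, returns the Falkor-style variant, which is exactly the behaviour the patch adds.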
OK for commit?
ChangeLog:
2018-12-19  Wilco Dijkstra  <wdijkstr@arm.com>

	* manual/tunables.texi (glibc.cpu.name): Add ares tunable.
	* sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use
	__memcpy_falkor for ares.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES):
	Add new define.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
	Add ares cpu.
--
Comments
On 19/12/18 11:58 PM, Wilco Dijkstra wrote:
> Add Ares to the MIDR_EL1 list and support ifunc dispatch for it. Since Ares
> supports two 128-bit loads/stores, use Neon registers for memcpy by
> selecting __memcpy_falkor by default (we should rename it to
> __memcpy_simd or similar).
The falkor memcpy has a very specific quirk that reuses register names
to optimise prefetcher usage so it may not necessarily work well with
other implementations. Perhaps a new implementation similar to the
stock memcpy but with vector registers would be more suitable for a
__memcpy_simd.
Siddhesh
Hi Siddhesh,
> The falkor memcpy has a very specific quirk that reuses register names
> to optimise prefetcher usage so it may not necessarily work well with
> other implementations. Perhaps a new implementation similar to the
> stock memcpy but with vector registers would be more suitable for a
> __memcpy_simd.
Reusing register names does not matter on an out-of-order core since the
registers are renamed. But you're right that a generic SIMD memcpy could do
better. For example, using LDP/STP of Q registers would be smaller and faster,
possibly even on Falkor.
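The LDP/STP-of-Q-registers idea can be sketched in C as follows. This is a minimal illustration, not glibc's memcpy: the name memcpy_simd_sketch is hypothetical, and it ignores alignment, prefetching and the overlapping-access tricks a tuned implementation would use for small sizes and tails. Each iteration loads 32 bytes as two 128-bit vectors before storing them; on AArch64 it uses the ACLE Neon intrinsics vld1q_u8/vst1q_u8, which compilers often combine into LDP/STP of Q registers (this pairing is not guaranteed). On other targets it falls back to a 32-byte temporary so the sketch still builds and runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical SIMD-style memcpy: copies 32 bytes per iteration as two
   128-bit chunks, then finishes the remaining 0..31 bytes bytewise.
   Like memcpy, it assumes the buffers do not overlap.  */
static void *
memcpy_simd_sketch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  while (n >= 32)
    {
#if defined(__aarch64__)
      /* Two 128-bit loads followed by two 128-bit stores; compilers may
         pair these into LDP/STP of Q registers.  */
      uint8x16_t a = vld1q_u8 (s);
      uint8x16_t b = vld1q_u8 (s + 16);
      vst1q_u8 (d, a);
      vst1q_u8 (d + 16, b);
#else
      /* Portable fallback: same 32-byte load-then-store structure.  */
      unsigned char tmp[32];
      memcpy (tmp, s, 32);
      memcpy (d, tmp, 32);
#endif
      d += 32;
      s += 32;
      n -= 32;
    }

  /* Copy the tail one byte at a time.  */
  while (n--)
    *d++ = *s++;

  return dst;
}
```

A real implementation would normally handle the tail with a final pair of (possibly overlapping) 16-byte accesses instead of a byte loop; the byte loop is kept here only for clarity.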
Cheers,
Wilco
@@ -360,7 +360,7 @@ This tunable is specific to powerpc, powerpc64 and powerpc64le.
The @code{glibc.cpu.name=xxx} tunable allows the user to tell @theglibc{} to
assume that the CPU is @code{xxx} where xxx may have one of these values:
@code{generic}, @code{falkor}, @code{thunderxt88}, @code{thunderx2t99},
-@code{thunderx2t99p1}.
+@code{thunderx2t99p1}, @code{ares}.
This tunable is specific to aarch64.
@end deftp
@@ -36,7 +36,7 @@ extern __typeof (__redirect_memcpy) __memcpy_falkor attribute_hidden;
libc_ifunc (__libc_memcpy,
(IS_THUNDERX (midr)
? __memcpy_thunderx
- : (IS_FALKOR (midr) || IS_PHECDA (midr)
+ : (IS_FALKOR (midr) || IS_PHECDA (midr) || IS_ARES (midr)
? __memcpy_falkor
: (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr)
? __memcpy_thunderx2
@@ -51,6 +51,8 @@
#define IS_PHECDA(midr) (MIDR_IMPLEMENTOR(midr) == 'h' \
&& MIDR_PARTNUM(midr) == 0x000)
+#define IS_ARES(midr) (MIDR_IMPLEMENTOR(midr) == 'A' \
+ && MIDR_PARTNUM(midr) == 0xd0c)
struct cpu_features
{
@@ -36,6 +36,7 @@ static struct cpu_list cpu_list[] = {
{"thunderx2t99", 0x431F0AF0},
{"thunderx2t99p1", 0x420F5160},
{"phecda", 0x680F0000},
+ {"ares", 0x411FD0C0},
{"generic", 0x0}
};
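Each cpu_list entry packs the MIDR_EL1 fields into one value: 0x411FD0C0 decodes to implementer 0x41 ('A', Arm), variant 1, architecture 0xF, part number 0xD0C and revision 0. Because IS_ARES only compares the implementer and part number, it matches every variant and revision of the core. The sketch below shows that decoding together with a name-to-MIDR lookup of the kind the glibc.cpu.name tunable relies on. The table struct and midr_for_name are hypothetical illustrations mirroring the diff, not glibc's actual lookup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MIDR_EL1 field layout.  */
#define MIDR_IMPLEMENTOR(midr)  (((midr) >> 24) & 0xff)
#define MIDR_VARIANT(midr)      (((midr) >> 20) & 0xf)
#define MIDR_ARCHITECTURE(midr) (((midr) >> 16) & 0xf)
#define MIDR_PARTNUM(midr)      (((midr) >> 4) & 0xfff)
#define MIDR_REVISION(midr)     ((midr) & 0xf)

/* As added by the patch: match on implementer and part number only.  */
#define IS_ARES(midr) (MIDR_IMPLEMENTOR (midr) == 'A' \
                       && MIDR_PARTNUM (midr) == 0xd0c)

/* Hypothetical mirror of the cpu_list entries shown in the diff.  */
struct cpu_name_midr
{
  const char *name;
  uint64_t midr;
};

static const struct cpu_name_midr cpu_table[] = {
  {"thunderx2t99",   0x431F0AF0},
  {"thunderx2t99p1", 0x420F5160},
  {"phecda",         0x680F0000},
  {"ares",           0x411FD0C0},
  {"generic",        0x0}
};

/* Hypothetical helper: look up NAME (as given via glibc.cpu.name) and
   store its MIDR value in *MIDR.  Returns 1 on success, 0 if unknown.  */
static int
midr_for_name (const char *name, uint64_t *midr)
{
  assert (name != NULL && midr != NULL);
  for (size_t i = 0; i < sizeof cpu_table / sizeof cpu_table[0]; i++)
    if (strcmp (cpu_table[i].name, name) == 0)
      {
        *midr = cpu_table[i].midr;
        return 1;
      }
  return 0;
}
```

For example, running a program with GLIBC_TUNABLES=glibc.cpu.name=ares asks glibc to assume an Ares core; in this sketch the lookup of "ares" yields 0x411FD0C0, for which IS_ARES is true.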