[AArch64] Add ifunc support for Ares

Message ID DB5PR08MB1030A888EFC9735023F2021D83BE0@DB5PR08MB1030.eurprd08.prod.outlook.com
State New, archived

Commit Message

Wilco Dijkstra Dec. 19, 2018, 6:28 p.m. UTC
  Add Ares to the midr_el1 list and support ifunc dispatch.  Since Ares
supports two 128-bit loads/stores, use Neon registers for memcpy by
selecting __memcpy_falkor by default (we should rename this to
__memcpy_simd or similar).
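The dispatch change amounts to one more MIDR predicate in the selector chain. Below is a minimal C sketch of that idea, assuming only the architectural MIDR_EL1 layout and the implementer/part-number pairs that appear in the patch. The SKETCH_* macros, the enum and select_memcpy are hypothetical illustrations, not glibc's code, and the ThunderX, Falkor and ThunderX2 checks of the real chain are deliberately left out.

```c
#include <stdint.h>

/* MIDR_EL1 layout: Implementer [31:24], Variant [23:20],
   Architecture [19:16], PartNum [15:4], Revision [3:0].
   The SKETCH_* names are illustrative, not glibc's own macros.  */
#define SKETCH_IMPLEMENTOR(midr) (((midr) >> 24) & 0xff)
#define SKETCH_PARTNUM(midr)     (((midr) >> 4) & 0xfff)

/* Predicates using the implementer/part-number pairs from the patch.  */
#define SKETCH_IS_PHECDA(midr) (SKETCH_IMPLEMENTOR (midr) == 'h' \
                                && SKETCH_PARTNUM (midr) == 0x000)
#define SKETCH_IS_ARES(midr)   (SKETCH_IMPLEMENTOR (midr) == 'A' \
                                && SKETCH_PARTNUM (midr) == 0xd0c)

/* Which memcpy variant the selector would choose (hypothetical enum).  */
enum memcpy_variant { MEMCPY_FALKOR, MEMCPY_OTHER };

/* Stand-in for the libc_ifunc selector.  Only the Phecda/Ares part of
   the Falkor branch is modelled; every other core falls through to
   MEMCPY_OTHER in this sketch.  */
static enum memcpy_variant
select_memcpy (uint32_t midr)
{
  return (SKETCH_IS_PHECDA (midr) || SKETCH_IS_ARES (midr))
         ? MEMCPY_FALKOR
         : MEMCPY_OTHER;
}
```

Because only the implementer and part number are compared, any variant or revision of Ares (for example r0p1, MIDR 0x410FD0C1) selects the same memcpy as the r1p0 value used in cpu_list.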

OK for commit?

ChangeLog:
2018-12-19  Wilco Dijkstra  <wdijkstr@arm.com>

        * manual/tunables.texi (glibc.cpu.name): Add ares tunable.
        * sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use
        __memcpy_falkor for ares.
        * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES):
        Add new define.
        * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
        Add ares cpu.
--
  

Comments

Siddhesh Poyarekar Dec. 20, 2018, 3:21 p.m. UTC | #1
On 19/12/18 11:58 PM, Wilco Dijkstra wrote:
> Add Ares to the midr_el1 list and support ifunc dispatch.  Since Ares
> supports two 128-bit loads/stores, use Neon registers for memcpy by
> selecting __memcpy_falkor by default (we should rename this to
> __memcpy_simd or similar).

The falkor memcpy has a very specific quirk that reuses register names 
to optimise prefetcher usage so it may not necessarily work well with 
other implementations.  Perhaps a new implementation similar to the 
stock memcpy but with vector registers would be more suitable for a 
__memcpy_simd.

Siddhesh
  
Wilco Dijkstra Dec. 20, 2018, 3:45 p.m. UTC | #2
Hi Siddhesh,

> The falkor memcpy has a very specific quirk that reuses register names 
> to optimise prefetcher usage so it may not necessarily work well with 
> other implementations.  Perhaps a new implementation similar to the 
> stock memcpy but with vector registers would be more suitable for a 
> __memcpy_simd.

Reusing registers does not matter on an out-of-order core since they are
renamed. But you're right that a generic SIMD memcpy could do better.
For example using LDP/STP of Q registers will be smaller and faster,
maybe even on Falkor.
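The load/store shape Wilco describes can be illustrated with a portable C sketch: each 32-byte step issues two 16-byte loads into vector values before two 16-byte stores, which a compiler targeting AArch64 can typically map to an LDP/STP pair of Q registers (whether it actually does depends on the compiler and flags). The v16u8 type uses the GCC/Clang vector extension, and simd_copy_sketch is a hypothetical name; this is not a proposed __memcpy_simd, and it ignores the size-class and alignment handling a real memcpy needs.

```c
#include <stddef.h>
#include <string.h>

/* 16-byte vector type (GCC/Clang vector extension).  On AArch64 a value
   of this type is held in a 128-bit Q register.  */
typedef unsigned char v16u8 __attribute__ ((vector_size (16)));

/* Hypothetical sketch: copy n bytes, 32 at a time, doing both 16-byte
   loads before both 16-byte stores, the pattern that corresponds to
   LDP/STP of Q registers.  */
static void
simd_copy_sketch (void *restrict dst, const void *restrict src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  while (n >= 32)
    {
      v16u8 a, b;
      /* memcpy into a local vector is the portable way to express an
         unaligned 16-byte load; compilers normally emit a single load.  */
      memcpy (&a, s, 16);
      memcpy (&b, s + 16, 16);
      memcpy (d, &a, 16);
      memcpy (d + 16, &b, 16);
      s += 32;
      d += 32;
      n -= 32;
    }

  /* Byte-by-byte tail for the remaining 0..31 bytes.  */
  while (n--)
    *d++ = *s++;
}
```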

Cheers,
Wilco
  

Patch

diff --git a/manual/tunables.texi b/manual/tunables.texi
index 09a25655aeebbbd5489c1680e00ba4444d21dcc0..af820820e044f718b125d39491b35e7e273da20c 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -360,7 +360,7 @@  This tunable is specific to powerpc, powerpc64 and powerpc64le.
 The @code{glibc.cpu.name=xxx} tunable allows the user to tell @theglibc{} to
 assume that the CPU is @code{xxx} where xxx may have one of these values:
 @code{generic}, @code{falkor}, @code{thunderxt88}, @code{thunderx2t99},
-@code{thunderx2t99p1}.
+@code{thunderx2t99p1}, @code{ares}.
 
 This tunable is specific to aarch64.
 @end deftp
diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
index 4a04a63b0fe0c84b9225286c4aaf1386d01a736a..8f5d4e7df51af09c88b3c4c4d2a0a0f477ce405e 100644
--- a/sysdeps/aarch64/multiarch/memcpy.c
+++ b/sysdeps/aarch64/multiarch/memcpy.c
@@ -36,7 +36,7 @@  extern __typeof (__redirect_memcpy) __memcpy_falkor attribute_hidden;
 libc_ifunc (__libc_memcpy,
             (IS_THUNDERX (midr)
 	     ? __memcpy_thunderx
-	     : (IS_FALKOR (midr) || IS_PHECDA (midr)
+	     : (IS_FALKOR (midr) || IS_PHECDA (midr) || IS_ARES (midr)
 		? __memcpy_falkor
 		: (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr)
 		  ? __memcpy_thunderx2
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index eb35adfbe9d429d5622a712738fa75bafe8e7322..153d258afe975ab463580cba4248a6950126c89a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -51,6 +51,8 @@ 
 
 #define IS_PHECDA(midr) (MIDR_IMPLEMENTOR(midr) == 'h'			      \
                         && MIDR_PARTNUM(midr) == 0x000)
+#define IS_ARES(midr) (MIDR_IMPLEMENTOR(midr) == 'A'			      \
+			&& MIDR_PARTNUM(midr) == 0xd0c)
 
 struct cpu_features
 {
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index b4f348509eb1c6b319add6eb8ed8a198c00df149..69be36869ebcc2105ec161b24d52be9bbf00b627 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -36,6 +36,7 @@  static struct cpu_list cpu_list[] = {
       {"thunderx2t99",   0x431F0AF0},
       {"thunderx2t99p1", 0x420F5160},
       {"phecda",	 0x680F0000},
+      {"ares",		 0x411FD0C0},
       {"generic", 	 0x0}
 };
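The new cpu_list entry is what lets a user force the Ares paths with the tunable, for example GLIBC_TUNABLES=glibc.cpu.name=ares. A minimal sketch of the name-to-MIDR lookup and of how 0x411FD0C0 decodes follows. The table rows are copied from the patch; the struct, the midr_* helpers and the UINT32_MAX sentinel for unknown names are assumptions of this sketch, not glibc's code, and what glibc does with an unrecognised name is not modelled here.

```c
#include <stdint.h>
#include <string.h>

/* A subset of the name -> MIDR table from the patched cpu-features.c.  */
struct cpu_entry
{
  const char *name;
  uint32_t midr;
};

static const struct cpu_entry cpu_table[] = {
  {"thunderx2t99",   0x431F0AF0},
  {"thunderx2t99p1", 0x420F5160},
  {"phecda",         0x680F0000},
  {"ares",           0x411FD0C0},
  {"generic",        0x0},
};

/* Hypothetical helper: MIDR for a glibc.cpu.name value, or UINT32_MAX
   if the name is not in the table.  */
static uint32_t
midr_for_name (const char *name)
{
  for (size_t i = 0; i < sizeof cpu_table / sizeof cpu_table[0]; i++)
    if (strcmp (cpu_table[i].name, name) == 0)
      return cpu_table[i].midr;
  return UINT32_MAX;
}

/* MIDR_EL1 fields, following the architectural bit layout.  */
static unsigned midr_implementer (uint32_t m)  { return (m >> 24) & 0xff; }
static unsigned midr_variant (uint32_t m)      { return (m >> 20) & 0xf; }
static unsigned midr_architecture (uint32_t m) { return (m >> 16) & 0xf; }
static unsigned midr_partnum (uint32_t m)      { return (m >> 4) & 0xfff; }
static unsigned midr_revision (uint32_t m)     { return m & 0xf; }
```

Decoding the Ares entry gives implementer 0x41 ('A'), variant 1, architecture 0xf, part number 0xd0c and revision 0, so the value satisfies the IS_ARES test added in cpu-features.h. The "generic" entry maps to 0, whose implementer matches none of the core-specific predicates.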