AArch64: Improve backwards memmove performance

Message ID DB8PR08MB503633698FAA56D1D8E3AF35835A0@DB8PR08MB5036.eurprd08.prod.outlook.com
State Committed
Commit bd394d131c10c9ec22c6424197b79410042eed99

Commit Message

Wilco Dijkstra Aug. 20, 2020, 11:46 a.m. UTC
On some microarchitectures performance of the backwards memmove improves if
the stores use STR with decreasing addresses.  So change the memmove loop
in memcpy_advsimd.S to use 2x STR rather than STP.

Passes GLIBC regression test, OK for commit?

---
  

Comments

Adhemerval Zanella Netto Aug. 25, 2020, 12:15 p.m. UTC | #1
On 20/08/2020 08:46, Wilco Dijkstra wrote:
> On some microarchitectures performance of the backwards memmove improves if
> the stores use STR with decreasing addresses.  So change the memmove loop
> in memcpy_advsimd.S to use 2x STR rather than STP.
> 
> Passes GLIBC regression test, OK for commit?

LGTM, thanks.  Does it make any difference to use the same strategy on the
last iteration at L(copy64_from_start) as well?

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

> 
> ---
> diff --git a/sysdeps/aarch64/multiarch/memcpy_advsimd.S b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
> index d4ba74777744c8bb5a83e43ab2d63ad8dab35203..48bb6d7ca425197907eaef2307fb3939e69baa15 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_advsimd.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
> @@ -223,12 +223,13 @@ L(copy_long_backwards):
>  	b.ls	L(copy64_from_start)
>  
>  L(loop64_backwards):
> -	stp	A_q, B_q, [dstend, -32]
> +	str	B_q, [dstend, -16]
> +	str	A_q, [dstend, -32]
>  	ldp	A_q, B_q, [srcend, -96]
> -	stp	C_q, D_q, [dstend, -64]
> +	str	D_q, [dstend, -48]
> +	str	C_q, [dstend, -64]!
>  	ldp	C_q, D_q, [srcend, -128]
>  	sub	srcend, srcend, 64
> -	sub	dstend, dstend, 64
>  	subs	count, count, 64
>  	b.hi	L(loop64_backwards)
>  
>
  

Patch

diff --git a/sysdeps/aarch64/multiarch/memcpy_advsimd.S b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
index d4ba74777744c8bb5a83e43ab2d63ad8dab35203..48bb6d7ca425197907eaef2307fb3939e69baa15 100644
--- a/sysdeps/aarch64/multiarch/memcpy_advsimd.S
+++ b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
@@ -223,12 +223,13 @@  L(copy_long_backwards):
 	b.ls	L(copy64_from_start)
 
 L(loop64_backwards):
-	stp	A_q, B_q, [dstend, -32]
+	str	B_q, [dstend, -16]
+	str	A_q, [dstend, -32]
 	ldp	A_q, B_q, [srcend, -96]
-	stp	C_q, D_q, [dstend, -64]
+	str	D_q, [dstend, -48]
+	str	C_q, [dstend, -64]!
 	ldp	C_q, D_q, [srcend, -128]
 	sub	srcend, srcend, 64
-	sub	dstend, dstend, 64
 	subs	count, count, 64
 	b.hi	L(loop64_backwards)
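
To make the effect of the change easier to check, the following is a minimal C sketch that models one iteration of the store side of L(loop64_backwards) before and after the patch. It is an illustration only, not code from glibc: the names q_t, store_log, str_q, body_stp, body_str and bodies_match are hypothetical, the interleaved LDP loads are left out, and each STP is modelled simply as two 16-byte writes. It shows that the four STRs hit strictly decreasing addresses, and that the pre-index writeback on the last STR ("[dstend, -64]!") leaves dstend where the removed "sub dstend, dstend, 64" used to leave it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit Q register (A_q..D_q in the patch). */
typedef struct { unsigned char b[16]; } q_t;

/* Store offsets relative to the incoming dstend, in program order. */
typedef struct { ptrdiff_t off[4]; size_t n; } store_log;

/* Model of "str Qn, [addr]": write 16 bytes and record where they went. */
static void str_q(unsigned char *addr, const q_t *v,
                  const unsigned char *dstend_in, store_log *log)
{
    assert(log->n < 4);
    memcpy(addr, v->b, sizeof v->b);
    log->off[log->n++] = addr - dstend_in;
}

/* Loop body before the patch: two STPs plus an explicit SUB. */
static unsigned char *body_stp(unsigned char *dstend, const q_t r[4])
{
    memcpy(dstend - 32, r[0].b, 16);   /* stp A_q, B_q, [dstend, -32] */
    memcpy(dstend - 16, r[1].b, 16);
    memcpy(dstend - 64, r[2].b, 16);   /* stp C_q, D_q, [dstend, -64] */
    memcpy(dstend - 48, r[3].b, 16);
    return dstend - 64;                /* sub dstend, dstend, 64 */
}

/* Loop body after the patch: four STRs, the last with pre-index writeback. */
static unsigned char *body_str(unsigned char *dstend, const q_t r[4],
                               store_log *log)
{
    const unsigned char *in = dstend;
    str_q(dstend - 16, &r[1], in, log);   /* str B_q, [dstend, -16] */
    str_q(dstend - 32, &r[0], in, log);   /* str A_q, [dstend, -32] */
    str_q(dstend - 48, &r[3], in, log);   /* str D_q, [dstend, -48] */
    dstend -= 64;                         /* "!": dstend becomes dstend - 64, */
    str_q(dstend, &r[2], in, log);        /* which is where C_q is stored */
    return dstend;
}

/* Returns 1 if both bodies leave identical memory and the same dstend. */
static int bodies_match(store_log *log)
{
    unsigned char old_buf[64], new_buf[64];
    q_t r[4];
    for (int i = 0; i < 4; i++)
        memset(r[i].b, 'A' + i, sizeof r[i].b);
    memset(old_buf, 0, sizeof old_buf);
    memset(new_buf, 0, sizeof new_buf);
    log->n = 0;
    unsigned char *e_old = body_stp(old_buf + 64, r);
    unsigned char *e_new = body_str(new_buf + 64, r, log);
    return e_old == old_buf && e_new == new_buf
        && memcmp(old_buf, new_buf, sizeof old_buf) == 0;
}
```

Calling bodies_match() from a small main and inspecting the log gives the store offsets in program order. The model only covers addresses and the final dstend value; it says nothing about why a given microarchitecture prefers this order, which the commit message leaves unspecified.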