Patchwork aarch64: Optimized memcmp for medium to large sizes

Submitter Siddhesh Poyarekar
Date March 6, 2018, 5:34 p.m.
Message ID <414c23c1-87b8-7fc4-d4a3-c44eff2660c0@sourceware.org>
Permalink /patch/26215/
State New

Comments

Siddhesh Poyarekar - March 6, 2018, 5:34 p.m.
On Tuesday 06 March 2018 10:47 PM, Szabolcs Nagy wrote:
> this broke the build for me:
> 
> /B/elf/librtld.os: In function `memcmp':
> /S/string/../sysdeps/aarch64/memcmp.S:78: undefined reference to `.Lloop8'
> collect2: error: ld returned 1 exit status
> make[2]: *** [/B/elf/ld.so] Error 1
> make[2]: Leaving directory `/S/elf'

Sorry, I took the lazy way out and failed to smoke test the loop8 name
fixup and missed one instance.  I've pushed this obvious fix after
actually building it this time.

Siddhesh

Patch

From 4e54d918630ea53e29dd70d3bdffcb00d29ed3d4 Mon Sep 17 00:00:00 2001
From: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date: Tue, 6 Mar 2018 22:56:35 +0530
Subject: [PATCH] aarch64: Fix branch target to loop16

I goofed up when changing the loop8 name to loop16 and missed out on
the branch instance.  Fixed and actually build tested this time.

	* sysdeps/aarch64/memcmp.S (more16): Fix loop16 branch target.
---
 ChangeLog                | 2 ++
 sysdeps/aarch64/memcmp.S | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 23609b80d7..a24ed86474 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,7 @@ 
 2018-03-06  Siddhesh Poyarekar  <siddhesh@sourceware.org>
 
+	* sysdeps/aarch64/memcmp.S (more16): Fix loop16 branch target.
+
 	* sysdeps/aarch64/memcmp.S: Widen comparison to 16 bytes at a
 	time.
 
diff --git a/sysdeps/aarch64/memcmp.S b/sysdeps/aarch64/memcmp.S
index 8325d047e7..743bc078bb 100644
--- a/sysdeps/aarch64/memcmp.S
+++ b/sysdeps/aarch64/memcmp.S
@@ -75,7 +75,7 @@  L(more16):
 	/* We overlap loads between 0-32 bytes at either side of SRC1 when we
 	   try to align, so limit it only to strings larger than 128 bytes.  */
 	cmp	limit, 96
-	b.ls	L(loop8)
+	b.ls	L(loop16)
 
 	/* Align src1 and adjust src2 with bytes not yet done.  */
 	and	tmp1, src1, 15
-- 
2.14.3
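
The failure quoted above (an undefined reference to `.Lloop8` after the label was renamed) is the kind of thing a quick textual check can catch before a full build. The following is only a rough sketch, not part of the patch or of glibc: it assumes local labels are written with the `L(name)` macro as in memcmp.S, and the function name `undefined_local_labels` is hypothetical. It does not understand comments or preprocessor conditionals, so it can report false positives.

```python
import re

# A definition looks like "L(name):" at the start of a line (assumed convention).
LABEL_DEF = re.compile(r"^\s*L\((\w+)\):", re.MULTILINE)
# Any occurrence of "L(name)", whether a definition or a branch target.
LABEL_USE = re.compile(r"L\((\w+)\)")


def undefined_local_labels(source):
    """Return the set of L(name) labels that are used but never defined."""
    defined = set(LABEL_DEF.findall(source))
    used = set(LABEL_USE.findall(source))
    return used - defined


# Hypothetical excerpt mimicking the state of the file before this fix.
sample = """
L(more16):
	cmp	limit, 96
	b.ls	L(loop8)
L(loop16):
	ret
"""

if __name__ == "__main__":
    print(sorted(undefined_local_labels(sample)))
```

Run against the pre-fix excerpt, this reports `loop8` as the only label with no definition, matching the linker error. It is no substitute for actually building the tree.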