Patchwork [x86_64] Fix for wrong selector in x86_64/multiarch/memcpy.S BZ #18880

Submitter H.J. Lu
Date March 4, 2016, 4:07 p.m.
Message ID <CAMe9rOpyGNAzQtXzVxZ9qCFUqe0n+MgK-2hbzN2D=dWzcT_=zA@mail.gmail.com>
Permalink /patch/11196/
State New

Comments

H.J. Lu - March 4, 2016, 4:07 p.m.
On Thu, Mar 3, 2016 at 9:04 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>Change looks good.  If you can't commit it yourself, please improve commit
>>log:
>>
>>1. Don't add your ChangeLog entry in ChangeLog directly since other people may change ChangeLog.
>>2. In the ChangeLog entry, describe what you did, like check Fast_Unaligned_Load instead of Slow_BSF and check Fast_Copy_Backward for __memcpy_ssse3_back.
>
> As per your suggestion, I have fixed the patch with an improved commit log and am also providing a separate ChangeLog patch. If OK, please commit it; otherwise let me know of any required changes.
>
> Thanks,
> Amit Pawar
>
>

This is the patch I am going to check in.

Patch

From 2b4fee345d53eb8fc81461f2aefae74e9f3604ae Mon Sep 17 00:00:00 2001
From: Amit Pawar <Amit.Pawar@amd.com>
Date: Thu, 3 Mar 2016 22:24:21 +0530
Subject: [PATCH] x86-64: Fix memcpy IFUNC selection

Check Fast_Unaligned_Load, instead of Slow_BSF, and also check
Fast_Copy_Backward to enable __memcpy_ssse3_back.  The selection
order is updated to:

1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set.
2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set.
3. __memcpy_sse2 if SSSE3 isn't available.
4. __memcpy_ssse3_back if Fast_Copy_Backward bit is set.
5. __memcpy_ssse3

	[BZ #18880]
	* sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load
	instead of Slow_BSF and also check for Fast_Copy_Backward to
	enable __memcpy_ssse3_back.
---
 sysdeps/x86_64/multiarch/memcpy.S | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
index 64a1bcd..8882590 100644
--- a/sysdeps/x86_64/multiarch/memcpy.S
+++ b/sysdeps/x86_64/multiarch/memcpy.S
@@ -35,22 +35,23 @@  ENTRY(__new_memcpy)
 	jz	1f
 	HAS_ARCH_FEATURE (Prefer_No_VZEROUPPER)
 	jz	1f
-	leaq    __memcpy_avx512_no_vzeroupper(%rip), %rax
+	lea    __memcpy_avx512_no_vzeroupper(%rip), %RAX_LP
 	ret
 #endif
-1:	leaq	__memcpy_avx_unaligned(%rip), %rax
+1:	lea	__memcpy_avx_unaligned(%rip), %RAX_LP
 	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
-	jz 2f
-	ret
-2:	leaq	__memcpy_sse2(%rip), %rax
-	HAS_ARCH_FEATURE (Slow_BSF)
-	jnz	3f
-	leaq	__memcpy_sse2_unaligned(%rip), %rax
-	ret
-3:	HAS_CPU_FEATURE (SSSE3)
-	jz 4f
-	leaq    __memcpy_ssse3(%rip), %rax
-4:	ret
+	jnz	2f
+	lea	__memcpy_sse2_unaligned(%rip), %RAX_LP
+	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
+	jnz	2f
+	lea	__memcpy_sse2(%rip), %RAX_LP
+	HAS_CPU_FEATURE (SSSE3)
+	jz	2f
+	lea    __memcpy_ssse3_back(%rip), %RAX_LP
+	HAS_ARCH_FEATURE (Fast_Copy_Backward)
+	jnz	2f
+	lea	__memcpy_ssse3(%rip), %RAX_LP
+2:	ret
 END(__new_memcpy)
 
 # undef ENTRY
-- 
2.5.0