Patchwork [X86_64] Set bit_Fast_Unaligned_Load for Excavator family CPUs

Submitter H.J. Lu
Date Jan. 14, 2016, 4:18 p.m.
Message ID <CAMe9rOr-v73iyJss0sLgyA24M=9_+Kx+rko2nD75xmo-YdQ9hw@mail.gmail.com>
Permalink /patch/10382/
State New

Comments

H.J. Lu - Jan. 14, 2016, 4:18 p.m.
On Thu, Jan 14, 2016 at 8:07 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> OK from my part (you still need x86 maintainers ack to push it upstream).

I cleaned it up.  This is what I checked in.  Thanks.

> On 14-01-2016 12:52, Pawar, Amit wrote:
>> (a) Ask Adhemerval for an exception to provide this IFUNC tweak for AMD CPUs.
>> Done.
>>
>> (b) Once granted an exception, add your patch to the list of blockers here:
>>     https://sourceware.org/glibc/wiki/Release/2.23#Release_blockers.3F
>> Sure.
>>
>> Again, please post your new patch as quickly as possible.
>> I have filed a bug for this. https://sourceware.org/bugzilla/show_bug.cgi?id=19467
>> Please find the patch attached; if it is OK, please commit it on my behalf.
>>
>> Thanks
>> Amit
>>

Patch

From d7890e6947114785755ae5b1cf5310491092ee0b Mon Sep 17 00:00:00 2001
From: Amit Pawar <Amit.Pawar@amd.com>
Date: Thu, 14 Jan 2016 20:06:02 +0530
Subject: [PATCH] Set index_Fast_Unaligned_Load for Excavator family CPUs

GLIBC benchtests show that the SSE2_Unaligned based implementations
are faster than the SSE2 based implementations for the routines
strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy and strstr.
Set the index_Fast_Unaligned_Load flag for Excavator (family 15h)
CPUs, which makes the SSE2_Unaligned based implementations the
default for these routines.

	[BZ #19467]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	index_Fast_Unaligned_Load flag for Excavator family CPUs.
---
 ChangeLog                  | 6 ++++++
 sysdeps/x86/cpu-features.c | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 424f731..054998f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@ 
+2016-01-14  Amit Pawar  <amit.pawar@amd.com>
+
+	[BZ #19467]
+	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
+	index_Fast_Unaligned_Load flag for Excavator family CPUs.
+
 2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
 
 	* sysdeps/s390/nptl/tls.h (struct tcbhead_t): Add __private_ss field.
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index e6bd4c9..218ff2b 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -154,6 +154,14 @@  init_cpu_features (struct cpu_features *cpu_features)
 		 cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ebx,
 		 cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ecx,
 		 cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].edx);
+
+      if (family == 0x15)
+	{
+	  /* "Excavator"   */
+	  if (model >= 0x60 && model <= 0x7f)
+	    cpu_features->feature[index_Fast_Unaligned_Load]
+	      |= bit_Fast_Unaligned_Load;
+	}
     }
   else
     kind = arch_kind_other;
-- 
2.5.0
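
For readers unfamiliar with how the family and model values tested in the
patch are obtained, the following is a minimal standalone sketch, not the
glibc code itself. It assumes the usual AMD decoding of CPUID leaf 1 EAX
(extended family and extended model are applied when the base family is
0xf) and then applies the same range check as the patch. The helper names
amd_family, amd_model, is_excavator and select_strcpy_variant are
hypothetical, and the returned strings only stand in for the real
implementations; glibc's actual selectors also consider other feature bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these names
   exist in glibc.  EAX is the value returned by CPUID leaf 1.  */

/* Display family: base family (bits 11:8), plus the extended family
   (bits 27:20) when the base family is 0xf.  */
static unsigned int
amd_family (uint32_t eax)
{
  unsigned int family = (eax >> 8) & 0x0f;
  if (family == 0x0f)
    family += (eax >> 20) & 0xff;
  return family;
}

/* Display model: base model (bits 7:4), with the extended model
   (bits 19:16) as the high nibble when the base family is 0xf.  */
static unsigned int
amd_model (uint32_t eax)
{
  unsigned int model = (eax >> 4) & 0x0f;
  if (((eax >> 8) & 0x0f) == 0x0f)
    model += ((eax >> 16) & 0x0f) << 4;
  return model;
}

/* Same condition as the patch: family 0x15, models 0x60 to 0x7f.  */
static bool
is_excavator (uint32_t eax)
{
  unsigned int model = amd_model (eax);
  return amd_family (eax) == 0x15 && model >= 0x60 && model <= 0x7f;
}

/* Simplified stand-in for an IFUNC selector: prefer the unaligned
   variant only when the fast-unaligned-load condition holds.  */
static const char *
select_strcpy_variant (uint32_t eax)
{
  return is_excavator (eax) ? "sse2_unaligned" : "sse2";
}
```

With an illustrative EAX value of 0x00660F01 (base family 0xf, extended
family 0x06, extended model 0x6) the sketch decodes family 0x15, model 0x60
and picks the unaligned variant; 0x00600F20 decodes to family 0x15, model
0x02, which falls outside the range and keeps the plain SSE2 variant.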