From patchwork Tue Apr  5 12:22:58 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lu, Hongjiu" <hongjiu.lu@intel.com>
X-Patchwork-Id: 11633
Received: (qmail 44091 invoked by alias); 5 Apr 2016 12:23:08 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 44075 invoked by uid 89); 5 Apr 2016 12:23:07 -0000
Date: Tue, 5 Apr 2016 05:22:58 -0700
From: "H.J. Lu" <hongjiu.lu@intel.com>
To: GNU C Library <libc-alpha@sourceware.org>
Subject: [committed, PATCH] Force 32-bit displacement in
	memset-vec-unaligned-erms.S
Message-ID: <20160405122258.GA6778@intel.com>
Reply-To: "H.J. Lu" <hjl.tools@gmail.com>
Content-Disposition: inline
User-Agent: Mutt/1.5.24 (2015-08-30)

* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
	(L(loop_start)): Force 32-bit displacement on the leading VMOVU
	stores when VEC_SIZE is 32 or 64, to avoid a long nop between
	instructions.
---
 sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 13 +++++++++++++
 1 file changed, 13 insertions(+)
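
For reference, a minimal sketch of what the .d32 suffix does to the
encoding. It assumes the VEC_SIZE == 32 case, with VMOVU taken to
expand to vmovdqu and %VEC(0) to %ymm0 (an assumption about the
including file, which this patch does not show). The label d32_demo is
hypothetical, and the byte sequences in the comments are worked out by
hand from the VEX and ModRM encoding rules:

```asm
	.text
	.globl	d32_demo	/* Hypothetical label, illustration only.  */
d32_demo:
	/* Default encoding, no displacement bytes:
	   c5 fe 7f 07 (4 bytes).  */
	vmovdqu	%ymm0, (%rdi)
	/* .d32 forces a 4-byte zero displacement:
	   c5 fe 7f 87 00 00 00 00 (8 bytes).  */
	vmovdqu.d32 %ymm0, (%rdi)
	ret
```

Each forced displacement adds four bytes to the store itself, so less
padding is left for the assembler to fill with a multi-byte nop at the
next alignment directive (presumably a .p2align later in the file,
outside this hunk). Assembling the sketch with as and inspecting it
with objdump -d should show the two encodings side by side.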

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 9383517..fe0f745 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -159,10 +159,23 @@ L(return):
 	.p2align 4
 L(loop_start):
 	leaq	(VEC_SIZE * 4)(%rdi), %rcx
+# if VEC_SIZE == 32 || VEC_SIZE == 64
+	/* Force 32-bit displacement to avoid long nop between
+	   instructions.  */
+	VMOVU.d32 %VEC(0), (%rdi)
+# else
 	VMOVU	%VEC(0), (%rdi)
+# endif
 	andq	$-(VEC_SIZE * 4), %rcx
+# if VEC_SIZE == 32
+	/* Force 32-bit displacement to avoid long nop between
+	   instructions.  */
+	VMOVU.d32 %VEC(0), -VEC_SIZE(%rdi,%rdx)
+	VMOVU.d32 %VEC(0), VEC_SIZE(%rdi)
+# else
 	VMOVU	%VEC(0), -VEC_SIZE(%rdi,%rdx)
 	VMOVU	%VEC(0), VEC_SIZE(%rdi)
+# endif
 	VMOVU	%VEC(0), -(VEC_SIZE * 2)(%rdi,%rdx)
 	VMOVU	%VEC(0), (VEC_SIZE * 2)(%rdi)
 	VMOVU	%VEC(0), -(VEC_SIZE * 3)(%rdi,%rdx)