From patchwork Fri Jun 23 18:33:08 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu" <hjl.tools@gmail.com>
X-Patchwork-Id: 21240
Received: (qmail 59513 invoked by alias); 23 Jun 2017 18:33:12 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 59490 invoked by uid 89); 23 Jun 2017 18:33:11 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-25.1 required=5.0 tests=AWL, BAYES_00,
	FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2,
	GIT_PATCH_3, RCVD_IN_DNSWL_NONE,
	SPF_PASS autolearn=ham version=3.3.2 spammy=Hx-languages-length:2170
X-HELO: mail-ot0-f194.google.com
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:in-reply-to:references:from:date
	:message-id:subject:to:cc;
	bh=wTPl7o/NaJwsHQgKsEsZiTjif6llccn6Ul5SpqQm90E=;
	b=IHYxdj+PpxNLjSg0xdQDK4D6wGllnefKADaA8FHGEGm0B3FuXA6hGN6Jy193nbyb9W
	DZpCAKGnw+G0ew41HSEA1jX8LHLk59lE9bkY0HzoxE3rq+idof+/DYxPnLAvjv28EaY7
	XWqnOmGIqK0mZYl3q3n8TlkcQ0exe9pRaG1bqIRIQ3fTA58JV0mP+Z2fbe9gJwtZY6k4
	ZIcpOmOZQp1SH47vm7wV4tr4xnRT1e61toA7MqofN+V3E7dR0rcY2A1eu7z3GURitNNm
	LDhxiA5auPIF0pPRXVhlbgu+GeZu/s0XYw4jgMHSSJDiEZl6xVenVjVQQkRtctkpINsf
	ihXA==
X-Gm-Message-State: AKS2vOzr0gdLE6B+vUw8/edJd//VtSbwE0gPMMgFnZQ5ICiWeYbvgW96
	8L4hekAxj/HADlTn1tWDFrLHMYb8xRRW
X-Received: by 10.157.83.21 with SMTP id g21mr1654509oth.206.1498242789135;
	Fri, 23 Jun 2017 11:33:09 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <37781c2f-2cd5-6c3e-e0b1-9189ffe97750@redhat.com>
References: <20170623132026.82F2D4017D45E@oldenburg.str.redhat.com>
	<6fec374c-177f-b8e8-d7a3-ab10d7dab136@redhat.com>
	<5c63ea37-7fe6-79f3-de42-9146ae084269@redhat.com>
	<37781c2f-2cd5-6c3e-e0b1-9189ffe97750@redhat.com>
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 23 Jun 2017 11:33:08 -0700
Message-ID: 
 <CAMe9rOqwAAwAtDEWEsdcafSfQBRQXqPjEw9gfeVAVvp9ciVryQ@mail.gmail.com>
Subject: Re: [PATCH] x86-64: memcmp-avx2-movbe.S needs saturating subtraction
	[BZ #21662]
To: Florian Weimer <fweimer@redhat.com>
Cc: "Carlos O'Donell" <carlos@redhat.com>,
	GNU C Library <libc-alpha@sourceware.org>

On Fri, Jun 23, 2017 at 9:42 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/23/2017 06:38 PM, Carlos O'Donell wrote:
>
>> I assume that this catches the regression by ensuring the high values of
>> the subtraction result in an underflow which results in a positive value
>> of the subtraction and a wrong answer?
>
> Yes, I thought I said so in the commit message.
>
>> Was this comment ever accurate? movzwl is not a BE load.
>
> We used bswap, so the register contents before the comparison are in
> big-endian format.
>
>>> +    orl     %edi, %eax
>>> +    orl     %esi, %ecx
>>> +    /* Subtraction is okay because the upper 8 bits a zero.  */
>>
>> s/a zero/are zero/g
>
> Okay, I'll fix this typo in a follow-up commit.

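How about the following patch?  After the movzwl, shll $8 and bswap
sequence, the low byte of %eax and %ecx is already zero, so the patch
turns

movzbl -1(%rdi, %rdx), %edi
movzbl -1(%rsi, %rdx), %esi
orl %edi, %eax
orl %esi, %ecx

into

movb -1(%rdi, %rdx), %al
movb -1(%rsi, %rdx), %cl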

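For anyone who wants to check the equivalence outside the assembler,
here is a rough C model of the two sequences.  It is only a sketch:
the function names are made up for illustration, and it assumes
L(between_2_3) starts with movzwl loads of the first two bytes, which
are not visible in the diff context.  It does not model
partial-register behaviour, only the values left in the registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the x86 bswap instruction on a 32-bit register.  */
static uint32_t
bswap32 (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0xff00u)
	 | ((x << 8) & 0xff0000u) | (x << 24);
}

/* Assumed prologue: movzwl (%p), %reg; shll $8, %reg; bswap %reg.
   The result is 0x00 p[0] p[1] 00, so the low byte is zero.  */
static uint32_t
load_first_two (const unsigned char *p)
{
  uint32_t reg = (uint32_t) p[0] | ((uint32_t) p[1] << 8);
  reg <<= 8;
  return bswap32 (reg);
}

/* Old sequence: movzbl into a scratch register, then orl.  */
static int
between_2_3_movzbl_orl (const unsigned char *a, const unsigned char *b,
			size_t n)
{
  assert (n == 2 || n == 3);
  uint32_t eax = load_first_two (a) | (uint32_t) a[n - 1];
  uint32_t ecx = load_first_two (b) | (uint32_t) b[n - 1];
  /* Both values are below 2^24, so the subtraction cannot overflow.  */
  return (int) ((int32_t) eax - (int32_t) ecx);
}

/* New sequence: movb replaces only the low byte (%al, %cl).  */
static int
between_2_3_movb (const unsigned char *a, const unsigned char *b,
		  size_t n)
{
  assert (n == 2 || n == 3);
  uint32_t eax = (load_first_two (a) & ~(uint32_t) 0xff) | a[n - 1];
  uint32_t ecx = (load_first_two (b) & ~(uint32_t) 0xff) | b[n - 1];
  return (int) ((int32_t) eax - (int32_t) ecx);
}

/* Compare both models over a small set of boundary byte values.  */
static int
count_mismatches (void)
{
  static const unsigned char v[] = { 0x00, 0x01, 0x7f, 0x80, 0xff };
  int bad = 0;
  for (int k = 0; k < 15625; k++)	/* 5^6 combinations.  */
    {
      unsigned char a[3], b[3];
      int t = k;
      for (int i = 0; i < 3; i++, t /= 5)
	a[i] = v[t % 5];
      for (int i = 0; i < 3; i++, t /= 5)
	b[i] = v[t % 5];
      for (size_t n = 2; n <= 3; n++)
	if (between_2_3_movzbl_orl (a, b, n) != between_2_3_movb (a, b, n))
	  bad++;
    }
  return bad;
}
```

Calling count_mismatches () from a small test driver should return 0
if the two sequences really produce the same result for sizes 2 and 3.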

H.J.

From acecb3f7de4892b68ec1b464a576ee84b3f97527 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 23 Jun 2017 11:29:38 -0700
Subject: [PATCH] x86-64: Optimize L(between_2_3) in memcmp-avx2-movbe.S

Turn

	movzbl	-1(%rdi, %rdx), %edi
	movzbl	-1(%rsi, %rdx), %esi
	orl	%edi, %eax
	orl	%esi, %ecx

into

	movb	-1(%rdi, %rdx), %al
	movb	-1(%rsi, %rdx), %cl

	* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3):
	Replace movzbl and orl with movb.
---
 sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
index 9d19210..abcc61c 100644
--- a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
+++ b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
@@ -144,11 +144,9 @@ L(between_2_3):
 	shll	$8, %ecx
 	bswap	%eax
 	bswap	%ecx
-	movzbl	-1(%rdi, %rdx), %edi
-	movzbl	-1(%rsi, %rdx), %esi
-	orl	%edi, %eax
-	orl	%esi, %ecx
-	/* Subtraction is okay because the upper 8 bits a zero.  */
+	movb	-1(%rdi, %rdx), %al
+	movb	-1(%rsi, %rdx), %cl
+	/* Subtraction is okay because the upper 8 bits are zero.  */
 	subl	%ecx, %eax
 	ret
 
-- 
2.9.4