From patchwork Fri Jun 23 18:33:08 2017
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 21240
Delivered-To: mailing list libc-alpha@sourceware.org
In-Reply-To: <37781c2f-2cd5-6c3e-e0b1-9189ffe97750@redhat.com>
References: <20170623132026.82F2D4017D45E@oldenburg.str.redhat.com>
 <6fec374c-177f-b8e8-d7a3-ab10d7dab136@redhat.com>
 <5c63ea37-7fe6-79f3-de42-9146ae084269@redhat.com>
 <37781c2f-2cd5-6c3e-e0b1-9189ffe97750@redhat.com>
From: "H.J. Lu"
Date: Fri, 23 Jun 2017 11:33:08 -0700
Subject: Re: [PATCH] x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
To: Florian Weimer
Cc: "Carlos O'Donell", GNU C Library

On Fri, Jun 23, 2017 at 9:42 AM, Florian Weimer wrote:
> On 06/23/2017 06:38 PM, Carlos O'Donell wrote:
>
>> I assume that this catches the regression by ensuring the high values of
>> the subtraction result in an underflow which results in a positive value
>> of the subtraction and a wrong answer?
>
> Yes, I thought I said so in the commit message.
>
>> Was this comment ever accurate? mobzwl is not a BE load.
>
> We used bswap, so the register contents before the comparison is in
> big-endian format.
>
>>> +	orl	%edi, %eax
>>> +	orl	%esi, %ecx
>>> +	/* Subtraction is okay because the upper 8 bits a zero. */
>>
>> s/a zero/are zero/g
>
> Okay, I'll fix this typo in a follow-up commit.

How about this patch to turn

	movzbl	-1(%rdi, %rdx), %edi
	movzbl	-1(%rsi, %rdx), %esi
	orl	%edi, %eax
	orl	%esi, %ecx

into

	movb	-1(%rdi, %rdx), %al
	movb	-1(%rsi, %rdx), %cl

H.J.

From acecb3f7de4892b68ec1b464a576ee84b3f97527 Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Fri, 23 Jun 2017 11:29:38 -0700
Subject: [PATCH] x86-64: Optimize L(between_2_3) in memcmp-avx2-movbe.S

Turn

	movzbl	-1(%rdi, %rdx), %edi
	movzbl	-1(%rsi, %rdx), %esi
	orl	%edi, %eax
	orl	%esi, %ecx

into

	movb	-1(%rdi, %rdx), %al
	movb	-1(%rsi, %rdx), %cl

	* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3):
	Replace movzbl and orl with movb.
---
 sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
index 9d19210..abcc61c 100644
--- a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
+++ b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
@@ -144,11 +144,9 @@ L(between_2_3):
 	shll	$8, %ecx
 	bswap	%eax
 	bswap	%ecx
-	movzbl	-1(%rdi, %rdx), %edi
-	movzbl	-1(%rsi, %rdx), %esi
-	orl	%edi, %eax
-	orl	%esi, %ecx
-	/* Subtraction is okay because the upper 8 bits a zero. */
+	movb	-1(%rdi, %rdx), %al
+	movb	-1(%rsi, %rdx), %cl
+	/* Subtraction is okay because the upper 8 bits are zero. */
 	subl	%ecx, %eax
 	ret
 
-- 
2.9.4
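Why the movb replacement is safe can be checked with a small C model of L(between_2_3). This is a sketch, not glibc code: the helper names (load_first_two, pack_movzbl_orl, pack_movb, model_between_2_3, count_mismatches) are hypothetical, it assumes a little-endian host such as x86-64, and it uses the GCC/Clang builtin __builtin_bswap32. After shll $8 and bswap, bits 0..7 of the register are zero, so writing the last byte into %al with movb yields the same 32-bit value as movzbl followed by orl. Because both packed values stay below 2^24, the 32-bit subtraction cannot overflow and its sign matches memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of L(between_2_3), valid for n == 2 or n == 3.
   Assumes a little-endian host (x86-64) and GCC/Clang __builtin_bswap32.  */

/* movzwl (p), %eax; shll $8, %eax; bswap %eax
   Result: p[0] in bits 16..23, p[1] in bits 8..15, bits 0..7 zero.  */
static uint32_t load_first_two(const unsigned char *p)
{
    uint16_t w;
    memcpy(&w, p, sizeof w);            /* unaligned 16-bit load */
    return __builtin_bswap32((uint32_t)w << 8);
}

/* Before the patch: movzbl -1(p, n), %edi; orl %edi, %eax  */
static uint32_t pack_movzbl_orl(const unsigned char *p, size_t n)
{
    return load_first_two(p) | (uint32_t)p[n - 1];
}

/* After the patch: movb -1(p, n), %al overwrites only bits 0..7.  */
static uint32_t pack_movb(const unsigned char *p, size_t n)
{
    return (load_first_two(p) & ~UINT32_C(0xff)) | p[n - 1];
}

/* Both packed values are below 2^24, so this int subtraction cannot
   overflow and its sign follows the big-endian byte order.  */
static int model_between_2_3(const unsigned char *a,
                             const unsigned char *b, size_t n)
{
    assert(n == 2 || n == 3);
    return (int)pack_movb(a, n) - (int)pack_movb(b, n);
}

static int sign(int x) { return (x > 0) - (x < 0); }

/* Compare both packings, and the result sign against memcmp, over all
   3-byte inputs built from a set of edge-case byte values.
   Returns the number of mismatches found.  */
static int count_mismatches(void)
{
    static const unsigned char vals[] = { 0x00, 0x01, 0x7f, 0x80, 0xfe, 0xff };
    const size_t nv = sizeof vals;
    int bad = 0;
    for (size_t n = 2; n <= 3; n++)
        for (size_t i = 0; i < nv * nv * nv; i++)
            for (size_t j = 0; j < nv * nv * nv; j++) {
                unsigned char a[3] = { vals[i % nv], vals[i / nv % nv], vals[i / (nv * nv)] };
                unsigned char b[3] = { vals[j % nv], vals[j / nv % nv], vals[j / (nv * nv)] };
                if (pack_movzbl_orl(a, n) != pack_movb(a, n))
                    bad++;
                if (sign(model_between_2_3(a, b, n)) != sign(memcmp(a, b, n)))
                    bad++;
            }
    return bad;
}
```

For n == 2 the overlapping load reads p[1] twice, giving p[0], p[1], p[1] from high to low byte; that ordering still compares the same way as the two bytes alone, which is why the model also holds for the 2-byte case.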