From patchwork Tue May 30 19:28:20 2017
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 20643
Delivered-To: mailing list libc-alpha@sourceware.org
In-Reply-To:
References: <20170520145006.GA19672@lucon.org>
From: "H.J. Lu"
Date: Tue, 30 May 2017 12:28:20 -0700
Subject: Re: [PATCH] x86_64: Remove 9 REX bytes from memchr.S
To: Zack Weinberg
Cc: GNU C Library

On Wed, May 24, 2017 at 7:55 AM, H.J.
Lu wrote:
> On Sat, May 20, 2017 at 12:58 PM, H.J. Lu wrote:
>> On Sat, May 20, 2017 at 7:59 AM, Zack Weinberg wrote:
>>> On Sat, May 20, 2017 at 10:50 AM, H.J. Lu wrote:
>>>> There is no need to use 64-bit registers when only the lower 32 bits
>>>> are non-zero.
>>>
>>> This code is used generically for x86-64, not for a specific
>>> microarchitecture.  Is there a reason why this will never cause partial
>>> register stalls, now or in the future?
>>
>> By the x86-64 specification, 32-bit destination registers in these
>> instructions are zero-extended to 64 bits, so there is no partial
>> register stall at all.
>>
>
> Here is the updated patch with one more REX byte removed.
>
> Any other comments?
>

This is the patch I am checking in.

From 0b4aae9e15eeb63419dc1df2578b3df50aae7edf Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Thu, 18 May 2017 12:22:31 -0700
Subject: [PATCH] x86_64: Remove redundant REX bytes from memchr.S

By the x86-64 specification, 32-bit destination registers are
zero-extended to 64 bits.  There is no need to use 64-bit registers
when only the lower 32 bits are non-zero.

	* sysdeps/x86_64/memchr.S (MEMCHR): Use 32-bit registers for
	the lower 32 bits.
---
 sysdeps/x86_64/memchr.S | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/sysdeps/x86_64/memchr.S b/sysdeps/x86_64/memchr.S
index 8242f2d..3167cd8 100644
--- a/sysdeps/x86_64/memchr.S
+++ b/sysdeps/x86_64/memchr.S
@@ -31,7 +31,7 @@
 	.text
 ENTRY(MEMCHR)
 	movd	%esi, %xmm1
-	mov	%rdi, %rcx
+	mov	%edi, %ecx
 
 #ifdef USE_AS_WMEMCHR
 	test	%rdx, %rdx
@@ -44,10 +44,10 @@ ENTRY(MEMCHR)
 	punpcklbw %xmm1, %xmm1
 #endif
 
-	and	$63, %rcx
+	and	$63, %ecx
 	pshufd	$0, %xmm1, %xmm1
 
-	cmp	$48, %rcx
+	cmp	$48, %ecx
 	ja	L(crosscache)
 
 	movdqu	(%rdi), %xmm0
@@ -59,7 +59,7 @@ ENTRY(MEMCHR)
 	sub	$16, %rdx
 	jbe	L(return_null)
 	add	$16, %rdi
-	and	$15, %rcx
+	and	$15, %ecx
 	and	$-16, %rdi
 	add	%rcx, %rdx
 	sub	$64, %rdx
@@ -68,7 +68,7 @@ ENTRY(MEMCHR)
 
 	.p2align 4
 L(crosscache):
-	and	$15, %rcx
+	and	$15, %ecx
 	and	$-16, %rdi
 	movdqa	(%rdi), %xmm0
 
@@ -162,7 +162,7 @@ L(loop_prolog):
 
 	mov	%rdi, %rcx
 	and	$-64, %rdi
-	and	$63, %rcx
+	and	$63, %ecx
 	add	%rcx, %rdx
 
 	.p2align 4
@@ -214,7 +214,7 @@ L(align64_loop):
 
 	.p2align 4
 L(exit_loop):
-	add	$32, %rdx
+	add	$32, %edx
 	jle	L(exit_loop_32)
 
 	movdqa	(%rdi), %xmm0
@@ -234,32 +234,32 @@ L(exit_loop):
 	pmovmskb %xmm3, %eax
 	test	%eax, %eax
 	jnz	L(matches32_1)
-	sub	$16, %rdx
+	sub	$16, %edx
 	jle	L(return_null)
 
 	PCMPEQ	48(%rdi), %xmm1
 	pmovmskb %xmm1, %eax
 	test	%eax, %eax
 	jnz	L(matches48_1)
-	xor	%rax, %rax
+	xor	%eax, %eax
 	ret
 
 	.p2align 4
 L(exit_loop_32):
-	add	$32, %rdx
+	add	$32, %edx
 	movdqa	(%rdi), %xmm0
 	PCMPEQ	%xmm1, %xmm0
 	pmovmskb %xmm0, %eax
 	test	%eax, %eax
 	jnz	L(matches_1)
-	sub	$16, %rdx
+	sub	$16, %edx
 	jbe	L(return_null)
 
 	PCMPEQ	16(%rdi), %xmm1
 	pmovmskb %xmm1, %eax
 	test	%eax, %eax
 	jnz	L(matches16_1)
-	xor	%rax, %rax
+	xor	%eax, %eax
 	ret
 
 	.p2align 4
@@ -320,7 +320,7 @@ L(matches48_1):
 
 	.p2align 4
 L(return_null):
-	xor	%rax, %rax
+	xor	%eax, %eax
 	ret
 END(MEMCHR)
-- 
2.9.4