From patchwork Wed May 24 14:55:16 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 20559
Delivered-To: mailing list libc-alpha@sourceware.org
References: <20170520145006.GA19672@lucon.org>
From: "H.J. Lu"
Date: Wed, 24 May 2017 07:55:16 -0700
Subject: Re: [PATCH] x86_64: Remove 9 REX bytes from memchr.S
To: Zack Weinberg
Cc: GNU C Library

On Sat, May 20, 2017 at 12:58 PM, H.J. Lu wrote:
> On Sat, May 20, 2017 at 7:59 AM, Zack Weinberg wrote:
>> On Sat, May 20, 2017 at 10:50 AM, H.J. Lu wrote:
>>> There is no need to use 64-bit registers when only the lower 32 bits
>>> are non-zero.
>>
>> This code is used generically for x86-64, not for a specific
>> microarchitecture. Is there a reason why this will never cause partial
>> register stalls, now or in the future?
>
> By x86-64 specification, 32-bit destination registers in these instructions
> are zero-extended to 64 bits and there is no register stall at all.
>

Here is the updated patch with one more REX byte removed.

Any other comments?

From 2db36b54338f65a803080ce8085977299a4f52a0 Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Thu, 18 May 2017 12:22:31 -0700
Subject: [PATCH] x86_64: Remove redundant REX bytes from memchr.S

By x86-64 specification, 32-bit destination registers are zero-extended
to 64 bits.  There is no need to use 64-bit registers when only the lower
32 bits are non-zero.

	* sysdeps/x86_64/memchr.S (MEMCHR): Use 32-bit registers for
	the lower 32 bits.
---
 sysdeps/x86_64/memchr.S | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/sysdeps/x86_64/memchr.S b/sysdeps/x86_64/memchr.S
index 8242f2d..77a71ae 100644
--- a/sysdeps/x86_64/memchr.S
+++ b/sysdeps/x86_64/memchr.S
@@ -31,7 +31,7 @@
 	.text
 ENTRY(MEMCHR)
 	movd	%esi, %xmm1
-	mov	%rdi, %rcx
+	mov	%edi, %ecx
 
 #ifdef USE_AS_WMEMCHR
 	test	%rdx, %rdx
@@ -44,10 +44,10 @@ ENTRY(MEMCHR)
 	punpcklbw	%xmm1, %xmm1
 #endif
 
-	and	$63, %rcx
+	and	$63, %ecx
 	pshufd	$0, %xmm1, %xmm1
 
-	cmp	$48, %rcx
+	cmp	$48, %ecx
 	ja	L(crosscache)
 
 	movdqu	(%rdi), %xmm0
@@ -59,7 +59,7 @@ ENTRY(MEMCHR)
 	sub	$16, %rdx
 	jbe	L(return_null)
 	add	$16, %rdi
-	and	$15, %rcx
+	and	$15, %ecx
 	and	$-16, %rdi
 	add	%rcx, %rdx
 	sub	$64, %rdx
@@ -68,7 +68,7 @@ ENTRY(MEMCHR)
 
 	.p2align 4
 L(crosscache):
-	and	$15, %rcx
+	and	$15, %ecx
 	and	$-16, %rdi
 	movdqa	(%rdi), %xmm0
 
@@ -162,7 +162,7 @@ L(loop_prolog):
 
 	mov	%rdi, %rcx
 	and	$-64, %rdi
-	and	$63, %rcx
+	and	$63, %ecx
 	add	%rcx, %rdx
 
 	.p2align 4
@@ -214,7 +214,7 @@ L(align64_loop):
 
 	.p2align 4
 L(exit_loop):
-	add	$32, %rdx
+	add	$32, %edx
 	jle	L(exit_loop_32)
 
 	movdqa	(%rdi), %xmm0
@@ -234,7 +234,7 @@ L(exit_loop):
 	pmovmskb	%xmm3, %eax
 	test	%eax, %eax
 	jnz	L(matches32_1)
-	sub	$16, %rdx
+	sub	$16, %edx
 	jle	L(return_null)
 
 	PCMPEQ	48(%rdi), %xmm1
@@ -246,13 +246,13 @@ L(exit_loop):
 
 	.p2align 4
 L(exit_loop_32):
-	add	$32, %rdx
+	add	$32, %edx
 	movdqa	(%rdi), %xmm0
 	PCMPEQ	%xmm1, %xmm0
 	pmovmskb	%xmm0, %eax
 	test	%eax, %eax
 	jnz	L(matches_1)
-	sub	$16, %rdx
+	sub	$16, %edx
 	jbe	L(return_null)
 
 	PCMPEQ	16(%rdi), %xmm1
-- 
2.9.4
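
For readers who want to convince themselves of the zero-extension
argument, here is a rough C model of the first change (mov/and on
%rcx versus %ecx).  It is only a sketch and not part of the patch: the
function names are made up for illustration, and the uint32_t to
uint64_t conversion stands in for what a write to a 32-bit register
does on x86-64.  It covers only the masking case, where the result
always fits in the low 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modelling "mov %rdi, %rcx; and $63, %rcx":
   mask the full 64-bit pointer value.  */
static uint64_t
cache_offset_64 (uint64_t ptr)
{
  return ptr & 63;
}

/* Hypothetical helper modelling "mov %edi, %ecx; and $63, %ecx":
   operate on the low 32 bits only.  Converting the uint32_t result
   back to uint64_t zero-extends it, mirroring a 32-bit register
   write on x86-64.  */
static uint64_t
cache_offset_32 (uint64_t ptr)
{
  uint32_t low = (uint32_t) ptr;
  low &= 63;
  return (uint64_t) low;
}

/* Spot-check that both forms agree, including for pointer values
   whose upper 32 bits are set.  */
static void
check_offsets_agree (void)
{
  static const uint64_t samples[] =
    { 0, 0x3f, 0x40, 0x7fffffffff30ULL, UINT64_MAX };

  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (cache_offset_64 (samples[i]) == cache_offset_32 (samples[i]));
}
```

Calling check_offsets_agree () from a test program should pass
silently.  Because the mask keeps only bits 0-5, the upper half of
the pointer never contributes to the result, so the narrower form
computes the same value.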