From patchwork Mon Oct 30 15:46:10 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 23967
Received: (qmail 6853 invoked by alias); 30 Oct 2017 15:46:17 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Unsubscribe:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 6540 invoked by uid 89); 30 Oct 2017 15:46:14 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-24.9 required=5.0 tests=AWL, BAYES_00,
 FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3,
 RCVD_IN_DNSWL_NONE, RCVD_IN_SORBS_SPAM, SPF_PASS autolearn=ham
 version=3.3.2 spammy=
X-HELO: mail-oi0-f49.google.com
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=tRoSYQxnMx3u1iwzXTzNTd/8V+ZiVnCTGNGojpvWrvk=;
 b=JiHREFFMfjF1YbllnJ2nwkycCxxf9+/JfzABL1wyE/jJTIPLvHWVE4Bq0Fwh1di+Ae
 oqUMCdjJlObgXo4BJ0B19hu44DWVlUpfWCuRuD2MPZnNjCxXZanjpePnwYMmUv/AzFfi
 YPFolnm/hfjeUaNK09Yt4BH1C4hnQ2iu8NtClB967ZusU6KaC3DRdoVjC8VwFVjyMulU
 7iIqcm/Tc+dlnjMRqLmlXOx8+YEhhknKkRR1RKNDxbmPeYfe6Hb5K1sgrm1h8ehAKLNU
 QQ8HWq1f3iawn01ALP2UiOcWT6vMzbvbS1pGm0cp8YNy1rJqxuc/sHNtJgV7S4o01Vqo
 PIPw==
X-Gm-Message-State: AMCzsaWIzxeFoPMjQ2o0xZcpTPC4mBxNBVJKRlepzVADKPaf61ZPLM+5
 mf9P7S9vpR6T9KI+IXh7LNXsSAYAm9NE63kzNOWm0g==
X-Google-Smtp-Source: ABhQp+TgtwsJvntEnK3JkRrOxxvom8W1Ncj4OHnfPUMAWKoo8CYefmR7RusNUWwtoM6hGeBmObGSHkIRAUk9wZO4n4U=
X-Received: by 10.202.97.67 with SMTP id v64mr5133141oib.256.1509378371227;
 Mon, 30 Oct 2017 08:46:11 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <94e6928b-c4f4-eddd-f6a3-863a31823ed9@redhat.com>
References: <20171027210714.GA15539@gmail.com>
 <94e6928b-c4f4-eddd-f6a3-863a31823ed9@redhat.com>
From: "H.J. Lu"
Date: Mon, 30 Oct 2017 08:46:10 -0700
Message-ID:
Subject: Re: [PATCH] i586: Use a jump table in strcpy.S [BZ #22353]
To: Florian Weimer
Cc: GNU C Library

On Mon, Oct 30, 2017 at 6:20 AM, Florian Weimer wrote:
> On 10/30/2017 02:12 PM, H.J. Lu wrote:
>
>>> What's the performance impact of the change?  Is it even worth to use a
>>> jump table here?
>>>
>>
>> I added 2 strcpy implementations to hjl/pr22353/master.  Jump table is
>> faster than other non-SSE strcpy implementations because it can copy up
>> to 4 bytes at a time.
>
>
> To clarify, I meant replacing the jump table with conditional branches,
> basically getting rid of Duff's device.
>

I changed:

	BRANCH_TO_JMPTBL_ENTRY (L(SrcTable), %ecx, 4)

	.p2align 4
L(Src0):

to

	cmpb	$2, %cl
	je	L(Src2)
	ja	L(Src3)
	cmpb	$1, %cl
	je	L(Src1)

L(Src0):

Their performance is equivalent on Haswell.

Here is the updated patch I am going to check in.

Thanks.

From 9208134988aecc786aad96e996dab013b656b22e Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Fri, 27 Oct 2017 02:31:13 -0700
Subject: [PATCH] i586: Use conditional branches in strcpy.S [BZ #22353]

i586 strcpy.S used a clever trick with LEA to implement a jump table:

	/* ECX has the last 2 bits of the address of source - 1.  */
	andl	$3, %ecx

	call	2f
2:	popl	%edx
	/* 0xb is the distance between 2: and 1:.  */
	leal	0xb(%edx,%ecx,8), %ecx
	jmp	*%ecx

	.align 8
1:	/* ECX == 0 */
	orb	(%esi), %al
	jz	L(end)
	stosb
	xorl	%eax, %eax
	incl	%esi
	/* ECX == 1 */
	orb	(%esi), %al
	jz	L(end)
	stosb
	xorl	%eax, %eax
	incl	%esi
	/* ECX == 2 */
	orb	(%esi), %al
	jz	L(end)
	stosb
	xorl	%eax, %eax
	incl	%esi
	/* ECX == 3 */
L(1):	movl	(%esi), %ecx
	leal	4(%esi),%esi

This fails if there are instruction length changes before L(1):.  This
patch replaces it with conditional branches:

	cmpb	$2, %cl
	je	L(Src2)
	ja	L(Src3)
	cmpb	$1, %cl
	je	L(Src1)

L(Src0):

which have similar performance and work with any instruction lengths.

Tested on i586 and i686 with and without --disable-multi-arch.

	[BZ #22353]
	* sysdeps/i386/i586/strcpy.S (STRCPY): Use conditional branches.
	(1): Renamed to ...
	(L(Src0)): This.
	(L(Src1)): New.
	(L(Src2)): Likewise.
	(L(1)): Renamed to ...
	(L(Src3)): This.
---
 sysdeps/i386/i586/strcpy.S | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/sysdeps/i386/i586/strcpy.S b/sysdeps/i386/i586/strcpy.S
index a444604f4f..bb73ca4ef3 100644
--- a/sysdeps/i386/i586/strcpy.S
+++ b/sysdeps/i386/i586/strcpy.S
@@ -53,41 +53,35 @@ ENTRY (STRCPY)
 	cfi_rel_offset (ebx, 0)
 	andl	$3, %ecx
 
-#ifdef PIC
-	call	2f
-	cfi_adjust_cfa_offset (4)
-2:	popl	%edx
-	cfi_adjust_cfa_offset (-4)
-	/* 0xb is the distance between 2: and 1: but we avoid writing
-	   1f-2b because the assembler generates worse code.  */
-	leal	0xb(%edx,%ecx,8), %ecx
-#else
-	leal	1f(,%ecx,8), %ecx
-#endif
-
-	jmp	*%ecx
+	cmpb	$2, %cl
+	je	L(Src2)
+	ja	L(Src3)
+	cmpb	$1, %cl
+	je	L(Src1)
 
-	.align 8
-1:
+L(Src0):
 	orb	(%esi), %al
 	jz	L(end)
 	stosb
 	xorl	%eax, %eax
 	incl	%esi
 
+L(Src1):
 	orb	(%esi), %al
 	jz	L(end)
 	stosb
 	xorl	%eax, %eax
 	incl	%esi
 
+L(Src2):
 	orb	(%esi), %al
 	jz	L(end)
 	stosb
 	xorl	%eax, %eax
 	incl	%esi
 
-L(1):	movl	(%esi), %ecx
+L(Src3):
+	movl	(%esi), %ecx
 	leal	4(%esi),%esi
 
 	subl	%ecx, %eax
@@ -107,7 +101,7 @@ L(1):	movl	(%esi), %ecx
 	movl	%edx, (%edi)
 	leal	4(%edi),%edi
 
-	jmp	L(1)
+	jmp	L(Src3)
 
 L(3):	movl	%ecx, %edx
 
-- 
2.13.6
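
For illustration only, here is a rough C model of the point made in the
commit message; it is not part of the patch and not glibc code.  The
function names and the block_sizes array are hypothetical.  The stride of
8 comes from the scale factor in "leal 0xb(%edx,%ecx,8), %ecx"; the
12-byte block size mentioned in the usage note is an assumed example of an
instruction length change, not a measured encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: entries 0..3 stand for L(Src0)..L(Src3).  */
enum { NUM_ENTRIES = 4, LEA_STRIDE = 8 };

/* Old scheme: the jump target is computed as index * 8 from the first
   block, which silently assumes every block encodes to exactly 8 bytes.  */
static size_t
computed_offset (unsigned int index)
{
  assert (index < NUM_ENTRIES);
  return (size_t) index * LEA_STRIDE;
}

/* Where block INDEX really starts, given the encoded size of each of the
   three blocks that precede L(Src3).  */
static size_t
actual_offset (const size_t block_sizes[NUM_ENTRIES - 1], unsigned int index)
{
  assert (index < NUM_ENTRIES);
  size_t offset = 0;
  for (unsigned int i = 0; i < index; i++)
    offset += block_sizes[i];
  return offset;
}

/* New scheme: mirrors cmpb $2 / je / ja / cmpb $1 / je / fall through.
   The result depends only on the low two bits, never on code size.  */
static unsigned int
select_entry (unsigned int ecx)
{
  unsigned int cl = ecx & 3;
  if (cl == 2)
    return 2;			/* je L(Src2) */
  if (cl > 2)
    return 3;			/* ja L(Src3) */
  if (cl == 1)
    return 1;			/* je L(Src1) */
  return 0;			/* fall through to L(Src0) */
}
```

With block sizes {8, 8, 8} the computed and actual offsets agree for every
entry.  If the first block is assumed to grow to 12 bytes, the two disagree
from entry 1 onwards, while select_entry still picks the right label.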