From patchwork Wed Mar 20 16:29:32 2019
Date: Wed, 20 Mar 2019 19:29:32 +0300
From: Anton Youdkevitch
To: libc-alpha@sourceware.org
Subject: [PATCH v2] aarch64: thunderx2 memcpy branches reordering
Message-ID: <20190320162930.GB13393@bell-sw.com>

Rewrote the branches in the load-and-merge chunk so that the code
order matches the most probable case.
ChangeLog:

	* sysdeps/aarch64/multiarch/memcpy_thunderx2.S: Branches reordering.

diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx2.S b/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
index b2215c1..f637300 100644
--- a/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
+++ b/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
@@ -382,7 +382,8 @@ L(bytes_0_to_3):
 	strb	A_lw, [dstin]
 	strb	B_lw, [dstin, tmp1]
 	strb	A_hw, [dstend, -1]
-L(end): ret
+L(end):
+	ret
 
 	.p2align 4
@@ -557,17 +558,9 @@ L(ext_size_ ## shft):;\
 	ext	A_v.16b, C_v.16b, D_v.16b, 16-shft;\
 	ext	B_v.16b, D_v.16b, E_v.16b, 16-shft;\
 	subs	count, count, 32;\
-	b.ge	2f;\
+	b.lt	2f;\
 1:;\
 	stp	A_q, B_q, [dst], #32;\
-	ext	H_v.16b, E_v.16b, F_v.16b, 16-shft;\
-	ext	I_v.16b, F_v.16b, G_v.16b, 16-shft;\
-	stp	H_q, I_q, [dst], #16;\
-	add	dst, dst, tmp1;\
-	str	G_q, [dst], #16;\
-	b	L(copy_long_check32);\
-2:;\
-	stp	A_q, B_q, [dst], #32;\
 	prfm	pldl1strm, [src, MEMCPY_PREFETCH_LDR];\
 	ldp	D_q, J_q, [src], #32;\
 	ext	H_v.16b, E_v.16b, F_v.16b, 16-shft;\
@@ -579,8 +572,15 @@ L(ext_size_ ## shft):;\
 	ext	B_v.16b, D_v.16b, J_v.16b, 16-shft;\
 	mov	E_v.16b, J_v.16b;\
 	subs	count, count, 64;\
-	b.ge	2b;\
-	b	1b;\
+	b.ge	1b;\
+2:;\
+	stp	A_q, B_q, [dst], #32;\
+	ext	H_v.16b, E_v.16b, F_v.16b, 16-shft;\
+	ext	I_v.16b, F_v.16b, G_v.16b, 16-shft;\
+	stp	H_q, I_q, [dst], #16;\
+	add	dst, dst, tmp1;\
+	str	G_q, [dst], #16;\
+	b	L(copy_long_check32);\
 
 EXT_CHUNK(1)
 EXT_CHUNK(2)