From: Wilco Dijkstra
To: 'GNU C Library', Szabolcs Nagy
CC: nd
Subject: Re: [PATCH][AArch64] Adjust writeback in non-zero memset
Date: Wed, 14 Nov 2018 14:45:13 +0000

v2: Also bias dst in the zva_other code to avoid issues with zva sizes >= 256.

This fixes an inefficiency in the non-zero memset.
Delaying the writeback until the end of the loop is slightly faster on some
cores: it gives a ~5% performance gain on Cortex-A53 for large non-zero
memsets. The loop now updates dst with a pre-indexed writeback on its second
STP instead of a post-indexed writeback on its first, so dst enters the loop
biased by -32.

Tested against the GLIBC testsuite. OK for commit?

ChangeLog:
2018-11-14  Wilco Dijkstra

	* sysdeps/aarch64/memset.S (MEMSET): Improve non-zero memset loop.

diff --git a/sysdeps/aarch64/memset.S b/sysdeps/aarch64/memset.S
index 4a454593618f78e22c55520d56737fab5d8f63a4..9738cf5fd55a1d937fb3392cec46f37b4d5fb51d 100644
--- a/sysdeps/aarch64/memset.S
+++ b/sysdeps/aarch64/memset.S
@@ -89,10 +89,10 @@ L(set_long):
 	b.eq	L(try_zva)
 L(no_zva):
 	sub	count, dstend, dst	/* Count is 16 too large.  */
-	add	dst, dst, 16
+	sub	dst, dst, 16		/* Dst is biased by -32.  */
 	sub	count, count, 64 + 16	/* Adjust count and bias for loop.  */
-1:	stp	q0, q0, [dst], 64
-	stp	q0, q0, [dst, -32]
+1:	stp	q0, q0, [dst, 32]
+	stp	q0, q0, [dst, 64]!
 L(tail64):
 	subs	count, count, 64
 	b.hi	1b
@@ -183,6 +183,7 @@ L(zva_other):
 	subs	count, count, zva_len
 	b.hs	3b
 4:	add	count, count, zva_len
+	sub	dst, dst, 32		/* Bias dst for tail loop.  */
 	b	L(tail64)
 #endif
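A small C model, not part of the patch, can sanity-check that the -32 bias leaves the L(no_zva) store addresses unchanged. It is a sketch under stated assumptions: addresses are offsets into a byte buffer, each `stp q0, q0` is modeled as a 32-byte `memset` whose address is logged, and `subs count, count, 64; b.hi 1b` is modeled as "loop again while count > 64". All names (`trace_t`, `stp32`, `no_zva_old`, `no_zva_new`, `set_long`, `no_zva_equivalent`) are hypothetical. The 16-byte head store at dstin, the `count > 96` entry condition for L(set_long) and the two final stores from dstend are assumptions about the surrounding memset.S code, which the diff does not show. The L(zva_other) path is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF 2048        /* Size of the simulated memory, in bytes. */
#define MAX_STORES 256  /* Upper bound on recorded 32-byte stores. */

/* Hypothetical trace of 32-byte store addresses (offsets into the buffer). */
typedef struct {
    size_t addr[MAX_STORES];
    size_t n;
} trace_t;

/* Model of "stp q0, q0, [addr]": store 32 copies of the fill byte, log addr. */
static void stp32(unsigned char *mem, size_t addr, unsigned char v, trace_t *t)
{
    assert(addr + 32 <= BUF && t->n < MAX_STORES);
    memset(mem + addr, v, 32);
    t->addr[t->n++] = addr;
}

/* Old L(no_zva) loop: post-indexed writeback on the first STP. */
static void no_zva_old(unsigned char *mem, size_t dst, size_t dstend,
                       unsigned char v, trace_t *t)
{
    size_t count = dstend - dst;        /* Count is 16 too large. */
    dst += 16;
    count -= 64 + 16;                   /* Adjust count and bias for loop. */
    for (;;) {
        stp32(mem, dst, v, t);          /* stp q0, q0, [dst], 64 */
        dst += 64;
        stp32(mem, dst - 32, v, t);     /* stp q0, q0, [dst, -32] */
        if (count <= 64)                /* subs count, count, 64; b.hi 1b */
            break;
        count -= 64;
    }
}

/* New L(no_zva) loop: dst biased by -32, pre-indexed writeback on 2nd STP. */
static void no_zva_new(unsigned char *mem, size_t dst, size_t dstend,
                       unsigned char v, trace_t *t)
{
    size_t count = dstend - dst;        /* Count is 16 too large. */
    dst -= 16;                          /* Dst is biased by -32. */
    count -= 64 + 16;                   /* Adjust count and bias for loop. */
    for (;;) {
        stp32(mem, dst + 32, v, t);     /* stp q0, q0, [dst, 32] */
        dst += 64;
        stp32(mem, dst, v, t);          /* stp q0, q0, [dst, 64]! */
        if (count <= 64)                /* subs count, count, 64; b.hi 1b */
            break;
        count -= 64;
    }
}

typedef void (*loop_fn)(unsigned char *, size_t, size_t, unsigned char,
                        trace_t *);

/* Assumed shape of L(set_long) for a non-zero fill byte (ZVA not taken). */
static void set_long(unsigned char *mem, size_t dstin, size_t count,
                     unsigned char v, loop_fn loop, trace_t *t)
{
    size_t dstend = dstin + count;
    size_t dst = dstin & ~(size_t)15;   /* bic dst, dstin, 15 */
    memset(mem + dstin, v, 16);         /* str q0, [dstin] */
    loop(mem, dst, dstend, v, t);
    stp32(mem, dstend - 64, v, t);      /* assumed: stp q0, q0, [dstend, -64] */
    stp32(mem, dstend - 32, v, t);      /* assumed: stp q0, q0, [dstend, -32] */
}

/* Returns 1 if both loops issue identical store addresses and each fills
   exactly [dstin, dstin + count) and nothing else, otherwise 0. */
int no_zva_equivalent(size_t dstin, size_t count)
{
    static unsigned char a[BUF], b[BUF];
    static trace_t ta, tb;

    assert(dstin >= 16 && count > 96 && dstin + count <= BUF);
    memset(a, 0, BUF);
    memset(b, 0, BUF);
    ta.n = tb.n = 0;
    set_long(a, dstin, count, 0xA5, no_zva_old, &ta);
    set_long(b, dstin, count, 0xA5, no_zva_new, &tb);

    if (ta.n != tb.n ||
        memcmp(ta.addr, tb.addr, ta.n * sizeof ta.addr[0]) != 0)
        return 0;
    for (size_t i = 0; i < BUF; i++) {
        unsigned char want = (i >= dstin && i < dstin + count) ? 0xA5 : 0;
        if (a[i] != want || b[i] != want)
            return 0;
    }
    return 1;
}
```

Calling `no_zva_equivalent(dstin, count)` for every dstin in 64..79 (all 16 alignments) and count in 97..1023 covers both single- and multi-iteration loops; if the bias reasoning holds, every call returns 1. The model deliberately mirrors each instruction in a comment so a mismatch between the two loops would show up as a differing address trace rather than only as differing buffer contents.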