From patchwork Thu Aug  8 03:56:33 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 34001
From: Feng Xue OS
To: Siddhesh Poyarekar, Wilco Dijkstra, 'GNU C Library'
CC: nd
Subject: Re: [PATCH] aarch64: Add tunable glibc.memset.dc_zva_threshold
Date: Thu, 8 Aug 2019 03:56:33 +0000
References: <0a810dde-3b92-4782-09cb-16cdbc8dbb75@gotplt.org>
In-Reply-To: <0a810dde-3b92-4782-09cb-16cdbc8dbb75@gotplt.org>

This version disables DC ZVA for emag.

Feng
------
	* sysdeps/aarch64/multiarch/memset_base64.S (DC_ZVA_THRESHOLD):
	Disable DC ZVA code if this macro is defined as zero.
	* sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD):
	Change to zero to disable using DC ZVA.
---
 ChangeLog                                 |  7 +++++++
 sysdeps/aarch64/multiarch/memset_base64.S | 12 ++++++++++--
 sysdeps/aarch64/multiarch/memset_emag.S   | 12 +++++++-----
 3 files changed, 24 insertions(+), 7 deletions(-)

-- 
1.8.3.1

diff --git a/ChangeLog b/ChangeLog
index dbdb85d..ba27f96 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2019-08-08  Feng Xue
+
+	* sysdeps/aarch64/multiarch/memset_base64.S (DC_ZVA_THRESHOLD):
+	Disable DC ZVA code if this macro is defined as zero.
+	* sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD):
+	Change to zero to disable using DC ZVA.
+
 2019-07-25  Florian Weimer
 
 	[BZ #24677]
diff --git a/sysdeps/aarch64/multiarch/memset_base64.S b/sysdeps/aarch64/multiarch/memset_base64.S
index 9a62325..c0cccba 100644
--- a/sysdeps/aarch64/multiarch/memset_base64.S
+++ b/sysdeps/aarch64/multiarch/memset_base64.S
@@ -23,6 +23,7 @@
 # define MEMSET __memset_base64
 #endif
 
+/* To disable DC ZVA, set this threshold to 0. */
 #ifndef DC_ZVA_THRESHOLD
 # define DC_ZVA_THRESHOLD 512
 #endif
@@ -91,11 +92,12 @@ L(set96):
 	.p2align 4
 L(set_long):
 	stp	val, val, [dstin]
+	bic	dst, dstin, 15
+#if DC_ZVA_THRESHOLD
 	cmp	count, DC_ZVA_THRESHOLD
 	ccmp	val, 0, 0, cs
-	bic	dst, dstin, 15
 	b.eq	L(zva_64)
-
+#endif
 	/* Small-size or non-zero memset does not use DC ZVA. */
 	sub	count, dstend, dst
 
@@ -105,7 +107,11 @@ L(set_long):
 	 * count is less than 33 bytes, so as to bypass 2 unneccesary stps.
 	 */
 	sub	count, count, 64+16+1
+
+#if DC_ZVA_THRESHOLD
+	/* Align loop on 16-byte boundary, this might be friendly to i-cache. */
 	nop
+#endif
 
 1:	stp	val, val, [dst, 16]
 	stp	val, val, [dst, 32]
@@ -121,6 +127,7 @@ L(set_long):
 	stp	val, val, [dstend, -16]
 	ret
 
+#if DC_ZVA_THRESHOLD
 	.p2align 3
 L(zva_64):
 	stp	val, val, [dst, 16]
@@ -173,6 +180,7 @@ L(zva_64):
 1:	stp	val, val, [dstend, -32]
 	stp	val, val, [dstend, -16]
 	ret
+#endif
 
 END (MEMSET)
 libc_hidden_builtin_def (MEMSET)
diff --git a/sysdeps/aarch64/multiarch/memset_emag.S b/sysdeps/aarch64/multiarch/memset_emag.S
index 1c1fabc..c2aed62 100644
--- a/sysdeps/aarch64/multiarch/memset_emag.S
+++ b/sysdeps/aarch64/multiarch/memset_emag.S
@@ -21,12 +21,14 @@
 # define MEMSET __memset_emag
 
 /*
- * Using dc zva to zero memory does not produce better performance if
+ * Using DC ZVA to zero memory does not produce better performance if
  * memory size is not very large, especially when there are multiple
- * processes/threads contending memory/cache. Here we use a somewhat
- * large threshold to trigger usage of dc zva.
-*/
-# define DC_ZVA_THRESHOLD 1024
+ * processes/threads contending memory/cache. Here we set threshold to
+ * zero to disable using DC ZVA, which is good for multi-process/thread
+ * workloads.
+ */
+
+# define DC_ZVA_THRESHOLD 0
 
 # include "./memset_base64.S"
 #endif
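
For reviewers who want to check the guard logic without reading the assembly, here is a rough C model of the decision made at L(set_long). It is only a sketch: the function name takes_zva_path is made up for illustration and does not exist in glibc, and the model covers just the cmp/ccmp/b.eq sequence, not the stores themselves. With a non-zero threshold the DC ZVA path is taken only when count is at least DC_ZVA_THRESHOLD (unsigned compare) and the fill value is zero; with the threshold defined as 0 the check is compiled out and the path is never taken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the default in memset_base64.S; a variant such as memset_emag.S
   may define it as 0 before this point to disable DC ZVA entirely.  */
#ifndef DC_ZVA_THRESHOLD
# define DC_ZVA_THRESHOLD 512
#endif

/* Hypothetical helper, not part of glibc.  Models
     cmp  count, DC_ZVA_THRESHOLD
     ccmp val, 0, 0, cs
     b.eq L(zva_64)
   and returns true when the branch to L(zva_64) would be taken.  */
static bool
takes_zva_path (size_t count, unsigned char val)
{
#if DC_ZVA_THRESHOLD
  return count >= DC_ZVA_THRESHOLD && val == 0;
#else
  /* Threshold 0: the compare and the zva_64 code are not assembled.  */
  (void) count;
  (void) val;
  return false;
#endif
}
```

Building the same file with -DDC_ZVA_THRESHOLD=0 makes the function return false for every input, which corresponds to what the emag variant gets after this patch.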