From patchwork Fri Aug 2 01:49:58 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 33914
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
From: Feng Xue OS
To: Siddhesh Poyarekar, "libc-alpha@sourceware.org"
Subject: Re: [PATCH] aarch64: Add tunable glibc.memset.dc_zva_threshold
Date: Fri, 2 Aug 2019 01:49:58 +0000

For SPEC2017 502.gcc_r (rate=32), which uses quite a few memsets, we get a
2.3% improvement on the emag processor when the DC ZVA threshold is changed
from 512 to 8M.

> This should be called cache.aarch64_dc_zva_threshold or
> cache.aarch64_dczva_threshold.

I think dc_zva already implies the aarch64 architecture, so the name
"cache.dc_zva_threshold" seems a little more concise.

> The default threshold initialization needs to take place in the emag
> file and not here, this code is already getting complicated and it won't
> be long until it starts looking like a character soup. That should also
> take care of the unexplained magic number (8M).

Changed, and added a comment for this default value; please refer to the new
patch below.

Thanks,
Feng
--------------
	* manual/tunables.texi: Document glibc.cache.dc_zva_threshold.
	* sysdeps/aarch64/dl-tunables.list (glibc): Add cache.dc_zva_threshold.
	* sysdeps/aarch64/multiarch/memset_emag.c: New file.
	* sysdeps/aarch64/multiarch/memset_base64.S (__memset_base64): Add
	conditional compare over __dc_zva_threshold.
	* sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD): Change
	to a new value.
	(HAVE_THRESHOLD_TUNABLE): New macro.
---
 ChangeLog                                 | 12 +++++++++
 manual/tunables.texi                      |  9 +++++++
 sysdeps/aarch64/dl-tunables.list          |  6 +++++
 sysdeps/aarch64/multiarch/memset.c        |  2 ++
 sysdeps/aarch64/multiarch/memset_base64.S |  7 ++++++
 sysdeps/aarch64/multiarch/memset_emag.S   | 15 +++++++----
 sysdeps/aarch64/multiarch/memset_emag.c   | 41 +++++++++++++++++++++++++++++++
 7 files changed, 87 insertions(+), 5 deletions(-)
 create mode 100644 sysdeps/aarch64/multiarch/memset_emag.c

-- 
1.8.3.1

diff --git a/ChangeLog b/ChangeLog
index dbdb85d..7626606 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2019-07-31  Feng Xue
+
+	* manual/tunables.texi: Document glibc.cache.dc_zva_threshold.
+	* sysdeps/aarch64/dl-tunables.list (glibc):
+	Add cache.dc_zva_threshold.
+	* sysdeps/aarch64/multiarch/memset_emag.c: New file.
+	* sysdeps/aarch64/multiarch/memset_base64.S (__memset_base64): Add
+	conditional compare over __dc_zva_threshold.
+	* sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD): Change
+	to a new value.
+	(HAVE_THRESHOLD_TUNABLE): New macro.
+
 2019-07-25  Florian Weimer
 
 	[BZ #24677]
diff --git a/manual/tunables.texi b/manual/tunables.texi
index ee0fdf2..fb304d0 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -411,3 +411,12 @@ instead.
 
 This tunable is specific to i386 and x86-64.
 @end deftp
+
+@deftp Tunable glibc.cache.dc_zva_threshold
+The @code{glibc.cache.dc_zva_threshold} tunable allows the user to set the
+threshold that triggers DC ZVA in memset on the emag processor.  When the
+memset size is less than this threshold, normal memory store instructions
+are used; otherwise the DC ZVA instruction is used.
+
+This tunable is specific to emag aarch64.
+@end deftp
diff --git a/sysdeps/aarch64/dl-tunables.list b/sysdeps/aarch64/dl-tunables.list
index 5fac533..5d4bffe 100644
--- a/sysdeps/aarch64/dl-tunables.list
+++ b/sysdeps/aarch64/dl-tunables.list
@@ -22,4 +22,10 @@ glibc {
       type: STRING
     }
   }
+  cache {
+    dc_zva_threshold {
+      type: SIZE_T
+      default: 0
+    }
+  }
 }
diff --git a/sysdeps/aarch64/multiarch/memset.c b/sysdeps/aarch64/multiarch/memset.c
index 4817587..792d7f6 100644
--- a/sysdeps/aarch64/multiarch/memset.c
+++ b/sysdeps/aarch64/multiarch/memset.c
@@ -41,4 +41,6 @@ libc_ifunc (__libc_memset,
 
 # undef memset
 strong_alias (__libc_memset, memset);
+
+# include "./memset_emag.c"
 #endif
diff --git a/sysdeps/aarch64/multiarch/memset_base64.S b/sysdeps/aarch64/multiarch/memset_base64.S
index 9a62325..748e321 100644
--- a/sysdeps/aarch64/multiarch/memset_base64.S
+++ b/sysdeps/aarch64/multiarch/memset_base64.S
@@ -91,7 +91,14 @@ L(set96):
 	.p2align 4
 L(set_long):
 	stp	val, val, [dstin]
+#ifdef HAVE_THRESHOLD_TUNABLE
+	adrp	tmp1, __dc_zva_threshold
+	add	tmp1, tmp1, :lo12:__dc_zva_threshold
+	ldr	tmp2, [tmp1] /* Load DC ZVA tunable threshold value.  */
+	cmp	count, tmp2
+#else
 	cmp	count, DC_ZVA_THRESHOLD
+#endif
 	ccmp	val, 0, 0, cs
 	bic	dst, dstin, 15
 	b.eq	L(zva_64)
diff --git a/sysdeps/aarch64/multiarch/memset_emag.S b/sysdeps/aarch64/multiarch/memset_emag.S
index 1c1fabc..a566e40 100644
--- a/sysdeps/aarch64/multiarch/memset_emag.S
+++ b/sysdeps/aarch64/multiarch/memset_emag.S
@@ -20,13 +20,18 @@
 #if IS_IN (libc)
 # define MEMSET __memset_emag
 
+# if HAVE_TUNABLES
+#  define HAVE_THRESHOLD_TUNABLE 1
+# endif
+
 /*
- * Using dc zva to zero memory does not produce better performance if
+ * Using DC ZVA to zero memory does not produce better performance if
  * memory size is not very large, especially when there are multiple
- * processes/threads contending memory/cache. Here we use a somewhat
- * large threshold to trigger usage of dc zva.
-*/
-# define DC_ZVA_THRESHOLD 1024
+ * processes/threads contending for memory/cache. Here we use a very
+ * large threshold to trigger usage of DC ZVA, which is good for
+ * multi-process/thread workloads.
+ */
+# define DC_ZVA_THRESHOLD 8*1024*1024
 
 # include "./memset_base64.S"
 #endif
diff --git a/sysdeps/aarch64/multiarch/memset_emag.c b/sysdeps/aarch64/multiarch/memset_emag.c
new file mode 100644
index 0000000..8b83ab9
--- /dev/null
+++ b/sysdeps/aarch64/multiarch/memset_emag.c
@@ -0,0 +1,41 @@
+/* Setup threshold to trigger DC ZVA in memset for emag.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if HAVE_TUNABLES
+# include
+
+/* We assume common workloads on servers are likely to consist of multiple
+ * processes/threads contending for memory/cache.  For this scenario,
+ * disabling DC ZVA in memset can achieve better performance on the emag
+ * processor.  Therefore, by default, we use a very large threshold (8M
+ * here), which has a similar effect to disabling DC ZVA and acts as an
+ * optimization for multi-process/thread workloads.
+ */
+uint64_t __dc_zva_threshold = 8 * 1024 * 1024;
+
+static void
+__attribute__ ((constructor))
+init_dc_zva_threshold (void)
+{
+  uint64_t threshold
+    = TUNABLE_GET (glibc, cache, dc_zva_threshold, uint64_t, NULL);
+
+  if (threshold)
+    __dc_zva_threshold = threshold;
+}
+#endif
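
For readers who do not read AArch64 assembly, the threshold logic in this patch can be restated as a small standalone C sketch. It is not part of the patch and not glibc code: the helper names effective_threshold and takes_zva_path and the macro DEFAULT_DC_ZVA_THRESHOLD are hypothetical, chosen only for illustration. The sketch assumes, as the constructor in memset_emag.c does, that a tunable value of 0 means "not set", and it mirrors the cmp/ccmp/b.eq sequence in L(set_long), where the DC ZVA path is taken only when count is at least the threshold and the fill value is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Default from the patch: 8 MiB.  The macro name is hypothetical.  */
#define DEFAULT_DC_ZVA_THRESHOLD ((uint64_t) 8 * 1024 * 1024)

/* Hypothetical helper.  Mirrors init_dc_zva_threshold: a tunable value
   of 0 is treated as "not set", so the built-in default is kept.  */
static uint64_t
effective_threshold (uint64_t tunable_value)
{
  return tunable_value != 0 ? tunable_value : DEFAULT_DC_ZVA_THRESHOLD;
}

/* Hypothetical helper.  Mirrors "cmp count, threshold; ccmp val, 0, 0, cs;
   b.eq L(zva_64)": the branch is taken only when count >= threshold
   (unsigned compare) and the fill byte is zero.  The real code reaches
   this test only on the long-set path; that precondition is not modelled
   here.  */
static bool
takes_zva_path (size_t count, int val, uint64_t threshold)
{
  return (uint64_t) count >= threshold && (unsigned char) val == 0;
}
```

With the default in place, takes_zva_path (n, 0, effective_threshold (0)) is false for every n below 8 MiB, which is the "similar effect as disabling DC ZVA" that the comment in memset_emag.c describes.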