From patchwork Fri Jul 26 11:58:03 2019
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 33805
From: Feng Xue OS
To: "libc-alpha@sourceware.org"
CC: Feng Xue OS
Subject: [PATCH] aarch64: Add tunable glibc.memset.dc_zva_threshold
Date: Fri, 26 Jul 2019 11:58:03 +0000

This patch adds a tunable, 'glibc.memset.dc_zva_threshold', that controls
whether memset uses DC ZVA.  DC ZVA is used only when the memset size is
at least this threshold; smaller sizes use normal store instructions.
The background is that DC ZVA does not always outperform normal
memory-store zeroing, especially when multiple processes/threads contend
for memory/cache.

Feng
----
	* manual/tunables.texi: Document glibc.memset.dc_zva_threshold.
	* sysdeps/aarch64/dl-tunables.list (glibc):
	Add memset.dc_zva_threshold.
	* sysdeps/aarch64/multiarch/init-arch.h [HAVE_TUNABLES]: Include
	dl-tunables.h
	(INIT_ZVA_THRESHOLD): New macro, with new local variable
	zva_threshold.
	(INIT_ARCH): Add INIT_ZVA_THRESHOLD.
	* sysdeps/aarch64/multiarch/memset.c (__memset_dc_zva_threshold):
	New variable.
	(init_memset): New macro.
	* sysdeps/aarch64/memset_base64.S (__memset_base64) [HAVE_TUNABLES]:
	Add conditional compare over __memset_dc_zva_threshold.
	* sysdeps/aarch64/memset_emag.S (DC_ZVA_THRESHOLD): Changed to a
	new value.
---
 ChangeLog                                 | 18 ++++++++++++++++++
 manual/tunables.texi                      |  9 +++++++++
 sysdeps/aarch64/dl-tunables.list          |  6 ++++++
 sysdeps/aarch64/multiarch/init-arch.h     | 11 +++++++++++
 sysdeps/aarch64/multiarch/memset.c        | 22 +++++++++++++++++++---
 sysdeps/aarch64/multiarch/memset_base64.S |  7 +++++++
 sysdeps/aarch64/multiarch/memset_emag.S   |  4 ++--
 7 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index dbdb85d..1921e2a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,21 @@
+2019-07-26  Feng Xue
+
+	* manual/tunables.texi: Document glibc.memset.dc_zva_threshold.
+	* sysdeps/aarch64/dl-tunables.list (glibc):
+	Add memset.dc_zva_threshold.
+	* sysdeps/aarch64/multiarch/init-arch.h [HAVE_TUNABLES]: Include
+	dl-tunables.h
+	(INIT_ZVA_THRESHOLD): New macro, with new local variable
+	zva_threshold.
+	(INIT_ARCH): Add INIT_ZVA_THRESHOLD.
+	* sysdeps/aarch64/multiarch/memset.c (__memset_dc_zva_threshold):
+	New variable.
+	(init_memset): New macro.
+	* sysdeps/aarch64/memset_base64.S (__memset_base64) [HAVE_TUNABLES]:
+	Add conditional compare over __memset_dc_zva_threshold.
+	* sysdeps/aarch64/memset_emag.S (DC_ZVA_THRESHOLD): Changed to a
+	new value.
+
 2019-07-25  Florian Weimer
 
 	[BZ #24677]
diff --git a/manual/tunables.texi b/manual/tunables.texi
index ee0fdf2..c7c13cc 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -411,3 +411,12 @@ instead.
 
 This tunable is specific to i386 and x86-64.
 @end deftp
+
+@deftp Tunable glibc.memset.dc_zva_threshold
+The @code{glibc.memset.dc_zva_threshold} tunable allows the user to set
+threshold to trigger DC ZVA in memset. When memset size is less than this
+threshold, normal memory store instruction will be used, otherwise DC ZVA
+instruction will be used. Value of zero means default threshold.
+
+This tunable is specific to aarch64.
+@end deftp
diff --git a/sysdeps/aarch64/dl-tunables.list b/sysdeps/aarch64/dl-tunables.list
index 5fac533..0f5b5e1 100644
--- a/sysdeps/aarch64/dl-tunables.list
+++ b/sysdeps/aarch64/dl-tunables.list
@@ -22,4 +22,10 @@ glibc {
       type: STRING
     }
   }
+  memset {
+    dc_zva_threshold {
+      type: SIZE_T
+      default: 0
+    }
+  }
 }
diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
index b9020ae..93133a2 100644
--- a/sysdeps/aarch64/multiarch/init-arch.h
+++ b/sysdeps/aarch64/multiarch/init-arch.h
@@ -18,7 +18,18 @@
 
 #include
 
+#if HAVE_TUNABLES
+# include
+
+# define INIT_ZVA_THRESHOLD() \
+  uint64_t __attribute__((unused)) zva_threshold = \
+    TUNABLE_GET(glibc, memset, dc_zva_threshold, size_t, NULL);
+#else
+# define INIT_ZVA_THRESHOLD()
+#endif
+
 #define INIT_ARCH() \
+  INIT_ZVA_THRESHOLD() \
   uint64_t __attribute__((unused)) midr = \
     GLRO(dl_aarch64_cpu_features).midr_el1; \
   unsigned __attribute__((unused)) zva_size = \
diff --git a/sysdeps/aarch64/multiarch/memset.c b/sysdeps/aarch64/multiarch/memset.c
index 4817587..2015bce 100644
--- a/sysdeps/aarch64/multiarch/memset.c
+++ b/sysdeps/aarch64/multiarch/memset.c
@@ -32,12 +32,28 @@ extern __typeof (__redirect_memset) __memset_falkor attribute_hidden;
 extern __typeof (__redirect_memset) __memset_emag attribute_hidden;
 extern __typeof (__redirect_memset) __memset_generic attribute_hidden;
 
+# if HAVE_TUNABLES
+uint64_t __memset_dc_zva_threshold = 512;
+
+# define init_memset(fn, default_zva_threshold) \
+({ \
+  if (zva_threshold) \
+    __memset_dc_zva_threshold = zva_threshold; \
+  else if (default_zva_threshold) \
+    __memset_dc_zva_threshold = default_zva_threshold; \
+  fn; \
+})
+# else
+# define init_memset(fn, default_zva_threshold) (fn)
+# endif
+
 libc_ifunc (__libc_memset,
+
 	    ((IS_FALKOR (midr) || IS_PHECDA (midr)) && zva_size == 64
-	     ? __memset_falkor
+	     ? init_memset (__memset_falkor, 0)
 	     : (IS_EMAG (midr) && zva_size == 64
-	       ? __memset_emag
-	       : __memset_generic)));
+	       ? init_memset (__memset_emag, 8*1024*1024)
+	       : init_memset (__memset_generic, 0))));
 
 # undef memset
 strong_alias (__libc_memset, memset);
diff --git a/sysdeps/aarch64/multiarch/memset_base64.S b/sysdeps/aarch64/multiarch/memset_base64.S
index 9a62325..6350a6d 100644
--- a/sysdeps/aarch64/multiarch/memset_base64.S
+++ b/sysdeps/aarch64/multiarch/memset_base64.S
@@ -91,7 +91,14 @@ L(set96):
 	.p2align 4
 L(set_long):
 	stp	val, val, [dstin]
+#if HAVE_TUNABLES
+	adrp	tmp1, __memset_dc_zva_threshold
+	add	tmp1, tmp1, :lo12:__memset_dc_zva_threshold
+	ldr	tmp2, [tmp1] /* Load DC ZVA tunable threshold value. */
+	cmp	count, tmp2
+#else
 	cmp	count, DC_ZVA_THRESHOLD
+#endif
 	ccmp	val, 0, 0, cs
 	bic	dst, dstin, 15
 	b.eq	L(zva_64)
diff --git a/sysdeps/aarch64/multiarch/memset_emag.S b/sysdeps/aarch64/multiarch/memset_emag.S
index 1c1fabc..78a2a14 100644
--- a/sysdeps/aarch64/multiarch/memset_emag.S
+++ b/sysdeps/aarch64/multiarch/memset_emag.S
@@ -23,10 +23,10 @@
 /*
  * Using dc zva to zero memory does not produce better performance if
  * memory size is not very large, especially when there are multiple
- * processes/threads contending memory/cache. Here we use a somewhat
+ * processes/threads contending memory/cache. Here we use a very
  * large threshold to trigger usage of dc zva.
  */
-# define DC_ZVA_THRESHOLD 1024
+# define DC_ZVA_THRESHOLD (8*1024*1024)
 
 # include "./memset_base64.S"
 #endif
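
For readers who want to check the threshold logic without reading the
assembly, the selection done by init_memset and the compare in L(set_long)
can be modelled in plain C.  This is only a sketch under stated assumptions,
not glibc code: the names model_effective_threshold, model_uses_dc_zva and
MODEL_BUILTIN_THRESHOLD are hypothetical, and the model ignores the
zva_size == 64 check and the smaller-size paths taken before L(set_long).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the patch's logic; none of these names exist
   in glibc.  */

/* Mirrors the patch's initial value of __memset_dc_zva_threshold.  */
#define MODEL_BUILTIN_THRESHOLD 512u

/* Pick the effective threshold the way init_memset does: a nonzero
   tunable wins, then a nonzero per-CPU default, otherwise the built-in
   initial value is left in place.  */
static uint64_t
model_effective_threshold (uint64_t tunable, uint64_t cpu_default)
{
  if (tunable != 0)
    return tunable;
  if (cpu_default != 0)
    return cpu_default;
  return MODEL_BUILTIN_THRESHOLD;
}

/* Models "cmp count, tmp2; ccmp val, 0, 0, cs; b.eq L(zva_64)":
   the DC ZVA path is taken only when count >= threshold (unsigned
   compare) and the fill value is zero.  Treating the fill value as
   an unsigned char is an assumption based on memset's C semantics.  */
static bool
model_uses_dc_zva (size_t count, int val, uint64_t threshold)
{
  return (uint64_t) count >= threshold && (unsigned char) val == 0;
}
```

With a glibc built from this patch, the tunable would presumably be set
through the usual environment variable, for example
GLIBC_TUNABLES=glibc.memset.dc_zva_threshold=4096; in the model above that
corresponds to model_effective_threshold (4096, cpu_default) returning 4096
regardless of the per-CPU default.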