From patchwork Fri Aug 2 04:31:45 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 33915
From: Feng Xue OS
To: Siddhesh Poyarekar, "libc-alpha@sourceware.org"
Subject: Re: [PATCH] aarch64: Add tunable glibc.memset.dc_zva_threshold
Date: Fri, 2 Aug 2019 04:31:45 +0000
References: <749cd1e7-9c50-9f8c-6f4e-1920f9618c3c@gotplt.org>, <495d29a6-a149-b5aa-affe-56c503be08c0@gotplt.org>
In-Reply-To: <495d29a6-a149-b5aa-affe-56c503be08c0@gotplt.org>

>> That's great, can you test for another part too?  Making a case for a
>> tunable is easier if you can show applicability to a larger set of
>> processors.

> Oops, I assumed this explanation was for introducing the tunable, I
> realize now that it is for the default 8M value on emag.  The test is
> still desirable to make the case for a tunable stronger, but not for
> this specific point :)

I still hope this DC ZVA tuning can also work for other aarch64 processors.
We focus on emag and have no other aarch64 machines on hand, so it would be
good if someone with other aarch64 hardware is willing to test this.
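For anyone willing to try this on another aarch64 part, a minimal timing
sketch could look like the following.  It is not part of the patch:
time_zero_memset is a hypothetical helper, and it uses the coarse CPU-time
clock() from standard C rather than a finer-grained timer.  It measures a
single thread only, so it does not reproduce the multi-process contention
that motivates the 8M default; several copies would have to run in parallel
for that.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of the patch: return the average CPU time
   in seconds for one zeroing memset of SIZE bytes, or -1.0 on bad
   arguments or allocation failure.  */
double
time_zero_memset (size_t size, int iters)
{
  if (size == 0 || iters <= 0)
    return -1.0;

  unsigned char *buf = malloc (size);
  if (buf == NULL)
    return -1.0;

  /* Touch the buffer once so that page faults are not timed.  */
  memset (buf, 1, size);

  clock_t start = clock ();
  for (int i = 0; i < iters; i++)
    {
      memset (buf, 0, size);
      /* Compiler barrier so the stores are not optimized away.  */
      __asm__ __volatile__ ("" : : "r" (buf) : "memory");
    }
  clock_t end = clock ();

  free (buf);
  return (double) (end - start) / CLOCKS_PER_SEC / iters;
}
```

Calling this from a small driver over a range of sizes, once with the
default and once with GLIBC_TUNABLES=glibc.cache.aarch64_dczva_threshold=<bytes>
in the environment (assuming a glibc built with this patch and with tunables
enabled), should show at which sizes DC ZVA starts to pay off on a given
machine.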

> It's not just about whether the meaning is clear, it is about naming
> convention.  Not having an architecture name in the tunable implies that
> it could be generally applicable.

OK, changed.

Thanks,
Feng
--------
	* manual/tunables.texi: Document glibc.cache.aarch64_dczva_threshold.
	* sysdeps/aarch64/dl-tunables.list (glibc):
	Add cache.aarch64_dczva_threshold.
	* sysdeps/aarch64/multiarch/memset_emag.c: New file.
	* sysdeps/aarch64/multiarch/memset_base64.S (DC_ZVA_THRESHOLD):
	Rename to DCZVA_THRESHOLD.
	(__memset_base64): Add conditional compare over __dczva_threshold
	enclosed by new macro HAVE_DCZVA_THRESHOLD_TUNABLE.
	* sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD): Rename
	to DCZVA_THRESHOLD, and change to a new value.
	(HAVE_DCZVA_THRESHOLD_TUNABLE): New macro.
---
 ChangeLog                                 | 14 +++++++++++
 manual/tunables.texi                      |  9 +++++++
 sysdeps/aarch64/dl-tunables.list          |  6 +++++
 sysdeps/aarch64/multiarch/memset.c        |  2 ++
 sysdeps/aarch64/multiarch/memset_base64.S | 13 +++++++---
 sysdeps/aarch64/multiarch/memset_emag.S   | 15 +++++++----
 sysdeps/aarch64/multiarch/memset_emag.c   | 41 +++++++++++++++++++++++++++++++
 7 files changed, 92 insertions(+), 8 deletions(-)
 create mode 100644 sysdeps/aarch64/multiarch/memset_emag.c

--
1.8.3.1

diff --git a/ChangeLog b/ChangeLog
index dbdb85d..94c55e3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2019-07-31  Feng Xue
+
+	* manual/tunables.texi: Document glibc.cache.aarch64_dczva_threshold.
+	* sysdeps/aarch64/dl-tunables.list (glibc):
+	Add cache.aarch64_dczva_threshold.
+	* sysdeps/aarch64/multiarch/memset_emag.c: New file.
+	* sysdeps/aarch64/multiarch/memset_base64.S (DC_ZVA_THRESHOLD):
+	Rename to DCZVA_THRESHOLD.
+	(__memset_base64): Add conditional compare over __dczva_threshold
+	enclosed by new macro HAVE_DCZVA_THRESHOLD_TUNABLE.
+	* sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD): Rename
+	to DCZVA_THRESHOLD, and change to a new value.
+	(HAVE_DCZVA_THRESHOLD_TUNABLE): New macro.
+
 2019-07-25  Florian Weimer
 
 	[BZ #24677]
diff --git a/manual/tunables.texi b/manual/tunables.texi
index ee0fdf2..b248739 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -411,3 +411,12 @@ instead.
 
 This tunable is specific to i386 and x86-64.
 @end deftp
+
+@deftp Tunable glibc.cache.aarch64_dczva_threshold
+The @code{glibc.cache.aarch64_dczva_threshold} tunable allows the user to set
+the threshold at which memset starts using DC ZVA on emag processors.  When the
+memset size is less than this threshold, normal store instructions are used;
+otherwise the DC ZVA instruction is used.
+
+This tunable is specific to emag aarch64.
+@end deftp
diff --git a/sysdeps/aarch64/dl-tunables.list b/sysdeps/aarch64/dl-tunables.list
index 5fac533..3bc622c 100644
--- a/sysdeps/aarch64/dl-tunables.list
+++ b/sysdeps/aarch64/dl-tunables.list
@@ -22,4 +22,10 @@ glibc {
       type: STRING
     }
   }
+  cache {
+    aarch64_dczva_threshold {
+      type: SIZE_T
+      default: 0
+    }
+  }
 }
diff --git a/sysdeps/aarch64/multiarch/memset.c b/sysdeps/aarch64/multiarch/memset.c
index 4817587..792d7f6 100644
--- a/sysdeps/aarch64/multiarch/memset.c
+++ b/sysdeps/aarch64/multiarch/memset.c
@@ -41,4 +41,6 @@ libc_ifunc (__libc_memset,
 
 # undef memset
 strong_alias (__libc_memset, memset);
+
+# include "./memset_emag.c"
 #endif
diff --git a/sysdeps/aarch64/multiarch/memset_base64.S b/sysdeps/aarch64/multiarch/memset_base64.S
index 9a62325..d523320 100644
--- a/sysdeps/aarch64/multiarch/memset_base64.S
+++ b/sysdeps/aarch64/multiarch/memset_base64.S
@@ -23,8 +23,8 @@
 # define MEMSET __memset_base64
 #endif
 
-#ifndef DC_ZVA_THRESHOLD
-# define DC_ZVA_THRESHOLD 512
+#ifndef DCZVA_THRESHOLD
+# define DCZVA_THRESHOLD 512
 #endif
 
 /* Assumptions:
@@ -91,7 +91,14 @@ L(set96):
 	.p2align 4
 L(set_long):
 	stp	val, val, [dstin]
-	cmp	count, DC_ZVA_THRESHOLD
+#ifdef HAVE_DCZVA_THRESHOLD_TUNABLE
+	adrp	tmp1, __dczva_threshold
+	add	tmp1, tmp1, :lo12:__dczva_threshold
+	ldr	tmp2, [tmp1]	/* Load DC ZVA tunable threshold value.  */
+	cmp	count, tmp2
+#else
+	cmp	count, DCZVA_THRESHOLD
+#endif
 	ccmp	val, 0, 0, cs
 	bic	dst, dstin, 15
 	b.eq	L(zva_64)
diff --git a/sysdeps/aarch64/multiarch/memset_emag.S b/sysdeps/aarch64/multiarch/memset_emag.S
index 1c1fabc..6df612b 100644
--- a/sysdeps/aarch64/multiarch/memset_emag.S
+++ b/sysdeps/aarch64/multiarch/memset_emag.S
@@ -20,13 +20,18 @@
 #if IS_IN (libc)
 # define MEMSET __memset_emag
 
+# if HAVE_TUNABLES
+#  define HAVE_DCZVA_THRESHOLD_TUNABLE 1
+# endif
+
 /*
- * Using dc zva to zero memory does not produce better performance if
+ * Using DC ZVA to zero memory does not produce better performance if
  * memory size is not very large, especially when there are multiple
- * processes/threads contending memory/cache. Here we use a somewhat
- * large threshold to trigger usage of dc zva.
-*/
-# define DC_ZVA_THRESHOLD 1024
+ * processes/threads contending for memory/cache. Here we use a very
+ * large threshold to trigger usage of DC ZVA, which is good for
+ * multi-process/thread workloads.
+ */
+# define DCZVA_THRESHOLD 8*1024*1024
 
 # include "./memset_base64.S"
 #endif
diff --git a/sysdeps/aarch64/multiarch/memset_emag.c b/sysdeps/aarch64/multiarch/memset_emag.c
new file mode 100644
index 0000000..6121046
--- /dev/null
+++ b/sysdeps/aarch64/multiarch/memset_emag.c
@@ -0,0 +1,41 @@
+/* Setup threshold to trigger DC ZVA in memset for emag.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if HAVE_TUNABLES
+# include <elf/dl-tunables.h>
+
+/* We assume common server workloads are likely to consist of multiple
+ * processes/threads contending for memory/cache.  For this scenario, disabling
+ * DC ZVA in memset can achieve better performance on emag processors.
+ * Therefore, by default, we use a very large threshold (8M here), which
+ * has a similar effect to disabling DC ZVA, as an optimization for
+ * multi-process/thread workloads.
+ */
+uint64_t __dczva_threshold = 8 * 1024 * 1024;
+
+static void
+__attribute__ ((constructor))
+init_dczva_threshold (void)
+{
+  uint64_t threshold
+    = TUNABLE_GET (glibc, cache, aarch64_dczva_threshold, uint64_t, NULL);
+
+  if (threshold)
+    __dczva_threshold = threshold;
+}
+#endif
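
As a reading aid, the decision this patch makes on the L(set_long) path can
be modelled in C roughly as follows.  This is a sketch, not code from the
patch: both function names and the DCZVA_DEFAULT_THRESHOLD macro are made up
for illustration.  The first function mirrors the constructor, where a
tunable value of 0 (the default in dl-tunables.list) leaves the built-in 8M
threshold in place.  The second mirrors the cmp/ccmp/b.eq sequence, which
takes the DC ZVA path only when the size is at least the threshold and the
fill value is zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative name; mirrors the 8M default used for emag in the patch.  */
#define DCZVA_DEFAULT_THRESHOLD (8u * 1024 * 1024)

/* Hypothetical model of init_dczva_threshold: a tunable value of 0 means
   "not set", so the built-in default is kept.  */
uint64_t
effective_dczva_threshold (uint64_t tunable_value)
{
  return tunable_value != 0 ? tunable_value : DCZVA_DEFAULT_THRESHOLD;
}

/* Hypothetical model of "cmp count, threshold; ccmp val, 0, 0, cs; b.eq":
   DC ZVA is chosen only for an unsigned count >= threshold and a zero
   fill byte.  */
bool
memset_uses_dc_zva (size_t count, int c, uint64_t threshold)
{
  return count >= threshold && (unsigned char) c == 0;
}
```

With the default threshold, a zeroing memset of exactly 8M would take the
DC ZVA path in this model, while anything smaller, or any non-zero fill
value, would fall back to ordinary stores.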