From patchwork Mon May 22 20:22:51 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 20536
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
References: <9c563a4b-424b-242f-b82f-4650ab2637f7@redhat.com>
 <28e34264-e8c5-5570-c48c-9125893808b2@redhat.com>
From: "H.J. Lu"
Date: Mon, 22 May 2017 13:22:51 -0700
Subject: Re: memcpy performance regressions 2.19 -> 2.24(5)
To: Erich Elsen
Cc: "Carlos O'Donell", GNU C Library

On Mon, May 22, 2017 at 12:17 PM, H.J. Lu wrote:
> On Thu, May 18, 2017 at 1:59 PM, Erich Elsen wrote:
>> Hi H.J.,
>>
>> I was on vacation, sorry for the slow reply.  The updated benchmark
>> still shows the same behavior, thanks.
>>
>> I'll try my hand at creating a patch that makes the variable
>> __x86_shared_non_temporal_threshold a tunable.  It will be necessary
>> to do internal experiments anyway.
>>
>
> __x86_shared_non_temporal_threshold was set to 6 times the per-core
> shared cache size, based on the large memcpy micro benchmark in glibc
> on an 8-core processor.  For a processor with more than 8 cores, the
> threshold is too low.  Set __x86_shared_non_temporal_threshold to
> 3/4 of the total shared cache size so that it is unchanged on 8-core
> processors.  On processors with fewer than 8 cores, the threshold is
> lower.
>
> Any comments?
>

Here is a patch to add support for
"glibc.x86_cache.non_temporal_threshold=number" to GLIBC_TUNABLES.

From 3e31bc4a930e7b32924befe762014f85d5408692 Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Mon, 22 May 2017 12:00:43 -0700
Subject: [PATCH] Add x86_cache.non_temporal_threshold to GLIBC_TUNABLES

Add support for "glibc.x86_cache.non_temporal_threshold=number" to
GLIBC_TUNABLES.

	* elf/dl-tunables.list (x86_cache): New name space.
	* sysdeps/x86/cacheinfo.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE): New.
	[HAVE_TUNABLES]: Include .
	[HAVE_TUNABLES] (DL_TUNABLE_CALLBACK
	(set_non_temporal_threshold)): New.
	[HAVE_TUNABLES] (init_cacheinfo): Call
	TUNABLE_SET_VAL_WITH_CALLBACK with set_non_temporal_threshold.
---
 elf/dl-tunables.list    |  6 ++++++
 sysdeps/x86/cacheinfo.c | 22 +++++++++++++++++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index b9f1488..2c899fe 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -77,4 +77,10 @@ glibc {
       security_level: SXID_IGNORE
     }
   }
+  x86_cache {
+    non_temporal_threshold {
+      type: SIZE_T
+      security_level: SXID_IGNORE
+    }
+  }
 }
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 3434d97..1b195eb 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -23,6 +23,20 @@
 #include
 #include
 
+/* Threshold to use non temporal store.  */
+long int __x86_shared_non_temporal_threshold attribute_hidden;
+
+#if HAVE_TUNABLES
+# define TUNABLE_NAMESPACE x86_cache
+# include
+
+void
+DL_TUNABLE_CALLBACK (set_non_temporal_threshold) (tunable_val_t *valp)
+{
+  __x86_shared_non_temporal_threshold = (long int) valp->numval;
+}
+#endif
+
 #define is_intel GLRO(dl_x86_cpu_features).kind == arch_kind_intel
 #define is_amd GLRO(dl_x86_cpu_features).kind == arch_kind_amd
 #define max_cpuid GLRO(dl_x86_cpu_features).max_cpuid
@@ -466,9 +480,6 @@ long int __x86_raw_shared_cache_size_half attribute_hidden = 1024 * 1024 / 2;
 /* Similar to __x86_shared_cache_size, but not rounded.  */
 long int __x86_raw_shared_cache_size attribute_hidden = 1024 * 1024;
 
-/* Threshold to use non temporal store.  */
-long int __x86_shared_non_temporal_threshold attribute_hidden;
-
 #ifndef DISABLE_PREFETCHW
 /* PREFETCHW support flag for use in memory and string routines.  */
 int __x86_prefetchw attribute_hidden;
@@ -770,4 +781,9 @@ intel_bug_no_cache_info:
      total shared cache size.  */
   __x86_shared_non_temporal_threshold
     = __x86_shared_cache_size * threads * 3 / 4;
+
+#if HAVE_TUNABLES
+  TUNABLE_SET_VAL_WITH_CALLBACK (non_temporal_threshold, NULL,
+                                 set_non_temporal_threshold);
+#endif
 }
-- 
2.9.4