From patchwork Tue May 10 23:49:22 2016
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 12198
Delivered-To: mailing list libc-alpha@sourceware.org
Date: Tue, 10 May 2016 16:49:22 -0700
Subject: [PATCH] Add cache info to _dl_x86_cpu_features
From: "H.J. Lu"
To: GNU C Library

On Sun, May 8, 2016 at 8:57 AM, H.J. Lu wrote:
> Support setting processor-specific __x86_shared_non_temporal_threshold
> value in init_cpu_features.
>
> Tested on x86 and x86-64.  Any comments and feedback?
>
> H.J.
> ---
>         * sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
>         Initialize only if it is zero.
> ---
>  sysdeps/x86/cacheinfo.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> index 143b333..788f42d 100644
> --- a/sysdeps/x86/cacheinfo.c
> +++ b/sysdeps/x86/cacheinfo.c
> @@ -669,5 +669,6 @@ init_cacheinfo (void)
>    /* The large memcpy micro benchmark in glibc shows that 6 times of
>       shared cache size is the approximate value above which non-temporal
>       store becomes faster.  */
> -  __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
> +  if (__x86_shared_non_temporal_threshold == 0)
> +    __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
>  }

This doesn't work, since init_cpu_features may update only the private
copy in ld.so.  Instead, this patch adds cache info to
_dl_x86_cpu_features so that init_cpu_features can set the cache sizes.

Any comments and feedback?

From fa82f9be580d9a4be4c3a56cde2aa2b1aadf98fc Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Tue, 10 May 2016 05:42:49 -0700
Subject: [PATCH] X86: Add cache info to _dl_x86_cpu_features

This patch adds cache info to _dl_x86_cpu_features to allow a processor
to override cache info derived from CPUID.

Tested on x86 and x86-64.

	* sysdeps/x86/cacheinfo.c: Skip if not in libc.
	(init_cacheinfo): Use raw_data_size, raw_shared_size and
	shared_non_temporal_threshold from _dl_x86_cpu_features if
	not zero.
	* sysdeps/x86/cpu-features.h (cache_info): New.
	(cpu_features): Add cache.
---
 sysdeps/x86/cacheinfo.c    | 17 ++++++++++++++++-
 sysdeps/x86/cpu-features.h | 13 +++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 143b333..d6d0083 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -16,6 +16,8 @@
    License along with the GNU C Library; if not, see
    .  */
 
+#if IS_IN (libc)
+
 #include
 #include
 #include
@@ -646,6 +648,11 @@ init_cacheinfo (void)
 #endif
     }
 
+  const struct cache_info *cache = &GLRO(dl_x86_cpu_features).cache;
+
+  if (cache->raw_data_size != 0)
+    data = cache->raw_data_size;
+
   if (data > 0)
     {
       __x86_raw_data_cache_size_half = data / 2;
@@ -656,6 +663,9 @@ init_cacheinfo (void)
       __x86_data_cache_size = data;
     }
 
+  if (cache->raw_shared_size != 0)
+    shared = cache->raw_shared_size;
+
   if (shared > 0)
     {
       __x86_raw_shared_cache_size_half = shared / 2;
@@ -669,5 +679,10 @@ init_cacheinfo (void)
   /* The large memcpy micro benchmark in glibc shows that 6 times of
      shared cache size is the approximate value above which non-temporal
      store becomes faster.  */
-  __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
+  __x86_shared_non_temporal_threshold
+    = (cache->shared_non_temporal_threshold != 0
+       ? cache->shared_non_temporal_threshold
+       : __x86_shared_cache_size * 6);
 }
+
+#endif
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index 9529d61..335d96a 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -164,6 +164,18 @@
 
 #else	/* __ASSEMBLER__ */
 
+struct cache_info
+{
+  /* Data cache size for use in memory and string routines, typically
+     L1 size.  */
+  long int raw_data_size;
+  /* Shared cache size for use in memory and string routines, typically
+     L2 or L3 size.  */
+  long int raw_shared_size;
+  /* Threshold to use non temporal store.  */
+  long int shared_non_temporal_threshold;
+};
+
 enum
   {
     COMMON_CPUID_INDEX_1 = 0,
@@ -193,6 +205,7 @@ struct cpu_features
   unsigned int family;
   unsigned int model;
   unsigned int feature[FEATURE_INDEX_MAX];
+  struct cache_info cache;
 };
 
 /* Used from outside of glibc to get access to the CPU features
-- 
2.5.5
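
For readers who want the override rule in isolation, here is a minimal standalone sketch of the selection logic the patch adds to init_cacheinfo: a non-zero field in struct cache_info wins, otherwise the CPUID-derived value is used, and the non-temporal threshold falls back to 6 times the shared cache size. The struct cache_tunables type, the select_cache_tunables function and its cpuid_data/cpuid_shared parameters are hypothetical names introduced only for illustration. The sketch also leaves out the `> 0` guards and the other adjustments the real function applies before storing the __x86_* variables.

```c
/* Same fields as the struct cache_info added to cpu-features.h.  */
struct cache_info
{
  long int raw_data_size;
  long int raw_shared_size;
  long int shared_non_temporal_threshold;
};

/* Hypothetical result type; glibc stores these values in the
   __x86_* variables instead.  */
struct cache_tunables
{
  long int data;
  long int shared;
  long int non_temporal_threshold;
};

/* Hypothetical helper: choose each value from CACHE when it is
   non-zero, otherwise from the CPUID-derived arguments.  */
static struct cache_tunables
select_cache_tunables (const struct cache_info *cache,
		       long int cpuid_data, long int cpuid_shared)
{
  struct cache_tunables t;

  t.data = (cache->raw_data_size != 0
	    ? cache->raw_data_size : cpuid_data);
  t.shared = (cache->raw_shared_size != 0
	      ? cache->raw_shared_size : cpuid_shared);
  /* Default threshold is 6 times the shared cache size actually
     selected, so a raw_shared_size override also moves it.  */
  t.non_temporal_threshold
    = (cache->shared_non_temporal_threshold != 0
       ? cache->shared_non_temporal_threshold
       : t.shared * 6);
  return t;
}
```

With an all-zero struct cache_info the CPUID values pass through unchanged. Setting only raw_shared_size changes both the shared size and the derived threshold, while setting shared_non_temporal_threshold pins the threshold regardless of the shared size.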