x86: Update __x86_shared_non_temporal_threshold

Message ID 20170525143624.GA7747@lucon.org
State New, archived

Commit Message

Lu, Hongjiu May 25, 2017, 2:36 p.m. UTC
__x86_shared_non_temporal_threshold was set to 6 times the per-core
shared cache size, based on the large memcpy micro benchmark in glibc
on an 8-core processor.  For a processor with more than 8 cores, this
threshold is too low.  Set __x86_shared_non_temporal_threshold to
3/4 of the total shared cache size so that it is unchanged on 8-core
processors (8 x 3/4 = 6).  On processors with fewer than 8 cores, the
threshold is lower.

Any comments?

H.J.
---
	* sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
	Set to 3/4 of the total shared cache size.
---
 sysdeps/x86/cacheinfo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
  

Comments

H.J. Lu May 30, 2017, 2:44 p.m. UTC | #1
On Thu, May 25, 2017 at 7:36 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> __x86_shared_non_temporal_threshold was set to 6 times of per-core
> shared cache size, based on the large memcpy micro benchmark in glibc
> on a 8-core processor.  For a processor with more than 8 cores, the
> threshold is too low.  Set __x86_shared_non_temporal_threshold to the
> 3/4 of the total shared cache size so that it is unchanged on 8-core
> processors.  On processors with less than 8 cores, the threshold is
> lower.
>
> Any comments?

Any objections?

H.J. Lu June 2, 2017, 8:05 p.m. UTC | #2
On Tue, May 30, 2017 at 7:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 25, 2017 at 7:36 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> __x86_shared_non_temporal_threshold was set to 6 times of per-core
>> shared cache size, based on the large memcpy micro benchmark in glibc
>> on a 8-core processor.  For a processor with more than 8 cores, the
>> threshold is too low.  Set __x86_shared_non_temporal_threshold to the
>> 3/4 of the total shared cache size so that it is unchanged on 8-core
>> processors.  On processors with less than 8 cores, the threshold is
>> lower.
>>
>> Any comments?
>
> Any objections?
>

I will check it today.
  

Patch

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 1ccbe41..3434d97 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -766,6 +766,8 @@ intel_bug_no_cache_info:
 
   /* The large memcpy micro benchmark in glibc shows that 6 times of
      shared cache size is the approximate value above which non-temporal
-     store becomes faster.  */
-  __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
+     store becomes faster on an 8-core processor.  This is 3/4 of the
+     total shared cache size.  */
+  __x86_shared_non_temporal_threshold
+    = __x86_shared_cache_size * threads * 3 / 4;
 }