[v3,2/3] x86: Do not prefer ERMS for memset on Zen3+

Message ID 20240208130840.533348-3-adhemerval.zanella@linaro.org
State Committed
Commit 272708884cb750f12f5c74a00e6620c19dc6d567
Delegated to: H.J. Lu
Series x86: Improve ERMS usage on Zen3+

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Testing passed

Commit Message

Adhemerval Zanella Netto Feb. 8, 2024, 1:08 p.m. UTC
For AMD Zen3+ architecture, the performance of the vectorized loop is
slightly better than ERMS.

Checked on x86_64-linux-gnu on Zen3.
---
 sysdeps/x86/dl-cacheinfo.h | 5 +++++
 1 file changed, 5 insertions(+)
  

Comments

H.J. Lu Feb. 12, 2024, 3:56 p.m. UTC | #1
On Thu, Feb 8, 2024 at 5:08 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
> For AMD Zen3+ architecture, the performance of the vectorized loop is
> slightly better than ERMS.
>
> Checked on x86_64-linux-gnu on Zen3.
> ---
>  sysdeps/x86/dl-cacheinfo.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> index f34d12846c..5a98f70364 100644
> --- a/sysdeps/x86/dl-cacheinfo.h
> +++ b/sysdeps/x86/dl-cacheinfo.h
> @@ -1021,6 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>       minimum value is fixed.  */
>    rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
>                                      long int, NULL);
> +  if (cpu_features->basic.kind == arch_kind_amd
> +      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
> +    /* For AMD Zen3+ architecture, the performance of the vectorized loop is
> +       slightly better than ERMS.  */
> +    rep_stosb_threshold = SIZE_MAX;
>
>    TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
>    TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);
> --
> 2.34.1
>

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.
  

Patch

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index f34d12846c..5a98f70364 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1021,6 +1021,11 @@  dl_init_cacheinfo (struct cpu_features *cpu_features)
      minimum value is fixed.  */
   rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
 				     long int, NULL);
+  if (cpu_features->basic.kind == arch_kind_amd
+      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* For AMD Zen3+ architecture, the performance of the vectorized loop is
+       slightly better than ERMS.  */
+    rep_stosb_threshold = SIZE_MAX;
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);
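
The effect of the hunk above can be modelled outside glibc. The following is a minimal sketch, not the glibc implementation: `struct threshold_inputs`, `pick_rep_stosb_threshold`, `would_use_rep_stosb`, `check_examples` and the `DEFAULT_REP_STOSB_THRESHOLD` value of 2048 are hypothetical names and values introduced only for illustration. It also assumes that memset switches to `rep stosb` for sizes strictly above the threshold, which the patch itself does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the generic default; not taken from the patch.  */
#define DEFAULT_REP_STOSB_THRESHOLD ((size_t) 2048)

/* Hypothetical model of the inputs the patched code inspects.  */
struct threshold_inputs
{
  bool is_amd;          /* Models cpu_features->basic.kind == arch_kind_amd.  */
  bool tunable_is_set;  /* Models TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold).  */
  size_t tunable_value; /* Models the value read from the tunable when it is set.  */
};

/* Mirror the decision in the patch: on AMD, when the user has not set the
   tunable, raise the threshold to SIZE_MAX.  */
size_t
pick_rep_stosb_threshold (struct threshold_inputs in)
{
  size_t threshold = in.tunable_is_set ? in.tunable_value
                                       : DEFAULT_REP_STOSB_THRESHOLD;
  if (in.is_amd && !in.tunable_is_set)
    threshold = SIZE_MAX;
  return threshold;
}

/* Assumption: rep stosb is selected only for sizes strictly above the
   threshold.  */
bool
would_use_rep_stosb (size_t n, size_t threshold)
{
  return n > threshold;
}

/* Small self-check covering the three cases the patch distinguishes.  */
void
check_examples (void)
{
  struct threshold_inputs amd_default = { true, false, 0 };
  struct threshold_inputs amd_tuned = { true, true, 4096 };
  struct threshold_inputs other_default = { false, false, 0 };

  /* AMD without a user tunable: no size can exceed SIZE_MAX.  */
  assert (pick_rep_stosb_threshold (amd_default) == SIZE_MAX);
  assert (!would_use_rep_stosb (SIZE_MAX,
                                pick_rep_stosb_threshold (amd_default)));

  /* AMD with a user tunable: the user's value is kept.  */
  assert (pick_rep_stosb_threshold (amd_tuned) == 4096);

  /* Other vendors: the default is left untouched.  */
  assert (pick_rep_stosb_threshold (other_default)
          == DEFAULT_REP_STOSB_THRESHOLD);
}
```

Under these assumptions, a threshold of SIZE_MAX means the `rep stosb` path is never taken on AMD unless the user sets the tunable explicitly (in glibc, via `GLIBC_TUNABLES` with `glibc.cpu.x86_rep_stosb_threshold`), which is why the patch guards the override with `!TUNABLE_IS_INITIALIZED`.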