tunables: Add IFUNC selection and cache sizes

Message ID 20170615131042.GA28885@gmail.com
State New, archived

Commit Message

H.J. Lu June 15, 2017, 1:10 p.m. UTC
  The current IFUNC selection is based on microbenchmarks in glibc.  It
should give the best performance for most workloads.  But other choices
may have better performance for a particular workload or on the hardware
which wasn't available when the selection was made.  The environment
variable, GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz...., can be used
to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx and zzz,
where the feature name is case-sensitive and has to match the ones in
cpu-features.h.  It can be used by glibc developers to override the
IFUNC selection to tune for a new processor or improve performance for
a particular workload.  It isn't intended for normal end users.
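
As a rough illustration of the list syntax only (this is not the code in
the patch), a comma-separated value such as -xxx,yyy,-zzz could be split
as sketched below.  The names parse_ifunc_list, feature_cb, feature_tally
and tally_cb are hypothetical and exist only for this example; the real
callback additionally matches each name against cpu-features.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one feature name (not
   NUL-terminated), its length, and whether it carried a '-' prefix.  */
typedef void (*feature_cb) (const char *name, size_t len, bool disable,
                            void *arg);

/* Split STR on commas and report each non-empty feature name.  A
   leading '-' marks the feature as "disable" and is stripped from the
   name.  Returns the number of names reported.  */
static size_t
parse_ifunc_list (const char *str, feature_cb cb, void *arg)
{
  size_t count = 0;
  const char *p = str;

  while (*p != '\0')
    {
      size_t len = strcspn (p, ",");   /* Length of the current token.  */
      const char *name = p;
      size_t name_len = len;
      bool disable = len > 0 && *p == '-';

      if (disable)
        {
          name++;
          name_len--;
        }
      if (name_len > 0)
        {
          cb (name, name_len, disable, arg);
          count++;
        }
      p += len;
      if (*p == ',')
        p++;                           /* Skip the separator.  */
    }
  return count;
}

/* Example consumer: count enable and disable requests.  */
struct feature_tally
{
  size_t enabled;
  size_t disabled;
};

static void
tally_cb (const char *name, size_t len, bool disable, void *arg)
{
  struct feature_tally *t = arg;
  (void) name;
  (void) len;
  if (disable)
    t->disabled++;
  else
    t->enabled++;
}
```

With this sketch, the value -AVX2,Prefer_ERMS,-Slow_BSF yields three
names: two disable requests and one enable request.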

NOTE: the IFUNC selection may change over time.  Please check all
multiarch implementations when experimenting.

Also, GLIBC_TUNABLES=glibc.tune.non_temporal_threshold=NUMBER is
provided to set threshold to use non temporal store to NUMBER,
GLIBC_TUNABLES=glibc.tune.data_cache_size=NUMBER to set data cache size,
GLIBC_TUNABLES=glibc.tune.shared_cache_size=NUMBER to set shared cache
size.
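
A minimal sketch of how these three values are meant to take effect,
assuming a value of 0 means "tunable not set": an explicit setting
replaces the detected value, and otherwise the default applies (for the
threshold, 3/4 of the shared cache size times the thread count, as in
cacheinfo.c).  The function names are made up for this example.

```c
/* Return OVERRIDE when the tunable was set (non-zero); otherwise keep
   the size detected from the hardware.  */
static long int
pick_cache_size (long int override, long int detected)
{
  return override != 0 ? override : detected;
}

/* Return OVERRIDE when set; otherwise fall back to 3/4 of the total
   shared cache size across THREADS threads.  */
static long int
pick_non_temporal_threshold (long int override, long int shared,
                             long int threads)
{
  return override != 0 ? override : shared * threads * 3 / 4;
}
```

For a 1 MiB shared cache and 8 threads the default threshold works out
to 6291456 bytes; setting glibc.tune.non_temporal_threshold to any
non-zero NUMBER replaces it.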

Any comments?

H.J.
---
2017-06-15  H.J. Lu  <hongjiu.lu@intel.com>
	    Erich Elsen  <eriche@google.com>

	* elf/dl-tunables.list (tune): Add ifunc, non_temporal_threshold,
	data_cache_size and shared_cache_size.
	* manual/tunables.texi: Document glibc.tune.ifunc,
	glibc.tune.data_cache_size, glibc.tune.shared_cache_size and
	glibc.tune.non_temporal_threshold.
	* sysdeps/unix/sysv/linux/x86/dl-sysdep.c: New file.
	* sysdeps/x86/cpu-tunables.c: Likewise.
	* sysdeps/x86/cacheinfo.c
	(init_cacheinfo): Check and get data cache size, shared cache
	size and non temporal threshold from cpu_features.
	* sysdeps/x86/cpu-features.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE):
	New.
	[HAVE_TUNABLES] Include <unistd.h>.
	[HAVE_TUNABLES] Include <elf/dl-tunables.h>.
	[HAVE_TUNABLES] (TUNABLE_CALLBACK (set_ifunc)): Likewise.
	[HAVE_TUNABLES] (init_cpu_features): Use TUNABLE_GET to set
	IFUNC selection, data cache size, shared cache size and non
	temporal threshold.
	* sysdeps/x86/cpu-features.h (cpu_features): Add data_cache_size,
	shared_cache_size and non_temporal_threshold.
---
 elf/dl-tunables.list                    |  16 ++
 manual/tunables.texi                    |  36 ++++
 sysdeps/unix/sysv/linux/x86/dl-sysdep.c |  21 ++
 sysdeps/x86/cacheinfo.c                 |  10 +-
 sysdeps/x86/cpu-features.c              |  19 ++
 sysdeps/x86/cpu-features.h              |   8 +
 sysdeps/x86/cpu-tunables.c              | 330 ++++++++++++++++++++++++++++++++
 7 files changed, 439 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/unix/sysv/linux/x86/dl-sysdep.c
 create mode 100644 sysdeps/x86/cpu-tunables.c
  

Comments

H.J. Lu June 20, 2017, 10:55 a.m. UTC | #1
On Thu, Jun 15, 2017 at 6:10 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> The current IFUNC selection is based on microbenchmarks in glibc.  It
> should give the best performance for most workloads.  But other choices
> may have better performance for a particular workload or on the hardware
> which wasn't available when the selection was made.  The environment
> variable, GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz...., can be used
> to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx and zzz,
> where the feature name is case-sensitive and has to match the ones in
> cpu-features.h.  It can be used by glibc developers to override the
> IFUNC selection to tune for a new processor or improve performance for
> a particular workload.  It isn't intended for normal end users.
>
> NOTE: the IFUNC selection may change over time.  Please check all
> multiarch implementations when experimenting.
>
> Also, GLIBC_TUNABLES=glibc.tune.non_temporal_threshold=NUMBER is
> provided to set threshold to use non temporal store to NUMBER,
> GLIBC_TUNABLES=glibc.tune.data_cache_size=NUMBER to set data cache size,
> GLIBC_TUNABLES=glibc.tune.shared_cache_size=NUMBER to set shared cache
> size.
>
> Any comments?

I will check it in this week if there is no objection.

H.J.
> H.J.
> ---
> 2017-06-15  H.J. Lu  <hongjiu.lu@intel.com>
>             Erich Elsen  <eriche@google.com>
>
>         * elf/dl-tunables.list (tune): Add ifunc, non_temporal_threshold,
>         data_cache_size and shared_cache_size.
>         * manual/tunables.texi: Document glibc.tune.ifunc,
>         glibc.tune.data_cache_size, glibc.tune.shared_cache_size and
>         glibc.tune.non_temporal_threshold.
>         * sysdeps/unix/sysv/linux/x86/dl-sysdep.c: New file.
>         * sysdeps/x86/cpu-tunables.c: Likewise.
>         * sysdeps/x86/cacheinfo.c
>         (init_cacheinfo): Check and get data cache size, shared cache
>         size and non temporal threshold from cpu_features.
>         * sysdeps/x86/cpu-features.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE):
>         New.
>         [HAVE_TUNABLES] Include <unistd.h>.
>         [HAVE_TUNABLES] Include <elf/dl-tunables.h>.
>         [HAVE_TUNABLES] (TUNABLE_CALLBACK (set_ifunc)): Likewise.
>         [HAVE_TUNABLES] (init_cpu_features): Use TUNABLE_GET to set
>         IFUNC selection, data cache size, shared cache size and non
>         temporal threshold.
>         * sysdeps/x86/cpu-features.h (cpu_features): Add data_cache_size,
>         shared_cache_size and non_temporal_threshold.
> ---
>  elf/dl-tunables.list                    |  16 ++
>  manual/tunables.texi                    |  36 ++++
>  sysdeps/unix/sysv/linux/x86/dl-sysdep.c |  21 ++
>  sysdeps/x86/cacheinfo.c                 |  10 +-
>  sysdeps/x86/cpu-features.c              |  19 ++
>  sysdeps/x86/cpu-features.h              |   8 +
>  sysdeps/x86/cpu-tunables.c              | 330 ++++++++++++++++++++++++++++++++
>  7 files changed, 439 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/unix/sysv/linux/x86/dl-sysdep.c
>  create mode 100644 sysdeps/x86/cpu-tunables.c
>
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 41ce9af..78354fb 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -82,6 +82,22 @@ glibc {
>        type: UINT_64
>        env_alias: LD_HWCAP_MASK
>        default: HWCAP_IMPORTANT
> +      }
> +    ifunc {
> +      type: STRING
> +      security_level: SXID_IGNORE
> +    }
> +    non_temporal_threshold {
> +      type: SIZE_T
> +      security_level: SXID_IGNORE
> +    }
> +    data_cache_size {
> +      type: SIZE_T
> +      security_level: SXID_IGNORE
> +    }
> +    shared_cache_size {
> +      type: SIZE_T
> +      security_level: SXID_IGNORE
>      }
>    }
>  }
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index c9a4cb7..709f58a 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -198,6 +198,14 @@ is 8 times the number of cores online.
>  @cindex hardware capability tunables
>  @cindex hwcap tunables
>  @cindex tunables, hwcap
> +@cindex ifunc tunables
> +@cindex tunables, ifunc
> +@cindex data_cache_size tunables
> +@cindex tunables, data_cache_size
> +@cindex shared_cache_size tunables
> +@cindex tunables, shared_cache_size
> +@cindex non_temporal_threshold tunables
> +@cindex tunables, non_temporal_threshold
>
>  @deftp {Tunable namespace} glibc.tune
>  Behavior of @theglibc{} can be tuned to assume specific hardware capabilities
> @@ -213,3 +221,31 @@ extensions available in the processor at runtime for some architectures.  The
>  @code{glibc.tune.hwcap_mask} tunable allows the user to mask out those
>  capabilities at runtime, thus disabling use of those extensions.
>  @end deftp
> +
> +@deftp Tunable glibc.tune.ifunc
> +The @code{glibc.tune.ifunc=-xxx,yyy,-zzz...} tunable allows the user to
> +enable CPU/ARCH feature @code{yyy}, disable CPU/ARCH feature @code{xxx}
> +and @code{zzz} where the feature name is case-sensitive and has to match
> +the ones in @code{sysdeps/x86/cpu-features.h}.
> +
> +This tunable is specific to i386 and x86-64.
> +@end deftp
> +
> +@deftp Tunable glibc.tune.data_cache_size
> +The @code{glibc.tune.data_cache_size} tunable allows the user to set
> +data cache size in bytes for use in memory and string routines.
> +
> +This tunable is specific to i386 and x86-64.
> +@end deftp
> +
> +@deftp Tunable glibc.tune.shared_cache_size
> +The @code{glibc.tune.shared_cache_size} tunable allows the user to set
> +shared cache size in bytes for use in memory and string routines.
> +@end deftp
> +
> +@deftp Tunable glibc.tune.non_temporal_threshold
> +The @code{glibc.tune.non_temporal_threshold} tunable allows the user to
> +set threshold in bytes for non temporal store.
> +
> +This tunable is specific to i386 and x86-64.
> +@end deftp
> diff --git a/sysdeps/unix/sysv/linux/x86/dl-sysdep.c b/sysdeps/unix/sysv/linux/x86/dl-sysdep.c
> new file mode 100644
> index 0000000..64eb0d7
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/x86/dl-sysdep.c
> @@ -0,0 +1,21 @@
> +/* Operating system support for run-time dynamic linker.  X86 version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <config.h>
> +#include <sysdeps/x86/cpu-tunables.c>
> +#include <sysdeps/unix/sysv/linux/dl-sysdep.c>
> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> index 8000fd1..9542dfb 100644
> --- a/sysdeps/x86/cacheinfo.c
> +++ b/sysdeps/x86/cacheinfo.c
> @@ -752,6 +752,9 @@ intel_bug_no_cache_info:
>  #endif
>      }
>
> +  if (cpu_features->data_cache_size != 0)
> +    data = cpu_features->data_cache_size;
> +
>    if (data > 0)
>      {
>        __x86_raw_data_cache_size_half = data / 2;
> @@ -762,6 +765,9 @@ intel_bug_no_cache_info:
>        __x86_data_cache_size = data;
>      }
>
> +  if (cpu_features->shared_cache_size != 0)
> +    shared = cpu_features->shared_cache_size;
> +
>    if (shared > 0)
>      {
>        __x86_raw_shared_cache_size_half = shared / 2;
> @@ -777,7 +783,9 @@ intel_bug_no_cache_info:
>       store becomes faster on a 8-core processor.  This is the 3/4 of the
>       total shared cache size.  */
>    __x86_shared_non_temporal_threshold
> -    = __x86_shared_cache_size * threads * 3 / 4;
> +    = (cpu_features->non_temporal_threshold != 0
> +       ? cpu_features->non_temporal_threshold
> +       : __x86_shared_cache_size * threads * 3 / 4);
>  }
>
>  #endif
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 4288001..1ca0310 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -20,6 +20,15 @@
>  #include <cpu-features.h>
>  #include <dl-hwcap.h>
>
> +#if HAVE_TUNABLES
> +# define TUNABLE_NAMESPACE tune
> +# include <unistd.h>           /* Get STDOUT_FILENO for _dl_printf.  */
> +# include <elf/dl-tunables.h>
> +
> +extern void TUNABLE_CALLBACK (set_ifunc) (tunable_val_t *)
> +  attribute_hidden;
> +#endif
> +
>  static void
>  get_common_indeces (struct cpu_features *cpu_features,
>                     unsigned int *family, unsigned int *model,
> @@ -312,6 +321,16 @@ no_cpuid:
>    cpu_features->model = model;
>    cpu_features->kind = kind;
>
> +#if HAVE_TUNABLES
> +  TUNABLE_GET (ifunc, tunable_val_t *, TUNABLE_CALLBACK (set_ifunc));
> +  cpu_features->non_temporal_threshold
> +    = TUNABLE_GET (non_temporal_threshold, long int, NULL);
> +  cpu_features->data_cache_size
> +    = TUNABLE_GET (data_cache_size, long int, NULL);
> +  cpu_features->shared_cache_size
> +    = TUNABLE_GET (shared_cache_size, long int, NULL);
> +#endif
> +
>    /* Reuse dl_platform, dl_hwcap and dl_hwcap_mask for x86.  */
>    GLRO(dl_platform) = NULL;
>    GLRO(dl_hwcap) = 0;
> diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
> index f2329ea..fef5e18 100644
> --- a/sysdeps/x86/cpu-features.h
> +++ b/sysdeps/x86/cpu-features.h
> @@ -215,6 +215,14 @@ struct cpu_features
>    unsigned int family;
>    unsigned int model;
>    unsigned int feature[FEATURE_INDEX_MAX];
> +  /* Data cache size for use in memory and string routines, typically
> +     L1 size.  */
> +  long int data_cache_size;
> +  /* Shared cache size for use in memory and string routines, typically
> +     L2 or L3 size.  */
> +  long int shared_cache_size;
> +  /* Threshold to use non temporal store.  */
> +  long int non_temporal_threshold;
>  };
>
>  /* Used from outside of glibc to get access to the CPU features
> diff --git a/sysdeps/x86/cpu-tunables.c b/sysdeps/x86/cpu-tunables.c
> new file mode 100644
> index 0000000..dfa3647
> --- /dev/null
> +++ b/sysdeps/x86/cpu-tunables.c
> @@ -0,0 +1,330 @@
> +/* x86 CPU feature tuning.
> +   This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#if HAVE_TUNABLES
> +# define TUNABLE_NAMESPACE tune
> +# include <stdbool.h>
> +# include <stdint.h>
> +# include <unistd.h>           /* Get STDOUT_FILENO for _dl_printf.  */
> +# include <elf/dl-tunables.h>
> +# include <string.h>
> +# include <cpu-features.h>
> +# include <ldsodefs.h>
> +
> +/* We can't use IFUNC memcmp nor strlen in init_cpu_features from libc.a
> +   since IFUNC must be set up by init_cpu_features.  */
> +# if defined USE_MULTIARCH && !defined SHARED
> +#  ifdef __x86_64__
> +#   define DEFAULT_MEMCMP      __memcmp_sse2
> +#   define DEFAULT_STRLEN      __strlen_sse2
> +extern __typeof (strlen) DEFAULT_STRLEN;
> +#  else
> +#   define DEFAULT_MEMCMP      __memcmp_ia32
> +#   define DEFAULT_STRLEN      strlen
> +#  endif
> +extern __typeof (memcmp) DEFAULT_MEMCMP;
> +# else
> +#  define DEFAULT_MEMCMP       memcmp
> +#  define DEFAULT_STRLEN       strlen
> +# endif
> +
> +# define CHECK_GLIBC_IFUNC_CPU_OFF(f, cpu_features, name, len)         \
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);       \
> +  if (!DEFAULT_MEMCMP (f, #name, len))                                 \
> +    {                                                                  \
> +      cpu_features->cpuid[index_cpu_##name].reg_##name                 \
> +       &= ~bit_cpu_##name;                                             \
> +      break;                                                           \
> +    }
> +
> +/* Disable an ARCH feature NAME.  We don't enable an ARCH feature which
> +   isn't available.  */
> +# define CHECK_GLIBC_IFUNC_ARCH_OFF(f, cpu_features, name, len)                \
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);       \
> +  if (!DEFAULT_MEMCMP (f, #name, len))                                 \
> +    {                                                                  \
> +      cpu_features->feature[index_arch_##name]                         \
> +       &= ~bit_arch_##name;                                            \
> +      break;                                                           \
> +    }
> +
> +/* Enable/disable an ARCH feature NAME.  */
> +# define CHECK_GLIBC_IFUNC_ARCH_BOTH(f, cpu_features, name, disable,   \
> +                                   len)                                \
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);       \
> +  if (!DEFAULT_MEMCMP (f, #name, len))                                 \
> +    {                                                                  \
> +      if (disable)                                                     \
> +       cpu_features->feature[index_arch_##name]                        \
> +         &= ~bit_arch_##name;                                          \
> +      else                                                             \
> +       cpu_features->feature[index_arch_##name]                        \
> +         |= bit_arch_##name;                                           \
> +      break;                                                           \
> +    }
> +
> +/* Enable/disable an ARCH feature NAME.  Enable an ARCH feature only
> +   if the ARCH feature NEED is also enabled.  */
> +# define CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH(f, cpu_features, name,  \
> +                                              need, disable, len)      \
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);       \
> +  if (!DEFAULT_MEMCMP (f, #name, len))                                 \
> +    {                                                                  \
> +      if (disable)                                                     \
> +       cpu_features->feature[index_arch_##name]                        \
> +         &= ~bit_arch_##name;                                          \
> +      else if (CPU_FEATURES_ARCH_P (cpu_features, need))               \
> +       cpu_features->feature[index_arch_##name]                        \
> +         |= bit_arch_##name;                                           \
> +      break;                                                           \
> +    }
> +
> +/* Enable/disable an ARCH feature NAME.  Enable an ARCH feature only
> +   if the CPU feature NEED is also enabled.  */
> +# define CHECK_GLIBC_IFUNC_ARCH_NEED_CPU_BOTH(f, cpu_features, name,   \
> +                                             need, disable, len)       \
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);       \
> +  if (!DEFAULT_MEMCMP (f, #name, len))                                 \
> +    {                                                                  \
> +      if (disable)                                                     \
> +       cpu_features->feature[index_arch_##name]                        \
> +         &= ~bit_arch_##name;                                          \
> +      else if (CPU_FEATURES_CPU_P (cpu_features, need))                        \
> +       cpu_features->feature[index_arch_##name]                        \
> +         |= bit_arch_##name;                                           \
> +      break;                                                           \
> +    }
> +
> +attribute_hidden
> +void
> +TUNABLE_CALLBACK (set_ifunc) (tunable_val_t *valp)
> +{
> +  /* The current IFUNC selection is based on microbenchmarks in glibc.
> +     It should give the best performance for most workloads.  But other
> +     choices may have better performance for a particular workload or on
> +     the hardware which wasn't available when the selection was made.
> +     The environment variable, GLIBC_IFUNC=-xxx,yyy,-zzz...., can be
> +     used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx
> +     and zzz, where the feature name is case-sensitive and has to match
> +     the ones in cpu-features.h.  It can be used by glibc developers to
> +     tune for a new processor or override the IFUNC selection to improve
> +     performance for a particular workload.
> +
> +     Since all CPU/ARCH features are hardware optimizations without
> +     security implication, except for Prefer_MAP_32BIT_EXEC, which can
> +     only be disabled, we check GLIBC_IFUNC for programs, including
> +     set*id ones.
> +
> +     NOTE: the IFUNC selection may change over time.  Please check all
> +     multiarch implementations when experimenting.  */
> +
> +  const char *p = valp->strval;
> +  struct cpu_features *cpu_features = &GLRO(dl_x86_cpu_features);
> +  const char *end = p + DEFAULT_STRLEN (p);
> +  size_t len;
> +
> +  do
> +    {
> +      const char *c, *n;
> +      bool disable;
> +      size_t nl;
> +
> +      for (c = p; *c != ','; c++)
> +       if (c >= end)
> +         break;
> +
> +      len = c - p;
> +      disable = *p == '-';
> +      if (disable)
> +       {
> +         n = p + 1;
> +         nl = len - 1;
> +       }
> +      else
> +       {
> +         n = p;
> +         nl = len;
> +       }
> +      switch (nl)
> +       {
> +       default:
> +         break;
> +       case 3:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX, 3);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, CX8, 3);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, FMA, 3);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, HTT, 3);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, RTM, 3);
> +           }
> +         break;
> +       case 4:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX2, 4);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, BMI1, 4);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, BMI2, 4);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, CMOV, 4);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, ERMS, 4);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, FMA4, 4);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSE2, 4);
> +             CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, I586, 4);
> +             CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, I686, 4);
> +           }
> +         break;
> +       case 5:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, LZCNT, 5);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, MOVBE, 5);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSSE3, 5);
> +           }
> +         break;
> +       case 6:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, POPCNT, 6);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSE4_1, 6);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSE4_2, 6);
> +           }
> +         break;
> +       case 7:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512F, 7);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, OSXSAVE, 7);
> +           }
> +         break;
> +       case 8:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512CD, 8);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512BW, 8);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512DQ, 8);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512ER, 8);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512PF, 8);
> +             CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512VL, 8);
> +           }
> +         CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features, Slow_BSF,
> +                                      disable, 8);
> +         break;
> +       case 10:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, AVX_Usable,
> +                                         10);
> +             CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, FMA_Usable,
> +                                         10);
> +           }
> +         break;
> +       case 11:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, AVX2_Usable,
> +                                         11);
> +             CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, FMA4_Usable,
> +                                         11);
> +           }
> +         CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features, Prefer_ERMS,
> +                                      disable, 11);
> +         CHECK_GLIBC_IFUNC_ARCH_NEED_CPU_BOTH (n, cpu_features,
> +                                               Slow_SSE4_2, SSE4_2,
> +                                               disable, 11);
> +         break;
> +       case 14:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features,
> +                                         AVX512F_Usable, 14);
> +           }
> +         break;
> +       case 15:
> +         if (disable)
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features,
> +                                         AVX512DQ_Usable, 15);
> +           }
> +         CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features, Fast_Rep_String,
> +                                      disable, 15);
> +         break;
> +       case 16:
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH
> +               (n, cpu_features, Prefer_No_AVX512, AVX512F_Usable,
> +                disable, 16);
> +           }
> +         break;
> +       case 18:
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +                                          Fast_Copy_Backward, disable,
> +                                          18);
> +           }
> +         break;
> +       case 19:
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +                                          Fast_Unaligned_Load, disable,
> +                                          19);
> +             CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +                                          Fast_Unaligned_Copy, disable,
> +                                          19);
> +           }
> +         break;
> +       case 20:
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH
> +               (n, cpu_features, Prefer_No_VZEROUPPER, AVX_Usable,
> +                disable, 20);
> +           }
> +         break;
> +       case 21:
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +                                          Prefer_MAP_32BIT_EXEC, disable,
> +                                          21);
> +           }
> +         break;
> +       case 23:
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH
> +               (n, cpu_features, AVX_Fast_Unaligned_Load, AVX_Usable,
> +                disable, 23);
> +           }
> +         break;
> +       case 26:
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_NEED_CPU_BOTH
> +               (n, cpu_features, Prefer_PMINUB_for_stringop, SSE2,
> +                disable, 26);
> +           }
> +         break;
> +       case 27:
> +           {
> +             CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +                                          Use_dl_runtime_resolve_slow,
> +                                          disable, 27);
> +           }
> +         break;
> +       }
> +      p += len + 1;
> +    }
> +  while (p < end);
> +}
> +#endif
> --
> 2.9.4
>
  
Adhemerval Zanella June 20, 2017, 1:23 p.m. UTC | #2
On 15/06/2017 10:10, H.J. Lu wrote:
> The current IFUNC selection is based on microbenchmarks in glibc.  It
> should give the best performance for most workloads.  But other choices
> may have better performance for a particular workload or on the hardware
> which wasn't available when the selection was made.  The environment
> variable, GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz...., can be used
> to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx and zzz,
> where the feature name is case-sensitive and has to match the ones in
> cpu-features.h.  It can be used by glibc developers to override the
> IFUNC selection to tune for a new processor or improve performance for
> a particular workload.  It isn't intended for normal end users.
> 
> NOTE: the IFUNC selection may change over time.  Please check all
> multiarch implementations when experimenting.
> 
> Also, GLIBC_TUNABLES=glibc.tune.non_temporal_threshold=NUMBER is
> provided to set threshold to use non temporal store to NUMBER,
> GLIBC_TUNABLES=glibc.tune.data_cache_size=NUMBER to set data cache size,
> GLIBC_TUNABLES=glibc.tune.shared_cache_size=NUMBER to set shared cache
> size.
> 
> Any comments?
> 
> H.J.
> ---
> 2017-06-15  H.J. Lu  <hongjiu.lu@intel.com>
> 	    Erich Elsen  <eriche@google.com>
> 
> 	* elf/dl-tunables.list (tune): Add ifunc, non_temporal_threshold,
> 	data_cache_size and shared_cache_size.
> 	* manual/tunables.texi: Document glibc.tune.ifunc,
> 	glibc.tune.data_cache_size, glibc.tune.shared_cache_size and
> 	glibc.tune.non_temporal_threshold.
> 	* sysdeps/unix/sysv/linux/x86/dl-sysdep.c: New file.
> 	* sysdeps/x86/cpu-tunables.c: Likewise.
> 	* sysdeps/x86/cacheinfo.c
> 	(init_cacheinfo): Check and get data cache size, shared cache
> 	size and non temporal threshold from cpu_features.
> 	* sysdeps/x86/cpu-features.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE):
> 	New.
> 	[HAVE_TUNABLES] Include <unistd.h>.
> 	[HAVE_TUNABLES] Include <elf/dl-tunables.h>.
> 	[HAVE_TUNABLES] (TUNABLE_CALLBACK (set_ifunc)): Likewise.
> 	[HAVE_TUNABLES] (init_cpu_features): Use TUNABLE_GET to set
> 	IFUNC selection, data cache size, shared cache size and non
> 	temporal threshold.
> 	* sysdeps/x86/cpu-features.h (cpu_features): Add data_cache_size,
> 	shared_cache_size and non_temporal_threshold.
> ---
>  elf/dl-tunables.list                    |  16 ++
>  manual/tunables.texi                    |  36 ++++
>  sysdeps/unix/sysv/linux/x86/dl-sysdep.c |  21 ++
>  sysdeps/x86/cacheinfo.c                 |  10 +-
>  sysdeps/x86/cpu-features.c              |  19 ++
>  sysdeps/x86/cpu-features.h              |   8 +
>  sysdeps/x86/cpu-tunables.c              | 330 ++++++++++++++++++++++++++++++++
>  7 files changed, 439 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/unix/sysv/linux/x86/dl-sysdep.c
>  create mode 100644 sysdeps/x86/cpu-tunables.c
> 
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 41ce9af..78354fb 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -82,6 +82,22 @@ glibc {
>        type: UINT_64
>        env_alias: LD_HWCAP_MASK
>        default: HWCAP_IMPORTANT
> +      }
> +    ifunc {
> +      type: STRING
> +      security_level: SXID_IGNORE
> +    }
> +    non_temporal_threshold {
> +      type: SIZE_T
> +      security_level: SXID_IGNORE
> +    }
> +    data_cache_size {
> +      type: SIZE_T
> +      security_level: SXID_IGNORE
> +    }
> +    shared_cache_size {
> +      type: SIZE_T
> +      security_level: SXID_IGNORE
>      }

Is it possible with the current tunables approach to make these
arch-specific?  The 'ifunc' switch seems generic, but
'non_temporal_threshold', 'data_cache_size', and 'shared_cache_size' are
very x86-specific, and I find it confusing to expose them to non-x86
architectures.

>    }
>  }
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index c9a4cb7..709f58a 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -198,6 +198,14 @@ is 8 times the number of cores online.
>  @cindex hardware capability tunables
>  @cindex hwcap tunables
>  @cindex tunables, hwcap
> +@cindex ifunc tunables
> +@cindex tunables, ifunc
> +@cindex data_cache_size tunables
> +@cindex tunables, data_cache_size
> +@cindex shared_cache_size tunables
> +@cindex tunables, shared_cache_size
> +@cindex non_temporal_threshold tunables
> +@cindex tunables, non_temporal_threshold
>  
>  @deftp {Tunable namespace} glibc.tune
>  Behavior of @theglibc{} can be tuned to assume specific hardware capabilities
> @@ -213,3 +221,31 @@ extensions available in the processor at runtime for some architectures.  The
>  @code{glibc.tune.hwcap_mask} tunable allows the user to mask out those
>  capabilities at runtime, thus disabling use of those extensions.
>  @end deftp
> +
> +@deftp Tunable glibc.tune.ifunc
> +The @code{glibc.tune.ifunc=-xxx,yyy,-zzz...} tunable allows the user to
> +enable CPU/ARCH feature @code{yyy}, disable CPU/ARCH feature @code{xxx}
> +and @code{zzz} where the feature name is case-sensitive and has to match
> +the ones in @code{sysdeps/x86/cpu-features.h}.
> +
> +This tunable is specific to i386 and x86-64.
> +@end deftp
> +
> +@deftp Tunable glibc.tune.data_cache_size
> +The @code{glibc.tune.data_cache_size} tunable allows the user to set
> +data cache size in bytes for use in memory and string routines.
> +
> +This tunable is specific to i386 and x86-64.
> +@end deftp
> +
> +@deftp Tunable glibc.tune.shared_cache_size
> +The @code{glibc.tune.shared_cache_size} tunable allows the user to set
> +shared cache size in bytes for use in memory and string routines.
> +@end deftp
> +
> +@deftp Tunable glibc.tune.non_temporal_threshold
> +The @code{glibc.tune.non_temporal_threshold} tunable allows the user to
> +set threshold in bytes for non temporal store.
> +
> +This tunable is specific to i386 and x86-64.
> +@end deftp
> diff --git a/sysdeps/unix/sysv/linux/x86/dl-sysdep.c b/sysdeps/unix/sysv/linux/x86/dl-sysdep.c
> new file mode 100644
> index 0000000..64eb0d7
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/x86/dl-sysdep.c
> @@ -0,0 +1,21 @@
> +/* Operating system support for run-time dynamic linker.  X86 version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <config.h>
> +#include <sysdeps/x86/cpu-tunables.c>
> +#include <sysdeps/unix/sysv/linux/dl-sysdep.c>
> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> index 8000fd1..9542dfb 100644
> --- a/sysdeps/x86/cacheinfo.c
> +++ b/sysdeps/x86/cacheinfo.c
> @@ -752,6 +752,9 @@ intel_bug_no_cache_info:
>  #endif
>      }
>  
> +  if (cpu_features->data_cache_size != 0)
> +    data = cpu_features->data_cache_size;
> +
>    if (data > 0)
>      {
>        __x86_raw_data_cache_size_half = data / 2;
> @@ -762,6 +765,9 @@ intel_bug_no_cache_info:
>        __x86_data_cache_size = data;
>      }
>  
> +  if (cpu_features->shared_cache_size != 0)
> +    shared = cpu_features->shared_cache_size;
> +
>    if (shared > 0)
>      {
>        __x86_raw_shared_cache_size_half = shared / 2;
> @@ -777,7 +783,9 @@ intel_bug_no_cache_info:
>       store becomes faster on a 8-core processor.  This is the 3/4 of the
>       total shared cache size.  */
>    __x86_shared_non_temporal_threshold
> -    = __x86_shared_cache_size * threads * 3 / 4;
> +    = (cpu_features->non_temporal_threshold != 0
> +       ? cpu_features->non_temporal_threshold
> +       : __x86_shared_cache_size * threads * 3 / 4);
>  }
>  
>  #endif
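
For readers following the cacheinfo.c hunks above, the override rule can
be read in isolation as: a nonzero tunable wins, otherwise the detected
value is used.  A minimal sketch under that reading; pick_size and
non_temporal_threshold are hypothetical names for illustration, not
glibc functions:

```c
/* Hypothetical helper, not part of the patch: use the tunable value
   when it is nonzero, otherwise fall back to the detected value.  */
static long int
pick_size (long int tunable, long int detected)
{
  return tunable != 0 ? tunable : detected;
}

/* Mirrors the expression in the hunk above: 3/4 of the total shared
   cache size (per-thread size times thread count) unless the tunable
   overrides it.  */
static long int
non_temporal_threshold (long int tunable, long int shared,
                        long int threads)
{
  return pick_size (tunable, shared * threads * 3 / 4);
}
```

With a 1 MiB shared cache per thread, 8 threads and no tunable,
non_temporal_threshold (0, 1048576, 8) gives 6291456; with the tunable
set to 4096 it gives 4096.  One consequence of this scheme is that 0
cannot be requested explicitly, since it doubles as the "not set" value.
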
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 4288001..1ca0310 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -20,6 +20,15 @@
>  #include <cpu-features.h>
>  #include <dl-hwcap.h>
>  
> +#if HAVE_TUNABLES
> +# define TUNABLE_NAMESPACE tune
> +# include <unistd.h>		/* Get STDOUT_FILENO for _dl_printf.  */
> +# include <elf/dl-tunables.h>
> +
> +extern void TUNABLE_CALLBACK (set_ifunc) (tunable_val_t *)
> +  attribute_hidden;
> +#endif
> +
>  static void
>  get_common_indeces (struct cpu_features *cpu_features,
>  		    unsigned int *family, unsigned int *model,
> @@ -312,6 +321,16 @@ no_cpuid:
>    cpu_features->model = model;
>    cpu_features->kind = kind;
>  
> +#if HAVE_TUNABLES
> +  TUNABLE_GET (ifunc, tunable_val_t *, TUNABLE_CALLBACK (set_ifunc));
> +  cpu_features->non_temporal_threshold
> +    = TUNABLE_GET (non_temporal_threshold, long int, NULL);
> +  cpu_features->data_cache_size
> +    = TUNABLE_GET (data_cache_size, long int, NULL);
> +  cpu_features->shared_cache_size
> +    = TUNABLE_GET (shared_cache_size, long int, NULL);
> +#endif
> +
>    /* Reuse dl_platform, dl_hwcap and dl_hwcap_mask for x86.  */
>    GLRO(dl_platform) = NULL;
>    GLRO(dl_hwcap) = 0;
> diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
> index f2329ea..fef5e18 100644
> --- a/sysdeps/x86/cpu-features.h
> +++ b/sysdeps/x86/cpu-features.h
> @@ -215,6 +215,14 @@ struct cpu_features
>    unsigned int family;
>    unsigned int model;
>    unsigned int feature[FEATURE_INDEX_MAX];
> +  /* Data cache size for use in memory and string routines, typically
> +     L1 size.  */
> +  long int data_cache_size;
> +  /* Shared cache size for use in memory and string routines, typically
> +     L2 or L3 size.  */
> +  long int shared_cache_size;
> +  /* Threshold to use non temporal store.  */
> +  long int non_temporal_threshold;
>  };
>  
>  /* Used from outside of glibc to get access to the CPU features
> diff --git a/sysdeps/x86/cpu-tunables.c b/sysdeps/x86/cpu-tunables.c
> new file mode 100644
> index 0000000..dfa3647
> --- /dev/null
> +++ b/sysdeps/x86/cpu-tunables.c
> @@ -0,0 +1,330 @@
> +/* x86 CPU feature tuning.
> +   This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#if HAVE_TUNABLES
> +# define TUNABLE_NAMESPACE tune
> +# include <stdbool.h>
> +# include <stdint.h>
> +# include <unistd.h>		/* Get STDOUT_FILENO for _dl_printf.  */
> +# include <elf/dl-tunables.h>
> +# include <string.h>
> +# include <cpu-features.h>
> +# include <ldsodefs.h>
> +
> +/* We can't use IFUNC memcmp nor strlen in init_cpu_features from libc.a
> +   since IFUNC must be set up by init_cpu_features.  */
> +# if defined USE_MULTIARCH && !defined SHARED
> +#  ifdef __x86_64__
> +#   define DEFAULT_MEMCMP	__memcmp_sse2
> +#   define DEFAULT_STRLEN	__strlen_sse2
> +extern __typeof (strlen) DEFAULT_STRLEN;
> +#  else
> +#   define DEFAULT_MEMCMP	__memcmp_ia32
> +#   define DEFAULT_STRLEN	strlen
> +#  endif
> +extern __typeof (memcmp) DEFAULT_MEMCMP;
> +# else
> +#  define DEFAULT_MEMCMP	memcmp
> +#  define DEFAULT_STRLEN	strlen
> +# endif
> +
> +# define CHECK_GLIBC_IFUNC_CPU_OFF(f, cpu_features, name, len)		\
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
> +  if (!DEFAULT_MEMCMP (f, #name, len))					\
> +    {									\
> +      cpu_features->cpuid[index_cpu_##name].reg_##name			\
> +	&= ~bit_cpu_##name;						\
> +      break;								\
> +    }
> +
> +/* Disable an ARCH feature NAME.  We don't enable an ARCH feature which
> +   isn't available.  */
> +# define CHECK_GLIBC_IFUNC_ARCH_OFF(f, cpu_features, name, len)		\
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
> +  if (!DEFAULT_MEMCMP (f, #name, len))					\
> +    {									\
> +      cpu_features->feature[index_arch_##name]				\
> +	&= ~bit_arch_##name;						\
> +      break;								\
> +    }
> +
> +/* Enable/disable an ARCH feature NAME.  */
> +# define CHECK_GLIBC_IFUNC_ARCH_BOTH(f, cpu_features, name, disable,	\
> +				    len)				\
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
> +  if (!DEFAULT_MEMCMP (f, #name, len))					\
> +    {									\
> +      if (disable)							\
> +	cpu_features->feature[index_arch_##name]			\
> +	  &= ~bit_arch_##name;						\
> +      else								\
> +	cpu_features->feature[index_arch_##name]			\
> +	  |= bit_arch_##name;						\
> +      break;								\
> +    }
> +
> +/* Enable/disable an ARCH feature NAME.  Enable an ARCH feature only
> +   if the ARCH feature NEED is also enabled.  */
> +# define CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH(f, cpu_features, name,	\
> +					       need, disable, len)	\
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
> +  if (!DEFAULT_MEMCMP (f, #name, len))					\
> +    {									\
> +      if (disable)							\
> +	cpu_features->feature[index_arch_##name]			\
> +	  &= ~bit_arch_##name;						\
> +      else if (CPU_FEATURES_ARCH_P (cpu_features, need))		\
> +	cpu_features->feature[index_arch_##name]			\
> +	  |= bit_arch_##name;						\
> +      break;								\
> +    }
> +
> +/* Enable/disable an ARCH feature NAME.  Enable an ARCH feature only
> +   if the CPU feature NEED is also enabled.  */
> +# define CHECK_GLIBC_IFUNC_ARCH_NEED_CPU_BOTH(f, cpu_features, name,	\
> +					      need, disable, len)	\
> +  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
> +  if (!DEFAULT_MEMCMP (f, #name, len))					\
> +    {									\
> +      if (disable)							\
> +	cpu_features->feature[index_arch_##name]			\
> +	  &= ~bit_arch_##name;						\
> +      else if (CPU_FEATURES_CPU_P (cpu_features, need))			\
> +	cpu_features->feature[index_arch_##name]			\
> +	  |= bit_arch_##name;						\
> +      break;								\
> +    }
> +
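
The CHECK_GLIBC_IFUNC_* macros above all share one pattern: verify at
compile time that LEN matches the spelling of NAME, compare LEN bytes,
then set or clear one bit and leave the switch.  A stripped-down sketch
of that pattern; bit_Slow_BSF, bit_Prefer_ERMS, CHECK_BOTH and
apply_token are stand-ins with made-up values, and plain memcmp is used
where the patch uses DEFAULT_MEMCMP:

```c
#include <stddef.h>
#include <string.h>

/* Made-up bit values standing in for the bit_arch_* constants.  */
#define bit_Slow_BSF    (1u << 0)
#define bit_Prefer_ERMS (1u << 1)

/* If the LEN-byte token TOK spells NAME, clear (DISABLE nonzero) or set
   bit_NAME in *FEATURE and leave the enclosing switch.  The
   _Static_assert rejects a LEN that disagrees with NAME.  */
#define CHECK_BOTH(tok, feature, name, disable, len)                  \
  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);      \
  if (memcmp (tok, #name, len) == 0)                                  \
    {                                                                 \
      if (disable)                                                    \
        *(feature) &= ~bit_##name;                                    \
      else                                                            \
        *(feature) |= bit_##name;                                     \
      break;                                                          \
    }

/* Apply one token of length LEN to FEATURE and return the result.
   Unknown tokens leave FEATURE unchanged.  */
static unsigned int
apply_token (unsigned int feature, const char *tok, size_t len,
             int disable)
{
  switch (len)
    {
    default:
      break;
    case 8:
      {
        CHECK_BOTH (tok, &feature, Slow_BSF, disable, 8);
      }
      break;
    case 11:
      {
        CHECK_BOTH (tok, &feature, Prefer_ERMS, disable, 11);
      }
      break;
    }
  return feature;
}
```

The braces after each case label are needed in this sketch because
_Static_assert is a declaration, and in C11 a declaration cannot
directly follow a case label.  Switching on the length first means each
token is compared only against names of the same length.
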
> +attribute_hidden
> +void
> +TUNABLE_CALLBACK (set_ifunc) (tunable_val_t *valp)
> +{
> +  /* The current IFUNC selection is based on microbenchmarks in glibc.
> +     It should give the best performance for most workloads.  But other
> +     choices may have better performance for a particular workload or on
> +     the hardware which wasn't available when the selection was made.
> +     GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz.... can be used to
> +     enable CPU/ARCH feature yyy and disable CPU/ARCH features xxx and
> +     zzz, where the feature name is case-sensitive and has to match the
> +     ones in cpu-features.h.  It can be used by glibc developers to
> +     tune for a new processor or override the IFUNC selection to improve
> +     performance for a particular workload.
> +
> +     Since all CPU/ARCH features are hardware optimizations without
> +     security implication, except for Prefer_MAP_32BIT_EXEC, which can
> +     only be disabled, we check glibc.tune.ifunc for programs, including
> +     set*id ones.
> +
> +     NOTE: the IFUNC selection may change over time.  Please check all
> +     multiarch implementations when experimenting.  */
> +
> +  const char *p = valp->strval;
> +  struct cpu_features *cpu_features = &GLRO(dl_x86_cpu_features);
> +  const char *end = p + DEFAULT_STRLEN (p);
> +  size_t len;
> +
> +  do
> +    {
> +      const char *c, *n;
> +      bool disable;
> +      size_t nl;
> +
> +      for (c = p; *c != ','; c++)
> +	if (c >= end)
> +	  break;
> +
> +      len = c - p;
> +      disable = *p == '-';
> +      if (disable)
> +	{
> +	  n = p + 1;
> +	  nl = len - 1;
> +	}
> +      else
> +	{
> +	  n = p;
> +	  nl = len;
> +	}
> +      switch (nl)
> +	{
> +	default:
> +	  break;
> +	case 3:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX, 3);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, CX8, 3);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, FMA, 3);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, HTT, 3);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, RTM, 3);
> +	    }
> +	  break;
> +	case 4:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX2, 4);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, BMI1, 4);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, BMI2, 4);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, CMOV, 4);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, ERMS, 4);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, FMA4, 4);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSE2, 4);
> +	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, I586, 4);
> +	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, I686, 4);
> +	    }
> +	  break;
> +	case 5:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, LZCNT, 5);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, MOVBE, 5);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSSE3, 5);
> +	    }
> +	  break;
> +	case 6:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, POPCNT, 6);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSE4_1, 6);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSE4_2, 6);
> +	    }
> +	  break;
> +	case 7:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512F, 7);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, OSXSAVE, 7);
> +	    }
> +	  break;
> +	case 8:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512CD, 8);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512BW, 8);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512DQ, 8);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512ER, 8);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512PF, 8);
> +	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512VL, 8);
> +	    }
> +	  CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features, Slow_BSF,
> +				       disable, 8);
> +	  break;
> +	case 10:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, AVX_Usable,
> +					  10);
> +	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, FMA_Usable,
> +					  10);
> +	    }
> +	  break;
> +	case 11:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, AVX2_Usable,
> +					  11);
> +	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, FMA4_Usable,
> +					  11);
> +	    }
> +	  CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features, Prefer_ERMS,
> +				       disable, 11);
> +	  CHECK_GLIBC_IFUNC_ARCH_NEED_CPU_BOTH (n, cpu_features,
> +						Slow_SSE4_2, SSE4_2,
> +						disable, 11);
> +	  break;
> +	case 14:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features,
> +					  AVX512F_Usable, 14);
> +	    }
> +	  break;
> +	case 15:
> +	  if (disable)
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features,
> +					  AVX512DQ_Usable, 15);
> +	    }
> +	  CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features, Fast_Rep_String,
> +				       disable, 15);
> +	  break;
> +	case 16:
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH
> +		(n, cpu_features, Prefer_No_AVX512, AVX512F_Usable,
> +		 disable, 16);
> +	    }
> +	  break;
> +	case 18:
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +					   Fast_Copy_Backward, disable,
> +					   18);
> +	    }
> +	  break;
> +	case 19:
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +					   Fast_Unaligned_Load, disable,
> +					   19);
> +	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +					   Fast_Unaligned_Copy, disable,
> +					   19);
> +	    }
> +	  break;
> +	case 20:
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH
> +		(n, cpu_features, Prefer_No_VZEROUPPER, AVX_Usable,
> +		 disable, 20);
> +	    }
> +	  break;
> +	case 21:
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +					   Prefer_MAP_32BIT_EXEC, disable,
> +					   21);
> +	    }
> +	  break;
> +	case 23:
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH
> +		(n, cpu_features, AVX_Fast_Unaligned_Load, AVX_Usable,
> +		 disable, 23);
> +	    }
> +	  break;
> +	case 26:
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_NEED_CPU_BOTH
> +		(n, cpu_features, Prefer_PMINUB_for_stringop, SSE2,
> +		 disable, 26);
> +	    }
> +	  break;
> +	case 27:
> +	    {
> +	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
> +					   Use_dl_runtime_resolve_slow,
> +					   disable, 27);
> +	    }
> +	  break;
> +	}
> +      p += len + 1;
> +    }
> +  while (p < end);
> +}
> +#endif
>
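
The loop in set_ifunc above splits the value on commas and treats a
leading '-' as "disable".  The same walk, sketched as a standalone
function for experimenting outside ld.so; walk_feature_list, feature_fn
and count_disabled are hypothetical names, and the sketch calls strlen
and memchr directly, which the patch avoids in static builds because
IFUNC is not set up yet at that point:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one feature name (not
   NUL-terminated), its length, and whether it had a '-' prefix.  */
typedef void (*feature_fn) (const char *name, size_t len, bool disable,
                            void *arg);

/* Walk a "-xxx,yyy,-zzz" style list and report each entry to FN.
   Empty entries (as in "a,,b" or a bare "-") are skipped.  Returns
   the number of entries reported.  */
static size_t
walk_feature_list (const char *list, feature_fn fn, void *arg)
{
  const char *end = list + strlen (list);
  size_t count = 0;

  for (const char *p = list; p < end; )
    {
      /* Find the end of the current entry.  */
      const char *c = memchr (p, ',', (size_t) (end - p));
      if (c == NULL)
        c = end;

      bool disable = p < c && *p == '-';
      const char *name = disable ? p + 1 : p;
      size_t len = (size_t) (c - name);

      if (len > 0)
        {
          fn (name, len, disable, arg);
          count++;
        }
      p = c + 1;
    }
  return count;
}

/* Example callback: counts the entries that carried a '-' prefix.
   ARG points to a size_t counter.  */
static void
count_disabled (const char *name, size_t len, bool disable, void *arg)
{
  (void) name;
  (void) len;
  if (disable)
    ++*(size_t *) arg;
}
```

For the value "-AVX2,Prefer_ERMS,-ERMS" the walk reports three entries,
two of them as disables.  A real callback would dispatch on len the way
the switch in set_ifunc does.
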
  
H.J. Lu June 20, 2017, 1:28 p.m. UTC | #3
On Tue, Jun 20, 2017 at 6:23 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
> On 15/06/2017 10:10, H.J. Lu wrote:
>> [...]
>> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
>> index 41ce9af..78354fb 100644
>> --- a/elf/dl-tunables.list
>> +++ b/elf/dl-tunables.list
>> @@ -82,6 +82,22 @@ glibc {
>>        type: UINT_64
>>        env_alias: LD_HWCAP_MASK
>>        default: HWCAP_IMPORTANT
>> +      }
>> +    ifunc {
>> +      type: STRING
>> +      security_level: SXID_IGNORE
>> +    }
>> +    non_temporal_threshold {
>> +      type: SIZE_T
>> +      security_level: SXID_IGNORE
>> +    }
>> +    data_cache_size {
>> +      type: SIZE_T
>> +      security_level: SXID_IGNORE
>> +    }
>> +    shared_cache_size {
>> +      type: SIZE_T
>> +      security_level: SXID_IGNORE
>>      }
>
> Is it possible with the current tunables approach to make these
> arch-specific?  The 'ifunc' switch seems generic, but
> 'non_temporal_threshold', 'data_cache_size', and 'shared_cache_size' are
> very x86-specific, and I find it confusing to expose them to non-x86
> architectures.

Yes, it can be made x86 specific.  I will update my patch shortly.

Thanks.

H.J.
  
H.J. Lu June 20, 2017, 1:41 p.m. UTC | #4
On Tue, Jun 20, 2017 at 6:28 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Jun 20, 2017 at 6:23 AM, Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>> On 15/06/2017 10:10, H.J. Lu wrote:
>>> [...]
>>> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
>>> index 41ce9af..78354fb 100644
>>> --- a/elf/dl-tunables.list
>>> +++ b/elf/dl-tunables.list
>>> @@ -82,6 +82,22 @@ glibc {
>>>        type: UINT_64
>>>        env_alias: LD_HWCAP_MASK
>>>        default: HWCAP_IMPORTANT
>>> +      }
>>> +    ifunc {
>>> +      type: STRING
>>> +      security_level: SXID_IGNORE
>>> +    }
>>> +    non_temporal_threshold {
>>> +      type: SIZE_T
>>> +      security_level: SXID_IGNORE
>>> +    }
>>> +    data_cache_size {
>>> +      type: SIZE_T
>>> +      security_level: SXID_IGNORE
>>> +    }
>>> +    shared_cache_size {
>>> +      type: SIZE_T
>>> +      security_level: SXID_IGNORE
>>>      }
>>
>> Is it possible with the current tunables approach to make these
>> arch-specific?  The 'ifunc' switch seems generic, but
>> 'non_temporal_threshold', 'data_cache_size', and 'shared_cache_size' are
>> very x86-specific, and I find it confusing to expose them to non-x86
>> architectures.
>
> Yes, it can be made x86 specific.  I will update my patch shortly.
>

Since ifunc and the other tunables are handled in the same namespace, I
would have to make ifunc x86-specific along with
non_temporal_threshold and the cache size tunables.  Would that be OK?
  
Siddhesh Poyarekar June 20, 2017, 2:36 p.m. UTC | #5
On Tuesday 20 June 2017 07:11 PM, H.J. Lu wrote:
> Since ifunc and the other tunables are handled in the same namespace, I
> would have to make ifunc x86-specific along with
> non_temporal_threshold and the cache size tunables.  Would that be OK?

I'm going to add ifunc selection for aarch64 soon, so it makes sense to
have that generic.  You could name the arch-specific tunables as
x86_non_temporal_threshold and so on.

Siddhesh
  
Victor Rodriguez June 20, 2017, 6:36 p.m. UTC | #6
-----Original Message-----

From: <libc-alpha-owner@sourceware.org> on behalf of Siddhesh Poyarekar
<siddhesh@gotplt.org>
Date: Tuesday, June 20, 2017 at 9:36 AM
To: "H.J. Lu" <hjl.tools@gmail.com>, Adhemerval Zanella
<adhemerval.zanella@linaro.org>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH] tunables: Add IFUNC selection and cache sizes

>On Tuesday 20 June 2017 07:11 PM, H.J. Lu wrote:
>> Since ifunc and the other tunables are handled in the same namespace, I
>> would have to make ifunc x86-specific along with
>> non_temporal_threshold and the cache size tunables.  Would that be OK?
>
>I'm going to add ifunc selection for aarch64 soon, so it makes sense to
>have that generic.  You could name the arch-specific tunables as
>x86_non_temporal_threshold and so on.

What does arch-specific mean here? (Haswell / Skylake / ...)

Victor
>
>Siddhesh
  
H.J. Lu June 20, 2017, 6:45 p.m. UTC | #7
On Tue, Jun 20, 2017 at 11:36 AM, Rodriguez Bahena, Victor
<victor.rodriguez.bahena@intel.com> wrote:
> -----Original Message-----
>
> From: <libc-alpha-owner@sourceware.org> on behalf of Siddhesh Poyarekar
> <siddhesh@gotplt.org>
> Date: Tuesday, June 20, 2017 at 9:36 AM
> To: "H.J. Lu" <hjl.tools@gmail.com>, Adhemerval Zanella
> <adhemerval.zanella@linaro.org>
> Cc: GNU C Library <libc-alpha@sourceware.org>
> Subject: Re: [PATCH] tunables: Add IFUNC selection and cache sizes
>
>>On Tuesday 20 June 2017 07:11 PM, H.J. Lu wrote:
>>> Since ifunc and the other tunables are handled in the same namespace, I
>>> would have to make ifunc x86-specific along with
>>> non_temporal_threshold and the cache size tunables.  Would that be OK?
>>
>>I'm going to add ifunc selection for aarch64 soon, so it makes sense to
>>have that generic.  You could name the arch-specific tunables as
>>x86_non_temporal_threshold and so on.
>
> What does arch-specific mean here? (Haswell / Skylake / ...)
>

It is x86 vs aarch64.
  

Patch

diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 41ce9af..78354fb 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -82,6 +82,22 @@  glibc {
       type: UINT_64
       env_alias: LD_HWCAP_MASK
       default: HWCAP_IMPORTANT
+      }
+    ifunc {
+      type: STRING
+      security_level: SXID_IGNORE
+    }
+    non_temporal_threshold {
+      type: SIZE_T
+      security_level: SXID_IGNORE
+    }
+    data_cache_size {
+      type: SIZE_T
+      security_level: SXID_IGNORE
+    }
+    shared_cache_size {
+      type: SIZE_T
+      security_level: SXID_IGNORE
     }
   }
 }
diff --git a/manual/tunables.texi b/manual/tunables.texi
index c9a4cb7..709f58a 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -198,6 +198,14 @@  is 8 times the number of cores online.
 @cindex hardware capability tunables
 @cindex hwcap tunables
 @cindex tunables, hwcap
+@cindex ifunc tunables
+@cindex tunables, ifunc
+@cindex data_cache_size tunables
+@cindex tunables, data_cache_size
+@cindex shared_cache_size tunables
+@cindex tunables, shared_cache_size
+@cindex non_temporal_threshold tunables
+@cindex tunables, non_temporal_threshold
 
 @deftp {Tunable namespace} glibc.tune
 Behavior of @theglibc{} can be tuned to assume specific hardware capabilities
@@ -213,3 +221,31 @@  extensions available in the processor at runtime for some architectures.  The
 @code{glibc.tune.hwcap_mask} tunable allows the user to mask out those
 capabilities at runtime, thus disabling use of those extensions.
 @end deftp
+
+@deftp Tunable glibc.tune.ifunc
+The @code{glibc.tune.ifunc=-xxx,yyy,-zzz...} tunable allows the user to
+enable CPU/ARCH feature @code{yyy}, disable CPU/ARCH feature @code{xxx}
+and @code{zzz} where the feature name is case-sensitive and has to match
+the ones in @code{sysdeps/x86/cpu-features.h}.
+
+This tunable is specific to i386 and x86-64.
+@end deftp
+
+@deftp Tunable glibc.tune.data_cache_size
+The @code{glibc.tune.data_cache_size} tunable allows the user to set
+data cache size in bytes for use in memory and string routines.
+
+This tunable is specific to i386 and x86-64.
+@end deftp
+
+@deftp Tunable glibc.tune.shared_cache_size
+The @code{glibc.tune.shared_cache_size} tunable allows the user to set
+shared cache size in bytes for use in memory and string routines.
+@end deftp
+
+@deftp Tunable glibc.tune.non_temporal_threshold
+The @code{glibc.tune.non_temporal_threshold} tunable allows the user to
+set threshold in bytes for non temporal store.
+
+This tunable is specific to i386 and x86-64.
+@end deftp
diff --git a/sysdeps/unix/sysv/linux/x86/dl-sysdep.c b/sysdeps/unix/sysv/linux/x86/dl-sysdep.c
new file mode 100644
index 0000000..64eb0d7
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86/dl-sysdep.c
@@ -0,0 +1,21 @@ 
+/* Operating system support for run-time dynamic linker.  X86 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+#include <sysdeps/x86/cpu-tunables.c>
+#include <sysdeps/unix/sysv/linux/dl-sysdep.c>
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 8000fd1..9542dfb 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -752,6 +752,9 @@  intel_bug_no_cache_info:
 #endif
     }
 
+  if (cpu_features->data_cache_size != 0)
+    data = cpu_features->data_cache_size;
+
   if (data > 0)
     {
       __x86_raw_data_cache_size_half = data / 2;
@@ -762,6 +765,9 @@  intel_bug_no_cache_info:
       __x86_data_cache_size = data;
     }
 
+  if (cpu_features->shared_cache_size != 0)
+    shared = cpu_features->shared_cache_size;
+
   if (shared > 0)
     {
       __x86_raw_shared_cache_size_half = shared / 2;
@@ -777,7 +783,9 @@  intel_bug_no_cache_info:
      store becomes faster on a 8-core processor.  This is the 3/4 of the
      total shared cache size.  */
   __x86_shared_non_temporal_threshold
-    = __x86_shared_cache_size * threads * 3 / 4;
+    = (cpu_features->non_temporal_threshold != 0
+       ? cpu_features->non_temporal_threshold
+       : __x86_shared_cache_size * threads * 3 / 4);
 }
 
 #endif
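
The override rule in the cacheinfo.c hunk above can be restated as a small standalone function.  This is only a sketch for experimenting with values: the function name and parameters are illustrative and do not exist in glibc, and `threads` is taken as `long int` here to keep the arithmetic simple.

```c
#include <assert.h>

/* Illustrative restatement of the selection logic; not glibc code.
   TUNABLE is the value of glibc.tune.non_temporal_threshold, where 0
   is treated as "not set".  */
static long int
pick_non_temporal_threshold (long int tunable, long int shared_cache_size,
                             long int threads)
{
  assert (threads > 0);

  /* A nonzero tunable wins; otherwise fall back to 3/4 of the total
     shared cache size, which is the pre-patch behavior.  */
  return (tunable != 0
          ? tunable
          : shared_cache_size * threads * 3 / 4);
}
```

One consequence of this scheme is that a value of 0 cannot be used to force a zero threshold, because 0 is indistinguishable from an unset tunable.  The same holds for the data and shared cache size overrides.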
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 4288001..1ca0310 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -20,6 +20,15 @@ 
 #include <cpu-features.h>
 #include <dl-hwcap.h>
 
+#if HAVE_TUNABLES
+# define TUNABLE_NAMESPACE tune
+# include <unistd.h>		/* Get STDOUT_FILENO for _dl_printf.  */
+# include <elf/dl-tunables.h>
+
+extern void TUNABLE_CALLBACK (set_ifunc) (tunable_val_t *)
+  attribute_hidden;
+#endif
+
 static void
 get_common_indeces (struct cpu_features *cpu_features,
 		    unsigned int *family, unsigned int *model,
@@ -312,6 +321,16 @@  no_cpuid:
   cpu_features->model = model;
   cpu_features->kind = kind;
 
+#if HAVE_TUNABLES
+  TUNABLE_GET (ifunc, tunable_val_t *, TUNABLE_CALLBACK (set_ifunc));
+  cpu_features->non_temporal_threshold
+    = TUNABLE_GET (non_temporal_threshold, long int, NULL);
+  cpu_features->data_cache_size
+    = TUNABLE_GET (data_cache_size, long int, NULL);
+  cpu_features->shared_cache_size
+    = TUNABLE_GET (shared_cache_size, long int, NULL);
+#endif
+
   /* Reuse dl_platform, dl_hwcap and dl_hwcap_mask for x86.  */
   GLRO(dl_platform) = NULL;
   GLRO(dl_hwcap) = 0;
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index f2329ea..fef5e18 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -215,6 +215,14 @@  struct cpu_features
   unsigned int family;
   unsigned int model;
   unsigned int feature[FEATURE_INDEX_MAX];
+  /* Data cache size for use in memory and string routines, typically
+     L1 size.  */
+  long int data_cache_size;
+  /* Shared cache size for use in memory and string routines, typically
+     L2 or L3 size.  */
+  long int shared_cache_size;
+  /* Threshold to use non temporal store.  */
+  long int non_temporal_threshold;
 };
 
 /* Used from outside of glibc to get access to the CPU features
diff --git a/sysdeps/x86/cpu-tunables.c b/sysdeps/x86/cpu-tunables.c
new file mode 100644
index 0000000..dfa3647
--- /dev/null
+++ b/sysdeps/x86/cpu-tunables.c
@@ -0,0 +1,330 @@ 
+/* x86 CPU feature tuning.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if HAVE_TUNABLES
+# define TUNABLE_NAMESPACE tune
+# include <stdbool.h>
+# include <stdint.h>
+# include <unistd.h>		/* Get STDOUT_FILENO for _dl_printf.  */
+# include <elf/dl-tunables.h>
+# include <string.h>
+# include <cpu-features.h>
+# include <ldsodefs.h>
+
+/* We can't use IFUNC memcmp nor strlen in init_cpu_features from libc.a
+   since IFUNC must be set up by init_cpu_features.  */
+# if defined USE_MULTIARCH && !defined SHARED
+#  ifdef __x86_64__
+#   define DEFAULT_MEMCMP	__memcmp_sse2
+#   define DEFAULT_STRLEN	__strlen_sse2
+extern __typeof (strlen) DEFAULT_STRLEN;
+#  else
+#   define DEFAULT_MEMCMP	__memcmp_ia32
+#   define DEFAULT_STRLEN	strlen
+#  endif
+extern __typeof (memcmp) DEFAULT_MEMCMP;
+# else
+#  define DEFAULT_MEMCMP	memcmp
+#  define DEFAULT_STRLEN	strlen
+# endif
+
+# define CHECK_GLIBC_IFUNC_CPU_OFF(f, cpu_features, name, len)		\
+  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
+  if (!DEFAULT_MEMCMP (f, #name, len))					\
+    {									\
+      cpu_features->cpuid[index_cpu_##name].reg_##name			\
+	&= ~bit_cpu_##name;						\
+      break;								\
+    }
+
+/* Disable an ARCH feature NAME.  We don't enable an ARCH feature which
+   isn't available.  */
+# define CHECK_GLIBC_IFUNC_ARCH_OFF(f, cpu_features, name, len)		\
+  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
+  if (!DEFAULT_MEMCMP (f, #name, len))					\
+    {									\
+      cpu_features->feature[index_arch_##name]				\
+	&= ~bit_arch_##name;						\
+      break;								\
+    }
+
+/* Enable/disable an ARCH feature NAME.  */
+# define CHECK_GLIBC_IFUNC_ARCH_BOTH(f, cpu_features, name, disable,	\
+				    len)				\
+  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
+  if (!DEFAULT_MEMCMP (f, #name, len))					\
+    {									\
+      if (disable)							\
+	cpu_features->feature[index_arch_##name]			\
+	  &= ~bit_arch_##name;						\
+      else								\
+	cpu_features->feature[index_arch_##name]			\
+	  |= bit_arch_##name;						\
+      break;								\
+    }
+
+/* Enable/disable an ARCH feature NAME.  Enable an ARCH feature only
+   if the ARCH feature NEED is also enabled.  */
+# define CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH(f, cpu_features, name,	\
+					       need, disable, len)	\
+  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
+  if (!DEFAULT_MEMCMP (f, #name, len))					\
+    {									\
+      if (disable)							\
+	cpu_features->feature[index_arch_##name]			\
+	  &= ~bit_arch_##name;						\
+      else if (CPU_FEATURES_ARCH_P (cpu_features, need))		\
+	cpu_features->feature[index_arch_##name]			\
+	  |= bit_arch_##name;						\
+      break;								\
+    }
+
+/* Enable/disable an ARCH feature NAME.  Enable an ARCH feature only
+   if the CPU feature NEED is also enabled.  */
+# define CHECK_GLIBC_IFUNC_ARCH_NEED_CPU_BOTH(f, cpu_features, name,	\
+					      need, disable, len)	\
+  _Static_assert (sizeof (#name) - 1 == len, #name " != " #len);	\
+  if (!DEFAULT_MEMCMP (f, #name, len))					\
+    {									\
+      if (disable)							\
+	cpu_features->feature[index_arch_##name]			\
+	  &= ~bit_arch_##name;						\
+      else if (CPU_FEATURES_CPU_P (cpu_features, need))			\
+	cpu_features->feature[index_arch_##name]			\
+	  |= bit_arch_##name;						\
+      break;								\
+    }
+
+attribute_hidden
+void
+TUNABLE_CALLBACK (set_ifunc) (tunable_val_t *valp)
+{
+  /* The current IFUNC selection is based on microbenchmarks in glibc.
+     It should give the best performance for most workloads.  But other
+     choices may have better performance for a particular workload or on
+     the hardware which wasn't available when the selection was made.
+     Setting GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz.... in the
+     environment enables CPU/ARCH feature yyy and disables CPU/ARCH
+     features xxx and zzz, where the feature name is case-sensitive and
+     has to match the ones in cpu-features.h.  It can be used by glibc
+     developers to tune for a new processor or override the IFUNC
+     selection to improve performance for a particular workload.
+
+     Since all CPU/ARCH features are hardware optimizations without
+     security implication, except for Prefer_MAP_32BIT_EXEC, which can
+     only be disabled, we check glibc.tune.ifunc for all programs,
+     including set*id ones.
+
+     NOTE: the IFUNC selection may change over time.  Please check all
+     multiarch implementations when experimenting.  */
+
+  const char *p = valp->strval;
+  struct cpu_features *cpu_features = &GLRO(dl_x86_cpu_features);
+  const char *end = p + DEFAULT_STRLEN (p);
+  size_t len;
+
+  do
+    {
+      const char *c, *n;
+      bool disable;
+      size_t nl;
+
+      for (c = p; *c != ','; c++)
+	if (c >= end)
+	  break;
+
+      len = c - p;
+      disable = *p == '-';
+      if (disable)
+	{
+	  n = p + 1;
+	  nl = len - 1;
+	}
+      else
+	{
+	  n = p;
+	  nl = len;
+	}
+      switch (nl)
+	{
+	default:
+	  break;
+	case 3:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX, 3);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, CX8, 3);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, FMA, 3);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, HTT, 3);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, RTM, 3);
+	    }
+	  break;
+	case 4:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX2, 4);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, BMI1, 4);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, BMI2, 4);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, CMOV, 4);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, ERMS, 4);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, FMA4, 4);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSE2, 4);
+	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, I586, 4);
+	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, I686, 4);
+	    }
+	  break;
+	case 5:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, LZCNT, 5);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, MOVBE, 5);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSSE3, 5);
+	    }
+	  break;
+	case 6:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, POPCNT, 6);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSE4_1, 6);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, SSE4_2, 6);
+	    }
+	  break;
+	case 7:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512F, 7);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, OSXSAVE, 7);
+	    }
+	  break;
+	case 8:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512CD, 8);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512BW, 8);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512DQ, 8);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512ER, 8);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512PF, 8);
+	      CHECK_GLIBC_IFUNC_CPU_OFF (n, cpu_features, AVX512VL, 8);
+	    }
+	  CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features, Slow_BSF,
+				       disable, 8);
+	  break;
+	case 10:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, AVX_Usable,
+					  10);
+	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, FMA_Usable,
+					  10);
+	    }
+	  break;
+	case 11:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, AVX2_Usable,
+					  11);
+	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features, FMA4_Usable,
+					  11);
+	    }
+	  CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features, Prefer_ERMS,
+				       disable, 11);
+	  CHECK_GLIBC_IFUNC_ARCH_NEED_CPU_BOTH (n, cpu_features,
+						Slow_SSE4_2, SSE4_2,
+						disable, 11);
+	  break;
+	case 14:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features,
+					  AVX512F_Usable, 14);
+	    }
+	  break;
+	case 15:
+	  if (disable)
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_OFF (n, cpu_features,
+					  AVX512DQ_Usable, 15);
+	    }
+	  CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features, Fast_Rep_String,
+				       disable, 15);
+	  break;
+	case 16:
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH
+		(n, cpu_features, Prefer_No_AVX512, AVX512F_Usable,
+		 disable, 16);
+	    }
+	  break;
+	case 18:
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
+					   Fast_Copy_Backward, disable,
+					   18);
+	    }
+	  break;
+	case 19:
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
+					   Fast_Unaligned_Load, disable,
+					   19);
+	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
+					   Fast_Unaligned_Copy, disable,
+					   19);
+	    }
+	  break;
+	case 20:
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH
+		(n, cpu_features, Prefer_No_VZEROUPPER, AVX_Usable,
+		 disable, 20);
+	    }
+	  break;
+	case 21:
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
+					   Prefer_MAP_32BIT_EXEC, disable,
+					   21);
+	    }
+	  break;
+	case 23:
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_NEED_ARCH_BOTH
+		(n, cpu_features, AVX_Fast_Unaligned_Load, AVX_Usable,
+		 disable, 23);
+	    }
+	  break;
+	case 26:
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_NEED_CPU_BOTH
+		(n, cpu_features, Prefer_PMINUB_for_stringop, SSE2,
+		 disable, 26);
+	    }
+	  break;
+	case 27:
+	    {
+	      CHECK_GLIBC_IFUNC_ARCH_BOTH (n, cpu_features,
+					   Use_dl_runtime_resolve_slow,
+					   disable, 27);
+	    }
+	  break;
+	}
+      p += len + 1;
+    }
+  while (p < end);
+}
+#endif
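
To experiment with the list syntax outside of glibc, the parsing loop above can be approximated by the following standalone sketch.  It is not the patch's implementation: it swaps the length-indexed `switch` and `CHECK_GLIBC_IFUNC_*` macros for a small table lookup, and the feature bits, the `table` contents and the `apply_ifunc_list` name are all made up for illustration.  What it keeps is the behavior described in the patch: comma-separated names, a leading `-` to disable, case-sensitive matching, CPU features that can only be turned off, and unknown names being ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative feature bits; glibc uses the bit_cpu_* and bit_arch_*
   macros from cpu-features.h instead.  */
enum
{
  FEAT_AVX2 = 1u << 0,
  FEAT_ERMS = 1u << 1,
  FEAT_PREFER_ERMS = 1u << 2
};

struct feat
{
  const char *name;
  unsigned int bit;
  bool can_enable;	/* False for CPU features: disable only.  */
};

static const struct feat table[] =
{
  { "AVX2", FEAT_AVX2, false },
  { "ERMS", FEAT_ERMS, false },
  { "Prefer_ERMS", FEAT_PREFER_ERMS, true },
};

/* Apply a list such as "-AVX2,Prefer_ERMS" to FEATURES and return the
   updated mask.  */
static unsigned int
apply_ifunc_list (const char *list, unsigned int features)
{
  assert (list != NULL);
  const char *p = list;
  const char *end = p + strlen (p);

  while (p < end)
    {
      /* Find the end of the current token.  */
      const char *c = memchr (p, ',', (size_t) (end - p));
      if (c == NULL)
        c = end;

      bool disable = *p == '-';
      const char *n = disable ? p + 1 : p;
      size_t nl = (size_t) (c - n);

      for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strlen (table[i].name) == nl
            && memcmp (n, table[i].name, nl) == 0)
          {
            if (disable)
              features &= ~table[i].bit;
            else if (table[i].can_enable)
              features |= table[i].bit;
            break;
          }

      p = c + 1;
    }

  return features;
}
```

For example, under these assumptions `apply_ifunc_list ("-AVX2,Prefer_ERMS", FEAT_AVX2 | FEAT_ERMS)` clears the AVX2 bit and sets the Prefer_ERMS bit, while `"AVX2"` without the leading `-` leaves the mask unchanged.  The patch itself avoids a table walk by switching on the name length first, so that each candidate needs at most a handful of `memcmp` calls with a compile-time checked length.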