[v4] PowerPC: Influence cpu/arch hwcap features via GLIBC_TUNABLES.

Message ID 20230709154110.283700-1-bmahi496@linux.ibm.com
State Superseded
Series [v4] PowerPC: Influence cpu/arch hwcap features via GLIBC_TUNABLES.

Checks

Context                                           Check    Description
redhat-pt-bot/TryBot-apply_patch                  success  Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit                        success  Build for i686
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm      success  Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm      success  Testing passed

Commit Message

develop--- via Libc-alpha July 9, 2023, 3:41 p.m. UTC
  From: Mahesh Bodapati <mahesh.bodapati@ibm.com>

This patch enables the option to influence hwcaps used by PowerPC.
The environment variable, GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz....,
can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx
and zzz, where the feature name is case-sensitive and has to match the ones
listed in the file sysdeps/powerpc/dl-procinfo.c.

Note that the tunable only handles the features which are really used
in the IFUNC selection.  All others are ignored as the values are only
used inside glibc.
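
The enable/disable rule described above can be sketched outside glibc as follows. This is only an illustration, not the patch's code: `demo_features`, `demo_apply_hwcaps`, and the mask values are hypothetical, and `strcspn` stands in for the hand-written separator loop the patch uses. The point it shows is that a leading `-` clears a bit, while a bare name can only restore a bit the hardware reported in the first place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature table for illustration only; the real names and
   masks are the ones in the patch's hwcap_names[] and hwcap_tunables[].  */
struct demo_feature
{
  const char *name;
  unsigned long mask;
};

static const struct demo_feature demo_features[] = {
  { "altivec", 0x1UL },
  { "vsx",     0x2UL },
  { "dfp",     0x4UL },
};

/* Apply a "-xxx,yyy" style list to SUPPORTED and return the result.
   A leading '-' clears the bit; a bare name sets the bit only if it was
   present in SUPPORTED.  Unknown names and empty tokens are ignored.  */
static unsigned long
demo_apply_hwcaps (const char *list, unsigned long supported)
{
  unsigned long result = supported;
  const char *token = list;

  while (*token != '\0')
    {
      /* A token runs up to the next ',' or the end of the string.  */
      size_t token_len = strcspn (token, ",");
      const char *feature = token;
      size_t feature_len = token_len;
      bool disable = false;

      if (token_len > 0 && *token == '-')
        {
          disable = true;
          feature++;
          feature_len--;
        }

      for (size_t i = 0;
           i < sizeof demo_features / sizeof demo_features[0]; i++)
        {
          const struct demo_feature *f = &demo_features[i];
          if (strlen (f->name) == feature_len
              && memcmp (feature, f->name, feature_len) == 0)
            {
              if (disable)
                result &= ~f->mask;
              else
                result |= supported & f->mask;
              break;
            }
        }

      /* Move past the token and its separator, if any.  */
      token += token_len;
      if (*token == ',')
        token++;
    }
  return result;
}
```

With all three demo bits reported as supported, `demo_apply_hwcaps ("-altivec,dfp,-vsx", 0x7UL)` leaves only the `dfp` bit set, and asking for `vsx` on a system that never reported it changes nothing.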
---
 manual/tunables.texi                          |   5 +-
 sysdeps/powerpc/cpu-features.c                |  95 +++++++++++++++-
 sysdeps/powerpc/cpu-features.h                | 103 ++++++++++++++++++
 sysdeps/powerpc/dl-tunables.list              |   3 +
 sysdeps/powerpc/hwcapinfo.c                   |   4 +
 .../power4/multiarch/ifunc-impl-list.c        |   4 +-
 .../powerpc32/power4/multiarch/init-arch.h    |  10 +-
 sysdeps/powerpc/powerpc64/dl-machine.h        |   2 -
 .../powerpc64/multiarch/ifunc-impl-list.c     |   7 +-
 9 files changed, 221 insertions(+), 12 deletions(-)
  

Comments

Peter Bergner July 10, 2023, 4:12 p.m. UTC | #1
On 7/9/23 10:41 AM, bmahi496@linux.ibm.com wrote:
> +static const struct
> +{
> +  unsigned short offset;
> +  unsigned int mask;
> +  bool id;
> +} hwcap_tunables[] = {
> +   /* AT_HWCAP tunable masks.  */
> +   { 0,     PPC_FEATURE_HAS_4xxMAC,                 0 },
> +   { 7,     PPC_FEATURE_HAS_ALTIVEC,                0 },
> +   { 15,    PPC_FEATURE_ARCH_2_05,                  0 },
> +   { 25,    PPC_FEATURE_ARCH_2_06,                  0 },


In my reply to one of Adhemerval's notes, I showed how you can
eliminate the hard coded offset field in the struct altogether.
I'd prefer you implement it that way, since this method is fragile
and easy to make mistakes.  Especially if someone wanted to add a
new feature somewhere in the middle of this list.  Then all offsets
after that would have to be recalculated by hand again.

The offset of the next feature string is just the previous feature's
offset + the strlen of the current feature + 1 for the null char,
so you can just keep a running offset sum as you iterate through
the loop...or as in my previous reply, update a pointer into the
hwcaps_names[] array as you go.  Either works.

Peter
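
A minimal sketch of the running-offset idea from the comment above, assuming a shortened copy of the name blob. `demo_names` and `demo_find_offset` are hypothetical names used only for illustration; they are not part of the patch or of glibc. Each string's position is derived from the previous one while walking the blob, so no offset has to be stored or maintained by hand.

```c
#include <stddef.h>
#include <string.h>

/* Shortened, illustrative copy of a NUL-separated name blob.  The string
   literal's implicit terminator adds one final '\0' after the last name.  */
static const char demo_names[] =
  "4xxmac\0"
  "altivec\0"
  "arch_2_05\0"
  "arch_2_06\0";

/* Return the byte offset of NAME (of length LEN) inside demo_names, or -1
   if it is not present.  The next string starts at the current position
   plus strlen of the current string plus 1 for its NUL.  */
static long
demo_find_offset (const char *name, size_t len)
{
  const char *p = demo_names;
  /* Points at the implicit terminator that follows the last name.  */
  const char *end = demo_names + sizeof demo_names - 1;

  while (p < end)
    {
      size_t n = strlen (p);
      if (n == len && memcmp (p, name, len) == 0)
        return (long) (p - demo_names);
      p += n + 1;
    }
  return -1;
}
```

For the four names kept here, the walk yields offsets 0, 7, 15 and 25, the same values that are hard coded in the quoted table. In a loop over a mask table, the same pointer could be advanced once per entry, which keeps the two lists in step as long as they are in the same order.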
  
MAHESH BODAPATI July 10, 2023, 5:28 p.m. UTC | #2
On 10/07/23 9:42 pm, Peter Bergner wrote:
> On 7/9/23 10:41 AM, bmahi496@linux.ibm.com wrote:
>> +static const struct
>> +{
>> +  unsigned short offset;
>> +  unsigned int mask;
>> +  bool id;
>> +} hwcap_tunables[] = {
>> +   /* AT_HWCAP tunable masks.  */
>> +   { 0,     PPC_FEATURE_HAS_4xxMAC,                 0 },
>> +   { 7,     PPC_FEATURE_HAS_ALTIVEC,                0 },
>> +   { 15,    PPC_FEATURE_ARCH_2_05,                  0 },
>> +   { 25,    PPC_FEATURE_ARCH_2_06,                  0 },
>
> In my reply to one of Adhemerval's notes, I showed how you can
> eliminate the hard coded offset field in the struct altogether.
> I'd prefer you implement it that way, since this method is fragile
> and easy to make mistakes.  Especially if someone wanted to add a
> new feature somewhere in the middle of this list.  Then all offsets
> after that would have to be recalculated by hand again.

Yes, we should only add new features at the end of the list.
If we add a feature in the middle, then we have to recalculate
the offsets.


>
> The offset of the next feature string is just the previous feature's
> offset + the strlen of the current feature + 1 for the null char,
> so you can just keep a running offset sum as you iterate through
> the loop...or as in my previous reply, update a pointer into the
> hwcaps_names[] array as you go.  Either works.

Sorry, I missed it. I have made the changes accordingly and am sharing the
updated patch.

>
> Peter
>
>
>
  

Patch

diff --git a/manual/tunables.texi b/manual/tunables.texi
index 4ca0e42a11..776fd93fd9 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -513,7 +513,10 @@  On s390x, the supported HWCAP and STFLE features can be found in
 @code{sysdeps/s390/cpu-features.c}.  In addition the user can also set
 a CPU arch-level like @code{z13} instead of single HWCAP and STFLE features.
 
-This tunable is specific to i386, x86-64 and s390x.
+On powerpc, the supported HWCAP and HWCAP2 features can be found in
+@code{sysdeps/powerpc/dl-procinfo.c}.
+
+This tunable is specific to i386, x86-64, s390x and powerpc.
 @end deftp
 
 @deftp Tunable glibc.cpu.cached_memopt
diff --git a/sysdeps/powerpc/cpu-features.c b/sysdeps/powerpc/cpu-features.c
index 0ef3cf89d2..2d31fa2f45 100644
--- a/sysdeps/powerpc/cpu-features.c
+++ b/sysdeps/powerpc/cpu-features.c
@@ -19,14 +19,107 @@ 
 #include <stdint.h>
 #include <cpu-features.h>
 #include <elf/dl-tunables.h>
+#include <unistd.h>
+#include <string.h>
+#define MEMCMP_DEFAULT memcmp
+#define STRLEN_DEFAULT strlen
+
+static void
+TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp)
+{
+  /* The current IFUNC selection is always using the most recent
+     features which are available via AT_HWCAP or AT_HWCAP2.  But in
+     some scenarios it is useful to adjust this selection.
+
+     The environment variable:
+
+     GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,....
+
+     Can be used to enable HWCAP/HWCAP2 feature yyy, disable HWCAP/HWCAP2
+     feature xxx, where the feature name is case-sensitive and has to match
+     the ones mentioned in the file{sysdeps/powerpc/dl-procinfo.c}. */
+
+  /* Copy the features from dl_powerpc_cpu_features, which contains the
+     features provided by AT_HWCAP and AT_HWCAP2.  */
+  struct cpu_features *cpu_features = &GLRO(dl_powerpc_cpu_features);
+  unsigned long int tcbv_hwcap = cpu_features->hwcap;
+  unsigned long int tcbv_hwcap2 = cpu_features->hwcap2;
+  unsigned int tun_count;
+  const char *token = valp->strval;
+  tun_count = sizeof(hwcap_tunables)/sizeof(hwcap_tunables[0]);
+  do
+    {
+      const char *token_end, *feature;
+      bool disable;
+      size_t token_len, i, feature_len;
+      /* Find token separator or end of string.  */
+      for (token_end = token; *token_end != ','; token_end++)
+	if (*token_end == '\0')
+	  break;
+
+      /* Determine feature.  */
+      token_len = token_end - token;
+      if (*token == '-')
+	{
+	  disable = true;
+	  feature = token + 1;
+	  feature_len = token_len - 1;
+	}
+      else
+	{
+	  disable = false;
+	  feature = token;
+	  feature_len = token_len;
+	}
+      for (i = 0; i < tun_count; ++i)
+	{
+	  const char *hwcap_name = hwcap_names + hwcap_tunables[i].offset;
+	  /* Check the tunable name on the supported list.  */
+	  if (STRLEN_DEFAULT (hwcap_name) == feature_len
+	      && MEMCMP_DEFAULT (feature, hwcap_name, feature_len)
+	      == 0)
+	    {
+	      /* Update the hwcap and hwcap2 bits.  */
+	      if (disable)
+		{
+		  /* Id is 1 for hwcap2 tunable.  */
+		  if (hwcap_tunables[i].id)
+		    cpu_features->hwcap2 &= ~(hwcap_tunables[i].mask);
+		  else
+		    cpu_features->hwcap &= ~(hwcap_tunables[i].mask);
+		}
+	      else
+		{
+		  /* Enable the features and also checking that no unsupported
+		     features were enabled by user.  */
+		  if (hwcap_tunables[i].id)
+		    cpu_features->hwcap2 |= (tcbv_hwcap2 & hwcap_tunables[i].mask);
+		  else
+		    cpu_features->hwcap |= (tcbv_hwcap & hwcap_tunables[i].mask);
+		}
+	      break;
+	    }
+	}
+	token += token_len;
+	/* ... and skip token separator for next round.  */
+	if (*token == ',') token++;
+    }
+  while (*token != '\0');
+}
 
 static inline void
-init_cpu_features (struct cpu_features *cpu_features)
+init_cpu_features (struct cpu_features *cpu_features, uint64_t hwcaps[])
 {
+  /* Fill the cpu_features with the supported hwcaps
+     which are set by __tcb_parse_hwcap_and_convert_at_platform.  */
+  cpu_features->hwcap = hwcaps[0];
+  cpu_features->hwcap2 = hwcaps[1];
   /* Default is to use aligned memory access on optimized function unless
      tunables is enable, since for this case user can explicit disable
      unaligned optimizations.  */
   int32_t cached_memfunc = TUNABLE_GET (glibc, cpu, cached_memopt, int32_t,
 					NULL);
   cpu_features->use_cached_memopt = (cached_memfunc > 0);
+  TUNABLE_GET (glibc, cpu, hwcaps, tunable_val_t *,
+	       TUNABLE_CALLBACK (set_hwcaps));
 }
diff --git a/sysdeps/powerpc/cpu-features.h b/sysdeps/powerpc/cpu-features.h
index d316dc3d64..f3fb01eb76 100644
--- a/sysdeps/powerpc/cpu-features.h
+++ b/sysdeps/powerpc/cpu-features.h
@@ -19,10 +19,113 @@ 
 # define __CPU_FEATURES_POWERPC_H
 
 #include <stdbool.h>
+#include <sys/auxv.h>
 
 struct cpu_features
 {
   bool use_cached_memopt;
+  unsigned long int hwcap;
+  unsigned long int hwcap2;
+};
+
+static const char hwcap_names[] = {
+  "4xxmac\0"
+  "altivec\0"
+  "arch_2_05\0"
+  "arch_2_06\0"
+  "archpmu\0"
+  "booke\0"
+  "cellbe\0"
+  "dfp\0"
+  "efpdouble\0"
+  "efpsingle\0"
+  "fpu\0"
+  "ic_snoop\0"
+  "mmu\0"
+  "notb\0"
+  "pa6t\0"
+  "power4\0"
+  "power5\0"
+  "power5+\0"
+  "power6x\0"
+  "ppc32\0"
+  "ppc601\0"
+  "ppc64\0"
+  "ppcle\0"
+  "smt\0"
+  "spe\0"
+  "true_le\0"
+  "ucache\0"
+  "vsx\0"
+  "arch_2_07\0"
+  "dscr\0"
+  "ebb\0"
+  "htm\0"
+  "htm-nosc\0"
+  "htm-no-suspend\0"
+  "isel\0"
+  "tar\0"
+  "vcrypto\0"
+  "arch_3_00\0"
+  "ieee128\0"
+  "darn\0"
+  "scv\0"
+  "arch_3_1\0"
+  "mma\0"
+};
+
+static const struct
+{
+  unsigned short offset;
+  unsigned int mask;
+  bool id;
+} hwcap_tunables[] = {
+   /* AT_HWCAP tunable masks.  */
+   { 0,     PPC_FEATURE_HAS_4xxMAC,                 0 },
+   { 7,     PPC_FEATURE_HAS_ALTIVEC,                0 },
+   { 15,    PPC_FEATURE_ARCH_2_05,                  0 },
+   { 25,    PPC_FEATURE_ARCH_2_06,                  0 },
+   { 35,    PPC_FEATURE_PSERIES_PERFMON_COMPAT,     0 },
+   { 43,    PPC_FEATURE_BOOKE,                      0 },
+   { 49,    PPC_FEATURE_CELL_BE,                    0 },
+   { 56,    PPC_FEATURE_HAS_DFP,                    0 },
+   { 60,    PPC_FEATURE_HAS_EFP_DOUBLE,             0 },
+   { 70,    PPC_FEATURE_HAS_EFP_SINGLE,             0 },
+   { 80,    PPC_FEATURE_HAS_FPU,                    0 },
+   { 84,    PPC_FEATURE_ICACHE_SNOOP,               0 },
+   { 93,    PPC_FEATURE_HAS_MMU,                    0 },
+   { 97,    PPC_FEATURE_NO_TB,                      0 },
+   { 102,   PPC_FEATURE_PA6T,                       0 },
+   { 107,   PPC_FEATURE_POWER4,                     0 },
+   { 114,   PPC_FEATURE_POWER5,                     0 },
+   { 121,   PPC_FEATURE_POWER5_PLUS,                0 },
+   { 129,   PPC_FEATURE_POWER6_EXT,                 0 },
+   { 137,   PPC_FEATURE_32,                         0 },
+   { 143,   PPC_FEATURE_601_INSTR,                  0 },
+   { 150,   PPC_FEATURE_64,                         0 },
+   { 156,   PPC_FEATURE_PPC_LE,                     0 },
+   { 162,   PPC_FEATURE_SMT,                        0 },
+   { 166,   PPC_FEATURE_HAS_SPE,                    0 },
+   { 170,   PPC_FEATURE_TRUE_LE,                    0 },
+   { 178,   PPC_FEATURE_UNIFIED_CACHE,              0 },
+   { 185,   PPC_FEATURE_HAS_VSX,                    0 },
+
+   /* AT_HWCAP2 tunable masks.  */
+   { 189,   PPC_FEATURE2_ARCH_2_07,                 1 },
+   { 199,   PPC_FEATURE2_HAS_DSCR,                  1 },
+   { 204,   PPC_FEATURE2_HAS_EBB,                   1 },
+   { 208,   PPC_FEATURE2_HAS_HTM,                   1 },
+   { 212,   PPC_FEATURE2_HTM_NOSC,                  1 },
+   { 221,   PPC_FEATURE2_HTM_NO_SUSPEND,            1 },
+   { 236,   PPC_FEATURE2_HAS_ISEL,                  1 },
+   { 241,   PPC_FEATURE2_HAS_TAR,                   1 },
+   { 245,   PPC_FEATURE2_HAS_VEC_CRYPTO,            1 },
+   { 253,   PPC_FEATURE2_ARCH_3_00,                 1 },
+   { 263,   PPC_FEATURE2_HAS_IEEE128,               1 },
+   { 271,   PPC_FEATURE2_DARN,                      1 },
+   { 276,   PPC_FEATURE2_SCV,                       1 },
+   { 280,   PPC_FEATURE2_ARCH_3_1,                  1 },
+   { 289,   PPC_FEATURE2_MMA,                       1 },
 };
 
 #endif /* __CPU_FEATURES_H  */
diff --git a/sysdeps/powerpc/dl-tunables.list b/sysdeps/powerpc/dl-tunables.list
index 87d6235c75..807b7f8013 100644
--- a/sysdeps/powerpc/dl-tunables.list
+++ b/sysdeps/powerpc/dl-tunables.list
@@ -24,5 +24,8 @@  glibc {
       maxval: 1
       default: 0
     }
+    hwcaps {
+      type: STRING
+    }
   }
 }
diff --git a/sysdeps/powerpc/hwcapinfo.c b/sysdeps/powerpc/hwcapinfo.c
index e26e64d99e..f2c473c556 100644
--- a/sysdeps/powerpc/hwcapinfo.c
+++ b/sysdeps/powerpc/hwcapinfo.c
@@ -19,6 +19,7 @@ 
 #include <unistd.h>
 #include <shlib-compat.h>
 #include <dl-procinfo.h>
+#include <cpu-features.c>
 
 tcbhead_t __tcb __attribute__ ((visibility ("hidden")));
 
@@ -63,6 +64,9 @@  __tcb_parse_hwcap_and_convert_at_platform (void)
   else if (h1 & PPC_FEATURE_POWER5)
     h1 |= PPC_FEATURE_POWER4;
 
+  uint64_t array_hwcaps[] = { h1, h2 };
+  init_cpu_features (&GLRO(dl_powerpc_cpu_features), array_hwcaps);
+
   /* Consolidate both HWCAP and HWCAP2 into a single doubleword so that
      we can read both in a single load later.  */
   __tcb.hwcap = (h1 << 32) | (h2 & 0xffffffff);
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c
index b4f80539e7..986c37d71e 100644
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c
@@ -21,6 +21,7 @@ 
 #include <wchar.h>
 #include <ldsodefs.h>
 #include <ifunc-impl-list.h>
+#include <cpu-features.h>
 
 size_t
 __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
@@ -28,7 +29,8 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 {
   size_t i = max;
 
-  unsigned long int hwcap = GLRO(dl_hwcap);
+  const struct cpu_features *features = &GLRO(dl_powerpc_cpu_features);
+  unsigned long int hwcap = features->hwcap;
   /* hwcap contains only the latest supported ISA, the code checks which is
      and fills the previous supported ones.  */
   if (hwcap & PPC_FEATURE_ARCH_2_06)
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
index 3dd00e02ee..a0bbd12012 100644
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
+++ b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
@@ -16,6 +16,7 @@ 
    <https://www.gnu.org/licenses/>.  */
 
 #include <ldsodefs.h>
+#include <cpu-features.h>
 
 /* The code checks if _rtld_global_ro was realocated before trying to access
    the dl_hwcap field. The assembly is to make the compiler not optimize the
@@ -32,11 +33,12 @@ 
 # define __GLRO(value)  GLRO(value)
 #endif
 
-/* dl_hwcap contains only the latest supported ISA, the macro checks which is
-   and fills the previous ones.  */
+/* Get the hardware information post the tunables set , the macro checks
+   it and fills the previous ones.  */
 #define INIT_ARCH() \
-  unsigned long int hwcap = __GLRO(dl_hwcap); 			\
-  unsigned long int __attribute__((unused)) hwcap2 = __GLRO(dl_hwcap2); \
+  const struct cpu_features *features = &GLRO(dl_powerpc_cpu_features);	\
+  unsigned long int hwcap = features->hwcap;				\
+  unsigned long int __attribute__((unused)) hwcap2 = features->hwcap2; \
   bool __attribute__((unused)) use_cached_memopt =		\
     __GLRO(dl_powerpc_cpu_features.use_cached_memopt);		\
   if (hwcap & PPC_FEATURE_ARCH_2_06)				\
diff --git a/sysdeps/powerpc/powerpc64/dl-machine.h b/sysdeps/powerpc/powerpc64/dl-machine.h
index 9b8943bc91..449208e86f 100644
--- a/sysdeps/powerpc/powerpc64/dl-machine.h
+++ b/sysdeps/powerpc/powerpc64/dl-machine.h
@@ -27,7 +27,6 @@ 
 #include <dl-tls.h>
 #include <sysdep.h>
 #include <hwcapinfo.h>
-#include <cpu-features.c>
 #include <dl-static-tls.h>
 #include <dl-funcdesc.h>
 #include <dl-machine-rel.h>
@@ -297,7 +296,6 @@  static inline void __attribute__ ((unused))
 dl_platform_init (void)
 {
   __tcb_parse_hwcap_and_convert_at_platform ();
-  init_cpu_features (&GLRO(dl_powerpc_cpu_features));
 }
 #endif
 
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index ebe9434052..fc26dd0e17 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -17,6 +17,7 @@ 
    <https://www.gnu.org/licenses/>.  */
 
 #include <assert.h>
+#include <cpu-features.h>
 #include <string.h>
 #include <wchar.h>
 #include <ldsodefs.h>
@@ -27,9 +28,9 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			size_t max)
 {
   size_t i = max;
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
+  const struct cpu_features *features = &GLRO(dl_powerpc_cpu_features);
+  unsigned long int hwcap = features->hwcap;
+  unsigned long int hwcap2 = features->hwcap2;
 #ifdef SHARED
   int cacheline_size = GLRO(dl_cache_line_size);
 #endif