Add ifunc memcpy and memmove for aarch64

Message ID 1486598532.2866.66.camel@caviumnetworks.com
State New, archived
Headers

Commit Message

Steve Ellcey Feb. 9, 2017, 12:02 a.m. UTC
  On Wed, 2017-02-08 at 11:15 +0530, Siddhesh Poyarekar wrote:

> Looks OK with a couple of nits below.

Here is a de-nitted version with the ChangeLog using 'Likewise'
instead of 'Ditto' and with a one line description at the top
of libc-start.c.

Steve Ellcey
sellcey@caviium.com



2017-02-08  Steve Ellcey  <sellcey@caviumnetworks.com>
	    Adhemerval Zanella  <adhemerval.zanella@linaro.org>

	* sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
	(DL_PLATFORM_INIT): New define.
	(dl_platform_init): New function.
	* sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/libc-start.c: Likewise.
  

Comments

Szabolcs Nagy Feb. 9, 2017, 10:50 a.m. UTC | #1
On 09/02/17 00:02, Steve Ellcey wrote:
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> +    /* Avoid an empty string which would disturb us.  */
> +    GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> +  /* init_cpu_features has been called early from __libc_start_main in
> +     static executable.  */
> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
...
> +static inline void
> +init_cpu_features (struct cpu_features *cpu_features)
> +{
> +  if (GLRO(dl_hwcap) & HWCAP_CPUID)
> +    {
> +      register uint64_t id = 0;
> +      asm volatile ("mrs %0, midr_el1" : "=r"(id));
> +      cpu_features->midr_el1 = id;

this is a trap into the kernel at every process startup

since this is called very early (dynamic linking case
above, static linking case below) i wonder if there
could be a way for the user to request midr_el1==0
unconditionally (avoiding the overhead and making
sure the most generic implementation is used)

is there something like that on other targets?

> +    }
> +  else
> +    {
> +      cpu_features->midr_el1 = 0;
> +    }
> +}
...
> +#ifdef SHARED
> +# include <csu/libc-start.c>
> +# else
> +/* The main work is done in the generic function.  */
> +# define LIBC_START_DISABLE_INLINE
> +# define LIBC_START_MAIN generic_start_main
> +# include <csu/libc-start.c>
> +# include <cpu-features.c>
> +
> +extern struct cpu_features _dl_aarch64_cpu_features;
> +
> +int
> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> +		   int argc, char **argv,
> +		   __typeof (main) init,
> +		   void (*fini) (void),
> +		   void (*rtld_fini) (void), void *stack_end)
> +{
> +  init_cpu_features (&_dl_aarch64_cpu_features);
> +  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
> +			     stack_end);
> +}
> +#endif
>
  
Siddhesh Poyarekar Feb. 9, 2017, 11:04 a.m. UTC | #2
On Thursday 09 February 2017 04:20 PM, Szabolcs Nagy wrote:
> this is a trap into the kernel at every process startup
> 
> since this is called very early (dynamic linking case
> above, static linking case below) i wonder if there
> could be a way for the user to request midr_el1==0
> unconditionally (avoiding the overhead and making
> sure the most generic implementation is used)

Well you could use tunables to avoid it, but then if a single trap is a
problem for you then the tunables infra is going to be just as expensive.

Why is a single trap at startup such a concern though?

> is there something like that on other targets?

H.J. Lu had a patch to override IFUNCs using (what will now be) a
tunable but that patch did not make progress.  I hope it will now since
I too am interested in overriding IFUNC selection using tunables.  But
then this would be an orthogonal discussion to avoiding the trap since
the goals of both are different.

Siddhesh
  
Szabolcs Nagy Feb. 9, 2017, 11:40 a.m. UTC | #3
On 09/02/17 11:04, Siddhesh Poyarekar wrote:
> On Thursday 09 February 2017 04:20 PM, Szabolcs Nagy wrote:
>> this is a trap into the kernel at every process startup
>>
>> since this is called very early (dynamic linking case
>> above, static linking case below) i wonder if there
>> could be a way for the user to request midr_el1==0
>> unconditionally (avoiding the overhead and making
>> sure the most generic implementation is used)
> 
> Well you could use tunables to avoid it, but then if a single trap is a
> problem for you then the tunables infra is going to be just as expensive.
> 
> Why is a single trap at startup such a concern though?

ok, it is probably not worth worrying about
(should be around 0.1% of the minimal startup time now)

but doing it just to control ifunc selection is still
useful (at least for development)

(if eventually there will be widespread use of ifunc
based function multi-versioning then the trap will
not be per process startup but per module load which
is a bit more relevant, but also not something we can
control from the libc)

>> is there something like that on other targets?
> 
> H.J. Lu had a patch to override IFUNCs using (what will now be) a
> tunable but that patch did not make progress.  I hope it will now since
> I too am interested in overriding IFUNC selection using tunables.  But
> then this would be an orthogonal discussion to avoiding the trap since
> the goals of both are different.

i see.
  
Siddhesh Poyarekar Feb. 9, 2017, 11:53 a.m. UTC | #4
On Thursday 09 February 2017 05:10 PM, Szabolcs Nagy wrote:
> ok, it is probably not worth worrying about
> (should be around 0.1% of the minimal startup time now)
> 
> but doing it just to control ifunc selection is still
> useful (at least for development)

A simple tunable that disables CPU selection would be nice:

glibc.tune.cpu = default|thunderx|a57

However then you would have to think harder about positioning the cpu
structure initialization in static binaries to have them be controlled
by tunables and not just blindly put them right at the top of
__libc_start as the easy way out.  It is something that should be done
anyway.

However, couldn't we do that as an add-on to this patch?  I'd really
like this framework to go in early and be exercised a bit because I am
interested in pushing (in the coming weeks hopefully) some optimal
string routines for aarch64.

> (if eventually there will be widespread use of ifunc
> based function multi-versioning then the trap will
> not be per process startup but per module load which
> is a bit more relevant, but also not something we can
> control from the libc)

Right we cannot control that.

Siddhesh
  
Andrew Pinski Feb. 9, 2017, 3:54 p.m. UTC | #5
On Thu, Feb 9, 2017 at 2:50 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 09/02/17 00:02, Steve Ellcey wrote:
>> +static inline void __attribute__ ((unused))
>> +dl_platform_init (void)
>> +{
>> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
>> +    /* Avoid an empty string which would disturb us.  */
>> +    GLRO(dl_platform) = NULL;
>> +
>> +#ifdef SHARED
>> +  /* init_cpu_features has been called early from __libc_start_main in
>> +     static executable.  */
>> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
>> +#endif
>> +}
> ...
>> +static inline void
>> +init_cpu_features (struct cpu_features *cpu_features)
>> +{
>> +  if (GLRO(dl_hwcap) & HWCAP_CPUID)
>> +    {
>> +      register uint64_t id = 0;
>> +      asm volatile ("mrs %0, midr_el1" : "=r"(id));
>> +      cpu_features->midr_el1 = id;
>
> this is a trap into the kernel at every process startup
>
> since this is called very early (dynamic linking case
> above, static linking case below) i wonder if there
> could be a way for the user to request midr_el1==0
> unconditionally (avoiding the overhead and making
> sure the most generic implementation is used)

Well the easy way to do this would be use LD_HWCAP_MASK and mask off
the HWCAP_CPUID bit.

>
> is there something like that on other targets?

Some targets like PowerPC use the hwcap to say which processor they
are being run on so you use the same method as above.

Thanks,
Andrew

>
>> +    }
>> +  else
>> +    {
>> +      cpu_features->midr_el1 = 0;
>> +    }
>> +}
> ...
>> +#ifdef SHARED
>> +# include <csu/libc-start.c>
>> +# else
>> +/* The main work is done in the generic function.  */
>> +# define LIBC_START_DISABLE_INLINE
>> +# define LIBC_START_MAIN generic_start_main
>> +# include <csu/libc-start.c>
>> +# include <cpu-features.c>
>> +
>> +extern struct cpu_features _dl_aarch64_cpu_features;
>> +
>> +int
>> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>> +                int argc, char **argv,
>> +                __typeof (main) init,
>> +                void (*fini) (void),
>> +                void (*rtld_fini) (void), void *stack_end)
>> +{
>> +  init_cpu_features (&_dl_aarch64_cpu_features);
>> +  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
>> +                          stack_end);
>> +}
>> +#endif
>>
>
  
Steve Ellcey Feb. 10, 2017, 12:54 a.m. UTC | #6
On Thu, 2017-02-09 at 07:54 -0800, Andrew Pinski wrote:
> On Thu, Feb 9, 2017 at 2:50 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:> 
> > since this is called very early (dynamic linking case
> > above, static linking case below) i wonder if there
> > could be a way for the user to request midr_el1==0
> > unconditionally (avoiding the overhead and making
> > sure the most generic implementation is used)
> Well the easy way to do this would be use LD_HWCAP_MASK and mask off
> the HWCAP_CPUID bit.
> 
> > is there something like that on other targets?
> Some targets like PowerPC use the hwcap to say which processor they
> are being run on so you use the same method as above.
> 
> Thanks,
> Andrew

Do you know if LD_HWCAP_MASK actually works on PowerPC to do this?  I
see where PowerPC is reading GLRO(dl_hwcap) but I don't see them
reading GLRO(dl_hwcap_mask).  I don't think that the generic parts of
libc apply LD_HWCAP_MASK to HWCAP automatically but maybe I missed it
somewhere.

I changed init_cpu_features in my patch from:

	if (GLRO(dl_hwcap) & HWCAP_CPUID)
to
	if (GLRO(dl_hwcap) & GLRO(dl_hwcap_mask) & HWCAP_CPUID)

to see what would happen.  Initially this returned false because
we were using the default dl-procinfo.h and HWCAP_IMPORTANT (used
to initialize dl_hwcap_mask) was 0.  I made a copy of dl-procinfo.h
and set HWCAP_IMPORTANT to HPWCAP_CPUID and I got true here and the
thunderx ifuncs were run.

But if I ran things after setting LD_HWCAP_MASK to 0 it didn't seem to
have any affect and I still ran thunderx ifuncs.  I am not sure but it
seemed like this code in init_cpu_features may have been getting run
before LD_HWCAP_MASK was getting read and before GLRO(dl_hwcap_mask)
was reset from its initial value.

Steve Ellcey
  

Patch

diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..15d79a6 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -25,6 +25,7 @@ 
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-irel.h>
+#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -225,6 +226,23 @@  _dl_start_user:								\n\
 #define ELF_MACHINE_NO_REL 1
 #define ELF_MACHINE_NO_RELA 0
 
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+    /* Avoid an empty string which would disturb us.  */
+    GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+  /* init_cpu_features has been called early from __libc_start_main in
+     static executable.  */
+  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
+
 static inline ElfW(Addr)
 elf_machine_fixup_plt (struct link_map *map, lookup_t t,
 		       const ElfW(Rela) *reloc,
diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
index f277074..ba4ada3 100644
--- a/sysdeps/aarch64/ldsodefs.h
+++ b/sysdeps/aarch64/ldsodefs.h
@@ -20,6 +20,7 @@ 
 #define _AARCH64_LDSODEFS_H 1
 
 #include <elf.h>
+#include <cpu-features.h>
 
 struct La_aarch64_regs;
 struct La_aarch64_retval;
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
index e69de29..8e4b514 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
@@ -0,0 +1,38 @@ 
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <cpu-features.h>
+
+#ifndef HWCAP_CPUID
+# define HWCAP_CPUID		(1 << 11)
+#endif
+
+static inline void
+init_cpu_features (struct cpu_features *cpu_features)
+{
+  if (GLRO(dl_hwcap) & HWCAP_CPUID)
+    {
+      register uint64_t id = 0;
+      asm volatile ("mrs %0, midr_el1" : "=r"(id));
+      cpu_features->midr_el1 = id;
+    }
+  else
+    {
+      cpu_features->midr_el1 = 0;
+    }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
index e69de29..c92b650 100644
--- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
+++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
@@ -0,0 +1,49 @@ 
+/* Initialize CPU feature data.  AArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _CPU_FEATURES_AARCH64_H
+#define _CPU_FEATURES_AARCH64_H
+
+#include <stdint.h>
+
+#define MIDR_PARTNUM_SHIFT	4
+#define MIDR_PARTNUM_MASK	(0xfff << MIDR_PARTNUM_SHIFT)
+#define MIDR_PARTNUM(midr)	\
+	(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT	16
+#define MIDR_ARCHITECTURE_MASK	(0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_ARCHITECTURE(midr)	\
+	(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_VARIANT_SHIFT	20
+#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
+#define MIDR_VARIANT(midr)	\
+	(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
+#define MIDR_IMPLEMENTOR_SHIFT	24
+#define MIDR_IMPLEMENTOR_MASK	(0xff << MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_IMPLEMENTOR(midr)	\
+	(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+
+#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C'	\
+			   && MIDR_PARTNUM(midr) == 0x0a1)
+
+struct cpu_features
+{
+  uint64_t midr_el1;
+};
+
+#endif /* _CPU_FEATURES_AARCH64_H  */
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
index e69de29..438046a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
@@ -0,0 +1,60 @@ 
+/* Data for AArch64 version of processor capability information.
+   Linux version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* If anything should be added here check whether the size of each string
+   is still ok with the given array size.
+
+   All the #ifdefs in the definitions are quite irritating but
+   necessary if we want to avoid duplicating the information.  There
+   are three different modes:
+
+   - PROCINFO_DECL is defined.  This means we are only interested in
+     declarations.
+
+   - PROCINFO_DECL is not defined:
+
+     + if SHARED is defined the file is included in an array
+       initializer.  The .element = { ... } syntax is needed.
+
+     + if SHARED is not defined a normal array initialization is
+       needed.
+  */
+
+#ifndef PROCINFO_CLASS
+# define PROCINFO_CLASS
+#endif
+
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+  ._dl_aarch64_cpu_features
+# else
+PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
+# endif
+# ifndef PROCINFO_DECL
+= { }
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
+#undef PROCINFO_DECL
+#undef PROCINFO_CLASS
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
index e69de29..a5babd4 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
+++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
@@ -0,0 +1,41 @@ 
+/* Override csu/libc-start.c on AArch64.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef SHARED
+# include <csu/libc-start.c>
+# else
+/* The main work is done in the generic function.  */
+# define LIBC_START_DISABLE_INLINE
+# define LIBC_START_MAIN generic_start_main
+# include <csu/libc-start.c>
+# include <cpu-features.c>
+
+extern struct cpu_features _dl_aarch64_cpu_features;
+
+int
+__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
+		   int argc, char **argv,
+		   __typeof (main) init,
+		   void (*fini) (void),
+		   void (*rtld_fini) (void), void *stack_end)
+{
+  init_cpu_features (&_dl_aarch64_cpu_features);
+  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
+			     stack_end);
+}
+#endif