Patchwork [aarch64] Fix hpwcap argument passed to ifunc resolvers

login
register
mail settings
Submitter Steve Ellcey
Date Aug. 30, 2017, 10:35 p.m.
Message ID <1504132534.3182.4.camel@cavium.com>
Download mbox | patch
Permalink /patch/22431/
State New
Headers show

Comments

Steve Ellcey - Aug. 30, 2017, 10:35 p.m.
I submitted a patch to gcc to use IFUNCs in libatomic.

https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00545.html

While addressing some of the comments to that patch I
realized I needed some changes to glibc in order to
enable the functionality correctly.  This patch has
those changes.  In elf_ifunc_invoke I changed the
argument type passed to the ifunc resolver function
from 'unsigned long int' to 'uint64_t'.  This doesn't
matter until we do ILP32 but I would like to change
it now so that I can check in the gcc patch with a 
matching type and not have to change it later.  The
other change here is to use hwcap_mask to mask out
the hwcap value passed to the resolver function.  This
means that ifunc resolvers that use the hwcap argument
to check for hardware capabilities will be affected
by LD_HWCAP_MASK in the same way that glibc resolver
functions are.

Finally, I updated HWCAP_IMPORTANT to be all the
known HWCAP_* values or'ed together so that nothing
was masked out by default.  This was set to just
HWCAP_CPUID before and if you did not set LD_HWCAP_MASK
but tried to check HWCAP_ATOMICS in a ifunc resolver
you would get 0 because the default value of hwcap_mask
would not have that bit set and it would get masked out by
the AND operation I added to elf_ifunc_invoke.  I don't
think we want any of the hardware capabilities masked out
by default, but only if the user sets LD_HWCAP_MASK.

Tested on aarch64 with a modified GCC libatomic patch
and with the glibc testsuite.

Steve Ellcey
sellcey@cavium.com


2017-08-30  Steve Ellcey  <sellcey@cavium.com>

	* sysdeps/aarch64/dl-irel.h: Include elf/dl-hwcaps.h
	for GET_HWCAP_MASK.
	(elf_ifunc_invoke): Change argument type and use
	GET_HWCAP_MASK to mask out bits in dl_hwcap.
	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT):
	Use bitwise or of all defined HWCAP values as default value.
Szabolcs Nagy - Aug. 31, 2017, 3:19 p.m.
On 30/08/17 23:35, Steve Ellcey wrote:
> I submitted a patch to gcc to use IFUNCs in libatomic.
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00545.html
> 
> While addressing some of the comments to that patch I
> realized I needed some changes to glibc in order to
> enable the functionality correctly.  This patch has
> those changes.  In elf_ifunc_invoke I changed the
> argument type passed to the ifunc resolver function
> from 'unsigned long int' to 'uint64_t'.  This doesn't
> matter until we do ILP32 but I would like to change
> it now so that I can check in the gcc patch with a 
> matching type and not have to change it later.  The
> other change here is to use hwcap_mask to mask out
> the hwcap value passed to the resolver function.  This
> means that ifunc resolvers that use the hwcap argument
> to check for hardware capabilities will be affected
> by LD_HWCAP_MASK in the same way that glibc resolver
> functions are.
> 

i'm not yet convinced that LD_HWCAP_MASK should be
applied to ifunc resolvers: glibc might be interested
in different hwcap flags than external users and the
important hwcaps mask has other effect in glibc that
i need to review.

please do the type change and the masking in separate
patches (the libatomic change should work without masking
it's just not possible to opt out using an envvar).
Steve Ellcey - Aug. 31, 2017, 4:12 p.m.
On Thu, 2017-08-31 at 16:19 +0100, Szabolcs Nagy wrote:

> i'm not yet convinced that LD_HWCAP_MASK should be
> applied to ifunc resolvers: glibc might be interested
> in different hwcap flags than external users and the
> important hwcaps mask has other effect in glibc that
> i need to review.

Personally, I like the idea that LD_HWCAP_MASK can affect ifunc
resolvers.  It gives users a single method to turn off some hardware
capability across all libraries.  I can see where finer grain control
might also be useful but I think that in general we are going to have
situations where a particular hardware capability is just not working
the way it should and so we want to turn off its use everywhere in an
executable.

> please do the type change and the masking in separate
> patches (the libatomic change should work without masking
> it's just not possible to opt out using an envvar).

OK, I have sent this off in a seperate thread.

https://sourceware.org/ml/libc-alpha/2017-08/msg01355.html

Steve Ellcey
sellcey@cavium.com
Carlos O'Donell - Aug. 31, 2017, 11:59 p.m.
On 08/31/2017 11:12 AM, Steve Ellcey wrote:
> On Thu, 2017-08-31 at 16:19 +0100, Szabolcs Nagy wrote:
>>  
>> i'm not yet convinced that LD_HWCAP_MASK should be
>> applied to ifunc resolvers: glibc might be interested
>> in different hwcap flags than external users and the
>> important hwcaps mask has other effect in glibc that
>> i need to review.
> 
> Personally, I like the idea that LD_HWCAP_MASK can affect ifunc
> resolvers.  It gives users a single method to turn off some hardware
> capability across all libraries.  I can see where finer grain control
> might also be useful but I think that in general we are going to have
> situations where a particular hardware capability is just not working
> the way it should and so we want to turn off its use everywhere in an
> executable.

I agree with Szabolcs.

The use of LD_HWCAP_MASK is a bit of mess in glibc.

Problems:

* We have one LD_HWCAP_MASK, but two HWCAPs e.g. AT_HWCAP, AT_HWCAP2.

* LD_HWCAP_MASK is used for multiple things:

  * As a filter for ld.so.cache results.

  * As a filter for multilib directory selection.

* We already have sysdeps/aarch64/dl-tunables.list to select
  glibc.tune.cpu.type.

I'd just use the tunables.
Steve Ellcey - Sept. 1, 2017, 5:22 p.m.
On Thu, 2017-08-31 at 18:59 -0500, Carlos O'Donell wrote:

> I'd just use the tunables.

OK, I think using tunables is reasonable.  I seem to be having some
trouble with it though, maybe you can tell me if I am doing something
wrong.

On a thunderx box I run a program that calls memcpy, which is an ifunc.
It then calls __memcpy_thunderx the way I think it should.

Then, in the debugger, I run:

set environment GLIBC_TUNABLES=glibc.tune.cpu=generic

I rerun my test program and instead of calling __memcpy_generic (which
it should) or __memcpy_thunderx (which it did before), it calls
__memcpy_falkor, which is totally wrong.  Am I setting the variable
incorrectly?  I don't see any problems with the IS_FALKOR macro so I
think the problem may be in tunable_is_name unless I am just messing
up how GLIBC_TUNABLES is supposed to be set.

Steve Ellcey
sellcey@cavium.com
Steve Ellcey - Sept. 1, 2017, 5:30 p.m.
I think I found my problem,  tunable_is_name returns true if it matches
and false if it does not but the use of it
in sysdeps/unix/sysv/linux/aarch64/cpu-features.c is:

if (tunable_is_name (mcpu, cpu_list[i].name) == 0)

It should just be:

if (tunable_is_name (mcpu, cpu_list[i].name))

I will send a patch for this once I have tested it.

Steve Ellcey
sellcey@cavium.com



On Fri, 2017-09-01 at 10:22 -0700, Steve Ellcey wrote:
> On Thu, 2017-08-31 at 18:59 -0500, Carlos O'Donell wrote:
> 
> > 
> > I'd just use the tunables.
> OK, I think using tunables is reasonable.  I seem to be having some
> trouble with it though, maybe you can tell me if I am doing something
> wrong.
> 
> On a thunderx box I run a program that calls memcpy, which is an
> ifunc.
> It then calls __memcpy_thunderx the way I think it should.
> 
> Then, in the debugger, I run:
> 
> set environment GLIBC_TUNABLES=glibc.tune.cpu=generic
> 
> I rerun my test program and instead of calling __memcpy_generic
> (which
> it should) or __memcpy_thunderx (which it did before), it calls
> __memcpy_falkor, which is totally wrong.  Am I setting the variable
> incorrectly?  I don't see any problems with the IS_FALKOR macro so I
> think the problem may be in tunable_is_name unless I am just messing
> up how GLIBC_TUNABLES is supposed to be set.
> 
> Steve Ellcey
> sellcey@cavium.com
Carlos O'Donell - Sept. 1, 2017, 6:55 p.m.
On 09/01/2017 12:22 PM, Steve Ellcey wrote:
> On Thu, 2017-08-31 at 18:59 -0500, Carlos O'Donell wrote:
> 
>> I'd just use the tunables.
> 
> OK, I think using tunables is reasonable.  I seem to be having some
> trouble with it though, maybe you can tell me if I am doing something
> wrong.
> 
> On a thunderx box I run a program that calls memcpy, which is an ifunc.
> It then calls __memcpy_thunderx the way I think it should.
> 
> Then, in the debugger, I run:
> 
> set environment GLIBC_TUNABLES=glibc.tune.cpu=generic
> 
> I rerun my test program and instead of calling __memcpy_generic (which
> it should) or __memcpy_thunderx (which it did before), it calls
> __memcpy_falkor, which is totally wrong.  Am I setting the variable
> incorrectly?  I don't see any problems with the IS_FALKOR macro so I
> think the problem may be in tunable_is_name unless I am just messing
> up how GLIBC_TUNABLES is supposed to be set.

That looks right, you'll have to debug this to see what's going on.
Carlos O'Donell - Sept. 1, 2017, 6:56 p.m.
On 09/01/2017 12:30 PM, Steve Ellcey wrote:
> I think I found my problem,  tunable_is_name returns true if it matches
> and false if it does not but the use of it
> in sysdeps/unix/sysv/linux/aarch64/cpu-features.c is:
> 
> if (tunable_is_name (mcpu, cpu_list[i].name) == 0)
> 
> It should just be:
> 
> if (tunable_is_name (mcpu, cpu_list[i].name))
> 
> I will send a patch for this once I have tested it.
Thanks. This code is quite new. I expect this was just an oversight of
using the new API.

Patch

diff --git a/sysdeps/aarch64/dl-irel.h b/sysdeps/aarch64/dl-irel.h
index 4a80275..3ef8497 100644
--- a/sysdeps/aarch64/dl-irel.h
+++ b/sysdeps/aarch64/dl-irel.h
@@ -24,6 +24,7 @@ 
 #include <unistd.h>
 #include <ldsodefs.h>
 #include <sysdep.h>
+#include <elf/dl-hwcaps.h>
 
 #define ELF_MACHINE_IRELA	1
 
@@ -31,7 +32,8 @@  static inline ElfW(Addr)
 __attribute ((always_inline))
 elf_ifunc_invoke (ElfW(Addr) addr)
 {
-  return ((ElfW(Addr) (*) (unsigned long int)) (addr)) (GLRO(dl_hwcap));
+  return ((ElfW(Addr) (*) (uint64_t)) (addr))
+	   (GLRO(dl_hwcap) & GET_HWCAP_MASK());
 }
 
 static inline void
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h
index 0333a18..19637f6 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h
@@ -33,9 +33,15 @@ 
 /* Offset of the last bit allocated in HWCAP.  */
 #define _DL_HWCAP_LAST 15
 
-/* HWCAP_CPUID should be available by default to influence IFUNC as well as
-   library search.  */
-#define HWCAP_IMPORTANT HWCAP_CPUID
+/* HWCAP_IMPORTANT is used to set the default value of hwcap_mask, we do not
+   want to mask out any HWCAP values because then we could not use them in
+   an IFUNC selector.  */
+#define HWCAP_IMPORTANT (HWCAP_FP | HWCAP_ASIMD | HWCAP_EVTSTRM | HWCAP_AES \
+			 | HWCAP_PMULL | HWCAP_SHA1 | HWCAP_SHA2 | HWCAP_CRC32 \
+			 | HWCAP_ATOMICS | HWCAP_FPHP | HWCAP_ASIMDHP \
+		         | HWCAP_CPUID | HWCAP_ASIMDRDM | HWCAP_JSCVT \
+			 | HWCAP_FCMA | HWCAP_LRCPC)
+
 
 static inline const char *
 __attribute__ ((unused))