Patchwork aarch64: add HWCAP_ATOMICS to HWCAP_IMPORTANT

login
register
mail settings
Submitter Szabolcs Nagy
Date April 19, 2018, 11:51 a.m.
Message ID <1d8eb765-e147-534e-ed1e-daa8deb8d5a7@arm.com>
Download mbox | patch
Permalink /patch/26836/
State New
Headers show

Comments

Szabolcs Nagy - April 19, 2018, 11:51 a.m.
This enables searching shared libraries in atomics/ when the hardware
supports LSE atomics of armv8.1 so one can provide optimized variants
of libraries in a portable way.

LSE atomics does not affect library abi, the new instructions can
interoperate with old ones.

I'm not familiar with how this feature of the dynamic linker is used
in practice by distros or others so comments are welcome.

2018-04-19  Szabolcs Nagy  <szabolcs.nagy@arm.com>

	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add
	HWCAP_ATOMICS.
Adhemerval Zanella Netto - April 19, 2018, 2:38 p.m.
On 19/04/2018 08:51, Szabolcs Nagy wrote:
> This enables searching shared libraries in atomics/ when the hardware
> supports LSE atomics of armv8.1 so one can provide optimized variants
> of libraries in a portable way.
> 
> LSE atomics does not affect library abi, the new instructions can
> interoperate with old ones.
> 
> I'm not familiar with how this feature of the dynamic linker is used
> in practice by distros or others so comments are welcome.

Clearlinux seems to use this to provide optimized Intel libraries [1].

> 
> 2018-04-19  Szabolcs Nagy  <szabolcs.nagy@arm.com>
> 
>     * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add
>     HWCAP_ATOMICS.

I think what you want is something what x86_64 has done [2]: on cpu-features.c
the code creates a list of possible processor specific paths and sets it do
GLRO(dl_platform) (for instance on x86_64 if the underlying system is a haswell
it will add the haswell folder path).

Currently since AArch64 do not change dl_platform_init, it adds 'aarch64' from
AT_PLATFORM and 'cpuid' because of HWCAP_IMPORTANT.  IMHO neither does make
sense as search paths, I would expect at least the 'cpu_list' from aarch 
cpu-features.c (maybe by excluding the 'generic' field).

So I suggest to rework how aarch64 obtain the search path by setting the
dl_platform in cpu-features.c:

  - We can get the cpu_list if HWCAP_CPUID, so add only current cpu folder
    if it the case.

  - If HWCAP_ATOMICS is set add 'lse'.

  - Any more required?

[1] https://clearlinux.org/blogs/transparent-use-library-packages-optimized-intel-architecture
[2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=1432d38ea04ab5e96f21a38;hp=3b5f801ddb838311b5b05c218caac3bdb00d7c95
Steve Ellcey - April 19, 2018, 4:08 p.m.
On Thu, 2018-04-19 at 12:51 +0100, Szabolcs Nagy wrote:
> This enables searching shared libraries in atomics/ when the hardware
> supports LSE atomics of armv8.1 so one can provide optimized variants
> of libraries in a portable way.
> 
> LSE atomics does not affect library abi, the new instructions can
> interoperate with old ones.
> 
> I'm not familiar with how this feature of the dynamic linker is used
> in practice by distros or others so comments are welcome.
> 
> 2018-04-19  Szabolcs Nagy  <szabolcs.nagy@arm.com>
> 
> 	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h
> (HWCAP_IMPORTANT): Add
> 	HWCAP_ATOMICS.

I don't know if this is relavent or not but I checked in changes a few
months ago to use IFUNC in libatomic so that it could use LSE (or not)
depending on the aarch64 hardware it is running on.  So if a user is
just using libatomic calls then they don't need a separate library or a
sperate search path.  If they want their own libraries with versions
that do or do not use LSE directly (and don't want to use IFUNCs) then
this could still be needed/desired.

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00187.html
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00434.html

Steve Ellcey
sellcey@cavium.com
Szabolcs Nagy - April 19, 2018, 4:58 p.m.
On 19/04/18 17:08, Steve Ellcey wrote:
> On Thu, 2018-04-19 at 12:51 +0100, Szabolcs Nagy wrote:
>> This enables searching shared libraries in atomics/ when the hardware
>> supports LSE atomics of armv8.1 so one can provide optimized variants
>> of libraries in a portable way.
>>
>> LSE atomics does not affect library abi, the new instructions can
>> interoperate with old ones.
>>
>> I'm not familiar with how this feature of the dynamic linker is used
>> in practice by distros or others so comments are welcome.
>>
>> 2018-04-19  Szabolcs Nagy  <szabolcs.nagy@arm.com>
>>
>> 	* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h
>> (HWCAP_IMPORTANT): Add
>> 	HWCAP_ATOMICS.
> 
> I don't know if this is relavent or not but I checked in changes a few
> months ago to use IFUNC in libatomic so that it could use LSE (or not)
> depending on the aarch64 hardware it is running on.  So if a user is
> just using libatomic calls then they don't need a separate library or a
> sperate search path.  If they want their own libraries with versions
> that do or do not use LSE directly (and don't want to use IFUNCs) then
> this could still be needed/desired.
> 

yeah, i mainly want to enable libpthread/libc with lse atomics,
it's not practical to ifunc all pthread primitives and malloc
in glibc, but the armv8.1 atomic instructions may give significant
benefit so we want a way to use those in distros.

> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00187.html
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00434.html
> 
> Steve Ellcey
> sellcey@cavium.com
>
Szabolcs Nagy - April 19, 2018, 5:06 p.m.
On 19/04/18 15:38, Adhemerval Zanella wrote:
> On 19/04/2018 08:51, Szabolcs Nagy wrote:
>> This enables searching shared libraries in atomics/ when the hardware
>> supports LSE atomics of armv8.1 so one can provide optimized variants
>> of libraries in a portable way.
>>
>> LSE atomics does not affect library abi, the new instructions can
>> interoperate with old ones.
>>
>> I'm not familiar with how this feature of the dynamic linker is used
>> in practice by distros or others so comments are welcome.
> 
> Clearlinux seems to use this to provide optimized Intel libraries [1].
> 

interesting thanks.

>> 2018-04-19  Szabolcs Nagy  <szabolcs.nagy@arm.com>
>>
>>      * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add
>>      HWCAP_ATOMICS.
> 
> I think what you want is something what x86_64 has done [2]: on cpu-features.c
> the code creates a list of possible processor specific paths and sets it do
> GLRO(dl_platform) (for instance on x86_64 if the underlying system is a haswell
> it will add the haswell folder path).
> 
> Currently since AArch64 do not change dl_platform_init, it adds 'aarch64' from
> AT_PLATFORM and 'cpuid' because of HWCAP_IMPORTANT.  IMHO neither does make
> sense as search paths, I would expect at least the 'cpu_list' from aarch
> cpu-features.c (maybe by excluding the 'generic' field).
> 

i don't know the reasons behind 'aarch64' and 'tls' search paths
and i have no particular attachment to the HWCAP_IMPORTANT mechanism.

> So I suggest to rework how aarch64 obtain the search path by setting the
> dl_platform in cpu-features.c:
> 
>    - We can get the cpu_list if HWCAP_CPUID, so add only current cpu folder
>      if it the case.
> 
>    - If HWCAP_ATOMICS is set add 'lse'.
> 

if these paths are for optimization only then i guess the list
can change between libc releases without causing issues other
than performance regressions.

in that case i'm in favor of removing unnecessary search paths.

atomics i think is a useful variant, i'll think about the cpuid
based search paths, i don't want too many variants since nobody
will prepare/test binaries for all uarch variants, but i do like
the ability to have alternative optimized libs.

>    - Any more required?
> 
> [1] https://clearlinux.org/blogs/transparent-use-library-packages-optimized-intel-architecture
> [2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=1432d38ea04ab5e96f21a38;hp=3b5f801ddb838311b5b05c218caac3bdb00d7c95
>
Adhemerval Zanella Netto - April 19, 2018, 7:25 p.m.
On 19/04/2018 14:06, Szabolcs Nagy wrote:
> On 19/04/18 15:38, Adhemerval Zanella wrote:
>> On 19/04/2018 08:51, Szabolcs Nagy wrote:
>>> This enables searching shared libraries in atomics/ when the hardware
>>> supports LSE atomics of armv8.1 so one can provide optimized variants
>>> of libraries in a portable way.
>>>
>>> LSE atomics does not affect library abi, the new instructions can
>>> interoperate with old ones.
>>>
>>> I'm not familiar with how this feature of the dynamic linker is used
>>> in practice by distros or others so comments are welcome.
>>
>> Clearlinux seems to use this to provide optimized Intel libraries [1].
>>
> 
> interesting thanks.
> 
>>> 2018-04-19  Szabolcs Nagy  <szabolcs.nagy@arm.com>
>>>
>>>      * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add
>>>      HWCAP_ATOMICS.
>>
>> I think what you want is something what x86_64 has done [2]: on cpu-features.c
>> the code creates a list of possible processor specific paths and sets it do
>> GLRO(dl_platform) (for instance on x86_64 if the underlying system is a haswell
>> it will add the haswell folder path).
>>
>> Currently since AArch64 do not change dl_platform_init, it adds 'aarch64' from
>> AT_PLATFORM and 'cpuid' because of HWCAP_IMPORTANT.  IMHO neither does make
>> sense as search paths, I would expect at least the 'cpu_list' from aarch
>> cpu-features.c (maybe by excluding the 'generic' field).
>>
> 
> i don't know the reasons behind 'aarch64' and 'tls' search paths
> and i have no particular attachment to the HWCAP_IMPORTANT mechanism.

'aarch64' came from default code at elf/dl-sysdep.c if the platform does
not override the dl_platform (and the code to set AT_PLATFORM on dl_platform
is from commit 0a54e401).  My guess is to provide a direct way to difference
ABI folders for bi-arch system (x86_64 and i686 for instance).

I think for aarch64 there is no direct gain in adding 'aarch64' in search
path.

> 
>> So I suggest to rework how aarch64 obtain the search path by setting the
>> dl_platform in cpu-features.c:
>>
>>    - We can get the cpu_list if HWCAP_CPUID, so add only current cpu folder
>>      if it the case.
>>
>>    - If HWCAP_ATOMICS is set add 'lse'.
>>
> 
> if these paths are for optimization only then i guess the list
> can change between libc releases without causing issues other
> than performance regressions.
> 
> in that case i'm in favor of removing unnecessary search paths.
> 
> atomics i think is a useful variant, i'll think about the cpuid
> based search paths, i don't want too many variants since nobody
> will prepare/test binaries for all uarch variants, but i do like
> the ability to have alternative optimized libs.

I think adding just 'lse' (or other meaningful name) should be suffice
for now.  If cavium/qualcomm/etc desire, they can propose adding more
search paths based on their requirements.

> 
>>    - Any more required?
>>
>> [1] https://clearlinux.org/blogs/transparent-use-library-packages-optimized-intel-architecture
>> [2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=1432d38ea04ab5e96f21a38;hp=3b5f801ddb838311b5b05c218caac3bdb00d7c95
>>
>
Carlos O'Donell - April 30, 2018, 3:27 p.m.
On 04/19/2018 01:06 PM, Szabolcs Nagy wrote:
> if these paths are for optimization only then i guess the list
> can change between libc releases without causing issues other
> than performance regressions.

There is a "full lifecycle" cost to turning on multilibs.

In RPM transactions the new libraries are installed and then the old
libraries are removed.

This means that you can have an old multilib'd library present in the
middle of the install, and if you are deprecating the multilib, the
remaining library will be the old one, it may be loaded, and may not
work correctly.

Consider this case:

* Add lse/ search dir if 8.1 LSE.
* Deploy multilibs.
* Deprecate multilib because performance is good enough by default and
  you want to stop paying for the build/test/qe costs of two builds.
* Remove multilibs.
 
At this point all multilib'd packages must deploy a "<package>_post_upgrade"
application which erases all known multilib'd files to avoid loading a
mixed lse/ library with a newer set of libraries. We do this in
glibc via glibc_post_upgrade.c in Fedora [1].

One alternative is to deprecate the lse/ search dir immediately in ld.so,
which is not always desirable because some applications may still be
relying on it to load/operate correctly.

So think carefully before turn on multilibs. The additional verification
and deployment costs should not be ignored.

As a reference, presently in Fedora for POWER we have 4 x ppc64be multilibs,
and only 1 x ppc64le multilib. This is relatively under control because the
ISAs are nested well and Fedora has all POWER8 builders (the latest multilib).
Carlos O'Donell - April 30, 2018, 3:33 p.m.
On 04/19/2018 07:51 AM, Szabolcs Nagy wrote:
> This enables searching shared libraries in atomics/ when the hardware
> supports LSE atomics of armv8.1 so one can provide optimized variants
> of libraries in a portable way.
> 
> LSE atomics does not affect library abi, the new instructions can
> interoperate with old ones.
> 
> I'm not familiar with how this feature of the dynamic linker is used
> in practice by distros or others so comments are welcome.
> 
> 2018-04-19  Szabolcs Nagy  <szabolcs.nagy@arm.com>
> 
>     * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add
>     HWCAP_ATOMICS.

This is your choice to enable, but you must convince downstream
distributions that it makes sense.

Adhemerval's technical comments are on point regarding the patch and
I'd follow his advice, though as you say one dir for each uarch is probably
not the right fit for AArch64.

Really, this is the "age old" question of building a multilib variant of
an architecture and compiling for a specific ISA or ISA extension.

If you turn this on, beware of the additional costs and complications, as
I commented downthread:
https://www.sourceware.org/ml/libc-alpha/2018-04/msg00624.html

For distributions the key technical metric for deciding to multilib
is performance.

What performance improvements do you see when you inline the 8.1 LSE
atomics? Across one application? Across the system? Across different
workloads?

Does that performance translate into real-world gains?

The cost to a distro is not insubstantial, it is additional errata/QE
(bodhi), requirements for more ARM hardware (8.1 LSE) to test upon,
build routing requirements to route all builds to the new hardware
for automated testing (koji), additional boxes for CI exposure
(koeshi), etc. etc.

Each distro must balance the value this brings against the cost to
the users and customers.

Your job is going to be to convince them that the performance is so
good that it outweighs the costs.

I'm excited to be convinced :-)

Patch

diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h
index 6887713149..4530cc2159 100644
--- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h
+++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h
@@ -27,9 +27,9 @@ 
 /* We cannot provide a general printing function.  */
 #define _dl_procinfo(type, word) -1
 
-/* HWCAP_CPUID should be available by default to influence IFUNC as well as
-   library search.  */
-#define HWCAP_IMPORTANT HWCAP_CPUID
+/* Default hwcap_mask setting, affects the library search path and cpu_features
+   used by glibc internal IFUNCs when the selected hwcaps are available.  */
+#define HWCAP_IMPORTANT (HWCAP_CPUID | HWCAP_ATOMICS)
 
 static inline const char *
 __attribute__ ((unused))