aarch64: Make elf_machine_{load_address, dynamic} robust [BZ #28203]

Message ID 20210806184406.3192584-1-maskray@google.com
State Superseded
Delegated to: Szabolcs Nagy
Headers
Series aarch64: Make elf_machine_{load_address, dynamic} robust [BZ #28203] |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit fail Patch caused testsuite regressions

Commit Message

Fangrui Song Aug. 6, 2021, 6:44 p.m. UTC
  The AArch64 ABI is largely platform agnostic and does not specify
_GLOBAL_OFFSET_TABLE_[0] ([1]). glibc ld.so turns out to be probably the
only user of _GLOBAL_OFFSET_TABLE_[0] and GNU ld defines the value
to the link-time address _DYNAMIC. [2]

In 2012, __ehdr_start was implemented in GNU ld and gold.
Using adrp+addr to access __ehdr_start/_DYNAMIC gives us a robust way to
get the load address and the link-time address of _DYNAMIC.

With https://sourceware.org/pipermail/libc-alpha/2021-August/129864.html,
this patch, and disabling traditional TLSGD tests (neither Clang nor
LLD's aarch64 port supports), LLD linked glibc has the same number of
`make check` failures.

With this, I think
https://github.com/ARM-software/abi-aa/pull/57
doesn't need to specify _GLOBAL_OFFSET_TABLE_[0].
musl, FreeBSD, and NetBSD don't use _GLOBAL_OFFSET_TABLE_[0].

A bonus with the new asm approach: we drop reliance on compiler generating
PC-relative relocations to get the runtime address.

[1]: From a psABI maintainer, https://bugs.llvm.org/show_bug.cgi?id=49672#c2
[2]: LLD's aarch64 port does not set _GLOBAL_OFFSET_TABLE_[0] to the
link-time address _DYNAMIC.
LLD is widely used on aarch64 Android and ChromeOS devices.  Software
just works without the need for _GLOBAL_OFFSET_TABLE_[0].
---
 sysdeps/aarch64/dl-machine.h | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)
  

Comments

Szabolcs Nagy Aug. 9, 2021, 4:58 p.m. UTC | #1
The 08/06/2021 11:44, Fangrui Song via Libc-alpha wrote:
> The AArch64 ABI is largely platform agnostic and does not specify
> _GLOBAL_OFFSET_TABLE_[0] ([1]). glibc ld.so turns out to be probably the
> only user of _GLOBAL_OFFSET_TABLE_[0] and GNU ld defines the value
> to the link-time address _DYNAMIC. [2]
> 
> In 2012, __ehdr_start was implemented in GNU ld and gold.
> Using adrp+addr to access __ehdr_start/_DYNAMIC gives us a robust way to
> get the load address and the link-time address of _DYNAMIC.

typo: adrp+add

the binutils version is 2.23, i'd add that too to
the commit message.

this change only affects ld.so and static PIE and
in both cases we can rely on newer binutils.
(e.g. in libc_nonshared.a or crt1.o we can't)

> 
> With https://sourceware.org/pipermail/libc-alpha/2021-August/129864.html,
> this patch, and disabling traditional TLSGD tests (neither Clang nor
> LLD's aarch64 port supports), LLD linked glibc has the same number of
> `make check` failures.
> 
> With this, I think
> https://github.com/ARM-software/abi-aa/pull/57
> doesn't need to specify _GLOBAL_OFFSET_TABLE_[0].
> musl, FreeBSD, and NetBSD don't use _GLOBAL_OFFSET_TABLE_[0].

i don't know who might be using GOT[0], but even if
it's only glibc, we want future binutils to be able
to build a working older glibc for a while so i don't
think we can repurpose GOT[0].

> 
> A bonus with the new asm approach: we drop reliance on compiler generating
> PC-relative relocations to get the runtime address.

that's true but it's nicer to maintain c code:

e.g. we may build ld.so with -mcmodel=tiny and
then "adr" is enough instead of "adrp+add".

i think we already rely on hidden symbol access
not to generate dynamic relocs in the early ld.so
code.

csu/libc-start.c already uses __ehdr_start from
c code.

> 
> [1]: From a psABI maintainer, https://bugs.llvm.org/show_bug.cgi?id=49672#c2
> [2]: LLD's aarch64 port does not set _GLOBAL_OFFSET_TABLE_[0] to the
> link-time address _DYNAMIC.
> LLD is widely used on aarch64 Android and ChromeOS devices.  Software
> just works without the need for _GLOBAL_OFFSET_TABLE_[0].

the patch should be ok with c instead of asm.

thanks.

> ---
>  sysdeps/aarch64/dl-machine.h | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
> index d29d827ab3..bf251d2972 100644
> --- a/sysdeps/aarch64/dl-machine.h
> +++ b/sysdeps/aarch64/dl-machine.h
> @@ -37,28 +37,26 @@ elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
>    return ehdr->e_machine == EM_AARCH64;
>  }
>  
> -/* Return the link-time address of _DYNAMIC.  Conveniently, this is the
> -   first element of the GOT. */
> -static inline ElfW(Addr) __attribute__ ((unused))
> -elf_machine_dynamic (void)
> -{
> -  extern const ElfW(Addr) _GLOBAL_OFFSET_TABLE_[] attribute_hidden;
> -  return _GLOBAL_OFFSET_TABLE_[0];
> -}
> -
>  /* Return the run-time load address of the shared object.  */
>  
>  static inline ElfW(Addr) __attribute__ ((unused))
>  elf_machine_load_address (void)
>  {
> -  /* To figure out the load address we use the definition that for any symbol:
> -     dynamic_addr(symbol) = static_addr(symbol) + load_addr
> +  ElfW(Addr) addr;
> +  asm ("adrp %0, __ehdr_start\n\t"
> +       "add %0, %0, :lo12:__ehdr_start" : "=r"(addr));
> +  return addr;
> +}
>  
> -    _DYNAMIC sysmbol is used here as its link-time address stored in
> -    the special unrelocated first GOT entry.  */
> +/* Return the link-time address of _DYNAMIC.  */
>  
> -    extern ElfW(Dyn) _DYNAMIC[] attribute_hidden;
> -    return (ElfW(Addr)) &_DYNAMIC - elf_machine_dynamic ();
> +static inline ElfW(Addr) __attribute__ ((unused))
> +elf_machine_dynamic (void)
> +{
> +  ElfW(Addr) addr;
> +  asm ("adrp %0, _DYNAMIC\n\t"
> +       "add %0, %0, :lo12:_DYNAMIC" : "=r"(addr));
> +  return addr - elf_machine_load_address ();
>  }
>  
>  /* Set up the loaded object described by L so its unrelocated PLT
> -- 
> 2.32.0.605.g8dce9f2422-goog
>
  
Fangrui Song Aug. 9, 2021, 5:40 p.m. UTC | #2
On 2021-08-09, Szabolcs Nagy wrote:
>The 08/06/2021 11:44, Fangrui Song via Libc-alpha wrote:
>> The AArch64 ABI is largely platform agnostic and does not specify
>> _GLOBAL_OFFSET_TABLE_[0] ([1]). glibc ld.so turns out to be probably the
>> only user of _GLOBAL_OFFSET_TABLE_[0] and GNU ld defines the value
>> to the link-time address _DYNAMIC. [2]
>>
>> In 2012, __ehdr_start was implemented in GNU ld and gold.
>> Using adrp+addr to access __ehdr_start/_DYNAMIC gives us a robust way to
>> get the load address and the link-time address of _DYNAMIC.
>
>typo: adrp+add
>
>the binutils version is 2.23, i'd add that too to
>the commit message.

Done.

>this change only affects ld.so and static PIE and
>in both cases we can rely on newer binutils.
>(e.g. in libc_nonshared.a or crt1.o we can't)

Thanks for additional context.

>>
>> With https://sourceware.org/pipermail/libc-alpha/2021-August/129864.html,
>> this patch, and disabling traditional TLSGD tests (neither Clang nor
>> LLD's aarch64 port supports), LLD linked glibc has the same number of
>> `make check` failures.
>>
>> With this, I think
>> https://github.com/ARM-software/abi-aa/pull/57
>> doesn't need to specify _GLOBAL_OFFSET_TABLE_[0].
>> musl, FreeBSD, and NetBSD don't use _GLOBAL_OFFSET_TABLE_[0].
>
>i don't know who might be using GOT[0], but even if
>it's only glibc, we want future binutils to be able
>to build a working older glibc for a while so i don't
>think we can repurpose GOT[0].

GNU ld and gold don't need to do anything.
LLD's aarch64 port never reserves space for .got .

If the ABI wants to say something, it can be non-normative text that GNU
ld uses .got[0].
.got.plt[0~2] are reserved by all of GNU ld, gold, and LLD.

>>
>> A bonus with the new asm approach: we drop reliance on compiler generating
>> PC-relative relocations to get the runtime address.
>
>that's true but it's nicer to maintain c code:
>
>e.g. we may build ld.so with -mcmodel=tiny and
>then "adr" is enough instead of "adrp+add".

Done.

>i think we already rely on hidden symbol access
>not to generate dynamic relocs in the early ld.so
>code.
>
>csu/libc-start.c already uses __ehdr_start from
>c code.

I have a separate patch 
https://sourceware.org/pipermail/libc-alpha/2021-August/129917.html
("elf: Unconditionally use __ehdr_start").

>>
>> [1]: From a psABI maintainer, https://bugs.llvm.org/show_bug.cgi?id=49672#c2
>> [2]: LLD's aarch64 port does not set _GLOBAL_OFFSET_TABLE_[0] to the
>> link-time address _DYNAMIC.
>> LLD is widely used on aarch64 Android and ChromeOS devices.  Software
>> just works without the need for _GLOBAL_OFFSET_TABLE_[0].
>
>the patch should be ok with c instead of asm.
>
>thanks.

Thanks for the comments.
v2: https://sourceware.org/pipermail/libc-alpha/2021-August/129966.html

>> ---
>>  sysdeps/aarch64/dl-machine.h | 28 +++++++++++++---------------
>>  1 file changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
>> index d29d827ab3..bf251d2972 100644
>> --- a/sysdeps/aarch64/dl-machine.h
>> +++ b/sysdeps/aarch64/dl-machine.h
>> @@ -37,28 +37,26 @@ elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
>>    return ehdr->e_machine == EM_AARCH64;
>>  }
>>
>> -/* Return the link-time address of _DYNAMIC.  Conveniently, this is the
>> -   first element of the GOT. */
>> -static inline ElfW(Addr) __attribute__ ((unused))
>> -elf_machine_dynamic (void)
>> -{
>> -  extern const ElfW(Addr) _GLOBAL_OFFSET_TABLE_[] attribute_hidden;
>> -  return _GLOBAL_OFFSET_TABLE_[0];
>> -}
>> -
>>  /* Return the run-time load address of the shared object.  */
>>
>>  static inline ElfW(Addr) __attribute__ ((unused))
>>  elf_machine_load_address (void)
>>  {
>> -  /* To figure out the load address we use the definition that for any symbol:
>> -     dynamic_addr(symbol) = static_addr(symbol) + load_addr
>> +  ElfW(Addr) addr;
>> +  asm ("adrp %0, __ehdr_start\n\t"
>> +       "add %0, %0, :lo12:__ehdr_start" : "=r"(addr));
>> +  return addr;
>> +}
>>
>> -    _DYNAMIC sysmbol is used here as its link-time address stored in
>> -    the special unrelocated first GOT entry.  */
>> +/* Return the link-time address of _DYNAMIC.  */
>>
>> -    extern ElfW(Dyn) _DYNAMIC[] attribute_hidden;
>> -    return (ElfW(Addr)) &_DYNAMIC - elf_machine_dynamic ();
>> +static inline ElfW(Addr) __attribute__ ((unused))
>> +elf_machine_dynamic (void)
>> +{
>> +  ElfW(Addr) addr;
>> +  asm ("adrp %0, _DYNAMIC\n\t"
>> +       "add %0, %0, :lo12:_DYNAMIC" : "=r"(addr));
>> +  return addr - elf_machine_load_address ();
>>  }
>>
>>  /* Set up the loaded object described by L so its unrelocated PLT
>> --
>> 2.32.0.605.g8dce9f2422-goog
>>
  

Patch

diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index d29d827ab3..bf251d2972 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -37,28 +37,26 @@  elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
   return ehdr->e_machine == EM_AARCH64;
 }
 
-/* Return the link-time address of _DYNAMIC.  Conveniently, this is the
-   first element of the GOT. */
-static inline ElfW(Addr) __attribute__ ((unused))
-elf_machine_dynamic (void)
-{
-  extern const ElfW(Addr) _GLOBAL_OFFSET_TABLE_[] attribute_hidden;
-  return _GLOBAL_OFFSET_TABLE_[0];
-}
-
 /* Return the run-time load address of the shared object.  */
 
 static inline ElfW(Addr) __attribute__ ((unused))
 elf_machine_load_address (void)
 {
-  /* To figure out the load address we use the definition that for any symbol:
-     dynamic_addr(symbol) = static_addr(symbol) + load_addr
+  ElfW(Addr) addr;
+  asm ("adrp %0, __ehdr_start\n\t"
+       "add %0, %0, :lo12:__ehdr_start" : "=r"(addr));
+  return addr;
+}
 
-    _DYNAMIC sysmbol is used here as its link-time address stored in
-    the special unrelocated first GOT entry.  */
+/* Return the link-time address of _DYNAMIC.  */
 
-    extern ElfW(Dyn) _DYNAMIC[] attribute_hidden;
-    return (ElfW(Addr)) &_DYNAMIC - elf_machine_dynamic ();
+static inline ElfW(Addr) __attribute__ ((unused))
+elf_machine_dynamic (void)
+{
+  ElfW(Addr) addr;
+  asm ("adrp %0, _DYNAMIC\n\t"
+       "add %0, %0, :lo12:_DYNAMIC" : "=r"(addr));
+  return addr - elf_machine_load_address ();
 }
 
 /* Set up the loaded object described by L so its unrelocated PLT