aarch64: Check PIC instead of SHARED in start.S

Message ID 20170929213203.GG2482@gmail.com
State New, archived

Commit Message

H.J. Lu Sept. 29, 2017, 9:32 p.m. UTC
  Since start.o may be compiled as PIC, we should check PIC instead of
SHARED.

OK for master?


	* sysdeps/aarch64/start.S (_start): Check PIC instead of SHARED.
---
 sysdeps/aarch64/start.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Szabolcs Nagy Oct. 2, 2017, 10:08 a.m. UTC | #1
On 29/09/17 22:32, H.J. Lu wrote:
> Since start.o may be compiled as PIC, we should check PIC instead of
> SHARED.
> 
> OK for master?
> 

i believe that the compile/link tests worked..

..but i still don't understand how the GOT entries
of the startup code get initialized in PIE executable
at runtime.

> 
> 	* sysdeps/aarch64/start.S (_start): Check PIC instead of SHARED.
> ---
>  sysdeps/aarch64/start.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/sysdeps/aarch64/start.S b/sysdeps/aarch64/start.S
> index c20433ad73..7a946506f2 100644
> --- a/sysdeps/aarch64/start.S
> +++ b/sysdeps/aarch64/start.S
> @@ -60,7 +60,7 @@ _start:
>  	/* Setup stack limit in argument register */
>  	mov	x6, sp
>  
> -#ifdef SHARED
> +#ifdef PIC
>          adrp    x0, :got:main
>  	ldr     PTR_REG (0), [x0, #:got_lo12:main]
>  
>
  
H.J. Lu Oct. 2, 2017, 11:20 a.m. UTC | #2
On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 29/09/17 22:32, H.J. Lu wrote:
>> Since start.o may be compiled as PIC, we should check PIC instead of
>> SHARED.
>>
>> OK for master?
>>
>
> i believe that the compile/link tests worked..

Does static PIE of hjl/pie/static branch run on arm and aarch64?

> ..but i still don't understand how the GOT entries
> of the startup code get initialized in PIE executable
> at runtime.

You just avoid GOT entries in start.S for static PIE by using
PC relative relocations.
  
Szabolcs Nagy Oct. 3, 2017, 10:39 a.m. UTC | #3
On 02/10/17 12:20, H.J. Lu wrote:
> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>> On 29/09/17 22:32, H.J. Lu wrote:
>>> Since start.o may be compiled as PIC, we should check PIC instead of
>>> SHARED.
>>>
>>> OK for master?
>>>
>>
>> i believe that the compile/link tests worked..
> 
> Does static PIE of hjl/pie/static branch run on arm and aarch64?
> 

no, if i build with --enable-static-pie the install step
fails when the static linked sln runs.

there are relative relocs against the func ptrs that are
loaded from GOT in the startup code, but execution fails
even before those are used because there are R*_JUMP_SLOT
and R*_GLOB_DAT relocs which are not processed correctly.

in particular in
  if (__pthread_initialize_minimal != NULL)
    __pthread_initialize_minimal ();
the symbol value loaded from GOT is non-NULL even though
there is no pthread linked in, that is probably a linker bug.

>> ..but i still don't understand how the GOT entries
>> of the startup code get initialized in PIE executable
>> at runtime.
> 
> You just avoid GOT entries in start.S for static PIE by using
> PC relative relocations.
> 

i don't see how you can do that when you have to pass
absolute addresses as arguments to __libc_start_main
and the base address is not yet computed.
  
H.J. Lu Oct. 3, 2017, 10:52 a.m. UTC | #4
On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 02/10/17 12:20, H.J. Lu wrote:
>> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>> On 29/09/17 22:32, H.J. Lu wrote:
>>>> Since start.o may be compiled as PIC, we should check PIC instead of
>>>> SHARED.
>>>>
>>>> OK for master?
>>>>
>>>
>>> i believe that the compile/link tests worked..
>>
>> Does static PIE of hjl/pie/static branch run on arm and aarch64?
>>
>
> no, if i build with --enable-static-pie the install step
> fails when the static linked sln runs.
>
> there are relative relocs against the func ptrs that are
> loaded from GOT in the startup code, but execution fails
> even before those are used because there are R*_JUMP_SLOT
> and R*_GLOB_DAT relocs which are not processed correctly.
>
> in particular in
>   if (__pthread_initialize_minimal != NULL)
>     __pthread_initialize_minimal ();
> the symbol value loaded from GOT is non-NULL even though
> there is no pthread linked in, that is probably a linker bug.
>
>>> ..but i still don't understand how the GOT entries
>>> of the startup code get initialized in PIE executable
>>> at runtime.
>>
>> You just avoid GOT entries in start.S for static PIE by using
>> PC relative relocations.
>>
>
> i don't see how can you do that when you have to pass
> absolute addresses as arguments to __libc_start_main
> and the base address is not yet computed.

Does ARM support PC relative relocation for local function address?
All functions are local in static PIE.  In i386/start.S, there are

	/* Load PIC register.  */
	call 1f
	addl $_GLOBAL_OFFSET_TABLE_, %ebx

	/* Push address of our own entry points to .fini and .init.  */
	leal __libc_csu_fini@GOTOFF(%ebx), %eax
	pushl %eax
	leal __libc_csu_init@GOTOFF(%ebx), %eax
	pushl %eax

	pushl %ecx		/* Push second argument: argv.  */
	pushl %esi		/* Push first argument: argc.  */

# ifdef SHARED
	pushl main@GOT(%ebx)
# else
	/* Avoid relocation in static PIE since _start is called before
	   it is relocated.  */
	leal main@GOTOFF(%ebx), %eax
	pushl %eax
# endif

GOTOFF can be resolved by linker to avoid dynamic relocations.

[hjl@gnu-efi-2 gcc]$ cat x.c
extern void foo (void) __attribute__ ((visibility("hidden")));
extern void bar (void*);

void
xxx (void)
{
  bar (foo);
}
[hjl@gnu-efi-2 gcc]$ cat x.s
	.arch armv8-a
	.file	"x.c"
	.text
	.align	2
	.align	3
	.global	xxx
	.type	xxx, %function
xxx:
	adrp	x0, foo
	add	x0, x0, :lo12:foo
	b	bar
	.size	xxx, .-xxx
	.hidden	foo
	.ident	"GCC: (GNU) 8.0.0 20171002 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-efi-2 gcc]$

Does this need GOT?
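
For reference, the arithmetic behind the adrp/add pair above can be modeled in plain C. This is only a sketch, assuming the load base is a multiple of 4 KiB; the helper names (adrp_page_delta, lo12, adrp_add) are made up for illustration and are not part of glibc or binutils. The page delta and the low 12 bits are both fixed at link time, so the run-time result follows the load address without a GOT entry or a dynamic relocation. The adrp immediate reaches +/- 4 GiB, which bounds how far apart the instruction and the symbol may be.

```c
#include <stdint.h>

/* What the linker encodes for "adrp x0, sym": the page delta between the
   instruction and the symbol, both taken as link-time addresses.
   Unsigned wraparound stands in for a negative delta.  */
static uint64_t
adrp_page_delta (uint64_t link_pc, uint64_t link_sym)
{
  return (link_sym >> 12) - (link_pc >> 12);
}

/* What the linker encodes for ":lo12:sym": the low 12 bits of the
   symbol's link-time address.  */
static uint64_t
lo12 (uint64_t link_sym)
{
  return link_sym & 0xfff;
}

/* What the CPU computes at run time for the adrp/add pair: the page of
   the current PC, plus the encoded page delta, plus the low 12 bits.  */
static uint64_t
adrp_add (uint64_t runtime_pc, uint64_t page_delta, uint64_t low)
{
  uint64_t page = (runtime_pc & ~(uint64_t) 0xfff) + (page_delta << 12);
  return page + low;
}
```

With the instruction at link address 0x10040 and the symbol at 0x23456, the encoded page delta is 0x13, and for any page-aligned load base B the pair yields B + 0x23456.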
  
Szabolcs Nagy Oct. 3, 2017, 11:47 a.m. UTC | #5
On 03/10/17 11:52, H.J. Lu wrote:
> On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>> On 02/10/17 12:20, H.J. Lu wrote:
>>> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>> On 29/09/17 22:32, H.J. Lu wrote:
>>>>> Since start.o may be compiled as PIC, we should check PIC instead of
>>>>> SHARED.
>>>>>
>>>>> OK for master?
>>>>>
>>>>
>>>> i believe that the compile/link tests worked..
>>>
>>> Does static PIE of hjl/pie/static branch run on arm and aarch64?
>>>
>>
>> no, if i build with --enable-static-pie the install step
>> fails when the static linked sln runs.
>>
>> there are relative relocs against the func ptrs that are
>> loaded from GOT in the startup code, but execution fails
>> even before those are used because there are R*_JUMP_SLOT
>> and R*_GLOB_DAT relocs which are not processed correctly.
>>
>> in particular in
>>   if (__pthread_initialize_minimal != NULL)
>>     __pthread_initialize_minimal ();
>> the symbol value loaded from GOT is non-NULL even though
>> there is no pthread linked in, that is probably a linker bug.
>>
>>>> ..but i still don't understand how the GOT entries
>>>> of the startup code get initialized in PIE executable
>>>> at runtime.
>>>
>>> You just avoid GOT entries in start.S for static PIE by using
>>> PC relative relocations.
>>>
>>
>> i don't see how can you do that when you have to pass
>> absolute addresses as arguments to __libc_start_main
>> and the base address is not yet computed.
> 
> Does ARM support PC relative relocation for local function address?
> All functions are local in static PIE.  In i386/start.S, there are
> 
> 	/* Load PIC register.  */
> 	call 1f
> 	addl $_GLOBAL_OFFSET_TABLE_, %ebx
> 
> 	/* Push address of our own entry points to .fini and .init.  */
> 	leal __libc_csu_fini@GOTOFF(%ebx), %eax
> 	pushl %eax
> 	leal __libc_csu_init@GOTOFF(%ebx), %eax
> 	pushl %eax
> 
> 	pushl %ecx		/* Push second argument: argv.  */
> 	pushl %esi		/* Push first argument: argc.  */
> 
> # ifdef SHARED
> 	pushl main@GOT(%ebx)
> # else
> 	/* Avoid relocation in static PIE since _start is called before
> 	   it is relocated.  */
> 	leal main@GOTOFF(%ebx), %eax
> 	pushl %eax
> # endif
> 
> GOTOFF can be resolved by linker to avoid dynamic relocations.
> 
> [hjl@gnu-efi-2 gcc]$ cat x.c
> extern void foo (void) __attribute__ ((visibility("hidden")));
> extern void bar (void*);
> 
> void
> xxx (void)
> {
>   bar (foo);
> }
> [hjl@gnu-efi-2 gcc]$ cat x.s
> 	.arch armv8-a
> 	.file	"x.c"
> 	.text
> 	.align	2
> 	.align	3
> 	.global	xxx
> 	.type	xxx, %function
> xxx:
> 	adrp	x0, foo
> 	add	x0, x0, :lo12:foo
> 	b	bar
> 	.size	xxx, .-xxx
> 	.hidden	foo
> 	.ident	"GCC: (GNU) 8.0.0 20171002 (experimental)"
> 	.section	.note.GNU-stack,"",@progbits
> [hjl@gnu-efi-2 gcc]$
> 
> Does this need GOT?
> 

ok, this works (for binaries < 4G), but i assumed the symbols
can come from external module (in case of non-static linking)
and thus crt1.o will need GOT entries anyway, but now i see
that all of __libc_csu_init, __libc_csu_fini and main will
be in the same module as crt1.o

i can update start.S, but i wonder if there might be code
where main is not in the executable for some reason, but
comes from a shared lib.
  
H.J. Lu Oct. 3, 2017, noon UTC | #6
On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 03/10/17 11:52, H.J. Lu wrote:
>> On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>> On 02/10/17 12:20, H.J. Lu wrote:
>>>> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>>> On 29/09/17 22:32, H.J. Lu wrote:
>>>>>> Since start.o may be compiled as PIC, we should check PIC instead of
>>>>>> SHARED.
>>>>>>
>>>>>> OK for master?
>>>>>>
>>>>>
>>>>> i believe that the compile/link tests worked..
>>>>
>>>> Does static PIE of hjl/pie/static branch run on arm and aarch64?
>>>>
>>>
>>> no, if i build with --enable-static-pie the install step
>>> fails when the static linked sln runs.
>>>
>>> there are relative relocs against the func ptrs that are
>>> loaded from GOT in the startup code, but execution fails
>>> even before those are used because there are R*_JUMP_SLOT
>>> and R*_GLOB_DAT relocs which are not processed correctly.
>>>
>>> in particular in
>>>   if (__pthread_initialize_minimal != NULL)
>>>     __pthread_initialize_minimal ();
>>> the symbol value loaded from GOT is non-NULL even though
>>> there is no pthread linked in, that is probably a linker bug.
>>>
>>>>> ..but i still don't understand how the GOT entries
>>>>> of the startup code get initialized in PIE executable
>>>>> at runtime.
>>>>
>>>> You just avoid GOT entries in start.S for static PIE by using
>>>> PC relative relocations.
>>>>
>>>
>>> i don't see how can you do that when you have to pass
>>> absolute addresses as arguments to __libc_start_main
>>> and the base address is not yet computed.
>>
>> Does ARM support PC relative relocation for local function address?
>> All functions are local in static PIE.  In i386/start.S, there are
>>
>> 	/* Load PIC register.  */
>> 	call 1f
>> 	addl $_GLOBAL_OFFSET_TABLE_, %ebx
>>
>> 	/* Push address of our own entry points to .fini and .init.  */
>> 	leal __libc_csu_fini@GOTOFF(%ebx), %eax
>> 	pushl %eax
>> 	leal __libc_csu_init@GOTOFF(%ebx), %eax
>> 	pushl %eax
>>
>> 	pushl %ecx		/* Push second argument: argv.  */
>> 	pushl %esi		/* Push first argument: argc.  */
>>
>> # ifdef SHARED
>> 	pushl main@GOT(%ebx)
>> # else
>> 	/* Avoid relocation in static PIE since _start is called before
>> 	   it is relocated.  */
>> 	leal main@GOTOFF(%ebx), %eax
>> 	pushl %eax
>> # endif
>>
>> GOTOFF can be resolved by linker to avoid dynamic relocations.
>>
>> [hjl@gnu-efi-2 gcc]$ cat x.c
>> extern void foo (void) __attribute__ ((visibility("hidden")));
>> extern void bar (void*);
>>
>> void
>> xxx (void)
>> {
>>   bar (foo);
>> }
>> [hjl@gnu-efi-2 gcc]$ cat x.s
>> 	.arch armv8-a
>> 	.file	"x.c"
>> 	.text
>> 	.align	2
>> 	.align	3
>> 	.global	xxx
>> 	.type	xxx, %function
>> xxx:
>> 	adrp	x0, foo
>> 	add	x0, x0, :lo12:foo
>> 	b	bar
>> 	.size	xxx, .-xxx
>> 	.hidden	foo
>> 	.ident	"GCC: (GNU) 8.0.0 20171002 (experimental)"
>> 	.section	.note.GNU-stack,"",@progbits
>> [hjl@gnu-efi-2 gcc]$
>>
>> Does this need GOT?
>>
>
> ok, this works (for binaries < 4G), but i assumed the symbols
> can come from external module (in case of non-static linking)
> and thus crt1.o will need GOT entries anyway, but now i see
> that all of __libc_csu_init, __libc_csu_fini and main will
> be in the same module as crt1.o
>
> i can update start.S, but i wonder if there might be code
> where main is not in the executable for some reason, but
> comes from a shared lib.
>

That won't happen with static PIE :-).
  
Szabolcs Nagy Oct. 3, 2017, 12:07 p.m. UTC | #7
On 03/10/17 13:00, H.J. Lu wrote:
> On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>> ok, this works (for binaries < 4G), but i assumed the symbols
>> can come from external module (in case of non-static linking)
>> and thus crt1.o will need GOT entries anyway, but now i see
>> that all of __libc_csu_init, __libc_csu_fini and main will
>> be in the same module as crt1.o
>>
>> i can update start.S, but i wonder if there might be code
>> where main is not in the executable for some reason, but
>> comes from a shared lib.
>>
> 
> That won't happen with static PIE :-).
> 

yes but now crt1.o is used for both static and non-static
linking and if i change start.S it will observably change
abi for the non-static case (i think it probably does not
matter in practice, but somebody might think otherwise).
  
H.J. Lu Oct. 3, 2017, 2:31 p.m. UTC | #8
On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 03/10/17 13:00, H.J. Lu wrote:
>> On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>> ok, this works (for binaries < 4G), but i assumed the symbols
>>> can come from external module (in case of non-static linking)
>>> and thus crt1.o will need GOT entries anyway, but now i see
>>> that all of __libc_csu_init, __libc_csu_fini and main will
>>> be in the same module as crt1.o
>>>
>>> i can update start.S, but i wonder if there might be code
>>> where main is not in the executable for some reason, but
>>> comes from a shared lib.
>>>
>>
>> That won't happen with static PIE :-).
>>
>
> yes but now crt1.o is used for both static and non-static
> linking and if i change start.S it will observably change
> abi for the non-static case (i think it probably does not
> matter in practice, but somebody might think otherwise).
>

Can you modify your start.S with

#if defined PIC && !defined SHARED

        pass local_main to __libc_start_main

local_main:
       tail call to main via PLT
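
The shape of that idea, written in C rather than assembly, might look like the sketch below. All names here are hypothetical: program_main stands in for the user's main, start_main for __libc_start_main, and run_startup for _start. The point is only that the startup code hands over the address of a local trampoline, which it can compute PC-relative, and the trampoline forwards to the real entry point.

```c
/* Hypothetical stand-in for the program's main.  In the real case this
   may live in another module and be reached through the PLT.  */
int
program_main (int argc, char **argv)
{
  (void) argv;
  return argc;
}

/* Local trampoline: defined next to the startup code, so its address
   does not need a GOT entry.  It only forwards to the real entry.  */
static int
local_main (int argc, char **argv)
{
  return program_main (argc, argv);
}

/* Hypothetical stand-in for __libc_start_main: receives the entry
   point as a function pointer and calls it.  */
static int
start_main (int (*entry) (int, char **), int argc, char **argv)
{
  return entry (argc, argv);
}

/* Hypothetical stand-in for _start: passes the trampoline, not main.  */
int
run_startup (int argc, char **argv)
{
  return start_main (local_main, argc, argv);
}
```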
  
Szabolcs Nagy Oct. 3, 2017, 3:17 p.m. UTC | #9
On 03/10/17 15:31, H.J. Lu wrote:
> On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>> On 03/10/17 13:00, H.J. Lu wrote:
>>> On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>> ok, this works (for binaries < 4G), but i assumed the symbols
>>>> can come from external module (in case of non-static linking)
>>>> and thus crt1.o will need GOT entries anyway, but now i see
>>>> that all of __libc_csu_init, __libc_csu_fini and main will
>>>> be in the same module as crt1.o
>>>>
>>>> i can update start.S, but i wonder if there might be code
>>>> where main is not in the executable for some reason, but
>>>> comes from a shared lib.
>>>>
>>>
>>> That won't happen with static PIE :-).
>>>
>>
>> yes but now crt1.o is used for both static and non-static
>> linking and if i change start.S it will observably change
>> abi for the non-static case (i think it probably does not
>> matter in practice, but somebody might think otherwise).
>>
> 
> Can you modify your start.S with
> 
> #if defined PIC && !defined SHARED
> 
>         pass local_main to __libc_start_main
> 
> local_main:
>        tail call to main via PLT
> 

good, so this problem can be solved too
(i'm not sure if it is worth solving though)
  
Szabolcs Nagy Oct. 3, 2017, 3:31 p.m. UTC | #10
On 03/10/17 11:39, Szabolcs Nagy wrote:
> On 02/10/17 12:20, H.J. Lu wrote:
>> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>> On 29/09/17 22:32, H.J. Lu wrote:
>>>> Since start.o may be compiled as PIC, we should check PIC instead of
>>>> SHARED.
>>>>
>>>> OK for master?
>>>>
>>>
>>> i believe that the compile/link tests worked..
>>
>> Does static PIE of hjl/pie/static branch run on arm and aarch64?
>>
> 
> no, if i build with --enable-static-pie the install step
> fails when the static linked sln runs.
> 
> there are relative relocs against the func ptrs that are
> loaded from GOT in the startup code, but execution fails
> even before those are used because there are R*_JUMP_SLOT
> and R*_GLOB_DAT relocs which are not processed correctly.
> 
> in particular in
>   if (__pthread_initialize_minimal != NULL)
>     __pthread_initialize_minimal ();
> the symbol value loaded from GOT is non-NULL even though
> there is no pthread linked in, that is probably a linker bug.
> 

it seems weak extern symbol is accessed via got and at
link time that is not relaxed to 0 with -static -pie
and the got entry is not initialized to 0 either.

as far as i can see the startup code and weak symbols
are the remaining issues on aarch64 for static pie.
  
H.J. Lu Oct. 4, 2017, 9:05 a.m. UTC | #11
On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 03/10/17 11:39, Szabolcs Nagy wrote:
>> On 02/10/17 12:20, H.J. Lu wrote:
>>> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>> On 29/09/17 22:32, H.J. Lu wrote:
>>>>> Since start.o may be compiled as PIC, we should check PIC instead of
>>>>> SHARED.
>>>>>
>>>>> OK for master?
>>>>>
>>>>
>>>> i believe that the compile/link tests worked..
>>>
>>> Does static PIE of hjl/pie/static branch run on arm and aarch64?
>>>
>>
>> no, if i build with --enable-static-pie the install step
>> fails when the static linked sln runs.
>>
>> there are relative relocs against the func ptrs that are
>> loaded from GOT in the startup code, but execution fails
>> even before those are used because there are R*_JUMP_SLOT
>> and R*_GLOB_DAT relocs which are not processed correctly.
>>
>> in particular in
>>   if (__pthread_initialize_minimal != NULL)
>>     __pthread_initialize_minimal ();
>> the symbol value loaded from GOT is non-NULL even though
>> there is no pthread linked in, that is probably a linker bug.
>>
>
> it seems weak extern symbol is accessed via got and at
> link time that is not relaxed to 0 with -static -pie
> and the got entry is not initialized to 0 either.

Please try the current hjl/pie/static branch.  I added Pcrt1.o to
create static PIE.  There is no weak extern symbol anymore.
As long as there is no dynamic relocation before
_dl_relocate_static_pie relocates static PIE, it should work.

> as far as i can see the startup code and weak symbols
> are the remaining issues on aarch64 for static pie.
>
>
  
Szabolcs Nagy Oct. 6, 2017, 10:56 a.m. UTC | #12
On 03/10/17 16:31, Szabolcs Nagy wrote:
> On 03/10/17 11:39, Szabolcs Nagy wrote:
>> On 02/10/17 12:20, H.J. Lu wrote:
>>> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>> On 29/09/17 22:32, H.J. Lu wrote:
>>>>> Since start.o may be compiled as PIC, we should check PIC instead of
>>>>> SHARED.
>>>>>
>>>>> OK for master?
>>>>>
>>>>
>>>> i believe that the compile/link tests worked..
>>>
>>> Does static PIE of hjl/pie/static branch run on arm and aarch64?
>>>
>>
>> no, if i build with --enable-static-pie the install step
>> fails when the static linked sln runs.
>>
>> there are relative relocs against the func ptrs that are
>> loaded from GOT in the startup code, but execution fails
>> even before those are used because there are R*_JUMP_SLOT
>> and R*_GLOB_DAT relocs which are not processed correctly.
>>
>> in particular in
>>   if (__pthread_initialize_minimal != NULL)
>>     __pthread_initialize_minimal ();
>> the symbol value loaded from GOT is non-NULL even though
>> there is no pthread linked in, that is probably a linker bug.
>>
> 
> it seems weak extern symbol is accessed via got and at
> link time that is not relaxed to 0 with -static -pie
> and the got entry is not initialized to 0 either.

aarch64 dl-machine.h has

      struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
      ElfW(Addr) value = sym_map == NULL ? 0 : sym_map->l_addr + sym->st_value;

x86_64 has

      struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
      ElfW(Addr) value = (sym == NULL ? 0
			  : (ElfW(Addr)) sym_map->l_addr + sym->st_value);

sym_map is always == BOOTSTRAP_MAP in case of static pie, so
the sym_map == NULL check is never true on aarch64 for weak
undef symbols.

so either targets need to be fixed to not use sym_map check
for detecting undef weak (powerpc32, powerpc64, aarch64,
i386, arm, sh, sparc32, sparc64) or RESOLVE_MAP should not
be unconditionally set to BOOTSTRAP_MAP in _dl_relocate_static_pie
(since that is not true for undef symbols)
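
A toy model of the aarch64 expression shows the effect. This is a sketch under stated assumptions, not glibc code: toy_map and toy_sym are simplified stand-ins for struct link_map and ElfW(Sym), the undefined weak symbol is assumed to have st_value 0, and the "defined" flag plus value_checking_sym are a hypothetical alternative, not an existing glibc check. With a map that is never NULL, the map-based check returns the load address instead of 0 for an undefined weak symbol.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for struct link_map and ElfW(Sym).  */
struct toy_map { uintptr_t l_addr; };
struct toy_sym { uintptr_t st_value; int defined; };

/* Same shape as the aarch64 expression: the symbol is treated as
   undefined only when no map was returned.  */
static uintptr_t
value_checking_map (const struct toy_map *sym_map, const struct toy_sym *sym)
{
  return sym_map == NULL ? 0 : sym_map->l_addr + sym->st_value;
}

/* Hypothetical alternative: decide from the symbol itself, so a map
   that is always non-NULL cannot hide an undefined symbol.  */
static uintptr_t
value_checking_sym (const struct toy_map *sym_map, const struct toy_sym *sym)
{
  return !sym->defined ? 0 : sym_map->l_addr + sym->st_value;
}
```

For a map loaded at 0x10000 and an undefined weak symbol, value_checking_map yields 0x10000 (which a `!= NULL` test then takes as "present"), while value_checking_sym yields 0.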
  
Szabolcs Nagy Oct. 6, 2017, 11:01 a.m. UTC | #13
On 04/10/17 10:05, H.J. Lu wrote:
> On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>
>> it seems weak extern symbol is accessed via got and at
>> link time that is not relaxed to 0 with -static -pie
>> and the got entry is not initialized to 0 either.
> 
> Please try the current hjl/pie/static branch.  I added Pcrt1.o to
> to create static PIE.  There is no weak extern symbol anymore.
> As long as there is no dynamic relocation before
> _dl_relocate_static_pie relocates static PIE, it should work.
> 

i still get weak undefined symbols (pthread stuff when linking
without -lpthread)

how is Pcrt1.o supposed to work? it seems it is built the same
way as crt1.o? (i'd expect Pcrt1 to mean "static pie link with
pie libc.a", and crt1.o to mean "link executable that will be
relocated by a dynamic linker")
  
H.J. Lu Oct. 6, 2017, 12:25 p.m. UTC | #14
On 10/6/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 04/10/17 10:05, H.J. Lu wrote:
>> On 10/3/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>
>>> it seems weak extern symbol is accessed via got and at
>>> link time that is not relaxed to 0 with -static -pie
>>> and the got entry is not initialized to 0 either.
>>
>> Please try the current hjl/pie/static branch.  I added Pcrt1.o to
>> to create static PIE.  There is no weak extern symbol anymore.
>> As long as there is no dynamic relocation before
>> _dl_relocate_static_pie relocates static PIE, it should work.
>>
>
> i still get weak undefined symbols (pthread stuff when linking
> without -lpthread)

All weak undefined symbols in static executable should be resolved
to zero at link-time, PIE or non-PIE.  See

/* Is a undefined weak symbol which is resolved to 0.  Reference to an
   undefined weak symbol is resolved to 0 when building executable if
   it isn't dynamic and
   1. Has non-GOT/non-PLT relocations in text section.  Or
   2. Has no GOT/PLT relocation.
   Local undefined weak symbol is always resolved to 0.
 */
#define UNDEFINED_WEAK_RESOLVED_TO_ZERO(INFO, EH) \
  ((EH)->elf.root.type == bfd_link_hash_undefweak		 \
   && (SYMBOL_REFERENCES_LOCAL_P ((INFO), &(EH)->elf)		 \
       || (bfd_link_executable (INFO)				 \
	   && (!(EH)->has_got_reloc				 \
	       || (EH)->has_non_got_reloc))))

in elfxx-x86.h.  Also see:

https://sourceware.org/bugzilla/show_bug.cgi?id=19636
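
The condition in that macro can be read as a plain boolean predicate. The sketch below is a simplified model, not binutils code: struct toy_weak_ref and its field names are hypothetical, each one standing for the corresponding term of the macro (symbol type is undefweak, SYMBOL_REFERENCES_LOCAL_P, bfd_link_executable, has_got_reloc, has_non_got_reloc).

```c
#include <stdbool.h>

/* Hypothetical flattening of the inputs the macro looks at.  */
struct toy_weak_ref
{
  bool is_undefweak;        /* root.type == bfd_link_hash_undefweak */
  bool references_local;    /* SYMBOL_REFERENCES_LOCAL_P */
  bool linking_executable;  /* bfd_link_executable */
  bool has_got_reloc;
  bool has_non_got_reloc;
};

/* Mirrors the structure of UNDEFINED_WEAK_RESOLVED_TO_ZERO: an undefined
   weak symbol resolves to 0 if it is local, or if an executable is being
   linked and the symbol either has no GOT relocation or also has a
   non-GOT relocation.  */
static bool
resolved_to_zero (const struct toy_weak_ref *r)
{
  return r->is_undefweak
	 && (r->references_local
	     || (r->linking_executable
		 && (!r->has_got_reloc || r->has_non_got_reloc)));
}
```

Read this way, a non-local undefined weak symbol that is referenced only through the GOT in an executable is the one case the predicate does not resolve to 0.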

> how Pcrt1.o is supposed to work? it seems it is built the same
> way as crt1.o? (i'd expect Pcrt1 to mean "static pie link with
> pie libc.a", and crt1.o to mean "link executable that will be
> relocated by a dynamic linker")

That is correct:

[hjl@gnu-efi-2 glibc]$ readelf -sW csu/Pcrt1.o  | grep _dl_relocate_static_pie
[hjl@gnu-efi-2 glibc]$ readelf -sW csu/crt1.o  | grep _dl_relocate_static_pie
    24: 0000000000000038     4 FUNC    GLOBAL HIDDEN     2 _dl_relocate_static_pie
[hjl@gnu-efi-2 glibc]$

When Pcrt1.o is used, the real _dl_relocate_static_pie in libc.a will
be used for static PIE.  Otherwise, you will get a dummy
_dl_relocate_static_pie in crt1.o for non-PIE static and dynamic
executables.
  
H.J. Lu Oct. 6, 2017, 12:34 p.m. UTC | #15
On 10/6/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 03/10/17 16:31, Szabolcs Nagy wrote:
>> On 03/10/17 11:39, Szabolcs Nagy wrote:
>>> On 02/10/17 12:20, H.J. Lu wrote:
>>>> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>>> On 29/09/17 22:32, H.J. Lu wrote:
>>>>>> Since start.o may be compiled as PIC, we should check PIC instead of
>>>>>> SHARED.
>>>>>>
>>>>>> OK for master?
>>>>>>
>>>>>
>>>>> i believe that the compile/link tests worked..
>>>>
>>>> Does static PIE of hjl/pie/static branch run on arm and aarch64?
>>>>
>>>
>>> no, if i build with --enable-static-pie the install step
>>> fails when the static linked sln runs.
>>>
>>> there are relative relocs against the func ptrs that are
>>> loaded from GOT in the startup code, but execution fails
>>> even before those are used because there are R*_JUMP_SLOT
>>> and R*_GLOB_DAT relocs which are not processed correctly.
>>>
>>> in particular in
>>>   if (__pthread_initialize_minimal != NULL)
>>>     __pthread_initialize_minimal ();
>>> the symbol value loaded from GOT is non-NULL even though
>>> there is no pthread linked in, that is probably a linker bug.
>>>
>>
>> it seems weak extern symbol is accessed via got and at
>> link time that is not relaxed to 0 with -static -pie
>> and the got entry is not initialized to 0 either.
>
> aarch64 dl-machine.h has
>
>       struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
>       ElfW(Addr) value = sym_map == NULL ? 0 : sym_map->l_addr +
> sym->st_value;
>
> x86_64 has
>
>       struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
>       ElfW(Addr) value = (sym == NULL ? 0
> 			  : (ElfW(Addr)) sym_map->l_addr + sym->st_value);
>
> sym_map is always == BOOTSTRAP_MAP in case of static pie, so
> tye sym_map == NULL check is not true on aarch64 case for weak
> undef symbols.
>
> so either targets need to be fixed to not use sym_map check
> for detecting undef weak (powerpc32, powerpc64, aarch64,
> i386, arm, sh, sparc32, sparc64) or RESOLVE_MAP should not
> be unconditionally set to BOOTSTRAP_MAP in _dl_relocate_static_pie
> (since that is not true for undef symbols)
>

It shouldn't matter.  All undefined weak symbols should be resolved
to 0 in static PIE by linker.  See:

https://sourceware.org/bugzilla/show_bug.cgi?id=22269
  
Szabolcs Nagy Oct. 6, 2017, 1:08 p.m. UTC | #16
On 06/10/17 13:34, H.J. Lu wrote:
> On 10/6/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>> On 03/10/17 16:31, Szabolcs Nagy wrote:
>>> On 03/10/17 11:39, Szabolcs Nagy wrote:
>>>> On 02/10/17 12:20, H.J. Lu wrote:
>>>>> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>>>> On 29/09/17 22:32, H.J. Lu wrote:
>>>>>>> Since start.o may be compiled as PIC, we should check PIC instead of
>>>>>>> SHARED.
>>>>>>>
>>>>>>> OK for master?
>>>>>>>
>>>>>>
>>>>>> i believe that the compile/link tests worked..
>>>>>
>>>>> Does static PIE of hjl/pie/static branch run on arm and aarch64?
>>>>>
>>>>
>>>> no, if i build with --enable-static-pie the install step
>>>> fails when the static linked sln runs.
>>>>
>>>> there are relative relocs against the func ptrs that are
>>>> loaded from GOT in the startup code, but execution fails
>>>> even before those are used because there are R*_JUMP_SLOT
>>>> and R*_GLOB_DAT relocs which are not processed correctly.
>>>>
>>>> in particular in
>>>>   if (__pthread_initialize_minimal != NULL)
>>>>     __pthread_initialize_minimal ();
>>>> the symbol value loaded from GOT is non-NULL even though
>>>> there is no pthread linked in, that is probably a linker bug.
>>>>
>>>
>>> it seems weak extern symbol is accessed via got and at
>>> link time that is not relaxed to 0 with -static -pie
>>> and the got entry is not initialized to 0 either.
>>
>> aarch64 dl-machine.h has
>>
>>       struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
>>       ElfW(Addr) value = sym_map == NULL ? 0 : sym_map->l_addr +
>> sym->st_value;
>>
>> x86_64 has
>>
>>       struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
>>       ElfW(Addr) value = (sym == NULL ? 0
>> 			  : (ElfW(Addr)) sym_map->l_addr + sym->st_value);
>>
>> sym_map is always == BOOTSTRAP_MAP in case of static pie, so
>> tye sym_map == NULL check is not true on aarch64 case for weak
>> undef symbols.
>>
>> so either targets need to be fixed to not use sym_map check
>> for detecting undef weak (powerpc32, powerpc64, aarch64,
>> i386, arm, sh, sparc32, sparc64) or RESOLVE_MAP should not
>> be unconditionally set to BOOTSTRAP_MAP in _dl_relocate_static_pie
>> (since that is not true for undef symbols)
>>
> 
> It shouldn't matter.  All undefined weak symbols should be resolved
> to 0 in static PIE by linker.  See:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=22269
> 

ok, thanks for the bug report, but i prefer to also
fix the dynamic linker to try to set abs/got/jumpslot
relocs for weak undef syms to 0, so it can work with
earlier binutils.
  
H.J. Lu Oct. 7, 2017, 1:07 a.m. UTC | #17
On 10/6/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 06/10/17 13:34, H.J. Lu wrote:
>> On 10/6/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>> On 03/10/17 16:31, Szabolcs Nagy wrote:
>>>> On 03/10/17 11:39, Szabolcs Nagy wrote:
>>>>> On 02/10/17 12:20, H.J. Lu wrote:
>>>>>> On 10/2/17, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>>>>> On 29/09/17 22:32, H.J. Lu wrote:
>>>>>>>> Since start.o may be compiled as PIC, we should check PIC instead
>>>>>>>> of
>>>>>>>> SHARED.
>>>>>>>>
>>>>>>>> OK for master?
>>>>>>>>
>>>>>>>
>>>>>>> i believe that the compile/link tests worked..
>>>>>>
>>>>>> Does static PIE of hjl/pie/static branch run on arm and aarch64?
>>>>>>
>>>>>
>>>>> no, if i build with --enable-static-pie the install step
>>>>> fails when the static linked sln runs.
>>>>>
>>>>> there are relative relocs against the func ptrs that are
>>>>> loaded from GOT in the startup code, but execution fails
>>>>> even before those are used because there are R*_JUMP_SLOT
>>>>> and R*_GLOB_DAT relocs which are not processed correctly.
>>>>>
>>>>> in particular in
>>>>>   if (__pthread_initialize_minimal != NULL)
>>>>>     __pthread_initialize_minimal ();
>>>>> the symbol value loaded from GOT is non-NULL even though
>>>>> there is no pthread linked in, that is probably a linker bug.
>>>>>
>>>>
>>>> it seems weak extern symbol is accessed via got and at
>>>> link time that is not relaxed to 0 with -static -pie
>>>> and the got entry is not initialized to 0 either.
>>>
>>> aarch64 dl-machine.h has
>>>
>>>       struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
>>>       ElfW(Addr) value = sym_map == NULL ? 0 : sym_map->l_addr +
>>> sym->st_value;
>>>
>>> x86_64 has
>>>
>>>       struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
>>>       ElfW(Addr) value = (sym == NULL ? 0
>>> 			  : (ElfW(Addr)) sym_map->l_addr + sym->st_value);
>>>
>>> sym_map is always == BOOTSTRAP_MAP in case of static pie, so
>>> tye sym_map == NULL check is not true on aarch64 case for weak
>>> undef symbols.
>>>
>>> so either targets need to be fixed to not use sym_map check
>>> for detecting undef weak (powerpc32, powerpc64, aarch64,
>>> i386, arm, sh, sparc32, sparc64) or RESOLVE_MAP should not
>>> be unconditionally set to BOOTSTRAP_MAP in _dl_relocate_static_pie
>>> (since that is not true for undef symbols)
>>>
>>
>> It shouldn't matter.  All undefined weak symbols should be resolved
>> to 0 in static PIE by linker.  See:
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=22269
>>
>
> ok, thanks for the bug report, but i prefer to also
> fix the dynamic linker to try set abs/got/jumpslot

Sure.

> relocs for weak undef syms to 0, so it can work with
> earlier binutils.
>
>

Please try binutils users/hjl/pr22269 branch.  aarch64 may need to check
UNDEFINED_WEAK_RESOLVED_TO_ZERO in more places.
  

Patch

diff --git a/sysdeps/aarch64/start.S b/sysdeps/aarch64/start.S
index c20433ad73..7a946506f2 100644
--- a/sysdeps/aarch64/start.S
+++ b/sysdeps/aarch64/start.S
@@ -60,7 +60,7 @@  _start:
 	/* Setup stack limit in argument register */
 	mov	x6, sp
 
-#ifdef SHARED
+#ifdef PIC
         adrp    x0, :got:main
 	ldr     PTR_REG (0), [x0, #:got_lo12:main]