dl: Use "adr" assembler command to get proper load address

Message ID 20210907131616.23472-1-lukma@denx.de
State Superseded
Headers
Series dl: Use "adr" assembler command to get proper load address |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Lukasz Majewski Sept. 7, 2021, 1:16 p.m. UTC
  This change is a partial revert of commit
bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of
__ehdr_start linker variable to get the address of loaded program.

The elf_machine_load_address() function is declared in the
sysdeps/arm/dl-machine.h header. It is called from _dl_start() entry
point for the program. It shall return the load address of the dynamic
linker program.
With this revert the 'adr' assembler instruction is used instead of a
place holder:

arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr
00000000 l       .note.gnu.build-id     00000000      __ehdr_start

which shall be pre-set by binutils.

This is crucial in the QEMU ARM environment for which (when /sbin/init
is executed) values set in __ehdr_start symbol are wrong. This causes
the program to crash very early - when the /lib/ld-linux-armhf.so.3 is
executed as a prerequisite to /sbin/init execution.
The kernel's fs/binfmt_elf.c is though responsible for setting
up execution environment, not binutils.

It looks like the only robust way to obtain the _dl_start offset is to
use assembler instruction - not rely on values provided by binutils.

HW:
Hardware name: ARM-Versatile Express (Run with QEMU)
Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1

When the /sbin/init is setup for run from Linux kernel's very small
environment with LD_DEBUG=all the __ehdr_start is not shown at all.

Fixes: BZ #28293
---
 sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)
  

Comments

Fangrui Song Sept. 7, 2021, 4:49 p.m. UTC | #1
On 2021-09-07, Lukasz Majewski wrote:
>This change is a partial revert of commit
>bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of
>__ehdr_start linker variable to get the address of loaded program.
>
>The elf_machine_load_address() function is declared in the
>sysdeps/arm/dl-machine.h header. It is called from _dl_start() entry
>point for the program. It shall return the load address of the dynamic
>linker program.

Yes.

>With this revert the 'adr' assembler instruction is used instead of a
>place holder:
>
>arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr
>00000000 l       .note.gnu.build-id     00000000      __ehdr_start
>
>which shall be pre-set by binutils.

I don't understand this. The sh_addr field of the binutils defined
__ehdr_start is 0.

Declararing __ehdr_start as hidden and accessing it with PC-relative
addressing generates computes the runtime address of __ehdr_start, which
equals the load base.

What's wrong with it? The objdump -t line doesn't tell why this change
is good. Can you compare objdump -dr output and show the incorrect code
sequences with C assessing __ehdr_start?

>This is crucial in the QEMU ARM environment for which (when /sbin/init
>is executed) values set in __ehdr_start symbol are wrong. This causes
>the program to crash very early - when the /lib/ld-linux-armhf.so.3 is
>executed as a prerequisite to /sbin/init execution.

>The kernel's fs/binfmt_elf.c is though responsible for setting
>up execution environment, not binutils.

Whether the symbol entry __ehdr_start exists doesn't matter.
The hidden definition does not have any associated relocation.

>It looks like the only robust way to obtain the _dl_start offset is to
>use assembler instruction - not rely on values provided by binutils.
>
>HW:
>Hardware name: ARM-Versatile Express (Run with QEMU)
>Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1
>
>When the /sbin/init is setup for run from Linux kernel's very small
>environment with LD_DEBUG=all the __ehdr_start is not shown at all.
>
>Fixes: BZ #28293
>---
> sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---
> 1 file changed, 25 insertions(+), 3 deletions(-)
>
>diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h
>index eb13cb8b57..58ebef6ecd 100644
>--- a/sysdeps/arm/dl-machine.h
>+++ b/sysdeps/arm/dl-machine.h
>@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr *ehdr)
> }
>
> /* Return the run-time load address of the shared object.  */
>-static inline ElfW(Addr) __attribute__ ((unused))
>+static inline Elf32_Addr __attribute__ ((unused))
> elf_machine_load_address (void)
> {
>-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
>-  return (ElfW(Addr)) &__ehdr_start;
>+  Elf32_Addr pcrel_addr;
>+#ifdef SHARED
>+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");
>+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;
>+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
>+#else
>+  extern Elf32_Addr __dl_relocate_static_pie (void *)
>+    asm ("_dl_relocate_static_pie") attribute_hidden;
>+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;
>+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));
>+#endif
>+#ifdef __thumb__
>+  /* Clear the low bit of the function address.
>+
>+     NOTE: got_addr is from GOT table whose lsb is always set by linker if it's
>+     Thumb function address.  PCREL_ADDR comes from PC-relative calculation
>+     which will finish during assembling.  GAS assembler before the fix for
>+     PR gas/21458 was not setting the lsb but does after that.  Always do the
>+     strip for both, so the code works with various combinations of glibc and
>+     Binutils.  */
>+  got_addr &= ~(Elf32_Addr) 1;
>+  pcrel_addr &= ~(Elf32_Addr) 1;
>+#endif
>+  return pcrel_addr - got_addr;
> }
>
> /* Return the link-time address of _DYNAMIC.  */
>-- 
>2.20.1
>
  
Lukasz Majewski Sept. 7, 2021, 5:32 p.m. UTC | #2
Hi Fangrui,

> On 2021-09-07, Lukasz Majewski wrote:
> >This change is a partial revert of commit
> >bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of
> >__ehdr_start linker variable to get the address of loaded program.
> >
> >The elf_machine_load_address() function is declared in the
> >sysdeps/arm/dl-machine.h header. It is called from _dl_start() entry
> >point for the program. It shall return the load address of the
> >dynamic linker program.  
> 
> Yes.
> 
> >With this revert the 'adr' assembler instruction is used instead of a
> >place holder:
> >
> >arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr
> >00000000 l       .note.gnu.build-id     00000000      __ehdr_start
> >
> >which shall be pre-set by binutils.  
> 
> I don't understand this. The sh_addr field of the binutils defined
> __ehdr_start is 0.
> 
> Declararing __ehdr_start as hidden and accessing it with PC-relative
> addressing generates computes the runtime address of __ehdr_start,
> which equals the load base.

Maybe I don't get it - but shouldn't this be filled in by binutils
(linker ?) when the program is assembled before run?

Or is the __ehdr_start used just as a relative offset to get the proper
position of ld-linux-armh.so when called?

> 
> What's wrong with it? The objdump -t line doesn't tell why this change
> is good.

I'm also puzzled why this causes the QEMU versalite board to break.
(/sbin/init dies with SIGSEGFAULT = 0xb [*])

It looks like the &__ehdr_start is not equal in the result to 'adr'
assembler code in QEMU.

I also thought that the issue is caused by fs/binfmt_elf.c changes in
the Linux kernel (as mine is 5.1), so I've tested it with 5.10 and
5.14. No change.

The QEMU is pretty recent - it is 6.0.0 version for ARM.

> Can you compare objdump -dr output and show the incorrect
> code sequences with C assessing __ehdr_start?

Ok. I will explicitly compare _dl_start function with and without this
patch.

> 
> >This is crucial in the QEMU ARM environment for which (when
> >/sbin/init is executed) values set in __ehdr_start symbol are wrong.
> >This causes the program to crash very early - when the
> >/lib/ld-linux-armhf.so.3 is executed as a prerequisite to /sbin/init
> >execution.  
> 
> >The kernel's fs/binfmt_elf.c is though responsible for setting
> >up execution environment, not binutils.  
> 
> Whether the symbol entry __ehdr_start exists doesn't matter.
> The hidden definition does not have any associated relocation.

I'm also wondering if similar issue is visible with "normal" - i.e. not
QEMU ARM run init.

Maybe the "rough" environment in which /sbin/init is run is the culprit?

> 
> >It looks like the only robust way to obtain the _dl_start offset is
> >to use assembler instruction - not rely on values provided by
> >binutils.
> >
> >HW:
> >Hardware name: ARM-Versatile Express (Run with QEMU)
> >Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1
> >
> >When the /sbin/init is setup for run from Linux kernel's very small
> >environment with LD_DEBUG=all the __ehdr_start is not shown at all.
> >
> >Fixes: BZ #28293
> >---
> > sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---
> > 1 file changed, 25 insertions(+), 3 deletions(-)
> >
> >diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h
> >index eb13cb8b57..58ebef6ecd 100644
> >--- a/sysdeps/arm/dl-machine.h
> >+++ b/sysdeps/arm/dl-machine.h
> >@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr *ehdr)
> > }
> >
> > /* Return the run-time load address of the shared object.  */
> >-static inline ElfW(Addr) __attribute__ ((unused))
> >+static inline Elf32_Addr __attribute__ ((unused))
> > elf_machine_load_address (void)
> > {
> >-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
> >-  return (ElfW(Addr)) &__ehdr_start;
> >+  Elf32_Addr pcrel_addr;
> >+#ifdef SHARED
> >+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");
> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;
> >+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
> >+#else
> >+  extern Elf32_Addr __dl_relocate_static_pie (void *)
> >+    asm ("_dl_relocate_static_pie") attribute_hidden;
> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;
> >+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));
> >+#endif
> >+#ifdef __thumb__
> >+  /* Clear the low bit of the function address.
> >+
> >+     NOTE: got_addr is from GOT table whose lsb is always set by
> >linker if it's
> >+     Thumb function address.  PCREL_ADDR comes from PC-relative
> >calculation
> >+     which will finish during assembling.  GAS assembler before the
> >fix for
> >+     PR gas/21458 was not setting the lsb but does after that.
> >Always do the
> >+     strip for both, so the code works with various combinations of
> >glibc and
> >+     Binutils.  */
> >+  got_addr &= ~(Elf32_Addr) 1;
> >+  pcrel_addr &= ~(Elf32_Addr) 1;
> >+#endif
> >+  return pcrel_addr - got_addr;
> > }
> >
> > /* Return the link-time address of _DYNAMIC.  */
> >-- 
> >2.20.1
> >  

Note:

[*] - the QEMU debugging is pretty difficult with /sbin/init. Qemu has
good support for kernel debugging (-s -S switches). Then for user space
program one could use the "on-board" gdb or gdbserver.

However, /sbin/init needs to be debugged from linux and scheduler
context switches needs to be taken into account to review how the code
is executed. And such debugging scenario has issue with QEMU-arm.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Fangrui Song Sept. 7, 2021, 5:44 p.m. UTC | #3
On 2021-09-07, Lukasz Majewski wrote:
>Hi Fangrui,
>
>> On 2021-09-07, Lukasz Majewski wrote:
>> >This change is a partial revert of commit
>> >bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of
>> >__ehdr_start linker variable to get the address of loaded program.
>> >
>> >The elf_machine_load_address() function is declared in the
>> >sysdeps/arm/dl-machine.h header. It is called from _dl_start() entry
>> >point for the program. It shall return the load address of the
>> >dynamic linker program.
>>
>> Yes.
>>
>> >With this revert the 'adr' assembler instruction is used instead of a
>> >place holder:
>> >
>> >arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr
>> >00000000 l       .note.gnu.build-id     00000000      __ehdr_start
>> >
>> >which shall be pre-set by binutils.
>>
>> I don't understand this. The sh_addr field of the binutils defined
>> __ehdr_start is 0.
>>
>> Declararing __ehdr_start as hidden and accessing it with PC-relative
>> addressing generates computes the runtime address of __ehdr_start,
>> which equals the load base.
>
>Maybe I don't get it - but shouldn't this be filled in by binutils
>(linker ?) when the program is assembled before run?

The linker just sets its link-time address (sh_addr): 0.
The address is computed at runtime.

>Or is the __ehdr_start used just as a relative offset to get the proper
>position of ld-linux-armh.so when called?

It's a relative offset.

% cat a.c
__attribute__((visibility("hidden")))
extern char __ehdr_start[];

char *load_address() { return __ehdr_start; }

% arm-linux-gnueabi-gcc -O1 -fpic a.c -nostdlib
% arm-linux-gnueabi-objdump -dr a.out

a.out:     file format elf32-littlearm


Disassembly of section .text:

00000198 <load_address>:
  198:   e59f0004        ldr     r0, [pc, #4]    ; 1a4 <load_address+0xc>
  19c:   e08f0000        add     r0, pc, r0
  1a0:   e12fff1e        bx      lr
  1a4:   fffffe5c        .word   0xfffffe5c

At runtime, `load_address` will compute the load address.
Note that there is no runtime relocation.


>>
>> What's wrong with it? The objdump -t line doesn't tell why this change
>> is good.
>
>I'm also puzzled why this causes the QEMU versalite board to break.
>(/sbin/init dies with SIGSEGFAULT = 0xb [*])
>
>It looks like the &__ehdr_start is not equal in the result to 'adr'
>assembler code in QEMU.
>
>I also thought that the issue is caused by fs/binfmt_elf.c changes in
>the Linux kernel (as mine is 5.1), so I've tested it with 5.10 and
>5.14. No change.

The kernel change must be unrelated.
The kernel doesn't even know __ehdr_start as a symbol exists.

>The QEMU is pretty recent - it is 6.0.0 version for ARM.
>
>> Can you compare objdump -dr output and show the incorrect
>> code sequences with C assessing __ehdr_start?
>
>Ok. I will explicitly compare _dl_start function with and without this
>patch.

In case it helps, perhaps replace always_inline with noinline
and check why the function is incorrect under your qemu setup.

>>
>> >This is crucial in the QEMU ARM environment for which (when
>> >/sbin/init is executed) values set in __ehdr_start symbol are wrong.
>> >This causes the program to crash very early - when the
>> >/lib/ld-linux-armhf.so.3 is executed as a prerequisite to /sbin/init
>> >execution.
>>
>> >The kernel's fs/binfmt_elf.c is though responsible for setting
>> >up execution environment, not binutils.
>>
>> Whether the symbol entry __ehdr_start exists doesn't matter.
>> The hidden definition does not have any associated relocation.
>
>I'm also wondering if similar issue is visible with "normal" - i.e. not
>QEMU ARM run init.
>
>Maybe the "rough" environment in which /sbin/init is run is the culprit?
>

I had tested running ld.so and elf/ldconfig which covered the usage.

I don't know. It needs some debugging from you:)

>>
>> >It looks like the only robust way to obtain the _dl_start offset is
>> >to use assembler instruction - not rely on values provided by
>> >binutils.
>> >
>> >HW:
>> >Hardware name: ARM-Versatile Express (Run with QEMU)
>> >Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1
>> >
>> >When the /sbin/init is setup for run from Linux kernel's very small
>> >environment with LD_DEBUG=all the __ehdr_start is not shown at all.
>> >
>> >Fixes: BZ #28293
>> >---
>> > sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---
>> > 1 file changed, 25 insertions(+), 3 deletions(-)
>> >
>> >diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h
>> >index eb13cb8b57..58ebef6ecd 100644
>> >--- a/sysdeps/arm/dl-machine.h
>> >+++ b/sysdeps/arm/dl-machine.h
>> >@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr *ehdr)
>> > }
>> >
>> > /* Return the run-time load address of the shared object.  */
>> >-static inline ElfW(Addr) __attribute__ ((unused))
>> >+static inline Elf32_Addr __attribute__ ((unused))
>> > elf_machine_load_address (void)
>> > {
>> >-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
>> >-  return (ElfW(Addr)) &__ehdr_start;
>> >+  Elf32_Addr pcrel_addr;
>> >+#ifdef SHARED
>> >+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");
>> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;
>> >+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
>> >+#else
>> >+  extern Elf32_Addr __dl_relocate_static_pie (void *)
>> >+    asm ("_dl_relocate_static_pie") attribute_hidden;
>> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;
>> >+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));
>> >+#endif
>> >+#ifdef __thumb__
>> >+  /* Clear the low bit of the function address.
>> >+
>> >+     NOTE: got_addr is from GOT table whose lsb is always set by
>> >linker if it's
>> >+     Thumb function address.  PCREL_ADDR comes from PC-relative
>> >calculation
>> >+     which will finish during assembling.  GAS assembler before the
>> >fix for
>> >+     PR gas/21458 was not setting the lsb but does after that.
>> >Always do the
>> >+     strip for both, so the code works with various combinations of
>> >glibc and
>> >+     Binutils.  */
>> >+  got_addr &= ~(Elf32_Addr) 1;
>> >+  pcrel_addr &= ~(Elf32_Addr) 1;
>> >+#endif
>> >+  return pcrel_addr - got_addr;
>> > }
>> >
>> > /* Return the link-time address of _DYNAMIC.  */
>> >--
>> >2.20.1
>> >
>
>Note:
>
>[*] - the QEMU debugging is pretty difficult with /sbin/init. Qemu has
>good support for kernel debugging (-s -S switches). Then for user space
>program one could use the "on-board" gdb or gdbserver.
>
>However, /sbin/init needs to be debugged from linux and scheduler
>context switches needs to be taken into account to review how the code
>is executed. And such debugging scenario has issue with QEMU-arm.
>
>
>Best regards,
>
>Lukasz Majewski
>
>--
>
>DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
>HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Lukasz Majewski Sept. 8, 2021, 3:05 p.m. UTC | #4
Hi Fangrui,

> On 2021-09-07, Lukasz Majewski wrote:
> >Hi Fangrui,
> >  
> >> On 2021-09-07, Lukasz Majewski wrote:  
> >> >This change is a partial revert of commit
> >> >bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of
> >> >__ehdr_start linker variable to get the address of loaded program.
> >> >
> >> >The elf_machine_load_address() function is declared in the
> >> >sysdeps/arm/dl-machine.h header. It is called from _dl_start()
> >> >entry point for the program. It shall return the load address of
> >> >the dynamic linker program.  
> >>
> >> Yes.
> >>  
> >> >With this revert the 'adr' assembler instruction is used instead
> >> >of a place holder:
> >> >
> >> >arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr
> >> >00000000 l       .note.gnu.build-id     00000000      __ehdr_start
> >> >
> >> >which shall be pre-set by binutils.  
> >>
> >> I don't understand this. The sh_addr field of the binutils defined
> >> __ehdr_start is 0.
> >>
> >> Declararing __ehdr_start as hidden and accessing it with
> >> PC-relative addressing generates computes the runtime address of
> >> __ehdr_start, which equals the load base.  
> >
> >Maybe I don't get it - but shouldn't this be filled in by binutils
> >(linker ?) when the program is assembled before run?  
> 
> The linker just sets its link-time address (sh_addr): 0.
> The address is computed at runtime.
> 
> >Or is the __ehdr_start used just as a relative offset to get the
> >proper position of ld-linux-armh.so when called?  
> 
> It's a relative offset.
> 
> % cat a.c
> __attribute__((visibility("hidden")))
> extern char __ehdr_start[];
> 
> char *load_address() { return __ehdr_start; }
> 
> % arm-linux-gnueabi-gcc -O1 -fpic a.c -nostdlib
> % arm-linux-gnueabi-objdump -dr a.out
> 
> a.out:     file format elf32-littlearm
> 
> 
> Disassembly of section .text:
> 
> 00000198 <load_address>:
>   198:   e59f0004        ldr     r0, [pc, #4]    ; 1a4
> <load_address+0xc> 19c:   e08f0000        add     r0, pc, r0
>   1a0:   e12fff1e        bx      lr
>   1a4:   fffffe5c        .word   0xfffffe5c
> 
> At runtime, `load_address` will compute the load address.
> Note that there is no runtime relocation.

I think that the problem may be with having the negative value
calculated.

The relevant snipet:

    116c:       bf00            nop
    116e:       bf00            nop
    1170:       bf00            nop
    1172:       f8df 3508       ldr.w   r3, [pc, #1288] ; 167c <_dl_start+0x520>

    1176:       f8df 1508       ldr.w   r1, [pc, #1288] ; 1680 <_dl_start+0x524>

    117a:       447b            add     r3, pc
    117c:       4479            add     r1, pc
    117e:       f8c3 1598       str.w   r1, [r3, #1432] ; 0x598
    1182:       bf00            nop
    1184:       bf00            nop
    1186:       bf00            nop
    1188:       bf00            nop
    118a:       bf00            nop
    118c:       bf00            nop
    118e:       f8df 24f4       ldr.w   r2, [pc, #1268] ; 1684 <_dl_start+0x528>
    1192:       f8d3 5598       ldr.w   r5, [r3, #1432] ; 0x598
    1196:       447a            add     r2, pc
    1198:       442a            add     r2, r5
    119a:       1a52            subs    r2, r2, r1
    119c:       f8c3 25a0       str.w   r2, [r3, #1440] ; 0x5a0
    11a0:       6813            ldr     r3, [r2, #0]


    167c:       0002be92        .word   0x0002be92
    1680:       ffffee80        .word   0xffffee80

The r1 gets the 0xffffee80 (negative offset) value. It is then added to pc
and used to calculate r2.

For working code (with this patch applied) - there are NO such large
values (like aforementioned 0xffffee80). The arithmetic is done on 

   1690:       00000020        .word   0x00000020
   1694:       0002be7e        .word   0x0002be7e

which seems to work.

This shouldn't be a problem as with U2 the arithmetic shall work.
However, I've noticed (with passing LD_DEBUG=all) that the
ld-linux-armhf.so.3 (and init) are relocated twice before execution.

Why do we need to relocate it?

Another question is why on this particular case the large (i.e.
negative) offset matters?

> 
> 
> >>
> >> What's wrong with it? The objdump -t line doesn't tell why this
> >> change is good.  
> >
> >I'm also puzzled why this causes the QEMU versalite board to break.
> >(/sbin/init dies with SIGSEGFAULT = 0xb [*])
> >
> >It looks like the &__ehdr_start is not equal in the result to 'adr'
> >assembler code in QEMU.
> >
> >I also thought that the issue is caused by fs/binfmt_elf.c changes in
> >the Linux kernel (as mine is 5.1), so I've tested it with 5.10 and
> >5.14. No change.  
> 
> The kernel change must be unrelated.
> The kernel doesn't even know __ehdr_start as a symbol exists.
> 
> >The QEMU is pretty recent - it is 6.0.0 version for ARM.
> >  
> >> Can you compare objdump -dr output and show the incorrect
> >> code sequences with C assessing __ehdr_start?  
> >
> >Ok. I will explicitly compare _dl_start function with and without
> >this patch.  
> 
> In case it helps, perhaps replace always_inline with noinline
> and check why the function is incorrect under your qemu setup.
> 
> >>  
> >> >This is crucial in the QEMU ARM environment for which (when
> >> >/sbin/init is executed) values set in __ehdr_start symbol are
> >> >wrong. This causes the program to crash very early - when the
> >> >/lib/ld-linux-armhf.so.3 is executed as a prerequisite to
> >> >/sbin/init execution.  
> >>  
> >> >The kernel's fs/binfmt_elf.c is though responsible for setting
> >> >up execution environment, not binutils.  
> >>
> >> Whether the symbol entry __ehdr_start exists doesn't matter.
> >> The hidden definition does not have any associated relocation.  
> >
> >I'm also wondering if similar issue is visible with "normal" - i.e.
> >not QEMU ARM run init.
> >
> >Maybe the "rough" environment in which /sbin/init is run is the
> >culprit?
> >  
> 
> I had tested running ld.so and elf/ldconfig which covered the usage.
> 
> I don't know. It needs some debugging from you:)
> 
> >>  
> >> >It looks like the only robust way to obtain the _dl_start offset
> >> >is to use assembler instruction - not rely on values provided by
> >> >binutils.
> >> >
> >> >HW:
> >> >Hardware name: ARM-Versatile Express (Run with QEMU)
> >> >Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1
> >> >
> >> >When the /sbin/init is setup for run from Linux kernel's very
> >> >small environment with LD_DEBUG=all the __ehdr_start is not shown
> >> >at all.
> >> >
> >> >Fixes: BZ #28293
> >> >---
> >> > sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---
> >> > 1 file changed, 25 insertions(+), 3 deletions(-)
> >> >
> >> >diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h
> >> >index eb13cb8b57..58ebef6ecd 100644
> >> >--- a/sysdeps/arm/dl-machine.h
> >> >+++ b/sysdeps/arm/dl-machine.h
> >> >@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr
> >> >*ehdr)
> >> > }
> >> >
> >> > /* Return the run-time load address of the shared object.  */
> >> >-static inline ElfW(Addr) __attribute__ ((unused))
> >> >+static inline Elf32_Addr __attribute__ ((unused))
> >> > elf_machine_load_address (void)
> >> > {
> >> >-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
> >> >-  return (ElfW(Addr)) &__ehdr_start;
> >> >+  Elf32_Addr pcrel_addr;
> >> >+#ifdef SHARED
> >> >+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");
> >> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;
> >> >+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
> >> >+#else
> >> >+  extern Elf32_Addr __dl_relocate_static_pie (void *)
> >> >+    asm ("_dl_relocate_static_pie") attribute_hidden;
> >> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;
> >> >+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));
> >> >+#endif
> >> >+#ifdef __thumb__
> >> >+  /* Clear the low bit of the function address.
> >> >+
> >> >+     NOTE: got_addr is from GOT table whose lsb is always set by
> >> >linker if it's
> >> >+     Thumb function address.  PCREL_ADDR comes from PC-relative
> >> >calculation
> >> >+     which will finish during assembling.  GAS assembler before
> >> >the fix for
> >> >+     PR gas/21458 was not setting the lsb but does after that.
> >> >Always do the
> >> >+     strip for both, so the code works with various combinations
> >> >of glibc and
> >> >+     Binutils.  */
> >> >+  got_addr &= ~(Elf32_Addr) 1;
> >> >+  pcrel_addr &= ~(Elf32_Addr) 1;
> >> >+#endif
> >> >+  return pcrel_addr - got_addr;
> >> > }
> >> >
> >> > /* Return the link-time address of _DYNAMIC.  */
> >> >--
> >> >2.20.1
> >> >  
> >
> >Note:
> >
> >[*] - the QEMU debugging is pretty difficult with /sbin/init. Qemu
> >has good support for kernel debugging (-s -S switches). Then for
> >user space program one could use the "on-board" gdb or gdbserver.
> >
> >However, /sbin/init needs to be debugged from linux and scheduler
> >context switches needs to be taken into account to review how the
> >code is executed. And such debugging scenario has issue with
> >QEMU-arm.
> >
> >
> >Best regards,
> >
> >Lukasz Majewski
> >
> >--
> >
> >DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> >HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> >Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> >lukma@denx.de  
> 
> 




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Fangrui Song Sept. 8, 2021, 5:41 p.m. UTC | #5
On Wed, Sep 8, 2021 at 8:05 AM Lukasz Majewski <lukma@denx.de> wrote:
>
> Hi Fangrui,
>
> > On 2021-09-07, Lukasz Majewski wrote:
> > >Hi Fangrui,
> > >
> > >> On 2021-09-07, Lukasz Majewski wrote:
> > >> >This change is a partial revert of commit
> > >> >bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of
> > >> >__ehdr_start linker variable to get the address of loaded program.
> > >> >
> > >> >The elf_machine_load_address() function is declared in the
> > >> >sysdeps/arm/dl-machine.h header. It is called from _dl_start()
> > >> >entry point for the program. It shall return the load address of
> > >> >the dynamic linker program.
> > >>
> > >> Yes.
> > >>
> > >> >With this revert the 'adr' assembler instruction is used instead
> > >> >of a place holder:
> > >> >
> > >> >arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr
> > >> >00000000 l       .note.gnu.build-id     00000000      __ehdr_start
> > >> >
> > >> >which shall be pre-set by binutils.
> > >>
> > >> I don't understand this. The sh_addr field of the binutils defined
> > >> __ehdr_start is 0.
> > >>
> > >> Declararing __ehdr_start as hidden and accessing it with
> > >> PC-relative addressing generates computes the runtime address of
> > >> __ehdr_start, which equals the load base.
> > >
> > >Maybe I don't get it - but shouldn't this be filled in by binutils
> > >(linker ?) when the program is assembled before run?
> >
> > The linker just sets its link-time address (sh_addr): 0.
> > The address is computed at runtime.
> >
> > >Or is the __ehdr_start used just as a relative offset to get the
> > >proper position of ld-linux-armh.so when called?
> >
> > It's a relative offset.
> >
> > % cat a.c
> > __attribute__((visibility("hidden")))
> > extern char __ehdr_start[];
> >
> > char *load_address() { return __ehdr_start; }
> >
> > % arm-linux-gnueabi-gcc -O1 -fpic a.c -nostdlib
> > % arm-linux-gnueabi-objdump -dr a.out
> >
> > a.out:     file format elf32-littlearm
> >
> >
> > Disassembly of section .text:
> >
> > 00000198 <load_address>:
> >   198:   e59f0004        ldr     r0, [pc, #4]    ; 1a4
> > <load_address+0xc> 19c:   e08f0000        add     r0, pc, r0
> >   1a0:   e12fff1e        bx      lr
> >   1a4:   fffffe5c        .word   0xfffffe5c
> >
> > At runtime, `load_address` will compute the load address.
> > Note that there is no runtime relocation.
>
> I think that the problem may be with having the negative value
> calculated.

Unfamiliar with arm, but I don't see why a negative value can be a problem.
The address computation just wraps around as expected.

> The relevant snipet:
>
>     116c:       bf00            nop
>     116e:       bf00            nop
>     1170:       bf00            nop
>     1172:       f8df 3508       ldr.w   r3, [pc, #1288] ; 167c <_dl_start+0x520>
>
>     1176:       f8df 1508       ldr.w   r1, [pc, #1288] ; 1680 <_dl_start+0x524>
>
>     117a:       447b            add     r3, pc
>     117c:       4479            add     r1, pc
>     117e:       f8c3 1598       str.w   r1, [r3, #1432] ; 0x598
>     1182:       bf00            nop
>     1184:       bf00            nop
>     1186:       bf00            nop
>     1188:       bf00            nop
>     118a:       bf00            nop
>     118c:       bf00            nop
>     118e:       f8df 24f4       ldr.w   r2, [pc, #1268] ; 1684 <_dl_start+0x528>
>     1192:       f8d3 5598       ldr.w   r5, [r3, #1432] ; 0x598
>     1196:       447a            add     r2, pc
>     1198:       442a            add     r2, r5
>     119a:       1a52            subs    r2, r2, r1
>     119c:       f8c3 25a0       str.w   r2, [r3, #1440] ; 0x5a0
>     11a0:       6813            ldr     r3, [r2, #0]
>
>
>     167c:       0002be92        .word   0x0002be92
>     1680:       ffffee80        .word   0xffffee80
>
> The r1 gets the 0xffffee80 (negative offset) value. It is then added to pc
> and used to calculate r2.
>
> For working code (with this patch applied) - there are NO such large
> values (like aforementioned 0xffffee80). The arithmetic is done on
>
>    1690:       00000020        .word   0x00000020
>    1694:       0002be7e        .word   0x0002be7e
>
> which seems to work.
>
> This shouldn't be a problem as with U2 the arithmetic shall work.
> However, I've noticed (with passing LD_DEBUG=all) that the
> ld-linux-armhf.so.3 (and init) are relocated twice before execution.
>
> Why do we need to relocate it?

No idea...

> Another question is why on this particular case the large (i.e.
> negative) offset matters?

No idea...

> >
> >
> > >>
> > >> What's wrong with it? The objdump -t line doesn't tell why this
> > >> change is good.
> > >
> > >I'm also puzzled why this causes the QEMU versalite board to break.
> > >(/sbin/init dies with SIGSEGFAULT = 0xb [*])
> > >
> > >It looks like the &__ehdr_start is not equal in the result to 'adr'
> > >assembler code in QEMU.
> > >
> > >I also thought that the issue is caused by fs/binfmt_elf.c changes in
> > >the Linux kernel (as mine is 5.1), so I've tested it with 5.10 and
> > >5.14. No change.
> >
> > The kernel change must be unrelated.
> > The kernel doesn't even know __ehdr_start as a symbol exists.
> >
> > >The QEMU is pretty recent - it is 6.0.0 version for ARM.
> > >
> > >> Can you compare objdump -dr output and show the incorrect
> > >> code sequences with C assessing __ehdr_start?
> > >
> > >Ok. I will explicitly compare _dl_start function with and without
> > >this patch.
> >
> > In case it helps, perhaps replace always_inline with noinline
> > and check why the function is incorrect under your qemu setup.
> >
> > >>
> > >> >This is crucial in the QEMU ARM environment for which (when
> > >> >/sbin/init is executed) values set in __ehdr_start symbol are
> > >> >wrong. This causes the program to crash very early - when the
> > >> >/lib/ld-linux-armhf.so.3 is executed as a prerequisite to
> > >> >/sbin/init execution.
> > >>
> > >> >The kernel's fs/binfmt_elf.c is though responsible for setting
> > >> >up execution environment, not binutils.
> > >>
> > >> Whether the symbol entry __ehdr_start exists doesn't matter.
> > >> The hidden definition does not have any associated relocation.
> > >
> > >I'm also wondering if similar issue is visible with "normal" - i.e.
> > >not QEMU ARM run init.
> > >
> > >Maybe the "rough" environment in which /sbin/init is run is the
> > >culprit?
> > >
> >
> > I had tested running ld.so and elf/ldconfig which covered the usage.
> >
> > I don't know. It needs some debugging from you:)
> >
> > >>
> > >> >It looks like the only robust way to obtain the _dl_start offset
> > >> >is to use assembler instruction - not rely on values provided by
> > >> >binutils.
> > >> >
> > >> >HW:
> > >> >Hardware name: ARM-Versatile Express (Run with QEMU)
> > >> >Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1
> > >> >
> > >> >When the /sbin/init is setup for run from Linux kernel's very
> > >> >small environment with LD_DEBUG=all the __ehdr_start is not shown
> > >> >at all.
> > >> >
> > >> >Fixes: BZ #28293
> > >> >---
> > >> > sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++---
> > >> > 1 file changed, 25 insertions(+), 3 deletions(-)
> > >> >
> > >> >diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h
> > >> >index eb13cb8b57..58ebef6ecd 100644
> > >> >--- a/sysdeps/arm/dl-machine.h
> > >> >+++ b/sysdeps/arm/dl-machine.h
> > >> >@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr
> > >> >*ehdr)
> > >> > }
> > >> >
> > >> > /* Return the run-time load address of the shared object.  */
> > >> >-static inline ElfW(Addr) __attribute__ ((unused))
> > >> >+static inline Elf32_Addr __attribute__ ((unused))
> > >> > elf_machine_load_address (void)
> > >> > {
> > >> >-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
> > >> >-  return (ElfW(Addr)) &__ehdr_start;
> > >> >+  Elf32_Addr pcrel_addr;
> > >> >+#ifdef SHARED
> > >> >+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");
> > >> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;
> > >> >+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
> > >> >+#else
> > >> >+  extern Elf32_Addr __dl_relocate_static_pie (void *)
> > >> >+    asm ("_dl_relocate_static_pie") attribute_hidden;
> > >> >+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;
> > >> >+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));
> > >> >+#endif
> > >> >+#ifdef __thumb__
> > >> >+  /* Clear the low bit of the function address.
> > >> >+
> > >> >+     NOTE: got_addr is from GOT table whose lsb is always set by
> > >> >linker if it's
> > >> >+     Thumb function address.  PCREL_ADDR comes from PC-relative
> > >> >calculation
> > >> >+     which will finish during assembling.  GAS assembler before
> > >> >the fix for
> > >> >+     PR gas/21458 was not setting the lsb but does after that.
> > >> >Always do the
> > >> >+     strip for both, so the code works with various combinations
> > >> >of glibc and
> > >> >+     Binutils.  */
> > >> >+  got_addr &= ~(Elf32_Addr) 1;
> > >> >+  pcrel_addr &= ~(Elf32_Addr) 1;
> > >> >+#endif
> > >> >+  return pcrel_addr - got_addr;
> > >> > }
> > >> >
> > >> > /* Return the link-time address of _DYNAMIC.  */
> > >> >--
> > >> >2.20.1
> > >> >
> > >
> > >Note:
> > >
> > >[*] - the QEMU debugging is pretty difficult with /sbin/init. Qemu
> > >has good support for kernel debugging (-s -S switches). Then for
> > >user space program one could use the "on-board" gdb or gdbserver.
> > >
> > >However, /sbin/init needs to be debugged from linux and scheduler
> > >context switches needs to be taken into account to review how the
> > >code is executed. And such debugging scenario has issue with
> > >QEMU-arm.
> > >
> > >
> > >Best regards,
> > >
> > >Lukasz Majewski
> > >
> > >--
> > >
> > >DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> > >HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> > >Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> > >lukma@denx.de
> >
> >
>
>
>
>
> Best regards,
>
> Lukasz Majewski
>
> --
>
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Adhemerval Zanella Netto Sept. 8, 2021, 7:19 p.m. UTC | #6
On 08/09/2021 12:05, Lukasz Majewski wrote:
> 
> The r1 gets the 0xffffee80 (negative offset) value. It is then added to pc
> and used to calculate r2.
> 
> For working code (with this patch applied) - there are NO such large
> values (like aforementioned 0xffffee80). The arithmetic is done on 
> 
>    1690:       00000020        .word   0x00000020
>    1694:       0002be7e        .word   0x0002be7e
> 
> which seems to work.
> 
> This shouldn't be a problem as with U2 the arithmetic shall work.
> However, I've noticed (with passing LD_DEBUG=all) that the
> ld-linux-armhf.so.3 (and init) are relocated twice before execution.
> 
> Why do we need to relocate it?
> 
> Another question is why on this particular case the large (i.e.
> negative) offset matters?

I think it is highly unlikely the negative offset plays any role here.
Do you have a working example to trigger this issue?  I am currently
testing arm for different compilers (gcc 6.2, gcc 10, gcc 11) and
with different configurations (armv5, armv6, armv7, with and without
thumb) and I haven't see any issue so far.

It might binutils related, which version are you using?
  
Lukasz Majewski Sept. 8, 2021, 8:34 p.m. UTC | #7
Hi Adhemerval,

> On 08/09/2021 12:05, Lukasz Majewski wrote:
> > 
> > The r1 gets the 0xffffee80 (negative offset) value. It is then
> > added to pc and used to calculate r2.
> > 
> > For working code (with this patch applied) - there are NO such large
> > values (like aforementioned 0xffffee80). The arithmetic is done on 
> > 
> >    1690:       00000020        .word   0x00000020
> >    1694:       0002be7e        .word   0x0002be7e
> > 
> > which seems to work.
> > 
> > This shouldn't be a problem as with U2 the arithmetic shall work.
> > However, I've noticed (with passing LD_DEBUG=all) that the
> > ld-linux-armhf.so.3 (and init) are relocated twice before execution.
> > 
> > Why do we need to relocate it?
> > 
> > Another question is why on this particular case the large (i.e.
> > negative) offset matters?  
> 
> I think it is highly unlikely the negative offset plays any role here.
> Do you have a working example to trigger this issue? 

I'm just using QEMU (runqemu -d y2038-image-devel nographic) from
meta-y2038 build.

Recent versions: 
poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1
qemu_6.0.0
binutils_2.37
gcc_11.2

The QEMU board uses Cortex-A9 core.

> I am currently
> testing arm for different compilers (gcc 6.2, gcc 10, gcc 11) and
> with different configurations (armv5, armv6, armv7, with and without
> thumb) and I haven't see any issue so far.
> 
> It might binutils related, which version are you using?




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Lukasz Majewski Sept. 9, 2021, 7:18 a.m. UTC | #8
On Wed, 8 Sep 2021 22:34:21 +0200
Lukasz Majewski <lukma@denx.de> wrote:

> Hi Adhemerval,
> 
> > On 08/09/2021 12:05, Lukasz Majewski wrote:  
> > > 
> > > The r1 gets the 0xffffee80 (negative offset) value. It is then
> > > added to pc and used to calculate r2.
> > > 
> > > For working code (with this patch applied) - there are NO such
> > > large values (like aforementioned 0xffffee80). The arithmetic is
> > > done on 
> > > 
> > >    1690:       00000020        .word   0x00000020
> > >    1694:       0002be7e        .word   0x0002be7e
> > > 
> > > which seems to work.
> > > 
> > > This shouldn't be a problem as with U2 the arithmetic shall work.
> > > However, I've noticed (with passing LD_DEBUG=all) that the
> > > ld-linux-armhf.so.3 (and init) are relocated twice before
> > > execution.
> > > 
> > > Why do we need to relocate it?
> > > 
> > > Another question is why on this particular case the large (i.e.
> > > negative) offset matters?    
> > 
> > I think it is highly unlikely the negative offset plays any role
> > here. Do you have a working example to trigger this issue?   
> 
> I'm just using QEMU (runqemu -d y2038-image-devel nographic) from
> meta-y2038 build.
> 
> Recent versions: 
> poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1
> qemu_6.0.0
> binutils_2.37
> gcc_11.2
> 
> The QEMU board uses Cortex-A9 core.

I've updated the poky to be the newest -master:
master:50293eec9a7d0602b748220ab100b070b8d3432a

No change.

I will try the same setup with Cortex-A5 core - maybe there is some
kind of subtle handling of such emulation in qemu?

> 
> > I am currently
> > testing arm for different compilers (gcc 6.2, gcc 10, gcc 11) and
> > with different configurations (armv5, armv6, armv7, with and without
> > thumb) and I haven't see any issue so far.
> > 
> > It might binutils related, which version are you using?  
> 
> 
> 
> 
> Best regards,
> 
> Lukasz Majewski
> 
> --
> 
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@denx.de




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Lukasz Majewski Sept. 9, 2021, 9:49 a.m. UTC | #9
On Thu, 9 Sep 2021 09:18:06 +0200
Lukasz Majewski <lukma@denx.de> wrote:

> On Wed, 8 Sep 2021 22:34:21 +0200
> Lukasz Majewski <lukma@denx.de> wrote:
> 
> > Hi Adhemerval,
> >   
> > > On 08/09/2021 12:05, Lukasz Majewski wrote:    
> > > > 
> > > > The r1 gets the 0xffffee80 (negative offset) value. It is then
> > > > added to pc and used to calculate r2.
> > > > 
> > > > For working code (with this patch applied) - there are NO such
> > > > large values (like aforementioned 0xffffee80). The arithmetic is
> > > > done on 
> > > > 
> > > >    1690:       00000020        .word   0x00000020
> > > >    1694:       0002be7e        .word   0x0002be7e
> > > > 
> > > > which seems to work.
> > > > 
> > > > This shouldn't be a problem as with U2 the arithmetic shall
> > > > work. However, I've noticed (with passing LD_DEBUG=all) that the
> > > > ld-linux-armhf.so.3 (and init) are relocated twice before
> > > > execution.
> > > > 
> > > > Why do we need to relocate it?
> > > > 
> > > > Another question is why on this particular case the large (i.e.
> > > > negative) offset matters?      
> > > 
> > > I think it is highly unlikely the negative offset plays any role
> > > here. Do you have a working example to trigger this issue?     
> > 
> > I'm just using QEMU (runqemu -d y2038-image-devel nographic) from
> > meta-y2038 build.
> > 
> > Recent versions: 
> > poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1
> > qemu_6.0.0
> > binutils_2.37
> > gcc_11.2
> > 
> > The QEMU board uses Cortex-A9 core.  
> 
> I've updated the poky to be the newest -master:
> master:50293eec9a7d0602b748220ab100b070b8d3432a
> 
> No change.
> 
> I will try the same setup with Cortex-A5 core - maybe there is some
> kind of subtle handling of such emulation in qemu?

No change. On the qemuarm yocto/OE MACHINE the same situation is
observed.

After applying this patch the board works without issues.

> 
> >   
> > > I am currently
> > > testing arm for different compilers (gcc 6.2, gcc 10, gcc 11) and
> > > with different configurations (armv5, armv6, armv7, with and
> > > without thumb) and I haven't see any issue so far.
> > > 
> > > It might binutils related, which version are you using?    
> > 
> > 
> > 
> > 
> > Best regards,
> > 
> > Lukasz Majewski
> > 
> > --
> > 
> > DENX Software Engineering GmbH,      Managing Director: Wolfgang
> > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,
> > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> > lukma@denx.de  
> 
> 
> 
> 
> Best regards,
> 
> Lukasz Majewski
> 
> --
> 
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@denx.de




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Lukasz Majewski Sept. 10, 2021, 10:10 a.m. UTC | #10
On Thu, 9 Sep 2021 11:49:36 +0200
Lukasz Majewski <lukma@denx.de> wrote:

> On Thu, 9 Sep 2021 09:18:06 +0200
> Lukasz Majewski <lukma@denx.de> wrote:
> 
> > On Wed, 8 Sep 2021 22:34:21 +0200
> > Lukasz Majewski <lukma@denx.de> wrote:
> >   
> > > Hi Adhemerval,
> > >     
> > > > On 08/09/2021 12:05, Lukasz Majewski wrote:      
> > > > > 
> > > > > The r1 gets the 0xffffee80 (negative offset) value. It is then
> > > > > added to pc and used to calculate r2.
> > > > > 
> > > > > For working code (with this patch applied) - there are NO such
> > > > > large values (like aforementioned 0xffffee80). The arithmetic
> > > > > is done on 
> > > > > 
> > > > >    1690:       00000020        .word   0x00000020
> > > > >    1694:       0002be7e        .word   0x0002be7e
> > > > > 
> > > > > which seems to work.
> > > > > 
> > > > > This shouldn't be a problem as with U2 the arithmetic shall
> > > > > work. However, I've noticed (with passing LD_DEBUG=all) that
> > > > > the ld-linux-armhf.so.3 (and init) are relocated twice before
> > > > > execution.
> > > > > 
> > > > > Why do we need to relocate it?
> > > > > 
> > > > > Another question is why on this particular case the large
> > > > > (i.e. negative) offset matters?        
> > > > 
> > > > I think it is highly unlikely the negative offset plays any role
> > > > here. Do you have a working example to trigger this issue?
> > > >  
> > > 
> > > I'm just using QEMU (runqemu -d y2038-image-devel nographic) from
> > > meta-y2038 build.
> > > 
> > > Recent versions: 
> > > poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1
> > > qemu_6.0.0
> > > binutils_2.37
> > > gcc_11.2
> > > 
> > > The QEMU board uses Cortex-A9 core.    
> > 
> > I've updated the poky to be the newest -master:
> > master:50293eec9a7d0602b748220ab100b070b8d3432a
> > 
> > No change.
> > 
> > I will try the same setup with Cortex-A5 core - maybe there is some
> > kind of subtle handling of such emulation in qemu?  
> 
> No change. On the qemuarm yocto/OE MACHINE the same situation is
> observed.
> 

Do you have any more ideas where to look for solution of this problem?

> After applying this patch the board works without issues.
> 
> >   
> > >     
> > > > I am currently
> > > > testing arm for different compilers (gcc 6.2, gcc 10, gcc 11)
> > > > and with different configurations (armv5, armv6, armv7, with and
> > > > without thumb) and I haven't see any issue so far.
> > > > 
> > > > It might binutils related, which version are you using?      
> > > 
> > > 
> > > 
> > > 
> > > Best regards,
> > > 
> > > Lukasz Majewski
> > > 
> > > --
> > > 
> > > DENX Software Engineering GmbH,      Managing Director: Wolfgang
> > > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,
> > > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> > > lukma@denx.de    
> > 
> > 
> > 
> > 
> > Best regards,
> > 
> > Lukasz Majewski
> > 
> > --
> > 
> > DENX Software Engineering GmbH,      Managing Director: Wolfgang
> > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,
> > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> > lukma@denx.de  
> 
> 
> 
> 
> Best regards,
> 
> Lukasz Majewski
> 
> --
> 
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@denx.de




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Lukasz Majewski Sept. 17, 2021, 8:29 a.m. UTC | #11
Dear Community,

> On Thu, 9 Sep 2021 11:49:36 +0200
> Lukasz Majewski <lukma@denx.de> wrote:
> 
> > On Thu, 9 Sep 2021 09:18:06 +0200
> > Lukasz Majewski <lukma@denx.de> wrote:
> >   
> > > On Wed, 8 Sep 2021 22:34:21 +0200
> > > Lukasz Majewski <lukma@denx.de> wrote:
> > >     
> > > > Hi Adhemerval,
> > > >       
> > > > > On 08/09/2021 12:05, Lukasz Majewski wrote:        
> > > > > > 
> > > > > > The r1 gets the 0xffffee80 (negative offset) value. It is
> > > > > > then added to pc and used to calculate r2.
> > > > > > 
> > > > > > For working code (with this patch applied) - there are NO
> > > > > > such large values (like aforementioned 0xffffee80). The
> > > > > > arithmetic is done on 
> > > > > > 
> > > > > >    1690:       00000020        .word   0x00000020
> > > > > >    1694:       0002be7e        .word   0x0002be7e
> > > > > > 
> > > > > > which seems to work.
> > > > > > 
> > > > > > This shouldn't be a problem as with U2 the arithmetic shall
> > > > > > work. However, I've noticed (with passing LD_DEBUG=all) that
> > > > > > the ld-linux-armhf.so.3 (and init) are relocated twice
> > > > > > before execution.
> > > > > > 
> > > > > > Why do we need to relocate it?
> > > > > > 
> > > > > > Another question is why on this particular case the large
> > > > > > (i.e. negative) offset matters?          
> > > > > 
> > > > > I think it is highly unlikely the negative offset plays any
> > > > > role here. Do you have a working example to trigger this
> > > > > issue? 
> > > > 
> > > > I'm just using QEMU (runqemu -d y2038-image-devel nographic)
> > > > from meta-y2038 build.
> > > > 
> > > > Recent versions: 
> > > > poky: SHA1: 1e2e9a84d6dd81d7f6dd69c0d119d0149d10ade1
> > > > qemu_6.0.0
> > > > binutils_2.37
> > > > gcc_11.2
> > > > 
> > > > The QEMU board uses Cortex-A9 core.      
> > > 
> > > I've updated the poky to be the newest -master:
> > > master:50293eec9a7d0602b748220ab100b070b8d3432a
> > > 
> > > No change.
> > > 
> > > I will try the same setup with Cortex-A5 core - maybe there is
> > > some kind of subtle handling of such emulation in qemu?    
> > 
> > No change. On the qemuarm yocto/OE MACHINE the same situation is
> > observed.
> >   
> 
> Do you have any more ideas where to look for solution of this problem?

Can we make a decision regarding this fix?

The only finding, which I get, is the large negative offset added to the
pc in the glibc broken (current master) code, which causes QEMU system
to fail. 

I've shared this problem with the QEMU community [1], but no reply
received till today.


Shall we:

1. Apply this patch or

2. Revert the conversion (whole) patch [2] for ARM.

3. Wait till QEMU response for the problem?



Links:

[1] -
https://lists.nongnu.org/archive/html/qemu-devel/2021-09/msg03332.html

[2] - SHA1: bca0f5cbc9257c13322b99e55235c4f21ba0bd82

> 
> > After applying this patch the board works without issues.
> >   
> > >     
> > > >       
> > > > > I am currently
> > > > > testing arm for different compilers (gcc 6.2, gcc 10, gcc 11)
> > > > > and with different configurations (armv5, armv6, armv7, with
> > > > > and without thumb) and I haven't see any issue so far.
> > > > > 
> > > > > It might binutils related, which version are you using?
> > > > >  
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Best regards,
> > > > 
> > > > Lukasz Majewski
> > > > 
> > > > --
> > > > 
> > > > DENX Software Engineering GmbH,      Managing Director: Wolfgang
> > > > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194
> > > > Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax:
> > > > (+49)-8142-66989-80 Email: lukma@denx.de      
> > > 
> > > 
> > > 
> > > 
> > > Best regards,
> > > 
> > > Lukasz Majewski
> > > 
> > > --
> > > 
> > > DENX Software Engineering GmbH,      Managing Director: Wolfgang
> > > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,
> > > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> > > lukma@denx.de    
> > 
> > 
> > 
> > 
> > Best regards,
> > 
> > Lukasz Majewski
> > 
> > --
> > 
> > DENX Software Engineering GmbH,      Managing Director: Wolfgang
> > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,
> > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> > lukma@denx.de  
> 
> 
> 
> 
> Best regards,
> 
> Lukasz Majewski
> 
> --
> 
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@denx.de




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Joseph Myers Sept. 17, 2021, 1:27 p.m. UTC | #12
On Fri, 17 Sep 2021, Lukasz Majewski wrote:

> Can we make a decision regarding this fix?

I don't think we've yet seen a thorough analysis of the issue.

Involvement of QEMU is probably not relevant.  Either the dynamic linker 
binary generated is correct, according to the instruction semantics in the 
Arm ARM and the relocation semantics in AAELF, or it isn't.  If it's 
incorrect, either the (static) linker inputs are correct and there's a 
linker bug, or the linker inputs are incorrect ("correct" here might be a 
bit fuzzier, involving things that are not fully specified).

So start from working out whether the generated binary is correct or not, 
with an appropriate description of the relevant instruction sequences 
executed and why they do or do not work in each case.

QEMU only becomes relevant if you have a binary that works when executed 
on hardware but not on QEMU (or vice versa).

(a) Do you have such a binary working on hardware but not on QEMU?

(b) Have you tested using the same binaries with both QEMU and hardware?

(c) Have you tested with the glibc testsuite, using a prebuilt system 
image with known good glibc rather than building init with the new QEMU?  
That's a better starting point than building a whole system with the new 
glibc, though if the result is "all tests fail execution" you may not be 
much further forward (but at least you can run the dynamic linker, good 
and bad versions, under gdbserver, and step individual instructions to see 
exactly what happens when the problem code is executed).

(d) Likewise, but with --enable-hardcoded-path-in-tests?  The point of 
this question is that one thing that's different by default between a 
glibc testsuite run and running binaries outside the glibc testsuite is 
whether new binaries are run directly or via ld.so --library-path.  It's 
possible some dynamic linker bugs might show up in one configuration but 
not the other.  So if the glibc testsuite passes in a default 
configuration (which I assume it did, if the submitter of the original 
patch tested it properly), but init fails in a newly built system image, 
one possible explanation would be such a bug - and running the glibc 
testsuite with --enable-hardcoded-path-in-tests is one way of checking for 
that kind of issue.
  
Andreas Schwab Sept. 17, 2021, 4:17 p.m. UTC | #13
On Sep 17 2021, Joseph Myers wrote:

> QEMU only becomes relevant if you have a binary that works when executed 
> on hardware but not on QEMU (or vice versa).

Given that glibc is working fine on hardware, it is unlikely a bug in
glibc.

Andreas.
  
Lukasz Majewski Sept. 26, 2021, 7:58 p.m. UTC | #14
Hi Joseph,

Thanks for your input.

> On Fri, 17 Sep 2021, Lukasz Majewski wrote:
> 
> > Can we make a decision regarding this fix?  
> 
> I don't think we've yet seen a thorough analysis of the issue.
> 
> Involvement of QEMU is probably not relevant.  Either the dynamic
> linker binary generated is correct, according to the instruction
> semantics in the Arm ARM and the relocation semantics in AAELF, or it
> isn't.  If it's incorrect, either the (static) linker inputs are
> correct and there's a linker bug, or the linker inputs are incorrect
> ("correct" here might be a bit fuzzier, involving things that are not
> fully specified).
> 
> So start from working out whether the generated binary is correct or
> not, with an appropriate description of the relevant instruction
> sequences executed and why they do or do not work in each case.
> 
> QEMU only becomes relevant if you have a binary that works when
> executed on hardware but not on QEMU (or vice versa).
> 
> (a) Do you have such a binary working on hardware but not on QEMU?

Yes. I can run Beagle Bone generated image on the HW (without the fix),
but it breaks down on QEMU.

> 
> (b) Have you tested using the same binaries with both QEMU and
> hardware?

Yes. OOPs are only present on the QEMU.

> 
> (c) Have you tested with the glibc testsuite, using a prebuilt system 
> image with known good glibc rather than building init with the new
> QEMU?

No, not yet. I try to debug working vs broken ld-linux.so on working
QEMU setup with LD_PRELOAD.

> That's a better starting point than building a whole system
> with the new glibc, though if the result is "all tests fail
> execution" you may not be much further forward (but at least you can
> run the dynamic linker, good and bad versions, under gdbserver, and
> step individual instructions to see exactly what happens when the
> problem code is executed).
> 
> (d) Likewise, but with --enable-hardcoded-path-in-tests?  The point
> of this question is that one thing that's different by default
> between a glibc testsuite run and running binaries outside the glibc
> testsuite is whether new binaries are run directly or via ld.so
> --library-path.  It's possible some dynamic linker bugs might show up
> in one configuration but not the other.  So if the glibc testsuite
> passes in a default configuration (which I assume it did, if the
> submitter of the original patch tested it properly), but init fails
> in a newly built system image, one possible explanation would be such
> a bug - and running the glibc testsuite with
> --enable-hardcoded-path-in-tests is one way of checking for that kind
> of issue.
> 




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Joseph Myers Sept. 27, 2021, 4 p.m. UTC | #15
On Sun, 26 Sep 2021, Lukasz Majewski wrote:

> > QEMU only becomes relevant if you have a binary that works when
> > executed on hardware but not on QEMU (or vice versa).
> > 
> > (a) Do you have such a binary working on hardware but not on QEMU?
> 
> Yes. I can run Beagle Bone generated image on the HW (without the fix),
> but it breaks down on QEMU.

If this is system QEMU emulation, that strongly suggests a QEMU bug, not a 
glibc bug - meaning you should find where the execution diverges between 
QEMU and hardware to identify the mis-emulated instruction.

(For QEMU usermode emulation, different address space layout compared to 
running natively under the Linux kernel could be an issue.)
  
Lukasz Majewski Oct. 5, 2021, 7:45 a.m. UTC | #16
Hi Fangrui,

> >
> >I'm also wondering if similar issue is visible with "normal" - i.e.
> >not QEMU ARM run init.
> >
> >Maybe the "rough" environment in which /sbin/init is run is the
> >culprit?
> >  
> 
> I had tested running ld.so and elf/ldconfig which covered the usage.
> 
> I don't know. It needs some debugging from you:)

Could you share on which hardware (ARM) you run this change?


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Fangrui Song Oct. 6, 2021, 7:57 a.m. UTC | #17
On 2021-10-05, Lukasz Majewski wrote:
>Hi Fangrui,
>
>> >
>> >I'm also wondering if similar issue is visible with "normal" - i.e.
>> >not QEMU ARM run init.
>> >
>> >Maybe the "rough" environment in which /sbin/init is run is the
>> >culprit?
>> >
>>
>> I had tested running ld.so and elf/ldconfig which covered the usage.
>>
>> I don't know. It needs some debugging from you:)
>
>Could you share on which hardware (ARM) you run this change?
>
>
>Best regards,
>
>Lukasz Majewski

I ran ld.so and elf/ldconfig with qemu-arm-static (built from qemu/linux-user) (via binfmt_misc) on x86-64 Linux (Debian testing).
I don't have real ARM hardware.
  
Lukasz Majewski Oct. 6, 2021, 9:03 a.m. UTC | #18
Hi Fangrui,

> On 2021-10-05, Lukasz Majewski wrote:
> >Hi Fangrui,
> >  
> >> >
> >> >I'm also wondering if similar issue is visible with "normal" -
> >> >i.e. not QEMU ARM run init.
> >> >
> >> >Maybe the "rough" environment in which /sbin/init is run is the
> >> >culprit?
> >> >  
> >>
> >> I had tested running ld.so and elf/ldconfig which covered the
> >> usage.
> >>
> >> I don't know. It needs some debugging from you:)  
> >
> >Could you share on which hardware (ARM) you run this change?
> >
> >
> >Best regards,
> >
> >Lukasz Majewski  
> 

Thanks for the info.

> I ran ld.so and elf/ldconfig with qemu-arm-static (built from

Ok, so this was only the user mode emulation, not qemu-system-arm ?

> qemu/linux-user) (via binfmt_misc) on x86-64 Linux (Debian testing).
> I don't have real ARM hardware.

The built image for BBB (Beagle Bone Black) also shows OOPs on the real
HW.

I'm debugging this now and share results asap.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Lukasz Majewski Oct. 6, 2021, 11:43 a.m. UTC | #19
Dear Community,

Please find in-depth analyze about the issue:

It was tested with Beagle Bone Black (BBB) and QEMU (the same binary
rootfs images).
(If needed I will upload images and script to run QEMU to some server
for reproduction).
Branch: https://github.com/lmajewski/y2038_glibc/commits/y2038_edge

On working setup to trigger the core dump:
/home/root/ld-linux-armhf.so.3 /sbin/init
gdb ./ld-linux-armhf.so.3 core

(and the /home/root/ld-linux-armhf.so.3 is the "broken" one).


Not working (patch [1] not applied):
====================================

All the code is located in _dl_start in elf/rtld.c and
elf/get-dynamic-info.h files:

(gdb) p/x $r5
$46 = 0xb6fc8000
r5 is set from the elf_machine_load_address()

Then we enter the elf_get_dynamic_info()

0xb6fc95fc      99            ADJUST_DYN_INFO (DT_SYMTAB);
   0xb6fc95f8 <_dl_start+308>:  04 30 92 15     ldrne   r3, [r2, #4]
=> 0xb6fc95fc <_dl_start+312>:  05 30 83 10     addne   r3, r3, r5
   0xb6fc9600 <_dl_start+316>:  04 30 82 15     strne   r3, [r2, #4]
(gdb) p/x $r3
$63 = 0x410003f4
(gdb) p/x $r5
$64 = 0xb6fc8000
(gdb) si
0xb6fc9600      99            ADJUST_DYN_INFO (DT_SYMTAB);
   0xb6fc95f8 <_dl_start+308>:  04 30 92 15     ldrne   r3, [r2, #4]
   0xb6fc95fc <_dl_start+312>:  05 30 83 10     addne   r3, r3, r5
=> 0xb6fc9600 <_dl_start+316>:  04 30 82 15     strne   r3, [r2, #4]
(gdb) p/x $r3
$65 = 0xf7fc83f4

The above address is the kernel space one - so accessing it causes:
"Program received signal SIGSEGV, Segmentation fault."

(gdb) p/x $r3
$55 = 0xf7fc88e4
112                 DO_ELF_MACHINE_REL_RELATIVE (map, l_addr, relative);
=> 0xb6fc979c <_dl_start+728>:  08 10 13 e5     ldr     r1, [r3, #-8]


The problem is that elf_machine_load_address() returns wrong value - or
maybe to say it differently - the one used afterwards by glibc is wrong.

The correct one is, which corresponds to using 'adr' command (with
patch [1] applied):

0xb6fc9508 in _dl_start (arg=0xbefffdf0) at rtld.c:545
545       bootstrap_map.l_addr = elf_machine_load_address ();
=> 0xb6fc9508 <_dl_start+68>:   98 55 81 e5     str     r5, [r1, #1432]
; 0x598 (gdb) p/x $r5
$12 = 0x75fc8000


Another issue is the way the assembler code is generated. With the
working code one would have:

(gdb) si
0xb6fc9608      99            ADJUST_DYN_INFO (DT_SYMTAB);
   0xb6fc9600 <_dl_start+316>:  00 37 9f e5     ldr     r3, [pc, #1792]
; 0xb6fc9604 <_dl_start+320>:  03 30 8f e0     add     r3, pc, r3
=> 0xb6fc9608 <_dl_start+324>:  d0 35 93 e5     ldr     r3, [r3, #1488]
; 0x5d0 0xb6fc960c <_dl_start+328>:  00 00 53 e3     cmp     r3, #0

(gdb) p/x $r3
$13 = 0xb6fff010
(gdb) p/x $pc
$14 = 0xb6fc9608
(gdb) 

In the above example the $r3 value is relative to $pc and $r5 is not
used at all.


The code which fails now looks like:
112                 DO_ELF_MACHINE_REL_RELATIVE (map, l_addr, relative);
=> 0xb6fc9870 <_dl_start+940>:  08 10 13 e5     ldr     r1, [r3, #-8]

(gdb) p/x l_addr
$15 = 0x75fc8000

and everything works as expected.


Questions:
==========

1. I guess that different addressing assembler generated ($r3
calculation) is related to the address to be relocated (when the offset
is too large the register is used - not the $pc itself).

2. I don't understand why values provided by elf_machine_load_address()
differ? I would expect that those are equal.

And help and input appreciated.





Links:
[1] -
https://github.com/lmajewski/y2038_glibc/commit/e67e0f589b88a44be8f8b9b770b08950dd7e5bd5

readelf -e ld-linux-armhf.so.3 

[10] .plt              PROGBITS        41000994 000994 000050 04  AX  0   0  4
[11] .text             PROGBITS        41000a00 000a00 01fed0 00  AX  0   0 64
[12] .rodata           PROGBITS        410208d0 0208d0 004b24 00   A  0   0  4
[13] .ARM.extab        PROGBITS        410253f4 0253f4 000018 00   A  0   0  4
[14] .ARM.exidx        ARM_EXIDX       4102540c 02540c 0000c8 00  AL 11   0  4
[15] .data.rel.ro      PROGBITS        41036200 026200 000cf4 00  WA  0   0  8
[16] .dynamic          DYNAMIC         41036ef4 026ef4 0000c8 08  WA  5   0  4
[17] .got              PROGBITS        41036fbc 026fbc 000040 04  WA  0   0  4


> Hi Fangrui,
> 
> > On 2021-10-05, Lukasz Majewski wrote:  
> > >Hi Fangrui,
> > >    
> > >> >
> > >> >I'm also wondering if similar issue is visible with "normal" -
> > >> >i.e. not QEMU ARM run init.
> > >> >
> > >> >Maybe the "rough" environment in which /sbin/init is run is the
> > >> >culprit?
> > >> >    
> > >>
> > >> I had tested running ld.so and elf/ldconfig which covered the
> > >> usage.
> > >>
> > >> I don't know. It needs some debugging from you:)    
> > >
> > >Could you share on which hardware (ARM) you run this change?
> > >
> > >
> > >Best regards,
> > >
> > >Lukasz Majewski    
> >   
> 
> Thanks for the info.
> 
> > I ran ld.so and elf/ldconfig with qemu-arm-static (built from  
> 
> Ok, so this was only the user mode emulation, not qemu-system-arm ?
> 
> > qemu/linux-user) (via binfmt_misc) on x86-64 Linux (Debian testing).
> > I don't have real ARM hardware.  
> 
> The built image for BBB (Beagle Bone Black) also shows OOPs on the
> real HW.
> 
> I'm debugging this now and share results asap.
> 
> 
> Best regards,
> 
> Lukasz Majewski
> 
> --
> 
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@denx.de




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Szabolcs Nagy Oct. 6, 2021, 12:55 p.m. UTC | #20
The 10/06/2021 13:43, Lukasz Majewski wrote:
> Please find in-depth analyze about the issue:
> 
> It was tested with Beagle Bone Black (BBB) and QEMU (the same binary
> rootfs images).
> (If needed I will upload images and script to run QEMU to some server
> for reproduction).
> Branch: https://github.com/lmajewski/y2038_glibc/commits/y2038_edge

i think it is easier to look at if you upload the broken
ld.so binary. or at least readelf -aW ld.so output.

> On working setup to trigger the core dump:
> /home/root/ld-linux-armhf.so.3 /sbin/init
> gdb ./ld-linux-armhf.so.3 core
> 
> (and the /home/root/ld-linux-armhf.so.3 is the "broken" one).
> 
> 
> Not working (patch [1] not applied):
> ====================================
> 
> All the code is located in _dl_start in elf/rtld.c and
> elf/get-dynamic-info.h files:
> 
> (gdb) p/x $r5
> $46 = 0xb6fc8000
> r5 is set from the elf_machine_load_address()
> 
> Then we enter the elf_get_dynamic_info()
> 
> 0xb6fc95fc      99            ADJUST_DYN_INFO (DT_SYMTAB);
>    0xb6fc95f8 <_dl_start+308>:  04 30 92 15     ldrne   r3, [r2, #4]
> => 0xb6fc95fc <_dl_start+312>:  05 30 83 10     addne   r3, r3, r5
>    0xb6fc9600 <_dl_start+316>:  04 30 82 15     strne   r3, [r2, #4]
> (gdb) p/x $r3
> $63 = 0x410003f4
> (gdb) p/x $r5
> $64 = 0xb6fc8000

it seems r5 is already wrong here, it's not the runtime
address of 0. (looks more like the runtime address of
0x41000000)

likely something goes wrong with the computation of r5.

> Links:
> [1] -
> https://github.com/lmajewski/y2038_glibc/commit/e67e0f589b88a44be8f8b9b770b08950dd7e5bd5
> 
> readelf -e ld-linux-armhf.so.3 
> 
> [10] .plt              PROGBITS        41000994 000994 000050 04  AX  0   0  4
> [11] .text             PROGBITS        41000a00 000a00 01fed0 00  AX  0   0 64
> [12] .rodata           PROGBITS        410208d0 0208d0 004b24 00   A  0   0  4
> [13] .ARM.extab        PROGBITS        410253f4 0253f4 000018 00   A  0   0  4
> [14] .ARM.exidx        ARM_EXIDX       4102540c 02540c 0000c8 00  AL 11   0  4
> [15] .data.rel.ro      PROGBITS        41036200 026200 000cf4 00  WA  0   0  8
> [16] .dynamic          DYNAMIC         41036ef4 026ef4 0000c8 08  WA  5   0  4
> [17] .got              PROGBITS        41036fbc 026fbc 000040 04  WA  0   0  4

why are all addresses >0x41000000 ?
in a shared library i expect all those addresses
to be close to 0.

is this made by some modified binutils?
  
Lukasz Majewski Oct. 7, 2021, 9:19 a.m. UTC | #21
Hi Szabolcs,

> The 10/06/2021 13:43, Lukasz Majewski wrote:
> > Please find in-depth analyze about the issue:
> > 
> > It was tested with Beagle Bone Black (BBB) and QEMU (the same binary
> > rootfs images).
> > (If needed I will upload images and script to run QEMU to some
> > server for reproduction).
> > Branch: https://github.com/lmajewski/y2038_glibc/commits/y2038_edge
> >  
> 
> i think it is easier to look at if you upload the broken
> ld.so binary. or at least readelf -aW ld.so output.

Please find working and broken ld-linux-armhf.so.3:
https://owncloud.denx.de/s/wtAfktG6pXLffSA

> 
> > On working setup to trigger the core dump:
> > /home/root/ld-linux-armhf.so.3 /sbin/init
> > gdb ./ld-linux-armhf.so.3 core
> > 
> > (and the /home/root/ld-linux-armhf.so.3 is the "broken" one).
> > 
> > 
> > Not working (patch [1] not applied):
> > ====================================
> > 
> > All the code is located in _dl_start in elf/rtld.c and
> > elf/get-dynamic-info.h files:
> > 
> > (gdb) p/x $r5
> > $46 = 0xb6fc8000
> > r5 is set from the elf_machine_load_address()
> > 
> > Then we enter the elf_get_dynamic_info()
> > 
> > 0xb6fc95fc      99            ADJUST_DYN_INFO (DT_SYMTAB);
> >    0xb6fc95f8 <_dl_start+308>:  04 30 92 15     ldrne   r3, [r2,
> > #4] => 0xb6fc95fc <_dl_start+312>:  05 30 83 10     addne   r3, r3,
> > r5 0xb6fc9600 <_dl_start+316>:  04 30 82 15     strne   r3, [r2, #4]
> > (gdb) p/x $r3
> > $63 = 0x410003f4
> > (gdb) p/x $r5
> > $64 = 0xb6fc8000  
> 
> it seems r5 is already wrong here, it's not the runtime
> address of 0. (looks more like the runtime address of
> 0x41000000)

The r5 is set according to the change in patch which now I'm trying to
fix.

To be more specific - r5 is set according to code in patch SHA1:
bca0f5cbc9257c13322b99e55235c4f21ba0bd82

> 
> likely something goes wrong with the computation of r5.

Before the SHA1: bca0f5cbc9257c13322b99e55235c4f21ba0bd82 patch r5 was
computed with using 'adr' assembler instruction, not set from some
constant value.

asm ("adr %0, _dl_start" : "=r" (pcrel_addr));

And r5 value was 0x75fc8000

> 
> > Links:
> > [1] -
> > https://github.com/lmajewski/y2038_glibc/commit/e67e0f589b88a44be8f8b9b770b08950dd7e5bd5
> > 
> > readelf -e ld-linux-armhf.so.3 
> > 
> > [10] .plt              PROGBITS        41000994 000994 000050 04
> > AX  0   0  4 [11] .text             PROGBITS        41000a00 000a00
> > 01fed0 00  AX  0   0 64 [12] .rodata           PROGBITS
> > 410208d0 0208d0 004b24 00   A  0   0  4 [13] .ARM.extab
> > PROGBITS        410253f4 0253f4 000018 00   A  0   0  4 [14]
> > .ARM.exidx        ARM_EXIDX       4102540c 02540c 0000c8 00  AL 11
> >  0  4 [15] .data.rel.ro      PROGBITS        41036200 026200 000cf4
> > 00  WA  0   0  8 [16] .dynamic          DYNAMIC         41036ef4
> > 026ef4 0000c8 08  WA  5   0  4 [17] .got              PROGBITS
> >   41036fbc 026fbc 000040 04  WA  0   0  4  
> 
> why are all addresses >0x41000000 ?
> in a shared library i expect all those addresses
> to be close to 0.

On this real HW system (the rootfs which is running) - libc.so.6 also
has address > 0x41000000
libm.so.6 also has the value > 0x41200000
(Entry point address:               0x412c9190)

The offset of > 0x41000000 looks a bit strange indeed, but it is still
less than the kernel space. Moreover, with position independent code it
shall not matter.

> 
> is this made by some modified binutils?

I've double checked the ld-linux-armhf.so.3 and after build it has:
(Entry point address:               0xa00) which seems to be correct.

So it looks like during installation of the glibc (on the Yocto/OE
created rootfs) the elf header is modified and this 0x41000000 offset is
added.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Lukasz Majewski Oct. 7, 2021, 10 a.m. UTC | #22
On Thu, 7 Oct 2021 11:19:26 +0200
Lukasz Majewski <lukma@denx.de> wrote:

> Hi Szabolcs,
> 
> > The 10/06/2021 13:43, Lukasz Majewski wrote:  
> > > Please find in-depth analyze about the issue:
> > > 
> > > It was tested with Beagle Bone Black (BBB) and QEMU (the same
> > > binary rootfs images).
> > > (If needed I will upload images and script to run QEMU to some
> > > server for reproduction).
> > > Branch:
> > > https://github.com/lmajewski/y2038_glibc/commits/y2038_edge 
> > 
> > i think it is easier to look at if you upload the broken
> > ld.so binary. or at least readelf -aW ld.so output.  
> 
> Please find working and broken ld-linux-armhf.so.3:
> https://owncloud.denx.de/s/wtAfktG6pXLffSA
> 
> >   
> > > On working setup to trigger the core dump:
> > > /home/root/ld-linux-armhf.so.3 /sbin/init
> > > gdb ./ld-linux-armhf.so.3 core
> > > 
> > > (and the /home/root/ld-linux-armhf.so.3 is the "broken" one).
> > > 
> > > 
> > > Not working (patch [1] not applied):
> > > ====================================
> > > 
> > > All the code is located in _dl_start in elf/rtld.c and
> > > elf/get-dynamic-info.h files:
> > > 
> > > (gdb) p/x $r5
> > > $46 = 0xb6fc8000
> > > r5 is set from the elf_machine_load_address()
> > > 
> > > Then we enter the elf_get_dynamic_info()
> > > 
> > > 0xb6fc95fc      99            ADJUST_DYN_INFO (DT_SYMTAB);
> > >    0xb6fc95f8 <_dl_start+308>:  04 30 92 15     ldrne   r3, [r2,
> > > #4] => 0xb6fc95fc <_dl_start+312>:  05 30 83 10     addne   r3,
> > > r3, r5 0xb6fc9600 <_dl_start+316>:  04 30 82 15     strne   r3,
> > > [r2, #4] (gdb) p/x $r3
> > > $63 = 0x410003f4
> > > (gdb) p/x $r5
> > > $64 = 0xb6fc8000    
> > 
> > it seems r5 is already wrong here, it's not the runtime
> > address of 0. (looks more like the runtime address of
> > 0x41000000)  
> 
> The r5 is set according to the change in patch which now I'm trying to
> fix.
> 
> To be more specific - r5 is set according to code in patch SHA1:
> bca0f5cbc9257c13322b99e55235c4f21ba0bd82
> 
> > 
> > likely something goes wrong with the computation of r5.  
> 
> Before the SHA1: bca0f5cbc9257c13322b99e55235c4f21ba0bd82 patch r5 was
> computed with using 'adr' assembler instruction, not set from some
> constant value.
> 
> asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
> 
> And r5 value was 0x75fc8000
> 
> >   
> > > Links:
> > > [1] -
> > > https://github.com/lmajewski/y2038_glibc/commit/e67e0f589b88a44be8f8b9b770b08950dd7e5bd5
> > > 
> > > readelf -e ld-linux-armhf.so.3 
> > > 
> > > [10] .plt              PROGBITS        41000994 000994 000050 04
> > > AX  0   0  4 [11] .text             PROGBITS        41000a00
> > > 000a00 01fed0 00  AX  0   0 64 [12] .rodata           PROGBITS
> > > 410208d0 0208d0 004b24 00   A  0   0  4 [13] .ARM.extab
> > > PROGBITS        410253f4 0253f4 000018 00   A  0   0  4 [14]
> > > .ARM.exidx        ARM_EXIDX       4102540c 02540c 0000c8 00  AL 11
> > >  0  4 [15] .data.rel.ro      PROGBITS        41036200 026200
> > > 000cf4 00  WA  0   0  8 [16] .dynamic          DYNAMIC
> > > 41036ef4 026ef4 0000c8 08  WA  5   0  4 [17] .got
> > > PROGBITS 41036fbc 026fbc 000040 04  WA  0   0  4    
> > 
> > why are all addresses >0x41000000 ?
> > in a shared library i expect all those addresses
> > to be close to 0.  
> 
> On this real HW system (the rootfs which is running) - libc.so.6 also
> has address > 0x41000000
> libm.so.6 also has the value > 0x41200000
> (Entry point address:               0x412c9190)
> 
> The offset of > 0x41000000 looks a bit strange indeed, but it is still
> less than the kernel space. Moreover, with position independent code
> it shall not matter.
> 
> > 
> > is this made by some modified binutils?  
> 
> I've double checked the ld-linux-armhf.so.3 and after build it has:
> (Entry point address:               0xa00) which seems to be correct.
> 
> So it looks like during installation of the glibc (on the Yocto/OE
> created rootfs) the elf header is modified and this 0x41000000 offset
> is added.
> 

And indeed it is the case. Yocto/OE by default perform prelinking (use
prelink program) to speedup start time of dynamic program.

The prelink [1] itself assigns some virtual addresses to all required
shared objects (in our case for /sbin/init), so no clashes are
encountered.

And using prelink is a _default_ behaviour in Yocto/OE poky distro.

Links:
[1] - https://linux.die.net/man/8/prelink

> 
> Best regards,
> 
> Lukasz Majewski
> 
> --
> 
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@denx.de




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Szabolcs Nagy Oct. 7, 2021, 2:15 p.m. UTC | #23
The 10/07/2021 12:00, Lukasz Majewski wrote:
> On Thu, 7 Oct 2021 11:19:26 +0200
> Lukasz Majewski <lukma@denx.de> wrote:
> > Please find working and broken ld-linux-armhf.so.3:
> > https://owncloud.denx.de/s/wtAfktG6pXLffSA

thanks.

i can confirm that this crashes on an aarch64 machine too
(that supports running arm binaries), so not just a qemu
problem.


> > > > Links:
> > > > [1] -
> > > > https://github.com/lmajewski/y2038_glibc/commit/e67e0f589b88a44be8f8b9b770b08950dd7e5bd5
> > > > 
> > > > readelf -e ld-linux-armhf.so.3 
> > > > 
> > > > [10] .plt              PROGBITS        41000994 000994 000050 04
> > > > AX  0   0  4 [11] .text             PROGBITS        41000a00
> > > > 000a00 01fed0 00  AX  0   0 64 [12] .rodata           PROGBITS
> > > > 410208d0 0208d0 004b24 00   A  0   0  4 [13] .ARM.extab
> > > > PROGBITS        410253f4 0253f4 000018 00   A  0   0  4 [14]
> > > > .ARM.exidx        ARM_EXIDX       4102540c 02540c 0000c8 00  AL 11
> > > >  0  4 [15] .data.rel.ro      PROGBITS        41036200 026200
> > > > 000cf4 00  WA  0   0  8 [16] .dynamic          DYNAMIC
> > > > 41036ef4 026ef4 0000c8 08  WA  5   0  4 [17] .got
> > > > PROGBITS 41036fbc 026fbc 000040 04  WA  0   0  4    
> > > 
> > > why are all addresses >0x41000000 ?
> > > in a shared library i expect all those addresses
> > > to be close to 0.  
> > 
> > On this real HW system (the rootfs which is running) - libc.so.6 also
> > has address > 0x41000000
> > libm.so.6 also has the value > 0x41200000
> > (Entry point address:               0x412c9190)
> > 
> > The offset of > 0x41000000 looks a bit strange indeed, but it is still
> > less than the kernel space. Moreover, with position independent code
> > it shall not matter.
> > 
> > > 
> > > is this made by some modified binutils?  
> > 
> > I've double checked the ld-linux-armhf.so.3 and after build it has:
> > (Entry point address:               0xa00) which seems to be correct.
> > 
> > So it looks like during installation of the glibc (on the Yocto/OE
> > created rootfs) the elf header is modified and this 0x41000000 offset
> > is added.
> > 
> 
> And indeed it is the case. Yocto/OE by default perform prelinking (use
> prelink program) to speedup start time of dynamic program.
> 
> The prelink [1] itself assigns some virtual addresses to all required
> shared objects (in our case for /sbin/init), so no clashes are
> encountered.
> 
> And using prelink is a _default_ behaviour in Yocto/OE poky distro.
> 
> Links:
> [1] - https://linux.die.net/man/8/prelink

ah prelinking.

ok so &__ehdr_start is supposed to point to the
elf header, which supposed to start at address 0
within the elf image.

but with prelinking everthing is moved by a fixed
offset (including the elf header), so __ehdr_start
is no longer has 0 address.

i don't immediately know what's the best fix for this.
  
Adhemerval Zanella Netto Oct. 7, 2021, 2:16 p.m. UTC | #24
On 07/10/2021 07:00, Lukasz Majewski wrote:
> On Thu, 7 Oct 2021 11:19:26 +0200
> Lukasz Majewski <lukma@denx.de> wrote:
> 
>> Hi Szabolcs,
>>
>>> The 10/06/2021 13:43, Lukasz Majewski wrote:  
>>>> Please find in-depth analyze about the issue:
>>>>
>>>> It was tested with Beagle Bone Black (BBB) and QEMU (the same
>>>> binary rootfs images).
>>>> (If needed I will upload images and script to run QEMU to some
>>>> server for reproduction).
>>>> Branch:
>>>> https://github.com/lmajewski/y2038_glibc/commits/y2038_edge 
>>>
>>> i think it is easier to look at if you upload the broken
>>> ld.so binary. or at least readelf -aW ld.so output.  
>>
>> Please find working and broken ld-linux-armhf.so.3:
>> https://owncloud.denx.de/s/wtAfktG6pXLffSA
>>
>>>   
>>>> On working setup to trigger the core dump:
>>>> /home/root/ld-linux-armhf.so.3 /sbin/init
>>>> gdb ./ld-linux-armhf.so.3 core
>>>>
>>>> (and the /home/root/ld-linux-armhf.so.3 is the "broken" one).
>>>>
>>>>
>>>> Not working (patch [1] not applied):
>>>> ====================================
>>>>
>>>> All the code is located in _dl_start in elf/rtld.c and
>>>> elf/get-dynamic-info.h files:
>>>>
>>>> (gdb) p/x $r5
>>>> $46 = 0xb6fc8000
>>>> r5 is set from the elf_machine_load_address()
>>>>
>>>> Then we enter the elf_get_dynamic_info()
>>>>
>>>> 0xb6fc95fc      99            ADJUST_DYN_INFO (DT_SYMTAB);
>>>>    0xb6fc95f8 <_dl_start+308>:  04 30 92 15     ldrne   r3, [r2,
>>>> #4] => 0xb6fc95fc <_dl_start+312>:  05 30 83 10     addne   r3,
>>>> r3, r5 0xb6fc9600 <_dl_start+316>:  04 30 82 15     strne   r3,
>>>> [r2, #4] (gdb) p/x $r3
>>>> $63 = 0x410003f4
>>>> (gdb) p/x $r5
>>>> $64 = 0xb6fc8000    
>>>
>>> it seems r5 is already wrong here, it's not the runtime
>>> address of 0. (looks more like the runtime address of
>>> 0x41000000)  
>>
>> The r5 is set according to the change in patch which now I'm trying to
>> fix.
>>
>> To be more specific - r5 is set according to code in patch SHA1:
>> bca0f5cbc9257c13322b99e55235c4f21ba0bd82
>>
>>>
>>> likely something goes wrong with the computation of r5.  
>>
>> Before the SHA1: bca0f5cbc9257c13322b99e55235c4f21ba0bd82 patch r5 was
>> computed with using 'adr' assembler instruction, not set from some
>> constant value.
>>
>> asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
>>
>> And r5 value was 0x75fc8000
>>
>>>   
>>>> Links:
>>>> [1] -
>>>> https://github.com/lmajewski/y2038_glibc/commit/e67e0f589b88a44be8f8b9b770b08950dd7e5bd5
>>>>
>>>> readelf -e ld-linux-armhf.so.3 
>>>>
>>>> [10] .plt              PROGBITS        41000994 000994 000050 04
>>>> AX  0   0  4 [11] .text             PROGBITS        41000a00
>>>> 000a00 01fed0 00  AX  0   0 64 [12] .rodata           PROGBITS
>>>> 410208d0 0208d0 004b24 00   A  0   0  4 [13] .ARM.extab
>>>> PROGBITS        410253f4 0253f4 000018 00   A  0   0  4 [14]
>>>> .ARM.exidx        ARM_EXIDX       4102540c 02540c 0000c8 00  AL 11
>>>>  0  4 [15] .data.rel.ro      PROGBITS        41036200 026200
>>>> 000cf4 00  WA  0   0  8 [16] .dynamic          DYNAMIC
>>>> 41036ef4 026ef4 0000c8 08  WA  5   0  4 [17] .got
>>>> PROGBITS 41036fbc 026fbc 000040 04  WA  0   0  4    
>>>
>>> why are all addresses >0x41000000 ?
>>> in a shared library i expect all those addresses
>>> to be close to 0.  
>>
>> On this real HW system (the rootfs which is running) - libc.so.6 also
>> has address > 0x41000000
>> libm.so.6 also has the value > 0x41200000
>> (Entry point address:               0x412c9190)
>>
>> The offset of > 0x41000000 looks a bit strange indeed, but it is still
>> less than the kernel space. Moreover, with position independent code
>> it shall not matter.
>>
>>>
>>> is this made by some modified binutils?  
>>
>> I've double checked the ld-linux-armhf.so.3 and after build it has:
>> (Entry point address:               0xa00) which seems to be correct.
>>
>> So it looks like during installation of the glibc (on the Yocto/OE
>> created rootfs) the elf header is modified and this 0x41000000 offset
>> is added.
>>
> 
> And indeed it is the case. Yocto/OE by default perform prelinking (use
> prelink program) to speedup start time of dynamic program.
> 
> The prelink [1] itself assigns some virtual addresses to all required
> shared objects (in our case for /sbin/init), so no clashes are
> encountered.
> 
> And using prelink is a _default_ behaviour in Yocto/OE poky distro.

Does it work without prelink? Also, does it fail with prelink in real
hardware?

It indeed might be a prelink issue in fact.
  
H.J. Lu Oct. 7, 2021, 2:29 p.m. UTC | #25
On Thu, Oct 7, 2021 at 7:18 AM Adhemerval Zanella via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
>
>
> On 07/10/2021 07:00, Lukasz Majewski wrote:
> > On Thu, 7 Oct 2021 11:19:26 +0200
> > Lukasz Majewski <lukma@denx.de> wrote:
> >
> >> Hi Szabolcs,
> >>
> >>> The 10/06/2021 13:43, Lukasz Majewski wrote:
> >>>> Please find in-depth analyze about the issue:
> >>>>
> >>>> It was tested with Beagle Bone Black (BBB) and QEMU (the same
> >>>> binary rootfs images).
> >>>> (If needed I will upload images and script to run QEMU to some
> >>>> server for reproduction).
> >>>> Branch:
> >>>> https://github.com/lmajewski/y2038_glibc/commits/y2038_edge
> >>>
> >>> i think it is easier to look at if you upload the broken
> >>> ld.so binary. or at least readelf -aW ld.so output.
> >>
> >> Please find working and broken ld-linux-armhf.so.3:
> >> https://owncloud.denx.de/s/wtAfktG6pXLffSA
> >>
> >>>
> >>>> On working setup to trigger the core dump:
> >>>> /home/root/ld-linux-armhf.so.3 /sbin/init
> >>>> gdb ./ld-linux-armhf.so.3 core
> >>>>
> >>>> (and the /home/root/ld-linux-armhf.so.3 is the "broken" one).
> >>>>
> >>>>
> >>>> Not working (patch [1] not applied):
> >>>> ====================================
> >>>>
> >>>> All the code is located in _dl_start in elf/rtld.c and
> >>>> elf/get-dynamic-info.h files:
> >>>>
> >>>> (gdb) p/x $r5
> >>>> $46 = 0xb6fc8000
> >>>> r5 is set from the elf_machine_load_address()
> >>>>
> >>>> Then we enter the elf_get_dynamic_info()
> >>>>
> >>>> 0xb6fc95fc      99            ADJUST_DYN_INFO (DT_SYMTAB);
> >>>>    0xb6fc95f8 <_dl_start+308>:  04 30 92 15     ldrne   r3, [r2,
> >>>> #4] => 0xb6fc95fc <_dl_start+312>:  05 30 83 10     addne   r3,
> >>>> r3, r5 0xb6fc9600 <_dl_start+316>:  04 30 82 15     strne   r3,
> >>>> [r2, #4] (gdb) p/x $r3
> >>>> $63 = 0x410003f4
> >>>> (gdb) p/x $r5
> >>>> $64 = 0xb6fc8000
> >>>
> >>> it seems r5 is already wrong here, it's not the runtime
> >>> address of 0. (looks more like the runtime address of
> >>> 0x41000000)
> >>
> >> The r5 is set according to the change in patch which now I'm trying to
> >> fix.
> >>
> >> To be more specific - r5 is set according to code in patch SHA1:
> >> bca0f5cbc9257c13322b99e55235c4f21ba0bd82
> >>
> >>>
> >>> likely something goes wrong with the computation of r5.
> >>
> >> Before the SHA1: bca0f5cbc9257c13322b99e55235c4f21ba0bd82 patch r5 was
> >> computed with using 'adr' assembler instruction, not set from some
> >> constant value.
> >>
> >> asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
> >>
> >> And r5 value was 0x75fc8000
> >>
> >>>
> >>>> Links:
> >>>> [1] -
> >>>> https://github.com/lmajewski/y2038_glibc/commit/e67e0f589b88a44be8f8b9b770b08950dd7e5bd5
> >>>>
> >>>> readelf -e ld-linux-armhf.so.3
> >>>>
> >>>> [10] .plt              PROGBITS        41000994 000994 000050 04
> >>>> AX  0   0  4 [11] .text             PROGBITS        41000a00
> >>>> 000a00 01fed0 00  AX  0   0 64 [12] .rodata           PROGBITS
> >>>> 410208d0 0208d0 004b24 00   A  0   0  4 [13] .ARM.extab
> >>>> PROGBITS        410253f4 0253f4 000018 00   A  0   0  4 [14]
> >>>> .ARM.exidx        ARM_EXIDX       4102540c 02540c 0000c8 00  AL 11
> >>>>  0  4 [15] .data.rel.ro      PROGBITS        41036200 026200
> >>>> 000cf4 00  WA  0   0  8 [16] .dynamic          DYNAMIC
> >>>> 41036ef4 026ef4 0000c8 08  WA  5   0  4 [17] .got
> >>>> PROGBITS 41036fbc 026fbc 000040 04  WA  0   0  4
> >>>
> >>> why are all addresses >0x41000000 ?
> >>> in a shared library i expect all those addresses
> >>> to be close to 0.
> >>
> >> On this real HW system (the rootfs which is running) - libc.so.6 also
> >> has address > 0x41000000
> >> libm.so.6 also has the value > 0x41200000
> >> (Entry point address:               0x412c9190)
> >>
> >> The offset of > 0x41000000 looks a bit strange indeed, but it is still
> >> less than the kernel space. Moreover, with position independent code
> >> it shall not matter.
> >>
> >>>
> >>> is this made by some modified binutils?
> >>
> >> I've double checked the ld-linux-armhf.so.3 and after build it has:
> >> (Entry point address:               0xa00) which seems to be correct.
> >>
> >> So it looks like during installation of the glibc (on the Yocto/OE
> >> created rootfs) the elf header is modified and this 0x41000000 offset
> >> is added.
> >>
> >
> > And indeed it is the case. Yocto/OE by default perform prelinking (use
> > prelink program) to speedup start time of dynamic program.
> >
> > The prelink [1] itself assigns some virtual addresses to all required
> > shared objects (in our case for /sbin/init), so no clashes are
> > encountered.
> >
> > And using prelink is a _default_ behaviour in Yocto/OE poky distro.
>
> Does it work without prelink? Also, does it fail with prelink in real
> hardware?
>
> It indeed might be a prelink issue in fact.

This will fail everywhere if prelink is used.
  
Lukasz Majewski Oct. 7, 2021, 2:58 p.m. UTC | #26
Hi Szabolcs,

> The 10/07/2021 12:00, Lukasz Majewski wrote:
> > On Thu, 7 Oct 2021 11:19:26 +0200
> > Lukasz Majewski <lukma@denx.de> wrote:  
> > > Please find working and broken ld-linux-armhf.so.3:
> > > https://owncloud.denx.de/s/wtAfktG6pXLffSA  
> 
> thanks.
> 
> i can confirm that this crashes on an aarch64 machine too
> (that supports running arm binaries), so not just a qemu
> problem.
> 
> 
> > > > > Links:
> > > > > [1] -
> > > > > https://github.com/lmajewski/y2038_glibc/commit/e67e0f589b88a44be8f8b9b770b08950dd7e5bd5
> > > > > 
> > > > > readelf -e ld-linux-armhf.so.3 
> > > > > 
> > > > > [10] .plt              PROGBITS        41000994 000994 000050
> > > > > 04 AX  0   0  4 [11] .text             PROGBITS
> > > > > 41000a00 000a00 01fed0 00  AX  0   0 64 [12] .rodata
> > > > >  PROGBITS 410208d0 0208d0 004b24 00   A  0   0  4 [13]
> > > > > .ARM.extab PROGBITS        410253f4 0253f4 000018 00   A  0
> > > > > 0  4 [14] .ARM.exidx        ARM_EXIDX       4102540c 02540c
> > > > > 0000c8 00  AL 11 0  4 [15] .data.rel.ro      PROGBITS
> > > > > 41036200 026200 000cf4 00  WA  0   0  8 [16] .dynamic
> > > > >  DYNAMIC 41036ef4 026ef4 0000c8 08  WA  5   0  4 [17] .got
> > > > > PROGBITS 41036fbc 026fbc 000040 04  WA  0   0  4      
> > > > 
> > > > why are all addresses >0x41000000 ?
> > > > in a shared library i expect all those addresses
> > > > to be close to 0.    
> > > 
> > > On this real HW system (the rootfs which is running) - libc.so.6
> > > also has address > 0x41000000
> > > libm.so.6 also has the value > 0x41200000
> > > (Entry point address:               0x412c9190)
> > > 
> > > The offset of > 0x41000000 looks a bit strange indeed, but it is
> > > still less than the kernel space. Moreover, with position
> > > independent code it shall not matter.
> > >   
> > > > 
> > > > is this made by some modified binutils?    
> > > 
> > > I've double checked the ld-linux-armhf.so.3 and after build it
> > > has: (Entry point address:               0xa00) which seems to be
> > > correct.
> > > 
> > > So it looks like during installation of the glibc (on the Yocto/OE
> > > created rootfs) the elf header is modified and this 0x41000000
> > > offset is added.
> > >   
> > 
> > And indeed it is the case. Yocto/OE by default perform prelinking
> > (use prelink program) to speedup start time of dynamic program.
> > 
> > The prelink [1] itself assigns some virtual addresses to all
> > required shared objects (in our case for /sbin/init), so no clashes
> > are encountered.
> > 
> > And using prelink is a _default_ behaviour in Yocto/OE poky distro.
> > 
> > Links:
> > [1] - https://linux.die.net/man/8/prelink  
> 
> ah prelinking.
> 
> ok so &__ehdr_start is supposed to point to the
> elf header, which supposed to start at address 0
> within the elf image.
> 
> but with prelinking everthing is moved by a fixed
> offset (including the elf header), so __ehdr_start
> is no longer has 0 address.

I think that __ehdr_start now points to > 0x41000000.

> 
> i don't immediately know what's the best fix for this.

I think that getting the address with 'adr' ASM instruction was more
robust, as it was not relying on the set constant (__ehdr_start).


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Szabolcs Nagy Oct. 7, 2021, 3:57 p.m. UTC | #27
The 10/07/2021 07:29, H.J. Lu via Libc-alpha wrote:
> On Thu, Oct 7, 2021 at 7:18 AM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> > On 07/10/2021 07:00, Lukasz Majewski wrote:
> > > On Thu, 7 Oct 2021 11:19:26 +0200
> > > Lukasz Majewski <lukma@denx.de> wrote:
> > > And indeed it is the case. Yocto/OE by default perform prelinking (use
> > > prelink program) to speedup start time of dynamic program.
> > >
> > > The prelink [1] itself assigns some virtual addresses to all required
> > > shared objects (in our case for /sbin/init), so no clashes are
> > > encountered.
> > >
> > > And using prelink is a _default_ behaviour in Yocto/OE poky distro.
> >
> > Does it work without prelink? Also, does it fail with prelink in real
> > hardware?
> >
> > It indeed might be a prelink issue in fact.
> 
> This will fail everywhere if prelink is used.

i thought the point of prelinking is that the vaddr in
the elf image is the runtime address so you don't have
to process relative relocs or adjust pointers in the
dynamic array with += l_addr at all (and then l_addr
does not have to be correct, ld.so would still work).

but here ld.so is loaded to some random offset on top
of the static prelink offset. is this expected?
does it make sense to prelink ld.so this way?
  
H.J. Lu Oct. 7, 2021, 4:22 p.m. UTC | #28
On Thu, Oct 7, 2021 at 8:57 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 10/07/2021 07:29, H.J. Lu via Libc-alpha wrote:
> > On Thu, Oct 7, 2021 at 7:18 AM Adhemerval Zanella via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> > > On 07/10/2021 07:00, Lukasz Majewski wrote:
> > > > On Thu, 7 Oct 2021 11:19:26 +0200
> > > > Lukasz Majewski <lukma@denx.de> wrote:
> > > > And indeed it is the case. Yocto/OE by default perform prelinking (use
> > > > prelink program) to speedup start time of dynamic program.
> > > >
> > > > The prelink [1] itself assigns some virtual addresses to all required
> > > > shared objects (in our case for /sbin/init), so no clashes are
> > > > encountered.
> > > >
> > > > And using prelink is a _default_ behaviour in Yocto/OE poky distro.
> > >
> > > Does it work without prelink? Also, does it fail with prelink in real
> > > hardware?
> > >
> > > It indeed might be a prelink issue in fact.
> >
> > This will fail everywhere if prelink is used.
>
> i thought the point of prelinking is that the vaddr in
> the elf image is the runtime address so you don't have
> to process relative relocs or adjust pointers in the
> dynamic array with += l_addr at all (and then l_addr
> does not have to be correct, ld.so would still work).
>
> but here ld.so is loaded to some random offset on top
> of the static prelink offset. is this expected?
> does it make sense to prelink ld.so this way?

ld.so is loaded by the kernel.  It makes no sense to
prelink ld.so.
  
Adhemerval Zanella Netto Oct. 7, 2021, 4:53 p.m. UTC | #29
On 07/10/2021 13:22, H.J. Lu wrote:
> On Thu, Oct 7, 2021 at 8:57 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>
>> The 10/07/2021 07:29, H.J. Lu via Libc-alpha wrote:
>>> On Thu, Oct 7, 2021 at 7:18 AM Adhemerval Zanella via Libc-alpha
>>> <libc-alpha@sourceware.org> wrote:
>>>> On 07/10/2021 07:00, Lukasz Majewski wrote:
>>>>> On Thu, 7 Oct 2021 11:19:26 +0200
>>>>> Lukasz Majewski <lukma@denx.de> wrote:
>>>>> And indeed it is the case. Yocto/OE by default perform prelinking (use
>>>>> prelink program) to speedup start time of dynamic program.
>>>>>
>>>>> The prelink [1] itself assigns some virtual addresses to all required
>>>>> shared objects (in our case for /sbin/init), so no clashes are
>>>>> encountered.
>>>>>
>>>>> And using prelink is a _default_ behaviour in Yocto/OE poky distro.
>>>>
>>>> Does it work without prelink? Also, does it fail with prelink in real
>>>> hardware?
>>>>
>>>> It indeed might be a prelink issue in fact.
>>>
>>> This will fail everywhere if prelink is used.
>>
>> i thought the point of prelinking is that the vaddr in
>> the elf image is the runtime address so you don't have
>> to process relative relocs or adjust pointers in the
>> dynamic array with += l_addr at all (and then l_addr
>> does not have to be correct, ld.so would still work).
>>
>> but here ld.so is loaded to some random offset on top
>> of the static prelink offset. is this expected?
>> does it make sense to prelink ld.so this way?
> 
> ld.so is loaded by the kernel.  It makes no sense to
> prelink ld.so.
> 

The question is whether we consider this a regression or if
we don't support it and instruct consumers to not do it.
  
H.J. Lu Oct. 7, 2021, 5:05 p.m. UTC | #30
On Thu, Oct 7, 2021 at 9:53 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 07/10/2021 13:22, H.J. Lu wrote:
> > On Thu, Oct 7, 2021 at 8:57 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>
> >> The 10/07/2021 07:29, H.J. Lu via Libc-alpha wrote:
> >>> On Thu, Oct 7, 2021 at 7:18 AM Adhemerval Zanella via Libc-alpha
> >>> <libc-alpha@sourceware.org> wrote:
> >>>> On 07/10/2021 07:00, Lukasz Majewski wrote:
> >>>>> On Thu, 7 Oct 2021 11:19:26 +0200
> >>>>> Lukasz Majewski <lukma@denx.de> wrote:
> >>>>> And indeed it is the case. Yocto/OE by default perform prelinking (use
> >>>>> prelink program) to speedup start time of dynamic program.
> >>>>>
> >>>>> The prelink [1] itself assigns some virtual addresses to all required
> >>>>> shared objects (in our case for /sbin/init), so no clashes are
> >>>>> encountered.
> >>>>>
> >>>>> And using prelink is a _default_ behaviour in Yocto/OE poky distro.
> >>>>
> >>>> Does it work without prelink? Also, does it fail with prelink in real
> >>>> hardware?
> >>>>
> >>>> It indeed might be a prelink issue in fact.
> >>>
> >>> This will fail everywhere if prelink is used.
> >>
> >> i thought the point of prelinking is that the vaddr in
> >> the elf image is the runtime address so you don't have
> >> to process relative relocs or adjust pointers in the
> >> dynamic array with += l_addr at all (and then l_addr
> >> does not have to be correct, ld.so would still work).
> >>
> >> but here ld.so is loaded to some random offset on top
> >> of the static prelink offset. is this expected?
> >> does it make sense to prelink ld.so this way?
> >
> > ld.so is loaded by the kernel.  It makes no sense to
> > prelink ld.so.
> >
>
> The question is whether we consider this a regression or if
> we don't support it and instruct consumers to not do it.

I consider it a pilot error.  We can add a check in ld.so
to error out if it has been prelinked.
  
Fangrui Song Oct. 7, 2021, 5:24 p.m. UTC | #31
On Thu, Oct 7, 2021 at 9:54 AM Adhemerval Zanella via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
>
>
> On 07/10/2021 13:22, H.J. Lu wrote:
> > On Thu, Oct 7, 2021 at 8:57 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> >>
> >> The 10/07/2021 07:29, H.J. Lu via Libc-alpha wrote:
> >>> On Thu, Oct 7, 2021 at 7:18 AM Adhemerval Zanella via Libc-alpha
> >>> <libc-alpha@sourceware.org> wrote:
> >>>> On 07/10/2021 07:00, Lukasz Majewski wrote:
> >>>>> On Thu, 7 Oct 2021 11:19:26 +0200
> >>>>> Lukasz Majewski <lukma@denx.de> wrote:
> >>>>> And indeed it is the case. Yocto/OE by default perform prelinking (use
> >>>>> prelink program) to speedup start time of dynamic program.
> >>>>>
> >>>>> The prelink [1] itself assigns some virtual addresses to all required
> >>>>> shared objects (in our case for /sbin/init), so no clashes are
> >>>>> encountered.
> >>>>>
> >>>>> And using prelink is a _default_ behaviour in Yocto/OE poky distro.
> >>>>
> >>>> Does it work without prelink? Also, does it fail with prelink in real
> >>>> hardware?
> >>>>
> >>>> It indeed might be a prelink issue in fact.
> >>>
> >>> This will fail everywhere if prelink is used.
> >>
> >> i thought the point of prelinking is that the vaddr in
> >> the elf image is the runtime address so you don't have
> >> to process relative relocs or adjust pointers in the
> >> dynamic array with += l_addr at all (and then l_addr
> >> does not have to be correct, ld.so would still work).
> >>
> >> but here ld.so is loaded to some random offset on top
> >> of the static prelink offset. is this expected?
> >> does it make sense to prelink ld.so this way?
> >
> > ld.so is loaded by the kernel.  It makes no sense to
> > prelink ld.so.
> >
>
> The question is whether we consider this a regression or if
> we don't support it and instruct consumers to not do it.

&__ehdr_start was used long before "elf: Unconditionally use __ehdr_start"
https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=elf/rtld.c;h=878e6480f4980b2fef468d8e271cd68acd4fc2d0;hp=d733359eaf808b8a5579908a6cd1153062c02919;hb=302247c89121e8d4c7629e589edbb4974fff6edb;hpb=13710e7e6af6c8965cc9a63a0660cb4ce1966557

It is surprising if that code could do something useful with prelinking.
  
Szabolcs Nagy Oct. 8, 2021, 9:15 a.m. UTC | #32
The 10/07/2021 10:24, Fāng-ruì Sòng wrote:
> &__ehdr_start was used long before "elf: Unconditionally use __ehdr_start"
> https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=elf/rtld.c;h=878e6480f4980b2fef468d8e271cd68acd4fc2d0;hp=d733359eaf808b8a5579908a6cd1153062c02919;hb=302247c89121e8d4c7629e589edbb4974fff6edb;hpb=13710e7e6af6c8965cc9a63a0660cb4ce1966557
> 
> It is surprising if that code could do something useful with prelinking.

__ehdr_start was used as the start of the elf header.
that's true even after prelinking does its offsets.

the new use of __ehdr_start is to mean the base address
by which local symbol vaddr needs to be offset to get
the runtime address. this use relies on __ehdr_start to
have 0 vaddr wihch is broken by prelinking that moves
everything to some offset.

in principle the elf object after prelinking is valid,
but since this is the ld.so we can say that we only
support the case when the elf header is at 0 vaddr.
  
Lukasz Majewski Oct. 11, 2021, 8:56 a.m. UTC | #33
Hi H.J, Adhemerval, Szabolcs,

> On Thu, Oct 7, 2021 at 7:18 AM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> >
> >
> > On 07/10/2021 07:00, Lukasz Majewski wrote:  
> > > On Thu, 7 Oct 2021 11:19:26 +0200
> > > Lukasz Majewski <lukma@denx.de> wrote:
> > >  
> > >> Hi Szabolcs,
> > >>  
> > >>> The 10/06/2021 13:43, Lukasz Majewski wrote:  
> > >>>> Please find in-depth analyze about the issue:
> > >>>>
> > >>>> It was tested with Beagle Bone Black (BBB) and QEMU (the same
> > >>>> binary rootfs images).
> > >>>> (If needed I will upload images and script to run QEMU to some
> > >>>> server for reproduction).
> > >>>> Branch:
> > >>>> https://github.com/lmajewski/y2038_glibc/commits/y2038_edge  
> > >>>
> > >>> i think it is easier to look at if you upload the broken
> > >>> ld.so binary. or at least readelf -aW ld.so output.  
> > >>
> > >> Please find working and broken ld-linux-armhf.so.3:
> > >> https://owncloud.denx.de/s/wtAfktG6pXLffSA
> > >>  
> > >>>  
> > >>>> On working setup to trigger the core dump:
> > >>>> /home/root/ld-linux-armhf.so.3 /sbin/init
> > >>>> gdb ./ld-linux-armhf.so.3 core
> > >>>>
> > >>>> (and the /home/root/ld-linux-armhf.so.3 is the "broken" one).
> > >>>>
> > >>>>
> > >>>> Not working (patch [1] not applied):
> > >>>> ====================================
> > >>>>
> > >>>> All the code is located in _dl_start in elf/rtld.c and
> > >>>> elf/get-dynamic-info.h files:
> > >>>>
> > >>>> (gdb) p/x $r5
> > >>>> $46 = 0xb6fc8000
> > >>>> r5 is set from the elf_machine_load_address()
> > >>>>
> > >>>> Then we enter the elf_get_dynamic_info()
> > >>>>
> > >>>> 0xb6fc95fc      99            ADJUST_DYN_INFO (DT_SYMTAB);
> > >>>>    0xb6fc95f8 <_dl_start+308>:  04 30 92 15     ldrne   r3,
> > >>>> [r2, #4] => 0xb6fc95fc <_dl_start+312>:  05 30 83 10     addne
> > >>>>   r3, r3, r5 0xb6fc9600 <_dl_start+316>:  04 30 82 15
> > >>>> strne   r3, [r2, #4] (gdb) p/x $r3
> > >>>> $63 = 0x410003f4
> > >>>> (gdb) p/x $r5
> > >>>> $64 = 0xb6fc8000  
> > >>>
> > >>> it seems r5 is already wrong here, it's not the runtime
> > >>> address of 0. (looks more like the runtime address of
> > >>> 0x41000000)  
> > >>
> > >> The r5 is set according to the change in patch which now I'm
> > >> trying to fix.
> > >>
> > >> To be more specific - r5 is set according to code in patch SHA1:
> > >> bca0f5cbc9257c13322b99e55235c4f21ba0bd82
> > >>  
> > >>>
> > >>> likely something goes wrong with the computation of r5.  
> > >>
> > >> Before the SHA1: bca0f5cbc9257c13322b99e55235c4f21ba0bd82 patch
> > >> r5 was computed with using 'adr' assembler instruction, not set
> > >> from some constant value.
> > >>
> > >> asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
> > >>
> > >> And r5 value was 0x75fc8000
> > >>  
> > >>>  
> > >>>> Links:
> > >>>> [1] -
> > >>>> https://github.com/lmajewski/y2038_glibc/commit/e67e0f589b88a44be8f8b9b770b08950dd7e5bd5
> > >>>>
> > >>>> readelf -e ld-linux-armhf.so.3
> > >>>>
> > >>>> [10] .plt              PROGBITS        41000994 000994 000050
> > >>>> 04 AX  0   0  4 [11] .text             PROGBITS        41000a00
> > >>>> 000a00 01fed0 00  AX  0   0 64 [12] .rodata           PROGBITS
> > >>>> 410208d0 0208d0 004b24 00   A  0   0  4 [13] .ARM.extab
> > >>>> PROGBITS        410253f4 0253f4 000018 00   A  0   0  4 [14]
> > >>>> .ARM.exidx        ARM_EXIDX       4102540c 02540c 0000c8 00
> > >>>> AL 11 0  4 [15] .data.rel.ro      PROGBITS        41036200
> > >>>> 026200 000cf4 00  WA  0   0  8 [16] .dynamic          DYNAMIC
> > >>>> 41036ef4 026ef4 0000c8 08  WA  5   0  4 [17] .got
> > >>>> PROGBITS 41036fbc 026fbc 000040 04  WA  0   0  4  
> > >>>
> > >>> why are all addresses >0x41000000 ?
> > >>> in a shared library i expect all those addresses
> > >>> to be close to 0.  
> > >>
> > >> On this real HW system (the rootfs which is running) - libc.so.6
> > >> also has address > 0x41000000
> > >> libm.so.6 also has the value > 0x41200000
> > >> (Entry point address:               0x412c9190)
> > >>
> > >> The offset of > 0x41000000 looks a bit strange indeed, but it is
> > >> still less than the kernel space. Moreover, with position
> > >> independent code it shall not matter.
> > >>  
> > >>>
> > >>> is this made by some modified binutils?  
> > >>
> > >> I've double checked the ld-linux-armhf.so.3 and after build it
> > >> has: (Entry point address:               0xa00) which seems to
> > >> be correct.
> > >>
> > >> So it looks like during installation of the glibc (on the
> > >> Yocto/OE created rootfs) the elf header is modified and this
> > >> 0x41000000 offset is added.
> > >>  
> > >
> > > And indeed it is the case. Yocto/OE by default perform prelinking
> > > (use prelink program) to speedup start time of dynamic program.
> > >
> > > The prelink [1] itself assigns some virtual addresses to all
> > > required shared objects (in our case for /sbin/init), so no
> > > clashes are encountered.
> > >
> > > And using prelink is a _default_ behaviour in Yocto/OE poky
> > > distro.  
> >
> > Does it work without prelink? Also, does it fail with prelink in
> > real hardware?
> >
> > It indeed might be a prelink issue in fact.  
> 
> This will fail everywhere if prelink is used.
> 

Do we have _any_ plan how to fix it? BSPs for many boards are built
with Yocto/OE nowadays...

The approach with using 'adr' (pseudo) assembler instruction to
calculate the on-runtime addresses was working previously.

Shall we revert changes which were introduced recently (to use
__ehdr_start)?


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Szabolcs Nagy Oct. 11, 2021, 10:18 a.m. UTC | #34
The 10/11/2021 10:56, Lukasz Majewski wrote:
> Do we have _any_ plan how to fix it? BSPs for many boards are built
> with Yocto/OE nowadays...
> 
> The approach with using 'adr' (pseudo) assembler instruction to
> calculate the on-runtime addresses was working previously.
> 
> Shall we revert changes which were introduced recently (to use
> __ehdr_start)?

the reason for the change was to avoid relying on GOT[0].

i guess we can revert elf_machine_load_address on arm, but
keep the new elf_machine_dynamic that does not rely on GOT[0].

it is also useful to have the same c code across targets
for the load address. if we want to do that and support
vaddr!=0 for ehdr then i think it can be (untested):

__attribute__ ((const)) ElfW(Addr)
load_addr (void)
{
  extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility ("hidden")));
  ElfW(Addr) addr = (ElfW(Addr)) &__ehdr_start;
  ElfW(Word) phnum = __ehdr_start.e_phnum;
  const ElfW(Phdr) *phdr = (const void *) (addr + __ehdr_start.e_phoff);
  const ElfW(Phdr) *ph;
  assert (__ehdr_start.e_phentsize == sizeof *phdr);
  for (ph = phdr; ph < &phdr[phnum]; ++ph)
    if (ph->p_type == PT_LOAD && ph->p_offset == 0)
      {
        addr -= ph->p_vaddr;
        break;
      }
  return addr;
}

i don't have a strong preference either way:
1) fix yocto
2) partial revert
3) new way to compute the load address.
  
Lukasz Majewski Oct. 11, 2021, 11:47 a.m. UTC | #35
Hi Szabolcs,

Thank you very much for your input.

> The 10/11/2021 10:56, Lukasz Majewski wrote:
> > Do we have _any_ plan how to fix it? BSPs for many boards are built
> > with Yocto/OE nowadays...
> > 
> > The approach with using 'adr' (pseudo) assembler instruction to
> > calculate the on-runtime addresses was working previously.
> > 
> > Shall we revert changes which were introduced recently (to use
> > __ehdr_start)?  
> 
> the reason for the change was to avoid relying on GOT[0].

And why it is requested to not relying on GOT[0]?

> 
> i guess we can revert elf_machine_load_address on arm, but
> keep the new elf_machine_dynamic that does not rely on GOT[0].

This was the approach proposed by this patch.

> 
> it is also useful to have the same c code across targets
> for the load address. if we want to do that and support
> vaddr!=0 for ehdr then i think it can be (untested):
> 
> __attribute__ ((const)) ElfW(Addr)
> load_addr (void)
> {
>   extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility
> ("hidden"))); ElfW(Addr) addr = (ElfW(Addr)) &__ehdr_start;
>   ElfW(Word) phnum = __ehdr_start.e_phnum;
>   const ElfW(Phdr) *phdr = (const void *) (addr +
> __ehdr_start.e_phoff); const ElfW(Phdr) *ph;
>   assert (__ehdr_start.e_phentsize == sizeof *phdr);
>   for (ph = phdr; ph < &phdr[phnum]; ++ph)
>     if (ph->p_type == PT_LOAD && ph->p_offset == 0)
>       {
>         addr -= ph->p_vaddr;

The above line is a bit strange for me. Isn't the p_offset set by
prelink as well?

And most of all - why do we need to perform relocation in such very
early stage of the ld-linux-armhf.so.3 ?

>         break;
>       }
>   return addr;
> }
> 
> i don't have a strong preference either way:
> 1) fix yocto

Yocto by default uses prelinkg from the very long time. I think that it
would be very difficult to change that.

(moreover, IMHO running prelink on any binary and glibc is a valid use
case)

> 2) partial revert

Ok.

> 3) new way to compute the load address.

This would require some input from developers deeply involved in ELF
loader development. (Florian IIRC).

Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
H.J. Lu Oct. 11, 2021, 12:01 p.m. UTC | #36
On Mon, Oct 11, 2021 at 4:47 AM Lukasz Majewski <lukma@denx.de> wrote:
>
> Hi Szabolcs,
>
> Thank you very much for your input.
>
> > The 10/11/2021 10:56, Lukasz Majewski wrote:
> > > Do we have _any_ plan how to fix it? BSPs for many boards are built
> > > with Yocto/OE nowadays...
> > >
> > > The approach with using 'adr' (pseudo) assembler instruction to
> > > calculate the on-runtime addresses was working previously.
> > >
> > > Shall we revert changes which were introduced recently (to use
> > > __ehdr_start)?
> >
> > the reason for the change was to avoid relying on GOT[0].
>
> And why it is requested to not relying on GOT[0]?
>
> >
> > i guess we can revert elf_machine_load_address on arm, but
> > keep the new elf_machine_dynamic that does not rely on GOT[0].
>
> This was the approach proposed by this patch.
>
> >
> > it is also useful to have the same c code across targets
> > for the load address. if we want to do that and support
> > vaddr!=0 for ehdr then i think it can be (untested):
> >
> > __attribute__ ((const)) ElfW(Addr)
> > load_addr (void)
> > {
> >   extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility
> > ("hidden"))); ElfW(Addr) addr = (ElfW(Addr)) &__ehdr_start;
> >   ElfW(Word) phnum = __ehdr_start.e_phnum;
> >   const ElfW(Phdr) *phdr = (const void *) (addr +
> > __ehdr_start.e_phoff); const ElfW(Phdr) *ph;
> >   assert (__ehdr_start.e_phentsize == sizeof *phdr);
> >   for (ph = phdr; ph < &phdr[phnum]; ++ph)
> >     if (ph->p_type == PT_LOAD && ph->p_offset == 0)
> >       {
> >         addr -= ph->p_vaddr;
>
> The above line is a bit strange for me. Isn't the p_offset set by
> prelink as well?
>
> And most of all - why do we need to perform relocation in such very
> early stage of the ld-linux-armhf.so.3 ?
>
> >         break;
> >       }
> >   return addr;
> > }
> >
> > i don't have a strong preference either way:
> > 1) fix yocto
>
> Yocto by default uses prelinkg from the very long time. I think that it
> would be very difficult to change that.

Just don't run prelink on ld.so since the load address is controlled
by kernel.

> (moreover, IMHO running prelink on any binary and glibc is a valid use
> case)
>
> > 2) partial revert
>
> Ok.
>
> > 3) new way to compute the load address.
>
> This would require some input from developers deeply involved in ELF
> loader development. (Florian IIRC).
>
> Best regards,
>
> Lukasz Majewski
>
> --
>
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Szabolcs Nagy Oct. 11, 2021, 12:48 p.m. UTC | #37
The 10/11/2021 13:47, Lukasz Majewski wrote:
> Hi Szabolcs,
> 
> Thank you very much for your input.
> 
> > The 10/11/2021 10:56, Lukasz Majewski wrote:
> > > Do we have _any_ plan how to fix it? BSPs for many boards are built
> > > with Yocto/OE nowadays...
> > > 
> > > The approach with using 'adr' (pseudo) assembler instruction to
> > > calculate the on-runtime addresses was working previously.
> > > 
> > > Shall we revert changes which were introduced recently (to use
> > > __ehdr_start)?  
> > 
> > the reason for the change was to avoid relying on GOT[0].
> 
> And why it is requested to not relying on GOT[0]?

to support lld.

> > 
> > i guess we can revert elf_machine_load_address on arm, but
> > keep the new elf_machine_dynamic that does not rely on GOT[0].
> 
> This was the approach proposed by this patch.

i see, then i think only the commit message needs
update to explain the ld.so prelink issue.

> > 
> > it is also useful to have the same c code across targets
> > for the load address. if we want to do that and support
> > vaddr!=0 for ehdr then i think it can be (untested):
> > 
> > __attribute__ ((const)) ElfW(Addr)
> > load_addr (void)
> > {
> >   extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility
> > ("hidden"))); ElfW(Addr) addr = (ElfW(Addr)) &__ehdr_start;
> >   ElfW(Word) phnum = __ehdr_start.e_phnum;
> >   const ElfW(Phdr) *phdr = (const void *) (addr +
> > __ehdr_start.e_phoff); const ElfW(Phdr) *ph;
> >   assert (__ehdr_start.e_phentsize == sizeof *phdr);
> >   for (ph = phdr; ph < &phdr[phnum]; ++ph)
> >     if (ph->p_type == PT_LOAD && ph->p_offset == 0)
> >       {
> >         addr -= ph->p_vaddr;
> 
> The above line is a bit strange for me. Isn't the p_offset set by
> prelink as well?
> 
> And most of all - why do we need to perform relocation in such very
> early stage of the ld-linux-armhf.so.3 ?

ld.so itself relies on dynamic relocs, so it has to
self relocate (as early as possible). (one ugliness
of the old arm load address computation is that it
relies on running before self relocation so the GOT
entry of _dl_start still contains the link time addr.
on other targets the computation still works later.)

the p_offset is the offset in the file, prelinking
does not change that and the elf header starts at
offset==0 which is normally part of a load segment:

$ readelf -lW BROKEN_ld-linux-armhf.so.3

Elf file type is DYN (Shared object file)
Entry point 0x41000a00
There are 7 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  EXIDX          0x02540c 0x4102540c 0x4102540c 0x000c8 0x000c8 R   0x4
  LOAD           0x000000 0x41000000 0x41000000 0x254d4 0x254d4 R E 0x10000
  LOAD           0x026200 0x41036200 0x41036200 0x016d8 0x017c0 RW  0x10000
  DYNAMIC        0x026ef4 0x41036ef4 0x41036ef4 0x000c8 0x000c8 RW  0x4
  NOTE           0x000114 0x41000114 0x41000114 0x00024 0x00024 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x026200 0x41036200 0x41036200 0x00e00 0x00e00 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .ARM.exidx
   01     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .rel.dyn .rel.plt .plt .text .rodata .ARM.extab .ARM.exidx
   02     .data.rel.ro .dynamic .got .data .bss
   03     .dynamic
   04     .note.gnu.build-id
   05
   06     .data.rel.ro .dynamic .got
  
Lukasz Majewski Oct. 11, 2021, 1:10 p.m. UTC | #38
On Mon, 11 Oct 2021 05:01:23 -0700
"H.J. Lu" <hjl.tools@gmail.com> wrote:

> On Mon, Oct 11, 2021 at 4:47 AM Lukasz Majewski <lukma@denx.de> wrote:
> >
> > Hi Szabolcs,
> >
> > Thank you very much for your input.
> >  
> > > The 10/11/2021 10:56, Lukasz Majewski wrote:  
> > > > Do we have _any_ plan how to fix it? BSPs for many boards are
> > > > built with Yocto/OE nowadays...
> > > >
> > > > The approach with using 'adr' (pseudo) assembler instruction to
> > > > calculate the on-runtime addresses was working previously.
> > > >
> > > > Shall we revert changes which were introduced recently (to use
> > > > __ehdr_start)?  
> > >
> > > the reason for the change was to avoid relying on GOT[0].  
> >
> > And why it is requested to not relying on GOT[0]?
> >  
> > >
> > > i guess we can revert elf_machine_load_address on arm, but
> > > keep the new elf_machine_dynamic that does not rely on GOT[0].  
> >
> > This was the approach proposed by this patch.
> >  
> > >
> > > it is also useful to have the same c code across targets
> > > for the load address. if we want to do that and support
> > > vaddr!=0 for ehdr then i think it can be (untested):
> > >
> > > __attribute__ ((const)) ElfW(Addr)
> > > load_addr (void)
> > > {
> > >   extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility
> > > ("hidden"))); ElfW(Addr) addr = (ElfW(Addr)) &__ehdr_start;
> > >   ElfW(Word) phnum = __ehdr_start.e_phnum;
> > >   const ElfW(Phdr) *phdr = (const void *) (addr +
> > > __ehdr_start.e_phoff); const ElfW(Phdr) *ph;
> > >   assert (__ehdr_start.e_phentsize == sizeof *phdr);
> > >   for (ph = phdr; ph < &phdr[phnum]; ++ph)
> > >     if (ph->p_type == PT_LOAD && ph->p_offset == 0)
> > >       {
> > >         addr -= ph->p_vaddr;  
> >
> > The above line is a bit strange for me. Isn't the p_offset set by
> > prelink as well?
> >
> > And most of all - why do we need to perform relocation in such very
> > early stage of the ld-linux-armhf.so.3 ?
> >  
> > >         break;
> > >       }
> > >   return addr;
> > > }
> > >
> > > i don't have a strong preference either way:
> > > 1) fix yocto  
> >
> > Yocto by default uses prelinkg from the very long time. I think
> > that it would be very difficult to change that.  
> 
> Just don't run prelink on ld.so since the load address is controlled
> by kernel.

prelink is used also to speed things up. In some use cases - boot time
optimization (like kiosk, or automotive) the bootup time is crucial.

IMHO, the change with __ehdr_start introduced regression and hence
shall be fixed so the previous behaviour is preserved.

> 
> > (moreover, IMHO running prelink on any binary and glibc is a valid
> > use case)
> >  
> > > 2) partial revert  
> >
> > Ok.
> >  
> > > 3) new way to compute the load address.  
> >
> > This would require some input from developers deeply involved in ELF
> > loader development. (Florian IIRC).
> >
> > Best regards,
> >
> > Lukasz Majewski
> >
> > --
> >
> > DENX Software Engineering GmbH,      Managing Director: Wolfgang
> > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,
> > Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> > lukma@denx.de  
> 
> 
> 




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
H.J. Lu Oct. 11, 2021, 1:22 p.m. UTC | #39
On Mon, Oct 11, 2021 at 6:10 AM Lukasz Majewski <lukma@denx.de> wrote:
>
> On Mon, 11 Oct 2021 05:01:23 -0700
> "H.J. Lu" <hjl.tools@gmail.com> wrote:
>
> > On Mon, Oct 11, 2021 at 4:47 AM Lukasz Majewski <lukma@denx.de> wrote:
> > >
> > > Hi Szabolcs,
> > >
> > > Thank you very much for your input.
> > >
> > > > The 10/11/2021 10:56, Lukasz Majewski wrote:
> > > > > Do we have _any_ plan how to fix it? BSPs for many boards are
> > > > > built with Yocto/OE nowadays...
> > > > >
> > > > > The approach with using 'adr' (pseudo) assembler instruction to
> > > > > calculate the on-runtime addresses was working previously.
> > > > >
> > > > > Shall we revert changes which were introduced recently (to use
> > > > > __ehdr_start)?
> > > >
> > > > the reason for the change was to avoid relying on GOT[0].
> > >
> > > And why it is requested to not relying on GOT[0]?
> > >
> > > >
> > > > i guess we can revert elf_machine_load_address on arm, but
> > > > keep the new elf_machine_dynamic that does not rely on GOT[0].
> > >
> > > This was the approach proposed by this patch.
> > >
> > > >
> > > > it is also useful to have the same c code across targets
> > > > for the load address. if we want to do that and support
> > > > vaddr!=0 for ehdr then i think it can be (untested):
> > > >
> > > > __attribute__ ((const)) ElfW(Addr)
> > > > load_addr (void)
> > > > {
> > > >   extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility
> > > > ("hidden"))); ElfW(Addr) addr = (ElfW(Addr)) &__ehdr_start;
> > > >   ElfW(Word) phnum = __ehdr_start.e_phnum;
> > > >   const ElfW(Phdr) *phdr = (const void *) (addr +
> > > > __ehdr_start.e_phoff); const ElfW(Phdr) *ph;
> > > >   assert (__ehdr_start.e_phentsize == sizeof *phdr);
> > > >   for (ph = phdr; ph < &phdr[phnum]; ++ph)
> > > >     if (ph->p_type == PT_LOAD && ph->p_offset == 0)
> > > >       {
> > > >         addr -= ph->p_vaddr;
> > >
> > > The above line is a bit strange for me. Isn't the p_offset set by
> > > prelink as well?
> > >
> > > And most of all - why do we need to perform relocation in such very
> > > early stage of the ld-linux-armhf.so.3 ?
> > >
> > > >         break;
> > > >       }
> > > >   return addr;
> > > > }
> > > >
> > > > i don't have a strong preference either way:
> > > > 1) fix yocto
> > >
> > > Yocto by default uses prelinkg from the very long time. I think
> > > that it would be very difficult to change that.
> >
> > Just don't run prelink on ld.so since the load address is controlled
> > by kernel.
>
> prelink is used also to speed things up. In some use cases - boot time
> optimization (like kiosk, or automotive) the bootup time is crucial.

I am NOT asking you to remove prelink.  I am only asking not to run
prelink on ld.so.  Do you have bootup time data on ld.so with and without
prelink?

> IMHO, the change with __ehdr_start introduced regression and hence
> shall be fixed so the previous behaviour is preserved.
>
  
Adhemerval Zanella Netto Oct. 11, 2021, 1:34 p.m. UTC | #40
On 11/10/2021 09:01, H.J. Lu wrote:
> On Mon, Oct 11, 2021 at 4:47 AM Lukasz Majewski <lukma@denx.de> wrote:
>>
>> Hi Szabolcs,
>>
>> Thank you very much for your input.
>>
>>> The 10/11/2021 10:56, Lukasz Majewski wrote:
>>>> Do we have _any_ plan how to fix it? BSPs for many boards are built
>>>> with Yocto/OE nowadays...
>>>>
>>>> The approach with using 'adr' (pseudo) assembler instruction to
>>>> calculate the on-runtime addresses was working previously.
>>>>
>>>> Shall we revert changes which were introduced recently (to use
>>>> __ehdr_start)?
>>>
>>> the reason for the change was to avoid relying on GOT[0].
>>
>> And why it is requested to not relying on GOT[0]?
>>
>>>
>>> i guess we can revert elf_machine_load_address on arm, but
>>> keep the new elf_machine_dynamic that does not rely on GOT[0].
>>
>> This was the approach proposed by this patch.
>>
>>>
>>> it is also useful to have the same c code across targets
>>> for the load address. if we want to do that and support
>>> vaddr!=0 for ehdr then i think it can be (untested):
>>>
>>> __attribute__ ((const)) ElfW(Addr)
>>> load_addr (void)
>>> {
>>>   extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility
>>> ("hidden"))); ElfW(Addr) addr = (ElfW(Addr)) &__ehdr_start;
>>>   ElfW(Word) phnum = __ehdr_start.e_phnum;
>>>   const ElfW(Phdr) *phdr = (const void *) (addr +
>>> __ehdr_start.e_phoff); const ElfW(Phdr) *ph;
>>>   assert (__ehdr_start.e_phentsize == sizeof *phdr);
>>>   for (ph = phdr; ph < &phdr[phnum]; ++ph)
>>>     if (ph->p_type == PT_LOAD && ph->p_offset == 0)
>>>       {
>>>         addr -= ph->p_vaddr;
>>
>> The above line is a bit strange for me. Isn't the p_offset set by
>> prelink as well?
>>
>> And most of all - why do we need to perform relocation in such very
>> early stage of the ld-linux-armhf.so.3 ?
>>
>>>         break;
>>>       }
>>>   return addr;
>>> }
>>>
>>> i don't have a strong preference either way:
>>> 1) fix yocto
>>
>> Yocto by default uses prelinkg from the very long time. I think that it
>> would be very difficult to change that.
> 
> Just don't run prelink on ld.so since the load address is controlled
> by kernel.

I think it would be better to fail early on ld.so with a meaningful
warning to avoid the possible issues.  Another solution would be
to ignore the entrypoint value set by prelink for the loader and
always use 0.
  
Lukasz Majewski Oct. 11, 2021, 2:31 p.m. UTC | #41
On Mon, 11 Oct 2021 06:22:53 -0700
"H.J. Lu" <hjl.tools@gmail.com> wrote:

> On Mon, Oct 11, 2021 at 6:10 AM Lukasz Majewski <lukma@denx.de> wrote:
> >
> > On Mon, 11 Oct 2021 05:01:23 -0700
> > "H.J. Lu" <hjl.tools@gmail.com> wrote:
> >  
> > > On Mon, Oct 11, 2021 at 4:47 AM Lukasz Majewski <lukma@denx.de>
> > > wrote:  
> > > >
> > > > Hi Szabolcs,
> > > >
> > > > Thank you very much for your input.
> > > >  
> > > > > The 10/11/2021 10:56, Lukasz Majewski wrote:  
> > > > > > Do we have _any_ plan how to fix it? BSPs for many boards
> > > > > > are built with Yocto/OE nowadays...
> > > > > >
> > > > > > The approach with using 'adr' (pseudo) assembler
> > > > > > instruction to calculate the on-runtime addresses was
> > > > > > working previously.
> > > > > >
> > > > > > Shall we revert changes which were introduced recently (to
> > > > > > use __ehdr_start)?  
> > > > >
> > > > > the reason for the change was to avoid relying on GOT[0].  
> > > >
> > > > And why it is requested to not relying on GOT[0]?
> > > >  
> > > > >
> > > > > i guess we can revert elf_machine_load_address on arm, but
> > > > > keep the new elf_machine_dynamic that does not rely on
> > > > > GOT[0].  
> > > >
> > > > This was the approach proposed by this patch.
> > > >  
> > > > >
> > > > > it is also useful to have the same c code across targets
> > > > > for the load address. if we want to do that and support
> > > > > vaddr!=0 for ehdr then i think it can be (untested):
> > > > >
> > > > > __attribute__ ((const)) ElfW(Addr)
> > > > > load_addr (void)
> > > > > {
> > > > >   extern const ElfW(Ehdr) __ehdr_start __attribute__
> > > > > ((visibility ("hidden"))); ElfW(Addr) addr = (ElfW(Addr))
> > > > > &__ehdr_start; ElfW(Word) phnum = __ehdr_start.e_phnum;
> > > > >   const ElfW(Phdr) *phdr = (const void *) (addr +
> > > > > __ehdr_start.e_phoff); const ElfW(Phdr) *ph;
> > > > >   assert (__ehdr_start.e_phentsize == sizeof *phdr);
> > > > >   for (ph = phdr; ph < &phdr[phnum]; ++ph)
> > > > >     if (ph->p_type == PT_LOAD && ph->p_offset == 0)
> > > > >       {
> > > > >         addr -= ph->p_vaddr;  
> > > >
> > > > The above line is a bit strange for me. Isn't the p_offset set
> > > > by prelink as well?
> > > >
> > > > And most of all - why do we need to perform relocation in such
> > > > very early stage of the ld-linux-armhf.so.3 ?
> > > >  
> > > > >         break;
> > > > >       }
> > > > >   return addr;
> > > > > }
> > > > >
> > > > > i don't have a strong preference either way:
> > > > > 1) fix yocto  
> > > >
> > > > Yocto by default uses prelinkg from the very long time. I think
> > > > that it would be very difficult to change that.  
> > >
> > > Just don't run prelink on ld.so since the load address is
> > > controlled by kernel.  
> >
> > prelink is used also to speed things up. In some use cases - boot
> > time optimization (like kiosk, or automotive) the bootup time is
> > crucial.  
> 
> I am NOT asking you to remove prelink.  I am only asking not to run
> prelink on ld.so.  Do you have bootup time data on ld.so with and
> without prelink?

The prelink is run on /sbin/init binary, so the libc.so.6 and
ld-linux-armhf.so.3 are automatically affected.

To put it in other words - up till this issue everything was working
(maybe it was just working by a pure coincidence).

Now, because of change in glibc (__ehdr_start usage) you ask if
Yocto/OE can stop using prelink on certain libraries to avoid this
error.

Have I understood your comment correctly?

> 
> > IMHO, the change with __ehdr_start introduced regression and hence
> > shall be fixed so the previous behaviour is preserved.
> >  
> 




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  

Patch

diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h
index eb13cb8b57..58ebef6ecd 100644
--- a/sysdeps/arm/dl-machine.h
+++ b/sysdeps/arm/dl-machine.h
@@ -38,11 +38,33 @@  elf_machine_matches_host (const Elf32_Ehdr *ehdr)
 }
 
 /* Return the run-time load address of the shared object.  */
-static inline ElfW(Addr) __attribute__ ((unused))
+static inline Elf32_Addr __attribute__ ((unused))
 elf_machine_load_address (void)
 {
-  extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
-  return (ElfW(Addr)) &__ehdr_start;
+  Elf32_Addr pcrel_addr;
+#ifdef SHARED
+  extern Elf32_Addr __dl_start (void *) asm ("_dl_start");
+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_start;
+  asm ("adr %0, _dl_start" : "=r" (pcrel_addr));
+#else
+  extern Elf32_Addr __dl_relocate_static_pie (void *)
+    asm ("_dl_relocate_static_pie") attribute_hidden;
+  Elf32_Addr got_addr = (Elf32_Addr) &__dl_relocate_static_pie;
+  asm ("adr %0, _dl_relocate_static_pie" : "=r" (pcrel_addr));
+#endif
+#ifdef __thumb__
+  /* Clear the low bit of the function address.
+
+     NOTE: got_addr is from GOT table whose lsb is always set by linker if it's
+     Thumb function address.  PCREL_ADDR comes from PC-relative calculation
+     which will finish during assembling.  GAS assembler before the fix for
+     PR gas/21458 was not setting the lsb but does after that.  Always do the
+     strip for both, so the code works with various combinations of glibc and
+     Binutils.  */
+  got_addr &= ~(Elf32_Addr) 1;
+  pcrel_addr &= ~(Elf32_Addr) 1;
+#endif
+  return pcrel_addr - got_addr;
 }
 
 /* Return the link-time address of _DYNAMIC.  */