[3/3] csu: Move static pie self relocation later [BZ #27072]

Message ID 1bfa01e6e92073b30c02cb76a209656b9d97b675.1610016590.git.szabolcs.nagy@arm.com
State Superseded
Headers
Series fix ifunc with static pie [BZ #27072] |

Commit Message

Szabolcs Nagy Jan. 7, 2021, 11:01 a.m. UTC
  On targets where hidden symbol access does not need RELATIVE
relocs, move the static pie self relocation after tunables and
cpu features are set up.  This allows processing IRELATIVE
relocs with correct ifunc dispatch logic.

Unfortunately it is hard to guarantee that there will be no
dynamic relocations in the early start up code, so this is a
bit fragile. Ideally the RELATIVE relocs would be processed as
early as possible and IRELATIVE relocs after cpu features are
setup, but in glibc it is hard to separate them into two steps.
---
 csu/libc-start.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
  

Comments

H.J. Lu Jan. 7, 2021, 12:36 p.m. UTC | #1
On Thu, Jan 7, 2021 at 3:04 AM Szabolcs Nagy via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On targets where hidden symbol access does not need RELATIVE
> relocs, move the static pie self relocation after tunables and
> cpu features are set up.  This allows processing IRELATIVE
> relocs with correct ifunc dispatch logic.
>
> Unfortunately it is hard to guarantee that there will be no
> dynamic relocations in the early start up code, so this is a
> bit fragile. Ideally the RELATIVE relocs would be processed as
> early as possible and IRELATIVE relocs after cpu features are
> setup, but in glibc it is hard to separate them into two steps.

x86 linker places IRELATIVE relocations the last:

https://sourceware.org/bugzilla/show_bug.cgi?id=13302

Does your linker have this bug fixed?
  
Carlos O'Donell Jan. 7, 2021, 12:46 p.m. UTC | #2
On 1/7/21 7:36 AM, H.J. Lu via Libc-alpha wrote:
> On Thu, Jan 7, 2021 at 3:04 AM Szabolcs Nagy via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> On targets where hidden symbol access does not need RELATIVE
>> relocs, move the static pie self relocation after tunables and
>> cpu features are set up.  This allows processing IRELATIVE
>> relocs with correct ifunc dispatch logic.
>>
>> Unfortunately it is hard to guarantee that there will be no
>> dynamic relocations in the early start up code, so this is a
>> bit fragile. Ideally the RELATIVE relocs would be processed as
>> early as possible and IRELATIVE relocs after cpu features are
>> setup, but in glibc it is hard to separate them into two steps.
> 
> x86 linker places IRELATIVE relocations the last:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=13302
> 
> Does your linker have this bug fixed?

Agreed, this is something I asked about during the design of
IFUNCs and I was told by Ulrich and others that IRELATIVE has
to be placed in a group and after RELATIVE relocs.

We need to document more of the guarantees and semantics here.
  
H.J. Lu Jan. 7, 2021, 12:51 p.m. UTC | #3
On Thu, Jan 7, 2021 at 4:46 AM Carlos O'Donell <carlos@redhat.com> wrote:
>
> On 1/7/21 7:36 AM, H.J. Lu via Libc-alpha wrote:
> > On Thu, Jan 7, 2021 at 3:04 AM Szabolcs Nagy via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> >>
> >> On targets where hidden symbol access does not need RELATIVE
> >> relocs, move the static pie self relocation after tunables and
> >> cpu features are set up.  This allows processing IRELATIVE
> >> relocs with correct ifunc dispatch logic.
> >>
> >> Unfortunately it is hard to guarantee that there will be no
> >> dynamic relocations in the early start up code, so this is a
> >> bit fragile. Ideally the RELATIVE relocs would be processed as
> >> early as possible and IRELATIVE relocs after cpu features are
> >> setup, but in glibc it is hard to separate them into two steps.
> >
> > x86 linker places IRELATIVE relocations the last:
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=13302
> >
> > Does your linker have this bug fixed?
>
> Agreed, this is something I asked about during the design of
> IFUNCs and I was told by Ulrich and others that IRELATIVE has
> to be placed in a group and after RELATIVE relocs.

Not just before RELATIVE.   IRELATIVE should be processed the
last before all other relocations.   See PR 13302 for other cases.
  
Szabolcs Nagy Jan. 7, 2021, 1:02 p.m. UTC | #4
The 01/07/2021 04:51, H.J. Lu wrote:
> On Thu, Jan 7, 2021 at 4:46 AM Carlos O'Donell <carlos@redhat.com> wrote:
> >
> > On 1/7/21 7:36 AM, H.J. Lu via Libc-alpha wrote:
> > > On Thu, Jan 7, 2021 at 3:04 AM Szabolcs Nagy via Libc-alpha
> > > <libc-alpha@sourceware.org> wrote:
> > >>
> > >> On targets where hidden symbol access does not need RELATIVE
> > >> relocs, move the static pie self relocation after tunables and
> > >> cpu features are set up.  This allows processing IRELATIVE
> > >> relocs with correct ifunc dispatch logic.
> > >>
> > >> Unfortunately it is hard to guarantee that there will be no
> > >> dynamic relocations in the early start up code, so this is a
> > >> bit fragile. Ideally the RELATIVE relocs would be processed as
> > >> early as possible and IRELATIVE relocs after cpu features are
> > >> setup, but in glibc it is hard to separate them into two steps.
> > >
> > > x86 linker places IRELATIVE relocations the last:
> > >
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=13302
> > >
> > > Does your linker have this bug fixed?
> >
> > Agreed, this is something I asked about during the design of
> > IFUNCs and I was told by Ulrich and others that IRELATIVE has
> > to be placed in a group and after RELATIVE relocs.
> 
> Not just before RELATIVE.   IRELATIVE should be processed the
> last before all other relocations.   See PR 13302 for other cases.

this is about static pie: sorting relocs does not help, the
problem is not that ifunc resolvers have some relocs in them,
but that the target specific relocation processing is one
monolithic switch in a loop that handles all relocs in one go,
there is no api to request only a subset of relocs to be
processed (either by type or ordering in the list of relocs).

in __libc_start_main we need to do

  do_relative_relocs();
  setup_auxv_tunables_cpu_etc();
  do_irelative_relocs();

which cannot be done without major changes in all targets and
generic elf reloc handling code. so the second best is

  setup_auxv_tunables_cpu_etc(); // avoid relative relocs
  do_all_relocs();

on targets where this can be done (other targets do
not support static pie).
  
Szabolcs Nagy Jan. 7, 2021, 2:25 p.m. UTC | #5
The 01/07/2021 13:02, Szabolcs Nagy via Libc-alpha wrote:
> this is about static pie: sorting relocs does not help, the
> problem is not that ifunc resolvers have some relocs in them,
> but that the target specific relocation processing is one
> monolithic switch in a loop that handles all relocs in one go,
> there is no api to request only a subset of relocs to be
> processed (either by type or ordering in the list of relocs).
> 
> in __libc_start_main we need to do
> 
>   do_relative_relocs();
>   setup_auxv_tunables_cpu_etc();
>   do_irelative_relocs();

i just realized we can do this because RELATIVE and IRELATIVE
happen to be in different places (one where DT_REL{A} points
to and another where DT_JUMPREL points to).

i dont know how reliable this is across targets, but we only
need this to work in static pie (which is only supported on
a small number of targets as far as i know).

so i can try to tease _dl_relocate_static_pie apart into
'normal' reloc and 'plt' reloc processing (normally this
is done so that plt relocs can be lazy evaluated, but here
we would do it so IRELATIVE is processed in a separate step).

it still sounds like a big hack that relies on where
IRELATIVE is placed with current linkers and that there
are no other relocs we need to care about in static pie.
i can create a patch to see if it works.
  
Siddhesh Poyarekar Jan. 7, 2021, 2:48 p.m. UTC | #6
On 1/7/21 7:55 PM, Szabolcs Nagy via Libc-alpha wrote:
> it still sounds like a big hack that relies on where
> IRELATIVE is placed with current linkers and that there
> are no other relocs we need to care about in static pie.
> i can create a patch to see if it works.

This is probably a safe assumption for pointer based architectures but 
not for those with capabilities.  I have a feeling you might care about 
the latter ;)

Siddhesh
  
Szabolcs Nagy Jan. 7, 2021, 3:25 p.m. UTC | #7
The 01/07/2021 20:18, Siddhesh Poyarekar wrote:
> On 1/7/21 7:55 PM, Szabolcs Nagy via Libc-alpha wrote:
> > it still sounds like a big hack that relies on where
> > IRELATIVE is placed with current linkers and that there
> > are no other relocs we need to care about in static pie.
> > i can create a patch to see if it works.
> 
> This is probably a safe assumption for pointer based architectures but not
> for those with capabilities.  I have a feeling you might care about the
> latter ;)

well static pie is broken now, i care about that more.
we don't know yet how capability relocs will work with
static linking and ifuncs.

but the main problem with this idea is that i have
to copy a large part of the elf reloc code and hack
it apart to do static pie relocation (and that code
is not a nice piece of code in the first place).
  
H.J. Lu Jan. 7, 2021, 5:48 p.m. UTC | #8
On Thu, Jan 7, 2021 at 7:25 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 01/07/2021 20:18, Siddhesh Poyarekar wrote:
> > On 1/7/21 7:55 PM, Szabolcs Nagy via Libc-alpha wrote:
> > > it still sounds like a big hack that relies on where
> > > IRELATIVE is placed with current linkers and that there
> > > are no other relocs we need to care about in static pie.
> > > i can create a patch to see if it works.
> >
> > This is probably a safe assumption for pointer based architectures but not
> > for those with capabilities.  I have a feeling you might care about the
> > latter ;)
>
> well static pie is broken now, i care about that more.
> we don't know yet how capability relocs will work with
> static linking and ifuncs.
>
> but the main problem with this idea is that i have
> to copy a large part of the elf reloc code and hack
> it apart to do static pie relocation (and that code
> is not a nice piece of code in the first place).

We should call _dl_relocate_static_pie as early as possible
and delay others as much as we can.
  
H.J. Lu Jan. 7, 2021, 9:03 p.m. UTC | #9
On Thu, Jan 7, 2021 at 3:04 AM Szabolcs Nagy via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On targets where hidden symbol access does not need RELATIVE
> relocs, move the static pie self relocation after tunables and
> cpu features are set up.  This allows processing IRELATIVE
> relocs with correct ifunc dispatch logic.
>
> Unfortunately it is hard to guarantee that there will be no
> dynamic relocations in the early start up code, so this is a
> bit fragile. Ideally the RELATIVE relocs would be processed as
> early as possible and IRELATIVE relocs after cpu features are
> setup, but in glibc it is hard to separate them into two steps.
> ---
>  csu/libc-start.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/csu/libc-start.c b/csu/libc-start.c
> index db859c3bed..b8d22bd59e 100644
> --- a/csu/libc-start.c
> +++ b/csu/libc-start.c
> @@ -142,7 +142,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>    int result;
>
>  #ifndef SHARED
> +# ifndef PI_STATIC_AND_HIDDEN
> +  /* Do static pie self relocation as early as possible.  */
>    _dl_relocate_static_pie ();
> +# endif

It should be simply removed. PI_STATIC_AND_HIDDEN should be
required for static PIE.

>    char **ev = &argv[argc + 1];
>
> @@ -191,6 +194,13 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),

Does
   {
      /* Starting from binutils-2.23, the linker will define the
         magic symbol __ehdr_start to point to our own ELF header
         if it is visible in a segment that also includes the phdrs.
         So we can set up _dl_phdr and _dl_phnum even without any
         information from auxv.  */

      extern const ElfW(Ehdr) __ehdr_start
        __attribute__ ((weak, visibility ("hidden")));
      if (&__ehdr_start != NULL)
        {
          assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
          GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
          GL(dl_phnum) = __ehdr_start.e_phnum;
        }
    }

here require RELATIVE relocation?

>    ARCH_INIT_CPU_FEATURES ();
>
> +# ifdef PI_STATIC_AND_HIDDEN
> +  /* Do static pie self relocation after cpu features are setup.
> +     Code before this point must avoid relocations, which in practice
> +     means no initialized global pointer or ifunc symbol access.  */
> +  _dl_relocate_static_pie ();
> +# endif
> +
>    /* Perform IREL{,A} relocations.  */
>    ARCH_SETUP_IREL ();
>
> --
> 2.17.1
>
  
Szabolcs Nagy Jan. 8, 2021, 10:08 a.m. UTC | #10
The 01/07/2021 13:03, H.J. Lu wrote:
> On Thu, Jan 7, 2021 at 3:04 AM Szabolcs Nagy via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >  #ifndef SHARED
> > +# ifndef PI_STATIC_AND_HIDDEN
> > +  /* Do static pie self relocation as early as possible.  */
> >    _dl_relocate_static_pie ();
> > +# endif
> 
> It should be simply removed. PI_STATIC_AND_HIDDEN should be
> required for static PIE.

i guess that makes sense, i can add an assertion for that.

> >    char **ev = &argv[argc + 1];
> >
> > @@ -191,6 +194,13 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> 
> Does
>    {
>       /* Starting from binutils-2.23, the linker will define the
>          magic symbol __ehdr_start to point to our own ELF header
>          if it is visible in a segment that also includes the phdrs.
>          So we can set up _dl_phdr and _dl_phnum even without any
>          information from auxv.  */
> 
>       extern const ElfW(Ehdr) __ehdr_start
>         __attribute__ ((weak, visibility ("hidden")));
>       if (&__ehdr_start != NULL)
>         {
>           assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
>           GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
>           GL(dl_phnum) = __ehdr_start.e_phnum;
>         }
>     }
> 
> here require RELATIVE relocation?

hm indeed __ehdr_start is accessed via GOT.
(i thought pc relative access would work, but
i guess that cannot work for an undefined
weak symbol.)

without relocation processing &__ehdr_start
will always evaluate to 0, i.e. as if it was
undefined.

but i think GL(dl_phdr) is only used much
later so it can be moved after self relocs.
that would also make it clear that we don't
use anything here from phdrs that might
require relocations.

i'll fix this.
  

Patch

diff --git a/csu/libc-start.c b/csu/libc-start.c
index db859c3bed..b8d22bd59e 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -142,7 +142,10 @@  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
   int result;
 
 #ifndef SHARED
+# ifndef PI_STATIC_AND_HIDDEN
+  /* Do static pie self relocation as early as possible.  */
   _dl_relocate_static_pie ();
+# endif
 
   char **ev = &argv[argc + 1];
 
@@ -191,6 +194,13 @@  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 
   ARCH_INIT_CPU_FEATURES ();
 
+# ifdef PI_STATIC_AND_HIDDEN
+  /* Do static pie self relocation after cpu features are setup.
+     Code before this point must avoid relocations, which in practice
+     means no initialized global pointer or ifunc symbol access.  */
+  _dl_relocate_static_pie ();
+# endif
+
   /* Perform IREL{,A} relocations.  */
   ARCH_SETUP_IREL ();