[0/1] __libc_start_main() now uses auxv to obtain phdr's address [BZ #29864]

Message ID 73l3eC0YTCoWnmm2zGV-SIY9-W5DQ-peyx7secTao121qocwqub-R4Zhrpkai8_LRxy4xsn7ARdHRbI7NnzSysk6ACmQWfiavdpz-tA70xc=@protonmail.com
State Superseded
Headers

Commit Message

Yago Gutiérrez Dec. 7, 2022, 12:16 p.m. UTC
  Yago Gutiérrez (1):
  __libc_start_main() now uses auxv to obtain phdr's address [BZ #29864]

 csu/libc-start.c                        | 34 +++++++++++++------------
 sysdeps/generic/ldsodefs.h              |  3 +++
 sysdeps/unix/sysv/linux/dl-parse_auxv.h |  1 +
 3 files changed, 22 insertions(+), 16 deletions(-)
  

Comments

Adhemerval Zanella Dec. 8, 2022, 5:07 p.m. UTC | #1
On Wed, Dec 7, 2022 at 9:17 AM Yago Gutiérrez via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Yago Gutiérrez (1):
>   __libc_start_main() now uses auxv to obtain phdr's address [BZ #29864]

For this change I think Linux should also use AT_PHNUM for dl_phnum,
the rest looks ok.

>
>  csu/libc-start.c                        | 34 +++++++++++++------------
>  sysdeps/generic/ldsodefs.h              |  3 +++
>  sysdeps/unix/sysv/linux/dl-parse_auxv.h |  1 +
>  3 files changed, 22 insertions(+), 16 deletions(-)
>
> --
> 2.38.1
  
Florian Weimer Dec. 8, 2022, 6:42 p.m. UTC | #2
* Adhemerval Zanella via Libc-alpha:

> On Wed, Dec 7, 2022 at 9:17 AM Yago Gutiérrez via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> Yago Gutiérrez (1):
>>   __libc_start_main() now uses auxv to obtain phdr's address [BZ #29864]
>
> For this change I think Linux should also use AT_PHNUM for dl_phnum,
> the rest looks ok.

I'm not really happy with this change, see my upcoming comments.

Thanks,
Florian
  
Florian Weimer Dec. 8, 2022, 6:54 p.m. UTC | #3
* Yago Gutiérrez via Libc-alpha:

> From bf2371ce732194f652719aff4af0f9021f9cbd90 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Yago=20Guti=C3=A9rrez?= <yagogl@protonmail.com>
> Date: Wed, 7 Dec 2022 12:49:22 +0100
> Subject: [PATCH 1/1] __libc_start_main() now uses auxv to obtain phdr's
>  address [BZ #29864]
> To: libc-alpha@sourceware.org

This needs a much more elaborate commit message.

I don't quite get why you need to add this to _rtld_global.  This is
only used by the static startup code.

This looks like it is a fix for the refactoring in:

commit 73fc4e28b9464f0e13edc719a5372839970e7ddb
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Feb 28 11:50:41 2022 +0100

    Linux: Consolidate auxiliary vector parsing (redo)
    
    And optimize it slightly.
    
    This is commit 8c8510ab2790039e58995ef3a22309582413d3ff revised.
    
    In _dl_aux_init in elf/dl-support.c, use an explicit loop
    and -fno-tree-loop-distribute-patterns to avoid memset.
    
    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

What are the actual failures you see?

Thanks,
Florian
  
Adhemerval Zanella Dec. 8, 2022, 7:15 p.m. UTC | #4
On 08/12/22 15:54, Florian Weimer via Libc-alpha wrote:
> * Yago Gutiérrez via Libc-alpha:
> 
>> From bf2371ce732194f652719aff4af0f9021f9cbd90 Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Yago=20Guti=C3=A9rrez?= <yagogl@protonmail.com>
>> Date: Wed, 7 Dec 2022 12:49:22 +0100
>> Subject: [PATCH 1/1] __libc_start_main() now uses auxv to obtain phdr's
>>  address [BZ #29864]
>> To: libc-alpha@sourceware.org
> 
> This needs a much more elaborate commit message.
> 
> I don't quite get why you need to add this to _rtld_global.  This is
> only used by the static startup code.
> 
> This looks like it is a fix for the refactoring in:
> 
> commit 73fc4e28b9464f0e13edc719a5372839970e7ddb
> Author: Florian Weimer <fweimer@redhat.com>
> Date:   Mon Feb 28 11:50:41 2022 +0100
> 
>     Linux: Consolidate auxiliary vector parsing (redo)
>     
>     And optimize it slightly.
>     
>     This is commit 8c8510ab2790039e58995ef3a22309582413d3ff revised.
>     
>     In _dl_aux_init in elf/dl-support.c, use an explicit loop
>     and -fno-tree-loop-distribute-patterns to avoid memset.
>     
>     Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> 
> What are the actual failures you see?

Even if it does not fix a user visible failure, I think align the dl_phdr
value with the kernel provided ones (since it is the responsible to actually
load the binary) makes sense.
  
Yago Gutiérrez Dec. 9, 2022, 9:02 a.m. UTC | #5
Oh, I'm sorry, you are right. I wrote the explanations in the bugzilla report, not here, my bad. I will paste here what I said there.

The ELF specification declares a field in the ELF header to contain the offset inside the binary at which there will be found the program headers table: phoff. Since a segment may see its in-memory size differ from its in-file size (or the binary may have holes), in-memory offsets are bound to differ too from their in-file counterparts, and hence we may commit an error using base + phoff to find the address of the phdr's, failing miserably with a segmentation fault with a perfectly valid ELF.
Given that the kernel so nicely provides this piece of information in the auxiliar vector, I think it would be best to use it by default, recurring to other ways only when this field is mysteriously ommited from it or there is no auxiliar vector at all.

Thank you for your attention,
Y.


Sent with Proton Mail secure email.

------- Original Message -------
On Thursday, December 8th, 2022 at 19:54, Florian Weimer <fweimer@redhat.com> wrote:


> * Yago Gutiérrez via Libc-alpha:
> 
> > From bf2371ce732194f652719aff4af0f9021f9cbd90 Mon Sep 17 00:00:00 2001
> > From: =?UTF-8?q?Yago=20Guti=C3=A9rrez?= yagogl@protonmail.com
> > Date: Wed, 7 Dec 2022 12:49:22 +0100
> > Subject: [PATCH 1/1] __libc_start_main() now uses auxv to obtain phdr's
> > address [BZ #29864]
> > To: libc-alpha@sourceware.org
> 
> 
> This needs a much more elaborate commit message.
> 
> I don't quite get why you need to add this to _rtld_global. This is
> only used by the static startup code.
> 
> This looks like it is a fix for the refactoring in:
> 
> commit 73fc4e28b9464f0e13edc719a5372839970e7ddb
> Author: Florian Weimer fweimer@redhat.com
> 
> Date: Mon Feb 28 11:50:41 2022 +0100
> 
> Linux: Consolidate auxiliary vector parsing (redo)
> 
> And optimize it slightly.
> 
> This is commit 8c8510ab2790039e58995ef3a22309582413d3ff revised.
> 
> In _dl_aux_init in elf/dl-support.c, use an explicit loop
> and -fno-tree-loop-distribute-patterns to avoid memset.
> 
> Reviewed-by: Szabolcs Nagy szabolcs.nagy@arm.com
> 
> 
> What are the actual failures you see?
> 
> Thanks,
> Florian
  
Florian Weimer Dec. 9, 2022, 1:13 p.m. UTC | #6
* Yago Gutiérrez:

> diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
> index bf9374371e..f1a964ec1d 100644
> --- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
> +++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
> @@ -52,6 +52,9 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
>    _dl_random = (void *) auxv_values[AT_RANDOM];
>    GLRO(dl_minsigstacksize) = auxv_values[AT_MINSIGSTKSZ];
>    GLRO(dl_sysinfo_dso) = (void *) auxv_values[AT_SYSINFO_EHDR];
> +#ifndef SHARED
> +  GL(dl_phdr) = (void*) auxv_values[AT_PHDR];
> +#endif
>  #ifdef NEED_DL_SYSINFO
>    if (GLRO(dl_sysinfo_dso) != NULL)
>      GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];

The dynamic linker calls _dl_parse_auxv via _dl_sysdep_parse_arguments.
This gives direct access to all auxv entries.  I wonder if we should
abolish _dl_aux_init and call _dl_parse_auxv here.  It shouldn't be
necessary to duplicate this data in _dl_main_map and separate variables
(see _dl_non_dynamic_init).

Can we assume that AT_PHDR and AT_PHNUM are always present on Linux?

Thanks,
Florian
  
Adhemerval Zanella Dec. 9, 2022, 1:50 p.m. UTC | #7
On 09/12/22 10:13, Florian Weimer via Libc-alpha wrote:
> * Yago Gutiérrez:
> 
>> diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
>> index bf9374371e..f1a964ec1d 100644
>> --- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
>> +++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
>> @@ -52,6 +52,9 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
>>    _dl_random = (void *) auxv_values[AT_RANDOM];
>>    GLRO(dl_minsigstacksize) = auxv_values[AT_MINSIGSTKSZ];
>>    GLRO(dl_sysinfo_dso) = (void *) auxv_values[AT_SYSINFO_EHDR];
>> +#ifndef SHARED
>> +  GL(dl_phdr) = (void*) auxv_values[AT_PHDR];
>> +#endif
>>  #ifdef NEED_DL_SYSINFO
>>    if (GLRO(dl_sysinfo_dso) != NULL)
>>      GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
> 
> The dynamic linker calls _dl_parse_auxv via _dl_sysdep_parse_arguments.
> This gives direct access to all auxv entries.  I wonder if we should
> abolish _dl_aux_init and call _dl_parse_auxv here.  It shouldn't be
> necessary to duplicate this data in _dl_main_map and separate variables
> (see _dl_non_dynamic_init).
> 
> Can we assume that AT_PHDR and AT_PHNUM are always present on Linux?

Afaik yes, kernel provides AT_PHNUM (since initial initial git 2.6.12-rc2
at least).  I am think it is worth to add that this fix leverages the kernel
fix 0da1d5002745c (fs/binfmt_elf: Fix AT_PHDR for unusual ELF files).
  

Patch

From bf2371ce732194f652719aff4af0f9021f9cbd90 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Yago=20Guti=C3=A9rrez?= <yagogl@protonmail.com>
Date: Wed, 7 Dec 2022 12:49:22 +0100
Subject: [PATCH 1/1] __libc_start_main() now uses auxv to obtain phdr's
 address [BZ #29864]
To: libc-alpha@sourceware.org

---
 csu/libc-start.c                        | 34 +++++++++++++------------
 sysdeps/generic/ldsodefs.h              |  3 +++
 sysdeps/unix/sysv/linux/dl-parse_auxv.h |  1 +
 3 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/csu/libc-start.c b/csu/libc-start.c
index 543560f36c..f6e0fbf60a 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -262,28 +262,30 @@  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
   }
 #  endif
   _dl_aux_init (auxvec);
-  if (GL(dl_phdr) == NULL)
 # endif
-    {
-      /* Starting from binutils-2.23, the linker will define the
-         magic symbol __ehdr_start to point to our own ELF header
-         if it is visible in a segment that also includes the phdrs.
-         So we can set up _dl_phdr and _dl_phnum even without any
-         information from auxv.  */
 
-      extern const ElfW(Ehdr) __ehdr_start
+  /* Starting from binutils-2.23, the linker will define the
+     magic symbol __ehdr_start to point to our own ELF header
+     if it is visible in a segment that also includes the phdrs.
+     So we can set up _dl_phdr and _dl_phnum even without any
+     information from auxv.  */
+  extern const ElfW(Ehdr) __ehdr_start
 # if BUILD_PIE_DEFAULT
-	__attribute__ ((visibility ("hidden")));
+  __attribute__ ((visibility ("hidden")));
 # else
-	__attribute__ ((weak, visibility ("hidden")));
-      if (&__ehdr_start != NULL)
+  __attribute__ ((weak, visibility ("hidden")));
+  if (&__ehdr_start != NULL)
+# endif
+  {
+# ifdef HAVE_AUX_VECTOR
+    if (GL(dl_phdr) == NULL)
 # endif
-        {
-          assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
-          GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
-          GL(dl_phnum) = __ehdr_start.e_phnum;
-        }
+    {
+      assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
+      GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
     }
+    GL(dl_phnum) = __ehdr_start.e_phnum;
+  }
 
   __tunables_init (__environ);
 
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 9dae72b1ed..ccdb42837f 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -395,6 +395,9 @@  struct rtld_global
 
   /* Structure describing the dynamic linker itself.  */
   EXTERN struct link_map _dl_rtld_map;
+
+  EXTERN const ElfW(Phdr)* _dl_phdr;
+
 #ifdef SHARED
   /* Used to store the audit information for the link map of the
      dynamic loader.  */
diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
index bf9374371e..2cc666cbd6 100644
--- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
+++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
@@ -52,6 +52,7 @@  void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
   _dl_random = (void *) auxv_values[AT_RANDOM];
   GLRO(dl_minsigstacksize) = auxv_values[AT_MINSIGSTKSZ];
   GLRO(dl_sysinfo_dso) = (void *) auxv_values[AT_SYSINFO_EHDR];
+  GL(dl_phdr) = (void*) auxv_values[AT_PHDR];
 #ifdef NEED_DL_SYSINFO
   if (GLRO(dl_sysinfo_dso) != NULL)
     GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
-- 
2.38.1