[1/3] elf_getarhdr.c: Return correct header for archive within an archive

Message ID 20250908021729.19560-1-amerey@redhat.com
State Superseded
Headers
Series [1/3] elf_getarhdr.c: Return correct header for archive within an archive |

Commit Message

Aaron Merey Sept. 8, 2025, 2:17 a.m. UTC
  If elf_getarhdr is called on a descriptor that refers to an archive that
is itself a member of an outer archive, it may return the Elf_Arhdr of
the current member of the inner archive instead of Elf_Arhdr of the inner
archive itself.

This also causes a memory leak: elf_end only attempts to free
Elf_Arhdr fields ar_name and ar_rawname for descriptors that are not
ELF_K_AR.

Fix this by adding a new field ar_ar_hdr to Elf->state.ar.  The
ar_ar_hdr field stores the Elf_Arhdr of an archive which is itself a member
of an archive.  elf_getarhdr now returns ar_ar_hdr for ELF_K_AR descriptors
associated with a parent archive and elf_end will free ar_ar_hdr.ar_name and
ar_ar_hdr.ar_rawname.

Siged-off-by: Aaron Merey <amerey@redhat.com>
---

This series addresses the following bugs:
https://issues.oss-fuzz.com/issues/440209723
https://issues.oss-fuzz.com/issues/440177309

There is at least one other bug in eu-readelf:process_elf where the code
does not account for archives within archives:

      off_t aroff = elf_getaroff (elf);
      pure_elf = dwelf_elf_begin (fd); 
      if (aroff > 0)
        {     
          /* Archive member.  */
          (void) elf_rand (pure_elf, aroff);
          Elf *armem = elf_begin (-1, ELF_C_READ_MMAP, pure_elf);
          elf_end (pure_elf);
          pure_elf = armem;
        }     
      if (pure_elf == NULL) 
        {     
          error (0, 0, _("cannot read ELF: %s"), elf_errmsg (-1));
          return;
        }

process_elf attempts to re-read archive members to avoid printing
relocations that may have been applied earlier.  But for members of
an inner archive this fails.  elf_rand is called with the outer archive
descriptor (pure_elf) but aroff is relative to the inner archive.

I will add tests for archives-within-archives cases and check other
eu-* tools that may be affected.

 libelf/elf_begin.c    | 6 +++++-
 libelf/elf_end.c      | 8 ++++++++
 libelf/elf_getarhdr.c | 5 ++++-
 libelf/libelfP.h      | 2 ++
 4 files changed, 19 insertions(+), 2 deletions(-)
  

Comments

Mark Wielaard Sept. 9, 2025, 2:22 p.m. UTC | #1
Hi Aaron,

On Sun, 2025-09-07 at 22:17 -0400, Aaron Merey wrote:
> If elf_getarhdr is called on a descriptor that refers to an archive that
> is itself a member of an outer archive, it may return the Elf_Arhdr of
> the current member of the inner archive instead of Elf_Arhdr of the inner
> archive itself.
> 
> This also causes a memory leak: elf_end only attempts to free
> Elf_Arhdr fields ar_name and ar_rawname for descriptors that are not
> ELF_K_AR.
> 
> Fix this by adding a new field ar_ar_hdr to Elf->state.ar.  The
> ar_ar_hdr field stores the Elf_Arhdr of an archive which is itself a member
> of an archive.  elf_getarhdr now returns ar_ar_hdr for ELF_K_AR descriptors
> associated with a parent archive and elf_end will free ar_ar_hdr.ar_name and
> ar_ar_hdr.ar_rawname.

The naming confused me a little. ar_ar_hdr for the ar union member
plays the same role as the elf_ar_hdr for the elf[32|64] union members.
But the ar.elf_ar_hdr field us used for a different purpose (holding
the current/offset ar_hdr).

It might make sense to rename these and even maybe move the elf_ar_hdr
/ar_ar_hdr out of the union members and just have them as part of the
Elf struct itself because now every Elf has an "am I an Elf ar member"
field.

Maybe do this after you added more tests though and after playing with
pahole to see if there is an ideal layout of the struct/union members.

> Siged-off-by: Aaron Merey <amerey@redhat.com>
> ---
> 
> This series addresses the following bugs:
> https://issues.oss-fuzz.com/issues/440209723
> https://issues.oss-fuzz.com/issues/440177309
> 
> There is at least one other bug in eu-readelf:process_elf where the code
> does not account for archives within archives:
> 
>       off_t aroff = elf_getaroff (elf);
>       pure_elf = dwelf_elf_begin (fd); 
>       if (aroff > 0)
>         {     
>           /* Archive member.  */
>           (void) elf_rand (pure_elf, aroff);
>           Elf *armem = elf_begin (-1, ELF_C_READ_MMAP, pure_elf);
>           elf_end (pure_elf);
>           pure_elf = armem;
>         }     
>       if (pure_elf == NULL) 
>         {     
>           error (0, 0, _("cannot read ELF: %s"), elf_errmsg (-1));
>           return;
>         }
> 
> process_elf attempts to re-read archive members to avoid printing
> relocations that may have been applied earlier.  But for members of
> an inner archive this fails.  elf_rand is called with the outer archive
> descriptor (pure_elf) but aroff is relative to the inner archive.
> 
> I will add tests for archives-within-archives cases and check other
> eu-* tools that may be affected.

Yes, more tests would be nice.

>  libelf/elf_begin.c    | 6 +++++-
>  libelf/elf_end.c      | 8 ++++++++
>  libelf/elf_getarhdr.c | 5 ++++-
>  libelf/libelfP.h      | 2 ++
>  4 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/libelf/elf_begin.c b/libelf/elf_begin.c
> index d3ab887d..823a4324 100644
> --- a/libelf/elf_begin.c
> +++ b/libelf/elf_begin.c
> @@ -1128,8 +1128,12 @@ dup_elf (int fildes, Elf_Cmd cmd, Elf *ref)
>      {
>        /* Enlist this new descriptor in the list of children.  */
>        result->next = ref->state.ar.children;
> -      result->state.elf.elf_ar_hdr = ar_hdr;
>        ref->state.ar.children = result;
> +
> +      if (result->kind == ELF_K_AR)
> +	result->state.ar.ar_ar_hdr = ar_hdr;
> +      else
> +        result->state.elf.elf_ar_hdr = ar_hdr;
>      }
>    else
>      {

Looks OK.

> diff --git a/libelf/elf_end.c b/libelf/elf_end.c
> index 1d366127..460b09b7 100644
> --- a/libelf/elf_end.c
> +++ b/libelf/elf_end.c
> @@ -124,6 +124,14 @@ elf_end (Elf *elf)
>        if (elf->state.elf.elf_ar_hdr.ar_rawname != NULL)
>  	free (elf->state.elf.elf_ar_hdr.ar_rawname);
>      }
> +  else
> +    {
> +      if (elf->state.ar.ar_ar_hdr.ar_name != NULL)
> +	free (elf->state.ar.ar_ar_hdr.ar_name);
> +
> +      if (elf->state.ar.ar_ar_hdr.ar_rawname != NULL)
> +	free (elf->state.ar.ar_ar_hdr.ar_rawname);
> +    }
>  
>    /* This was the last activation.  Free all resources.  */
>    switch (elf->kind)

Looks OK.

> diff --git a/libelf/elf_getarhdr.c b/libelf/elf_getarhdr.c
> index 9211fc2e..a5fde49b 100644
> --- a/libelf/elf_getarhdr.c
> +++ b/libelf/elf_getarhdr.c
> @@ -51,5 +51,8 @@ elf_getarhdr (Elf *elf)
>        return NULL;
>      }
>  
> -  return &elf->state.elf.elf_ar_hdr;
> +  if (elf->kind == ELF_K_AR)
> +    return &elf->state.ar.ar_ar_hdr;
> +  else
> +    return &elf->state.elf.elf_ar_hdr;
>  }

Looks OK.

So in all these cases it would be simpler if there was just an
state.elf_ar_hdr field so you could use that instead of having to
select on elf->kind.

> diff --git a/libelf/libelfP.h b/libelf/libelfP.h
> index 1b93da88..568d4b26 100644
> --- a/libelf/libelfP.h
> +++ b/libelf/libelfP.h
> @@ -403,6 +403,8 @@ struct Elf
>        char ar_name[16];		/* NUL terminated ar_name of elf_ar_hdr.  */
>        char raw_name[17];	/* This is a buffer for the NUL terminated
>  				   named raw_name used in the elf_ar_hdr.  */
> +      Elf_Arhdr ar_ar_hdr;	/* Archive header of this archive.  Used when
> +				   an archive is a member of an archive.  */
>      } ar;
>    } state;
>  

Might want to check placement of this field with pahole. It now comes
after a 17 element array which means there is at least a small gap for
alignment.

Cheers,

Mark
  
Aaron Merey Sept. 9, 2025, 10:40 p.m. UTC | #2
Hi Mark,

On Tue, Sep 9, 2025 at 10:24 AM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi Aaron,
>
> On Sun, 2025-09-07 at 22:17 -0400, Aaron Merey wrote:
> > If elf_getarhdr is called on a descriptor that refers to an archive that
> > is itself a member of an outer archive, it may return the Elf_Arhdr of
> > the current member of the inner archive instead of Elf_Arhdr of the inner
> > archive itself.
> >
> > This also causes a memory leak: elf_end only attempts to free
> > Elf_Arhdr fields ar_name and ar_rawname for descriptors that are not
> > ELF_K_AR.
> >
> > Fix this by adding a new field ar_ar_hdr to Elf->state.ar.  The
> > ar_ar_hdr field stores the Elf_Arhdr of an archive which is itself a member
> > of an archive.  elf_getarhdr now returns ar_ar_hdr for ELF_K_AR descriptors
> > associated with a parent archive and elf_end will free ar_ar_hdr.ar_name and
> > ar_ar_hdr.ar_rawname.
>
> The naming confused me a little. ar_ar_hdr for the ar union member
> plays the same role as the elf_ar_hdr for the elf[32|64] union members.
> But the ar.elf_ar_hdr field us used for a different purpose (holding
> the current/offset ar_hdr).
>
> It might make sense to rename these and even maybe move the elf_ar_hdr
> /ar_ar_hdr out of the union members and just have them as part of the
> Elf struct itself because now every Elf has an "am I an Elf ar member"
> field.
>
> Maybe do this after you added more tests though and after playing with
> pahole to see if there is an ideal layout of the struct/union members.

I agree that it's better to combine state.elf[32|64].elf_ar_hdr and
state.ar.ar_ar_hdr into one struct Elf field outside of elf->state.
It's less confusing and we can set/get elf_ar_hdr without needing to
check elf->kind, like you said.

I used pahole to compare this suggested layout with the layout given
in this patch (new ar_ar_hdr field in state.ar). There is no
difference in the struct Elf size or the amount of padding:

with --enable-thread-safety:
        /* size: 368, cachelines: 6, members: 14 */
        /* last cacheline: 48 bytes */

with --disable-thread-safety:
        /* size: 312, cachelines: 5, members: 13 */
        /* sum members: 308, holes: 1, sum holes: 4 */
        /* last cacheline: 56 bytes */

I will revise this patch and add more tests.

Aaron
  

Patch

diff --git a/libelf/elf_begin.c b/libelf/elf_begin.c
index d3ab887d..823a4324 100644
--- a/libelf/elf_begin.c
+++ b/libelf/elf_begin.c
@@ -1128,8 +1128,12 @@  dup_elf (int fildes, Elf_Cmd cmd, Elf *ref)
     {
       /* Enlist this new descriptor in the list of children.  */
       result->next = ref->state.ar.children;
-      result->state.elf.elf_ar_hdr = ar_hdr;
       ref->state.ar.children = result;
+
+      if (result->kind == ELF_K_AR)
+	result->state.ar.ar_ar_hdr = ar_hdr;
+      else
+        result->state.elf.elf_ar_hdr = ar_hdr;
     }
   else
     {
diff --git a/libelf/elf_end.c b/libelf/elf_end.c
index 1d366127..460b09b7 100644
--- a/libelf/elf_end.c
+++ b/libelf/elf_end.c
@@ -124,6 +124,14 @@  elf_end (Elf *elf)
       if (elf->state.elf.elf_ar_hdr.ar_rawname != NULL)
 	free (elf->state.elf.elf_ar_hdr.ar_rawname);
     }
+  else
+    {
+      if (elf->state.ar.ar_ar_hdr.ar_name != NULL)
+	free (elf->state.ar.ar_ar_hdr.ar_name);
+
+      if (elf->state.ar.ar_ar_hdr.ar_rawname != NULL)
+	free (elf->state.ar.ar_ar_hdr.ar_rawname);
+    }
 
   /* This was the last activation.  Free all resources.  */
   switch (elf->kind)
diff --git a/libelf/elf_getarhdr.c b/libelf/elf_getarhdr.c
index 9211fc2e..a5fde49b 100644
--- a/libelf/elf_getarhdr.c
+++ b/libelf/elf_getarhdr.c
@@ -51,5 +51,8 @@  elf_getarhdr (Elf *elf)
       return NULL;
     }
 
-  return &elf->state.elf.elf_ar_hdr;
+  if (elf->kind == ELF_K_AR)
+    return &elf->state.ar.ar_ar_hdr;
+  else
+    return &elf->state.elf.elf_ar_hdr;
 }
diff --git a/libelf/libelfP.h b/libelf/libelfP.h
index 1b93da88..568d4b26 100644
--- a/libelf/libelfP.h
+++ b/libelf/libelfP.h
@@ -403,6 +403,8 @@  struct Elf
       char ar_name[16];		/* NUL terminated ar_name of elf_ar_hdr.  */
       char raw_name[17];	/* This is a buffer for the NUL terminated
 				   named raw_name used in the elf_ar_hdr.  */
+      Elf_Arhdr ar_ar_hdr;	/* Archive header of this archive.  Used when
+				   an archive is a member of an archive.  */
     } ar;
   } state;