elf_memory again, readonly or readwrite...

Message ID 20240829134757.GC4864@gnu.wildebeest.org
State Committed
Headers
Series elf_memory again, readonly or readwrite... |

Commit Message

Mark Wielaard Aug. 29, 2024, 1:47 p.m. UTC
  Hi,

So we changed elf_memory so it pretends the in-memory Elf image is
read with ELF_C_READ_MMAP. This helps when calling elf_memory on
read-only memory which still wants to change some things about the Elf
like uncompress some sections (which changes the section header).

With ELF_C_READ_MMAP libelf will make a copy of the section headers,
so any changes aren't written to the memory image (because that would
crash if the underlying memory is read-only).

But that breaks another use case where elf_memory is used on rw memory
and then the section headers are updated using libelf and it is
expected those in-memory section headers reflect the changes.

The problem of course is the elf_memory function doesn't take a "mode"
argument. So we have to guess. And make one or the other usage
unusable.

I think we should assume the memory is read/write and at least header
updates are written back to the memory image. But that when only just
reading the image then nothing is changed and written back to the
image.

This does mean that when explicitly uncompressing sections you have to
make sure the image is writable (because that updates the shdrs).

Is that bad/unreasonable?

Derek, would it make your use case impossible?

John, what kind of Shdr changes are you expecting to get written back
to the memory image. Are there any issues that cannot be worked around
by always using the Shdr copy given back from (g)elf[32|64]_shdr?

The patch I am thinking of is attached.

Cheers,

Mark
  

Comments

John Mellor-Crummey Aug. 29, 2024, 2:58 p.m. UTC | #1
Mark,

The draft patch you provided lets my section header updates happen in place. This meets my needs.

I am working with Intel Ponte Vecchio GPU binaries, which are relocatables. What I am trying to do is to (1) change the sh_addr field in the section headers to make the sections non-overlapping (all sections originally have sh_addr 0). I want to relocate each section to its offset, so it is non-overlapping. (2) I make corresponding changes in the symbol table for addresses of text segments and function symbols. (3) I relocate the line map entries to correspond to the new positions.

The use case I have is a bit strange in that 
- I read an elf binary into memory into a buffer that I allocate with malloc. (I don’t want to change the original copy of the binary, I just want to change my copy in memory. I probably could use MMAP_PRIVATE for this, but that required more thought.)
- I open the in-memory copy of the binary with elf_memory and make the changes that I describe above.
- I then pass the in-memory copy to Dyninst to help me process the binary (extract information about functions, inlined code, line mappings, etc.)

Dyninst isn’t accepting an Elf * from me. It accepts the memory as a char * pointer like elf_memory does. Then it uses elfutils to open it itself. That is the reason that I need all of my adjustments to the binary to be committed back to the memory segment: Dyninst isn’t inheriting the Elf * from me; it is starting from scratch.

I am willing to call elf_update to make updates if that makes it easier to address both Derek’s use case and mine. However, some changes to elf_update may be necessary to support update of something mapped with ELF_C_READ_MMAP_PRIVATE.
--
John Mellor-Crummey		Professor
Dept of Computer Science	Rice University
email: johnmc@rice.edu		phone: 713-348-5179

> On Aug 29, 2024, at 8:47 AM, Mark Wielaard <mark@klomp.org> wrote:
> 
> Hi,
> 
> So we changed elf_memory so it pretends the in-memory Elf image is
> read with ELF_C_READ_MMAP. This helps when calling elf_memory on
> read-only memory which still wants to change some things about the Elf
> like uncompress some sections (which changes the section header).
> 
> With ELF_C_READ_MMAP libelf will make a copy of the section headers,
> so any changes aren't written to the memory image (because that would
> crash if the underlying memory is read-only).
> 
> But that breaks another use case where elf_memory is used on rw memory
> and then the section headers are updated using libelf and it is
> expected those in-memory section headers reflect the changes.
> 
> The problem of course is the elf_memory function doesn't take a "mode"
> argument. So we have to guess. And make one or the other usage
> unusable.
> 
> I think we should assume the memory is read/write and at least header
> updates are written back to the memory image. But that when only just
> reading the image then nothing is changed and written back to the
> image.
> 
> This does mean that when explicitly uncompressing sections you have to
> make sure the image is writable (because that updates the shdrs).
> 
> Is that bad/unreasonable?
> 
> Derek, would it make your use case impossible?
> 
> John, what kind of Shdr changes are you expecting to get written back
> to the memory image. Are there any issues that cannot be worked around
> by always using the Shdr copy given back from (g)elf[32|64]_shdr?
> 
> The patch I am thinking of is attached.
> 
> Cheers,
> 
> Mark<elf_memory.patch>
  
Derek Bruening Aug. 29, 2024, 9:06 p.m. UTC | #2
Instead of elf_memory() having to guess, which seems like it will confuse
future users (esp if its behavior is not documented), can we make a new
extended API routine elf_memory_mode() that takes in the mode, and clearly
document that the old legacy one assumes readonly (or read-write if you
change it)?

On Thu, Aug 29, 2024 at 9:48 AM Mark Wielaard <mark@klomp.org> wrote:

> Hi,
>
> So we changed elf_memory so it pretends the in-memory Elf image is
> read with ELF_C_READ_MMAP. This helps when calling elf_memory on
> read-only memory which still wants to change some things about the Elf
> like uncompress some sections (which changes the section header).
>
> With ELF_C_READ_MMAP libelf will make a copy of the section headers,
> so any changes aren't written to the memory image (because that would
> crash if the underlying memory is read-only).
>
> But that breaks another use case where elf_memory is used on rw memory
> and then the section headers are updated using libelf and it is
> expected those in-memory section headers reflect the changes.
>
> The problem of course is the elf_memory function doesn't take a "mode"
> argument. So we have to guess. And make one or the other usage
> unusable.
>
> I think we should assume the memory is read/write and at least header
> updates are written back to the memory image. But that when only just
> reading the image then nothing is changed and written back to the
> image.
>
> This does mean that when explicitly uncompressing sections you have to
> make sure the image is writable (because that updates the shdrs).
>
> Is that bad/unreasonable?
>
> Derek, would it make your use case impossible?
>
> John, what kind of Shdr changes are you expecting to get written back
> to the memory image. Are there any issues that cannot be worked around
> by always using the Shdr copy given back from (g)elf[32|64]_shdr?
>
> The patch I am thinking of is attached.
>
> Cheers,
>
> Mark
  
Mark Wielaard Oct. 18, 2024, 3:03 p.m. UTC | #3
Hi,

On Thu, 2024-08-29 at 15:47 +0200, Mark Wielaard wrote:
> So we changed elf_memory so it pretends the in-memory Elf image is
> read with ELF_C_READ_MMAP. This helps when calling elf_memory on
> read-only memory which still wants to change some things about the Elf
> like uncompress some sections (which changes the section header).
> 
> With ELF_C_READ_MMAP libelf will make a copy of the section headers,
> so any changes aren't written to the memory image (because that would
> crash if the underlying memory is read-only).
> 
> But that breaks another use case where elf_memory is used on rw memory
> and then the section headers are updated using libelf and it is
> expected those in-memory section headers reflect the changes.
> 
> The problem of course is the elf_memory function doesn't take a "mode"
> argument. So we have to guess. And make one or the other usage
> unusable.
> 
> I think we should assume the memory is read/write and at least header
> updates are written back to the memory image. But that when only just
> reading the image then nothing is changed and written back to the
> image.
> 
> This does mean that when explicitly uncompressing sections you have to
> make sure the image is writable (because that updates the shdrs).

I waited too long deciding what to do here. Sorry. We are about to make
the 0.192 release today. If it isn't too late then I would like to
apply my proposed patch (as attached).

It isn't ideal, but it basically reverts to the behavior of elfutils
0.190. So you can again do direct manipulation on ELF images created
through elf_memory. So in that way it was a regression in 0.191, fixed
for 0.192.

It does mean that if the memory you gave elf_memory is read-only and
you call elf_compress on one of the sections things might crash
(because elf_compress will try to update the Elf_Shdr structure in
place to set the new size). Which indeed could be seen as a regression
from 0.191 (but was there before). If you do want to decompress (or
otherwise manipulate) the ELF image created through elf_memory you have
to do what the testcase (now) does. mmap the image using MMAP_PRIVATE
so it is (copy-on-)writable.

Longer term we probably want a new elf_memory2 function that takes an
argument describing the protection bits of the given memory.

Aaron, is it OK to commit this before you start the 0.192 release
process?

Cheers,

Mark
  
Aaron Merey Oct. 18, 2024, 4:10 p.m. UTC | #4
Hi Mark,

On Fri, Oct 18, 2024 at 11:03 AM Mark Wielaard <mark@klomp.org> wrote:
>
> I waited too long deciding what to do here. Sorry. We are about to make
> the 0.192 release today. If it isn't too late then I would like to
> apply my proposed patch (as attached).
>
> It isn't ideal, but it basically reverts to the behavior of elfutils
> 0.190. So you can again do direct manipulation on ELF images created
> through elf_memory. So in that way it was a regression in 0.191, fixed
> for 0.192.
>
> It does mean that if the memory you gave elf_memory is read-only and
> you call elf_compress on one of the sections things might crash
> (because elf_compress will try to update the Elf_Shdr structure in
> place to set the new size). Which indeed could be seen as a regression
> from 0.191 (but was there before). If you do want to decompress (or
> otherwise manipulate) the ELF image created through elf_memory you have
> to do what the testcase (now) does. mmap the image using MMAP_PRIVATE
> so it is (copy-on-)writable.
>
> Longer term we probably want a new elf_memory2 function that takes an
> argument describing the protection bits of the given memory.
>
> Aaron, is it OK to commit this before you start the 0.192 release
> process?

This is fine, please go ahead and merge.

Aaron
  

Patch

diff --git a/libelf/elf_memory.c b/libelf/elf_memory.c
index 13d77cb71b39..1df49d732dd9 100644
--- a/libelf/elf_memory.c
+++ b/libelf/elf_memory.c
@@ -46,5 +46,6 @@  elf_memory (char *image, size_t size)
       return NULL;
     }
 
-  return __libelf_read_mmaped_file (-1, image, 0, size, ELF_C_READ_MMAP, NULL);
+  return __libelf_read_mmaped_file (-1, image, 0, size,
+				    ELF_C_READ_MMAP_PRIVATE, NULL);
 }
diff --git a/tests/elfgetzdata.c b/tests/elfgetzdata.c
index 0af6c223a06b..a50275fea1a7 100644
--- a/tests/elfgetzdata.c
+++ b/tests/elfgetzdata.c
@@ -69,7 +69,8 @@  main (int argc, char *argv[])
       else
         {
 	  assert (do_mem);
-	  // We mmap the memory ourselves, explicitly PROT_READ only
+	  // We mmap the memory ourselves, explicitly PROT_READ | PROT_WRITE
+	  // elf_memory needs writable memory when using elf_compress.
 	  struct stat st;
 	  if (fstat (fd, &st) != 0)
 	    {
@@ -79,7 +80,8 @@  main (int argc, char *argv[])
 	      continue;
 	    }
 	  map_size = st.st_size;
-	  map_address = mmap (NULL, map_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	  map_address = mmap (NULL, map_size, PROT_READ | PROT_WRITE,
+			      MAP_PRIVATE, fd, 0);
 	  if (map_address == MAP_FAILED)
 	    {
 	      printf ("%s cannot mmap %s\n", argv[cnt], strerror (errno));