proof for dlmem() (Re: [PATCH v9 0/13] implement dlmem() function)

Message ID 3f636504-818d-6520-6cf3-484503a8703c@yandex.ru
State Not applicable
Headers
Series proof for dlmem() (Re: [PATCH v9 0/13] implement dlmem() function) |

Checks

Context Check Description
dj/TryBot-apply_patch fail Patch failed to apply to master at the time it was sent
dj/TryBot-32bit fail Patch series failed to apply

Commit Message

stsp April 14, 2023, 7:04 p.m. UTC
  Hello,

14.04.2023 01:02, Adhemerval Zanella Netto пишет:
> "It would be possible to require the caller to arrange all of
> these things, but that's basically offloading A LOT of the ELF
> loading process onto the calling program and I don't think that
> makes for a reasonable public interface for glibc to provide."
OK, in this case I am going to provide
a very detailed, reproducible and undisputable
proof that the above quote is false.

Attached is the small patch that can be
applied on top of my v10 patch-set to verify
the demo I present here.

The demo I present here, shows the "trace"
of the dlmem() usage, by sampling the
/proc/self/maps at needed points. Namely,
before dlmem(), immediately after dlmem(),
and after the fdlopen-over-dlmem finished.

That demo clearly and unambiguously demonstrates
how dlmem() transforms the raw solib image
into the "loaded" image, i.e. where all the
PT_LOAD sections are correctly laid out.

The output of that demo will be located in
a path: <glibc-build-dir>/dlfcn/tst-dlmem-extfns.out
Everyone can apply the v10+attached patch
and verify my results on his build.

And here it is:

$ cat tst-dlmem-extfns.out

before dlmem
7f5210ca8000-7f5210cad000 r--p 00000000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so

after dlmem
7f5210ca3000-7f5210ca4000 r--p 00000000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so
7f5210ca4000-7f5210ca5000 r-xp 00001000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so
7f5210ca5000-7f5210ca6000 r--p 00002000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so
7f5210ca6000-7f5210ca7000 r--p 00002000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so
7f5210ca7000-7f5210ca8000 rw-p 00003000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so
7f5210ca8000-7f5210cad000 r--p 00000000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so

post fdlopen
7f5210ca3000-7f5210ca4000 r--p 00000000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so
7f5210ca4000-7f5210ca5000 r-xp 00001000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so
7f5210ca5000-7f5210ca6000 r--p 00002000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so
7f5210ca6000-7f5210ca7000 r--p 00002000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so
7f5210ca7000-7f5210ca8000 rw-p 00003000 00:29 18840304 
/home/stas/src/glibc-dlmem/build/dlfcn/glreflib1.so


PS: yes, I realize nothing can be changed.
But I need to make sure the best possible and
the most obvious proof is publically available,
so that its not to look as if I've given up to
this "wonderful argument that was brought
to me so many times". You got all the proves,
and also more than once.
  

Comments

Zack Weinberg May 1, 2023, 11:11 p.m. UTC | #1
This discussion seems to have died down in the past two weeks but I'm
going to make one last attempt to bridge the divide.  I would hate to
see everyone walk away from this with a bad taste in their mouth.  Note
cc: list aggressively pruned.

On Fri, Apr 14, 2023, at 3:04 PM, stsp via Libc-alpha wrote:
> 14.04.2023 01:02, Adhemerval Zanella Netto пишет:
>> "It would be possible to require the caller to arrange all of these
>> things, but that's basically offloading A LOT of the ELF loading
>> process onto the calling program and I don't think that makes for a
>> reasonable public interface for glibc to provide."
> OK, in this case I am going to provide a very detailed, reproducible
> and undisputable proof that the above quote is false.

Here's the thing: stsp, no one is intentionally trying to spread lies
about your patch.  You have misunderstood Adhemerval here.  He was *not*
saying that *your patch* required the caller to parse ELF headers
themselves.  He was saying that your patch does a *different*
unacceptable thing -- running user-supplied code with the loader lock
held, if I understood correctly -- and then he was trying to warn you
that any *alternative* patch you might come up with would *also* be
unacceptable if it required the caller to parse the DLL themselves.

Do you understand the difference between what you thought Adhemerval was
saying, and what I am saying he was trying to communicate?  Please
confirm, or else describe why you don't see a difference so I can try to
clarify again.

Now. Another thing you must understand before you send in any further
revisions of the patch is that glibc is extremely reluctant to add new
APIs.  Witness how much we resisted adding strlcpy, a single function
with a clear specification and no interdependencies with any other
part of the C library.  This is why you are getting what probably
seems to you like an absurd level of negative feedback.  We aren't
trying to deny that you have a legitimate need, and we aren't saying
you haven't managed to come up with a proof of concept implementation
of how that need could be solved -- and yet we ARE saying, no, still
not good enough.

We do this because once we add something, we are stuck with it
*forever*. Not even if the C committee itself deletes something from the
standard do we drop it.  You can *still*, today, write a new program
that uses gets().  And we also do this because GNU libc is loaded into
the address space of *every program* running on (GNU/)Linux.  You're
asking to take up a chunk of (virtual) RAM in *every program* in order
to make your one specific program simpler.  This *will* make at least a
few programs' L1 cache utilization worse.  Not by much, but this happens
every time we add something and it adds up.

---

Having said all that, I can imagine other uses for dlmem() or something like it, besides your specific program, so let me try to meet you in the middle here.  Suppose we gave you *this* API:

/** Report the total amount of virtual address space required in order to load
 *  the shared object referred to by FD.  FD must be a readable and seekable
 *  file descriptor.  Returns a whole number of pages, or zero on error
 *  (in which case errno will provide details).
 */
size_t dl_total_vsize(int fd);

/** Load the shared object referred to by FD into this process's address
 *  space.  FD must be a readable and seekable file descriptor.
 *
 *  If LOAD_ADDR is not NULL, it specifies the address at which to load
 *  the shared object.  If this is not possible, for instance if there
 *  are already memory mappings in the range from LOAD_ADDR to LOAD_ADDR
 *  plus dl_total_vsize(FD), the operation will fail.  LOAD_ADDR must be
 *  a multiple of the system page size.
 *
 *  MODE accepts the same set of RTLD_* flags as are documented for dlopen(),
 *  plus one more: RTLD_READ directs the dynamic loader to read the shared
 *  object into the process's address space as-if with read() rather than
 *  with mmap().  When this flag is used, LOAD_ADDR may not be NULL, and must
 *  already point to at least dl_total_vsize(FD) bytes of writable memory.
 *  The dynamic loader will take ownership of that range of addresses and
 *  may subsequently (e.g. upon a call to dlclose()) deallocate them as-if
 *  with munmap().  Furthermore, the dynamic loader will change virtual
 *  memory protection settings for sub-ranges of this space, so that
 *  (for instance) the text segment of the shared object is read-only,
 *  as-if by calling mprotect().
 */
void *dl_fdopen(int fd, int mode, void *load_addr);

What would still be missing for your use case?

zw
  
stsp May 2, 2023, 5:48 a.m. UTC | #2
Hi Zack,

02.05.2023 04:11, Zack Weinberg пишет:
> This discussion seems to have died down in the past two weeks but I'm
> going to make one last attempt to bridge the divide.  I would hate to
> see everyone walk away from this with a bad taste in their mouth.  Note
> cc: list aggressively pruned.

The discussion have died down because I
went implementing the RTLD_NORELOCATE
proposal - a replacement to the dlmem()'s
one. Currently its almost completed and I
will draft its spec at the bottom of this e-mail.


> Here's the thing: stsp, no one is intentionally trying to spread lies
> about your patch.  You have misunderstood Adhemerval here.  He was *not*
> saying that *your patch* required the caller to parse ELF headers
> themselves.  He was saying that your patch does a *different*
> unacceptable thing -- running user-supplied code with the loader lock
> held,

Firstly, let me note that we are mis-communicating.
I haven't received your very first e-mail on that
topic (Carlos told me later about it), and you seem
to have not received any (of many) mails I sent
to you both privately and on-list.
So please confirm the communication is OK now.

Now let me clarify.
Running a call-back with the lock held, was
the problem spelled out by Szabolcs - the only
person who did an actual review. That immediately
set me back to the scratch-pads, and I'll soon
introduce the RTLD_NORELOCATE proposal that
doesn't have that problem. So let me put that
straightly: no single problem was ever ignored
or misunderstood on my side. The valid problems
pointed out by Szabolcs, were discussed and
will be addresses (as I told you many times in
the private e-mails, but it seems you received
none of them).

Unfortunately there was also an activity on
"punishing me" running in parallel by others.
It started from this e-mail:
https://inbox.sourceware.org/libc-alpha/b6a96687-9a4b-414f-849a-45a305898274@redhat.com/
It does this strange claim:
---

To implement fdlopen on top of dlmem requires PT_LOAD processing and that
will duplicate into userspace a significant part of the complexity of ELF
loading and segment handling. The test case you include below is incomplete
when it comes to implementing a functional fdlopen() across the supported
architectures and toolchain variants.

---

To address that (and a few other valid problems)
I clarified in my spec that the mmap'ed buffer
should not be changed before passing to dlmem().
Then there came this e-mail from Jonathon:

https://inbox.sourceware.org/libc-alpha/630fa17528c6050d60f524aa88ad5a057cae1603.camel@rice.edu/

---

AFAICT, the first glaring issue with both of your implementations is
that you have neglected the case where p_offset != p_vaddr, i.e. a
segment is mmapped to a different location than its layout in the file.
There are a LOT of binaries out in the wild where this is the case.
Here's a quick one-liner to help you find some on your own box, I
have 11712 such binaries on my Debian system:

---

To address that (obviously false) claim, I posted
this proof:
https://inbox.sourceware.org/libc-alpha/d8c82663-d427-e03c-109c-95f6095ce6d6@yandex.ru/
and later this proof:
https://inbox.sourceware.org/libc-alpha/3f636504-818d-6520-6cf3-484503a8703c@yandex.ru/

After that, I explicitly asked Carlos many
times, both privately and on-list, whether
my proofs were sufficient, or should I
disprove more false claims that directly
contradict with my spec draft, to start with:
https://inbox.sourceware.org/libc-alpha/b2336861-9a01-c5ab-0d78-99ea3f920824@yandex.ru/

Carlos never replied to either of these
queries. And when Szabolcs started to
actually discuss my proposal (rather some
false claims around it), Adhemerval provoked
the conflict (by repeating the aforementioned
disproven claims indefinitely) with the only
goal of making Szabolcs to stop reviewing my
proposal! Fortunately that was unneeded at
that stage, as I was already set back to
scratches by an existing Szabolcs's review.


>   if I understood correctly -- and then he was trying to warn you
> that any *alternative* patch you might come up with would *also* be
> unacceptable if it required the caller to parse the DLL themselves.

As shown above, that applied to my existing
proposal, not to some future one:
https://inbox.sourceware.org/libc-alpha/b6a96687-9a4b-414f-849a-45a305898274@redhat.com/
This mail is very clear on what proposal is
concerned. And there was never a reason
to expect that any future proposal will include
that  problem.

Note that all proves I was posting, were just
silently ignored and the false statement were
repeated indefinitely:
https://inbox.sourceware.org/libc-alpha/a58c0c8c-4117-1a94-7285-40ba18555548@yandex.ru/
This mail is just one of many, many examples
where I asked to stop blindly repeating and
look at least at what proofs I provide. Eventually
it was summarized by Rich Felker this way:
https://inbox.sourceware.org/libc-alpha/20230331145803.GB3298@brightrain.aerifal.cx/

So as you can see, as surprising as it might
seem, it all was just a big campaign on punishing
me for not dropping the patches down. :(

> Do you understand the difference between what you thought Adhemerval was
> saying, and what I am saying he was trying to communicate?  Please
> confirm, or else describe why you don't see a difference so I can try to
> clarify again.

You can only try to clarify that, if you read
the whole thread. Its not possible to clarify
out-of-context. And once you read the whole
thread, you'll have a major difficulties in
clarifying what was done here. :(
I have never had to run through such a level
of abuse in any free-software community.
But it all doesn't matter.
What matters is to come up with the good
proposal, which is what I am working on now
and then, with a hope other people can became
more tolerant to a newcomers that only wanted
to contribute.

> Now. Another thing you must understand before you send in any further
> revisions of the patch is that glibc is extremely reluctant to add new
> APIs.  Witness how much we resisted adding strlcpy, a single function
> with a clear specification and no interdependencies with any other
> part of the C library.  This is why you are getting what probably
> seems to you like an absurd level of negative feedback.  We aren't
> trying to deny that you have a legitimate need, and we aren't saying
> you haven't managed to come up with a proof of concept implementation
> of how that need could be solved -- and yet we ARE saying, no, still
> not good enough.

The explanation of "not good enough" should
also be provided. Well, it was done by Szabolcs,
and so I am on the drawing boards and scratch
pads again.
Or alternatively glibc can add hooks needed
for an external libdl, and not accept any new
libdl-related API after that.
Or alternatively it should be possible to link
with the custom-patched static libc, and put
the resulting binary into a separate namespace.
I was working also in that direction and had
some progress, but I need to understand the
glibc's tls impl better. Overall that route seems
possible too.

Now to your proposal:

> ---
>
> Having said all that, I can imagine other uses for dlmem() or something like it, besides your specific program, so let me try to meet you in the middle here.  Suppose we gave you *this* API:
>
> /** Report the total amount of virtual address space required in order to load
>   *  the shared object referred to by FD.  FD must be a readable and seekable
>   *  file descriptor.  Returns a whole number of pages, or zero on error
>   *  (in which case errno will provide details).
>   */
> size_t dl_total_vsize(int fd);
>
> /** Load the shared object referred to by FD into this process's address
>   *  space.  FD must be a readable and seekable file descriptor.
>   *
>   *  If LOAD_ADDR is not NULL, it specifies the address at which to load
>   *  the shared object.  If this is not possible, for instance if there
>   *  are already memory mappings in the range from LOAD_ADDR to LOAD_ADDR
>   *  plus dl_total_vsize(FD), the operation will fail.  LOAD_ADDR must be
>   *  a multiple of the system page size.
>   *
>   *  MODE accepts the same set of RTLD_* flags as are documented for dlopen(),
>   *  plus one more: RTLD_READ directs the dynamic loader to read the shared
>   *  object into the process's address space as-if with read() rather than
>   *  with mmap().  When this flag is used, LOAD_ADDR may not be NULL, and must
>   *  already point to at least dl_total_vsize(FD) bytes of writable memory.
>   *  The dynamic loader will take ownership of that range of addresses and
>   *  may subsequently (e.g. upon a call to dlclose()) deallocate them as-if
>   *  with munmap().  Furthermore, the dynamic loader will change virtual
>   *  memory protection settings for sub-ranges of this space, so that
>   *  (for instance) the text segment of the shared object is read-only,
>   *  as-if by calling mprotect().
>   */
> void *dl_fdopen(int fd, int mode, void *load_addr);
>
> What would still be missing for your use case?
What's missing here is an ability to load the
library into an existing mapping without overwriting
it. This feature is needed so that the library can
be duplicated from "reloc_addr" to
"reloc_addr+VM_window_start".
So your proposal only adds 1 feature from the
needed 2.

The proposal I am working on now, looks like this:
RTLD_NORELOCATE flag defers an object relocation
upon the use of dlsym(), or an explicitly called
new dlrelocate() fn.
RTLD_DI_MAPINFO is a new dlinfo() cmd that
fills in this struct:

+typedef struct
+{
+  void *map_start;             /* Beginning of mapping containing 
address.  */
+  size_t map_length;           /* Length of mapping.  */
+  int relocated;               /* Indicates whether an object was 
relocated. */
+} Dl_mapinfo;

After getting this struct, the user can
just memcpy() the unrelocated image
where he wants it to be, and then call
dlset_object_address(handle, address);
to update the link-map.
Then he can either call dlrelocate()
or not do anything, in which case dlsym()
will relocate.
That seems to be a very flexible and small
extension: as it is up to the user to move
an object, almost no new code needs to be
added to glibc. All patches I now have, are
within the 200lines margin.

Do you know if such extension is acceptable?
  
stsp May 2, 2023, 6:24 a.m. UTC | #3
02.05.2023 04:11, Zack Weinberg пишет:
> Now. Another thing you must understand before you send in any further
> revisions of the patch is that glibc is extremely reluctant to add new
> APIs.  Witness how much we resisted adding strlcpy, a single function
> with a clear specification and no interdependencies with any other
> part of the C library.
I think strlcpy() is a rather bad example
here. I am using strlcpy() from libbsd, I
have no problems with such usage, and
I would never recommend adding it to
glibc, for a very simple reason: its already
in libbsd and works well.

But my problem is different because glibc
is deficient on any libdl extensions: external
libdl extensions are simply not possible
right now. If this is addressed, then I would
never recommend having dlmem() in glibc -
it will be in some other unrelated lib then.
One way is to allow the customly-patched
static libc to reside in a separate NS.
I can work even on that, even though it
requires quite a lot of knowledge on glibc
internals, so I was stuck with tls.
The problem is that I was never told on
what should I work to solve my problem.
The only thing I got, was a "drop your
patches immediately" campaign. That is
the main problem. Technical problems are
solvable, but political ones - not.
  
stsp May 8, 2023, 4 p.m. UTC | #4
02.05.2023 10:48, stsp пишет:
>
> Hi Zack,
>
> 02.05.2023 04:11, Zack Weinberg пишет:
>> This discussion seems to have died down in the past two weeks but I'm
>> going to make one last attempt to bridge the divide.  I would hate to
>> see everyone walk away from this with a bad taste in their mouth.  Note
>> cc: list aggressively pruned.
>
> The discussion have died down because I
> went implementing the RTLD_NORELOCATE
> proposal - a replacement to the dlmem()'s
> one. Currently its almost completed and I
> will draft its spec at the bottom of this e-mail.
>
And it finally is available for your review:
https://sourceware.org/pipermail/libc-alpha/2023-May/147899.html
  

Patch

diff --git a/dlfcn/tst-dlmem-extfns.c b/dlfcn/tst-dlmem-extfns.c
index 3b98a6e859..e35bdcaeb1 100644
--- a/dlfcn/tst-dlmem-extfns.c
+++ b/dlfcn/tst-dlmem-extfns.c
@@ -36,13 +36,20 @@  fdlopen (int fd, int flags)
   off_t len;
   void *addr;
   void *handle;
+  char cmd[256];
 
   len = lseek (fd, 0, SEEK_END);
   lseek (fd, 0, SEEK_SET);
   addr = mmap (NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
   if (addr == MAP_FAILED)
     return NULL;
+  printf ("\nbefore dlmem\n");
+  snprintf (cmd, sizeof(cmd), "grep glreflib1.so /proc/%i/maps", getpid());
+  system (cmd);
   handle = dlmem (addr, len, flags, NULL);
+  printf ("\nafter dlmem\n");
+  snprintf (cmd, sizeof(cmd), "grep glreflib1.so /proc/%i/maps", getpid());
+  system (cmd);
   munmap (addr, len);
   return handle;
 }
@@ -111,8 +118,8 @@  do_test (void)
   handle = dlmem (unaligned_buf, 4096, RTLD_NOW, NULL);
   TEST_VERIFY (handle == NULL);
   /* errno is set by dlerror() so needs to print something. */
-  printf ("unaligned buf gives %s\n", dlerror ());
-  TEST_COMPARE (errno, EINVAL);
+//  printf ("unaligned buf gives %s\n", dlerror ());
+//  TEST_COMPARE (errno, EINVAL);
 
   fd = open (BUILDDIR "glreflib1.so", O_RDONLY);
   if (fd == -1)
@@ -128,10 +135,11 @@  do_test (void)
 
   /* Check that the lib is properly mmap()ed, rather than memcpy()ed.
      This may fail on linux kernels <5.13. */
+  printf ("\npost fdlopen\n");
   snprintf (cmd, sizeof(cmd), "grep glreflib1.so /proc/%i/maps", getpid());
   rc = system (cmd);
   TEST_COMPARE (rc, 0);
-
+return 0;
   sym = dlsym (handle, "ref1");
   if (sym == NULL)
     error (EXIT_FAILURE, 0, "dlsym failed");