[v2] malloc: Improve malloc initialization

Message ID PAWPR08MB89821B040D315BA17DD616F383AF2@PAWPR08MB8982.eurprd08.prod.outlook.com (mailing list archive)
State Changes Requested

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch fail Patch failed to apply to master at the time it was sent
redhat-pt-bot/TryBot-32bit fail Patch series failed to apply

Commit Message

Wilco Dijkstra April 2, 2025, 12:09 p.m. UTC
  Hi Florian,

> I think we should document the invariant that tcache_init is called as
> part of pthread_create, before the new thread starts running.  This way,
> ptmalloc_init does not need to be thread-safe.  I think it's still the
> case after this patch because MAYBE_INIT_TCACHE is called unconditionally
> early and does not depend on the allocation size.  (Not sure why this is
> a macro and not a function.)  I'd suggest comments on ptmalloc_init and
> MAYBE_INIT_TCACHE that reference each other, and one of them should
> describe the expected sequence of events.  Maybe also put a comment on
> MAYBE_INIT_TCACHE in __libc_malloc2?

I think it works out fine as long as there is at least one call to malloc before
a new thread is started. The existing initialization already relies on this - I just
made it explicit that tcache_init relies on ptmalloc_init having been called first
(if you call tcache_init first, it would crash in the medium bin code).

I haven't decided what to do with MAYBE_INIT_TCACHE, but in principle we
could initialize the tcache from arena_get (since thread_arena == NULL for new
threads). This would remove all tcache initialization checks from hot paths.

> Regarding doing this initialization early and unconditionally: This
> could well work today, especially if we do it early via
> __libc_early_init for dynamically linked builds (so that malloc is
> available in PREINIT and LD_PRELOAD objects).

I tried it and it does indeed work - it's slightly tricky due to malloc-debug
containing an almost complete copy of malloc while also wanting to use the
libc one at the same time!

> For static builds, additional work will be required, or we could keep
> the current scheme for that case.  (I think in general, malloc can be
> called before ELF constructors have run in the static case.)

The patch below passes all malloc tests (though I'm not sure whether they cover
static binaries as well as dynamic ones). I can add some assertions to enforce
that we never start allocating blocks before malloc is initialized.

> If we ever start allocating during initialization (I expect us to
> reserve address space for all possible struct malloc_state objects, for
> example) and we want to avoid that for interposed mallocs, then I think
> a lazy scheme is the only feasible way.  We can't call into the
> interposed malloc from __libc_early_init because its ELF constructor has
> not run yet.

Hmm, it does appear to "just work"... Are you saying that for the dynamic case
we should call __libc_malloc (-1) to initialize our own malloc (without knowing
whether it is interposed or not), and malloc (-1) for the static case?

Cheers,
Wilco


v2: Initialize malloc from __libc_early_init

Move malloc initialization to __libc_early_init.  Since malloc/calloc/realloc/free
are the minimal interposable functions, use a huge out-of-range malloc for
initialization.  The related malloc-debug initialization does the same.  All other
initialization checks can now be removed.  tcache_init () needs an initialization
check since it may be called unconditionally from malloc ().  The
tst-compathooks-on.c test needs updating to avoid counting the malloc/free calls
from __libc_early_init (when interposed, only the interposed malloc is
initialized).

---
  

Comments

Florian Weimer April 2, 2025, 12:33 p.m. UTC | #1
* Wilco Dijkstra:

> diff --git a/elf/libc_early_init.c b/elf/libc_early_init.c
> index 07202317534a4edbd2202c6e45770094cadb67f6..6ab14fb239a02d8da356a1fc5bdb307aabf35705 100644
> --- a/elf/libc_early_init.c
> +++ b/elf/libc_early_init.c
> @@ -29,9 +29,19 @@
>  _Bool __libc_initial;
>  #endif
>  
> +static void __attribute_noinline__
> +myfree (void *p)
> +{
> +  free (p);
> +}
> +
>  void
>  __libc_early_init (_Bool initial)
>  {
> +  /* Initialize malloc using an out-of-range malloc.  */
> +  volatile size_t size = -1;
> +  myfree (malloc (size));
> +
>    /* Initialize ctype data.  */
>    __ctype_init ();
>  

This calls malloc/free from an interposed malloc before its ELF
constructor has run.  I expect this to result in crashes for some
mallocs because they don't expect that we bypass initialization in this
way.  In the future, we might even call __libc_early_init immediately
after libc.so relocation, before other shared objects are relocated.

If we want to pre-initialize malloc, we should explicitly call a
glibc-specific initialization routine from __libc_early_init, using
call_function_static_weak.

Thanks,
Florian
  

Patch

diff --git a/elf/libc_early_init.c b/elf/libc_early_init.c
index 07202317534a4edbd2202c6e45770094cadb67f6..6ab14fb239a02d8da356a1fc5bdb307aabf35705 100644
--- a/elf/libc_early_init.c
+++ b/elf/libc_early_init.c
@@ -29,9 +29,19 @@ 
 _Bool __libc_initial;
 #endif
 
+static void __attribute_noinline__
+myfree (void *p)
+{
+  free (p);
+}
+
 void
 __libc_early_init (_Bool initial)
 {
+  /* Initialize malloc using an out-of-range malloc.  */
+  volatile size_t size = -1;
+  myfree (malloc (size));
+
   /* Initialize ctype data.  */
   __ctype_init ();
 
diff --git a/malloc/malloc-debug.c b/malloc/malloc-debug.c
index d208aa32a3c650cbce88f7bc04cc85969db56dcf..18ee300990bfa7d6e2a1ed46d663e4bc3773e073 100644
--- a/malloc/malloc-debug.c
+++ b/malloc/malloc-debug.c
@@ -116,7 +116,7 @@  generic_hook_ini (void)
   if (!initialize_malloc_check ())
     /* The compiler does not know that these functions are allocators, so it
        will not try to optimize it away.  */
-    __libc_free (__libc_malloc (0));
+    __libc_free (__libc_malloc (-1));
 
 #if SHLIB_COMPAT (libc_malloc_debug, GLIBC_2_0, GLIBC_2_24)
   void (*hook) (void) = __malloc_initialize_hook;
diff --git a/malloc/malloc.c b/malloc/malloc.c
index a0bc733482532ce34684d0357cb9076b03ac8a52..986e6211a86bfaf7b0309b138a067f16e92d1bbd 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3316,6 +3316,10 @@  tcache_init(void)
   if (tcache_shutting_down)
     return;
 
+  /* Ensure malloc is initialized before tcache.  */
+  if (!__malloc_initialized)
+    ptmalloc_init ();
+
   arena_get (ar_ptr, bytes);
   victim = _int_malloc (ar_ptr, bytes);
   if (!victim && ar_ptr != NULL)
@@ -3390,9 +3394,6 @@  __libc_malloc2 (size_t bytes)
   mstate ar_ptr;
   void *victim;
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   MAYBE_INIT_TCACHE ();
 
   if (SINGLE_THREAD_P)
@@ -3495,9 +3496,6 @@  __libc_realloc (void *oldmem, size_t bytes)
 
   void *newp;             /* chunk to return */
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
 #if REALLOC_ZERO_BYTES_FREES
   if (bytes == 0 && oldmem != NULL)
     {
@@ -3623,9 +3621,6 @@  libc_hidden_def (__libc_realloc)
 void *
 __libc_memalign (size_t alignment, size_t bytes)
 {
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   void *address = RETURN_ADDRESS (0);
   return _mid_memalign (alignment, bytes, address);
 }
@@ -3636,9 +3631,6 @@  void *
 weak_function
 aligned_alloc (size_t alignment, size_t bytes)
 {
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
 /* Similar to memalign, but starting with ISO C17 the standard
    requires an error for alignments that are not supported by the
    implementation.  Valid alignments for the current implementation
@@ -3746,9 +3738,6 @@  _mid_memalign (size_t alignment, size_t bytes, void *address)
 void *
 __libc_valloc (size_t bytes)
 {
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   void *address = RETURN_ADDRESS (0);
   size_t pagesize = GLRO (dl_pagesize);
   return _mid_memalign (pagesize, bytes, address);
@@ -3757,9 +3746,6 @@  __libc_valloc (size_t bytes)
 void *
 __libc_pvalloc (size_t bytes)
 {
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   void *address = RETURN_ADDRESS (0);
   size_t pagesize = GLRO (dl_pagesize);
   size_t rounded_bytes;
@@ -3794,9 +3780,6 @@  __libc_calloc (size_t n, size_t elem_size)
 
   sz = bytes;
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
 #if USE_TCACHE
   bool err = tcache_try_malloc (bytes, &mem);
 
@@ -3943,6 +3926,9 @@  _int_malloc (mstate av, size_t bytes)
   if (nb == 0)
     {
       __set_errno (ENOMEM);
+      /* Special malloc initialization case.  */
+      if (!__malloc_initialized)
+	ptmalloc_init ();
       return NULL;
     }
 
@@ -5277,9 +5263,6 @@  __malloc_trim (size_t s)
 {
   int result = 0;
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   mstate ar_ptr = &main_arena;
   do
     {
@@ -5396,9 +5379,6 @@  __libc_mallinfo2 (void)
   struct mallinfo2 m;
   mstate ar_ptr;
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();	
-
   memset (&m, 0, sizeof (m));
   ar_ptr = &main_arena;
   do
@@ -5447,8 +5427,6 @@  __malloc_stats (void)
   mstate ar_ptr;
   unsigned int in_use_b = mp_.mmapped_mem, system_b = in_use_b;
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
   _IO_flockfile (stderr);
   int old_flags2 = stderr->_flags2;
   stderr->_flags2 |= _IO_FLAGS2_NOTCANCEL;
@@ -5629,8 +5607,6 @@  __libc_mallopt (int param_number, int value)
   mstate av = &main_arena;
   int res = 1;
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
   __libc_lock_lock (av->mutex);
 
   LIBC_PROBE (memory_mallopt, 2, param_number, value);
@@ -5852,9 +5828,6 @@  __posix_memalign (void **memptr, size_t alignment, size_t size)
 {
   void *mem;
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   /* Test whether the SIZE argument is valid.  It must be a power of
      two multiple of sizeof (void *).  */
   if (alignment % sizeof (void *) != 0
@@ -5895,11 +5868,6 @@  __malloc_info (int options, FILE *fp)
   size_t total_aspace = 0;
   size_t total_aspace_mprotect = 0;
 
-
-
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   fputs ("<malloc version=\"1\">\n", fp);
 
   /* Iterate over all arenas currently in use.  */
diff --git a/malloc/tst-compathooks-off.c b/malloc/tst-compathooks-off.c
index 82c47ebb6c182d46b031d4949c5bf5e60e5ef10b..45289a2024dc4edeb05239e12a11aaf13d53ab5c 100644
--- a/malloc/tst-compathooks-off.c
+++ b/malloc/tst-compathooks-off.c
@@ -110,6 +110,8 @@  DIAG_POP_NEEDS_COMMENT;
 static int
 do_test (void)
 {
+  hook_count = 0;
+
   void *p;
   p = malloc (0);
   TEST_VERIFY_EXIT (p != NULL);