status of dj/malloc branch?

Message ID: 20240401191925.M515362@dcvr
State: Not applicable
Series: status of dj/malloc branch?

Checks

Context                           Check  Description
redhat-pt-bot/TryBot-apply_patch  fail   Patch failed to apply to master at the time it was sent
redhat-pt-bot/TryBot-32bit        fail   Patch series failed to apply

Commit Message

Eric Wong April 1, 2024, 7:19 p.m. UTC
  I'm interested in the tracing features described at
https://sourceware.org/glibc/wiki/MallocTracing
to test and validate memory fragmentation avoidance in a long-lived
single-threaded Perl C10K HTTP/IMAP/NNTP/POP3 daemon.

The branch appears to have stalled for years, however, and the
current glibc malloc doesn't have the trace + replay features.

I'm currently dogfooding the below patch on an old glibc (Debian
oldstable :x) on my "production" home server.  My theory is that
the jemalloc idea of having fewer possible allocation sizes is
good for avoiding fragmentation in long-lived processes.

This is because sizes for string processing are highly
variable and lifetimes are mixed for event-driven C10K servers
where some clients live only for a single request and others for
many.  Clients end up sharing allocations due to caching
and deduplication, so a short-lived client can end up allocating
something that lives a long time.  Perl does lazy loading
and internal caching+memoization all over the place, too.

The downside is 0-20% waste on the initial fit, but I expect
better fits over time...
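
For illustration, here's a standalone sketch of the rounding scheme
the patch below implements (a rewrite for demonstration, not the
patch itself; MAX_FAST_SIZE and DEFAULT_MMAP_THRESHOLD_MAX are
stubbed with their usual 64-bit glibc values, and ALIGN_UP is
simplified to the power-of-two-only case):

#include <stdio.h>
#include <stddef.h>

#define MAX_FAST_SIZE 160                       /* 64-bit glibc default */
#define DEFAULT_MMAP_THRESHOLD_MAX (4 * 1024 * 1024 * sizeof (long))
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

static size_t
size_class_pad (size_t bytes)
{
  if (bytes <= MAX_FAST_SIZE || bytes >= DEFAULT_MMAP_THRESHOLD_MAX)
    return bytes;
  /* split each doubling of `max' into 8 size classes (step = max >> 3) */
  for (size_t max = sizeof (void *) << 4;
       max < DEFAULT_MMAP_THRESHOLD_MAX; max <<= 1)
    if (bytes <= max)
      {
        size_t sc = ALIGN_UP (bytes, max >> 3);
        return sc <= DEFAULT_MMAP_THRESHOLD_MAX ? sc : bytes;
      }
  return bytes;
}

int
main (void)
{
  size_t sizes[] = { 200, 1000, 5000, 70000 };
  for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
    {
      size_t padded = size_class_pad (sizes[i]);
      printf ("%6zu -> %6zu (+%zu%%)\n", sizes[i], padded,
              (padded - sizes[i]) * 100 / sizes[i]);
    }
  return 0;
}

e.g. 200 -> 224 (+12%), 70000 -> 81920 (+17%): requests land on one
of only 8 distinct sizes per power-of-two band, so a freed chunk is
much more likely to be an exact fit for a later request.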

Not a serious patch against Debian glibc 2.31-13+deb11u8 (see the
Patch section below).

Comments

DJ Delorie April 1, 2024, 8:59 p.m. UTC | #1
If you're wondering about the branch itself, the harsh answer is that
I'm not maintaining it.

If you're wondering about the goals therein, IIRC we discovered we had
no good way to visualize and analyze the heap itself in order to
understand what causes the problem we're trying to solve.  While these
are solvable problems, they're big projects and never quite made it to
the top of our priority lists.
  
Eric Wong April 6, 2024, 10:02 p.m. UTC | #2
DJ Delorie <dj@redhat.com> wrote:
> 
> If you're wondering about the branch itself, the harsh answer is that
> I'm not maintaining it.
> 
> If you're wondering about the goals therein, IIRC we discovered we had
> no good way to visualize and analyze the heap itself in order to
> understand what causes the problem we're trying to solve.  While these
> are solvable problems, they're big projects and never quite made it to
> the top of our priority lists.

Thanks for the response.

I started doing my own tracing[1] in mwrap-perl[2] and it's
crazy expensive (I/O and storage) to trace all the allocations
done by a busy Perl process.  I had to add compression (using
zstd) to slow the growth of disk usage; I hope I can get useful,
reproducible data before I run out of space.

[1] https://80x24.org/mwrap-perl/20240406214954.159627-1-e@80x24.org/
[2] https://80x24.org/mwrap-perl.git
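
For context, a minimal sketch of the compression step (this is NOT
the actual mwrap-perl code; trace_open/trace_emit are hypothetical
helpers and error handling is omitted).  It streams binary trace
records through libzstd at level 1 to favor speed; link with -lzstd:

#include <stdio.h>
#include <zstd.h>

static ZSTD_CCtx *cctx;
static FILE *out;

static void
trace_open (const char *path)
{
  cctx = ZSTD_createCCtx ();
  /* level 1: trace writes must not stall the allocator */
  ZSTD_CCtx_setParameter (cctx, ZSTD_c_compressionLevel, 1);
  out = fopen (path, "wb");
}

/* compress one trace record; pass last != 0 to finish the frame */
static void
trace_emit (const void *rec, size_t len, int last)
{
  ZSTD_inBuffer in = { rec, len, 0 };
  ZSTD_EndDirective op = last ? ZSTD_e_end : ZSTD_e_continue;
  size_t remaining;

  do
    {
      char buf[4096];
      ZSTD_outBuffer ob = { buf, sizeof buf, 0 };
      remaining = ZSTD_compressStream2 (cctx, &ob, &in, op);
      fwrite (buf, 1, ob.pos, out);
    }
  while (last ? remaining != 0 : in.pos < in.size);
}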
  
DJ Delorie April 8, 2024, 6:37 p.m. UTC | #3
Eric Wong <e@80x24.org> writes:
> I started doing my own tracing[1] in mwrap-perl[2] and it's
> crazy expensive (I/O and storage) to trace all the allocations

Yeah, we ran into that problem too.  64-bit pointers plus
many-argument functions like realloc result in a HUGE log file.
Like, terabytes in one case.
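
To put numbers on that, an illustrative (not from the branch) naive
fixed-width record already costs 25 bytes per event:

#include <stdint.h>

struct __attribute__ ((packed)) trace_rec
{
  uint8_t  op;    /* 0=malloc 1=free 2=calloc 3=realloc */
  uint64_t ret;   /* returned pointer (0 for free) */
  uint64_t arg1;  /* size, or old pointer for realloc/free */
  uint64_t arg2;  /* second size argument, else 0 */
};                /* 1 + 3*8 = 25 bytes/event when packed */

At, say, 10 million allocator calls per second that's ~250 MB/s of
raw trace, i.e. close to a terabyte per hour before compression.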
  

Patch

diff --git a/malloc/malloc.c b/malloc/malloc.c
index f7cd29bc..6e0b066d 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3018,6 +3018,31 @@  tcache_thread_shutdown (void)
 
 #endif /* !USE_TCACHE  */
 
+static inline size_t
+size_class_pad (size_t bytes)
+{
+  if (bytes <= MAX_FAST_SIZE || bytes >= DEFAULT_MMAP_THRESHOLD_MAX)
+    return bytes;
+  /*
+   * Use jemalloc-inspired size classes for mid-size allocations to
+   * minimize fragmentation.  This means we pay a 0-20% overhead on
+   * the initial allocations to improve the likelihood of reuse.
+   */
+  size_t max = sizeof(void *) << 4;
+  size_t nxt;
+
+  do {
+    if (bytes <= max) {
+      size_t sc_bytes = ALIGN_UP (bytes, max >> 3);
+
+      return sc_bytes <= DEFAULT_MMAP_THRESHOLD_MAX ? sc_bytes : bytes;
+    }
+    nxt = max << 1;
+  } while (nxt > max && nxt < DEFAULT_MMAP_THRESHOLD_MAX && (max = nxt));
+
+  return bytes;
+}
+
 void *
 __libc_malloc (size_t bytes)
 {
@@ -3031,6 +3056,7 @@  __libc_malloc (size_t bytes)
     = atomic_forced_read (__malloc_hook);
   if (__builtin_expect (hook != NULL, 0))
     return (*hook)(bytes, RETURN_ADDRESS (0));
+  bytes = size_class_pad (bytes);
 #if USE_TCACHE
   /* int_free also calls request2size, be careful to not pad twice.  */
   size_t tbytes;
@@ -3150,6 +3176,8 @@  __libc_realloc (void *oldmem, size_t bytes)
   if (oldmem == 0)
     return __libc_malloc (bytes);
 
+  bytes = size_class_pad (bytes);
+
   /* chunk corresponding to oldmem */
   const mchunkptr oldp = mem2chunk (oldmem);
   /* its size */
@@ -3391,6 +3419,7 @@  __libc_calloc (size_t n, size_t elem_size)
       return memset (mem, 0, sz);
     }
 
+  sz = size_class_pad (sz);
   MAYBE_INIT_TCACHE ();
 
   if (SINGLE_THREAD_P)