malloc: send freed small chunks to smallbin

Message ID Kh9S4HnMXbEYGsdza15DjLBt1WbbFMpYTnuElnqLU3FjtFsBpE_T0Bi3wv0yZurMZ-Lkk8iNrntGJUQzY3Ni8620B8XeDxAP2hHXlVPu1gE=@proton.me
State New
Headers
Series malloc: send freed small chunks to smallbin |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm fail Patch failed to apply
redhat-pt-bot/TryBot-32bit success Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 fail Patch failed to apply

Commit Message

k4lizen July 3, 2024, 8:25 a.m. UTC
  Large chunks get added to the unsorted bin since
sorting them takes time, for small chunks the
benefit of adding them to the unsorted bin is
non-existant, actually hurting performance.

See discussion:
https://sourceware.org/pipermail/libc-alpha/2024-July/157922.html
---
malloc/malloc.c | 49 +++++++++++++++++++++++++++++++++----------------
1 file changed, 33 insertions(+), 16 deletions(-)

--
2.45.2
  

Comments

k4lizen July 3, 2024, 8:40 a.m. UTC | #1
Here is a pathological benchmark for the current implementation (the comparison
is as exaggerated as possible). The micro benchmark code is in the attachment.

current: Total time: 69.390269 seconds
patched: Total time: 59.247076 seconds
current: Total time: 69.952139 seconds
patched: Total time: 59.208276 seconds
current: Total time: 69.419799 seconds
patched: Total time: 59.873232 seconds

Current average: 69.587402334 +- 0.36473667
Patched average: 59.442859334 +- 0.43037267
Relative improvement: 0.145781
(1 - (modified/original))

Also, I noted before that:
>Because of this, it is possible that in the current implementation chunks would
>sometimes end up in the tcache, while in my version they would go directly to
>a small bin. Though the setup for that is relatively complex.

But thats actually not true, as when a chunk is taken from the smallbin, the other
chunks in it get moved to the tcache:
https://elixir.bootlin.com/glibc/latest/source/malloc/malloc.c#L3995

So there should be no difference between the current implementation and the
patched version in that respect. Thus, I cannot come up with a pathological
benchmark for the patched version of the code.

Comparing the code in a more generic setting is probably advisable, though
as I said I'm not sure what the best way to go about doing that is.
On Wednesday, July 3rd, 2024 at 10:25 AM, k4lizen <k4lizen@proton.me> wrote:

> Large chunks get added to the unsorted bin since
> sorting them takes time, for small chunks the
> benefit of adding them to the unsorted bin is
> non-existant, actually hurting performance.
>
> See discussion:
> https://sourceware.org/pipermail/libc-alpha/2024-July/157922.html
> ---
> malloc/malloc.c | 49 +++++++++++++++++++++++++++++++++----------------
> 1 file changed, 33 insertions(+), 16 deletions(-)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index bcb6e5b83c..8c6155d8f3 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -4156,8 +4156,8 @@ _int_malloc (mstate av, size_t bytes)
> #endif
> }
>
> - /* place chunk in bin */
> -
> + /* Place chunk in bin. malloc_consolidate() could create
> + small chunks. */
> if (in_smallbin_range (size))
> {
> victim_index = smallbin_index (size);
> @@ -4723,23 +4723,41 @@ _int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
> } else
> clear_inuse_bit_at_offset(nextchunk, 0);
>
> + mchunkptr bck, fwd;
> +
> + if(in_smallbin_range (size)){
> /*
> - Place the chunk in unsorted chunk list. Chunks are
> - not placed into regular bins until after they have
> - been given one chance to be used in malloc.
> + Place small chunks directly in their smallbin, so they
> + don't pollute the unsorted bin.
> */
>
> - mchunkptr bck = unsorted_chunks (av);
> - mchunkptr fwd = bck->fd;
> - if (__glibc_unlikely (fwd->bk != bck))
> - malloc_printerr ("free(): corrupted unsorted chunks");
> - p->fd = fwd;
> + int chunk_index = smallbin_index (size);
> + bck = bin_at (av, chunk_index);
> + fwd = bck->fd;
> +
> + if (__glibc_unlikely (fwd->bk != bck))
> + malloc_printerr ("free(): chunks in smallbin corrupted");
> +
> + mark_bin (av, chunk_index);
> + }
> + else
> + {
> + /*
> + Place large chunks in unsorted chunk list. Large chunks are
> + not placed into regular bins until after they have
> + been given one chance to be used in malloc.
> + */
> +
> + bck = unsorted_chunks (av);
> + fwd = bck->fd;
> + if (__glibc_unlikely (fwd->bk != bck))
> + malloc_printerr ("free(): corrupted unsorted chunks");
> + p->fd_nextsize = NULL;
> + p->bk_nextsize = NULL;
> + }
> +
> p->bk = bck;
> - if (!in_smallbin_range(size))
> - {
> - p->fd_nextsize = NULL;
> - p->bk_nextsize = NULL;
> - }
> + p->fd = fwd;
> bck->fd = p;
> fwd->bk = p;
>
> @@ -4748,7 +4766,6 @@ _int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
>
> check_free_chunk(av, p);
> }
> -
> else
> {
> /* If the chunk borders the current high end of memory,
> --
> 2.45.2
  

Patch

From 39bbbafd2cbc926864486a6e58876b7b000940ea Mon Sep 17 00:00:00 2001
From: k4lizen <124312252+k4lizen@users.noreply.github.com>
Date: Wed, 3 Jul 2024 06:01:51 +0200
Subject: [PATCH] malloc: send freed small chunks to smallbin
To: libc-alpha@sourceware.org
Cc: fweimer@redhat.com,
    dj@redhat.com

Large chunks get added to the unsorted bin since
sorting them takes time, for small chunks the
benefit of adding them to the unsorted bin is
non-existant, actually hurting performance.

See discussion:
https://sourceware.org/pipermail/libc-alpha/2024-July/157922.html
---
 malloc/malloc.c | 49 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index bcb6e5b83c..8c6155d8f3 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -4156,8 +4156,8 @@  _int_malloc (mstate av, size_t bytes)
 #endif
             }
 
-          /* place chunk in bin */
-
+          /* Place chunk in bin. malloc_consolidate() could create
+      small chunks. */
           if (in_smallbin_range (size))
             {
               victim_index = smallbin_index (size);
@@ -4723,23 +4723,41 @@  _int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
       } else
 	clear_inuse_bit_at_offset(nextchunk, 0);
 
+      mchunkptr bck, fwd;
+
+      if(in_smallbin_range (size)){
       /*
-	Place the chunk in unsorted chunk list. Chunks are
-	not placed into regular bins until after they have
-	been given one chance to be used in malloc.
+  Place small chunks directly in their smallbin, so they
+  don't pollute the unsorted bin.
       */
 
-      mchunkptr bck = unsorted_chunks (av);
-      mchunkptr fwd = bck->fd;
-      if (__glibc_unlikely (fwd->bk != bck))
-	malloc_printerr ("free(): corrupted unsorted chunks");
-      p->fd = fwd;
+        int chunk_index = smallbin_index (size);
+        bck = bin_at (av, chunk_index);
+        fwd = bck->fd;
+
+        if (__glibc_unlikely (fwd->bk != bck))
+    malloc_printerr ("free(): chunks in smallbin corrupted");
+
+        mark_bin (av, chunk_index);
+      }
+      else
+      {
+      /*
+  Place large chunks in unsorted chunk list. Large chunks are
+  not placed into regular bins until after they have
+  been given one chance to be used in malloc.
+      */
+
+        bck = unsorted_chunks (av);
+        fwd = bck->fd;
+        if (__glibc_unlikely (fwd->bk != bck))
+    malloc_printerr ("free(): corrupted unsorted chunks");
+        p->fd_nextsize = NULL;
+        p->bk_nextsize = NULL;
+      }
+      
       p->bk = bck;
-      if (!in_smallbin_range(size))
-	{
-	  p->fd_nextsize = NULL;
-	  p->bk_nextsize = NULL;
-	}
+      p->fd = fwd;
       bck->fd = p;
       fwd->bk = p;
 
@@ -4748,7 +4766,6 @@  _int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
 
       check_free_chunk(av, p);
     }
-
   else
     {
       /* If the chunk borders the current high end of memory,
-- 
2.45.2