[v5,2/7] malloc: optimize initialization

Message ID 20250321120102.54012-3-cupertino.miranda@oracle.com (mailing list archive)
State Under Review
Delegated to: Wilco Dijkstra
Series tcache: malloc improvements

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed

Commit Message

Cupertino Miranda March 21, 2025, noon UTC
This patch adds __glibc_unlikely to all the conditions that guard the call to
ptmalloc_init by checking whether the __malloc_initialized variable is set.
---
 malloc/malloc.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)
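
For context, __glibc_unlikely is glibc's usual __builtin_expect wrapper
(simplified here from misc/sys/cdefs.h; the exact feature guards may differ
between versions), so the hint only marks the initialization branch as cold
for code layout and static branch prediction; the check itself is unchanged:

  #if __GNUC_PREREQ (3, 0)
  # define __glibc_unlikely(cond) __builtin_expect ((cond), 0)
  # define __glibc_likely(cond)   __builtin_expect ((cond), 1)
  #else
  # define __glibc_unlikely(cond) (cond)
  # define __glibc_likely(cond)   (cond)
  #endif

  /* The patched checks therefore compile with ptmalloc_init () placed on
     the cold path:  */
  if (__glibc_unlikely (!__malloc_initialized))
    ptmalloc_init ();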
  

Comments

Wilco Dijkstra March 21, 2025, 3:59 p.m. UTC | #1
Hi Cupertino,

> This patch adds __glibc_unlikely to all the conditions that guard the call to
> ptmalloc_init by checking whether the __malloc_initialized variable is set.

> -  if (!__malloc_initialized)
> +  if (__glibc_unlikely (!__malloc_initialized))
>     ptmalloc_init ();

What difference did you see as a result (not sure whether it is in your graph)?
I benchmarked it against current trunk, and there is no difference in generated
code of all the key malloc/free functions. There are some improvements in less
frequently used functions, so it looks worthwhile, but I'd like to understand the
reasoning behind it.

Note in my __libc_malloc patch I moved this initialization after the tcache code.
The same could be done in several other cases. Moving it gives large gains
since you avoid doing unnecessary work before the tcache code.
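
For illustration, a minimal sketch of that reordering (this is not the
actual __libc_malloc patch, and the hit/miss contract of tcache_try_malloc
is assumed here rather than taken from the diff below):

  void *
  __libc_malloc (size_t bytes)
  {
    void *victim = NULL;

  #if USE_TCACHE
    /* Fast path first: a tcache hit never needs the initialization check,
       since the tcache can only contain chunks after malloc has already
       been initialized.  Error handling is elided.  */
    bool err = tcache_try_malloc (bytes, &victim);
    if (!err && victim != NULL)
      return victim;
  #endif

    /* Only the slow path pays for the initialization check.  */
    if (__glibc_unlikely (!__malloc_initialized))
      ptmalloc_init ();

    /* ... rest of the slow path: arena selection, _int_malloc, etc. ...  */
  }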

Cheers,
Wilco
  
Cupertino Miranda March 21, 2025, 4:56 p.m. UTC | #2
Hi Wilco,

This is one of the changes that might actually have zero effect, since the
compiler could already be making the right decision on its own.
In my chart it is the comparison between the patch2 and patch3 columns.

I find the variation between those runs hard to believe. Please note that I
ran the benchmark with my machine fully idle (xorg killed, etc.).

Now that I think about it, maybe I need to disable Address Space Layout
Randomization (ASLR) by setting /proc/sys/kernel/randomize_va_space to 0.
That might explain the variation.

I will re-run all the benchmarks with it set to 0.
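
For reference, a tiny C helper can flip that sysctl too (just a sketch;
writing to /proc/sys needs root, and a per-process alternative would be
personality (ADDR_NO_RANDOMIZE) from <sys/personality.h>):

  /* Sketch: disable ASLR system-wide by writing 0 to the sysctl mentioned
     above.  0 = no randomization, 2 = full randomization (the default).  */
  #include <stdio.h>

  int
  main (void)
  {
    FILE *f = fopen ("/proc/sys/kernel/randomize_va_space", "w");
    if (f == NULL)
      {
        perror ("randomize_va_space");
        return 1;
      }
    fputs ("0\n", f);
    fclose (f);
    return 0;
  }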

Cheers,
Cupertino


On 21-03-2025 15:59, Wilco Dijkstra wrote:
> Hi Cupertino,
> 
>> This patch adds __glibc_unlikely to all the conditions that guard the call to
>> ptmalloc_init by checking whether the __malloc_initialized variable is set.
> 
>> -  if (!__malloc_initialized)
>> +  if (__glibc_unlikely (!__malloc_initialized))
>>       ptmalloc_init ();
> 
> What difference did you see as a result (not sure whether it is in your graph)?
> I benchmarked it against current trunk, and there is no difference in generated
> code of all the key malloc/free functions. There are some improvements in less
> frequently used functions, so it looks worthwhile, but I'd like to understand the
> reasoning behind it.
> 
> Note in my __libc_malloc patch I moved this initialization after the tcache code.
> The same could be done in several other cases. Moving it gives large gains
> since you avoid doing unnecessary work before the tcache code.
> 
> Cheers,
> Wilco
  
Cupertino Miranda March 27, 2025, 5:30 p.m. UTC | #3
Hi Wilco,

Obviously, disabling ASLR brought nothing useful.
This time I collected the data with 50 runs per test, and I still see some
strange performance variations.
I tried to run the same tests on my Apple M1 under native Linux, but
mimalloc-bench fails to build for ARM. :(

I adapted the mimalloc scripts so they can plot the data from multiple runs,
and I collected the data with the same tools mimalloc-bench uses.
The scripts can generate PNG, PDF and HTML charts.
Given the number of tests, the PNG and PDF versions seem useless, but the
HTML one is nice since it has some JavaScript to zoom and pan on the chart.

The columns on the chart now match the patches.
I hope this makes it easier to understand the results.

Cheers,
Cupertino

On 21-03-2025 16:56, Cupertino Miranda wrote:
> Hi Wilco,
> 
> This is one of the changes that might actually have zero effect, since the
> compiler could already be making the right decision on its own.
> In my chart it is the comparison between the patch2 and patch3 columns.
>
> I find the variation between those runs hard to believe. Please note that I
> ran the benchmark with my machine fully idle (xorg killed, etc.).
>
> Now that I think about it, maybe I need to disable Address Space Layout
> Randomization (ASLR) by setting /proc/sys/kernel/randomize_va_space to 0.
> That might explain the variation.
>
> I will re-run all the benchmarks with it set to 0.
> 
> Cheers,
> Cupertino
> 
> 
> On 21-03-2025 15:59, Wilco Dijkstra wrote:
>> Hi Cupertino,
>>
>>> This patch adds __glibc_unlikely to all the conditions that guard the call to
>>> ptmalloc_init by checking whether the __malloc_initialized variable is set.
>>
>>> -  if (!__malloc_initialized)
>>> +  if (__glibc_unlikely (!__malloc_initialized))
>>>       ptmalloc_init ();
>>
>> What difference did you see as a result (not sure whether it is in 
>> your graph)?
>> I benchmarked it against current trunk, and there is no difference in 
>> generated
>> code of all the key malloc/free functions. There are some improvements 
>> in less
>> frequently used functions, so it looks worthwhile, but I'd like to 
>> understand the
>> reasoning behind it.
>>
>> Note in my __libc_malloc patch I moved this initialization after the 
>> tcache code.
>> The same could be done in several other cases. Moving it gives large 
>> gains
>> since you avoid doing unnecessary work before the tcache code.
>>
>> Cheers,
>> Wilco
>
  
Cupertino Miranda March 29, 2025, 10 p.m. UTC | #4
Hi Wilco,

Obviously, disabling ASLR brought nothing useful.
This time I collected the data with 50 runs per test, and I still see some
strange performance variations.
I tried to run the same tests on my Apple M1 under native Linux, but
mimalloc-bench fails to build for ARM. :(

I adapted the mimalloc scripts so they can plot the data from multiple runs,
and I collected the data with the same tools mimalloc-bench uses.
The scripts can generate PNG, PDF and HTML charts.
Given the number of tests, the PNG and PDF versions seem useless, but the
HTML one is nice since it has some JavaScript to zoom and pan on the chart.

The columns on the chart now match the patches.
I hope this makes it easier to understand the results.

Cheers,
Cupertino

PS: This email has not been delivered for half a week. Re-sending with 
compressed attachments.



On 27-03-2025 17:30, Cupertino Miranda wrote:
> Hi Wilco,
> 
> Obviously, disabling ASLR brought nothing useful.
> This time I collected the data with 50 runs per test, and I still see some
> strange performance variations.
> I tried to run the same tests on my Apple M1 under native Linux, but
> mimalloc-bench fails to build for ARM. :(
>
> I adapted the mimalloc scripts so they can plot the data from multiple runs,
> and I collected the data with the same tools mimalloc-bench uses.
> The scripts can generate PNG, PDF and HTML charts.
> Given the number of tests, the PNG and PDF versions seem useless, but the
> HTML one is nice since it has some JavaScript to zoom and pan on the chart.
>
> The columns on the chart now match the patches.
> I hope this makes it easier to understand the results.
> 
> Cheers,
> Cupertino
> 
> On 21-03-2025 16:56, Cupertino Miranda wrote:
>> Hi Wilco,
>>
>> This is one of the changes that might actually have zero effect, since the
>> compiler could already be making the right decision on its own.
>> In my chart it is the comparison between the patch2 and patch3 columns.
>>
>> I find the variation between those runs hard to believe. Please note that I
>> ran the benchmark with my machine fully idle (xorg killed, etc.).
>>
>> Now that I think about it, maybe I need to disable Address Space Layout
>> Randomization (ASLR) by setting /proc/sys/kernel/randomize_va_space to 0.
>> That might explain the variation.
>>
>> I will re-run all the benchmarks with it set to 0.
>>
>> Cheers,
>> Cupertino
>>
>>
>> On 21-03-2025 15:59, Wilco Dijkstra wrote:
>>> Hi Cupertino,
>>>
>>>> This patch adds __glibc_unlikely to all the conditions that guard the call to
>>>> ptmalloc_init by checking whether the __malloc_initialized variable is set.
>>>
>>>> -  if (!__malloc_initialized)
>>>> +  if (__glibc_unlikely (!__malloc_initialized))
>>>>       ptmalloc_init ();
>>>
>>> What difference did you see as a result (not sure whether it is in 
>>> your graph)?
>>> I benchmarked it against current trunk, and there is no difference in 
>>> generated
>>> code of all the key malloc/free functions. There are some 
>>> improvements in less
>>> frequently used functions, so it looks worthwhile, but I'd like to 
>>> understand the
>>> reasoning behind it.
>>>
>>> Note in my __libc_malloc patch I moved this initialization after the 
>>> tcache code.
>>> The same could be done in several other cases. Moving it gives large 
>>> gains
>>> since you avoid doing unnecessary work before the tcache code.
>>>
>>> Cheers,
>>> Wilco
>>
  
Mark Wielaard March 30, 2025, 12:02 p.m. UTC | #5
Hi Cupertino,

On Sat, Mar 29, 2025 at 10:00:06PM +0000, Cupertino Miranda wrote:
> PS: This email has not been delivered for half a week. Re-sending
> with compressed attachments.

Sorry about that. You correctly determined it was too big.  Which put
it in the moderator queue. We do have two libc-alpha list
administrators (Carlos and Ryan). But we could use an extra volunteer
so things aren't pending that long.

Cheers,

Mark
  
Carlos O'Donell March 31, 2025, 1:59 p.m. UTC | #6
On 3/30/25 8:02 AM, Mark Wielaard wrote:
> Hi Cupertino,
> 
> On Sat, Mar 29, 2025 at 10:00:06PM +0000, Cupertino Miranda wrote:
>> PS: This email has not been delivered for half a week. Re-sending
>> with compressed attachments.
> 
> Sorry about that. You correctly determined it was too big.  Which put
> it in the moderator queue. We do have two libc-alpha list
> administrators (Carlos and Ryan). But we could use an extra volunteer
> so things aren't pending that long.

Approved.

What's odd is that I don't remember seeing an email telling me I had
anything in the moderation queue.

I'll look into it.
  
Carlos O'Donell March 31, 2025, 6:46 p.m. UTC | #7
On 3/31/25 9:59 AM, Carlos O'Donell wrote:
> On 3/30/25 8:02 AM, Mark Wielaard wrote:
>> Hi Cupertino,
>>
>> On Sat, Mar 29, 2025 at 10:00:06PM +0000, Cupertino Miranda wrote:
>>> PS: This email has not been delivered for half a week. Re-sending
>>> with compressed attachments.
>>
>> Sorry about that. You correctly determined it was too big.  Which put
>> it in the moderator queue. We do have two libc-alpha list
>> administrators (Carlos and Ryan). But we could use an extra volunteer
>> so things aren't pending that long.
> 
> Approved.
> 
> What's odd is that I don't remember seeing an email telling me I had
> anything in the moderation queue.
> 
> I'll look into it.
> 

Just to clarify, my "Approved." has to do with approving the email stuck
in moderation.

Approval of a patch is only carried out with Reviewed-by :-)
  

Patch

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 59b22b3455..98d1fda144 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3389,7 +3389,7 @@  __libc_malloc (size_t bytes)
   _Static_assert (PTRDIFF_MAX <= SIZE_MAX / 2,
                   "PTRDIFF_MAX is not more than half of SIZE_MAX");
 
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 #if USE_TCACHE
   bool err = tcache_try_malloc (bytes, &victim);
@@ -3488,7 +3488,7 @@  __libc_realloc (void *oldmem, size_t bytes)
 
   void *newp;             /* chunk to return */
 
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
 #if REALLOC_ZERO_BYTES_FREES
@@ -3619,7 +3619,7 @@  libc_hidden_def (__libc_realloc)
 void *
 __libc_memalign (size_t alignment, size_t bytes)
 {
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
   void *address = RETURN_ADDRESS (0);
@@ -3632,7 +3632,7 @@  void *
 weak_function
 aligned_alloc (size_t alignment, size_t bytes)
 {
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
 /* Similar to memalign, but starting with ISO C17 the standard
@@ -3742,7 +3742,7 @@  _mid_memalign (size_t alignment, size_t bytes, void *address)
 void *
 __libc_valloc (size_t bytes)
 {
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
   void *address = RETURN_ADDRESS (0);
@@ -3753,7 +3753,7 @@  __libc_valloc (size_t bytes)
 void *
 __libc_pvalloc (size_t bytes)
 {
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
   void *address = RETURN_ADDRESS (0);
@@ -3790,7 +3790,7 @@  __libc_calloc (size_t n, size_t elem_size)
 
   sz = bytes;
 
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
 #if USE_TCACHE
@@ -5273,7 +5273,7 @@  __malloc_trim (size_t s)
 {
   int result = 0;
 
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
   mstate ar_ptr = &main_arena;
@@ -5392,7 +5392,7 @@  __libc_mallinfo2 (void)
   struct mallinfo2 m;
   mstate ar_ptr;
 
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
   memset (&m, 0, sizeof (m));
@@ -5443,7 +5443,7 @@  __malloc_stats (void)
   mstate ar_ptr;
   unsigned int in_use_b = mp_.mmapped_mem, system_b = in_use_b;
 
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
   _IO_flockfile (stderr);
   int old_flags2 = stderr->_flags2;
@@ -5625,7 +5625,7 @@  __libc_mallopt (int param_number, int value)
   mstate av = &main_arena;
   int res = 1;
 
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
   __libc_lock_lock (av->mutex);
 
@@ -5848,7 +5848,7 @@  __posix_memalign (void **memptr, size_t alignment, size_t size)
 {
   void *mem;
 
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
   /* Test whether the SIZE argument is valid.  It must be a power of
@@ -5893,7 +5893,7 @@  __malloc_info (int options, FILE *fp)
 
 
 
-  if (!__malloc_initialized)
+  if (__glibc_unlikely (!__malloc_initialized))
     ptmalloc_init ();
 
   fputs ("<malloc version=\"1\">\n", fp);