[v3,3/5] malloc: Arena is not needed for tcache path in free()
Checks
Context | Check | Description
--------|-------|------------
redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Build passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | success | Test passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Test passed
Commit Message
The arena is not needed by _int_free_check() in non-DEBUG mode. This commit
defers the arena dereference until the _int_free_chunk() call, which speeds
up the tcache path. When DEBUG is enabled, the arena is obtained from p in
do_check_inuse_chunk().
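
For illustration only, the deferral described above can be modelled with a
toy allocator. This is a minimal sketch, not glibc code: every toy_* name,
the four-slot cache, and the lookup counter are hypothetical stand-ins for
arena_for_chunk(), tcache_free() and _int_free_chunk(). It only shows that
the lookup is skipped whenever the cache accepts the chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not glibc's real types.  */
struct toy_arena { int id; };
struct toy_chunk { struct toy_arena *owner; size_t size; };

#define TOY_TCACHE_SLOTS 4

static struct toy_chunk *toy_tcache[TOY_TCACHE_SLOTS];
static size_t toy_tcache_count;
static unsigned long toy_arena_lookups; /* Simulated arena lookups so far.  */

/* Models the arena lookup whose cost the patch moves off the fast path.  */
static struct toy_arena *
toy_arena_for_chunk (struct toy_chunk *p)
{
  toy_arena_lookups++;
  return p->owner;
}

/* Models the tcache fast path: succeeds while a slot is available.  */
static bool
toy_tcache_free (struct toy_chunk *p)
{
  if (toy_tcache_count == TOY_TCACHE_SLOTS)
    return false;
  toy_tcache[toy_tcache_count++] = p;
  return true;
}

/* Models the slow path, the only place that needs the arena.  */
static void
toy_slow_free (struct toy_arena *av, struct toy_chunk *p)
{
  assert (av == p->owner);
}

/* Old ordering: look up the arena first, even if the cache takes p.  */
static void
toy_free_eager (struct toy_chunk *p)
{
  struct toy_arena *av = toy_arena_for_chunk (p);
  if (toy_tcache_free (p))
    return;
  toy_slow_free (av, p);
}

/* New ordering: look up the arena only when the cache rejects p.  */
static void
toy_free_deferred (struct toy_chunk *p)
{
  if (toy_tcache_free (p))
    return;
  toy_slow_free (toy_arena_for_chunk (p), p);
}

/* Empties the toy cache and clears the lookup counter.  */
static void
toy_reset (void)
{
  toy_tcache_count = 0;
  toy_arena_lookups = 0;
}
```

In this model, freeing six chunks into an empty four-slot cache costs six
lookups with toy_free_eager() and two with toy_free_deferred(). The real
saving depends on how often the tcache path is taken.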
Result of bench-malloc-thread benchmark
Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)
Threads# | Ratio
-----------|------
1 thread | 0.994
4 threads | 0.968
The data show that the change brings about a 3% performance gain in the
multi-threaded (4-thread) scenario.
---
Changes in v2:
- _int_free_check() should be put outside of USE_TCACHE.
- Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
---
malloc/malloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
@@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
{
mchunkptr next;
+ if (av == NULL)
+ av = arena_for_chunk (p);
+
do_check_chunk (av, p);
if (chunk_is_mmapped (p))
@@ -3447,9 +3450,10 @@ __libc_free (void *mem)
/* Mark the chunk as belonging to the library again. */
(void)tag_region (chunk2mem (p), memsize (p));
- ar_ptr = arena_for_chunk (p);
INTERNAL_SIZE_T size = chunksize (p);
- _int_free_check (ar_ptr, p, size);
+ /* av is not needed for _int_free_check in non-DEBUG mode,
+ in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
+ _int_free_check (NULL, p, size);
#if USE_TCACHE
if (tcache_free (p, size))
@@ -3458,6 +3462,8 @@ __libc_free (void *mem)
return;
}
#endif
+
+ ar_ptr = arena_for_chunk (p);
_int_free_chunk (ar_ptr, p, size, 0);
}