malloc: Use internal TLS for tcache and thread_arena
Checks

Context                                          | Check   | Description
-------------------------------------------------|---------|-------------------------------------------------
redhat-pt-bot/TryBot-apply_patch                 | success | Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm     | success | Build passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm     | fail    | Test failed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | fail    | Test failed
Commit Message
Use internal TLS for faster access to frequently used thread-local data. Add tcache
and thread_arena to tls-internal.h to avoid GOT indirections. Performance of
bench-malloc-thread 32 improves by 2.2% on Neoverse V2.
---
Comments
* Wilco Dijkstra:
> Use internal TLS for faster access to frequently used thread-local
> data. Add tcache and thread_arena to tls-internal.h to avoid GOT
> indirections. Performance of bench-malloc-thread 32 improves by 2.2%
> on Neoverse V2.
I'm pretty sure this is incompatible with dlmopen and auditors.
The separate mallocs work despite their separated data structures
because malloc and free from different namespaces are not expected
to interoperate (and generally can't, because an auditor can legitimately
use a completely different malloc implementation than the main program).
Having a shared tcache breaks that, because it's entirely possible that
tcache gets populated in one namespace but freed in another namespace,
using the respective lower-level allocators. For non-main arenas, this
may not trigger heap corruption immediately, but for the main arena, I
expect this to be rather problematic because the locks and everything
else are separate.
However, I like the direction of this change. We absolutely should make
this optimization for the initial namespace. But we need to find a way
to redirect mallocs in secondary namespaces to a different
implementation, without additional run-time checks in the main
namespace.
Thanks,
Florian
@@ -18,6 +18,7 @@
#include <stdbool.h>
#include <setvmaname.h>
+#include <tls-internal.h>
#define TUNABLE_NAMESPACE malloc
#include <elf/dl-tunables.h>
@@ -86,8 +87,16 @@ extern int sanity_check_heap_info_alignment[(sizeof (heap_info)
/* Thread specific data. */
+#if IS_IN (libc)
+
+#define thread_arena __glibc_tls_internal ()->thread_arena
+
+#else
+
static __thread mstate thread_arena attribute_tls_model_ie;
+#endif
+
/* Arena free list. free_list_lock synchronizes access to the
free_list variable below, and the next_free and attached_threads
members of struct malloc_state objects. No other locks must be
@@ -257,6 +257,8 @@
#include <sys/random.h>
#include <not-cancel.h>
+#include <tls-internal.h>
+
/*
Debugging:
@@ -3127,7 +3129,8 @@ typedef struct tcache_perthread_struct
} tcache_perthread_struct;
static __thread bool tcache_shutting_down = false;
-static __thread tcache_perthread_struct *tcache = NULL;
+
+#define tcache __glibc_tls_internal ()->tcache
/* Process-wide key to try and catch a double-free in the same thread. */
static uintptr_t tcache_key;
@@ -23,6 +23,9 @@ struct tls_internal_t
{
char *strsignal_buf;
char *strerror_l_buf;
+
+ struct tcache_perthread_struct *tcache;
+ struct malloc_state *thread_arena;
};
#endif