[v6,2/3] malloc: add tcache support for large chunk caching
Checks
Context                                      | Check   | Description
---------------------------------------------|---------|--------------------------------------------------
redhat-pt-bot/TryBot-apply_patch             | success | Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | fail    | Patch failed to apply
Commit Message
The existing tcache implementation in glibc focuses on caching small
allocations, limiting the size of cached chunks to 1KB.
This patch changes the tcache implementation to allow caching chunks of
any size. It adds extra bins (linked lists) which store chunks covering
different ranges of allocation sizes. Bin selection is done by powers of
two, and chunks are inserted in increasing size order within a bin. The
last bin holds all remaining sizes.
By default the patch preserves the existing behaviour, limiting cached
chunks to 1KB, but the maximum size of cached chunks can now be raised
with the tunable glibc.malloc.tcache_max.
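For illustration, the large-bin index is computed roughly as follows (a
simplified sketch of the csize2tidx logic in the diff below):

  /* Chunks above the small-bin range map to one of the
     TCACHE_UNBOUND_SIZE_BINS range bins, selected by the position of the
     highest set bit of the chunk size; the last bin collects every
     remaining size.  */
  size_t idx = TCACHE_MAX_BINS
               + __builtin_clz (tidx2usize (TCACHE_MAX_BINS - 1))
               - __builtin_clz (nb);
  return idx < TCACHE_ALL_BINS ? idx : TCACHE_ALL_BINS - 1;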
---
malloc/malloc.c | 159 ++++++++++++++++++++++++++++++++++--------------
1 file changed, 115 insertions(+), 44 deletions(-)
Comments
Hi Wilco,
It looks like you missed reply-all; re-adding the list.
On 16-04-2025 17:44, Wilco Dijkstra wrote:
> Hi Cupertino,
>
>> +# define TCACHE_FIXED_SIZE_BINS \
>> + (mp_.tcache_bins < TCACHE_MAX_BINS ? mp_.tcache_bins : TCACHE_MAX_BINS)
>
> This does not make any sense. There are 2 options: either keep the current definition of
> mp_.tcache_bins for the small bins and use a new global for the larger bins, or use
> mp_.tcache_bins for the larger bins only and TCACHE_MAX_BINS for the smaller ones.
Well, my intention was to keep the original semantics unchanged and add
support for large tcaches on top.
The original code also allowed tcache to be limited to below the 1KB size.
Which of those options would you prefer?
I think recent code changes are steering this towards option 2.
>
>> -# define tidx2usize(idx) (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)
>> +# define tidx2usize(idx) (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - MALLOC_ALIGNMENT + 1)
>
> This is not correct.
I applied the inverse of the original csize2tidx formula, which changed
recently. I am not totally sure now, but my impression was that this was
more accurate.
>
>> -/* When "x" is from chunksize(). */
>> -# define csize2tidx(x) (((x) - MINSIZE) / MALLOC_ALIGNMENT)
>> +/* When "x" is from chunksize(). Only usable for single sized tcache bins. */
>> +# define fast_csize2tidx(x) (((x) - MINSIZE) / MALLOC_ALIGNMENT)
>
> Why rename csize2tidx()? Just add a new definition for the larger bins.
Right, side-effects of earlier versions.
>
>> - .tcache_max_bytes = tidx2usize (TCACHE_MAX_BINS-1),
>> + .tcache_max_bytes = MAX_TCACHE_SIZE,
>
> OK
>
>> - uint16_t counts[TCACHE_MAX_BINS];
>> - tcache_entry *entries[TCACHE_MAX_BINS];
>> + uint16_t counts[TCACHE_MAX_BINS + TCACHE_UNBOUND_SIZE_BINS];
>> + tcache_entry *entries[TCACHE_MAX_BINS + TCACHE_UNBOUND_SIZE_BINS];
>
> OK
>
>> +static __always_inline size_t
>> +csize2tidx(size_t nb)
>> +{
>> + if (__glibc_likely (nb < tidx2usize (TCACHE_MAX_BINS)))
>> + return fast_csize2tidx(nb);
>
> This is incorrect, it should be <= MAX_TCACHE_SIZE. But it's also not needed
> given this function is only used for large tcache.
Side-effects of earlier versions. :(
I think there are some functions like memalign and perhaps calloc that
would still use it.
>
>> + else if (__glibc_likely (mp_.tcache_bins < TCACHE_MAX_BINS))
>> + return TCACHE_MAX_BINS;
>
> This does not make any sense...
>
>> + else
>> + {
>> + size_t idx = TCACHE_MAX_BINS
>> + + __builtin_clz (tidx2usize (TCACHE_MAX_BINS-1))
>> + - __builtin_clz (nb);
>> + return idx < TCACHE_ALL_BINS ? idx : TCACHE_ALL_BINS - 1;
>> + }
>
> This appears to be mapping all sizes into the last bin. There is no advantage in doing
> this as really large sizes should not be in tcache - so this is just adding extra code for
> no advantage.
I don't necessarily agree with limiting the maximum size for the tcache
again. If the user sets a higher value for the tcache_max_size tunable,
why should we not allow it? The effect on performance is really
negligible; applications are not dominated by allocation calls anyway. ;)
>
> +/* Returns the position in the tcache bin where the specified size is first found in.
> + This function is used by both tcache_get and tcache_put to locate the proper
> + position in the tcache bin linked-list. */
> +static __always_inline tcache_entry **
> +tcache_location_for_size (size_t nb, size_t tc_idx)
> +{
> + if (__glibc_likely (nb <= tidx2usize (TCACHE_MAX_BINS-1)))
> + return &(tcache->entries[tc_idx]);
>
> There is no need to handle small tcache here, it just adds unnecessary complexity.
I think there are still some paths that rely on this for small sizes.
Look into _mid_memalign.
>
>> @@ -3167,8 +3212,9 @@ tcache_put (mchunkptr chunk, size_t tc_idx)
>> detect a double free. */
>> e->key = tcache_key;
>>
>> - e->next = PROTECT_PTR (&e->next, REVEAL_PTR (tcache->entries[tc_idx]));
>> - tcache->entries[tc_idx] = PROTECT_PTR (&(tcache->entries[tc_idx]), e);
>> + tcache_entry **entry = tcache_location_for_size (chunksize (chunk), tc_idx);
>> + e->next = PROTECT_PTR (&e->next, REVEAL_PTR (*entry));
>> + *entry = PROTECT_PTR (entry, e);
>> ++(tcache->counts[tc_idx]);
>
> Just add a new function tcache_insert_large or similar for the large case.
Ok
>
>> -tcache_get (size_t tc_idx)
>> +tcache_get (size_t nb, size_t tc_idx)
>> {
>> - return tcache_get_n (tc_idx, & tcache->entries[tc_idx]);
>> -}
>> + tcache_entry **entry = tcache_location_for_size (nb, tc_idx);
>
> Same here, no need to make the small tcache code more complex.
Ok
>
>> + tcache_entry *e = REVEAL_PTR (*entry);
>> + if (__glibc_unlikely (tc_idx >= TCACHE_MAX_BINS)
>> + && (e == NULL || chunksize (mem2chunk (e)) != nb))
>> + return NULL;
>
> So if we found a larger block than required, we are not going to use it after
> doing all this extra work? What you want to do is to check how much larger
> the returned block is, and use it if it is a little larger than the required size.
> That should be in the function that searches for the block so we don't need to
> repeat all the reveals and chunksize etc twice...
I am not sure about allowing tcache to return chunks of a non-matching
size; this could result in more fragmentation. I think having tcache bins
for larger chunk sizes would reduce arena contention enough to let a
non-matching allocation take the arena lock and find a better match in
the main bins.
Can we leave this for a future patch?
Please note that the current approach also does not deal with releasing
large tcache chunks back to the arenas; that is another topic to look into.
I left all these possible optimizations for later so that these
discussions could happen.
>
> @@ -3373,8 +3419,20 @@ __libc_malloc2 (size_t bytes)
>
>> + size_t nb = checked_request2size (bytes);
>> + size_t tc_idx = csize2tidx (nb);
>> +
>> + if (tcache_available (tc_idx))
>> + {
>> + void *tmp = tcache_get (nb, tc_idx);
>> + if (tmp != NULL)
>> + return tag_new_usable (tmp);
>> + }
>
> This should be in __libc_malloc immediately after checking small tcache:
Ok
>
> if (__glibc_likely (bytes <= MAX_TCACHE_SIZE))
> ... small tcache code
> else if (bytes < mp_.tcache_max_size)
> {
> void *mem = tcache_find_large (checked_request2size (bytes));
> if (mem != NULL)
> return tag_new_usable (mem);
> }
> return __libc_malloc2 (bytes);
>
>
>> @@ -3409,10 +3467,17 @@ void *
>> __libc_malloc (size_t bytes)
>> {
>> #if USE_TCACHE
>> - size_t tc_idx = csize2tidx (checked_request2size (bytes));
>> + size_t nb = checked_request2size (bytes);
>> + size_t tc_idx = fast_csize2tidx (nb);
>>
>> - if (tcache_available (tc_idx))
>> - return tag_new_usable (tcache_get (tc_idx));
>> + /* Limit this call to chunks that fit in same dimention tcache bins.
>> + Large chunk sizes are cached inside __libc_malloc2. */
>> + if (tc_idx < TCACHE_MAX_BINS
>> + && tcache_available (tc_idx))
>> + {
>> + tcache_entry **entry = &tcache->entries[tc_idx];
>> + return tag_new_usable (tcache_remove_entry (tc_idx, entry));
>> + }
>
> Again, it doesn't make any sense to make the fast path more complex.
What do you mean? How is it more complex? The presented code is a subset
of what tcache_get does.
What I understand now is that you clearly want two sets of functions,
really separating tcache into small and large implementations.
Please allow me to vent some frustration here ... you were the motivation
for me to merge both implementations in the first place. :)
>
> return __libc_malloc2 (bytes);
> @@ -3683,20 +3748,28 @@ _mid_memalign (size_t alignment, size_t bytes, void *address)
>
> if (tcache_available (tc_idx))
> {
> - /* The tcache itself isn't encoded, but the chain is. */
> - tcache_entry **tep = & tcache->entries[tc_idx];
> + tcache_entry **tep = tcache_location_for_size (tbytes, tc_idx);
> tcache_entry *te = REVEAL_PTR (*tep);
> +
> + /* Make sure returned chunk is of expected size. */
> + if (te == NULL
> + || (tc_idx >= TCACHE_MAX_BINS
> + && chunksize (mem2chunk (te)) != tbytes))
> + goto tcache_memalign_abort;
> +
> + /* The tcache itself isn't encoded, but the chain is. */
> while (te != NULL && !PTR_IS_ALIGNED (te, alignment))
> {
> tep = & (te->next);
> - te = tcache_next (te);
> + te = REVEAL_PTR (te->next);
> }
> if (te != NULL)
> {
> - void *victim = tcache_get_n (tc_idx, tep);
> + void *victim = tcache_remove_entry (tc_idx, tep);
> return tag_new_usable (victim);
> }
> }
> +tcache_memalign_abort:
>
> You need separate code paths for the small and large sizes, or this won't work. There
> is no guarantee you will get a block of the right size - it seems to return immediately
> if it finds any block that happens to be aligned enough...
That is not true; please notice that tcache_location_for_size already
places the tep pointer at a matching-size chunk in the list, and if the
size does not match it aborts.
>
>
> @@ -3983,8 +4056,8 @@ _int_malloc (mstate av, size_t bytes)
> #if USE_TCACHE
> /* While we're here, if we see other chunks of the same size,
> stash them in the tcache. */
> - size_t tc_idx = csize2tidx (nb);
> - if (tcache != NULL && tc_idx < mp_.tcache_bins)
> + size_t tc_idx = fast_csize2tidx (nb);
> + if (tcache != NULL && tc_idx < TCACHE_FIXED_SIZE_BINS)
>
> This change is not needed.
>
> @@ -4044,8 +4117,8 @@ _int_malloc (mstate av, size_t bytes)
> #if USE_TCACHE
> /* While we're here, if we see other chunks of the same size,
> stash them in the tcache. */
> - size_t tc_idx = csize2tidx (nb);
> - if (tcache != NULL && tc_idx < mp_.tcache_bins)
> + size_t tc_idx = fast_csize2tidx (nb);
> + if (__glibc_likely (tcache != NULL) && tc_idx < TCACHE_FIXED_SIZE_BINS)
>
> Same here.
>
> @@ -4106,8 +4179,8 @@ _int_malloc (mstate av, size_t bytes)
>
> #if USE_TCACHE
> INTERNAL_SIZE_T tcache_nb = 0;
> - size_t tc_idx = csize2tidx (nb);
> - if (tcache != NULL && tc_idx < mp_.tcache_bins)
> + size_t tc_idx = fast_csize2tidx (nb);
> + if (tcache != NULL && tc_idx < TCACHE_FIXED_SIZE_BINS)
>
> And here.
>
>
> @@ -5530,14 +5603,12 @@ do_set_arena_max (size_t value)
> static __always_inline int
> do_set_tcache_max (size_t value)
> {
> - if (value <= MAX_TCACHE_SIZE)
> - {
> - LIBC_PROBE (memory_tunable_tcache_max_bytes, 2, value, mp_.tcache_max_bytes);
> - mp_.tcache_max_bytes = value;
> - mp_.tcache_bins = csize2tidx (request2size(value)) + 1;
> - return 1;
> - }
> - return 0;
> + LIBC_PROBE (memory_tunable_tcache_max_bytes, 2, value, mp_.tcache_max_bytes);
> + mp_.tcache_max_bytes = value;
> + mp_.tcache_bins = TCACHE_MAX_BINS + TCACHE_UNBOUND_SIZE_BINS;
> + mp_.tcache_bins = csize2tidx (request2size(value)) + 1;
>
> There should be a limit on the maximum size and bins. There is no proper check on
> the size, so as soon as the size is larger than MAX_TCACHE_SIZE, the large tcache is
> enabled for all sizes without any control.
Oh, I missed checking tcache_max_bytes. Oops!
Cheers,
Cupertino
Hi Cupertino,
>> This does not make any sense. There are 2 options: either keep the current definition of
>> mp_.tcache_bins for the small bins and use a new global for the larger bins, or use
>> mp_.tcache_bins for the larger bins only and TCACHE_MAX_BINS for the smaller ones.
>
> Well, my intention was to keep the original semantics unchanged and add
> support for large tcaches on top.
> The original code also allowed tcache to be limited to below the 1KB size.
>
> Which of those options would you prefer?
> I think recent code changes are steering this towards option 2.
The code below is what I am working towards - essentially another step after that.
However the existing code expects that tcache_max_bins remains capped at TCACHE_MAX_BINS.
So unless you want to add support for option 2, you could use tcache_max_bytes for large
blocks since it is currently unused. Ultimately malloc would end up looking like this:
  if (__glibc_likely (bytes <= MAX_TCACHE_SIZE))
    ... small tcache code
  else if (bytes <= mp_.tcache_max_size)
    ... large tcache code
  return __libc_malloc2 (bytes);
This has the advantage that neither case pays the cost of computing the tidx of the other.
>> -# define tidx2usize(idx) (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)
>> +# define tidx2usize(idx) (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - MALLOC_ALIGNMENT + 1)
>>
>> This is not correct.
> I applied the inverse of the original csize2tidx formula, which changed
> recently. I am not totally sure now, but my impression was that this was
> more accurate.
Try it with idx=0: you get 24 bytes of user memory for the smallest chunk on a 64-bit
target. The changed formula ends up with 17, which is a weird value. If you use a higher
malloc alignment of 32, you'd get 1.
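Spelling that out with the usual 64-bit defaults (MALLOC_ALIGNMENT == 16, MINSIZE == 32,
SIZE_SZ == 8; these values are assumed here, not stated in the thread):

  /* Original formula, idx = 0:   0 * 16 + 32 - 8      = 24
     Changed formula,  idx = 0:   0 * 16 + 32 - 16 + 1 = 17
     With MALLOC_ALIGNMENT == 32 (MINSIZE still 32):
                                  0 * 32 + 32 - 32 + 1 = 1   */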
>>> + if (__glibc_likely (nb < tidx2usize (TCACHE_MAX_BINS)))
>>> + return fast_csize2tidx(nb);
>>
>> This is incorrect, it should be <= MAX_TCACHE_SIZE. But it's also not needed
>> given this function is only used for large tcache.
> Side-effects of earlier versions. :(
> I think there are some functions like memalign and perhaps calloc that
> would still use it.
Only supporting small tcache for calloc and memalign would be fine for now.
>> This appears to be mapping all sizes into the last bin. There is no advantage in doing
>> this as really large sizes should not be in tcache - so this is just adding extra code for
>> no advantage.
> I don't necessarily agree with limiting the maximum size for the tcache
> again. If the user sets a higher value for the tcache_max_size tunable,
> why should we not allow it? The effect on performance is really
> negligible; applications are not dominated by allocation calls anyway. ;)
Large blocks will typically be mmap'd, and we can't currently allow those to be cached in
tcache due to their different format. You will need to add a check for that in the
large block free until we fix this.
Caching huge blocks in tcache will significantly increase memory usage. You could have
a thread freeing a 1GB block but not allocating more large blocks - now that 1GB block
cannot be reused by other threads until the thread exits... We must release such blocks
back to the system. Also since such blocks are rare, there is no need to speed them up
using tcache. So there should be a limit on the size of blocks.
>> +tcache_location_for_size (size_t nb, size_t tc_idx)
>> +{
>> + if (__glibc_likely (nb <= tidx2usize (TCACHE_MAX_BINS-1)))
>> + return &(tcache->entries[tc_idx]);
>>
>> There is no need to handle small tcache here, it just adds unnecessary complexity.
> I think there are still some paths that rely on this for small sizes.
> Look into _mid_memalign.
It's not clear whether supporting large blocks in memalign is useful - actually I'm not
sure it is for small blocks! The chance of finding an aligned chunk will be very small,
especially for larger alignments.
>> So if we found a larger block than required, we are not going to use it after
>> doing all this extra work? What you want to do is to check how much larger
>> the returned block is, and use it if it is a little larger than the required size.
>> That should be in the function that searches for the block so we don't need to
>> repeat all the reveals and chunksize etc twice...
> I am not sure about allowing tcache to return chunks of a non-matching
> size; this could result in more fragmentation. I think having tcache bins
> for larger chunk sizes would reduce arena contention enough to let a
> non-matching allocation take the arena lock and find a better match in
> the main bins.
The fragmentation problem is caused by allowing way too many block sizes, being
overly picky about allocating a matching block size, splitting off tiny remainder
blocks from a larger block, and then also being very lazy about coalescing...
A perfect storm for fragmentation!
> Can we leave this for a future patch?
> Please note that the current approach also does not deal with releasing
> large tcache chunks back to the arenas; that is another topic to look into.
Sure we don't need to solve this now.
>> + /* Limit this call to chunks that fit in same dimention tcache bins.
>> + Large chunk sizes are cached inside __libc_malloc2. */
>> + if (tc_idx < TCACHE_MAX_BINS
>> + && tcache_available (tc_idx))
>> + {
>> + tcache_entry **entry = &tcache->entries[tc_idx];
>> + return tag_new_usable (tcache_remove_entry (tc_idx, entry));
>> + }
>
>> Again, it doesn't make any sense to make the fast path more complex.
> What do you mean? How is it more complex? The presented code is a subset
> of what tcache_get does.
It does a lot of unnecessary extra work due to all the extra if statements.
I still get a 25% slowdown with this version.
> What I understand now is that you clearly want two sets of functions,
> really separating tcache into small and large implementations.
> Please allow me to vent some frustration here ... you were the motivation
> for me to merge both implementations in the first place. :)
I've asked for keeping the small tcache code fast since your initial RFC.
The 25% slowdown shows that it is impossible to keep the small tcache code fast
when merging it with the large tcache code. So the only option is to keep the code
separate.
>> You need separate code paths for the small and large sizes, or this won't work. There
>> is no guarantee you will get a block of the right size - it seems to return immediately
>> if it finds any block that happens to be aligned enough...
> That is not true; please notice that tcache_location_for_size already
> places the tep pointer at a matching-size chunk in the list, and if the
> size does not match it aborts.
It only does that for the first entry, but then it loops and it could in principle get any
other size as it follows the next links. If the blocks are kept sorted on increasing size
(as appears to be the case), you could return a block that is twice the requested size.
It seems best to add this in a later patch if we can prove it is useful.
>> There should be a limit on the maximum size and bins. There is no proper check on
>> the size, so as soon as the size is larger than MAX_TCACHE_SIZE, the large tcache is
>> enabled for all sizes without any control.
> Oh, I missed checking tcache_max_bytes. Oops!
tcache_max_bytes is unused right now.
But what I meant is that the maximum size is not limited because of csize2tidx - so once
you allow the largest bin, you allow all sizes.
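For what it's worth, a minimal sketch of the kind of bound being discussed might look
like this (MAX_LARGE_TCACHE_SIZE is a hypothetical cap, not something in the patch):

  static __always_inline int
  do_set_tcache_max (size_t value)
  {
    /* Hypothetical upper bound on cached chunk sizes; the patch currently
       imposes no such limit.  */
    if (value > MAX_LARGE_TCACHE_SIZE)
      value = MAX_LARGE_TCACHE_SIZE;
    LIBC_PROBE (memory_tunable_tcache_max_bytes, 2, value, mp_.tcache_max_bytes);
    mp_.tcache_max_bytes = value;
    /* Cap the bin count so csize2tidx cannot index past the last large bin.  */
    size_t bins = csize2tidx (request2size (value)) + 1;
    mp_.tcache_bins = bins < TCACHE_ALL_BINS ? bins : TCACHE_ALL_BINS;
    return 1;
  }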
Cheers,
Wilco
@@ -292,13 +292,19 @@
#if USE_TCACHE
/* We want 64 entries. This is an arbitrary limit, which tunables can reduce. */
# define TCACHE_MAX_BINS 64
+/* Number of bins beyond the TCACHE_MAX_BINS reserved for ranges of chunksizes
+ to allow tcaches to cache chunks beyond ~1k size. */
+# define TCACHE_UNBOUND_SIZE_BINS 10
+# define TCACHE_ALL_BINS (TCACHE_MAX_BINS + TCACHE_UNBOUND_SIZE_BINS)
# define MAX_TCACHE_SIZE tidx2usize (TCACHE_MAX_BINS-1)
+# define TCACHE_FIXED_SIZE_BINS \
+ (mp_.tcache_bins < TCACHE_MAX_BINS ? mp_.tcache_bins : TCACHE_MAX_BINS)
/* Only used to pre-fill the tunables. */
-# define tidx2usize(idx) (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)
+# define tidx2usize(idx) (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - MALLOC_ALIGNMENT + 1)
-/* When "x" is from chunksize(). */
-# define csize2tidx(x) (((x) - MINSIZE) / MALLOC_ALIGNMENT)
+/* When "x" is from chunksize(). Only usable for single sized tcache bins. */
+# define fast_csize2tidx(x) (((x) - MINSIZE) / MALLOC_ALIGNMENT)
/* When "x" is a user-provided size. */
# define usize2tidx(x) csize2tidx (request2size (x))
@@ -1929,7 +1935,7 @@ static struct malloc_par mp_ =
,
.tcache_count = TCACHE_FILL_COUNT,
.tcache_bins = TCACHE_MAX_BINS,
- .tcache_max_bytes = tidx2usize (TCACHE_MAX_BINS-1),
+ .tcache_max_bytes = MAX_TCACHE_SIZE,
.tcache_unsorted_limit = 0 /* No limit. */
#endif
};
@@ -3122,8 +3128,8 @@ typedef struct tcache_entry
time), this is for performance reasons. */
typedef struct tcache_perthread_struct
{
- uint16_t counts[TCACHE_MAX_BINS];
- tcache_entry *entries[TCACHE_MAX_BINS];
+ uint16_t counts[TCACHE_MAX_BINS + TCACHE_UNBOUND_SIZE_BINS];
+ tcache_entry *entries[TCACHE_MAX_BINS + TCACHE_UNBOUND_SIZE_BINS];
} tcache_perthread_struct;
static __thread bool tcache_shutting_down = false;
@@ -3156,6 +3162,45 @@ tcache_key_initialize (void)
}
}
+static __always_inline size_t
+csize2tidx(size_t nb)
+{
+ if (__glibc_likely (nb < tidx2usize (TCACHE_MAX_BINS)))
+ return fast_csize2tidx(nb);
+ else if (__glibc_likely (mp_.tcache_bins < TCACHE_MAX_BINS))
+ return TCACHE_MAX_BINS;
+ else
+ {
+ size_t idx = TCACHE_MAX_BINS
+ + __builtin_clz (tidx2usize (TCACHE_MAX_BINS-1))
+ - __builtin_clz (nb);
+ return idx < TCACHE_ALL_BINS ? idx : TCACHE_ALL_BINS - 1;
+ }
+}
+
+/* Returns the position in the tcache bin where the specified size is first found in.
+ This function is used by both tcache_get and tcache_put to locate the proper
+ position in the tcache bin linked-list. */
+static __always_inline tcache_entry **
+tcache_location_for_size (size_t nb, size_t tc_idx)
+{
+ if (__glibc_likely (nb <= tidx2usize (TCACHE_MAX_BINS-1)))
+ return &(tcache->entries[tc_idx]);
+ else
+ {
+ tcache_entry **tep = &(tcache->entries[tc_idx]);
+ tcache_entry *te = REVEAL_PTR (*tep);
+ while (te != NULL
+ && __glibc_unlikely (chunksize (mem2chunk (te)) < nb))
+ {
+ tep = & (te->next);
+ te = REVEAL_PTR (te->next);
+ }
+
+ return tep;
+ }
+}
+
/* Caller must ensure that we know tc_idx is valid and there's room
for more chunks. */
static __always_inline void
@@ -3167,8 +3212,9 @@ tcache_put (mchunkptr chunk, size_t tc_idx)
detect a double free. */
e->key = tcache_key;
- e->next = PROTECT_PTR (&e->next, REVEAL_PTR (tcache->entries[tc_idx]));
- tcache->entries[tc_idx] = PROTECT_PTR (&(tcache->entries[tc_idx]), e);
+ tcache_entry **entry = tcache_location_for_size (chunksize (chunk), tc_idx);
+ e->next = PROTECT_PTR (&e->next, REVEAL_PTR (*entry));
+ *entry = PROTECT_PTR (entry, e);
++(tcache->counts[tc_idx]);
}
@@ -3176,7 +3222,7 @@ tcache_put (mchunkptr chunk, size_t tc_idx)
available chunks to remove. Removes chunk from the middle of the
list. */
static __always_inline void *
-tcache_get_n (size_t tc_idx, tcache_entry **ep)
+tcache_remove_entry (size_t tc_idx, tcache_entry **ep)
{
tcache_entry *e = REVEAL_PTR (*ep);
@@ -3190,18 +3236,18 @@ tcache_get_n (size_t tc_idx, tcache_entry **ep)
return (void *) e;
}
-/* Like the above, but removes from the head of the list. */
+/* Like the above, but traverses the list to find proper location to get the
+ entry within the respective tcache bin. */
static __always_inline void *
-tcache_get (size_t tc_idx)
+tcache_get (size_t nb, size_t tc_idx)
{
- return tcache_get_n (tc_idx, & tcache->entries[tc_idx]);
-}
+ tcache_entry **entry = tcache_location_for_size (nb, tc_idx);
+ tcache_entry *e = REVEAL_PTR (*entry);
+ if (__glibc_unlikely (tc_idx >= TCACHE_MAX_BINS)
+ && (e == NULL || chunksize (mem2chunk (e)) != nb))
+ return NULL;
-/* Iterates through the tcache linked list. */
-static __always_inline tcache_entry *
-tcache_next (tcache_entry *e)
-{
- return (tcache_entry *) REVEAL_PTR (e->next);
+ return tcache_remove_entry (tc_idx, entry);
}
/* Check if tcache is available for alloc by corresponding tc_idx. */
@@ -3258,7 +3304,7 @@ tcache_thread_shutdown (void)
/* Free all of the entries and the tcache itself back to the arena
heap for coalescing. */
- for (i = 0; i < TCACHE_MAX_BINS; ++i)
+ for (i = 0; i < TCACHE_MAX_BINS + TCACHE_UNBOUND_SIZE_BINS; ++i)
{
while (REVEAL_PTR (tcache_tmp->entries[i]))
{
@@ -3317,7 +3363,7 @@ tcache_init(void)
memset (tcache, 0, sizeof (tcache_perthread_struct));
}
- for (int i = 0; i < mp_.tcache_bins; i++)
+ for (int i = 0; i < TCACHE_MAX_BINS + TCACHE_UNBOUND_SIZE_BINS; i++)
tcache->entries[i] = PROTECT_PTR (&(tcache->entries[i]), NULL);
}
@@ -3344,7 +3390,7 @@ tcache_try_malloc (size_t bytes, void **memptr)
MAYBE_INIT_TCACHE ();
if (tcache_available (tc_idx))
- *memptr = tcache_get (tc_idx);
+ *memptr = tcache_get (tbytes, tc_idx);
else
*memptr = NULL;
@@ -3373,8 +3419,20 @@ __libc_malloc2 (size_t bytes)
if (!__malloc_initialized)
ptmalloc_init ();
+#if USE_TCACHE
MAYBE_INIT_TCACHE ();
+ size_t nb = checked_request2size (bytes);
+ size_t tc_idx = csize2tidx (nb);
+
+ if (tcache_available (tc_idx))
+ {
+ void *tmp = tcache_get (nb, tc_idx);
+ if (tmp != NULL)
+ return tag_new_usable (tmp);
+ }
+#endif
+
if (SINGLE_THREAD_P)
{
victim = tag_new_usable (_int_malloc (&main_arena, bytes));
@@ -3409,10 +3467,17 @@ void *
__libc_malloc (size_t bytes)
{
#if USE_TCACHE
- size_t tc_idx = csize2tidx (checked_request2size (bytes));
+ size_t nb = checked_request2size (bytes);
+ size_t tc_idx = fast_csize2tidx (nb);
- if (tcache_available (tc_idx))
- return tag_new_usable (tcache_get (tc_idx));
+ /* Limit this call to chunks that fit in same dimention tcache bins.
+ Large chunk sizes are cached inside __libc_malloc2. */
+ if (tc_idx < TCACHE_MAX_BINS
+ && tcache_available (tc_idx))
+ {
+ tcache_entry **entry = &tcache->entries[tc_idx];
+ return tag_new_usable (tcache_remove_entry (tc_idx, entry));
+ }
#endif
return __libc_malloc2 (bytes);
@@ -3683,20 +3748,28 @@ _mid_memalign (size_t alignment, size_t bytes, void *address)
if (tcache_available (tc_idx))
{
- /* The tcache itself isn't encoded, but the chain is. */
- tcache_entry **tep = & tcache->entries[tc_idx];
+ tcache_entry **tep = tcache_location_for_size (tbytes, tc_idx);
tcache_entry *te = REVEAL_PTR (*tep);
+
+ /* Make sure returned chunk is of expected size. */
+ if (te == NULL
+ || (tc_idx >= TCACHE_MAX_BINS
+ && chunksize (mem2chunk (te)) != tbytes))
+ goto tcache_memalign_abort;
+
+ /* The tcache itself isn't encoded, but the chain is. */
while (te != NULL && !PTR_IS_ALIGNED (te, alignment))
{
tep = & (te->next);
- te = tcache_next (te);
+ te = REVEAL_PTR (te->next);
}
if (te != NULL)
{
- void *victim = tcache_get_n (tc_idx, tep);
+ void *victim = tcache_remove_entry (tc_idx, tep);
return tag_new_usable (victim);
}
}
+tcache_memalign_abort:
}
#endif
@@ -3983,8 +4056,8 @@ _int_malloc (mstate av, size_t bytes)
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
- size_t tc_idx = csize2tidx (nb);
- if (tcache != NULL && tc_idx < mp_.tcache_bins)
+ size_t tc_idx = fast_csize2tidx (nb);
+ if (tcache != NULL && tc_idx < TCACHE_FIXED_SIZE_BINS)
{
mchunkptr tc_victim;
@@ -4044,8 +4117,8 @@ _int_malloc (mstate av, size_t bytes)
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
- size_t tc_idx = csize2tidx (nb);
- if (tcache != NULL && tc_idx < mp_.tcache_bins)
+ size_t tc_idx = fast_csize2tidx (nb);
+ if (__glibc_likely (tcache != NULL) && tc_idx < TCACHE_FIXED_SIZE_BINS)
{
mchunkptr tc_victim;
@@ -4106,8 +4179,8 @@ _int_malloc (mstate av, size_t bytes)
#if USE_TCACHE
INTERNAL_SIZE_T tcache_nb = 0;
- size_t tc_idx = csize2tidx (nb);
- if (tcache != NULL && tc_idx < mp_.tcache_bins)
+ size_t tc_idx = fast_csize2tidx (nb);
+ if (tcache != NULL && tc_idx < TCACHE_FIXED_SIZE_BINS)
tcache_nb = nb;
int return_cached = 0;
@@ -4285,7 +4358,7 @@ _int_malloc (mstate av, size_t bytes)
&& mp_.tcache_unsorted_limit > 0
&& tcache_unsorted_count > mp_.tcache_unsorted_limit)
{
- return tcache_get (tc_idx);
+ return tcache_get (nb, tc_idx);
}
#endif
@@ -4298,7 +4371,7 @@ _int_malloc (mstate av, size_t bytes)
/* If all the small chunks we found ended up cached, return one now. */
if (return_cached)
{
- return tcache_get (tc_idx);
+ return tcache_get (nb, tc_idx);
}
#endif
@@ -5530,14 +5603,12 @@ do_set_arena_max (size_t value)
static __always_inline int
do_set_tcache_max (size_t value)
{
- if (value <= MAX_TCACHE_SIZE)
- {
- LIBC_PROBE (memory_tunable_tcache_max_bytes, 2, value, mp_.tcache_max_bytes);
- mp_.tcache_max_bytes = value;
- mp_.tcache_bins = csize2tidx (request2size(value)) + 1;
- return 1;
- }
- return 0;
+ LIBC_PROBE (memory_tunable_tcache_max_bytes, 2, value, mp_.tcache_max_bytes);
+ mp_.tcache_max_bytes = value;
+ mp_.tcache_bins = TCACHE_MAX_BINS + TCACHE_UNBOUND_SIZE_BINS;
+ mp_.tcache_bins = csize2tidx (request2size(value)) + 1;
+
+ return 1;
}
static __always_inline int