[v3,3/5] malloc: Arena is not needed for tcache path in free()
Checks
Context |
Check |
Description |
redhat-pt-bot/TryBot-apply_patch |
success
|
Patch applied to master at the time it was sent
|
linaro-tcwg-bot/tcwg_glibc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-arm |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 |
success
|
Test passed
|
Commit Message
Arena is not needed for _int_free_check() in non-DEBUG mode. This commit
defers arena deference to _int_free_chunk() thus accelerate tcache path.
When DEBUG enabled, arena can be obtained from p in do_check_inuse_chunk().
Result of bench-malloc-thread benchmark
Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)
Threads# | Ratio
-----------|------
1 thread | 0.994
4 threads | 0.968
The data shows it can brings 3% performance gain in multi-thread scenario.
---
Changes in v2:
- _int_free_check() should be put outside of USE_TCACHE.
- Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
---
malloc/malloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
Comments
On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
>
> Arena is not needed for _int_free_check() in non-DEBUG mode. This commit
> defers arena deference to _int_free_chunk() thus accelerate tcache path.
> When DEBUG enabled, arena can be obtained from p in do_check_inuse_chunk().
>
> Result of bench-malloc-thread benchmark
>
> Test Platform: Xeon-8380
> Ratio: New / Original time_per_iteration (Lower is Better)
>
> Threads# | Ratio
> -----------|------
> 1 thread | 0.994
> 4 threads | 0.968
>
> The data shows it can brings 3% performance gain in multi-thread scenario.
>
> ---
> Changes in v2:
> - _int_free_check() should be put outside of USE_TCACHE.
> - Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
> ---
> malloc/malloc.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 264f35e1a3..efb5292e9f 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
> {
> mchunkptr next;
>
> + if (av == NULL)
> + av = arena_for_chunk (p);
> +
> do_check_chunk (av, p);
>
> if (chunk_is_mmapped (p))
> @@ -3447,9 +3450,10 @@ __libc_free (void *mem)
> /* Mark the chunk as belonging to the library again. */
> (void)tag_region (chunk2mem (p), memsize (p));
>
> - ar_ptr = arena_for_chunk (p);
> INTERNAL_SIZE_T size = chunksize (p);
> - _int_free_check (ar_ptr, p, size);
> + /* av is not needed for _int_free_check in non-DEBUG mode,
> + in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
> + _int_free_check (NULL, p, size);
If _int_free_check doesn't need av, can you push it further down?
>
> #if USE_TCACHE
> if (tcache_free (p, size))
> @@ -3458,6 +3462,8 @@ __libc_free (void *mem)
> return;
> }
> #endif
> +
> + ar_ptr = arena_for_chunk (p);
> _int_free_chunk (ar_ptr, p, size, 0);
> }
>
> --
> 2.43.0
>
On Mon, Nov 25, 2024 at 6:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
> >
> > Arena is not needed for _int_free_check() in non-DEBUG mode. This commit
> > defers arena deference to _int_free_chunk() thus accelerate tcache path.
> > When DEBUG enabled, arena can be obtained from p in do_check_inuse_chunk().
> >
> > Result of bench-malloc-thread benchmark
> >
> > Test Platform: Xeon-8380
> > Ratio: New / Original time_per_iteration (Lower is Better)
> >
> > Threads# | Ratio
> > -----------|------
> > 1 thread | 0.994
> > 4 threads | 0.968
> >
> > The data shows it can brings 3% performance gain in multi-thread scenario.
> >
> > ---
> > Changes in v2:
> > - _int_free_check() should be put outside of USE_TCACHE.
> > - Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
> > ---
> > malloc/malloc.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/malloc/malloc.c b/malloc/malloc.c
> > index 264f35e1a3..efb5292e9f 100644
> > --- a/malloc/malloc.c
> > +++ b/malloc/malloc.c
> > @@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
> > {
> > mchunkptr next;
> >
> > + if (av == NULL)
> > + av = arena_for_chunk (p);
> > +
> > do_check_chunk (av, p);
> >
> > if (chunk_is_mmapped (p))
> > @@ -3447,9 +3450,10 @@ __libc_free (void *mem)
> > /* Mark the chunk as belonging to the library again. */
> > (void)tag_region (chunk2mem (p), memsize (p));
> >
> > - ar_ptr = arena_for_chunk (p);
> > INTERNAL_SIZE_T size = chunksize (p);
> > - _int_free_check (ar_ptr, p, size);
> > + /* av is not needed for _int_free_check in non-DEBUG mode,
> > + in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
> > + _int_free_check (NULL, p, size);
>
> If _int_free_check doesn't need av, can you push it further down?
>
Something like this?
On 11/25/2024 7:58 PM, H.J. Lu wrote:
> On Mon, Nov 25, 2024 at 6:34 PM H.J. Lu<hjl.tools@gmail.com> wrote:
>> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo<wangyang.guo@intel.com> wrote:
>>> Arena is not needed for _int_free_check() in non-DEBUG mode. This commit
>>> defers arena deference to _int_free_chunk() thus accelerate tcache path.
>>> When DEBUG enabled, arena can be obtained from p in do_check_inuse_chunk().
>>>
>>> Result of bench-malloc-thread benchmark
>>>
>>> Test Platform: Xeon-8380
>>> Ratio: New / Original time_per_iteration (Lower is Better)
>>>
>>> Threads# | Ratio
>>> -----------|------
>>> 1 thread | 0.994
>>> 4 threads | 0.968
>>>
>>> The data shows it can brings 3% performance gain in multi-thread scenario.
>>>
>>> ---
>>> Changes in v2:
>>> - _int_free_check() should be put outside of USE_TCACHE.
>>> - Link to v1:https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
>>> ---
>>> malloc/malloc.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>>> index 264f35e1a3..efb5292e9f 100644
>>> --- a/malloc/malloc.c
>>> +++ b/malloc/malloc.c
>>> @@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
>>> {
>>> mchunkptr next;
>>>
>>> + if (av == NULL)
>>> + av = arena_for_chunk (p);
>>> +
>>> do_check_chunk (av, p);
>>>
>>> if (chunk_is_mmapped (p))
>>> @@ -3447,9 +3450,10 @@ __libc_free (void *mem)
>>> /* Mark the chunk as belonging to the library again. */
>>> (void)tag_region (chunk2mem (p), memsize (p));
>>>
>>> - ar_ptr = arena_for_chunk (p);
>>> INTERNAL_SIZE_T size = chunksize (p);
>>> - _int_free_check (ar_ptr, p, size);
>>> + /* av is not needed for _int_free_check in non-DEBUG mode,
>>> + in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
>>> + _int_free_check (NULL, p, size);
>> If _int_free_check doesn't need av, can you push it further down?
>>
> Something like this?
The problem is _init_free is also used in other places. We can not make
sure `av` always got from `arena_for_chunk (p)`.
On Tue, Nov 26, 2024, 10:55 AM Guo, Wangyang <wangyang.guo@intel.com> wrote:
> On 11/25/2024 7:58 PM, H.J. Lu wrote:
>
> On Mon, Nov 25, 2024 at 6:34 PM H.J. Lu <hjl.tools@gmail.com> <hjl.tools@gmail.com> wrote:
>
> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo <wangyang.guo@intel.com> <wangyang.guo@intel.com> wrote:
>
> Arena is not needed for _int_free_check() in non-DEBUG mode. This commit
> defers arena deference to _int_free_chunk() thus accelerate tcache path.
> When DEBUG enabled, arena can be obtained from p in do_check_inuse_chunk().
>
> Result of bench-malloc-thread benchmark
>
> Test Platform: Xeon-8380
> Ratio: New / Original time_per_iteration (Lower is Better)
>
> Threads# | Ratio
> -----------|------
> 1 thread | 0.994
> 4 threads | 0.968
>
> The data shows it can brings 3% performance gain in multi-thread scenario.
>
> ---
> Changes in v2:
> - _int_free_check() should be put outside of USE_TCACHE.
> - Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
> ---
> malloc/malloc.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 264f35e1a3..efb5292e9f 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
> {
> mchunkptr next;
>
> + if (av == NULL)
> + av = arena_for_chunk (p);
> +
> do_check_chunk (av, p);
>
> if (chunk_is_mmapped (p))
> @@ -3447,9 +3450,10 @@ __libc_free (void *mem)
> /* Mark the chunk as belonging to the library again. */
> (void)tag_region (chunk2mem (p), memsize (p));
>
> - ar_ptr = arena_for_chunk (p);
> INTERNAL_SIZE_T size = chunksize (p);
> - _int_free_check (ar_ptr, p, size);
> + /* av is not needed for _int_free_check in non-DEBUG mode,
> + in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
> + _int_free_check (NULL, p, size);
>
> If _int_free_check doesn't need av, can you push it further down?
>
>
> Something like this?
>
> The problem is _init_free is also used in other places. We can not make
> sure `av` always got from `arena_for_chunk (p)`.
>
Have you looked at my change?
On 11/26/2024 11:23 AM, H.J. Lu wrote:
>
>
> On Tue, Nov 26, 2024, 10:55 AM Guo, Wangyang <wangyang.guo@intel.com>
> wrote:
>
> On 11/25/2024 7:58 PM, H.J. Lu wrote:
>
>> On Mon, Nov 25, 2024 at 6:34 PM H.J. Lu<hjl.tools@gmail.com> <mailto:hjl.tools@gmail.com> wrote:
>>> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo<wangyang.guo@intel.com> <mailto:wangyang.guo@intel.com> wrote:
>>>> Arena is not needed for _int_free_check() in non-DEBUG mode. This commit
>>>> defers arena deference to _int_free_chunk() thus accelerate tcache path.
>>>> When DEBUG enabled, arena can be obtained from p in do_check_inuse_chunk().
>>>>
>>>> Result of bench-malloc-thread benchmark
>>>>
>>>> Test Platform: Xeon-8380
>>>> Ratio: New / Original time_per_iteration (Lower is Better)
>>>>
>>>> Threads# | Ratio
>>>> -----------|------
>>>> 1 thread | 0.994
>>>> 4 threads | 0.968
>>>>
>>>> The data shows it can brings 3% performance gain in multi-thread scenario.
>>>>
>>>> ---
>>>> Changes in v2:
>>>> - _int_free_check() should be put outside of USE_TCACHE.
>>>> - Link to v1:https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
>>>> ---
>>>> malloc/malloc.c | 10 ++++++++--
>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>>>> index 264f35e1a3..efb5292e9f 100644
>>>> --- a/malloc/malloc.c
>>>> +++ b/malloc/malloc.c
>>>> @@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
>>>> {
>>>> mchunkptr next;
>>>>
>>>> + if (av == NULL)
>>>> + av = arena_for_chunk (p);
>>>> +
>>>> do_check_chunk (av, p);
>>>>
>>>> if (chunk_is_mmapped (p))
>>>> @@ -3447,9 +3450,10 @@ __libc_free (void *mem)
>>>> /* Mark the chunk as belonging to the library again. */
>>>> (void)tag_region (chunk2mem (p), memsize (p));
>>>>
>>>> - ar_ptr = arena_for_chunk (p);
>>>> INTERNAL_SIZE_T size = chunksize (p);
>>>> - _int_free_check (ar_ptr, p, size);
>>>> + /* av is not needed for _int_free_check in non-DEBUG mode,
>>>> + in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
>>>> + _int_free_check (NULL, p, size);
>>> If _int_free_check doesn't need av, can you push it further down?
>>>
>> Something like this?
> The problem is _init_free is also used in other places. We can not
> make sure `av` always got from `arena_for_chunk (p)`.
>
>
> Have you looked at my change?
Your patch try to remove `av` parameters in _int_free_check:
@@ -4684,7 +4693,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
size = chunksize (p);
- _int_free_check (av, p, size);
+ _int_free_check (p, size);
And fetch `av` by `arena_for_chunk (p)`.
+static inline void +do_check_inuse_chunk_fast (mstate av, mchunkptr p)
+{ + mstate av = arena_for_chunk (p); + do_check_inuse_chunk (av, p); +}
My point is for _int_free(av, p, have_lock), if `av` is not equals to
`arena_for_chunk (p)`, then we can not simply omit it from
`_int_free_check`.
On Tue, Nov 26, 2024, 11:41 AM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> On 11/26/2024 11:23 AM, H.J. Lu wrote:
>
>
>
> On Tue, Nov 26, 2024, 10:55 AM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>>
>> On 11/25/2024 7:58 PM, H.J. Lu wrote:
>>
>> On Mon, Nov 25, 2024 at 6:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
>>
>> Arena is not needed for _int_free_check() in non-DEBUG mode. This commit
>> defers arena deference to _int_free_chunk() thus accelerate tcache path.
>> When DEBUG enabled, arena can be obtained from p in do_check_inuse_chunk().
>>
>> Result of bench-malloc-thread benchmark
>>
>> Test Platform: Xeon-8380
>> Ratio: New / Original time_per_iteration (Lower is Better)
>>
>> Threads# | Ratio
>> -----------|------
>> 1 thread | 0.994
>> 4 threads | 0.968
>>
>> The data shows it can brings 3% performance gain in multi-thread scenario.
>>
>> ---
>> Changes in v2:
>> - _int_free_check() should be put outside of USE_TCACHE.
>> - Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
>> ---
>> malloc/malloc.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>> index 264f35e1a3..efb5292e9f 100644
>> --- a/malloc/malloc.c
>> +++ b/malloc/malloc.c
>> @@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
>> {
>> mchunkptr next;
>>
>> + if (av == NULL)
>> + av = arena_for_chunk (p);
>> +
>> do_check_chunk (av, p);
>>
>> if (chunk_is_mmapped (p))
>> @@ -3447,9 +3450,10 @@ __libc_free (void *mem)
>> /* Mark the chunk as belonging to the library again. */
>> (void)tag_region (chunk2mem (p), memsize (p));
>>
>> - ar_ptr = arena_for_chunk (p);
>> INTERNAL_SIZE_T size = chunksize (p);
>> - _int_free_check (ar_ptr, p, size);
>> + /* av is not needed for _int_free_check in non-DEBUG mode,
>> + in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
>> + _int_free_check (NULL, p, size);
>>
>> If _int_free_check doesn't need av, can you push it further down?
>>
>> Something like this?
>>
>> The problem is _init_free is also used in other places. We can not make sure `av` always got from `arena_for_chunk (p)`.
>
>
> Have you looked at my change?
>
> Your patch try to remove `av` parameters in _int_free_check:
>
> @@ -4684,7 +4693,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>
> size = chunksize (p);
>
> - _int_free_check (av, p, size);
> + _int_free_check (p, size);
>
> And fetch `av` by `arena_for_chunk (p)`.
>
> +static inline void +do_check_inuse_chunk_fast (mstate av, mchunkptr p) +{ + mstate av = arena_for_chunk (p); + do_check_inuse_chunk (av, p); +}
>
> My point is for _int_free(av, p, have_lock), if `av` is not equals to `arena_for_chunk (p)`, then we can not simply omit it from `_int_free_check`.
Is inlining _int_free good enough for performance improvement?
H.J.
On 11/26/2024 2:20 PM, H.J. Lu wrote:
> On Tue, Nov 26, 2024, 11:41 AM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>> On 11/26/2024 11:23 AM, H.J. Lu wrote:
>>
>>
>>
>> On Tue, Nov 26, 2024, 10:55 AM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>>> On 11/25/2024 7:58 PM, H.J. Lu wrote:
>>>
>>> On Mon, Nov 25, 2024 at 6:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>
>>> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo <wangyang.guo@intel.com> wrote:
>>>
>>> Arena is not needed for _int_free_check() in non-DEBUG mode. This commit
>>> defers arena deference to _int_free_chunk() thus accelerate tcache path.
>>> When DEBUG enabled, arena can be obtained from p in do_check_inuse_chunk().
>>>
>>> Result of bench-malloc-thread benchmark
>>>
>>> Test Platform: Xeon-8380
>>> Ratio: New / Original time_per_iteration (Lower is Better)
>>>
>>> Threads# | Ratio
>>> -----------|------
>>> 1 thread | 0.994
>>> 4 threads | 0.968
>>>
>>> The data shows it can brings 3% performance gain in multi-thread scenario.
>>>
>>> ---
>>> Changes in v2:
>>> - _int_free_check() should be put outside of USE_TCACHE.
>>> - Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
>>> ---
>>> malloc/malloc.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>>> index 264f35e1a3..efb5292e9f 100644
>>> --- a/malloc/malloc.c
>>> +++ b/malloc/malloc.c
>>> @@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
>>> {
>>> mchunkptr next;
>>>
>>> + if (av == NULL)
>>> + av = arena_for_chunk (p);
>>> +
>>> do_check_chunk (av, p);
>>>
>>> if (chunk_is_mmapped (p))
>>> @@ -3447,9 +3450,10 @@ __libc_free (void *mem)
>>> /* Mark the chunk as belonging to the library again. */
>>> (void)tag_region (chunk2mem (p), memsize (p));
>>>
>>> - ar_ptr = arena_for_chunk (p);
>>> INTERNAL_SIZE_T size = chunksize (p);
>>> - _int_free_check (ar_ptr, p, size);
>>> + /* av is not needed for _int_free_check in non-DEBUG mode,
>>> + in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
>>> + _int_free_check (NULL, p, size);
>>>
>>> If _int_free_check doesn't need av, can you push it further down?
>>>
>>> Something like this?
>>>
>>> The problem is _init_free is also used in other places. We can not make sure `av` always got from `arena_for_chunk (p)`.
>>
>> Have you looked at my change?
>>
>> Your patch try to remove `av` parameters in _int_free_check:
>>
>> @@ -4684,7 +4693,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>>
>> size = chunksize (p);
>>
>> - _int_free_check (av, p, size);
>> + _int_free_check (p, size);
>>
>> And fetch `av` by `arena_for_chunk (p)`.
>>
>> +static inline void +do_check_inuse_chunk_fast (mstate av, mchunkptr p) +{ + mstate av = arena_for_chunk (p); + do_check_inuse_chunk (av, p); +}
>>
>> My point is for _int_free(av, p, have_lock), if `av` is not equals to `arena_for_chunk (p)`, then we can not simply omit it from `_int_free_check`.
> Is inlining _int_free good enough for performance improvement?
>
> H.J.
The major performance gain comes from inlining. I think that is good
enough and can drop this patch.
On 11/26/2024 2:27 PM, Guo, Wangyang wrote:
> On 11/26/2024 2:20 PM, H.J. Lu wrote:
>
>> On Tue, Nov 26, 2024, 11:41 AM Guo, Wangyang <wangyang.guo@intel.com>
>> wrote:
>>> On 11/26/2024 11:23 AM, H.J. Lu wrote:
>>>
>>>
>>>
>>> On Tue, Nov 26, 2024, 10:55 AM Guo, Wangyang
>>> <wangyang.guo@intel.com> wrote:
>>>> On 11/25/2024 7:58 PM, H.J. Lu wrote:
>>>>
>>>> On Mon, Nov 25, 2024 at 6:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>
>>>> On Thu, Aug 29, 2024 at 2:31 PM Wangyang Guo
>>>> <wangyang.guo@intel.com> wrote:
>>>>
>>>> Arena is not needed for _int_free_check() in non-DEBUG mode. This
>>>> commit
>>>> defers arena deference to _int_free_chunk() thus accelerate tcache
>>>> path.
>>>> When DEBUG enabled, arena can be obtained from p in
>>>> do_check_inuse_chunk().
>>>>
>>>> Result of bench-malloc-thread benchmark
>>>>
>>>> Test Platform: Xeon-8380
>>>> Ratio: New / Original time_per_iteration (Lower is Better)
>>>>
>>>> Threads# | Ratio
>>>> -----------|------
>>>> 1 thread | 0.994
>>>> 4 threads | 0.968
>>>>
>>>> The data shows it can brings 3% performance gain in multi-thread
>>>> scenario.
>>>>
>>>> ---
>>>> Changes in v2:
>>>> - _int_free_check() should be put outside of USE_TCACHE.
>>>> - Link to v1:
>>>> https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
>>>> ---
>>>> malloc/malloc.c | 10 ++++++++--
>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>>>> index 264f35e1a3..efb5292e9f 100644
>>>> --- a/malloc/malloc.c
>>>> +++ b/malloc/malloc.c
>>>> @@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
>>>> {
>>>> mchunkptr next;
>>>>
>>>> + if (av == NULL)
>>>> + av = arena_for_chunk (p);
>>>> +
>>>> do_check_chunk (av, p);
>>>>
>>>> if (chunk_is_mmapped (p))
>>>> @@ -3447,9 +3450,10 @@ __libc_free (void *mem)
>>>> /* Mark the chunk as belonging to the library again. */
>>>> (void)tag_region (chunk2mem (p), memsize (p));
>>>>
>>>> - ar_ptr = arena_for_chunk (p);
>>>> INTERNAL_SIZE_T size = chunksize (p);
>>>> - _int_free_check (ar_ptr, p, size);
>>>> + /* av is not needed for _int_free_check in non-DEBUG mode,
>>>> + in DEBUG mode, av will fetch from p in
>>>> do_check_inuse_chunk. */
>>>> + _int_free_check (NULL, p, size);
>>>>
>>>> If _int_free_check doesn't need av, can you push it further down?
>>>>
>>>> Something like this?
>>>>
>>>> The problem is _init_free is also used in other places. We can not
>>>> make sure `av` always got from `arena_for_chunk (p)`.
>>>
>>> Have you looked at my change?
>>>
>>> Your patch try to remove `av` parameters in _int_free_check:
>>>
>>> @@ -4684,7 +4693,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>>>
>>> size = chunksize (p);
>>>
>>> - _int_free_check (av, p, size);
>>> + _int_free_check (p, size);
>>>
>>> And fetch `av` by `arena_for_chunk (p)`.
>>>
>>> +static inline void +do_check_inuse_chunk_fast (mstate av, mchunkptr
>>> p) +{ + mstate av = arena_for_chunk (p); + do_check_inuse_chunk (av,
>>> p); +}
>>>
>>> My point is for _int_free(av, p, have_lock), if `av` is not equals
>>> to `arena_for_chunk (p)`, then we can not simply omit it from
>>> `_int_free_check`.
>> Is inlining _int_free good enough for performance improvement?
>>
>> H.J.
> The major performance gain comes from inlining. I think that is good
> enough and can drop this patch.
New patchset has been sent out.
@@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
{
mchunkptr next;
+ if (av == NULL)
+ av = arena_for_chunk (p);
+
do_check_chunk (av, p);
if (chunk_is_mmapped (p))
@@ -3447,9 +3450,10 @@ __libc_free (void *mem)
/* Mark the chunk as belonging to the library again. */
(void)tag_region (chunk2mem (p), memsize (p));
- ar_ptr = arena_for_chunk (p);
INTERNAL_SIZE_T size = chunksize (p);
- _int_free_check (ar_ptr, p, size);
+ /* av is not needed for _int_free_check in non-DEBUG mode,
+ in DEBUG mode, av will fetch from p in do_check_inuse_chunk. */
+ _int_free_check (NULL, p, size);
#if USE_TCACHE
if (tcache_free (p, size))
@@ -3458,6 +3462,8 @@ __libc_free (void *mem)
return;
}
#endif
+
+ ar_ptr = arena_for_chunk (p);
_int_free_chunk (ar_ptr, p, size, 0);
}