malloc: Trim unused arenas on thread exit

Message ID fa4afa75-6165-9655-fc59-21f7d5170c07@redhat.com
State Not applicable

Commit Message

Florian Weimer Nov. 9, 2017, 10:55 a.m. UTC
  I tried the attached patch to trim unused arenas on thread exit.  The 
trimming actually happens (the heap consolidation is visible in the 
malloc_info output from tst-malloc_info), but the arena heaps aren't 
deallocated.

I think trimming unused arenas as much as possible is a good heuristic 
to minimize RSS, so getting this to work might be worthwhile.

Thanks,
Florian
  

Comments

Siddhesh Poyarekar Nov. 10, 2017, 10:52 a.m. UTC | #1
On Thursday 09 November 2017 04:25 PM, Florian Weimer wrote:
> I tried the attached patch to trim unused arenas on thread exit.  The
> trimming actually happens (the heap consolidation is visible in the
> malloc_info output from tst-malloc_info), but the arena heaps aren't
> deallocated.
> 
> I think trimming unused arenas as much as possible is a good heuristic
> to minimize RSS, so getting this to work might be worthwhile.

I wonder if the overhead of unmapping the arena heaps is worthwhile.
For cases where it matters (overcommit is a ratio or is disabled or for
__libc_enable_secure), trimming of the heaps should reduce the commit
charge already since we do remap the pages as PROT_NONE.
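
The PROT_NONE remapping mentioned above can be sketched in isolation as follows. This is not glibc's heap code, only an illustration of the technique under the assumption of a Linux-style mmap; reserve_region, commit_prefix and decommit_tail are hypothetical names. The address range stays reserved throughout, but the tail pages lose their contents and no longer count against the commit charge.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t
page_size (void)
{
  return (size_t) sysconf (_SC_PAGESIZE);
}

/* Reserve SIZE bytes of address space without making them usable.  */
static void *
reserve_region (size_t size)
{
  void *p = mmap (NULL, size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Make the first LEN bytes of the region readable and writable.  */
static int
commit_prefix (void *base, size_t len)
{
  return mprotect (base, len, PROT_READ | PROT_WRITE);
}

/* Give back [BASE + KEEP, BASE + OLD_LEN) by mapping fresh PROT_NONE
   pages over it.  KEEP must be page-aligned.  Returns 0 on success.  */
static int
decommit_tail (void *base, size_t keep, size_t old_len)
{
  assert (keep % page_size () == 0 && keep <= old_len);
  void *r = mmap ((char *) base + keep, old_len - keep, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  return r == MAP_FAILED ? -1 : 0;
}
```

Unmapping the whole region with munmap would additionally release the address range, which is the extra step whose cost is being questioned here.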

The patch seems OK, I'm just wondering if the additional work is worth
the effort because it will also hurt applications that spawn threads
frequently and have similar resource usage; they'll miss out on the
caching effect.

Siddhesh

  
Florian Weimer Nov. 10, 2017, 1:13 p.m. UTC | #2
On 11/10/2017 11:52 AM, Siddhesh Poyarekar wrote:
> On Thursday 09 November 2017 04:25 PM, Florian Weimer wrote:
>> I tried the attached patch to trim unused arenas on thread exit.  The
>> trimming actually happens (the heap consolidation is visible in the
>> malloc_info output from tst-malloc_info), but the arena heaps aren't
>> deallocated.
>>
>> I think trimming unused arenas as much as possible is a good heuristic
>> to minimize RSS, so getting this to work might be worthwhile.
> 
> I wonder if the overhead of unmapping the arena heaps is worthwhile.
> For cases where it matters (overcommit is a ratio or is disabled or for
> __libc_enable_secure), trimming of the heaps should reduce the commit
> charge already since we do remap the pages as PROT_NONE.

I would have expected that the test makes the second sub-heap completely 
unused, so that the existing logic would unmap it.  But that does not 
seem to happen.

> The patch seems OK, I'm just wondering if the additional work is worth
> the effort because it will also hurt applications that spawn threads
> frequently and have similar resource usage; they'll miss out on the
> caching effect.

We could do this not for the current arena, but for the arena at the top 
of the free list.  Then we'd retain some caching, and you'd get a 
ping-pong effect only if you stop and start more than one thread.

The consolidation we should still perform on the current arena, maybe 
even if it is not unused.  I assume that before the thread exits, it 
deallocates some resources, and we should make sure that malloc sees 
them once the arena is reused, even from a thread which has a vastly 
different allocation pattern.
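
One possible reading of the free-list idea above, as a toy model: when an arena becomes unused, trim the arena that was previously at the head of the free list and leave the newly released one untouched, so that a single thread stopping and restarting finds its arena still warm. The names toy_arena, toy_trim and toy_thread_exit are hypothetical, locking is omitted, and "trimming" is reduced to zeroing a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an arena; not glibc's malloc_state.  */
struct toy_arena
{
  struct toy_arena *next_free;
  size_t cached_bytes;      /* Memory held but not in use.  */
  int attached_threads;
};

static struct toy_arena *free_list;

/* Stand-in for mtrim: pretend all cached memory is returned.  */
static void
toy_trim (struct toy_arena *a)
{
  a->cached_bytes = 0;
}

/* Detach the exiting thread from A.  If A is now unused, push it on
   the free list and trim the previous head instead of A itself.  */
static void
toy_thread_exit (struct toy_arena *a)
{
  assert (a->attached_threads > 0);
  if (--a->attached_threads == 0)
    {
      struct toy_arena *old_head = free_list;
      a->next_free = old_head;
      free_list = a;
      if (old_head != NULL)
        toy_trim (old_head);
    }
}
```

In this model the most recently released arena keeps its cache, and an arena is trimmed only once a second arena is released on top of it.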

Thanks,
Florian
  

Patch

diff --git a/malloc/arena.c b/malloc/arena.c
index 85b985e193..758226c222 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -953,12 +953,22 @@  arena_thread_freeres (void)
       /* If this was the last attached thread for this arena, put the
 	 arena on the free list.  */
       assert (a->attached_threads > 0);
-      if (--a->attached_threads == 0)
+      bool arena_is_unused = --a->attached_threads == 0;
+      if (arena_is_unused)
 	{
 	  a->next_free = free_list;
 	  free_list = a;
 	}
       __libc_lock_unlock (free_list_lock);
+
+      /* If there are no more users, compact the arena as much as
+	 possible.  */
+      if (arena_is_unused)
+	{
+	  __libc_lock_lock (a->mutex);
+	  mtrim (a, 0);
+	  __libc_lock_unlock (a->mutex);
+	}
     }
 }
 text_set_element (__libc_thread_subfreeres, arena_thread_freeres);
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1f003d2ef0..a0b11784d2 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1831,7 +1831,7 @@  malloc_init_state (mstate av)
 static void *sysmalloc (INTERNAL_SIZE_T, mstate);
 static int      systrim (size_t, mstate);
 static void     malloc_consolidate (mstate);
-
+static int mtrim (mstate av, size_t pad);
 
 /* -------------- Early definitions for debugging hooks ---------------- */