Bug 22111: Fix malloc tcache leak.

Message ID	586a9486-ff6e-300c-ddec-6625ee3db67a@redhat.com
State	Committed
Headers	Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org To: GNU C Library <libc-alpha@sourceware.org> From: Carlos O'Donell <carlos@redhat.com> Subject: [PATCH] Bug 22111: Fix malloc tcache leak. Message-ID: <586a9486-ff6e-300c-ddec-6625ee3db67a@redhat.com> Date: Tue, 26 Sep 2017 23:44:18 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit

Message ID

586a9486-ff6e-300c-ddec-6625ee3db67a@redhat.com

State

Committed

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
To: GNU C Library <libc-alpha@sourceware.org>
From: Carlos O'Donell <carlos@redhat.com>
Subject: [PATCH] Bug 22111: Fix malloc tcache leak.
Message-ID: <586a9486-ff6e-300c-ddec-6625ee3db67a@redhat.com>
Date: Tue, 26 Sep 2017 23:44:18 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.3.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Commit Message

Carlos O'Donell Sept. 27, 2017, 5:44 a.m. UTC

  commit message---
The malloc tcache added in 2.26 will leak all of the elements
remaining in the cache and the cache structure itself when a
thread exits. The defect is that we do not set tcache_shutting_down
early enough, and the thread simply recreates the tcache and
places the elements back onto a new tcache which is subsequently
lost as the thread exits (unfreed memory). The fix is relatively
simple, move the setting of tcache_shutting_down earlier in the
tcache_thread_freeres. We add a test case which uses mallinfo
and some heuristics to look for 1000x memory usage between the
start and end of a thread start/join loop. It is very reliable
at detecting that there is a leak given the number of iterations.
Without the fix the test will consume 122MiB of leaked memory.

Tested on x86_64 with no regressions.

Signed-off-by: Carlos O'Donell <carlos@redhat.com>
---

OK to checkin? In particular I'd be happy to see if there is any
way to make the test better.

Once approved I'll backport this to 2.26 stable branch.

2017-09-26  Carlos O'Donell  <carlos@redhat.com>

	[BZ #22111]
	* malloc/malloc.c (tcache_shutting_down): Use bool type.
	(tcache_thread_freeres): Set tcache_shutting_down before
	freeing the tcache.
	* malloc/Makefile (tests): Add tst-malloc-tcache-leak.
	* malloc/tst-malloc-tcache-leak.c: New file.

---

Comments

Florian Weimer Sept. 27, 2017, 8:57 a.m. UTC | #1

On 09/27/2017 07:44 AM, Carlos O'Donell wrote:
> commit message---

Usually, we use something like this as the subject line:

malloc: Fix tcache leak on thread destruction [BZ #22111]

That is, subsystem first, description of the fix, and bug reference last.

> The malloc tcache added in 2.26 will leak all of the elements
> remaining in the cache and the cache structure itself when a
> thread exits. The defect is that we do not set tcache_shutting_down
> early enough, and the thread simply recreates the tcache and
> places the elements back onto a new tcache which is subsequently
> lost as the thread exits (unfreed memory). The fix is relatively
> simple, move the setting of tcache_shutting_down earlier in the
> tcache_thread_freeres. We add a test case which uses mallinfo
> and some heuristics to look for 1000x memory usage between the
> start and end of a thread start/join loop. It is very reliable
> at detecting that there is a leak given the number of iterations.
> Without the fix the test will consume 122MiB of leaked memory.

See below, commit message may need updating.

The change itself looks good to me.

> diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
> new file mode 100644
> index 0000000..ad7ea46
> --- /dev/null
> +++ b/malloc/tst-malloc-tcache-leak.c
> @@ -0,0 +1,111 @@


> +static int
> +do_test (void)
> +{
> +  /* Allocate an arbitrary number of threads that can run concurrently
> +     and carry out one allocation from the tcahce and then exit.  We

typo: “tcahce”

> +     could do it one thread at a time to minimize memory usage, but
> +     it's faster to do 10 threads at a time, and we won't run out of
> +     memory.  */

The comment is wrong for NUMA systems at least.  Running single-threaded 
is faster, particularly if you set the CPU affinity first.

> +#define TNUM 10

TNUM should be parallel_threads or something like that, but I think it 
is unneeded.

> +  pthread_t *threads;
> +  unsigned int i, loop;
> +  struct mallinfo info_before, info_after;
> +  void *retval;
> +
> +  /* Avoid there being 0 malloc'd data at this point by allocating the
> +     pthread_t's required to run the test.  */
> +  threads = (pthread_t *) xmalloc (sizeof (pthread_t) * TNUM);

Should use xcalloc.

> +  info_before = mallinfo ();
> +
> +  assert (info_before.uordblks != 0);
> +
> +  printf ("INFO: %d (bytes) are in use before starting threads.\n",
> +          info_before.uordblks);
> +
> +  /* Again this is an arbitrary choice.  We run the 10 threads 10,000
> +     times for a total of 100,000 threads created and joined.  This
> +     gives us enough iterations to show a leak.  */
> +  for (loop = 0; loop < 10000; loop++)

If you keep the parallel execution (maybe it is faster on small systems, 
I have not tried), the outer loop count should involve TNUM.

> +    {
> +
> +      for (i = 0; i < TNUM; i++)
> +	{
> +	  threads[i] = xpthread_create (NULL, worker, NULL);
> +	}

Extraneous braces.

> +      for (i = 0; i < TNUM; i++)
> +	{
> +	  retval = xpthread_join (threads[i]);
> +	  free (retval);
> +	}
> +    }
> +
> +  info_after = mallinfo ();
> +  printf ("INFO: %d (bytes) are in use after all threads joined.\n",
> +          info_after.uordblks);
> +
> +  /* We need to compare the memory in use before and the memory in use
> +     after starting and joining 100,000 threads.  We almost always grow
> +     memory slightly, but not much.  If the growth factor is over 1000
> +     then we know we have a leak with high confidence.  Consider that if
> +     even 1-byte leaked per thread we'd have 100,000 bytes of additional
> +     memory, and in general the in-use at the start of main is quite
> +     low.  */
> +  if (info_after.uordblks > (info_before.uordblks * 1000))
> +    FAIL_EXIT1 ("Memory usage after threads is too high.\n");

I think you should use a fixed offset here, not something that 
essentially scales with the initially allocated amount of memory.

Thanks,
Florian

diff mbox

Patch

diff --git a/malloc/Makefile b/malloc/Makefile
index 50b487e..6cf78e1 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -34,6 +34,7 @@  tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
 	 tst-interpose-nothread \
 	 tst-interpose-thread \
 	 tst-alloc_buffer \
+	 tst-malloc-tcache-leak \
 
 tests-static := \
 	 tst-interpose-static-nothread \
@@ -242,3 +243,5 @@  tst-dynarray-fail-ENV = MALLOC_TRACE=$(objpfx)tst-dynarray-fail.mtrace
 $(objpfx)tst-dynarray-fail-mem.out: $(objpfx)tst-dynarray-fail.out
 	$(common-objpfx)malloc/mtrace $(objpfx)tst-dynarray-fail.mtrace > $@; \
 	$(evaluate-test)
+
+$(objpfx)tst-malloc-tcache-leak: $(shared-thread-library)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1c2a0b0..d3fcadd 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2916,7 +2916,7 @@  typedef struct tcache_perthread_struct
   tcache_entry *entries[TCACHE_MAX_BINS];
 } tcache_perthread_struct;
 
-static __thread char tcache_shutting_down = 0;
+static __thread bool tcache_shutting_down = false;
 static __thread tcache_perthread_struct *tcache = NULL;
 
 /* Caller must ensure that we know tc_idx is valid and there's room
@@ -2953,8 +2953,12 @@  tcache_thread_freeres (void)
   if (!tcache)
     return;
 
+  /* Disable the tcache and prevent it from being reinitialized.  */
   tcache = NULL;
+  tcache_shutting_down = true;
 
+  /* Free all of the entries and the tcache itself back to the arena
+     heap for coalescing.  */
   for (i = 0; i < TCACHE_MAX_BINS; ++i)
     {
       while (tcache_tmp->entries[i])
@@ -2966,8 +2970,6 @@  tcache_thread_freeres (void)
     }
 
   __libc_free (tcache_tmp);
-
-  tcache_shutting_down = 1;
 }
 text_set_element (__libc_thread_subfreeres, tcache_thread_freeres);
 
diff --git a/malloc/tst-malloc-tcache-leak.c b/malloc/tst-malloc-tcache-leak.c
new file mode 100644
index 0000000..ad7ea46
--- /dev/null
+++ b/malloc/tst-malloc-tcache-leak.c
@@ -0,0 +1,111 @@ 
+/* Bug 22111: Test that threads do not leak their per thread cache.
+   Copyright (C) 2015-2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The point of this test is to start and exit a large number of
+   threads, while at the same time looking to see if the used
+   memory grows with each round of threads run.  If the memory
+   grows above some linear bound we declare the test failed and
+   that the malloc implementation is leaking memory with each
+   thread.  This is a good indicator that the thread local cache
+   is leaking chunks.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+#include <pthread.h>
+#include <assert.h>
+
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xthread.h>
+
+void *
+worker (void *data)
+{
+  /* Allocate an arbitrary amount of memory that is known to fit into
+     the thread local cache (tcache).  If we have at least 64 bins
+     (default e.g. TCACHE_MAX_BINS) we should be able to allocate 32
+     bytes and force malloc to fill the tcache.  We have the allocated
+     memory escape back to the parent to be freed to avoid any compiler
+     optimizations.  */
+  return (void *) xmalloc (32);
+}
+
+static int
+do_test (void)
+{
+  /* Allocate an arbitrary number of threads that can run concurrently
+     and carry out one allocation from the tcahce and then exit.  We
+     could do it one thread at a time to minimize memory usage, but
+     it's faster to do 10 threads at a time, and we won't run out of
+     memory.  */
+#define TNUM 10
+  pthread_t *threads;
+  unsigned int i, loop;
+  struct mallinfo info_before, info_after;
+  void *retval;
+
+  /* Avoid there being 0 malloc'd data at this point by allocating the
+     pthread_t's required to run the test.  */
+  threads = (pthread_t *) xmalloc (sizeof (pthread_t) * TNUM);
+
+  info_before = mallinfo ();
+
+  assert (info_before.uordblks != 0);
+
+  printf ("INFO: %d (bytes) are in use before starting threads.\n",
+          info_before.uordblks);
+
+  /* Again this is an arbitrary choice.  We run the 10 threads 10,000
+     times for a total of 100,000 threads created and joined.  This
+     gives us enough iterations to show a leak.  */
+  for (loop = 0; loop < 10000; loop++)
+    {
+
+      for (i = 0; i < TNUM; i++)
+	{
+	  threads[i] = xpthread_create (NULL, worker, NULL);
+	}
+
+      for (i = 0; i < TNUM; i++)
+	{
+	  retval = xpthread_join (threads[i]);
+	  free (retval);
+	}
+    }
+
+  info_after = mallinfo ();
+  printf ("INFO: %d (bytes) are in use after all threads joined.\n",
+          info_after.uordblks);
+
+  /* We need to compare the memory in use before and the memory in use
+     after starting and joining 100,000 threads.  We almost always grow
+     memory slightly, but not much.  If the growth factor is over 1000
+     then we know we have a leak with high confidence.  Consider that if
+     even 1-byte leaked per thread we'd have 100,000 bytes of additional
+     memory, and in general the in-use at the start of main is quite
+     low.  */
+  if (info_after.uordblks > (info_before.uordblks * 1000))
+    FAIL_EXIT1 ("Memory usage after threads is too high.\n");
+
+  /* Did not detect excessive memory usage.  */
+  free (threads);
+  exit (0);
+}
+
+#include <support/test-driver.c>