Message ID | 20150210202703.GC21275@suse.de
---|---
State | Superseded
Headers |
Date: Tue, 10 Feb 2015 20:27:03 +0000
From: Mel Gorman <mgorman@suse.de>
To: Carlos O'Donell <carlos@redhat.com>
Cc: Rik van Riel <riel@redhat.com>, KOSAKI Motohiro <kosaki.motohiro@gmail.com>, Konstantin Serebryany <kcc@google.com>, Minchan Kim <minchan.kim@gmail.com>, libc-alpha@sourceware.org
Subject: [PATCH] [v2] malloc: Consistently apply trim_threshold to all heaps
Message-ID: <20150210202703.GC21275@suse.de>
In-Reply-To: <54DA5599.2070804@redhat.com>
References: <20150209140608.GD2395@suse.de> <54D91E06.7060603@redhat.com> <20150209224947.GA21275@suse.de> <54DA25D0.3050501@redhat.com> <20150210165414.GF2395@suse.de> <54DA5599.2070804@redhat.com>
Commit Message
Mel Gorman
Feb. 10, 2015, 8:27 p.m. UTC
Trimming heaps is a balance between saving memory and the system overhead
required to update page tables and discard allocated pages. The malloc
option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
where this balance point is, but it is only applied to the main arena.

For scalability reasons, glibc malloc has per-thread heaps, but these are
shrunk with madvise() if there is one page free at the top of the heap.
In some circumstances this can lead to high system overhead if a thread
has a control flow like

	while (data_to_process) {
		buf = malloc(large_size);
		do_stuff();
		free(buf);
	}

For a large size, the free() will call madvise (pagetable teardown, page
free and TLB flush) every time, followed immediately by a malloc (fault,
kernel page alloc, zeroing and charge accounting). The kernel overhead
can dominate such a workload.

This patch allows the user to tune when madvise gets called by applying
the trim threshold to the per-thread heaps.

This is a basic test case for the worst-case scenario, where every free
is a madvise followed by an alloc:

	/* gcc bench-free.c -lpthread -o bench-free */
	#include <pthread.h>
	#include <stdlib.h>

	static int num = 1024;

	void __attribute__((noinline,noclone)) dostuff (void *p)
	{
	}

	void *worker (void *data)
	{
	  int i;

	  for (i = num; i--;)
	    {
	      void *m = malloc (48*4096);
	      dostuff (m);
	      free (m);
	    }

	  return NULL;
	}

	int main()
	{
	  pthread_t t;
	  void *ret;
	  if (pthread_create (&t, NULL, worker, NULL))
	    exit (2);
	  if (pthread_join (t, &ret))
	    exit (3);
	  return 0;
	}

Before the patch, this resulted in 1024 calls to madvise. With the patch
applied, madvise is called twice because the default trim threshold is
high enough to avoid the problem.

This is a more complex case where there is a mix of mallocs and frees.
It's simply a different worker function for the test case above:

	void *worker (void *data)
	{
	  int i;
	  int j = 0;
	  void *free_index[num];

	  for (i = num; i--;)
	    {
	      void *m = malloc ((i % 58) * 4096);
	      dostuff (m);
	      if (i % 2 == 0)
		free (m);
	      else
		free_index[j++] = m;
	    }
	  for (j--; j >= 0; j--)
	    free (free_index[j]);

	  return NULL;
	}

glibc 2.21 calls madvise 90305 times, but with the patch applied it's
called 13438 times. Increasing the trim threshold will decrease the
number of times it's called, if a user ever detected that madvise
overhead and refaulting was a problem.

ebizzy is meant to generate a workload resembling common web application
server workloads. It is threaded with a large working set that at its
core has an allocation, do_stuff, free loop that also hits this case.
The primary metric of the benchmark is records processed per second.
This is running on my desktop, which is a single-socket machine with an
i7-4770 and 8 cores. Each thread count was run for 30 seconds. It was
only run once, as the performance difference is so high that the
variation is insignificant.

	           glibc 2.21    patch
	threads 1       10230    44114
	threads 2       19153    84925
	threads 4       34295   134569
	threads 8       51007   183387

Note that the saving happens to be a coincidence, as the size allocated
by ebizzy was less than the default threshold. If a different number of
chunks were specified then it may also be necessary to tune the
threshold to compensate.

By default on my machine, the patch roughly quadruples the performance
of this benchmark. The difference in system CPU usage illustrates why.
ebizzy running 1 thread with glibc 2.21:

	10230 records/s 306904
	real 30.00 s
	user  7.47 s
	sys  22.49 s

22.49 seconds were spent in the kernel for a workload running 30
seconds. With the patch applied:

	ebizzy running 1 thread with patch applied
	44126 records/s 1323792
	real 30.00 s
	user 29.97 s
	sys   0.00 s

System CPU usage was zero with the patch applied.
strace shows that glibc running this workload calls madvise
approximately 9000 times a second. With the patch applied, madvise was
called twice during the workload (or 0.06 times per second).

	* malloc/arena.c (free): Apply trim threshold to per-thread heaps
	as well as the main arena.
---
 ChangeLog      | 5 +++++
 malloc/arena.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)
Comments
On Tue, Feb 10, 2015 at 08:27:03PM +0000, Mel Gorman wrote:
> Trimming heaps is a balance between saving memory and the system overhead
> required to update page tables and discard allocated pages. The malloc
> option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
> where this balance point is but it is only applied to the main arena.
>
> For scalability reasons, glibc malloc has per-thread heaps but these are
> shrunk with madvise() if there is one page free at the top of the heap.
> In some circumstances this can lead to high system overhead if a thread
> has a control flow like
>
> 	while (data_to_process) {
> 		buf = malloc(large_size);
> 		do_stuff();
> 		free(buf);
> 	}
>
> For a large size, the free() will call madvise (pagetable teardown, page
> free and TLB flush) every time followed immediately by a malloc (fault,
> kernel page alloc, zeroing and charge accounting). The kernel overhead
> can dominate such a workload.
>
> This patch allows the user to tune when madvise gets called by applying
> the trim threshold to the per-thread heaps.

Thanks for fixing this.  I consider this a bug, since the complementary
tunable MALLOC_MMAP_THRESHOLD_ does get applied to non-main arenas.

> diff --git a/ChangeLog b/ChangeLog
> index dc1ed1ba1249..b860b2fe1850 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,8 @@
> +2015-02-10  Mel Gorman  <mgorman@suse.de>
> +
> +	* malloc/arena.c (free): Apply trim threshold to per-thread heaps
> +	as well as the main arena.
> +
>  2015-02-06  Carlos O'Donell  <carlos@systemhalted.org>

This should not be in a diff since it won't always apply to the
current tree cleanly.  Please file a bug for this (since it is an
unintended inconsistency) and I'll push it for you.

Thanks,
Siddhesh
On Wed, Feb 18, 2015 at 10:58:03AM +0530, Siddhesh Poyarekar wrote:
> This should not be in a diff since it won't always apply to the
> current tree cleanly.  Please file a bug for this (since it is an
> unintended inconsistency) and I'll push it for you.

And I just noticed that you posted a v3 with a quoted bug report.  The
only additional nit there is that the ChangeLog entry should mention
the BZ #.  I'll push this.

Siddhesh
On Wed, Feb 18, 2015 at 11:19:49AM +0530, Siddhesh Poyarekar wrote:
> On Wed, Feb 18, 2015 at 10:58:03AM +0530, Siddhesh Poyarekar wrote:
> > This should not be in a diff since it won't always apply to the
> > current tree cleanly.  Please file a bug for this (since it is an
> > unintended inconsistency) and I'll push it for you.
>
> And I just noticed that you posted a v3 with a quoted bug report.  The
> only additional nit there is that the ChangeLog entry should mention
> the BZ #.  I'll push this.

Thanks for pointing out the hazards with the patch. The ChangeLog entry
was included in the diff because it was added manually; I thought that
was required. I don't think it needs another bug to fix up, as it was a
misunderstanding on my part. There will be a v4 shortly that hopefully
conforms to the patch rules.
diff --git a/ChangeLog b/ChangeLog
index dc1ed1ba1249..b860b2fe1850 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2015-02-10  Mel Gorman  <mgorman@suse.de>
+
+	* malloc/arena.c (free): Apply trim threshold to per-thread heaps
+	as well as the main arena.
+
 2015-02-06  Carlos O'Donell  <carlos@systemhalted.org>
 
 	* version.h (RELEASE): Set to "stable".
diff --git a/malloc/arena.c b/malloc/arena.c
index 886defb074a2..a78d4835a825 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -696,7 +696,7 @@ heap_trim (heap_info *heap, size_t pad)
     }
   top_size = chunksize (top_chunk);
   extra = (top_size - pad - MINSIZE - 1) & ~(pagesz - 1);
-  if (extra < (long) pagesz)
+  if (extra < (long) mp_.trim_threshold)
     return 0;
   /* Try to shrink.  */