Message ID | 20150306142934.GX3087@suse.de |
---|---|
State | Committed |
Headers |
Received: (qmail 122266 invoked by alias); 6 Mar 2015 14:29:44 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 122252 invoked by uid 89); 6 Mar 2015 14:29:43 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=AWL, BAYES_00, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx2.suse.de Date: Fri, 6 Mar 2015 14:29:34 +0000 From: Mel Gorman <mgorman@suse.de> To: Siddhesh Poyarekar <siddhesh@redhat.com> Cc: Mike Frysinger <vapier@gentoo.org>, Julian Taylor <jtaylor.debian@googlemail.com>, Chris Metcalf <cmetcalf@ezchip.com>, Carlos O'Donell <carlos@redhat.com>, libc-alpha@sourceware.org Subject: [PATCH] [v9] malloc: Consistently apply trim_threshold to all heaps [BZ #17195] Message-ID: <20150306142934.GX3087@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) |
Commit Message
Mel Gorman
March 6, 2015, 2:29 p.m. UTC
Trimming heaps is a balance between saving memory and the system overhead
required to update page tables and discard allocated pages. The malloc
option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
where this balance point is but it is only applied to the main arena.

For scalability reasons, glibc malloc has per-thread heaps but these are
shrunk with madvise() if there is one page free at the top of the heap.
In some circumstances this can lead to high system overhead if a thread
has a control flow like

    while (data_to_process) {
        buf = malloc(large_size);
        do_stuff();
        free(buf);
    }

For a large size, the free() will call madvise (pagetable teardown, page
free and TLB flush) every time followed immediately by a malloc (fault,
kernel page alloc, zeroing and charge accounting). The kernel overhead
can dominate such a workload.

This patch allows the user to tune when madvise gets called by applying
the trim threshold to the per-thread heaps and using similar logic to the
main arena when deciding whether to shrink. Alternatively if the dynamic
brk/mmap threshold gets adjusted then the new values will be obeyed by
the per-thread heaps.

Bug 17195 was a test case motivated by a problem encountered in scientific
applications written in Python that perform badly due to high page fault
overhead. The basic operation of such a program was posted by Julian Taylor
https://sourceware.org/ml/libc-alpha/2015-02/msg00373.html

With this patch applied, the overhead is eliminated. All numbers in this
report are in seconds and were recorded by running Julian's program 30 times.

pyarray
                          glibc             madvise
                           2.21                  v2
System  min        1.81 (  0.00%)     0.00 (100.00%)
System  mean       1.93 (  0.00%)     0.02 ( 99.20%)
System  stddev     0.06 (  0.00%)     0.01 ( 88.99%)
System  max        2.06 (  0.00%)     0.03 ( 98.54%)
Elapsed min        3.26 (  0.00%)     2.37 ( 27.30%)
Elapsed mean       3.39 (  0.00%)     2.41 ( 28.84%)
Elapsed stddev     0.14 (  0.00%)     0.02 ( 82.73%)
Elapsed max        4.05 (  0.00%)     2.47 ( 39.01%)

               glibc     madvise
                2.21          v2
User          141.86      142.28
System         57.94        0.60
Elapsed       102.02       72.66

Note that almost a minute's worth of system time is eliminated and the
program completes 28% faster on average.

To illustrate the problem without Python, this is a basic test case for
the worst case scenario where every free is a madvise followed by an alloc

/* gcc bench-free.c -lpthread -o bench-free */
#include <pthread.h>
#include <stdlib.h>

static int num = 1024;

void __attribute__((noinline,noclone)) dostuff (void *p)
{
}

void *worker (void *data)
{
  int i;

  for (i = num; i--;)
    {
      void *m = malloc (48*4096);
      dostuff (m);
      free (m);
    }

  return NULL;
}

int main()
{
  int i;
  pthread_t t;
  void *ret;
  if (pthread_create (&t, NULL, worker, NULL))
    exit (2);
  if (pthread_join (t, &ret))
    exit (3);
  return 0;
}

Before the patch, this resulted in 1024 calls to madvise. With the patch
applied, madvise is called twice because the default trim threshold is
high enough to avoid this.

This is a more complex case where there is a mix of frees. It's simply a
different worker function for the test case above

void *worker (void *data)
{
  int i;
  int j = 0;
  void *free_index[num];

  for (i = num; i--;)
    {
      void *m = malloc ((i % 58) *4096);
      dostuff (m);
      if (i % 2 == 0) {
        free (m);
      } else {
        free_index[j++] = m;
      }
    }
  /* j is one past the last stored pointer, so step back before freeing.  */
  while (j-- > 0)
    {
      free(free_index[j]);
    }

  return NULL;
}

glibc 2.21 calls madvise 90305 times but with the patch applied, it's
called 13438 times. Increasing the trim threshold will decrease the number
of times it's called with the option of eliminating the overhead.

ebizzy is meant to generate a workload resembling common web application
server workloads.
It is threaded with a large working set that at its core
has an allocation, do_stuff, free loop that also hits this case. The primary
metric of the benchmark is records processed per second. This is running on
my desktop which is a single socket machine with an I7-4770 and 8 cores.
Each thread count was run for 30 seconds. It was only run once as the
performance difference is so high that the variation is insignificant.

                glibc 2.21       patch
threads 1            10230       44114
threads 2            19153       84925
threads 4            34295      134569
threads 8            51007      183387

Note that the saving happens to be a coincidence as the size allocated
by ebizzy was less than the default threshold. If a different number of
chunks were specified then it may also be necessary to tune the threshold
to compensate.

This is roughly quadrupling the performance of this benchmark. The
difference in system CPU usage illustrates why.

ebizzy running 1 thread with glibc 2.21
10230 records/s 306904
real 30.00 s
user  7.47 s
sys  22.49 s

22.49 seconds was spent in the kernel for a workload running 30 seconds.
With the patch applied

ebizzy running 1 thread with patch applied
44126 records/s 1323792
real 30.00 s
user 29.97 s
sys   0.00 s

System CPU usage was zero with the patch applied. strace shows that glibc
running this workload calls madvise approximately 9000 times a second. With
the patch applied madvise was called twice during the workload (or 0.06
times per second).

2015-02-10  Mel Gorman  <mgorman@suse.de>

  [BZ #17195]
  * malloc/arena.c (free): Apply trim threshold to per-thread heaps
  as well as the main arena.
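
For readers who want to experiment with the tunable discussed above, the following is a minimal sketch of raising the trim threshold from inside a program. It assumes glibc's mallopt() interface from <malloc.h>; the helper name set_trim_threshold and its return convention are hypothetical and are not part of the patch.

```c
#include <limits.h>
#include <malloc.h>
#include <stddef.h>

/* Hypothetical helper (not part of the patch): ask glibc malloc to keep
   up to `bytes` of free memory at the top of a heap before trimming it.
   Returns 0 on success and -1 if the value is out of range or mallopt()
   rejects it.  */
static int
set_trim_threshold (size_t bytes)
{
  /* mallopt() takes an int, so refuse values that would not fit.  */
  if (bytes > (size_t) INT_MAX)
    return -1;

  /* mallopt() returns 1 on success and 0 on error.  */
  return mallopt (M_TRIM_THRESHOLD, (int) bytes) == 1 ? 0 : -1;
}
```

A program would typically call something like set_trim_threshold (64 * 1024 * 1024) once, early in main () and before worker threads start allocating. The same tunable can also be set without recompiling through the MALLOC_TRIM_THRESHOLD_ environment variable. Which value is appropriate depends on the allocation sizes of the workload, so treat any number here as a starting point for measurement.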
Comments
looks fine to me, but prob want sign off from someone a bit more familiar with the malloc code -mike
On Fri, Mar 06, 2015 at 02:29:34PM +0000, Mel Gorman wrote:
> Trimming heaps is a balance between saving memory and the system overhead
> required to update page tables and discard allocated pages. The malloc
> option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
> where this balance point is but it is only applied to the main arena.
>

Ping as it's been a while since anyone said anything on it. I note it's
in patchwork (http://patchwork.sourceware.org/patch/5496/) but do not
know if that means anyone plans to look at it. Thanks.
On 03/16/2015 12:13 PM, Mel Gorman wrote:
> On Fri, Mar 06, 2015 at 02:29:34PM +0000, Mel Gorman wrote:
>> Trimming heaps is a balance between saving memory and the system overhead
>> required to update page tables and discard allocated pages. The malloc
>> option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
>> where this balance point is but it is only applied to the main arena.
>>
>
> Ping as it's been a while since anyone said anything on it. I note it's
> in patchwork (http://patchwork.sourceware.org/patch/5496/) but do not
> know if that means anyone plans to look at it. Thanks.
>

I actually have a build going with your patch and I'm trying to stress test
it to see what impact it has on some real applications.

Cheers,
Carlos.
On Mon, Mar 16, 2015 at 02:15:17PM -0400, Carlos O'Donell wrote:
> On 03/16/2015 12:13 PM, Mel Gorman wrote:
> > On Fri, Mar 06, 2015 at 02:29:34PM +0000, Mel Gorman wrote:
> >> Trimming heaps is a balance between saving memory and the system overhead
> >> required to update page tables and discard allocated pages. The malloc
> >> option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
> >> where this balance point is but it is only applied to the main arena.
> >>
> >
> > Ping as it's been a while since anyone said anything on it. I note it's
> > in patchwork (http://patchwork.sourceware.org/patch/5496/) but do not
> > know if that means anyone plans to look at it. Thanks.
> >
>
> I actually have a build going with your patch and I'm trying to stress test
> it to see what impact it has on some real applications.
>

Thanks. Any luck with this?
On Fri, Mar 06, 2015 at 02:29:34PM +0000, Mel Gorman wrote:
> Trimming heaps is a balance between saving memory and the system overhead
> required to update page tables and discard allocated pages. The malloc
> option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
> where this balance point is but it is only applied to the main arena.
>

Just pinging after another week as per the guidelines. Any luck with this?

Thanks
On Tue, Mar 24, 2015 at 03:46:45PM +0000, Mel Gorman wrote:
> On Mon, Mar 16, 2015 at 02:15:17PM -0400, Carlos O'Donell wrote:
> > On 03/16/2015 12:13 PM, Mel Gorman wrote:
> > > On Fri, Mar 06, 2015 at 02:29:34PM +0000, Mel Gorman wrote:
> > >> Trimming heaps is a balance between saving memory and the system overhead
> > >> required to update page tables and discard allocated pages. The malloc
> > >> option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
> > >> where this balance point is but it is only applied to the main arena.
> > >>
> > >
> > > Ping as it's been a while since anyone said anything on it. I note it's
> > > in patchwork (http://patchwork.sourceware.org/patch/5496/) but do not
> > > know if that means anyone plans to look at it. Thanks.
> > >
> >
> > I actually have a build going with your patch and I'm trying to stress test
> > it to see what impact it has on some real applications.
> >
>
> Thanks. Any luck with this?
>

Is your testing finished?

Also in my opinion M_TRIM_THRESHOLD looks like useless tunable. A better
approach would be do trimming once per second to maximum in previous
second. But that would be separate patch.
On Wed, Apr 01, 2015 at 07:02:38PM +0200, Ondřej Bílka wrote:
> On Tue, Mar 24, 2015 at 03:46:45PM +0000, Mel Gorman wrote:
> > On Mon, Mar 16, 2015 at 02:15:17PM -0400, Carlos O'Donell wrote:
> > > On 03/16/2015 12:13 PM, Mel Gorman wrote:
> > > > On Fri, Mar 06, 2015 at 02:29:34PM +0000, Mel Gorman wrote:
> > > >> Trimming heaps is a balance between saving memory and the system overhead
> > > >> required to update page tables and discard allocated pages. The malloc
> > > >> option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
> > > >> where this balance point is but it is only applied to the main arena.
> > > >>
> > > >
> > > > Ping as it's been a while since anyone said anything on it. I note it's
> > > > in patchwork (http://patchwork.sourceware.org/patch/5496/) but do not
> > > > know if that means anyone plans to look at it. Thanks.
> > > >
> > >
> > > I actually have a build going with your patch and I'm trying to stress test
> > > it to see what impact it has on some real applications.
> > >
> >
> > Thanks. Any luck with this?
> >
> Is your testing finished?
>

Yes. The changelog already has details on the public benchmarks I used
to evaluate this.

> Also in my opinion M_TRIM_THRESHOLD looks like useless tunable. A better
> approach would be do trimming once per second to maximum in previous
> second. But that would be separate patch.

This is separate entirely. Feel free to post a patch that removes
M_TRIM_THRESHOLD and replaces it with a per-second trigger that shrinks
heap based on historical usage.
----- Original Message -----
> > > > > Ping as it's been a while since anyone said anything on it. I note
> > > > > it's
> > > > > in patchwork (http://patchwork.sourceware.org/patch/5496/) but do not
> > > > > know if that means anyone plans to look at it. Thanks.

I've tested and pushed this patch now. In future please also run the glibc
testsuite and mention in your submission what the results were.

Thanks,
Siddhesh
On Thu, Apr 02, 2015 at 02:47:13AM -0400, Siddhesh Poyarekar wrote:
> ----- Original Message -----
> > > > > > Ping as it's been a while since anyone said anything on it. I note
> > > > > > it's
> > > > > > in patchwork (http://patchwork.sourceware.org/patch/5496/) but do not
> > > > > > know if that means anyone plans to look at it. Thanks.
>
> I've tested and pushed this patch now. In future please also run the glibc
> testsuite and mention in your submission what the results were.
>

Thanks very much. FWIW, the test suite was run and there was no change in
the number of passes or failures. I'll put that in the changelog in the
future.
diff --git a/malloc/arena.c b/malloc/arena.c
index 886defb074a2..5babaed98dd1 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -658,7 +658,7 @@ heap_trim (heap_info *heap, size_t pad)
   unsigned long pagesz = GLRO (dl_pagesize);
   mchunkptr top_chunk = top (ar_ptr), p, bck, fwd;
   heap_info *prev_heap;
-  long new_size, top_size, extra, prev_size, misalign;
+  long new_size, top_size, top_area, extra, prev_size, misalign;
 
   /* Can this heap go away completely? */
   while (top_chunk == chunk_at_offset (heap, sizeof (*heap)))
@@ -694,9 +694,16 @@ heap_trim (heap_info *heap, size_t pad)
       set_head (top_chunk, new_size | PREV_INUSE);
       /*check_chunk(ar_ptr, top_chunk);*/
     }
+
+  /* Uses similar logic for per-thread arenas as the main arena with systrim
+     by preserving the top pad and at least a page.  */
   top_size = chunksize (top_chunk);
-  extra = (top_size - pad - MINSIZE - 1) & ~(pagesz - 1);
-  if (extra < (long) pagesz)
+  top_area = top_size - MINSIZE - 1;
+  if (top_area <= pad)
+    return 0;
+
+  extra = ALIGN_DOWN(top_area - pad, pagesz);
+  if ((unsigned long) extra < mp_.trim_threshold)
     return 0;
 
   /* Try to shrink.  */
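
The effect of the second hunk can be seen in isolation with a small standalone model of the shrink decision. This is only a sketch: the function names trim_extra_before and trim_extra_after, the align_down helper, and the use of plain long parameters in place of glibc's internal types and the MINSIZE macro are all assumptions made for illustration. The arithmetic follows the removed and added lines of the diff.

```c
#include <assert.h>

/* Round x down to a multiple of a, where a is a power of two.
   Stand-in for glibc's ALIGN_DOWN macro.  */
static long
align_down (long x, long a)
{
  assert (a > 0 && (a & (a - 1)) == 0);
  return x & ~(a - 1);
}

/* Model of the old check: bytes that would be released from the top of
   a per-thread heap, or 0 when less than one page is free.  */
static long
trim_extra_before (long top_size, long pad, long pagesz, long minsize)
{
  long extra = (top_size - pad - minsize - 1) & ~(pagesz - 1);
  return extra < pagesz ? 0 : extra;
}

/* Model of the patched check: nothing is released unless the free area
   beyond the pad reaches the trim threshold.  */
static long
trim_extra_after (long top_size, long pad, long pagesz, long minsize,
                  unsigned long trim_threshold)
{
  long top_area = top_size - minsize - 1;
  if (top_area <= pad)
    return 0;

  long extra = align_down (top_area - pad, pagesz);
  return (unsigned long) extra < trim_threshold ? 0 : extra;
}
```

With a 4096-byte page, no pad, an illustrative minsize of 32 and a 128 KiB threshold, a top chunk of two pages is trimmed by one page under the old check but left alone under the new one, while a 64-page top chunk is still trimmed by both.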