From patchwork Thu Sep 28 15:28:14 2017
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 23197
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Delivered-To: mailing list libc-alpha@sourceware.org
From: Wilco Dijkstra
To: "libc-alpha@sourceware.org", "dj@redhat.com"
CC: nd
Subject: [PATCH][malloc] Speedup fastbin paths
Date: Thu, 28 Sep 2017 15:28:14 +0000
This patch speeds up the fastbin paths.  When an application is
single-threaded, we can execute a simpler, faster fastbin path.  Doing this
check at a high level means a single test can bypass multiple locks, atomic
instructions and the related extra complexity.  The speedup for fastbin
allocations on AArch64 is about 2.4 times, and the number of load/store
exclusive instructions drops to zero in single-threaded scenarios.

Also inline tcache_get and tcache_put for a 16% performance gain on the
tcache fast paths.

Bench-malloc-thread speeds up by an extra ~5% on top of the previous patch
for the single-threaded case and ~2.5% for the multi-threaded case (in
total ~9% for 1 thread and ~6% for 32 threads with the have_fastchunk
improvement).

Passes GLIBC tests. OK to commit?
ChangeLog:
2017-09-28  Wilco Dijkstra

	* malloc/malloc.c (sysdep-cancel.h): Add include.
	(tcache_put): Inline.
	(tcache_get): Inline.
	(__libc_malloc): Add SINGLE_THREAD_P path.
	(_int_malloc): Likewise.
	(_int_free): Likewise.

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 082c2b927727bff441cf48744265628d0bc40add..88cfd25766eba6787faeb7195d95b73d7a4637ab 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -243,6 +243,10 @@
 #include
 
+/* For SINGLE_THREAD_P.  */
+#include <sysdep-cancel.h>
+
+
 /*
   Debugging:
@@ -2909,7 +2913,7 @@ static __thread tcache_perthread_struct *tcache = NULL;
 
 /* Caller must ensure that we know tc_idx is valid and there's room
    for more chunks.  */
-static void
+static inline void
 tcache_put (mchunkptr chunk, size_t tc_idx)
 {
   tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
@@ -2921,7 +2925,7 @@ tcache_put (mchunkptr chunk, size_t tc_idx)
 
 /* Caller must ensure that we know tc_idx is valid and there's
    available chunks to remove.  */
-static void *
+static inline void *
 tcache_get (size_t tc_idx)
 {
   tcache_entry *e = tcache->entries[tc_idx];
@@ -3030,24 +3034,34 @@ __libc_malloc (size_t bytes)
   DIAG_POP_NEEDS_COMMENT;
 #endif
 
-  arena_get (ar_ptr, bytes);
-
-  victim = _int_malloc (ar_ptr, bytes);
-  /* Retry with another arena only if we were able to find a usable arena
-     before.  */
-  if (!victim && ar_ptr != NULL)
+  if (SINGLE_THREAD_P)
     {
-      LIBC_PROBE (memory_malloc_retry, 1, bytes);
-      ar_ptr = arena_get_retry (ar_ptr, bytes);
-      victim = _int_malloc (ar_ptr, bytes);
+      victim = _int_malloc (&main_arena, bytes);
+      assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
+	      &main_arena == arena_for_chunk (mem2chunk (victim)));
+      return victim;
     }
+  else
+    {
+      arena_get (ar_ptr, bytes);
 
-  if (ar_ptr != NULL)
-    __libc_lock_unlock (ar_ptr->mutex);
+      victim = _int_malloc (ar_ptr, bytes);
+      /* Retry with another arena only if we were able to find a usable arena
+	 before.  */
+      if (!victim && ar_ptr != NULL)
+	{
+	  LIBC_PROBE (memory_malloc_retry, 1, bytes);
+	  ar_ptr = arena_get_retry (ar_ptr, bytes);
+	  victim = _int_malloc (ar_ptr, bytes);
+	}
+
+      if (ar_ptr != NULL)
+	__libc_lock_unlock (ar_ptr->mutex);
 
-  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
-	  ar_ptr == arena_for_chunk (mem2chunk (victim)));
-  return victim;
+      assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
+	      ar_ptr == arena_for_chunk (mem2chunk (victim)));
+      return victim;
+    }
 }
 libc_hidden_def (__libc_malloc)
@@ -3526,39 +3540,81 @@ _int_malloc (mstate av, size_t bytes)
 
   if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
     {
-      idx = fastbin_index (nb);
-      mfastbinptr *fb = &fastbin (av, idx);
-      mchunkptr pp = *fb;
-      REMOVE_FB (fb, victim, pp);
-      if (victim != 0)
-	{
-	  if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
-	    malloc_printerr ("malloc(): memory corruption (fast)");
-	  check_remalloced_chunk (av, victim, nb);
-#if USE_TCACHE
-	  /* While we're here, if we see other chunks of the same size,
-	     stash them in the tcache.  */
-	  size_t tc_idx = csize2tidx (nb);
-	  if (tcache && tc_idx < mp_.tcache_bins)
+      if (SINGLE_THREAD_P)
+	{
+	  idx = fastbin_index (nb);
+	  mfastbinptr *fb = &fastbin (av, idx);
+	  mchunkptr next;
+	  victim = *fb;
+	  if (victim != NULL)
 	    {
-	      mchunkptr tc_victim;
-
-	      /* While bin not empty and tcache not full, copy chunks over.  */
-	      while (tcache->counts[tc_idx] < mp_.tcache_count
-		     && (pp = *fb) != NULL)
+	      size_t victim_idx = fastbin_index (chunksize (victim));
+	      next = victim->fd;
+	      if (__builtin_expect (victim_idx != idx, 0))
+		malloc_printerr ("malloc(): memory corruption (fast)");
+	      check_remalloced_chunk (av, victim, nb);
+#if USE_TCACHE
+	      /* While we're here, if we see other chunks of the same size,
+		 stash them in the tcache.  */
+	      size_t tc_idx = csize2tidx (nb);
+	      if (next != NULL && tcache && tc_idx < mp_.tcache_bins)
 		{
-		  REMOVE_FB (fb, tc_victim, pp);
-		  if (tc_victim != 0)
+		  mchunkptr tc_victim;
+
+		  /* While bin not empty and tcache not full, copy chunks.  */
+		  while (tcache->counts[tc_idx] < mp_.tcache_count)
 		    {
+		      tc_victim = next;
+		      next = tc_victim->fd;
 		      tcache_put (tc_victim, tc_idx);
-		    }
+		      if (next == NULL)
+			break;
+		    }
 		}
-	    }
 #endif
-	  void *p = chunk2mem (victim);
-	  alloc_perturb (p, bytes);
-	  return p;
+	      *fb = next;
+	      void *p = chunk2mem (victim);
+	      alloc_perturb (p, bytes);
+	      return p;
+	    }
 	}
+      else
+	{
+	  idx = fastbin_index (nb);
+	  mfastbinptr *fb = &fastbin (av, idx);
+	  mchunkptr pp = *fb;
+	  REMOVE_FB (fb, victim, pp);
+	  if (victim != 0)
+	    {
+	      size_t victim_idx = fastbin_index (chunksize (victim));
+	      if (__builtin_expect (victim_idx != idx, 0))
+		malloc_printerr ("malloc(): memory corruption (fast)");
+	      check_remalloced_chunk (av, victim, nb);
+#if USE_TCACHE
+	      /* While we're here, if we see other chunks of the same size,
+		 stash them in the tcache.  */
+	      size_t tc_idx = csize2tidx (nb);
+	      if (tcache && tc_idx < mp_.tcache_bins)
+		{
+		  mchunkptr tc_victim;
+
+		  /* While bin not empty and tcache not full, copy chunks.  */
+		  while (tcache->counts[tc_idx] < mp_.tcache_count
+			 && (pp = *fb) != NULL)
+		    {
+		      REMOVE_FB (fb, tc_victim, pp);
+		      if (tc_victim != 0)
+			{
+			  tcache_put (tc_victim, tc_idx);
+			}
+		    }
+		}
+#endif
+	      void *p = chunk2mem (victim);
+	      alloc_perturb (p, bytes);
+	      return p;
+	    }
+	}
     }
 
   /*
@@ -4158,24 +4214,36 @@ _int_free (mstate av, mchunkptr p, int have_lock)
 
     /* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
     mchunkptr old = *fb, old2;
     unsigned int old_idx = ~0u;
-    do
+
+    if (SINGLE_THREAD_P && !have_lock)
       {
-	/* Check that the top of the bin is not the record we are going to add
-	   (i.e., double free).  */
 	if (__builtin_expect (old == p, 0))
 	  malloc_printerr ("double free or corruption (fasttop)");
-	/* Check that size of fastbin chunk at the top is the same as
-	   size of the chunk that we are adding.  We can dereference OLD
-	   only if we have the lock, otherwise it might have already been
-	   deallocated.  See use of OLD_IDX below for the actual check.  */
-	if (have_lock && old != NULL)
-	  old_idx = fastbin_index(chunksize(old));
-	p->fd = old2 = old;
+	p->fd = old;
+	*fb = p;
       }
-    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);
+    else
+      {
+	do
+	  {
+	    /* Check that the top of the bin is not the record we are going to
+	       add (i.e., double free).  */
+	    if (__builtin_expect (old == p, 0))
+	      malloc_printerr ("double free or corruption (fasttop)");
+	    /* Check that size of fastbin chunk at the top is the same as
+	       size of the chunk that we are adding.  We can dereference OLD
+	       only if we have the lock, otherwise it might have already been
+	       deallocated.  See use of OLD_IDX below for the actual check.  */
+	    if (have_lock && old != NULL)
+	      old_idx = fastbin_index (chunksize (old));
+	    p->fd = old2 = old;
+	  }
+	while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
+	       != old2);
 
-    if (have_lock && old != NULL && __builtin_expect (old_idx != idx, 0))
-      malloc_printerr ("invalid fastbin entry (free)");
+	if (have_lock && old != NULL && __builtin_expect (old_idx != idx, 0))
+	  malloc_printerr ("invalid fastbin entry (free)");
+      }
   }
 
 /*