From patchwork Fri Oct 20 14:51:47 2017
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 23731
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: DJ Delorie
CC: "libc-alpha@sourceware.org", nd
Subject: Re: [PATCH 5/5] Add single-threaded path to _int_malloc
Date: Fri, 20 Oct 2017 14:51:47 +0000
DJ Delorie wrote:

> +      if (SINGLE_THREAD_P)
>
> Would a suitable __glibc_unlikely make sense here?  I can't convince
> myself that either of these two cases would be rare enough to move to
> cold sections, but I posit the question anyway...

How do you mean?  SINGLE_THREAD_P is indeed more likely, but it's not so
skewed that it makes sense to treat the multithreaded case as cold...

> +           void *p = chunk2mem (victim);
> +           alloc_perturb (p, bytes);
> +           return p;
> +         }
>
> The overall duplication of code looks... undesirable...
> but given
> there's a macro in the middle we're changing, I don't see a good way to
> factor it out, unless the SINGLE_THREAD_P test is so cheap we can
> instead rewrite the REMOVE_FB() macro to use it instead of duplicating
> code.  Could/did you consider that?  If the benchmarks show no
> significant difference, I'd prefer cleaner code over an insignificant
> speedup.  Might even call SINGLE_THREAD_P once and store the result in a
> register, to let the compiler optimize it better.

See the new version below, which avoids the duplication.  It's slightly
slower, but not significantly so, so it seems best to do it this way for
now.  However, moving items from the fast bin to the tcache could be done
much more efficiently in the future.

> Remember, "fast at any cost" is not the goal.  "Fast while still
> maintainable" is ;-)

However, those typically go together: removing needless complexity
results in performance gains.  The code should be restructured to use
more inline function calls, and we could use the ifunc mechanism to
decide between different variants - including, for example, for the
single-thread optimization.

Wilco

ChangeLog:
2017-10-11  Wilco Dijkstra

	* malloc/malloc.c (_int_malloc): Add SINGLE_THREAD_P path.

diff --git a/malloc/malloc.c b/malloc/malloc.c
index ca5cfff3a1b1882ae608219fdec973b7f13cbb21..74eb961baba44aa6f11d6bb5e6408cfaaa7c3c47 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3569,37 +3569,50 @@ _int_malloc (mstate av, size_t bytes)
     {
       idx = fastbin_index (nb);
       mfastbinptr *fb = &fastbin (av, idx);
-      mchunkptr pp = *fb;
-      REMOVE_FB (fb, victim, pp);
-      if (victim != 0)
-        {
-          if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
-            malloc_printerr ("malloc(): memory corruption (fast)");
-          check_remalloced_chunk (av, victim, nb);
-#if USE_TCACHE
-          /* While we're here, if we see other chunks of the same size,
-             stash them in the tcache.  */
-          size_t tc_idx = csize2tidx (nb);
-          if (tcache && tc_idx < mp_.tcache_bins)
-            {
-              mchunkptr tc_victim;
+      mchunkptr pp;
+      victim = *fb;

-              /* While bin not empty and tcache not full, copy chunks over.  */
-              while (tcache->counts[tc_idx] < mp_.tcache_count
-                     && (pp = *fb) != NULL)
+      if (victim != NULL)
+        {
+          if (SINGLE_THREAD_P)
+            *fb = victim->fd;
+          else
+            REMOVE_FB (fb, pp, victim);
+          if (__glibc_likely (victim != NULL))
+            {
+              size_t victim_idx = fastbin_index (chunksize (victim));
+              if (__builtin_expect (victim_idx != idx, 0))
+                malloc_printerr ("malloc(): memory corruption (fast)");
+              check_remalloced_chunk (av, victim, nb);
+#if USE_TCACHE
+              /* While we're here, if we see other chunks of the same size,
+                 stash them in the tcache.  */
+              size_t tc_idx = csize2tidx (nb);
+              if (tcache && tc_idx < mp_.tcache_bins)
                 {
-                  REMOVE_FB (fb, tc_victim, pp);
-                  if (tc_victim != 0)
+                  mchunkptr tc_victim;
+
+                  /* While bin not empty and tcache not full, copy chunks.  */
+                  while (tcache->counts[tc_idx] < mp_.tcache_count
+                         && (tc_victim = *fb) != NULL)
                     {
+                      if (SINGLE_THREAD_P)
+                        *fb = tc_victim->fd;
+                      else
+                        {
+                          REMOVE_FB (fb, pp, tc_victim);
+                          if (__glibc_unlikely (tc_victim == NULL))
+                            break;
+                        }
                       tcache_put (tc_victim, tc_idx);
-                    }
+                    }
                 }
-            }
 #endif
-          void *p = chunk2mem (victim);
-          alloc_perturb (p, bytes);
-          return p;
-        }
+              void *p = chunk2mem (victim);
+              alloc_perturb (p, bytes);
+              return p;
+            }
+        }
     }

 /*