From: Eric Wong
To: Carlos O'Donell
Cc: libc-alpha@sourceware.org
Date: Wed, 1 Aug 2018 09:26:26 +0000
Subject: Re: [RFC/PoC] malloc: use wfcqueue to speed up remote frees
Message-ID: <20180801092626.jrwyrojfye4avcis@whir>
References: <20180731084936.g4yw6wnvt677miti@dcvr>
 <0cfdccea-d173-486c-85f4-27e285a30a1a@redhat.com>
 <20180731231819.57xsqvdfdyfxrzy5@whir>
 <20180801062352.rlrjqmsszntkzlfe@untitled>
X-Patchwork-Submitter: Eric Wong
X-Patchwork-Id: 28709

Carlos O'Donell wrote:
> On 08/01/2018 02:23 AM, Eric Wong wrote:
> > Carlos O'Donell wrote:
> >> On 07/31/2018 07:18 PM, Eric Wong wrote:
> >>> Also, if I spawn a bunch of threads and get a bunch of
> >>> arenas early in the program lifetime; and then only have few
> >>> threads later, there can be a lot of idle arenas.
> >>
> >> Yes. That is true. We don't coalesce arenas to match the thread
> >> demand.
> >
> > Eep :< If contention can be avoided (which tcache seems to
> > work well for), limiting arenas to CPU count seems desirable and
> > worth trying.
>
> Agreed.
>
> In general it is not as bad as you think.
>
> An arena is made up of a chain of heaps, each an mmap'd block, and
> if we can manage to free an entire heap then we unmap the heap,
> and if we're lucky we can manage to free down the entire arena
> (_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).
>
> So we might just end up with a large number of arenas that don't
> have very much allocated at all, but are all on the arena free list
> waiting for a thread to attach to them to reduce overall contention.
>
> I agree that it would be *better* if we had one arena per CPU and
> each thread could easily determine the CPU it was on (via a
> restartable sequence) and then allocate CPU-local memory to work
> with (the best you can do; ignoring NUMA effects).

Thanks for the info on arenas. One problem for Ruby is we get
many threads[1], and they create allocations of varying
lifetimes. All this while malloc contention is rarely a
problem in Ruby because of the global VM lock (GVL).

Even without restartable sequences, I was wondering if lfstack
(also in urcu) could even be used for sharing/distributing
arenas between threads. This would require tcache to avoid
retries on lfstack pop/push.

Much less straightforward than using wfcqueue for frees with this
patch, though :)

[1] we only had green-threads back in Ruby 1.8, and I guess many
    Rubyists got used to the idea that they could have many
    threads cheaply. Ruby 1.9+ moved to 100% native threads,
    so I'm also trying to reintroduce green threads as an option
    back into Ruby (but still keeping native threads).

> > OK, I noticed my patch fails conformance tests because
> > (despite my use of __cds_wfcq_splice_nonblocking) it references
> > poll(), despite poll() being in an impossible code path:
> >
> > __cds_wfcq_splice_nonblocking -> ___cds_wfcq_splice
> >   -> ___cds_wfcq_busy_wait -> poll
> >
> > The poll call is impossible because the `blocking' parameter is 0;
> > but I guess the linker doesn't know that?
>
> Correct.
> We can fix that easily at a later date. Don't worry about it.

Heh, a bit dirty, but #define-ing poll away seems to work :)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 40d61e45db..89e675c7a0 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -247,6 +247,11 @@
 /* For SINGLE_THREAD_P. */
 #include
 
+/* prevent wfcqueue.h from including poll.h and linking to it */
+#include
+#undef poll
+#define poll(a,b,c) assert(0 && "should not be called")
+
 #define _LGPL_SOURCE /* allows inlines */
 #include
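
For anyone who wants to try the trick outside of glibc, here is a
minimal standalone sketch of the same idea. It assumes <poll.h> has an
include guard, so including it first keeps any later header from
redeclaring poll() after the macro is defined. busy_wait_sketch() and
splice_nonblocking_sketch() are hypothetical stand-ins for the inlined
urcu helpers, not real urcu functions.

```c
#include <assert.h>
#include <stddef.h>
#include <poll.h>   /* included first so a later include hits the guard */

/* Same macro as in the patch: any later call to poll() expands to an
 * assertion, so no reference to the poll symbol is emitted. */
#undef poll
#define poll(a,b,c) assert(0 && "should not be called")

/* Hypothetical stand-in for a header-inlined busy-wait helper.
 * It only reaches poll() when the caller asks for blocking behavior. */
static inline int busy_wait_sketch(int blocking)
{
  if (!blocking)
    return 0;                   /* nonblocking callers return here */
  (void) poll(NULL, 0, 10);     /* expands to the assertion above */
  return 1;
}

/* Hypothetical stand-in for the nonblocking splice entry point:
 * it always passes blocking == 0, so the poll() path is unreachable. */
static int splice_nonblocking_sketch(void)
{
  return busy_wait_sketch(0);
}
```

Calling splice_nonblocking_sketch() returns 0 and never trips the
assertion. Whether the compiler would have dropped the dead poll() call
on its own depends on inlining and optimization level, which is
presumably why the reference showed up in the first place.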