[RFC/PoC] malloc: use wfcqueue to speed up remote frees

Carlos O'Donell <carlos@redhat.com> wrote:
> On 08/01/2018 02:23 AM, Eric Wong wrote:
> > Carlos O'Donell <carlos@redhat.com> wrote:
> >> On 07/31/2018 07:18 PM, Eric Wong wrote:
> >>> Also, if I spawn a bunch of threads and get a bunch of
> >>> arenas early in the program lifetime; and then only have few
> >>> threads later, there can be a lot of idle arenas.
> >>  
> >> Yes. That is true. We don't coalesce arenas to match the thread
> >> demand.
> > 
> > Eep :<  If contention can be avoided (which tcache seems to
> > work well for), limiting arenas to CPU count seems desirable and
> > worth trying.
> 
> Agreed.
> 
> In general it is not as bad as you think.
> 
> An arena is made up of a chain of heaps, each an mmap'd block, and
> if we can manage to free an entire heap then we unmap the heap,
> and if we're lucky we can manage to free down the entire arena
> (_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).
> 
> So we might just end up with a large number of arenas that don't
> have very much allocated at all, but are all on the arena free list
> waiting for a thread to attach to them to reduce overall contention.
> 
> I agree that it would be *better* if we had one arena per CPU and
> each thread could easily determine the CPU it was on (via a
> restartable sequence) and then allocate CPU-local memory to work
> with (the best you can do; ignoring NUMA effects).
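
To make the one-arena-per-CPU idea concrete, here is a minimal sketch,
assuming a fixed-size arena table indexed by CPU number.  The helper
names are hypothetical and nothing here is glibc's actual arena code.
It uses sched_getcpu() rather than a restartable sequence, so the
thread may migrate right after the lookup; the index is only a hint.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: map a CPU number to a slot in an arena table. */
static unsigned arena_index_for_cpu(int cpu, unsigned n_arenas)
{
    assert(n_arenas > 0);
    if (cpu < 0)            /* sched_getcpu() failed: fall back to slot 0 */
        return 0;
    return (unsigned) cpu % n_arenas;
}

/* Hypothetical helper: pick the slot for the CPU we are running on now.
 * Without a restartable sequence the thread can be moved to another CPU
 * immediately after this returns, so the arena still needs its own
 * synchronization. */
static unsigned current_arena_index(unsigned n_arenas)
{
    return arena_index_for_cpu(sched_getcpu(), n_arenas);
}
```

With n_arenas set to the CPU count, each CPU maps to its own slot; a
smaller table simply wraps around.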

Thanks for the info on arenas.  One problem for Ruby is that we
get many threads[1], and they create allocations of varying
lifetimes.  That happens even though malloc contention is rarely
a problem in Ruby because of the global VM lock (GVL).

Even without restartable sequences, I was wondering if lfstack
(also in urcu) could even be used for sharing/distributing
arenas between threads.  This would require tcache to avoid
retries on lfstack pop/push.

Much less straightforward than using wfcqueue for frees with
this patch, though :)
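
For illustration, a lock-free stack of idle arenas could look roughly
like the sketch below.  It is written with plain C11 atomics and is
not urcu's lfstack API; struct arena_node, arena_push and arena_pop
are hypothetical names.  The CAS loops are where the retries mentioned
above come from, which is why a per-thread cache that keeps most
operations away from the shared stack would matter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical arena descriptor; not glibc's struct malloc_state. */
struct arena_node {
    struct arena_node *next;
    int id;
};

struct arena_stack {
    _Atomic(struct arena_node *) head;
};

static void arena_stack_init(struct arena_stack *s)
{
    atomic_init(&s->head, NULL);
}

/* Push an idle arena; the loop retries only if another thread
 * changed head between our load and the compare-and-swap. */
static void arena_push(struct arena_stack *s, struct arena_node *n)
{
    assert(n != NULL);
    struct arena_node *old =
        atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Pop an idle arena, or return NULL if the stack is empty.
 * Caveat: this naive pop is exposed to the ABA problem if nodes can be
 * popped and pushed back concurrently; a real implementation needs
 * extra protection for poppers (not shown here). */
static struct arena_node *arena_pop(struct arena_stack *s)
{
    struct arena_node *old =
        atomic_load_explicit(&s->head, memory_order_acquire);
    while (old != NULL &&
           !atomic_compare_exchange_weak_explicit(
                 &s->head, &old, old->next,
                 memory_order_acquire, memory_order_acquire))
        ;   /* on failure, old was reloaded with the current head */
    return old;
}
```

A thread that exits would push its arena, and a new thread would try
arena_pop() before creating a fresh arena.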

[1] we only had green-threads back in Ruby 1.8, and I guess many
    Rubyists got used to the idea that they could have many
    threads cheaply.  Ruby 1.9+ moved to 100% native threads,
    so I'm also trying to reintroduce green threads as an option
    back into Ruby (but still keeping native threads)

> > OK, I noticed my patch fails conformance tests because
> > (despite my use of __cds_wfcq_splice_nonblocking) it references
> > poll(), despite poll() being in an impossible code path:
> > 
> >    __cds_wfcq_splice_nonblocking -> ___cds_wfcq_splice
> > 	   -> ___cds_wfcq_busy_wait -> poll
> > 
> > The poll call is impossible because the `blocking' parameter is 0;
> > but I guess the linker doesn't know that?
> 
> Correct. We can fix that easily at a later date. Don't worry about it.

Heh, a bit dirty, but #define-ing poll away seems to work :)
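
Roughly, the trick looks like the sketch below.  This is not the exact
code from the patch: busy_wait() is a hypothetical stand-in for the
header-inlined ___cds_wfcq_busy_wait, and the retry threshold is made
up.  The point is that once poll is a macro, the inlined helper no
longer emits a reference to the poll symbol.

```c
#include <assert.h>
#include <poll.h>   /* include first, so the macro below cannot mangle
                       the real prototype if a later header pulls it in */

/* Replace every later textual use of poll(); the blocking path must
 * never run, so trap it in debug builds instead of calling libc. */
#define poll(fds, nfds, timeout) \
    (assert(!"poll() must be unreachable"), 0)

/* Hypothetical stand-in for an inlined helper that only polls when
 * the caller asked for blocking behaviour. */
static inline int busy_wait(int *attempt, int blocking)
{
    if (!blocking)
        return 1;                   /* tell the caller to retry later */
    if (++(*attempt) >= 10) {       /* made-up retry threshold */
        (void) poll(NULL, 0, 10);   /* expands to the macro above */
        *attempt = 0;
    }
    return 0;
}

#undef poll   /* keep the override limited to the code above */
```

In the real patch the override would have to sit before the #include
of the wfcqueue header, not around a local function as shown here.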
