From patchwork Mon Jan 14 05:01:43 2019
X-Patchwork-Submitter: Kemi Wang
X-Patchwork-Id: 31050
From: Kemi Wang
To: Carlos, Siddhesh Poyarekar, Glibc alpha
Cc: Kemi Wang
Subject: [PATCH v3 1/2] Mutex: Accelerate lock acquisition by queuing spinner
Date: Mon, 14 Jan 2019 13:01:43 +0800
Message-Id: <1547442104-24364-1-git-send-email-kemi.wang@intel.com>

An adaptive mutex spins for a while before calling into the kernel to
block.  Thus, the lock of an adaptive mutex can be acquired immediately,
while spinning, or after being woken up.

Currently, the spin-waiting algorithm of the adaptive mutex is for each
processor to repeatedly execute a test_and_set instruction until either the
maximum spin count is reached or the lock is acquired.  However, the
performance of lock acquisition via spinning degrades significantly as the
number of spinning processors increases.  At least two factors cause this
degradation[1].  First, in order to release the lock, the lock holder has
to contend with the spinning processors for exclusive access to the lock
cache line (e.g. the "lock; decl %0" in lll_unlock() in
sysdeps/unix/sysv/linux/x86_64/lowlevellock.h, used by
pthread_mutex_unlock()).  On most multiprocessor architectures, it has to
wait behind the test_and_set instructions issued by the spinning
processors.  Furthermore, on invalidation-based cache coherency systems,
each test_and_set instruction triggers a "read-for-ownership" request for
exclusive access to the lock cache line, which potentially slows down the
lock holder's accesses to other locations (at least those sharing the cache
line with the lock).

In this patch, we propose another spin-waiting algorithm that accelerates
lock acquisition by queuing spinners, based on the MCS[2] lock.  With the
MCS algorithm, only the head of the queue is allowed to spin on the mutex
lock; all other spinners spin on locally-accessible flag variables.  Thus,
the negative factors above are eliminated.

The implementation of this MCS-based spin-waiting algorithm requires an
additional pointer to hold the tail of the queue.  To keep the size of the
mutex data structure constant and leave the user-space ABI unchanged, the
__list field, which is originally used to implement robust futexes, is
reused.  Therefore, this patch proposes a new mutex type with the GNU
extension PTHREAD_MUTEX_QUEUESPINNER_NP.

The ABI compatibility test passes ("make check-abi").
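To make the queuing discipline concrete, here is a minimal standalone
sketch of an MCS-style lock written with C11 atomics.  It mirrors the
approach of nptl/mcs_lock.c introduced below, but it is only an
illustration: the names (mcs_node, mcs_acquire, mcs_release) are invented
for the sketch, and the real code uses glibc's internal atomic_* wrappers,
atomic_spin_nop () and a differently embedded per-thread node.

/* Minimal MCS-style queued spinlock sketch using C11 atomics.  Illustrative
   only; see nptl/mcs_lock.c in the diff for the actual implementation.  */
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node
{
  struct mcs_node *_Atomic next;
  atomic_int locked;
};

/* One queue node per thread; a lock is just a tail pointer to the last
   enqueued node (NULL when uncontended).  */
static _Thread_local struct mcs_node node;

static void
mcs_acquire (struct mcs_node *_Atomic *tail)
{
  atomic_store_explicit (&node.next, NULL, memory_order_relaxed);
  atomic_store_explicit (&node.locked, 0, memory_order_relaxed);

  /* Atomically append ourselves and learn who was queued before us.  */
  struct mcs_node *prev
    = atomic_exchange_explicit (tail, &node, memory_order_acq_rel);
  if (prev == NULL)
    return;                     /* Queue was empty: lock acquired at once.  */

  /* Link ourselves behind the previous tail, then spin on our own flag
     only, not on a shared lock word.  */
  atomic_store_explicit (&prev->next, &node, memory_order_release);
  while (!atomic_load_explicit (&node.locked, memory_order_acquire))
    ;                           /* A pause/yield hint would go here.  */
}

static void
mcs_release (struct mcs_node *_Atomic *tail)
{
  struct mcs_node *next
    = atomic_load_explicit (&node.next, memory_order_acquire);
  if (next == NULL)
    {
      /* No visible successor: if we are still the tail, empty the queue.  */
      struct mcs_node *expected = &node;
      if (atomic_compare_exchange_strong_explicit (tail, &expected,
                                                   (struct mcs_node *) NULL,
                                                   memory_order_acq_rel,
                                                   memory_order_relaxed))
        return;
      /* A successor is between its exchange and its link; wait for it.  */
      while ((next = atomic_load_explicit (&node.next,
                                           memory_order_acquire)) == NULL)
        ;
    }
  /* Hand the right to spin on the mutex word to the next waiter.  */
  atomic_store_explicit (&next->locked, 1, memory_order_release);
}

In the patch, a mutex of the new type stores such a tail pointer in the
reused __list field, so only the queue head keeps issuing test_and_set on
the mutex word while every other waiter spins locally.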
The benchmark is available at branch mutex_testing in
https://github.com/kemicoco/tst-spinlock.git.  The tunable
pthread.mutex_spin_count is set to 10000, which is big enough in our
testing for a fair comparison.  The first test case emulates a practical
workload with mathematical calculation.  The second test case, provided by
our customer, emulates the lock contention in a distributed file system.

Each workload runs with multiple threads in parallel; each thread repeats
the following loop for 5 seconds and reports the total number of iterations
(a simplified sketch of this loop follows the references below):
  a) acquire the lock;
  b) do some work in the critical section;
  c) unlock;
  d) wait a random time (0~3000 TSC cycles).

Testing on a 2-socket Skylake platform (112 cores with 62G RAM, HT=on).

TC1: Hashwork
Thread num    adaptive mutex    mcs mutex
      1          7297792       7298755  (+0.01%)
      2          9332105       9752901  (+4.51%)
      3         10428251      11029771  (+5.7%)
      4         10572596      11203997  (+5.9%)
      5         10496815      11008433  (+4.8%)
      6         10292946      10569467  (+2.6%)
      7          9861111      10153538  (+2.97%)
     14          5845303       9756283  (+66.91%)
     28          4299209       8985135  (+109.0%)
     56          3298821       5747645  (+74.23%)
    112          2941309       5629700  (+91.4%)
    448          2821056       3716799  (+31.75%)

TC2: Test_and_set instruction on shared variables
Thread num    adaptive mutex    mcs mutex
      1          7748765       7751963  (+0.04%)
      2          8521596       9251649  (+8.57%)
      3          9594653       9890211  (+3.08%)
      4          9934361       9800205  (-1.35%)
      5          8146007       9597982  (+17.82%)
      6          6896436       9367882  (+35.84%)
      7          5943880       9159893  (+54.11%)
     14          4194305       8623029  (+105.59%)
     28          3374218       7818631  (+131.72%)
     56          2533912       4836622  (+90.88%)
    112          2541950       4850938  (+90.84%)
    448          2507000       5345149  (+113.21%)

Test result on a workstation (16 cores with 32G RAM, HT=on).

TC1: Hashwork
Thread num    adaptive mutex    mcs mutex
      1         11419169      11430369  (+0.1%)
      2         15364452      15873667  (+3.31%)
      3         17234014      17547329  (+1.82%)
      4         17711736      17613548  (-0.55%)
      5         16583392      17626707  (+6.29%)
      6         14855586      17305468  (+16.49%)
      7         12948851      17130807  (+32.3%)
     14          8698172      15322237  (+76.15%)
     16          8123930      14937645  (+83.87%)
     64          7946006       5685132  (-28.45%)

TC2: Test_and_set instruction on shared variables
Thread num    adaptive mutex    mcs mutex
      1         12535156      12595629  (+0.48%)
      2         15665576      15929652  (+1.69%)
      3         17469331      16881225  (-3.37%)
      4         14833035      15777572  (+6.37%)
      5         12376033      15256528  (+23.27%)
      6         10568899      14693661  (+39.03%)
      7          9657775      14486039  (+49.99%)
     14          8015061      14112508  (+76.07%)
     16          7641725      13935473  (+82.36%)
     64          7571112       7735482  (+2.17%)

Potential issues:
a) As preemption cannot be disabled in user space while spinning, an
   MCS-style lock risks collapsing lock performance when CPUs are heavily
   oversubscribed.  But generally, the MCS-based spin-waiting algorithm
   performs much better than the existing one.  We may consider mitigating
   this issue in the future by using a cancellable MCS lock to prevent
   unnecessary active waiting.

Reference:
[1] "The performance of spin lock alternatives for shared-memory
    multiprocessors"
    https://www.cc.gatech.edu/classes/AY2009/cs4210_fall/papers/anderson-spinlock.pdf
[2] "Algorithms for scalable synchronization on shared-memory
    multiprocessors"
    http://www.cs.rochester.edu/~scott/papers/1991_TOCS_synch.pdf
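As noted above, here is a simplified sketch of the per-thread measurement
loop.  It is not the code from the tst-spinlock repository: the thread
count, the trivial "work", the rand_r-based delay and the 5-second
wall-clock cutoff are illustrative stand-ins, and
PTHREAD_MUTEX_QUEUESPINNER_NP only exists once this patch is applied.

/* Simplified per-thread measurement loop: a) lock, b) work, c) unlock,
   d) short random delay, repeated for 5 seconds; the metric is the total
   iteration count.  Requires this patch for PTHREAD_MUTEX_QUEUESPINNER_NP.  */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

static pthread_mutex_t lock;
static volatile uint64_t shared;        /* Touched inside the critical section.  */

static uint64_t
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

static void *
worker (void *arg)
{
  unsigned int seed = (unsigned int) (uintptr_t) arg;
  uint64_t iters = 0;
  uint64_t end = now_ns () + 5ull * 1000000000ull;

  while (now_ns () < end)
    {
      pthread_mutex_lock (&lock);       /* a) acquire the lock.  */
      shared++;                         /* b) work in the critical section.  */
      pthread_mutex_unlock (&lock);     /* c) unlock.  */

      /* d) wait a random short time (stand-in for 0~3000 TSC cycles).  */
      for (int i = rand_r (&seed) % 3000; i > 0; i--)
        __asm__ __volatile__ ("" ::: "memory");

      iters++;
    }
  return (void *) (uintptr_t) iters;
}

int
main (void)
{
  enum { NTHREADS = 4 };                /* Swept from 1 to 448 in the tables.  */
  pthread_t t[NTHREADS];
  pthread_mutexattr_t attr;
  uint64_t total = 0;

  pthread_mutexattr_init (&attr);
  /* Select the queue-spinner mutex added by this patch.  */
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_QUEUESPINNER_NP);
  pthread_mutex_init (&lock, &attr);

  for (int i = 0; i < NTHREADS; i++)
    pthread_create (&t[i], NULL, worker, (void *) (uintptr_t) (i + 1));
  for (int i = 0; i < NTHREADS; i++)
    {
      void *res;
      pthread_join (t[i], &res);
      total += (uint64_t) (uintptr_t) res;
    }
  return total == 0;                    /* Report the total in practice.  */
}

Running the same loop with PTHREAD_MUTEX_ADAPTIVE_NP instead gives the
"adaptive mutex" column reported in the tables above.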
* NEWS: New entry.
* nptl/Makefile (libpthread-routines): Add mcs_lock.
* nptl/mcs_lock.c: New file.
* nptl/mcs_lock.h: New file.
* nptl/allocatestack.c (__data.__list): Convert to __data.__list.__list_t.
* nptl/descr.h (__data.__list): Likewise.
* nptl/nptl-init.c (__data.__list): Likewise.
* nptl/pthreadP.h: Extend the mutex type mask.
* nptl/pthread_mutex_init.c: Add robust futex check.
* nptl/pthread_mutex_lock.c: Implement the new mutex type.
* nptl/pthread_mutex_timedlock.c: Likewise.
* nptl/pthread_mutex_trylock.c: Likewise.
* nptl/pthread_mutex_unlock.c: Likewise.
* nptl/pthread_mutexattr_settype.c: Accept the new mutex type.
* sysdeps/nptl/bits/thread-shared-types.h: Redefine struct
  __pthread_internal_list and define mcs_lock_t.
* sysdeps/nptl/pthread.h: Add the new mutex type.
* sysdeps/unix/sysv/linux/hppa/pthread.h: Likewise.

Possible todo list:
a) Add mutex performance test cases to benchtests
b) Update the mutex description in nptl/thread.texi
c) Update nptl/printers.py

V2->V3 (mostly addressing Siddhesh Poyarekar's comments):
a) Add ChangeLog entry
b) Add a detailed description of the synchronization usage in mcs_lock.c
c) Correct formatting of the mcs_lock ()/mcs_unlock () definitions
d) Only keep the necessary info in NEWS
e) Remove a superfluous barrier in mcs_lock.c
f) Remove one line of superfluous code (node.locked = 1)
g) Use atomic_load_acquire in mcs_lock () instead of atomic_load_relaxed to
   synchronize with the atomic_store_release in mcs_unlock ()
h) Add PI/PP support for the new mutex type with the GNU extension
   PTHREAD_MUTEX_QUEUESPINNER_NP

V1->V2:
a) Add a one-line description before the copyright notice in the new files
b) Add a NEWS entry introducing the new mutex type with the GNU
   PTHREAD_MUTEX_QUEUESPINNER_NP extension

Signed-off-by: Kemi Wang
---
 NEWS                                    |  3 ++
 nptl/Makefile                           |  3 +-
 nptl/allocatestack.c                    |  2 +-
 nptl/descr.h                            | 26 +++++-----
 nptl/mcs_lock.c                         | 87 +++++++++++++++++++++++++++++++++
 nptl/mcs_lock.h                         | 23 +++++++++
 nptl/nptl-init.c                        |  2 +-
 nptl/pthreadP.h                         |  6 ++-
 nptl/pthread_mutex_init.c               |  5 ++
 nptl/pthread_mutex_lock.c               | 36 +++++++++++++-
 nptl/pthread_mutex_timedlock.c          | 33 +++++++++++--
 nptl/pthread_mutex_trylock.c            |  7 ++-
 nptl/pthread_mutex_unlock.c             |  9 +++-
 nptl/pthread_mutexattr_settype.c        |  2 +-
 sysdeps/nptl/bits/thread-shared-types.h | 21 ++++++--
 sysdeps/nptl/pthread.h                  | 15 ++++--
 sysdeps/unix/sysv/linux/hppa/pthread.h  |  4 ++
 17 files changed, 247 insertions(+), 37 deletions(-)
 create mode 100644 nptl/mcs_lock.c
 create mode 100644 nptl/mcs_lock.h

diff --git a/NEWS b/NEWS
index ae80818..0521b56 100644
--- a/NEWS
+++ b/NEWS
@@ -46,6 +46,9 @@ Major new features:
   incosistent mutex state after fork call in multithread environment.  In
   both popen and system there is no direct access to user-defined mutexes.
 
+* The mutex with PTHREAD_MUTEX_QUEUESPINNER_NP GNU extension has been added,
+  which accelerates lock acquisition by queuing spinners.
+ Deprecated and removed features, and other changes affecting compatibility: * The glibc.tune tunable namespace has been renamed to glibc.cpu and the diff --git a/nptl/Makefile b/nptl/Makefile index 34ae830..0de8df3 100644 --- a/nptl/Makefile +++ b/nptl/Makefile @@ -145,7 +145,8 @@ libpthread-routines = nptl-init nptlfreeres vars events version pt-interp \ mtx_destroy mtx_init mtx_lock mtx_timedlock \ mtx_trylock mtx_unlock call_once cnd_broadcast \ cnd_destroy cnd_init cnd_signal cnd_timedwait cnd_wait \ - tss_create tss_delete tss_get tss_set pthread_mutex_conf + tss_create tss_delete tss_get tss_set pthread_mutex_conf \ + mcs_lock # pthread_setuid pthread_seteuid pthread_setreuid \ # pthread_setresuid \ # pthread_setgid pthread_setegid pthread_setregid \ diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c index 04e3f08..9f47129 100644 --- a/nptl/allocatestack.c +++ b/nptl/allocatestack.c @@ -749,7 +749,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, might have happened in the kernel. */ pd->robust_head.futex_offset = (offsetof (pthread_mutex_t, __data.__lock) - offsetof (pthread_mutex_t, - __data.__list.__next)); + __data.__list.__list_t.__next)); pd->robust_head.list_op_pending = NULL; #if __PTHREAD_MUTEX_HAVE_PREV pd->robust_prev = &pd->robust_head; diff --git a/nptl/descr.h b/nptl/descr.h index 9c01e1b..e47c7cf 100644 --- a/nptl/descr.h +++ b/nptl/descr.h @@ -184,38 +184,38 @@ struct pthread FIXME We should use relaxed MO atomic operations here and signal fences because this kind of concurrency is similar to synchronizing with a signal handler. */ -# define QUEUE_PTR_ADJUST (offsetof (__pthread_list_t, __next)) +# define QUEUE_PTR_ADJUST (offsetof (__pthread_list_t, __list_t.__next)) # define ENQUEUE_MUTEX_BOTH(mutex, val) \ do { \ __pthread_list_t *next = (__pthread_list_t *) \ ((((uintptr_t) THREAD_GETMEM (THREAD_SELF, robust_head.list)) & ~1ul) \ - QUEUE_PTR_ADJUST); \ - next->__prev = (void *) &mutex->__data.__list.__next; \ - mutex->__data.__list.__next = THREAD_GETMEM (THREAD_SELF, \ + next->__list_t.__prev = (void *) &mutex->__data.__list.__list_t.__next; \ + mutex->__data.__list.__list_t.__next = THREAD_GETMEM (THREAD_SELF, \ robust_head.list); \ - mutex->__data.__list.__prev = (void *) &THREAD_SELF->robust_head; \ + mutex->__data.__list.__list_t.__prev = (void *) &THREAD_SELF->robust_head; \ /* Ensure that the new list entry is ready before we insert it. */ \ __asm ("" ::: "memory"); \ THREAD_SETMEM (THREAD_SELF, robust_head.list, \ - (void *) (((uintptr_t) &mutex->__data.__list.__next) \ + (void *) (((uintptr_t) &mutex->__data.__list.__list_t.__next) \ | val)); \ } while (0) # define DEQUEUE_MUTEX(mutex) \ do { \ __pthread_list_t *next = (__pthread_list_t *) \ - ((char *) (((uintptr_t) mutex->__data.__list.__next) & ~1ul) \ + ((char *) (((uintptr_t) mutex->__data.__list.__list_t.__next) & ~1ul) \ - QUEUE_PTR_ADJUST); \ - next->__prev = mutex->__data.__list.__prev; \ + next->__list_t.__prev = mutex->__data.__list.__list_t.__prev; \ __pthread_list_t *prev = (__pthread_list_t *) \ - ((char *) (((uintptr_t) mutex->__data.__list.__prev) & ~1ul) \ + ((char *) (((uintptr_t) mutex->__data.__list.__list_t.__prev) & ~1ul) \ - QUEUE_PTR_ADJUST); \ - prev->__next = mutex->__data.__list.__next; \ + prev->__list_t.__next = mutex->__data.__list.__list_t.__next; \ /* Ensure that we remove the entry from the list before we change the \ __next pointer of the entry, which is read by the kernel. 
*/ \ __asm ("" ::: "memory"); \ - mutex->__data.__list.__prev = NULL; \ - mutex->__data.__list.__next = NULL; \ + mutex->__data.__list.__list_t.__prev = NULL; \ + mutex->__data.__list.__list_t.__next = NULL; \ } while (0) #else union @@ -226,7 +226,7 @@ struct pthread # define ENQUEUE_MUTEX_BOTH(mutex, val) \ do { \ - mutex->__data.__list.__next \ + mutex->__data.__list.__list_t.__next \ = THREAD_GETMEM (THREAD_SELF, robust_list.__next); \ /* Ensure that the new list entry is ready before we insert it. */ \ __asm ("" ::: "memory"); \ @@ -253,7 +253,7 @@ struct pthread /* Ensure that we remove the entry from the list before we change the \ __next pointer of the entry, which is read by the kernel. */ \ __asm ("" ::: "memory"); \ - mutex->__data.__list.__next = NULL; \ + mutex->__data.__list.__list_t.__next = NULL; \ } \ } while (0) #endif diff --git a/nptl/mcs_lock.c b/nptl/mcs_lock.c new file mode 100644 index 0000000..835fbce --- /dev/null +++ b/nptl/mcs_lock.c @@ -0,0 +1,87 @@ +/* MCS-style lock for queuing spinners + Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "pthreadP.h" +#include + +static __thread mcs_lock_t node = { + NULL, + 0, +}; + +void +mcs_lock (mcs_lock_t **lock) +{ + mcs_lock_t *prev; + + /* Initialize node. */ + node.next = NULL; + node.locked = 0; + + /* Move the tail pointer of the queue (lock) to point to current TLS node + * atomically, and return the previous tail pointer of the queue. This + * atomic_exchange_acquire ensures the strict FIFO order in the queue when + * multiple spinners compete to enqueue. */ + prev = atomic_exchange_acquire(lock, &node); + + /* No active spinner, mcs lock is acquired immediately. */ + if (prev == NULL) + return; + + /* Add current spinner into the queue. */ + atomic_store_relaxed (&prev->next, &node); + /* Waiting until mcs lock holder is passed down from the previous spinner. + * Using atomic_load_acquire to synchronizes-with the atomic_store_release + * in mcs_unlock, and ensure that prior critical sections happen-before + * this critical section. */ + while (!atomic_load_acquire (&node.locked)) + atomic_spin_nop (); +} + +void +mcs_unlock (mcs_lock_t **lock) +{ + mcs_lock_t *next = node.next; + + /* If the next pointer is not NULL, it indicates that we are not the last + * spinner in the queue, thus we have to pass down mcs lock holder to the + * next spinner before exiting. If the next pointer is NULL, there are + * two possible cases, a) we are the last one in the queue, thus we reset + * the tail pointer of the queue to NULL, b) the tail pointer of the + * queue has moved to point to a new spinner, but this new spinner has + * not been added to the queue. thus we have to wait until it is added to + * the queue. */ + if (next == NULL) + { + /* a) Use a CAS instruction to set lock to NULL if the tail pointer + * points to current node. 
*/ + if (atomic_compare_and_exchange_val_acq(lock, NULL, &node) == &node) + return; + + /* b) When we reaches here, we hit case b) aforementioned because + * there is a time window between updating the tail pointer of queue + * and adding a new spinner to the queue in mcs_lock (). Using + * relax MO until we observe the new spinner is added to the queue. */ + while (! (next = atomic_load_relaxed (&node.next))) + atomic_spin_nop (); + } + + /* Pass down mcs lock holder to next spinner, synchronize-with + * atomic_load_acquire in mcs_lock. */ + atomic_store_release (&next->locked, 1); +} diff --git a/nptl/mcs_lock.h b/nptl/mcs_lock.h new file mode 100644 index 0000000..5ee32f5 --- /dev/null +++ b/nptl/mcs_lock.h @@ -0,0 +1,23 @@ +/* MCS-style lock for queuing spinners + Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +extern void mcs_lock (mcs_lock_t **lock) + __attribute__ ((visibility ("hidden"))); + +extern void mcs_unlock (mcs_lock_t **lock) + __attribute__ ((visibility ("hidden"))); diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c index 20ff3fd..d20481a 100644 --- a/nptl/nptl-init.c +++ b/nptl/nptl-init.c @@ -289,7 +289,7 @@ __pthread_initialize_minimal_internal (void) #ifdef __NR_set_robust_list pd->robust_head.futex_offset = (offsetof (pthread_mutex_t, __data.__lock) - offsetof (pthread_mutex_t, - __data.__list.__next)); + __data.__list.__list_t.__next)); INTERNAL_SYSCALL_DECL (err); int res = INTERNAL_SYSCALL (set_robust_list, err, 2, &pd->robust_head, sizeof (struct robust_list_head)); diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h index 7f16ba9..7f5d8d2 100644 --- a/nptl/pthreadP.h +++ b/nptl/pthreadP.h @@ -67,7 +67,7 @@ static inline short max_adaptive_count (void) /* Internal mutex type value. 
*/ enum { - PTHREAD_MUTEX_KIND_MASK_NP = 3, + PTHREAD_MUTEX_KIND_MASK_NP = 7, PTHREAD_MUTEX_ELISION_NP = 256, PTHREAD_MUTEX_NO_ELISION_NP = 512, @@ -88,6 +88,8 @@ enum = PTHREAD_MUTEX_PRIO_INHERIT_NP | PTHREAD_MUTEX_ERRORCHECK_NP, PTHREAD_MUTEX_PI_ADAPTIVE_NP = PTHREAD_MUTEX_PRIO_INHERIT_NP | PTHREAD_MUTEX_ADAPTIVE_NP, + PTHREAD_MUTEX_PI_QUEUESPINNER_NP + = PTHREAD_MUTEX_PRIO_INHERIT_NP | PTHREAD_MUTEX_QUEUESPINNER_NP, PTHREAD_MUTEX_PI_ROBUST_NORMAL_NP = PTHREAD_MUTEX_PRIO_INHERIT_NP | PTHREAD_MUTEX_ROBUST_NORMAL_NP, PTHREAD_MUTEX_PI_ROBUST_RECURSIVE_NP @@ -105,6 +107,8 @@ enum = PTHREAD_MUTEX_PRIO_PROTECT_NP | PTHREAD_MUTEX_ERRORCHECK_NP, PTHREAD_MUTEX_PP_ADAPTIVE_NP = PTHREAD_MUTEX_PRIO_PROTECT_NP | PTHREAD_MUTEX_ADAPTIVE_NP, + PTHREAD_MUTEX_PP_QUEUESPINNER_NP + = PTHREAD_MUTEX_PRIO_PROTECT_NP | PTHREAD_MUTEX_QUEUESPINNER_NP, PTHREAD_MUTEX_ELISION_FLAGS_NP = PTHREAD_MUTEX_ELISION_NP | PTHREAD_MUTEX_NO_ELISION_NP, diff --git a/nptl/pthread_mutex_init.c b/nptl/pthread_mutex_init.c index 5cf290c..99f1707 100644 --- a/nptl/pthread_mutex_init.c +++ b/nptl/pthread_mutex_init.c @@ -111,6 +111,11 @@ __pthread_mutex_init (pthread_mutex_t *mutex, return ENOTSUP; #endif + /* Robust mutex does not support the PTHREAD_MUTEX_QUEUESPINNER_NP + GNU extension. */ + if ((imutexattr->mutexkind & PTHREAD_MUTEX_QUEUESPINNER_NP) != 0) + return ENOTSUP; + mutex_kind |= PTHREAD_MUTEX_ROBUST_NORMAL_NP; } diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c index 474b4df..db1b6d1 100644 --- a/nptl/pthread_mutex_lock.c +++ b/nptl/pthread_mutex_lock.c @@ -26,6 +26,7 @@ #include #include #include +#include #ifndef lll_lock_elision #define lll_lock_elision(lock, try_lock, private) ({ \ @@ -118,6 +119,35 @@ __pthread_mutex_lock (pthread_mutex_t *mutex) mutex->__data.__count = 1; } else if (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex) + == PTHREAD_MUTEX_QUEUESPINNER_NP, 1)) + { + if (! __is_smp) + goto simple; + + if (LLL_MUTEX_TRYLOCK (mutex) != 0) + { + int cnt = 0; + int max_cnt = MIN (max_adaptive_count (), + mutex->__data.__spins * 2 + 10); + int val = 0; + + mcs_lock ((mcs_lock_t **)&mutex->__data.__list.mcs_lock); + + do + { + atomic_spin_nop (); + val = atomic_load_relaxed (&mutex->__data.__lock); + } + while (val != 0 && ++cnt < max_cnt); + + mcs_unlock ((mcs_lock_t **)&mutex->__data.__list.mcs_lock); + LLL_MUTEX_LOCK (mutex); + + mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8; + } + assert (mutex->__data.__owner == 0); + } + else if (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex) == PTHREAD_MUTEX_ADAPTIVE_NP, 1)) { if (! __is_smp) @@ -179,7 +209,7 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex) case PTHREAD_MUTEX_ROBUST_NORMAL_NP: case PTHREAD_MUTEX_ROBUST_ADAPTIVE_NP: THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, - &mutex->__data.__list.__next); + &mutex->__data.__list.__list_t.__next); /* We need to set op_pending before starting the operation. Also see comments at ENQUEUE_MUTEX. */ __asm ("" ::: "memory"); @@ -347,6 +377,7 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex) case PTHREAD_MUTEX_PI_ERRORCHECK_NP: case PTHREAD_MUTEX_PI_NORMAL_NP: case PTHREAD_MUTEX_PI_ADAPTIVE_NP: + case PTHREAD_MUTEX_PI_QUEUESPINNER_NP: case PTHREAD_MUTEX_PI_ROBUST_RECURSIVE_NP: case PTHREAD_MUTEX_PI_ROBUST_ERRORCHECK_NP: case PTHREAD_MUTEX_PI_ROBUST_NORMAL_NP: @@ -365,7 +396,7 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex) { /* Note: robust PI futexes are signaled by setting bit 0. 
*/ THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, - (void *) (((uintptr_t) &mutex->__data.__list.__next) + (void *) (((uintptr_t) &mutex->__data.__list.__list_t.__next) | 1)); /* We need to set op_pending before starting the operation. Also see comments at ENQUEUE_MUTEX. */ @@ -509,6 +540,7 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex) case PTHREAD_MUTEX_PP_ERRORCHECK_NP: case PTHREAD_MUTEX_PP_NORMAL_NP: case PTHREAD_MUTEX_PP_ADAPTIVE_NP: + case PTHREAD_MUTEX_PP_QUEUESPINNER_NP: { /* See concurrency notes regarding __kind in struct __pthread_mutex_s in sysdeps/nptl/bits/thread-shared-types.h. */ diff --git a/nptl/pthread_mutex_timedlock.c b/nptl/pthread_mutex_timedlock.c index 453b824..75bd734 100644 --- a/nptl/pthread_mutex_timedlock.c +++ b/nptl/pthread_mutex_timedlock.c @@ -25,6 +25,7 @@ #include #include #include +#include #include @@ -135,13 +136,37 @@ __pthread_mutex_timedlock (pthread_mutex_t *mutex, mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8; } break; - + case PTHREAD_MUTEX_QUEUESPINNER_NP: + if (! __is_smp) + goto simple; + + if (lll_trylock (mutex) != 0) + { + int cnt = 0; + int max_cnt = MIN (max_adaptive_count (), + mutex->__data.__spins * 2 + 10); + int val = 0; + + mcs_lock ((mcs_lock_t **)&mutex->__data.__list.mcs_lock); + do + { + atomic_spin_nop (); + val = atomic_load_relaxed (&mutex->__data.__lock); + } + while (val != 0 && ++cnt < max_cnt); + mcs_unlock ((mcs_lock_t **)&mutex->__data.__list.mcs_lock); + result = lll_timedlock(mutex->__data.__lock, abstime, + PTHREAD_MUTEX_PSHARED (mutex)); + + mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8; + } + break; case PTHREAD_MUTEX_ROBUST_RECURSIVE_NP: case PTHREAD_MUTEX_ROBUST_ERRORCHECK_NP: case PTHREAD_MUTEX_ROBUST_NORMAL_NP: case PTHREAD_MUTEX_ROBUST_ADAPTIVE_NP: THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, - &mutex->__data.__list.__next); + &mutex->__data.__list.__list_t.__next); /* We need to set op_pending before starting the operation. Also see comments at ENQUEUE_MUTEX. */ __asm ("" ::: "memory"); @@ -335,6 +360,7 @@ __pthread_mutex_timedlock (pthread_mutex_t *mutex, case PTHREAD_MUTEX_PI_ERRORCHECK_NP: case PTHREAD_MUTEX_PI_NORMAL_NP: case PTHREAD_MUTEX_PI_ADAPTIVE_NP: + case PTHREAD_MUTEX_PI_QUEUESPINNER_NP: case PTHREAD_MUTEX_PI_ROBUST_RECURSIVE_NP: case PTHREAD_MUTEX_PI_ROBUST_ERRORCHECK_NP: case PTHREAD_MUTEX_PI_ROBUST_NORMAL_NP: @@ -353,7 +379,7 @@ __pthread_mutex_timedlock (pthread_mutex_t *mutex, { /* Note: robust PI futexes are signaled by setting bit 0. */ THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, - (void *) (((uintptr_t) &mutex->__data.__list.__next) + (void *) (((uintptr_t) &mutex->__data.__list.__list_t.__next) | 1)); /* We need to set op_pending before starting the operation. Also see comments at ENQUEUE_MUTEX. */ @@ -516,6 +542,7 @@ __pthread_mutex_timedlock (pthread_mutex_t *mutex, case PTHREAD_MUTEX_PP_ERRORCHECK_NP: case PTHREAD_MUTEX_PP_NORMAL_NP: case PTHREAD_MUTEX_PP_ADAPTIVE_NP: + case PTHREAD_MUTEX_PP_QUEUESPINNER_NP: { /* See concurrency notes regarding __kind in struct __pthread_mutex_s in sysdeps/nptl/bits/thread-shared-types.h. 
*/ diff --git a/nptl/pthread_mutex_trylock.c b/nptl/pthread_mutex_trylock.c index fa90c1d..2545d15 100644 --- a/nptl/pthread_mutex_trylock.c +++ b/nptl/pthread_mutex_trylock.c @@ -78,6 +78,7 @@ __pthread_mutex_trylock (pthread_mutex_t *mutex) FORCE_ELISION (mutex, goto elision); /*FALL THROUGH*/ case PTHREAD_MUTEX_ADAPTIVE_NP: + case PTHREAD_MUTEX_QUEUESPINNER_NP: case PTHREAD_MUTEX_ERRORCHECK_NP: if (lll_trylock (mutex->__data.__lock) != 0) break; @@ -93,7 +94,7 @@ __pthread_mutex_trylock (pthread_mutex_t *mutex) case PTHREAD_MUTEX_ROBUST_NORMAL_NP: case PTHREAD_MUTEX_ROBUST_ADAPTIVE_NP: THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, - &mutex->__data.__list.__next); + &mutex->__data.__list.__list_t.__next); oldval = mutex->__data.__lock; do @@ -196,6 +197,7 @@ __pthread_mutex_trylock (pthread_mutex_t *mutex) case PTHREAD_MUTEX_PI_ERRORCHECK_NP: case PTHREAD_MUTEX_PI_NORMAL_NP: case PTHREAD_MUTEX_PI_ADAPTIVE_NP: + case PTHREAD_MUTEX_PI_QUEUESPINNER_NP: case PTHREAD_MUTEX_PI_ROBUST_RECURSIVE_NP: case PTHREAD_MUTEX_PI_ROBUST_ERRORCHECK_NP: case PTHREAD_MUTEX_PI_ROBUST_NORMAL_NP: @@ -213,7 +215,7 @@ __pthread_mutex_trylock (pthread_mutex_t *mutex) if (robust) /* Note: robust PI futexes are signaled by setting bit 0. */ THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, - (void *) (((uintptr_t) &mutex->__data.__list.__next) + (void *) (((uintptr_t) &mutex->__data.__list.__list_t.__next) | 1)); oldval = mutex->__data.__lock; @@ -332,6 +334,7 @@ __pthread_mutex_trylock (pthread_mutex_t *mutex) case PTHREAD_MUTEX_PP_ERRORCHECK_NP: case PTHREAD_MUTEX_PP_NORMAL_NP: case PTHREAD_MUTEX_PP_ADAPTIVE_NP: + case PTHREAD_MUTEX_PP_QUEUESPINNER_NP: { /* See concurrency notes regarding __kind in struct __pthread_mutex_s in sysdeps/nptl/bits/thread-shared-types.h. */ diff --git a/nptl/pthread_mutex_unlock.c b/nptl/pthread_mutex_unlock.c index 68d04d5..abca8e7 100644 --- a/nptl/pthread_mutex_unlock.c +++ b/nptl/pthread_mutex_unlock.c @@ -78,6 +78,9 @@ __pthread_mutex_unlock_usercnt (pthread_mutex_t *mutex, int decr) goto normal; } else if (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex) + == PTHREAD_MUTEX_QUEUESPINNER_NP, 1)) + goto normal; + else if (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex) == PTHREAD_MUTEX_ADAPTIVE_NP, 1)) goto normal; else @@ -142,7 +145,7 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) robust: /* Remove mutex from the list. */ THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, - &mutex->__data.__list.__next); + &mutex->__data.__list.__list_t.__next); /* We must set op_pending before we dequeue the mutex. Also see comments at ENQUEUE_MUTEX. */ __asm ("" ::: "memory"); @@ -213,6 +216,7 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) case PTHREAD_MUTEX_PI_ERRORCHECK_NP: case PTHREAD_MUTEX_PI_NORMAL_NP: case PTHREAD_MUTEX_PI_ADAPTIVE_NP: + case PTHREAD_MUTEX_PI_QUEUESPINNER_NP: case PTHREAD_MUTEX_PI_ROBUST_ERRORCHECK_NP: case PTHREAD_MUTEX_PI_ROBUST_NORMAL_NP: case PTHREAD_MUTEX_PI_ROBUST_ADAPTIVE_NP: @@ -242,7 +246,7 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) /* Remove mutex from the list. Note: robust PI futexes are signaled by setting bit 0. */ THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, - (void *) (((uintptr_t) &mutex->__data.__list.__next) + (void *) (((uintptr_t) &mutex->__data.__list.__list_t.__next) | 1)); /* We must set op_pending before we dequeue the mutex. Also see comments at ENQUEUE_MUTEX. 
*/ @@ -311,6 +315,7 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) case PTHREAD_MUTEX_PP_NORMAL_NP: case PTHREAD_MUTEX_PP_ADAPTIVE_NP: + case PTHREAD_MUTEX_PP_QUEUESPINNER_NP: /* Always reset the owner field. */ pp: mutex->__data.__owner = 0; diff --git a/nptl/pthread_mutexattr_settype.c b/nptl/pthread_mutexattr_settype.c index 7d36cc3..c2382b4 100644 --- a/nptl/pthread_mutexattr_settype.c +++ b/nptl/pthread_mutexattr_settype.c @@ -25,7 +25,7 @@ __pthread_mutexattr_settype (pthread_mutexattr_t *attr, int kind) { struct pthread_mutexattr *iattr; - if (kind < PTHREAD_MUTEX_NORMAL || kind > PTHREAD_MUTEX_ADAPTIVE_NP) + if (kind < PTHREAD_MUTEX_NORMAL || kind > PTHREAD_MUTEX_QUEUESPINNER_NP) return EINVAL; /* Cannot distinguish between DEFAULT and NORMAL. So any settype diff --git a/sysdeps/nptl/bits/thread-shared-types.h b/sysdeps/nptl/bits/thread-shared-types.h index 05c94e7..1cf8874 100644 --- a/sysdeps/nptl/bits/thread-shared-types.h +++ b/sysdeps/nptl/bits/thread-shared-types.h @@ -79,15 +79,19 @@ /* Common definition of pthread_mutex_t. */ #if !__PTHREAD_MUTEX_USE_UNION -typedef struct __pthread_internal_list +typedef union __pthread_internal_list { - struct __pthread_internal_list *__prev; - struct __pthread_internal_list *__next; + struct { + union __pthread_internal_list *__prev; + union __pthread_internal_list *__next; + }__list_t; + void *mcs_lock; } __pthread_list_t; #else -typedef struct __pthread_internal_slist +typedef union __pthread_internal_slist { - struct __pthread_internal_slist *__next; + union __pthread_internal_slist *__next; + void *mcs_lock; } __pthread_slist_t; #endif @@ -165,6 +169,13 @@ struct __pthread_mutex_s __PTHREAD_COMPAT_PADDING_END }; +struct mcs_lock +{ + struct mcs_lock *next; + int locked; +}; + +typedef struct mcs_lock mcs_lock_t; /* Common definition of pthread_cond_t. 
*/ diff --git a/sysdeps/nptl/pthread.h b/sysdeps/nptl/pthread.h index df049ab..4b4b80a 100644 --- a/sysdeps/nptl/pthread.h +++ b/sysdeps/nptl/pthread.h @@ -45,7 +45,8 @@ enum PTHREAD_MUTEX_TIMED_NP, PTHREAD_MUTEX_RECURSIVE_NP, PTHREAD_MUTEX_ERRORCHECK_NP, - PTHREAD_MUTEX_ADAPTIVE_NP + PTHREAD_MUTEX_ADAPTIVE_NP, + PTHREAD_MUTEX_QUEUESPINNER_NP #if defined __USE_UNIX98 || defined __USE_XOPEN2K8 , PTHREAD_MUTEX_NORMAL = PTHREAD_MUTEX_TIMED_NP, @@ -85,14 +86,16 @@ enum #if __PTHREAD_MUTEX_HAVE_PREV # define PTHREAD_MUTEX_INITIALIZER \ - { { 0, 0, 0, 0, 0, __PTHREAD_SPINS, { 0, 0 } } } + { { 0, 0, 0, 0, 0, __PTHREAD_SPINS, { { 0, 0 } } } } # ifdef __USE_GNU # define PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP \ - { { 0, 0, 0, 0, PTHREAD_MUTEX_RECURSIVE_NP, __PTHREAD_SPINS, { 0, 0 } } } + { { 0, 0, 0, 0, PTHREAD_MUTEX_RECURSIVE_NP, __PTHREAD_SPINS, { { 0, 0 } } } } # define PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP \ - { { 0, 0, 0, 0, PTHREAD_MUTEX_ERRORCHECK_NP, __PTHREAD_SPINS, { 0, 0 } } } + { { 0, 0, 0, 0, PTHREAD_MUTEX_ERRORCHECK_NP, __PTHREAD_SPINS, { { 0, 0 } } } } # define PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP \ - { { 0, 0, 0, 0, PTHREAD_MUTEX_ADAPTIVE_NP, __PTHREAD_SPINS, { 0, 0 } } } + { { 0, 0, 0, 0, PTHREAD_MUTEX_ADAPTIVE_NP, __PTHREAD_SPINS, { { 0, 0 } } } } +# define PTHREAD_QUEUESPINNER_MUTEX_INITIALIZER_NP \ + { { 0, 0, 0, 0, PTHREAD_MUTEX_QUEUESPINNER_NP, __PTHREAD_SPINS, { { 0, 0 } } } } # endif #else @@ -105,6 +108,8 @@ enum { { 0, 0, 0, PTHREAD_MUTEX_ERRORCHECK_NP, 0, { __PTHREAD_SPINS } } } # define PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP \ { { 0, 0, 0, PTHREAD_MUTEX_ADAPTIVE_NP, 0, { __PTHREAD_SPINS } } } +# define PTHREAD_QUEUESPINNER_MUTEX_INITIALIZER_NP \ + { { 0, 0, 0, PTHREAD_MUTEX_QUEUESPINNER_NP, 0, { __PTHREAD_SPINS } } } # endif #endif diff --git a/sysdeps/unix/sysv/linux/hppa/pthread.h b/sysdeps/unix/sysv/linux/hppa/pthread.h index 11a024d..57c101c 100644 --- a/sysdeps/unix/sysv/linux/hppa/pthread.h +++ b/sysdeps/unix/sysv/linux/hppa/pthread.h @@ -46,6 +46,7 @@ enum PTHREAD_MUTEX_RECURSIVE_NP, PTHREAD_MUTEX_ERRORCHECK_NP, PTHREAD_MUTEX_ADAPTIVE_NP + PTHREAD_MUTEX_QUEUESPINNER_NP, #if defined __USE_UNIX98 || defined __USE_XOPEN2K8 , PTHREAD_MUTEX_NORMAL = PTHREAD_MUTEX_TIMED_NP, @@ -95,6 +96,9 @@ enum # define PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP \ { { 0, 0, 0, PTHREAD_MUTEX_ADAPTIVE_NP, { 0, 0, 0, 0 }, 0, \ { __PTHREAD_SPINS }, { 0, 0 } } } +# define PTHREAD_QUEUESPINNER_MUTEX_INITIALIZER_NP \ + { { 0, 0, 0, PTHREAD_MUTEX_QUEUESPINNER_NP, { 0, 0, 0, 0 }, 0, \ + { __PTHREAD_SPINS }, { 0, 0 } } } #endif
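For completeness, here is a hypothetical usage sketch of the new type once
this patch is applied.  PTHREAD_QUEUESPINNER_MUTEX_INITIALIZER_NP and
PTHREAD_MUTEX_QUEUESPINNER_NP are provided by the patch (the initializer
under _GNU_SOURCE) and do not exist in an unpatched glibc; the fallback
branch is only there so the sketch still builds without the patch.

/* Hypothetical usage of the queue-spinner mutex added by this patch.  */
#define _GNU_SOURCE
#include <pthread.h>

#ifdef PTHREAD_QUEUESPINNER_MUTEX_INITIALIZER_NP
/* Static initialization with the new GNU-extension initializer.  */
static pthread_mutex_t qs_lock = PTHREAD_QUEUESPINNER_MUTEX_INITIALIZER_NP;
#else
/* Unpatched glibc: fall back to a default mutex for illustration.  */
static pthread_mutex_t qs_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

void
do_critical_work (void)
{
  pthread_mutex_lock (&qs_lock);
  /* Short critical section; contended lockers first queue as MCS spinners
     and only the queue head spins on the mutex word itself.  */
  pthread_mutex_unlock (&qs_lock);
}

Dynamic initialization works analogously via
pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_QUEUESPINNER_NP), as in the
benchmark sketch earlier.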