From patchwork Fri Mar 30 07:14:52 2018
X-Patchwork-Submitter: Kemi Wang
X-Patchwork-Id: 26514
From: Kemi Wang
To: Glibc alpha
Cc: Dave Hansen, Tim Chen, Andi Kleen, Ying Huang, Aaron Lu, Lu Aubrey,
 Kemi Wang
Subject: [PATCH 2/3] Mutex: Only read while spinning
Date: Fri, 30 Mar 2018 15:14:52 +0800
Message-Id: <1522394093-9835-2-git-send-email-kemi.wang@intel.com>
In-Reply-To: <1522394093-9835-1-git-send-email-kemi.wang@intel.com>
References: <1522394093-9835-1-git-send-email-kemi.wang@intel.com>

The pthread adaptive spin mutex spins on the lock for a while before
going to sleep.  While the lock is contended and we need to wait, going
straight back to LLL_MUTEX_TRYLOCK (cmpxchg) is not a good idea on many
targets: it forces expensive memory synchronization among processors and
penalizes other running threads.  For example, it constantly floods the
system with "read for ownership" requests, which are much more expensive
to process than a single read.  Thus, only use a relaxed MO read until we
observe that the lock is no longer acquired, as suggested by Andi Kleen.

Test machine:
  2-socket Skylake platform, 112 cores, 62 GB RAM

Test case:
  Contended pthread adaptive spin mutex with a global update.
  Each thread of the workload does, in a loop until timeout:
    a) lock the mutex (adaptive spin type)
    b) increment a global variable
    c) unlock the mutex
  The main thread reports the total iteration count of all threads after
  one second.  This test case is the same as Will-it-scale.pthread_mutex3
  except that the mutex type is changed to PTHREAD_MUTEX_ADAPTIVE_NP.
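For reference, below is a minimal sketch of that benchmark loop.  It is
not the Will-it-scale source (which is linked below); the helper names,
the default thread count, and the fixed one-second run time are chosen
here only for illustration.

  #define _GNU_SOURCE            /* for PTHREAD_MUTEX_ADAPTIVE_NP */
  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  static pthread_mutex_t lock;
  static unsigned long long counter;     /* the contended global */
  static volatile int stop;

  static void *
  worker (void *arg)
  {
    unsigned long long *iters = arg;
    while (!stop)
      {
        pthread_mutex_lock (&lock);      /* a) lock the adaptive mutex */
        counter++;                       /* b) global variable increment */
        pthread_mutex_unlock (&lock);    /* c) unlock the mutex */
        (*iters)++;
      }
    return NULL;
  }

  int
  main (int argc, char **argv)
  {
    int nthreads = argc > 1 ? atoi (argv[1]) : 4;
    pthread_t tid[nthreads];
    unsigned long long iters[nthreads];

    pthread_mutexattr_t attr;
    pthread_mutexattr_init (&attr);
    pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    pthread_mutex_init (&lock, &attr);

    for (int i = 0; i < nthreads; i++)
      {
        iters[i] = 0;
        pthread_create (&tid[i], NULL, worker, &iters[i]);
      }

    sleep (1);                           /* run for one second */
    stop = 1;

    unsigned long long total = 0;
    for (int i = 0; i < nthreads; i++)
      {
        pthread_join (tid[i], NULL);
        total += iters[i];
      }
    printf ("total iterations: %llu\n", total);
    pthread_mutex_destroy (&lock);
    return 0;
  }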
github: https://github.com/antonblanchard/will-it-scale.git

  nr_threads    base        head(SPIN_COUNT=10)    head(SPIN_COUNT=1000)
  1             51644585    51307573 (-0.7%)       51323778 (-0.6%)
  2              7914789    10011301 (+26.5%)       9867343 (+24.7%)
  7              1687620     4224135 (+150.3%)      3430504 (+103.3%)
  14             1026555     3784957 (+268.7%)      1843458 (+79.6%)
  28              962001     2886885 (+200.1%)       681965 (-29.1%)
  56              883770     2740755 (+210.1%)       364879 (-58.7%)
  112            1150589     2707089 (+135.3%)       415261 (-63.9%)

Suggested-by: Andi Kleen
Signed-off-by: Kemi Wang
---
 nptl/pthread_mutex_lock.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index 1519c14..c3aca93 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include

 #ifndef lll_lock_elision
 #define lll_lock_elision(lock, try_lock, private) ({ \
@@ -124,16 +125,22 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
       if (LLL_MUTEX_TRYLOCK (mutex) != 0)
 	{
 	  int cnt = 0;
-	  int max_cnt = MIN (MAX_ADAPTIVE_COUNT,
-			     mutex->__data.__spins * 2 + 10);
+	  int max_cnt = MIN (__mutex_aconf.spin_count,
+			     mutex->__data.__spins * 2 + 100);
 	  do
 	    {
-	      if (cnt++ >= max_cnt)
-		{
-		  LLL_MUTEX_LOCK (mutex);
-		  break;
-		}
-	      atomic_spin_nop ();
+	      if (cnt >= max_cnt)
+		{
+		  LLL_MUTEX_LOCK (mutex);
+		  break;
+		}
+	      /* MO read while spinning */
+	      do
+		{
+		  atomic_spin_nop ();
+		}
+	      while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
+		     ++cnt < max_cnt);
 	    }
 	  while (LLL_MUTEX_TRYLOCK (mutex) != 0);
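
To make the intent of the second hunk explicit, here is a standalone
sketch of the same test-and-test-and-set idea, written against C11
atomics rather than glibc internals.  The spinlock type and function
names are invented for the example, and a real adaptive mutex would
additionally fall back to blocking (futex) after max_cnt iterations.

  #include <stdatomic.h>

  struct spinlock { atomic_int locked; };

  static void
  spin_lock (struct spinlock *l)
  {
    /* The exchange is the expensive step: it needs the cache line in
       exclusive state ("read for ownership").  */
    while (atomic_exchange_explicit (&l->locked, 1, memory_order_acquire) != 0)
      {
        /* While someone else holds the lock, wait with plain relaxed
           loads so the line can stay shared in every waiter's cache;
           only go back to the exchange once the lock is observed free.  */
        while (atomic_load_explicit (&l->locked, memory_order_relaxed) != 0)
          ;  /* a real spinner would issue a pause/atomic_spin_nop here */
      }
  }

  static void
  spin_unlock (struct spinlock *l)
  {
    atomic_store_explicit (&l->locked, 0, memory_order_release);
  }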