From patchwork Fri Mar 30 07:14:52 2018
X-Patchwork-Submitter: Kemi Wang
X-Patchwork-Id: 26514
From: Kemi Wang
To: Glibc alpha
Cc: Dave Hansen, Tim Chen, Andi Kleen, Ying Huang, Aaron Lu, Lu Aubrey,
 Kemi Wang
Subject: [PATCH 2/3] Mutex: Only read while spinning
Date: Fri, 30 Mar 2018 15:14:52 +0800
Message-Id: <1522394093-9835-2-git-send-email-kemi.wang@intel.com>
In-Reply-To: <1522394093-9835-1-git-send-email-kemi.wang@intel.com>
References: <1522394093-9835-1-git-send-email-kemi.wang@intel.com>

The pthread adaptive spin mutex spins on the lock for a while before
going to sleep.  While the lock is contended and we need to wait, going
straight back to LLL_MUTEX_TRYLOCK (cmpxchg) is not a good idea on many
targets: it forces expensive memory synchronization among processors and
penalizes other running threads.  For example, it constantly floods the
system with "read for ownership" requests, which are much more expensive
to process than a single read.  Thus, only use a relaxed MO read until we
observe that the lock is no longer acquired, as suggested by Andi Kleen.

Test machine:
  2-socket Skylake platform, 112 cores, 62 GB RAM

Test case:
  Contended pthread adaptive spin mutex with a global update.
  Each thread of the workload does, in a loop until timeout:
    a) lock the mutex (adaptive spin type)
    b) increment a global variable
    c) unlock the mutex
  The main thread reports the total iteration count of all threads after
  one second.  This test case is the same as Will-it-scale.pthread_mutex3
  except that the mutex type is changed to PTHREAD_MUTEX_ADAPTIVE_NP.
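For reference, below is a minimal sketch of that benchmark loop.  It is
not the Will-it-scale source (which is linked below); the helper names,
the default thread count, and the fixed one-second run time are chosen
here only for illustration.

  #define _GNU_SOURCE            /* for PTHREAD_MUTEX_ADAPTIVE_NP */
  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  static pthread_mutex_t lock;
  static unsigned long long counter;     /* the contended global */
  static volatile int stop;

  static void *
  worker (void *arg)
  {
    unsigned long long *iters = arg;
    while (!stop)
      {
        pthread_mutex_lock (&lock);      /* a) lock the adaptive mutex */
        counter++;                       /* b) global variable increment */
        pthread_mutex_unlock (&lock);    /* c) unlock the mutex */
        (*iters)++;
      }
    return NULL;
  }

  int
  main (int argc, char **argv)
  {
    int nthreads = argc > 1 ? atoi (argv[1]) : 4;
    pthread_t tid[nthreads];
    unsigned long long iters[nthreads];

    pthread_mutexattr_t attr;
    pthread_mutexattr_init (&attr);
    pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    pthread_mutex_init (&lock, &attr);

    for (int i = 0; i < nthreads; i++)
      {
        iters[i] = 0;
        pthread_create (&tid[i], NULL, worker, &iters[i]);
      }

    sleep (1);                           /* run for one second */
    stop = 1;

    unsigned long long total = 0;
    for (int i = 0; i < nthreads; i++)
      {
        pthread_join (tid[i], NULL);
        total += iters[i];
      }
    printf ("total iterations: %llu\n", total);
    pthread_mutex_destroy (&lock);
    return 0;
  }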
github: https://github.com/antonblanchard/will-it-scale.git

  nr_threads    base        head(SPIN_COUNT=10)    head(SPIN_COUNT=1000)
  1             51644585    51307573 (-0.7%)       51323778 (-0.6%)
  2              7914789    10011301 (+26.5%)       9867343 (+24.7%)
  7              1687620     4224135 (+150.3%)      3430504 (+103.3%)
  14             1026555     3784957 (+268.7%)      1843458 (+79.6%)
  28              962001     2886885 (+200.1%)       681965 (-29.1%)
  56              883770     2740755 (+210.1%)       364879 (-58.7%)
  112            1150589     2707089 (+135.3%)       415261 (-63.9%)

Suggested-by: Andi Kleen
Signed-off-by: Kemi Wang
---
 nptl/pthread_mutex_lock.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index 1519c14..c3aca93 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include

 #ifndef lll_lock_elision
 #define lll_lock_elision(lock, try_lock, private) ({ \
@@ -124,16 +125,22 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
       if (LLL_MUTEX_TRYLOCK (mutex) != 0)
 	{
 	  int cnt = 0;
-	  int max_cnt = MIN (MAX_ADAPTIVE_COUNT,
-			     mutex->__data.__spins * 2 + 10);
+	  int max_cnt = MIN (__mutex_aconf.spin_count,
+			     mutex->__data.__spins * 2 + 100);
 	  do
 	    {
-	      if (cnt++ >= max_cnt)
-		{
-		  LLL_MUTEX_LOCK (mutex);
-		  break;
-		}
-	      atomic_spin_nop ();
+	      if (cnt >= max_cnt)
+		{
+		  LLL_MUTEX_LOCK (mutex);
+		  break;
+		}
+	      /* MO read while spinning */
+	      do
+		{
+		  atomic_spin_nop ();
+		}
+	      while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
+		     ++cnt < max_cnt);
 	    }
 	  while (LLL_MUTEX_TRYLOCK (mutex) != 0);
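
To make the intent of the second hunk explicit, here is a standalone
sketch of the same test-and-test-and-set idea, written against C11
atomics rather than glibc internals.  The spinlock type and function
names are invented for the example, and a real adaptive mutex would
additionally fall back to blocking (futex) after max_cnt iterations.

  #include <stdatomic.h>

  struct spinlock { atomic_int locked; };

  static void
  spin_lock (struct spinlock *l)
  {
    /* The exchange is the expensive step: it needs the cache line in
       exclusive state ("read for ownership").  */
    while (atomic_exchange_explicit (&l->locked, 1, memory_order_acquire) != 0)
      {
        /* While someone else holds the lock, wait with plain relaxed
           loads so the line can stay shared in every waiter's cache;
           only go back to the exchange once the lock is observed free.  */
        while (atomic_load_explicit (&l->locked, memory_order_relaxed) != 0)
          ;  /* a real spinner would issue a pause/atomic_spin_nop here */
      }
  }

  static void
  spin_unlock (struct spinlock *l)
  {
    atomic_store_explicit (&l->locked, 0, memory_order_release);
  }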