Message ID: <1522394093-9835-3-git-send-email-kemi.wang@intel.com>
State: New, archived
Headers:
From: Kemi Wang <kemi.wang@intel.com>
To: Glibc alpha <libc-alpha@sourceware.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>, Tim Chen <tim.c.chen@intel.com>, Andi Kleen <andi.kleen@intel.com>, Ying Huang <ying.huang@intel.com>, Aaron Lu <aaron.lu@intel.com>, Lu Aubrey <aubrey.li@intel.com>, Kemi Wang <kemi.wang@intel.com>
Subject: [PATCH 3/3] Mutex: Avoid useless spinning
Date: Fri, 30 Mar 2018 15:14:53 +0800
Message-Id: <1522394093-9835-3-git-send-email-kemi.wang@intel.com>
In-Reply-To: <1522394093-9835-1-git-send-email-kemi.wang@intel.com>
References: <1522394093-9835-1-git-send-email-kemi.wang@intel.com>
Commit Message
Kemi Wang
March 30, 2018, 7:14 a.m. UTC
Usually we cannot set too short a timeout for spinning on the lock: that
would likely send a thread trying to acquire the lock to sleep too
quickly, weakening the benefit of the pthread adaptive spin lock.

However, a large timeout is also a problem when a small critical section
is protected under severe lock contention. As the test results in the
previous patch show, performance is highly affected by the spin count
tunable: the smaller the spin count, the larger the improvement. This is
because, under severe contention, a thread tends to spin on the lock
until the timeout expires before going to sleep.

This patch avoids that useless spinning by making the spinner go to
sleep if it fails to acquire the lock once the lock becomes available,
as suggested by Tim Chen.
nr_threads   base        COUNT=1000 (head~1)    COUNT=1000 (head)
1            51644585    51323778 (-0.6%)       51378551 (-0.5%)
2            7914789     9867343 (+24.7%)       11503559 (+45.3%)
7            1687620     3430504 (+103.3%)      7817383 (+363.2%)
14           1026555     1843458 (+79.6%)       7360883 (+617.0%)
28           962001      681965 (-29.1%)        5681945 (+490.6%)
56           883770      364879 (-58.7%)        3416068 (+286.5%)
112          1150589     415261 (-63.9%)        3255700 (+183.0%)
Suggested-by: Tim Chen <tim.c.chen@intel.com>
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
---
nptl/pthread_mutex_lock.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
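To make the control flow of the patched fast path concrete, here is a minimal, self-contained C sketch of the spin-then-single-trylock pattern the commit message describes. The lock word, `MAX_SPIN_COUNT`, and the function names are local stand-ins, not glibc's internal API; where this sketch returns false, the real code falls back to LLL_MUTEX_LOCK (a futex wait).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for glibc's internals: a lock word where 0
   means unlocked and 1 means locked, and a fixed spin budget in place
   of __mutex_aconf.spin_count.  */
static atomic_int lock_word;
#define MAX_SPIN_COUNT 1000

/* Sketch of the patched fast path: spin with a relaxed read only while
   the lock looks taken, then make exactly one acquisition attempt.  On
   failure the real code would call LLL_MUTEX_LOCK and block instead of
   spinning again.  */
static bool lock_adaptive (void)
{
  int cnt = 0;
  /* MO read while spinning: no atomic RMW traffic on the lock's cache
     line while another thread holds it.  */
  while (atomic_load_explicit (&lock_word, memory_order_relaxed) != 0
         && ++cnt < MAX_SPIN_COUNT)
    ;  /* the real code issues atomic_spin_nop () here */

  /* Single trylock; if it fails, the thread goes to sleep rather than
     restarting the spin loop.  */
  int expected = 0;
  if (atomic_compare_exchange_strong (&lock_word, &expected, 1))
    return true;   /* acquired via the spinning fast path */
  return false;    /* would sleep in LLL_MUTEX_LOCK here */
}
```

Note how this differs from the pre-patch code: there is no outer `do ... while (TRYLOCK != 0)` loop, so a trylock that loses the race leads directly to the sleeping slow path instead of another full spin round.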
Comments
On 30/03/2018 04:14, Kemi Wang wrote:
> [commit message and benchmark results quoted in full]

As before [2], I checked the change on a 64-core aarch64 machine; unlike the previous patch, this one does seem to show improvements:

nr_threads  base      head (SPIN_COUNT=10)    head (SPIN_COUNT=1000)
1           27566206  28776779 (4.206770)     28778073 (4.211078)
2           8498813   9129102 (6.904173)      7042975 (-20.670782)
7           5019434   5832195 (13.935765)     5098511 (1.550982)
14          4379155   6507212 (32.703053)     5200018 (15.785772)
28          4397464   4584480 (4.079329)      4456767 (1.330628)
56          4020956   3534899 (-13.750237)    4096197 (1.836850)

I would suggest you squash both patches into one for version 2.
> [patch quoted; the relevant hunk:]
>
> +      /* Try to acquire the lock if lock is available or the spin count
> +       * is run out, go to sleep if fails
> +       */
> +      if (LLL_MUTEX_TRYLOCK (mutex) != 0)
> +        LLL_MUTEX_LOCK (mutex);

Please fix the format issue here.
On April 6, 2018, 04:59, Adhemerval Zanella wrote:
> [quoted commit message and aarch64 results trimmed]
>
> As before [2], I checked the change on a 64 cores aarch64 machine, but
> differently than previous patch this one seems to show improvements:
>
> I would suggest you to squash both patch in only one for version 2.

OK, the separation here is easy for review.
>> [patch quoted above, trimmed]
>
> Please fix the format issue here.
diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index c3aca93..0faee1a 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -127,22 +127,19 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
 	  int cnt = 0;
 	  int max_cnt = MIN (__mutex_aconf.spin_count,
 			     mutex->__data.__spins * 2 + 100);
+
+	  /* MO read while spinning */
 	  do
-	    {
-	      if (cnt >= max_cnt)
-		{
-		  LLL_MUTEX_LOCK (mutex);
-		  break;
-		}
-	      /* MO read while spinning */
-	      do
-		{
-		  atomic_spin_nop ();
-		}
-	      while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
+	    {
+	      atomic_spin_nop ();
+	    }
+	  while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
 		  ++cnt < max_cnt);
-	    }
-	  while (LLL_MUTEX_TRYLOCK (mutex) != 0);
+	  /* Try to acquire the lock if lock is available or the spin count
+	   * is run out, go to sleep if fails
+	   */
+	  if (LLL_MUTEX_TRYLOCK (mutex) != 0)
+	    LLL_MUTEX_LOCK (mutex);

 	  mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
 	}
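Two computations the patch leaves untouched are worth spelling out: the spin budget is capped at MIN (spin_count, __spins * 2 + 100), and after each acquisition __spins is updated with an exponentially weighted moving average with weight 1/8 on the latest observed spin count. A small standalone sketch (function names are local stand-ins, not glibc's):

```c
#include <assert.h>

/* Spin budget: the smaller of the global tunable and twice the
   per-mutex moving average plus a constant floor, mirroring the
   MIN (...) expression in __pthread_mutex_lock.  */
static int max_spin_budget (int tunable, int spins)
{
  int dynamic = spins * 2 + 100;
  return tunable < dynamic ? tunable : dynamic;
}

/* Per-mutex spin estimate: exponentially weighted moving average with
   weight 1/8 on the latest observed spin count, in integer
   arithmetic.  */
static int update_spins (int spins, int cnt)
{
  return spins + (cnt - spins) / 8;
}
```

Because the update uses integer division, repeated application converges to within 8 of the observed spin count rather than hitting it exactly; the feedback loop then keeps max_cnt roughly proportional to the recent contention level.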