Message ID: <1522394093-9835-3-git-send-email-kemi.wang@intel.com>
State: New, archived
Headers:
From: Kemi Wang <kemi.wang@intel.com>
To: Glibc alpha <libc-alpha@sourceware.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>, Tim Chen <tim.c.chen@intel.com>, Andi Kleen <andi.kleen@intel.com>, Ying Huang <ying.huang@intel.com>, Aaron Lu <aaron.lu@intel.com>, Lu Aubrey <aubrey.li@intel.com>, Kemi Wang <kemi.wang@intel.com>
Subject: [PATCH 3/3] Mutex: Avoid useless spinning
Date: Fri, 30 Mar 2018 15:14:53 +0800
Message-Id: <1522394093-9835-3-git-send-email-kemi.wang@intel.com>
In-Reply-To: <1522394093-9835-1-git-send-email-kemi.wang@intel.com>
References: <1522394093-9835-1-git-send-email-kemi.wang@intel.com>
Commit Message
Kemi Wang
March 30, 2018, 7:14 a.m. UTC
Usually we cannot set too short a timeout for spinning on the lock: that
would likely send a thread trying to acquire the lock to sleep too
quickly, weakening the benefit of the pthread adaptive spin lock.

However, a large timeout is also a problem when a small critical section
is protected under severe lock contention. As the test results in the
previous patch show, performance is highly affected by the spin count
tunable: the smaller the spin count, the larger the improvement. This is
because, under severe contention, a thread tends to spin on the lock
until the timeout expires before going to sleep.

This patch avoids that useless spinning by making the spinner go to
sleep if it fails to acquire the lock once the lock becomes available,
as suggested by Tim Chen.
nr_threads   base        COUNT=1000 (head~1)    COUNT=1000 (head)
1            51644585    51323778 (-0.6%)       51378551 (-0.5%)
2            7914789     9867343 (+24.7%)       11503559 (+45.3%)
7            1687620     3430504 (+103.3%)      7817383 (+363.2%)
14           1026555     1843458 (+79.6%)       7360883 (+617.0%)
28           962001      681965 (-29.1%)        5681945 (+490.6%)
56           883770      364879 (-58.7%)        3416068 (+286.5%)
112          1150589     415261 (-63.9%)        3255700 (+183.0%)
Suggested-by: Tim Chen <tim.c.chen@intel.com>
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
---
nptl/pthread_mutex_lock.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
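To make the control flow of the patched fast path concrete, here is a minimal, self-contained C sketch of the spin-then-single-trylock pattern the commit message describes. The lock word, `MAX_SPIN_COUNT`, and the function names are local stand-ins, not glibc's internal API; where this sketch returns false, the real code falls back to LLL_MUTEX_LOCK (a futex wait).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for glibc's internals: a lock word where 0
   means unlocked and 1 means locked, and a fixed spin budget in place
   of __mutex_aconf.spin_count.  */
static atomic_int lock_word;
#define MAX_SPIN_COUNT 1000

/* Sketch of the patched fast path: spin with a relaxed read only while
   the lock looks taken, then make exactly one acquisition attempt.  On
   failure the real code would call LLL_MUTEX_LOCK and block instead of
   spinning again.  */
static bool lock_adaptive (void)
{
  int cnt = 0;
  /* MO read while spinning: no atomic RMW traffic on the lock's cache
     line while another thread holds it.  */
  while (atomic_load_explicit (&lock_word, memory_order_relaxed) != 0
         && ++cnt < MAX_SPIN_COUNT)
    ;  /* the real code issues atomic_spin_nop () here */

  /* Single trylock; if it fails, the thread goes to sleep rather than
     restarting the spin loop.  */
  int expected = 0;
  if (atomic_compare_exchange_strong (&lock_word, &expected, 1))
    return true;   /* acquired via the spinning fast path */
  return false;    /* would sleep in LLL_MUTEX_LOCK here */
}
```

Note how this differs from the pre-patch code: there is no outer `do ... while (TRYLOCK != 0)` loop, so a trylock that loses the race leads directly to the sleeping slow path instead of another full spin round.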
Comments
On 30/03/2018 04:14, Kemi Wang wrote:
> [commit message and benchmark results quoted in full]

As before [2], I checked the change on a 64-core aarch64 machine; unlike the previous patch, this one does seem to show improvements:

nr_threads  base      head (SPIN_COUNT=10)    head (SPIN_COUNT=1000)
1           27566206  28776779 (4.206770)     28778073 (4.211078)
2           8498813   9129102 (6.904173)      7042975 (-20.670782)
7           5019434   5832195 (13.935765)     5098511 (1.550982)
14          4379155   6507212 (32.703053)     5200018 (15.785772)
28          4397464   4584480 (4.079329)      4456767 (1.330628)
56          4020956   3534899 (-13.750237)    4096197 (1.836850)

I would suggest you squash both patches into one for version 2.
> [patch quoted; the relevant hunk:]
>
> +      /* Try to acquire the lock if lock is available or the spin count
> +       * is run out, go to sleep if fails
> +       */
> +      if (LLL_MUTEX_TRYLOCK (mutex) != 0)
> +        LLL_MUTEX_LOCK (mutex);

Please fix the format issue here.
On April 6, 2018, 04:59, Adhemerval Zanella wrote:
> [quoted commit message and aarch64 results trimmed]
>
> As before [2], I checked the change on a 64 cores aarch64 machine, but
> differently than previous patch this one seems to show improvements:
>
> I would suggest you to squash both patch in only one for version 2.

OK, the separation here is easy for review.
>> [patch quoted above, trimmed]
>
> Please fix the format issue here.
diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index c3aca93..0faee1a 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -127,22 +127,19 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
 	  int cnt = 0;
 	  int max_cnt = MIN (__mutex_aconf.spin_count,
 			     mutex->__data.__spins * 2 + 100);
+
+	  /* MO read while spinning */
 	  do
-	    {
-	      if (cnt >= max_cnt)
-		{
-		  LLL_MUTEX_LOCK (mutex);
-		  break;
-		}
-	      /* MO read while spinning */
-	      do
-		{
-		  atomic_spin_nop ();
-		}
-	      while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
+	    {
+	      atomic_spin_nop ();
+	    }
+	  while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
 		  ++cnt < max_cnt);
-	    }
-	  while (LLL_MUTEX_TRYLOCK (mutex) != 0);
+	  /* Try to acquire the lock if lock is available or the spin count
+	   * is run out, go to sleep if fails
+	   */
+	  if (LLL_MUTEX_TRYLOCK (mutex) != 0)
+	    LLL_MUTEX_LOCK (mutex);

 	  mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
 	}
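Two computations the patch leaves untouched are worth spelling out: the spin budget is capped at MIN (spin_count, __spins * 2 + 100), and after each acquisition __spins is updated with an exponentially weighted moving average with weight 1/8 on the latest observed spin count. A small standalone sketch (function names are local stand-ins, not glibc's):

```c
#include <assert.h>

/* Spin budget: the smaller of the global tunable and twice the
   per-mutex moving average plus a constant floor, mirroring the
   MIN (...) expression in __pthread_mutex_lock.  */
static int max_spin_budget (int tunable, int spins)
{
  int dynamic = spins * 2 + 100;
  return tunable < dynamic ? tunable : dynamic;
}

/* Per-mutex spin estimate: exponentially weighted moving average with
   weight 1/8 on the latest observed spin count, in integer
   arithmetic.  */
static int update_spins (int spins, int cnt)
{
  return spins + (cnt - spins) / 8;
}
```

Because the update uses integer division, repeated application converges to within 8 of the observed spin count rather than hitting it exactly; the feedback loop then keeps max_cnt roughly proportional to the recent contention level.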