From patchwork Tue Jan 17 15:28:31 2017
X-Patchwork-Submitter: Stefan Liebler
X-Patchwork-Id: 18932
From: Stefan Liebler
Subject: Re: [PATCH 4/4] S390: Optimize lock-elision by decrementing adapt_count at unlock.
To: libc-alpha@sourceware.org
References: <1481032315-12420-1-git-send-email-stli@linux.vnet.ibm.com>
 <1481032315-12420-4-git-send-email-stli@linux.vnet.ibm.com>
 <1484132014.5606.256.camel@redhat.com>
Date: Tue, 17 Jan 2017 16:28:31 +0100
In-Reply-To: <1484132014.5606.256.camel@redhat.com>
Message-Id: <036b04fb-3f33-4e6b-2591-c5af87539cad@linux.vnet.ibm.com>

On 01/11/2017 11:53 AM, Torvald Riegel wrote:
> On Tue, 2016-12-06 at 14:51 +0100, Stefan Liebler wrote:
>> This patch decrements the adapt_count while unlocking the futex
>> instead of before acquiring the futex, as is done on Power, too.
>> Furthermore, a transaction is only started if the futex is currently
>> free.  This check is done after starting the transaction, too.
>> If the futex is not free and the transaction nesting depth is one,
>> we can simply end the started transaction instead of aborting it.
>> The implementation of this check was faulty as it always ended the
>> started transaction.  By using the fallback path, the outermost
>> transaction was aborted.  Now the outermost transaction is aborted
>> directly.
>>
>> This patch also adds some commentary and aligns the code in
>> elision-trylock.c to the code in elision-lock.c as far as possible.
>
> I don't think this is quite ready yet.  See below for details.
>
> I'm not too concerned about this fact, given that it's just in
> s390-specific code.  But generally, I'd prefer if arch-specific code
> aimed for the same quality and level of consensus as what we aim for
> in generic code.
>
>> ChangeLog:
>>
>> 	* sysdeps/unix/sysv/linux/s390/lowlevellock.h
>> 	(__lll_unlock_elision, lll_unlock_elision): Add adapt_count
>> 	argument.
>> 	* sysdeps/unix/sysv/linux/s390/elision-lock.c
>> 	(__lll_lock_elision): Decrement adapt_count while unlocking
>> 	instead of before locking.
>> 	* sysdeps/unix/sysv/linux/s390/elision-trylock.c
>> 	(__lll_trylock_elision): Likewise.
>> 	* sysdeps/unix/sysv/linux/s390/elision-unlock.c
>> 	(__lll_unlock_elision): Likewise.
>> ---
>>  sysdeps/unix/sysv/linux/s390/elision-lock.c    | 37 ++++++++-------
>>  sysdeps/unix/sysv/linux/s390/elision-trylock.c | 62 ++++++++++++++------------
>>  sysdeps/unix/sysv/linux/s390/elision-unlock.c  | 29 ++++++++++--
>>  sysdeps/unix/sysv/linux/s390/lowlevellock.h    |  4 +-
>>  4 files changed, 78 insertions(+), 54 deletions(-)
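For orientation before the hunks below, the scheme that this mail converges
on can be condensed as follows.  This is an illustrative sketch only, not
the literal glibc code: it reuses the identifiers from the patch (aconf,
__libc_tbegin, __libc_tend, _HTM_TBEGIN_STARTED, LLL_LOCK, lll_unlock) and
omits the retry, nesting-depth and abort-code handling shown in the diffs.

/* Sketch: adapt_count is only decremented at unlock, and elision is
   only attempted while the hint does not advise against it.  */
int
sketch_lock (int *futex, short *adapt_count, int private)
{
  /* adapt_count is just a hint; relaxed MO is sufficient.  */
  if (atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0)
    {
      if (__libc_tbegin ((void *) 0) == _HTM_TBEGIN_STARTED)
	{
	  /* Load the futex within the transaction: this pulls it into
	     the read set, so a concurrent acquisition aborts us.  */
	  if (atomic_load_relaxed (futex) == 0)
	    return 0;		/* Elided: user code runs transactionally.  */
	  /* Lock busy: end (not abort) the non-nested transaction and
	     skip elision for a while.  */
	  __libc_tend ();
	  atomic_store_relaxed (adapt_count, aconf.skip_lock_busy);
	}
    }
  return LLL_LOCK (*futex, private);	/* Normal locking fallback.  */
}

int
sketch_unlock (int *futex, short *adapt_count, int private)
{
  if (atomic_load_relaxed (futex) == 0)
    __libc_tend ();		/* Lock was elided: commit.  */
  else
    {
      /* New in this patch: decrement the hint while still holding the
	 lock, instead of on the lock path.  */
      short val = atomic_load_relaxed (adapt_count);
      if (val > 0)
	atomic_store_relaxed (adapt_count, val - 1);
      lll_unlock (*futex, private);
    }
  return 0;
}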
>> diff --git a/sysdeps/unix/sysv/linux/s390/elision-lock.c b/sysdeps/unix/sysv/linux/s390/elision-lock.c
>> index 3dd7fbc..4a7d546 100644
>> --- a/sysdeps/unix/sysv/linux/s390/elision-lock.c
>> +++ b/sysdeps/unix/sysv/linux/s390/elision-lock.c
>> @@ -50,31 +50,30 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
>>       critical section uses lock elision) and outside of transactions.  Thus,
>>       we need to use atomic accesses to avoid data races.  However, the
>>       value of adapt_count is just a hint, so relaxed MO accesses are
>> -     sufficient.  */
>> -  if (atomic_load_relaxed (adapt_count) > 0)
>> -    {
>> -      /* Lost updates are possible, but harmless.  Due to races this might lead
>> -	 to *adapt_count becoming less than zero.  */
>> -      atomic_store_relaxed (adapt_count,
>> -			    atomic_load_relaxed (adapt_count) - 1);
>> -      goto use_lock;
>> -    }
>> -
>> -  if (aconf.try_tbegin > 0)
>> +     sufficient.
>> +     Do not begin a transaction if another cpu has locked the
>> +     futex with normal locking.  If adapt_count is zero, it remains and the
>> +     next pthread_mutex_lock call will try to start a transaction again.  */
>
> This seems to make an assumption about performance that should be
> explained in the comment.  IIRC, x86 LE does not make this assumption,
> so it's not generally true.  I suppose s390 aborts are really expensive,
> and you don't expect that a lock is in the acquired state often enough
> so that aborts are overall more costly than the overhead of the
> additional load and branch?
>
Yes, aborting a transaction is expensive.  But you are right, there is an
additional load and branch, and this thread will wait in LLL_LOCK for
another thread to release the futex anyway.  See the example below.

>> +  if (atomic_load_relaxed (futex) == 0
>> +      && atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0)
>>      {
>>        int status = __libc_tbegin_retry ((void *) 0, aconf.try_tbegin - 1);
>>        if (__builtin_expect (status == _HTM_TBEGIN_STARTED,
>>  			    _HTM_TBEGIN_STARTED))
>>  	{
>> -	  if (__builtin_expect (*futex == 0, 1))
>> +	  /* Check the futex to make sure nobody has touched it in the
>> +	     mean time.  This forces the futex into the cache and makes
>> +	     sure the transaction aborts if some other cpu uses the
>> +	     lock (writes the futex).  */
>
> s/cpu/CPU/
>
I'll correct it in the whole file and in elision-trylock.c.

> I'd also say "if another thread acquires the lock concurrently" instead
> of the last part of that sentence.
>
>> +	  if (__builtin_expect (atomic_load_relaxed (futex) == 0, 1))
>>  	    /* Lock was free.  Return to user code in a transaction.  */
>>  	    return 0;
>>
>>  	  /* Lock was busy.  Fall back to normal locking.  */
>> -	  if (__builtin_expect (__libc_tx_nesting_depth (), 1))
>> +	  if (__builtin_expect (__libc_tx_nesting_depth () <= 1, 1))
>
> Use __glibc_likely?
>
okay.

>>  	    {
>>  	      /* In a non-nested transaction there is no need to abort,
>> -		 which is expensive.  */
>> +		 which is expensive.  Simply end the started transaction.  */
>>  	      __libc_tend ();
>>  	      /* Don't try to use transactions for the next couple of times.
>>  		 See above for why relaxed MO is sufficient.  */
>> @@ -92,9 +91,9 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
>>  	     is zero.
>>  	     The adapt_count of this inner mutex is not changed,
>>  	     because using the default lock with the inner mutex
>> -	     would abort the outer transaction.
>> -	  */
>> +	     would abort the outer transaction.  */
>>  	  __libc_tabort (_HTM_FIRST_USER_ABORT_CODE | 1);
>> +	  __builtin_unreachable ();
>>  	}
>>      }
>>    else if (status != _HTM_TBEGIN_TRANSIENT)
>> @@ -110,15 +109,15 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
>>      }
>>    else
>>      {
>> -      /* Same logic as above, but for for a number of temporary failures in
>> -	 a row.  */
>> +      /* The transaction failed for some retries with
>> +	 _HTM_TBEGIN_TRANSIENT.  Use the normal locking now and for the
>> +	 next couple of calls.  */
>>        if (aconf.skip_lock_out_of_tbegin_retries > 0)
>>  	atomic_store_relaxed (adapt_count,
>>  			      aconf.skip_lock_out_of_tbegin_retries);
>>      }
>>    }
>>
>> -  use_lock:
>>    /* Use normal locking as fallback path if transaction does not succeed.  */
>>    return LLL_LOCK ((*futex), private);
>>  }
>> diff --git a/sysdeps/unix/sysv/linux/s390/elision-trylock.c b/sysdeps/unix/sysv/linux/s390/elision-trylock.c
>> index e21fc26..dee66d4 100644
>> --- a/sysdeps/unix/sysv/linux/s390/elision-trylock.c
>> +++ b/sysdeps/unix/sysv/linux/s390/elision-trylock.c
>> @@ -43,23 +43,36 @@ __lll_trylock_elision (int *futex, short *adapt_count)
>>  	 until their try_tbegin is zero.
>>        */
>>        __libc_tabort (_HTM_FIRST_USER_ABORT_CODE | 1);
>> +      __builtin_unreachable ();
>>      }
>>
>> -  /* Only try a transaction if it's worth it.  See __lll_lock_elision for
>> -     why we need atomic accesses.  Relaxed MO is sufficient because this is
>> -     just a hint.  */
>> -  if (atomic_load_relaxed (adapt_count) <= 0)
>> +  /* adapt_count can be accessed concurrently; these accesses can be both
>> +     inside of transactions (if critical sections are nested and the outer
>> +     critical section uses lock elision) and outside of transactions.  Thus,
>> +     we need to use atomic accesses to avoid data races.  However, the
>> +     value of adapt_count is just a hint, so relaxed MO accesses are
>> +     sufficient.
>> +     Do not begin a transaction if another cpu has locked the
>> +     futex with normal locking.  If adapt_count is zero, it remains and the
>> +     next pthread_mutex_lock call will try to start a transaction again.  */
>> +  if (atomic_load_relaxed (futex) == 0
>> +      && atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0)
>>      {
>> -      int status;
>> -
>> -      if (__builtin_expect
>> -	  ((status = __libc_tbegin ((void *) 0)) == _HTM_TBEGIN_STARTED, 1))
>> +      int status = __libc_tbegin ((void *) 0);
>> +      if (__builtin_expect (status == _HTM_TBEGIN_STARTED,
>> +			    _HTM_TBEGIN_STARTED))
>>  	{
>> -	  if (*futex == 0)
>> +	  /* Check the futex to make sure nobody has touched it in the
>> +	     mean time.  This forces the futex into the cache and makes
>> +	     sure the transaction aborts if some other cpu uses the
>> +	     lock (writes the futex).  */
>> +	  if (__builtin_expect (atomic_load_relaxed (futex) == 0, 1))
>
> __glibc_likely
>
okay.

>> +	    /* Lock was free.  Return to user code in a transaction.  */
>>  	    return 0;
>> -	  /* Lock was busy.  Fall back to normal locking.  */
>> -	  /* Since we are in a non-nested transaction there is no need to abort,
>> -	     which is expensive.  */
>> +
>> +	  /* Lock was busy.  Fall back to normal locking.  Since we are in
>> +	     a non-nested transaction there is no need to abort, which is
>> +	     expensive.  Simply end the started transaction.  */
>>  	  __libc_tend ();
>>  	  /* Note: Changing the adapt_count here might abort a transaction on a
>>  	     different cpu, but that could happen anyway when the futex is
>> @@ -68,27 +81,18 @@ __lll_trylock_elision (int *futex, short *adapt_count)
>>  	  if (aconf.skip_lock_busy > 0)
>>  	    atomic_store_relaxed (adapt_count, aconf.skip_lock_busy);
>>  	}
>> -      else
>> +      else if (status != _HTM_TBEGIN_TRANSIENT)
>>  	{
>> -	  if (status != _HTM_TBEGIN_TRANSIENT)
>> -	    {
>> -	      /* A persistent abort (cc 1 or 3) indicates that a retry is
>> -		 probably futile.  Use the normal locking now and for the
>> -		 next couple of calls.
>> -		 Be careful to avoid writing to the lock.  */
>> -	      if (aconf.skip_trylock_internal_abort > 0)
>> -		*adapt_count = aconf.skip_trylock_internal_abort;
>> -	    }
>> +	  /* A persistent abort (cc 1 or 3) indicates that a retry is
>> +	     probably futile.  Use the normal locking now and for the
>> +	     next couple of calls.
>> +	     Be careful to avoid writing to the lock.  */
>> +	  if (aconf.skip_trylock_internal_abort > 0)
>> +	    *adapt_count = aconf.skip_trylock_internal_abort;
>>  	}
>>        /* Could do some retries here.  */
>>      }
>> -  else
>> -    {
>> -      /* Lost updates are possible, but harmless.  Due to races this might lead
>> -	 to *adapt_count becoming less than zero.  */
>> -      atomic_store_relaxed (adapt_count,
>> -			    atomic_load_relaxed (adapt_count) - 1);
>> -    }
>>
>> +  /* Use normal locking as fallback path if transaction does not succeed.  */
>>    return lll_trylock (*futex);
>>  }
>> diff --git a/sysdeps/unix/sysv/linux/s390/elision-unlock.c b/sysdeps/unix/sysv/linux/s390/elision-unlock.c
>> index 0b1ade9..e68d970 100644
>> --- a/sysdeps/unix/sysv/linux/s390/elision-unlock.c
>> +++ b/sysdeps/unix/sysv/linux/s390/elision-unlock.c
>> @@ -21,16 +21,37 @@
>>  #include <htm.h>
>>
>>  int
>> -__lll_unlock_elision(int *futex, int private)
>> +__lll_unlock_elision(int *futex, short *adapt_count, int private)
>>  {
>>    /* If the lock is free, we elided the lock earlier.  This does not
>>       necessarily mean that we are in a transaction, because the user code may
>> -     have closed the transaction, but that is impossible to detect reliably.  */
>> -  if (*futex == 0)
>> +     have closed the transaction, but that is impossible to detect reliably.
>> +     Relaxed MO access to futex is sufficient as we only need a hint, if we
>
> This comment is incorrect because we don't just need a hint here.
> The reason why relaxed MO is sufficient is because a correct program
> will only release a lock it has acquired; therefore, it must either
> have changed the futex word's value to something !=0 or it must have
> used elision; these are actions by the same thread, so these actions
> are sequenced-before the relaxed load (and thus also happens-before
> the relaxed load).  Therefore, relaxed MO is sufficient.
>
okay.

>> +     started a transaction or acquired the futex in e.g. elision-lock.c.  */
>> +  if (atomic_load_relaxed (futex) == 0)
>>      {
>>        __libc_tend ();
>>      }
>>    else
>> -    lll_unlock ((*futex), private);
>> +    {
>> +      /* Update the adapt_count while unlocking before completing the critical
>> +	 section.  adapt_count is accessed concurrently outside of a
>> +	 transaction or an aquired lock e.g. in elision-lock.c so we need to use
>
> transaction or a critical section (e.g., in elision-lock.c); so, we need
> to use
>
okay
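Torvald's happens-before argument can be checked with a small standalone
C11 program (hypothetical, not glibc code): a relaxed load always observes
the latest store that the same thread performed earlier, which is exactly
the situation on the non-elided unlock path.

#include <assert.h>
#include <stdatomic.h>

static atomic_int futex_word;	/* Stands in for the lock's futex word.  */

int
main (void)
{
  /* Non-elided acquire path: this thread itself wrote a non-zero value
     into the futex word...  */
  atomic_store_explicit (&futex_word, 1, memory_order_relaxed);

  /* ...so on the unlock path the same thread's relaxed load is
     guaranteed to see that non-zero value (the store is
     sequenced-before the load); in a correct program no other thread
     writes to a lock we own.  */
  assert (atomic_load_explicit (&futex_word, memory_order_relaxed) != 0);
  return 0;
}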
>> +	 atomic accesses.  However, the value of adapt_count is just a hint, so
>> +	 relaxed MO accesses are sufficient.
>> +	 If adapt_count would be decremented while locking, multiple
>> +	 CPUs trying to lock the locked mutex will decrement adapt_count to
>> +	 zero and another CPU will try to start a transaction, which will be
>> +	 immediately aborted as the mutex is locked.
>
> I don't think this is necessarily the case.  It is true that if more
> than one thread decrements, only one would immediately try to use
> elision (because only one decrements from 1 (ignoring lost updates)).
> However, if you decrement in the critical section, and lock acquisitions
> wait until the lock is free *before* loading adapt_count and choosing
> whether to use elision or not, then it shouldn't matter whether you
> decrement closer to the lock acquisition or closer to the release.

Waiting for a free lock is done by a futex syscall within LLL_LOCK (see
__lll_lock_wait/__lll_lock_wait_private).  On wakeup, the lock is
immediately acquired if it is free.  Afterwards there is no loading of
adapt_count and no decision whether to use elision or not.  Following your
suggestion would mean that we need a further implementation like
__lll_lock_wait/__lll_lock_wait_private in order to wait for a free lock
and only then load adapt_count and choose whether to elide or not!

Please have a look at the following example, assuming adapt_count = 1.
There are two scenarios below; imagine that only "Thread 3a" or
"Thread 3b" is used.

Decrement adapt_count while locking (without this patch):
- Thread 1 __lll_lock_elision: decrements adapt_count to 0 and acquires
  the lock via LLL_LOCK.
- Thread 2 __lll_lock_elision: starts a transaction and ends / aborts it
  immediately as the lock is acquired.  adapt_count is set to 3.
  LLL_LOCK waits until the lock is released by Thread 1.
- Thread 1 __lll_unlock_elision: releases the lock.
- Thread 2 __lll_lock_elision: wakes up and acquires the lock via the
  waiting LLL_LOCK.
- Thread 3a __lll_lock_elision: decrements adapt_count to 2 and waits via
  LLL_LOCK until the lock is released by Thread 2.
- Thread 2 __lll_unlock_elision: releases the lock.
- Thread 3b __lll_lock_elision: decrements adapt_count to 2 and acquires
  the lock via LLL_LOCK.

Decrement adapt_count while unlocking (with this patch):
- Thread 1 __lll_lock_elision: acquires the lock via LLL_LOCK.
  adapt_count remains 1.
- Thread 2 __lll_lock_elision: LLL_LOCK is used as the futex is acquired
  by Thread 1 or adapt_count > 0.  It waits until the lock is released by
  Thread 1.
- Thread 1 __lll_unlock_elision: decrements adapt_count to 0 and releases
  the lock.
- Thread 2 __lll_lock_elision: wakes up and acquires the lock via the
  waiting LLL_LOCK.
- Thread 3a __lll_lock_elision: LLL_LOCK is used as the futex is acquired
  by Thread 2.
- Thread 2 __lll_unlock_elision: releases the lock.  adapt_count
  remains 0.
- Thread 3b __lll_lock_elision: starts a transaction.

If the futex is not tested before starting a transaction (i.e. the
additional load and branch are omitted), Thread 3a will start and abort a
transaction, set adapt_count to 3 and then wait via LLL_LOCK.  In the case
of Thread 3b, a transaction will be started.  The attached diff removes
the futex == 0 test.  Later I will make one patch with changelog for this
and the other two patches.
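A way to compare the two adapt_count placements empirically, as requested
below, would be a small contention microbenchmark along the following
lines (hypothetical, not part of this patch); run it, e.g. under time(1),
against a glibc built with --enable-lock-elision for each variant.

#include <pthread.h>
#include <stdio.h>

enum { NTHREADS = 4, ITERS = 1000000 };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void *
work (void *arg)
{
  (void) arg;
  for (int i = 0; i < ITERS; i++)
    {
      pthread_mutex_lock (&lock);
      counter++;		/* Tiny critical section.  */
      pthread_mutex_unlock (&lock);
    }
  return NULL;
}

int
main (void)
{
  pthread_t t[NTHREADS];
  for (int i = 0; i < NTHREADS; i++)
    pthread_create (&t[i], NULL, work, NULL);
  for (int i = 0; i < NTHREADS; i++)
    pthread_join (t[i], NULL);
  printf ("counter = %ld\n", counter);
  return 0;
}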
> I think this needs a more thorough analysis (including better
> documentation) and/or a microbenchmark.
>
>> +	 If adapt_count would be decremented while unlocking after completing
>> +	 the critical section, possible waiters will be waked up before
>> +	 decrementing the adapt_count.  Those waked up waiters could have
>> +	 destroyed and freed this mutex!  */
>
> Please fix this sentence.  Or you could just say that POSIX' mutex
> destruction requirements disallow accesses to a mutex after it has
> been released and thus could have been acquired by another thread.
>
Okay.
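The destruction requirement can be made concrete with a small program
(again hypothetical, not glibc code): as soon as the futex word is
released, another thread may legally destroy and free the mutex, so any
adapt_count store must happen before that release.

#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t *m;

static void *
worker (void *arg)
{
  (void) arg;
  pthread_mutex_lock (m);	/* Blocks until main unlocks.  */
  pthread_mutex_unlock (m);
  /* POSIX allows destroying a mutex once it is unlocked and no thread
     will touch it anymore.  */
  pthread_mutex_destroy (m);
  free (m);
  return NULL;
}

int
main (void)
{
  pthread_t t;
  m = malloc (sizeof (pthread_mutex_t));
  pthread_mutex_init (m, NULL);
  pthread_mutex_lock (m);
  pthread_create (&t, NULL, worker, NULL);
  /* While we still hold the lock, updating per-mutex bookkeeping such
     as adapt_count is safe...  */
  pthread_mutex_unlock (m);
  /* ...but from here on the worker may already have destroyed and
     freed *m, so the unlocking thread must not access the mutex
     again.  */
  pthread_join (t, NULL);
  return 0;
}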
>> +      short adapt_count_val = atomic_load_relaxed (adapt_count);
>> +      if (adapt_count_val > 0)
>> +	atomic_store_relaxed (adapt_count, adapt_count_val - 1);
>> +
>> +      lll_unlock ((*futex), private);
>> +    }
>>    return 0;
>>  }
>> diff --git a/sysdeps/unix/sysv/linux/s390/lowlevellock.h b/sysdeps/unix/sysv/linux/s390/lowlevellock.h
>> index 8d564ed..c60f4f7 100644
>> --- a/sysdeps/unix/sysv/linux/s390/lowlevellock.h
>> +++ b/sysdeps/unix/sysv/linux/s390/lowlevellock.h
>> @@ -50,7 +50,7 @@ extern int __lll_timedlock_elision
>>  extern int __lll_lock_elision (int *futex, short *adapt_count, int private)
>>    attribute_hidden;
>>
>> -extern int __lll_unlock_elision(int *futex, int private)
>> +extern int __lll_unlock_elision(int *futex, short *adapt_count, int private)
>>    attribute_hidden;
>>
>>  extern int __lll_trylock_elision(int *futex, short *adapt_count)
>> @@ -59,7 +59,7 @@ extern int __lll_trylock_elision(int *futex, short *adapt_count)
>>  # define lll_lock_elision(futex, adapt_count, private) \
>>    __lll_lock_elision (&(futex), &(adapt_count), private)
>>  # define lll_unlock_elision(futex, adapt_count, private) \
>> -  __lll_unlock_elision (&(futex), private)
>> +  __lll_unlock_elision (&(futex), &(adapt_count), private)
>>  # define lll_trylock_elision(futex, adapt_count) \
>>    __lll_trylock_elision(&(futex), &(adapt_count))
>>  #endif  /* ENABLE_LOCK_ELISION */
>
>

diff --git a/sysdeps/unix/sysv/linux/s390/elision-lock.c b/sysdeps/unix/sysv/linux/s390/elision-lock.c
index faa998e..0081537 100644
--- a/sysdeps/unix/sysv/linux/s390/elision-lock.c
+++ b/sysdeps/unix/sysv/linux/s390/elision-lock.c
@@ -50,30 +50,28 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
      critical section uses lock elision) and outside of transactions.  Thus,
      we need to use atomic accesses to avoid data races.  However, the
      value of adapt_count is just a hint, so relaxed MO accesses are
-     sufficient.
-     Do not begin a transaction if another cpu has locked the
-     futex with normal locking.  If adapt_count is zero, it remains and the
-     next pthread_mutex_lock call will try to start a transaction again.  */
-  if (atomic_load_relaxed (futex) == 0
-      && atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0)
+     sufficient.  */
+  if (atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0)
     {
       /* Start a transaction and retry it automatically if it aborts with
 	 _HTM_TBEGIN_TRANSIENT.  This macro calls tbegin at most retry_cnt
 	 + 1 times.  The second argument is considered as retry_cnt.  */
       int status = __libc_tbegin_retry ((void *) 0, aconf.try_tbegin - 1);
-      if (__builtin_expect (status == _HTM_TBEGIN_STARTED,
-			    _HTM_TBEGIN_STARTED))
+      if (__glibc_likely (status == _HTM_TBEGIN_STARTED))
 	{
 	  /* Check the futex to make sure nobody has touched it in the
 	     mean time.  This forces the futex into the cache and makes
-	     sure the transaction aborts if some other cpu uses the
-	     lock (writes the futex).  */
-	  if (__builtin_expect (atomic_load_relaxed (futex) == 0, 1))
+	     sure the transaction aborts if another thread acquires the lock
+	     concurrently.  */
+	  if (__glibc_likely (atomic_load_relaxed (futex) == 0))
 	    /* Lock was free.  Return to user code in a transaction.  */
 	    return 0;

-	  /* Lock was busy.  Fall back to normal locking.  */
-	  if (__builtin_expect (__libc_tx_nesting_depth () <= 1, 1))
+	  /* Lock was busy.  Fall back to normal locking.
+	     This can be the case if e.g. adapt_count was decremented to zero
+	     by a former release and another thread has been woken up and
+	     acquired it.  */
+	  if (__glibc_likely (__libc_tx_nesting_depth () <= 1))
 	    {
 	      /* In a non-nested transaction there is no need to abort,
 		 which is expensive.  Simply end the started transaction.  */
diff --git a/sysdeps/unix/sysv/linux/s390/elision-trylock.c b/sysdeps/unix/sysv/linux/s390/elision-trylock.c
index eec172a..aa09073 100644
--- a/sysdeps/unix/sysv/linux/s390/elision-trylock.c
+++ b/sysdeps/unix/sysv/linux/s390/elision-trylock.c
@@ -51,31 +51,29 @@ __lll_trylock_elision (int *futex, short *adapt_count)
      critical section uses lock elision) and outside of transactions.  Thus,
      we need to use atomic accesses to avoid data races.  However, the
      value of adapt_count is just a hint, so relaxed MO accesses are
-     sufficient.
-     Do not begin a transaction if another cpu has locked the
-     futex with normal locking.  If adapt_count is zero, it remains and the
-     next pthread_mutex_lock call will try to start a transaction again.  */
-  if (atomic_load_relaxed (futex) == 0
-      && atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0)
+     sufficient.  */
+  if (atomic_load_relaxed (adapt_count) <= 0 && aconf.try_tbegin > 0)
     {
       int status = __libc_tbegin ((void *) 0);
-      if (__builtin_expect (status == _HTM_TBEGIN_STARTED,
-			    _HTM_TBEGIN_STARTED))
+      if (__glibc_likely (status == _HTM_TBEGIN_STARTED))
 	{
 	  /* Check the futex to make sure nobody has touched it in the
 	     mean time.  This forces the futex into the cache and makes
-	     sure the transaction aborts if some other cpu uses the
-	     lock (writes the futex).  */
-	  if (__builtin_expect (atomic_load_relaxed (futex) == 0, 1))
+	     sure the transaction aborts if another thread acquires the lock
+	     concurrently.  */
+	  if (__glibc_likely (atomic_load_relaxed (futex) == 0))
 	    /* Lock was free.  Return to user code in a transaction.  */
 	    return 0;

-	  /* Lock was busy.  Fall back to normal locking.  Since we are in
-	     a non-nested transaction there is no need to abort, which is
-	     expensive.  Simply end the started transaction.  */
+	  /* Lock was busy.  Fall back to normal locking.
+	     This can be the case if e.g. adapt_count was decremented to zero
+	     by a former release and another thread has been woken up and
+	     acquired it.
+	     Since we are in a non-nested transaction there is no need to abort,
+	     which is expensive.  Simply end the started transaction.  */
 	  __libc_tend ();
 	  /* Note: Changing the adapt_count here might abort a transaction on a
-	     different cpu, but that could happen anyway when the futex is
+	     different CPU, but that could happen anyway when the futex is
 	     acquired, so there's no need to check the nesting depth here.
 	     See above for why relaxed MO is sufficient.  */
 	  if (aconf.skip_lock_busy > 0)
 	    atomic_store_relaxed (adapt_count, aconf.skip_lock_busy);
diff --git a/sysdeps/unix/sysv/linux/s390/elision-unlock.c b/sysdeps/unix/sysv/linux/s390/elision-unlock.c
index d9205fa..c062d71 100644
--- a/sysdeps/unix/sysv/linux/s390/elision-unlock.c
+++ b/sysdeps/unix/sysv/linux/s390/elision-unlock.c
@@ -26,8 +26,12 @@ __lll_unlock_elision(int *futex, short *adapt_count, int private)
   /* If the lock is free, we elided the lock earlier.  This does not
      necessarily mean that we are in a transaction, because the user code may
      have closed the transaction, but that is impossible to detect reliably.
-     Relaxed MO access to futex is sufficient as we only need a hint, if we
-     started a transaction or acquired the futex in e.g. elision-lock.c.  */
+     Relaxed MO access to futex is sufficient because a correct program
+     will only release a lock it has acquired; therefore, it must either
+     have changed the futex word's value to something !=0 or it must have
+     used elision; these are actions by the same thread, so these actions
+     are sequenced-before the relaxed load (and thus also happens-before
+     the relaxed load).  Therefore, relaxed MO is sufficient.  */
   if (atomic_load_relaxed (futex) == 0)
     {
       __libc_tend ();
@@ -36,17 +40,17 @@ __lll_unlock_elision(int *futex, short *adapt_count, int private)
     {
       /* Update the adapt_count while unlocking before completing the critical
 	 section.  adapt_count is accessed concurrently outside of a
-	 transaction or an aquired lock e.g. in elision-lock.c so we need to use
-	 atomic accesses.  However, the value of adapt_count is just a hint, so
-	 relaxed MO accesses are sufficient.
+	 transaction or a critical section (e.g. in elision-lock.c).  So we need
+	 to use atomic accesses.  However, the value of adapt_count is just a
+	 hint, so relaxed MO accesses are sufficient.
 	 If adapt_count would be decremented while locking, multiple
-	 CPUs trying to lock the locked mutex will decrement adapt_count to
+	 CPUs, trying to lock the acquired mutex, will decrement adapt_count to
 	 zero and another CPU will try to start a transaction, which will be
 	 immediately aborted as the mutex is locked.
-	 If adapt_count would be decremented while unlocking after completing
-	 the critical section, possible waiters will be waked up before
-	 decrementing the adapt_count.  Those waked up waiters could have
-	 destroyed and freed this mutex!  */
+	 The update of adapt_count is done before releasing the lock as POSIX'
+	 mutex destruction requirements disallow accesses to the mutex after it
+	 has been released and thus could have been acquired or destroyed by
+	 another thread.  */
       short adapt_count_val = atomic_load_relaxed (adapt_count);
       if (adapt_count_val > 0)
 	atomic_store_relaxed (adapt_count, adapt_count_val - 1);
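For completeness, the retry semantics mentioned in the elision-lock.c
comment above ("This macro calls tbegin at most retry_cnt + 1 times") can
be pictured roughly as the following loop.  This is a sketch only; the
real __libc_tbegin_retry is a macro in the s390 htm.h and may do more
between retries (e.g. request a processor assist).

/* Hypothetical C rendering of the retry behavior: retry tbegin on
   transient aborts only, at most retry_cnt extra times, so tbegin is
   executed at most retry_cnt + 1 times.  */
static int
tbegin_retry_sketch (void *tdb, int retry_cnt)
{
  int status = __libc_tbegin (tdb);
  while (status == _HTM_TBEGIN_TRANSIENT && retry_cnt-- > 0)
    status = __libc_tbegin (tdb);
  return status;
}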