From patchwork Wed Feb 15 09:35:56 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Stefan Liebler X-Patchwork-Id: 19262 Received: (qmail 21513 invoked by alias); 15 Feb 2017 09:36:14 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 21460 invoked by uid 89); 15 Feb 2017 09:36:13 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.9 required=5.0 tests=AWL, BAYES_50, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW, URIBL_RED autolearn=no version=3.3.2 spammy=qualification, exchanges, nonetheless, acquired X-HELO: mx0a-001b2d01.pphosted.com From: Stefan Liebler Subject: Re: [PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros. To: libc-alpha@sourceware.org References: <1481905917-15654-1-git-send-email-stli@linux.vnet.ibm.com> <5857CF10.1060100@arm.com> <628f6311-239c-5eea-572c-c2acae6fcbee@linux.vnet.ibm.com> <1487017743.16322.80.camel@redhat.com> Date: Wed, 15 Feb 2017 10:35:56 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0 MIME-Version: 1.0 In-Reply-To: <1487017743.16322.80.camel@redhat.com> X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 17021509-0024-0000-0000-000002B64823 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17021509-0025-0000-0000-000022696316 Message-Id: <60a34645-17e4-6693-1343-03c55b0c47ad@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-02-15_05:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=4 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1612050000 definitions=main-1702150095 On 02/13/2017 09:29 PM, Torvald Riegel wrote: > Thanks for working on this. Detailed comments below. > > Generally, I think we need to keep working on optimizing this (this can > happen in follow-up patches). For example, we should have > microbenchmarks for the exchange vs. CAS choice. > > Also, some of the tuning knobs such as > SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG apply to atomics in general and > not just the spinlock. I'd prefer if these where in a state in which we > could add them as property that's part of the atomics interface. > > On Wed, 2017-02-08 at 15:49 +0100, Stefan Liebler wrote: >> diff --git a/include/atomic.h b/include/atomic.h >> index 7f32640..770db4a 100644 >> --- a/include/atomic.h >> >> +++ b/include/atomic.h >> >> @@ -54,7 +54,7 @@ >> and following args. */ >> #define __atomic_val_bysize(pre, post, mem, ...) \ >> ({ \ >> - __typeof (*mem) __atg1_result; \ >> >> + __typeof ((__typeof (*(mem))) *(mem)) __atg1_result; \ >> >> if (sizeof (*mem) == 1) \ >> __atg1_result = pre##_8_##post (mem, __VA_ARGS__); \ >> else if (sizeof (*mem) == 2) \ >> @@ -162,9 +162,9 @@ >> /* Store NEWVALUE in *MEM and return the old value. */ >> #ifndef atomic_exchange_acq >> # define atomic_exchange_acq(mem, newvalue) \ >> - ({ __typeof (*(mem)) __atg5_oldval; \ >> >> + ({ __typeof ((__typeof (*(mem))) *(mem)) __atg5_oldval; \ >> >> __typeof (mem) __atg5_memp = (mem); \ >> - __typeof (*(mem)) __atg5_value = (newvalue); \ >> >> + __typeof ((__typeof (*(mem))) *(mem)) __atg5_value = (newvalue); \ >> >> \ >> do \ >> __atg5_oldval = *__atg5_memp; \ >> @@ -668,7 +668,7 @@ void __atomic_link_error (void); >> >> # ifndef atomic_load_relaxed >> # define atomic_load_relaxed(mem) \ >> - ({ __typeof (*(mem)) __atg100_val; \ >> >> + ({ __typeof ((__typeof (*(mem))) *(mem)) __atg100_val; \ >> >> __asm ("" : "=r" (__atg100_val) : "0" (*(mem))); \ >> __atg100_val; }) >> # endif > > You could keep these changes, but it's a bit odd because you only apply > them for the functions you needed them for. I think it would be better > to just remove the volatile qualification in the caller (ie, cast lock > to nonvolatile in pthread_spin_lock etc. > Yes, I've only applied them for the functions needed in spinlock-code. Removing volatile from type pthread_spinlock_t is no option and casting the volatile pointer to a non-volatile pointer is undefined. See comment from Florian: On 12/16/2016 09:12 PM, Florian Weimer wrote: > That's undefined: > > “If an attempt is made to refer to an object defined with a > volatile-qualified type through use of an lvalue with > non-volatile-qualified type, the behavior is undefined.” > > But we cannot drop the volatile qualifier from the definition of > pthread_spinlock_t because it would change the C++ mangling of > pthread_spinlock_t * and similar types. >> diff --git a/nptl/pthread_spin_init.c b/nptl/pthread_spin_init.c >> index 01dec5e..bca3590 100644 >> --- a/nptl/pthread_spin_init.c >> >> +++ b/nptl/pthread_spin_init.c >> >> @@ -22,6 +22,8 @@ >> int >> pthread_spin_init (pthread_spinlock_t *lock, int pshared) >> { >> - *lock = 0; >> >> + /* The atomic_store_relaxed is enough as we only initialize the spinlock here >> >> + and we are not in a critical region. */ > > I would change the comment to: > /* Relaxed MO is fine because this is an initializing store. */ Okay. > >> >> + atomic_store_relaxed (lock, 0); >> >> return 0; >> } >> diff --git a/nptl/pthread_spin_lock.c b/nptl/pthread_spin_lock.c >> index 4d03b78..4107b5e 100644 >> --- a/nptl/pthread_spin_lock.c >> >> +++ b/nptl/pthread_spin_lock.c >> >> @@ -21,7 +21,7 @@ >> >> /* A machine-specific version can define SPIN_LOCK_READS_BETWEEN_CMPXCHG >> to the number of plain reads that it's optimal to spin on between uses >> - of atomic_compare_and_exchange_val_acq. If spinning forever is optimal >> >> + of atomic_compare_exchange_weak_acquire. If spinning forever is optimal > > , at the end of this line. > Okay >> then use -1. If no plain reads here would ever be optimal, use 0. */ >> #ifndef SPIN_LOCK_READS_BETWEEN_CMPXCHG >> # warning machine-dependent file should define SPIN_LOCK_READS_BETWEEN_CMPXCHG >> @@ -37,10 +37,13 @@ pthread_spin_lock (pthread_spinlock_t *lock) >> when the lock is locked. >> We assume that the first try mostly will be successful, and we use >> atomic_exchange. For the subsequent tries we use >> - atomic_compare_and_exchange. */ >> >> - if (atomic_exchange_acq (lock, 1) == 0) >> >> + atomic_compare_and_exchange. >> >> + We need acquire memory order here as we need to see if another thread has >> >> + locked / unlocked this spinlock. */ > > Please change to: > We use acquire MO to synchronize-with the release MO store in > pthread_spin_unlock, and thus ensure that prior critical sections > happen-before this critical section. > Okay. >> >> + if (__glibc_likely (atomic_exchange_acquire (lock, 1) == 0)) >> >> return 0; >> >> + int val; >> >> do >> { >> /* The lock is contended and we need to wait. Going straight back >> @@ -50,20 +53,39 @@ pthread_spin_lock (pthread_spinlock_t *lock) >> On the other hand, we do want to update memory state on the local core >> once in a while to avoid spinning indefinitely until some event that >> will happen to update local memory as a side-effect. */ >> - if (SPIN_LOCK_READS_BETWEEN_CMPXCHG >= 0) >> >> + >> >> +#if SPIN_LOCK_READS_BETWEEN_CMPXCHG >= 0 >> >> + /* Use at most SPIN_LOCK_READS_BETWEEN_CMPXCHG plain reads between the >> >> + atomic compare and exchanges. */ >> Also changed this comment to: /* Use at most SPIN_LOCK_READS_BETWEEN_CMPXCHG relaxed MO reads between the atomic compare and exchanges. */ >> + int wait; >> >> + for (wait = 0; wait < SPIN_LOCK_READS_BETWEEN_CMPXCHG; wait ++) >> >> { >> - int wait = SPIN_LOCK_READS_BETWEEN_CMPXCHG; >> >> + atomic_spin_nop (); >> >> >> - while (*lock != 0 && wait > 0) >> >> - --wait; >> >> + val = atomic_load_relaxed (lock); >> >> + if (val == 0) >> >> + break; >> >> } >> - else >> >> + >> >> + /* Set expected value to zero for the next compare and exchange. */ >> >> + val = 0; >> >> + >> >> +#else /* SPIN_LOCK_READS_BETWEEN_CMPXCHG < 0 */ >> >> + /* Use plain reads until spinlock is free and then try a further atomic > >> + compare and exchange the next time. */ > > > Please change to: > Use relaxed-MO reads until we observe the lock to not be acquired > anymore. > Okay. > >> + do >> >> { >> - while (*lock != 0) >> >> - ; >> >> + atomic_spin_nop (); >> >> + >> >> + val = atomic_load_relaxed (lock); >> >> } >> + while (val != 0); >> >> + >> >> +#endif >> >> + /* We need acquire memory order here for the same reason as mentioned >> >> + for the first try to lock the spinlock. */ >> >> } >> - while (atomic_compare_and_exchange_val_acq (lock, 1, 0) != 0); >> >> + while (!atomic_compare_exchange_weak_acquire (lock, &val, 1)); >> >> >> return 0; >> } >> diff --git a/nptl/pthread_spin_trylock.c b/nptl/pthread_spin_trylock.c >> index 593bba3..942a38f 100644 >> --- a/nptl/pthread_spin_trylock.c >> >> +++ b/nptl/pthread_spin_trylock.c >> >> @@ -20,8 +20,29 @@ >> #include >> #include "pthreadP.h" >> >> +/* A machine-specific version can define >> >> + SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG to 1 if an explicit test if >> >> + lock is free is optimal. */ >> >> +#ifndef SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG >> >> +# define SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG 0 >> >> +#endif > > A while ago we tried hard to remove all code that would fail silently > when a macro had a typo in the name, for example. We still have a few > of them left (e.g., in the atomics), but I think this here doesn't > warrant an exception. > Okay. You're right. In comment of patch 2/2, you've mentioned a header where an architecture shall define those parameters. Thus I've added the file nptl/pthread_spin_parameters.h. It includes the description of the macros and it emits a warning if one architecture tries to use generic spinlock implementation without the definition of those macros. Furthermore I've added pthread_spin_parameters.h files for the architectures using the generic spinlock implementation with definition of those macros. The pthread_spin_lock.c files for those architectures are no longer required. I've added a compiler error in nptl/pthread_spin_trylock.c if the macro is not defined to zero or one. If some arch has an empty pthread_spin_parameters.h file or a file without the definition of those macros and the include_next directive, the pthread_spin_lock.c / pthread_spin_trylock.c will emit a Wundef warning. > Also, the name does not match the description. What are you really > trying to parametrize here? Do you want to do test-and-test-and-set vs. > test-and-set? Or do you want to make some statement about how CAS vs. > exchange behave in terms of runtime costs / effect on caches? Or do you > want to have a flag that says that exchange is implemented through a CAS > loop anyway, and thus it's not worth trying an exchange if we actually > want to bail out when the value is not equal to the expected value? > See below. >> >> + >> >> int >> pthread_spin_trylock (pthread_spinlock_t *lock) >> { >> - return atomic_exchange_acq (lock, 1) ? EBUSY : 0; >> >> + /* We need acquire memory order here as we need to see if another >> >> + thread has locked / unlocked this spinlock. */ > > See above. Please fix the comment in a similar way. > Changed comment to: /* We use acquire MO to synchronize-with the release MO store in pthread_spin_unlock, and thus ensure that prior critical sections happen-before this critical section. */ >> >> +#if SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG == 1 >> >> + /* Load and test the spinlock and only try to lock the spinlock if it is >> >> + free. */ >> >> + int val = atomic_load_relaxed (lock); >> >> + if (__glibc_likely (val == 0 && atomic_exchange_acquire (lock, 1) == 0)) > > I think that's not quite the same as a choice between CAS and exchange. Yes. You are right. > Doing a load first could be helpful for *both* CAS and exchange. OTOH, > if we assume that trylock will succeed most of the time, then the load > is unnecessary and might be more costly (eg, if it causes the state of > the cache line to change twice). > E.g. on s390, there exists no instruction which atomically exchanges the value. Instead a CAS loop is used. The CAS instruction locks the cache-line exclusively whether the value in memory equals the expected old-value or not. Therefore the value is loaded and compared before using the CAS instruction. If no other CPU has accessed the lock-value, the load will cause the state of the cache line to exclusive. If the lock is not acquired, the subsequent CAS instruction does not need to change the state of the cache line. If another CPU has accessed the lock-value e.g. by acquiring the lock, the load will cause the state of the cache line either to read-only or exclusive. This depends on the other CPU - whether it has already stored a new value or not. As this behaviour depends on the architecture, the architecture has to decide whether it should load and test the lock-value before the atomic-macro. I've changed the name of the macro-define to SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG. >> >> + return 0; >> >> +#else >> >> + /* Set spinlock to locked and test if we have locked it. */ > > Just say that we try to acquire the lock, which succeeds if the lock had > not been acquired. Okay. > >> + if (__glibc_likely (atomic_exchange_acquire (lock, 1) == 0)) >> >> + return 0; >> >> +#endif >> >> + >> >> + return EBUSY; >> >> } >> diff --git a/nptl/pthread_spin_unlock.c b/nptl/pthread_spin_unlock.c >> index 5fd73e5..f83b696 100644 >> --- a/nptl/pthread_spin_unlock.c >> >> +++ b/nptl/pthread_spin_unlock.c >> >> @@ -23,7 +23,9 @@ >> int >> pthread_spin_unlock (pthread_spinlock_t *lock) >> { >> - atomic_full_barrier (); >> >> - *lock = 0; >> >> + /* The atomic_store_release synchronizes-with the atomic_exchange_acquire >> >> + or atomic_compare_exchange_weak_acquire in pthread_spin_lock / >> >> + pthread_spin_trylock. */ >> >> + atomic_store_release (lock, 0); >> >> return 0; >> } > > I agree with this change. However, depending on how one interprets > POSIX' memory model, one may expect lock releases to be sequentially > consistent. Nonetheless, IMO, glibc should use only release MO. > > But we need to announce this change. Some of us have been considering > an additional section in the manual were we specify where we deviate > from POSIX; this change might be the first we're listing there (though > to be fair, this is a deviation only on some architectures because on > others such as powerpc I think, we've been using release MO forever). > > As not all architectures use the generic spinlock implementation, what about a NEWS entry like: * The new generic spinlock implementation uses C11 like atomics. The pthread_spin_unlock implementation is now using release memory order instead of sequentially consistent memory order. Here is an updated version of patch 1/2. (Patch 2/2 has to be aligned due to these changes, too) Thanks. Stefan ChangeLog: * NEWS: Mention new spinlock implementation. * include/atomic.h: (__atomic_val_bysize): Cast type to omit volatile qualifier. (atomic_exchange_acq): Likewise. (atomic_load_relaxed): Likewise. * nptl/pthread_spin_init.c (pthread_spin_init): Use atomic_store_relaxed. * nptl/pthread_spin_lock.c (pthread_spin_lock): Use C11-like atomic macros. * nptl/pthread_spin_trylock.c (pthread_spin_trylock): Likewise. Use an explicit load and test if lock is free before exchange, if new macro SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG macro is set to one. * nptl/pthread_spin_unlock.c (pthread_spin_unlock): Use atomic_store_release. * nptl/pthread_spin_parameters.h: New File. * sysdeps/aarch64/nptl/pthread_spin_lock.c: Delete File. * sysdeps/aarch64/nptl/pthread_spin_parameters.h: New File. * sysdeps/arm/nptl/pthread_spin_lock.c: Delete File. * sysdeps/arm/nptl/pthread_spin_parameters.h: New File. * sysdeps/hppa/nptl/pthread_spin_lock.c: Delete File. * sysdeps/hppa/nptl/pthread_spin_parameters.h: New File. * sysdeps/m68k/nptl/pthread_spin_lock.c: Delete File. * sysdeps/m68k/nptl/pthread_spin_parameters.h: New File. * sysdeps/microblaze/nptl/pthread_spin_lock.c: Delete File. * sysdeps/microblaze/nptl/pthread_spin_parameters.h: New File. * sysdeps/mips/nptl/pthread_spin_lock.c: Delete File. * sysdeps/mips/nptl/pthread_spin_parameters.h: New File. * sysdeps/nios2/nptl/pthread_spin_lock.c: Delete File. * sysdeps/nios2/nptl/pthread_spin_parameters.h: New File. commit 2110f18d944bcb018f57d35b1efe19b9b5f78b1a Author: Stefan Liebler Date: Wed Feb 15 10:19:27 2017 +0100 Optimize generic spinlock code and use C11 like atomic macros. This patch optimizes the generic spinlock code. The type pthread_spinlock_t is a typedef to volatile int on all archs. Passing a volatile pointer to the atomic macros can lead to extra stores and loads to stack if such a macro creates a temporary variable by using "__typeof (*(mem)) tmp;". Thus, those macros which are used by spinlock code - atomic_exchange_acquire, atomic_load_relaxed, atomic_compare_exchange_weak_acquire - have to be adjusted. According to the comment from Szabolcs Nagy, the type of a cast expression is unqualified (see http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_423.htm): __typeof ((__typeof (*(mem)) *(mem)) tmp; This patch adjusts those macros in include/atomic.h. The atomic macros are replaced by the C11 like atomic macros and thus the code is aligned to it. The pthread_spin_unlock implementation is now using release memory order instead of sequentially consistent memory order. The issue with passed volatile int pointers applies to the C11 like atomic macros as well as the ones used before. I've added a glibc_likely hint to the first atomic exchange in pthread_spin_lock in order to return immediately to the caller if the lock is free. Without the hint, there is an additional jump if the lock is free. I've added the atomic_spin_nop macro within the loop of plain reads. The plain reads are also realized by C11 like atomic_load_relaxed macro. For pthread_spin_trylock, a machine-specific version can define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG to 1 if an explicit test if lock is free is optimal. An architecture which wants to use the generic spinlock implementation has to create the file pthread_spin_parameters.h with the definition of the following parameter macros instead of creating a wrapper file which includes nptl/pthread_spin_lock.c: SPIN_LOCK_READS_BETWEEN_CMPXCHG SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG ChangeLog: * NEWS: Mention new spinlock implementation. * include/atomic.h: (__atomic_val_bysize): Cast type to omit volatile qualifier. (atomic_exchange_acq): Likewise. (atomic_load_relaxed): Likewise. * nptl/pthread_spin_init.c (pthread_spin_init): Use atomic_store_relaxed. * nptl/pthread_spin_lock.c (pthread_spin_lock): Use C11-like atomic macros. * nptl/pthread_spin_trylock.c (pthread_spin_trylock): Likewise. Use an explicit load and test if lock is free before exchange, if new macro SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG macro is set to one. * nptl/pthread_spin_unlock.c (pthread_spin_unlock): Use atomic_store_release. * nptl/pthread_spin_parameters.h: New File. * sysdeps/aarch64/nptl/pthread_spin_lock.c: Delete File. * sysdeps/aarch64/nptl/pthread_spin_parameters.h: New File. * sysdeps/arm/nptl/pthread_spin_lock.c: Delete File. * sysdeps/arm/nptl/pthread_spin_parameters.h: New File. * sysdeps/hppa/nptl/pthread_spin_lock.c: Delete File. * sysdeps/hppa/nptl/pthread_spin_parameters.h: New File. * sysdeps/m68k/nptl/pthread_spin_lock.c: Delete File. * sysdeps/m68k/nptl/pthread_spin_parameters.h: New File. * sysdeps/microblaze/nptl/pthread_spin_lock.c: Delete File. * sysdeps/microblaze/nptl/pthread_spin_parameters.h: New File. * sysdeps/mips/nptl/pthread_spin_lock.c: Delete File. * sysdeps/mips/nptl/pthread_spin_parameters.h: New File. * sysdeps/nios2/nptl/pthread_spin_lock.c: Delete File. * sysdeps/nios2/nptl/pthread_spin_parameters.h: New File. diff --git a/NEWS b/NEWS index b5a401c..b85b734 100644 --- a/NEWS +++ b/NEWS @@ -7,7 +7,9 @@ using `glibc' in the "product" field. Version 2.26 -[Add important changes here] +* The new generic spinlock implementation uses C11 like atomics. + The pthread_spin_unlock implementation is now using release + memory order instead of sequentially consistent memory order. Security related changes: diff --git a/include/atomic.h b/include/atomic.h index 7f32640..770db4a 100644 --- a/include/atomic.h +++ b/include/atomic.h @@ -54,7 +54,7 @@ and following args. */ #define __atomic_val_bysize(pre, post, mem, ...) \ ({ \ - __typeof (*mem) __atg1_result; \ + __typeof ((__typeof (*(mem))) *(mem)) __atg1_result; \ if (sizeof (*mem) == 1) \ __atg1_result = pre##_8_##post (mem, __VA_ARGS__); \ else if (sizeof (*mem) == 2) \ @@ -162,9 +162,9 @@ /* Store NEWVALUE in *MEM and return the old value. */ #ifndef atomic_exchange_acq # define atomic_exchange_acq(mem, newvalue) \ - ({ __typeof (*(mem)) __atg5_oldval; \ + ({ __typeof ((__typeof (*(mem))) *(mem)) __atg5_oldval; \ __typeof (mem) __atg5_memp = (mem); \ - __typeof (*(mem)) __atg5_value = (newvalue); \ + __typeof ((__typeof (*(mem))) *(mem)) __atg5_value = (newvalue); \ \ do \ __atg5_oldval = *__atg5_memp; \ @@ -668,7 +668,7 @@ void __atomic_link_error (void); # ifndef atomic_load_relaxed # define atomic_load_relaxed(mem) \ - ({ __typeof (*(mem)) __atg100_val; \ + ({ __typeof ((__typeof (*(mem))) *(mem)) __atg100_val; \ __asm ("" : "=r" (__atg100_val) : "0" (*(mem))); \ __atg100_val; }) # endif diff --git a/nptl/pthread_spin_init.c b/nptl/pthread_spin_init.c index 01dec5e..fe30913 100644 --- a/nptl/pthread_spin_init.c +++ b/nptl/pthread_spin_init.c @@ -22,6 +22,7 @@ int pthread_spin_init (pthread_spinlock_t *lock, int pshared) { - *lock = 0; + /* Relaxed MO is fine because this is an initializing store. */ + atomic_store_relaxed (lock, 0); return 0; } diff --git a/nptl/pthread_spin_lock.c b/nptl/pthread_spin_lock.c index 4d03b78..65d34a3 100644 --- a/nptl/pthread_spin_lock.c +++ b/nptl/pthread_spin_lock.c @@ -18,15 +18,7 @@ #include #include "pthreadP.h" - -/* A machine-specific version can define SPIN_LOCK_READS_BETWEEN_CMPXCHG - to the number of plain reads that it's optimal to spin on between uses - of atomic_compare_and_exchange_val_acq. If spinning forever is optimal - then use -1. If no plain reads here would ever be optimal, use 0. */ -#ifndef SPIN_LOCK_READS_BETWEEN_CMPXCHG -# warning machine-dependent file should define SPIN_LOCK_READS_BETWEEN_CMPXCHG -# define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 -#endif +#include int pthread_spin_lock (pthread_spinlock_t *lock) @@ -37,10 +29,14 @@ pthread_spin_lock (pthread_spinlock_t *lock) when the lock is locked. We assume that the first try mostly will be successful, and we use atomic_exchange. For the subsequent tries we use - atomic_compare_and_exchange. */ - if (atomic_exchange_acq (lock, 1) == 0) + atomic_compare_and_exchange. + We use acquire MO to synchronize-with the release MO store in + pthread_spin_unlock, and thus ensure that prior critical sections + happen-before this critical section. */ + if (__glibc_likely (atomic_exchange_acquire (lock, 1) == 0)) return 0; + int val; do { /* The lock is contended and we need to wait. Going straight back @@ -50,20 +46,39 @@ pthread_spin_lock (pthread_spinlock_t *lock) On the other hand, we do want to update memory state on the local core once in a while to avoid spinning indefinitely until some event that will happen to update local memory as a side-effect. */ - if (SPIN_LOCK_READS_BETWEEN_CMPXCHG >= 0) + +#if SPIN_LOCK_READS_BETWEEN_CMPXCHG >= 0 + /* Use at most SPIN_LOCK_READS_BETWEEN_CMPXCHG relaxed MO reads between + the atomic compare and exchanges. */ + int wait; + for (wait = 0; wait < SPIN_LOCK_READS_BETWEEN_CMPXCHG; wait ++) { - int wait = SPIN_LOCK_READS_BETWEEN_CMPXCHG; + atomic_spin_nop (); - while (*lock != 0 && wait > 0) - --wait; + val = atomic_load_relaxed (lock); + if (val == 0) + break; } - else + + /* Set expected value to zero for the next compare and exchange. */ + val = 0; + +#else /* SPIN_LOCK_READS_BETWEEN_CMPXCHG < 0 */ + /* Use relaxed MO reads until we observe the lock to not be acquired + anymore. */ + do { - while (*lock != 0) - ; + atomic_spin_nop (); + + val = atomic_load_relaxed (lock); } + while (val != 0); + +#endif + /* We need acquire memory order here for the same reason as mentioned + for the first try to lock the spinlock. */ } - while (atomic_compare_and_exchange_val_acq (lock, 1, 0) != 0); + while (!atomic_compare_exchange_weak_acquire (lock, &val, 1)); return 0; } diff --git a/nptl/pthread_spin_parameters.h b/nptl/pthread_spin_parameters.h new file mode 100644 index 0000000..6cde482 --- /dev/null +++ b/nptl/pthread_spin_parameters.h @@ -0,0 +1,35 @@ +/* Parameters for generic pthread_spinlock_t implementation. Generic version. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* A machine-specific version can define SPIN_LOCK_READS_BETWEEN_CMPXCHG + to the number of plain reads that it's optimal to spin on between uses + of atomic_compare_exchange_weak_acquire. If spinning forever is optimal, + then use -1. If no plain reads here would ever be optimal, use 0. */ +#ifndef SPIN_LOCK_READS_BETWEEN_CMPXCHG +# warning machine-dependent file should define SPIN_LOCK_READS_BETWEEN_CMPXCHG +# define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 +#endif + +/* A machine-specific version can define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG + to 1 if an explicit load and test if lock is not acquired before executing + atomic_exchange_acquire is optimal. Define to 0 to execute + atomic_exchange_acquire without an additional load and test. */ +#ifndef SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG +# warning machine-dependent file should define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG +# define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 0 +#endif diff --git a/nptl/pthread_spin_trylock.c b/nptl/pthread_spin_trylock.c index 593bba3..1e22843 100644 --- a/nptl/pthread_spin_trylock.c +++ b/nptl/pthread_spin_trylock.c @@ -19,9 +19,28 @@ #include #include #include "pthreadP.h" +#include int pthread_spin_trylock (pthread_spinlock_t *lock) { - return atomic_exchange_acq (lock, 1) ? EBUSY : 0; + /* We use acquire MO to synchronize-with the release MO store in + pthread_spin_unlock, and thus ensure that prior critical sections + happen-before this critical section. */ +#if SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG == 1 + /* Load and test the spinlock and only try to acquire the lock if it has + not been acquired. */ + int val = atomic_load_relaxed (lock); + if (__glibc_likely (val == 0 && atomic_exchange_acquire (lock, 1) == 0)) + return 0; +#elif SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG == 0 + /* Try to acquire the lock, which succeeds if the lock had not been + acquired. */ + if (__glibc_likely (atomic_exchange_acquire (lock, 1) == 0)) + return 0; +#else +# error SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG has to be defined to 0 or 1 +#endif + + return EBUSY; } diff --git a/nptl/pthread_spin_unlock.c b/nptl/pthread_spin_unlock.c index 5fd73e5..f83b696 100644 --- a/nptl/pthread_spin_unlock.c +++ b/nptl/pthread_spin_unlock.c @@ -23,7 +23,9 @@ int pthread_spin_unlock (pthread_spinlock_t *lock) { - atomic_full_barrier (); - *lock = 0; + /* The atomic_store_release synchronizes-with the atomic_exchange_acquire + or atomic_compare_exchange_weak_acquire in pthread_spin_lock / + pthread_spin_trylock. */ + atomic_store_release (lock, 0); return 0; } diff --git a/sysdeps/aarch64/nptl/pthread_spin_lock.c b/sysdeps/aarch64/nptl/pthread_spin_lock.c deleted file mode 100644 index fcfcb40..0000000 --- a/sysdeps/aarch64/nptl/pthread_spin_lock.c +++ /dev/null @@ -1,24 +0,0 @@ -/* Copyright (C) 2008-2017 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of the - License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 - -/* We can't use the normal "#include " because - it will resolve to this very file. Using "sysdeps/.." as reference to the - top level directory does the job. */ -#include diff --git a/sysdeps/aarch64/nptl/pthread_spin_parameters.h b/sysdeps/aarch64/nptl/pthread_spin_parameters.h new file mode 100644 index 0000000..5fd319c --- /dev/null +++ b/sysdeps/aarch64/nptl/pthread_spin_parameters.h @@ -0,0 +1,22 @@ +/* Parameters for generic pthread_spinlock_t implementation. aarch64 version. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 +#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 0 + +#include_next diff --git a/sysdeps/arm/nptl/pthread_spin_lock.c b/sysdeps/arm/nptl/pthread_spin_lock.c deleted file mode 100644 index 037b3b8..0000000 --- a/sysdeps/arm/nptl/pthread_spin_lock.c +++ /dev/null @@ -1,23 +0,0 @@ -/* Copyright (C) 2008-2017 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library. If not, see - . */ - -#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 - -/* We can't use the normal "#include " because - it will resolve to this very file. Using "sysdeps/.." as reference to the - top level directory does the job. */ -#include diff --git a/sysdeps/arm/nptl/pthread_spin_parameters.h b/sysdeps/arm/nptl/pthread_spin_parameters.h new file mode 100644 index 0000000..3f8285a --- /dev/null +++ b/sysdeps/arm/nptl/pthread_spin_parameters.h @@ -0,0 +1,22 @@ +/* Parameters for generic pthread_spinlock_t implementation. arm version. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 +#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 0 + +#include_next diff --git a/sysdeps/hppa/nptl/pthread_spin_lock.c b/sysdeps/hppa/nptl/pthread_spin_lock.c deleted file mode 100644 index 14f36a6..0000000 --- a/sysdeps/hppa/nptl/pthread_spin_lock.c +++ /dev/null @@ -1,23 +0,0 @@ -/* Copyright (C) 2005-2017 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library. If not, see - . */ - -#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 - -/* We can't use the normal "#include " because - it will resolve to this very file. Using "sysdeps/.." as reference to the - top level directory does the job. */ -#include diff --git a/sysdeps/hppa/nptl/pthread_spin_parameters.h b/sysdeps/hppa/nptl/pthread_spin_parameters.h new file mode 100644 index 0000000..ebeb162 --- /dev/null +++ b/sysdeps/hppa/nptl/pthread_spin_parameters.h @@ -0,0 +1,22 @@ +/* Parameters for generic pthread_spinlock_t implementation. hppa version. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 +#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 0 + +#include_next diff --git a/sysdeps/m68k/nptl/pthread_spin_lock.c b/sysdeps/m68k/nptl/pthread_spin_lock.c deleted file mode 100644 index 62795f4..0000000 --- a/sysdeps/m68k/nptl/pthread_spin_lock.c +++ /dev/null @@ -1,24 +0,0 @@ -/* Copyright (C) 2010-2017 Free Software Foundation, Inc. - This file is part of the GNU C Library. - Contributed by Maxim Kuvyrkov , 2010. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library. If not, see - . */ - -#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 - -/* We can't use the normal "#include " because - it will resolve to this very file. Using "sysdeps/.." as reference to the - top level directory does the job. */ -#include diff --git a/sysdeps/m68k/nptl/pthread_spin_parameters.h b/sysdeps/m68k/nptl/pthread_spin_parameters.h new file mode 100644 index 0000000..b7b8581 --- /dev/null +++ b/sysdeps/m68k/nptl/pthread_spin_parameters.h @@ -0,0 +1,22 @@ +/* Parameters for generic pthread_spinlock_t implementation. m68k version. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 +#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 0 + +#include_next diff --git a/sysdeps/microblaze/nptl/pthread_spin_lock.c b/sysdeps/microblaze/nptl/pthread_spin_lock.c deleted file mode 100644 index fcfcb40..0000000 --- a/sysdeps/microblaze/nptl/pthread_spin_lock.c +++ /dev/null @@ -1,24 +0,0 @@ -/* Copyright (C) 2008-2017 Free Software Foundation, Inc. - - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public License as - published by the Free Software Foundation; either version 2.1 of the - License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 - -/* We can't use the normal "#include " because - it will resolve to this very file. Using "sysdeps/.." as reference to the - top level directory does the job. */ -#include diff --git a/sysdeps/microblaze/nptl/pthread_spin_parameters.h b/sysdeps/microblaze/nptl/pthread_spin_parameters.h new file mode 100644 index 0000000..ef5827b --- /dev/null +++ b/sysdeps/microblaze/nptl/pthread_spin_parameters.h @@ -0,0 +1,23 @@ +/* Parameters for generic pthread_spinlock_t implementation. + microblaze version. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 +#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 0 + +#include_next diff --git a/sysdeps/mips/nptl/pthread_spin_lock.c b/sysdeps/mips/nptl/pthread_spin_lock.c deleted file mode 100644 index 19d87a5..0000000 --- a/sysdeps/mips/nptl/pthread_spin_lock.c +++ /dev/null @@ -1,23 +0,0 @@ -/* Copyright (C) 2012-2017 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library. If not, see - . */ - -#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 - -/* We can't use the normal "#include " because - it will resolve to this very file. Using "sysdeps/.." as reference to the - top level directory does the job. */ -#include diff --git a/sysdeps/mips/nptl/pthread_spin_parameters.h b/sysdeps/mips/nptl/pthread_spin_parameters.h new file mode 100644 index 0000000..4f03890 --- /dev/null +++ b/sysdeps/mips/nptl/pthread_spin_parameters.h @@ -0,0 +1,22 @@ +/* Parameters for generic pthread_spinlock_t implementation. mips version. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 +#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 0 + +#include_next diff --git a/sysdeps/nios2/nptl/pthread_spin_lock.c b/sysdeps/nios2/nptl/pthread_spin_lock.c deleted file mode 100644 index b203469..0000000 --- a/sysdeps/nios2/nptl/pthread_spin_lock.c +++ /dev/null @@ -1,24 +0,0 @@ -/* pthread spin-lock implementation for Nios II. - Copyright (C) 2005-2017 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library. If not, see - . */ - -#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 - -/* We can't use the normal "#include " because - it will resolve to this very file. Using "sysdeps/.." as reference to the - top level directory does the job. */ -#include diff --git a/sysdeps/nios2/nptl/pthread_spin_parameters.h b/sysdeps/nios2/nptl/pthread_spin_parameters.h new file mode 100644 index 0000000..237edb3 --- /dev/null +++ b/sysdeps/nios2/nptl/pthread_spin_parameters.h @@ -0,0 +1,22 @@ +/* Parameters for generic pthread_spinlock_t implementation. nios2 version. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000 +#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 0 + +#include_next