From patchwork Fri Dec 16 16:31:56 2016
From: Stefan Liebler
To: libc-alpha@sourceware.org
Cc: Stefan Liebler
Subject: [PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros.
Date: Fri, 16 Dec 2016 17:31:56 +0100
Message-Id: <1481905917-15654-1-git-send-email-stli@linux.vnet.ibm.com>

This patch optimizes the generic spinlock code.

The type pthread_spinlock_t is a typedef to volatile int on all archs.
Passing a volatile pointer to the atomic macros can lead to extra stores
and loads to the stack if such a macro creates a temporary variable by
using __typeof (*(mem)).  Thus the patch passes an int pointer to the
atomic macros.

The old atomic macros are replaced by the C11-like atomic macros, so the
code is aligned to them.

I've added a __glibc_likely hint to the first atomic exchange in
pthread_spin_lock in order to return immediately to the caller if the
lock is free.  Without the hint, there is an additional jump if the lock
is free.

I've added the atomic_spin_nop macro within the loop of plain reads.
The plain reads are realized by dereferencing the volatile pointer, as
the for-loop was otherwise optimized out when atomic_load_relaxed was
used.

For pthread_spin_trylock, a machine-specific version can define
SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG to 1 if an explicit test
whether the lock is free is optimal.

ChangeLog:

	* nptl/pthread_spin_init.c (pthread_spin_init):
	Use atomic_store_relaxed.
	* nptl/pthread_spin_lock.c (pthread_spin_lock):
	Use C11-like atomic macros and pass int pointers instead of
	volatile int pointers.
	* nptl/pthread_spin_trylock.c (pthread_spin_trylock):
	Likewise.  Use an explicit test if lock is free, if new
	SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG macro is set to one.
	(SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG): New define.
	* nptl/pthread_spin_unlock.c (pthread_spin_unlock):
	Use atomic_store_release.
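As an illustration of the volatile issue described above, here is a
standalone sketch (not part of the patch; exchange_acq_sketch is a
hypothetical, simplified stand-in for the glibc atomic macros and relies
on the GCC __typeof and __atomic builtins).  It contrasts the temporary
created for a volatile int * with the one created after casting to int *:

/* Sketch only: simplified stand-in for an atomic macro that creates a
   temporary via __typeof (*(mem)); not the real glibc implementation.  */
#define exchange_acq_sketch(mem, newval)                                \
  ({ __typeof (*(mem)) __tmp = (newval);                                \
     __atomic_exchange_n ((mem), __tmp, __ATOMIC_ACQUIRE); })

int
exchange_volatile (volatile int *lock)
{
  /* __typeof (*lock) is volatile int, so the compiler may spill __tmp to
     the stack and reload it instead of keeping it in a register.  */
  return exchange_acq_sketch (lock, 1);
}

int
exchange_plain (volatile int *lock)
{
  /* Casting to int * first, as the patch does, makes __typeof yield a
     plain int, so the temporary need not touch memory.  */
  return exchange_acq_sketch ((int *) lock, 1);
}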
---
 nptl/pthread_spin_init.c    |  4 +++-
 nptl/pthread_spin_lock.c    | 54 +++++++++++++++++++++++++++++++++++----------
 nptl/pthread_spin_trylock.c | 29 ++++++++++++++++++++++--
 nptl/pthread_spin_unlock.c  |  6 +++--
 4 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/nptl/pthread_spin_init.c b/nptl/pthread_spin_init.c
index 8ed4772..65d05b8 100644
--- a/nptl/pthread_spin_init.c
+++ b/nptl/pthread_spin_init.c
@@ -22,6 +22,8 @@
 int
 pthread_spin_init (pthread_spinlock_t *lock, int pshared)
 {
-  *lock = 0;
+  /* The atomic_store_relaxed is enough as we only initialize the spinlock here
+     and we are not in a critical region.  */
+  atomic_store_relaxed (lock, 0);
   return 0;
 }
diff --git a/nptl/pthread_spin_lock.c b/nptl/pthread_spin_lock.c
index fb9bcc1..adbc9d7 100644
--- a/nptl/pthread_spin_lock.c
+++ b/nptl/pthread_spin_lock.c
@@ -21,7 +21,7 @@
 
 /* A machine-specific version can define SPIN_LOCK_READS_BETWEEN_CMPXCHG
   to the number of plain reads that it's optimal to spin on between uses
-  of atomic_compare_and_exchange_val_acq.  If spinning forever is optimal
+  of atomic_compare_exchange_weak_acquire.  If spinning forever is optimal
   then use -1.  If no plain reads here would ever be optimal, use 0.  */
 #ifndef SPIN_LOCK_READS_BETWEEN_CMPXCHG
 # warning machine-dependent file should define SPIN_LOCK_READS_BETWEEN_CMPXCHG
@@ -29,18 +29,27 @@
 #endif
 
 int
-pthread_spin_lock (pthread_spinlock_t *lock)
+pthread_spin_lock (pthread_spinlock_t *lock_volatile)
 {
+  /* The type pthread_spinlock_t is a typedef to volatile int on all archs.
+     Passing a volatile pointer to the atomic macros can lead to extra stores
+     and loads to stack if such a macro creates a temporary variable by using
+     __typeof (*(mem)).  */
+  int *lock = (int *) lock_volatile;
+
   /* atomic_exchange usually takes less instructions than
      atomic_compare_and_exchange.  On the other hand,
      atomic_compare_and_exchange potentially generates less bus traffic
     when the lock is locked.
      We assume that the first try mostly will be successful, and we use
     atomic_exchange.  For the subsequent tries we use
-     atomic_compare_and_exchange.  */
-  if (atomic_exchange_acq (lock, 1) == 0)
+     atomic_compare_and_exchange.
+     We need acquire memory order here as we need to see if another thread has
+     locked / unlocked this spinlock.  */
+  if (__glibc_likely (atomic_exchange_acquire (lock, 1) == 0))
     return 0;
 
+  int val;
   do
     {
       /* The lock is contended and we need to wait.  Going straight back
@@ -50,20 +59,41 @@ pthread_spin_lock (pthread_spinlock_t *lock)
          On the other hand, we do want to update memory state on the local
          core once in a while to avoid spinning indefinitely until some event
          that will happen to update local memory as a side-effect.  */
-      if (SPIN_LOCK_READS_BETWEEN_CMPXCHG >= 0)
+
+#if SPIN_LOCK_READS_BETWEEN_CMPXCHG >= 0
+      /* Use at most SPIN_LOCK_READS_BETWEEN_CMPXCHG plain reads between the
+         atomic compare and exchanges.  */
+      int wait;
+      for (wait = 0; wait < SPIN_LOCK_READS_BETWEEN_CMPXCHG; wait ++)
         {
-          int wait = SPIN_LOCK_READS_BETWEEN_CMPXCHG;
+          atomic_spin_nop ();
 
-          while (*lock != 0 && wait > 0)
-            --wait;
+          /* Use a plain read every round.  */
+          val = *lock_volatile;
+          if (val == 0)
+            break;
         }
-      else
+
+      /* Set expected value to zero for the next compare and exchange.  */
+      val = 0;
+
+#else /* SPIN_LOCK_READS_BETWEEN_CMPXCHG < 0  */
+      /* Use plain reads until spinlock is free and then try a further atomic
+         compare and exchange the next time.  */
+      do
         {
-          while (*lock != 0)
-            ;
+          atomic_spin_nop ();
+
+          /* Use a plain read every round.  */
+          val = *lock_volatile;
         }
+      while (val != 0);
+
+#endif
+      /* We need acquire memory order here for the same reason as mentioned
+         for the first try to lock the spinlock.  */
     }
-  while (atomic_compare_and_exchange_val_acq (lock, 1, 0) != 0);
+  while (!atomic_compare_exchange_weak_acquire (lock, &val, 1));
 
   return 0;
 }
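To make the control flow of the lock loop above easier to follow outside
the glibc-internal macros, here is a minimal standalone sketch of the same
spin strategy for the SPIN_LOCK_READS_BETWEEN_CMPXCHG < 0 case, expressed
with C11 <stdatomic.h> (an assumption of this sketch; the patch itself uses
the glibc atomic macros and a plain volatile read instead of a relaxed
load, and sched_yield here merely stands in for atomic_spin_nop):

#include <sched.h>
#include <stdatomic.h>

/* Sketch only: fast-path exchange, then cheap reads until the lock looks
   free, then a weak acquire CAS.  Not the glibc implementation.  */
void
spin_lock_sketch (atomic_int *lock)
{
  /* Fast path: assume the lock is mostly free.  */
  if (atomic_exchange_explicit (lock, 1, memory_order_acquire) == 0)
    return;

  int val;
  do
    {
      /* Wait without writing to the lock until it appears free.  */
      do
        {
          sched_yield ();
          val = atomic_load_explicit (lock, memory_order_relaxed);
        }
      while (val != 0);
    }
  /* val is 0 here; on CAS failure it is refreshed and we spin again.  */
  while (!atomic_compare_exchange_weak_explicit (lock, &val, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed));
}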
diff --git a/nptl/pthread_spin_trylock.c b/nptl/pthread_spin_trylock.c
index 4e1a96c..8e9c76f 100644
--- a/nptl/pthread_spin_trylock.c
+++ b/nptl/pthread_spin_trylock.c
@@ -20,8 +20,33 @@
 #include <atomic.h>
 #include "pthreadP.h"
 
+/* A machine-specific version can define
+   SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG to 1 if an explicit test if
+   lock is free is optimal.  */
+#ifndef SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG
+# define SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG 0
+#endif
+
 int
-pthread_spin_trylock (pthread_spinlock_t *lock)
+pthread_spin_trylock (pthread_spinlock_t *lock_volatile)
 {
-  return atomic_exchange_acq (lock, 1) ? EBUSY : 0;
+  /* See comment in pthread_spin_lock.c.  */
+  int *lock = (int *) lock_volatile;
+
+  /* We need acquire memory order here as we need to see if another
+     thread has locked / unlocked this spinlock.  */
+#if SPIN_TRYLOCK_USE_CMPXCHG_INSTEAD_OF_XCHG == 1
+  /* Load and test the spinlock and only try to lock the spinlock if it is
+     free.  */
+  int val = atomic_load_relaxed (lock);
+  if (__glibc_likely (val == 0
+                      && atomic_compare_exchange_weak_acquire (lock, &val, 1)))
+    return 0;
+#else
+  /* Set spinlock to locked and test if we have locked it.  */
+  if (__glibc_likely (atomic_exchange_acquire (lock, 1) == 0))
+    return 0;
+#endif
+
+  return EBUSY;
 }
diff --git a/nptl/pthread_spin_unlock.c b/nptl/pthread_spin_unlock.c
index d4b63ac..014f295 100644
--- a/nptl/pthread_spin_unlock.c
+++ b/nptl/pthread_spin_unlock.c
@@ -23,7 +23,9 @@
 int
 pthread_spin_unlock (pthread_spinlock_t *lock)
 {
-  atomic_full_barrier ();
-  *lock = 0;
+  /* The atomic_store_release synchronizes-with the atomic_exchange_acquire
+     or atomic_compare_exchange_weak_acquire in pthread_spin_lock /
+     pthread_spin_trylock.  */
+  atomic_store_release (lock, 0);
   return 0;
 }
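
For reference, a small illustrative test (not part of this patch) that
exercises the changed functions through the public pthread API; build with
-pthread:

#include <pthread.h>
#include <stdio.h>

static pthread_spinlock_t lock;
static int counter;

static void *
worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < 100000; i++)
    {
      pthread_spin_lock (&lock);
      counter++;
      pthread_spin_unlock (&lock);
    }
  return NULL;
}

int
main (void)
{
  pthread_spin_init (&lock, PTHREAD_PROCESS_PRIVATE);

  pthread_t t1, t2;
  pthread_create (&t1, NULL, worker, NULL);
  pthread_create (&t2, NULL, worker, NULL);
  pthread_join (t1, NULL);
  pthread_join (t2, NULL);

  /* trylock on a free lock should succeed and return 0.  */
  if (pthread_spin_trylock (&lock) == 0)
    pthread_spin_unlock (&lock);

  printf ("counter = %d (expected 200000)\n", counter);
  pthread_spin_destroy (&lock);
  return 0;
}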