From patchwork Tue Jan 20 11:20:30 2015
X-Patchwork-Submitter: Stefan Liebler
X-Patchwork-Id: 4736
To: libc-alpha@sourceware.org
From: Stefan Liebler
Subject: Re: [PATCH] s390: Use generic lowlevellock-futex.h.
Date: Tue, 20 Jan 2015 12:20:30 +0100
References: <1418857172.20194.24.camel@triegel.csb>
 <1420645455.29798.36.camel@triegel.csb>
In-Reply-To: <1420645455.29798.36.camel@triegel.csb>

Hi,

On 01/07/2015 04:44 PM, Torvald Riegel wrote:
> Ping.
>
> On Wed, 2014-12-17 at 23:59 +0100, Torvald Riegel wrote:
>> This untested patch removes the custom futex operations in s390
>> lowlevellock.h, and instead uses the generic ones from
>> lowlevellock-futex.h.  They seem to be equivalent.

The behaviour of lll_futex_requeue, lll_futex_wake_unlock and
lll_futex_cmp_requeue_pi has changed: the s390 variant returns 0 or 1,
whereas the lowlevellock-futex.h variant returns 0 or __ret.  Since all
callers only compare the return values of these macros against zero,
this is okay.

>> It also removes the definition of SYS_futex because it doesn't seem to
>> be used anywhere else.

OK.

>>
>> It would be even nicer if we could remove the custom lowlevellock.h
>> altogether.  However, current s390 has custom asm for the lock fast
>> paths, so this needs a closer review than I can do.  Could you check
>> whether that would be possible?

The custom lowlevellock.h can't be removed completely because of the
lock-elision defines.  But including the generic lowlevellock.h, as
Power does, is fine.  Those lll_* macros use the atomic_* macros, and
the generated asm is nearly equivalent to the custom implementation.

The s390 lll_cond_lock uses
  if (atomic_compare_and_exchange_bool_acq (futex, 2, 0))
whereas the generic code uses
  if (atomic_exchange_acq (futex, 2) != 0)
In both cases __lll_lock_wait is called if the futex was != 0 before.
I think this is a bug in the s390 variant, because the futex is only
set to 2 if it was 0 before: if the futex was 2, it remains 2 and
__lll_lock_wait is called; if the futex was 1, the s390 code does not
set it to 2, but __lll_lock_wait is called anyway.  The generic code
always sets the futex to 2.

The generic lll_[robust_]unlock macros use
atomic_exchange_rel (__futex, 0).  The behaviour of the s390 variant is
the same, but the asm generated for the generic variant needs one
additional register and two additional instructions: the old value is
saved before the compare-and-swap instruction in order to compare it
against the "old value" returned by the cs instruction.  The s390
variant instead uses the condition code of the cs instruction to
determine whether cs changed the value.  Thus either the
lll_[robust_]unlock macros or the atomic_exchange_rel macro should be
redefined for s390.
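To make the lll_cond_lock difference concrete, here is a minimal
stand-alone sketch in C11 atomics.  It is illustrative only, not glibc
code: the function names are made up, and the glibc atomic_* macros are
replaced by their <stdatomic.h> counterparts.

  #include <stdatomic.h>

  /* s390-style variant: moves the futex from 0 to 2 only.  A futex
     that is 1 stays 1, so the "locked, possibly with waiters" value 2
     is never set even though the caller is about to block.  */
  static int
  cond_lock_s390_style (atomic_int *futex)
  {
    int expected = 0;
    /* Nonzero means the futex was != 0, i.e. the caller goes on to
       call __lll_lock_wait.  */
    return !atomic_compare_exchange_strong (futex, &expected, 2);
  }

  /* Generic-style variant: unconditionally stores 2, so the futex is
     always marked as contended before the caller blocks.  */
  static int
  cond_lock_generic_style (atomic_int *futex)
  {
    return atomic_exchange (futex, 2) != 0;
  }

The second variant always leaves 2 behind, so an unlocking thread
reliably sees the "possible waiters" value.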
I think removing as much of the custom implementation as possible is
the right way, so I've changed your patch.  The s390 lowlevellock.h now
only includes sysdeps/nptl/lowlevellock.h and defines the macros for
lock elision.  The atomic_exchange_acq macro is defined for s390 in
atomic.h.

The testsuite runs without regression on s390-32/s390-64, with and
without --enable-lock-elision.

Ok to commit?

Thanks.
Bye Stefan

---
2015-01-20  Stefan Liebler

	* sysdeps/unix/sysv/linux/s390/lowlevellock.h:
	Include <sysdeps/nptl/lowlevellock.h> and remove macros and
	functions that are now defined there.
	(SYS_futex): Remove.
	(lll_compare_and_swap): Remove.
	* sysdeps/s390/bits/atomic.h (atomic_exchange_acq): Define.

diff --git a/sysdeps/s390/bits/atomic.h b/sysdeps/s390/bits/atomic.h
index a852882..16c8c54 100644
--- a/sysdeps/s390/bits/atomic.h
+++ b/sysdeps/s390/bits/atomic.h
@@ -77,3 +77,44 @@ typedef uintmax_t uatomic_max_t;
 # define __arch_compare_and_exchange_val_64_acq(mem, newval, oldval) \
   (abort (), (__typeof (*mem)) 0)
 #endif
+
+/* Store NEWVALUE in *MEM and return the old value.  */
+/* On s390, atomic_exchange_acq differs from the generic implementation,
+   because the generic one does not use the condition code of the cs
+   instruction to determine if looping is needed.  Instead it saves the
+   old value and compares it against the old value returned by the cs
+   instruction.  */
+#ifdef __s390x__
+# define atomic_exchange_acq(mem, newvalue) \
+  ({ __typeof (mem) __atg5_memp = (mem); \
+     __typeof (*(mem)) __atg5_oldval = *__atg5_memp; \
+     __typeof (*(mem)) __atg5_value = (newvalue); \
+     if (sizeof (*mem) == 4) \
+       __asm __volatile ("0: cs %0,%2,%1\n" \
+			 "   jl 0b" \
+			 : "+d" (__atg5_oldval), "=Q" (*__atg5_memp) \
+			 : "d" (__atg5_value), "m" (*__atg5_memp) \
+			 : "cc", "memory" ); \
+     else if (sizeof (*mem) == 8) \
+       __asm __volatile ("0: csg %0,%2,%1\n" \
+			 "   jl 0b" \
+			 : "+d" (__atg5_oldval), "=Q" (*__atg5_memp) \
+			 : "d" ((long) __atg5_value), "m" (*__atg5_memp) \
+			 : "cc", "memory" ); \
+     else \
+       abort (); \
+     __atg5_oldval; })
+#else
+# define atomic_exchange_acq(mem, newvalue) \
+  ({ __typeof (mem) __atg5_memp = (mem); \
+     __typeof (*(mem)) __atg5_oldval = *__atg5_memp; \
+     __typeof (*(mem)) __atg5_value = (newvalue); \
+     if (sizeof (*mem) == 4) \
+       __asm __volatile ("0: cs %0,%2,%1\n" \
+			 "   jl 0b" \
+			 : "+d" (__atg5_oldval), "=Q" (*__atg5_memp) \
+			 : "d" (__atg5_value), "m" (*__atg5_memp) \
+			 : "cc", "memory" ); \
+     else \
+       abort (); \
+     __atg5_oldval; })
+#endif
diff --git a/sysdeps/unix/sysv/linux/s390/lowlevellock.h b/sysdeps/unix/sysv/linux/s390/lowlevellock.h
index a468859..163a731 100644
--- a/sysdeps/unix/sysv/linux/s390/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/s390/lowlevellock.h
@@ -16,335 +16,20 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#ifndef _LOWLEVELLOCK_H
-#define _LOWLEVELLOCK_H 1
+#ifndef _S390_LOWLEVELLOCK_H
+#define _S390_LOWLEVELLOCK_H 1
 
-#include <time.h>
-#include <sys/param.h>
-#include <bits/pthreadtypes.h>
-#include <atomic.h>
-#include <kernel-features.h>
+#include <sysdeps/nptl/lowlevellock.h>
 
-#define SYS_futex 238
-#define FUTEX_WAIT 0
-#define FUTEX_WAKE 1
-#define FUTEX_REQUEUE 3
-#define FUTEX_CMP_REQUEUE 4
-#define FUTEX_WAKE_OP 5
-#define FUTEX_OP_CLEAR_WAKE_IF_GT_ONE ((4 << 24) | 1)
-#define FUTEX_LOCK_PI 6
-#define FUTEX_UNLOCK_PI 7
-#define FUTEX_TRYLOCK_PI 8
-#define FUTEX_WAIT_BITSET 9
-#define FUTEX_WAKE_BITSET 10
-#define FUTEX_WAIT_REQUEUE_PI 11
-#define FUTEX_CMP_REQUEUE_PI 12
-#define FUTEX_PRIVATE_FLAG 128
-#define FUTEX_CLOCK_REALTIME 256
-
-#define FUTEX_BITSET_MATCH_ANY 0xffffffff
-
-/* Values for 'private' parameter of locking macros.  Yes, the
-   definition seems to be backwards.  But it is not.  The bit will be
-   reversed before passing to the system call.  */
-#define LLL_PRIVATE 0
-#define LLL_SHARED FUTEX_PRIVATE_FLAG
-
-
-#if IS_IN (libc) || IS_IN (rtld)
-/* In libc.so or ld.so all futexes are private.  */
-# ifdef __ASSUME_PRIVATE_FUTEX
-#  define __lll_private_flag(fl, private) \
-  ((fl) | FUTEX_PRIVATE_FLAG)
-# else
-#  define __lll_private_flag(fl, private) \
-  ((fl) | THREAD_GETMEM (THREAD_SELF, header.private_futex))
-# endif
-#else
-# ifdef __ASSUME_PRIVATE_FUTEX
-#  define __lll_private_flag(fl, private) \
-  (((fl) | FUTEX_PRIVATE_FLAG) ^ (private))
-# else
-#  define __lll_private_flag(fl, private) \
-  (__builtin_constant_p (private) \
-   ? ((private) == 0 \
-      ? ((fl) | THREAD_GETMEM (THREAD_SELF, header.private_futex)) \
-      : (fl)) \
-   : ((fl) | (((private) ^ FUTEX_PRIVATE_FLAG) \
-	      & THREAD_GETMEM (THREAD_SELF, header.private_futex))))
-# endif
-#endif
-
-#define lll_futex_wait(futex, val, private) \
-  lll_futex_timed_wait (futex, val, NULL, private)
-
-#define lll_futex_timed_wait(futexp, val, timespec, private) \
-  ({ \
-    INTERNAL_SYSCALL_DECL (__err); \
- \
-    INTERNAL_SYSCALL (futex, __err, 4, (futexp), \
-		      __lll_private_flag (FUTEX_WAIT, private), \
-		      (val), (timespec)); \
-  })
-
-#define lll_futex_timed_wait_bitset(futexp, val, timespec, clockbit, private) \
-  ({ \
-    INTERNAL_SYSCALL_DECL (__err); \
-    int __op = FUTEX_WAIT_BITSET | clockbit; \
- \
-    INTERNAL_SYSCALL (futex, __err, 6, (futexp), \
-		      __lll_private_flag (__op, private), \
-		      (val), (timespec), NULL /* Unused.  */, \
-		      FUTEX_BITSET_MATCH_ANY); \
-  })
-
-#define lll_futex_wake(futexp, nr, private) \
-  ({ \
-    INTERNAL_SYSCALL_DECL (__err); \
- \
-    INTERNAL_SYSCALL (futex, __err, 4, (futexp), \
-		      __lll_private_flag (FUTEX_WAKE, private), \
-		      (nr), 0); \
-  })
-
-
-/* Returns non-zero if error happened, zero if success.  */
-#define lll_futex_requeue(futexp, nr_wake, nr_move, mutex, val, private) \
-  ({ \
-    INTERNAL_SYSCALL_DECL (__err); \
-    long int __ret; \
- \
-    __ret = INTERNAL_SYSCALL (futex, __err, 6, (futexp), \
-			      __lll_private_flag (FUTEX_CMP_REQUEUE, private), \
-			      (nr_wake), (nr_move), (mutex), (val)); \
-    INTERNAL_SYSCALL_ERROR_P (__ret, __err); \
-  })
-
-/* Returns non-zero if error happened, zero if success.  */
-#define lll_futex_wake_unlock(futexp, nr_wake, nr_wake2, futexp2, private) \
-  ({ \
-    INTERNAL_SYSCALL_DECL (__err); \
-    long int __ret; \
- \
-    __ret = INTERNAL_SYSCALL (futex, __err, 6, (futexp), \
-			      __lll_private_flag (FUTEX_WAKE_OP, private), \
-			      (nr_wake), (nr_wake2), (futexp2), \
-			      FUTEX_OP_CLEAR_WAKE_IF_GT_ONE); \
-    INTERNAL_SYSCALL_ERROR_P (__ret, __err); \
-  })
-
-/* Priority Inheritance support.  */
-#define lll_futex_wait_requeue_pi(futexp, val, mutex, private) \
-  lll_futex_timed_wait_requeue_pi (futexp, val, NULL, 0, mutex, private)
-
-#define lll_futex_timed_wait_requeue_pi(futexp, val, timespec, clockbit, \
-					mutex, private) \
-  ({ \
-    INTERNAL_SYSCALL_DECL (__err); \
-    int __op = FUTEX_WAIT_REQUEUE_PI | clockbit; \
- \
-    INTERNAL_SYSCALL (futex, __err, 5, (futexp), \
-		      __lll_private_flag (__op, private), \
-		      (val), (timespec), mutex); \
-  })
-
-#define lll_futex_cmp_requeue_pi(futexp, nr_wake, nr_move, mutex, val, priv) \
-  ({ \
-    INTERNAL_SYSCALL_DECL (__err); \
-    long int __ret; \
- \
-    __ret = INTERNAL_SYSCALL (futex, __err, 6, (futexp), \
-			      __lll_private_flag (FUTEX_CMP_REQUEUE_PI, priv), \
-			      (nr_wake), (nr_move), (mutex), (val)); \
-    INTERNAL_SYSCALL_ERROR_P (__ret, __err); \
-  })
-
-#define lll_compare_and_swap(futex, oldval, newval, operation) \
-  do { \
-    __typeof (futex) __futex = (futex); \
-    __asm __volatile ("	l	%1,%0\n" \
-		      "0:	" operation "\n" \
-		      "	cs	%1,%2,%0\n" \
-		      "	jl	0b\n" \
-		      "1:" \
-		      : "=Q" (*__futex), "=&d" (oldval), "=&d" (newval) \
-		      : "m" (*__futex) : "cc", "memory" ); \
-  } while (0)
-
-
-static inline int
-__attribute__ ((always_inline))
-__lll_trylock (int *futex)
-{
-  unsigned int old;
-
-  __asm __volatile ("cs %0,%3,%1"
-		    : "=d" (old), "=Q" (*futex)
-		    : "0" (0), "d" (1), "m" (*futex) : "cc", "memory" );
-  return old != 0;
-}
-#define lll_trylock(futex) __lll_trylock (&(futex))
-
-
-static inline int
-__attribute__ ((always_inline))
-__lll_cond_trylock (int *futex)
-{
-  unsigned int old;
-
-  __asm __volatile ("cs %0,%3,%1"
-		    : "=d" (old), "=Q" (*futex)
-		    : "0" (0), "d" (2), "m" (*futex) : "cc", "memory" );
-  return old != 0;
-}
-#define lll_cond_trylock(futex) __lll_cond_trylock (&(futex))
-
-
-extern void __lll_lock_wait_private (int *futex) attribute_hidden;
-extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
-extern int __lll_robust_lock_wait (int *futex, int private) attribute_hidden;
-
-static inline void
-__attribute__ ((always_inline))
-__lll_lock (int *futex, int private)
-{
-  if (__glibc_unlikely (atomic_compare_and_exchange_bool_acq (futex, 1, 0)))
-    {
-      if (__builtin_constant_p (private) && private == LLL_PRIVATE)
-	__lll_lock_wait_private (futex);
-      else
-	__lll_lock_wait (futex, private);
-    }
-}
-#define lll_lock(futex, private) __lll_lock (&(futex), private)
-
-static inline int
-__attribute__ ((always_inline))
-__lll_robust_lock (int *futex, int id, int private)
-{
-  int result = 0;
-  if (__builtin_expect (atomic_compare_and_exchange_bool_acq (futex, id, 0),
-			0))
-    result = __lll_robust_lock_wait (futex, private);
-  return result;
-}
-#define lll_robust_lock(futex, id, private) \
-  __lll_robust_lock (&(futex), id, private)
-
-static inline void
-__attribute__ ((always_inline))
-__lll_cond_lock (int *futex, int private)
-{
-  if (__glibc_unlikely (atomic_compare_and_exchange_bool_acq (futex, 2, 0)))
-    __lll_lock_wait (futex, private);
-}
-#define lll_cond_lock(futex, private) __lll_cond_lock (&(futex), private)
-
-#define lll_robust_cond_lock(futex, id, private) \
-  __lll_robust_lock (&(futex), (id) | FUTEX_WAITERS, private)
-
-extern int __lll_timedlock_wait
-  (int *futex, const struct timespec *, int private) attribute_hidden;
-extern int __lll_robust_timedlock_wait
-  (int *futex, const struct timespec *, int private) attribute_hidden;
-
-static inline int
-__attribute__ ((always_inline))
-__lll_timedlock (int *futex, const struct timespec *abstime, int private)
-{
-  int result = 0;
-  if (__glibc_unlikely
-      (atomic_compare_and_exchange_bool_acq (futex, 1, 0)))
-    result = __lll_timedlock_wait (futex, abstime, private);
-  return result;
-}
-#define lll_timedlock(futex, abstime, private) \
-  __lll_timedlock (&(futex), abstime, private)
-
-#ifdef ENABLE_LOCK_ELISION
+/* Transactional lock elision definitions.  */
+# ifdef ENABLE_LOCK_ELISION
 extern int __lll_timedlock_elision
   (int *futex, short *adapt_count, const struct timespec *timeout,
    int private) attribute_hidden;
 
-# define lll_timedlock_elision(futex, adapt_count, timeout, private) \
+#  define lll_timedlock_elision(futex, adapt_count, timeout, private) \
   __lll_timedlock_elision(&(futex), &(adapt_count), timeout, private)
-#endif
-
-static inline int
-__attribute__ ((always_inline))
-__lll_robust_timedlock (int *futex, const struct timespec *abstime,
-			int id, int private)
-{
-  int result = 0;
-  if (__builtin_expect (atomic_compare_and_exchange_bool_acq (futex, id, 0),
-			0))
-    result = __lll_robust_timedlock_wait (futex, abstime, private);
-  return result;
-}
-#define lll_robust_timedlock(futex, abstime, id, private) \
-  __lll_robust_timedlock (&(futex), abstime, id, private)
-
-
-#define __lll_unlock(futex, private) \
-  (void) \
-    ({ int __oldval; \
-       int __newval = 0; \
-       int *__futexp = (futex); \
- \
-       lll_compare_and_swap (__futexp, __oldval, __newval, "slr %2,%2"); \
-       if (__glibc_unlikely (__oldval > 1)) \
-	 lll_futex_wake (__futexp, 1, private); \
-    })
-#define lll_unlock(futex, private) __lll_unlock(&(futex), private)
-
-
-#define __lll_robust_unlock(futex, private) \
-  (void) \
-    ({ int __oldval; \
-       int __newval = 0; \
-       int *__futexp = (futex); \
- \
-       lll_compare_and_swap (__futexp, __oldval, __newval, "slr %2,%2"); \
-       if (__glibc_unlikely (__oldval & FUTEX_WAITERS)) \
-	 lll_futex_wake (__futexp, 1, private); \
-    })
-#define lll_robust_unlock(futex, private) \
-  __lll_robust_unlock(&(futex), private)
-
-#define lll_islocked(futex) \
-  (futex != 0)
-
-
-/* Initializers for lock.  */
-#define LLL_LOCK_INITIALIZER		(0)
-#define LLL_LOCK_INITIALIZER_LOCKED	(1)
-
-/* The kernel notifies a process which uses CLONE_CHILD_CLEARTID via futex
-   wakeup when the clone terminates.  The memory location contains the
-   thread ID while the clone is running and is reset to zero
-   afterwards.  */
-#define __lll_wait_tid(ptid) \
-  do \
-    { \
-      int __tid; \
- \
-      while ((__tid = *ptid) != 0) \
-	lll_futex_wait (ptid, __tid, LLL_SHARED); \
-    } \
-  while (0)
-#define lll_wait_tid(tid) __lll_wait_tid(&(tid))
-
-extern int __lll_timedwait_tid (int *, const struct timespec *)
-  attribute_hidden;
-
-#define lll_timedwait_tid(tid, abstime) \
-  ({ \
-    int __res = 0; \
-    if ((tid) != 0) \
-      __res = __lll_timedwait_tid (&(tid), (abstime)); \
-    __res; \
-  })
-
-#ifdef ENABLE_LOCK_ELISION
 extern int __lll_lock_elision (int *futex, short *adapt_count, int private)
   attribute_hidden;
 
@@ -354,12 +39,12 @@ extern int __lll_unlock_elision(int *futex, int private)
 extern int __lll_trylock_elision(int *futex, short *adapt_count)
   attribute_hidden;
 
-# define lll_lock_elision(futex, adapt_count, private) \
+#  define lll_lock_elision(futex, adapt_count, private) \
   __lll_lock_elision (&(futex), &(adapt_count), private)
-# define lll_unlock_elision(futex, private) \
+#  define lll_unlock_elision(futex, private) \
   __lll_unlock_elision (&(futex), private)
-# define lll_trylock_elision(futex, adapt_count) \
+#  define lll_trylock_elision(futex, adapt_count) \
   __lll_trylock_elision(&(futex), &(adapt_count))
-#endif
+# endif  /* ENABLE_LOCK_ELISION */
 
 #endif /* lowlevellock.h */
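As an aside (not part of the patch): the comment in the atomic.h hunk
above is about the shape of the retry loop.  A rough C rendition of the
generic exchange loop, with illustrative names and <stdatomic.h>
stand-ins for the glibc internals, shows why an extra register and
compare are needed when the condition code is not used:

  #include <stdatomic.h>

  /* Generic-style exchange: the previous value must be kept in a
     register, and the loop retries until a compare-and-swap against
     that saved value succeeds, i.e. the loop condition is a value
     comparison.  */
  static int
  exchange_generic_style (atomic_int *mem, int newval)
  {
    int old = atomic_load_explicit (mem, memory_order_relaxed);
    /* On failure the CAS reloads OLD and the comparison is redone;
       the s390 asm in the patch instead just branches on the cs
       condition code ("jl 0b"), needing neither the extra register
       nor the compare.  */
    while (!atomic_compare_exchange_weak_explicit (mem, &old, newval,
                                                   memory_order_acquire,
                                                   memory_order_relaxed))
      ;
    return old;
  }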