From patchwork Tue Dec 6 13:51:52 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Liebler X-Patchwork-Id: 18230 Received: (qmail 75996 invoked by alias); 6 Dec 2016 13:52:17 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 75909 invoked by uid 89); 6 Dec 2016 13:52:16 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.1 required=5.0 tests=AWL, BAYES_00, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW, RCVD_IN_SEMBACKSCATTER autolearn=no version=3.3.2 spammy=futile, becoming, transaction X-HELO: mx0a-001b2d01.pphosted.com From: Stefan Liebler To: libc-alpha@sourceware.org Cc: Stefan Liebler Subject: [PATCH 1/4] S390: Use C11-like atomics instead of plain memory accesses in lock elision code. Date: Tue, 6 Dec 2016 14:51:52 +0100 X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16120613-0012-0000-0000-0000049EF1C1 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16120613-0013-0000-0000-0000167B75E2 Message-Id: <1481032315-12420-1-git-send-email-stli@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-12-06_08:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=1 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000 definitions=main-1612060220 This uses atomic operations to access lock elision metadata that is accessed concurrently (ie, adapt_count fields). The size of the data is less than a word but accessed only with atomic loads and stores. See also x86 commit ca6e601a9d4a72b3699cca15bad12ac1716bf49a: "Use C11-like atomics instead of plain memory accesses in x86 lock elision." ChangeLog: * sysdeps/unix/sysv/linux/s390/elision-lock.c (__lll_lock_elision): Use atomics to load / store adapt_count. * sysdeps/unix/sysv/linux/s390/elision-trylock.c (__lll_trylock_elision): Likewise. --- sysdeps/unix/sysv/linux/s390/elision-lock.c | 25 ++++++++++++++++++------- sysdeps/unix/sysv/linux/s390/elision-trylock.c | 14 +++++++++----- 2 files changed, 27 insertions(+), 12 deletions(-) diff --git a/sysdeps/unix/sysv/linux/s390/elision-lock.c b/sysdeps/unix/sysv/linux/s390/elision-lock.c index ecb507e..1876d21 100644 --- a/sysdeps/unix/sysv/linux/s390/elision-lock.c +++ b/sysdeps/unix/sysv/linux/s390/elision-lock.c @@ -45,11 +45,18 @@ int __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private) { - if (*adapt_count > 0) + /* adapt_count can be accessed concurrently; these accesses can be both + inside of transactions (if critical sections are nested and the outer + critical section uses lock elision) and outside of transactions. Thus, + we need to use atomic accesses to avoid data races. However, the + value of adapt_count is just a hint, so relaxed MO accesses are + sufficient. */ + if (atomic_load_relaxed (adapt_count) > 0) { /* Lost updates are possible, but harmless. Due to races this might lead to *adapt_count becoming less than zero. */ - (*adapt_count)--; + atomic_store_relaxed (adapt_count, + atomic_load_relaxed (adapt_count) - 1); goto use_lock; } @@ -74,8 +81,10 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private) /* In a non-nested transaction there is no need to abort, which is expensive. */ __builtin_tend (); + /* Don't try to use transactions for the next couple of times. + See above for why relaxed MO is sufficient. */ if (aconf.skip_lock_busy > 0) - *adapt_count = aconf.skip_lock_busy; + atomic_store_relaxed (adapt_count, aconf.skip_lock_busy); goto use_lock; } else /* nesting depth is > 1 */ @@ -101,18 +110,20 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private) /* A persistent abort (cc 1 or 3) indicates that a retry is probably futile. Use the normal locking now and for the next couple of calls. - Be careful to avoid writing to the lock. */ + Be careful to avoid writing to the lock. See above for why + relaxed MO is sufficient. */ if (aconf.skip_lock_internal_abort > 0) - *adapt_count = aconf.skip_lock_internal_abort; + atomic_store_relaxed (adapt_count, + aconf.skip_lock_internal_abort); goto use_lock; } } } /* Same logic as above, but for for a number of temporary failures in a - row. */ + row. See above for why relaxed MO is sufficient. */ if (aconf.skip_lock_out_of_tbegin_retries > 0 && aconf.try_tbegin > 0) - *adapt_count = aconf.skip_lock_out_of_tbegin_retries; + atomic_store_relaxed (adapt_count, aconf.skip_lock_out_of_tbegin_retries); use_lock: return LLL_LOCK ((*futex), private); diff --git a/sysdeps/unix/sysv/linux/s390/elision-trylock.c b/sysdeps/unix/sysv/linux/s390/elision-trylock.c index 3d5a994..a3252b8 100644 --- a/sysdeps/unix/sysv/linux/s390/elision-trylock.c +++ b/sysdeps/unix/sysv/linux/s390/elision-trylock.c @@ -49,8 +49,10 @@ __lll_trylock_elision (int *futex, short *adapt_count) __builtin_tabort (_HTM_FIRST_USER_ABORT_CODE | 1); } - /* Only try a transaction if it's worth it. */ - if (*adapt_count <= 0) + /* Only try a transaction if it's worth it. See __lll_lock_elision for + why we need atomic accesses. Relaxed MO is sufficient because this is + just a hint. */ + if (atomic_load_relaxed (adapt_count) <= 0) { unsigned status; @@ -65,9 +67,10 @@ __lll_trylock_elision (int *futex, short *adapt_count) __builtin_tend (); /* Note: Changing the adapt_count here might abort a transaction on a different cpu, but that could happen anyway when the futex is - acquired, so there's no need to check the nesting depth here. */ + acquired, so there's no need to check the nesting depth here. + See above for why relaxed MO is sufficient. */ if (aconf.skip_lock_busy > 0) - *adapt_count = aconf.skip_lock_busy; + atomic_store_relaxed (adapt_count, aconf.skip_lock_busy); } else { @@ -87,7 +90,8 @@ __lll_trylock_elision (int *futex, short *adapt_count) { /* Lost updates are possible, but harmless. Due to races this might lead to *adapt_count becoming less than zero. */ - (*adapt_count)--; + atomic_store_relaxed (adapt_count, + atomic_load_relaxed (adapt_count) - 1); } return lll_trylock (*futex);