From patchwork Tue Dec  6 13:51:52 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Liebler <stli@linux.vnet.ibm.com>
X-Patchwork-Id: 18230
Received: (qmail 75996 invoked by alias); 6 Dec 2016 13:52:17 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 75909 invoked by uid 89); 6 Dec 2016 13:52:16 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.1 required=5.0 tests=AWL, BAYES_00,
	KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW,
	RCVD_IN_SEMBACKSCATTER autolearn=no version=3.3.2 spammy=futile,
	becoming, transaction
X-HELO: mx0a-001b2d01.pphosted.com
From: Stefan Liebler <stli@linux.vnet.ibm.com>
To: libc-alpha@sourceware.org
Cc: Stefan Liebler <stli@linux.vnet.ibm.com>
Subject: [PATCH 1/4] S390: Use C11-like atomics instead of plain memory
	accesses in lock elision code.
Date: Tue,  6 Dec 2016 14:51:52 +0100
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16120613-0012-0000-0000-0000049EF1C1
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 16120613-0013-0000-0000-0000167B75E2
Message-Id: <1481032315-12420-1-git-send-email-stli@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, ,
	definitions=2016-12-06_08:, , signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
	spamscore=0 suspectscore=1
	malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam
	adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000
	definitions=main-1612060220

This uses atomic operations to access lock elision metadata that is accessed
concurrently (ie, adapt_count fields).  The size of the data is less than a
word but accessed only with atomic loads and stores.

See also x86 commit ca6e601a9d4a72b3699cca15bad12ac1716bf49a:
"Use C11-like atomics instead of plain memory accesses in x86 lock elision."

ChangeLog:

	* sysdeps/unix/sysv/linux/s390/elision-lock.c
	(__lll_lock_elision): Use atomics to load / store adapt_count.
	* sysdeps/unix/sysv/linux/s390/elision-trylock.c
	(__lll_trylock_elision): Likewise.
---
 sysdeps/unix/sysv/linux/s390/elision-lock.c    | 25 ++++++++++++++++++-------
 sysdeps/unix/sysv/linux/s390/elision-trylock.c | 14 +++++++++-----
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/s390/elision-lock.c b/sysdeps/unix/sysv/linux/s390/elision-lock.c
index ecb507e..1876d21 100644
--- a/sysdeps/unix/sysv/linux/s390/elision-lock.c
+++ b/sysdeps/unix/sysv/linux/s390/elision-lock.c
@@ -45,11 +45,18 @@
 int
 __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
 {
-  if (*adapt_count > 0)
+  /* adapt_count can be accessed concurrently; these accesses can be both
+     inside of transactions (if critical sections are nested and the outer
+     critical section uses lock elision) and outside of transactions.  Thus,
+     we need to use atomic accesses to avoid data races.  However, the
+     value of adapt_count is just a hint, so relaxed MO accesses are
+     sufficient.  */
+  if (atomic_load_relaxed (adapt_count) > 0)
     {
       /* Lost updates are possible, but harmless.  Due to races this might lead
 	 to *adapt_count becoming less than zero.  */
-      (*adapt_count)--;
+      atomic_store_relaxed (adapt_count,
+			    atomic_load_relaxed (adapt_count) - 1);
       goto use_lock;
     }
 
@@ -74,8 +81,10 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
 	      /* In a non-nested transaction there is no need to abort,
 		 which is expensive.  */
 	      __builtin_tend ();
+	      /* Don't try to use transactions for the next couple of times.
+		 See above for why relaxed MO is sufficient.  */
 	      if (aconf.skip_lock_busy > 0)
-		*adapt_count = aconf.skip_lock_busy;
+		atomic_store_relaxed (adapt_count, aconf.skip_lock_busy);
 	      goto use_lock;
 	    }
 	  else /* nesting depth is > 1 */
@@ -101,18 +110,20 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
 	      /* A persistent abort (cc 1 or 3) indicates that a retry is
 		 probably futile.  Use the normal locking now and for the
 		 next couple of calls.
-		 Be careful to avoid writing to the lock.  */
+		 Be careful to avoid writing to the lock.  See above for why
+		 relaxed MO is sufficient.  */
 	      if (aconf.skip_lock_internal_abort > 0)
-		*adapt_count = aconf.skip_lock_internal_abort;
+		atomic_store_relaxed (adapt_count,
+				      aconf.skip_lock_internal_abort);
 	      goto use_lock;
 	    }
 	}
     }
 
   /* Same logic as above, but for for a number of temporary failures in a
-     row.  */
+     row.  See above for why relaxed MO is sufficient.  */
   if (aconf.skip_lock_out_of_tbegin_retries > 0 && aconf.try_tbegin > 0)
-    *adapt_count = aconf.skip_lock_out_of_tbegin_retries;
+    atomic_store_relaxed (adapt_count, aconf.skip_lock_out_of_tbegin_retries);
 
   use_lock:
   return LLL_LOCK ((*futex), private);
diff --git a/sysdeps/unix/sysv/linux/s390/elision-trylock.c b/sysdeps/unix/sysv/linux/s390/elision-trylock.c
index 3d5a994..a3252b8 100644
--- a/sysdeps/unix/sysv/linux/s390/elision-trylock.c
+++ b/sysdeps/unix/sysv/linux/s390/elision-trylock.c
@@ -49,8 +49,10 @@ __lll_trylock_elision (int *futex, short *adapt_count)
       __builtin_tabort (_HTM_FIRST_USER_ABORT_CODE | 1);
     }
 
-  /* Only try a transaction if it's worth it.  */
-  if (*adapt_count <= 0)
+  /* Only try a transaction if it's worth it.  See __lll_lock_elision for
+     why we need atomic accesses.  Relaxed MO is sufficient because this is
+     just a hint.  */
+  if (atomic_load_relaxed (adapt_count) <= 0)
     {
       unsigned status;
 
@@ -65,9 +67,10 @@ __lll_trylock_elision (int *futex, short *adapt_count)
 	  __builtin_tend ();
 	  /* Note: Changing the adapt_count here might abort a transaction on a
 	     different cpu, but that could happen anyway when the futex is
-	     acquired, so there's no need to check the nesting depth here.  */
+	     acquired, so there's no need to check the nesting depth here.
+	     See above for why relaxed MO is sufficient.  */
 	  if (aconf.skip_lock_busy > 0)
-	    *adapt_count = aconf.skip_lock_busy;
+	    atomic_store_relaxed (adapt_count, aconf.skip_lock_busy);
 	}
       else
 	{
@@ -87,7 +90,8 @@ __lll_trylock_elision (int *futex, short *adapt_count)
     {
       /* Lost updates are possible, but harmless.  Due to races this might lead
 	 to *adapt_count becoming less than zero.  */
-      (*adapt_count)--;
+      atomic_store_relaxed (adapt_count,
+			    atomic_load_relaxed (adapt_count) - 1);
     }
 
   return lll_trylock (*futex);