From patchwork Wed Oct 29 22:10:50 2014
X-Patchwork-Submitter: Torvald Riegel
X-Patchwork-Id: 3492
Subject: [PATCH 4/4] Use C11 atomics in pthread_once.
From: Torvald Riegel
To: GLIBC Devel
In-Reply-To: <1414617613.10085.23.camel@triegel.csb>
References: <1414617613.10085.23.camel@triegel.csb>
Date: Wed, 29 Oct 2014 23:10:50 +0100
Message-ID: <1414620650.10085.63.camel@triegel.csb>

This patch transforms pthread_once to use C11 atomics.  It's meant as an
illustration and early test.

Please note that I've transformed *all* accesses to concurrently accessed
memory locations to use atomic operations.  This is the right thing to do
to inform the compiler about concurrency and to prevent it from making
optimizations based on assumptions of data-race-freedom that hold only for
sequential code (concurrent accesses are not sequential code...).  You'll
see that atomic_*_relaxed is used quite a bit; it restricts the compiler a
little but does not add any barriers.  It also makes it easy to see which
loads and stores are actually concurrent code and thus need additional
attention from the programmer.

I've compared the generated code on x86_64 with GCC 4.4.7.
The only thing that changes between before/after the patch is that a
"cmp %eax,%edx" becomes a "cmp %edx,%eax"; since this is only used to test
the values for equality, the difference is harmless.

I've also looked at the code generated by a pre-4.9 GCC build.  The code
generated for the pthread_once fast path is the same as with GCC 4.4.7
before the patch.  The slow path gets somewhat different code with the more
recent compiler, with fewer instructions.

Joseph mentioned that GCC doesn't yet optimize memory_order_relaxed loads
as well as it could.  This is something we may have to look at again for
other code, but for pthread_once at least this wasn't a problem.  My gut
feeling is that it won't matter as much for most concurrent code, because
relaxed atomic ops will likely be close to other atomic ops with stronger
memory orders, unless you're doing something special like generating loads
and loads of relaxed ops.

	* nptl/pthread_once.c (clear_once_control, __pthread_once_slow,
	__pthread_once): Use C11 atomics.

commit 985496a0a6cd8dbef60276e400bc3e51528862a6
Author: Torvald Riegel
Date:   Tue Oct 21 00:34:53 2014 +0200

    Use C11 atomics in pthread_once.

diff --git a/nptl/pthread_once.c b/nptl/pthread_once.c
index 05aea84..9bb82e9 100644
--- a/nptl/pthread_once.c
+++ b/nptl/pthread_once.c
@@ -34,7 +34,7 @@ clear_once_control (void *arg)
      other threads that see this value: This function will be called if we
      get interrupted (see __pthread_once), so all we need to relay to other
      threads is the state being reset again.  */
-  *once_control = 0;
+  atomic_store_relaxed (once_control, 0);
   lll_futex_wake (once_control, INT_MAX, LLL_PRIVATE);
 }
@@ -68,20 +68,18 @@ __pthread_once_slow (pthread_once_t *once_control, void (*init_routine) (void))
 {
   while (1)
     {
-      int oldval, val, newval;
+      int val, newval;

       /* We need acquire memory order for this load because if the value
          signals that initialization has finished, we need to see any
          data modifications done during initialization.  */
-      val = *once_control;
-      atomic_read_barrier ();
+      val = atomic_load_acquire (once_control);
       do
         {
           /* Check if the initialization has already been done.  */
           if (__glibc_likely ((val & __PTHREAD_ONCE_DONE) != 0))
             return 0;

-          oldval = val;
           /* We try to set the state to in-progress and having the current
              fork generation.  We don't need atomic accesses for the fork
              generation because it's immutable in a particular process, and
@@ -90,18 +88,17 @@ __pthread_once_slow (pthread_once_t *once_control, void (*init_routine) (void))
           newval = __fork_generation | __PTHREAD_ONCE_INPROGRESS;

           /* We need acquire memory order here for the same reason as for
              the load from once_control above.  */
-          val = atomic_compare_and_exchange_val_acq (once_control, newval,
-                                                     oldval);
         }
-      while (__glibc_unlikely (val != oldval));
+      while (__glibc_unlikely (!atomic_compare_exchange_weak_acquire (
+          once_control, &val, newval)));

       /* Check if another thread already runs the initializer.  */
-      if ((oldval & __PTHREAD_ONCE_INPROGRESS) != 0)
+      if ((val & __PTHREAD_ONCE_INPROGRESS) != 0)
         {
           /* Check whether the initializer execution was interrupted by a
              fork.  We know that for both values, __PTHREAD_ONCE_INPROGRESS
              is set and __PTHREAD_ONCE_DONE is not.  */
-          if (oldval == newval)
+          if (val == newval)
             {
               /* Same generation, some other thread was faster.  Wait.  */
               lll_futex_wait (once_control, newval, LLL_PRIVATE);
@@ -122,8 +119,7 @@ __pthread_once_slow (pthread_once_t *once_control, void (*init_routine) (void))
       /* Mark *once_control as having finished the initialization.  We need
          release memory order here because we need to synchronize with other
          threads that want to use the initialized data.  */
-      atomic_write_barrier ();
-      *once_control = __PTHREAD_ONCE_DONE;
+      atomic_store_release (once_control, __PTHREAD_ONCE_DONE);

       /* Wake up all other threads.  */
       lll_futex_wake (once_control, INT_MAX, LLL_PRIVATE);
@@ -138,8 +134,7 @@ __pthread_once (pthread_once_t *once_control, void (*init_routine) (void))
 {
   /* Fast path.  See __pthread_once_slow.  */
   int val;
-  val = *once_control;
-  atomic_read_barrier ();
+  val = atomic_load_acquire (once_control);
   if (__glibc_likely ((val & __PTHREAD_ONCE_DONE) != 0))
     return 0;
   else