From patchwork Wed Oct 29 21:50:16 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Torvald Riegel <triegel@redhat.com>
X-Patchwork-Id: 3490
Received: (qmail 15942 invoked by alias); 29 Oct 2014 21:50:23 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 15930 invoked by uid 89); 29 Oct 2014 21:50:21 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-3.4 required=5.0 tests=AWL, BAYES_00,
	RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Subject: [PATCH 2/4] Add atomic operations similar to those provided by C11.
From: Torvald Riegel <triegel@redhat.com>
To: GLIBC Devel <libc-alpha@sourceware.org>
In-Reply-To: <1414617613.10085.23.camel@triegel.csb>
References: <1414617613.10085.23.camel@triegel.csb>
Date: Wed, 29 Oct 2014 22:50:16 +0100
Message-ID: <1414619416.10085.46.camel@triegel.csb>
Mime-Version: 1.0

This patch adds atomic operations similar to C11.

The function naming is essentially the C11 names, but with the memory
order argument removed and added as a suffix.  For example, C11's
  atomic_store_explicit(&foo, 23, memory_order_release)
becomes
  atomic_store_relaxed (&foo, 23);

This avoids naming collisions, and it's not much more verbose because we
would make the memory orders explicit anyway.

The provided functions are a subset of what C11 provides.  I think it's
better to start minimal, as we can easily add more operations if needed.
I've chosen this subset based on the operations that would be necessary
for generic implementations of mutexes, pthread_once, rwlock,
semaphores, and condvars.

Likewise, I've chosen to only provide 32b atomics (which we need for
futexes, for example) and additionally 64b atomics if provided by the
arch (that's what __HAVE_64B_ATOMICS is for, see Patch 1/4).

Per Josephs request, attempts to apply atomic operations when using the
__atomic builtins to variables of other sizes results in an early error.
If not using the builtins, I rely on the existing atomics to Do The
Right Thing.

	* include/atomic.h (__atomic_link_error, __atomic_check_size,
	atomic_thread_fence_acquire, atomic_thread_fence_release,
	atomic_thread_fence_seq_cst, atomic_load_relaxed,
	atomic_load_acquire, atomic_store_relaxed, atomic_store_release,
	atomic_compare_exchange_weak_relaxed,
	atomic_compare_exchange_weak_acquire,
	atomic_compare_exchange_weak_release,
	atomic_exchange_acquire, atomic_exchange_release,
	atomic_fetch_add_relaxed, atomic_fetch_add_acquire,
	atomic_fetch_add_release, atomic_fetch_add_acq_rel,
	atomic_fetch_and_acquire,
	atomic_fetch_or_relaxed, atomic_fetch_or_acquire): New.

commit 5921ff5b989972b744f5cb7d8f4f0b16c04ab19e
Author: Torvald Riegel <triegel@redhat.com>
Date:   Sun Sep 14 20:04:54 2014 +0200

    Add atomic operations similar to those provided by C11.

diff --git a/include/atomic.h b/include/atomic.h
index 3e82b6a..c4df8f1 100644
--- a/include/atomic.h
+++ b/include/atomic.h
@@ -542,6 +542,218 @@
   ({ __typeof (x) __x; __asm ("" : "=r" (__x) : "0" (x)); __x; })
 #endif
 
+/* This is equal to 1 iff the architecture supports 64b atomic operations.  */
+#ifndef __HAVE_64B_ATOMICS
+#error Unable to determine if 64-bit atomics are present.
+#endif
+
+/* The following functions are a subset of the atomic operations provided by
+   C11.  Usually, a function named atomic_OP_MO(args) is equivalent to C11's
+   atomic_OP_explicit(args, memory_order_MO); exceptions noted below.  */
+
+/* Each arch can request to use compiler built-ins for C11 atomics.  If it
+   does, all atomics will be based on these.  */
+#if USE_ATOMIC_COMPILER_BUILTINS
+
+/* We require 32b atomic operations; some archs also support 64b atomic
+   operations.  */
+void __atomic_link_error (void);
+# if __HAVE_64B_ATOMICS == 1
+#  define __atomic_check_size(mem) \
+   if ((sizeof (*mem) != 4) && (sizeof (*mem) != 8))			      \
+     __atomic_link_error ();
+# else
+#  define __atomic_check_size(mem) \
+   if (sizeof (*mem) != 4)						      \
+     __atomic_link_error ();
+# endif
+
+# define atomic_thread_fence_acquire() \
+  __atomic_thread_fence (__ATOMIC_ACQUIRE)
+# define atomic_thread_fence_release() \
+  __atomic_thread_fence (__ATOMIC_RELEASE)
+# define atomic_thread_fence_seq_cst() \
+  __atomic_thread_fence (__ATOMIC_SEQ_CST)
+
+# define atomic_load_relaxed(mem) \
+  ({ __atomic_check_size((mem)); __atomic_load_n ((mem), __ATOMIC_RELAXED); })
+# define atomic_load_acquire(mem) \
+  ({ __atomic_check_size((mem)); __atomic_load_n ((mem), __ATOMIC_ACQUIRE); })
+
+# define atomic_store_relaxed(mem, val) \
+  do {									      \
+    __atomic_check_size((mem));						      \
+    __atomic_store_n ((mem), (val), __ATOMIC_RELAXED);			      \
+  } while (0)
+# define atomic_store_release(mem, val) \
+  do {									      \
+    __atomic_check_size((mem));						      \
+    __atomic_store_n ((mem), (val), __ATOMIC_RELEASE);			      \
+  } while (0)
+
+/* On failure, this CAS has memory_order_relaxed semantics.  */
+# define atomic_compare_exchange_weak_relaxed(mem, expected, desired) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_compare_exchange_n ((mem), (expected), (desired), 1,		      \
+    __ATOMIC_RELAXED, __ATOMIC_RELAXED); })
+# define atomic_compare_exchange_weak_acquire(mem, expected, desired) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_compare_exchange_n ((mem), (expected), (desired), 1,		      \
+    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); })
+# define atomic_compare_exchange_weak_release(mem, expected, desired) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_compare_exchange_n ((mem), (expected), (desired), 1,		      \
+    __ATOMIC_RELEASE, __ATOMIC_RELAXED); })
+
+# define atomic_exchange_acquire(mem, desired) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_exchange_n ((mem), (desired), __ATOMIC_ACQUIRE); })
+# define atomic_exchange_release(mem, desired) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_exchange_n ((mem), (desired), __ATOMIC_RELEASE); })
+
+# define atomic_fetch_add_relaxed(mem, operand) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_fetch_add ((mem), (operand), __ATOMIC_RELAXED); })
+# define atomic_fetch_add_acquire(mem, operand) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_fetch_add ((mem), (operand), __ATOMIC_ACQUIRE); })
+# define atomic_fetch_add_release(mem, operand) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_fetch_add ((mem), (operand), __ATOMIC_RELEASE); })
+# define atomic_fetch_add_acq_rel(mem, operand) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_fetch_add ((mem), (operand), __ATOMIC_ACQ_REL); })
+
+# define atomic_fetch_and_acquire(mem, operand) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_fetch_and ((mem), (operand), __ATOMIC_ACQUIRE); })
+
+# define atomic_fetch_or_relaxed(mem, operand) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_fetch_or ((mem), (operand), __ATOMIC_RELAXED); })
+# define atomic_fetch_or_acquire(mem, operand) \
+  ({ __atomic_check_size((mem));					      \
+  __atomic_fetch_or ((mem), (operand), __ATOMIC_ACQUIRE); })
+
+#else /* !USE_ATOMIC_COMPILER_BUILTINS  */
+
+/* By default, we assume that read, write, and full barriers are equivalent
+   to acquire, release, and seq_cst barriers.  Archs for which this does not
+   hold have to provide custom definitions of the fences.  */
+# ifndef atomic_thread_fence_acquire
+#  define atomic_thread_fence_acquire() atomic_read_barrier ()
+# endif
+# ifndef atomic_thread_fence_release
+#  define atomic_thread_fence_release() atomic_write_barrier ()
+# endif
+# ifndef atomic_thread_fence_seq_cst
+#  define atomic_thread_fence_seq_cst() atomic_full_barrier ()
+# endif
+
+# ifndef atomic_load_relaxed
+#  define atomic_load_relaxed(mem) \
+   ({ __typeof (*(mem)) __atg100_val;					      \
+   __asm ("" : "=r" (__atg100_val) : "0" (*(mem)));			      \
+   __atg100_val; })
+# endif
+# ifndef atomic_load_acquire
+#  define atomic_load_acquire(mem) \
+   ({ __typeof (*(mem)) __atg101_val = atomic_load_relaxed (mem);	      \
+   atomic_thread_fence_acquire ();					      \
+   __atg101_val; })
+# endif
+
+# ifndef atomic_store_relaxed
+/* XXX Use inline asm here?  */
+#  define atomic_store_relaxed(mem, val) do { *(mem) = (val); } while (0)
+# endif
+# ifndef atomic_store_release
+#  define atomic_store_release(mem, val) \
+   do {									      \
+     atomic_thread_fence_release ();					      \
+     atomic_store_relaxed ((mem), (val));				      \
+   } while (0)
+# endif
+
+/* On failure, this CAS has memory_order_relaxed semantics.  */
+/* XXX This potentially has one branch more than necessary, but archs
+   currently do not define a CAS that returns both the previous value and
+   the success flag.  */
+# ifndef atomic_compare_exchange_weak_acquire
+#  define atomic_compare_exchange_weak_acquire(mem, expected, desired) \
+   ({ typeof (*(expected)) __atg102_expected = *(expected);		      \
+   *(expected) =							      \
+     atomic_compare_and_exchange_val_acq ((mem), (desired), *(expected));     \
+   *(expected) == __atg102_expected; })
+# endif
+# ifndef atomic_compare_exchange_weak_relaxed
+/* XXX Fall back to CAS with acquire MO because archs do not define a weaker
+   CAS.  */
+#  define atomic_compare_exchange_weak_relaxed(mem, expected, desired) \
+   atomic_compare_exchange_weak_acquire ((mem), (expected), (desired))
+# endif
+# ifndef atomic_compare_exchange_weak_release
+#  define atomic_compare_exchange_weak_release(mem, expected, desired) \
+   ({ typeof (*(expected)) __atg103_expected = *(expected);		      \
+   *(expected) =							      \
+     atomic_compare_and_exchange_val_rel ((mem), (desired), *(expected));     \
+   *(expected) == __atg103_expected; })
+# endif
+
+# ifndef atomic_exchange_acquire
+#  define atomic_exchange_acquire(mem, val) \
+   atomic_exchange_acq ((mem), (val))
+# endif
+# ifndef atomic_exchange_release
+#  define atomic_exchange_release(mem, val) \
+   atomic_exchange_rel ((mem), (val))
+# endif
+
+# ifndef atomic_fetch_add_acquire
+#  define atomic_fetch_add_acquire(mem, operand) \
+   atomic_exchange_and_add_acq ((mem), (operand))
+# endif
+# ifndef atomic_fetch_add_relaxed
+/* XXX Fall back to acquire MO because the MO semantics of
+   atomic_exchange_and_add are not documented; the generic version falls back
+   to atomic_exchange_and_add_acq if atomic_exchange_and_add is not defined,
+   and vice versa.  */
+#  define atomic_fetch_add_relaxed(mem, operand) \
+   atomic_fetch_add_acquire ((mem), (operand))
+# endif
+# ifndef atomic_fetch_add_release
+#  define atomic_fetch_add_release(mem, operand) \
+   atomic_exchange_and_add_rel ((mem), (operand))
+# endif
+# ifndef atomic_fetch_add_acq_rel
+#  define atomic_fetch_add_acq_rel(mem, operand) \
+   ({ atomic_thread_fence_release ();					      \
+   atomic_exchange_and_add_acq ((mem), (operand)); })
+# endif
+
+/* XXX The default for atomic_and_val has acquire semantics, but this is not
+   documented.  */
+# ifndef atomic_fetch_and_acquire
+#  define atomic_fetch_and_acquire(mem, operand) \
+   atomic_and_val ((mem), (operand))
+# endif
+
+/* XXX The default for atomic_or_val has acquire semantics, but this is not
+   documented.  */
+# ifndef atomic_fetch_or_acquire
+#  define atomic_fetch_or_acquire(mem, operand) \
+   atomic_or_val ((mem), (operand))
+# endif
+/* XXX Fall back to acquire MO because archs do not define a weaker
+   atomic_or_val.  */
+# ifndef atomic_fetch_or_relaxed
+#  define atomic_fetch_or_relaxed(mem, operand) \
+   atomic_fetch_or_acquire ((mem), (operand))
+# endif
+
+#endif /* !USE_ATOMIC_COMPILER_BUILTINS  */
+
 
 #ifndef atomic_delay
 # define atomic_delay() do { /* nothing */ } while (0)