
Only use 64b atomics in semaphores if long int is 64b.

Message ID 1422647712.16670.41.camel@triegel.csb
State Dropped

Commit Message

Torvald Riegel Jan. 30, 2015, 7:55 p.m. UTC
This patch is another attempt at fixing this.  It uses Joseph's
suggestion for a new flag, SEM_USE_64B_ATOMICS, which is set to 1 iff
__HAVE_64B_ATOMICS != 0 and defined (__LP64__).  This is meant to
capture the condition that we need "long int" to be an 8-byte-sized type.

This does expect that an 8B-sized long int will also be 8B-aligned.  Do
we support any arch where this isn't true?

Joseph also suggested using !SHLIB_COMPAT (libpthread, GLIBC_2_0,
GLIBC_2_22) to enable this on archs for which support has been added
starting with 2.22.  I'm not entirely sure how this machinery works
(specifically, where/how an arch defines what its first supported
version is), but I'll prepare a patch that uses this in the struct
new_sem definition and tries to merge the many bits/semaphore.h headers
into a generic one (so that new archs would pick up this one
specifically).
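A sketch of how the suggested condition might look in the struct new_sem definition (hypothetical; the exact placement and spelling of the check are assumptions, not part of this patch):

```c
/* Hypothetical sketch only: enable 64b atomics either on LP64, or on
   archs whose oldest supported libpthread version is 2.22 or later
   (so no pre-2.22 compat symbols constrain the sem_t layout).  */
#if __HAVE_64B_ATOMICS \
    && (defined (__LP64__) \
	|| !SHLIB_COMPAT (libpthread, GLIBC_2_0, GLIBC_2_22))
# define SEM_USE_64B_ATOMICS 1
#else
# define SEM_USE_64B_ATOMICS 0
#endif
```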

Chris' patch for at-runtime realignment of the data can still be added
during the 2.22 cycle.  I talked a bit about it with Carlos offline, and
one concern he had was violating the aliasing rules in the language, and
thus potentially making the compiler do funny things (hence the
hesitation to use this for 2.21).

Tested on x86_64-linux and i686-linux.

OK?

Carlos, if OK, do you want this for 2.21, or for 2.22 (and let all
affected archs just set __HAVE_64B_ATOMICS to 0 as MIPS has done
already)?

(Will add changelog later.)

Comments

H.J. Lu Jan. 30, 2015, 8:08 p.m. UTC | #1
On Fri, Jan 30, 2015 at 11:55 AM, Torvald Riegel <triegel@redhat.com> wrote:
> This patch is another attempt at fixing this.  It uses Joseph's
> suggestion for a new flag, SEM_USE_64B_ATOMICS, which is set to 1 iff
> __HAVE_64B_ATOMICS != 0 and defined (__LP64__).  This is meant to
> capture the condition that we need "long int" to be an 8-byte-sized type.
>
> This does expect that an 8B-sized long int will also be 8B-aligned.  Do
> we support any arch where this isn't true?
>
> Joseph also suggested using !SHLIB_COMPAT (libpthread, GLIBC_2_0,
> GLIBC_2_22) to enable this on archs for which support has been added
> starting with 2.22.  I'm not entirely sure how this machinery works
> (specifically, where/how an arch defines what its first supported
> version is), but I'll prepare a patch that uses this in the struct
> new_sem definition and tries to merge the many bits/semaphore.h headers
> into a generic one (so that new archs would pick up this one
> specifically).
>
> Chris' patch for at-runtime realignment of the data can still be added
> during the 2.22 cycle.  I talked a bit about it with Carlos offline, and
> one concern he had was violating the aliasing rules in the language, and
> thus potentially making the compiler do funny things (hence the
> hesitation to use this for 2.21).
>
> Tested on x86_64-linux and i686-linux.
>
> OK?
>
> Carlos, if OK, do you want this for 2.21, or for 2.22 (and let all
> affected archs just set __HAVE_64B_ATOMICS to 0 as MIPS has done
> already)?
>
>

After looking over the semaphore core, I decided to use 64-bit
atomics in semaphore for x32 and pay the extra cycle.  Please
define SEM_USE_64B_ATOMICS for x32.

Thanks.
Carlos O'Donell Feb. 1, 2015, 11:56 p.m. UTC | #2
On 01/30/2015 02:55 PM, Torvald Riegel wrote:
> This patch is another attempt at fixing this.  It uses Joseph's
> suggestion for a new flag, SEM_USE_64B_ATOMICS, which is set to 1 iff
> __HAVE_64B_ATOMICS != 0 and defined (__LP64__).  This is meant to
> capture the condition that we need "long int" to be an 8-byte-sized type.
> 
> This does expect that an 8B-sized long int will also be 8B-aligned.  Do
> we support any arch where this isn't true?
> 
> Joseph also suggested using !SHLIB_COMPAT (libpthread, GLIBC_2_0,
> GLIBC_2_22) to enable this on archs for which support has been added
> starting with 2.22.  I'm not entirely sure how this machinery works
> (specifically, where/how an arch defines what its first supported
> version is), but I'll prepare a patch that uses this in the struct
> new_sem definition and tries to merge the many bits/semaphore.h headers
> into a generic one (so that new archs would pick up this one
> specifically).
> 
> Chris' patch for at-runtime realignment of the data can still be added
> during the 2.22 cycle.  I talked a bit about it with Carlos offline, and
> one concern he had was violating the aliasing rules in the language, and
> thus potentially making the compiler do funny things (hence the
> hesitation to use this for 2.21).
> 
> Tested on x86_64-linux and i686-linux.
> 
> OK?
> 
> Carlos, if OK, do you want this for 2.21, or for 2.22 (and let all
> affected archs just set __HAVE_64B_ATOMICS to 0 as MIPS has done
> already)?

Please aim for 2.22, as it needs a review of the SHLIB_COMPAT usage to
make sure it does what we intend.

Cheers,
Carlos.

Patch

commit 201e6fddc7213a7ad5eb9f9866d6fcb130d6065b
Author: Torvald Riegel <triegel@redhat.com>
Date:   Fri Jan 30 19:15:31 2015 +0100

    Only use 64b atomics in semaphores if long int is 64b.

diff --git a/nptl/sem_getvalue.c b/nptl/sem_getvalue.c
index 1432cc7..06b8624 100644
--- a/nptl/sem_getvalue.c
+++ b/nptl/sem_getvalue.c
@@ -34,7 +34,7 @@  __new_sem_getvalue (sem_t *sem, int *sval)
      necessary, use a stronger MO here and elsewhere (e.g., potentially
      release MO in all places where we consume a token).  */
 
-#if __HAVE_64B_ATOMICS
+#if SEM_USE_64B_ATOMICS
   *sval = atomic_load_relaxed (&isem->data) & SEM_VALUE_MASK;
 #else
   *sval = atomic_load_relaxed (&isem->value) >> SEM_VALUE_SHIFT;
diff --git a/nptl/sem_init.c b/nptl/sem_init.c
index 575b661..4437d31 100644
--- a/nptl/sem_init.c
+++ b/nptl/sem_init.c
@@ -53,7 +53,7 @@  __new_sem_init (sem_t *sem, int pshared, unsigned int value)
   struct new_sem *isem = (struct new_sem *) sem;
 
   /* Use the values the caller provided.  */
-#if __HAVE_64B_ATOMICS
+#if SEM_USE_64B_ATOMICS
   isem->data = value;
 #else
   isem->value = value << SEM_VALUE_SHIFT;
diff --git a/nptl/sem_open.c b/nptl/sem_open.c
index bfd2dea..0524d9d 100644
--- a/nptl/sem_open.c
+++ b/nptl/sem_open.c
@@ -193,7 +193,7 @@  sem_open (const char *name, int oflag, ...)
 	struct new_sem newsem;
       } sem;
 
-#if __HAVE_64B_ATOMICS
+#if SEM_USE_64B_ATOMICS
       sem.newsem.data = value;
 #else
       sem.newsem.value = value << SEM_VALUE_SHIFT;
diff --git a/nptl/sem_post.c b/nptl/sem_post.c
index 6e495ed..9baa72b 100644
--- a/nptl/sem_post.c
+++ b/nptl/sem_post.c
@@ -59,7 +59,7 @@  __new_sem_post (sem_t *sem)
   struct new_sem *isem = (struct new_sem *) sem;
   int private = isem->private;
 
-#if __HAVE_64B_ATOMICS
+#if SEM_USE_64B_ATOMICS
   /* Add a token to the semaphore.  We use release MO to make sure that a
      thread acquiring this token synchronizes with us and other threads that
      added tokens before (the release sequence includes atomic RMW operations
diff --git a/nptl/sem_waitcommon.c b/nptl/sem_waitcommon.c
index 311e511..675edb5 100644
--- a/nptl/sem_waitcommon.c
+++ b/nptl/sem_waitcommon.c
@@ -175,7 +175,7 @@  futex_wake (unsigned int* futex, int processes_to_wake, int private)
    to previous glibc builds.  */
 static const int sem_assume_only_signals_cause_futex_EINTR = 1;
 
-#if !__HAVE_64B_ATOMICS
+#if !SEM_USE_64B_ATOMICS
 static void
 __sem_wait_32_finish (struct new_sem *sem);
 #endif
@@ -185,7 +185,7 @@  __sem_wait_cleanup (void *arg)
 {
   struct new_sem *sem = (struct new_sem *) arg;
 
-#if __HAVE_64B_ATOMICS
+#if SEM_USE_64B_ATOMICS
   /* Stop being registered as a waiter.  See below for MO.  */
   atomic_fetch_add_relaxed (&sem->data, -((uint64_t) 1 << SEM_NWAITERS_SHIFT));
 #else
@@ -204,7 +204,7 @@  do_futex_wait (struct new_sem *sem, const struct timespec *abstime)
 {
   int err;
 
-#if __HAVE_64B_ATOMICS
+#if SEM_USE_64B_ATOMICS
   err = futex_abstimed_wait ((unsigned int *) &sem->data + SEM_VALUE_OFFSET, 0,
 			     abstime, sem->private, true);
 #else
@@ -227,7 +227,7 @@  __new_sem_wait_fast (struct new_sem *sem, int definitive_result)
      synchronize memory); thus, relaxed MO is sufficient for the initial load
      and the failure path of the CAS.  If the weak CAS fails and we need a
      definitive result, retry.  */
-#if __HAVE_64B_ATOMICS
+#if SEM_USE_64B_ATOMICS
   uint64_t d = atomic_load_relaxed (&sem->data);
   do
     {
@@ -260,7 +260,7 @@  __new_sem_wait_slow (struct new_sem *sem, const struct timespec *abstime)
 {
   int err = 0;
 
-#if __HAVE_64B_ATOMICS
+#if SEM_USE_64B_ATOMICS
   /* Add a waiter.  Relaxed MO is sufficient because we can rely on the
      ordering provided by the RMW operations we use.  */
   uint64_t d = atomic_fetch_add_relaxed (&sem->data,
@@ -388,7 +388,7 @@  __new_sem_wait_slow (struct new_sem *sem, const struct timespec *abstime)
 	  /* If there is no token, wait.  */
 	  if ((v >> SEM_VALUE_SHIFT) == 0)
 	    {
-	      /* See __HAVE_64B_ATOMICS variant.  */
+	      /* See SEM_USE_64B_ATOMICS variant.  */
 	      err = do_futex_wait(sem, abstime);
 	      if (err == ETIMEDOUT ||
 		  (err == EINTR && sem_assume_only_signals_cause_futex_EINTR))
@@ -422,7 +422,7 @@  error:
 }
 
 /* Stop being a registered waiter (non-64b-atomics code only).  */
-#if !__HAVE_64B_ATOMICS
+#if !SEM_USE_64B_ATOMICS
 static void
 __sem_wait_32_finish (struct new_sem *sem)
 {
diff --git a/sysdeps/nptl/internaltypes.h b/sysdeps/nptl/internaltypes.h
index 8f5cfa4..916f652 100644
--- a/sysdeps/nptl/internaltypes.h
+++ b/sysdeps/nptl/internaltypes.h
@@ -143,7 +143,17 @@  struct pthread_key_struct
 /* Semaphore variable structure.  */
 struct new_sem
 {
-#if __HAVE_64B_ATOMICS
+/* We define SEM_USE_64B_ATOMICS to 1 iff we have 64b atomic operations and
+   are on an LP64 data model.  Our atomic operations expect to target
+   naturally aligned variables.  sem_t is aligned to long int, so it will
+   be 64b-aligned only if long int is a 64b-sized type and naturally
+   aligned.  */
+#if __HAVE_64B_ATOMICS && defined (__LP64__)
+# define SEM_USE_64B_ATOMICS 1
+#else
+# define SEM_USE_64B_ATOMICS 0
+#endif
+#if SEM_USE_64B_ATOMICS
   /* The data field holds both value (in the least-significant 32 bits) and
      nwaiters.  */
 # if __BYTE_ORDER == __LITTLE_ENDIAN