[v2] Remove atomic_compare_and_exchange_bool_rel.

Message ID 1466189884.26093.257.camel@localhost.localdomain
State Committed

Commit Message

Torvald Riegel June 17, 2016, 6:58 p.m. UTC
  On Fri, 2016-06-17 at 14:33 -0300, Tulio Magno Quites Machado Filho
wrote:
> Torvald Riegel <triegel@redhat.com> writes:
> 
> > Removing this operation and the matching (unused) catomic_ operation
> > seemed to be easier than fixing powerpc's definition of it, only for it
> > to be removed anyway in the future.  There were just three call sites of
> > it.
> 
> OK, but why can't we remove __arch_compare_and_exchange_bool_*_rel from
> powerpc[32|64] right now?
> AFAICS, they're all unused.

Right. I've removed them in the attached revised patch.

> For the record, microblaze seems to have a few unused lines as well.

Likewise.

Note that for powerpc specifically, use of the new C11-like
atomic_compare_exchange_weak_release will have acquire semantics too,
unnecessarily.  This patch introduces that effect only in the nscd
cache and in some cases for robust locks, which I wouldn't consider
performance-critical.  However, semaphores and barriers already use
atomic_compare_exchange_weak_release, where it might matter somewhat
more; that was the case even before this patch.

This issue will disappear as soon as you (can) set
USE_ATOMIC_COMPILER_BUILTINS to 1.  What's your plan for powerpc
regarding this?  Is the currently required GCC 4.7 sufficient, or what
is the first GCC version where this would be possible?
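
For reference, with USE_ATOMIC_COMPILER_BUILTINS set to 1,
include/atomic.h defines the C11-like operations directly on top of the
compiler builtins, roughly like this (a paraphrase, not the exact glibc
source, which also checks the operand size):

  # define atomic_compare_exchange_weak_release(mem, expected, desired)      \
    __atomic_compare_exchange_n ((mem), (expected), (desired), 1 /* weak */, \
                                 __ATOMIC_RELEASE, __ATOMIC_RELAXED)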
  

Comments

Tulio Magno Quites Machado Filho June 17, 2016, 10:17 p.m. UTC | #1
Torvald Riegel <triegel@redhat.com> writes:

> On Fri, 2016-06-17 at 14:33 -0300, Tulio Magno Quites Machado Filho
> wrote:
>> Torvald Riegel <triegel@redhat.com> writes:
>> 
>> > Removing this operation and the matching (unused) catomic_ operation
>> > seemed to be easier than fixing powerpc's definition of it, only for it
>> > to be removed anyway in the future.  There were just three call sites of
>> > it.
>> 
>> OK, but why can't we remove __arch_compare_and_exchange_bool_*_rel from
>> powerpc[32|64] right now?
>> AFAICS, they're all unused.
>
> Right. I've removed them in the attached revised patch.
>
>> For the record, microblaze seems to have a few unused lines as well.
>
> Likewise.
>
> Note that for powerpc specifically, use of the new C11-like
> atomic_compare_exchange_weak_release will have acquire semantics too,
> unnecessarily.

I'm probably missing something here.

When atomic_compare_exchange_weak_release isn't available and
USE_ATOMIC_COMPILER_BUILTINS = 0, include/atomic.h:697 will use
atomic_compare_and_exchange_val_rel, which is defined in
sysdeps/powerpc/atomic-machine.h and has release semantics.

Is there something else preventing this from happening?
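
For context, when USE_ATOMIC_COMPILER_BUILTINS is 0 and an arch does not
define atomic_compare_exchange_weak_release itself, include/atomic.h
synthesizes it from the value-returning release CAS, roughly like this
(a simplified paraphrase, not the exact glibc source; variable names are
illustrative):

  # define atomic_compare_exchange_weak_release(mem, expected, desired)      \
    ({ __typeof (*(expected)) __expected = *(expected);                      \
       *(expected)                                                           \
         = atomic_compare_and_exchange_val_rel (mem, desired, __expected);   \
       *(expected) == __expected; })

It returns true iff the exchange happened; on failure, the value observed
in *mem is stored back into *expected.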

> This issue will disappear as soon as you (can) set
> USE_ATOMIC_COMPILER_BUILTINS to 1.  What's your plan for powerpc
> regarding this?

I believe this is a long-term goal, but I don't think we should rush
this, both because of a bug in GCC [1] and because it would remove some
flexibility, such as what I implemented in the past [2], which is still
on my backlog, pending changes.

My next step is to replace part of the inline asm with compiler built-ins.

> Is the currently required GCC 4.7 sufficient, or what
> is the first GCC version where this would be possible?

From an API perspective, I think it's sufficient.
But we still have to deal with the performance issue from [1].

>     Remove atomic_compare_and_exchange_bool_rel.
>     
>     atomic_compare_and_exchange_bool_rel and
>     catomic_compare_and_exchange_bool_rel are removed and replaced with the
>     new C11-like atomic_compare_exchange_weak_release.  The concurrent code
>     in nscd/cache.c has not been reviewed yet, so this patch does not add
>     detailed comments.

Tested on ppc as well.

LGTM, but I also believe we should wait for Lei's confirmation.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66867
[2] http://patchwork.sourceware.org/patch/11307/
  
Torvald Riegel June 18, 2016, 9:22 a.m. UTC | #2
On Fri, 2016-06-17 at 19:17 -0300, Tulio Magno Quites Machado Filho
wrote:
> Torvald Riegel <triegel@redhat.com> writes:
> 
> > On Fri, 2016-06-17 at 14:33 -0300, Tulio Magno Quites Machado Filho
> > wrote:
> >> Torvald Riegel <triegel@redhat.com> writes:
> >> 
> >> > Removing this operation and the matching (unused) catomic_ operation
> >> > seemed to be easier than fixing powerpc's definition of it, only for it
> >> > to be removed anyway in the future.  There were just three call sites of
> >> > it.
> >> 
> >> OK, but why can't we remove __arch_compare_and_exchange_bool_*_rel from
> >> powerpc[32|64] right now?
> >> AFAICS, they're all unused.
> >
> > Right. I've removed them in the attached revised patch.
> >
> >> For the record, microblaze seems to have a few unused lines as well.
> >
> > Likewise.
> >
> > Note that for powerpc specifically, use of the new C11-like
> > atomic_compare_exchange_weak_release will have acquire semantics too,
> > unnecessarily.
> 
> I'm probably missing something here.
> 
> When atomic_compare_exchange_weak_release isn't available and
> USE_ATOMIC_COMPILER_BUILTINS = 0, include/atomic.h:697 will use
> atomic_compare_and_exchange_val_rel, which is defined in
> sysdeps/powerpc/atomic-machine.h and has release semantics.
> 
> Is there something else preventing this from happening?

No, you're right, sorry.  Late-in-the-day reply.  I mixed this up with
the atomic RMW ops with release MO that I've been adding for the new
condvar.

> > This issue will disappear as soon as you (can) set
> > USE_ATOMIC_COMPILER_BUILTINS to 1.  What's your plan for powerpc
> > regarding this?
> 
> I believe this is a long-term goal, but I don't think we should rush
> this, both because of a bug in GCC [1] and because it would remove some
> flexibility, such as what I implemented in the past [2], which is still
> on my backlog, pending changes.

I agree there's no real need to hurry.  All I want is for arch
maintainers to be aware of this and make the switch at the right time.

> My next step is to replace part of the inline asm with compiler built-ins.
> 
> > Is the currently required GCC 4.7 sufficient, or what
> > is the first GCC version where this would be possible?
> 
> From an API perspective, I think it's sufficient.
> But we still have to deal with the performance issue from [1].
> 
> >     Remove atomic_compare_and_exchange_bool_rel.
> >     
> >     atomic_compare_and_exchange_bool_rel and
> >     catomic_compare_and_exchange_bool_rel are removed and replaced with the
> >     new C11-like atomic_compare_exchange_weak_release.  The concurrent code
> >     in nscd/cache.c has not been reviewed yet, so this patch does not add
> >     detailed comments.
> 
> Tested on ppc as well.

Thanks.

> LGTM, but I also believe we should wait for Lei's confirmation.

Yes.
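
For reference, the conversion the patch applies follows one pattern
throughout; a simplified sketch (lock and SOME_MASK are placeholders,
see the pthread_mutex_unlock.c hunk below for the real code):

  /* Before: the bool CAS returns 0 on success, so the loop re-reads the
     location itself and retries while the CAS reports failure.  */
  int newval, oldval;
  do
    {
      oldval = *lock;
      newval = oldval & SOME_MASK;
    }
  while (atomic_compare_and_exchange_bool_rel (lock, newval, oldval));

  /* After: seed the expected value with a relaxed load; the weak CAS
     returns true on success and, on failure, writes the value it
     observed back into oldval, so the loop only recomputes newval.  */
  int oldval = atomic_load_relaxed (lock);
  int newval;
  do
    newval = oldval & SOME_MASK;
  while (!atomic_compare_exchange_weak_release (lock, &oldval, newval));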
  

Patch

commit a30bda25fc10db264d18dec154162e1223593d44
Author: Torvald Riegel <triegel@redhat.com>
Date:   Tue Jun 14 15:12:00 2016 +0200

    Remove atomic_compare_and_exchange_bool_rel.
    
    atomic_compare_and_exchange_bool_rel and
    catomic_compare_and_exchange_bool_rel are removed and replaced with the
    new C11-like atomic_compare_exchange_weak_release.  The concurrent code
    in nscd/cache.c has not been reviewed yet, so this patch does not add
    detailed comments.
    
    	* nscd/cache.c (cache_add): Use new C11-like atomic operation instead
    	of atomic_compare_and_exchange_bool_rel.
    	* nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_full): Likewise.
    	* include/atomic.h (atomic_compare_and_exchange_bool_rel,
    	catomic_compare_and_exchange_bool_rel): Remove.
    	* sysdeps/aarch64/atomic-machine.h
    	(atomic_compare_and_exchange_bool_rel): Likewise.
    	* sysdeps/alpha/atomic-machine.h
    	(atomic_compare_and_exchange_bool_rel): Likewise.
    	* sysdeps/arm/atomic-machine.h
    	(atomic_compare_and_exchange_bool_rel): Likewise.
    	* sysdeps/mips/atomic-machine.h
    	(atomic_compare_and_exchange_bool_rel): Likewise.
    	* sysdeps/tile/atomic-machine.h
    	(atomic_compare_and_exchange_bool_rel): Likewise.

diff --git a/include/atomic.h b/include/atomic.h
index 5e8bfff..ad3db25 100644
--- a/include/atomic.h
+++ b/include/atomic.h
@@ -159,23 +159,6 @@ 
 #endif
 
 
-#ifndef catomic_compare_and_exchange_bool_rel
-# ifndef atomic_compare_and_exchange_bool_rel
-#  define catomic_compare_and_exchange_bool_rel(mem, newval, oldval)	      \
-  catomic_compare_and_exchange_bool_acq (mem, newval, oldval)
-# else
-#  define catomic_compare_and_exchange_bool_rel(mem, newval, oldval)	      \
-  atomic_compare_and_exchange_bool_rel (mem, newval, oldval)
-# endif
-#endif
-
-
-#ifndef atomic_compare_and_exchange_bool_rel
-# define atomic_compare_and_exchange_bool_rel(mem, newval, oldval) \
-  atomic_compare_and_exchange_bool_acq (mem, newval, oldval)
-#endif
-
-
 /* Store NEWVALUE in *MEM and return the old value.  */
 #ifndef atomic_exchange_acq
 # define atomic_exchange_acq(mem, newvalue) \
diff --git a/nptl/pthread_mutex_unlock.c b/nptl/pthread_mutex_unlock.c
index 334ce38..82aaa95 100644
--- a/nptl/pthread_mutex_unlock.c
+++ b/nptl/pthread_mutex_unlock.c
@@ -237,15 +237,24 @@  __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr)
       int private = (robust
 		     ? PTHREAD_ROBUST_MUTEX_PSHARED (mutex)
 		     : PTHREAD_MUTEX_PSHARED (mutex));
-      if ((mutex->__data.__lock & FUTEX_WAITERS) != 0
-	  || atomic_compare_and_exchange_bool_rel (&mutex->__data.__lock, 0,
-						   THREAD_GETMEM (THREAD_SELF,
-								  tid)))
+      /* Unlock the mutex using a CAS unless there are futex waiters or our
+	 TID is not the value of __lock anymore, in which case we let the
+	 kernel take care of the situation.  Use release MO in the CAS to
+	 synchronize with acquire MO in lock acquisitions.  */
+      int l = atomic_load_relaxed (&mutex->__data.__lock);
+      do
 	{
-	  INTERNAL_SYSCALL_DECL (__err);
-	  INTERNAL_SYSCALL (futex, __err, 2, &mutex->__data.__lock,
-			    __lll_private_flag (FUTEX_UNLOCK_PI, private));
+	  if (((l & FUTEX_WAITERS) != 0)
+	      || (l != THREAD_GETMEM (THREAD_SELF, tid)))
+	    {
+	      INTERNAL_SYSCALL_DECL (__err);
+	      INTERNAL_SYSCALL (futex, __err, 2, &mutex->__data.__lock,
+				__lll_private_flag (FUTEX_UNLOCK_PI, private));
+	      break;
+	    }
 	}
+      while (!atomic_compare_exchange_weak_release (&mutex->__data.__lock,
+						    &l, 0));
 
       THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
       break;
@@ -278,15 +287,16 @@  __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr)
 	/* One less user.  */
 	--mutex->__data.__nusers;
 
-      /* Unlock.  */
-      int newval, oldval;
+      /* Unlock.  Use release MO in the CAS to synchronize with acquire MO in
+	 lock acquisitions.  */
+      int newval;
+      int oldval = atomic_load_relaxed (&mutex->__data.__lock);
       do
 	{
-	  oldval = mutex->__data.__lock;
 	  newval = oldval & PTHREAD_MUTEX_PRIO_CEILING_MASK;
 	}
-      while (atomic_compare_and_exchange_bool_rel (&mutex->__data.__lock,
-						   newval, oldval));
+      while (!atomic_compare_exchange_weak_release (&mutex->__data.__lock,
+						    &oldval, newval));
 
       if ((oldval & ~PTHREAD_MUTEX_PRIO_CEILING_MASK) > 1)
 	lll_futex_wake (&mutex->__data.__lock, 1,
diff --git a/nscd/cache.c b/nscd/cache.c
index 3021abd..daa0b2b 100644
--- a/nscd/cache.c
+++ b/nscd/cache.c
@@ -178,12 +178,12 @@  cache_add (int type, const void *key, size_t len, struct datahead *packet,
   assert ((newp->packet & BLOCK_ALIGN_M1) == 0);
 
   /* Put the new entry in the first position.  */
-  do
-    newp->next = table->head->array[hash];
-  while (atomic_compare_and_exchange_bool_rel (&table->head->array[hash],
-					       (ref_t) ((char *) newp
-							- table->data),
-					       (ref_t) newp->next));
+  /* TODO Review concurrency.  Use atomic_exchange_release.  */
+  newp->next = atomic_load_relaxed (&table->head->array[hash]);
+  while (!atomic_compare_exchange_weak_release (&table->head->array[hash],
+						(ref_t *) &newp->next,
+						(ref_t) ((char *) newp
+							 - table->data)));
 
   /* Update the statistics.  */
   if (packet->notfound)
diff --git a/sysdeps/aarch64/atomic-machine.h b/sysdeps/aarch64/atomic-machine.h
index 28c80dc..6708b9b 100644
--- a/sysdeps/aarch64/atomic-machine.h
+++ b/sysdeps/aarch64/atomic-machine.h
@@ -115,10 +115,6 @@  typedef uintmax_t uatomic_max_t;
 
 /* Compare and exchange with "release" semantics, ie barrier before.  */
 
-# define atomic_compare_and_exchange_bool_rel(mem, new, old)	\
-  __atomic_bool_bysize (__arch_compare_and_exchange_bool, int,	\
-			mem, new, old, __ATOMIC_RELEASE)
-
 # define atomic_compare_and_exchange_val_rel(mem, new, old)	 \
   __atomic_val_bysize (__arch_compare_and_exchange_val, int,    \
                        mem, new, old, __ATOMIC_RELEASE)
diff --git a/sysdeps/alpha/atomic-machine.h b/sysdeps/alpha/atomic-machine.h
index d96cb7a..882d800 100644
--- a/sysdeps/alpha/atomic-machine.h
+++ b/sysdeps/alpha/atomic-machine.h
@@ -210,10 +210,6 @@  typedef uintmax_t uatomic_max_t;
 
 /* Compare and exchange with "release" semantics, ie barrier before.  */
 
-#define atomic_compare_and_exchange_bool_rel(mem, new, old)	\
-  __atomic_bool_bysize (__arch_compare_and_exchange_bool, int,	\
-		        mem, new, old, __MB, "")
-
 #define atomic_compare_and_exchange_val_rel(mem, new, old)	\
   __atomic_val_bysize (__arch_compare_and_exchange_val, int,	\
 		       mem, new, old, __MB, "")
diff --git a/sysdeps/arm/atomic-machine.h b/sysdeps/arm/atomic-machine.h
index dd5e714..916c09a 100644
--- a/sysdeps/arm/atomic-machine.h
+++ b/sysdeps/arm/atomic-machine.h
@@ -87,10 +87,6 @@  void __arm_link_error (void);
 
 /* Compare and exchange with "release" semantics, ie barrier before.  */
 
-# define atomic_compare_and_exchange_bool_rel(mem, new, old)    \
-  __atomic_bool_bysize (__arch_compare_and_exchange_bool, int,  \
-                        mem, new, old, __ATOMIC_RELEASE)
-
 # define atomic_compare_and_exchange_val_rel(mem, new, old)      \
   __atomic_val_bysize (__arch_compare_and_exchange_val, int,    \
                        mem, new, old, __ATOMIC_RELEASE)
diff --git a/sysdeps/microblaze/atomic-machine.h b/sysdeps/microblaze/atomic-machine.h
index af7acac..229fd49 100644
--- a/sysdeps/microblaze/atomic-machine.h
+++ b/sysdeps/microblaze/atomic-machine.h
@@ -47,12 +47,6 @@  typedef uintmax_t uatomic_max_t;
 #define __arch_compare_and_exchange_bool_16_acq(mem, newval, oldval)           \
   (abort (), 0)
 
-#define __arch_compare_and_exchange_bool_8_rel(mem, newval, oldval)            \
-  (abort (), 0)
-
-#define __arch_compare_and_exchange_bool_16_rel(mem, newval, oldval)           \
-  (abort (), 0)
-
 #define __arch_compare_and_exchange_val_32_acq(mem, newval, oldval)            \
   ({                                                                           \
       __typeof (*(mem)) __tmp;                                                 \
diff --git a/sysdeps/mips/atomic-machine.h b/sysdeps/mips/atomic-machine.h
index a60e4fb..771166b 100644
--- a/sysdeps/mips/atomic-machine.h
+++ b/sysdeps/mips/atomic-machine.h
@@ -149,10 +149,6 @@  typedef uintmax_t uatomic_max_t;
 
 /* Compare and exchange with "release" semantics, ie barrier before.  */
 
-# define atomic_compare_and_exchange_bool_rel(mem, new, old)	\
-  __atomic_bool_bysize (__arch_compare_and_exchange_bool, int,	\
-			mem, new, old, __ATOMIC_RELEASE)
-
 # define atomic_compare_and_exchange_val_rel(mem, new, old)	 \
   __atomic_val_bysize (__arch_compare_and_exchange_val, int,    \
                        mem, new, old, __ATOMIC_RELEASE)
@@ -330,10 +326,6 @@  typedef uintmax_t uatomic_max_t;
 
 /* Compare and exchange with "release" semantics, ie barrier before.  */
 
-# define atomic_compare_and_exchange_bool_rel(mem, new, old)	\
-  __atomic_bool_bysize (__arch_compare_and_exchange_bool, int,	\
-		        mem, new, old, MIPS_SYNC_STR, "")
-
 # define atomic_compare_and_exchange_val_rel(mem, new, old)	\
   __atomic_val_bysize (__arch_compare_and_exchange_val, int,	\
 		       mem, new, old, MIPS_SYNC_STR, "")
diff --git a/sysdeps/powerpc/atomic-machine.h b/sysdeps/powerpc/atomic-machine.h
index 8b0e1e7..c6b9eea 100644
--- a/sysdeps/powerpc/atomic-machine.h
+++ b/sysdeps/powerpc/atomic-machine.h
@@ -53,12 +53,6 @@  typedef uintmax_t uatomic_max_t;
 #define __arch_compare_and_exchange_bool_16_acq(mem, newval, oldval) \
   (abort (), 0)
 
-#define __arch_compare_and_exchange_bool_8_rel(mem, newval, oldval) \
-  (abort (), 0)
-
-#define __arch_compare_and_exchange_bool_16_rel(mem, newval, oldval) \
-  (abort (), 0)
-
 #ifdef UP
 # define __ARCH_ACQ_INSTR	""
 # define __ARCH_REL_INSTR	""
diff --git a/sysdeps/powerpc/powerpc32/atomic-machine.h b/sysdeps/powerpc/powerpc32/atomic-machine.h
index 1d407b3..40a8b7b 100644
--- a/sysdeps/powerpc/powerpc32/atomic-machine.h
+++ b/sysdeps/powerpc/powerpc32/atomic-machine.h
@@ -58,22 +58,6 @@ 
   __tmp != 0;								      \
 })
 
-#define __arch_compare_and_exchange_bool_32_rel(mem, newval, oldval)	      \
-({									      \
-  unsigned int __tmp;							      \
-  __asm __volatile (__ARCH_REL_INSTR "\n"				      \
-		    "1:	lwarx	%0,0,%1" MUTEX_HINT_REL "\n"		      \
-		    "	subf.	%0,%2,%0\n"				      \
-		    "	bne	2f\n"					      \
-		    "	stwcx.	%3,0,%1\n"				      \
-		    "	bne-	1b\n"					      \
-		    "2:	"						      \
-		    : "=&r" (__tmp)					      \
-		    : "b" (mem), "r" (oldval), "r" (newval)		      \
-		    : "cr0", "memory");					      \
-  __tmp != 0;								      \
-})
-
 /* Powerpc32 processors don't implement the 64-bit (doubleword) forms of
    load and reserve (ldarx) and store conditional (stdcx.) instructions.
    So for powerpc32 we stub out the 64-bit forms.  */
@@ -83,9 +67,6 @@ 
 #define __arch_compare_and_exchange_val_64_acq(mem, newval, oldval) \
   (abort (), (__typeof (*mem)) 0)
 
-#define __arch_compare_and_exchange_bool_64_rel(mem, newval, oldval) \
-  (abort (), 0)
-
 #define __arch_compare_and_exchange_val_64_rel(mem, newval, oldval) \
   (abort (), (__typeof (*mem)) 0)
 
diff --git a/sysdeps/powerpc/powerpc64/atomic-machine.h b/sysdeps/powerpc/powerpc64/atomic-machine.h
index 751487a..7971318 100644
--- a/sysdeps/powerpc/powerpc64/atomic-machine.h
+++ b/sysdeps/powerpc/powerpc64/atomic-machine.h
@@ -58,23 +58,6 @@ 
   __tmp != 0;								      \
 })
 
-#define __arch_compare_and_exchange_bool_32_rel(mem, newval, oldval) \
-({									      \
-  unsigned int __tmp, __tmp2;						      \
-  __asm __volatile (__ARCH_REL_INSTR "\n"				      \
-		    "   clrldi  %1,%1,32\n"				      \
-		    "1:	lwarx	%0,0,%2" MUTEX_HINT_REL "\n"		      \
-		    "	subf.	%0,%1,%0\n"				      \
-		    "	bne	2f\n"					      \
-		    "	stwcx.	%4,0,%2\n"				      \
-		    "	bne-	1b\n"					      \
-		    "2:	"						      \
-		    : "=&r" (__tmp), "=r" (__tmp2)			      \
-		    : "b" (mem), "1" (oldval), "r" (newval)		      \
-		    : "cr0", "memory");					      \
-  __tmp != 0;								      \
-})
-
 /*
  * Only powerpc64 processors support Load doubleword and reserve index (ldarx)
  * and Store doubleword conditional indexed (stdcx) instructions.  So here
@@ -96,22 +79,6 @@ 
   __tmp != 0;								      \
 })
 
-#define __arch_compare_and_exchange_bool_64_rel(mem, newval, oldval) \
-({									      \
-  unsigned long	__tmp;							      \
-  __asm __volatile (__ARCH_REL_INSTR "\n"				      \
-		    "1:	ldarx	%0,0,%1" MUTEX_HINT_REL "\n"		      \
-		    "	subf.	%0,%2,%0\n"				      \
-		    "	bne	2f\n"					      \
-		    "	stdcx.	%3,0,%1\n"				      \
-		    "	bne-	1b\n"					      \
-		    "2:	"						      \
-		    : "=&r" (__tmp)					      \
-		    : "b" (mem), "r" (oldval), "r" (newval)		      \
-		    : "cr0", "memory");					      \
-  __tmp != 0;								      \
-})
-
 #define __arch_compare_and_exchange_val_64_acq(mem, newval, oldval) \
   ({									      \
       __typeof (*(mem)) __tmp;						      \
diff --git a/sysdeps/tile/atomic-machine.h b/sysdeps/tile/atomic-machine.h
index 651e0d9..336518c 100644
--- a/sysdeps/tile/atomic-machine.h
+++ b/sysdeps/tile/atomic-machine.h
@@ -55,11 +55,6 @@  typedef uintmax_t uatomic_max_t;
     atomic_full_barrier ();                                     \
     atomic_compare_and_exchange_val_acq ((mem), (n), (o));      \
   })
-#define atomic_compare_and_exchange_bool_rel(mem, n, o)         \
-  ({                                                            \
-    atomic_full_barrier ();                                     \
-    atomic_compare_and_exchange_bool_acq ((mem), (n), (o));     \
-  })
 #define atomic_exchange_rel(mem, n)                             \
   ({                                                            \
     atomic_full_barrier ();                                     \