[roland/lll_robust_trylock] Get rid of lll_robust_trylock.

Message ID 20140709230011.B9DB42C39AC@topped-with-meat.com
State Committed

Commit Message

Roland McGrath July 9, 2014, 11 p.m. UTC
  As discussed in the "[PATCH roland/lll-futex]" thread, every definition of
lll_robust_trylock is either the same or has a bug.  It has exactly one
user, and that user already presumes exactly what it must do.  So it's
better to just get rid of it.
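
For reference, here is a condensed sketch of the two shapes the definitions
took across the sysdeps headers (paraphrased from the hunks removed below;
the value-returning form is what the sole caller actually relies on):

  /* Value-returning form (m68k generic code; the i386, x86_64, and
     powerpc asm behaves the same way): yields the old lock value,
     which the caller inspects.  */
  #define lll_robust_trylock(lock, id) \
    atomic_compare_and_exchange_val_acq (&(lock), id, 0)

  /* Boolean form (aarch64, alpha, arm, hppa, ia64, microblaze, mips,
     s390, sh, sparc, tile): collapses the old value to 0/1,
     discarding the bits the caller tests for.  */
  #define lll_robust_trylock(lock, id) \
    (atomic_compare_and_exchange_val_acq (&(lock), id, 0) != 0)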

On x86 (GCC 4.6.3), this affected the generated code nontrivially, but only
in good ways.  The compiler made some different register allocation and
scheduling decisions.  It also noticed an opportunity to fuse redundant
code that it had overlooked before.  I'm not really sure why it couldn't
see it before.  My only guess is that it made some pessimistic assumption
before when it was dealing with an asm (the old lll_robust_trylock) and now
it groks more because atomic_compare_and_exchange_val_acq is actually just
__sync_val_compare_and_swap rather than hand-coded asm.
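
As a standalone illustration of that difference (hypothetical code, not
from the tree; cas_via_asm and cas_via_builtin are made-up names): the old
asm is an opaque block whose "memory" clobber forces worst-case
assumptions, while the builtin is an operation GCC understands and can
schedule and allocate registers around:

  /* Hypothetical example, modeled on the removed i386/x86_64 macro.
     GCC must treat the asm as a black box that may read or write any
     memory.  */
  static inline int
  cas_via_asm (int *futex, int id)
  {
    int ret;
    __asm __volatile ("lock; cmpxchgl %2, %1"
                      : "=a" (ret), "=m" (*futex)
                      : "r" (id), "m" (*futex), "0" (0)
                      : "memory");
    return ret;
  }

  /* Same atomic semantics, but expressed as a builtin whose effects
     GCC knows exactly.  */
  static inline int
  cas_via_builtin (int *futex, int id)
  {
    return __sync_val_compare_and_swap (futex, 0, id);
  }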

On the machines that had the "!= 0" comparison (aarch64, alpha, arm, hppa,
ia64, microblaze, mips, s390, sh, sparc, and tile) this should change the
generated code slightly, and that change should fix a latent performance bug.
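
Concretely, the sole caller (see the nptl/pthread_mutex_trylock.c hunk
below) tests the returned value like this; a 0/1 boolean result can never
expose the FUTEX_OWNER_DIED bit, so this check could never fire on those
machines:

  /* Condensed from the pthread_mutex_trylock.c hunk below.  With the
     boolean lll_robust_trylock, oldval is only ever 0 or 1, so
     (oldval & FUTEX_OWNER_DIED) is always 0.  */
  oldval = atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,
                                                id, 0);
  if (oldval != 0 && (oldval & FUTEX_OWNER_DIED) == 0)
    {
      THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
      return EBUSY;
    }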

I'm not going to wait for every machine maintainer to chime in.  But I do
want some review, both that the new logic is right on the machines above
where the actual logic is intentionally changing, and that the x86_64/i686
changes to the generated code do only neutral or good things compared to
the old assembly code (which I think already had the right logic, so no
logic should change there).


Thanks,
Roland


2014-07-09  Roland McGrath  <roland@hack.frob.com>

	* nptl/pthread_mutex_trylock.c (__pthread_mutex_trylock):
	Use atomic_compare_and_exchange_val_acq directly rather than
	lll_robust_trylock.
	* sysdeps/unix/sysv/linux/aarch64/lowlevellock.h
	(__lll_robust_trylock, lll_robust_trylock): Removed.
	* sysdeps/unix/sysv/linux/alpha/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/arm/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/hppa/nptl/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/i386/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/ia64/nptl/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/m68k/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/microblaze/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/mips/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/s390/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/sh/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/sparc/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/tile/lowlevellock.h: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/lowlevellock.h: Likewise.
  

Comments

Ondrej Bilka July 10, 2014, 12:55 p.m. UTC | #1
On Wed, Jul 09, 2014 at 04:00:11PM -0700, Roland McGrath wrote:
> As discussed in the "[PATCH roland/lll-futex]" thread, every definition of
> lll_robust_trylock is either the same or has a bug.  It has exactly one
> user, and that user already presumes exactly what it must do.  So it's
> better to just get rid of it.
> 
> On x86 (GCC 4.6.3), this affected the generated code nontrivially, but only
> in good ways.  The compiler made some different register allocation and
> scheduling decisions.  It also noticed an opportunity to fuse redundant
> code that it had overlooked before.  I'm not really sure why it couldn't
> see it before.  My only guess is that it made some pessimistic assumption
> before when it was dealing with an asm (the old lll_robust_trylock) and now
> it groks more because atomic_compare_and_exchange_val_acq is actually just
> __sync_val_compare_and_swap rather than hand-coded asm.
> 
> On the machines that had the "!= 0" comparison (aarch64, alpha, arm, hppa,
> ia64, microblaze, mips, s390, sh, sparc, and tile) this should change the
> generated code slightly, and that change should fix a latent performance bug.
> 
> I'm not going to wait for every machine maintainer to chime in.  But I do
> want some review, both that the new logic is right on the machines above
> where the actual logic is intentionally changing, and that the x86_64/i686
> changes to the generated code do only neutral or good things compared to
> the old assembly code (which I think already had the right logic, so no
> logic should change there).
> 
Yes, x86_64 looks OK.  The difference is because the compiler now has more
information: with the asm before, it had to assume that the asm could
overwrite memory in an arbitrary way, so it could not move loads across it.
  

Patch

--- a/nptl/pthread_mutex_trylock.c
+++ b/nptl/pthread_mutex_trylock.c
@@ -159,7 +159,8 @@  __pthread_mutex_trylock (mutex)
 		}
 	    }
 
-	  oldval = lll_robust_trylock (mutex->__data.__lock, id);
+	  oldval = atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,
+							id, 0);
 	  if (oldval != 0 && (oldval & FUTEX_OWNER_DIED) == 0)
 	    {
 	      THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
--- a/sysdeps/unix/sysv/linux/aarch64/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/aarch64/lowlevellock.h
@@ -180,11 +180,6 @@ 
 #define lll_cond_trylock(lock)	\
   atomic_compare_and_exchange_val_acq(&(lock), 2, 0)
 
-#define __lll_robust_trylock(futex, id) \
-  (atomic_compare_and_exchange_val_acq (futex, id, 0) != 0)
-#define lll_robust_trylock(lock, id) \
-  __lll_robust_trylock (&(lock), id)
-
 extern void __lll_lock_wait_private (int *futex) attribute_hidden;
 extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
 extern int __lll_robust_lock_wait (int *futex, int private) attribute_hidden;
--- a/sysdeps/unix/sysv/linux/alpha/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/alpha/lowlevellock.h
@@ -187,14 +187,6 @@  __lll_cond_trylock(int *futex)
 #define lll_cond_trylock(lock)	__lll_cond_trylock (&(lock))
 
 
-static inline int __attribute__((always_inline))
-__lll_robust_trylock(int *futex, int id)
-{
-  return atomic_compare_and_exchange_val_acq (futex, id, 0) != 0;
-}
-#define lll_robust_trylock(lock, id) \
-  __lll_robust_trylock (&(lock), id)
-
 extern void __lll_lock_wait_private (int *futex) attribute_hidden;
 extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
 extern int __lll_robust_lock_wait (int *futex, int private) attribute_hidden;
--- a/sysdeps/unix/sysv/linux/arm/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/arm/lowlevellock.h
@@ -176,11 +176,6 @@ 
 #define lll_cond_trylock(lock)	\
   atomic_compare_and_exchange_val_acq(&(lock), 2, 0)
 
-#define __lll_robust_trylock(futex, id) \
-  (atomic_compare_and_exchange_val_acq (futex, id, 0) != 0)
-#define lll_robust_trylock(lock, id) \
-  __lll_robust_trylock (&(lock), id)
-
 extern void __lll_lock_wait_private (int *futex) attribute_hidden;
 extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
 extern int __lll_robust_lock_wait (int *futex, int private) attribute_hidden;
--- a/sysdeps/unix/sysv/linux/hppa/nptl/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/hppa/nptl/lowlevellock.h
@@ -195,15 +195,6 @@  typedef int lll_lock_t;
 
 static inline int
 __attribute__ ((always_inline))
-__lll_robust_trylock (int *futex, int id)
-{
-  return atomic_compare_and_exchange_val_acq (futex, id, 0) != 0;
-}
-#define lll_robust_trylock(futex, id) \
-  __lll_robust_trylock (&(futex), id)
-
-static inline int
-__attribute__ ((always_inline))
 __lll_cond_trylock (int *futex)
 {
   return atomic_compare_and_exchange_val_acq (futex, 2, 0) != 0;
--- a/sysdeps/unix/sysv/linux/i386/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/i386/lowlevellock.h
@@ -188,15 +188,6 @@ 
 		       : "memory");					      \
      ret; })
 
-#define lll_robust_trylock(futex, id) \
-  ({ int ret;								      \
-     __asm __volatile (LOCK_INSTR "cmpxchgl %2, %1"			      \
-		       : "=a" (ret), "=m" (futex)			      \
-		       : "r" (id), "m" (futex),				      \
-			 "0" (LLL_LOCK_INITIALIZER)			      \
-		       : "memory");					      \
-     ret; })
-
 
 #define lll_cond_trylock(futex) \
   ({ int ret;								      \
--- a/sysdeps/unix/sysv/linux/ia64/nptl/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/ia64/nptl/lowlevellock.h
@@ -169,12 +169,6 @@  while (0)
 #define lll_trylock(futex) __lll_trylock (&(futex))
 
 
-#define __lll_robust_trylock(futex, id) \
-  (atomic_compare_and_exchange_val_acq (futex, id, 0) != 0)
-#define lll_robust_trylock(futex, id) \
-  __lll_robust_trylock (&(futex), id)
-
-
 #define __lll_cond_trylock(futex) \
   (atomic_compare_and_exchange_val_acq (futex, 2, 0) != 0)
 #define lll_cond_trylock(futex) __lll_cond_trylock (&(futex))
--- a/sysdeps/unix/sysv/linux/m68k/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/m68k/lowlevellock.h
@@ -177,9 +177,6 @@ 
 #define lll_cond_trylock(lock)				\
   atomic_compare_and_exchange_val_acq (&(lock), 2, 0)
 
-#define lll_robust_trylock(lock, id)			\
-  atomic_compare_and_exchange_val_acq (&(lock), id, 0)
-
 extern void __lll_lock_wait_private (int *futex) attribute_hidden;
 extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
 extern int __lll_robust_lock_wait (int *futex, int private) attribute_hidden;
--- a/sysdeps/unix/sysv/linux/microblaze/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/microblaze/lowlevellock.h
@@ -180,11 +180,6 @@ 
 #define lll_cond_trylock(lock)                                                 \
   atomic_compare_and_exchange_val_acq(&(lock), 2, 0)
 
-#define __lll_robust_trylock(futex, id)                                        \
-  (atomic_compare_and_exchange_val_acq (futex, id, 0) != 0)
-#define lll_robust_trylock(lock, id)                                           \
-  __lll_robust_trylock (&(lock), id)
-
 extern void __lll_lock_wait_private (int *futex) attribute_hidden;
 extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
 extern int __lll_robust_lock_wait (int *futex, int private) attribute_hidden;
--- a/sysdeps/unix/sysv/linux/mips/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/mips/lowlevellock.h
@@ -187,14 +187,6 @@  __lll_cond_trylock(int *futex)
 #define lll_cond_trylock(lock)	__lll_cond_trylock (&(lock))
 
 
-static inline int __attribute__((always_inline))
-__lll_robust_trylock(int *futex, int id)
-{
-  return atomic_compare_and_exchange_val_acq (futex, id, 0) != 0;
-}
-#define lll_robust_trylock(lock, id) \
-  __lll_robust_trylock (&(lock), id)
-
 extern void __lll_lock_wait_private (int *futex) attribute_hidden;
 extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
 extern int __lll_robust_lock_wait (int *futex, int private) attribute_hidden;
--- a/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/powerpc/lowlevellock.h
@@ -202,23 +202,6 @@ 
 # endif
 #endif
 
-/* Set *futex to ID if it is 0, atomically.  Returns the old value */
-#define __lll_robust_trylock(futex, id) \
-  ({ int __val;								      \
-     __asm __volatile ("1:	lwarx	%0,0,%2" MUTEX_HINT_ACQ "\n"	      \
-		       "	cmpwi	0,%0,0\n"			      \
-		       "	bne	2f\n"				      \
-		       "	stwcx.	%3,0,%2\n"			      \
-		       "	bne-	1b\n"				      \
-		       "2:	" __lll_acq_instr			      \
-		       : "=&r" (__val), "=m" (*futex)			      \
-		       : "r" (futex), "r" (id), "m" (*futex)		      \
-		       : "cr0", "memory");				      \
-     __val;								      \
-  })
-
-#define lll_robust_trylock(lock, id) __lll_robust_trylock (&(lock), id)
-
 /* Set *futex to 1 if it is 0, atomically.  Returns the old value */
 #define __lll_trylock(futex) __lll_robust_trylock (futex, 1)
 
--- a/sysdeps/unix/sysv/linux/s390/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/s390/lowlevellock.h
@@ -210,21 +210,6 @@  __lll_cond_trylock (int *futex)
 #define lll_cond_trylock(futex) __lll_cond_trylock (&(futex))
 
 
-static inline int
-__attribute__ ((always_inline))
-__lll_robust_trylock (int *futex, int id)
-{
-    unsigned int old;
-
-    __asm __volatile ("cs %0,%3,%1"
-		       : "=d" (old), "=Q" (*futex)
-		       : "0" (0), "d" (id), "m" (*futex) : "cc", "memory" );
-    return old != 0;
-}
-#define lll_robust_trylock(futex, id) \
-  __lll_robust_trylock (&(futex), id)
-
-
 extern void __lll_lock_wait_private (int *futex) attribute_hidden;
 extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
 extern int __lll_robust_lock_wait (int *futex, int private) attribute_hidden;
--- a/sysdeps/unix/sysv/linux/sh/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/sh/lowlevellock.h
@@ -118,28 +118,6 @@  extern int __lll_unlock_wake (int *__futex, int private) attribute_hidden;
 	: "r0", "r1", "r2", "t", "memory"); \
      __result; })
 
-#define lll_robust_trylock(futex, id)	\
-  ({ unsigned char __result; \
-     __asm __volatile ("\
-	.align 2\n\
-	mova 1f,r0\n\
-	nop\n\
-	mov r15,r1\n\
-	mov #-8,r15\n\
-     0: mov.l @%1,r2\n\
-	cmp/eq r2,%3\n\
-	bf 1f\n\
-	mov.l %2,@%1\n\
-     1: mov r1,r15\n\
-	mov #-1,%0\n\
-	negc %0,%0"\
-	: "=r" (__result) \
-	: "r" (&(futex)), \
-	  "r" (id), \
-	  "r" (LLL_LOCK_INITIALIZER) \
-	: "r0", "r1", "r2", "t", "memory"); \
-     __result; })
-
 #define lll_cond_trylock(futex) \
   ({ unsigned char __result; \
      __asm __volatile ("\
--- a/sysdeps/unix/sysv/linux/sparc/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/sparc/lowlevellock.h
@@ -203,15 +203,6 @@  __lll_cond_trylock (int *futex)
 }
 #define lll_cond_trylock(futex) __lll_cond_trylock (&(futex))
 
-static inline int
-__attribute__ ((always_inline))
-__lll_robust_trylock (int *futex, int id)
-{
-  return atomic_compare_and_exchange_val_acq (futex, id, 0) != 0;
-}
-#define lll_robust_trylock(futex, id) \
-  __lll_robust_trylock (&(futex), id)
-
 
 extern void __lll_lock_wait_private (int *futex) attribute_hidden;
 extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
--- a/sysdeps/unix/sysv/linux/tile/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/tile/lowlevellock.h
@@ -186,14 +186,6 @@  __lll_cond_trylock (int *futex)
 #define lll_cond_trylock(lock)	__lll_cond_trylock (&(lock))
 
 
-static inline int __attribute__ ((always_inline))
-__lll_robust_trylock (int *futex, int id)
-{
-  return atomic_compare_and_exchange_val_acq (futex, id, 0) != 0;
-}
-#define lll_robust_trylock(lock, id) \
-  __lll_robust_trylock (&(lock), id)
-
 extern void __lll_lock_wait_private (int *futex) attribute_hidden;
 extern void __lll_lock_wait (int *futex, int private) attribute_hidden;
 extern int __lll_robust_lock_wait (int *futex, int private) attribute_hidden;
--- a/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h
+++ b/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h
@@ -165,14 +165,6 @@ 
 		       : "memory");					      \
      ret; })
 
-#define lll_robust_trylock(futex, id) \
-  ({ int ret;								      \
-     __asm __volatile (LOCK_INSTR "cmpxchgl %2, %1"			      \
-		       : "=a" (ret), "=m" (futex)			      \
-		       : "r" (id), "m" (futex),	"0" (LLL_LOCK_INITIALIZER)    \
-		       : "memory");					      \
-     ret; })
-
 #define lll_cond_trylock(futex) \
   ({ int ret;								      \
      __asm __volatile (LOCK_INSTR "cmpxchgl %2, %1"			      \