[2/2] S390: Use generic spinlock code.

Message ID c7954f69-04a0-47f1-7c79-3866d15cfe09@linux.vnet.ibm.com
State Superseded

Commit Message

Stefan Liebler Feb. 15, 2017, 4:26 p.m. UTC
  On 02/13/2017 09:39 PM, Torvald Riegel wrote:
> On Wed, 2017-02-08 at 15:49 +0100, Stefan Liebler wrote:
>> This is an updated version of the patch, which adjusts the s390 specific
>> atomic-macros in the same way as in include/atomic.h.
>> Thus passing a volatile int pointer is fine, too.
>
> The general direction of this is okay.
> Some of my comments for patch 1/2 apply here as well (eg, volatile vs.
> atomics).
>
See answer in patch 1/2.

> What I don't like is the choice of 1000 for
> SPIN_LOCK_READS_BETWEEN_CMPXCHG.  Have you run benchmarks to come up
> with this value, or is it a guess?  Why isn't it documented how you end
> up with this number?
> We can keep these with a choice such as this, but then we need to have a
> FIXME comment in the code, explaining that this is just an arbitrary
> choice.
>
> I would guess that just spinning forever is sufficient, and that we
> don't need to throw in a CAS every now and then; using randomized
> exponential back-off might be more important.  This is something that we
> would be in a better position to answer if you'd provide a
> microbenchmark for this choice too.
> At the end of 2016, I've posted a draft of a microbenchmark for rwlocks.
> Maybe you can use this as a start and run a few experiments?
 >
I've run my own benchmarks in the same manner as the microbenchmark
for rwlocks you mentioned below.
You are right, I can't see a real difference between
#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
and
#define SPIN_LOCK_READS_BETWEEN_CMPXCHG -1

As it does not hurt, I prefer to use a CAS every 1000 plain reads.
A CAS is not necessary on current CPUs, but from an architecture
perspective it is more correct to have such a serialization
instruction in the loop.

There is a difference between
#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 0
and one of the others.

The same applies to
#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 1
It does not hurt if the lock is free, but there is a difference if the 
lock is already acquired and trylock is called often.
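
A minimal sketch of the trylock path, again with hypothetical names and
not the exact code from patch 1/2, shows where the extra load sits:

#include <errno.h>
#include <stdatomic.h>

#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 1  /* Tuning value under discussion.  */

/* Hypothetical sketch of a generic pthread_spin_trylock body.  */
static int
spin_trylock_sketch (atomic_int *lock)
{
  /* With the parameter set to 1, bail out with a plain load when the
     lock is already taken, so frequent trylock calls on a contended
     lock do not issue interlocked updates at all.  */
  if (SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG
      && atomic_load_explicit (lock, memory_order_relaxed) != 0)
    return EBUSY;

  /* Lock looks free: attempt the interlocked update.  */
  if (atomic_exchange_explicit (lock, 1, memory_order_acquire) == 0)
    return 0;
  return EBUSY;
}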

I saw your microbenchmark post and added some notes.

I added a FIXME comment to re-evaluate the choice once we have the 
appropriate microbenchmarks.

> Also, I'm not quite sure whether this number is really
> spinlock-specific, and I would like to find a better place for these.
> IMO, they should be in some header that contains default tuning
> parameters for synchronization code, which is provided by each
> architecture that uses the generic spinlock; we'd have no #ifdef for the
> tuning parameters, so we'd catch typos in those headers.
>
See pthread_spin_parameters.h in updated patch 1/2.


I've attached an updated patch due to the changes in patch 1/2 and added 
comments to the macro definitions.

Bye.
Stefan

ChangeLog:

	* sysdeps/s390/atomic-machine.h:
	(__arch_compare_and_exchange_val_32_acq):
	Cast type to omit volatile qualifier.
	(__arch_compare_and_exchange_val_64_acq): Likewise.
	(atomic_exchange_acq): Likewise.
	* sysdeps/s390/nptl/pthread_spin_init.c: Delete File.
	* sysdeps/s390/nptl/pthread_spin_lock.c: Likewise.
	* sysdeps/s390/nptl/pthread_spin_trylock.c: Likewise.
	* sysdeps/s390/nptl/pthread_spin_unlock.c: Likewise.
	* sysdeps/s390/nptl/pthread_spin_parameters.h: New File.
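
To illustrate the "Cast type to omit volatile qualifier" entries above:
the construct below mirrors the one used in include/atomic.h and in this
patch; the function and variable names are made up for the example.  The
cast makes the operand an rvalue, so with GCC __typeof reports the
unqualified type and the temporary no longer has to be treated as
volatile, which is what caused the extra stack stores and loads.

/* Illustrative only; names are hypothetical, not part of the patch.  */
static int
read_through_volatile (volatile int *mem)
{
  /* __typeof (*(mem)) is 'volatile int': every access to the temporary
     must be performed, which in practice means stack stores/loads.  */
  __typeof (*(mem)) old_volatile = *mem;

  /* Casting first yields an rvalue, so __typeof gives the unqualified
     'int' and the temporary can be kept in a register.  */
  __typeof ((__typeof (*(mem))) *(mem)) old_plain = *mem;

  return old_volatile + old_plain;
}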
  

Comments

Torvald Riegel Feb. 18, 2017, 5:05 p.m. UTC | #1
On Wed, 2017-02-15 at 17:26 +0100, Stefan Liebler wrote:
> On 02/13/2017 09:39 PM, Torvald Riegel wrote:
> > On Wed, 2017-02-08 at 15:49 +0100, Stefan Liebler wrote:
> >> This is an updated version of the patch, which adjusts the s390 specific
> >> atomic-macros in the same way as in include/atomic.h.
> >> Thus passing a volatile int pointer is fine, too.
> >
> > The general direction of this is okay.
> > Some of my comments for patch 1/2 apply here as well (eg, volatile vs.
> > atomics).
> >
> See answer in patch 1/2.
> 
> > What I don't like is the choice of 1000 for
> > SPIN_LOCK_READS_BETWEEN_CMPXCHG.  Have you run benchmarks to come up
> > with this value, or is it a guess?  Why isn't it documented how you end
> > up with this number?
> > We can keep these with a choice such as this, but then we need to have a
> > FIXME comment in the code, explaining that this is just an arbitrary
> > choice.
> >
> > I would guess that just spinning forever is sufficient, and that we
> > don't need to throw in a CAS every now and then; using randomized
> > exponential back-off might be more important.  This is something that we
> > would be in a better position to answer if you'd provide a
> > microbenchmark for this choice too.
> > At the end of 2016, I've posted a draft of a microbenchmark for rwlocks.
> > Maybe you can use this as a start and run a few experiments?
>  >
> I've run my own benchmarks in the same manner as the microbenchmark
> for rwlocks you mentioned below.
> You are right, I can't see a real difference between
> #define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
> and
> #define SPIN_LOCK_READS_BETWEEN_CMPXCHG -1
> 
> As it does not hurt, I prefer to use a CAS every 1000 plain reads.
> A CAS is not necessary on current CPUs, but from an architecture
> perspective it is more correct to have such a serialization
> instruction in the loop.

What do you mean by "more correct"?  I'm not aware of an architecture
that would not ensure that stores on one CPU will eventually become
visible to other CPUs.

Thus, as I wrote in my review of patch 1/2, I think we should just
remove the occasional CAS (ie, just do the equivalent of the -1
setting, always).

> There is a difference between
> #define SPIN_LOCK_READS_BETWEEN_CMPXCHG 0
> and one of the others.

Right.  We do want to spin if the lock is acquired by another thread.
What we should do in a future patch is to experiment with the back-off
between loads.  tile already has some code for this, which we at least
need to integrate at some point.
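
A rough, hypothetical sketch of such randomized exponential back-off
between loads (placeholder constants and PRNG, not the tile code):

#include <stdatomic.h>

/* Hypothetical back-off loop while waiting for the lock to become free;
   the cap and the xorshift PRNG are arbitrary choices.  */
static void
wait_with_backoff (atomic_int *lock)
{
  unsigned int delay = 8;
  unsigned int rng = 0x12345678u;  /* Placeholder seed.  */

  while (atomic_load_explicit (lock, memory_order_relaxed) != 0)
    {
      /* xorshift32 step to randomize the pause length.  */
      rng ^= rng << 13;
      rng ^= rng >> 17;
      rng ^= rng << 5;

      unsigned int pause = delay + (rng % delay);
      for (unsigned int i = 0; i < pause; i++)
        __asm__ __volatile__ ("");  /* Compiler barrier; keeps the loop.  */

      if (delay < 1024)
        delay *= 2;  /* Exponential growth up to an arbitrary cap.  */
    }
}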

> The same applies to
> #define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 1
> It does not hurt if the lock is free, but there is a difference if the 
> lock is already acquired and trylock is called often.

Yes.  I've replied to this point in the 1/2 patch thread.

> I saw your microbenchmark post and added some notes.

Thanks.  I'll get to them later.

> I added a FIXME comment to re-evaluate the choice once we have the 
> appropriate microbenchmarks.
> 
> > Also, I'm not quite sure whether this number is really
> > spinlock-specific, and I would like to find a better place for these.
> > IMO, they should be in some header that contains default tuning
> > parameters for synchronization code, which is provided by each
> > architecture that uses the generic spinlock; we'd have no #ifdef for the
> > tuning parameters, so we'd catch typos in those headers.
> >
> See pthread_spin_parameters.h in updated patch 1/2.

I suggest that we'll work towards consensus on patch 1/2 first.  I
believe once that is done, patch 2/2 would likely just remove s390 code.

Thanks for continuing the work on this.  I know we have some back and
forth here in terms of overall direction, but I also think we're making
progress and are continually improving the changes.
  

Patch

commit 6dc9da4a9c99524d5d6deb841811f1a8cc37c5b2
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed Feb 15 17:22:38 2017 +0100

    S390: Use generic spinlock code.
    
    This patch removes the s390-specific implementation of the spinlock
    code and uses the generic one instead.

    For pthread_spin_trylock, an explicit load and test is performed
    before executing the compare-and-swap instruction, because the latter
    is an interlocked update even if the lock is already acquired.

    The macros in the s390-specific atomic-machine.h are adjusted in the
    same way as in include/atomic.h in order to avoid storing to and
    loading from the stack.
    
    ChangeLog:
    
    	* sysdeps/s390/atomic-machine.h:
    	(__arch_compare_and_exchange_val_32_acq):
    	Cast type to omit volatile qualifier.
    	(__arch_compare_and_exchange_val_64_acq): Likewise.
    	(atomic_exchange_acq): Likewise.
    	* sysdeps/s390/nptl/pthread_spin_init.c: Delete File.
    	* sysdeps/s390/nptl/pthread_spin_lock.c: Likewise.
    	* sysdeps/s390/nptl/pthread_spin_trylock.c: Likewise.
    	* sysdeps/s390/nptl/pthread_spin_unlock.c: Likewise.
    	* sysdeps/s390/nptl/pthread_spin_parameters.h: New File.

diff --git a/sysdeps/s390/atomic-machine.h b/sysdeps/s390/atomic-machine.h
index 211d3d6..d98c836 100644
--- a/sysdeps/s390/atomic-machine.h
+++ b/sysdeps/s390/atomic-machine.h
@@ -54,7 +54,7 @@  typedef uintmax_t uatomic_max_t;
 
 #define __arch_compare_and_exchange_val_32_acq(mem, newval, oldval) \
   ({ __typeof (mem) __archmem = (mem);					      \
-     __typeof (*mem) __archold = (oldval);				      \
+     __typeof ((__typeof (*(mem))) *(mem)) __archold = (oldval);	      \
      __asm__ __volatile__ ("cs %0,%2,%1"				      \
 			   : "+d" (__archold), "=Q" (*__archmem)	      \
 			   : "d" (newval), "m" (*__archmem) : "cc", "memory" );	\
@@ -64,7 +64,7 @@  typedef uintmax_t uatomic_max_t;
 # define __HAVE_64B_ATOMICS 1
 # define __arch_compare_and_exchange_val_64_acq(mem, newval, oldval) \
   ({ __typeof (mem) __archmem = (mem);					      \
-     __typeof (*mem) __archold = (oldval);				      \
+     __typeof ((__typeof (*(mem))) *(mem)) __archold = (oldval);	      \
      __asm__ __volatile__ ("csg %0,%2,%1"				      \
 			   : "+d" (__archold), "=Q" (*__archmem)	      \
 			   : "d" ((long) (newval)), "m" (*__archmem) : "cc", "memory" ); \
@@ -86,8 +86,8 @@  typedef uintmax_t uatomic_max_t;
 #ifdef __s390x__
 # define atomic_exchange_acq(mem, newvalue)				\
   ({ __typeof (mem) __atg5_memp = (mem);				\
-    __typeof (*(mem)) __atg5_oldval = *__atg5_memp;			\
-    __typeof (*(mem)) __atg5_value = (newvalue);			\
+    __typeof ((__typeof (*(mem))) *(mem)) __atg5_oldval = *__atg5_memp;	\
+    __typeof ((__typeof (*(mem))) *(mem)) __atg5_value = (newvalue);	\
     if (sizeof (*mem) == 4)						\
       __asm__ __volatile__ ("0: cs %0,%2,%1\n"				\
 			    "   jl 0b"					\
@@ -106,8 +106,8 @@  typedef uintmax_t uatomic_max_t;
 #else
 # define atomic_exchange_acq(mem, newvalue)				\
   ({ __typeof (mem) __atg5_memp = (mem);				\
-    __typeof (*(mem)) __atg5_oldval = *__atg5_memp;			\
-    __typeof (*(mem)) __atg5_value = (newvalue);			\
+    __typeof ((__typeof (*(mem))) *(mem)) __atg5_oldval = *__atg5_memp;	\
+    __typeof ((__typeof (*(mem))) *(mem)) __atg5_value = (newvalue);	\
     if (sizeof (*mem) == 4)						\
       __asm__ __volatile__ ("0: cs %0,%2,%1\n"				\
 			    "   jl 0b"					\
diff --git a/sysdeps/s390/nptl/pthread_spin_init.c b/sysdeps/s390/nptl/pthread_spin_init.c
deleted file mode 100644
index d826871..0000000
--- a/sysdeps/s390/nptl/pthread_spin_init.c
+++ /dev/null
@@ -1,19 +0,0 @@ 
-/* Copyright (C) 2003-2017 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Martin Schwidefsky <schwidefsky@de.ibm.com>, 2003.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-/* Not needed.  pthread_spin_init is an alias for pthread_spin_unlock.  */
diff --git a/sysdeps/s390/nptl/pthread_spin_lock.c b/sysdeps/s390/nptl/pthread_spin_lock.c
deleted file mode 100644
index 7349940..0000000
--- a/sysdeps/s390/nptl/pthread_spin_lock.c
+++ /dev/null
@@ -1,32 +0,0 @@ 
-/* Copyright (C) 2003-2017 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Martin Schwidefsky <schwidefsky@de.ibm.com>, 2003.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include "pthreadP.h"
-
-int
-pthread_spin_lock (pthread_spinlock_t *lock)
-{
-  int oldval;
-
-  __asm__ __volatile__ ("0: lhi %0,0\n"
-			"   cs  %0,%2,%1\n"
-			"   jl  0b"
-			: "=&d" (oldval), "=Q" (*lock)
-			: "d" (1), "m" (*lock) : "cc" );
-  return 0;
-}
diff --git a/sysdeps/s390/nptl/pthread_spin_parameters.h b/sysdeps/s390/nptl/pthread_spin_parameters.h
new file mode 100644
index 0000000..b2ff166
--- /dev/null
+++ b/sysdeps/s390/nptl/pthread_spin_parameters.h
@@ -0,0 +1,36 @@ 
+/* Parameters for generic pthread_spinlock_t implementation.  s390 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* FIXME: Re-evaluate the defines below once we have an appropriate
+   microbenchmark.
+   See e.g. "RFC: shared-memory synchronization benchmarking in glibc"
+   (https://sourceware.org/ml/libc-alpha/2017-01/msg00168.html)  */
+
+/* Our own benchmark shows no difference between defining this to '1000'
+   and '-1', but a positive result compared to '0', as with '0' a CAS
+   instruction would be used in a loop, which should be avoided on s390
+   due to cache-line ping-pong.  */
+#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
+
+/* Our own benchmark showed a positive result when this is defined to '1'
+   and trylock is called often by different CPUs.
+   If the lock is free and no other CPU is trying to acquire it,
+   the additional load does not hurt.  */
+#define SPIN_TRYLOCK_LOAD_AND_TEST_BEFORE_XCHG 1
+
+#include_next <pthread_spin_parameters.h>
diff --git a/sysdeps/s390/nptl/pthread_spin_trylock.c b/sysdeps/s390/nptl/pthread_spin_trylock.c
deleted file mode 100644
index 0e848da..0000000
--- a/sysdeps/s390/nptl/pthread_spin_trylock.c
+++ /dev/null
@@ -1,32 +0,0 @@ 
-/* Copyright (C) 2003-2017 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Martin Schwidefsky <schwidefsky@de.ibm.com>, 2003.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <errno.h>
-#include "pthreadP.h"
-
-int
-pthread_spin_trylock (pthread_spinlock_t *lock)
-{
-  int old;
-
-  __asm__ __volatile__ ("cs %0,%3,%1"
-			: "=d" (old), "=Q" (*lock)
-			: "0" (0), "d" (1), "m" (*lock) : "cc" );
-
-  return old != 0 ? EBUSY : 0;
-}
diff --git a/sysdeps/s390/nptl/pthread_spin_unlock.c b/sysdeps/s390/nptl/pthread_spin_unlock.c
deleted file mode 100644
index 54e7378..0000000
--- a/sysdeps/s390/nptl/pthread_spin_unlock.c
+++ /dev/null
@@ -1,32 +0,0 @@ 
-/* Copyright (C) 2003-2017 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Martin Schwidefsky <schwidefsky@de.ibm.com>, 2003.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-/* Ugly hack to avoid the declaration of pthread_spin_init.  */
-#define pthread_spin_init pthread_spin_init_XXX
-#include "pthreadP.h"
-#undef pthread_spin_init
-
-int
-pthread_spin_unlock (pthread_spinlock_t *lock)
-{
-  __asm__ __volatile__ ("   xc  %O0(4,%R0),%0\n"
-			"   bcr 15,0"
-			: "=Q" (*lock) : "m" (*lock) : "cc" );
-  return 0;
-}
-strong_alias (pthread_spin_unlock, pthread_spin_init)