[1/2] S390: Optimize atomic macros.

The atomic_compare_and_exchange_val_acq macro is now implemented with the
gcc builtin __sync_val_compare_and_swap instead of inline assembly with a
compare-and-swap instruction.
If OLDVAL is a compile-time constant, the memory is first compared against
the expected OLDVAL before the compare-and-swap instruction is used.  This
pattern occurs in various locking code.  If the lock is already acquired,
another cpu does not have to lock the memory exclusively.  If OLDVAL is not
constant, the compare-and-swap instruction is used directly, because users
of this macro usually load the current value just before invoking it.
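
The compare-before-swap idea can be sketched roughly as follows.  This is
only an illustration, not the definition from the patch: the name
sketch_cas_val_acq is hypothetical, and the sketch assumes gcc statement
expressions and the __builtin_constant_p builtin.

```c
/* Hypothetical macro for illustration; not the committed definition.  */
#define sketch_cas_val_acq(mem, newval, oldval)                            \
  ({ __typeof (*(mem)) sk_ret;                                             \
     /* Non-constant OLDVAL: go straight to compare-and-swap.              \
        Constant OLDVAL: peek with a plain load and only attempt the       \
        compare-and-swap if the loaded value matches.  */                  \
     if (!__builtin_constant_p (oldval)                                    \
         || (sk_ret = *(volatile __typeof (*(mem)) *) (mem)) == (oldval))  \
       sk_ret = __sync_val_compare_and_swap ((mem), (oldval), (newval));   \
     /* Either the value seen by the plain load or the old value           \
        returned by the compare-and-swap.  */                              \
     sk_ret; })
```

In both paths the result is the value observed in *MEM, so a caller can
still compare it with OLDVAL to see whether the exchange happened.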

The same applies to atomic_compare_and_exchange_bool_acq, which was not
defined before and is now implemented with the gcc builtin
__sync_bool_compare_and_swap.
If the macro is used as the condition of an if/while statement, the
condition code set by the compare-and-swap instruction is used directly,
e.g. to jump to another code sequence.  Before this change, the old value
returned by the compare-and-swap instruction was compared with the given
OLDVAL to determine whether e.g. a jump is needed.
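
The difference between the two approaches can be sketched as below.  The
helper names are hypothetical and the functions are restricted to int for
brevity.  Both return zero if the exchange happened and nonzero otherwise,
which is the convention glibc uses for the _bool_ variants.

```c
/* Hypothetical helper: the success flag comes straight from the builtin,
   so the compiler can branch on the result of the compare-and-swap
   without an additional comparison.  */
static inline int
sketch_cas_bool_acq (int *mem, int newval, int oldval)
{
  return !__sync_bool_compare_and_swap (mem, oldval, newval);
}

/* Hypothetical helper showing the older pattern: fetch the old value and
   compare it with OLDVAL again to find out whether the swap happened.  */
static inline int
sketch_cas_bool_via_val (int *mem, int newval, int oldval)
{
  return __sync_val_compare_and_swap (mem, oldval, newval) != oldval;
}
```

Whether the extra comparison actually disappears depends on the compiler
and target; the sketch only shows which information each variant hands to
the caller.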

The atomic_exchange_acq macro now uses the load-and-and instruction for a
constant zero value instead of a compare-and-swap loop.  This instruction is
available on z196 zarch and newer cpus.  It is used e.g. in unlocking code.
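
A rough sketch of this special case, assuming gcc statement expressions:
and-ing with zero stores zero and returns the previous value, which is
exactly an exchange with zero.  The name sketch_exchange_acq is
hypothetical, and the fallback loop is only one possible way to write the
generic path.

```c
/* Hypothetical macro for illustration; not the committed definition.  */
#define sketch_exchange_acq(mem, value)                                   \
  ({ __typeof (*(mem)) sk_old;                                            \
     if (__builtin_constant_p (value) && (value) == 0)                    \
       /* x & 0 == 0, so fetch-and-and with 0 is an exchange with 0.  */  \
       sk_old = __sync_fetch_and_and ((mem), 0);                          \
     else                                                                 \
       {                                                                  \
         /* Generic path: retry compare-and-swap until it succeeds.  */   \
         sk_old = *(mem);                                                 \
         for (;;)                                                         \
           {                                                              \
             __typeof (*(mem)) sk_seen                                    \
               = __sync_val_compare_and_swap ((mem), sk_old, (value));    \
             if (sk_seen == sk_old)                                       \
               break;                                                     \
             sk_old = sk_seen;                                            \
           }                                                              \
       }                                                                  \
     sk_old; })
```

An unlock path would typically call this with a literal 0 and inspect the
returned old value to decide whether waiters have to be woken.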

The newly defined atomic_exchange_and_add macro is implemented with the gcc
builtin __sync_fetch_and_add, which uses the load-and-add instruction on
z196 zarch and newer cpus instead of a loop with a compare-and-swap
instruction.
The same applies to the atomic_or_val, atomic_and_val, ... macros, which use
the appropriate z196 instruction.
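
As an illustration, such macros can map directly onto the gcc builtins.
The sketch_* names below are hypothetical; each macro returns the value
*MEM held before the operation.

```c
/* Hypothetical wrappers around the gcc __sync builtins.  Which machine
   instruction the compiler emits depends on the target cpu level.  */
#define sketch_exchange_and_add(mem, operand) \
  __sync_fetch_and_add ((mem), (operand))

#define sketch_or_val(mem, mask) \
  __sync_fetch_and_or ((mem), (mask))

#define sketch_and_val(mem, mask) \
  __sync_fetch_and_and ((mem), (mask))
```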

The macros lll_trylock and lll_cond_trylock are extended by a
__glibc_unlikely hint.  With the hint, gcc on s390 emits code in e.g.
pthread_mutex_trylock which does not use jumps in case the lock is free.
Without the hint, it had to jump if the lock was free.
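
A minimal sketch of such a hinted trylock is shown below.  The names
sketch_unlikely and sketch_trylock are hypothetical stand-ins, and the
lock-word values 0 (free) and 1 (taken) are assumptions made for the
example.

```c
/* Tell the compiler that COND is expected to be false.  */
#define sketch_unlikely(cond) __builtin_expect ((cond), 0)

/* Evaluates to 0 if the lock was free and is now taken, nonzero if it
   was already held.  Failure is marked as the unlikely outcome, so the
   lock-was-free case can become the fall-through path.  */
#define sketch_trylock(futex) \
  sketch_unlikely (!__sync_bool_compare_and_swap (&(futex), 0, 1))
```

A caller would write something like "if (sketch_trylock (lock) != 0)
return EBUSY;" and continue with the critical section otherwise.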

ChangeLog:

	* sysdeps/s390/atomic-machine.h
	(__ATOMIC_MACROS_HAVE_Z196_ZARCH_INSN): New define.
	(atomic_compare_and_exchange_val_acq):
	Use __sync_val_compare_and_swap and first compare with non-atomic
	instruction if OLDVAL is constant.
	(atomic_compare_and_exchange_bool_acq): New define.
	(atomic_exchange_acq): Use load-and-and instruction for constant
	zero values, if available.
	(atomic_exchange_and_add, catomic_exchange_and_add, atomic_or_val,
	atomic_or, catomic_or, atomic_bit_test_set, atomic_and_val,
	atomic_and, catomic_and): New define.
	* sysdeps/unix/sysv/linux/s390/lowlevellock.h
	(lll_trylock, lll_cond_trylock): New define.
---
 sysdeps/s390/atomic-machine.h               | 195 ++++++++++++++++++++--------
 sysdeps/unix/sysv/linux/s390/lowlevellock.h |  25 +++-
 2 files changed, 160 insertions(+), 60 deletions(-)
