PR libgomp/103068: Optimize gomp_mutex_lock_slow for the x86 target
Commit Message
Hi,
From the CPU's point of view, acquiring a cache line for writing is more
expensive than acquiring it for reading.  See Appendix A.2 "Spinlock" in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
A full compare-and-swap grabs the cache line in exclusive state and causes
excessive cache-line bouncing.
gomp_mutex_lock_slow spins on __atomic_compare_exchange_n, so add a relaxed
atomic load check and keep spinning while the mutex is non-zero, i.e. while
the cmpxchg would fail anyway.
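
This is the classic test-and-test-and-set idea.  Below is a minimal
standalone sketch using the GCC __atomic builtins; the names
spin_lock_tas, spin_lock_tts and cpu_relax_hint are illustrative only,
not part of this patch or of libgomp:

#include <stdbool.h>

static inline void
cpu_relax_hint (void)
{
#if defined (__x86_64__) || defined (__i386__)
  __builtin_ia32_pause ();  /* PAUSE keeps the spinning core polite.  */
#endif
}

/* Plain test-and-set: every iteration issues a cmpxchg, which requests
   the cache line in exclusive state even while the lock is held.  */
static void
spin_lock_tas (int *lock)
{
  int expected = 0;
  while (!__atomic_compare_exchange_n (lock, &expected, 1, false,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    {
      expected = 0;
      cpu_relax_hint ();
    }
}

/* Test-and-test-and-set: spin on a relaxed load, which only needs the
   cache line in shared state, and retry the cmpxchg once the lock
   reads as free.  */
static void
spin_lock_tts (int *lock)
{
  for (;;)
    {
      if (__atomic_load_n (lock, __ATOMIC_RELAXED) != 0)
        {
          cpu_relax_hint ();
          continue;
        }
      int expected = 0;
      if (__atomic_compare_exchange_n (lock, &expected, 1, false,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return;
    }
}

With the spin_lock_tts variant, waiting cores keep the line in shared
state and only the winning cmpxchg requests exclusive ownership, which is
what the mutex.c hunk below does for gomp_mutex_lock_slow on x86.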
Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for master?
libgomp/ChangeLog:
PR libgomp/103068
* config/linux/mutex.c (gomp_mutex_lock_slow): Continue spinning
when the mutex is not 0 on x86 targets.
* config/linux/x86/futex.h (TARGET_X86_AVOID_CMPXCHG): Define.
---
libgomp/config/linux/mutex.c | 5 +++++
libgomp/config/linux/x86/futex.h | 2 ++
2 files changed, 7 insertions(+)
--- a/libgomp/config/linux/mutex.c
+++ b/libgomp/config/linux/mutex.c
@@ -49,6 +49,11 @@ gomp_mutex_lock_slow (gomp_mutex_t *mutex, int oldval)
}
else
{
+#ifdef TARGET_X86_AVOID_CMPXCHG
+ /* For x86, omit cmpxchg when atomic load shows mutex is not 0. */
+ if ((oldval = __atomic_load_n (mutex, MEMMODEL_RELAXED)) != 0)
+ continue;
+#endif
/* Something changed. If now unlocked, we're good to go. */
oldval = 0;
if (__atomic_compare_exchange_n (mutex, &oldval, 1, false,
--- a/libgomp/config/linux/x86/futex.h
+++ b/libgomp/config/linux/x86/futex.h
@@ -122,3 +122,5 @@ cpu_relax (void)
{
__builtin_ia32_pause ();
}
+
+#define TARGET_X86_AVOID_CMPXCHG