[4/4] Use C11 atomics in pthread_once.

This patch converts pthread_once to use C11 atomics.  It's meant as an
illustration of how to use them and as an early test.
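To make the transformation concrete, here is a minimal, standalone sketch of a once-control written with standard C11 stdatomic.h calls. The patch itself uses glibc's internal atomic_* macros, not these calls. This is not glibc's pthread_once. The names sketch_once and once_slow, the ONCE_* states, and the demo helpers are all hypothetical. The real code's fork generation, futex waiting and cancellation handling (for instance clear_once_control) are left out. The sketch shows the general shape: an acquire load on the fast path, a CAS to claim initialization on the slow path, and a release store to publish completion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical states for this sketch only.  glibc's pthread_once_t also
   encodes a fork generation and waits on a futex; both are omitted.  */
enum { ONCE_INIT = 0, ONCE_INPROGRESS = 1, ONCE_DONE = 2 };

typedef atomic_int sketch_once_t;

static void
once_slow (sketch_once_t *ctl, void (*init) (void))
{
  for (;;)
    {
      /* Acquire: if we observe DONE, we must also observe everything
         init wrote before DONE was published.  */
      int val = atomic_load_explicit (ctl, memory_order_acquire);
      if (val == ONCE_DONE)
        return;
      if (val == ONCE_INIT)
        {
          int expected = ONCE_INIT;
          /* Try to claim the initialization.  The failure order can be
             relaxed because we just loop and re-load with acquire.  */
          if (atomic_compare_exchange_weak_explicit (ctl, &expected,
                                                     ONCE_INPROGRESS,
                                                     memory_order_acquire,
                                                     memory_order_relaxed))
            {
              init ();
              /* Release: publishes init's writes to acquire loads of DONE.  */
              atomic_store_explicit (ctl, ONCE_DONE, memory_order_release);
              return;
            }
          continue;  /* Lost the race or failed spuriously; re-read.  */
        }
      /* Another thread is running init.  glibc blocks on a futex here;
         this sketch just yields and re-checks.  */
      sched_yield ();
    }
}

static void
sketch_once (sketch_once_t *ctl, void (*init) (void))
{
  /* Fast path: a single acquire load, an ordinary mov on x86_64.  */
  if (atomic_load_explicit (ctl, memory_order_acquire) == ONCE_DONE)
    return;
  once_slow (ctl, init);
}

/* Demo only: count how often init runs when several threads race.  */
static int init_calls;                 /* written only inside demo_init */
static sketch_once_t demo_once = ONCE_INIT;

static void
demo_init (void)
{
  init_calls++;
}

static void *
demo_thread (void *arg)
{
  (void) arg;
  sketch_once (&demo_once, demo_init);
  return NULL;
}

static int
run_demo (int nthreads)
{
  pthread_t tids[16];
  assert (nthreads > 0 && nthreads <= 16);
  for (int i = 0; i < nthreads; i++)
    {
      int rc = pthread_create (&tids[i], NULL, demo_thread, NULL);
      assert (rc == 0);
      (void) rc;
    }
  for (int i = 0; i < nthreads; i++)
    pthread_join (tids[i], NULL);
  /* pthread_join synchronizes with each thread, so this read is safe.  */
  return init_calls;
}
```

With several racing threads, run_demo is expected to report that the init routine ran exactly once.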

Please note that I've converted *all* accesses to concurrently accessed
memory locations to atomic operations.  This is necessary to tell the
compiler about the concurrency.  With plain accesses, the compiler may
assume the code is data-race-free and sequential and optimize
accordingly, which is wrong for concurrent accesses.
You'll see that atomic_*_relaxed is used quite a bit.  Relaxed
operations restrict the compiler a little but do not add any barriers.
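As a small illustration of why even relaxed accesses matter, consider a worker that polls a stop flag. This is a hypothetical example, not code from the patch, and the names stop_requested, worker and run_polling_demo are made up. If the flag were a plain bool, the concurrent read would be a data race. The compiler could then assume the value never changes inside the loop and hoist the load out of it. A memory_order_relaxed load rules that out without emitting a fence; on x86_64 it is an ordinary load.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Written by the main thread, polled by the worker (hypothetical name).  */
static atomic_bool stop_requested;

static void *
worker (void *arg)
{
  (void) arg;
  unsigned long spins = 0;  /* only gives the loop a body */
  /* Relaxed load: the compiler has to keep re-reading the flag instead
     of hoisting the load out of the loop, but no ordering with other
     memory is implied and no fence instruction is required.  With a
     plain bool, this racing read would be undefined behavior and the
     loop could legally be compiled as "if (!flag) for (;;);".  */
  while (!atomic_load_explicit (&stop_requested, memory_order_relaxed))
    spins++;
  (void) spins;
  return NULL;
}

/* Returns 1 once the worker has seen the flag and exited.  */
static int
run_polling_demo (void)
{
  pthread_t tid;
  int rc = pthread_create (&tid, NULL, worker, NULL);
  assert (rc == 0);
  atomic_store_explicit (&stop_requested, true, memory_order_relaxed);
  rc = pthread_join (tid, NULL);
  assert (rc == 0);
  (void) rc;
  return 1;
}
```

Relaxed is enough here only because the worker reads no other data published together with the flag. If it did, the store would need release and the load acquire, as in the once-control pattern.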

Also, this makes it easy to see which loads and stores are actually
concurrent and thus need extra attention from the programmer.

I've compared the generated code on x86_64 with GCC 4.4.7.  The only
difference before and after the patch is that a "cmp %eax,%edx" becomes
a "cmp %edx,%eax".  Because that comparison only tests the two values
for equality, the operand order does not matter.

I've also looked at the code generated by a pre-4.9 GCC build.  The code
for the pthread_once fast path is the same as what GCC 4.4.7 generated
before the patch.  For the slow path, the more recent compiler generates
somewhat different code with fewer instructions.

Joseph mentioned that GCC doesn't yet optimize memory_order_relaxed
loads as well as it could.  We may have to look at this again for other
code, but for pthread_once at least it wasn't a problem.  My gut feeling
is that it won't matter much for most concurrent code.  Relaxed atomic
ops will likely be close to other atomic ops with stronger memory
orders, which already limit how the compiler can optimize the
surrounding code.  The exception would be code doing something special,
such as issuing a large number of relaxed ops in a row.
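The point about relaxed ops sitting next to stronger ones shows up in a common pattern. Here it is sketched as a hypothetical test-and-test-and-set spinlock, not code from this patch or from glibc; all names are made up. The relaxed load only spins while the lock looks taken. The ordering that matters comes from the neighboring acquire exchange and release store. So any weakness in the code generated for the relaxed load is unlikely to dominate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } ttas_lock;  /* hypothetical type */

static void
ttas_lock_acquire (ttas_lock *l)
{
  for (;;)
    {
      /* Stronger op: the acquire exchange takes the lock and keeps the
         critical section from moving before it.  */
      if (!atomic_exchange_explicit (&l->locked, true, memory_order_acquire))
        return;
      /* Relaxed op right next to it: read-only spinning while the lock
         is held.  No ordering is needed here; the exchange above
         provides it once it succeeds.  */
      while (atomic_load_explicit (&l->locked, memory_order_relaxed))
        sched_yield ();
    }
}

static void
ttas_lock_release (ttas_lock *l)
{
  /* Release: makes the critical section's writes visible to the next
     thread whose acquire exchange takes the lock.  */
  atomic_store_explicit (&l->locked, false, memory_order_release);
}

enum { DEMO_THREADS = 4, DEMO_ITERS = 100000 };
static ttas_lock demo_lock;   /* zero-initialized: unlocked */
static long shared_counter;   /* protected by demo_lock; plain accesses */

static void *
adder (void *arg)
{
  (void) arg;
  for (int i = 0; i < DEMO_ITERS; i++)
    {
      ttas_lock_acquire (&demo_lock);
      shared_counter++;
      ttas_lock_release (&demo_lock);
    }
  return NULL;
}

static long
run_lock_demo (void)
{
  pthread_t tids[DEMO_THREADS];
  for (int i = 0; i < DEMO_THREADS; i++)
    {
      int rc = pthread_create (&tids[i], NULL, adder, NULL);
      assert (rc == 0);
      (void) rc;
    }
  for (int i = 0; i < DEMO_THREADS; i++)
    {
      int rc = pthread_join (tids[i], NULL);
      assert (rc == 0);
      (void) rc;
    }
  return shared_counter;
}
```

If the lock is correct, no increments are lost, and run_lock_demo returns DEMO_THREADS * DEMO_ITERS.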

	* nptl/pthread_once.c (clear_once_control, __pthread_once_slow,
	__pthread_once): Use C11 atomics.
