[PATCHv4] powerpc: Fix write-after-destroy in lock elision
Commit Message
Changes since v3:
- Use atomics to update adapt_count in __lll_unlock_elision.
- Improved the source code comments in __lll_unlock_elision to mention
the mutex destroy requirements.
Changes since v2:
- Fixed coding style.
Changes since v1:
- Replaced the references to a data race with the actual error: write-after-destroy.
- Enclosed adapt_count accesses in C11 atomics.
--- 8< ---
The update of *adapt_count after the release of the lock causes a race
condition when thread A unlocks, thread B continues and destroys the
mutex, and thread A writes to *adapt_count.
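[Editorial sketch: the ordering problem in the commit message can be illustrated in plain C. The `mini_mutex` struct and both functions below are hypothetical stand-ins, not glibc's actual layout or code; the transactional path is omitted.]

```c
#include <assert.h>

/* Hypothetical stand-in for the elided mutex state; the field names
   mirror the patch, not glibc's real struct layout.  */
struct mini_mutex { int lock; short adapt_count; };

/* Pre-patch ordering: the adapt_count hint is stored AFTER the lock
   word is released.  Once released, another thread may legally
   acquire, unlock and destroy (free) the mutex, so the trailing
   store can land in freed memory -- the write-after-destroy of
   BZ 20822.  */
static void unlock_buggy (struct mini_mutex *m)
{
  m->lock = 0;          /* lock released here ...                 */
  /* <-- window: pthread_mutex_destroy may run in another thread */
  if (m->adapt_count > 0)
    m->adapt_count--;   /* ... this store races with destruction  */
}

/* Patched ordering: update the hint while still holding the
   fallback lock, so no store touches the mutex after the release
   that makes destruction legal.  */
static void unlock_fixed (struct mini_mutex *m)
{
  if (m->adapt_count > 0)
    m->adapt_count--;   /* still inside the critical section      */
  m->lock = 0;          /* the release is the last access         */
}
```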
2017-01-03 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Steven Munroe <sjmunroe@us.ibm.com>
Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
[BZ #20822]
* sysdeps/unix/sysv/linux/powerpc/elision-lock.c
(__lll_lock_elision): Access adapt_count via C11 atomics.
* sysdeps/unix/sysv/linux/powerpc/elision-trylock.c
(__lll_trylock_elision): Likewise.
* sysdeps/unix/sysv/linux/powerpc/elision-unlock.c
(__lll_unlock_elision): Update adapt_count variable inside the
critical section using C11 atomics.
---
sysdeps/unix/sysv/linux/powerpc/elision-lock.c | 8 +++++---
sysdeps/unix/sysv/linux/powerpc/elision-trylock.c | 7 ++++---
sysdeps/unix/sysv/linux/powerpc/elision-unlock.c | 15 +++++++++------
3 files changed, 18 insertions(+), 12 deletions(-)
Comments
On Tue, 2017-01-03 at 10:34 -0200, Tulio Magno Quites Machado Filho
wrote:
> Changes since v3:
> - Use atomics to update adapt_count in __lll_unlock_elision.
> - Improved the source code comments in __lll_unlock_elision to mention
> the mutex destroy requirements.
>
> Changes since v2:
> - Fixed coding style.
>
> Changes since v1:
> - Replaced the references to a data race with the actual error: write-after-destroy.
> - Enclosed adapt_count accesses in C11 atomics.
>
> --- 8< ---
This is okay, thanks.
For the next time, or when you change non-arch-specific code, please
remember to document memory orders; a brief comment is often sufficient
but reveals the intent behind a specific bit of implementation. For
example:
> diff --git a/sysdeps/unix/sysv/linux/powerpc/elision-lock.c b/sysdeps/unix/sysv/linux/powerpc/elision-lock.c
> index 4589491..044803a 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/elision-lock.c
> +++ b/sysdeps/unix/sysv/linux/powerpc/elision-lock.c
> @@ -45,7 +45,7 @@
> int
> __lll_lock_elision (int *lock, short *adapt_count, EXTRAARG int pshared)
> {
> - if (*adapt_count > 0)
> + if (atomic_load_relaxed (adapt_count) > 0)
Here, you could have just added:
/* adapt_count is accessed concurrently but is just a hint. Thus,
use atomic accesses but relaxed MO is sufficient. */
This helps others to more quickly understand what was intended. This
isn't critical in this case because we have well-documented similar code
for other architectures, but generally it is better to document why a
choice was made if it is not immediately obvious, or cannot be checked
for consistency based on just local information (eg, the source code
lines around the statement in question); concurrent code obviously can't
be cross-checked with just local information...
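[Editorial sketch of the point being made: standard C11 atomics stand in for glibc's atomic_load_relaxed/atomic_store_relaxed macros, and the function names below are illustrative, not glibc's.]

```c
#include <assert.h>
#include <stdatomic.h>

/* adapt_count is only a heuristic: a stale read merely makes a
   thread choose between elision and the fallback lock suboptimally,
   never incorrectly.  Atomic accesses are still needed so the
   compiler cannot tear or duplicate the plain loads/stores, but
   relaxed memory order is sufficient -- exactly the intent the
   suggested comment documents.  */
static _Atomic short adapt_count;

static int
should_skip_elision (void)
{
  return atomic_load_explicit (&adapt_count, memory_order_relaxed) > 0;
}

static void
penalize_elision (short skip)
{
  /* Mirrors e.g. the adapt_count = aconf.skip_lock_busy store.  */
  atomic_store_explicit (&adapt_count, skip, memory_order_relaxed);
}
```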
Torvald Riegel <triegel@redhat.com> writes:
> For the next time, or when you change non-arch-specific code, please
> remember to document memory orders; a brief comment is often sufficient
> but reveals the intent behind a specific bit of implementation. For
> example:
>
>> diff --git a/sysdeps/unix/sysv/linux/powerpc/elision-lock.c b/sysdeps/unix/sysv/linux/powerpc/elision-lock.c
>> index 4589491..044803a 100644
>> --- a/sysdeps/unix/sysv/linux/powerpc/elision-lock.c
>> +++ b/sysdeps/unix/sysv/linux/powerpc/elision-lock.c
>> @@ -45,7 +45,7 @@
>> int
>> __lll_lock_elision (int *lock, short *adapt_count, EXTRAARG int pshared)
>> {
>> - if (*adapt_count > 0)
>> + if (atomic_load_relaxed (adapt_count) > 0)
>
> Here, you could have just added:
> /* adapt_count is accessed concurrently but is just a hint. Thus,
> use atomic accesses but relaxed MO is sufficient. */
>
> This helps others to more quickly understand what was intended. This
> isn't critical in this case because we have well-documented similar code
> for other architectures, but generally it is better to document why a
> choice was made if it is not immediately obvious, or cannot be checked
> for consistency based on just local information (eg, the source code
> lines around the statement in question); concurrent code obviously can't
> be cross-checked with just local information...
I agree with you.
I added this comment in this patch and pushed it as e9a96ea.
Thanks!
On Jan 11 2017, "Ulrich Weigand" <uweigand@de.ibm.com> wrote:
> Tulio Magno Quites Machado Filho wrote:
>
>> + /* Update adapt_count in the critical section to prevent a
>> + write-after-destroy error as mentioned in BZ 20822. The
>> + following update of adapt_count has to be contained within
>> + the critical region of the fall-back lock in order to not violate
>> + the mutex destruction requirements. */
>> + short __tmp = atomic_load_relaxed (adapt_count);
>> + if (__tmp > 0)
>> + atomic_store_relaxed (adapt_count, __tmp--);
>
> Doesn't this have to be --__tmp to have any effect?
It should be just __tmp - 1.
Andreas.
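[Editorial note: the bug Ulrich spotted is plain post-decrement semantics -- `__tmp--` evaluates to the old value, so the quoted store writes adapt_count back unchanged and the decrement dies with the local variable. A minimal demonstration, with illustrative names:]

```c
#include <assert.h>

/* Mimics the quoted hunk with plain variables.  */
static void
store_with_postdec (short *adapt_count)
{
  short tmp = *adapt_count;
  if (tmp > 0)
    *adapt_count = tmp--;   /* stores the OLD tmp; only the local
                               copy is decremented, then discarded */
}

/* What the code needs to store instead.  */
static void
store_minus_one (short *adapt_count)
{
  short tmp = *adapt_count;
  if (tmp > 0)
    *adapt_count = tmp - 1; /* Andreas' "__tmp - 1" */
}
```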
--- a/sysdeps/unix/sysv/linux/powerpc/elision-lock.c
+++ b/sysdeps/unix/sysv/linux/powerpc/elision-lock.c
@@ -45,7 +45,7 @@
int
__lll_lock_elision (int *lock, short *adapt_count, EXTRAARG int pshared)
{
- if (*adapt_count > 0)
+ if (atomic_load_relaxed (adapt_count) > 0)
{
goto use_lock;
}
@@ -67,7 +67,8 @@ __lll_lock_elision (int *lock, short *adapt_count, EXTRAARG int pshared)
if (_TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
{
if (aconf.skip_lock_internal_abort > 0)
- *adapt_count = aconf.skip_lock_internal_abort;
+ atomic_store_relaxed (adapt_count,
+ aconf.skip_lock_internal_abort);
goto use_lock;
}
}
@@ -75,7 +76,8 @@ __lll_lock_elision (int *lock, short *adapt_count, EXTRAARG int pshared)
/* Fall back to locks for a bit if retries have been exhausted */
if (aconf.try_tbegin > 0 && aconf.skip_lock_out_of_tbegin_retries > 0)
- *adapt_count = aconf.skip_lock_out_of_tbegin_retries;
+ atomic_store_relaxed (adapt_count,
+ aconf.skip_lock_out_of_tbegin_retries);
use_lock:
return LLL_LOCK ((*lock), pshared);
--- a/sysdeps/unix/sysv/linux/powerpc/elision-trylock.c
+++ b/sysdeps/unix/sysv/linux/powerpc/elision-trylock.c
@@ -34,7 +34,7 @@ __lll_trylock_elision (int *futex, short *adapt_count)
__libc_tabort (_ABORT_NESTED_TRYLOCK);
/* Only try a transaction if it's worth it. */
- if (*adapt_count > 0)
+ if (atomic_load_relaxed (adapt_count) > 0)
{
goto use_lock;
}
@@ -49,7 +49,7 @@ __lll_trylock_elision (int *futex, short *adapt_count)
__libc_tend (0);
if (aconf.skip_lock_busy > 0)
- *adapt_count = aconf.skip_lock_busy;
+ atomic_store_relaxed (adapt_count, aconf.skip_lock_busy);
}
else
{
@@ -59,7 +59,8 @@ __lll_trylock_elision (int *futex, short *adapt_count)
result in another failure. Use normal locking now and
for the next couple of calls. */
if (aconf.skip_trylock_internal_abort > 0)
- *adapt_count = aconf.skip_trylock_internal_abort;
+ atomic_store_relaxed (adapt_count,
+ aconf.skip_trylock_internal_abort);
}
}
--- a/sysdeps/unix/sysv/linux/powerpc/elision-unlock.c
+++ b/sysdeps/unix/sysv/linux/powerpc/elision-unlock.c
@@ -28,13 +28,16 @@ __lll_unlock_elision (int *lock, short *adapt_count, int pshared)
__libc_tend (0);
else
{
- lll_unlock ((*lock), pshared);
+ /* Update adapt_count in the critical section to prevent a
+ write-after-destroy error as mentioned in BZ 20822. The
+ following update of adapt_count has to be contained within
+ the critical region of the fall-back lock in order to not violate
+ the mutex destruction requirements. */
+ short __tmp = atomic_load_relaxed (adapt_count);
+ if (__tmp > 0)
+ atomic_store_relaxed (adapt_count, __tmp--);
- /* Update the adapt count AFTER completing the critical section.
- Doing this here prevents unneeded stalling when entering
- a critical section. Saving about 8% runtime on P8. */
- if (*adapt_count > 0)
- (*adapt_count)--;
+ lll_unlock ((*lock), pshared);
}
return 0;
}
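[Editorial sketch: how the fallback branch of __lll_unlock_elision reads with both the v4 reordering and the follow-up `__tmp - 1` correction folded in. C11 atomics and a hypothetical `elided_lock` struct stand in for glibc's macros and lll_unlock; this is not the committed code.]

```c
#include <assert.h>
#include <stdatomic.h>

struct elided_lock { int lock; _Atomic short adapt_count; };

static int
unlock_elision_fallback (struct elided_lock *l, int pshared)
{
  (void) pshared;   /* the real lll_unlock consumes this */

  /* Update adapt_count inside the critical section (BZ 20822), and
     store tmp - 1 rather than tmp-- so the decrement actually
     reaches memory.  Relaxed MO: the counter is only a hint.  */
  short tmp = atomic_load_explicit (&l->adapt_count,
                                    memory_order_relaxed);
  if (tmp > 0)
    atomic_store_explicit (&l->adapt_count, tmp - 1,
                           memory_order_relaxed);

  l->lock = 0;      /* stand-in for lll_unlock: nothing after this
                       touches the (possibly destroyed) mutex */
  return 0;
}
```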