libstdc++: Fix deadlock in atomic wait [PR104442]

Message ID CAMmuTO_7Pnv1p=eCwXKO61Te2BD5YiKZaHogWroaJiVjiM=7yQ@mail.gmail.com
State Committed
Series libstdc++: Fix deadlock in atomic wait [PR104442]

Commit Message

Thomas Rodgers Feb. 9, 2022, 12:55 a.m. UTC
This issue was observed as a deadlock in
29_atomics/atomic/wait_notify/100334.cc on VxWorks. When a wait is
"laundered" (i.e. type T* does not suffice as a waitable address for the
platform's native waiting primitive), the address waited on is that of the
_M_ver member of __waiter_pool_base, so several threads may wait on the
same address for unrelated atomic<T>s. As noted in the PR, the
implementation correctly exits the wait for the thread whose data
changed, but not for any other threads waiting on the same address.

As noted in the PR, __waiter::_M_do_wait_v was exiting correctly, but
the other waiters were not reloading the value of _M_ver before
re-entering the wait.

Moving the spin call inside the loop accomplishes this, and is
consistent with the predicate-accepting version of __waiter::_M_do_wait.
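
To make the failure mode concrete, here is a minimal standalone sketch of
the scenario (illustrative only, not the 100334.cc testcase; whether the
two atomics actually share a waiter-pool entry depends on the platform and
the pool's address hashing):

  #include <atomic>
  #include <thread>

  int main()
  {
    std::atomic<char> a{0};  // char waits are laundered on common platforms
    std::atomic<char> b{0};  // may share a __waiter_pool_base entry with `a`

    std::thread t1([&] { a.wait(0); });
    std::thread t2([&] { b.wait(0); });

    a.store(1);
    a.notify_one();  // may also wake t2, which must reload _M_ver
                     // and keep waiting rather than block forever

    b.store(1);
    b.notify_one();  // now t2's own value has changed

    t1.join();
    t2.join();
  }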
  

Comments

Jonathan Wakely Feb. 9, 2022, 10:42 a.m. UTC | #1
On Wed, 9 Feb 2022 at 00:57, Thomas Rodgers via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> This issue was observed as a deadlock in
> 29_atomics/atomic/wait_notify/100334.cc on VxWorks. When a wait is
> "laundered" (i.e. type T* does not suffice as a waitable address for the
> platform's native waiting primitive), the address waited on is that of the
> _M_ver member of __waiter_pool_base, so several threads may wait on the
> same address for unrelated atomic<T>s. As noted in the PR, the
> implementation correctly exits the wait for the thread whose data
> changed, but not for any other threads waiting on the same address.
>
> As noted in the PR, __waiter::_M_do_wait_v was exiting correctly, but
> the other waiters were not reloading the value of _M_ver before
> re-entering the wait.
>
> Moving the spin call inside the loop accomplishes this, and is
> consistent with the predicate-accepting version of __waiter::_M_do_wait.

There is a change to the memory order in _S_do_spin_v which is not
described in the commit msg or the changelog. Is that unintentional?

(Aside: why do we even have _S_do_spin_v, it's called in exactly one
place, so could just be inlined into _M_do_spin_v, couldn't it?)
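
As background for the memory-order question, a standalone comparison of the
two orders in that hunk (illustrative only, not libstdc++ internals; which
order the wait protocol actually needs is the subject of the follow-up
patch mentioned below):

  #include <atomic>

  std::atomic<int> flag{0};

  int poll_relaxed()
  {
    // No ordering constraints on surrounding operations; the cheapest
    // option for a heuristic re-check inside a spin loop.
    return flag.load(std::memory_order_relaxed);
  }

  int poll_seq_cst()
  {
    // Participates in the single total order of all seq_cst operations;
    // strictly stronger, and potentially slower on weakly ordered targets.
    return flag.load(std::memory_order_seq_cst);
  }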
  
Thomas Rodgers Feb. 9, 2022, 3:13 p.m. UTC | #2
Excessively enthusiastic refactoring. I expect to rewrite most of this as
part of the work I'm starting now for GCC13 stage1.

On Wed, Feb 9, 2022 at 2:43 AM Jonathan Wakely <jwakely@redhat.com> wrote:

> On Wed, 9 Feb 2022 at 00:57, Thomas Rodgers via Libstdc++
> <libstdc++@gcc.gnu.org> wrote:
> >
> > This issue was observed as a deadlock in
> > 29_atomics/atomic/wait_notify/100334.cc on VxWorks. When a wait is
> > "laundered" (i.e. type T* does not suffice as a waitable address for the
> > platform's native waiting primitive), the address waited on is that of the
> > _M_ver member of __waiter_pool_base, so several threads may wait on the
> > same address for unrelated atomic<T>s. As noted in the PR, the
> > implementation correctly exits the wait for the thread whose data
> > changed, but not for any other threads waiting on the same address.
> >
> > As noted in the PR, __waiter::_M_do_wait_v was exiting correctly, but
> > the other waiters were not reloading the value of _M_ver before
> > re-entering the wait.
> >
> > Moving the spin call inside the loop accomplishes this, and is
> > consistent with the predicate-accepting version of __waiter::_M_do_wait.
>
> There is a change to the memory order in _S_do_spin_v which is not
> described in the commit msg or the changelog. Is that unintentional?
>
> (Aside: why do we even have _S_do_spin_v, it's called in exactly one
> place, so could just be inlined into _M_do_spin_v, couldn't it?)
>
>
  
Thomas Rodgers Feb. 9, 2022, 5:10 p.m. UTC | #3
Updated patch. I reverted the memory order change (and will submit that as
another patch) and fixed some spelling and grammar errors.

On Wed, Feb 9, 2022 at 2:43 AM Jonathan Wakely <jwakely@redhat.com> wrote:

> On Wed, 9 Feb 2022 at 00:57, Thomas Rodgers via Libstdc++
> <libstdc++@gcc.gnu.org> wrote:
> >
> > This issue was observed as a deadlock in
> > 29_atomics/atomic/wait_notify/100334.cc on VxWorks. When a wait is
> > "laundered" (i.e. type T* does not suffice as a waitable address for the
> > platform's native waiting primitive), the address waited on is that of the
> > _M_ver member of __waiter_pool_base, so several threads may wait on the
> > same address for unrelated atomic<T>s. As noted in the PR, the
> > implementation correctly exits the wait for the thread whose data
> > changed, but not for any other threads waiting on the same address.
> >
> > As noted in the PR, __waiter::_M_do_wait_v was exiting correctly, but
> > the other waiters were not reloading the value of _M_ver before
> > re-entering the wait.
> >
> > Moving the spin call inside the loop accomplishes this, and is
> > consistent with the predicate-accepting version of __waiter::_M_do_wait.
>
> There is a change to the memory order in _S_do_spin_v which is not
> described in the commit msg or the changelog. Is that unintentional?
>
> (Aside: why do we even have _S_do_spin_v, it's called in exactly one
> place, so could just be inlined into _M_do_spin_v, couldn't it?)
>
>
  
Jonathan Wakely Feb. 9, 2022, 5:13 p.m. UTC | #4
On Wed, 9 Feb 2022 at 17:10, Thomas Rodgers wrote:
>
> Updated patch. I reverted the memory order change (and will submit that as another patch) and fixed some spelling and grammar errors.

OK for trunk and gcc-11, thanks.
  
Thomas Rodgers Feb. 9, 2022, 8:33 p.m. UTC | #5
Tested x86_64-pc-linux-gnu, committed to master, backported to gcc-11.

On Wed, Feb 9, 2022 at 9:14 AM Jonathan Wakely <jwakely@redhat.com> wrote:

> On Wed, 9 Feb 2022 at 17:10, Thomas Rodgers wrote:
> >
> > Updated patch. I reverted the memory order change (and will submit that
> > as another patch) and fixed some spelling and grammar errors.
>
> OK for trunk and gcc-11, thanks.
>
>
  

Patch

From ee66736beca3dce4bc09350c5407a2ac7219fbec Mon Sep 17 00:00:00 2001
From: Thomas Rodgers <rodgert@appliantology.com>
Date: Tue, 8 Feb 2022 16:33:36 -0800
Subject: [PATCH] libstdc++: Fix deadlock in atomic wait [PR104442]

This issue was observed as a deadlock in
29_atomics/atomic/wait_notify/100334.cc on VxWorks. When a wait is
"laundered" (i.e. type T* does not suffice as a waitable address for the
platform's native waiting primitive), the address waited on is that of the
_M_ver member of __waiter_pool_base, so several threads may wait on the
same address for unrelated atomic<T>s. As noted in the PR, the
implementation correctly exits the wait for the thread whose data
changed, but not for any other threads waiting on the same address.

As noted in the PR, __waiter::_M_do_wait_v was exiting correctly, but
the other waiters were not reloading the value of _M_ver before
re-entering the wait.

Moving the spin call inside the loop accomplishes this, and is
consistent with the predicate-accepting version of __waiter::_M_do_wait.

libstdc++-v3/ChangeLog:

	PR libstdc++/104442
	* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): Move spin
	loop inside do loop so that threads failing the wait reload
	_M_ver.
---
 libstdc++-v3/include/bits/atomic_wait.h | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index d7de0d7eb9e..33ce26ade1b 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -332,7 +332,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	      }
 	    else
 	      {
-		__atomic_load(__addr, &__val, __ATOMIC_RELAXED);
+		__atomic_load(__addr, &__val, __ATOMIC_SEQ_CST);
 	      }
 	    return __atomic_spin(__pred, __spin);
 	  }
@@ -388,12 +388,11 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  void
 	  _M_do_wait_v(_Tp __old, _ValFn __vfn)
 	  {
-	    __platform_wait_t __val;
-	    if (__base_type::_M_do_spin_v(__old, __vfn, __val))
-	      return;
-
 	    do
 	      {
+		__platform_wait_t __val;
+		if (__base_type::_M_do_spin_v(__old, __vfn, __val))
+		  return;
 		__base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
 	      }
 	    while (__detail::__atomic_compare(__old, __vfn()));
-- 
2.34.1
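
For readers unfamiliar with the internals, a self-contained model of the
protocol being fixed (hypothetical names; a mutex and condition variable
stand in for the platform primitive, and `ver` plays the role of
__waiter_pool_base::_M_ver). The fix corresponds to taking the snapshot
inside the loop, so a thread woken for someone else's data re-reads the
counter before blocking again:

  #include <atomic>
  #include <condition_variable>
  #include <mutex>
  #include <thread>

  struct WaiterPool
  {
    std::mutex m;
    std::condition_variable cv;
    unsigned ver = 0;  // shared version counter, like _M_ver

    // Block until `changed` reports true, re-reading `ver` on every
    // iteration so a wakeup meant for another waiter is never lost.
    template<typename Pred>
      void
      wait(Pred changed)
      {
	std::unique_lock<std::mutex> lk(m);
	while (!changed())
	  {
	    unsigned snapshot = ver;  // reload each iteration (the fix)
	    cv.wait(lk, [&] { return ver != snapshot || changed(); });
	  }
      }

    void
    notify()
    {
      { std::lock_guard<std::mutex> lk(m); ++ver; }
      cv.notify_all();  // wakes unrelated waiters too; they must re-check
    }
  };

  int main()
  {
    WaiterPool pool;
    std::atomic<int> a{0}, b{0};  // unrelated values sharing one pool entry

    std::thread t1([&] { pool.wait([&] { return a.load() != 0; }); });
    std::thread t2([&] { pool.wait([&] { return b.load() != 0; }); });

    a.store(1); pool.notify();  // wakes both; t2 re-checks and waits again
    b.store(1); pool.notify();

    t1.join();
    t2.join();
  }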