[01/10] libstdc++: Atomic wait/notify ABI stabilization

Message ID 20250110212810.832494-2-jwakely@redhat.com
State Superseded
Series C++20 atomic wait/notify ABI stabilization

Commit Message

Jonathan Wakely Jan. 10, 2025, 9:23 p.m. UTC
  From: Thomas Rodgers <trodgers@redhat.com>

This represents a major refactoring of the previous atomic::wait
and atomic::notify implementation detail. The aim of this change
is to simplify the implementation details and position the resulting
implementation so that much of the current header-only detail
can be moved into the shared library, while also accounting for
anticipated changes to wait/notify functionality for C++26.

The previous implementation expressed its spin logic in terms of the
types __default_spin_policy and __timed_backoff_spin_policy, and the
free function __atomic_spin. These are replaced by two new free
functions: __spin_impl and __spin_until_impl. These currently inline
free functions are expected to be moved into the libstdc++ shared
library in a future commit.
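The two-phase idea behind these helpers can be pictured with a short sketch. This is an illustrative reconstruction, not the patch's code: the name spin_impl, the int value type, and the iteration count are all invented here.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical sketch of a bounded spin: poll the atomic a fixed number of
// times and report {finished, last observed value}; on failure the caller
// falls back to a blocking wait.  Signatures are illustrative only.
using wait_result = std::pair<bool, int>;

wait_result
spin_impl(const std::atomic<int>* addr, int old)
{
  for (int i = 0; i < 100; ++i)        // bounded busy-wait
    {
      int val = addr->load(std::memory_order_relaxed);
      if (val != old)
        return {true, val};            // value changed: the wait is over
      std::this_thread::yield();
    }
  return {false, old};                 // still unchanged after the spin
}
```

A deadline-aware variant in the spirit of __spin_until_impl would additionally check a time point each iteration and escalate from yielding to sleeping as elapsed time grows, as the patch's timed code does.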

The previous implementation derived untimed and timed wait
implementation detail from __detail::__waiter_pool_base. This
is-a relationship is removed in the new version and the previous
implementation detail is renamed to reflect this change. The
static _S_for member has been renamed to _S_impl_for to indicate that
it returns the __waiter_pool_impl entry in the static 'side table' for
a given awaited address.
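The 'side table' lookup can be sketched minimally as below; the entry layout, hash, and table size are invented for illustration and do not match the real __waiter_pool_impl.

```cpp
#include <cstdint>

// Hypothetical waiter-pool side table: a fixed array of entries indexed by
// hashing the awaited address, so every wait on a given address shares one
// entry.  Unrelated addresses may collide, which is harmless but slower.
struct waiter_pool_entry
{
  unsigned waiters = 0;   // how many threads are blocked on this entry
  unsigned ver = 0;       // version counter used for proxied waits
};

waiter_pool_entry&
pool_for(const void* addr)
{
  static waiter_pool_entry table[16];
  auto key = reinterpret_cast<std::uintptr_t>(addr);
  return table[(key >> 2) % 16];   // same address always maps to same entry
}
```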

This new implementation replaces all of the non-templated waiting
detail of __waiter_base, __waiter_pool, __waiter, __enters_wait, and
__bare_wait with the __wait_impl free function, and the supporting
__wait_flags enum and __wait_args struct. This currently inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.
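A flags-plus-args shape along the following lines conveys the idea; the enumerators and fields here are invented, and the real __wait_flags/__wait_args differ.

```cpp
#include <cstdint>

// Hypothetical sketch of packing wait options into a flags enum plus an
// aggregate argument struct, so a single __wait_impl-style entry point can
// serve bare waits, proxied waits, and spin-only attempts.
enum class wait_flags : std::uint32_t
{
  none             = 0,
  proxy_wait       = 1 << 0,  // wait on a side-table version counter
  track_contention = 1 << 1,  // maintain the waiter count for notify
  do_spin          = 1 << 2,  // spin briefly before blocking
};

constexpr wait_flags
operator|(wait_flags a, wait_flags b)
{ return wait_flags(std::uint32_t(a) | std::uint32_t(b)); }

constexpr bool
operator&(wait_flags a, wait_flags b)   // flag test, returns bool
{ return std::uint32_t(a) & std::uint32_t(b); }

struct wait_args
{
  int old;                                  // value we wait to change
  int order;                                // memory order for loads
  wait_flags flags = wait_flags::do_spin;   // default: spin, then block
};
```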

This new implementation replaces all of the non-templated notifying
detail of __waiter_base, __waiter_pool, and __waiter with the
__notify_impl free function. This currently inline free function
is expected to be moved into the libstdc++ shared library in a
future commit.

The __atomic_wait_address template function is updated to account
for the above changes and to support the expected C++26 change to
pass the most recently observed value to the caller-supplied predicate.

A new non-templated __atomic_wait_address_v free function is added
that works only for atomic types that operate directly on
__platform_wait_t, and it requires the caller to supply a memory
order. This is intended to be the simplest code path for such types.

The __atomic_wait_address_v template function is now implemented in
terms of the new __atomic_wait_address template and continues to accept
a user-supplied "value function" to retrieve the current value of
the atomic.
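That layering can be sketched as follows; wait_address and wait_address_v are hypothetical stand-ins for the real templates, with a yield in place of the real blocking wait.

```cpp
#include <thread>

// Sketch of the forwarding: the value-based wait builds a predicate from the
// old value and calls the predicate-based wait, whose predicate receives the
// most recently observed value (the anticipated C++26 shape).
template<typename T, typename Pred, typename ValFn>
void
wait_address(Pred pred, ValFn vfn)
{
  T val = vfn();
  while (!pred(val))
    {
      std::this_thread::yield();   // stand-in for the real blocking wait
      val = vfn();
    }
}

template<typename T, typename ValFn>
void
wait_address_v(T old, ValFn vfn)
{
  // "wait until the value differs from old", expressed as a predicate
  wait_address<T>([&](T cur) { return cur != old; }, vfn);
}
```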

The __atomic_notify_address template function is updated to account
for the above changes.

The template __platform_wait_until_impl is renamed to
__platform_wait_until. The previous __platform_wait_until template is
deleted and the functionality it provided is moved to the new template
function __wait_until. A similar change is made to the
__cond_wait_until_impl/__cond_wait_until implementation.
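The clock handling this moves into __wait_until can be sketched as below. The wait_on_clock callback is a hypothetical stand-in for the blocking primitive, which always runs on the wait clock.

```cpp
#include <chrono>

// Sketch of waiting against a caller-supplied clock: convert the deadline to
// the wait clock, and on timeout re-check the caller's clock, because the
// conversion can make the wait-clock deadline fire slightly early.
template<typename WaitFn, typename Clock, typename Dur>
bool
wait_until(WaitFn wait_on_clock,
           const std::chrono::time_point<Clock, Dur>& atime)
{
  using wc = std::chrono::steady_clock;   // stand-in for __wait_clock_t
  auto deadline = wc::now() + (atime - Clock::now());
  if (wait_on_clock(deadline))
    return true;                  // woken before the converted deadline
  return Clock::now() < atime;    // timed out on wc; trust the caller's clock
}
```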

This new implementation similarly replaces all of the non-templated
waiting detail of __timed_waiter_pool, __timed_waiter, etc. with
the new __wait_until_impl free function. This currently inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.

This implementation replaces all templated waiting functions that
manage clock conversion as well as relative waiting (wait_for) with
the new template functions __wait_until and __wait_for.
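The wait_for reduction amounts to rounding the relative timeout up to the wait clock's resolution and anchoring it at now(); a minimal sketch (deadline_for is an invented name):

```cpp
#include <chrono>

// Sketch of the relative-to-absolute conversion behind __wait_for: round the
// caller's duration up to the wait clock's duration type (never shortening
// the wait), then add it to now().  A zero duration is handled separately in
// the patch as a spin-only attempt.
using wait_clock = std::chrono::steady_clock;

template<typename Rep, typename Period>
wait_clock::time_point
deadline_for(const std::chrono::duration<Rep, Period>& rtime)
{
  auto rel = std::chrono::ceil<wait_clock::duration>(rtime);
  return wait_clock::now() + rel;
}
```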

Similarly the previous implementation detail for the various
__atomic_wait_address_Xxx templates is adjusted to account for the
implementation changes outlined above.

All of the "bare wait" versions of __atomic_wait_Xxx have been removed
and replaced with a defaulted boolean __bare_wait parameter on the
new version of these templates.

libstdc++-v3/ChangeLog:

	* include/bits/atomic_timed_wait.h:
	(__detail::__platform_wait_until_impl): Rename to
	__platform_wait_until.
	(__detail::__platform_wait_until): Remove previous
	definition.
	(__detail::__cond_wait_until_impl): Rename to
	__cond_wait_until.
	(__detail::__cond_wait_until): Remove previous
	definition.
	(__detail::__spin_until_impl): New function.
	(__detail::__wait_until_impl): New function.
	(__detail::__wait_until): New function.
	(__detail::__wait_for): New function.
	(__detail::__timed_waiter_pool): Remove type.
	(__detail::__timed_backoff_spin_policy): Remove type.
	(__detail::__timed_waiter): Remove type.
	(__detail::__enters_timed_wait): Remove type alias.
	(__detail::__bare_timed_wait): Remove type alias.
	(__atomic_wait_address_until): Adjust to new implementation
	detail.
	(__atomic_wait_address_until_v): Likewise.
	(__atomic_wait_address_bare): Remove.
	(__atomic_wait_address_for): Adjust to new implementation
	detail.
	(__atomic_wait_address_for_v): Likewise.
	(__atomic_wait_address_for_bare): Remove.
	* include/bits/atomic_wait.h: Include bits/stl_pair.h.
	(__detail::__default_spin_policy): Remove type.
	(__detail::__atomic_spin): Remove function.
	(__detail::__waiter_pool_base): Rename to __waiter_pool_impl.
	Remove _M_notify.  Rename _S_for to _S_impl_for.
	(__detail::__waiter_base): Remove type.
	(__detail::__waiter_pool): Remove type.
	(__detail::__waiter): Remove type.
	(__detail::__enters_wait): Remove type alias.
	(__detail::__bare_wait): Remove type alias.
	(__detail::__wait_flags): New enum.
	(__detail::__wait_args): New struct.
	(__detail::__wait_result_type): New type alias.
	(__detail::__spin_impl): New function.
	(__detail::__wait_impl): New function.
	(__atomic_wait_address): Adjust to new implementation detail.
	(__atomic_wait_address_v): Likewise.
	(__atomic_notify_address): Likewise.
	(__atomic_wait_address_bare): Delete.
	(__atomic_notify_address_bare): Likewise.
	* include/bits/semaphore_base.h: Adjust implementation to
	use new __atomic_wait_address_v contract.
	* include/std/barrier: Adjust implementation to use new
	__atomic_wait contract.
	* include/std/latch: Adjust implementation to use new
	__atomic_wait contract.
	* testsuite/29_atomics/atomic/wait_notify/100334.cc (main):
	Adjust for the __detail::__waiter_pool_base renaming.
---
 libstdc++-v3/include/bits/atomic_timed_wait.h | 549 ++++++++----------
 libstdc++-v3/include/bits/atomic_wait.h       | 486 ++++++++--------
 libstdc++-v3/include/bits/semaphore_base.h    |  53 +-
 libstdc++-v3/include/std/barrier              |   6 +-
 libstdc++-v3/include/std/latch                |   5 +-
 .../29_atomics/atomic/wait_notify/100334.cc   |   4 +-
 6 files changed, 514 insertions(+), 589 deletions(-)
  

Comments

Jonathan Wakely Jan. 28, 2025, 2:10 p.m. UTC | #1
On Fri, 10 Jan 2025 at 21:29, Jonathan Wakely wrote:
>
> [...]
>
> diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
> index 9a6ac95b7d0e..196548484024 100644
> --- a/libstdc++-v3/include/bits/atomic_timed_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
> @@ -76,62 +76,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
>  #define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
>      // returns true if wait ended before timeout
> -    template<typename _Dur>
> -      bool
> -      __platform_wait_until_impl(const __platform_wait_t* __addr,
> -                                __platform_wait_t __old,
> -                                const chrono::time_point<__wait_clock_t, _Dur>&
> -                                     __atime) noexcept
> -      {
> -       auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
> -       auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
> +    bool
> +    __platform_wait_until(const __platform_wait_t* __addr,
> +                         __platform_wait_t __old,
> +                         const __wait_clock_t::time_point& __atime) noexcept
> +    {
> +      auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
> +      auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
>
> -       struct timespec __rt =
> +      struct timespec __rt =
>         {
>           static_cast<std::time_t>(__s.time_since_epoch().count()),
>           static_cast<long>(__ns.count())
>         };
>
> -       auto __e = syscall (SYS_futex, __addr,
> -                           static_cast<int>(__futex_wait_flags::
> -                                               __wait_bitset_private),
> -                           __old, &__rt, nullptr,
> -                           static_cast<int>(__futex_wait_flags::
> -                                               __bitset_match_any));
> -
> -       if (__e)
> -         {
> -           if (errno == ETIMEDOUT)
> -             return false;
> -           if (errno != EINTR && errno != EAGAIN)
> -             __throw_system_error(errno);
> -         }
> -       return true;
> -      }
> -
> -    // returns true if wait ended before timeout
> -    template<typename _Clock, typename _Dur>
> -      bool
> -      __platform_wait_until(const __platform_wait_t* __addr, __platform_wait_t __old,
> -                           const chrono::time_point<_Clock, _Dur>& __atime)
> -      {
> -       if constexpr (is_same_v<__wait_clock_t, _Clock>)
> -         {
> -           return __platform_wait_until_impl(__addr, __old, __atime);
> -         }
> -       else
> -         {
> -           if (!__platform_wait_until_impl(__addr, __old,
> -                                           __to_wait_clock(__atime)))
> -             {
> -               // We got a timeout when measured against __clock_t but
> -               // we need to check against the caller-supplied clock
> -               // to tell whether we should return a timeout.
> -               if (_Clock::now() < __atime)
> -                 return true;
> -             }
> +      auto __e = syscall (SYS_futex, __addr,
> +                         static_cast<int>(__futex_wait_flags::__wait_bitset_private),
> +                         __old, &__rt, nullptr,
> +                         static_cast<int>(__futex_wait_flags::__bitset_match_any));
> +      if (__e)
> +       {
> +         if (errno == ETIMEDOUT)
>             return false;
> -         }
> +         if (errno != EINTR && errno != EAGAIN)
> +           __throw_system_error(errno);
> +       }
> +       return true;
>        }
>  #else
>  // define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement __platform_wait_until()
> @@ -141,15 +111,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  #ifdef _GLIBCXX_HAS_GTHREADS
>      // Returns true if wait ended before timeout.
> -    // _Clock must be either steady_clock or system_clock.
> -    template<typename _Clock, typename _Dur>
> -      bool
> -      __cond_wait_until_impl(__condvar& __cv, mutex& __mx,
> -                            const chrono::time_point<_Clock, _Dur>& __atime)
> -      {
> -       static_assert(std::__is_one_of<_Clock, chrono::steady_clock,
> -                                              chrono::system_clock>::value);
> -
> +    bool
> +    __cond_wait_until(__condvar& __cv, mutex& __mx,
> +                     const __wait_clock_t::time_point& __atime)
> +    {
>         auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
>         auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
>
> @@ -160,293 +125,261 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>           };
>
>  #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
> -       if constexpr (is_same_v<chrono::steady_clock, _Clock>)
> +       if constexpr (is_same_v<chrono::steady_clock, __wait_clock_t>)
>           __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
>         else
>  #endif
>           __cv.wait_until(__mx, __ts);
> -       return _Clock::now() < __atime;
> -      }
> -
> -    // returns true if wait ended before timeout
> -    template<typename _Clock, typename _Dur>
> -      bool
> -      __cond_wait_until(__condvar& __cv, mutex& __mx,
> -         const chrono::time_point<_Clock, _Dur>& __atime)
> -      {
> -#ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
> -       if constexpr (is_same_v<_Clock, chrono::steady_clock>)
> -         return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
> -       else
> -#endif
> -       if constexpr (is_same_v<_Clock, chrono::system_clock>)
> -         return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
> -       else
> -         {
> -           if (__cond_wait_until_impl(__cv, __mx,
> -                                      __to_wait_clock(__atime)))
> -             {
> -               // We got a timeout when measured against __clock_t but
> -               // we need to check against the caller-supplied clock
> -               // to tell whether we should return a timeout.
> -               if (_Clock::now() < __atime)
> -                 return true;
> -             }
> -           return false;
> -         }
> +       return __wait_clock_t::now() < __atime;
>        }
>  #endif // _GLIBCXX_HAS_GTHREADS
>
> -    struct __timed_waiter_pool : __waiter_pool_base
> +    inline __wait_result_type
> +    __spin_until_impl(const __platform_wait_t* __addr, __wait_args __args,
> +                     const __wait_clock_t::time_point& __deadline)
>      {
> -      // returns true if wait ended before timeout
> -      template<typename _Clock, typename _Dur>
> -       bool
> -       _M_do_wait_until(__platform_wait_t* __addr, __platform_wait_t __old,
> -                        const chrono::time_point<_Clock, _Dur>& __atime)
> +      auto __t0 = __wait_clock_t::now();
> +      using namespace literals::chrono_literals;
> +
> +      __platform_wait_t __val;
> +      auto __now = __wait_clock_t::now();
> +      for (; __now < __deadline; __now = __wait_clock_t::now())
> +      {
> +       auto __elapsed = __now - __t0;
> +#ifndef _GLIBCXX_NO_SLEEP
> +       if (__elapsed > 128ms)
> +       {
> +         this_thread::sleep_for(64ms);
> +       }
> +       else if (__elapsed > 64us)
> +       {
> +         this_thread::sleep_for(__elapsed / 2);
> +       }
> +       else
> +#endif
> +       if (__elapsed > 4us)
> +       {
> +         __thread_yield();
> +       }
> +       else
> +       {
> +         auto __res = __detail::__spin_impl(__addr, __args);
> +         if (__res.first)
> +           return __res;
> +       }
> +
> +       __atomic_load(__addr, &__val, __args._M_order);
> +       if (__val != __args._M_old)
> +           return make_pair(true, __val);
> +      }
> +      return make_pair(false, __val);
> +    }
> +
> +    inline __wait_result_type
> +    __wait_until_impl(const __platform_wait_t* __addr, __wait_args __args,
> +                     const __wait_clock_t::time_point& __atime)
> +    {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +      __waiter_pool_impl* __pool = nullptr;
> +#else
> +      // if we don't have __platform_wait, we always need the side-table
> +      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +
> +      __platform_wait_t* __wait_addr;
> +      if (__args & __wait_flags::__proxy_wait)
>         {
>  #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> -         return __platform_wait_until(__addr, __old, __atime);
> -#else
> -         __platform_wait_t __val;
> -         __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
> -         if (__val == __old)
> -           {
> -             lock_guard<mutex> __l(_M_mtx);
> -             return __cond_wait_until(_M_cv, _M_mtx, __atime);
> -           }
> -         else
> -           return true;
> -#endif // _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +         __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +         __wait_addr = &__pool->_M_ver;
> +         __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
>         }
> -    };
> +      else
> +       __wait_addr = const_cast<__platform_wait_t*>(__addr);
>
> -    struct __timed_backoff_spin_policy
> -    {
> -      __wait_clock_t::time_point _M_deadline;
> -      __wait_clock_t::time_point _M_t0;
> +      if (__args & __wait_flags::__do_spin)
> +       {
> +         auto __res = __detail::__spin_until_impl(__wait_addr, __args, __atime);
> +         if (__res.first)
> +           return __res;
> +         if (__args & __wait_flags::__spin_only)
> +           return __res;
> +       }
>
> -      template<typename _Clock, typename _Dur>
> -       __timed_backoff_spin_policy(chrono::time_point<_Clock, _Dur>
> -                                     __deadline = _Clock::time_point::max(),
> -                                   chrono::time_point<_Clock, _Dur>
> -                                     __t0 = _Clock::now()) noexcept
> -         : _M_deadline(__to_wait_clock(__deadline))
> -         , _M_t0(__to_wait_clock(__t0))
> -       { }
> -
> -      bool
> -      operator()() const noexcept
> +      if (!(__args & __wait_flags::__track_contention))
>        {
> -       using namespace literals::chrono_literals;
> -       auto __now = __wait_clock_t::now();
> -       if (_M_deadline <= __now)
> -         return false;
> -
> -       // FIXME: this_thread::sleep_for not available #ifdef _GLIBCXX_NO_SLEEP
> -
> -       auto __elapsed = __now - _M_t0;
> -       if (__elapsed > 128ms)
> -         {
> -           this_thread::sleep_for(64ms);
> -         }
> -       else if (__elapsed > 64us)
> -         {
> -           this_thread::sleep_for(__elapsed / 2);
> -         }
> -       else if (__elapsed > 4us)
> -         {
> -           __thread_yield();
> -         }
> -       else
> -         return false;
> -       return true;
> +       // caller does not externally track contention
> +#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +       __pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
> +                                    : __pool;
> +#endif
> +       __pool->_M_enter_wait();
>        }
> -    };
>
> -    template<typename _EntersWait>
> -      struct __timed_waiter : __waiter_base<__timed_waiter_pool>
> +      __wait_result_type __res;
> +#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +      if (__platform_wait_until(__wait_addr, __args._M_old, __atime))
> +       __res = make_pair(true, __args._M_old);
> +      else
> +       __res = make_pair(false, __args._M_old);
> +#else
> +      __platform_wait_t __val;
> +      __atomic_load(__wait_addr, &__val, __args._M_order);
> +      if (__val == __args._M_old)
> +       {
> +          lock_guard<mutex> __l{ __pool->_M_mtx };
> +          __atomic_load(__wait_addr, &__val, __args._M_order);
> +          if (__val == __args._M_old &&
> +              __cond_wait_until(__pool->_M_cv, __pool->_M_mtx, __atime))
> +            __res = make_pair(true, __val);
> +       }
> +      else
> +       __res = make_pair(false, __val);
> +#endif
> +
> +      if (!(__args & __wait_flags::__track_contention))
> +         // caller does not externally track contention
> +         __pool->_M_leave_wait();
> +      return __res;
> +    }
> +
> +    template<typename _Clock, typename _Dur>
> +      __wait_result_type
> +      __wait_until(const __platform_wait_t* __addr, __wait_args __args,
> +                  const chrono::time_point<_Clock, _Dur>& __atime) noexcept
>        {
> -       using __base_type = __waiter_base<__timed_waiter_pool>;
> -
> -       template<typename _Tp>
> -         __timed_waiter(const _Tp* __addr) noexcept
> -         : __base_type(__addr)
> -       {
> -         if constexpr (_EntersWait::value)
> -           _M_w._M_enter_wait();
> -       }
> -
> -       ~__timed_waiter()
> -       {
> -         if constexpr (_EntersWait::value)
> -           _M_w._M_leave_wait();
> -       }
> -
> -       // returns true if wait ended before timeout
> -       template<typename _Tp, typename _ValFn,
> -                typename _Clock, typename _Dur>
> -         bool
> -         _M_do_wait_until_v(_Tp __old, _ValFn __vfn,
> -                            const chrono::time_point<_Clock, _Dur>&
> -                                                               __atime) noexcept
> +       if constexpr (is_same_v<__wait_clock_t, _Clock>)
> +         return __detail::__wait_until_impl(__addr, __args, __atime);
> +       else
>           {
> -           __platform_wait_t __val;
> -           if (_M_do_spin(__old, std::move(__vfn), __val,
> -                          __timed_backoff_spin_policy(__atime)))
> -             return true;
> -           return __base_type::_M_w._M_do_wait_until(__base_type::_M_addr, __val, __atime);
> -         }
> +            auto __res = __detail::__wait_until_impl(__addr, __args,
> +                                           __to_wait_clock(__atime));
> +            if (!__res.first)
> +              {
> +                 // We got a timeout when measured against __clock_t but
> +                 // we need to check against the caller-supplied clock
> +                 // to tell whether we should return a timeout.
> +                 if (_Clock::now() < __atime)
> +                   return make_pair(true, __res.second);
> +               }
> +             return __res;
> +           }
> +      }
>
> -       // returns true if wait ended before timeout
> -       template<typename _Pred,
> -                typename _Clock, typename _Dur>
> -         bool
> -         _M_do_wait_until(_Pred __pred, __platform_wait_t __val,
> -                         const chrono::time_point<_Clock, _Dur>&
> -                                                             __atime) noexcept
> -         {
> -           for (auto __now = _Clock::now(); __now < __atime;
> -                 __now = _Clock::now())
> -             {
> -               if (__base_type::_M_w._M_do_wait_until(
> -                     __base_type::_M_addr, __val, __atime)
> -                   && __pred())
> -                 return true;
> -
> -               if (__base_type::_M_do_spin(__pred, __val,
> -                              __timed_backoff_spin_policy(__atime, __now)))
> -                 return true;
> -             }
> -           return false;
> -         }
> -
> -       // returns true if wait ended before timeout
> -       template<typename _Pred,
> -                typename _Clock, typename _Dur>
> -         bool
> -         _M_do_wait_until(_Pred __pred,
> -                          const chrono::time_point<_Clock, _Dur>&
> -                                                               __atime) noexcept
> -         {
> -           __platform_wait_t __val;
> -           if (__base_type::_M_do_spin(__pred, __val,
> -                                       __timed_backoff_spin_policy(__atime)))
> -             return true;
> -           return _M_do_wait_until(__pred, __val, __atime);
> -         }
> -
> -       template<typename _Tp, typename _ValFn,
> -                typename _Rep, typename _Period>
> -         bool
> -         _M_do_wait_for_v(_Tp __old, _ValFn __vfn,
> -                          const chrono::duration<_Rep, _Period>&
> -                                                               __rtime) noexcept
> -         {
> -           __platform_wait_t __val;
> -           if (_M_do_spin_v(__old, std::move(__vfn), __val))
> -             return true;
> -
> -           if (!__rtime.count())
> -             return false; // no rtime supplied, and spin did not acquire
> -
> -           auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
> -
> -           return __base_type::_M_w._M_do_wait_until(
> -                                         __base_type::_M_addr,
> -                                         __val,
> -                                         chrono::steady_clock::now() + __reltime);
> -         }
> -
> -       template<typename _Pred,
> -                typename _Rep, typename _Period>
> -         bool
> -         _M_do_wait_for(_Pred __pred,
> -                        const chrono::duration<_Rep, _Period>& __rtime) noexcept
> -         {
> -           __platform_wait_t __val;
> -           if (__base_type::_M_do_spin(__pred, __val))
> -             return true;
> -
> -           if (!__rtime.count())
> -             return false; // no rtime supplied, and spin did not acquire
> -
> -           auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
> -
> -           return _M_do_wait_until(__pred, __val,
> -                                   chrono::steady_clock::now() + __reltime);
> -         }
> -      };
> -
> -    using __enters_timed_wait = __timed_waiter<std::true_type>;
> -    using __bare_timed_wait = __timed_waiter<std::false_type>;
> +    template<typename _Rep, typename _Period>
> +      __wait_result_type
> +      __wait_for(const __platform_wait_t* __addr, __wait_args __args,
> +                const chrono::duration<_Rep, _Period>& __rtime) noexcept
> +    {
> +      if (!__rtime.count())
> +       // no rtime supplied, just spin a bit
> +       return __detail::__wait_impl(__addr, __args | __wait_flags::__spin_only);
> +      auto const __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
> +      auto const __atime = chrono::steady_clock::now() + __reltime;
> +      return __detail::__wait_until(__addr, __args, __atime);
> +    }
>    } // namespace __detail
>
>    // returns true if wait ended before timeout
> +  template<typename _Tp,
> +          typename _Pred, typename _ValFn,
> +          typename _Clock, typename _Dur>
> +    bool
> +    __atomic_wait_address_until(const _Tp* __addr, _Pred&& __pred,
> +                               _ValFn&& __vfn,
> +                               const chrono::time_point<_Clock, _Dur>& __atime,
> +                               bool __bare_wait = false) noexcept
> +    {
> +       const auto __wait_addr =
> +          reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
> +       __detail::__wait_args __args{ __addr, __bare_wait };
> +       _Tp __val = __vfn();
> +       while (!__pred(__val))
> +        {
> +          auto __res = __detail::__wait_until(__wait_addr, __args, __atime);
> +          if (!__res.first)
> +            // timed out
> +            return __res.first; // C++26 will also return last observed __val
> +          __val = __vfn();
> +        }
> +       return true; // C++26 will also return last observed __val
> +    }
> +
> +  template<typename _Clock, typename _Dur>
> +    bool
> +    __atomic_wait_address_until_v(const __detail::__platform_wait_t* __addr,
> +                                 __detail::__platform_wait_t __old,
> +                                 int __order,
> +                                 const chrono::time_point<_Clock, _Dur>& __atime,
> +                                 bool __bare_wait = false) noexcept
> +    {
> +      __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
> +      auto __res = __detail::__wait_until(__addr, __args, __atime);
> +      return __res.first; // C++26 will also return last observed __val
> +    }
> +
>    template<typename _Tp, typename _ValFn,
>            typename _Clock, typename _Dur>
>      bool
>      __atomic_wait_address_until_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
> -                       const chrono::time_point<_Clock, _Dur>&
> -                           __atime) noexcept
> +                                 const chrono::time_point<_Clock, _Dur>& __atime,
> +                                 bool __bare_wait = false) noexcept
>      {
> -      __detail::__enters_timed_wait __w{__addr};
> -      return __w._M_do_wait_until_v(__old, __vfn, __atime);
> +       auto __pfn = [&](const _Tp& __val)
> +          { return !__detail::__atomic_compare(__old, __val); };
> +       return __atomic_wait_address_until(__addr, __pfn, forward<_ValFn>(__vfn),
> +                                         __atime, __bare_wait);
>      }
>
> -  template<typename _Tp, typename _Pred,
> -          typename _Clock, typename _Dur>
> -    bool
> -    __atomic_wait_address_until(const _Tp* __addr, _Pred __pred,
> -                               const chrono::time_point<_Clock, _Dur>&
> -                                                             __atime) noexcept
> -    {
> -      __detail::__enters_timed_wait __w{__addr};
> -      return __w._M_do_wait_until(__pred, __atime);
> -    }
> +  template<typename _Tp,
> +          typename _Pred, typename _ValFn,
> +          typename _Rep, typename _Period>
> +   bool
> +   __atomic_wait_address_for(const _Tp* __addr, _Pred&& __pred,
> +                            _ValFn&& __vfn,
> +                            const chrono::duration<_Rep, _Period>& __rtime,
> +                            bool __bare_wait = false) noexcept
> +  {
> +      const auto __wait_addr =
> +         reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
> +      __detail::__wait_args __args{ __addr, __bare_wait };
> +      _Tp __val = __vfn();
> +      while (!__pred(__val))
> +       {
> +         auto __res = __detail::__wait_for(__wait_addr, __args, __rtime);
> +         if (!__res.first)
> +           // timed out
> +           return __res.first; // C++26 will also return last observed __val
> +         __val = __vfn();
> +       }
> +      return true; // C++26 will also return last observed __val
> +  }
>
> -  template<typename _Pred,
> -          typename _Clock, typename _Dur>
> +  template<typename _Rep, typename _Period>
>      bool
> -    __atomic_wait_address_until_bare(const __detail::__platform_wait_t* __addr,
> -                               _Pred __pred,
> -                               const chrono::time_point<_Clock, _Dur>&
> -                                                             __atime) noexcept
> -    {
> -      __detail::__bare_timed_wait __w{__addr};
> -      return __w._M_do_wait_until(__pred, __atime);
> -    }
> +    __atomic_wait_address_for_v(const __detail::__platform_wait_t* __addr,
> +                               __detail::__platform_wait_t __old,
> +                               int __order,
> +                               const chrono::duration<_Rep, _Period>& __rtime,
> +                               bool __bare_wait = false) noexcept
> +  {
> +    __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
> +    auto __res = __detail::__wait_for(__addr, __args, __rtime);
> +    return __res.first; // C++26 will also return last observed __val
> +  }
>
>    template<typename _Tp, typename _ValFn,
>            typename _Rep, typename _Period>
>      bool
>      __atomic_wait_address_for_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
> -                     const chrono::duration<_Rep, _Period>& __rtime) noexcept
> +                               const chrono::duration<_Rep, _Period>& __rtime,
> +                               bool __bare_wait = false) noexcept
>      {
> -      __detail::__enters_timed_wait __w{__addr};
> -      return __w._M_do_wait_for_v(__old, __vfn, __rtime);
> -    }
> -
> -  template<typename _Tp, typename _Pred,
> -          typename _Rep, typename _Period>
> -    bool
> -    __atomic_wait_address_for(const _Tp* __addr, _Pred __pred,
> -                     const chrono::duration<_Rep, _Period>& __rtime) noexcept
> -    {
> -
> -      __detail::__enters_timed_wait __w{__addr};
> -      return __w._M_do_wait_for(__pred, __rtime);
> -    }
> -
> -  template<typename _Pred,
> -          typename _Rep, typename _Period>
> -    bool
> -    __atomic_wait_address_for_bare(const __detail::__platform_wait_t* __addr,
> -                       _Pred __pred,
> -                       const chrono::duration<_Rep, _Period>& __rtime) noexcept
> -    {
> -      __detail::__bare_timed_wait __w{__addr};
> -      return __w._M_do_wait_for(__pred, __rtime);
> +      auto __pfn = [&](const _Tp& __val)
> +         { return !__detail::__atomic_compare(__old, __val); };
> +      return __atomic_wait_address_for(__addr, __pfn, forward<_ValFn>(__vfn),
> +                                      __rtime, __bare_wait);
>      }
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace std
> diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
> index 6d1554f68a56..18cfc2ef7bd2 100644
> --- a/libstdc++-v3/include/bits/atomic_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_wait.h
> @@ -50,7 +50,8 @@
>  # include <bits/functexcept.h>
>  #endif
>
> -# include <bits/std_mutex.h>  // std::mutex, std::__condvar
> +#include <bits/stl_pair.h>
> +#include <bits/std_mutex.h>  // std::mutex, std::__condvar
>
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
> @@ -134,7 +135,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      __thread_yield() noexcept
>      {
>  #if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD
> -     __gthread_yield();
> +      __gthread_yield();
>  #endif
>      }
>
> @@ -151,38 +152,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      inline constexpr auto __atomic_spin_count_relax = 12;
>      inline constexpr auto __atomic_spin_count = 16;
>
> -    struct __default_spin_policy
> -    {
> -      bool
> -      operator()() const noexcept
> -      { return false; }
> -    };
> -
> -    template<typename _Pred,
> -            typename _Spin = __default_spin_policy>
> -      bool
> -      __atomic_spin(_Pred& __pred, _Spin __spin = _Spin{ }) noexcept
> -      {
> -       for (auto __i = 0; __i < __atomic_spin_count; ++__i)
> -         {
> -           if (__pred())
> -             return true;
> -
> -           if (__i < __atomic_spin_count_relax)
> -             __detail::__thread_relax();
> -           else
> -             __detail::__thread_yield();
> -         }
> -
> -       while (__spin())
> -         {
> -           if (__pred())
> -             return true;
> -         }
> -
> -       return false;
> -      }
> -
>      // return true if equal
>      template<typename _Tp>
>        bool __atomic_compare(const _Tp& __a, const _Tp& __b)
> @@ -191,7 +160,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>         return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
>        }
>
> -    struct __waiter_pool_base
> +    struct __waiter_pool_impl
>      {
>        // Don't use std::hardware_destructive_interference_size here because we
>        // don't want the layout of library types to depend on compiler options.
> @@ -208,7 +177,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
>        __condvar _M_cv;
>  #endif
> -      __waiter_pool_base() = default;
> +      __waiter_pool_impl() = default;
>
>        void
>        _M_enter_wait() noexcept
> @@ -226,256 +195,271 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>         return __res != 0;
>        }
>
> -      void
> -      _M_notify(__platform_wait_t* __addr, [[maybe_unused]] bool __all,
> -               bool __bare) noexcept
> -      {
> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> -       if (__addr == &_M_ver)
> -         {
> -           __atomic_fetch_add(__addr, 1, __ATOMIC_SEQ_CST);
> -           __all = true;
> -         }
> -
> -       if (__bare || _M_waiting())
> -         __platform_notify(__addr, __all);
> -#else
> -       {
> -         lock_guard<mutex> __l(_M_mtx);
> -         __atomic_fetch_add(__addr, 1, __ATOMIC_RELAXED);
> -       }
> -       if (__bare || _M_waiting())
> -         _M_cv.notify_all();
> -#endif
> -      }
> -
> -      static __waiter_pool_base&
> -      _S_for(const void* __addr) noexcept
> +      static __waiter_pool_impl&
> +      _S_impl_for(const void* __addr) noexcept
>        {
>         constexpr __UINTPTR_TYPE__ __ct = 16;
> -       static __waiter_pool_base __w[__ct];
> +       static __waiter_pool_impl __w[__ct];
>         auto __key = ((__UINTPTR_TYPE__)__addr >> 2) % __ct;
>         return __w[__key];
>        }
>      };
>
> -    struct __waiter_pool : __waiter_pool_base
> +    enum class __wait_flags : __UINT_LEAST32_TYPE__
>      {
> -      void
> -      _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept
> -      {
> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> -       __platform_wait(__addr, __old);
> -#else
> -       __platform_wait_t __val;
> -       __atomic_load(__addr, &__val, __ATOMIC_SEQ_CST);
> -       if (__val == __old)
> -         {
> -           lock_guard<mutex> __l(_M_mtx);
> -           __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
> -           if (__val == __old)
> -             _M_cv.wait(_M_mtx);
> -         }
> -#endif // __GLIBCXX_HAVE_PLATFORM_WAIT
> -      }
> +       __abi_version = 0,
> +       __proxy_wait = 1,
> +       __track_contention = 2,
> +       __do_spin = 4,
> +       __spin_only = 8 | __do_spin, // implies __do_spin
> +       __abi_version_mask = 0xffff0000,
>      };
>
> -    template<typename _Tp>
> -      struct __waiter_base
> +    struct __wait_args
> +    {
> +      __platform_wait_t _M_old;
> +      int _M_order = __ATOMIC_ACQUIRE;
> +      __wait_flags _M_flags;
> +
> +      template<typename _Tp>
> +       explicit __wait_args(const _Tp* __addr,
> +                            bool __bare_wait = false) noexcept
> +           : _M_flags{ _S_flags_for(__addr, __bare_wait) }
> +       { }
> +
> +      __wait_args(const __platform_wait_t* __addr, __platform_wait_t __old,
> +                 int __order, bool __bare_wait = false) noexcept
> +         : _M_old{ __old }
> +         , _M_order{ __order }
> +         , _M_flags{ _S_flags_for(__addr, __bare_wait) }
> +       { }
> +
> +      __wait_args(const __wait_args&) noexcept = default;
> +      __wait_args&
> +      operator=(const __wait_args&) noexcept = default;
> +
> +      bool
> +      operator&(__wait_flags __flag) const noexcept
>        {
> -       using __waiter_type = _Tp;
> +        using __t = underlying_type_t<__wait_flags>;
> +        return static_cast<__t>(_M_flags)
> +            & static_cast<__t>(__flag);
> +      }
>
> -       __waiter_type& _M_w;
> -       __platform_wait_t* _M_addr;
> +      __wait_args
> +      operator|(__wait_flags __flag) const noexcept
> +      {
> +       using __t = underlying_type_t<__wait_flags>;
> +       __wait_args __res{ *this };
> +       const auto __flags = static_cast<__t>(__res._M_flags)
> +                            | static_cast<__t>(__flag);
> +       __res._M_flags = __wait_flags{ __flags };
> +       return __res;
> +      }
>
> -       template<typename _Up>
> -         static __platform_wait_t*
> -         _S_wait_addr(const _Up* __a, __platform_wait_t* __b)
> -         {
> -           if constexpr (__platform_wait_uses_type<_Up>)
> -             return reinterpret_cast<__platform_wait_t*>(const_cast<_Up*>(__a));
> -           else
> -             return __b;
> -         }
> +    private:
> +      static int
> +      constexpr _S_default_flags() noexcept
> +      {
> +       using __t = underlying_type_t<__wait_flags>;
> +       return static_cast<__t>(__wait_flags::__abi_version)
> +               | static_cast<__t>(__wait_flags::__do_spin);
> +      }
>
> -       static __waiter_type&
> -       _S_for(const void* __addr) noexcept
> +      template<typename _Tp>
> +       static int
> +       constexpr _S_flags_for(const _Tp*, bool __bare_wait) noexcept
>         {
> -         static_assert(sizeof(__waiter_type) == sizeof(__waiter_pool_base));
> -         auto& res = __waiter_pool_base::_S_for(__addr);
> -         return reinterpret_cast<__waiter_type&>(res);
> +         auto __res = _S_default_flags();
> +         if (!__bare_wait)
> +           __res |= static_cast<int>(__wait_flags::__track_contention);
> +         if constexpr (!__platform_wait_uses_type<_Tp>)
> +           __res |= static_cast<int>(__wait_flags::__proxy_wait);
> +         return __res;
>         }
>
> -       template<typename _Up>
> -         explicit __waiter_base(const _Up* __addr) noexcept
> -           : _M_w(_S_for(__addr))
> -           , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
> -         { }
> -
> -       void
> -       _M_notify(bool __all, bool __bare = false) noexcept
> -       { _M_w._M_notify(_M_addr, __all, __bare); }
> -
> -       template<typename _Up, typename _ValFn,
> -                typename _Spin = __default_spin_policy>
> -         static bool
> -         _S_do_spin_v(__platform_wait_t* __addr,
> -                      const _Up& __old, _ValFn __vfn,
> -                      __platform_wait_t& __val,
> -                      _Spin __spin = _Spin{ })
> -         {
> -           auto const __pred = [=]
> -             { return !__detail::__atomic_compare(__old, __vfn()); };
> -
> -           if constexpr (__platform_wait_uses_type<_Up>)
> -             {
> -               __builtin_memcpy(&__val, &__old, sizeof(__val));
> -             }
> -           else
> -             {
> -               __atomic_load(__addr, &__val, __ATOMIC_ACQUIRE);
> -             }
> -           return __atomic_spin(__pred, __spin);
> -         }
> -
> -       template<typename _Up, typename _ValFn,
> -                typename _Spin = __default_spin_policy>
> -         bool
> -         _M_do_spin_v(const _Up& __old, _ValFn __vfn,
> -                      __platform_wait_t& __val,
> -                      _Spin __spin = _Spin{ })
> -         { return _S_do_spin_v(_M_addr, __old, __vfn, __val, __spin); }
> -
> -       template<typename _Pred,
> -                typename _Spin = __default_spin_policy>
> -         static bool
> -         _S_do_spin(const __platform_wait_t* __addr,
> -                    _Pred __pred,
> -                    __platform_wait_t& __val,
> -                    _Spin __spin = _Spin{ })
> -         {
> -           __atomic_load(__addr, &__val, __ATOMIC_ACQUIRE);
> -           return __atomic_spin(__pred, __spin);
> -         }
> -
> -       template<typename _Pred,
> -                typename _Spin = __default_spin_policy>
> -         bool
> -         _M_do_spin(_Pred __pred, __platform_wait_t& __val,
> -                    _Spin __spin = _Spin{ })
> -         { return _S_do_spin(_M_addr, __pred, __val, __spin); }
> -      };
> -
> -    template<typename _EntersWait>
> -      struct __waiter : __waiter_base<__waiter_pool>
> -      {
> -       using __base_type = __waiter_base<__waiter_pool>;
> -
> -       template<typename _Tp>
> -         explicit __waiter(const _Tp* __addr) noexcept
> -           : __base_type(__addr)
> -         {
> -           if constexpr (_EntersWait::value)
> -             _M_w._M_enter_wait();
> -         }
> -
> -       ~__waiter()
> +      template<typename _Tp>
> +       static int
> +       _S_memory_order_for(const _Tp*, int __order) noexcept
>         {
> -         if constexpr (_EntersWait::value)
> -           _M_w._M_leave_wait();
> +         if constexpr (__platform_wait_uses_type<_Tp>)
> +           return __order;
> +         return __ATOMIC_ACQUIRE;
> +       }
> +    };
> +
> +    using __wait_result_type = pair<bool, __platform_wait_t>;
> +    inline __wait_result_type
> +    __spin_impl(const __platform_wait_t* __addr, __wait_args __args)
> +    {
> +      __platform_wait_t __val;
> +      for (auto __i = 0; __i < __atomic_spin_count; ++__i)
> +       {
> +         __atomic_load(__addr, &__val, __args._M_order);
> +         if (__val != __args._M_old)
> +           return make_pair(true, __val);
> +         if (__i < __atomic_spin_count_relax)
> +           __detail::__thread_relax();
> +         else
> +           __detail::__thread_yield();
> +       }
> +      return make_pair(false, __val);
> +    }
> +
> +    inline __wait_result_type
> +    __wait_impl(const __platform_wait_t* __addr, __wait_args __args)
> +    {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +      __waiter_pool_impl* __pool = nullptr;
> +#else
> +      // if we don't have __platform_wait, we always need the side-table
> +      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +
> +      __platform_wait_t* __wait_addr;
> +      if (__args & __wait_flags::__proxy_wait)
> +       {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +         __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +         __wait_addr = &__pool->_M_ver;
> +         __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
> +       }
> +      else
> +       __wait_addr = const_cast<__platform_wait_t*>(__addr);
> +
> +      if (__args & __wait_flags::__do_spin)
> +       {
> +         auto __res = __detail::__spin_impl(__wait_addr, __args);
> +         if (__res.first)
> +           return __res;
> +         if (__args & __wait_flags::__spin_only)
> +           return __res;
>         }
>
> -       template<typename _Tp, typename _ValFn>
> -         void
> -         _M_do_wait_v(_Tp __old, _ValFn __vfn)
> -         {
> -           do
> -             {
> -               __platform_wait_t __val;
> -               if (__base_type::_M_do_spin_v(__old, __vfn, __val))
> -                 return;
> -               __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
> -             }
> -           while (__detail::__atomic_compare(__old, __vfn()));
> -         }
> +      if (!(__args & __wait_flags::__track_contention))
> +       {
> +         // caller does not externally track contention
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +         __pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
> +                                      : __pool;
> +#endif
> +         __pool->_M_enter_wait();
> +       }
>
> -       template<typename _Pred>
> -         void
> -         _M_do_wait(_Pred __pred) noexcept
> -         {
> -           do
> -             {
> -               __platform_wait_t __val;
> -               if (__base_type::_M_do_spin(__pred, __val))
> -                 return;
> -               __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
> -             }
> -           while (!__pred());
> -         }
> -      };
> +      __wait_result_type __res;
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +      __platform_wait(__wait_addr, __args._M_old);
> +      __res = make_pair(false, __args._M_old);
> +#else
> +      __platform_wait_t __val;
> +      __atomic_load(__wait_addr, &__val, __args._M_order);
> +      if (__val == __args._M_old)
> +       {
> +           lock_guard<mutex> __l{ __pool->_M_mtx };
> +           __atomic_load(__wait_addr, &__val, __args._M_order);
> +           if (__val == __args._M_old)
> +               __pool->_M_cv.wait(__pool->_M_mtx);
> +       }
> +      __res = make_pair(false, __val);
> +#endif
>
> -    using __enters_wait = __waiter<std::true_type>;
> -    using __bare_wait = __waiter<std::false_type>;
> +      if (!(__args & __wait_flags::__track_contention))
> +       // caller does not externally track contention
> +       __pool->_M_leave_wait();
> +      return __res;
> +    }
> +
> +    inline void
> +    __notify_impl(const __platform_wait_t* __addr, [[maybe_unused]] bool __all,
> +                 __wait_args __args)
> +    {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +      __waiter_pool_impl* __pool = nullptr;
> +#else
> +      // if we don't have __platform_notify, we always need the side-table
> +      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +
> +      if (!(__args & __wait_flags::__track_contention))
> +       {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +         __pool = &__waiter_pool_impl::_S_impl_for(__addr);
> +#endif
> +         if (!__pool->_M_waiting())
> +           return;
> +       }
> +
> +      __platform_wait_t* __wait_addr;
> +      if (__args & __wait_flags::__proxy_wait)
> +       {
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +          __pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
> +                                       : __pool;
> +#endif
> +          __wait_addr = &__pool->_M_ver;
> +          __atomic_fetch_add(__wait_addr, 1, __ATOMIC_RELAXED);
> +          __all = true;
> +        }
> +      else
> +       __wait_addr = const_cast<__platform_wait_t*>(__addr);
> +
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> +      __platform_notify(__wait_addr, __all);
> +#else
> +      lock_guard<mutex> __l{ __pool->_M_mtx };
> +      __pool->_M_cv.notify_all();
> +#endif
> +    }
>    } // namespace __detail
>
> +  template<typename _Tp,
> +          typename _Pred, typename _ValFn>
> +    void
> +    __atomic_wait_address(const _Tp* __addr,
> +                         _Pred&& __pred, _ValFn&& __vfn,
> +                         bool __bare_wait = false) noexcept
> +    {
> +      const auto __wait_addr =
> +         reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
> +      __detail::__wait_args __args{ __addr, __bare_wait };
> +      _Tp __val = __vfn();
> +      while (!__pred(__val))
> +       {

There's a serious race condition here, resulting in missed
notifications and causing __atomic_wait_address to block forever.

The problem case is for proxied waits, where __wait_impl waits for the
value of _M_ver to be incremented by __notify_impl.
Let's say thread A checks __pred(__val) here and determines it hasn't
changed yet, then gets time-sliced and thread B runs. Thread B changes
__val and calls notify() which increments _M_ver. Then thread A
continues, calling __wait_impl which loads _M_ver twice, decides it
hasn't changed, and goes to sleep forever waiting for it to change.

Thread A will never wake up, because thread B already sent its
notification. Thread A will never notice that __val has changed,
because it's blocked waiting for _M_ver to change, so never gets to
check __pred(__val) again.

Before this refactoring, the load of _M_ver happened before checking
__pred(__val), which avoids the problem (as long as the value of
_M_ver doesn't wrap around to the same value in between, which is
implausible in practice).


> +         __detail::__wait_impl(__wait_addr, __args);
> +         __val = __vfn();
> +       }
> +      // C++26 will return __val
> +    }
> +
> +  inline void
> +  __atomic_wait_address_v(const __detail::__platform_wait_t* __addr,
> +                         __detail::__platform_wait_t __old,
> +                         int __order)
> +  {
> +    __detail::__wait_args __args{ __addr, __old, __order };
> +    // C++26 will not ignore the return value here
> +    __detail::__wait_impl(__addr, __args);
> +  }
> +
>    template<typename _Tp, typename _ValFn>
>      void
>      __atomic_wait_address_v(const _Tp* __addr, _Tp __old,
>                             _ValFn __vfn) noexcept
>      {
> -      __detail::__enters_wait __w(__addr);
> -      __w._M_do_wait_v(__old, __vfn);
> -    }
> -
> -  template<typename _Tp, typename _Pred>
> -    void
> -    __atomic_wait_address(const _Tp* __addr, _Pred __pred) noexcept
> -    {
> -      __detail::__enters_wait __w(__addr);
> -      __w._M_do_wait(__pred);
> -    }
> -
> -  // This call is to be used by atomic types which track contention externally
> -  template<typename _Pred>
> -    void
> -    __atomic_wait_address_bare(const __detail::__platform_wait_t* __addr,
> -                              _Pred __pred) noexcept
> -    {
> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> -      do
> -       {
> -         __detail::__platform_wait_t __val;
> -         if (__detail::__bare_wait::_S_do_spin(__addr, __pred, __val))
> -           return;
> -         __detail::__platform_wait(__addr, __val);
> -       }
> -      while (!__pred());
> -#else // !_GLIBCXX_HAVE_PLATFORM_WAIT
> -      __detail::__bare_wait __w(__addr);
> -      __w._M_do_wait(__pred);
> -#endif
> +      auto __pfn = [&](const _Tp& __val)
> +         { return !__detail::__atomic_compare(__old, __val); };
> +      __atomic_wait_address(__addr, __pfn, forward<_ValFn>(__vfn));
>      }
>
>    template<typename _Tp>
>      void
> -    __atomic_notify_address(const _Tp* __addr, bool __all) noexcept
> +    __atomic_notify_address(const _Tp* __addr, bool __all,
> +                           bool __bare_wait = false) noexcept
>      {
> -      __detail::__bare_wait __w(__addr);
> -      __w._M_notify(__all);
> +      const auto __wait_addr =
> +         reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
> +      __detail::__wait_args __args{ __addr, __bare_wait };
> +      __detail::__notify_impl(__wait_addr, __all, __args);
>      }
> -
> -  // This call is to be used by atomic types which track contention externally
> -  inline void
> -  __atomic_notify_address_bare(const __detail::__platform_wait_t* __addr,
> -                              bool __all) noexcept
> -  {
> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> -    __detail::__platform_notify(__addr, __all);
> -#else
> -    __detail::__bare_wait __w(__addr);
> -    __w._M_notify(__all, true);
> -#endif
> -  }
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace std
>  #endif // __glibcxx_atomic_wait
> diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
> index d8f9bd8982bf..444a1589fb5a 100644
> --- a/libstdc++-v3/include/bits/semaphore_base.h
> +++ b/libstdc++-v3/include/bits/semaphore_base.h
> @@ -181,10 +181,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      __atomic_semaphore(const __atomic_semaphore&) = delete;
>      __atomic_semaphore& operator=(const __atomic_semaphore&) = delete;
>
> -    static _GLIBCXX_ALWAYS_INLINE bool
> -    _S_do_try_acquire(__detail::__platform_wait_t* __counter) noexcept
> +    static _GLIBCXX_ALWAYS_INLINE __detail::__platform_wait_t
> +    _S_get_current(__detail::__platform_wait_t* __counter) noexcept
> +    {
> +      return __atomic_impl::load(__counter, memory_order::acquire);
> +    }
> +
> +    static _GLIBCXX_ALWAYS_INLINE bool
> +    _S_do_try_acquire(__detail::__platform_wait_t* __counter,
> +                     __detail::__platform_wait_t __old) noexcept
>      {
> -      auto __old = __atomic_impl::load(__counter, memory_order::acquire);
>        if (__old == 0)
>         return false;
>
> @@ -197,17 +203,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      _GLIBCXX_ALWAYS_INLINE void
>      _M_acquire() noexcept
>      {
> -      auto const __pred =
> -       [this] { return _S_do_try_acquire(&this->_M_counter); };
> -      std::__atomic_wait_address_bare(&_M_counter, __pred);
> +      auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
> +      auto const __pred = [this](__detail::__platform_wait_t __cur)
> +       { return _S_do_try_acquire(&this->_M_counter, __cur); };
> +      std::__atomic_wait_address(&_M_counter, __pred, __vfn, true);
>      }
>
>      bool
>      _M_try_acquire() noexcept
>      {
> -      auto const __pred =
> -       [this] { return _S_do_try_acquire(&this->_M_counter); };
> -      return std::__detail::__atomic_spin(__pred);
> +      auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
> +      auto const __pred = [this](__detail::__platform_wait_t __cur)
> +       { return _S_do_try_acquire(&this->_M_counter, __cur); };
> +      return __atomic_wait_address_for(&_M_counter, __pred, __vfn,
> +                                        __detail::__wait_clock_t::duration(),
> +                                        true);
>      }
>
>      template<typename _Clock, typename _Duration>
> @@ -215,21 +225,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        _M_try_acquire_until(const chrono::time_point<_Clock,
>                            _Duration>& __atime) noexcept
>        {
> -       auto const __pred =
> -         [this] { return _S_do_try_acquire(&this->_M_counter); };
> -
> -       return __atomic_wait_address_until_bare(&_M_counter, __pred, __atime);
> +       auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
> +       auto const __pred = [this](__detail::__platform_wait_t __cur)
> +        { return _S_do_try_acquire(&this->_M_counter, __cur); };
> +       return std::__atomic_wait_address_until(&_M_counter,
> +                                               __pred, __vfn, __atime, true);
>        }
>
>      template<typename _Rep, typename _Period>
>        _GLIBCXX_ALWAYS_INLINE bool
> -      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime)
> -       noexcept
> +      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime) noexcept
>        {
> -       auto const __pred =
> -         [this] { return _S_do_try_acquire(&this->_M_counter); };
> -
> -       return __atomic_wait_address_for_bare(&_M_counter, __pred, __rtime);
> +       auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
> +       auto const __pred = [this](__detail::__platform_wait_t __cur)
> +        { return _S_do_try_acquire(&this->_M_counter, __cur); };
> +       return std::__atomic_wait_address_for(&_M_counter,
> +                                             __pred, __vfn, __rtime, true);
>        }
>
>      _GLIBCXX_ALWAYS_INLINE void
> @@ -238,9 +249,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        if (0 < __atomic_impl::fetch_add(&_M_counter, __update, memory_order_release))
>         return;
>        if (__update > 1)
> -       __atomic_notify_address_bare(&_M_counter, true);
> +       __atomic_notify_address(&_M_counter, true, true);
>        else
> -       __atomic_notify_address_bare(&_M_counter, true);
> +       __atomic_notify_address(&_M_counter, true, true);
>  // FIXME - Figure out why this does not wake a waiting thread
>  //     __atomic_notify_address_bare(&_M_counter, false);
>      }
> diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
> index 6c3cfd44697c..62b03d0223f4 100644
> --- a/libstdc++-v3/include/std/barrier
> +++ b/libstdc++-v3/include/std/barrier
> @@ -194,11 +194,7 @@ It looks different from literature pseudocode for two main reasons:
>        wait(arrival_token&& __old_phase) const
>        {
>         __atomic_phase_const_ref_t __phase(_M_phase);
> -       auto const __test_fn = [=]
> -         {
> -           return __phase.load(memory_order_acquire) != __old_phase;
> -         };
> -       std::__atomic_wait_address(&_M_phase, __test_fn);
> +       __phase.wait(__old_phase, memory_order_acquire);
>        }
>
>        void
> diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
> index 9220580613d2..de0afd8989bf 100644
> --- a/libstdc++-v3/include/std/latch
> +++ b/libstdc++-v3/include/std/latch
> @@ -78,8 +78,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      _GLIBCXX_ALWAYS_INLINE void
>      wait() const noexcept
>      {
> -      auto const __pred = [this] { return this->try_wait(); };
> -      std::__atomic_wait_address(&_M_a, __pred);
> +      auto const __vfn = [this] { return this->try_wait(); };
> +      auto const __pred = [this](bool __b) { return __b; };
> +      std::__atomic_wait_address(&_M_a, __pred, __vfn);
>      }
>
>      _GLIBCXX_ALWAYS_INLINE void
> diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
> index 018c0c98d0ec..ec596e316500 100644
> --- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
> +++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
> @@ -54,8 +54,8 @@ main()
>      atom->store(0);
>    }
>
> -  auto a = &std::__detail::__waiter_pool_base::_S_for(reinterpret_cast<char *>(atomics.a[0]));
> -  auto b = &std::__detail::__waiter_pool_base::_S_for(reinterpret_cast<char *>(atomics.a[1]));
> +  auto a = &std::__detail::__waiter_pool_impl::_S_impl_for(reinterpret_cast<char *>(atomics.a[0]));
> +  auto b = &std::__detail::__waiter_pool_impl::_S_impl_for(reinterpret_cast<char *>(atomics.a[1]));
>    VERIFY( a == b );
>
>    auto fut0 = std::async(std::launch::async, [&] { atomics.a[0]->wait(0); });
> --
> 2.47.1
>
  
Jonathan Wakely Jan. 28, 2025, 3:03 p.m. UTC | #2
On Tue, 28 Jan 2025 at 14:10, Jonathan Wakely wrote:
>
> On Fri, 10 Jan 2025 at 21:29, Jonathan Wakely wrote:
> >
> > +  template<typename _Tp,
> > +          typename _Pred, typename _ValFn>
> > +    void
> > +    __atomic_wait_address(const _Tp* __addr,
> > +                         _Pred&& __pred, _ValFn&& __vfn,
> > +                         bool __bare_wait = false) noexcept
> > +    {
> > +      const auto __wait_addr =
> > +         reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
> > +      __detail::__wait_args __args{ __addr, __bare_wait };
> > +      _Tp __val = __vfn();
> > +      while (!__pred(__val))
> > +       {
>
> There's a serious race condition here, resulting in missed
> notifications and causing __atomic_wait_address to block forever.
>
> The problem case is for proxied waits, where __wait_impl waits for the
> value of _M_ver to be incremented by __notify_impl.
> Let's say thread A checks __pred(__val) here and determines it hasn't
> changed yet, then gets time-sliced and thread B runs. Thread B changes
> __val and calls notify() which increments _M_ver. Then thread A
> continues, calling __wait_impl which loads _M_ver twice, decides it
> hasn't changed, and goes to sleep forever waiting for it to change.
>
> Thread A will never wake up, because thread B already sent its
> notification. Thread A will never notice that __val has changed,
> because it's blocked waiting for _M_ver to change, so never gets to
> check __pred(__val) again.
>
> Before this refactoring, the load of _M_ver happened before checking
> __pred(__val), which avoids the problem (as long as the value of
> _M_ver doesn't wrap and

Oops, I hit Ctrl-Enter instead of Ctrl-Backspace, and Ctrl-Enter is 'send' in Gmail.

Loading the initial _M_ver value before checking the predicate works as
long as _M_ver isn't incremented 2^32 times in between, wrapping back to
the same value, which would look as if it had not changed. That's not
strictly correct, but it seems safe to assume that thread B (and any
other threads) won't perform 4 billion notifications in the time it
takes thread A to check the predicate and enter __wait_impl.

But that is more of a concern for P2643, which wants to allow arbitrary
user-defined predicates to be used with the atomic waiting functions.
Such predicates also mean we can't load the previous value of _M_ver and
check __pred(__val) while holding a mutex lock in order to synchronize
properly with atomic::notify(): holding a lock around calls to arbitrary
user-defined predicates is a no-no.
  

Patch

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index 9a6ac95b7d0e..196548484024 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -76,62 +76,32 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
 #define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
     // returns true if wait ended before timeout
-    template<typename _Dur>
-      bool
-      __platform_wait_until_impl(const __platform_wait_t* __addr,
-				 __platform_wait_t __old,
-				 const chrono::time_point<__wait_clock_t, _Dur>&
-				      __atime) noexcept
-      {
-	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
-	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
+    bool
+    __platform_wait_until(const __platform_wait_t* __addr,
+			  __platform_wait_t __old,
+			  const __wait_clock_t::time_point& __atime) noexcept
+    {
+      auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
+      auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
 
-	struct timespec __rt =
+      struct timespec __rt =
 	{
 	  static_cast<std::time_t>(__s.time_since_epoch().count()),
 	  static_cast<long>(__ns.count())
 	};
 
-	auto __e = syscall (SYS_futex, __addr,
-			    static_cast<int>(__futex_wait_flags::
-						__wait_bitset_private),
-			    __old, &__rt, nullptr,
-			    static_cast<int>(__futex_wait_flags::
-						__bitset_match_any));
-
-	if (__e)
-	  {
-	    if (errno == ETIMEDOUT)
-	      return false;
-	    if (errno != EINTR && errno != EAGAIN)
-	      __throw_system_error(errno);
-	  }
-	return true;
-      }
-
-    // returns true if wait ended before timeout
-    template<typename _Clock, typename _Dur>
-      bool
-      __platform_wait_until(const __platform_wait_t* __addr, __platform_wait_t __old,
-			    const chrono::time_point<_Clock, _Dur>& __atime)
-      {
-	if constexpr (is_same_v<__wait_clock_t, _Clock>)
-	  {
-	    return __platform_wait_until_impl(__addr, __old, __atime);
-	  }
-	else
-	  {
-	    if (!__platform_wait_until_impl(__addr, __old,
-					    __to_wait_clock(__atime)))
-	      {
-		// We got a timeout when measured against __clock_t but
-		// we need to check against the caller-supplied clock
-		// to tell whether we should return a timeout.
-		if (_Clock::now() < __atime)
-		  return true;
-	      }
+      auto __e = syscall (SYS_futex, __addr,
+			  static_cast<int>(__futex_wait_flags::__wait_bitset_private),
+			  __old, &__rt, nullptr,
+			  static_cast<int>(__futex_wait_flags::__bitset_match_any));
+      if (__e)
+	{
+	  if (errno == ETIMEDOUT)
 	    return false;
-	  }
+	  if (errno != EINTR && errno != EAGAIN)
+	    __throw_system_error(errno);
+	}
+	return true;
       }
 #else
 // define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement __platform_wait_until()
@@ -141,15 +111,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #ifdef _GLIBCXX_HAS_GTHREADS
     // Returns true if wait ended before timeout.
-    // _Clock must be either steady_clock or system_clock.
-    template<typename _Clock, typename _Dur>
-      bool
-      __cond_wait_until_impl(__condvar& __cv, mutex& __mx,
-			     const chrono::time_point<_Clock, _Dur>& __atime)
-      {
-	static_assert(std::__is_one_of<_Clock, chrono::steady_clock,
-					       chrono::system_clock>::value);
-
+    bool
+    __cond_wait_until(__condvar& __cv, mutex& __mx,
+		      const __wait_clock_t::time_point& __atime)
+    {
 	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
 	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
 
@@ -160,293 +125,261 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  };
 
 #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-	if constexpr (is_same_v<chrono::steady_clock, _Clock>)
+	if constexpr (is_same_v<chrono::steady_clock, __wait_clock_t>)
 	  __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
 	else
 #endif
 	  __cv.wait_until(__mx, __ts);
-	return _Clock::now() < __atime;
-      }
-
-    // returns true if wait ended before timeout
-    template<typename _Clock, typename _Dur>
-      bool
-      __cond_wait_until(__condvar& __cv, mutex& __mx,
-	  const chrono::time_point<_Clock, _Dur>& __atime)
-      {
-#ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-	if constexpr (is_same_v<_Clock, chrono::steady_clock>)
-	  return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
-	else
-#endif
-	if constexpr (is_same_v<_Clock, chrono::system_clock>)
-	  return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
-	else
-	  {
-	    if (__cond_wait_until_impl(__cv, __mx,
-				       __to_wait_clock(__atime)))
-	      {
-		// We got a timeout when measured against __clock_t but
-		// we need to check against the caller-supplied clock
-		// to tell whether we should return a timeout.
-		if (_Clock::now() < __atime)
-		  return true;
-	      }
-	    return false;
-	  }
+	return __wait_clock_t::now() < __atime;
       }
 #endif // _GLIBCXX_HAS_GTHREADS
 
-    struct __timed_waiter_pool : __waiter_pool_base
+    inline __wait_result_type
+    __spin_until_impl(const __platform_wait_t* __addr, __wait_args __args,
+		      const __wait_clock_t::time_point& __deadline)
     {
-      // returns true if wait ended before timeout
-      template<typename _Clock, typename _Dur>
-	bool
-	_M_do_wait_until(__platform_wait_t* __addr, __platform_wait_t __old,
-			 const chrono::time_point<_Clock, _Dur>& __atime)
+      auto __t0 = __wait_clock_t::now();
+      using namespace literals::chrono_literals;
+
+      __platform_wait_t __val;
+      auto __now = __wait_clock_t::now();
+      for (; __now < __deadline; __now = __wait_clock_t::now())
+      {
+	auto __elapsed = __now - __t0;
+#ifndef _GLIBCXX_NO_SLEEP
+	if (__elapsed > 128ms)
+	{
+	  this_thread::sleep_for(64ms);
+	}
+	else if (__elapsed > 64us)
+	{
+	  this_thread::sleep_for(__elapsed / 2);
+	}
+	else
+#endif
+	if (__elapsed > 4us)
+	{
+	  __thread_yield();
+	}
+	else
+	{
+	  auto __res = __detail::__spin_impl(__addr, __args);
+	  if (__res.first)
+	    return __res;
+	}
+
+	__atomic_load(__addr, &__val, __args._M_order);
+	if (__val != __args._M_old)
+	    return make_pair(true, __val);
+      }
+      return make_pair(false, __val);
+    }
+
+    inline __wait_result_type
+    __wait_until_impl(const __platform_wait_t* __addr, __wait_args __args,
+		      const __wait_clock_t::time_point& __atime)
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+      __waiter_pool_impl* __pool = nullptr;
+#else
+      // if we don't have __platform_wait, we always need the side-table
+      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+
+      __platform_wait_t* __wait_addr;
+      if (__args & __wait_flags::__proxy_wait)
 	{
 #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
-	  return __platform_wait_until(__addr, __old, __atime);
-#else
-	  __platform_wait_t __val;
-	  __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
-	  if (__val == __old)
-	    {
-	      lock_guard<mutex> __l(_M_mtx);
-	      return __cond_wait_until(_M_cv, _M_mtx, __atime);
-	    }
-	  else
-	    return true;
-#endif // _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  __wait_addr = &__pool->_M_ver;
+	  __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
 	}
-    };
+      else
+	__wait_addr = const_cast<__platform_wait_t*>(__addr);
 
-    struct __timed_backoff_spin_policy
-    {
-      __wait_clock_t::time_point _M_deadline;
-      __wait_clock_t::time_point _M_t0;
+      if (__args & __wait_flags::__do_spin)
+	{
+	  auto __res = __detail::__spin_until_impl(__wait_addr, __args, __atime);
+	  if (__res.first)
+	    return __res;
+	  if (__args & __wait_flags::__spin_only)
+	    return __res;
+	}
 
-      template<typename _Clock, typename _Dur>
-	__timed_backoff_spin_policy(chrono::time_point<_Clock, _Dur>
-				      __deadline = _Clock::time_point::max(),
-				    chrono::time_point<_Clock, _Dur>
-				      __t0 = _Clock::now()) noexcept
-	  : _M_deadline(__to_wait_clock(__deadline))
-	  , _M_t0(__to_wait_clock(__t0))
-	{ }
-
-      bool
-      operator()() const noexcept
+      if (!(__args & __wait_flags::__track_contention))
       {
-	using namespace literals::chrono_literals;
-	auto __now = __wait_clock_t::now();
-	if (_M_deadline <= __now)
-	  return false;
-
-	// FIXME: this_thread::sleep_for not available #ifdef _GLIBCXX_NO_SLEEP
-
-	auto __elapsed = __now - _M_t0;
-	if (__elapsed > 128ms)
-	  {
-	    this_thread::sleep_for(64ms);
-	  }
-	else if (__elapsed > 64us)
-	  {
-	    this_thread::sleep_for(__elapsed / 2);
-	  }
-	else if (__elapsed > 4us)
-	  {
-	    __thread_yield();
-	  }
-	else
-	  return false;
-	return true;
+	// caller does not externally track contention
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+	__pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
+				     : __pool;
+#endif
+	__pool->_M_enter_wait();
       }
-    };
 
-    template<typename _EntersWait>
-      struct __timed_waiter : __waiter_base<__timed_waiter_pool>
+      __wait_result_type __res;
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+      if (__platform_wait_until(__wait_addr, __args._M_old, __atime))
+	__res = make_pair(true, __args._M_old);
+      else
+	__res = make_pair(false, __args._M_old);
+#else
+      __platform_wait_t __val;
+      __atomic_load(__wait_addr, &__val, __args._M_order);
+      if (__val == __args._M_old)
+	{
+	   lock_guard<mutex> __l{ __pool->_M_mtx };
+	   __atomic_load(__wait_addr, &__val, __args._M_order);
+	   if (__val == __args._M_old &&
+	       __cond_wait_until(__pool->_M_cv, __pool->_M_mtx, __atime))
+	     __res = make_pair(true, __val);
+	}
+      else
+	__res = make_pair(false, __val);
+#endif
+
+      if (!(__args & __wait_flags::__track_contention))
+	  // caller does not externally track contention
+	  __pool->_M_leave_wait();
+      return __res;
+    }
+
+    template<typename _Clock, typename _Dur>
+      __wait_result_type
+      __wait_until(const __platform_wait_t* __addr, __wait_args __args,
+		   const chrono::time_point<_Clock, _Dur>& __atime) noexcept
       {
-	using __base_type = __waiter_base<__timed_waiter_pool>;
-
-	template<typename _Tp>
-	  __timed_waiter(const _Tp* __addr) noexcept
-	  : __base_type(__addr)
-	{
-	  if constexpr (_EntersWait::value)
-	    _M_w._M_enter_wait();
-	}
-
-	~__timed_waiter()
-	{
-	  if constexpr (_EntersWait::value)
-	    _M_w._M_leave_wait();
-	}
-
-	// returns true if wait ended before timeout
-	template<typename _Tp, typename _ValFn,
-		 typename _Clock, typename _Dur>
-	  bool
-	  _M_do_wait_until_v(_Tp __old, _ValFn __vfn,
-			     const chrono::time_point<_Clock, _Dur>&
-								__atime) noexcept
+	if constexpr (is_same_v<__wait_clock_t, _Clock>)
+	  return __detail::__wait_until_impl(__addr, __args, __atime);
+	else
 	  {
-	    __platform_wait_t __val;
-	    if (_M_do_spin(__old, std::move(__vfn), __val,
-			   __timed_backoff_spin_policy(__atime)))
-	      return true;
-	    return __base_type::_M_w._M_do_wait_until(__base_type::_M_addr, __val, __atime);
-	  }
+	     auto __res = __detail::__wait_until_impl(__addr, __args,
+					    __to_wait_clock(__atime));
+	     if (!__res.first)
+	       {
+		  // We got a timeout when measured against __clock_t but
+		  // we need to check against the caller-supplied clock
+		  // to tell whether we should return a timeout.
+		  if (_Clock::now() < __atime)
+		    return make_pair(true, __res.second);
+		}
+	      return __res;
+	    }
+      }
 
-	// returns true if wait ended before timeout
-	template<typename _Pred,
-		 typename _Clock, typename _Dur>
-	  bool
-	  _M_do_wait_until(_Pred __pred, __platform_wait_t __val,
-			  const chrono::time_point<_Clock, _Dur>&
-							      __atime) noexcept
-	  {
-	    for (auto __now = _Clock::now(); __now < __atime;
-		  __now = _Clock::now())
-	      {
-		if (__base_type::_M_w._M_do_wait_until(
-		      __base_type::_M_addr, __val, __atime)
-		    && __pred())
-		  return true;
-
-		if (__base_type::_M_do_spin(__pred, __val,
-			       __timed_backoff_spin_policy(__atime, __now)))
-		  return true;
-	      }
-	    return false;
-	  }
-
-	// returns true if wait ended before timeout
-	template<typename _Pred,
-		 typename _Clock, typename _Dur>
-	  bool
-	  _M_do_wait_until(_Pred __pred,
-			   const chrono::time_point<_Clock, _Dur>&
-								__atime) noexcept
-	  {
-	    __platform_wait_t __val;
-	    if (__base_type::_M_do_spin(__pred, __val,
-					__timed_backoff_spin_policy(__atime)))
-	      return true;
-	    return _M_do_wait_until(__pred, __val, __atime);
-	  }
-
-	template<typename _Tp, typename _ValFn,
-		 typename _Rep, typename _Period>
-	  bool
-	  _M_do_wait_for_v(_Tp __old, _ValFn __vfn,
-			   const chrono::duration<_Rep, _Period>&
-								__rtime) noexcept
-	  {
-	    __platform_wait_t __val;
-	    if (_M_do_spin_v(__old, std::move(__vfn), __val))
-	      return true;
-
-	    if (!__rtime.count())
-	      return false; // no rtime supplied, and spin did not acquire
-
-	    auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
-
-	    return __base_type::_M_w._M_do_wait_until(
-					  __base_type::_M_addr,
-					  __val,
-					  chrono::steady_clock::now() + __reltime);
-	  }
-
-	template<typename _Pred,
-		 typename _Rep, typename _Period>
-	  bool
-	  _M_do_wait_for(_Pred __pred,
-			 const chrono::duration<_Rep, _Period>& __rtime) noexcept
-	  {
-	    __platform_wait_t __val;
-	    if (__base_type::_M_do_spin(__pred, __val))
-	      return true;
-
-	    if (!__rtime.count())
-	      return false; // no rtime supplied, and spin did not acquire
-
-	    auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
-
-	    return _M_do_wait_until(__pred, __val,
-				    chrono::steady_clock::now() + __reltime);
-	  }
-      };
-
-    using __enters_timed_wait = __timed_waiter<std::true_type>;
-    using __bare_timed_wait = __timed_waiter<std::false_type>;
+    template<typename _Rep, typename _Period>
+      __wait_result_type
+      __wait_for(const __platform_wait_t* __addr, __wait_args __args,
+		 const chrono::duration<_Rep, _Period>& __rtime) noexcept
+    {
+      if (!__rtime.count())
+	// no rtime supplied, just spin a bit
+	return __detail::__wait_impl(__addr, __args | __wait_flags::__spin_only);
+      auto const __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
+      auto const __atime = chrono::steady_clock::now() + __reltime;
+      return __detail::__wait_until(__addr, __args, __atime);
+    }
   } // namespace __detail
 
   // returns true if wait ended before timeout
+  template<typename _Tp,
+	   typename _Pred, typename _ValFn,
+	   typename _Clock, typename _Dur>
+    bool
+    __atomic_wait_address_until(const _Tp* __addr, _Pred&& __pred,
+				_ValFn&& __vfn,
+				const chrono::time_point<_Clock, _Dur>& __atime,
+				bool __bare_wait = false) noexcept
+    {
+       const auto __wait_addr =
+	   reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+       __detail::__wait_args __args{ __addr, __bare_wait };
+       _Tp __val = __vfn();
+       while (!__pred(__val))
+	 {
+	   auto __res = __detail::__wait_until(__wait_addr, __args, __atime);
+	   if (!__res.first)
+	     // timed out
+	     return __res.first; // C++26 will also return last observed __val
+	   __val = __vfn();
+	 }
+       return true; // C++26 will also return last observed __val
+    }
+
+  template<typename _Clock, typename _Dur>
+    bool
+    __atomic_wait_address_until_v(const __detail::__platform_wait_t* __addr,
+				  __detail::__platform_wait_t __old,
+				  int __order,
+				  const chrono::time_point<_Clock, _Dur>& __atime,
+				  bool __bare_wait = false) noexcept
+    {
+      __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
+      auto __res = __detail::__wait_until(__addr, __args, __atime);
+      return __res.first; // C++26 will also return last observed __val
+    }
+
   template<typename _Tp, typename _ValFn,
 	   typename _Clock, typename _Dur>
     bool
     __atomic_wait_address_until_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
-			const chrono::time_point<_Clock, _Dur>&
-			    __atime) noexcept
+				  const chrono::time_point<_Clock, _Dur>& __atime,
+				  bool __bare_wait = false) noexcept
     {
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_until_v(__old, __vfn, __atime);
+       auto __pfn = [&](const _Tp& __val)
+	   { return !__detail::__atomic_compare(__old, __val); };
+       return __atomic_wait_address_until(__addr, __pfn, forward<_ValFn>(__vfn),
+					  __atime, __bare_wait);
     }
 
-  template<typename _Tp, typename _Pred,
-	   typename _Clock, typename _Dur>
-    bool
-    __atomic_wait_address_until(const _Tp* __addr, _Pred __pred,
-				const chrono::time_point<_Clock, _Dur>&
-							      __atime) noexcept
-    {
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_until(__pred, __atime);
-    }
+  template<typename _Tp,
+	   typename _Pred, typename _ValFn,
+	   typename _Rep, typename _Period>
+   bool
+   __atomic_wait_address_for(const _Tp* __addr, _Pred&& __pred,
+			     _ValFn&& __vfn,
+			     const chrono::duration<_Rep, _Period>& __rtime,
+			     bool __bare_wait = false) noexcept
+  {
+      const auto __wait_addr =
+	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+      __detail::__wait_args __args{ __addr, __bare_wait };
+      _Tp __val = __vfn();
+      while (!__pred(__val))
+	{
+	  auto __res = __detail::__wait_for(__wait_addr, __args, __rtime);
+	  if (!__res.first)
+	    // timed out
+	    return __res.first; // C++26 will also return last observed __val
+	  __val = __vfn();
+	}
+      return true; // C++26 will also return last observed __val
+  }
 
-  template<typename _Pred,
-	   typename _Clock, typename _Dur>
+  template<typename _Rep, typename _Period>
     bool
-    __atomic_wait_address_until_bare(const __detail::__platform_wait_t* __addr,
-				_Pred __pred,
-				const chrono::time_point<_Clock, _Dur>&
-							      __atime) noexcept
-    {
-      __detail::__bare_timed_wait __w{__addr};
-      return __w._M_do_wait_until(__pred, __atime);
-    }
+    __atomic_wait_address_for_v(const __detail::__platform_wait_t* __addr,
+				__detail::__platform_wait_t __old,
+				int __order,
+				const chrono::time_point<_Rep, _Period>& __rtime,
+				bool __bare_wait = false) noexcept
+  {
+    __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
+    auto __res = __detail::__wait_for(__addr, __args, __rtime);
+    return __res.first; // C++26 will also return last observed __Val
+  }
 
   template<typename _Tp, typename _ValFn,
 	   typename _Rep, typename _Period>
     bool
     __atomic_wait_address_for_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
-		      const chrono::duration<_Rep, _Period>& __rtime) noexcept
+				const chrono::duration<_Rep, _Period>& __rtime,
+				bool __bare_wait = false) noexcept
     {
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_for_v(__old, __vfn, __rtime);
-    }
-
-  template<typename _Tp, typename _Pred,
-	   typename _Rep, typename _Period>
-    bool
-    __atomic_wait_address_for(const _Tp* __addr, _Pred __pred,
-		      const chrono::duration<_Rep, _Period>& __rtime) noexcept
-    {
-
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_for(__pred, __rtime);
-    }
-
-  template<typename _Pred,
-	   typename _Rep, typename _Period>
-    bool
-    __atomic_wait_address_for_bare(const __detail::__platform_wait_t* __addr,
-			_Pred __pred,
-			const chrono::duration<_Rep, _Period>& __rtime) noexcept
-    {
-      __detail::__bare_timed_wait __w{__addr};
-      return __w._M_do_wait_for(__pred, __rtime);
+      auto __pfn = [&](const _Tp& __val)
+	  { return !__detail::__atomic_compare(__old, __val); };
+      return __atomic_wait_address_for(__addr, __pfn, forward<_ValFn>(__vfn),
+				       __rtime, __bare_wait);
     }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 6d1554f68a56..18cfc2ef7bd2 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -50,7 +50,8 @@ 
 # include <bits/functexcept.h>
 #endif
 
-# include <bits/std_mutex.h>  // std::mutex, std::__condvar
+#include <bits/stl_pair.h>
+#include <bits/std_mutex.h>  // std::mutex, std::__condvar
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -134,7 +135,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __thread_yield() noexcept
     {
 #if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD
-     __gthread_yield();
+      __gthread_yield();
 #endif
     }
 
@@ -151,38 +152,6 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     inline constexpr auto __atomic_spin_count_relax = 12;
     inline constexpr auto __atomic_spin_count = 16;
 
-    struct __default_spin_policy
-    {
-      bool
-      operator()() const noexcept
-      { return false; }
-    };
-
-    template<typename _Pred,
-	     typename _Spin = __default_spin_policy>
-      bool
-      __atomic_spin(_Pred& __pred, _Spin __spin = _Spin{ }) noexcept
-      {
-	for (auto __i = 0; __i < __atomic_spin_count; ++__i)
-	  {
-	    if (__pred())
-	      return true;
-
-	    if (__i < __atomic_spin_count_relax)
-	      __detail::__thread_relax();
-	    else
-	      __detail::__thread_yield();
-	  }
-
-	while (__spin())
-	  {
-	    if (__pred())
-	      return true;
-	  }
-
-	return false;
-      }
-
     // return true if equal
     template<typename _Tp>
       bool __atomic_compare(const _Tp& __a, const _Tp& __b)
@@ -191,7 +160,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
       }
 
-    struct __waiter_pool_base
+    struct __waiter_pool_impl
     {
       // Don't use std::hardware_destructive_interference_size here because we
       // don't want the layout of library types to depend on compiler options.
@@ -208,7 +177,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
       __condvar _M_cv;
 #endif
-      __waiter_pool_base() = default;
+      __waiter_pool_impl() = default;
 
       void
       _M_enter_wait() noexcept
@@ -226,256 +195,271 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __res != 0;
       }
 
-      void
-      _M_notify(__platform_wait_t* __addr, [[maybe_unused]] bool __all,
-		bool __bare) noexcept
-      {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-	if (__addr == &_M_ver)
-	  {
-	    __atomic_fetch_add(__addr, 1, __ATOMIC_SEQ_CST);
-	    __all = true;
-	  }
-
-	if (__bare || _M_waiting())
-	  __platform_notify(__addr, __all);
-#else
-	{
-	  lock_guard<mutex> __l(_M_mtx);
-	  __atomic_fetch_add(__addr, 1, __ATOMIC_RELAXED);
-	}
-	if (__bare || _M_waiting())
-	  _M_cv.notify_all();
-#endif
-      }
-
-      static __waiter_pool_base&
-      _S_for(const void* __addr) noexcept
+      static __waiter_pool_impl&
+      _S_impl_for(const void* __addr) noexcept
       {
 	constexpr __UINTPTR_TYPE__ __ct = 16;
-	static __waiter_pool_base __w[__ct];
+	static __waiter_pool_impl __w[__ct];
 	auto __key = ((__UINTPTR_TYPE__)__addr >> 2) % __ct;
 	return __w[__key];
       }
     };
 
-    struct __waiter_pool : __waiter_pool_base
+    enum class __wait_flags : __UINT_LEAST32_TYPE__
     {
-      void
-      _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept
-      {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-	__platform_wait(__addr, __old);
-#else
-	__platform_wait_t __val;
-	__atomic_load(__addr, &__val, __ATOMIC_SEQ_CST);
-	if (__val == __old)
-	  {
-	    lock_guard<mutex> __l(_M_mtx);
-	    __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
-	    if (__val == __old)
-	      _M_cv.wait(_M_mtx);
-	  }
-#endif // __GLIBCXX_HAVE_PLATFORM_WAIT
-      }
+       __abi_version = 0,
+       __proxy_wait = 1,
+       __track_contention = 2,
+       __do_spin = 4,
+       __spin_only = 8 | __do_spin, // implies __do_spin
+       __abi_version_mask = 0xffff0000,
     };
 
-    template<typename _Tp>
-      struct __waiter_base
+    struct __wait_args
+    {
+      __platform_wait_t _M_old;
+      int _M_order = __ATOMIC_ACQUIRE;
+      __wait_flags _M_flags;
+
+      template<typename _Tp>
+	explicit __wait_args(const _Tp* __addr,
+			     bool __bare_wait = false) noexcept
+	    : _M_flags{ _S_flags_for(__addr, __bare_wait) }
+	{ }
+
+      __wait_args(const __platform_wait_t* __addr, __platform_wait_t __old,
+		  int __order, bool __bare_wait = false) noexcept
+	  : _M_old{ __old }
+	  , _M_order{ __order }
+	  , _M_flags{ _S_flags_for(__addr, __bare_wait) }
+	{ }
+
+      __wait_args(const __wait_args&) noexcept = default;
+      __wait_args&
+      operator=(const __wait_args&) noexcept = default;
+
+      bool
+      operator&(__wait_flags __flag) const noexcept
       {
-	using __waiter_type = _Tp;
+	 using __t = underlying_type_t<__wait_flags>;
+	 return static_cast<__t>(_M_flags)
+	     & static_cast<__t>(__flag);
+      }
 
-	__waiter_type& _M_w;
-	__platform_wait_t* _M_addr;
+      __wait_args
+      operator|(__wait_flags __flag) const noexcept
+      {
+	using __t = underlying_type_t<__wait_flags>;
+	__wait_args __res{ *this };
+	const auto __flags = static_cast<__t>(__res._M_flags)
+			     | static_cast<__t>(__flag);
+	__res._M_flags = __wait_flags{ __flags };
+	return __res;
+      }
 
-	template<typename _Up>
-	  static __platform_wait_t*
-	  _S_wait_addr(const _Up* __a, __platform_wait_t* __b)
-	  {
-	    if constexpr (__platform_wait_uses_type<_Up>)
-	      return reinterpret_cast<__platform_wait_t*>(const_cast<_Up*>(__a));
-	    else
-	      return __b;
-	  }
+    private:
+      static int
+      constexpr _S_default_flags() noexcept
+      {
+	using __t = underlying_type_t<__wait_flags>;
+	return static_cast<__t>(__wait_flags::__abi_version)
+		| static_cast<__t>(__wait_flags::__do_spin);
+      }
 
-	static __waiter_type&
-	_S_for(const void* __addr) noexcept
+      template<typename _Tp>
+	static int
+	constexpr _S_flags_for(const _Tp*, bool __bare_wait) noexcept
 	{
-	  static_assert(sizeof(__waiter_type) == sizeof(__waiter_pool_base));
-	  auto& res = __waiter_pool_base::_S_for(__addr);
-	  return reinterpret_cast<__waiter_type&>(res);
+	  auto __res = _S_default_flags();
+	  if (!__bare_wait)
+	    __res |= static_cast<int>(__wait_flags::__track_contention);
+	  if constexpr (!__platform_wait_uses_type<_Tp>)
+	    __res |= static_cast<int>(__wait_flags::__proxy_wait);
+	  return __res;
 	}
 
-	template<typename _Up>
-	  explicit __waiter_base(const _Up* __addr) noexcept
-	    : _M_w(_S_for(__addr))
-	    , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
-	  { }
-
-	void
-	_M_notify(bool __all, bool __bare = false) noexcept
-	{ _M_w._M_notify(_M_addr, __all, __bare); }
-
-	template<typename _Up, typename _ValFn,
-		 typename _Spin = __default_spin_policy>
-	  static bool
-	  _S_do_spin_v(__platform_wait_t* __addr,
-		       const _Up& __old, _ValFn __vfn,
-		       __platform_wait_t& __val,
-		       _Spin __spin = _Spin{ })
-	  {
-	    auto const __pred = [=]
-	      { return !__detail::__atomic_compare(__old, __vfn()); };
-
-	    if constexpr (__platform_wait_uses_type<_Up>)
-	      {
-		__builtin_memcpy(&__val, &__old, sizeof(__val));
-	      }
-	    else
-	      {
-		__atomic_load(__addr, &__val, __ATOMIC_ACQUIRE);
-	      }
-	    return __atomic_spin(__pred, __spin);
-	  }
-
-	template<typename _Up, typename _ValFn,
-		 typename _Spin = __default_spin_policy>
-	  bool
-	  _M_do_spin_v(const _Up& __old, _ValFn __vfn,
-		       __platform_wait_t& __val,
-		       _Spin __spin = _Spin{ })
-	  { return _S_do_spin_v(_M_addr, __old, __vfn, __val, __spin); }
-
-	template<typename _Pred,
-		 typename _Spin = __default_spin_policy>
-	  static bool
-	  _S_do_spin(const __platform_wait_t* __addr,
-		     _Pred __pred,
-		     __platform_wait_t& __val,
-		     _Spin __spin = _Spin{ })
-	  {
-	    __atomic_load(__addr, &__val, __ATOMIC_ACQUIRE);
-	    return __atomic_spin(__pred, __spin);
-	  }
-
-	template<typename _Pred,
-		 typename _Spin = __default_spin_policy>
-	  bool
-	  _M_do_spin(_Pred __pred, __platform_wait_t& __val,
-		     _Spin __spin = _Spin{ })
-	  { return _S_do_spin(_M_addr, __pred, __val, __spin); }
-      };
-
-    template<typename _EntersWait>
-      struct __waiter : __waiter_base<__waiter_pool>
-      {
-	using __base_type = __waiter_base<__waiter_pool>;
-
-	template<typename _Tp>
-	  explicit __waiter(const _Tp* __addr) noexcept
-	    : __base_type(__addr)
-	  {
-	    if constexpr (_EntersWait::value)
-	      _M_w._M_enter_wait();
-	  }
-
-	~__waiter()
+      template<typename _Tp>
+	static int
+	_S_memory_order_for(const _Tp*, int __order) noexcept
 	{
-	  if constexpr (_EntersWait::value)
-	    _M_w._M_leave_wait();
+	  if constexpr (__platform_wait_uses_type<_Tp>)
+	    return __order;
+	  return __ATOMIC_ACQUIRE;
+	}
+    };
+
+    using __wait_result_type = pair<bool, __platform_wait_t>;
+    inline __wait_result_type
+    __spin_impl(const __platform_wait_t* __addr, __wait_args __args)
+    {
+      __platform_wait_t __val;
+      for (auto __i = 0; __i < __atomic_spin_count; ++__i)
+	{
+	  __atomic_load(__addr, &__val, __args._M_order);
+	  if (__val != __args._M_old)
+	    return make_pair(true, __val);
+	  if (__i < __atomic_spin_count_relax)
+	    __detail::__thread_relax();
+	  else
+	    __detail::__thread_yield();
+	}
+      return make_pair(false, __val);
+    }
+
+    inline __wait_result_type
+    __wait_impl(const __platform_wait_t* __addr, __wait_args __args)
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __waiter_pool_impl* __pool = nullptr;
+#else
+      // if we don't have __platform_wait, we always need the side-table
+      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+
+      __platform_wait_t* __wait_addr;
+      if (__args & __wait_flags::__proxy_wait)
+	{
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  __wait_addr = &__pool->_M_ver;
+	  __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
+	}
+      else
+	__wait_addr = const_cast<__platform_wait_t*>(__addr);
+
+      if (__args & __wait_flags::__do_spin)
+	{
+	  auto __res = __detail::__spin_impl(__wait_addr, __args);
+	  if (__res.first)
+	    return __res;
+	  if (__args & __wait_flags::__spin_only)
+	    return __res;
 	}
 
-	template<typename _Tp, typename _ValFn>
-	  void
-	  _M_do_wait_v(_Tp __old, _ValFn __vfn)
-	  {
-	    do
-	      {
-		__platform_wait_t __val;
-		if (__base_type::_M_do_spin_v(__old, __vfn, __val))
-		  return;
-		__base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
-	      }
-	    while (__detail::__atomic_compare(__old, __vfn()));
-	  }
+      if (__args & __wait_flags::__track_contention)
+	{
+	  // caller does not externally track contention
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  if (__pool == nullptr)
+	    __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  __pool->_M_enter_wait();
+	}
 
-	template<typename _Pred>
-	  void
-	  _M_do_wait(_Pred __pred) noexcept
-	  {
-	    do
-	      {
-		__platform_wait_t __val;
-		if (__base_type::_M_do_spin(__pred, __val))
-		  return;
-		__base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
-	      }
-	    while (!__pred());
-	  }
-      };
 
+      __wait_result_type __res;
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __platform_wait(__wait_addr, __args._M_old);
+      __res = make_pair(false, __args._M_old);
+#else
+      __platform_wait_t __val;
+      __atomic_load(__wait_addr, &__val, __args._M_order);
+      if (__val == __args._M_old)
+	{
+	  lock_guard<mutex> __l{ __pool->_M_mtx };
+	  __atomic_load(__wait_addr, &__val, __args._M_order);
+	  if (__val == __args._M_old)
+	    __pool->_M_cv.wait(__pool->_M_mtx);
+	}
+      __res = make_pair(false, __val);
+#endif
 
-    using __enters_wait = __waiter<std::true_type>;
-    using __bare_wait = __waiter<std::false_type>;
+      if (__args & __wait_flags::__track_contention)
+	// caller does not externally track contention
+	__pool->_M_leave_wait();
+      return __res;
+    }
+
+    inline void
+    __notify_impl(const __platform_wait_t* __addr, [[maybe_unused]] bool __all,
+		  __wait_args __args)
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __waiter_pool_impl* __pool = nullptr;
+#else
+      // if we don't have __platform_notify, we always need the side-table
+      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+
+      if (__args & __wait_flags::__track_contention)
+	{
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  if (!__pool->_M_waiting())
+	    return;
+	}
+
+      __platform_wait_t* __wait_addr;
+      if (__args & __wait_flags::__proxy_wait)
+	{
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  if (__pool == nullptr)
+	    __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  __wait_addr = &__pool->_M_ver;
+	  __atomic_fetch_add(__wait_addr, 1, __ATOMIC_RELAXED);
+	  __all = true;
+	}
+      else
+	__wait_addr = const_cast<__platform_wait_t*>(__addr);
+
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __platform_notify(__wait_addr, __all);
+#else
+      lock_guard<mutex> __l{ __pool->_M_mtx };
+      __pool->_M_cv.notify_all();
+#endif
+    }
   } // namespace __detail
 
+  template<typename _Tp, typename _Pred, typename _ValFn>
+    void
+    __atomic_wait_address(const _Tp* __addr,
+			  _Pred&& __pred, _ValFn&& __vfn,
+			  bool __bare_wait = false) noexcept
+    {
+      const auto __wait_addr =
+	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+      __detail::__wait_args __args{ __addr, __bare_wait };
+      _Tp __val = __vfn();
+      while (!__pred(__val))
+	{
+	  __detail::__wait_impl(__wait_addr, __args);
+	  __val = __vfn();
+	}
+      // C++26 will return __val
+    }
+
+  inline void
+  __atomic_wait_address_v(const __detail::__platform_wait_t* __addr,
+			  __detail::__platform_wait_t __old,
+			  int __order)
+  {
+    __detail::__wait_args __args{ __addr, __old, __order };
+    // C++26 will not ignore the return value here
+    __detail::__wait_impl(__addr, __args);
+  }
+
   template<typename _Tp, typename _ValFn>
     void
     __atomic_wait_address_v(const _Tp* __addr, _Tp __old,
 			    _ValFn __vfn) noexcept
     {
-      __detail::__enters_wait __w(__addr);
-      __w._M_do_wait_v(__old, __vfn);
-    }
-
-  template<typename _Tp, typename _Pred>
-    void
-    __atomic_wait_address(const _Tp* __addr, _Pred __pred) noexcept
-    {
-      __detail::__enters_wait __w(__addr);
-      __w._M_do_wait(__pred);
-    }
-
-  // This call is to be used by atomic types which track contention externally
-  template<typename _Pred>
-    void
-    __atomic_wait_address_bare(const __detail::__platform_wait_t* __addr,
-			       _Pred __pred) noexcept
-    {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-      do
-	{
-	  __detail::__platform_wait_t __val;
-	  if (__detail::__bare_wait::_S_do_spin(__addr, __pred, __val))
-	    return;
-	  __detail::__platform_wait(__addr, __val);
-	}
-      while (!__pred());
-#else // !_GLIBCXX_HAVE_PLATFORM_WAIT
-      __detail::__bare_wait __w(__addr);
-      __w._M_do_wait(__pred);
-#endif
+      auto __pfn = [&](const _Tp& __val)
+	  { return !__detail::__atomic_compare(__old, __val); };
+      __atomic_wait_address(__addr, __pfn, forward<_ValFn>(__vfn));
     }
 
   template<typename _Tp>
     void
-    __atomic_notify_address(const _Tp* __addr, bool __all) noexcept
+    __atomic_notify_address(const _Tp* __addr, bool __all,
+			    bool __bare_wait = false) noexcept
     {
-      __detail::__bare_wait __w(__addr);
-      __w._M_notify(__all);
+      const auto __wait_addr =
+	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+      __detail::__wait_args __args{ __addr, __bare_wait };
+      __detail::__notify_impl(__wait_addr, __all, __args);
     }
-
-  // This call is to be used by atomic types which track contention externally
-  inline void
-  __atomic_notify_address_bare(const __detail::__platform_wait_t* __addr,
-			       bool __all) noexcept
-  {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-    __detail::__platform_notify(__addr, __all);
-#else
-    __detail::__bare_wait __w(__addr);
-    __w._M_notify(__all, true);
-#endif
-  }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 #endif // __glibcxx_atomic_wait
diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
index d8f9bd8982bf..444a1589fb5a 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -181,10 +181,16 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __atomic_semaphore(const __atomic_semaphore&) = delete;
     __atomic_semaphore& operator=(const __atomic_semaphore&) = delete;
 
-    static _GLIBCXX_ALWAYS_INLINE bool
-    _S_do_try_acquire(__detail::__platform_wait_t* __counter) noexcept
+    static _GLIBCXX_ALWAYS_INLINE __detail::__platform_wait_t
+    _S_get_current(__detail::__platform_wait_t* __counter) noexcept
+    {
+      return __atomic_impl::load(__counter, memory_order::acquire);
+    }
+
+    static _GLIBCXX_ALWAYS_INLINE bool
+    _S_do_try_acquire(__detail::__platform_wait_t* __counter,
+		      __detail::__platform_wait_t __old) noexcept
     {
-      auto __old = __atomic_impl::load(__counter, memory_order::acquire);
       if (__old == 0)
 	return false;
 
@@ -197,17 +203,21 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _GLIBCXX_ALWAYS_INLINE void
     _M_acquire() noexcept
     {
-      auto const __pred =
-	[this] { return _S_do_try_acquire(&this->_M_counter); };
-      std::__atomic_wait_address_bare(&_M_counter, __pred);
+      auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+      auto const __pred = [this](__detail::__platform_wait_t __cur)
+	{ return _S_do_try_acquire(&this->_M_counter, __cur); };
+      std::__atomic_wait_address(&_M_counter, __pred, __vfn, true);
     }
 
     bool
     _M_try_acquire() noexcept
     {
-      auto const __pred =
-	[this] { return _S_do_try_acquire(&this->_M_counter); };
-      return std::__detail::__atomic_spin(__pred);
+      auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+      auto const __pred = [this](__detail::__platform_wait_t __cur)
+	{ return _S_do_try_acquire(&this->_M_counter, __cur); };
+      return __atomic_wait_address_for(&_M_counter, __pred, __vfn,
+					 __detail::__wait_clock_t::duration(),
+					 true);
     }
 
     template<typename _Clock, typename _Duration>
@@ -215,21 +225,22 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_try_acquire_until(const chrono::time_point<_Clock,
 			   _Duration>& __atime) noexcept
       {
-	auto const __pred =
-	  [this] { return _S_do_try_acquire(&this->_M_counter); };
-
-	return __atomic_wait_address_until_bare(&_M_counter, __pred, __atime);
+	auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+	auto const __pred = [this](__detail::__platform_wait_t __cur)
+	 { return _S_do_try_acquire(&this->_M_counter, __cur); };
+	return std::__atomic_wait_address_until(&_M_counter,
+						__pred, __vfn, __atime, true);
       }
 
     template<typename _Rep, typename _Period>
       _GLIBCXX_ALWAYS_INLINE bool
-      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime)
-	noexcept
+      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime) noexcept
       {
-	auto const __pred =
-	  [this] { return _S_do_try_acquire(&this->_M_counter); };
-
-	return __atomic_wait_address_for_bare(&_M_counter, __pred, __rtime);
+	auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+	auto const __pred = [this](__detail::__platform_wait_t __cur)
+	 { return _S_do_try_acquire(&this->_M_counter, __cur); };
+	return std::__atomic_wait_address_for(&_M_counter,
+					      __pred, __vfn, __rtime, true);
       }
 
     _GLIBCXX_ALWAYS_INLINE void
@@ -238,9 +249,9 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       if (0 < __atomic_impl::fetch_add(&_M_counter, __update, memory_order_release))
 	return;
       if (__update > 1)
-	__atomic_notify_address_bare(&_M_counter, true);
+	__atomic_notify_address(&_M_counter, true, true);
       else
-	__atomic_notify_address_bare(&_M_counter, true);
+	__atomic_notify_address(&_M_counter, true, true);
 // FIXME - Figure out why this does not wake a waiting thread
 //	__atomic_notify_address_bare(&_M_counter, false);
     }
diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index 6c3cfd44697c..62b03d0223f4 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -194,11 +194,7 @@  It looks different from literature pseudocode for two main reasons:
       wait(arrival_token&& __old_phase) const
       {
 	__atomic_phase_const_ref_t __phase(_M_phase);
-	auto const __test_fn = [=]
-	  {
-	    return __phase.load(memory_order_acquire) != __old_phase;
-	  };
-	std::__atomic_wait_address(&_M_phase, __test_fn);
+	__phase.wait(__old_phase, memory_order_acquire);
       }
 
       void
diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
index 9220580613d2..de0afd8989bf 100644
--- a/libstdc++-v3/include/std/latch
+++ b/libstdc++-v3/include/std/latch
@@ -78,8 +78,9 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _GLIBCXX_ALWAYS_INLINE void
     wait() const noexcept
     {
-      auto const __pred = [this] { return this->try_wait(); };
-      std::__atomic_wait_address(&_M_a, __pred);
+      auto const __vfn = [this] { return this->try_wait(); };
+      auto const __pred = [this](bool __b) { return __b; };
+      std::__atomic_wait_address(&_M_a, __pred, __vfn);
     }
 
     _GLIBCXX_ALWAYS_INLINE void
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
index 018c0c98d0ec..ec596e316500 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
@@ -54,8 +54,8 @@  main()
     atom->store(0);
   }
 
-  auto a = &std::__detail::__waiter_pool_base::_S_for(reinterpret_cast<char *>(atomics.a[0]));
-  auto b = &std::__detail::__waiter_pool_base::_S_for(reinterpret_cast<char *>(atomics.a[1]));
+  auto a = &std::__detail::__waiter_pool_impl::_S_impl_for(reinterpret_cast<char *>(atomics.a[0]));
+  auto b = &std::__detail::__waiter_pool_impl::_S_impl_for(reinterpret_cast<char *>(atomics.a[1]));
   VERIFY( a == b );
 
   auto fut0 = std::async(std::launch::async, [&] { atomics.a[0]->wait(0); });