[committed] libstdc++: Optimize ref-count updates in COW std::string
Commit Message
Tested x86_64-linux, pushed to trunk.
Most ref-count updates in the COW string are done via the functions in
<ext/atomicity.h>, which will use non-atomic ops when the program is
known to be single-threaded. The _M_is_leaked() and _M_is_shared()
functions use __atomic_load_n directly, because <ext/atomicity.h>
doesn't provide a load operation. Those functions can check the
__is_single_threaded() predicate to avoid using __atomic_load_n when not
needed.
The move constructor for the fully-dynamic-string increments the
ref-count by either 2 or 1, for leaked or non-leaked strings
respectively. That can be changed to use a non-atomic store of 1 for all
non-shared strings. It can be non-atomic because even if the program is
multi-threaded, conflicting access to the rvalue object while it's being
moved from would be data race anyway. It can store 1 directly for all
non-shared strings because it doesn't matter whether the initial
refcount was -1 or 0, it should be 1 after the move constructor creates
a second owner.
libstdc++-v3/ChangeLog:
* include/bits/cow_string.h (basic_string::_M_is_leaked): Use
non-atomic load when __is_single_threaded() is true.
(basic_string::_M_is_shared): Likewise.
(basic_string::(basic_string&&)) [_GLIBCXX_FULLY_DYNAMIC_STRING]:
Use non-atomic store when rvalue is not shared.
---
libstdc++-v3/include/bits/cow_string.h | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
Comments
* Jonathan Wakely via Libstdc:
> diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
> index ced395b80b8..4fae1d02981 100644
> --- a/libstdc++-v3/include/bits/cow_string.h
> +++ b/libstdc++-v3/include/bits/cow_string.h
> @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> * destroy the empty-string _Rep object.
> *
> * All but the last paragraph is considered pretty conventional
> - * for a C++ string implementation.
> + * for a Copy-On-Write C++ string implementation.
> */
> // 21.3 Template class basic_string
> template<typename _CharT, typename _Traits, typename _Alloc>
> @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> // so we need to use an atomic load. However, _M_is_leaked
> // predicate does not change concurrently (i.e. the string is either
> // leaked or not), so a relaxed load is enough.
> - return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> -#else
> - return this->_M_refcount < 0;
> + if (!__gnu_cxx::__is_single_threaded())
> + return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> #endif
> + return this->_M_refcount < 0;
> }
Relaxed MO loads of word-size values on all current architectures only
have a compiler barrier, so I think the optimization makes things worse?
(I doubt the conditional lack of a compiler barrier leads to
optimization improvements elsewhere.)
Thanks,
Florian
On Wed, 1 Dec 2021 at 18:16, Florian Weimer wrote:
>
> * Jonathan Wakely via Libstdc:
>
> > diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
> > index ced395b80b8..4fae1d02981 100644
> > --- a/libstdc++-v3/include/bits/cow_string.h
> > +++ b/libstdc++-v3/include/bits/cow_string.h
> > @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > * destroy the empty-string _Rep object.
> > *
> > * All but the last paragraph is considered pretty conventional
> > - * for a C++ string implementation.
> > + * for a Copy-On-Write C++ string implementation.
> > */
> > // 21.3 Template class basic_string
> > template<typename _CharT, typename _Traits, typename _Alloc>
> > @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > // so we need to use an atomic load. However, _M_is_leaked
> > // predicate does not change concurrently (i.e. the string is either
> > // leaked or not), so a relaxed load is enough.
> > - return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> > -#else
> > - return this->_M_refcount < 0;
> > + if (!__gnu_cxx::__is_single_threaded())
> > + return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> > #endif
> > + return this->_M_refcount < 0;
> > }
>
> Relaxed MO loads of word-size values on all current architectures only
> have a compiler barrier, so I think the optimization makes things worse?
Hmm, yes.
> (I doubt the conditional lack of a compiler barrier leads to
> optimization improvements elsewhere.)
Probably not. I'll revert the change to _M_is_leaked() and just keep
it for _M_is_shared().
Thanks for pointing that out.
@@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* destroy the empty-string _Rep object.
*
* All but the last paragraph is considered pretty conventional
- * for a C++ string implementation.
+ * for a Copy-On-Write C++ string implementation.
*/
// 21.3 Template class basic_string
template<typename _CharT, typename _Traits, typename _Alloc>
@@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// so we need to use an atomic load. However, _M_is_leaked
// predicate does not change concurrently (i.e. the string is either
// leaked or not), so a relaxed load is enough.
- return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
-#else
- return this->_M_refcount < 0;
+ if (!__gnu_cxx::__is_single_threaded())
+ return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
#endif
+ return this->_M_refcount < 0;
}
bool
@@ -222,10 +222,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// but one reference concurrently with this check, so we need this
// load to be acquire to synchronize with release fetch_and_add in
// _M_dispose.
- return __atomic_load_n(&this->_M_refcount, __ATOMIC_ACQUIRE) > 0;
-#else
- return this->_M_refcount > 0;
+ if (!__gnu_cxx::__is_single_threaded())
+ return __atomic_load_n(&this->_M_refcount, __ATOMIC_ACQUIRE) > 0;
#endif
+ return this->_M_refcount > 0;
}
void
@@ -629,12 +629,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
#else
// Rather than allocate an empty string for the rvalue string,
// just share ownership with it by incrementing the reference count.
- // If the rvalue string was "leaked" then it was the unique owner,
- // so need an extra increment to indicate shared ownership.
- if (_M_rep()->_M_is_leaked())
- __gnu_cxx::__atomic_add_dispatch(&_M_rep()->_M_refcount, 2);
- else
+ // If the rvalue string was the unique owner then there are exactly
+ // two owners now.
+ if (_M_rep()->_M_is_shared())
__gnu_cxx::__atomic_add_dispatch(&_M_rep()->_M_refcount, 1);
+ else
+ _M_rep()->_M_refcount = 1;
#endif
}