[committed] libstdc++: Fix regression in memory use when constructing paths

Message ID YWdkRVBp41FbiG0K@redhat.com
State Committed
Headers
Series [committed] libstdc++: Fix regression in memory use when constructing paths |

Commit Message

Jonathan Wakely Oct. 13, 2021, 10:57 p.m. UTC
  On 13/10/21 21:19 +0100, Jonathan Wakely wrote:
>On 13/10/21 20:41 +0100, Jonathan Wakely wrote:
>>Adjust the __detail::__effective_range overloads so they always return a
>>string or string view using std::char_traits, because we don't care
>>about the traits of an incoming string.
>>
>>Use std::contiguous_iterator in the __effective_range(const Source&)
>>overload, to allow returning a basic_string_view in more cases. For the
>>non-contiguous cases in both __effective_range and __string_from_range,
>>return a std::string instead of std::u8string when the value type of the
>>range is char8_t.  These changes avoid unnecessary basic_string
>>temporaries.
>
>[...]
>
>>  template<typename _InputIterator>
>>    inline auto
>>    __string_from_range(_InputIterator __first, _InputIterator __last)
>>    {
>>      using _EcharT
>>	= typename std::iterator_traits<_InputIterator>::value_type;
>>-      static_assert(__is_encoded_char<_EcharT>);
>>+      static_assert(__is_encoded_char<_EcharT>); // C++17 [fs.req]/3
>>
>>-#if __cpp_lib_concepts
>>-      constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>;
>>-#else
>>-      constexpr bool __contiguous
>>-	= is_pointer_v<decltype(std::__niter_base(__first))>;
>>-#endif
>>-      if constexpr (__contiguous)
>>+      if constexpr (__is_contiguous<_InputIterator>)
>
>Oops, this pessimiszes construction from string::iterator and
>vector::iterator in C++17 mode, because the new __is_contiguous
>variable template just uses is_pointer_v, without the __niter_base
>call that unwraps a __normal_iterator.
>
>That means that we now create a basic_string<C> temporary where we
>previously jsut returned a basic_string_view<C>.
>
>I am testing a fix.

Here's the fix, committed to trunk.

With this change there are no temporaries for string and vector
iterators passed to path constructors. For debug mode there are some
unnecessary temporaries for vector::iterator arguments, because the
safe iterator isn't recognized as contiguous in C++17 mode (but it's
fine in C++20 mode).

The bytes allocated to construct a path consisting of a single
filename with 38 characters are:

Constructor arguments               GCC 11 | GCC 12 | 11 debug | 12 debug
-------------------------------------------|--------|----------|---------
const char*                           39   |   39   |    39    |   39
const char(&)[N]                      39   |   39   |    39    |   39
const char8_t*                        39   |   39   |    39    |   39
const char8_t(&)[N]                   39   |   39   |    39    |   39
std::string::iterator pair            39   |   39   |    39    |   39
std::string::iterator NTCTS           92   |   39   |    92    |   39
std::u8string::iterator pair          39   |   39   |    39    |   39
std::u8string::iterator NTCTS        131   |   39   |   131    |   39
std::vector<char8_t>::iterator pair   39   |   39   |    78    |   78
std::vector<char8_t>::iterator NTCTS 131   |   39   |   131    |   39
std::list<char8_t>::iterator pair     78   |   78   |    78    |   78
std::list<char8_t>::iterator NTCTS   131   |  131   |   131    |  131

So for GCC 12 there are no unwanted allocations unless the iterators
are not contiguous (e.g. std::list::iterator). In that case we need to
construct a temporary string. For the pair(Iter, Iter) constructor we
can use distance(first, last) to size that temporary string correctly,
but for the path(const Source&) constructor we read one character at a
time from the input and call push_back.

In any case, the regression is fixed and we're at least as good as GCC
11 in all cases now.
  

Patch

commit f874a13ca3870a56036a90758b0d41c8c217f4f7
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Oct 13 21:32:14 2021

    libstdc++: Fix regression in memory use when constructing paths
    
    The changes in r12-4381 were intended to reduce memory usage, but
    replacing the __contiguous constant in __string_from_range with the new
    __is_contiguous variable template caused a regression. The old code
    checked is_pointer_v<decltype(std::__niter_base(__first))> but he new
    code just checks is_pointer_v<_InputIterator>. This means that we no
    longer recognise basic_string::iterator and vector::iterator as
    contiguous, and so return a temporary basic_string instead of a
    basic_string_view. This only affects C++17 mode, because the
    std::contiguous_iterator concept is used in C++20 which gives the right
    answer for __normal_iterator (and more types as well).
    
    The fix is to specialize the new __is_contiguous variable template so it
    is true for __normal_iterator<T*, C> specializations. The new partial
    specializations are defined for C++20 too, because it should be cheaper
    to match the partial specialization than to check whether the
    std::contiguous_iterator concept is satisfied.
    
    libstdc++-v3/ChangeLog:
    
            * include/bits/fs_path.h (__detail::__is_contiguous): Add
            partial specializations for pointers and __normal_iterator.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 05db792fbae..c51bfa3095a 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -158,9 +158,16 @@  namespace __detail
     constexpr bool __is_contiguous = std::contiguous_iterator<_Iter>;
 #else
   template<typename _Iter>
-    constexpr bool __is_contiguous = is_pointer_v<_Iter>;
+    constexpr bool __is_contiguous = false;
 #endif
 
+  template<typename _Tp>
+    constexpr bool __is_contiguous<_Tp*> = true;
+
+  template<typename _Tp, typename _Seq>
+    constexpr bool
+    __is_contiguous<__gnu_cxx::__normal_iterator<_Tp*, _Seq>> = true;
+
 #if !defined _GLIBCXX_FILESYSTEM_IS_WINDOWS && defined _GLIBCXX_USE_CHAR8_T
   // For POSIX treat char8_t sequences as char without encoding conversions.
   template<typename _EcharT>