From patchwork Wed Oct 13 22:57:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Wakely X-Patchwork-Id: 46190 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0229D3858403 for ; Wed, 13 Oct 2021 22:58:15 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0229D3858403 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1634165895; bh=rFG2FJcVvCNy2F6n/hjeFQWpZgUxq/ZGd7tH55RZfOQ=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=YuQDnVPOCyU+VPjJ7QrAoGkn0Bz2TBU+R6KzQ9owSK+su7eNLF8zQPkFxPzRu4PmO IR1BZ3IcpsU4CSUKnRVp5l6t1yYeEXhEOAQVtZ3z8p3a5dIF5Kqv9Uij7zatmWYTfj 51lD5I8wObjeMBTYKvVrCLq1YY/mb9Y46XivuX14= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 4CB083858403 for ; Wed, 13 Oct 2021 22:57:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4CB083858403 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-604-m_3d80PuP6OwpD5BqKErlw-1; Wed, 13 Oct 2021 18:57:11 -0400 X-MC-Unique: m_3d80PuP6OwpD5BqKErlw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8C0B91808312; Wed, 13 Oct 2021 22:57:10 +0000 (UTC) Received: from localhost (unknown [10.33.37.32]) by smtp.corp.redhat.com (Postfix) with ESMTP id 36F695F4E2; Wed, 13 Oct 2021 22:57:09 +0000 (UTC) Date: Wed, 13 Oct 2021 23:57:09 +0100 To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [committed] libstdc++: Fix regression in memory use when constructing paths Message-ID: References: MIME-Version: 1.0 In-Reply-To: X-Clacks-Overhead: GNU Terry Pratchett X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Disposition: inline X-Spam-Status: No, score=-13.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=unavailable autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Jonathan Wakely via Gcc-patches From: Jonathan Wakely Reply-To: Jonathan Wakely Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" On 13/10/21 21:19 +0100, Jonathan Wakely wrote: >On 13/10/21 20:41 +0100, Jonathan Wakely wrote: >>Adjust the __detail::__effective_range overloads so they always return a >>string or string view using std::char_traits, because we don't care >>about the traits of an incoming string. >> >>Use std::contiguous_iterator in the __effective_range(const Source&) >>overload, to allow returning a basic_string_view in more cases. For the >>non-contiguous cases in both __effective_range and __string_from_range, >>return a std::string instead of std::u8string when the value type of the >>range is char8_t. These changes avoid unnecessary basic_string >>temporaries. > >[...] > >> template >> inline auto >> __string_from_range(_InputIterator __first, _InputIterator __last) >> { >> using _EcharT >> = typename std::iterator_traits<_InputIterator>::value_type; >>- static_assert(__is_encoded_char<_EcharT>); >>+ static_assert(__is_encoded_char<_EcharT>); // C++17 [fs.req]/3 >> >>-#if __cpp_lib_concepts >>- constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>; >>-#else >>- constexpr bool __contiguous >>- = is_pointer_v; >>-#endif >>- if constexpr (__contiguous) >>+ if constexpr (__is_contiguous<_InputIterator>) > >Oops, this pessimiszes construction from string::iterator and >vector::iterator in C++17 mode, because the new __is_contiguous >variable template just uses is_pointer_v, without the __niter_base >call that unwraps a __normal_iterator. > >That means that we now create a basic_string temporary where we >previously jsut returned a basic_string_view. > >I am testing a fix. Here's the fix, committed to trunk. With this change there are no temporaries for string and vector iterators passed to path constructors. For debug mode there are some unnecessary temporaries for vector::iterator arguments, because the safe iterator isn't recognized as contiguous in C++17 mode (but it's fine in C++20 mode). The bytes allocated to construct a path consisting of a single filename with 38 characters are: Constructor arguments GCC 11 | GCC 12 | 11 debug | 12 debug -------------------------------------------|--------|----------|--------- const char* 39 | 39 | 39 | 39 const char(&)[N] 39 | 39 | 39 | 39 const char8_t* 39 | 39 | 39 | 39 const char8_t(&)[N] 39 | 39 | 39 | 39 std::string::iterator pair 39 | 39 | 39 | 39 std::string::iterator NTCTS 92 | 39 | 92 | 39 std::u8string::iterator pair 39 | 39 | 39 | 39 std::u8string::iterator NTCTS 131 | 39 | 131 | 39 std::vector::iterator pair 39 | 39 | 78 | 78 std::vector::iterator NTCTS 131 | 39 | 131 | 39 std::list::iterator pair 78 | 78 | 78 | 78 std::list::iterator NTCTS 131 | 131 | 131 | 131 So for GCC 12 there are no unwanted allocations unless the iterators are not contiguous (e.g. std::list::iterator). In that case we need to construct a temporary string. For the pair(Iter, Iter) constructor we can use distance(first, last) to size that temporary string correctly, but for the path(const Source&) constructor we read one character at a time from the input and call push_back. In any case, the regression is fixed and we're at least as good as GCC 11 in all cases now. commit f874a13ca3870a56036a90758b0d41c8c217f4f7 Author: Jonathan Wakely Date: Wed Oct 13 21:32:14 2021 libstdc++: Fix regression in memory use when constructing paths The changes in r12-4381 were intended to reduce memory usage, but replacing the __contiguous constant in __string_from_range with the new __is_contiguous variable template caused a regression. The old code checked is_pointer_v but he new code just checks is_pointer_v<_InputIterator>. This means that we no longer recognise basic_string::iterator and vector::iterator as contiguous, and so return a temporary basic_string instead of a basic_string_view. This only affects C++17 mode, because the std::contiguous_iterator concept is used in C++20 which gives the right answer for __normal_iterator (and more types as well). The fix is to specialize the new __is_contiguous variable template so it is true for __normal_iterator specializations. The new partial specializations are defined for C++20 too, because it should be cheaper to match the partial specialization than to check whether the std::contiguous_iterator concept is satisfied. libstdc++-v3/ChangeLog: * include/bits/fs_path.h (__detail::__is_contiguous): Add partial specializations for pointers and __normal_iterator. diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h index 05db792fbae..c51bfa3095a 100644 --- a/libstdc++-v3/include/bits/fs_path.h +++ b/libstdc++-v3/include/bits/fs_path.h @@ -158,9 +158,16 @@ namespace __detail constexpr bool __is_contiguous = std::contiguous_iterator<_Iter>; #else template - constexpr bool __is_contiguous = is_pointer_v<_Iter>; + constexpr bool __is_contiguous = false; #endif + template + constexpr bool __is_contiguous<_Tp*> = true; + + template + constexpr bool + __is_contiguous<__gnu_cxx::__normal_iterator<_Tp*, _Seq>> = true; + #if !defined _GLIBCXX_FILESYSTEM_IS_WINDOWS && defined _GLIBCXX_USE_CHAR8_T // For POSIX treat char8_t sequences as char without encoding conversions. template