Message ID | 20211202170031.366865-1-jwakely@redhat.com |
---|---|
State | Committed |
Commit | fb9875ebf10c86be21824cb836b7b3b80f3a731b |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 24491385C40A for <patchwork@sourceware.org>; Thu, 2 Dec 2021 17:21:00 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 24491385C40A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1638465660; bh=Poo69HATr9x9P5WimX3xTfRSqrDFwXQzEeP7IijOQZs=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=xgVPieh5IxBMQo535xWgVs9vk9GY4I9lKdj79ENzdbKcO1sLJhXz8euqPVGKG+jZf S2OHNqv7YffDO8VjOqpTGcvwIyk5lL0c8D+zmMgt4q9G62HjyLW7/VN3apuiKpM52X GgTyjlMnnqoZXzdLSapXhN8qQ3/O91lk+YcOAvCQ= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 5CC24385B83E for <gcc-patches@gcc.gnu.org>; Thu, 2 Dec 2021 17:03:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5CC24385B83E Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-29-5bWSg-HuPRKzrJlV8T0InQ-1; Thu, 02 Dec 2021 12:01:34 -0500 X-MC-Unique: 5bWSg-HuPRKzrJlV8T0InQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 0475683DEA6; Thu, 2 Dec 2021 17:00:33 +0000 (UTC) Received: from localhost (unknown [10.33.36.16]) by smtp.corp.redhat.com (Postfix) with ESMTP id A93835F4E9; Thu, 2 Dec 2021 17:00:32 +0000 (UTC) To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [PATCH] libstdc++: Do not leak empty COW strings Date: Thu, 2 Dec 2021 17:00:31 +0000 Message-Id: <20211202170031.366865-1-jwakely@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII" X-Spam-Status: No, score=-14.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Jonathan Wakely via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Jonathan Wakely <jwakely@redhat.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
libstdc++: Do not leak empty COW strings
|
|
Commit Message
Jonathan Wakely
Dec. 2, 2021, 5 p.m. UTC
Apart from "don't bother changing the COW string", does anybody see a reason we shouldn't do this? This passes all tests for normal COW strings and fully-dynamic COW strings. When non-const references, pointers or iterators are obtained to the contents of a COW std::basic_string, the implementation has to assume it could result in a write to the contents. If the string was previously shared, it does the "copy-on-write" step of creating a new copy of the data that is not shared by another object. It also marks the string as "leaked", so that no future copies of it will share ownership either. However, if the string is empty then the only character in the sequence is the terminating null, and modifying that is undefined behaviour. This means that non-const references/pointers/iterators to an empty string are effectively const. Since no direct modification is possible, there is no need to "leak" the string, it can be safely shared with other objects. This avoids unnecessary allocations to create new copies of empty strings that can't be modified anyway. We already did this optimization for strings that share ownership of the static _S_empty_rep() object, but not for strings that have non-zero capacity, and not for fully-dynamic-strings (where the _S_empty_rep() object is never used). With this change we avoid two allocations in the return statement: std::string s; s.reserve(1); // allocate std::string s2 = s; std::string s3 = s; return s[0] + s2[0] + s3[0]; // leak+allocate twice libstdc++-v3/ChangeLog: * include/bits/cow_string.h (basic_string::_M_leak_hard): Do not reallocate an empty string. --- libstdc++-v3/include/bits/cow_string.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
Comments
On Thu, 2 Dec 2021 at 17:21, Jonathan Wakely via Libstdc++ <libstdc++@gcc.gnu.org> wrote: > > Apart from "don't bother changing the COW string", does anybody see a > reason we shouldn't do this? This passes all tests for normal COW > strings and fully-dynamic COW strings. Pushed to trunk. > > When non-const references, pointers or iterators are obtained to the > contents of a COW std::basic_string, the implementation has to assume it > could result in a write to the contents. If the string was previously > shared, it does the "copy-on-write" step of creating a new copy of the > data that is not shared by another object. It also marks the string as > "leaked", so that no future copies of it will share ownership either. > > However, if the string is empty then the only character in the sequence > is the terminating null, and modifying that is undefined behaviour. This > means that non-const references/pointers/iterators to an empty string > are effectively const. Since no direct modification is possible, there > is no need to "leak" the string, it can be safely shared with other > objects. This avoids unnecessary allocations to create new copies of > empty strings that can't be modified anyway. > > We already did this optimization for strings that share ownership of the > static _S_empty_rep() object, but not for strings that have non-zero > capacity, and not for fully-dynamic-strings (where the _S_empty_rep() > object is never used). > > With this change we avoid two allocations in the return statement: > > std::string s; > s.reserve(1); // allocate > std::string s2 = s; > std::string s3 = s; > return s[0] + s2[0] + s3[0]; // leak+allocate twice > > libstdc++-v3/ChangeLog: > > * include/bits/cow_string.h (basic_string::_M_leak_hard): Do not > reallocate an empty string. > --- > libstdc++-v3/include/bits/cow_string.h | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h > index 389b39583e4..b21a7422246 100644 > --- a/libstdc++-v3/include/bits/cow_string.h > +++ b/libstdc++-v3/include/bits/cow_string.h > @@ -3366,10 +3366,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > basic_string<_CharT, _Traits, _Alloc>:: > _M_leak_hard() > { > -#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0 > - if (_M_rep() == &_S_empty_rep()) > + // No need to create a new copy of an empty string when a non-const > + // reference/pointer/iterator into it is obtained. Modifying the > + // trailing null character is undefined, so the ref/pointer/iterator > + // is effectively const anyway. > + if (this->empty()) > return; > -#endif > + > if (_M_rep()->_M_is_shared()) > _M_mutate(0, 0, 0); > _M_rep()->_M_set_leaked(); > -- > 2.31.1 >
diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h index 389b39583e4..b21a7422246 100644 --- a/libstdc++-v3/include/bits/cow_string.h +++ b/libstdc++-v3/include/bits/cow_string.h @@ -3366,10 +3366,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION basic_string<_CharT, _Traits, _Alloc>:: _M_leak_hard() { -#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0 - if (_M_rep() == &_S_empty_rep()) + // No need to create a new copy of an empty string when a non-const + // reference/pointer/iterator into it is obtained. Modifying the + // trailing null character is undefined, so the ref/pointer/iterator + // is effectively const anyway. + if (this->empty()) return; -#endif + if (_M_rep()->_M_is_shared()) _M_mutate(0, 0, 0); _M_rep()->_M_set_leaked();