libstdc++: Silence false positive null-dereference warning in istreambuf_iterator::operator++ [PR105580]
| Message ID | 20260107144053.672943-1-tkaminsk@redhat.com |
|---|---|
| State | New |
| Headers |
Return-Path: <gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from vm01.sourceware.org (localhost [127.0.0.1]) by sourceware.org (Postfix) with ESMTP id 8B1E84BA2E26 for <patchwork@sourceware.org>; Wed, 7 Jan 2026 14:42:17 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8B1E84BA2E26 Authentication-Results: sourceware.org; dkim=pass (1024-bit key, unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=U1iCGV85 X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 0E3944BA2E05 for <gcc-patches@gcc.gnu.org>; Wed, 7 Jan 2026 14:41:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0E3944BA2E05 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 0E3944BA2E05 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1767796860; cv=none; b=Xt90YyA/umnEVgASM0HILfIzBa0Q+o0cJ0g4voXlxEQW+Cc/NGojVCb++hBkwD+XUjJM2YJilJRDbbcG5oG1H81oRvBF/qN2kt8tIhEonEs4utK+DGkGIv2WwZ4mSsSIByiPhVbX6cQoloQP+Nqv8K4X7Zkt0xZuSwYJT0PL6WY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1767796860; c=relaxed/simple; bh=iKUIloZ+zeRVTssRge7NYQDPHQiXReigapRphWaTiWM=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=kp8Xdlv0Q9GMsj0zBDjNGvHM8YXbtsg/ndmHxx1lSD49ABeGGmGjrnMj0fX6NvyJAjpBZEsCdvjmDkbkfOMXhJM6iSZqCgaea2s3aPKkxC+3WsZqKCOL3C3wRj0Q2mgCleLooqyhoRwej6xfJU3sk3LRgNHajOCTSTzRhkkjkvY= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0E3944BA2E05 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1767796859; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=g+T/0wgfrdAb/OzlUdIbkut+skaIhdpOO4ZIT0/cEYk=; b=U1iCGV85yLoAwqvjNQ8+XvxeVxAJV2OY2ZfO9G3O+FSQhw6p/E+F5ZXCkC43NlvV44DLek 700zqS1SCyQ0L2Zkw5KXIawZyrMqFHkFVWonn6bjRWM9xVE2nWOdqDBBIRiSpaQZ8BmeGE wS3jw5GC/hNakZxGXi7RxNRzVwSGLKg= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-674-Fh8JZv41NpaRbMXRuoBXmw-1; Wed, 07 Jan 2026 09:40:56 -0500 X-MC-Unique: Fh8JZv41NpaRbMXRuoBXmw-1 X-Mimecast-MFC-AGG-ID: Fh8JZv41NpaRbMXRuoBXmw_1767796855 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A144218005AD; Wed, 7 Jan 2026 14:40:55 +0000 (UTC) Received: from localhost (unknown [10.44.32.252]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id ECB161955F24; Wed, 7 Jan 2026 14:40:54 +0000 (UTC) From: =?utf-8?q?Tomasz_Kami=C5=84ski?= <tkaminsk@redhat.com> To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [PATCH] libstdc++: Silence false positive null-dereference warning in istreambuf_iterator::operator++ [PR105580] Date: Wed, 7 Jan 2026 15:30:43 +0100 Message-ID: <20260107144053.672943-1-tkaminsk@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: twDF75frwPdn6jMRqhA9Tc06eetzAfj5ffgCdSX9NhY_1767796855 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit content-type: text/plain; charset="US-ASCII"; x-default=true X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_BLOCKED, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, RCVD_IN_VALIDITY_RPBL_BLOCKED, RCVD_IN_VALIDITY_SAFE_BLOCKED, SPF_HELO_PASS, SPF_NONE, TXREP, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org |
| Series |
libstdc++: Silence false positive null-dereference warning in istreambuf_iterator::operator++ [PR105580]
|
|
Commit Message
Tomasz Kaminski
Jan. 7, 2026, 2:30 p.m. UTC
The warning was produced by following seqeunce, given an istream_iterator<char>
it, such that *it will result in hitting EoF in it->_M_get(), and thus clearing
_M_sbuf, the subsequent call to ++it, will result in _M_sbuf->sbumpc() call
on null pointer dereference. This is however an false-positive, as in such sitution
it == istream_iteator() returns true, and the iterator should not be dereferenced
in first place.
This patch addresses the above by clearing the _M_sbuf in operator++, instead of
_M_get(). This removes the need for making _M_sbuf mutable, and thus make the
implementation conforming with regards to C++11 [res.on.data.races] p3.
This change should not or positive have performance impact on the usual iteration
patterns, in form:
while (it != end) { process(*it); ++it; }
In case when it is end-of-stream iterator, the it != end returns in one call of
_M_sbuf->sgetc() both before and after the change. However we do not modify _M_sbuf
in this case. For non-empty range, we replace call to _M_sbuf->sbumpc() with
_M_sbuf->snextc() in pre-increment, and extract the check again from *it to
++it. However, as _M_sbuf is now cleared during increment, so last it != end
check avoids _M_sbuf->sgetc() call to check against EoF.
However, this change impact the behavior of the post-increment (*it++), as
we now load both current character (for return value) and next character (to
check against EoF). In consequence we call both sgetc() and snextc(), in contrast
to previous single sbumpc() call.
PR libstdc++/105580
libstdc++-v3/ChangeLog:
* include/bits/streambuf_iterator.h (istreambuf_iterator::_M_sbuf):
Remove mutable and adjust whitespace.
(istreambuf_iterator::_M_c): Adjust whitespace.
(istreambuf_iterator::operator++()): Clear _M_sbuf if next character
is EoF.
(istreambuf_iterator::operator++(int)): Use _M_sbuf->sgetc() to
load current character, and define in terms of __*this.
(istreambuf_iterator::_M_get()): Do not clear _M_sbuf in case of EoF.
* testsuite/24_iterators/istreambuf_iterator/2.cc: Test for using
multiple iterators to same rdbuf.
---
This warning turned out to be false-positive (as explained in the commit
decription), so we can just add add pragrams to address that. But this
change seem to be beneficial anyway, as we are no longer causing
data-races.
Tested on x86_64-linux. Locally tested with -Wnull-dereference,
and the istream_iterator warning in 24_iterators/istreambuf_iterator/2.cc
disappears (there are some other warning present I haven't looked into
that yet).
.../include/bits/streambuf_iterator.h | 20 +++----
.../24_iterators/istreambuf_iterator/2.cc | 58 +++++++++++++++++++
2 files changed, 66 insertions(+), 12 deletions(-)
Comments
Forgot to include, this change are ABI compatible, as clearing _M_sbuf is de-facto optimization of the EoF checks, and if we get (N - after the patch, O - before the patch): * N op*, O op++ -> we never set _M_sbuf to nullptr, so equals will call sgetc() for also when the iterator reaches end-of-stream (same as old behavior) * O op*, N op++ -> impacts only programs that deference past the end iterators (we allow that as extension), otherwise no-impact, except duplicated _S_is_eof(_M_c) check. On Wed, Jan 7, 2026 at 3:41 PM Tomasz Kamiński <tkaminsk@redhat.com> wrote: > The warning was produced by following seqeunce, given an > istream_iterator<char> > it, such that *it will result in hitting EoF in it->_M_get(), and thus > clearing > _M_sbuf, the subsequent call to ++it, will result in _M_sbuf->sbumpc() call > on null pointer dereference. This is however an false-positive, as in such > sitution > it == istream_iteator() returns true, and the iterator should not be > dereferenced > in first place. > > This patch addresses the above by clearing the _M_sbuf in operator++, > instead of > _M_get(). This removes the need for making _M_sbuf mutable, and thus make > the > implementation conforming with regards to C++11 [res.on.data.races] p3. > > This change should not or positive have performance impact on the usual > iteration > patterns, in form: > while (it != end) { process(*it); ++it; } > In case when it is end-of-stream iterator, the it != end returns in one > call of > _M_sbuf->sgetc() both before and after the change. However we do not > modify _M_sbuf > in this case. For non-empty range, we replace call to _M_sbuf->sbumpc() > with > _M_sbuf->snextc() in pre-increment, and extract the check again from *it to > ++it. However, as _M_sbuf is now cleared during increment, so last it != > end > check avoids _M_sbuf->sgetc() call to check against EoF. > > However, this change impact the behavior of the post-increment (*it++), as > we now load both current character (for return value) and next character > (to > check against EoF). In consequence we call both sgetc() and snextc(), in > contrast > to previous single sbumpc() call. > > PR libstdc++/105580 > > libstdc++-v3/ChangeLog: > > * include/bits/streambuf_iterator.h (istreambuf_iterator::_M_sbuf): > Remove mutable and adjust whitespace. > (istreambuf_iterator::_M_c): Adjust whitespace. > (istreambuf_iterator::operator++()): Clear _M_sbuf if next > character > is EoF. > (istreambuf_iterator::operator++(int)): Use _M_sbuf->sgetc() to > load current character, and define in terms of __*this. > (istreambuf_iterator::_M_get()): Do not clear _M_sbuf in case of > EoF. > * testsuite/24_iterators/istreambuf_iterator/2.cc: Test for using > multiple iterators to same rdbuf. > --- > This warning turned out to be false-positive (as explained in the commit > decription), so we can just add add pragrams to address that. But this > change seem to be beneficial anyway, as we are no longer causing > data-races. > > Tested on x86_64-linux. Locally tested with -Wnull-dereference, > and the istream_iterator warning in 24_iterators/istreambuf_iterator/2.cc > disappears (there are some other warning present I haven't looked into > that yet). > > .../include/bits/streambuf_iterator.h | 20 +++---- > .../24_iterators/istreambuf_iterator/2.cc | 58 +++++++++++++++++++ > 2 files changed, 66 insertions(+), 12 deletions(-) > > diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h > b/libstdc++-v3/include/bits/streambuf_iterator.h > index 93d3dd24f93..095928ca4d8 100644 > --- a/libstdc++-v3/include/bits/streambuf_iterator.h > +++ b/libstdc++-v3/include/bits/streambuf_iterator.h > @@ -112,8 +112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > // the "end of stream" iterator value. > // NB: This implementation assumes the "end of stream" value > // is EOF, or -1. > - mutable streambuf_type* _M_sbuf; > - int_type _M_c; > + streambuf_type* _M_sbuf; > + int_type _M_c; > > public: > /// Construct end of input stream iterator. > @@ -172,7 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > _M_message(__gnu_debug::__msg_inc_istreambuf) > ._M_iterator(*this)); > > - _M_sbuf->sbumpc(); > + if (_S_is_eof(_M_sbuf->snextc())) > + _M_sbuf = 0; > _M_c = traits_type::eof(); > return *this; > } > @@ -181,14 +182,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > istreambuf_iterator > operator++(int) > { > - __glibcxx_requires_cond(_M_sbuf && > - (!_S_is_eof(_M_c) || > !_S_is_eof(_M_sbuf->sgetc())), > - > _M_message(__gnu_debug::__msg_inc_istreambuf) > - ._M_iterator(*this)); > - > istreambuf_iterator __old = *this; > - __old._M_c = _M_sbuf->sbumpc(); > - _M_c = traits_type::eof(); > + __old._M_c = _M_sbuf->sgetc(); > + ++*this; > return __old; > } > > @@ -206,8 +202,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > _M_get() const > { > int_type __ret = _M_c; > - if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = > _M_sbuf->sgetc())) > - _M_sbuf = 0; > + if (_M_sbuf && _S_is_eof(__ret)) > + __ret = _M_sbuf->sgetc(); > return __ret; > } > > diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc > b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc > index 78701d71cee..0318a677f35 100644 > --- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc > +++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc > @@ -109,8 +109,66 @@ void test02(void) > } > } > > +void > +test_empty() > +{ > + std::istreambuf_iterator<char> null(0), end; > + VERIFY( null == end ); > + > + std::istringstream ess; > + // Thie specification hear seem to indicate, that such iterators > + // are not end-of-stream iterators, as rdbuf pointer is nullptr, > + // however we treat them as such, as otherwise we would produce > + // a range containing single eof character. > + std::istreambuf_iterator<char> it1(ess), it2(ess); > + VERIFY( it1 == end ); > + VERIFY( it2 == end ); > +} > + > +void > +test_multi() > +{ > + // C++98 (and later) define operator* in [istreambuf.iterator.ops] as: > + // Returns: The character obtained via the streambuf member > sbuf_->sgetc(). > + // This nails down behavior of mulptiple iterators to same sequence. > + std::istringstream iss("abcd"); > + std::istreambuf_iterator<char> it1(iss), it2(iss), end; > + > + VERIFY( it1 != end ); > + VERIFY( it2 != end ); > + VERIFY( *it1 == 'a' ); > + VERIFY( *it2 == 'a' ); > + ++it1; > + > + VERIFY( it1 != end ); > + VERIFY( it2 != end ); > + VERIFY( *it1 == 'b' ); > + VERIFY( *it2 == 'b' ); > + ++it2; > + > + VERIFY( it1 != end ); > + VERIFY( it2 != end ); > + VERIFY( *it1 == 'c' ); > + VERIFY( *it2 == 'c' ); > + ++it2; > + > + VERIFY( it1 != end ); > + VERIFY( it2 != end ); > + VERIFY( *it1 == 'd' ); > + VERIFY( *it2 == 'd' ); > + // second dereference > + VERIFY( *it1 == 'd' ); > + VERIFY( *it2 == 'd' ); > + ++it1; > + > + VERIFY( it1 == end ); > + VERIFY( it2 == end ); > +} > + > int main() > { > test02(); > + test_empty(); > + test_multi(); > return 0; > } > -- > 2.52.0 > >
I have performed some microbenchmarking, on following example: * bench_empty: input is empty istreamstream, creates iterator and compares against end * bench_stream_create: only resetting stringstream with new content * bench_preinc: for loop over using pre increment * bench_postinc: for loop over using post increment * bench_deref_posting: while with *it++ This are the result I have observed, for the changes with the patch: -------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------- bench_empty 2.10 ns 2.09 ns 337921434 bench_stream_create 10.6 ns 10.6 ns 65736840 bench_preinc 419 ns 419 ns 1662732 bench_postinc 436 ns 435 ns 1609828 bench_deref_postinc 306 ns 305 ns 22866 And the result before the patch: -------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------- bench_empty 2.27 ns 2.26 ns 314944156 bench_stream_create 11.7 ns 11.6 ns 60679957 bench_preinc 873 ns 871 ns 807352 bench_postinc 920 ns 917 ns 751565 bench_deref_postinc 706 ns 704 ns 1038942 For sanity, I have also check the result on g++ 15.2 that ships with my system: -------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------- bench_empty 2.30 ns 2.30 ns 303070664 bench_stream_create 10.6 ns 10.6 ns 65778662 bench_preinc 767 ns 765 ns 915085 bench_postinc 767 ns 764 ns 913985 bench_deref_postinc 675 ns 673 ns 1107772 So it looks like eliminating save to member from both equal and operator* makes implementation two times faster. I will look into that. Attaching also my benchmark file. Regards, Tomasz On Wed, Jan 7, 2026 at 3:41 PM Tomasz Kamiński <tkaminsk@redhat.com> wrote: > The warning was produced by following seqeunce, given an > istream_iterator<char> > it, such that *it will result in hitting EoF in it->_M_get(), and thus > clearing > _M_sbuf, the subsequent call to ++it, will result in _M_sbuf->sbumpc() call > on null pointer dereference. This is however an false-positive, as in such > sitution > it == istream_iteator() returns true, and the iterator should not be > dereferenced > in first place. > > This patch addresses the above by clearing the _M_sbuf in operator++, > instead of > _M_get(). This removes the need for making _M_sbuf mutable, and thus make > the > implementation conforming with regards to C++11 [res.on.data.races] p3. > > This change should not or positive have performance impact on the usual > iteration > patterns, in form: > while (it != end) { process(*it); ++it; } > In case when it is end-of-stream iterator, the it != end returns in one > call of > _M_sbuf->sgetc() both before and after the change. However we do not > modify _M_sbuf > in this case. For non-empty range, we replace call to _M_sbuf->sbumpc() > with > _M_sbuf->snextc() in pre-increment, and extract the check again from *it to > ++it. However, as _M_sbuf is now cleared during increment, so last it != > end > check avoids _M_sbuf->sgetc() call to check against EoF. > > However, this change impact the behavior of the post-increment (*it++), as > we now load both current character (for return value) and next character > (to > check against EoF). In consequence we call both sgetc() and snextc(), in > contrast > to previous single sbumpc() call. > > PR libstdc++/105580 > > libstdc++-v3/ChangeLog: > > * include/bits/streambuf_iterator.h (istreambuf_iterator::_M_sbuf): > Remove mutable and adjust whitespace. > (istreambuf_iterator::_M_c): Adjust whitespace. > (istreambuf_iterator::operator++()): Clear _M_sbuf if next > character > is EoF. > (istreambuf_iterator::operator++(int)): Use _M_sbuf->sgetc() to > load current character, and define in terms of __*this. > (istreambuf_iterator::_M_get()): Do not clear _M_sbuf in case of > EoF. > * testsuite/24_iterators/istreambuf_iterator/2.cc: Test for using > multiple iterators to same rdbuf. > --- > This warning turned out to be false-positive (as explained in the commit > decription), so we can just add add pragrams to address that. But this > change seem to be beneficial anyway, as we are no longer causing > data-races. > > Tested on x86_64-linux. Locally tested with -Wnull-dereference, > and the istream_iterator warning in 24_iterators/istreambuf_iterator/2.cc > disappears (there are some other warning present I haven't looked into > that yet). > > .../include/bits/streambuf_iterator.h | 20 +++---- > .../24_iterators/istreambuf_iterator/2.cc | 58 +++++++++++++++++++ > 2 files changed, 66 insertions(+), 12 deletions(-) > > diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h > b/libstdc++-v3/include/bits/streambuf_iterator.h > index 93d3dd24f93..095928ca4d8 100644 > --- a/libstdc++-v3/include/bits/streambuf_iterator.h > +++ b/libstdc++-v3/include/bits/streambuf_iterator.h > @@ -112,8 +112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > // the "end of stream" iterator value. > // NB: This implementation assumes the "end of stream" value > // is EOF, or -1. > - mutable streambuf_type* _M_sbuf; > - int_type _M_c; > + streambuf_type* _M_sbuf; > + int_type _M_c; > > public: > /// Construct end of input stream iterator. > @@ -172,7 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > _M_message(__gnu_debug::__msg_inc_istreambuf) > ._M_iterator(*this)); > > - _M_sbuf->sbumpc(); > + if (_S_is_eof(_M_sbuf->snextc())) > + _M_sbuf = 0; > _M_c = traits_type::eof(); > return *this; > } > @@ -181,14 +182,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > istreambuf_iterator > operator++(int) > { > - __glibcxx_requires_cond(_M_sbuf && > - (!_S_is_eof(_M_c) || > !_S_is_eof(_M_sbuf->sgetc())), > - > _M_message(__gnu_debug::__msg_inc_istreambuf) > - ._M_iterator(*this)); > - > istreambuf_iterator __old = *this; > - __old._M_c = _M_sbuf->sbumpc(); > - _M_c = traits_type::eof(); > + __old._M_c = _M_sbuf->sgetc(); > + ++*this; > return __old; > } > > @@ -206,8 +202,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > _M_get() const > { > int_type __ret = _M_c; > - if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = > _M_sbuf->sgetc())) > - _M_sbuf = 0; > + if (_M_sbuf && _S_is_eof(__ret)) > + __ret = _M_sbuf->sgetc(); > return __ret; > } > > diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc > b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc > index 78701d71cee..0318a677f35 100644 > --- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc > +++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc > @@ -109,8 +109,66 @@ void test02(void) > } > } > > +void > +test_empty() > +{ > + std::istreambuf_iterator<char> null(0), end; > + VERIFY( null == end ); > + > + std::istringstream ess; > + // Thie specification hear seem to indicate, that such iterators > + // are not end-of-stream iterators, as rdbuf pointer is nullptr, > + // however we treat them as such, as otherwise we would produce > + // a range containing single eof character. > + std::istreambuf_iterator<char> it1(ess), it2(ess); > + VERIFY( it1 == end ); > + VERIFY( it2 == end ); > +} > + > +void > +test_multi() > +{ > + // C++98 (and later) define operator* in [istreambuf.iterator.ops] as: > + // Returns: The character obtained via the streambuf member > sbuf_->sgetc(). > + // This nails down behavior of mulptiple iterators to same sequence. > + std::istringstream iss("abcd"); > + std::istreambuf_iterator<char> it1(iss), it2(iss), end; > + > + VERIFY( it1 != end ); > + VERIFY( it2 != end ); > + VERIFY( *it1 == 'a' ); > + VERIFY( *it2 == 'a' ); > + ++it1; > + > + VERIFY( it1 != end ); > + VERIFY( it2 != end ); > + VERIFY( *it1 == 'b' ); > + VERIFY( *it2 == 'b' ); > + ++it2; > + > + VERIFY( it1 != end ); > + VERIFY( it2 != end ); > + VERIFY( *it1 == 'c' ); > + VERIFY( *it2 == 'c' ); > + ++it2; > + > + VERIFY( it1 != end ); > + VERIFY( it2 != end ); > + VERIFY( *it1 == 'd' ); > + VERIFY( *it2 == 'd' ); > + // second dereference > + VERIFY( *it1 == 'd' ); > + VERIFY( *it2 == 'd' ); > + ++it1; > + > + VERIFY( it1 == end ); > + VERIFY( it2 == end ); > +} > + > int main() > { > test02(); > + test_empty(); > + test_multi(); > return 0; > } > -- > 2.52.0 > >
On Thu, Jan 8, 2026 at 10:38 AM Tomasz Kaminski <tkaminsk@redhat.com> wrote: > I have performed some microbenchmarking, on following example: > * bench_empty: input is empty istreamstream, creates iterator and compares > against end > * bench_stream_create: only resetting stringstream with new content > * bench_preinc: for loop over using pre increment > * bench_postinc: for loop over using post increment > * bench_deref_posting: while with *it++ > > This are the result I have observed, for the changes with the patch: > -------------------------------------------------------------- > Benchmark Time CPU Iterations > -------------------------------------------------------------- > bench_empty 2.10 ns 2.09 ns 337921434 > bench_stream_create 10.6 ns 10.6 ns 65736840 > bench_preinc 419 ns 419 ns 1662732 > bench_postinc 436 ns 435 ns 1609828 > bench_deref_postinc 306 ns 305 ns 22866 > > And the result before the patch: > -------------------------------------------------------------- > Benchmark Time CPU Iterations > -------------------------------------------------------------- > bench_empty 2.27 ns 2.26 ns 314944156 > bench_stream_create 11.7 ns 11.6 ns 60679957 > bench_preinc 873 ns 871 ns 807352 > bench_postinc 920 ns 917 ns 751565 > bench_deref_postinc 706 ns 704 ns 1038942 > > For sanity, I have also check the result on g++ 15.2 that ships with my > system: > -------------------------------------------------------------- > Benchmark Time CPU Iterations > -------------------------------------------------------------- > bench_empty 2.30 ns 2.30 ns 303070664 > bench_stream_create 10.6 ns 10.6 ns 65778662 > bench_preinc 767 ns 765 ns 915085 > bench_postinc 767 ns 764 ns 913985 > bench_deref_postinc 675 ns 673 ns 1107772 > > So it looks like eliminating save to member from both equal and operator* > makes implementation > two times faster. I will look into that. > > Attaching also my benchmark file. > > Regards, > Tomasz > > On Wed, Jan 7, 2026 at 3:41 PM Tomasz Kamiński <tkaminsk@redhat.com> > wrote: > >> The warning was produced by following seqeunce, given an >> istream_iterator<char> >> it, such that *it will result in hitting EoF in it->_M_get(), and thus >> clearing >> _M_sbuf, the subsequent call to ++it, will result in _M_sbuf->sbumpc() >> call >> on null pointer dereference. This is however an false-positive, as in >> such sitution >> it == istream_iteator() returns true, and the iterator should not be >> dereferenced >> in first place. >> >> This patch addresses the above by clearing the _M_sbuf in operator++, >> instead of >> _M_get(). This removes the need for making _M_sbuf mutable, and thus make >> the >> implementation conforming with regards to C++11 [res.on.data.races] p3. >> >> This change should not or positive have performance impact on the usual >> iteration >> patterns, in form: >> while (it != end) { process(*it); ++it; } >> In case when it is end-of-stream iterator, the it != end returns in one >> call of >> _M_sbuf->sgetc() both before and after the change. However we do not >> modify _M_sbuf >> in this case. For non-empty range, we replace call to _M_sbuf->sbumpc() >> with >> _M_sbuf->snextc() in pre-increment, and extract the check again from *it >> to >> ++it. However, as _M_sbuf is now cleared during increment, so last it != >> end >> check avoids _M_sbuf->sgetc() call to check against EoF. >> >> However, this change impact the behavior of the post-increment (*it++), as >> we now load both current character (for return value) and next character >> (to >> check against EoF). In consequence we call both sgetc() and snextc(), in >> contrast >> to previous single sbumpc() call. >> >> PR libstdc++/105580 >> >> libstdc++-v3/ChangeLog: >> >> * include/bits/streambuf_iterator.h >> (istreambuf_iterator::_M_sbuf): >> Remove mutable and adjust whitespace. >> (istreambuf_iterator::_M_c): Adjust whitespace. >> (istreambuf_iterator::operator++()): Clear _M_sbuf if next >> character >> is EoF. >> (istreambuf_iterator::operator++(int)): Use _M_sbuf->sgetc() to >> load current character, and define in terms of __*this. >> (istreambuf_iterator::_M_get()): Do not clear _M_sbuf in case of >> EoF. >> * testsuite/24_iterators/istreambuf_iterator/2.cc: Test for using >> multiple iterators to same rdbuf. >> --- >> This warning turned out to be false-positive (as explained in the commit >> decription), so we can just add add pragrams to address that. But this >> change seem to be beneficial anyway, as we are no longer causing >> data-races. >> >> Tested on x86_64-linux. Locally tested with -Wnull-dereference, >> and the istream_iterator warning in 24_iterators/istreambuf_iterator/2.cc >> disappears (there are some other warning present I haven't looked into >> that yet). >> >> .../include/bits/streambuf_iterator.h | 20 +++---- >> .../24_iterators/istreambuf_iterator/2.cc | 58 +++++++++++++++++++ >> 2 files changed, 66 insertions(+), 12 deletions(-) >> >> diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h >> b/libstdc++-v3/include/bits/streambuf_iterator.h >> index 93d3dd24f93..095928ca4d8 100644 >> --- a/libstdc++-v3/include/bits/streambuf_iterator.h >> +++ b/libstdc++-v3/include/bits/streambuf_iterator.h >> @@ -112,8 +112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> // the "end of stream" iterator value. >> // NB: This implementation assumes the "end of stream" value >> // is EOF, or -1. >> - mutable streambuf_type* _M_sbuf; >> - int_type _M_c; >> + streambuf_type* _M_sbuf; >> + int_type _M_c; >> >> public: >> /// Construct end of input stream iterator. >> @@ -172,7 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> >> _M_message(__gnu_debug::__msg_inc_istreambuf) >> ._M_iterator(*this)); >> >> - _M_sbuf->sbumpc(); >> + if (_S_is_eof(_M_sbuf->snextc())) >> + _M_sbuf = 0; >> _M_c = traits_type::eof(); >> return *this; >> } >> @@ -181,14 +182,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> istreambuf_iterator >> operator++(int) >> { >> - __glibcxx_requires_cond(_M_sbuf && >> - (!_S_is_eof(_M_c) || >> !_S_is_eof(_M_sbuf->sgetc())), >> - >> _M_message(__gnu_debug::__msg_inc_istreambuf) >> - ._M_iterator(*this)); >> - >> istreambuf_iterator __old = *this; >> - __old._M_c = _M_sbuf->sbumpc(); >> - _M_c = traits_type::eof(); >> + __old._M_c = _M_sbuf->sgetc(); >> + ++*this; >> return __old; >> } >> >> @@ -206,8 +202,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> _M_get() const >> { >> int_type __ret = _M_c; >> - if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = >> _M_sbuf->sgetc())) >> - _M_sbuf = 0; >> + if (_M_sbuf && _S_is_eof(__ret)) >> + __ret = _M_sbuf->sgetc(); >> > The benefit seem to be result of eliminating the if condition here, as we are now having one less eof check per loop, previously we got: O: 4: 2x2 in _M_get per operator* and equal N: 3: 2x1 in _M_get per operator* and equal, and one in operator++ > return __ret; >> } >> >> diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >> b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >> index 78701d71cee..0318a677f35 100644 >> --- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >> +++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >> @@ -109,8 +109,66 @@ void test02(void) >> } >> } >> >> +void >> +test_empty() >> +{ >> + std::istreambuf_iterator<char> null(0), end; >> + VERIFY( null == end ); >> + >> + std::istringstream ess; >> + // Thie specification hear seem to indicate, that such iterators >> + // are not end-of-stream iterators, as rdbuf pointer is nullptr, >> + // however we treat them as such, as otherwise we would produce >> + // a range containing single eof character. >> + std::istreambuf_iterator<char> it1(ess), it2(ess); >> + VERIFY( it1 == end ); >> + VERIFY( it2 == end ); >> +} >> + >> +void >> +test_multi() >> +{ >> + // C++98 (and later) define operator* in [istreambuf.iterator.ops] as: >> + // Returns: The character obtained via the streambuf member >> sbuf_->sgetc(). >> + // This nails down behavior of mulptiple iterators to same sequence. >> + std::istringstream iss("abcd"); >> + std::istreambuf_iterator<char> it1(iss), it2(iss), end; >> + >> + VERIFY( it1 != end ); >> + VERIFY( it2 != end ); >> + VERIFY( *it1 == 'a' ); >> + VERIFY( *it2 == 'a' ); >> + ++it1; >> + >> + VERIFY( it1 != end ); >> + VERIFY( it2 != end ); >> + VERIFY( *it1 == 'b' ); >> + VERIFY( *it2 == 'b' ); >> + ++it2; >> + >> + VERIFY( it1 != end ); >> + VERIFY( it2 != end ); >> + VERIFY( *it1 == 'c' ); >> + VERIFY( *it2 == 'c' ); >> + ++it2; >> + >> + VERIFY( it1 != end ); >> + VERIFY( it2 != end ); >> + VERIFY( *it1 == 'd' ); >> + VERIFY( *it2 == 'd' ); >> + // second dereference >> + VERIFY( *it1 == 'd' ); >> + VERIFY( *it2 == 'd' ); >> + ++it1; >> + >> + VERIFY( it1 == end ); >> + VERIFY( it2 == end ); >> +} >> + >> int main() >> { >> test02(); >> + test_empty(); >> + test_multi(); >> return 0; >> } >> -- >> 2.52.0 >> >>
On Thu, Jan 8, 2026 at 10:55 AM Tomasz Kaminski <tkaminsk@redhat.com> wrote: > > > On Thu, Jan 8, 2026 at 10:38 AM Tomasz Kaminski <tkaminsk@redhat.com> > wrote: > >> I have performed some microbenchmarking, on following example: >> * bench_empty: input is empty istreamstream, creates iterator and >> compares against end >> * bench_stream_create: only resetting stringstream with new content >> * bench_preinc: for loop over using pre increment >> * bench_postinc: for loop over using post increment >> * bench_deref_posting: while with *it++ >> >> This are the result I have observed, for the changes with the patch: >> -------------------------------------------------------------- >> Benchmark Time CPU Iterations >> -------------------------------------------------------------- >> bench_empty 2.10 ns 2.09 ns 337921434 >> bench_stream_create 10.6 ns 10.6 ns 65736840 >> bench_preinc 419 ns 419 ns 1662732 >> bench_postinc 436 ns 435 ns 1609828 >> bench_deref_postinc 306 ns 305 ns 22866 >> >> And the result before the patch: >> -------------------------------------------------------------- >> Benchmark Time CPU Iterations >> -------------------------------------------------------------- >> bench_empty 2.27 ns 2.26 ns 314944156 >> bench_stream_create 11.7 ns 11.6 ns 60679957 >> bench_preinc 873 ns 871 ns 807352 >> bench_postinc 920 ns 917 ns 751565 >> bench_deref_postinc 706 ns 704 ns 1038942 >> >> For sanity, I have also check the result on g++ 15.2 that ships with my >> system: >> -------------------------------------------------------------- >> Benchmark Time CPU Iterations >> -------------------------------------------------------------- >> bench_empty 2.30 ns 2.30 ns 303070664 >> bench_stream_create 10.6 ns 10.6 ns 65778662 >> bench_preinc 767 ns 765 ns 915085 >> bench_postinc 767 ns 764 ns 913985 >> bench_deref_postinc 675 ns 673 ns 1107772 >> >> So it looks like eliminating save to member from both equal and operator* >> makes implementation >> two times faster. I will look into that. >> >> Attaching also my benchmark file. >> >> Regards, >> Tomasz >> >> On Wed, Jan 7, 2026 at 3:41 PM Tomasz Kamiński <tkaminsk@redhat.com> >> wrote: >> >>> The warning was produced by following seqeunce, given an >>> istream_iterator<char> >>> it, such that *it will result in hitting EoF in it->_M_get(), and thus >>> clearing >>> _M_sbuf, the subsequent call to ++it, will result in _M_sbuf->sbumpc() >>> call >>> on null pointer dereference. This is however an false-positive, as in >>> such sitution >>> it == istream_iteator() returns true, and the iterator should not be >>> dereferenced >>> in first place. >>> >>> This patch addresses the above by clearing the _M_sbuf in operator++, >>> instead of >>> _M_get(). This removes the need for making _M_sbuf mutable, and thus >>> make the >>> implementation conforming with regards to C++11 [res.on.data.races] p3. >>> >>> This change should not or positive have performance impact on the usual >>> iteration >>> patterns, in form: >>> while (it != end) { process(*it); ++it; } >>> In case when it is end-of-stream iterator, the it != end returns in one >>> call of >>> _M_sbuf->sgetc() both before and after the change. However we do not >>> modify _M_sbuf >>> in this case. For non-empty range, we replace call to _M_sbuf->sbumpc() >>> with >>> _M_sbuf->snextc() in pre-increment, and extract the check again from *it >>> to >>> ++it. However, as _M_sbuf is now cleared during increment, so last it != >>> end >>> check avoids _M_sbuf->sgetc() call to check against EoF. >>> >>> However, this change impact the behavior of the post-increment (*it++), >>> as >>> we now load both current character (for return value) and next character >>> (to >>> check against EoF). In consequence we call both sgetc() and snextc(), in >>> contrast >>> to previous single sbumpc() call. >>> >>> PR libstdc++/105580 >>> >>> libstdc++-v3/ChangeLog: >>> >>> * include/bits/streambuf_iterator.h >>> (istreambuf_iterator::_M_sbuf): >>> Remove mutable and adjust whitespace. >>> (istreambuf_iterator::_M_c): Adjust whitespace. >>> (istreambuf_iterator::operator++()): Clear _M_sbuf if next >>> character >>> is EoF. >>> (istreambuf_iterator::operator++(int)): Use _M_sbuf->sgetc() to >>> load current character, and define in terms of __*this. >>> (istreambuf_iterator::_M_get()): Do not clear _M_sbuf in case of >>> EoF. >>> * testsuite/24_iterators/istreambuf_iterator/2.cc: Test for using >>> multiple iterators to same rdbuf. >>> --- >>> This warning turned out to be false-positive (as explained in the commit >>> decription), so we can just add add pragrams to address that. But this >>> change seem to be beneficial anyway, as we are no longer causing >>> data-races. >>> >>> Tested on x86_64-linux. Locally tested with -Wnull-dereference, >>> and the istream_iterator warning in 24_iterators/istreambuf_iterator/2.cc >>> disappears (there are some other warning present I haven't looked into >>> that yet). >>> >>> .../include/bits/streambuf_iterator.h | 20 +++---- >>> .../24_iterators/istreambuf_iterator/2.cc | 58 +++++++++++++++++++ >>> 2 files changed, 66 insertions(+), 12 deletions(-) >>> >>> diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h >>> b/libstdc++-v3/include/bits/streambuf_iterator.h >>> index 93d3dd24f93..095928ca4d8 100644 >>> --- a/libstdc++-v3/include/bits/streambuf_iterator.h >>> +++ b/libstdc++-v3/include/bits/streambuf_iterator.h >>> @@ -112,8 +112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>> // the "end of stream" iterator value. >>> // NB: This implementation assumes the "end of stream" value >>> // is EOF, or -1. >>> - mutable streambuf_type* _M_sbuf; >>> - int_type _M_c; >>> + streambuf_type* _M_sbuf; >>> + int_type _M_c; >>> >>> public: >>> /// Construct end of input stream iterator. >>> @@ -172,7 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>> >>> _M_message(__gnu_debug::__msg_inc_istreambuf) >>> ._M_iterator(*this)); >>> >>> - _M_sbuf->sbumpc(); >>> + if (_S_is_eof(_M_sbuf->snextc())) >>> + _M_sbuf = 0; >>> _M_c = traits_type::eof(); >>> return *this; >>> } >>> @@ -181,14 +182,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>> istreambuf_iterator >>> operator++(int) >>> { >>> - __glibcxx_requires_cond(_M_sbuf && >>> - (!_S_is_eof(_M_c) || >>> !_S_is_eof(_M_sbuf->sgetc())), >>> - >>> _M_message(__gnu_debug::__msg_inc_istreambuf) >>> - ._M_iterator(*this)); >>> - >>> istreambuf_iterator __old = *this; >>> - __old._M_c = _M_sbuf->sbumpc(); >>> - _M_c = traits_type::eof(); >>> + __old._M_c = _M_sbuf->sgetc(); >>> + ++*this; >>> return __old; >>> } >>> >>> @@ -206,8 +202,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>> _M_get() const >>> { >>> int_type __ret = _M_c; >>> - if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = >>> _M_sbuf->sgetc())) >>> - _M_sbuf = 0; >>> + if (_M_sbuf && _S_is_eof(__ret)) >>> + __ret = _M_sbuf->sgetc(); >>> >> The benefit seem to be result of eliminating the if condition here, as we > are now having > one less eof check per loop, previously we got: > O: 4: 2x2 in _M_get per operator* and equal > N: 3: 2x1 in _M_get per operator* and equal, and one in operator++ > We could also eliminate the _S_is_eof(__ret) check, by returning __proxy instead of istream_iterator for pre-increment, and eliminate uses of _M_c completely. This will however break already non-portable code that uses: istream_iteartor i; istream_iteartor i2 = i++; We cannot provide the conversion operator, as this would require reintroducing S_is_eof check. I have prototyped that, and there is a measurable benefit from doing it, but I think something that we should do in stage 1 for GCC 17. return __ret; >>> } >>> >>> diff --git >>> a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >>> b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >>> index 78701d71cee..0318a677f35 100644 >>> --- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >>> +++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >>> @@ -109,8 +109,66 @@ void test02(void) >>> } >>> } >>> >>> +void >>> +test_empty() >>> +{ >>> + std::istreambuf_iterator<char> null(0), end; >>> + VERIFY( null == end ); >>> + >>> + std::istringstream ess; >>> + // Thie specification hear seem to indicate, that such iterators >>> + // are not end-of-stream iterators, as rdbuf pointer is nullptr, >>> + // however we treat them as such, as otherwise we would produce >>> + // a range containing single eof character. >>> + std::istreambuf_iterator<char> it1(ess), it2(ess); >>> + VERIFY( it1 == end ); >>> + VERIFY( it2 == end ); >>> +} >>> + >>> +void >>> +test_multi() >>> +{ >>> + // C++98 (and later) define operator* in [istreambuf.iterator.ops] as: >>> + // Returns: The character obtained via the streambuf member >>> sbuf_->sgetc(). >>> + // This nails down behavior of mulptiple iterators to same sequence. >>> + std::istringstream iss("abcd"); >>> + std::istreambuf_iterator<char> it1(iss), it2(iss), end; >>> + >>> + VERIFY( it1 != end ); >>> + VERIFY( it2 != end ); >>> + VERIFY( *it1 == 'a' ); >>> + VERIFY( *it2 == 'a' ); >>> + ++it1; >>> + >>> + VERIFY( it1 != end ); >>> + VERIFY( it2 != end ); >>> + VERIFY( *it1 == 'b' ); >>> + VERIFY( *it2 == 'b' ); >>> + ++it2; >>> + >>> + VERIFY( it1 != end ); >>> + VERIFY( it2 != end ); >>> + VERIFY( *it1 == 'c' ); >>> + VERIFY( *it2 == 'c' ); >>> + ++it2; >>> + >>> + VERIFY( it1 != end ); >>> + VERIFY( it2 != end ); >>> + VERIFY( *it1 == 'd' ); >>> + VERIFY( *it2 == 'd' ); >>> + // second dereference >>> + VERIFY( *it1 == 'd' ); >>> + VERIFY( *it2 == 'd' ); >>> + ++it1; >>> + >>> + VERIFY( it1 == end ); >>> + VERIFY( it2 == end ); >>> +} >>> + >>> int main() >>> { >>> test02(); >>> + test_empty(); >>> + test_multi(); >>> return 0; >>> } >>> -- >>> 2.52.0 >>> >>>
On Thu, Jan 8, 2026 at 11:52 AM Tomasz Kaminski <tkaminsk@redhat.com> wrote: > > > On Thu, Jan 8, 2026 at 10:55 AM Tomasz Kaminski <tkaminsk@redhat.com> > wrote: > >> >> >> On Thu, Jan 8, 2026 at 10:38 AM Tomasz Kaminski <tkaminsk@redhat.com> >> wrote: >> >>> I have performed some microbenchmarking, on following example: >>> * bench_empty: input is empty istreamstream, creates iterator and >>> compares against end >>> * bench_stream_create: only resetting stringstream with new content >>> * bench_preinc: for loop over using pre increment >>> * bench_postinc: for loop over using post increment >>> * bench_deref_posting: while with *it++ >>> >>> This are the result I have observed, for the changes with the patch: >>> -------------------------------------------------------------- >>> Benchmark Time CPU Iterations >>> -------------------------------------------------------------- >>> bench_empty 2.10 ns 2.09 ns 337921434 >>> bench_stream_create 10.6 ns 10.6 ns 65736840 >>> bench_preinc 419 ns 419 ns 1662732 >>> bench_postinc 436 ns 435 ns 1609828 >>> bench_deref_postinc 306 ns 305 ns 22866 >>> >>> And the result before the patch: >>> -------------------------------------------------------------- >>> Benchmark Time CPU Iterations >>> -------------------------------------------------------------- >>> bench_empty 2.27 ns 2.26 ns 314944156 >>> bench_stream_create 11.7 ns 11.6 ns 60679957 >>> bench_preinc 873 ns 871 ns 807352 >>> bench_postinc 920 ns 917 ns 751565 >>> bench_deref_postinc 706 ns 704 ns 1038942 >>> >> >>> For sanity, I have also check the result on g++ 15.2 that ships with my >>> system: >>> -------------------------------------------------------------- >>> Benchmark Time CPU Iterations >>> -------------------------------------------------------------- >>> bench_empty 2.30 ns 2.30 ns 303070664 >>> bench_stream_create 10.6 ns 10.6 ns 65778662 >>> bench_preinc 767 ns 765 ns 915085 >>> bench_postinc 767 ns 764 ns 913985 >>> bench_deref_postinc 675 ns 673 ns 1107772 >>> >> The times reported when using __proxy as return type for post-increment, without providing conversion to istream_iterator: -------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------- bench_empty 1.80 ns 1.79 ns 398182529 bench_stream_create 10.6 ns 10.6 ns 65450500 bench_preinc 314 ns 314 ns 2231714 bench_postinc 360 ns 359 ns 2029998 bench_deref_postinc 315 ns 313 ns 2235176 > >>> So it looks like eliminating save to member from both equal and >>> operator* makes implementation >>> two times faster. I will look into that. >>> >>> Attaching also my benchmark file. >>> >>> Regards, >>> Tomasz >>> >>> On Wed, Jan 7, 2026 at 3:41 PM Tomasz Kamiński <tkaminsk@redhat.com> >>> wrote: >>> >>>> The warning was produced by following seqeunce, given an >>>> istream_iterator<char> >>>> it, such that *it will result in hitting EoF in it->_M_get(), and thus >>>> clearing >>>> _M_sbuf, the subsequent call to ++it, will result in _M_sbuf->sbumpc() >>>> call >>>> on null pointer dereference. This is however an false-positive, as in >>>> such sitution >>>> it == istream_iteator() returns true, and the iterator should not be >>>> dereferenced >>>> in first place. >>>> >>>> This patch addresses the above by clearing the _M_sbuf in operator++, >>>> instead of >>>> _M_get(). This removes the need for making _M_sbuf mutable, and thus >>>> make the >>>> implementation conforming with regards to C++11 [res.on.data.races] p3. >>>> >>>> This change should not or positive have performance impact on the usual >>>> iteration >>>> patterns, in form: >>>> while (it != end) { process(*it); ++it; } >>>> In case when it is end-of-stream iterator, the it != end returns in one >>>> call of >>>> _M_sbuf->sgetc() both before and after the change. However we do not >>>> modify _M_sbuf >>>> in this case. For non-empty range, we replace call to _M_sbuf->sbumpc() >>>> with >>>> _M_sbuf->snextc() in pre-increment, and extract the check again from >>>> *it to >>>> ++it. However, as _M_sbuf is now cleared during increment, so last it >>>> != end >>>> check avoids _M_sbuf->sgetc() call to check against EoF. >>>> >>>> However, this change impact the behavior of the post-increment (*it++), >>>> as >>>> we now load both current character (for return value) and next >>>> character (to >>>> check against EoF). In consequence we call both sgetc() and snextc(), >>>> in contrast >>>> to previous single sbumpc() call. >>>> >>>> PR libstdc++/105580 >>>> >>>> libstdc++-v3/ChangeLog: >>>> >>>> * include/bits/streambuf_iterator.h >>>> (istreambuf_iterator::_M_sbuf): >>>> Remove mutable and adjust whitespace. >>>> (istreambuf_iterator::_M_c): Adjust whitespace. >>>> (istreambuf_iterator::operator++()): Clear _M_sbuf if next >>>> character >>>> is EoF. >>>> (istreambuf_iterator::operator++(int)): Use _M_sbuf->sgetc() to >>>> load current character, and define in terms of __*this. >>>> (istreambuf_iterator::_M_get()): Do not clear _M_sbuf in case >>>> of EoF. >>>> * testsuite/24_iterators/istreambuf_iterator/2.cc: Test for >>>> using >>>> multiple iterators to same rdbuf. >>>> --- >>>> This warning turned out to be false-positive (as explained in the commit >>>> decription), so we can just add add pragrams to address that. But this >>>> change seem to be beneficial anyway, as we are no longer causing >>>> data-races. >>>> >>>> Tested on x86_64-linux. Locally tested with -Wnull-dereference, >>>> and the istream_iterator warning in >>>> 24_iterators/istreambuf_iterator/2.cc >>>> disappears (there are some other warning present I haven't looked into >>>> that yet). >>>> >>>> .../include/bits/streambuf_iterator.h | 20 +++---- >>>> .../24_iterators/istreambuf_iterator/2.cc | 58 +++++++++++++++++++ >>>> 2 files changed, 66 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h >>>> b/libstdc++-v3/include/bits/streambuf_iterator.h >>>> index 93d3dd24f93..095928ca4d8 100644 >>>> --- a/libstdc++-v3/include/bits/streambuf_iterator.h >>>> +++ b/libstdc++-v3/include/bits/streambuf_iterator.h >>>> @@ -112,8 +112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>>> // the "end of stream" iterator value. >>>> // NB: This implementation assumes the "end of stream" value >>>> // is EOF, or -1. >>>> - mutable streambuf_type* _M_sbuf; >>>> - int_type _M_c; >>>> + streambuf_type* _M_sbuf; >>>> + int_type _M_c; >>>> >>>> public: >>>> /// Construct end of input stream iterator. >>>> @@ -172,7 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>>> >>>> _M_message(__gnu_debug::__msg_inc_istreambuf) >>>> ._M_iterator(*this)); >>>> >>>> - _M_sbuf->sbumpc(); >>>> + if (_S_is_eof(_M_sbuf->snextc())) >>>> + _M_sbuf = 0; >>>> _M_c = traits_type::eof(); >>>> return *this; >>>> } >>>> @@ -181,14 +182,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>>> istreambuf_iterator >>>> operator++(int) >>>> { >>>> - __glibcxx_requires_cond(_M_sbuf && >>>> - (!_S_is_eof(_M_c) || >>>> !_S_is_eof(_M_sbuf->sgetc())), >>>> - >>>> _M_message(__gnu_debug::__msg_inc_istreambuf) >>>> - ._M_iterator(*this)); >>>> - >>>> istreambuf_iterator __old = *this; >>>> - __old._M_c = _M_sbuf->sbumpc(); >>>> - _M_c = traits_type::eof(); >>>> + __old._M_c = _M_sbuf->sgetc(); >>>> + ++*this; >>>> return __old; >>>> } >>>> >>>> @@ -206,8 +202,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >>>> _M_get() const >>>> { >>>> int_type __ret = _M_c; >>>> - if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = >>>> _M_sbuf->sgetc())) >>>> - _M_sbuf = 0; >>>> + if (_M_sbuf && _S_is_eof(__ret)) >>>> + __ret = _M_sbuf->sgetc(); >>>> >>> The benefit seem to be result of eliminating the if condition here, as >> we are now having >> one less eof check per loop, previously we got: >> O: 4: 2x2 in _M_get per operator* and equal >> N: 3: 2x1 in _M_get per operator* and equal, and one in operator++ >> > We could also eliminate the _S_is_eof(__ret) check, by returning __proxy > instead of istream_iterator > for pre-increment, and eliminate uses of _M_c completely. This will > however break already non-portable > code that uses: > istream_iteartor i; > istream_iteartor i2 = i++; > We cannot provide the conversion operator, as this would require > reintroducing S_is_eof check. > > I have prototyped that, and there is a measurable benefit from doing it, > but I think something that we should > do in stage 1 for GCC 17. > > return __ret; >>>> } >>>> >>>> diff --git >>>> a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >>>> b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >>>> index 78701d71cee..0318a677f35 100644 >>>> --- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >>>> +++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc >>>> @@ -109,8 +109,66 @@ void test02(void) >>>> } >>>> } >>>> >>>> +void >>>> +test_empty() >>>> +{ >>>> + std::istreambuf_iterator<char> null(0), end; >>>> + VERIFY( null == end ); >>>> + >>>> + std::istringstream ess; >>>> + // Thie specification hear seem to indicate, that such iterators >>>> + // are not end-of-stream iterators, as rdbuf pointer is nullptr, >>>> + // however we treat them as such, as otherwise we would produce >>>> + // a range containing single eof character. >>>> + std::istreambuf_iterator<char> it1(ess), it2(ess); >>>> + VERIFY( it1 == end ); >>>> + VERIFY( it2 == end ); >>>> +} >>>> + >>>> +void >>>> +test_multi() >>>> +{ >>>> + // C++98 (and later) define operator* in [istreambuf.iterator.ops] >>>> as: >>>> + // Returns: The character obtained via the streambuf member >>>> sbuf_->sgetc(). >>>> + // This nails down behavior of mulptiple iterators to same sequence. >>>> + std::istringstream iss("abcd"); >>>> + std::istreambuf_iterator<char> it1(iss), it2(iss), end; >>>> + >>>> + VERIFY( it1 != end ); >>>> + VERIFY( it2 != end ); >>>> + VERIFY( *it1 == 'a' ); >>>> + VERIFY( *it2 == 'a' ); >>>> + ++it1; >>>> + >>>> + VERIFY( it1 != end ); >>>> + VERIFY( it2 != end ); >>>> + VERIFY( *it1 == 'b' ); >>>> + VERIFY( *it2 == 'b' ); >>>> + ++it2; >>>> + >>>> + VERIFY( it1 != end ); >>>> + VERIFY( it2 != end ); >>>> + VERIFY( *it1 == 'c' ); >>>> + VERIFY( *it2 == 'c' ); >>>> + ++it2; >>>> + >>>> + VERIFY( it1 != end ); >>>> + VERIFY( it2 != end ); >>>> + VERIFY( *it1 == 'd' ); >>>> + VERIFY( *it2 == 'd' ); >>>> + // second dereference >>>> + VERIFY( *it1 == 'd' ); >>>> + VERIFY( *it2 == 'd' ); >>>> + ++it1; >>>> + >>>> + VERIFY( it1 == end ); >>>> + VERIFY( it2 == end ); >>>> +} >>>> + >>>> int main() >>>> { >>>> test02(); >>>> + test_empty(); >>>> + test_multi(); >>>> return 0; >>>> } >>>> -- >>>> 2.52.0 >>>> >>>>
diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h b/libstdc++-v3/include/bits/streambuf_iterator.h index 93d3dd24f93..095928ca4d8 100644 --- a/libstdc++-v3/include/bits/streambuf_iterator.h +++ b/libstdc++-v3/include/bits/streambuf_iterator.h @@ -112,8 +112,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // the "end of stream" iterator value. // NB: This implementation assumes the "end of stream" value // is EOF, or -1. - mutable streambuf_type* _M_sbuf; - int_type _M_c; + streambuf_type* _M_sbuf; + int_type _M_c; public: /// Construct end of input stream iterator. @@ -172,7 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_message(__gnu_debug::__msg_inc_istreambuf) ._M_iterator(*this)); - _M_sbuf->sbumpc(); + if (_S_is_eof(_M_sbuf->snextc())) + _M_sbuf = 0; _M_c = traits_type::eof(); return *this; } @@ -181,14 +182,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION istreambuf_iterator operator++(int) { - __glibcxx_requires_cond(_M_sbuf && - (!_S_is_eof(_M_c) || !_S_is_eof(_M_sbuf->sgetc())), - _M_message(__gnu_debug::__msg_inc_istreambuf) - ._M_iterator(*this)); - istreambuf_iterator __old = *this; - __old._M_c = _M_sbuf->sbumpc(); - _M_c = traits_type::eof(); + __old._M_c = _M_sbuf->sgetc(); + ++*this; return __old; } @@ -206,8 +202,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_get() const { int_type __ret = _M_c; - if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = _M_sbuf->sgetc())) - _M_sbuf = 0; + if (_M_sbuf && _S_is_eof(__ret)) + __ret = _M_sbuf->sgetc(); return __ret; } diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc index 78701d71cee..0318a677f35 100644 --- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc +++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc @@ -109,8 +109,66 @@ void test02(void) } } +void +test_empty() +{ + std::istreambuf_iterator<char> null(0), end; + VERIFY( null == end ); + + std::istringstream ess; + // Thie specification hear seem to indicate, that such iterators + // are not end-of-stream iterators, as rdbuf pointer is nullptr, + // however we treat them as such, as otherwise we would produce + // a range containing single eof character. + std::istreambuf_iterator<char> it1(ess), it2(ess); + VERIFY( it1 == end ); + VERIFY( it2 == end ); +} + +void +test_multi() +{ + // C++98 (and later) define operator* in [istreambuf.iterator.ops] as: + // Returns: The character obtained via the streambuf member sbuf_->sgetc(). + // This nails down behavior of mulptiple iterators to same sequence. + std::istringstream iss("abcd"); + std::istreambuf_iterator<char> it1(iss), it2(iss), end; + + VERIFY( it1 != end ); + VERIFY( it2 != end ); + VERIFY( *it1 == 'a' ); + VERIFY( *it2 == 'a' ); + ++it1; + + VERIFY( it1 != end ); + VERIFY( it2 != end ); + VERIFY( *it1 == 'b' ); + VERIFY( *it2 == 'b' ); + ++it2; + + VERIFY( it1 != end ); + VERIFY( it2 != end ); + VERIFY( *it1 == 'c' ); + VERIFY( *it2 == 'c' ); + ++it2; + + VERIFY( it1 != end ); + VERIFY( it2 != end ); + VERIFY( *it1 == 'd' ); + VERIFY( *it2 == 'd' ); + // second dereference + VERIFY( *it1 == 'd' ); + VERIFY( *it2 == 'd' ); + ++it1; + + VERIFY( it1 == end ); + VERIFY( it2 == end ); +} + int main() { test02(); + test_empty(); + test_multi(); return 0; }