From patchwork Mon Apr 8 16:48:19 2024
X-Patchwork-Submitter: Jonathan Wakely
X-Patchwork-Id: 88186
From: Jonathan Wakely
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: [PATCH v2] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
Date: Mon, 8 Apr 2024 17:48:19 +0100
Message-ID: <20240408165252.196710-1-jwakely@redhat.com>
In-Reply-To: <20240404153158.313297-1-jwakely@redhat.com>
References: <20240404153158.313297-1-jwakely@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Patch v2.

I realised that it's not only negative delim values that cause the
problem, but also ones greater than CHAR_MAX. Calling ignore(n, 'a'+256)
will cause traits_type::find to match 'a' but then the eq_int_type
comparison will fail because (int)'a' != (int)('a' + 256).

This version of the patch calls to_int_type on the delim and, if that
alters the value, it's never going to match, so we skip the loop that
tries to find it and just ignore up to n chars instead.

Tested x86_64-linux and aarch64-linux.

-- >8 --

A negative delim value passed to std::istream::ignore can never match
any character in the stream, because the comparison is done using
traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns
negative values (except at EOF). The optimized version of ignore for the
std::istream specialization uses traits_type::find to locate the delim
character in the streambuf, which _can_ match a negative delim on
platforms where char is signed, but then we do another comparison using
eq_int_type which fails. The code then keeps looping forever, with
traits_type::find locating the character and traits_type::eq_int_type
saying it's not a match, so traits_type::find is used again and finds
the same character again.

A possible fix would be to check with eq_int_type after a successful
find, to see whether we really have a match. However, that would be
suboptimal since we know that a negative delimiter will never match
using eq_int_type. So a better fix is to adjust the check at the top of
the function that handles delim==eof(), so that we treat all negative
delim values as equivalent to EOF.
That way we don't bother using find to search for something that will
never match with eq_int_type.

The version of ignore in the primary template doesn't need a change,
because it doesn't use traits_type::find. Instead, characters are
extracted one by one and always matched using eq_int_type. That avoids
the inconsistency between find and eq_int_type. The specialization for
std::wistream does use traits_type::find, but traits_type::to_int_type
is equivalent to an implicit conversion from wchar_t to wint_t, so
passing a wchar_t directly to ignore without using to_int_type works.

libstdc++-v3/ChangeLog:

	PR libstdc++/93672
	* src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
	Treat all negative delimiter values as eof().
	* testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
	* testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New test.
---
 libstdc++-v3/src/c++98/istream.cc             |  13 ++-
 .../27_io/basic_istream/ignore/char/93672.cc  | 101 ++++++++++++++++++
 .../basic_istream/ignore/wchar_t/93672.cc     |  34 ++++++
 3 files changed, 146 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc

diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc
index 07ac739c26a..d1b4444ff2b 100644
--- a/libstdc++-v3/src/c++98/istream.cc
+++ b/libstdc++-v3/src/c++98/istream.cc
@@ -112,8 +112,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     basic_istream<char>::
     ignore(streamsize __n, int_type __delim)
     {
-      if (traits_type::eq_int_type(__delim, traits_type::eof()))
-        return ignore(__n);
+      {
+        // If conversion to int_type changes the value then __delim does not
+        // correspond to a value of type char_type, and so will never match
+        // a character extracted from the input sequence. Just use ignore(n).
+        const int_type chk_delim = traits_type::to_int_type(__delim);
+        const bool matchable = traits_type::eq_int_type(chk_delim, __delim);
+        if (__builtin_expect(!matchable, 0))
+          return ignore(__n);
+        // Now we know that __delim is a valid char_type value, so it's safe
+        // for the code below to use traits_type::find to search for it.
+      }
 
       _M_gcount = 0;
       sentry __cerb(*this, true);
diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
new file mode 100644
index 00000000000..96737485b83
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
@@ -0,0 +1,101 @@
+// { dg-do run }
+
+#include
+#include
+#include
+
+void
+test_pr93672() // std::basic_istream::ignore hangs if delim MSB is set
+{
+  std::istringstream in(".\xfc..\xfd...\xfe.");
+
+  // This should find '\xfd' even on platforms where char is signed,
+  // because the delimiter is correctly converted to the stream's int_type.
+  in.ignore(100, std::char_traits<char>::to_int_type('\xfc'));
+  VERIFY( in.gcount() == 2 );
+  VERIFY( ! in.eof() );
+
+  // This should work equivalently to traits_type::to_int_type
+  in.ignore(100, (unsigned char)'\xfd');
+  VERIFY( in.gcount() == 3 );
+  VERIFY( ! in.eof() );
+
+  // This only works if char is unsigned.
+  in.ignore(100, '\xfe');
+  if (std::numeric_limits<char>::is_signed)
+  {
+    // When char is signed, '\xfe' != traits_type::to_int_type('\xfe')
+    // so the delimiter does not match the character in the input sequence,
+    // and ignore consumes all input until EOF.
+    VERIFY( in.gcount() == 5 );
+    VERIFY( in.eof() );
+  }
+  else
+  {
+    // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter
+    // matches the character in the input sequence, and doesn't reach EOF.
+    VERIFY( in.gcount() == 4 );
+    VERIFY( ! in.eof() );
+  }
+
+  in.clear();
+  in.str(".a.");
+  in.ignore(100, 'a' + 256); // Should not match 'a'
+  VERIFY( in.gcount() == 3 );
+  VERIFY( in.eof() );
+}
+
+// Custom traits type that inherits all behaviour from std::char_traits.
+struct traits : std::char_traits<char> { };
+
+void
+test_primary_template()
+{
+  // Check that the primary template for std::basic_istream::ignore
+  // works the same as the std::istream::ignore specialization.
+  // The infinite loop bug was never present in the primary template,
+  // because it doesn't use traits_type::find to search the input sequence.
+
+  std::basic_istringstream<char, traits> in(".\xfc..\xfd...\xfe.");
+
+  // This should find '\xfd' even on platforms where char is signed,
+  // because the delimiter is correctly converted to the stream's int_type.
+  in.ignore(100, std::char_traits<char>::to_int_type('\xfc'));
+  VERIFY( in.gcount() == 2 );
+  VERIFY( ! in.eof() );
+
+  // This should work equivalently to traits_type::to_int_type
+  in.ignore(100, (unsigned char)'\xfd');
+  VERIFY( in.gcount() == 3 );
+  VERIFY( ! in.eof() );
+
+  // This only works if char is unsigned.
+  in.ignore(100, '\xfe');
+  if (std::numeric_limits<char>::is_signed)
+  {
+    // When char is signed, '\xfe' != traits_type::to_int_type('\xfe')
+    // so the delimiter does not match the character in the input sequence,
+    // and ignore consumes all input until EOF.
+    VERIFY( in.gcount() == 5 );
+    VERIFY( in.eof() );
+  }
+  else
+  {
+    // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter
+    // matches the character in the input sequence, and doesn't reach EOF.
+    VERIFY( in.gcount() == 4 );
+    VERIFY( ! in.eof() );
+  }
+
+  in.clear();
+  in.str(".a.");
+  in.ignore(100, 'a' + 256); // Should not match 'a'
+  VERIFY( in.gcount() == 3 );
+  VERIFY( in.eof() );
+}
+
+int main()
+{
+  test_pr93672();
+  test_primary_template();
+}
diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
new file mode 100644
index 00000000000..5ce9155e02c
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
@@ -0,0 +1,34 @@
+// { dg-do run }
+
+#include
+#include
+#include
+#include
+
+// PR 93672 was a bug in std::istream that never affected std::wistream.
+// This test ensures that the bug doesn't get introduced to std::wistream.
+void
+test_pr93672()
+{
+  std::wstring str = L".x..x.";
+  str[1] = (wchar_t)-2;
+  str[4] = (wchar_t)-3;
+  std::wistringstream in(str);
+
+  // This should find the character even on platforms where wchar_t is signed,
+  // because the delimiter is correctly converted to the stream's int_type.
+  in.ignore(100, std::char_traits<wchar_t>::to_int_type((wchar_t)-2));
+  VERIFY( in.gcount() == 2 );
+  VERIFY( ! in.eof() );
+
+  // This also works, because std::char_traits::to_int_type(wc) is
+  // equivalent to (int_type)wc so using to_int_type isn't needed.
+  in.ignore(100, (wchar_t)-3);
+  VERIFY( in.gcount() == 3 );
+  VERIFY( ! in.eof() );
+}
+
+int main()
+{
+  test_pr93672();
+}
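
Illustration of the find/eq_int_type mismatch described in the commit
message. This is a minimal standalone sketch, not code from the patch.
The helper names find_and_eq_disagree and delim_is_matchable are
hypothetical, and the explicit to_char_type call stands in for the
implicit int-to-char conversion that happens when the patch passes
__delim to to_int_type. It assumes an 8-bit char.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef std::char_traits<char> traits;

// Hypothetical helper: returns true when traits::find locates a character
// for the narrowed delimiter but eq_int_type then rejects that character.
// This is the disagreement that made the old loop spin forever.
bool find_and_eq_disagree(const char* buf, std::size_t len, int delim)
{
  assert(buf != 0);
  // find() compares char values, so the delimiter is narrowed to char first.
  const char* p = traits::find(buf, len, traits::to_char_type(delim));
  // eq_int_type() compares the full int values, with no narrowing.
  return p != 0 && !traits::eq_int_type(traits::to_int_type(*p), delim);
}

// Hypothetical helper mirroring the check added by the patch: a delimiter
// can only match if a round trip through char_type leaves it unchanged.
bool delim_is_matchable(int delim)
{
  const traits::int_type chk
    = traits::to_int_type(traits::to_char_type(delim));
  return traits::eq_int_type(chk, delim);
}
```

With these helpers, find_and_eq_disagree(".a.", 3, 'a' + 256) is true
while delim_is_matchable('a' + 256) is false, so a caller that tests
matchability first never enters the find loop for such a delimiter.
Negative values such as -2 and traits::eof() are likewise reported as
not matchable.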