Message ID | 20210325140102.31043-1-hi@alyssa.is |
---|---|
State | Committed |
Commit | b03e4d7bd25b1ac485f858f0a857ba6085e8c9b0 |
Delegated to: | Florian Weimer |
Headers |
Return-Path: <libc-alpha-bounces@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 557863844047; Thu, 25 Mar 2021 14:01:18 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by sourceware.org (Postfix) with ESMTPS id 7792B384406C for <libc-alpha@sourceware.org>; Thu, 25 Mar 2021 14:01:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 7792B384406C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=alyssa.is Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=qyliss@eve.qyliss.net Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 3A1705C00CD for <libc-alpha@sourceware.org>; Thu, 25 Mar 2021 10:01:16 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Thu, 25 Mar 2021 10:01:16 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=alyssa.is; h= from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; s=fm2; bh=5G5jewYIH837KePPJGa0omo70D W6aFiE32/JHNV4rlg=; b=OWygHzCeTRaznng8Y8AwypNCPjIN9pKMo3NnZZ6frQ t6DSFDQUke4ynfEU20QNCJJNDsX6TfAmqCB9NMErZD4jTu9K2xUJz2RH5Tvl0MmP vj3DAbJUxICELKX9/Skvqs5BqK7O/M57+D/6DSVxDV3h2Vghm5CTX6HwcKpjcSBS c3qEB/LSqRU+sKl/yL/W3DnZ9g15rWhr/vNTRjlJmuEO47hwhq1lKq+LCM/dwrq4 +Qy3wphP5XjuAUTH1wFHbhr67iA2MLcW8+pg8wFN4PJv0qTUDDb4ozSM7z+MXciw 9ZnYhz4Vks0NtjIZIWbIPcRLI+8XtDxZzi+9EV3i8cwA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :message-id:mime-version:subject:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm2; bh=5G5jewYIH837KePPJ Ga0omo70DW6aFiE32/JHNV4rlg=; b=jl5deIWQVxdPrBG6Jqrs2PYDFONJstmT6 OGgN252TB9SqzqFWzv7JzTQ/cCXsnep8OZgx/GZPqMFdXDOFR9l0+/Sl2hvKn7Cl gEEaLX2tO0YeyhCfJCoiXymaAma2MbZ/aho3IPVZ7qzb7Xp6HnMlRsvBCwIyWA9H 2cfTVEzf6qYJ1q82rJmQCtDS7W6HkbRlkQ5tzERgDMZe+Np94VnXY6DVwsvHTSR1 18vtsHjZXvIT9TcmqPspV/Xj7Ik9ozDUPQSZZjFjYOXe7QTO4pW3+ro4bct26vDo PkDGvShJd6KdnVXsye9hJXfH72bPU+75CO6PlujCZ6reEOYTLG8QA== X-ME-Sender: <xms:ppdcYHzyqdKMbqa5DgwBDZ7NypZlXJmdo_PcNq2XtPEvHa-iGOvOoA> <xme:ppdcYPMn_dy1HF7sLOMnpy8ZeRw-GAzaDH-a38fbEeCboNh9Ng2TYEj0i9fSB7TfB LTSB0sOifhgz0K3kQ> X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrudehtddgheelucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgggfestdekredtre dttdenucfhrhhomheptehlhihsshgrucftohhsshcuoehhihesrghlhihsshgrrdhisheq necuggftrfgrthhtvghrnhephedvfffghfetieejgfetfedtgffhvdehueehvdejudfgge fgleejgfelfeevgfefnecukfhppeekgedrudekgedrvdeftddrvdduleenucevlhhushht vghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehqhihlihhsshesvghvvg drqhihlhhishhsrdhnvght X-ME-Proxy: <xmx:ppdcYONJ17hTouW6CPWNth-ZfSX-AIDcJ3usT4z5ucBDntbeWB9LsA> <xmx:ppdcYJRU-9hAgfHoPJcW3W06mZcPXOekCuqNqc-vVFT48X3_hHAFkg> <xmx:ppdcYIAEb5A_xrAWQhxl8rDOM8TnN_ytS1Rbo9tLDqzEdeKXs49WMg> <xmx:rJdcYP08URXkHeQpwZEi6AWruQW3ewL2Qf3zRRBfVCHHQMLFoPtVsw> Received: from eve.qyliss.net (p54b8e6db.dip0.t-ipconnect.de [84.184.230.219]) by mail.messagingengine.com (Postfix) with ESMTPA id 27A9324005B for <libc-alpha@sourceware.org>; Thu, 25 Mar 2021 10:01:10 -0400 (EDT) Received: by eve.qyliss.net (Postfix, from userid 1000) id 5B0DFFB8; Thu, 25 Mar 2021 14:01:09 +0000 (UTC) From: Alyssa Ross <hi@alyssa.is> To: libc-alpha@sourceware.org Subject: [PATCH] stdio: fix vfscanf with matches longer than INT_MAX (bug 27650) Date: Thu, 25 Mar 2021 14:01:02 +0000 Message-Id: <20210325140102.31043-1-hi@alyssa.is> X-Mailer: git-send-email 2.30.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_SOFTFAIL, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> Cc: Alyssa Ross <hi@alyssa.is> Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org> |
Series |
stdio: fix vfscanf with matches longer than INT_MAX (bug 27650)
|
|
Commit Message
Alyssa Ross
March 25, 2021, 2:01 p.m. UTC
Patterns like %*[ can safely be used to match a great many characters, and it's quite realisitic to use them for more than INT_MAX characters from an IO stream. With the previous approach, after INT_MAX characters (v)fscanf would return successfully, indicating an end to the match, even though there wasn't one. --- I have not done a copyright assignment yet, but I think this change should be small enough to be exempt? stdio-common/vfscanf-internal.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-)
Comments
* Alyssa Ross: > I have not done a copyright assignment yet, but I think this change > should be small enough to be exempt? Yes, I think it's small enough. The test case wouldn't be, though. I think the one on the bug needs some large (infinite) input on the stdin, though. A real test case for glibc should probably involve pipe, fork, and fdopen. fopencookie could work, too. > stdio-common/vfscanf-internal.c | 13 ++++--------- > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/stdio-common/vfscanf-internal.c b/stdio-common/vfscanf-internal.c > index 38e74776a5..1d81e16f4e 100644 > --- a/stdio-common/vfscanf-internal.c > +++ b/stdio-common/vfscanf-internal.c > @@ -2479,11 +2479,6 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, > else > not_in = 0; > > - if (width < 0) > - /* There is no width given so there is also no limit on the > - number of characters we read. Therefore we set width to > - a very high value to make the algorithm easier. */ > - width = INT_MAX; > > #ifdef COMPILE_WSCANF > /* Find the beginning and the end of the scanlist. We are not > @@ -2647,7 +2642,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, > } > } > } > - while (--width > 0 && inchar () != WEOF); > + while ((width < 0 || --width > 0) && inchar () != WEOF); > out: > #else > char buf[MB_LEN_MAX]; > @@ -2732,7 +2727,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, > } > } > > - if (--width <= 0) > + if (width >= 0 && --width <= 0) > break; > } > while (inchar () != EOF); > @@ -2884,7 +2879,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, > assert (n <= MB_LEN_MAX); > str += n; > } > - while (--width > 0 && inchar () != WEOF); > + while ((width < 0 || --width > 0) && inchar () != WEOF); > out2: > #else > do > @@ -2938,7 +2933,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, > } > } > } > - while (--width > 0 && inchar () != EOF); > + while ((width < 0 || --width > 0) && inchar () != EOF); > #endif > > if (__glibc_unlikely (now == read_in)) So I tried to review this. -U100 helped. I was worried about width starting out as positive and going negative. But as far as I can tell, processing stops once width == 0, so the issue cannot happen. Do you want to work on the test case? Will the copyright assignment be an obstacle?
>> I have not done a copyright assignment yet, but I think this change >> should be small enough to be exempt? > > Yes, I think it's small enough. > > The test case wouldn't be, though. I think the one on the bug needs > some large (infinite) input on the stdin, though. A real test case > for glibc should probably involve pipe, fork, and fdopen. fopencookie > could work, too. Oh, thanks for telling me about fopencookie! I'd never have known about that otherwise. I've started having a go at a test case using it and it seems like it'll work well. >> stdio-common/vfscanf-internal.c | 13 ++++--------- >> 1 file changed, 4 insertions(+), 9 deletions(-) >> >> diff --git a/stdio-common/vfscanf-internal.c b/stdio-common/vfscanf-internal.c >> index 38e74776a5..1d81e16f4e 100644 >> --- a/stdio-common/vfscanf-internal.c >> +++ b/stdio-common/vfscanf-internal.c >> @@ -2479,11 +2479,6 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, >> else >> not_in = 0; >> >> - if (width < 0) >> - /* There is no width given so there is also no limit on the >> - number of characters we read. Therefore we set width to >> - a very high value to make the algorithm easier. */ >> - width = INT_MAX; >> >> #ifdef COMPILE_WSCANF >> /* Find the beginning and the end of the scanlist. We are not >> @@ -2647,7 +2642,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, >> } >> } >> } >> - while (--width > 0 && inchar () != WEOF); >> + while ((width < 0 || --width > 0) && inchar () != WEOF); >> out: >> #else >> char buf[MB_LEN_MAX]; >> @@ -2732,7 +2727,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, >> } >> } >> >> - if (--width <= 0) >> + if (width >= 0 && --width <= 0) >> break; >> } >> while (inchar () != EOF); >> @@ -2884,7 +2879,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, >> assert (n <= MB_LEN_MAX); >> str += n; >> } >> - while (--width > 0 && inchar () != WEOF); >> + while ((width < 0 || --width > 0) && inchar () != WEOF); >> out2: >> #else >> do >> @@ -2938,7 +2933,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, >> } >> } >> } >> - while (--width > 0 && inchar () != EOF); >> + while ((width < 0 || --width > 0) && inchar () != EOF); >> #endif >> >> if (__glibc_unlikely (now == read_in)) > > So I tried to review this. -U100 helped. I was worried about width > starting out as positive and going negative. But as far as I can > tell, processing stops once width == 0, so the issue cannot happen. That's my understanding too. > Do you want to work on the test case? Will the copyright assignment > be an obstacle? I'm happy to have a go at a test case. I think I have most of one already. Copyright assignment will only be a problem in that it'll slow things down a bit! I just sent one in for Emacs and I've asked about doing one for glibc as well. One question about the test: fscanf-ing through INT_MAX characters on a trivial memcpy-based fopencookie stream takes 20 seconds on my (admittedly fairly old) machine. How slow is too slow for a test?
* Alyssa Ross: >>> I have not done a copyright assignment yet, but I think this change >>> should be small enough to be exempt? >> >> Yes, I think it's small enough. >> >> The test case wouldn't be, though. I think the one on the bug needs >> some large (infinite) input on the stdin, though. A real test case >> for glibc should probably involve pipe, fork, and fdopen. fopencookie >> could work, too. > > Oh, thanks for telling me about fopencookie! I'd never have known about > that otherwise. I've started having a go at a test case using it and it > seems like it'll work well. Or you could test sscanf via <support/blob_repeat.h> on 64-bit architectures. It would avoid the repeated memcpy calls, at the cost of an initial strlen on the entire buffer. > One question about the test: fscanf-ing through INT_MAX characters on a > trivial memcpy-based fopencookie stream takes 20 seconds on my > (admittedly fairly old) machine. How slow is too slow for a test? Opinions on that vary. Twenty seconds is stretching things as far as I'm concerned. I guess it depends what you mean by “fairly old”. We have a second category of tests, xtests, that only run with “make xcheck”. We could put the test there if it takes too long to run otherwise.
Florian Weimer <fw@deneb.enyo.de> writes: >> Oh, thanks for telling me about fopencookie! I'd never have known about >> that otherwise. I've started having a go at a test case using it and it >> seems like it'll work well. > > Or you could test sscanf via <support/blob_repeat.h> on 64-bit > architectures. It would avoid the repeated memcpy calls, at the cost > of an initial strlen on the entire buffer. Would that be a problem on 32-bit? We'd only need to map INT_MAX bytes + 1 page, so the other half of the address space would be enough for everything else, wouldn't it? >> One question about the test: fscanf-ing through INT_MAX characters on a >> trivial memcpy-based fopencookie stream takes 20 seconds on my >> (admittedly fairly old) machine. How slow is too slow for a test? > > Opinions on that vary. Twenty seconds is stretching things as far as > I'm concerned. I guess it depends what you mean by “fairly old”. > > We have a second category of tests, xtests, that only run with “make > xcheck”. We could put the test there if it takes too long to run > otherwise. That machine is a laptop from 2012. On my other laptop, from 2017, it takes <5 seconds, and the fopencookie and blob_repeat versions aren't really distinguishable from each other in terms of time taken. I would like to go with the blob_repeat version if it'll work everywhere, though, because not having to write a custom stream implementation makes the test a lot shorter and easier to understand.
* Alyssa Ross: > Florian Weimer <fw@deneb.enyo.de> writes: > >>> Oh, thanks for telling me about fopencookie! I'd never have known about >>> that otherwise. I've started having a go at a test case using it and it >>> seems like it'll work well. >> >> Or you could test sscanf via <support/blob_repeat.h> on 64-bit >> architectures. It would avoid the repeated memcpy calls, at the cost >> of an initial strlen on the entire buffer. > > Would that be a problem on 32-bit? We'd only need to map > INT_MAX bytes + 1 page, so the other half of the address space would be > enough for everything else, wouldn't it? True, it would work on most 32-bit targets, particularly when running on 64-bit kernels, where more address space is available to userspace. s390 (without the x) has just 31-bit addresses, so it won't be able to run the test, but that's fine, given that it's an architecture-independent issue. (blob_repeat sets up page aliasing behind the scenes, so the actual memory requirements are quite limited, it's just address space that could be problematic.) >>> One question about the test: fscanf-ing through INT_MAX characters on a >>> trivial memcpy-based fopencookie stream takes 20 seconds on my >>> (admittedly fairly old) machine. How slow is too slow for a test? >> >> Opinions on that vary. Twenty seconds is stretching things as far as >> I'm concerned. I guess it depends what you mean by “fairly old”. >> >> We have a second category of tests, xtests, that only run with “make >> xcheck”. We could put the test there if it takes too long to run >> otherwise. > > That machine is a laptop from 2012. On my other laptop, from 2017, it > takes <5 seconds, and the fopencookie and blob_repeat versions aren't > really distinguishable from each other in terms of time taken. I would > like to go with the blob_repeat version if it'll work everywhere, > though, because not having to write a custom stream implementation makes > the test a lot shorter and easier to understand. Sounds good to me. Less than five seconds is fine, I think.
On Thu, Mar 25, 2021 at 06:25:17PM +0100, Florian Weimer wrote: > Do you want to work on the test case? Will the copyright assignment > be an obstacle? Okay, I've written a test case (thanks for all your help). Do I need to seperately test swscanf and friends, since their implementation is seperate? And should I post the revised patch with the test case as soon as it's ready, or should I wait until I have my copyright paperwork sorted?
* Alyssa Ross: > On Thu, Mar 25, 2021 at 06:25:17PM +0100, Florian Weimer wrote: >> Do you want to work on the test case? Will the copyright assignment >> be an obstacle? > > Okay, I've written a test case (thanks for all your help). Do I need to > seperately test swscanf and friends, since their implementation is > seperate? And should I post the revised patch with the test case as > soon as it's ready, or should I wait until I have my copyright paperwork > sorted? I think we can apply the fix now and the test case later. You can post the test case now, we can mark it as FSF Assignment Missing in Patchwork.
* Alyssa Ross: > Patterns like %*[ can safely be used to match a great many characters, > and it's quite realisitic to use them for more than INT_MAX characters > from an IO stream. > > With the previous approach, after INT_MAX characters (v)fscanf would > return successfully, indicating an end to the match, even though there > wasn't one. I've pushed to this commit, thanks. We can integrate the test once your copyright assignment is on file. Thanks, Florian
On Mon, May 03, 2021 at 10:57:12AM +0200, Florian Weimer wrote: > * Alyssa Ross: > > > Patterns like %*[ can safely be used to match a great many characters, > > and it's quite realisitic to use them for more than INT_MAX characters > > from an IO stream. > > > > With the previous approach, after INT_MAX characters (v)fscanf would > > return successfully, indicating an end to the match, even though there > > wasn't one. > > I've pushed to this commit, thanks. > > We can integrate the test once your copyright assignment is on file. My copyright assignment should be sorted now (as of May 6).
diff --git a/stdio-common/vfscanf-internal.c b/stdio-common/vfscanf-internal.c index 38e74776a5..1d81e16f4e 100644 --- a/stdio-common/vfscanf-internal.c +++ b/stdio-common/vfscanf-internal.c @@ -2479,11 +2479,6 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, else not_in = 0; - if (width < 0) - /* There is no width given so there is also no limit on the - number of characters we read. Therefore we set width to - a very high value to make the algorithm easier. */ - width = INT_MAX; #ifdef COMPILE_WSCANF /* Find the beginning and the end of the scanlist. We are not @@ -2647,7 +2642,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, } } } - while (--width > 0 && inchar () != WEOF); + while ((width < 0 || --width > 0) && inchar () != WEOF); out: #else char buf[MB_LEN_MAX]; @@ -2732,7 +2727,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, } } - if (--width <= 0) + if (width >= 0 && --width <= 0) break; } while (inchar () != EOF); @@ -2884,7 +2879,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, assert (n <= MB_LEN_MAX); str += n; } - while (--width > 0 && inchar () != WEOF); + while ((width < 0 || --width > 0) && inchar () != WEOF); out2: #else do @@ -2938,7 +2933,7 @@ __vfscanf_internal (FILE *s, const char *format, va_list argptr, } } } - while (--width > 0 && inchar () != EOF); + while ((width < 0 || --width > 0) && inchar () != EOF); #endif if (__glibc_unlikely (now == read_in))