[v2] stdio-common: Add test for vfscanf with matches longer than INT_MAX [BZ #27650]

Message ID 54811be2-18d4-10ca-ec1f-d4b06aeb203f@redhat.com
State Superseded
Headers
Series [v2] stdio-common: Add test for vfscanf with matches longer than INT_MAX [BZ #27650] |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit success Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed

Commit Message

Maciej W. Rozycki July 2, 2024, 12:07 p.m. UTC
  Complement commit b03e4d7bd25b ("stdio: fix vfscanf with matches longer 
than INT_MAX (bug 27650)") and add a test case for the issue, inspired 
by the reproducer provided with the bug report.

This has been verified to succeed as from the commit referred and fail 
beforehand.

As the test requires 2GiB of data to be passed around its performance 
has been evaluated using a choice of systems and the execution time 
determined to be respectively in the range of 9s for POWER9@2.166GHz, 
24s for FU740@1.2GHz, and 40s for 74Kf@950MHz.  As this is on the verge 
of and beyond the default timeout it has been increased by the factor of 
8.  Regardless, following recent practice the test has been added to the 
standard rather than extended set.
---
Hi,

 This has been verified with the `powerpc64le-linux-gnu' (IBM POWER9) 
native target and then the same host and the `riscv64-linux-gnu' (SiFive 
FU740) and `mips-linux-gnu' (o32 ABI) (MIPS 74Kf) targets.  This is so as 
to assess performance requirements for the test case.

 I now have a working test environment with a 21064A@266MHz Alpha machine 
in my lab, so as a matter of interest as to performance I have tried this 
test with that system as well.  It has turned out to require ~4m30s to 
complete, so setting TIMEOUTFACTOR=2 is required, not unreasonably for a 
30 years old system.  However the test also failed towards the end, with:

tst-scanf-bz27650: fscanf: input failure, at 2147483648: Cannot allocate memory

be it with v1 or with code updated for `fopencookie' and parts around 
`fscanf' unchanged.

 Having re-read the relevant parts of the ISO C and POSIX specifications I 
have figured out that `fscanf' is one of the functions that returns a 
result that is not indicative of an error and is allowed to set `errno' 
arbitrarily even in the absence of an error both at a time:

"The value of errno may be set to nonzero by a library function call 
whether or not there is an error, provided the use of errno is not 
documented in the description of the function in this document."

and POSIX further clarifies for `fscanf':

"If a read error occurs, the error indicator for the stream shall be set."

Indeed returning -1 in `io_read' makes `fscanf' return 0 here and set the 
error indicator.  I have therefore updated error handling for `fscanf', by 
explicitly checking for the error condition of the stream and if it is 
clear, then ignoring `errno'.  With the fix in place the `alpha-linux-gnu' 
target now passes the test too.

 I have re-verified that this updated test case still triggers a failure 
with commit b03e4d7bd25b reverted.

 Any questions, comments or concerns?  Otherwise OK to apply?

  Maciej

Changes from v1:

- Reimplement in terms of `fopencookie', eliminating the need for a 
  subprocess and associated handling.

- Update execution times reported in the change description, slightly 
  reduced accordingly.

- Correct error handling for `fscanf'.

- Fix a typo s/MAX_INT/INT_MAX/ in comments.
---
 stdio-common/Makefile            |    2 
 stdio-common/tst-scanf-bz27650.c |   92 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+)

glibc-tst-scanf-bz27650.diff
  

Comments

Florian Weimer July 2, 2024, 7:33 p.m. UTC | #1
* Maciej W. Rozycki:

> --- /dev/null
> +++ glibc/stdio-common/tst-scanf-bz27650.c
> @@ -0,0 +1,92 @@
> +/* Test for BZ #27650, formatted input matching beyond INT_MAX.

I like how much shorter the test is now.

> +#include <mcheck.h>

Where is <mcheck.h> used?


> +  if (v == EOF || e != 0)
> +    error (EXIT_FAILURE, e ? errno : 0,
> +	   "fscanf: input failure, at %u", written);

Tests should not write diagnostics to standard error (where they get
interspersed with make output).  Please #include <support/check.h> and
use FAIL_EXIT1 instead (or TEST_VERIFY, TEST_COMPARE …).

> +  if (!feof (in))
> +    {
> +      v = fgetc (in);
> +      if (v == EOF)
> +	error (EXIT_FAILURE, errno, "fgetc: input failure");
> +      else if (v == '\n')
> +	error (EXIT_FAILURE, 0, "unexpected new line character received");
> +      else
> +	error (EXIT_FAILURE, 0,
> +	       "character received after end of file expected: \\x%02x", v);
> +    }

Missing fclose (in)?
  
Maciej W. Rozycki July 4, 2024, 10:22 a.m. UTC | #2
On Tue, 2 Jul 2024, Florian Weimer wrote:

> > +#include <mcheck.h>
> 
> Where is <mcheck.h> used?

 There's a call to `mtrace' in `do_test' right at the beginning.

> > +  if (v == EOF || e != 0)
> > +    error (EXIT_FAILURE, e ? errno : 0,
> > +	   "fscanf: input failure, at %u", written);
> 
> Tests should not write diagnostics to standard error (where they get
> interspersed with make output).  Please #include <support/check.h> and
> use FAIL_EXIT1 instead (or TEST_VERIFY, TEST_COMPARE …).

 There's prior usage, presumably with older test cases, but indeed I've 
become aware of this peculiarity in the course of my other effort, and 
it's just escaped me here.  Honestly, given that there's no use for fd 2 
for test cases the test driver could well just assign `stderr = stdio', 
making all the available standard error reporting facilities work as 
expected automagically.  Maybe it's worth doing as a separate cleanup?

 Anyway the macros you refer to standardise the structure of the messages 
produced, so after some experimenting I chose to follow your suggestion 
rather than doing `stderr = stdio'.  However sadly there is currently no 
macro that lets one pass a descriptive message but does not terminate the 
program or return from the current function.  So I chose to trivially add 
one and converted the patch into a mini patch series, having concluded 
that calling `support_print_failure_impl' directly from a test case felt a 
bit hackish.

> > +  if (!feof (in))
> > +    {
> > +      v = fgetc (in);
> > +      if (v == EOF)
> > +	error (EXIT_FAILURE, errno, "fgetc: input failure");
> > +      else if (v == '\n')
> > +	error (EXIT_FAILURE, 0, "unexpected new line character received");
> > +      else
> > +	error (EXIT_FAILURE, 0,
> > +	       "character received after end of file expected: \\x%02x", v);
> > +    }
> 
> Missing fclose (in)?

 Technically not needed, but you're right it's a bit sloppy not to have 
it.  This however requires the test case to be reshaped a little for the 
cleanup to be always called provided that the stream has been successfully 
opened.

 I have posted v3 now.

  Maciej
  
Maciej W. Rozycki July 4, 2024, 1 p.m. UTC | #3
On Thu, 4 Jul 2024, Maciej W. Rozycki wrote:

> > > +#include <mcheck.h>
> > 
> > Where is <mcheck.h> used?
> 
>  There's a call to `mtrace' in `do_test' right at the beginning.

 And I have spotted that for this to have any effect explicit arrangements 
have to be made for the test harness, so I have now posted v4 to address 
that, superseding v3.  Sorry for the extra noise.

  Maciej
  
Maciej W. Rozycki July 5, 2024, 4:09 p.m. UTC | #4
On Thu, 4 Jul 2024, Maciej W. Rozycki wrote:

>  And I have spotted that for this to have any effect explicit arrangements 
> have to be made for the test harness, so I have now posted v4 to address 
> that, superseding v3.  Sorry for the extra noise.

 For the record, I have obviously noticed the CI failures, hard to miss.

 It just did not occur to me, sigh, that adding a FAIL definition to 
support/check.h is actually a global change and therefore requires running 
the whole test suite, unlike with earlier iterations.  And it happened 
that the FAIL definition clashed with a bunch of same-named local macros, 
including, sadly, some recently added, which could have been chosen to use 
support/check.h facilities instead.  I guess it wasn't my day yesterday.

 I've been working on v5 since I became aware of the problem.  Current ETA 
is Monday next week.

  Maciej
  

Patch

Index: glibc/stdio-common/Makefile
===================================================================
--- glibc.orig/stdio-common/Makefile
+++ glibc/stdio-common/Makefile
@@ -244,6 +244,7 @@  tests := \
   tst-scanf-binary-c23 \
   tst-scanf-binary-gnu11 \
   tst-scanf-binary-gnu89 \
+  tst-scanf-bz27650 \
   tst-scanf-intn \
   tst-scanf-round \
   tst-scanf-to_inpunct \
@@ -314,6 +315,7 @@  generated += \
   tst-printf-fp-free.mtrace \
   tst-printf-fp-leak-mem.out \
   tst-printf-fp-leak.mtrace \
+  tst-scanf-bz27650.mtrace \
   tst-vfprintf-width-prec-mem.out \
   tst-vfprintf-width-prec.mtrace \
   # generated
Index: glibc/stdio-common/tst-scanf-bz27650.c
===================================================================
--- /dev/null
+++ glibc/stdio-common/tst-scanf-bz27650.c
@@ -0,0 +1,92 @@ 
+/* Test for BZ #27650, formatted input matching beyond INT_MAX.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <error.h>
+#include <errno.h>
+#include <limits.h>
+#include <mcheck.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <sys/types.h>
+
+#include <support/test-driver.h>
+
+/* Produce a stream of more than INT_MAX characters via buffer BUF of
+   size SIZE according to bookkeeping in COOKIE and then return EOF.  */
+
+static ssize_t
+io_read (void *cookie, char *buf, size_t size)
+{
+  unsigned int *written = cookie;
+  unsigned int w = *written;
+
+  if (w > INT_MAX)
+    return 0;
+
+  memset (buf, 'a', size);
+  *written = w + size;
+  return size;
+}
+
+/* Consume a stream of more than INT_MAX characters from an artificial
+   input stream of which none is the new line character.  The call to
+   fscanf is supposed to complete upon the EOF condition of input,
+   however in the presence of BZ #27650 it will terminate prematurely
+   with characters still outstanding in input.  Diagnose the condition
+   and return status accordingly.  */
+
+int
+do_test (void)
+{
+  static cookie_io_functions_t io_funcs = { .read = io_read };
+  unsigned int written = 0;
+  FILE *in;
+  int v, e;
+
+  mtrace ();
+
+  in = fopencookie (&written, "r", io_funcs);
+  if (in == NULL)
+    error (EXIT_FAILURE, errno, "fopencookie");
+
+  v = fscanf (in, "%*[^\n]");
+  e = ferror (in);
+  if (v == EOF || e != 0)
+    error (EXIT_FAILURE, e ? errno : 0,
+	   "fscanf: input failure, at %u", written);
+
+  if (!feof (in))
+    {
+      v = fgetc (in);
+      if (v == EOF)
+	error (EXIT_FAILURE, errno, "fgetc: input failure");
+      else if (v == '\n')
+	error (EXIT_FAILURE, 0, "unexpected new line character received");
+      else
+	error (EXIT_FAILURE, 0,
+	       "character received after end of file expected: \\x%02x", v);
+    }
+
+  return EXIT_SUCCESS;
+}
+
+#define TIMEOUT (DEFAULT_TIMEOUT * 8)
+#include <support/test-driver.c>