stdio-common: Add test for vfscanf with matches longer than INT_MAX [BZ #27650]

Message ID 7bd09b7b-bbd4-ac32-02d5-687ec5e01986@redhat.com
State Superseded
Headers
Series stdio-common: Add test for vfscanf with matches longer than INT_MAX [BZ #27650] |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Testing passed

Commit Message

Maciej W. Rozycki June 7, 2024, 12:13 p.m. UTC
  Complement commit b03e4d7bd25b ("stdio: fix vfscanf with matches longer 
than INT_MAX (bug 27650)") and add a test case for the issue, inspired 
by the reproducer provided with the bug report.

This has been verified to succeed as from the commit referred and fail 
beforehand.

As the test requires 2GiB of data to be passed around its performance 
has been evaluated using a choice of systems and the execution time 
determined to be respectively in the range of 10s for POWER9@2.166GHz, 
28s for FU740@1.2GHz, and 48s for 74Kf@950MHz.  As this is on the verge 
of and beyond the default timeout it has been increased by the factor of 
8.  Regardless, following recent practice the test has been added to the 
standard rather than extended set.
---
Hi,

 This has been verified with the `powerpc64le-linux-gnu' (IBM POWER9) 
native target and then the same host and the `riscv64-linux-gnu' (SiFive 
FU740) and `mips-linux-gnu' (o32 ABI) (MIPS 74Kf) targets.  This is so as 
to assess performance requirements for the test case.  And with respect to 
these I have been referred off list to: 
<https://inbox.sourceware.org/libc-alpha/b52d05a2-f4c1-e385-39bd-cd4a6f4f232f@redhat.com/>
for the choice between the standard and the extended set of tests.

 Any questions, comments or concerns?  Otherwise OK to apply?

  Maciej
---
 stdio-common/Makefile            |    2 
 stdio-common/tst-scanf-bz27650.c |  216 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 218 insertions(+)

glibc-tst-scanf-bz27650.diff
  

Comments

Florian Weimer June 7, 2024, 4:11 p.m. UTC | #1
* Maciej W. Rozycki:

> +/* Produce a stream of more than MAX_INT characters to stdout of which
> +   none is the new line character.  This is executed as a subprocess
> +   and the caller wants a void callee, upon the return from which the
> +   process will terminate successfully, so in the case of a failure we
> +   need to explicitly call exit with the failure status.  */
> +
> +static void
> +do_write (void *arg)
> +{
> +  static const char s[] = { [0 ... 4095] = 'a' };
> +  size_t i;
> +
> +  for (i = 0; i <= INT_MAX / sizeof (s); i++)
> +    if (fwrite (s, 1, sizeof (s), stdout) != sizeof (s))
> +      {
> +	int err = errno;
> +
> +	/* Close our stdout so that there's no risk for us to block
> +	   while `fscanf' is waiting on our stdout in `do_read' and
> +	   nothing checking our stderr.  If closing has failed, then
> +	   refrain from reporting anything, for the same reason.  */
> +	if (fclose (stdout) == 0)
> +	  error (0, err, "%s: fwrite: output error", __func__);
> +	exit (EXIT_FAILURE);
> +      }

Is there a reason for using stdout or stream I/O for writing the file?
You could use create_temp_file from <support/temp_file.h> and write
directly to the file descriptor.

It may be faster to use <support/blob_repeat.h> to create a string in
memory because it uses alias mappings, and parse that string using
sscanf or fmemopen+vfscanf.

Thanks,
Florian
  
Maciej W. Rozycki June 7, 2024, 5:24 p.m. UTC | #2
On Fri, 7 Jun 2024, Florian Weimer wrote:

> * Maciej W. Rozycki:
> 
> > +/* Produce a stream of more than MAX_INT characters to stdout of which
> > +   none is the new line character.  This is executed as a subprocess
> > +   and the caller wants a void callee, upon the return from which the
> > +   process will terminate successfully, so in the case of a failure we
> > +   need to explicitly call exit with the failure status.  */
> > +
> > +static void
> > +do_write (void *arg)
> > +{
> > +  static const char s[] = { [0 ... 4095] = 'a' };
> > +  size_t i;
> > +
> > +  for (i = 0; i <= INT_MAX / sizeof (s); i++)
> > +    if (fwrite (s, 1, sizeof (s), stdout) != sizeof (s))
> > +      {
> > +	int err = errno;
> > +
> > +	/* Close our stdout so that there's no risk for us to block
> > +	   while `fscanf' is waiting on our stdout in `do_read' and
> > +	   nothing checking our stderr.  If closing has failed, then
> > +	   refrain from reporting anything, for the same reason.  */
> > +	if (fclose (stdout) == 0)
> > +	  error (0, err, "%s: fwrite: output error", __func__);
> > +	exit (EXIT_FAILURE);
> > +      }
> 
> Is there a reason for using stdout or stream I/O for writing the file?
> You could use create_temp_file from <support/temp_file.h> and write
> directly to the file descriptor.

 A pipe is used because over 2GiB of data has to be transferred.  It could 
be a bit of a stress for the target board if such a large amount was to be 
actually stored in a filesystem (even in the presence of LFS).  There may 
be limited storage available too.

 My choice to use the `fwrite' interface over raw `write' was mostly 
symmetry with the rest of code and also less hassle in handling, as with 
`fwrite' you don't have to take care of partial writes.  Besides, `stdout' 
is readily available, there's no need for an extra library call (and an 
error to handle) to extract the underlying file descriptor.  Not a big 
deal overall, just a matter of style.

> It may be faster to use <support/blob_repeat.h> to create a string in
> memory because it uses alias mappings, and parse that string using
> sscanf or fmemopen+vfscanf.

 Neat!  However with 32-bit targets the size of the allocation required 
may exceed the limit of the user VM supported by hardware or the OS (some 
targets give room for manoeuvre to the OS as to the user/kernel VM split 
while other ones have it hardwired).  As a quick check I have run 
stdlib/tst-strtod-overflow with my 74Kf target and, lo and behold:

UNSUPPORTED: stdlib/tst-strtod-overflow
original exit status 77
warning: memory allocation failed, cannot test for overflow

And I think we do want to have coverage for scanning INT_MAX+ characters 
especially with 32-bit targets, where it may be hitting more than just the 
limit of the `int' type and the specific issue reported with BZ #27650.

 Do my answers address your concerns?

  Maciej
  
Maciej W. Rozycki June 21, 2024, 3:32 p.m. UTC | #3
On Fri, 7 Jun 2024, Maciej W. Rozycki wrote:

> Complement commit b03e4d7bd25b ("stdio: fix vfscanf with matches longer 
> than INT_MAX (bug 27650)") and add a test case for the issue, inspired 
> by the reproducer provided with the bug report.

 Ping for: 
<https://sourceware.org/pipermail/libc-alpha/2024-June/157283.html>,
<https://patchwork.sourceware.org/project/glibc/patch/7bd09b7b-bbd4-ac32-02d5-687ec5e01986@redhat.com/>.

  Maciej
  
Maciej W. Rozycki July 1, 2024, 10:25 a.m. UTC | #4
On Fri, 7 Jun 2024, Maciej W. Rozycki wrote:

> Complement commit b03e4d7bd25b ("stdio: fix vfscanf with matches longer 
> than INT_MAX (bug 27650)") and add a test case for the issue, inspired 
> by the reproducer provided with the bug report.

 Ping for:
<https://sourceware.org/pipermail/libc-alpha/2024-June/157283.html>,
<https://patchwork.sourceware.org/project/glibc/patch/7bd09b7b-bbd4-ac32-02d5-687ec5e01986@redhat.com/>.

  Maciej
  
Florian Weimer July 1, 2024, 4:01 p.m. UTC | #5
* Maciej W. Rozycki:

>> It may be faster to use <support/blob_repeat.h> to create a string in
>> memory because it uses alias mappings, and parse that string using
>> sscanf or fmemopen+vfscanf.
>
>  Neat!  However with 32-bit targets the size of the allocation required 
> may exceed the limit of the user VM supported by hardware or the OS (some 
> targets give room for manoeuvre to the OS as to the user/kernel VM split 
> while other ones have it hardwired).  As a quick check I have run 
> stdlib/tst-strtod-overflow with my 74Kf target and, lo and behold:
>
> UNSUPPORTED: stdlib/tst-strtod-overflow
> original exit status 77
> warning: memory allocation failed, cannot test for overflow
>
> And I think we do want to have coverage for scanning INT_MAX+ characters 
> especially with 32-bit targets, where it may be hitting more than just the 
> limit of the `int' type and the specific issue reported with BZ #27650.

Fair enough.

>  Do my answers address your concerns?

I think the test case is still unnecessarily complicated.  You can use
fopencookie to generate the data, or at least use threads to simplify
the error handling considerably.  And please do not reuse stdout for
data generation.  Given how our test error reporting works (we write to
stdout), fclose (stdout) is a bit of a debugger trap.

Thanks,
Florian
  
Maciej W. Rozycki July 2, 2024, 12:02 p.m. UTC | #6
On Mon, 1 Jul 2024, Florian Weimer wrote:

> >  Do my answers address your concerns?
> 
> I think the test case is still unnecessarily complicated.  You can use
> fopencookie to generate the data, or at least use threads to simplify
> the error handling considerably.  And please do not reuse stdout for
> data generation.  Given how our test error reporting works (we write to
> stdout), fclose (stdout) is a bit of a debugger trap.

 But that `fclose (stdout)' is called in the child only, for the stream 
associated with the pipe connecting with the parent, which in turn is a 
consequence of how our own `support_subprocess' has been implemented -- 
not my choice.  This is how unnamed pipes work and this stream or the 
underlying pipe do not communicate with the test harness and neither have 
anything to do with the parent's stdout (and likewise stderr).

 And I don't think using threads would make the test case simpler.  In 
fact threading is complex by itself, so making the test case threaded 
would only complicate things.

 However the idea of using `fopencookie' (what a weird name!), which is a 
GNU extension I have not come across before, seems attractive to me.  It 
will remove the need to use a subprocess, likely improve performance a 
little, as there'll be no need to cross the kernel boundary anymore, and 
will greatly simplify handling overall.  Thanks for the suggestion, I'll 
be posting v2 shortly.

 Thank you for your review.

  Maciej
  

Patch

Index: glibc/stdio-common/Makefile
===================================================================
--- glibc.orig/stdio-common/Makefile
+++ glibc/stdio-common/Makefile
@@ -244,6 +244,7 @@  tests := \
   tst-scanf-binary-c23 \
   tst-scanf-binary-gnu11 \
   tst-scanf-binary-gnu89 \
+  tst-scanf-bz27650 \
   tst-scanf-intn \
   tst-scanf-round \
   tst-scanf-to_inpunct \
@@ -314,6 +315,7 @@  generated += \
   tst-printf-fp-free.mtrace \
   tst-printf-fp-leak-mem.out \
   tst-printf-fp-leak.mtrace \
+  tst-scanf-bz27650.mtrace \
   tst-vfprintf-width-prec-mem.out \
   tst-vfprintf-width-prec.mtrace \
   # generated
Index: glibc/stdio-common/tst-scanf-bz27650.c
===================================================================
--- /dev/null
+++ glibc/stdio-common/tst-scanf-bz27650.c
@@ -0,0 +1,216 @@ 
+/* Test for BZ #27650, formatted input matching beyond MAX_INT.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <array_length.h>
+#include <error.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <mcheck.h>
+#include <poll.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <support/subprocess.h>
+#include <support/test-driver.h>
+
+/* Produce a stream of more than MAX_INT characters to stdout of which
+   none is the new line character.  This is executed as a subprocess
+   and the caller wants a void callee, upon the return from which the
+   process will terminate successfully, so in the case of a failure we
+   need to explicitly call exit with the failure status.  */
+
+static void
+do_write (void *arg)
+{
+  static const char s[] = { [0 ... 4095] = 'a' };
+  size_t i;
+
+  for (i = 0; i <= INT_MAX / sizeof (s); i++)
+    if (fwrite (s, 1, sizeof (s), stdout) != sizeof (s))
+      {
+	int err = errno;
+
+	/* Close our stdout so that there's no risk for us to block
+	   while `fscanf' is waiting on our stdout in `do_read' and
+	   nothing checking our stderr.  If closing has failed, then
+	   refrain from reporting anything, for the same reason.  */
+	if (fclose (stdout) == 0)
+	  error (0, err, "%s: fwrite: output error", __func__);
+	exit (EXIT_FAILURE);
+      }
+}
+
+/* Consume a stream of more than MAX_INT characters from IN of which
+   none is the new line character.  The call to fscanf is supposed
+   to complete upon the EOF condition on IN, however in the presence
+   of BZ #27650 it will terminate prematurely with characters still
+   outstanding in IN.  Diagnose the condition and return status
+   accordingly.  */
+
+static int
+do_read (FILE *in)
+{
+  int v;
+
+  v = fscanf (in, "%*[^\n]");
+  if (v == EOF || errno != 0)
+    {
+      error (0, errno, "%s: fscanf: input failure", __func__);
+      return EXIT_FAILURE;
+    }
+
+  if (!feof (in))
+    {
+      v = fgetc (in);
+      if (v == EOF)
+	error (0, errno, "%s: fgetc: input failure", __func__);
+      else if (v == '\n')
+	error (0, 0, "%s: unexpected new line character received", __func__);
+      else
+	error (0, 0,
+	       "%s: character received after end of file expected: \\x%02x",
+	       __func__, v);
+      return EXIT_FAILURE;
+    }
+
+  return EXIT_SUCCESS;
+}
+
+/* Run do_write in a subprocess and communicate its output produced to
+   stdout via a pipe to do_read.  Upon completion of do_read consume
+   any outstanding input from do_write and report any issues.  Return
+   success or failure based on the status of the subprocess and ours.  */
+
+int
+do_test (void)
+{
+  struct support_subprocess target;
+  FILE *chdout, *chderr;
+  int chdstatus;
+  int status;
+
+  mtrace ();
+
+  target = support_subprocess (do_write, NULL);
+  chdout = fdopen (target.stdout_pipe[0], "r");
+  if (chdout == NULL)
+    {
+      error (0, errno, "fdopen");
+      status = EXIT_FAILURE;
+    }
+  else
+    status = do_read (chdout);
+
+  /* Switch the pipes to the non-blocking mode to make sure do_write
+     does not lock up waiting output and consume any outstanding input
+     received.  Discard any output from do_write's stdout and pass any
+     output from do_write's stderr along to our stderr.  */
+  if (fcntl (target.stdout_pipe[0], F_SETFL, O_NONBLOCK) == -1
+      || fcntl (target.stderr_pipe[0], F_SETFL, O_NONBLOCK) == -1)
+    {
+      error (0, errno, "fcntl (F_SETFL)");
+      status = EXIT_FAILURE;
+    }
+  else if (chdout != NULL)
+    {
+      chderr = fdopen (target.stderr_pipe[0], "r");
+      if (chderr != NULL)
+	{
+	  struct pollfd fds[] =
+	    { { .fd = target.stderr_pipe[0], .events = POLLIN },
+	      { .fd = target.stdout_pipe[0], .events = POLLIN } };
+	  FILE *ss[array_length (fds)][2] =
+	    { { chderr, stderr }, { chdout } };
+	  int pollstatus;
+	  size_t i;
+
+	  while ((pollstatus = poll (fds, array_length (fds), -1)) >= 0)
+	    {
+	      bool stop;
+
+	      stop = false;
+	      for (i = 0; i < array_length (fds); i++)
+		{
+		  char buf[1024];
+		  char *s;
+
+		  if (fds[i].revents & POLLERR)
+		    fds[i].fd = -1;
+		  else if (fds[i].revents & POLLIN)
+		    do
+		      {
+			s = fgets (buf, sizeof (buf), ss[i][0]);
+			if (s != NULL)
+			  {
+			    if (ss[i][1] != NULL
+				&& fputs (buf, ss[i][1]) == EOF)
+			      {
+				error (0, errno, "fputs");
+				status = EXIT_FAILURE;
+				stop = true;
+			      }
+			  }
+			else if (errno == EAGAIN)
+			  clearerr (chderr);
+			else
+			  {
+			    error (0, errno, "fgets");
+			    status = EXIT_FAILURE;
+			    stop = true;
+			  }
+		      }
+		    while (s != NULL);
+		  else if (fds[i].revents & POLLHUP)
+		    fds[i].fd = -1;
+		}
+	      if (stop)
+		break;
+
+	      stop = true;
+	      for (i = 0; i < array_length (fds); i++)
+		if (fds[i].fd >= 0)
+		  {
+		    stop = false;
+		    break;
+		  }
+	      if (stop)
+		break;
+	    }
+	  if (pollstatus < 0)
+	    {
+	      error (0, errno, "poll");
+	      status = EXIT_FAILURE;
+	    }
+	}
+    }
+
+  /* Combine our subprocess's status and intended ours.  Only succeed
+     if both are good.  */
+  chdstatus = support_process_wait (&target);
+  if (status == EXIT_SUCCESS && WIFEXITED (chdstatus))
+    return WEXITSTATUS (chdstatus);
+  else if (status != EXIT_SUCCESS)
+    return status;
+  else
+    return EXIT_FAILURE;
+}
+
+#define TIMEOUT (DEFAULT_TIMEOUT * 8)
+#include <support/test-driver.c>