Patchwork [BZ,#1190] Make EOF sticky in stdio.

login
register
mail settings
Submitter Zack Weinberg
Date Feb. 12, 2018, 4:42 p.m.
Message ID <20180212164211.1249-1-zackw@panix.com>
Download mbox | patch
Permalink /patch/25915/
State New
Headers show

Comments

Zack Weinberg - Feb. 12, 2018, 4:42 p.m.
C99 specifies that the EOF condition on a file is "sticky": once EOF
has been encountered, all subsequent reads should continue to return
EOF until the file is closed or something clears the "end-of-file
indicator" (e.g. fseek, clearerr).  This is arguably a change from
C89, where the wording was ambiguous; the BSDs always had sticky EOF,
but the System V lineage would attempt to read from the underlying fd
again.  GNU libc has followed System V for as long as we've been
using libio---the relevant chunk of code is

    int
    _IO_new_file_underflow (_IO_FILE *fp)
    {
      _IO_ssize_t count;
    #if 0
      /* SysV does not make this test; take it out for compatibility */
      if (fp->_flags & _IO_EOF_SEEN)
        return (EOF);
    #endif

That's been unchanged since before 1995.  This only matters if the
underlying file has changed in the meantime, of course; perhaps that's
why nobody got around to filing a bug report until 2005, six years
after C99 was published.  And nobody took that bug seriously until
2012, at which time there was a long but inconclusive discussion on
libc-alpha regarding whether it would break applications to change
anything, see <https://sourceware.org/ml/libc-alpha/2012-09/msg00343.html>.

It is my considered opinion that we should just go ahead and fix the
bug, and that a backward compatibility mode is not required, because
the BSDs have always had sticky EOF, so portable code has always had
to be prepared to deal with that behavior.  Nowadays, the lineages we
should be worrying most about compatibility with are all BSD-derived,
anyway.  Thus, this patch.

You might wonder if changing the _underflow impls is sufficient to
apply the C99 semantics to all of the many stdio functions that
perform input.  It should be enough to cover all paths to _IO_SYSREAD,
and the only other functions that call _IO_SYSREAD are the _seekoff
impls, which is OK because seeking clears EOF, and the _xsgetn impls,
which, as far as I can tell, are unused within glibc.

The test cases in this patch are adapted from bug #19476, filed in
2016; I will credit Martin Sebor in the change log.

zw

	[BZ #1190]
	* libio/fileops.c (_IO_new_file_underflow): Return EOF immediately
	if the _IO_EOF_SEEN bit is already set; update commentary.
	* libio/oldfileops.c (_IO_old_file_underflow): Likewise.
	* libio/wfileops.c (_IO_wfile_underflow): Likewise.

	* libio/tst-fgetc-after-eof.c, wcsmbs/test-fgetwc-after-eof.c:
	New test cases.
	* libio/Makefile (tests): Add tst-fgetc-after-eof.
	* wcsmbs/Makefile (tests): Add tst-fgetwc-after-eof.
---
 libio/Makefile                |   3 +-
 libio/fileops.c               |   5 +-
 libio/oldfileops.c            |   5 +-
 libio/tst-fgetc-after-eof.c   | 135 ++++++++++++++++++++++++++++++++++++++++
 libio/wfileops.c              |   4 ++
 wcsmbs/Makefile               |   2 +-
 wcsmbs/tst-fgetwc-after-eof.c | 141 ++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 287 insertions(+), 8 deletions(-)
 create mode 100644 libio/tst-fgetc-after-eof.c
 create mode 100644 wcsmbs/tst-fgetwc-after-eof.c
Andreas Schwab - Feb. 12, 2018, 5:22 p.m.
On Feb 12 2018, Zack Weinberg <zackw@panix.com> wrote:

> That's been unchanged since before 1995.  This only matters if the
> underlying file has changed in the meantime, of course;

Interactive devices like ttys also don't have a sticky EOF.

Andreas.
Zack Weinberg - Feb. 12, 2018, 5:31 p.m.
On Mon, Feb 12, 2018 at 12:22 PM, Andreas Schwab <schwab@suse.de> wrote:
> On Feb 12 2018, Zack Weinberg <zackw@panix.com> wrote:
>> That's been unchanged since before 1995.  This only matters if the
>> underlying file has changed in the meantime, of course;
>
> Interactive devices like ttys also don't have a sticky EOF.

Right, I think that if there are any programs this change is likely to
break, it's programs that use stdio to read from a tty.  I could even
see an argument for giving stdin, specifically, non-sticky EOF when
connected to a terminal (kinda like how stdout is line-buffered when
connected to a terminal). But I would want some evidence that there
_are_ programs that would break without that, before we spend any time
coding it up.

zw
Andreas Schwab - Feb. 12, 2018, 6:25 p.m.
On Feb 12 2018, Zack Weinberg <zackw@panix.com> wrote:

> On Mon, Feb 12, 2018 at 12:22 PM, Andreas Schwab <schwab@suse.de> wrote:
>> On Feb 12 2018, Zack Weinberg <zackw@panix.com> wrote:
>>> That's been unchanged since before 1995.  This only matters if the
>>> underlying file has changed in the meantime, of course;
>>
>> Interactive devices like ttys also don't have a sticky EOF.
>
> Right, I think that if there are any programs this change is likely to
> break, it's programs that use stdio to read from a tty.

On the other hand, it can also fix such programs, by no longer needing
to send EOF twice to stop it from reading further.

Andreas.

Patch

diff --git a/libio/Makefile b/libio/Makefile
index 3e08ed0eeb..a2e1c09e1c 100644
--- a/libio/Makefile
+++ b/libio/Makefile
@@ -64,7 +64,8 @@  tests = tst_swprintf tst_wprintf tst_swscanf tst_wscanf tst_getwc tst_putwc   \
 	bug-memstream1 bug-wmemstream1 \
 	tst-setvbuf1 tst-popen1 tst-fgetwc bug-wsetpos tst-fseek \
 	tst-fwrite-error tst-ftell-partial-wide tst-ftell-active-handler \
-	tst-ftell-append tst-fputws tst-bz22415
+	tst-ftell-append tst-fputws tst-bz22415 tst-fgetc-after-eof \
+
 ifeq (yes,$(build-shared))
 # Add test-fopenloc only if shared library is enabled since it depends on
 # shared localedata objects.
diff --git a/libio/fileops.c b/libio/fileops.c
index 5fd8a9647f..9c7e2bc7a2 100644
--- a/libio/fileops.c
+++ b/libio/fileops.c
@@ -469,11 +469,10 @@  int
 _IO_new_file_underflow (_IO_FILE *fp)
 {
   _IO_ssize_t count;
-#if 0
-  /* SysV does not make this test; take it out for compatibility */
+
+  /* C99 requires EOF to be "sticky".  */
   if (fp->_flags & _IO_EOF_SEEN)
     return (EOF);
-#endif
 
   if (fp->_flags & _IO_NO_READS)
     {
diff --git a/libio/oldfileops.c b/libio/oldfileops.c
index f00f6cb979..637fea38db 100644
--- a/libio/oldfileops.c
+++ b/libio/oldfileops.c
@@ -294,11 +294,10 @@  attribute_compat_text_section
 _IO_old_file_underflow (_IO_FILE *fp)
 {
   _IO_ssize_t count;
-#if 0
-  /* SysV does not make this test; take it out for compatibility */
+
+  /* C99 requires EOF to be "sticky".  */
   if (fp->_flags & _IO_EOF_SEEN)
     return (EOF);
-#endif
 
   if (fp->_flags & _IO_NO_READS)
     {
diff --git a/libio/tst-fgetc-after-eof.c b/libio/tst-fgetc-after-eof.c
new file mode 100644
index 0000000000..8487765d9d
--- /dev/null
+++ b/libio/tst-fgetc-after-eof.c
@@ -0,0 +1,135 @@ 
+/* Bug 1190: EOF conditions are supposed to be sticky.
+   Copyright (C) 2018 Free Software Foundation.
+   Copying and distribution of this file, with or without modification,
+   are permitted in any medium without royalty provided the copyright
+   notice and this notice are preserved. This file is offered as-is,
+   without any warranty.  */
+
+/* ISO C1999 specification of fgetc:
+
+       #include <stdio.h>
+       int fgetc (FILE *stream);
+
+   Description
+
+     If the end-of-file indicator for the input stream pointed to by
+     stream is not set and a next character is present, the fgetc
+     function obtains that character as an unsigned char converted to
+     an int and advances the associated file position indicator for
+     the stream (if defined).
+
+   Returns
+
+     If the end-of-file indicator for the stream is set, or if the
+     stream is at end-of-file, the end-of-file indicator for the
+     stream is set and the fgetc function returns EOF. Otherwise, the
+     fgetc function returns the next character from the input stream
+     pointed to by stream. If a read error occurs, the error indicator
+     for the stream is set and the fgetc function returns EOF.
+
+   The requirement to return EOF "if the end-of-file indicator for the
+   stream is set" was new in C99; the language in the 1989 edition of
+   the standard was ambiguous.  Historically, BSD-derived Unix always
+   had the C99 behavior, whereas in System V fgetc would attempt to
+   call read() again before returning EOF again.  Prior to version 2.28,
+   glibc followed the System V behavior even though this does not
+   comply with C99.
+
+   See
+   <https://sourceware.org/bugzilla/show_bug.cgi?id=1190>,
+   <https://sourceware.org/bugzilla/show_bug.cgi?id=19476>,
+   and the thread at
+   <https://sourceware.org/ml/libc-alpha/2012-09/msg00343.html>
+   for more detail.  */
+
+#include <stdio.h>
+
+#include <support/temp_file.h>
+
+int
+do_test (void)
+{
+  /* Create a scratch file, then open it by name a second time,
+     so that at the operating system level we have two independent
+     seek pointers.  */
+  char *scratch_fname;
+  int wfd = create_temp_file ("tst-fgetc-after-eof.", &scratch_fname);
+  if (wfd == -1)
+    {
+      perror ("create_temp_file");
+      return 1;
+    }
+
+  FILE *fout = fdopen (wfd, "w+");
+  if (!fout)
+    {
+      perror ("fdopen");
+      return 1;
+    }
+
+  if (fputs ("abc\n", fout) == EOF
+      || fflush (fout) || ferror (fout))
+    {
+      perror ("write error (initial)");
+      return 1;
+    }
+
+  FILE *fin = fopen (scratch_fname, "r+");
+  if (!fin)
+    {
+      perror (scratch_fname);
+      return 1;
+    }
+
+  /* Reach EOF.  */
+  while (fgetc (fin) != EOF)
+    ;
+  if (ferror (fin))
+    {
+      perror ("read error");
+      return 1;
+    }
+
+  /* Enlarge the file using the write stream.  */
+  if (fputs ("d\n", fout) == EOF
+      || fflush (fout) || ferror (fout))
+    {
+      perror ("write error (enlarging)");
+      return 1;
+    }
+
+  /* feof (fin) should still be true at this point.  */
+  if (!feof (fin))
+    {
+      fputs ("FAIL: input stream not at EOF after enlarging", stderr);
+      return 1;
+    }
+
+  /* And consistent with that, another fgetc should return EOF.  */
+  int c = fgetc (fin);
+  if (c != EOF)
+    {
+      fprintf (stderr, "FAIL: fgetc returned '%c', not EOF\n", c);
+      return 1;
+    }
+
+  /* However, after calling clearerr(), we should be able to read the
+     next characters written to the file.  */
+  clearerr (fin);
+  c = fgetc (fin);
+  if (c != 'd')
+    {
+      if (c == EOF)
+        fprintf (stderr, "FAIL: fgetc after clearerr returned EOF, not 'd'\n");
+      else
+        fprintf (stderr,
+                 "FAIL: fgetc after clearerr returned '%c', not 'd'\n", c);
+      return 1;
+    }
+
+  fclose (fin);
+  fclose (fout);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/libio/wfileops.c b/libio/wfileops.c
index 2488821d54..20ebc885e6 100644
--- a/libio/wfileops.c
+++ b/libio/wfileops.c
@@ -116,6 +116,10 @@  _IO_wfile_underflow (_IO_FILE *fp)
   enum __codecvt_result status;
   _IO_ssize_t count;
 
+  /* C99 requires EOF to be "sticky".  */
+  if (fp->_flags & _IO_EOF_SEEN)
+    return (EOF);
+
   if (__glibc_unlikely (fp->_flags & _IO_NO_READS))
     {
       fp->_flags |= _IO_ERR_SEEN;
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index 3ee91d2e1a..63a6fbab58 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -50,7 +50,7 @@  strop-tests :=  wcscmp wcsncmp wmemcmp wcslen wcschr wcsrchr wcscpy wcsnlen \
 tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
 	 tst-wcrtomb tst-wcpncpy tst-mbsrtowcs tst-wchar-h tst-mbrtowc2 \
 	 tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
-	 tst-wcstod-round test-char-types \
+	 tst-wcstod-round test-char-types tst-fgetwc-after-eof \
 	 $(addprefix test-,$(strop-tests))
 
 include ../Rules
diff --git a/wcsmbs/tst-fgetwc-after-eof.c b/wcsmbs/tst-fgetwc-after-eof.c
new file mode 100644
index 0000000000..d0feeaa916
--- /dev/null
+++ b/wcsmbs/tst-fgetwc-after-eof.c
@@ -0,0 +1,141 @@ 
+/* Bug 1190: EOF conditions are supposed to be sticky.
+   Copyright (C) 2018 Free Software Foundation.
+   Copying and distribution of this file, with or without modification,
+   are permitted in any medium without royalty provided the copyright
+   notice and this notice are preserved. This file is offered as-is,
+   without any warranty.  */
+
+/* ISO C1999 specification of fgetwc:
+
+       #include <stdio.h>
+       #include <wchar.h>
+       wint_t fgetwc (FILE *stream);
+
+   Description
+
+     If the end-of-file indicator for the input stream pointed to by
+     stream is not set and a next wide character is present, the
+     fgetwc function obtains that wide character as a wchar_t
+     converted to a wint_t and advances the associated file position
+     indicator for the stream (if defined).
+
+  Returns
+
+     If the end-of-file indicator for the stream is set, or if the
+     stream is at end-of-file, the end- of-file indicator for the
+     stream is set and the fgetwc function returns WEOF. Otherwise,
+     the fgetwc function returns the next wide character from the
+     input stream pointed to by stream. If a read error occurs, the
+     error indicator for the stream is set and the fgetwc function
+     returns WEOF. If an encoding error occurs (including too few
+     bytes), the value of the macro EILSEQ is stored in errno and the
+     fgetwc function returns WEOF.
+
+   The requirement to return WEOF "if the end-of-file indicator for the
+   stream is set" was new in C99; the language in the 1995 edition of
+   the standard was ambiguous.  Historically, BSD-derived Unix always
+   had the C99 behavior, whereas in System V fgetc would attempt to
+   call read() again before returning EOF again.  Prior to version 2.28,
+   glibc followed the System V behavior even though this does not
+   comply with C99.
+
+   See
+   <https://sourceware.org/bugzilla/show_bug.cgi?id=1190>,
+   <https://sourceware.org/bugzilla/show_bug.cgi?id=19476>,
+   and the thread at
+   <https://sourceware.org/ml/libc-alpha/2012-09/msg00343.html>
+   for more detail.  */
+
+#include <stdio.h>
+#include <wchar.h>
+
+#include <support/temp_file.h>
+
+int
+do_test (void)
+{
+  /* Create a scratch file, then open it by name a second time,
+     so that at the operating system level we have two independent
+     seek pointers.  */
+  char *scratch_fname;
+  int wfd = create_temp_file ("tst-fgetc-after-eof.", &scratch_fname);
+  if (wfd == -1)
+    {
+      perror ("create_temp_file");
+      return 1;
+    }
+
+  FILE *fout = fdopen (wfd, "w+");
+  if (!fout)
+    {
+      perror ("fdopen");
+      return 1;
+    }
+
+  if (fputws (L"abc\n", fout) == WEOF
+      || fflush (fout) || ferror (fout))
+    {
+      perror ("write error (initial)");
+      return 1;
+    }
+
+  FILE *fin = fopen (scratch_fname, "r+");
+  if (!fin)
+    {
+      perror (scratch_fname);
+      return 1;
+    }
+
+  /* Reach EOF.  */
+  while (fgetwc (fin) != WEOF)
+    ;
+  if (ferror (fin))
+    {
+      perror ("read error");
+      return 1;
+    }
+
+  /* Enlarge the file using the write stream.  */
+  if (fputws (L"d\n", fout) == WEOF
+      || fflush (fout) || ferror (fout))
+    {
+      perror ("write error (enlarging)");
+      return 1;
+    }
+
+  /* feof (fin) should still be true at this point.  */
+  if (!feof (fin))
+    {
+      fputs ("FAIL: input stream not at EOF after enlarging", stderr);
+      return 1;
+    }
+
+  /* And consistent with that, another fgetwc should return WEOF.  */
+  wint_t c = fgetwc (fin);
+  if (c != WEOF)
+    {
+      fprintf (stderr, "FAIL: fgetc returned L'%lc', not WEOF\n", c);
+      return 1;
+    }
+
+  /* However, after calling clearerr(), we should be able to read the
+     next characters written to the file.  */
+  clearerr (fin);
+  c = fgetwc (fin);
+  if (c != L'd')
+    {
+      if (c == EOF)
+        fprintf (stderr,
+                 "FAIL: fgetwc after clearerr returned EOF, not L'd'\n");
+      else
+        fprintf (stderr,
+                 "FAIL: fgetc after clearerr returned L'%lc', not L'd'\n", c);
+      return 1;
+    }
+
+  fclose (fin);
+  fclose (fout);
+  return 0;
+}
+
+#include <support/test-driver.c>