From patchwork Mon Feb 12 16:42:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zack Weinberg X-Patchwork-Id: 25915 Received: (qmail 80113 invoked by alias); 12 Feb 2018 16:42:17 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 80098 invoked by uid 89); 12 Feb 2018 16:42:16 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=Reach, seriously, H*Ad:U*siddhesh, worrying X-HELO: mailbackend.panix.com From: Zack Weinberg To: libc-alpha@sourceware.org Cc: joseph@codesourcery.com, schwab@linux-m68k.org, siddhesh@gotplt.org Subject: [PATCH] [BZ #1190] Make EOF sticky in stdio. Date: Mon, 12 Feb 2018 11:42:11 -0500 Message-Id: <20180212164211.1249-1-zackw@panix.com> MIME-Version: 1.0 C99 specifies that the EOF condition on a file is "sticky": once EOF has been encountered, all subsequent reads should continue to return EOF until the file is closed or something clears the "end-of-file indicator" (e.g. fseek, clearerr). This is arguably a change from C89, where the wording was ambiguous; the BSDs always had sticky EOF, but the System V lineage would attempt to read from the underlying fd again. GNU libc has followed System V for as long as we've been using libio---the relevant chunk of code is int _IO_new_file_underflow (_IO_FILE *fp) { _IO_ssize_t count; #if 0 /* SysV does not make this test; take it out for compatibility */ if (fp->_flags & _IO_EOF_SEEN) return (EOF); #endif That's been unchanged since before 1995. This only matters if the underlying file has changed in the meantime, of course; perhaps that's why nobody got around to filing a bug report until 2005, six years after C99 was published. And nobody took that bug seriously until 2012, at which time there was a long but inconclusive discussion on libc-alpha regarding whether it would break applications to change anything, see . It is my considered opinion that we should just go ahead and fix the bug, and that a backward compatibility mode is not required, because the BSDs have always had sticky EOF, so portable code has always had to be prepared to deal with that behavior. Nowadays, the lineages we should be worrying most about compatibility with are all BSD-derived, anyway. Thus, this patch. You might wonder if changing the _underflow impls is sufficient to apply the C99 semantics to all of the many stdio functions that perform input. It should be enough to cover all paths to _IO_SYSREAD, and the only other functions that call _IO_SYSREAD are the _seekoff impls, which is OK because seeking clears EOF, and the _xsgetn impls, which, as far as I can tell, are unused within glibc. The test cases in this patch are adapted from bug #19476, filed in 2016; I will credit Martin Sebor in the change log. zw [BZ #1190] * libio/fileops.c (_IO_new_file_underflow): Return EOF immediately if the _IO_EOF_SEEN bit is already set; update commentary. * libio/oldfileops.c (_IO_old_file_underflow): Likewise. * libio/wfileops.c (_IO_wfile_underflow): Likewise. * libio/tst-fgetc-after-eof.c, wcsmbs/test-fgetwc-after-eof.c: New test cases. * libio/Makefile (tests): Add tst-fgetc-after-eof. * wcsmbs/Makefile (tests): Add tst-fgetwc-after-eof. --- libio/Makefile | 3 +- libio/fileops.c | 5 +- libio/oldfileops.c | 5 +- libio/tst-fgetc-after-eof.c | 135 ++++++++++++++++++++++++++++++++++++++++ libio/wfileops.c | 4 ++ wcsmbs/Makefile | 2 +- wcsmbs/tst-fgetwc-after-eof.c | 141 ++++++++++++++++++++++++++++++++++++++++++ 7 files changed, 287 insertions(+), 8 deletions(-) create mode 100644 libio/tst-fgetc-after-eof.c create mode 100644 wcsmbs/tst-fgetwc-after-eof.c diff --git a/libio/Makefile b/libio/Makefile index 3e08ed0eeb..a2e1c09e1c 100644 --- a/libio/Makefile +++ b/libio/Makefile @@ -64,7 +64,8 @@ tests = tst_swprintf tst_wprintf tst_swscanf tst_wscanf tst_getwc tst_putwc \ bug-memstream1 bug-wmemstream1 \ tst-setvbuf1 tst-popen1 tst-fgetwc bug-wsetpos tst-fseek \ tst-fwrite-error tst-ftell-partial-wide tst-ftell-active-handler \ - tst-ftell-append tst-fputws tst-bz22415 + tst-ftell-append tst-fputws tst-bz22415 tst-fgetc-after-eof \ + ifeq (yes,$(build-shared)) # Add test-fopenloc only if shared library is enabled since it depends on # shared localedata objects. diff --git a/libio/fileops.c b/libio/fileops.c index 5fd8a9647f..9c7e2bc7a2 100644 --- a/libio/fileops.c +++ b/libio/fileops.c @@ -469,11 +469,10 @@ int _IO_new_file_underflow (_IO_FILE *fp) { _IO_ssize_t count; -#if 0 - /* SysV does not make this test; take it out for compatibility */ + + /* C99 requires EOF to be "sticky". */ if (fp->_flags & _IO_EOF_SEEN) return (EOF); -#endif if (fp->_flags & _IO_NO_READS) { diff --git a/libio/oldfileops.c b/libio/oldfileops.c index f00f6cb979..637fea38db 100644 --- a/libio/oldfileops.c +++ b/libio/oldfileops.c @@ -294,11 +294,10 @@ attribute_compat_text_section _IO_old_file_underflow (_IO_FILE *fp) { _IO_ssize_t count; -#if 0 - /* SysV does not make this test; take it out for compatibility */ + + /* C99 requires EOF to be "sticky". */ if (fp->_flags & _IO_EOF_SEEN) return (EOF); -#endif if (fp->_flags & _IO_NO_READS) { diff --git a/libio/tst-fgetc-after-eof.c b/libio/tst-fgetc-after-eof.c new file mode 100644 index 0000000000..8487765d9d --- /dev/null +++ b/libio/tst-fgetc-after-eof.c @@ -0,0 +1,135 @@ +/* Bug 1190: EOF conditions are supposed to be sticky. + Copyright (C) 2018 Free Software Foundation. + Copying and distribution of this file, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. This file is offered as-is, + without any warranty. */ + +/* ISO C1999 specification of fgetc: + + #include + int fgetc (FILE *stream); + + Description + + If the end-of-file indicator for the input stream pointed to by + stream is not set and a next character is present, the fgetc + function obtains that character as an unsigned char converted to + an int and advances the associated file position indicator for + the stream (if defined). + + Returns + + If the end-of-file indicator for the stream is set, or if the + stream is at end-of-file, the end-of-file indicator for the + stream is set and the fgetc function returns EOF. Otherwise, the + fgetc function returns the next character from the input stream + pointed to by stream. If a read error occurs, the error indicator + for the stream is set and the fgetc function returns EOF. + + The requirement to return EOF "if the end-of-file indicator for the + stream is set" was new in C99; the language in the 1989 edition of + the standard was ambiguous. Historically, BSD-derived Unix always + had the C99 behavior, whereas in System V fgetc would attempt to + call read() again before returning EOF again. Prior to version 2.28, + glibc followed the System V behavior even though this does not + comply with C99. + + See + , + , + and the thread at + + for more detail. */ + +#include + +#include + +int +do_test (void) +{ + /* Create a scratch file, then open it by name a second time, + so that at the operating system level we have two independent + seek pointers. */ + char *scratch_fname; + int wfd = create_temp_file ("tst-fgetc-after-eof.", &scratch_fname); + if (wfd == -1) + { + perror ("create_temp_file"); + return 1; + } + + FILE *fout = fdopen (wfd, "w+"); + if (!fout) + { + perror ("fdopen"); + return 1; + } + + if (fputs ("abc\n", fout) == EOF + || fflush (fout) || ferror (fout)) + { + perror ("write error (initial)"); + return 1; + } + + FILE *fin = fopen (scratch_fname, "r+"); + if (!fin) + { + perror (scratch_fname); + return 1; + } + + /* Reach EOF. */ + while (fgetc (fin) != EOF) + ; + if (ferror (fin)) + { + perror ("read error"); + return 1; + } + + /* Enlarge the file using the write stream. */ + if (fputs ("d\n", fout) == EOF + || fflush (fout) || ferror (fout)) + { + perror ("write error (enlarging)"); + return 1; + } + + /* feof (fin) should still be true at this point. */ + if (!feof (fin)) + { + fputs ("FAIL: input stream not at EOF after enlarging", stderr); + return 1; + } + + /* And consistent with that, another fgetc should return EOF. */ + int c = fgetc (fin); + if (c != EOF) + { + fprintf (stderr, "FAIL: fgetc returned '%c', not EOF\n", c); + return 1; + } + + /* However, after calling clearerr(), we should be able to read the + next characters written to the file. */ + clearerr (fin); + c = fgetc (fin); + if (c != 'd') + { + if (c == EOF) + fprintf (stderr, "FAIL: fgetc after clearerr returned EOF, not 'd'\n"); + else + fprintf (stderr, + "FAIL: fgetc after clearerr returned '%c', not 'd'\n", c); + return 1; + } + + fclose (fin); + fclose (fout); + return 0; +} + +#include diff --git a/libio/wfileops.c b/libio/wfileops.c index 2488821d54..20ebc885e6 100644 --- a/libio/wfileops.c +++ b/libio/wfileops.c @@ -116,6 +116,10 @@ _IO_wfile_underflow (_IO_FILE *fp) enum __codecvt_result status; _IO_ssize_t count; + /* C99 requires EOF to be "sticky". */ + if (fp->_flags & _IO_EOF_SEEN) + return (EOF); + if (__glibc_unlikely (fp->_flags & _IO_NO_READS)) { fp->_flags |= _IO_ERR_SEEN; diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile index 3ee91d2e1a..63a6fbab58 100644 --- a/wcsmbs/Makefile +++ b/wcsmbs/Makefile @@ -50,7 +50,7 @@ strop-tests := wcscmp wcsncmp wmemcmp wcslen wcschr wcsrchr wcscpy wcsnlen \ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \ tst-wcrtomb tst-wcpncpy tst-mbsrtowcs tst-wchar-h tst-mbrtowc2 \ tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \ - tst-wcstod-round test-char-types \ + tst-wcstod-round test-char-types tst-fgetwc-after-eof \ $(addprefix test-,$(strop-tests)) include ../Rules diff --git a/wcsmbs/tst-fgetwc-after-eof.c b/wcsmbs/tst-fgetwc-after-eof.c new file mode 100644 index 0000000000..d0feeaa916 --- /dev/null +++ b/wcsmbs/tst-fgetwc-after-eof.c @@ -0,0 +1,141 @@ +/* Bug 1190: EOF conditions are supposed to be sticky. + Copyright (C) 2018 Free Software Foundation. + Copying and distribution of this file, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. This file is offered as-is, + without any warranty. */ + +/* ISO C1999 specification of fgetwc: + + #include + #include + wint_t fgetwc (FILE *stream); + + Description + + If the end-of-file indicator for the input stream pointed to by + stream is not set and a next wide character is present, the + fgetwc function obtains that wide character as a wchar_t + converted to a wint_t and advances the associated file position + indicator for the stream (if defined). + + Returns + + If the end-of-file indicator for the stream is set, or if the + stream is at end-of-file, the end- of-file indicator for the + stream is set and the fgetwc function returns WEOF. Otherwise, + the fgetwc function returns the next wide character from the + input stream pointed to by stream. If a read error occurs, the + error indicator for the stream is set and the fgetwc function + returns WEOF. If an encoding error occurs (including too few + bytes), the value of the macro EILSEQ is stored in errno and the + fgetwc function returns WEOF. + + The requirement to return WEOF "if the end-of-file indicator for the + stream is set" was new in C99; the language in the 1995 edition of + the standard was ambiguous. Historically, BSD-derived Unix always + had the C99 behavior, whereas in System V fgetc would attempt to + call read() again before returning EOF again. Prior to version 2.28, + glibc followed the System V behavior even though this does not + comply with C99. + + See + , + , + and the thread at + + for more detail. */ + +#include +#include + +#include + +int +do_test (void) +{ + /* Create a scratch file, then open it by name a second time, + so that at the operating system level we have two independent + seek pointers. */ + char *scratch_fname; + int wfd = create_temp_file ("tst-fgetc-after-eof.", &scratch_fname); + if (wfd == -1) + { + perror ("create_temp_file"); + return 1; + } + + FILE *fout = fdopen (wfd, "w+"); + if (!fout) + { + perror ("fdopen"); + return 1; + } + + if (fputws (L"abc\n", fout) == WEOF + || fflush (fout) || ferror (fout)) + { + perror ("write error (initial)"); + return 1; + } + + FILE *fin = fopen (scratch_fname, "r+"); + if (!fin) + { + perror (scratch_fname); + return 1; + } + + /* Reach EOF. */ + while (fgetwc (fin) != WEOF) + ; + if (ferror (fin)) + { + perror ("read error"); + return 1; + } + + /* Enlarge the file using the write stream. */ + if (fputws (L"d\n", fout) == WEOF + || fflush (fout) || ferror (fout)) + { + perror ("write error (enlarging)"); + return 1; + } + + /* feof (fin) should still be true at this point. */ + if (!feof (fin)) + { + fputs ("FAIL: input stream not at EOF after enlarging", stderr); + return 1; + } + + /* And consistent with that, another fgetwc should return WEOF. */ + wint_t c = fgetwc (fin); + if (c != WEOF) + { + fprintf (stderr, "FAIL: fgetc returned L'%lc', not WEOF\n", c); + return 1; + } + + /* However, after calling clearerr(), we should be able to read the + next characters written to the file. */ + clearerr (fin); + c = fgetwc (fin); + if (c != L'd') + { + if (c == EOF) + fprintf (stderr, + "FAIL: fgetwc after clearerr returned EOF, not L'd'\n"); + else + fprintf (stderr, + "FAIL: fgetc after clearerr returned L'%lc', not L'd'\n", c); + return 1; + } + + fclose (fin); + fclose (fout); + return 0; +} + +#include