Never cache offset when the stream handle is not active (BZ #16680)

Message ID 20140310122306.GF1656@spoyarek.pnq.redhat.com
State Superseded
Headers

Commit Message

Siddhesh Poyarekar March 10, 2014, 12:23 p.m. UTC
  Hi,

The ftell implementation was made conservative to ensure that
incorrectly cached offsets never affect it.  However, this causes
problems for append mode when a file stream is rewound.

For a and a+ mode files in read mode, rewinding the stream should
result in ftell returning 0 as the offset, but without caching, it
just assumes that the file offset is the end of file (as opposed to
SEEK_CUR, since rewind correctly sets it).  Now I couldn't find
anything in POSIX that specifies the stream position after rewind()
for a file opened in 'a' mode, but for 'a+' mode it should be set to
0.  For 'a' mode too, it probably makes sense to keep it set to 0 in
the interest of retaining old behavior.

The best way to fix this would be to avoid caching the offset before
the file handle is active.  With this change, the only time the offset
cache is not reliable is when the file is writing in any of the append
modes.

I have added a test case to the active-ftell-handle test to detect
this bug and verify that it is fixed.  I have also verified that there
are no regressions in the testsuite.

Siddhesh

	[BZ #16680]
	* libio/fileops.c (_IO_file_open): Don't change offset cache.
	(do_ftell): Use cached offset when available and when file is
	not in append mode.
	* libio/wfileops.c (do_ftell_wide): Likewise.
	* libio/tst-ftell-active-handler.c (do_rewind_test): New test
	case.
	(do_one_test): Call it.
	(do_ftell_test): Fix test output.

---
 libio/fileops.c                  |  31 +++---------
 libio/tst-ftell-active-handler.c | 104 ++++++++++++++++++++++++++++++++++++++-
 libio/wfileops.c                 |  25 +++-------
 3 files changed, 117 insertions(+), 43 deletions(-)
  

Comments

Rich Felker March 10, 2014, 3:37 p.m. UTC | #1
On Mon, Mar 10, 2014 at 05:53:06PM +0530, Siddhesh Poyarekar wrote:
> Hi,
> 
> The ftell implementation was made conservative to ensure that
> incorrectly cached offsets never affect it.  However, this causes
> problems for append mode when a file stream is rewound.
> 
> For a and a+ mode files in read mode, rewinding the stream should
> result in ftell returning 0 as the offset, but without caching, it
> just assumes that the file offset is the end of file (as opposed to
> SEEK_CUR, since rewind correctly sets it).  Now I couldn't find
> anything in POSIX that specifies the stream position after rewind()
> for a file opened in 'a' mode, but for 'a+' mode it should be set to
> 0.  For 'a' mode too, it probably makes sense to keep it set to 0 in
> the interest of retaining old behavior.
> 
> The best way to fix this would be to avoid caching the offset before
> the file handle is active.  With this change, the only time the offset
> cache is not reliable is when the file is writing in any of the append
> modes.

I'm confused. I thought we discussed before that caching the offset is
_always_ invalid when the file handle is not active. This is not
specific to append mode.

Rich
  
Siddhesh Poyarekar March 10, 2014, 3:42 p.m. UTC | #2
On Mon, Mar 10, 2014 at 11:37:13AM -0400, Rich Felker wrote:
> On Mon, Mar 10, 2014 at 05:53:06PM +0530, Siddhesh Poyarekar wrote:
> > Hi,
> > 
> > The ftell implementation was made conservative to ensure that
> > incorrectly cached offsets never affect it.  However, this causes
> > problems for append mode when a file stream is rewound.
> > 
> > For a and a+ mode files in read mode, rewinding the stream should
> > result in ftell returning 0 as the offset, but without caching, it
> > just assumes that the file offset is the end of file (as opposed to
> > SEEK_CUR, since rewind correctly sets it).  Now I couldn't find
> > anything in POSIX that specifies the stream position after rewind()
> > for a file opened in 'a' mode, but for 'a+' mode it should be set to
> > 0.  For 'a' mode too, it probably makes sense to keep it set to 0 in
> > the interest of retaining old behavior.
> > 
> > The best way to fix this would be to avoid caching the offset before
> > the file handle is active.  With this change, the only time the offset
> > cache is not reliable is when the file is writing in any of the append
> > modes.
> 
> I'm confused. I thought we discussed before that caching the offset is
> _always_ invalid when the file handle is not active. This is not
> specific to append mode.

Sorry, I should have been specific - the only time the offset cache is
not reliable in an _active_ file handle is when the file is writing in
any of the append modes.  The cache is always invalid when the handle
is not active, which is why no function should cache the offset till
the handle is activated.  Does that sound correct now?

Thanks,
Siddhesh
  
Rich Felker March 10, 2014, 6:34 p.m. UTC | #3
On Mon, Mar 10, 2014 at 09:12:04PM +0530, Siddhesh Poyarekar wrote:
> On Mon, Mar 10, 2014 at 11:37:13AM -0400, Rich Felker wrote:
> > On Mon, Mar 10, 2014 at 05:53:06PM +0530, Siddhesh Poyarekar wrote:
> > > Hi,
> > > 
> > > The ftell implementation was made conservative to ensure that
> > > incorrectly cached offsets never affect it.  However, this causes
> > > problems for append mode when a file stream is rewound.
> > > 
> > > For a and a+ mode files in read mode, rewinding the stream should
> > > result in ftell returning 0 as the offset, but without caching, it
> > > just assumes that the file offset is the end of file (as opposed to
> > > SEEK_CUR, since rewind correctly sets it).  Now I couldn't find
> > > anything in POSIX that specifies the stream position after rewind()
> > > for a file opened in 'a' mode, but for 'a+' mode it should be set to
> > > 0.  For 'a' mode too, it probably makes sense to keep it set to 0 in
> > > the interest of retaining old behavior.
> > > 
> > > The best way to fix this would be to avoid caching the offset before
> > > the file handle is active.  With this change, the only time the offset
> > > cache is not reliable is when the file is writing in any of the append
> > > modes.
> > 
> > I'm confused. I thought we discussed before that caching the offset is
> > _always_ invalid when the file handle is not active. This is not
> > specific to append mode.
> 
> Sorry, I should have been specific - the only time the offset cache is
> not reliable in an _active_ file handle is when the file is writing in
> any of the append modes.  The cache is always invalid when the handle
> is not active, which is why no function should cache the offset till
> the handle is activated.  Does that sound correct now?

Caching the offset is valid when writing in append mode. The one case
that's special for append mode is that, even once the handle is
active, you can't save the resulting offset at the time of seek/rewind
and use it for ftell during subsequent writing. This is because the
(first) write after seeking could change the current offset. One easy
way to deal with this is to treat seek/rewind as deactivating the
handle when the stream is in append mode, but more fine-grained
solutions are possible too.

Rich
  
Siddhesh Poyarekar March 11, 2014, 12:39 a.m. UTC | #4
On Mon, Mar 10, 2014 at 02:34:59PM -0400, Rich Felker wrote:
> Caching the offset is valid when writing in append mode. The one case
> that's special for append mode is that, even once the handle is
> active, you can't save the resulting offset at the time of seek/rewind
> and use it for ftell during subsequent writing. This is because the
> (first) write after seeking could change the current offset. One easy
> way to deal with this is to treat seek/rewind as deactivating the
> handle when the stream is in append mode, but more fine-grained
> solutions are possible too.

Treating seek/rewind as deactivating the handle won't help for append
mode because we want to simulate the file position as being the end of
file regardless of the file position when the handle is inactive, but
after a seek, the stream is assumed to be in read mode until the next
write and in such a case, we want to know the actual position of the
file and not the simulated end-of-file.  That is why I use writing in
append mode as a coarse-grained indicator that the file offset is the
end of file.  Isn't it similar to what you do in musl?

Siddhesh
  
Rich Felker March 11, 2014, 2:39 a.m. UTC | #5
On Tue, Mar 11, 2014 at 06:09:35AM +0530, Siddhesh Poyarekar wrote:
> On Mon, Mar 10, 2014 at 02:34:59PM -0400, Rich Felker wrote:
> > Caching the offset is valid when writing in append mode. The one case
> > that's special for append mode is that, even once the handle is
> > active, you can't save the resulting offset at the time of seek/rewind
> > and use it for ftell during subsequent writing. This is because the
> > (first) write after seeking could change the current offset. One easy
> > way to deal with this is to treat seek/rewind as deactivating the
> > handle when the stream is in append mode, but more fine-grained
> > solutions are possible too.
> 
> Treating seek/rewind as deactivating the handle won't help for append
> mode because we want to simulate the file position as being the end of
> file regardless of the file position when the handle is inactive, but

Why do you want to do this? It's wrong. If the handle is inactive, the
only way you can get the position is by calling lseek(fd,0,SEEK_CUR).
This is because operations on another handle could have changed the
position without stdio's knowledge.

> after a seek, the stream is assumed to be in read mode until the next
> write and in such a case, we want to know the actual position of the
> file and not the simulated end-of-file.  That is why I use writing in
> append mode as a coarse-grained indicator that the file offset is the
> end of file.  Isn't it similar to what you do in musl?

In musl there is no caching. ftello calls lseek(fd,0,SEEK_CUR) in all
cases except append mode with unflushed buffer; in this case it calls
lseek(fd,0,SEEK_END). (It's valid to move the offset in this case
because having unflushed data is proof that the handle is active.)

Rich
  
Patchwork Bot March 11, 2014, 2:57 a.m. UTC | #6
Resending, this is hopefully in plain text now.

Siddhesh

On 11/03/2014, Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> wrote:
> On Tuesday, 11 March 2014, Rich Felker <dalias@aerifal.cx> wrote:
>
>> On Tue, Mar 11, 2014 at 06:09:35AM +0530, Siddhesh Poyarekar wrote:
>> > On Mon, Mar 10, 2014 at 02:34:59PM -0400, Rich Felker wrote:
>> > > Caching the offset is valid when writing in append mode. The one case
>> > > that's special for append mode is that, even once the handle is
>> > > active, you can't save the resulting offset at the time of
>> > > seek/rewind
>> > > and use it for ftell during subsequent writing. This is because the
>> > > (first) write after seeking could change the current offset. One easy
>> > > way to deal with this is to treat seek/rewind as deactivating the
>> > > handle when the stream is in append mode, but more fine-grained
>> > > solutions are possible too.
>> >
>> > Treating seek/rewind as deactivating the handle won't help for append
>> > mode because we want to simulate the file position as being the end of
>> > file regardless of the file position when the handle is inactive, but
>>
>> Why do you want to do this? It's wrong. If the handle is inactive, the
>
> only way you can get the position is by calling lseek(fd,0,SEEK_CUR).
>
> This is because operations on another handle could have changed the
>> position without stdio's knowledge.
>>
>
> Because that would give an inconsistent result. Initial position for append
> mode files is implementation defined and glibc has set it as end of file in
> the past. Changing that would be a breakage, wouldn't it?
>
> Siddhesh
>
>
> --
> http://siddhesh.in
>
  
Rich Felker March 11, 2014, 3:15 a.m. UTC | #7
On Tue, Mar 11, 2014 at 08:24:26AM +0530, Siddhesh Poyarekar wrote:
> On Tuesday, 11 March 2014, Rich Felker <dalias@aerifal.cx> wrote:
> 
> > On Tue, Mar 11, 2014 at 06:09:35AM +0530, Siddhesh Poyarekar wrote:
> > > On Mon, Mar 10, 2014 at 02:34:59PM -0400, Rich Felker wrote:
> > > > Caching the offset is valid when writing in append mode. The one case
> > > > that's special for append mode is that, even once the handle is
> > > > active, you can't save the resulting offset at the time of seek/rewind
> > > > and use it for ftell during subsequent writing. This is because the
> > > > (first) write after seeking could change the current offset. One easy
> > > > way to deal with this is to treat seek/rewind as deactivating the
> > > > handle when the stream is in append mode, but more fine-grained
> > > > solutions are possible too.
> > >
> > > Treating seek/rewind as deactivating the handle won't help for append
> > > mode because we want to simulate the file position as being the end of
> > > file regardless of the file position when the handle is inactive, but
> >
> > Why do you want to do this? It's wrong. If the handle is inactive, the
> 
> only way you can get the position is by calling lseek(fd,0,SEEK_CUR).
> 
> This is because operations on another handle could have changed the
> > position without stdio's knowledge.
> >
> 
> Because that would give an inconsistent result. Initial position for append
> mode files is implementation defined and glibc has set it as end of file in
> the past. Changing that would be a breakage, wouldn't it?

Then fopen needs to physically seek to the end of the file with lseek
when append mode is requested. Faking the position is non-conforming
because it does not respect the rules for active handles.

Rich
  
Patchwork Bot March 11, 2014, 3:32 a.m. UTC | #8
OK, I guess I could just do the seek but not cache it in fopen and
fdopen.  That way I can get rid of the stat bits and just rely on
seek_cur. For append mode I can cache the offset after seeking to end
if we have an unflushed write, like you do in musl.

Siddhesh

On 11/03/2014, Rich Felker <dalias@aerifal.cx> wrote:
> On Tue, Mar 11, 2014 at 08:24:26AM +0530, Siddhesh Poyarekar wrote:
>> On Tuesday, 11 March 2014, Rich Felker <dalias@aerifal.cx> wrote:
>>
>> > On Tue, Mar 11, 2014 at 06:09:35AM +0530, Siddhesh Poyarekar wrote:
>> > > On Mon, Mar 10, 2014 at 02:34:59PM -0400, Rich Felker wrote:
>> > > > Caching the offset is valid when writing in append mode. The one
>> > > > case
>> > > > that's special for append mode is that, even once the handle is
>> > > > active, you can't save the resulting offset at the time of
>> > > > seek/rewind
>> > > > and use it for ftell during subsequent writing. This is because the
>> > > > (first) write after seeking could change the current offset. One
>> > > > easy
>> > > > way to deal with this is to treat seek/rewind as deactivating the
>> > > > handle when the stream is in append mode, but more fine-grained
>> > > > solutions are possible too.
>> > >
>> > > Treating seek/rewind as deactivating the handle won't help for append
>> > > mode because we want to simulate the file position as being the end
>> > > of
>> > > file regardless of the file position when the handle is inactive, but
>> >
>> > Why do you want to do this? It's wrong. If the handle is inactive, the
>>
>> only way you can get the position is by calling lseek(fd,0,SEEK_CUR).
>>
>> This is because operations on another handle could have changed the
>> > position without stdio's knowledge.
>> >
>>
>> Because that would give an inconsistent result. Initial position for
>> append
>> mode files is implementation defined and glibc has set it as end of file
>> in
>> the past. Changing that would be a breakage, wouldn't it?
>
> Then fopen needs to physically seek to the end of the file with lseek
> when append mode is requested. Faking the position is non-conforming
> because it does not respect the rules for active handles.
>
> Rich
>
  

Patch

diff --git a/libio/fileops.c b/libio/fileops.c
index 2e7bc8d..dbae602 100644
--- a/libio/fileops.c
+++ b/libio/fileops.c
@@ -232,13 +232,6 @@  _IO_file_open (fp, filename, posix_mode, prot, read_write, is32not64)
     return NULL;
   fp->_fileno = fdesc;
   _IO_mask_flags (fp, read_write,_IO_NO_READS+_IO_NO_WRITES+_IO_IS_APPENDING);
-  if ((read_write & _IO_IS_APPENDING) && (read_write & _IO_NO_READS))
-    if (_IO_SEEKOFF (fp, (_IO_off64_t)0, _IO_seek_end, _IOS_INPUT|_IOS_OUTPUT)
-	== _IO_pos_BAD && errno != ESPIPE)
-      {
-	close_not_cancel (fdesc);
-	return NULL;
-      }
   _IO_link_in ((struct _IO_FILE_plus *) fp);
   return fp;
 }
@@ -974,29 +967,19 @@  do_ftell (_IO_FILE *fp)
       bool was_writing = (fp->_IO_write_ptr > fp->_IO_write_base
 			  || _IO_in_put_mode (fp));
 
+      bool append_mode = (fp->_flags & _IO_IS_APPENDING) == _IO_IS_APPENDING;
+
       /* Adjust for unflushed data.  */
       if (!was_writing)
 	result -= fp->_IO_read_end - fp->_IO_read_ptr;
       else
 	result += fp->_IO_write_ptr - fp->_IO_read_end;
 
-      /* It is safe to use the cached offset when available if there is
-	 unbuffered data (indicating that the file handle is active) and the
-	 handle is not for a file open in a+ mode.  The latter condition is
-	 because there could be a scenario where there is a switch from read
-	 mode to write mode using an fseek to an arbitrary position.  In this
-	 case, there would be unbuffered data due to be appended to the end of
-	 the file, but the offset may not necessarily be the end of the
-	 file.  It is fine to use the cached offset when the a+ stream is in
-	 read mode though, since the offset is maintained correctly in that
-	 case.  Note that this is not a comprehensive set of cases when the
-	 offset is reliable.  The offset may be reliable even in some cases
-	 where there is no unflushed input and the handle is active, but it's
-	 just that we don't have a way to identify that condition reliably.  */
-      use_cached_offset = (result != 0 && fp->_offset != _IO_pos_BAD
-			   && ((fp->_flags & (_IO_IS_APPENDING | _IO_NO_READS))
-			       == (_IO_IS_APPENDING | _IO_NO_READS)
-			       && was_writing));
+      /* It should be safe to use the cached offset if it is available and we
+	 are not writing in a+ mode.  Any attempt to set the offset when the
+	 stream handle is not active, is a bug.  */
+      use_cached_offset = (fp->_offset != _IO_pos_BAD
+			   && !(was_writing && append_mode));
     }
 
   if (use_cached_offset)
diff --git a/libio/tst-ftell-active-handler.c b/libio/tst-ftell-active-handler.c
index 5d5fc26..4168c54 100644
--- a/libio/tst-ftell-active-handler.c
+++ b/libio/tst-ftell-active-handler.c
@@ -88,6 +88,107 @@  static size_t file_len;
 typedef int (*fputs_func_t) (const void *data, FILE *fp);
 fputs_func_t fputs_func;
 
+/* Test that ftell output after a rewind is correct.  */
+static int
+do_rewind_test (const char *filename)
+{
+  int ret = 0;
+  struct test
+    {
+      const char *mode;
+      int fd_mode;
+      size_t old_off;
+      size_t new_off;
+    } test_modes[] = {
+	  {"w", O_WRONLY, 0, data_len},
+	  {"w+", O_RDWR, 0, data_len},
+	  {"r+", O_RDWR, 0, data_len},
+	  /* The new offsets for 'a' and 'a+' modes have to factor in the
+	     previous writes since they compulsorily append to the end of the
+	     file.  */
+	  {"a", O_WRONLY, 0, 3 * data_len},
+	  {"a+", O_RDWR, 0, 4 * data_len},
+    };
+
+  /* Empty the file before the test so that our offsets are simple to
+     calculate.  */
+  FILE *fp = fopen (filename, "w");
+  if (fp == NULL)
+    {
+      printf ("Failed to open file for emptying\n");
+      return 1;
+    }
+  fclose (fp);
+
+  for (int j = 0; j < 2; j++)
+    {
+      for (int i = 0; i < sizeof (test_modes) / sizeof (struct test); i++)
+	{
+	  FILE *fp;
+	  int fd;
+	  int fileret;
+
+	  printf ("\trewind: %s (file, \"%s\"): ", j == 0 ? "fdopen" : "fopen",
+		  test_modes[i].mode);
+
+	  if (j == 0)
+	    fileret = get_handles_fdopen (filename, fd, fp,
+					  test_modes[i].fd_mode,
+					  test_modes[i].mode);
+	  else
+	    fileret = get_handles_fopen (filename, fd, fp, test_modes[i].mode);
+
+	  if (fileret != 0)
+	    return fileret;
+
+	  /* Write some content to the file, rewind and ensure that the ftell
+	     output after the rewind is 0.  POSIX does not specify what the
+	     behavior is when a file is rewound in 'a' mode, so we retain
+	     current behavior, which is to keep the 0 offset.  */
+	  size_t written = fputs_func (data, fp);
+
+	  if (written == EOF)
+	    {
+	      printf ("fputs[1] failed to write data\n");
+	      ret |= 1;
+	    }
+
+	  rewind (fp);
+	  long offset = ftell (fp);
+
+	  if (offset != test_modes[i].old_off)
+	    {
+	      printf ("Incorrect old offset.  Expected %zu, but got %ld, ",
+		      test_modes[i].old_off, offset);
+	      ret |= 1;
+	    }
+	  else
+	    printf ("old offset = %ld, ", offset);
+
+	  written = fputs_func (data, fp);
+
+	  if (written == EOF)
+	    {
+	      printf ("fputs[1] failed to write data\n");
+	      ret |= 1;
+	    }
+
+	  /* After this write, the offset in append modes should factor in the
+	     implicit lseek to the end of file.  */
+	  offset = ftell (fp);
+	  if (offset != test_modes[i].new_off)
+	    {
+	      printf ("Incorrect new offset.  Expected %zu, but got %ld\n",
+		      test_modes[i].new_off, offset);
+	      ret |= 1;
+	    }
+	  else
+	    printf ("new offset = %ld\n", offset);
+	}
+    }
+  return ret;
+}
+
 /* Test that the value of ftell is not cached when the stream handle is not
    active.  */
 static int
@@ -157,7 +258,7 @@  do_ftell_test (const char *filename)
 	  if (off != test_modes[i].new_off)
 	    {
 	      printf ("Incorrect new offset.  Expected %zu but got %ld\n",
-		      test_modes[i].old_off, off);
+		      test_modes[i].new_off, off);
 	      ret |= 1;
 	    }
 	  else
@@ -322,6 +423,7 @@  do_one_test (const char *filename)
   ret |= do_ftell_test (filename);
   ret |= do_write_test (filename);
   ret |= do_append_test (filename);
+  ret |= do_rewind_test (filename);
 
   return ret;
 }
diff --git a/libio/wfileops.c b/libio/wfileops.c
index 8b2e108..eaeb527 100644
--- a/libio/wfileops.c
+++ b/libio/wfileops.c
@@ -704,24 +704,13 @@  do_ftell_wide (_IO_FILE *fp)
 	  offset -= fp->_IO_read_end - fp->_IO_write_ptr;
 	}
 
-      /* It is safe to use the cached offset when available if there is
-	 unbuffered data (indicating that the file handle is active) and
-	 the handle is not for a file open in a+ mode.  The latter
-	 condition is because there could be a scenario where there is a
-	 switch from read mode to write mode using an fseek to an arbitrary
-	 position.  In this case, there would be unbuffered data due to be
-	 appended to the end of the file, but the offset may not
-	 necessarily be the end of the file.  It is fine to use the cached
-	 offset when the a+ stream is in read mode though, since the offset
-	 is maintained correctly in that case.  Note that this is not a
-	 comprehensive set of cases when the offset is reliable.  The
-	 offset may be reliable even in some cases where there is no
-	 unflushed input and the handle is active, but it's just that we
-	 don't have a way to identify that condition reliably.  */
-      use_cached_offset = (offset != 0 && fp->_offset != _IO_pos_BAD
-			   && ((fp->_flags & (_IO_IS_APPENDING | _IO_NO_READS))
-			       == (_IO_IS_APPENDING | _IO_NO_READS)
-			       && was_writing));
+      bool append_mode = (fp->_flags & _IO_IS_APPENDING) == _IO_IS_APPENDING;
+
+      /* It should be safe to use the cached offset if it is available and we
+	 are not writing in a+ mode.  Any attempt to set the offset when the
+	 stream handle is not active, is a bug.  */
+      use_cached_offset = (fp->_offset != _IO_pos_BAD
+			   && !(was_writing && append_mode));
     }
 
   if (use_cached_offset)