Document further requirement on mixing streams / file descriptors

Message ID 46008e45-db75-c168-70fe-3c5b5009a9b5@redhat.com
State Changes Requested

Checks

Context                                           Check    Description
redhat-pt-bot/TryBot-apply_patch                  success  Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit                        fail     Patch caused testsuite regressions
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64  success  Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64  success  Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm      success  Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm      success  Test passed

Commit Message

Joseph Myers Sept. 25, 2024, 9:28 p.m. UTC
  The gilbc manual has some documentation in llio.texi of requirements
for moving between I/O on FILE * streams and file descriptors on the
same open file description.

The documentation of what must be done on a FILE * stream to move from
it to either a file descriptor or another FILE * for the same open
file description seems to match POSIX.  However, there is an
additional requirement in POSIX on the *second* of the two handles
being moved between, which is not mentioned in the glibc manual: "If
any previous active handle has been used by a function that explicitly
changed the file offset, except as required above for the first
handle, the application shall perform an lseek() or fseek() (as
appropriate to the type of handle) to an appropriate location.".

Document this requirement on seeking in the glibc manual.  Note that
I'm not sure what the "except as required above for the first handle"
is meant to be about, so I haven't documented anything for it.  As far
as I can tell, nothing specified for moving from the first handle
actually lists calling a seek function as one of the steps to be done.
(Current POSIX doesn't seem to have any relevant rationale for this
section.  The rationale in the 1996 edition says "In requiring the
seek to an appropriate location for the new handle, the application is
required to know what it is doing if it is passing streams with seeks
involved.  If the required seek is not done, the results are undefined
(and in fact the program probably will not work on many common
implementations)." - which also doesn't help in understanding the
purpose of "except as required above for the first handle".)

Tested with "make info" and "make pdf".
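
For illustration, here is a minimal C sketch of the sequence that the quoted POSIX text seems to call for, assuming a seekable temporary file. The function name mix_stream_and_descriptor is hypothetical and not part of the patch. The stream is cleaned with fflush before the descriptor takes over, the descriptor explicitly changes the offset, and the stream is then repositioned with fseek before it is used again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Write through a stream, patch the first byte through the linked
   descriptor, then read everything back through the stream.
   OUT must have room for at least 6 bytes.  Returns 0 on success
   and -1 on failure.  (Hypothetical helper, for illustration only.)  */
int
mix_stream_and_descriptor (char *out)
{
  int result = -1;
  size_t n;
  FILE *fp;
  int fd;

  assert (out != NULL);
  fp = tmpfile ();
  if (fp == NULL)
    return -1;
  fd = fileno (fp);

  /* The stream is the active handle.  */
  if (fputs ("hello", fp) == EOF)
    goto done;

  /* Clean up the stream before moving to the linked descriptor.  */
  if (fflush (fp) != 0)
    goto done;

  /* The descriptor is now active and explicitly changes the offset.  */
  if (lseek (fd, 0, SEEK_SET) == (off_t) -1)
    goto done;
  if (write (fd, "J", 1) != 1)
    goto done;

  /* Moving back to the stream: seek to an appropriate location
     before doing any further I/O on it.  */
  if (fseek (fp, 0, SEEK_SET) != 0)
    goto done;
  n = fread (out, 1, 5, fp);
  out[n] = '\0';
  if (n == 5)
    result = 0;

done:
  fclose (fp);
  return result;
}
```

A caller would pass a buffer of at least six bytes and check the return value before looking at the contents.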
  

Comments

Florian Weimer Sept. 26, 2024, 8:28 a.m. UTC | #1
* Joseph Myers:

> The gilbc manual has some documentation in llio.texi of requirements

Typo: g[lib]c

> diff --git a/manual/llio.texi b/manual/llio.texi
> index a035c3e20f..3ea5c352ee 100644
> --- a/manual/llio.texi
> +++ b/manual/llio.texi
> @@ -1097,6 +1097,21 @@ streams persist in other processes, their file positions become
>  undefined as a result.  To prevent this, you must clean up the streams
>  before destroying them.
>  
> +In addition to cleaning up a stream before doing I/O using another
> +linked channel, additional precautions are needed to ensure a
> +well-defined file position indicator in some cases.  If both the
> +following conditions hold, you must set the file position indicator on
> +the new channel (either a stream or a descriptor) using a function
> +such as @code{fseek} or @code{lseek}.
> +
> +@itemize @bullet
> +@item At least one of the old and new linked channels is a stream.
> +
> +@item The file position indicator was previously set (using the old
> +linked channel or a previous channel linked to it) with a function
> +such as @code{fseek} or @code{lseek}.
> +@end itemize

For context, this updates the Linked Channels subsection, which is about
channels with the same underlying file description.

I do not think this rule is accurate.  The standard streams are linked
channels, typically with descriptors for the file description in the
parent process.  They are streams.  A freshly started program does not
know if another program seeked any of the descriptors before.  Does this
mean programs need to add fseek calls for the standard streams?  What if
those streams are not seekable?

I think we have a step missing in the cleaning process: the new channel
may indeed need seeking.  The current manual suggests that cleaning is
only needed on the old channel, but I don't think this is accurate, for
both linked and independent channels.  For example, an input stream may
have old file contents buffered.
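
The buffered-input concern can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the file lives under /tmp, the name read_after_cleaning is hypothetical, and fflush is used to clean the input stream (POSIX defines this for seekable input streams). The file is changed through a separate descriptor after the stream has filled its buffer, and the stream is cleaned before the next read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read one byte from an input stream, modify the file through another
   channel, clean the stream, and return the next byte read from it.
   Returns EOF on failure.  (Hypothetical helper, for illustration.)  */
int
read_after_cleaning (void)
{
  char path[] = "/tmp/llio-stale-XXXXXX";  /* assumed writable directory */
  int result = EOF;
  FILE *in = NULL;
  int fd = mkstemp (path);

  if (fd == -1)
    return EOF;
  if (write (fd, "old data", 8) != 8)
    goto done;

  in = fopen (path, "r");
  if (in == NULL)
    goto done;

  /* This read will typically pull the whole file into the buffer.  */
  if (fgetc (in) != 'o')
    goto done;

  /* Change bytes 1..3 behind the stream's back.  */
  if (pwrite (fd, "NEW", 3, 1) != 3)
    goto done;

  /* Clean the input stream so that buffered old contents are dropped.
     Without this step the next read may return obsolete data.  */
  if (fflush (in) != 0)
    goto done;
  result = fgetc (in);

done:
  if (in != NULL)
    fclose (in);
  close (fd);
  unlink (path);
  return result;
}
```

With the cleaning step in place, the second read should reflect the modified file contents rather than whatever was buffered earlier.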

Thanks,
Florian
  
Joseph Myers Sept. 26, 2024, 10:39 p.m. UTC | #2
On Thu, 26 Sep 2024, Florian Weimer wrote:

> I do not think this rule is accurate.  The standard streams are linked
> channels, typically with descriptors for the file description in the
> parent process.  They are streams.  A freshly started program does not
> know if another program seeked any of the descriptors before.  Does this
> mean programs need to add fseek calls for the standard streams?  What if
> those streams are not seekable?

This seems like it might be an omission in the POSIX specification.

What I'd expect is: after execve, the standard streams are set up from the 
relevant file descriptors.  If the previous process seeked on a handle for 
that open file description, then it (possibly in the child after fork) 
must make the file descriptor active, including seeking on it to get a 
defined offset, but then after execve nothing more is needed regarding 
seeking on the stream (assuming that other processes aren't using the 
same open file description at the same time).

I can't however find anything in POSIX that says that this is what happens 
with handles for file descriptors 0, 1, 2 on execve (and, in particular, 
that the requirement to seek on the stream does not apply).  If we think 
this is what the semantics should be for glibc, we could still document it 
as such.

> I think we have a step missing in the cleaning process: the new channel
> may indeed need seeking.  The current manual suggests that cleaning is
> only needed on the old channel, but I don't think this is accurate, for
> both linked and independent channels.  For example, an input stream may
> have old file contents buffered.

For an input stream with old contents buffered (that is the new, linked 
handle), I think it would have been the active handle earlier, and so have 
needed to be cleaned when it ceased to be the active handle.  (In the case 
of independent channels, the manual already says "You should clean an 
input stream before reading data that may have been modified using an 
independent channel.  Otherwise, you might read obsolete data that had 
been in the stream's buffer.".)
  
Florian Weimer Sept. 30, 2024, 10:10 a.m. UTC | #3
* Joseph Myers:

> On Thu, 26 Sep 2024, Florian Weimer wrote:
>
>> I do not think this rule is accurate.  The standard streams are linked
>> channels, typically with descriptors for the file description in the
>> parent process.  They are streams.  A freshly started program does not
>> know if another program seeked any of the descriptors before.  Does this
>> mean programs need to add fseek calls for the standard streams?  What if
>> those streams are not seekable?
>
> This seems like it might be an omission in the POSIX specification.
>
> What I'd expect is: after execve, the standard streams are set up from the 
> relevant file descriptors.  If the previous process seeked on a handle for 
> that open file description, then it (possibly in the child after fork) 
> must make the file descriptor active, including seeking on it to get a 
> defined offset, but then after execve nothing more is needed regarding 
> seeking on the stream (assuming that other processes aren't using the 
> same open file description at the same time).

I am puzzled by this seeking requirement on the newly created descriptors.
Why would one have to seek after a dup on the new descriptor?  Do you
think that's relevant in a GNU/Linux context?  After all, the new
descriptor shares the underlying file description, and does not maintain
its offset.
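
The point about dup can be checked with a small C sketch like the one below. It assumes a writable /tmp, and the name offsets_across_dup is hypothetical. An offset set through the original descriptor is observed through the duplicate, and a seek through the duplicate is observed through the original, because both refer to the same open file description.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Seek to TARGET on a descriptor, duplicate it, and store the offset
   seen through the duplicate in *VIA_DUP.  Then advance the duplicate
   by 3 and store the offset seen through the original descriptor in
   *VIA_ORIGINAL.  Returns 0 on success, -1 on failure.
   (Hypothetical helper, for illustration only.)  */
int
offsets_across_dup (off_t target, off_t *via_dup, off_t *via_original)
{
  char path[] = "/tmp/llio-dup-XXXXXX";  /* assumed writable directory */
  int result = -1;
  int dupfd = -1;
  int fd;

  assert (via_dup != NULL && via_original != NULL);
  fd = mkstemp (path);
  if (fd == -1)
    return -1;
  unlink (path);  /* the file stays usable while it is open */

  if (lseek (fd, target, SEEK_SET) == (off_t) -1)
    goto done;

  dupfd = dup (fd);
  if (dupfd == -1)
    goto done;

  /* No seek is needed on the new descriptor to see the offset.  */
  *via_dup = lseek (dupfd, 0, SEEK_CUR);

  /* Moving the offset through the duplicate moves it for both.  */
  if (lseek (dupfd, 3, SEEK_CUR) == (off_t) -1)
    goto done;
  *via_original = lseek (fd, 0, SEEK_CUR);
  result = 0;

done:
  if (dupfd != -1)
    close (dupfd);
  close (fd);
  return result;
}
```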

>> I think we have a step missing in the cleaning process: the new channel
>> may indeed need seeking.  The current manual suggests that cleaning is
>> only needed on the old channel, but I don't think this is accurate, for
>> both linked and independent channels.  For example, an input stream may
>> have old file contents buffered.
>
> For an input stream with old contents buffered (that is the new, linked 
> handle), I think it would have been the active handle earlier, and so have 
> needed to be cleaned when it ceased to be the active handle.  (In the case 
> of independent channels, the manual already says "You should clean an 
> input stream before reading data that may have been modified using an 
> independent channel.  Otherwise, you might read obsolete data that had 
> been in the stream's buffer.".)

That's a good point.  Maybe it's possible to tweak the language you
proposed to apply only if the new stream was previously active?

The standard streams were not active before, so maybe that change is
sufficient to avoid the unnecessary requirement about seeking for them?

Thanks,
Florian
  
Joseph Myers Oct. 7, 2024, 10:58 p.m. UTC | #4
On Mon, 30 Sep 2024, Florian Weimer wrote:

> > What I'd expect is: after execve, the standard streams are set up from the 
> > relevant file descriptors.  If the previous process seeked on a handle for 
> > that open file description, then it (possibly in the child after fork) 
> > must make the file descriptor active, including seeking on it to get a 
> > defined offset, but then after execve nothing more is needed regarding 
> > seeking on the stream (assuming that other processes aren't using the 
> > same open file description at the same time).
> 
> I am puzzled by this seeking requirement on the newly created descriptors.
> Why would one have to seek after a dup on the new descriptor?  Do you
> think that's relevant in a GNU/Linux context?  After all, the new
> descriptor shares the underlying file description, and does not maintain
> its offset.

It may in fact not be relevant (if the previous handle was a file 
descriptor, its offset should be well defined; if it was a stream, the 
rules for cleaning a stream before changing handles should have ensured 
the offset on the underlying file descriptor is correct).

> > For an input stream with old contents buffered (that is the new, linked 
> > handle), I think it would have been the active handle earlier, and so have 
> > needed to be cleaned when it ceased to be the active handle.  (In the case 
> > of independent channels, the manual already says "You should clean an 
> > input stream before reading data that may have been modified using an 
> > independent channel.  Otherwise, you might read obsolete data that had 
> > been in the stream's buffer.".)
> 
> That's a good point.  Maybe it's possible to tweak the language you
> proposed to apply only if the new stream was previously active?
> 
> The standard streams were not active before, so maybe that change is
> sufficient to avoid the unnecessary requirement about seeking for them?

So the relevant case here is: the previous handle on which seeking took 
place was a stream, which is now being made active again.  Seeking 
resulted in fp->_offset being set to something other than _IO_pos_BAD, and 
the rules for cleaning the stream did not result in it becoming 
_IO_pos_BAD again (for example, it's an unbuffered stream).  But because 
the offset of the underlying file descriptor may have changed since that 
handle stopped being active, seeking on the stream being made active again 
is necessary to make fp->_offset correct again.
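
As a rough C sketch of that case (not taken from the patch; reactivate_stream and demo_reactivate are hypothetical names, and the code relies only on public interfaces rather than on fp->_offset): an unbuffered stream is sought and used, the linked descriptor then moves the offset, and the stream is repositioned to the descriptor's current offset before it is used again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Make FP the active handle again after its linked descriptor was
   used: ask the descriptor for the current file offset and seek the
   stream there.  Returns 0 on success, -1 on failure.
   (Hypothetical helper, for illustration only.)  */
int
reactivate_stream (FILE *fp)
{
  off_t pos;

  assert (fp != NULL);
  pos = lseek (fileno (fp), 0, SEEK_CUR);
  if (pos == (off_t) -1)
    return -1;
  return fseeko (fp, pos, SEEK_SET);
}

/* Walk through the case above and return the byte read through the
   stream after it has been made active again, or EOF on failure.  */
int
demo_reactivate (void)
{
  int result = EOF;
  char byte;
  int fd;
  FILE *fp = tmpfile ();

  if (fp == NULL)
    return EOF;
  fd = fileno (fp);

  /* An unbuffered stream: cleaning it has nothing to flush.  */
  if (setvbuf (fp, NULL, _IONBF, 0) != 0)
    goto done;

  /* Fill the file through the descriptor first.  */
  if (write (fd, "0123456789", 10) != 10)
    goto done;

  /* The stream becomes active and seeks explicitly.  */
  if (fseek (fp, 2, SEEK_SET) != 0 || fgetc (fp) != '2')
    goto done;

  /* The descriptor becomes active and moves the offset to 8.  */
  if (lseek (fd, 7, SEEK_SET) == (off_t) -1)
    goto done;
  if (read (fd, &byte, 1) != 1 || byte != '7')
    goto done;

  /* The previously active stream is made active again: seek on it
     so that its idea of the position matches the descriptor's.  */
  if (reactivate_stream (fp) != 0)
    goto done;
  result = fgetc (fp);

done:
  fclose (fp);
  return result;
}
```

Querying the descriptor with lseek and passing the result to fseeko is one way to pick "an appropriate location"; an application that already knows the intended position could seek there directly instead.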

Here is a version of the patch that tries to reflect that understanding.



Document further requirement on mixing streams / file descriptors

The gilbc manual has some documentation in llio.texi of requirements
for moving between I/O on FILE * streams and file descriptors on the
same open file description.

The documentation of what must be done on a FILE * stream to move from
it to either a file descriptor or another FILE * for the same open
file description seems to match POSIX.  However, there is an
additional requirement in POSIX on the *second* of the two handles
being moved between, which is not mentioned in the glibc manual: "If
any previous active handle has been used by a function that explicitly
changed the file offset, except as required above for the first
handle, the application shall perform an lseek() or fseek() (as
appropriate to the type of handle) to an appropriate location.".

Document this requirement on seeking in the glibc manual, limited to
the case that seems relevant to glibc (the new channel is a previously
active stream, on which the seeking previously occurred).  Note that
I'm not sure what the "except as required above for the first handle"
is meant to be about, so I haven't documented anything for it.  As far
as I can tell, nothing specified for moving from the first handle
actually lists calling a seek function as one of the steps to be done.
(Current POSIX doesn't seem to have any relevant rationale for this
section.  The rationale in the 1996 edition says "In requiring the
seek to an appropriate location for the new handle, the application is
required to know what it is doing if it is passing streams with seeks
involved.  If the required seek is not done, the results are undefined
(and in fact the program probably will not work on many common
implementations)." - which also doesn't help in understanding the
purpose of "except as required above for the first handle".)

Tested with "make info" and "make pdf".

---

Changes in v2: restrict the documentation of the requirement to the
case that seems relevant to glibc, then note the additional cases from
POSIX.

diff --git a/manual/llio.texi b/manual/llio.texi
index a035c3e20f..3f76ee40fe 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -1097,6 +1097,27 @@ streams persist in other processes, their file positions become
 undefined as a result.  To prevent this, you must clean up the streams
 before destroying them.
 
+In addition to cleaning up a stream before doing I/O using another
+linked channel, additional precautions are needed to ensure a
+well-defined file position indicator in some cases.  If both the
+following conditions hold, you must set the file position indicator on
+the new channel (a stream) using a function such as @code{fseek}.
+
+@itemize @bullet
+@item
+The new linked channel is a stream that was previously active.
+
+@item
+The file position indicator was previously set on that channel (while
+it was previously active) with a function such as @code{fseek}.
+@end itemize
+
+POSIX requires such precautions in more cases: if either the old or
+the new linked channel is a stream (whether or not previously active)
+and the file position indicator was previously set on any channel
+linked to those channels with a function such as @code{fseek} or
+@code{lseek}.
+
 @node Independent Channels
 @subsection Independent Channels
 @cindex independent channels
  

Patch

diff --git a/manual/llio.texi b/manual/llio.texi
index a035c3e20f..3ea5c352ee 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -1097,6 +1097,21 @@  streams persist in other processes, their file positions become
 undefined as a result.  To prevent this, you must clean up the streams
 before destroying them.
 
+In addition to cleaning up a stream before doing I/O using another
+linked channel, additional precautions are needed to ensure a
+well-defined file position indicator in some cases.  If both the
+following conditions hold, you must set the file position indicator on
+the new channel (either a stream or a descriptor) using a function
+such as @code{fseek} or @code{lseek}.
+
+@itemize @bullet
+@item At least one of the old and new linked channels is a stream.
+
+@item The file position indicator was previously set (using the old
+linked channel or a previous channel linked to it) with a function
+such as @code{fseek} or @code{lseek}.
+@end itemize
+
 @node Independent Channels
 @subsection Independent Channels
 @cindex independent channels