manual: Clarifications for listing directories

Message ID 87plknf4d6.fsf@oldenburg.str.redhat.com (mailing list archive)
State New
Delegated to: Maxim Kuvyrkov
Series manual: Clarifications for listing directories

Checks

Context                                           Check    Description
redhat-pt-bot/TryBot-apply_patch                  success  Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64  success  Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64  success  Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm      success  Build passed
redhat-pt-bot/TryBot-32bit                        success  Build for i686
linaro-tcwg-bot/tcwg_glibc_check--master-arm      success  Test passed

Commit Message

Florian Weimer Jan. 16, 2025, 10:19 a.m. UTC
  Support for seeking is limited.  Using the d_off and d_reclen members
of struct dirent is discouraged, especially with readdir.  Concurrent
modification of directories during iteration may result in duplicate
or missing entries.

---
 manual/filesys.texi | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 64 insertions(+), 2 deletions(-)
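
Note (illustration only, not part of the patch): the readdir text below
recommends completing the iteration and copying the names before creating
files in the same directory.  A minimal C sketch of that pattern follows.
The helper names snapshot_names and make_markers and the ".seen" suffix
are hypothetical; they appear neither in the patch nor in glibc.  Names
are copied with strdup because d_name may extend beyond the end of
struct dirent, so copying the struct itself is not sufficient.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: collect every name in PATH except "." and ".."
   into a malloc'ed array of malloc'ed strings.  Returns the number of
   names, or -1 on error (with errno set).  */
static ssize_t
snapshot_names (const char *path, char ***names_out)
{
  assert (names_out != NULL);
  DIR *dir = opendir (path);
  if (dir == NULL)
    return -1;

  char **names = NULL;
  size_t count = 0, capacity = 0;
  for (;;)
    {
      /* readdir returns NULL both at the end and on error; errno
         distinguishes the two cases.  */
      errno = 0;
      struct dirent *e = readdir (dir);
      if (e == NULL)
        break;
      if (strcmp (e->d_name, ".") == 0 || strcmp (e->d_name, "..") == 0)
        continue;
      if (count == capacity)
        {
          size_t new_capacity = capacity == 0 ? 16 : capacity * 2;
          char **p = realloc (names, new_capacity * sizeof (*p));
          if (p == NULL)
            goto fail;
          names = p;
          capacity = new_capacity;
        }
      /* Copy only the string, never the whole struct dirent.  */
      names[count] = strdup (e->d_name);
      if (names[count] == NULL)
        goto fail;
      ++count;
    }
  if (errno != 0)
    goto fail;
  closedir (dir);
  *names_out = names;
  return (ssize_t) count;

 fail:
  {
    int saved_errno = errno;
    for (size_t i = 0; i < count; ++i)
      free (names[i]);
    free (names);
    closedir (dir);
    errno = saved_errno;
    return -1;
  }
}

/* Hypothetical example use: create an empty "<name>.seen" file for each
   entry.  Files are created only after the listing has completed, so
   the new files cannot disturb the iteration.  Returns the number of
   files created, or -1 if the listing failed.  */
static ssize_t
make_markers (const char *path)
{
  char **names;
  ssize_t count = snapshot_names (path, &names);
  if (count < 0)
    return -1;

  ssize_t created = 0;
  for (ssize_t i = 0; i < count; ++i)
    {
      char full[4096];
      int len = snprintf (full, sizeof (full), "%s/%s.seen", path, names[i]);
      if (len > 0 && (size_t) len < sizeof (full))
        {
          int fd = open (full, O_WRONLY | O_CREAT | O_CLOEXEC, 0600);
          if (fd >= 0)
            {
              close (fd);
              ++created;
            }
        }
      free (names[i]);
    }
  free (names);
  return created;
}
```

The sketch trades memory for predictability: the whole listing is held
in memory, but no file is created while the directory stream is open.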


base-commit: a402cae36d95a2141703df324b5de5b581868c5c
  

Patch

diff --git a/manual/filesys.texi b/manual/filesys.texi
index aabb68385b..0500a751be 100644
--- a/manual/filesys.texi
+++ b/manual/filesys.texi
@@ -409,18 +409,41 @@  entries.  It contains the following fields:
 This is the null-terminated file name component.  This is the only
 field you can count on in all POSIX systems.
 
+While this field is defined with a specified length, functions such as
+@code{readdir} may return a pointer to a @code{struct dirent} where the
+@code{d_name} extends beyond the end of the struct.
+
 @item ino_t d_fileno
 This is the file serial number.  For BSD compatibility, you can also
 refer to this member as @code{d_ino}.  On @gnulinuxhurdsystems{} and most POSIX
 systems, for most files this the same as the @code{st_ino} member that
 @code{stat} will return for the file.  @xref{File Attributes}.
 
+@item off_t d_off
+This value contains the offset of the next directory entry (after this
+entry) in the directory stream.  The value may not be compatible with
+@code{lseek} or @code{seekdir}, especially if the width of @code{d_off}
+is less than 64 bits.  Directory entries are not ordered by offset, and
+the @code{d_off} and @code{d_reclen} values are unrelated.  Seeking on
+directory streams is not recommended.  The symbol
+@code{_DIRENT_HAVE_D_OFF} is defined if the @code{d_off} member is
+available.
+
 @item unsigned char d_namlen
 This is the length of the file name, not including the terminating
 null character.  Its type is @code{unsigned char} because that is the
 integer type of the appropriate size.  This member is a BSD extension.
 The symbol @code{_DIRENT_HAVE_D_NAMLEN} is defined if this member is
-available.
+available.  (It is not available on Linux.)
+
+@item unsigned short int d_reclen
+This is the length of the entire directory record.  When iterating
+through a buffer filled by @code{getdents64} (@pxref{Low-level Directory
+Access}), this value needs to be added to the offset of the current
+directory entry to obtain the offset of the next entry.  When using
+@code{readdir} and related functions, the value of @code{d_reclen} is
+undefined and should not be accessed.  The symbol
+@code{_DIRENT_HAVE_D_RECLEN} is defined if this member is available.
 
 @item unsigned char d_type
 This is the type of the file, possibly unknown.  The following constants
@@ -457,7 +480,7 @@  This member is a BSD extension.  The symbol @code{_DIRENT_HAVE_D_TYPE}
 is defined if this member is available.  On systems where it is used, it
 corresponds to the file type bits in the @code{st_mode} member of
 @code{struct stat}.  If the value cannot be determined the member
-value is DT_UNKNOWN.  These two macros convert between @code{d_type}
+value is @code{DT_UNKNOWN}.  These two macros convert between @code{d_type}
 values and @code{st_mode} values:
 
 @deftypefun int IFTODT (mode_t @var{mode})
@@ -632,6 +655,20 @@  and can be rewritten by a subsequent call.
 return entries for @file{.} and @file{..}, even though these are always
 valid file names in any directory.  @xref{File Name Resolution}.
 
+If a directory is modified before a call to @code{readdir} and
+after the directory stream was created or @code{rewinddir} was last
+called on it, it is unspecified according to POSIX whether newly created
+or removed entries appear among the entries returned by repeated
+@code{readdir} calls before the end of the directory is reached.
+However, due to practical implementation constraints, it is possible
+that entries (including unrelated, unmodified entries) appear multiple
+times or do not appear at all if the directory is modified while listing
+it.  If the application intends to create files in the directory, it
+may be necessary to complete the iteration first and create a copy of the
+information obtained before creating any new files.  (See below for
+instructions regarding copying of @code{d_name}.)  The iteration can be
+restarted using @code{rewinddir}.  @xref{Random Access Directory}.
+
 If there are no more entries in the directory or an error is detected,
 @code{readdir} returns a null pointer.  The following @code{errno} error
 conditions are defined for this function:
@@ -812,6 +849,10 @@  directory since it was opened with @code{opendir}.  (Entries for these
 files might or might not be returned by @code{readdir} if they were
 added or removed since you last called @code{opendir} or
 @code{rewinddir}.)
+
+For example, it is recommended to call @code{rewinddir} followed by
+@code{readdir} to check if a directory is empty after listing it with
+@code{readdir} and deleting all encountered files from it.
 @end deftypefun
 
 @deftypefun {long int} telldir (DIR *@var{dirstream})
@@ -823,6 +864,13 @@  added or removed since you last called @code{opendir} or
 The @code{telldir} function returns the file position of the directory
 stream @var{dirstream}.  You can use this value with @code{seekdir} to
 restore the directory stream to that position.
+
+Using the @code{telldir} function is not recommended.
+
+The value returned by @code{telldir} may not be compatible with the
+@code{d_off} field in @code{struct dirent}, and cannot be used with the
+@code{lseek} function.  The returned value may not unambiguously
+identify the position in the directory stream.
 @end deftypefun
 
 @deftypefun void seekdir (DIR *@var{dirstream}, long int @var{pos})
@@ -836,6 +884,9 @@  stream @var{dirstream} to @var{pos}.  The value @var{pos} must be the
 result of a previous call to @code{telldir} on this particular stream;
 closing and reopening the directory can invalidate values returned by
 @code{telldir}.
+
+Using the @code{seekdir} function is not recommended.  To seek to
+the beginning of the directory stream, use @code{rewinddir}.
 @end deftypefun
 
 
@@ -1007,9 +1058,20 @@  Note that some file systems support file names longer than
 @code{NAME_MAX} bytes (e.g., because they support up to 255 Unicode
 characters), so a buffer size of at least 1024 is recommended.
 
+If the directory has been modified since the first call to
+@code{getdents64} on the directory (opening the descriptor or seeking to
+offset zero), it is possible that the buffer contains entries that have
+been encountered before.  Likewise, it is possible that files that are
+still present are not reported before the end of the directory is
+encountered (and @code{getdents64} returns zero).
+
 This function is specific to Linux.
 @end deftypefun
 
+Systems that support @code{getdents64} support seeking on directory
+streams.  @xref{File Position Primitive}.  However, the only offset that
+works reliably is offset zero, indicating that reading the directory
+should start from the beginning.
 
 @node Working with Directory Trees
 @section Working with Directory Trees
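
Note (illustration only, not part of the patch): the d_reclen text above
says that, in a buffer filled by getdents64, d_reclen is added to the
offset of the current entry to reach the next one.  A minimal C sketch
of that loop follows.  It assumes a Linux system whose glibc declares
getdents64 in <dirent.h> under _GNU_SOURCE (2.30 or later, as far as I
know).  The helper name count_entries and the 4096-byte buffer size are
hypothetical choices; the manual only recommends at least 1024 bytes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the entries of PATH other than "." and
   "..".  Returns -1 on error.  */
static long
count_entries (const char *path)
{
  assert (path != NULL);
  int fd = open (path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (fd < 0)
    return -1;

  /* The union keeps the byte buffer suitably aligned for
     struct dirent64.  */
  union
  {
    struct dirent64 align;
    char bytes[4096];
  } buffer;

  long count = 0;
  for (;;)
    {
      ssize_t n = getdents64 (fd, buffer.bytes, sizeof (buffer.bytes));
      if (n < 0)
        {
          close (fd);
          return -1;
        }
      if (n == 0)
        break;                  /* End of directory.  */

      /* Step through the records using d_reclen only; d_off is not
         used to locate the next record in the buffer.  */
      for (ssize_t off = 0; off < n; )
        {
          struct dirent64 *e = (struct dirent64 *) (buffer.bytes + off);
          if (strcmp (e->d_name, ".") != 0 && strcmp (e->d_name, "..") != 0)
            ++count;
          off += e->d_reclen;
        }
    }
  close (fd);
  return count;
}
```

If the directory is modified while this loop runs, the count may include
duplicates or miss entries, as the getdents64 text above describes.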