Fix readdir_r with long file names

Message ID 56D54DAD.1040306@gmail.com
State New, archived
Headers

Commit Message

Michael Kerrisk (man-pages) March 1, 2016, 8:07 a.m. UTC
  Hello Florian,

On 08/15/2013 09:52 AM, Florian Weimer wrote:
> On 08/13/2013 06:00 AM, Siddhesh Poyarekar wrote:
>>> +In POSIX.1-2008, @code{readdir} is not thread-safe.  In @theglibc{}
>>> +implementation, it is safe to call @code{readdir} concurrently on
>>> +different @var{dirstream}s (but multiple threads accessing the same
>>> +@var{dirstream} result in undefined behavior).  @code{readdir_r} is a
>>
>> Minor nit - you could get rid of the round brackets and simply write
>> the line as:
>>
>> In @theglibc{} implementation, it is safe to call @code{readdir}
>> concurrently on different @var{dirstream}s, but multiple threads
>> accessing the same @var{dirstream} result in undefined behavior.
> 
> Applied, thanks.
> 
>>> +@item
>>> +POSIX-1.2008 does not guarantee that @code{readdir} is thread-safe,
>>> +even when access to the same @var{dirstream} is serialized.  But in
>>> +current implementations (including @theglibc{}), it is safe to call
>>> +@code{readdir} concurrently on different @var{dirstream}s, so there is
>>> +no requirement to use @code{readdir_r} even in multi-threaded
>>> +programs.
>>> +
>>
>> This seems to gloss over the fact that one would need synchronization
>> (or readdir_r) if readdir is called concurrently on the same
>> dirstream.  It seems like a bad idea to do this at all, but we
>> probably should add a note about it anyway.
> 
> I ended up with this:
> 
> +@item
> +POSIX-1.2008 does not guarantee that @code{readdir} is thread-safe,
> +even when access to the same @var{dirstream} is serialized.  But in
> +current implementations (including @theglibc{}), it is safe to call
> +@code{readdir} concurrently on different @var{dirstream}s, so there is
> +no need to use @code{readdir_r} in most multi-threaded programs.  In
> +the rare case that multiple threads need to read from the same
> +@var{dirstream}, it is still better to use @code{readdir} and external
> +synchronization.
> 
> I'm attaching my current version.  It compiles, but tests are still running.
> 
> I'll update the NEWS file with the bug number on the final commit.

I see that glibc 2.23 deprecates readdir_r(), which prompted me to catch
up on this thread. I'd like to see the points you make documented in the
readdir_r(3) man page also. Would you be willing to allow that text to
be reused / reworked for the page, under that page's existing "verbatim"
license (https://www.kernel.org/doc/man-pages/licenses.html#verbatim)?

The text I'd propose to add to the man page would be (new material
starting at ===>):

  readdir_r()
       The readdir_r() function is a reentrant version  of  readdir().
       It  reads  the  next  directory entry from the directory stream
       dirp, and returns it in the caller-allocated buffer pointed  to
       by entry.  A pointer to the returned item is placed in *result;
       if the end of the directory stream was encountered,  then  NULL
       is instead returned in *result.

       Since  POSIX.1  does  not specify the size of the d_name field,
       and other nonstandard fields may precede that field within  the
       dirent  structure,  portable  applications that use readdir_r()
       could allocate the buffer whose address is passed in  entry  as
       follows:

           name_max = pathconf(dirpath, _PC_NAME_MAX);
           if (name_max == -1)         /* Limit not defined, or error */
               name_max = 255;         /* Take a guess */
           len = offsetof(struct dirent, d_name) + name_max + 1;
           entryp = malloc(len);

       (POSIX.1  requires  that  d_name  is the last field in a struct
       dirent.)
  

Comments

Florian Weimer March 1, 2016, 4:59 p.m. UTC | #1
On 03/01/2016 09:07 AM, Michael Kerrisk (man-pages) wrote:

> I see that glibc 2.23 deprecates readdir_r(), which prompted me to catch
> up on this thread. I'd like to see the points you make documented in the
> readdir_r(3) man page also. Would you be willing to allow that text to
> be reused / reworked for the page, under that page's existing "verbatim"
> license (https://www.kernel.org/doc/man-pages/licenses.html#verbatim)?

Hi Michael,

thanks for keeping an eye on deprecations.  The deprecation happened for
glibc 2.24 (unrelased).

I'm happy to report that I may grant your request.

> The text I'd propose to add to the man page would be (new material
> starting at ===>):

It may make sense to move this documentation to a separate manual page,
specific to readdir_r.  This will keep the readdir documentation nice
and crisp.  Most programmers will never have to consult all these details.

You should remove the example using pathconf because it is not correct.
 The kernel does not return valid values for _PC_NAME_MAX and some file
systems (such as CIFS, and CD-ROMs with Joliet extensions once a kernel
bug is fixed).  The CIFS limit is somewhere around 765, and not 255 as
reported by the kernel.  If I recall correctly, Windows SMB servers can
actually exceed the 255 byte limit.  The reason is that Windows NTFS has
a limit based on 16-bit UCS-2 characters, and after UTF-8 conversion,
the maximum length is more than 255 bytes.

> ===>   However, the above approach has problems, and it is recommended
>        that  applications  use readdir() instead of readdir_r().  Fur‐
>        thermore, since version  2.23,  glibc  deprecates  readdir_r().
>        The reasons are as follows:
> 
>        *  On  systems where NAME_MAX is undefined, calling readdir_r()
>           may be unsafe because the interface does not allow the call‐
>           er to specify the length of the buffer used for the returned
>           directory entry.
> 
>        *  On some systems, readdir_r() can't  read  directory  entries
>           with very long names.  When the glibc implementation encoun‐
>           ters such a name, readdir_r() fails with the error ENAMETOO‐
>           LONG after the final directory entry has been read.  On some
>           other systems, readdir_r() may return a success status,  but
>           the  returned d_name field may not be null terminated or may
>           be truncated.
> 
>        *  In the current POSIX.1 specification  (POSIX.1-2008),  read‐
>           dir_r() is not required to be thread-safe.  However, in mod‐
>           ern implementations (including  the  glibc  implementation),
>           concurrent  calls  to  readdir_r()  that  specify  different
>           directory streams are thread-safe.  Therefore,  the  use  of

These two references to readdir_r should be to readdir instead.

I believe there was a historic implementation which implemented
fdopendir (fd) as (DIR *) fd, and used a global static buffer for
readdir.  This is about the only way readdir can be non-thread-safe.

>           readdir_r()  is  generally unnecessary in multithreaded pro‐
>           grams.  In cases where multiple threads must read  from  the
>           same  directory  stream,  using readdir() with external syn‐
>           chronization is still preferable to the use of  readdir_r(),
>           for the reasons given in the points above.
> 
>        *  It  is  expected  that a future version of POSIX.1 will make
>           readdir_r() obsolete, and require that readdir() be  thread-
>           safe  when  concurrently  employed  on  different  directory
>           streams.

Okay.

Thanks,
Florian
  
Michael Kerrisk \(man-pages\) March 1, 2016, 8:14 p.m. UTC | #2
Hi Florian,

On 03/01/2016 05:59 PM, Florian Weimer wrote:
> On 03/01/2016 09:07 AM, Michael Kerrisk (man-pages) wrote:
> 
>> I see that glibc 2.23 deprecates readdir_r(), which prompted me to catch
>> up on this thread. I'd like to see the points you make documented in the
>> readdir_r(3) man page also. Would you be willing to allow that text to
>> be reused / reworked for the page, under that page's existing "verbatim"
>> license (https://www.kernel.org/doc/man-pages/licenses.html#verbatim)?
> 
> Hi Michael,
> 
> thanks for keeping an eye on deprecations.  The deprecation happened for
> glibc 2.24 (unrelased).

Ah yes, I was getting ahead of myself. Fixed that in the page text below.

> I'm happy to report that I may grant your request.

Thanks!

>> The text I'd propose to add to the man page would be (new material
>> starting at ===>):
> 
> It may make sense to move this documentation to a separate manual page,
> specific to readdir_r.  This will keep the readdir documentation nice
> and crisp.  Most programmers will never have to consult all these details.

Yes, seems reasonable. Done.

> You should remove the example using pathconf because it is not correct.

Done.

>  The kernel does not return valid values for _PC_NAME_MAX and some file
> systems (such as CIFS, and CD-ROMs with Joliet extensions once a kernel
> bug is fixed).  The CIFS limit is somewhere around 765, and not 255 as
> reported by the kernel.  If I recall correctly, Windows SMB servers can
> actually exceed the 255 byte limit.  The reason is that Windows NTFS has
> a limit based on 16-bit UCS-2 characters, and after UTF-8 conversion,
> the maximum length is more than 255 bytes.

What happens with readdir() when it gets a filename that is larger 
than 255 characters?
> 
>> ===>   However, the above approach has problems, and it is recommended
>>        that  applications  use readdir() instead of readdir_r().  Fur‐
>>        thermore, since version  2.23,  glibc  deprecates  readdir_r().

s/23/24/

>>        The reasons are as follows:
>>
>>        *  On  systems where NAME_MAX is undefined, calling readdir_r()
>>           may be unsafe because the interface does not allow the call‐
>>           er to specify the length of the buffer used for the returned
>>           directory entry.
>>
>>        *  On some systems, readdir_r() can't  read  directory  entries
>>           with very long names.  When the glibc implementation encoun‐
>>           ters such a name, readdir_r() fails with the error ENAMETOO‐
>>           LONG after the final directory entry has been read.  On some
>>           other systems, readdir_r() may return a success status,  but
>>           the  returned d_name field may not be null terminated or may
>>           be truncated.
>>
>>        *  In the current POSIX.1 specification  (POSIX.1-2008),  read‐
>>           dir_r() is not required to be thread-safe.  However, in mod‐
>>           ern implementations (including  the  glibc  implementation),
>>           concurrent  calls  to  readdir_r()  that  specify  different
>>           directory streams are thread-safe.  Therefore,  the  use  of
> 
> These two references to readdir_r should be to readdir instead.

Fixed.

> 
> I believe there was a historic implementation which implemented
> fdopendir (fd) as (DIR *) fd, and used a global static buffer for
> readdir.  This is about the only way readdir can be non-thread-safe.
> 
>>           readdir_r()  is  generally unnecessary in multithreaded pro‐
>>           grams.  In cases where multiple threads must read  from  the
>>           same  directory  stream,  using readdir() with external syn‐
>>           chronization is still preferable to the use of  readdir_r(),
>>           for the reasons given in the points above.
>>
>>        *  It  is  expected  that a future version of POSIX.1 will make
>>           readdir_r() obsolete, and require that readdir() be  thread-
>>           safe  when  concurrently  employed  on  different  directory
>>           streams.

Thanks for all of the feedback Florian! The current versions of the
readdir(3) and readdir_r(3) have been pushed to the repo.

Cheers,

Michael
  
Florian Weimer March 1, 2016, 8:27 p.m. UTC | #3
On 03/01/2016 09:14 PM, Michael Kerrisk (man-pages) wrote:

> What happens with readdir() when it gets a filename that is larger 
> than 255 characters?

Good question.  Ugh.

readdir will return a pointer to a struct dirent whose d_name member
will not be null-terminated, but the memory following the struct dirent
object will contain the rest of the name, and will eventually be
null-terminated.

This will work perfectly fine if you don't copy struct dirent objects
using memcpy, and the compiler does not optimize things too much.  We
should implement compiler support for this wart: inhibit optimizations
(I think there are already special cases for length-0 and length-1
arrays at the end, so it's not totally without precedent), and warn
about sizeof (struct dirent) and using it as a (non-pointer) declarator.
 The second part is likely generally useful for structs whose size is
not intended to be part of the ABI.

Florian
  
Michael Kerrisk \(man-pages\) March 1, 2016, 9:01 p.m. UTC | #4
On 03/01/2016 09:27 PM, Florian Weimer wrote:
> On 03/01/2016 09:14 PM, Michael Kerrisk (man-pages) wrote:
> 
>> What happens with readdir() when it gets a filename that is larger 
>> than 255 characters?
> 
> Good question.  Ugh.
> 
> readdir will return a pointer to a struct dirent whose d_name member
> will not be null-terminated, but the memory following the struct dirent
> object will contain the rest of the name, and will eventually be
> null-terminated.

So, in other words, if the caller users a declaration of the form

    struct dirent d;

(rather than say allocating a large buffer dynamically), then we have 
a buffer overrun?

Cheers,

Michael

> This will work perfectly fine if you don't copy struct dirent objects
> using memcpy, and the compiler does not optimize things too much.  We
> should implement compiler support for this wart: inhibit optimizations
> (I think there are already special cases for length-0 and length-1
> arrays at the end, so it's not totally without precedent), and warn
> about sizeof (struct dirent) and using it as a (non-pointer) declarator.
>  The second part is likely generally useful for structs whose size is
> not intended to be part of the ABI.
> 
> Florian
>
  
Paul Eggert March 1, 2016, 9:20 p.m. UTC | #5
On 03/01/2016 12:27 PM, Florian Weimer wrote:
> We
> should implement compiler support for this wart: inhibit optimizations
> (I think there are already special cases for length-0 and length-1
> arrays at the end, so it's not totally without precedent), and warn
> about sizeof (struct dirent) and using it as a (non-pointer) declarator.

Why not use a flexible array member for this? Sure, that assumes C99, 
but flexible array members are pretty much universally supported now 
(and we can fall back on the current layout for pre-C99 compilers). This 
would work better with modern compilers that treat small arrays with 
more respect than traditional C compilers did. And as I understand 
things, it would conform to POSIX (and if I'm wrong, POSIX should get 
fixed....).

For what it's worth, portable code cannot copy struct dirent values 
anyway, as this loses file names in operating systems like Solaris where 
d_name has size 1.
  
Florian Weimer March 1, 2016, 10:16 p.m. UTC | #6
On 03/01/2016 10:20 PM, Paul Eggert wrote:
> On 03/01/2016 12:27 PM, Florian Weimer wrote:
>> We
>> should implement compiler support for this wart: inhibit optimizations
>> (I think there are already special cases for length-0 and length-1
>> arrays at the end, so it's not totally without precedent), and warn
>> about sizeof (struct dirent) and using it as a (non-pointer) declarator.
> 
> Why not use a flexible array member for this?

For which part, and how exactly?

You can't put a flexible array member into a transparent union.

> Sure, that assumes C99,
> but flexible array members are pretty much universally supported now
> (and we can fall back on the current layout for pre-C99 compilers). This
> would work better with modern compilers that treat small arrays with
> more respect than traditional C compilers did. And as I understand
> things, it would conform to POSIX (and if I'm wrong, POSIX should get
> fixed....).

If you mean to add some zero-width padding member at the end of the
struct, after the d_name member, then I'm worried that makes overrunning
the d_name array member even more undefined than it already is. :)

> For what it's worth, portable code cannot copy struct dirent values
> anyway, as this loses file names in operating systems like Solaris where
> d_name has size 1.

Interesting.  I feel slightly better now about having this overrunning
d_name member.

Florian
  
Florian Weimer March 1, 2016, 10:21 p.m. UTC | #7
On 03/01/2016 10:01 PM, Michael Kerrisk (man-pages) wrote:
> On 03/01/2016 09:27 PM, Florian Weimer wrote:
>> On 03/01/2016 09:14 PM, Michael Kerrisk (man-pages) wrote:
>>
>>> What happens with readdir() when it gets a filename that is larger 
>>> than 255 characters?
>>
>> Good question.  Ugh.
>>
>> readdir will return a pointer to a struct dirent whose d_name member
>> will not be null-terminated, but the memory following the struct dirent
>> object will contain the rest of the name, and will eventually be
>> null-terminated.
> 
> So, in other words, if the caller users a declaration of the form
> 
>     struct dirent d;
> 
> (rather than say allocating a large buffer dynamically), then we have 
> a buffer overrun?

readdir gives you only a struct dirent * to an internal buffer.  If you do

  struct dirent *e = readdir (dir);
  memcpy (&d, e, sizeof (d));

you can end up with a truncated name.  According to Paul's comment, this
kind of truncation is very visible on Solaris.

Florian
  
Rich Felker March 1, 2016, 10:27 p.m. UTC | #8
On Tue, Mar 01, 2016 at 11:21:11PM +0100, Florian Weimer wrote:
> On 03/01/2016 10:01 PM, Michael Kerrisk (man-pages) wrote:
> > On 03/01/2016 09:27 PM, Florian Weimer wrote:
> >> On 03/01/2016 09:14 PM, Michael Kerrisk (man-pages) wrote:
> >>
> >>> What happens with readdir() when it gets a filename that is larger 
> >>> than 255 characters?
> >>
> >> Good question.  Ugh.
> >>
> >> readdir will return a pointer to a struct dirent whose d_name member
> >> will not be null-terminated, but the memory following the struct dirent
> >> object will contain the rest of the name, and will eventually be
> >> null-terminated.
> > 
> > So, in other words, if the caller users a declaration of the form
> > 
> >     struct dirent d;
> > 
> > (rather than say allocating a large buffer dynamically), then we have 
> > a buffer overrun?
> 
> readdir gives you only a struct dirent * to an internal buffer.  If you do
> 
>   struct dirent *e = readdir (dir);
>   memcpy (&d, e, sizeof (d));
> 
> you can end up with a truncated name.  According to Paul's comment, this
> kind of truncation is very visible on Solaris.

POSIX also cautions you that this is a permitted definition. See:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html

It's covered under the description and rationale.

Rich
  
Michael Kerrisk \(man-pages\) March 2, 2016, 8:17 a.m. UTC | #9
On 03/01/2016 11:21 PM, Florian Weimer wrote:
> On 03/01/2016 10:01 PM, Michael Kerrisk (man-pages) wrote:
>> On 03/01/2016 09:27 PM, Florian Weimer wrote:
>>> On 03/01/2016 09:14 PM, Michael Kerrisk (man-pages) wrote:
>>>
>>>> What happens with readdir() when it gets a filename that is larger 
>>>> than 255 characters?
>>>
>>> Good question.  Ugh.
>>>
>>> readdir will return a pointer to a struct dirent whose d_name member
>>> will not be null-terminated, but the memory following the struct dirent
>>> object will contain the rest of the name, and will eventually be
>>> null-terminated.
>>
>> So, in other words, if the caller users a declaration of the form
>>
>>     struct dirent d;
>>
>> (rather than say allocating a large buffer dynamically), then we have 
>> a buffer overrun?
> 
> readdir gives you only a struct dirent * to an internal buffer.  

D'oh! Yes, of course. I wasn't thinking clearly as I wrote that last night.

> If you do
> 
>   struct dirent *e = readdir (dir);
>   memcpy (&d, e, sizeof (d));
> 
> you can end up with a truncated name.  

Got it.

> According to Paul's comment, this
> kind of truncation is very visible on Solaris.

Cheers,

Michael
  

Patch

===>   However, the above approach has problems, and it is recommended
       that  applications  use readdir() instead of readdir_r().  Fur‐
       thermore, since version  2.23,  glibc  deprecates  readdir_r().
       The reasons are as follows:

       *  On  systems where NAME_MAX is undefined, calling readdir_r()
          may be unsafe because the interface does not allow the call‐
          er to specify the length of the buffer used for the returned
          directory entry.

       *  On some systems, readdir_r() can't  read  directory  entries
          with very long names.  When the glibc implementation encoun‐
          ters such a name, readdir_r() fails with the error ENAMETOO‐
          LONG after the final directory entry has been read.  On some
          other systems, readdir_r() may return a success status,  but
          the  returned d_name field may not be null terminated or may
          be truncated.

       *  In the current POSIX.1 specification  (POSIX.1-2008),  read‐
          dir_r() is not required to be thread-safe.  However, in mod‐
          ern implementations (including  the  glibc  implementation),
          concurrent  calls  to  readdir_r()  that  specify  different
          directory streams are thread-safe.  Therefore,  the  use  of
          readdir_r()  is  generally unnecessary in multithreaded pro‐
          grams.  In cases where multiple threads must read  from  the
          same  directory  stream,  using readdir() with external syn‐
          chronization is still preferable to the use of  readdir_r(),
          for the reasons given in the points above.

       *  It  is  expected  that a future version of POSIX.1 will make
          readdir_r() obsolete, and require that readdir() be  thread-
          safe  when  concurrently  employed  on  different  directory
          streams.

Cheers,

Michael

===================================
2013-08-15  Florian Weimer  <fweimer@redhat.com>

	[BZ #14699]
	CVE-2013-4237
	* sysdeps/posix/dirstream.h (struct __dirstream): Add errcode
	member.
	* sysdeps/posix/opendir.c (__alloc_dir): Initialize errcode
	member.
	* sysdeps/posix/rewinddir.c (rewinddir): Reset errcode member.
	* sysdeps/posix/readdir_r.c (__READDIR_R): Enforce NAME_MAX limit.
	Return delayed error code.  Remove GETDENTS_64BIT_ALIGNED
	conditional.
	* sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c: Do not define
	GETDENTS_64BIT_ALIGNED.
	* sysdeps/unix/sysv/linux/i386/readdir64_r.c: Likewise.
	* manual/filesys.texi (Reading/Closing Directory): Document
	ENAMETOOLONG return value of readdir_r.  Recommend readdir more
	strongly.
	* manual/conf.texi (Limits for Files): Add portability note to
	NAME_MAX, PATH_MAX.
	(Pathconf): Add portability note for _PC_NAME_MAX, _PC_PATH_MAX.

diff --git a/manual/conf.texi b/manual/conf.texi
index 7eb8b36..c720063 100644
--- a/manual/conf.texi
+++ b/manual/conf.texi
@@ -1149,6 +1149,9 @@  typed ahead as input.  @xref{I/O Queues}.
 @deftypevr Macro int NAME_MAX
 The uniform system limit (if any) for the length of a file name component, not
 including the terminating null character.
+
+@strong{Portability Note:} On some systems, @theglibc{} defines
+@code{NAME_MAX}, but does not actually enforce this limit.
 @end deftypevr
 
 @comment limits.h
@@ -1157,6 +1160,9 @@  including the terminating null character.
 The uniform system limit (if any) for the length of an entire file name (that
 is, the argument given to system calls such as @code{open}), including the
 terminating null character.
+
+@strong{Portability Note:} @Theglibc{} does not enforce this limit
+even if @code{PATH_MAX} is defined.
 @end deftypevr
 
 @cindex limits, pipe buffer size
@@ -1476,6 +1482,9 @@  Inquire about the value of @code{POSIX_REC_MIN_XFER_SIZE}.
 Inquire about the value of @code{POSIX_REC_XFER_ALIGN}.
 @end table
 
+@strong{Portability Note:} On some systems, @theglibc{} does not
+enforce @code{_PC_NAME_MAX} or @code{_PC_PATH_MAX} limits.
+
 @node Utility Limits
 @section Utility Program Capacity Limits
 
diff --git a/manual/filesys.texi b/manual/filesys.texi
index 1df9cf2..814c210 100644
--- a/manual/filesys.texi
+++ b/manual/filesys.texi
@@ -444,9 +444,9 @@  symbols are declared in the header file @file{dirent.h}.
 @comment POSIX.1
 @deftypefun {struct dirent *} readdir (DIR *@var{dirstream})
 This function reads the next entry from the directory.  It normally
-returns a pointer to a structure containing information about the file.
-This structure is statically allocated and can be rewritten by a
-subsequent call.
+returns a pointer to a structure containing information about the
+file.  This structure is associated with the @var{dirstream} handle
+and can be rewritten by a subsequent call.
 
 @strong{Portability Note:} On some systems @code{readdir} may not
 return entries for @file{.} and @file{..}, even though these are always
@@ -461,19 +461,61 @@  conditions are defined for this function:
 The @var{dirstream} argument is not valid.
 @end table
 
-@code{readdir} is not thread safe.  Multiple threads using
-@code{readdir} on the same @var{dirstream} may overwrite the return
-value.  Use @code{readdir_r} when this is critical.
+To distinguish between an end-of-directory condition or an error, you
+must set @code{errno} to zero before calling @code{readdir}.  To avoid
+entering an infinite loop, you should stop reading from the directory
+after the first error.
+
+In POSIX.1-2008, @code{readdir} is not thread-safe.  In @theglibc{}
+implementation, it is safe to call @code{readdir} concurrently on
+different @var{dirstream}s, but multiple threads accessing the same
+@var{dirstream} result in undefined behavior.  @code{readdir_r} is a
+fully thread-safe alternative, but suffers from poor portability (see
+below).  It is recommended that you use @code{readdir}, with external
+locking if multiple threads access the same @var{dirstream}.
 @end deftypefun
 
 @comment dirent.h
 @comment GNU
 @deftypefun int readdir_r (DIR *@var{dirstream}, struct dirent *@var{entry}, struct dirent **@var{result})
-This function is the reentrant version of @code{readdir}.  Like
-@code{readdir} it returns the next entry from the directory.  But to
-prevent conflicts between simultaneously running threads the result is
-not stored in statically allocated memory.  Instead the argument
-@var{entry} points to a place to store the result.
+This function is a version of @code{readdir} which performs internal
+locking.  Like @code{readdir} it returns the next entry from the
+directory.  To prevent conflicts between simultaneously running
+threads the result is stored inside the @var{entry} object.
+
+@strong{Portability Note:} It is recommended to use @code{readdir}
+instead of @code{readdir_r} for the following reasons:
+
+@itemize @bullet
+@item
+On systems which do not define @code{NAME_MAX}, it may not be possible
+to use @code{readdir_r} safely because the caller does not specify the
+length of the buffer for the directory entry.
+
+@item
+On some systems, @code{readdir_r} cannot read directory entries with
+very long names.  If such a name is encountered, @theglibc{}
+implementation of @code{readdir_r} returns with an error code of
+@code{ENAMETOOLONG} after the final directory entry has been read.  On
+other systems, @code{readdir_r} may return successfully, but the
+@code{d_name} member may not be NUL-terminated or may be truncated.
+
+@item
+POSIX-1.2008 does not guarantee that @code{readdir} is thread-safe,
+even when access to the same @var{dirstream} is serialized.  But in
+current implementations (including @theglibc{}), it is safe to call
+@code{readdir} concurrently on different @var{dirstream}s, so there is
+no need to use @code{readdir_r} in most multi-threaded programs.  In
+the rare case that multiple threads need to read from the same
+@var{dirstream}, it is still better to use @code{readdir} and external
+synchronization.
+
+@item
+It is expected that future versions of POSIX will obsolete
+@code{readdir_r} and mandate the level of thread safety for
+@code{readdir} which is provided by @theglibc{} and other
+implementations today.
+@end itemize
 
 Normally @code{readdir_r} returns zero and sets @code{*@var{result}}
 to @var{entry}.  If there are no more entries in the directory or an
@@ -481,15 +523,6 @@  error is detected, @code{readdir_r} sets @code{*@var{result}} to a
 null pointer and returns a nonzero error code, also stored in
 @code{errno}, as described for @code{readdir}.
 
-@strong{Portability Note:} On some systems @code{readdir_r} may not
-return a NUL terminated string for the file name, even when there is no
-@code{d_reclen} field in @code{struct dirent} and the file
-name is the maximum allowed size.  Modern systems all have the
-@code{d_reclen} field, and on old systems multi-threading is not
-critical.  In any case there is no such problem with the @code{readdir}
-function, so that even on systems without the @code{d_reclen} member one
-could use multiple threads by using external locking.
-
 It is also important to look at the definition of the @code{struct
 dirent} type.  Simply passing a pointer to an object of this type for
 the second parameter of @code{readdir_r} might not be enough.  Some
diff --git a/sysdeps/posix/dirstream.h b/sysdeps/posix/dirstream.h
index a7a074d..8e8570d 100644
--- a/sysdeps/posix/dirstream.h
+++ b/sysdeps/posix/dirstream.h
@@ -39,6 +39,8 @@  struct __dirstream
 
     off_t filepos;		/* Position of next entry to read.  */
 
+    int errcode;		/* Delayed error code.  */
+
     /* Directory block.  */
     char data[0] __attribute__ ((aligned (__alignof__ (void*))));
   };
diff --git a/sysdeps/posix/opendir.c b/sysdeps/posix/opendir.c
index ddfc3a7..fc05b0f 100644
--- a/sysdeps/posix/opendir.c
+++ b/sysdeps/posix/opendir.c
@@ -231,6 +231,7 @@  __alloc_dir (int fd, bool close_fd, int flags, const struct stat64 *statp)
   dirp->size = 0;
   dirp->offset = 0;
   dirp->filepos = 0;
+  dirp->errcode = 0;
 
   return dirp;
 }
diff --git a/sysdeps/posix/readdir_r.c b/sysdeps/posix/readdir_r.c
index b5a8e2e..8ed5c3f 100644
--- a/sysdeps/posix/readdir_r.c
+++ b/sysdeps/posix/readdir_r.c
@@ -40,6 +40,7 @@  __READDIR_R (DIR *dirp, DIRENT_TYPE *entry, DIRENT_TYPE **result)
   DIRENT_TYPE *dp;
   size_t reclen;
   const int saved_errno = errno;
+  int ret;
 
   __libc_lock_lock (dirp->lock);
 
@@ -70,10 +71,10 @@  __READDIR_R (DIR *dirp, DIRENT_TYPE *entry, DIRENT_TYPE **result)
 		  bytes = 0;
 		  __set_errno (saved_errno);
 		}
+	      if (bytes < 0)
+		dirp->errcode = errno;
 
 	      dp = NULL;
-	      /* Reclen != 0 signals that an error occurred.  */
-	      reclen = bytes != 0;
 	      break;
 	    }
 	  dirp->size = (size_t) bytes;
@@ -106,29 +107,46 @@  __READDIR_R (DIR *dirp, DIRENT_TYPE *entry, DIRENT_TYPE **result)
       dirp->filepos += reclen;
 #endif
 
-      /* Skip deleted files.  */
+#ifdef NAME_MAX
+      if (reclen > offsetof (DIRENT_TYPE, d_name) + NAME_MAX + 1)
+	{
+	  /* The record is very long.  It could still fit into the
+	     caller-supplied buffer if we can skip padding at the
+	     end.  */
+	  size_t namelen = _D_EXACT_NAMLEN (dp);
+	  if (namelen <= NAME_MAX)
+	    reclen = offsetof (DIRENT_TYPE, d_name) + namelen + 1;
+	  else
+	    {
+	      /* The name is too long.  Ignore this file.  */
+	      dirp->errcode = ENAMETOOLONG;
+	      dp->d_ino = 0;
+	      continue;
+	    }
+	}
+#endif
+
+      /* Skip deleted and ignored files.  */
     }
   while (dp->d_ino == 0);
 
   if (dp != NULL)
     {
-#ifdef GETDENTS_64BIT_ALIGNED
-      /* The d_reclen value might include padding which is not part of
-	 the DIRENT_TYPE data structure.  */
-      reclen = MIN (reclen,
-		    offsetof (DIRENT_TYPE, d_name) + sizeof (dp->d_name));
-#endif
       *result = memcpy (entry, dp, reclen);
-#ifdef GETDENTS_64BIT_ALIGNED
+#ifdef _DIRENT_HAVE_D_RECLEN
       entry->d_reclen = reclen;
 #endif
+      ret = 0;
     }
   else
-    *result = NULL;
+    {
+      *result = NULL;
+      ret = dirp->errcode;
+    }
 
   __libc_lock_unlock (dirp->lock);
 
-  return dp != NULL ? 0 : reclen ? errno : 0;
+  return ret;
 }
 
 #ifdef __READDIR_R_ALIAS
diff --git a/sysdeps/posix/rewinddir.c b/sysdeps/posix/rewinddir.c
index 2935a8e..d4991ad 100644
--- a/sysdeps/posix/rewinddir.c
+++ b/sysdeps/posix/rewinddir.c
@@ -33,6 +33,7 @@  rewinddir (dirp)
   dirp->filepos = 0;
   dirp->offset = 0;
   dirp->size = 0;
+  dirp->errcode = 0;
 #ifndef NOT_IN_libc
   __libc_lock_unlock (dirp->lock);
 #endif
diff --git a/sysdeps/unix/sysv/linux/i386/readdir64_r.c b/sysdeps/unix/sysv/linux/i386/readdir64_r.c
index 8ebbcfd..a7d114e 100644
--- a/sysdeps/unix/sysv/linux/i386/readdir64_r.c
+++ b/sysdeps/unix/sysv/linux/i386/readdir64_r.c
@@ -18,7 +18,6 @@ 
 #define __READDIR_R __readdir64_r
 #define __GETDENTS __getdents64
 #define DIRENT_TYPE struct dirent64
-#define GETDENTS_64BIT_ALIGNED 1
 
 #include <sysdeps/posix/readdir_r.c>
 
diff --git a/sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c b/sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c
index 5ed8e95..290f2c8 100644
--- a/sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c
+++ b/sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c
@@ -1,5 +1,4 @@ 
 #define readdir64_r __no_readdir64_r_decl
-#define GETDENTS_64BIT_ALIGNED 1
 #include <sysdeps/posix/readdir_r.c>
 #undef readdir64_r
 weak_alias (__readdir_r, readdir64_r)