From patchwork Tue Mar 1 08:07:09 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Michael Kerrisk (man-pages)" X-Patchwork-Id: 11152 Received: (qmail 35938 invoked by alias); 1 Mar 2016 08:07:24 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 35896 invoked by uid 89); 1 Mar 2016 08:07:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.1 required=5.0 tests=BAYES_50, FREEMAIL_FROM, FREEMAIL_REPLYTO, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=no version=3.3.2 spammy=Position, suffers, threadsafe, thread-safe X-HELO: mail-wm0-f47.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:cc:from :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=0AO0pUO5WRzAZUXrug+R2yi1pbtxRWdeUj8Twp5D33Y=; b=fCzLMlBwbCb/j+GvlY/JQJLtknvxMV4Fygh2WfhLjQn3HqNBJgY/az5KnxnuwfWsSD 3IJW1cRwN11Ol+QyNHMswsPoCjQBPBLl/88bBwY3f5hxQm4u8IruN+IqZ/TUpGCTZU88 RBcG4vtZ+Gvyjkqy98yAe61zl7g6jGDuwmZPyZVzR5wmHR1tadfWV0p5SlwL/KqS0zBx gVGcA5quwaBfOnAqNTXIaowo+hbmiyh3Rw0GZT4AqARtGaO7PxUNGf3azyU2sMRusKcs siViJRaKl2nflFMPIhs3UKNCDlG2eBNRxufMmi6qGbEhdS1ZHpEhSrjkjWr/yeVoMybD QZWw== X-Gm-Message-State: AD7BkJJKNobX9VBcyPOdpL04qYTNR7VN7nXmqcjkNjB/jc7CHBzUimmCDUVDxj5OEMA2rg== X-Received: by 10.194.92.68 with SMTP id ck4mr18617206wjb.144.1456819633320; Tue, 01 Mar 2016 00:07:13 -0800 (PST) Reply-To: mtk.manpages@gmail.com Subject: Re: [PATCH] Fix readdir_r with long file names References: <51B0B39F.4060202@redhat.com> <51B0BD36.3030202@redhat.com> <20130607013024.GO29800@brightrain.aerifal.cx> <51B19203.3070307@redhat.com> <20130607144143.GQ29800@brightrain.aerifal.cx> <51B57E35.4080403@redhat.com> <51B65EA7.2020402@redhat.com> <20130611011324.GT29800@brightrain.aerifal.cx> <51B8702D.2060505@redhat.com> <20130813040038.GE21795@spoyarek.pnq.redhat.com> <520C88A6.9070501@redhat.com> To: Florian Weimer , Siddhesh Poyarekar Cc: "Michael Kerrisk (gmail)" , Rich Felker , Carlos O'Donell , KOSAKI Motohiro , libc-alpha , Roland McGrath , linux-man From: "Michael Kerrisk (man-pages)" X-Enigmail-Draft-Status: N1110 Message-ID: <56D54DAD.1040306@gmail.com> Date: Tue, 1 Mar 2016 09:07:09 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 In-Reply-To: <520C88A6.9070501@redhat.com> Hello Florian, On 08/15/2013 09:52 AM, Florian Weimer wrote: > On 08/13/2013 06:00 AM, Siddhesh Poyarekar wrote: >>> +In POSIX.1-2008, @code{readdir} is not thread-safe. In @theglibc{} >>> +implementation, it is safe to call @code{readdir} concurrently on >>> +different @var{dirstream}s (but multiple threads accessing the same >>> +@var{dirstream} result in undefined behavior). @code{readdir_r} is a >> >> Minor nit - you could get rid of the round brackets and simply write >> the line as: >> >> In @theglibc{} implementation, it is safe to call @code{readdir} >> concurrently on different @var{dirstream}s, but multiple threads >> accessing the same @var{dirstream} result in undefined behavior. > > Applied, thanks. > >>> +@item >>> +POSIX-1.2008 does not guarantee that @code{readdir} is thread-safe, >>> +even when access to the same @var{dirstream} is serialized. But in >>> +current implementations (including @theglibc{}), it is safe to call >>> +@code{readdir} concurrently on different @var{dirstream}s, so there is >>> +no requirement to use @code{readdir_r} even in multi-threaded >>> +programs. >>> + >> >> This seems to gloss over the fact that one would need synchronization >> (or readdir_r) if readdir is called concurrently on the same >> dirstream. It seems like a bad idea to do this at all, but we >> probably should add a note about it anyway. > > I ended up with this: > > +@item > +POSIX-1.2008 does not guarantee that @code{readdir} is thread-safe, > +even when access to the same @var{dirstream} is serialized. But in > +current implementations (including @theglibc{}), it is safe to call > +@code{readdir} concurrently on different @var{dirstream}s, so there is > +no need to use @code{readdir_r} in most multi-threaded programs. In > +the rare case that multiple threads need to read from the same > +@var{dirstream}, it is still better to use @code{readdir} and external > +synchronization. > > I'm attaching my current version. It compiles, but tests are still running. > > I'll update the NEWS file with the bug number on the final commit. I see that glibc 2.23 deprecates readdir_r(), which prompted me to catch up on this thread. I'd like to see the points you make documented in the readdir_r(3) man page also. Would you be willing to allow that text to be reused / reworked for the page, under that page's existing "verbatim" license (https://www.kernel.org/doc/man-pages/licenses.html#verbatim)? The text I'd propose to add to the man page would be (new material starting at ===>): readdir_r() The readdir_r() function is a reentrant version of readdir(). It reads the next directory entry from the directory stream dirp, and returns it in the caller-allocated buffer pointed to by entry. A pointer to the returned item is placed in *result; if the end of the directory stream was encountered, then NULL is instead returned in *result. Since POSIX.1 does not specify the size of the d_name field, and other nonstandard fields may precede that field within the dirent structure, portable applications that use readdir_r() could allocate the buffer whose address is passed in entry as follows: name_max = pathconf(dirpath, _PC_NAME_MAX); if (name_max == -1) /* Limit not defined, or error */ name_max = 255; /* Take a guess */ len = offsetof(struct dirent, d_name) + name_max + 1; entryp = malloc(len); (POSIX.1 requires that d_name is the last field in a struct dirent.) ===> However, the above approach has problems, and it is recommended that applications use readdir() instead of readdir_r(). Fur‐ thermore, since version 2.23, glibc deprecates readdir_r(). The reasons are as follows: * On systems where NAME_MAX is undefined, calling readdir_r() may be unsafe because the interface does not allow the call‐ er to specify the length of the buffer used for the returned directory entry. * On some systems, readdir_r() can't read directory entries with very long names. When the glibc implementation encoun‐ ters such a name, readdir_r() fails with the error ENAMETOO‐ LONG after the final directory entry has been read. On some other systems, readdir_r() may return a success status, but the returned d_name field may not be null terminated or may be truncated. * In the current POSIX.1 specification (POSIX.1-2008), read‐ dir_r() is not required to be thread-safe. However, in mod‐ ern implementations (including the glibc implementation), concurrent calls to readdir_r() that specify different directory streams are thread-safe. Therefore, the use of readdir_r() is generally unnecessary in multithreaded pro‐ grams. In cases where multiple threads must read from the same directory stream, using readdir() with external syn‐ chronization is still preferable to the use of readdir_r(), for the reasons given in the points above. * It is expected that a future version of POSIX.1 will make readdir_r() obsolete, and require that readdir() be thread- safe when concurrently employed on different directory streams. Cheers, Michael =================================== 2013-08-15 Florian Weimer [BZ #14699] CVE-2013-4237 * sysdeps/posix/dirstream.h (struct __dirstream): Add errcode member. * sysdeps/posix/opendir.c (__alloc_dir): Initialize errcode member. * sysdeps/posix/rewinddir.c (rewinddir): Reset errcode member. * sysdeps/posix/readdir_r.c (__READDIR_R): Enforce NAME_MAX limit. Return delayed error code. Remove GETDENTS_64BIT_ALIGNED conditional. * sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c: Do not define GETDENTS_64BIT_ALIGNED. * sysdeps/unix/sysv/linux/i386/readdir64_r.c: Likewise. * manual/filesys.texi (Reading/Closing Directory): Document ENAMETOOLONG return value of readdir_r. Recommend readdir more strongly. * manual/conf.texi (Limits for Files): Add portability note to NAME_MAX, PATH_MAX. (Pathconf): Add portability note for _PC_NAME_MAX, _PC_PATH_MAX. diff --git a/manual/conf.texi b/manual/conf.texi index 7eb8b36..c720063 100644 --- a/manual/conf.texi +++ b/manual/conf.texi @@ -1149,6 +1149,9 @@ typed ahead as input. @xref{I/O Queues}. @deftypevr Macro int NAME_MAX The uniform system limit (if any) for the length of a file name component, not including the terminating null character. + +@strong{Portability Note:} On some systems, @theglibc{} defines +@code{NAME_MAX}, but does not actually enforce this limit. @end deftypevr @comment limits.h @@ -1157,6 +1160,9 @@ including the terminating null character. The uniform system limit (if any) for the length of an entire file name (that is, the argument given to system calls such as @code{open}), including the terminating null character. + +@strong{Portability Note:} @Theglibc{} does not enforce this limit +even if @code{PATH_MAX} is defined. @end deftypevr @cindex limits, pipe buffer size @@ -1476,6 +1482,9 @@ Inquire about the value of @code{POSIX_REC_MIN_XFER_SIZE}. Inquire about the value of @code{POSIX_REC_XFER_ALIGN}. @end table +@strong{Portability Note:} On some systems, @theglibc{} does not +enforce @code{_PC_NAME_MAX} or @code{_PC_PATH_MAX} limits. + @node Utility Limits @section Utility Program Capacity Limits diff --git a/manual/filesys.texi b/manual/filesys.texi index 1df9cf2..814c210 100644 --- a/manual/filesys.texi +++ b/manual/filesys.texi @@ -444,9 +444,9 @@ symbols are declared in the header file @file{dirent.h}. @comment POSIX.1 @deftypefun {struct dirent *} readdir (DIR *@var{dirstream}) This function reads the next entry from the directory. It normally -returns a pointer to a structure containing information about the file. -This structure is statically allocated and can be rewritten by a -subsequent call. +returns a pointer to a structure containing information about the +file. This structure is associated with the @var{dirstream} handle +and can be rewritten by a subsequent call. @strong{Portability Note:} On some systems @code{readdir} may not return entries for @file{.} and @file{..}, even though these are always @@ -461,19 +461,61 @@ conditions are defined for this function: The @var{dirstream} argument is not valid. @end table -@code{readdir} is not thread safe. Multiple threads using -@code{readdir} on the same @var{dirstream} may overwrite the return -value. Use @code{readdir_r} when this is critical. +To distinguish between an end-of-directory condition or an error, you +must set @code{errno} to zero before calling @code{readdir}. To avoid +entering an infinite loop, you should stop reading from the directory +after the first error. + +In POSIX.1-2008, @code{readdir} is not thread-safe. In @theglibc{} +implementation, it is safe to call @code{readdir} concurrently on +different @var{dirstream}s, but multiple threads accessing the same +@var{dirstream} result in undefined behavior. @code{readdir_r} is a +fully thread-safe alternative, but suffers from poor portability (see +below). It is recommended that you use @code{readdir}, with external +locking if multiple threads access the same @var{dirstream}. @end deftypefun @comment dirent.h @comment GNU @deftypefun int readdir_r (DIR *@var{dirstream}, struct dirent *@var{entry}, struct dirent **@var{result}) -This function is the reentrant version of @code{readdir}. Like -@code{readdir} it returns the next entry from the directory. But to -prevent conflicts between simultaneously running threads the result is -not stored in statically allocated memory. Instead the argument -@var{entry} points to a place to store the result. +This function is a version of @code{readdir} which performs internal +locking. Like @code{readdir} it returns the next entry from the +directory. To prevent conflicts between simultaneously running +threads the result is stored inside the @var{entry} object. + +@strong{Portability Note:} It is recommended to use @code{readdir} +instead of @code{readdir_r} for the following reasons: + +@itemize @bullet +@item +On systems which do not define @code{NAME_MAX}, it may not be possible +to use @code{readdir_r} safely because the caller does not specify the +length of the buffer for the directory entry. + +@item +On some systems, @code{readdir_r} cannot read directory entries with +very long names. If such a name is encountered, @theglibc{} +implementation of @code{readdir_r} returns with an error code of +@code{ENAMETOOLONG} after the final directory entry has been read. On +other systems, @code{readdir_r} may return successfully, but the +@code{d_name} member may not be NUL-terminated or may be truncated. + +@item +POSIX-1.2008 does not guarantee that @code{readdir} is thread-safe, +even when access to the same @var{dirstream} is serialized. But in +current implementations (including @theglibc{}), it is safe to call +@code{readdir} concurrently on different @var{dirstream}s, so there is +no need to use @code{readdir_r} in most multi-threaded programs. In +the rare case that multiple threads need to read from the same +@var{dirstream}, it is still better to use @code{readdir} and external +synchronization. + +@item +It is expected that future versions of POSIX will obsolete +@code{readdir_r} and mandate the level of thread safety for +@code{readdir} which is provided by @theglibc{} and other +implementations today. +@end itemize Normally @code{readdir_r} returns zero and sets @code{*@var{result}} to @var{entry}. If there are no more entries in the directory or an @@ -481,15 +523,6 @@ error is detected, @code{readdir_r} sets @code{*@var{result}} to a null pointer and returns a nonzero error code, also stored in @code{errno}, as described for @code{readdir}. -@strong{Portability Note:} On some systems @code{readdir_r} may not -return a NUL terminated string for the file name, even when there is no -@code{d_reclen} field in @code{struct dirent} and the file -name is the maximum allowed size. Modern systems all have the -@code{d_reclen} field, and on old systems multi-threading is not -critical. In any case there is no such problem with the @code{readdir} -function, so that even on systems without the @code{d_reclen} member one -could use multiple threads by using external locking. - It is also important to look at the definition of the @code{struct dirent} type. Simply passing a pointer to an object of this type for the second parameter of @code{readdir_r} might not be enough. Some diff --git a/sysdeps/posix/dirstream.h b/sysdeps/posix/dirstream.h index a7a074d..8e8570d 100644 --- a/sysdeps/posix/dirstream.h +++ b/sysdeps/posix/dirstream.h @@ -39,6 +39,8 @@ struct __dirstream off_t filepos; /* Position of next entry to read. */ + int errcode; /* Delayed error code. */ + /* Directory block. */ char data[0] __attribute__ ((aligned (__alignof__ (void*)))); }; diff --git a/sysdeps/posix/opendir.c b/sysdeps/posix/opendir.c index ddfc3a7..fc05b0f 100644 --- a/sysdeps/posix/opendir.c +++ b/sysdeps/posix/opendir.c @@ -231,6 +231,7 @@ __alloc_dir (int fd, bool close_fd, int flags, const struct stat64 *statp) dirp->size = 0; dirp->offset = 0; dirp->filepos = 0; + dirp->errcode = 0; return dirp; } diff --git a/sysdeps/posix/readdir_r.c b/sysdeps/posix/readdir_r.c index b5a8e2e..8ed5c3f 100644 --- a/sysdeps/posix/readdir_r.c +++ b/sysdeps/posix/readdir_r.c @@ -40,6 +40,7 @@ __READDIR_R (DIR *dirp, DIRENT_TYPE *entry, DIRENT_TYPE **result) DIRENT_TYPE *dp; size_t reclen; const int saved_errno = errno; + int ret; __libc_lock_lock (dirp->lock); @@ -70,10 +71,10 @@ __READDIR_R (DIR *dirp, DIRENT_TYPE *entry, DIRENT_TYPE **result) bytes = 0; __set_errno (saved_errno); } + if (bytes < 0) + dirp->errcode = errno; dp = NULL; - /* Reclen != 0 signals that an error occurred. */ - reclen = bytes != 0; break; } dirp->size = (size_t) bytes; @@ -106,29 +107,46 @@ __READDIR_R (DIR *dirp, DIRENT_TYPE *entry, DIRENT_TYPE **result) dirp->filepos += reclen; #endif - /* Skip deleted files. */ +#ifdef NAME_MAX + if (reclen > offsetof (DIRENT_TYPE, d_name) + NAME_MAX + 1) + { + /* The record is very long. It could still fit into the + caller-supplied buffer if we can skip padding at the + end. */ + size_t namelen = _D_EXACT_NAMLEN (dp); + if (namelen <= NAME_MAX) + reclen = offsetof (DIRENT_TYPE, d_name) + namelen + 1; + else + { + /* The name is too long. Ignore this file. */ + dirp->errcode = ENAMETOOLONG; + dp->d_ino = 0; + continue; + } + } +#endif + + /* Skip deleted and ignored files. */ } while (dp->d_ino == 0); if (dp != NULL) { -#ifdef GETDENTS_64BIT_ALIGNED - /* The d_reclen value might include padding which is not part of - the DIRENT_TYPE data structure. */ - reclen = MIN (reclen, - offsetof (DIRENT_TYPE, d_name) + sizeof (dp->d_name)); -#endif *result = memcpy (entry, dp, reclen); -#ifdef GETDENTS_64BIT_ALIGNED +#ifdef _DIRENT_HAVE_D_RECLEN entry->d_reclen = reclen; #endif + ret = 0; } else - *result = NULL; + { + *result = NULL; + ret = dirp->errcode; + } __libc_lock_unlock (dirp->lock); - return dp != NULL ? 0 : reclen ? errno : 0; + return ret; } #ifdef __READDIR_R_ALIAS diff --git a/sysdeps/posix/rewinddir.c b/sysdeps/posix/rewinddir.c index 2935a8e..d4991ad 100644 --- a/sysdeps/posix/rewinddir.c +++ b/sysdeps/posix/rewinddir.c @@ -33,6 +33,7 @@ rewinddir (dirp) dirp->filepos = 0; dirp->offset = 0; dirp->size = 0; + dirp->errcode = 0; #ifndef NOT_IN_libc __libc_lock_unlock (dirp->lock); #endif diff --git a/sysdeps/unix/sysv/linux/i386/readdir64_r.c b/sysdeps/unix/sysv/linux/i386/readdir64_r.c index 8ebbcfd..a7d114e 100644 --- a/sysdeps/unix/sysv/linux/i386/readdir64_r.c +++ b/sysdeps/unix/sysv/linux/i386/readdir64_r.c @@ -18,7 +18,6 @@ #define __READDIR_R __readdir64_r #define __GETDENTS __getdents64 #define DIRENT_TYPE struct dirent64 -#define GETDENTS_64BIT_ALIGNED 1 #include diff --git a/sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c b/sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c index 5ed8e95..290f2c8 100644 --- a/sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c +++ b/sysdeps/unix/sysv/linux/wordsize-64/readdir_r.c @@ -1,5 +1,4 @@ #define readdir64_r __no_readdir64_r_decl -#define GETDENTS_64BIT_ALIGNED 1 #include #undef readdir64_r weak_alias (__readdir_r, readdir64_r)