From patchwork Tue Apr 12 14:01:00 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 11706 Received: (qmail 87317 invoked by alias); 12 Apr 2016 14:01:18 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 87275 invoked by uid 89); 12 Apr 2016 14:01:17 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.3 required=5.0 tests=BAYES_50, KAM_STOCKGEN, RP_MATCHES_RCVD, SPF_HELO_PASS, T_FILL_THIS_FORM_SHORT autolearn=no version=3.3.2 spammy=direnth, UD:dirent.h, dirent, dirent.h X-HELO: mx1.redhat.com Subject: Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir To: Roland McGrath References: <57068276.9000503@redhat.com> <20160408184422.A74BE2C3B21@topped-with-meat.com> <570BD76D.5070907@redhat.com> <20160411211902.8E34F2C3B22@topped-with-meat.com> Cc: GNU C Library From: Florian Weimer Message-ID: <570CFF9C.3080707@redhat.com> Date: Tue, 12 Apr 2016 16:01:00 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <20160411211902.8E34F2C3B22@topped-with-meat.com> On 04/11/2016 11:19 PM, Roland McGrath wrote: >> I'm trying to get clarification if getdents on Linux can ever return >> zero d_ino values. It's going to be difficult to work this out due to >> the VFS layer. My hunch is that inode 0 is more likely to mean I can't >> tell you the inode number right now here , and not please skip this entry . > > I don't know where your hunch comes from, but I don't believe it for a > second. The d_ino==0 convention has been universal in Unix since before > Linux existed. Any filesystem that returns d_ino==0 entries for getdents > will have those entries completely ignored by userland, so I doubt anyone > has implemented a filesystem that way. Non-GNU userland would work, so it could be a file system restricted to embedded applications. >> +An implementation of @code{gl_readdir} should initialize the >> +@code{struct dirent} object to zero, up to the @code{d_name} member. >> +The @code{d_ino} member must be set to a non-zero value, otherwise >> +@code{glob} may ignore the returned directory entry. > > This wording is awkward. It seems to contradict itself, as it first > implies that d_ino should be zero and then says it must be nonzero. I'm > not convinced this "zero the object" advice is the way to go anyway. If > there is struct padding in struct dirent, it doesn't matter that it be > zero. If it has d_namlen, it (now) doesn't matter what it contains at all. > Perhaps it's better to say just which fields glob examines and that the > protocol requires only that those fields be set. > > Regardless of what we settle on as best practice, I think we want a little > code example here that demonstrates it. Right. I realized that d_type and d_ino are unconditionally available in glibc, so I got rid of the memset and spelled out all relevant members explicitly. I added the mkdirent example function in the new version of the patch. > @code{glob} examines the @code{struct dirent} object only immediately after > each call to your @code{gl_readdir} callback function and does not store > the pointer returned. That pointer is not expected to remain valid after a > subsequent call to the @code{gl_readdir} or @code{gl_closedir} callback. A > common way to implement these callbacks it to reuse the same buffer each > time the @code{gl_readdir} function is called and free the buffer in the > @code{gl_closedir} function. I expanded a bit on that in the new version of the patch, too. I hope the table nesting isn't too awkward. To me, it looks okay in Info, HTML and PDF. Florian glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir Previously, application code had to set up the d_namlen member if the target supported it, involving conditional compilation. After this change, glob will use the length of the string in d_name instead of d_namlen to determine the file name length. All glibc targets provide the d_type and d_ino members, and setting the as needed for gl_readdir is straightforward. Changing the behavior with regards to d_ino is left to a future cleanup. 2016-04-12 Florian Weimer glob: Simplify and document the interface for the GLOB_ALTDIRFUNC callback function gl_readdir. * posix/glob.c (NAMELEN, CONVERT_D_NAMLEN): Remove. (CONVERT_DIRENT_DIRENT64): Use strcpy instead of memcpy. (glob_in_dir): Remove len. Use strdup instead of malloc and memcpy to copy the name. * manual/pattern.texi (Calling Glob): Document requirements for implementations of the gl_readdir callback function. * manual/examples/mkdirent.c: New example. * posix/bug-glob2.c (my_readdir): Set d_ino to 1 unconditionally, per the manual guidance. * posix/tst-gnuglob.c (my_readdir): Likewise. diff --git a/manual/examples/mkdirent.c b/manual/examples/mkdirent.c new file mode 100644 index 0000000..eea794d --- /dev/null +++ b/manual/examples/mkdirent.c @@ -0,0 +1,42 @@ +/* Example for creating a struct dirent object for use with glob. + Copyright (C) 2016 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, if not, see . +*/ + +#include +#include +#include +#include +#include + +struct dirent * +mkdirent (const char *name) +{ + size_t dirent_size = offsetof (struct dirent, d_name) + 1; + size_t name_length = strlen (name); + size_t total_size = dirent_size + name_length; + if (total_size < dirent_size) + { + errno = ENOMEM; + return NULL; + } + struct dirent *result = malloc (total_size); + if (result == NULL) + return NULL; + result->d_type = DT_UNKNOWN; + result->d_ino = 1; /* Do not skip this entry. */ + memcpy (result->d_name, name, name_length + 1); + return result; +} diff --git a/manual/pattern.texi b/manual/pattern.texi index d1b9275..565e7eb 100644 --- a/manual/pattern.texi +++ b/manual/pattern.texi @@ -237,7 +237,44 @@ function used to read the contents of a directory. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of this field is @w{@code{struct dirent *(*) (void *)}}. -This is a GNU extension. +An implementation of @code{gl_readdir} needs to initialize the following +members of the @code{struct dirent} object: + +@table @code +@item d_type +This member should be set to the file type of the entry if it is known. +Otherwise, the value @code{DT_UNKNOWN} can be used. The @code{glob} +function may use the specified file type to avoid callbacks in cases +where the file type indicates that the data is not required. + +@item d_ino +This member needs to be non-zero, otherwise @code{glob} may skip the +current entry and call the @code{gl_readdir} callback function again to +retrieve another entry. + +@item d_name +This member must be set to the name of the entry. It must be +null-terminated. +@end table + +The example below shows how to allocate a @code{struct dirent} object +containing a given name. + +@smallexample +@include mkdirent.c.texi +@end smallexample + +The @code{glob} function reads the @code{struct dirent} members listed +above and makes a copy of the file name in the @code{d_name} member +immediately after the @code{gl_readdir} callback function returns. +Future invocations of any of the callback functions may dealloacte or +reuse the buffer. It is the responsibility of the caller of the +@code{glob} function to allocate and deallocate the buffer, around the +call to @code{glob} or using the callback functions. For example, an +application could allocate the buffer in the @code{gl_readdir} callback +function, and deallocate it in the @code{gl_closedir} callback function. + +The @code{gl_readdir} member is a GNU extension. @item gl_opendir The address of an alternative implementation of the @code{opendir} diff --git a/posix/bug-glob2.c b/posix/bug-glob2.c index ddf5ec9..0fdc5d0 100644 --- a/posix/bug-glob2.c +++ b/posix/bug-glob2.c @@ -193,7 +193,7 @@ my_readdir (void *gdir) return NULL; } - dir->d.d_ino = dir->idx; + dir->d.d_ino = 1; /* glob should not skip this entry. */ #ifdef _DIRENT_HAVE_D_TYPE dir->d.d_type = filesystem[dir->idx].type; diff --git a/posix/glob.c b/posix/glob.c index 0c04c3c..9ae76ac 100644 --- a/posix/glob.c +++ b/posix/glob.c @@ -57,10 +57,8 @@ #if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__ # include -# define NAMLEN(dirent) strlen((dirent)->d_name) #else # define dirent direct -# define NAMLEN(dirent) (dirent)->d_namlen # ifdef HAVE_SYS_NDIR_H # include # endif @@ -76,12 +74,6 @@ #endif -/* In GNU systems, defines this macro for us. */ -#ifdef _D_NAMLEN -# undef NAMLEN -# define NAMLEN(d) _D_NAMLEN(d) -#endif - /* When used in the GNU libc the symbol _DIRENT_HAVE_D_TYPE is available if the `d_type' member for `struct dirent' is available. HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB. */ @@ -105,12 +97,6 @@ /* If the system has the `struct dirent64' type we use it internally. */ #if defined _LIBC && !defined COMPILE_GLOB64 -# if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__ -# define CONVERT_D_NAMLEN(d64, d32) -# else -# define CONVERT_D_NAMLEN(d64, d32) \ - (d64)->d_namlen = (d32)->d_namlen; -# endif # if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__ # define CONVERT_D_INO(d64, d32) @@ -127,8 +113,7 @@ # endif # define CONVERT_DIRENT_DIRENT64(d64, d32) \ - memcpy ((d64)->d_name, (d32)->d_name, NAMLEN (d32) + 1); \ - CONVERT_D_NAMLEN (d64, d32) \ + strcpy ((d64)->d_name, (d32)->d_name); \ CONVERT_D_INO (d64, d32) \ CONVERT_D_TYPE (d64, d32) #endif @@ -1554,7 +1539,6 @@ glob_in_dir (const char *pattern, const char *directory, int flags, while (1) { const char *name; - size_t len; #if defined _LIBC && !defined COMPILE_GLOB64 struct dirent64 *d; union @@ -1622,12 +1606,10 @@ glob_in_dir (const char *pattern, const char *directory, int flags, names = newnames; cur = 0; } - len = NAMLEN (d); - names->name[cur] = (char *) malloc (len + 1); + names->name[cur] = strdup (d->d_name); if (names->name[cur] == NULL) goto memory_error; - *((char *) mempcpy (names->name[cur++], name, len)) - = '\0'; + ++cur; ++nfound; } } diff --git a/posix/tst-gnuglob.c b/posix/tst-gnuglob.c index 992b997..b7b859b 100644 --- a/posix/tst-gnuglob.c +++ b/posix/tst-gnuglob.c @@ -211,7 +211,7 @@ my_readdir (void *gdir) return NULL; } - dir->d.d_ino = dir->idx; + dir->d.d_ino = 1; /* glob should not skip this entry. */ #ifdef _DIRENT_HAVE_D_TYPE dir->d.d_type = filesystem[dir->idx].type;