glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir

Message ID 570CFF9C.3080707@redhat.com
State Committed
Headers

Commit Message

Florian Weimer April 12, 2016, 2:01 p.m. UTC
  On 04/11/2016 11:19 PM, Roland McGrath wrote:
>> I'm trying to get clarification if getdents on Linux can ever return
>> zero d_ino values.  It's going to be difficult to work this out due to
>> the VFS layer.  My hunch is that inode 0 is more likely to mean  I can't
>> tell you the inode number right now here , and not  please skip this entry .
>
> I don't know where your hunch comes from, but I don't believe it for a
> second.  The d_ino==0 convention has been universal in Unix since before
> Linux existed.  Any filesystem that returns d_ino==0 entries for getdents
> will have those entries completely ignored by userland, so I doubt anyone
> has implemented a filesystem that way.

Non-GNU userland would work, so it could be a file system restricted to 
embedded applications.

>> +An implementation of @code{gl_readdir} should initialize the
>> +@code{struct dirent} object to zero, up to the @code{d_name} member.
>> +The @code{d_ino} member must be set to a non-zero value, otherwise
>> +@code{glob} may ignore the returned directory entry.
>
> This wording is awkward.  It seems to contradict itself, as it first
> implies that d_ino should be zero and then says it must be nonzero.  I'm
> not convinced this "zero the object" advice is the way to go anyway.  If
> there is struct padding in struct dirent, it doesn't matter that it be
> zero.  If it has d_namlen, it (now) doesn't matter what it contains at all.
> Perhaps it's better to say just which fields glob examines and that the
> protocol requires only that those fields be set.
>
> Regardless of what we settle on as best practice, I think we want a little
> code example here that demonstrates it.

Right.  I realized that d_type and d_ino are unconditionally available 
in glibc, so I got rid of the memset and spelled out all relevant 
members explicitly.  I added the mkdirent example function in the new 
version of the patch.

> @code{glob} examines the @code{struct dirent} object only immediately after
> each call to your @code{gl_readdir} callback function and does not store
> the pointer returned.  That pointer is not expected to remain valid after a
> subsequent call to the @code{gl_readdir} or @code{gl_closedir} callback.  A
> common way to implement these callbacks it to reuse the same buffer each
> time the @code{gl_readdir} function is called and free the buffer in the
> @code{gl_closedir} function.

I expanded a bit on that in the new version of the patch, too.  I hope 
the table nesting isn't too awkward.  To me, it looks okay in Info, HTML 
and PDF.

Florian
  

Comments

Florian Weimer April 26, 2016, 2:17 p.m. UTC | #1
On 04/12/2016 04:01 PM, Florian Weimer wrote:
> glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir
>
> Previously, application code had to set up the d_namlen member if
> the target supported it, involving conditional compilation.  After
> this change, glob will use the length of the string in d_name instead
> of d_namlen to determine the file name length.  All glibc targets
> provide the d_type and d_ino members, and setting the as needed for
> gl_readdir is straightforward.
>
> Changing the behavior with regards to d_ino is left to a future
> cleanup.
>
> 2016-04-12  Florian Weimer<fweimer@redhat.com>
>
> 	glob: Simplify and document the interface for the GLOB_ALTDIRFUNC
> 	callback function gl_readdir.
> 	* posix/glob.c (NAMELEN, CONVERT_D_NAMLEN): Remove.
> 	(CONVERT_DIRENT_DIRENT64): Use strcpy instead of memcpy.
> 	(glob_in_dir): Remove len.  Use strdup instead of malloc and
> 	memcpy to copy the name.
> 	* manual/pattern.texi (Calling Glob): Document requirements for
> 	implementations of the gl_readdir callback function.
> 	* manual/examples/mkdirent.c: New example.
> 	* posix/bug-glob2.c (my_readdir): Set d_ino to 1 unconditionally,
> 	per the manual guidance.
> 	* posix/tst-gnuglob.c (my_readdir): Likewise.

Ping?

I'd like to move forward with these cleanups and eventually fix 
CVE-2016-1234.

Thanks,
Florian
  
Roland McGrath April 28, 2016, 9:55 p.m. UTC | #2
Approved
  

Patch

glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir

Previously, application code had to set up the d_namlen member if
the target supported it, involving conditional compilation.  After
this change, glob will use the length of the string in d_name instead
of d_namlen to determine the file name length.  All glibc targets
provide the d_type and d_ino members, and setting the as needed for
gl_readdir is straightforward.

Changing the behavior with regards to d_ino is left to a future
cleanup.

2016-04-12  Florian Weimer  <fweimer@redhat.com>

	glob: Simplify and document the interface for the GLOB_ALTDIRFUNC
	callback function gl_readdir.
	* posix/glob.c (NAMELEN, CONVERT_D_NAMLEN): Remove.
	(CONVERT_DIRENT_DIRENT64): Use strcpy instead of memcpy.
	(glob_in_dir): Remove len.  Use strdup instead of malloc and
	memcpy to copy the name.
	* manual/pattern.texi (Calling Glob): Document requirements for
	implementations of the gl_readdir callback function.
	* manual/examples/mkdirent.c: New example.
	* posix/bug-glob2.c (my_readdir): Set d_ino to 1 unconditionally,
	per the manual guidance.
	* posix/tst-gnuglob.c (my_readdir): Likewise.

diff --git a/manual/examples/mkdirent.c b/manual/examples/mkdirent.c
new file mode 100644
index 0000000..eea794d
--- /dev/null
+++ b/manual/examples/mkdirent.c
@@ -0,0 +1,42 @@ 
+/* Example for creating a struct dirent object for use with glob.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License
+   as published by the Free Software Foundation; either version 2
+   of the License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, if not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <dirent.h>
+#include <errno.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+
+struct dirent *
+mkdirent (const char *name)
+{
+  size_t dirent_size = offsetof (struct dirent, d_name) + 1;
+  size_t name_length = strlen (name);
+  size_t total_size = dirent_size + name_length;
+  if (total_size < dirent_size)
+    {
+      errno = ENOMEM;
+      return NULL;
+    }
+  struct dirent *result = malloc (total_size);
+  if (result == NULL)
+    return NULL;
+  result->d_type = DT_UNKNOWN;
+  result->d_ino = 1;            /* Do not skip this entry.  */
+  memcpy (result->d_name, name, name_length + 1);
+  return result;
+}
diff --git a/manual/pattern.texi b/manual/pattern.texi
index d1b9275..565e7eb 100644
--- a/manual/pattern.texi
+++ b/manual/pattern.texi
@@ -237,7 +237,44 @@  function used to read the contents of a directory.  It is used if the
 @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter.  The type of
 this field is @w{@code{struct dirent *(*) (void *)}}.
 
-This is a GNU extension.
+An implementation of @code{gl_readdir} needs to initialize the following
+members of the @code{struct dirent} object:
+
+@table @code
+@item d_type
+This member should be set to the file type of the entry if it is known.
+Otherwise, the value @code{DT_UNKNOWN} can be used.  The @code{glob}
+function may use the specified file type to avoid callbacks in cases
+where the file type indicates that the data is not required.
+
+@item d_ino
+This member needs to be non-zero, otherwise @code{glob} may skip the
+current entry and call the @code{gl_readdir} callback function again to
+retrieve another entry.
+
+@item d_name
+This member must be set to the name of the entry.  It must be
+null-terminated.
+@end table
+
+The example below shows how to allocate a @code{struct dirent} object
+containing a given name.
+
+@smallexample
+@include mkdirent.c.texi
+@end smallexample
+
+The @code{glob} function reads the @code{struct dirent} members listed
+above and makes a copy of the file name in the @code{d_name} member
+immediately after the @code{gl_readdir} callback function returns.
+Future invocations of any of the callback functions may dealloacte or
+reuse the buffer.  It is the responsibility of the caller of the
+@code{glob} function to allocate and deallocate the buffer, around the
+call to @code{glob} or using the callback functions.  For example, an
+application could allocate the buffer in the @code{gl_readdir} callback
+function, and deallocate it in the @code{gl_closedir} callback function.
+
+The @code{gl_readdir} member is a GNU extension.
 
 @item gl_opendir
 The address of an alternative implementation of the @code{opendir}
diff --git a/posix/bug-glob2.c b/posix/bug-glob2.c
index ddf5ec9..0fdc5d0 100644
--- a/posix/bug-glob2.c
+++ b/posix/bug-glob2.c
@@ -193,7 +193,7 @@  my_readdir (void *gdir)
       return NULL;
     }
 
-  dir->d.d_ino = dir->idx;
+  dir->d.d_ino = 1;		/* glob should not skip this entry.  */
 
 #ifdef _DIRENT_HAVE_D_TYPE
   dir->d.d_type = filesystem[dir->idx].type;
diff --git a/posix/glob.c b/posix/glob.c
index 0c04c3c..9ae76ac 100644
--- a/posix/glob.c
+++ b/posix/glob.c
@@ -57,10 +57,8 @@ 
 
 #if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
 # include <dirent.h>
-# define NAMLEN(dirent) strlen((dirent)->d_name)
 #else
 # define dirent direct
-# define NAMLEN(dirent) (dirent)->d_namlen
 # ifdef HAVE_SYS_NDIR_H
 #  include <sys/ndir.h>
 # endif
@@ -76,12 +74,6 @@ 
 #endif
 
 
-/* In GNU systems, <dirent.h> defines this macro for us.  */
-#ifdef _D_NAMLEN
-# undef NAMLEN
-# define NAMLEN(d) _D_NAMLEN(d)
-#endif
-
 /* When used in the GNU libc the symbol _DIRENT_HAVE_D_TYPE is available
    if the `d_type' member for `struct dirent' is available.
    HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB.  */
@@ -105,12 +97,6 @@ 
 
 /* If the system has the `struct dirent64' type we use it internally.  */
 #if defined _LIBC && !defined COMPILE_GLOB64
-# if defined HAVE_DIRENT_H || defined __GNU_LIBRARY__
-#  define CONVERT_D_NAMLEN(d64, d32)
-# else
-#  define CONVERT_D_NAMLEN(d64, d32) \
-  (d64)->d_namlen = (d32)->d_namlen;
-# endif
 
 # if (defined POSIX || defined WINDOWS32) && !defined __GNU_LIBRARY__
 #  define CONVERT_D_INO(d64, d32)
@@ -127,8 +113,7 @@ 
 # endif
 
 # define CONVERT_DIRENT_DIRENT64(d64, d32) \
-  memcpy ((d64)->d_name, (d32)->d_name, NAMLEN (d32) + 1);		      \
-  CONVERT_D_NAMLEN (d64, d32)						      \
+  strcpy ((d64)->d_name, (d32)->d_name);				      \
   CONVERT_D_INO (d64, d32)						      \
   CONVERT_D_TYPE (d64, d32)
 #endif
@@ -1554,7 +1539,6 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
 	  while (1)
 	    {
 	      const char *name;
-	      size_t len;
 #if defined _LIBC && !defined COMPILE_GLOB64
 	      struct dirent64 *d;
 	      union
@@ -1622,12 +1606,10 @@  glob_in_dir (const char *pattern, const char *directory, int flags,
 			  names = newnames;
 			  cur = 0;
 			}
-		      len = NAMLEN (d);
-		      names->name[cur] = (char *) malloc (len + 1);
+		      names->name[cur] = strdup (d->d_name);
 		      if (names->name[cur] == NULL)
 			goto memory_error;
-		      *((char *) mempcpy (names->name[cur++], name, len))
-			= '\0';
+		      ++cur;
 		      ++nfound;
 		    }
 		}
diff --git a/posix/tst-gnuglob.c b/posix/tst-gnuglob.c
index 992b997..b7b859b 100644
--- a/posix/tst-gnuglob.c
+++ b/posix/tst-gnuglob.c
@@ -211,7 +211,7 @@  my_readdir (void *gdir)
       return NULL;
     }
 
-  dir->d.d_ino = dir->idx;
+  dir->d.d_ino = 1;		/* glob should not skip this entry.  */
 
 #ifdef _DIRENT_HAVE_D_TYPE
   dir->d.d_type = filesystem[dir->idx].type;