posix: if glob has a trailing slash match directories only.

Message ID 20171204042021.GA3045@madrid
State New, archived
Headers

Commit Message

Dmitry Goncharov Dec. 4, 2017, 4:20 a.m. UTC
  On Wed, Nov 29, 2017 at 03:01:46PM -0800, Paul Eggert wrote:
> That would be a better approach. Another possibility might be to change 
> the last assignment to:
> 
>      flags |= GLOB_ONLYDIR | GLOB_MARK;
> 
> and then at the end, filter out all matches that aren't marked with 
> trailing '/'. This would avoid creating a new GLOB_XXX option and would 
> probably be easier to implement.
> 
Please have a look at this implementation of your idea.

regards, Dmitry
  

Comments

Paul Eggert Dec. 4, 2017, 8:01 a.m. UTC | #1
Dmitry Goncharov wrote:
>>       flags |= GLOB_ONLYDIR | GLOB_MARK;
>>
>> and then at the end, filter out all matches that aren't marked with
>> trailing '/'. This would avoid creating a new GLOB_XXX option and would
>> probably be easier to implement.
>>
> Please have a look at this implementation of your idea.

I'm not quite following how it's an implementation, since I don't see where it 
does anything like "flags |= GLOB_ONLYDIR | GLOB_MARK;". Maybe there's another 
part of the patch you're missing?

The variable "filter" is a boolean and should be of type bool.

That comment and code look over-complicated. Can't you simply copy nonnull 
entries in-place, in a single pass? That way, the filtered order will be the 
same as the original order.
  
Dmitry Goncharov Dec. 4, 2017, 3:46 p.m. UTC | #2
On Mon, Dec 4, 2017 at 3:01 AM, Paul Eggert <eggert@cs.ucla.edu> wrote:
> Dmitry Goncharov wrote:
>>>
>>>       flags |= GLOB_ONLYDIR | GLOB_MARK;
>>>
>>> and then at the end, filter out all matches that aren't marked with
>>> trailing '/'. This would avoid creating a new GLOB_XXX option and would
>>> probably be easier to implement.
>>>
>> Please have a look at this implementation of your idea.
>
>
> I'm not quite following how it's an implementation, since I don't see where
> it does anything like "flags |= GLOB_ONLYDIR | GLOB_MARK;". Maybe there's
> another part of the patch you're missing?

glob sets GLOB_MARK when it calls itself recursively when the pattern
has a trailing slash.

>
> The variable "filter" is a boolean and should be of type bool.

sure

>
> That comment and code look over-complicated.

i agree. Will remove the comment.

> Can't you simply copy nonnull entries in-place, in a single pass?

The first pass calls is_dir and frees and resets files. Do you mean to
also do copying in this very pass?

> That way, the filtered order will be the same as the original order.

The stable algorithm results in more iterations and more copying. Are you sure?

regards, Dmitry
  
Paul Eggert Dec. 4, 2017, 7:32 p.m. UTC | #3
On 12/04/2017 07:46 AM, Dmitry Goncharov wrote:
>> Can't you simply copy nonnull entries in-place, in a single pass?
> The first pass calls is_dir and frees and resets files. Do you mean to
> also do copying in this very pass?

I would think that would be easy, yes.

>
>> That way, the filtered order will be the same as the original order.
> The stable algorithm results in more iterations and more copying. Are you sure?

Sorry, I'm not seeing why an extra iteration would be needed. I expect 
users would prefer that this bug-fix does not alter the order of 
returned values.
  

Patch

diff --git a/ChangeLog b/ChangeLog
index ffeb208132..1b7e3984d2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@ 
+2017-11-28  Dmitry Goncharov  <dgoncharov@users.sf.net>
+
+	[BZ #22513]
+	* posix/glob.c (glob_in_dir): Make pattern with a trailing slash
+	match directores only.
+	* posix/globtest.sh: Add tests.
+	* posix/globtest.c: Print gl_pathv before errors.
+
 2017-11-13  Florian Weimer  <fweimer@redhat.com>
 
 	* support/next_to_fault.h, support/next_to_fault.c: New files.
diff --git a/posix/glob.c b/posix/glob.c
index cb39779d07..10756f1f92 100644
--- a/posix/glob.c
+++ b/posix/glob.c
@@ -1124,8 +1124,11 @@  __glob (const char *pattern, int flags, int (*errfunc) (const char *, int),
     {
       /* Append slashes to directory names.  */
       size_t i;
+      int filter;
+      filter = 0;
 
       for (i = oldcount; i < pglob->gl_pathc + pglob->gl_offs; ++i)
+        {
 	if (is_dir (pglob->gl_pathv[i], flags, pglob))
 	  {
 	    size_t len = strlen (pglob->gl_pathv[i]) + 2;
@@ -1140,6 +1143,77 @@  __glob (const char *pattern, int flags, int (*errfunc) (const char *, int),
 	    strcpy (&new[len - 2], "/");
 	    pglob->gl_pathv[i] = new;
 	  }
+	else if (flags & GLOB_ONLYDIR)
+	  {
+	    /* Either the caller specified GLOB_MARK and GLOB_ONLYDIR or the
+	       caller specified the pattern that has a trailing slash.
+	       Filter out everything, but directories.	*/
+	    free (pglob->gl_pathv[i]);
+	    pglob->gl_pathv[i] = NULL;
+	    filter = 1;
+	  }
+	}
+      if (filter)
+	{
+	  /* After the first pass above gl_pathv has full elements (directories)
+	     and empty elements.
+	     This algorithm partitions gl_pathv to have all full elements in the
+	     beginning followed by all empty elements at the end and then sets
+	     gl_pathc to the number of the full elements.
+
+	     Assume the case of 2 elements offsetted by the caller and 2 more matches
+	     found by prior calls to glob with GLOB_APPEND and 3 more matches found
+	     this time.
+	     offs = 2, oldcount = 4, pathc = 5.
+
+	     Initial position.
+	     gl_pathv       e     f
+	      |             |     |
+	      v             v     v
+	      [ ][ ] [ ][ ][ ][ ][ ]
+	      <----><-------------->
+	       offs	 pathc
+	      <----------->
+		oldcount
+	      From this position e will be advancing to the next empty element
+	      and f will be backtracking to the last full element. At each
+	      iteration contents of e and f will be swapped.
+	   */
+
+	  size_t n;
+	  char **e, **f;
+	  n = pglob->gl_pathc + pglob->gl_offs;
+	  assert (oldcount < n); /* Otherwise filter'd be 0.  */
+	  /* f points at the last element of gl_pathv.	*/
+	  f = pglob->gl_pathv + n - 1;
+	  /* e points at gl_pathv[oldcount].  */
+	  e = pglob->gl_pathv + oldcount;
+	  assert (f >= e);
+	  assert (f >= pglob->gl_pathv); /* Otherwise filter'd be 0.  */
+	  for (;;)
+	    {
+	      assert (f >= pglob->gl_pathv);
+	      assert (e < pglob->gl_pathv + n);
+	      /* Advance e to point at the first empty element.  */
+	      while (*e != NULL && e < f)
+		++e;
+	      if (e >= f)
+		break;
+	      /* Backtrack f to point at the last full element.  */
+	      while (*f == NULL && f > e)
+		--f;
+	      if (e >= f)
+		break;
+	      assert (f > e);
+	      *e = *f;
+	      *f = NULL;
+	    }
+	  /* If there are full elements in gl_pathv then f points at the last
+	     one. Otherwise f points at gl_pathv[oldcount].  */
+	  pglob->gl_pathc = f - pglob->gl_pathv - pglob->gl_offs;
+	  if (pglob->gl_pathc == 0)
+	    retval = GLOB_NOMATCH;
+	}
     }
 
   if (!(flags & GLOB_NOSORT))
diff --git a/posix/globtest.c b/posix/globtest.c
index 878ae33044..0a94550d26 100644
--- a/posix/globtest.c
+++ b/posix/globtest.c
@@ -93,14 +93,6 @@  main (int argc, char *argv[])
     glob_flags |= GLOB_APPEND;
   }
 
-  /* Was there an error? */
-  if (i == GLOB_NOSPACE)
-    puts ("GLOB_NOSPACE");
-  else if (i == GLOB_ABORTED)
-    puts ("GLOB_ABORTED");
-  else if (i == GLOB_NOMATCH)
-    puts ("GLOB_NOMATCH");
-
   /* If we set an offset, fill in the first field.
      (Unless glob() has filled it in already - which is an error) */
   if ((glob_flags & GLOB_DOOFFS) && g.gl_pathv[0] == NULL)
@@ -109,10 +101,25 @@  main (int argc, char *argv[])
   /* Print out the names.  Unless otherwise specified, qoute them.  */
   if (g.gl_pathv)
     {
-      for (i = 0; i < g.gl_offs + g.gl_pathc; ++i)
+      int k;
+      for (k = 0; k < g.gl_offs + g.gl_pathc; ++k)
         printf ("%s%s%s\n", quotes ? "`" : "",
-		g.gl_pathv[i] ? g.gl_pathv[i] : "(null)",
+		g.gl_pathv[k] ? g.gl_pathv[k] : "(null)",
 		quotes ? "'" : "");
+
     }
+
+  /* Was there an error?
+   * Printing the error after printing g.gl_pathv simplifies the test suite,
+   * because when -o is used 'abc' is always printed first regardless of
+   * whether the pattern matches anyting.
+   */
+  if (i == GLOB_NOSPACE)
+    puts ("GLOB_NOSPACE");
+  else if (i == GLOB_ABORTED)
+    puts ("GLOB_ABORTED");
+  else if (i == GLOB_NOMATCH)
+    puts ("GLOB_NOMATCH");
+
   return 0;
 }
diff --git a/posix/globtest.sh b/posix/globtest.sh
index 73f7ae31cc..bc86fcc232 100755
--- a/posix/globtest.sh
+++ b/posix/globtest.sh
@@ -43,13 +43,22 @@  export LC_ALL
 
 # Create the arena
 testdir=${common_objpfx}posix/globtest-dir
+tdir2=${common_objpfx}posix/globtest-dir2
 testout=${common_objpfx}posix/globtest-out
-rm -rf $testdir $testout
+rm -rf $testdir $tdir2 $testout
 mkdir $testdir
+mkdir $tdir2
+mkdir $tdir2/hello1d
+ln -s $tdir2/hello1d $tdir2/hello1ds
+mkdir $tdir2/hello2d
+echo 1 > $tdir2/hello1f
+ln -s $tdir2/hello1f $tdir2/hello1fs
+echo 2 > $tdir2/hello2f
+ln $tdir2/hello2f $tdir2/hello2fl
 
 cleanup() {
     chmod 777 $testdir/noread
-    rm -fr $testdir $testout
+    rm -fr $testdir $tdir2 $testout
 }
 
 trap cleanup 0 HUP INT QUIT TERM
@@ -74,6 +83,30 @@  echo 1_1 > $testdir/dir1/file1_1
 echo 1_2 > $testdir/dir1/file1_2
 ln -fs dir1 $testdir/link1
 
+trailing_slash_test()
+{
+    local dir="$1"
+    local pattern="$2"
+    local expected="$3"
+    failed=0
+    ${test_program_prefix} \
+    ${common_objpfx}posix/globtest -q "$dir" "$pattern" | sort -fd > $testout
+    echo -e "$expected" | $CMP - $testout >> $logfile || failed=1
+    if test $failed -ne 0; then
+      echo "$pattern test failed" >> $logfile
+      result=1
+    fi
+
+    failed=0
+    ${test_program_prefix} \
+    ${common_objpfx}posix/globtest -qo "$dir" "$pattern" | sort -fd > $testout
+    echo -e "abc\n$expected" | $CMP - $testout >> $logfile || failed=1
+    if test $failed -ne 0; then
+      echo "$pattern with dooffs test failed" >> $logfile
+      result=1
+    fi
+}
+
 # Run some tests.
 result=0
 rm -f $logfile
@@ -692,6 +725,22 @@  if test $failed -ne 0; then
   result=1
 fi
 
+# Try multiple patterns (GLOB_APPEND) with offset (GLOB_DOOFFS)
+# The pattern has a trailing slash which eliminates files in the
+# subdirectories.
+failed=0
+${test_program_prefix} \
+${common_objpfx}posix/globtest -o "$testdir" "file1" "*/*/" |
+sort > $testout
+cat <<"EOF" | $CMP - $testout >> $logfile || failed=1
+`abc'
+`file1'
+EOF
+if test $failed -ne 0; then
+  echo "GLOB_APPEND2 with slash test failed" >> $logfile
+  result=1
+fi
+
 # Test NOCHECK with non-existing file in subdir.
 failed=0
 ${test_program_prefix} \
@@ -815,6 +864,93 @@  if test $failed -ne 0; then
   result=1
 fi
 
+# This code tests the algo that filters out non directories when the pattern
+# has a trailing slash.
+# There are at least the following cases
+# - The pattern matches files and directories.
+# - The pattern matches nothing.
+# - The pattern matches only files.
+# - The pattern matches 1 file.
+# - The pattern matches only directories.
+# - The pattern matches 1 directory.
+# Each test is run twice, with and without a trailing slash to demonstrate
+# the differentce that the slash makes.
+# Each test is also run from the target directory by doing chdir and from the
+# current directory by specifying the full path in the pattern.
+# Each test is also run with and without dooffs.
+
+# The pattern matches files and directories.
+pattern='hello*'
+expected="hello1d\nhello1ds\nhello1f\nhello1fs\nhello2d\nhello2f\nhello2fl"
+trailing_slash_test "$tdir2" "$pattern" "$expected"
+
+expected="hello1d/\nhello1ds/\nhello2d/"
+trailing_slash_test "$tdir2" "$pattern/" "$expected"
+
+pattern="$tdir2/hello*"
+expected="$tdir2/hello1d\n$tdir2/hello1ds\n$tdir2/hello1f\n\
+$tdir2/hello1fs\n$tdir2/hello2d\n$tdir2/hello2f\n$tdir2/hello2fl"
+trailing_slash_test . "$pattern" "$expected"
+
+expected="$tdir2/hello1d/\n$tdir2/hello1ds/\n$tdir2/hello2d/"
+trailing_slash_test . "$pattern/" "$expected"
+
+# The pattern matches nothing.
+pattern='nomatch*'
+trailing_slash_test "$tdir2" "$pattern" GLOB_NOMATCH
+trailing_slash_test "$tdir2" "$pattern/" GLOB_NOMATCH
+trailing_slash_test . "$pattern" GLOB_NOMATCH
+trailing_slash_test . "$pattern/" GLOB_NOMATCH
+
+# The pattern matches only files.
+pattern='hello?f*'
+expected="hello1f\nhello1fs\nhello2f\nhello2fl"
+trailing_slash_test "$tdir2" "$pattern" "$expected"
+trailing_slash_test "$tdir2" "$pattern/" GLOB_NOMATCH
+
+pattern="$tdir2/hello?f*"
+expected="$tdir2/hello1f\n$tdir2/hello1fs\n$tdir2/hello2f\n$tdir2/hello2fl"
+trailing_slash_test . "$pattern" "$expected"
+trailing_slash_test . "$pattern/" GLOB_NOMATCH
+
+# The pattern matches only 1 file.
+pattern='hello*fl'
+expected="hello2fl"
+trailing_slash_test "$tdir2" "$pattern" "$expected"
+trailing_slash_test "$tdir2" "$pattern/" GLOB_NOMATCH
+
+pattern="$tdir2/hello*fl"
+expected="$tdir2/hello2fl"
+trailing_slash_test . "$pattern" "$expected"
+trailing_slash_test . "$pattern/" GLOB_NOMATCH
+
+# The pattern matches only directores.
+pattern='hello?d*'
+expected="hello1d\nhello1ds\nhello2d"
+trailing_slash_test "$tdir2" "$pattern" "$expected"
+
+expected="hello1d/\nhello1ds/\nhello2d/"
+trailing_slash_test "$tdir2" "$pattern/" "$expected"
+
+pattern="$tdir2/hello?d*"
+expected="$tdir2/hello1d\n$tdir2/hello1ds\n$tdir2/hello2d"
+trailing_slash_test . "$pattern" "$expected"
+
+expected="$tdir2/hello1d/\n$tdir2/hello1ds/\n$tdir2/hello2d/"
+trailing_slash_test . "$pattern/" "$expected"
+
+# The pattern matches only 1 directory.
+pattern='hello*ds'
+expected="hello1ds"
+trailing_slash_test "$tdir2" "$pattern" "$expected"
+trailing_slash_test "$tdir2" "$pattern/" "$expected/"
+
+pattern="$tdir2/hello*ds"
+expected="$tdir2/hello1ds"
+trailing_slash_test . "$pattern" "$expected"
+trailing_slash_test . "$pattern/" "$expected/"
+
+
 if test $result -eq 0; then
     echo "All OK." > $logfile
 fi