[4/4] gdbsupport: rewrite gdb_argv to use extract_string_maybe_quoted

Message ID 666244da341dd7b2d8621da423868dc02afef2a5.1707409662.git.aburgess@redhat.com
State New
Headers
Series Rewrite gdb_argv using extract_string_maybe_quoted |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed

Commit Message

Andrew Burgess Feb. 8, 2024, 4:28 p.m. UTC
  This commit rewrites the gdb_argv class to make use of the function
extract_string_maybe_quoted.

I've moved shell_extract_string_ctrl from the unittests directory into
the gdbsupport directory to make it more widely available, and this is
used within gdb_argv to split arguments and handle escaping correctly.

The algorithm behind this rewrite is pretty simple, for an input
string we skip white space, and then call extract_string_maybe_quoted
to get the next argument.  We then repeat this process to get each
subsequent argument.

The result is almost identical to the libibery buildargv function
except for one edge case which I think is a bug in buildargv.

The libibery buildargv function, if passed a string that only contains
white space, will return a single empty argument.  I think this is
wrong, and after this rewrite we now return a zero length argument
list.  This edge case is tested in gdb_argv-selftests.c and also in
gdb.python/py-cmd.exp.
---
 gdb/testsuite/gdb.python/py-cmd.exp      |  4 ++
 gdb/unittests/extract-string-selftests.c |  3 --
 gdb/unittests/gdb_argv-selftests.c       |  3 ++
 gdbsupport/buildargv.h                   | 50 ++++++++++++++++--------
 gdbsupport/common-utils.cc               |  5 +++
 gdbsupport/common-utils.h                | 11 ++++++
 6 files changed, 57 insertions(+), 19 deletions(-)
  

Comments

Tom Tromey Feb. 8, 2024, 7:08 p.m. UTC | #1
>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

Andrew> The libibery buildargv function, if passed a string that only contains

typo, 'libiberty'

Andrew> white space, will return a single empty argument.  I think this is
Andrew> wrong, and after this rewrite we now return a zero length argument
Andrew> list.

I wonder if this should also be fixed in libiberty.

That said, I'm also in favor of gdb moving away from using buildargv.
Maybe gdb_argv could even be simplified later -- right now it has to do
some memory allocation stuff just to be compatible with how libiberty
works; but I wonder if we could eventually just switch it to
std::vector<std::string>.

Tom
  
Andrew Burgess Feb. 9, 2024, 4:46 p.m. UTC | #2
Tom Tromey <tom@tromey.com> writes:

>>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:
>
> Andrew> The libibery buildargv function, if passed a string that only contains
>
> typo, 'libiberty'
>
> Andrew> white space, will return a single empty argument.  I think this is
> Andrew> wrong, and after this rewrite we now return a zero length argument
> Andrew> list.
>
> I wonder if this should also be fixed in libiberty.

I already have a patch in this area on the gcc list.  It doesn't include
this fix, but an item on my todo list is to update that patch to include
this fix too.

> That said, I'm also in favor of gdb moving away from using buildargv.

If I simplify extract_string_maybe_quoted as you suggest on patch #2
then I'll not be able to share that here.  I'm going to rework the first
3 patches, and see how this leaves this patch looking.

> Maybe gdb_argv could even be simplified later -- right now it has to do
> some memory allocation stuff just to be compatible with how libiberty
> works; but I wonder if we could eventually just switch it to
> std::vector<std::string>.

Maybe.  One place we use gdb_argv is when we start an inferior without a
shell, in which case we forward the gdb_argv data to the exec call (so
we need a nullptr terminated array of C strings).  But that's just one
of _many_ uses of gdb_argv, so maybe we should specialise that case in
some way, and all the other uses could use the vector of strings.

Thanks,
Andrew
  
Tom Tromey Feb. 9, 2024, 5:22 p.m. UTC | #3
>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

>> That said, I'm also in favor of gdb moving away from using buildargv.

Andrew> If I simplify extract_string_maybe_quoted as you suggest on patch #2
Andrew> then I'll not be able to share that here.  I'm going to rework the first
Andrew> 3 patches, and see how this leaves this patch looking.

If my suggestion causes problems, it's fine to push back... I wouldn't
want a sort of offhand thought to make more work.

Andrew> Maybe.  One place we use gdb_argv is when we start an inferior without a
Andrew> shell, in which case we forward the gdb_argv data to the exec call (so
Andrew> we need a nullptr terminated array of C strings).  But that's just one
Andrew> of _many_ uses of gdb_argv, so maybe we should specialise that case in
Andrew> some way, and all the other uses could use the vector of strings.

For this particular case we could have some method to return a transient
vector of 'const char *' or something along those lines.  To be clear
I'm not suggesting you do anything like this in your series, I was just
pointing out why I think it'd be good to move off of buildargv.

Tom
  

Patch

diff --git a/gdb/testsuite/gdb.python/py-cmd.exp b/gdb/testsuite/gdb.python/py-cmd.exp
index 01af475dded..8b1175d44c8 100644
--- a/gdb/testsuite/gdb.python/py-cmd.exp
+++ b/gdb/testsuite/gdb.python/py-cmd.exp
@@ -146,6 +146,10 @@  gdb_test "python print (gdb.string_to_argv (''))" \
   {\[\]} \
     "string_to_argv ('')"
 
+gdb_test "python print (gdb.string_to_argv (' '))" \
+  {\[\]} \
+    "string_to_argv (' ')"
+
 # Test user-defined python commands.
 gdb_test_multiline "input simple user-defined command" \
   "python" "" \
diff --git a/gdb/unittests/extract-string-selftests.c b/gdb/unittests/extract-string-selftests.c
index 3b9a5bd104c..5fd8426ec22 100644
--- a/gdb/unittests/extract-string-selftests.c
+++ b/gdb/unittests/extract-string-selftests.c
@@ -23,9 +23,6 @@ 
 namespace selftests {
 namespace extract_string {
 
-static extract_string_ctrl shell_extract_string_ctrl
-  (nullptr, "", "\"$`\\", "\n", "", "\n");
-
 struct test_def
 {
   test_def (const char *input,
diff --git a/gdb/unittests/gdb_argv-selftests.c b/gdb/unittests/gdb_argv-selftests.c
index b6a87254c93..b374b9b32ec 100644
--- a/gdb/unittests/gdb_argv-selftests.c
+++ b/gdb/unittests/gdb_argv-selftests.c
@@ -83,6 +83,9 @@  struct test_def
 /* The array of all tests.  */
 
 test_def tests[] = {
+  { "", { } },
+  { "  ", { } },
+  { "''", { "" } },
   { "abc def", {"abc", "def"} },
   { "one two 3", {"one", "two", "3"} },
   { "one two\\ three", {"one", "two three"} },
diff --git a/gdbsupport/buildargv.h b/gdbsupport/buildargv.h
index 8fdd085b386..8e84affed39 100644
--- a/gdbsupport/buildargv.h
+++ b/gdbsupport/buildargv.h
@@ -1,4 +1,4 @@ 
-/* RAII wrapper for buildargv
+/* RAII wrapper to split an argument string into individual arguments.
 
    Copyright (C) 2021-2024 Free Software Foundation, Inc.
 
@@ -21,9 +21,12 @@ 
 #define GDBSUPPORT_BUILDARGV_H
 
 #include "libiberty.h"
+#include "gdbsupport/common-utils.h"
+#include <string>
 
-/* A wrapper for an array of char* that was allocated in the way that
-   'buildargv' does, and should be freed with 'freeargv'.  */
+/* A class for splitting a single char* string into an array of char*
+   arguments, each argument is extracted from the original string.  The
+   resulting array will be freed by this classes destructor.  */
 
 class gdb_argv
 {
@@ -32,15 +35,15 @@  class gdb_argv
   /* A constructor that initializes to NULL.  */
 
   gdb_argv ()
-    : m_argv (NULL)
+    : m_argv (nullptr)
   {
   }
 
-  /* A constructor that calls buildargv on STR.  STR may be NULL, in
-     which case this object is initialized with a NULL array.  */
+  /* A constructor that splits STR into an array of arguments.  STR may be
+     NULL, in which case this object is initialized with a NULL array.  */
 
   explicit gdb_argv (const char *str)
-    : m_argv (NULL)
+    : m_argv (nullptr)
   {
     reset (str);
   }
@@ -74,18 +77,34 @@  class gdb_argv
     freeargv (m_argv);
   }
 
-  /* Call buildargv on STR, storing the result in this object.  Any
-     previous state is freed.  STR may be NULL, in which case this
-     object is reset with a NULL array.  If buildargv fails due to
-     out-of-memory, call malloc_failure.  Therefore, the value is
-     guaranteed to be non-NULL, unless the parameter itself is
-     NULL.  */
+  /* Read arguments from STR by calling extract_string_maybe_quoted.
+     Leading and trailing white space in STR will be ignored.  Any previous
+     argument state is freed.  STR may be nullptr, in which case this
+     object is reset with a nullptr array.  */
 
   void reset (const char *str)
   {
-    char **argv = buildargv (str);
     freeargv (m_argv);
-    m_argv = argv;
+    m_argv = nullptr;
+
+    if (str == nullptr)
+      return;
+
+    std::vector<char *> tmp_argv;
+
+    str = skip_spaces (str);
+    while (*str != '\0')
+      {
+	std::string arg
+	  = extract_string_maybe_quoted (&str, shell_extract_string_ctrl);
+	tmp_argv.emplace_back (xstrdup (arg.c_str ()));
+	str = skip_spaces (str);
+      }
+    tmp_argv.emplace_back (nullptr);
+
+    size_t sz = sizeof (decltype (tmp_argv)::value_type) * tmp_argv.size ();
+    m_argv = (char **) xmalloc (sz);
+    memcpy (m_argv, tmp_argv.data (), sz);
   }
 
   /* Return the underlying array.  */
@@ -197,7 +216,6 @@  class gdb_argv
 private:
 
   /* The wrapped array.  */
-
   char **m_argv;
 };
 
diff --git a/gdbsupport/common-utils.cc b/gdbsupport/common-utils.cc
index 15ee2c43f83..0fb2b6f350a 100644
--- a/gdbsupport/common-utils.cc
+++ b/gdbsupport/common-utils.cc
@@ -168,6 +168,11 @@  savestring (const char *ptr, size_t len)
 
 extract_string_ctrl default_extract_string_ctrl (nullptr, nullptr, nullptr);
 
+/* See common-utils.h.  */
+
+extract_string_ctrl shell_extract_string_ctrl (nullptr, "", "\"$`\\",
+					       "\n", "", "\n");
+
 /* See documentation in common-utils.h.  */
 
 std::string
diff --git a/gdbsupport/common-utils.h b/gdbsupport/common-utils.h
index 60578cd7560..bf603ee04e3 100644
--- a/gdbsupport/common-utils.h
+++ b/gdbsupport/common-utils.h
@@ -203,6 +203,17 @@  struct extract_string_ctrl
 
 extern extract_string_ctrl default_extract_string_ctrl;
 
+/* This control object for use with extract_string_maybe_quoted replicates
+   shell like behaviour.  Every character can be escaped in an unquoted
+   context, nothing can be escaped within single quotes, and a limited set
+   of characters (see POSIX shell spec) can be escaped within a double
+   quoted context.
+
+   An escaped newline will be removed in an unquoted and double quoted
+   context.  */
+
+extern extract_string_ctrl shell_extract_string_ctrl;
+
 /* Extract the next word from ARG.  The next word is defined as either,
    everything up to the next space, or, if the next word starts with either
    a single or double quote, then everything up to the closing quote.  The