[09/16] gdb/python: change escaping rules when setting arguments

Message ID db6a34d69f5c9ed07f650f90eb58c9719df06971.1704809585.git.aburgess@redhat.com
State New
Headers
Series Inferior argument (inc for remote targets) changes |

Commit Message

Andrew Burgess Jan. 9, 2024, 2:26 p.m. UTC
  It is possible to set an inferior's arguments through the Python API
by assigning to the gdb.Inferior.arguments attribute.

This attribute can be assigned a string, in which case the string is
taken verbatim as the inferior's argument string.  Or this attribute
can be assigned a sequence, in which case the members of the sequence
are combined (with some escaping applied) to create the inferior's
argument string.

Currently, the when the arguments from a Python list are escaped, we
use escape_shell_characters.  I suspect the reasons for this are
mostly accidental.

When the gdb.Inferior.arguments attribute was introduced in commit:

  commit 3153113252f3b949a159439a17e88af8ff0dce30
  Date:   Mon May 1 13:53:59 2023 -0600

      Add attributes and methods to gdb.Inferior

GDB's inferior::set_args method called construct_inferior_arguments,
and construct_inferior_arguments didn't take an escaping function as a
parameter, the only option was escape_shell_characters as that was the
escaping hard-coded into construct_inferior_arguments.  The commit
message makes no comments for or against escaping of special shell
characters, and no tests were added that checked this behaviour.  All
of this leads me to think that the handling of special shell
characters wasn't really considered (an understandable oversight).

But I'd like to consider it, and I think the current behaviour is not
ideal.

Consider this case:

  (gdb) python gdb.selected_inferior().arguments = ['$VAR']
  (gdb) show args
  Argument list to give program being debugged when it is started is "\$VAR".

This means that when the inferior is run it will see literal '$VAR' as
its argument.  If instead, the user wants to pass the shell expanded
value of $VAR to the inferior, there's no way to achieve this result
using the list assignment method.

In this commit I propose that we change this behaviour so that we
instead see this:

  (gdb) python gdb.selected_inferior().arguments = ['$VAR']
  (gdb) show args
  Argument list to give program being debugged when it is started is "$VAR".

Now the '$' character is not escaped.  If the inferior is started
under a shell then the user will see the shell expanded value of
'$VAR'.

Of course, if the user wants to pass a literal '$VAR' (with no
expansion) then they can do:

  (gdb) python gdb.selected_inferior().arguments = ['\$VAR']

This actually feels more natural to me, the user writes the argument
as they would present it to a shell.

So, after this commit, GDB only escapes quote characters and white
space characters.  This keeps some level of backward compatibility
with the existing behaviour (for things other than shell special
characters), but also seems natural, from the user's point of view,
the arguments they are providing are already quoted (by Python's
string quotes) so there's no need to quote white space.  It's only
when GDB converts the Python sequence into a single string that the
white space actually needs quoting.

There are tests for the updated functionality, and I've updated the
docs and added a NEWS entry.
---
 gdb/NEWS                                 |  4 +++
 gdb/doc/python.texi                      |  7 +++--
 gdb/python/py-inferior.c                 |  2 +-
 gdb/testsuite/gdb.python/py-inferior.exp | 36 ++++++++++++++++++++----
 gdbsupport/common-inferior.cc            | 14 +++++++++
 gdbsupport/common-inferior.h             |  5 ++++
 6 files changed, 60 insertions(+), 8 deletions(-)
  

Comments

Eli Zaretskii Jan. 9, 2024, 4:30 p.m. UTC | #1
> From: Andrew Burgess <aburgess@redhat.com>
> Cc: Andrew Burgess <aburgess@redhat.com>, Michael Weghorn <m.weghorn@posteo.de>
> Date: Tue,  9 Jan 2024 14:26:32 +0000
> 
> It is possible to set an inferior's arguments through the Python API
> by assigning to the gdb.Inferior.arguments attribute.
> 
> This attribute can be assigned a string, in which case the string is
> taken verbatim as the inferior's argument string.  Or this attribute
> can be assigned a sequence, in which case the members of the sequence
> are combined (with some escaping applied) to create the inferior's
> argument string.
> 
> Currently, the when the arguments from a Python list are escaped, we
> use escape_shell_characters.  I suspect the reasons for this are
> mostly accidental.
> 
> When the gdb.Inferior.arguments attribute was introduced in commit:
> 
>   commit 3153113252f3b949a159439a17e88af8ff0dce30
>   Date:   Mon May 1 13:53:59 2023 -0600
> 
>       Add attributes and methods to gdb.Inferior
> 
> GDB's inferior::set_args method called construct_inferior_arguments,
> and construct_inferior_arguments didn't take an escaping function as a
> parameter, the only option was escape_shell_characters as that was the
> escaping hard-coded into construct_inferior_arguments.  The commit
> message makes no comments for or against escaping of special shell
> characters, and no tests were added that checked this behaviour.  All
> of this leads me to think that the handling of special shell
> characters wasn't really considered (an understandable oversight).
> 
> But I'd like to consider it, and I think the current behaviour is not
> ideal.
> 
> Consider this case:
> 
>   (gdb) python gdb.selected_inferior().arguments = ['$VAR']
>   (gdb) show args
>   Argument list to give program being debugged when it is started is "\$VAR".
> 
> This means that when the inferior is run it will see literal '$VAR' as
> its argument.  If instead, the user wants to pass the shell expanded
> value of $VAR to the inferior, there's no way to achieve this result
> using the list assignment method.
> 
> In this commit I propose that we change this behaviour so that we
> instead see this:
> 
>   (gdb) python gdb.selected_inferior().arguments = ['$VAR']
>   (gdb) show args
>   Argument list to give program being debugged when it is started is "$VAR".
> 
> Now the '$' character is not escaped.  If the inferior is started
> under a shell then the user will see the shell expanded value of
> '$VAR'.
> 
> Of course, if the user wants to pass a literal '$VAR' (with no
> expansion) then they can do:
> 
>   (gdb) python gdb.selected_inferior().arguments = ['\$VAR']
> 
> This actually feels more natural to me, the user writes the argument
> as they would present it to a shell.
> 
> So, after this commit, GDB only escapes quote characters and white
> space characters.  This keeps some level of backward compatibility
> with the existing behaviour (for things other than shell special
> characters), but also seems natural, from the user's point of view,
> the arguments they are providing are already quoted (by Python's
> string quotes) so there's no need to quote white space.  It's only
> when GDB converts the Python sequence into a single string that the
> white space actually needs quoting.
> 
> There are tests for the updated functionality, and I've updated the
> docs and added a NEWS entry.
> ---
>  gdb/NEWS                                 |  4 +++
>  gdb/doc/python.texi                      |  7 +++--
>  gdb/python/py-inferior.c                 |  2 +-
>  gdb/testsuite/gdb.python/py-inferior.exp | 36 ++++++++++++++++++++----
>  gdbsupport/common-inferior.cc            | 14 +++++++++
>  gdbsupport/common-inferior.h             |  5 ++++
>  6 files changed, 60 insertions(+), 8 deletions(-)

The documentation parts are okay, thanks.

Reviewed-By: Eli Zaretskii <eliz@gnu.org>
  

Patch

diff --git a/gdb/NEWS b/gdb/NEWS
index 11cd6c0663e..b72ba3d87e8 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -87,6 +87,10 @@  show remote thread-options-packet
   ** New function gdb.interrupt(), that interrupts GDB as if the user
      typed control-c.
 
+  ** When assigning a sequence to gdb.Inferior.arguments, only quote
+     and whitespace characters will be escaped.  Everything else will
+     be left unmodified.
+
 * Debugger Adapter Protocol changes
 
   ** GDB now emits the "process" event.
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index d74defeec0c..e04e79cafa5 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -3580,8 +3580,11 @@ 
 
 Either a string or a sequence of strings can be assigned to this
 attribute.  When a string is assigned, it is assumed to have any
-necessary quoting for the shell; when a sequence is assigned, the
-quoting is applied by @value{GDBN}.
+necessary quoting for the shell; when a sequence is assigned, quoting
+is applied by @value{GDBN} so that the individual strings can be
+concatenated into a single string, with a single space between each
+argument.  This means that shell quote characters and whitespace
+characters will be escaped.
 @end defvar
 
 A @code{gdb.Inferior} object has the following methods:
diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
index 8641d8a068b..5b7c7fb9365 100644
--- a/gdb/python/py-inferior.c
+++ b/gdb/python/py-inferior.c
@@ -919,7 +919,7 @@  infpy_set_args (PyObject *self, PyObject *value, void *closure)
       for (const auto &arg : args)
 	argvec.push_back (arg.get ());
       gdb::array_view<char * const> view (argvec.data (), argvec.size ());
-      inf->inferior->set_args (view, escape_shell_characters);
+      inf->inferior->set_args (view, escape_quotes_and_white_space);
     }
   else
     {
diff --git a/gdb/testsuite/gdb.python/py-inferior.exp b/gdb/testsuite/gdb.python/py-inferior.exp
index 6fbcdd6822f..108c85c0165 100644
--- a/gdb/testsuite/gdb.python/py-inferior.exp
+++ b/gdb/testsuite/gdb.python/py-inferior.exp
@@ -410,11 +410,37 @@  gdb_test "show args" \
     [string_to_regexp "Argument list to give program being debugged when it is started is \"a b c\"."] \
     "show args from string"
 
-gdb_test_no_output "python gdb.selected_inferior().arguments = \['a', 'b c'\]" \
-    "set arguments from list"
-gdb_test "show args" \
-    [string_to_regexp "Argument list to give program being debugged when it is started is \"a b\\ c\"."] \
-    "show args from list"
+# Test setting inferior arguments from a Python list.  INPUT is a
+# single string that contains the Python list, this is inserted into
+# the argument setting command, and should include the surrouning
+# square brackets.
+#
+# The OUTPUT is the string that describes the arguments as GDB will
+# have stored them within the inferior, as seen in the 'show args'
+# command output.  OUTPUT should include the surrounding quotes.
+# OUTPUT will be passed through string_to_regexp, so should be a plain
+# string, not a regexp.
+proc test_setting_arguments_from_list { input output } {
+    with_test_prefix "input: ${input}" {
+	gdb_test_no_output "python gdb.selected_inferior().arguments = ${input}" \
+	    "set arguments from list"
+	gdb_test "show args" \
+	    [string_to_regexp \
+		 "Argument list to give program being debugged when it is started is ${output}."] \
+	    "show args from list"
+    }
+}
+
+# Test setting inferior arguments from a list.  Try to hit all the
+# potentially problematic cases.  Notice that shell characters are not
+# automatically quoted, if a user wants a shell character quoted then
+# they must do that themselves.
+test_setting_arguments_from_list "\['a', 'b c'\]" "\"a b\\ c\""
+test_setting_arguments_from_list "\[' ', '\\t', '\\n']" "\"\\  \\\t '\r\n'\""
+test_setting_arguments_from_list "\['', '']" "\"'' ''\""
+test_setting_arguments_from_list "\['\"']" "\"\\\"\""
+test_setting_arguments_from_list "\[\"'\"]" "\"\\\'\""
+test_setting_arguments_from_list "\[\"\$VAR\", \";\"]" "\"\$VAR ;\""
 
 gdb_test_no_output "python gdb.selected_inferior().clear_env()" \
     "clear environment"
diff --git a/gdbsupport/common-inferior.cc b/gdbsupport/common-inferior.cc
index 6717f7d5c08..cf2cd9a090a 100644
--- a/gdbsupport/common-inferior.cc
+++ b/gdbsupport/common-inferior.cc
@@ -133,3 +133,17 @@  escape_shell_characters (const char *arg)
 
   return escape_characters (arg, special);
 }
+
+/* See common-inferior.h.  */
+
+std::string
+escape_quotes_and_white_space (const char * arg)
+{
+#ifdef __MINGW32__
+  static const char special[] = "\" \t\n";
+#else
+  static const char special[] = "\"' \t\n";
+#endif
+
+  return escape_characters (arg, special);
+}
diff --git a/gdbsupport/common-inferior.h b/gdbsupport/common-inferior.h
index 92d1954c3fe..7cc01fb2f28 100644
--- a/gdbsupport/common-inferior.h
+++ b/gdbsupport/common-inferior.h
@@ -65,6 +65,11 @@  using escape_args_func = std::string (*) (const char *arg);
 /* Return a version of ARG that has special shell characters escaped.  */
 extern std::string escape_shell_characters (const char *arg);
 
+/* Return a version of ARG that has quote characters and white space
+   characters escaped.  No other special shell characters will have been
+   escaped though.  */
+extern std::string escape_quotes_and_white_space (const char *arg);
+
 /* Pass each element of ARGS through ESCAPE_FUNC and combine the results
    into a single string, separating each element with a single space
    character.  */