[09/16] gdb/python: change escaping rules when setting arguments
Commit Message
It is possible to set an inferior's arguments through the Python API
by assigning to the gdb.Inferior.arguments attribute.
This attribute can be assigned a string, in which case the string is
taken verbatim as the inferior's argument string. Or this attribute
can be assigned a sequence, in which case the members of the sequence
are combined (with some escaping applied) to create the inferior's
argument string.
Currently, the when the arguments from a Python list are escaped, we
use escape_shell_characters. I suspect the reasons for this are
mostly accidental.
When the gdb.Inferior.arguments attribute was introduced in commit:
commit 3153113252f3b949a159439a17e88af8ff0dce30
Date: Mon May 1 13:53:59 2023 -0600
Add attributes and methods to gdb.Inferior
GDB's inferior::set_args method called construct_inferior_arguments,
and construct_inferior_arguments didn't take an escaping function as a
parameter, the only option was escape_shell_characters as that was the
escaping hard-coded into construct_inferior_arguments. The commit
message makes no comments for or against escaping of special shell
characters, and no tests were added that checked this behaviour. All
of this leads me to think that the handling of special shell
characters wasn't really considered (an understandable oversight).
But I'd like to consider it, and I think the current behaviour is not
ideal.
Consider this case:
(gdb) python gdb.selected_inferior().arguments = ['$VAR']
(gdb) show args
Argument list to give program being debugged when it is started is "\$VAR".
This means that when the inferior is run it will see literal '$VAR' as
its argument. If instead, the user wants to pass the shell expanded
value of $VAR to the inferior, there's no way to achieve this result
using the list assignment method.
In this commit I propose that we change this behaviour so that we
instead see this:
(gdb) python gdb.selected_inferior().arguments = ['$VAR']
(gdb) show args
Argument list to give program being debugged when it is started is "$VAR".
Now the '$' character is not escaped. If the inferior is started
under a shell then the user will see the shell expanded value of
'$VAR'.
Of course, if the user wants to pass a literal '$VAR' (with no
expansion) then they can do:
(gdb) python gdb.selected_inferior().arguments = ['\$VAR']
This actually feels more natural to me, the user writes the argument
as they would present it to a shell.
So, after this commit, GDB only escapes quote characters and white
space characters. This keeps some level of backward compatibility
with the existing behaviour (for things other than shell special
characters), but also seems natural, from the user's point of view,
the arguments they are providing are already quoted (by Python's
string quotes) so there's no need to quote white space. It's only
when GDB converts the Python sequence into a single string that the
white space actually needs quoting.
There are tests for the updated functionality, and I've updated the
docs and added a NEWS entry.
---
gdb/NEWS | 4 +++
gdb/doc/python.texi | 7 +++--
gdb/python/py-inferior.c | 2 +-
gdb/testsuite/gdb.python/py-inferior.exp | 36 ++++++++++++++++++++----
gdbsupport/common-inferior.cc | 14 +++++++++
gdbsupport/common-inferior.h | 5 ++++
6 files changed, 60 insertions(+), 8 deletions(-)
Comments
> From: Andrew Burgess <aburgess@redhat.com>
> Cc: Andrew Burgess <aburgess@redhat.com>, Michael Weghorn <m.weghorn@posteo.de>
> Date: Tue, 9 Jan 2024 14:26:32 +0000
>
> It is possible to set an inferior's arguments through the Python API
> by assigning to the gdb.Inferior.arguments attribute.
>
> This attribute can be assigned a string, in which case the string is
> taken verbatim as the inferior's argument string. Or this attribute
> can be assigned a sequence, in which case the members of the sequence
> are combined (with some escaping applied) to create the inferior's
> argument string.
>
> Currently, the when the arguments from a Python list are escaped, we
> use escape_shell_characters. I suspect the reasons for this are
> mostly accidental.
>
> When the gdb.Inferior.arguments attribute was introduced in commit:
>
> commit 3153113252f3b949a159439a17e88af8ff0dce30
> Date: Mon May 1 13:53:59 2023 -0600
>
> Add attributes and methods to gdb.Inferior
>
> GDB's inferior::set_args method called construct_inferior_arguments,
> and construct_inferior_arguments didn't take an escaping function as a
> parameter, the only option was escape_shell_characters as that was the
> escaping hard-coded into construct_inferior_arguments. The commit
> message makes no comments for or against escaping of special shell
> characters, and no tests were added that checked this behaviour. All
> of this leads me to think that the handling of special shell
> characters wasn't really considered (an understandable oversight).
>
> But I'd like to consider it, and I think the current behaviour is not
> ideal.
>
> Consider this case:
>
> (gdb) python gdb.selected_inferior().arguments = ['$VAR']
> (gdb) show args
> Argument list to give program being debugged when it is started is "\$VAR".
>
> This means that when the inferior is run it will see literal '$VAR' as
> its argument. If instead, the user wants to pass the shell expanded
> value of $VAR to the inferior, there's no way to achieve this result
> using the list assignment method.
>
> In this commit I propose that we change this behaviour so that we
> instead see this:
>
> (gdb) python gdb.selected_inferior().arguments = ['$VAR']
> (gdb) show args
> Argument list to give program being debugged when it is started is "$VAR".
>
> Now the '$' character is not escaped. If the inferior is started
> under a shell then the user will see the shell expanded value of
> '$VAR'.
>
> Of course, if the user wants to pass a literal '$VAR' (with no
> expansion) then they can do:
>
> (gdb) python gdb.selected_inferior().arguments = ['\$VAR']
>
> This actually feels more natural to me, the user writes the argument
> as they would present it to a shell.
>
> So, after this commit, GDB only escapes quote characters and white
> space characters. This keeps some level of backward compatibility
> with the existing behaviour (for things other than shell special
> characters), but also seems natural, from the user's point of view,
> the arguments they are providing are already quoted (by Python's
> string quotes) so there's no need to quote white space. It's only
> when GDB converts the Python sequence into a single string that the
> white space actually needs quoting.
>
> There are tests for the updated functionality, and I've updated the
> docs and added a NEWS entry.
> ---
> gdb/NEWS | 4 +++
> gdb/doc/python.texi | 7 +++--
> gdb/python/py-inferior.c | 2 +-
> gdb/testsuite/gdb.python/py-inferior.exp | 36 ++++++++++++++++++++----
> gdbsupport/common-inferior.cc | 14 +++++++++
> gdbsupport/common-inferior.h | 5 ++++
> 6 files changed, 60 insertions(+), 8 deletions(-)
The documentation parts are okay, thanks.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
@@ -87,6 +87,10 @@ show remote thread-options-packet
** New function gdb.interrupt(), that interrupts GDB as if the user
typed control-c.
+ ** When assigning a sequence to gdb.Inferior.arguments, only quote
+ and whitespace characters will be escaped. Everything else will
+ be left unmodified.
+
* Debugger Adapter Protocol changes
** GDB now emits the "process" event.
@@ -3580,8 +3580,11 @@
Either a string or a sequence of strings can be assigned to this
attribute. When a string is assigned, it is assumed to have any
-necessary quoting for the shell; when a sequence is assigned, the
-quoting is applied by @value{GDBN}.
+necessary quoting for the shell; when a sequence is assigned, quoting
+is applied by @value{GDBN} so that the individual strings can be
+concatenated into a single string, with a single space between each
+argument. This means that shell quote characters and whitespace
+characters will be escaped.
@end defvar
A @code{gdb.Inferior} object has the following methods:
@@ -919,7 +919,7 @@ infpy_set_args (PyObject *self, PyObject *value, void *closure)
for (const auto &arg : args)
argvec.push_back (arg.get ());
gdb::array_view<char * const> view (argvec.data (), argvec.size ());
- inf->inferior->set_args (view, escape_shell_characters);
+ inf->inferior->set_args (view, escape_quotes_and_white_space);
}
else
{
@@ -410,11 +410,37 @@ gdb_test "show args" \
[string_to_regexp "Argument list to give program being debugged when it is started is \"a b c\"."] \
"show args from string"
-gdb_test_no_output "python gdb.selected_inferior().arguments = \['a', 'b c'\]" \
- "set arguments from list"
-gdb_test "show args" \
- [string_to_regexp "Argument list to give program being debugged when it is started is \"a b\\ c\"."] \
- "show args from list"
+# Test setting inferior arguments from a Python list. INPUT is a
+# single string that contains the Python list, this is inserted into
+# the argument setting command, and should include the surrouning
+# square brackets.
+#
+# The OUTPUT is the string that describes the arguments as GDB will
+# have stored them within the inferior, as seen in the 'show args'
+# command output. OUTPUT should include the surrounding quotes.
+# OUTPUT will be passed through string_to_regexp, so should be a plain
+# string, not a regexp.
+proc test_setting_arguments_from_list { input output } {
+ with_test_prefix "input: ${input}" {
+ gdb_test_no_output "python gdb.selected_inferior().arguments = ${input}" \
+ "set arguments from list"
+ gdb_test "show args" \
+ [string_to_regexp \
+ "Argument list to give program being debugged when it is started is ${output}."] \
+ "show args from list"
+ }
+}
+
+# Test setting inferior arguments from a list. Try to hit all the
+# potentially problematic cases. Notice that shell characters are not
+# automatically quoted, if a user wants a shell character quoted then
+# they must do that themselves.
+test_setting_arguments_from_list "\['a', 'b c'\]" "\"a b\\ c\""
+test_setting_arguments_from_list "\[' ', '\\t', '\\n']" "\"\\ \\\t '\r\n'\""
+test_setting_arguments_from_list "\['', '']" "\"'' ''\""
+test_setting_arguments_from_list "\['\"']" "\"\\\"\""
+test_setting_arguments_from_list "\[\"'\"]" "\"\\\'\""
+test_setting_arguments_from_list "\[\"\$VAR\", \";\"]" "\"\$VAR ;\""
gdb_test_no_output "python gdb.selected_inferior().clear_env()" \
"clear environment"
@@ -133,3 +133,17 @@ escape_shell_characters (const char *arg)
return escape_characters (arg, special);
}
+
+/* See common-inferior.h. */
+
+std::string
+escape_quotes_and_white_space (const char * arg)
+{
+#ifdef __MINGW32__
+ static const char special[] = "\" \t\n";
+#else
+ static const char special[] = "\"' \t\n";
+#endif
+
+ return escape_characters (arg, special);
+}
@@ -65,6 +65,11 @@ using escape_args_func = std::string (*) (const char *arg);
/* Return a version of ARG that has special shell characters escaped. */
extern std::string escape_shell_characters (const char *arg);
+/* Return a version of ARG that has quote characters and white space
+ characters escaped. No other special shell characters will have been
+ escaped though. */
+extern std::string escape_quotes_and_white_space (const char *arg);
+
/* Pass each element of ARGS through ESCAPE_FUNC and combine the results
into a single string, separating each element with a single space
character. */