gdb/python: change escaping rules when setting arguments
Checks
| Context |
Check |
Description |
| linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_gdb_build--master-arm |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_gdb_check--master-arm |
success
|
Test passed
|
| linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 |
success
|
Test passed
|
Commit Message
It is possible to set an inferior's arguments through the Python API
by assigning to the gdb.Inferior.arguments attribute.
This attribute can be assigned a string, in which case the string is
taken verbatim as the inferior's argument string. Or this attribute
can be assigned a sequence, in which case the members of the sequence
are combined (with some escaping applied) to create the inferior's
argument string.
When assigning from a sequence, all special shell characters in each
argument are escaped, I suspect the reasons for this are mostly
accidental. When the gdb.Inferior.arguments attribute was introduced
in commit:
commit 3153113252f3b949a159439a17e88af8ff0dce30
Date: Mon May 1 13:53:59 2023 -0600
Add attributes and methods to gdb.Inferior
GDB's inferior::set_args method called construct_inferior_arguments,
and construct_inferior_arguments only offered one form of escaping,
all special shell characters were escaped.
The commit message for 3153113252f3b949 doesn't discuss quoting at
all, no tests were added that checked this behaviour, and the
documentation is (I think) a little vague on exactly what quoting will
be applied, the text says:
Either a string or a sequence of strings can be assigned to this
attribute. When a string is assigned, it is assumed to have any
necessary quoting for the shell; when a sequence is assigned, the
quoting is applied by GDB.
So clearly quoting is applied when a sequence is used, but what
quoting is left unclear.
I think there is a problem with the way quoting is currently handled.
Consider this case:
(gdb) python gdb.selected_inferior().arguments = ['$VAR']
(gdb) show args
Argument list to give program being debugged when it is started is "\$VAR".
This means that when the inferior is run it will see literal '$VAR' as
its argument. If instead, the user wants to pass the shell expanded
value of $VAR to the inferior, there's no way to achieve this result
using the sequence assignment method. Instead the user has to
manually combine the arguments, and assign a single string.
I think it would be better if GDB applied the minimum escaping
required in order to merge the arguments together while keeping the
sequence elements as separate arguments. Thus, the above example, I
think, should work like this instead:
(gdb) python gdb.selected_inferior().arguments = ['$VAR']
(gdb) show args
Argument list to give program being debugged when it is started is "$VAR".
Now the '$' character is not escaped. If the inferior is started
under a shell then the user will see the shell expanded value of
'$VAR'.
Of course, if the user wants to pass a literal '$VAR' (with no
expansion) then they can do:
(gdb) python gdb.selected_inferior().arguments = ['\$VAR']
(gdb) show args
Argument list to give program being debugged when it is started is "\$VAR".
GDB would still escape some characters, but only those characters that
are important to GDB, quotes and white space. After this patch the
user would see behaviour like this:
(gdb) python gdb.selected_inferior().arguments = ["\"1\"", '\'2\'', '3 4']
(gdb) show args
Argument list to give program being debugged when it is started is "\"1\" \'2\' 3\ 4".
Notice that GDB has quoted the single and double quote characters as
well as the white space in the third argument. This means that the
three sequence elements will be delivered to the inferior as three
separate arguments exactly as the user delivered them to Python.
It is important to understand that this is a Python API breaking
change, and this could impact existing user scripts. My hope is that
this is enough of an edge case that we'll not inconvenience too many
users.
If people think this is too much of a risk, then I do have some ideas
for how we can offer different methods of setting the inferior
arguments from Python that will allow for the new quoting without
changing the existing behaviour; we could add an inferior method, or
add a new write-only attribute. Either of these approaches would work
fine, but (I think) would be less than ideal. I'd much rather just
update the current API.
There are tests for the updated functionality, and I've updated the
docs and added a NEWS entry.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Tested-By: Guinevere Larsen <guinevere@redhat.com>
---
gdb/NEWS | 4 +++
gdb/doc/python.texi | 8 ++++--
gdb/python/py-inferior.c | 2 +-
gdb/testsuite/gdb.python/py-inferior.exp | 36 ++++++++++++++++++++----
4 files changed, 42 insertions(+), 8 deletions(-)
base-commit: 512ca2fca4b6674d51706d0d7fe21ed72f733fb5
Comments
Hi Tom,
I wonder if you have any thoughts on this change? I think you did the
initial work on the Python API for inferior arguments.
Thanks
Andrew
Andrew Burgess <aburgess@redhat.com> writes:
> It is possible to set an inferior's arguments through the Python API
> by assigning to the gdb.Inferior.arguments attribute.
>
> This attribute can be assigned a string, in which case the string is
> taken verbatim as the inferior's argument string. Or this attribute
> can be assigned a sequence, in which case the members of the sequence
> are combined (with some escaping applied) to create the inferior's
> argument string.
>
> When assigning from a sequence, all special shell characters in each
> argument are escaped, I suspect the reasons for this are mostly
> accidental. When the gdb.Inferior.arguments attribute was introduced
> in commit:
>
> commit 3153113252f3b949a159439a17e88af8ff0dce30
> Date: Mon May 1 13:53:59 2023 -0600
>
> Add attributes and methods to gdb.Inferior
>
> GDB's inferior::set_args method called construct_inferior_arguments,
> and construct_inferior_arguments only offered one form of escaping,
> all special shell characters were escaped.
>
> The commit message for 3153113252f3b949 doesn't discuss quoting at
> all, no tests were added that checked this behaviour, and the
> documentation is (I think) a little vague on exactly what quoting will
> be applied, the text says:
>
> Either a string or a sequence of strings can be assigned to this
> attribute. When a string is assigned, it is assumed to have any
> necessary quoting for the shell; when a sequence is assigned, the
> quoting is applied by GDB.
>
> So clearly quoting is applied when a sequence is used, but what
> quoting is left unclear.
>
> I think there is a problem with the way quoting is currently handled.
> Consider this case:
>
> (gdb) python gdb.selected_inferior().arguments = ['$VAR']
> (gdb) show args
> Argument list to give program being debugged when it is started is "\$VAR".
>
> This means that when the inferior is run it will see literal '$VAR' as
> its argument. If instead, the user wants to pass the shell expanded
> value of $VAR to the inferior, there's no way to achieve this result
> using the sequence assignment method. Instead the user has to
> manually combine the arguments, and assign a single string.
>
> I think it would be better if GDB applied the minimum escaping
> required in order to merge the arguments together while keeping the
> sequence elements as separate arguments. Thus, the above example, I
> think, should work like this instead:
>
> (gdb) python gdb.selected_inferior().arguments = ['$VAR']
> (gdb) show args
> Argument list to give program being debugged when it is started is "$VAR".
>
> Now the '$' character is not escaped. If the inferior is started
> under a shell then the user will see the shell expanded value of
> '$VAR'.
>
> Of course, if the user wants to pass a literal '$VAR' (with no
> expansion) then they can do:
>
> (gdb) python gdb.selected_inferior().arguments = ['\$VAR']
> (gdb) show args
> Argument list to give program being debugged when it is started is "\$VAR".
>
> GDB would still escape some characters, but only those characters that
> are important to GDB, quotes and white space. After this patch the
> user would see behaviour like this:
>
> (gdb) python gdb.selected_inferior().arguments = ["\"1\"", '\'2\'', '3 4']
> (gdb) show args
> Argument list to give program being debugged when it is started is "\"1\" \'2\' 3\ 4".
>
> Notice that GDB has quoted the single and double quote characters as
> well as the white space in the third argument. This means that the
> three sequence elements will be delivered to the inferior as three
> separate arguments exactly as the user delivered them to Python.
>
> It is important to understand that this is a Python API breaking
> change, and this could impact existing user scripts. My hope is that
> this is enough of an edge case that we'll not inconvenience too many
> users.
>
> If people think this is too much of a risk, then I do have some ideas
> for how we can offer different methods of setting the inferior
> arguments from Python that will allow for the new quoting without
> changing the existing behaviour; we could add an inferior method, or
> add a new write-only attribute. Either of these approaches would work
> fine, but (I think) would be less than ideal. I'd much rather just
> update the current API.
>
> There are tests for the updated functionality, and I've updated the
> docs and added a NEWS entry.
>
> Reviewed-By: Eli Zaretskii <eliz@gnu.org>
> Tested-By: Guinevere Larsen <guinevere@redhat.com>
> ---
> gdb/NEWS | 4 +++
> gdb/doc/python.texi | 8 ++++--
> gdb/python/py-inferior.c | 2 +-
> gdb/testsuite/gdb.python/py-inferior.exp | 36 ++++++++++++++++++++----
> 4 files changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/gdb/NEWS b/gdb/NEWS
> index 0aac7a7b13a..36ea1b77837 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -58,6 +58,10 @@ show riscv numeric-register-names
> was never documented in the GDB manual, so users should not have
> been using it.
>
> + ** When assigning a sequence to gdb.Inferior.arguments, only quote
> + and whitespace characters will be escaped. Everything else will
> + be left unmodified.
> +
> * Guile API
>
> ** New type <gdb:color> for dealing with colors.
> diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
> index 0dbb37bb8dd..953dace3cc8 100644
> --- a/gdb/doc/python.texi
> +++ b/gdb/doc/python.texi
> @@ -3584,8 +3584,12 @@ Inferiors In Python
>
> Either a string or a sequence of strings can be assigned to this
> attribute. When a string is assigned, it is assumed to have any
> -necessary quoting for the shell; when a sequence is assigned, the
> -quoting is applied by @value{GDBN}.
> +necessary quoting for the shell; when a sequence is assigned, quoting
> +is applied by @value{GDBN} so that the individual strings can be
> +concatenated into a single string, with a single space between each
> +argument. This means that shell quote characters and whitespace
> +characters will be escaped. Quoting is not added to any other special
> +shell characters by @value{GDBN}.
> @end defvar
>
> A @code{gdb.Inferior} object has the following methods:
> diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
> index 356961cfbcb..962a3ff486b 100644
> --- a/gdb/python/py-inferior.c
> +++ b/gdb/python/py-inferior.c
> @@ -929,7 +929,7 @@ infpy_set_args (PyObject *self, PyObject *value, void *closure)
> for (const auto &arg : args)
> argvec.push_back (arg.get ());
> gdb::array_view<char * const> view (argvec.data (), argvec.size ());
> - inf->inferior->set_args (view, true);
> + inf->inferior->set_args (view, false);
> }
> else
> {
> diff --git a/gdb/testsuite/gdb.python/py-inferior.exp b/gdb/testsuite/gdb.python/py-inferior.exp
> index dbde7fe1f34..6e5f34ebf12 100644
> --- a/gdb/testsuite/gdb.python/py-inferior.exp
> +++ b/gdb/testsuite/gdb.python/py-inferior.exp
> @@ -480,11 +480,37 @@ gdb_test "show args" \
> [string_to_regexp "Argument list to give program being debugged when it is started is \"a b c\"."] \
> "show args from string"
>
> -gdb_test_no_output "python gdb.selected_inferior().arguments = \['a', 'b c'\]" \
> - "set arguments from list"
> -gdb_test "show args" \
> - [string_to_regexp "Argument list to give program being debugged when it is started is \"a b\\ c\"."] \
> - "show args from list"
> +# Test setting inferior arguments from a Python list. INPUT is a
> +# single string that contains the Python list, this is inserted into
> +# the argument setting command, and should include the surrouning
> +# square brackets.
> +#
> +# The OUTPUT is the string that describes the arguments as GDB will
> +# have stored them within the inferior, as seen in the 'show args'
> +# command output. OUTPUT should include the surrounding quotes.
> +# OUTPUT will be passed through string_to_regexp, so should be a plain
> +# string, not a regexp.
> +proc test_setting_arguments_from_list { input output } {
> + with_test_prefix "input: ${input}" {
> + gdb_test_no_output "python gdb.selected_inferior().arguments = ${input}" \
> + "set arguments from list"
> + gdb_test "show args" \
> + [string_to_regexp \
> + "Argument list to give program being debugged when it is started is ${output}."] \
> + "show args from list"
> + }
> +}
> +
> +# Test setting inferior arguments from a list. Try to hit all the
> +# potentially problematic cases. Notice that shell characters are not
> +# automatically quoted, if a user wants a shell character quoted then
> +# they must do that themselves.
> +test_setting_arguments_from_list "\['a', 'b c'\]" "\"a b\\ c\""
> +test_setting_arguments_from_list "\[' ', '\\t', '\\n']" "\"\\ \\\t '\r\n'\""
> +test_setting_arguments_from_list "\['', '']" "\"'' ''\""
> +test_setting_arguments_from_list "\['\"']" "\"\\\"\""
> +test_setting_arguments_from_list "\[\"'\"]" "\"\\\'\""
> +test_setting_arguments_from_list "\[\"\$VAR\", \";\"]" "\"\$VAR ;\""
>
> gdb_test_no_output "python gdb.selected_inferior().clear_env()" \
> "clear environment"
>
> base-commit: 512ca2fca4b6674d51706d0d7fe21ed72f733fb5
> --
> 2.47.1
I'm thinking about checking this in. See this series for the big
picture of what I'm trying to do with argument handling:
https://inbox.sourceware.org/gdb-patches/cover.1730731085.git.aburgess@redhat.com
This patch isn't necessarily critical to getting the rest of this series
merged, but I do think this change makes things more consistent. I
figure merging this well ahead of GDB 17 gives plenty of time for issues
to crop up, in which case this could be reverted.
If anyone has an objection, just let me know and I'll hold off merging
this.
Thanks,
Andrew
---
Andrew Burgess <aburgess@redhat.com> writes:
> It is possible to set an inferior's arguments through the Python API
> by assigning to the gdb.Inferior.arguments attribute.
>
> This attribute can be assigned a string, in which case the string is
> taken verbatim as the inferior's argument string. Or this attribute
> can be assigned a sequence, in which case the members of the sequence
> are combined (with some escaping applied) to create the inferior's
> argument string.
>
> When assigning from a sequence, all special shell characters in each
> argument are escaped, I suspect the reasons for this are mostly
> accidental. When the gdb.Inferior.arguments attribute was introduced
> in commit:
>
> commit 3153113252f3b949a159439a17e88af8ff0dce30
> Date: Mon May 1 13:53:59 2023 -0600
>
> Add attributes and methods to gdb.Inferior
>
> GDB's inferior::set_args method called construct_inferior_arguments,
> and construct_inferior_arguments only offered one form of escaping,
> all special shell characters were escaped.
>
> The commit message for 3153113252f3b949 doesn't discuss quoting at
> all, no tests were added that checked this behaviour, and the
> documentation is (I think) a little vague on exactly what quoting will
> be applied, the text says:
>
> Either a string or a sequence of strings can be assigned to this
> attribute. When a string is assigned, it is assumed to have any
> necessary quoting for the shell; when a sequence is assigned, the
> quoting is applied by GDB.
>
> So clearly quoting is applied when a sequence is used, but what
> quoting is left unclear.
>
> I think there is a problem with the way quoting is currently handled.
> Consider this case:
>
> (gdb) python gdb.selected_inferior().arguments = ['$VAR']
> (gdb) show args
> Argument list to give program being debugged when it is started is "\$VAR".
>
> This means that when the inferior is run it will see literal '$VAR' as
> its argument. If instead, the user wants to pass the shell expanded
> value of $VAR to the inferior, there's no way to achieve this result
> using the sequence assignment method. Instead the user has to
> manually combine the arguments, and assign a single string.
>
> I think it would be better if GDB applied the minimum escaping
> required in order to merge the arguments together while keeping the
> sequence elements as separate arguments. Thus, the above example, I
> think, should work like this instead:
>
> (gdb) python gdb.selected_inferior().arguments = ['$VAR']
> (gdb) show args
> Argument list to give program being debugged when it is started is "$VAR".
>
> Now the '$' character is not escaped. If the inferior is started
> under a shell then the user will see the shell expanded value of
> '$VAR'.
>
> Of course, if the user wants to pass a literal '$VAR' (with no
> expansion) then they can do:
>
> (gdb) python gdb.selected_inferior().arguments = ['\$VAR']
> (gdb) show args
> Argument list to give program being debugged when it is started is "\$VAR".
>
> GDB would still escape some characters, but only those characters that
> are important to GDB, quotes and white space. After this patch the
> user would see behaviour like this:
>
> (gdb) python gdb.selected_inferior().arguments = ["\"1\"", '\'2\'', '3 4']
> (gdb) show args
> Argument list to give program being debugged when it is started is "\"1\" \'2\' 3\ 4".
>
> Notice that GDB has quoted the single and double quote characters as
> well as the white space in the third argument. This means that the
> three sequence elements will be delivered to the inferior as three
> separate arguments exactly as the user delivered them to Python.
>
> It is important to understand that this is a Python API breaking
> change, and this could impact existing user scripts. My hope is that
> this is enough of an edge case that we'll not inconvenience too many
> users.
>
> If people think this is too much of a risk, then I do have some ideas
> for how we can offer different methods of setting the inferior
> arguments from Python that will allow for the new quoting without
> changing the existing behaviour; we could add an inferior method, or
> add a new write-only attribute. Either of these approaches would work
> fine, but (I think) would be less than ideal. I'd much rather just
> update the current API.
>
> There are tests for the updated functionality, and I've updated the
> docs and added a NEWS entry.
>
> Reviewed-By: Eli Zaretskii <eliz@gnu.org>
> Tested-By: Guinevere Larsen <guinevere@redhat.com>
> ---
> gdb/NEWS | 4 +++
> gdb/doc/python.texi | 8 ++++--
> gdb/python/py-inferior.c | 2 +-
> gdb/testsuite/gdb.python/py-inferior.exp | 36 ++++++++++++++++++++----
> 4 files changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/gdb/NEWS b/gdb/NEWS
> index 0aac7a7b13a..36ea1b77837 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -58,6 +58,10 @@ show riscv numeric-register-names
> was never documented in the GDB manual, so users should not have
> been using it.
>
> + ** When assigning a sequence to gdb.Inferior.arguments, only quote
> + and whitespace characters will be escaped. Everything else will
> + be left unmodified.
> +
> * Guile API
>
> ** New type <gdb:color> for dealing with colors.
> diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
> index 0dbb37bb8dd..953dace3cc8 100644
> --- a/gdb/doc/python.texi
> +++ b/gdb/doc/python.texi
> @@ -3584,8 +3584,12 @@ Inferiors In Python
>
> Either a string or a sequence of strings can be assigned to this
> attribute. When a string is assigned, it is assumed to have any
> -necessary quoting for the shell; when a sequence is assigned, the
> -quoting is applied by @value{GDBN}.
> +necessary quoting for the shell; when a sequence is assigned, quoting
> +is applied by @value{GDBN} so that the individual strings can be
> +concatenated into a single string, with a single space between each
> +argument. This means that shell quote characters and whitespace
> +characters will be escaped. Quoting is not added to any other special
> +shell characters by @value{GDBN}.
> @end defvar
>
> A @code{gdb.Inferior} object has the following methods:
> diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
> index 356961cfbcb..962a3ff486b 100644
> --- a/gdb/python/py-inferior.c
> +++ b/gdb/python/py-inferior.c
> @@ -929,7 +929,7 @@ infpy_set_args (PyObject *self, PyObject *value, void *closure)
> for (const auto &arg : args)
> argvec.push_back (arg.get ());
> gdb::array_view<char * const> view (argvec.data (), argvec.size ());
> - inf->inferior->set_args (view, true);
> + inf->inferior->set_args (view, false);
> }
> else
> {
> diff --git a/gdb/testsuite/gdb.python/py-inferior.exp b/gdb/testsuite/gdb.python/py-inferior.exp
> index dbde7fe1f34..6e5f34ebf12 100644
> --- a/gdb/testsuite/gdb.python/py-inferior.exp
> +++ b/gdb/testsuite/gdb.python/py-inferior.exp
> @@ -480,11 +480,37 @@ gdb_test "show args" \
> [string_to_regexp "Argument list to give program being debugged when it is started is \"a b c\"."] \
> "show args from string"
>
> -gdb_test_no_output "python gdb.selected_inferior().arguments = \['a', 'b c'\]" \
> - "set arguments from list"
> -gdb_test "show args" \
> - [string_to_regexp "Argument list to give program being debugged when it is started is \"a b\\ c\"."] \
> - "show args from list"
> +# Test setting inferior arguments from a Python list. INPUT is a
> +# single string that contains the Python list, this is inserted into
> +# the argument setting command, and should include the surrouning
> +# square brackets.
> +#
> +# The OUTPUT is the string that describes the arguments as GDB will
> +# have stored them within the inferior, as seen in the 'show args'
> +# command output. OUTPUT should include the surrounding quotes.
> +# OUTPUT will be passed through string_to_regexp, so should be a plain
> +# string, not a regexp.
> +proc test_setting_arguments_from_list { input output } {
> + with_test_prefix "input: ${input}" {
> + gdb_test_no_output "python gdb.selected_inferior().arguments = ${input}" \
> + "set arguments from list"
> + gdb_test "show args" \
> + [string_to_regexp \
> + "Argument list to give program being debugged when it is started is ${output}."] \
> + "show args from list"
> + }
> +}
> +
> +# Test setting inferior arguments from a list. Try to hit all the
> +# potentially problematic cases. Notice that shell characters are not
> +# automatically quoted, if a user wants a shell character quoted then
> +# they must do that themselves.
> +test_setting_arguments_from_list "\['a', 'b c'\]" "\"a b\\ c\""
> +test_setting_arguments_from_list "\[' ', '\\t', '\\n']" "\"\\ \\\t '\r\n'\""
> +test_setting_arguments_from_list "\['', '']" "\"'' ''\""
> +test_setting_arguments_from_list "\['\"']" "\"\\\"\""
> +test_setting_arguments_from_list "\[\"'\"]" "\"\\\'\""
> +test_setting_arguments_from_list "\[\"\$VAR\", \";\"]" "\"\$VAR ;\""
>
> gdb_test_no_output "python gdb.selected_inferior().clear_env()" \
> "clear environment"
>
> base-commit: 512ca2fca4b6674d51706d0d7fe21ed72f733fb5
> --
> 2.47.1
>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:
Andrew> It is possible to set an inferior's arguments through the Python API
Andrew> by assigning to the gdb.Inferior.arguments attribute.
Hi, sorry I didn't reply to this when you originally pinged me a few
weeks ago.
Andrew> Either a string or a sequence of strings can be assigned to this
Andrew> attribute. When a string is assigned, it is assumed to have any
Andrew> necessary quoting for the shell; when a sequence is assigned, the
Andrew> quoting is applied by GDB.
Andrew> So clearly quoting is applied when a sequence is used, but what
Andrew> quoting is left unclear.
As I recall my goal here was to make it straightforward to pass exactly
the desired arguments to the inferior.
This is used by DAP, though it's worth noting that the launch request
isn't specified by DAP itself but instead by gdb:
@item args
If provided, this should be an array of strings. These strings are
provided as command-line arguments to the inferior, as if by
@code{set args}. @xref{Arguments}.
... though that description makes it sound quite different from what
really happens, since 'set args' works in a different way.
Andrew> I think there is a problem with the way quoting is currently handled.
Andrew> Consider this case:
Andrew> (gdb) python gdb.selected_inferior().arguments = ['$VAR']
Andrew> (gdb) show args
Andrew> Argument list to give program being debugged when it is started is "\$VAR".
Andrew> This means that when the inferior is run it will see literal '$VAR' as
Andrew> its argument. If instead, the user wants to pass the shell expanded
Andrew> value of $VAR to the inferior, there's no way to achieve this result
Andrew> using the sequence assignment method.
In this particular case it seems reasonably easy to do any substitutions
in Python.
Andrew> Of course, if the user wants to pass a literal '$VAR' (with no
Andrew> expansion) then they can do:
Andrew> (gdb) python gdb.selected_inferior().arguments = ['\$VAR']
The drawback here is that now the Python code needs to know the quoting
rules -- which gdb already does, and which presumably might be target
dependent, something that gdb already knows well.
Andrew> If people think this is too much of a risk, then I do have some ideas
Andrew> for how we can offer different methods of setting the inferior
Andrew> arguments from Python that will allow for the new quoting without
Andrew> changing the existing behaviour; we could add an inferior method, or
Andrew> add a new write-only attribute. Either of these approaches would work
Andrew> fine, but (I think) would be less than ideal. I'd much rather just
Andrew> update the current API.
I'm not terribly concerned about the risk.
FWIW I'm not really sold on the current approach. The quoting/dequoting
stuff was kind of long-running annoyance of mine and so I implemented
what I thought made sense, but on the other hand, the current approach
does prevent the use of redirection in conjunction with the
array-assignment model.
Though on the other hand, the string approach is there to handle those
needs?
Maybe it would help to understand your motivation for this change.
I suspect this patch needs some kind of change in the DAP code or
documentation.
Tom
Tom Tromey <tom@tromey.com> writes:
>>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:
>
> Andrew> It is possible to set an inferior's arguments through the Python API
> Andrew> by assigning to the gdb.Inferior.arguments attribute.
>
> Hi, sorry I didn't reply to this when you originally pinged me a few
> weeks ago.
>
> Andrew> Either a string or a sequence of strings can be assigned to this
> Andrew> attribute. When a string is assigned, it is assumed to have any
> Andrew> necessary quoting for the shell; when a sequence is assigned, the
> Andrew> quoting is applied by GDB.
>
> Andrew> So clearly quoting is applied when a sequence is used, but what
> Andrew> quoting is left unclear.
>
> As I recall my goal here was to make it straightforward to pass exactly
> the desired arguments to the inferior.
>
> This is used by DAP, though it's worth noting that the launch request
> isn't specified by DAP itself but instead by gdb:
>
> @item args
> If provided, this should be an array of strings. These strings are
> provided as command-line arguments to the inferior, as if by
> @code{set args}. @xref{Arguments}.
>
> ... though that description makes it sound quite different from what
> really happens, since 'set args' works in a different way.
I agree that the description doesn't match reality. Though with this
patch I think they would match. Or at least be a lot closer.
>
> Andrew> I think there is a problem with the way quoting is currently handled.
> Andrew> Consider this case:
>
> Andrew> (gdb) python gdb.selected_inferior().arguments = ['$VAR']
> Andrew> (gdb) show args
> Andrew> Argument list to give program being debugged when it is started is "\$VAR".
>
> Andrew> This means that when the inferior is run it will see literal '$VAR' as
> Andrew> its argument. If instead, the user wants to pass the shell expanded
> Andrew> value of $VAR to the inferior, there's no way to achieve this result
> Andrew> using the sequence assignment method.
>
> In this particular case it seems reasonably easy to do any substitutions
> in Python.
Maybe I'm misunderstanding you, but it sounds like you're suggesting
users would perform shell expansion in Python and pass through the
expanded values. Clearly for the trivial example I used here that might
work, but I don't think that's good advice in general.
But as you mention below, there's the string approach, then all Python
needs to know is how to join multiple args into a single string, which
is much easier than doing full shell expansion.
> Andrew> Of course, if the user wants to pass a literal '$VAR' (with no
> Andrew> expansion) then they can do:
> Andrew> (gdb) python gdb.selected_inferior().arguments = ['\$VAR']
>
> The drawback here is that now the Python code needs to know the quoting
> rules -- which gdb already does, and which presumably might be target
> dependent, something that gdb already knows well.
No not really. If I could summarise the current inferior argument work
it would be: stop having GDB try to quote, split, and join inferior
arguments, because GDB really doesn't know what its doing. I'd much
rather push this out to the edge (i.e. the user) as much as possible.
It's the user that knows their target and know what should be quoted and
what shouldn't.
GDB needs to keep out of it as much as possible. Capture the arguments
ideally as a string from the user, and forward this to the target shell
unchanged.
Only when the user chooses not to use a shell does GDB need to get
involved when the argument string gets split apart. But in that case, I
would argue, GDB itself is acting as the shell, the argument splitting
rules are those that GDB itself defines, and therefore are correct by
definition.
>
> Andrew> If people think this is too much of a risk, then I do have some ideas
> Andrew> for how we can offer different methods of setting the inferior
> Andrew> arguments from Python that will allow for the new quoting without
> Andrew> changing the existing behaviour; we could add an inferior method, or
> Andrew> add a new write-only attribute. Either of these approaches would work
> Andrew> fine, but (I think) would be less than ideal. I'd much rather just
> Andrew> update the current API.
>
> I'm not terribly concerned about the risk.
>
> FWIW I'm not really sold on the current approach. The quoting/dequoting
> stuff was kind of long-running annoyance of mine and so I implemented
> what I thought made sense, but on the other hand, the current approach
> does prevent the use of redirection in conjunction with the
> array-assignment model.
>
> Though on the other hand, the string approach is there to handle those
> needs?
I hear you. This change isn't blocking the rest of my inferior series,
so my plan for now is to put this on ice, work on getting the rest of
the series merged, then revisit this later.
> Maybe it would help to understand your motivation for this change.
The motivation is as mentioned above: move logic for quoting out of GDB
and into the hands of the user. It's the user that knows their target.
You mention above that this change would require Python to understand
the quoting rules after this patch, but I don't agree. I would assume
that the list the the Python DAP code passes through would actually come
from the user. I would argue that it would be the _user_ that would
then need to understand the quoting rules for their target. While right
now the user will know their target (we hope), but they also have to
learn GDB's quoting, along with the limitations that the DAP code
imposes.
> I suspect this patch needs some kind of change in the DAP code or
> documentation.
As you point out above, the DAP documentation is wrong _now_. This
patch would (I claim) make the code basically align with the
documentation.
I rebased and reran this patch on gdb.dap/*.exp and there are no
failures, but I'm not really surprised, this change is kind of an edge
case. This was part of the reason I felt OK proposing this change,
given there were no tests capturing a desired set of behaviours.
I'm going to focus on getting other work from the inferior argument
series merged for now, and I'll revisit this patch later on. Between
now and then I'll try to come up with some better examples, though other
than making bigger, more complex, examples like the one I already gave,
I don't think there's really anything else to suggest ... but I'll see
what I can come up with.
Thanks for your valuable feedback.
Andrew
@@ -58,6 +58,10 @@ show riscv numeric-register-names
was never documented in the GDB manual, so users should not have
been using it.
+ ** When assigning a sequence to gdb.Inferior.arguments, only quote
+ and whitespace characters will be escaped. Everything else will
+ be left unmodified.
+
* Guile API
** New type <gdb:color> for dealing with colors.
@@ -3584,8 +3584,12 @@ Inferiors In Python
Either a string or a sequence of strings can be assigned to this
attribute. When a string is assigned, it is assumed to have any
-necessary quoting for the shell; when a sequence is assigned, the
-quoting is applied by @value{GDBN}.
+necessary quoting for the shell; when a sequence is assigned, quoting
+is applied by @value{GDBN} so that the individual strings can be
+concatenated into a single string, with a single space between each
+argument. This means that shell quote characters and whitespace
+characters will be escaped. Quoting is not added to any other special
+shell characters by @value{GDBN}.
@end defvar
A @code{gdb.Inferior} object has the following methods:
@@ -929,7 +929,7 @@ infpy_set_args (PyObject *self, PyObject *value, void *closure)
for (const auto &arg : args)
argvec.push_back (arg.get ());
gdb::array_view<char * const> view (argvec.data (), argvec.size ());
- inf->inferior->set_args (view, true);
+ inf->inferior->set_args (view, false);
}
else
{
@@ -480,11 +480,37 @@ gdb_test "show args" \
[string_to_regexp "Argument list to give program being debugged when it is started is \"a b c\"."] \
"show args from string"
-gdb_test_no_output "python gdb.selected_inferior().arguments = \['a', 'b c'\]" \
- "set arguments from list"
-gdb_test "show args" \
- [string_to_regexp "Argument list to give program being debugged when it is started is \"a b\\ c\"."] \
- "show args from list"
+# Test setting inferior arguments from a Python list. INPUT is a
+# single string that contains the Python list, this is inserted into
+# the argument setting command, and should include the surrouning
+# square brackets.
+#
+# The OUTPUT is the string that describes the arguments as GDB will
+# have stored them within the inferior, as seen in the 'show args'
+# command output. OUTPUT should include the surrounding quotes.
+# OUTPUT will be passed through string_to_regexp, so should be a plain
+# string, not a regexp.
+proc test_setting_arguments_from_list { input output } {
+ with_test_prefix "input: ${input}" {
+ gdb_test_no_output "python gdb.selected_inferior().arguments = ${input}" \
+ "set arguments from list"
+ gdb_test "show args" \
+ [string_to_regexp \
+ "Argument list to give program being debugged when it is started is ${output}."] \
+ "show args from list"
+ }
+}
+
+# Test setting inferior arguments from a list. Try to hit all the
+# potentially problematic cases. Notice that shell characters are not
+# automatically quoted, if a user wants a shell character quoted then
+# they must do that themselves.
+test_setting_arguments_from_list "\['a', 'b c'\]" "\"a b\\ c\""
+test_setting_arguments_from_list "\[' ', '\\t', '\\n']" "\"\\ \\\t '\r\n'\""
+test_setting_arguments_from_list "\['', '']" "\"'' ''\""
+test_setting_arguments_from_list "\['\"']" "\"\\\"\""
+test_setting_arguments_from_list "\[\"'\"]" "\"\\\'\""
+test_setting_arguments_from_list "\[\"\$VAR\", \";\"]" "\"\$VAR ;\""
gdb_test_no_output "python gdb.selected_inferior().clear_env()" \
"clear environment"