[16/16] gdb/gdbserver: rework argument splitting and joining

Message ID b39576a9faeff115419fb3fc1121ec0f861a1c8e.1704809586.git.aburgess@redhat.com
State New
Headers
Series Inferior argument (inc for remote targets) changes |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm pending Test started
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 pending Test started

Commit Message

Andrew Burgess Jan. 9, 2024, 2:26 p.m. UTC
  This commit attempts to improve how GDB splits and joins inferior
arguments when passing them to a remote target.  The splitting and
joining is now the legacy approach for passing arguments to a remote
target; the preference is to pass arguments as a single string,
however, if GDB is connecting to a target that doesn't support passing
inferior arguments as a single string then the joining and splitting
approach is still used, and this commit tries to improve that approach
a little.

Currently when GDB passes arguments to a remote target this is done by
using the gdb_argv class.  This is the same technique uses to split
the argument vector in order to start an inferior without a shell.

This approach has a problem.  Consider a native target, and the user
sets the inferior arguments like this:

  (gdb) set args \$VAR

In this case the user has escaped the '$' character, removing it's
special meaning.  When started under a shell the '\$VAR' will not
result in variable expansion, and the inferior will see a literal
'$VAR'.  When startup-with-shell is off then gdb_argv is used to split
the inferior arguments, and gdb_argv also removes the backslash
escape, so the inferior again sees a literal '$VAR' string.  I think
this consistency makes sense.

Now consider if the user sets the inferior arguments like this:

  (gdb) set args $VAR

In the startup-with-shell case the shell will expand '$VAR' and the
inferior will see whatever the value of '$VAR' was.  But if
startup-with-shell is off then there is no shell, and the inferior
will just be passed a literal '$VAR' string.

Given the above, when we consider using gdb_argv for splitting
inferior arguments in order to pass them to a remote target, the
problem we have is this: there is a many to one between input strings
and output strings.  Both '\$VAR' and '$VAR' both result in an output
string of '$VAR'.  The remote target can't know what the actual input
was.  If the remote adds the escaping then (when startup-with-shell is
on) the shell will not substitute variables when it should, and if the
remote doesn't add escaping then the shell will try to substitute
variables when it shouldn't.

So I think we have to admit that the current approach is not ideal,
but we cannot change things too much; in simple cases (i.e. without
too many special shell characters) the existing approach does work, so
if we do something too different then existing targets are going to
break even for simple cases.

The current approach then is for the remote target to escape any
special shell characters within the arguments received from GDB.  This
is too aggressive.

An ideal solution might be to have GDB remove quoting, but to keep, or
apply all required escaping, and for the remote target to do nothing
but join the incoming arguments with a single whitespace character.
This would certainly be easiest all around, but would break a lot of
backward compatibility, for example given this:

  (gdb) set args 'ab cd'

GDB could strip the single quotes, and escape the whitespace, sending
the argument to the remote target as (ab\ cd) -- without the
parenthesis -- then on the remote end the target could use this
argument directly.

However, if we adopt this approach existing targets, those that escape
special shell characters, are likely to do the wrong thing, it's
likely that both the backslash and space would be escaped.

So, what we are searching for is a solution that sits somewhere half
way between GDB performs all the escaping, and the current situation,
where the target is expected to apply the escaping.

I propose that the rules be something like:

  + GDB removes the quotes, and escapes all special shell characters
    except for the following:

  + Whitespace is not escaped, the remote is expected to escape any
    whitespace within an argument,

  + Quote characters are not escaped, the remote is expected to escape
    any single or double quote characters within an argument.

  + Any other special shell characters will have been escaped, if
    needed, by GDB, and the escaping included in the transmitted
    argument.

The solution presented here takes into account unquoted arguments, as
well as single and double quoted arguments.

**WARNING**: This is a potentially breaking change in the remote
protocol.  A user, using an existing remote target, that handles
inferior arguments in a particular way may find that with this change
now causes some cases to behave differently.  My hope is that by
handling whitespace and quotes differently we can preserve enough
behaviour that the problem cases are not too common.

As part of this commit I've updated unittests/remote-arg-selftests.c,
and it's possible to see examples of when the arguments passed from
GDB to the remote will change, and when they remain the same.

And there are some tests in gdb.base/startup-with-shell.exp that now
pass even when using a remote target board and when the
single-inferior-arg feature is turned off.

All of the other argument passing tests, e.g. gdb.base/args.exp, and
gdb.base/inferior-args.exp continue to pass, and also already run with
the single-inferior-arg feature both enabled and disabled, so the
changes in this commit should be well tested already.

What isn't addressed at all in this commit is parameter expansion,
command substitution, or arithmetic expansion.  In some way these are
similar to quotes, but instead of single or double quotes the argument
is surrounded with ${...}, $(...), or $((....)).  However, unlike
single or double quotes I don't think that we can remove the quoting
in these cases, if we did, how would the remote target know what to
add back?

However, I'm not proposing to even try to address these cases just
yet.  Ideally, if this whole series is accepted then by this point we
will have already merged the patch that passes inferior arguments as a
single string, a solution that handles all of the above cases.  We can
always revisit this later on and look at supporting these cases if
needed.

This work was inspired by this series:

  https://inbox.sourceware.org/gdb-patches/20211022071933.3478427-1-m.weghorn@posteo.de/

I originally started reviewing this series, and figured that, surely
it must be possible to solve the problem that series tries to solve by
just being "smarter" about how we split arguments.  It was only when I
actually tried to implement this, and started writing tests, that I
realised that above series was right, and what we really need to do is
pass all arguments as a single string.  But I think that series is
worth a read, and should get credit for reaching the right answer
quicker than me!

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28392
---
 gdb/NEWS                                      |   8 +
 gdb/doc/gdb.texinfo                           |  13 +
 gdb/testsuite/gdb.base/startup-with-shell.exp |  27 +--
 gdb/unittests/remote-arg-selftests.c          |  18 +-
 gdbsupport/remote-args.cc                     | 229 +++++++++++++++++-
 5 files changed, 264 insertions(+), 31 deletions(-)
  

Comments

Eli Zaretskii Jan. 9, 2024, 4:37 p.m. UTC | #1
> From: Andrew Burgess <aburgess@redhat.com>
> Cc: Andrew Burgess <aburgess@redhat.com>, Michael Weghorn <m.weghorn@posteo.de>
> Date: Tue,  9 Jan 2024 14:26:39 +0000
> 
>  gdb/NEWS                                      |   8 +
>  gdb/doc/gdb.texinfo                           |  13 +
>  gdb/testsuite/gdb.base/startup-with-shell.exp |  27 +--
>  gdb/unittests/remote-arg-selftests.c          |  18 +-
>  gdbsupport/remote-args.cc                     | 229 +++++++++++++++++-
>  5 files changed, 264 insertions(+), 31 deletions(-)

The documentation parts are okay, thanks.

Reviewed-By: Eli Zaretskii <eliz@gnu.org>
  
Keith Seitz Jan. 21, 2024, 3:57 a.m. UTC | #2
On 1/9/24 06:26, Andrew Burgess wrote:

> I propose that the rules be something like:
> 
>    + GDB removes the quotes, and escapes all special shell characters
>      except for the following:
> 
>    + Whitespace is not escaped, the remote is expected to escape any
>      whitespace within an argument,
> 
>    + Quote characters are not escaped, the remote is expected to escape
>      any single or double quote characters within an argument.
> 
>    + Any other special shell characters will have been escaped, if
>      needed, by GDB, and the escaping included in the transmitted
>      argument.

This is a difficult situation to resolve, isn't it? After reading
through your tests, I believe I have a tiny understanding of what
these rules are doing.

In the end, you've given this far more thought (and play) than I have
while attempting to review this.

Having said that, I only have typos to point out.

> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 6ff059d14ed..cc9c5ef0ccc 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -43137,6 +43137,19 @@
>   @var{argument} string, which includes all inferior arguments,
>   separated with whitespace.
>   
> +If @value{GDBN} didn't offer @samp{single-inf-arg+}, or the remote
> +didn't choose to use this feature, then @value{GDBN} will split the
> +inferior arguments before passing them to the remote target.  Single
> +and double quotes surrounding any argument will have been removed by
> +@value{GDBN}.  Any special shell characters within each argument will
> +have been escaped with a backslash character except for quote and
> +whitespace characters, these will not be escaped within the arguments,
> +the remote target is expected to escape these characters itself.
> +
> +These slightly complex rules exist for backward compatibility reasons.
> +When implementing a new remote target it is suggested that the
> +@samp{single-inf-arg} feature be supported.

"When implementing a new remote target," (missing comma)

> +
>   @c FIXME:  What about non-stop mode?
>   
>   This packet is only available in extended mode (@pxref{extended mode}).

Keith
  

Patch

diff --git a/gdb/NEWS b/gdb/NEWS
index 8f0bf34fccc..206f72070bb 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -17,6 +17,14 @@ 
   this flag is used gdbserver will not escape special shell characters
   within the inferior arguments.
 
+* There have been changes to how inferior arguments are passed to
+  remote targets.  When using the latest GDB and gdbserver the new
+  single-inf-arg will be used for argument passing.  When using remote
+  targets that don't support single-inf-arg, GDB will now escape
+  special shell characters (except for whitespace and quotes) within
+  arguments before passing them to the remote target.  Remote targets
+  will need updating to take this into account.
+
 * Changed commands
 
 disassemble
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 6ff059d14ed..cc9c5ef0ccc 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -43137,6 +43137,19 @@ 
 @var{argument} string, which includes all inferior arguments,
 separated with whitespace.
 
+If @value{GDBN} didn't offer @samp{single-inf-arg+}, or the remote
+didn't choose to use this feature, then @value{GDBN} will split the
+inferior arguments before passing them to the remote target.  Single
+and double quotes surrounding any argument will have been removed by
+@value{GDBN}.  Any special shell characters within each argument will
+have been escaped with a backslash character except for quote and
+whitespace characters, these will not be escaped within the arguments,
+the remote target is expected to escape these characters itself.
+
+These slightly complex rules exist for backward compatibility reasons.
+When implementing a new remote target it is suggested that the
+@samp{single-inf-arg} feature be supported.
+
 @c FIXME:  What about non-stop mode?
 
 This packet is only available in extended mode (@pxref{extended mode}).
diff --git a/gdb/testsuite/gdb.base/startup-with-shell.exp b/gdb/testsuite/gdb.base/startup-with-shell.exp
index f467774dc5f..3952bd774d4 100644
--- a/gdb/testsuite/gdb.base/startup-with-shell.exp
+++ b/gdb/testsuite/gdb.base/startup-with-shell.exp
@@ -55,11 +55,8 @@  proc initial_setup_simple { startup_with_shell run_args } {
 # it to ON_RE if startup-with-shell is on, otherwise compare to
 # OFF_RE.
 #
-# If PROBLEMATIC_ON is true then when startup-with-shell is on we
-# expect the comparison to fail, so setup an xfail.
-#
 # TESTNAME is a string used in the test names.
-proc run_test { args on_re off_re testname { problematic_on false } } {
+proc run_test { args on_re off_re testname } {
     foreach startup_with_shell { "on" "off" } {
 	with_test_prefix "$testname, startup_with_shell: ${startup_with_shell}" {
 	    if {![initial_setup_simple $startup_with_shell $args]} {
@@ -68,14 +65,8 @@  proc run_test { args on_re off_re testname { problematic_on false } } {
 
 	    if { $startup_with_shell } {
 		set re $on_re
-		set problematic $problematic_on
 	    } else {
 		set re $off_re
-		set problematic false
-	    }
-
-	    if { $problematic } {
-		setup_xfail "*-*-*" gdb/28392
 	    }
 
 	    gdb_test "print argv\[1\]" "\\\$$::decimal = $::hex $re" $testname
@@ -91,15 +82,14 @@  proc run_test_same { args re testname } {
 }
 
 # Run the actual tests
-proc run_all_tests { { is_remote_with_split_args false } } {
+proc run_all_tests { } {
     # The regexp to match a single '\' character.
     set bs "\\\\"
 
     run_test "$::unique_file_dir/*.unique-extension" \
 	"\"$::unique_file\"" \
 	"\"$::unique_file_dir/\\\*\.unique-extension\"" \
-	"arg is glob" \
-	$is_remote_with_split_args
+	"arg is glob"
 
     run_test_same "$::unique_file_dir/\\*.unique-extension" \
 	"\"$::unique_file_dir/\\\*\.unique-extension\"" \
@@ -110,8 +100,7 @@  proc run_all_tests { { is_remote_with_split_args false } } {
 	run_test "\$TEST" \
 	    "\"1234\"" \
 	    "\"\\\$TEST\"" \
-	    "arg is shell variable" \
-	    $is_remote_with_split_args
+	    "arg is shell variable"
 
 	run_test_same "\\\$TEST" \
 	    "\"\\\$TEST\"" \
@@ -121,14 +110,12 @@  proc run_all_tests { { is_remote_with_split_args false } } {
     run_test "\$(echo foo)" \
 	"\"foo\"" \
 	"\"\\\$\\(echo\"" \
-	"arg is parameter expansion, command execution" \
-	$is_remote_with_split_args
+	"arg is parameter expansion, command execution"
 
     run_test "\$((2 + 3))" \
 	"\"5\"" \
 	"\"\\\$\\(\\(2\"" \
-	"arg is parameter expansion, expression evaluation" \
-	$is_remote_with_split_args
+	"arg is parameter expansion, expression evaluation"
 
     run_test_same "\"\\a\"" \
 	"\"${bs}${bs}a\"" \
@@ -180,7 +167,7 @@  if {[target_info gdb_protocol] == "extended-remote"} {
     with_test_prefix "single-inferior-arg disabled" {
 	save_vars { GDBFLAGS } {
 	    append GDBFLAGS " -ex \"set remote single-inferior-argument-feature-packet off\""
-	    run_all_tests true
+	    run_all_tests
 	}
     }
 }
diff --git a/gdb/unittests/remote-arg-selftests.c b/gdb/unittests/remote-arg-selftests.c
index b732d94792d..7abc9d46345 100644
--- a/gdb/unittests/remote-arg-selftests.c
+++ b/gdb/unittests/remote-arg-selftests.c
@@ -45,18 +45,26 @@  arg_test_desc desc[] = {
   { "a b c", { "a", "b", "c" }, "a b c" },
   { "\"a b\" 'c d'", { "a b", "c d" }, "a\\ b c\\ d" },
   { "\\' \\\"", { "'", "\"" }, "\\' \\\"" },
-  { "'\\'", { "\\" }, "\\\\" },
-  { "\"\\\\\" \"\\\\\\\"\"", { "\\", "\\\"" }, "\\\\ \\\\\\\"" },
+  { "'\\'", { "\\\\" }, "\\\\" },
+  { "\"\\\\\" \"\\\\\\\"\"", { "\\\\", "\\\\\"" }, "\\\\ \\\\\\\"" },
   { "\\  \" \" ' '", { " ", " ", " "}, "\\  \\  \\ " },
   { "\"'\"", { "'" }, "\\'" },
-  { "'\"' '\\\"'", { "\"", "\\\"" } , "\\\" \\\\\\\""},
+  { "'\"' '\\\"'", { "\"", "\\\\\"" } , "\\\" \\\\\\\""},
   { "\"first arg\" \"\" \"third-arg\" \"'\" \"\\\"\" \"\\\\\\\"\" \" \" \"\"",
-    { "first arg", "", "third-arg", "'", "\"", "\\\""," ", "" },
+    { "first arg", "", "third-arg", "'", "\"", "\\\\\""," ", "" },
     "first\\ arg '' third-arg \\' \\\" \\\\\\\" \\  ''"},
   { "\"\\a\" \"\\&\" \"\\#\" \"\\<\" \"\\^\"",
-    { "\\a", "\\&", "\\#" , "\\<" , "\\^"},
+    { "\\\\a", "\\\\\\&", "\\\\\\#" , "\\\\\\<" , "\\\\\\^"},
+    "\\\\a \\\\\\& \\\\\\# \\\\\\< \\\\\\^" },
+  { "\"\\\\a\" \"\\\\&\" \"\\\\#\" \"\\\\<\" \"\\\\^\"",
+    { "\\\\a", "\\\\\\&", "\\\\\\#" , "\\\\\\<" , "\\\\\\^"},
     "\\\\a \\\\\\& \\\\\\# \\\\\\< \\\\\\^" },
   { "1 '\n' 3", { "1", "\n", "3" }, "1 '\n' 3" },
+  { "abc* abc\\* abc\\\\*", { "abc*", "abc\\*", "abc\\\\*" },
+    "abc* abc\\* abc\\\\*" },
+  { "\"ab\\ cd\" \"ef\\'gh\"", { "ab\\\\ cd", "ef\\\\'gh" },
+    "ab\\\\\\ cd ef\\\\\\'gh" },
+  { "\"$VAR\" \"ab`foo`cd\"", { "$VAR", "ab`foo`cd" }, "$VAR ab`foo`cd" },
 };
 
 /* Run the remote argument passing self tests.  */
diff --git a/gdbsupport/remote-args.cc b/gdbsupport/remote-args.cc
index b808a64efce..77e4c1a4bc1 100644
--- a/gdbsupport/remote-args.cc
+++ b/gdbsupport/remote-args.cc
@@ -20,18 +20,235 @@ 
 #include "gdbsupport/common-inferior.h"
 #include "gdbsupport/buildargv.h"
 
+/* This file contains the function used for splitting an argument string
+   into separate arguments in preparation for sending the argument over the
+   remote protocol, as well as the function to merge the separate arguments
+   back together into a single argument string.
+
+   The logic within these functions is slightly more complex that it should
+   be.  This is in order to maintain some level of backward compatibility.
+
+   In the following test example of command line arguments will be given.
+   To avoid confusion arguments, and argument strings will be delimited
+   with '(' and ')', the parenthesis are not part of the argument itself.
+   This is clearer than using quotes, as some of the examples will include
+   quotes within the arguments.
+
+   Historically, the algorithm used to split the argument string into
+   separate arguments removed a level of quoting from the arguments.  For
+   example consider the following argument string: (abc* abc\*).  The
+   historic algorithm would split this into (abc*) and (abc*).  Notice that
+   the two arguments are identical.  On the remote end we are now destined
+   for failure, either we apply an escape to both '*' characters, or we
+   apply an escape to neither.  In either case, we get one of the arguments
+   wrong.  The historic approach was just broken.
+
+   However, the historic approach has been in place for many years.
+   Clearly not all arguments were corrupted in the manor described above,
+   so lots of things did work.  For example, the string: ("ab cd" "ef")
+   will be split into (ab cd) and (ef).  And the string ('"') will become
+   just (").
+
+   What we can observer from all of these examples, is that the historic
+   approach at the remote end was to simple apply an escape to every
+   special shell character, quotes, white space, as well as every other
+   special character (e.g. (*)).  The problem with this approach is that
+   sometimes special shell characters shouldn't be escaped.
+
+   If we could start from scratch, then the simple approach would be to
+   retain all escaping while splitting the argument string, and, where
+   quotes are used, convert this into backslash escaping as neeeded.  Thus
+   the argument string ("ab cd" "ef") would become (ab\ cd) and (ef).  And
+   the argument string (abc* abc\*) would become (abc*) (abc\*).  On the
+   remote end, joining these arguments is as simple as concatenation with a
+   single space character between.
+
+   However, if we took this approach, then consider ("ab cd").  Previously
+   this was sent as (ab cd), but now it would be (ab\ cd).  This breaks
+   backward compatibility.
+
+   And so, this is where the complexity comes in.
+
+   The strategy here is to split the arguments, removing all double and
+   single quotes.  While removing quotes, special shell characters are
+   escaped as needed.  But, white space characters, and quote characters
+   are not escaped.  These characters must always be escaped, and so we can
+   safely drop the escape in these cases, this provides some degree of
+   backward compatibility.  */
+
+/* Return true if C is a double or single quote character.  */
+
+static bool
+isquote (const char c)
+{
+  return c == '"' || c == '\'';
+}
+
 /* See remote-args.h.  */
 
 std::vector<std::string>
 gdb::remote_args::split (std::string args)
 {
-  std::vector<std::string> results;
+  std::vector<std::string> remote_args;
+
+  const char *input = args.c_str ();
+  bool squote = false, dquote = false;
+
+#ifdef __MINGW32__
+  /* This holds all the characters considered special to the
+     Windows shells.  */
+  static const char special[] = "\"!&*|[]{}<>?`~^=;, \t\n";
+#else
+  /* This holds all the characters considered special to the
+     typical Unix shells.  We include `^' because the SunOS
+     /bin/sh treats it as a synonym for `|'.  */
+  static const char special[] = "\"!#$&*()\\|[]{}<>?'`~^; \t\n";
+#endif
+
+  /* Characters that are special within double quotes.  */
+  static const char dquote_special[] = "$`\\";
+
+  input = skip_spaces (input);
+
+  do
+    {
+      std::string arg;
+
+      while (*input != '\0')
+	{
+	  if (isspace (*input) && !squote && !dquote)
+	    break;
+	  else if (*input == '\\' && !squote)
+	    {
+	      if (input[1] == '\0')
+		arg += input[0];
+	      else if (input[1] == '\n')
+		++input;
+	      else if (dquote && input[1] == '"')
+		{
+		  arg += input[1];
+		  ++input;
+		}
+	      else if (dquote && strchr (dquote_special, input[1]) != nullptr)
+		{
+		  /* Within double quotes, these characters have special
+		     meaning.  If they are escaped with a backslash then we
+		     need to preserve the escape once we remove the
+		     quotes.  */
+		  arg += input[0];	/* Backslash.  */
+		  arg += input[1];	/* Special character.  */
+		  ++input;
+		}
+	      else if (dquote)
+		{
+		  /* Within double quotes, none of the remaining characters
+		     have any special meaning, the backslash before the
+		     character is a literal backslash.
+
+		     To retain the literal backslash with the quotes
+		     removed, we need to escape the backslash.  */
+		  arg += input[0];	/* Backslash.  */
+		  arg += input[0];	/* Backslash.  */
+
+		  /* If the character after the backslash has special
+		     meaning outside of the double quotes, then we need to
+		     escape it now, otherwise it will gain special meaning
+		     as we remove the surrounding quotes.  However, as per
+		     the comments at the head of this file; we don't
+		     escape white space or quotes.  */
+		  if (!isspace (input[1])
+		      && !isquote (input[1])
+		      && strchr (special, input[1]) != nullptr)
+		    arg += input[0];	/* Backslash.  */
+
+		  arg += input[1];	/* Character.  */
+		  ++input;
+		}
+	      else if (isspace (input[1]) || isquote (input[1]))
+		{
+		  /* We remove the escaping from white space and quote
+		     characters.  */
+		  arg += input[1];	/* Character.  */
+		  ++input;
+		}
+	      else
+		{
+		  /* For everything else, retain the escaping.  */
+		  arg += input[0];	/* Backslash.  */
+		  arg += input[1];	/* Character.  */
+		  ++input;
+		}
+	    }
+	  else if (squote)
+	    {
+	      /* Inside a single quote argument there are no special
+		 characters.  A single quote finishes the argument.  */
+	      if (*input == '\'')
+		squote = false;
+	      /* Don't add escaping to white space or quotes.  We already
+		 handled single quotes above, so the isquote call here will
+		 only find double quotes.  */
+	      else if (isspace (*input) || isquote (*input))
+		arg += *input;
+	      /* Any other special shell character needs a backslash adding
+		 to avoid the character gaining special meaning outside of
+		 the single quotes.  */
+	      else if (strchr (special, *input) != NULL)
+		{
+		  arg += '\\';
+		  arg += *input;
+		}
+	      /* Any other character just gets added to the output.  */
+	      else
+		arg += *input;
+	    }
+	  else if (dquote)
+	    {
+	      /* Inside a double quoted argument.  A double quote closes
+		 the argument.  An escaped double quote will have been
+		 handled in the backslash handling block above.  */
+	      if (*input == '"')
+		dquote = false;
+	      /* Don't add escaping for white space or quotes.  We already
+		 handled double quotes above, so the isquote call here will
+		 only find single quotes.  */
+	      else if (isspace (*input) || isquote (*input))
+		arg += *input;
+	      /* Any character that is not one of the few characters that
+		 retains its special meaning without double quotes, but is
+		 otherwise a special character needs an escape character
+		 adding, to avoid the character gaining special meaning
+		 outside of the quotes.  */
+	      else if (strchr (dquote_special, *input) == nullptr
+		       && strchr (special, *input) != nullptr)
+		{
+		  arg += '\\';
+		  arg += *input;
+		}
+	      /* Anything else just gets passed through to the output.  */
+	      else
+		arg += *input;
+	    }
+	  else
+	    {
+	      /* Found a character outside of a single or double quoted
+		 argument, and not preceded by a backslash.  */
+	      if (*input == '\'')
+		squote = true;
+	      else if (*input == '"')
+		dquote = true;
+	      else
+		arg += *input;
+	    }
+	  ++input;
+	}
 
-  gdb_argv argv (args.c_str ());
-  for (int i = 0; argv[i] != nullptr; i++)
-    results.emplace_back (argv[i]);
+      remote_args.push_back (std::move (arg));
+      input = skip_spaces (input);
+    }
+  while (*input != '\0');
 
-  return results;
+  return remote_args;
 }
 
 /* See remote-args.h.  */
@@ -39,5 +256,5 @@  gdb::remote_args::split (std::string args)
 std::string
 gdb::remote_args::join (std::vector<gdb::unique_xmalloc_ptr<char>> &args)
 {
-  return construct_inferior_arguments (args, escape_shell_characters);
+  return construct_inferior_arguments (args, escape_quotes_and_white_space);
 }