From patchwork Tue Jan 9 14:26:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Burgess X-Patchwork-Id: 83653 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1CAF43858017 for ; Tue, 9 Jan 2024 14:32:16 +0000 (GMT) X-Original-To: gdb-patches@sourceware.org Delivered-To: gdb-patches@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id A39E338582A6 for ; Tue, 9 Jan 2024 14:27:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A39E338582A6 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A39E338582A6 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1704810437; cv=none; b=LNhGuNsIS9LzxG/2NUELDUxqv0Y+zNTJCIDekUNCb2L7aQjA+AQ8uc6ZuEYEv+5szZFFFvzYlhYTjIwDyR/dvYVT6M0kPNQmqDGG86YeFpihHSWjSS2MxhbkKqYyM8sbujezQi2nbh2R8MXSQb9iS/kr5Tci2GcHVqCaAigjiIU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1704810437; c=relaxed/simple; bh=zKsBISShAypINWlhHHMQ9H60uO3wgqTHNVdIqZ0dpZ4=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=S1DJ7BnadElO2V4LHsC5UM6xSI9ToPTQCe/zTWztV6/tGO/DDE3UH2vUu8uM6rdNFkIdzSF5D4Z+WZzTkHOIefFCoCOwc2WZiZ99UyBUcKJYzYKn81gCK9tujV3wduCMyh/xTt4O2oZSaGyAs6mamVs19imED4PcPj52TVUKRDI= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1704810431; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zspPI7YTZ0pTjx72nJ7v7ED2fPgMu9oxtxDs2mm42n8=; b=O78R1cK3nAxCRxANOgfl05p2SfF6xC4SlbLV31SRI5NE1nnXRXbpNpmtxXFP1Ij2lSlvoN suDR5KSa1o/wkA7QfKcbBiZyDdaTNcKQ9BEd4ksuZbyfT1rZu9UN7AAl5D20vw5IMNC/GU Fe57MFclYKFjlEuLeQTtkdWV2k+XAbs= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-338-TqAR-h4ENfejCsOa79gq0w-1; Tue, 09 Jan 2024 09:27:10 -0500 X-MC-Unique: TqAR-h4ENfejCsOa79gq0w-1 Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-40d3c533083so24721655e9.1 for ; Tue, 09 Jan 2024 06:27:09 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704810427; x=1705415227; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zspPI7YTZ0pTjx72nJ7v7ED2fPgMu9oxtxDs2mm42n8=; b=DiI0t08Q26tzZYfXUHS3liHf5AYYRvb/SdkBSe3oMYSZ8F71/czMTORpmVrWFFT7G8 nk5PKYg4pj1Cw5DXtihr1YrvwVoupscw906cPFRzriadIyRdwjfQo6FUHWZ+emAYlxLl fSXu3cY3p1tf3YrhdrPNIiPZW8b0ivEs+BMDtdIH7jKY112vsvU917PaBv+lOq+8l8fz 29wXwYf2rfX+gO4FPyZYK8+MqiWpwFk8Hq77vWGAH73zXac6FVjNj/NqXlHcxWRvdq6F tHMD1lM7hvoyTUOffZRUVcVCjA4nuHFbRtveiEWHfhv+xTApBsfX4Xslif1/7F+beuh4 8L9g== X-Gm-Message-State: AOJu0Yx+WW9pX/Pb87dSRD50elRU0IYsP300XxhOJw8yyDrKm7qdiZFC 1aXWiT1++zaAepU5a3b72AZ/KbbbzcH2NKPUp4U3HsY/HI2+ZmGx2rILxEAQvEDklOgvw9cg6xC 6gO4GrYmhZymMjFaXXp3gGMlgY8DUN+fa9eOgdQ8S6u8FLpIq70u73JIq2zdih3C2gpO7FiM6Mw Bm1ylAWL3+u+tZbA== X-Received: by 2002:a05:600c:510d:b0:40d:8557:8d85 with SMTP id o13-20020a05600c510d00b0040d85578d85mr2883101wms.80.1704810427061; Tue, 09 Jan 2024 06:27:07 -0800 (PST) X-Google-Smtp-Source: AGHT+IE3KSzjRXpe6+0mwA1lfkmUB6Yq8fXKweZYMP8lnmwv2Kt/n6YY6ZOgP27+9MpvCFoTM7ckuQ== X-Received: by 2002:a05:600c:510d:b0:40d:8557:8d85 with SMTP id o13-20020a05600c510d00b0040d85578d85mr2883086wms.80.1704810426363; Tue, 09 Jan 2024 06:27:06 -0800 (PST) Received: from localhost (185.223.159.143.dyn.plus.net. [143.159.223.185]) by smtp.gmail.com with ESMTPSA id fa7-20020a05600c518700b0040d87100733sm3701785wmb.39.2024.01.09.06.27.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Jan 2024 06:27:06 -0800 (PST) From: Andrew Burgess To: gdb-patches@sourceware.org Cc: Andrew Burgess , Michael Weghorn Subject: [PATCH 16/16] gdb/gdbserver: rework argument splitting and joining Date: Tue, 9 Jan 2024 14:26:39 +0000 Message-Id: X-Mailer: git-send-email 2.25.4 In-Reply-To: References: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org This commit attempts to improve how GDB splits and joins inferior arguments when passing them to a remote target. The splitting and joining is now the legacy approach for passing arguments to a remote target; the preference is to pass arguments as a single string, however, if GDB is connecting to a target that doesn't support passing inferior arguments as a single string then the joining and splitting approach is still used, and this commit tries to improve that approach a little. Currently when GDB passes arguments to a remote target this is done by using the gdb_argv class. This is the same technique uses to split the argument vector in order to start an inferior without a shell. This approach has a problem. Consider a native target, and the user sets the inferior arguments like this: (gdb) set args \$VAR In this case the user has escaped the '$' character, removing it's special meaning. When started under a shell the '\$VAR' will not result in variable expansion, and the inferior will see a literal '$VAR'. When startup-with-shell is off then gdb_argv is used to split the inferior arguments, and gdb_argv also removes the backslash escape, so the inferior again sees a literal '$VAR' string. I think this consistency makes sense. Now consider if the user sets the inferior arguments like this: (gdb) set args $VAR In the startup-with-shell case the shell will expand '$VAR' and the inferior will see whatever the value of '$VAR' was. But if startup-with-shell is off then there is no shell, and the inferior will just be passed a literal '$VAR' string. Given the above, when we consider using gdb_argv for splitting inferior arguments in order to pass them to a remote target, the problem we have is this: there is a many to one between input strings and output strings. Both '\$VAR' and '$VAR' both result in an output string of '$VAR'. The remote target can't know what the actual input was. If the remote adds the escaping then (when startup-with-shell is on) the shell will not substitute variables when it should, and if the remote doesn't add escaping then the shell will try to substitute variables when it shouldn't. So I think we have to admit that the current approach is not ideal, but we cannot change things too much; in simple cases (i.e. without too many special shell characters) the existing approach does work, so if we do something too different then existing targets are going to break even for simple cases. The current approach then is for the remote target to escape any special shell characters within the arguments received from GDB. This is too aggressive. An ideal solution might be to have GDB remove quoting, but to keep, or apply all required escaping, and for the remote target to do nothing but join the incoming arguments with a single whitespace character. This would certainly be easiest all around, but would break a lot of backward compatibility, for example given this: (gdb) set args 'ab cd' GDB could strip the single quotes, and escape the whitespace, sending the argument to the remote target as (ab\ cd) -- without the parenthesis -- then on the remote end the target could use this argument directly. However, if we adopt this approach existing targets, those that escape special shell characters, are likely to do the wrong thing, it's likely that both the backslash and space would be escaped. So, what we are searching for is a solution that sits somewhere half way between GDB performs all the escaping, and the current situation, where the target is expected to apply the escaping. I propose that the rules be something like: + GDB removes the quotes, and escapes all special shell characters except for the following: + Whitespace is not escaped, the remote is expected to escape any whitespace within an argument, + Quote characters are not escaped, the remote is expected to escape any single or double quote characters within an argument. + Any other special shell characters will have been escaped, if needed, by GDB, and the escaping included in the transmitted argument. The solution presented here takes into account unquoted arguments, as well as single and double quoted arguments. **WARNING**: This is a potentially breaking change in the remote protocol. A user, using an existing remote target, that handles inferior arguments in a particular way may find that with this change now causes some cases to behave differently. My hope is that by handling whitespace and quotes differently we can preserve enough behaviour that the problem cases are not too common. As part of this commit I've updated unittests/remote-arg-selftests.c, and it's possible to see examples of when the arguments passed from GDB to the remote will change, and when they remain the same. And there are some tests in gdb.base/startup-with-shell.exp that now pass even when using a remote target board and when the single-inferior-arg feature is turned off. All of the other argument passing tests, e.g. gdb.base/args.exp, and gdb.base/inferior-args.exp continue to pass, and also already run with the single-inferior-arg feature both enabled and disabled, so the changes in this commit should be well tested already. What isn't addressed at all in this commit is parameter expansion, command substitution, or arithmetic expansion. In some way these are similar to quotes, but instead of single or double quotes the argument is surrounded with ${...}, $(...), or $((....)). However, unlike single or double quotes I don't think that we can remove the quoting in these cases, if we did, how would the remote target know what to add back? However, I'm not proposing to even try to address these cases just yet. Ideally, if this whole series is accepted then by this point we will have already merged the patch that passes inferior arguments as a single string, a solution that handles all of the above cases. We can always revisit this later on and look at supporting these cases if needed. This work was inspired by this series: https://inbox.sourceware.org/gdb-patches/20211022071933.3478427-1-m.weghorn@posteo.de/ I originally started reviewing this series, and figured that, surely it must be possible to solve the problem that series tries to solve by just being "smarter" about how we split arguments. It was only when I actually tried to implement this, and started writing tests, that I realised that above series was right, and what we really need to do is pass all arguments as a single string. But I think that series is worth a read, and should get credit for reaching the right answer quicker than me! Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28392 Reviewed-By: Eli Zaretskii --- gdb/NEWS | 8 + gdb/doc/gdb.texinfo | 13 + gdb/testsuite/gdb.base/startup-with-shell.exp | 27 +-- gdb/unittests/remote-arg-selftests.c | 18 +- gdbsupport/remote-args.cc | 229 +++++++++++++++++- 5 files changed, 264 insertions(+), 31 deletions(-) diff --git a/gdb/NEWS b/gdb/NEWS index 8f0bf34fccc..206f72070bb 100644 --- a/gdb/NEWS +++ b/gdb/NEWS @@ -17,6 +17,14 @@ this flag is used gdbserver will not escape special shell characters within the inferior arguments. +* There have been changes to how inferior arguments are passed to + remote targets. When using the latest GDB and gdbserver the new + single-inf-arg will be used for argument passing. When using remote + targets that don't support single-inf-arg, GDB will now escape + special shell characters (except for whitespace and quotes) within + arguments before passing them to the remote target. Remote targets + will need updating to take this into account. + * Changed commands disassemble diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo index 6ff059d14ed..cc9c5ef0ccc 100644 --- a/gdb/doc/gdb.texinfo +++ b/gdb/doc/gdb.texinfo @@ -43137,6 +43137,19 @@ @var{argument} string, which includes all inferior arguments, separated with whitespace. +If @value{GDBN} didn't offer @samp{single-inf-arg+}, or the remote +didn't choose to use this feature, then @value{GDBN} will split the +inferior arguments before passing them to the remote target. Single +and double quotes surrounding any argument will have been removed by +@value{GDBN}. Any special shell characters within each argument will +have been escaped with a backslash character except for quote and +whitespace characters, these will not be escaped within the arguments, +the remote target is expected to escape these characters itself. + +These slightly complex rules exist for backward compatibility reasons. +When implementing a new remote target it is suggested that the +@samp{single-inf-arg} feature be supported. + @c FIXME: What about non-stop mode? This packet is only available in extended mode (@pxref{extended mode}). diff --git a/gdb/testsuite/gdb.base/startup-with-shell.exp b/gdb/testsuite/gdb.base/startup-with-shell.exp index f467774dc5f..3952bd774d4 100644 --- a/gdb/testsuite/gdb.base/startup-with-shell.exp +++ b/gdb/testsuite/gdb.base/startup-with-shell.exp @@ -55,11 +55,8 @@ proc initial_setup_simple { startup_with_shell run_args } { # it to ON_RE if startup-with-shell is on, otherwise compare to # OFF_RE. # -# If PROBLEMATIC_ON is true then when startup-with-shell is on we -# expect the comparison to fail, so setup an xfail. -# # TESTNAME is a string used in the test names. -proc run_test { args on_re off_re testname { problematic_on false } } { +proc run_test { args on_re off_re testname } { foreach startup_with_shell { "on" "off" } { with_test_prefix "$testname, startup_with_shell: ${startup_with_shell}" { if {![initial_setup_simple $startup_with_shell $args]} { @@ -68,14 +65,8 @@ proc run_test { args on_re off_re testname { problematic_on false } } { if { $startup_with_shell } { set re $on_re - set problematic $problematic_on } else { set re $off_re - set problematic false - } - - if { $problematic } { - setup_xfail "*-*-*" gdb/28392 } gdb_test "print argv\[1\]" "\\\$$::decimal = $::hex $re" $testname @@ -91,15 +82,14 @@ proc run_test_same { args re testname } { } # Run the actual tests -proc run_all_tests { { is_remote_with_split_args false } } { +proc run_all_tests { } { # The regexp to match a single '\' character. set bs "\\\\" run_test "$::unique_file_dir/*.unique-extension" \ "\"$::unique_file\"" \ "\"$::unique_file_dir/\\\*\.unique-extension\"" \ - "arg is glob" \ - $is_remote_with_split_args + "arg is glob" run_test_same "$::unique_file_dir/\\*.unique-extension" \ "\"$::unique_file_dir/\\\*\.unique-extension\"" \ @@ -110,8 +100,7 @@ proc run_all_tests { { is_remote_with_split_args false } } { run_test "\$TEST" \ "\"1234\"" \ "\"\\\$TEST\"" \ - "arg is shell variable" \ - $is_remote_with_split_args + "arg is shell variable" run_test_same "\\\$TEST" \ "\"\\\$TEST\"" \ @@ -121,14 +110,12 @@ proc run_all_tests { { is_remote_with_split_args false } } { run_test "\$(echo foo)" \ "\"foo\"" \ "\"\\\$\\(echo\"" \ - "arg is parameter expansion, command execution" \ - $is_remote_with_split_args + "arg is parameter expansion, command execution" run_test "\$((2 + 3))" \ "\"5\"" \ "\"\\\$\\(\\(2\"" \ - "arg is parameter expansion, expression evaluation" \ - $is_remote_with_split_args + "arg is parameter expansion, expression evaluation" run_test_same "\"\\a\"" \ "\"${bs}${bs}a\"" \ @@ -180,7 +167,7 @@ if {[target_info gdb_protocol] == "extended-remote"} { with_test_prefix "single-inferior-arg disabled" { save_vars { GDBFLAGS } { append GDBFLAGS " -ex \"set remote single-inferior-argument-feature-packet off\"" - run_all_tests true + run_all_tests } } } diff --git a/gdb/unittests/remote-arg-selftests.c b/gdb/unittests/remote-arg-selftests.c index b732d94792d..7abc9d46345 100644 --- a/gdb/unittests/remote-arg-selftests.c +++ b/gdb/unittests/remote-arg-selftests.c @@ -45,18 +45,26 @@ arg_test_desc desc[] = { { "a b c", { "a", "b", "c" }, "a b c" }, { "\"a b\" 'c d'", { "a b", "c d" }, "a\\ b c\\ d" }, { "\\' \\\"", { "'", "\"" }, "\\' \\\"" }, - { "'\\'", { "\\" }, "\\\\" }, - { "\"\\\\\" \"\\\\\\\"\"", { "\\", "\\\"" }, "\\\\ \\\\\\\"" }, + { "'\\'", { "\\\\" }, "\\\\" }, + { "\"\\\\\" \"\\\\\\\"\"", { "\\\\", "\\\\\"" }, "\\\\ \\\\\\\"" }, { "\\ \" \" ' '", { " ", " ", " "}, "\\ \\ \\ " }, { "\"'\"", { "'" }, "\\'" }, - { "'\"' '\\\"'", { "\"", "\\\"" } , "\\\" \\\\\\\""}, + { "'\"' '\\\"'", { "\"", "\\\\\"" } , "\\\" \\\\\\\""}, { "\"first arg\" \"\" \"third-arg\" \"'\" \"\\\"\" \"\\\\\\\"\" \" \" \"\"", - { "first arg", "", "third-arg", "'", "\"", "\\\""," ", "" }, + { "first arg", "", "third-arg", "'", "\"", "\\\\\""," ", "" }, "first\\ arg '' third-arg \\' \\\" \\\\\\\" \\ ''"}, { "\"\\a\" \"\\&\" \"\\#\" \"\\<\" \"\\^\"", - { "\\a", "\\&", "\\#" , "\\<" , "\\^"}, + { "\\\\a", "\\\\\\&", "\\\\\\#" , "\\\\\\<" , "\\\\\\^"}, + "\\\\a \\\\\\& \\\\\\# \\\\\\< \\\\\\^" }, + { "\"\\\\a\" \"\\\\&\" \"\\\\#\" \"\\\\<\" \"\\\\^\"", + { "\\\\a", "\\\\\\&", "\\\\\\#" , "\\\\\\<" , "\\\\\\^"}, "\\\\a \\\\\\& \\\\\\# \\\\\\< \\\\\\^" }, { "1 '\n' 3", { "1", "\n", "3" }, "1 '\n' 3" }, + { "abc* abc\\* abc\\\\*", { "abc*", "abc\\*", "abc\\\\*" }, + "abc* abc\\* abc\\\\*" }, + { "\"ab\\ cd\" \"ef\\'gh\"", { "ab\\\\ cd", "ef\\\\'gh" }, + "ab\\\\\\ cd ef\\\\\\'gh" }, + { "\"$VAR\" \"ab`foo`cd\"", { "$VAR", "ab`foo`cd" }, "$VAR ab`foo`cd" }, }; /* Run the remote argument passing self tests. */ diff --git a/gdbsupport/remote-args.cc b/gdbsupport/remote-args.cc index b808a64efce..77e4c1a4bc1 100644 --- a/gdbsupport/remote-args.cc +++ b/gdbsupport/remote-args.cc @@ -20,18 +20,235 @@ #include "gdbsupport/common-inferior.h" #include "gdbsupport/buildargv.h" +/* This file contains the function used for splitting an argument string + into separate arguments in preparation for sending the argument over the + remote protocol, as well as the function to merge the separate arguments + back together into a single argument string. + + The logic within these functions is slightly more complex that it should + be. This is in order to maintain some level of backward compatibility. + + In the following test example of command line arguments will be given. + To avoid confusion arguments, and argument strings will be delimited + with '(' and ')', the parenthesis are not part of the argument itself. + This is clearer than using quotes, as some of the examples will include + quotes within the arguments. + + Historically, the algorithm used to split the argument string into + separate arguments removed a level of quoting from the arguments. For + example consider the following argument string: (abc* abc\*). The + historic algorithm would split this into (abc*) and (abc*). Notice that + the two arguments are identical. On the remote end we are now destined + for failure, either we apply an escape to both '*' characters, or we + apply an escape to neither. In either case, we get one of the arguments + wrong. The historic approach was just broken. + + However, the historic approach has been in place for many years. + Clearly not all arguments were corrupted in the manor described above, + so lots of things did work. For example, the string: ("ab cd" "ef") + will be split into (ab cd) and (ef). And the string ('"') will become + just ("). + + What we can observer from all of these examples, is that the historic + approach at the remote end was to simple apply an escape to every + special shell character, quotes, white space, as well as every other + special character (e.g. (*)). The problem with this approach is that + sometimes special shell characters shouldn't be escaped. + + If we could start from scratch, then the simple approach would be to + retain all escaping while splitting the argument string, and, where + quotes are used, convert this into backslash escaping as neeeded. Thus + the argument string ("ab cd" "ef") would become (ab\ cd) and (ef). And + the argument string (abc* abc\*) would become (abc*) (abc\*). On the + remote end, joining these arguments is as simple as concatenation with a + single space character between. + + However, if we took this approach, then consider ("ab cd"). Previously + this was sent as (ab cd), but now it would be (ab\ cd). This breaks + backward compatibility. + + And so, this is where the complexity comes in. + + The strategy here is to split the arguments, removing all double and + single quotes. While removing quotes, special shell characters are + escaped as needed. But, white space characters, and quote characters + are not escaped. These characters must always be escaped, and so we can + safely drop the escape in these cases, this provides some degree of + backward compatibility. */ + +/* Return true if C is a double or single quote character. */ + +static bool +isquote (const char c) +{ + return c == '"' || c == '\''; +} + /* See remote-args.h. */ std::vector gdb::remote_args::split (std::string args) { - std::vector results; + std::vector remote_args; + + const char *input = args.c_str (); + bool squote = false, dquote = false; + +#ifdef __MINGW32__ + /* This holds all the characters considered special to the + Windows shells. */ + static const char special[] = "\"!&*|[]{}<>?`~^=;, \t\n"; +#else + /* This holds all the characters considered special to the + typical Unix shells. We include `^' because the SunOS + /bin/sh treats it as a synonym for `|'. */ + static const char special[] = "\"!#$&*()\\|[]{}<>?'`~^; \t\n"; +#endif + + /* Characters that are special within double quotes. */ + static const char dquote_special[] = "$`\\"; + + input = skip_spaces (input); + + do + { + std::string arg; + + while (*input != '\0') + { + if (isspace (*input) && !squote && !dquote) + break; + else if (*input == '\\' && !squote) + { + if (input[1] == '\0') + arg += input[0]; + else if (input[1] == '\n') + ++input; + else if (dquote && input[1] == '"') + { + arg += input[1]; + ++input; + } + else if (dquote && strchr (dquote_special, input[1]) != nullptr) + { + /* Within double quotes, these characters have special + meaning. If they are escaped with a backslash then we + need to preserve the escape once we remove the + quotes. */ + arg += input[0]; /* Backslash. */ + arg += input[1]; /* Special character. */ + ++input; + } + else if (dquote) + { + /* Within double quotes, none of the remaining characters + have any special meaning, the backslash before the + character is a literal backslash. + + To retain the literal backslash with the quotes + removed, we need to escape the backslash. */ + arg += input[0]; /* Backslash. */ + arg += input[0]; /* Backslash. */ + + /* If the character after the backslash has special + meaning outside of the double quotes, then we need to + escape it now, otherwise it will gain special meaning + as we remove the surrounding quotes. However, as per + the comments at the head of this file; we don't + escape white space or quotes. */ + if (!isspace (input[1]) + && !isquote (input[1]) + && strchr (special, input[1]) != nullptr) + arg += input[0]; /* Backslash. */ + + arg += input[1]; /* Character. */ + ++input; + } + else if (isspace (input[1]) || isquote (input[1])) + { + /* We remove the escaping from white space and quote + characters. */ + arg += input[1]; /* Character. */ + ++input; + } + else + { + /* For everything else, retain the escaping. */ + arg += input[0]; /* Backslash. */ + arg += input[1]; /* Character. */ + ++input; + } + } + else if (squote) + { + /* Inside a single quote argument there are no special + characters. A single quote finishes the argument. */ + if (*input == '\'') + squote = false; + /* Don't add escaping to white space or quotes. We already + handled single quotes above, so the isquote call here will + only find double quotes. */ + else if (isspace (*input) || isquote (*input)) + arg += *input; + /* Any other special shell character needs a backslash adding + to avoid the character gaining special meaning outside of + the single quotes. */ + else if (strchr (special, *input) != NULL) + { + arg += '\\'; + arg += *input; + } + /* Any other character just gets added to the output. */ + else + arg += *input; + } + else if (dquote) + { + /* Inside a double quoted argument. A double quote closes + the argument. An escaped double quote will have been + handled in the backslash handling block above. */ + if (*input == '"') + dquote = false; + /* Don't add escaping for white space or quotes. We already + handled double quotes above, so the isquote call here will + only find single quotes. */ + else if (isspace (*input) || isquote (*input)) + arg += *input; + /* Any character that is not one of the few characters that + retains its special meaning without double quotes, but is + otherwise a special character needs an escape character + adding, to avoid the character gaining special meaning + outside of the quotes. */ + else if (strchr (dquote_special, *input) == nullptr + && strchr (special, *input) != nullptr) + { + arg += '\\'; + arg += *input; + } + /* Anything else just gets passed through to the output. */ + else + arg += *input; + } + else + { + /* Found a character outside of a single or double quoted + argument, and not preceded by a backslash. */ + if (*input == '\'') + squote = true; + else if (*input == '"') + dquote = true; + else + arg += *input; + } + ++input; + } - gdb_argv argv (args.c_str ()); - for (int i = 0; argv[i] != nullptr; i++) - results.emplace_back (argv[i]); + remote_args.push_back (std::move (arg)); + input = skip_spaces (input); + } + while (*input != '\0'); - return results; + return remote_args; } /* See remote-args.h. */ @@ -39,5 +256,5 @@ gdb::remote_args::split (std::string args) std::string gdb::remote_args::join (std::vector> &args) { - return construct_inferior_arguments (args, escape_shell_characters); + return construct_inferior_arguments (args, escape_quotes_and_white_space); }