[01/16] libiberty/buildargv: POSIX behaviour for backslash handling

Message ID cb38535ada7d18f2b11be79174d2afe3584866de.1704809585.git.aburgess@redhat.com
State New
Series Inferior argument (inc for remote targets) changes

Commit Message

Andrew Burgess Jan. 9, 2024, 2:26 p.m. UTC
  This is a libiberty patch.  I've posted this to the gcc-patches list
here:

  https://inbox.sourceware.org/gcc-patches/24a8d878590403540bc9b579ba58805985a4d2f7.1701881419.git.aburgess@redhat.com/

However, GCC is currently in stage 4 of its release cycle.  Based on
the timing of previous releases, I'm not expecting this patch to be
merged before April.

One option is clearly to just wait until GCC hits stage 1, and then
try to get this patch merged.

Another option would be to create a GDB-only fork of buildargv which
includes this patch.  In April, if/when I manage to get this patch
merged, I would remove our GDB-local copy.  Of course, there's a risk
that this patch isn't accepted into GCC, in which case we might be
stuck with a GDB-only fork.

Either way, I figure the first step is to address any issues that are
raised with the rest of this series.  This could well take until April
anyway, by which time GCC might be back in stage 1.

Thanks,
Andrew

---

GDB makes use of the libiberty function buildargv for splitting the
inferior (program being debugged) argument string in some situations.

I have recently been working to fix some edge-case issues in this
area of GDB, and have tracked down some of the unexpected behaviour to
the libiberty function buildargv, and how it handles backslash
escapes.

For reference, I've been mostly reading:

  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html

The issues that I would like to fix are:

  1. Backslashes within single quotes should not be treated as an
  escape, thus: '\a' should split to \a, retaining the backslash.

  2. Backslashes within double quotes should only act as an escape if
  they are immediately before one of the characters $ (dollar),
  ` (backtick), " (double quote), \ (backslash), or \n (newline).  In
  all other cases a backslash should not be treated as an escape
  character.  Thus: "\a" should split to \a, but "\$" should split to
  $.

  3. A backslash-newline sequence should be treated as a line
  continuation: both the backslash and the newline should be removed.
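
To make the three rules concrete, here is a rough stand-alone sketch.
This is not the code in the patch: dequote_posix is a made-up helper
name, it processes a single string rather than building an argv, and
it ignores whitespace splitting entirely.  It also assumes that a
trailing backslash with nothing after it is kept as a literal
character, which is just a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libiberty: copy IN to OUT, removing
   quotes and backslash escapes according to the three rules above.
   OUT must have room for strlen (IN) + 1 bytes.  */
static void
dequote_posix (const char *in, char *out)
{
  int squote = 0, dquote = 0;

  assert (in != NULL && out != NULL);

  for (; *in != '\0'; in++)
    {
      if (squote)
        {
          /* Rule 1: everything up to the closing quote is literal.  */
          if (*in == '\'')
            squote = 0;
          else
            *out++ = *in;
        }
      else if (*in == '\\' && in[1] != '\0'
               && (!dquote || strchr ("$`\"\\\n", in[1]) != NULL))
        {
          /* Rules 2 and 3: drop the backslash, then keep the next
             character unless it is a newline (line continuation).  */
          in++;
          if (*in != '\n')
            *out++ = *in;
        }
      else if (dquote)
        {
          /* Any other backslash inside double quotes lands here and
             is copied through unchanged.  */
          if (*in == '"')
            dquote = 0;
          else
            *out++ = *in;
        }
      else if (*in == '\'')
        squote = 1;
      else if (*in == '"')
        dquote = 1;
      else
        *out++ = *in;
    }
  *out = '\0';
}
```

With this sketch, the inputs '\a' and "\a" both come out as \a, "\$"
comes out as $, and ab\<newline>cd comes out as abcd.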

I've updated libiberty and also added some tests.  All the existing
libiberty tests continue to pass, but I'm not sure if there is
additional testing that should be done.
---
 libiberty/argv.c                      |  8 +++++--
 libiberty/testsuite/test-expandargv.c | 34 +++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 2 deletions(-)
  

Patch

diff --git a/libiberty/argv.c b/libiberty/argv.c
index c2823d3e4ba..6bae4ca2ee9 100644
--- a/libiberty/argv.c
+++ b/libiberty/argv.c
@@ -224,9 +224,13 @@  char **buildargv (const char *input)
 		  if (bsquote)
 		    {
 		      bsquote = 0;
-		      *arg++ = *input;
+		      if (*input != '\n')
+			*arg++ = *input;
 		    }
-		  else if (*input == '\\')
+		  else if (*input == '\\'
+			   && !squote
+			   && (!dquote
+			       || strchr ("$`\"\\\n", *(input + 1)) != NULL))
 		    {
 		      bsquote = 1;
 		    }
diff --git a/libiberty/testsuite/test-expandargv.c b/libiberty/testsuite/test-expandargv.c
index 30f2337ef77..b8dcc6a269a 100644
--- a/libiberty/testsuite/test-expandargv.c
+++ b/libiberty/testsuite/test-expandargv.c
@@ -142,6 +142,40 @@  const char *test_data[] = {
   "b",
   0,
 
+  /* Test 7 - No backslash removal within single quotes.  */
+  "'a\\$VAR' '\\\"'",    /* Test 7 data */
+  ARGV0,
+  "@test-expandargv-7.lst",
+  0,
+  ARGV0,
+  "a\\$VAR",
+  "\\\"",
+  0,
+
+  /* Test 8 - Remove backslash / newline pairs.  */
+  "\"ab\\\ncd\" ef\\\ngh",    /* Test 8 data */
+  ARGV0,
+  "@test-expandargv-8.lst",
+  0,
+  ARGV0,
+  "abcd",
+  "efgh",
+  0,
+
+  /* Test 9 - Backslash within double quotes.  */
+  "\"\\$VAR\" \"\\`\" \"\\\"\" \"\\\\\" \"\\n\" \"\\t\"",    /* Test 9 data */
+  ARGV0,
+  "@test-expandargv-9.lst",
+  0,
+  ARGV0,
+  "$VAR",
+  "`",
+  "\"",
+  "\\",
+  "\\n",
+  "\\t",
+  0,
+
   0 /* Test done marker, don't remove. */
 };