[PATCHv2] gdb: fix missing null-character when using value_cstring

Message ID 75628df1fbd72166504c9b2a368170121b86e072.1680849242.git.aburgess@redhat.com
State New
Headers
Series [PATCHv2] gdb: fix missing null-character when using value_cstring |

Commit Message

Andrew Burgess April 7, 2023, 6:35 a.m. UTC
  In v2:

 - Updated inline with Simon's suggestion.  value_cstring is now
   responsible for adding the null character, users of value_cstring
   have been updated as required.

---

In PR gdb/21699 an issue was reported with the $_as_string convenience
function.  It was observed that the string returned by this function,
when pushed into the inferior, was not null terminated.

This was causing problems when using the result with GDB's printf
command, as this command relies on the string having been pushed into
the inferior and being null terminated.

The bug includes a simple reproducer:

  #include <stddef.h>
  static char arena[51] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

  /* Override malloc() so value_coerce_to_target() gets a known pointer, and we
     know we"ll see an error if $_as_string() gives a string that isn't NULL
     terminated. */
  void
  *malloc (size_t size)
  {
      if (size > sizeof (arena))
          return NULL;
      return arena;
  }

  int
  main ()
  {
    return 0;
  }

Then use this in a GDB session like this:

  $ gdb -q test
  Reading symbols from /tmp/test...
  (gdb) start
  Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
  Starting program: /tmp/test

  Temporary breakpoint 1, main () at test.c:17
  17        return 0;
  (gdb) printf "%s\n", $_as_string("hello")
  "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  (gdb) quit

The problem is with the GDB function value_cstring, or at least, how
this function is being used.

When $_as_string is called we enter fnpy_call (python/py-function.c),
this calls into Python code.  The Python code returns a result, which
will be a Python string, and then we call convert_value_from_python to
convert the Python string to a GDB value.

In convert_value_from_python (python/py-value.c) we enter the
gdbpy_is_string case (as the result is a string), then we call
python_string_to_target_string, which returns a null terminated C
string.  Next we then make this call:

  value = value_cstring (s.get (), strlen (s.get ()),
                         builtin_type_pychar);

This passes the pointer to the first character 's.get ()' and the
length of the string 'strlen (s.get ())', however, this length does
not include the trailing null character.

If we look at value_cstring (valops.c) we see that an array is created
using the passed in length, and characters are copied into the newly
allocated array value.  This means we do not copy the strings trailing
null character, nor does value_cstring add a trailing null.

Finally, when we use this array value with printf, GDB pushed the
array to the inferior, which mallocs some space based on the array
length, and then copies the array content to the inferior.

As the array doesn't include a trailing null, none is written into the
inferior.  So what we place into the inferior is not a C string, but
is actually an array of characters.

When printf tries to print the value it starts at the address of the
first character and continues until it reaches a null.  When that will
be is undefined, so we may end up printing random garbage.

Now, ignore for a moment that the whole push an array to the inferior
just so we can fetch it in order to print it is clearly crazy.  That's
a problem for another day I think.  The important question here is:
should value_cstring include a null character or not.

Given the function name include 'cstring', which I'm assuming means C
style string, I think that we should be including a trailing null.

Currently, the use of value_cstring in c_string_operation::evaluate
includes the trailing null in the length sent to value_cstring, as do
two uses of value_cstring in str_value_from_setting.  Every other use
of value_cstring does not include the null in the length being sent,
and so is currently broken.

This commit changes value_cstring to add a null character at the end
of the generated array value, and updates c_string_operation::evaluate
and str_value_from_setting so that the length sent no longer includes
the trailing null.

I've added a header comment to value_cstring (value.h) to describe how
value_cstring works.

Upon testing there were two tests that failed after this fix,
gdb.base/settings.exp and gdb.python/py-mi.exp.  In both of these
cases we end up asking for the type of a character array allocated
through value_cstring.  The length of this array has now increased by
one.  Here's the previous behaviour:

  (gdb) set args abc
  (gdb) p $_gdb_setting("args")
  $1 = "abc"
  (gdb) ptype $1
  type = char [3]
  (gdb)

And here's the modified behaviour:

  (gdb) set args abc
  (gdb) p $_gdb_setting("args")
  $1 = "abc"
  (gdb) ptype $1
  type = char [4]
  (gdb)

Notice the type of $1 changed from an array of length 3 to an array of
length 4.  However, I don't think this is a bad thing, consider:

  char lit[] = "zzz";
  int
  main()
  {
    return 0;
  }

And in GDB:

  (gdb) ptype lit
  type = char [4]
  (gdb)

The null character is considered part of the array, so I think the new
behaviour makes sense.

I've added some new tests that try to exercise this issue in a few
different ways.  At the end of the day though it's all just variations
of the same thing, create a value by calling through value_cstring,
then force GDB to push the value to the inferior, and then treat that
as a C style string and see the lack of null character cause problems.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
---
 gdb/c-lang.c                             | 13 +++--
 gdb/cli/cli-cmds.c                       |  4 +-
 gdb/testsuite/gdb.base/cstring-exprs.c   | 51 ++++++++++++++++++
 gdb/testsuite/gdb.base/cstring-exprs.exp | 68 ++++++++++++++++++++++++
 gdb/testsuite/gdb.base/settings.exp      |  4 +-
 gdb/testsuite/gdb.python/py-mi.exp       |  2 +-
 gdb/valops.c                             | 12 +++++
 gdb/value.h                              |  6 +++
 8 files changed, 153 insertions(+), 7 deletions(-)
 create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.c
 create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.exp


base-commit: 9340f361097963011369c3339f7d28239d2f851b
  

Patch

diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index 6535ab93498..d845f1ad829 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -678,9 +678,16 @@  c_string_operation::evaluate (struct type *expect_type,
 		  obstack_object_size (&output));
 	}
       else
-	result = value_cstring ((const char *) obstack_base (&output),
-				obstack_object_size (&output),
-				type);
+	{
+	  /* We expect the string on the obstack to, at a minimum, contain
+	     a null character.  */
+	  const char *str = (const char *) obstack_base (&output);
+	  gdb_assert (obstack_object_size (&output) >= type->length ());
+	  ssize_t len = obstack_object_size (&output) - type->length ();
+	  for (i = 0; i < type->length (); ++i)
+	    gdb_assert (str[len + i] == '\0');
+	  result = value_cstring (str, len, type);
+	}
     }
   return result;
 }
diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
index 3b1c6a9f4bd..ad22a68a9ec 100644
--- a/gdb/cli/cli-cmds.c
+++ b/gdb/cli/cli-cmds.c
@@ -2310,7 +2310,7 @@  value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  return value_cstring (value, len,
 				builtin_type (gdbarch)->builtin_char);
 	else
-	  return value_cstring ("", 1,
+	  return value_cstring ("", 0,
 				builtin_type (gdbarch)->builtin_char);
       }
     default:
@@ -2395,7 +2395,7 @@  str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
 	  return value_cstring (value, len,
 				builtin_type (gdbarch)->builtin_char);
 	else
-	  return value_cstring ("", 1,
+	  return value_cstring ("", 0,
 				builtin_type (gdbarch)->builtin_char);
       }
     default:
diff --git a/gdb/testsuite/gdb.base/cstring-exprs.c b/gdb/testsuite/gdb.base/cstring-exprs.c
new file mode 100644
index 00000000000..e541bf0bf3f
--- /dev/null
+++ b/gdb/testsuite/gdb.base/cstring-exprs.c
@@ -0,0 +1,51 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <string.h>
+
+/* A memory area used as the malloc memory buffer.  */
+
+static char arena[] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
+
+/* Override malloc().  When GDB tries to push strings into the inferior we
+   always return the same pointer to arena.  This does mean we can't have
+   multiple strings in use at the same time, but that's fine for our basic
+   testing, and this is simpler than using dlsym.  */
+
+void *
+malloc (size_t size)
+{
+  memset (arena, 'X', sizeof (arena));
+  if (size > sizeof (arena))
+    return NULL;
+  return arena;
+}
+
+/* This function is called from GDB.  */
+
+void
+take_string (const char *str)
+{
+  /* Nothing.  */
+}
+
+int
+main (void)
+{
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/cstring-exprs.exp b/gdb/testsuite/gdb.base/cstring-exprs.exp
new file mode 100644
index 00000000000..fc2f96affa0
--- /dev/null
+++ b/gdb/testsuite/gdb.base/cstring-exprs.exp
@@ -0,0 +1,68 @@ 
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test different ways that we can cause GDB to call the value_string
+# function.  This function should create a C style string, i.e. the
+# string should include a null terminating character.
+#
+# If the value created by value_cstring is pushed into the inferior
+# and the null terminating character is missing, then we can get
+# unexpected results.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" $testfile $srcfile] } {
+    return -1
+}
+
+if {![runto_main]} {
+    return 0
+}
+
+if [allow_python_tests] {
+    # The $_as_string convenience function is implemented in Python.
+    gdb_test {printf "%s\n", $_as_string("aabbcc")} "\"aabbcc\""
+
+    # Create a gdb.Value object for a string.  Take its address (which
+    # forces it into the inferior), and then print the address as a
+    # string.
+    gdb_test_no_output {python addr = gdb.Value("xxyyzz").address}
+    gdb_test {python gdb.execute("x/1s 0x%x" % addr)} \
+	"$hex <arena>:\\s+\"xxyyzz\""
+
+    # Call an inferior function through a gdb.Value object, pass a
+    # string to the inferior function and ensure it arrives correctly.
+    gdb_test "p/x take_string" " = $hex.*"
+    gdb_test_no_output "python func_ptr = gdb.history (0)" \
+	"place address of take_string into Python variable"
+    gdb_test "python func_value = func_ptr.dereference()" ""
+    gdb_breakpoint "take_string"
+    gdb_test {python result = func_value("qqaazz")} \
+	"Breakpoint $decimal, take_string \\(str=$hex <arena> \"qqaazz\"\\) at .*"
+    gdb_test "continue" "Continuing\\."
+}
+
+# Use printf on a string parsed by the C expression parser.
+gdb_test {printf "%s\n", "ddeeff"} "ddeeff"
+
+# Parse a string in the C expression parser, force it into the
+# inferior by taking its address, then print it as a string.
+gdb_test {x/1s &"gghhii"} "$hex <arena>:\\s+\"gghhii\""
+
+# Use $_gdb_setting_str and $_gdb_maint_setting_str to create a string
+# value, and then print using printf, which forces the string into the
+# inferior.
+gdb_test {printf "%s\n", $_gdb_setting_str("arch")} "auto"
+gdb_test {printf "%s\n", $_gdb_maint_setting_str("bfd-sharing")} "on"
diff --git a/gdb/testsuite/gdb.base/settings.exp b/gdb/testsuite/gdb.base/settings.exp
index e54203924ec..9dd3ee174c7 100644
--- a/gdb/testsuite/gdb.base/settings.exp
+++ b/gdb/testsuite/gdb.base/settings.exp
@@ -504,7 +504,9 @@  proc_with_prefix test-enum {} {
     gdb_test_no_output "$set_cmd zzz"
     show_setting "$show_cmd" "zzz" 0 "yyy"
 
-    check_type "test-settings enum" "type = char \\\[3\\\]"
+    # A string literal includes the trailing null character, hence the
+    # array size of four here.
+    check_type "test-settings enum" "type = char \\\[4\\\]"
 
     test_gdb_complete_multiple "$set_cmd " "" "" {
 	"xxx"
diff --git a/gdb/testsuite/gdb.python/py-mi.exp b/gdb/testsuite/gdb.python/py-mi.exp
index 97b8a42f715..c65c83553f1 100644
--- a/gdb/testsuite/gdb.python/py-mi.exp
+++ b/gdb/testsuite/gdb.python/py-mi.exp
@@ -288,7 +288,7 @@  mi_create_dynamic_varobj nstype2 nstype2 ".*" 1 \
   "create nstype2 varobj"
 
 mi_list_varobj_children nstype2 {
-    { {nstype2.<error at 0>} {<error at 0>} 6 {char \[6\]} }
+    { {nstype2.<error at 0>} {<error at 0>} 7 {char \[7\]} }
 } "list children after setting exception flag"
 
 mi_create_varobj me me \
diff --git a/gdb/valops.c b/gdb/valops.c
index d002c9dca9b..13df292d251 100644
--- a/gdb/valops.c
+++ b/gdb/valops.c
@@ -1738,17 +1738,29 @@  value_array (int lowbound, int highbound, struct value **elemvec)
   return val;
 }
 
+/* See value.h.  */
+
 struct value *
 value_cstring (const char *ptr, ssize_t len, struct type *char_type)
 {
+  gdb_assert (len % char_type->length () == 0);
+
   struct value *val;
   int lowbound = current_language->string_lower_bound ();
   ssize_t highbound = len / char_type->length ();
+
+  /* Add one to HIGHBOUND to leave space for a null character.  */
+  highbound += 1;
+
   struct type *stringtype
     = lookup_array_range_type (char_type, lowbound, highbound + lowbound - 1);
 
   val = value::allocate (stringtype);
+  gdb_assert (val->type ()->length () >= len);
   memcpy (val->contents_raw ().data (), ptr, len);
+
+  /* Fill the null character.  */
+  memset (val->contents_raw ().data () + len, 0, char_type->length ());
   return val;
 }
 
diff --git a/gdb/value.h b/gdb/value.h
index e13f92c397b..86669a9a64b 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -1182,6 +1182,12 @@  class scoped_value_mark
   const struct value *m_value;
 };
 
+/* Create an array value, with element type CHAR_TYPE and length LEN.  The
+   array represents a C style string.  The content for the value is copied
+   from PTR.  This function will add a null-character at the end of the
+   resulting value, so LEN should not include the null from the source
+   string.  */
+
 extern struct value *value_cstring (const char *ptr, ssize_t len,
 				    struct type *char_type);
 extern struct value *value_string (const char *ptr, ssize_t len,