Force array coercion in c_get_string

Message ID 877ebh7sve.fsf@tromey.com
State New, archived
Headers

Commit Message

Tom Tromey April 25, 2019, 8:05 p.m. UTC
  Pedro> So with the patch, we coerce the array to inferior memory, in order
Pedro> to read its contents back from the inferior?  Is that right?
Pedro> Why not skip the detour and save the memory allocation in the inferior?

Tom> Yeah, that's weird, isn't it.

Plan B is a bit more complicated, but may be nicer because in some cases
it will require fewer memory reads, and also doesn't force coercion.

Let me know what you think.

Tom

commit 9acd65e278c569f47e73f9d455127b3379792c34
Author: Tom Tromey <tromey@adacore.com>
Date:   Thu Apr 25 12:14:58 2019 -0600

    Correctly handle non-C-style arrays in c_get_string
    
    A user here noticed that the Python Value.string method did not work
    for Ada arrays.  I tracked this down to an oddity in value_as_address
    -- namely, it calls coerce_array, but that function will not force
    array coercion when the language has c_style_arrays=false, as Ada
    does.
    
    This patch fixes the problem by changing c_get_string so that arrays
    take the "in GDB's memory" branch.  The actual patch is somewhat more
    complicated than you might think, because the caller can request more
    array elements than the type allows.  This is normal when the type is
    using the C struct hack.
    
    Tested on x86-64 Fedora 29.
    
    gdb/ChangeLog
    2019-04-25  Tom Tromey  <tromey@adacore.com>
    
            * c-lang.c (c_get_string): Handle non-C-style arrays.
    
    gdb/testsuite/ChangeLog
    2019-04-25  Tom Tromey  <tromey@adacore.com>
    
            * gdb.python/py-value.exp (test_value_in_inferior): Add Ada test.
  

Comments

Pedro Alves April 26, 2019, 4:42 p.m. UTC | #1
On 4/25/19 9:05 PM, Tom Tromey wrote:

>The actual patch is somewhat more
>     complicated than you might think, because the caller can request more
>     array elements than the type allows.  This is normal when the type is
>     using the C struct hack.

Which test exposes this issue?

I'm a bit confused here.  If we ever coerce an array and then read more
elements than the type allows, won't we be reading garbage passed the
coerced array elements?

Thanks,
Pedro Alves
  
Tom Tromey April 26, 2019, 4:49 p.m. UTC | #2
>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:

>> The actual patch is somewhat more
>> complicated than you might think, because the caller can request more
>> array elements than the type allows.  This is normal when the type is
>> using the C struct hack.

Pedro> Which test exposes this issue?

From py-value.exp:

  # Test fetching a string longer than its declared (in C) size.
  # PR 16286
  gdb_py_test_silent_cmd "python xstr = gdb.parse_and_eval('xstr')" "get xstr" 1
  gdb_test "python print(xstr\['text'\].string (length = xstr\['length'\]))" "x{100}" \
    "read string beyond declared size"

Pedro> I'm a bit confused here.  If we ever coerce an array and then read more
Pedro> elements than the type allows, won't we be reading garbage passed the
Pedro> coerced array elements?

In the case where we want to read past the array limit, c_get_string
will take the second branch, which does an explicit read_string.

Tom
  
Pedro Alves April 26, 2019, 7:44 p.m. UTC | #3
On 4/26/19 5:49 PM, Tom Tromey wrote:
>>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:
> 
>>> The actual patch is somewhat more
>>> complicated than you might think, because the caller can request more
>>> array elements than the type allows.  This is normal when the type is
>>> using the C struct hack.
> 
> Pedro> Which test exposes this issue?
> 
> From py-value.exp:
> 
>   # Test fetching a string longer than its declared (in C) size.
>   # PR 16286
>   gdb_py_test_silent_cmd "python xstr = gdb.parse_and_eval('xstr')" "get xstr" 1
>   gdb_test "python print(xstr\['text'\].string (length = xstr\['length'\]))" "x{100}" \
>     "read string beyond declared size"
> 
> Pedro> I'm a bit confused here.  If we ever coerce an array and then read more
> Pedro> elements than the type allows, won't we be reading garbage passed the
> Pedro> coerced array elements?
> 
> In the case where we want to read past the array limit, c_get_string
> will take the second branch, which does an explicit read_string.

Try this:

 (gdb) python print (gdb.parse_and_eval("\"ab\"").string (length = 10))

This will go to the "in GDB's memory" branch, but it will
read beyond the value's contents buffer.

That's preexisting, but it's related to what I was saying above.

Looking at this comment:

> +     An array is assumed to live in GDB's memory, so we take this path
> +     here.  However, it's possible for the caller to request more
> +     array elements than apparently exist -- this can happen when
> +     using the C struct hack.  So, only do this for an array if either
> +     no length was specified, or the length is within the existing
> +     bounds.  */

in my previous message, I was thinking that in the second branch we'd
end up copying the array to target memory and then read past its end.

We won't copy the array to memory, because the other hunk skips
value_as_address.

However,

>    else
>      {
> -      CORE_ADDR addr = value_as_address (value);
> +      /* value_as_address calls coerce_array, but that won't actually
> +	 coerce the array if c_style_arrays is false.  In this case,
> +	 get the address of the array directly.  */

Note: I found this comment confusing.  The first sentence suggests that you
_want_ the coercion, including copying a "GDB memory" array to target
memory, but you don't get it because of c_style_arrays.
But I don't think getting the raw address of an array is coercion,
not in the sense of converting to a different type, like a pointer
to the first element, as in C.

> +      CORE_ADDR addr;
> +      if (TYPE_CODE (type) == TYPE_CODE_ARRAY)
> +	addr = value_address (value);

Can we get here with an LVAL != lval_memory?  

I assume so, given the condition:

  if ((VALUE_LVAL (value) == not_lval
       || VALUE_LVAL (value) == lval_internalvar)
      && fetchlimit != UINT_MAX)

So if we get to the second branch with:

 VALUE_LVAL (value) == not_lval
 TYPE_CODE (type) == TYPE_CODE_ARRAY
 fetchlimit == UINT_MAX

We'll be calling value_address on a non-lval_memory value.
value_address returns address 0 in that case, does not throw
an error (maybe it should).

value_coerce_array, if it were reached, has this:

  if (VALUE_LVAL (arg1) != lval_memory)
    error (_("Attempt to take address of value not located in memory."));

So shouldn't we have such a check here too?

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 17e10e869b8..b8e018a97e5 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,7 @@ 
+2019-04-25  Tom Tromey  <tromey@adacore.com>
+
+	* c-lang.c (c_get_string): Handle non-C-style arrays.
+
 2019-04-25  Tom Tromey  <tromey@adacore.com>
 
 	PR gdb/24475:
diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index 33506f1d1ed..0c189aae606 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -279,9 +279,18 @@  c_get_string (struct value *value, gdb::unique_xmalloc_ptr<gdb_byte> *buffer,
   /* If the string lives in GDB's memory instead of the inferior's,
      then we just need to copy it to BUFFER.  Also, since such strings
      are arrays with known size, FETCHLIMIT will hold the size of the
-     array.  */
+     array.
+
+     An array is assumed to live in GDB's memory, so we take this path
+     here.  However, it's possible for the caller to request more
+     array elements than apparently exist -- this can happen when
+     using the C struct hack.  So, only do this for an array if either
+     no length was specified, or the length is within the existing
+     bounds.  */
   if ((VALUE_LVAL (value) == not_lval
-       || VALUE_LVAL (value) == lval_internalvar)
+       || VALUE_LVAL (value) == lval_internalvar
+       || (TYPE_CODE (type) == TYPE_CODE_ARRAY
+	   && (*length < 0 || *length <= fetchlimit)))
       && fetchlimit != UINT_MAX)
     {
       int i;
@@ -306,7 +315,14 @@  c_get_string (struct value *value, gdb::unique_xmalloc_ptr<gdb_byte> *buffer,
     }
   else
     {
-      CORE_ADDR addr = value_as_address (value);
+      /* value_as_address calls coerce_array, but that won't actually
+	 coerce the array if c_style_arrays is false.  In this case,
+	 get the address of the array directly.  */
+      CORE_ADDR addr;
+      if (TYPE_CODE (type) == TYPE_CODE_ARRAY)
+	addr = value_address (value);
+      else
+	addr = value_as_address (value);
 
       /* Prior to the fix for PR 16196 read_string would ignore fetchlimit
 	 if length > 0.  The old "broken" behaviour is the behaviour we want:
diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index c81d841e93a..7127cbdc75f 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,7 @@ 
+2019-04-25  Tom Tromey  <tromey@adacore.com>
+
+	* gdb.python/py-value.exp (test_value_in_inferior): Add Ada test.
+
 2019-04-25  Sergio Durigan Junior  <sergiodj@redhat.com>
 
 	PR corefiles/11608
diff --git a/gdb/testsuite/gdb.python/py-value.exp b/gdb/testsuite/gdb.python/py-value.exp
index b3d90b52272..7f42d31572a 100644
--- a/gdb/testsuite/gdb.python/py-value.exp
+++ b/gdb/testsuite/gdb.python/py-value.exp
@@ -315,6 +315,13 @@  proc test_value_in_inferior {} {
   gdb_test "python print (\"---\"+st.string (length = 0)+\"---\")" "------" "test string (length = 0) is empty"
   gdb_test "python print (len(st.string (length = 0)))" "0" "test length is 0"
 
+  # We choose Ada here to test a language where c_style_arrays is
+  # false.
+  gdb_test "set lang ada" \
+      "Warning: the current language does not match this frame."
+  gdb_test "python print (st.string ())"  "divide et impera"  \
+      "Test string with no length in ada"
+  gdb_test_no_output "set lang auto"
 
   # Fetch a string that has embedded nulls.
   gdb_test "print nullst" "\"divide\\\\000et\\\\000impera\".*"