[v3,3/5] GDB: Only make data actually retrieved into value history available

Message ID alpine.DEB.2.20.2301211715160.7841@tpp.orcam.me.uk
State Superseded
Headers
Series gdb: introduce limited array lengths while printing values |

Commit Message

Maciej W. Rozycki Jan. 23, 2023, 11:14 p.m. UTC
  While it makes sense to allow accessing out-of-bounds elements in the 
debuggee and see whatever there might happen to be there in memory (we 
are a debugger and not a programming rules enforcement facility and we 
want to make people's life easier in chasing bugs), e.g.:

  (gdb) print one_hundred[-1]
  $1 = 0
  (gdb) print one_hundred[100]
  $2 = 0
  (gdb) 

we shouldn't really pretend that we have any meaningful data around 
values recorded in history (what these commands really retrieve are 
current debuggee memory contents outside the original data accessed, 
really confusing in my opinion).  Add code to mark data around values 
recorded in history unavailable then:

  (gdb) print one_hundred[-1]
  $1 = <unavailable>
  (gdb) print one_hundred[100]
  $2 = <unavailable>

Adjust `value_entirely_available' accordingly; also zero-length types 
need special handling in `value_entirely_covered_by_range_vector'.

Add a suitable test case, which also covers integer overflows in data 
location calculation.
---
Hi,

 While at it I noticed that while values recorded in the history are just 
as one would expect immutable, one can poke at the original data object 
data in history has originated from by writing to the value in the 
history, e.g.:

  (gdb) print -elements 3 -- one_hundred
  $1 = {0, 1, 2...}
  (gdb) print -elements 3 -- $1
  $2 = {0, 1, 2...}
  (gdb) set $1[0] = -1
  (gdb) print -elements 3 -- $1
  $3 = {0, 1, 2...}
  (gdb) print -elements 3 -- one_hundred
  $4 = {-1, 1, 2...}
  (gdb) 

This is undocumented and I find it surprising to say the least.  Shall I 
file a bug for this phenomenon?

 LONG_MIN is not used, but I have thought we can have it anyway for 
consistency and so that people do not try to invent local variants.

  Maciej

Changes from v2:

- Correct the copyright year in new files.

New change in v2.
---
 gdb/defs.h                                           |    8 ++
 gdb/testsuite/gdb.base/value-history-unavailable.c   |   29 +++++++
 gdb/testsuite/gdb.base/value-history-unavailable.exp |   73 +++++++++++++++++++
 gdb/valarith.c                                       |   15 +++
 gdb/value.c                                          |   24 ++++--
 5 files changed, 144 insertions(+), 5 deletions(-)

gdb-value-history-unavailable.diff
  

Comments

Tom Tromey Jan. 31, 2023, 6:47 p.m. UTC | #1
>>>>> "Maciej" == Maciej W Rozycki <macro@embecosm.com> writes:

Maciej> While it makes sense to allow accessing out-of-bounds elements in the 
Maciej> debuggee and see whatever there might happen to be there in memory (we 
Maciej> are a debugger and not a programming rules enforcement facility and we 
Maciej> want to make people's life easier in chasing bugs), e.g.:

Thank you for the patch.

Maciej> Index: src/gdb/defs.h
Maciej> ===================================================================
Maciej> --- src.orig/gdb/defs.h
Maciej> +++ src/gdb/defs.h
Maciej> @@ -469,6 +469,10 @@ enum val_prettyformat
Maciej>  #define	LONG_MAX ((long)(ULONG_MAX >> 1))   /* 0x7FFFFFFF for 32-bits */
Maciej>  #endif
 
Maciej> +#if !defined (LONG_MIN)                     /* 0x80000000 for 32-bits */
Maciej> +#define	LONG_MIN ((long) (~(long) 0 ^ LONG_MAX))
Maciej> +#endif

I wasn't aware of this code being here.

For LONG_*, I tend to think we don't need it.  <limits.h> should provide
it, and gnulib seems to think there aren't any systems without this:

https://www.gnu.org/software/gnulib/manual/html_node/limits_002eh.html

 
Maciej> +#if !defined (LONGEST_MIN)                 /* 0x8000000000000000 for 64-bits */
Maciej> +#define	LONGEST_MIN ((LONGEST) (~(LONGEST) 0 ^ LONGEST_MAX))
Maciej> +#endif

This one probably belongs in gdbsupport/common-types.h.

We should clean up this part of defs.h but I can do that.

Maciej> @@ -1950,6 +1955,15 @@ record_latest_value (struct value *val)
Maciej>       a value on the value history never changes.  */
Maciej>    if (value_lazy (val))
Maciej>      value_fetch_lazy (val);
Maciej> +
Maciej> +  /* Don't pretend we have anything available there in the history beyond
Maciej> +     the boundaries of the value recorded.  It's not like inferior memory
Maciej> +     where there is actual stuff underneath.  */
Maciej> +  ULONGEST length = value_enclosing_type (val)->length ();
Maciej> +  mark_value_bits_unavailable (val, LONGEST_MIN, 0 ^ LONGEST_MIN);
Maciej> +  mark_value_bits_unavailable (val, length * TARGET_CHAR_BIT,
Maciej> +			       LONGEST_MAX - length * TARGET_CHAR_BIT);

I'm struggling to understand why this is needed.
Like it seems to me that a value should always report that bytes outside
of its content range are unavailable.

Tom
  
Maciej W. Rozycki Feb. 10, 2023, 2:18 p.m. UTC | #2
On Tue, 31 Jan 2023, Tom Tromey wrote:

> Maciej> +#if !defined (LONG_MIN)                     /* 0x80000000 for 32-bits */
> Maciej> +#define	LONG_MIN ((long) (~(long) 0 ^ LONG_MAX))
> Maciej> +#endif
> 
> I wasn't aware of this code being here.
> 
> For LONG_*, I tend to think we don't need it.  <limits.h> should provide
> it, and gnulib seems to think there aren't any systems without this:
> 
> https://www.gnu.org/software/gnulib/manual/html_node/limits_002eh.html

 I guess it's still C thinking kicking around here at least on my side.  
Since we've switched to C++11 it's that language standard that we have 
been driven by, and it mandates the presence of these macros.  Except we 
need to switch to <climits>.

 I don't like to maintain a mess, so I'll repost this patch series with
a suitable update included.

> Maciej> +#if !defined (LONGEST_MIN)                 /* 0x8000000000000000 for 64-bits */
> Maciej> +#define	LONGEST_MIN ((LONGEST) (~(LONGEST) 0 ^ LONGEST_MAX))
> Maciej> +#endif
> 
> This one probably belongs in gdbsupport/common-types.h.
> 
> We should clean up this part of defs.h but I can do that.

 Already done with the upcoming <climits> update.

> Maciej> @@ -1950,6 +1955,15 @@ record_latest_value (struct value *val)
> Maciej>       a value on the value history never changes.  */
> Maciej>    if (value_lazy (val))
> Maciej>      value_fetch_lazy (val);
> Maciej> +
> Maciej> +  /* Don't pretend we have anything available there in the history beyond
> Maciej> +     the boundaries of the value recorded.  It's not like inferior memory
> Maciej> +     where there is actual stuff underneath.  */
> Maciej> +  ULONGEST length = value_enclosing_type (val)->length ();
> Maciej> +  mark_value_bits_unavailable (val, LONGEST_MIN, 0 ^ LONGEST_MIN);
> Maciej> +  mark_value_bits_unavailable (val, length * TARGET_CHAR_BIT,
> Maciej> +			       LONGEST_MAX - length * TARGET_CHAR_BIT);
> 
> I'm struggling to understand why this is needed.
> Like it seems to me that a value should always report that bytes outside
> of its content range are unavailable.

 I have investigated whether a check based solely on the length of the 
data type associated with the value recorded would do, but that triggered 
Python regressions:

FAIL: gdb.python/flexible-array-member.exp: python print(ns['items'][0])
FAIL: gdb.python/flexible-array-member.exp: python print(ns['items'][1])
FAIL: gdb.python/flexible-array-member.exp: python print(ns['items'][2])
FAIL: gdb.python/flexible-array-member.exp: python print(zs['items'][0])
FAIL: gdb.python/flexible-array-member.exp: python print(zs['items'][1])
FAIL: gdb.python/flexible-array-member.exp: python print(zs['items'][2])
FAIL: gdb.python/flexible-array-member.exp: python print(zso['items'][0])
FAIL: gdb.python/flexible-array-member.exp: python print(zso['items'][1])
FAIL: gdb.python/flexible-array-member.exp: python print(zso['items'][2])

where flexible array member elements are accessed like:

  (gdb) python ns = gdb.parse_and_eval('ns').dereference()
  (gdb) python print(ns['items'][0])

that is using a value associated with a Python variable created on the fly 
rather than a value history entry.  These now showed as <unavailable> and 
it broke the test cases, which expect actual values, i.e.:

  python print(ns['items'][0])
  <unavailable>
  (gdb) FAIL: gdb.python/flexible-array-member.exp: python print(ns['items'][0])

compared to:

  python print(ns['items'][0])
  101
  (gdb) PASS: gdb.python/flexible-array-member.exp: python print(ns['items'][0])

 So instead I looked into making value history entries a distinct lvalue 
type (it would have a nice side effect of showing "no such vector element" 
when accessing out-of-bounds array elements) replacing `lval_memory', 
`lval_register', etc.  It might also cure the history mutability issue I 
have observed (for which I have filed PR exp/30107 to track BTW).  But 
while possibly a nice clean-up I realised it would be a major task, so 
it's something perhaps for another occasion.

 So finally I came up with just a boolean flag set for entries recorded in 
the value history.  It has passed my regression testing now and I'll be 
posting it shortly along with the rest of the patches.  Unless I hear to 
the contrary, I'm going to follow your earlier approval for the changes 
already handled, but this gives you or someone else a chance to chime in 
just in case an issue gets spotted.

 Thank you for your review.

  Maciej
  
Tom Tromey Feb. 10, 2023, 9:11 p.m. UTC | #3
>>>>> "Maciej" == Maciej W Rozycki <macro@embecosm.com> writes:

>> I'm struggling to understand why this is needed.
>> Like it seems to me that a value should always report that bytes outside
>> of its content range are unavailable.

Maciej>  I have investigated whether a check based solely on the length of the 
Maciej> data type associated with the value recorded would do, but that triggered 
Maciej> Python regressions:

Interesting, thanks.  I guess the issue here is we don't know how big
the value should be in the "struct hack" case.

Tom
  

Patch

Index: src/gdb/defs.h
===================================================================
--- src.orig/gdb/defs.h
+++ src/gdb/defs.h
@@ -469,6 +469,10 @@  enum val_prettyformat
 #define	LONG_MAX ((long)(ULONG_MAX >> 1))   /* 0x7FFFFFFF for 32-bits */
 #endif
 
+#if !defined (LONG_MIN)                     /* 0x80000000 for 32-bits */
+#define	LONG_MIN ((long) (~(long) 0 ^ LONG_MAX))
+#endif
+
 #if !defined (ULONGEST_MAX)
 #define	ULONGEST_MAX (~(ULONGEST)0)        /* 0xFFFFFFFFFFFFFFFF for 64-bits */
 #endif
@@ -477,6 +481,10 @@  enum val_prettyformat
 #define	LONGEST_MAX ((LONGEST)(ULONGEST_MAX >> 1))
 #endif
 
+#if !defined (LONGEST_MIN)                 /* 0x8000000000000000 for 64-bits */
+#define	LONGEST_MIN ((LONGEST) (~(LONGEST) 0 ^ LONGEST_MAX))
+#endif
+
 /* * Convert a LONGEST to an int.  This is used in contexts (e.g. number of
    arguments to a function, number in a value history, register number, etc.)
    where the value must not be larger than can fit in an int.  */
Index: src/gdb/testsuite/gdb.base/value-history-unavailable.c
===================================================================
--- /dev/null
+++ src/gdb/testsuite/gdb.base/value-history-unavailable.c
@@ -0,0 +1,29 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+struct
+{
+  unsigned char x[1024];
+  unsigned char a[1024];
+  unsigned char y[1024];
+} a = { {-1} };
+
+int
+main ()
+{
+  return 0;
+}
Index: src/gdb/testsuite/gdb.base/value-history-unavailable.exp
===================================================================
--- /dev/null
+++ src/gdb/testsuite/gdb.base/value-history-unavailable.exp
@@ -0,0 +1,73 @@ 
+# Copyright (C) 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test GDB's value availability ranges.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" $testfile $srcfile] } {
+    return -1
+}
+
+if ![runto_main] then {
+    perror "couldn't run to breakpoint"
+    continue
+}
+
+set target_char_mask [get_valueof "/u" "a.x\[0]" "255" "get target char mask"]
+set target_char_bit 0
+for {set i $target_char_mask} {$i > 0} {set i [expr $i >> 1]} {
+    incr target_char_bit
+}
+set target_char_rank -1
+for {set i $target_char_bit} {$i > 0} {set i [expr $i >> 1]} {
+    incr target_char_rank
+}
+
+# Verify accesses to original inferior data.
+gdb_test "print a.a" "\\\$2 = '\\\\000' <repeats 1023 times>"
+gdb_test "print a.a\[-1\]" "\\\$3 = 0 '\\\\000'"
+gdb_test "print a.a\[1024\]" "\\\$4 = 0 '\\\\000'"
+
+# Verify in-range value history accesses.
+gdb_test "print \$2" "\\\$5 = '\\\\000' <repeats 1023 times>"
+gdb_test "print \$2\[0\]" "\\\$6 = 0 '\\\\000'"
+gdb_test "print \$2\[1023\]" "\\\$7 = 0 '\\\\000'"
+
+# Values outside the array recorded will have not been retrieved.
+gdb_test "print \$2\[-1\]" "\\\$8 = <unavailable>"
+gdb_test "print \$2\[1024\]" "\\\$9 = <unavailable>"
+gdb_test "print \$2\[-1LL << 63 - $target_char_rank\]" \
+	"\\\$10 = <unavailable>"
+gdb_test "print \$2\[(1LL << 63 - $target_char_rank) - 1\]" \
+	"\\\$11 = <unavailable>"
+
+# Accesses through pointers in history go straight to the inferior though.
+gdb_test "print \$2\[0\]@1" "\\\$12 = \"\""
+gdb_test "print \$2\[-1\]@1" "\\\$13 = \"\""
+gdb_test "print \$2\[1024\]@1" "\\\$14 = \"\""
+
+# Verify out-of-range value history accesses.
+gdb_test "print \$2\[(-1LL << 63 - $target_char_rank) - 1\]" \
+    "Integer overflow in data location calculation"
+gdb_test "print \$2\[(1LL << 63 - $target_char_rank)\]" \
+    "Integer overflow in data location calculation"
+gdb_test "print \$2\[-1LL << 63\]" \
+    "Integer overflow in data location calculation"
+gdb_test "print \$2\[(1ULL << 63) - 1\]" \
+    "Integer overflow in data location calculation"
+
+# Sanity-check a copy of an unavailable value.
+gdb_test "print \$11" "\\\$15 = <unavailable>"
Index: src/gdb/valarith.c
===================================================================
--- src.orig/gdb/valarith.c
+++ src/gdb/valarith.c
@@ -182,6 +182,21 @@  value_subscript (struct value *array, LO
 	}
 
       index -= *lowerbound;
+
+      /* Do not try to dereference a pointer to an unavailable value.
+	 Instead mock up a new one and give it the original address.  */
+      struct type *elt_type = check_typedef (tarray->target_type ());
+      LONGEST elt_size = type_length_units (elt_type);
+      if (!value_lazy (array)
+	  && !value_bytes_available (array, elt_size * index, elt_size))
+	{
+	  struct value *val = allocate_value (elt_type);
+	  mark_value_bytes_unavailable (val, 0, elt_size);
+	  VALUE_LVAL (val) = lval_memory;
+	  set_value_address (val, value_address (array) + elt_size * index);
+	  return val;
+	}
+
       array = value_coerce_array (array);
     }
 
Index: src/gdb/value.c
===================================================================
--- src.orig/gdb/value.c
+++ src/gdb/value.c
@@ -421,7 +421,9 @@  value_entirely_available (struct value *
   if (value->lazy)
     value_fetch_lazy (value);
 
-  if (value->unavailable.empty ())
+  if (value->unavailable.empty ()
+      || value_bytes_available (value, 0,
+				value_enclosing_type (value)->length ()))
     return 1;
   return 0;
 }
@@ -441,11 +443,12 @@  value_entirely_covered_by_range_vector (
 
   if (ranges.size () == 1)
     {
+      ULONGEST length = value_enclosing_type (value)->length ();
       const struct range &t = ranges[0];
 
-      if (t.offset == 0
-	  && t.length == (TARGET_CHAR_BIT
-			  * value_enclosing_type (value)->length ()))
+      if (length == 0)
+	return t.offset == 0 && t.length == 0;
+      if (t.offset <= 0 && t.offset + t.length >= TARGET_CHAR_BIT * length)
 	return 1;
     }
 
@@ -1277,7 +1280,9 @@  require_not_optimized_out (const struct
 static void
 require_available (const struct value *value)
 {
-  if (!value->unavailable.empty ())
+  if (!value->unavailable.empty ()
+      && !value_bytes_available (value, 0,
+				 value_enclosing_type (value)->length ()))
     throw_error (NOT_AVAILABLE_ERROR, _("value is not available"));
 }
 
@@ -1950,6 +1955,15 @@  record_latest_value (struct value *val)
      a value on the value history never changes.  */
   if (value_lazy (val))
     value_fetch_lazy (val);
+
+  /* Don't pretend we have anything available there in the history beyond
+     the boundaries of the value recorded.  It's not like inferior memory
+     where there is actual stuff underneath.  */
+  ULONGEST length = value_enclosing_type (val)->length ();
+  mark_value_bits_unavailable (val, LONGEST_MIN, 0 ^ LONGEST_MIN);
+  mark_value_bits_unavailable (val, length * TARGET_CHAR_BIT,
+			       LONGEST_MAX - length * TARGET_CHAR_BIT);
+
   /* We preserve VALUE_LVAL so that the user can find out where it was fetched
      from.  This is a bit dubious, because then *&$1 does not just return $1
      but the current contents of that location.  c'est la vie...  */