diff mbox

[doc,RFA] Fix lazy string type docs

Message ID 94eb2c1121f421b06405406923b6@google.com
State New
Headers show

Commit Message

Doug Evans Nov. 3, 2016, 5:46 p.m. UTC
Hi.

I was trying to understand a problem I was having with python lazy strings.
It turns out the docs are wrong, and the "type" attribute of a lazy
string is the character type, not a pointer to the character's type.

Tested on amd64-linux.

2016-11-03  Doug Evans  <dje@google.com>

	* python/py-lazy-string.c (lazy_string_object): Add comment.

	doc/
	* guile.texi (Lazy Strings In Guile): Fix docs for lazy-string-type.
	* python.texi (Lazy Strings In Python): Fix docs for LazyString.type.

	testsuite/
	* gdb.guile/scm-value.exp (test_lazy_strings): Add test for
	lazy-string-type.
	* gdb.python/py-value.exp (test_lazy_strings): Add test for
	LazyString.type.

Comments

Eli Zaretskii Nov. 3, 2016, 6:21 p.m. UTC | #1
> Date: Thu, 03 Nov 2016 17:46:28 +0000
> From: Doug Evans <dje@google.com>
> 
> +Return the type of a character in @var{lazy-string}.

Hmm... what is "type of a character" in this context?

I have no comments to the markup or the text.

Thanks.
Doug Evans Nov. 3, 2016, 6:35 p.m. UTC | #2
Apologies for the resend. text/html strikes again.

On Thu, Nov 3, 2016 at 11:21 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Thu, 03 Nov 2016 17:46:28 +0000
>> From: Doug Evans <dje@google.com>
>>
>> +Return the type of a character in @var{lazy-string}.
>
> Hmm... what is "type of a character" in this context?
>
> I have no comments to the markup or the text.
>
> Thanks.

Consider char vs wchar_t.
e.g.,
#include <stddef.h>
char foo[] = "bar";
wchar_t wide_foo[] = L"bar";
Doug Evans Nov. 3, 2016, 7:03 p.m. UTC | #3
On Thu, Nov 3, 2016 at 10:46 AM, Doug Evans <dje@google.com> wrote:
> Hi.
>
> I was trying to understand a problem I was having with python lazy strings.
> It turns out the docs are wrong, and the "type" attribute of a lazy
> string is the character type, not a pointer to the character's type.

... Unless the thing we're making a lazy string out of is an array
instead of a pointer.

const char* foo = "Dave";
const char bar[] = "No, man, I'm Dave.";

(gdb) py print gdb.parse_and_eval("foo").lazy_string().type
const char
(gdb) py print gdb.parse_and_eval("bar").lazy_string().type
const char [19]

I don't have a strong opinion on what the correct answer is, but there
is certainly a bug here.

Phil, do you remember why this code exists in valpy_lazy_string():

      if (TYPE_CODE (value_type (value)) == TYPE_CODE_PTR)
        value = value_ind (value);

[lazy string support went in in commit be759fcf]

I kinda like the type of the lazy string version of a char array still
being a char array.

OTOH, when I look at the definition of a lazy string, recording the
character type makes more sense
(to me anyway: e.g., length would be redundant for the case where type
is a char array)

typedef struct {
  PyObject_HEAD
  /*  Holds the address of the lazy string.  */
  CORE_ADDR address;

  /*  Holds the encoding that will be applied to the string
      when the string is printed by GDB.  If the encoding is set
      to None then GDB will select the most appropriate
      encoding when the sting is printed.  */
  char *encoding;

  /* Holds the length of the string in characters.  If the
     length is -1, then the string will be fetched and encoded up to
     the first null of appropriate width.  */
  long length;

  /*  This attribute holds the type that is represented by the lazy
      string's type.  */
  struct type *type;
} lazy_string_object;

To fix this bug we're going to have to break one or the other, or add
a knob (bleah) to control the old, broken, behaviour.
Or mark the "type" LazyString attribute as broken/deprecated and
provide a new attribute with correct behaviour.

Opinions?
Eli Zaretskii Nov. 3, 2016, 7:07 p.m. UTC | #4
> From: Doug Evans <dje@google.com>
> Date: Thu, 3 Nov 2016 11:30:39 -0700
> Cc: gdb-patches <gdb-patches@sourceware.org>
> 
> On Thu, Nov 3, 2016 at 11:21 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>  > Date: Thu, 03 Nov 2016 17:46:28 +0000
>  > From: Doug Evans <dje@google.com>
>  >
>  > +Return the type of a character in @var{lazy-string}.
> 
>  Hmm... what is "type of a character" in this context?
> 
>  I have no comments to the markup or the text.
> 
>  Thanks.
> 
> ​Consider​ char vs wchar_t.
> e.g.,
> #include <stddef.h>
> char foo[] = "bar";
> wchar_t wide_foo[] = L"bar";

So maybe it would be more clear to say "the type of characters",
plural?  And perhaps give an example or two of what types these could
be?

Thanks.
André Pönitz Nov. 3, 2016, 10:29 p.m. UTC | #5
On Thu, Nov 03, 2016 at 12:03:46PM -0700, Doug Evans wrote:
> To fix this bug we're going to have to break one or the other, or add
> a knob (bleah) to control the old, broken, behaviour.

Adding a knob to enable the old behaviour does not really help
frontends, as with the same effort a frontend adds a "set knob
enabled" command it could just adapt to the new behaviour.

> Or mark the "type" LazyString attribute as broken/deprecated and
> provide a new attribute with correct behaviour.  [...]
> Opinions?

To be honest, I don't really have an opinion on which is better or even
correct, as it is still rather unclear to me in which situation a
frontend would benefit from using gdb.LazyString at all. To me, the need
to specify an encoding or rely on GDB to select something appropriate
seems to be a disadvantage over keeping a (address, length) pair and
using gdb.Inferior.read_memory() when needed, especially when the
contents of that memory can be corrupted and not necessarily contain
data of an intended encoding anymore.

André

PS: 

> [...] will select the most appropriate
> encoding when the sting is printed [...]
 
s/sting/string/ ? (also copied to guile/scm-lazy-string.c)
Doug Evans Nov. 3, 2016, 11:54 p.m. UTC | #6
On Thu, Nov 3, 2016 at 12:07 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Doug Evans <dje@google.com>
>> Date: Thu, 3 Nov 2016 11:30:39 -0700
>> Cc: gdb-patches <gdb-patches@sourceware.org>
>>
>> On Thu, Nov 3, 2016 at 11:21 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>>  > Date: Thu, 03 Nov 2016 17:46:28 +0000
>>  > From: Doug Evans <dje@google.com>
>>  >
>>  > +Return the type of a character in @var{lazy-string}.
>>
>>  Hmm... what is "type of a character" in this context?
>>
>>  I have no comments to the markup or the text.
>>
>>  Thanks.
>>
>> Consider char vs wchar_t.
>> e.g.,
>> #include <stddef.h>
>> char foo[] = "bar";
>> wchar_t wide_foo[] = L"bar";
>
> So maybe it would be more clear to say "the type of characters",
> plural?  And perhaps give an example or two of what types these could
> be?

Or how about "the type of an element in the string" ?
Doug Evans Nov. 4, 2016, midnight UTC | #7
On Thu, Nov 3, 2016 at 12:03 PM, Doug Evans <dje@google.com> wrote:
> On Thu, Nov 3, 2016 at 10:46 AM, Doug Evans <dje@google.com> wrote:
>> Hi.
>>
>> I was trying to understand a problem I was having with python lazy strings.
>> It turns out the docs are wrong, and the "type" attribute of a lazy
>> string is the character type, not a pointer to the character's type.
>
> ... Unless the thing we're making a lazy string out of is an array
> instead of a pointer.
>
> const char* foo = "Dave";
> const char bar[] = "No, man, I'm Dave.";
>
> (gdb) py print gdb.parse_and_eval("foo").lazy_string().type
> const char
> (gdb) py print gdb.parse_and_eval("bar").lazy_string().type
> const char [19]
>
> I don't have a strong opinion on what the correct answer is, but there
> is certainly a bug here.
>
> Phil, do you remember why this code exists in valpy_lazy_string():
>
>       if (TYPE_CODE (value_type (value)) == TYPE_CODE_PTR)
>         value = value_ind (value);
>
> [lazy string support went in in commit be759fcf]
>
> I kinda like the type of the lazy string version of a char array still
> being a char array.
>
> OTOH, when I look at the definition of a lazy string, recording the
> character type makes more sense
> (to me anyway: e.g., length would be redundant for the case where type
> is a char array)
>
> typedef struct {
>   PyObject_HEAD
>   /*  Holds the address of the lazy string.  */
>   CORE_ADDR address;
>
>   /*  Holds the encoding that will be applied to the string
>       when the string is printed by GDB.  If the encoding is set
>       to None then GDB will select the most appropriate
>       encoding when the sting is printed.  */
>   char *encoding;
>
>   /* Holds the length of the string in characters.  If the
>      length is -1, then the string will be fetched and encoded up to
>      the first null of appropriate width.  */
>   long length;
>
>   /*  This attribute holds the type that is represented by the lazy
>       string's type.  */
>   struct type *type;
> } lazy_string_object;
>
> To fix this bug we're going to have to break one or the other, or add
> a knob (bleah) to control the old, broken, behaviour.
> Or mark the "type" LazyString attribute as broken/deprecated and
> provide a new attribute with correct behaviour.

Ok, an audit of all internal uses that I can find of lazy_string->type
use the type as an element of the string.

E.g.: py-pretty-print.c:print_children has:

          gdbpy_extract_lazy_string (py_v, &addr, &type, &length, &encoding);

          local_opts.addressprint = 0;
          val_print_string (type, encoding, addr, (int) length, stream,
                            &local_opts);

val_print_string takes a string element type as the first arg.

I think the right thing to do here is go with the posted patch (after
doc is ok).
And I'll post another patch to fix char[] handling (I'll file a pr for
that too, assuming my search fu can't find an existing one).
Plus I have another patch after that to fix another aspect of lazy
string handling.
Eli Zaretskii Nov. 4, 2016, 7:32 a.m. UTC | #8
> From: Doug Evans <dje@google.com>
> Date: Thu, 3 Nov 2016 16:54:13 -0700
> Cc: gdb-patches <gdb-patches@sourceware.org>
> 
> >> Consider char vs wchar_t.
> >> e.g.,
> >> #include <stddef.h>
> >> char foo[] = "bar";
> >> wchar_t wide_foo[] = L"bar";
> >
> > So maybe it would be more clear to say "the type of characters",
> > plural?  And perhaps give an example or two of what types these could
> > be?
> 
> Or how about "the type of an element in the string" ?

Yes, that'd be fine, too.

Thanks.
Doug Evans Nov. 4, 2016, 5:52 p.m. UTC | #9
On Thu, Nov 3, 2016 at 5:00 PM, Doug Evans <dje@google.com> wrote:
> On Thu, Nov 3, 2016 at 12:03 PM, Doug Evans <dje@google.com> wrote:
>> On Thu, Nov 3, 2016 at 10:46 AM, Doug Evans <dje@google.com> wrote:
>>> Hi.
>>>
>>> I was trying to understand a problem I was having with python lazy strings.
>>> It turns out the docs are wrong, and the "type" attribute of a lazy
>>> string is the character type, not a pointer to the character's type.
>>
>> ... Unless the thing we're making a lazy string out of is an array
>> instead of a pointer.
>>
>> const char* foo = "Dave";
>> const char bar[] = "No, man, I'm Dave.";
>>
>> (gdb) py print gdb.parse_and_eval("foo").lazy_string().type
>> const char
>> (gdb) py print gdb.parse_and_eval("bar").lazy_string().type
>> const char [19]
>>
>> I don't have a strong opinion on what the correct answer is, but there
>> is certainly a bug here.
>>
>> Phil, do you remember why this code exists in valpy_lazy_string():
>>
>>       if (TYPE_CODE (value_type (value)) == TYPE_CODE_PTR)
>>         value = value_ind (value);
>>
>> [lazy string support went in in commit be759fcf]
>>
>> I kinda like the type of the lazy string version of a char array still
>> being a char array.
>>
>> OTOH, when I look at the definition of a lazy string, recording the
>> character type makes more sense
>> (to me anyway: e.g., length would be redundant for the case where type
>> is a char array)
>>
>> typedef struct {
>>   PyObject_HEAD
>>   /*  Holds the address of the lazy string.  */
>>   CORE_ADDR address;
>>
>>   /*  Holds the encoding that will be applied to the string
>>       when the string is printed by GDB.  If the encoding is set
>>       to None then GDB will select the most appropriate
>>       encoding when the sting is printed.  */
>>   char *encoding;
>>
>>   /* Holds the length of the string in characters.  If the
>>      length is -1, then the string will be fetched and encoded up to
>>      the first null of appropriate width.  */
>>   long length;
>>
>>   /*  This attribute holds the type that is represented by the lazy
>>       string's type.  */
>>   struct type *type;
>> } lazy_string_object;
>>
>> To fix this bug we're going to have to break one or the other, or add
>> a knob (bleah) to control the old, broken, behaviour.
>> Or mark the "type" LazyString attribute as broken/deprecated and
>> provide a new attribute with correct behaviour.
>
> Ok, an audit of all internal uses that I can find of lazy_string->type
> use the type as an element of the string.
>
> E.g.: py-pretty-print.c:print_children has:
>
>           gdbpy_extract_lazy_string (py_v, &addr, &type, &length, &encoding);
>
>           local_opts.addressprint = 0;
>           val_print_string (type, encoding, addr, (int) length, stream,
>                             &local_opts);
>
> val_print_string takes a string element type as the first arg.
>
> I think the right thing to do here is go with the posted patch (after
> doc is ok).
> And I'll post another patch to fix char[] handling (I'll file a pr for
> that too, assuming my search fu can't find an existing one).
> Plus I have another patch after that to fix another aspect of lazy
> string handling.

As more data, I found the following test in py-value.exp:

  gdb_test "python print (lstr.type)" "const char \*." "Test
lazy-string type name equality"
  gdb_test "python print (sptr.type)" "const char \*." "Test string
type name equality"

The test is, seemingly, trying to ensure the type of the lazy string
is the same as the type of the string.
Not unreasonable to be sure.
Alas that's not what's going on:

python print (lstr.type)^M
const char^M
(gdb) PASS: gdb.python/py-value.exp: Test lazy-string type name equality
python print (sptr.type)^M
const char *^M
(gdb) PASS: gdb.python/py-value.exp: Test string type name equality

I'm now thinking of fixing LazyString.type for char* strings so that
the above "python print (lstr.type)" prints "const char *" instead of
"const char".
Yeah, I think it's going to break someone, but I don't want to leave
things as they are and it does feel better from a u/i perspective
(even with its warts).
If people want me taking a different route, please speak up.

btw, I found the following four lazy string bugs.
I'll try to address them all as well, if time allows.

https://sourceware.org/bugzilla/show_bug.cgi?id=18779
https://sourceware.org/bugzilla/show_bug.cgi?id=18439
https://sourceware.org/bugzilla/show_bug.cgi?id=17728
https://sourceware.org/bugzilla/show_bug.cgi?id=16020
Phil Muldoon Nov. 4, 2016, 7:20 p.m. UTC | #10
On 04/11/16 17:52, Doug Evans wrote:
> On Thu, Nov 3, 2016 at 5:00 PM, Doug Evans <dje@google.com> wrote:
>> On Thu, Nov 3, 2016 at 12:03 PM, Doug Evans <dje@google.com> wrote:
>>> On Thu, Nov 3, 2016 at 10:46 AM, Doug Evans <dje@google.com> wrote:
>>>> Hi.
>>>>

>>>
>>> (gdb) py print gdb.parse_and_eval("foo").lazy_string().type
>>> const char
>>> (gdb) py print gdb.parse_and_eval("bar").lazy_string().type
>>> const char [19]
>>>
>>> I don't have a strong opinion on what the correct answer is, but there
>>> is certainly a bug here.
>>>
>>> Phil, do you remember why this code exists in valpy_lazy_string():
>>>
>>>       if (TYPE_CODE (value_type (value)) == TYPE_CODE_PTR)
>>>         value = value_ind (value);
>>>
>>> [lazy string support went in in commit be759fcf]

I can't remember. Too many sleeps ;) It's possible it is just a
mistake and we're worrying about a bug that should just be fixed.

I think lazy strings were implemented to carry around a string that
remains unfetched (in cases of massive strings) until actually needed,
or a pointer to that string. I also thought there was some work
carried out on them by Tom (?) after the initial commit.

I keep notes and I'll look on the weekend to see of I can dig out
specific implementation details.

Cheers

Phil
Doug Evans Nov. 4, 2016, 7:29 p.m. UTC | #11
On Fri, Nov 4, 2016 at 12:20 PM, Phil Muldoon <pmuldoon@redhat.com> wrote:
> On 04/11/16 17:52, Doug Evans wrote:
>> On Thu, Nov 3, 2016 at 5:00 PM, Doug Evans <dje@google.com> wrote:
>>> On Thu, Nov 3, 2016 at 12:03 PM, Doug Evans <dje@google.com> wrote:
>>>> On Thu, Nov 3, 2016 at 10:46 AM, Doug Evans <dje@google.com> wrote:
>>>>> Hi.
>>>>>
>
>>>>
>>>> (gdb) py print gdb.parse_and_eval("foo").lazy_string().type
>>>> const char
>>>> (gdb) py print gdb.parse_and_eval("bar").lazy_string().type
>>>> const char [19]
>>>>
>>>> I don't have a strong opinion on what the correct answer is, but there
>>>> is certainly a bug here.
>>>>
>>>> Phil, do you remember why this code exists in valpy_lazy_string():
>>>>
>>>>       if (TYPE_CODE (value_type (value)) == TYPE_CODE_PTR)
>>>>         value = value_ind (value);
>>>>
>>>> [lazy string support went in in commit be759fcf]
>
> I can't remember. Too many sleeps ;) It's possible it is just a
> mistake and we're worrying about a bug that should just be fixed.
>
> I think lazy strings were implemented to carry around a string that
> remains unfetched (in cases of massive strings) until actually needed,
> or a pointer to that string. I also thought there was some work
> carried out on them by Tom (?) after the initial commit.
>
> I keep notes and I'll look on the weekend to see of I can dig out
> specific implementation details.

Hi Phil.

Don't spend too much time on this.
I think I'm at a point where I'm comfortable with sending a patch I'd
like committed.
diff mbox

Patch

diff --git a/gdb/python/py-lazy-string.c b/gdb/python/py-lazy-string.c
index d4b40df..49b3744 100644
--- a/gdb/python/py-lazy-string.c
+++ b/gdb/python/py-lazy-string.c
@@ -40,8 +40,7 @@  typedef struct {
       the first null of appropriate width.  */
    long length;

-  /*  This attribute holds the type that is represented by the lazy
-      string's type.  */
+  /* This attribute holds the type of a character in the string.  */
    struct type *type;
  } lazy_string_object;

diff --git a/gdb/doc/guile.texi b/gdb/doc/guile.texi
index 0030f3f..a413802 100644
--- a/gdb/doc/guile.texi
+++ b/gdb/doc/guile.texi
@@ -3247,9 +3247,7 @@  most appropriate encoding when the string is printed.
  @end deffn

  @deffn {Scheme Procedure} lazy-string-type lazy-string
-Return the type that is represented by @var{lazy-string}'s type.
-For a lazy string this will always be a pointer type.  To
-resolve this to the lazy string's character type, use  
@code{type-target-type}.
+Return the type of a character in @var{lazy-string}.
  @xref{Types In Guile}.
  @end deffn

diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index d6507e5..63fca68 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -4883,11 +4883,8 @@  is not writable.
  @end defvar

  @defvar LazyString.type
-This attribute holds the type that is represented by the lazy string's
-type.  For a lazy string this will always be a pointer type.  To
-resolve this to the lazy string's character type, use the type's
-@code{target} method.  @xref{Types In Python}.  This attribute is not
-writable.
+This attribute holds the type of a character in the string.
+@xref{Types In Python}.  This attribute is not writable.
  @end defvar

  @node Architectures In Python
diff --git a/gdb/testsuite/gdb.guile/scm-value.exp  
b/gdb/testsuite/gdb.guile/scm-value.exp
index 1d07c9f..02730ae 100644
--- a/gdb/testsuite/gdb.guile/scm-value.exp
+++ b/gdb/testsuite/gdb.guile/scm-value.exp
@@ -241,6 +241,8 @@  proc test_lazy_strings {} {
  	"= 0" "Test lazy string length"
      gdb_test "gu (print (lazy-string-address snstr))" \
  	"= 0" "Test lazy string address"
+    gdb_test "gu (print (lazy-string-type snstr))" \
+	"= const char" "Test lazy string type"
  }

  proc test_inferior_function_call {} {
diff --git a/gdb/testsuite/gdb.python/py-value.exp  
b/gdb/testsuite/gdb.python/py-value.exp
index 89be659..3ed1801 100644
--- a/gdb/testsuite/gdb.python/py-value.exp
+++ b/gdb/testsuite/gdb.python/py-value.exp
@@ -333,6 +333,7 @@  proc test_lazy_strings {} {
    gdb_py_test_silent_cmd "python snstr =  
snptr.lazy_string(length=0)" "Succesfully create a lazy string" 1
    gdb_test "python print (snstr.length)" "0" "Test lazy string length"
    gdb_test "python print (snstr.address)" "0" "Test lazy string address"
+  gdb_test "python print (snstr.type)" "const char" "Test lazy string type"
  }