gdb: fix missing null-character when using value_cstring
Commit Message
In PR gdb/21699 an issue was reported with the $_as_string convenience
function. It was observed that the string returned by this function,
when pushed into the inferior, was not null terminated.
This was causing problems when using the result with GDB's printf
command, as this command relies on the string having been pushed into
the inferior and being null terminated.
The bug includes a simple reproducer:
#include <stddef.h>
static char arena[51] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
/* Override malloc() so value_coerce_to_target() gets a known pointer, and we
know we"ll see an error if $_as_string() gives a string that isn't NULL
terminated. */
void
*malloc (size_t size)
{
if (size > sizeof (arena))
return NULL;
return arena;
}
int
main ()
{
return 0;
}
Then use this in a GDB session like this:
$ gdb -q test
Reading symbols from /tmp/test...
(gdb) start
Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
Starting program: /tmp/test
Temporary breakpoint 1, main () at test.c:17
17 return 0;
(gdb) printf "%s\n", $_as_string("hello")
"hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(gdb) quit
The problem is with the GDB function value_cstring, or at least, how
this function is being used.
When $_as_string is called we enter fnpy_call (python/py-function.c),
this calls into Python code. The Python code returns a result, which
will be a Python string, and then we call convert_value_from_python to
convert the Python string to a GDB value.
In convert_value_from_python (python/py-value.c) we enter the
gdbpy_is_string case (as the result is a string), then we call
python_string_to_target_string, which returns a null terminated C
string. Next we then make this call:
value = value_cstring (s.get (), strlen (s.get ()),
builtin_type_pychar);
This passes the pointer to the first character 's.get ()' and the
length of the string 'strlen (s.get ())', however, this length does
not include the trailing null character.
If we look at value_cstring (valops.c) we see that an array is created
using the passed in length, and characters are copied into the newly
allocated array value. This means we do not copy the strings trailing
null character, nor does value_cstring add a trailing null.
Finally, when we use this array value with printf, GDB pushed the
array to the inferior, which mallocs some space based on the array
length, and then copies the array content to the inferior.
As the array doesn't include a trailing null, non is written into the
inferior. So what we place into the inferior is not a C string, but
is actually an array of characters.
When printf tries to print the value it starts at the address of the
first character and continues until it reaches a null. When that will
be is undefined, so we may end up printing random garbage.
Now, ignore for a moment that the whole push an array to the inferior
just so we can fetch it in order to print it is clearly crazy. That's
a problem for another day I think. The important question here is:
should value_cstring include a null character or not.
Given the function name include 'cstring', which I'm assuming means C
style string, I think that we should be including a trailing null.
Given that, I see two possibilities, either value_cstring can always
add a trailing null, or value_cstring can assert that there is a
trailing null, and the caller is responsible for making sure that the
passed in length includes the null character.
Given we're always passing from a C style string to begin with the
question is really, should the length being passed to value_cstring
include the null, or not include the null?
The only place where we currently include the null in the passed
length is from c_string_operation::evaluate. Every other use of
value_cstring passes the length excluding the null.
I was tempted to adjust c_string_operation::evaluate to exclude the
null, and then have value_cstring add a trailing null. However, this
does mean that if, in the future, a use is introduced that incorrectly
includes the trailing null in the passed length, then we are unlikely
to spot immediately - we'd instead create an array with two null
characters at the end.
Alternatively, if we change the requirements of value_cstring so that
we require the passed length includes the trailing null, then we can
assert that this is indeed the case within value_cstring. Any
incorrect uses in the future will be quickly spotted.
So that's what I did, c_string_operation::evaluate is left unchanged,
but every other use of value_cstring is adjusted with a '+ 1' so that
we include the null within the length being passed.
I've added a header comment to value_cstring (value.h) to describe the
requirements.
Upon testing there were two tests that failed after this fix,
gdb.base/settings.exp and gdb.python/py-mi.exp. In both of these
cases we end up asking for the type of a character array allocated
through value_cstring. The length of this array has now increased by
one. Here's the previous behaviour:
(gdb) set args abc
(gdb) p $_gdb_setting("args")
$1 = "abc"
(gdb) ptype $1
type = char [3]
(gdb)
And here's the modified behaviour:
(gdb) set args abc
(gdb) p $_gdb_setting("args")
$1 = "abc"
(gdb) ptype $1
type = char [4]
(gdb)
Notice the type of $1 changed from an array of length 3 to an array of
length 4. However, I don't think this is a bad thing, consider:
char lit[] = "zzz";
int
main()
{
return 0;
}
And in GDB:
(gdb) ptype lit
type = char [4]
(gdb)
The null character is considered part of the array, so I think the new
behaviour makes sense.
I've added some new tests that try to exercise this issue in a few
different ways. At the end of the day though it's all just variations
of the same thing, create a value by calling through value_cstring,
then force GDB to push the value to the inferior, and then treat that
as a C style string and see the lack of null character cause problems.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
---
gdb/cli/cli-cmds.c | 17 ++++--
gdb/guile/scm-math.c | 16 ++++--
gdb/python/py-value.c | 8 ++-
gdb/testsuite/gdb.base/cstring-exprs.c | 51 ++++++++++++++++++
gdb/testsuite/gdb.base/cstring-exprs.exp | 68 ++++++++++++++++++++++++
gdb/testsuite/gdb.base/settings.exp | 4 +-
gdb/testsuite/gdb.python/py-mi.exp | 2 +-
gdb/valops.c | 4 ++
gdb/value.c | 3 +-
gdb/value.h | 4 ++
10 files changed, 164 insertions(+), 13 deletions(-)
create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.c
create mode 100644 gdb/testsuite/gdb.base/cstring-exprs.exp
base-commit: 60a13bbcdfb0ce008a77563cea0c34c830d7b170
Comments
On 4/3/23 17:49, Andrew Burgess via Gdb-patches wrote:
> In PR gdb/21699 an issue was reported with the $_as_string convenience
> function. It was observed that the string returned by this function,
> when pushed into the inferior, was not null terminated.
>
> This was causing problems when using the result with GDB's printf
> command, as this command relies on the string having been pushed into
> the inferior and being null terminated.
>
> The bug includes a simple reproducer:
>
> #include <stddef.h>
> static char arena[51] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
>
> /* Override malloc() so value_coerce_to_target() gets a known pointer, and we
> know we"ll see an error if $_as_string() gives a string that isn't NULL
> terminated. */
> void
> *malloc (size_t size)
> {
> if (size > sizeof (arena))
> return NULL;
> return arena;
> }
>
> int
> main ()
> {
> return 0;
> }
>
> Then use this in a GDB session like this:
>
> $ gdb -q test
> Reading symbols from /tmp/test...
> (gdb) start
> Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
> Starting program: /tmp/test
>
> Temporary breakpoint 1, main () at test.c:17
> 17 return 0;
> (gdb) printf "%s\n", $_as_string("hello")
> "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> (gdb) quit
>
> The problem is with the GDB function value_cstring, or at least, how
> this function is being used.
>
> When $_as_string is called we enter fnpy_call (python/py-function.c),
> this calls into Python code. The Python code returns a result, which
> will be a Python string, and then we call convert_value_from_python to
> convert the Python string to a GDB value.
>
> In convert_value_from_python (python/py-value.c) we enter the
> gdbpy_is_string case (as the result is a string), then we call
> python_string_to_target_string, which returns a null terminated C
> string. Next we then make this call:
>
> value = value_cstring (s.get (), strlen (s.get ()),
> builtin_type_pychar);
>
> This passes the pointer to the first character 's.get ()' and the
> length of the string 'strlen (s.get ())', however, this length does
> not include the trailing null character.
>
> If we look at value_cstring (valops.c) we see that an array is created
> using the passed in length, and characters are copied into the newly
> allocated array value. This means we do not copy the strings trailing
> null character, nor does value_cstring add a trailing null.
>
> Finally, when we use this array value with printf, GDB pushed the
> array to the inferior, which mallocs some space based on the array
> length, and then copies the array content to the inferior.
>
> As the array doesn't include a trailing null, non is written into the
non -> none
> inferior. So what we place into the inferior is not a C string, but
> is actually an array of characters.
>
> When printf tries to print the value it starts at the address of the
> first character and continues until it reaches a null. When that will
> be is undefined, so we may end up printing random garbage.
>
> Now, ignore for a moment that the whole push an array to the inferior
> just so we can fetch it in order to print it is clearly crazy. That's
> a problem for another day I think. The important question here is:
> should value_cstring include a null character or not.
>
> Given the function name include 'cstring', which I'm assuming means C
> style string, I think that we should be including a trailing null.
>
> Given that, I see two possibilities, either value_cstring can always
> add a trailing null, or value_cstring can assert that there is a
> trailing null, and the caller is responsible for making sure that the
> passed in length includes the null character.
>
> Given we're always passing from a C style string to begin with the
> question is really, should the length being passed to value_cstring
> include the null, or not include the null?
>
> The only place where we currently include the null in the passed
> length is from c_string_operation::evaluate. Every other use of
> value_cstring passes the length excluding the null.
>
> I was tempted to adjust c_string_operation::evaluate to exclude the
> null, and then have value_cstring add a trailing null. However, this
> does mean that if, in the future, a use is introduced that incorrectly
> includes the trailing null in the passed length, then we are unlikely
> to spot immediately - we'd instead create an array with two null
> characters at the end.o
You can always assert that the last character is not '\0'.
>
> Alternatively, if we change the requirements of value_cstring so that
> we require the passed length includes the trailing null, then we can
> assert that this is indeed the case within value_cstring. Any
> incorrect uses in the future will be quickly spotted.
>
> So that's what I did, c_string_operation::evaluate is left unchanged,
> but every other use of value_cstring is adjusted with a '+ 1' so that
> we include the null within the length being passed.
That sounds counterintuitive to me. With an API of style pointer +
length, I don't expect the length to include the null terminator. It
also unnecessarily forces the caller to have a null-terminated version
of the string, which may not always be the case (you might want to call
value_cstring on a subset of an existing string).
I think that:
struct value *
value_cstring (const char *ptr, ssize_t len, struct type *char_type)
should take a length excluding the null terminator, but a null
terminator in the result (its job is to build a C string, and a C string
requires a null terminator at the end).
We can have the following overload, for convenience, for places that
already have a C string but don't already know its length:
struct value *
value_cstring (const char *str, struct type *char_type)
{
return value_cstring (str, strlen (str), char_type);
}
> I've added a header comment to value_cstring (value.h) to describe the
> requirements.
>
> Upon testing there were two tests that failed after this fix,
> gdb.base/settings.exp and gdb.python/py-mi.exp. In both of these
> cases we end up asking for the type of a character array allocated
> through value_cstring. The length of this array has now increased by
> one. Here's the previous behaviour:
>
> (gdb) set args abc
> (gdb) p $_gdb_setting("args")
> $1 = "abc"
> (gdb) ptype $1
> type = char [3]
> (gdb)
>
> And here's the modified behaviour:
>
> (gdb) set args abc
> (gdb) p $_gdb_setting("args")
> $1 = "abc"
> (gdb) ptype $1
> type = char [4]
> (gdb)
>
> Notice the type of $1 changed from an array of length 3 to an array of
> length 4. However, I don't think this is a bad thing, consider:
>
> char lit[] = "zzz";
> int
> main()
> {
> return 0;
> }
>
> And in GDB:
>
> (gdb) ptype lit
> type = char [4]
> (gdb)
>
> The null character is considered part of the array, so I think the new
> behaviour makes sense.
Makes sense.
> diff --git a/gdb/testsuite/gdb.base/cstring-exprs.c b/gdb/testsuite/gdb.base/cstring-exprs.c
> new file mode 100644
> index 00000000000..8135edd97d4
> --- /dev/null
> +++ b/gdb/testsuite/gdb.base/cstring-exprs.c
> @@ -0,0 +1,51 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> + Copyright 2023 Free Software Foundation, Inc.
> +
> + This program is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3 of the License, or
> + (at your option) any later version.
> +
> + This program is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
> +
> +#include <stddef.h>
> +#include <string.h>
> +
> +/* A memory area used as the malloc memory buffer. */
> +
> +static char arena[] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
> +
> +/* Override malloc(). When GDB tries to push strings into the inferior we
> + always return the same pointer to arena. This does mean we can't have
> + multiple strings in use at the same time, but that's fine for our basic
> + testing, and this is simpler than using dlsym. */
> +
> +void
> +*malloc (size_t size)
The * is on the wrong line.
Simon
Simon Marchi <simark@simark.ca> writes:
> On 4/3/23 17:49, Andrew Burgess via Gdb-patches wrote:
>> In PR gdb/21699 an issue was reported with the $_as_string convenience
>> function. It was observed that the string returned by this function,
>> when pushed into the inferior, was not null terminated.
>>
>> This was causing problems when using the result with GDB's printf
>> command, as this command relies on the string having been pushed into
>> the inferior and being null terminated.
>>
>> The bug includes a simple reproducer:
>>
>> #include <stddef.h>
>> static char arena[51] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
>>
>> /* Override malloc() so value_coerce_to_target() gets a known pointer, and we
>> know we"ll see an error if $_as_string() gives a string that isn't NULL
>> terminated. */
>> void
>> *malloc (size_t size)
>> {
>> if (size > sizeof (arena))
>> return NULL;
>> return arena;
>> }
>>
>> int
>> main ()
>> {
>> return 0;
>> }
>>
>> Then use this in a GDB session like this:
>>
>> $ gdb -q test
>> Reading symbols from /tmp/test...
>> (gdb) start
>> Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
>> Starting program: /tmp/test
>>
>> Temporary breakpoint 1, main () at test.c:17
>> 17 return 0;
>> (gdb) printf "%s\n", $_as_string("hello")
>> "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> (gdb) quit
>>
>> The problem is with the GDB function value_cstring, or at least, how
>> this function is being used.
>>
>> When $_as_string is called we enter fnpy_call (python/py-function.c),
>> this calls into Python code. The Python code returns a result, which
>> will be a Python string, and then we call convert_value_from_python to
>> convert the Python string to a GDB value.
>>
>> In convert_value_from_python (python/py-value.c) we enter the
>> gdbpy_is_string case (as the result is a string), then we call
>> python_string_to_target_string, which returns a null terminated C
>> string. Next we then make this call:
>>
>> value = value_cstring (s.get (), strlen (s.get ()),
>> builtin_type_pychar);
>>
>> This passes the pointer to the first character 's.get ()' and the
>> length of the string 'strlen (s.get ())', however, this length does
>> not include the trailing null character.
>>
>> If we look at value_cstring (valops.c) we see that an array is created
>> using the passed in length, and characters are copied into the newly
>> allocated array value. This means we do not copy the strings trailing
>> null character, nor does value_cstring add a trailing null.
>>
>> Finally, when we use this array value with printf, GDB pushed the
>> array to the inferior, which mallocs some space based on the array
>> length, and then copies the array content to the inferior.
>>
>> As the array doesn't include a trailing null, non is written into the
>
> non -> none
>
>> inferior. So what we place into the inferior is not a C string, but
>> is actually an array of characters.
>>
>> When printf tries to print the value it starts at the address of the
>> first character and continues until it reaches a null. When that will
>> be is undefined, so we may end up printing random garbage.
>>
>> Now, ignore for a moment that the whole push an array to the inferior
>> just so we can fetch it in order to print it is clearly crazy. That's
>> a problem for another day I think. The important question here is:
>> should value_cstring include a null character or not.
>>
>> Given the function name include 'cstring', which I'm assuming means C
>> style string, I think that we should be including a trailing null.
>>
>> Given that, I see two possibilities, either value_cstring can always
>> add a trailing null, or value_cstring can assert that there is a
>> trailing null, and the caller is responsible for making sure that the
>> passed in length includes the null character.
>>
>> Given we're always passing from a C style string to begin with the
>> question is really, should the length being passed to value_cstring
>> include the null, or not include the null?
>>
>> The only place where we currently include the null in the passed
>> length is from c_string_operation::evaluate. Every other use of
>> value_cstring passes the length excluding the null.
>>
>> I was tempted to adjust c_string_operation::evaluate to exclude the
>> null, and then have value_cstring add a trailing null. However, this
>> does mean that if, in the future, a use is introduced that incorrectly
>> includes the trailing null in the passed length, then we are unlikely
>> to spot immediately - we'd instead create an array with two null
>> characters at the end.o
>
> You can always assert that the last character is not '\0'.
I thought about that, but I worried about strings that might contain
embedded '\0' characters... maybe we just don't care about them.
>
>>
>> Alternatively, if we change the requirements of value_cstring so that
>> we require the passed length includes the trailing null, then we can
>> assert that this is indeed the case within value_cstring. Any
>> incorrect uses in the future will be quickly spotted.
>>
>> So that's what I did, c_string_operation::evaluate is left unchanged,
>> but every other use of value_cstring is adjusted with a '+ 1' so that
>> we include the null within the length being passed.
>
> That sounds counterintuitive to me. With an API of style pointer +
> length, I don't expect the length to include the null terminator. It
> also unnecessarily forces the caller to have a null-terminated version
> of the string, which may not always be the case (you might want to call
> value_cstring on a subset of an existing string).
>
> I think that:
>
> struct value *
> value_cstring (const char *ptr, ssize_t len, struct type *char_type)
>
> should take a length excluding the null terminator, but a null
> terminator in the result (its job is to build a C string, and a C string
> requires a null terminator at the end).
This is why writing comments is so important :)
I read it as "build a value* from this C-string", which is why I figured
we can assume there will be a '\0' at the end.
Anyway, I don't really mind either way, just so long as we can get
something that works! I'll flip this around to the way you suggest and
repost.
Thanks for the feedback,
Andrew
>
> We can have the following overload, for convenience, for places that
> already have a C string but don't already know its length:
>
> struct value *
> value_cstring (const char *str, struct type *char_type)
> {
> return value_cstring (str, strlen (str), char_type);
> }
>
>> I've added a header comment to value_cstring (value.h) to describe the
>> requirements.
>>
>> Upon testing there were two tests that failed after this fix,
>> gdb.base/settings.exp and gdb.python/py-mi.exp. In both of these
>> cases we end up asking for the type of a character array allocated
>> through value_cstring. The length of this array has now increased by
>> one. Here's the previous behaviour:
>>
>> (gdb) set args abc
>> (gdb) p $_gdb_setting("args")
>> $1 = "abc"
>> (gdb) ptype $1
>> type = char [3]
>> (gdb)
>>
>> And here's the modified behaviour:
>>
>> (gdb) set args abc
>> (gdb) p $_gdb_setting("args")
>> $1 = "abc"
>> (gdb) ptype $1
>> type = char [4]
>> (gdb)
>>
>> Notice the type of $1 changed from an array of length 3 to an array of
>> length 4. However, I don't think this is a bad thing, consider:
>>
>> char lit[] = "zzz";
>> int
>> main()
>> {
>> return 0;
>> }
>>
>> And in GDB:
>>
>> (gdb) ptype lit
>> type = char [4]
>> (gdb)
>>
>> The null character is considered part of the array, so I think the new
>> behaviour makes sense.
>
> Makes sense.
>
>> diff --git a/gdb/testsuite/gdb.base/cstring-exprs.c b/gdb/testsuite/gdb.base/cstring-exprs.c
>> new file mode 100644
>> index 00000000000..8135edd97d4
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.base/cstring-exprs.c
>> @@ -0,0 +1,51 @@
>> +/* This testcase is part of GDB, the GNU debugger.
>> +
>> + Copyright 2023 Free Software Foundation, Inc.
>> +
>> + This program is free software; you can redistribute it and/or modify
>> + it under the terms of the GNU General Public License as published by
>> + the Free Software Foundation; either version 3 of the License, or
>> + (at your option) any later version.
>> +
>> + This program is distributed in the hope that it will be useful,
>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + GNU General Public License for more details.
>> +
>> + You should have received a copy of the GNU General Public License
>> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
>> +
>> +#include <stddef.h>
>> +#include <string.h>
>> +
>> +/* A memory area used as the malloc memory buffer. */
>> +
>> +static char arena[] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
>> +
>> +/* Override malloc(). When GDB tries to push strings into the inferior we
>> + always return the same pointer to arena. This does mean we can't have
>> + multiple strings in use at the same time, but that's fine for our basic
>> + testing, and this is simpler than using dlsym. */
>> +
>> +void
>> +*malloc (size_t size)
>
> The * is on the wrong line.
>
> Simon
On 2023-04-06 2:20 p.m., Andrew Burgess via Gdb-patches wrote:
> Simon Marchi <simark@simark.ca> writes:
>
>> On 4/3/23 17:49, Andrew Burgess via Gdb-patches wrote:
>>> Alternatively, if we change the requirements of value_cstring so that
>>> we require the passed length includes the trailing null, then we can
>>> assert that this is indeed the case within value_cstring. Any
>>> incorrect uses in the future will be quickly spotted.
>>>
>>> So that's what I did, c_string_operation::evaluate is left unchanged,
>>> but every other use of value_cstring is adjusted with a '+ 1' so that
>>> we include the null within the length being passed.
>>
>> That sounds counterintuitive to me. With an API of style pointer +
>> length, I don't expect the length to include the null terminator. It
>> also unnecessarily forces the caller to have a null-terminated version
>> of the string, which may not always be the case (you might want to call
>> value_cstring on a subset of an existing string).
>>
>> I think that:
>>
>> struct value *
>> value_cstring (const char *ptr, ssize_t len, struct type *char_type)
>>
>> should take a length excluding the null terminator, but a null
>> terminator in the result (its job is to build a C string, and a C string
>> requires a null terminator at the end).
>
> This is why writing comments is so important :)
>
> I read it as "build a value* from this C-string", which is why I figured
> we can assume there will be a '\0' at the end.
>
> Anyway, I don't really mind either way, just so long as we can get
> something that works! I'll flip this around to the way you suggest and
> repost.
>
Hmm, this rang a bell. See this patch and following discussion:
https://inbox.sourceware.org/gdb-patches/20210713153142.3957666-1-simon.marchi@efficios.com/
Pedro Alves
Pedro Alves <pedro@palves.net> writes:
> On 2023-04-06 2:20 p.m., Andrew Burgess via Gdb-patches wrote:
>> Simon Marchi <simark@simark.ca> writes:
>>
>>> On 4/3/23 17:49, Andrew Burgess via Gdb-patches wrote:
>
>>>> Alternatively, if we change the requirements of value_cstring so that
>>>> we require the passed length includes the trailing null, then we can
>>>> assert that this is indeed the case within value_cstring. Any
>>>> incorrect uses in the future will be quickly spotted.
>>>>
>>>> So that's what I did, c_string_operation::evaluate is left unchanged,
>>>> but every other use of value_cstring is adjusted with a '+ 1' so that
>>>> we include the null within the length being passed.
>>>
>>> That sounds counterintuitive to me. With an API of style pointer +
>>> length, I don't expect the length to include the null terminator. It
>>> also unnecessarily forces the caller to have a null-terminated version
>>> of the string, which may not always be the case (you might want to call
>>> value_cstring on a subset of an existing string).
>>>
>>> I think that:
>>>
>>> struct value *
>>> value_cstring (const char *ptr, ssize_t len, struct type *char_type)
>>>
>>> should take a length excluding the null terminator, but a null
>>> terminator in the result (its job is to build a C string, and a C string
>>> requires a null terminator at the end).
>>
>> This is why writing comments is so important :)
>>
>> I read it as "build a value* from this C-string", which is why I figured
>> we can assume there will be a '\0' at the end.
>>
>> Anyway, I don't really mind either way, just so long as we can get
>> something that works! I'll flip this around to the way you suggest and
>> repost.
>>
>
> Hmm, this rang a bell. See this patch and following discussion:
>
> https://inbox.sourceware.org/gdb-patches/20210713153142.3957666-1-simon.marchi@efficios.com/
>
Thanks for pointing that out.
I guess from the age of that patch it's not something you're planning to
get back to anytime soon. I assume you'd have no objections if I
rebased your patch and merged it with my tests and reposted it?
Thanks,
Andrew
On 2023-04-12 9:47 p.m., Andrew Burgess wrote:
> Pedro Alves <pedro@palves.net> writes:
>> Hmm, this rang a bell. See this patch and following discussion:
>>
>> https://inbox.sourceware.org/gdb-patches/20210713153142.3957666-1-simon.marchi@efficios.com/
>>
>
> Thanks for pointing that out.
>
> I guess from the age of that patch it's not something you're planning to
> get back to anytime soon. I assume you'd have no objections if I
> rebased your patch and merged it with my tests and reposted it?
>
No objections at all. Please feel free to use anything from it.
@@ -2307,8 +2307,11 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
}
if (len > 0)
- return value_cstring (value, len,
- builtin_type (gdbarch)->builtin_char);
+ {
+ /* The +1 here ensures we include the trailing null character. */
+ return value_cstring (value, len + 1,
+ builtin_type (gdbarch)->builtin_char);
+ }
else
return value_cstring ("", 1,
builtin_type (gdbarch)->builtin_char);
@@ -2363,7 +2366,8 @@ str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
{
std::string cmd_val = get_setshow_command_value_string (var);
- return value_cstring (cmd_val.c_str (), cmd_val.size (),
+ /* The +1 ensures we include the trailing null character. */
+ return value_cstring (cmd_val.c_str (), cmd_val.size () + 1,
builtin_type (gdbarch)->builtin_char);
}
@@ -2392,8 +2396,11 @@ str_value_from_setting (const setting &var, struct gdbarch *gdbarch)
}
if (len > 0)
- return value_cstring (value, len,
- builtin_type (gdbarch)->builtin_char);
+ {
+ /* The +1 here ensures we include the trailing null character. */
+ return value_cstring (value, len + 1,
+ builtin_type (gdbarch)->builtin_char);
+ }
else
return value_cstring ("", 1,
builtin_type (gdbarch)->builtin_char);
@@ -803,9 +803,19 @@ vlscm_convert_typed_value_from_scheme (const char *func_name,
0 /*non-strict*/,
&except_scm);
if (s != NULL)
- value = value_cstring (s.get (), len,
- language_string_char_type (language,
- gdbarch));
+ {
+ /* The gdbscm_scm_to_string call doesn't guarantee to add
+ a tailing null-character, so we need to add our own. */
+ gdb::unique_xmalloc_ptr<char> tmp ((char *) xmalloc (len + 1));
+ memcpy (tmp.get (), s.get (), len);
+ *(tmp.get () + len) = '\0';
+
+ /* The +1 here ensures we include the trailing null
+ character. */
+ value = value_cstring (tmp.get (), len + 1,
+ language_string_char_type (language,
+ gdbarch));
+ }
else
value = NULL;
}
@@ -1881,8 +1881,12 @@ convert_value_from_python (PyObject *obj)
gdb::unique_xmalloc_ptr<char> s
= python_string_to_target_string (obj);
if (s != NULL)
- value = value_cstring (s.get (), strlen (s.get ()),
- builtin_type_pychar);
+ {
+ /* The +1 here ensures we include the trailing null
+ character. */
+ value = value_cstring (s.get (), strlen (s.get ()) + 1,
+ builtin_type_pychar);
+ }
}
else if (PyObject_TypeCheck (obj, &value_object_type))
value = ((value_object *) obj)->value->copy ();
new file mode 100644
@@ -0,0 +1,51 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+ Copyright 2023 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <stddef.h>
+#include <string.h>
+
+/* A memory area used as the malloc memory buffer. */
+
+static char arena[] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
+
+/* Override malloc(). When GDB tries to push strings into the inferior we
+ always return the same pointer to arena. This does mean we can't have
+ multiple strings in use at the same time, but that's fine for our basic
+ testing, and this is simpler than using dlsym. */
+
+void
+*malloc (size_t size)
+{
+ memset (arena, 'X', sizeof (arena));
+ if (size > sizeof (arena))
+ return NULL;
+ return arena;
+}
+
+/* This function is called from GDB. */
+
+void
+take_string (const char *str)
+{
+ /* Nothing. */
+}
+
+int
+main (void)
+{
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,68 @@
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# Test different ways that we can cause GDB to call the value_string
+# function. This function should create a C style string, i.e. the
+# string should include a null terminating character.
+#
+# If the value created by value_cstring is pushed into the inferior
+# and the null terminating character is missing, then we can get
+# unexpected results.
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" $testfile $srcfile] } {
+ return -1
+}
+
+if {![runto_main]} {
+ return 0
+}
+
+if [allow_python_tests] {
+ # The $_as_string convenience function is implemented in Python.
+ gdb_test {printf "%s\n", $_as_string("aabbcc")} "\"aabbcc\""
+
+ # Create a gdb.Value object for a string. Take its address (which
+ # forces it into the inferior), and then print the address as a
+ # string.
+ gdb_test_no_output {python addr = gdb.Value("xxyyzz").address}
+ gdb_test {python gdb.execute("x/1s 0x%x" % addr)} \
+ "$hex <arena>:\\s+\"xxyyzz\""
+
+ # Call an inferior function through a gdb.Value object, pass a
+ # string to the inferior function and ensure it arrives correctly.
+ gdb_test "p/x take_string" " = $hex.*"
+ gdb_test_no_output "python func_ptr = gdb.history (0)" \
+ "place address of take_string into Python variable"
+ gdb_test "python func_value = func_ptr.dereference()" ""
+ gdb_breakpoint "take_string"
+ gdb_test {python result = func_value("qqaazz")} \
+ "Breakpoint $decimal, take_string \\(str=$hex <arena> \"qqaazz\"\\) at .*"
+ gdb_test "continue" "Continuing\\."
+}
+
+# Use printf on a string parsed by the C expression parser.
+gdb_test {printf "%s\n", "ddeeff"} "ddeeff"
+
+# Parse a string in the C expression parser, force it into the
+# inferior by taking its address, then print it as a string.
+gdb_test {x/1s &"gghhii"} "$hex <arena>:\\s+\"gghhii\""
+
+# Use $_gdb_setting_str and $_gdb_maint_setting_str to create a string
+# value, and then print using printf, which forces the string into the
+# inferior.
+gdb_test {printf "%s\n", $_gdb_setting_str("arch")} "auto"
+gdb_test {printf "%s\n", $_gdb_maint_setting_str("bfd-sharing")} "on"
@@ -504,7 +504,9 @@ proc_with_prefix test-enum {} {
gdb_test_no_output "$set_cmd zzz"
show_setting "$show_cmd" "zzz" 0 "yyy"
- check_type "test-settings enum" "type = char \\\[3\\\]"
+ # A string literal includes the trailing null character, hence the
+ # array size of four here.
+ check_type "test-settings enum" "type = char \\\[4\\\]"
test_gdb_complete_multiple "$set_cmd " "" "" {
"xxx"
@@ -288,7 +288,7 @@ mi_create_dynamic_varobj nstype2 nstype2 ".*" 1 \
"create nstype2 varobj"
mi_list_varobj_children nstype2 {
- { {nstype2.<error at 0>} {<error at 0>} 6 {char \[6\]} }
+ { {nstype2.<error at 0>} {<error at 0>} 7 {char \[7\]} }
} "list children after setting exception flag"
mi_create_varobj me me \
@@ -1738,9 +1738,13 @@ value_array (int lowbound, int highbound, struct value **elemvec)
return val;
}
+/* See value.h. */
+
struct value *
value_cstring (const char *ptr, ssize_t len, struct type *char_type)
{
+ gdb_assert (*(ptr + (len - 1)) == '\0');
+
struct value *val;
int lowbound = current_language->string_lower_bound ();
ssize_t highbound = len / char_type->length ();
@@ -2044,7 +2044,8 @@ value_of_internalvar (struct gdbarch *gdbarch, struct internalvar *var)
break;
case INTERNALVAR_STRING:
- val = value_cstring (var->u.string, strlen (var->u.string),
+ /* The +1 ensures we include the null character. */
+ val = value_cstring (var->u.string, strlen (var->u.string) + 1,
builtin_type (gdbarch)->builtin_char);
break;
@@ -1182,6 +1182,10 @@ class scoped_value_mark
const struct value *m_value;
};
+/* Create an array value, with element type CHAR_TYPE and length LEN. The
+ array represents a C style string. The content for the value is copied
+ from PTR. The last array element must be a null character. */
+
extern struct value *value_cstring (const char *ptr, ssize_t len,
struct type *char_type);
extern struct value *value_string (const char *ptr, ssize_t len,