Use generic_printstr from ada_language::printstr

Message ID 20241204183254.296681-1-tromey@adacore.com
State New
Headers
Series Use generic_printstr from ada_language::printstr |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed

Commit Message

Tom Tromey Dec. 4, 2024, 6:32 p.m. UTC
  Currently, if you create a lazy string while in Ada language mode, the
string will be rendered strangely, like:

    "["d0"]["9f"]["d1"]["80"]["d0"]["b8"]...

This happens because ada_printstr does not really handle UTF-8
decoding.

This patch changes ada_language::printstr to use generic_printstr when
UTF-8 is used.

Note that this code could probably be improved some more -- the
current patch only addresses the narrow case of the Python API.  I've
filed a follow-up bug for the remaining changes.
---
 gdb/ada-lang.c                             | 11 +++++-
 gdb/testsuite/gdb.ada/lazy-string.exp      | 43 ++++++++++++++++++++++
 gdb/testsuite/gdb.ada/lazy-string/main.adb | 25 +++++++++++++
 3 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 gdb/testsuite/gdb.ada/lazy-string.exp
 create mode 100644 gdb/testsuite/gdb.ada/lazy-string/main.adb
  

Comments

Andrew Burgess Dec. 12, 2024, 11:29 a.m. UTC | #1
Tom Tromey <tromey@adacore.com> writes:

> Currently, if you create a lazy string while in Ada language mode, the
> string will be rendered strangely, like:
>
>     "["d0"]["9f"]["d1"]["80"]["d0"]["b8"]...
>
> This happens because ada_printstr does not really handle UTF-8
> decoding.
>
> This patch changes ada_language::printstr to use generic_printstr when
> UTF-8 is used.
>
> Note that this code could probably be improved some more -- the
> current patch only addresses the narrow case of the Python API.  I've
> filed a follow-up bug for the remaining changes.

Could you include the bug number please.

> ---
>  gdb/ada-lang.c                             | 11 +++++-
>  gdb/testsuite/gdb.ada/lazy-string.exp      | 43 ++++++++++++++++++++++
>  gdb/testsuite/gdb.ada/lazy-string/main.adb | 25 +++++++++++++
>  3 files changed, 77 insertions(+), 2 deletions(-)
>  create mode 100644 gdb/testsuite/gdb.ada/lazy-string.exp
>  create mode 100644 gdb/testsuite/gdb.ada/lazy-string/main.adb
>
> diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
> index bd5dcced54a..e86b664e3e2 100644
> --- a/gdb/ada-lang.c
> +++ b/gdb/ada-lang.c
> @@ -13800,8 +13800,15 @@ class ada_language : public language_defn
>  		 const char *encoding, int force_ellipses,
>  		 const struct value_print_options *options) const override
>    {
> -    ada_printstr (stream, elttype, string, length, encoding,
> -		  force_ellipses, options);
> +    /* ada_printstr doesn't handle UTF-8 too well, but we want this
> +       for lazy-string printing.  Defer this case to the generic
> +       code.  */
> +    if (encoding != nullptr && strcasecmp (encoding, "UTF-8") == 0)
> +      generic_printstr (stream, elttype, string, length, encoding,
> +			force_ellipses, '"', 0, options);
> +    else
> +      ada_printstr (stream, elttype, string, length, encoding,
> +		    force_ellipses, options);

I guess this is fine.  Maybe the bug which I haven't bothered to track
down explains the problems, but I'd expect ada_printstr to be formatting
things differently to generic_printstr, so surely there will be weird
differences now depending on the string type?

But it's all contained to the Ada code anyway, so if there are issues,
you'll be the one to deal with them :) So for that reason I'm OK with
this change.

>    }
>  
>    /* See language.h.  */
> diff --git a/gdb/testsuite/gdb.ada/lazy-string.exp b/gdb/testsuite/gdb.ada/lazy-string.exp
> new file mode 100644
> index 00000000000..39c100142b4
> --- /dev/null
> +++ b/gdb/testsuite/gdb.ada/lazy-string.exp
> @@ -0,0 +1,43 @@
> +# Copyright 2024 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test GDB's 'set print characters' setting works for Ada strings.

This needs updating I think.

> +
> +load_lib "ada.exp"
> +load_lib gdb-python.exp
> +
> +require allow_ada_tests allow_python_tests
> +
> +standard_ada_testfile main
> +
> +# Enable basic use of UTF-8.  LC_ALL gets reset for each testfile.
> +setenv LC_ALL C.UTF-8
> +
> +if {[gdb_compile_ada "${srcfile}" "${binfile}" executable {debug}] != ""} {
> +    return -1
> +}
> +
> +clean_restart ${testfile}
> +
> +set bp_location [gdb_get_line_number "STOP" ${testdir}/main.adb]
> +if ![runto "main.adb:$bp_location" ] then {

Please format this as:

  if {![runto "main.adb:$bp_location"]} {

I went through and removed all the 'then' at one point.

With the minor issues fixed:

Approved-By: Andrew Burgess <aburgess@redhat.com>

thanks,
Andrew

> +  return -1
> +}
> +
> +gdb_test_no_output "python arg = gdb.parse_and_eval('arg.all')"
> +
> +gdb_test "python print(str(arg.lazy_string(encoding='utf-8')))" \
> +    "\"funçao\"" \
> +    "print lazy string using utf-8"
> diff --git a/gdb/testsuite/gdb.ada/lazy-string/main.adb b/gdb/testsuite/gdb.ada/lazy-string/main.adb
> new file mode 100644
> index 00000000000..b5ccd011d25
> --- /dev/null
> +++ b/gdb/testsuite/gdb.ada/lazy-string/main.adb
> @@ -0,0 +1,25 @@
> +--  Copyright 2024 Free Software Foundation, Inc.
> +--
> +--  This program is free software; you can redistribute it and/or modify
> +--  it under the terms of the GNU General Public License as published by
> +--  the Free Software Foundation; either version 3 of the License, or
> +--  (at your option) any later version.
> +--
> +--  This program is distributed in the hope that it will be useful,
> +--  but WITHOUT ANY WARRANTY; without even the implied warranty of
> +--  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +--  GNU General Public License for more details.
> +--
> +--  You should have received a copy of the GNU General Public License
> +--  along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +procedure Main is
> +
> +   procedure Blah (Arg : String) is
> +   begin
> +     null; -- STOP
> +   end;
> +
> +begin
> +   Blah ("funçao");
> +end Main;
> -- 
> 2.47.0
  
Tom Tromey Dec. 12, 2024, 2:42 p.m. UTC | #2
>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

Andrew> I guess this is fine.  Maybe the bug which I haven't bothered to track
Andrew> down explains the problems, but I'd expect ada_printstr to be formatting
Andrew> things differently to generic_printstr, so surely there will be weird
Andrew> differences now depending on the string type?

Yeah, there could be minor differences here if the string has some
invalid UTF-8 sequences.  But that's less important, IMO, than getting
UTF-8 output correct.

Andrew> With the minor issues fixed:

Andrew> Approved-By: Andrew Burgess <aburgess@redhat.com>

Thanks, I fixed these up.

Also:

Andrew> I went through and removed all the 'then' at one point.

... I found more, I'll send a patch momentarily.

Tom
  
Tom Tromey Dec. 12, 2024, 2:58 p.m. UTC | #3
Andrew> With the minor issues fixed:

Andrew> Approved-By: Andrew Burgess <aburgess@redhat.com>

Tom> Thanks, I fixed these up.

I fixed the .exp file but neglected to save it.
I'll send a follow-up patch momentarily.

Tom
  

Patch

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index bd5dcced54a..e86b664e3e2 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -13800,8 +13800,15 @@  class ada_language : public language_defn
 		 const char *encoding, int force_ellipses,
 		 const struct value_print_options *options) const override
   {
-    ada_printstr (stream, elttype, string, length, encoding,
-		  force_ellipses, options);
+    /* ada_printstr doesn't handle UTF-8 too well, but we want this
+       for lazy-string printing.  Defer this case to the generic
+       code.  */
+    if (encoding != nullptr && strcasecmp (encoding, "UTF-8") == 0)
+      generic_printstr (stream, elttype, string, length, encoding,
+			force_ellipses, '"', 0, options);
+    else
+      ada_printstr (stream, elttype, string, length, encoding,
+		    force_ellipses, options);
   }
 
   /* See language.h.  */
diff --git a/gdb/testsuite/gdb.ada/lazy-string.exp b/gdb/testsuite/gdb.ada/lazy-string.exp
new file mode 100644
index 00000000000..39c100142b4
--- /dev/null
+++ b/gdb/testsuite/gdb.ada/lazy-string.exp
@@ -0,0 +1,43 @@ 
+# Copyright 2024 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test GDB's 'set print characters' setting works for Ada strings.
+
+load_lib "ada.exp"
+load_lib gdb-python.exp
+
+require allow_ada_tests allow_python_tests
+
+standard_ada_testfile main
+
+# Enable basic use of UTF-8.  LC_ALL gets reset for each testfile.
+setenv LC_ALL C.UTF-8
+
+if {[gdb_compile_ada "${srcfile}" "${binfile}" executable {debug}] != ""} {
+    return -1
+}
+
+clean_restart ${testfile}
+
+set bp_location [gdb_get_line_number "STOP" ${testdir}/main.adb]
+if ![runto "main.adb:$bp_location" ] then {
+  return -1
+}
+
+gdb_test_no_output "python arg = gdb.parse_and_eval('arg.all')"
+
+gdb_test "python print(str(arg.lazy_string(encoding='utf-8')))" \
+    "\"funçao\"" \
+    "print lazy string using utf-8"
diff --git a/gdb/testsuite/gdb.ada/lazy-string/main.adb b/gdb/testsuite/gdb.ada/lazy-string/main.adb
new file mode 100644
index 00000000000..b5ccd011d25
--- /dev/null
+++ b/gdb/testsuite/gdb.ada/lazy-string/main.adb
@@ -0,0 +1,25 @@ 
+--  Copyright 2024 Free Software Foundation, Inc.
+--
+--  This program is free software; you can redistribute it and/or modify
+--  it under the terms of the GNU General Public License as published by
+--  the Free Software Foundation; either version 3 of the License, or
+--  (at your option) any later version.
+--
+--  This program is distributed in the hope that it will be useful,
+--  but WITHOUT ANY WARRANTY; without even the implied warranty of
+--  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+--  GNU General Public License for more details.
+--
+--  You should have received a copy of the GNU General Public License
+--  along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+procedure Main is
+
+   procedure Blah (Arg : String) is
+   begin
+     null; -- STOP
+   end;
+
+begin
+   Blah ("funçao");
+end Main;