[v2,2/2,gdb/tui] Handle unicode chars in prompt
Commit Message
Let's try to set the prompt using a unicode character, say '❯', aka U+276F
(heavy right-pointing angle quotation mark ornament).
This works fine on an xterm with CLI (with X marking the position of the
blinking cursor):
...
$ gdb -q -ex "set prompt GDB❯ "
GDB❯ X
...
but with TUI:
...
$ gdb -q -tui -ex "set prompt GDB❯ "
...
we get instead:
...
GDB GDB X
...
We can use the test-case gdb.tui/unicode-prompt.exp to get more details, using
tuiterm.
With Term::dump_screen we have:
...
16 (gdb) set prompt GDB❯
17 GDB❯ GDB❯ GDB❯ set prompt (gdb)
18 (gdb)
...
and with Term::dump_screen_with_attrs (summarizing using attribute sets <attrs1>
and <attrs2>):
...
16 (gdb) set prompt GDB❯
17 GDB<attrs1>❯<attrs2> GDB<attrs1>❯<attrs2> GDB<attrs1>❯<attrs2> set prompt (gdb)
18 (gdb)
...
where:
...
<attrs1> == <reverse:1><invisible:1><blinking:1><intensity:bold>
<attrs2> == <reverse:0><invisible:0><blinking:0><intensity:normal>
...
This explains why we didn't see the unicode char on xterm: it's hidden
because the invisible attribute is set.
So, there seem to be two problems:
- the attributes are incorrect, and
- the prompt is repeated a couple of times.
In TUI, the prompt is written out by tui_puts_internal, which outputs one byte
at a time using waddch. This apparently breaks multi-byte char support.
Fix this by detecting multi-byte chars in tui_puts_internal, and printing them using
waddnstr.
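To illustrate the idea outside of curses: U+276F is encoded in UTF-8 as the
three bytes e2 9d af, so a byte-at-a-time loop hands waddch three separate
pieces of one character. The following is only a minimal sketch, not the code
in this patch. It assumes the caller has already selected a UTF-8 locale with
setlocale, and char_len and split_chars are hypothetical helper names that do
not exist in GDB. Each group returned by split_chars is what one would pass as
a unit to waddnstr.

```cpp
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper (not part of GDB): length in bytes of the character at
// the start of S in the current locale.  Returns 0 for the terminating null
// and -1 if the bytes do not form a valid, complete character.
static int
char_len (const char *s)
{
  std::mbstate_t ps;
  std::memset (&ps, 0, sizeof (ps));

  // Offer mbrtowc everything up to and including the terminating null.
  std::size_t res = std::mbrtowc (nullptr, s, std::strlen (s) + 1, &ps);
  if (res == (std::size_t) -1 || res == (std::size_t) -2)
    return -1;

  return static_cast<int> (res);
}

// Hypothetical helper: split S into per-character byte groups.  A byte that
// does not start a valid character is emitted on its own, which mirrors the
// single-byte fallback path.
static std::vector<std::string>
split_chars (const char *s)
{
  std::vector<std::string> out;

  while (*s != '\0')
    {
      int len = char_len (s);
      if (len <= 0)
        len = 1;

      out.emplace_back (s, static_cast<std::size_t> (len));
      s += len;
    }

  return out;
}
```

With a UTF-8 locale in effect, the prompt "GDB❯ " should split into five
groups, the fourth of which holds all three bytes of the unicode char.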
Tested on x86_64-linux.
Reported-By: wuzy01@qq.com
PR tui/28800
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28800
---
gdb/testsuite/gdb.tui/unicode-prompt.exp | 43 +++++++++
gdb/tui/tui-io.c | 106 +++++++++++++++++++----
2 files changed, 134 insertions(+), 15 deletions(-)
create mode 100644 gdb/testsuite/gdb.tui/unicode-prompt.exp
Comments
>>>>> "Tom" == Tom de Vries via Gdb-patches <gdb-patches@sourceware.org> writes:
Tom> Fix this by detecting multi-byte chars in tui_puts_internal, and printing them using
Tom> waddnstr.
Is the detection really needed? What if tui_puts_internal instead just
always collected the longest span of printable characters and used
waddnstr?
Tom
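For reference, the span-collecting alternative suggested above could look
roughly like the sketch below. This is an illustration only, not code from the
thread. printable_span_len is a hypothetical name, and the definition of
"printable" used here (any byte that is not an ASCII control character or DEL)
is an assumption chosen so that the bytes of a UTF-8 multi-byte sequence stay
inside the span.

```cpp
#include <cstddef>

// Hypothetical helper (not part of GDB): number of leading bytes of S that
// could be handed to waddnstr in a single call.  Assumption: a byte counts as
// printable if it is neither an ASCII control character (< 0x20, which
// includes the terminating null, '\1', '\2', '\033' and '\n') nor DEL (0x7f).
static std::size_t
printable_span_len (const char *s)
{
  std::size_t n = 0;

  for (;;)
    {
      unsigned char c = static_cast<unsigned char> (s[n]);
      if (c < 0x20 || c == 0x7f)
        break;
      n++;
    }

  return n;
}
```

A caller would print the span with waddnstr when the length is non-zero,
advance the string by that many bytes, and leave the byte that ended the span
to the existing escape and newline handling.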
>>>>> "Tom" == Tom de Vries via Gdb-patches <gdb-patches@sourceware.org> writes:
Tom> +#ifdef HAVE_BTOWC
Tom> + {
Tom> + int mb_len;
Tom> + if (is_mb_char (string, mb_len) && mb_len != 1)
Tom> + {
Tom> + if (mb_len == 0)
Tom> + {
Tom> + /* Multi-byte null char. */
Tom> + break;
Tom> + }
Tom> +
Tom> + waddnstr (w, string, mb_len);
Tom> + string += mb_len;
Tom> + handled = true;
Tom> + }
Tom> + }
Tom> +#endif
I wonder if this would be simplified by using wchar_iterator.
This iterator tries to convert just a single character, and has out
parameters that reflect which input bytes were converted.
The main benefit would be less #ifdef and no need for is_mb_char in
tui-io.c.
You may need to add a method to wchar_iterator to let the caller skip
some bytes (you wouldn't want to create a new one on each iteration, as
it calls iconv_open). That way the escape handling could stay pretty
much the same.
Tom
On 6/9/23 17:39, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries via Gdb-patches <gdb-patches@sourceware.org> writes:
>
> Tom> +#ifdef HAVE_BTOWC
> Tom> + {
> Tom> + int mb_len;
> Tom> + if (is_mb_char (string, mb_len) && mb_len != 1)
> Tom> + {
> Tom> + if (mb_len == 0)
> Tom> + {
> Tom> + /* Multi-byte null char. */
> Tom> + break;
> Tom> + }
> Tom> +
> Tom> + waddnstr (w, string, mb_len);
> Tom> + string += mb_len;
> Tom> + handled = true;
> Tom> + }
> Tom> + }
> Tom> +#endif
>
> I wonder if this would be simplified by using wchar_iterator.
>
> This iterator tries to convert just a single character, and has out
> parameters that reflect which input bytes were converted.
>
> The main benefit would be less #ifdef and no need for is_mb_char in
> tui-io.c.
>
The iterator constructor also needs a specification of encoding and
width. I suppose for encoding we could use host_charset (), but I don't
know how to get the base width of that char set.
ISTM that's a problem that the multibyte functions take care of for us.
Thanks,
- Tom
> You may need to add a method to wchar_iterator to let the caller skip
> some bytes (you wouldn't want to create a new one on each iteration, as
> it calls iconv_open). That way the escape handling could stay pretty
> much the same.
>
> Tom
>>>>> "Tom" == Tom de Vries via Gdb-patches <gdb-patches@sourceware.org> writes:
Tom> The iterator constructor also needs a specification of encoding and
Tom> width. I suppose for encoding we could use host_charset (), but I
Tom> don't know how to get the base width of that char set.
The base width is 1.
Tom
On 6/12/23 20:44, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries via Gdb-patches <gdb-patches@sourceware.org> writes:
>
> Tom> The iterator constructor also needs a specification of encoding and
> Tom> width. I suppose for encoding we could use host_charset (), but I
> Tom> don't know how to get the base width of that char set.
>
> The base width is 1.
OK, then how about this?
Thanks,
- Tom
new file mode 100644
@@ -0,0 +1,43 @@
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+require allow_tui_tests
+
+tuiterm_env
+
+save_vars { env(LC_ALL) } {
+ # Override "C" settings from default_gdb_init.
+ setenv LC_ALL "C.UTF-8"
+
+ Term::clean_restart 24 80
+
+ if {![Term::enter_tui]} {
+ unsupported "TUI not supported"
+ return
+ }
+
+ set unicode_char "\u276F"
+
+ set prompt "GDB$unicode_char "
+ set prompt_re [string_to_regexp $prompt]
+
+ # Set new prompt.
+ send_gdb "set prompt $prompt\n"
+ # Set old prompt back.
+ send_gdb "set prompt (gdb) \n"
+
+ gdb_assert { [Term::wait_for "^${prompt_re}set prompt $gdb_prompt "] } \
+ "prompt with unicode char"
+}
@@ -514,6 +514,55 @@ tui_puts (const char *string, WINDOW *w)
update_cmdwin_start_line ();
}
+/* Use HAVE_BTOWC as sign that we have functioning wchar_t support. See also
+ gdb_wchar.h. */
+
+#ifdef HAVE_BTOWC
+/* Return true if STRING starts with a multi-byte char. Return the length of
+ the multi-byte char in LEN, or 0 in case it's a multi-byte null char.
+ Implementation based on _rl_read_mbchar. */
+
+static bool
+is_mb_char (const char *string, int &len)
+{
+ for (len = 1; len <= MB_CUR_MAX; len++)
+ {
+ size_t res;
+
+ {
+ mbstate_t ps;
+ memset (&ps, 0, sizeof (mbstate_t));
+ res = mbrtowc (nullptr, string, len, &ps);
+ }
+
+ if (res == (size_t)(-1))
+ {
+ /* Not a multi-byte char. */
+ return false;
+ }
+
+ if (res == (size_t)(-2))
+ {
+ /* Part of a multi-byte char. */
+ continue;
+ }
+
+ if (res == 0)
+ {
+ /* Multi-byte null char. */
+ len = 0;
+ return true;
+ }
+
+ /* Complete multi-byte char. */
+ gdb_assert (res == len);
+ return true;
+ }
+
+ return false;
+}
+#endif
+
static void
tui_puts_internal (WINDOW *w, const char *string, int *height)
{
@@ -521,29 +570,56 @@ tui_puts_internal (WINDOW *w, const char *string, int *height)
int prev_col = 0;
bool saw_nl = false;
- while ((c = *string++) != 0)
+ while (true)
{
- if (c == '\1' || c == '\2')
- {
- /* Ignore these, they are readline escape-marking
- sequences. */
- continue;
- }
+ bool handled = false;
- if (c == '\033')
+#ifdef HAVE_BTOWC
+ {
+ int mb_len;
+ if (is_mb_char (string, mb_len) && mb_len != 1)
+ {
+ if (mb_len == 0)
+ {
+ /* Multi-byte null char. */
+ break;
+ }
+
+ waddnstr (w, string, mb_len);
+ string += mb_len;
+ handled = true;
+ }
+ }
+#endif
+
+ if (!handled)
{
- size_t bytes_read = apply_ansi_escape (w, string - 1);
- if (bytes_read > 0)
+ c = *string++;
+ if (c == '\0')
+ break;
+
+ if (c == '\1' || c == '\2')
{
- string = string + bytes_read - 1;
+ /* Ignore these, they are readline escape-marking
+ sequences. */
continue;
}
- }
- if (c == '\n')
- saw_nl = true;
+ if (c == '\033')
+ {
+ size_t bytes_read = apply_ansi_escape (w, string - 1);
+ if (bytes_read > 0)
+ {
+ string = string + bytes_read - 1;
+ continue;
+ }
+ }
+
+ if (c == '\n')
+ saw_nl = true;
- do_tui_putc (w, c);
+ do_tui_putc (w, c);
+ }
if (height != nullptr)
{