[RFC,2/3,gdb/dap] Catch and log exceptions in dap threads
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gdb_build--master-arm |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gdb_check--master-arm |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 |
success
|
Testing passed
|
Commit Message
When running test-case gdb.dap/eof.exp, it occasionally coredumps.
The thread triggering the coredump is:
...
#0 0x0000ffff42bb2280 in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x0000ffff42b65800 [PAC] in raise () from /lib64/libc.so.6
#2 0x00000000007b03e8 [PAC] in handle_fatal_signal (sig=11)
at gdb/event-top.c:926
#3 0x00000000007b0470 in handle_sigsegv (sig=11)
at gdb/event-top.c:976
#4 <signal handler called>
#5 0x0000000000606080 in cli_ui_out::do_message (this=0xffff2f7ed728, style=...,
format=0xffff0c002af1 "%s", args=...) at gdb/cli-out.c:232
#6 0x0000000000ce6358 in ui_out::call_do_message (this=0xffff2f7ed728, style=...,
format=0xffff0c002af1 "%s") at gdb/ui-out.c:584
#7 0x0000000000ce6610 in ui_out::vmessage (this=0xffff2f7ed728, in_style=...,
format=0x16f93ea "", args=...) at gdb/ui-out.c:621
#8 0x0000000000ce3a9c in ui_file::vprintf (this=0xfffffbea1b18, ...)
at gdb/ui-file.c:74
#9 0x0000000000d2b148 in gdb_vprintf (stream=0xfffffbea1b18, format=0x16f93e8 "%s",
args=...) at gdb/utils.c:1898
#10 0x0000000000d2b23c in gdb_printf (stream=0xfffffbea1b18, format=0x16f93e8 "%s")
at gdb/utils.c:1913
#11 0x0000000000ab5208 in gdbpy_write (self=0x33fe35d0, args=0x342ec280, kw=0x345c08b0)
at gdb/python/python.c:1464
#12 0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0
#13 0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall ()
from /lib64/libpython3.12.so.1.0
#14 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault ()
from /lib64/libpython3.12.so.1.0
#15 0x0000ffff434d8cd0 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0
#16 0x0000ffff434b9824 [PAC] in PyObject_CallOneArg () from /lib64/libpython3.12.so.1.0
#17 0x0000ffff43557674 [PAC] in PyFile_WriteObject () from /lib64/libpython3.12.so.1.0
#18 0x0000ffff435577a0 [PAC] in PyFile_WriteString () from /lib64/libpython3.12.so.1.0
#19 0x0000ffff43465354 [PAC] in thread_excepthook () from /lib64/libpython3.12.so.1.0
#20 0x0000ffff434ac6e0 [PAC] in cfunction_vectorcall_O ()
from /lib64/libpython3.12.so.1.0
#21 0x0000ffff434a32d8 [PAC] in PyObject_Vectorcall () from /lib64/libpython3.12.so.1.0
#22 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault ()
from /lib64/libpython3.12.so.1.0
#23 0x0000ffff434d8d88 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0
#24 0x0000ffff435e0ef4 [PAC] in thread_run () from /lib64/libpython3.12.so.1.0
#25 0x0000ffff43591ec0 [PAC] in pythread_wrapper () from /lib64/libpython3.12.so.1.0
#26 0x0000ffff42bb0584 [PAC] in start_thread () from /lib64/libc.so.6
#27 0x0000ffff42c1fd4c [PAC] in thread_start () from /lib64/libc.so.6
...
The direct cause for the coredump seems to be that cli_ui_out::do_message
is trying to write to a stream variable which does not look sound:
...
(gdb) p *stream
$8 = {_vptr.ui_file = 0x0, m_applied_style = {m_foreground = {m_simple = true, {
m_value = 0, {m_red = 0 '\000', m_green = 0 '\000', m_blue = 0 '\000'}}},
m_background = {m_simple = 32, {m_value = 65535, {m_red = 255 '\377',
m_green = 255 '\377', m_blue = 0 '\000'}}},
m_intensity = (unknown: 0x438fe710), m_reverse = 255}}
...
The string that is being printed is:
...
(gdb) p str
$9 = "Exception in thread "
...
so AFAICT this is a DAP thread running into an exception and trying to print
it.
If we look at the state of gdb's main thread, we have:
...
#0 0x0000ffff42bac914 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
#1 0x0000ffff42bafb44 [PAC] in pthread_cond_timedwait@@GLIBC_2.17 ()
from /lib64/libc.so.6
#2 0x0000ffff43466e9c [PAC] in take_gil () from /lib64/libpython3.12.so.1.0
#3 0x0000ffff43484fe0 [PAC] in PyEval_RestoreThread ()
from /lib64/libpython3.12.so.1.0
#4 0x0000000000ab8698 [PAC] in gdbpy_allow_threads::~gdbpy_allow_threads (
this=0xfffffbea1cf8, __in_chrg=<optimized out>)
at gdb/python/python-internal.h:769
#5 0x0000000000ab2fec in execute_gdb_command (self=0x33fe35d0, args=0x34297b60,
kw=0x34553d20) at gdb/python/python.c:681
#6 0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0
#7 0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall ()
from /lib64/libpython3.12.so.1.0
#8 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault ()
from /lib64/libpython3.12.so.1.0
#9 0x0000ffff4353bce8 [PAC] in _PyObject_VectorcallTstate.lto_priv.3 ()
from /lib64/libpython3.12.so.1.0
#10 0x0000000000ab87fc [PAC] in gdbpy_event::operator() (this=0xffff14005900)
at gdb/python/python.c:1061
#11 0x0000000000ab93e8 in std::__invoke_impl<void, gdbpy_event&> (__f=...)
at /usr/include/c++/13/bits/invoke.h:61
#12 0x0000000000ab9204 in std::__invoke_r<void, gdbpy_event&> (__fn=...)
at /usr/include/c++/13/bits/invoke.h:111
#13 0x0000000000ab8e90 in std::_Function_handler<..>::_M_invoke(...) (...)
at /usr/include/c++/13/bits/std_function.h:290
#14 0x000000000062e0d0 in std::function<void ()>::operator()() const (
this=0xffff14005830) at /usr/include/c++/13/bits/std_function.h:591
#15 0x0000000000b67f14 in run_events (error=0, client_data=0x0)
at gdb/run-on-main-thread.c:76
#16 0x000000000157e290 in handle_file_event (file_ptr=0x33dae3a0, ready_mask=1)
at gdbsupport/event-loop.cc:573
#17 0x000000000157e760 in gdb_wait_for_event (block=1)
at gdbsupport/event-loop.cc:694
#18 0x000000000157d464 in gdb_do_one_event (mstimeout=-1)
at gdbsupport/event-loop.cc:264
#19 0x0000000000943a84 in start_event_loop () at gdb/main.c:401
#20 0x0000000000943bfc in captured_command_loop () at gdb/main.c:465
#21 0x000000000094567c in captured_main (data=0xfffffbea23e8)
at gdb/main.c:1335
#22 0x0000000000945700 in gdb_main (args=0xfffffbea23e8)
at gdb/main.c:1354
#23 0x0000000000423ab4 in main (argc=14, argv=0xfffffbea2578)
at gdb/gdb.c:39
...
AFAIU, there's a race between the two threads on gdb_stderr:
- the DAP thread samples the gdb_stderr value, and uses it a bit later to
print to
- the gdb main thread changes the gdb_stderr value forth and back,
using a temporary value for string capture purposes
The non-sound stream value is caused by gdb_stderr being sampled while
pointing to a str_file object, and used once the str_file object is already
destroyed.
The error here is that the DAP thread attempts to print to gdb_stderr.
Fix this by adding a thread_wrapper that:
- catches all exceptions and logs them to dap.log, and
- while we're at it, logs when exiting
and using the thread_wrapper for each DAP thread.
Tested on aarch64-linux.
---
gdb/python/lib/gdb/dap/startup.py | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
Comments
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
Thanks for the patch.
Tom> - the DAP thread samples the gdb_stderr value, and uses it a bit later to
Tom> print to
Tom> - the gdb main thread changes the gdb_stderr value forth and back,
Tom> using a temporary value for string capture purposes
I think the first bullet point here is incomplete.
Tom> + # Catch any exception, and log it. If we let it escape here, it
Tom> + # it'll be printed in gdb_stderr, which is not safe to access from
Remove the "it" at the end of the first line here.
Approved-By: Tom Tromey <tom@tromey.com>
Tom
On 2/7/24 16:52, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
>
> Thanks for the patch.
>
> Tom> - the DAP thread samples the gdb_stderr value, and uses it a bit later to
> Tom> print to
> Tom> - the gdb main thread changes the gdb_stderr value forth and back,
> Tom> using a temporary value for string capture purposes
>
Thanks for the review.
> I think the first bullet point here is incomplete.
>
AFAICT, it's not, but ... not a native speaker.
Should try to I reformulate?
Thanks,
- Tom
> Tom> + # Catch any exception, and log it. If we let it escape here, it
> Tom> + # it'll be printed in gdb_stderr, which is not safe to access from
>
> Remove the "it" at the end of the first line here.
>
> Approved-By: Tom Tromey <tom@tromey.com>
>
> Tom
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
Tom> - the DAP thread samples the gdb_stderr value, and uses it a
Tom> bit later to print to
>> I think the first bullet point here is incomplete.
Tom> AFAICT, it's not, but ... not a native speaker.
Tom> Should try to I reformulate?
I re-read it a couple more times & it makes sense now.
I think it's fine, sorry about that.
Tom
@@ -62,9 +62,25 @@ def start_thread(name, target, args=()):
"""Start a new thread, invoking TARGET with *ARGS there.
This is a helper function that ensures that any GDB signals are
correctly blocked."""
- result = gdb.Thread(name=name, target=target, args=args, daemon=True)
- result.start()
+ def thread_wrapper(*args):
+ thread_name = threading.current_thread().name
+ # Catch any exception, and log it. If we let it escape here, it
+ # it'll be printed in gdb_stderr, which is not safe to access from
+ # anywhere but gdb's main thread.
+ try:
+ target(*args)
+ except Exception as err:
+ err_string = "%s, %s" % (err, type(err))
+ log(thread_name + ": caught exception: " + err_string)
+ log_stack()
+ finally:
+ # Log when a thread terminates.
+ log(thread_name + ": terminating")
+
+ result = gdb.Thread(name=name, target=thread_wrapper, args=args,
+ daemon=True)
+ result.start()
def start_dap(target):
"""Start the DAP thread and invoke TARGET there."""