[RFC,2/3,gdb/dap] Catch and log exceptions in dap threads

Message ID 20240207090224.27521-3-tdevries@suse.de
State Committed
Headers
Series Fix issues triggered by gdb.dap/eof.exp |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed

Commit Message

Tom de Vries Feb. 7, 2024, 9:02 a.m. UTC
  When running test-case gdb.dap/eof.exp, it occasionally coredumps.

The thread triggering the coredump is:
...
 #0  0x0000ffff42bb2280 in __pthread_kill_implementation () from /lib64/libc.so.6
 #1  0x0000ffff42b65800 [PAC] in raise () from /lib64/libc.so.6
 #2  0x00000000007b03e8 [PAC] in handle_fatal_signal (sig=11)
     at gdb/event-top.c:926
 #3  0x00000000007b0470 in handle_sigsegv (sig=11)
     at gdb/event-top.c:976
 #4  <signal handler called>
 #5  0x0000000000606080 in cli_ui_out::do_message (this=0xffff2f7ed728, style=...,
     format=0xffff0c002af1 "%s", args=...) at gdb/cli-out.c:232
 #6  0x0000000000ce6358 in ui_out::call_do_message (this=0xffff2f7ed728, style=...,
     format=0xffff0c002af1 "%s") at gdb/ui-out.c:584
 #7  0x0000000000ce6610 in ui_out::vmessage (this=0xffff2f7ed728, in_style=...,
     format=0x16f93ea "", args=...) at gdb/ui-out.c:621
 #8  0x0000000000ce3a9c in ui_file::vprintf (this=0xfffffbea1b18, ...)
     at gdb/ui-file.c:74
 #9  0x0000000000d2b148 in gdb_vprintf (stream=0xfffffbea1b18, format=0x16f93e8 "%s",
     args=...) at gdb/utils.c:1898
 #10 0x0000000000d2b23c in gdb_printf (stream=0xfffffbea1b18, format=0x16f93e8 "%s")
     at gdb/utils.c:1913
 #11 0x0000000000ab5208 in gdbpy_write (self=0x33fe35d0, args=0x342ec280, kw=0x345c08b0)
     at gdb/python/python.c:1464
 #12 0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0
 #13 0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall ()
     from /lib64/libpython3.12.so.1.0
 #14 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault ()
    from /lib64/libpython3.12.so.1.0
 #15 0x0000ffff434d8cd0 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0
 #16 0x0000ffff434b9824 [PAC] in PyObject_CallOneArg () from /lib64/libpython3.12.so.1.0
 #17 0x0000ffff43557674 [PAC] in PyFile_WriteObject () from /lib64/libpython3.12.so.1.0
 #18 0x0000ffff435577a0 [PAC] in PyFile_WriteString () from /lib64/libpython3.12.so.1.0
 #19 0x0000ffff43465354 [PAC] in thread_excepthook () from /lib64/libpython3.12.so.1.0
 #20 0x0000ffff434ac6e0 [PAC] in cfunction_vectorcall_O ()
    from /lib64/libpython3.12.so.1.0
 #21 0x0000ffff434a32d8 [PAC] in PyObject_Vectorcall () from /lib64/libpython3.12.so.1.0
 #22 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault ()
    from /lib64/libpython3.12.so.1.0
 #23 0x0000ffff434d8d88 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0
 #24 0x0000ffff435e0ef4 [PAC] in thread_run () from /lib64/libpython3.12.so.1.0
 #25 0x0000ffff43591ec0 [PAC] in pythread_wrapper () from /lib64/libpython3.12.so.1.0
 #26 0x0000ffff42bb0584 [PAC] in start_thread () from /lib64/libc.so.6
 #27 0x0000ffff42c1fd4c [PAC] in thread_start () from /lib64/libc.so.6
...

The direct cause for the coredump seems to be that cli_ui_out::do_message
is trying to write to a stream variable which does not look sound:
...
(gdb) p *stream
$8 = {_vptr.ui_file = 0x0, m_applied_style = {m_foreground = {m_simple = true, {
        m_value = 0, {m_red = 0 '\000', m_green = 0 '\000', m_blue = 0 '\000'}}},
    m_background = {m_simple = 32, {m_value = 65535, {m_red = 255 '\377',
          m_green = 255 '\377', m_blue = 0 '\000'}}},
    m_intensity = (unknown: 0x438fe710), m_reverse = 255}}
...

The string that is being printed is:
...
(gdb) p str
$9 = "Exception in thread "
...
so AFAICT this is a DAP thread running into an exception and trying to print
it.

If we look at the state of gdb's main thread, we have:
...
 #0  0x0000ffff42bac914 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
 #1  0x0000ffff42bafb44 [PAC] in pthread_cond_timedwait@@GLIBC_2.17 ()
    from /lib64/libc.so.6
 #2  0x0000ffff43466e9c [PAC] in take_gil () from /lib64/libpython3.12.so.1.0
 #3  0x0000ffff43484fe0 [PAC] in PyEval_RestoreThread ()
     from /lib64/libpython3.12.so.1.0
 #4  0x0000000000ab8698 [PAC] in gdbpy_allow_threads::~gdbpy_allow_threads (
     this=0xfffffbea1cf8, __in_chrg=<optimized out>)
     at gdb/python/python-internal.h:769
 #5  0x0000000000ab2fec in execute_gdb_command (self=0x33fe35d0, args=0x34297b60,
     kw=0x34553d20) at gdb/python/python.c:681
 #6  0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0
 #7  0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall ()
     from /lib64/libpython3.12.so.1.0
 #8  0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault ()
    from /lib64/libpython3.12.so.1.0
 #9  0x0000ffff4353bce8 [PAC] in _PyObject_VectorcallTstate.lto_priv.3 ()
    from /lib64/libpython3.12.so.1.0
 #10 0x0000000000ab87fc [PAC] in gdbpy_event::operator() (this=0xffff14005900)
     at gdb/python/python.c:1061
 #11 0x0000000000ab93e8 in std::__invoke_impl<void, gdbpy_event&> (__f=...)
     at /usr/include/c++/13/bits/invoke.h:61
 #12 0x0000000000ab9204 in std::__invoke_r<void, gdbpy_event&> (__fn=...)
     at /usr/include/c++/13/bits/invoke.h:111
 #13 0x0000000000ab8e90 in std::_Function_handler<..>::_M_invoke(...) (...)
     at /usr/include/c++/13/bits/std_function.h:290
 #14 0x000000000062e0d0 in std::function<void ()>::operator()() const (
     this=0xffff14005830) at /usr/include/c++/13/bits/std_function.h:591
 #15 0x0000000000b67f14 in run_events (error=0, client_data=0x0)
     at gdb/run-on-main-thread.c:76
 #16 0x000000000157e290 in handle_file_event (file_ptr=0x33dae3a0, ready_mask=1)
     at gdbsupport/event-loop.cc:573
 #17 0x000000000157e760 in gdb_wait_for_event (block=1)
     at gdbsupport/event-loop.cc:694
 #18 0x000000000157d464 in gdb_do_one_event (mstimeout=-1)
     at gdbsupport/event-loop.cc:264
 #19 0x0000000000943a84 in start_event_loop () at gdb/main.c:401
 #20 0x0000000000943bfc in captured_command_loop () at gdb/main.c:465
 #21 0x000000000094567c in captured_main (data=0xfffffbea23e8)
     at gdb/main.c:1335
 #22 0x0000000000945700 in gdb_main (args=0xfffffbea23e8)
     at gdb/main.c:1354
 #23 0x0000000000423ab4 in main (argc=14, argv=0xfffffbea2578)
     at gdb/gdb.c:39
...

AFAIU, there's a race between the two threads on gdb_stderr:
- the DAP thread samples the gdb_stderr value, and uses it a bit later to
  print to
- the gdb main thread changes the gdb_stderr value forth and back,
  using a temporary value for string capture purposes

The non-sound stream value is caused by gdb_stderr being sampled while
pointing to a str_file object, and used once the str_file object is already
destroyed.

The error here is that the DAP thread attempts to print to gdb_stderr.

Fix this by adding a thread_wrapper that:
- catches all exceptions and logs them to dap.log, and
- while we're at it, logs when exiting
and using the thread_wrapper for each DAP thread.

Tested on aarch64-linux.
---
 gdb/python/lib/gdb/dap/startup.py | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)
  

Comments

Tom Tromey Feb. 7, 2024, 3:52 p.m. UTC | #1
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

Thanks for the patch.

Tom> - the DAP thread samples the gdb_stderr value, and uses it a bit later to
Tom>   print to
Tom> - the gdb main thread changes the gdb_stderr value forth and back,
Tom>   using a temporary value for string capture purposes

I think the first bullet point here is incomplete.

Tom> +        # Catch any exception, and log it.  If we let it escape here, it
Tom> +        # it'll be printed in gdb_stderr, which is not safe to access from

Remove the "it" at the end of the first line here.

Approved-By: Tom Tromey <tom@tromey.com>

Tom
  
Tom de Vries Feb. 12, 2024, 3:15 p.m. UTC | #2
On 2/7/24 16:52, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
> 
> Thanks for the patch.
> 
> Tom> - the DAP thread samples the gdb_stderr value, and uses it a bit later to
> Tom>   print to
> Tom> - the gdb main thread changes the gdb_stderr value forth and back,
> Tom>   using a temporary value for string capture purposes
> 

Thanks for the review.

> I think the first bullet point here is incomplete.
> 

AFAICT, it's not, but ... not a native speaker.

Should try to I reformulate?

Thanks,
- Tom

> Tom> +        # Catch any exception, and log it.  If we let it escape here, it
> Tom> +        # it'll be printed in gdb_stderr, which is not safe to access from
> 
> Remove the "it" at the end of the first line here.
> 
> Approved-By: Tom Tromey <tom@tromey.com>
> 
> Tom
  
Tom Tromey Feb. 12, 2024, 5:35 p.m. UTC | #3
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

Tom> - the DAP thread samples the gdb_stderr value, and uses it a
Tom> bit later to print to

>> I think the first bullet point here is incomplete.

Tom> AFAICT, it's not, but ... not a native speaker.

Tom> Should try to I reformulate?

I re-read it a couple more times & it makes sense now.
I think it's fine, sorry about that.

Tom
  

Patch

diff --git a/gdb/python/lib/gdb/dap/startup.py b/gdb/python/lib/gdb/dap/startup.py
index 4f12e0b0376..7cd1ef2fba1 100644
--- a/gdb/python/lib/gdb/dap/startup.py
+++ b/gdb/python/lib/gdb/dap/startup.py
@@ -62,9 +62,25 @@  def start_thread(name, target, args=()):
     """Start a new thread, invoking TARGET with *ARGS there.
     This is a helper function that ensures that any GDB signals are
     correctly blocked."""
-    result = gdb.Thread(name=name, target=target, args=args, daemon=True)
-    result.start()
 
+    def thread_wrapper(*args):
+        thread_name = threading.current_thread().name
+        # Catch any exception, and log it.  If we let it escape here, it
+        # it'll be printed in gdb_stderr, which is not safe to access from
+        # anywhere but gdb's main thread.
+        try:
+            target(*args)
+        except Exception as err:
+            err_string = "%s, %s" % (err, type(err))
+            log(thread_name + ": caught exception: " + err_string)
+            log_stack()
+        finally:
+            # Log when a thread terminates.
+            log(thread_name + ": terminating")
+
+    result = gdb.Thread(name=name, target=thread_wrapper, args=args,
+                        daemon=True)
+    result.start()
 
 def start_dap(target):
     """Start the DAP thread and invoke TARGET there."""