[RFA] Fix crashes due to python GIL released too early

Message ID 20191123112609.7362-1-philippe.waroquiers@skynet.be
State New, archived

Commit Message

Philippe Waroquiers Nov. 23, 2019, 11:26 a.m. UTC
  When running GDB tests under Valgrind, various tests are failing due
to invalid memory access.
Here is the stack trace reported by Valgrind, for gdb.base/freebpcmd.exp :
  ==18658== Invalid read of size 8
  ==18658==    at 0x7F9107: is_main (signalmodule.c:195)
  ==18658==    by 0x7F9107: PyOS_InterruptOccurred (signalmodule.c:1730)
  ==18658==    by 0x3696E2: check_quit_flag() (extension.c:829)
  ==18658==    by 0x36980B: restore_active_ext_lang(active_ext_lang_state*) (extension.c:782)
  ==18658==    by 0x48F617: gdbpy_enter::~gdbpy_enter() (python.c:235)
  ==18658==    by 0x47BB71: add_thread_object(thread_info*) (object.h:470)
  ==18658==    by 0x53A84D: operator() (std_function.h:687)
  ==18658==    by 0x53A84D: notify (observable.h:106)
  ==18658==    by 0x53A84D: add_thread_silent(ptid_t) (thread.c:311)
  ==18658==    by 0x3CD954: inf_ptrace_target::create_inferior(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&
  , char**, int) (inf-ptrace.c:139)
  ==18658==    by 0x3FE644: linux_nat_target::create_inferior(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&,
   char**, int) (linux-nat.c:1094)
  ==18658==    by 0x3D5727: run_command_1(char const*, int, run_how) (infcmd.c:633)
  ==18658==    by 0x2C05D1: cmd_func(cmd_list_element*, char const*, int) (cli-decode.c:1948)
  ==18658==    by 0x53F29F: execute_command(char const*, int) (top.c:639)
  ==18658==    by 0x3638EB: command_handler(char const*) (event-top.c:586)
  ==18658==    by 0x36468C: command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) (event-top.c:771)
  ==18658==    by 0x36407C: gdb_rl_callback_handler(char*) (event-top.c:217)
  ==18658==    by 0x5B2A1F: rl_callback_read_char (callback.c:281)
  ==18658==    by 0x36346D: gdb_rl_callback_read_char_wrapper_noexcept() (event-top.c:175)
  ==18658==    by 0x363F70: gdb_rl_callback_read_char_wrapper(void*) (event-top.c:192)
  ==18658==    by 0x3633AF: stdin_event_handler(int, void*) (event-top.c:514)
  ==18658==    by 0x362504: gdb_wait_for_event (event-loop.c:857)
  ==18658==    by 0x362504: gdb_wait_for_event(int) (event-loop.c:744)
  ==18658==    by 0x362676: gdb_do_one_event() [clone .part.11] (event-loop.c:321)
  ==18658==    by 0x3627AD: gdb_do_one_event (event-loop.c:303)
  ==18658==    by 0x3627AD: start_event_loop() (event-loop.c:370)
  ==18658==    by 0x41D35A: captured_command_loop() (main.c:381)
  ==18658==    by 0x41F2A4: captured_main (main.c:1224)
  ==18658==    by 0x41F2A4: gdb_main(captured_main_args*) (main.c:1239)
  ==18658==    by 0x227D0A: main (gdb.c:32)
  ==18658==  Address 0x10 is not stack'd, malloc'd or (recently) free'd

The problem seems to be created by gdbpy_enter::~gdbpy_enter () releasing the GIL lock
too early:
~gdbpy_enter () does:
      PyGILState_Release (m_state);
      python_gdbarch = m_gdbarch;
      python_language = m_language;

      restore_active_ext_lang (m_previous_active);

So, it releases the GIL lock, does 2 assignments and then leads to the following
call sequence:
  restore_active_ext_lang => check_quit_flag => python.c gdbpy_check_quit_flag
     => PyOS_InterruptOccurred => is_main.
is_main code is:
    static int
    is_main(_PyRuntimeState *runtime)
        unsigned long thread = PyThread_get_thread_ident();
        PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp;
        return (thread == runtime->main_thread
                && interp == runtime->interpreters.main);

The macros and functions to access the thread state are documented as:
    /* Variable and macro for in-line access to current thread
       and interpreter state */

    #define _PyRuntimeState_GetThreadState(runtime) \

    /* Get the current Python thread state.

       Efficient macro reading directly the 'gilstate.tstate_current' atomic
       variable. The macro is unsafe: it does not check for error and it can
       return NULL.

       The caller must hold the GIL.

       See also PyThreadState_Get() and PyThreadState_GET(). */
    #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime)

So, we see that GDB releases the GIL and then potentially calls
_PyRuntimeState_GetThreadState that needs the GIL.

It is not very clear why the problem is only observed when running under
Valgrind.  Probably caused by the slowdown due to Valgrind and/or to the 'single
thread' scheduling by Valgrind.

This patch fixes the crashes by releasing the GIT lock later.

YYYY-MM-DD  Philippe Waroquiers  <philippe.waroquiers@skynet.be>

	* python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after
	restore_active_ext_lang, as GIL is needed for (indirectly)
	called PyOS_InterruptOccurred.
 gdb/python/python.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Tom Tromey Nov. 25, 2019, 9:59 p.m. UTC | #1
>>>>> "Philippe" == Philippe Waroquiers <philippe.waroquiers@skynet.be> writes:

Philippe> YYYY-MM-DD  Philippe Waroquiers  <philippe.waroquiers@skynet.be>

Philippe> 	* python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after
Philippe> 	restore_active_ext_lang, as GIL is needed for (indirectly)
Philippe> 	called PyOS_InterruptOccurred.

Thank you for the patch and the explanation.
This is ok.



diff --git a/gdb/python/python.c b/gdb/python/python.c
index 7b561a1c35..1cc5ebb7a7 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -228,11 +228,11 @@  gdbpy_enter::~gdbpy_enter ()
   m_error->restore ();
-  PyGILState_Release (m_state);
   python_gdbarch = m_gdbarch;
   python_language = m_language;
   restore_active_ext_lang (m_previous_active);
+  PyGILState_Release (m_state);
 /* Set the quit flag.  */