[RFA] Fix crashes due to python GIL released too early

  When running GDB tests under Valgrind, various tests are failing due
to invalid memory access.
Here is the stack trace reported by Valgrind, for gdb.base/freebpcmd.exp :
  ==18658== Invalid read of size 8
  ==18658==    at 0x7F9107: is_main (signalmodule.c:195)
  ==18658==    by 0x7F9107: PyOS_InterruptOccurred (signalmodule.c:1730)
  ==18658==    by 0x3696E2: check_quit_flag() (extension.c:829)
  ==18658==    by 0x36980B: restore_active_ext_lang(active_ext_lang_state*) (extension.c:782)
  ==18658==    by 0x48F617: gdbpy_enter::~gdbpy_enter() (python.c:235)
  ==18658==    by 0x47BB71: add_thread_object(thread_info*) (object.h:470)
  ==18658==    by 0x53A84D: operator() (std_function.h:687)
  ==18658==    by 0x53A84D: notify (observable.h:106)
  ==18658==    by 0x53A84D: add_thread_silent(ptid_t) (thread.c:311)
  ==18658==    by 0x3CD954: inf_ptrace_target::create_inferior(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&
  , char**, int) (inf-ptrace.c:139)
  ==18658==    by 0x3FE644: linux_nat_target::create_inferior(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&,
   char**, int) (linux-nat.c:1094)
  ==18658==    by 0x3D5727: run_command_1(char const*, int, run_how) (infcmd.c:633)
  ==18658==    by 0x2C05D1: cmd_func(cmd_list_element*, char const*, int) (cli-decode.c:1948)
  ==18658==    by 0x53F29F: execute_command(char const*, int) (top.c:639)
  ==18658==    by 0x3638EB: command_handler(char const*) (event-top.c:586)
  ==18658==    by 0x36468C: command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) (event-top.c:771)
  ==18658==    by 0x36407C: gdb_rl_callback_handler(char*) (event-top.c:217)
  ==18658==    by 0x5B2A1F: rl_callback_read_char (callback.c:281)
  ==18658==    by 0x36346D: gdb_rl_callback_read_char_wrapper_noexcept() (event-top.c:175)
  ==18658==    by 0x363F70: gdb_rl_callback_read_char_wrapper(void*) (event-top.c:192)
  ==18658==    by 0x3633AF: stdin_event_handler(int, void*) (event-top.c:514)
  ==18658==    by 0x362504: gdb_wait_for_event (event-loop.c:857)
  ==18658==    by 0x362504: gdb_wait_for_event(int) (event-loop.c:744)
  ==18658==    by 0x362676: gdb_do_one_event() [clone .part.11] (event-loop.c:321)
  ==18658==    by 0x3627AD: gdb_do_one_event (event-loop.c:303)
  ==18658==    by 0x3627AD: start_event_loop() (event-loop.c:370)
  ==18658==    by 0x41D35A: captured_command_loop() (main.c:381)
  ==18658==    by 0x41F2A4: captured_main (main.c:1224)
  ==18658==    by 0x41F2A4: gdb_main(captured_main_args*) (main.c:1239)
  ==18658==    by 0x227D0A: main (gdb.c:32)
  ==18658==  Address 0x10 is not stack'd, malloc'd or (recently) free'd

The problem seems to be created by gdbpy_enter::~gdbpy_enter () releasing the GIL lock
too early:
~gdbpy_enter () does:
      ...
      PyGILState_Release (m_state);
      python_gdbarch = m_gdbarch;
      python_language = m_language;

      restore_active_ext_lang (m_previous_active);
    }

So, it releases the GIL lock, does 2 assignments and then leads to the following
call sequence:
  restore_active_ext_lang => check_quit_flag => python.c gdbpy_check_quit_flag
     => PyOS_InterruptOccurred => is_main.
is_main code is:
    static int
    is_main(_PyRuntimeState *runtime)
    {
        unsigned long thread = PyThread_get_thread_ident();
        PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp;
        return (thread == runtime->main_thread
                && interp == runtime->interpreters.main);
    }

The macros and functions to access the thread state are documented as:
    /* Variable and macro for in-line access to current thread
       and interpreter state */

    #define _PyRuntimeState_GetThreadState(runtime) \
        ((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current))

    /* Get the current Python thread state.

       Efficient macro reading directly the 'gilstate.tstate_current' atomic
       variable. The macro is unsafe: it does not check for error and it can
       return NULL.

       The caller must hold the GIL.

       See also PyThreadState_Get() and PyThreadState_GET(). */
    #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime)

So, we see that GDB releases the GIL and then potentially calls
_PyRuntimeState_GetThreadState that needs the GIL.

It is not very clear why the problem is only observed when running under
Valgrind.  Probably caused by the slowdown due to Valgrind and/or to the 'single
thread' scheduling by Valgrind.

This patch fixes the crashes by releasing the GIT lock later.

YYYY-MM-DD  Philippe Waroquiers  <philippe.waroquiers@skynet.be>

	* python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after
	restore_active_ext_lang, as GIL is needed for (indirectly)
	called PyOS_InterruptOccurred.
---
 gdb/python/python.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

[RFA] Fix crashes due to python GIL released too early

Commit Message

Comments

Patch