From patchwork Sat Nov 23 11:26:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philippe Waroquiers X-Patchwork-Id: 36156 Received: (qmail 69059 invoked by alias); 23 Nov 2019 11:26:22 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 69051 invoked by uid 89); 23 Nov 2019 11:26:22 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.2 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.1 spammy=Efficient X-HELO: mailsec104.isp.belgacom.be Received: from mailsec104.isp.belgacom.be (HELO mailsec104.isp.belgacom.be) (195.238.20.100) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 23 Nov 2019 11:26:18 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=skynet.be; i=@skynet.be; q=dns/txt; s=securemail; t=1574508378; x=1606044378; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=fjPFra6flSM/rjo1TxjjsCD/JEhnFB8d5bQOL44DXX8=; b=fqrNh/5SWOujxjioDd27IBO1jCCT6CRMVoIaDiSH808iDSLF84BR9Vbc RepWGjtbehIyPzLpDrNoh8YyHei7zg==; IronPort-SDR: UBgwcHBXGZjWNLwIdWrAO//y5D1rZu1XybfHW90QTKXaIEFLWxFVxvWfkC7jglOgFWR/+xlDNv X/ThLMe1KZW4vxZOXDFmpNNM562JnNAxSABxJ1khG1GzVMLl/Eo6WNItvBKjQaOJyBpYfDJPiJ oaauIlDeXvyUi7F377plu/jLOs3gpW2FkCRnvMGIKlphrUKQQ34qC57xmQTZvW5HbrIcb2/Q5t yug/cqKfchojBajsOKy7aAaAIdXgiuYIEHO+aj/U2qvrlttx+FOZopMEsoExqwjmSsie82KJLg UFw= Received: from 161.68-136-217.adsl-dyn.isp.belgacom.be (HELO md.home) ([217.136.68.161]) by relay.skynet.be with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256; 23 Nov 2019 12:26:16 +0100 From: Philippe Waroquiers To: gdb-patches@sourceware.org Cc: Philippe Waroquiers Subject: [RFA] Fix crashes due to python GIL released too early Date: Sat, 23 Nov 2019 12:26:09 +0100 Message-Id: <20191123112609.7362-1-philippe.waroquiers@skynet.be> MIME-Version: 1.0 X-IsSubscribed: yes When running GDB tests under Valgrind, various tests are failing due to invalid memory access. Here is the stack trace reported by Valgrind, for gdb.base/freebpcmd.exp : ==18658== Invalid read of size 8 ==18658== at 0x7F9107: is_main (signalmodule.c:195) ==18658== by 0x7F9107: PyOS_InterruptOccurred (signalmodule.c:1730) ==18658== by 0x3696E2: check_quit_flag() (extension.c:829) ==18658== by 0x36980B: restore_active_ext_lang(active_ext_lang_state*) (extension.c:782) ==18658== by 0x48F617: gdbpy_enter::~gdbpy_enter() (python.c:235) ==18658== by 0x47BB71: add_thread_object(thread_info*) (object.h:470) ==18658== by 0x53A84D: operator() (std_function.h:687) ==18658== by 0x53A84D: notify (observable.h:106) ==18658== by 0x53A84D: add_thread_silent(ptid_t) (thread.c:311) ==18658== by 0x3CD954: inf_ptrace_target::create_inferior(char const*, std::__cxx11::basic_string, std::allocator > const& , char**, int) (inf-ptrace.c:139) ==18658== by 0x3FE644: linux_nat_target::create_inferior(char const*, std::__cxx11::basic_string, std::allocator > const&, char**, int) (linux-nat.c:1094) ==18658== by 0x3D5727: run_command_1(char const*, int, run_how) (infcmd.c:633) ==18658== by 0x2C05D1: cmd_func(cmd_list_element*, char const*, int) (cli-decode.c:1948) ==18658== by 0x53F29F: execute_command(char const*, int) (top.c:639) ==18658== by 0x3638EB: command_handler(char const*) (event-top.c:586) ==18658== by 0x36468C: command_line_handler(std::unique_ptr >&&) (event-top.c:771) ==18658== by 0x36407C: gdb_rl_callback_handler(char*) (event-top.c:217) ==18658== by 0x5B2A1F: rl_callback_read_char (callback.c:281) ==18658== by 0x36346D: gdb_rl_callback_read_char_wrapper_noexcept() (event-top.c:175) ==18658== by 0x363F70: gdb_rl_callback_read_char_wrapper(void*) (event-top.c:192) ==18658== by 0x3633AF: stdin_event_handler(int, void*) (event-top.c:514) ==18658== by 0x362504: gdb_wait_for_event (event-loop.c:857) ==18658== by 0x362504: gdb_wait_for_event(int) (event-loop.c:744) ==18658== by 0x362676: gdb_do_one_event() [clone .part.11] (event-loop.c:321) ==18658== by 0x3627AD: gdb_do_one_event (event-loop.c:303) ==18658== by 0x3627AD: start_event_loop() (event-loop.c:370) ==18658== by 0x41D35A: captured_command_loop() (main.c:381) ==18658== by 0x41F2A4: captured_main (main.c:1224) ==18658== by 0x41F2A4: gdb_main(captured_main_args*) (main.c:1239) ==18658== by 0x227D0A: main (gdb.c:32) ==18658== Address 0x10 is not stack'd, malloc'd or (recently) free'd The problem seems to be created by gdbpy_enter::~gdbpy_enter () releasing the GIL lock too early: ~gdbpy_enter () does: ... PyGILState_Release (m_state); python_gdbarch = m_gdbarch; python_language = m_language; restore_active_ext_lang (m_previous_active); } So, it releases the GIL lock, does 2 assignments and then leads to the following call sequence: restore_active_ext_lang => check_quit_flag => python.c gdbpy_check_quit_flag => PyOS_InterruptOccurred => is_main. is_main code is: static int is_main(_PyRuntimeState *runtime) { unsigned long thread = PyThread_get_thread_ident(); PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp; return (thread == runtime->main_thread && interp == runtime->interpreters.main); } The macros and functions to access the thread state are documented as: /* Variable and macro for in-line access to current thread and interpreter state */ #define _PyRuntimeState_GetThreadState(runtime) \ ((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current)) /* Get the current Python thread state. Efficient macro reading directly the 'gilstate.tstate_current' atomic variable. The macro is unsafe: it does not check for error and it can return NULL. The caller must hold the GIL. See also PyThreadState_Get() and PyThreadState_GET(). */ #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime) So, we see that GDB releases the GIL and then potentially calls _PyRuntimeState_GetThreadState that needs the GIL. It is not very clear why the problem is only observed when running under Valgrind. Probably caused by the slowdown due to Valgrind and/or to the 'single thread' scheduling by Valgrind. This patch fixes the crashes by releasing the GIT lock later. YYYY-MM-DD Philippe Waroquiers * python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after restore_active_ext_lang, as GIL is needed for (indirectly) called PyOS_InterruptOccurred. --- gdb/python/python.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gdb/python/python.c b/gdb/python/python.c index 7b561a1c35..1cc5ebb7a7 100644 --- a/gdb/python/python.c +++ b/gdb/python/python.c @@ -228,11 +228,11 @@ gdbpy_enter::~gdbpy_enter () m_error->restore (); - PyGILState_Release (m_state); python_gdbarch = m_gdbarch; python_language = m_language; restore_active_ext_lang (m_previous_active); + PyGILState_Release (m_state); } /* Set the quit flag. */