From patchwork Sun Apr 28 16:58:10 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joel Brobecker
X-Patchwork-Id: 32443
Received: (qmail 67818 invoked by alias); 28 Apr 2019 16:58:18 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Unsubscribe:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-patches-owner@sourceware.org
Delivered-To: mailing list gdb-patches@sourceware.org
Received: (qmail 67760 invoked by uid 89); 28 Apr 2019 16:58:17 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-16.7 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.1 spammy=dealt, happening, continuing, notified
X-HELO: rock.gnat.com
Received: from rock.gnat.com (HELO rock.gnat.com) (205.232.38.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sun, 28 Apr 2019 16:58:16 +0000
Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id E5C3B117150 for ; Sun, 28 Apr 2019 12:58:14 -0400 (EDT)
Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 2ZNRoqwY9V22 for ; Sun, 28 Apr 2019 12:58:14 -0400 (EDT)
Received: from tron.gnat.com (tron.gnat.com [205.232.38.10]) by rock.gnat.com (Postfix) with ESMTP id D6292116F4B for ; Sun, 28 Apr 2019 12:58:14 -0400 (EDT)
Received: by tron.gnat.com (Postfix, from userid 4233) id D5423549; Sun, 28 Apr 2019 12:58:14 -0400 (EDT)
From: Joel Brobecker
To: gdb-patches@sourceware.org
Subject: [RFA v2 1/2][master+8.3] (Windows) fix thr != nullptr assert failure in delete_thread_1
Date: Sun, 28 Apr 2019 12:58:10 -0400
Message-Id: <1556470691-146942-2-git-send-email-brobecker@adacore.com>
In-Reply-To:
 <1556470691-146942-1-git-send-email-brobecker@adacore.com>
References: <1555453982-77808-1-git-send-email-brobecker@adacore.com>
 <1556470691-146942-1-git-send-email-brobecker@adacore.com>

We have observed that GDB would randomly trip the following assertion
failure when debugging on Windows. When allowing the program to run
until the inferior exits, we occasionally see:

    (gdb) cont
    Continuing.
    [Thread 48192.0xd100 exited with code 1]
    [Thread 48192.0x10ad8 exited with code 1]
    [Thread 48192.0x36e28 exited with code 0]
    [Thread 48192.0x52be4 exited with code 0]
    [Thread 48192.0x5aa40 exited with code 0]
    ../../src/gdb/thread.c:453: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.

Running the same scenario with some additional traces enabled...

    (gdb) set verbose
    (gdb) set debugevents

... allows us to understand what the issue is.

To understand it, we first need to look at the events received when
starting the program, and in particular at which threads got created
and how. First, we get a CREATE_PROCESS_DEBUG_EVENT for tid=0x442a8:

    gdb: kernel event for pid=317536 tid=0x442a8 code=CREATE_PROCESS_DEBUG_EVENT)

Shortly after, we get some CREATE_THREAD_DEBUG_EVENT events, one of
them being for tid=0x4010c:

    gdb: kernel event for pid=317536 tid=0x4010c code=CREATE_THREAD_DEBUG_EVENT)

Fast forward a bit of debugging, and we do a "cont" as above, at which
point the program reaches the end, and the system reports "exit"
events. The first interesting one is the following:

    gdb: kernel event for pid=317536 tid=0x442a8 code=EXIT_THREAD_DEBUG_EVENT)

This is reporting a thread-exit event for a thread whose tid is the
TID of what we call the "main thread". That's the thread that was
created when we received the CREATE_PROCESS_DEBUG_EVENT notification,
and whose TID is actually stored in a global variable named
main_thread_id.

This is not something we expected, as the assumption we made was that
the main thread would exit last, and that we would be notified of it
via an EXIT_PROCESS_DEBUG_EVENT. But apparently, this is not always
true, at least on Windows Server 2012 and 2016, where this issue has
been observed happening randomly.

The consequence of the above notification is that we call
windows_delete_thread for that thread, which removes it from our list
of known threads. A little later, we then get the
EXIT_PROCESS_DEBUG_EVENT, and we can see that the associated tid is
not the main_thread_id, but rather the tid of one of the threads that
was created during the lifetime of the program, in this case
tid=0x4010c:

    gdb: kernel event for pid=317536 tid=0x4010c code=EXIT_PROCESS_DEBUG_EVENT)

And the debug trace printed right after shows why we're crashing:

    [Deleting Thread 317536.0x442a8]

We are trying to delete the thread whose tid=0x442a8, which is the
main_thread_id! As we have already deleted that thread before, the
search for it returns a nullptr, which then trips the assertion check
in delete_thread_1.

This commit fixes this issue. It ignores the open question of what to
do with the main_thread_id global, particularly after that thread has
been removed from our list of threads. This will be dealt with as
a separate patch, to allow cherry-picking this patch into a release
branch. For now, we fix the code so as to avoid this crash.

gdb/ChangeLog:

        * windows-nat.c (get_windows_debug_event): Use
        current_event.dwThreadId instead of main_thread_id.
---
 gdb/windows-nat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c
index 5009418..9f3242c 100644
--- a/gdb/windows-nat.c
+++ b/gdb/windows-nat.c
@@ -1637,11 +1637,11 @@ get_windows_debug_event (struct target_ops *ops,
       else if (saw_create == 1)
         {
           windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
-                                         main_thread_id),
+                                         current_event.dwThreadId),
                                  0, true /* main_thread_p */);
           ourstatus->kind = TARGET_WAITKIND_EXITED;
           ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode;
-          thread_id = main_thread_id;
+          thread_id = current_event.dwThreadId;
         }
       break;
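
For readers who want to see the failure mode in isolation, the event
ordering described in the commit message can be replayed against a toy
thread table. This is only a sketch, not GDB code: EventCode,
DebugEvent, ThreadTable, replay and observed_sequence are hypothetical
names made up for the illustration, and the lookup failure is reported
as a false return value instead of an assertion. The use_event_tid flag
stands for the change made by this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-ins for the debug event codes named in the commit
// message; these are not the real <windows.h> definitions.
enum class EventCode { CreateProcess, CreateThread, ExitThread, ExitProcess };

struct DebugEvent
{
  EventCode code;
  std::uint32_t tid;  // thread the event is reported for
};

// Toy model of the debugger's thread bookkeeping (illustration only).
struct ThreadTable
{
  std::set<std::uint32_t> known;     // threads currently in our list
  std::uint32_t main_thread_id = 0;  // tid recorded at process creation

  // Returns false where the real code would trip the assertion.
  bool remove (std::uint32_t tid) { return known.erase (tid) == 1; }
};

// Replay EVENTS and report whether every deletion found its thread.
// USE_EVENT_TID selects the patched behavior for the process-exit event.
bool
replay (const std::vector<DebugEvent> &events, bool use_event_tid)
{
  ThreadTable table;
  for (const DebugEvent &ev : events)
    switch (ev.code)
      {
      case EventCode::CreateProcess:
        table.main_thread_id = ev.tid;
        table.known.insert (ev.tid);
        break;
      case EventCode::CreateThread:
        table.known.insert (ev.tid);
        break;
      case EventCode::ExitThread:
        if (!table.remove (ev.tid))
          return false;
        break;
      case EventCode::ExitProcess:
        // Old code: delete main_thread_id.  Patched code: delete the
        // thread the event was actually reported for.
        if (!table.remove (use_event_tid ? ev.tid : table.main_thread_id))
          return false;
        break;
      }
  return true;
}

// The ordering from the traces: the main thread (0x442a8) exits first,
// and the process-exit event is then reported for 0x4010c.
const std::vector<DebugEvent> observed_sequence = {
  { EventCode::CreateProcess, 0x442a8 },
  { EventCode::CreateThread, 0x4010c },
  { EventCode::ExitThread, 0x442a8 },
  { EventCode::ExitProcess, 0x4010c },
};
```

Calling replay (observed_sequence, false) models the old code and fails
on the final deletion, while replay (observed_sequence, true) models the
patched code and completes. When the main thread does exit last, both
variants behave the same, which is presumably why the problem only
showed up intermittently.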