From patchwork Tue Apr 16 22:33:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joel Brobecker X-Patchwork-Id: 32310 Received: (qmail 21281 invoked by alias); 16 Apr 2019 22:33:09 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 21219 invoked by uid 89); 16 Apr 2019 22:33:08 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-16.5 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.1 spammy=thr, observed, notified X-HELO: rock.gnat.com Received: from rock.gnat.com (HELO rock.gnat.com) (205.232.38.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 16 Apr 2019 22:33:06 +0000 Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id D617756225 for ; Tue, 16 Apr 2019 18:33:04 -0400 (EDT) Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id igbliDf8VRkl for ; Tue, 16 Apr 2019 18:33:04 -0400 (EDT) Received: from tron.gnat.com (tron.gnat.com [205.232.38.10]) by rock.gnat.com (Postfix) with ESMTP id C347D56163 for ; Tue, 16 Apr 2019 18:33:04 -0400 (EDT) Received: by tron.gnat.com (Postfix, from userid 4233) id C27A140D; Tue, 16 Apr 2019 18:33:04 -0400 (EDT) From: Joel Brobecker To: gdb-patches@sourceware.org Subject: [RFA 1/2][master+8.3] (Windows) fix thr != nullptr assert failure in delete_thread_1 Date: Tue, 16 Apr 2019 18:33:01 -0400 Message-Id: <1555453982-77808-2-git-send-email-brobecker@adacore.com> In-Reply-To: <1555453982-77808-1-git-send-email-brobecker@adacore.com> References: <1555453982-77808-1-git-send-email-brobecker@adacore.com> Hello, We have observed that GDB would randomly trip the the following assertion failure when debugging on Windows. When allowing the program to run until the inferior exits, we occasionally see: (gdb) cont Continuing. [Thread 48192.0xd100 exited with code 1] [Thread 48192.0x10ad8 exited with code 1] [Thread 48192.0x36e28 exited with code 0] [Thread 48192.0x52be4 exited with code 0] [Thread 48192.0x5aa40 exited with code 0] ../../src/gdb/thread.c:453: internal-error: void delete_thread_1(thread_inf o*, bool): Assertion `thr != nullptr' failed. Running the same scenario with some additional traces enabled... (gdb) set verbose (gdb) set debugevents ... allows us to understand what the issue is. To understand, we need to first look at the events received when starting the program, and in particular which threads got created how. First, we get a CREATE_PROCESS_DEBUG_EVENT for tid=0x442a8: gdb: kernel event for pid=317536 tid=0x442a8 code=CREATE_PROCESS_DEBUG_EVENT) Shortly after, we get some CREATE_THREAD_DEBUG_EVENT events, one of them being for tid=0x4010c: gdb: kernel event for pid=317536 tid=0x4010c code=CREATE_THREAD_DEBUG_EVENT) Fast forward a bit of debugging, and we do a "cont" as above, at which point the programs reaches the end, and the system reports "exit" events. The first interesting one is the following: gdb: kernel event for pid=317536 tid=0x442a8 code=EXIT_THREAD_DEBUG_EVENT) This is reporting a thread-exit event for a thread whose tid is the TID of what we call the "main thread". That's the thread that was created when we received the CREATE_PROCESS_DEBUG_EVENT notification, and whose TID is actually stored in a global variable called main_thread_id. This is not something we expected, as the assumption we made was that the main thread would exit last, and we would be notified of it via an EXIT_PROCESS_DEBUG_EVENT. But apparently, this is not always true, at least on Windows Server 2012 and 2016 where this issue has been observed happening randomly. The consequence of the above notification is that we call windows_delete_thread for that thread, which removes it from our list of known threads. And a little bit later, then we then get the EXIT_PROCESS_DEBUG_EVENT, and we can see that the associated tid is not the main_thread_id, but rather the tid of one of the threads that was created during the lifetime of the program, in this case tid=0x4010c: gdb: kernel event for pid=317536 tid=0x4010c code=EXIT_PROCESS_DEBUG_EVENT) And the debug trace printed right after shows why we're crashing: [Deleting Thread 317536.0x442a8] We are trying to delete the thread whose tid=0x442a8, which is the main_thread_id! As we have already deleted that thread before, the search for it returns a nullptr, which then trips the assertion check in delete_thread_1. This commit fixes the immediate issue; but it still ignores the open question of what to do with the main_thread_id global, particularly after that thread has been removed from our list of threads. The bigger question is kept separate from this patch, because this smaller patch seems to work well enough on its own in practice, and make the patch less complex, thus a possible candidate for backporting to a release branch (thinking of the gdb-8.3-branch, in this case). Another change that this commit makes is re-instate the thread create/ exit notifications for what we thought was the main thread. We added some filtering recently to not show those notifications. Otherwise, with what we've discovered, it is possible that the list of threads reported by the "new thread" notifications does not match the list of threads from the "thread exit" notifications. gdb/ChangeLog: * windows-nat.c (windows_add_thread): Remove parameter "main_thread_p". Adjust function accordingly. Update all callers. (windows_delete_thread): Likewise. (get_windows_debug_event) : Use current_event.dwThreadId instead of main_thread_id. Tested on both x86-windows and x86_64-windows using AdaCore's testsuite. The portion of the patch that uses current_event.dwThreadId instead of main_thread_id has undergood about a couple of weeks of nightly testing on various Windows versions. OK to master? OK for 8.3? Thanks! --- gdb/windows-nat.c | 58 ++++++++++++++++++++++++------------------------------- 1 file changed, 25 insertions(+), 33 deletions(-) diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c index 5009418..039eb13 100644 --- a/gdb/windows-nat.c +++ b/gdb/windows-nat.c @@ -430,12 +430,10 @@ thread_rec (DWORD id, int get_context) PTID is the ptid of the thread to be added. H is its Windows handle. - TLB is its thread local base. - MAIN_THREAD_P should be true if the thread to be added is - the main thread, false otherwise. */ + TLB is its thread local base. */ static windows_thread_info * -windows_add_thread (ptid_t ptid, HANDLE h, void *tlb, bool main_thread_p) +windows_add_thread (ptid_t ptid, HANDLE h, void *tlb) { windows_thread_info *th; DWORD id; @@ -456,13 +454,18 @@ windows_add_thread (ptid_t ptid, HANDLE h, void *tlb, bool main_thread_p) /* Add this new thread to the list of threads. - To be consistent with what's done on other platforms, we add - the main thread silently (in reality, this thread is really - more of a process to the user than a thread). */ - if (main_thread_p) - add_thread_silent (ptid); - else - add_thread (ptid); + On many platforms, we add the main program's thread silently. + But unfortunately, it does not seem to identify such thread + on Windows. We used to think that the main program's thread + is the thread reported by the CREATE_PROCESS_DEBUG_EVENT, but + we've seen since then that this thread sometimes exits before + other threads in the program -- when that happens, what we see + is that the CREATE_PROCESS_DEBUG_EVENT thread gets reported + as exited via an EXIT_THREAD_DEBUG_EVENT, and another thread + gets used in the EXIT_PROCESS_DEBUG_EVENT notification instead. + + So, we just add all threads verbosely. */ + add_thread (ptid); /* Set the debug registers for the new thread if they are used. */ if (debug_registers_used) @@ -503,12 +506,10 @@ windows_init_thread_list (void) /* Delete a thread from the list of threads. PTID is the ptid of the thread to be deleted. - EXIT_CODE is the thread's exit code. - MAIN_THREAD_P should be true if the thread to be deleted is - the main thread, false otherwise. */ + EXIT_CODE is the thread's exit code. */ static void -windows_delete_thread (ptid_t ptid, DWORD exit_code, bool main_thread_p) +windows_delete_thread (ptid_t ptid, DWORD exit_code) { windows_thread_info *th; DWORD id; @@ -517,16 +518,11 @@ windows_delete_thread (ptid_t ptid, DWORD exit_code, bool main_thread_p) id = ptid.tid (); - /* Emit a notification about the thread being deleted. - - Note that no notification was printed when the main thread - was created, and thus, unless in verbose mode, we should be - symetrical, and avoid that notification for the main thread - here as well. */ + /* Emit a notification about the thread being deleted. */ if (info_verbose) printf_unfiltered ("[Deleting %s]\n", target_pid_to_str (ptid).c_str ()); - else if (print_thread_events && !main_thread_p) + else if (print_thread_events) printf_unfiltered (_("[%s exited with code %u]\n"), target_pid_to_str (ptid).c_str (), (unsigned) exit_code); @@ -1411,8 +1407,7 @@ fake_create_process (void) = windows_add_thread (ptid_t (current_event.dwProcessId, 0, current_event.dwThreadId), current_event.u.CreateThread.hThread, - current_event.u.CreateThread.lpThreadLocalBase, - true /* main_thread_p */); + current_event.u.CreateThread.lpThreadLocalBase); return main_thread_id; } @@ -1584,8 +1579,7 @@ get_windows_debug_event (struct target_ops *ops, th = windows_add_thread (ptid_t (current_event.dwProcessId, 0, current_event.dwThreadId), current_event.u.CreateThread.hThread, - current_event.u.CreateThread.lpThreadLocalBase, - false /* main_thread_p */); + current_event.u.CreateThread.lpThreadLocalBase); break; @@ -1596,8 +1590,7 @@ get_windows_debug_event (struct target_ops *ops, "EXIT_THREAD_DEBUG_EVENT")); windows_delete_thread (ptid_t (current_event.dwProcessId, 0, current_event.dwThreadId), - current_event.u.ExitThread.dwExitCode, - false /* main_thread_p */); + current_event.u.ExitThread.dwExitCode); th = &dummy_thread_info; break; @@ -1617,8 +1610,7 @@ get_windows_debug_event (struct target_ops *ops, (ptid_t (current_event.dwProcessId, 0, current_event.dwThreadId), current_event.u.CreateProcessInfo.hThread, - current_event.u.CreateProcessInfo.lpThreadLocalBase, - true /* main_thread_p */); + current_event.u.CreateProcessInfo.lpThreadLocalBase); thread_id = current_event.dwThreadId; break; @@ -1637,11 +1629,11 @@ get_windows_debug_event (struct target_ops *ops, else if (saw_create == 1) { windows_delete_thread (ptid_t (current_event.dwProcessId, 0, - main_thread_id), - 0, true /* main_thread_p */); + current_event.dwThreadId), + 0); ourstatus->kind = TARGET_WAITKIND_EXITED; ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode; - thread_id = main_thread_id; + thread_id = current_event.dwThreadId; } break;