Message ID | gerrit.1580307037000.Ibe1f29251fe2ff1c1991f041babbe18373c113b1@gnutoolchain-gerrit.osci.io |
---|---|
State | New, archived |
Headers |
Received: (qmail 64280 invoked by alias); 29 Jan 2020 14:10:50 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 64103 invoked by uid 89); 29 Jan 2020 14:10:44 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3 autolearn=ham version=3.3.1 spammy=hanging, 24938, gdb_versions X-HELO: mx1.osci.io Received: from polly.osci.io (HELO mx1.osci.io) (8.43.85.229) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 29 Jan 2020 14:10:42 +0000 Received: by mx1.osci.io (Postfix, from userid 994) id D400320256; Wed, 29 Jan 2020 09:10:38 -0500 (EST) Received: from gnutoolchain-gerrit.osci.io (gnutoolchain-gerrit.osci.io [IPv6:2620:52:3:1:5054:ff:fe06:16ca]) by mx1.osci.io (Postfix) with ESMTP id AAFFB20179 for <gdb-patches@sourceware.org>; Wed, 29 Jan 2020 09:10:37 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by gnutoolchain-gerrit.osci.io (Postfix) with ESMTP id 794B320AF7 for <gdb-patches@sourceware.org>; Wed, 29 Jan 2020 09:10:37 -0500 (EST) X-Gerrit-PatchSet: 1 Date: Wed, 29 Jan 2020 09:10:37 -0500 From: "Tom de Vries (Code Review)" <gerrit@gnutoolchain-gerrit.osci.io> To: gdb-patches@sourceware.org Message-ID: <gerrit.1580307037000.Ibe1f29251fe2ff1c1991f041babbe18373c113b1@gnutoolchain-gerrit.osci.io> Auto-Submitted: auto-generated X-Gerrit-MessageType: newchange Subject: [review] [gdb/threads] Fix hang in stop_all_threads after killing inferior X-Gerrit-Change-Id: Ibe1f29251fe2ff1c1991f041babbe18373c113b1 X-Gerrit-Change-Number: 759 X-Gerrit-ChangeURL: <https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759> X-Gerrit-Commit: fe2db3d5c7f974e5387a3d2649e4c8f2d2a44bda References: <gerrit.1580307037000.Ibe1f29251fe2ff1c1991f041babbe18373c113b1@gnutoolchain-gerrit.osci.io> Reply-To: tdevries@suse.de, gdb-patches@sourceware.org MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/3.0.3-79-g83ff7f88f1 Content-Type: text/plain; charset=UTF-8 |
Commit Message
Simon Marchi (Code Review)
Jan. 29, 2020, 2:10 p.m. UTC
Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 ...................................................................... [gdb/threads] Fix hang in stop_all_threads after killing inferior Consider a two-threaded testcase a.out, sleeping in both its threads: ... $ gdb -ex r --args a.out Reading symbols from a.out... Starting program: /data/gdb_versions/devel/a.out [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". [New Thread 0x7ffff77fe700 (LWP 31268)] ... Typing ^C causing stop_all_threads to be executed, and if an external SIGKILL (such as caused by killall -9 a.out) arrives at the start of stop_all_threads, gdb hangs in stop_all_threads after giving this warning: ... warning: unable to open /proc file '/proc/24938/status' ... Using "set debug infrun 1" we can see in more detail where we hang: ... infrun: stop_all_threads infrun: stop_all_threads, pass=0, iterations=0 infrun: Thread 0x7ffff7fa6740 (LWP 10264) not executing infrun: Thread 0x7ffff77fe700 (LWP 10268) executing, need stop infrun: target_wait (-1.0.0, status) = infrun: 10264.10268.0 [Thread 0x7ffff77fe700 (LWP 10268)], infrun: status->kind = signalled, signal = GDB_SIGNAL_KILL infrun: stop_all_threads status->kind = signalled, signal = GDB_SIGNAL_KILL \ Thread 0x7ffff77fe700 (LWP 10268) infrun: Thread 0x7ffff7fa6740 (LWP 10264) not executing infrun: Thread 0x7ffff77fe700 (LWP 10268) executing, already stopping warning: unable to open /proc file '/proc/10264/status' infrun: target_wait (-1.0.0, status) = infrun: -1.0.0 [process -1], infrun: status->kind = no-resumed infrun: infrun_async(0) infrun: stop_all_threads status->kind = no-resumed process -1 infrun: Thread 0x7ffff7fa6740 (LWP 10264) not executing infrun: Thread 0x7ffff77fe700 (LWP 10268) executing, already stopping infrun: stop_all_threads status->kind = no-resumed process -1 infrun: Thread 0x7ffff7fa6740 (LWP 10264) not executing infrun: Thread 0x7ffff77fe700 (LWP 10268) executing, already stopping infrun: stop_all_threads status->kind = no-resumed process -1 infrun: Thread 0x7ffff7fa6740 (LWP 10264) not executing infrun: Thread 0x7ffff77fe700 (LWP 10268) executing, already stopping <repeat> ...... So, we're hanging in the 'while (1)' loop in stop_all_threads as follows: - thread t is tested, and both t->executing and t->stop_requested are found to be 1 - consequently need_wait is set 1 - consequently wait_one is executed - wait_one returns a TARGET_WAITKIND_NO_RESUMED event, which is handled by continuing at the start of the loop The loop actually starts with update_thread_list (), but that doesn't seem to change the state of the threads. Fix the hang by detecting the first sign of trouble: the TARGET_WAITKIND_SIGNALLED event with signal GDB_SIGNAL_KILL, and breaking out of the loop. Build and reg-tested on x86_64-linux. gdb/ChangeLog: 2020-01-29 Tom de Vries <tdevries@suse.de> PR threads/25478 * infrun.c (stop_all_threads): Return when detecting event TARGET_WAITKIND_SIGNALLED with signal GDB_SIGNAL_KILL. Change-Id: Ibe1f29251fe2ff1c1991f041babbe18373c113b1 --- M gdb/infrun.c 1 file changed, 5 insertions(+), 1 deletion(-)
Comments
Mihails Strasuns has posted comments on this change. Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 ...................................................................... Patch Set 1: This seems very similar to https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133 - I don't think there should be any logic specific to SIGKILL here. It seems like a general race condition for early termination while `stop_all_threads` is being executed.
Tankut Baris Aktemur has posted comments on this change.
Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759
......................................................................
Patch Set 1:
> FWIW, I looked over the review comments there, and noticed the suggestion to "leave the TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED event pending"
For the continuation of the discussion, please see
https://sourceware.org/ml/gdb-patches/2020-01/msg00212.html
Tom de Vries has posted comments on this change. Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 ...................................................................... Patch Set 1: > FWIW, I looked over the review comments there, and noticed the suggestion to "leave the TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED event pending", which I tried using this additional patch: > ... > diff --git a/gdb/infrun.c b/gdb/infrun.c > index e34ddc83b45..c1035c25d7f 100644 > --- a/gdb/infrun.c > +++ b/gdb/infrun.c > @@ -4774,7 +4774,11 @@ stop_all_threads (void) > > if (event.ws.kind == TARGET_WAITKIND_SIGNALLED > && event.ws.value.sig == GDB_SIGNAL_KILL) > - goto done; > + { > + thread_info *t = find_thread_ptid (event.target, event.ptid); > + save_waitstatus (t, &event.ws); > + goto done; > + } > else if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED > || event.ws.kind == TARGET_WAITKIND_THREAD_EXITED > || event.ws.kind == TARGET_WAITKIND_EXITED > ... > > But I didn't notice any difference in behaviour. I've realized from another use of save_waitstatus that I have to set t->resumed in order to have the saved wait status noticed. With this additional patch, I did manage to observe a difference in behaviour: ... @@ -4777,6 +4777,7 @@ stop_all_threads (void) { thread_info *t = find_thread_ptid (event.target, event.ptid); save_waitstatus (t, &event.ws); + t->resumed = 1; goto done; } else if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED ... The difference is in what happens when I continue after the ^C is handled: - without the two patches gdb hangs, until I press enter, after which I get the prompt - with the two patches, gdb just presents the prompt
Tankut Baris Aktemur has posted comments on this change. Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 ...................................................................... Patch Set 2: (1 comment) | --- gdb/infrun.c | +++ gdb/infrun.c | @@ -4767,11 +4767,19 @@ stop_all_threads (void) | if (debug_infrun) | { | fprintf_unfiltered (gdb_stdlog, | "infrun: stop_all_threads %s %s\n", | target_waitstatus_to_string (&event.ws).c_str (), | target_pid_to_str (event.ptid).c_str ()); | } | | - if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED | + if (event.ws.kind == TARGET_WAITKIND_SIGNALLED | + && event.ws.value.sig == GDB_SIGNAL_KILL) PS2, Line 4776: Why only this case? Wouldn't the same problem be seen if, say, a different signal or an exited status was received? | + { | + thread_info *t = find_thread_ptid (event.target, event.ptid); | + save_waitstatus (t, &event.ws); | + t->resumed = 1; | + goto done; | + } | + else if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED | || event.ws.kind == TARGET_WAITKIND_THREAD_EXITED | || event.ws.kind == TARGET_WAITKIND_EXITED
Tom de Vries has posted comments on this change. Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 ...................................................................... Patch Set 2: (1 comment) | --- gdb/infrun.c | +++ gdb/infrun.c | @@ -4767,11 +4767,19 @@ stop_all_threads (void) | if (debug_infrun) | { | fprintf_unfiltered (gdb_stdlog, | "infrun: stop_all_threads %s %s\n", | target_waitstatus_to_string (&event.ws).c_str (), | target_pid_to_str (event.ptid).c_str ()); | } | | - if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED | + if (event.ws.kind == TARGET_WAITKIND_SIGNALLED | + && event.ws.value.sig == GDB_SIGNAL_KILL) PS2, Line 4776: For now I'm just trying to solve the PR in the absolute minimal way. | + { | + thread_info *t = find_thread_ptid (event.target, event.ptid); | + save_waitstatus (t, &event.ws); | + t->resumed = 1; | + goto done; | + } | + else if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED | || event.ws.kind == TARGET_WAITKIND_THREAD_EXITED | || event.ws.kind == TARGET_WAITKIND_EXITED
Tom de Vries has posted comments on this change. Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 ...................................................................... Patch Set 2: (1 comment) | --- gdb/infrun.c | +++ gdb/infrun.c | @@ -4773,10 +4773,18 @@ stop_all_threads (void) | } | | - if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED | + if (event.ws.kind == TARGET_WAITKIND_SIGNALLED | + && event.ws.value.sig == GDB_SIGNAL_KILL) | + { | + thread_info *t = find_thread_ptid (event.target, event.ptid); | + save_waitstatus (t, &event.ws); | + t->resumed = 1; | + goto done; PS2, Line 4781: I've realized that the 'goto done' is not a good idea, since all_non_exited_threads can iterate over more than one inferior. I'll upload a new version that removes the 'goto done. | + } | + else if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED | || event.ws.kind == TARGET_WAITKIND_THREAD_EXITED | || event.ws.kind == TARGET_WAITKIND_EXITED | || event.ws.kind == TARGET_WAITKIND_SIGNALLED) | { | /* All resumed threads exited | or one thread/process exited/signalled. */ | }
Tankut Baris Aktemur has posted comments on this change. Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 ...................................................................... Patch Set 3: I've tried this patch in two scenarios. One is when multi-threading is enabled (executable is multi.out), and the other is when the program is single-threaded (executable is named single1.out and single2.out -- exactly the same copies just to distinguish the processes). The test program is sleep.c from the PR. I start the multi-threaded case as follows: $ gdb -ex "set prompt (master-gdb) " -ex "b stop_all_threads" -ex r -ex \ "shell killall -9 multi.out" -ex c --args gdb -ex "set debug infrun 1" \ -ex r --args multi.out When gdb stops, I try "info inferiors" and "info threads" commands: Thread 1 received signal SIGINT, Interrupt. 0x00007ffff78a99d0 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 infrun: infrun_async(0) (gdb) i inferiors Num Description Connection Executable * 1 process 12979 1 (native) /path/to/multi.out (gdb) i threads Id Target Id Frame * 1 Thread 0x7ffff7fe3740 (LWP 12979) 0x00007ffff78a99d0 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 2 Thread 0x7ffff77c4700 (LWP 12996) Couldn't get registers: No such process. (gdb) thread 2 [Switching to thread 2 (Thread 0x7ffff77c4700 (LWP 12996))] Couldn't get registers: No such process. (gdb) i threads Couldn't get registers: No such process. (gdb) thread 1 Couldn't get registers: No such process. The second scenario is similar, except that instead of a multi-threaded single process, we have two single-threaded processes. Started as $ gdb -ex "set prompt (master-gdb) " -ex "b stop_all_threads" -ex r -ex c \ -ex c -ex "shell killall -9 single1.out" -ex c --args gdb -ex "set debug infrun 1" \ -ex start -ex "add-inferior -exec ./single2.out" -ex "inferior 2" -ex "start" \ -ex "set schedule-multiple on" -ex c --args ./single1.out The behavior is this: (gdb) info inferiors Num Description Connection Executable 1 process 2266 1 (native) /path/to/single1.out * 2 process 2282 1 (native) /path/to/single2.out (gdb) info threads Id Target Id Frame 1.1 process 2266 Couldn't get registers: No such process. (gdb) inferior 1 [Switching to inferior 1 [process 2266] (/path/to/single1.out)] [Switching to thread 1.1 (process 2266)] Couldn't get registers: No such process. (gdb) info inferiors Num Description Connection Executable Couldn't get registers: No such process. (gdb) info threads Couldn't get registers: No such process. (gdb) So, I think at this point it boils down to the discussion covered in https://sourceware.org/ml/gdb-patches/2020-01/msg00212.html and https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/133 GDB is not able to handle disappeared processes gracefully. The maintainers may consider the fix for the hanging behavior (i.e. the infinite loop) OK and the weird post-behavior above as a separate problem to be addressed later, but my understanding from Pedro's comment was that the deeper problem of GDB not handling already-gone processes well shall be addressed. Thanks -Baris
Tom de Vries has posted comments on this change. Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 ...................................................................... Patch Set 3: > Patch Set 3: > GDB is not able to handle disappeared processes gracefully. Ack. > The > maintainers may consider the fix for the hanging behavior (i.e. the > infinite loop) OK and the weird post-behavior above as a separate problem That's my understanding from Pedro's comment: the weird post-behaviour is a pre-existing issue, independent of the hang that this patch fixes.
Tom de Vries has posted comments on this change. Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/759 ...................................................................... Patch Set 4: Added test-case.
diff --git a/gdb/infrun.c b/gdb/infrun.c index 22de42c..e34ddc8 100644 --- a/gdb/infrun.c +++ b/gdb/infrun.c @@ -4772,7 +4772,10 @@ target_pid_to_str (event.ptid).c_str ()); } - if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED + if (event.ws.kind == TARGET_WAITKIND_SIGNALLED + && event.ws.value.sig == GDB_SIGNAL_KILL) + goto done; + else if (event.ws.kind == TARGET_WAITKIND_NO_RESUMED || event.ws.kind == TARGET_WAITKIND_THREAD_EXITED || event.ws.kind == TARGET_WAITKIND_EXITED || event.ws.kind == TARGET_WAITKIND_SIGNALLED) @@ -4872,6 +4875,7 @@ } } + done: if (debug_infrun) fprintf_unfiltered (gdb_stdlog, "infrun: stop_all_threads done\n"); }