From patchwork Wed Aug 5 01:04:13 2015
X-Patchwork-Submitter: Joel Brobecker
X-Patchwork-Id: 8010
Date: Tue, 4 Aug 2015 18:04:13 -0700
From: Joel Brobecker
To: Pedro Alves
Cc: gdb-patches@sourceware.org
Subject: Re: sig != GDB_SIGNAL_0 failed assertion stepping program on GNU/Linux
Message-ID: <20150805010413.GL4777@adacore.com>
References: <20150804180745.GA13984@adacore.com> <55C10A7B.3050405@redhat.com>
In-Reply-To:
 <55C10A7B.3050405@redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)

> If the "next" is for thread 1,
>
> > That's when we get an event from a different thread (thread 3):
> >
> >     infrun: target_wait (-1.0.0, status) =
> >     infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
> >     infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
> >     infrun: TARGET_WAITKIND_STOPPED
> >     infrun: stop_pc = 0x80782d0
> >     infrun: context switch
> >     infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)
> >
> > ... which we find to be at the address where we set a breakpoint
> > on "the unwinder debug hook" (namely "_Unwind_DebugHook"). That's
> > why GDB reports for this event that this is...
> >
> >     infrun: BPSTAT_WHAT_SET_LONGJMP_RESUME
>
> Why are we getting this? longjmp/exception/step-resume breakpoints
> are thread-specific.
>
> I'd guess that the bug is in bpstat_what:
>
> struct bpstat_what
> bpstat_what (bpstat bs_head)
> {
> ...
>         case bp_longjmp:
>         case bp_longjmp_call_dummy:
>         case bp_exception:
>           this_action = BPSTAT_WHAT_SET_LONGJMP_RESUME;
>           retval.is_longjmp = bptype != bp_exception;
>           break;
> ...
>
> This bit is not considering "if (bs->stop)" like e.g.,
> the bp_step_resume case.
>
> I've seen something like this trigger before, and have a patch
> somewhere to rewrite bpstat_what differently which fixes that.
> I never managed to write a testcase for it so never submitted
> it. But, could you try the simpler approach? Try making that:
>
>           if (bs->stop)
>             {
>               this_action = BPSTAT_WHAT_SET_LONGJMP_RESUME;
>               retval.is_longjmp = bptype != bp_exception;
>             }
>           else
>             this_action = BPSTAT_WHAT_SINGLE;
>           break;

Ah ha, I missed the fact that the exception breakpoint is
thread-specific. Your fix seems to be working very well; thanks
for the suggestion, Pedro!

Attached is a patch with a slightly altered analysis as the revision
log. Our SuSE 10 machine is very slow, so I tested it on a more modern
machine with a slightly different distro.

I'm wondering if we shouldn't be doing the same for:

        case bp_longjmp_resume:
        case bp_exception_resume:
          this_action = BPSTAT_WHAT_CLEAR_LONGJMP_RESUME;
          retval.is_longjmp = bptype == bp_longjmp_resume;
          break;

gdb/ChangeLog:

        Pedro Alves

        * breakpoint.c (bpstat_what) : Correctly handle the case
        where BS->STOP is not set.

Thanks!

From 8ff769070f12eafd1b858a63a184a4be9f9a6500 Mon Sep 17 00:00:00 2001
From: Pedro Alves
Date: Tue, 4 Aug 2015 23:40:08 +0200
Subject: [PATCH] sig != GDB_SIGNAL_0 failed assertion stepping program on GNU/Linux

Trying to next/step a program on GNU/Linux sometimes results in
the following failed assertion:

    % gdb -q .obj/gprof/main
    (gdb) start
    (gdb) n
    (gdb) step
    [...]/infrun.c:2391: internal-error: resume: Assertion `sig != GDB_SIGNAL_0' failed.

What happens is that, during the "next" operation, GDB hits
a longjmp/exception/step-resume breakpoint but fails to see that
this breakpoint was set for a different thread than the one being
stepped.

More precisely, at the end of the "start" command, we are stopped
at the start of function Main in main.adb; there are 4 threads in
total, and we are in the main thread (which is thread 1):

    (gdb) info thread
      Id   Target Id         Frame
      4    Thread 0xb7a56ba0 (LWP 28379) 0xffffe410 in __kernel_vsyscall ()
      3    Thread 0xb7c5aba0 (LWP 28378) 0xffffe410 in __kernel_vsyscall ()
      2    Thread 0xb7e5eba0 (LWP 28377) 0xffffe410 in __kernel_vsyscall ()
    * 1    Thread 0xb7ea18c0 (LWP 28370) main () at /[...]/main.adb:57

All the logs below reference Thread ID/LWP, but I think it'll be easier
to talk about the threads by thread number. For instance, thread 1
is LWP 28370 while thread 3 is LWP 28378. So, I will translate
the LWPs into thread numbers in my explanations.

Back to what happens while we are trying to "next" our program:

    (gdb) n
    infrun: clear_proceed_status_thread (Thread 0xb7a56ba0 (LWP 28379))
    infrun: clear_proceed_status_thread (Thread 0xb7c5aba0 (LWP 28378))
    infrun: clear_proceed_status_thread (Thread 0xb7e5eba0 (LWP 28377))
    infrun: clear_proceed_status_thread (Thread 0xb7ea18c0 (LWP 28370))
    infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT)
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x805451e
    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28370.0 [Thread 0xb7ea18c0 (LWP 28370)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x8054523

We've resumed thread 1 (LWP 28370), and received in return a signal
that the same thread stopped slightly further along. It's still in
the range of instructions for the line of source we started the "next"
from, as evidenced by the following trace...

    infrun: stepping inside range [0x805451e-0x8054531]

... and thus, we decide to continue stepping the same thread:

    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523
    infrun: prepare_to_wait

That's when we get an event from a different thread (thread 3)...

    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x80782d0
    infrun: context switch
    infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)

... which we find to be at the address where we set a breakpoint
on "the unwinder debug hook" (namely "_Unwind_DebugHook"). But GDB
fails to notice that the breakpoint was inserted for thread 1 only,
and so decides to handle it as...

    infrun: BPSTAT_WHAT_SET_LONGJMP_RESUME

...
and inserts a breakpoint at the corresponding resume address, as
evidenced by the next log:

    infrun: exception resume at 80542a2

That breakpoint seems innocent right now, but will play a role fairly
quickly. But for now, GDB has inserted the exception-resume
breakpoint, and needs to single-step thread 3 past the breakpoint it
just hit. Thus, it temporarily disables the exception breakpoint,
and requests a step of that thread:

    infrun: skipping breakpoint: stepping past insn at: 0x80782d0
    infrun: skipping breakpoint: stepping past insn at: 0x80782d0
    infrun: skipping breakpoint: stepping past insn at: 0x80782d0
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 0xb7c5aba0 (LWP 28378)] at 0x80782d0
    infrun: prepare_to_wait

We then get a notification, still from thread 3, that it's now past
that breakpoint...

    infrun: prepare_to_wait
    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x8078424

... so we can resume what we were doing before, which is
single-stepping thread 1 until we get to a new line of code:

    infrun: switching back to stepped thread
    infrun: Switching context from Thread 0xb7c5aba0 (LWP 28378) to Thread 0xb7ea18c0 (LWP 28370)
    infrun: expected thread still hasn't advanced
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523

The "resume" log above shows that we're resuming thread 1 from
where we left off (0x8054523). We get one more stop at 0x8054529,
which is still inside our stepping range, so we go again.
That's when we get the following event, from thread 3:

    infrun: prepare_to_wait
    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x80542a2

Now the stop_pc address is interesting, because it's the address
of the "exception resume" breakpoint. When GDB sees this, it reports...

    infrun: context switch
    infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)
    infrun: BPSTAT_WHAT_CLEAR_LONGJMP_RESUME

... and since the location is at a different line of code, this is
where it decides the "next" operation should stop:

    infrun: stop_waiting
    [Switching to Thread 0xb7c5aba0 (LWP 28378)]
    0x080542a2 in inte_tache_rt.ttache_rt (
        <_task>=0x80968ec ) at /[...]/inte_tache_rt.adb:54
    54            end loop;

Instead, what GDB should be doing is noticing that the exception
breakpoint we hit was for a different thread, single-stepping that
thread past the breakpoint _without_ inserting the exception-resume
breakpoint, and then resuming the single-stepping of the initial
thread (thread 1) until it leaves the stepping range.

This is what this patch does, and after applying it, GDB now
correctly stops on the next line of code.

gdb/ChangeLog:

        Pedro Alves

        * breakpoint.c (bpstat_what) : Correctly handle the case
        where BS->STOP is not set.

Tested on x86_64-linux, no regressions.
---
 gdb/breakpoint.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index 74c7a7b..da4ee82 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -5778,8 +5778,13 @@ bpstat_what (bpstat bs_head)
         case bp_longjmp:
         case bp_longjmp_call_dummy:
         case bp_exception:
-          this_action = BPSTAT_WHAT_SET_LONGJMP_RESUME;
-          retval.is_longjmp = bptype != bp_exception;
+          if (bs->stop)
+            {
+              this_action = BPSTAT_WHAT_SET_LONGJMP_RESUME;
+              retval.is_longjmp = bptype != bp_exception;
+            }
+          else
+            this_action = BPSTAT_WHAT_SINGLE;
           break;
         case bp_longjmp_resume:
         case bp_exception_resume:
-- 
1.7.10.4
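
For readers who want to poke at the decision outside of GDB, here is
a minimal standalone sketch of what the hunk above changes. It is
not GDB code: every model_* name is hypothetical, and the "stop" field
only imitates bs->stop by assuming it is false exactly when the
breakpoint was set for a different thread than the one reporting the
event (GDB computes bs->stop itself, and this model does not reproduce
that). The honor_stop flag selects between the old behavior (ignore
the flag) and the patched behavior (check it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GDB's internal types.  */
enum model_bptype
{
  MODEL_BP_LONGJMP,
  MODEL_BP_LONGJMP_CALL_DUMMY,
  MODEL_BP_EXCEPTION,
  MODEL_BP_OTHER
};

enum model_action
{
  MODEL_KEEP_CHECKING,
  MODEL_SINGLE,
  MODEL_SET_LONGJMP_RESUME
};

/* One breakpoint hit.  STOP plays the role of bs->stop; in this
   model (an assumption) it is false when the breakpoint belongs to
   a thread other than the one that reported the event.  */
struct model_hit
{
  enum model_bptype type;
  bool stop;
};

struct model_what
{
  enum model_action action;
  bool is_longjmp;
};

/* Build a hit for a breakpoint set for BP_THREAD and reported by
   EVENT_THREAD.  */
static struct model_hit
model_make_hit (enum model_bptype type, int bp_thread, int event_thread)
{
  struct model_hit hit = { type, bp_thread == event_thread };
  return hit;
}

/* Decide what to do with HIT.  With HONOR_STOP false this mirrors
   the code before the patch; with HONOR_STOP true, the code after.  */
static struct model_what
model_what (const struct model_hit *hit, bool honor_stop)
{
  struct model_what retval = { MODEL_KEEP_CHECKING, false };

  assert (hit != NULL);
  switch (hit->type)
    {
    case MODEL_BP_LONGJMP:
    case MODEL_BP_LONGJMP_CALL_DUMMY:
    case MODEL_BP_EXCEPTION:
      if (!honor_stop || hit->stop)
        {
          retval.action = MODEL_SET_LONGJMP_RESUME;
          retval.is_longjmp = hit->type != MODEL_BP_EXCEPTION;
        }
      else
        retval.action = MODEL_SINGLE;   /* Just step past it.  */
      break;
    case MODEL_BP_OTHER:
      break;
    }
  return retval;
}
```

Feeding it the scenario from the log (an exception breakpoint set for
thread 1, hit by thread 3) gives MODEL_SET_LONGJMP_RESUME without the
check and MODEL_SINGLE with it, which matches the behavior change
described in the revision log.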