btrace: avoid tp != NULL assertion
Commit Message
On 03/03/2015 12:25 PM, Metzger, Markus T wrote:
>> -----Original Message-----
>> From: Pedro Alves [mailto:palves@redhat.com]
>> Sent: Tuesday, March 3, 2015 12:55 PM
>> To: Metzger, Markus T
>> Cc: gdb-patches@sourceware.org
>> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion
>
>
>> Frame #7 where we convert the ptid to something inf-ptrace.c understands.
>>
>> The fix is just to stop losing information. This also improves
>> performance a tiny bit, as currently we fetch registers out of ptrace
>> twice, once for {pid,lwp} in linux-nat.c, and another for {pid,0} in
>> inf-ptrace/i386-linux-nat.c. We'll now use a single regcache object
>> everywhere.
>>
>> Please try the patch below.
>
> It works. Thanks. I'll drop my patch, then.
Alright, I've applied it now, as below.
> No, that wasn't the reason for replacing the assert. There are no such
> errors in the gdb.btrace suite (which is mostly single-threaded) with my
> patch and I have not seen any such errors otherwise, either.
Then it sounds like we're either lacking basic tests, or the threaded tests
are somehow not running correctly when gdb is a 32-bit program. I think
that if you step any non-leader thread, you should see it happen.
Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
caught it. My machine doesn't do btrace, so I can't try it myself...
BTW, did any existing test in the testsuite catch the assertion we're
fixing?
--------
From 90ad5e1d4f34d02f437ec12d1b65d7252f5b7f1c Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 3 Mar 2015 13:33:44 +0000
Subject: [PATCH] Linux/ptrace: don't convert ptids when asking inf-ptrace
layer to resume LWP
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Ref: https://sourceware.org/ml/gdb-patches/2015-03/msg00060.html
The record-btrace target can hit an assertion here:
Breakpoint 1, record_btrace_fetch_registers (ops=0x974bfc0 <record_btrace_ops>,
regcache=0x9a0a798, regno=8) at gdb/record-btrace.c:1202
1202 gdb_assert (tp != NULL);
(gdb) p regcache->ptid
$3 = {pid = 23856, lwp = 0, tid = 0}
The problem is that the linux-nat layer converts the ptid to a
single-process ptid before passing the request down to the inf-ptrace
layer, which loses information, and then record-btrace can't find the
corresponding thread in GDB's thread list:
(gdb) bt
#0 record_btrace_fetch_registers (ops=0x974bfc0 <record_btrace_ops>, regcache=0x9a0a798, regno=8)
at gdb/record-btrace.c:1202
#1 0x083f4ee2 in delegate_fetch_registers (self=0x974bfc0 <record_btrace_ops>, arg1=0x9a0a798,
arg2=8) at gdb/target-delegates.c:149
#2 0x08406562 in target_fetch_registers (regcache=0x9a0a798, regno=8)
at gdb/target.c:3279
#3 0x08355255 in regcache_raw_read (regcache=0x9a0a798, regnum=8,
buf=0xbfffe6c0 "¨\003\222\tÀ8kIøæÿ¿HO5\b\035]")
at gdb/regcache.c:643
#4 0x083558a7 in regcache_cooked_read (regcache=0x9a0a798, regnum=8,
buf=0xbfffe6c0 "¨\003\222\tÀ8kIøæÿ¿HO5\b\035]")
at gdb/regcache.c:734
#5 0x08355de3 in regcache_cooked_read_unsigned (regcache=0x9a0a798, regnum=8, val=0xbfffe738)
at gdb/regcache.c:838
#6 0x0827a106 in i386_linux_resume (ops=0x9737ca0 <linux_ops_saved>, ptid=..., step=1,
signal=GDB_SIGNAL_0) at gdb/i386-linux-nat.c:670
#7 0x08280c12 in linux_resume_one_lwp (lp=0x9a0a5b8, step=1, signo=GDB_SIGNAL_0)
at gdb/linux-nat.c:1529
#8 0x08281281 in linux_nat_resume (ops=0x98da608, ptid=..., step=1, signo=GDB_SIGNAL_0)
at gdb/linux-nat.c:1708
#9 0x0850738e in record_btrace_resume (ops=0x98da608, ptid=..., step=1, signal=GDB_SIGNAL_0)
at gdb/record-btrace.c:1760
...
The fix is just to not lose information, and let the intact ptid reach
record-btrace.c.
Tested on x86-64 Fedora 20, -m32.
gdb/ChangeLog:
2015-03-03 Pedro Alves <palves@redhat.com>
* i386-linux-nat.c (i386_linux_resume): Get the ptrace PID out of
the lwp field of ptid. Pass the full ptid to get_thread_regcache.
* inf-ptrace.c (get_ptrace_pid): New function.
(inf_ptrace_resume): Use it.
* linux-nat.c (linux_resume_one_lwp): Pass the LWP's ptid ummodified
to the lower layer.
---
gdb/ChangeLog | 9 +++++++++
gdb/i386-linux-nat.c | 5 ++---
gdb/inf-ptrace.c | 25 ++++++++++++++++++++++---
gdb/linux-nat.c | 6 +-----
4 files changed, 34 insertions(+), 11 deletions(-)
Comments
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-
> owner@sourceware.org] On Behalf Of Pedro Alves
> Sent: Tuesday, March 3, 2015 2:42 PM
> To: Metzger, Markus T
> Cc: gdb-patches@sourceware.org
> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion
> > No, that wasn't the reason for replacing the assert. There are no such
> > errors in the gdb.btrace suite (which is mostly single-threaded) with my
> > patch and I have not seen any such errors otherwise, either.
>
> Then it sounds like we're either lacking basic tests, or the threaded tests
> are somehow not running correctly when gdb is a 32-bit program. I think
> that if you step any non-leader thread, you should see it happen.
> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
> caught it. My machine doesn't do btrace, so I can't try it myself...
>
> BTW, did any existing test in the testsuite catch the assertion we're
> fixing?
Almost all of them when run on 32-bit systems; -m32 on 64-bit systems does
not catch this.
Regards,
Markus.
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen, Deutschland
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Christian Lamprechter, Hannes Schwaderer, Douglas Lusk
Registergericht: Muenchen HRB 47456
Ust.-IdNr./VAT Registration No.: DE129385895
Citibank Frankfurt a.M. (BLZ 502 109 00) 600119052
On 03/03/2015 01:55 PM, Metzger, Markus T wrote:
>> -----Original Message-----
>> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-
>> owner@sourceware.org] On Behalf Of Pedro Alves
>> Sent: Tuesday, March 3, 2015 2:42 PM
>> To: Metzger, Markus T
>> Cc: gdb-patches@sourceware.org
>> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion
>
>
>>> No, that wasn't the reason for replacing the assert. There are no such
>>> errors in the gdb.btrace suite (which is mostly single-threaded) with my
>>> patch and I have not seen any such errors otherwise, either.
>>
>> Then it sounds like we're either lacking basic tests, or the threaded tests
>> are somehow not running correctly when gdb is a 32-bit program. I think
>> that if you step any non-leader thread, you should see it happen.
>> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
>> caught it. My machine doesn't do btrace, so I can't try it myself...
>>
>> BTW, did any existing test in the testsuite catch the assertion we're
>> fixing?
>
> Almost all of them when run on 32-bit systems; -m32 on 64-bit systems does
> not catch this.
Right, that's why I said "when gdb is a 32-bit program". Sounds like
no existing test tries a "step" when not replaying then. It'd be very
nice to have one. Can I convince you to add one? :-)
Thanks,
Pedro Alves
> -----Original Message-----
> From: Pedro Alves [mailto:palves@redhat.com]
> Sent: Tuesday, March 3, 2015 3:03 PM
> To: Metzger, Markus T
> Cc: gdb-patches@sourceware.org
> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion
> >>> No, that wasn't the reason for replacing the assert. There are no such
> >>> errors in the gdb.btrace suite (which is mostly single-threaded) with my
> >>> patch and I have not seen any such errors otherwise, either.
> >>
> >> Then it sounds like we're either lacking basic tests, or the threaded tests
> >> are somehow not running correctly when gdb is a 32-bit program. I think
> >> that if you step any non-leader thread, you should see it happen.
> >> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
> >> caught it. My machine doesn't do btrace, so I can't try it myself...
> >>
> >> BTW, did any existing test in the testsuite catch the assertion we're
> >> fixing?
> >
> > Almost all of them when run on 32-bit systems; -m32 on 64-bit systems
> does
> > not catch this.
>
> Right, that's why I said "when gdb is a 32-bit program". Sounds like
> no existing test tries a "step" when not replaying then. It'd be very
> nice to have one. Can I convince you to add one? :-)
The multi-thread-step.exp test does not catch it because it uses "cont",
which works fine. When I add a "step" before the "cont", I get the
"No thread" error when using my old patch instead of your new patch.
Or I get the assert when using neither my old nor your new patch.
But then, I got the assert already on other tests.
With my patch dropped and your patch committed, what is the new
test expected to catch?
Regards,
Markus.
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen, Deutschland
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Christian Lamprechter, Hannes Schwaderer, Douglas Lusk
Registergericht: Muenchen HRB 47456
Ust.-IdNr./VAT Registration No.: DE129385895
Citibank Frankfurt a.M. (BLZ 502 109 00) 600119052
On 03/03/2015 02:44 PM, Metzger, Markus T wrote:
>> -----Original Message-----
>> From: Pedro Alves [mailto:palves@redhat.com]
>> Sent: Tuesday, March 3, 2015 3:03 PM
>> To: Metzger, Markus T
>> Cc: gdb-patches@sourceware.org
>> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion
>
>
>>>>> No, that wasn't the reason for replacing the assert. There are no such
>>>>> errors in the gdb.btrace suite (which is mostly single-threaded) with my
>>>>> patch and I have not seen any such errors otherwise, either.
>>>>
>>>> Then it sounds like we're either lacking basic tests, or the threaded tests
>>>> are somehow not running correctly when gdb is a 32-bit program. I think
>>>> that if you step any non-leader thread, you should see it happen.
>>>> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
>>>> caught it. My machine doesn't do btrace, so I can't try it myself...
>>>>
>>>> BTW, did any existing test in the testsuite catch the assertion we're
>>>> fixing?
>>>
>>> Almost all of them when run on 32-bit systems; -m32 on 64-bit systems
>> does
>>> not catch this.
>>
>> Right, that's why I said "when gdb is a 32-bit program". Sounds like
>> no existing test tries a "step" when not replaying then. It'd be very
>> nice to have one. Can I convince you to add one? :-)
>
> The multi-thread-step.exp test does not catch it because it uses "cont",
> which works fine. When I add a "step" before the "cont", I get the
> "No thread" error when using my old patch instead of your new patch.
> Or I get the assert when using neither my old nor your new patch.
> But then, I got the assert already on other tests.
>
> With my patch dropped and your patch committed, what is the new
> test expected to catch?
You're getting me confused...
The test was expected to catch the assertion, given that apparently
no other test was catching it -- from the dialog above, one understands
no test would be catching this before (that's what I explicitly
asked), but now you're saying the opposite.
If indeed there are already tests that triggered the
error/internal-error before the fix, then I agree a new
test is not necessary.
> -----Original Message-----
> From: Pedro Alves [mailto:palves@redhat.com]
> Sent: Tuesday, March 3, 2015 4:25 PM
> To: Metzger, Markus T
> Cc: gdb-patches@sourceware.org
> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion
> >>>>> No, that wasn't the reason for replacing the assert. There are no such
> >>>>> errors in the gdb.btrace suite (which is mostly single-threaded) with
> my
> >>>>> patch and I have not seen any such errors otherwise, either.
> >>>>
> >>>> Then it sounds like we're either lacking basic tests, or the threaded
> tests
> >>>> are somehow not running correctly when gdb is a 32-bit program. I
> think
> >>>> that if you step any non-leader thread, you should see it happen.
> >>>> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should
> have
> >>>> caught it. My machine doesn't do btrace, so I can't try it myself...
> >>>>
> >>>> BTW, did any existing test in the testsuite catch the assertion we're
> >>>> fixing?
> >>>
> >>> Almost all of them when run on 32-bit systems; -m32 on 64-bit systems
> >> does
> >>> not catch this.
> >>
> >> Right, that's why I said "when gdb is a 32-bit program". Sounds like
> >> no existing test tries a "step" when not replaying then. It'd be very
> >> nice to have one. Can I convince you to add one? :-)
> >
> > The multi-thread-step.exp test does not catch it because it uses "cont",
> > which works fine. When I add a "step" before the "cont", I get the
> > "No thread" error when using my old patch instead of your new patch.
> > Or I get the assert when using neither my old nor your new patch.
> > But then, I got the assert already on other tests.
> >
> > With my patch dropped and your patch committed, what is the new
> > test expected to catch?
>
> You're getting me confused...
>
> The test was expected to catch the assertion, given that apparently
> no other test was catching it -- from the dialog above, one understands
> no test would be catching this before (that's what I explicitly
> asked), but now you're saying the opposite.
I think that was a misunderstanding. The assertion is caught by several
gdb.btrace tests when run with 32-bit GDB.
I thought you were referring to the badness in my patch that would
result in GDB asking for registers in a wrong process.
Regards,
Markus.
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen, Deutschland
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Christian Lamprechter, Hannes Schwaderer, Douglas Lusk
Registergericht: Muenchen HRB 47456
Ust.-IdNr./VAT Registration No.: DE129385895
Citibank Frankfurt a.M. (BLZ 502 109 00) 600119052
On 03/03/2015 03:37 PM, Metzger, Markus T wrote:
> I think that was a misunderstanding. The assertion is caught by several
> gdb.btrace tests when run with 32-bit GDB.
>
> I thought you were referring to the badness in my patch that would
> result in GDB asking for registers in a wrong process.
Ah. Alright, glad we're sorted. :-)
Thanks,
Pedro Alves
@@ -1,3 +1,12 @@
+2015-03-03 Pedro Alves <palves@redhat.com>
+
+ * i386-linux-nat.c (i386_linux_resume): Get the ptrace PID out of
+ the lwp field of ptid. Pass the full ptid to get_thread_regcache.
+ * inf-ptrace.c (get_ptrace_pid): New function.
+ (inf_ptrace_resume): Use it.
+ * linux-nat.c (linux_resume_one_lwp): Pass the LWP's ptid ummodified
+ to the lower layer.
+
2015-03-03 Markus Metzger <markus.t.metzger@intel.com>
* nat/linux-btrace.c: Include sys/utsname.h.
@@ -648,8 +648,7 @@ static void
i386_linux_resume (struct target_ops *ops,
ptid_t ptid, int step, enum gdb_signal signal)
{
- int pid = ptid_get_pid (ptid);
-
+ int pid = ptid_get_lwp (ptid);
int request;
if (catch_syscall_enabled () > 0)
@@ -659,7 +658,7 @@ i386_linux_resume (struct target_ops *ops,
if (step)
{
- struct regcache *regcache = get_thread_regcache (pid_to_ptid (pid));
+ struct regcache *regcache = get_thread_regcache (ptid);
struct gdbarch *gdbarch = get_regcache_arch (regcache);
enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
ULONGEST pc;
@@ -289,6 +289,22 @@ inf_ptrace_stop (struct target_ops *self, ptid_t ptid)
kill (-inferior_process_group (), SIGINT);
}
+/* Return which PID to pass to ptrace in order to observe/control the
+ tracee identified by PTID. */
+
+static pid_t
+get_ptrace_pid (ptid_t ptid)
+{
+ pid_t pid;
+
+ /* If we have an LWPID to work with, use it. Otherwise, we're
+ dealing with a non-threaded program/target. */
+ pid = ptid_get_lwp (ptid);
+ if (pid == 0)
+ pid = ptid_get_pid (ptid);
+ return pid;
+}
+
/* Resume execution of thread PTID, or all threads if PTID is -1. If
STEP is nonzero, single-step it. If SIGNAL is nonzero, give it
that signal. */
@@ -297,13 +313,16 @@ static void
inf_ptrace_resume (struct target_ops *ops,
ptid_t ptid, int step, enum gdb_signal signal)
{
- pid_t pid = ptid_get_pid (ptid);
+ pid_t pid;
+
int request;
- if (pid == -1)
+ if (ptid_equal (minus_one_ptid, ptid))
/* Resume all threads. Traditionally ptrace() only supports
single-threaded processes, so simply resume the inferior. */
- pid = ptid_get_pid (inferior_ptid);
+ pid = get_ptrace_pid (inferior_ptid);
+ else
+ pid = get_ptrace_pid (ptid);
if (catch_syscall_enabled () > 0)
request = PT_SYSCALL;
@@ -1506,8 +1506,6 @@ linux_nat_detach (struct target_ops *ops, const char *args, int from_tty)
static void
linux_resume_one_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
{
- ptid_t ptid;
-
lp->step = step;
/* stop_pc doubles as the PC the LWP had when it was last resumed.
@@ -1524,9 +1522,7 @@ linux_resume_one_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
if (linux_nat_prepare_to_resume != NULL)
linux_nat_prepare_to_resume (lp);
- /* Convert to something the lower layer understands. */
- ptid = pid_to_ptid (ptid_get_lwp (lp->ptid));
- linux_ops->to_resume (linux_ops, ptid, step, signo);
+ linux_ops->to_resume (linux_ops, lp->ptid, step, signo);
lp->stop_reason = LWP_STOPPED_BY_NO_REASON;
lp->stopped = 0;
registers_changed_ptid (lp->ptid);