btrace: avoid tp != NULL assertion

Message ID 54F5BA0B.2000106@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves March 3, 2015, 1:41 p.m. UTC
  On 03/03/2015 12:25 PM, Metzger, Markus T wrote:
>> -----Original Message-----
>> From: Pedro Alves [mailto:palves@redhat.com]
>> Sent: Tuesday, March 3, 2015 12:55 PM
>> To: Metzger, Markus T
>> Cc: gdb-patches@sourceware.org
>> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion
> 
> 
>> Frame #7 where we convert the ptid to something inf-ptrace.c understands.
>>
>> The fix is just to stop losing information.  This also improves
>> performance a tiny bit, as currently we fetch registers out of ptrace
>> twice, once for {pid,lwp} in linux-nat.c, and another for {pid,0} in
>> inf-ptrace/i386-linux-nat.c.  We'll now use a single regcache object
>> everywhere.
>>
>> Please try the patch below.
> 
> It works.  Thanks.  I'll drop my patch, then.

Alright, I've applied it now, as below.

> No, that wasn't the reason for replacing the assert.  There are no such
> errors in the gdb.btrace suite (which is mostly single-threaded) with my
> patch and I have not seen any such errors otherwise, either.  

Then it sounds like we're either lacking basic tests, or the threaded tests
are somehow not running correctly when gdb is a 32-bit program.  I think
that if you step any non-leader thread, you should see it happen.
Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
caught it.  My machine doesn't do btrace, so I can't try it myself...

BTW, did any existing test in the testsuite catch the assertion we're
fixing?

--------
From 90ad5e1d4f34d02f437ec12d1b65d7252f5b7f1c Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 3 Mar 2015 13:33:44 +0000
Subject: [PATCH] Linux/ptrace: don't convert ptids when asking inf-ptrace
 layer to resume LWP
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Ref: https://sourceware.org/ml/gdb-patches/2015-03/msg00060.html

The record-btrace target can hit an assertion here:

 Breakpoint 1, record_btrace_fetch_registers (ops=0x974bfc0 <record_btrace_ops>,
     regcache=0x9a0a798, regno=8) at gdb/record-btrace.c:1202
 1202	  gdb_assert (tp != NULL);

 (gdb) p regcache->ptid
 $3 = {pid = 23856, lwp = 0, tid = 0}

The problem is that the linux-nat layer converts the ptid to a
single-process ptid before passing the request down to the inf-ptrace
layer, which loses information, and then record-btrace can't find the
corresponding thread in GDB's thread list:

 (gdb) bt
 #0  record_btrace_fetch_registers (ops=0x974bfc0 <record_btrace_ops>, regcache=0x9a0a798, regno=8)
     at gdb/record-btrace.c:1202
 #1  0x083f4ee2 in delegate_fetch_registers (self=0x974bfc0 <record_btrace_ops>, arg1=0x9a0a798,
     arg2=8) at gdb/target-delegates.c:149
 #2  0x08406562 in target_fetch_registers (regcache=0x9a0a798, regno=8)
     at gdb/target.c:3279
 #3  0x08355255 in regcache_raw_read (regcache=0x9a0a798, regnum=8,
     buf=0xbfffe6c0 "¨\003\222\tÀ8kIøæÿ¿HO5\b\035]")
     at gdb/regcache.c:643
 #4  0x083558a7 in regcache_cooked_read (regcache=0x9a0a798, regnum=8,
     buf=0xbfffe6c0 "¨\003\222\tÀ8kIøæÿ¿HO5\b\035]")
     at gdb/regcache.c:734
 #5  0x08355de3 in regcache_cooked_read_unsigned (regcache=0x9a0a798, regnum=8, val=0xbfffe738)
     at gdb/regcache.c:838
 #6  0x0827a106 in i386_linux_resume (ops=0x9737ca0 <linux_ops_saved>, ptid=..., step=1,
     signal=GDB_SIGNAL_0) at gdb/i386-linux-nat.c:670
 #7  0x08280c12 in linux_resume_one_lwp (lp=0x9a0a5b8, step=1, signo=GDB_SIGNAL_0)
     at gdb/linux-nat.c:1529
 #8  0x08281281 in linux_nat_resume (ops=0x98da608, ptid=..., step=1, signo=GDB_SIGNAL_0)
     at gdb/linux-nat.c:1708
 #9  0x0850738e in record_btrace_resume (ops=0x98da608, ptid=..., step=1, signal=GDB_SIGNAL_0)
     at gdb/record-btrace.c:1760
 ...

The fix is just to not lose information, and let the intact ptid reach
record-btrace.c.

Tested on x86-64 Fedora 20, -m32.

gdb/ChangeLog:
2015-03-03  Pedro Alves  <palves@redhat.com>

	* i386-linux-nat.c (i386_linux_resume): Get the ptrace PID out of
	the lwp field of ptid.  Pass the full ptid to get_thread_regcache.
	* inf-ptrace.c (get_ptrace_pid): New function.
	(inf_ptrace_resume): Use it.
	* linux-nat.c (linux_resume_one_lwp): Pass the LWP's ptid ummodified
	to the lower layer.
---
 gdb/ChangeLog        |  9 +++++++++
 gdb/i386-linux-nat.c |  5 ++---
 gdb/inf-ptrace.c     | 25 ++++++++++++++++++++++---
 gdb/linux-nat.c      |  6 +-----
 4 files changed, 34 insertions(+), 11 deletions(-)
  

Comments

Metzger, Markus T March 3, 2015, 1:55 p.m. UTC | #1
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-
> owner@sourceware.org] On Behalf Of Pedro Alves
> Sent: Tuesday, March 3, 2015 2:42 PM
> To: Metzger, Markus T
> Cc: gdb-patches@sourceware.org
> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion


> > No, that wasn't the reason for replacing the assert.  There are no such
> > errors in the gdb.btrace suite (which is mostly single-threaded) with my
> > patch and I have not seen any such errors otherwise, either.
> 
> Then it sounds like we're either lacking basic tests, or the threaded tests
> are somehow not running correctly when gdb is a 32-bit program.  I think
> that if you step any non-leader thread, you should see it happen.
> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
> caught it.  My machine doesn't do btrace, so I can't try it myself...
> 
> BTW, did any existing test in the testsuite catch the assertion we're
> fixing?

Almost all of them when run on 32-bit systems; -m32 on 64-bit systems does
not catch this.

Regards,
Markus.
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen, Deutschland
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Christian Lamprechter, Hannes Schwaderer, Douglas Lusk
Registergericht: Muenchen HRB 47456
Ust.-IdNr./VAT Registration No.: DE129385895
Citibank Frankfurt a.M. (BLZ 502 109 00) 600119052
  
Pedro Alves March 3, 2015, 2:03 p.m. UTC | #2
On 03/03/2015 01:55 PM, Metzger, Markus T wrote:
>> -----Original Message-----
>> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-
>> owner@sourceware.org] On Behalf Of Pedro Alves
>> Sent: Tuesday, March 3, 2015 2:42 PM
>> To: Metzger, Markus T
>> Cc: gdb-patches@sourceware.org
>> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion
> 
> 
>>> No, that wasn't the reason for replacing the assert.  There are no such
>>> errors in the gdb.btrace suite (which is mostly single-threaded) with my
>>> patch and I have not seen any such errors otherwise, either.
>>
>> Then it sounds like we're either lacking basic tests, or the threaded tests
>> are somehow not running correctly when gdb is a 32-bit program.  I think
>> that if you step any non-leader thread, you should see it happen.
>> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
>> caught it.  My machine doesn't do btrace, so I can't try it myself...
>>
>> BTW, did any existing test in the testsuite catch the assertion we're
>> fixing?
> 
> Almost all of them when run on 32-bit systems; -m32 on 64-bit systems does
> not catch this.

Right, that's why I said "when gdb is a 32-bit program".  Sounds like
no existing test tries a "step" when not replaying then.  It'd be very
nice to have one.  Can I convince you to add one?  :-)

Thanks,
Pedro Alves
  
Metzger, Markus T March 3, 2015, 2:44 p.m. UTC | #3
> -----Original Message-----
> From: Pedro Alves [mailto:palves@redhat.com]
> Sent: Tuesday, March 3, 2015 3:03 PM
> To: Metzger, Markus T
> Cc: gdb-patches@sourceware.org
> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion


> >>> No, that wasn't the reason for replacing the assert.  There are no such
> >>> errors in the gdb.btrace suite (which is mostly single-threaded) with my
> >>> patch and I have not seen any such errors otherwise, either.
> >>
> >> Then it sounds like we're either lacking basic tests, or the threaded tests
> >> are somehow not running correctly when gdb is a 32-bit program.  I think
> >> that if you step any non-leader thread, you should see it happen.
> >> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
> >> caught it.  My machine doesn't do btrace, so I can't try it myself...
> >>
> >> BTW, did any existing test in the testsuite catch the assertion we're
> >> fixing?
> >
> > Almost all of them when run on 32-bit systems; -m32 on 64-bit systems
> does
> > not catch this.
> 
> Right, that's why I said "when gdb is a 32-bit program".  Sounds like
> no existing test tries a "step" when not replaying then.  It'd be very
> nice to have one.  Can I convince you to add one?  :-)

The multi-thread-step.exp test does not catch it because it uses "cont",
which works fine.  When I add a "step" before the "cont", I get the
"No thread" error when using my old patch instead of your new patch.
Or I get the assert when using neither my old nor your new patch.
But then, I got the assert already on other tests.

With my patch dropped and your patch committed, what is the new
test expected to catch?

Regards,
Markus.
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen, Deutschland
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Christian Lamprechter, Hannes Schwaderer, Douglas Lusk
Registergericht: Muenchen HRB 47456
Ust.-IdNr./VAT Registration No.: DE129385895
Citibank Frankfurt a.M. (BLZ 502 109 00) 600119052
  
Pedro Alves March 3, 2015, 3:25 p.m. UTC | #4
On 03/03/2015 02:44 PM, Metzger, Markus T wrote:
>> -----Original Message-----
>> From: Pedro Alves [mailto:palves@redhat.com]
>> Sent: Tuesday, March 3, 2015 3:03 PM
>> To: Metzger, Markus T
>> Cc: gdb-patches@sourceware.org
>> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion
> 
> 
>>>>> No, that wasn't the reason for replacing the assert.  There are no such
>>>>> errors in the gdb.btrace suite (which is mostly single-threaded) with my
>>>>> patch and I have not seen any such errors otherwise, either.
>>>>
>>>> Then it sounds like we're either lacking basic tests, or the threaded tests
>>>> are somehow not running correctly when gdb is a 32-bit program.  I think
>>>> that if you step any non-leader thread, you should see it happen.
>>>> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should have
>>>> caught it.  My machine doesn't do btrace, so I can't try it myself...
>>>>
>>>> BTW, did any existing test in the testsuite catch the assertion we're
>>>> fixing?
>>>
>>> Almost all of them when run on 32-bit systems; -m32 on 64-bit systems
>> does
>>> not catch this.
>>
>> Right, that's why I said "when gdb is a 32-bit program".  Sounds like
>> no existing test tries a "step" when not replaying then.  It'd be very
>> nice to have one.  Can I convince you to add one?  :-)
> 
> The multi-thread-step.exp test does not catch it because it uses "cont",
> which works fine.  When I add a "step" before the "cont", I get the
> "No thread" error when using my old patch instead of your new patch.
> Or I get the assert when using neither my old nor your new patch.
> But then, I got the assert already on other tests.
> 
> With my patch dropped and your patch committed, what is the new
> test expected to catch?

You're getting me confused...

The test was expected to catch the assertion, given that apparently
no other test was catching it -- from the dialog above, one understands
no test would be catching this before (that's what I explicitly
asked), but now you're saying the opposite.

If indeed there are already tests that triggered the
error/internal-error before the fix, then I agree a new
test is not necessary.
  
Metzger, Markus T March 3, 2015, 3:37 p.m. UTC | #5
> -----Original Message-----
> From: Pedro Alves [mailto:palves@redhat.com]
> Sent: Tuesday, March 3, 2015 4:25 PM
> To: Metzger, Markus T
> Cc: gdb-patches@sourceware.org
> Subject: Re: [PATCH] btrace: avoid tp != NULL assertion


> >>>>> No, that wasn't the reason for replacing the assert.  There are no such
> >>>>> errors in the gdb.btrace suite (which is mostly single-threaded) with
> my
> >>>>> patch and I have not seen any such errors otherwise, either.
> >>>>
> >>>> Then it sounds like we're either lacking basic tests, or the threaded
> tests
> >>>> are somehow not running correctly when gdb is a 32-bit program.  I
> think
> >>>> that if you step any non-leader thread, you should see it happen.
> >>>> Grepping the tests, I think gdb.btrace/multi-thread-step.exp should
> have
> >>>> caught it.  My machine doesn't do btrace, so I can't try it myself...
> >>>>
> >>>> BTW, did any existing test in the testsuite catch the assertion we're
> >>>> fixing?
> >>>
> >>> Almost all of them when run on 32-bit systems; -m32 on 64-bit systems
> >> does
> >>> not catch this.
> >>
> >> Right, that's why I said "when gdb is a 32-bit program".  Sounds like
> >> no existing test tries a "step" when not replaying then.  It'd be very
> >> nice to have one.  Can I convince you to add one?  :-)
> >
> > The multi-thread-step.exp test does not catch it because it uses "cont",
> > which works fine.  When I add a "step" before the "cont", I get the
> > "No thread" error when using my old patch instead of your new patch.
> > Or I get the assert when using neither my old nor your new patch.
> > But then, I got the assert already on other tests.
> >
> > With my patch dropped and your patch committed, what is the new
> > test expected to catch?
> 
> You're getting me confused...
> 
> The test was expected to catch the assertion, given that apparently
> no other test was catching it -- from the dialog above, one understands
> no test would be catching this before (that's what I explicitly
> asked), but now you're saying the opposite.

I think that was a misunderstanding.  The assertion is caught by several
gdb.btrace tests when run with 32-bit GDB.

I thought you were referring to the badness in my patch that would
result in GDB asking for registers in a wrong process.

Regards,
Markus.
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen, Deutschland
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Christian Lamprechter, Hannes Schwaderer, Douglas Lusk
Registergericht: Muenchen HRB 47456
Ust.-IdNr./VAT Registration No.: DE129385895
Citibank Frankfurt a.M. (BLZ 502 109 00) 600119052
  
Pedro Alves March 3, 2015, 3:48 p.m. UTC | #6
On 03/03/2015 03:37 PM, Metzger, Markus T wrote:

> I think that was a misunderstanding.  The assertion is caught by several
> gdb.btrace tests when run with 32-bit GDB.
> 
> I thought you were referring to the badness in my patch that would
> result in GDB asking for registers in a wrong process.

Ah.  Alright, glad we're sorted.  :-)

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 1db98ff..31a672e 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,12 @@ 
+2015-03-03  Pedro Alves  <palves@redhat.com>
+
+	* i386-linux-nat.c (i386_linux_resume): Get the ptrace PID out of
+	the lwp field of ptid.  Pass the full ptid to get_thread_regcache.
+	* inf-ptrace.c (get_ptrace_pid): New function.
+	(inf_ptrace_resume): Use it.
+	* linux-nat.c (linux_resume_one_lwp): Pass the LWP's ptid ummodified
+	to the lower layer.
+
 2015-03-03  Markus Metzger  <markus.t.metzger@intel.com>
 
 	* nat/linux-btrace.c: Include sys/utsname.h.
diff --git a/gdb/i386-linux-nat.c b/gdb/i386-linux-nat.c
index b95b47e..8cb8c66 100644
--- a/gdb/i386-linux-nat.c
+++ b/gdb/i386-linux-nat.c
@@ -648,8 +648,7 @@  static void
 i386_linux_resume (struct target_ops *ops,
 		   ptid_t ptid, int step, enum gdb_signal signal)
 {
-  int pid = ptid_get_pid (ptid);
-
+  int pid = ptid_get_lwp (ptid);
   int request;
 
   if (catch_syscall_enabled () > 0)
@@ -659,7 +658,7 @@  i386_linux_resume (struct target_ops *ops,
 
   if (step)
     {
-      struct regcache *regcache = get_thread_regcache (pid_to_ptid (pid));
+      struct regcache *regcache = get_thread_regcache (ptid);
       struct gdbarch *gdbarch = get_regcache_arch (regcache);
       enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
       ULONGEST pc;
diff --git a/gdb/inf-ptrace.c b/gdb/inf-ptrace.c
index 4c22a84..606bdb4 100644
--- a/gdb/inf-ptrace.c
+++ b/gdb/inf-ptrace.c
@@ -289,6 +289,22 @@  inf_ptrace_stop (struct target_ops *self, ptid_t ptid)
   kill (-inferior_process_group (), SIGINT);
 }
 
+/* Return which PID to pass to ptrace in order to observe/control the
+   tracee identified by PTID.  */
+
+static pid_t
+get_ptrace_pid (ptid_t ptid)
+{
+  pid_t pid;
+
+  /* If we have an LWPID to work with, use it.  Otherwise, we're
+     dealing with a non-threaded program/target.  */
+  pid = ptid_get_lwp (ptid);
+  if (pid == 0)
+    pid = ptid_get_pid (ptid);
+  return pid;
+}
+
 /* Resume execution of thread PTID, or all threads if PTID is -1.  If
    STEP is nonzero, single-step it.  If SIGNAL is nonzero, give it
    that signal.  */
@@ -297,13 +313,16 @@  static void
 inf_ptrace_resume (struct target_ops *ops,
 		   ptid_t ptid, int step, enum gdb_signal signal)
 {
-  pid_t pid = ptid_get_pid (ptid);
+  pid_t pid;
+
   int request;
 
-  if (pid == -1)
+  if (ptid_equal (minus_one_ptid, ptid))
     /* Resume all threads.  Traditionally ptrace() only supports
        single-threaded processes, so simply resume the inferior.  */
-    pid = ptid_get_pid (inferior_ptid);
+    pid = get_ptrace_pid (inferior_ptid);
+  else
+    pid = get_ptrace_pid (ptid);
 
   if (catch_syscall_enabled () > 0)
     request = PT_SYSCALL;
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 2e1133d..cb10e2c 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1506,8 +1506,6 @@  linux_nat_detach (struct target_ops *ops, const char *args, int from_tty)
 static void
 linux_resume_one_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
 {
-  ptid_t ptid;
-
   lp->step = step;
 
   /* stop_pc doubles as the PC the LWP had when it was last resumed.
@@ -1524,9 +1522,7 @@  linux_resume_one_lwp (struct lwp_info *lp, int step, enum gdb_signal signo)
 
   if (linux_nat_prepare_to_resume != NULL)
     linux_nat_prepare_to_resume (lp);
-  /* Convert to something the lower layer understands.  */
-  ptid = pid_to_ptid (ptid_get_lwp (lp->ptid));
-  linux_ops->to_resume (linux_ops, ptid, step, signo);
+  linux_ops->to_resume (linux_ops, lp->ptid, step, signo);
   lp->stop_reason = LWP_STOPPED_BY_NO_REASON;
   lp->stopped = 0;
   registers_changed_ptid (lp->ptid);