Remove lwp -> pid conversion in linux_nat_xfer_partial

Message ID 20170321221744.20567-1-simon.marchi@ericsson.com
State New, archived
Headers

Commit Message

Simon Marchi March 21, 2017, 10:17 p.m. UTC
  From: Simon Marchi <simon.marchi@polymtl.ca>

When inferior_ptid represents a lwp, linux_nat_xfer_partial converts it
to a ptid with only the pid field set.  For example, if
inferior_ptid is:

  { .pid = 1234, .lwp = 1235 }

it will change it to:

  { .pid = 1235, .lwp = 0 }

This is presumably because not all implementations of to_xfer_partial
that might be called down the line know how to handle a ptid with a lwp.
From what I found, it's been like this for a long long time, I traced
the original implementation at least to this commit (1999)

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/lin-thread.c;h=2f255c0e54a0387f1b7994c0bf808f4320b9b054;hb=ed9a39e#l1241

It looks like a hack to me, I think it's simpler if we just make all
implementations handle ptids correctly.  The only place I found that
needed fixing is inf_ptrace_xfer_partial.

There is also linux_proc_xfer_partial and linux_proc_xfer_spu, which
both only use the pid field of inferior_ptid and ignore lwp.  However,
since they use "/proc/<pid>", using the id of any thread in the process
will give the same result (AFAIK).

The testsuite found no regression on native amd64 linux.

gdb/ChangeLog:

	* inf-ptrace.c (inf_ptrace_xfer_partial): Get pid from ptid
	using get_ptrace_pid.
	* linux-nat.c (linux_nat_xfer_partial): Don't set/restore
	inferior_ptid.
---
 gdb/inf-ptrace.c | 2 +-
 gdb/linux-nat.c  | 7 -------
 2 files changed, 1 insertion(+), 8 deletions(-)
  

Comments

Pedro Alves March 21, 2017, 11:58 p.m. UTC | #1
On 03/21/2017 10:17 PM, Simon Marchi wrote:
> From: Simon Marchi <simon.marchi@polymtl.ca>
> 
> When inferior_ptid represents a lwp, linux_nat_xfer_partial converts it
> to a ptid with only the pid field set.  For example, if
> inferior_ptid is:
> 
>   { .pid = 1234, .lwp = 1235 }
> 
> it will change it to:
> 
>   { .pid = 1235, .lwp = 0 }
> 
> This is presumably because not all implementations of to_xfer_partial
> that might be called down the line know how to handle a ptid with a lwp.
> From what I found, it's been like this for a long long time, I traced
> the original implementation at least to this commit (1999)
> 
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/lin-thread.c;h=2f255c0e54a0387f1b7994c0bf808f4320b9b054;hb=ed9a39e#l1241
> 
> It looks like a hack to me, I think it's simpler if we just make all
> implementations handle ptids correctly.  The only place I found that
> needed fixing is inf_ptrace_xfer_partial.

Yeah...  The stuffing of lwpids in the inferior_pid integer global was clearly
a hack.  But when later GDB grew the "ptid" structure, this shuffling
made some sense -- way back then, when we had LinuxThreads instead of NTPL, the
kernel didn't really have any concept of "threads" or "lwps".  Threads
were really each a heavy weight process.  So in that sense, in the abstract,
it made some sense to have inf-ptrace.c only ever think about processes.  But,
over the years we've been running into issues with that, and over time
the inf-ptrace.c layer as been adjusted to understand these ptids.  Commit 90ad5e1d4f34d0
("Linux/ptrace: don't convert ptids when asking inf-ptrace layer to resume LWP")
is one that comes to mind.  You've running into some left overs of a long
slow conversion...

> 
> There is also linux_proc_xfer_partial and linux_proc_xfer_spu, which
> both only use the pid field of inferior_ptid and ignore lwp.  However,
> since they use "/proc/<pid>", using the id of any thread in the process
> will give the same result (AFAIK).

It's generally better to use the lwp id:

- some files under /proc/<pid>/ may not work if the <pid> thread is
  running, just like ptrace requires a stopped thread.  The current
  thread's lwp id is more likely to be in the necessary state (stopped).

- if the leader exits, and goes zombie, then several files under
  "/proc/<pid>" won't work, though using "/proc/<pid>/task/<tid>" would.
  (try poking at leader-exit.exp a bit.)
  The latter path form is also generally better for being robust in
  the case TID exits and is reused in another process, much like
  tkill vs tgkill.

So if possible to switch those spots too, I'd recommend/prefer it.

> 
> The testsuite found no regression on native amd64 linux.
> 
> gdb/ChangeLog:
> 
> 	* inf-ptrace.c (inf_ptrace_xfer_partial): Get pid from ptid
> 	using get_ptrace_pid.
> 	* linux-nat.c (linux_nat_xfer_partial): Don't set/restore
> 	inferior_ptid.

Looks good to me.

Thanks,
Pedro Alves
  
Simon Marchi March 22, 2017, 12:42 a.m. UTC | #2
On 2017-03-21 19:58, Pedro Alves wrote:
> Yeah...  The stuffing of lwpids in the inferior_pid integer global was 
> clearly
> a hack.  But when later GDB grew the "ptid" structure, this shuffling
> made some sense -- way back then, when we had LinuxThreads instead of 
> NTPL, the
> kernel didn't really have any concept of "threads" or "lwps".  Threads
> were really each a heavy weight process.  So in that sense, in the 
> abstract,
> it made some sense to have inf-ptrace.c only ever think about 
> processes.  But,
> over the years we've been running into issues with that, and over time
> the inf-ptrace.c layer as been adjusted to understand these ptids.
> Commit 90ad5e1d4f34d0
> ("Linux/ptrace: don't convert ptids when asking inf-ptrace layer to 
> resume LWP")
> is one that comes to mind.  You've running into some left overs of a 
> long
> slow conversion...

Ok, thanks for the history bits.

>> There is also linux_proc_xfer_partial and linux_proc_xfer_spu, which
>> both only use the pid field of inferior_ptid and ignore lwp.  However,
>> since they use "/proc/<pid>", using the id of any thread in the 
>> process
>> will give the same result (AFAIK).
> 
> It's generally better to use the lwp id:
> 
> - some files under /proc/<pid>/ may not work if the <pid> thread is
>   running, just like ptrace requires a stopped thread.  The current
>   thread's lwp id is more likely to be in the necessary state 
> (stopped).
> 
> - if the leader exits, and goes zombie, then several files under
>   "/proc/<pid>" won't work, though using "/proc/<pid>/task/<tid>" 
> would.
>   (try poking at leader-exit.exp a bit.)
>   The latter path form is also generally better for being robust in
>   the case TID exits and is reused in another process, much like
>   tkill vs tgkill.

I thought that the process exited whenever the main thread exits, that's 
not the case?  I guess not if there's a test for it...

> So if possible to switch those spots too, I'd recommend/prefer it.

Ok, I'll just replace ptid_get_pid with get_ptrace_pid* in this patch 
and look at using /proc/<pid>/task/<tid> after.  When doing the latter, 
do I still have to consider cases where ptid is a single-process/thread 
ptid (lwp == 0)?  From my experience, there's always a lwp on Linux, but 
perhaps there are some setups I don't know about with which it can 
happen?

* using get_ptrace_pid is an abuse of terminology, since we're not using 
ptrace, but it does what we want.

Thanks,

Simon
  
Pedro Alves March 22, 2017, 1 a.m. UTC | #3
On 03/22/2017 12:42 AM, Simon Marchi wrote:

>>> There is also linux_proc_xfer_partial and linux_proc_xfer_spu, which
>>> both only use the pid field of inferior_ptid and ignore lwp.  However,
>>> since they use "/proc/<pid>", using the id of any thread in the process
>>> will give the same result (AFAIK).
>>
>> It's generally better to use the lwp id:
>>
>> - some files under /proc/<pid>/ may not work if the <pid> thread is
>>   running, just like ptrace requires a stopped thread.  The current
>>   thread's lwp id is more likely to be in the necessary state (stopped).
>>
>> - if the leader exits, and goes zombie, then several files under
>>   "/proc/<pid>" won't work, though using "/proc/<pid>/task/<tid>" would.
>>   (try poking at leader-exit.exp a bit.)
>>   The latter path form is also generally better for being robust in
>>   the case TID exits and is reused in another process, much like
>>   tkill vs tgkill.
> 
> I thought that the process exited whenever the main thread exits, that's
> not the case?  I guess not if there's a test for it...

Nope.  If you call "exit", then yes.  The kernel kills the whole thread
group in response to that system call.  If the leader does
pthread_exit, then no, the thread group stays around until all children
exit too.  The kernel won't report the main thread's exit status (i.e.,
we can't reap that zombie, and we'd hang if we tried a blocking waitpid)
until all the children are reaped first.  That's why we have
linux-nat.c:check_zombie_leaders (and the equivalent in gdbserver).

>> So if possible to switch those spots too, I'd recommend/prefer it.
> 
> Ok, I'll just replace ptid_get_pid with get_ptrace_pid* in this patch

Since this is linux-specific code, you should be able to use
ptid_get_lwp directly.

> and look at using /proc/<pid>/task/<tid> after.  When doing the latter,
> do I still have to consider cases where ptid is a single-process/thread
> ptid (lwp == 0)?  From my experience, there's always a lwp on Linux, but
> perhaps there are some setups I don't know about with which it can happen?

Right, on Linux there's always an lwp.  Before NPTL, the 
/proc/<pid>/task/<tid> path didn't exist at all, but we no longer
support LinuxThreads.

Thanks,
Pedro Alves
  
Pedro Alves March 22, 2017, 1:13 a.m. UTC | #4
On 03/22/2017 01:00 AM, Pedro Alves wrote:

>> and look at using /proc/<pid>/task/<tid> after.  When doing the latter,
>> do I still have to consider cases where ptid is a single-process/thread
>> ptid (lwp == 0)?  From my experience, there's always a lwp on Linux, but
>> perhaps there are some setups I don't know about with which it can happen?
> 
> Right, on Linux there's always an lwp.  Before NPTL, the 
> /proc/<pid>/task/<tid> path didn't exist at all, but we no longer
> support LinuxThreads.

I think I read the question all backwards.  I understood you were
asking whether the <tid> in that /proc patch would be 0 in some
cases...  Sorry.

So trying again:

In linux-nat.c, you only see a ptid without an lwp filled in during
early child process startup, either while attaching (see 
thread_change_ptid call in linux_nat_attach), or while doing the first
iteration of the fork-child.c:startup_inferior dance.  The lwp is filled in
at the top of linux_nat_wait_1 [1].  I don't think we access /proc between
the initial fork and that spot.  If we did, we'd be accessing the process
while it was still executing the shell, and startup_inferior was invented
exactly to prevent that sort of inadvertent access.

[1] And that is another inferior_ptid hack that we should get
rid of ... somehow ...

Thanks,
Pedro Alves
  
Simon Marchi March 22, 2017, 1:22 a.m. UTC | #5
On 2017-03-21 21:00, Pedro Alves wrote:
> Nope.  If you call "exit", then yes.  The kernel kills the whole thread
> group in response to that system call.  If the leader does
> pthread_exit, then no, the thread group stays around until all children
> exit too.  The kernel won't report the main thread's exit status (i.e.,
> we can't reap that zombie, and we'd hang if we tried a blocking 
> waitpid)
> until all the children are reaped first.  That's why we have
> linux-nat.c:check_zombie_leaders (and the equivalent in gdbserver).

Oh ok, in my testing I was just letting main return, but I guess it 
reaches a point where the libc calls the exit syscall.  When I call 
pthread_exit, the process stays alive.

>>> So if possible to switch those spots too, I'd recommend/prefer it.
>> 
>> Ok, I'll just replace ptid_get_pid with get_ptrace_pid* in this patch
> 
> Since this is linux-specific code, you should be able to use
> ptid_get_lwp directly.

Ok.

>> and look at using /proc/<pid>/task/<tid> after.  When doing the 
>> latter,
>> do I still have to consider cases where ptid is a 
>> single-process/thread
>> ptid (lwp == 0)?  From my experience, there's always a lwp on Linux, 
>> but
>> perhaps there are some setups I don't know about with which it can 
>> happen?
> 
> Right, on Linux there's always an lwp.  Before NPTL, the
> /proc/<pid>/task/<tid> path didn't exist at all, but we no longer
> support LinuxThreads.

Thanks, I'm sending an updated patch.

Simon
  

Patch

diff --git a/gdb/inf-ptrace.c b/gdb/inf-ptrace.c
index 61d24269a8..f912d28088 100644
--- a/gdb/inf-ptrace.c
+++ b/gdb/inf-ptrace.c
@@ -520,7 +520,7 @@  inf_ptrace_xfer_partial (struct target_ops *ops, enum target_object object,
 			 const gdb_byte *writebuf,
 			 ULONGEST offset, ULONGEST len, ULONGEST *xfered_len)
 {
-  pid_t pid = ptid_get_pid (inferior_ptid);
+  pid_t pid = get_ptrace_pid (inferior_ptid);
 
   switch (object)
     {
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 73ef2d4947..0827f546a8 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -3890,7 +3890,6 @@  linux_nat_xfer_partial (struct target_ops *ops, enum target_object object,
 			const gdb_byte *writebuf,
 			ULONGEST offset, ULONGEST len, ULONGEST *xfered_len)
 {
-  struct cleanup *old_chain;
   enum target_xfer_status xfer;
 
   if (object == TARGET_OBJECT_SIGNAL_INFO)
@@ -3903,15 +3902,9 @@  linux_nat_xfer_partial (struct target_ops *ops, enum target_object object,
   if (object == TARGET_OBJECT_MEMORY && ptid_equal (inferior_ptid, null_ptid))
     return TARGET_XFER_EOF;
 
-  old_chain = save_inferior_ptid ();
-
-  if (ptid_lwp_p (inferior_ptid))
-    inferior_ptid = pid_to_ptid (ptid_get_lwp (inferior_ptid));
-
   xfer = linux_ops->to_xfer_partial (ops, object, annex, readbuf, writebuf,
 				     offset, len, xfered_len);
 
-  do_cleanups (old_chain);
   return xfer;
 }