[v1,1/1] waitid: Add support for waiting for the current process group

Message ID 20190814113822.9505-2-christian.brauner@ubuntu.com
State Not applicable

Commit Message

Christian Brauner Aug. 14, 2019, 11:38 a.m. UTC
  From: "Eric W. Biederman" <ebiederm@xmission.com>

It was recently discovered that the Linux version of waitid is not a
superset of the other wait functions because it does not include
support for waiting for the current process group.  This has two
downsides.  An extra system call is needed to get the current process
group, and a signal could come in between the system call that
retrieves the process group and the call to waitid, and change the
current process group.

Allow userspace to avoid both of those issues by defining
idtype == P_PGID and id == 0 to mean wait for the caller's process
group at the time of the call.
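
As a rough userspace illustration of the two calling conventions (not
part of this patch), the sketch below tries the single-call form and
falls back to the older getpgrp() + waitid() sequence when the kernel
rejects id == 0 with EINVAL.  The helper names wait_own_pgrp() and
demo_exit_status() are made up for this example.

```c
/* Feature-test macro so waitid()/P_PGID are visible in strict -std modes. */
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Wait for a child in the caller's process group to exit.
 * Tries idtype == P_PGID with id == 0 first; if the kernel rejects
 * that with EINVAL, falls back to the older two-call sequence.
 * Returns 0 on success and -1 on error (errno set by waitid).
 */
static int wait_own_pgrp(siginfo_t *info)
{
	if (waitid(P_PGID, 0, info, WEXITED) == 0)
		return 0;
	if (errno != EINVAL)
		return -1;

	/* Fallback: racy if the process group changes between the calls. */
	return waitid(P_PGID, (id_t)getpgrp(), info, WEXITED);
}

/*
 * Hypothetical demo helper: fork a child that exits with `code`,
 * reap it through wait_own_pgrp(), and return the reported status
 * (or -1 if anything unexpected happens).
 */
static int demo_exit_status(int code)
{
	siginfo_t info;
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (child == 0)
		_exit(code);	/* child stays in the parent's process group */

	if (wait_own_pgrp(&info) != 0)
		return -1;
	if (info.si_pid != child || info.si_code != CLD_EXITED)
		return -1;
	return info.si_status;
}
```

The fallback branch is exactly the extra system call and race window
described above; on a kernel with this change only the first waitid()
call should be needed.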

Arguments can be made for using a different choice of idtype and id
for this case, but the BSDs already use P_PGID and 0 to indicate
waiting for the current process's process group.  So be nice to user
space programmers and don't introduce an unnecessary incompatibility.

Some people have noted that the POSIX description is that
waitpid will wait for the current process group, and that in
the presence of pthreads that process group can change.  To get
clarity on this issue I looked at XNU, FreeBSD, and illumos.  All of
those flavors of Unix waited for the current process group at the
time of call and as written could not adapt to the process group
changing after the call.

At one point Linux did adapt to the current process group changing but
that stopped in 161550d74c07 ("pid: sys_wait... fixes").  It has been
over 11 years since Linux had that behavior, no programs that fail
with the change in behavior have been reported, and I could not
find any other Unix that does this.  So I think it is safe to clarify
the definition of current process group to mean the current process
group at the time the wait function is called.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Alistair Francis <alistair23@gmail.com>
Cc: Zong Li <zongbox@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: GNU C Library <libc-alpha@sourceware.org>
---
v1:
- Christian Brauner <christian.brauner@ubuntu.com>:
  - move find_get_pid() calls into the switch statements to minimize
    merge conflicts with P_PIDFD changes and adhere to coding style
    discussions we had for P_PIDFD
---
 kernel/exit.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
  

Comments

Oleg Nesterov Aug. 14, 2019, 12:29 p.m. UTC | #1
On 08/14, christian.brauner@ubuntu.com wrote:
>
>  	case P_PGID:
>  		type = PIDTYPE_PGID;
> -		if (upid <= 0)
> +		if (upid < 0)
>  			return -EINVAL;
> +
> +		if (upid == 0)
> +			pid = get_pid(task_pgrp(current));

this needs rcu lock or tasklist_lock, this can race with another thread
doing sys_setpgid/setsid (see change_pid(PIDTYPE_PGID)).

Oleg.
  
Christian Brauner Aug. 14, 2019, 12:45 p.m. UTC | #2
On Wed, Aug 14, 2019 at 02:29:10PM +0200, Oleg Nesterov wrote:
> On 08/14, christian.brauner@ubuntu.com wrote:
> >
> >  	case P_PGID:
> >  		type = PIDTYPE_PGID;
> > -		if (upid <= 0)
> > +		if (upid < 0)
> >  			return -EINVAL;
> > +
> > +		if (upid == 0)
> > +			pid = get_pid(task_pgrp(current));
> 
> this needs rcu lock or tasklist_lock, this can race with another thread
> doing sys_setpgid/setsid (see change_pid(PIDTYPE_PGID)).

Oh, I naively assumed task_pgrp() would take an rcu lock...

kernel/sys.c takes both, i.e.

rcu_read_lock();
write_lock_irq(&tasklist_lock);

though I think we should be fine with just rcu_read_lock(). setpgid()
indicates that it wants to check real_parent and needs the
write_lock_irq() because it might change behind its back which we don't
care about since we're not changing the pgrp.

Christian
  
Oleg Nesterov Aug. 14, 2019, 12:50 p.m. UTC | #3
On 08/14, Christian Brauner wrote:
>
> On Wed, Aug 14, 2019 at 02:29:10PM +0200, Oleg Nesterov wrote:
> > On 08/14, christian.brauner@ubuntu.com wrote:
> > >
> > >  	case P_PGID:
> > >  		type = PIDTYPE_PGID;
> > > -		if (upid <= 0)
> > > +		if (upid < 0)
> > >  			return -EINVAL;
> > > +
> > > +		if (upid == 0)
> > > +			pid = get_pid(task_pgrp(current));
> >
> > this needs rcu lock or tasklist_lock, this can race with another thread
> > doing sys_setpgid/setsid (see change_pid(PIDTYPE_PGID)).
>
> Oh, I naively assumed task_pgrp() would take an rcu lock...

but it would not help ;)

> though I think we should be fine with just rcu_read_lock().

Yes,

Oleg.
  
Christian Brauner Aug. 14, 2019, 12:53 p.m. UTC | #4
On Wed, Aug 14, 2019 at 02:50:12PM +0200, Oleg Nesterov wrote:
> On 08/14, Christian Brauner wrote:
> >
> > On Wed, Aug 14, 2019 at 02:29:10PM +0200, Oleg Nesterov wrote:
> > > On 08/14, christian.brauner@ubuntu.com wrote:
> > > >
> > > >  	case P_PGID:
> > > >  		type = PIDTYPE_PGID;
> > > > -		if (upid <= 0)
> > > > +		if (upid < 0)
> > > >  			return -EINVAL;
> > > > +
> > > > +		if (upid == 0)
> > > > +			pid = get_pid(task_pgrp(current));
> > >
> > > this needs rcu lock or tasklist_lock, this can race with another thread
> > > doing sys_setpgid/setsid (see change_pid(PIDTYPE_PGID)).
> >
> > Oh, I naively assumed task_pgrp() would take an rcu lock...
> 
> but it would not help ;)

Yeah, it doesn't do a get. :)
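
For readers following the thread, here is a minimal sketch of the
P_PGID branch with the RCU read-side lock the replies above settle on.
It is an illustrative fragment against the posted hunk, not a revised
patch from this thread, and it does not build outside the kernel tree.
As I read the exchange, the point is that the reference has to be taken
with get_pid() while still inside the read-side section; a lock taken
only inside task_pgrp() would already be dropped by the time of the get.

```c
	case P_PGID:
		type = PIDTYPE_PGID;
		if (upid < 0)
			return -EINVAL;

		if (upid == 0) {
			/*
			 * Another thread may call setpgid()/setsid() and
			 * change our pgrp; hold the RCU read lock so the
			 * struct pid cannot be freed before get_pid()
			 * takes a reference.
			 */
			rcu_read_lock();
			pid = get_pid(task_pgrp(current));
			rcu_read_unlock();
		} else {
			pid = find_get_pid(upid);
		}
		break;
```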
  

Patch

diff --git a/kernel/exit.c b/kernel/exit.c
index 5b4a5dcce8f8..e70083b14f31 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1576,19 +1576,23 @@  static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
 		type = PIDTYPE_PID;
 		if (upid <= 0)
 			return -EINVAL;
+
+		pid = find_get_pid(upid);
 		break;
 	case P_PGID:
 		type = PIDTYPE_PGID;
-		if (upid <= 0)
+		if (upid < 0)
 			return -EINVAL;
+
+		if (upid == 0)
+			pid = get_pid(task_pgrp(current));
+		else
+			pid = find_get_pid(upid);
 		break;
 	default:
 		return -EINVAL;
 	}
 
-	if (type < PIDTYPE_MAX)
-		pid = find_get_pid(upid);
-
 	wo.wo_type	= type;
 	wo.wo_pid	= pid;
 	wo.wo_flags	= options;