[BZ,#15392] Remove fork child pid assertion

Message ID 1416014955-5408-1-git-send-email-rickyz@chromium.org
State Dropped
Headers

Commit Message

Ricky Zhou Nov. 15, 2014, 1:29 a.m. UTC
  This assertion is no longer always true, since a forked child may be in
a different PID namespace than its parent, and the two namespace may
have PID collisions.

An example program which hits this assert is attached to bug 17596.

	[BZ #15392]
	* sysdeps/nptl/fork.c (__libc_fork): Remove assert that the parent
	and child pid must differ after forking.
---
 sysdeps/nptl/fork.c | 2 --
 1 file changed, 2 deletions(-)
  

Comments

Torvald Riegel Nov. 17, 2014, 10:38 a.m. UTC | #1
On Fri, 2014-11-14 at 17:29 -0800, Ricky Zhou wrote:
> This assertion is no longer always true, since a forked child may be in
> a different PID namespace than its parent, and the two namespace may
> have PID collisions.

If that's true, process-shared synchronization facilities based on the
thread ID (e.g., mutexes) might not work anymore.

Do we at least need a note in the documentation that this is the case?
Or do we still need to implement it?

Is there any glibc-internal synchronization that this could break?

I believe we deal with these issues before dropping the assert.

> An example program which hits this assert is attached to bug 17596.
> 
> 	[BZ #15392]
> 	* sysdeps/nptl/fork.c (__libc_fork): Remove assert that the parent
> 	and child pid must differ after forking.
> ---
>  sysdeps/nptl/fork.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/sysdeps/nptl/fork.c b/sysdeps/nptl/fork.c
> index a7dafa8..6adb723 100644
> --- a/sysdeps/nptl/fork.c
> +++ b/sysdeps/nptl/fork.c
> @@ -139,8 +139,6 @@ __libc_fork (void)
>      {
>        struct pthread *self = THREAD_SELF;
>  
> -      assert (THREAD_GETMEM (self, tid) != ppid);
> -
>        /* See __pthread_once.  */
>        if (__fork_generation_pointer != NULL)
>  	*__fork_generation_pointer += __PTHREAD_ONCE_FORK_GEN_INCR;
  
Ricky Zhou Nov. 17, 2014, 10:20 p.m. UTC | #2
On Mon, Nov 17, 2014 at 2:38 AM, Torvald Riegel <triegel@redhat.com> wrote:
> On Fri, 2014-11-14 at 17:29 -0800, Ricky Zhou wrote:
>> This assertion is no longer always true, since a forked child may be in
>> a different PID namespace than its parent, and the two namespace may
>> have PID collisions.
>
> If that's true, process-shared synchronization facilities based on the
> thread ID (e.g., mutexes) might not work anymore.
>
> Do we at least need a note in the documentation that this is the case?
> Or do we still need to implement it?
>
> Is there any glibc-internal synchronization that this could break?
>
> I believe we deal with these issues before dropping the assert.
Within a PID namespace, all thread IDs should remain unique, and a
process cannot have threads in different PID namespaces
(http://lxr.free-electrons.com/source/kernel/fork.c#L1217), so I'm
having trouble coming up with a situation where removing this assert
would allow badness to happen down the line.

Ricky
  
Torvald Riegel Nov. 18, 2014, 2:32 p.m. UTC | #3
On Mon, 2014-11-17 at 14:20 -0800, Ricky Zhou wrote:
> On Mon, Nov 17, 2014 at 2:38 AM, Torvald Riegel <triegel@redhat.com> wrote:
> > On Fri, 2014-11-14 at 17:29 -0800, Ricky Zhou wrote:
> >> This assertion is no longer always true, since a forked child may be in
> >> a different PID namespace than its parent, and the two namespace may
> >> have PID collisions.
> >
> > If that's true, process-shared synchronization facilities based on the
> > thread ID (e.g., mutexes) might not work anymore.
> >
> > Do we at least need a note in the documentation that this is the case?
> > Or do we still need to implement it?
> >
> > Is there any glibc-internal synchronization that this could break?
> >
> > I believe we deal with these issues before dropping the assert.
> Within a PID namespace, all thread IDs should remain unique, and a
> process cannot have threads in different PID namespaces
> (http://lxr.free-electrons.com/source/kernel/fork.c#L1217), so I'm
> having trouble coming up with a situation where removing this assert
> would allow badness to happen down the line.

Okay, so private (intra-process) mutexes and such are safe.  However,
what about process-shared mutexes?
  
Carlos O'Donell Nov. 18, 2014, 4:02 p.m. UTC | #4
On 11/18/2014 09:32 AM, Torvald Riegel wrote:
> On Mon, 2014-11-17 at 14:20 -0800, Ricky Zhou wrote:
>> On Mon, Nov 17, 2014 at 2:38 AM, Torvald Riegel <triegel@redhat.com> wrote:
>>> On Fri, 2014-11-14 at 17:29 -0800, Ricky Zhou wrote:
>>>> This assertion is no longer always true, since a forked child may be in
>>>> a different PID namespace than its parent, and the two namespace may
>>>> have PID collisions.
>>>
>>> If that's true, process-shared synchronization facilities based on the
>>> thread ID (e.g., mutexes) might not work anymore.
>>>
>>> Do we at least need a note in the documentation that this is the case?
>>> Or do we still need to implement it?
>>>
>>> Is there any glibc-internal synchronization that this could break?
>>>
>>> I believe we deal with these issues before dropping the assert.
>> Within a PID namespace, all thread IDs should remain unique, and a
>> process cannot have threads in different PID namespaces
>> (http://lxr.free-electrons.com/source/kernel/fork.c#L1217), so I'm
>> having trouble coming up with a situation where removing this assert
>> would allow badness to happen down the line.
> 
> Okay, so private (intra-process) mutexes and such are safe.  However,
> what about process-shared mutexes?

I agree with Torvald, and I have further comments.

All of pthreads expects POSIX requirements, which are that the PIDs
are stable for the life of the process. If setns is called, then the
fork'd child violates that invariant and enters undefined behaviour.

The only tenable solution I see is documenting that you must use clone
to enter the new namespace. The documentation should note that fork
is forbidden after setns and unshare.

The patch I would like to see is:

* Add minimal stub entry for `setns` and `unshare` to 
  manual/threads.texi and mention that after calling this
  function that fork must not be used, and clone is the
  only safe function to call.

If you have problems using clone, then we need to talk about that
interface. The feedback from lxc has been that they would like not
to have to worry about the stack requirement in clone(), but we
haven't done anything to make that easier.

Cheers,
Carlos.
  
Ricky Zhou Nov. 19, 2014, 1:51 a.m. UTC | #5
Ah, I see, I didn't know about process-shared mutexes - those
definitely won't work across PID namespaces (and this should
definitely be documented). Do you know of any other places in glibc
where we may be assuming that PIDs are unique between processes that
may be in different PID namespaces? I think it'd be useful to have a
list of these.

I'm a little bit confused about why we'd want to disallow fork after
setns/unshare (since this is likely to be a common pattern that's
already in use). I'm not super familiar with glibc internals/APIs, but
are there many other instances where we rely on the uniqueness of PIDs
between two different processes apart from process-shared
mutexes/condvars/barriers? Would it be possible to document the
specific APIs that won't work across PID namespaces instead of
forbidding fork after setns/unshare with CLONE_NEWPID?

By the way, I am also interested in improving or extending the clone
API as well (let me know if you'd rather split this into a separate
thread). Two ideas that would solve our issues with the current clone
wrapper:

1) Provide an interface to reset the PID cache (this would allow us to
use the syscall directly).
2) Provide an alternate fork-like version of the clone wrapper. This
version would not take a child_stack, and would enforce that CLONE_VM
is not set. Aside from this, it would invalidate the PID cache and
perform the syscall.

Thanks,
Ricky

On Tue, Nov 18, 2014 at 8:02 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 11/18/2014 09:32 AM, Torvald Riegel wrote:
>> On Mon, 2014-11-17 at 14:20 -0800, Ricky Zhou wrote:
>>> On Mon, Nov 17, 2014 at 2:38 AM, Torvald Riegel <triegel@redhat.com> wrote:
>>>> On Fri, 2014-11-14 at 17:29 -0800, Ricky Zhou wrote:
>>>>> This assertion is no longer always true, since a forked child may be in
>>>>> a different PID namespace than its parent, and the two namespace may
>>>>> have PID collisions.
>>>>
>>>> If that's true, process-shared synchronization facilities based on the
>>>> thread ID (e.g., mutexes) might not work anymore.
>>>>
>>>> Do we at least need a note in the documentation that this is the case?
>>>> Or do we still need to implement it?
>>>>
>>>> Is there any glibc-internal synchronization that this could break?
>>>>
>>>> I believe we deal with these issues before dropping the assert.
>>> Within a PID namespace, all thread IDs should remain unique, and a
>>> process cannot have threads in different PID namespaces
>>> (http://lxr.free-electrons.com/source/kernel/fork.c#L1217), so I'm
>>> having trouble coming up with a situation where removing this assert
>>> would allow badness to happen down the line.
>>
>> Okay, so private (intra-process) mutexes and such are safe.  However,
>> what about process-shared mutexes?
>
> I agree with Torvald, and I have further comments.
>
> All of pthreads expects POSIX requirements, which are that the PIDs
> are stable for the life of the process. If setns is called, then the
> fork'd child violates that invariant and enters undefined behaviour.
>
> The only tenable solution I see is documenting that you must use clone
> to enter the new namespace. The documentation should note that fork
> is forbidden after setns and unshare.
>
> The patch I would like to see is:
>
> * Add minimal stub entry for `setns` and `unshare` to
>   manual/threads.texi and mention that after calling this
>   function that fork must not be used, and clone is the
>   only safe function to call.
>
> If you have problems using clone, then we need to talk about that
> interface. The feedback from lxc has been that they would like not
> to have to worry about the stack requirement in clone(), but we
> haven't done anything to make that easier.
>
> Cheers,
> Carlos.
>
>
  
Carlos O'Donell Nov. 19, 2014, 3:57 a.m. UTC | #6
On 11/18/2014 08:51 PM, Ricky Zhou wrote:
> Ah, I see, I didn't know about process-shared mutexes - those
> definitely won't work across PID namespaces (and this should
> definitely be documented). Do you know of any other places in glibc
> where we may be assuming that PIDs are unique between processes that
> may be in different PID namespaces? I think it'd be useful to have a
> list of these.

PID uniqueness is a requirement for POSIX, everything can rely on it.

Documenting the functions which depend on it will lock down the
implementation's flexibility.

The uniqueness of the PID continues to be true *before* you enter
the namespace and then *after* you enter the namespace. The only
problem is during the transition for the child.

We should clearly define *how* you transition and limit that transition
to a simple set of APIs.

> I'm a little bit confused about why we'd want to disallow fork after
> setns/unshare (since this is likely to be a common pattern that's
> already in use). I'm not super familiar with glibc internals/APIs, but
> are there many other instances where we rely on the uniqueness of PIDs
> between two different processes apart from process-shared
> mutexes/condvars/barriers? Would it be possible to document the
> specific APIs that won't work across PID namespaces instead of
> forbidding fork after setns/unshare with CLONE_NEWPID?

You are asking to support a POSIX API with a set of semantics that
violate the contract of that API. I would like to avoid this.

Using clone is fine, since clone has none of the POSIX requirements
that fork, or vfork have.

I would like to see one well documented way to transition to the child
safely after which you should be able to continue calling libc functions
including fork and vfork.

Today, the easiest solution is:
"After setns/unshare you must clone followed immediately by exec"

The exec starts the process over again in the new namespace and everything
works fine. The problem is that this is resource intensive and not what
users want.

> By the way, I am also interested in improving or extending the clone
> API as well (let me know if you'd rather split this into a separate
> thread). Two ideas that would solve our issues with the current clone
> wrapper:

Then let us go in this direction, make clone better, and recommend
it as the only way to create a new process after setns/unshare.

> 1) Provide an interface to reset the PID cache (this would allow us to
> use the syscall directly).

Why do you need this if you have a version of clone that resets it for you?

> 2) Provide an alternate fork-like version of the clone wrapper. This
> version would not take a child_stack, and would enforce that CLONE_VM
> is not set. Aside from this, it would invalidate the PID cache and
> perform the syscall.

I think a new fork-like clone wrapper should be acceptable, we'll have
to iterate on the design a bit, but I don't object to the idea.

We could document it as being the one safe way to create a new process
in the new namespace.

Cheers,
Carlos.
  
Ricky Zhou Nov. 20, 2014, 8:05 p.m. UTC | #7
On Tue, Nov 18, 2014 at 7:57 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 11/18/2014 08:51 PM, Ricky Zhou wrote:
>> Ah, I see, I didn't know about process-shared mutexes - those
>> definitely won't work across PID namespaces (and this should
>> definitely be documented). Do you know of any other places in glibc
>> where we may be assuming that PIDs are unique between processes that
>> may be in different PID namespaces? I think it'd be useful to have a
>> list of these.
>
> PID uniqueness is a requirement for POSIX, everything can rely on it.
>
> Documenting the functions which depend on it will lock down the
> implementation's flexibility.
My goal with documenting functions that depend on PID uniqueness isn't
to lock down the implementation, but rather to lock down people that
decide to use PID namespaces. For example, someone using PID
namespaces should be aware that they cannot use process-shared mutexes
across the PID namespace.

> You are asking to support a POSIX API with a set of semantics that
> violate the contract of that API. I would like to avoid this.
Is the concrete fear that future changes to glibc internals may want
to rely on the uniqueness of PIDs across fork? I'm not trying to argue
one way or the other here - I just want to better understand the
reasoning for disallowing fork after setns/unshare.

I haven't fully read through the fork code, but it seems to reset some
other glibc internal state, and someone using setns/unshare might want
that behavior as well.

> Using clone is fine, since clone has none of the POSIX requirements
> that fork, or vfork have.
Does this mean that developers should implicitly know that features
like process-shared mutexes won't work across clones into a new PID
namespace because clone is not a POSIX API?

> Then let us go in this direction, make clone better, and recommend
> it as the only way to create a new process after setns/unshare.
>
>> 1) Provide an interface to reset the PID cache (this would allow us to
>> use the syscall directly).
>
> Why do you need this if you have a version of clone that resets it for you?
From my perspective, this could potentially be useful in the (probably
not too likely) situation where Linux adds some new method for
creating a process that we want to use before glibc has a wrapper for
it. I'm fine with the other approach of adding a new clone wrapper as
well though.

> I think a new fork-like clone wrapper should be acceptable, we'll have
> to iterate on the design a bit, but I don't object to the idea.
Cool, I'll send out an RFC in a bit to start talking about what this
interface should look like.

Thanks,
Ricky
  

Patch

diff --git a/sysdeps/nptl/fork.c b/sysdeps/nptl/fork.c
index a7dafa8..6adb723 100644
--- a/sysdeps/nptl/fork.c
+++ b/sysdeps/nptl/fork.c
@@ -139,8 +139,6 @@  __libc_fork (void)
     {
       struct pthread *self = THREAD_SELF;
 
-      assert (THREAD_GETMEM (self, tid) != ppid);
-
       /* See __pthread_once.  */
       if (__fork_generation_pointer != NULL)
 	*__fork_generation_pointer += __PTHREAD_ONCE_FORK_GEN_INCR;