[BZ,#18433] Check file access/existence before forking.

Message ID	55F19819.3010601@gmail.com
State	Rejected
Headers	Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org To: libc-alpha@sourceware.org From: Navid Rahimi <rahimi.nv@gmail.com> Subject: [PATCH] [BZ #18433] Check file access/existence before forking. Message-ID: <55F19819.3010601@gmail.com> Date: Thu, 10 Sep 2015 19:17:53 +0430 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit

Message ID

55F19819.3010601@gmail.com

State

Rejected

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
To: libc-alpha@sourceware.org
From: Navid Rahimi <rahimi.nv@gmail.com>
Subject: [PATCH] [BZ #18433] Check file access/existence before forking.
Message-ID: <55F19819.3010601@gmail.com>
Date: Thu, 10 Sep 2015 19:17:53 +0430
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
	Thunderbird/38.2.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Commit Message

Navid Rahimi Sept. 10, 2015, 2:47 p.m. UTC

  [BZ #18433]
* sysdeps/posix/spawni.c (__spawni):
Check file access before forking.
---
ChangeLog               | 5 +++++
sysdeps/posix/spawni.c | 3 +++
2 files changed, 8 insertions(+)

Comments

Szabolcs Nagy Sept. 10, 2015, 3:01 p.m. UTC | #1

On 10/09/15 15:47, Navid Rahimi wrote:
> [BZ #18433]
> * sysdeps/posix/spawni.c (__spawni):
> Check file access before forking.
> ---

> +++ b/sysdeps/posix/spawni.c
> @@ -90,6 +90,9 @@ __spawni (pid_t *pid, const char *file,
>      size_t len;
>      size_t pathlen;
>
> +  if(__access (file, X_OK) != 0)
> +    return errno;
> +

this does not guarantee the success of __execve (file, argv, envp).

Zack Weinberg Sept. 10, 2015, 3:02 p.m. UTC | #2

On September 10, 2015 10:47:53 AM EDT, Navid Rahimi <rahimi.nv@gmail.com> wrote:
>[BZ #18433]
>* sysdeps/posix/spawni.c (__spawni):
>Check file access before forking.
>---
>ChangeLog               | 5 +++++
>sysdeps/posix/spawni.c | 3 +++
>2 files changed, 8 insertions(+)
>
>diff --git a/ChangeLog b/ChangeLog
>index 5f009a8..0e0c85b 100644
>--- a/ChangeLog
>+++ b/ChangeLog
>@@ -1,3 +1,8 @@
>+2015-09-10  Navid Rahimi  <rahimi.nv@gmail.com>
>+
>+	[BZ #18433]
>+	* sysdeps/posix/spawni.c (__spawni): Check file access before
>forking.
>+
>  2015-09-08  Joseph Myers  <joseph@codesourcery.com>
>
>	[BZ #14912]
>diff --git a/sysdeps/posix/spawni.c b/sysdeps/posix/spawni.c
>index eee9331..c571390 100644
>--- a/sysdeps/posix/spawni.c
>+++ b/sysdeps/posix/spawni.c
>@@ -90,6 +90,9 @@ __spawni (pid_t *pid, const char *file,
>    size_t len;
>    size_t pathlen;
>
>+  if(__access (file, X_OK) != 0)
>+    return errno;
>+
>    /* Do this once.  */
>    short int flags = attrp == NULL ? 0 : attrp->__flags;

Why is a TOCTOU race acceptable and/or unavoidable? Also, why is a check using the real rather than the effective credentials correct here?

Navid Rahimi Sept. 10, 2015, 3:05 p.m. UTC | #3

On 09/10/2015 07:31 PM, Szabolcs Nagy wrote:
> On 10/09/15 15:47, Navid Rahimi wrote:
>> [BZ #18433]
>> * sysdeps/posix/spawni.c (__spawni):
>> Check file access before forking.
>> ---
>
>> +++ b/sysdeps/posix/spawni.c
>> @@ -90,6 +90,9 @@ __spawni (pid_t *pid, const char *file,
>>      size_t len;
>>      size_t pathlen;
>>
>> +  if(__access (file, X_OK) != 0)
>> +    return errno;
>> +
>
> this does not guarantee the success of __execve (file, argv, envp).
>

I think our main objection here is to avoid forking when there is no 
file.There is so many other variable for checking if execve is going to 
success or not.

Phil Blundell Sept. 10, 2015, 6:26 p.m. UTC | #4

On Thu, 2015-09-10 at 19:35 +0430, Navid Rahimi wrote:
> I think our main objection here is to avoid forking when there is no 
> file.There is so many other variable for checking if execve is going to 
> success or not.

On the other hand, your patch will add an extra system call and
directory lookup to every successful posix_spawn() call, i.e. you are
optimising the failure case at the expense of the successful case.  It's
not at all obvious to me that this is a sensible thing to do.  Can you
explain your reasoning in a bit more detail?

p.

Navid Rahimi Sept. 10, 2015, 9:38 p.m. UTC | #5

On Thu, Sep 10, 2015 at 7:32 PM, Zack Weinberg <zackw@panix.com> wrote:
> Why is a TOCTOU race acceptable and/or unavoidable? Also, why is a check using the real rather than the effective credentials correct here?

Because of nature of lock in unix (being advisory) I think there is no
way to lock file and prevent TOCTOU, even if we had lock mechanism ,
it is cumbersome (and almost impossible) to design in correctly.
Because for preventing TOCTOU we should keep lock until execve , and
after that point so many question will raise , "what will happen to
lock in execve ?" or "does child process has lock also ?"

About using real rather effective credentials , it was my mistake ,
euidaccess would be more appropriate .

Navid Rahimi Sept. 10, 2015, 9:45 p.m. UTC | #6

On Thu, Sep 10, 2015 at 10:56 PM, Phil Blundell <pb@pbcl.net> wrote:
> On Thu, 2015-09-10 at 19:35 +0430, Navid Rahimi wrote:
>> I think our main objection here is to avoid forking when there is no
>> file.There is so many other variable for checking if execve is going to
>> success or not.
>
> On the other hand, your patch will add an extra system call and
> directory lookup to every successful posix_spawn() call, i.e. you are
> optimising the failure case at the expense of the successful case.  It's
> not at all obvious to me that this is a sensible thing to do.  Can you
> explain your reasoning in a bit more detail?
>
> p.
>
>

I agree with you completely about overhead and extra syscall , but
there is no other option if we want to check fork. About optimizing
failure case , I would not call it optimizing , I returning correct
error code and correct answer to caller at expense of one extra
syscall overhead . Perfect Abstraction I think is the most priority of
every project.Adding extra syscall will have overhead , but without
that we are not reporting status correctly to caller.

Zack Weinberg Sept. 10, 2015, 9:50 p.m. UTC | #7

On Thu, Sep 10, 2015 at 5:38 PM, navid Rahimi <rahimi.nv@gmail.com> wrote:
> On Thu, Sep 10, 2015 at 7:32 PM, Zack Weinberg <zackw@panix.com> wrote:
>> Why is a TOCTOU race acceptable and/or unavoidable? Also, why is a check using the real rather than the effective credentials correct here?
>
> Because of nature of lock in unix (being advisory) I think there is no
> way to lock file and prevent TOCTOU, even if we had lock mechanism ,
> it is cumbersome (and almost impossible) to design in correctly.

It just occurred to me that you could open() the file in the parent
and then use fexecve() in the child.  If that worked correctly it
would address both my concerns and Phil's.  However, fexecve is faked
on top of /proc, which means there are all sorts of ways it might not
work 100% reliably, and for correctness you would need an open mode
that may not actually exist (O_EXEC, I guess it would be called).

All in all I am inclined to say leave things as they are.  Nobody uses
posix_spawn if they can help it, anyway :)

zw

Phil Blundell Sept. 10, 2015, 11:15 p.m. UTC | #8

On Fri, 2015-09-11 at 02:15 +0430, navid Rahimi wrote: 
> I agree with you completely about overhead and extra syscall , but
> there is no other option if we want to check fork. About optimizing
> failure case , I would not call it optimizing , I returning correct
> error code and correct answer to caller at expense of one extra
> syscall overhead . 

Well, right, but I still don't understand why this is desirable.

A portable application can't rely on this behaviour since, as I wrote in
the bugzilla ticket earlier, POSIX explicitly allows an implementation
to not detect errors before fork()ing and instead to have the child do
exit(127); and as far as I can tell glibc is already doing this
correctly.  So there doesn't seem to be any conformance problem here; at
most we're talking about a QoI issue.

An application which is particularly worried that it might be about to
posix_spawn() a nonexistent executable can always do the access() check
for itself beforehand.  Alternatively it could look for the exit code
127 and then try to figure out retrospectively why the spawn() would
have failed by poking around with access().  But it remains a bit
unclear to me why the application would really need to know since, even
if it can determine why posix_spawn() failed, it's not obvious that it
can take any meaningful steps to fix the problem.

Do you have a concrete use-case where having a specific failure return
from posix_spawn() is important?

p.

Ondrej Bilka Sept. 11, 2015, 6:28 a.m. UTC | #9

On Thu, Sep 10, 2015 at 07:26:46PM +0100, Phil Blundell wrote:
> On Thu, 2015-09-10 at 19:35 +0430, Navid Rahimi wrote:
> > I think our main objection here is to avoid forking when there is no 
> > file.There is so many other variable for checking if execve is going to 
> > success or not.
> 
> On the other hand, your patch will add an extra system call and
> directory lookup to every successful posix_spawn() call, i.e. you are
> optimising the failure case at the expense of the successful case.  It's
> not at all obvious to me that this is a sensible thing to do.  Can you
> explain your reasoning in a bit more detail?
> 
And do you have programs where that would matter? On quick test just
calling fork around takes 240000 cycles so this would likely lost in
noise.

Szabolcs Nagy Sept. 11, 2015, 9:08 a.m. UTC | #10

On 10/09/15 22:45, navid Rahimi wrote:
> On Thu, Sep 10, 2015 at 10:56 PM, Phil Blundell <pb@pbcl.net> wrote:
>> On Thu, 2015-09-10 at 19:35 +0430, Navid Rahimi wrote:
>>> I think our main objection here is to avoid forking when there is no
>>> file.There is so many other variable for checking if execve is going to
>>> success or not.
>>
>> On the other hand, your patch will add an extra system call and
>> directory lookup to every successful posix_spawn() call, i.e. you are
>> optimising the failure case at the expense of the successful case.  It's
>> not at all obvious to me that this is a sensible thing to do.  Can you
>> explain your reasoning in a bit more detail?
>>
>> p.
>>
>>
>
> I agree with you completely about overhead and extra syscall , but
> there is no other option if we want to check fork. About optimizing

there are other options.

for example you can correctly check if execve returned
and report an error then instead of doing approximations
of the right check.

you can report the error through a cloexec pipe to the parent.

note that a correct posix_spawn implementation never uses
fork for QoI reasons (that's the point of posix_spawn, so
it works on nommu, large multi-thread applications etc.)
and vfork has the wrong semantics for c code so there are
other issues here.

> failure case , I would not call it optimizing , I returning correct
> error code and correct answer to caller at expense of one extra
> syscall overhead . Perfect Abstraction I think is the most priority of
> every project.Adding extra syscall will have overhead , but without
> that we are not reporting status correctly to caller.
>

Florian Weimer Sept. 11, 2015, 9:18 a.m. UTC | #11

On 09/10/2015 11:50 PM, Zack Weinberg wrote:
> On Thu, Sep 10, 2015 at 5:38 PM, navid Rahimi <rahimi.nv@gmail.com> wrote:
>> On Thu, Sep 10, 2015 at 7:32 PM, Zack Weinberg <zackw@panix.com> wrote:
>>> Why is a TOCTOU race acceptable and/or unavoidable? Also, why is a check using the real rather than the effective credentials correct here?
>>
>> Because of nature of lock in unix (being advisory) I think there is no
>> way to lock file and prevent TOCTOU, even if we had lock mechanism ,
>> it is cumbersome (and almost impossible) to design in correctly.
> 
> It just occurred to me that you could open() the file in the parent
> and then use fexecve() in the child.  If that worked correctly it
> would address both my concerns and Phil's.  However, fexecve is faked
> on top of /proc, which means there are all sorts of ways it might not
> work 100% reliably, and for correctness you would need an open mode
> that may not actually exist (O_EXEC, I guess it would be called).

The correct way is to set up a pipe with O_CLOEXEC file descriptors in
the parent prior to forking.  The parent reads from the pipe after
forking, waiting.  When the execve succeeds, the read in the parent
completes without returning any data.  On execve failure, the child
writes the errno value over the pipe, where the parent can read it and
report it.

The downside is that this adds additional blocking to the posix_spawn
operation, which is not something all callers want (some may even prefer
that the vfork happens on a new thread).  So I think this needs a
configuration knob.

> All in all I am inclined to say leave things as they are.  Nobody uses
> posix_spawn if they can help it, anyway :)

We need to move people off fork and clone to posix_spawn.  If it is
missing functionality, we should add it.

Alexander Monakov Sept. 11, 2015, 9:33 a.m. UTC | #12

On Fri, 11 Sep 2015, Florian Weimer wrote:
> The downside is that this adds additional blocking to the posix_spawn
> operation, which is not something all callers want (some may even prefer
> that the vfork happens on a new thread).  So I think this needs a
> configuration knob.

Huh?  By definition, the vfork parent is suspended until the vfork child
either execs or terminates.  In both outcomes, the pipe will be already
closed when the parent is resumed, so it will not block on reading.

(and if you can rely on parent-suspending semantics of vfork, you don't even
need a pipe)

Unless I've missed something?

Alexander

Navid Rahimi Sept. 11, 2015, 9:34 a.m. UTC | #13

On Fri, Sep 11, 2015 at 1:48 PM, Florian Weimer <fweimer@redhat.com> wrote:
> The correct way is to set up a pipe with O_CLOEXEC file descriptors in
> the parent prior to forking.  The parent reads from the pipe after
> forking, waiting.  When the execve succeeds, the read in the parent
> completes without returning any data.  On execve failure, the child
> writes the errno value over the pipe, where the parent can read it and
> report it.

I was reading musl source code to figure it out how they handle this
situation , they use exact same method . They use pipe's to get
situation of child .The main key point here is the closure of pipe in
successful execve(O_CLOEXEC).
Implementation of it is not hard at all. I will work on patch to send in days.

Rich Felker Sept. 11, 2015, 3:55 p.m. UTC | #14

On Fri, Sep 11, 2015 at 12:33:24PM +0300, Alexander Monakov wrote:
> On Fri, 11 Sep 2015, Florian Weimer wrote:
> > The downside is that this adds additional blocking to the posix_spawn
> > operation, which is not something all callers want (some may even prefer
> > that the vfork happens on a new thread).  So I think this needs a
> > configuration knob.
> 
> Huh?  By definition, the vfork parent is suspended until the vfork child
> either execs or terminates.  In both outcomes, the pipe will be already
> closed when the parent is resumed, so it will not block on reading.
> 
> (and if you can rely on parent-suspending semantics of vfork, you don't even
> need a pipe)
> 
> Unless I've missed something?

I've found CLONE_VFORK unreliable on Linux: under some conditions,
when running under ptrace, I observed the parent returning before the
child execs/exits, and of course horrible memory corruption resulted.
Thus I strongly prefer using CLONE_VFORK merely as a hint to the
scheduler (which helps prevent the extra blocking step in read) and
using FD_CLOEXEC for the actual synchronization.

Rich

Andreas Schwab Sept. 11, 2015, 4:23 p.m. UTC | #15

Rich Felker <dalias@libc.org> writes:

> I've found CLONE_VFORK unreliable on Linux: under some conditions,
> when running under ptrace, I observed the parent returning before the
> child execs/exits, and of course horrible memory corruption resulted.

Did you report it?

Andreas.

Rich Felker Sept. 11, 2015, 4:59 p.m. UTC | #16

On Fri, Sep 11, 2015 at 06:23:07PM +0200, Andreas Schwab wrote:
> Rich Felker <dalias@libc.org> writes:
> 
> > I've found CLONE_VFORK unreliable on Linux: under some conditions,
> > when running under ptrace, I observed the parent returning before the
> > child execs/exits, and of course horrible memory corruption resulted.
> 
> Did you report it?

At the time I was too busy to try to track down exactly why it was
happening, and it was unclear to me whether this was the kernel's
fault or the tracing process's fault (strace).

Rich

Mike Frysinger Sept. 11, 2015, 6:22 p.m. UTC | #17

doing an access/stat/whatever doesn't solve the issue you described --
as others have pointed out, there's a TOCTOU race, but there's also no
guarantee that doing a stat/access on mode bits means you can actually
execute that file and/or that the exec will succeed.  since this doesn't
actually fix the problem you've described (and others have pointed out
that it's not a spec violation), then this patch as-is is dead.
-mike

Roland McGrath Sept. 11, 2015, 8:18 p.m. UTC | #18

access has the wrong semantics.  You could use euidaccess.  But then that
will has the wrong semantics if POSIX_SPAWN_RESETIDS was used.  It quickly
gets quite hairy.  Trying to match the semantics that execve will use after
everything that posix_spawn might be doing in the child side is a losing
battle.

Navid Rahimi Sept. 11, 2015, 8:26 p.m. UTC | #19

On Sat, Sep 12, 2015 at 12:48 AM, Roland McGrath <roland@hack.frob.com> wrote:
> access has the wrong semantics.  You could use euidaccess.  But then that
> will has the wrong semantics if POSIX_SPAWN_RESETIDS was used.  It quickly
> gets quite hairy.  Trying to match the semantics that execve will use after
> everything that posix_spawn might be doing in the child side is a losing
> battle.

About using access/euidaccess , both are dead end. So I get over it.
I thought best approach would be using pipe's but pipes are not
available all the time.

Zack Weinberg Sept. 11, 2015, 9:01 p.m. UTC | #20

On Fri, Sep 11, 2015 at 5:18 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> All in all I am inclined to say leave things as they are.  Nobody uses
>> posix_spawn if they can help it, anyway :)
>
> We need to move people off fork and clone to posix_spawn.  If it is
> missing functionality, we should add it.

... I honestly have no idea why you say that.  I am under the
impression that posix_spawn is a monstrosity from the days when the
POSIX committee was inclined to make up interfaces out of whole cloth
in order to fix things that weren't broken, and that it should under
no circumstances be used in new code.

zw

Szabolcs Nagy Sept. 14, 2015, 2:04 p.m. UTC | #21

On 11/09/15 22:01, Zack Weinberg wrote:
> On Fri, Sep 11, 2015 at 5:18 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> All in all I am inclined to say leave things as they are.  Nobody uses
>>> posix_spawn if they can help it, anyway :)
>>
>> We need to move people off fork and clone to posix_spawn.  If it is
>> missing functionality, we should add it.
>
> ... I honestly have no idea why you say that.  I am under the

because fork has issues

- spawn process from a large application (memory overcommit off)
   is unreliable with fork.
- using it from a multi-threaded application (so from a library)
   is hard, because the child has to be AS-safe.
- leaks sensitive information to the child (the sensitive info
   is often managed by a library and there is no way of safely
   clearing it on fork).
- pthread_atfork is broken, further limiting the applicability
   of fork in multi-threaded code (it cannot be implemented if
   fork has to be AS-safe and the interface contract of the
   callbacks are ill-defined).
- it has no simple implementation if the underlying platform has
   no fork syscall with the right semantics (posix on windows,
   nommu,...)

> impression that posix_spawn is a monstrosity from the days when the
> POSIX committee was inclined to make up interfaces out of whole cloth
> in order to fix things that weren't broken, and that it should under
> no circumstances be used in new code.

posix_spawn is the only standard interface that can
work (some) fork limitations around.

new code should use it instead of fork (or vfork) if
possible.

>
> zw
>

Mike Frysinger Sept. 15, 2015, 10:02 p.m. UTC | #22

On 14 Sep 2015 15:04, Szabolcs Nagy wrote:
> On 11/09/15 22:01, Zack Weinberg wrote:
> > On Fri, Sep 11, 2015 at 5:18 AM, Florian Weimer wrote:
> >>> All in all I am inclined to say leave things as they are.  Nobody uses
> >>> posix_spawn if they can help it, anyway :)
> >>
> >> We need to move people off fork and clone to posix_spawn.  If it is
> >> missing functionality, we should add it.
> >
> > ... I honestly have no idea why you say that.  I am under the
> 
> because fork has issues
> 
> - spawn process from a large application (memory overcommit off)
>    is unreliable with fork.
> - using it from a multi-threaded application (so from a library)
>    is hard, because the child has to be AS-safe.
> - leaks sensitive information to the child (the sensitive info
>    is often managed by a library and there is no way of safely
>    clearing it on fork).
> - pthread_atfork is broken, further limiting the applicability
>    of fork in multi-threaded code (it cannot be implemented if
>    fork has to be AS-safe and the interface contract of the
>    callbacks are ill-defined).
> - it has no simple implementation if the underlying platform has
>    no fork syscall with the right semantics (posix on windows,
>    nommu,...)

how does posix_spawn mangically fix these things ?  it still calls
fork internally, and the use of the vfork flag is non-portable.
-mike

Navid Rahimi Sept. 15, 2015, 10:06 p.m. UTC | #23

On Wed, Sep 16, 2015 at 2:32 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> how does posix_spawn mangically fix these things ?  it still calls
> fork internally, and the use of the vfork flag is non-portable.
> -mike

I sent a v2 ( in fact v3 ) of patch which solve the bug with pipes in
whole new thread in mailing list,
but personally I have these question too , posix_spawn is just another
layer on top of fork (or vfork) but seems people count on it as hole
another approach .
It is a little hard to understand for me how is it differ from fork
when it is just fork with a bunch of other options.

best wishes,
-navid

Szabolcs Nagy Sept. 16, 2015, 8:35 a.m. UTC | #24

On 15/09/15 23:02, Mike Frysinger wrote:
> On 14 Sep 2015 15:04, Szabolcs Nagy wrote:
>> On 11/09/15 22:01, Zack Weinberg wrote:
>>> On Fri, Sep 11, 2015 at 5:18 AM, Florian Weimer wrote:
>>>>> All in all I am inclined to say leave things as they are.  Nobody uses
>>>>> posix_spawn if they can help it, anyway :)
>>>>
>>>> We need to move people off fork and clone to posix_spawn.  If it is
>>>> missing functionality, we should add it.
>>>
>>> ... I honestly have no idea why you say that.  I am under the
>>
>> because fork has issues
>>
>> - spawn process from a large application (memory overcommit off)
>>     is unreliable with fork.
>> - using it from a multi-threaded application (so from a library)
>>     is hard, because the child has to be AS-safe.
>> - leaks sensitive information to the child (the sensitive info
>>     is often managed by a library and there is no way of safely
>>     clearing it on fork).
>> - pthread_atfork is broken, further limiting the applicability
>>     of fork in multi-threaded code (it cannot be implemented if
>>     fork has to be AS-safe and the interface contract of the
>>     callbacks are ill-defined).
>> - it has no simple implementation if the underlying platform has
>>     no fork syscall with the right semantics (posix on windows,
>>     nommu,...)
>
> how does posix_spawn mangically fix these things ?  it still calls
> fork internally, and the use of the vfork flag is non-portable.
> -mike

the api is such that user code is not run with a copy of the
address space in the child so an implementation without fork
is possible (that's the point of this api).

an implementation with fork is not useful as you noted.

an implementation with vfork is problematic in c because
the compiler is free to clobber the stack of the parent.

it can be implemented with clone and CLONE_VM on linux.
(see the musl posix_spawn by Rich Felker) and that solves
the problems (of course the kinds of actions before exec
are limited by the api so it's not always useful/simple).

Carlos O'Donell Sept. 18, 2015, 2:44 p.m. UTC | #25

On 09/11/2015 05:08 AM, Szabolcs Nagy wrote:
> On 10/09/15 22:45, navid Rahimi wrote:
>> On Thu, Sep 10, 2015 at 10:56 PM, Phil Blundell <pb@pbcl.net> wrote:
>>> On Thu, 2015-09-10 at 19:35 +0430, Navid Rahimi wrote:
>>>> I think our main objection here is to avoid forking when there is no
>>>> file.There is so many other variable for checking if execve is going to
>>>> success or not.
>>>
>>> On the other hand, your patch will add an extra system call and
>>> directory lookup to every successful posix_spawn() call, i.e. you are
>>> optimising the failure case at the expense of the successful case.  It's
>>> not at all obvious to me that this is a sensible thing to do.  Can you
>>> explain your reasoning in a bit more detail?
>>>
>>> p.
>>>
>>>
>>
>> I agree with you completely about overhead and extra syscall , but
>> there is no other option if we want to check fork. About optimizing
> 
> there are other options.
> 
> for example you can correctly check if execve returned
> and report an error then instead of doing approximations
> of the right check.
> 
> you can report the error through a cloexec pipe to the parent.
> 
> note that a correct posix_spawn implementation never uses
> fork for QoI reasons (that's the point of posix_spawn, so
> it works on nommu, large multi-thread applications etc.)
> and vfork has the wrong semantics for c code so there are
> other issues here.

Note that it is possible to use vfork in certain conditions,
and we do in glibc. So one should not entirely dismiss vfork,
but that's slightly off topic.

c.

Szabolcs Nagy Sept. 18, 2015, 3:03 p.m. UTC | #26

On 18/09/15 15:44, Carlos O'Donell wrote:
> Note that it is possible to use vfork in certain conditions,
> and we do in glibc. So one should not entirely dismiss vfork,
> but that's slightly off topic.

are you sure?

i think all use of vfork is invalid c: the compiler can
spill registers on the stack then in the child clobber
them, then after vfork returns in the parent the
clobbered registers are restored breaking the expectations
of the compiler. (this can break independently of how
the c code around vfork looks like).

i think you have to know the compiler internals or use
some language extension to make it valid.

Alexander Monakov Sept. 18, 2015, 3:19 p.m. UTC | #27

On Fri, 18 Sep 2015, Szabolcs Nagy wrote:

> On 18/09/15 15:44, Carlos O'Donell wrote:
> > Note that it is possible to use vfork in certain conditions,
> > and we do in glibc. So one should not entirely dismiss vfork,
> > but that's slightly off topic.
> 
> are you sure?
> 
> i think all use of vfork is invalid c: the compiler can
> spill registers on the stack then in the child clobber
> them, then after vfork returns in the parent the
> clobbered registers are restored breaking the expectations
> of the compiler. (this can break independently of how
> the c code around vfork looks like).

The same argument applies to setjmp, and the solution in GCC is the
same for both: GCC internally recognizes vfork, setjmp, and a few other
functions as functions that "return twice".

(after all, if it was broken like you describe, one would expect to witness
such breakage in practice, for instance with GNU Make prior to 4.0)

Alexander

Szabolcs Nagy Sept. 18, 2015, 3:32 p.m. UTC | #28

On 18/09/15 16:19, Alexander Monakov wrote:
> On Fri, 18 Sep 2015, Szabolcs Nagy wrote:
>
>> On 18/09/15 15:44, Carlos O'Donell wrote:
>>> Note that it is possible to use vfork in certain conditions,
>>> and we do in glibc. So one should not entirely dismiss vfork,
>>> but that's slightly off topic.
>>
>> are you sure?
>>
>> i think all use of vfork is invalid c: the compiler can
>> spill registers on the stack then in the child clobber
>> them, then after vfork returns in the parent the
>> clobbered registers are restored breaking the expectations
>> of the compiler. (this can break independently of how
>> the c code around vfork looks like).
>
> The same argument applies to setjmp, and the solution in GCC is the
> same for both: GCC internally recognizes vfork, setjmp, and a few other
> functions as functions that "return twice".
>
> (after all, if it was broken like you describe, one would expect to witness
> such breakage in practice, for instance with GNU Make prior to 4.0)
>

that's surprising: setjmp is standard c
and it is a macro so it cannot be called
through a function pointer.. this is not
true for vfork so it's not clear why gcc
recognizes it.

> Alexander
>

Alexander Monakov Sept. 18, 2015, 4:48 p.m. UTC | #29

> > The same argument applies to setjmp, and the solution in GCC is the
> > same for both: GCC internally recognizes vfork, setjmp, and a few other
> > functions as functions that "return twice".
> >
> > (after all, if it was broken like you describe, one would expect to witness
> > such breakage in practice, for instance with GNU Make prior to 4.0)
> >
> 
> that's surprising: setjmp is standard c
> and it is a macro so it cannot be called
> through a function pointer.. this is not
> true for vfork so it's not clear why gcc
> recognizes it.

Because a minor change to GCC (extending setjmp treatment to vfork) is overall
a more acceptable compromise than demanding fixes to every vfork-in-C usage?

(I can see your argument in a way that an abstract C compiler cannot be
required to treat vfork specially, unlike setjmp; but fwiw, not only GCC, but
Clang and cparser also appear to treat vfork similar to setjmp)

Alexander

Zack Weinberg Sept. 18, 2015, 4:52 p.m. UTC | #30

On Wed, Sep 16, 2015 at 4:35 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 15/09/15 23:02, Mike Frysinger wrote:
>> On 14 Sep 2015 15:04, Szabolcs Nagy wrote:
>>> On 11/09/15 22:01, Zack Weinberg wrote:
>>>> On Fri, Sep 11, 2015 at 5:18 AM, Florian Weimer wrote:
>>>>> We need to move people off fork and clone to posix_spawn.  If it is
>>>>> missing functionality, we should add it.
>>>>
>>>> ... I honestly have no idea why you say that.
>>>
>>> because fork has issues [...]
>>
>> how does posix_spawn magically fix these things ?  it still calls
>> fork internally [...]
>
> the api is such that user code is not run with a copy of the
> address space in the child so an implementation without fork
> is possible (that's the point of this api). [...]  (of course the kinds
> of actions before exec are limited by the api so it's not always
> useful/simple).

So that means migrating to posix_spawn doesn't fix anything until a
better kernel API exists, so developer effort should go into _that_,
and then applications should migrate directly to the new kernel API,
not to posix_spawn.

(I actually have a design sketched out in my head; unfortunately I
have neither the time nor the kernel programming experience to get it
done.  The fundamental idea is a new process state, "egg," in which
the process has a PID and a task_struct assigned but does not have an
address space and cannot execute instructions.  There are a bunch of
new system calls to set the kernel state of an egg -- open file
descriptors and the like -- and a modified execve() that loads a
program into an egg and sets it going.)

zw

Carlos O'Donell Sept. 18, 2015, 5:01 p.m. UTC | #31

On 09/18/2015 11:03 AM, Szabolcs Nagy wrote:
> On 18/09/15 15:44, Carlos O'Donell wrote:
>> Note that it is possible to use vfork in certain conditions,
>> and we do in glibc. So one should not entirely dismiss vfork,
>> but that's slightly off topic.
> 
> are you sure?
> 
> i think all use of vfork is invalid c: the compiler can
> spill registers on the stack then in the child clobber
> them, then after vfork returns in the parent the
> clobbered registers are restored breaking the expectations
> of the compiler. (this can break independently of how
> the c code around vfork looks like).
> 
> i think you have to know the compiler internals or use
> some language extension to make it valid.

The only way I know to make it safe is to create a thread
to handle the vfork state changes, thus you're running
on a distinct stack.

See: http://red.ht/1NI34eT

Cheers,
Carlos.

Zack Weinberg Sept. 18, 2015, 5:13 p.m. UTC | #32

This was meant to be a reply to list.

---------- Forwarded message ----------
From: Zack Weinberg <zackw@panix.com>
Date: Fri, Sep 18, 2015 at 12:42 PM
Subject: Re: [PATCH] [BZ #18433] Check file access/existence before forking.
To: Szabolcs Nagy <szabolcs.nagy@arm.com>

On Fri, Sep 18, 2015 at 11:32 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> that's surprising: setjmp is standard c
> and it is a macro so it cannot be called
> through a function pointer.. this is not
> true for vfork so it's not clear why gcc
> recognizes it.

Any function can be annotated with these semantics using __attribute__
((returns_twice)), but glibc's headers don't do that, so GCC must have
special knowledge of the name 'vfork'; and this probably _is_ a
conformance violation, in that a program that doesn't include
<unistd.h> (or includes it in a feature-selection mode that doesn't
bring in a declaration of 'vfork') ought to be allowed to define its
own 'vfork' that has no such semantics.  (As far as I know, vfork is
_not_ in POSIX nor XSI.)  I don't see any way around it, though.
Breaking programs that call vfork, expecting it to have its special
behavior, would be worse.  And I expect the code-generation changes
are not _wrong_ in the case that vfork doesn't return twice, just less
optimal.

And yes, I expect Bad Things happen if someone calls vfork through a
function pointer.

zw

Rich Felker Sept. 18, 2015, 7:53 p.m. UTC | #33

On Fri, Sep 18, 2015 at 06:19:33PM +0300, Alexander Monakov wrote:
> On Fri, 18 Sep 2015, Szabolcs Nagy wrote:
> 
> > On 18/09/15 15:44, Carlos O'Donell wrote:
> > > Note that it is possible to use vfork in certain conditions,
> > > and we do in glibc. So one should not entirely dismiss vfork,
> > > but that's slightly off topic.
> > 
> > are you sure?
> > 
> > i think all use of vfork is invalid c: the compiler can
> > spill registers on the stack then in the child clobber
> > them, then after vfork returns in the parent the
> > clobbered registers are restored breaking the expectations
> > of the compiler. (this can break independently of how
> > the c code around vfork looks like).
> 
> The same argument applies to setjmp,

Not quite. With setjmp, once there's any return from the function
where setjmp was called (or call to a function that's known neither to
return nor call longjmp or throw an exception), the compiler may
rightfully assume that non-reachable data in the setjmp caller is no
longer live and clobber it.

What makes vfork is special is that the data must be treated as live
even when the caller calls _exit.

> and the solution in GCC is the
> same for both: GCC internally recognizes vfork, setjmp, and a few other
> functions as functions that "return twice".

If gcc treats vfork even more specially than setjmp, or slightly
pessimizes behavior around setjmp to force all data to remain live,
then I think it's safe to use vfork as long as your compiler is gcc.
Whether other compilers that are or might become interesting do the
same, I don't know. It's not an assumption I like.

> (after all, if it was broken like you describe, one would expect to witness
> such breakage in practice, for instance with GNU Make prior to 4.0)

I thought the reason GNU make switches was that they encountered
subtle breakage in practice somewhere... Of course GNU make has to be
compatible with basically _all_ compilers, not just gcc, so that might
be the reason.

Rich

Mike Frysinger Sept. 18, 2015, 8:13 p.m. UTC | #34

On 18 Sep 2015 13:13, Zack Weinberg wrote:
> From: Zack Weinberg <zackw@panix.com>
> On Fri, Sep 18, 2015 at 11:32 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> > that's surprising: setjmp is standard c
> > and it is a macro so it cannot be called
> > through a function pointer.. this is not
> > true for vfork so it's not clear why gcc
> > recognizes it.
> 
> Any function can be annotated with these semantics using __attribute__
> ((returns_twice)), but glibc's headers don't do that

feel like posting a patch ? :)
-mike

Alexander Monakov Sept. 18, 2015, 8:19 p.m. UTC | #35

On Fri, 18 Sep 2015, Rich Felker wrote:
> > > i think all use of vfork is invalid c: the compiler can
> > > spill registers on the stack then in the child clobber
> > > them, then after vfork returns in the parent the
> > > clobbered registers are restored breaking the expectations
> > > of the compiler. (this can break independently of how
> > > the c code around vfork looks like).
> > 
> > The same argument applies to setjmp,
> 
> Not quite. With setjmp, once there's any return from the function
> where setjmp was called (or call to a function that's known neither to
> return nor call longjmp or throw an exception), the compiler may
> rightfully assume that non-reachable data in the setjmp caller is no
> longer live and clobber it.
> 
> What makes vfork is special is that the data must be treated as live
> even when the caller calls _exit.

For the compiler, the relationship between vfork and _exit should be the same
as between setjmp and longjmp: calling the latter causes the former to return
a second time.

Zack Weinberg Sept. 18, 2015, 9:59 p.m. UTC | #36

On September 18, 2015 4:13:02 PM EDT, Mike Frysinger <vapier@gentoo.org> wrote:
>On 18 Sep 2015 13:13, Zack Weinberg wrote:
>> From: Zack Weinberg <zackw@panix.com>
>> On Fri, Sep 18, 2015 at 11:32 AM, Szabolcs Nagy
><szabolcs.nagy@arm.com> wrote:
>> > that's surprising: setjmp is standard c
>> > and it is a macro so it cannot be called
>> > through a function pointer.. this is not
>> > true for vfork so it's not clear why gcc
>> > recognizes it.
>>
>> Any function can be annotated with these semantics using
>__attribute__
>> ((returns_twice)), but glibc's headers don't do that
>
>feel like posting a patch ? :)

Is that actually a useful change to make, considering GCC already
knows about vfork?

To be clear, I am not advocating for the use of vfork (in posix_spawn
or otherwise); I am only advocating *against* bothering to make
improvements to posix_spawn.

zw

Rich Felker Sept. 18, 2015, 11:26 p.m. UTC | #37

On Fri, Sep 18, 2015 at 11:19:56PM +0300, Alexander Monakov wrote:
> On Fri, 18 Sep 2015, Rich Felker wrote:
> > > > i think all use of vfork is invalid c: the compiler can
> > > > spill registers on the stack then in the child clobber
> > > > them, then after vfork returns in the parent the
> > > > clobbered registers are restored breaking the expectations
> > > > of the compiler. (this can break independently of how
> > > > the c code around vfork looks like).
> > > 
> > > The same argument applies to setjmp,
> > 
> > Not quite. With setjmp, once there's any return from the function
> > where setjmp was called (or call to a function that's known neither to
> > return nor call longjmp or throw an exception), the compiler may
> > rightfully assume that non-reachable data in the setjmp caller is no
> > longer live and clobber it.
> > 
> > What makes vfork is special is that the data must be treated as live
> > even when the caller calls _exit.
> 
> For the compiler, the relationship between vfork and _exit should be the same
> as between setjmp and longjmp: calling the latter causes the former to return
> a second time.

Yes, I suppose that's a good way to model it. But the returns_twice
attribute does not specify under what conditions the function returns
again, so unless gcc has vfork-specific knowledge attached to the
specific name, I suppose it must just assume everything stays live.

Rich

Andreas Schwab Sept. 19, 2015, 6:39 a.m. UTC | #38

Rich Felker <dalias@libc.org> writes:

> Yes, I suppose that's a good way to model it. But the returns_twice
> attribute does not specify under what conditions the function returns
> again, so unless gcc has vfork-specific knowledge attached to the
> specific name, I suppose it must just assume everything stays live.

GCC doesn't treat vfork any different from setjmp, or any other function
marked returns_twice, see calls.c:special_function_p.

Andreas.

Zack Weinberg Sept. 19, 2015, 3:04 p.m. UTC | #39

On Sat, Sep 19, 2015 at 2:39 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Rich Felker <dalias@libc.org> writes:
>
>> Yes, I suppose that's a good way to model it. But the returns_twice
>> attribute does not specify under what conditions the function returns
>> again, so unless gcc has vfork-specific knowledge attached to the
>> specific name, I suppose it must just assume everything stays live.
>
> GCC doesn't treat vfork any different from setjmp, or any other function
> marked returns_twice, see calls.c:special_function_p.

special_function_p just defines the set of functions assumed to have
this behavior (setjmp, setjmp_syscall, sigsetjmp, savectx, qsetjmp,
vfork, and getcontext); to find out what the compiler *does* with that
knowledge you have to grep for ECF_RETURNS_TWICE.

As best I can tell, the direct effects of a callsite being marked
ECF_RETURNS_TWICE seem to be that it is not eligible for various
optimizations (e.g. tail call) and, much more importantly, the
control-flow graph for the containing function is annotated with an
"abnormal edge" from *every other callsite which is not known to have
no side effects* to that callsite.

I'm not sure exactly what "abnormal edge" means to GCC, but I think
this should be conservatively correct, whether or not such a function
actually does return twice.

zw

Richard Henderson Sept. 21, 2015, 5:34 p.m. UTC | #40

On 09/19/2015 08:04 AM, Zack Weinberg wrote:
> On Sat, Sep 19, 2015 at 2:39 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> Rich Felker <dalias@libc.org> writes:
>>
>>> Yes, I suppose that's a good way to model it. But the returns_twice
>>> attribute does not specify under what conditions the function returns
>>> again, so unless gcc has vfork-specific knowledge attached to the
>>> specific name, I suppose it must just assume everything stays live.

Indeed it does assume everything stays live.

>>
>> GCC doesn't treat vfork any different from setjmp, or any other function
>> marked returns_twice, see calls.c:special_function_p.
> 
> special_function_p just defines the set of functions assumed to have
> this behavior (setjmp, setjmp_syscall, sigsetjmp, savectx, qsetjmp,
> vfork, and getcontext); to find out what the compiler *does* with that
> knowledge you have to grep for ECF_RETURNS_TWICE.
> 
> As best I can tell, the direct effects of a callsite being marked
> ECF_RETURNS_TWICE seem to be that it is not eligible for various
> optimizations (e.g. tail call) and, much more importantly, the
> control-flow graph for the containing function is annotated with an
> "abnormal edge" from *every other callsite which is not known to have
> no side effects* to that callsite.
> 
> I'm not sure exactly what "abnormal edge" means to GCC, but I think
> this should be conservatively correct, whether or not such a function
> actually does return twice.

Abnormal edge, i.e. abnormal flow of control, i.e. longjmp or throw-like.

It's a big hammer that kills quite a lot of data flow analysis, but thankfully
returns_twice functions are rare.  And in the situations they are used, there
usually isn't much room for improvement anyway, so why complicate the
correctness issue?


r~

Florian Weimer Sept. 30, 2015, 11:48 a.m. UTC | #41

On 09/18/2015 06:52 PM, Zack Weinberg wrote:
> On Wed, Sep 16, 2015 at 4:35 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>> On 15/09/15 23:02, Mike Frysinger wrote:
>>> On 14 Sep 2015 15:04, Szabolcs Nagy wrote:
>>>> On 11/09/15 22:01, Zack Weinberg wrote:
>>>>> On Fri, Sep 11, 2015 at 5:18 AM, Florian Weimer wrote:
>>>>>> We need to move people off fork and clone to posix_spawn.  If it is
>>>>>> missing functionality, we should add it.
>>>>>
>>>>> ... I honestly have no idea why you say that.
>>>>
>>>> because fork has issues [...]
>>>
>>> how does posix_spawn magically fix these things ?  it still calls
>>> fork internally [...]
>>
>> the api is such that user code is not run with a copy of the
>> address space in the child so an implementation without fork
>> is possible (that's the point of this api). [...]  (of course the kinds
>> of actions before exec are limited by the api so it's not always
>> useful/simple).
> 
> So that means migrating to posix_spawn doesn't fix anything until a
> better kernel API exists, so developer effort should go into _that_,
> and then applications should migrate directly to the new kernel API,
> not to posix_spawn.

posix_spawn is not a kernel API.  If applications move to that, we can
add optimizations as soon as better kernel APIs become available, and
applications will directly benefit.  That's the nice aspect of
pthread_spawn.  The not-so-nice aspect from an application point of view
is that the interface is so restrictive, and glibc basically has to vet
and implement any child process configuration option.

> (I actually have a design sketched out in my head; unfortunately I
> have neither the time nor the kernel programming experience to get it
> done.  The fundamental idea is a new process state, "egg," in which
> the process has a PID and a task_struct assigned but does not have an
> address space and cannot execute instructions.  There are a bunch of
> new system calls to set the kernel state of an egg -- open file
> descriptors and the like -- and a modified execve() that loads a
> program into an egg and sets it going.)

I think we actually have most bits in the kernel (what you are proposing
is a lot like the restore part of the checkpoint/restore support), we
just have to fit things together properly.

One thing that we can't really address: Some callers want to know the
child PID *before* the execve (so that pthread_spawn does not delay
execution), some want to know precisely if execve succeeded.  We
probably need to introduce a flag to choose between the two behaviors
because they are fundamentally incompatible.

Zack Weinberg Sept. 30, 2015, 2:35 p.m. UTC | #42

On Wed, Sep 30, 2015 at 7:48 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 09/18/2015 06:52 PM, Zack Weinberg wrote:
>> (I actually have a design sketched out in my head; unfortunately I
>> have neither the time nor the kernel programming experience to get it
>> done.  The fundamental idea is a new process state, "egg," in which
>> the process has a PID and a task_struct assigned but does not have an
>> address space and cannot execute instructions.  There are a bunch of
>> new system calls to set the kernel state of an egg -- open file
>> descriptors and the like -- and a modified execve() that loads a
>> program into an egg and sets it going.)
>
> I think we actually have most bits in the kernel (what you are proposing
> is a lot like the restore part of the checkpoint/restore support), we
> just have to fit things together properly.
>
> One thing that we can't really address: Some callers want to know the
> child PID *before* the execve (so that pthread_spawn does not delay
> execution), some want to know precisely if execve succeeded.  We
> probably need to introduce a flag to choose between the two behaviors
> because they are fundamentally incompatible.

This is a non-issue if applications code to the new kernel API rather
than posix_spawn. egg_create() returns the new PID, which you always
need to know, and egg_execve(pid, ...) tells you if it failed.

(A critical aspect of my idea, which I guess I forgot to mention, is
that all the new setting-up-the-egg syscalls report failure in the
*parent*; this wipes out a whole class of problems with failures in
between fork() and execve() being difficult to report accurately.)

zw

diff mbox

Patch

diff --git a/ChangeLog b/ChangeLog
index 5f009a8..0e0c85b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@ 
+2015-09-10  Navid Rahimi  <rahimi.nv@gmail.com>
+
+	[BZ #18433]
+	* sysdeps/posix/spawni.c (__spawni): Check file access before forking.
+
  2015-09-08  Joseph Myers  <joseph@codesourcery.com>

	[BZ #14912]
diff --git a/sysdeps/posix/spawni.c b/sysdeps/posix/spawni.c
index eee9331..c571390 100644
--- a/sysdeps/posix/spawni.c
+++ b/sysdeps/posix/spawni.c
@@ -90,6 +90,9 @@  __spawni (pid_t *pid, const char *file,
    size_t len;
    size_t pathlen;

+  if(__access (file, X_OK) != 0)
+    return errno;
+
    /* Do this once.  */
    short int flags = attrp == NULL ? 0 : attrp->__flags;