posix: Fix improper assert in Linux posix_spawn (BZ#22273)

Message ID 1507833562-6945-1-git-send-email-adhemerval.zanella@linaro.org
State Dropped
Headers

Commit Message

Adhemerval Zanella Netto Oct. 12, 2017, 6:39 p.m. UTC
  As noted by Florian Weimer, current Linux posix_spawn implementation
can trigger an assert if the auxiliary process is terminated before
actually setting the err member:

    340   /* Child must set args.err to something non-negative - we rely on
    341      the parent and child sharing VM.  */
    342   args.err = -1;
    [...]
    362   new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
    363                    CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
    364
    365   if (new_pid > 0)
    366     {
    367       ec = args.err;
    368       assert (ec >= 0);

Another possible issue is killing the child between setting the err and
actually calling execve.  In this case the process will not ran, but
posix_spawn also will not report any error:

    269
    270   args->err = 0;
    271   args->exec (args->file, args->argv, args->envp);

As suggested by Andreas Schwab, this patch removes the faulty assert
and also handles any signal that happens before fork and execve as the
spawn was successful (and thus relaying the handling to the caller to
figure this out).  Different than Florian, I can not see why using
atomics to set err would help here, essentially the code runs
sequentially (due CLONE_VFORK) and I think it would not be legal the
compiler evaluate ec without checking for new_pid result (thus there
is no need to compiler barrier).

Checked on x86_64-linux-gnu.

	[BZ #22273]
	* sysdeps/unix/sysv/linux/spawni.c (__spawnix): Handle the case where
	the auxiliary process is terminated by a signal before calling _exit
	or execve.
---
 ChangeLog                        |  6 ++++++
 sysdeps/unix/sysv/linux/spawni.c | 12 ++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)
  

Comments

Florian Weimer Oct. 12, 2017, 6:49 p.m. UTC | #1
* Adhemerval Zanella:

> As suggested by Andreas Schwab, this patch removes the faulty assert
> and also handles any signal that happens before fork and execve as the
> spawn was successful (and thus relaying the handling to the caller to
> figure this out).  Different than Florian, I can not see why using
> atomics to set err would help here, essentially the code runs
> sequentially (due CLONE_VFORK) and I think it would not be legal the
> compiler evaluate ec without checking for new_pid result (thus there
> is no need to compiler barrier).

I thought you'd need to guard against reordering in the child.  Since
the err member is the only thing that is written and the syscalls in
the child act as compiler barriers (so that the store cannot float to
the beginning of the function), this is not actually an issue here.
  
Adhemerval Zanella Netto Oct. 17, 2017, 1:42 p.m. UTC | #2
On 12/10/2017 15:49, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> As suggested by Andreas Schwab, this patch removes the faulty assert
>> and also handles any signal that happens before fork and execve as the
>> spawn was successful (and thus relaying the handling to the caller to
>> figure this out).  Different than Florian, I can not see why using
>> atomics to set err would help here, essentially the code runs
>> sequentially (due CLONE_VFORK) and I think it would not be legal the
>> compiler evaluate ec without checking for new_pid result (thus there
>> is no need to compiler barrier).
> 
> I thought you'd need to guard against reordering in the child.  Since
> the err member is the only thing that is written and the syscalls in
> the child act as compiler barriers (so that the store cannot float to
> the beginning of the function), this is not actually an issue here.
> 

Right, so are you ok with the patch proposed?
  
Florian Weimer Oct. 17, 2017, 8:36 p.m. UTC | #3
* Adhemerval Zanella:

> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
> index dea1650..d6acc1e 100644
> --- a/sysdeps/unix/sysv/linux/spawni.c
> +++ b/sysdeps/unix/sysv/linux/spawni.c
> @@ -365,9 +365,17 @@ __spawnix (pid_t * pid, const char *file,
>    if (new_pid > 0)
>      {
>        ec = args.err;
> -      assert (ec >= 0);
>        if (ec != 0)
> -	  __waitpid (new_pid, NULL, 0);
> +	{
> +	  /* It handles the unlikely case where the auxiliary process is
> +	     terminated by a signal before calling _exit or execve.  For
> +	     this case the signal is relayed to the called, as the spawn
> +	     was successful.  */
> +	  int status;
> +	  __waitpid (new_pid, &status, WNOHANG);
> +	  if (WIFSIGNALED (status))
> +	    ec = 0;
> +	}

I'm sorry to say, but this does not look correct to me.  A SIGCHLD
handler may have collected the PID at the point of the waitpid call,
so it can fail with ECHLD or collect the wait notification for another
subprocess due to PID reuse.

I think we need to treat ec == -1 (process did not proceed to the
execve call because it was terminated by a signal) just like ec == 0
(process made a successful execve call, or was terminated just before
the system call, or just after a failed execve, again due to signals).
That is, return the PID to the caller and let it handle the abnormal
process termination due t a signal.

So the assert needs to be removed and the condition changed from ec !=
0 to ec > 0.  And this needs plenty of comments, explaining that we
report subprocess termination due to a signal to the caller and cannot
handle it locally due to the PID reuse race.

I'm not sure if it is necessary to initialize args.err (and ec) to -1
in the first place and reset it in the subprocess.  It seems to me
that this can be simplified a bit because we only need the non-zero
value to carry the information that execve failed, and the error code
with which it failed.
  

Patch

diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index dea1650..d6acc1e 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -365,9 +365,17 @@  __spawnix (pid_t * pid, const char *file,
   if (new_pid > 0)
     {
       ec = args.err;
-      assert (ec >= 0);
       if (ec != 0)
-	  __waitpid (new_pid, NULL, 0);
+	{
+	  /* It handles the unlikely case where the auxiliary process is
+	     terminated by a signal before calling _exit or execve.  For
+	     this case the signal is relayed to the called, as the spawn
+	     was successful.  */
+	  int status;
+	  __waitpid (new_pid, &status, WNOHANG);
+	  if (WIFSIGNALED (status))
+	    ec = 0;
+	}
     }
   else
     ec = -new_pid;