Message ID | 20191010133518.5420-1-christian.brauner@ubuntu.com |
---|---|
State | Not applicable |
Headers |
Received: (qmail 32782 invoked by alias); 10 Oct 2019 15:24:26 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 32650 invoked by uid 89); 10 Oct 2019 15:24:18 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-15.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_SBL_CSS, RCVD_IN_SORBS_WEB autolearn=unavailable version=3.3.1 spammy=HReceived-SPF:pass, HX-Google-DKIM-Signature:sk:x-origi, HX-Original-Authentication-Results:spf, HX-Original-Authentication-Results:permitted X-HELO: youngberry.canonical.com Resent-From: Christian Brauner <christian.brauner@ubuntu.com> Resent-Date: Thu, 10 Oct 2019 17:24:00 +0200 Resent-Message-ID: <20191010152400.563fhfv65akvwa4i@wittgenstein> Resent-To: libc-alpha@sourceware.org ARC-Seal: i=2; a=rsa-sha256; t=1570714608; cv=pass; d=google.com; s=arc-20160816; b=ZWIQoSEOmOg1I6u3R1MivFfVHC/otpVbIdOPunaQxDct/QynSfs0AbQXBiEUZgNnfl 9JcHMe8oihE8JfvAompTxs6Q3VqS48dciHYwf0OmM3Sq6haResFkIQyaRc+wOMWWOKf0 NIZV5RWzt+hrmAo8jmd4LcA1nMla8BB4wNdyNEIW8O1iDdn0udgYjCDsa0Dj4USGlY0q Ur9WJ82e/YqldSD+vYv1sn/q/U+6PouGgl0TYG+tmdOLq8GU7h4eA4AoYc2jktVSwhYN oFiq+5ZuyE5gapJ6Q6e9uW1pPdliHndP0+6xBJxF8/bkSGWdwQNF7XGBvQS4DMLyyahj 4tBQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:delivered-to; bh=ojv6YG6RITwlcY5aroAxXtMlkW/FY86TUJN6zaSvQas=; b=j8mrKajqfVfiX/PqTM+TLUm6/2p9UllFgjOtDoWYRuiInuNBne3DghlRP8ifHXXDut b+MU47wuMF0ivEQXdpnNbCOw8MP//0cG82i1HxJcMqEjBSfaWBASuVYUR1SLdWBR+1cu UPH49G/WnqnZU9KevdoufSELrU8iPMDekzv0kATnR4I63R+zpIgiUkdkN+zO9nyjCD50 YuuUT8nHyOVQhA6ecsojIPYRc5euj7MU7n1h58OOfyO8eMxmj7878U9J/2usUCW/Oe2/ t85sdtxY+TMTs9iM0EAUkL3OplWApcBd/tobdP1RMg8ibez8YLNaPfy6mP3gPjb082Am GVCw== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: best guess record for domain of christian.brauner+caf_=christian=brauner.io@canonical.com designates 91.189.89.112 as permitted sender) smtp.mailfrom="christian.brauner+caf_=christian=brauner.io@canonical.com" Received-SPF: pass (google.com: best guess record for domain of christian.brauner+caf_=christian=brauner.io@canonical.com designates 91.189.89.112 as permitted sender) client-ip=91.189.89.112; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: best guess record for domain of christian.brauner+caf_=christian=brauner.io@canonical.com designates 91.189.89.112 as permitted sender) smtp.mailfrom="christian.brauner+caf_=christian=brauner.io@canonical.com" X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 91.189.89.112 is neither permitted nor denied by best guess record for domain of christian.brauner@ubuntu.com) smtp.mailfrom=christian.brauner@ubuntu.com X-Forwarded-To: christian@brauner.io X-Forwarded-For: christian.brauner@canonical.com christian@brauner.io ARC-Seal: i=1; a=rsa-sha256; t=1570714605; cv=none; d=google.com; s=arc-20160816; b=PwyZdd9b18tlnDd+20kPV+Dn7rlsNQojmYbugHWbdaRw0SzUZsMzCyA97xH0kyjKOC WMAJ1Wr9KYU4pByG6eUqCrBGFgLUqxQ7ouAyvKtQHesAULejtT80NxNGRnkPIN57+BV0 sKi5B4791C+xSZakbN/S3oFV/pIcUEG5D2rkkcq9xI2HW3hMGzaV7LbvaV8Lp/YtmxCY AJ4UZk1lJpiKVnaZrdTDt7iw+MlesD9ye438xQlz9MtlHSPfUW2f/wJwzp3coKn3JYlJ w9CN/nt+lB/6cUfN+NpvMpsBLMeiTTAjeIXpEdayEk+/xB9o+fJ4UGuEF0teL3dNYLRg HCZg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from; bh=ojv6YG6RITwlcY5aroAxXtMlkW/FY86TUJN6zaSvQas=; b=lbolqUsuByo5Rp2KYVByv6vRfq2TFfGil2k7u+ZMBX5H7D1N2b1BC5wV7riRMWo1vF kCV7vZtXBBiI1aIrNLU+MZbCppz1h6yPZJHoGSoZ/QflexphSZa6KuturHNT25APW8A9 Q5LD4TxUBfQejo2eErMO70gmXzRopQEDYtqqW/xb1IPvaz0w+IH/eLB3pHaYtVDVW5H8 prUKG9QZaRIJTMoRtk+nNqNqUtfk1rQ136TIRxZ3gN+D2oJRKIOiketBFCXbhCDwXeQ+ V+u4ukzbC6Y+nMJpSAJSaXonPzu3gABKAA8jYngq/UOnaFogiquUPjNbHR5zL3sgpYMk Jslg== ARC-Authentication-Results: i=1; mx.google.com; spf=neutral (google.com: 91.189.89.112 is neither permitted nor denied by best guess record for domain of christian.brauner@ubuntu.com) smtp.mailfrom=christian.brauner@ubuntu.com Received-SPF: neutral (google.com: 91.189.89.112 is neither permitted nor denied by best guess record for domain of christian.brauner@ubuntu.com) client-ip=91.189.89.112; From: Christian Brauner <christian.brauner@ubuntu.com> To: linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>, Florian Weimer <fweimer@redhat.com>, libc-alpha@sourceware.org Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Steven Rostedt <rostedt@goodmis.org>, Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>, Shuah Khan <shuah@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Michal Hocko <mhocko@suse.com>, Elena Reshetova <elena.reshetova@intel.com>, Thomas Gleixner <tglx@linutronix.de>, Roman Gushchin <guro@fb.com>, Andrea Arcangeli <aarcange@redhat.com>, Al Viro <viro@zeniv.linux.org.uk>, Aleksa Sarai <cyphar@cyphar.com>, "Dmitry V. Levin" <ldv@altlinux.org>, linux-kselftest@vger.kernel.org, Christian Brauner <christian.brauner@ubuntu.com> Subject: [PATCH 1/2] clone3: add CLONE3_CLEAR_SIGHAND Date: Thu, 10 Oct 2019 15:35:17 +0200 Message-Id: <20191010133518.5420-1-christian.brauner@ubuntu.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit |
Commit Message
Christian Brauner
Oct. 10, 2019, 1:35 p.m. UTC
Reset all signal handlers of the child not set to SIG_IGN to SIG_DFL.
Mutually exclusive with CLONE_SIGHAND to not disturb other thread's
signal handler.
In the spirit of closer cooperation between glibc developers and kernel
developers (cf. [2]) this patchset came out of a discussion on the glibc
mailing list for improving posix_spawn() (cf. [1], [3], [4]). Kernel
support for this feature has been explicitly requested by glibc and I
see no reason not to help them with this.
The child helper process on Linux posix_spawn must ensure that no signal
handlers are enabled, so the signal disposition must be either SIG_DFL
or SIG_IGN. However, it requires a sigprocmask to obtain the current
signal mask and at least _NSIG sigaction calls to reset the signal
handlers for each posix_spawn call or complex state tracking that might
lead to data corruption in glibc. Adding this flags lets glibc avoid
these problems.
[1]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00149.html
[3]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00158.html
[4]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00160.html
[2]: https://lwn.net/Articles/799331/
'[...] by asking for better cooperation with the C-library projects
in general. They should be copied on patches containing ABI
changes, for example. I noted that there are often times where
C-library developers wish the kernel community had done things
differently; how could those be avoided in the future? Members of
the audience suggested that more glibc developers should perhaps
join the linux-api list. The other suggestion was to "copy Florian
on everything".'
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: libc-alpha@sourceware.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
include/uapi/linux/sched.h | 3 +++
kernel/fork.c | 11 ++++++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
Comments
On Fri, Oct 11, 2019 at 10:21:18AM +0200, Michal Hocko wrote:
> [Cc linux-api]
Right, thanks Michal.
Christian
On 2019-10-11, Michael Kerrisk <mtk.manpages@gmail.com> wrote:
> Why CLONE3_CLEAR_SIGHAND rather than just CLONE_CLEAR_SIGHAND?
There are no more flag bits left for the classic clone()/clone2() (the
last one was used up by CLONE_PIDFD) -- thus this flag is clone3()-only.
Hello Aleksa, On Sat, 12 Oct 2019 at 00:12, Aleksa Sarai <cyphar@cyphar.com> wrote: > > On 2019-10-11, Michael Kerrisk <mtk.manpages@gmail.com> wrote: > > Why CLONE3_CLEAR_SIGHAND rather than just CLONE_CLEAR_SIGHAND? > > There are no more flag bits left for the classic clone()/clone2() (the > last one was used up by CLONE_PIDFD) -- thus this flag is clone3()-only. Yes, I understand that. But, I'm not sure that the "3" in the prefix is necessary. "CLONE_" still seems better to me. Consider this: sometime in the near future we will probably have time namespaces. The new flag for those namespaces will only be usable with clone3(). It should NOT be called CLONE3_NEWTIME, but rather CLONE_NEWTIME (or similar), because that same flag will presumably also be used in other APIs such as unshare() and setns(). (Hmm -- I wonder if we are going to need a new unshare2() or some such...) Thanks, Michael
On Sat, Oct 12, 2019 at 08:53:34AM +0200, Michael Kerrisk (man-pages) wrote: > Hello Aleksa, > > On Sat, 12 Oct 2019 at 00:12, Aleksa Sarai <cyphar@cyphar.com> wrote: > > > > On 2019-10-11, Michael Kerrisk <mtk.manpages@gmail.com> wrote: > > > Why CLONE3_CLEAR_SIGHAND rather than just CLONE_CLEAR_SIGHAND? I don't care much how we name this apart from the "_CLEAR_SIGHAND" suffix. But see for a little rationale below. > > > > There are no more flag bits left for the classic clone()/clone2() (the > > last one was used up by CLONE_PIDFD) -- thus this flag is clone3()-only. > > Yes, I understand that. But, I'm not sure that the "3" in the prefix > is necessary. "CLONE_" still seems better to me. > > Consider this: sometime in the near future we will probably have time > namespaces. The new flag for those namespaces will only be usable with > clone3(). It should NOT be called CLONE3_NEWTIME, but rather > CLONE_NEWTIME (or similar), because that same flag will presumably > also be used in other APIs such as unshare() and setns(). (Hmm -- I There are some noteable differences though. CLONE_NEWTIME takes the CSIGNAL bit which is in the range of a 32bit integer and thus useable by unshare() too. The same does not hold for CLONE{3}_CLEAR_SIGHAND. You can't pass it to unshare(). unshare() also just deals with namespace-relevant stuff so CLONE{3}_CLEAR_SIGHAND doesn't make much sense there. > wonder if we are going to need a new unshare2() or some such...) We still have one 32bit bit left (CLONE_DETACHED) which we can't reuse with clone()/clone2() but we can reuse with clone3(). We can simply earmark it for namespace-related stuff and thus still have one bit left for unshare() before we have to go for unshare2() (If we have to go there at all since I'm not sure how much more namespaces we can come up with.). Christian
On 10/12/19 9:48 AM, Christian Brauner wrote: > On Sat, Oct 12, 2019 at 08:53:34AM +0200, Michael Kerrisk (man-pages) wrote: >> Hello Aleksa, >> >> On Sat, 12 Oct 2019 at 00:12, Aleksa Sarai <cyphar@cyphar.com> wrote: >>> >>> On 2019-10-11, Michael Kerrisk <mtk.manpages@gmail.com> wrote: >>>> Why CLONE3_CLEAR_SIGHAND rather than just CLONE_CLEAR_SIGHAND? > > I don't care much how we name this apart from the "_CLEAR_SIGHAND" > suffix. But see for a little rationale below. > >>> >>> There are no more flag bits left for the classic clone()/clone2() (the >>> last one was used up by CLONE_PIDFD) -- thus this flag is clone3()-only. >> >> Yes, I understand that. But, I'm not sure that the "3" in the prefix >> is necessary. "CLONE_" still seems better to me. >> >> Consider this: sometime in the near future we will probably have time >> namespaces. The new flag for those namespaces will only be usable with >> clone3(). It should NOT be called CLONE3_NEWTIME, but rather >> CLONE_NEWTIME (or similar), because that same flag will presumably >> also be used in other APIs such as unshare() and setns(). (Hmm -- I > > There are some noteable differences though. CLONE_NEWTIME takes the > CSIGNAL bit which is in the range of a 32bit integer and thus useable by > unshare() too. The same does not hold for CLONE{3}_CLEAR_SIGHAND. You > can't pass it to unshare(). unshare() also just deals with > namespace-relevant stuff so CLONE{3}_CLEAR_SIGHAND doesn't make much > sense there. Sure, but going forward there's very likely to be more CLONE flags for whatever reason, and some will be usable just in clone3() while others will be more widely used (in other APIs such as unshare() and setns()). Using two different prefixes for these flags (CLONE_/CLONE3_) would be just confusing. AFAICS, the CLONE3_ prefix really provides no advantage, but does have the potential to cause confusion down the track for the aforementioned reasons. (Furthermore... Shudder! What if there's a clone4() one day. I know you might say: "won't happen, we got things right this time", but API history suggests that "right" now not infrequently becomes "oops" later.) I do recommend CLONE_ for all the flags... >> wonder if we are going to need a new unshare2() or some such...) > > We still have one 32bit bit left (CLONE_DETACHED) which we can't reuse > with clone()/clone2() but we can reuse with clone3(). We can simply > earmark it for namespace-related stuff and thus still have one bit left > for unshare() before we have to go for unshare2() (If we have to go > there at all since I'm not sure how much more namespaces we can come up > with.). I'm sure there'll be more namespaces... Cheers, Michael
On Sat, Oct 12, 2019 at 01:46:54PM +0200, Michael Kerrisk (man-pages) wrote: > On 10/12/19 9:48 AM, Christian Brauner wrote: > > On Sat, Oct 12, 2019 at 08:53:34AM +0200, Michael Kerrisk (man-pages) wrote: > >> Hello Aleksa, > >> > >> On Sat, 12 Oct 2019 at 00:12, Aleksa Sarai <cyphar@cyphar.com> wrote: > >>> > >>> On 2019-10-11, Michael Kerrisk <mtk.manpages@gmail.com> wrote: > >>>> Why CLONE3_CLEAR_SIGHAND rather than just CLONE_CLEAR_SIGHAND? > > > > I don't care much how we name this apart from the "_CLEAR_SIGHAND" > > suffix. But see for a little rationale below. > > > >>> > >>> There are no more flag bits left for the classic clone()/clone2() (the > >>> last one was used up by CLONE_PIDFD) -- thus this flag is clone3()-only. > >> > >> Yes, I understand that. But, I'm not sure that the "3" in the prefix > >> is necessary. "CLONE_" still seems better to me. > >> > >> Consider this: sometime in the near future we will probably have time > >> namespaces. The new flag for those namespaces will only be usable with > >> clone3(). It should NOT be called CLONE3_NEWTIME, but rather > >> CLONE_NEWTIME (or similar), because that same flag will presumably > >> also be used in other APIs such as unshare() and setns(). (Hmm -- I > > > > There are some noteable differences though. CLONE_NEWTIME takes the > > CSIGNAL bit which is in the range of a 32bit integer and thus useable by > > unshare() too. The same does not hold for CLONE{3}_CLEAR_SIGHAND. You > > can't pass it to unshare(). unshare() also just deals with > > namespace-relevant stuff so CLONE{3}_CLEAR_SIGHAND doesn't make much > > sense there. > > Sure, but going forward there's very likely to be more CLONE flags > for whatever reason, and some will be usable just in clone3() > while others will be more widely used (in other APIs such as > unshare() and setns()). Using two different prefixes for these > flags (CLONE_/CLONE3_) would be just confusing. AFAICS, the CLONE3_ I do agree with that part. And as I said in my previous mail, I don't care about the prefix. > prefix really provides no advantage, but does have the potential to > cause confusion down the track for the aforementioned reasons. > (Furthermore... Shudder! What if there's a clone4() one day. I > know you might say: "won't happen, we got things right this time", > but API history suggests that "right" now not infrequently becomes > "oops" later.) I do recommend CLONE_ for all the flags... I do love your trust in our ability to design syscalls (//Cc Aleksa ;)). :) > > >> wonder if we are going to need a new unshare2() or some such...) > > > > We still have one 32bit bit left (CLONE_DETACHED) which we can't reuse > > with clone()/clone2() but we can reuse with clone3(). We can simply > > earmark it for namespace-related stuff and thus still have one bit left > > for unshare() before we have to go for unshare2() (If we have to go > > there at all since I'm not sure how much more namespaces we can come up > > with.). > > I'm sure there'll be more namespaces... Let's hope not. :) Anyway, no real reason to do unshare2() any time soon. :) Christian
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 99335e1f4a27..c583720f689f 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -33,6 +33,9 @@ #define CLONE_NEWNET 0x40000000 /* New network namespace */ #define CLONE_IO 0x80000000 /* Clone io context */ +/* Flags for the clone3() syscall */ +#define CLONE3_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */ + #ifndef __ASSEMBLY__ /** * struct clone_args - arguments for the clone3 syscall diff --git a/kernel/fork.c b/kernel/fork.c index 1f6c45f6a734..661f8d1f3881 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1517,6 +1517,11 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk) spin_lock_irq(¤t->sighand->siglock); memcpy(sig->action, current->sighand->action, sizeof(sig->action)); spin_unlock_irq(¤t->sighand->siglock); + + /* Reset all signal handler not set to SIG_IGN to SIG_DFL. */ + if (clone_flags & CLONE3_CLEAR_SIGHAND) + flush_signal_handlers(tsk, 0); + return 0; } @@ -2567,7 +2572,7 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs) * All lower bits of the flag word are taken. * Verify that no other unknown flags are passed along. */ - if (kargs->flags & ~CLONE_LEGACY_FLAGS) + if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE3_CLEAR_SIGHAND)) return false; /* @@ -2577,6 +2582,10 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs) if (kargs->flags & (CLONE_DETACHED | CSIGNAL)) return false; + if ((kargs->flags & (CLONE_SIGHAND | CLONE3_CLEAR_SIGHAND)) == + (CLONE_SIGHAND | CLONE3_CLEAR_SIGHAND)) + return false; + if ((kargs->flags & (CLONE_THREAD | CLONE_PARENT)) && kargs->exit_signal) return false;