Message ID | 20210316065215.23768-6-chang.seok.bae@intel.com |
---|---|
State | Not applicable |
Headers |
Return-Path: <libc-alpha-bounces@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 811AF385481A; Tue, 16 Mar 2021 06:57:23 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 811AF385481A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1615877843; bh=DnweHu7lZvAHD3pL3n7cL1QPNQD0UZAUcVkO4W5jzT4=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=WMr7/OUp1PO1KJBDp6hS+HKhr6r5HRpyRYWsW9EmiKsbthgp99/9adH7uNAZkUfzu 2sh+7WMHcCom/5r2iNYN/6mZDhYqgSgGk3YYgFKFbNMOjGJ6nAHbT5CmuL0TgUT9XB GVg5VZcmMHRjISf62UYokdD9x/PKsVk2toxEgCcU= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by sourceware.org (Postfix) with ESMTPS id 483FF385BF99 for <libc-alpha@sourceware.org>; Tue, 16 Mar 2021 06:57:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 483FF385BF99 IronPort-SDR: radqvbw8OJQY4RJD6ZWAlLoa3vE/DO86UVgE/57T/qrp2W5SOe8l+Z0qa4bdstSeKx4iz2G4c4 4BrRBs2H+wOQ== X-IronPort-AV: E=McAfee;i="6000,8403,9924"; a="169126409" X-IronPort-AV: E=Sophos;i="5.81,251,1610438400"; d="scan'208";a="169126409" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Mar 2021 23:57:12 -0700 IronPort-SDR: NaWqtLNwjoxz1cY7Ox75cbdrrOG+hk9XyQPDsHE5yNF2A4zKN/T9br7P9Zt0SeASMrJCAAyT2F JcJ6ZcB9cqoA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.81,251,1610438400"; d="scan'208";a="511296086" Received: from chang-linux-3.sc.intel.com ([172.25.66.175]) by fmsmga001.fm.intel.com with ESMTP; 15 Mar 2021 23:57:12 -0700 To: bp@suse.de, tglx@linutronix.de, mingo@kernel.org, luto@kernel.org, x86@kernel.org Subject: [PATCH v7 5/6] x86/signal: Detect and prevent an alternate signal stack overflow Date: Mon, 15 Mar 2021 23:52:14 -0700 Message-Id: <20210316065215.23768-6-chang.seok.bae@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210316065215.23768-1-chang.seok.bae@intel.com> References: <20210316065215.23768-1-chang.seok.bae@intel.com> X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> From: "Chang S. Bae via Libc-alpha" <libc-alpha@sourceware.org> Reply-To: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: linux-arch@vger.kernel.org, len.brown@intel.com, tony.luck@intel.com, libc-alpha@sourceware.org, ravi.v.shankar@intel.com, chang.seok.bae@intel.com, jannh@google.com, linux-kernel@vger.kernel.org, dave.hansen@intel.com, linux-api@vger.kernel.org, Dave.Martin@arm.com Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org> |
Series |
x86: Improve Minimum Alternate Stack Size
|
|
Commit Message
Chang S. Bae
March 16, 2021, 6:52 a.m. UTC
The kernel pushes context on to the userspace stack to prepare for the user's signal handler. When the user has supplied an alternate signal stack, via sigaltstack(2), it is easy for the kernel to verify that the stack size is sufficient for the current hardware context. Check if writing the hardware context to the alternate stack will exceed it's size. If yes, then instead of corrupting user-data and proceeding with the original signal handler, an immediate SIGSEGV signal is delivered. Instead of calling on_sig_stack(), directly check the new stack pointer whether in the bounds. While the kernel allows new source code to discover and use a sufficient alternate signal stack size, this check is still necessary to protect binaries with insufficient alternate signal stack size from data corruption. Suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Reviewed-by: Len Brown <len.brown@intel.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org --- Changes from v5: * Fixed the overflow check. (Andy Lutomirski) * Updated the changelog. Changes from v3: * Updated the changelog (Borislav Petkov) Changes from v2: * Simplified the implementation (Jann Horn) --- arch/x86/kernel/signal.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
Comments
On Mon, Mar 15, 2021 at 11:52:14PM -0700, Chang S. Bae wrote: > @@ -272,7 +275,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, > * If we are on the alternate signal stack and would overflow it, don't. > * Return an always-bogus address instead so we will die with SIGSEGV. > */ > - if (onsigstack && !likely(on_sig_stack(sp))) > + if (onsigstack && unlikely(sp <= current->sas_ss_sp || > + sp - current->sas_ss_sp > current->sas_ss_size)) > return (void __user *)-1L; So clearly I'm missing something because trying to trigger the test case in the bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=153531 on current tip/master doesn't work. Runs with MY_MINSIGSTKSZ under 2048 fail with: tst-minsigstksz-2: sigaltstack: Cannot allocate memory and above 2048 don't overwrite bytes below the stack. So something else is missing. How did you test this patch? Thx.
On Mar 16, 2021, at 04:52, Borislav Petkov <bp@suse.de> wrote: > On Mon, Mar 15, 2021 at 11:52:14PM -0700, Chang S. Bae wrote: >> @@ -272,7 +275,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, >> * If we are on the alternate signal stack and would overflow it, don't. >> * Return an always-bogus address instead so we will die with SIGSEGV. >> */ >> - if (onsigstack && !likely(on_sig_stack(sp))) >> + if (onsigstack && unlikely(sp <= current->sas_ss_sp || >> + sp - current->sas_ss_sp > current->sas_ss_size)) >> return (void __user *)-1L; > > So clearly I'm missing something because trying to trigger the test case > in the bugzilla: > > https://bugzilla.kernel.org/show_bug.cgi?id=153531 > > on current tip/master doesn't work. Runs with MY_MINSIGSTKSZ under 2048 > fail with: > > tst-minsigstksz-2: sigaltstack: Cannot allocate memory > > and above 2048 don't overwrite bytes below the stack. > > So something else is missing. How did you test this patch? I suspect the AVX-512 states not enabled there. When I ran it under a machine without AVX-512 like this, it didn’t show the overwrite message: $ cat /proc/cpuinfo | grep -m1 "model name” model name : Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz $ sudo dmesg | grep "Enabled xstate” [ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using ‘compacted’ format. $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=2047 $ ./a.out a.out: sigaltstack: Cannot allocate memory $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=2048 $ ./a.out When do it again with AVX-512, it did show the message: $ cat /proc/cpuinfo | grep -m1 "model name” model name : Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz $ sudo dmesg | grep "Enabled xstate” [ 0.000000] x86/fpu: Enabled xstate features 0xff, context size is 2560 bytes, using 'compacted' format. $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=2048 $ ./a.out a.out: changed byte 1412 bytes below configured stack $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=3490 $ ./a.out a.out: changed byte 21 bytes below configured stack $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=3491 $ ./a.out Also, on the second machine, without this patch: $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=3191 $ ./a.out a.out: changed byte 319 bytes below configured stack But with this patch, it gave segfault with a too-small size: $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=3191 $ ./a.out Segmentation fault (core dumped) Thanks, Chang
On Tue, Mar 16, 2021 at 06:26:46PM +0000, Bae, Chang Seok wrote:
> I suspect the AVX-512 states not enabled there.
Ok, I found a machine which has AVX-512:
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[5]: 832, xstate_sizes[5]: 64
[ 0.000000] x86/fpu: xstate_offset[6]: 896, xstate_sizes[6]: 512
[ 0.000000] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
[ 0.000000] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]: 8
[ 0.000000] x86/fpu: Enabled xstate features 0x2e7, context size is 2440 bytes, using 'compacted' format.
and applied your patch and added a debug printk, see end of mail.
Then, I ran the test case:
$ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=3453 -o tst-minsigstksz-2
$ ./tst-minsigstksz-2
tst-minsigstksz-2: changed byte 50 bytes below configured stack
Whoops.
And the debug print said:
[ 5395.252884] signal: get_sigframe: sp: 0x7f54ec39e7b8, sas_ss_sp: 0x7f54ec39e6ce, sas_ss_size 0xd7d
which tells me that, AFAICT, your check whether we have enough alt stack
doesn't seem to work in this case.
Thx.
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index a06cb107c0e8..a7396f7c3832 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -237,7 +237,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
unsigned long math_size = 0;
unsigned long sp = regs->sp;
unsigned long buf_fx = 0;
- int onsigstack = on_sig_stack(sp);
+ bool onsigstack = on_sig_stack(sp);
int ret;
/* redzone */
@@ -246,8 +246,11 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
/* This is the X/Open sanctioned signal stack switching. */
if (ka->sa.sa_flags & SA_ONSTACK) {
- if (sas_ss_flags(sp) == 0)
+ if (sas_ss_flags(sp) == 0) {
sp = current->sas_ss_sp + current->sas_ss_size;
+ /* On the alternate signal stack */
+ onsigstack = true;
+ }
} else if (IS_ENABLED(CONFIG_X86_32) &&
!onsigstack &&
regs->ss != __USER_DS &&
@@ -263,11 +266,16 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
sp = align_sigframe(sp - frame_size);
+ if (onsigstack)
+ pr_info("%s: sp: 0x%lx, sas_ss_sp: 0x%lx, sas_ss_size 0x%lx\n",
+ __func__, sp, current->sas_ss_sp, current->sas_ss_size);
+
/*
* If we are on the alternate signal stack and would overflow it, don't.
* Return an always-bogus address instead so we will die with SIGSEGV.
*/
- if (onsigstack && !likely(on_sig_stack(sp)))
+ if (onsigstack && unlikely(sp <= current->sas_ss_sp ||
+ sp - current->sas_ss_sp > current->sas_ss_size))
return (void __user *)-1L;
/* save i387 and extended state */
On Mar 25, 2021, at 09:20, Borislav Petkov <bp@suse.de> wrote: > > $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=3453 -o tst-minsigstksz-2 > $ ./tst-minsigstksz-2 > tst-minsigstksz-2: changed byte 50 bytes below configured stack > > Whoops. > > And the debug print said: > > [ 5395.252884] signal: get_sigframe: sp: 0x7f54ec39e7b8, sas_ss_sp: 0x7f54ec39e6ce, sas_ss_size 0xd7d > > which tells me that, AFAICT, your check whether we have enough alt stack > doesn't seem to work in this case. Yes, in this case. tst-minsigstksz-2.c has this code: static void handler (int signo) { /* Clear a bit of on-stack memory. */ volatile char buffer[256]; for (size_t i = 0; i < sizeof (buffer); ++i) buffer[i] = 0; handler_run = 1; } … if (handler_run != 1) errx (1, "handler did not run"); for (void *p = stack_buffer; p < stack_bottom; ++p) if (*(unsigned char *) p != 0xCC) errx (1, "changed byte %zd bytes below configured stack\n", stack_bottom - p); … I think the message comes from the handler’s overwriting, not from the kernel. The patch's check is to detect and prevent the kernel-induced overflow -- whether alt stack enough for signal delivery itself. The stack is possibly not enough for the signal handler's use as the kernel does not know for it. Thanks, Chang
On Mon, Mar 15, 2021 at 11:57 PM Chang S. Bae <chang.seok.bae@intel.com> wrote: > > The kernel pushes context on to the userspace stack to prepare for the > user's signal handler. When the user has supplied an alternate signal > stack, via sigaltstack(2), it is easy for the kernel to verify that the > stack size is sufficient for the current hardware context. > > Check if writing the hardware context to the alternate stack will exceed > it's size. If yes, then instead of corrupting user-data and proceeding with > the original signal handler, an immediate SIGSEGV signal is delivered. > > Instead of calling on_sig_stack(), directly check the new stack pointer > whether in the bounds. > > While the kernel allows new source code to discover and use a sufficient > alternate signal stack size, this check is still necessary to protect > binaries with insufficient alternate signal stack size from data > corruption. This patch results in excessively complicated control and data flow. > - int onsigstack = on_sig_stack(sp); > + bool onsigstack = on_sig_stack(sp); Here onsigstack means "we were already using the altstack". > int ret; > > /* redzone */ > @@ -251,8 +251,11 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, > > /* This is the X/Open sanctioned signal stack switching. */ > if (ka->sa.sa_flags & SA_ONSTACK) { > - if (sas_ss_flags(sp) == 0) > + if (sas_ss_flags(sp) == 0) { > sp = current->sas_ss_sp + current->sas_ss_size; > + /* On the alternate signal stack */ > + onsigstack = true; > + } But now onsigstack is also true if we are using the legacy path to *enter* the altstack. So now it's (was on altstack) || (entering altstack via legacy path). > } else if (IS_ENABLED(CONFIG_X86_32) && > !onsigstack && > regs->ss != __USER_DS && > @@ -272,7 +275,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, > * If we are on the alternate signal stack and would overflow it, don't. > * Return an always-bogus address instead so we will die with SIGSEGV. > */ > - if (onsigstack && !likely(on_sig_stack(sp))) > + if (onsigstack && unlikely(sp <= current->sas_ss_sp || > + sp - current->sas_ss_sp > current->sas_ss_size)) And now we fail if ((was on altstack) || (entering altstack via legacy path)) && (new sp is out of bounds). The condition we actually want is that, if we are entering the altstack and we don't fit, we should fail. This is tricky because of the autodisarm stuff and the possibility of nonlinear stack segments, so it's not even clear to me exactly what we should be doing. I propose: > return (void __user *)-1L; Can we please log something (if (show_unhandled_signals || printk_ratelimit()) that says that we overflowed the altstack? How about: diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index ea794a083c44..53781324a2d3 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -237,7 +237,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, unsigned long math_size = 0; unsigned long sp = regs->sp; unsigned long buf_fx = 0; - int onsigstack = on_sig_stack(sp); + bool already_onsigstack = on_sig_stack(sp); + bool entering_altstack = false; int ret; /* redzone */ @@ -246,15 +247,25 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, /* This is the X/Open sanctioned signal stack switching. */ if (ka->sa.sa_flags & SA_ONSTACK) { - if (sas_ss_flags(sp) == 0) + /* + * This checks already_onsigstack via sas_ss_flags(). + * Sensible programs use SS_AUTODISARM, which disables + * that check, and programs that don't use + * SS_AUTODISARM get compatible but potentially + * bizarre behavior. + */ + if (sas_ss_flags(sp) == 0) { sp = current->sas_ss_sp + current->sas_ss_size; + entering_altstack = true; + } } else if (IS_ENABLED(CONFIG_X86_32) && - !onsigstack && + !already_onsigstack && regs->ss != __USER_DS && !(ka->sa.sa_flags & SA_RESTORER) && ka->sa.sa_restorer) { /* This is the legacy signal stack switching. */ sp = (unsigned long) ka->sa.sa_restorer; + entering_altstack = true; } sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32), @@ -267,8 +278,16 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, * If we are on the alternate signal stack and would overflow it, don't. * Return an always-bogus address instead so we will die with SIGSEGV. */ - if (onsigstack && !likely(on_sig_stack(sp))) + if (unlikely(entering_altstack && + (sp <= current->sas_ss_sp || + sp - current->sas_ss_sp > current->sas_ss_size))) { + if (show_unhandled_signals && printk_ratelimit()) { + pr_info("%s[%d] overflowed sigaltstack", + tsk->comm, task_pid_nr(tsk)); + } + return (void __user *)-1L; + } /* save i387 and extended state */ ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size); Apologies for whitespace damage. I attached it, too.
On Thu, Mar 25, 2021 at 11:13:12AM -0700, Andy Lutomirski wrote: > diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c > index ea794a083c44..53781324a2d3 100644 > --- a/arch/x86/kernel/signal.c > +++ b/arch/x86/kernel/signal.c > @@ -237,7 +237,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, > unsigned long math_size = 0; > unsigned long sp = regs->sp; > unsigned long buf_fx = 0; > - int onsigstack = on_sig_stack(sp); > + bool already_onsigstack = on_sig_stack(sp); > + bool entering_altstack = false; > int ret; > > /* redzone */ > @@ -246,15 +247,25 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, > > /* This is the X/Open sanctioned signal stack switching. */ > if (ka->sa.sa_flags & SA_ONSTACK) { > - if (sas_ss_flags(sp) == 0) > + /* > + * This checks already_onsigstack via sas_ss_flags(). > + * Sensible programs use SS_AUTODISARM, which disables > + * that check, and programs that don't use > + * SS_AUTODISARM get compatible but potentially > + * bizarre behavior. > + */ > + if (sas_ss_flags(sp) == 0) { > sp = current->sas_ss_sp + current->sas_ss_size; > + entering_altstack = true; > + } > } else if (IS_ENABLED(CONFIG_X86_32) && > - !onsigstack && > + !already_onsigstack && > regs->ss != __USER_DS && > !(ka->sa.sa_flags & SA_RESTORER) && > ka->sa.sa_restorer) { > /* This is the legacy signal stack switching. */ > sp = (unsigned long) ka->sa.sa_restorer; > + entering_altstack = true; > } What a mess this whole signal handling is. I need a course in signal handling to understand what's going on here... > > sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32), > @@ -267,8 +278,16 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, > * If we are on the alternate signal stack and would overflow it, don't. > * Return an always-bogus address instead so we will die with SIGSEGV. > */ > - if (onsigstack && !likely(on_sig_stack(sp))) > + if (unlikely(entering_altstack && > + (sp <= current->sas_ss_sp || > + sp - current->sas_ss_sp > current->sas_ss_size))) { You could've simply done if (unlikely(entering_altstack && !on_sig_stack(sp))) here. > + if (show_unhandled_signals && printk_ratelimit()) { > + pr_info("%s[%d] overflowed sigaltstack", > + tsk->comm, task_pid_nr(tsk)); > + } Why do you even wanna issue that? It looks like callers will propagate an error value up and people don't look at dmesg all the time. Btw, s/tsk/current/g IOW, this builds: --- diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index a06cb107c0e8..c00e932b5f18 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -234,10 +234,11 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, void __user **fpstate) { /* Default to using normal stack */ + bool already_onsigstack = on_sig_stack(regs->sp); + bool entering_altstack = false; unsigned long math_size = 0; unsigned long sp = regs->sp; unsigned long buf_fx = 0; - int onsigstack = on_sig_stack(sp); int ret; /* redzone */ @@ -246,15 +247,24 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, /* This is the X/Open sanctioned signal stack switching. */ if (ka->sa.sa_flags & SA_ONSTACK) { - if (sas_ss_flags(sp) == 0) + /* + * This checks already_onsigstack via sas_ss_flags(). Sensible + * programs use SS_AUTODISARM, which disables that check, and + * programs that don't use SS_AUTODISARM get compatible but + * potentially bizarre behavior. + */ + if (sas_ss_flags(sp) == 0) { sp = current->sas_ss_sp + current->sas_ss_size; + entering_altstack = true; + } } else if (IS_ENABLED(CONFIG_X86_32) && - !onsigstack && + !already_onsigstack && regs->ss != __USER_DS && !(ka->sa.sa_flags & SA_RESTORER) && ka->sa.sa_restorer) { /* This is the legacy signal stack switching. */ sp = (unsigned long) ka->sa.sa_restorer; + entering_altstack = true; } sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32), @@ -267,8 +277,14 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, * If we are on the alternate signal stack and would overflow it, don't. * Return an always-bogus address instead so we will die with SIGSEGV. */ - if (onsigstack && !likely(on_sig_stack(sp))) + if (unlikely(entering_altstack && !on_sig_stack(sp))) { + + if (show_unhandled_signals && printk_ratelimit()) + pr_info("%s[%d] overflowed sigaltstack", + current->comm, task_pid_nr(current)); + return (void __user *)-1L; + } /* save i387 and extended state */ ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size);
* Chang Seok via Libc-alpha Bae: > On Mar 25, 2021, at 09:20, Borislav Petkov <bp@suse.de> wrote: >> >> $ gcc tst-minsigstksz-2.c -DMY_MINSIGSTKSZ=3453 -o tst-minsigstksz-2 >> $ ./tst-minsigstksz-2 >> tst-minsigstksz-2: changed byte 50 bytes below configured stack >> >> Whoops. >> >> And the debug print said: >> >> [ 5395.252884] signal: get_sigframe: sp: 0x7f54ec39e7b8, sas_ss_sp: 0x7f54ec39e6ce, sas_ss_size 0xd7d >> >> which tells me that, AFAICT, your check whether we have enough alt stack >> doesn't seem to work in this case. > > Yes, in this case. > > tst-minsigstksz-2.c has this code: > > static void > handler (int signo) > { > /* Clear a bit of on-stack memory. */ > volatile char buffer[256]; > for (size_t i = 0; i < sizeof (buffer); ++i) > buffer[i] = 0; > handler_run = 1; > } > … > > if (handler_run != 1) > errx (1, "handler did not run"); > > for (void *p = stack_buffer; p < stack_bottom; ++p) > if (*(unsigned char *) p != 0xCC) > errx (1, "changed byte %zd bytes below configured stack\n", > stack_bottom - p); > … > > I think the message comes from the handler’s overwriting, not from the kernel. > > The patch's check is to detect and prevent the kernel-induced overflow -- > whether alt stack enough for signal delivery itself. The stack is possibly > not enough for the signal handler's use as the kernel does not know for it. Ahh, right. When I wrote the test, I didn't know which turn the kernel would eventually take, so the test is quite arbitrary. The glibc dynamic loader uses XSAVE/XSAVEC as well, so you can probably double the practical stack requirement if lazy binding is in use and can be triggered from the signal handler. Estimating stack sizes is hard.
On Mar 25, 2021, at 11:54, Borislav Petkov <bp@suse.de> wrote: > On Thu, Mar 25, 2021 at 11:13:12AM -0700, Andy Lutomirski wrote: >> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c >> index ea794a083c44..53781324a2d3 100644 >> --- a/arch/x86/kernel/signal.c >> +++ b/arch/x86/kernel/signal.c >> @@ -237,7 +237,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, >> unsigned long math_size = 0; >> unsigned long sp = regs->sp; >> unsigned long buf_fx = 0; >> - int onsigstack = on_sig_stack(sp); >> + bool already_onsigstack = on_sig_stack(sp); >> + bool entering_altstack = false; >> int ret; >> >> /* redzone */ >> @@ -246,15 +247,25 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, >> >> /* This is the X/Open sanctioned signal stack switching. */ >> if (ka->sa.sa_flags & SA_ONSTACK) { >> - if (sas_ss_flags(sp) == 0) >> + /* >> + * This checks already_onsigstack via sas_ss_flags(). >> + * Sensible programs use SS_AUTODISARM, which disables >> + * that check, and programs that don't use >> + * SS_AUTODISARM get compatible but potentially >> + * bizarre behavior. >> + */ >> + if (sas_ss_flags(sp) == 0) { >> sp = current->sas_ss_sp + current->sas_ss_size; >> + entering_altstack = true; >> + } >> } else if (IS_ENABLED(CONFIG_X86_32) && >> - !onsigstack && >> + !already_onsigstack && >> regs->ss != __USER_DS && >> !(ka->sa.sa_flags & SA_RESTORER) && >> ka->sa.sa_restorer) { >> /* This is the legacy signal stack switching. */ >> sp = (unsigned long) ka->sa.sa_restorer; >> + entering_altstack = true; >> } > > What a mess this whole signal handling is. I need a course in signal > handling to understand what's going on here... > >> >> sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32), >> @@ -267,8 +278,16 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, >> * If we are on the alternate signal stack and would overflow it, don't. >> * Return an always-bogus address instead so we will die with SIGSEGV. >> */ >> - if (onsigstack && !likely(on_sig_stack(sp))) >> + if (unlikely(entering_altstack && >> + (sp <= current->sas_ss_sp || >> + sp - current->sas_ss_sp > current->sas_ss_size))) { > > You could've simply done > > if (unlikely(entering_altstack && !on_sig_stack(sp))) > > here. But if sigaltstack()’ed with the SS_AUTODISARM flag, both on_sig_stack() and sas_ss_flags() return 0 [1]. Then, segfault always here. v5 had the exact issue before [2]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/sched/signal.h#n576 [2] https://lore.kernel.org/lkml/CALCETrXuFrHUU-L=HMofTgEDZk9muPnVtK=EjsTHqQ01XhbRYg@mail.gmail.com/ Thanks, Chang
On Thu, Mar 25, 2021 at 09:11:56PM +0000, Bae, Chang Seok wrote: > But if sigaltstack()’ed with the SS_AUTODISARM flag, both on_sig_stack() and > sas_ss_flags() return 0 [1]. Then, segfault always here. v5 had the exact > issue before [2]. Ah, there's that SS_AUTODISARM check above it which I missed, sorry. I guess we can do a __on_sig_stack() helper or so which does the stack check only without the SS_AUTODISARM. Just for readability's sake in what is already a pretty messy function. Thx.
On Thu, Mar 25, 2021 at 11:54 AM Borislav Petkov <bp@suse.de> wrote: > > On Thu, Mar 25, 2021 at 11:13:12AM -0700, Andy Lutomirski wrote: > > diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c > > index ea794a083c44..53781324a2d3 100644 > > --- a/arch/x86/kernel/signal.c > > +++ b/arch/x86/kernel/signal.c > > @@ -237,7 +237,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, > > unsigned long math_size = 0; > > unsigned long sp = regs->sp; > > unsigned long buf_fx = 0; > > - int onsigstack = on_sig_stack(sp); > > + bool already_onsigstack = on_sig_stack(sp); > > + bool entering_altstack = false; > > int ret; > > > > /* redzone */ > > @@ -246,15 +247,25 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, > > > > /* This is the X/Open sanctioned signal stack switching. */ > > if (ka->sa.sa_flags & SA_ONSTACK) { > > - if (sas_ss_flags(sp) == 0) > > + /* > > + * This checks already_onsigstack via sas_ss_flags(). > > + * Sensible programs use SS_AUTODISARM, which disables > > + * that check, and programs that don't use > > + * SS_AUTODISARM get compatible but potentially > > + * bizarre behavior. > > + */ > > + if (sas_ss_flags(sp) == 0) { > > sp = current->sas_ss_sp + current->sas_ss_size; > > + entering_altstack = true; > > + } > > } else if (IS_ENABLED(CONFIG_X86_32) && > > - !onsigstack && > > + !already_onsigstack && > > regs->ss != __USER_DS && > > !(ka->sa.sa_flags & SA_RESTORER) && > > ka->sa.sa_restorer) { > > /* This is the legacy signal stack switching. */ > > sp = (unsigned long) ka->sa.sa_restorer; > > + entering_altstack = true; > > } > > What a mess this whole signal handling is. I need a course in signal > handling to understand what's going on here... > > > > > sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32), > > @@ -267,8 +278,16 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, > > * If we are on the alternate signal stack and would overflow it, don't. > > * Return an always-bogus address instead so we will die with SIGSEGV. > > */ > > - if (onsigstack && !likely(on_sig_stack(sp))) > > + if (unlikely(entering_altstack && > > + (sp <= current->sas_ss_sp || > > + sp - current->sas_ss_sp > current->sas_ss_size))) { > > You could've simply done > > if (unlikely(entering_altstack && !on_sig_stack(sp))) > > here. Nope. on_sig_stack() is a horrible kludge and won't work here. We could have something like __on_sig_stack() or sp_is_on_sig_stack() or something, though. > > > > + if (show_unhandled_signals && printk_ratelimit()) { > > + pr_info("%s[%d] overflowed sigaltstack", > > + tsk->comm, task_pid_nr(tsk)); > > + } > > Why do you even wanna issue that? It looks like callers will propagate > an error value up and people don't look at dmesg all the time. I figure that the people whose programs spontaneously crash should get a hint why if they look at dmesg. Maybe the message should say "overflowed sigaltstack -- try noavx512"? We really ought to have a SIGSIGFAIL signal that's sent, double-fault style, when we fail to send a signal.
I forgot to mention why I cc'd all you fine Xen folk: On Thu, Mar 25, 2021 at 11:13 AM Andy Lutomirski <luto@kernel.org> wrote: > > > } else if (IS_ENABLED(CONFIG_X86_32) && > > !onsigstack && > > regs->ss != __USER_DS && This bit here seems really dubious on Xen PV. Honestly it seems dubious everywhere, but especially on Xen PV. --Andy
On Thu, Mar 25, 2021 at 09:56:53PM -0700, Andy Lutomirski wrote: > Nope. on_sig_stack() is a horrible kludge and won't work here. We > could have something like __on_sig_stack() or sp_is_on_sig_stack() or > something, though. Yeah, see my other reply. Ack to either of those carved out helpers. > I figure that the people whose programs spontaneously crash should get > a hint why if they look at dmesg. Maybe the message should say > "overflowed sigaltstack -- try noavx512"? I guess, as long as it is ratelimited. I mean, we can remove it later if it starts gettin' annoying. > We really ought to have a SIGSIGFAIL signal that's sent, double-fault > style, when we fail to send a signal. Yeap, we should be able to tell userspace that we couldn't send a signal, hohumm. Thx.
On Mar 26, 2021, at 03:30, Borislav Petkov <bp@alien8.de> wrote: > On Thu, Mar 25, 2021 at 09:56:53PM -0700, Andy Lutomirski wrote: >> We really ought to have a SIGSIGFAIL signal that's sent, double-fault >> style, when we fail to send a signal. > > Yeap, we should be able to tell userspace that we couldn't send a > signal, hohumm. Hi Boris, Let me clarify some details as preparing to include this in a revision. So, IIUC, a number needs to be assigned for this new SIGFAIL. At a glance, not sure which one to pick there in signal.h -- 1-31 fully occupied and the rest for 33 different real-time signals. Also, perhaps, force_sig(SIGFAIL) here, instead of return -1 -- to die with SIGSEGV. Thanks, Chang
On Mon, Apr 12, 2021 at 10:30:23PM +0000, Bae, Chang Seok wrote: > On Mar 26, 2021, at 03:30, Borislav Petkov <bp@alien8.de> wrote: > > On Thu, Mar 25, 2021 at 09:56:53PM -0700, Andy Lutomirski wrote: > >> We really ought to have a SIGSIGFAIL signal that's sent, double-fault > >> style, when we fail to send a signal. > > > > Yeap, we should be able to tell userspace that we couldn't send a > > signal, hohumm. > > Hi Boris, > > Let me clarify some details as preparing to include this in a revision. > > So, IIUC, a number needs to be assigned for this new SIGFAIL. At a glance, not > sure which one to pick there in signal.h -- 1-31 fully occupied and the rest > for 33 different real-time signals. > > Also, perhaps, force_sig(SIGFAIL) here, instead of return -1 -- to die with > SIGSEGV. I think this needs to be decided together with userspace people so that they can act accordingly and whether it even makes sense to them. Florian, any suggestions? Subthread starts here: https://lkml.kernel.org/r/CALCETrXQZuvJQrHDMst6PPgtJxaS_sPk2JhwMiMDNPunq45YFg@mail.gmail.com Thx.
* Borislav Petkov: > On Mon, Apr 12, 2021 at 10:30:23PM +0000, Bae, Chang Seok wrote: >> On Mar 26, 2021, at 03:30, Borislav Petkov <bp@alien8.de> wrote: >> > On Thu, Mar 25, 2021 at 09:56:53PM -0700, Andy Lutomirski wrote: >> >> We really ought to have a SIGSIGFAIL signal that's sent, double-fault >> >> style, when we fail to send a signal. >> > >> > Yeap, we should be able to tell userspace that we couldn't send a >> > signal, hohumm. >> >> Hi Boris, >> >> Let me clarify some details as preparing to include this in a revision. >> >> So, IIUC, a number needs to be assigned for this new SIGFAIL. At a glance, not >> sure which one to pick there in signal.h -- 1-31 fully occupied and the rest >> for 33 different real-time signals. >> >> Also, perhaps, force_sig(SIGFAIL) here, instead of return -1 -- to die with >> SIGSEGV. > > I think this needs to be decided together with userspace people so that > they can act accordingly and whether it even makes sense to them. > > Florian, any suggestions? Is this discussion about better behavior (at least diagnostics) for existing applications, without any code changes? Or an alternative programming model? Does noavx512 acutally reduce the XSAVE size to AVX2 levels? Or would you need noxsave? One possibility is that the sigaltstack size check prevents application from running which work just fine today because all they do is install a stack overflow handler, and stack overflow does not actually happen. So if sigaltstack fails and the application checks the result of the system call, it probably won't run at all. Shifting the diagnostic to the pointer where the signal would have to be delivered is perhaps the only thing that can be done. As for SIGFAIL in particular, I don't think there are any leftover signal numbers. It would need a prctl to assign the signal number, and I'm not sure if there is a useful programming model because signals do not really compose well even today. SIGFAIL adds another point where libraries need to collaborate, and we do not have a mechanism for that. (This is about what Rich Felker termed “library-safe code”, proper maintenance of process-wide resources such as the current directory.) Thanks, Florian
On Wed, Apr 14, 2021 at 01:30:43PM +0200, Florian Weimer wrote: > Is this discussion about better behavior (at least diagnostics) for > existing applications, without any code changes? Or an alternative > programming model? Former. > Does noavx512 acutally reduce the XSAVE size to AVX2 levels? Yeah. > Or would you need noxsave? I don't think so. > One possibility is that the sigaltstack size check prevents application > from running which work just fine today because all they do is install a > stack overflow handler, and stack overflow does not actually happen. So sigaltstack(2) says in the NOTES: Functions called from a signal handler executing on an alternate signal stack will also use the alternate signal stack. (This also applies to any handlers invoked for other signals while the process is executing on the alternate signal stack.) Unlike the standard stack, the system does not automatically extend the alternate signal stack. Exceeding the allocated size of the alternate signal stack will lead to unpredictable results. > So if sigaltstack fails and the application checks the result of the > system call, it probably won't run at all. Shifting the diagnostic to > the pointer where the signal would have to be delivered is perhaps the > only thing that can be done. So using the example from the same manpage: The most common usage of an alternate signal stack is to handle the SIGSEGV sig‐ nal that is generated if the space available for the normal process stack is ex‐ hausted: in this case, a signal handler for SIGSEGV cannot be invoked on the process stack; if we wish to handle it, we must use an alternate signal stack. and considering these "unpredictable results" would it make sense or even be at all possible to return SIGFAIL from that SIGSEGV signal handler which should run on the sigaltstack but that sigaltstack overflows? I think we wanna be able to tell the process through that previously registered SIGSEGV handler which is supposed to run on the sigaltstack, that that stack got overflowed. Or is this use case obsolete and this is not what people do at all? > As for SIGFAIL in particular, I don't think there are any leftover > signal numbers. It would need a prctl to assign the signal number, and > I'm not sure if there is a useful programming model because signals do > not really compose well even today. SIGFAIL adds another point where > libraries need to collaborate, and we do not have a mechanism for that. > (This is about what Rich Felker termed “library-safe code”, proper > maintenance of process-wide resources such as the current directory.) Oh fun. I guess if Linux goes and does something, people would adopt it and it'll become standard. :-P Thx.
* Borislav Petkov: >> One possibility is that the sigaltstack size check prevents application >> from running which work just fine today because all they do is install a >> stack overflow handler, and stack overflow does not actually happen. > > So sigaltstack(2) says in the NOTES: > > Functions called from a signal handler executing on an alternate signal stack > will also use the alternate signal stack. (This also applies to any handlers > invoked for other signals while the process is executing on the alternate signal > stack.) Unlike the standard stack, the system does not automatically extend the > alternate signal stack. Exceeding the allocated size of the alternate signal > stack will lead to unpredictable results. > >> So if sigaltstack fails and the application checks the result of the >> system call, it probably won't run at all. Shifting the diagnostic to >> the pointer where the signal would have to be delivered is perhaps the >> only thing that can be done. > > So using the example from the same manpage: > > The most common usage of an alternate signal stack is to handle the SIGSEGV sig‐ > nal that is generated if the space available for the normal process stack is ex‐ > hausted: in this case, a signal handler for SIGSEGV cannot be invoked on the > process stack; if we wish to handle it, we must use an alternate signal stack. > > and considering these "unpredictable results" would it make sense or > even be at all possible to return SIGFAIL from that SIGSEGV signal > handler which should run on the sigaltstack but that sigaltstack > overflows? > > I think we wanna be able to tell the process through that previously > registered SIGSEGV handler which is supposed to run on the sigaltstack, > that that stack got overflowed. Just to be clear, I'm worried about the case where an application installs a stack overflow handler, but stack overflow does not regularly happen at run time. GNU m4 is an example. Today, for most m4 scripts, it's totally fine to have an alternative signal stack which is too small. If the kernel returned an error for the sigaltstack call, m4 wouldn't start anymore, independently of the script. Which is worse than memory corruption with some scripts, I think. > Or is this use case obsolete and this is not what people do at all? It's widely used in currently-maintained software. It's the only way to recover from stack overflows without boundary checks on every function call. Does the alternative signal stack actually have to contain the siginfo_t data? I don't think it has to be contiguous. Maybe the kernel could allocate and map something behind the processes back if the sigaltstack region is too small? And for the stack overflow handler, the kernel could treat SIGSEGV with a sigaltstack region that is too small like the SIG_DFL handler. This would make m4 work again. Thanks, Florian
On Mon, May 03, 2021 at 07:30:21AM +0200, Florian Weimer wrote: > Just to be clear, I'm worried about the case where an application > installs a stack overflow handler, but stack overflow does not regularly > happen at run time. GNU m4 is an example. Today, for most m4 scripts, > it's totally fine to have an alternative signal stack which is too > small. If the kernel returned an error for the sigaltstack call, m4 > wouldn't start anymore, independently of the script. Which is worse > than memory corruption with some scripts, I think. Oh lovely. > > > Or is this use case obsolete and this is not what people do at all? > > It's widely used in currently-maintained software. It's the only way to > recover from stack overflows without boundary checks on every function > call. > > Does the alternative signal stack actually have to contain the siginfo_t > data? I don't think it has to be contiguous. Maybe the kernel could > allocate and map something behind the processes back if the sigaltstack > region is too small? So there's an attempt floating around to address this: https://lkml.kernel.org/r/20210422044856.27250-1-chang.seok.bae@intel.com esp patch 3. I'd appreciate having a look and sanity-checking this whether it makes sense and could be useful this way... > And for the stack overflow handler, the kernel could treat SIGSEGV with > a sigaltstack region that is too small like the SIG_DFL handler. This > would make m4 work again. /me searches a bit about SIG_DFL... Do you mean that the default action in this case should be what SIGSEGV does by default - to dump core? Thx.
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 0d24f64d0145..9a62604fbf63 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -242,7 +242,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, unsigned long math_size = 0; unsigned long sp = regs->sp; unsigned long buf_fx = 0; - int onsigstack = on_sig_stack(sp); + bool onsigstack = on_sig_stack(sp); int ret; /* redzone */ @@ -251,8 +251,11 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, /* This is the X/Open sanctioned signal stack switching. */ if (ka->sa.sa_flags & SA_ONSTACK) { - if (sas_ss_flags(sp) == 0) + if (sas_ss_flags(sp) == 0) { sp = current->sas_ss_sp + current->sas_ss_size; + /* On the alternate signal stack */ + onsigstack = true; + } } else if (IS_ENABLED(CONFIG_X86_32) && !onsigstack && regs->ss != __USER_DS && @@ -272,7 +275,8 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, * If we are on the alternate signal stack and would overflow it, don't. * Return an always-bogus address instead so we will die with SIGSEGV. */ - if (onsigstack && !likely(on_sig_stack(sp))) + if (onsigstack && unlikely(sp <= current->sas_ss_sp || + sp - current->sas_ss_sp > current->sas_ss_size)) return (void __user *)-1L; /* save i387 and extended state */