Message ID | 20210316065215.23768-1-chang.seok.bae@intel.com |
---|---|
Headers |
From: "Chang S. Bae via Libc-alpha" <libc-alpha@sourceware.org>
Reply-To: "Chang S. Bae" <chang.seok.bae@intel.com>
To: bp@suse.de, tglx@linutronix.de, mingo@kernel.org, luto@kernel.org, x86@kernel.org
Cc: linux-arch@vger.kernel.org, len.brown@intel.com, tony.luck@intel.com, libc-alpha@sourceware.org, ravi.v.shankar@intel.com, chang.seok.bae@intel.com, jannh@google.com, linux-kernel@vger.kernel.org, dave.hansen@intel.com, linux-api@vger.kernel.org, Dave.Martin@arm.com
Subject: [PATCH v7 0/6] x86: Improve Minimum Alternate Stack Size
Date: Mon, 15 Mar 2021 23:52:09 -0700
Message-Id: <20210316065215.23768-1-chang.seok.bae@intel.com> |
Series |
x86: Improve Minimum Alternate Stack Size
|
|
Message
Chang S. Bae
March 16, 2021, 6:52 a.m. UTC
During signal entry, the kernel pushes data onto the normal userspace stack. On x86, the data pushed onto the user stack includes XSAVE state, which has grown over time as new features and larger registers have been added to the architecture.

MINSIGSTKSZ is a constant provided in the kernel signal.h headers and typically distributed in lib-dev(el) packages, e.g. [1]. Its value is compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ constant indicates to userspace how much data the kernel expects to push on the user stack [2][3].

However, this constant is much too small and does not reflect recent additions to the architecture. For instance, when AVX-512 states are in use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.

The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can cause user stack overflow when delivering a signal.

In this series, we suggest a couple of things:
1. Provide a variable minimum stack size to userspace, similar to the approach in [5].
2. Avoid using a too-small alternate stack.

Changes from v6 [11]:
* Updated and fixed the documentation. (Borislav Petkov)
* Revised the AT_MINSIGSTKSZ comment. (Borislav Petkov)

Changes from v5 [10]:
* Fixed the overflow detection. (Andy Lutomirski)
* Reverted the AT_MINSIGSTKSZ removal on arm64. (Dave Martin)
* Added documentation about the x86 AT_MINSIGSTKSZ.
* Updated the existing sigaltstack test to use the new aux vector.

Changes from v4 [9]:
* Moved the aux vector define to the generic header. (Carlos O'Donell)

Changes from v3 [8]:
* Updated the changelog. (Borislav Petkov)
* Revised the test messages again. (Borislav Petkov)

Changes from v2 [7]:
* Simplified the sigaltstack overflow prevention. (Jann Horn)
* Renamed the fpstate size helper, with cleanup. (Borislav Petkov)
* Cleaned up the sigframe struct size defines. (Borislav Petkov)
* Revised the selftest messages. (Borislav Petkov)
* Revised a changelog. (Borislav Petkov)

Changes from v1 [6]:
* Took stack alignment into account for sigframe size. (Dave Martin)

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/bits/sigstack.h;h=b9dca794da093dc4d41d39db9851d444e1b54d9b;hb=HEAD
[2]: https://www.gnu.org/software/libc/manual/html_node/Signal-Stack.html
[3]: https://man7.org/linux/man-pages/man2/sigaltstack.2.html
[4]: https://bugzilla.kernel.org/show_bug.cgi?id=153531
[5]: https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4671/original/plumbers-dm-2017.pdf
[6]: https://lore.kernel.org/lkml/20200929205746.6763-1-chang.seok.bae@intel.com/
[7]: https://lore.kernel.org/lkml/20201119190237.626-1-chang.seok.bae@intel.com/
[8]: https://lore.kernel.org/lkml/20201223015312.4882-1-chang.seok.bae@intel.com/
[9]: https://lore.kernel.org/lkml/20210115211038.2072-1-chang.seok.bae@intel.com/
[10]: https://lore.kernel.org/lkml/20210203172242.29644-1-chang.seok.bae@intel.com/
[11]: https://lore.kernel.org/lkml/20210227165911.32757-1-chang.seok.bae@intel.com/

Chang S. Bae (6):
  uapi: Define the aux vector AT_MINSIGSTKSZ
  x86/signal: Introduce helpers to get the maximum signal frame size
  x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZ
  selftest/sigaltstack: Use the AT_MINSIGSTKSZ aux vector if available
  x86/signal: Detect and prevent an alternate signal stack overflow
  selftest/x86/signal: Include test cases for validating sigaltstack

 Documentation/x86/elf_auxvec.rst          |  53 +++++++++
 Documentation/x86/index.rst               |   1 +
 arch/x86/include/asm/elf.h                |   4 +
 arch/x86/include/asm/fpu/signal.h         |   2 +
 arch/x86/include/asm/sigframe.h           |   2 +
 arch/x86/include/uapi/asm/auxvec.h        |   4 +-
 arch/x86/kernel/cpu/common.c              |   3 +
 arch/x86/kernel/fpu/signal.c              |  19 ++++
 arch/x86/kernel/signal.c                  |  72 +++++++++++-
 include/uapi/linux/auxvec.h               |   3 +
 tools/testing/selftests/sigaltstack/sas.c |  20 +++-
 tools/testing/selftests/x86/Makefile      |   2 +-
 tools/testing/selftests/x86/sigaltstack.c | 128 ++++++++++++++++++++++
 13 files changed, 300 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/x86/elf_auxvec.rst
 create mode 100644 tools/testing/selftests/x86/sigaltstack.c

base-commit: 1e28eed17697bcf343c6743f0028cc3b5dd88bf0
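The cover letter's first proposal is to expose the real minimum to userspace through the AT_MINSIGSTKSZ aux vector. The following is a minimal userspace sketch of consuming it, assuming glibc's getauxval() from <sys/auxv.h> and the value 51 that the Linux uapi headers use for AT_MINSIGSTKSZ (defined locally in case an older libc does not provide it). It takes the larger of the compile-time MINSIGSTKSZ and the kernel-reported value, adds headroom for the handler's own frames, and installs the stack. The helper names altstack_size() and install_altstack() and the headroom parameter are illustrative only; they are not part of this series.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for sigaltstack() and MINSIGSTKSZ */
#endif
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/auxv.h>

/* Linux uapi value; define it if the libc headers predate it. */
#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51
#endif

/* Size for an alternate signal stack: the larger of the legacy
 * compile-time MINSIGSTKSZ and the kernel-reported AT_MINSIGSTKSZ,
 * plus caller-chosen headroom for the handler's own stack usage. */
static size_t altstack_size(size_t headroom)
{
    size_t min = MINSIGSTKSZ;
    /* getauxval() returns 0 if the kernel did not supply the entry. */
    unsigned long aux = getauxval(AT_MINSIGSTKSZ);

    if (aux > min)
        min = aux;
    return min + headroom;
}

/* Allocate and install an alternate signal stack of altstack_size(headroom)
 * bytes. Returns 0 on success, -1 on failure. */
static int install_altstack(size_t headroom)
{
    stack_t ss = { 0 };

    ss.ss_size = altstack_size(headroom);
    ss.ss_sp = malloc(ss.ss_size);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    return 0;
}
```

On a kernel that does not report AT_MINSIGSTKSZ, getauxval() returns 0 and the sketch falls back to MINSIGSTKSZ, i.e. the legacy behaviour the series is trying to move away from.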
Comments
* Chang S. Bae <chang.seok.bae@intel.com> wrote:

> During signal entry, the kernel pushes data onto the normal userspace
> stack. On x86, the data pushed onto the user stack includes XSAVE state,
> which has grown over time as new features and larger registers have been
> added to the architecture.
>
> [...]
>
> The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> cause user stack overflow when delivering a signal.

>   uapi: Define the aux vector AT_MINSIGSTKSZ
>   x86/signal: Introduce helpers to get the maximum signal frame size
>   x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZ
>   selftest/sigaltstack: Use the AT_MINSIGSTKSZ aux vector if available
>   x86/signal: Detect and prevent an alternate signal stack overflow
>   selftest/x86/signal: Include test cases for validating sigaltstack

So this looks really complicated, is this justified?

Why not just internally round up sigaltstack size if it's too small?
This would be more robust, as it would fix applications that use
MINSIGSTKSZ but don't use the new AT_MINSIGSTKSZ facility.

I.e. does AT_MINSIGSTKSZ have any other uses than avoiding the
segfault if MINSIGSTKSZ is used to create a small signal stack?

Thanks,

	Ingo
* Ingo Molnar <mingo@kernel.org> wrote:

> * Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> > [...]
>
> So this looks really complicated, is this justified?
>
> Why not just internally round up sigaltstack size if it's too small?
> This would be more robust, as it would fix applications that use
> MINSIGSTKSZ but don't use the new AT_MINSIGSTKSZ facility.
>
> I.e. does AT_MINSIGSTKSZ have any other uses than avoiding the
> segfault if MINSIGSTKSZ is used to create a small signal stack?

I.e. if the kernel sees a too small ->ss_size in sigaltstack() it
would ignore ->ss_sp and mmap() a new sigaltstack instead and use that
for the signal handler stack.

This would automatically make MINSIGSTKSZ - and other too small sizes
work today, and in the future.

But the question is, is there user-space usage of sigaltstacks that
relies on controlling or reading the contents of the stack?

longjmp using programs perhaps?

Thanks,

	Ingo
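Ingo's suggestion has the kernel ignore a too-small ->ss_sp and substitute memory of its own. The same policy can be sketched in userspace to make the trade-off concrete. This is a hypothetical wrapper: sigaltstack_or_replace() is not a real libc or kernel interface, and the sketch assumes getauxval(AT_MINSIGSTKSZ) as the source of the required size and an anonymous mmap() as the replacement stack. Note that the caller's buffer is silently dropped, which is exactly what a program that longjmp()s from, or inspects, its signal stack would notice.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS, sigaltstack(), MINSIGSTKSZ */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/auxv.h>
#include <sys/mman.h>

#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51   /* Linux uapi value */
#endif

/* Hypothetical: install *req as the alternate signal stack, but if it is
 * smaller than the required minimum, ignore req->ss_sp and install a fresh
 * anonymous mapping of the minimum size instead. The stack that was really
 * installed is reported through *installed (if non-NULL). */
static int sigaltstack_or_replace(const stack_t *req, stack_t *installed)
{
    size_t need = MINSIGSTKSZ;
    unsigned long aux = getauxval(AT_MINSIGSTKSZ);
    void *mapped = NULL;
    stack_t ss = *req;

    if (aux > need)
        need = aux;

    if (!(ss.ss_flags & SS_DISABLE) && ss.ss_size < need) {
        mapped = mmap(NULL, need, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mapped == MAP_FAILED)
            return -1;
        /* The caller's buffer is silently replaced here. */
        ss.ss_sp = mapped;
        ss.ss_size = need;
    }
    if (sigaltstack(&ss, NULL) != 0) {
        if (mapped != NULL)
            munmap(mapped, need);
        return -1;
    }
    if (installed != NULL)
        *installed = ss;
    return 0;
}
```

The design point Len raises next is visible here: the wrapper "just works" for the size problem, but the stack the handler runs on is no longer the one the program chose.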
On Wed, Mar 17, 2021 at 6:45 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
> > * Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >
> > > [...]
> >
> > So this looks really complicated, is this justified?
> >
> > Why not just internally round up sigaltstack size if it's too small?
> > This would be more robust, as it would fix applications that use
> > MINSIGSTKSZ but don't use the new AT_MINSIGSTKSZ facility.
> >
> > I.e. does AT_MINSIGSTKSZ have any other uses than avoiding the
> > segfault if MINSIGSTKSZ is used to create a small signal stack?
>
> I.e. if the kernel sees a too small ->ss_size in sigaltstack() it
> would ignore ->ss_sp and mmap() a new sigaltstack instead and use that
> for the signal handler stack.
>
> This would automatically make MINSIGSTKSZ - and other too small sizes
> work today, and in the future.
>
> But the question is, is there user-space usage of sigaltstacks that
> relies on controlling or reading the contents of the stack?
>
> longjmp using programs perhaps?

For the legacy binary that requests a too-small sigaltstack, there are
several choices:

We could detect the too-small stack at sigaltstack(2) invocation and
return an error.
This results in two deal-killing problems:
First, some applications don't check the return value, so the check
would be fruitless.
Second, those that check and error out may be programs that never
actually take the signal, and so we'd be causing a dusty binary to
exit, when it didn't exit on another system, or another kernel.

Or we could detect the too-small stack at signal registration time.
This has the same two deal-killers as above.

Then there is the approach in this patch-set, which detects an
imminent stack overflow at run time.
It has neither of the two problems above, and it has the benefit that
we now prevent data corruption that could have been happening on some
systems already today. The downside is that the dusty binary that does
request the too-small stack can now die at run time.

So your idea of recognizing the problem and conjuring up a
sufficient stack is compelling, since it would likely "just work",
no matter how dumb the program. But where would the sufficient
stack come from? Is this a new kernel buffer, or is there a way to
abscond with some user memory? I would expect a signal handler to look
at the data on its stack and nobody else will look at that stack.
But this is already an unreasonable program for allocating a special
signal stack in the first place :-/ So yes, one could imagine the
signal handler could longjmp instead of gracefully completing, and
if this specially allocated signal stack isn't where the user
planned, that could be trouble.

Another idea we discussed was to detect the potential overflow at
run-time, and instead of killing the process, just push the signal
onto the regular user stack. This might actually work, but it is
sort of devious; and it would not work in the case where the user
overflowed their regular stack already, which may be the most
(only?) compelling reason that they allocated and declared a special
sigaltstack in the first place...
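Len's description of the series' approach (detect an imminent overflow at delivery time, rather than at sigaltstack(2) or signal registration time) boils down to a bounds check. The following is a simplified, hypothetical model, not the code from "x86/signal: Detect and prevent an alternate signal stack overflow": it ignores alignment and treats the whole signal frame as one block carved downward, either from the top of the alternate stack on first entry or from the current stack pointer when a nested signal is already running on it. The function name sigframe_fits() is illustrative. The 2048-byte and 3584-byte figures in the test correspond to the 2KB MINSIGSTKSZ and the roughly 3.5KB AVX-512 frame quoted in the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of run-time overflow detection. The alternate stack
 * occupies [ss_sp, ss_sp + ss_size) and grows downward. If the task is not
 * yet on the alternate stack, the frame starts at its top; otherwise it
 * starts at the current stack pointer sp. Returns true only if the whole
 * frame of frame_size bytes stays inside the alternate stack. */
static bool sigframe_fits(uintptr_t ss_sp, size_t ss_size,
                          uintptr_t sp, bool on_altstack, size_t frame_size)
{
    uintptr_t top = ss_sp + ss_size;

    if (!on_altstack)
        sp = top;                /* first entry: start at the top */
    if (sp < ss_sp || sp > top)
        return false;            /* sp is not within the alternate stack */
    return sp - ss_sp >= frame_size;  /* room left below sp */
}
```

When a check like this fails, the series refuses to write the frame past the end of the alternate stack; as Len notes, the cost is that a binary with a too-small stack can now die at run time instead of silently corrupting adjacent memory.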
* Len Brown <lenb@kernel.org> wrote:

> On Wed, Mar 17, 2021 at 6:45 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > [...]
>
> For the legacy binary that requests a too-small sigaltstack, there are
> several choices:
>
> [...]
>
> So your idea of recognizing the problem and conjuring up a
> sufficient stack is compelling, since it would likely "just work",
> no matter how dumb the program. But where would the sufficient
> stack come from? Is this a new kernel buffer, or is there a way to
> abscond with some user memory? I would expect a signal handler to look
> at the data on its stack and nobody else will look at that stack.
> But this is already an unreasonable program for allocating a special
> signal stack in the first place :-/ So yes, one could imagine the
> signal handler could longjmp instead of gracefully completing, and
> if this specially allocated signal stack isn't where the user
> planned, that could be trouble.

We could mmap() (implicitly) new anonymous memory, but I can see why
this is probably more trouble than it's worth...

> Another idea we discussed was to detect the potential overflow at
> run-time, and instead of killing the process, just push the signal
> onto the regular user stack. This might actually work, but it is
> sort of devious; and it would not work in the case where the user
> overflowed their regular stack already, which may be the most
> (only?) compelling reason that they allocated and declared a special
> sigaltstack in the first place...

Yeah, this doesn't sound deterministic enough.

OK, thanks for the detailed answers. I withdraw my objections; let's
proceed with the approach you are proposing?

Thanks,

	Ingo
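Both fallbacks weighed here (an implicit mmap() and falling back to the regular stack) would be observable to programs, because a handler can tell which stack it is running on. The following minimal sketch assumes a handler registered with SA_ONSTACK for SIGUSR1 and a buffer sized the AT_MINSIGSTKSZ way the cover letter proposes; the names probe_handler(), setup_probe(), ran_on_our_buffer and the 16 KiB of headroom are illustrative. If the kernel had substituted its own stack or used the regular one, ran_on_our_buffer would read 0 instead of 1.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for sigaltstack(), SA_ONSTACK, MINSIGSTKSZ */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>

#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51   /* Linux uapi value */
#endif

static char *alt_base;   /* buffer handed to sigaltstack() */
static size_t alt_size;
/* -1: handler has not run; 0/1: handler ran off/on our buffer. */
static volatile sig_atomic_t ran_on_our_buffer = -1;

/* Records whether the handler's own locals live inside alt_base. */
static void probe_handler(int sig)
{
    volatile char marker = 0;
    uintptr_t p = (uintptr_t)&marker;

    (void)sig;
    ran_on_our_buffer = p >= (uintptr_t)alt_base &&
                        p < (uintptr_t)alt_base + alt_size;
}

/* Installs an alternate stack of max(MINSIGSTKSZ, AT_MINSIGSTKSZ) plus
 * 16 KiB of headroom, and an SA_ONSTACK handler for SIGUSR1. */
static int setup_probe(void)
{
    unsigned long aux = getauxval(AT_MINSIGSTKSZ);
    size_t min = MINSIGSTKSZ;
    stack_t ss;
    struct sigaction sa;

    if (aux > min)
        min = aux;
    alt_size = min + 16384;
    alt_base = malloc(alt_size);
    if (alt_base == NULL)
        return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

In a single-threaded program, raise(SIGUSR1) delivers the signal before it returns, so ran_on_our_buffer can be checked immediately afterwards.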