Message ID | 20211115152714.3205552-1-broonie@kernel.org (mailing list archive) |
---|---|
Headers |
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0AAE33857C42 for <patchwork@sourceware.org>; Mon, 15 Nov 2021 15:27:48 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0AAE33857C42 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1636990068; bh=Gu88fQwSWj4ZyUjDCfCXa3U8WWoYzknHCIzqm//0shw=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=kQI1oJDLFSO4S/75W+hLQkvqb5HsziolZgNm3yE2VPyk5wNOCAHl69ukRa6yGTvwK 1MRDWwqr9Buj51WaYOFOtzjOPwyIYEUys7HOsSfunTUiRmIT7UpgrU/R509wmDts4N NMZ/telAlLOI5ebh20lTpOgkowWT/gE642FuVh7I= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by sourceware.org (Postfix) with ESMTPS id C96623858D39 for <libc-alpha@sourceware.org>; Mon, 15 Nov 2021 15:27:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C96623858D39 Received: by mail.kernel.org (Postfix) with ESMTPSA id D61DC60EB2; Mon, 15 Nov 2021 15:27:21 +0000 (UTC) To: Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org> Subject: [PATCH v7 0/4] arm64: Enable BTI for the executable as well as the interpreter Date: Mon, 15 Nov 2021 15:27:10 +0000 Message-Id: <20211115152714.3205552-1-broonie@kernel.org> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2375; h=from:subject; bh=y2gJEptAy9Et9KNJ+WgWMubC2Cf2DYOaQGtmMK574eM=; b=owEBbQGS/pANAwAKASTWi3JdVIfQAcsmYgBhknxLTDDzIIYxy+ulyMSW7G9JFvtTRLsXNuPUPEwl 4lYp3TqJATMEAAEKAB0WIQSt5miqZ1cYtZ/in+ok1otyXVSH0AUCYZJ8SwAKCRAk1otyXVSH0F+rB/ 9Ui/dB66NxvOi8g/beOGNPDLFGwrLIyaBV8Ro1VoGi8dU9mcRWkTNhiYEFk2h70oDW8Ss3RIqeq9hm j2ZOIk/yXtfqbpVXIc3qIdznwiwqvlxCiNC9CWHBk9PnjZmn0f5di2+RHyVliiJCUVOir55WC8TMv2 aoEWxAuMzOwQgfeDIs7ai3pM7J7goBNFWEQo8l6pKSvu+RVOt8x/O0AecknD/4Hy9YTWuHfrfFPig8 ZCK/PZRfCM3nbOYRjDCi7W+ThbG4HRNnRmuiK6pofKVnFBf3UoUmPiVfv4q1FPPgfmUbfZQQIcmwJo SkngK8Mq1B6BOWkV9NU+1KBosJwLMm X-Developer-Key: i=broonie@kernel.org; a=openpgp; fpr=3F2568AAC26998F9E813A1C5C3F436CA30F5D8EB Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> From: Mark Brown via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Mark Brown <broonie@kernel.org> Cc: linux-arch@vger.kernel.org, Yu-cheng Yu <yu-cheng.yu@intel.com>, libc-alpha@sourceware.org, Szabolcs Nagy <szabolcs.nagy@arm.com>, Jeremy Linton <jeremy.linton@arm.com>, Mark Brown <broonie@kernel.org>, linux-arm-kernel@lists.infradead.org Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> |
Series |
arm64: Enable BTI for the executable as well as the interpreter
|
|
Message
Mark Brown
Nov. 15, 2021, 3:27 p.m. UTC
Deployments of BTI on arm64 have run into issues interacting with systemd's MemoryDenyWriteExecute feature. Currently for dynamically linked executables the kernel will only handle architecture specific properties like BTI for the interpreter, the expectation is that the interpreter will then handle any properties on the main executable. For BTI this means remapping the executable segments PROT_EXEC | PROT_BTI. This interacts poorly with MemoryDenyWriteExecute since that is implemented using a seccomp filter which prevents setting PROT_EXEC on already mapped memory and lacks the context to be able to detect that memory is already mapped with PROT_EXEC. This series resolves this by handling the BTI property for both the interpreter and the main executable. This does mean that we may get more code with BTI enabled if running on a system without BTI support in the dynamic linker, this is expected to be a safe configuration and testing seems to confirm that. It also reduces the flexibility userspace has to disable BTI but it is expected that for cases where there are problems which require BTI to be disabled it is more likely that it will need to be disabled on a system level. v7: - Rebase onto v5.16-rc1. v6: - Rebase onto v5.15-rc1. v5: - Rebase onto v5.14-rc2. - Tweak changelog on patch 1. - Use the helper for interpreter/executable flag in elf.h as well. v4: - Rebase onto v5.14-rc1. v3: - Fix passing of properties for parsing by the main executable. - Drop has_interp from arch_parse_elf_property(). - Coding style tweaks. v2: - Add a patch dropping has_interp from arch_adjust_elf_prot() - Fix bisection issue with static executables on arm64 in the first patch. Mark Brown (4): elf: Allow architectures to parse properties on the main executable arm64: Enable BTI for main executable as well as the interpreter elf: Remove has_interp property from arch_adjust_elf_prot() elf: Remove has_interp property from arch_parse_elf_property() arch/arm64/include/asm/elf.h | 14 ++++++++++++-- arch/arm64/kernel/process.c | 16 +++------------- fs/binfmt_elf.c | 29 ++++++++++++++++++++--------- include/linux/elf.h | 8 +++++--- 4 files changed, 40 insertions(+), 27 deletions(-) base-commit: fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf
Comments
On Mon, Nov 15, 2021 at 03:27:10PM +0000, Mark Brown wrote: > Deployments of BTI on arm64 have run into issues interacting with > systemd's MemoryDenyWriteExecute feature. Currently for dynamically > linked executables the kernel will only handle architecture specific > properties like BTI for the interpreter, the expectation is that the > interpreter will then handle any properties on the main executable. > For BTI this means remapping the executable segments PROT_EXEC | > PROT_BTI. > > This interacts poorly with MemoryDenyWriteExecute since that is > implemented using a seccomp filter which prevents setting PROT_EXEC on > already mapped memory and lacks the context to be able to detect that > memory is already mapped with PROT_EXEC. This series resolves this by > handling the BTI property for both the interpreter and the main > executable. > > This does mean that we may get more code with BTI enabled if running on > a system without BTI support in the dynamic linker, this is expected to > be a safe configuration and testing seems to confirm that. It also > reduces the flexibility userspace has to disable BTI but it is expected > that for cases where there are problems which require BTI to be disabled > it is more likely that it will need to be disabled on a system level. Given the silence on this series over the past months, I propose we drop it. It's a bit unfortunate that systemd's MemoryDenyWriteExecute cannot work with BTI but I also think the former is a pretty blunt hardening mechanism (rejecting any mprotect(PROT_EXEC) regardless of the previous attributes). I'm not a security expert to assess whether MDWX is more important than BTI (hardware availability also influences the distros decision). My suggestion would be to look at a better way to support the MDWX on the long run that does not interfere with BTI.
The 12/08/2021 18:23, Catalin Marinas wrote: > On Mon, Nov 15, 2021 at 03:27:10PM +0000, Mark Brown wrote: > > Deployments of BTI on arm64 have run into issues interacting with > > systemd's MemoryDenyWriteExecute feature. Currently for dynamically > > linked executables the kernel will only handle architecture specific > > properties like BTI for the interpreter, the expectation is that the > > interpreter will then handle any properties on the main executable. > > For BTI this means remapping the executable segments PROT_EXEC | > > PROT_BTI. > > > > This interacts poorly with MemoryDenyWriteExecute since that is > > implemented using a seccomp filter which prevents setting PROT_EXEC on > > already mapped memory and lacks the context to be able to detect that > > memory is already mapped with PROT_EXEC. This series resolves this by > > handling the BTI property for both the interpreter and the main > > executable. > > > > This does mean that we may get more code with BTI enabled if running on > > a system without BTI support in the dynamic linker, this is expected to > > be a safe configuration and testing seems to confirm that. It also > > reduces the flexibility userspace has to disable BTI but it is expected > > that for cases where there are problems which require BTI to be disabled > > it is more likely that it will need to be disabled on a system level. > > Given the silence on this series over the past months, I propose we drop > it. It's a bit unfortunate that systemd's MemoryDenyWriteExecute cannot > work with BTI but I also think the former is a pretty blunt hardening > mechanism (rejecting any mprotect(PROT_EXEC) regardless of the previous > attributes). > > I'm not a security expert to assess whether MDWX is more important than > BTI (hardware availability also influences the distros decision). My > suggestion would be to look at a better way to support the MDWX on the > long run that does not interfere with BTI. i still think it would be better if the kernel dealt with PROT_BTI for the exe loaded by the kernel. things work without this series as glibc will continue to apply mprotect and ignore the failure, but with the series security is tighter (neither mdwx nor bti are compromised). for unrelated reasons bti is not widely deployed currently so i guess there is no urgent interest in this.
On Thu, Dec 09, 2021 at 11:10:48AM +0000, Szabolcs Nagy wrote: > The 12/08/2021 18:23, Catalin Marinas wrote: > > On Mon, Nov 15, 2021 at 03:27:10PM +0000, Mark Brown wrote: > > > memory is already mapped with PROT_EXEC. This series resolves this by > > > handling the BTI property for both the interpreter and the main > > > executable. > > Given the silence on this series over the past months, I propose we drop > > it. It's a bit unfortunate that systemd's MemoryDenyWriteExecute cannot > > work with BTI but I also think the former is a pretty blunt hardening > > mechanism (rejecting any mprotect(PROT_EXEC) regardless of the previous > > attributes). > i still think it would be better if the kernel dealt with > PROT_BTI for the exe loaded by the kernel. The above message from Catalin isn't quite the full story here - my understanding from backchannel is that there's concern from others that we might be creating future issues by enabling PROT_BTI, especially in the case where the same permissions issue prevents the dynamic linker disabling PROT_BTI. They'd therefore rather stick with the status quo and not create any new ABI. Unfortunately that's not something people have been willing to say on the list, hopefully the above captures the thinking well enough. Personally I'm a bit ambivalent on this, I do see the potential issue but I'm having trouble constructing an actual scenario and my instinct is that since we handle PROT_EXEC we should also handle PROT_BTI for consistency.
Hi, On 1/4/22 11:32, Mark Brown wrote: > On Thu, Dec 09, 2021 at 11:10:48AM +0000, Szabolcs Nagy wrote: >> The 12/08/2021 18:23, Catalin Marinas wrote: >>> On Mon, Nov 15, 2021 at 03:27:10PM +0000, Mark Brown wrote: > >>>> memory is already mapped with PROT_EXEC. This series resolves this by >>>> handling the BTI property for both the interpreter and the main >>>> executable. > >>> Given the silence on this series over the past months, I propose we drop >>> it. It's a bit unfortunate that systemd's MemoryDenyWriteExecute cannot >>> work with BTI but I also think the former is a pretty blunt hardening >>> mechanism (rejecting any mprotect(PROT_EXEC) regardless of the previous >>> attributes). > >> i still think it would be better if the kernel dealt with >> PROT_BTI for the exe loaded by the kernel. > > The above message from Catalin isn't quite the full story here - my > understanding from backchannel is that there's concern from others that > we might be creating future issues by enabling PROT_BTI, especially in > the case where the same permissions issue prevents the dynamic linker > disabling PROT_BTI. They'd therefore rather stick with the status quo > and not create any new ABI. Unfortunately that's not something people > have been willing to say on the list, hopefully the above captures the > thinking well enough. > > Personally I'm a bit ambivalent on this, I do see the potential issue > but I'm having trouble constructing an actual scenario and my instinct > is that since we handle PROT_EXEC we should also handle PROT_BTI for > consistency. > I'm hardly a security expert, but it seems to me that BTI hardens against a wider set of possible exploits than MDWE. Yet, we are silently turning it off for systemd services which are considered some of the most security critical things in the machine right now (ex:logind, etc). So despite 'systemd-analyze secuirty` flagging those services as the most secure ones on a system, they might actually be less secure. It also seems that getting BTI turned on earlier, as this patch is doing is itself a win. So, mentally i'm having a hard time balancing the hypothetical problem laid out, as it should only really exist in an environment similar to the MDWE one, since AFAIK, its possible today to just flip it back off unless MDWE stops that from happening. What are the the remaining alternatives? A new syscall? But that is by definition a new ABI, and wouldn't benefit from having BTI turned on as early as this patch is doing. Should we disable MDWE on a BTI machine? I'm not sure that is a good look, particularly if MDWE happens to successfully stop some exploit. AFAIK, MDWE+BTI are a good strong combination, it seems a shame if we can't get them both working together. I hesitate to suggest it, but maybe this patch should be conditional somehow, that way !systemd/MDWE machines can behave as they do today, and systemd/MDWE machines can request BTI be turned on by the kernel automatically?
Hi Jeremy, On Wed, Jan 05, 2022 at 04:42:01PM -0600, Jeremy Linton wrote: > On 1/4/22 11:32, Mark Brown wrote: > > On Thu, Dec 09, 2021 at 11:10:48AM +0000, Szabolcs Nagy wrote: > > > The 12/08/2021 18:23, Catalin Marinas wrote: > > > > On Mon, Nov 15, 2021 at 03:27:10PM +0000, Mark Brown wrote: > > > > > memory is already mapped with PROT_EXEC. This series resolves this by > > > > > handling the BTI property for both the interpreter and the main > > > > > executable. > > > > > > > > Given the silence on this series over the past months, I propose we drop > > > > it. It's a bit unfortunate that systemd's MemoryDenyWriteExecute cannot > > > > work with BTI but I also think the former is a pretty blunt hardening > > > > mechanism (rejecting any mprotect(PROT_EXEC) regardless of the previous > > > > attributes). > > > > > > i still think it would be better if the kernel dealt with > > > PROT_BTI for the exe loaded by the kernel. > > > > The above message from Catalin isn't quite the full story here - my > > understanding from backchannel is that there's concern from others that > > we might be creating future issues by enabling PROT_BTI, especially in > > the case where the same permissions issue prevents the dynamic linker > > disabling PROT_BTI. They'd therefore rather stick with the status quo > > and not create any new ABI. Unfortunately that's not something people > > have been willing to say on the list, hopefully the above captures the > > thinking well enough. > > > > Personally I'm a bit ambivalent on this, I do see the potential issue > > but I'm having trouble constructing an actual scenario and my instinct > > is that since we handle PROT_EXEC we should also handle PROT_BTI for > > consistency. > > I'm hardly a security expert, but it seems to me that BTI hardens against a > wider set of possible exploits than MDWE. They are complementary features. > Yet, we are silently turning it > off for systemd services which are considered some of the most security > critical things in the machine right now (ex:logind, etc). So despite > 'systemd-analyze secuirty` flagging those services as the most secure ones > on a system, they might actually be less secure. Well, that's a distro decision. MDWE/MDWX is not something imposed by the kernel but rather a seccomp bpf filter set up by systemd. > It also seems that getting BTI turned on earlier, as this patch is doing is > itself a win. > > So, mentally i'm having a hard time balancing the hypothetical problem laid > out, as it should only really exist in an environment similar to the MDWE > one, since AFAIK, its possible today to just flip it back off unless MDWE > stops that from happening. That's a user ABI change and given that the first attempt was shown to break with some combination of old loader and new main executable (or the other way around), I'd rather keep things as they are. > What are the the remaining alternatives? A new syscall? But that is by > definition a new ABI, A new ABI is better than changing the current ABI. > and wouldn't benefit from having BTI turned on as early as this patch > is doing. In the absence of MDWX, it's not relevant how early the kernel turns BTI on for the main executable. The dynamic loader would do this with an mprotect() before actually executing any of the main code. Of course, we assume there are no security bugs in the dynamic loader. > Should we disable MDWE on a BTI machine? I'm > not sure that is a good look, particularly if MDWE happens to successfully > stop some exploit. AFAIK, MDWE+BTI are a good strong combination, it seems a > shame if we can't get them both working together. AFAICT MDWX wants (one of the filters) to prevent a previously writable mapping from becoming executable through mprotect(PROT_EXEC). How common is mprotect(PROT_EXEC|PROT_BTI) outside of the dynamic loader? I doubt it is, especially in an MDWX environment. So can we not change the filter to allow PROT_EXEC|PROT_BTI? If your code is already exploitable to allow random syscalls, all bets are off anyway. > I hesitate to suggest it, but maybe this patch should be conditional > somehow, that way !systemd/MDWE machines can behave as they do today, and > systemd/MDWE machines can request BTI be turned on by the kernel > automatically? That would be some big knob sysctl but I'm still not keen on toggling the ABI like this.
Hi, On 1/6/22 05:00, Catalin Marinas wrote: > Hi Jeremy, > > On Wed, Jan 05, 2022 at 04:42:01PM -0600, Jeremy Linton wrote: >> On 1/4/22 11:32, Mark Brown wrote: >>> On Thu, Dec 09, 2021 at 11:10:48AM +0000, Szabolcs Nagy wrote: >>>> The 12/08/2021 18:23, Catalin Marinas wrote: >>>>> On Mon, Nov 15, 2021 at 03:27:10PM +0000, Mark Brown wrote: >>>>>> memory is already mapped with PROT_EXEC. This series resolves this by >>>>>> handling the BTI property for both the interpreter and the main >>>>>> executable. >>>>> >>>>> Given the silence on this series over the past months, I propose we drop >>>>> it. It's a bit unfortunate that systemd's MemoryDenyWriteExecute cannot >>>>> work with BTI but I also think the former is a pretty blunt hardening >>>>> mechanism (rejecting any mprotect(PROT_EXEC) regardless of the previous >>>>> attributes). >>>> >>>> i still think it would be better if the kernel dealt with >>>> PROT_BTI for the exe loaded by the kernel. >>> >>> The above message from Catalin isn't quite the full story here - my >>> understanding from backchannel is that there's concern from others that >>> we might be creating future issues by enabling PROT_BTI, especially in >>> the case where the same permissions issue prevents the dynamic linker >>> disabling PROT_BTI. They'd therefore rather stick with the status quo >>> and not create any new ABI. Unfortunately that's not something people >>> have been willing to say on the list, hopefully the above captures the >>> thinking well enough. >>> >>> Personally I'm a bit ambivalent on this, I do see the potential issue >>> but I'm having trouble constructing an actual scenario and my instinct >>> is that since we handle PROT_EXEC we should also handle PROT_BTI for >>> consistency. >> >> I'm hardly a security expert, but it seems to me that BTI hardens against a >> wider set of possible exploits than MDWE. > > They are complementary features. > >> Yet, we are silently turning it >> off for systemd services which are considered some of the most security >> critical things in the machine right now (ex:logind, etc). So despite >> 'systemd-analyze secuirty` flagging those services as the most secure ones >> on a system, they might actually be less secure. > > Well, that's a distro decision. MDWE/MDWX is not something imposed by > the kernel but rather a seccomp bpf filter set up by systemd. > >> It also seems that getting BTI turned on earlier, as this patch is doing is >> itself a win. >> >> So, mentally i'm having a hard time balancing the hypothetical problem laid >> out, as it should only really exist in an environment similar to the MDWE >> one, since AFAIK, its possible today to just flip it back off unless MDWE >> stops that from happening. > > That's a user ABI change and given that the first attempt was shown to > break with some combination of old loader and new main executable (or > the other way around), I'd rather keep things as they are. This should only change the behavior for for binaries which conform to the new ABI containing the BTI note. So outside of the tiny window of things built with BTI, but run on !BTI hardware or older kernel+glibc, this shouldn't be a problem. (Unless i'm missing something) Put another way, now is the time to make a change, before there is a legacy BTI ecosystem we have to deal with. > >> What are the the remaining alternatives? A new syscall? But that is by >> definition a new ABI, > > A new ABI is better than changing the current ABI. > >> and wouldn't benefit from having BTI turned on as early as this patch >> is doing. > > In the absence of MDWX, it's not relevant how early the kernel turns BTI > on for the main executable. The dynamic loader would do this with an > mprotect() before actually executing any of the main code. Of course, we > assume there are no security bugs in the dynamic loader. > >> Should we disable MDWE on a BTI machine? I'm >> not sure that is a good look, particularly if MDWE happens to successfully >> stop some exploit. AFAIK, MDWE+BTI are a good strong combination, it seems a >> shame if we can't get them both working together. > > AFAICT MDWX wants (one of the filters) to prevent a previously writable > mapping from becoming executable through mprotect(PROT_EXEC). How common > is mprotect(PROT_EXEC|PROT_BTI) outside of the dynamic loader? I doubt > it is, especially in an MDWX environment. So can we not change the > filter to allow PROT_EXEC|PROT_BTI? If your code is already exploitable > to allow random syscalls, all bets are off anyway. I would expect JITs to be twittling EXEC|BTI but, those wouldn't be able to trivially run under MDWE anyway. So rarely? Changing the filter to allow PROT_EXEC|PROT_BTI defeats the purpose because the hypothetical exploit would just add the BTI tags and turn BTI on as well. The filter, as is, also provides additional BTI protections because it makes it more difficult to disable BTI. Without that filter it seems likely someone could come up with a way to use an existing PROT_EXEC as a gadget to disable BTI anywhere they choose. So, to your point before, BTI+MDWE are complementary, the combination seems considerably more robust than either by itself. > >> I hesitate to suggest it, but maybe this patch should be conditional >> somehow, that way !systemd/MDWE machines can behave as they do today, and >> systemd/MDWE machines can request BTI be turned on by the kernel >> automatically? > > That would be some big knob sysctl but I'm still not keen on toggling > the ABI like this. >
On Thu, Jan 06, 2022 at 10:09:35AM -0600, Jeremy Linton wrote: > On 1/6/22 05:00, Catalin Marinas wrote: > > On Wed, Jan 05, 2022 at 04:42:01PM -0600, Jeremy Linton wrote: > > > So, mentally i'm having a hard time balancing the hypothetical problem laid > > > out, as it should only really exist in an environment similar to the MDWE > > > one, since AFAIK, its possible today to just flip it back off unless MDWE > > > stops that from happening. > > > > That's a user ABI change and given that the first attempt was shown to > > break with some combination of old loader and new main executable (or > > the other way around), I'd rather keep things as they are. > > This should only change the behavior for for binaries which conform to the > new ABI containing the BTI note. So outside of the tiny window of things > built with BTI, but run on !BTI hardware or older kernel+glibc, this > shouldn't be a problem. (Unless i'm missing something) Put another way, now > is the time to make a change, before there is a legacy BTI ecosystem we have > to deal with. The concern is that the loader may decide in the future to not enable (or turn off) BTI for some reason (e.g. mixed libraries, old glibc on BTI hardware). If we force BTI on the main executable, we'd take this option away. Note also that it's not only glibc here, there are other loaders. > > AFAICT MDWX wants (one of the filters) to prevent a previously writable > > mapping from becoming executable through mprotect(PROT_EXEC). How common > > is mprotect(PROT_EXEC|PROT_BTI) outside of the dynamic loader? I doubt > > it is, especially in an MDWX environment. So can we not change the > > filter to allow PROT_EXEC|PROT_BTI? If your code is already exploitable > > to allow random syscalls, all bets are off anyway. > > I would expect JITs to be twittling EXEC|BTI but, those wouldn't be able to > trivially run under MDWE anyway. So rarely? > > Changing the filter to allow PROT_EXEC|PROT_BTI defeats the purpose because > the hypothetical exploit would just add the BTI tags and turn BTI on as > well. The filter, as is, also provides additional BTI protections because it > makes it more difficult to disable BTI. Without that filter it seems likely > someone could come up with a way to use an existing PROT_EXEC as a gadget to > disable BTI anywhere they choose. To me, MDWX is more about protecting against making some (previously) writable buffers executable (either inadvertently or as a programmer decision). It's not meant to prevent the hijacking of the instruction flow or data corruption that ends up as an mprotect(PROT_EXEC|PROT_BTI) call. If that's possible, the MDWX mitigation is really negligible. That said, the programmer may learn that passing PROT_BTI avoids the filter and start using it, though it's not trivial due to the need for landing pads. > So, to your point before, BTI+MDWE are complementary, the combination seems > considerably more robust than either by itself. Before we come up with a better solution, I find allowing PROT_BTI an acceptable compromise. I don't think we lose much with the current codebase. As for a better solution, we need to track the state of a memory range as BPF denying PROT_EXEC anywhere is a pretty big hammer. We could add a VM_WAS_WRITE flag and, in combination with a prctl() to opt in to the feature, prevent mprotect(PROT_EXEC) on such vmas. No need for seccomp bpf filters.
On Thu, Jan 06, 2022 at 06:13:37PM +0000, Catalin Marinas wrote: > On Thu, Jan 06, 2022 at 10:09:35AM -0600, Jeremy Linton wrote: > > This should only change the behavior for for binaries which conform to the > > new ABI containing the BTI note. So outside of the tiny window of things > > built with BTI, but run on !BTI hardware or older kernel+glibc, this > > shouldn't be a problem. (Unless i'm missing something) Put another way, now > > is the time to make a change, before there is a legacy BTI ecosystem we have > > to deal with. > The concern is that the loader may decide in the future to not enable > (or turn off) BTI for some reason (e.g. mixed libraries, old glibc on > BTI hardware). If we force BTI on the main executable, we'd take this > option away. Note also that it's not only glibc here, there are other > loaders. Neither of those examples should be concerns - BTI is per page so you can mix BTI and non-BTI freely in a process (as will happen now for the case where the dynamic loader is built for BTI but the main executable is not, and the dynamic loader should do if there's a mix of BTI and non-BTI libraries). The main case where there might be a change would be the case where there's individual excutables with incorrect BTI annotations which are run under this seccomp MWDE, then the dynamic loader might have support for disabling BTI based on some configuration but wouldn't be able to due to the MWDE. Note also that we're only taking the option of disabiling BTI away in the case where there's something like this seccomp filter disabling permission changes.
On Thu, Jan 06, 2022 at 07:07:46PM +0000, Mark Brown wrote: > On Thu, Jan 06, 2022 at 06:13:37PM +0000, Catalin Marinas wrote: > > On Thu, Jan 06, 2022 at 10:09:35AM -0600, Jeremy Linton wrote: > > > This should only change the behavior for for binaries which conform to the > > > new ABI containing the BTI note. So outside of the tiny window of things > > > built with BTI, but run on !BTI hardware or older kernel+glibc, this > > > shouldn't be a problem. (Unless i'm missing something) Put another way, now > > > is the time to make a change, before there is a legacy BTI ecosystem we have > > > to deal with. > > > The concern is that the loader may decide in the future to not enable > > (or turn off) BTI for some reason (e.g. mixed libraries, old glibc on > > BTI hardware). If we force BTI on the main executable, we'd take this > > option away. Note also that it's not only glibc here, there are other > > loaders. > > Neither of those examples should be concerns - BTI is per page so you > can mix BTI and non-BTI freely in a process (as will happen now for the > case where the dynamic loader is built for BTI but the main executable > is not, and the dynamic loader should do if there's a mix of BTI and > non-BTI libraries). The main case where there might be a change would > be the case where there's individual excutables with incorrect BTI > annotations which are run under this seccomp MWDE, then the dynamic > loader might have support for disabling BTI based on some configuration > but wouldn't be able to due to the MWDE. > > Note also that we're only taking the option of disabiling BTI away in > the case where there's something like this seccomp filter disabling > permission changes. Thanks for the clarification. I think we can look at this from two angles: 1. Ignoring MDWE, should whoever does the original mmap() also honour PROT_BTI? We do this for static binaries but, for consistency, should we extend it to dynamic executable? 2. A 'simple' fix to allow MDWE together with BTI. Regarding (1), I don't remember whether we decided to do it this way because it was more complicated to handle it in the kernel (like the 4 more patches in this series) or because we wanted to leave the option to the dynamic loader. It would be good to clarify this and we may have a small window, as Jeremy said, where changing the ABI won't cause problems (well, hopefully, there's still a risk). If we only want (2), I just think we are approaching it wrongly. Looking at mprotect(), this systemd feature is not even MDWE but rather "deny mprotect execute" disregarding the previous attributes. If we want this, something like below would do (conditional on some personality flag or a prctl): diff --git a/mm/mprotect.c b/mm/mprotect.c index e552f5e0ccbd..4262e6f1c14e 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -616,6 +616,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len, goto out; } + if ((newflags & VM_EXEC) && !(vma->vm_flags & VM_EXEC)) { + error = -EACCES; + goto out; + } + /* Allow architectures to sanity-check the new flags */ if (!arch_validate_flags(newflags)) { error = -EINVAL; But we could also do a proper MDWE covering both mmap() and mprotect() and keep track on whether a vma was previously writable. A personality flag seems like a better option as we have READ_IMPLIES_EXEC, we could add a DENY_WRITE_EXEC.
On Fri, Jan 07, 2022 at 12:01:17PM +0000, Catalin Marinas wrote: > Regarding (1), I don't remember whether we decided to do it this way > because it was more complicated to handle it in the kernel (like the 4 > more patches in this series) or because we wanted to leave the option to > the dynamic loader. It would be good to clarify this and we may have a > small window, as Jeremy said, where changing the ABI won't cause > problems (well, hopefully, there's still a risk). My understanding is that it was basically just a "let's defer everything to userspace" thing. It means that userspace is responsible for turning on BTI and is therefore responsible for any workarounds which are needed for problematic binaries, it's the absolute minimum the kernel can be responsible for. This all predates my involvement though.
On Fri, Jan 07, 2022 at 12:01:17PM +0000, Catalin Marinas wrote: > I think we can look at this from two angles: > > 1. Ignoring MDWE, should whoever does the original mmap() also honour > PROT_BTI? We do this for static binaries but, for consistency, should > we extend it to dynamic executable? > > 2. A 'simple' fix to allow MDWE together with BTI. Thinking about it, (1) is not that different from the kernel setting PROT_EXEC on the main executable when the dynamic loader could've done it as well. There is a case for making this more consistent: whoever does the mmap() should use the full attributes. Question for the toolchain people: would the compiler ever generate relocations in the main executable that the linker needs to resolve via an mprotect(READ|WRITE) followed by mprotect(READ|EXEC)? If yes, we'd better go for a proper MDWE implementation in the kernel.
On 17/01/2022 14:54, Catalin Marinas via Libc-alpha wrote: > On Fri, Jan 07, 2022 at 12:01:17PM +0000, Catalin Marinas wrote: >> I think we can look at this from two angles: >> >> 1. Ignoring MDWE, should whoever does the original mmap() also honour >> PROT_BTI? We do this for static binaries but, for consistency, should >> we extend it to dynamic executable? >> >> 2. A 'simple' fix to allow MDWE together with BTI. > > Thinking about it, (1) is not that different from the kernel setting > PROT_EXEC on the main executable when the dynamic loader could've done > it as well. There is a case for making this more consistent: whoever > does the mmap() should use the full attributes. > > Question for the toolchain people: would the compiler ever generate > relocations in the main executable that the linker needs to resolve via > an mprotect(READ|WRITE) followed by mprotect(READ|EXEC)? If yes, we'd > better go for a proper MDWE implementation in the kernel. > Yes, text relocations. However these are deprecated (some libcs even do not support it) and have a lot of drawbacks.
On Mon, Jan 17, 2022 at 10:17 AM Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org> wrote: > > > > On 17/01/2022 14:54, Catalin Marinas via Libc-alpha wrote: > > On Fri, Jan 07, 2022 at 12:01:17PM +0000, Catalin Marinas wrote: > >> I think we can look at this from two angles: > >> > >> 1. Ignoring MDWE, should whoever does the original mmap() also honour > >> PROT_BTI? We do this for static binaries but, for consistency, should > >> we extend it to dynamic executable? > >> > >> 2. A 'simple' fix to allow MDWE together with BTI. > > > > Thinking about it, (1) is not that different from the kernel setting > > PROT_EXEC on the main executable when the dynamic loader could've done > > it as well. There is a case for making this more consistent: whoever > > does the mmap() should use the full attributes. > > > > Question for the toolchain people: would the compiler ever generate > > relocations in the main executable that the linker needs to resolve via > > an mprotect(READ|WRITE) followed by mprotect(READ|EXEC)? If yes, we'd > > better go for a proper MDWE implementation in the kernel. > > > > Yes, text relocations. However these are deprecated (some libcs even do > not support it) and have a lot of drawbacks. We are taking a different approach for CET enabling. CET will be changed to be enabled from user space: https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/cet/enable and the CET kernel no longer enables CET automatically: https://github.com/hjl-tools/linux/tree/hjl/cet%2F5.16.0-v4
The 01/17/2022 17:54, Catalin Marinas wrote: > On Fri, Jan 07, 2022 at 12:01:17PM +0000, Catalin Marinas wrote: > > I think we can look at this from two angles: > > > > 1. Ignoring MDWE, should whoever does the original mmap() also honour > > PROT_BTI? We do this for static binaries but, for consistency, should > > we extend it to dynamic executable? > > > > 2. A 'simple' fix to allow MDWE together with BTI. > > Thinking about it, (1) is not that different from the kernel setting > PROT_EXEC on the main executable when the dynamic loader could've done > it as well. There is a case for making this more consistent: whoever > does the mmap() should use the full attributes. > Yeah that was my original idea that it should be consistent. One caveat is that protection flags are normally specified in the program header, but the BTI marking is in PT_GNU_PROPERTY which is harder to get to, so glibc does not try to get it right for the initial mapping either: it has to re-mmap or mprotect. (In principle we could use read syscalls to parse the ELF headers and notes before mmap, but that's more complicated with additional failure modes.) i.e. if (2) is fixed then mprotect can be used for library mapping too which is simpler than re-mmap. > Question for the toolchain people: would the compiler ever generate > relocations in the main executable that the linker needs to resolve via > an mprotect(READ|WRITE) followed by mprotect(READ|EXEC)? If yes, we'd > better go for a proper MDWE implementation in the kernel. There is some support for text relocations in glibc, but it's not expected to be common (e.g. it is a bug if a distro binary has it). For static PIE we made -z text ldflag the default so text relocs are rejected (i think glibc cannot self-relocate those, so ld.so cannot have them either), but otherwise certain text relocs work (static relocations are not supported). $ cat a.c int x = 42; __attribute__((section(".text"))) int *y = &x; int main(){ return *y; } $ gcc a.c /tmp/ccOrpMPD.s: Assembler messages: /tmp/ccOrpMPD.s:12: Warning: ignoring changed section attributes for .text $ readelf -aW ./a.out |grep TEXTREL 0x0000000000000016 (TEXTREL) 0x0 0x000000000000001e (FLAGS) TEXTREL BIND_NOW $ strace -e mprotect ./a.out mprotect(0xffff839ee000, 65536, PROT_NONE) = 0 mprotect(0xffff839fe000, 12288, PROT_READ) = 0 mprotect(0xaaaac2096000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0 mprotect(0xaaaac2096000, 4096, PROT_READ|PROT_EXEC) = 0 mprotect(0xaaaac20a6000, 4096, PROT_READ) = 0 mprotect(0xffff83a55000, 4096, PROT_READ) = 0 +++ exited with 42 +++
The 01/17/2022 11:01, H.J. Lu via Libc-alpha wrote: > We are taking a different approach for CET enabling. CET will be > changed to be enabled from user space: > > https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/cet/enable > > and the CET kernel no longer enables CET automatically: > > https://github.com/hjl-tools/linux/tree/hjl/cet%2F5.16.0-v4 we considered userspace handling of BTI in static exe and ld.so too. at the time we wanted the protection to be on whenever BTI marked code is executed, so it has to be enabled at program entry. i no longer think that the entry code protection is very important, but delaying mprotect for static exe does not fix our mprotect(*|PROT_EXEC) problem with systemd. i also don't immediately see where you deal with shadow stack allocation for the main stack if it is userspace enabled, i expected that to require kernel assistance if you want the main stack protected all the way up.
On Tue, Jan 18, 2022 at 3:22 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote: > > The 01/17/2022 11:01, H.J. Lu via Libc-alpha wrote: > > We are taking a different approach for CET enabling. CET will be > > changed to be enabled from user space: > > > > https://gitlab.com/x86-glibc/glibc/-/tree/users/hjl/cet/enable > > > > and the CET kernel no longer enables CET automatically: > > > > https://github.com/hjl-tools/linux/tree/hjl/cet%2F5.16.0-v4 > > we considered userspace handling of BTI in static exe > and ld.so too. at the time we wanted the protection to > be on whenever BTI marked code is executed, so it has > to be enabled at program entry. > > i no longer think that the entry code protection is very > important, but delaying mprotect for static exe does > not fix our mprotect(*|PROT_EXEC) problem with systemd. > > i also don't immediately see where you deal with shadow > stack allocation for the main stack if it is userspace > enabled, i expected that to require kernel assistance > if you want the main stack protected all the way up. We enable shadow stack in user space as soon as possible: https://gitlab.com/x86-glibc/glibc/-/commit/211abce607a9f6e4cd1cadefb87561413dd8fae9
(Mark posted another series but I'm replying here to clarify some aspects) On Tue, Jan 18, 2022 at 11:02:55AM +0000, Szabolcs Nagy wrote: > The 01/17/2022 17:54, Catalin Marinas wrote: > > On Fri, Jan 07, 2022 at 12:01:17PM +0000, Catalin Marinas wrote: > > > I think we can look at this from two angles: > > > > > > 1. Ignoring MDWE, should whoever does the original mmap() also honour > > > PROT_BTI? We do this for static binaries but, for consistency, should > > > we extend it to dynamic executable? > > > > > > 2. A 'simple' fix to allow MDWE together with BTI. > > > > Thinking about it, (1) is not that different from the kernel setting > > PROT_EXEC on the main executable when the dynamic loader could've done > > it as well. There is a case for making this more consistent: whoever > > does the mmap() should use the full attributes. > > Yeah that was my original idea that it should be consistent. > One caveat is that protection flags are normally specified > in the program header, but the BTI marking is in > PT_GNU_PROPERTY which is harder to get to, so glibc does not > try to get it right for the initial mapping either: it has > to re-mmap or mprotect. (In principle we could use read > syscalls to parse the ELF headers and notes before mmap, > but that's more complicated with additional failure modes.) > > i.e. if (2) is fixed then mprotect can be used for library > mapping too which is simpler than re-mmap. I lost track of the userspace fixes here, was glibc changed to attempt a re-mmap of the dynamic libraries instead of mprotect()? It looks like (2) is a simpler fix and (1) could still be added for consistency, it's complementary. > > Question for the toolchain people: would the compiler ever generate > > relocations in the main executable that the linker needs to resolve via > > an mprotect(READ|WRITE) followed by mprotect(READ|EXEC)? If yes, we'd > > better go for a proper MDWE implementation in the kernel. > > There is some support for text relocations in glibc, but it's not > expected to be common (e.g. it is a bug if a distro binary has it). Thanks for the clarification.
The 01/27/2022 12:24, Catalin Marinas wrote: > (Mark posted another series but I'm replying here to clarify some > aspects) > > On Tue, Jan 18, 2022 at 11:02:55AM +0000, Szabolcs Nagy wrote: > > The 01/17/2022 17:54, Catalin Marinas wrote: > > > On Fri, Jan 07, 2022 at 12:01:17PM +0000, Catalin Marinas wrote: > > > > I think we can look at this from two angles: > > > > > > > > 1. Ignoring MDWE, should whoever does the original mmap() also honour > > > > PROT_BTI? We do this for static binaries but, for consistency, should > > > > we extend it to dynamic executable? > > > > > > > > 2. A 'simple' fix to allow MDWE together with BTI. > > > > > > Thinking about it, (1) is not that different from the kernel setting > > > PROT_EXEC on the main executable when the dynamic loader could've done > > > it as well. There is a case for making this more consistent: whoever > > > does the mmap() should use the full attributes. > > > > Yeah that was my original idea that it should be consistent. > > One caveat is that protection flags are normally specified > > in the program header, but the BTI marking is in > > PT_GNU_PROPERTY which is harder to get to, so glibc does not > > try to get it right for the initial mapping either: it has > > to re-mmap or mprotect. (In principle we could use read > > syscalls to parse the ELF headers and notes before mmap, > > but that's more complicated with additional failure modes.) > > > > i.e. if (2) is fixed then mprotect can be used for library > > mapping too which is simpler than re-mmap. > > I lost track of the userspace fixes here, was glibc changed to attempt a > re-mmap of the dynamic libraries instead of mprotect()? yes (so under mdwe, bti is lost on the exe but not on libs) see the commit message for the fix https://sourceware.org/bugzilla/show_bug.cgi?id=26831 > > It looks like (2) is a simpler fix and (1) could still be added for > consistency, it's complementary. i agree. if (2) is fixed then i would change glibc to use mprotect and handle the failure (this will require an update to systemd and disabling mdwe on old kernels) if (1) is fixed then i would probably still keep doing mprotect on the main exe so bti protection works on old kernels.