[v6,3/3] arm64: pac: Optimize kernel entry/exit key installation code paths

Message ID e3977b3e1b548be1d9554ccfad6c83ac87802583.1609311499.git.pcc@google.com
State Not applicable
Headers
Series [v6,1/3] arm64: mte: make the per-task SCTLR_EL1 field usable elsewhere |

Commit Message

Peter Collingbourne Dec. 30, 2020, 6:59 a.m. UTC
  The kernel does not use any keys besides IA so we don't need to
install IB/DA/DB/GA on kernel exit if we arrange to install them
on task switch instead, which we can expect to happen an order of
magnitude less often.

Furthermore we can avoid installing the user IA in the case where the
user task has IA disabled and just leave the kernel IA installed. This
also lets us avoid needing to install IA on kernel entry.

On an Apple M1 under a hypervisor, the overhead of kernel entry/exit
has been measured to be reduced by 15.6ns in the case where IA is
enabled, and 31.9ns in the case where IA is disabled.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Ieddf6b580d23c9e0bed45a822dabe72d2ffc9a8e
---
 arch/arm64/include/asm/asm_pointer_auth.h | 20 +------------
 arch/arm64/include/asm/pointer_auth.h     | 36 ++++++++++++++++++-----
 arch/arm64/kernel/asm-offsets.c           |  4 ---
 arch/arm64/kernel/entry.S                 | 33 ++++++++++++---------
 arch/arm64/kernel/pointer_auth.c          |  1 +
 arch/arm64/kernel/process.c               |  1 +
 arch/arm64/kernel/suspend.c               |  3 +-
 7 files changed, 53 insertions(+), 45 deletions(-)
  

Comments

Will Deacon Jan. 26, 2021, 1:09 p.m. UTC | #1
On Tue, Dec 29, 2020 at 10:59:15PM -0800, Peter Collingbourne wrote:
> The kernel does not use any keys besides IA so we don't need to
> install IB/DA/DB/GA on kernel exit if we arrange to install them
> on task switch instead, which we can expect to happen an order of
> magnitude less often.
> 
> Furthermore we can avoid installing the user IA in the case where the
> user task has IA disabled and just leave the kernel IA installed. This
> also lets us avoid needing to install IA on kernel entry.

I've got to be honest, this makes me nervous in case there is a way for
userspace to recover the kernel key even though EnIA is clear. Currently,
EnIA doesn't affect XPAC* and PACGA instructions, and the architecture
clearly expects us to be switching these things:

  | Note
  | Keys are not banked by Exception level. Arm expects software to switch the
  | keys between Exception levels, typically by swapping the values with zero
  | so that the current key values are not present in memo

But then:

> On an Apple M1 under a hypervisor, the overhead of kernel entry/exit
> has been measured to be reduced by 15.6ns in the case where IA is
> enabled, and 31.9ns in the case where IA is disabled.

That's a good improvement, so this feels like its worth doing. I suppose all we
can do is keep an eye on the architecture in case any future extensions mean
the approach taken here is dangerous.

Will
  
Peter Collingbourne Feb. 12, 2021, 5:01 a.m. UTC | #2
On Tue, Jan 26, 2021 at 5:09 AM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Dec 29, 2020 at 10:59:15PM -0800, Peter Collingbourne wrote:
> > The kernel does not use any keys besides IA so we don't need to
> > install IB/DA/DB/GA on kernel exit if we arrange to install them
> > on task switch instead, which we can expect to happen an order of
> > magnitude less often.
> >
> > Furthermore we can avoid installing the user IA in the case where the
> > user task has IA disabled and just leave the kernel IA installed. This
> > also lets us avoid needing to install IA on kernel entry.
>
> I've got to be honest, this makes me nervous in case there is a way for
> userspace to recover the kernel key even though EnIA is clear. Currently,
> EnIA doesn't affect XPAC* and PACGA instructions, and the architecture

For GA I would expect it to be controlled by a hypothetical EnGA, not
by EnIA (and I'm a bit surprised that there isn't an EnGA; doesn't it
mean that a userspace program running under an unaware kernel or
hypervisor may sign things using the GA from potentially another
hypervisor guest?)

> clearly expects us to be switching these things:
>
>   | Note
>   | Keys are not banked by Exception level. Arm expects software to switch the
>   | keys between Exception levels, typically by swapping the values with zero
>   | so that the current key values are not present in memo
>
> But then:
>
> > On an Apple M1 under a hypervisor, the overhead of kernel entry/exit
> > has been measured to be reduced by 15.6ns in the case where IA is
> > enabled, and 31.9ns in the case where IA is disabled.
>
> That's a good improvement, so this feels like its worth doing. I suppose all we
> can do is keep an eye on the architecture in case any future extensions mean
> the approach taken here is dangerous.

Right. I think it makes sense for any future extensions not to expose
a key in any way if the corresponding key enable bit is clear (to
avoid the potential problem that I highlighted with GA above) unless
the operating system has explicitly opted into such behavior, e.g. by
setting a separate bit in SCTLR_EL1.

Peter
  
James Morse Feb. 12, 2021, 11:01 a.m. UTC | #3
Hi Peter,

On 12/02/2021 05:01, Peter Collingbourne wrote:
> On Tue, Jan 26, 2021 at 5:09 AM Will Deacon <will@kernel.org> wrote:
>>
>> On Tue, Dec 29, 2020 at 10:59:15PM -0800, Peter Collingbourne wrote:
>>> The kernel does not use any keys besides IA so we don't need to
>>> install IB/DA/DB/GA on kernel exit if we arrange to install them
>>> on task switch instead, which we can expect to happen an order of
>>> magnitude less often.
>>>
>>> Furthermore we can avoid installing the user IA in the case where the
>>> user task has IA disabled and just leave the kernel IA installed. This
>>> also lets us avoid needing to install IA on kernel entry.
>>
>> I've got to be honest, this makes me nervous in case there is a way for
>> userspace to recover the kernel key even though EnIA is clear. Currently,
>> EnIA doesn't affect XPAC* and PACGA instructions, and the architecture

> For GA I would expect it to be controlled by a hypothetical EnGA, not
> by EnIA (and I'm a bit surprised that there isn't an EnGA;

PACGA is undefined if the CPU doesn't implement PAC, whereas PACIASP is a NOP if the CPU
doesn't implement PAC.

I think the reason from the SCTLR_ELx controls is to make unaware systems transform the
instructions that were hints back into hints. (e.g. the AddPACIA psuedo code). This is
needed on mismatched big-little systems, otherwise processes can't be migrated between them.

For the non-hint instructions, user-space needs to test the hwcap/id-register-emulation to
know it can use these instructions, and the compiler shouldn't output them unconditionally.


> doesn't it
> mean that a userspace program running under an unaware kernel or
> hypervisor may sign things using the GA from potentially another
> hypervisor guest?)

The hypervisor controls all this with HCR_EL2.API, which also traps PACGA et al.
For the hypervisor its all or nothing.
If the hypervisor is emulating a machine without PAC, it can emulate an undefined
exception regardless of whether the CPU supports PAC or not.

Does this match your reading?


Thanks,

James
  
Peter Collingbourne Feb. 12, 2021, 6:20 p.m. UTC | #4
On Fri, Feb 12, 2021 at 3:01 AM James Morse <james.morse@arm.com> wrote:
>
> Hi Peter,
>
> On 12/02/2021 05:01, Peter Collingbourne wrote:
> > On Tue, Jan 26, 2021 at 5:09 AM Will Deacon <will@kernel.org> wrote:
> >>
> >> On Tue, Dec 29, 2020 at 10:59:15PM -0800, Peter Collingbourne wrote:
> >>> The kernel does not use any keys besides IA so we don't need to
> >>> install IB/DA/DB/GA on kernel exit if we arrange to install them
> >>> on task switch instead, which we can expect to happen an order of
> >>> magnitude less often.
> >>>
> >>> Furthermore we can avoid installing the user IA in the case where the
> >>> user task has IA disabled and just leave the kernel IA installed. This
> >>> also lets us avoid needing to install IA on kernel entry.
> >>
> >> I've got to be honest, this makes me nervous in case there is a way for
> >> userspace to recover the kernel key even though EnIA is clear. Currently,
> >> EnIA doesn't affect XPAC* and PACGA instructions, and the architecture
>
> > For GA I would expect it to be controlled by a hypothetical EnGA, not
> > by EnIA (and I'm a bit surprised that there isn't an EnGA;
>
> PACGA is undefined if the CPU doesn't implement PAC, whereas PACIASP is a NOP if the CPU
> doesn't implement PAC.
>
> I think the reason from the SCTLR_ELx controls is to make unaware systems transform the
> instructions that were hints back into hints. (e.g. the AddPACIA psuedo code). This is
> needed on mismatched big-little systems, otherwise processes can't be migrated between them.

It's needed for more than that, see the history of my
PR_PAC_SET_ENABLED_KEYS patch, in particular [1].

> For the non-hint instructions, user-space needs to test the hwcap/id-register-emulation to
> know it can use these instructions, and the compiler shouldn't output them unconditionally.

Right, unless the target is known to support them.

> > doesn't it
> > mean that a userspace program running under an unaware kernel or
> > hypervisor may sign things using the GA from potentially another
> > hypervisor guest?)
>
> The hypervisor controls all this with HCR_EL2.API, which also traps PACGA et al.
> For the hypervisor its all or nothing.
> If the hypervisor is emulating a machine without PAC, it can emulate an undefined
> exception regardless of whether the CPU supports PAC or not.
>
> Does this match your reading?

I think that's right. So an unaware hypervisor would set API to 0 and
none of the guests would be able to use the authentication
instructions. Since it looks like API=0 would make the hint-space
instructions trap as well, I would imagine that a hypervisor would
need to emulate them as no-ops if it's emulating a machine without
PAC.

Peter

[1] https://www.spinics.net/lists/arm-kernel/msg830889.html
  

Patch

diff --git a/arch/arm64/include/asm/asm_pointer_auth.h b/arch/arm64/include/asm/asm_pointer_auth.h
index 52dead2a8640..8ca2dc0661ee 100644
--- a/arch/arm64/include/asm/asm_pointer_auth.h
+++ b/arch/arm64/include/asm/asm_pointer_auth.h
@@ -13,30 +13,12 @@ 
  * so use the base value of ldp as thread.keys_user and offset as
  * thread.keys_user.ap*.
  */
-	.macro ptrauth_keys_install_user tsk, tmp1, tmp2, tmp3
+	.macro __ptrauth_keys_install_user tsk, tmp1, tmp2, tmp3
 	mov	\tmp1, #THREAD_KEYS_USER
 	add	\tmp1, \tsk, \tmp1
-alternative_if_not ARM64_HAS_ADDRESS_AUTH
-	b	.Laddr_auth_skip_\@
-alternative_else_nop_endif
 	ldp	\tmp2, \tmp3, [\tmp1, #PTRAUTH_USER_KEY_APIA]
 	msr_s	SYS_APIAKEYLO_EL1, \tmp2
 	msr_s	SYS_APIAKEYHI_EL1, \tmp3
-	ldp	\tmp2, \tmp3, [\tmp1, #PTRAUTH_USER_KEY_APIB]
-	msr_s	SYS_APIBKEYLO_EL1, \tmp2
-	msr_s	SYS_APIBKEYHI_EL1, \tmp3
-	ldp	\tmp2, \tmp3, [\tmp1, #PTRAUTH_USER_KEY_APDA]
-	msr_s	SYS_APDAKEYLO_EL1, \tmp2
-	msr_s	SYS_APDAKEYHI_EL1, \tmp3
-	ldp	\tmp2, \tmp3, [\tmp1, #PTRAUTH_USER_KEY_APDB]
-	msr_s	SYS_APDBKEYLO_EL1, \tmp2
-	msr_s	SYS_APDBKEYHI_EL1, \tmp3
-.Laddr_auth_skip_\@:
-alternative_if ARM64_HAS_GENERIC_AUTH
-	ldp	\tmp2, \tmp3, [\tmp1, #PTRAUTH_USER_KEY_APGA]
-	msr_s	SYS_APGAKEYLO_EL1, \tmp2
-	msr_s	SYS_APGAKEYHI_EL1, \tmp3
-alternative_else_nop_endif
 	.endm
 
 	.macro __ptrauth_keys_install_kernel_nosync tsk, tmp1, tmp2, tmp3
diff --git a/arch/arm64/include/asm/pointer_auth.h b/arch/arm64/include/asm/pointer_auth.h
index 1a85e25d98ba..3e40fc776ea3 100644
--- a/arch/arm64/include/asm/pointer_auth.h
+++ b/arch/arm64/include/asm/pointer_auth.h
@@ -35,6 +35,25 @@  struct ptrauth_keys_kernel {
 	struct ptrauth_key apia;
 };
 
+#define __ptrauth_key_install_nosync(k, v)			\
+do {								\
+	struct ptrauth_key __pki_v = (v);			\
+	write_sysreg_s(__pki_v.lo, SYS_ ## k ## KEYLO_EL1);	\
+	write_sysreg_s(__pki_v.hi, SYS_ ## k ## KEYHI_EL1);	\
+} while (0)
+
+static inline void ptrauth_keys_install_user(struct ptrauth_keys_user *keys)
+{
+	if (system_supports_address_auth()) {
+		__ptrauth_key_install_nosync(APIB, keys->apib);
+		__ptrauth_key_install_nosync(APDA, keys->apda);
+		__ptrauth_key_install_nosync(APDB, keys->apdb);
+	}
+
+	if (system_supports_generic_auth())
+		__ptrauth_key_install_nosync(APGA, keys->apga);
+}
+
 static inline void ptrauth_keys_init_user(struct ptrauth_keys_user *keys)
 {
 	if (system_supports_address_auth()) {
@@ -46,14 +65,9 @@  static inline void ptrauth_keys_init_user(struct ptrauth_keys_user *keys)
 
 	if (system_supports_generic_auth())
 		get_random_bytes(&keys->apga, sizeof(keys->apga));
-}
 
-#define __ptrauth_key_install_nosync(k, v)			\
-do {								\
-	struct ptrauth_key __pki_v = (v);			\
-	write_sysreg_s(__pki_v.lo, SYS_ ## k ## KEYLO_EL1);	\
-	write_sysreg_s(__pki_v.hi, SYS_ ## k ## KEYHI_EL1);	\
-} while (0)
+	ptrauth_keys_install_user(keys);
+}
 
 static __always_inline void ptrauth_keys_init_kernel(struct ptrauth_keys_kernel *keys)
 {
@@ -81,6 +95,9 @@  static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr)
 	return ptrauth_clear_pac(ptr);
 }
 
+#define ptrauth_suspend_exit()                                                 \
+	ptrauth_keys_install_user(&current->thread.keys_user)
+
 #define ptrauth_thread_init_user()                                             \
 	do {                                                                   \
 		ptrauth_keys_init_user(&current->thread.keys_user);            \
@@ -92,6 +109,9 @@  static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr)
 					   SCTLR_ELx_ENDA | SCTLR_ELx_ENDB);   \
 	} while (0)
 
+#define ptrauth_thread_switch_user(tsk)                                        \
+	ptrauth_keys_install_user(&(tsk)->thread.keys_user)
+
 #define ptrauth_thread_init_kernel(tsk)					\
 	ptrauth_keys_init_kernel(&(tsk)->thread.keys_kernel)
 #define ptrauth_thread_switch_kernel(tsk)				\
@@ -102,8 +122,10 @@  static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr)
 #define ptrauth_set_enabled_keys(tsk, keys, enabled)	(-EINVAL)
 #define ptrauth_get_enabled_keys(tsk)	(-EINVAL)
 #define ptrauth_strip_insn_pac(lr)	(lr)
+#define ptrauth_suspend_exit()
 #define ptrauth_thread_init_user()
 #define ptrauth_thread_init_kernel(tsk)
+#define ptrauth_thread_switch_user(tsk)
 #define ptrauth_thread_switch_kernel(tsk)
 #endif /* CONFIG_ARM64_PTR_AUTH */
 
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index dc0907a0240c..2bc21a71abcf 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -147,10 +147,6 @@  int main(void)
 #endif
 #ifdef CONFIG_ARM64_PTR_AUTH
   DEFINE(PTRAUTH_USER_KEY_APIA,		offsetof(struct ptrauth_keys_user, apia));
-  DEFINE(PTRAUTH_USER_KEY_APIB,		offsetof(struct ptrauth_keys_user, apib));
-  DEFINE(PTRAUTH_USER_KEY_APDA,		offsetof(struct ptrauth_keys_user, apda));
-  DEFINE(PTRAUTH_USER_KEY_APDB,		offsetof(struct ptrauth_keys_user, apdb));
-  DEFINE(PTRAUTH_USER_KEY_APGA,		offsetof(struct ptrauth_keys_user, apga));
   DEFINE(PTRAUTH_KERNEL_KEY_APIA,	offsetof(struct ptrauth_keys_kernel, apia));
   BLANK();
 #endif
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 33037121fd0d..933db3f30f7a 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -247,21 +247,26 @@  alternative_else_nop_endif
 	check_mte_async_tcf x19, x22
 	apply_ssbd 1, x22, x23
 
-	ptrauth_keys_install_kernel_nosync tsk, x20, x22, x23
-
 #ifdef CONFIG_ARM64_PTR_AUTH
 alternative_if ARM64_HAS_ADDRESS_AUTH
 	/*
 	 * Enable IA for in-kernel PAC if the task had it disabled. Although
 	 * this could be implemented with an unconditional MRS which would avoid
 	 * a load, this was measured to be slower on Cortex-A75 and Cortex-A76.
+	 *
+	 * Install the kernel IA key only if IA was enabled in the task. If IA
+	 * was disabled on kernel exit then we would have left the kernel IA
+	 * installed so there is no need to install it again.
 	 */
 	ldr	x0, [tsk, THREAD_SCTLR_USER]
-	tbnz	x0, SCTLR_ELx_ENIA_SHIFT, 1f
+	tbz	x0, SCTLR_ELx_ENIA_SHIFT, 1f
+	__ptrauth_keys_install_kernel_nosync tsk, x20, x22, x23
+	b	2f
+1:
 	mrs	x0, sctlr_el1
 	orr	x0, x0, SCTLR_ELx_ENIA
 	msr	sctlr_el1, x0
-1:
+2:
 	isb
 alternative_else_nop_endif
 #endif
@@ -368,24 +373,24 @@  alternative_else_nop_endif
 3:
 	scs_save tsk, x0
 
-	/*
-	 * No kernel C function calls after this as user keys are set and IA may
-	 * be disabled.
-	 */
-	ptrauth_keys_install_user tsk, x0, x1, x2
-
 #ifdef CONFIG_ARM64_PTR_AUTH
 alternative_if ARM64_HAS_ADDRESS_AUTH
 	/*
-	 * IA was enabled for in-kernel PAC. Disable it now if needed.
-	 * All other per-task SCTLR bits were updated on task switch.
+	 * IA was enabled for in-kernel PAC. Disable it now if needed, or
+	 * alternatively install the user's IA. All other per-task keys and
+	 * SCTLR bits were updated on task switch.
+	 *
+	 * No kernel C function calls after this.
 	 */
 	ldr	x0, [tsk, THREAD_SCTLR_USER]
-	tbnz	x0, SCTLR_ELx_ENIA_SHIFT, 1f
+	tbz	x0, SCTLR_ELx_ENIA_SHIFT, 1f
+	__ptrauth_keys_install_user tsk, x0, x1, x2
+	b	2f
+1:
 	mrs	x0, sctlr_el1
 	bic	x0, x0, SCTLR_ELx_ENIA
 	msr	sctlr_el1, x0
-1:
+2:
 alternative_else_nop_endif
 #endif
 
diff --git a/arch/arm64/kernel/pointer_auth.c b/arch/arm64/kernel/pointer_auth.c
index f03e5bfe4490..60901ab0a7fe 100644
--- a/arch/arm64/kernel/pointer_auth.c
+++ b/arch/arm64/kernel/pointer_auth.c
@@ -43,6 +43,7 @@  int ptrauth_prctl_reset_keys(struct task_struct *tsk, unsigned long arg)
 		get_random_bytes(&keys->apdb, sizeof(keys->apdb));
 	if (arg & PR_PAC_APGAKEY)
 		get_random_bytes(&keys->apga, sizeof(keys->apga));
+	ptrauth_keys_install_user(keys);
 
 	return 0;
 }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index a6b0edc67e41..c978876ce07f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -570,6 +570,7 @@  __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	entry_task_switch(next);
 	ssbs_thread_switch(next);
 	erratum_1418040_thread_switch(prev, next);
+	ptrauth_thread_switch_user(next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index a67b37a7a47e..c5a6614a41f9 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -74,8 +74,9 @@  void notrace __cpu_suspend_exit(void)
 	 */
 	spectre_v4_enable_mitigation(NULL);
 
-	/* Restore additional MTE-specific configuration */
+	/* Restore additional feature-specific configuration */
 	mte_suspend_exit();
+	ptrauth_suspend_exit();
 }
 
 /*