diff mbox series

[v3,1/3] x86: Implement arch_prctl(ARCH_VSYSCALL_CONTROL) to disable vsyscall

Message ID 3a1c8280967b491bf6917a18fbff6c9b52e8df24.1641398395.git.fweimer@redhat.com
State Not Applicable
Headers show
Series [v3,1/3] x86: Implement arch_prctl(ARCH_VSYSCALL_CONTROL) to disable vsyscall | expand

Checks

Context Check Description
dj/TryBot-apply_patch fail Patch failed to apply to master at the time it was sent

Commit Message

Florian Weimer Jan. 5, 2022, 4:02 p.m. UTC
Distributions struggle with changing the default for vsyscall
emulation because it is a clear break of userspace ABI, something
that should not happen.

The legacy vsyscall interface is supposed to be used by libcs only,
not by applications.  This commit adds a new arch_prctl request,
ARCH_VSYSCALL_CONTROL, with one argument.  If the argument is 0,
executing vsyscalls will cause the process to terminate.  Argument 1
turns vsyscall back on (this is mostly for a largely theoretical
CRIU use case).

Newer libcs can use a zero ARCH_VSYSCALL_CONTROL at startup to disable
vsyscall for the process.  Legacy libcs do not perform this call, so
vsyscall remains enabled for them.  This approach should achieves
backwards compatibility (perfect compatibility if the assumption that
only libcs use vsyscall is accurate), and it provides full hardening
for new binaries.

The chosen value of ARCH_VSYSCALL_CONTROL should avoid conflicts
with other x86-64 arch_prctl requests.  The fact that with
vsyscall=emulate, reading the vsyscall region is still possible
even after a zero ARCH_VSYSCALL_CONTROL is considered limitation
in the current implementation and may change in a future kernel
version.

Future arch_prctls requests commonly used at process startup can imply
ARCH_VSYSCALL_CONTROL with a zero argument, so that a separate system
call for disabling vsyscall is avoided.

Signed-off-by: Florian Weimer <fweimer@redhat.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
---
v3: Remove warning log message.  Split out test.
v2: ARCH_VSYSCALL_CONTROL instead of ARCH_VSYSCALL_LOCKOUT.  New tests
    for the toggle behavior.  Implement hiding [vsyscall] in
    /proc/PID/maps and test it.  Various other test fixes cleanups
    (e.g., fixed missing second argument to gettimeofday).

arch/x86/entry/vsyscall/vsyscall_64.c | 7 ++++++-
 arch/x86/include/asm/mmu.h            | 6 ++++++
 arch/x86/include/uapi/asm/prctl.h     | 2 ++
 arch/x86/kernel/process_64.c          | 7 +++++++
 4 files changed, 21 insertions(+), 1 deletion(-)


base-commit: c9e6606c7fe92b50a02ce51dda82586ebdf99b48

Comments

Florian Weimer Jan. 13, 2022, 5:27 p.m. UTC | #1
* Florian Weimer:

> Distributions struggle with changing the default for vsyscall
> emulation because it is a clear break of userspace ABI, something
> that should not happen.
>
> The legacy vsyscall interface is supposed to be used by libcs only,
> not by applications.  This commit adds a new arch_prctl request,
> ARCH_VSYSCALL_CONTROL, with one argument.  If the argument is 0,
> executing vsyscalls will cause the process to terminate.  Argument 1
> turns vsyscall back on (this is mostly for a largely theoretical
> CRIU use case).
>
> Newer libcs can use a zero ARCH_VSYSCALL_CONTROL at startup to disable
> vsyscall for the process.  Legacy libcs do not perform this call, so
> vsyscall remains enabled for them.  This approach should achieves
> backwards compatibility (perfect compatibility if the assumption that
> only libcs use vsyscall is accurate), and it provides full hardening
> for new binaries.
>
> The chosen value of ARCH_VSYSCALL_CONTROL should avoid conflicts
> with other x86-64 arch_prctl requests.  The fact that with
> vsyscall=emulate, reading the vsyscall region is still possible
> even after a zero ARCH_VSYSCALL_CONTROL is considered limitation
> in the current implementation and may change in a future kernel
> version.
>
> Future arch_prctls requests commonly used at process startup can imply
> ARCH_VSYSCALL_CONTROL with a zero argument, so that a separate system
> call for disabling vsyscall is avoided.
>
> Signed-off-by: Florian Weimer <fweimer@redhat.com>
> Acked-by: Andrei Vagin <avagin@gmail.com>
> ---
> v3: Remove warning log message.  Split out test.
> v2: ARCH_VSYSCALL_CONTROL instead of ARCH_VSYSCALL_LOCKOUT.  New tests
>     for the toggle behavior.  Implement hiding [vsyscall] in
>     /proc/PID/maps and test it.  Various other test fixes cleanups
>     (e.g., fixed missing second argument to gettimeofday).
>
> arch/x86/entry/vsyscall/vsyscall_64.c | 7 ++++++-
>  arch/x86/include/asm/mmu.h            | 6 ++++++
>  arch/x86/include/uapi/asm/prctl.h     | 2 ++
>  arch/x86/kernel/process_64.c          | 7 +++++++
>  4 files changed, 21 insertions(+), 1 deletion(-)

Hello,

sorry to bother you again.  What can I do to move this forward?

Thanks,
Florian
Andy Lutomirski Jan. 13, 2022, 9:47 p.m. UTC | #2
On 1/5/22 08:02, Florian Weimer wrote:
> Distributions struggle with changing the default for vsyscall
> emulation because it is a clear break of userspace ABI, something
> that should not happen.
> 
> The legacy vsyscall interface is supposed to be used by libcs only,
> not by applications.  This commit adds a new arch_prctl request,
> ARCH_VSYSCALL_CONTROL, with one argument.  If the argument is 0,
> executing vsyscalls will cause the process to terminate.  Argument 1
> turns vsyscall back on (this is mostly for a largely theoretical
> CRIU use case).
> 
> Newer libcs can use a zero ARCH_VSYSCALL_CONTROL at startup to disable
> vsyscall for the process.  Legacy libcs do not perform this call, so
> vsyscall remains enabled for them.  This approach should achieves
> backwards compatibility (perfect compatibility if the assumption that
> only libcs use vsyscall is accurate), and it provides full hardening
> for new binaries.
> 
> The chosen value of ARCH_VSYSCALL_CONTROL should avoid conflicts
> with other x86-64 arch_prctl requests.  The fact that with
> vsyscall=emulate, reading the vsyscall region is still possible
> even after a zero ARCH_VSYSCALL_CONTROL is considered limitation
> in the current implementation and may change in a future kernel
> version.
> 
> Future arch_prctls requests commonly used at process startup can imply
> ARCH_VSYSCALL_CONTROL with a zero argument, so that a separate system
> call for disabling vsyscall is avoided.
> 
> Signed-off-by: Florian Weimer <fweimer@redhat.com>
> Acked-by: Andrei Vagin <avagin@gmail.com>
> ---
> v3: Remove warning log message.  Split out test.
> v2: ARCH_VSYSCALL_CONTROL instead of ARCH_VSYSCALL_LOCKOUT.  New tests
>      for the toggle behavior.  Implement hiding [vsyscall] in
>      /proc/PID/maps and test it.  Various other test fixes cleanups
>      (e.g., fixed missing second argument to gettimeofday).
> 
> arch/x86/entry/vsyscall/vsyscall_64.c | 7 ++++++-
>   arch/x86/include/asm/mmu.h            | 6 ++++++
>   arch/x86/include/uapi/asm/prctl.h     | 2 ++
>   arch/x86/kernel/process_64.c          | 7 +++++++
>   4 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
> index fd2ee9408e91..6fc524b9f232 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -174,6 +174,9 @@ bool emulate_vsyscall(unsigned long error_code,
>   
>   	tsk = current;
>   
> +	if (tsk->mm->context.vsyscall_disabled)
> +		goto sigsegv;
> +

Is there a reason you didn't just change the check earlier in the 
function to:

if (vsyscall_mode == NONE || current->mm->context.vsyscall_disabled)

Also, I still think the prctl should not be available if 
vsyscall=emulate.  Either we should fully implement it or we should not 
implement.  We could even do:

pr_warn_once("userspace vsyscall hardening request ignored because you 
have vsyscall=emulate.  Unless you absolutely need vsyscall=emulate, 
update your system to use vsyscall=xonly.\n");

and thus encourage good behavior.

--Andy
Florian Weimer Jan. 14, 2022, 1:36 p.m. UTC | #3
* Andy Lutomirski:

> Is there a reason you didn't just change the check earlier in the
> function to:
>
> if (vsyscall_mode == NONE || current->mm->context.vsyscall_disabled)

Andrei requested that I don't print anything if vsyscall was disabled.

The original patch used a different message for better diagnostics.

> Also, I still think the prctl should not be available if
> vsyscall=emulate.  Either we should fully implement it or we should
> not implement.  We could even do:
>
> pr_warn_once("userspace vsyscall hardening request ignored because you
> have vsyscall=emulate.  Unless you absolutely need vsyscall=emulate, 
> update your system to use vsyscall=xonly.\n");
>
> and thus encourage good behavior.

I think there is still some hardening applied even with
vsyscall=emulate.  The question is what is more important: the
additional hardening, or clean, easily described behavior of the
interface.

Maybe ARCH_VSYSCALL_CONTROL could return different values based on to
what degree it could disable vsyscall?

The pr_warn_once does not seem particularly useful.  Anyone who upgrades
glibc and still uses vsyscall=emulate will see that, with no way to
disable it.

Thanks,
Florian
diff mbox series

Patch

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index fd2ee9408e91..6fc524b9f232 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -174,6 +174,9 @@  bool emulate_vsyscall(unsigned long error_code,
 
 	tsk = current;
 
+	if (tsk->mm->context.vsyscall_disabled)
+		goto sigsegv;
+
 	/*
 	 * Check for access_ok violations and find the syscall nr.
 	 *
@@ -316,8 +319,10 @@  static struct vm_area_struct gate_vma __ro_after_init = {
 
 struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
 {
+	if (!mm || mm->context.vsyscall_disabled)
+		return NULL;
 #ifdef CONFIG_COMPAT
-	if (!mm || !(mm->context.flags & MM_CONTEXT_HAS_VSYSCALL))
+	if (!(mm->context.flags & MM_CONTEXT_HAS_VSYSCALL))
 		return NULL;
 #endif
 	if (vsyscall_mode == NONE)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 5d7494631ea9..3934d6907910 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -41,6 +41,12 @@  typedef struct {
 #ifdef CONFIG_X86_64
 	unsigned short flags;
 #endif
+#ifdef CONFIG_X86_VSYSCALL_EMULATION
+	/*
+	 * Changed by arch_prctl(ARCH_VSYSCALL_CONTROL).
+	 */
+	bool vsyscall_disabled;
+#endif
 
 	struct mutex lock;
 	void __user *vdso;			/* vdso base address */
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 754a07856817..aad0bcfbf49f 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -18,4 +18,6 @@ 
 #define ARCH_MAP_VDSO_32	0x2002
 #define ARCH_MAP_VDSO_64	0x2003
 
+#define ARCH_VSYSCALL_CONTROL	0x5001
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3402edec236c..834bad068211 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -816,6 +816,13 @@  long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
+#ifdef CONFIG_X86_VSYSCALL_EMULATION
+	case ARCH_VSYSCALL_CONTROL:
+		if (unlikely(arg2 > 1))
+			return -EINVAL;
+		current->mm->context.vsyscall_disabled = !arg2;
+		break;
+#endif
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
 # ifdef CONFIG_X86_X32_ABI