Kernel prctl feature for syscall interception and emulation

  Hi,

I'm proposing a kernel patch for a feature I'm calling Syscall User
Dispatch (SUD).  It is a mechanism to efficiently redirect system calls
of only part of a binary back to userspace to be emulated by a
compatibility layer.  The patchset is close to being accepted, but
Florian suggested the feature might pose some constraints on glibc, and
requested I raise the discussion here.

The problem I am trying to solve is that modern Windows games running
over Wine are issuing Windows system calls directly from the Windows
code, without going through the "WinAPI", which doesn't give Wine a
chance to emulate the library calls and implement the behavior.  As a
result, Windows syscalls reach the Linux kernel, and the kernel has
no context to differentiate them from native syscalls coming from the
Wine side, since it cannot trust the ABI or even expect the syscall
numbers to be sane.  Historically, Windows applications were very
respectful of the WinAPI and did not bypass it, but we are seeing
modern applications like games doing so more often for reasons, I
believe, of DRM.

It is worth mentioning that, by design, Wine and the Windows application
run in the same process space, so we really cannot just filter specific
threads or the entire application. We need some kind of filter executed
on each system call.

Now, the obvious way to solve this problem would be cBPF filtering
memory regions, through Seccomp.  The main problem with that approach is
the performance of executing a large cBPF filter.  The goal is to run
games, and we observed the Seccomp filter become a bottleneck, since we
have many, many memory areas that need to be checked by cBPF.  In
addition, seccomp, as a security mechanism, doesn't support some filter
update operations, like removing them.  Other approaches were
explored, like making a new mode out of seccomp, but the kernel
community preferred to make it a separate, self-contained mechanism.
Other solutions, like (live) patching the Windows application, are out
of the question, as they would trip DRM and anti-cheat protection
mechanisms.
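
As a rough illustration of the cost argument above, here is a plain C
model (not actual cBPF) of the work a region-based filter has to do on
every syscall: compare the calling instruction pointer against each
native memory area in turn.  cBPF has no loops, so a real filter would
be an unrolled chain of comparisons, but the per-syscall cost still
grows with the number of areas.  The struct and function names are
made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One native (non-Windows) memory area, as the half-open range [start, end). */
struct region {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Return true if the instruction pointer falls inside any listed region.
 * The scan is linear: in the worst case every region is checked once
 * per syscall, which is the overhead described above.
 */
bool ip_in_regions(uintptr_t ip, const struct region *regions, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ip >= regions[i].start && ip < regions[i].end)
			return true;
	}
	return false;
}
```

With many regions, this check runs in full for every syscall issued
from outside them, whereas a single per-thread switch costs one memory
read.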

The SUD interface I proposed to the kernel community is self-contained
and exposed as a prctl option.  It lets userspace define a switch
variable per-thread that, when set, will raise a SIGSYS for any syscall
attempted.  The idea is that Wine can just flip this switch efficiently
before delivering control to the Windows portions of the binary, and
flip it back off when it needs to execute native syscalls.  It is
important for us that the switch flip doesn't require a syscall, for
performance reasons.  The interface also lets userspace define a
"dispatcher region" from where any syscalls are always executed,
regardless of the selector variable.  This is important for the return
of the SIGSYS directly to a Windows segment, where we need to execute
the signal return trampoline with the selector blocked.  Ideally, Wine
would simply define this dispatcher region as the entire libc code
segment, and just use the selector to safeguard against Linux libraries
issuing syscalls by themselves (they exist).
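
For illustration, a minimal sketch of how a thread might enable SUD and
flip the selector.  It assumes the constant names and values found in
the mainline linux/prctl.h (PR_SET_SYSCALL_USER_DISPATCH,
PR_SYS_DISPATCH_ON/OFF, SYSCALL_DISPATCH_FILTER_ALLOW/BLOCK), which may
differ from the patch revision discussed here.  sud_demo() and
on_sigsys() are hypothetical names, and the dispatcher region is left
empty (offset 0, length 0) instead of covering libc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for toolchain headers that predate the feature (assumed
 * to match the mainline uapi values). */
#ifndef PR_SET_SYSCALL_USER_DISPATCH
#define PR_SET_SYSCALL_USER_DISPATCH 59
#endif
#ifndef PR_SYS_DISPATCH_OFF
#define PR_SYS_DISPATCH_OFF 0
#endif
#ifndef PR_SYS_DISPATCH_ON
#define PR_SYS_DISPATCH_ON 1
#endif
#ifndef SYSCALL_DISPATCH_FILTER_ALLOW
#define SYSCALL_DISPATCH_FILTER_ALLOW 0
#define SYSCALL_DISPATCH_FILTER_BLOCK 1
#endif

/* Per-thread switch read by the kernel on each syscall entry. */
static volatile char selector = SYSCALL_DISPATCH_FILTER_ALLOW;
static volatile sig_atomic_t intercepted_nr = -1;

static void on_sigsys(int sig, siginfo_t *info, void *uctx)
{
	(void)sig;
	(void)uctx;
	/* Unblock first: returning from the handler issues sigreturn,
	 * which would itself be trapped with the selector blocked. */
	selector = SYSCALL_DISPATCH_FILTER_ALLOW;
	intercepted_nr = info->si_syscall;
}

/* Returns 0 if a blocked syscall was intercepted, 1 if the kernel
 * rejected the prctl (e.g. no SUD support), -1 on any other failure. */
int sud_demo(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigsys;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSYS, &sa, NULL) != 0)
		return -1;

	/* Enable dispatch with an empty dispatcher region. */
	if (prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON,
		  0UL, 0UL, &selector) != 0)
		return 1;

	/* Flip the switch without a syscall, then attempt one: the
	 * kernel raises SIGSYS instead of executing it. */
	selector = SYSCALL_DISPATCH_FILTER_BLOCK;
	syscall(SYS_getpid);

	/* The handler left the selector at ALLOW, so this goes through. */
	prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_OFF,
	      0UL, 0UL, 0UL);

	return intercepted_nr == SYS_getpid ? 0 : -1;
}
```

A compatibility layer would emulate the call inside the handler
instead of merely recording its number, and would pass the libc text
range as offset/length so that native code there is never trapped.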

I think my questions to libc are: what are the constraints, if any, that
libc would face with this new interface?  I expected this to be
completely invisible to libc.  In addition, are there any problems you
foresee with the current interface?

Finally, I don't think it makes sense to bother you immediately with
the kernel implementation patches, but if you want to see them,
they are archived in the link below.  I can also share them directly on
this ML if you request it.

  https://lkml.org/lkml/2020/11/17/2347

Nevertheless, I think it is useful to share the final patch, which
contains the in-tree documentation for the interface; I have inlined it
in this message.

Thanks.

-- >8 --
Subject: [PATCH v7 7/7] docs: Document Syscall User Dispatch

Explain the interface, provide some background and security notes.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 .../admin-guide/syscall-user-dispatch.rst     | 87 +++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 Documentation/admin-guide/syscall-user-dispatch.rst

Message ID	873616v6g9.fsf@collabora.com
State	Not applicable
Date	Wed, 18 Nov 2020 13:57:26 -0500
From	Gabriel Krisman Bertazi via Libc-alpha <libc-alpha@sourceware.org>
Reply-To	Gabriel Krisman Bertazi <krisman@collabora.com>
To	libc-alpha@sourceware.org
Cc	Florian Weimer <fw@deneb.enyo.de>, linux-kernel@vger.kernel.org