x86: Wire up 32-bit direct socket calls

Message ID cb5138299d37d5800e2d135b01a7667fa6115854.1436912629.git.luto@kernel.org
State Not applicable

Commit Message

Andy Lutomirski July 14, 2015, 10:24 p.m. UTC
  On x86_64, there's no socketcall syscall; instead all of the socket
calls are real syscalls.  For 32-bit programs, we're stuck offering
the socketcall syscall, but it would be nice to expose the direct
calls as well.  This will enable seccomp to filter socket calls (for
new userspace only, but that's fine for some applications) and it
will provide a tiny performance boost.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
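
To illustrate the seccomp point: a seccomp BPF filter sees only the syscall
number and the raw register arguments. With socketcall, the operation and its
real arguments sit behind a user pointer that the filter cannot dereference,
so every socket operation looks the same to it. The following is a minimal
sketch, not a hardened policy. It assumes the i386 number 359 for socket from
this patch and 102 for socketcall. run_filter() is a hypothetical user-space
evaluator for the three instruction forms used here, not the kernel's
interpreter, and nothing is actually installed via prctl() or seccomp().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define NR_I386_SOCKETCALL 102  /* legacy multiplexer on i386 */
#define NR_I386_SOCKET     359  /* direct entry added by this patch */

/* Deny socket() with EPERM for i386 callers, allow everything else. */
static const struct sock_filter deny_socket_i386[] = {
    /* 0: load the audit architecture of the calling task */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* 1: i386 -> skip to insn 3, anything else -> fall through to insn 2 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_I386, 1, 0),
    /* 2: not i386, out of scope for this sketch */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* 3: load the syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* 4: direct socket() -> insn 5, anything else -> insn 6 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NR_I386_SOCKET, 0, 1),
    /* 5: refuse with EPERM */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* 6: allow */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
#define DENY_SOCKET_LEN (sizeof(deny_socket_i386) / sizeof(deny_socket_i386[0]))

/* Hypothetical evaluator covering only the opcodes used above. */
static uint32_t run_filter(const struct sock_filter *prog, size_t len,
                           const struct seccomp_data *data)
{
    uint32_t acc = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *insn = &prog[pc];

        switch (insn->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            /* k is a byte offset into struct seccomp_data */
            memcpy(&acc, (const char *)data + insn->k, sizeof(acc));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            /* jt/jf are relative skips counted from the next insn */
            pc += (acc == insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_RET | BPF_K:
            return insn->k;
        }
    }
    return SECCOMP_RET_KILL;    /* ran off the end of the program */
}
```

Feeding this filter nr = 359 yields the EPERM action, while nr = 102
(socketcall) is allowed even when the caller is asking for SYS_SOCKET,
because the sub-operation is not part of what the filter can inspect.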
  

Comments

Geert Uytterhoeven Sept. 2, 2015, 9:48 a.m. UTC | #1
On Wed, Jul 15, 2015 at 12:24 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On x86_64, there's no socketcall syscall; instead all of the socket
> calls are real syscalls.  For 32-bit programs, we're stuck offering
> the socketcall syscall, but it would be nice to expose the direct
> calls as well.  This will enable seccomp to filter socket calls (for
> new userspace only, but that's fine for some applications) and it
> will provide a tiny performance boost.
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/syscalls/syscall_32.tbl | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ef8187f9d28d..25e3cf1cd8fd 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -365,3 +365,18 @@
>  356    i386    memfd_create            sys_memfd_create
>  357    i386    bpf                     sys_bpf
>  358    i386    execveat                sys_execveat                    stub32_execveat
> +359    i386    socket                  sys_socket
> +360    i386    socketpair              sys_socketpair
> +361    i386    bind                    sys_bind
> +362    i386    connect                 sys_connect
> +363    i386    listen                  sys_listen
> +364    i386    accept4                 sys_accept4
> +365    i386    getsockopt              sys_getsockopt                  compat_sys_getsockopt
> +366    i386    setsockopt              sys_setsockopt                  compat_sys_setsockopt
> +367    i386    getsockname             sys_getsockname
> +368    i386    getpeername             sys_getpeername
> +369    i386    sendto                  sys_sendto
> +370    i386    sendmsg                 sys_sendmsg                     compat_sys_sendmsg
> +371    i386    recvfrom                sys_recvfrom                    compat_sys_recvfrom
> +372    i386    recvmsg                 sys_recvmsg                     compat_sys_recvmsg
> +373    i386    shutdown                sys_shutdown

Should all other architectures follow suit?
Or should we follow the s390 approach:

commit 5a7ff75a0c63222d138d944240146dc49a9624e1
Author: Heiko Carstens <heiko.carstens@de.ibm.com>
Date:   Tue Aug 4 09:15:58 2015 +0200

    s390/syscalls: ignore syscalls reachable via sys_socketcall

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
  
H. Peter Anvin Sept. 2, 2015, 8:16 p.m. UTC | #2
On 09/02/2015 02:48 AM, Geert Uytterhoeven wrote:
> 
> Should all other architectures follow suit?
> Or should we follow the s390 approach:
> 

It is up to the maintainer(s), largely dependent on how likely you are
going to want to support this in your libc, but in general, socketcall
is an abomination which there is no reason not to bypass.

So follow suit unless you have a strong reason not to.

	-hpa
  
David Laight Sept. 3, 2015, 10:06 a.m. UTC | #3
From: Peter Anvin
> Sent: 02 September 2015 21:16
> On 09/02/2015 02:48 AM, Geert Uytterhoeven wrote:
> >
> > Should all other architectures follow suit?
> > Or should we follow the s390 approach:
> >
>
> It is up to the maintainer(s), largely dependent on how likely you are
> going to want to support this in your libc, but in general, socketcall
> is an abomination which there is no reason not to bypass.

The other (worse) abomination is the way SCTP overloads setsockopt()
to perform actions that change state.
Rather unfortunately that got documented in the protocol standard :-(

	David
  
Arnd Bergmann Sept. 7, 2015, 12:53 p.m. UTC | #4
On Wednesday 02 September 2015 13:16:19 H. Peter Anvin wrote:
> On 09/02/2015 02:48 AM, Geert Uytterhoeven wrote:
> > 
> > Should all other architectures follow suit?
> > Or should we follow the s390 approach:
> > 
> 
> It is up to the maintainer(s), largely dependent on how likely you are
> going to want to support this in your libc, but in general, socketcall
> is an abomination which there is no reason not to bypass.
> 
> So follow suit unless you have a strong reason not to.

+1

In my y2038 syscall series, I'm adding a new recvmmsg64 call, and
we may decide to add new setsockopt/getsockopt variants as well.
This is probably not the last change to socketcall, and it would
be made much easier if all architectures had separate calls here.

It seems that there are very few architectures that don't already have
the separate calls:

$ git grep -l __NR_socketcall arch/*/include/uapi  | xargs git grep -L recvmsg 
arch/cris/include/uapi/asm/unistd.h
arch/frv/include/uapi/asm/unistd.h
arch/m32r/include/uapi/asm/unistd.h
arch/m68k/include/uapi/asm/unistd.h
arch/mn10300/include/uapi/asm/unistd.h
arch/s390/include/uapi/asm/unistd.h

These are of course all examples of architectures that originally followed
the i386 syscall scheme closely rather than trying to leave out obsolete
calls.

	Arnd
  
Heiko Carstens Sept. 11, 2015, 8:24 a.m. UTC | #5
On Mon, Sep 07, 2015 at 02:53:12PM +0200, Arnd Bergmann wrote:
> On Wednesday 02 September 2015 13:16:19 H. Peter Anvin wrote:
> > On 09/02/2015 02:48 AM, Geert Uytterhoeven wrote:
> > > 
> > > Should all other architectures follow suit?
> > > Or should we follow the s390 approach:
> > > 
> > 
> > It is up to the maintainer(s), largely dependent on how likely you are
> > going to want to support this in your libc, but in general, socketcall
> > is an abomination which there is no reason not to bypass.
> > 
> > So follow suit unless you have a strong reason not to.
> 
> +1
> 
> In my y2038 syscall series, I'm adding a new recvmmsg64 call, and
> we may decide to add new setsockopt/getsockopt variants as well.
> This is probably not the last change to socketcall, and it would
> be made much easier if all architectures had separate calls here.
> 
> It seems that there are very few architectures that don't already have
> the separate calls:
> 
> $ git grep -l __NR_socketcall arch/*/include/uapi  | xargs git grep -L recvmsg 
> arch/cris/include/uapi/asm/unistd.h
> arch/frv/include/uapi/asm/unistd.h
> arch/m32r/include/uapi/asm/unistd.h
> arch/m68k/include/uapi/asm/unistd.h
> arch/mn10300/include/uapi/asm/unistd.h
> arch/s390/include/uapi/asm/unistd.h
> 
> These are of course all examples of architectures that originally followed
> the i386 syscall scheme closely rather than trying to leave out obsolete
> calls.

FWIW, the s390 approach (ignoring the "new" system calls) is only temporary.
I'll enable the separate calls later when I have time to test everything,
especially the glibc stuff.

The same is true for the ipc system call. (any reason why the separate system
calls haven't been enabled on x86 now as well?)
  
Arnd Bergmann Sept. 11, 2015, 8:46 a.m. UTC | #6
On Friday 11 September 2015 10:24:29 Heiko Carstens wrote:
> 
> FWIW, the s390 approach (ignoring the "new" system calls) is only temporary.
> I'll enable the separate calls later when I have time to test everything,
> especially the glibc stuff.

Ok, thanks for clarifying.

> The same is true for the ipc system call. (any reason why the separate system
> calls haven't been enabled on x86 now as well?)

Agreed, we should split that out on all architectures as well.
Almost the same set of architectures that have sys_socketcall also
have sys_ipc, and the reasons for changing are identical. I don't
think we have any other system calls that are handled like this
on some architectures but not on others. There are a couple of
system calls (e.g. futex) that are also multiplexers, but at
least they do it consistently.

	Arnd
  
Geert Uytterhoeven Sept. 11, 2015, 9:54 a.m. UTC | #7
On Fri, Sep 11, 2015 at 10:46 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 11 September 2015 10:24:29 Heiko Carstens wrote:
>>
>> FWIW, the s390 approach (ignoring the "new" system calls) is only temporary.
>> I'll enable the separate calls later when I have time to test everything,
>> especially the glibc stuff.
>
> Ok, thanks for clarifying.
>
>> The same is true for the ipc system call. (any reason why the separate system
>> calls haven't been enabled on x86 now as well?)
>
> Agreed, we should split that out on all architectures as well.
> Almost the same set of architectures that have sys_socketcall also
> have sys_ipc, and the reasons for changing are identical. I don't
> think we have any other system calls that are handled like this
> on some architectures but not on others. There are a couple of
> system calls (e.g. futex) that are also multiplexers, but at
> least they do it consistently.

To make sure I don't miss any (it seems I missed recvmmsg and sendmmsg for
the socketcall case, sigh), this is the list of ipc syscalls to implement?

    sys_msgget
    sys_msgctl
    sys_msgrcv
    sys_msgsnd
    sys_semget
    sys_semctl
    sys_semtimedop
    sys_shmget
    sys_shmctl
    sys_shmat
    sys_shmdt

sys_semop() seems to be unneeded because it can be implemented using
sys_semtimedop()?
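
The list above can be read as a mapping from the sub-operations of the ipc
multiplexer to the direct calls. A rough sketch of that mapping follows. The
sub-operation values mirror the constants in <linux/ipc.h> but are redefined
here under an IPCOP_ prefix, which is a naming choice of this sketch, as is
the helper ipc_direct_call(). The masking of the upper 16 bits reflects the
convention that they carry a version number. SEMOP is folded into
sys_semtimedop, on the assumption that a NULL timeout gives semop behaviour.

```c
#include <stddef.h>

/* Sub-operations of the legacy ipc() multiplexer (values as in
 * <linux/ipc.h>, names local to this sketch). */
enum ipc_op {
    IPCOP_SEMOP      = 1,
    IPCOP_SEMGET     = 2,
    IPCOP_SEMCTL     = 3,
    IPCOP_SEMTIMEDOP = 4,
    IPCOP_MSGSND     = 11,
    IPCOP_MSGRCV     = 12,
    IPCOP_MSGGET     = 13,
    IPCOP_MSGCTL     = 14,
    IPCOP_SHMAT      = 21,
    IPCOP_SHMDT      = 22,
    IPCOP_SHMGET     = 23,
    IPCOP_SHMCTL     = 24,
};

/* Name of the direct call that serves a given ipc() sub-operation,
 * or NULL if the number is not a known sub-operation. */
static const char *ipc_direct_call(unsigned int call)
{
    switch (call & 0xffff) {    /* upper 16 bits: version, ignored here */
    case IPCOP_SEMOP:           /* no entry of its own: NULL timeout */
    case IPCOP_SEMTIMEDOP:  return "sys_semtimedop";
    case IPCOP_SEMGET:      return "sys_semget";
    case IPCOP_SEMCTL:      return "sys_semctl";
    case IPCOP_MSGSND:      return "sys_msgsnd";
    case IPCOP_MSGRCV:      return "sys_msgrcv";
    case IPCOP_MSGGET:      return "sys_msgget";
    case IPCOP_MSGCTL:      return "sys_msgctl";
    case IPCOP_SHMAT:       return "sys_shmat";
    case IPCOP_SHMDT:       return "sys_shmdt";
    case IPCOP_SHMGET:      return "sys_shmget";
    case IPCOP_SHMCTL:      return "sys_shmctl";
    default:                return NULL;
    }
}
```

Twelve sub-operations map onto the eleven names in the list, since
IPCOP_SEMOP and IPCOP_SEMTIMEDOP share one target.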

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
  
Arnd Bergmann Sept. 11, 2015, 10:14 a.m. UTC | #8
On Friday 11 September 2015 11:54:50 Geert Uytterhoeven wrote:
> To make sure I don't miss any (it seems I missed recvmmsg and sendmmsg for
> the socketcall case, sigh), this is the list of ipc syscalls to implement?
> 
>     sys_msgget
>     sys_msgctl
>     sys_msgrcv
>     sys_msgsnd
>     sys_semget
>     sys_semctl
>     sys_semtimedop
>     sys_shmget
>     sys_shmctl
>     sys_shmat
>     sys_shmdt
> 
> sys_semop() seems to be unneeded because it can be implemented using
> sys_semtimedop()?
> 

Yes, that list looks right. IPC also includes a set of six sys_mq_*
calls, but I believe that everyone already has those as they are not
covered by sys_ipc.

For y2038 compatibility, we will likely add a new variant of
semtimedop that takes a 64-bit timespec. While the argument passed
there is a relative time that will never need to be longer than 68
years, we need to accommodate user space that defines timespec
in a sane way, and converting the argument in libc would be awkward.
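
As a rough illustration of the conversion that libc would otherwise have to
carry: the sketch below narrows a 64-bit timespec to a 32-bit one and has to
reject values that do not fit. The struct names ts32 and ts64 and both
helpers are hypothetical and only model the two layouts. They are not the
kernel's or any libc's definitions.

```c
#include <stdint.h>

struct ts32 { int32_t tv_sec; int32_t tv_nsec; };   /* legacy layout (model) */
struct ts64 { int64_t tv_sec; int64_t tv_nsec; };   /* y2038-safe layout (model) */

/* Narrow a 64-bit timespec to the 32-bit layout.
 * Returns 0 on success, -1 if the seconds do not fit in 32 bits or the
 * nanoseconds are outside [0, 1e9). */
static int ts64_to_ts32(const struct ts64 *in, struct ts32 *out)
{
    if (in->tv_sec < INT32_MIN || in->tv_sec > INT32_MAX)
        return -1;
    if (in->tv_nsec < 0 || in->tv_nsec >= 1000000000)
        return -1;
    out->tv_sec = (int32_t)in->tv_sec;
    out->tv_nsec = (int32_t)in->tv_nsec;
    return 0;
}

/* Longest relative timeout, in whole 365-day years, that a signed
 * 32-bit seconds field can express. */
static int max_timeout_years_32(void)
{
    return (int)(INT32_MAX / (365LL * 24 * 60 * 60));
}
```

max_timeout_years_32() evaluates to 68, which is where the "68 years" figure
comes from. A kernel entry point that accepts the 64-bit layout directly
would let libc skip this per-call range check and copy.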

	Arnd
  
Andy Lutomirski Sept. 11, 2015, 4:32 p.m. UTC | #9
On Fri, Sep 11, 2015 at 3:14 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 11 September 2015 11:54:50 Geert Uytterhoeven wrote:
>> To make sure I don't miss any (it seems I missed recvmmsg and sendmmsg for
>> the socketcall case, sigh), this is the list of ipc syscalls to implement?
>>
>>     sys_msgget
>>     sys_msgctl
>>     sys_msgrcv
>>     sys_msgsnd
>>     sys_semget
>>     sys_semctl
>>     sys_semtimedop
>>     sys_shmget
>>     sys_shmctl
>>     sys_shmat
>>     sys_shmdt
>>
>> sys_semop() seems to be unneeded because it can be implemented using
>> sys_semtimedop()?
>>
>
> Yes, that list looks right. IPC also includes a set of six sys_mq_*
> calls, but I believe that everyone already has those as they are not
> covered by sys_ipc.
>
> For y2038 compatibility, we will likely add a new variant of
> semtimedop that takes a 64-bit timespec. While the argument passed
> there is a relative time that will never need to be longer than 68
> years, we need to accommodate user space that defines timespec
> in a sane way, and converting the argument in libc would be awkward.
>

I missed sys_ipc entirely.

Ingo, Thomas, want to just wire those up, too?  I can send a patch
next week, but it'll be as trivial as the socket one.

--Andy
  
Ingo Molnar Sept. 14, 2015, 1:35 p.m. UTC | #10
* Andy Lutomirski <luto@amacapital.net> wrote:

> On Fri, Sep 11, 2015 at 3:14 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Friday 11 September 2015 11:54:50 Geert Uytterhoeven wrote:
> >> To make sure I don't miss any (it seems I missed recvmmsg and sendmmsg for
> >> the socketcall case, sigh), this is the list of ipc syscalls to implement?
> >>
> >>     sys_msgget
> >>     sys_msgctl
> >>     sys_msgrcv
> >>     sys_msgsnd
> >>     sys_semget
> >>     sys_semctl
> >>     sys_semtimedop
> >>     sys_shmget
> >>     sys_shmctl
> >>     sys_shmat
> >>     sys_shmdt
> >>
> >> sys_semop() seems to be unneeded because it can be implemented using
> >> sys_semtimedop()?
> >>
> >
> > Yes, that list looks right. IPC also includes a set of six sys_mq_*
> > calls, but I believe that everyone already has those as they are not
> > covered by sys_ipc.
> >
> > For y2038 compatibility, we will likely add a new variant of
> > semtimedop that takes a 64-bit timespec. While the argument passed
> > there is a relative time that will never need to be longer than 68
> > years, we need to accommodate user space that defines timespec
> > in a sane way, and converting the argument in libc would be awkward.
> >
> 
> I missed sys_ipc entirely.
> 
> Ingo, Thomas, want to just wire those up, too?  I can send a patch
> next week, but it'll be as trivial as the socket one.

Yeah, sure - split out system calls are so much better (and slightly faster) than 
omnibus demuxers.

Thanks,

	Ingo
  
H. Peter Anvin Sept. 15, 2015, 8:55 p.m. UTC | #11
On 09/14/2015 06:35 AM, Ingo Molnar wrote:
>>
>> I missed sys_ipc entirely.
>>
>> Ingo, Thomas, want to just wire those up, too?  I can send a patch
>> next week, but it'll be as trivial as the socket one.
> 
> Yeah, sure - split out system calls are so much better (and slightly faster) than 
> omnibus demuxers.
> 

Indeed.  sys_socketcall and sys_ipc are legacy mistakes.

	-hpa
  

Patch

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f9d28d..25e3cf1cd8fd 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,18 @@ 
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
 358	i386	execveat		sys_execveat			stub32_execveat
+359	i386	socket			sys_socket
+360	i386	socketpair		sys_socketpair
+361	i386	bind			sys_bind
+362	i386	connect			sys_connect
+363	i386	listen			sys_listen
+364	i386	accept4			sys_accept4
+365	i386	getsockopt		sys_getsockopt			compat_sys_getsockopt
+366	i386	setsockopt		sys_setsockopt			compat_sys_setsockopt
+367	i386	getsockname		sys_getsockname
+368	i386	getpeername		sys_getpeername
+369	i386	sendto			sys_sendto
+370	i386	sendmsg			sys_sendmsg			compat_sys_sendmsg
+371	i386	recvfrom		sys_recvfrom			compat_sys_recvfrom
+372	i386	recvmsg			sys_recvmsg			compat_sys_recvmsg
+373	i386	shutdown		sys_shutdown
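
For reference, the rows added above can be lined up against the socketcall
sub-operations they replace. This is a sketch only. The sub-operation values
mirror <linux/net.h> but are redefined under an SC_ prefix chosen for this
example, and direct_nr_for_socketcall() is a hypothetical helper. The syscall
numbers are the ones from this patch. recvmmsg and sendmmsg are left out
because the patch does not touch them.

```c
/* socketcall() sub-operations (values as in <linux/net.h>, names local
 * to this sketch). */
enum socketcall_op {
    SC_SOCKET = 1,   SC_BIND = 2,         SC_CONNECT = 3,
    SC_LISTEN = 4,   SC_ACCEPT = 5,       SC_GETSOCKNAME = 6,
    SC_GETPEERNAME = 7, SC_SOCKETPAIR = 8, SC_SEND = 9,
    SC_RECV = 10,    SC_SENDTO = 11,      SC_RECVFROM = 12,
    SC_SHUTDOWN = 13, SC_SETSOCKOPT = 14, SC_GETSOCKOPT = 15,
    SC_SENDMSG = 16, SC_RECVMSG = 17,     SC_ACCEPT4 = 18,
};

/* i386 syscall number assigned by this patch for a socketcall
 * sub-operation, or -1 if the patch adds no row for it. */
static int direct_nr_for_socketcall(int op)
{
    switch (op) {
    case SC_SOCKET:      return 359;
    case SC_SOCKETPAIR:  return 360;
    case SC_BIND:        return 361;
    case SC_CONNECT:     return 362;
    case SC_LISTEN:      return 363;
    case SC_ACCEPT4:     return 364;
    case SC_GETSOCKOPT:  return 365;
    case SC_SETSOCKOPT:  return 366;
    case SC_GETSOCKNAME: return 367;
    case SC_GETPEERNAME: return 368;
    case SC_SENDTO:      return 369;
    case SC_SENDMSG:     return 370;
    case SC_RECVFROM:    return 371;
    case SC_RECVMSG:     return 372;
    case SC_SHUTDOWN:    return 373;
    default:             return -1;  /* accept, send, recv: no row */
    }
}
```

accept, send and recv get no row of their own. They can be expressed through
accept4 with flags 0 and through sendto/recvfrom with a NULL address, so
userspace does not lose anything by their absence.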