Message ID | 20170822142915.5yokngypphdb4mv4@cs.unibo.it |
---|---|
State | New, archived |
Headers |
Received: (qmail 86094 invoked by alias); 22 Aug 2017 14:29:32 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 80978 invoked by uid 89); 22 Aug 2017 14:29:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-23.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RP_MATCHES_RCVD, SPF_NEUTRAL autolearn=ham version=3.3.2 spammy=dragonfly, UD:su, life, communication X-HELO: mail.virtlab.unibo.it Date: Tue, 22 Aug 2017 16:29:15 +0200 From: Renzo Davoli <renzo@cs.unibo.it> To: libc-alpha@sourceware.org Subject: [PATCH] (linux) opensock: AF_NETLINK is a better choice than AF_UNIX Message-ID: <20170822142915.5yokngypphdb4mv4@cs.unibo.it> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: NeoMutt/20170609 (1.8.3) |
Commit Message
Renzo Davoli
Aug. 22, 2017, 2:29 p.m. UTC
In sysdeps/unix/sysv/linux/opensock.c the function __opensock opens a "generic" socket. This function tries to open sockets of different families and returns as soon as it succeeds to open a socket. It remembers the kind of socket which succeeded to avoid the scan in subsequent calls. The sequence of families for the attempts are: AF_UNIX, AF_INET, AF_INET6, etc. This function is used in: ./sysdeps/unix/sysv/linux/if_index.c ./sysdeps/unix/sysv/linux/ifreq.c (for getifaddrs) All the calling functions are related to the TCP-IP (v4 or v6) stack. It is conceptually inconsistent to use AF_UNIX sockets (although the Linux kernel accepts any kind of sockets). AF_UNIX has nothing to do with networking: it is just an InterProcess Communication means. This choice makes harder to implement system call partial virtualization https://sourceforge.net/projects/view-os/ User-space TCP-IP stacks as umnet modules virtualize the address families AF_INET, AF_INET6 and AF_NETLINK (iproute for net configuration). From the practical point of view using (for example) a umnetlwipv6 stack, /bin/ip does not work while the busybox implementation of ip succeeds (it uses a AF_INET socket for SIOCGIFINDEX). $ ip addr add 10.0.0.1/24 dev vd0 Cannot find device "vd0" $ busybox ip addr add 10.0.0.1/24 dev vd0 $ Partial virtualization using the current glibc code is harder: all the ioctl requests should be captured and virtualized if the fd is a socket and the request # is in a specific range. It is not possible to decide whether a "socket(AF_UNIX..)" system call returns a fd that needs virtualization or not. If it gets connected to "/tmp/.X11-unix/X0" (for example) it is not virtualized by the net module while if it is used in a SIOCG* ioctl call it must be virtualized. The patch here attached changes the sequence of socket for opensock's attempts to: AF_NETLINK, AF_INET, AF_INET6, AF_UNIX, etc. Netlink is more appropriate for ioctl calls like "SIOGCIFINDEX". I suppose AF_UNIX was chosen as its data structures are lighter than those of AF_INET(6). AF_NETLINK should be as light as AF_UNIX and it would make the life easier for partial virtualization. A sequence like AF_NETLINK, AF_UNIX, AF_INET, AF_INET6,... would fit anyway. Looking around in other libC implementations (paths are relative to their source trees): * NetBSD uses AF_INET see: http://bxr.su/DragonFly/lib/libc/net/if_nametoindex.c * dietlibc uses AF_INET6 see: lib/if_nametoindex.c * ulibc: uses AF_INET6 or AF_INET if AF_INET6 fails: see: libc/inet/opensock.c * newlibc: uses AF_INET in libc/sys/linux/net/ifname.c and a __opensock similar to the one in glibc in libc/sys/linux/net/ifreq.c * eglibc seems to share the same opensock code of glibc. Thank you renzo 2018-08-20 Renzo Davoli <renzo@cs.unibo.it>
Comments
On 22/08/2017 11:29, Renzo Davoli wrote: > In sysdeps/unix/sysv/linux/opensock.c the function __opensock > opens a "generic" socket. > This function tries to open sockets of different families and returns > as soon as it succeeds to open a socket. It remembers the kind of socket > which succeeded to avoid the scan in subsequent calls. > > The sequence of families for the attempts are: > AF_UNIX, AF_INET, AF_INET6, etc. > > This function is used in: > ./sysdeps/unix/sysv/linux/if_index.c > ./sysdeps/unix/sysv/linux/ifreq.c (for getifaddrs) > > All the calling functions are related to the TCP-IP (v4 or v6) stack. > It is conceptually inconsistent to use AF_UNIX sockets (although > the Linux kernel accepts any kind of sockets). > AF_UNIX has nothing to do with networking: it is just an InterProcess > Communication means. > > This choice makes harder to implement system call partial virtualization > https://sourceforge.net/projects/view-os/ > > User-space TCP-IP stacks as umnet modules virtualize the address families > AF_INET, AF_INET6 and AF_NETLINK (iproute for net configuration). > From the practical point of view using (for example) a umnetlwipv6 > stack, /bin/ip does not work while the busybox implementation of ip > succeeds (it uses a AF_INET socket for SIOCGIFINDEX). > $ ip addr add 10.0.0.1/24 dev vd0 > Cannot find device "vd0" > $ busybox ip addr add 10.0.0.1/24 dev vd0 > $ > > Partial virtualization using the current glibc code is harder: > all the ioctl requests should be captured and virtualized if the fd > is a socket and the request # is in a specific range. > It is not possible to decide whether a "socket(AF_UNIX..)" system call > returns a fd that needs virtualization or not. > If it gets connected to "/tmp/.X11-unix/X0" (for example) it is not > virtualized by the net module while if it is used in a > SIOCG* ioctl call it must be virtualized. > > The patch here attached changes the sequence of socket for opensock's > attempts to: > AF_NETLINK, AF_INET, AF_INET6, AF_UNIX, etc. > > Netlink is more appropriate for ioctl calls like "SIOGCIFINDEX". > I suppose AF_UNIX was chosen as its data structures are lighter than > those of AF_INET(6). AF_NETLINK should be as light as AF_UNIX > and it would make the life easier for partial virtualization. > A sequence like AF_NETLINK, AF_UNIX, AF_INET, AF_INET6,... > would fit anyway. > > Looking around in other libC implementations > (paths are relative to their source trees): > > * NetBSD uses AF_INET see: > http://bxr.su/DragonFly/lib/libc/net/if_nametoindex.c > > * dietlibc uses AF_INET6 see: lib/if_nametoindex.c > > * ulibc: uses AF_INET6 or AF_INET if AF_INET6 fails: > see: libc/inet/opensock.c > > * newlibc: uses AF_INET in libc/sys/linux/net/ifname.c > and a __opensock similar to the one in glibc in > libc/sys/linux/net/ifreq.c > > * eglibc seems to share the same opensock code of glibc. > > Thank you > renzo I am in favour in using AF_NETLINK for getifaddrs, but not really due the constraints you exposed from your application. IMHO it is non issue for glibc since this module seems to cross application and kernel API boundaries (since using SIOCGIFINDEX with AF_INET is fully supported on current Linux versions). Also, all the current possible advantages of AF_NETLINK are not applicable on current GLIBC usage: the kernel information using ioctl (and thus is synchronous), it does not use multicast, and it uses unnamed AF_UNIX (so no credential leak when using a named file). The only advantage, which I am not sure if it really true, is some link advocates that AF_NETLINK has less overhead than UDP sockets. Anyway, this patch also has some serious issues you will need to check out before possible inclusion: * It requires a proper ChangeLog entry [1] * You need to actually check if this changes does not generate any regression by at least running a make check and if is the case also providing a proper testcase. I am seeing this failures with your patch: FAIL: inet/bug-if1 FAIL: inet/test_ifindex FAIL: inet/tst-inet6_scopeid_pton So you will need to first check why these tests are failing and either fixing them with a proper rationale or adjust your patch. * IMHO the application constraint for this modification is not a strong one, as I said is cross application API boundaries and usually this is not a good reason to change internal glibc behaviour. I can't really see any real good reason to change it, but I will also won't oppose if the new change do not add regressions. So in future submissions you can say it helps your application to make easier to syscall handling, but also states that the change should not make any substantial advantage. [1] https://sourceware.org/glibc/wiki/Contribution%20checklist#Properly_Formatted_GNU_ChangeLog > > 2018-08-20 Renzo Davoli <renzo@cs.unibo.it> > > diff --git a/sysdeps/unix/sysv/linux/opensock.c b/sysdeps/unix/sysv/linux/opensock.c > index 21508cdc75..236c49a42f 100644 > --- a/sysdeps/unix/sysv/linux/opensock.c > +++ b/sysdeps/unix/sysv/linux/opensock.c > @@ -35,9 +35,10 @@ __opensock (void) > const char procname[15]; > } afs[] = > { > - { AF_UNIX, "net/unix" }, > + { AF_NETLINK, "net/netlink" }, > { AF_INET, "" }, > { AF_INET6, "net/if_inet6" }, > + { AF_UNIX, "net/unix" }, > { AF_AX25, "net/ax25" }, > { AF_NETROM, "net/nr" }, > { AF_ROSE, "net/rose" }, >
diff --git a/sysdeps/unix/sysv/linux/opensock.c b/sysdeps/unix/sysv/linux/opensock.c index 21508cdc75..236c49a42f 100644 --- a/sysdeps/unix/sysv/linux/opensock.c +++ b/sysdeps/unix/sysv/linux/opensock.c @@ -35,9 +35,10 @@ __opensock (void) const char procname[15]; } afs[] = { - { AF_UNIX, "net/unix" }, + { AF_NETLINK, "net/netlink" }, { AF_INET, "" }, { AF_INET6, "net/if_inet6" }, + { AF_UNIX, "net/unix" }, { AF_AX25, "net/ax25" }, { AF_NETROM, "net/nr" }, { AF_ROSE, "net/rose" },