From patchwork Fri Mar 11 03:38:19 2022
X-Patchwork-Submitter: DJ Delorie
X-Patchwork-Id: 51885
Date: Thu, 10 Mar 2022 22:38:19 -0500
To: libc-alpha@sourceware.org
Subject: [patch v5] Allow for unprivileged nested containers
From: DJ Delorie

[resent as v5]

[Both versions were tested, with identical test results.  Rather than
delete my work, I leave it as an option in the source, in case it's
needed later.  I want to have my cake and eat it too ;-)]

If the build itself is run in a container, we may not be able to fully
set up a nested container for test-container testing.  The notable case
is /proc: it must be mounted from within the same PID namespace as its
users, and thus cannot be bind mounted from outside the container like
the other mounts.

This patch chooses to use the parent's PID namespace instead of creating
a new one, as this is more likely to be allowed.
Code for using a separate PID namespace is included as well, in case a
future test requires it.  In that configuration, test-container may not
be able to mount /proc, but it will run the test anyway, since most
containerized tests do not require /proc.  The few that do can declare
that requirement with the new support_need_proc () helper, which this
patch also adds.

diff --git a/elf/tst-pldd.c b/elf/tst-pldd.c
index f31f9956faa..e9e99d11e01 100644
--- a/elf/tst-pldd.c
+++ b/elf/tst-pldd.c
@@ -85,6 +85,9 @@ in_str_list (const char *libname, const char *const strlist[])
 static int
 do_test (void)
 {
+  /* Needs /proc/sys/kernel/yama/ptrace_scope and /proc/$child.  */
+  support_need_proc ();
+
   /* Check if our subprocess can be debugged with ptrace.  */
   {
     int ptrace_scope = support_ptrace_scope ();
diff --git a/nptl/tst-pthread-getattr.c b/nptl/tst-pthread-getattr.c
index d2ebf308ae7..4a8bf1b4846 100644
--- a/nptl/tst-pthread-getattr.c
+++ b/nptl/tst-pthread-getattr.c
@@ -28,6 +28,8 @@
 #include
 #include
 
+#include <support/support.h>
+
 /* There is an obscure bug in the kernel due to which RLIMIT_STACK is sometimes
    returned as unlimited when it is not, which may cause this test to fail.
    There is also the other case where RLIMIT_STACK is intentionally set as
@@ -153,6 +155,9 @@ check_stack_top (void)
 static int
 do_test (void)
 {
+  /* Reads /proc/self/maps to get stack size.  */
+  support_need_proc ();
+
   pagesize = sysconf (_SC_PAGESIZE);
   return check_stack_top ();
 }
diff --git a/nss/tst-reload2.c b/nss/tst-reload2.c
index fb3b94a1fab..94e2029fd35 100644
--- a/nss/tst-reload2.c
+++ b/nss/tst-reload2.c
@@ -95,6 +95,9 @@ do_test (void)
   char buf1[PATH_MAX];
   char buf2[PATH_MAX];
 
+  /* The xmkdirp below fails if we can't map our uid, which requires /proc.  */
+  support_need_proc ();
+
   sprintf (buf1, "/subdir%s", support_slibdir_prefix);
   xmkdirp (buf1, 0777);
 
diff --git a/support/Makefile b/support/Makefile
index 5ddcb8d1581..f036a813048 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -64,6 +64,7 @@ libsupport-routines = \
   support_format_netent \
   support_isolate_in_subprocess \
   support_mutex_pi_monotonic \
+  support_need_proc \
   support_path_support_time64 \
   support_process_state \
   support_ptrace \
diff --git a/support/support.h b/support/support.h
index 73b9fc48f01..bcf7bc43723 100644
--- a/support/support.h
+++ b/support/support.h
@@ -91,6 +91,10 @@ char *support_quote_string (const char *);
    regular file open for writing, and initially empty.  */
 int support_descriptor_supports_holes (int fd);
 
+/* Indicate that a test requires a working /proc filesystem.  If /proc
+   is not available, this call exits with UNSUPPORTED.  */
+void support_need_proc (void);
+
 /* Error-checking wrapper functions which terminate the process on
    error.  */
 
diff --git a/support/support_need_proc.c b/support/support_need_proc.c
new file mode 100644
index 00000000000..5d94b25ba8b
--- /dev/null
+++ b/support/support_need_proc.c
@@ -0,0 +1,33 @@
+/* Indicate that a test requires a working /proc.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include <unistd.h>
+#include <support/check.h>
+#include <support/support.h>
+
+/* We test for /proc/self/maps since that's one of the files that one
+   of our tests actually uses, but the general idea is to check whether
+   Linux's /proc (procfs) filesystem is mounted.  If not, the process
+   exits with an UNSUPPORTED result code.  */
+
+void
+support_need_proc (void)
+{
+  if (access ("/proc/self/maps", R_OK))
+    FAIL_UNSUPPORTED ("/proc is not available");
+}
diff --git a/support/test-container.c b/support/test-container.c
index 25e7f142193..d574ee6cd89 100644
--- a/support/test-container.c
+++ b/support/test-container.c
@@ -49,6 +49,20 @@
 #include "check.h"
 #include "test-driver.h"
 
+/* If set to 0, we bind mount the parent's /proc and re-use the
+   parent's pids (i.e. test runs as the user).  If set to 1, we enter a
+   new pid namespace and mount our own /proc (i.e. test runs as root).
+   I was tired of swapping the code back and forth during testing, so
+   you get both ;-)  */
+#define SEPARATE_PID_NS 1
+#define SHARED_PID_NS (!SEPARATE_PID_NS)
+
+#if SEPARATE_PID_NS
+#define MAYBE_CLONE_NEWPID CLONE_NEWPID
+#else
+#define MAYBE_CLONE_NEWPID 0
+#endif
+
 #ifndef __linux__
 #define mount(s,t,fs,f,d) no_mount()
 int no_mount (void)
@@ -231,7 +245,7 @@ concat (const char *str, ...)
 static void
 trymount (const char *src, const char *dest)
 {
-  if (mount (src, dest, "", MS_BIND, NULL) < 0)
+  if (mount (src, dest, "", MS_BIND | MS_REC, NULL) < 0)
     FAIL_EXIT1 ("can't mount %s onto %s\n", src, dest);
 }
 
@@ -1068,7 +1082,7 @@ main (int argc, char **argv)
 
 #ifdef CLONE_NEWNS
   /* The unshare here gives us our own spaces and capabilities.  */
-  if (unshare (CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS) < 0)
+  if (unshare (CLONE_NEWUSER | MAYBE_CLONE_NEWPID | CLONE_NEWNS) < 0)
     {
       /* Older kernels may not support all the options, or security
          policy may block this call.  */
@@ -1094,6 +1108,15 @@ main (int argc, char **argv)
   trymount (support_srcdir_root, new_srcdir_path);
   trymount (support_objdir_root, new_objdir_path);
 
+#if SHARED_PID_NS
+  /* It may not be possible to mount /proc directly.  */
+  {
+    char *new_proc = concat (new_root_path, "/proc", NULL);
+    xmkdirp (new_proc, 0755);
+    trymount ("/proc", new_proc);
+  }
+#endif
+
   xmkdirp (concat (new_root_path, "/dev", NULL), 0755);
   devmount (new_root_path, "null");
   devmount (new_root_path, "zero");
@@ -1163,42 +1186,57 @@ main (int argc, char **argv)
 
   maybe_xmkdir ("/tmp", 0755);
 
+#if SEPARATE_PID_NS
   /* Now that we're pid 1 (effectively "root") we can mount /proc  */
   maybe_xmkdir ("/proc", 0777);
-  if (mount ("proc", "/proc", "proc", 0, NULL) < 0)
-    FAIL_EXIT1 ("Unable to mount /proc: ");
-
-  /* We map our original UID to the same UID in the container so we
-     can own our own files normally.  */
-  UMAP = open ("/proc/self/uid_map", O_WRONLY);
-  if (UMAP < 0)
-    FAIL_EXIT1 ("can't write to /proc/self/uid_map\n");
-
-  sprintf (tmp, "%lld %lld 1\n",
-           (long long) (be_su ? 0 : original_uid), (long long) original_uid);
-  write (UMAP, tmp, strlen (tmp));
-  xclose (UMAP);
-
-  /* We must disable setgroups () before we can map our groups, else we
-     get EPERM.  */
-  GMAP = open ("/proc/self/setgroups", O_WRONLY);
-  if (GMAP >= 0)
+  if (mount ("proc", "/proc", "proc", 0, NULL) != 0)
     {
-      /* We support kernels old enough to not have this.  */
-      write (GMAP, "deny\n", 5);
-      xclose (GMAP);
+      // This happens if we're trying to create a nested container,
+      // like if the build is running under podman, and we lack
+      // privileges.
+
+      // Ideally we would WARN here, but that would just add noise to
+      // *every* test-container test, and the ones that care should
+      // have their own relevant diagnostics.
+
+      // FAIL_EXIT1 ("Unable to mount /proc: ");
     }
+  else
+    /* The shared pid namespace case will always run the following block... */
+#endif
+    {
+      /* We map our original UID to the same UID in the container so we
+         can own our own files normally.  */
+      UMAP = open ("/proc/self/uid_map", O_WRONLY);
+      if (UMAP < 0)
+        FAIL_EXIT1 ("can't write to /proc/self/uid_map\n");
+
+      sprintf (tmp, "%lld %lld 1\n",
+               (long long) (be_su ? 0 : original_uid), (long long) original_uid);
+      write (UMAP, tmp, strlen (tmp));
+      xclose (UMAP);
+
+      /* We must disable setgroups () before we can map our groups, else we
+         get EPERM.  */
+      GMAP = open ("/proc/self/setgroups", O_WRONLY);
+      if (GMAP >= 0)
+        {
+          /* We support kernels old enough to not have this.  */
+          write (GMAP, "deny\n", 5);
+          xclose (GMAP);
+        }
 
-  /* We map our original GID to the same GID in the container so we
-     can own our own files normally.  */
-  GMAP = open ("/proc/self/gid_map", O_WRONLY);
-  if (GMAP < 0)
-    FAIL_EXIT1 ("can't write to /proc/self/gid_map\n");
+      /* We map our original GID to the same GID in the container so we
+         can own our own files normally.  */
+      GMAP = open ("/proc/self/gid_map", O_WRONLY);
+      if (GMAP < 0)
+        FAIL_EXIT1 ("can't write to /proc/self/gid_map\n");
 
-  sprintf (tmp, "%lld %lld 1\n",
-           (long long) (be_su ? 0 : original_gid), (long long) original_gid);
-  write (GMAP, tmp, strlen (tmp));
-  xclose (GMAP);
+      sprintf (tmp, "%lld %lld 1\n",
+               (long long) (be_su ? 0 : original_gid), (long long) original_gid);
+      write (GMAP, tmp, strlen (tmp));
+      xclose (GMAP);
+    }
 
   if (change_cwd)
     {