From patchwork Tue Aug 11 20:07:15 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 40242
To: libc-alpha@sourceware.org
Subject: [PATCH v2 2/2] linux: Optimize realpath stack usage
Date: Tue, 11 Aug 2020 17:07:15 -0300
Message-Id: <20200811200715.3432505-2-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200811200715.3432505-1-adhemerval.zanella@linaro.org>
References: <20200811200715.3432505-1-adhemerval.zanella@linaro.org>
List-Id: Libc-alpha mailing list
From: Adhemerval Zanella

Changes from previous version:

- Changed __resolve_path to assume 'resolved' has at least PATH_MAX.
- Dropped 'stdlib: Enforce PATH_MAX on allocated realpath buffer' patch
  and made __resolve_path return a path larger than PATH_MAX if
  'resolved' is NULL.
- Use __fd_to_filename to read the procfs.
- Remove the fstat/lstat check.

---

This optimizes the stack usage for the success case (from ~8K to ~4K)
and for the case where the 'resolved' input buffer is not provided.
For the failure case when the 'resolved' buffer is provided, it
requires using the generic strategy to find the path when EACCES or
ENOENT is returned (this is a GNU extension not defined in the
standard).

Regarding syscall usage, for a successful path without symlinks it
trades 2 syscalls (getcwd/lstat) for 3 (openat, readlink, and close).
It is slightly better if the input contains multiple symlinks (where
the Linux kernel trick allows replacing multiple lstat calls with a
single readlink).  For failure it depends on whether the 'resolved'
buffer is provided, in which case the old strategy is called (thus
requiring more syscalls in general).

Checked on x86_64-linux-gnu and i686-linux-gnu.
---
 include/stdlib.h                   |  12 +++
 stdlib/Makefile                    |   2 +-
 stdlib/canonicalize-internal.c     | 155 +++++++++++++++++++++++++++
 stdlib/canonicalize.c              | 161 +----------------------------
 stdlib/realpath.c                  |  43 ++++++++
 sysdeps/unix/sysv/linux/realpath.c |  53 ++++++++++
 6 files changed, 265 insertions(+), 161 deletions(-)
 create mode 100644 stdlib/canonicalize-internal.c
 create mode 100644 stdlib/realpath.c
 create mode 100644 sysdeps/unix/sysv/linux/realpath.c

diff --git a/include/stdlib.h b/include/stdlib.h
index ffcefd7b85..7ed9b14614 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -20,6 +20,14 @@
 
 # include
 
+# ifndef PATH_MAX
+#  ifdef MAXPATHLEN
+#   define PATH_MAX MAXPATHLEN
+#  else
+#   define PATH_MAX 1024
+#  endif
+# endif
+
 extern __typeof (strtol_l) __strtol_l;
 extern __typeof (strtoul_l) __strtoul_l;
 extern __typeof (strtoll_l) __strtoll_l;
@@ -92,6 +100,10 @@ extern int __unsetenv (const char *__name) attribute_hidden;
 extern int __clearenv (void) attribute_hidden;
 extern char *__mktemp (char *__template) __THROW __nonnull ((1));
 extern char *__canonicalize_file_name (const char *__name);
+extern char *__resolve_path (const char *name, char *resolved)
+  attribute_hidden;
+extern char *__realpath_system (const char *name, char *resolved)
+  attribute_hidden;
 extern char *__realpath (const char *__name, char *__resolved);
 libc_hidden_proto (__realpath)
 extern int __ptsname_r (int __fd, char *__buf, size_t __buflen)
diff --git a/stdlib/Makefile b/stdlib/Makefile
index 7093b8a584..35ca04541f 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -53,7 +53,7 @@ routines := \
 	strtof strtod strtold \
 	strtof_l strtod_l strtold_l \
 	strtof_nan strtod_nan strtold_nan \
-	system canonicalize \
+	system realpath canonicalize canonicalize-internal \
 	a64l l64a \
 	rpmatch strfmon strfmon_l getsubopt xpg_basename fmtmsg \
 	strtoimax strtoumax wcstoimax wcstoumax \
diff --git a/stdlib/canonicalize-internal.c b/stdlib/canonicalize-internal.c
new file mode 100644
index 0000000000..be65fa113f
--- /dev/null
+++ b/stdlib/canonicalize-internal.c
@@ -0,0 +1,155 @@
+/* Internal function for canonicalize absolute name of a given file.
+   Copyright (C) 1996-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+char *
+__resolve_path (const char *name, char *resolved)
+{
+  size_t path_max = PATH_MAX;
+  const char *start, *end;
+  char *rpath = resolved;
+  char *rpath_limit = rpath + path_max;
+  char *dest = resolved;
+  char extra_buf[PATH_MAX];
+  int num_links = 0;
+
+  if (name[0] != '/')
+    {
+      if (__getcwd (rpath, path_max) == NULL)
+        {
+          rpath[0] = '\0';
+          return NULL;
+        }
+      dest = __rawmemchr (rpath, '\0');
+    }
+  else
+    {
+      rpath[0] = '/';
+      dest = rpath + 1;
+    }
+
+  for (start = end = name; *start; start = end)
+    {
+      /* Skip sequence of multiple path-separators.  */
+      while (*start == '/')
+        ++start;
+
+      /* Find end of path component.  */
+      for (end = start; *end && *end != '/'; ++end)
+        /* Nothing.  */;
+
+      if (end - start == 0)
+        break;
+      else if (end - start == 1 && start[0] == '.')
+        /* nothing */;
+      else if (end - start == 2 && start[0] == '.' && start[1] == '.')
+        {
+          /* Back up to previous component, ignore if at root already.  */
+          if (dest > rpath + 1)
+            while ((--dest)[-1] != '/');
+        }
+      else
+        {
+          struct stat64 st;
+
+          if (dest[-1] != '/')
+            *dest++ = '/';
+          if (dest + (end - start) >= rpath_limit)
+            {
+              if (resolved)
+                {
+                  __set_errno (ENAMETOOLONG);
+                  if (dest > rpath + 1)
+                    dest--;
+                  *dest = '\0';
+                  return NULL;
+                }
+              ptrdiff_t dest_offset = dest - rpath;
+              size_t new_size = rpath_limit - rpath;
+              if (end - start + 1 > PATH_MAX)
+                new_size += end - start + 1;
+              else
+                new_size += PATH_MAX;
+              char *new_rpath = (char *) realloc (rpath, new_size);
+              if (new_rpath == NULL)
+                return NULL;
+
+              rpath = new_rpath;
+              rpath_limit = rpath + new_size;
+              path_max = new_size;
+
+              dest = rpath + dest_offset;
+            }
+
+          dest = __mempcpy (dest, start, end - start);
+          *dest = '\0';
+
+          if (__lstat64 (rpath, &st) < 0)
+            return NULL;
+
+          if (S_ISLNK (st.st_mode))
+            {
+              if (++num_links > __eloop_threshold ())
+                {
+                  __set_errno (ELOOP);
+                  return NULL;
+                }
+
+              char buf[PATH_MAX];
+              ssize_t n = __readlink (rpath, buf, sizeof (buf) - 1);
+              if (n < 0)
+                return NULL;
+              buf[n] = '\0';
+
+              size_t len = strlen (end);
+              if (path_max - n <= len)
+                {
+                  __set_errno (ENAMETOOLONG);
+                  return NULL;
+                }
+
+              memmove (&extra_buf[n], end, len + 1);
+              name = end = memcpy (extra_buf, buf, n);
+
+              if (buf[0] == '/')
+                dest = rpath + 1; /* It's an absolute symlink */
+              else
+                /* Back up to previous component, ignore if at root already: */
+                if (dest > rpath + 1)
+                  while ((--dest)[-1] != '/');
+            }
+          else if (!S_ISDIR (st.st_mode) && *end != '\0')
+            {
+              __set_errno (ENOTDIR);
+              return NULL;
+            }
+        }
+    }
+
+  if (dest > rpath + 1 && dest[-1] == '/')
+    --dest;
+  *dest = '\0';
+
+  return rpath;
+}
diff --git a/stdlib/canonicalize.c b/stdlib/canonicalize.c
index 554ba221e4..f4ab528a15 100644
--- a/stdlib/canonicalize.c
+++ b/stdlib/canonicalize.c
@@ -16,26 +16,11 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include
 #include
-#include
-#include
-#include
-#include
 #include
-#include
-#include
 
 #include
 
-#ifndef PATH_MAX
-# ifdef MAXPATHLEN
-#  define PATH_MAX MAXPATHLEN
-# else
-#  define PATH_MAX 1024
-# endif
-#endif
-
 /* Return the canonical absolute name of file NAME.  A canonical name
    does not contain any `.', `..' components nor any repeated path
    separators ('/') or symlinks.  All path components must exist.  If
@@ -50,10 +35,6 @@
 char *
 __realpath (const char *name, char *resolved)
 {
-  char *rpath, *dest, extra_buf[PATH_MAX];
-  const char *start, *end, *rpath_limit;
-  int num_links = 0;
-
   if (name == NULL)
     {
       /* As per Single Unix Specification V2 we must return an error if
@@ -72,147 +53,7 @@ __realpath (const char *name, char *resolved)
       return NULL;
     }
 
-  if (resolved == NULL)
-    {
-      rpath = malloc (PATH_MAX);
-      if (rpath == NULL)
-        return NULL;
-    }
-  else
-    rpath = resolved;
-  rpath_limit = rpath + PATH_MAX;
-
-  if (name[0] != '/')
-    {
-      if (!__getcwd (rpath, PATH_MAX))
-        {
-          rpath[0] = '\0';
-          goto error;
-        }
-      dest = __rawmemchr (rpath, '\0');
-    }
-  else
-    {
-      rpath[0] = '/';
-      dest = rpath + 1;
-    }
-
-  for (start = end = name; *start; start = end)
-    {
-      struct stat64 st;
-      int n;
-
-      /* Skip sequence of multiple path-separators.  */
-      while (*start == '/')
-        ++start;
-
-      /* Find end of path component.  */
-      for (end = start; *end && *end != '/'; ++end)
-        /* Nothing.  */;
-
-      if (end - start == 0)
-        break;
-      else if (end - start == 1 && start[0] == '.')
-        /* nothing */;
-      else if (end - start == 2 && start[0] == '.' && start[1] == '.')
-        {
-          /* Back up to previous component, ignore if at root already.  */
-          if (dest > rpath + 1)
-            while ((--dest)[-1] != '/');
-        }
-      else
-        {
-          size_t new_size;
-
-          if (dest[-1] != '/')
-            *dest++ = '/';
-
-          if (dest + (end - start) >= rpath_limit)
-            {
-              ptrdiff_t dest_offset = dest - rpath;
-              char *new_rpath;
-
-              if (resolved)
-                {
-                  __set_errno (ENAMETOOLONG);
-                  if (dest > rpath + 1)
-                    dest--;
-                  *dest = '\0';
-                  goto error;
-                }
-              new_size = rpath_limit - rpath;
-              if (end - start + 1 > PATH_MAX)
-                new_size += end - start + 1;
-              else
-                new_size += PATH_MAX;
-              new_rpath = (char *) realloc (rpath, new_size);
-              if (new_rpath == NULL)
-                goto error;
-              rpath = new_rpath;
-              rpath_limit = rpath + new_size;
-
-              dest = rpath + dest_offset;
-            }
-
-          dest = __mempcpy (dest, start, end - start);
-          *dest = '\0';
-
-          if (__lxstat64 (_STAT_VER, rpath, &st) < 0)
-            goto error;
-
-          if (S_ISLNK (st.st_mode))
-            {
-              char buf[PATH_MAX];
-              size_t len;
-
-              if (++num_links > __eloop_threshold ())
-                {
-                  __set_errno (ELOOP);
-                  goto error;
-                }
-
-              n = __readlink (rpath, buf, sizeof (buf) - 1);
-              if (n < 0)
-                goto error;
-              buf[n] = '\0';
-
-              len = strlen (end);
-              if (PATH_MAX - n <= len)
-                {
-                  __set_errno (ENAMETOOLONG);
-                  goto error;
-                }
-
-              /* Careful here, end may be a pointer into extra_buf...  */
-              memmove (&extra_buf[n], end, len + 1);
-              name = end = memcpy (extra_buf, buf, n);
-
-              if (buf[0] == '/')
-                dest = rpath + 1; /* It's an absolute symlink */
-              else
-                /* Back up to previous component, ignore if at root already: */
-                if (dest > rpath + 1)
-                  while ((--dest)[-1] != '/');
-            }
-          else if (!S_ISDIR (st.st_mode) && *end != '\0')
-            {
-              __set_errno (ENOTDIR);
-              goto error;
-            }
-        }
-    }
-  if (dest > rpath + 1 && dest[-1] == '/')
-    --dest;
-  *dest = '\0';
-
-  assert (resolved == NULL || resolved == rpath);
-  return rpath;
-
-error:
-  assert (resolved == NULL || resolved == rpath);
-  if (resolved == NULL)
-    free (rpath);
-  return NULL;
+  return __realpath_system (name, resolved);
 }
 libc_hidden_def (__realpath)
 versioned_symbol (libc, __realpath, realpath, GLIBC_2_3);
diff --git a/stdlib/realpath.c b/stdlib/realpath.c
new file mode 100644
index 0000000000..d482f900d0
--- /dev/null
+++ b/stdlib/realpath.c
@@ -0,0 +1,43 @@
+/* Return the canonical absolute name of a given file.
+   Copyright (C) 1996-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include
+#include
+#include
+
+char *
+__realpath_system (const char *name, char *resolved)
+{
+  bool resolved_malloc = false;
+  if (resolved == NULL)
+    {
+      resolved = malloc (PATH_MAX);
+      if (resolved == NULL)
+        return NULL;
+      resolved_malloc = true;
+    }
+
+  char *r = __resolve_path (name, resolved);
+  if (r == NULL)
+    {
+      if (resolved_malloc)
+        free (resolved);
+      return NULL;
+    }
+  return r;
+}
diff --git a/sysdeps/unix/sysv/linux/realpath.c b/sysdeps/unix/sysv/linux/realpath.c
new file mode 100644
index 0000000000..0b141e3103
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/realpath.c
@@ -0,0 +1,53 @@
+/* Return the canonical absolute name of a given file.  Linux version.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include
+#include
+#include
+#include
+#include
+
+char *
+__realpath_system (const char *name, char *resolved)
+{
+  int fd = __open64_nocancel (name, O_PATH | O_NONBLOCK | O_CLOEXEC);
+  if (fd == -1)
+    {
+      /* If the call fails with either EACCES or ENOENT and resolved_path is
+         not NULL, then the prefix of path that is not readable or does not
+         exist is returned in resolved_path.  This is a GNU extension.  */
+      if (resolved != NULL)
+        __resolve_path (name, resolved);
+      return NULL;
+    }
+
+  struct fd_to_filename fdfilename;
+  char path[PATH_MAX];
+
+  char *procname = __fd_to_filename (fd, &fdfilename);
+  ssize_t len = __readlink (procname, path, sizeof (path) - 1);
+  if (len < 0)
+    {
+      __close_nocancel_nostatus (fd);
+      return NULL;
+    }
+  path[len] = '\0';
+  __close_nocancel_nostatus (fd);
+
+  return resolved != NULL ? strcpy (resolved, path) : __strdup (path);
+}
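
For readers who want to try the procfs technique outside of glibc, the
following is a minimal user-space sketch of the success path only; it is
not the patch's code.  It assumes a Linux system with /proc mounted.
The name resolve_via_procfs is hypothetical, and the public open,
readlink and close calls stand in for the internal __open64_nocancel,
__fd_to_filename and __close_nocancel_nostatus helpers.  The GNU
extension fallback for EACCES/ENOENT is deliberately left out.

```c
#define _GNU_SOURCE             /* Needed for O_PATH.  */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not part of glibc.  Resolve NAME into OUT, which
   must have room for at least PATH_MAX bytes, by letting the kernel walk
   the path.  Returns OUT on success, or NULL with errno set.  */
char *
resolve_via_procfs (const char *name, char *out)
{
  assert (name != NULL && out != NULL);

  /* The kernel follows every symlink while opening; O_PATH yields a
     descriptor that only identifies the file and cannot be read from.  */
  int fd = open (name, O_PATH | O_NONBLOCK | O_CLOEXEC);
  if (fd == -1)
    return NULL;

  /* "/proc/self/fd/" plus the decimal digits of an int fits easily.  */
  char procname[64];
  snprintf (procname, sizeof procname, "/proc/self/fd/%d", fd);

  /* One readlink returns the already-canonical name.  Note that a result
     longer than PATH_MAX - 1 would be silently truncated here.  */
  ssize_t len = readlink (procname, out, PATH_MAX - 1);
  int saved_errno = errno;
  close (fd);
  if (len < 0)
    {
      errno = saved_errno;
      return NULL;
    }
  out[len] = '\0';
  return out;
}
```

A caller would pass a PATH_MAX-sized buffer, for example
resolve_via_procfs ("/./", buf), and check for a NULL return.  The
sequence is the openat/readlink/close triple counted in the commit
message, and the only large stack object involved is the caller's
buffer.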