From patchwork Thu Apr 13 03:42:38 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pedro Alves
X-Patchwork-Id: 20013
Subject: Re: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string copying (Re: [PATCH v5 3/5] C++-fy and prepare for sharing fork_inferior)
To: Sergio Durigan Junior
References: <1482464361-4068-1-git-send-email-sergiodj@redhat.com> <20170330014915.4894-1-sergiodj@redhat.com> <20170330014915.4894-4-sergiodj@redhat.com> <877f2q3223.fsf@redhat.com> <87o9w2z068.fsf@redhat.com> <36f28afa-1174-df6a-cb54-fdcf129fa816@redhat.com> <878tn5w9d7.fsf@redhat.com>
Cc: GDB Patches
From: Pedro Alves
Message-ID: <01d7a597-ac60-9764-142b-bcaefd269271@redhat.com>
Date: Thu, 13 Apr 2017 04:42:38 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <878tn5w9d7.fsf@redhat.com>

On 04/12/2017 11:26 PM, Sergio Durigan Junior wrote:
> On Wednesday, April 12 2017, Pedro Alves wrote:
>
>> On 04/12/2017 06:04 AM, Sergio Durigan Junior wrote:
>>
>>>>>>  static void
>>>>>> -breakup_args (char *scratch, char **argv)
>>>>>> +breakup_args (const std::string &scratch, std::vector<char *> &argv)
>>>>>>  {
>>>>>
>>>>> ...
>>>>>
>>>>>> +
>>>>>> +      std::string arg = scratch.substr (cur_pos, next_sep - cur_pos);
>>>>>> +
>>>>>
>>>>> This creates a temporary string (heap allocates) ...
>>>>>
>>>>>> +      argv.push_back (xstrdup (arg.c_str ()));
>>>>>
>>>>> ... and here you create yet another copy.
>>>>>
>>>>> You should be able to avoid it by using e.g., savestring:
>>>>>
>>>>>     char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos);
>>>>>     argv.push_back (arg);
>>>>
>>>> Fair enough. I had my mind on "C++-only mode" when writing this code.
>>
>> Yup, in C++, it's good to keep unnecessary temporaries
>> and hidden heap allocations in mind. Actually, now that I look a bit
>> deeper, I think we can avoid a "premature pessimization" here, keeping
>> the same level of clarity.
>>
>> I think it'd be good to push this patch below. WDYT?
>
> Hm, I don't know. I mean, I understand where you're coming from and
> what you're trying to achieve, but somehow I thought the old code
> (especially the part that modified the argument string in place, using
> \0's as separators) was somewhat hacky. Even though the new code uses
> more memory, it is also more readable, at least IMNSHO. And also more
> const-correct, since we can make breakup_args_for_exec receive a const
> as the first argument.

There's nothing wrong with writing to a string in place. A
std::string is just a _container_ of chars with convenience methods.
It's been designed to support embedded \0s. And it's not a problem
for a function to take a copy of a value if it needs to modify it!

Const-correctness is a concern when functions that _don't_ need to
modify an argument nevertheless declare it as a non-const
reference/pointer. That isn't the case here.

Effectively, you can't avoid creating _some_ new string(s). But
instead of taking one single copy (O(1) allocations), the current
code is making many (N) tiny copies. Trips to the generic allocator /
malloc should be one of the first things to avoid if easy to avoid, as
a principle.

The patch I had posted is exactly equivalent to the one below, except
the state/storage in this one is moved to a class. Maybe you'd find
this clearer?

(It'd need some further cleaning up, for comments at least, but it's
getting quite late here...)

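To make the single-copy idea concrete outside of GDB, here is a minimal standalone sketch. It is not the patch itself: the class name split_args and its interface are made up for illustration, and unlike the loop in the patch it stops as soon as only separators remain, which is a choice of this sketch. It takes one copy of the argument string, overwrites each separator with '\0' in place, and stores non-owning pointers into that copy.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of GDB): owns ONE copy of the
// argument string and hands out non-owning pointers into it.
class split_args
{
public:
  explicit split_args (const std::string &allargs)
    : m_storage (allargs)	// The single copy / heap allocation.
  {
    static const char seps[] = " \t\n";
    std::size_t pos = 0;

    // Skip separators; stop when nothing but separators is left.
    while ((pos = m_storage.find_first_not_of (seps, pos))
	   != std::string::npos)
      {
	// Non-owning pointer to the start of this argument.
	m_argv.push_back (&m_storage[pos]);

	std::size_t sep = m_storage.find_first_of (seps, pos);
	if (sep == std::string::npos)
	  break;		// Last argument; already NUL-terminated.

	// Terminate the argument in place.
	m_storage[sep] = '\0';
	pos = sep + 1;
      }

    // NULL-terminate the vector, as the exec functions expect.
    m_argv.push_back (nullptr);
  }

  // A copy would leave M_ARGV pointing into the other object's
  // storage, so forbid copying (this also suppresses moves).
  split_args (const split_args &) = delete;
  split_args &operator= (const split_args &) = delete;

  const std::vector<const char *> &argv () const { return m_argv; }

private:
  std::string m_storage;
  std::vector<const char *> m_argv;
};
```

Constructing split_args from "a b c d" yields pointers to "a", "b", "c", "d" followed by a null pointer, with no per-argument allocation and nothing to free by hand; the pointers stay valid for as long as the object lives.
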
From 689a881839f9d4aeb36695676856db2eaf8e97ab Mon Sep 17 00:00:00 2001
From: Pedro Alves
Date: Thu, 13 Apr 2017 04:13:22 +0100
Subject: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string copying

The previous patch converted the argv building from an
alloca-allocated array of non-owning arg pointers, to a std::vector of
owning pointers, which results in N string dups, with N being the
number of arguments in the vector, and then requires manually
releasing the pointers owned by the vector.

This patch makes the vector hold non-owning pointers, and avoids the
string dups, by doing one single string copy of the arguments upfront,
and replacing separators with NUL terminators in place, like we used
to.  With this, there's no need to remember to call free_vector_argv
either.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves

        * fork-child.c (breakup_args): Make 'scratch' non-const.  Replace
        separators with NUL terminators in place.  Change type of vector.
        (fork_inferior): The argument vector now holds non-owning pointers.
        Don't strdup strings into the vector.  Remove free_vector_argv call.
---
 gdb/fork-child.c | 278 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 172 insertions(+), 106 deletions(-)

diff --git a/gdb/fork-child.c b/gdb/fork-child.c
index 6b7386e..91986f7 100644
--- a/gdb/fork-child.c
+++ b/gdb/fork-child.c
@@ -43,38 +43,96 @@ extern char **environ;
 
 static char *exec_wrapper;
 
-/* Break up SCRATCH into an argument vector suitable for passing to
-   execvp and store it in ARGV.  E.g., on "run a b c d" this routine
+/* Build the argument vector for execv.  */
+
+class execv_argv_builder
+{
+public:
+  /* EXEC_FILE is the file to run.  ALLARGS is a string containing the
+     arguments to the program.  If starting with a shell, SHELL_FILE
+     is the shell to run.  Otherwise, SHELL_FILE is NULL.  */
+  execv_argv_builder (const char *exec_file,
+                      const std::string &allargs,
+                      const char *shell_file);
+
+  /* Return a pointer to the built argv, in the type expected by
+     execv.  */
+  char **argv ()
+  {
+    /* It is guaranteed that the exec functions do not modify the
+       arguments, but they nevertheless expect "char **", so that's
+       the type we return.  */
+    return const_cast<char **> (&m_argv[0]);
+  }
+
+private:
+  void build_no_shell (const char *exec_file,
+                       const std::string &allargs);
+
+  void build_shell (const char *exec_file,
+                    const std::string &allargs,
+                    const char *shell_file);
+
+  /* The argument vector built.  Holds non-owning pointers.  */
+  std::vector<const char *> m_argv;
+
+  /* Storage.  Arguments in M_ARGV point inside these.  */
+
+  /* For the !shell case.  */
+  std::string m_broke_up_args;
+
+  /* For the shell case.  */
+  std::string m_shell_command;
+};
+
+/* Break up allargs into an argument vector suitable for passing to
+   execvp and store it in M_ARGV.  E.g., on "run a b c d" this routine
    would get as input the string "a b c d", and as output it would
-   fill in ARGV with the four arguments "a", "b", "c", "d".  */
+   fill in M_ARGV with the four arguments "a", "b", "c", "d".  */
 
-static void
-breakup_args (const std::string &scratch, std::vector<char *> &argv)
+void
+execv_argv_builder::build_no_shell (const char *exec_file,
+                                    const std::string &allargs)
 {
-  for (size_t cur_pos = 0; cur_pos < scratch.size ();)
+  /* We're going to call execvp.  Create argument vector.  */
+
+  /* Save/work with a copy.  The pointers pushed to M_ARGV point
+     directly into M_BROKE_UP_ARGS, which is modified in place with
+     the necessary NUL terminators.  */
+  m_broke_up_args = allargs;
+
+  m_argv.push_back (exec_file);
+
+  for (size_t cur_pos = 0; cur_pos < m_broke_up_args.size ();)
     {
       /* Skip whitespace-like chars.  */
-      std::size_t pos = scratch.find_first_not_of (" \t\n", cur_pos);
+      std::size_t pos = m_broke_up_args.find_first_not_of (" \t\n", cur_pos);
 
       if (pos != std::string::npos)
         cur_pos = pos;
 
       /* Find the position of the next separator.  */
-      std::size_t next_sep = scratch.find_first_of (" \t\n", cur_pos);
+      std::size_t next_sep = m_broke_up_args.find_first_of (" \t\n", cur_pos);
 
-      /* No separator found, which means this is the last
-         argument.  */
       if (next_sep == std::string::npos)
-        next_sep = scratch.size ();
+        {
+          /* No separator found, which means this is the last
+             argument.  */
+          next_sep = m_broke_up_args.size ();
+        }
+      else
+        {
+          /* Replace the separator with a terminator.  */
+          m_broke_up_args[next_sep++] = '\0';
+        }
 
-      char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos);
-      argv.push_back (arg);
+      m_argv.push_back (&m_broke_up_args[cur_pos]);
 
       cur_pos = next_sep;
     }
 
   /* NULL-terminate the vector.  */
-  argv.push_back (NULL);
+  m_argv.push_back (NULL);
 }
 
 /* When executing a command under the given shell, return non-zero if
@@ -101,6 +159,101 @@ escape_bang_in_quoted_argument (const char *shell_file)
   return 0;
 }
 
+execv_argv_builder::execv_argv_builder (const char *exec_file,
+                                        const std::string &allargs,
+                                        const char *shell_file)
+{
+  if (shell_file == NULL)
+    build_no_shell (exec_file, allargs);
+  else
+    build_shell (exec_file, allargs, shell_file);
+}
+
+void
+execv_argv_builder::build_shell (const char *exec_file,
+                                 const std::string &allargs,
+                                 const char *shell_file)
+{
+  /* We're going to call a shell.  */
+  const char *p;
+  int need_to_quote;
+  const int escape_bang = escape_bang_in_quoted_argument (shell_file);
+
+  m_shell_command = "exec ";
+
+  /* Add any exec wrapper.  That may be a program name with arguments,
+     so the user must handle quoting.  */
+  if (exec_wrapper)
+    {
+      m_shell_command += exec_wrapper;
+      m_shell_command += ' ';
+    }
+
+  /* Now add exec_file, quoting as necessary.  */
+
+  /* Quoting in this style is said to work with all shells.  But csh
+     on IRIX 4.0.1 can't deal with it.  So we only quote it if we need
+     to.  */
+  p = exec_file;
+  while (1)
+    {
+      switch (*p)
+        {
+        case '\'':
+        case '!':
+        case '"':
+        case '(':
+        case ')':
+        case '$':
+        case '&':
+        case ';':
+        case '<':
+        case '>':
+        case ' ':
+        case '\n':
+        case '\t':
+          need_to_quote = 1;
+          goto end_scan;
+
+        case '\0':
+          need_to_quote = 0;
+          goto end_scan;
+
+        default:
+          break;
+        }
+      ++p;
+    }
+ end_scan:
+  if (need_to_quote)
+    {
+      m_shell_command += '\'';
+      for (p = exec_file; *p != '\0'; ++p)
+        {
+          if (*p == '\'')
+            m_shell_command += "'\\''";
+          else if (*p == '!' && escape_bang)
+            m_shell_command += "\\!";
+          else
+            m_shell_command += *p;
+        }
+      m_shell_command += '\'';
+    }
+  else
+    m_shell_command += exec_file;
+
+  m_shell_command += " " + allargs;
+
+  /* If we decided above to start up with a shell, we exec the shell,
+     "-c" says to interpret the next arg as a shell command to
+     execute, and this command is "exec ".  */
+  m_argv.reserve (4);
+  m_argv.push_back (shell_file);
+  m_argv.push_back ("-c");
+  m_argv.push_back (m_shell_command.c_str ());
+  m_argv.push_back (NULL);
+}
+
 /* See inferior.h.  */
 
 void
@@ -155,7 +308,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
   static const char *exec_file;
   char **save_our_env;
   int shell = 0;
-  std::vector<char *> argv;
   const char *inferior_io_terminal = get_inferior_io_terminal ();
   struct inferior *inf;
   int i;
@@ -183,95 +335,9 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
       shell = 1;
     }
 
-  if (!shell)
-    {
-      /* We're going to call execvp.  Create argument vector.  */
-      argv.push_back (xstrdup (exec_file));
-      breakup_args (allargs, argv);
-    }
-  else
-    {
-      /* We're going to call a shell.  */
-      std::string shell_command;
-      const char *p;
-      int need_to_quote;
-      const int escape_bang = escape_bang_in_quoted_argument (shell_file);
-
-      shell_command = std::string ("exec ");
-
-      /* Add any exec wrapper.  That may be a program name with arguments, so
-         the user must handle quoting.  */
-      if (exec_wrapper)
-        {
-          shell_command += exec_wrapper;
-          shell_command += ' ';
-        }
-
-      /* Now add exec_file, quoting as necessary.  */
-
-      /* Quoting in this style is said to work with all shells.  But
-         csh on IRIX 4.0.1 can't deal with it.  So we only quote it if
-         we need to.  */
-      p = exec_file;
-      while (1)
-        {
-          switch (*p)
-            {
-            case '\'':
-            case '!':
-            case '"':
-            case '(':
-            case ')':
-            case '$':
-            case '&':
-            case ';':
-            case '<':
-            case '>':
-            case ' ':
-            case '\n':
-            case '\t':
-              need_to_quote = 1;
-              goto end_scan;
-
-            case '\0':
-              need_to_quote = 0;
-              goto end_scan;
-
-            default:
-              break;
-            }
-          ++p;
-        }
-    end_scan:
-      if (need_to_quote)
-        {
-          shell_command += '\'';
-          for (p = exec_file; *p != '\0'; ++p)
-            {
-              if (*p == '\'')
-                shell_command += "'\\''";
-              else if (*p == '!' && escape_bang)
-                shell_command += "\\!";
-              else
-                shell_command += *p;
-            }
-          shell_command += '\'';
-        }
-      else
-        shell_command += exec_file;
-
-      shell_command += " " + allargs;
-
-      /* If we decided above to start up with a shell, we exec the
-         shell, "-c" says to interpret the next arg as a shell command
-         to execute, and this command is "exec
-         ".  We xstrdup all the strings here because they will
-         be free'd later in the code.  */
-      argv.push_back (xstrdup (shell_file));
-      argv.push_back (xstrdup ("-c"));
-      argv.push_back (xstrdup (shell_command.c_str ()));
-      argv.push_back (NULL);
-    }
+  /* Build the argument vector.  */
+  execv_argv_builder argv_builder (exec_file, allargs,
+                                   shell ? shell_file : NULL);
 
   /* Retain a copy of our environment variables, since the child will
      replace the value of environ and if we're vforked, we have to
@@ -376,6 +442,8 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
          path to find $SHELL.  Rich Pixley says so, and I agree.  */
       environ = env;
 
+      char **argv = argv_builder.argv ();
+
       if (exec_fun != NULL)
         (*exec_fun) (argv[0], &argv[0], env);
       else
@@ -393,8 +461,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
       _exit (0177);
     }
 
-  free_vector_argv (argv);
-
   /* Restore our environment in case a vforked child clob'd it.  */
   environ = save_our_env;
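
For anyone wanting to exercise the exec_file quoting rule from build_shell in isolation, here is a rough standalone sketch. The function name quote_for_shell and its bool parameter are made up for illustration; GDB has no such function. It follows the same rule as the loop in the patch: leave the name alone unless it contains one of the special characters, otherwise wrap it in single quotes, spell an embedded ' as '\'' and, if requested, put a backslash before each !.

```cpp
#include <string>

// Hypothetical free function (not in GDB) mirroring the quoting
// rule that build_shell applies to EXEC_FILE.
std::string
quote_for_shell (const std::string &file, bool escape_bang)
{
  // Only quote if the name contains one of the characters that the
  // switch statement in the patch checks for.
  if (file.find_first_of ("'!\"()$&;<> \n\t") == std::string::npos)
    return file;

  std::string out = "'";
  for (char c : file)
    {
      if (c == '\'')
	out += "'\\''";		// Close quote, escaped ', reopen quote.
      else if (c == '!' && escape_bang)
	out += "\\!";		// For shells that expand ! inside quotes.
      else
	out += c;
    }
  out += '\'';
  return out;
}
```

With this sketch, a plain path such as /bin/ls comes back unchanged, while a name containing a space comes back wrapped in single quotes.
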