From patchwork Thu Apr 13 10:51:04 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pedro Alves X-Patchwork-Id: 20027 Received: (qmail 91570 invoked by alias); 13 Apr 2017 10:51:11 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 91559 invoked by uid 89); 13 Apr 2017 10:51:10 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-24.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, RCVD_IN_SORBS_SPAM autolearn=ham version=3.3.2 spammy=sleep, bang X-HELO: mail-wm0-f41.google.com Received: from mail-wm0-f41.google.com (HELO mail-wm0-f41.google.com) (74.125.82.41) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 13 Apr 2017 10:51:07 +0000 Received: by mail-wm0-f41.google.com with SMTP id t189so44001949wmt.1 for ; Thu, 13 Apr 2017 03:51:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=reeVnxzix8+BgNNNOPhntJVy3j/L4EnNzTSKjXyUS+A=; b=gsLj1ynfBF3b4bgBTYnWz/hyC9cpYk7HREutDfGuNqW1x1mjgupDgFeI99m7OmVwgO B7X7m2S9kdh6n9xaI/gtAIDoYm4XAapvDLwLLPnZPylw5uYqKWuXax7pftqO5LW80M8P 5aFdTrPgn85/ki9R4PxKfG1/+xyH/BXrSuO/IGVDDdYb3AQW3V2RgtxYEf7iVNrDZutz gr5/SRN6+kKO1D0hzsRUdL/S2g+jjzGk1tlsfY5AM5N/Hid2ZhpOtC4FKr+qHBh+xmUY 5ne+VVg7+glbLZy6wy0657CLkQPWHWnrgPaVRgOEQvbOVVIH7JU6li5XjF8Zo1XXz49+ 4Wxw== X-Gm-Message-State: AN3rC/4g3ELxsJg50ZqPU470uGAFVVB8OsAixo/otJtL1dzUb3z/pQFo LvrUk2tM+AgeYWj7YaLQ8Q== X-Received: by 10.28.66.207 with SMTP id k76mr24442495wmi.121.1492080666698; Thu, 13 Apr 2017 03:51:06 -0700 (PDT) Received: from [192.168.0.101] ([37.189.166.198]) by smtp.gmail.com with ESMTPSA id m21sm10023147wma.19.2017.04.13.03.51.05 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Apr 2017 03:51:05 -0700 (PDT) Subject: Re: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string copying (Re: [PATCH v5 3/5] C++-fy and prepare for sharing fork_inferior) To: Sergio Durigan Junior References: <1482464361-4068-1-git-send-email-sergiodj@redhat.com> <20170330014915.4894-1-sergiodj@redhat.com> <20170330014915.4894-4-sergiodj@redhat.com> <877f2q3223.fsf@redhat.com> <87o9w2z068.fsf@redhat.com> <36f28afa-1174-df6a-cb54-fdcf129fa816@redhat.com> <878tn5w9d7.fsf@redhat.com> <01d7a597-ac60-9764-142b-bcaefd269271@redhat.com> <87y3v5szc6.fsf@redhat.com> Cc: GDB Patches From: Pedro Alves Message-ID: Date: Thu, 13 Apr 2017 11:51:04 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <87y3v5szc6.fsf@redhat.com> On 04/13/2017 05:31 AM, Sergio Durigan Junior wrote: > On Wednesday, April 12 2017, Pedro Alves wrote: > Missing comments on the functions. Yup, it was a prototype patch, it was too late already. I'd either finish it or not sleep. :-) >> + /* The argument vector built. Holds non-owning pointers. */ >> + std::vector m_argv; > > OOC, I notice you're prefixing a lot (if not all) of your class > variables with "m_". I understand this is a code convention to signal > when a variable is a member of a class, but are we following this > officially? Because (again!) I'm not really fond of this :-). Yes, we are: https://sourceware.org/gdb/wiki/Internals%20GDB-C-Coding-Standards#C.2B-.2B--specific_coding_conventions => https://gcc.gnu.org/codingconventions.html "When structs and/or classes have member functions, prefer to name data members with a leading m_ and static data members with a leading s_. " > >> + /* Storage. Arguments in C_ARGV point inside these. */ >> + >> + /* For the !shell case. */ >> + std::string m_broke_up_args; > > If we're going this way, then I'd like to see this comment expanded to > explain how the args are split, i.e., using \0 as the delimiter. Done. > Missing a comment here explaining the function below (or pointing to the > right place). Done. >> @@ -155,7 +308,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs, >> static const char *exec_file; >> char **save_our_env; >> int shell = 0; >> - std::vector argv; > > I believe you can also remove the 'static' keyword from the 'shell_file' > declaration. And exec_file too. It's not entirely clear to me why they were ever made static. The rationale about vfork looks bogus, since the child does not modify these. Anyway, better leave that to a separate patch, IMO. > Good code cleanup! > > BTW, it is possible to simplify this part and remove the "shell" > variable completely, using only "shell_file". I did this. > > Otherwise, LGTM. > > Thanks for doing this, Thanks. Below's what I pushed, after another round of testing. From 808480f667e41e2fdb66bfdc9d5e047f1aa34a68 Mon Sep 17 00:00:00 2001 From: Pedro Alves Date: Thu, 13 Apr 2017 11:46:07 +0100 Subject: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string copying The previous change to fork-child.c converted the argv building from an alloca-allocated array of non-owning arg pointers, to a std::vector of owning pointers, which results in N string dups, with N being the number of arguments in the vector, and then requires manually releasing the pointers owned by the vector. This patch makes the vector hold non-owning pointers, and avoids the string dups, by doing one single string copy of the arguments upfront, and replacing separators with NULL terminators in place, like we used to. All the logic to do that is encapsulated in a new class. With this, there's no need to remember to manually release the argv elements with free_vector_argv either. gdb/ChangeLog: 2017-04-13 Pedro Alves * fork-child.c (execv_argv): New class. (breakup_args): Refactored as ... (execv_argv::init_for_no_shell): .. this method of execv_argv. Copy arguments to storage and replace separators with NULL terminators in place. (escape_bang_in_quoted_argument): Adjust to return bool. (execv_argv::execv_argv): New ctor. (execv_argv::init_for_shell): New method, factored out from fork_inferior. Don't strdup strings into the vector. (fork_inferior): Eliminate "shell" local and use execv_argv. Use Remove free_vector_argv call. --- gdb/ChangeLog | 14 +++ gdb/fork-child.c | 317 +++++++++++++++++++++++++++++++++++-------------------- 2 files changed, 215 insertions(+), 116 deletions(-) diff --git a/gdb/ChangeLog b/gdb/ChangeLog index ed71880..90ed21c 100644 --- a/gdb/ChangeLog +++ b/gdb/ChangeLog @@ -1,3 +1,17 @@ +2017-04-13 Pedro Alves + + * fork-child.c (execv_argv): New class. + (breakup_args): Refactored as ... + (execv_argv::init_for_no_shell): .. this method of execv_argv. + Copy arguments to storage and replace separators with NULL + terminators in place. + (escape_bang_in_quoted_argument): Adjust to return bool. + (execv_argv::execv_argv): New ctor. + (execv_argv::init_for_shell): New method, factored out from + fork_inferior. Don't strdup strings into the vector. + (fork_inferior): Eliminate "shell" local and use execv_argv. Use + Remove free_vector_argv call. + 2017-04-13 Yao Qi * rx-tdep.c (rx_fpsw_type): Check tdep->rx_fpsw_type instead of diff --git a/gdb/fork-child.c b/gdb/fork-child.c index 6b7386e..11ffee9 100644 --- a/gdb/fork-child.c +++ b/gdb/fork-child.c @@ -43,62 +43,235 @@ extern char **environ; static char *exec_wrapper; -/* Break up SCRATCH into an argument vector suitable for passing to - execvp and store it in ARGV. E.g., on "run a b c d" this routine - would get as input the string "a b c d", and as output it would - fill in ARGV with the four arguments "a", "b", "c", "d". */ +/* Build the argument vector for execv(3). */ -static void -breakup_args (const std::string &scratch, std::vector &argv) +class execv_argv +{ +public: + /* EXEC_FILE is the file to run. ALLARGS is a string containing the + arguments to the program. If starting with a shell, SHELL_FILE + is the shell to run. Otherwise, SHELL_FILE is NULL. */ + execv_argv (const char *exec_file, const std::string &allargs, + const char *shell_file); + + /* Return a pointer to the built argv, in the type expected by + execv. The result is (only) valid for as long as this execv_argv + object is live. We return a "char **" because that's the type + that the execv functions expect. Note that it is guaranteed that + the execv functions do not modify the argv[] array nor the + strings to which the array point. */ + char **argv () + { + return const_cast (&m_argv[0]); + } + +private: + /* Disable copying. */ + execv_argv (const execv_argv &) = delete; + void operator= (const execv_argv &) = delete; + + /* Helper methods for constructing the argument vector. */ + + /* Used when building an argv for a straight execv call, without + going via the shell. */ + void init_for_no_shell (const char *exec_file, + const std::string &allargs); + + /* Used when building an argv for execing a shell that execs the + child program. */ + void init_for_shell (const char *exec_file, + const std::string &allargs, + const char *shell_file); + + /* The argument vector built. Holds non-owning pointers. Elements + either point to the strings passed to the execv_argv ctor, or + inside M_STORAGE. */ + std::vector m_argv; + + /* Storage. In the no-shell case, this contains a copy of the + arguments passed to the ctor, split by '\0'. In the shell case, + this contains the quoted shell command. I.e., SHELL_COMMAND in + {"$SHELL" "-c", SHELL_COMMAND, NULL}. */ + std::string m_storage; +}; + +/* Create argument vector for straight call to execvp. Breaks up + ALLARGS into an argument vector suitable for passing to execvp and + stores it in M_ARGV. E.g., on "run a b c d" this routine would get + as input the string "a b c d", and as output it would fill in + M_ARGV with the four arguments "a", "b", "c", "d". Each argument + in M_ARGV points to a substring of a copy of ALLARGS stored in + M_STORAGE. */ + +void +execv_argv::init_for_no_shell (const char *exec_file, + const std::string &allargs) { - for (size_t cur_pos = 0; cur_pos < scratch.size ();) + + /* Save/work with a copy stored in our storage. The pointers pushed + to M_ARGV point directly into M_STORAGE, which is modified in + place with the necessary NULL terminators. This avoids N heap + allocations and string dups when 1 is sufficient. */ + std::string &args_copy = m_storage = allargs; + + m_argv.push_back (exec_file); + + for (size_t cur_pos = 0; cur_pos < args_copy.size ();) { /* Skip whitespace-like chars. */ - std::size_t pos = scratch.find_first_not_of (" \t\n", cur_pos); + std::size_t pos = args_copy.find_first_not_of (" \t\n", cur_pos); if (pos != std::string::npos) cur_pos = pos; /* Find the position of the next separator. */ - std::size_t next_sep = scratch.find_first_of (" \t\n", cur_pos); + std::size_t next_sep = args_copy.find_first_of (" \t\n", cur_pos); - /* No separator found, which means this is the last - argument. */ if (next_sep == std::string::npos) - next_sep = scratch.size (); + { + /* No separator found, which means this is the last + argument. */ + next_sep = args_copy.size (); + } + else + { + /* Replace the separator with a terminator. */ + args_copy[next_sep++] = '\0'; + } - char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos); - argv.push_back (arg); + m_argv.push_back (&args_copy[cur_pos]); cur_pos = next_sep; } /* NULL-terminate the vector. */ - argv.push_back (NULL); + m_argv.push_back (NULL); } -/* When executing a command under the given shell, return non-zero if - the '!' character should be escaped when embedded in a quoted +/* When executing a command under the given shell, return true if the + '!' character should be escaped when embedded in a quoted command-line argument. */ -static int +static bool escape_bang_in_quoted_argument (const char *shell_file) { - const int shell_file_len = strlen (shell_file); + size_t shell_file_len = strlen (shell_file); /* Bang should be escaped only in C Shells. For now, simply check that the shell name ends with 'csh', which covers at least csh and tcsh. This should be good enough for now. */ if (shell_file_len < 3) - return 0; + return false; if (shell_file[shell_file_len - 3] == 'c' && shell_file[shell_file_len - 2] == 's' && shell_file[shell_file_len - 1] == 'h') - return 1; + return true; - return 0; + return false; +} + +/* See declaration. */ + +execv_argv::execv_argv (const char *exec_file, + const std::string &allargs, + const char *shell_file) +{ + if (shell_file == NULL) + init_for_no_shell (exec_file, allargs); + else + init_for_shell (exec_file, allargs, shell_file); +} + +/* See declaration. */ + +void +execv_argv::init_for_shell (const char *exec_file, + const std::string &allargs, + const char *shell_file) +{ + /* We're going to call a shell. */ + bool escape_bang = escape_bang_in_quoted_argument (shell_file); + + /* We need to build a new shell command string, and make argv point + to it. So build it in the storage. */ + std::string &shell_command = m_storage; + + shell_command = "exec "; + + /* Add any exec wrapper. That may be a program name with arguments, + so the user must handle quoting. */ + if (exec_wrapper) + { + shell_command += exec_wrapper; + shell_command += ' '; + } + + /* Now add exec_file, quoting as necessary. */ + + /* Quoting in this style is said to work with all shells. But csh + on IRIX 4.0.1 can't deal with it. So we only quote it if we need + to. */ + bool need_to_quote; + const char *p = exec_file; + while (1) + { + switch (*p) + { + case '\'': + case '!': + case '"': + case '(': + case ')': + case '$': + case '&': + case ';': + case '<': + case '>': + case ' ': + case '\n': + case '\t': + need_to_quote = true; + goto end_scan; + + case '\0': + need_to_quote = false; + goto end_scan; + + default: + break; + } + ++p; + } + end_scan: + if (need_to_quote) + { + shell_command += '\''; + for (p = exec_file; *p != '\0'; ++p) + { + if (*p == '\'') + shell_command += "'\\''"; + else if (*p == '!' && escape_bang) + shell_command += "\\!"; + else + shell_command += *p; + } + shell_command += '\''; + } + else + shell_command += exec_file; + + shell_command += ' ' + allargs; + + /* If we decided above to start up with a shell, we exec the shell. + "-c" says to interpret the next arg as a shell command to + execute, and this command is "exec ". */ + m_argv.reserve (4); + m_argv.push_back (shell_file); + m_argv.push_back ("-c"); + m_argv.push_back (shell_command.c_str ()); + m_argv.push_back (NULL); } /* See inferior.h. */ @@ -154,8 +327,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs, static char *shell_file; static const char *exec_file; char **save_our_env; - int shell = 0; - std::vector argv; const char *inferior_io_terminal = get_inferior_io_terminal (); struct inferior *inf; int i; @@ -172,106 +343,20 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs, /* 'startup_with_shell' is declared in inferior.h and bound to the "set startup-with-shell" option. If 0, we'll just do a fork/exec, no shell, so don't bother figuring out what shell. */ - shell_file = shell_file_arg; if (startup_with_shell) { + shell_file = shell_file_arg; /* Figure out what shell to start up the user program under. */ if (shell_file == NULL) shell_file = getenv ("SHELL"); if (shell_file == NULL) shell_file = default_shell_file; - shell = 1; - } - - if (!shell) - { - /* We're going to call execvp. Create argument vector. */ - argv.push_back (xstrdup (exec_file)); - breakup_args (allargs, argv); } else - { - /* We're going to call a shell. */ - std::string shell_command; - const char *p; - int need_to_quote; - const int escape_bang = escape_bang_in_quoted_argument (shell_file); - - shell_command = std::string ("exec "); - - /* Add any exec wrapper. That may be a program name with arguments, so - the user must handle quoting. */ - if (exec_wrapper) - { - shell_command += exec_wrapper; - shell_command += ' '; - } + shell_file = NULL; - /* Now add exec_file, quoting as necessary. */ - - /* Quoting in this style is said to work with all shells. But - csh on IRIX 4.0.1 can't deal with it. So we only quote it if - we need to. */ - p = exec_file; - while (1) - { - switch (*p) - { - case '\'': - case '!': - case '"': - case '(': - case ')': - case '$': - case '&': - case ';': - case '<': - case '>': - case ' ': - case '\n': - case '\t': - need_to_quote = 1; - goto end_scan; - - case '\0': - need_to_quote = 0; - goto end_scan; - - default: - break; - } - ++p; - } - end_scan: - if (need_to_quote) - { - shell_command += '\''; - for (p = exec_file; *p != '\0'; ++p) - { - if (*p == '\'') - shell_command += "'\\''"; - else if (*p == '!' && escape_bang) - shell_command += "\\!"; - else - shell_command += *p; - } - shell_command += '\''; - } - else - shell_command += exec_file; - - shell_command += " " + allargs; - - /* If we decided above to start up with a shell, we exec the - shell, "-c" says to interpret the next arg as a shell command - to execute, and this command is "exec - ". We xstrdup all the strings here because they will - be free'd later in the code. */ - argv.push_back (xstrdup (shell_file)); - argv.push_back (xstrdup ("-c")); - argv.push_back (xstrdup (shell_command.c_str ())); - argv.push_back (NULL); - } + /* Build the argument vector. */ + execv_argv child_argv (exec_file, allargs, shell_file); /* Retain a copy of our environment variables, since the child will replace the value of environ and if we're vforked, we have to @@ -376,6 +461,8 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs, path to find $SHELL. Rich Pixley says so, and I agree. */ environ = env; + char **argv = child_argv.argv (); + if (exec_fun != NULL) (*exec_fun) (argv[0], &argv[0], env); else @@ -393,8 +480,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs, _exit (0177); } - free_vector_argv (argv); - /* Restore our environment in case a vforked child clob'd it. */ environ = save_our_env;