From patchwork Mon Apr 1 21:26:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Yvon Zimmermann X-Patchwork-Id: 87897 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7BF8B3858409 for ; Mon, 1 Apr 2024 21:27:10 +0000 (GMT) X-Original-To: gdb-patches@sourceware.org Delivered-To: gdb-patches@sourceware.org Received: from mailrelay.tu-berlin.de (mailrelay.tu-berlin.de [130.149.7.70]) by sourceware.org (Postfix) with ESMTPS id A74443858D20 for ; Mon, 1 Apr 2024 21:26:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A74443858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=campus.tu-berlin.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=campus.tu-berlin.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A74443858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=130.149.7.70 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712006806; cv=none; b=Zyj99csPyvxJl6dZW0fPWSgp9gNv10feUaM96PvxYUmapd70nSE4MBeV39wYKbtqO2bOcOZn24hqF4hG3jUscENlkw1s1fzK2HE7sGJ4kOnRyI5LqvK1SA9nqQ4JN3oyAH25QBt+iPg8yeqIAXzsUxQhZ2Yy+LNGc5eQa61tXtg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712006806; c=relaxed/simple; bh=oRdQUWMSMon0HtuJFIDGSNneJir2rFgMttUMTWAcW4U=; h=DKIM-Signature:Message-ID:Date:MIME-Version:To:From:Subject; b=OQO91QFi5j4XY+YBWEKAXLi39Lw4jQQ1xvlVC17ojfeNBAVg+aB/a1CEHudkAFIDFe+uFiRF3fiud+64OifkDz8G3sLXgDO0jZlOBwJOLweKKetuOY0aMTudEEblETk2PaC3kXX2YNiauAftbMRXShFADJrP9QKWHMhN//6owZI= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tu-berlin.de; l=18583; s=dkim-tub; t=1712006801; h=message-id:date:mime-version:to:from:subject: content-transfer-encoding; bh=0KfAc5OE82TlG9z5iaVQ5CF0gBi21RBWyRQSUf6nrcU=; b=fJkumsvVGlgQ7XGyznul+/efLROo/i2K9LSN/qaTBlAB3fMzRwXxAttf V5Xf4zKNVIc08d9gnGmcojlHzC+tvcP4gmU/Z/8i8RjRsldvuzONVFuSF ZbjNjc90zef2ScmpgkuJqqvWGzI82QfphG8P8vN9cENEkaOQMBuhNzA9C 4=; X-CSE-ConnectionGUID: PjhNFnHbQueWb9b5BacHvA== X-CSE-MsgGUID: uf4Brm41TuKb9IQUlVxkaA== X-IronPort-AV: E=Sophos;i="6.07,173,1708383600"; d="scan'208";a="23294505" Received: from bulkmail.tu-berlin.de (HELO mail.tu-berlin.de) ([141.23.12.143]) by mailrelay.tu-berlin.de with ESMTP; 01 Apr 2024 23:26:39 +0200 Message-ID: <5ff720ec-4453-441e-ba82-84d60a2e8d50@campus.tu-berlin.de> Date: Mon, 1 Apr 2024 23:26:36 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US To: From: Max Yvon Zimmermann Subject: [PATCH] Add wildcard matching to substitute-path rules X-Spam-Status: No, score=-11.5 required=5.0 tests=AC_FROM_MANY_DOTS, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org Hi everyone! I have developed a patch that allows for glob pattern matching within substitute-path rules. Why is this useful? Some build systems and/or package managers may copy the sources of a program before build. The path of this copy may not always be the same. In the case of conan (https://conan.io/) the path will consist of some constant part and some directory named with some hash value (i.e., /path/to//builddir/). This hash will differ with each build, and therefore any source redirection within GDB must be reconfigured. This is, of course, really annoying when debugging a library. You might write a custom startup script for GDB to set the correct path automatically. But I thought it would be much more elegant if GDB would just support glob patterns (so that the rule /path/to/*/builddir/ would match any relevant build directory). How does the patch work? I wrote a custom glob matching function gdb_fileprefix_fnmatch() that is called from substitute_path_rule_matches() (maybe this function should be renamed because its meaning is slightly different now). It is located in utils.c. If you think this function should be defined at another location, let me know. I considered just using fnmatch(), but I found that this is not viable for the problem at hand. fnmatch() matches a complete string, but I need to find the longest leading substring (and its length) that matches a given pattern. So calling fnmatch() multiple times would have quadratic complexity. I also didn't want to add any additional dependencies. Instead, I was inspired by git's implementation (https://github.com/git/git/blob/master/wildmatch.c) as well as fnmatch() from libiberty. It is recursive, but given that usually only a few (often just one) wildcards are used, this should not be an issue. What must be considered? This patch will break some (rather obscure) substitute-path rules. Any rule containing the characters '\', '?' and '*' must be rewritten to escape the aforementioned characters. This could get messy because backslashes must already be escaped in the GDB command line. Maybe another command (substitute-path-glob) should be introduced instead. Personally, I am ok with breaking stuff because it's only a small change to perform for the user, but that's not really my decision to make. I have not implemented character sets yet. I don't really see use cases for them in this setting. But if you want me to add them for completeness, I would be willing to do so. I also actively decided against implementing the '**' wildcard (match any string, also match directory separators), as this could lead to ambiguous matches. Testing I have created the 'gdb.base/subst-glob.exp' testsuite to test out the new feature. I also tested against 'gdb.base/subst.exp'. The tests were performed on GNU/Linux and Windows (MinGW). I did not find any regressions in the testsuite (on Linux). Copyright I have not yet signed any papers regarding FSF copyright assignment. Please let me know what you think of this patch! Thank you in advance, Max Reviewed-By: Eli Zaretskii --- gdb/NEWS | 5 + gdb/doc/gdb.texinfo | 29 +++++ gdb/source.c | 77 ++++++++---- gdb/testsuite/gdb.base/subst-glob.exp | 166 ++++++++++++++++++++++++++ gdb/utils.c | 100 ++++++++++++++++ gdb/utils.h | 2 + 6 files changed, 356 insertions(+), 23 deletions(-) create mode 100644 gdb/testsuite/gdb.base/subst-glob.exp diff --git a/gdb/NEWS b/gdb/NEWS index feb3a37393a..5a041175507 100644 --- a/gdb/NEWS +++ b/gdb/NEWS @@ -90,6 +90,11 @@ show unwind-on-signal These new commands replaces the existing set/show unwindonsignal. The old command is maintained as an alias. +set substitute-path + This command now supports glob pattern matching for substitution + rules. Wildcards '?' and '*' are supported. Use '\' to escape + '?', '*' and '\' characters. + * New features in the GDB remote stub, GDBserver ** The --remote-debug and --event-loop-debug command line options diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo index 727f9275bfb..bfd96afb5b4 100644 --- a/gdb/doc/gdb.texinfo +++ b/gdb/doc/gdb.texinfo @@ -9954,6 +9954,35 @@ For instance, if we had entered the following commands: use the second rule to rewrite @file{/usr/src/lib/foo.c} into @file{/mnt/src/lib/foo.c}. +Rules can contain wildcards to match multiple paths. The supported +wildcards are @samp{?} (to match any single character) and @samp{*} +(to match any string). Wildcards will never match the path separator of +the system. + +For instance, if we had entered the following command: + +@smallexample +(@value{GDBP}) set substitute-path /usr/*/include /mnt/include +@end smallexample + +@noindent +@value{GDBN} would then rewrite @file{/usr/src/include/defs.h} into +@file{/mnt/include/defs.h}. Another file @file{/usr/local/include/inc.h} +would be rewritten as @file{/mnt/include/inc.h} using the same rule. + +Use @samp{@backslashchar{}} to escape the characters @samp{?}@comma{} +@samp{*} and @samp{@backslashchar{}}. Note that you need to escape any +@samp{@backslashchar{}} characters twice in the @value{GDBN} command line. + +So if we want to match a literal @samp{*} character in a rule, we would enter: + +@smallexample +(@value{GDBP}) set substitute-path /foo\\*/bar /mnt/cross +@end smallexample + +@noindent +Now only the directory @file{/foo*/bar/} would match against the rule. + @item unset substitute-path [path] @kindex unset substitute-path diff --git a/gdb/source.c b/gdb/source.c index 432301e2a71..2406e75a8b9 100644 --- a/gdb/source.c +++ b/gdb/source.c @@ -26,6 +26,7 @@ #include "frame.h" #include "value.h" #include "gdbsupport/filestuff.h" +#include "utils.h" #include #include @@ -959,44 +960,48 @@ source_full_path_of (const char *filename, return 1; } -/* Return non-zero if RULE matches PATH, that is if the rule can be - applied to PATH. */ +/* Return the position in PATH up until RULE matches PATH, that is if the rule + can be applied to PATH. + Return -1 if there is no match. */ static int substitute_path_rule_matches (const struct substitute_path_rule *rule, const char *path) { - const int from_len = rule->from.length (); - const int path_len = strlen (path); - - if (path_len < from_len) - return 0; - /* The substitution rules are anchored at the start of the path, so the path should start with rule->from. */ - if (filename_ncmp (path, rule->from.c_str (), from_len) != 0) - return 0; + const int result = gdb_fileprefix_fnmatch (rule->from.c_str (), path); - /* Make sure that the region in the path that matches the substitution - rule is immediately followed by a directory separator (or the end of - string character). */ + if (result != -1) + { + /* Make sure that the region in the path that matches the substitution + rule is immediately followed by a directory separator (or the end of + string character). */ - if (path[from_len] != '\0' && !IS_DIR_SEPARATOR (path[from_len])) - return 0; + if (path[result] != '\0' && !IS_DIR_SEPARATOR (path[result])) + return -1; + } - return 1; + return result; } /* Find the substitute-path rule that applies to PATH and return it. + Also set SUB_POS to the position in PATH up until the rule matches PATH. Return NULL if no rule applies. */ static struct substitute_path_rule * -get_substitute_path_rule (const char *path) +get_substitute_path_rule (const char *path, int &sub_pos) { for (substitute_path_rule &rule : substitute_path_rules) - if (substitute_path_rule_matches (&rule, path)) - return &rule; + { + const int result = substitute_path_rule_matches (&rule, path); + if (result != -1) + { + sub_pos = result; + return &rule; + } + } return nullptr; } @@ -1010,7 +1015,8 @@ get_substitute_path_rule (const char *path) gdb::unique_xmalloc_ptr rewrite_source_path (const char *path) { - const struct substitute_path_rule *rule = get_substitute_path_rule (path); + int sub_pos; + const struct substitute_path_rule *rule = get_substitute_path_rule (path, sub_pos); if (rule == nullptr) return nullptr; @@ -1018,7 +1024,7 @@ rewrite_source_path (const char *path) /* Compute the rewritten path and return it. */ return (gdb::unique_xmalloc_ptr - (concat (rule->to.c_str (), path + rule->from.length (), nullptr))); + (concat (rule->to.c_str (), path + sub_pos, nullptr))); } /* See source.h. */ @@ -1718,6 +1724,31 @@ strip_trailing_directory_separator (char *path) path[last] = '\0'; } +/* If the last character of PATH is a directory separator, then strip it. + Also remove any related escape character. */ + +static void +strip_trailing_directory_separator_and_escape (char *path) +{ + int last = strlen (path) - 1; + + if (last < 0) + return; /* No stripping is needed if PATH is the empty string. */ + + if (IS_DIR_SEPARATOR (path[last])) + { + path[last] = '\0'; + + if (path[last] != '\\') + return; + + --last; + /* Remove any related escape character. */ + if (last >= 0 && path[last] == '\\') + path[last] = '\0'; + } +} + /* Add a new substitute-path rule at the end of the current list of rules. The new rule will replace FROM into TO. */ @@ -1754,7 +1785,7 @@ show_substitute_path_command (const char *args, int from_tty) for (substitute_path_rule &rule : substitute_path_rules) { - if (from == NULL || substitute_path_rule_matches (&rule, from) != 0) + if (from == NULL || substitute_path_rule_matches (&rule, from) != -1) gdb_printf (" `%s' -> `%s'.\n", rule.from.c_str (), rule.to.c_str ()); } @@ -1830,7 +1861,7 @@ set_substitute_path_command (const char *args, int from_tty) /* Strip any trailing directory separator character in either FROM or TO. The substitution rule already implicitly contains them. */ - strip_trailing_directory_separator (argv[0]); + strip_trailing_directory_separator_and_escape (argv[0]); strip_trailing_directory_separator (argv[1]); /* If a rule with the same "from" was previously defined, then diff --git a/gdb/testsuite/gdb.base/subst-glob.exp b/gdb/testsuite/gdb.base/subst-glob.exp new file mode 100644 index 00000000000..74dec472b7a --- /dev/null +++ b/gdb/testsuite/gdb.base/subst-glob.exp @@ -0,0 +1,166 @@ +# Copyright 2006-2024 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +clean_restart + +# Do a bunch of testing of the substitute-path glob pattern matching. + +gdb_test_no_output "set confirm off" \ + "deactivate GDB's confirmation interface" + +proc test_pattern { pattern path mode } { + # Escape backslashes so the GDB console can unescape them again. + set terminal_pattern [ \ + string map { \ + "\\" "\\\\" \ + } $pattern \ + ] + + # Escape the pattern for regex matching. + set match_pattern [ \ + string map { \ + "*" "\\\\*" \ + "?" "\\\\?" \ + "\\" "\\\\\\\\" \ + } $pattern \ + ] + + # Handle stripping of path separators. + if {[ishost "*-mingw*"]} { + regsub {\\\\\\\\\\\\|/$} $match_pattern {} match_pattern + set match_pattern [subst $match_pattern] + } else { + regsub {/$} $match_pattern {} match_pattern + set match_pattern [subst $match_pattern] + } + + # Escape backslashes so the GDB console can unescape them again. + set terminal_path [ \ + string map { \ + "\\" "\\\\" \ + } $path \ + ] + + # Escape the path for regex matching. + set match_path [ \ + string map { \ + "*" "\\*" \ + "?" "\\?" \ + "\\" "\\\\" \ + } $path \ + ] + + gdb_test_no_output "unset substitute-path" \ + "unset substitute-path before testing '$terminal_pattern' matches '$terminal_path'" + + gdb_test_no_output "set substitute-path \"$terminal_pattern\" \"to\"" \ + "set substitute-path $terminal_pattern before testing '$terminal_pattern' matches '$terminal_path'" + + if {$mode == 1} { + gdb_test "show substitute-path \"$terminal_path\"" \ + "Source path substitution rule matching `$match_path':\r\n +`$match_pattern' -> `to'." \ + "testing '$terminal_pattern' matches '$terminal_path'" + } else { + gdb_test "show substitute-path \"$terminal_path\"" \ + "Source path substitution rule matching `$match_path':" \ + "testing '$terminal_pattern' does not match '$terminal_path'" + } +} + +proc test_pattern_pos { pattern path } { + test_pattern $pattern $path 1 +} + +proc test_pattern_neg { pattern path } { + test_pattern $pattern $path 0 +} + +# Sanity checks. +test_pattern_pos "path" "path" +test_pattern_pos "path" "path/to" +test_pattern_pos "/" "/test" +test_pattern_pos "/testing" "/testing/test" +test_pattern_pos "/testing/" "/testing/test" +test_pattern_neg "path" "test" + +# '?' wildcard. +test_pattern_pos "?atchone" "matchone" +test_pattern_pos "pat?/to" "path/to" +test_pattern_pos "path??" "pathto" +test_pattern_pos "test?ng" "testing" +test_pattern_pos "?" "?/test" +test_pattern_neg "test?" "test/" +test_pattern_neg "test?" "testing/" +test_pattern_neg "?" "" + +# '*' wildcard. +test_pattern_pos "*" "matchall" +test_pattern_pos "path_*" "path_pattern" +test_pattern_pos "test*/test" "testing/test" +test_pattern_pos "test*" "testing/test" +test_pattern_pos "test*" "test/test" +test_pattern_pos "*" "testing/test" +test_pattern_pos "*/*" "testing/test" +test_pattern_pos "*/" "test/" +test_pattern_pos "/*" "/test" +test_pattern_pos "test*" "test/" +test_pattern_pos "test*" "test" +test_pattern_pos "test*test" "testtest" +test_pattern_pos "test*test" "testingtest" +test_pattern_pos "test*test" "testingtest/test" +test_pattern_pos "*" "*test" +test_pattern_pos "**" "t" +test_pattern_pos "*" "" +test_pattern_pos "*t*st" "foobartest" +test_pattern_pos "*t*st" "foobartest/ing" +test_pattern_pos "*t*st" "tetest" +test_pattern_pos "*t*st" "tetest/ing" +test_pattern_pos "*t*st" "testtest" +test_pattern_pos "*t*st" "testtest/ing" +test_pattern_neg "*test" "foobar" +test_pattern_neg "*/test" "foo/bar" + +# Escapes. +test_pattern_pos "\\\\" "\\" +test_pattern_pos "\\\\*" "\\test" +test_pattern_pos "*\\\\" "test\\" +test_pattern_pos "\\\\/" "\\/" +test_pattern_pos "\\*" "*" +test_pattern_pos "\\?" "?" +test_pattern_pos "\\*" "*/test" +test_pattern_pos "\\?" "?/test" +test_pattern_neg "\\*" "*test" +test_pattern_neg "\\?" "?test" +test_pattern_neg "\\*" "t" +test_pattern_neg "\\?" "t" +test_pattern_neg "\\" "test" +test_pattern_neg "\\/" "" +test_pattern_neg "\\/" "/test" + +if {[ishost "*-mingw*"]} { + # DOS tests. + test_pattern_pos "test" "TEST" + test_pattern_pos "/" "\\test" + test_pattern_pos "\\\\" "/test" + test_pattern_pos "*\\\\" "test/" +} + +if {[ishost "*-linux*"]} { + # Unix tests. + test_pattern_neg "test" "TEST" + test_pattern_neg "/" "\\test" + test_pattern_neg "\\\\" "/test" + test_pattern_neg "*\\\\" "test/" +} diff --git a/gdb/utils.c b/gdb/utils.c index ded03c74099..00597543051 100644 --- a/gdb/utils.c +++ b/gdb/utils.c @@ -3532,6 +3532,106 @@ gdb_filename_fnmatch (const char *pattern, const char *string, int flags) return fnmatch (pattern, string, flags); } +/* Return the position in STRING up until a PATTERN expression is matched. + Return -1 if there is no match. + + Only the wildcards ? and * are supported. */ + +int +gdb_fileprefix_fnmatch (const char *pattern, const char *string) +{ + int string_pos = 0; + char pattern_c; + char string_c; + + while (*pattern != '\0' && *string != '\0') + { + switch (*pattern) + { + /* Unescape and match the next character. */ + case '\\': + ++pattern; + if (*pattern == '\0') + return -1; + [[fallthrough]]; + + default: + pattern_c = *pattern; + string_c = *string; + +#ifdef HAVE_CASE_INSENSITIVE_FILE_SYSTEM + pattern_c = TOLOWER (pattern_c); + string_c = TOLOWER (string_c); +#endif + +#ifdef HAVE_DOS_BASED_FILE_SYSTEM + /* On DOS-based file systems, the '/' and the '\' are equivalent. */ + if (pattern_c == '/') + pattern_c = '\\'; + if (string_c == '/') + string_c = '\\'; +#endif + + /* Compare the current character of the pattern with the path. */ + if (pattern_c != string_c) + return -1; + break; + + /* Match any character. */ + case '?': + /* Directory separators are not matched by '?'. */ + if (IS_DIR_SEPARATOR (*string)) + return -1; + break; + + /* Match any string. */ + case '*': + int best_result = -1; + + /* Try to match any folling substring. */ + while (true) + { + /* Most of these attempts will fail at the first character. */ + int result = gdb_fileprefix_fnmatch (pattern+1, string); + + if (result != -1) + { + /* If there is a substring match, compare its result to the best + candidate so far. */ + result += string_pos; + if (result > best_result) + best_result = result; + } + + /* Exit on a null byte or a directory separator. */ + if (*string == '\0' || IS_DIR_SEPARATOR (*string)) + return best_result; + + ++string; + ++string_pos; + } + } + + ++pattern; + ++string; + ++string_pos; + } + + /* If the macthing is complete but there is still some of the pattern left, + we must ensure that the remaining pattern matches the empty string. */ + if (*pattern != '\0') + { + /* Only '*' can match the empty string. */ + while (*pattern == '*') + ++pattern; + + if (*pattern != '\0') + return -1; + } + + return string_pos; +} + /* Return the number of path elements in PATH. / = 1 /foo = 2 diff --git a/gdb/utils.h b/gdb/utils.h index 875a2583179..eaf3fe8a8c3 100644 --- a/gdb/utils.h +++ b/gdb/utils.h @@ -137,6 +137,8 @@ struct set_batch_flag_and_restore_page_info extern int gdb_filename_fnmatch (const char *pattern, const char *string, int flags); +extern int gdb_fileprefix_fnmatch (const char *pattern, const char *string); + extern void substitute_path_component (char **stringp, const char *from, const char *to);