From patchwork Thu Apr 4 00:17:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Yvon Zimmermann X-Patchwork-Id: 87999 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CDFBA3846411 for ; Thu, 4 Apr 2024 00:18:53 +0000 (GMT) X-Original-To: gdb-patches@sourceware.org Delivered-To: gdb-patches@sourceware.org Received: from mailrelay.tu-berlin.de (mailrelay.tu-berlin.de [130.149.7.70]) by sourceware.org (Postfix) with ESMTPS id 485403847718 for ; Thu, 4 Apr 2024 00:17:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 485403847718 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=campus.tu-berlin.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=campus.tu-berlin.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 485403847718 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=130.149.7.70 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712189876; cv=none; b=lk+BYEX/m4R7on1NF9Yd1f0LItgXoLNcSrJ7cn8/AaX44hlrvvie02fI07r1+fHH/64SMohAJiDHxue0CT+uRbmyhgWElmRUGZehaeDWpX6r1IU3ID36pA7apOukjsvRrp42/uYcyCMdodZNc/6pRnlPxiurQmlu/1gtkHdNuSg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712189876; c=relaxed/simple; bh=AAnCzAA/cTDN0XAesWSFmwombmb+h8mHUwYKeTzmZEA=; h=DKIM-Signature:Message-ID:Date:MIME-Version:From:Subject:To; b=cf8lHwM1llgCApJpCXvt2Ad47OpijeCJZibzFsx5BVRYZ4Dc5tvcPV7sAYg1gDrHXWy8+AketfVKBwrxf0kQmPMtlHRnQ/H2cX3Ow9Vc15LK7Q60yCAM99zekALLK39CzBDj/Y9qTaJGYnlqusq89roRLiosgkckwg1PhsmZ8Xo= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tu-berlin.de; l=17649; s=dkim-tub; t=1712189873; h=message-id:date:mime-version:from:subject:to: content-transfer-encoding; bh=PHQmeQYxA10plSacyad37Nvj7JeHmrau7zlWkA5sw6I=; b=i/uTdPQuzGH8fAly9eTRzgGRZ14U67qGldwEulKBqMh8SGHTq4ei8qC7 CC4eHI/f249CrvPRLPruYnogYM+xLnXIkgEhEdHNjsQhDWuy0iFI+Vr3Y IT5Kp7MXL837Hjt4sVLf5FcZF3HjaSQOJszDw5duyyz+4IPsz5/d35hkr s=; X-CSE-ConnectionGUID: pAOU4v/eS0avoj2D8OrQCw== X-CSE-MsgGUID: mWFzq9CjSai0q376acrX9g== X-IronPort-AV: E=Sophos;i="6.07,178,1708383600"; d="scan'208";a="23386570" Received: from bulkmail.tu-berlin.de (HELO mail.tu-berlin.de) ([141.23.12.143]) by mailrelay.tu-berlin.de with ESMTP; 04 Apr 2024 02:17:51 +0200 Message-ID: <2c4b26c0-2541-44b4-9c37-a296f099a3cd@campus.tu-berlin.de> Date: Thu, 4 Apr 2024 02:17:47 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US From: Max Yvon Zimmermann Subject: [PATCH v2] Add wildcard matching to substitute-path rules To: X-Spam-Status: No, score=-11.8 required=5.0 tests=AC_FROM_MANY_DOTS, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org Changes since v1: The function strip_trailing_directory_separator_and_escape() is now fixed. I have added a new function validate_substitute_path_pattern() that will validate any pattern during the registration of a new rule. If an invalid pattern is detected, an error message gets printed. I hope this will make the feature more usable. If this check would not be performed an invalid rule would still be visible with 'show substitute-path', but it would never match anything. Use literal ',' and '\' in the documentation. Change the example file names in the documentation. More tests in the testsuite. Reviewed-By: Eli Zaretskii --- gdb/NEWS | 5 + gdb/doc/gdb.texinfo | 29 ++++ gdb/source.c | 106 +++++++++++---- gdb/testsuite/gdb.base/subst-glob.exp | 182 ++++++++++++++++++++++++++ gdb/utils.c | 100 ++++++++++++++ gdb/utils.h | 2 + 6 files changed, 401 insertions(+), 23 deletions(-) create mode 100644 gdb/testsuite/gdb.base/subst-glob.exp diff --git a/gdb/NEWS b/gdb/NEWS index feb3a37393a..5a041175507 100644 --- a/gdb/NEWS +++ b/gdb/NEWS @@ -90,6 +90,11 @@ show unwind-on-signal These new commands replaces the existing set/show unwindonsignal. The old command is maintained as an alias. +set substitute-path + This command now supports glob pattern matching for substitution + rules. Wildcards '?' and '*' are supported. Use '\' to escape + '?', '*' and '\' characters. + * New features in the GDB remote stub, GDBserver ** The --remote-debug and --event-loop-debug command line options diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo index 727f9275bfb..3bf7d99a153 100644 --- a/gdb/doc/gdb.texinfo +++ b/gdb/doc/gdb.texinfo @@ -9954,6 +9954,35 @@ For instance, if we had entered the following commands: use the second rule to rewrite @file{/usr/src/lib/foo.c} into @file{/mnt/src/lib/foo.c}. +Rules can contain wildcards to match multiple paths. The supported +wildcards are @file{?} (to match any single character) and @file{*} +(to match any string). Wildcards will never match the path separator of +the system. + +For instance, if we had entered the following command: + +@smallexample +(@value{GDBP}) set substitute-path /build/*/include /mnt/include +@end smallexample + +@noindent +@value{GDBN} would then rewrite @file{/build/release/include/inc.h} into +@file{/mnt/include/inc.h}. Another file @file{/build/debug/include/inc.h} +would also be rewritten as @file{/mnt/include/inc.h} using the same rule. + +Use @samp{\} to escape the characters @samp{?}, @samp{*} and @samp{\}. Note +that you need to escape any @samp{\} characters twice in the @value{GDBN} +command line. + +So if we want to match a literal @samp{*} character in a rule, we would enter: + +@smallexample +(@value{GDBP}) set substitute-path /foo\\*/bar /mnt/cross +@end smallexample + +@noindent +Now only the directory @file{/foo*/bar/} would match against the rule. + @item unset substitute-path [path] @kindex unset substitute-path diff --git a/gdb/source.c b/gdb/source.c index 432301e2a71..9a2f47194af 100644 --- a/gdb/source.c +++ b/gdb/source.c @@ -16,6 +16,7 @@ You should have received a copy of the GNU General Public License along with this program. If not, see . */ +#include "defs.h" #include "arch-utils.h" #include "symtab.h" #include "expression.h" @@ -26,6 +27,7 @@ #include "frame.h" #include "value.h" #include "gdbsupport/filestuff.h" +#include "utils.h" #include #include @@ -959,44 +961,69 @@ source_full_path_of (const char *filename, return 1; } -/* Return non-zero if RULE matches PATH, that is if the rule can be - applied to PATH. */ +/* Validate a substitute-path PATTERN. + Return 1 if PATTERN is valid. + Return 0 if PATTERN is invalid. */ static int -substitute_path_rule_matches (const struct substitute_path_rule *rule, - const char *path) +validate_substitute_path_pattern (const char *pattern) { - const int from_len = rule->from.length (); - const int path_len = strlen (path); + /* Iterate backwards to find any unterminated escapes. */ - if (path_len < from_len) - return 0; + int backslash_count = 0; + for (int i = strlen (pattern) - 1; i >= 0; --i) + { + if (pattern[i] != '\\') + break; + + ++backslash_count; + } + return (backslash_count % 2) == 0; +} + +/* Return the position in PATH up until RULE matches PATH, that is if the rule + can be applied to PATH. + Return -1 if there is no match. */ + +static int +substitute_path_rule_matches (const struct substitute_path_rule *rule, + const char *path) +{ /* The substitution rules are anchored at the start of the path, so the path should start with rule->from. */ - if (filename_ncmp (path, rule->from.c_str (), from_len) != 0) - return 0; + const int result = gdb_fileprefix_fnmatch (rule->from.c_str (), path); - /* Make sure that the region in the path that matches the substitution - rule is immediately followed by a directory separator (or the end of - string character). */ + if (result != -1) + { + /* Make sure that the region in the path that matches the substitution + rule is immediately followed by a directory separator (or the end of + string character). */ - if (path[from_len] != '\0' && !IS_DIR_SEPARATOR (path[from_len])) - return 0; + if (path[result] != '\0' && !IS_DIR_SEPARATOR (path[result])) + return -1; + } - return 1; + return result; } /* Find the substitute-path rule that applies to PATH and return it. + Also set SUB_POS to the position in PATH up until the rule matches PATH. Return NULL if no rule applies. */ static struct substitute_path_rule * -get_substitute_path_rule (const char *path) +get_substitute_path_rule (const char *path, int &sub_pos) { for (substitute_path_rule &rule : substitute_path_rules) - if (substitute_path_rule_matches (&rule, path)) - return &rule; + { + const int result = substitute_path_rule_matches (&rule, path); + if (result != -1) + { + sub_pos = result; + return &rule; + } + } return nullptr; } @@ -1010,7 +1037,9 @@ get_substitute_path_rule (const char *path) gdb::unique_xmalloc_ptr rewrite_source_path (const char *path) { - const struct substitute_path_rule *rule = get_substitute_path_rule (path); + int sub_pos; + const struct substitute_path_rule *rule + = get_substitute_path_rule (path, sub_pos); if (rule == nullptr) return nullptr; @@ -1018,7 +1047,7 @@ rewrite_source_path (const char *path) /* Compute the rewritten path and return it. */ return (gdb::unique_xmalloc_ptr - (concat (rule->to.c_str (), path + rule->from.length (), nullptr))); + (concat (rule->to.c_str (), path + sub_pos, nullptr))); } /* See source.h. */ @@ -1718,6 +1747,34 @@ strip_trailing_directory_separator (char *path) path[last] = '\0'; } +/* If the last character of PATH is a directory separator, then strip it. + Also remove any related escape character (on DOS-based systems). */ + +static void +strip_trailing_directory_separator_and_escape (char *path) +{ + const int last = strlen (path) - 1; + + if (last < 0) + return; /* No stripping is needed if PATH is the empty string. */ + + if (!IS_DIR_SEPARATOR (path[last])) + return; + +#ifdef HAVE_DOS_BASED_FILE_SYSTEM + if (path[last] == '\\') + { + if (last < 1 || path[last - 1] != '\\') + return; + + /* Remove any related escape character. */ + path[last - 1] = '\0'; + } +#endif + + path[last] = '\0'; +} + /* Add a new substitute-path rule at the end of the current list of rules. The new rule will replace FROM into TO. */ @@ -1754,7 +1811,7 @@ show_substitute_path_command (const char *args, int from_tty) for (substitute_path_rule &rule : substitute_path_rules) { - if (from == NULL || substitute_path_rule_matches (&rule, from) != 0) + if (from == NULL || substitute_path_rule_matches (&rule, from) != -1) gdb_printf (" `%s' -> `%s'.\n", rule.from.c_str (), rule.to.c_str ()); } @@ -1830,9 +1887,12 @@ set_substitute_path_command (const char *args, int from_tty) /* Strip any trailing directory separator character in either FROM or TO. The substitution rule already implicitly contains them. */ - strip_trailing_directory_separator (argv[0]); + strip_trailing_directory_separator_and_escape (argv[0]); strip_trailing_directory_separator (argv[1]); + if (!validate_substitute_path_pattern (argv[0])) + error (_("First argument is not a valid glob expression")); + /* If a rule with the same "from" was previously defined, then delete it. This new rule replaces it. */ diff --git a/gdb/testsuite/gdb.base/subst-glob.exp b/gdb/testsuite/gdb.base/subst-glob.exp new file mode 100644 index 00000000000..b364760671a --- /dev/null +++ b/gdb/testsuite/gdb.base/subst-glob.exp @@ -0,0 +1,182 @@ +# Copyright 2006-2024 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +clean_restart + +# Do a bunch of testing of the substitute-path glob pattern matching. + +gdb_test_no_output "set confirm off" \ + "deactivate GDB's confirmation interface" + +proc test_pattern { pattern path mode } { + # Escape backslashes so the GDB console can unescape them again. + set terminal_pattern [ \ + string map { \ + "\\" "\\\\" \ + } $pattern \ + ] + + # Escape the pattern for regex matching. + set match_pattern [ \ + string map { \ + "*" "\\\\*" \ + "?" "\\\\?" \ + "\\" "\\\\\\\\" \ + } $pattern \ + ] + + # Handle stripping of path separators. + if {[ishost "*-mingw*"]} { + regsub {(\\\\\\\\\\\\|/)$} $match_pattern {} match_pattern + set match_pattern [subst $match_pattern] + } else { + regsub {/$} $match_pattern {} match_pattern + set match_pattern [subst $match_pattern] + } + + # Escape backslashes so the GDB console can unescape them again. + set terminal_path [ \ + string map { \ + "\\" "\\\\" \ + } $path \ + ] + + # Escape the path for regex matching. + set match_path [ \ + string map { \ + "*" "\\*" \ + "?" "\\?" \ + "\\" "\\\\" \ + } $path \ + ] + + if {$mode == "fail"} { + gdb_test "set substitute-path \"$terminal_pattern\" \"to\"" \ + "First argument is not a valid glob expression" \ + "set substitute-path $terminal_pattern (expect failure)" + } else { + gdb_test_no_output "unset substitute-path" \ + "unset substitute-path before testing '$terminal_pattern' matches '$terminal_path'" + + gdb_test_no_output "set substitute-path \"$terminal_pattern\" \"to\"" \ + "set substitute-path $terminal_pattern before testing '$terminal_pattern' matches '$terminal_path'" + } + + if {$mode == "pos"} { + gdb_test "show substitute-path \"$terminal_path\"" \ + "Source path substitution rule matching `$match_path':\r\n +`$match_pattern' -> `to'." \ + "testing '$terminal_pattern' matches '$terminal_path'" + } elseif {$mode == "neg"} { + gdb_test "show substitute-path \"$terminal_path\"" \ + "Source path substitution rule matching `$match_path':" \ + "testing '$terminal_pattern' does not match '$terminal_path'" + } +} + +proc test_pattern_pos { pattern path } { + test_pattern $pattern $path "pos" +} + +proc test_pattern_neg { pattern path } { + test_pattern $pattern $path "neg" +} + +proc test_pattern_fail { pattern } { + test_pattern $pattern "" "fail" +} + +# Sanity checks. +test_pattern_pos "path" "path" +test_pattern_pos "path" "path/to" +test_pattern_pos "/" "/test" +test_pattern_pos "/testing" "/testing/test" +test_pattern_pos "/testing/" "/testing/test" +test_pattern_neg "path" "test" +test_pattern_neg "///" "test" +test_pattern_neg "/path//to" "/path/to" + +# '?' wildcard. +test_pattern_pos "?atchone" "matchone" +test_pattern_pos "pat?/to" "path/to" +test_pattern_pos "path??" "pathto" +test_pattern_pos "test?ng" "testing" +test_pattern_pos "?" "?/test" +test_pattern_neg "test?" "test/" +test_pattern_neg "test?" "testing/" +test_pattern_neg "?" "" + +# '*' wildcard. +test_pattern_pos "*" "matchall" +test_pattern_pos "path_*" "path_pattern" +test_pattern_pos "test*/test" "testing/test" +test_pattern_pos "test*" "testing/test" +test_pattern_pos "test*" "test/test" +test_pattern_pos "*" "testing/test" +test_pattern_pos "*/*" "testing/test" +test_pattern_pos "*/" "test/" +test_pattern_pos "/*" "/test" +test_pattern_pos "test*" "test/" +test_pattern_pos "test*" "test" +test_pattern_pos "test*test" "testtest" +test_pattern_pos "test*test" "testingtest" +test_pattern_pos "test*test" "testingtest/test" +test_pattern_pos "*" "*test" +test_pattern_pos "**" "t" +test_pattern_pos "*" "" +test_pattern_pos "*t*st" "foobartest" +test_pattern_pos "*t*st" "foobartest/ing" +test_pattern_pos "*t*st" "tetest" +test_pattern_pos "*t*st" "tetest/ing" +test_pattern_pos "*t*st" "testtest" +test_pattern_pos "*t*st" "testtest/ing" +test_pattern_neg "*test" "foobar" +test_pattern_neg "*/test" "foo/bar" + +# Escapes. +test_pattern_pos "\\\\" "\\" +test_pattern_pos "\\\\*" "\\test" +test_pattern_pos "*\\\\" "test\\" +test_pattern_pos "\\\\/" "\\/" +test_pattern_pos "\\*" "*" +test_pattern_pos "\\?" "?" +test_pattern_pos "\\*" "*/test" +test_pattern_pos "\\?" "?/test" +test_pattern_pos "\\//" "/" +test_pattern_neg "\\//" "test" +test_pattern_neg "\\*" "*test" +test_pattern_neg "\\?" "?test" +test_pattern_neg "\\*" "t" +test_pattern_neg "\\?" "t" +test_pattern_fail "\\" +test_pattern_fail "\\/" +test_pattern_fail "\\\\\\" +test_pattern_fail "test\\" +test_pattern_fail "test\\\\\\" + +if {[ishost "*-mingw*"]} { + # DOS tests. + test_pattern_pos "test" "TEST" + test_pattern_pos "/" "\\test" + test_pattern_pos "\\\\" "/test" + test_pattern_pos "*\\\\" "test/" +} + +if {[ishost "*-linux*"]} { + # Unix tests. + test_pattern_neg "test" "TEST" + test_pattern_neg "/" "\\test" + test_pattern_fail "\\\\" "/test" + test_pattern_fail "*\\\\" "test/" +} diff --git a/gdb/utils.c b/gdb/utils.c index ded03c74099..00597543051 100644 --- a/gdb/utils.c +++ b/gdb/utils.c @@ -3532,6 +3532,106 @@ gdb_filename_fnmatch (const char *pattern, const char *string, int flags) return fnmatch (pattern, string, flags); } +/* Return the position in STRING up until a PATTERN expression is matched. + Return -1 if there is no match. + + Only the wildcards ? and * are supported. */ + +int +gdb_fileprefix_fnmatch (const char *pattern, const char *string) +{ + int string_pos = 0; + char pattern_c; + char string_c; + + while (*pattern != '\0' && *string != '\0') + { + switch (*pattern) + { + /* Unescape and match the next character. */ + case '\\': + ++pattern; + if (*pattern == '\0') + return -1; + [[fallthrough]]; + + default: + pattern_c = *pattern; + string_c = *string; + +#ifdef HAVE_CASE_INSENSITIVE_FILE_SYSTEM + pattern_c = TOLOWER (pattern_c); + string_c = TOLOWER (string_c); +#endif + +#ifdef HAVE_DOS_BASED_FILE_SYSTEM + /* On DOS-based file systems, the '/' and the '\' are equivalent. */ + if (pattern_c == '/') + pattern_c = '\\'; + if (string_c == '/') + string_c = '\\'; +#endif + + /* Compare the current character of the pattern with the path. */ + if (pattern_c != string_c) + return -1; + break; + + /* Match any character. */ + case '?': + /* Directory separators are not matched by '?'. */ + if (IS_DIR_SEPARATOR (*string)) + return -1; + break; + + /* Match any string. */ + case '*': + int best_result = -1; + + /* Try to match any folling substring. */ + while (true) + { + /* Most of these attempts will fail at the first character. */ + int result = gdb_fileprefix_fnmatch (pattern+1, string); + + if (result != -1) + { + /* If there is a substring match, compare its result to the best + candidate so far. */ + result += string_pos; + if (result > best_result) + best_result = result; + } + + /* Exit on a null byte or a directory separator. */ + if (*string == '\0' || IS_DIR_SEPARATOR (*string)) + return best_result; + + ++string; + ++string_pos; + } + } + + ++pattern; + ++string; + ++string_pos; + } + + /* If the macthing is complete but there is still some of the pattern left, + we must ensure that the remaining pattern matches the empty string. */ + if (*pattern != '\0') + { + /* Only '*' can match the empty string. */ + while (*pattern == '*') + ++pattern; + + if (*pattern != '\0') + return -1; + } + + return string_pos; +} + /* Return the number of path elements in PATH. / = 1 /foo = 2 diff --git a/gdb/utils.h b/gdb/utils.h index 875a2583179..eaf3fe8a8c3 100644 --- a/gdb/utils.h +++ b/gdb/utils.h @@ -137,6 +137,8 @@ struct set_batch_flag_and_restore_page_info extern int gdb_filename_fnmatch (const char *pattern, const char *string, int flags); +extern int gdb_fileprefix_fnmatch (const char *pattern, const char *string); + extern void substitute_path_component (char **stringp, const char *from, const char *to);