From patchwork Thu Feb 8 16:28:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Burgess X-Patchwork-Id: 85471 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1F275385841C for ; Thu, 8 Feb 2024 16:30:50 +0000 (GMT) X-Original-To: gdb-patches@sourceware.org Delivered-To: gdb-patches@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 9BDD53858C78 for ; Thu, 8 Feb 2024 16:28:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9BDD53858C78 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 9BDD53858C78 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1707409735; cv=none; b=qiq2j42mnIsEqVCoynrqaKC+hkfaK/AbzeeOSmr165hJsPjR2rs5Me1P4Z/0Kk+zbP0FE0FlOQpNOoEY8Nk2cnDRsv9kgrSlBVGVQ3nf770VayTSpoKl97SvjNPwh0Wbqiu1eISkg6LWwcZ7TTgi/A4g1XJxSoXRXVvhhGnDnyo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1707409735; c=relaxed/simple; bh=rKBSgQjHV4YIgnB+D7BFmvHgI0l3KIWI3mq1uOwuaWc=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=BVdqWP38R5ihzamZwzko3uFr1p6nNpjhBmiX4aSysrkpBnbw78DZh0qrw1lpvIFHeCE+k2gXwF5fppOdmrVrsMAmcPDJPHa1XIUzKYx/Ee/4kZAynFlEyYGlfMAgzVsHbijX5YbmMSew9HaTntigsZcx763/2ro3cG0d19nsBjw= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707409732; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=g3ZOoc6HZA4iXAcvats/dJ9TcUcIDJvc5UopEA6JBEs=; b=G0bjtLNf0PltDqynaTKXgoNxjiaSDgIVbGSbtKv6aaDuyUVBX72P0zJdYzGcZ3g5TCwNN+ iHQJ3qjejba8lzHBb1t6QEZozhYxs60gaPgcK68f+vVIsV2ogd1NRgUO8QHOnb2YG9xTbd L85k7Yee62jeZrxySYgoD+huFa7uHYM= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-213-zfUZEiWNNdqYhqFF2uYNig-1; Thu, 08 Feb 2024 11:28:51 -0500 X-MC-Unique: zfUZEiWNNdqYhqFF2uYNig-1 Received: by mail-wm1-f72.google.com with SMTP id 5b1f17b1804b1-410040e1fcdso368985e9.0 for ; Thu, 08 Feb 2024 08:28:51 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707409729; x=1708014529; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=g3ZOoc6HZA4iXAcvats/dJ9TcUcIDJvc5UopEA6JBEs=; b=mL9DR3v683vVLuj525XXU+9rxy8b2yfsqphddyOjUkrYJL5R/bJMU5TpVQMi/IJcby L9jz5Zn8NXm8ieGdJ5B9AiiA/pAFgAcyPkc6C9A5Lc4KAXw84VOoSBmy7Nalxw2b9aPi zfs63ShLWdeEPt0G5YBJvnRgNmtgi+g1kCp5fTZlPqoJbESnFfnJYB6EnHrrcPhm+aSf SUeNmnNGsFHEIkL9ThWcEuq8RJmqlxYRK1KGPyCWTAY/oCFRDEiu1AjOhPQ2e0Ehp8+k VZmq3t9RjNs0nMe6KQqyuE8OOjveZ2mi5kR/jmuPy0IlC6LXhKHx3SSGpygL764XVyZE wM2w== X-Gm-Message-State: AOJu0Yx5gTCdpVaP0lk713Wt6ggbrnC1v/R+meC7DR+wfg1Qdt/874/z bW5BJxvVb9CFnsseJ1Or9AxiEJFTah9FyX3nZVOLQEzsBcUgJgdI2yxkNdCCzy3lUZcUY9T5cUL DeMFLmGdDy1H8qWx5AgGFcjr4Woh419NcWUrn+blHZxrQ5OgzSj8Pzv/nzAKNcIOhTpRaHAhALv bnwSzGRqBB9MtVc660C3Vl+DRMnRya/0hthchQaDTJaeg= X-Received: by 2002:a05:600c:3153:b0:40f:dd72:d9f with SMTP id h19-20020a05600c315300b0040fdd720d9fmr6940064wmo.39.1707409728832; Thu, 08 Feb 2024 08:28:48 -0800 (PST) X-Google-Smtp-Source: AGHT+IEevmJKWWJwgRz5X7hvMxaA+hHAVIzy2Ru4WmJelhdWk9SOgJ/DxONjjHfniF3hOfwS9PtcmA== X-Received: by 2002:a05:600c:3153:b0:40f:dd72:d9f with SMTP id h19-20020a05600c315300b0040fdd720d9fmr6940041wmo.39.1707409728332; Thu, 08 Feb 2024 08:28:48 -0800 (PST) Received: from localhost (185.223.159.143.dyn.plus.net. [143.159.223.185]) by smtp.gmail.com with ESMTPSA id n38-20020a05600c502600b0040f035bebfcsm2124932wmr.12.2024.02.08.08.28.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Feb 2024 08:28:47 -0800 (PST) From: Andrew Burgess To: gdb-patches@sourceware.org Cc: Andrew Burgess Subject: [PATCH 2/4] gdbsupport: allow for configuration of extract_string_maybe_quoted Date: Thu, 8 Feb 2024 16:28:41 +0000 Message-Id: <499135b7e99641c873df5c4ce419bf4e5a9751f9.1707409662.git.aburgess@redhat.com> X-Mailer: git-send-email 2.25.4 In-Reply-To: References: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gdb-patches-bounces+patchwork=sourceware.org@sourceware.org This function has the extract_string_maybe_quoted function accept an extra parameter, a control structure. This control structure can then be used to modify how extract_string_maybe_quoted operates. What can be controlled exactly? The new argument allows controls over which characters can and can't be escaped within each of three contexts: outside any quotes, within single quotes, and within double quotes. The escaping rules for the current extract_string_maybe_quoted make very little sense to me, we allow every character to be escaped within all three contexts. This doesn't make much sense to me, why do we need to allow so much escaping? In most places in GDB where extract_string_maybe_quoted is used no characters have special meaning, so if I call extract_string_maybe_quoted to extract from this: "Blah\@" Then I'll get back (Blah@) without the parenthesis. But why is the escaping supported? I could just as easily write: "Blah@" In which case I'll get the same (Blah@) result. If I want a result of (Blah\@) then I need to type: "Blah\\@" Now the argument could be used that this is consistent with shell escaping, which is what users will expect. Except it really isn't. Outside of any quotes then yes, any \ sequence will result in even when doesn't have a special meaning, but within double quotes only a limited set of characters need escaping, in all other cases the backslash will be preserved, yet for GDB within double quotes, our escaping is just like the unquoted escaping. And within single quotes no characters can be escaped at a shell, and yet, once again, GDB just uses the unquoted strategy. So I think GDB's approach neither makes sense from a necessity point of view, nor does it make sense from a "like a shell" point of view. My plan, in a later commit, is to reuse extract_string_maybe_quoted as the core of gdb_argv instead of calling into libiberty. To do this I need extract_string_maybe_quoted to behave more "shell like" just as libiberties buildargv function does. And so, by allowing control over which characters can, or can't be quoted, I can retain the current extract_string_maybe_quoted behaviour for the current users, but provide a different control structure for when (later on) I reuse extract_string_maybe_quoted in gdb_argv. I did consider changing the default behaviour of extract_string_maybe_quoted, and I might still propose that in the future. Unfortunately, there is one test (gdb.base/options.exp) that depends on the current extract_string_maybe_quoted behaviour, specifically is relies on being able to escape a single quote character within a single quote context -- at a POSIX shell it is not possible to embed a single quote within a single quoted string. So as this is tested behaviour I'm reluctant to just go changing this. In this commit I've added a new control structure extract_string_ctrl in gdbsupport/common-utils.h, defined a default structure in gdbsupport/common-utils.cc, and this default is used by extract_string_maybe_quoted to maintain the current behaviour. In gdb/unittests/extract-string-selftests.c I've defined a new extract_string_ctrl object which implements shell like behaviour, I've then added some new extract_string_maybe_quoted self tests which check the shell like behaviour. --- gdb/unittests/extract-string-selftests.c | 20 +++- gdbsupport/common-utils.cc | 75 +++++++------ gdbsupport/common-utils.h | 133 ++++++++++++++++++++++- 3 files changed, 191 insertions(+), 37 deletions(-) diff --git a/gdb/unittests/extract-string-selftests.c b/gdb/unittests/extract-string-selftests.c index c2eedd75342..3b9a5bd104c 100644 --- a/gdb/unittests/extract-string-selftests.c +++ b/gdb/unittests/extract-string-selftests.c @@ -23,20 +23,30 @@ namespace selftests { namespace extract_string { +static extract_string_ctrl shell_extract_string_ctrl + (nullptr, "", "\"$`\\", "\n", "", "\n"); + struct test_def { test_def (const char *input, const char *output, - size_t offset) + size_t offset, + extract_string_ctrl *ctrl = nullptr) : m_input (input), m_output (output), - m_offset (offset) + m_offset (offset), + m_ctrl (ctrl) { /* Nothing. */ } void run () const { const char *tmp = m_input; - std::string test_out = extract_string_maybe_quoted (&tmp); + std::string test_out; + + if (m_ctrl == nullptr) + test_out = extract_string_maybe_quoted (&tmp); + else + test_out = extract_string_maybe_quoted (&tmp, *m_ctrl); if (run_verbose ()) { @@ -55,6 +65,7 @@ struct test_def const char *m_input; const char *m_output; size_t m_offset; + extract_string_ctrl *m_ctrl; }; test_def tests[] = { @@ -64,6 +75,9 @@ test_def tests[] = { { "ab\\ cd ef", "ab cd", 6 }, { "\"abc\\\"def\" ghi", "abc\"def", 10 }, { "\"'abc' 'def'\" ghi", "'abc' 'def'", 13 }, + { "'ab\\ cd' ef", "ab\\ cd", 8, &shell_extract_string_ctrl }, + { "ab\\\ncd ef", "abcd", 6, &shell_extract_string_ctrl }, + { "\"ab\\\ncd\" ef", "abcd", 8, &shell_extract_string_ctrl }, }; static void diff --git a/gdbsupport/common-utils.cc b/gdbsupport/common-utils.cc index 99b9cb8609b..15ee2c43f83 100644 --- a/gdbsupport/common-utils.cc +++ b/gdbsupport/common-utils.cc @@ -164,58 +164,67 @@ savestring (const char *ptr, size_t len) return p; } +/* See common-utils.h. */ + +extract_string_ctrl default_extract_string_ctrl (nullptr, nullptr, nullptr); + /* See documentation in common-utils.h. */ std::string -extract_string_maybe_quoted (const char **arg) +extract_string_maybe_quoted (const char **arg, + const extract_string_ctrl &ctrl) { - bool squote = false; - bool dquote = false; - bool bsquote = false; + char quote = '\0'; std::string result; const char *p = *arg; /* Find the start of the argument. */ p = skip_spaces (p); - /* Parse p similarly to gdb_argv buildargv function. */ + /* Parse p similarly to the libiberty buildargv function, but respect + the escaping rules of CTRL. */ while (*p != '\0') { - if (ISSPACE (*p) && !squote && !dquote && !bsquote) + if (ISSPACE (*p) && quote == '\0') break; else { - if (bsquote) - { - bsquote = false; - result += *p; - } - else if (*p == '\\') - bsquote = true; - else if (squote) + if (*p == '\\') { - if (*p == '\'') - squote = false; + /* The character after the backslash. */ + char c = *(p + 1); + + if (ctrl.discard_escaped_char (c, quote)) + { + /* We are discarding the escape character (backslash) and + the character after the escape. This allows us to + emulate POSIX newline handling where an escaped + newline is discarded. */ + ++p; + } + else if (ctrl.is_escaped (c, quote)) + { + /* The character C is escaped. We discard the escape + character (backslash), but do include C. */ + ++p; + result += c; + } else - result += *p; - } - else if (dquote) - { - if (*p == '"') - dquote = false; - else - result += *p; + { + /* We are going to treat the backslash as a literal + character with no special association to the following + character. */ + result += *p; + } } + else if (*p == quote) + quote = '\0'; + else if (quote == '\0' && (*p == '\'' || *p == '"')) + quote = *p; else - { - if (*p == '\'') - squote = true; - else if (*p == '"') - dquote = true; - else - result += *p; - } - p++; + result += *p; + + ++p; } } diff --git a/gdbsupport/common-utils.h b/gdbsupport/common-utils.h index 23cd40c0207..60578cd7560 100644 --- a/gdbsupport/common-utils.h +++ b/gdbsupport/common-utils.h @@ -74,6 +74,135 @@ std::string &string_vappendf (std::string &dest, const char* fmt, va_list args) char *savestring (const char *ptr, size_t len); +/* A control data structure, instances of this class are passed to the + extract_string_maybe_quoted function in order to modify its behaviour. + + This class controls which characters can and can't be quoted within + different context (no quotes, single quotes, or double quotes). */ + +struct extract_string_ctrl +{ + /* Constructor. The CAN_ESCAPE_* arguments are the set of characters + which can be escaped within each context. If any character within + this set appears within the given context with a backslash before it, + then the backslash will be removed and the character retained in the + output. If any of these arguments are nullptr then every character + can be escaped within that context. If you want no character escaping + within a particular context then pass the empty string. + + The DISCARD_* arguments are the set of characters which should be + discarded when escaped within a given context. If any of these + characters appear within the given context with a backslash before + them, then both the backslash and the character will be removed from + the final output. If any of these arguments are nullptr then no + characters will be discarded when escaped within that context. + + During string extraction, if a backslash is encountered in the input, + and the next character is not in the relevant CAN_ESCAPE_* variable, + nor is in the relevant DISCARD_* variable, then the backslash will be + retained in the output. */ + + explicit extract_string_ctrl (const char *can_escape_unquoted, + const char *can_escape_single, + const char *can_escape_double, + const char *discard_unquoted = nullptr, + const char *discard_single = nullptr, + const char *discard_double = nullptr) + : m_can_escape_unquoted (can_escape_unquoted), + m_can_escape_single (can_escape_single), + m_can_escape_double (can_escape_double), + m_discard_unquoted (discard_unquoted), + m_discard_single (discard_single), + m_discard_double (discard_double) + { + /* Nothing. */ + } + + /* Return true if, when character C appears escaped within an input + string, both the backslash escape character, and the character C + should be discarded from the input stream. */ + + bool discard_escaped_char (const char c, const char quote) const + { + const char *char_set = nullptr; + switch (quote) + { + case '\0': + char_set = m_discard_unquoted; + break; + case '\'': + char_set = m_discard_single; + break; + case '"': + char_set = m_discard_double; + break; + default: + gdb_assert_not_reached ("unexpected quote character '%c'", quote); + } + + /* Discard nothing. */ + if (char_set == nullptr) + return false; + + /* If C is in CHAR_SET then discard. */ + return strchr (char_set, c) != nullptr; + } + + /* Return true if, when character C appears escaped within an input + string, we should discard the previous escape character and just + include character C. Otherwise return false. */ + + bool is_escaped (const char c, const char quote) const + { + const char *char_set = nullptr; + switch (quote) + { + case '\0': + char_set = m_can_escape_unquoted; + break; + case '\'': + char_set = m_can_escape_single; + break; + case '"': + char_set = m_can_escape_double; + break; + default: + gdb_assert_not_reached ("unexpected quote character '%c'", quote); + } + + /* When no character set is defined we always return true, this means + that every character can be escaped. */ + if (char_set == nullptr) + return true; + + /* Otherwise, return true if C is in CHAR_SET. */ + return strchr (char_set, c) != nullptr; + } + +private: + /* The set of characters that can be escaped within a particular + context. See the constructor for more information. */ + const char *m_can_escape_unquoted; + const char *m_can_escape_single; + const char *m_can_escape_double; + + /* The set of characters that can be discarded within a particular + context. See the constructor for more information. */ + const char *m_discard_unquoted; + const char *m_discard_single; + const char *m_discard_double; +}; + +/* The default control configuration for extract_string_maybe_quoted. + This is the way it is for backwards compatibility reasons, though it is + unclear if this is actually a good thing or not. Every character in + every context can be escaped. This really makes very little sense as + in most locations in GDB no characters, other than quotes and white + space, actually have any special meaning, so really very little + escaping should be supported. */ + +extern extract_string_ctrl default_extract_string_ctrl; + /* Extract the next word from ARG. The next word is defined as either, everything up to the next space, or, if the next word starts with either a single or double quote, then everything up to the closing quote. The @@ -82,7 +211,9 @@ char *savestring (const char *ptr, size_t len); word, or, for quoted words, the first character after the closing quote. */ -std::string extract_string_maybe_quoted (const char **arg); +std::string extract_string_maybe_quoted + (const char **arg, + const extract_string_ctrl &ctrl = default_extract_string_ctrl); /* The strerror() function can return NULL for errno values that are out of range. Provide a "safe" version that always returns a