From patchwork Fri Jun 2 12:22:32 2017
X-Patchwork-Submitter: Pedro Alves
X-Patchwork-Id: 20713
From: Pedro Alves <palves@redhat.com>
To: gdb-patches@sourceware.org
Subject: [PATCH 34/40] Make strcmp_iw NOT ignore whitespace in the middle of tokens
Date: Fri, 2 Jun 2017 13:22:32 +0100
Message-Id: <1496406158-12663-35-git-send-email-palves@redhat.com>
In-Reply-To: <1496406158-12663-1-git-send-email-palves@redhat.com>
References: <1496406158-12663-1-git-send-email-palves@redhat.com>

Currently, "b func tion" manages to set a breakpoint at "function"!

All these years I had never noticed this, but now that the linespec
completer actually works, this easily happens by accident, with:

 "b func t"

expecting to get "thread", but getting instead:

 "b func tion"

...

Also, this:

 "b rettypefunc"

manages to set a breakpoint on "rettype func()".

These things happen due to strcmp_iw "magic".

Fix it by teaching strcmp_iw when it can skip whitespace.  This
required handling user-defined operators and scope operators, which
unfortunately complicates the code a bit.

I added unit tests for all the corner cases I stumbled on while
developing this, and then in the end wrote a testsuite testcase
covering many of the same things and more (later in the series).

The operator_stoken changes are necessary due to a latent bug:
currently "operator char" becomes "operatorchar", and later lookups
only find it because strcmp_iw ignores the whitespace...

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	* c-exp.y (oper): Add space to operator names.
	* cp-support.c (cp_symbol_name_matches_1)
	(cp_fq_symbol_name_matches): Pass language to
	strncmp_iw_with_mode.
	(test_cp_symbol_name_cmp): Add unit tests.
	* language.c (default_symbol_name_matcher): Pass language to
	strncmp_iw_with_mode.
	* utils.c: Include "cp-support.h" and <algorithm>.
	(valid_identifier_name_char, cp_skip_operator_token, skip_ws)
	(cp_is_operator): New functions.
	(strncmp_iw_with_mode): Use them.  Add language parameter.
	Don't skip whitespace in the symbol name when the lookup name
	doesn't have spaces, and vice versa.
	(strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode.
	* utils.h (strncmp_iw_with_mode): Add language parameter.
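
For readers who want the core rule in isolation, here is a minimal
standalone sketch.  It is NOT the algorithm in this patch: instead of
comparing two strings in lockstep, it normalizes each name and then
compares the results.  The names ident_char, normalize and
names_match are hypothetical and exist only for this illustration.
The rule it models is that a run of whitespace may be dropped unless
it separates two identifier characters.  It deliberately ignores the
operator and "::" special cases, so unlike the patch it would treat
"operator< <" and "operator<<" as equal.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper modeled on the patch's
// valid_identifier_name_char: a letter, a digit, or '_'.
static bool
ident_char (unsigned char ch)
{
  return std::isalnum (ch) || ch == '_';
}

// Drop every run of whitespace, except that a run separating two
// identifier characters is kept as a single space.
static std::string
normalize (const std::string &s)
{
  std::string out;
  std::size_t i = 0;

  while (i < s.size ())
    {
      if (std::isspace ((unsigned char) s[i]))
        {
          // Find the end of this whitespace run.
          std::size_t j = i;
          while (j < s.size () && std::isspace ((unsigned char) s[j]))
            j++;

          bool prev_ident = !out.empty () && ident_char (out.back ());
          bool next_ident = j < s.size () && ident_char (s[j]);
          if (prev_ident && next_ident)
            out += ' ';
          i = j;
        }
      else
        out += s[i++];
    }

  return out;
}

// Toy model of the comparison: equal after normalization.
static bool
names_match (const std::string &symbol, const std::string &lookup)
{
  return normalize (symbol) == normalize (lookup);
}
```

With this model, "func tion" no longer matches "function", while
"foo :: bar" still matches "foo::bar" and "func( param )" still
matches "func(param)".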
---
 gdb/c-exp.y      |   5 +-
 gdb/cp-support.c |  65 +++++++++++++++-
 gdb/language.c   |   2 +-
 gdb/utils.c      | 227 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 gdb/utils.h      |   3 +-
 5 files changed, 289 insertions(+), 13 deletions(-)

diff --git a/gdb/c-exp.y b/gdb/c-exp.y
index 24a2fbd..0a182cc 100644
--- a/gdb/c-exp.y
+++ b/gdb/c-exp.y
@@ -1487,7 +1487,7 @@ oper:	OPERATOR NEW
         |       OPERATOR '>'
                         { $$ = operator_stoken (">"); }
         |       OPERATOR ASSIGN_MODIFY
-                        { const char *op = "unknown";
+                        { const char *op = " unknown";
                           switch ($2)
                             {
                             case BINOP_RSH:
@@ -1563,7 +1563,8 @@ oper:	OPERATOR NEW
 
                           c_print_type ($2, NULL, &buf, -1, 0,
                                         &type_print_raw_options);
-                          $$ = operator_stoken (buf.c_str ());
+                          std::string name = " " + buf.string ();
+                          $$ = operator_stoken (name.c_str ());
                         }
         ;
 
diff --git a/gdb/cp-support.c b/gdb/cp-support.c
index 84d8a6b..4c353c5 100644
--- a/gdb/cp-support.c
+++ b/gdb/cp-support.c
@@ -1709,7 +1709,7 @@ cp_symbol_name_matches_1 (const char *symbol_search_name,
   while (true)
     {
       if (strncmp_iw_with_mode (sname, lookup_name, lookup_name_len,
-                                mode) == 0)
+                                mode, language_cplus) == 0)
         {
           if (comp_match_res != NULL)
             {
@@ -1747,7 +1747,7 @@ cp_fq_symbol_name_matches (const char *symbol_search_name,
 
   if (strncmp_iw_with_mode (symbol_search_name,
                             name.c_str (), name.size (),
-                            mode) == 0)
+                            mode, language_cplus) == 0)
     {
       if (comp_match_res != NULL)
         {
@@ -1857,6 +1857,67 @@ test_cp_symbol_name_cmp ()
   CHECK_MATCH_C ("function(int)", "function(int)");
   CHECK_NOT_MATCH_C ("function(int)", "function()");
 
+  /* Check that whitespace within symbol names is not ignored.  */
+  CHECK_NOT_MATCH_C ("function", "func tion");
+  CHECK_NOT_MATCH_C ("func__tion", "func_ _tion");
+  CHECK_NOT_MATCH_C ("func11tion", "func1 1tion");
+
+  /* Check the converse, which can happen with template function,
+     where the return type is part of the demangled name.  */
+  CHECK_NOT_MATCH_C ("func tion", "function");
+  CHECK_NOT_MATCH_C ("func1 1tion", "func11tion");
+  CHECK_NOT_MATCH_C ("func_ _tion", "func__tion");
+
+  /* Within parameters too.  */
+  CHECK_NOT_MATCH_C ("func(param)", "func(par am)");
+
+  /* Check handling of whitespace around C++ operators.  */
+  CHECK_NOT_MATCH_C ("operator<<", "opera tor<<");
+  CHECK_NOT_MATCH_C ("operator<<", "operator< <");
+  CHECK_NOT_MATCH_C ("operator<<", "operator < <");
+  CHECK_NOT_MATCH_C ("operator==", "operator= =");
+  CHECK_NOT_MATCH_C ("operator==", "operator = =");
+  CHECK_MATCH_C ("operator<<", "operator <<");
+  CHECK_MATCH_C ("operator<<()", "operator <<");
+  CHECK_NOT_MATCH_C ("operator<<()", "operator<<(int)");
+  CHECK_NOT_MATCH_C ("operator<<(int)", "operator<<()");
+  CHECK_MATCH_C ("operator==", "operator ==");
+  CHECK_MATCH_C ("operator==()", "operator ==");
+  CHECK_MATCH_C ("operator <<", "operator<<");
+  CHECK_MATCH_C ("operator ==", "operator==");
+  CHECK_MATCH_C ("operator bool", "operator bool");
+  CHECK_MATCH_C ("operator bool ()", "operator bool");
+  CHECK_MATCH_C ("operatorX<<", "operatorX < <");
+  CHECK_MATCH_C ("Xoperator<<", "Xoperator < <");
+
+  CHECK_MATCH_C ("operator()(int)", "operator()(int)");
+  CHECK_MATCH_C ("operator()(int)", "operator ( ) ( int )");
+  CHECK_MATCH_C ("operator()<long>(int)", "operator ( ) < long > ( int )");
+  /* The first "()" is not the parameter list.  */
+  CHECK_NOT_MATCH ("operator()(int)", "operator");
+
+  /* Misc user-defined operator tests.  */
+
+  CHECK_NOT_MATCH_C ("operator/=()", "operator ^=");
+  /* Same length at end of input.  */
+  CHECK_NOT_MATCH_C ("operator>>", "operator[]");
+  /* Same length but not at end of input.  */
+  CHECK_NOT_MATCH_C ("operator>>()", "operator[]()");
+
+  CHECK_MATCH_C ("base::operator char*()", "base::operator char*()");
+  CHECK_MATCH_C ("base::operator char*()", "base::operator char * ()");
+  CHECK_MATCH_C ("base::operator char**()", "base::operator char * * ()");
+  CHECK_MATCH ("base::operator char**()", "base::operator char * *");
+  CHECK_MATCH_C ("base::operator*()", "base::operator*()");
+  CHECK_NOT_MATCH_C ("base::operator char*()", "base::operatorc");
+  CHECK_NOT_MATCH ("base::operator char*()", "base::operator char");
+  CHECK_NOT_MATCH ("base::operator char*()", "base::operat");
+
+  /* Check handling of whitespace around C++ scope operators.  */
+  CHECK_NOT_MATCH_C ("foo::bar", "foo: :bar");
+  CHECK_MATCH_C ("foo::bar", "foo :: bar");
+  CHECK_MATCH_C ("foo :: bar", "foo::bar");
+
   /* Tests matching symbols in some scope.  */
   CHECK_MATCH_C ("foo::function()", "function");
   CHECK_MATCH_C ("foo::function(int)", "function");
diff --git a/gdb/language.c b/gdb/language.c
index 511a86f..377d748 100644
--- a/gdb/language.c
+++ b/gdb/language.c
@@ -723,7 +723,7 @@ default_symbol_name_matcher (const char *symbol_search_name,
                      : strncmp_iw_mode::MATCH_PARAMS);
 
   if (strncmp_iw_with_mode (symbol_search_name, name.c_str (), name.size (),
-                            mode) == 0)
+                            mode, language_minimal) == 0)
     {
       if (comp_match_res != NULL)
         {
diff --git a/gdb/utils.c b/gdb/utils.c
index 9798edc..484c1ef 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -65,6 +65,8 @@
 #include "gdb_usleep.h"
 #include "interps.h"
 #include "gdb_regex.h"
+#include "cp-support.h"
+#include <algorithm>
 
 #if !HAVE_DECL_MALLOC
 extern PTR malloc ();           /* ARI: PTR */
@@ -2418,22 +2420,227 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name,
     }
 }
 
+/* True if CH is a character that can be part of a symbol name.  I.e.,
+   either a number, a letter, or a '_'.  */
+
+static bool
+valid_identifier_name_char (int ch)
+{
+  return (isalnum (ch) || ch == '_');
+}
+
+/* Skip to end of token, or to END, whatever comes first.  */
+
+static const char *
+cp_skip_operator_token (const char *token, const char *end)
+{
+  const char *p = token;
+  while (p != end && !isspace (*p) && *p != '(')
+    {
+      if (valid_identifier_name_char (*p))
+        {
+          while (p != end && valid_identifier_name_char (*p))
+            p++;
+          return p;
+        }
+      else
+        {
+          /* Note, ordered such that among ops that share a prefix,
+             longer comes first.  This is so that the loop below can
+             bail on first match.  */
+          static const char *ops[] =
+            {
+              "[",
+              "]",
+              "~",
+              ",",
+              "-=", "--", "->", "-",
+              "+=", "++", "+",
+              "*=", "*",
+              "/=", "/",
+              "%=", "%",
+              "|=", "||", "|",
+              "&=", "&&", "&",
+              "^=", "^",
+              "!=", "!",
+              "<<=", "<=", "<<", "<",
+              ">>=", ">=", ">>", ">",
+              "==", "=",
+            };
+
+          for (const char *op : ops)
+            {
+              size_t oplen = strlen (op);
+              size_t lencmp = std::min<size_t> (oplen, end - p);
+
+              if (strncmp (p, op, lencmp) == 0)
+                return p + lencmp;
+            }
+          /* Some unidentified character.  Return it.  */
+          return p + 1;
+        }
+    }
+
+  return p;
+}
+
+/* Advance string1/string2 past whitespace.  */
+
+static void
+skip_ws (const char *&string1, const char *&string2, const char *end_str2)
+{
+  while (isspace (*string1))
+    string1++;
+  while (string2 < end_str2 && isspace (*string2))
+    string2++;
+}
+
+static bool
+cp_is_operator (const char *string, const char *start)
+{
+  return ((string == start
+           || !valid_identifier_name_char (string[-1]))
+          && strncmp (string, CP_OPERATOR_STR, CP_OPERATOR_LEN) == 0
+          && !valid_identifier_name_char (string[CP_OPERATOR_LEN]));
+}
+
 /* See utils.h.  */
 
 int
 strncmp_iw_with_mode (const char *string1, const char *string2,
-                      size_t string2_len, strncmp_iw_mode mode)
+                      size_t string2_len, strncmp_iw_mode mode,
+                      enum language language)
 {
+  const char *string1_start = string1;
   const char *end_str2 = string2 + string2_len;
+  bool skip_spaces = true;
+  bool have_colon_op = (language == language_cplus
+                        || language == language_rust
+                        || language == language_fortran);
 
   while (1)
     {
-      while (isspace (*string1))
-        string1++;
-      while (string2 < end_str2 && isspace (*string2))
-        string2++;
+      if (skip_spaces
+          || ((isspace (*string1) && !valid_identifier_name_char (*string2))
+              || (isspace (*string2) && !valid_identifier_name_char (*string1))))
+        {
+          skip_ws (string1, string2, end_str2);
+          skip_spaces = false;
+        }
+
       if (*string1 == '\0' || string2 == end_str2)
         break;
+
+      /* Handle the :: operator.  */
+      if (have_colon_op && string1[0] == ':' && string1[1] == ':')
+        {
+          if (*string2 != ':')
+            return 1;
+
+          string1++;
+          string2++;
+
+          if (string2 == end_str2)
+            break;
+
+          if (*string2 != ':')
+            return 1;
+
+          string1++;
+          string2++;
+
+          while (isspace (*string1))
+            string1++;
+          while (string2 < end_str2 && isspace (*string2))
+            string2++;
+          continue;
+        }
+
+      /* Handle C++ user-defined operators.  */
+      else if (language == language_cplus
+               && *string1 == 'o')
+        {
+          if (cp_is_operator (string1, string1_start))
+            {
+              /* An operator name in STRING1.  Check STRING2.  */
+              size_t cmplen = std::min<size_t> (CP_OPERATOR_LEN, end_str2 - string2);
+              if (strncmp (string1, string2, cmplen) != 0)
+                return 1;
+
+              string1 += cmplen;
+              string2 += cmplen;
+
+              if (string2 != end_str2)
+                {
+                  /* Check for "operatorX" in STRING2.  */
+                  if (valid_identifier_name_char (*string2))
+                    return 1;
+
+                  skip_ws (string1, string2, end_str2);
+                }
+
+              /* Handle operator().  */
+              if (*string1 == '(')
+                {
+                  if (string2 == end_str2)
+                    {
+                      if (mode == strncmp_iw_mode::NORMAL)
+                        return 0;
+                      else
+                        {
+                          /* Don't break for the regular return at the
+                             bottom, because "operator" should not
+                             match "operator()", since this open
+                             parentheses is not the parameter list
+                             start.  */
+                          return *string1 != '\0';
+                        }
+                    }
+
+                  if (*string1 != *string2)
+                    return 1;
+
+                  string1++;
+                  string2++;
+                }
+
+              while (1)
+                {
+                  skip_ws (string1, string2, end_str2);
+
+                  /* Skip to end of token, or to END, whatever comes
+                     first.  */
+                  const char *end_str1 = string1 + strlen (string1);
+                  const char *p1 = cp_skip_operator_token (string1, end_str1);
+                  const char *p2 = cp_skip_operator_token (string2, end_str2);
+
+                  cmplen = std::min<size_t> (p1 - string1, p2 - string2);
+                  if (p2 == end_str2)
+                    {
+                      if (strncmp (string1, string2, cmplen) != 0)
+                        return 1;
+                    }
+                  else
+                    {
+                      if (p1 - string1 != p2 - string2)
+                        return 1;
+                      if (strncmp (string1, string2, cmplen) != 0)
+                        return 1;
+                    }
+
+                  string1 += cmplen;
+                  string2 += cmplen;
+
+                  if (*string1 == '\0' || string2 == end_str2)
+                    break;
+                  if (*string1 == '(' || *string2 == '(')
+                    break;
+                }
+
+              continue;
+            }
+        }
+
       if (case_sensitivity == case_sensitive_on && *string1 != *string2)
         break;
       if (case_sensitivity == case_sensitive_off
@@ -2441,6 +2648,12 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
               != tolower ((unsigned char) *string2)))
         break;
 
+      /* If we see any non-whitespace, non-identifier-name character
+         (any of "()<>*&" etc.), then skip spaces the next time
+         around.  */
+      if (!isspace (*string1) && !valid_identifier_name_char (*string1))
+        skip_spaces = true;
+
       string1++;
       string2++;
     }
@@ -2462,7 +2675,7 @@ int
 strncmp_iw (const char *string1, const char *string2, size_t string2_len)
 {
   return strncmp_iw_with_mode (string1, string2, string2_len,
-                               strncmp_iw_mode::NORMAL);
+                               strncmp_iw_mode::NORMAL, language_minimal);
 }
 
 /* See utils.h.  */
@@ -2471,7 +2684,7 @@ int
 strcmp_iw (const char *string1, const char *string2)
 {
   return strncmp_iw_with_mode (string1, string2, strlen (string2),
-                               strncmp_iw_mode::MATCH_PARAMS);
+                               strncmp_iw_mode::MATCH_PARAMS, language_minimal);
 }
 
 /* This is like strcmp except that it ignores whitespace and treats
diff --git a/gdb/utils.h b/gdb/utils.h
index 9e531e0..4ce263e 100644
--- a/gdb/utils.h
+++ b/gdb/utils.h
@@ -56,7 +56,8 @@ enum class strncmp_iw_mode
 extern int strncmp_iw_with_mode (const char *string1,
                                  const char *string2,
                                  size_t string2_len,
-                                 strncmp_iw_mode mode);
+                                 strncmp_iw_mode mode,
+                                 enum language language);
 
 /* Do a strncmp() type operation on STRING1 and STRING2, ignoring any
    differences in whitespace.  STRING2_LEN is STRING2's length.
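
As a side note on the ops table in cp_skip_operator_token: the
"longer comes first" ordering is what lets a first-match loop behave
like a longest-match tokenizer.  The sketch below illustrates just
that idea with a trimmed-down table.  It is a simplified assumption,
not the patch's code: operator_token_len and the shortened ops array
are hypothetical, and unlike the patch it requires the whole operator
to be present rather than accepting a prefix truncated by the end of
input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical, shortened operator table.  Among entries sharing a
// prefix, the longer one is listed first, so the first hit in a
// linear scan is also the longest match.
static constexpr std::string_view ops[] =
{
  "<<=", "<=", "<<", "<",
  ">>=", ">=", ">>", ">",
  "==", "=",
};

// Return the length of the operator token at the start of TEXT.  An
// unrecognized character counts as a one-character token; empty
// input yields 0.
static std::size_t
operator_token_len (std::string_view text)
{
  for (std::string_view op : ops)
    if (text.substr (0, op.size ()) == op)
      return op.size ();

  return text.empty () ? 0 : 1;
}
```

Because "<<" is tried before "<", the input "<< 1" yields one
two-character token, while "< <" yields a one-character token
followed by whitespace.  That difference is how "operator<<" and
"operator< <" end up as distinct names.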