From patchwork Fri Jun  2 12:22:32 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pedro Alves <palves@redhat.com>
X-Patchwork-Id: 20713
Received: (qmail 1929 invoked by alias); 2 Jun 2017 12:23:18 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Delivered-To: mailing list gdb-patches@sourceware.org
Received: (qmail 1810 invoked by uid 89); 2 Jun 2017 12:23:17 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0,
	GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RP_MATCHES_RCVD,
	SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=converse,
	Within, unidentified
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Fri, 02 Jun 2017 12:23:13 +0000
Received: from smtp.corp.redhat.com
	(int-mx06.intmail.prod.int.phx2.redhat.com
	[10.5.11.16])	(using TLSv1.2 with cipher AECDH-AES256-SHA
	(256/256 bits))	(No client certificate requested)	by
	mx1.redhat.com (Postfix) with ESMTPS id A0367796E1	for
	<gdb-patches@sourceware.org>; Fri,  2 Jun 2017 12:23:15 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com A0367796E1
Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com;
	dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com;
	spf=pass smtp.mailfrom=palves@redhat.com
DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com A0367796E1
Received: from cascais.lan (ovpn04.gateway.prod.ext.ams2.redhat.com
	[10.39.146.4])	by smtp.corp.redhat.com (Postfix) with ESMTP
	id 06B914DA6D	for <gdb-patches@sourceware.org>;
	Fri,  2 Jun 2017 12:23:14 +0000 (UTC)
From: Pedro Alves <palves@redhat.com>
To: gdb-patches@sourceware.org
Subject: [PATCH 34/40] Make strcmp_iw NOT ignore whitespace in the middle of
	tokens
Date: Fri,  2 Jun 2017 13:22:32 +0100
Message-Id: <1496406158-12663-35-git-send-email-palves@redhat.com>
In-Reply-To: <1496406158-12663-1-git-send-email-palves@redhat.com>
References: <1496406158-12663-1-git-send-email-palves@redhat.com>

Currently, "b func tion" manages to set a breakpoint at "function"!

In all these years I had never noticed this, but now that the linespec
completer actually works, it easily happens by accident, with:

  "b func t<tab>"

expecting to get "thread", but getting instead:

  "b func tion"

...

Also, this:

  "b rettypefunc<int>"

manages to set a breakpoint on "rettype func<int>()".

These things happen due to strcmp_iw "magic".

Fix it by teaching strcmp_iw when it can skip whitespace.  This
required handling user-defined operators and scope operators, which
unfortunately complicates the code a bit.  I added unit tests for all
the corner cases I stumbled on while developing this, and in the end
wrote a testsuite testcase covering many of the same things and more
(later in the series).
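
As a rough illustration of the identifier rule only (not the code in
this patch), the sketch below treats whitespace as significant exactly
when it separates two identifier characters.  It normalizes each string
and then compares, whereas strncmp_iw_with_mode walks both strings in
lockstep.  The names is_ident_char, normalize_ws and names_match are
hypothetical, and the sketch deliberately leaves out the operator and
"::" handling, so inputs like "operator< <" or "foo: :bar" are not
rejected here as they are by the real function.

```cpp
#include <cassert>
#include <cctype>
#include <string>

/* Hypothetical helper: true if C can be part of an identifier.  */
static bool
is_ident_char (char c)
{
  return std::isalnum (static_cast<unsigned char> (c)) || c == '_';
}

/* Hypothetical helper: drop all whitespace from S, except that a run
   of whitespace separating two identifier characters is kept as a
   single space.  */
static std::string
normalize_ws (const std::string &s)
{
  std::string out;
  bool pending_ws = false;

  for (char c : s)
    {
      if (std::isspace (static_cast<unsigned char> (c)))
	{
	  pending_ws = true;
	  continue;
	}
      if (pending_ws && !out.empty ()
	  && is_ident_char (out.back ()) && is_ident_char (c))
	out += ' ';
      pending_ws = false;
      out += c;
    }
  return out;
}

/* Hypothetical helper: compare A and B, ignoring only the whitespace
   that does not split an identifier.  */
static bool
names_match (const std::string &a, const std::string &b)
{
  return normalize_ws (a) == normalize_ws (b);
}

/* A few of the cases from the description, as a usage example.  */
static void
self_check ()
{
  assert (!names_match ("function", "func tion"));
  assert (names_match ("function(int)", "function ( int )"));
}
```

With this rule "b func tion" no longer resolves to "function", while
spacing around punctuation such as "(" and ")" is still ignored.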

The operator_stoken changes are necessary due to a latent bug:
currently "operator char" becomes "operatorchar", and later lookups
only find it because strcmp_iw ignores the whitespace.
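
The effect can be sketched as follows, assuming (as the description
above implies) that operator_stoken simply appends its argument to
"operator".  make_operator_name is a hypothetical stand-in used only
for illustration, not the real function.

```cpp
#include <string>

/* Hypothetical stand-in for operator_stoken: build the full operator
   name by appending SUFFIX to "operator".  */
static std::string
make_operator_name (const std::string &suffix)
{
  return "operator" + suffix;
}
```

Passing "char" yields "operatorchar", which only matched the demangled
"operator char" because whitespace was ignored everywhere.  Passing
" char", as the c-exp.y change does, yields "operator char" directly.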

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	* c-exp.y (oper): Add space to operator names.
	* cp-support.c (cp_symbol_name_matches_1)
	(cp_fq_symbol_name_matches): Pass language to
	strncmp_iw_with_mode.
	(test_cp_symbol_name_cmp): Add unit tests.
	* language.c (default_symbol_name_matcher): Pass language to
	strncmp_iw_with_mode.
	* utils.c: Include "cp-support.h" and <algorithm>.
	(valid_identifier_name_char, cp_skip_operator_token, skip_ws)
	(cp_is_operator): New functions.
	(strncmp_iw_with_mode): Use them.  Add language parameter.  Don't
	skip whitespace in the symbol name when the lookup name doesn't
	have spaces, and vice versa.
	(strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode.
	* utils.h (strncmp_iw_with_mode): Add language parameter.
---
 gdb/c-exp.y      |   5 +-
 gdb/cp-support.c |  65 +++++++++++++++-
 gdb/language.c   |   2 +-
 gdb/utils.c      | 227 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 gdb/utils.h      |   3 +-
 5 files changed, 289 insertions(+), 13 deletions(-)
diff --git a/gdb/c-exp.y b/gdb/c-exp.y
index 24a2fbd..0a182cc 100644
--- a/gdb/c-exp.y
+++ b/gdb/c-exp.y
@@ -1487,7 +1487,7 @@ oper:	OPERATOR NEW
 	|	OPERATOR '>'
 			{ $$ = operator_stoken (">"); }
 	|	OPERATOR ASSIGN_MODIFY
-			{ const char *op = "unknown";
+			{ const char *op = " unknown";
 			  switch ($2)
 			    {
 			    case BINOP_RSH:
@@ -1563,7 +1563,8 @@ oper:	OPERATOR NEW
 
 			  c_print_type ($2, NULL, &buf, -1, 0,
 					&type_print_raw_options);
-			  $$ = operator_stoken (buf.c_str ());
+			  std::string name = " " + buf.string ();
+			  $$ = operator_stoken (name.c_str ());
 			}
 	;
 
diff --git a/gdb/cp-support.c b/gdb/cp-support.c
index 84d8a6b..4c353c5 100644
--- a/gdb/cp-support.c
+++ b/gdb/cp-support.c
@@ -1709,7 +1709,7 @@ cp_symbol_name_matches_1 (const char *symbol_search_name,
   while (true)
     {
       if (strncmp_iw_with_mode (sname, lookup_name, lookup_name_len,
-				mode) == 0)
+				mode, language_cplus) == 0)
 	{
 	  if (comp_match_res != NULL)
 	    {
@@ -1747,7 +1747,7 @@ cp_fq_symbol_name_matches (const char *symbol_search_name,
 
   if (strncmp_iw_with_mode (symbol_search_name,
 			    name.c_str (), name.size (),
-			    mode) == 0)
+			    mode, language_cplus) == 0)
     {
       if (comp_match_res != NULL)
 	{
@@ -1857,6 +1857,67 @@ test_cp_symbol_name_cmp ()
   CHECK_MATCH_C ("function(int)", "function(int)");
   CHECK_NOT_MATCH_C ("function(int)", "function()");
 
+  /* Check that whitespace within symbol names is not ignored.  */
+  CHECK_NOT_MATCH_C ("function", "func tion");
+  CHECK_NOT_MATCH_C ("func__tion", "func_ _tion");
+  CHECK_NOT_MATCH_C ("func11tion", "func1 1tion");
+
+  /* Check the converse, which can happen with template functions,
+     where the return type is part of the demangled name.  */
+  CHECK_NOT_MATCH_C ("func tion", "function");
+  CHECK_NOT_MATCH_C ("func1 1tion", "func11tion");
+  CHECK_NOT_MATCH_C ("func_ _tion", "func__tion");
+
+  /* Within parameters too.  */
+  CHECK_NOT_MATCH_C ("func(param)", "func(par am)");
+
+  /* Check handling of whitespace around C++ operators.  */
+  CHECK_NOT_MATCH_C ("operator<<", "opera tor<<");
+  CHECK_NOT_MATCH_C ("operator<<", "operator< <");
+  CHECK_NOT_MATCH_C ("operator<<", "operator < <");
+  CHECK_NOT_MATCH_C ("operator==", "operator= =");
+  CHECK_NOT_MATCH_C ("operator==", "operator = =");
+  CHECK_MATCH_C ("operator<<", "operator <<");
+  CHECK_MATCH_C ("operator<<()", "operator <<");
+  CHECK_NOT_MATCH_C ("operator<<()", "operator<<(int)");
+  CHECK_NOT_MATCH_C ("operator<<(int)", "operator<<()");
+  CHECK_MATCH_C ("operator==", "operator ==");
+  CHECK_MATCH_C ("operator==()", "operator ==");
+  CHECK_MATCH_C ("operator <<", "operator<<");
+  CHECK_MATCH_C ("operator ==", "operator==");
+  CHECK_MATCH_C ("operator bool", "operator  bool");
+  CHECK_MATCH_C ("operator bool ()", "operator  bool");
+  CHECK_MATCH_C ("operatorX<<", "operatorX < <");
+  CHECK_MATCH_C ("Xoperator<<", "Xoperator < <");
+
+  CHECK_MATCH_C ("operator()(int)", "operator()(int)");
+  CHECK_MATCH_C ("operator()(int)", "operator ( ) ( int )");
+  CHECK_MATCH_C ("operator()<long>(int)", "operator ( ) < long > ( int )");
+  /* The first "()" is not the parameter list.  */
+  CHECK_NOT_MATCH ("operator()(int)", "operator");
+
+  /* Misc user-defined operator tests.  */
+
+  CHECK_NOT_MATCH_C ("operator/=()", "operator ^=");
+  /* Same length at end of input.  */
+  CHECK_NOT_MATCH_C ("operator>>", "operator[]");
+  /* Same length but not at end of input.  */
+  CHECK_NOT_MATCH_C ("operator>>()", "operator[]()");
+
+  CHECK_MATCH_C ("base::operator char*()", "base::operator char*()");
+  CHECK_MATCH_C ("base::operator char*()", "base::operator char * ()");
+  CHECK_MATCH_C ("base::operator char**()", "base::operator char * * ()");
+  CHECK_MATCH ("base::operator char**()", "base::operator char * *");
+  CHECK_MATCH_C ("base::operator*()", "base::operator*()");
+  CHECK_NOT_MATCH_C ("base::operator char*()", "base::operatorc");
+  CHECK_NOT_MATCH ("base::operator char*()", "base::operator char");
+  CHECK_NOT_MATCH ("base::operator char*()", "base::operat");
+
+  /* Check handling of whitespace around C++ scope operators.  */
+  CHECK_NOT_MATCH_C ("foo::bar", "foo: :bar");
+  CHECK_MATCH_C ("foo::bar", "foo :: bar");
+  CHECK_MATCH_C ("foo :: bar", "foo::bar");
+
   /* Tests matching symbols in some scope.  */
   CHECK_MATCH_C ("foo::function()", "function");
   CHECK_MATCH_C ("foo::function(int)", "function");
diff --git a/gdb/language.c b/gdb/language.c
index 511a86f..377d748 100644
--- a/gdb/language.c
+++ b/gdb/language.c
@@ -723,7 +723,7 @@ default_symbol_name_matcher (const char *symbol_search_name,
 			  : strncmp_iw_mode::MATCH_PARAMS);
 
   if (strncmp_iw_with_mode (symbol_search_name, name.c_str (), name.size (),
-			    mode) == 0)
+			    mode, language_minimal) == 0)
     {
       if (comp_match_res != NULL)
 	{
diff --git a/gdb/utils.c b/gdb/utils.c
index 9798edc..484c1ef 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -65,6 +65,8 @@
 #include "gdb_usleep.h"
 #include "interps.h"
 #include "gdb_regex.h"
+#include "cp-support.h"
+#include <algorithm>
 
 #if !HAVE_DECL_MALLOC
 extern PTR malloc ();		/* ARI: PTR */
@@ -2418,22 +2420,227 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name,
     }
 }
 
+/* True if CH is a character that can be part of a symbol name.  I.e.,
+   either a number, a letter, or a '_'.  */
+
+static bool
+valid_identifier_name_char (int ch)
+{
+  return (isalnum (ch) || ch == '_');
+}
+
+/* Skip to end of token, or to END, whatever comes first.  */
+
+static const char *
+cp_skip_operator_token (const char *token, const char *end)
+{
+  const char *p = token;
+  while (p != end && !isspace (*p) && *p != '(')
+    {
+      if (valid_identifier_name_char (*p))
+	{
+	  while (p != end && valid_identifier_name_char (*p))
+	    p++;
+	  return p;
+	}
+      else
+	{
+	  /* Note, ordered such that among ops that share a prefix,
+	     longer comes first.  This is so that the loop below can
+	     bail on first match.  */
+	  static const char *ops[] =
+	    {
+	      "[",
+	      "]",
+	      "~",
+	      ",",
+	      "-=", "--", "->", "-",
+	      "+=", "++", "+",
+	      "*=", "*",
+	      "/=", "/",
+	      "%=", "%",
+	      "|=", "||", "|",
+	      "&=", "&&", "&",
+	      "^=", "^",
+	      "!=", "!",
+	      "<<=", "<=", "<<", "<",
+	      ">>=", ">=", ">>", ">",
+	      "==", "=",
+	    };
+
+	  for (const char *op : ops)
+	    {
+	      size_t oplen = strlen (op);
+	      size_t lencmp = std::min<size_t> (oplen, end - p);
+
+	      if (strncmp (p, op, lencmp) == 0)
+		return p + lencmp;
+	    }
+	  /* Some unidentified character.  Return it.  */
+	  return p + 1;
+	}
+    }
+
+  return p;
+}
+
+/* Advance string1/string2 past whitespace.  */
+
+static void
+skip_ws (const char *&string1, const char *&string2, const char *end_str2)
+{
+  while (isspace (*string1))
+    string1++;
+  while (string2 < end_str2 && isspace (*string2))
+    string2++;
+}
+
+static bool
+cp_is_operator (const char *string, const char *start)
+{
+  return ((string == start
+	   || !valid_identifier_name_char (string[-1]))
+	  && strncmp (string, CP_OPERATOR_STR, CP_OPERATOR_LEN) == 0
+	  && !valid_identifier_name_char (string[CP_OPERATOR_LEN]));
+}
+
 /* See utils.h.  */
 
 int
 strncmp_iw_with_mode (const char *string1, const char *string2,
-		      size_t string2_len, strncmp_iw_mode mode)
+		      size_t string2_len, strncmp_iw_mode mode,
+		      enum language language)
 {
+  const char *string1_start = string1;
   const char *end_str2 = string2 + string2_len;
+  bool skip_spaces = true;
+  bool have_colon_op = (language == language_cplus
+			|| language == language_rust
+			|| language == language_fortran);
 
   while (1)
     {
-      while (isspace (*string1))
-	string1++;
-      while (string2 < end_str2 && isspace (*string2))
-	string2++;
+      if (skip_spaces
+	  || ((isspace (*string1) && !valid_identifier_name_char (*string2))
+	      || (isspace (*string2) && !valid_identifier_name_char (*string1))))
+	{
+	  skip_ws (string1, string2, end_str2);
+	  skip_spaces = false;
+	}
+
       if (*string1 == '\0' || string2 == end_str2)
 	break;
+
+      /* Handle the :: operator.  */
+      if (have_colon_op && string1[0] == ':' && string1[1] == ':')
+	{
+	  if (*string2 != ':')
+	    return 1;
+
+	  string1++;
+	  string2++;
+
+	  if (string2 == end_str2)
+	    break;
+
+	  if (*string2 != ':')
+	    return 1;
+
+	  string1++;
+	  string2++;
+
+	  while (isspace (*string1))
+	    string1++;
+	  while (string2 < end_str2 && isspace (*string2))
+	    string2++;
+	  continue;
+	}
+
+      /* Handle C++ user-defined operators.  */
+      else if (language == language_cplus
+	       && *string1 == 'o')
+	{
+	  if (cp_is_operator (string1, string1_start))
+	    {
+	      /* An operator name in STRING1.  Check STRING2.  */
+	      size_t cmplen = std::min<size_t> (CP_OPERATOR_LEN, end_str2 - string2);
+	      if (strncmp (string1, string2, cmplen) != 0)
+		return 1;
+
+	      string1 += cmplen;
+	      string2 += cmplen;
+
+	      if (string2 != end_str2)
+		{
+		  /* Check for "operatorX" in STRING2.  */
+		  if (valid_identifier_name_char (*string2))
+		    return 1;
+
+		  skip_ws (string1, string2, end_str2);
+		}
+
+	      /* Handle operator().  */
+	      if (*string1 == '(')
+		{
+		  if (string2 == end_str2)
+		    {
+		      if (mode == strncmp_iw_mode::NORMAL)
+			return 0;
+		      else
+			{
+			  /* Don't break for the regular return at the
+			     bottom, because "operator" should not
+			     match "operator()", since this open
+			     parentheses is not the parameter list
+			     start.  */
+			  return *string1 != '\0';
+			}
+		    }
+
+		  if (*string1 != *string2)
+		    return 1;
+
+		  string1++;
+		  string2++;
+		}
+
+	      while (1)
+		{
+		  skip_ws (string1, string2, end_str2);
+
+		  /* Skip to end of token, or to END, whatever comes
+		     first.  */
+		  const char *end_str1 = string1 + strlen (string1);
+		  const char *p1 = cp_skip_operator_token (string1, end_str1);
+		  const char *p2 = cp_skip_operator_token (string2, end_str2);
+
+		  cmplen = std::min (p1 - string1, p2 - string2);
+		  if (p2 == end_str2)
+		    {
+		      if (strncmp (string1, string2, cmplen) != 0)
+			return 1;
+		    }
+		  else
+		    {
+		      if (p1 - string1 != p2 - string2)
+			return 1;
+		      if (strncmp (string1, string2, cmplen) != 0)
+			return 1;
+		    }
+
+		  string1 += cmplen;
+		  string2 += cmplen;
+
+		  if (*string1 == '\0' || string2 == end_str2)
+		    break;
+		  if (*string1 == '(' || *string2 == '(')
+		    break;
+		}
+
+	      continue;
+	    }
+	}
+
       if (case_sensitivity == case_sensitive_on && *string1 != *string2)
 	break;
       if (case_sensitivity == case_sensitive_off
@@ -2441,6 +2648,12 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
 	      != tolower ((unsigned char) *string2)))
 	break;
 
+      /* If we see any non-whitespace, non-identifier-name character
+	 (any of "()<>*&" etc.), then skip spaces the next time
+	 around.  */
+      if (!isspace (*string1) && !valid_identifier_name_char (*string1))
+	skip_spaces = true;
+
       string1++;
       string2++;
     }
@@ -2462,7 +2675,7 @@ int
 strncmp_iw (const char *string1, const char *string2, size_t string2_len)
 {
   return strncmp_iw_with_mode (string1, string2, string2_len,
-			       strncmp_iw_mode::NORMAL);
+			       strncmp_iw_mode::NORMAL, language_minimal);
 }
 
 /* See utils.h.  */
@@ -2471,7 +2684,7 @@ int
 strcmp_iw (const char *string1, const char *string2)
 {
   return strncmp_iw_with_mode (string1, string2, strlen (string2),
-			       strncmp_iw_mode::MATCH_PARAMS);
+			       strncmp_iw_mode::MATCH_PARAMS, language_minimal);
 }
 
 /* This is like strcmp except that it ignores whitespace and treats
diff --git a/gdb/utils.h b/gdb/utils.h
index 9e531e0..4ce263e 100644
--- a/gdb/utils.h
+++ b/gdb/utils.h
@@ -56,7 +56,8 @@ enum class strncmp_iw_mode
 extern int strncmp_iw_with_mode (const char *string1,
 				 const char *string2,
 				 size_t string2_len,
-				 strncmp_iw_mode mode);
+				 strncmp_iw_mode mode,
+				 enum language language);
 
 /* Do a strncmp() type operation on STRING1 and STRING2, ignoring any
    differences in whitespace.  STRING2_LEN is STRING2's length.