Expand keyword lexing intelligence in the linespec parser.

Message ID 20150320203235.29105.19172.stgit@valrhona.uglyboxes.com
State New, archived
Headers

Commit Message

Keith Seitz March 20, 2015, 8:32 p.m. UTC
  This patch changes the heuristic the linespec lexer uses to
detect a keyword in the input stream.

Currently, the heuristic is: a word is a keyword if it
1) points to a string that is a keyword
2) is followed by a non-identifier character

This is strictly more correct than using whitespace. For example,
it allows constructs such as "break foo if(i == 1)". However,
find_condition_and_thread in breakpoint.c does not support this expanded
usage. It requires whitespace to follow the keyword.

The proposed new heuristic is: a word is a keyword if it
1) points to a string that is a keyword
2) is followed by whitespace
3) is not followed by another keyword string followed by whitespace

This additional complexity allows constructs such as
"break thread thread 3" and "break thread 3".  In the former case,
the actual location is a symbol named "thread" to be set on thread #3.
In the later case, the location is NULL, i.e., the default location,
to be set on thread #3.

In order to pass all the new tests added here, I've also had to add a
new feature to parse_breakpoint_sals, which expands recognition of the
default location to keywords other than "if", which is the only keyword
currently permitted with the default (NULL) location, but there is no
reason to exclude other keywords.

Consequently, it will be possible to use "break thread 1" or
"break task 1".

In addition to all of this, it is now possible to remove the keyword_ok
state from the linespec parser.

gdb/ChangeLog

	* breakpoint.c (parse_breakpoint_sals): Use
	linespec_lexer_lex_keyword to ascertain if the user specified
	a NULL location.
	* linespec.c [IF_KEYWORD_INDEX]: Define.
	(linespec_lexer_lex_keyword): Export.
	(struct ls_parser) <keyword_ok>: Remove.
	A keyword is only a keyword if not followed by another keyword.
	(linespec_lexer_lex_one): Remove keyword_ok handling.
	Add comment explaining why the parsing stream is not advanced
	when a keyword is seen.
	(parse_linespec): Remove parser->keyword_ok.
	* linespec.h (linespec_lexer_lex_keyword): Add declaration.

gdb/testsuite/ChangeLog

	* gdb.linespec/keywords.c: New file.
	* gdb.linespec/keywords.exp: New file.
---
 gdb/breakpoint.c                        |    3 -
 gdb/linespec.c                          |   55 ++++++++++++++--------
 gdb/linespec.h                          |    4 ++
 gdb/testsuite/gdb.linespec/keywords.c   |   36 ++++++++++++++
 gdb/testsuite/gdb.linespec/keywords.exp |   77 +++++++++++++++++++++++++++++++
 5 files changed, 153 insertions(+), 22 deletions(-)
 create mode 100644 gdb/testsuite/gdb.linespec/keywords.c
 create mode 100644 gdb/testsuite/gdb.linespec/keywords.exp
  

Comments

Joel Brobecker March 23, 2015, 1:31 p.m. UTC | #1
> gdb/ChangeLog
> 
> 	* breakpoint.c (parse_breakpoint_sals): Use
> 	linespec_lexer_lex_keyword to ascertain if the user specified
> 	a NULL location.
> 	* linespec.c [IF_KEYWORD_INDEX]: Define.
> 	(linespec_lexer_lex_keyword): Export.
> 	(struct ls_parser) <keyword_ok>: Remove.
> 	A keyword is only a keyword if not followed by another keyword.
> 	(linespec_lexer_lex_one): Remove keyword_ok handling.
> 	Add comment explaining why the parsing stream is not advanced
> 	when a keyword is seen.
> 	(parse_linespec): Remove parser->keyword_ok.
> 	* linespec.h (linespec_lexer_lex_keyword): Add declaration.
> 
> gdb/testsuite/ChangeLog
> 
> 	* gdb.linespec/keywords.c: New file.
> 	* gdb.linespec/keywords.exp: New file.

LGTM. A few minor comments, but otherwise pre-approved.

> +++ b/gdb/linespec.h
> @@ -152,4 +152,8 @@ extern struct symtabs_and_lines decode_line_with_current_source (char *, int);
>  
>  extern struct symtabs_and_lines decode_line_with_last_displayed (char *, int);
>  
> +/* Does P represent one of the keywords?  If so, return
> +   the keyword.  If not, return NULL.  */
> +
> +extern const char *linespec_lexer_lex_keyword (const char *p);

You already have the function documented in linespec.c, so no need
to duplicate it here.

> +gdb_test "break thread 123" "Unknown thread 123\."

I think that "\." should be "\\.", no? Likewise for the expected
output of the remaining tests.

> +gdb_test "break thread foo" "Junk after thread keyword\."
> +gdb_test "break task 123" "Unknown task 123\."
> +gdb_test "break task foo" "Junk after task keyword\."
> +gdb_breakpoint "thread if 0" "message"
> +
> +# These are also NULL locations, but using a subsequent keyword
> +# as the "junk".
> +gdb_test "break thread thread" "Junk after thread keyword\."
> +gdb_test "break thread task" "Junk after thread keyword\."
> +gdb_test "break thread if" "Junk after thread keyword\."
> +gdb_test "break task task" "Junk after task keyword\."
> +gdb_test "break task thread" "Junk after task keyword\."
> +gdb_test "break task if" "Junk after task keyword\."
> +
> +# Test locations containing keyword followed by keyword.
> +gdb_test "break thread thread 123" "Unknown thread 123\."
> +gdb_test "break task task 123" "Unknown task 123\."
> +
> +# Test NULL location with valid conditional containing a keyword.
> +gdb_breakpoint "thread if thread == 0"
> +gdb_breakpoint "task if task == 0"
  
Keith Seitz March 23, 2015, 8:22 p.m. UTC | #2
On 03/23/2015 06:31 AM, Joel Brobecker wrote:
> LGTM. A few minor comments, but otherwise pre-approved.
>
>> +++ b/gdb/linespec.h
>> @@ -152,4 +152,8 @@ extern struct symtabs_and_lines decode_line_with_current_source (char *, int);
>>
>>   extern struct symtabs_and_lines decode_line_with_last_displayed (char *, int);
>>
>> +/* Does P represent one of the keywords?  If so, return
>> +   the keyword.  If not, return NULL.  */
>> +
>> +extern const char *linespec_lexer_lex_keyword (const char *p);
>
> You already have the function documented in linespec.c, so no need
> to duplicate it here.

Bah. I forgot all about that. It's been a hectic Monday. I'll push a 
patch to remove this very soon.

>> +gdb_test "break thread 123" "Unknown thread 123\."
>
> I think that "\." should be "\\.", no? Likewise for the expected
> output of the remaining tests.

Yes, indeed!

Thank you for the review, Joel.

Keith
  

Patch

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index 0c000f2..e35878b 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -9335,8 +9335,7 @@  parse_breakpoint_sals (char **address,
 {
   /* If no arg given, or if first arg is 'if ', use the default
      breakpoint.  */
-  if ((*address) == NULL
-      || (startswith ((*address), "if") && isspace ((*address)[2])))
+  if ((*address) == NULL || linespec_lexer_lex_keyword (*address))
     {
       /* The last displayed codepoint, if it's valid, is our default breakpoint
          address.  */
diff --git a/gdb/linespec.c b/gdb/linespec.c
index 9ec4a5e..90c07a2 100644
--- a/gdb/linespec.c
+++ b/gdb/linespec.c
@@ -246,6 +246,7 @@  typedef enum ls_token_type linespec_token_type;
 /* List of keywords  */
 
 static const char * const linespec_keywords[] = { "if", "thread", "task" };
+#define IF_KEYWORD_INDEX 0
 
 /* A token of the linespec lexer  */
 
@@ -290,11 +291,6 @@  struct ls_parser
   /* Is the entire linespec quote-enclosed?  */
   int is_quote_enclosed;
 
-  /* Is a keyword syntactically valid at this point?
-     In, e.g., "break thread thread 1", the leading "keyword" must not
-     be interpreted as such.  */
-  int keyword_ok;
-
   /* The state of the parse.  */
   struct linespec_state state;
 #define PARSER_STATE(PPTR) (&(PPTR)->state)
@@ -421,7 +417,7 @@  linespec_lexer_lex_number (linespec_parser *parser, linespec_token *tokenp)
 /* Does P represent one of the keywords?  If so, return
    the keyword.  If not, return NULL.  */
 
-static const char *
+const char *
 linespec_lexer_lex_keyword (const char *p)
 {
   int i;
@@ -433,11 +429,34 @@  linespec_lexer_lex_keyword (const char *p)
 	  int len = strlen (linespec_keywords[i]);
 
 	  /* If P begins with one of the keywords and the next
-	     character is not a valid identifier character,
-	     we have found a keyword.  */
+	     character is whitespace, we may have found a keyword.
+	     It is only a keyword if it is not followed by another
+	     keyword.  */
 	  if (strncmp (p, linespec_keywords[i], len) == 0
-	      && !(isalnum (p[len]) || p[len] == '_'))
-	    return linespec_keywords[i];
+	      && isspace (p[len]))
+	    {
+	      int j;
+
+	      /* Special case: "if" ALWAYS stops the lexer, since it
+		 is not possible to predict what is going to appear in
+		 the condition, which can only be parsed after SaLs have
+		 been found.  */
+	      if (i != IF_KEYWORD_INDEX)
+		{
+		  p += len;
+		  p = skip_spaces_const (p);
+		  for (j = 0; j < ARRAY_SIZE (linespec_keywords); ++j)
+		    {
+		      int nextlen = strlen (linespec_keywords[j]);
+
+		      if (strncmp (p, linespec_keywords[j], nextlen) == 0
+			  && isspace (p[nextlen]))
+			return NULL;
+		    }
+		}
+
+	      return linespec_keywords[i];
+	    }
 	}
     }
 
@@ -734,13 +753,16 @@  linespec_lexer_lex_one (linespec_parser *parser)
       PARSER_STREAM (parser) = skip_spaces_const (PARSER_STREAM (parser));
 
       /* Check for a keyword, they end the linespec.  */
-      keyword = NULL;
-      if (parser->keyword_ok)
-	keyword = linespec_lexer_lex_keyword (PARSER_STREAM (parser));
+      keyword = linespec_lexer_lex_keyword (PARSER_STREAM (parser));
       if (keyword != NULL)
 	{
 	  parser->lexer.current.type = LSTOKEN_KEYWORD;
 	  LS_TOKEN_KEYWORD (parser->lexer.current) = keyword;
+	  /* We do not advance the stream here intentionally:
+	     we would like lexing to stop when a keyword is seen.
+
+	     PARSER_STREAM (parser) +=  strlen (keyword);  */
+
 	  return parser->lexer.current;
 	}
 
@@ -2175,10 +2197,6 @@  parse_linespec (linespec_parser *parser, const char **argptr)
 	}
     }
 
-  /* A keyword at the start cannot be interpreted as such.
-     Consider "b thread thread 42".  */
-  parser->keyword_ok = 0;
-
   parser->lexer.saved_arg = *argptr;
   parser->lexer.stream = argptr;
 
@@ -2250,9 +2268,6 @@  parse_linespec (linespec_parser *parser, const char **argptr)
   else if (token.type != LSTOKEN_STRING && token.type != LSTOKEN_NUMBER)
     unexpected_linespec_error (parser);
 
-  /* Now we can recognize keywords.  */
-  parser->keyword_ok = 1;
-
   /* Shortcut: If the next token is not LSTOKEN_COLON, we know that
      this token cannot represent a filename.  */
   token = linespec_lexer_peek_token (parser);
diff --git a/gdb/linespec.h b/gdb/linespec.h
index 5520afa..7e66024 100644
--- a/gdb/linespec.h
+++ b/gdb/linespec.h
@@ -152,4 +152,8 @@  extern struct symtabs_and_lines decode_line_with_current_source (char *, int);
 
 extern struct symtabs_and_lines decode_line_with_last_displayed (char *, int);
 
+/* Does P represent one of the keywords?  If so, return
+   the keyword.  If not, return NULL.  */
+
+extern const char *linespec_lexer_lex_keyword (const char *p);
 #endif /* defined (LINESPEC_H) */
diff --git a/gdb/testsuite/gdb.linespec/keywords.c b/gdb/testsuite/gdb.linespec/keywords.c
new file mode 100644
index 0000000..de2db63
--- /dev/null
+++ b/gdb/testsuite/gdb.linespec/keywords.c
@@ -0,0 +1,36 @@ 
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2015 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+static int
+task (int task)
+{
+  return task - 1;
+}
+
+static int
+thread (int thread)
+{
+  return task (thread) + 1;
+}
+
+int
+main (void)
+{
+  int x = 0;
+  x += thread (0);
+  return x;
+}
diff --git a/gdb/testsuite/gdb.linespec/keywords.exp b/gdb/testsuite/gdb.linespec/keywords.exp
new file mode 100644
index 0000000..312fc92
--- /dev/null
+++ b/gdb/testsuite/gdb.linespec/keywords.exp
@@ -0,0 +1,77 @@ 
+# Copyright 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test keyword parsing in the linespec parser.
+
+standard_testfile
+set exefile $testfile
+
+if {[prepare_for_testing $testfile $exefile $srcfile {debug}]} {
+    return -1
+}
+
+if ![runto_main] {
+    fail "Can't run to main"
+    return 0
+}
+
+# Turn off pending breakpoints to facilitate testing errors.
+gdb_test_no_output "set breakpoint pending off"
+
+# The linespec lexer ignores the language setting when lexing
+# keywords.
+gdb_test "break if" "Function \"if\" not defined."
+gdb_breakpoint "thread" "message"
+gdb_breakpoint "task" "message"
+
+# The lexer should prune any trailing whitesapce, so the expected
+# outcome of the following tests should be the same as the previous
+# tests.
+with_test_prefix "trailing whitespace" {
+    gdb_test "break if " "Function \"if\" not defined."
+    gdb_breakpoint "thread " "message"
+    gdb_breakpoint "task " "message"
+}
+
+# With a single keyword specified first in the location,
+# we assume we have a NULL location, i.e., the actual location
+# of the event is the current default location.
+#
+# break if XX --> okay if XX is a valid expression
+# (the lexer cannot know whether the expression is valid or not)
+# break {thread,task} NUMBER --> invalid thread/task
+# break {thread,task} STUFF --> "junk" after keyword (STUFF is not numeric)
+gdb_test "break thread 123" "Unknown thread 123\."
+gdb_test "break thread foo" "Junk after thread keyword\."
+gdb_test "break task 123" "Unknown task 123\."
+gdb_test "break task foo" "Junk after task keyword\."
+gdb_breakpoint "thread if 0" "message"
+
+# These are also NULL locations, but using a subsequent keyword
+# as the "junk".
+gdb_test "break thread thread" "Junk after thread keyword\."
+gdb_test "break thread task" "Junk after thread keyword\."
+gdb_test "break thread if" "Junk after thread keyword\."
+gdb_test "break task task" "Junk after task keyword\."
+gdb_test "break task thread" "Junk after task keyword\."
+gdb_test "break task if" "Junk after task keyword\."
+
+# Test locations containing keyword followed by keyword.
+gdb_test "break thread thread 123" "Unknown thread 123\."
+gdb_test "break task task 123" "Unknown task 123\."
+
+# Test NULL location with valid conditional containing a keyword.
+gdb_breakpoint "thread if thread == 0"
+gdb_breakpoint "task if task == 0"