[RFAv2,1/6] New cli-utils.h/.c function extract_info_print_args

Message ID 20180826165359.1600-2-philippe.waroquiers@skynet.be
State New, archived
Headers

Commit Message

Philippe Waroquiers Aug. 26, 2018, 4:53 p.m. UTC
  New cli-utils.h/.c function extract_info_print_args factorises
the extraction of the args '[-q] [-t TYPEREGEXP] [NAMEREGEXP]'.

This function will be used by the commands
  info [args|functions|locals|variables]

cli-utils.c has a new static function extract_arg_maybe_quoted
that extracts an argument, possibly single quoted.

gdb/ChangeLog
2018-08-26  Philippe Waroquiers  <philippe.waroquiers@skynet.be>

	* cli-utils.c (extract_arg_maybe_quoted): New function.
	(extract_info_print_args): New function.
	* cli-utils.h (extract_arg_maybe_quoted): New function.
	(extract_info_print_args): New function.
	(INFO_PRINT_ARGS_HELP): new macro.
---
 gdb/cli/cli-utils.c | 113 ++++++++++++++++++++++++++++++++++++++++++++
 gdb/cli/cli-utils.h |  20 ++++++++
 2 files changed, 133 insertions(+)
  

Comments

Tom Tromey Sept. 18, 2018, 3:23 p.m. UTC | #1
>>>>> "Philippe" == Philippe Waroquiers <philippe.waroquiers@skynet.be> writes:

Philippe> cli-utils.c has a new static function extract_arg_maybe_quoted
Philippe> that extracts an argument, possibly single quoted.

I think it would be better if this handled quotes the same way that
gdb_argv (aka libiberty's buildargv) does.  That approach seems ok-ish
(maybe not perfect), and following it would mean that at least we're not
increasing the number of ways that command lines are quoted in gdb.

I guess the reason you did not simply use gdb_argv is that the trailing
regexp is the rest of the input string.

Philippe> +/* See documentation in cli-utils.h.  */
Philippe> +
Philippe> +void
Philippe> +extract_info_print_args (const char* command,
Philippe> +			 const char **args,
Philippe> +			 bool *quiet,
Philippe> +			 std::string *regexp,
Philippe> +			 std::string *t_regexp)

I tend to think this should go somewhere else, since cli-utils is more
like "low level" parsing helpers.  But I don't really know where I
guess.

Philippe> +  if (*args != NULL && **args != '\000')
Philippe> +    *regexp = extract_arg (args);

It seems to me that this should just be "*regexp = args" for backward
compatibility.  This formulation will stop at whitespace, whereas the
previous code did not.

If changing the meaning of "info types regexp" where the regexp has
whitespace is ok, then switching fully to gdb_argv would be better.  I
don't know if it is ok to do but I suspect not.

Philippe> +/* A helper function to extract an argument from *ARG.  An argument is
Philippe> +   delimited by whitespace, but it can also be optionally quoted using
Philippe> +   single quote characters.  The return value is empty if no argument
Philippe> +   was found.  */
Philippe> +
Philippe> +static std::string
Philippe> +extract_arg_maybe_quoted (const char **arg)
Philippe> +{

One question I have is whether we could just change extract_arg to do
the dequoting.  Auditing the existing calls to see if this change would
negatively impact them might not be too hard.  And, it would be nice to
make gdb's CLI a bit more regular.

Philippe> +#define INFO_PRINT_ARGS_HELP(entity_kind) \
Philippe> +"If NAMEREGEXP is provided, only prints the " entity_kind " whose name\n\
Philippe> +matches NAMEREGEXP.\n\
Philippe> +If -t TYPEREGEXP is provided, only prints the " entity_kind " whose type\n\
Philippe> +matches TYPEREGEXP.  Note that the matching is done with the type\n\
Philippe> +printed by the 'whatis' command.\n\
Philippe> +By default, the command might produce headers and/or messages indicating\n\
Philippe> +why no " entity_kind " can be printed.\n\
Philippe> +The flag -q disables the production of these headers and messages."

Constructing help text like this is i18n-unfriendly.  gdb tries to do
this correctly, though IIRC the build isn't all wired up properly and
there aren't any actual translations :-/

Maybe it can be reworded somehow.  I'm not sure if xgettext can handle
this kind of string concatenation in combination with macro expansion,
either.

Tom
  
Philippe Waroquiers Sept. 23, 2018, 8:16 p.m. UTC | #2
Thanks for the comments, feedback below.

On Tue, 2018-09-18 at 09:23 -0600, Tom Tromey wrote:
> > > > > > "Philippe" == Philippe Waroquiers <philippe.waroquiers@skynet.be> writes:
> 
> Philippe> cli-utils.c has a new static function extract_arg_maybe_quoted
> Philippe> that extracts an argument, possibly single quoted.
> 
> I think it would be better if this handled quotes the same way that
> gdb_argv (aka libiberty's buildargv) does.  That approach seems ok-ish
> (maybe not perfect), and following it would mean that at least we're not
> increasing the number of ways that command lines are quoted in gdb.
Agreed. In RFAv3, the quoting is handled the same way as gdb_argv.

> 
> I guess the reason you did not simply use gdb_argv is that the trailing
> regexp is the rest of the input string.
Effectively. Using gdb_argv for handling the list of arguments would create
backward incompatibilities (more details below).


> 
> Philippe> +/* See documentation in cli-utils.h.  */
> Philippe> +
> Philippe> +void
> Philippe> +extract_info_print_args (const char* command,
> Philippe> +			 const char **args,
> Philippe> +			 bool *quiet,
> Philippe> +			 std::string *regexp,
> Philippe> +			 std::string *t_regexp)
> 
> I tend to think this should go somewhere else, since cli-utils is more
> like "low level" parsing helpers.  But I don't really know where I
> guess.
Yes, I found no better (existing) place, so I kept it there.
If you prefer, I can create cli-arg_utils.h for 'higher level'
arg parsing helpers if you believe this is cleaner.

> 
> Philippe> +  if (*args != NULL && **args != '\000')
> Philippe> +    *regexp = extract_arg (args);
> 
> It seems to me that this should just be "*regexp = args" for backward
> compatibility.  This formulation will stop at whitespace, whereas the
> previous code did not.
Good catch ! I have fixed the regression, and added a specific check
in info_qt.exp to verify that a space does not terminate the NAMEREGEXP.

> 
> If changing the meaning of "info types regexp" where the regexp has
> whitespace is ok, then switching fully to gdb_argv would be better.  I
> don't know if it is ok to do but I suspect not.
I looked at using gdb_argv, but concluded that using gdb_argv
would be backward incompatible for some likely use cases.
E.g. in Ada, a . can be part of a variable name,
and the behaviour of   'info variables a\.t' would change :
a lot more variables would match.
More generally, any backslashing of REGEXP special char
would be broken.
So, I did not switch fully to gdb_argv, but made
extract_arg_maybe_quoted quoting and escaping compatible
with gdb_argv.

> 
> Philippe> +/* A helper function to extract an argument from *ARG.  An argument is
> Philippe> +   delimited by whitespace, but it can also be optionally quoted using
> Philippe> +   single quote characters.  The return value is empty if no argument
> Philippe> +   was found.  */
> Philippe> +
> Philippe> +static std::string
> Philippe> +extract_arg_maybe_quoted (const char **arg)
> Philippe> +{
> 
> One question I have is whether we could just change extract_arg to do
> the dequoting.  Auditing the existing calls to see if this change would
> negatively impact them might not be too hard.  And, it would be nice to
> make gdb's CLI a bit more regular.
I examined the calls to extract_arg : I found one possible unlikely
backward incompatibility related to single quote handling :
the mem command accepts expressions for addresses, and expressions
in Ada can contain single quotes.
So, it looks possible to have extract_arg behaviour replaced by
extract_arg_maybe_quoted behaviour.
If you agree with the above analysis, I will work on that in a separate
patch series.

> 
> Philippe> +#define INFO_PRINT_ARGS_HELP(entity_kind) \
> Philippe> +"If NAMEREGEXP is provided, only prints the " entity_kind " whose name\n\
> Philippe> +matches NAMEREGEXP.\n\
> Philippe> +If -t TYPEREGEXP is provided, only prints the " entity_kind " whose type\n\
> Philippe> +matches TYPEREGEXP.  Note that the matching is done with the type\n\
> Philippe> +printed by the 'whatis' command.\n\
> Philippe> +By default, the command might produce headers and/or messages indicating\n\
> Philippe> +why no " entity_kind " can be printed.\n\
> Philippe> +The flag -q disables the production of these headers and messages."
> 
> Constructing help text like this is i18n-unfriendly.  gdb tries to do
> this correctly, though IIRC the build isn't all wired up properly and
> there aren't any actual translations :-/
> 
> Maybe it can be reworded somehow.  I'm not sure if xgettext can handle
> this kind of string concatenation in combination with macro expansion,
> either.

I have replaced the macro and string concatenation by a function,
so that all string constants use the _("...") layout.

Thanks

Philippe
  
Tom Tromey Sept. 24, 2018, 1:07 p.m. UTC | #3
>>>>> "Philippe" == Philippe Waroquiers <philippe.waroquiers@skynet.be> writes:

Tom> I tend to think this should go somewhere else, since cli-utils is more
Tom> like "low level" parsing helpers.  But I don't really know where I
Tom> guess.

Philippe> Yes, I found no better (existing) place, so I kept it there.
Philippe> If you prefer, I can create cli-arg_utils.h for 'higher level'
Philippe> arg parsing helpers if you believe this is cleaner.

I think your current approach is fine.

Philippe> I looked at using gdb_argv, but concluded that using gdb_argv
Philippe> would be backward incompatible for some likely use cases.
[...]
Philippe> So, I did not switch fully to gdb_argv, but made
Philippe> extract_arg_maybe_quoted quoting and escaping compatible
Philippe> with gdb_argv.

Thank you for looking into this.

Tom> One question I have is whether we could just change extract_arg to do
Tom> the dequoting.  Auditing the existing calls to see if this change would
Tom> negatively impact them might not be too hard.  And, it would be nice to
Tom> make gdb's CLI a bit more regular.

Philippe> I examined the calls to extract_arg : I found one possible unlikely
Philippe> backward incompatibility related to single quote handling :
Philippe> the mem command accepts expressions for addresses, and expressions
Philippe> in Ada can contain single quotes.

C-ish languages use single quotes for char constants, so this would be
pretty bad there too.

The "mem" command got the parsing wrong.  The incorrectness dates back
to when it was introduced in 2001.  For example (today's code but it
hasn't substantially changed):

  std::string tok = extract_arg (&args);
  if (tok == "")
    error (_("no lo address"));
  lo = parse_and_eval_address (tok.c_str ());

This is user-hostile for non-trivial expressions, because it forces you
to write them without spaces.

Instead code like this should use parse_to_comma_and_eval.

Maybe it is too late to fix the "mem" command.  I am not sure.  I do
know that back when I approved a similar change to "disasssemble" (I
thought I'd written that!  But the history shows not), there were
complaints -- we broke people's workarounds for the bad parsing
behavior.  On the other hand, "disasssemble" is probably used a lot more
than "mem".

Philippe> So, it looks possible to have extract_arg behaviour replaced by
Philippe> extract_arg_maybe_quoted behaviour.
Philippe> If you agree with the above analysis, I will work on that in a separate
Philippe> patch series.

One idea might be to upgrade the calls where it seems reasonable and
then leave the legacy behavior for the ones where it is not... perhaps
at the end, renaming extract_arg so that the name makes it clear that it
shouldn't be used in new code.

What do you think of that?  It's not necessary for you to do all the
work involved.

Tom
  
Philippe Waroquiers Sept. 25, 2018, 4:37 a.m. UTC | #4
On Mon, 2018-09-24 at 07:07 -0600, Tom Tromey wrote:
> Maybe it is too late to fix the "mem" command.  I am not sure.  I do
> know that back when I approved a similar change to "disasssemble" (I
> thought I'd written that!  But the history shows not), there were
> complaints -- we broke people's workarounds for the bad parsing
> behavior.  On the other hand, "disasssemble" is probably used a lot more
> than "mem".
> 
> Philippe> So, it looks possible to have extract_arg behaviour replaced by
> Philippe> extract_arg_maybe_quoted behaviour.
> Philippe> If you agree with the above analysis, I will work on that in a separate
> Philippe> patch series.
> 
> One idea might be to upgrade the calls where it seems reasonable and
> then leave the legacy behavior for the ones where it is not... perhaps
> at the end, renaming extract_arg so that the name makes it clear that it
> shouldn't be used in new code.
> 
> What do you think of that?  It's not necessary for you to do all the
> work involved.
Yes, that sounds a reasonable approach.

No problem for me to work on that, and give a proposal
'change to new behavior' or 'keep legacy behavior'  for each
extract_arg call (it might take a week or 2, as gdb is
an evening/week-end activity).

Once the analysis of which calls are reasonable to change is done,
the code
changes should be pretty small.

Philippe
  

Patch

diff --git a/gdb/cli/cli-utils.c b/gdb/cli/cli-utils.c
index 98b7414991..7adbab46a1 100644
--- a/gdb/cli/cli-utils.c
+++ b/gdb/cli/cli-utils.c
@@ -23,6 +23,9 @@ 
 
 #include <ctype.h>
 
+static std::string
+extract_arg_maybe_quoted (const char **arg);
+
 /* See documentation in cli-utils.h.  */
 
 int
@@ -125,6 +128,56 @@  get_number (char **pp)
   return result;
 }
 
+/* See documentation in cli-utils.h.  */
+
+void
+extract_info_print_args (const char* command,
+			 const char **args,
+			 bool *quiet,
+			 std::string *regexp,
+			 std::string *t_regexp)
+{
+  *quiet = false;
+  *regexp = "";
+  *t_regexp = "";
+
+  while (*args != NULL)
+    {
+      if (check_for_argument (args, "--", 2))
+	{
+	  *args = skip_spaces (*args);
+	  break;
+	}
+
+      if (check_for_argument (args, "-t", 2))
+	{
+	  *t_regexp = extract_arg_maybe_quoted (args);
+	  *args = skip_spaces (*args);
+	  continue;
+	}
+
+      if (check_for_argument (args, "-q", 2))
+	{
+	  *quiet = true;
+	  *args = skip_spaces (*args);
+	  continue;
+	}
+
+      if (**args != '-')
+	break;
+
+      std::string option = extract_arg (args);
+      error (_("Unrecognized option '%s' to %s command.  "
+	       "Try \"help %s\"."), option.c_str (),
+	     command, command);
+    }
+
+  if (*args != NULL && **args != '\000')
+    *regexp = extract_arg (args);
+
+}
+
+
 /* See documentation in cli-utils.h.  */
 
 number_or_range_parser::number_or_range_parser (const char *string)
@@ -282,6 +335,66 @@  remove_trailing_whitespace (const char *start, const char *s)
   return s;
 }
 
+/* A helper function to extract an argument from *ARG.  An argument is
+   delimited by whitespace, but it can also be optionally quoted using
+   single quote characters.  The return value is empty if no argument
+   was found.  */
+
+static std::string
+extract_arg_maybe_quoted (const char **arg)
+{
+  if (!*arg)
+    return std::string ();
+
+  /* Find the start of the argument.  */
+  *arg = skip_spaces (*arg);
+  if (!**arg)
+    return std::string ();
+
+  if (**arg == '\'')
+    {
+      /* Quoted argument.  */
+      const char *orig_arg = *arg;
+      std::string result;
+      const char *p;
+
+      (*arg)++; /* Skip starting quote.  */
+
+      /* Find the end of the quoted argument.  */
+      for (p = *arg; *p != '\0'; p++)
+	{
+	  if (*p == '\'')
+	    {
+	      (*arg)++; /* Skip ending quote.  */
+	      return result;
+	    }
+
+	  if (*p == '\\' && p[1] == '\'')
+	    {
+	      /* Escaped quote.  */
+	      p++;      /* Skip escape char.  */
+	      (*arg)++; /* Skip escape char.  */
+	    }
+
+	  result = result + *p;
+	  (*arg)++;
+	}
+      error (_("Unterminated quoted argument in %s"), orig_arg);
+    }
+  else
+    {
+      const char *result = *arg;
+
+      /* Find the end of the argument.  */
+      *arg = skip_to_space (*arg + 1);
+
+      if (result == *arg)
+	return std::string ();
+
+      return std::string (result, *arg - result);
+    }
+}
+
 /* See documentation in cli-utils.h.  */
 
 std::string
diff --git a/gdb/cli/cli-utils.h b/gdb/cli/cli-utils.h
index fa7d02d719..602a36314d 100644
--- a/gdb/cli/cli-utils.h
+++ b/gdb/cli/cli-utils.h
@@ -39,6 +39,26 @@  extern int get_number (const char **);
 
 extern int get_number (char **);
 
+/* Extract from args the arguments [-q] [-t TYPEREGEXP] [--] NAMEREGEXP.
+   Only the above flags are accepted.  Any other flag will raise an error.
+   COMMAND is used to produce the error message if an invalid flag is
+   given.  */
+extern void extract_info_print_args (const char* command,
+				     const char **args,
+				     bool *quiet,
+				     std::string *regexp,
+				     std::string *t_regexp);
+
+#define INFO_PRINT_ARGS_HELP(entity_kind) \
+"If NAMEREGEXP is provided, only prints the " entity_kind " whose name\n\
+matches NAMEREGEXP.\n\
+If -t TYPEREGEXP is provided, only prints the " entity_kind " whose type\n\
+matches TYPEREGEXP.  Note that the matching is done with the type\n\
+printed by the 'whatis' command.\n\
+By default, the command might produce headers and/or messages indicating\n\
+why no " entity_kind " can be printed.\n\
+The flag -q disables the production of these headers and messages."
+
 /* Parse a number or a range.
    A number will be of the form handled by get_number.
    A range will be of the form <number1> - <number2>, and