Use noncapturing subpattern/parens in gdb_test implementation

Message ID 20170602185638.64317b83@pinnacle.lan
State New, archived
Headers

Commit Message

Kevin Buettner June 3, 2017, 1:56 a.m. UTC
  This is the portion of gdb_test which performs the match against
the RE (regular expression) passed to it:

    return [gdb_test_multiple $command $message {
	-re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" {
	    if ![string match "" $message] then {
		pass "$message"
	    }
	}

In a test that I've been working on recently, I wanted to use
a backreference - that's the \1 in the the RE below:

gdb_test "info threads"  \
    {.*[\r\n]+\* +([0-9]+) +Thread[^\r\n]* do_something \(n=\1\) at.*}

Put into English, I wanted to make sure that the value of n passed to
do_something() is the same as the thread number shown in the "info
threads" Id column.  (I've structured the test case so that this
*should* be the case.)

It didn't work though.  It turned out that ($pattern) in the RE
noted above is capturing the attempted backreference.  So, in this
case, the backreference does not refer to ([0-9]+) as intended, but
instead refers to ($pattern).  This is wrong because it's not what I
intended, but is also wrong because, if allowed, it could only match a
string of infinite length.

This problem can be fixed by using parens for a "noncapturing
subpattern".  The way that this is done, syntactically, is to use
(?:$pattern) instead of ($pattern).

My research shows that this feature has been present since tcl8.1 which
was released in 1999.

The current tcl version is 8.6 - at least that's what I have on my
machine.  It appears to me that mingw uses some subversion of tcl8.4
which will also have this feature (since 8.4 > 8.1).

So it seems to me that any platform upon which we might wish to test
GDB will have a version of tcl which has this feature.  That being the
case, my hope is that there won't be any objections to its use.

When I looked at the implementation of gdb_test, I wondered whether
the parens were needed at all.  I've concluded that they are.  In the
event that $pattern is an RE which uses alternation at the top level,
e.g. a|b, we need to make $pattern a subpattern (via parens) to limit
the extend of the alternation.  I.e, we don't want the alternation to
extend to the other portions of the RE which gdb_test uses to match
potential blank lines at the beginning of the pattern or the gdb
prompt at the end.

gdb/testsuite/ChangeLog:
    
    	* gdb.exp (gdb_test): Using noncapturing parens for the $pattern
    	subpattern.
---
 gdb/testsuite/lib/gdb.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Simon Marchi June 5, 2017, 12:07 p.m. UTC | #1
On 2017-06-03 03:56, Kevin Buettner wrote:
> This is the portion of gdb_test which performs the match against
> the RE (regular expression) passed to it:
> 
>     return [gdb_test_multiple $command $message {
> 	-re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" {
> 	    if ![string match "" $message] then {
> 		pass "$message"
> 	    }
> 	}
> 
> In a test that I've been working on recently, I wanted to use
> a backreference - that's the \1 in the the RE below:
> 
> gdb_test "info threads"  \
>     {.*[\r\n]+\* +([0-9]+) +Thread[^\r\n]* do_something \(n=\1\) at.*}
> 
> Put into English, I wanted to make sure that the value of n passed to
> do_something() is the same as the thread number shown in the "info
> threads" Id column.  (I've structured the test case so that this
> *should* be the case.)
> 
> It didn't work though.  It turned out that ($pattern) in the RE
> noted above is capturing the attempted backreference.  So, in this
> case, the backreference does not refer to ([0-9]+) as intended, but
> instead refers to ($pattern).  This is wrong because it's not what I
> intended, but is also wrong because, if allowed, it could only match a
> string of infinite length.
> 
> This problem can be fixed by using parens for a "noncapturing
> subpattern".  The way that this is done, syntactically, is to use
> (?:$pattern) instead of ($pattern).
> 
> My research shows that this feature has been present since tcl8.1 which
> was released in 1999.
> 
> The current tcl version is 8.6 - at least that's what I have on my
> machine.  It appears to me that mingw uses some subversion of tcl8.4
> which will also have this feature (since 8.4 > 8.1).
> 
> So it seems to me that any platform upon which we might wish to test
> GDB will have a version of tcl which has this feature.  That being the
> case, my hope is that there won't be any objections to its use.
> 
> When I looked at the implementation of gdb_test, I wondered whether
> the parens were needed at all.  I've concluded that they are.  In the
> event that $pattern is an RE which uses alternation at the top level,
> e.g. a|b, we need to make $pattern a subpattern (via parens) to limit
> the extend of the alternation.  I.e, we don't want the alternation to
> extend to the other portions of the RE which gdb_test uses to match
> potential blank lines at the beginning of the pattern or the gdb
> prompt at the end.
> 
> gdb/testsuite/ChangeLog:
> 
>     	* gdb.exp (gdb_test): Using noncapturing parens for the $pattern
>     	subpattern.
> ---
>  gdb/testsuite/lib/gdb.exp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
> index 6633d24..bd61528 100644
> --- a/gdb/testsuite/lib/gdb.exp
> +++ b/gdb/testsuite/lib/gdb.exp
> @@ -1000,7 +1000,7 @@ proc gdb_test { args } {
>      }
> 
>      return [gdb_test_multiple $command $message {
> -	-re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" {
> +	-re "\[\r\n\]*(?:$pattern)\[\r\n\]+$gdb_prompt $" {
>  	    if ![string match "" $message] then {
>  		pass "$message"
>              }

The change makes sense to me.  I assume you've run the testsuite and 
noticed no regressions?  Then it's fair to assume that no test were 
using what those parentheses are currently capturing.

Simon
  
Kevin Buettner June 21, 2017, 9:58 p.m. UTC | #2
Hi Simon,

On Mon, 05 Jun 2017 14:07:36 +0200
Simon Marchi <simon.marchi@polymtl.ca> wrote:

> On 2017-06-03 03:56, Kevin Buettner wrote:

> > gdb/testsuite/ChangeLog:
> > 
> >     	* gdb.exp (gdb_test): Using noncapturing parens for the $pattern
> >     	subpattern.
> > ---
> >  gdb/testsuite/lib/gdb.exp | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
> > index 6633d24..bd61528 100644
> > --- a/gdb/testsuite/lib/gdb.exp
> > +++ b/gdb/testsuite/lib/gdb.exp
> > @@ -1000,7 +1000,7 @@ proc gdb_test { args } {
> >      }
> > 
> >      return [gdb_test_multiple $command $message {
> > -	-re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" {
> > +	-re "\[\r\n\]*(?:$pattern)\[\r\n\]+$gdb_prompt $" {
> >  	    if ![string match "" $message] then {
> >  		pass "$message"
> >              }  
> 
> The change makes sense to me.  I assume you've run the testsuite and 
> noticed no regressions?  Then it's fair to assume that no test were 
> using what those parentheses are currently capturing.

Thanks for looking it over.  I've pushed it.

I did check for regressions prior to submitting the patch.  I checked
again earlier today prior to pushing it.  None were found.

I don't think it's possible for the parens that I modified in gdb_test
to do any useful capturing.  Any backreferences that you might try
to use will be enclosed within those parens and therefore be a
recursive backreference.  In this case, the purpose of the parens
was to make sure that everything in the pattern stays grouped together.
This is important when the pattern passed to gdb_test has alternation
at the top level.

Kevin
  

Patch

diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index 6633d24..bd61528 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -1000,7 +1000,7 @@  proc gdb_test { args } {
     }
 
     return [gdb_test_multiple $command $message {
-	-re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" {
+	-re "\[\r\n\]*(?:$pattern)\[\r\n\]+$gdb_prompt $" {
 	    if ![string match "" $message] then {
 		pass "$message"
             }