Message ID | 20170602185638.64317b83@pinnacle.lan |
---|---|
State | New, archived |
Headers |
Received: (qmail 17817 invoked by alias); 3 Jun 2017 01:56:43 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <gdb-patches.sourceware.org> List-Unsubscribe: <mailto:gdb-patches-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/gdb-patches/> List-Post: <mailto:gdb-patches@sourceware.org> List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: gdb-patches-owner@sourceware.org Delivered-To: mailing list gdb-patches@sourceware.org Received: (qmail 17498 invoked by uid 89); 3 Jun 2017 01:56:40 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-22.9 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_NONE, RCVD_IN_SORBS_SPAM autolearn=ham version=3.3.2 spammy=H*F:U*kevin, wish, research X-HELO: fed1rmfepo101.cox.net Received: from fed1rmfepo101.cox.net (HELO fed1rmfepo101.cox.net) (68.230.241.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 03 Jun 2017 01:56:38 +0000 Received: from fed1rmimpo306.cox.net ([68.230.241.174]) by fed1rmfepo101.cox.net (InterMail vM.8.01.05.28 201-2260-151-171-20160122) with ESMTP id <20170603015641.EXPF4024.fed1rmfepo101.cox.net@fed1rmimpo306.cox.net> for <gdb-patches@sourceware.org>; Fri, 2 Jun 2017 21:56:41 -0400 Received: from pinnacle.lan ([70.176.31.165]) by fed1rmimpo306.cox.net with cox id U1wf1v00F3ZlYcN011wgAT; Fri, 02 Jun 2017 21:56:41 -0400 X-CT-Class: Clean X-CT-Score: 0.00 X-CT-Spam: 0 X-Authority-Analysis: v=2.1 cv=Wt6LSYrv c=1 sm=1 tr=0 a=ZvOrSFgHO2d7oiPvmX5V7g==:117 a=ZvOrSFgHO2d7oiPvmX5V7g==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=uQR19MJU64YfV5wl3MAA:9 a=TuPnCM5bd9ER9RsF:21 a=3KZCLucddxe1E8CZ:21 a=CjuIK1q_8ugA:10 X-CM-Score: 0.00 Authentication-Results: cox.net; auth=pass (LOGIN) smtp.auth=buettner2@cox.net Date: Fri, 2 Jun 2017 18:56:38 -0700 From: Kevin Buettner <kevin@buettner.to> To: gdb-patches@sourceware.org Subject: [PATCH] Use noncapturing subpattern/parens in gdb_test implementation Message-ID: <20170602185638.64317b83@pinnacle.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit |
Commit Message
Kevin Buettner
June 3, 2017, 1:56 a.m. UTC
This is the portion of gdb_test which performs the match against the RE (regular expression) passed to it: return [gdb_test_multiple $command $message { -re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" { if ![string match "" $message] then { pass "$message" } } In a test that I've been working on recently, I wanted to use a backreference - that's the \1 in the the RE below: gdb_test "info threads" \ {.*[\r\n]+\* +([0-9]+) +Thread[^\r\n]* do_something \(n=\1\) at.*} Put into English, I wanted to make sure that the value of n passed to do_something() is the same as the thread number shown in the "info threads" Id column. (I've structured the test case so that this *should* be the case.) It didn't work though. It turned out that ($pattern) in the RE noted above is capturing the attempted backreference. So, in this case, the backreference does not refer to ([0-9]+) as intended, but instead refers to ($pattern). This is wrong because it's not what I intended, but is also wrong because, if allowed, it could only match a string of infinite length. This problem can be fixed by using parens for a "noncapturing subpattern". The way that this is done, syntactically, is to use (?:$pattern) instead of ($pattern). My research shows that this feature has been present since tcl8.1 which was released in 1999. The current tcl version is 8.6 - at least that's what I have on my machine. It appears to me that mingw uses some subversion of tcl8.4 which will also have this feature (since 8.4 > 8.1). So it seems to me that any platform upon which we might wish to test GDB will have a version of tcl which has this feature. That being the case, my hope is that there won't be any objections to its use. When I looked at the implementation of gdb_test, I wondered whether the parens were needed at all. I've concluded that they are. In the event that $pattern is an RE which uses alternation at the top level, e.g. a|b, we need to make $pattern a subpattern (via parens) to limit the extend of the alternation. I.e, we don't want the alternation to extend to the other portions of the RE which gdb_test uses to match potential blank lines at the beginning of the pattern or the gdb prompt at the end. gdb/testsuite/ChangeLog: * gdb.exp (gdb_test): Using noncapturing parens for the $pattern subpattern. --- gdb/testsuite/lib/gdb.exp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On 2017-06-03 03:56, Kevin Buettner wrote: > This is the portion of gdb_test which performs the match against > the RE (regular expression) passed to it: > > return [gdb_test_multiple $command $message { > -re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" { > if ![string match "" $message] then { > pass "$message" > } > } > > In a test that I've been working on recently, I wanted to use > a backreference - that's the \1 in the the RE below: > > gdb_test "info threads" \ > {.*[\r\n]+\* +([0-9]+) +Thread[^\r\n]* do_something \(n=\1\) at.*} > > Put into English, I wanted to make sure that the value of n passed to > do_something() is the same as the thread number shown in the "info > threads" Id column. (I've structured the test case so that this > *should* be the case.) > > It didn't work though. It turned out that ($pattern) in the RE > noted above is capturing the attempted backreference. So, in this > case, the backreference does not refer to ([0-9]+) as intended, but > instead refers to ($pattern). This is wrong because it's not what I > intended, but is also wrong because, if allowed, it could only match a > string of infinite length. > > This problem can be fixed by using parens for a "noncapturing > subpattern". The way that this is done, syntactically, is to use > (?:$pattern) instead of ($pattern). > > My research shows that this feature has been present since tcl8.1 which > was released in 1999. > > The current tcl version is 8.6 - at least that's what I have on my > machine. It appears to me that mingw uses some subversion of tcl8.4 > which will also have this feature (since 8.4 > 8.1). > > So it seems to me that any platform upon which we might wish to test > GDB will have a version of tcl which has this feature. That being the > case, my hope is that there won't be any objections to its use. > > When I looked at the implementation of gdb_test, I wondered whether > the parens were needed at all. I've concluded that they are. In the > event that $pattern is an RE which uses alternation at the top level, > e.g. a|b, we need to make $pattern a subpattern (via parens) to limit > the extend of the alternation. I.e, we don't want the alternation to > extend to the other portions of the RE which gdb_test uses to match > potential blank lines at the beginning of the pattern or the gdb > prompt at the end. > > gdb/testsuite/ChangeLog: > > * gdb.exp (gdb_test): Using noncapturing parens for the $pattern > subpattern. > --- > gdb/testsuite/lib/gdb.exp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp > index 6633d24..bd61528 100644 > --- a/gdb/testsuite/lib/gdb.exp > +++ b/gdb/testsuite/lib/gdb.exp > @@ -1000,7 +1000,7 @@ proc gdb_test { args } { > } > > return [gdb_test_multiple $command $message { > - -re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" { > + -re "\[\r\n\]*(?:$pattern)\[\r\n\]+$gdb_prompt $" { > if ![string match "" $message] then { > pass "$message" > } The change makes sense to me. I assume you've run the testsuite and noticed no regressions? Then it's fair to assume that no test were using what those parentheses are currently capturing. Simon
Hi Simon, On Mon, 05 Jun 2017 14:07:36 +0200 Simon Marchi <simon.marchi@polymtl.ca> wrote: > On 2017-06-03 03:56, Kevin Buettner wrote: > > gdb/testsuite/ChangeLog: > > > > * gdb.exp (gdb_test): Using noncapturing parens for the $pattern > > subpattern. > > --- > > gdb/testsuite/lib/gdb.exp | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp > > index 6633d24..bd61528 100644 > > --- a/gdb/testsuite/lib/gdb.exp > > +++ b/gdb/testsuite/lib/gdb.exp > > @@ -1000,7 +1000,7 @@ proc gdb_test { args } { > > } > > > > return [gdb_test_multiple $command $message { > > - -re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" { > > + -re "\[\r\n\]*(?:$pattern)\[\r\n\]+$gdb_prompt $" { > > if ![string match "" $message] then { > > pass "$message" > > } > > The change makes sense to me. I assume you've run the testsuite and > noticed no regressions? Then it's fair to assume that no test were > using what those parentheses are currently capturing. Thanks for looking it over. I've pushed it. I did check for regressions prior to submitting the patch. I checked again earlier today prior to pushing it. None were found. I don't think it's possible for the parens that I modified in gdb_test to do any useful capturing. Any backreferences that you might try to use will be enclosed within those parens and therefore be a recursive backreference. In this case, the purpose of the parens was to make sure that everything in the pattern stays grouped together. This is important when the pattern passed to gdb_test has alternation at the top level. Kevin
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp index 6633d24..bd61528 100644 --- a/gdb/testsuite/lib/gdb.exp +++ b/gdb/testsuite/lib/gdb.exp @@ -1000,7 +1000,7 @@ proc gdb_test { args } { } return [gdb_test_multiple $command $message { - -re "\[\r\n\]*($pattern)\[\r\n\]+$gdb_prompt $" { + -re "\[\r\n\]*(?:$pattern)\[\r\n\]+$gdb_prompt $" { if ![string match "" $message] then { pass "$message" }