From patchwork Fri Dec 8 09:19:24 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arnold Robbins X-Patchwork-Id: 24803 Received: (qmail 97636 invoked by alias); 8 Dec 2017 09:21:21 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 96916 invoked by uid 89); 8 Dec 2017 09:20:31 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.5 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, MANY_HDRS_LCASE, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.2 spammy=hanno, HContent-type:text, ere, Hx-spam-relays-external:ESMTPA X-HELO: mxout4.netvision.net.il MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII From: Arnold Robbins Message-id: <201712080919.vB89JOMM005649@skeeve.com> Date: Fri, 08 Dec 2017 11:19:24 +0200 To: carlos@redhat.com, libc-alpha@sourceware.org Subject: [PATCH 15/17] Regex: Diagnose ERE "()|\1". User-Agent: Heirloom mailx 12.5 6/20/10 This patch fixes a bug related to diagnosing ERE '()|\1'. See http://bugs.gnu.org/21513. I pulled this in from GNULIB. 2017-11-30 Paul Eggert Diagnose ERE '()|\1' Problem reported by Hanno Boeck in: http://bugs.gnu.org/21513 * posix/regcomp.c (parse_reg_exp): While parsing alternatives, keep track of the set of previously-completed subexpressions available before the first alternative, and restore this set just before parsing each subsequent alternative. This lets us diagnose the invalid back-reference in the ERE '()|\1'. diff --git a/posix/regcomp.c b/posix/regcomp.c index 8920cf1..e63c258 100644 --- a/posix/regcomp.c +++ b/posix/regcomp.c @@ -2157,6 +2157,7 @@ parse_reg_exp (re_string_t *regexp, regex_t *preg, re_token_t *token, { re_dfa_t *dfa = (re_dfa_t *) preg->buffer; bin_tree_t *tree, *branch = NULL; + bitset_word_t initial_bkref_map = dfa->completed_bkref_map; tree = parse_branch (regexp, preg, token, syntax, nest, err); if (BE (*err != REG_NOERROR && tree == NULL, 0)) return NULL; @@ -2167,6 +2168,8 @@ parse_reg_exp (re_string_t *regexp, regex_t *preg, re_token_t *token, if (token->type != OP_ALT && token->type != END_OF_RE && (nest == 0 || token->type != OP_CLOSE_SUBEXP)) { + bitset_word_t accumulated_bkref_map = dfa->completed_bkref_map; + dfa->completed_bkref_map = initial_bkref_map; branch = parse_branch (regexp, preg, token, syntax, nest, err); if (BE (*err != REG_NOERROR && branch == NULL, 0)) { @@ -2174,6 +2177,7 @@ parse_reg_exp (re_string_t *regexp, regex_t *preg, re_token_t *token, postorder (tree, free_tree, NULL); return NULL; } + dfa->completed_bkref_map |= accumulated_bkref_map; } else branch = NULL;