From: Florian Weimer
To: Andreas Schwab
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH] wcsmbs: Avoid escaped character literals in
Date: Mon, 17 Feb 2020 13:47:22 +0100

* Andreas Schwab:

> On Feb 17 2020, Florian Weimer wrote:
>
>> They confuse scripts/conformtest.py because it treats the L and the
>
> s/scripts/conform/
>
>> x7f as namespace-violating identifiers.
>
> Can the script be fixed not to do that?

Like this?

A more elaborate alternative would be to use Zack's C tokenizer in the
conform tests, but I don't know whether its feature set matches what
the conform tests need.

Subject: Add wide and character literal support to conform/conformtest.py

Without this change, tokens such as L'\x7f' are recognized as the
identifiers L and x7f, which are not in the implementation namespace
and therefore trigger failures.
diff --git a/conform/conformtest.py b/conform/conformtest.py
index 951e3b2420..3bdc2a8e57 100644
--- a/conform/conformtest.py
+++ b/conform/conformtest.py
@@ -638,7 +638,7 @@ class HeaderTests(object):
                 # constants, and hex floats may be wrongly split into
                 # tokens including identifiers, but this is sufficient
                 # in practice and matches the old perl script.
-                line = re.sub(r'"[^"]*"', '', line)
+                line = re.sub(r'(?:\bL)?(?:"[^"]*"|\'[^\']*\')', '', line)
                 line = line.strip()
                 for token in re.split(r'[^A-Za-z0-9_]+', line):
                     if re.match(r'[A-Za-z_]', token):
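To see the effect of the changed regular expression in isolation, here is a minimal sketch that replays the three tokenizing lines from the hunk on one preprocessed line, once with the old substitution and once with the new one. The helper names (identifiers, old_strip, new_strip) and the sample line are hypothetical and not part of conformtest.py; only the two regular expressions and the split/match logic are taken from the diff.

```python
import re


def old_strip(line):
    # Regex before the patch: removes only narrow string literals "...".
    return re.sub(r'"[^"]*"', '', line)


def new_strip(line):
    # Regex after the patch: removes string and character literals,
    # each with an optional L prefix (wide literals).
    return re.sub(r'(?:\bL)?(?:"[^"]*"|\'[^\']*\')', '', line)


def identifiers(line, strip_literals):
    # Hypothetical helper mirroring the loop in the hunk: strip literals,
    # split on non-identifier characters, keep tokens that start like
    # a C identifier.
    line = strip_literals(line)
    line = line.strip()
    return [token for token in re.split(r'[^A-Za-z0-9_]+', line)
            if re.match(r'[A-Za-z_]', token)]


# Invented sample line: c = L'\x7f'; s = L"ab";
sample = "c = L'\\x7f'; s = L\"ab\";"

print(identifiers(sample, old_strip))  # ['c', 'L', 'x7f', 's', 'L']
print(identifiers(sample, new_strip))  # ['c', 's']
```

With the old expression, the character constant survives and is split into the bogus identifiers L and x7f, and the L prefix of the wide string is left behind too. The `\b` in front of the optional `L` keeps the substitution from eating the last letter of a real identifier such as `XL` that happens to precede a quote. As far as this regex goes, only the `L` prefix is recognized, so a literal written with another prefix (for example `u'x'`) would still leave the prefix behind as a token.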