From patchwork Mon Feb 17 12:47:22 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Weimer <fw@deneb.enyo.de>
X-Patchwork-Id: 38125
Received: (qmail 86122 invoked by alias); 17 Feb 2020 12:48:59 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 86114 invoked by uid 89); 17 Feb 2020 12:48:59 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-16.4 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3,
	SPF_PASS autolearn=ham version=3.3.1 spammy=H*f:sk:mvm5zg5,
	H*i:sk:mvm5zg5
X-HELO: albireo.enyo.de
From: Florian Weimer <fw@deneb.enyo.de>
To: Andreas Schwab <schwab@suse.de>
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH] wcsmbs: Avoid escaped character literals in <wchar.h>
References: <8736b9scqm.fsf@oldenburg2.str.redhat.com>
	<mvm5zg577f2.fsf@suse.de>
Date: Mon, 17 Feb 2020 13:47:22 +0100
In-Reply-To: <mvm5zg577f2.fsf@suse.de> (Andreas Schwab's message of "Mon, 17
	Feb 2020 12:57:37 +0100")
Message-ID: <87blpxcrdx.fsf@mid.deneb.enyo.de>
MIME-Version: 1.0

* Andreas Schwab:

> On Feb 17 2020, Florian Weimer wrote:
>
>> They confuse scripts/conformtest.py because it treats the L and the
>
> s/scripts/conform/
>
>> x7f as namespace-violating identifiers.
>
> Can the script be fixed not to do that?

Like this?

A more elaborate alternative would be to use Zack's C tokenizer in the
conform tests, but I don't know if its feature set is aligned with
what we need in the conform tests.
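
To illustrate the effect of the regular expression change in the patch
below, here is a minimal standalone sketch. It is not part of the patch:
the helper name identifiers, the constants OLD_STRIP and NEW_STRIP, and
the sample macro EXAMPLE_CHAR are made up for this illustration. Only
the two patterns and the split/match logic are taken from
conform/conformtest.py.

```python
import re

# Pattern used before the patch: strips only narrow string literals.
OLD_STRIP = re.compile(r'"[^"]*"')
# Pattern from the patch: also strips character literals and an
# optional L prefix on either kind of literal.
NEW_STRIP = re.compile(r'(?:\bL)?(?:"[^"]*"|\'[^\']*\')')


def identifiers(line, strip_re):
    """Return the identifier-like tokens left after removing literals.

    Hypothetical helper mirroring the tokenization loop in the script.
    """
    line = strip_re.sub('', line).strip()
    return [token for token in re.split(r'[^A-Za-z0-9_]+', line)
            if re.match(r'[A-Za-z_]', token)]


# EXAMPLE_CHAR is an invented macro name, used only as sample input.
sample = r"#define EXAMPLE_CHAR L'\x7f'"

print(identifiers(sample, OLD_STRIP))
# ['define', 'EXAMPLE_CHAR', 'L', 'x7f']
print(identifiers(sample, NEW_STRIP))
# ['define', 'EXAMPLE_CHAR']
```

With the old pattern, L and x7f survive as separate tokens and would be
checked against the namespace rules. With the new pattern, the whole
literal is dropped before the line is split.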

Subject: Add wide and character literal support to conform/conformtest.py

Without this change, tokens such as L'x7f' are recognized as the
identifiers L and x7f, which are not in the implementation namespace
and therefore trigger failures.

diff --git a/conform/conformtest.py b/conform/conformtest.py
index 951e3b2420..3bdc2a8e57 100644
--- a/conform/conformtest.py
+++ b/conform/conformtest.py
@@ -638,7 +638,7 @@ class HeaderTests(object):
                 # constants, and hex floats may be wrongly split into
                 # tokens including identifiers, but this is sufficient
                 # in practice and matches the old perl script.
-                line = re.sub(r'"[^"]*"', '', line)
+                line = re.sub(r'(?:\bL)?(?:"[^"]*"|\'[^\']*\')', '', line)
                 line = line.strip()
                 for token in re.split(r'[^A-Za-z0-9_]+', line):
                     if re.match(r'[A-Za-z_]', token):