support C/C++ identifiers named with non-ASCII characters

From: Pedro Alves <palves@redhat.com>

  On 05/22/2018 04:48 PM, 張俊芝 wrote:

> I read through your code. If I understand it correctly, you keep all the valid ASCII characters, and treat all non-ASCII characters "as valid".
> 
> This is pretty much the same thing as I did. The only difference is that I blacklist invalid ASCII characters and you whitelist valid ASCII characters. But we both "validate" all the non-ASCII characters.

Right.  Non-7-bit/base ASCII characters must either be part of the identifier,
or invalid, but we don't need to be pedantic here, as you've also expressed
elsewhere in the thread, I believe.
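
To make the whitelist idea concrete, here is a minimal sketch of such predicates and of a name scanner built on them. It is not the code from c-support.h: the names ident_is_alpha, ident_is_alnum and scan_identifier are hypothetical, and folding '_' into the "alpha" predicate is an assumption made for brevity. Valid ASCII characters are listed explicitly, and any byte with the high bit set (a UTF-8 multi-byte sequence byte, or a non-ASCII character in an extended-ASCII charset) is accepted without further checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an "identifier start" predicate: ASCII
   letters, '_' (an assumption of this sketch), or any byte with the
   high bit set.  Taking unsigned char avoids negative values when
   plain char is signed.  */
static bool
ident_is_alpha (unsigned char ch)
{
  return ((ch >= 'a' && ch <= 'z')
          || (ch >= 'A' && ch <= 'Z')
          || ch == '_'
          || ch >= 0x80);
}

/* Like ident_is_alpha, but also accepts ASCII digits.  */
static bool
ident_is_alnum (unsigned char ch)
{
  return ident_is_alpha (ch) || (ch >= '0' && ch <= '9');
}

/* Return the length in bytes of the identifier at the start of S,
   or 0 if S does not start with an identifier character.  */
static size_t
scan_identifier (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t len = 0;

  if (!ident_is_alpha (p[0]))
    return 0;
  while (ident_is_alnum (p[len]))
    len++;
  return len;
}
```

The scanner counts bytes, not characters, so a name made of one three-byte UTF-8 character has length 3. It also makes no attempt to reject code points that the language standards do not allow in identifiers, which matches the "not pedantic" stance above.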

> 
> But I think your code seems better than mine because it updates and reuses some common code. So I think I can abandon my patch and the unfinished test case.

Alright, I'm pushing this in then, as below.  I've added your name
to the ChangeLog too.  And fixed the Copyright years to include 2018.
Sorry that I didn't say I had a patch mostly written in the PR!

Note I've filed gdb/23211 for the completion issue.  I'm not working
on it right now.  Let me know if you'd like to take a look at that one.

From b1b60145aedb8adcb0b9dcf43a5ae735c2f03b51 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 22 May 2018 17:35:38 +0100
Subject: [PATCH] Support UTF-8 identifiers in C/C++ expressions (PR gdb/22973)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Factor cp_ident_is_alpha/cp_ident_is_alnum out of
gdb/cp-name-parser.y and use them in the C/C++ expression parser too.

New test included.

gdb/ChangeLog:
2018-05-22  Pedro Alves  <palves@redhat.com>
	    張俊芝  <zjz@zjz.name>

	PR gdb/22973
	* c-exp.y: Include "c-support.h".
	(parse_number, c_parse_escape, lex_one_token): Use TOLOWER instead
	of tolower.  Use c_ident_is_alpha to scan names.
	* c-lang.c: Include "c-support.h".
	(convert_ucn, convert_octal, convert_hex, convert_escape): Use
	ISXDIGIT instead of isxdigit and ISDIGIT instead of isdigit.
	* c-support.h: New file, with bits factored out from ...
	* cp-name-parser.y: ... this file.
	Include "c-support.h".
	(cp_ident_is_alpha, cp_ident_is_alnum): Deleted, moved to
	c-support.h and renamed.
	(symbol_end, yylex): Adjust.

gdb/testsuite/ChangeLog:
2018-05-22  Pedro Alves  <palves@redhat.com>

	PR gdb/22973
	* gdb.base/utf8-identifiers.c: New file.
	* gdb.base/utf8-identifiers.exp: New file.
---
 gdb/ChangeLog                               | 17 +++++++
 gdb/testsuite/ChangeLog                     |  6 +++
 gdb/c-exp.y                                 | 27 +++++-----
 gdb/c-lang.c                                | 11 +++--
 gdb/c-support.h                             | 46 +++++++++++++++++
 gdb/cp-name-parser.y                        | 29 ++---------
 gdb/testsuite/gdb.base/utf8-identifiers.c   | 71 ++++++++++++++++++++++++++
 gdb/testsuite/gdb.base/utf8-identifiers.exp | 77 +++++++++++++++++++++++++++++
 8 files changed, 240 insertions(+), 44 deletions(-)
 create mode 100644 gdb/c-support.h
 create mode 100644 gdb/testsuite/gdb.base/utf8-identifiers.c
 create mode 100644 gdb/testsuite/gdb.base/utf8-identifiers.exp
