From patchwork Tue Aug 2 18:36:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Honermann X-Patchwork-Id: 56507 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 80A1E3857829 for ; Tue, 2 Aug 2022 18:37:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 80A1E3857829 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1659465462; bh=6fzSxEE3TrRK29Q4Ln3iDZ3jMjh/AXOKogIyX7v8v2o=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=F6sfrajtsDMcdO1KRZJdpKGy20wFAQCUmaT2cSD1hNm1eifynyoTBLURs2ddPQF1+ ft8IyngSVBMBGNZNzx+BzV44PIByn4+AGxN+aVyC/ksDmk4z0zG3OZQYRjIlKXdX13 Dj/LRr/GQMvqML6I0INcmv68bRUm2c9vcGgNez5k= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp100.ord1d.emailsrvr.com (smtp100.ord1d.emailsrvr.com [184.106.54.100]) by sourceware.org (Postfix) with ESMTPS id E3B1D3858296 for ; Tue, 2 Aug 2022 18:36:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E3B1D3858296 X-Auth-ID: tom@honermann.net Received: by smtp13.relay.ord1d.emailsrvr.com (Authenticated sender: tom-AT-honermann.net) with ESMTPSA id 1EB88C01A5; Tue, 2 Aug 2022 14:36:12 -0400 (EDT) To: gcc-patches@gcc.gnu.org Subject: [PATCH v4 1/2] C: Implement C2X N2653 char8_t and UTF-8 string literal changes Date: Tue, 2 Aug 2022 14:36:01 -0400 Message-Id: <20220802183602.1575950-2-tom@honermann.net> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220802183602.1575950-1-tom@honermann.net> References: <20220802183602.1575950-1-tom@honermann.net> MIME-Version: 1.0 X-Classification-ID: 1c8dd2ad-b7d1-42b3-917d-b50ffbed0e63-2-1 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom Honermann via Gcc-patches From: Tom Honermann Reply-To: Tom Honermann Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch implements the core language and compiler dependent library changes adopted for C2X via WG14 N2653. The changes include: - Change of type for UTF-8 string literals from array of const char to array of const char8_t (unsigned char). - A new atomic_char8_t typedef. - A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the existing __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined macro. gcc/ChangeLog: * ginclude/stdatomic.h (atomic_char8_t, ATOMIC_CHAR8_T_LOCK_FREE): New typedef and macro. gcc/c/ChangeLog: * c-parser.c (c_parser_string_literal): Use char8_t as the type of CPP_UTF8STRING when char8_t support is enabled. * c-typeck.c (digest_init): Allow initialization of an array of character type by a string literal with type array of char8_t. gcc/c-family/ChangeLog: * c-lex.c (lex_string, lex_charconst): Use char8_t as the type of CPP_UTF8CHAR and CPP_UTF8STRING when char8_t support is enabled. * c-opts.c (c_common_post_options): Set flag_char8_t if targeting C2x. gcc/testsuite/ChangeLog: * gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c: New test. * gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c: New test. * gcc.dg/c11-utf8str-type.c: New test. * gcc.dg/c17-utf8str-type.c: New test. * gcc.dg/c2x-utf8str-type.c: New test. * gcc.dg/c2x-utf8str.c: New test. * gcc.dg/gnu2x-utf8str-type.c: New test. * gcc.dg/gnu2x-utf8str.c: New test. --- gcc/c-family/c-lex.cc | 13 ++++-- gcc/c-family/c-opts.cc | 4 +- gcc/c/c-parser.cc | 16 ++++++- gcc/c/c-typeck.cc | 2 +- gcc/ginclude/stdatomic.h | 6 +++ .../atomic/c2x-stdatomic-lockfree-char8_t.c | 42 +++++++++++++++++++ .../atomic/gnu2x-stdatomic-lockfree-char8_t.c | 5 +++ gcc/testsuite/gcc.dg/c11-utf8str-type.c | 6 +++ gcc/testsuite/gcc.dg/c17-utf8str-type.c | 6 +++ gcc/testsuite/gcc.dg/c2x-utf8str-type.c | 6 +++ gcc/testsuite/gcc.dg/c2x-utf8str.c | 34 +++++++++++++++ gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c | 5 +++ gcc/testsuite/gcc.dg/gnu2x-utf8str.c | 34 +++++++++++++++ 13 files changed, 170 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c create mode 100644 gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c create mode 100644 gcc/testsuite/gcc.dg/c11-utf8str-type.c create mode 100644 gcc/testsuite/gcc.dg/c17-utf8str-type.c create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str-type.c create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str.c create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str.c diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc index 8bfa4f4024f..0b6f94e18a8 100644 --- a/gcc/c-family/c-lex.cc +++ b/gcc/c-family/c-lex.cc @@ -1352,7 +1352,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate) default: case CPP_STRING: case CPP_UTF8STRING: - value = build_string (1, ""); + if (type == CPP_UTF8STRING && flag_char8_t) + { + value = build_string (TYPE_PRECISION (char8_type_node) + / TYPE_PRECISION (char_type_node), + ""); /* char8_t is 8 bits */ + } + else + value = build_string (1, ""); break; case CPP_STRING16: value = build_string (TYPE_PRECISION (char16_type_node) @@ -1425,9 +1432,7 @@ lex_charconst (const cpp_token *token) type = char16_type_node; else if (token->type == CPP_UTF8CHAR) { - if (!c_dialect_cxx ()) - type = unsigned_char_type_node; - else if (flag_char8_t) + if (flag_char8_t) type = char8_type_node; else type = char_type_node; diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index b9f01a65ed7..108adc5caf8 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -1059,9 +1059,9 @@ c_common_post_options (const char **pfilename) if (flag_sized_deallocation == -1) flag_sized_deallocation = (cxx_dialect >= cxx14); - /* char8_t support is new in C++20. */ + /* char8_t support is implicitly enabled in C++20 and C2X. */ if (flag_char8_t == -1) - flag_char8_t = (cxx_dialect >= cxx20); + flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc2x; if (flag_extern_tls_init) { diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 92049d1a101..fa9395986de 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -7447,7 +7447,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok) default: case CPP_STRING: case CPP_UTF8STRING: - value = build_string (1, ""); + if (type == CPP_UTF8STRING && flag_char8_t) + { + value = build_string (TYPE_PRECISION (char8_type_node) + / TYPE_PRECISION (char_type_node), + ""); /* char8_t is 8 bits */ + } + else + value = build_string (1, ""); break; case CPP_STRING16: value = build_string (TYPE_PRECISION (char16_type_node) @@ -7472,9 +7479,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok) { default: case CPP_STRING: - case CPP_UTF8STRING: TREE_TYPE (value) = char_array_type_node; break; + case CPP_UTF8STRING: + if (flag_char8_t) + TREE_TYPE (value) = char8_array_type_node; + else + TREE_TYPE (value) = char_array_type_node; + break; case CPP_STRING16: TREE_TYPE (value) = char16_array_type_node; break; diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index fd0a7f81a7a..231f4e980b6 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -8045,7 +8045,7 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype, if (char_array) { - if (typ2 != char_type_node) + if (typ2 != char_type_node && typ2 != char8_type_node) incompat_string_cst = true; } else if (!comptypes (typ1, typ2)) diff --git a/gcc/ginclude/stdatomic.h b/gcc/ginclude/stdatomic.h index bfcfdf664c7..9f2475b739d 100644 --- a/gcc/ginclude/stdatomic.h +++ b/gcc/ginclude/stdatomic.h @@ -49,6 +49,9 @@ typedef _Atomic long atomic_long; typedef _Atomic unsigned long atomic_ulong; typedef _Atomic long long atomic_llong; typedef _Atomic unsigned long long atomic_ullong; +#ifdef __CHAR8_TYPE__ +typedef _Atomic __CHAR8_TYPE__ atomic_char8_t; +#endif typedef _Atomic __CHAR16_TYPE__ atomic_char16_t; typedef _Atomic __CHAR32_TYPE__ atomic_char32_t; typedef _Atomic __WCHAR_TYPE__ atomic_wchar_t; @@ -97,6 +100,9 @@ extern void atomic_signal_fence (memory_order); #define ATOMIC_BOOL_LOCK_FREE __GCC_ATOMIC_BOOL_LOCK_FREE #define ATOMIC_CHAR_LOCK_FREE __GCC_ATOMIC_CHAR_LOCK_FREE +#ifdef __GCC_ATOMIC_CHAR8_T_LOCK_FREE +#define ATOMIC_CHAR8_T_LOCK_FREE __GCC_ATOMIC_CHAR8_T_LOCK_FREE +#endif #define ATOMIC_CHAR16_T_LOCK_FREE __GCC_ATOMIC_CHAR16_T_LOCK_FREE #define ATOMIC_CHAR32_T_LOCK_FREE __GCC_ATOMIC_CHAR32_T_LOCK_FREE #define ATOMIC_WCHAR_T_LOCK_FREE __GCC_ATOMIC_WCHAR_T_LOCK_FREE diff --git a/gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c b/gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c new file mode 100644 index 00000000000..1b692f55ed0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c @@ -0,0 +1,42 @@ +/* Test atomic_is_lock_free for char8_t. */ +/* { dg-do run } */ +/* { dg-options "-std=c2x -pedantic-errors" } */ + +#include +#include + +extern void abort (void); + +_Atomic __CHAR8_TYPE__ ac8a; +atomic_char8_t ac8t; + +#define CHECK_TYPE(MACRO, V1, V2) \ + do \ + { \ + int r1 = MACRO; \ + int r2 = atomic_is_lock_free (&V1); \ + int r3 = atomic_is_lock_free (&V2); \ + if (r1 != 0 && r1 != 1 && r1 != 2) \ + abort (); \ + if (r2 != 0 && r2 != 1) \ + abort (); \ + if (r3 != 0 && r3 != 1) \ + abort (); \ + if (r1 == 2 && r2 != 1) \ + abort (); \ + if (r1 == 2 && r3 != 1) \ + abort (); \ + if (r1 == 0 && r2 != 0) \ + abort (); \ + if (r1 == 0 && r3 != 0) \ + abort (); \ + } \ + while (0) + +int +main () +{ + CHECK_TYPE (ATOMIC_CHAR8_T_LOCK_FREE, ac8a, ac8t); + + return 0; +} diff --git a/gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c b/gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c new file mode 100644 index 00000000000..27a3cfe3552 --- /dev/null +++ b/gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c @@ -0,0 +1,5 @@ +/* Test atomic_is_lock_free for char8_t with -std=gnu2x. */ +/* { dg-do run } */ +/* { dg-options "-std=gnu2x -pedantic-errors" } */ + +#include "c2x-stdatomic-lockfree-char8_t.c" diff --git a/gcc/testsuite/gcc.dg/c11-utf8str-type.c b/gcc/testsuite/gcc.dg/c11-utf8str-type.c new file mode 100644 index 00000000000..8be9abb9686 --- /dev/null +++ b/gcc/testsuite/gcc.dg/c11-utf8str-type.c @@ -0,0 +1,6 @@ +/* Test C11 UTF-8 string literal type. */ +/* { dg-do compile } */ +/* { dg-options "-std=c11" } */ + +_Static_assert (_Generic (u8"text", char*: 1, default: 2) == 1, "UTF-8 string literals have an unexpected type"); +_Static_assert (_Generic (u8"x"[0], char: 1, default: 2) == 1, "UTF-8 string literal elements have an unexpected type"); diff --git a/gcc/testsuite/gcc.dg/c17-utf8str-type.c b/gcc/testsuite/gcc.dg/c17-utf8str-type.c new file mode 100644 index 00000000000..515c6db3970 --- /dev/null +++ b/gcc/testsuite/gcc.dg/c17-utf8str-type.c @@ -0,0 +1,6 @@ +/* Test C17 UTF-8 string literal type. */ +/* { dg-do compile } */ +/* { dg-options "-std=c17" } */ + +_Static_assert (_Generic (u8"text", char*: 1, default: 2) == 1, "UTF-8 string literals have an unexpected type"); +_Static_assert (_Generic (u8"x"[0], char: 1, default: 2) == 1, "UTF-8 string literal elements have an unexpected type"); diff --git a/gcc/testsuite/gcc.dg/c2x-utf8str-type.c b/gcc/testsuite/gcc.dg/c2x-utf8str-type.c new file mode 100644 index 00000000000..ebdde97b57a --- /dev/null +++ b/gcc/testsuite/gcc.dg/c2x-utf8str-type.c @@ -0,0 +1,6 @@ +/* Test C2X UTF-8 string literal type. */ +/* { dg-do compile } */ +/* { dg-options "-std=c2x" } */ + +_Static_assert (_Generic (u8"text", unsigned char*: 1, default: 2) == 1, "UTF-8 string literals have an unexpected type"); +_Static_assert (_Generic (u8"x"[0], unsigned char: 1, default: 2) == 1, "UTF-8 string literal elements have an unexpected type"); diff --git a/gcc/testsuite/gcc.dg/c2x-utf8str.c b/gcc/testsuite/gcc.dg/c2x-utf8str.c new file mode 100644 index 00000000000..2e4c392da9f --- /dev/null +++ b/gcc/testsuite/gcc.dg/c2x-utf8str.c @@ -0,0 +1,34 @@ +/* Test initialization by UTF-8 string literal in C2X. */ +/* { dg-do compile } */ +/* { dg-require-effective-target wchar } */ +/* { dg-options "-std=c2x" } */ + +typedef __CHAR8_TYPE__ char8_t; +typedef __CHAR16_TYPE__ char16_t; +typedef __CHAR32_TYPE__ char32_t; +typedef __WCHAR_TYPE__ wchar_t; + +/* Test that char, signed char, unsigned char, and char8_t arrays can be + initialized by a UTF-8 string literal. */ +const char cbuf1[] = u8"text"; +const char cbuf2[] = { u8"text" }; +const signed char scbuf1[] = u8"text"; +const signed char scbuf2[] = { u8"text" }; +const unsigned char ucbuf1[] = u8"text"; +const unsigned char ucbuf2[] = { u8"text" }; +const char8_t c8buf1[] = u8"text"; +const char8_t c8buf2[] = { u8"text" }; + +/* Test that a diagnostic is issued for attempted initialization of + other character types by a UTF-8 string literal. */ +const char16_t c16buf1[] = u8"text"; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const char16_t c16buf2[] = { u8"text" }; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const char32_t c32buf1[] = u8"text"; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const char32_t c32buf2[] = { u8"text" }; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const wchar_t wbuf1[] = u8"text"; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const wchar_t wbuf2[] = { u8"text" }; /* { dg-error "from a string literal with type array of .unsigned char." } */ + +/* Test that char8_t arrays can be initialized by an ordinary string + literal. */ +const char8_t c8buf3[] = "text"; +const char8_t c8buf4[] = { "text" }; diff --git a/gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c b/gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c new file mode 100644 index 00000000000..efe16ffc28d --- /dev/null +++ b/gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c @@ -0,0 +1,5 @@ +/* Test C2X UTF-8 string literal type with -std=gnu2x. */ +/* { dg-do compile } */ +/* { dg-options "-std=gnu2x" } */ + +#include "c2x-utf8str-type.c" diff --git a/gcc/testsuite/gcc.dg/gnu2x-utf8str.c b/gcc/testsuite/gcc.dg/gnu2x-utf8str.c new file mode 100644 index 00000000000..f3719ea8c77 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gnu2x-utf8str.c @@ -0,0 +1,34 @@ +/* Test initialization by UTF-8 string literal in C2X with -std=gnu2x. */ +/* { dg-do compile } */ +/* { dg-require-effective-target wchar } */ +/* { dg-options "-std=gnu2x" } */ + +typedef __CHAR8_TYPE__ char8_t; +typedef __CHAR16_TYPE__ char16_t; +typedef __CHAR32_TYPE__ char32_t; +typedef __WCHAR_TYPE__ wchar_t; + +/* Test that char, signed char, unsigned char, and char8_t arrays can be + initialized by a UTF-8 string literal. */ +const char cbuf1[] = u8"text"; +const char cbuf2[] = { u8"text" }; +const signed char scbuf1[] = u8"text"; +const signed char scbuf2[] = { u8"text" }; +const unsigned char ucbuf1[] = u8"text"; +const unsigned char ucbuf2[] = { u8"text" }; +const char8_t c8buf1[] = u8"text"; +const char8_t c8buf2[] = { u8"text" }; + +/* Test that a diagnostic is issued for attempted initialization of + other character types by a UTF-8 string literal. */ +const char16_t c16buf1[] = u8"text"; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const char16_t c16buf2[] = { u8"text" }; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const char32_t c32buf1[] = u8"text"; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const char32_t c32buf2[] = { u8"text" }; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const wchar_t wbuf1[] = u8"text"; /* { dg-error "from a string literal with type array of .unsigned char." } */ +const wchar_t wbuf2[] = { u8"text" }; /* { dg-error "from a string literal with type array of .unsigned char." } */ + +/* Test that char8_t arrays can be initialized by an ordinary string + literal. */ +const char8_t c8buf3[] = "text"; +const char8_t c8buf4[] = { "text" }; From patchwork Tue Aug 2 18:36:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Honermann X-Patchwork-Id: 56508 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1282A3858031 for ; Tue, 2 Aug 2022 18:38:41 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1282A3858031 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1659465521; bh=wz64NeGnSxvwj8mQS6ZT2Tudkje5OvfdFIYz8CFGT8o=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=oGrOzIGeC+a8bpaYdd1f4zlfkGyBe/oLBpPApkfxtXWLj9M20SmCclQcCKun3T8xk vxWmzCkdHGxOjVgNCdXwNbHBwQciceuRwDXbMClE5es+wiWOhzKUbP6tk1BUPw+Dkx ZBPcQ5AF+CvFDkw1PaNwW+Kg8b8Zfefb6ZQJrkvM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp101.ord1d.emailsrvr.com (smtp101.ord1d.emailsrvr.com [184.106.54.101]) by sourceware.org (Postfix) with ESMTPS id E327038582A7 for ; Tue, 2 Aug 2022 18:36:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E327038582A7 X-Auth-ID: tom@honermann.net Received: by smtp13.relay.ord1d.emailsrvr.com (Authenticated sender: tom-AT-honermann.net) with ESMTPSA id 79444C01A0; Tue, 2 Aug 2022 14:36:14 -0400 (EDT) To: gcc-patches@gcc.gnu.org Subject: [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes. Date: Tue, 2 Aug 2022 14:36:02 -0400 Message-Id: <20220802183602.1575950-3-tom@honermann.net> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220802183602.1575950-1-tom@honermann.net> References: <20220802183602.1575950-1-tom@honermann.net> MIME-Version: 1.0 X-Classification-ID: 1c8dd2ad-b7d1-42b3-917d-b50ffbed0e63-3-1 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom Honermann via Gcc-patches From: Tom Honermann Reply-To: Tom Honermann Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch corrects handling of UTF-8 character literals in preprocessing directives so that they are treated as unsigned types in char8_t enabled C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously, UTF-8 character literals were always treated as having the same type as ordinary character literals (signed or unsigned dependent on target or use of the -fsigned-char or -funsigned char options). PR preprocessor/106426 gcc/c-family/ChangeLog: * c-opts.cc (c_common_post_options): Assign cpp_opts->unsigned_utf8char subject to -fchar8_t, -fsigned-char, and/or -funsigned-char. gcc/testsuite/ChangeLog: * g++.dg/ext/char8_t-char-literal-1.C: Check signedness of u8 literals. * g++.dg/ext/char8_t-char-literal-2.C: Check signedness of u8 literals. libcpp/ChangeLog: * charset.cc (narrow_str_to_charconst): Set signedness of CPP_UTF8CHAR literals based on unsigned_utf8char. * include/cpplib.h (cpp_options): Add unsigned_utf8char. * init.cc (cpp_create_reader): Initialize unsigned_utf8char. --- gcc/c-family/c-opts.cc | 1 + gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C | 6 +++++- gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C | 4 ++++ libcpp/charset.cc | 4 ++-- libcpp/include/cpplib.h | 4 ++-- libcpp/init.cc | 1 + 6 files changed, 15 insertions(+), 5 deletions(-) diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index 108adc5caf8..02ce1e86cdb 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -1062,6 +1062,7 @@ c_common_post_options (const char **pfilename) /* char8_t support is implicitly enabled in C++20 and C2X. */ if (flag_char8_t == -1) flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc2x; + cpp_opts->unsigned_utf8char = flag_char8_t ? 1 : cpp_opts->unsigned_char; if (flag_extern_tls_init) { diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C index 8ed85ccfdcd..2994dd38516 100644 --- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C +++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C @@ -1,6 +1,6 @@ // Test that UTF-8 character literals have type char if -fchar8_t is not enabled. // { dg-do compile } -// { dg-options "-std=c++17 -fno-char8_t" } +// { dg-options "-std=c++17 -fsigned-char -fno-char8_t" } template struct is_same @@ -10,3 +10,7 @@ template { static const bool value = true; }; static_assert(is_same::value, "Error"); + +#if u8'\0' - 1 > 0 +#error "UTF-8 character literals not signed in preprocessor" +#endif diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C index 7861736689c..db4fe70046d 100644 --- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C +++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C @@ -10,3 +10,7 @@ template { static const bool value = true; }; static_assert(is_same::value, "Error"); + +#if u8'\0' - 1 < 0 +#error "UTF-8 character literals not unsigned in preprocessor" +#endif diff --git a/libcpp/charset.cc b/libcpp/charset.cc index ca8b7cf7aa5..12e31632228 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -1960,8 +1960,8 @@ narrow_str_to_charconst (cpp_reader *pfile, cpp_string str, /* Multichar constants are of type int and therefore signed. */ if (i > 1) unsigned_p = 0; - else if (type == CPP_UTF8CHAR && !CPP_OPTION (pfile, cplusplus)) - unsigned_p = 1; + else if (type == CPP_UTF8CHAR) + unsigned_p = CPP_OPTION (pfile, unsigned_utf8char); else unsigned_p = CPP_OPTION (pfile, unsigned_char); diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h index 3eba6f74b57..f9c042db034 100644 --- a/libcpp/include/cpplib.h +++ b/libcpp/include/cpplib.h @@ -581,8 +581,8 @@ struct cpp_options ints and target wide characters, respectively. */ size_t precision, char_precision, int_precision, wchar_precision; - /* True means chars (wide chars) are unsigned. */ - bool unsigned_char, unsigned_wchar; + /* True means chars (wide chars, UTF-8 chars) are unsigned. */ + bool unsigned_char, unsigned_wchar, unsigned_utf8char; /* True if the most significant byte in a word has the lowest address in memory. */ diff --git a/libcpp/init.cc b/libcpp/init.cc index f4ab83d2145..0242da5f55c 100644 --- a/libcpp/init.cc +++ b/libcpp/init.cc @@ -231,6 +231,7 @@ cpp_create_reader (enum c_lang lang, cpp_hash_table *table, CPP_OPTION (pfile, int_precision) = CHAR_BIT * sizeof (int); CPP_OPTION (pfile, unsigned_char) = 0; CPP_OPTION (pfile, unsigned_wchar) = 1; + CPP_OPTION (pfile, unsigned_utf8char) = 1; CPP_OPTION (pfile, bytes_big_endian) = 1; /* does not matter */ /* Default to no charset conversion. */