From patchwork Thu Jun 30 12:52:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Tom Honermann X-Patchwork-Id: 55060 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D9248384147B for ; Thu, 30 Jun 2022 12:52:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D9248384147B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1656593575; bh=uTPy5AifiAmDOS3kVHEGLHwBLHhLQ6+zPeuScnArjmM=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=rrbcvSCxPlOVpXtu3otFb7gE4FkZDUyB+nr0Q/LAq0pzIlXuzkCaLlx0x8cehliRG n93q3iobxj6MhSOSPKuo4tyk9I2kB6SJzjY6TbqA1BTKnGVuEZJ0IWnN8D4Zf6e6KO GGAgAIHDwejRyLAShzmOMqxBIoaUDLcvxky5Xqz0= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from smtp113.iad3b.emailsrvr.com (smtp113.iad3b.emailsrvr.com [146.20.161.113]) by sourceware.org (Postfix) with ESMTPS id 0BBD4386F0D2 for ; Thu, 30 Jun 2022 12:52:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0BBD4386F0D2 X-Auth-ID: tom@honermann.net Received: by smtp15.relay.iad3b.emailsrvr.com (Authenticated sender: tom-AT-honermann.net) with ESMTPSA id 9162AC00B4; Thu, 30 Jun 2022 08:52:28 -0400 (EDT) To: libc-alpha@sourceware.org Subject: [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Date: Thu, 30 Jun 2022 08:52:12 -0400 Message-Id: <20220630125215.6052-1-tom@honermann.net> X-Mailer: git-send-email 2.32.0 MIME-Version: 1.0 X-Classification-ID: bd4e2ac9-54ec-4bf3-bea9-66e0e087bd86-1-1 X-Spam-Status: No, score=-3.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom Honermann via Libc-alpha From: Tom Honermann Reply-To: Tom Honermann Cc: Tom Honermann Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" This series of patches provides the following: - A fix for BZ #25744 [1]. - Implementations of the mbrtoc8 and c8rtomb functions adopted for C++20 via WG21 P0482R6 [2] and for C2X via WG14 N2653 [3]. - A char8_t typedef as adopted for C2X via WG14 N2653 [3]. The fix for BZ #25744 [1] is included in this patch series because the tests for mbrtoc8 and c8rtomb depend on it for exercising the special case where a pair of Unicode code points is converted to/from a single double byte sequence. Such conversion cases exist for Big5-HKSCS. N2653 was adopted by WG14 for C2X and wording is present in the N2912 revision of the C2X working draft [4]. This patch series enables the new declarations in C2X mode and when _GNU_SOURCE is defined. This patch series addresses feedback provided in response to the prior submission series starting at [5]. All feedback was addressed except as noted below: - I removed the previously proposed wcsmbs/test-char8-type.c test as redundant per feedback. I prototyped extending the c++-types.data test as suggested, but that would have introduced a C++20 dependency. The test demonstrated that the char16_t and char32_t types were mapped to 'Ds' and 'Di' as expected, but char8_t got mapped to 'h' rather than 'Du'. It might be worth extending this test in the future if/when C++20 support becomes a minimal requirement. - I did not switch the iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c test to use TEST_COMPARE in favor of the existing custom error handling; the current error messages are customized to provide a more detailed explanation of the error, so removing that seemed like an arguable regression. My changes retain consistency with the existing code. I think it is reasonable to switch to TEST_COMPARE, but I think that should be done separately from this change and separately motivated. - I did not change the c8rtomb() and mbrtoc8() implementations to use thread local storage when PS is null; the implementations continue to use static storage consistent with c16rtomb(), mbrtoc16(), c32rtomb(), and mbrtoc32(). I think it is reasonable to switch all of these functions to use thread local storage, but I think such a change should be done consistently across all of them and should be separately motivated. Thank you to Joseph Myers, Carlos O'Donell, Adhemerval Zanella, and Florian Weimer for prior reviews of this patch series. Tested on Linux x86_64. [1]: Bug 25744 "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the second byte of certain double byte characters" https://sourceware.org/bugzilla/show_bug.cgi?id=25744 [2]: WG21 P0482R6 "char8_t: A type for UTF-8 characters and strings (Revision 6)" https://wg21.link/p0482r6 [3]: WG14 N2653 "char8_t: A type for UTF-8 characters and strings (Revision 1)" http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm [4]: WG14 N2912 "Programming languages — C working draft — June 8, 2022" https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf [5]: libc-alpha mailing list "[PATCH 0/3]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb()." https://sourceware.org/pipermail/libc-alpha/2022-February/136723.html