Message ID | 20220630125215.6052-1-tom@honermann.net |
---|---|
Headers |
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D9248384147B for <patchwork@sourceware.org>; Thu, 30 Jun 2022 12:52:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D9248384147B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1656593575; bh=uTPy5AifiAmDOS3kVHEGLHwBLHhLQ6+zPeuScnArjmM=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=rrbcvSCxPlOVpXtu3otFb7gE4FkZDUyB+nr0Q/LAq0pzIlXuzkCaLlx0x8cehliRG n93q3iobxj6MhSOSPKuo4tyk9I2kB6SJzjY6TbqA1BTKnGVuEZJ0IWnN8D4Zf6e6KO GGAgAIHDwejRyLAShzmOMqxBIoaUDLcvxky5Xqz0= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from smtp113.iad3b.emailsrvr.com (smtp113.iad3b.emailsrvr.com [146.20.161.113]) by sourceware.org (Postfix) with ESMTPS id 0BBD4386F0D2 for <libc-alpha@sourceware.org>; Thu, 30 Jun 2022 12:52:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0BBD4386F0D2 X-Auth-ID: tom@honermann.net Received: by smtp15.relay.iad3b.emailsrvr.com (Authenticated sender: tom-AT-honermann.net) with ESMTPSA id 9162AC00B4; Thu, 30 Jun 2022 08:52:28 -0400 (EDT) To: libc-alpha@sourceware.org Subject: [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Date: Thu, 30 Jun 2022 08:52:12 -0400 Message-Id: <20220630125215.6052-1-tom@honermann.net> X-Mailer: git-send-email 2.32.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Classification-ID: bd4e2ac9-54ec-4bf3-bea9-66e0e087bd86-1-1 X-Spam-Status: No, score=-3.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> From: Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Tom Honermann <tom@honermann.net> Cc: Tom Honermann <tom@honermann.net> Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> |
Series |
C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb()
|
|
Message
Tom Honermann
June 30, 2022, 12:52 p.m. UTC
This series of patches provides the following: - A fix for BZ #25744 [1]. - Implementations of the mbrtoc8 and c8rtomb functions adopted for C++20 via WG21 P0482R6 [2] and for C2X via WG14 N2653 [3]. - A char8_t typedef as adopted for C2X via WG14 N2653 [3]. The fix for BZ #25744 [1] is included in this patch series because the tests for mbrtoc8 and c8rtomb depend on it for exercising the special case where a pair of Unicode code points is converted to/from a single double byte sequence. Such conversion cases exist for Big5-HKSCS. N2653 was adopted by WG14 for C2X and wording is present in the N2912 revision of the C2X working draft [4]. This patch series enables the new declarations in C2X mode and when _GNU_SOURCE is defined. This patch series addresses feedback provided in response to the prior submission series starting at [5]. All feedback was addressed except as noted below: - I removed the previously proposed wcsmbs/test-char8-type.c test as redundant per feedback. I prototyped extending the c++-types.data test as suggested, but that would have introduced a C++20 dependency. The test demonstrated that the char16_t and char32_t types were mapped to 'Ds' and 'Di' as expected, but char8_t got mapped to 'h' rather than 'Du'. It might be worth extending this test in the future if/when C++20 support becomes a minimal requirement. - I did not switch the iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c test to use TEST_COMPARE in favor of the existing custom error handling; the current error messages are customized to provide a more detailed explanation of the error, so removing that seemed like an arguable regression. My changes retain consistency with the existing code. I think it is reasonable to switch to TEST_COMPARE, but I think that should be done separately from this change and separately motivated. - I did not change the c8rtomb() and mbrtoc8() implementations to use thread local storage when PS is null; the implementations continue to use static storage consistent with c16rtomb(), mbrtoc16(), c32rtomb(), and mbrtoc32(). I think it is reasonable to switch all of these functions to use thread local storage, but I think such a change should be done consistently across all of them and should be separately motivated. Thank you to Joseph Myers, Carlos O'Donell, Adhemerval Zanella, and Florian Weimer for prior reviews of this patch series. Tested on Linux x86_64. [1]: Bug 25744 "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the second byte of certain double byte characters" https://sourceware.org/bugzilla/show_bug.cgi?id=25744 [2]: WG21 P0482R6 "char8_t: A type for UTF-8 characters and strings (Revision 6)" https://wg21.link/p0482r6 [3]: WG14 N2653 "char8_t: A type for UTF-8 characters and strings (Revision 1)" http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm [4]: WG14 N2912 "Programming languages — C working draft — June 8, 2022" https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf [5]: libc-alpha mailing list "[PATCH 0/3]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb()." https://sourceware.org/pipermail/libc-alpha/2022-February/136723.html
Comments
> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote: > > This series of patches provides the following: > - A fix for BZ #25744 [1]. > - Implementations of the mbrtoc8 and c8rtomb functions adopted for > C++20 via WG21 P0482R6 [2] and for C2X via WG14 N2653 [3]. > - A char8_t typedef as adopted for C2X via WG14 N2653 [3]. > > The fix for BZ #25744 [1] is included in this patch series because the tests > for mbrtoc8 and c8rtomb depend on it for exercising the special case where a > pair of Unicode code points is converted to/from a single double byte sequence. > Such conversion cases exist for Big5-HKSCS. > > N2653 was adopted by WG14 for C2X and wording is present in the N2912 revision > of the C2X working draft [4]. This patch series enables the new declarations > in C2X mode and when _GNU_SOURCE is defined. > > This patch series addresses feedback provided in response to the prior > submission series starting at [5]. All feedback was addressed except as noted > below: > - I removed the previously proposed wcsmbs/test-char8-type.c test as redundant > per feedback. I prototyped extending the c++-types.data test as suggested, > but that would have introduced a C++20 dependency. The test demonstrated that > the char16_t and char32_t types were mapped to 'Ds' and 'Di' as expected, but > char8_t got mapped to 'h' rather than 'Du'. It might be worth extending this > test in the future if/when C++20 support becomes a minimal requirement. > - I did not switch the iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c test to use > TEST_COMPARE in favor of the existing custom error handling; the current > error messages are customized to provide a more detailed explanation of the > error, so removing that seemed like an arguable regression. My changes retain > consistency with the existing code. I think it is reasonable to switch to > TEST_COMPARE, but I think that should be done separately from this change and > separately motivated. > - I did not change the c8rtomb() and mbrtoc8() implementations to use thread > local storage when PS is null; the implementations continue to use static > storage consistent with c16rtomb(), mbrtoc16(), c32rtomb(), and mbrtoc32(). > I think it is reasonable to switch all of these functions to use thread > local storage, but I think such a change should be done consistently across > all of them and should be separately motivated. > > Thank you to Joseph Myers, Carlos O'Donell, Adhemerval Zanella, and Florian > Weimer for prior reviews of this patch series. > > Tested on Linux x86_64. > > [1]: Bug 25744 > "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the > second byte of certain double byte characters" > https://sourceware.org/bugzilla/show_bug.cgi?id=25744 > > [2]: WG21 P0482R6 > "char8_t: A type for UTF-8 characters and strings (Revision 6)" > https://wg21.link/p0482r6 > > [3]: WG14 N2653 > "char8_t: A type for UTF-8 characters and strings (Revision 1)" > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm > > [4]: WG14 N2912 > "Programming languages — C working draft — June 8, 2022" > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf > > [5]: libc-alpha mailing list > "[PATCH 0/3]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb()." > https://sourceware.org/pipermail/libc-alpha/2022-February/136723.html Hi Tom, Patches look good in general, just some really minor style issues I found in last review. I have fixed them and if you are ok I can install them, I have create a branch so you can check [1]. [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/mbrtoc8-c8rtomb
On 7/4/22 3:08 PM, Adhemerval Zanella wrote: > >> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote: >> >> This series of patches provides the following: >> - A fix for BZ #25744 [1]. >> - Implementations of the mbrtoc8 and c8rtomb functions adopted for >> C++20 via WG21 P0482R6 [2] and for C2X via WG14 N2653 [3]. >> - A char8_t typedef as adopted for C2X via WG14 N2653 [3]. >> >> The fix for BZ #25744 [1] is included in this patch series because the tests >> for mbrtoc8 and c8rtomb depend on it for exercising the special case where a >> pair of Unicode code points is converted to/from a single double byte sequence. >> Such conversion cases exist for Big5-HKSCS. >> >> N2653 was adopted by WG14 for C2X and wording is present in the N2912 revision >> of the C2X working draft [4]. This patch series enables the new declarations >> in C2X mode and when _GNU_SOURCE is defined. >> >> This patch series addresses feedback provided in response to the prior >> submission series starting at [5]. All feedback was addressed except as noted >> below: >> - I removed the previously proposed wcsmbs/test-char8-type.c test as redundant >> per feedback. I prototyped extending the c++-types.data test as suggested, >> but that would have introduced a C++20 dependency. The test demonstrated that >> the char16_t and char32_t types were mapped to 'Ds' and 'Di' as expected, but >> char8_t got mapped to 'h' rather than 'Du'. It might be worth extending this >> test in the future if/when C++20 support becomes a minimal requirement. >> - I did not switch the iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c test to use >> TEST_COMPARE in favor of the existing custom error handling; the current >> error messages are customized to provide a more detailed explanation of the >> error, so removing that seemed like an arguable regression. My changes retain >> consistency with the existing code. I think it is reasonable to switch to >> TEST_COMPARE, but I think that should be done separately from this change and >> separately motivated. >> - I did not change the c8rtomb() and mbrtoc8() implementations to use thread >> local storage when PS is null; the implementations continue to use static >> storage consistent with c16rtomb(), mbrtoc16(), c32rtomb(), and mbrtoc32(). >> I think it is reasonable to switch all of these functions to use thread >> local storage, but I think such a change should be done consistently across >> all of them and should be separately motivated. >> >> Thank you to Joseph Myers, Carlos O'Donell, Adhemerval Zanella, and Florian >> Weimer for prior reviews of this patch series. >> >> Tested on Linux x86_64. >> >> [1]: Bug 25744 >> "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the >> second byte of certain double byte characters" >> https://sourceware.org/bugzilla/show_bug.cgi?id=25744 >> >> [2]: WG21 P0482R6 >> "char8_t: A type for UTF-8 characters and strings (Revision 6)" >> https://wg21.link/p0482r6 >> >> [3]: WG14 N2653 >> "char8_t: A type for UTF-8 characters and strings (Revision 1)" >> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm >> >> [4]: WG14 N2912 >> "Programming languages — C working draft — June 8, 2022" >> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf >> >> [5]: libc-alpha mailing list >> "[PATCH 0/3]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb()." >> https://sourceware.org/pipermail/libc-alpha/2022-February/136723.html > Hi Tom, > > Patches look good in general, just some really minor style issues I > found in last review. I have fixed them and if you are ok I can > install them, I have create a branch so you can check [1]. Excellent, thank you! I verified that your branch differs from the patches I posted just by the style issues mentioned in your review and the additional removal of the (empty) sysdeps/mach/hurd/libhurduser.abilist and sysdeps/mach/libmachuser.abilist files (which I presume to be intentional). Looks good to me! I very much appreciate you making the additional style corrections! Tom. > > [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/mbrtoc8-c8rtomb >
> On 6 Jul 2022, at 00:27, Tom Honermann <tom@honermann.net> wrote: > > On 7/4/22 3:08 PM, Adhemerval Zanella wrote: >> >>> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote: >>> >>> This series of patches provides the following: >>> - A fix for BZ #25744 [1]. >>> - Implementations of the mbrtoc8 and c8rtomb functions adopted for >>> C++20 via WG21 P0482R6 [2] and for C2X via WG14 N2653 [3]. >>> - A char8_t typedef as adopted for C2X via WG14 N2653 [3]. >>> >>> The fix for BZ #25744 [1] is included in this patch series because the tests >>> for mbrtoc8 and c8rtomb depend on it for exercising the special case where a >>> pair of Unicode code points is converted to/from a single double byte sequence. >>> Such conversion cases exist for Big5-HKSCS. >>> >>> N2653 was adopted by WG14 for C2X and wording is present in the N2912 revision >>> of the C2X working draft [4]. This patch series enables the new declarations >>> in C2X mode and when _GNU_SOURCE is defined. >>> >>> This patch series addresses feedback provided in response to the prior >>> submission series starting at [5]. All feedback was addressed except as noted >>> below: >>> - I removed the previously proposed wcsmbs/test-char8-type.c test as redundant >>> per feedback. I prototyped extending the c++-types.data test as suggested, >>> but that would have introduced a C++20 dependency. The test demonstrated that >>> the char16_t and char32_t types were mapped to 'Ds' and 'Di' as expected, but >>> char8_t got mapped to 'h' rather than 'Du'. It might be worth extending this >>> test in the future if/when C++20 support becomes a minimal requirement. >>> - I did not switch the iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c test to use >>> TEST_COMPARE in favor of the existing custom error handling; the current >>> error messages are customized to provide a more detailed explanation of the >>> error, so removing that seemed like an arguable regression. My changes retain >>> consistency with the existing code. I think it is reasonable to switch to >>> TEST_COMPARE, but I think that should be done separately from this change and >>> separately motivated. >>> - I did not change the c8rtomb() and mbrtoc8() implementations to use thread >>> local storage when PS is null; the implementations continue to use static >>> storage consistent with c16rtomb(), mbrtoc16(), c32rtomb(), and mbrtoc32(). >>> I think it is reasonable to switch all of these functions to use thread >>> local storage, but I think such a change should be done consistently across >>> all of them and should be separately motivated. >>> >>> Thank you to Joseph Myers, Carlos O'Donell, Adhemerval Zanella, and Florian >>> Weimer for prior reviews of this patch series. >>> >>> Tested on Linux x86_64. >>> >>> [1]: Bug 25744 >>> "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the >>> second byte of certain double byte characters" >>> https://sourceware.org/bugzilla/show_bug.cgi?id=25744 >>> >>> [2]: WG21 P0482R6 >>> "char8_t: A type for UTF-8 characters and strings (Revision 6)" >>> https://wg21.link/p0482r6 >>> >>> [3]: WG14 N2653 >>> "char8_t: A type for UTF-8 characters and strings (Revision 1)" >>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm >>> >>> [4]: WG14 N2912 >>> "Programming languages — C working draft — June 8, 2022" >>> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf >>> >>> [5]: libc-alpha mailing list >>> "[PATCH 0/3]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb()." >>> https://sourceware.org/pipermail/libc-alpha/2022-February/136723.html >> Hi Tom, >> >> Patches look good in general, just some really minor style issues I >> found in last review. I have fixed them and if you are ok I can >> install them, I have create a branch so you can check [1]. > > Excellent, thank you! I verified that your branch differs from the patches I posted just by the style issues mentioned in your review and the additional removal of the (empty) sysdeps/mach/hurd/libhurduser.abilist and sysdeps/mach/libmachuser.abilist files (which I presume to be intentional). Looks good to me! I very much appreciate you making the additional style corrections! > > Tom. The hurd files removal is a mistake from my part (we should really fix make update-abi/check-abi on hurd...). I will fix it and install the patches. > >> >> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/mbrtoc8-c8rtomb >>