From patchwork Sat Jan 8 00:39:11 2022
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 49743
To: libc-alpha
Subject: [PATCH 2/3]: C++20 P0482R6 and C2X N2653: Implement mbrtoc8, c8rtomb, char8_t
Date: Fri, 7 Jan 2022 19:39:11 -0500
From: Tom Honermann

This patch provides implementations of the mbrtoc8 and c8rtomb functions
adopted for C++20 via WG21 P0482R6 [1] and proposed for C2X via WG14
N2653 [2]. It also provides the char8_t typedef from WG14 N2653 [2].

The mbrtoc8 and c8rtomb functions are declared in uchar.h if either the
C++20 __cpp_char8_t feature test macro or the _GNU_SOURCE feature test
macro is defined. The char8_t typedef is declared in uchar.h if _GNU_SOURCE
is defined and __cpp_char8_t is not defined (if __cpp_char8_t is defined,
then char8_t is a builtin type).

Tested on Linux x86_64.

Tom.

[1]: WG21 P0482R6
     "char8_t: A type for UTF-8 characters and strings (Revision 6)"
     https://wg21.link/p0482r6
[2]: WG14 N2653
     "char8_t: A type for UTF-8 characters and strings (Revision 1)"
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

commit 0710b1004c1eb151d739c73090c4eab81e454eb1
Author: Tom Honermann
Date:   Wed Jan 5 18:42:03 2022 -0500

    Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.

    This change provides implementations for the mbrtoc8 and c8rtomb
    functions adopted for C++20 via WG21 P0482R6 and proposed for C2X via
    WG14 N2653. It also provides the char8_t typedef from N2653.

    The mbrtoc8 and c8rtomb functions are declared in uchar.h if either the
    C++20 __cpp_char8_t feature test macro or the _GNU_SOURCE macro is
    defined.

    The char8_t typedef is declared in uchar.h if _GNU_SOURCE is defined and
    __cpp_char8_t is not defined (if __cpp_char8_t is defined, then char8_t
    is a builtin type).
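The UTF-8 handling inside the c8rtomb and mbrtoc8 implementations below can be illustrated in isolation. What follows is a minimal standalone sketch, not part of the patch.

It makes a few assumptions. Code points are held in a uint32_t instead of the wchar_t that glibc passes to wcrtomb and receives from mbrtowc. The locale-dependent multibyte step is omitted entirely. The names u8_decoder, u8_decode_unit and u8_encode are hypothetical.

The decoder applies the same checks as c8rtomb:

- It rejects 0x80 to 0xC1 and 0xF5 to 0xFF as lead code units.
- It restricts the second code unit after the leads 0xE0, 0xED, 0xF0 and 0xF4.

The encoder splits a code point into a lead unit and trailing units using the same masks as mbrtoc8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state, standing in for what c8rtomb keeps in
   mbstate_t: the code units seen so far and how many there are.  */
struct u8_decoder
{
  unsigned char units[3];
  int count;                  /* 0 means the initial state.  */
};

/* Feed one UTF-8 code unit C8.  Returns 1 and stores the code point in *CP
   when a sequence completes, 0 when more code units are expected, and -1
   for an invalid code unit (where c8rtomb sets errno to EILSEQ).  As in
   c8rtomb, the state is left unchanged after an error.  */
static int
u8_decode_unit (struct u8_decoder *d, unsigned char c8, uint32_t *cp)
{
  assert (d->count >= 0 && d->count <= 3);

  if (d->count == 0)
    {
      if ((c8 >= 0x80 && c8 <= 0xC1) || c8 >= 0xF5)
        return -1;            /* Invalid lead code unit.  */
      if (c8 >= 0xC2)
        {
          d->units[0] = c8;   /* Valid lead code unit; await the rest.  */
          d->count = 1;
          return 0;
        }
      *cp = c8;               /* Single code unit (ASCII).  */
      return 1;
    }

  unsigned char cu1 = d->units[0];
  if (c8 < 0x80 || c8 > 0xBF)
    return -1;                /* Not a trailing code unit.  */
  if (d->count == 1
      && ((cu1 == 0xE0 && c8 < 0xA0)      /* Overlong 3-unit form.  */
          || (cu1 == 0xED && c8 > 0x9F)   /* UTF-16 surrogate range.  */
          || (cu1 == 0xF0 && c8 < 0x90)   /* Overlong 4-unit form.  */
          || (cu1 == 0xF4 && c8 > 0x8F))) /* Above U+10FFFF.  */
    return -1;

  /* Sequence length implied by the lead code unit.  */
  int len = cu1 >= 0xF0 ? 4 : cu1 >= 0xE0 ? 3 : 2;
  if (d->count + 1 < len)
    {
      d->units[d->count++] = c8;
      return 0;
    }

  /* Last code unit: combine the payload bits of every unit.  */
  uint32_t v = cu1 & (len == 2 ? 0x1F : len == 3 ? 0x0F : 0x07);
  for (int i = 1; i < d->count; ++i)
    v = (v << 6) | (d->units[i] & 0x3F);
  *cp = (v << 6) | (c8 & 0x3F);
  d->count = 0;
  return 1;
}

/* Split code point CP into UTF-8 code units using the same masks as
   mbrtoc8.  Returns the number of units written to OUT (1 to 4), or 0 if
   CP is above U+10FFFF.  */
static size_t
u8_encode (uint32_t cp, unsigned char out[4])
{
  if (cp <= 0x7F)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp <= 0x7FF)
    {
      out[0] = 0xC0 + ((cp >> 6) & 0x1F);
      out[1] = 0x80 + (cp & 0x3F);
      return 2;
    }
  if (cp <= 0xFFFF)
    {
      out[0] = 0xE0 + ((cp >> 12) & 0x0F);
      out[1] = 0x80 + ((cp >> 6) & 0x3F);
      out[2] = 0x80 + (cp & 0x3F);
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = 0xF0 + ((cp >> 18) & 0x07);
      out[1] = 0x80 + ((cp >> 12) & 0x3F);
      out[2] = 0x80 + ((cp >> 6) & 0x3F);
      out[3] = 0x80 + (cp & 0x3F);
      return 4;
    }
  return 0;
}
```

For example, feeding 0xE2, 0x82 and 0xAC to u8_decode_unit returns 0, 0 and then 1 with U+20AC. Calling u8_encode (0x20AC, buf) produces those same three code units.

The return value of 0 for the lead and intermediate units corresponds to c8rtomb returning 0: no multibyte character can be written until the UTF-8 sequence is complete.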
diff --git a/NEWS b/NEWS
index 4762bfcc4e..ccb22553f2 100644
--- a/NEWS
+++ b/NEWS
@@ -237,6 +237,14 @@ Major new features:
 * The audit libraries will avoid unnecessary slowdown if it is not required
   PLT tracking (by not implementing the la_pltenter or la_pltexit callbacks).
 
+* The mbrtoc8 and c8rtomb functions are added for implementation of the
+  C++20 P0482R6 and C2X N2653 proposals. These functions perform conversions
+  between multibyte sequences and the UTF-8 character encoding. A char8_t
+  typedef is added for the C2X N2653 proposal. The functions are declared
+  in uchar.h if the C++20 __cpp_char8_t feature test macro or _GNU_SOURCE
+  macro is defined. The char8_t typedef is declared in uchar.h if _GNU_SOURCE
+  is defined and __cpp_char8_t is not defined.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The function pthread_mutex_consistent_np has been deprecated; programs
diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
index ecf8c0992e..d1188a83a5 100644
--- a/sysdeps/mach/hurd/i386/libc.abilist
+++ b/sysdeps/mach/hurd/i386/libc.abilist
@@ -2287,7 +2287,9 @@ GLIBC_2.34 shm_unlink F
 GLIBC_2.34 timespec_getres F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
 GLIBC_2.35 close_range F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index fed942ed4b..35586c023f 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2613,4 +2613,6 @@ GLIBC_2.34 tss_delete F
 GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
+GLIBC_2.35 c8rtomb F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 2867932704..4969c57a37 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2711,6 +2711,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index 239db7bab0..fccd1f8662 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2375,3 +2375,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index bc79dcfe8a..1cb2d4fd66 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -493,6 +493,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index 614607fd6b..b36859f8e1 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -490,6 +490,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 2b61543f0d..4f7828d822 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2649,3 +2649,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 6b3cb1adb4..d59accc822 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2598,6 +2598,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 7f608c1b64..37a065f555 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2782,6 +2782,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 865deec43f..69839c86fd 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2549,6 +2549,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index a172d74632..b8418d612c 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -494,6 +494,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 174e9c7739..8cfbb5f235 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2725,6 +2725,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index d042be1369..0d729329aa 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2698,3 +2698,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index 332da62de2..3980c6df0d 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2695,3 +2695,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 2d6ec0d0e8..f8c2087559 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2690,6 +2690,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 6c5befa72b..ce87791286 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2688,6 +2688,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 5fb24c97e1..ee547ceb78 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2696,6 +2696,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index f4f29fc15e..4c558a283e 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2600,6 +2600,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 2e7300cd05..234ab84125 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2737,3 +2737,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 129a2f16a7..e383bac9bd 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2752,6 +2752,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index 7e23226779..ddbcb3e085 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2785,6 +2785,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 6f97392b70..daaa374fa0 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2508,6 +2508,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 29058a041a..a24f6b4751 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2810,3 +2810,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index d2924766d2..3dc0e04f64 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2377,3 +2377,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index b770e05da3..af3bbfc83b 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2577,3 +2577,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index bed3433a2b..4025a7f913 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2750,6 +2750,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 4f1a143da5..55febf9508 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2545,6 +2545,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 92c8dec8ec..cc37920ee2 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2605,6 +2605,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 263da58cb7..298654b119 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2602,6 +2602,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 0171efe7db..ed214429f0 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2745,6 +2745,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index 7f8d45f362..0dcc3674eb 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2572,6 +2572,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index c2f1a8ecc6..ec4e0b73e4 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2523,6 +2523,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 8b43acf100..b60cd45821 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2629,3 +2629,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index 3a3be9cd95..390d94bbb2 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -42,7 +42,7 @@ routines := wcscat wcschr wcscmp wcscpy wcscspn wcsdup wcslen wcsncat \
             wcsmbsload mbsrtowcs_l \
             isoc99_wscanf isoc99_vwscanf isoc99_fwscanf isoc99_vfwscanf \
             isoc99_swscanf isoc99_vswscanf \
-            mbrtoc16 c16rtomb mbrtoc32 c32rtomb
+            mbrtoc8 c8rtomb mbrtoc16 c16rtomb mbrtoc32 c32rtomb
 
 strop-tests := wcscmp wcsncmp wmemcmp wcslen wcschr wcsrchr wcscpy wcsnlen \
                wcpcpy wcsncpy wcpncpy wcscat wcsncat wcschrnul wcsspn wcspbrk \
diff --git a/wcsmbs/Versions b/wcsmbs/Versions
index 0b31c1b940..d578518fad 100644
--- a/wcsmbs/Versions
+++ b/wcsmbs/Versions
@@ -49,4 +49,7 @@ libc {
     wcstof32; wcstof64; wcstof32x;
     wcstof32_l; wcstof64_l; wcstof32x_l;
   }
+  GLIBC_2.35 {
+    c8rtomb; mbrtoc8;
+  }
 }
diff --git a/wcsmbs/c8rtomb.c b/wcsmbs/c8rtomb.c
new file mode 100644
index 0000000000..8a5ffeab23
--- /dev/null
+++ b/wcsmbs/c8rtomb.c
@@ -0,0 +1,132 @@
+/* UTF-8 to multibyte conversion.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+
+
+/* This is the private state used if PS is NULL.  */
+static mbstate_t state;
+
+size_t
+c8rtomb (char *s, char8_t c8, mbstate_t *ps)
+{
+  /* This implementation depends on the converter invoked by wcrtomb not
+     needing to retain state in either the top most bit of ps->__count or
+     in ps->__value between invocations.  This implementation uses the
+     top most bit of ps->__count to indicate that trailing code units are
+     expected and uses ps->__value to store previously seen code units.  */
+
+  wchar_t wc;
+
+  if (ps == NULL)
+    ps = &state;
+
+  if (s == NULL)
+    {
+      /* if 's' is a null pointer, behave as if u8'\0' was passed as 'c8'.  If
+         this occurs for an incomplete code unit sequence, then an error will
+         be reported below.  */
+      c8 = u8""[0];
+    }
+
+  if (! (ps->__count & 0x80000000))
+    {
+      /* Initial state.  */
+      if ((c8 >= 0x80 && c8 <= 0xC1) || c8 >= 0xF5)
+        {
+          /* An invalid lead code unit.  */
+          __set_errno (EILSEQ);
+          return -1;
+        }
+      if (c8 >= 0xC2)
+        {
+          /* A valid lead code unit.  */
+          ps->__count |= 0x80000000;
+          ps->__value.__wchb[0] = c8;
+          ps->__value.__wchb[3] = 1;
+          return 0;
+        }
+      /* A single byte (ASCII) code unit.  */
+      wc = c8;
+    }
+  else
+    {
+      char8_t cu1 = ps->__value.__wchb[0];
+      if (ps->__value.__wchb[3] == 1)
+        {
+          /* A single lead code unit was previously seen.  */
+          if ((c8 < 0x80 || c8 > 0xBF) ||
+              (cu1 == 0xE0 && c8 < 0xA0) ||
+              (cu1 == 0xED && c8 > 0x9F) ||
+              (cu1 == 0xF0 && c8 < 0x90) ||
+              (cu1 == 0xF4 && c8 > 0x8F))
+            {
+              /* An invalid second code unit.  */
+              __set_errno (EILSEQ);
+              return -1;
+            }
+          if (cu1 >= 0xE0)
+            {
+              /* A three or four code unit sequence.  */
+              ps->__value.__wchb[1] = c8;
+              ++ps->__value.__wchb[3];
+              return 0;
+            }
+          wc = ((cu1 & 0x1F) << 6)
+               + (c8 & 0x3F);
+        }
+      else
+        {
+          char8_t cu2 = ps->__value.__wchb[1];
+          /* A three or four byte code unit sequence.  */
+          if (c8 < 0x80 || c8 > 0xBF)
+            {
+              /* An invalid third or fourth code unit.  */
+              __set_errno (EILSEQ);
+              return -1;
+            }
+          if (ps->__value.__wchb[3] == 2 && cu1 >= 0xF0)
+            {
+              /* A four code unit sequence.  */
+              ps->__value.__wchb[2] = c8;
+              ++ps->__value.__wchb[3];
+              return 0;
+            }
+          if (cu1 < 0xF0)
+            {
+              wc = ((cu1 & 0x0F) << 12)
+                   + ((cu2 & 0x3F) << 6)
+                   + (c8 & 0x3F);
+            }
+          else
+            {
+              char8_t cu3 = ps->__value.__wchb[2];
+              wc = ((cu1 & 0x07) << 18)
+                   + ((cu2 & 0x3F) << 12)
+                   + ((cu3 & 0x3F) << 6)
+                   + (c8 & 0x3F);
+            }
+        }
+      ps->__count &= 0x7fffffff;
+      ps->__value.__wch = 0;
+    }
+
+  return wcrtomb (s, wc, ps);
+}
diff --git a/wcsmbs/mbrtoc8.c b/wcsmbs/mbrtoc8.c
new file mode 100644
index 0000000000..8ca457088d
--- /dev/null
+++ b/wcsmbs/mbrtoc8.c
@@ -0,0 +1,126 @@
+/* Multibyte to UTF-8 conversion.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#ifndef EILSEQ
+# define EILSEQ EINVAL
+#endif
+
+
+/* This is the private state used if PS is NULL.  */
+static mbstate_t state;
+
+size_t
+mbrtoc8 (char8_t *pc8, const char *s, size_t n, mbstate_t *ps)
+{
+  /* This implementation depends on the converter invoked by mbrtowc() not
+     needing to retain state in either the top most bit of ps->__count or
+     in ps->__value between invocations.  This implementation uses the
+     top most bit of ps->__count to indicate that trailing code units are
+     yet to be written and uses ps->__value to store those code units.  */
+
+  if (ps == NULL)
+    ps = &state;
+
+  /* If state indicates that trailing code units are yet to be written, write
+     those first regardless of whether 's' is a null pointer.  */
+  if (ps->__count & 0x80000000)
+    {
+      /* ps->__value.__wchb[3] stores the index of the next code unit to
+         write.  Code units are stored in reverse order.  */
+      size_t i = ps->__value.__wchb[3];
+      if (pc8 != NULL)
+        {
+          *pc8 = ps->__value.__wchb[i];
+        }
+      if (i == 0)
+        {
+          ps->__count &= 0x7fffffff;
+          ps->__value.__wch = 0;
+        }
+      else
+        --ps->__value.__wchb[3];
+      return -3;
+    }
+
+  if (s == NULL)
+    {
+      /* if 's' is a null pointer, behave as if a null pointer was passed for
+         'pc8', an empty string was passed for 's', and 1 passed for 'n'.  */
+      pc8 = NULL;
+      s = "";
+      n = 1;
+    }
+
+  wchar_t wc;
+  size_t result;
+
+  result = mbrtowc(&wc, s, n, ps);
+  if (result <= n)
+    {
+      if (wc <= 0x7F)
+        {
+          if (pc8 != NULL)
+            *pc8 = wc;
+        }
+      else if (wc <= 0x7FF)
+        {
+          if (pc8 != NULL)
+            *pc8 = 0xC0 + ((wc >> 6) & 0x1F);
+          ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
+          ps->__value.__wchb[3] = 0;
+          ps->__count |= 0x80000000;
+        }
+      else if (wc <= 0xFFFF)
+        {
+          if (pc8 != NULL)
+            *pc8 = 0xE0 + ((wc >> 12) & 0x0F);
+          ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F);
+          ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
+          ps->__value.__wchb[3] = 1;
+          ps->__count |= 0x80000000;
+        }
+      else if (wc <= 0x10FFFF)
+        {
+          if (pc8 != NULL)
+            *pc8 = 0xF0 + ((wc >> 18) & 0x07);
+          ps->__value.__wchb[2] = 0x80 + ((wc >> 12) & 0x3F);
+          ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F);
+          ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
+          ps->__value.__wchb[3] = 2;
+          ps->__count |= 0x80000000;
+        }
+    }
+  if (result == 0 && wc != 0)
+    {
+      /* mbrtowc() never returns -3.  When a MB sequence converts to multiple
+         WCs, no input is consumed when writing the subsequent WCs resulting
+         in a result of 0 even if a null character wasn't written.  */
+      result = -3;
+    }
+
+  return result;
+}
diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h
index 6020f66cf6..2f15c6f74d 100644
--- a/wcsmbs/uchar.h
+++ b/wcsmbs/uchar.h
@@ -31,6 +31,13 @@
 #include
 #include
 
+/* Declare the char8_t typedef in _GNU_SOURCE mode, but only if the C++
+   __cpp_char8_t feature test macro is not defined.  */
+#if defined __USE_GNU && !defined __cpp_char8_t
+/* Define the 8-bit character type.  */
+typedef unsigned char char8_t;
+#endif
+
 #ifndef __USE_ISOCXX11
 /* Define the 16-bit and 32-bit character types.  */
 typedef __uint_least16_t char16_t;
@@ -40,6 +47,20 @@ typedef __uint_least32_t char32_t;
 
 __BEGIN_DECLS
 
+/* Declare mbrtoc8() and c8rtomb() in _GNU_SOURCE mode or if the C++
+   __cpp_char8_t feature test macro is defined.  */
+#if defined __USE_GNU || defined __cpp_char8_t
+/* Write char8_t representation of multibyte character pointed
+   to by S to PC8.  */
+extern size_t mbrtoc8 (char8_t *__restrict __pc8,
+                       const char *__restrict __s, size_t __n,
+                       mbstate_t *__restrict __p) __THROW;
+
+/* Write multibyte representation of char8_t C8 to S.  */
+extern size_t c8rtomb (char *__restrict __s, char8_t __c8,
+                       mbstate_t *__restrict __ps) __THROW;
+#endif
+
 /* Write char16_t representation of multibyte character pointed
    to by S to PC16.  */
 extern size_t mbrtoc16 (char16_t *__restrict __pc16,