From patchwork Thu Jun 30 12:52:13 2022
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 55600
To: libc-alpha@sourceware.org
Subject: [PATCH v4 1/3] gconv: Correct Big5-HKSCS conversion to preserve all state bits.
 [BZ #25744]
Date: Thu, 30 Jun 2022 08:52:13 -0400
Message-Id: <20220630125215.6052-2-tom@honermann.net>
In-Reply-To: <20220630125215.6052-1-tom@honermann.net>
References: <20220630125215.6052-1-tom@honermann.net>
From: Tom Honermann

This patch corrects the Big5-HKSCS converter to preserve the lowest 3 bits of
the mbstate_t __count data member when the converter encounters an incomplete
multibyte character.

This fixes BZ #25744.

Reviewed-by: Adhemerval Zanella
---
 iconvdata/big5hkscs.c                     | 16 +++---
 iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c | 65 +++++++++++++++++++++++
 2 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
index a28b18a5ec..d12389b2e3 100644
--- a/iconvdata/big5hkscs.c
+++ b/iconvdata/big5hkscs.c
@@ -17769,7 +17769,7 @@ static struct
    the output state to the initial state.  This has to be done during the
    flushing.  */
 #define EMIT_SHIFT_TO_INIT \
-  if (data->__statep->__count != 0) \
+  if ((data->__statep->__count >> 3) != 0) \
     { \
       if (FROM_DIRECTION) \
         { \
@@ -17778,7 +17778,7 @@ static struct
               /* Write out the last character.  */ \
               *((uint32_t *) outbuf) = data->__statep->__count >> 3; \
               outbuf += sizeof (uint32_t); \
-              data->__statep->__count = 0; \
+              data->__statep->__count &= 7; \
             } \
           else \
             /* We don't have enough room in the output buffer.  */ \
@@ -17792,7 +17792,7 @@ static struct
               uint32_t lasttwo = data->__statep->__count >> 3; \
               *outbuf++ = (lasttwo >> 8) & 0xff; \
               *outbuf++ = lasttwo & 0xff; \
-              data->__statep->__count = 0; \
+              data->__statep->__count &= 7; \
             } \
           else \
             /* We don't have enough room in the output buffer.  */ \
@@ -17878,7 +17878,7 @@ static struct
                     \
                     /* Otherwise store only the first character now, and \
                        put the second one into the queue.  */ \
-                    *statep = ch2 << 3; \
+                    *statep = (ch2 << 3) | (*statep & 7); \
                     /* Tell the caller why we terminate the loop.  */ \
                     result = __GCONV_FULL_OUTPUT; \
                     break; \
@@ -17895,7 +17895,7 @@ static struct
       } \
     else \
       /* Clear the queue and proceed to output the saved character.  */ \
-      *statep = 0; \
+      *statep &= 7; \
     \
     put32 (outptr, ch); \
     outptr += 4; \
@@ -17946,7 +17946,7 @@ static struct
               } \
             *outptr++ = (ch >> 8) & 0xff; \
             *outptr++ = ch & 0xff; \
-            *statep = 0; \
+            *statep &= 7; \
             inptr += 4; \
             continue; \
             \
@@ -17959,7 +17959,7 @@ static struct
           } \
         *outptr++ = (lasttwo >> 8) & 0xff; \
         *outptr++ = lasttwo & 0xff; \
-        *statep = 0; \
+        *statep &= 7; \
         continue; \
       } \
     \
@@ -17996,7 +17996,7 @@ static struct
         /* Check for possible combining character.  */ \
         if (__glibc_unlikely (ch == 0xca || ch == 0xea)) \
           { \
-            *statep = ((cp[0] << 8) | cp[1]) << 3; \
+            *statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7); \
             inptr += 4; \
             continue; \
           } \
diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
index 9601b6c1d9..e1472dc2e2 100644
--- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
+++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
@@ -128,6 +128,71 @@ check_conversion (struct testdata test)
       printf ("error: Result of third conversion was wrong.\n");
       err++;
     }
+
+  /* Now perform the same test as above consuming one byte at a time.  */
+  mbs = test.input;
+  memset (&st, 0, sizeof (st));
+
+  /* Consume the first byte; expect an incomplete multibyte character.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != -2)
+    {
+      printf ("error: First byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the first consumed byte.  */
+  mbs += 1;
+  /* Consume the second byte; expect the first wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Second byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the second consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[0])
+    {
+      printf ("error: Result of first wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* Consume no bytes; expect the second wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 0)
+    {
+      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Do not advance past the third byte.  */
+  mbs += 0;
+  if (wc != test.expected[1])
+    {
+      printf ("error: Result of second wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* After the second wchar_t conversion, the converter should be in
+     the initial state since the two input BIG5-HKSCS bytes have been
+     consumed and the two wchar_t's have been output.  */
+  if (mbsinit (&st) == 0)
+    {
+      printf ("error: Converter not in initial state.\n");
+      err++;
+    }
+  /* Consume the third byte; expect the third wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the third consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[2])
+    {
+      printf ("error: Result of third wchar_t conversion was wrong.\n");
+      err++;
+    }
+
   /* Return 0 if we saw no errors.  */
   return err;
 }

From patchwork Thu Jun 30 12:52:14 2022
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 55602
To: libc-alpha@sourceware.org
Subject: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
Date: Thu, 30 Jun 2022 08:52:14 -0400
Message-Id: <20220630125215.6052-3-tom@honermann.net>
In-Reply-To: <20220630125215.6052-1-tom@honermann.net>
References: <20220630125215.6052-1-tom@honermann.net>
From: Tom Honermann

This change provides implementations for the mbrtoc8 and c8rtomb functions
adopted for C++20 via WG21 P0482R6 and for C2X via WG14 N2653.  It also
provides the char8_t typedef from WG14 N2653.

The mbrtoc8 and c8rtomb functions are declared in uchar.h in C2X mode or when
the _GNU_SOURCE macro or C++20 __cpp_char8_t feature test macro is defined.
The char8_t typedef is declared in uchar.h in C2X mode or when the _GNU_SOURCE
macro is defined and the C++20 __cpp_char8_t feature test macro is not defined
(if __cpp_char8_t is defined, then char8_t is a builtin type).

Reviewed-by: Adhemerval Zanella
---
 NEWS                                          |   9 ++
 sysdeps/mach/hurd/i386/libc.abilist           |   2 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
 .../sysv/linux/microblaze/be/libc.abilist     |   2 +
 .../sysv/linux/microblaze/le/libc.abilist     |   2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
 wcsmbs/Makefile                               |   2 +-
 wcsmbs/Versions                               |   3 +
 wcsmbs/c8rtomb.c                              | 132 ++++++++++++++++++
 wcsmbs/mbrtoc8.c                              | 126 +++++++++++++++++
 wcsmbs/uchar.h                                |  21 +++
 40 files changed, 360 insertions(+), 1 deletion(-)
 create mode 100644 wcsmbs/c8rtomb.c
 create mode 100644 wcsmbs/mbrtoc8.c

diff --git a/NEWS b/NEWS
index b0a3d7e512..94243e2170 100644
--- a/NEWS
+++ b/NEWS
@@ -46,6 +46,15 @@ Major new features:
   to more flexibly configure and operate on filesystem mounts.  The new mount
   APIs are specifically designed to work with namespaces.
 
+* Support for the mbrtoc8 and c8rtomb multibyte/UTF-8 character conversion
+  functions has been added per the ISO C2X N2653 and C++20 P0482R6 proposals.
+  Support for the char8_t typedef has been added per the ISO C2X N2653
+  proposal.  The functions are declared in uchar.h in C2X mode or when the
+  _GNU_SOURCE macro or C++20 __cpp_char8_t feature test macro is defined.
+  The char8_t typedef is declared in uchar.h in C2X mode or when the
+  _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature test macro
+  is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * Support for prelink will be removed in the next release; this includes
diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
index 4dc87e9061..66fb0e28fa 100644
--- a/sysdeps/mach/hurd/i386/libc.abilist
+++ b/sysdeps/mach/hurd/i386/libc.abilist
@@ -2289,6 +2289,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 close_range F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 8dba065b81..b3cf9fdd70 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2616,8 +2616,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 08f4750022..2a45006462 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2713,8 +2713,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index 75db763023..0ac6bba241 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2377,8 +2377,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index fa33f317ac..bfa763906b 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -496,8 +496,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index dba2e4ce42..ffcd7ca432 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -493,8 +493,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index e6ff921c29..940777b118 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2652,8 +2652,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 8a40cece83..508efe6626 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2601,8 +2601,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index a89826049f..16b91fcee9 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2785,8 +2785,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index d1d96b7469..51b646790d 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2551,8 +2551,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 63a62f267a..ddb43651f2 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -497,8 +497,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index f68325f9bc..3db7deb4d0 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2728,8 +2728,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index 247af2075c..94afb7ad0b 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2701,8 +2701,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index b0ac3f9009..5873751425 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2698,8 +2698,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index b22cd6bf2f..f296e4edb7 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2693,8 +2693,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 12fc2cce3e..1888756819 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2691,8 +2691,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index d3e96dfd43..7dfacee25b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2699,8 +2699,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index cb58ed4db0..53e188aafe 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2602,8 +2602,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 61ad58a599..bc6a836b1b 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2740,8 +2740,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index 1260dc4e2e..299fa67961 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2123,8 +2123,10 @@ GLIBC_2.35 wprintf F
 GLIBC_2.35 write F
 GLIBC_2.35 writev F
 GLIBC_2.35 wscanf F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 363939762c..a5a072394d 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2755,8 +2755,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index f512ad8baf..2d26fd8639 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2788,8 +2788,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index c9bdc9859c..d9f1c593ea 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2510,8 +2510,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index f091be30bd..874f33dbcc 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2812,8 +2812,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index 7ea73f9af8..465798a56f 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2379,8 +2379,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 333fa62714..ecc0544c05 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2579,8 +2579,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index a867467b12..3e8d00d513 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2753,8 +2753,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index dbad5b3163..a872a3d186 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2547,8 +2547,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 6f755cc173..a2938ca2be 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2608,8 +2608,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 77d936aa3c..ef318251c5 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2605,8 +2605,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 09bb4363e1..2e2fbe72e2 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2748,8 +2748,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index 9df9cb6adb..e1991259cd 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2574,8 +2574,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 4829450ad0..7d0843d1d8 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2525,8 +2525,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index caea228bcb..761958f768 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2631,8 +2631,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index df9a85f4a9..bda281ad70 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -42,7 +42,7 @@ routines := wcscat wcschr wcscmp wcscpy wcscspn wcsdup wcslen wcsncat \
 	    wcsmbsload mbsrtowcs_l \
 	    isoc99_wscanf isoc99_vwscanf isoc99_fwscanf isoc99_vfwscanf \
 	    isoc99_swscanf isoc99_vswscanf \
-	    mbrtoc16 c16rtomb mbrtoc32 c32rtomb
+	    mbrtoc8 c8rtomb mbrtoc16 c16rtomb mbrtoc32 c32rtomb
 
 strop-tests := wcscmp wcsncmp wmemcmp wcslen wcschr wcsrchr wcscpy wcsnlen \
 	       wcpcpy wcsncpy wcpncpy wcscat wcsncat wcschrnul wcsspn wcspbrk \
diff --git a/wcsmbs/Versions b/wcsmbs/Versions
index 0b31c1b940..ec28acfb73 100644
--- a/wcsmbs/Versions
+++ b/wcsmbs/Versions
@@ -49,4 +49,7 @@ libc {
     wcstof32; wcstof64; wcstof32x;
     wcstof32_l; wcstof64_l; wcstof32x_l;
   }
+  GLIBC_2.36 {
+    c8rtomb; mbrtoc8;
+  }
 }
diff --git a/wcsmbs/c8rtomb.c b/wcsmbs/c8rtomb.c
new file mode 100644
index 0000000000..b564770eb5
--- /dev/null
+++ b/wcsmbs/c8rtomb.c
@@ -0,0 +1,132 @@
+/* UTF-8 to multibyte conversion.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <uchar.h>
+#include <wchar.h>
+
+
+/* This is the private state used if PS is NULL.  */
+static mbstate_t state;
+
+size_t
+c8rtomb (char *s, char8_t c8, mbstate_t *ps)
+{
+  /* This implementation depends on the converter invoked by wcrtomb not
+     needing to retain state in either the top most bit of ps->__count or
+     in ps->__value between invocations.  This implementation uses the
+     top most bit of ps->__count to indicate that trailing code units are
+     expected and uses ps->__value to store previously seen code units.  */
+
+  wchar_t wc;
+
+  if (ps == NULL)
+    ps = &state;
+
+  if (s == NULL)
+    {
+      /* if 's' is a null pointer, behave as if u8'\0' was passed as 'c8'.  If
+         this occurs for an incomplete code unit sequence, then an error will
+         be reported below.  */
+      c8 = u8""[0];
+    }
+
+  if (! (ps->__count & 0x80000000))
+    {
+      /* Initial state.  */
+      if ((c8 >= 0x80 && c8 <= 0xC1) || c8 >= 0xF5)
+        {
+          /* An invalid lead code unit.  */
+          __set_errno (EILSEQ);
+          return -1;
+        }
+      if (c8 >= 0xC2)
+        {
+          /* A valid lead code unit.  */
+          ps->__count |= 0x80000000;
+          ps->__value.__wchb[0] = c8;
+          ps->__value.__wchb[3] = 1;
+          return 0;
+        }
+      /* A single byte (ASCII) code unit.  */
+      wc = c8;
+    }
+  else
+    {
+      char8_t cu1 = ps->__value.__wchb[0];
+      if (ps->__value.__wchb[3] == 1)
+        {
+          /* A single lead code unit was previously seen.  */
+          if ((c8 < 0x80 || c8 > 0xBF)
+              || (cu1 == 0xE0 && c8 < 0xA0)
+              || (cu1 == 0xED && c8 > 0x9F)
+              || (cu1 == 0xF0 && c8 < 0x90)
+              || (cu1 == 0xF4 && c8 > 0x8F))
+            {
+              /* An invalid second code unit.  */
+              __set_errno (EILSEQ);
+              return -1;
+            }
+          if (cu1 >= 0xE0)
+            {
+              /* A three or four code unit sequence.
*/ + ps->__value.__wchb[1] = c8; + ++ps->__value.__wchb[3]; + return 0; + } + wc = ((cu1 & 0x1F) << 6) + + (c8 & 0x3F); + } + else + { + char8_t cu2 = ps->__value.__wchb[1]; + /* A three or four byte code unit sequence. */ + if (c8 < 0x80 || c8 > 0xBF) + { + /* An invalid third or fourth code unit. */ + __set_errno (EILSEQ); + return -1; + } + if (ps->__value.__wchb[3] == 2 && cu1 >= 0xF0) + { + /* A four code unit sequence. */ + ps->__value.__wchb[2] = c8; + ++ps->__value.__wchb[3]; + return 0; + } + if (cu1 < 0xF0) + { + wc = ((cu1 & 0x0F) << 12) + + ((cu2 & 0x3F) << 6) + + (c8 & 0x3F); + } + else + { + char8_t cu3 = ps->__value.__wchb[2]; + wc = ((cu1 & 0x07) << 18) + + ((cu2 & 0x3F) << 12) + + ((cu3 & 0x3F) << 6) + + (c8 & 0x3F); + } + } + ps->__count &= 0x7fffffff; + ps->__value.__wch = 0; + } + + return wcrtomb (s, wc, ps); +} diff --git a/wcsmbs/mbrtoc8.c b/wcsmbs/mbrtoc8.c new file mode 100644 index 0000000000..f2fab3b6a7 --- /dev/null +++ b/wcsmbs/mbrtoc8.c @@ -0,0 +1,126 @@ +/* Multibyte to UTF-8 conversion. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include +#include + +#include + +#ifndef EILSEQ +# define EILSEQ EINVAL +#endif + + +/* This is the private state used if PS is NULL. 
*/ +static mbstate_t state; + +size_t +mbrtoc8 (char8_t *pc8, const char *s, size_t n, mbstate_t *ps) +{ + /* This implementation depends on the converter invoked by mbrtowc() not + needing to retain state in either the top most bit of ps->__count or + in ps->__value between invocations. This implementation uses the + top most bit of ps->__count to indicate that trailing code units are + yet to be written and uses ps->__value to store those code units. */ + + if (ps == NULL) + ps = &state; + + /* If state indicates that trailing code units are yet to be written, write + those first regardless of whether 's' is a null pointer. */ + if (ps->__count & 0x80000000) + { + /* ps->__value.__wchb[3] stores the index of the next code unit to + write. Code units are stored in reverse order. */ + size_t i = ps->__value.__wchb[3]; + if (pc8 != NULL) + { + *pc8 = ps->__value.__wchb[i]; + } + if (i == 0) + { + ps->__count &= 0x7fffffff; + ps->__value.__wch = 0; + } + else + --ps->__value.__wchb[3]; + return -3; + } + + if (s == NULL) + { + /* if 's' is a null pointer, behave as if a null pointer was passed for + 'pc8', an empty string was passed for 's', and 1 passed for 'n'. 
*/ + pc8 = NULL; + s = ""; + n = 1; + } + + wchar_t wc; + size_t result; + + result = mbrtowc (&wc, s, n, ps); + if (result <= n) + { + if (wc <= 0x7F) + { + if (pc8 != NULL) + *pc8 = wc; + } + else if (wc <= 0x7FF) + { + if (pc8 != NULL) + *pc8 = 0xC0 + ((wc >> 6) & 0x1F); + ps->__value.__wchb[0] = 0x80 + (wc & 0x3F); + ps->__value.__wchb[3] = 0; + ps->__count |= 0x80000000; + } + else if (wc <= 0xFFFF) + { + if (pc8 != NULL) + *pc8 = 0xE0 + ((wc >> 12) & 0x0F); + ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F); + ps->__value.__wchb[0] = 0x80 + (wc & 0x3F); + ps->__value.__wchb[3] = 1; + ps->__count |= 0x80000000; + } + else if (wc <= 0x10FFFF) + { + if (pc8 != NULL) + *pc8 = 0xF0 + ((wc >> 18) & 0x07); + ps->__value.__wchb[2] = 0x80 + ((wc >> 12) & 0x3F); + ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F); + ps->__value.__wchb[0] = 0x80 + (wc & 0x3F); + ps->__value.__wchb[3] = 2; + ps->__count |= 0x80000000; + } + } + if (result == 0 && wc != 0) + { + /* mbrtowc() never returns -3. When a MB sequence converts to multiple + WCs, no input is consumed when writing the subsequent WCs resulting + in a result of 0 even if a null character wasn't written. */ + result = -3; + } + + return result; +} diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h index 051cdcbeb5..c37e8619a0 100644 --- a/wcsmbs/uchar.h +++ b/wcsmbs/uchar.h @@ -31,6 +31,13 @@ #include #include +/* Declare the C2x char8_t typedef in C2x modes, but only if the C++ + __cpp_char8_t feature test macro is not defined. */ +#if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t +/* Define the 8-bit character type. */ +typedef unsigned char char8_t; +#endif + #ifndef __USE_ISOCXX11 /* Define the 16-bit and 32-bit character types. */ typedef __uint_least16_t char16_t; @@ -40,6 +47,20 @@ typedef __uint_least32_t char32_t; __BEGIN_DECLS +/* Declare the C2x mbrtoc8() and c8rtomb() functions in C2x modes or if + the C++ __cpp_char8_t feature test macro is defined. 
*/ +#if __GLIBC_USE (ISOC2X) || defined __cpp_char8_t +/* Write char8_t representation of multibyte character pointed + to by S to PC8. */ +extern size_t mbrtoc8 (char8_t *__restrict __pc8, + const char *__restrict __s, size_t __n, + mbstate_t *__restrict __p) __THROW; + +/* Write multibyte representation of char8_t C8 to S. */ +extern size_t c8rtomb (char *__restrict __s, char8_t __c8, + mbstate_t *__restrict __ps) __THROW; +#endif + /* Write char16_t representation of multibyte character pointed to by S to PC16. */ extern size_t mbrtoc16 (char16_t *__restrict __pc16, From patchwork Thu Jun 30 12:52:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Honermann X-Patchwork-Id: 55601 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9BCDC383DBB4 for ; Thu, 30 Jun 2022 12:53:10 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9BCDC383DBB4 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1656593590; bh=WKbyNjGc+8cn3LaExdzLnv9XuKqHh0+wtkPgo2YzYhw=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=wUXFg6PUGshemRZAnzvy2sHXxOZZpsq9uSoPfwZh4Y+Ygc9VUzo54SAWpLeSj80qu mzcgLIxw2aB4RrAaoozMoc6L5FRPk4dVPWsmcHqgDLBrFDYou96QuimyCUABTPEURS etaVSbjiWcmm1AJ282pK4mK0g0b2ezGbkBF1W2kM= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from smtp114.iad3b.emailsrvr.com (smtp114.iad3b.emailsrvr.com [146.20.161.114]) by sourceware.org (Postfix) with ESMTPS id B3A313842AF2 for ; Thu, 30 Jun 2022 12:52:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B3A313842AF2 X-Auth-ID: tom@honermann.net Received: by smtp15.relay.iad3b.emailsrvr.com (Authenticated sender: 
tom-AT-honermann.net) with ESMTPSA id 9341DC00B6; Thu, 30 Jun 2022 08:52:38 -0400 (EDT) To: libc-alpha@sourceware.org Subject: [PATCH v4 3/3] stdlib: Tests for mbrtoc8(), c8rtomb(), and the char8_t typedef. Date: Thu, 30 Jun 2022 08:52:15 -0400 Message-Id: <20220630125215.6052-4-tom@honermann.net> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220630125215.6052-1-tom@honermann.net> References: <20220630125215.6052-1-tom@honermann.net> MIME-Version: 1.0 X-Classification-ID: bd4e2ac9-54ec-4bf3-bea9-66e0e087bd86-4-1 X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom Honermann via Libc-alpha From: Tom Honermann Reply-To: Tom Honermann Cc: Tom Honermann Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" This change adds tests for the mbrtoc8 and c8rtomb functions adopted for C++20 via WG21 P0482R6 and for C2X via WG14 N2653, and for the char8_t typedef adopted for C2X from WG14 N2653. The tests for mbrtoc8 and c8rtomb specifically exercise conversion to and from Big5-HKSCS because of special cases that arise with that encoding. Big5-HKSCS defines some double byte sequences that convert to more than one Unicode code point. In order to test this, the locale dependencies for running tests under wcsmbs are expanded to include zh_HK.BIG5-HKSCS. 
Reviewed-by: Adhemerval Zanella --- wcsmbs/Makefile | 3 +- wcsmbs/test-c8rtomb.c | 613 ++++++++++++++++++++++++++++++++++++++++++ wcsmbs/test-mbrtoc8.c | 539 +++++++++++++++++++++++++++++++++++++ 3 files changed, 1154 insertions(+), 1 deletion(-) create mode 100644 wcsmbs/test-c8rtomb.c create mode 100644 wcsmbs/test-mbrtoc8.c diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile index bda281ad70..e6b9e8743a 100644 --- a/wcsmbs/Makefile +++ b/wcsmbs/Makefile @@ -52,6 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \ tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \ tst-wcstod-round test-char-types tst-fgetwc-after-eof \ tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \ + test-mbrtoc8 test-c8rtomb \ $(addprefix test-,$(strop-tests)) tst-mbstowcs \ tst-wprintf-binary @@ -59,7 +60,7 @@ include ../Rules ifeq ($(run-built-tests),yes) LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 hr_HR.ISO-8859-2 \ - ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9 + ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9 zh_HK.BIG5-HKSCS include ../gen-locales.mk $(objpfx)tst-btowc.out: $(gen-locales) diff --git a/wcsmbs/test-c8rtomb.c b/wcsmbs/test-c8rtomb.c new file mode 100644 index 0000000000..6d72189e86 --- /dev/null +++ b/wcsmbs/test-c8rtomb.c @@ -0,0 +1,613 @@ +/* Test c8rtomb. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static int +test_truncated_code_unit_sequence (void) +{ + /* Missing trailing code unit for a two byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xC2"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Missing first trailing code unit for a three byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xE0"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Missing second trailing code unit for a three byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xE0\xA0"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Missing first trailing code unit for a four byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xF0"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Missing second trailing code unit for a four byte code unit sequence. 
*/ + { + const char8_t *u8s = (const char8_t*) u8"\xF0\x90"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Missing third trailing code unit for a four byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + return 0; +} + +static int +test_invalid_trailing_code_unit_sequence (void) +{ + /* Invalid trailing code unit for a two byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xC2\xC0"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Invalid first trailing code unit for a three byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xE0\xC0"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Invalid second trailing code unit for a three byte code unit sequence. 
*/ + { + const char8_t *u8s = (const char8_t*) u8"\xE0\xA0\xC0"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Invalid first trailing code unit for a four byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xF0\xC0"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Invalid second trailing code unit for a four byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xF0\x90\xC0"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Invalid third trailing code unit for a four byte code unit sequence. */ + { + const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80\xC0"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + return 0; +} + +static int +test_lone_trailing_code_units (void) +{ + /* Lone trailing code unit. 
*/ + const char8_t *u8s = (const char8_t*) u8"\x80"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + + return 0; +} + +static int +test_overlong_encoding (void) +{ + /* Two byte overlong encoding. */ + { + const char8_t *u8s = (const char8_t*) u8"\xC0\x80"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Two byte overlong encoding. */ + { + const char8_t *u8s = (const char8_t*) u8"\xC1\x80"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Three byte overlong encoding. */ + { + const char8_t *u8s = (const char8_t*) u8"\xE0\x9F\xBF"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Four byte overlong encoding. */ + { + const char8_t *u8s = (const char8_t*) u8"\xF0\x8F\xBF\xBF"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + return 0; +} + +static int +test_surrogate_range (void) +{ + /* Would encode U+D800. */ + { + const char8_t *u8s = (const char8_t*) u8"\xED\xA0\x80"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Would encode U+DFFF. 
*/ + { + const char8_t *u8s = (const char8_t*) u8"\xED\xBF\xBF"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + return 0; +} + +static int +test_out_of_range_encoding (void) +{ + /* Would encode U+00110000. */ + { + const char8_t *u8s = (const char8_t*) u8"\xF4\x90\x80\x80"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + /* Would encode U+00150000. */ + { + const char8_t *u8s = (const char8_t*) u8"\xF5\x90\x80\x80"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + return 0; +} + +static int +test_null_output_buffer (void) +{ + /* Null buffer with an initial state. */ + { + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (NULL, u8"X"[0], &s), (size_t) 1); + /* Assert the state is now an initial state. */ + TEST_VERIFY (mbsinit (&s)); + } + + /* Null buffer with a state corresponding to an incompletely read code + unit sequence. In this case, an error occurs since insufficient + information is available to complete the already started code unit + sequence and return to the initial state. */ + { + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8"\xC2"[0], &s), (size_t) 0); + errno = 0; + TEST_COMPARE (c8rtomb (NULL, u8"\x80"[0], &s), (size_t) -1); + TEST_COMPARE (errno, EILSEQ); + } + + return 0; +} + +static int +test_utf8 (void) +{ + xsetlocale (LC_ALL, "de_DE.UTF-8"); + + /* Null character. 
*/ + { + /* U+0000 => 0x00 */ + const char8_t *u8s = (const char8_t*) u8"\x00"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1); + TEST_COMPARE (buf[0], (char) 0x00); + TEST_VERIFY (mbsinit (&s)); + } + + /* First non-null character in the code point range that maps to a single + code unit. */ + { + /* U+0001 => 0x01 */ + const char8_t *u8s = (const char8_t*) u8"\x01"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1); + TEST_COMPARE (buf[0], (char) 0x01); + TEST_VERIFY (mbsinit (&s)); + } + + /* Last character in the code point range that maps to a single code unit. */ + { + /* U+007F => 0x7F */ + const char8_t *u8s = (const char8_t*) u8"\x7F"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1); + TEST_COMPARE (buf[0], (char) 0x7F); + TEST_VERIFY (mbsinit (&s)); + } + + /* First character in the code point range that maps to two code units. */ + { + /* U+0080 => 0xC2 0x80 */ + const char8_t *u8s = (const char8_t*) u8"\xC2\x80"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2); + TEST_COMPARE (buf[0], (char) 0xC2); + TEST_COMPARE (buf[1], (char) 0x80); + TEST_VERIFY (mbsinit (&s)); + } + + /* Last character in the code point range that maps to two code units. */ + { + /* U+07FF => 0xDF 0xBF */ + const char8_t *u8s = (const char8_t*) u8"\u07FF"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2); + TEST_COMPARE (buf[0], (char) 0xDF); + TEST_COMPARE (buf[1], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + } + + /* First character in the code point range that maps to three code units. 
*/ + { + /* U+0800 => 0xE0 0xA0 0x80 */ + const char8_t *u8s = (const char8_t*) u8"\u0800"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); + TEST_COMPARE (buf[0], (char) 0xE0); + TEST_COMPARE (buf[1], (char) 0xA0); + TEST_COMPARE (buf[2], (char) 0x80); + TEST_VERIFY (mbsinit (&s)); + } + + /* Last character in the code point range that maps to three code units + before the surrogate code point range. */ + { + /* U+D7FF => 0xED 0x9F 0xBF */ + const char8_t *u8s = (const char8_t*) u8"\uD7FF"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); + TEST_COMPARE (buf[0], (char) 0xED); + TEST_COMPARE (buf[1], (char) 0x9F); + TEST_COMPARE (buf[2], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + } + + /* First character in the code point range that maps to three code units + after the surrogate code point range. */ + { + /* U+E000 => 0xEE 0x80 0x80 */ + const char8_t *u8s = (const char8_t*) u8"\uE000"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); + TEST_COMPARE (buf[0], (char) 0xEE); + TEST_COMPARE (buf[1], (char) 0x80); + TEST_COMPARE (buf[2], (char) 0x80); + TEST_VERIFY (mbsinit (&s)); + } + + /* Not a BOM. 
*/ + { + /* U+FEFF => 0xEF 0xBB 0xBF */ + const char8_t *u8s = (const char8_t*) u8"\uFEFF"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); + TEST_COMPARE (buf[0], (char) 0xEF); + TEST_COMPARE (buf[1], (char) 0xBB); + TEST_COMPARE (buf[2], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + } + + /* Replacement character. */ + { + /* U+FFFD => 0xEF 0xBF 0xBD */ + const char8_t *u8s = (const char8_t*) u8"\uFFFD"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); + TEST_COMPARE (buf[0], (char) 0xEF); + TEST_COMPARE (buf[1], (char) 0xBF); + TEST_COMPARE (buf[2], (char) 0xBD); + TEST_VERIFY (mbsinit (&s)); + } + + /* Last character in the code point range that maps to three code units. */ + { + /* U+FFFF => 0xEF 0xBF 0xBF */ + const char8_t *u8s = (const char8_t*) u8"\uFFFF"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); + TEST_COMPARE (buf[0], (char) 0xEF); + TEST_COMPARE (buf[1], (char) 0xBF); + TEST_COMPARE (buf[2], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + } + + /* First character in the code point range that maps to four code units. 
*/ + { + /* U+10000 => 0xF0 0x90 0x80 0x80 */ + const char8_t *u8s = (const char8_t*) u8"\U00010000"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4); + TEST_COMPARE (buf[0], (char) 0xF0); + TEST_COMPARE (buf[1], (char) 0x90); + TEST_COMPARE (buf[2], (char) 0x80); + TEST_COMPARE (buf[3], (char) 0x80); + TEST_VERIFY (mbsinit (&s)); + } + + /* Last character in the code point range that maps to four code units. */ + { + /* U+10FFFF => 0xF4 0x8F 0xBF 0xBF */ + const char8_t *u8s = (const char8_t*) u8"\U0010FFFF"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4); + TEST_COMPARE (buf[0], (char) 0xF4); + TEST_COMPARE (buf[1], (char) 0x8F); + TEST_COMPARE (buf[2], (char) 0xBF); + TEST_COMPARE (buf[3], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + } + + return 0; +} + +static int +test_big5_hkscs (void) +{ + xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS"); + + /* A pair of two byte UTF-8 code unit sequences that map a Unicode code + point and combining character to a single double byte character. 
*/ + { + /* U+00CA U+0304 => 0x88 0x62 */ + const char8_t *u8s = (const char8_t*) u8"\u00CA\u0304"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2); + TEST_COMPARE (buf[0], (char) 0x88); + TEST_COMPARE (buf[1], (char) 0x62); + TEST_VERIFY (mbsinit (&s)); + } + + /* Another pair of two byte UTF-8 code unit sequences that map a Unicode code + point and combining character to a single double byte character. */ + { + /* U+00EA U+030C => 0x88 0xA5 */ + const char8_t *u8s = (const char8_t*) u8"\u00EA\u030C"; + char buf[MB_LEN_MAX] = { 0 }; + mbstate_t s = { 0 }; + + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2); + TEST_COMPARE (buf[0], (char) 0x88); + TEST_COMPARE (buf[1], (char) 0xA5); + TEST_VERIFY (mbsinit (&s)); + } + + return 0; +} + +static int +do_test (void) +{ + test_truncated_code_unit_sequence (); + test_invalid_trailing_code_unit_sequence (); + test_lone_trailing_code_units (); + test_overlong_encoding (); + test_surrogate_range (); + test_out_of_range_encoding (); + test_null_output_buffer (); + test_utf8 (); + test_big5_hkscs (); + return 0; +} + +#include diff --git a/wcsmbs/test-mbrtoc8.c b/wcsmbs/test-mbrtoc8.c new file mode 100644 index 0000000000..b282fa6dba --- /dev/null +++ b/wcsmbs/test-mbrtoc8.c @@ -0,0 +1,539 @@ +/* Test mbrtoc8. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static int
+test_utf8 (void)
+{
+  xsetlocale (LC_ALL, "de_DE.UTF-8");
+
+  /* No inputs.  */
+  {
+    const char *mbs = "";
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 0, &s), (size_t) -2); /* no input */
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Null character.  */
+  {
+    const char *mbs = "\x00"; /* 0x00 => U+0000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 0);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0x00);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First non-null character in the code point range that maps to a single
+     code unit.  */
+  {
+    const char *mbs = "\x01"; /* 0x01 => U+0001 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0x01);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to a single code unit.  */
+  {
+    const char *mbs = "\x7F"; /* 0x7F => U+007F */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0x7F);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to two code units.  */
+  {
+    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
+    mbs += 2;
+    TEST_COMPARE (buf[0], 0xC2);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xC2);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to two code units.  */
+  {
+    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
+    mbs += 2;
+    TEST_COMPARE (buf[0], 0xDF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xDF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to three code units.  */
+  {
+    const char *mbs = "\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xE0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xA0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xE0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xA0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to three code units
+     before the surrogate code point range.  */
+  {
+    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xED);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x9F);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xED);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x9F);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to three code units
+     after the surrogate code point range.  */
+  {
+    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xEE);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xEE);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Not a BOM.  */
+  {
+    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBB);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBB);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Replacement character.  */
+  {
+    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBD);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBD);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to three code units.  */
+  {
+    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to four code units.  */
+  {
+    const char *mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4);
+    mbs += 4;
+    TEST_COMPARE (buf[0], 0xF0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x90);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xF0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x90);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to four code units.  */
+  {
+    const char *mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4);
+    mbs += 4;
+    TEST_COMPARE (buf[0], 0xF4);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8F);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xF4);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8F);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  return 0;
+}
+
+static int
+test_big5_hkscs (void)
+{
+  xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");
+
+  /* A double byte character that maps to a pair of two byte UTF-8 code unit
+     sequences.  */
+  {
+    const char *mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
+    mbs += 2;
+    TEST_COMPARE (buf[0], 0xC3);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8A);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xCC);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x84);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xC3);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8A);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xCC);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x84);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Another double byte character that maps to a pair of two byte UTF-8 code
+     unit sequences.  */
+  {
+    const char *mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
+    mbs += 2;
+    TEST_COMPARE (buf[0], 0xC3);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xAA);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xCC);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8C);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xC3);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xAA);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xCC);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8C);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  return 0;
+}
+
+static int
+do_test (void)
+{
+  test_utf8 ();
+  test_big5_hkscs ();
+  return 0;
+}
+
+#include