From patchwork Sun Feb 27 16:53:19 2022
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 51410
Message-ID: <29a48f8e-0c31-072c-ec36-8b62a0e1b430@honermann.net>
Date: Sun, 27 Feb 2022 11:53:19 -0500
Subject: [PATCH 1/3]: C++20 P0482R6 and C2X N2653: Fix for bug 25744, mbrtowc with Big5-HKSCS
To: libc-alpha
From: Tom Honermann

This patch for bug 25744 [1] updates the Big5-HKSCS converter to properly
maintain the lowest 3 bits of the mbstate_t __count data member. This
change is necessary to ensure that state is correctly preserved when the
converter encounters an incomplete multibyte character. More details are
available in bug 25744 [1].

The code changes are styled to match how these bits are maintained by
converters such as iso-2022-jp.c, ibm930.c, and others. Running
'grep __count' in the 'iconvdata' directory suggests that a number of
other converters, euc-jisx0213.c for example, also fail to preserve these
bits in some cases, though negative effects may not be observable for
those converters. This patch does not attempt to address such issues in
other converters.

Tested on Linux x86_64.

Tom.

[1]: Bug 25744 "mbrtowc with Big5-HKSCS returns 2 instead of 1 when
     consuming the second byte of certain double byte characters"
     https://sourceware.org/bugzilla/show_bug.cgi?id=25744

commit 9580fc4e7fa0ce33c049b0c2d61b98405fdd2ae3
Author: Tom Honermann
Date:   Wed Jan 5 18:02:24 2022 -0500

    Correct the Big5-HKSCS converter to preserve low order state bits.
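The effect of replacing "= 0" with "&= 7" and "ch2 << 3" with "(ch2 << 3) | (*statep & 7)" can be illustrated with a small model of the state word. This is a sketch, not glibc code. The helper names queue_char_old, queue_char_new, clear_queue_new, char_pending and queued_char are hypothetical. The sketch assumes, as the cover letter describes, that bits 0-2 of __count carry state that must survive an incomplete multibyte character, and that bits 3 and up hold the converter's queued character, with 0 meaning "nothing queued".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the packed mbstate_t.__count word (assumption):
   bits 0-2 belong to the generic conversion machinery, and bits 3 and up
   hold a character queued by the Big5-HKSCS converter.  */

/* Queue CH the pre-patch way: the whole word is overwritten.  */
static int
queue_char_old (int count, uint32_t ch)
{
  (void) count;                        /* Old code ignored existing bits.  */
  assert (ch < (UINT32_C (1) << 28));  /* CH must fit above bit 2.  */
  return (int) (ch << 3);
}

/* Queue CH the patched way: bits 0-2 are carried over unchanged.  */
static int
queue_char_new (int count, uint32_t ch)
{
  assert (ch < (UINT32_C (1) << 28));
  return (int) (ch << 3) | (count & 7);
}

/* Empty the queue the patched way ("&= 7" instead of "= 0").  */
static int
clear_queue_new (int count)
{
  return count & 7;
}

/* Patched EMIT_SHIFT_TO_INIT test: only bits 3 and up mean that a
   character is still pending.  */
static int
char_pending (int count)
{
  return (count >> 3) != 0;
}

/* Extract the queued character from bits 3 and up.  */
static uint32_t
queued_char (int count)
{
  return (uint32_t) count >> 3;
}
```

Starting from a state value of 1 (some nonzero low-order state), queue_char_old (1, 0x0304) returns 0x1820 and the low bits are lost, while queue_char_new (1, 0x0304) returns 0x1821. Likewise, the pre-patch EMIT_SHIFT_TO_INIT test (__count != 0) would treat a state of 1 as a pending character, whereas char_pending (1) is false.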

    BZ: https://sourceware.org/bugzilla/show_bug.cgi?id=25744

diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
index a28b18a5ec..d12389b2e3 100644
--- a/iconvdata/big5hkscs.c
+++ b/iconvdata/big5hkscs.c
@@ -17769,7 +17769,7 @@ static struct
    the output state to the initial state. This has to be done during the
    flushing. */
 #define EMIT_SHIFT_TO_INIT \
-  if (data->__statep->__count != 0) \
+  if ((data->__statep->__count >> 3) != 0) \
     { \
       if (FROM_DIRECTION) \
         { \
@@ -17778,7 +17778,7 @@ static struct
               /* Write out the last character. */ \
               *((uint32_t *) outbuf) = data->__statep->__count >> 3; \
               outbuf += sizeof (uint32_t); \
-              data->__statep->__count = 0; \
+              data->__statep->__count &= 7; \
             } \
           else \
             /* We don't have enough room in the output buffer. */ \
@@ -17792,7 +17792,7 @@ static struct
               uint32_t lasttwo = data->__statep->__count >> 3; \
               *outbuf++ = (lasttwo >> 8) & 0xff; \
               *outbuf++ = lasttwo & 0xff; \
-              data->__statep->__count = 0; \
+              data->__statep->__count &= 7; \
             } \
           else \
             /* We don't have enough room in the output buffer. */ \
@@ -17878,7 +17878,7 @@ static struct
 \
             /* Otherwise store only the first character now, and \
                put the second one into the queue. */ \
-            *statep = ch2 << 3; \
+            *statep = (ch2 << 3) | (*statep & 7); \
             /* Tell the caller why we terminate the loop. */ \
             result = __GCONV_FULL_OUTPUT; \
             break; \
@@ -17895,7 +17895,7 @@ static struct
       } \
     else \
       /* Clear the queue and proceed to output the saved character. */ \
-      *statep = 0; \
+      *statep &= 7; \
 \
     put32 (outptr, ch); \
     outptr += 4; \
@@ -17946,7 +17946,7 @@ static struct
           } \
         *outptr++ = (ch >> 8) & 0xff; \
         *outptr++ = ch & 0xff; \
-        *statep = 0; \
+        *statep &= 7; \
         inptr += 4; \
         continue; \
 \
@@ -17959,7 +17959,7 @@ static struct
           } \
         *outptr++ = (lasttwo >> 8) & 0xff; \
         *outptr++ = lasttwo & 0xff; \
-        *statep = 0; \
+        *statep &= 7; \
         continue; \
       } \
 \
@@ -17996,7 +17996,7 @@ static struct
         /* Check for possible combining character. */ \
         if (__glibc_unlikely (ch == 0xca || ch == 0xea)) \
           { \
-            *statep = ((cp[0] << 8) | cp[1]) << 3; \
+            *statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7); \
             inptr += 4; \
             continue; \
           } \
diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
index 9601b6c1d9..e1472dc2e2 100644
--- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
+++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
@@ -128,6 +128,71 @@ check_conversion (struct testdata test)
       printf ("error: Result of third conversion was wrong.\n");
       err++;
     }
+
+  /* Now perform the same test as above consuming one byte at a time. */
+  mbs = test.input;
+  memset (&st, 0, sizeof (st));
+
+  /* Consume the first byte; expect an incomplete multibyte character. */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != -2)
+    {
+      printf ("error: First byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the first consumed byte. */
+  mbs += 1;
+  /* Consume the second byte; expect the first wchar_t. */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Second byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the second consumed byte. */
+  mbs += 1;
+  if (wc != test.expected[0])
+    {
+      printf ("error: Result of first wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* Consume no bytes; expect the second wchar_t. */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 0)
+    {
+      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Do not advance past the third byte. */
+  mbs += 0;
+  if (wc != test.expected[1])
+    {
+      printf ("error: Result of second wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* After the second wchar_t conversion, the converter should be in
+     the initial state since the two input BIG5-HKSCS bytes have been
+     consumed and the two wchar_t's have been output. */
+  if (mbsinit (&st) == 0)
+    {
+      printf ("error: Converter not in initial state.\n");
+      err++;
+    }
+  /* Consume the third byte; expect the third wchar_t. */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the third consumed byte. */
+  mbs += 1;
+  if (wc != test.expected[2])
+    {
+      printf ("error: Result of third wchar_t conversion was wrong.\n");
+      err++;
+    }
+
   /* Return 0 if we saw no errors. */
   return err;
 }

From patchwork Sun Feb 27 16:53:23 2022
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 51411
Message-ID: <57610f50-dd95-fd32-1102-5f1cda440891@honermann.net>
Date: Sun, 27 Feb 2022 11:53:23 -0500
Subject: [PATCH 2/3]: C++20 P0482R6 and C2X N2653: Implement mbrtoc8, c8rtomb, char8_t
To: libc-alpha
From: Tom Honermann

This patch provides implementations for the mbrtoc8 and c8rtomb functions
adopted for C++20 via WG21 P0482R6 [1] and for C2X via WG14 N2653 [2]. It
also provides the char8_t typedef adopted for C2X via WG14 N2653 [2].

The mbrtoc8 and c8rtomb functions are declared in uchar.h in C2X mode or
when the _GNU_SOURCE macro or C++20 __cpp_char8_t feature test macro is
defined. The char8_t typedef is declared in uchar.h in C2X mode or when
the _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature test
macro is not defined (if __cpp_char8_t is defined, then char8_t is a
builtin type).

Tested on Linux x86_64.

Tom.
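The lead and trailing code unit checks in the proposed c8rtomb amount to an incremental UTF-8 decoder. The standalone sketch below reproduces those checks outside of glibc so they can be exercised in isolation. It is not the patch's implementation: struct u8_decoder, u8_feed, U8_NEED_MORE and U8_INVALID are hypothetical names, the buffered code units live in a plain struct rather than in mbstate_t.__value, and the sketch returns the decoded code point instead of passing it to wcrtomb.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder state, standing in for what c8rtomb keeps in
   mbstate_t: up to three buffered code units plus a count.  */
struct u8_decoder
{
  unsigned char units[3];
  int seen;                  /* Number of buffered code units, 0-3.  */
};

enum { U8_NEED_MORE = -2, U8_INVALID = -1 };

/* Total sequence length implied by a (valid) lead code unit.  */
static int
u8_seq_len (unsigned char cu1)
{
  return cu1 < 0x80 ? 1 : cu1 < 0xE0 ? 2 : cu1 < 0xF0 ? 3 : 4;
}

/* Feed one code unit.  Returns a code point (>= 0) when a sequence is
   complete, U8_NEED_MORE if more units are needed, or U8_INVALID (after
   resetting the decoder) for an ill-formed sequence.  */
static int32_t
u8_feed (struct u8_decoder *d, unsigned char c8)
{
  assert (d->seen >= 0 && d->seen <= 3);
  if (d->seen == 0)
    {
      /* Same lead check as c8rtomb: 0x80-0xC1 and 0xF5-0xFF are invalid.  */
      if ((c8 >= 0x80 && c8 <= 0xC1) || c8 >= 0xF5)
        return U8_INVALID;
      if (c8 < 0x80)
        return c8;           /* Single code unit (ASCII).  */
      d->units[0] = c8;
      d->seen = 1;
      return U8_NEED_MORE;
    }

  unsigned char cu1 = d->units[0];
  int bad = c8 < 0x80 || c8 > 0xBF;
  if (d->seen == 1)
    /* Second-unit limits reject overlong forms, surrogates and
       values above U+10FFFF.  */
    bad = bad || (cu1 == 0xE0 && c8 < 0xA0) || (cu1 == 0xED && c8 > 0x9F)
          || (cu1 == 0xF0 && c8 < 0x90) || (cu1 == 0xF4 && c8 > 0x8F);
  if (bad)
    {
      d->seen = 0;
      return U8_INVALID;
    }

  int len = u8_seq_len (cu1);
  if (d->seen + 1 < len)
    {
      d->units[d->seen++] = c8;  /* Buffer a non-final trailing unit.  */
      return U8_NEED_MORE;
    }

  /* Final code unit: combine the payload bits as c8rtomb does.  */
  int32_t cp;
  switch (len)
    {
    case 2:
      cp = ((cu1 & 0x1F) << 6) | (c8 & 0x3F);
      break;
    case 3:
      cp = ((cu1 & 0x0F) << 12) | ((d->units[1] & 0x3F) << 6) | (c8 & 0x3F);
      break;
    default:
      cp = ((int32_t) (cu1 & 0x07) << 18) | ((d->units[1] & 0x3F) << 12)
           | ((d->units[2] & 0x3F) << 6) | (c8 & 0x3F);
      break;
    }
  d->seen = 0;
  return cp;
}
```

For example, feeding 0xE2, 0x82, 0xAC yields U8_NEED_MORE twice and then 0x20AC. One behavioural difference to keep in mind: on an ill-formed sequence this sketch resets its state, whereas c8rtomb sets errno to EILSEQ and returns (size_t) -1; and for an incomplete sequence c8rtomb returns 0 (no bytes written) where the sketch returns U8_NEED_MORE.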

[1]: WG21 P0482R6
     "char8_t: A type for UTF-8 characters and strings (Revision 6)"
     https://wg21.link/p0482r6
[2]: WG14 N2653
     "char8_t: A type for UTF-8 characters and strings (Revision 1)"
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

commit d29ff21403913f7014fe3deebe65c9294e6c1d44
Author: Tom Honermann
Date:   Sun Feb 27 10:28:58 2022 -0500

    Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.

    This change provides implementations for the mbrtoc8 and c8rtomb
    functions adopted for C++20 via WG21 P0482R6 and for C2X via WG14
    N2653. It also provides the char8_t typedef from WG14 N2653.

    The mbrtoc8 and c8rtomb functions are declared in uchar.h in C2X mode
    or when the _GNU_SOURCE macro or C++20 __cpp_char8_t feature test
    macro is defined. The char8_t typedef is declared in uchar.h in C2X
    mode or when the _GNU_SOURCE macro is defined and the C++20
    __cpp_char8_t feature test macro is not defined (if __cpp_char8_t is
    defined, then char8_t is a builtin type).

diff --git a/NEWS b/NEWS
index 626eeabf5d..3c4b597ab3 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,15 @@ Version 2.36
 
 Major new features:
 
+* Support for the mbrtoc8 and c8rtomb multibyte/UTF-8 character conversion
+  functions has been added per the ISO C2X N2653 and C++20 P0482R6 proposals.
+  Support for the char8_t typedef has been added per the ISO C2X N2653
+  proposal. The functions are declared in uchar.h in C2X mode or when the
+  _GNU_SOURCE macro or C++20 __cpp_char8_t feature test macro is defined.
+  The char8_t typedef is declared in uchar.h in C2X mode or when the
+  _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature test macro
+  is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
+
   [Add new features here]
 
 Deprecated and removed features, and other changes affecting compatibility:
diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
index 4dc87e9061..66fb0e28fa 100644
--- a/sysdeps/mach/hurd/i386/libc.abilist
+++ b/sysdeps/mach/hurd/i386/libc.abilist
@@ -2289,6 +2289,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 close_range F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 1b63d9e447..82dfae3421 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2616,3 +2616,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index e7e4cf7d2a..56d9094a58 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2713,6 +2713,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index bc3d228e31..fb7fc117f3 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2377,3 +2377,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index db7039c4ab..f7c3b87bc0 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -496,6 +496,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index d2add4fb49..19f44510ee 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -493,6 +493,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 355d72a30c..fce3ee63ea 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2652,3 +2652,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 3df39bb28c..307b77c25b 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2601,6 +2601,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index c4da358f80..99f3b0dde1 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2785,6 +2785,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 241bac70ea..3fe18c1bea 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2551,6 +2551,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 78bf372b72..aee8cb576b 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -497,6 +497,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 00df5c901f..9b204c5cf9 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2728,6 +2728,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index e8118569c3..6ec81952e1 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2701,3 +2701,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index c0d2373e64..7b484a8419 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2698,3 +2698,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 2d0fd04f54..26e4321f80 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2693,6 +2693,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index e39ccfb312..2c5e66e2c0 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2691,6 +2691,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 1e900f86e4..f2e0a4818a 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2699,6 +2699,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 9145ba7931..e0505441f6 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2602,6 +2602,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index e95d60d926..d0260261e2 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2740,3 +2740,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index ca934e374b..506853ee25 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2123,3 +2123,5 @@ GLIBC_2.35 wprintf F
 GLIBC_2.35 write F
 GLIBC_2.35 writev F
 GLIBC_2.35 wscanf F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 3820b9f235..0096aa7a97 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2755,6 +2755,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index 464dc27fcd..5869941ec6 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2788,6 +2788,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 2f7e58747f..dc6277f1d8 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2510,6 +2510,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 4f3043d913..fd8adceee6 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2812,3 +2812,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index 84b6ac815a..2f0b08fe30 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2379,3 +2379,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 4d5c19c56a..ad49621c76 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2579,3 +2579,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 7c5ee8d569..ab25e964f1 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2753,6 +2753,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 50de0b46cf..3851645690 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2547,6 +2547,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 66fba013ca..b606bad5dc 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2608,6 +2608,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 38703f8aa0..9b7e03a290 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2605,6 +2605,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 6df55eb765..04651211d7 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2748,6 +2748,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index b90569d881..5ce2c90903 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2574,6 +2574,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index e88b0f101f..85a7fb3152 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2525,6 +2525,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index e0755272eb..975e78eaa5 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2631,3 +2631,5 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index df9a85f4a9..bda281ad70 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -42,7 +42,7 @@ routines := wcscat wcschr wcscmp wcscpy wcscspn wcsdup wcslen wcsncat \
             wcsmbsload mbsrtowcs_l \
             isoc99_wscanf isoc99_vwscanf isoc99_fwscanf isoc99_vfwscanf \
             isoc99_swscanf isoc99_vswscanf \
-            mbrtoc16 c16rtomb mbrtoc32 c32rtomb
+            mbrtoc8 c8rtomb mbrtoc16 c16rtomb mbrtoc32 c32rtomb
 
 strop-tests := wcscmp wcsncmp wmemcmp wcslen wcschr wcsrchr wcscpy wcsnlen \
                wcpcpy wcsncpy wcpncpy wcscat wcsncat wcschrnul wcsspn wcspbrk \
diff --git a/wcsmbs/Versions b/wcsmbs/Versions
index 0b31c1b940..ec28acfb73 100644
--- a/wcsmbs/Versions
+++ b/wcsmbs/Versions
@@ -49,4 +49,7 @@ libc {
     wcstof32; wcstof64; wcstof32x;
     wcstof32_l; wcstof64_l; wcstof32x_l;
   }
+  GLIBC_2.36 {
+    c8rtomb; mbrtoc8;
+  }
 }
diff --git a/wcsmbs/c8rtomb.c b/wcsmbs/c8rtomb.c
new file mode 100644
index 0000000000..8a5ffeab23
--- /dev/null
+++ b/wcsmbs/c8rtomb.c
@@ -0,0 +1,132 @@
+/* UTF-8 to multibyte conversion.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+#include
+#include
+
+
+/* This is the private state used if PS is NULL. */
+static mbstate_t state;
+
+size_t
+c8rtomb (char *s, char8_t c8, mbstate_t *ps)
+{
+  /* This implementation depends on the converter invoked by wcrtomb not
+     needing to retain state in either the top most bit of ps->__count or
+     in ps->__value between invocations. This implementation uses the
+     top most bit of ps->__count to indicate that trailing code units are
+     expected and uses ps->__value to store previously seen code units. */
+
+  wchar_t wc;
+
+  if (ps == NULL)
+    ps = &state;
+
+  if (s == NULL)
+    {
+      /* If 's' is a null pointer, behave as if u8'\0' was passed as 'c8'. If
+         this occurs for an incomplete code unit sequence, then an error will
+         be reported below. */
+      c8 = u8""[0];
+    }
+
+  if (! (ps->__count & 0x80000000))
+    {
+      /* Initial state. */
+      if ((c8 >= 0x80 && c8 <= 0xC1) || c8 >= 0xF5)
+        {
+          /* An invalid lead code unit. */
+          __set_errno (EILSEQ);
+          return -1;
+        }
+      if (c8 >= 0xC2)
+        {
+          /* A valid lead code unit. */
+          ps->__count |= 0x80000000;
+          ps->__value.__wchb[0] = c8;
+          ps->__value.__wchb[3] = 1;
+          return 0;
+        }
+      /* A single byte (ASCII) code unit. */
+      wc = c8;
+    }
+  else
+    {
+      char8_t cu1 = ps->__value.__wchb[0];
+      if (ps->__value.__wchb[3] == 1)
+        {
+          /* A single lead code unit was previously seen. */
+          if ((c8 < 0x80 || c8 > 0xBF) ||
+              (cu1 == 0xE0 && c8 < 0xA0) ||
+              (cu1 == 0xED && c8 > 0x9F) ||
+              (cu1 == 0xF0 && c8 < 0x90) ||
+              (cu1 == 0xF4 && c8 > 0x8F))
+            {
+              /* An invalid second code unit. */
+              __set_errno (EILSEQ);
+              return -1;
+            }
+          if (cu1 >= 0xE0)
+            {
+              /* A three or four code unit sequence. */
+              ps->__value.__wchb[1] = c8;
+              ++ps->__value.__wchb[3];
+              return 0;
+            }
+          wc = ((cu1 & 0x1F) << 6)
+               + (c8 & 0x3F);
+        }
+      else
+        {
+          char8_t cu2 = ps->__value.__wchb[1];
+          /* A three or four byte code unit sequence. */
+          if (c8 < 0x80 || c8 > 0xBF)
+            {
+              /* An invalid third or fourth code unit. */
+              __set_errno (EILSEQ);
+              return -1;
+            }
+          if (ps->__value.__wchb[3] == 2 && cu1 >= 0xF0)
+            {
+              /* A four code unit sequence. */
+              ps->__value.__wchb[2] = c8;
+              ++ps->__value.__wchb[3];
+              return 0;
+            }
+          if (cu1 < 0xF0)
+            {
+              wc = ((cu1 & 0x0F) << 12)
+                   + ((cu2 & 0x3F) << 6)
+                   + (c8 & 0x3F);
+            }
+          else
+            {
+              char8_t cu3 = ps->__value.__wchb[2];
+              wc = ((cu1 & 0x07) << 18)
+                   + ((cu2 & 0x3F) << 12)
+                   + ((cu3 & 0x3F) << 6)
+                   + (c8 & 0x3F);
+            }
+        }
+      ps->__count &= 0x7fffffff;
+      ps->__value.__wch = 0;
+    }
+
+  return wcrtomb (s, wc, ps);
+}
diff --git a/wcsmbs/mbrtoc8.c b/wcsmbs/mbrtoc8.c
new file mode 100644
index 0000000000..8ca457088d
--- /dev/null
+++ b/wcsmbs/mbrtoc8.c
@@ -0,0 +1,126 @@
+/* Multibyte to UTF-8 conversion.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#ifndef EILSEQ
+# define EILSEQ EINVAL
+#endif
+
+
+/* This is the private state used if PS is NULL. */
+static mbstate_t state;
+
+size_t
+mbrtoc8 (char8_t *pc8, const char *s, size_t n, mbstate_t *ps)
+{
+  /* This implementation depends on the converter invoked by mbrtowc() not
+     needing to retain state in either the top most bit of ps->__count or
+     in ps->__value between invocations. This implementation uses the
+     top most bit of ps->__count to indicate that trailing code units are
+     yet to be written and uses ps->__value to store those code units.
*/ + + if (ps == NULL) + ps = &state; + + /* If state indicates that trailing code units are yet to be written, write + those first regardless of whether 's' is a null pointer. */ + if (ps->__count & 0x80000000) + { + /* ps->__value.__wchb[3] stores the index of the next code unit to + write. Code units are stored in reverse order. */ + size_t i = ps->__value.__wchb[3]; + if (pc8 != NULL) + { + *pc8 = ps->__value.__wchb[i]; + } + if (i == 0) + { + ps->__count &= 0x7fffffff; + ps->__value.__wch = 0; + } + else + --ps->__value.__wchb[3]; + return -3; + } + + if (s == NULL) + { + /* if 's' is a null pointer, behave as if a null pointer was passed for + 'pc8', an empty string was passed for 's', and 1 passed for 'n'. */ + pc8 = NULL; + s = ""; + n = 1; + } + + wchar_t wc; + size_t result; + + result = mbrtowc(&wc, s, n, ps); + if (result <= n) + { + if (wc <= 0x7F) + { + if (pc8 != NULL) + *pc8 = wc; + } + else if (wc <= 0x7FF) + { + if (pc8 != NULL) + *pc8 = 0xC0 + ((wc >> 6) & 0x1F); + ps->__value.__wchb[0] = 0x80 + (wc & 0x3F); + ps->__value.__wchb[3] = 0; + ps->__count |= 0x80000000; + } + else if (wc <= 0xFFFF) + { + if (pc8 != NULL) + *pc8 = 0xE0 + ((wc >> 12) & 0x0F); + ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F); + ps->__value.__wchb[0] = 0x80 + (wc & 0x3F); + ps->__value.__wchb[3] = 1; + ps->__count |= 0x80000000; + } + else if (wc <= 0x10FFFF) + { + if (pc8 != NULL) + *pc8 = 0xF0 + ((wc >> 18) & 0x07); + ps->__value.__wchb[2] = 0x80 + ((wc >> 12) & 0x3F); + ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F); + ps->__value.__wchb[0] = 0x80 + (wc & 0x3F); + ps->__value.__wchb[3] = 2; + ps->__count |= 0x80000000; + } + } + if (result == 0 && wc != 0) + { + /* mbrtowc() never returns -3. When a MB sequence converts to multiple + WCs, no input is consumed when writing the subsequent WCs resulting + in a result of 0 even if a null character wasn't written. 
*/ + result = -3; + } + + return result; +} diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h index 051cdcbeb5..01aaf9c94b 100644 --- a/wcsmbs/uchar.h +++ b/wcsmbs/uchar.h @@ -31,6 +31,13 @@ #include #include +/* Declare the C2x char8_t typedef in _GNU_SOURCE and C2x modes, but only + if the C++ __cpp_char8_t feature test macro is not defined. */ +#if (defined __USE_GNU || __GLIBC_USE (ISOC2X)) && !defined __cpp_char8_t +/* Define the 8-bit character type. */ +typedef unsigned char char8_t; +#endif + #ifndef __USE_ISOCXX11 /* Define the 16-bit and 32-bit character types. */ typedef __uint_least16_t char16_t; @@ -40,6 +47,20 @@ typedef __uint_least32_t char32_t; __BEGIN_DECLS +/* Declare the C2x mbrtoc8() and c8rtomb() functions in _GNU_SOURCE and C2x + modes or if the C++ __cpp_char8_t feature test macro is defined. */ +#if defined __USE_GNU || __GLIBC_USE (ISOC2X) || defined __cpp_char8_t +/* Write char8_t representation of multibyte character pointed + to by S to PC8. */ +extern size_t mbrtoc8 (char8_t *__restrict __pc8, + const char *__restrict __s, size_t __n, + mbstate_t *__restrict __p) __THROW; + +/* Write multibyte representation of char8_t C8 to S. */ +extern size_t c8rtomb (char *__restrict __s, char8_t __c8, + mbstate_t *__restrict __ps) __THROW; +#endif + /* Write char16_t representation of multibyte character pointed to by S to PC16. 
*/ extern size_t mbrtoc16 (char16_t *__restrict __pc16,

From patchwork Sun Feb 27 16:53:28 2022
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 51412
Message-ID: <2c218996-a320-9520-2ed0-8797c109ec19@honermann.net>
Date: Sun, 27 Feb 2022 11:53:28 -0500
From: Tom Honermann
To: libc-alpha
Subject: [PATCH 3/3]: C++20 P0482R6 and C2X N2653: Tests for mbrtoc8, c8rtomb, char8_t

This patch adds tests for the mbrtoc8 and c8rtomb functions adopted for C++20 via WG21 P0482R6 [1] and for C2X via WG14 N2653 [2], and for the char8_t typedef adopted for C2X via WG14 N2653 [2].

The tests for mbrtoc8 and c8rtomb specifically exercise conversion to and from Big5-HKSCS because of special cases that arise with that encoding. Big5-HKSCS defines some double-byte sequences that convert to more than one Unicode code point. In order to test this, the locale dependencies for running tests under wcsmbs are expanded to include zh_HK.BIG5-HKSCS.

Tested on Linux x86_64.

Tom.

[1]: WG21 P0482R6 "char8_t: A type for UTF-8 characters and strings (Revision 6)" https://wg21.link/p0482r6
[2]: WG14 N2653 "char8_t: A type for UTF-8 characters and strings (Revision 1)" http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

commit 1fb259ae6b5da6865140c203a484b0735fc152d0
Author: Tom Honermann
Date: Sun Feb 27 10:39:00 2022 -0500

Tests for mbrtoc8(), c8rtomb(), and the char8_t typedef.

This change adds tests for the mbrtoc8 and c8rtomb functions adopted for C++20 via WG21 P0482R6 and for C2X via WG14 N2653, and for the char8_t typedef adopted for C2X via WG14 N2653.

The tests for mbrtoc8 and c8rtomb specifically exercise conversion to and from Big5-HKSCS because of special cases that arise with that encoding.
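A sketch of the multi-code-point case may help. When one Big5-HKSCS character converts to two code points, mbrtoc8 has to hand back all of their UTF-8 code units one call at a time. The standalone C model below is not glibc code and does not call mbrtoc8; the helpers `encode_utf8` and `model_mbrtoc8_calls` are hypothetical. It assumes the behavior described in the comments of the mbrtoc8.c implementation in this series: the first call reports the number of bytes consumed, and every later code unit, including the lead unit of the second code point (for which mbrtowc consumes no input and returns 0), is reported as (size_t) -3. The example input, the byte pair 0x88 0x62 converting to U+00CA U+0304, is taken from test-c8rtomb.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point CP as UTF-8 into OUT, lead code
   unit first.  Returns the number of code units (1-4), or 0 for a
   surrogate or a value above U+10FFFF.  Uses the same bit layout as the
   mbrtoc8 implementation in this series.  */
static size_t
encode_utf8 (uint32_t cp, unsigned char out[4])
{
  if (cp <= 0x7F)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp <= 0x7FF)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;
  if (cp <= 0xFFFF)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}

/* Hypothetical model of successive mbrtoc8 calls on one multibyte
   character that is CONSUMED bytes long and converts to the NCP non-null
   code points in CPS.  Records each call's code unit in UNITS and its
   return value in RESULTS (at most MAX calls) and returns the number of
   calls.  Only the first call reports CONSUMED; every later code unit,
   including the lead unit of the second code point, is reported as
   (size_t) -3.  */
static size_t
model_mbrtoc8_calls (const uint32_t *cps, size_t ncp, size_t consumed,
                     unsigned char *units, size_t *results, size_t max)
{
  /* Caller must supply the output arrays.  */
  assert (units != NULL && results != NULL);

  size_t calls = 0;
  for (size_t i = 0; i < ncp; ++i)
    {
      unsigned char buf[4];
      size_t len = encode_utf8 (cps[i], buf);
      for (size_t j = 0; j < len && calls < max; ++j)
        {
          units[calls] = buf[j];
          results[calls] = (i == 0 && j == 0) ? consumed : (size_t) -3;
          ++calls;
        }
    }
  return calls;
}
```

Under these assumptions, `model_mbrtoc8_calls` with code points {0x00CA, 0x0304} and 2 consumed bytes records four calls returning 2, (size_t) -3, (size_t) -3, (size_t) -3, with code units 0xC3 0x8A 0xCC 0x84. Those four units are the UTF-8 form of u8"\u00CA\u0304" that test-c8rtomb.c passes to c8rtomb one at a time before it returns 2 for 0x88 0x62. The model deliberately ignores invalid input and the null character, which the real function handles separately.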
Big5-HKSCS defines some double-byte sequences that convert to more than one Unicode code point. In order to test this, the locale dependencies for running tests under wcsmbs are expanded to include zh_HK.BIG5-HKSCS.

diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index bda281ad70..9b8445fa48 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -52,6 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
 	 tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
 	 tst-wcstod-round test-char-types tst-fgetwc-after-eof \
 	 tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \
+	 test-char8-type test-mbrtoc8 test-c8rtomb \
 	 $(addprefix test-,$(strop-tests)) tst-mbstowcs \
 	 tst-wprintf-binary
 
@@ -59,7 +60,7 @@ include ../Rules
 
 ifeq ($(run-built-tests),yes)
 LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 hr_HR.ISO-8859-2 \
-	   ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9
+	   ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9 zh_HK.BIG5-HKSCS
 include ../gen-locales.mk
 
 $(objpfx)tst-btowc.out: $(gen-locales)
diff --git a/wcsmbs/test-c8rtomb.c b/wcsmbs/test-c8rtomb.c
new file mode 100644
index 0000000000..14564fa00a
--- /dev/null
+++ b/wcsmbs/test-c8rtomb.c
@@ -0,0 +1,519 @@
+/* Test c8rtomb.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
. */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static int +test_truncated_code_unit_sequence (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Missing trailing code unit for a two byte code unit sequence. */ + u8s = (const char8_t*) u8"\xC2"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* No trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Missing first trailing code unit for a three byte code unit sequence. */ + u8s = (const char8_t*) u8"\xE0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* No trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Missing second trailing code unit for a three byte code unit sequence. */ + u8s = (const char8_t*) u8"\xE0\xA0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1); /* No trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Missing first trailing code unit for a four byte code unit sequence. */ + u8s = (const char8_t*) u8"\xF0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* No trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Missing second trailing code unit for a four byte code unit sequence.
*/ + u8s = (const char8_t*) u8"\xF0\x90"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1); /* No trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Missing third trailing code unit for a four byte code unit sequence. */ + u8s = (const char8_t*) u8"\xF0\x90\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); /* 3rd byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1); /* No trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + return 0; +} + +static int +test_invalid_trailing_code_unit_sequence (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Invalid trailing code unit for a two byte code unit sequence. */ + u8s = (const char8_t*) u8"\xC2\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Invalid first trailing code unit for a three byte code unit sequence. */ + u8s = (const char8_t*) u8"\xE0\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Invalid second trailing code unit for a three byte code unit sequence.
*/ + u8s = (const char8_t*) u8"\xE0\xA0\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Invalid first trailing code unit for a four byte code unit sequence. */ + u8s = (const char8_t*) u8"\xF0\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Invalid second trailing code unit for a four byte code unit sequence. */ + u8s = (const char8_t*) u8"\xF0\x90\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Invalid third trailing code unit for a four byte code unit sequence. 
*/ + u8s = (const char8_t*) u8"\xF0\x90\x80\xC0"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); /* 3rd byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + return 0; +} + +static int +test_lone_trailing_code_units (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Lone trailing code unit. */ + u8s = (const char8_t*) u8"\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1); /* Lone trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + return 0; +} + +static int +test_overlong_encoding (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Two byte overlong encoding. */ + u8s = (const char8_t*) u8"\xC0\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1); /* Invalid lead code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Two byte overlong encoding. */ + u8s = (const char8_t*) u8"\xC1\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1); /* Invalid lead code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Three byte overlong encoding. */ + u8s = (const char8_t*) u8"\xE0\x9F\xBF"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* First byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Four byte overlong encoding. 
*/ + u8s = (const char8_t*) u8"\xF0\x8F\xBF\xBF"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* First byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + return 0; +} + +static int +test_surrogate_range (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Would encode U+D800. */ + u8s = (const char8_t*) u8"\xED\xA0\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* First byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Would encode U+DFFF. */ + u8s = (const char8_t*) u8"\xED\xBF\xBF"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* First byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + return 0; +} + +static int +test_out_of_range_encoding (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Would encode U+00110000. */ + u8s = (const char8_t*) u8"\xF4\x90\x80\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* First byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1); /* Invalid trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + /* Would encode U+00150000.
*/ + u8s = (const char8_t*) u8"\xF5\x90\x80\x80"; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + errno = 0; + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1); /* Invalid lead code unit */ + TEST_COMPARE (errno, EILSEQ); + + return 0; +} + +static int +test_null_output_buffer (void) +{ + char buf[MB_LEN_MAX]; + mbstate_t s; + + /* Null character with an initial state. */ + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (NULL, u8"X"[0], &s), (size_t) 1); /* null byte processed */ + TEST_VERIFY (mbsinit (&s)); /* Assert the state is now an initial state. */ + + /* Null buffer with a state corresponding to an incompletely read code + unit sequence. In this case, an error occurs since insufficient + information is available to complete the already started code unit + sequence and return to the initial state. */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8"\xC2"[0], &s), (size_t) 0); /* 1st byte processed */ + errno = 0; + TEST_COMPARE (c8rtomb (NULL, u8"\x80"[0], &s), (size_t) -1); /* No trailing code unit */ + TEST_COMPARE (errno, EILSEQ); + + return 0; +} + +static int +test_utf8 (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + TEST_VERIFY_EXIT (setlocale (LC_ALL, "de_DE.UTF-8") != NULL); + + /* Null character. */ + u8s = (const char8_t*) u8"\x00"; /* U+0000 => 0x00 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1); /* 1st byte processed */ + TEST_COMPARE (buf[0], (char) 0x00); + TEST_VERIFY (mbsinit (&s)); + + /* First non-null character in the code point range that maps to a single + code unit. 
*/ + u8s = (const char8_t*) u8"\x01"; /* U+0001 => 0x01 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1); /* 1st byte processed */ + TEST_COMPARE (buf[0], (char) 0x01); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to a single code unit. */ + u8s = (const char8_t*) u8"\x7F"; /* U+007F => 0x7F */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1); /* 1st byte processed */ + TEST_COMPARE (buf[0], (char) 0x7F); + TEST_VERIFY (mbsinit (&s)); + + /* First character in the code point range that maps to two code units. */ + u8s = (const char8_t*) u8"\xC2\x80"; /* U+0080 => 0xC2 0x80 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2); /* 2nd byte processed */ + TEST_COMPARE (buf[0], (char) 0xC2); + TEST_COMPARE (buf[1], (char) 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to two code units. */ + u8s = (const char8_t*) u8"\u07FF"; /* U+07FF => 0xDF 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2); /* 2nd byte processed */ + TEST_COMPARE (buf[0], (char) 0xDF); + TEST_COMPARE (buf[1], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* First character in the code point range that maps to three code units. 
*/ + u8s = (const char8_t*) u8"\u0800"; /* U+0800 => 0xE0 0xA0 0x80 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); /* 3rd byte processed */ + TEST_COMPARE (buf[0], (char) 0xE0); + TEST_COMPARE (buf[1], (char) 0xA0); + TEST_COMPARE (buf[2], (char) 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to three code units + before the surrogate code point range. */ + u8s = (const char8_t*) u8"\uD7FF"; /* U+D7FF => 0xED 0x9F 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); /* 3rd byte processed */ + TEST_COMPARE (buf[0], (char) 0xED); + TEST_COMPARE (buf[1], (char) 0x9F); + TEST_COMPARE (buf[2], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* First character in the code point range that maps to three code units + after the surrogate code point range. */ + u8s = (const char8_t*) u8"\uE000"; /* U+E000 => 0xEE 0x80 0x80 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); /* 3rd byte processed */ + TEST_COMPARE (buf[0], (char) 0xEE); + TEST_COMPARE (buf[1], (char) 0x80); + TEST_COMPARE (buf[2], (char) 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Not a BOM. 
*/ + u8s = (const char8_t*) u8"\uFEFF"; /* U+FEFF => 0xEF 0xBB 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); /* 3rd byte processed */ + TEST_COMPARE (buf[0], (char) 0xEF); + TEST_COMPARE (buf[1], (char) 0xBB); + TEST_COMPARE (buf[2], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* Replacement character. */ + u8s = (const char8_t*) u8"\uFFFD"; /* U+FFFD => 0xEF 0xBF 0xBD */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); /* 3rd byte processed */ + TEST_COMPARE (buf[0], (char) 0xEF); + TEST_COMPARE (buf[1], (char) 0xBF); + TEST_COMPARE (buf[2], (char) 0xBD); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to three code units. */ + u8s = (const char8_t*) u8"\uFFFF"; /* U+FFFF => 0xEF 0xBF 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3); /* 3rd byte processed */ + TEST_COMPARE (buf[0], (char) 0xEF); + TEST_COMPARE (buf[1], (char) 0xBF); + TEST_COMPARE (buf[2], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* First character in the code point range that maps to four code units. 
*/ + u8s = (const char8_t*) u8"\U00010000"; /* U+10000 => 0xF0 0x90 0x80 0x80 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); /* 3rd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4); /* 4th byte processed */ + TEST_COMPARE (buf[0], (char) 0xF0); + TEST_COMPARE (buf[1], (char) 0x90); + TEST_COMPARE (buf[2], (char) 0x80); + TEST_COMPARE (buf[3], (char) 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to four code units. */ + u8s = (const char8_t*) u8"\U0010FFFF"; /* U+10FFFF => 0xF4 0x8F 0xBF 0xBF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); /* 3rd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4); /* 4th byte processed */ + TEST_COMPARE (buf[0], (char) 0xF4); + TEST_COMPARE (buf[1], (char) 0x8F); + TEST_COMPARE (buf[2], (char) 0xBF); + TEST_COMPARE (buf[3], (char) 0xBF); + TEST_VERIFY (mbsinit (&s)); + + return 0; +} + +static int +test_big5_hkscs (void) +{ + const char8_t *u8s; + char buf[MB_LEN_MAX]; + mbstate_t s; + + TEST_VERIFY_EXIT (setlocale (LC_ALL, "zh_HK.BIG5-HKSCS") != NULL); + + /* A pair of two byte UTF-8 code unit sequences that map a Unicode code + point and combining character to a single double byte character. 
*/ + u8s = (const char8_t*) u8"\u00CA\u0304"; /* U+00CA U+0304 => 0x88 0x62 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); /* 3rd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2); /* 4th byte processed */ + TEST_COMPARE (buf[0], (char) 0x88); + TEST_COMPARE (buf[1], (char) 0x62); + TEST_VERIFY (mbsinit (&s)); + + /* Another pair of two byte UTF-8 code unit sequences that map a Unicode code + point and combining character to a single double byte character. */ + u8s = (const char8_t*) u8"\u00EA\u030C"; /* U+00EA U+030C => 0x88 0xA5 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0); /* 1st byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0); /* 2nd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0); /* 3rd byte processed */ + TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2); /* 4th byte processed */ + TEST_COMPARE (buf[0], (char) 0x88); + TEST_COMPARE (buf[1], (char) 0xA5); + TEST_VERIFY (mbsinit (&s)); + + return 0; +} + +static int +do_test (void) +{ + test_truncated_code_unit_sequence (); + test_invalid_trailing_code_unit_sequence (); + test_lone_trailing_code_units (); + test_overlong_encoding (); + test_surrogate_range (); + test_out_of_range_encoding (); + test_null_output_buffer (); + test_utf8 (); + test_big5_hkscs (); + return 0; +} + +#include diff --git a/wcsmbs/test-char8-type.c b/wcsmbs/test-char8-type.c new file mode 100644 index 0000000000..642c7044ed --- /dev/null +++ b/wcsmbs/test-char8-type.c @@ -0,0 +1,31 @@ +/* Test the char8_t typedef. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include + +/* Verify that the char8_t type is recognized. */ +char8_t c8; + +static int +do_test (void) +{ + /* This is a compilation test. */ + return 0; +} + +#include diff --git a/wcsmbs/test-mbrtoc8.c b/wcsmbs/test-mbrtoc8.c new file mode 100644 index 0000000000..c5061635ac --- /dev/null +++ b/wcsmbs/test-mbrtoc8.c @@ -0,0 +1,462 @@ +/* Test mbrtoc8. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include +#include +#include + +static int +test_utf8 (void) +{ + const char *mbs; + char8_t buf[1]; + mbstate_t s; + + TEST_VERIFY_EXIT (setlocale (LC_ALL, "de_DE.UTF-8") != NULL); + + /* No inputs. 
*/ + mbs = ""; + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 0, &s), (size_t) -2); /* no input */ + TEST_VERIFY (mbsinit (&s)); + + /* Null character. */ + mbs = "\x00"; /* 0x00 => U+0000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 0); /* null byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0x00); + TEST_VERIFY (mbsinit (&s)); + + /* First non-null character in the code point range that maps to a single + code unit. */ + mbs = "\x01"; /* 0x01 => U+0001 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1); /* 1st byte processed */ + mbs += 1; + TEST_COMPARE (buf[0], 0x01); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to a single code unit. */ + mbs = "\x7F"; /* 0x7F => U+007F */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1); /* 1st byte processed */ + mbs += 1; + TEST_COMPARE (buf[0], 0x7F); + TEST_VERIFY (mbsinit (&s)); + + /* First character in the code point range that maps to two code units. */ + mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2); /* 1st byte written */ + mbs += 2; + TEST_COMPARE (buf[0], 0xC2); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. 
*/ + mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xC2); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to two code units. */ + mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2); /* 1st byte written */ + mbs += 2; + TEST_COMPARE (buf[0], 0xDF); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xDF); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* First character in the code point range that maps to three code units. 
*/ + mbs = "\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3); /* 1st byte written */ + mbs += 3; + TEST_COMPARE (buf[0], 0xE0); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xA0); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xE0); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xA0); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to three code units + before the surrogate code point range. */ + mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3); /* 1st byte written */ + mbs += 3; + TEST_COMPARE (buf[0], 0xED); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x9F); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time.
*/ + mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xED); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x9F); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* First character in the code point range that maps to three code units + after the surrogate code point range. */ + mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3); /* 1st byte written */ + mbs += 3; + TEST_COMPARE (buf[0], 0xEE); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. 
*/ + mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xEE); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* U+FEFF is not treated as a BOM; it converts like any other character. */ + mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3); /* 1st byte written */ + mbs += 3; + TEST_COMPARE (buf[0], 0xEF); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xBB); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xEF); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xBB); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* Replacement character.
*/ + mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3); /* 1st byte written */ + mbs += 3; + TEST_COMPARE (buf[0], 0xEF); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBD); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. */ + mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xEF); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBD); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to three code units. */ + mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3); /* 1st byte written */ + mbs += 3; + TEST_COMPARE (buf[0], 0xEF); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time.
*/ + mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xEF); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* First character in the code point range that maps to four code units. */ + mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4); /* 1st byte written */ + mbs += 4; + TEST_COMPARE (buf[0], 0xF0); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x90); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 4th byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. 
*/ + mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xF0); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x90); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 4th byte written */ + TEST_COMPARE (buf[0], 0x80); + TEST_VERIFY (mbsinit (&s)); + + /* Last character in the code point range that maps to four code units. */ + mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4); /* 1st byte written */ + mbs += 4; + TEST_COMPARE (buf[0], 0xF4); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x8F); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 4th byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. 
*/ + mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xF4); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x8F); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 4th byte written */ + TEST_COMPARE (buf[0], 0xBF); + TEST_VERIFY (mbsinit (&s)); + + return 0; +} + +static int +test_big5_hkscs (void) +{ + const char *mbs; + char8_t buf[1]; + mbstate_t s; + + TEST_VERIFY_EXIT (setlocale (LC_ALL, "zh_HK.BIG5-HKSCS") != NULL); + + /* A double byte character that maps to a pair of two byte UTF-8 code unit + sequences. */ + mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2); /* 1st byte written */ + mbs += 2; + TEST_COMPARE (buf[0], 0xC3); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x8A); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xCC); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 4th byte written */ + TEST_COMPARE (buf[0], 0x84); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. 
*/ + mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xC3); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0x8A); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xCC); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 4th byte written */ + TEST_COMPARE (buf[0], 0x84); + TEST_VERIFY (mbsinit (&s)); + + /* Another double byte character that maps to a pair of two byte UTF-8 code + unit sequences. */ + mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2); /* 1st byte written */ + mbs += 2; + TEST_COMPARE (buf[0], 0xC3); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xAA); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xCC); + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3); /* 4th byte written */ + TEST_COMPARE (buf[0], 0x8C); + TEST_VERIFY (mbsinit (&s)); + + /* Same as last test, but one code unit at a time. 
*/ + mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */ + memset (buf, 0, sizeof (buf)); + memset (&s, 0, sizeof (s)); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); /* incomplete */ + mbs += 1; + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); /* 1st byte written */ + mbs += 1; + TEST_COMPARE (buf[0], 0xC3); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 2nd byte written */ + TEST_COMPARE (buf[0], 0xAA); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 3rd byte written */ + TEST_COMPARE (buf[0], 0xCC); + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); /* 4th byte written */ + TEST_COMPARE (buf[0], 0x8C); + TEST_VERIFY (mbsinit (&s)); + + return 0; +} + +static int +do_test (void) +{ + test_utf8 (); + test_big5_hkscs (); + return 0; +} + +#include
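The mbrtoc8 tests above expect a call that completes a UTF-8 sequence to return the number of bytes it consumed and to store only the first code unit. Each later call then returns (size_t) -3 and delivers one stored unit. The c8rtomb tests expect 0 until the final code unit of a sequence arrives. The following is a minimal sketch of both conventions, for a UTF-8 locale only and for illustration. The names struct u8_sketch_state, accept_unit, sketch_c8rtomb and sketch_mbrtoc8 are hypothetical and are not glibc's implementation. Real code keeps this state in mbstate_t, converts through the current locale (so the BIG5-HKSCS cases are not modeled), sets errno to EILSEQ on error, and handles a null s argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the conversion state glibc keeps in
   mbstate_t.  The state is "initial" when have == 0 and next == 0.  */
struct u8_sketch_state
{
  unsigned char units[4];  /* code units of the current character */
  size_t have;             /* code units collected so far */
  size_t need;             /* total code units in the current character */
  size_t next;             /* next stored unit to hand out (mbrtoc8) */
};

/* Sequence length implied by a UTF-8 lead byte, or 0 if B cannot lead.  */
static size_t
lead_len (unsigned char b)
{
  return b < 0x80 ? 1
         : (b >= 0xC2 && b <= 0xDF) ? 2
         : (b >= 0xE0 && b <= 0xEF) ? 3
         : (b >= 0xF0 && b <= 0xF4) ? 4 : 0;
}

/* Add code unit B to ST.  The RFC 3629 ranges for the second unit reject
   overlong forms, surrogates and values above U+10FFFF.  Returns 1 when
   the character is complete, 0 if more units are needed, -1 on error.  */
static int
accept_unit (struct u8_sketch_state *st, unsigned char b)
{
  if (st->have == 0)
    {
      st->need = lead_len (b);
      if (st->need == 0)
        return -1;
    }
  else
    {
      unsigned char lo = 0x80, hi = 0xBF, lead = st->units[0];
      if (st->have == 1)
        {
          lo = lead == 0xE0 ? 0xA0 : lead == 0xF0 ? 0x90 : 0x80;
          hi = lead == 0xED ? 0x9F : lead == 0xF4 ? 0x8F : 0xBF;
        }
      if (b < lo || b > hi)
        {
          st->have = 0;  /* discard the partial sequence */
          return -1;
        }
    }
  st->units[st->have++] = b;
  return st->have == st->need;
}

/* c8rtomb-like: returns 0 until the final code unit arrives, then copies
   the whole sequence to BUF and returns its length.  */
static size_t
sketch_c8rtomb (char *buf, unsigned char c8, struct u8_sketch_state *st)
{
  int r = accept_unit (st, c8);
  if (r < 0)
    return (size_t) -1;
  if (r == 0)
    return 0;
  memcpy (buf, st->units, st->need);
  st->have = 0;
  return st->need;
}

/* mbrtoc8-like: once a character completes, stores its first code unit
   and returns the bytes consumed by this call (0 for the null character);
   later calls return (size_t) -3 while handing out the stored units.  */
static size_t
sketch_mbrtoc8 (unsigned char *pc8, const char *s, size_t n,
                struct u8_sketch_state *st)
{
  if (st->next != 0)
    {
      *pc8 = st->units[st->next++];
      if (st->next == st->need)
        st->next = st->have = 0;  /* back to the initial state */
      return (size_t) -3;
    }
  for (size_t i = 0; i < n; i++)
    {
      int r = accept_unit (st, (unsigned char) s[i]);
      if (r < 0)
        return (size_t) -1;
      if (r == 1)
        {
          *pc8 = st->units[0];
          if (st->need > 1)
            st->next = 1;  /* more units to hand out with -3 */
          else
            st->have = 0;
          return *pc8 == 0 ? 0 : i + 1;
        }
    }
  return (size_t) -2;  /* incomplete; bytes are kept in ST */
}
```

In this sketch, the second-unit ranges in accept_unit are what make overlong forms (for example 0xE0 0x80), surrogates (0xED 0xA0) and values above U+10FFFF (0xF4 0x90) fail with (size_t) -1. These appear to be the cases that test_overlong_encoding, test_surrogate_range and test_out_of_range_encoding target, although their bodies are not shown here.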