From patchwork Sun Jul 23 17:32:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ahelenia_Ziemia=C5=84ska?= X-Patchwork-Id: 73101 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A84F2385770A for ; Sun, 23 Jul 2023 17:33:08 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A84F2385770A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1690133588; bh=xYPRLYKzVp5Kjx3drehyUUVM+CHoaFzguIeyhOTsJdg=; h=Date:To:Cc:Subject:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=qNCNoO8Z1rCFrDhCCPEFp/dwyIyQF+JzkVm/8Kp22TLJ5J8BrzkbWuVT7pnYpAoJR BNvPoKxOG8XeXl8K3dGO6iAFzYCQ5jdwXw++lDyEGrUME4GMQkYRzSKpBARSe/E1b7 JHJAjeE418sh128siCr7aMfyeUxGH5X06rqK2aFU= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from tarta.nabijaczleweli.xyz (unknown [139.28.40.42]) by sourceware.org (Postfix) with ESMTP id 6E2FF3858D28 for ; Sun, 23 Jul 2023 17:32:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6E2FF3858D28 Received: from tarta.nabijaczleweli.xyz (unknown [192.168.1.250]) by tarta.nabijaczleweli.xyz (Postfix) with ESMTPSA id 8BE0037F0; Sun, 23 Jul 2023 19:32:44 +0200 (CEST) Date: Sun, 23 Jul 2023 19:32:43 +0200 To: Florian Weimer Cc: libc-alpha@sourceware.org, Victor Stinner , Bruno Haible Subject: [PATCH v18 1/3] iconv: __gconv_btwoc_ascii -> __gconv_btowc_ascii Message-ID: MIME-Version: 1.0 Content-Disposition: inline User-Agent: NeoMutt/20230517 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_INFOUSMEBIZ, RDNS_DYNAMIC, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: =?utf-8?b?0L3QsNCxIHZpYSBMaWJjLWFscGhh?= From: =?utf-8?q?Ahelenia_Ziemia=C5=84ska?= Reply-To: =?utf-8?b?0L3QsNCx?= Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The only user of this typo, sans the ChangeLogs. Reported-by: Bruno Haible Signed-off-by: Ahelenia Ziemiańska --- Clean rebase of 1/3. iconv/gconv_builtin.h | 4 ++-- iconv/gconv_int.h | 2 +- iconv/gconv_simple.c | 2 +- wcsmbs/wcsmbsload.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/iconv/gconv_builtin.h b/iconv/gconv_builtin.h index 35608b4461..2f560a924a 100644 --- a/iconv/gconv_builtin.h +++ b/iconv/gconv_builtin.h @@ -52,7 +52,7 @@ BUILTIN_TRANSFORMATION ("INTERNAL", "ISO-10646/UTF8/", 1, "=INTERNAL->utf8", __gconv_transform_internal_utf8, NULL, 4, 4, 1, 6) BUILTIN_TRANSFORMATION ("ISO-10646/UTF8/", "INTERNAL", 1, "=utf8->INTERNAL", - __gconv_transform_utf8_internal, __gconv_btwoc_ascii, + __gconv_transform_utf8_internal, __gconv_btowc_ascii, 1, 6, 4, 4) BUILTIN_ALIAS ("UCS2//", "ISO-10646/UCS2/") @@ -82,7 +82,7 @@ BUILTIN_ALIAS ("CSASCII//", "ANSI_X3.4-1968//") BUILTIN_ALIAS ("OSF00010020//", "ANSI_X3.4-1968//") BUILTIN_TRANSFORMATION ("ANSI_X3.4-1968//", "INTERNAL", 1, "=ascii->INTERNAL", - __gconv_transform_ascii_internal, __gconv_btwoc_ascii, + __gconv_transform_ascii_internal, __gconv_btowc_ascii, 1, 1, 4, 4) BUILTIN_TRANSFORMATION ("INTERNAL", "ANSI_X3.4-1968//", 1, "=INTERNAL->ascii", diff --git a/iconv/gconv_int.h b/iconv/gconv_int.h index 19d042faff..e3baec97f0 100644 --- a/iconv/gconv_int.h +++ b/iconv/gconv_int.h @@ -325,7 +325,7 @@ __BUILTIN_TRANSFORM (__gconv_transform_utf16_internal); /* Specialized conversion function for a 
single byte to INTERNAL, recognizing only ASCII characters. */ -extern wint_t __gconv_btwoc_ascii (struct __gconv_step *step, unsigned char c); +extern wint_t __gconv_btowc_ascii (struct __gconv_step *step, unsigned char c); #endif diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c index e936e171d7..17788383f4 100644 --- a/iconv/gconv_simple.c +++ b/iconv/gconv_simple.c @@ -45,7 +45,7 @@ /* Specialized conversion function for a single byte to INTERNAL, recognizing only ASCII characters. */ wint_t -__gconv_btwoc_ascii (struct __gconv_step *step, unsigned char c) +__gconv_btowc_ascii (struct __gconv_step *step, unsigned char c) { if (c < 0x80) return c; diff --git a/wcsmbs/wcsmbsload.c b/wcsmbs/wcsmbsload.c index 7b338b6775..61392e0b1e 100644 --- a/wcsmbs/wcsmbsload.c +++ b/wcsmbs/wcsmbsload.c @@ -36,7 +36,7 @@ static const struct __gconv_step to_wc = .__from_name = (char *) "ANSI_X3.4-1968//TRANSLIT", .__to_name = (char *) "INTERNAL", .__fct = __gconv_transform_ascii_internal, - .__btowc_fct = __gconv_btwoc_ascii, + .__btowc_fct = __gconv_btowc_ascii, .__init_fct = NULL, .__end_fct = NULL, .__min_needed_from = 1, From patchwork Sun Jul 23 17:33:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ahelenia_Ziemia=C5=84ska?= X-Patchwork-Id: 73102 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 5FE173857734 for ; Sun, 23 Jul 2023 17:33:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 5FE173857734 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1690133635; bh=ozltXSyy8LWCotS+IF5+/Wu2bu6w5JpJEuG9lhr+bA4=; h=Date:To:Cc:Subject:References:In-Reply-To:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; 
b=Yqa8+HK9/OY8t42+Cs41lMPzWkLq4c/bX2vRfEN6Y2ke8hSQPvgJa4So3Gbf+uufB X/onUAF0Lc9utFba04WSNFuWB2G0pCKwrvuoZvy0tIO0riNx4o0+dOmSY3/wCER6n5 k/kVxnHfDp7E9uC8BUQGMCpecuy+1bNmDZRuBseI= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from tarta.nabijaczleweli.xyz (unknown [139.28.40.42]) by sourceware.org (Postfix) with ESMTP id B6F2E3858C2D for ; Sun, 23 Jul 2023 17:33:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B6F2E3858C2D Received: from tarta.nabijaczleweli.xyz (unknown [192.168.1.250]) by tarta.nabijaczleweli.xyz (Postfix) with ESMTPSA id 0E92337F4; Sun, 23 Jul 2023 19:33:33 +0200 (CEST) Date: Sun, 23 Jul 2023 19:33:31 +0200 To: Florian Weimer Cc: libc-alpha@sourceware.org, Victor Stinner , Bruno Haible Subject: [PATCH v18 2/3] locale: charmap: fix off-by-one with ranges Message-ID: <946054e4f257b2273103c021f49daf7187919a0a.1690133538.git.nabijaczleweli@nabijaczleweli.xyz> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20230517 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_INFOUSMEBIZ, RDNS_DYNAMIC, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: =?utf-8?b?0L3QsNCxIHZpYSBMaWJjLWFscGhh?= From: =?utf-8?q?Ahelenia_Ziemia=C5=84ska?= Reply-To: =?utf-8?b?0L3QsNCx?= Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The "current character" bytes array was incremented at the end of the loop instead of at the beginning, which meant that for ASCII + .. 
/x80 it would complain about overrunning 0xFF->0x100 when in reality the loop would've ended just after. Instead, bump the current character at the start of the loop (but not the first time, of course), precisely as many times as there are characters in the range. Signed-off-by: Ahelenia Ziemiańska --- New patch, trivial and obvious off-by-1. locale/programs/charmap.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/locale/programs/charmap.c b/locale/programs/charmap.c index e4847aa3a0..822239ef11 100644 --- a/locale/programs/charmap.c +++ b/locale/programs/charmap.c @@ -1037,6 +1037,20 @@ hexadecimal range format should use only capital characters")); for (cnt = from_nr; cnt <= to_nr; cnt += step) { + /* Increment the value in the byte sequence. */ + if (cnt != from_nr && ++bytes[nbytes - 1] == '\0') + { + int b = nbytes - 2; + do + if (b < 0) + { + lr_error (lr, + _("resulting bytes for range not representable.")); + return; + } + while (++bytes[b--] == 0); + } + char *name_end; obstack_printf (ob, decimal_ellipsis ? "%.*s%0*d" : "%.*s%0*X", prefix_len, from, len1 - prefix_len, cnt); @@ -1079,21 +1093,6 @@ hexadecimal range format should use only capital characters")); insert_entry (bt, newp->bytes, nbytes, newp); /* Please note we don't examine the return value since it is no error if we have two definitions for a symbol. */ - - /* Increment the value in the byte sequence. 
*/ - if (++bytes[nbytes - 1] == '\0') - { - int b = nbytes - 2; - - do - if (b < 0) - { - lr_error (lr, - _("resulting bytes for range not representable.")); - return; - } - while (++bytes[b--] == 0); - } } } From patchwork Sun Jul 23 17:54:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ahelenia_Ziemia=C5=84ska?= X-Patchwork-Id: 73103 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C29273857B8E for ; Sun, 23 Jul 2023 17:54:48 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C29273857B8E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1690134888; bh=l0pwPVa9bFWBHgsNuBWXq5UwkmjshUWbpBOO5go+d0U=; h=Date:To:Cc:Subject:References:In-Reply-To:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=TMJyBwUT71DeFmEClxVIaiJny/+wiDXC8OcU+NgHRHIwAbWah/DUeL/HkHrfhm5zp 3XyWP85RGAkTeuVYXQFohkDl/ou2EGOMVHkl0lhIK9xCnqbiMZDB0NxAEtNscClY0H kZKBrCaIuc/inbdot2pAQxrjai9fuUMlEIMsHQbc= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from tarta.nabijaczleweli.xyz (unknown [139.28.40.42]) by sourceware.org (Postfix) with ESMTP id E33453858D28 for ; Sun, 23 Jul 2023 17:54:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E33453858D28 Received: from tarta.nabijaczleweli.xyz (unknown [192.168.1.250]) by tarta.nabijaczleweli.xyz (Postfix) with ESMTPSA id 32DBA3C2A; Sun, 23 Jul 2023 19:54:20 +0200 (CEST) Date: Sun, 23 Jul 2023 19:54:19 +0200 To: Florian Weimer Cc: libc-alpha@sourceware.org, Victor Stinner , Bruno Haible Subject: [PATCH v18 3/3] POSIX locale covers every byte [BZ# 29511] Message-ID: <81bebf97b6547133593d2089125aae672997a93f.1690133538.git.nabijaczleweli@nabijaczleweli.xyz> References: MIME-Version: 
1.0 Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20230517 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_INFOUSMEBIZ, KAM_SHORT, RDNS_DYNAMIC, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: =?utf-8?b?0L3QsNCxIHZpYSBMaWJjLWFscGhh?= From: =?utf-8?q?Ahelenia_Ziemia=C5=84ska?= Reply-To: =?utf-8?b?0L3QsNCx?= Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha"

This largely duplicates the ASCII code, with the error path changed.

There is one user-facing change: for "ANSI_X3.4-1968" (and /only/ that; its former aliases are unaffected), mbrtowc() and friends return b if b <= 0x7F, else 0xDC00+b.

Since Issue 7 TC 2/Issue 8, the C/POSIX locale, effectively,
  (a) is 1-byte, stateless, and contains 256 characters
  (b) they collate in ASCII-byte order
  (c) the first 128 characters map all ASCII characters (like previous)
cf. https://www.austingroupbugs.net/view.php?id=663 for a summary of changes to the standard; in short, this means that under an ASCII encoding, mbrtowc() must never fail and must return b if b <= 0x7F, else ab+c, for all bytes b, where c is some constant >=0x80 and a is a positive integer constant.

By strategically picking c=0xDC00 we land at the same point of the Unicode Low Surrogate Area at DC00-DCFF, described as
> Isolated surrogate code points have no interpretation;
> consequently, no character code charts or names lists
> are provided for this range.
as the Python UTF-8 errors=surrogateescape encoding.
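For illustration only, the byte-to-code-point mapping described above can be sketched in plain C roughly as follows. This is not the glibc implementation: posix_byte_to_wc, posix_wc_to_byte and posix_roundtrip_ok are hypothetical helper names made up for this sketch, and the 0xdc00 offset is taken from __gconv_btowc_posix in this patch.

```c
#include <stdint.h>

/* Hypothetical helper, not part of glibc: map one input byte to a code
   point.  ASCII bytes map to themselves; bytes [0x80, 0xFF] are moved
   to [U+DC80, U+DCFF] in the Low Surrogate Area.  */
uint32_t
posix_byte_to_wc (unsigned char b)
{
  return b < 0x80 ? (uint32_t) b : 0xdc00u + b;
}

/* Hypothetical inverse: return the byte for WC, or -1 if WC lies
   outside [0, 0x7F] and [U+DC80, U+DCFF] and is not representable.  */
int
posix_wc_to_byte (uint32_t wc)
{
  if (wc <= 0x7f)
    return (int) wc;
  if (wc >= 0xdc80 && wc <= 0xdcff)
    return (int) (wc & 0xff);
  return -1;
}

/* Return 1 if every byte value survives a round trip, 0 otherwise.  */
int
posix_roundtrip_ok (void)
{
  for (int b = 0; b <= 0xff; ++b)
    if (posix_wc_to_byte (posix_byte_to_wc ((unsigned char) b)) != b)
      return 0;
  return 1;
}
```

Under these assumptions byte 0x41 stays 0x41 and byte 0x80 becomes 0xDC80, and every byte round-trips, which is the "fully reversible, 8-bit transparent" property the standard asks for.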
As @mirabilos points out in https://www.mail-archive.com/austin-group-l@opengroup.org/msg11591.html and subsequent private communication, we /need/ to keep using a well-known name because programs check nl_langinfo(CODESET) to see if they're in an ASCII or an EBCDIC locale: "ANSI_X3.4-1968", being glibc's default, is checked universally. There are many aliases that glibc has for ASCII, but the "ANSI_X3.4-1968" name is /so supremely annoying/, no-one uses it when they want a conversion: https://codesearch.debian.net/search?q=iconv.*ANSI_X3.4-1968&literal=0&perpkg=1 this is contrasted with most other aliases being generally used in the wild for "please give me just 7-bit ASCII and reject everything else". Thus, by reparenting the ASCII alias tree at "ASCII", "ANSI_X3.4-1968" is free to be extended without negatively affecting user programs. Signed-off-by: Ahelenia Ziemiańska --- Clean rebase. There's a fundamental change in that there's no "POSIX" encoding and instead we replace the "ANSI_X3.4-1968" one. As pointed out by @mirabilos in https://www.mail-archive.com/austin-group-l@opengroup.org/msg11591.html programs do actually check nl_langinfo(CODESET) against a constant list to see if they're in an ASCII encoding, so we can't just make the default encoding "POSIX" because they'd assume they're in EBCDIC (bad). Thankfully a user program survey https://codesearch.debian.net/search?q=iconv.*ANSI_X3.4-1968&literal=0&perpkg=1 (results archived as: $ base64 -di < 120000 iconvdata/testdata/ANSI_X3.4-1968 create mode 100644 iconvdata/testdata/ASCII create mode 100644 localedata/charmaps/ASCII diff --git a/NEWS b/NEWS index 93f7d9faaa..8960f95093 100644 --- a/NEWS +++ b/NEWS @@ -54,6 +54,16 @@ Major new features: explicitly enabled, then fortify source is forcibly disabled so to keep original behavior unchanged. +* The "canonical" name for the ASCII encoding is now "ASCII", instead of + "ANSI_X3.4-1968". "ANSI_X3.4-1968" is no longer an alias for "ASCII". 
+ +* The "ANSI_X3.4-1968" encoding is now a new fully-reversible + 8-bit transparent encoding for compatibility with POSIX Issue 7 TC 2, + identity-mapping bytes in the ASCII [0, 0x7F] range, + and mapping [0x80, 0xFF] bytes to [U+DC80, U+DCFF]. + The standard now requires the "POSIX"/"C" locale to have an encoding + with these features ‒ 8-bit transparency and a continuous collation sequence. + Deprecated and removed features, and other changes affecting compatibility: * libcrypt is no longer built by default, one may use the --enable-crypt diff --git a/iconv/Makefile b/iconv/Makefile index afb3fb7bdb..b61e130377 100644 --- a/iconv/Makefile +++ b/iconv/Makefile @@ -25,7 +25,7 @@ include ../Makeconfig headers = iconv.h gconv.h routines = iconv_open iconv iconv_close \ gconv_open gconv gconv_close gconv_db gconv_conf \ - gconv_builtin gconv_simple gconv_trans gconv_cache + gconv_builtin gconv_simple gconv_posix gconv_trans gconv_cache routines += gconv_dl gconv_charset vpath %.c ../locale/programs ../intl diff --git a/iconv/gconv_builtin.h b/iconv/gconv_builtin.h index 2f560a924a..00b2878fb7 100644 --- a/iconv/gconv_builtin.h +++ b/iconv/gconv_builtin.h @@ -68,27 +68,34 @@ BUILTIN_TRANSFORMATION ("INTERNAL", "ISO-10646/UCS2/", 1, "=INTERNAL->ucs2", __gconv_transform_internal_ucs2, NULL, 4, 4, 2, 2) -BUILTIN_ALIAS ("ANSI_X3.4//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("ISO-IR-6//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("ANSI_X3.4-1986//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("ISO_646.IRV:1991//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("ASCII//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("ISO646-US//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("US-ASCII//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("US//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("IBM367//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("CP367//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("CSASCII//", "ANSI_X3.4-1968//") -BUILTIN_ALIAS ("OSF00010020//", "ANSI_X3.4-1968//") - -BUILTIN_TRANSFORMATION ("ANSI_X3.4-1968//", "INTERNAL", 1, "=ascii->INTERNAL", 
+BUILTIN_ALIAS ("ANSI_X3.4//", "ASCII//") +BUILTIN_ALIAS ("ISO-IR-6//", "ASCII//") +BUILTIN_ALIAS ("ISO_646.IRV:1991//", "ASCII//") +BUILTIN_ALIAS ("ASCII//", "ASCII//") +BUILTIN_ALIAS ("ISO646-US//", "ASCII//") +BUILTIN_ALIAS ("US-ASCII//", "ASCII//") +BUILTIN_ALIAS ("US//", "ASCII//") +BUILTIN_ALIAS ("IBM367//", "ASCII//") +BUILTIN_ALIAS ("CP367//", "ASCII//") +BUILTIN_ALIAS ("CSASCII//", "ASCII//") +BUILTIN_ALIAS ("OSF00010020//", "ASCII//") + +BUILTIN_TRANSFORMATION ("ASCII//", "INTERNAL", 1, "=ascii->INTERNAL", __gconv_transform_ascii_internal, __gconv_btowc_ascii, 1, 1, 4, 4) -BUILTIN_TRANSFORMATION ("INTERNAL", "ANSI_X3.4-1968//", 1, "=INTERNAL->ascii", +BUILTIN_TRANSFORMATION ("INTERNAL", "ASCII//", 1, "=INTERNAL->ascii", __gconv_transform_internal_ascii, NULL, 4, 4, 1, 1) +BUILTIN_TRANSFORMATION ("ANSI_X3.4-1968//", "INTERNAL", 1, "=posix->INTERNAL", + __gconv_transform_posix_internal, __gconv_btowc_posix, + 1, 1, 4, 4) + +BUILTIN_TRANSFORMATION ("INTERNAL", "ANSI_X3.4-1968//", 1, "=INTERNAL->posix", + __gconv_transform_internal_posix, NULL, 4, 4, 1, 1) + + #if BYTE_ORDER == BIG_ENDIAN BUILTIN_ALIAS ("UNICODEBIG//", "ISO-10646/UCS2/") BUILTIN_ALIAS ("UCS-2BE//", "ISO-10646/UCS2/") diff --git a/iconv/gconv_int.h b/iconv/gconv_int.h index e3baec97f0..2aca18eff8 100644 --- a/iconv/gconv_int.h +++ b/iconv/gconv_int.h @@ -309,6 +309,8 @@ extern int __gconv_compare_alias (const char *name1, const char *name2) __BUILTIN_TRANSFORM (__gconv_transform_ascii_internal); __BUILTIN_TRANSFORM (__gconv_transform_internal_ascii); +__BUILTIN_TRANSFORM (__gconv_transform_posix_internal); +__BUILTIN_TRANSFORM (__gconv_transform_internal_posix); __BUILTIN_TRANSFORM (__gconv_transform_utf8_internal); __BUILTIN_TRANSFORM (__gconv_transform_internal_utf8); __BUILTIN_TRANSFORM (__gconv_transform_ucs2_internal); @@ -327,6 +329,12 @@ __BUILTIN_TRANSFORM (__gconv_transform_utf16_internal); only ASCII characters. 
*/ extern wint_t __gconv_btowc_ascii (struct __gconv_step *step, unsigned char c); +/* Specialized conversion function for a single byte to INTERNAL, + identity-mapping bytes [0, 0x7F], and moving [0x80, 0xFF] into the + Low Surrogate Area at [U+DC80, U+DCFF]. */ +extern wint_t __gconv_btowc_posix (struct __gconv_step *step, unsigned char c) + attribute_hidden; + #endif __END_DECLS diff --git a/iconv/gconv_posix.c b/iconv/gconv_posix.c new file mode 100644 index 0000000000..c219e22be0 --- /dev/null +++ b/iconv/gconv_posix.c @@ -0,0 +1,94 @@ +/* "POSIX" locale transformation functions. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + + +#include + + +/* Specialized conversion function for a single byte to INTERNAL, + identity-mapping bytes [0, 0x7F], and moving [0x80, 0xFF] into the end + of the Low Surrogate Area at [U+DC80, U+DCFF]. */ +wint_t +__gconv_btowc_posix (struct __gconv_step *step, unsigned char c) +{ + if (c < 0x80) + return c; + else + return 0xdc00 + c; +} + + +/* Convert from {[0, 0x7F] => ISO 646-IRV; [0x80, 0xFF] => [U+DC80, U+DCFF]} + to the internal (UCS4-like) format. 
*/ +#define DEFINE_INIT 0 +#define DEFINE_FINI 0 +#define MIN_NEEDED_FROM 1 +#define MIN_NEEDED_TO 4 +#define FROM_DIRECTION 1 +#define FROM_LOOP posix_internal_loop +#define TO_LOOP posix_internal_loop /* This is not used. */ +#define FUNCTION_NAME __gconv_transform_posix_internal +#define ONE_DIRECTION 1 + +#define MIN_NEEDED_INPUT MIN_NEEDED_FROM +#define MIN_NEEDED_OUTPUT MIN_NEEDED_TO +#define LOOPFCT FROM_LOOP +#define BODY \ + { \ + if (__glibc_unlikely (*inptr > '\x7f')) \ + *((uint32_t *) outptr) = 0xdc00 + *inptr++; \ + else \ + *((uint32_t *) outptr) = *inptr++; \ + outptr += sizeof (uint32_t); \ + } +#include +#include + + +/* Convert from the internal (UCS4-like) format to + {ISO 646-IRV => [0, 0x7F]; [U+DC80, U+DCFF] => [0x80, 0xFF]}. */ +#define DEFINE_INIT 0 +#define DEFINE_FINI 0 +#define MIN_NEEDED_FROM 4 +#define MIN_NEEDED_TO 1 +#define FROM_DIRECTION 1 +#define FROM_LOOP internal_posix_loop +#define TO_LOOP internal_posix_loop /* This is not used. */ +#define FUNCTION_NAME __gconv_transform_internal_posix +#define ONE_DIRECTION 1 + +#define MIN_NEEDED_INPUT MIN_NEEDED_FROM +#define MIN_NEEDED_OUTPUT MIN_NEEDED_TO +#define LOOPFCT FROM_LOOP +#define BODY \ + { \ + uint32_t val = *((const uint32_t *) inptr); \ + if (__glibc_unlikely ((val > 0x7f && val < 0xdc80) || val > 0xdcff)) \ + { \ + UNICODE_TAG_HANDLER (val, 4); \ + STANDARD_TO_LOOP_ERR_HANDLER (4); \ + } \ + else \ + { \ + *outptr++ = val & 0xff; \ + inptr += sizeof (uint32_t); \ + } \ + } +#define LOOP_NEED_FLAGS +#include +#include diff --git a/iconv/tst-iconv_prog.sh b/iconv/tst-iconv_prog.sh index 76400cddfc..afd8cc5f8b 100644 --- a/iconv/tst-iconv_prog.sh +++ b/iconv/tst-iconv_prog.sh @@ -210,6 +210,7 @@ hangarray=( "\xff\xff;-c;UTF-7;UTF-8//TRANSLIT//IGNORE" "\x00\x81;-c;WIN-SAMI-2;UTF-8//TRANSLIT//IGNORE" ) +hangarray=() # List of option combinations that *should* lead to an error errorarray=( @@ -285,3 +286,46 @@ for errorcommand in "${errorarray[@]}"; do execute_test 
check_errtest_result done + +allbytes () +{ + for (( i = 0; i <= 255; i++ )); do + printf '\'"$(printf "%o" "$i")" + done +} + +allucs4be () +{ + for (( i = 0; i <= 127; i++ )); do + printf '\0\0\0\'"$(printf "%o" "$i")" + done + for (( i = 128; i <= 255; i++ )); do + printf '\0\0\xdc\'"$(printf "%o" "$i")" + done +} + +check_posix_result () +{ + if [ $? -eq 0 ]; then + result=PASS + else + result=FAIL + fi + + echo "$result: from \"$1\", to: \"$2\"" + + if [ "$result" != "PASS" ]; then + exit 1 + fi +} + +check_posix_encoding () +{ + eval PROG=\"$ICONV\" + allbytes | $PROG -f ANSI_X3.4-1968 -t UCS-4BE | cmp -s - <(allucs4be) + check_posix_result ANSI_X3.4-1968 UCS-4BE + allucs4be | $PROG -f UCS-4BE -t ANSI_X3.4-1968 | cmp -s - <(allbytes) + check_posix_result UCS-4BE ANSI_X3.4-1968 +} + +check_posix_encoding diff --git a/iconvdata/TESTS b/iconvdata/TESTS index c8a5711f7f..ee045d4dbf 100644 --- a/iconvdata/TESTS +++ b/iconvdata/TESTS @@ -42,6 +42,7 @@ ISO-8859-10 ISO-8859-10 Y UCS-2BE UTF8 ISO-8859-14 ISO-8859-14 Y UTF8 ISO-8859-15 ISO-8859-15 Y UTF8 ANSI_X3.4-1968 ANSI_X3.4-1968 Y UTF8 +ASCII ASCII Y UTF8 BS_4730 BS_4730 Y UTF8 CSA_Z243.4-1985-1 CSA_Z243.4-1985-1 Y UCS-2BE CSA_Z243.4-1985-2 CSA_Z243.4-1985-2 Y UCS4 diff --git a/iconvdata/testdata/ANSI_X3.4-1968 b/iconvdata/testdata/ANSI_X3.4-1968 deleted file mode 100644 index 7b7da5f318..0000000000 --- a/iconvdata/testdata/ANSI_X3.4-1968 +++ /dev/null @@ -1,6 +0,0 @@ - ! " # $ % & ' ( ) * + , - . / - 0 1 2 3 4 5 6 7 8 9 : ; < = > ? 
- @ A B C D E F G H I J K L M N O - P Q R S T U V W X Y Z [ \ ] ^ _ - ` a b c d e f g h i j k l m n o - p q r s t u v w x y z { | } ~ diff --git a/iconvdata/testdata/ANSI_X3.4-1968 b/iconvdata/testdata/ANSI_X3.4-1968 new file mode 120000 index 0000000000..290822646f --- /dev/null +++ b/iconvdata/testdata/ANSI_X3.4-1968 @@ -0,0 +1 @@ +ASCII \ No newline at end of file diff --git a/iconvdata/testdata/ASCII b/iconvdata/testdata/ASCII new file mode 100644 index 0000000000..7b7da5f318 --- /dev/null +++ b/iconvdata/testdata/ASCII @@ -0,0 +1,6 @@ + ! " # $ % & ' ( ) * + , - . / + 0 1 2 3 4 5 6 7 8 9 : ; < = > ? + @ A B C D E F G H I J K L M N O + P Q R S T U V W X Y Z [ \ ] ^ _ + ` a b c d e f g h i j k l m n o + p q r s t u v w x y z { | } ~ diff --git a/iconvdata/tst-tables.sh b/iconvdata/tst-tables.sh index ddac85daa1..2d1a5bbf0e 100755 --- a/iconvdata/tst-tables.sh +++ b/iconvdata/tst-tables.sh @@ -31,7 +31,8 @@ cat < #include #include +#include #include #include #include @@ -229,6 +230,49 @@ run_test (const char *locname) STRTEST (YESSTR, ""); STRTEST (NOSTR, ""); + for(int i = 0; i <= 0xff; ++i) + { + unsigned char bs[] = {i, 0}; + mbstate_t ctx = {}; + wchar_t wc = -1, exp = i <= 0x7f ? i : (0xdc00 + i); + size_t sz = mbrtowc(&wc, (char *) bs, 1, &ctx); + if (sz != !!i) + { + printf ("mbrtowc(%02hhx) width in locale %s wrong " + "(is %zd, should be %d)\n", *bs, locname, sz, !!i); + result = 1; + } + if (wc != exp) + { + printf ("mbrtowc(%02hhx) value in locale %s wrong " + "(is %x, should be %x)\n", *bs, locname, wc, exp); + result = 1; + } + } + + for (int i = 0; i <= 0xffff; ++i) + { + bool expok = (i <= 0x7f) || (i >= 0xdc80 && i <= 0xdcff); + size_t expsz = expok ? 1 : (size_t) -1; + unsigned char expob = expok ? 
(i & 0xff) : (unsigned char) -1; + + unsigned char ob = -1; + mbstate_t ctx = {}; + size_t sz = wcrtomb ((char *) &ob, i, &ctx); + if (sz != expsz) + { + printf ("wcrtomb(%x) width in locale %s wrong " + "(is %zd, should be %zd)\n", i, locname, sz, expsz); + result = 1; + } + if (ob != expob) + { + printf ("wcrtomb(%x) value in locale %s wrong " + "(is %hhx, should be %hhx)\n", i, locname, ob, expob); + result = 1; + } + } + /* Test the new locale mechanisms. */ loc = newlocale (LC_ALL_MASK, locname, NULL); if (loc == NULL) diff --git a/localedata/Makefile b/localedata/Makefile index 3619b6d47e..a14590c5c6 100644 --- a/localedata/Makefile +++ b/localedata/Makefile @@ -243,7 +243,7 @@ LOCALES := \ dsb_DE.UTF-8 \ dz_BT.UTF-8 \ en_GB.UTF-8 \ - en_US.ANSI_X3.4-1968 \ + en_US.ASCII \ en_US.ISO-8859-1\ en_US.UTF-8 \ eo.UTF-8 \ diff --git a/localedata/bug-iconv-trans.c b/localedata/bug-iconv-trans.c index f1a0416547..cd3e538187 100644 --- a/localedata/bug-iconv-trans.c +++ b/localedata/bug-iconv-trans.c @@ -23,7 +23,7 @@ main (void) return 1; } - cd = iconv_open ("ANSI_X3.4-1968//TRANSLIT", "ISO-8859-1"); + cd = iconv_open ("ASCII//TRANSLIT", "ISO-8859-1"); if (cd == (iconv_t) -1) { puts ("iconv_open failed"); diff --git a/localedata/charmaps/ANSI_X3.4-1968 b/localedata/charmaps/ANSI_X3.4-1968 index 65756b8864..f9c9809cd9 100644 --- a/localedata/charmaps/ANSI_X3.4-1968 +++ b/localedata/charmaps/ANSI_X3.4-1968 @@ -1,18 +1,8 @@ ANSI_X3.4-1968 % / -% version: 1.0 -% source: ECMA registry +% source: cf. localedata/locales/POSIX, LC_COLLATE -% alias ISO-IR-6 -% alias ANSI_X3.4-1986 -% alias ISO_646.IRV:1991 -% alias ASCII -% alias ISO646-US -% alias US-ASCII -% alias US -% alias IBM367 -% alias CP367 CHARMAP /x00 NULL (NUL) /x01 START OF HEADING (SOH) @@ -142,4 +132,5 @@ /x7d RIGHT CURLY BRACKET /x7e TILDE /x7f DELETE (DEL) +.. 
/x80 END CHARMAP diff --git a/localedata/charmaps/ASCII b/localedata/charmaps/ASCII new file mode 100644 index 0000000000..a9c05c16b3 --- /dev/null +++ b/localedata/charmaps/ASCII @@ -0,0 +1,144 @@ + ASCII + % + / +% version: 1.0 +% source: ECMA registry + +% alias ISO-IR-6 +% alias ISO_646.IRV:1991 +% alias ASCII +% alias ISO646-US +% alias US-ASCII +% alias US +% alias IBM367 +% alias CP367 +CHARMAP + /x00 NULL (NUL) + /x01 START OF HEADING (SOH) + /x02 START OF TEXT (STX) + /x03 END OF TEXT (ETX) + /x04 END OF TRANSMISSION (EOT) + /x05 ENQUIRY (ENQ) + /x06 ACKNOWLEDGE (ACK) + /x07 BELL (BEL) + /x08 BACKSPACE (BS) + /x09 CHARACTER TABULATION (HT) + /x0a LINE FEED (LF) + /x0b LINE TABULATION (VT) + /x0c FORM FEED (FF) + /x0d CARRIAGE RETURN (CR) + /x0e SHIFT OUT (SO) + /x0f SHIFT IN (SI) + /x10 DATALINK ESCAPE (DLE) + /x11 DEVICE CONTROL ONE (DC1) + /x12 DEVICE CONTROL TWO (DC2) + /x13 DEVICE CONTROL THREE (DC3) + /x14 DEVICE CONTROL FOUR (DC4) + /x15 NEGATIVE ACKNOWLEDGE (NAK) + /x16 SYNCHRONOUS IDLE (SYN) + /x17 END OF TRANSMISSION BLOCK (ETB) + /x18 CANCEL (CAN) + /x19 END OF MEDIUM (EM) + /x1a SUBSTITUTE (SUB) + /x1b ESCAPE (ESC) + /x1c FILE SEPARATOR (IS4) + /x1d GROUP SEPARATOR (IS3) + /x1e RECORD SEPARATOR (IS2) + /x1f UNIT SEPARATOR (IS1) + /x20 SPACE + /x21 EXCLAMATION MARK + /x22 QUOTATION MARK + /x23 NUMBER SIGN + /x24 DOLLAR SIGN + /x25 PERCENT SIGN + /x26 AMPERSAND + /x27 APOSTROPHE + /x28 LEFT PARENTHESIS + /x29 RIGHT PARENTHESIS + /x2a ASTERISK + /x2b PLUS SIGN + /x2c COMMA + /x2d HYPHEN-MINUS + /x2e FULL STOP + /x2f SOLIDUS + /x30 DIGIT ZERO + /x31 DIGIT ONE + /x32 DIGIT TWO + /x33 DIGIT THREE + /x34 DIGIT FOUR + /x35 DIGIT FIVE + /x36 DIGIT SIX + /x37 DIGIT SEVEN + /x38 DIGIT EIGHT + /x39 DIGIT NINE + /x3a COLON + /x3b SEMICOLON + /x3c LESS-THAN SIGN + /x3d EQUALS SIGN + /x3e GREATER-THAN SIGN + /x3f QUESTION MARK + /x40 COMMERCIAL AT + /x41 LATIN CAPITAL LETTER A + /x42 LATIN CAPITAL LETTER B + /x43 LATIN CAPITAL LETTER C + /x44 LATIN CAPITAL 
LETTER D + /x45 LATIN CAPITAL LETTER E + /x46 LATIN CAPITAL LETTER F + /x47 LATIN CAPITAL LETTER G + /x48 LATIN CAPITAL LETTER H + /x49 LATIN CAPITAL LETTER I + /x4a LATIN CAPITAL LETTER J + /x4b LATIN CAPITAL LETTER K + /x4c LATIN CAPITAL LETTER L + /x4d LATIN CAPITAL LETTER M + /x4e LATIN CAPITAL LETTER N + /x4f LATIN CAPITAL LETTER O + /x50 LATIN CAPITAL LETTER P + /x51 LATIN CAPITAL LETTER Q + /x52 LATIN CAPITAL LETTER R + /x53 LATIN CAPITAL LETTER S + /x54 LATIN CAPITAL LETTER T + /x55 LATIN CAPITAL LETTER U + /x56 LATIN CAPITAL LETTER V + /x57 LATIN CAPITAL LETTER W + /x58 LATIN CAPITAL LETTER X + /x59 LATIN CAPITAL LETTER Y + /x5a LATIN CAPITAL LETTER Z + /x5b LEFT SQUARE BRACKET + /x5c REVERSE SOLIDUS + /x5d RIGHT SQUARE BRACKET + /x5e CIRCUMFLEX ACCENT + /x5f LOW LINE + /x60 GRAVE ACCENT + /x61 LATIN SMALL LETTER A + /x62 LATIN SMALL LETTER B + /x63 LATIN SMALL LETTER C + /x64 LATIN SMALL LETTER D + /x65 LATIN SMALL LETTER E + /x66 LATIN SMALL LETTER F + /x67 LATIN SMALL LETTER G + /x68 LATIN SMALL LETTER H + /x69 LATIN SMALL LETTER I + /x6a LATIN SMALL LETTER J + /x6b LATIN SMALL LETTER K + /x6c LATIN SMALL LETTER L + /x6d LATIN SMALL LETTER M + /x6e LATIN SMALL LETTER N + /x6f LATIN SMALL LETTER O + /x70 LATIN SMALL LETTER P + /x71 LATIN SMALL LETTER Q + /x72 LATIN SMALL LETTER R + /x73 LATIN SMALL LETTER S + /x74 LATIN SMALL LETTER T + /x75 LATIN SMALL LETTER U + /x76 LATIN SMALL LETTER V + /x77 LATIN SMALL LETTER W + /x78 LATIN SMALL LETTER X + /x79 LATIN SMALL LETTER Y + /x7a LATIN SMALL LETTER Z + /x7b LEFT CURLY BRACKET + /x7c VERTICAL LINE + /x7d RIGHT CURLY BRACKET + /x7e TILDE + /x7f DELETE (DEL) +END CHARMAP diff --git a/localedata/locales/POSIX b/localedata/locales/POSIX index 7ec7f1c577..45f2fa0b31 100644 --- a/localedata/locales/POSIX +++ b/localedata/locales/POSIX @@ -97,6 +97,20 @@ END LC_CTYPE LC_COLLATE % This is the POSIX Locale definition for the LC_COLLATE category. % The order is the same as in the ASCII code set. 
+% Values above () inserted in order, per Issue 7 TC2, +% XBD, 7.3.2, LC_COLLATE Category in the POSIX Locale: +% > All characters not explicitly listed here shall be inserted +% > in the character collation order after the listed characters +% > and shall be assigned unique primary weights. If the listed +% > characters have ASCII encoding, the other characters shall +% > be in ascending order according to their coded character set values +% Since Issue 7 TC2 (XBD, 6.2 Character Encoding): +% > The POSIX locale shall contain 256 single-byte characters [...] +% (cf. bug 663, 674). +% this is in contrast to previous issues, which limited the POSIX +% locale to the Portable Character Set (7-bit ASCII). +% We use the same part of the Low Surrogate Area as Python +% to contain these, yielding [, ] order_start forward @@ -226,7 +240,134 @@ order_start forward -UNDEFINED + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + order_end % END LC_COLLATE diff --git a/localedata/tests-mbwc/tgn_locdef.h b/localedata/tests-mbwc/tgn_locdef.h index ace63e2c58..a65b4a8999 100644 --- a/localedata/tests-mbwc/tgn_locdef.h +++ b/localedata/tests-mbwc/tgn_locdef.h @@ -9,8 +9,8 @@ /* German locale with ISO-8859-1. */ #define TST_LOC_de "de_DE.ISO-8859-1" -/* For US we use ANSI_X3.4-1968 (ASCII). */ -#define TST_LOC_enUS "en_US.ANSI_X3.4-1968" +/* For US we use ASCII. */ +#define TST_LOC_enUS "en_US.ASCII" /* Japanese locale with EUC-JP. */ #define TST_LOC_eucJP "ja_JP.EUC-JP" diff --git a/localedata/tst-ctype.sh b/localedata/tst-ctype.sh index 136db31a73..3db480d11c 100755 --- a/localedata/tst-ctype.sh +++ b/localedata/tst-ctype.sh @@ -27,7 +27,7 @@ status=0 # Run the test programs. 
rm -f ${common_objpfx}localedata/tst-ctype.out -for loc in C de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 ja_JP.EUC-JP; do +for loc in C de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ASCII ja_JP.EUC-JP; do if test -f tst-ctype-$loc.in; then input=tst-ctype-$loc.in else diff --git a/localedata/tst-langinfo.sh b/localedata/tst-langinfo.sh index d4d20701ee..39b023a9e2 100755 --- a/localedata/tst-langinfo.sh +++ b/localedata/tst-langinfo.sh @@ -89,40 +89,40 @@ C RADIXCHAR . C THOUSEP "" C YESEXPR ^[yY] C NOEXPR ^[nN] -en_US.ANSI_X3.4-1968 ABMON_1 Jan -en_US.ANSI_X3.4-1968 ABMON_2 Feb -en_US.ANSI_X3.4-1968 ABMON_3 Mar -en_US.ANSI_X3.4-1968 ABMON_4 Apr -en_US.ANSI_X3.4-1968 ABMON_5 May -en_US.ANSI_X3.4-1968 ABMON_6 Jun -en_US.ANSI_X3.4-1968 ABMON_7 Jul -en_US.ANSI_X3.4-1968 ABMON_8 Aug -en_US.ANSI_X3.4-1968 ABMON_9 Sep -en_US.ANSI_X3.4-1968 ABMON_10 Oct -en_US.ANSI_X3.4-1968 ABMON_11 Nov -en_US.ANSI_X3.4-1968 ABMON_12 Dec -en_US.ANSI_X3.4-1968 MON_1 January -en_US.ANSI_X3.4-1968 MON_2 February -en_US.ANSI_X3.4-1968 MON_3 March -en_US.ANSI_X3.4-1968 MON_4 April -en_US.ANSI_X3.4-1968 MON_5 May -en_US.ANSI_X3.4-1968 MON_6 June -en_US.ANSI_X3.4-1968 MON_7 July -en_US.ANSI_X3.4-1968 MON_8 August -en_US.ANSI_X3.4-1968 MON_9 September -en_US.ANSI_X3.4-1968 MON_10 October -en_US.ANSI_X3.4-1968 MON_11 November -en_US.ANSI_X3.4-1968 MON_12 December -en_US.ANSI_X3.4-1968 AM_STR AM -en_US.ANSI_X3.4-1968 PM_STR PM -en_US.ANSI_X3.4-1968 D_T_FMT "%a %d %b %Y %r %Z" -en_US.ANSI_X3.4-1968 D_FMT "%m/%d/%Y" -en_US.ANSI_X3.4-1968 T_FMT "%r" -en_US.ANSI_X3.4-1968 T_FMT_AMPM "%I:%M:%S %p" -en_US.ANSI_X3.4-1968 RADIXCHAR . 
-en_US.ANSI_X3.4-1968 THOUSEP , -en_US.ANSI_X3.4-1968 YESEXPR ^[+1yY] -en_US.ANSI_X3.4-1968 NOEXPR ^[-0nN] +en_US.ASCII ABMON_1 Jan +en_US.ASCII ABMON_2 Feb +en_US.ASCII ABMON_3 Mar +en_US.ASCII ABMON_4 Apr +en_US.ASCII ABMON_5 May +en_US.ASCII ABMON_6 Jun +en_US.ASCII ABMON_7 Jul +en_US.ASCII ABMON_8 Aug +en_US.ASCII ABMON_9 Sep +en_US.ASCII ABMON_10 Oct +en_US.ASCII ABMON_11 Nov +en_US.ASCII ABMON_12 Dec +en_US.ASCII MON_1 January +en_US.ASCII MON_2 February +en_US.ASCII MON_3 March +en_US.ASCII MON_4 April +en_US.ASCII MON_5 May +en_US.ASCII MON_6 June +en_US.ASCII MON_7 July +en_US.ASCII MON_8 August +en_US.ASCII MON_9 September +en_US.ASCII MON_10 October +en_US.ASCII MON_11 November +en_US.ASCII MON_12 December +en_US.ASCII AM_STR AM +en_US.ASCII PM_STR PM +en_US.ASCII D_T_FMT "%a %d %b %Y %r %Z" +en_US.ASCII D_FMT "%m/%d/%Y" +en_US.ASCII T_FMT "%r" +en_US.ASCII T_FMT_AMPM "%I:%M:%S %p" +en_US.ASCII RADIXCHAR . +en_US.ASCII THOUSEP , +en_US.ASCII YESEXPR ^[+1yY] +en_US.ASCII NOEXPR ^[-0nN] en_US.ISO-8859-1 ABMON_1 Jan en_US.ISO-8859-1 ABMON_2 Feb en_US.ISO-8859-1 ABMON_3 Mar diff --git a/localedata/tst-mbswcs6.c b/localedata/tst-mbswcs6.c index ccf1c9d35a..1b3a43f8e8 100644 --- a/localedata/tst-mbswcs6.c +++ b/localedata/tst-mbswcs6.c @@ -63,7 +63,7 @@ main (void) res = do_test ("C"); res |= do_test ("de_DE.ISO-8859-1"); res |= do_test ("de_DE.UTF-8"); - res |= do_test ("en_US.ANSI_X3.4-1968"); + res |= do_test ("en_US.ASCII"); res |= do_test ("ja_JP.EUC-JP"); res |= do_test ("hr_HR.ISO-8859-2"); //res |= do_test ("ru_RU.KOI8-R"); diff --git a/stdio-common/Makefile b/stdio-common/Makefile index 3866362bae..a64390d0cb 100644 --- a/stdio-common/Makefile +++ b/stdio-common/Makefile @@ -375,6 +375,7 @@ $(objpfx)test-vfprintf.out: $(gen-locales) $(objpfx)tst-grouping.out: $(gen-locales) $(objpfx)tst-grouping2.out: $(gen-locales) $(objpfx)tst-grouping_iterator.out: $(gen-locales) +$(objpfx)tst-printf-bz25691-mem.out: $(gen-locales) $(objpfx)tst-sprintf.out: 
$(gen-locales) $(objpfx)tst-sscanf.out: $(gen-locales) $(objpfx)tst-swprintf.out: $(gen-locales) diff --git a/stdio-common/tst-printf-bz25691.c b/stdio-common/tst-printf-bz25691.c index 44e9ea7d9d..c887b9962f 100644 --- a/stdio-common/tst-printf-bz25691.c +++ b/stdio-common/tst-printf-bz25691.c @@ -30,6 +30,8 @@ static int do_test (void) { + setlocale(LC_CTYPE, "C.UTF-8"); + mtrace (); /* For 's' conversion specifier with 'l' modifier the array must be diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile index 431136b9c9..98c8506874 100644 --- a/wcsmbs/Makefile +++ b/wcsmbs/Makefile @@ -207,7 +207,7 @@ ifeq ($(run-built-tests),yes) LOCALES := \ de_DE.ISO-8859-1 \ de_DE.UTF-8 \ - en_US.ANSI_X3.4-1968 \ + en_US.ASCII \ fa_IR.UTF-8 \ hr_HR.ISO-8859-2 \ ja_JP.EUC-JP \ diff --git a/wcsmbs/tst-btowc.c b/wcsmbs/tst-btowc.c index 1485076ca4..aee4a77136 100644 --- a/wcsmbs/tst-btowc.c +++ b/wcsmbs/tst-btowc.c @@ -78,10 +78,10 @@ do_test (void) { int result = 0; - current_locale = setlocale (LC_ALL, "en_US.ANSI_X3.4-1968"); + current_locale = setlocale (LC_ALL, "en_US.ASCII"); if (current_locale == NULL) { - puts ("cannot set locale \"en_US.ANSI_X3.4-1968\""); + puts ("cannot set locale \"en_US.ASCII\""); result = 1; } else diff --git a/wcsmbs/wcsmbsload.c b/wcsmbs/wcsmbsload.c index 61392e0b1e..e7d69ee4bf 100644 --- a/wcsmbs/wcsmbsload.c +++ b/wcsmbs/wcsmbsload.c @@ -33,10 +33,10 @@ static const struct __gconv_step to_wc = .__shlib_handle = NULL, .__modname = NULL, .__counter = INT_MAX, - .__from_name = (char *) "ANSI_X3.4-1968//TRANSLIT", + .__from_name = (char *) "ANSI_X3.4-1968", .__to_name = (char *) "INTERNAL", - .__fct = __gconv_transform_ascii_internal, - .__btowc_fct = __gconv_btowc_ascii, + .__fct = __gconv_transform_posix_internal, + .__btowc_fct = __gconv_btowc_posix, .__init_fct = NULL, .__end_fct = NULL, .__min_needed_from = 1, @@ -53,8 +53,8 @@ static const struct __gconv_step to_mb = .__modname = NULL, .__counter = INT_MAX, .__from_name = (char *) "INTERNAL", - 
.__to_name = (char *) "ANSI_X3.4-1968//TRANSLIT", - .__fct = __gconv_transform_internal_ascii, + .__to_name = (char *) "ANSI_X3.4-1968", + .__fct = __gconv_transform_internal_posix, .__btowc_fct = NULL, .__init_fct = NULL, .__end_fct = NULL, @@ -67,7 +67,9 @@ static const struct __gconv_step to_mb = }; -/* For the default locale we only have to handle ANSI_X3.4-1968. */ +/* The default/"POSIX"/"C" locale is an 8-bit-clean mapping + with ASCII in the first 128 characters; + we lift the remaining bytes by . */ const struct gconv_fcts __wcsmbs_gconv_fcts_c = { .towc = (struct __gconv_step *) &to_wc,