From patchwork Mon Feb 8 20:28:25 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Carlos O'Donell
X-Patchwork-Id: 10764
Received: (qmail 95914 invoked by alias); 8 Feb 2016 20:28:29 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 95904 invoked by uid 89); 8 Feb 2016 20:28:28 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=BAYES_00,
 RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2
 spammy=Hx-languages-length:1869, clarifying, FULL
X-HELO: mx1.redhat.com
To: Andreas Schwab, GNU C Library
From: "Carlos O'Donell"
Subject: [PATCH] BZ #19575: Clarify status of entries in GB 18030-2005.
Message-ID: <56B8FA69.8030508@redhat.com>
Date: Mon, 8 Feb 2016 15:28:25 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.4.0
MIME-Version: 1.0

In bug 19575 Florian Weimer asks about the status of the glibc
support for GB 18030-2005, since ICU and Emacs produce slightly
different results than glibc.

The following patch adds clarifying comments to GB 18030-2005's
character map to explain why glibc has the following mapping
and why it is best-practice.

localedata/

2016-02-08  Carlos O'Donell

	* charmaps/GB18030
---
Cheers,
Carlos.

diff --git a/localedata/charmaps/GB18030 b/localedata/charmaps/GB18030
index 863a123..c48276e 100644
--- a/localedata/charmaps/GB18030
+++ b/localedata/charmaps/GB18030
@@ -57234,6 +57234,12 @@ CHARMAP
 /xa6/xbe
 /xa6/xbf
 /xa6/xc0
+% The newest GB 18030-2005 standard still uses some private use area
+% code points. Any implementation which has Unicode 4.1 or newer
+% support should not use these PUA code points, and instead should
+% map these entries to their equivalent non-PUA code points which
+% in this case map from to . This recommendation is
+% based on "CJKV Processing" by Dr. Ken Lunde.
 % /xa6/xd9
 % /xa6/xda
 % /xa6/xdb
@@ -62997,6 +63003,10 @@ CHARMAP
 /x84/x31/x82/x33 VARIATION SELECTOR-14
 /x84/x31/x82/x34 VARIATION SELECTOR-15
 /x84/x31/x82/x35 VARIATION SELECTOR-16
+% The code points from to are a adjustment
+% of the GB 18030-2005 standard to account for the fact that
+% with Unicode 4.1 support we can now correctly represent those
+% entries, which in the standard, used PUA code points.
 /xa6/xd9 PRESENTATION FORM FOR VERTICAL COMMA
 /xa6/xdb PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
 /xa6/xda PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
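
For reviewers who want to sanity-check the claim in the new comments, here is a minimal sketch, not part of the patch. It takes the three byte sequences and character names visible in the second hunk, resolves each name with Python's `unicodedata` module, and checks that the resulting character is not a private-use code point. The helper names (`ENTRIES`, `is_private_use`, `resolve`) are hypothetical. The script consults Python's bundled Unicode database only, not glibc's charmap or iconv.

```python
import unicodedata

# Byte sequences and character names copied from the patch's second hunk.
# The dict name and the helpers below are illustrative, not glibc code.
ENTRIES = {
    b"\xa6\xd9": "PRESENTATION FORM FOR VERTICAL COMMA",
    b"\xa6\xdb": "PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA",
    b"\xa6\xda": "PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP",
}


def is_private_use(ch: str) -> bool:
    """Return True if ch is a private-use code point (general category Co)."""
    return unicodedata.category(ch) == "Co"


def resolve(entries: dict) -> dict:
    """Map each byte sequence to the character that carries the given name."""
    return {seq: unicodedata.lookup(name) for seq, name in entries.items()}


if __name__ == "__main__":
    for seq, ch in sorted(resolve(ENTRIES).items()):
        status = "PUA" if is_private_use(ch) else "non-PUA"
        print(seq.hex(), f"U+{ord(ch):04X}", status)
```

Running it lists each two-byte sequence next to the code point its name resolves to, which makes it easy to compare against what ICU or Emacs produce for the same bytes.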