From patchwork Tue Feb 9 10:02:42 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Carlos O'Donell
X-Patchwork-Id: 10769
Received: (qmail 89824 invoked by alias); 9 Feb 2016 10:02:48 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Unsubscribe:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 89803 invoked by uid 89); 9 Feb 2016 10:02:46 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=BAYES_00,
 RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2
 spammy=technically, checkin, xa0, states
X-HELO: mx1.redhat.com
Subject: Re: [PATCH] BZ #19575: Clarify status of entries in GB 18030-2005.
To: Andreas Schwab
References: <56B8FA69.8030508@redhat.com> <87mvrakhab.fsf@linux-m68k.org>
 <56B90D0C.7090000@redhat.com> <87a8nakfq6.fsf@linux-m68k.org>
 <56B92BC9.7010103@redhat.com>
Cc: GNU C Library
From: "Carlos O'Donell"
X-Enigmail-Draft-Status: N1110
Message-ID: <56B9B942.2030203@redhat.com>
Date: Tue, 9 Feb 2016 05:02:42 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.4.0
MIME-Version: 1.0
In-Reply-To:

On 02/09/2016 03:55 AM, Andreas Schwab wrote:
> "Carlos O'Donell" writes:
>
>> On 02/08/2016 05:19 PM, Andreas Schwab wrote:
>>> "Carlos O'Donell" writes:
>>>
>>>> This patch is only to clarify why these entries are being mapped
>>>> differently than in the original GB 18030-2005 standard.
>>>
>>> They aren't.
>>
>> Do you have a copy of the standard to verify that?
>
> See charset/data/ucm/gb-18030-2005.ucm in ICU.

That's not a copy of the standard.

"CJKV Information Processing" by Dr.
Ken Lunde on page 108 explicitly states that GB-18030-2005 has 24 PUA
mappings that with Unicode 4.1 or newer can be mapped to non-PUA
equivalents and he describes the 24 characters, and the ICU ucm data
does exactly that.

This does not match the published standard, but that is OK, it's best
practice not to use PUA mappings if you can avoid it when later Unicode
versions include non-PUA equivalents (as we do also in glibc).

All I want to clarify in the glibc version of these files is that the
data is not identical to the standard as published.

v2 of the patch follows.

OK to checkin?

Cheers,
Carlos.

2016-02-09  Carlos O'Donell

	[BZ #19575]
	* charmaps/GB18030: Document PUA to non-PUA equivalents.

diff --git a/localedata/charmaps/GB18030 b/localedata/charmaps/GB18030
index 863a123..85a15fe 100644
--- a/localedata/charmaps/GB18030
+++ b/localedata/charmaps/GB18030
@@ -57234,6 +57234,22 @@ CHARMAP
 /xa6/xbe
 /xa6/xbf
 /xa6/xc0
+% The newest GB 18030-2005 standard still uses some private use area
+% code points. Any implementation which has Unicode 4.1 or newer
+% support should not use these PUA code points, and instead should
+% map these entries to their equivalent non-PUA code points. There
+% are 24 ideograms in GB 18030-2005 which have non-PUA equivalents.
+% In glibc we only support roundtrip code points, and so must choose
+% between supporting the old PUA code points, or using the newer
+% non-PUA code points. We choose to use the non-PUA code points to
+% be compatible with ICU's similar choice. In choosing the non-PUA
+% code points we can no longer convert the old PUA code points back
+% to GB-18030-2005 (technically only fixable if we added support
+% for non-roundtrip code points e.g. ICU's "fallback mapping").
+% The recommendation to use the non-PUA code points, where available,
+% is based on "CJKV Information Processing" 2nd Ed. by Dr. Ken Lunde.
+%
+% These 10 PUA mappings use equivalents from to .
 % /xa6/xd9
 % /xa6/xda
 % /xa6/xdb
@@ -57371,6 +57387,7 @@ CHARMAP
 /xd7/xfd
 /xd7/xfe
 /x83/x36/xc9/x34
+% These 3 PUA mappings use equivalents , and .
 % /xfe/x51
 % /xfe/x52
 % /xfe/x53
@@ -57379,6 +57396,7 @@ CHARMAP
 /x83/x36/xc9/x37
 /x83/x36/xc9/x38
 /x83/x36/xc9/x39
+% This 1 PUA mapping uses the equivalent .
 % /xfe/x59
 /x83/x36/xca/x30
 /x83/x36/xca/x31
@@ -57387,17 +57405,20 @@ CHARMAP
 /x83/x36/xca/x34
 /x83/x36/xca/x35
 /x83/x36/xca/x36
+% This 1 PUA mapping uses the equivalent .
 % /xfe/x61
 /x83/x36/xca/x37
 /x83/x36/xca/x38
 /x83/x36/xca/x39
 /x83/x36/xcb/x30
+% These 2 PUA mappings use the equivalents and .
 % /xfe/x66
 % /xfe/x67
 /x83/x36/xcb/x31
 /x83/x36/xcb/x32
 /x83/x36/xcb/x33
 /x83/x36/xcb/x34
+% These 2 PUA mappings use the equivalents and .
 % /xfe/x6c
 % /xfe/x6d
 /x83/x36/xcb/x35
@@ -57408,6 +57429,7 @@ CHARMAP
 /x83/x36/xcc/x30
 /x83/x36/xcc/x31
 /x83/x36/xcc/x32
+% This 1 PUA mapping uses the equivalent .
 % /xfe/x76
 /x83/x36/xcc/x33
 /x83/x36/xcc/x34
@@ -57416,6 +57438,7 @@ CHARMAP
 /x83/x36/xcc/x37
 /x83/x36/xcc/x38
 /x83/x36/xcc/x39
+% This 1 PUA mapping uses the equivalent .
 % /xfe/x7e
 /x83/x36/xcd/x30
 /x83/x36/xcd/x31
@@ -57433,6 +57456,7 @@ CHARMAP
 /x83/x36/xce/x33
 /x83/x36/xce/x34
 /x83/x36/xce/x35
+% These 2 PUA mappings use the equivalents and .
 % /xfe/x90
 % /xfe/x91
 /x83/x36/xce/x36
@@ -57449,6 +57473,7 @@ CHARMAP
 /x83/x36/xcf/x37
 /x83/x36/xcf/x38
 /x83/x36/xcf/x39
+% This 1 PUA mapping uses the equivalent .
 % /xfe/xa0
 /x83/x36/xd0/x30
 /x83/x36/xd0/x31
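
The trade-off described in the new charmap comment (a roundtrip-only
table versus ICU-style "fallback mapping") can be sketched roughly as
follows. This is a hypothetical illustration, not glibc or ICU code:
the byte sequence /xa6/xd9 is taken from the patch context, while the
two code points and all function and table names are assumptions made
up for the sketch.

```python
# Hypothetical illustration only; not how glibc or ICU store their tables.
# /xa6/xd9 appears in the patch; both code points below are assumed values.
GB_BYTES = b"\xa6\xd9"
OLD_PUA = 0xE78D   # assumed private use area code point
NON_PUA = 0xFE10   # assumed non-PUA equivalent

# Roundtrip table: each byte sequence pairs with exactly one code point.
ROUNDTRIP = {GB_BYTES: NON_PUA}

# One-way fallback entries: consulted only when encoding, never produced
# when decoding.
FALLBACK = {OLD_PUA: GB_BYTES}

# Inverse of the roundtrip table, used for encoding.
ENCODE = {cp: raw for raw, cp in ROUNDTRIP.items()}


def decode(raw: bytes) -> int:
    """Map a byte sequence to a code point via the roundtrip table."""
    return ROUNDTRIP[raw]


def encode(code_point: int, use_fallback: bool = False) -> bytes:
    """Map a code point to bytes, optionally consulting fallback entries."""
    if code_point in ENCODE:
        return ENCODE[code_point]
    if use_fallback and code_point in FALLBACK:
        return FALLBACK[code_point]
    raise ValueError(f"U+{code_point:04X} has no mapping")
```

With only the roundtrip table, encode(OLD_PUA) raises ValueError, which
mirrors the "can no longer convert the old PUA code points back"
limitation. Passing use_fallback=True lets the old code point encode to
the same bytes, while decode() still returns only the non-PUA code point.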