BZ #19575: Clarify status of entries in GB 18030-2005.
Commit Message
In bug 19575 Florian Weimer asks about the status of the glibc
support for GB 18030-2005, since ICU and Emacs produce slightly
different results than glibc.
The following patch adds clarifying comments to GB 18030-2005's
character map to explain why glibc has the following mapping and
why it is best practice.
localedata/
2016-02-08 Carlos O'Donell <carlos@redhat.com>
* charmaps/GB18030
---
Cheers,
Carlos.
Comments
On 02/08/2016 09:28 PM, Carlos O'Donell wrote:
> In bug 19575 Florian Weimer asks about the status of the glibc
> support for GB 18030-2005, since ICU and Emacs produce slightly
> different results than glibc.
>
> The following patch adds clarifying comments to GB 18030-2005's
> character map to explain why glibc has the following mapping and
> why it is best practice.
The comments would probably have helped me to understand the situation.
Thanks,
Florian
"Carlos O'Donell" <carlos@redhat.com> writes:
> In bug 19575 Florian Weimer asks about the status of the glibc
> support for GB 18030-2005, since ICU and Emacs produce slightly
> different results than glibc.
Emacs uses the same table as glibc.
> +% The code points from <UFE10> to <UFE19> are an adjustment
> +% of the GB 18030-2005 standard to account for the fact that
> +% with Unicode 4.1 support we can now correctly represent those
> +% entries, which, in the standard, used PUA code points.
There are more differences between GB18030-2000 and GB18030-2005.
Andreas.
On 02/08/2016 04:45 PM, Andreas Schwab wrote:
> "Carlos O'Donell" <carlos@redhat.com> writes:
>
>> In bug 19575 Florian Weimer asks about the status of the glibc
>> support for GB 18030-2005, since ICU and Emacs produce slightly
>> different results than glibc.
>
> Emacs uses the same table as glibc.
Good.
>> +% The code points from <UFE10> to <UFE19> are an adjustment
>> +% of the GB 18030-2005 standard to account for the fact that
>> +% with Unicode 4.1 support we can now correctly represent those
>> +% entries, which, in the standard, used PUA code points.
>
> There are more differences between GB18030-2000 and GB18030-2005.
Agreed.
This patch is only to clarify why these entries are being mapped
differently than in the original GB 18030-2005 standard.
Does the patch seem suitable to you?
Cheers,
Carlos.
"Carlos O'Donell" <carlos@redhat.com> writes:
> This patch is only to clarify why these entries are being mapped
> differently than in the original GB 18030-2005 standard.
They aren't.
Andreas.
On 02/08/2016 05:19 PM, Andreas Schwab wrote:
> "Carlos O'Donell" <carlos@redhat.com> writes:
>
>> This patch is only to clarify why these entries are being mapped
>> differently than in the original GB 18030-2005 standard.
>
> They aren't.
Do you have a copy of the standard to verify that?
Cheers,
Carlos.
"Carlos O'Donell" <carlos@redhat.com> writes:
> On 02/08/2016 05:19 PM, Andreas Schwab wrote:
>> "Carlos O'Donell" <carlos@redhat.com> writes:
>>
>>> This patch is only to clarify why these entries are being mapped
>>> differently than in the original GB 18030-2005 standard.
>>
>> They aren't.
>
> Do you have a copy of the standard to verify that?
See charset/data/ucm/gb-18030-2005.ucm in ICU.
Andreas.
@@ -57234,6 +57234,12 @@ CHARMAP
<UE78A> /xa6/xbe <Private Use>
<UE78B> /xa6/xbf <Private Use>
<UE78C> /xa6/xc0 <Private Use>
+% The newest GB 18030-2005 standard still uses some private use area
+% code points. Any implementation which has Unicode 4.1 or newer
+% support should not use these PUA code points, and instead should
+% map these entries to their equivalent non-PUA code points, which
+% in this case are <UFE10> to <UFE19>. This recommendation is
+% based on "CJKV Information Processing" by Dr. Ken Lunde.
% <UE78D> /xa6/xd9 <Private Use>
% <UE78E> /xa6/xda <Private Use>
% <UE78F> /xa6/xdb <Private Use>
@@ -62997,6 +63003,10 @@ CHARMAP
<UFE0D> /x84/x31/x82/x33 VARIATION SELECTOR-14
<UFE0E> /x84/x31/x82/x34 VARIATION SELECTOR-15
<UFE0F> /x84/x31/x82/x35 VARIATION SELECTOR-16
+% The code points from <UFE10> to <UFE19> are an adjustment
+% of the GB 18030-2005 standard to account for the fact that
+% with Unicode 4.1 support we can now correctly represent those
+% entries, which, in the standard, used PUA code points.
<UFE10> /xa6/xd9 PRESENTATION FORM FOR VERTICAL COMMA
<UFE11> /xa6/xdb PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
<UFE12> /xa6/xda PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
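
The relationship between the commented-out PUA entries and the <UFE10>.. entries in the two hunks above can be checked mechanically. The following is a minimal Python sketch, not part of the patch: it assumes `%` is the charmap comment character (as in the hunks), and the names `parse_charmap` and `pua_replacements` are hypothetical helpers introduced only for illustration. The sample data is limited to the entries visible in this patch.

```python
import re

# Entries copied from the two hunks above, without the leading diff markers.
CHARMAP_LINES = """\
<UE78C> /xa6/xc0 <Private Use>
% <UE78D> /xa6/xd9 <Private Use>
% <UE78E> /xa6/xda <Private Use>
% <UE78F> /xa6/xdb <Private Use>
<UFE10> /xa6/xd9 PRESENTATION FORM FOR VERTICAL COMMA
<UFE11> /xa6/xdb PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
<UFE12> /xa6/xda PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
"""

# One charmap entry: <Uxxxx>, one or more /xHH bytes, then a free-form name.
ENTRY = re.compile(r"^<U([0-9A-Fa-f]{4,8})>\s+((?:/x[0-9A-Fa-f]{2})+)\s+(.*)$")


def parse_charmap(text, comment_char="%"):
    """Return (active, disabled) dicts mapping encoded bytes to a code point.

    'disabled' holds entries that are commented out with comment_char.
    Plain comment lines that are not entries are skipped.
    """
    active, disabled = {}, {}
    for line in text.splitlines():
        line = line.strip()
        target = active
        if line.startswith(comment_char):
            line = line[len(comment_char):].strip()
            target = disabled
        match = ENTRY.match(line)
        if match is None:
            continue
        code_point = int(match.group(1), 16)
        encoded = bytes(int(h, 16) for h in match.group(2).split("/x")[1:])
        target[encoded] = code_point
    return active, disabled


def pua_replacements(active, disabled):
    """Pair each commented-out code point with the active one sharing its bytes."""
    return {disabled[b]: active[b] for b in disabled if b in active}


if __name__ == "__main__":
    active, disabled = parse_charmap(CHARMAP_LINES)
    for pua, repl in sorted(pua_replacements(active, disabled).items()):
        print(f"U+{pua:04X} -> U+{repl:04X}")
```

Note that the pairing follows the byte sequences, not the code point order: in the entries shown, /xa6/xda pairs <UE78E> with <UFE12>, and /xa6/xdb pairs <UE78F> with <UFE11>.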