[1/3] localedata: use same comment_char/escape_char in these files

Message ID 1455954855-26431-1-git-send-email-vapier@gentoo.org
State Committed
Delegated to: Mike Frysinger

Commit Message

Mike Frysinger Feb. 20, 2016, 7:54 a.m. UTC
  These files are small and easy to convert to the comment_char (%) and
  escape_char (/) settings that most other locale files use.
---
 localedata/locales/POSIX       | 289 +++++++++++++++++++++--------------------
 localedata/locales/iso14651_t1 |   6 +-
 2 files changed, 150 insertions(+), 145 deletions(-)
  

Comments

Mike Frysinger Feb. 25, 2016, 8:12 p.m. UTC | #1
ping this series ...
-mike
  
Mike Frysinger March 9, 2016, 10:24 p.m. UTC | #2
On 25 Feb 2016 15:12, Mike Frysinger wrote:
> ping this series ...

ping some more ...
-mike
  
Marko Myllynen March 11, 2016, 10:31 a.m. UTC | #3
Hi,

On 2016-03-10 00:24, Mike Frysinger wrote:
> On 25 Feb 2016 15:12, Mike Frysinger wrote:
>> ping this series ...
> 
> ping some more ...

I think the silence here underlines once again that we simply don't have
enough "resources" in this area when a trivial change doesn't get a
timely review even when the patch is straightforward. Meaning that if we
want to keep the actual locale data in glibc in proper shape, using CLDR
is the only realistic and sustainable way forward.

Carlos and Florian exchanged a few emails about CLDR/Unicode/glibc locale
copyright status; was there still something to be clarified on that front?

Mike's recent patch deprecated the tel/fax fields in LC_IDENTIFICATION; is
there anything else we could / should deprecate or remove?
(There's the PR https://sourceware.org/bugzilla/show_bug.cgi?id=14641 at
least but it's still being discussed, not sure how to deal with that.)

If/when those aspects are agreed upon, is there anything else or can we
then start using Mike's script to sync from CLDR? Perhaps the situation
with day abbreviations was left a bit open?

Mike, do you have a gut feeling for how complete the coverage from the
current CLDR data is per glibc locale, and how many categories / keywords
would still need to be maintained without input from CLDR?

Oh, wrt the patch itself, LGTM.

Thanks,
  
Chris Leonard March 11, 2016, 2:24 p.m. UTC | #4
I know that there are a number of glibc locales that I have
contributed that are not represented in CLDR.  Working with OLPC and
Sugar Labs as I do, we are often on the bleeding edge of supporting
new languages on GNU/Linux-based systems.

cjl
  
Marko Myllynen March 12, 2016, 12:19 p.m. UTC | #5
Hi,

On 2016-03-11 16:24, Chris Leonard wrote:
> I know that there are a number of glibc locales that I have
> contributed that are not represented in CLDR.  Working with OLPC and
> Sugar Labs as I do, we are often on the bleeding edge of supporting
> new languages on GNU/Linux-based systems.

Yes, locales only present in glibc but not in CLDR can't be
automatically updated (and most definitely won't be tossed away).

Do you have a rough estimate of how many such locales there might be?

Thanks,
  
Chris Leonard March 12, 2016, 7:09 p.m. UTC | #6
I think there are approximately 83 glibc locales (lang-country pairs)
not in CLDR.  Some are trivial fixes because the lang is in CLDR; in
most cases, the lang is not in CLDR at all.

The attached spreadsheet has directory listings of the relevant folders
of the most recent releases of both products.  Tab names should be
self-explanatory.

A few glibc locales destined for deletion (pap_AN, iw_IL) are ignored
as they have relevant replacements.

cjl
  
Chris Leonard March 13, 2016, 5:49 p.m. UTC | #7
One fundamental way of looking at this issue is to distill it down to
just unique language codes (ignoring countries, scripts and currency
variants).

Between both projects combined, there are 257 unique language codes*.

115 codes are common between both projects.
79 languages are represented in CLDR only
63 languages are represented in glibc only.

*Note: one exception is the qu/quz conflict over which language code
should represent the Quechua languages of the Andes.  I counted this
as in common, although it will require some resolution going forward,
as will the Aymara ay/ayc choice (there is no existing CLDR locale for
Aymara at present).
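
As a rough sketch of how a tally like the one above could be reproduced
from two directory listings: the helper names (language_of,
compare_language_codes) and the sample locale names below are
illustrative assumptions, not taken from the spreadsheet, and the code
assumes the language code is everything before the first "_", "@" or ".".

```python
def language_of(locale_name):
    """Reduce a locale file name to its bare language code.

    Assumed examples: "quz_PE" -> "quz", "sr_RS@latin" -> "sr",
    "sr_Latn_RS.xml" -> "sr".
    """
    return locale_name.split(".")[0].split("@")[0].split("_")[0]


def compare_language_codes(glibc_codes, cldr_codes):
    """Split two collections of language codes into the three buckets."""
    glibc, cldr = set(glibc_codes), set(cldr_codes)
    return {
        "common": glibc & cldr,       # present in both projects
        "cldr_only": cldr - glibc,    # present in CLDR only
        "glibc_only": glibc - cldr,   # present in glibc only
        "total": glibc | cldr,        # unique codes across both
    }


# Hypothetical, tiny listings just to exercise the helpers.
glibc_listing = ["ayc_PE", "quz_PE", "en_US", "sr_RS@latin"]
cldr_listing = ["en_GB.xml", "sr_Latn_RS.xml", "qu_PE.xml"]

result = compare_language_codes(
    map(language_of, glibc_listing),
    map(language_of, cldr_listing),
)
for bucket in ("common", "cldr_only", "glibc_only"):
    print(bucket, sorted(result[bucket]))

# The three buckets are disjoint, so they must add up to the total;
# the figures quoted above satisfy this: 115 + 79 + 63 == 257.
assert 115 + 79 + 63 == 257
```

Note that a purely mechanical comparison like this treats qu and quz as
different languages, so conflicts of that kind still need to be
reconciled by hand.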

There are very possibly a few other such code selection issues, which I
will look into further.  I have a nagging suspicion that something is
going on with the Sotho language codes of Africa, but I need to
confirm that.  In any event, those wouldn't change the overall numbers
much.

Overall, I would not declare one project the winner over the other in
terms of best representing languages.  Clearly some cross-porting
should be done where possible for the sake of language communities
dependent on either locale type.

I'm here because I work with glibc-dependent language communities, so
that has been my focus.  I have not tried to work with CLDR on
locales.  Anyone here have experience with that?  How
welcoming/responsive are they to people who are trying to act as
intermediaries for minority language communities?

cjl
  
Carlos O'Donell March 14, 2016, 6:06 p.m. UTC | #8
On 03/13/2016 01:49 PM, Chris Leonard wrote:
> I'm here because I work with glibc-dependent language communities, so
> that has been my focus.  I have not tried to work with CLDR on
> locales.  Anyone here have experience with that?  How
> welcoming/responsive are they to people who are trying to act as
> intermediaries for minority language communities?

I don't know.

Though I'd like Red Hat to find out.
  
Carlos O'Donell March 14, 2016, 6:20 p.m. UTC | #9
On 03/11/2016 05:31 AM, Marko Myllynen wrote:
> Hi,
> 
> On 2016-03-10 00:24, Mike Frysinger wrote:
>> On 25 Feb 2016 15:12, Mike Frysinger wrote:
>>> ping this series ...
>>
>> ping some more ...
> 
> I think the silence here underlines once again that we simply don't have
> enough "resources" in this area when a trivial change doesn't get a
> timely review even when the patch is straightforward. Meaning that if we
> want to keep the actual locale data in glibc in proper shape, using CLDR
> is the only realistic and sustainable way forward.

Agreed.

> Carlos and Florian exchanged a few emails about CLDR/Unicode/glibc locale
> copyright status; was there still something to be clarified on that front?

I have sent FSF legal an email requesting clarification on the type
of attribution we need to provide, if any, for using CLDR/Unicode data
in the project. This has nothing to do with copyright status. The FSF does
not collect copyright on locales, so that's fine, but what's not clear is
what *others* think and what *others* require of us to comply with their
own interpretations.

I have not received any response from the FSF.

I have sent another email today.
  
keld@keldix.com March 14, 2016, 7:43 p.m. UTC | #10
On Mon, Mar 14, 2016 at 02:20:08PM -0400, Carlos O'Donell wrote:
> On 03/11/2016 05:31 AM, Marko Myllynen wrote:
> > Hi,
> > 
> > Carlos and Florian exchanged a few emails about CLDR/Unicode/glibc locale
> > copyright status; was there still something to be clarified on that front?
> 
> I have sent FSF legal an email requesting clarification on the type
> of attribution we need to provide, if any, for using CLDR/Unicode data
> in the project. This has nothing to do with copyright status. The FSF does
> not collect copyright on locales, so that's fine, but what's not clear is
> what *others* think and what *others* require of us to comply with their
> own interpretations.

I think it is a problem that the FSF does not claim copyright on the locales,
because then we cannot protect them.

At least a collecting copyright should be applied.

Best regards
keld
  
Mike Frysinger April 11, 2016, 6:58 p.m. UTC | #11
On 13 Mar 2016 13:49, Chris Leonard wrote:
> One fundamental way of looking at this issue is to distill it down to
> just unique language codes (ignoring countries, scripts and currency
> variants).

right -- i don't care nearly as much about the locale combos (lang +
territory).  they do provide unambiguous direction at that point, but
with a little effort, we can still get good data w/out them.

some of the data is territory-specific and lang-independent, so as long
as cldr has details about all the territories glibc uses (and it does
today), then that part is fine.  i don't think we should add any that
are not listed in the cldr since it looks pretty complete (as it pulls
in a number of other standards).

> Between both projects combined, there are 257 unique language codes*.
> 
> 115 codes are common between both projects.
> 79 languages are represented in CLDR only
> 63 languages are represented in glibc only.

as long as cldr has at least a territory-independent lang entry,
we can extract a good amount of detail out of that.

my concerns start when cldr lacks any lang info at all, or even more
problematic, has marked that lang code as deprecated or uses a diff
naming convention.  there appears to be about 65 langs / 71 locales
on the glibc side (ignoring @script variants) that fall into these
buckets.  looks like about the same count as you have.

> *Note: one exception is the qu/quz conflict over which language code
> should represent the Quechua languages of the Andes.  I counted this
> as in common, although it will require some resolution going forward,
> as will the Aymara ay/ayc choice (there is no existing CLDR locale for
> Aymara at present).
> 
> There are very possibly a few other such code selection issues, which I
> will look into further.  I have a nagging suspicion that something is
> going on with the Sotho language codes of Africa, but I need to
> confirm that.  In any event, those wouldn't change the overall numbers
> much.
> 
> Overall, I would not declare one project the winner over the other in
> terms of best representing languages.  Clearly some cross-porting
> should be done where possible for the sake of language communities
> dependent on either locale type.
> 
> I'm here because I work with glibc-dependent language communities, so
> that has been my focus.  I have not tried to work with CLDR on
> locales.  Anyone here have experience with that?  How
> welcoming/responsive are they to people who are trying to act as
> intermediaries for minority language communities?

seeing as you can represent the concerns of these communities better
than probably any of us, it would be great if you could look into the
cldr process.  from my glances around there, it doesn't look *too*
hard to break in and start posting contributions, especially when you
have no one else representing those languages.
-mike
  
Chris Leonard April 11, 2016, 8:45 p.m. UTC | #12
On Mon, Apr 11, 2016 at 2:58 PM, Mike Frysinger <vapier@gentoo.org> wrote:
>
> seeing as you can represent the concerns of these communities better
> than probably any of us, it would be great if you could look into the
> cldr process.  from my glances around there, it doesn't look *too*
> hard to break in and start posting contributions, especially when you
> have no one else representing those languages.
> -mike

One of the joys of working with the Sugar Labs / OLPC community is
that we find ourselves on the bleeding edge of new language-in-Linux
introduction.  The substantial OLPC deployments in Peru have driven
indigenous language work (Aymara, Quechua, Awajun, and others to
come).  The same can be said of our work with deployments in Mexico
and their languages as well as some others.

I am generally sympathetic with the notion of trying to drive both
glibc and CLDR locale development forward together, and I have long
wanted to leverage the excellent work of the "100 locales for Africa"
project (in CLDR) into glibc locales as well.  The CLDR
information/translation requirements are somewhat greater, including
as it does portions of the Debian iso-code project strings (languages,
countries, scripts, etc.).  However it is easy enough to simply
locally host versions of those PO files from the Translation Project
on the Sugar Labs Pootle instance to allow their accumulation to
critical mass for CLDR development in the same manner that I had
created a "glibc-helper.pot" file to collect the core translations of
a glibc locale.

If I can get Pootle language setup information (plural numbers, plural
equation) for the languages that exist in glibc, but not CLDR, I'll
host some CLDR-relevant PO files and start looking to recruit
interested localizers, but it must be understood that it is not
particularly easy to find localizers for these languages; they have a
very small footprint on the Internet.

I'll look into it some more and report back to the list.

cjl
  
Mike Frysinger April 12, 2016, 2:11 a.m. UTC | #13
On 11 Apr 2016 16:45, Chris Leonard wrote:
> On Mon, Apr 11, 2016 at 2:58 PM, Mike Frysinger wrote:
> > seeing as you can represent the concerns of these communities better
> > than probably any of us, it would be great if you could look into the
> > cldr process.  from my glances around there, it doesn't look *too*
> > hard to break in and start posting contributions, especially when you
> > have no one else representing those languages.
> 
> One of the joys of working with the Sugar Labs / OLPC community is
> that we find ourselves on the bleeding edge of new language-in-Linux
> introduction.  The substantial OLPC deployments in Peru have driven
> indigenous language work (Aymara, Quechua, Awajun, and others to
> come).  The same can be said of our work with deployments in Mexico
> and their languages as well as some others.
> 
> I am generally sympathetic with the notion of trying to drive both
> glibc and CLDR locale development forward together, and I have long
> wanted to leverage the excellent work of the "100 locales for Africa"
> project (in CLDR) into glibc locales as well.  The CLDR
> information/translation requirements are somewhat greater, including
> as it does portions of the Debian iso-code project strings (languages,
> countries, scripts, etc.).  However it is easy enough to simply
> locally host versions of those PO files from the Translation Project
> on the Sugar Labs Pootle instance to allow their accumulation to
> critical mass for CLDR development in the same manner that I had
> created a "glibc-helper.pot" file to collect the core translations of
> a glibc locale.
> 
> If I can get Pootle language setup information (plural numbers, plural
> equation) for the languages that exist in glibc, but not CLDR, I'll
> host some CLDR-relevant PO files and start looking to recruit
> interested localizers, but it must be understood that it is not
> particularly easy to find localizers for these languages; they have a
> very small footprint on the Internet.

my only clarification/semi-counter point is that getting languages into
CLDR probably has a wider impact than only getting into glibc.  that db
is pulled by a significant number of projects/companies (and i know for
a fact it's used internally at Google in conjunction with ICU).  if it's
only glibc though, that impact is entirely limited to people using glibc
as i don't think there's that many people downstream from glibc's locale
db.  and we want to try to automate/smooth the process for adding any of
the locales that exist in CLDR today but not glibc.

tl;dr: locales in glibc might get short term returns, but CLDR is the
       only long term answer.
-mike
  
Chris Leonard April 12, 2016, 2:49 a.m. UTC | #14
On Mon, Apr 11, 2016 at 10:11 PM, Mike Frysinger <vapier@gentoo.org> wrote:

>
> my only clarification/semi-counter point is that getting languages into
> CLDR probably has a wider impact than only getting into glibc.  that db
> is pulled by a significant number of projects/companies (and i know for
> a fact it's used internally at Google in conjunction with ICU).  if it's
> only glibc though, that impact is entirely limited to people using glibc
> as i don't think there's that many people downstream from glibc's locale
> db.  and we want to try to automate/smooth the process for adding any of
> the locales that exist in CLDR today but not glibc.
>
> tl;dr: locales in glibc might get short term returns, but CLDR is the
>        only long term answer.

I totally get the utility of CLDR, but I know of 2-3 million users
that are directly downstream of glibc locales on Sugar/GNOME dual-boot
XO laptops.  Those users span an incredible array of languages that
are not commonly used on computers, yet.  Those laptops have shown
remarkable longevity in the field: I've just loaded a newly minted
build on a ten-year-old XO-1 and it runs like a charm.  My two main
reasons for swimming upstream to this project are to help those users
and to "show some love" to the upstream.

Nonetheless, I am onboard with the dual-tracking of locale development
to maximize utility and minimize re-work.

cjl
  

Patch

diff --git a/localedata/locales/POSIX b/localedata/locales/POSIX
index df89cb8..2804bdb 100644
--- a/localedata/locales/POSIX
+++ b/localedata/locales/POSIX
@@ -1,95 +1,98 @@ 
-# POSIX Standard Locale
-#
-# As per ISO/IEC 9945-2:1993 specifications
-# except for these additional identifying comments
-#
-# Source: ISO/IEC JTC1/SC22/WG15
-# Address: C/O DKUUG, Fruebjergvej 3
-#    DK-2100 Copenhagen O, Denmark
-# Contact: Keld Simonsen
-# Email: Keld.Simonsen@dkuug.dk
-# Tel: +45 - 39179944
-# Fax: +45 - 31208948
-# Language: POSIX
-# Territory:
-# Revision: 1.1
-# Date: 1997-03-15
-# Application: general
-# Users: general
-# Charset: ISO646:1993
-# Distribution and use is free, also for
-# commercial purposes.
+comment_char %
+escape_char /
+
+% POSIX Standard Locale
+%
+% As per ISO/IEC 9945-2:1993 specifications
+% except for these additional identifying comments
+%
+% Source: ISO/IEC JTC1/SC22/WG15
+% Address: C/O DKUUG, Fruebjergvej 3
+%    DK-2100 Copenhagen O, Denmark
+% Contact: Keld Simonsen
+% Email: Keld.Simonsen@dkuug.dk
+% Tel: +45 - 39179944
+% Fax: +45 - 31208948
+% Language: POSIX
+% Territory:
+% Revision: 1.1
+% Date: 1997-03-15
+% Application: general
+% Users: general
+% Charset: ISO646:1993
+% Distribution and use is free, also for
+% commercial purposes.
 
 LC_CTYPE
-# The following is the POSIX Locale LC_CTYPE.
-# "alpha" is by default "upper" and "lower"
-# "alnum" is by definiton "alpha" and "digit"
-# "print" is by default "alnum", "punct" and the <U0020> character
-# "graph" is by default "alnum" and "punct"
-#
-upper   <U0041>;<U0042>;<U0043>;<U0044>;<U0045>;<U0046>;<U0047>;<U0048>;\
-        <U0049>;<U004A>;<U004B>;<U004C>;<U004D>;<U004E>;<U004F>;<U0050>;\
-        <U0051>;<U0052>;<U0053>;<U0054>;<U0055>;<U0056>;<U0057>;<U0058>;\
+% The following is the POSIX Locale LC_CTYPE.
+% "alpha" is by default "upper" and "lower"
+% "alnum" is by definiton "alpha" and "digit"
+% "print" is by default "alnum", "punct" and the <U0020> character
+% "graph" is by default "alnum" and "punct"
+%
+upper   <U0041>;<U0042>;<U0043>;<U0044>;<U0045>;<U0046>;<U0047>;<U0048>;/
+        <U0049>;<U004A>;<U004B>;<U004C>;<U004D>;<U004E>;<U004F>;<U0050>;/
+        <U0051>;<U0052>;<U0053>;<U0054>;<U0055>;<U0056>;<U0057>;<U0058>;/
         <U0059>;<U005A>
-#
-lower   <U0061>;<U0062>;<U0063>;<U0064>;<U0065>;<U0066>;<U0067>;<U0068>;\
-        <U0069>;<U006A>;<U006B>;<U006C>;<U006D>;<U006E>;<U006F>;<U0070>;\
-        <U0071>;<U0072>;<U0073>;<U0074>;<U0075>;<U0076>;<U0077>;<U0078>;\
+%
+lower   <U0061>;<U0062>;<U0063>;<U0064>;<U0065>;<U0066>;<U0067>;<U0068>;/
+        <U0069>;<U006A>;<U006B>;<U006C>;<U006D>;<U006E>;<U006F>;<U0070>;/
+        <U0071>;<U0072>;<U0073>;<U0074>;<U0075>;<U0076>;<U0077>;<U0078>;/
         <U0079>;<U007A>
-#
-digit   <U0030>;<U0031>;<U0032>;<U0033>;<U0034>;\
+%
+digit   <U0030>;<U0031>;<U0032>;<U0033>;<U0034>;/
         <U0035>;<U0036>;<U0037>;<U0038>;<U0039>
-#
-space   <U0009>;<U000A>;<U000B>;<U000C>;\
+%
+space   <U0009>;<U000A>;<U000B>;<U000C>;/
         <U000D>;<U0020>
-#
-cntrl   <U0007>;<U0008>;<U0009>;<U000A>;<U000B>;\
-        <U000C>;<U000D>;\
-        <U0000>;<U0001>;<U0002>;<U0003>;<U0004>;<U0005>;<U0006>;<U000E>;\
-        <U000F>;<U0010>;<U0011>;<U0012>;<U0013>;<U0014>;<U0015>;<U0016>;\
-        <U0017>;<U0018>;<U0019>;<U001A>;<U001B>;<U001C>;<U001D>;<U001E>;\
+%
+cntrl   <U0007>;<U0008>;<U0009>;<U000A>;<U000B>;/
+        <U000C>;<U000D>;/
+        <U0000>;<U0001>;<U0002>;<U0003>;<U0004>;<U0005>;<U0006>;<U000E>;/
+        <U000F>;<U0010>;<U0011>;<U0012>;<U0013>;<U0014>;<U0015>;<U0016>;/
+        <U0017>;<U0018>;<U0019>;<U001A>;<U001B>;<U001C>;<U001D>;<U001E>;/
         <U001F>;<U007F>
-#
-punct   <U0021>;<U0022>;<U0023>;\
-        <U0024>;<U0025>;<U0026>;<U0027>;\
-        <U0028>;<U0029>;<U002A>;\
-        <U002B>;<U002C>;<U002D>;<U002E>;<U002F>;\
-        <U003A>;<U003B>;<U003C>;<U003D>;\
-        <U003E>;<U003F>;<U0040>;\
-        <U005B>;<U005C>;<U005D>;\
-        <U005E>;<U005F>;<U0060>;\
+%
+punct   <U0021>;<U0022>;<U0023>;/
+        <U0024>;<U0025>;<U0026>;<U0027>;/
+        <U0028>;<U0029>;<U002A>;/
+        <U002B>;<U002C>;<U002D>;<U002E>;<U002F>;/
+        <U003A>;<U003B>;<U003C>;<U003D>;/
+        <U003E>;<U003F>;<U0040>;/
+        <U005B>;<U005C>;<U005D>;/
+        <U005E>;<U005F>;<U0060>;/
         <U007B>;<U007C>;<U007D>;<U007E>
-#
-xdigit  <U0030>;<U0031>;<U0032>;<U0033>;<U0034>;<U0035>;<U0036>;<U0037>;\
-        <U0038>;<U0039>;<U0041>;<U0042>;<U0043>;<U0044>;<U0045>;<U0046>;\
+%
+xdigit  <U0030>;<U0031>;<U0032>;<U0033>;<U0034>;<U0035>;<U0036>;<U0037>;/
+        <U0038>;<U0039>;<U0041>;<U0042>;<U0043>;<U0044>;<U0045>;<U0046>;/
         <U0061>;<U0062>;<U0063>;<U0064>;<U0065>;<U0066>
-#
+%
 blank   <U0020>;<U0009>
-#
-tolower (<U0041>,<U0061>);(<U0042>,<U0062>);(<U0043>,<U0063>);\
-        (<U0044>,<U0064>);(<U0045>,<U0065>);(<U0046>,<U0066>);\
-        (<U0047>,<U0067>);(<U0048>,<U0068>);(<U0049>,<U0069>);\
-        (<U004A>,<U006A>);(<U004B>,<U006B>);(<U004C>,<U006C>);\
-        (<U004D>,<U006D>);(<U004E>,<U006E>);(<U004F>,<U006F>);\
-        (<U0050>,<U0070>);(<U0051>,<U0071>);(<U0052>,<U0072>);\
-        (<U0053>,<U0073>);(<U0054>,<U0074>);(<U0055>,<U0075>);\
-        (<U0056>,<U0076>);(<U0057>,<U0077>);(<U0058>,<U0078>);\
+%
+tolower (<U0041>,<U0061>);(<U0042>,<U0062>);(<U0043>,<U0063>);/
+        (<U0044>,<U0064>);(<U0045>,<U0065>);(<U0046>,<U0066>);/
+        (<U0047>,<U0067>);(<U0048>,<U0068>);(<U0049>,<U0069>);/
+        (<U004A>,<U006A>);(<U004B>,<U006B>);(<U004C>,<U006C>);/
+        (<U004D>,<U006D>);(<U004E>,<U006E>);(<U004F>,<U006F>);/
+        (<U0050>,<U0070>);(<U0051>,<U0071>);(<U0052>,<U0072>);/
+        (<U0053>,<U0073>);(<U0054>,<U0074>);(<U0055>,<U0075>);/
+        (<U0056>,<U0076>);(<U0057>,<U0077>);(<U0058>,<U0078>);/
         (<U0059>,<U0079>);(<U005A>,<U007A>)
-#
-toupper (<U0061>,<U0041>);(<U0062>,<U0042>);(<U0063>,<U0043>);\
-        (<U0064>,<U0044>);(<U0065>,<U0045>);(<U0066>,<U0046>);\
-        (<U0067>,<U0047>);(<U0068>,<U0048>);(<U0069>,<U0049>);\
-        (<U006A>,<U004A>);(<U006B>,<U004B>);(<U006C>,<U004C>);\
-        (<U006D>,<U004D>);(<U006E>,<U004E>);(<U006F>,<U004F>);\
-        (<U0070>,<U0050>);(<U0071>,<U0051>);(<U0072>,<U0052>);\
-        (<U0073>,<U0053>);(<U0074>,<U0054>);(<U0075>,<U0055>);\
-        (<U0076>,<U0056>);(<U0077>,<U0057>);(<U0078>,<U0058>);\
+%
+toupper (<U0061>,<U0041>);(<U0062>,<U0042>);(<U0063>,<U0043>);/
+        (<U0064>,<U0044>);(<U0065>,<U0045>);(<U0066>,<U0046>);/
+        (<U0067>,<U0047>);(<U0068>,<U0048>);(<U0069>,<U0049>);/
+        (<U006A>,<U004A>);(<U006B>,<U004B>);(<U006C>,<U004C>);/
+        (<U006D>,<U004D>);(<U006E>,<U004E>);(<U006F>,<U004F>);/
+        (<U0070>,<U0050>);(<U0071>,<U0051>);(<U0072>,<U0052>);/
+        (<U0073>,<U0053>);(<U0074>,<U0054>);(<U0075>,<U0055>);/
+        (<U0076>,<U0056>);(<U0077>,<U0057>);(<U0078>,<U0058>);/
         (<U0079>,<U0059>);(<U007A>,<U005A>)
 END LC_CTYPE
 
 LC_COLLATE
-# This is the POSIX Locale definition for the LC_COLLATE category.
-# The order is the same as in the ASCII code set.
+% This is the POSIX Locale definition for the LC_COLLATE category.
+% The order is the same as in the ASCII code set.
 order_start forward
 <U0000>
 <U0001>
@@ -221,13 +224,13 @@  order_start forward
 <U007F>
 UNDEFINED
 order_end
-#
+%
 END LC_COLLATE
 
 LC_MONETARY
-# This is the POSIX Locale definition for
-# the LC_MONETARY category.
-#
+% This is the POSIX Locale definition for
+% the LC_MONETARY category.
+%
 int_curr_symbol     ""
 currency_symbol     ""
 mon_decimal_point   "<U002E>"
@@ -243,92 +246,92 @@  n_cs_precedes       -1
 n_sep_by_space      -1
 p_sign_posn         -1
 n_sign_posn         -1
-#
+%
 END LC_MONETARY
 
 LC_NUMERIC
-# This is the POSIX Locale definition for
-# the LC_NUMERIC category.
-#
+% This is the POSIX Locale definition for
+% the LC_NUMERIC category.
+%
 decimal_point   "<U002E>"
 thousands_sep   ""
 grouping        -1
-#
+%
 END LC_NUMERIC
 
 LC_TIME
-# This is the POSIX Locale definition for
-# the LC_TIME category.
-#
-# Abbreviated weekday names (%s)
-abday   "<U0053><U0075><U006E>";"<U004D><U006F><U006E>";\
-        "<U0054><U0075><U0065>";"<U0057><U0065><U0064>";\
-        "<U0054><U0068><U0075>";"<U0046><U0072><U0069>";\
+% This is the POSIX Locale definition for
+% the LC_TIME category.
+%
+% Abbreviated weekday names (%s)
+abday   "<U0053><U0075><U006E>";"<U004D><U006F><U006E>";/
+        "<U0054><U0075><U0065>";"<U0057><U0065><U0064>";/
+        "<U0054><U0068><U0075>";"<U0046><U0072><U0069>";/
         "<U0053><U0061><U0074>"
-#
-# Full weekday names (%A)
-day     "<U0053><U0075><U006E><U0064><U0061><U0079>";\
-        "<U004D><U006F><U006E><U0064><U0061><U0079>";\
-        "<U0054><U0075><U0065><U0073><U0064><U0061><U0079>";\
-        "<U0057><U0065><U0064><U006E><U0065><U0073><U0064><U0061><U0079>";\
-        "<U0054><U0068><U0075><U0072><U0073><U0064><U0061><U0079>";\
-        "<U0046><U0072><U0069><U0064><U0061><U0079>";\
+%
+% Full weekday names (%A)
+day     "<U0053><U0075><U006E><U0064><U0061><U0079>";/
+        "<U004D><U006F><U006E><U0064><U0061><U0079>";/
+        "<U0054><U0075><U0065><U0073><U0064><U0061><U0079>";/
+        "<U0057><U0065><U0064><U006E><U0065><U0073><U0064><U0061><U0079>";/
+        "<U0054><U0068><U0075><U0072><U0073><U0064><U0061><U0079>";/
+        "<U0046><U0072><U0069><U0064><U0061><U0079>";/
         "<U0053><U0061><U0074><U0075><U0072><U0064><U0061><U0079>"
-#
-# Abbreviated month names (%b)
-abmon   "<U004A><U0061><U006E>";"<U0046><U0065><U0062>";\
-        "<U004D><U0061><U0072>";"<U0041><U0070><U0072>";\
-        "<U004D><U0061><U0079>";"<U004A><U0075><U006E>";\
-        "<U004A><U0075><U006C>";"<U0041><U0075><U0067>";\
-        "<U0053><U0065><U0070>";"<U004F><U0063><U0074>";\
+%
+% Abbreviated month names (%b)
+abmon   "<U004A><U0061><U006E>";"<U0046><U0065><U0062>";/
+        "<U004D><U0061><U0072>";"<U0041><U0070><U0072>";/
+        "<U004D><U0061><U0079>";"<U004A><U0075><U006E>";/
+        "<U004A><U0075><U006C>";"<U0041><U0075><U0067>";/
+        "<U0053><U0065><U0070>";"<U004F><U0063><U0074>";/
         "<U004E><U006F><U0076>";"<U0044><U0065><U0063>"
-#
-# Full month names (%B)
-mon     "<U004A><U0061><U006E><U0075><U0061><U0072><U0079>";\
-        "<U0046><U0065><U0062><U0072><U0075><U0061><U0072><U0079>";\
-        "<U004D><U0061><U0072><U0063><U0068>";\
-        "<U0041><U0070><U0072><U0069><U006C>";\
-        "<U004D><U0061><U0079>";\
-        "<U004A><U0075><U006E><U0065>";\
-        "<U004A><U0075><U006C><U0079>";\
-        "<U0041><U0075><U0067><U0075><U0073><U0074>";\
-        "<U0053><U0065><U0070><U0074><U0065><U006D><U0062><U0065><U0072>";\
-        "<U004F><U0063><U0074><U006F><U0062><U0065><U0072>";\
-        "<U004E><U006F><U0076><U0065><U006D><U0062><U0065><U0072>";\
+%
+% Full month names (%B)
+mon     "<U004A><U0061><U006E><U0075><U0061><U0072><U0079>";/
+        "<U0046><U0065><U0062><U0072><U0075><U0061><U0072><U0079>";/
+        "<U004D><U0061><U0072><U0063><U0068>";/
+        "<U0041><U0070><U0072><U0069><U006C>";/
+        "<U004D><U0061><U0079>";/
+        "<U004A><U0075><U006E><U0065>";/
+        "<U004A><U0075><U006C><U0079>";/
+        "<U0041><U0075><U0067><U0075><U0073><U0074>";/
+        "<U0053><U0065><U0070><U0074><U0065><U006D><U0062><U0065><U0072>";/
+        "<U004F><U0063><U0074><U006F><U0062><U0065><U0072>";/
+        "<U004E><U006F><U0076><U0065><U006D><U0062><U0065><U0072>";/
         "<U0044><U0065><U0063><U0065><U006D><U0062><U0065><U0072>"
-#
-# Equivalent of AM/PM (%p)      "AM"/"PM"
+%
+% Equivalent of AM/PM (%p)      "AM"/"PM"
 am_pm   "<U0041><U004D>";"<U0050><U004D>"
-#
-# Appropriate date and time representation (%c)
-#       "%a %b %e %H:%M:%S %Y"
-d_t_fmt "<U0025><U0061><U0020><U0025><U0062><U0020><U0025><U0065>\
-<U0020><U0025><U0048><U003A><U0025><U004D>\
+%
+% Appropriate date and time representation (%c)
+%       "%a %b %e %H:%M:%S %Y"
+d_t_fmt "<U0025><U0061><U0020><U0025><U0062><U0020><U0025><U0065>/
+<U0020><U0025><U0048><U003A><U0025><U004D>/
 <U003A><U0025><U0053><U0020><U0025><U0059>"
-#
-# Appropriate date representation (%x)   "%m/%d/%y"
+%
+% Appropriate date representation (%x)   "%m/%d/%y"
 d_fmt   "<U0025><U006D><U002F><U0025><U0064><U002F><U0025><U0079>"
-#
-# Appropriate time representation (%X)   "%H:%M:%S"
+%
+% Appropriate time representation (%X)   "%H:%M:%S"
 t_fmt   "<U0025><U0048><U003A><U0025><U004D><U003A><U0025><U0053>"
-#
-# Appropriate 12 h time representation (%r)   "%I:%M:%S %p"
-t_fmt_ampm "<U0025><U0049><U003A><U0025><U004D><U003A><U0025><U0053>\
+%
+% Appropriate 12 h time representation (%r)   "%I:%M:%S %p"
+t_fmt_ampm "<U0025><U0049><U003A><U0025><U004D><U003A><U0025><U0053>/
 <U0020><U0025><U0070>"
-#
-# Appropriate date representation (date(1))   "%a %b %e %H:%M:%S %Z %Y"
+%
+% Appropriate date representation (date(1))   "%a %b %e %H:%M:%S %Z %Y"
 date_fmt	"<U0025><U0061><U0020><U0025><U0062><U0020><U0025><U0065><U0020><U0025><U0048><U003A><U0025><U004D><U003A><U0025><U0053><U0020><U0025><U005A><U0020><U0025><U0059>"
 END LC_TIME
 
 LC_MESSAGES
-# This is the POSIX Locale definition for
-# the LC_NUMERIC category.
-#
+% This is the POSIX Locale definition for
+% the LC_NUMERIC category.
+%
 yesexpr "<U005E><U005B><U0079><U0059><U005D>"
-#
+%
 noexpr  "<U005E><U005B><U006E><U004E><U005D>"
-#
+%
 yesstr  "<U0059><U0065><U0073>"
-#
+%
 nostr   "<U004E><U006F>"
 END LC_MESSAGES
diff --git a/localedata/locales/iso14651_t1 b/localedata/locales/iso14651_t1
index dd694d0..0d10f4f 100644
--- a/localedata/locales/iso14651_t1
+++ b/localedata/locales/iso14651_t1
@@ -1,3 +1,6 @@ 
+comment_char %
+escape_char /
+
 LC_COLLATE
 
 copy "iso14651_t1_common"
@@ -8,7 +11,6 @@  order_start <HAN>;forward;forward;forward;forward,position
 <U4E00> <U4E00>;IGNORE;IGNORE;IGNORE
 .. ..;IGNORE;IGNORE;IGNORE
 <U9FA5> <U9FA5>;IGNORE;IGNORE;IGNORE
-#
 order_end
-#
+
 END LC_COLLATE
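
The mechanical part of the POSIX conversion in this patch can be sketched
roughly as follows. This is only an illustration under stated assumptions:
convert_locale_source is a hypothetical helper, not the script used to
produce the patch, and it handles just the two cases that occur in the
diff above (whole-line comments starting in column one, and a trailing
escape character used for line continuation). It does not touch escape
sequences inside strings, which the POSIX file avoids by spelling
characters as <Uxxxx>.

```python
def convert_locale_source(text, old_comment="#", old_escape="\\",
                          new_comment="%", new_escape="/"):
    """Rewrite a locale source to use new comment/escape characters.

    Assumption: the input relies on the default comment_char (#) and
    escape_char (\\) and does not declare them itself.
    """
    # Declare the new characters up front, followed by a blank line,
    # mirroring the header added by the patch.
    out = [f"comment_char {new_comment}", f"escape_char {new_escape}", ""]
    for line in text.splitlines():
        if line.startswith(old_comment):
            # Whole-line comment: swap only the leading comment character.
            line = new_comment + line[len(old_comment):]
        elif line.endswith(old_escape):
            # Line continuation: swap only the trailing escape character.
            line = line[:-len(old_escape)] + new_escape
        out.append(line)
    return "\n".join(out) + "\n"


# Small made-up fragment in the old style.
sample = "# digits\ndigit   <U0030>;<U0031>;\\\n        <U0032>\n"
print(convert_locale_source(sample), end="")
```

A real conversion would still need a manual review, which is why the
patch is posted as a diff rather than described as a scripted change.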