localedata: add more translit entries

Message ID 1461447298-13571-1-git-send-email-vapier@gentoo.org
State Committed
Delegated to: Mike Frysinger

Commit Message

Mike Frysinger April 23, 2016, 9:34 p.m. UTC
Some of the newer symbols we're using are missing translit entries, which
causes trouble when generating the locales with older encodings.

tr_TR: ₺ -> "TL"
uz_UZ: ʻ -> "'"
common:
  ֏ -> "AMD"
  ₪ -> "ILS"
  ₱ -> "PHP"
  ₸ -> "KZT"
  ₾ -> "GEL"
---
 localedata/locales/tr_TR            |  3 +++
 localedata/locales/translit_neutral | 10 ++++++++++
 localedata/locales/uz_UZ            |  3 +++
 3 files changed, 16 insertions(+)
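
The problem the commit message describes can be illustrated outside glibc. The sketch below is an assumption-laden illustration in Python, not how localedef works internally: it shows that a legacy charset such as ISO-8859-9 has no byte for U+20BA, and that a fallback table (the hypothetical FALLBACKS dict, filled with the mappings listed above) lets the text be encoded anyway. The helper name encode_with_fallback is made up for this example.

```python
# Symbols from the commit message and the ASCII strings the patch maps them to.
# FALLBACKS and encode_with_fallback are illustrative names, not glibc APIs.
FALLBACKS = {
    "\u20ba": "TL",   # TURKISH LIRA SIGN (tr_TR)
    "\u02bb": "'",    # MODIFIER LETTER TURNED COMMA (uz_UZ)
    "\u058f": "AMD",  # ARMENIAN DRAM SIGN
    "\u20aa": "ILS",  # NEW SHEQEL SIGN
    "\u20b1": "PHP",  # PESO SIGN
    "\u20b8": "KZT",  # TENGE SIGN
    "\u20be": "GEL",  # GEORGIAN LARI SIGN
}


def encode_with_fallback(text, charset):
    """Encode text, substituting a fallback string for unencodable symbols."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in FALLBACKS:
                raise  # no rule for this character: the error stays visible
            out += FALLBACKS[ch].encode(charset)
    return bytes(out)


if __name__ == "__main__":
    try:
        "\u20ba".encode("iso8859-9")
    except UnicodeEncodeError as err:
        print("without a fallback:", err.reason)
    print(encode_with_fallback("5 \u20ba", "iso8859-9"))
```

Characters without a rule still raise, which roughly mirrors the situation before the patch: the symbol has no representation in the target charset and nothing to substitute for it.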
  

Comments

Khem Raj April 23, 2016, 10:32 p.m. UTC | #1
> On Apr 23, 2016, at 2:34 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> 
> Some of the newer symbols we're using are missing translit entries which
> causes troubles when generating the locales with older encodings.
> 
> tr_TR: ₺ -> "TL"
> uz_UZ: ʻ -> "'"
> common:
>  ֏ -> "AMD"
>  ₪ -> "ILS"
>  ₱ -> "PHP"
>  ₸ -> "KZT"
>  ₾ -> "GEL"


You can add this if you like:

Tested-by: Khem Raj <raj.khem@gmail.com>

I tested this patch and now all broken locales that I was excluding are fixed.


  
Marko Myllynen April 25, 2016, 1:16 p.m. UTC | #2
Hi,

On 2016-04-24 00:34, Mike Frysinger wrote:
> Some of the newer symbols we're using are missing translit entries which
> causes troubles when generating the locales with older encodings.
> 
> tr_TR: ₺ -> "TL"
> uz_UZ: ʻ -> "'"
> common:
>   ֏ -> "AMD"
>   ₪ -> "ILS"
>   ₱ -> "PHP"
>   ₸ -> "KZT"
>   ₾ -> "GEL"

The new common parts look good, but I wonder when we should include
translit_neutral in locale files and when we should instead add individual
rules to them (rules which are already defined in translit_neutral)?

Thanks,
  
Mike Frysinger April 26, 2016, 4:40 a.m. UTC | #3
On 25 Apr 2016 16:16, Marko Myllynen wrote:
> On 2016-04-24 00:34, Mike Frysinger wrote:
> > Some of the newer symbols we're using are missing translit entries which
> > causes troubles when generating the locales with older encodings.
> > 
> > tr_TR: ₺ -> "TL"
> > uz_UZ: ʻ -> "'"
> > common:
> >   ֏ -> "AMD"
> >   ₪ -> "ILS"
> >   ₱ -> "PHP"
> >   ₸ -> "KZT"
> >   ₾ -> "GEL"
> 
> The new common parts look good but I wonder when should we include
> translit_neutral in locale files and when to add individual rules into
> them instead (which are already defined in translit_neutral)?

the vast majority of locales leverage translit_neutral (by way of copying
i18n).  the exceptions are:
	az_AZ
	crh_UA
	km_KH
	ku_TR
	tr_CY
	tr_TR
	tt_RU@iqtelif

all except km_KH eventually copy tr_TR.  seems like it should be possible
to rebase these two on top of translit_neutral (or at least i18n), but i
have no idea why they didn't in the first place.  considering how hacky
km_KH has been (looks like it was copied from lo_LA and then hacked to
bits to get a little km_KH), i wouldn't be surprised if it wasn't meant
to be by itself.
-mike
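
A check like the one described above (which locales end up pulling in translit_neutral) can be sketched as a walk over copy and include directives. The Python below is only an illustration: the TOY dictionary is made-up file content shaped after the description in this thread, not the real glibc sources, and the sketch ignores which LC_* category a directive sits in, which a real check would need to respect. The names references, reaches and de_DE as a sample locale are assumptions of this example.

```python
import re

# Matches lines such as: copy "tr_TR"   or   include "translit_combining";""
DIRECTIVE = re.compile(r'^\s*(?:copy|include)\s+"([^"]+)"', re.MULTILINE)

# Hypothetical, heavily shortened stand-ins for locale source files.
TOY = {
    "de_DE": 'LC_CTYPE\ncopy "i18n"\nEND LC_CTYPE\n',
    "i18n": 'translit_start\ninclude "translit_neutral";""\ntranslit_end\n',
    "tr_TR": 'translit_start\ninclude "translit_combining";""\ntranslit_end\n',
    "az_AZ": 'LC_CTYPE\ncopy "tr_TR"\nEND LC_CTYPE\n',
    "translit_neutral": "",
    "translit_combining": "",
}


def references(text):
    """Return the names a locale source copies or includes."""
    return DIRECTIVE.findall(text)


def reaches(start, target, files):
    """True if 'target' is reachable from 'start' via copy/include."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name == target:
            return True
        if name in seen or name not in files:
            continue
        seen.add(name)
        stack.extend(references(files[name]))
    return False


if __name__ == "__main__":
    for locale in ("de_DE", "tr_TR", "az_AZ"):
        print(locale, reaches(locale, "translit_neutral", TOY))
```

Run against a real checkout, the dictionary would be replaced by the contents of localedata/locales, and the result would depend on the actual files rather than on this toy data.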
  
Marko Myllynen April 26, 2016, 6:04 a.m. UTC | #4
Hi,

On 2016-04-26 07:40, Mike Frysinger wrote:
> On 25 Apr 2016 16:16, Marko Myllynen wrote:
>> On 2016-04-24 00:34, Mike Frysinger wrote:
>>> Some of the newer symbols we're using are missing translit entries which
>>> causes troubles when generating the locales with older encodings.
>>>
>>> tr_TR: ₺ -> "TL"
>>> uz_UZ: ʻ -> "'"
>>> common:
>>>   ֏ -> "AMD"
>>>   ₪ -> "ILS"
>>>   ₱ -> "PHP"
>>>   ₸ -> "KZT"
>>>   ₾ -> "GEL"
>>
>> The new common parts look good but I wonder when should we include
>> translit_neutral in locale files and when to add individual rules into
>> them instead (which are already defined in translit_neutral)?
> 
> the vast majority of locales leverage translit_neutral (by way of copying
> i18n).  the exceptions are:
> 	az_AZ
> 	crh_UA
> 	km_KH
> 	ku_TR
> 	tr_CY
> 	tr_TR
> 	tt_RU@iqtelif
> 
> all except km_KH eventually copy tr_TR.  seems like it should be possible
> to rebase these two on top of translit_neutral (or at least i18n), but i
> have no idea why they didn't in the first place.  considering how hacky
> km_KH has been (looks like it was copied from lo_LA and then hacked to
> bits to get a little km_KH), i wouldn't be surprised if it wasn't meant
> to be by itself.

OK, thanks for checking. I think this can be investigated in more detail
later; for now your patch is the right thing to do.

Cheers,
  
Chris Leonard April 26, 2016, 12:37 p.m. UTC | #5
+1

cjl

  
Mike Frysinger April 27, 2016, 5:13 a.m. UTC | #6
On 26 Apr 2016 09:04, Marko Myllynen wrote:
> On 2016-04-26 07:40, Mike Frysinger wrote:
> > On 25 Apr 2016 16:16, Marko Myllynen wrote:
> >> On 2016-04-24 00:34, Mike Frysinger wrote:
> >>> Some of the newer symbols we're using are missing translit entries which
> >>> causes troubles when generating the locales with older encodings.
> >>>
> >>> tr_TR: ₺ -> "TL"
> >>> uz_UZ: ʻ -> "'"
> >>> common:
> >>>   ֏ -> "AMD"
> >>>   ₪ -> "ILS"
> >>>   ₱ -> "PHP"
> >>>   ₸ -> "KZT"
> >>>   ₾ -> "GEL"
> >>
> >> The new common parts look good but I wonder when should we include
> >> translit_neutral in locale files and when to add individual rules into
> >> them instead (which are already defined in translit_neutral)?
> > 
> > the vast majority of locales leverage translit_neutral (by way of copying
> > i18n).  the exceptions are:
> > 	az_AZ
> > 	crh_UA
> > 	km_KH
> > 	ku_TR
> > 	tr_CY
> > 	tr_TR
> > 	tt_RU@iqtelif
> > 
> > all except km_KH eventually copy tr_TR.  seems like it should be possible
> > to rebase these two on top of translit_neutral (or at least i18n), but i
> > have no idea why they didn't in the first place.  considering how hacky
> > km_KH has been (looks like it was copied from lo_LA and then hacked to
> > bits to get a little km_KH), i wouldn't be surprised if it wasn't meant
> > to be by itself.
> 
> Ok, thanks for checking, I think this can be investigated later in more
> detail and now your patch is the right thing to do.

i've filed two bugs for these so we don't lose track:
	https://sourceware.org/bugzilla/show_bug.cgi?id=20008 (km_KH)
	https://sourceware.org/bugzilla/show_bug.cgi?id=20009 (tr_TR)
-mike
  

Patch

diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR
index 3ef50dd..3d06d0e 100644
--- a/localedata/locales/tr_TR
+++ b/localedata/locales/tr_TR
@@ -4229,6 +4229,9 @@  class "combining_level3"; /
 
 translit_start
 include "translit_combining";""
+
+% TURKISH LIRA SIGN
+<U20BA> "<U0054><U004C>"
 translit_end
 END LC_CTYPE
 
diff --git a/localedata/locales/translit_neutral b/localedata/locales/translit_neutral
index b15d01f..e3639c0 100644
--- a/localedata/locales/translit_neutral
+++ b/localedata/locales/translit_neutral
@@ -353,6 +353,8 @@  include "translit_wide";""
 <U02DC> <U007E>
 % DOUBLE ACUTE ACCENT
 <U02DD> "<U0027><U0027>"
+% ARMENIAN DRAM SIGN
+<U058F> "<U0041><U004D><U0044>"
 % LATIN LETTER SMALL CAPITAL A
 <U1D00> <U0041>
 % LATIN LETTER SMALL CAPITAL AE
@@ -561,20 +563,28 @@  include "translit_wide";""
 <U20A7> "<U0050><U0074><U0073>"
 % WON SIGN
 <U20A9> "<U004B><U0052><U0057>"
+% NEW SHEQEL SIGN
+<U20AA> "<U0049><U004C><U0053>"
 % DONG SIGN
 <U20AB> "<U0110><U1ED3><U006E><U0067>";"<U0044><U006F><U006E><U0067>"
 % EURO SIGN
 <U20AC> "<U0045><U0055><U0052>"
 % DRACHMA SIGN
 <U20AF> "<U0047><U0052><U0044>"
+% PESO SIGN
+<U20B1> "<U0050><U0048><U0050>"
 % HRYVNIA SIGN
 <U20B4> "<U0433><U0440><U043D>";"<U0055><U0041><U0048>"
+% TENGE SIGN
+<U20B8> "<U004B><U005A><U0054>"
 % INDIAN RUPEE SIGN
 <U20B9> "<U0049><U004E><U0052>"
 % TURKISH LIRA SIGN
 <U20BA> "<U0054><U004C>"
 % RUBLE SIGN
 <U20BD> "<U0440><U0443><U0431>";"<U0052><U0055><U0042>"
+% GEORGIAN LARI SIGN
+<U20BE> "<U0047><U0045><U004C>"
 % PRESCRIPTION TAKE
 <U211E> "<U0052><U0078>"
 % TRADE MARK SIGN
diff --git a/localedata/locales/uz_UZ b/localedata/locales/uz_UZ
index b76a281..c4e7590 100644
--- a/localedata/locales/uz_UZ
+++ b/localedata/locales/uz_UZ
@@ -147,6 +147,9 @@  include "translit_combining";""
 <U049A> "<U0051>"        % k,= -> q
 <U0492> "<U0047><U0027>" % g-= -> g'
 <U04B2> "<U0048>"        % h,= -> h
+
+% various punctuation
+<U02BB> "<U0027>" % ‘ -> '
 translit_end
 END LC_CTYPE
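
For readers unfamiliar with the <Uxxxx> notation used in the hunks above, the following Python sketch decodes such rules into readable form. It is a rough reading aid under a few assumptions, not a parser for the full locale source format: it treats '%' as the comment character (as these files do), assumes no '%' occurs inside a quoted string, and treats the semicolon-separated strings as a list of alternatives. The function names decode and parse_translit are invented for this example.

```python
import re

# One <Uxxxx> symbol, e.g. <U20BA>.
CODEPOINT = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode(token):
    """Turn a run such as '<U0054><U004C>' into the string 'TL'."""
    return "".join(chr(int(h, 16)) for h in CODEPOINT.findall(token))


def parse_translit(lines):
    """Map each source character to its list of replacement strings."""
    table = {}
    for line in lines:
        line = line.split("%", 1)[0].strip()  # drop comments
        if not line.startswith("<U"):
            continue  # skips include lines, blank lines, keywords
        source, _, targets = line.partition(" ")
        table[decode(source)] = [decode(t) for t in targets.strip().split(";")]
    return table


if __name__ == "__main__":
    sample = [
        "% TURKISH LIRA SIGN",
        '<U20BA> "<U0054><U004C>"',
        "% GEORGIAN LARI SIGN",
        '<U20BE> "<U0047><U0045><U004C>"',
        '<U02BB> "<U0027>" % trailing comment',
    ]
    for src, alts in parse_translit(sample).items():
        print(f"U+{ord(src):04X} -> {alts}")
```

Feeding it the added lines from the patch should reproduce the table in the commit message, which makes it a quick way to review such a change by eye.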