localedata: add more translit entries
Commit Message
Some of the newer symbols we're using are missing translit entries, which
causes trouble when generating the locales in older encodings.
tr_TR: ₺ -> "TL"
uz_UZ: ʻ -> "'"
common:
֏ -> "AMD"
₪ -> "ILS"
₱ -> "PHP"
₸ -> "KZT"
₾ -> "GEL"
---
localedata/locales/tr_TR | 3 +++
localedata/locales/translit_neutral | 10 ++++++++++
localedata/locales/uz_UZ | 3 +++
3 files changed, 16 insertions(+)
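Why a missing entry breaks a legacy encoding can be sketched in Python. This is a simplified model and not localedef's code. It assumes a single-byte charset such as ISO-8859-9 (Latin-5, the legacy charset for tr_TR) has no slot for ₺ (U+20BA). `TRANSLIT_TR` and `encode_with_translit` are hypothetical names standing in for tr_TR's translit table and the conversion step.

```python
from typing import Dict

# Hypothetical stand-in for tr_TR's translit table after this patch.
TRANSLIT_TR: Dict[str, str] = {"\u20ba": "TL"}  # TURKISH LIRA SIGN -> "TL"


def encode_with_translit(text: str, charset: str, table: Dict[str, str]) -> bytes:
    """Encode text in a legacy charset, falling back to translit rules.

    Simplified model: each character is encoded directly if the charset
    has it; otherwise its translit replacement is encoded instead. With
    no rule for the character, the UnicodeEncodeError propagates, which
    is loosely analogous to the locale generation failure described above.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in table:
                raise  # no translit entry: the character cannot be represented
            out += table[ch].encode(charset)
    return bytes(out)


if __name__ == "__main__":
    print(encode_with_translit("100 \u20ba", "iso8859_9", TRANSLIT_TR))
    try:
        encode_with_translit("100 \u20ba", "iso8859_9", {})
    except UnicodeEncodeError as exc:
        print("no translit entry:", exc.reason)
```

With the rule in place, "100 ₺" becomes b'100 TL'. Without it, the conversion fails even though every other character (including Turkish letters such as ş) is representable in Latin-5.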
Comments
> On Apr 23, 2016, at 2:34 PM, Mike Frysinger <vapier@gentoo.org> wrote:
>
> Some of the newer symbols we're using are missing translit entries which
> causes troubles when generating the locales with older encodings.
>
> tr_TR: ₺ -> "TL"
> uz_UZ: ʻ -> "'"
> common:
> ֏ -> "AMD"
> ₪ -> "ILS"
> ₱ -> "PHP"
> ₸ -> "KZT"
> ₾ -> "GEL"
You can add this if you like:
Tested-by: Khem Raj <raj.khem@gmail.com>
I tested this patch and now all broken locales that I was excluding are fixed.
> ---
> localedata/locales/tr_TR | 3 +++
> localedata/locales/translit_neutral | 10 ++++++++++
> localedata/locales/uz_UZ | 3 +++
> 3 files changed, 16 insertions(+)
>
> diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR
> index 3ef50dd..3d06d0e 100644
> --- a/localedata/locales/tr_TR
> +++ b/localedata/locales/tr_TR
> @@ -4229,6 +4229,9 @@ class "combining_level3"; /
>
> translit_start
> include "translit_combining";""
> +
> +% TURKISH LIRA SIGN
> +<U20BA> "<U0054><U004C>"
> translit_end
> END LC_CTYPE
>
> diff --git a/localedata/locales/translit_neutral b/localedata/locales/translit_neutral
> index b15d01f..e3639c0 100644
> --- a/localedata/locales/translit_neutral
> +++ b/localedata/locales/translit_neutral
> @@ -353,6 +353,8 @@ include "translit_wide";""
> <U02DC> <U007E>
> % DOUBLE ACUTE ACCENT
> <U02DD> "<U0027><U0027>"
> +% ARMENIAN DRAM SIGN
> +<U058F> "<U0041><U004D><U0044>"
> % LATIN LETTER SMALL CAPITAL A
> <U1D00> <U0041>
> % LATIN LETTER SMALL CAPITAL AE
> @@ -561,20 +563,28 @@ include "translit_wide";""
> <U20A7> "<U0050><U0074><U0073>"
> % WON SIGN
> <U20A9> "<U004B><U0052><U0057>"
> +% NEW SHEQEL SIGN
> +<U20AA> "<U0049><U004C><U0053>"
> % DONG SIGN
> <U20AB> "<U0110><U1ED3><U006E><U0067>";"<U0044><U006F><U006E><U0067>"
> % EURO SIGN
> <U20AC> "<U0045><U0055><U0052>"
> % DRACHMA SIGN
> <U20AF> "<U0047><U0052><U0044>"
> +% PESO SIGN
> +<U20B1> "<U0050><U0048><U0050>"
> % HRYVNIA SIGN
> <U20B4> "<U0433><U0440><U043D>";"<U0055><U0041><U0048>"
> +% TENGE SIGN
> +<U20B8> "<U004B><U005A><U0054>"
> % INDIAN RUPEE SIGN
> <U20B9> "<U0049><U004E><U0052>"
> % TURKISH LIRA SIGN
> <U20BA> "<U0054><U004C>"
> % RUBLE SIGN
> <U20BD> "<U0440><U0443><U0431>";"<U0052><U0055><U0042>"
> +% GEORGIAN LARI SIGN
> +<U20BE> "<U0047><U0045><U004C>"
> % PRESCRIPTION TAKE
> <U211E> "<U0052><U0078>"
> % TRADE MARK SIGN
> diff --git a/localedata/locales/uz_UZ b/localedata/locales/uz_UZ
> index b76a281..c4e7590 100644
> --- a/localedata/locales/uz_UZ
> +++ b/localedata/locales/uz_UZ
> @@ -147,6 +147,9 @@ include "translit_combining";""
> <U049A> "<U0051>" % k,= -> q
> <U0492> "<U0047><U0027>" % g-= -> g'
> <U04B2> "<U0048>" % h,= -> h
> +
> +% various punctuation
> +<U02BB> "<U0027>" % ‘ -> '
> translit_end
> END LC_CTYPE
>
> --
> 2.7.4
>
Hi,
On 2016-04-24 00:34, Mike Frysinger wrote:
> Some of the newer symbols we're using are missing translit entries which
> causes troubles when generating the locales with older encodings.
>
> tr_TR: ₺ -> "TL"
> uz_UZ: ʻ -> "'"
> common:
> ֏ -> "AMD"
> ₪ -> "ILS"
> ₱ -> "PHP"
> ₸ -> "KZT"
> ₾ -> "GEL"
The new common parts look good, but I wonder: when should we include
translit_neutral in locale files, and when should we instead add individual
rules to them (rules that are already defined in translit_neutral)?
Thanks,
On 25 Apr 2016 16:16, Marko Myllynen wrote:
> On 2016-04-24 00:34, Mike Frysinger wrote:
> > Some of the newer symbols we're using are missing translit entries which
> > causes troubles when generating the locales with older encodings.
> >
> > tr_TR: ₺ -> "TL"
> > uz_UZ: ʻ -> "'"
> > common:
> > ֏ -> "AMD"
> > ₪ -> "ILS"
> > ₱ -> "PHP"
> > ₸ -> "KZT"
> > ₾ -> "GEL"
>
> The new common parts look good but I wonder when should we include
> translit_neutral in locale files and when to add individual rules into
> them instead (which are already defined in translit_neutral)?
the vast majority of locales leverage translit_neutral (by way of copying
i18n). the exceptions are:
az_AZ
crh_UA
km_KH
ku_TR
tr_CY
tr_TR
tt_RU@iqtelif
all except km_KH eventually copy tr_TR. seems like it should be possible
to rebase these two (km_KH and tr_TR) on top of translit_neutral (or at
least i18n), but i have no idea why they weren't in the first place.
considering how hacky km_KH has been (looks like it was copied from lo_LA
and then hacked to bits to get a little km_KH), i wouldn't be surprised
if it was never meant to stand by itself.
-mike
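Why tr_TR needs its own ₺ rule even though translit_neutral already has one follows from how translit includes chain. The Python sketch below is a simplified model. It assumes, based on our reading of localedef's lookup, that a table's own rules are checked first and its included tables are then checked in include order. `TranslitTable` and the locale objects are hypothetical, and the i18n-based include chain is collapsed into a single include.

```python
from typing import Dict, List, Optional


class TranslitTable:
    """Hypothetical model of a translit_start ... translit_end section."""

    def __init__(self, name: str, rules: Optional[Dict[str, str]] = None,
                 includes: Optional[List["TranslitTable"]] = None) -> None:
        self.name = name
        self.rules = rules or {}
        self.includes = includes or []

    def lookup(self, ch: str) -> Optional[str]:
        # Assumption: own rules first, then each included table in order
        # (depth-first). Returns None when no table has a rule.
        if ch in self.rules:
            return self.rules[ch]
        for inc in self.includes:
            hit = inc.lookup(ch)
            if hit is not None:
                return hit
        return None


# Only entries relevant to this thread are modeled; real contents omitted.
translit_combining = TranslitTable("translit_combining")
translit_neutral = TranslitTable("translit_neutral", {"\u20ba": "TL", "\u20aa": "ILS"})

# Most locales reach translit_neutral (per the thread, by copying i18n).
typical_locale = TranslitTable("typical", includes=[translit_neutral, translit_combining])

# tr_TR only includes translit_combining, so it needs a local rule.
tr_TR_before = TranslitTable("tr_TR", includes=[translit_combining])
tr_TR_after = TranslitTable("tr_TR", {"\u20ba": "TL"}, includes=[translit_combining])

if __name__ == "__main__":
    for table in (typical_locale, tr_TR_before, tr_TR_after):
        print(table.name, table.lookup("\u20ba"))
```

Under this model, rebasing tr_TR onto translit_neutral (as suggested above) would make its local ₺ rule redundant, because the lookup would find the neutral one.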
Hi,
On 2016-04-26 07:40, Mike Frysinger wrote:
> On 25 Apr 2016 16:16, Marko Myllynen wrote:
>> On 2016-04-24 00:34, Mike Frysinger wrote:
>>> Some of the newer symbols we're using are missing translit entries which
>>> causes troubles when generating the locales with older encodings.
>>>
>>> tr_TR: ₺ -> "TL"
>>> uz_UZ: ʻ -> "'"
>>> common:
>>> ֏ -> "AMD"
>>> ₪ -> "ILS"
>>> ₱ -> "PHP"
>>> ₸ -> "KZT"
>>> ₾ -> "GEL"
>>
>> The new common parts look good but I wonder when should we include
>> translit_neutral in locale files and when to add individual rules into
>> them instead (which are already defined in translit_neutral)?
>
> the vast majority of locales leverage translit_neutral (by way of copying
> i18n). the exceptions are:
> az_AZ
> crh_UA
> km_KH
> ku_TR
> tr_CY
> tr_TR
> tt_RU@iqtelif
>
> all except km_KH eventually copy tr_TR. seems like it should be possible
> to rebase these two on top of translit_neutral (or at least i18n), but i
> have no idea why they didn't in the first place. considering how hacky
> km_KH has been (looks like it was copied from lo_LA and then hacked to
> bits to get a little km_KH), i wouldn't be surprised if it wasn't meant
> to be by itself.
Ok, thanks for checking. I think this can be investigated in more detail
later; for now, your patch is the right thing to do.
Cheers,
+1
cjl
On Tue, Apr 26, 2016 at 2:04 AM, Marko Myllynen <myllynen@redhat.com> wrote:
> Hi,
>
> On 2016-04-26 07:40, Mike Frysinger wrote:
>> On 25 Apr 2016 16:16, Marko Myllynen wrote:
>>> On 2016-04-24 00:34, Mike Frysinger wrote:
>>>> Some of the newer symbols we're using are missing translit entries which
>>>> causes troubles when generating the locales with older encodings.
>>>>
>>>> tr_TR: ₺ -> "TL"
>>>> uz_UZ: ʻ -> "'"
>>>> common:
>>>> ֏ -> "AMD"
>>>> ₪ -> "ILS"
>>>> ₱ -> "PHP"
>>>> ₸ -> "KZT"
>>>> ₾ -> "GEL"
>>>
>>> The new common parts look good but I wonder when should we include
>>> translit_neutral in locale files and when to add individual rules into
>>> them instead (which are already defined in translit_neutral)?
>>
>> the vast majority of locales leverage translit_neutral (by way of copying
>> i18n). the exceptions are:
>> az_AZ
>> crh_UA
>> km_KH
>> ku_TR
>> tr_CY
>> tr_TR
>> tt_RU@iqtelif
>>
>> all except km_KH eventually copy tr_TR. seems like it should be possible
>> to rebase these two on top of translit_neutral (or at least i18n), but i
>> have no idea why they didn't in the first place. considering how hacky
>> km_KH has been (looks like it was copied from lo_LA and then hacked to
>> bits to get a little km_KH), i wouldn't be surprised if it wasn't meant
>> to be by itself.
>
> Ok, thanks for checking, I think this can be investigated later in more
> detail and now your patch is the right thing to do.
>
> Cheers,
>
> --
> Marko Myllynen
On 26 Apr 2016 09:04, Marko Myllynen wrote:
> On 2016-04-26 07:40, Mike Frysinger wrote:
> > On 25 Apr 2016 16:16, Marko Myllynen wrote:
> >> On 2016-04-24 00:34, Mike Frysinger wrote:
> >>> Some of the newer symbols we're using are missing translit entries which
> >>> causes troubles when generating the locales with older encodings.
> >>>
> >>> tr_TR: ₺ -> "TL"
> >>> uz_UZ: ʻ -> "'"
> >>> common:
> >>> ֏ -> "AMD"
> >>> ₪ -> "ILS"
> >>> ₱ -> "PHP"
> >>> ₸ -> "KZT"
> >>> ₾ -> "GEL"
> >>
> >> The new common parts look good but I wonder when should we include
> >> translit_neutral in locale files and when to add individual rules into
> >> them instead (which are already defined in translit_neutral)?
> >
> > the vast majority of locales leverage translit_neutral (by way of copying
> > i18n). the exceptions are:
> > az_AZ
> > crh_UA
> > km_KH
> > ku_TR
> > tr_CY
> > tr_TR
> > tt_RU@iqtelif
> >
> > all except km_KH eventually copy tr_TR. seems like it should be possible
> > to rebase these two on top of translit_neutral (or at least i18n), but i
> > have no idea why they didn't in the first place. considering how hacky
> > km_KH has been (looks like it was copied from lo_LA and then hacked to
> > bits to get a little km_KH), i wouldn't be surprised if it wasn't meant
> > to be by itself.
>
> Ok, thanks for checking, I think this can be investigated later in more
> detail and now your patch is the right thing to do.
i've filed two bugs for these so we don't lose track:
https://sourceware.org/bugzilla/show_bug.cgi?id=20008 (km_KH)
https://sourceware.org/bugzilla/show_bug.cgi?id=20009 (tr_TR)
-mike
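diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR
index 3ef50dd..3d06d0e 100644
--- a/localedata/locales/tr_TR
+++ b/localedata/locales/tr_TR
@@ -4229,6 +4229,9 @@ class "combining_level3"; /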
translit_start
include "translit_combining";""
+
+% TURKISH LIRA SIGN
+<U20BA> "<U0054><U004C>"
translit_end
END LC_CTYPE
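diff --git a/localedata/locales/translit_neutral b/localedata/locales/translit_neutral
index b15d01f..e3639c0 100644
--- a/localedata/locales/translit_neutral
+++ b/localedata/locales/translit_neutral
@@ -353,6 +353,8 @@ include "translit_wide";""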
<U02DC> <U007E>
% DOUBLE ACUTE ACCENT
<U02DD> "<U0027><U0027>"
+% ARMENIAN DRAM SIGN
+<U058F> "<U0041><U004D><U0044>"
% LATIN LETTER SMALL CAPITAL A
<U1D00> <U0041>
% LATIN LETTER SMALL CAPITAL AE
@@ -561,20 +563,28 @@ include "translit_wide";""
<U20A7> "<U0050><U0074><U0073>"
% WON SIGN
<U20A9> "<U004B><U0052><U0057>"
+% NEW SHEQEL SIGN
+<U20AA> "<U0049><U004C><U0053>"
% DONG SIGN
<U20AB> "<U0110><U1ED3><U006E><U0067>";"<U0044><U006F><U006E><U0067>"
% EURO SIGN
<U20AC> "<U0045><U0055><U0052>"
% DRACHMA SIGN
<U20AF> "<U0047><U0052><U0044>"
+% PESO SIGN
+<U20B1> "<U0050><U0048><U0050>"
% HRYVNIA SIGN
<U20B4> "<U0433><U0440><U043D>";"<U0055><U0041><U0048>"
+% TENGE SIGN
+<U20B8> "<U004B><U005A><U0054>"
% INDIAN RUPEE SIGN
<U20B9> "<U0049><U004E><U0052>"
% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
% RUBLE SIGN
<U20BD> "<U0440><U0443><U0431>";"<U0052><U0055><U0042>"
+% GEORGIAN LARI SIGN
+<U20BE> "<U0047><U0045><U004C>"
% PRESCRIPTION TAKE
<U211E> "<U0052><U0078>"
% TRADE MARK SIGN
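diff --git a/localedata/locales/uz_UZ b/localedata/locales/uz_UZ
index b76a281..c4e7590 100644
--- a/localedata/locales/uz_UZ
+++ b/localedata/locales/uz_UZ
@@ -147,6 +147,9 @@ include "translit_combining";""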
<U049A> "<U0051>" % k,= -> q
<U0492> "<U0047><U0027>" % g-= -> g'
<U04B2> "<U0048>" % h,= -> h
+
+% various punctuation
+<U02BB> "<U0027>" % ‘ -> '
translit_end
END LC_CTYPE
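Each added line maps a source character, written in localedef's `<Uxxxx>` notation, to one or more replacement strings separated by `;`. As we understand the format, these alternatives are tried in order. The Python sketch below decodes such a line so the entries can be checked against the summary in the commit message. `parse_translit_line` is a hypothetical helper. It simplifies the syntax by treating everything after the first `%` as a comment and by ignoring escape sequences.

```python
import re
from typing import List, Optional, Tuple

# Matches localedef symbolic names such as <U20BA> (4 to 8 hex digits).
_UCS = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_ucs(text: str) -> str:
    """Replace every <Uxxxx> symbol with the character it names."""
    return _UCS.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_translit_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (source_char, [alternatives]), or None for comment/blank lines."""
    code = line.split("%", 1)[0].strip()  # simplification: '%' starts a comment
    if not code:
        return None
    src, rest = code.split(None, 1)
    # Alternatives are ';'-separated; each is a quoted string or a bare symbol.
    alternatives = [decode_ucs(alt.strip().strip('"')) for alt in rest.split(";")]
    return decode_ucs(src), alternatives


if __name__ == "__main__":
    for line in (
        '<U20BA> "<U0054><U004C>"',
        '<U058F> "<U0041><U004D><U0044>"',
        '<U20BD> "<U0440><U0443><U0431>";"<U0052><U0055><U0042>"',
        '<U02BB> "<U0027>" % comment',
    ):
        print(parse_translit_line(line))
```

For the TURKISH LIRA SIGN line it yields ('₺', ['TL']). For the RUBLE SIGN line it shows both alternatives: the Cyrillic "руб" and the Latin "RUB".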