Message ID | 20220125013310.182786-1-emil@soleyman.com |
---|---|
Series | localedata: Add locale for syr_SY [BZ #27063] |
Emil Soleyman-Zomalan <emil@soleyman.com> wrote:

> Please add the Syriac language locale in the country of Syria. This
> follows the data and patterns set up in CLDR but not yet published:
> https://st.unicode.org/cldr-apps/v#/syr_SY/
>
> I am also a contributor to the Unicode CLDR for Syriac.
>
> Author: Emil Soleyman-Zomalan <emil@soleyman.com>

Is this about classical Syriac (ISO 639-3 code syc) or is this about
modern Syriac?

https://en.wikipedia.org/wiki/ISO_639_macrolanguage#syr

Wikipedia> syr is the ISO 639-3 language code for Syriac.
Wikipedia> There are two individual language codes assigned:
Wikipedia>
Wikipedia> aii – Assyrian Neo-Aramaic
Wikipedia> cld – Chaldean Neo-Aramaic

> LC_MONETARY
> int_curr_symbol "XDR "
> currency_symbol "¤"

XDR is a rather odd currency code:

https://en.wikipedia.org/wiki/Special_drawing_rights

¤ is the generic currency sign.

If this is about the modern living language and the country is SY,
shouldn't the currency be the Syrian pound?

https://en.wikipedia.org/wiki/Syrian_pound

i.e.

int_curr_symbol "SYP "
currency_symbol "£S"

https://github.com/unicode-org/cldr/blob/main/seed/main/syr.xml#L1086

has

<currencies>
  <currency type="SYP">
    <symbol draft="unconfirmed">ل.س.</symbol>
  </currency>
</currencies>

If the country is modern Syria, then maybe add

int_prefix "963"

to LC_TELEPHONE?
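A side note on the strings being debated: int_curr_symbol is a four-character value, the three-letter ISO 4217 code followed by a separator character (conventionally a space), which is why both "XDR " and "SYP " end in a space. The sketch below uses hypothetical Python helpers, not part of glibc, that read simple `keyword "value"` lines from an LC_MONETARY fragment and check only that shape. It assumes literal string values (real locale files may also spell values as <Uxxxx> sequences) and says nothing about which currency is appropriate.

```python
import re
from typing import Dict

# Hypothetical helpers for illustration only; they are not part of glibc.
# Matches lines of the form:  keyword "value"
_KEYWORD_RE = re.compile(r'^\s*(\w+)\s+"([^"]*)"\s*$')


def parse_string_keywords(fragment: str) -> Dict[str, str]:
    """Collect keyword -> string value pairs, skipping every other line."""
    values: Dict[str, str] = {}
    for line in fragment.splitlines():
        m = _KEYWORD_RE.match(line)
        if m:
            values[m.group(1)] = m.group(2)
    return values


def int_curr_symbol_ok(value: str) -> bool:
    """True if value is three uppercase letters plus one separator character.

    Only the format is checked; whether the code is an appropriate ISO 4217
    currency for the locale (SYP, XDR, ...) is a separate question.
    """
    return re.fullmatch(r"[A-Z]{3}.", value) is not None


fragment = """
LC_MONETARY
int_curr_symbol "XDR "
currency_symbol "¤"
END LC_MONETARY
"""
values = parse_string_keywords(fragment)
print(values)  # {'int_curr_symbol': 'XDR ', 'currency_symbol': '¤'}
print(int_curr_symbol_ok(values["int_curr_symbol"]))  # True
print(int_curr_symbol_ok("SYP"))  # False: no separator character
```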
Mike FABIAN <mfabian@redhat.com> wrote:

> Emil Soleyman-Zomalan <emil@soleyman.com> wrote:
>
>> Please add the Syriac language locale in the country of Syria. [...]
>
> Is this about classical Syriac (iso 639-3 code syc) or is this about
> modern Syriac: [...]

And could you please also add a file localedata/syr_SY.UTF-8.in
containing lines with characters and/or words in Syriac in the correct
sort order?
On Wed, Apr 6, 2022, at 11:32, Mike FABIAN wrote:
>
> Is this about classical Syriac (iso 639-3 code syc) or is this about
> modern Syriac:
>
> https://en.wikipedia.org/wiki/ISO_639_macrolanguage#syr
>
> Wikipedia> syr is the ISO 639-3 language code for Syriac.
> Wikipedia> There are two individual language codes assigned:
> Wikipedia>
> Wikipedia> aii – Assyrian Neo-Aramaic
> Wikipedia> cld – Chaldean Neo-Aramaic

syr covers both modern and classical Syriac, as it is the default code
for all literary Syriac:
https://www.syriaca.org/documentation/isostandards.html

>> LC_MONETARY
>> int_curr_symbol "XDR "
>> currency_symbol "¤"
>
> XDR is a quite weird currency code:
>
> https://en.wikipedia.org/wiki/Special_drawing_rights
>
> ¤ is the generic currency sign.
>
> If this is about the modern living language and if the country is SY,
> shouldn't the currency be Syrian pound?:
>
> https://en.wikipedia.org/wiki/Syrian_pound
>
> i.e.
>
> int_curr_symbol "SYP "
> currency_symbol "£S"
> [...]
>
> If the country is the modern Syria, then maybe add
>
> int_prefix "963"
>
> to LC_TELEPHONE
>
> ?

The only reason I have shied away from using the Syrian Pound and
corresponding symbol, as well as the telephone prefix, is that we have
Syriac readers and writers throughout the Middle East, including Iran,
Iraq, Syria, Turkey and of course the diaspora.

These locale settings will apply to all of these readers irrespective
of their location. I have attempted to make them generic in that regard.
I would appreciate any guidance, though.

― Emil Soleyman-Zomalan
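The point that these settings apply to every user of the locale, wherever they live, follows from how programs read them: an application queries the active locale through `localeconv()`, not the user's country. A minimal Python sketch of what an application sees, assuming the requested locale is installed (the function name `monetary_symbols` is hypothetical):

```python
import locale
from typing import Tuple


def monetary_symbols(locale_name: str) -> Tuple[str, str]:
    """Return (int_curr_symbol, currency_symbol) as programs see them.

    Temporarily switches LC_MONETARY to locale_name and restores the
    previous setting afterwards. Raises locale.Error if the locale is
    not installed.
    """
    previous = locale.setlocale(locale.LC_MONETARY)  # query only, no change
    try:
        locale.setlocale(locale.LC_MONETARY, locale_name)
        conv = locale.localeconv()
        return conv["int_curr_symbol"], conv["currency_symbol"]
    finally:
        locale.setlocale(locale.LC_MONETARY, previous)


# The "C" locale defines no currency, so both fields are empty strings.
print(monetary_symbols("C"))  # ('', '')
```

On a system with the new Syriac locale installed, calling this with that locale's name would return whatever values end up in the committed locale file, for every user who selects it.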
* Emil Soleyman-Zomalan:

> The only reason I have shied away from using the Syrian Pound and
> corresponding symbol as well as the telephone prefix as we have Syriac
> readers and writers throughout the Middle East including Iran, Iraq,
> Syria, Turkey and of course the diaspora.
>
> These locale settings will apply to all of these readers irrespective
> of their location. I have attempted to make it generic in that regard.

Should it be a country-less locale like eo?

Thanks,
Florian
On Wed, Apr 6, 2022, at 13:28, Florian Weimer wrote:
>
> Should it be a country-less locale like eo?

I can tell you that Microsoft Windows has had a Syriac locale since
Windows 8 and uses the Syrian Pound as a currency but no telephone
prefix.

Given that we are a country-less and region-less people, it might make
sense to go with a country-less locale like eo. I know that there are
pros and cons to this, but nothing is coming to mind right now.

― Emil Soleyman-Zomalan
On Wed, Apr 6, 2022, at 14:10, Emil Soleyman-Zomalan wrote:
>
> I can tell you that Microsoft Windows has had a Syriac locale since
> Windows 8 and uses the Syrian Pound as a currency but no telephone
> prefix.
>
> Given that we are a country-less and region-less people, it might make
> sense to go with a country-less locale like eo. I know that there are
> pros and cons to this but nothing is coming to mind right now.
>

I have added the syr_SY.UTF-8.in file to bugzilla#27063. I suggest we
move forward with the country-less version of the Syriac locale.

― Emil Soleyman-Zomalan, MD FAAEM
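For context on what "country-less" means in practice: locale names follow the pattern language[_territory][.codeset][@modifier], so a locale like eo, or a plain syr, simply omits the _territory part, and its test file is named after the full locale name (e.g. syr.UTF-8.in rather than syr_SY.UTF-8.in). A small illustrative parser in Python (a hypothetical helper, not glibc code) makes the components explicit:

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# language[_territory][.codeset][@modifier]; every part after the
# language is optional.
_LOCALE_RE = re.compile(
    r"(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?"
)


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name into its components (hypothetical helper)."""
    m = _LOCALE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a locale name: {name!r}")
    return LocaleName(**m.groupdict())


print(parse_locale_name("syr_SY.UTF-8"))
# LocaleName(language='syr', territory='SY', codeset='UTF-8', modifier=None)
print(parse_locale_name("syr.UTF-8"))
# LocaleName(language='syr', territory=None, codeset='UTF-8', modifier=None)
```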
"Emil Soleyman-Zomalan" <emil@soleyman.com> wrote:

> On Wed, Apr 6, 2022, at 14:10, Emil Soleyman-Zomalan wrote:
>> [...]
>
> I have added the syr_SY.UTF-8.in file to bugzilla#27063. I would like
> to say let's move forward with the country-less version of the Syriac
> locale.

I am working on adding this to glibc now. I used the locale attached to
the bug and the syr_SY.UTF-8.in from the bug, renamed to syr.UTF-8.in.

The sorting test case fails like this:

syr.UTF-8 collate-test FAIL
--- syr.UTF-8.in 2022-04-19 13:07:31.675953523 +0200
+++ /local/mfabian/src/glibc-build/localedata/syr.UTF-8.out 2022-04-19 14:42:15.977224807 +0200
@@ -1,14 +1,14 @@
 ; Symbol Name Hex Code
 ; ------+------+---------------------------------------+--------+
-ܐ ; Syriac Letter Alaph U+0710
 ܑ ; Syriac Letter Superscript Alaph U+0711
+ܐ ; Syriac Letter Alaph U+0710
 ܒ ; Syriac Letter Beth U+0712
 ܭ ; Syriac Letter Persian Bheth U+072D
 ܓ ; Syriac Letter Gamal U+0713
 ܔ ; Syriac Letter Gamal Garshuni U+0714
 ܮ ; Syriac Letter Persian Ghamal U+072E
-ܕ ; Syriac Letter Dalath U+0715
 ܖ ; Syriac Letter Dotless Dalath Rish U+0716
+ܕ ; Syriac Letter Dalath U+0715
 ܯ ; Syriac Letter Persian Dhalath U+072F
 ܗ ; Syriac Letter He U+0717
 ܘ ; Syriac Letter Waw U+0718

Your locale uses

LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE

i.e. it includes the file with the default Unicode collation. Your test
file has two characters (ܐ U+0710 and ܕ U+0715) in a different order.

If the default Unicode collation order is OK for you, I would fix the
test file accordingly (my preferred solution; I would deviate from the
default only if necessary).
If these two characters really should be ordered as in your test file,
I would add extra rules to the LC_COLLATE section to achieve that:

LC_COLLATE
copy "iso14651_t1"
... more rules here ...
END LC_COLLATE

But that is extra effort and I wonder whether it is needed. There is no
collation information in CLDR yet; no file

common/collation/syr.xml

currently exists in CLDR.

You said you are also a contributor to the Unicode CLDR for Syriac, but
you didn't add a collation file there. If that means the default order
is OK, then just doing

LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE

should be enough, and I should fix the test file.
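As we understand it, the collate test sorts the lines of the locale's .in file using that locale's collation and compares the result with the file as written; any difference is reported as a failure like the diff shown earlier in this message. The Python sketch below imitates that idea with `locale.strxfrm` and `difflib`. It is an approximation for experimenting locally, not glibc's actual test harness, and `collate_check` is a hypothetical name.

```python
import difflib
import locale
from typing import Callable, List


def collate_check(lines: List[str],
                  key: Callable[[str], object] = locale.strxfrm) -> List[str]:
    """Sort `lines` with `key` and diff the result against the input.

    Returns unified-diff lines; an empty list means the input was already
    in collation order. The default key follows the current LC_COLLATE
    setting, so call locale.setlocale(locale.LC_COLLATE, ...) first.
    """
    sorted_lines = sorted(lines, key=key)
    return list(difflib.unified_diff(lines, sorted_lines, "in", "out",
                                     lineterm=""))


# Code point order (key=str) needs no installed locale; here the two
# lines are already sorted, so no diff is produced.
print(collate_check(["a", "b"], key=str))  # []
```

With the Syriac locale installed, setting LC_COLLATE to it and passing the lines of syr.UTF-8.in would approximate the real check. Even plain code point order puts "; -" before "; S", since '-' (U+002D) precedes 'S' (U+0053).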
On Tue, Apr 19, 2022, at 08:43, Mike FABIAN wrote:
>
> I am working on adding this to glibc now. I used the locale attached to
> the bug and the syr_SY.UTF-8.in from the bug and renamed it to
> syr.UTF-8.in
...
> But that is extra effort and I wonder whether this is needed.
> There is no collation information in CLDR yet, no file
>
> common/collation/syr.xml
>
> exists currently in CLDR.
>
> You said you are also the contributor to the Unicode CLDR for Syriac.
> But you didn't add a collation file there. If that means the default
> order is OK, then just doing
>
> LC_COLLATE
> copy "iso14651_t1"
> END LC_COLLATE
>
> should be enough and I should fix the test file.

I have fixed the test file, since the order of those two letters won't
make a difference in the end; it should not fail now (hopefully).

I'm working on getting the collation set up in CLDR during their next
submission cycle, which starts on May 18.

I have updated bugzilla.

Thank you for your help!

― Emil Soleyman-Zomalan
"Emil Soleyman-Zomalan" <emil@soleyman.com> wrote:
> I have updated bugzilla.
Thank you, looks good to me, except the second line in the test file:
; Symbol Name Hex Code
; ------+------+---------------------------------------+--------+
ܑ ; Syriac Letter Superscript Alaph U+0711
still makes the test fail because "; S" sorts after "; -".
I removed that line and attached an updated patch to
https://sourceware.org/bugzilla/show_bug.cgi?id=27063
https://sourceware.org/bugzilla/attachment.cgi?id=14074
I think this looks good and I can commit it.
On Wed, Apr 20, 2022, at 10:59, Mike FABIAN wrote:
> I removed that line and attached an updated patch to
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=27063
> https://sourceware.org/bugzilla/attachment.cgi?id=14074
>
> I think this looks good and I can commit it.

Thank you for all of your help. I appreciate it greatly.

― Emil Soleyman-Zomalan