[0/2] localedata: Add locale for syr_SY [BZ #27063]

Message ID 20220125013310.182786-1-emil@soleyman.com

Message

Emil Soleyman-Zomalan Jan. 25, 2022, 1:33 a.m. UTC
Please add the Syriac language locale in the country of Syria. This follows the data and patterns set up in CLDR, which are not yet published: https://st.unicode.org/cldr-apps/v#/syr_SY/

I am also a contributor to the Unicode CLDR for Syriac.

Author: Emil Soleyman-Zomalan <emil@soleyman.com>

--

Emil Soleyman-Zomalan (2):
  Add locale for syr_SY
  Add syr_SY to the localedata apparatus

 localedata/Makefile       |   1 +
 localedata/SUPPORTED      |   1 +
 localedata/locales/syr_SY | 197 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 199 insertions(+)
 create mode 100644 localedata/locales/syr_SY

Comments

Mike FABIAN April 6, 2022, 4:32 p.m. UTC | #1
Emil Soleyman-Zomalan <emil@soleyman.com> wrote:

> Please add the Syriac language locale in the country of Syria. This follows the data and patterns setup in CLDR but not yet published: https://st.unicode.org/cldr-apps/v#/syr_SY/
>
> I am also a contributor to the Unicode CLDR for Syriac.
>
> Author: Emil Soleyman-Zomalan <emil@soleyman.com>

Is this about classical Syriac (iso 639-3 code syc) or is this about
modern Syriac:

https://en.wikipedia.org/wiki/ISO_639_macrolanguage#syr

Wikipedia> syr is the ISO 639-3 language code for Syriac.
Wikipedia> There are two individual language codes assigned:
Wikipedia> 
Wikipedia>     aii – Assyrian Neo-Aramaic
Wikipedia>     cld – Chaldean Neo-Aramaic

> LC_MONETARY
> int_curr_symbol   "XDR "
> currency_symbol   "¤"

XDR is a quite weird currency code:

https://en.wikipedia.org/wiki/Special_drawing_rights

¤ is the generic currency sign.

If this is about the modern living language and if the country is SY,
shouldn't the currency be Syrian pound?:

https://en.wikipedia.org/wiki/Syrian_pound

i.e.

int_curr_symbol   "SYP "
currency_symbol   "£S"

https://github.com/unicode-org/cldr/blob/main/seed/main/syr.xml#L1086

has

		<currencies>
			<currency type="SYP">
				<symbol draft="unconfirmed">ل.س.‏</symbol>
			</currency>
		</currencies>

If the country is the modern Syria, then maybe add

int_prefix     "963"

to LC_TELEPHONE

?
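
For readers unfamiliar with how these fields reach applications, here is a minimal sketch of reading the two LC_MONETARY values discussed above from Python. It uses only the standard `locale` module; the helper name `monetary_fields` is hypothetical, and the example queries the "C" locale because that is the only one guaranteed to be installed.

```python
import locale

def monetary_fields(locale_name="C"):
    """Return the two LC_MONETARY fields discussed above for a given locale."""
    # setlocale raises locale.Error if the named locale is not installed.
    locale.setlocale(locale.LC_ALL, locale_name)
    conv = locale.localeconv()
    return {
        "int_curr_symbol": conv["int_curr_symbol"],
        "currency_symbol": conv["currency_symbol"],
    }

# The "C" locale defines no currency, so both fields come back empty.
print(monetary_fields("C"))
```

Passing the name of an installed locale instead of "C" would presumably return whatever `int_curr_symbol` and `currency_symbol` that locale's source file defines, so the choice between "XDR " and "SYP " is directly visible to programs.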
Mike FABIAN April 6, 2022, 4:39 p.m. UTC | #2
Mike FABIAN <mfabian@redhat.com> wrote:

> Emil Soleyman-Zomalan <emil@soleyman.com> wrote:
>
>> Please add the Syriac language locale in the country of Syria. This
>> follows the data and patterns setup in CLDR but not yet published:
>> https://st.unicode.org/cldr-apps/v#/syr_SY/
>>
>> I am also a contributor to the Unicode CLDR for Syriac.
>>
>> Author: Emil Soleyman-Zomalan <emil@soleyman.com>
>
> Is this about classical Syriac (iso 639-3 code syc) or is this about
> modern Syriac:
>
> https://en.wikipedia.org/wiki/ISO_639_macrolanguage#syr
>
> Wikipedia> syr is the ISO 639-3 language code for Syriac.
> Wikipedia> There are two individual language codes assigned:
> Wikipedia> 
> Wikipedia>     aii – Assyrian Neo-Aramaic
> Wikipedia>     cld – Chaldean Neo-Aramaic
>
>> LC_MONETARY
>> int_curr_symbol   "XDR "
>> currency_symbol   "¤"
>
> XDR is a quite weird currency code:
>
> https://en.wikipedia.org/wiki/Special_drawing_rights
>
> ¤ is the generic currency sign.
>
> If this is about the modern living language and if the country is SY,
> shouldn't the currency be Syrian pound?:
>
> https://en.wikipedia.org/wiki/Syrian_pound
>
> i.e.
>
> int_curr_symbol   "SYP "
> currency_symbol   "£S"
>
> https://github.com/unicode-org/cldr/blob/main/seed/main/syr.xml#L1086
>
> has
>
> 		<currencies>
> 			<currency type="SYP">
> 				<symbol draft="unconfirmed">ل.س.‏</symbol>
> 			</currency>
> 		</currencies>
>
> If the country is the modern Syria, then maybe add
>
> int_prefix     "963"
>
> to LC_TELEPHONE
>
> ?

And could you please also add a file localedata/syr_SY.UTF-8.in
containing lines with characters and/or words in Syriac in the correct
sort order?
Emil Soleyman-Zomalan April 6, 2022, 6:12 p.m. UTC | #3
On Wed, Apr 6, 2022, at 11:32, Mike FABIAN wrote:
>
> Is this about classical Syriac (iso 639-3 code syc) or is this about
> modern Syriac:
>
> https://en.wikipedia.org/wiki/ISO_639_macrolanguage#syr
>
> Wikipedia> syr is the ISO 639-3 language code for Syriac.
> Wikipedia> There are two individual language codes assigned:
> Wikipedia> 
> Wikipedia>     aii – Assyrian Neo-Aramaic
> Wikipedia>     cld – Chaldean Neo-Aramaic

The code syr covers both modern and classical Syriac, as it is the default for all literary Syriac.

https://www.syriaca.org/documentation/isostandards.html

>> LC_MONETARY
>> int_curr_symbol   "XDR "
>> currency_symbol   "¤"
>
> XDR is a quite weird currency code:
>
> https://en.wikipedia.org/wiki/Special_drawing_rights
>
> ¤ is the generic currency sign.
>
> If this is about the modern living language and if the country is SY,
> shouldn't the currency be Syrian pound?:
>
> https://en.wikipedia.org/wiki/Syrian_pound
>
> i.e.
>
> int_curr_symbol   "SYP "
> currency_symbol   "£S"

> https://github.com/unicode-org/cldr/blob/main/seed/main/syr.xml#L1086
>
> has
>
> 		<currencies>
> 			<currency type="SYP">
> 				<symbol draft="unconfirmed">ل.س.‏</symbol>
> 			</currency>
> 		</currencies>
>
> If the country is the modern Syria, then maybe add
>
> int_prefix     "963"
>
> to LC_TELEPHONE
>
> ?

The only reason I have shied away from using the Syrian Pound and its symbol, as well as the telephone prefix, is that we have Syriac readers and writers throughout the Middle East, including Iran, Iraq, Syria, Turkey, and of course the diaspora.

These locale settings will apply to all of these readers irrespective of their location. I have attempted to make it generic in that regard.

I would appreciate any guidance though.
 
―
Emil Soleyman-Zomalan
Florian Weimer April 6, 2022, 6:28 p.m. UTC | #4
* Emil Soleyman-Zomalan:

> The only reason I have shied away from using the Syrian Pound and
> corresponding symbol as well as the telephone prefix as we have Syriac
> readers and writers throughout the Middle East including Iran, Iraq,
> Syria, Turkey and of course the diaspora.
>
> These locale settings will apply to all of these readers irrespective
> of their location. I have attempted to make it generic in that regard.

Should it be a country-less locale like eo?

Thanks,
Florian
Emil Soleyman-Zomalan April 6, 2022, 7:10 p.m. UTC | #5
On Wed, Apr 6, 2022, at 13:28, Florian Weimer wrote:
>
> Should it be a country-less locale like eo?

I can tell you that Microsoft Windows has had a Syriac locale since Windows 8 and uses the Syrian Pound as a currency but no telephone prefix.

Given that we are a country-less and region-less people, it might make sense to go with a country-less locale like eo. I know that there are pros and cons to this but nothing is coming to mind right now.

―
Emil Soleyman-Zomalan
Emil Soleyman-Zomalan April 9, 2022, 5:39 p.m. UTC | #6
On Wed, Apr 6, 2022, at 14:10, Emil Soleyman-Zomalan wrote:
>
> I can tell you that Microsoft Windows has had a Syriac locale since 
> Windows 8 and uses the Syrian Pound as a currency but no telephone 
> prefix.
>
> Given that we are a country-less and region-less people, it might make 
> sense to go with a country-less locale like eo. I know that there are 
> pros and cons to this but nothing is coming to mind right now.
>

I have added the syr_SY.UTF-8.in file to bugzilla#27063. I would like to move forward with the country-less version of the Syriac locale.

―
Emil Soleyman-Zomalan, MD FAAEM
Mike FABIAN April 19, 2022, 1:43 p.m. UTC | #7
"Emil Soleyman-Zomalan" <emil@soleyman.com> wrote:

> On Wed, Apr 6, 2022, at 14:10, Emil Soleyman-Zomalan wrote:
>>
>> I can tell you that Microsoft Windows has had a Syriac locale since 
>> Windows 8 and uses the Syrian Pound as a currency but no telephone 
>> prefix.
>>
>> Given that we are a country-less and region-less people, it might make 
>> sense to go with a country-less locale like eo. I know that there are 
>> pros and cons to this but nothing is coming to mind right now.
>>
>
> I have added the syr_SY.UTF-8.in file to bugzilla#27063. I would like
> to say let's move forward with the country-less version of the Syriac
> locale.

I am working on adding this to glibc now. I used the locale attached to
the bug and the syr_SY.UTF-8.in from the bug and renamed it to
syr.UTF-8.in

The sorting test case fails like this:

syr.UTF-8 collate-test FAIL
  --- syr.UTF-8.in      2022-04-19 13:07:31.675953523 +0200
  +++ /local/mfabian/src/glibc-build/localedata/syr.UTF-8.out   2022-04-19 14:42:15.977224807 +0200
  @@ -1,14 +1,14 @@
   ; Symbol         Name                                                        Hex Code
   ; ------+------+---------------------------------------+--------+
  -ܐ        ;       Syriac Letter Alaph                             U+0710
   ܑ         ;       Syriac Letter Superscript Alaph                 U+0711
  +ܐ        ;       Syriac Letter Alaph                             U+0710
   ܒ        ;       Syriac Letter Beth                              U+0712
   ܭ        ;       Syriac Letter Persian Bheth                 U+072D
   ܓ        ;       Syriac Letter Gamal                             U+0713
   ܔ        ;       Syriac Letter Gamal Garshuni                    U+0714
   ܮ        ;       Syriac Letter Persian Ghamal                    U+072E
  -ܕ        ;       Syriac Letter Dalath                                U+0715
   ܖ        ;       Syriac Letter Dotless Dalath Rish       U+0716
  +ܕ        ;       Syriac Letter Dalath                                U+0715
   ܯ        ;       Syriac Letter Persian Dhalath                   U+072F
   ܗ        ;       Syriac Letter He                                U+0717
   ܘ        ;       Syriac Letter Waw                               U+0718

Your locale uses

LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE

i.e. it includes the file with the default Unicode collation.

Your test file has two characters in a different order.

If the default Unicode collation order is OK for you, I would fix the
test file accordingly (my preferred solution; I would deviate from the
default only if necessary).

If these two characters really should be ordered as in your test file,
I would add extra rules to the LC_COLLATE section to achieve that:

LC_COLLATE
copy "iso14651_t1"

... more rules here ...

END LC_COLLATE

But that is extra effort and I wonder whether this is needed.
There is no collation information in CLDR yet, no file

common/collation/syr.xml

exists currently in CLDR.

You said you are also the contributor to the Unicode CLDR for Syriac.
But you didn’t add a collation file there. If that means the default
order is OK, then just doing

LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE

should be enough and I should fix the test file.
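
As a rough illustration of what the failing check does, the sketch below sorts the lines of a test file under a collation key and diffs the result against the original order. This is not glibc's actual test harness. The weight table is hypothetical: it only mirrors the order seen in the test output above (Superscript Alaph before Alaph) and is not taken from iso14651_t1.

```python
import difflib

# Hypothetical weights mirroring the order in the test output above;
# the real order comes from the iso14651_t1 collation data.
WEIGHTS = {"\u0711": 0, "\u0710": 1, "\u0712": 2}

def collation_key(line):
    """Map each character to a weight; unknown characters sort after known ones."""
    return [WEIGHTS.get(ch, 1000 + ord(ch)) for ch in line]

def collate_diff(lines, key=collation_key):
    """Return a unified diff between the given order and the sorted order."""
    expected = sorted(lines, key=key)
    return list(difflib.unified_diff(
        lines, expected, "syr.UTF-8.in", "syr.UTF-8.out", lineterm=""))

# Order as in the submitted test file: Alaph listed before Superscript Alaph.
submitted = ["\u0710 ; Alaph", "\u0711 ; Superscript Alaph", "\u0712 ; Beth"]
for diff_line in collate_diff(submitted):
    print(diff_line)
```

An empty diff means the file is already in collation order, which is the condition the test needs. Reordering the test file so it matches the sorted output is the "fix the test file" option; changing the weights is the analogue of adding rules to LC_COLLATE.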
Emil Soleyman-Zomalan April 19, 2022, 7:42 p.m. UTC | #8
On Tue, Apr 19, 2022, at 08:43, Mike FABIAN wrote:
>
> I am working on adding this to glibc now. I used the locale attached to
> the bug and the syr_SY.UTF-8.in from the bug and renamed it to
> syr.UTF-8.in

...

> But that is extra effort and I wonder whether this is needed.
> There is no collation information in CLDR yet, no file
>
> common/collation/syr.xml
>
> exists currently in CLDR.
>
> You said you are also the contributor to the Unicode CLDR for Syriac.
> But you didn’t add a collation file there. If that means the default
> order is OK, then just doing
>
> LC_COLLATE
> copy "iso14651_t1"
> END LC_COLLATE
>
> should be enough and I should fix the test file.

I have fixed the test file, since the ordering of those two letters won't make a difference in the end; the test should no longer fail (hopefully). I'm working on getting the collation set up in CLDR during their next submission cycle, which starts on May 18.

I have updated bugzilla.

Thank you for your help!

―
Emil Soleyman-Zomalan
Mike FABIAN April 20, 2022, 3:59 p.m. UTC | #9
"Emil Soleyman-Zomalan" <emil@soleyman.com> wrote:

> I have updated bugzilla.

Thank you, looks good to me, except the second line in the test file:

; Symbol	    Name					                Hex Code
; ------+------+---------------------------------------+--------+
ܑ	    ;	    Syriac Letter Superscript Alaph		    U+0711

still makes the test fail because "; S" sorts after "; -".

I removed that line and attached an updated patch to

https://sourceware.org/bugzilla/show_bug.cgi?id=27063
https://sourceware.org/bugzilla/attachment.cgi?id=14074

I think this looks good and I can commit it.
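
The header problem can be reproduced with a small sketch. It compares by Unicode code point, used here only as a stand-in for the locale's collation; for these two lines it gives the same relative order that the message above reports. The shortened header strings and the helper name `in_sorted_order` are illustrative assumptions.

```python
# Shortened stand-ins for the two comment lines at the top of the test file.
header = "; Symbol\tName\tHex Code"
ruler = "; ------+------+--------+"

def in_sorted_order(lines):
    """True if the lines are already in ascending code point order."""
    return lines == sorted(lines)

# "-" (U+002D) sorts before "S" (U+0053), so the ruler would have to come
# first; keeping it as the second line leaves the file out of order.
print(in_sorted_order([header, ruler]))
print(in_sorted_order([header]))
```

Dropping the ruler line, as was done in the updated patch, leaves a single comment line that cannot be out of order with respect to itself.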
Emil Soleyman-Zomalan April 21, 2022, 4:17 p.m. UTC | #10
On Wed, Apr 20, 2022, at 10:59, Mike FABIAN wrote:
> I removed that line and attached an updated patch to
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=27063
> https://sourceware.org/bugzilla/attachment.cgi?id=14074
>
> I think this looks good and I can commit it.

Thank you for all of your help. I appreciate it greatly.

―
Emil Soleyman-Zomalan