[0/1] Arabic scripts: More fixes after the recent import from CLDR-31
Commit Message
Feel free to treat this patch as the second part of the previous
one. [1] [2]
After the recent series of imports of month names from CLDR (bug 21217) [3],
more imports seem to be needed, mostly of abbreviated month names.
Here is an import which fixes the abbreviated month names of the
Arabic-script locales to reflect the earlier import of their full
forms. [4]
ar_DZ, ar_MA, ar_TN: although CLDR does not provide abbreviated
month names, I tried to follow the convention of abbreviating
month names to their first 3 characters. The result: the full month
names are imported and the abbreviated names are derived from them.
However, the same rule did not work for ar_IQ because it
produced ambiguous strings (multiple months having the same
abbreviated name). This means that the ar_IQ full and abbreviated
month names probably do not match now, and I don't know how to fix
this problem.
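To illustrate the ar_IQ ambiguity, here is a minimal sketch that applies the "first 3 characters" rule and reports abbreviations shared by more than one month. Assumption: the month list is the Levantine/Iraqi set of names and is typed in here for illustration, not copied from the patch; `abbreviate` and `find_collisions` are hypothetical helpers.

```python
# Sketch: naive 3-character abbreviation and collision report.
# ASSUMPTION: AR_IQ_MONTHS is the Levantine/Iraqi month list, typed in
# for illustration; it is not taken from the patch itself.
from collections import defaultdict

AR_IQ_MONTHS = [
    "كانون الثاني", "شباط", "آذار", "نيسان", "أيار", "حزيران",
    "تموز", "آب", "أيلول", "تشرين الأول", "تشرين الثاني", "كانون الأول",
]


def abbreviate(name: str, length: int = 3) -> str:
    """Keep the first `length` code points of the name."""
    return name[:length]


def find_collisions(months: list[str]) -> dict[str, list[int]]:
    """Map each abbreviation used by 2+ months to their 1-based numbers."""
    by_abbr: dict[str, list[int]] = defaultdict(list)
    for number, name in enumerate(months, start=1):
        by_abbr[abbreviate(name)].append(number)
    return {abbr: nums for abbr, nums in by_abbr.items() if len(nums) > 1}


if __name__ == "__main__":
    for abbr, nums in find_collisions(AR_IQ_MONTHS).items():
        print(abbr, nums)
```

With these names, months 1 and 12 both become "كان" and months 10 and 11 both become "تشر", because each pair shares its first word.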
ar_SA: hopefully, this partially fixes bug 19066. [5] Previously
all abbreviated month names were in English. The abbreviated day
names are not changed, so the bug is not yet fully fixed.
ar_SY, ar_JO, ar_LB: incidentally, this import together with the
previous one completely fixes bug 17225. [6] The original bug
report was against ar_SY only, but the changes in ar_JO and ar_LB
are the same.
ks_IN: this looks like an import script bug to me. CLDR does not
provide the abbreviated month names for Kashmiri [7], but its
web interface seems to assume that missing data should be copied
from the existing data, and it displays the abbreviated month names
according to that rule. [8] The import script did not touch the
abmon array, so I've just copied it from mon. Otherwise those
two arrays would not match.
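The manual ks_IN fix amounts to a simple fallback: when no abbreviated names exist, abmon becomes a copy of mon. A minimal sketch, assuming a hypothetical dictionary layout with "wide" and "abbreviated" keys; this is not CLDR's XML schema and not the glibc import script's code.

```python
# Hypothetical sketch: derive glibc mon/abmon from parsed month data,
# copying the full names into abmon when no abbreviated names exist.
# ASSUMPTION: the {"wide": [...], "abbreviated": [...]} layout is an
# illustrative stand-in, not CLDR's schema or the import script.


def month_arrays(cldr_months: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Return (mon, abmon); abmon falls back to a copy of mon."""
    mon = list(cldr_months["wide"])
    abmon = list(cldr_months.get("abbreviated") or mon)
    for key, names in (("mon", mon), ("abmon", abmon)):
        if len(names) != 12:
            raise ValueError(f"{key}: expected 12 names, got {len(names)}")
    return mon, abmon


if __name__ == "__main__":
    # Full Kashmiri month names, as added to abmon by the patch.
    ks_in = {"wide": ["جنؤری", "فرؤری", "مارٕچ", "اپریل", "میٔ", "جوٗن",
                      "جوٗلایی", "اگست", "ستمبر", "اکتوٗبر", "نومبر", "دسمبر"]}
    mon, abmon = month_arrays(ks_in)
    print(mon == abmon)  # True
```

Returning a copy rather than the same list keeps later edits to one array from silently changing the other.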
ug_CN, ur_IN, ur_PK: the import of abmon introduced the same
changes as were previously introduced to mon. In the case of
ur_IN, only one month is reworded.
If you are wondering whether this is the last patch in this
series, the answer is no: I'm going to prepare more fixes for
Indian scripts and for other scripts of the world. But as my
time is limited and the glibc freeze is approaching, I can't
promise they will be delivered in the near future.
It's entirely my fault that some full and abbreviated month
names do not match, and I apologize for this.
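One way to spot such mismatches is to compare mon and abmon month by month. A minimal sketch, assuming that "match" means each abbreviated name is identical to, or a prefix of, the full name for the same month; glibc itself does not enforce this rule, and `mismatched_months` is a hypothetical helper.

```python
# Sketch of a mon/abmon consistency check.
# ASSUMPTION: "match" means the abbreviated name equals, or is a prefix
# of, the full name of the same month; glibc does not enforce this.


def mismatched_months(mon: list[str], abmon: list[str]) -> list[int]:
    """Return 1-based numbers of months whose abbreviation does not match."""
    if len(mon) != len(abmon):
        raise ValueError("mon and abmon must have the same length")
    return [
        number
        for number, (full, short) in enumerate(zip(mon, abmon), start=1)
        if not full.startswith(short)  # startswith also covers equality
    ]


if __name__ == "__main__":
    ar_sa_mon = ["يناير", "فبراير", "مارس", "أبريل", "مايو", "يونيو",
                 "يوليو", "أغسطس", "سبتمبر", "أكتوبر", "نوفمبر", "ديسمبر"]
    old_abmon = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    print(mismatched_months(ar_sa_mon, old_abmon))  # [1, 2, ..., 12]
    print(mismatched_months(ar_sa_mon, ar_sa_mon))  # []
```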
Regards,
Rafal
[1] https://sourceware.org/ml/libc-alpha/2017-06/msg01590.html
[2] https://sourceware.org/ml/libc-alpha/2017-06/msg01591.html
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=21217
[4] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=c853f14
[5] https://sourceware.org/bugzilla/show_bug.cgi?id=19066
[6] https://sourceware.org/bugzilla/show_bug.cgi?id=17225
[7] http://www.unicode.org/repos/cldr/trunk/common/main/ks.xml
[8] http://st.unicode.org/cldr-apps/v#/ks/Gregorian/
--------
Here is a decoded version of the patch explaining the changes:
Comments
1.07.2017 03:19 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
> [...]
> ar_DZ, ar_MA, ar_TN: although CLDR does not provide abbreviated
> month names I tried to follow the convention which abbreviates the
> month names to their first 3 characters and here is the result:
> month names imported and abbreviated.
OK, this idea turned out to be wrong, and it means the whole patch
needs rework. We should probably drop the abbreviated forms
in Arabic scripts unless they come from CLDR or are verified
by a native speaker.
See also the statement below:
> However, the same rule applied to ar_IQ did not work because
> produced ambiguous strings (multiple months having the same
> abbreviated name). This means that ar_IQ full and abbreviated
> month names probably do not match now and I don't know how to fix
> this problem.
Regards,
Rafal
On Tue, Jul 4, 2017 at 6:52 AM, Rafal Luzynski
<digitalfreak@lingonborough.com> wrote:
> 1.07.2017 03:19 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
>> [...]
>> ar_DZ, ar_MA, ar_TN: although CLDR does not provide abbreviated
>> month names I tried to follow the convention which abbreviates the
>> month names to their first 3 characters and here is the result:
>> month names imported and abbreviated.
>
> OK, this idea turned out to be wrong and this makes whole patch
> need a rework. Probably we should drop the abbreviated forms
> in Arabic scripts unless they come from CLDR or are verified
> by a native speaker.
Yes, I think that's the best plan. I don't speak Arabic myself, but I
know enough about the *script* to know that abbreviating words is
going to be more complicated than in most languages.
zw
Zack Weinberg <zackw@panix.com> wrote:
> On Tue, Jul 4, 2017 at 6:52 AM, Rafal Luzynski
> <digitalfreak@lingonborough.com> wrote:
>> 1.07.2017 03:19 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
>>> [...]
>>> ar_DZ, ar_MA, ar_TN: although CLDR does not provide abbreviated
>>> month names I tried to follow the convention which abbreviates the
>>> month names to their first 3 characters and here is the result:
>>> month names imported and abbreviated.
>>
>> OK, this idea turned out to be wrong and this makes whole patch
>> need a rework. Probably we should drop the abbreviated forms
>> in Arabic scripts unless they come from CLDR or are verified
>> by a native speaker.
>
> Yes, I think that's the best plan. I don't speak Arabic myself, but I
> know enough about the *script* to know that abbreviating words is
> going to be more complicated than in most languages.
Yes, just using the first 3 letters will cause the 3rd letter to be
rendered in its final form, which will look wrong if, before
abbreviating, the 3rd letter was in the middle of the word and
rendered in its medial form.
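The shaping problem can be made concrete with a simplified model of Arabic cursive joining. This sketch classifies only a hand-picked subset of letters (the authoritative data is the Unicode Joining_Type property); anything outside the two sets, such as spaces, is treated as non-joining. `form` and `third_letter_forms` are hypothetical helpers.

```python
# Simplified model of Arabic joining: which contextual form the 3rd
# letter takes in the full word versus in its 3-letter truncation.
# ASSUMPTION: only a subset of letters is classified here; the real
# data is Unicode's Joining_Type. Unlisted characters never join.

# Right-joining letters connect only to the letter before them.
RIGHT_JOINING = set("اأإآدذرزوؤة")
# Dual-joining letters connect on both sides.
DUAL_JOINING = set("بتثجحخسشصضطظعغفقكلمنهيىئ")


def joins(ch: str) -> bool:
    return ch in RIGHT_JOINING or ch in DUAL_JOINING


def form(word: str, i: int) -> str:
    """Contextual form of word[i]: isolated, initial, medial or final."""
    ch = word[i]
    # Connected backwards if the previous letter can join forward.
    prev = i > 0 and word[i - 1] in DUAL_JOINING and joins(ch)
    # Connected forwards if this letter can join forward and a joining letter follows.
    nxt = i + 1 < len(word) and ch in DUAL_JOINING and joins(word[i + 1])
    if prev and nxt:
        return "medial"
    if prev:
        return "final"
    if nxt:
        return "initial"
    return "isolated"


def third_letter_forms(full: str) -> tuple[str, str]:
    """Form of the 3rd letter in the full name and in its 3-letter cut."""
    return form(full, 2), form(full[:3], 2)


if __name__ == "__main__":
    for name in ["يناير", "مايو", "أغسطس", "ديسمبر"]:
        print(name, *third_letter_forms(name))
```

In "أغسطس" the س is medial in the full word but becomes final in "أغس", while in "يناير" the ا is final either way, so only some truncations change the letter's shape.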
In https://sourceware.org/bugzilla/show_bug.cgi?id=17225 Muhammad Fawwaz Orabi writes:
"... would be OK as an abbreviated form (still not very much used)"
and:
"... should not be changed (this is the FULL form, but it should not be
changed because abrreviated forms are not familiar in Arabic)"
For the Arabic locales, CLDR uses the same strings for the abbreviated
month names as for the full month names, probably because 3-letter
abbreviations are not commonly used in Arabic.
We should probably just follow CLDR here.
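Following CLDR would mean writing the full names into abmon unchanged. A minimal sketch of emitting such a stanza, assuming the locale source declares `escape_char /` (so a trailing "/" continues the line, as in the hunks of the patch); the six-space indent is only a layout choice, and values containing `"` or `/` are rejected rather than escaped to keep the sketch short. `format_keyword` is a hypothetical helper.

```python
# Sketch: render a glibc locale-source keyword such as `abmon` with one
# quoted value per line, joined by ";/" continuations.
# ASSUMPTION: the file uses `escape_char /`; values needing escaping are
# rejected instead of escaped.


def format_keyword(keyword: str, values: list[str], indent: str = "      ") -> str:
    """Render `keyword "v1";/ ... "vN"`, one quoted value per line."""
    if not values:
        raise ValueError("at least one value is required")
    for v in values:
        if '"' in v or "/" in v:
            raise ValueError(f"value needs escaping: {v!r}")
    quoted = [f'"{v}"' for v in values]
    lines = [f"{keyword} {quoted[0]}"] + [indent + q for q in quoted[1:]]
    return ";/\n".join(lines)


if __name__ == "__main__":
    ar_sa_mon = ["يناير", "فبراير", "مارس", "أبريل", "مايو", "يونيو",
                 "يوليو", "أغسطس", "سبتمبر", "أكتوبر", "نوفمبر", "ديسمبر"]
    # Following CLDR: abmon reuses the full month names unchanged.
    print(format_keyword("abmon", ar_sa_mon))
```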
@@ -105,10 +105,10 @@ day "الأحد";/
"السبت"
%
% Abbreviated month names (%b)
-abmon "ينا";"فبر";/
- "مار";"أبر";/
- "ماي";"يون";/
- "يول";"أغس";/
+abmon "جان";"فيف";/
+ "مار";"أفر";/
+ "ماي";"جوا";/
+ "جوي";"أوت";/
"سبت";"أكت";/
 "نوف";"ديس"
%
@@ -114,7 +114,7 @@ abmon "كانون ال/
"شباط";/
"آذار";/
"نيسان";/
- "نوار";/
+ "أيار";/
 "حزيران";/
"تموز";/
"آب";/
@@ -113,7 +113,7 @@ abmon "كانون ال/
"شباط";/
"آذار";/
"نيسان";/
- "نوار";/
+ "أيار";/
 "حزيران";/
"تموز";/
"آب";/
@@ -108,9 +108,9 @@ day "الأحد";/
abmon "ينا";"فبر";/
"مار";"أبر";/
"ماي";"يون";/
- "يول";"أغس";/
- "سبت";"أكت";/
- "نوف";"ديس"
+ "يول";"غشت";/
+ "شتن";"أكت";/
+ "نون";"دجن"
%
% Full month names (%B)
mon "يناير";/
@@ -335,18 +335,18 @@ mon "يناير";/
"أكتوبر";/
 "نوفمبر";/
"ديسمبر"
-abmon "Jan"; /
- "Feb"; /
- "Mar"; /
- "Apr"; /
- "May"; /
- "Jun"; /
- "Jul"; /
- "Aug"; /
- "Sep"; /
- "Oct"; /
- "Nov"; /
- "Dec"
+abmon "يناير";/
+ "فبراير";/
+ "مارس";/
+ "أبريل";/
+ "مايو";/
+ "يونيو";/
+ "يوليو";/
+ "أغسطس";/
+ "سبتمبر";/
+ "أكتوبر";/
+ "نوفمبر";/
+ "ديسمبر"
am_pm "";""
era_d_fmt ""
week 7;19971130;1
@@ -113,7 +113,7 @@ abmon "كانون ال/
"شباط";/
"آذار";/
"نيسان";/
- "نوار";/
+ "أيار";/
 "حزيران";/
"تموز";/
"آب";/
@@ -105,10 +105,10 @@ day "الأحد";/
"السبت"
%
% Abbreviated month names (%b)
-abmon "ينا";"فبر";/
- "مار";"أبر";/
- "ماي";"يون";/
- "يول";"أغس";/
+abmon "جان";"فيف";/
+ "مار";"أفر";/
+ "ماي";"جوا";/
+ "جوي";"أوت";/
"سبت";"أكت";/
 "نوف";"ديس"
%
@@ -110,18 +110,18 @@ day "آتهوار";/
"جمع";"بٹوار"
%
% Abbreviated month names (%b)
-abmon "جنوری";/
- "فروری";/
- "مارچ";/
+abmon "جنؤری";/
+ "فرؤری";/
+ "مارٕچ";/
"اپریل";/
- "مئ";/
- "جون";/
- "جÙلئ";/
+ "میٔ";/
+ "جوٗن";/
+ "جوٗلایی";/
"اگست";/
- "ستنبر";/
- "اکتوبر";/
- "نوںبر";/
- "دسنبر"
+ "ستمبر";/
+ "اکتوٗبر";/
+ "نومبر";/
+ "دسمبر"
%
% Full month names (%B)
mon "جنؤری";/
@@ -250,18 +250,18 @@ day "يەكشەنبە";/
"پەيشەنبە";/
"جۈمە";/
"شەنبە"
-abmon "قەھرىتان";/
- "ھۇت";/
- "نورۇز";/
- "ئۈمىد";/
- "باھار";/
- "سەپەر";/
- "چىللە";/
- "تومۇز";/
- "مىزان";/
- "ئوغۇز";/
- "ئوغلاق";/
- "كۆنەك"
+abmon "يانۋار";/
+ "فېۋرال";/
+ "مارت";/
+ "ئاپرېل";/
+ "ماي";/
+ "ئىيۇن";/
+ "ئىيۇل";/
+ "ئاۋغۇست";/
+ "سېنتەبىر";/
+ "ئۆكتەبىر";/
+ "نويابىر";/
+ "دېكابىر"
mon "يانۋار";/
 "فېۋرال";/
"مارت";/
@@ -99,7 +99,7 @@ abmon "جنوری";/
"جولائی";/
"اگست";/
"ستمبر";/
- "اكتوبر";/
+ "اکتوبر";/
"نومبر";/
"دسمبر"
%
@@ -111,16 +111,16 @@ day "اتوار";/
"جمعرات";/
"جمعه";/
 "هفته"
-abmon "جنوري";/
- "فروري";/
+abmon "جنوری";/
+ "فروری";/
"مارچ";/
- "اپريل";/
- "مٓی";/
+ "اپریل";/
+ "مئی";/
"جون";/
- "جولاي";/
+ "جولائی";/
"اگست";/
"ستمبر";/
- "اكتوبر";/
+ "اکتوبر";/
"نومبر";/
"دسمبر"
mon "جنوری";/