[0/1] Arabic scripts: More fixes after the recent import from CLDR-31

Message ID 1183717083.905878.1498871953397@poczta.nazwa.pl
State Superseded
Delegated to: Mike Fabian

Commit Message

Rafal Luzynski July 1, 2017, 1:19 a.m. UTC
Feel free to treat this patch as the second part of the previous one. [1] [2]

After the recent series of imports of month names from CLDR (bug 21217) [3],
more imports seem to be needed, mostly of abbreviated month names.
Here is an import which fixes the locales written in Arabic script so
that their abbreviated month names reflect the imported full forms. [4]

ar_DZ, ar_MA, ar_TN: although CLDR does not provide abbreviated
month names, I tried to follow the convention which abbreviates
month names to their first 3 characters.  The result: month names
imported and abbreviated.
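A minimal Python sketch of the first-3-characters convention, assuming that "characters" means Unicode code points. `abbreviate` is a hypothetical helper, not part of the glibc import script, and the ar_DZ full month names below are assumed from CLDR because they do not appear in this patch. Under that assumption, truncation reproduces the ar_DZ `abmon` values shown in the diff below.

```python
def abbreviate(name: str, n: int = 3) -> str:
    """Return the first n code points of a month name (hypothetical helper)."""
    return name[:n]

# Assumed ar_DZ full month names (mon), January..December, taken from CLDR.
AR_DZ_MON = ["جانفي", "فيفري", "مارس", "أفريل", "ماي", "جوان",
             "جويلية", "أوت", "سبتمبر", "أكتوبر", "نوفمبر", "ديسمبر"]

# Apply the convention to every month.
ar_dz_abmon = [abbreviate(m) for m in AR_DZ_MON]
print(ar_dz_abmon)
```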

However, the same rule applied to ar_IQ did not work because it
produced ambiguous strings (multiple months having the same
abbreviated name).  This means that the ar_IQ full and abbreviated
month names probably do not match now, and I don't know how to fix
this problem.
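The ambiguity can be detected mechanically by grouping month names by their 3-code-point prefix. This sketch assumes that ar_IQ uses the same Levantine month names as ar_SY, ar_JO and ar_LB (partly visible in the hunks below); the remaining names are assumed. `find_collisions` is a hypothetical helper.

```python
from collections import defaultdict

def abbreviate(name: str, n: int = 3) -> str:
    """Return the first n code points of a month name (hypothetical helper)."""
    return name[:n]

def find_collisions(names: list[str], n: int = 3) -> dict[str, list[str]]:
    """Group names by their n-code-point prefix; keep only groups with >1 name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[abbreviate(name, n)].append(name)
    return {abbr: full for abbr, full in groups.items() if len(full) > 1}

# Assumed Levantine month names, January..December.
LEVANTINE_MON = ["كانون الثاني", "شباط", "آذار", "نيسان", "أيار", "حزيران",
                 "تموز", "آب", "أيلول", "تشرين الأول", "تشرين الثاني", "كانون الأول"]

print(find_collisions(LEVANTINE_MON))
```

Both "كانون" months and both "تشرين" months collapse to the same prefix, so a 3-letter rule cannot produce 12 distinct abbreviations for this set.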

ar_SA: hopefully, this partially fixes bug 19066. [5] Previously
all abbreviated month names were in English.  The abbreviated day
names are not changed, so the bug is not yet fully fixed.

ar_SY, ar_JO, ar_LB: incidentally, this import together with the
previous one completely fixes bug 17225. [6] The original bug
report was against ar_SY only, but the changes in ar_JO and ar_LB
are the same.

ks_IN: this looks like an import script bug to me.  CLDR does not
provide abbreviated month names for Kashmiri [7], but its web
interface seems to assume that missing data should be copied from
the existing full names, and it displays the abbreviated month
names according to that rule. [8] The import script did not touch
the abmon array, so I have just copied it from mon.  Otherwise
those two arrays would not match.
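The fallback rule described above, copying the full name wherever the abbreviated one is missing, can be sketched per month as follows. `resolve_abmon` is a hypothetical helper that only illustrates the rule as stated; it is not how the CLDR web interface or the glibc import script is actually implemented.

```python
from typing import Optional

def resolve_abmon(abbr: list[Optional[str]], mon: list[str]) -> list[str]:
    """Use each abbreviated month name if present, else fall back to the full name.

    Hypothetical helper illustrating the fallback rule described above.
    """
    if len(abbr) != len(mon):
        raise ValueError("abmon and mon must have the same number of entries")
    return [a if a is not None else m for a, m in zip(abbr, mon)]

# ks_IN full month names (mon), as in the new abmon lines of the ks_IN hunk.
KS_IN_MON = ["جنؤری", "فرؤری", "مارٕچ", "اپریل", "میٔ", "جوٗن",
             "جوٗلایی", "اگست", "ستمبر", "اکتوٗبر", "نومبر", "دسمبر"]

# CLDR provides no abbreviated names for Kashmiri, so every entry is missing.
ks_in_abmon = resolve_abmon([None] * 12, KS_IN_MON)
print(ks_in_abmon == KS_IN_MON)  # True
```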

ug_CN, ur_IN, ur_PK: the import of abmon introduces the same
changes as were previously introduced to mon.  In the case of
ur_IN, only one month name (October) is changed.
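The ur_IN change is easy to miss because the old and new spellings of October can look almost identical; they differ in a single code point. A small sketch using Python's standard `unicodedata` module names the changed character (`char_diff` is a hypothetical helper, and it assumes both strings have the same length):

```python
import unicodedata

def char_diff(old: str, new: str) -> list[tuple[int, str, str]]:
    """List (index, old char name, new char name) for each differing position.

    Assumes old and new have the same number of code points.
    """
    if len(old) != len(new):
        raise ValueError("strings differ in length")
    return [(i, unicodedata.name(a), unicodedata.name(b))
            for i, (a, b) in enumerate(zip(old, new)) if a != b]

OLD = "\u0627\u0643\u062a\u0648\u0628\u0631"  # ur_IN abmon before the patch
NEW = "\u0627\u06a9\u062a\u0648\u0628\u0631"  # ur_IN abmon after the patch
print(char_diff(OLD, NEW))
# [(1, 'ARABIC LETTER KAF', 'ARABIC LETTER KEHEH')]
```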

If you are wondering whether this is the last patch in this
series, the answer is no: I'm going to prepare more fixes for
Indian scripts and for other scripts of the world.  But as my
time is limited and the glibc freeze is approaching, I can't
promise they will be delivered in the near future.  It's entirely
my fault that some full month names and abbreviated month names
do not match, and I apologize for this.

Regards,

Rafal


[1] https://sourceware.org/ml/libc-alpha/2017-06/msg01590.html
[2] https://sourceware.org/ml/libc-alpha/2017-06/msg01591.html
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=21217
[4] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=c853f14
[5] https://sourceware.org/bugzilla/show_bug.cgi?id=19066
[6] https://sourceware.org/bugzilla/show_bug.cgi?id=17225
[7] http://www.unicode.org/repos/cldr/trunk/common/main/ks.xml
[8] http://st.unicode.org/cldr-apps/v#/ks/Gregorian/

--------

Here is a decoded version of the patch explaining the changes:
  

Comments

Rafal Luzynski July 4, 2017, 10:52 a.m. UTC | #1
1.07.2017 03:19 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
> [...]
> ar_DZ, ar_MA, ar_TN: although CLDR does not provide abbreviated
> month names I tried to follow the convention which abbreviates the
> month names to their first 3 characters and here is the result:
> month names imported and abbreviated.

OK, this idea turned out to be wrong, and this means the whole
patch needs rework.  Probably we should drop the abbreviated forms
in Arabic scripts unless they come from CLDR or are verified
by a native speaker.

See also the statement below:

> However, the same rule applied to ar_IQ did not work because
> produced ambiguous strings (multiple months having the same
> abbreviated name). This means that ar_IQ full and abbreviated
> month names probably do not match now and I don't know how to fix
> this problem.

Regards,

Rafal
  
Zack Weinberg July 4, 2017, 2:33 p.m. UTC | #2
On Tue, Jul 4, 2017 at 6:52 AM, Rafal Luzynski
<digitalfreak@lingonborough.com> wrote:
> 1.07.2017 03:19 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
>> [...]
>> ar_DZ, ar_MA, ar_TN: although CLDR does not provide abbreviated
>> month names I tried to follow the convention which abbreviates the
>> month names to their first 3 characters and here is the result:
>> month names imported and abbreviated.
>
> OK, this idea turned out to be wrong and this makes whole patch
> need a rework.  Probably we should drop the abbreviated forms
> in Arabic scripts unless they come from CLDR or are verified
> by a native speaker.

Yes, I think that's the best plan.  I don't speak Arabic myself, but I
know enough about the *script* to know that abbreviating words is
going to be more complicated than in most languages.

zw
  
Mike FABIAN July 4, 2017, 3:12 p.m. UTC | #3
Zack Weinberg <zackw@panix.com> wrote:

> On Tue, Jul 4, 2017 at 6:52 AM, Rafal Luzynski
> <digitalfreak@lingonborough.com> wrote:
>> 1.07.2017 03:19 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
>>> [...]
>>> ar_DZ, ar_MA, ar_TN: although CLDR does not provide abbreviated
>>> month names I tried to follow the convention which abbreviates the
>>> month names to their first 3 characters and here is the result:
>>> month names imported and abbreviated.
>>
>> OK, this idea turned out to be wrong and this makes whole patch
>> need a rework.  Probably we should drop the abbreviated forms
>> in Arabic scripts unless they come from CLDR or are verified
>> by a native speaker.
>
> Yes, I think that's the best plan.  I don't speak Arabic myself, but I
> know enough about the *script* to know that abbreviating words is
> going to be more complicated than in most languages.

Yes, just using the first 3 letters will cause the 3rd letter to be
rendered in its final form, which will look wrong if, before
abbreviating, the 3rd letter was in the middle of the word and
rendered in its medial form.
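This can be made concrete by checking whether the 3rd letter joined to the following letter in the full word. The sketch hardcodes a simplified subset of the Unicode joining types (from ArabicShaping.txt) for the basic Arabic letters U+0620..U+064A only; a real implementation would consult the full Unicode data or a shaping engine. `truncation_changes_shape` is a hypothetical helper.

```python
# Simplified joining types for basic Arabic letters (U+0620..U+064A only).
# Right-joining (R) letters connect only to the preceding letter.
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629"
                    "\u062f\u0630\u0631\u0632\u0648")
HAMZA = "\u0621"  # non-joining

def joining_type(ch: str) -> str:
    """Return 'R', 'D' (dual-joining) or 'U' (non-joining) for ch."""
    if ch in RIGHT_JOINING:
        return "R"
    if ch != HAMZA and ("\u0620" <= ch <= "\u063f" or "\u0641" <= ch <= "\u064a"):
        return "D"
    return "U"  # hamza, space, and anything outside the simplified range

def truncation_changes_shape(name: str, n: int = 3) -> bool:
    """True if cutting name after n code points changes the n-th letter's form.

    In the full word the n-th letter joins to the next one when it is
    dual-joining and the next letter can join to it; once the word is cut,
    that letter becomes final (or isolated) instead of medial (or initial).
    """
    if len(name) <= n:
        return False
    last, following = name[n - 1], name[n]
    return joining_type(last) == "D" and joining_type(following) in ("D", "R")

for month in ["يناير", "فبراير", "سبتمبر", "أكتوبر", "ماي"]:
    print(month, month[:3], truncation_changes_shape(month))
```

Under these assumptions "ينا" and "فبر" keep their letter shapes because ا and ر never join to the following letter, while "سبت" and "أكت" end in a letter whose form changes from medial to final.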

In https://sourceware.org/bugzilla/show_bug.cgi?id=17225 Muhammad Fawwaz Orabi writes:

"... would be OK as an abbreviated form (still not very much used)"

and:

"... should not be changed (this is the FULL form, but it should not be
changed because abbreviated forms are not familiar in Arabic)"

For the Arabic locales, CLDR has the same strings for the abbreviated
month names as for the full month names, probably because 3-letter
abbreviations are not commonly used in Arabic.

We should probably just follow CLDR here.
  

Patch

diff --git a/localedata/locales/ar_DZ b/localedata/locales/ar_DZ
index 4789ff5..62066e1 100644
--- a/localedata/locales/ar_DZ
+++ b/localedata/locales/ar_DZ
@@ -105,10 +105,10 @@  day         "الأحد";/
             "السبت"
 %
 % Abbreviated month names (%b)
-abmon       "ينا";"فبر";/
-            "مار";"أبر";/
-            "ماي";"يون";/
-            "يول";"أغس";/
+abmon       "جان";"فيف";/
+            "مار";"أفر";/
+            "ماي";"جوا";/
+            "جوي";"أوت";/
             "سبت";"أكت";/
             "نوف";"ديس"
 %
diff --git a/localedata/locales/ar_JO b/localedata/locales/ar_JO
index 9bac497..13b32fd 100644
--- a/localedata/locales/ar_JO
+++ b/localedata/locales/ar_JO
@@ -114,7 +114,7 @@  abmon       "كانون ال/
             "شباط";/
             "آذار";/
             "نيسان";/
-            "نوار";/
+            "أيار";/
             "حزيران";/
             "تموز";/
             "آب";/
diff --git a/localedata/locales/ar_LB b/localedata/locales/ar_LB
index 2e22767..1cb6b25 100644
--- a/localedata/locales/ar_LB
+++ b/localedata/locales/ar_LB
@@ -113,7 +113,7 @@  abmon       "كانون ال/
             "شباط";/
             "آذار";/
             "نيسان";/
-            "نوار";/
+            "أيار";/
             "حزيران";/
             "تموز";/
             "آب";/
diff --git a/localedata/locales/ar_MA b/localedata/locales/ar_MA
index a157d97..db795d2 100644
--- a/localedata/locales/ar_MA
+++ b/localedata/locales/ar_MA
@@ -108,9 +108,9 @@  day         "الأحد";/
 abmon       "ينا";"فبر";/
             "مار";"أبر";/
             "ماي";"يون";/
-            "يول";"أغس";/
-            "سبت";"أكت";/
-            "نوف";"ديس"
+            "يول";"غشت";/
+            "شتن";"أكت";/
+            "نون";"دجن"
 %
 % Full month names (%B)
 mon         "يناير";/
diff --git a/localedata/locales/ar_SA b/localedata/locales/ar_SA
index 88698ff..420e748 100644
--- a/localedata/locales/ar_SA
+++ b/localedata/locales/ar_SA
@@ -335,18 +335,18 @@  mon	"يناير";/
 	"أكتوبر";/
 	"نوفمبر";/
 	"ديسمبر"
-abmon	"Jan"; /
-	"Feb"; /
-	"Mar"; /
-	"Apr"; /
-	"May"; /
-	"Jun"; /
-	"Jul"; /
-	"Aug"; /
-	"Sep"; /
-	"Oct"; /
-	"Nov"; /
-	"Dec"
+abmon	"يناير";/
+	"فبراير";/
+	"مارس";/
+	"أبريل";/
+	"مايو";/
+	"يونيو";/
+	"يوليو";/
+	"أغسطس";/
+	"سبتمبر";/
+	"أكتوبر";/
+	"نوفمبر";/
+	"ديسمبر"
 am_pm	"";""
 era_d_fmt	""
 week 7;19971130;1
diff --git a/localedata/locales/ar_SY b/localedata/locales/ar_SY
index 9cc7ce5..56ed144 100644
--- a/localedata/locales/ar_SY
+++ b/localedata/locales/ar_SY
@@ -113,7 +113,7 @@  abmon       "كانون ال/
             "شباط";/
             "آذار";/
             "نيسان";/
-            "نوار";/
+            "أيار";/
             "حزيران";/
             "تموز";/
             "آب";/
diff --git a/localedata/locales/ar_TN b/localedata/locales/ar_TN
index e277275..b62b9d3 100644
--- a/localedata/locales/ar_TN
+++ b/localedata/locales/ar_TN
@@ -105,10 +105,10 @@  day         "الأحد";/
             "السبت"
 %
 % Abbreviated month names (%b)
-abmon       "ينا";"فبر";/
-            "مار";"أبر";/
-            "ماي";"يون";/
-            "يول";"أغس";/
+abmon       "جان";"فيف";/
+            "مار";"أفر";/
+            "ماي";"جوا";/
+            "جوي";"أوت";/
             "سبت";"أكت";/
             "نوف";"ديس"
 %
diff --git a/localedata/locales/ks_IN b/localedata/locales/ks_IN
index 320258b..094f2cd 100644
--- a/localedata/locales/ks_IN
+++ b/localedata/locales/ks_IN
@@ -110,18 +110,18 @@  day       "آتهوار";/
             "جمع";"بٹوار"
 %
 % Abbreviated month names (%b)
-abmon	    "جنوری";/
-	    "فروری";/
-	    "مارچ";/
+abmon	    "جنؤری";/
+	    "فرؤری";/
+	    "مارٕچ";/
 	    "اپریل";/
-	    "مئ";/
-	    "جون";/
-	    "جُلئ";/
+	    "میٔ";/
+	    "جوٗن";/
+	    "جوٗلایی";/
 	    "اگست";/
-	    "ستنبر";/
-	    "اکتوبر";/
-	    "نوںبر";/
-	    "دسنبر"
+	    "ستمبر";/
+	    "اکتوٗبر";/
+	    "نومبر";/
+	    "دسمبر"
 %
 % Full month names (%B)
 mon	    "جنؤری";/
diff --git a/localedata/locales/ug_CN b/localedata/locales/ug_CN
index 205f4b0..3773176 100644
--- a/localedata/locales/ug_CN
+++ b/localedata/locales/ug_CN
@@ -250,18 +250,18 @@  day   "يەكشەنبە";/
       "پەيشەنبە";/
       "جۈمە";/
       "شەنبە"
-abmon "قەھرىتان";/
-      "ھۇت";/
-      "نورۇز";/
-      "ئۈمىد";/
-      "باھار";/
-      "سەپەر";/
-      "چىللە";/
-      "تومۇز";/
-      "مىزان";/
-      "ئوغۇز";/
-      "ئوغلاق";/
-      "كۆنەك"
+abmon "يانۋار";/
+      "فېۋرال";/
+      "مارت";/
+      "ئاپرېل";/
+      "ماي";/
+      "ئىيۇن";/
+      "ئىيۇل";/
+      "ئاۋغۇست";/
+      "سېنتەبىر";/
+      "ئۆكتەبىر";/
+      "نويابىر";/
+      "دېكابىر"
 mon   "يانۋار";/
       "فېۋرال";/
       "مارت";/
diff --git a/localedata/locales/ur_IN b/localedata/locales/ur_IN
index 4b4309c..1af10ed 100644
--- a/localedata/locales/ur_IN
+++ b/localedata/locales/ur_IN
@@ -99,7 +99,7 @@  abmon     "جنوری";/
 	    "جولائی";/
 	    "اگست";/
 	    "ستمبر";/
-	    "اكتوبر";/
+	    "اکتوبر";/
 	    "نومبر";/
 	    "دسمبر"
 %
diff --git a/localedata/locales/ur_PK b/localedata/locales/ur_PK
index 2281ef5..c9fcd19 100644
--- a/localedata/locales/ur_PK
+++ b/localedata/locales/ur_PK
@@ -111,16 +111,16 @@  day	"اتوار";/
 	"جمعرات";/
 	"جمعه";/
 	"هفته"
-abmon	"جنوري";/
-	"فروري";/
+abmon	"جنوری";/
+	"فروری";/
 	"مارچ";/
-	"اپريل";/
-	"مٓی";/
+	"اپریل";/
+	"مئی";/
 	"جون";/
-	"جولاي";/
+	"جولائی";/
 	"اگست";/
 	"ستمبر";/
-	"اكتوبر";/
+	"اکتوبر";/
 	"نومبر";/
 	"دسمبر"
 mon	"جنوری";/