[4/4] ja_JP locale: Add entry for the new Japanese era [BZ #22964]

Message ID 201904010400.AA04319@tamuki.linet.gr.jp
State Committed

Commit Message

TAMUKI Shoichi April 1, 2019, 4 a.m. UTC
The Japanese era name will be changed on May 1, 2019.  The Japanese
government made a preliminary announcement on April 1, 2019.

The glibc ja_JP locale must be updated to include the new era name for
strftime's alternative year format support.

Checked on x86_64-linux-gnu.
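
For readers unfamiliar with strftime's alternative year format, the following is a minimal sketch of how an application might request the era name and era year.  The helper names and the locale name "ja_JP.UTF-8" are assumptions made for illustration; they are not part of the patch, and the locale must be installed for era data to be used.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Try to select the Japanese locale for time formatting.  The locale
   name "ja_JP.UTF-8" is an assumption; it must be installed on the
   system.  Returns nonzero on success.  */
static int
use_japanese_time_locale (void)
{
  return setlocale (LC_TIME, "ja_JP.UTF-8") != NULL;
}

/* Format the era name and the year within the era for the given
   calendar date into BUF.  Returns the number of bytes written, or 0
   if BUF is too small.  */
static size_t
format_era_year (int year, int month, int day, char *buf, size_t size)
{
  struct tm tm = { 0 };
  tm.tm_year = year - 1900;
  tm.tm_mon = month - 1;
  tm.tm_mday = day;
  /* %EC is the era name and %Ey the year within the era.  Without era
     data in the current locale they behave like %C and %y.  */
  return strftime (buf, size, "%EC%Ey", &tm);
}
```

Calling use_japanese_time_locale () and then format_era_year (2019, 5, 1, buf, sizeof buf) should, with a locale built from the patched ja_JP source, produce a string that starts with the new era name; the exact output depends on the installed locale data.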

ChangeLog:

	[BZ #22964]
	* localedata/locales/ja_JP (LC_TIME): Add entry for the new Japanese
	era.
	* time/tst-strftime2.c (dates): Add 2019-04-30 and 2019-05-01.
	(mkreftable): Add rules for the new Japanese era and the new dates.
---
 NEWS                     |  2 ++
 localedata/locales/ja_JP |  6 ++++--
 time/tst-strftime2.c     | 13 ++++++++-----
 3 files changed, 14 insertions(+), 7 deletions(-)
  

Comments

Florian Weimer April 1, 2019, 7:04 a.m. UTC | #1
* TAMUKI Shoichi:

>  % The following dates and their names are recorded below in descending
>  % date order (note that <U5E74> or <NEN> follows each date).
> -% <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
> +% <REIWA> -> <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
>  %
>  % Each string is an era description segment with the format:
>  % "direction:offset:start_date:end_date:era_name:era_format"
> @@ -14964,7 +14964,9 @@ t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
>  % - The last entry <U7D00><U5143><U524D> in era_name means BC.
>  % - The second-to-last entry <U897F><U66A6> in era_name means AD.
>  %
> -era	"+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
> +era	"+:2:2020//01//01:+*:<U4EE4><U548C>:%EC%Ey<U5E74>";/
> +	"+:1:2019//05//01:2019//12//31:<U4EE4><U548C>:%EC<U5143><U5E74>";/
> +	"+:2:1990//01//01:2019//04//30:<U5E73><U6210>:%EC%Ey<U5E74>";/

Based on <https://www.kantei.go.jp/jp/tyoukanpress/201904/1_a.html> and

  新しい元号は「令和」であります。 ("The new era name is 'Reiwa'.")

the encoding appears to be correct.

> diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
> index 2c94bf592e..532e68f7d3 100644
> --- a/time/tst-strftime2.c
> +++ b/time/tst-strftime2.c
> @@ -54,7 +54,9 @@ static const date_t dates[] =
>    {  1,  4, 1997 },
>    {  1,  4, 1998 },
>    {  1,  4, 2010 },
> -  {  1,  4, 2011 }
> +  {  1,  4, 2011 },
> +  { 30,  4, 2019 },
> +  {  1,  5, 2019 }

Do we need tests for 2020 as well, for the other added rule?
  
TAMUKI Shoichi April 1, 2019, 10:02 a.m. UTC | #2
Hello Florian-san,

From: Florian Weimer <fw@deneb.enyo.de>
Subject: Re: [PATCH 4/4] ja_JP locale: Add entry for the new Japanese era [BZ #22964]
Date: Mon, 01 Apr 2019 09:04:05 +0200

> Based on <https://www.kantei.go.jp/jp/tyoukanpress/201904/1_a.html>and
> 
>   [...]
> 
> the encoding appears to be correct.

In addition to the code of <U4EE4>, note that the code of <UF9A8> is
also present in <REI>.  The latter is a CJK compatibility ideograph
unicode character and is not usually used.

> Do we need tests for 2020 as well, for the other added rule?

I think that the test of Reiwa 2 nen (2020) is unnecessary because the
equivalent test is performed in the case of Heisei 2 nen (1990).

Regards,
TAMUKI Shoichi
  
Rafal Luzynski April 1, 2019, 10:22 a.m. UTC | #3
Hello TAMUKI-san,

Thank you for continuing my work and adding this new patch.  Please
see below:

1.04.2019 06:00 TAMUKI Shoichi <tamuki@linet.gr.jp> wrote:
> 
> 
> The Japanese era name will be changed on May 1, 2019.  The Japanese
> government made a preliminary announcement on April 1, 2019.
> [...]

I believe this patch is very urgent and must be reviewed and pushed
as soon as possible.  If it helps, please consider splitting it out
as a separate patch, rebasing it against the current master (because
I believe it now depends on the other patches of the series), or even
separating it from the test cases.  I believe you will want to
backport it to some older branches, and the patch may not apply
cleanly if it depends on other patches.

> [...]
> diff --git a/NEWS b/NEWS
> index 684752ed53..4ad1ae65af 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -22,6 +22,8 @@ Major new features:
>    alternative calendar for the following locales: zh_TW, cmn_TW, hak_TW,
>    nan_TW, lzh_TW.
>  
> +* The entry for the new Japanese era has been added for ja_JP locale.
> +

I think that this new feature is so important that it deserves a
longer description.  However, I don't have a proposal at the moment,
and I will not object if other people find this description
sufficient.

Regards,

Rafal
  
Florian Weimer April 1, 2019, 10:34 a.m. UTC | #4
* TAMUKI Shoichi:

> Hello Florian-san,
>
> From: Florian Weimer <fw@deneb.enyo.de>
> Subject: Re: [PATCH 4/4] ja_JP locale: Add entry for the new Japanese era [BZ #22964]
> Date: Mon, 01 Apr 2019 09:04:05 +0200
>
>> Based on <https://www.kantei.go.jp/jp/tyoukanpress/201904/1_a.html>and
>> 
>>   [...]
>> 
>> the encoding appears to be correct.
>
> In addition to the code of <U4EE4>, note that the code of <UF9A8> is
> also present in <REI>.  The latter is a CJK compatibility ideograph
> unicode character and is not usually used.

Sorry, I don't understand.  Do you mean that the era name could be
written with <UF9A8> instead of <U4EE4> as the first codepoint?  How
certain are we that <U4EE4> is indeed the official codepoint?

Thanks,
Florian
  
Carlos O'Donell April 1, 2019, 8:34 p.m. UTC | #5
On 4/1/19 12:00 AM, TAMUKI Shoichi wrote:
> The Japanese era name will be changed on May 1, 2019.  The Japanese
> government made a preliminary announcement on April 1, 2019.
> 
> The glibc ja_JP locale must be updated to include the new era name for
> strftime's alternative year format support.
> 
> Checked on x86_64-linux-gnu.

OK for master.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> ChangeLog:
> 
> 	[BZ #22964]
> 	* localedata/locales/ja_JP (LC_TIME): Add entry for the new Japanese
> 	era.
> 	* time/tst-strftime2.c (dates): Add 2019-04-30 and 2019-05-01.
> 	(mkreftable): Add rules for the new Japanese era and the new dates.
> ---
>   NEWS                     |  2 ++
>   localedata/locales/ja_JP |  6 ++++--
>   time/tst-strftime2.c     | 13 ++++++++-----
>   3 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 684752ed53..4ad1ae65af 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -22,6 +22,8 @@ Major new features:
>     alternative calendar for the following locales: zh_TW, cmn_TW, hak_TW,
>     nan_TW, lzh_TW.
>   
> +* The entry for the new Japanese era has been added for ja_JP locale.

OK. I'm ok with a short entry here, there isn't really much more to say.

> +
>   Deprecated and removed features, and other changes affecting compatibility:
>   
>   * The functions clock_gettime, clock_getres, clock_settime,
> diff --git a/localedata/locales/ja_JP b/localedata/locales/ja_JP
> index cb51e6d69d..c727997b6b 100644
> --- a/localedata/locales/ja_JP
> +++ b/localedata/locales/ja_JP
> @@ -14952,7 +14952,7 @@ t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
>   %
>   % The following dates and their names are recorded below in descending
>   % date order (note that <U5E74> or <NEN> follows each date).
> -% <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
> +% <REIWA> -> <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>

OK. Confirmed.

>   %
>   % Each string is an era description segment with the format:
>   % "direction:offset:start_date:end_date:era_name:era_format"
> @@ -14964,7 +14964,9 @@ t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
>   % - The last entry <U7D00><U5143><U524D> in era_name means BC.
>   % - The second-to-last entry <U897F><U66A6> in era_name means AD.
>   %
> -era	"+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
> +era	"+:2:2020//01//01:+*:<U4EE4><U548C>:%EC%Ey<U5E74>";/

OK, 2 year offset from 2020-01-01 onwards. The choice of U4EE4 is correct
because it has a Shift-JIS equivalent and matches the new era name. The
choice of U548C is correct and has a Shift-JIS equivalent.

> +	"+:1:2019//05//01:2019//12//31:<U4EE4><U548C>:%EC<U5143><U5E74>";/

OK, 1 year offset for the first year, written with <U5143>; this is correct.

> +	"+:2:1990//01//01:2019//04//30:<U5E73><U6210>:%EC%Ey<U5E74>";/

OK, the previously open-ended HEISEI entry now ends on 2019-04-30, which
is the last day of April before the transition.
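
The way these era segments select an era and a year number can be sketched as follows.  This is an illustrative model only, not glibc's implementation: the struct, the function names, and the romanized era names are assumptions, and only the "+" direction is handled.  The year within the era is taken to be offset + (year - start year).

```c
#include <assert.h>
#include <stddef.h>

/* One era segment, mirroring the fields of an "era" string in the
   locale source.  Only the "+" direction is modelled here.  */
struct era_segment
{
  int offset;                 /* Year number assigned to the start year.  */
  int start_y, start_m, start_d;
  int end_y, end_m, end_d;    /* end_y == 0 stands for the open end "+*".  */
  const char *name;           /* Romanized name, for illustration only.  */
};

/* The four newest segments after the patch, newest first.  */
static const struct era_segment segments[] =
{
  { 2, 2020, 1, 1,    0,  0,  0, "Reiwa"  },
  { 1, 2019, 5, 1, 2019, 12, 31, "Reiwa"  },
  { 2, 1990, 1, 1, 2019,  4, 30, "Heisei" },
  { 1, 1989, 1, 8, 1989, 12, 31, "Heisei" },
};

/* Pack a date into one integer so that dates compare chronologically.  */
static long
date_key (int y, int m, int d)
{
  return (long) y * 10000 + m * 100 + d;
}

/* Return the segment containing the date, or NULL if the date precedes
   the table.  On success *ERA_YEAR receives offset + (y - start_y).  */
static const struct era_segment *
lookup_era (int y, int m, int d, int *era_year)
{
  assert (era_year != NULL);
  long key = date_key (y, m, d);
  for (size_t i = 0; i < sizeof segments / sizeof segments[0]; i++)
    {
      const struct era_segment *s = &segments[i];
      int open_ended = s->end_y == 0;
      if (key >= date_key (s->start_y, s->start_m, s->start_d)
          && (open_ended
              || key <= date_key (s->end_y, s->end_m, s->end_d)))
        {
          *era_year = s->offset + (y - s->start_y);
          return s;
        }
    }
  return NULL;
}
```

Under this model 2019-04-30 falls in the Heisei segment with year 31 and 2019-05-01 in the first Reiwa segment with year 1, which agrees with the values 31 and 1 appended to yrj in the test.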

>   	"+:1:1989//01//08:1989//12//31:<U5E73><U6210>:%EC<U5143><U5E74>";/
>   	"+:2:1927//01//01:1989//01//07:<U662D><U548C>:%EC%Ey<U5E74>";/
>   	"+:1:1926//12//25:1926//12//31:<U662D><U548C>:%EC<U5143><U5E74>";/
> diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
> index 2c94bf592e..532e68f7d3 100644
> --- a/time/tst-strftime2.c
> +++ b/time/tst-strftime2.c
> @@ -54,7 +54,9 @@ static const date_t dates[] =
>     {  1,  4, 1997 },
>     {  1,  4, 1998 },
>     {  1,  4, 2010 },
> -  {  1,  4, 2011 }
> +  {  1,  4, 2011 },
> +  { 30,  4, 2019 },

OK, end of HEISEI era.

> +  {  1,  5, 2019 }

OK, start of REIWA era.

>   };
>   
>   static char ref[array_length (locales)][array_length (formats)]
> @@ -84,20 +86,20 @@ mkreftable (void)
>     static const int yrj[] =
>     {
>       43, 44, 45, 2,
> -    63, 64, 1, 2, 9, 10, 22, 23
> +    63, 64, 1, 2, 9, 10, 22, 23, 31, 1

OK, add two more years to the era tests.

>     };
>     /* Buddhist calendar year to be checked.  */
>     static const int yrb[] =
>     {
>       2453, 2454, 2455, 2456,
> -    2531, 2532, 2532, 2533, 2540, 2541, 2553, 2554
> +    2531, 2532, 2532, 2533, 2540, 2541, 2553, 2554, 2562, 2562

OK, add two more years to the Buddhist calendar.

>     };
>     /* R.O.C. calendar year to be checked.  Negative number is prior to
>        Minguo counting up.  */
>     static const int yrc[] =
>     {
>       -2, -1, 1, 2,
> -    77, 78, 78, 79, 86, 87, 99, 100
> +    77, 78, 78, 79, 86, 87, 99, 100, 108, 108

OK, add two more years to the R.O.C. calendar.
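
The values appended to yrb and yrc can be checked by hand.  A minimal sketch, assuming the usual fixed offsets between the Gregorian year and the two calendars (543 for the Buddhist year, 1911 for the R.O.C. year); the constants and function names are illustrative and are not taken from the test or the locale files.

```c
#include <assert.h>

/* Assumed fixed offsets from the Gregorian year.  They reproduce the
   expected values in the test table for the dates added by the patch.  */
enum { BUDDHIST_OFFSET = 543, MINGUO_EPOCH = 1911 };

/* Buddhist calendar year for a Gregorian year.  */
static int
buddhist_year (int gregorian_year)
{
  return gregorian_year + BUDDHIST_OFFSET;
}

/* R.O.C. (Minguo) year for a Gregorian year from 1912 onwards.  Earlier
   years use a separate counting-up convention in the test and are not
   modelled here.  */
static int
roc_year (int gregorian_year)
{
  assert (gregorian_year > MINGUO_EPOCH);
  return gregorian_year - MINGUO_EPOCH;
}
```

Both new dates fall in 2019, so each array gains the same value twice: 2019 + 543 = 2562 and 2019 - 1911 = 108.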

>     };
>   
>     for (i = 0; i < array_length (locales); i++)
> @@ -109,7 +111,8 @@ mkreftable (void)
>   	      era = (is_before (k, 30,  7, 1912)) ? "\u660e\u6cbb"
>   		  : (is_before (k, 25, 12, 1926)) ? "\u5927\u6b63"
>   		  : (is_before (k,  8,  1, 1989)) ? "\u662d\u548c"
> -						  : "\u5e73\u6210";
> +		  : (is_before (k,  1,  5, 2019)) ? "\u5e73\u6210"
> +						  : "\u4ee4\u548c";

OK, add the REIWA era name to the table to test.

>   	      yr = yrj[k], sfx = "\u5e74";
>   	    }
>   	  else if (i == 1)  /* lo_LA  */
>
  
Carlos O'Donell April 2, 2019, 3:32 a.m. UTC | #6
On 4/1/19 6:34 AM, Florian Weimer wrote:
> * TAMUKI Shoichi:
> 
>> Hello Florian-san,
>>
>> From: Florian Weimer <fw@deneb.enyo.de>
>> Subject: Re: [PATCH 4/4] ja_JP locale: Add entry for the new Japanese era [BZ #22964]
>> Date: Mon, 01 Apr 2019 09:04:05 +0200
>>
>>> Based on <https://www.kantei.go.jp/jp/tyoukanpress/201904/1_a.html>and
>>>
>>>    [...]
>>>
>>> the encoding appears to be correct.
>>
>> In addition to the code of <U4EE4>, note that the code of <UF9A8> is
>> also present in <REI>.  The latter is a CJK compatibility ideograph
>> unicode character and is not usually used.
> 
> Sorry, I don't understand.  Do you mean that the era name could be
> written with <UF9A8> instead of <U4EE4> as the first codepoint?  How
> certain are we that <U4EE4> is indeed the official codepoint?

I believe it could be written as <UF9A8>, but that is not a Kanji
character that we can display in Shift-JIS / EUC-JP, while <U4EE4>
is a code point we have mapped to a specific encoding value.

All uses of REI are <U4EE4> on the government site, which is a strong
indicator that this is what we should be using.

Lastly, this CLDR ticket for 35.1:
https://unicode.org/cldr/trac/ticket/11796

And these revisions of trunk:
Index: common/uca/FractionalUCA.txt
===================================================================
--- common/uca/FractionalUCA.txt        (revision 14975)
+++ common/uca/FractionalUCA.txt        (revision 14978)
...
-[UCA version = 12.0.0]
+[UCA version = 12.1.0]
...
+32FF; [U+4EE4, 31][U+548C, 31] # Zyyy So       [FB40.0020.001C][CEE4.0000.0000][FB40.0020.001C][D48C.0000.0000]        * SQUARE ERA NAME REIWA

These indicate that <U4EE4> is what CLDR will use, and we should match it.

No official update from the Unicode standard yet:
http://unicode.org/versions/Unicode12.1.0/
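
To make the distinction between <U4EE4> and <UF9A8> concrete: the two code points are different UTF-8 byte sequences, so strings built from one do not compare equal byte for byte to strings built from the other.  The following is a minimal sketch with a hypothetical helper; it handles only the three-byte UTF-8 range, which covers all code points discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a code point in the range U+0800..U+FFFF (surrogates
   excluded) as three UTF-8 bytes.  Returns the number of bytes
   written, which is always 3.  */
static size_t
utf8_encode_3 (uint32_t cp, unsigned char out[3])
{
  assert (cp >= 0x0800 && cp <= 0xFFFF);
  assert (cp < 0xD800 || cp > 0xDFFF);
  out[0] = (unsigned char) (0xE0 | (cp >> 12));
  out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[2] = (unsigned char) (0x80 | (cp & 0x3F));
  return 3;
}
```

With this helper, U+4EE4 encodes as E4 BB A4 and U+F9A8 as EF A6 A8, while U+548C encodes as E5 92 8C.  The first and third sequences are the bytes behind the "\u4ee4\u548c" literal used in tst-strftime2.c.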
  

Patch

diff --git a/NEWS b/NEWS
index 684752ed53..4ad1ae65af 100644
--- a/NEWS
+++ b/NEWS
@@ -22,6 +22,8 @@  Major new features:
   alternative calendar for the following locales: zh_TW, cmn_TW, hak_TW,
   nan_TW, lzh_TW.
 
+* The entry for the new Japanese era has been added for ja_JP locale.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The functions clock_gettime, clock_getres, clock_settime,
diff --git a/localedata/locales/ja_JP b/localedata/locales/ja_JP
index cb51e6d69d..c727997b6b 100644
--- a/localedata/locales/ja_JP
+++ b/localedata/locales/ja_JP
@@ -14952,7 +14952,7 @@  t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
 %
 % The following dates and their names are recorded below in descending
 % date order (note that <U5E74> or <NEN> follows each date).
-% <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
+% <REIWA> -> <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
 %
 % Each string is an era description segment with the format:
 % "direction:offset:start_date:end_date:era_name:era_format"
@@ -14964,7 +14964,9 @@  t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
 % - The last entry <U7D00><U5143><U524D> in era_name means BC.
 % - The second-to-last entry <U897F><U66A6> in era_name means AD.
 %
-era	"+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
+era	"+:2:2020//01//01:+*:<U4EE4><U548C>:%EC%Ey<U5E74>";/
+	"+:1:2019//05//01:2019//12//31:<U4EE4><U548C>:%EC<U5143><U5E74>";/
+	"+:2:1990//01//01:2019//04//30:<U5E73><U6210>:%EC%Ey<U5E74>";/
 	"+:1:1989//01//08:1989//12//31:<U5E73><U6210>:%EC<U5143><U5E74>";/
 	"+:2:1927//01//01:1989//01//07:<U662D><U548C>:%EC%Ey<U5E74>";/
 	"+:1:1926//12//25:1926//12//31:<U662D><U548C>:%EC<U5143><U5E74>";/
diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
index 2c94bf592e..532e68f7d3 100644
--- a/time/tst-strftime2.c
+++ b/time/tst-strftime2.c
@@ -54,7 +54,9 @@  static const date_t dates[] =
   {  1,  4, 1997 },
   {  1,  4, 1998 },
   {  1,  4, 2010 },
-  {  1,  4, 2011 }
+  {  1,  4, 2011 },
+  { 30,  4, 2019 },
+  {  1,  5, 2019 }
 };
 
 static char ref[array_length (locales)][array_length (formats)]
@@ -84,20 +86,20 @@  mkreftable (void)
   static const int yrj[] =
   {
     43, 44, 45, 2,
-    63, 64, 1, 2, 9, 10, 22, 23
+    63, 64, 1, 2, 9, 10, 22, 23, 31, 1
   };
   /* Buddhist calendar year to be checked.  */
   static const int yrb[] =
   {
     2453, 2454, 2455, 2456,
-    2531, 2532, 2532, 2533, 2540, 2541, 2553, 2554
+    2531, 2532, 2532, 2533, 2540, 2541, 2553, 2554, 2562, 2562
   };
   /* R.O.C. calendar year to be checked.  Negative number is prior to
      Minguo counting up.  */
   static const int yrc[] =
   {
     -2, -1, 1, 2,
-    77, 78, 78, 79, 86, 87, 99, 100
+    77, 78, 78, 79, 86, 87, 99, 100, 108, 108
   };
 
   for (i = 0; i < array_length (locales); i++)
@@ -109,7 +111,8 @@  mkreftable (void)
 	      era = (is_before (k, 30,  7, 1912)) ? "\u660e\u6cbb"
 		  : (is_before (k, 25, 12, 1926)) ? "\u5927\u6b63"
 		  : (is_before (k,  8,  1, 1989)) ? "\u662d\u548c"
-						  : "\u5e73\u6210";
+		  : (is_before (k,  1,  5, 2019)) ? "\u5e73\u6210"
+						  : "\u4ee4\u548c";
 	      yr = yrj[k], sfx = "\u5e74";
 	    }
 	  else if (i == 1)  /* lo_LA  */