[4/4] ja_JP locale: Add entry for the new Japanese era [BZ #22964]
Commit Message
The Japanese era name will be changed on May 1, 2019. The Japanese
government made a preliminary announcement on April 1, 2019.
The glibc ja_JP locale must be updated to include the new era name for
strftime's alternative year format support.
Checked on x86_64-linux-gnu.
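For readers unfamiliar with the alternative year format, here is a minimal
sketch of how a caller might request it. The helper name format_era_year is
hypothetical and not part of glibc or of this patch; it only wraps the
standard setlocale and strftime calls. The sketch assumes that "%EY" expands
through the locale's era_format when the locale defines eras, and falls back
to the plain year otherwise (as in the "C" locale).

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (illustration only): format the given calendar
   date with "%EY" in the named locale.  Returns the number of bytes
   written, or 0 if the locale is unavailable or BUF is too small.  */
size_t
format_era_year (const char *locale, int year, int mon, int day,
		 char *buf, size_t size)
{
  struct tm tm;

  /* Note: this changes the process-wide LC_TIME setting.  */
  if (setlocale (LC_TIME, locale) == NULL)
    return 0;

  memset (&tm, 0, sizeof tm);
  tm.tm_year = year - 1900;	/* struct tm counts years from 1900.  */
  tm.tm_mon = mon - 1;		/* Months are 0-based.  */
  tm.tm_mday = day;

  return strftime (buf, size, "%EY", &tm);
}
```

Calling it with "ja_JP.UTF-8" (if that locale is installed) and 2019-05-01
should exercise the new era entry; with "C" it simply yields the Gregorian
year.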
ChangeLog:
[BZ #22964]
* localedata/locales/ja_JP (LC_TIME): Add entry for the new Japanese
era.
* time/tst-strftime2.c (dates): Add 2019-04-30 and 2019-05-01.
(mkreftable): Add rules for the new Japanese era and the new dates.
---
NEWS | 2 ++
localedata/locales/ja_JP | 6 ++++--
time/tst-strftime2.c | 13 ++++++++-----
3 files changed, 14 insertions(+), 7 deletions(-)
Comments
* TAMUKI Shoichi:
> % The following dates and their names are recorded below in descending
> % date order (note that <U5E74> or <NEN> follows each date).
> -% <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
> +% <REIWA> -> <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
> %
> % Each string is an era description segment with the format:
> % "direction:offset:start_date:end_date:era_name:era_format"
> @@ -14964,7 +14964,9 @@ t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
> % - The last entry <U7D00><U5143><U524D> in era_name means BC.
> % - The second-to-last entry <U897F><U66A6> in era_name means AD.
> %
> -era "+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
> +era "+:2:2020//01//01:+*:<U4EE4><U548C>:%EC%Ey<U5E74>";/
> + "+:1:2019//05//01:2019//12//31:<U4EE4><U548C>:%EC<U5143><U5E74>";/
> + "+:2:1990//01//01:2019//04//30:<U5E73><U6210>:%EC%Ey<U5E74>";/
Based on <https://www.kantei.go.jp/jp/tyoukanpress/201904/1_a.html> and
新しい元号は「令和」であります。 ("The new era name is Reiwa.")
the encoding appears to be correct.
> diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
> index 2c94bf592e..532e68f7d3 100644
> --- a/time/tst-strftime2.c
> +++ b/time/tst-strftime2.c
> @@ -54,7 +54,9 @@ static const date_t dates[] =
> { 1, 4, 1997 },
> { 1, 4, 1998 },
> { 1, 4, 2010 },
> - { 1, 4, 2011 }
> + { 1, 4, 2011 },
> + { 30, 4, 2019 },
> + { 1, 5, 2019 }
Do we need tests for 2020 as well, for the other added rule?
Hello Florian-san,
From: Florian Weimer <fw@deneb.enyo.de>
Subject: Re: [PATCH 4/4] ja_JP locale: Add entry for the new Japanese era [BZ #22964]
Date: Mon, 01 Apr 2019 09:04:05 +0200
> Based on <https://www.kantei.go.jp/jp/tyoukanpress/201904/1_a.html> and
>
> [...]
>
> the encoding appears to be correct.
In addition to the code point <U4EE4>, note that the code point <UF9A8>
also exists for <REI>. The latter is a CJK compatibility ideograph
Unicode character and is not usually used.
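To make the difference between the two code points concrete, here is a small
sketch that encodes a code point as UTF-8. The function name utf8_encode_3 is
hypothetical and is not taken from glibc; it handles only the three-byte range
that both <U4EE4> and <UF9A8> fall into.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only): encode a code point in the
   range U+0800..U+FFFF (excluding surrogates) as three UTF-8 bytes.
   Returns 3 on success, 0 if CP is outside that range.  */
size_t
utf8_encode_3 (unsigned int cp, unsigned char out[3])
{
  if (cp < 0x800 || cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;

  out[0] = 0xE0 | (cp >> 12);		/* 1110xxxx */
  out[1] = 0x80 | ((cp >> 6) & 0x3F);	/* 10xxxxxx */
  out[2] = 0x80 | (cp & 0x3F);		/* 10xxxxxx */
  return 3;
}
```

With this sketch, <U4EE4> encodes as E4 BB A4 and <UF9A8> as EF A6 A8, so the
two are different byte sequences even though they render as the same
character, and a byte-wise comparison of era names would treat them as
different.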
> Do we need tests for 2020 as well, for the other added rule?
I think that the test of Reiwa 2 nen (2020) is unnecessary because the
equivalent test is performed in the case of Heisei 2 nen (1990).
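The equivalence can be sketched as follows, assuming that for a "+"
direction rule the era year is the rule's offset plus the number of years
elapsed since the rule's start year. The function era_year is a hypothetical
illustration, not the glibc implementation.

```c
/* Hypothetical helper (illustration only): era year for a rule with
   direction "+", given the rule's offset and start year.  */
int
era_year (int offset, int start_year, int year)
{
  return offset + (year - start_year);
}
```

Under that assumption, the rule "+:2:2020//01//01" gives year 2 for 2020 in
the same way that "+:2:1990//01//01" gives year 2 for 1990, while
"+:1:2019//05//01" gives year 1 for 2019 and the Heisei rule gives year 31
for 2019, matching the values 31 and 1 added to yrj in the test.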
Regards,
TAMUKI Shoichi
Hello TAMUKI-san,
Thank you for continuing my work and adding this new patch. Please
see below:
1.04.2019 06:00 TAMUKI Shoichi <tamuki@linet.gr.jp> wrote:
>
>
> The Japanese era name will be changed on May 1, 2019. The Japanese
> government made a preliminary announcement on April 1, 2019.
> [...]
I believe this patch is very urgent and must be reviewed and pushed
as soon as possible. If it helps, please consider splitting it out as
a separate patch, rebasing it against the current master (because I
believe that it now depends on the other patches of the series), or
even separating it from the test cases. I believe you will want to
backport it to some older branches, and the patch may not apply
easily if it depends on other patches.
> [...]
> diff --git a/NEWS b/NEWS
> index 684752ed53..4ad1ae65af 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -22,6 +22,8 @@ Major new features:
> alternative calendar for the following locales: zh_TW, cmn_TW, hak_TW,
> nan_TW, lzh_TW.
>
> +* The entry for the new Japanese era has been added for ja_JP locale.
> +
I think that this new feature is so important that it deserves some
longer description. However, I don't have any proposal at the moment
and I will no longer complain if other people find this description
sufficient.
Regards,
Rafal
* TAMUKI Shoichi:
> Hello Florian-san,
>
> From: Florian Weimer <fw@deneb.enyo.de>
> Subject: Re: [PATCH 4/4] ja_JP locale: Add entry for the new Japanese era [BZ #22964]
> Date: Mon, 01 Apr 2019 09:04:05 +0200
>
>> Based on <https://www.kantei.go.jp/jp/tyoukanpress/201904/1_a.html> and
>>
>> [...]
>>
>> the encoding appears to be correct.
>
> In addition to the code of <U4EE4>, note that the code of <UF9A8> is
> also present in <REI>. The latter is a CJK compatibility ideograph
> unicode character and is not usually used.
Sorry, I don't understand. Do you mean that the era name could be
written with <UF9A8> instead of <U4EE4> as the first codepoint? How
certain are we that <U4EE4> is indeed the official codepoint?
Thanks,
Florian
On 4/1/19 12:00 AM, TAMUKI Shoichi wrote:
> The Japanese era name will be changed on May 1, 2019. The Japanese
> government made a preliminary announcement on April 1, 2019.
>
> The glibc ja_JP locale must be updated to include the new era name for
> strftime's alternative year format support.
>
> Checked on x86_64-linux-gnu.
OK for master.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> ChangeLog:
>
> [BZ #22964]
> * localedata/locales/ja_JP (LC_TIME): Add entry for the new Japanese
> era.
> * time/tst-strftime2.c (dates): Add 2019-04-30 and 2019-05-01.
> (mkreftable): Add rules for the new Japanese era and the new dates.
> ---
> NEWS | 2 ++
> localedata/locales/ja_JP | 6 ++++--
> time/tst-strftime2.c | 13 ++++++++-----
> 3 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 684752ed53..4ad1ae65af 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -22,6 +22,8 @@ Major new features:
> alternative calendar for the following locales: zh_TW, cmn_TW, hak_TW,
> nan_TW, lzh_TW.
>
> +* The entry for the new Japanese era has been added for ja_JP locale.
OK. I'm ok with a short entry here, there isn't really much more to say.
> +
> Deprecated and removed features, and other changes affecting compatibility:
>
> * The functions clock_gettime, clock_getres, clock_settime,
> diff --git a/localedata/locales/ja_JP b/localedata/locales/ja_JP
> index cb51e6d69d..c727997b6b 100644
> --- a/localedata/locales/ja_JP
> +++ b/localedata/locales/ja_JP
> @@ -14952,7 +14952,7 @@ t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
> %
> % The following dates and their names are recorded below in descending
> % date order (note that <U5E74> or <NEN> follows each date).
> -% <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
> +% <REIWA> -> <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
OK. Confirmed.
> %
> % Each string is an era description segment with the format:
> % "direction:offset:start_date:end_date:era_name:era_format"
> @@ -14964,7 +14964,9 @@ t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
> % - The last entry <U7D00><U5143><U524D> in era_name means BC.
> % - The second-to-last entry <U897F><U66A6> in era_name means AD.
> %
> -era "+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
> +era "+:2:2020//01//01:+*:<U4EE4><U548C>:%EC%Ey<U5E74>";/
OK, 2 year offset from 2020-01-01 onwards. The choice of U4EE4 is correct
because it has a Shift-JIS equivalent and matches the new era. The choice of
U548C is correct and has a Shift-JIS equivalent.
> + "+:1:2019//05//01:2019//12//31:<U4EE4><U548C>:%EC<U5143><U5E74>";/
OK, 1 year offset for the first year using <U5143>, which is correct.
> + "+:2:1990//01//01:2019//04//30:<U5E73><U6210>:%EC%Ey<U5E74>";/
OK, adjust previous final element to end 2019-04-30, and that matches
the end of April before transition.
> "+:1:1989//01//08:1989//12//31:<U5E73><U6210>:%EC<U5143><U5E74>";/
> "+:2:1927//01//01:1989//01//07:<U662D><U548C>:%EC%Ey<U5E74>";/
> "+:1:1926//12//25:1926//12//31:<U662D><U548C>:%EC<U5143><U5E74>";/
> diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
> index 2c94bf592e..532e68f7d3 100644
> --- a/time/tst-strftime2.c
> +++ b/time/tst-strftime2.c
> @@ -54,7 +54,9 @@ static const date_t dates[] =
> { 1, 4, 1997 },
> { 1, 4, 1998 },
> { 1, 4, 2010 },
> - { 1, 4, 2011 }
> + { 1, 4, 2011 },
> + { 30, 4, 2019 },
OK, end of HEISEI era.
> + { 1, 5, 2019 }
OK, start of REIWA era.
> };
>
> static char ref[array_length (locales)][array_length (formats)]
> @@ -84,20 +86,20 @@ mkreftable (void)
> static const int yrj[] =
> {
> 43, 44, 45, 2,
> - 63, 64, 1, 2, 9, 10, 22, 23
> + 63, 64, 1, 2, 9, 10, 22, 23, 31, 1
OK, add two more years to the era tests.
> };
> /* Buddhist calendar year to be checked. */
> static const int yrb[] =
> {
> 2453, 2454, 2455, 2456,
> - 2531, 2532, 2532, 2533, 2540, 2541, 2553, 2554
> + 2531, 2532, 2532, 2533, 2540, 2541, 2553, 2554, 2562, 2562
OK, add two more years to the Buddhist calendar.
> };
> /* R.O.C. calendar year to be checked. Negative number is prior to
> Minguo counting up. */
> static const int yrc[] =
> {
> -2, -1, 1, 2,
> - 77, 78, 78, 79, 86, 87, 99, 100
> + 77, 78, 78, 79, 86, 87, 99, 100, 108, 108
OK, add two more years to the R.O.C. calendar.
> };
>
> for (i = 0; i < array_length (locales); i++)
> @@ -109,7 +111,8 @@ mkreftable (void)
> era = (is_before (k, 30, 7, 1912)) ? "\u660e\u6cbb"
> : (is_before (k, 25, 12, 1926)) ? "\u5927\u6b63"
> : (is_before (k, 8, 1, 1989)) ? "\u662d\u548c"
> - : "\u5e73\u6210";
> + : (is_before (k, 1, 5, 2019)) ? "\u5e73\u6210"
> + : "\u4ee4\u548c";
OK, add the REIWA dates to the table to test.
> yr = yrj[k], sfx = "\u5e74";
> }
> else if (i == 1) /* lo_LA */
>
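As a standalone illustration of the era selection reviewed above, the sketch
below reproduces the boundary dates from the test's conditional chain. The
names date_is_before and era_for, and the enum, are hypothetical; the real
test uses is_before with an index into its dates table instead.

```c
enum era { ERA_MEIJI, ERA_TAISHO, ERA_SHOWA, ERA_HEISEI, ERA_REIWA };

/* Hypothetical helper (illustration only): nonzero if the date D/M/Y
   is strictly before DAY/MON/YEAR.  */
int
date_is_before (int d, int m, int y, int day, int mon, int year)
{
  if (y != year)
    return y < year;
  if (m != mon)
    return m < mon;
  return d < day;
}

/* Pick the era for a date using the same boundaries as the test.  */
enum era
era_for (int d, int m, int y)
{
  return date_is_before (d, m, y, 30, 7, 1912) ? ERA_MEIJI
	 : date_is_before (d, m, y, 25, 12, 1926) ? ERA_TAISHO
	 : date_is_before (d, m, y, 8, 1, 1989) ? ERA_SHOWA
	 : date_is_before (d, m, y, 1, 5, 2019) ? ERA_HEISEI
	 : ERA_REIWA;
}
```

The two dates added by the patch sit on either side of the last boundary:
2019-04-30 still selects Heisei and 2019-05-01 selects Reiwa.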
On 4/1/19 6:34 AM, Florian Weimer wrote:
> * TAMUKI Shoichi:
>
>> Hello Florian-san,
>>
>> From: Florian Weimer <fw@deneb.enyo.de>
>> Subject: Re: [PATCH 4/4] ja_JP locale: Add entry for the new Japanese era [BZ #22964]
>> Date: Mon, 01 Apr 2019 09:04:05 +0200
>>
>>> Based on <https://www.kantei.go.jp/jp/tyoukanpress/201904/1_a.html> and
>>>
>>> [...]
>>>
>>> the encoding appears to be correct.
>>
>> In addition to the code of <U4EE4>, note that the code of <UF9A8> is
>> also present in <REI>. The latter is a CJK compatibility ideograph
>> unicode character and is not usually used.
>
> Sorry, I don't understand. Do you mean that the era name could be
> written with <UF9A8> instead of <U4EE4> as the first codepoint? How
> certain are we that <U4EE4> is indeed the official codepoint?
I believe it could be written as <UF9A8>, but that is not a Kanji
character which we can display in shift-jis / euc-jp, while <U4EE4>
is a code point we have mapped to a specific encoding value.
All uses of REI are <U4EE4> on the government site, which is a strong
indicator that this is what we should be using.
Lastly, this CLDR ticket for 35.1:
https://unicode.org/cldr/trac/ticket/11796
And these revisions of trunk:
Index: common/uca/FractionalUCA.txt
===================================================================
--- common/uca/FractionalUCA.txt (revision 14975)
+++ common/uca/FractionalUCA.txt (revision 14978)
...
-[UCA version = 12.0.0]
+[UCA version = 12.1.0]
...
+32FF; [U+4EE4, 31][U+548C, 31] # Zyyy So [FB40.0020.001C][CEE4.0000.0000][FB40.0020.001C][D48C.0000.0000] * SQUARE ERA NAME REIWA
These indicate that <U4EE4> is what will be used for CLDR, and we should match.
No official update from the Unicode standard yet:
http://unicode.org/versions/Unicode12.1.0/
Patch
@@ -22,6 +22,8 @@ Major new features:
alternative calendar for the following locales: zh_TW, cmn_TW, hak_TW,
nan_TW, lzh_TW.
+* The entry for the new Japanese era has been added for ja_JP locale.
+
Deprecated and removed features, and other changes affecting compatibility:
* The functions clock_gettime, clock_getres, clock_settime,
@@ -14952,7 +14952,7 @@ t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
%
% The following dates and their names are recorded below in descending
% date order (note that <U5E74> or <NEN> follows each date).
-% <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
+% <REIWA> -> <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
%
% Each string is an era description segment with the format:
% "direction:offset:start_date:end_date:era_name:era_format"
@@ -14964,7 +14964,9 @@ t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
% - The last entry <U7D00><U5143><U524D> in era_name means BC.
% - The second-to-last entry <U897F><U66A6> in era_name means AD.
%
-era "+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
+era "+:2:2020//01//01:+*:<U4EE4><U548C>:%EC%Ey<U5E74>";/
+ "+:1:2019//05//01:2019//12//31:<U4EE4><U548C>:%EC<U5143><U5E74>";/
+ "+:2:1990//01//01:2019//04//30:<U5E73><U6210>:%EC%Ey<U5E74>";/
"+:1:1989//01//08:1989//12//31:<U5E73><U6210>:%EC<U5143><U5E74>";/
"+:2:1927//01//01:1989//01//07:<U662D><U548C>:%EC%Ey<U5E74>";/
"+:1:1926//12//25:1926//12//31:<U662D><U548C>:%EC<U5143><U5E74>";/
@@ -54,7 +54,9 @@ static const date_t dates[] =
{ 1, 4, 1997 },
{ 1, 4, 1998 },
{ 1, 4, 2010 },
- { 1, 4, 2011 }
+ { 1, 4, 2011 },
+ { 30, 4, 2019 },
+ { 1, 5, 2019 }
};
static char ref[array_length (locales)][array_length (formats)]
@@ -84,20 +86,20 @@ mkreftable (void)
static const int yrj[] =
{
43, 44, 45, 2,
- 63, 64, 1, 2, 9, 10, 22, 23
+ 63, 64, 1, 2, 9, 10, 22, 23, 31, 1
};
/* Buddhist calendar year to be checked. */
static const int yrb[] =
{
2453, 2454, 2455, 2456,
- 2531, 2532, 2532, 2533, 2540, 2541, 2553, 2554
+ 2531, 2532, 2532, 2533, 2540, 2541, 2553, 2554, 2562, 2562
};
/* R.O.C. calendar year to be checked. Negative number is prior to
Minguo counting up. */
static const int yrc[] =
{
-2, -1, 1, 2,
- 77, 78, 78, 79, 86, 87, 99, 100
+ 77, 78, 78, 79, 86, 87, 99, 100, 108, 108
};
for (i = 0; i < array_length (locales); i++)
@@ -109,7 +111,8 @@ mkreftable (void)
era = (is_before (k, 30, 7, 1912)) ? "\u660e\u6cbb"
: (is_before (k, 25, 12, 1926)) ? "\u5927\u6b63"
: (is_before (k, 8, 1, 1989)) ? "\u662d\u548c"
- : "\u5e73\u6210";
+ : (is_before (k, 1, 5, 2019)) ? "\u5e73\u6210"
+ : "\u4ee4\u548c";
yr = yrj[k], sfx = "\u5e74";
}
else if (i == 1) /* lo_LA */