Hello Paul,
Thank you for your review.
From: Paul Eggert <eggert@cs.ucla.edu>
Subject: Re: [PATCH v2] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Sun, 28 Oct 2018 14:06:46 -0700
> TAMUKI Shoichi wrote:
> > Since only one Japanese era name is used by each emperor's reign, it
> > is rare that the year ends in one digit or lasts more than three
> > digits.
>
> Rare recently, but over the long term about 75% of Japanese imperial
> years have been single-digit years: since 701 AD there have been 989
> single-digit years but only 329 two-digit years. (This calculation is
> approximate, but it's close enough; see attached shell script for how
> I did the calculation.) Although Japan is more stable now than it was
> centuries ago, the long reigns since 1868 are a historical aberration
> and it should not be surprising if the fraction of single-digit years
> reverts closer to historical levels in the not-too-distant future.
As you mentioned, before the Meiji era (1868) there were many short
eras. However, because the lunisolar calendar was used instead of the
Gregorian calendar before the Meiji era, it is difficult to represent
those dates accurately in the current glibc scheme, and I think that
we do not have to care about them from a practical point of view. In
fact, the Japanese locale data in glibc has no era entries before the
Meiji era; it defines only AD and BC for that period. It is also
interesting to speculate that future eras might be shorter, as they
were before. However, that does not guarantee that every era year
will be a single digit. I think that it is reasonable to change the
default for %Ey to zero padding with a width of 2 so as to keep it a
constant width across the past and the future.
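To illustrate the width policy described above, here is a minimal
sketch that emulates it with snprintf. It is not the glibc code; the
helper name format_era_year and its flag argument are hypothetical and
only mirror the idea of a default width of 2 that a flag can override.

```c
#include <stdio.h>

/* Hypothetical helper, not part of glibc: format an era year following
   the policy discussed above.  FLAG is 0 for the default (zero padding
   to width 2), '_' for space padding, or '-' for no padding.
   Returns the number of characters written, as snprintf does.  */
static int
format_era_year (char *buf, size_t size, int era_year, char flag)
{
  switch (flag)
    {
    case '_':
      return snprintf (buf, size, "%2d", era_year);
    case '-':
      return snprintf (buf, size, "%d", era_year);
    default:
      return snprintf (buf, size, "%02d", era_year);
    }
}
```

With the default flag, years 2 and 63 both occupy two columns ("02"
and "63"), which is the constant width argued for above.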
Regarding the commit message, I will change the expression as follows.
| Since in recent times only one Japanese era name is used per
| emperor's reign, it is rare that the year ends in one digit or lasts
| more than three digits.
> Although I'm no expert in Japanese, as I understand it the most common
> style for formatting imperial dates in plain text uses no spaces
> anywhere,
The most common style for formatting Japanese calendar dates in plain
text does not necessarily omit spaces.
> It's far less common to see spaces to make things line up, presumably
> for tables.
I think each of these styles has its proper use, depending on the
application. Both the regular representations (%c, %x, %X) and the
alternate representations (%Ec, %Ex, %EX) in the Japanese locale of
glibc default to zero padding. This suits width-sensitive output
such as business forms. Space padding is easy for humans to read
while keeping the same width, but it is not suitable for splitting
fields delimited by spaces. Finally, a format without padding suits
input to applications that produce typeset-quality output, such as
TeX.
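The three padding styles can be seen on an existing numeric field. The
following is a minimal sketch using the day of month in the default C
locale; the helper name format_day is hypothetical, and the `_` and `-`
flags are GNU extensions to strftime rather than ISO C, so other C
libraries may behave differently.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: format only the day of month of 1990-04-MDAY
   with FMT.  Returns the number of bytes written, as strftime does.  */
static size_t
format_day (char *buf, size_t size, const char *fmt, int mday)
{
  struct tm tm;

  memset (&tm, 0, sizeof tm);
  tm.tm_year = 90;              /* Years since 1900.  */
  tm.tm_mon = 3;                /* April; months count from 0.  */
  tm.tm_mday = mday;
  return strftime (buf, size, fmt, &tm);
}
```

With glibc, calling format_day with mday 5 and the formats "%d", "%_d"
and "%-d" gives "05", " 5" and "5" respectively.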
> Since glibc is already defaulting to space padding for month and day-
> of-month, it makes sense for glibc to also default to space padding
> for imperial year. However, this change should be announced more
> clearly. The ChangeLog entry should say what's going on at a high
> level, and give an example call to strftime with the before-and-after
> output, along with how to generate imperial dates with no spaces; and
> (more important) the glibc documentation for strftime should
> contain similar examples.
As mentioned above, the formats in the Japanese locale of glibc
default to zero padding, so it is also natural to pad the year in the
Japanese calendar with zeros. The strftime section of the glibc
manual says the following.
| The default action is to pad the number with zeros to keep it a
| constant width.
Changing from zero to space padding may cause backward compatibility
problems in the Japanese locale, so I think that it is OK as it is.
This change gives sane handling of the display width of one-digit
years in the Japanese calendar, a case that has not been encountered
directly since the Japanese locale was added to glibc. I therefore
think for now that it is unnecessary to add new documentation on an
issue specific to the Japanese locale.
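For reference, a caller can select each padding style for the era year
roughly as follows. This is only a sketch: the helper name
format_era_year_ja is hypothetical, it assumes the ja_JP.UTF-8 locale
is installed, and the result depends on whether the running glibc
includes this patch, so no output is asserted in the code.

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: format the era year of YEAR-MON-MDAY with FMT
   ("%Ey", "%_Ey" or "%-Ey") in the ja_JP.UTF-8 locale.  Returns -1 if
   the locale is not available, otherwise the strftime result.  */
static int
format_era_year_ja (char *buf, size_t size, const char *fmt,
                    int year, int mon, int mday)
{
  struct tm tm;

  if (setlocale (LC_TIME, "ja_JP.UTF-8") == NULL)
    return -1;
  memset (&tm, 0, sizeof tm);
  tm.tm_year = year - 1900;     /* Years since 1900.  */
  tm.tm_mon = mon - 1;          /* Months count from 0.  */
  tm.tm_mday = mday;
  return (int) strftime (buf, size, fmt, &tm);
}
```

1990-04-01 falls in Heisei 2, so with the patch below applied the
formats "%Ey", "%_Ey" and "%-Ey" are expected to give "02", " 2" and
"2"; without the patch, plain "%Ey" gives "2".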
Regards,
TAMUKI Shoichi
@@ -43,7 +43,7 @@ tests := test_time clocktest tst-posixtz tst-strptime tst_wcsftime \
tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \
tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \
tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \
- tst-tzname tst-y2039
+ tst-tzname tst-y2039 tst-strftime2
include ../Rules
@@ -434,7 +434,7 @@ static CHAR_T const month_name[][10] =
#endif
static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *,
- const struct tm *, bool *
+ const struct tm *, int *, bool *
ut_argument_spec
LOCALE_PARAM) __THROW;
@@ -456,8 +456,9 @@ my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T *format,
tmcopy = *tp;
tp = &tmcopy;
#endif
+ int yr_spec = 0; /* Override padding for %Ey. */
bool tzset_called = false;
- return __strftime_internal (s, maxsize, format, tp, &tzset_called
+ return __strftime_internal (s, maxsize, format, tp, &yr_spec, &tzset_called
ut_argument LOCALE_ARG);
}
#ifdef _LIBC
@@ -466,7 +467,7 @@ libc_hidden_def (my_strftime)
static size_t
__strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
- const struct tm *tp, bool *tzset_called
+ const struct tm *tp, int *yr_spec, bool *tzset_called
ut_argument_spec LOCALE_PARAM)
{
#if defined _LIBC && defined USE_IN_EXTENDED_LOCALE_MODEL
@@ -820,7 +821,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
if (modifier == L_('O'))
goto bad_format;
#ifdef _NL_CURRENT
- if (! (modifier == 'E'
+ if (! (modifier == L_('E')
&& (*(subfmt =
(const CHAR_T *) _NL_CURRENT (LC_TIME,
NLW(ERA_D_T_FMT)))
@@ -838,11 +839,12 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
{
CHAR_T *old_start = p;
size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
- tp, tzset_called ut_argument
- LOCALE_ARG);
+ tp, yr_spec, tzset_called
+ ut_argument LOCALE_ARG);
add (len, __strftime_internal (p, maxsize - i, subfmt,
- tp, tzset_called ut_argument
- LOCALE_ARG));
+ tp, yr_spec, tzset_called
+ ut_argument LOCALE_ARG));
+ *yr_spec = 0;
if (to_uppcase)
while (old_start < p)
@@ -917,7 +919,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
#ifdef _NL_CURRENT
if (! (modifier == L_('E')
&& (*(subfmt =
- (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
+ (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
!= L_('\0'))))
subfmt = (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(D_FMT));
goto subformat;
@@ -1262,7 +1264,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
DO_NUMBER (1, tp->tm_wday);
case L_('Y'):
- if (modifier == 'E')
+ if (modifier == L_('E'))
{
#if HAVE_STRUCT_ERA_ENTRY
struct era_entry *era = _nl_get_era_entry (tp HELPER_LOCALE_ARG);
@@ -1273,6 +1275,8 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
# else
subfmt = era->era_format;
# endif
+ if (pad != 0)
+ *yr_spec = pad;
goto subformat;
}
#else
@@ -1294,7 +1298,9 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
if (era)
{
int delta = tp->tm_year - era->start_date[0];
- DO_NUMBER (1, (era->offset
+ if (*yr_spec != 0)
+ pad = *yr_spec;
+ DO_NUMBER (2, (era->offset
+ delta * era->absolute_direction));
}
#else
new file mode 100644
@@ -0,0 +1,134 @@
+/* Verify the behavior of strftime on alternate representation for year.
+
+ Copyright (C) 2013-2018 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <locale.h>
+#include <time.h>
+#include <stdio.h>
+#include <string.h>
+
+static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8", "th_TH.UTF-8" };
+#define nlocales (sizeof (locales) / sizeof (locales[0]))
+
+static const char *formats[] = { "%EY", "%_EY", "%-EY" };
+#define nformats (sizeof (formats) / sizeof (formats[0]))
+
+static const struct
+{
+ const int d, m, y;
+} dates[] =
+ {
+ { 1, 3, 88 },
+ { 7, 0, 89 },
+ { 8, 0, 89 },
+ { 1, 3, 90 },
+ { 1, 3, 97 },
+ { 1, 3, 98 }
+ };
+#define ndates (sizeof (dates) / sizeof (dates[0]))
+
+static char ref[nlocales][nformats][ndates][100];
+
+static void
+mkreftable (void)
+{
+ int i, j, k;
+ char era[10];
+ static const int yrj[] = { 63, 64, 1, 2, 9, 10 };
+ static const int yrb[] = { 2531, 2532, 2532, 2533, 2540, 2541 };
+
+ for (i = 0; i < nlocales; i++)
+ for (j = 0; j < nformats; j++)
+ for (k = 0; k < ndates; k++)
+ {
+ if (i == 0)
+ {
+ sprintf (era, "%s", (k < 2) ? "\xe6\x98\xad\xe5\x92\x8c"
+ : "\xe5\xb9\xb3\xe6\x88\x90");
+ if (yrj[k] == 1)
+ sprintf (ref[i][j][k], "%s\xe5\x85\x83\xe5\xb9\xb4", era);
+ else
+ {
+ if (j == 0)
+ sprintf (ref[i][j][k], "%s%02d\xe5\xb9\xb4", era, yrj[k]);
+ else if (j == 1)
+ sprintf (ref[i][j][k], "%s%2d\xe5\xb9\xb4", era, yrj[k]);
+ else
+ sprintf (ref[i][j][k], "%s%d\xe5\xb9\xb4", era, yrj[k]);
+ }
+ }
+ else if (i == 1)
+ {
+ sprintf (era, "\xe0\xba\x9e\x2e\xe0\xba\xaa\x2e ");
+ sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+ }
+ else
+ {
+ sprintf (era, "\xe0\xb8\x9e\x2e\xe0\xb8\xa8\x2e ");
+ sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+ }
+ }
+}
+
+static int
+do_test (void)
+{
+ int i, j, k, result = 0;
+ struct tm ttm;
+ char date[11], buf[100];
+ size_t r, e;
+
+ mkreftable ();
+ for (i = 0; i < nlocales; i++)
+ {
+ if (setlocale (LC_ALL, locales[i]) == NULL)
+ {
+ printf ("locale %s does not exist, skipping...\n", locales[i]);
+ continue;
+ }
+ printf ("[%s]\n", locales[i]);
+ for (j = 0; j < nformats; j++)
+ {
+ for (k = 0; k < ndates; k++)
+ {
+ ttm.tm_mday = dates[k].d;
+ ttm.tm_mon = dates[k].m;
+ ttm.tm_year = dates[k].y;
+ strftime (date, sizeof (date), "%F", &ttm);
+ r = strftime (buf, sizeof (buf), formats[j], &ttm);
+ e = strlen (ref[i][j][k]);
+ printf ("%s\t\"%s\"\t\"%s\"", date, formats[j], buf);
+ if (strcmp (buf, ref[i][j][k]) != 0)
+ {
+ printf ("\tshould be \"%s\"", ref[i][j][k]);
+ if (r != e)
+ printf ("\tgot: %zu, expected: %zu", r, e);
+ result = 1;
+ }
+ else
+ printf ("\tOK");
+ putchar ('\n');
+ }
+ putchar ('\n');
+ }
+ }
+ return result;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"