[RFC,v8,05/16] Implement the %OB specifier - alternative month names (bug 10871)

Message ID 906183224.152188.1498644183114@poczta.nazwa.pl
State Superseded

Commit Message

Rafal Luzynski June 28, 2017, 10:03 a.m. UTC
  Some languages (Slavic, Baltic, etc.) require a genitive case of the
month name when formatting a full date (with the day number) while
they require a nominative case when referring to the month standalone.

strftime() now implements a %OB format specifier, which generates an
alternative month name, while %B continues to return the basic month
name.  For languages which do not distinguish nominative and genitive
month names, or whose locales have not been updated yet, %OB returns
the same string as %B, so moving to %OB is harmless as long as the
glibc version in use supports this feature.

Note that it is not yet decided whether %OB will return the
standalone case (usually nominative) and %B will return the full
date context case (usually genitive) or vice versa.  It depends
on the locale database and localized format strings which may vary
with the locales and depend on what a language community decides.

strptime() now accepts both nominative and genitive month names.
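
As a rough illustration of the call shape described above, the sketch below formats a month name with either %B or %OB. The helper name format_month_name is hypothetical and not part of the patch. Which of the two specifiers yields the standalone form was still undecided at this revision, so the sketch makes no claim about that; in the C locale both are expected to produce the same string on a glibc that supports %OB.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not part of the patch: write the name of MONTH
   (0..11) into BUF using "%OB" when ALTERNATIVE is nonzero and "%B"
   otherwise.  Returns the number of bytes written, or 0 on failure.  */
static size_t
format_month_name (char *buf, size_t size, int month, int alternative)
{
  struct tm tm = { 0 };
  const char *fmt = alternative ? "%OB" : "%B";

  tm.tm_mon = month;
  tm.tm_mday = 1;
  return strftime (buf, size, fmt, &tm);
}
```

A caller would pass a buffer such as `char buf[64]` and check for a zero return before using the result.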

[BZ #10871]
* time/strftime_l.c: %OB format for alternative month names added.
* time/strptime_l.c: alternative month names also recognized.
---
 time/strftime_l.c | 11 +++++++++--
 time/strptime_l.c | 24 ++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

 # define HERE_AM_STR (_nl_C_LC_TIME.values[_NL_ITEM_INDEX (AM_STR)].string)
@@ -403,6 +405,20 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
 	      if (s.decided !=raw)
 		{
 		  trp = rp;
+#ifdef _LIBC
+		  /* First check the alt month.  */
+		  if (match_string (_NL_CURRENT (LC_TIME, ALTMON_1 + cnt), trp)
+		      && trp > rp_longest)
+		    {
+		      rp_longest = trp;
+		      cnt_longest = cnt;
+		      if (s.decided == not
+			  && strcmp (_NL_CURRENT (LC_TIME, ALTMON_1 + cnt),
+				     alt_month_name[cnt]))
+			decided_longest = loc;
+		    }
+		  trp = rp;
+#endif
 		  if (match_string (_NL_CURRENT (LC_TIME, MON_1 + cnt), trp)
 		      && trp > rp_longest)
 		    {
@@ -429,6 +445,10 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
 	      if (s.decided != loc
 		  && (((trp = rp, match_string (month_name[cnt], trp))
 		       && trp > rp_longest)
+#ifdef _LIBC
+		      || ((trp = rp, match_string (alt_month_name[cnt], trp))
+			  && trp > rp_longest)
+#endif
 		      || ((trp = rp, match_string (ab_month_name[cnt], trp))
 			  && trp > rp_longest)))
 		{
@@ -1016,6 +1036,10 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
 	case 'O':
 	  switch (*fmt++)
 	    {
+	    case 'B':
+	      /* Undo the increment and continue.  */
+	      fmt--;
+	      break;
 	    case 'd':
 	    case 'e':
 	      /* Match day of month using alternate numeric symbols.  */
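
The strptime hunks above try the alternative month name as well as the regular one and keep whichever match consumes the most input. A minimal standalone sketch of that longest-match idea follows. It is not the glibc code: the function match_month and the two tables are hypothetical, the Polish names are hard-coded purely for illustration, and the `decided` state tracking of the real parser is left out.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Illustrative tables only; glibc reads these from the locale data.  */
static const char *const nominative[12] = {
  "styczeń", "luty", "marzec", "kwiecień", "maj", "czerwiec",
  "lipiec", "sierpień", "wrzesień", "październik", "listopad", "grudzień"
};
static const char *const genitive[12] = {
  "stycznia", "lutego", "marca", "kwietnia", "maja", "czerwca",
  "lipca", "sierpnia", "września", "października", "listopada", "grudnia"
};

/* Hypothetical helper: return the month index (0..11) whose name is the
   longest case-insensitive prefix of INPUT, or -1 if none matches.
   *CONSUMED receives the number of bytes matched.  */
static int
match_month (const char *input, size_t *consumed)
{
  const char *const *tables[2] = { genitive, nominative };
  int best = -1;
  size_t best_len = 0;

  for (size_t t = 0; t < 2; t++)
    for (int m = 0; m < 12; m++)
      {
        size_t len = strlen (tables[t][m]);
        /* Only a strictly longer match replaces the current best.  */
        if (len > best_len && strncasecmp (tables[t][m], input, len) == 0)
          {
            best = m;
            best_len = len;
          }
      }

  *consumed = best_len;
  return best;
}
```

With these tables, the input "maja 2017" matches the genitive "maja" (four bytes) rather than stopping at the nominative "maj", which is the reason for comparing match lengths instead of accepting the first hit.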
  

Comments

Zack Weinberg July 3, 2017, 9:30 p.m. UTC | #1
On 06/28/2017 06:03 AM, Rafal Luzynski wrote:
> Some languages (Slavic, Baltic, etc.) require a genitive case of the
> month name when formatting a full date (with the day number) while
> they require a nominative case when referring to the month standalone.

I'm not familiar with the guts of str[pf]time_l at all, so I cannot
comment on the correctness of your actual code changes here.

> Note that it is not yet decided whether %OB will return the
> standalone case (usually nominative) and %B will return the full
> date context case (usually genitive) or vice versa.  It depends
> on the locale database and localized format strings which may vary
> with the locales and depend on what a language community decides.

I don't think that this decision (whether %OB will return the standalone
case and %B will return the contextual case, or vice versa) should be
allowed to be locale-dependent.  The decision of *which grammatical form
is appropriate* for each context is obviously locale-dependent, but the
rule of which formatter is for which context needs to be consistent
across all locales, so that the _call to strftime_ itself doesn't need
to be locale-dependent.  (Not everyone is going to use %x and %X.)

zw
  
Rafal Luzynski July 4, 2017, 11:43 p.m. UTC | #2
3.07.2017 23:30 Zack Weinberg <zackw@panix.com> wrote:
> [...]
> I don't think that this decision (whether %OB will return the standalone
> case and %B will return the contextual case, or vice versa) should be
> allowed to be locale-dependent. The decision of *which grammatical form
> is appropriate* for each context is obviously locale-dependent, but the
> rule of which formatter is for which context needs to be consistent
> across all locales, so that the _call to strftime_ itself doesn't need
> to be locale-dependent. (Not everyone is going to use %x and %X.)
>
> zw

I often see the format specifiers marked as translatable in
applications so the translators usually have more freedom of choice.
This is not limited to whether to use %B or %OB but things like:
day before or after the month?  do we use spaces?  should there be
dots or dashes or slashes between day/month/year?  does our
language have month names at all (%B) or only numbers (%m)?

But this will not work if an application programmer decides to
generate the month names with nl_langinfo(): the choice between
MON_1 and ALTMON_1 is then fixed in the code and cannot be
locale-dependent.  Therefore it is more convenient to assume that
alt_mon (in the locale data definition file), nl_langinfo(ALTMON_x),
and strftime("%OB") all return the standalone case, while the other
forms return the full-date case.

Regards,

Rafal
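
For illustration, the nl_langinfo() case discussed in the reply above might look like the sketch below. It assumes the convention proposed in that reply (ALTMON_x is the standalone form) and that <langinfo.h> exposes ALTMON_1 as a macro when _GNU_SOURCE is defined; the helper name standalone_month_name is hypothetical. Where ALTMON_1 is not available, the sketch falls back to MON_1.

```c
#define _GNU_SOURCE		/* Assumed to be required for ALTMON_1.  */
#include <langinfo.h>

/* Hypothetical helper: return the standalone name of MONTH (0..11) in
   the current locale, or "?" for an out-of-range month.  */
static const char *
standalone_month_name (int month)
{
  if (month < 0 || month > 11)
    return "?";
#ifdef ALTMON_1
  return nl_langinfo (ALTMON_1 + month);
#else
  /* No alternative names available: use the regular ones.  */
  return nl_langinfo (MON_1 + month);
#endif
}
```

Because the item constant is chosen at compile time, a translator cannot swap it the way a translatable format string can be swapped, which is why the meaning of ALTMON_x has to be the same in every locale.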
  

Patch

diff --git a/time/strftime_l.c b/time/strftime_l.c
index b5ba9ca..1c4bed8 100644
--- a/time/strftime_l.c
+++ b/time/strftime_l.c
@@ -492,6 +492,9 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 # define f_month \
   ((const CHAR_T *) (tp->tm_mon < 0 || tp->tm_mon > 11			     \
 		     ? "?" : _NL_CURRENT (LC_TIME, NLW(MON_1) + tp->tm_mon)))
+# define f_altmonth \
+  ((const CHAR_T *) (tp->tm_mon < 0 || tp->tm_mon > 11			     \
+		     ? "?" : _NL_CURRENT (LC_TIME, NLW(ALTMON_1) + tp->tm_mon)))
 # define ampm \
   ((const CHAR_T *) _NL_CURRENT (LC_TIME, tp->tm_hour > 11		      \
 				 ? NLW(PM_STR) : NLW(AM_STR)))
@@ -507,6 +510,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 		   ? "?" : month_name[tp->tm_mon])
 #  define a_wkday f_wkday
 #  define a_month f_month
+#  define f_altmonth f_month
 #  define ampm (L_("AMPM") + 2 * (tp->tm_hour > 11))
 
   size_t aw_len = 3;
@@ -785,7 +789,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 #endif
 
 	case L_('B'):
-	  if (modifier != 0)
+	  if (modifier == L_('E'))
 	    goto bad_format;
 	  if (change_case)
 	    {
@@ -793,7 +797,10 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	      to_lowcase = 0;
 	    }
 #if defined _NL_CURRENT || !HAVE_STRFTIME
-	  cpy (STRLEN (f_month), f_month);
+	  if (modifier == L_('O'))
+	    cpy (STRLEN (f_altmonth), f_altmonth);
+	  else
+	    cpy (STRLEN (f_month), f_month);
 	  break;
 #else
 	  goto underlying_strftime;
diff --git a/time/strptime_l.c b/time/strptime_l.c
index 185619e..4c62435 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -124,6 +124,8 @@  extern const struct __locale_data _nl_C_LC_TIME attribute_hidden;
   (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (ABDAY_1)].string)
 # define month_name (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (MON_1)].string)
 # define ab_month_name (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (ABMON_1)].string)
+# define alt_month_name \
+  (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (ALTMON_1)].string)
 # define HERE_D_T_FMT (_nl_C_LC_TIME.values[_NL_ITEM_INDEX (D_T_FMT)].string)
 # define HERE_D_FMT (_nl_C_LC_TIME.values[_NL_ITEM_INDEX (D_FMT)].string)