[RFC,v4,06/11] Provide backward compatibility for strftime family (bug 10871).

Message ID 758885038.1799972.1477615768169@poczta.nazwa.pl
State Superseded

Commit Message

Rafal Luzynski Oct. 28, 2016, 12:49 a.m. UTC
  As the %OB format specifier has been added to the strftime/wcsftime
family of functions, a backward-compatible implementation must be
provided for older binaries which assume that %B returns
a month name in the nominative case.

[BZ #10871]
* include/time.h: Declare __strftime_l_common.
* include/wchar.h: Declare __wcsftime_l_common.
* time/Versions (libc: GLIBC_2.25):
  New (__)strftime(_l) and (__)wcsftime(_l) added.
* time/strftime.c: Provide backward compatible version.
* time/strftime_l.c: Likewise.
* time/wcsftime.c: Likewise.
* time/wcsftime_l.c: Likewise.
---
 include/time.h    |  9 +++++++++
 include/wchar.h   |  9 +++++++++
 time/Versions     |  4 ++++
 time/strftime.c   | 20 +++++++++++++++++---
 time/strftime_l.c | 55 +++++++++++++++++++++++++++++++++++++++++++++----------
 time/wcsftime.c   | 22 ++++++++++++++++++----
 time/wcsftime_l.c | 25 ++++++++++++++++++++++++-
 7 files changed, 126 insertions(+), 18 deletions(-)
  

Comments

Florian Weimer Nov. 4, 2016, 1:40 p.m. UTC | #1
On 10/28/2016 02:49 AM, Rafal Luzynski wrote:
> As %OB format specifier has been added to strftime/wcsftime
> family of functions backward compatibility implementation must be
> provided for older binaries which assume that %B returns
> a month name in the nominative case.

I think this is a misuse of symbol versioning.  Why would I want to pick 
up this change when compiling from source, but not for existing binaries?

Florian
  
Rafal Luzynski Nov. 5, 2016, 10:53 a.m. UTC | #2
4.11.2016 14:40 Florian Weimer <fweimer@redhat.com> wrote:
>
>
> On 10/28/2016 02:49 AM, Rafal Luzynski wrote:
> > As %OB format specifier has been added to strftime/wcsftime
> > family of functions backward compatibility implementation must be
> > provided for older binaries which assume that %B returns
> > a month name in the nominative case.
>
> I think this is a misuse of symbol versioning. Why would I want to pick
> up this change when compiling from source, but not for existing binaries?
>
> Florian

There may be applications which rely on the fact that "%B"
returns the month name in a nominative case. An example is cal(1)
which has been pointed out in [1]. Their source code should be
changed to use "%OB" but it cannot be expected from the existing
binaries.

Did I understand your question correctly?

You could also ask how to provide the backward compatibility
for the applications compiled from source. I think it's
impossible and it's been kinda agreed in [2]. I think that
another argument is that if you're compiling an application
from the source code then you can change %B to %OB or put some
code detecting the current glibc version (at compile time or
even at runtime).

Regards,

Rafal Luzynski

[1] https://sourceware.org/ml/libc-alpha/2016-03/msg00698.html
[2] https://sourceware.org/ml/libc-alpha/2016-06/msg00021.html
  
Florian Weimer Nov. 7, 2016, 2:13 p.m. UTC | #3
On 11/05/2016 11:53 AM, Rafal Luzynski wrote:
> 4.11.2016 14:40 Florian Weimer <fweimer@redhat.com> wrote:
>>
>>
>> On 10/28/2016 02:49 AM, Rafal Luzynski wrote:
>>> As %OB format specifier has been added to strftime/wcsftime
>>> family of functions backward compatibility implementation must be
>>> provided for older binaries which assume that %B returns
>>> a month name in the nominative case.
>>
>> I think this is a misuse of symbol versioning. Why would I want to pick
>> up this change when compiling from source, but not for existing binaries?
>>
>> Florian
>
> There may be applications which rely on the fact that "%B"
> returns the month name in a nominative case. An example is cal(1)
> which has been pointed out in [1]. Their source code should be
> changed to use "%OB" but it cannot be expected from the existing
> binaries.

> You could also ask how to provide the backward compatibility
> for the applications compiled from source.

Yes, that's what I'm concerned about.

> I think it's impossible and it's been kinda agreed in [2].

I think we should strive to provide backwards compatibility for 
applications and not alter the meaning of %B, and rather change %c to 
use %OB (or whatever the source of the month name in genitive ends up to 
be) instead of %B.

One example where this matters is German.  If POSIX requires that %B 
returns the genitive case, as has been suggested, then all applications 
which currently use %B are broken because I have yet to see a 
mechanically generated German date string which actually needs the 
genitive case.  In current usage, they only occur in phrases such as “on 
the last Sunday of November”.

Does this clarify my position?

Thanks,
Florian
  
Rafal Luzynski Nov. 8, 2016, 11:39 a.m. UTC | #4
Hi,

TL;DR: German language is probably not affected by this bug so will
not be affected by the change. You will not see any difference.
If you thought otherwise that's only because of my lack of precision
and I'm sorry for this.

More details below:

7.11.2016 15:13 Florian Weimer <fweimer@redhat.com> wrote:
>
>
> On 11/05/2016 11:53 AM, Rafal Luzynski wrote:
> > 4.11.2016 14:40 Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >>
> >> On 10/28/2016 02:49 AM, Rafal Luzynski wrote:
> >>> As %OB format specifier has been added to strftime/wcsftime
> >>> family of functions backward compatibility implementation must be
> >>> provided for older binaries which assume that %B returns
> >>> a month name in the nominative case.
> >>
> >> I think this is a misuse of symbol versioning. Why would I want to pick
> >> up this change when compiling from source, but not for existing binaries?
> >>
> >> Florian
> >
> > There may be applications which rely on the fact that "%B"
> > returns the month name in a nominative case. An example is cal(1)
> > which has been pointed out in [1]. Their source code should be
> > changed to use "%OB" but it cannot be expected from the existing
> > binaries.
>
> > You could also ask how to provide the backward compatibility
> > for the applications compiled from source.
>
> Yes, that's what I'm concerned about.
>
> > I think it's impossible and it's been kinda agreed in [2].
>
> I think we should strive to provide backwards compatibility for
> applications and not alter the meaning of %B, and rather change %c to
> use %OB (or whatever the source of the month name in genitive ends up to
> be) instead of %B.

I was considering this approach and I've abandoned it because the only
reason why I was considering it was because I had misread the POSIX
proposal. More details:
https://sourceware.org/bugzilla/show_bug.cgi?id=10871#c39

While at it, changing %c and also %x would be required but insufficient
because applications often use their custom formats like day-month
or weekday-day-month which are neither %c nor %x.

> One example where this matters is German.

It's good that you mention a specific language. My knowledge of German
is next to zero but fortunately I have someone to ask for more details.

> If POSIX requires that %B
> returns the genitive case, as has been suggested,

It seems to me that here is the root of misunderstanding and if it's
caused by my lack of precision then I apologize for this. I have
always used the terms "genitive case" and "nominative case" only
as shortcuts because I should actually say "month name in a form
appropriate when formatting a full date including a day number"
and "month name in a form appropriate when formatting a month name
standalone, without a day number". Occasionally I also refer to
this full expression. There are languages which require genitive
when formatting with date and nominative when standalone; there may
be other languages which require another pair of forms (although
I'm not aware of any such language), and there are languages which
require always one form (nominative) although they also have genitive
form which is not applicable here, an example is German probably.

CLDR refers to "stand-alone" and "format", which seemed too ambiguous to me.
More info:
http://cldr.unicode.org/translation/date-time#TOC-Stand-Alone-vs.-Format-Styles

Now when I look at http://austingroupbugs.net/view.php?id=258
I can see they made the same error: they emphasize genitive and
nominative case while they should emphasize "with a day number"
and "no day number" and put "genitive" and "nominative" in
parentheses only as possible examples of how some languages
implement this feature.

> then all applications
> which currently use %B are broken

Are they broken already? When you type this command:

$ date +"%d %B %Y"

is it broken in the German locale? In Polish it's broken, and so it is in
Czech, Russian, Finnish, Greek, and more. If it's also broken
in German then you'll be happy to see the change. If it's not broken
then you don't need any change.

> because I have yet to see a
> mechanically generated German date string which actually needs the
> genitive case. In current usage, they only occur in phrases such as “on
> the last Sunday of November”.

Note that a genitive form will be used only if it is provided in
locale data. So if German language has a genitive form but does not
require (or even prohibits) using it when formatting a full date then
all you have to do is not to provide any changes to the locale, just
leave it as is. My patch provides a way to provide two different
forms for month names but this second form is optional. There will
be no change visible if there are no changes in the locale data.
That's the reason why I have also provided sample locale data for
some languages; these changes are not intended to be committed
(although may be committed if translators apparently find them
correct).

In your example, a phrase “on the last Sunday of November” looks
like an attempt to employ strftime() to generate natural language
sentences rather than to format a date. It's been discussed in
bugzilla and stated that strftime() will never be suitable for this.

> Does this clarify my position?
>
> Thanks,
> Florian

I think so, and now I'd also like to ask: does this answer your
questions?

Thank you for your participation and support, I'll appreciate
more feedback from you and from other people.

Regards,

Rafal
  
Florian Weimer Nov. 9, 2016, 10:49 a.m. UTC | #5
On 11/08/2016 12:39 PM, Rafal Luzynski wrote:

> CLDR refers to "stand-alone" and "format", which seemed too ambiguous to me.
> More info:
> http://cldr.unicode.org/translation/date-time#TOC-Stand-Alone-vs.-Format-Styles

I think this distinction is not too bad.  It's certainly an improvement 
over the declension-based approach.  The additional explanation 
regarding elision is also helpful.

>> then all applications
>> which currently use %B are broken
>
> Are they broken already? When you type this command:
>
> $ date +"%d %B %Y"
>
> is it broken in German locale?

Yes, it should be

$ date +"%-d. %B %Y"

This wasn't your point, but it's still relevant because it means that 
such date format strings need translation.  But that's probably true for 
the strings used by tools like cal (where "%B %Y" may not work 
unconditionally today).

> Note that a genitive form will be used only if it is provided in
> locale data. So if German language has a genitive form but does not
> require (or even prohibits) using it when formatting a full date then
> all you have to do is not to provide any changes to the locale, just
> leave it as is. My patch provides a way to provide two different
> forms for month names but this second form is optional. There will
> be no change visible if there are no changes in the locale data.
> That's the reason why I have also provided sample locale data for
> some languages; these changes are not intended to be committed
> (although may be committed if translators apparently find them
> correct).

Right, German wouldn't need any changes because there is only one set of 
month names relevant here.

> In your example, a phrase “on the last Sunday of November” looks
> like an attempt to employ strftime() to generate natural language
> sentences rather than to format a date. It's been discussed in
> bugzilla and stated that strftime() will never be suitable for this.

Yes, I fully expect that this would not work.  The use of the genitive 
in full date specifications has almost completely died out in German, 
and the genitive is no longer very pronounced, either (the trailing “s” 
is sometimes elided).

I still find it odd that we want to turn %B into the mangled name for 
full dates, and not %OB (and switch all the formats in places that want 
to use the new capability).  The latter seems to be better from a 
backwards-compatibility perspective.

In fact, a *lot* of languages today use "%d de %B" as the date format 
string (although not in our locales, we usually sidestep this by using 
abbreviations only for %c).  In some of these languages, the “e” in “de” 
is subject to elision, so we really want

   14 d'abril

and not

   14 de abril

The only way to get this is to put the “de” and “d'” into the mangled 
month name (as has been suggested in the CLDR reference).  But this 
means that "%d de %B" (which seems to be the most commonly used form for 
these languages today) is expanded into “14 de d'abril”, which is not 
what we want.  I am worried that this puts pressure on us *not* to 
introduce mangled month names at all for these languages.

These issues go away if we keep the existing month names unchanged and 
add the new mangled names under new identifiers.

Florian
  
Florian Weimer Nov. 9, 2016, 11 a.m. UTC | #6
On 11/07/2016 03:13 PM, Florian Weimer wrote:

> One example where this matters is German.  If POSIX requires that %B
> returns the genitive case, as has been suggested, then all applications
> which currently use %B are broken because I have yet to see a
> mechanically generated German date string which actually needs the
> genitive case.  In current usage, they only occur in phrases such as “on
> the last Sunday of November”.

The concern expressed in the paragraph above is not relevant because 
it's about month name mangling for inclusion in full date strings, and 
not about declension.  German does not have an issue with that, but many 
Romance languages do because some of them do require such mangling in 
full date strings, as explained here:

   https://sourceware.org/ml/libc-alpha/2016-11/msg00321.html

To be absolutely clear, I still object to the change to %B (and MON_1 
etc.) and moving the old definitions to %OB (ALTMON_1 etc.).  The right 
way to do this is to leave %B (MON_1 etc.) alone and add %OB (ALTMON_1 
etc.) with the new mangled form.  This also removes the surprising 
change of behavior due to a simple recompilation of unchanged sources.

As there has been some confusion regarding past objections of mine which 
were apparently worded in too conciliatory a way, I consider this a 
sustained objection under the glibc consensus protocol.

Thanks,
Florian
  
Rafal Luzynski Nov. 10, 2016, 12:33 a.m. UTC | #7
9.11.2016 11:49 Florian Weimer <fweimer@redhat.com> wrote:
>
>
> On 11/08/2016 12:39 PM, Rafal Luzynski wrote:
>
> [...]
> >> then all applications
> >> which currently use %B are broken
> >
> > Are they broken already? When you type this command:
> >
> > $ date +"%d %B %Y"
> >
> > is it broken in German locale?
>
> Yes, it should be
>
> > $ date +"%-d. %B %Y"
>
> This wasn't your point, but it's still relevant because it means that
> such date format strings need translation.

Yes, they do, and the translators do their jobs correctly already.
However, they are unable to provide a correct form when a month
in a genitive case is required (or more generally: a different form
is required for the month name in a date while the current form must
be retained for other purposes).

> But that's probably true for
> the strings used by tools like cal (where "%B %Y" may not work
> unconditionally today).

Yes, and in many other cases: some languages require a dot after
a day number, some don't; some prefer "%d %B" some "%B %d" etc.

> [...]
> I still find it odd that we want to turn %B into the mangled name for
> full dates, and not %OB (and switch all the formats in places that want
> to use the new capability). The latter seems to be better from a
> backwards-compatibility perspective.

I've discussed all possible solutions in [1], including what you
have proposed here. In short, no solution is perfect and each has
its advantages and disadvantages. Your solution has these pros:

- does not cause any backward compatibility issues;
- does not break any existing application where the current solution
  is correct.

At the same time it has the following cons:

- introduces incompatibilities with *BSD family (including OS X) and
  with the probable future POSIX specification which will remain
  forever - please read below why I find it important;
- does not actually solve the problem for any existing application
  until the authors or translators change %B to %OB (in case of open
  source programs we can reach the upstreams and suggest solution).

My solution has these pros:

- will automagically solve the problems for all applications where
  it is broken;
- will remain compatible with an existing *BSD solution and possible
  future POSIX specification.

Also has this disadvantage:

- will break some existing applications where current solution is correct;

but:

- in case of open source software we can reach the upstreams and suggest
  solution;
- in case of closed source software distributed in a binary form we can
  provide a backward compatible ABI which will provide the old behaviour
  for older programs.

I believe there are fewer cases where a month name is displayed
standalone than those where it is displayed with a day number therefore
I believe that a fallout caused by applications broken by my solution
is smaller than the fallout caused by the applications broken now.
And the severity of the new bugs is equal to the severity of the current
bugs. Therefore I'm still on the side of my solution. Of course I'm
not a regular glibc developer so if you use your power and provide
your solution (you're even free to rework my patches) then I'll have
no choice but to live with that; the apps are able to adapt to it, too.

Now more about why it's valuable IMHO to remain compatible with
other systems. The next thing I'm going to do is to fix the same
bug in glib. [2] The solution of swbz#10871 will automagically fix
bgo#749206 on Linux and provide a specification of how to fix it on
other platforms. Glib is a portable multiplatform library so if we
implement ALTMON_x and %OB in a manner incompatible with *BSD then
we'll have to say to the developers:

- on Linux in g_date_strftime() use %B and %OB;
- on OS X and *BSD use %OB and %B (note: swapped);
- on Windows we'll provide a custom solution, yet to be decided
  whether it'll be Linux-like or BSD-like;
- similar problems with g_date_time_printf().

> In fact, a *lot* of languages today use "%d de %B" as the date format
> string (although not in our locales, we usually sidestep this by using
> abbreviations only for %c). In some of these languages, the “e” in “de”
> is subject to elision, so we really want
>
> 14 d'abril
>
> and not
>
> 14 de abril

Good point, CLDR lists this feature for Asturian and Catalan. I think
I read somewhere about the same problem in Italian but CLDR does not
list it.

> The only way to get this is to put the “de” and “d'” into the mangled
> month name (as has been suggested in the CLDR reference).

Exactly!

> But this
> means that "%d de %B" (which seems to be the most commonly used form for
> these languages today) is expanded into “14 de d'abril”, which is not
> what we want.

"%d de %B" is just a temporary workaround which already works
incorrectly generating "de abril", as you quoted above.

> I am worried that this puts pressure on us *not* to
> introduce mangled month names at all for these languages.

Local language communities will have a chance to decide whether
they want this change or not. Same as with German, no change will
be visible until the locale data are updated. Each language will have
to provide their data, or, if all locale data will be copied from CLDR,
they will be able to revert the change.

> These issues go away if we keep the existing month names unchanged and
> add the new mangled names under new identifiers.
>
> Florian

I understand your objections and still I keep my opinion. Is there
any committee in glibc which would decide which way to choose?

Best regards,

Rafal

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=10871#c7
[2] https://bugzilla.gnome.org/show_bug.cgi?id=749206
  
Florian Weimer Nov. 10, 2016, 12:41 p.m. UTC | #8
On 11/10/2016 01:33 AM, Rafal Luzynski wrote:

> I've discussed all possible solutions in [1], including what you
> have proposed here. In short, no solution is perfect and each has
> its advantages and disadvantages. Your solution has these pros:
>
> - does not cause any backward compatibility issues;
> - does not break any existing application where the current solution
>   is correct.
>
> At the same time it has the following cons:
>
> - introduces incompatibilities with *BSD family (including OS X) and
>   with the probable future POSIX specification which will remain
>   forever - please read below why I find it important;

Even the FreeBSD situation is in support of my proposal because 
implementing it would improve date formatting:

[root@bsd ~]# uname -a
FreeBSD bsd 11.0-RELEASE-p1 FreeBSD 11.0-RELEASE-p1 #0 r306420: Thu Sep 
29 01:43:23 UTC 2016 
root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
[root@bsd ~]# LC_ALL=ca_ES.UTF-8 date -j 201604141000
dijous, 14 de d’abril de 2016, 10:00:00 UTC
[root@bsd ~]# LC_ALL=ca_ES.UTF-8 cal
   De novembre 2016
dg dl dt dc dj dv ds
        1  2  3  4  5
  6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30

In the date case, this is not even a third-party application using a 
hard-coded strftime argument, it's right in the base operating system, 
in the locale data.

I couldn't test Thunderbird because it did not pick up the Catalan 
language pack for some reason, but the sources use "de %B" as a date 
format, so I expect that they are broken on FreeBSD, too.

I think this shows that whatever is currently proposed for POSIX has 
plenty of unintended consequences.

> - does not actually solve the problem for any existing application
>   until the authors or translators change %B to %OB (in case of open
>   source programs we can reach the upstreams and suggest solution).

Yes, but this has to be weighed against all the applications which are 
broken after the change.

> My solution has these pros:
>
> - will automagically solve the problems for all applications where
>   it is broken;

Not true, see the FreeBSD example, where the full-format date string is 
still incorrect.

> - will remain compatible with an existing *BSD solution and possible
>   future POSIX specification.

FreeBSD has to fix things anyway, so changing the approach would not 
create additional work for them.

> Also has this disadvantage:
>
> - will break some existing applications where current solution is correct;

Some?  Most existing applications (which use date formats) for some 
locales, I would say.

> but:
>
> - in case of open source software we can reach the upstreams and suggest
>   solution;
> - in case of closed source software distributed in a binary form we can
>   provide a backward compatible ABI which will provide the old behaviour
>   for older programs.

I think we should, if at all possible, avoid situations where mere 
recompilation of an application introduces subtle changes.  Software is 
increasingly bundled and compiled by downstream developers and not 
distributions.

> I believe there are fewer cases where a month name is displayed
> standalone than those where it is displayed with a day number therefore
> I believe that a fallout caused by applications broken by my solution
> is smaller than the fallout caused by the applications broken now.

This might be true for affected Slavic locales (but I haven't investigated).

> And the severity of the new bugs is equal to the severity of the current
> bugs.

I think for Romance languages with elision, the slightly incorrect “de” 
is preferred to the “de de” or “de d'” we'd get with your approach (and 
as the FreeBSD example shows, these situations are hardly temporary, but 
the bugs stick around for quite some time).

>> I am worried that this puts pressure on us *not* to
>> introduce mangled month names at all for these languages.
>
> Local language communities will have a chance to decide whether
> they want this change or not. Same as with German, no change will
> be visible until the locale data are updated. Each language will have
> to provide their data, or, if all locale data will be copied from CLDR,
> they will be able to revert the change.

They do not have much choice here if they do not want to break most 
applications temporarily.

Florian
  
Rafal Luzynski Nov. 10, 2016, 6:42 p.m. UTC | #9
10.11.2016 13:41 Florian Weimer <fweimer@redhat.com> wrote:
>
>
> On 11/10/2016 01:33 AM, Rafal Luzynski wrote:
>
> > I've discussed all possible solutions in [1], including what you
> > have proposed here. In short, no solution is perfect and each has
> > its advantages and disadvantages. Your solution has these pros:
> >
> > - does not cause any backward compatibility issues;
> > - does not break any existing application where the current solution
> > is correct.
> >
> > At the same time it has the following cons:
> >
> > - introduces incompatibilities with *BSD family (including OS X) and
> > with the probable future POSIX specification which will remain
> > forever - please read below why I find it important;
>
> Even the FreeBSD situation is in support of my proposal because
> implementing it would improve date formatting:
>
> [root@bsd ~]# uname -a
> FreeBSD bsd 11.0-RELEASE-p1 FreeBSD 11.0-RELEASE-p1 #0 r306420: Thu Sep
> 29 01:43:23 UTC 2016
> root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
> [root@bsd ~]# LC_ALL=ca_ES.UTF-8 date -j 201604141000
> dijous, 14 de d’abril de 2016, 10:00:00 UTC

I was investigating these cases long ago and in Linux only,
sorry if I'm inaccurate and please tell me if an actual investigation
is needed again. I don't know if FreeBSD uses the same GNU coreutils
as Linux does but if it does then it's not a surprise if some bugs
are common.

So, AFAIR, date does not even call strftime() except when expanding
"%c" and "%x". Instead it reimplements the same algorithm as strftime()
but writes the result directly to stdout. The aim is to support
arbitrarily long format strings without risking a memory overflow.
Then the same function is reused by du. Both belong to the package
coreutils. I'm aware that coreutils will have to be fixed the same
way as we would fix glibc.

In this particular case, it looks like date uses whatever is used
as an expansion of "%c" which for Catalan seems to contain something
like "%a, %-d de %B de..." which is an attempt to work around an issue
which does not exist in FreeBSD. Probably locale data need to be fixed,
removal of "de" and one space should be sufficient in this case.
Actually this workaround is not even correct because neither
"de d’abril" nor "de abril" is correct.

May I ask you what is the result of this command in that system?

    LC_ALL=ca_ES.UTF-8 locale date_fmt

(I hope the command is correct and displays the format for "%c".)

> [root@bsd ~]# LC_ALL=ca_ES.UTF-8 cal
> De novembre 2016
> dg dl dt dc dj dv ds
>        1  2  3  4  5
>  6  7  8  9 10 11 12
> 13 14 15 16 17 18 19
> 20 21 22 23 24 25 26
> 27 28 29 30

Again I'm not sure if this is the same cal as in Linux but it looks
like it uses strftime("%B") or nl_langinfo(MON_1) where it should
use strftime("%OB") or nl_langinfo(ALTMON_1), respectively. As I said
above, I'm aware of this issue and cal is one of these apps that would
get broken and would have to be fixed.

> In the date case, this is not even a third-party application using a
> hard-coded strftime argument, it's right in the base operating system,
> in the locale data.

That's good IMHO because we know how to reach it upstream. There is
worse situation with the software we are not aware of.

> I couldn't test Thunderbird because it did not pick up the Catalan
> language pack for some reason, but the sources use "de %B" as a date
> format, so I expect that they are broken on FreeBSD, too.

I would expect Thunderbird to be a good example of an application
which is broken now and would get fixed; hopefully it's already
working correctly in FreeBSD.

I'm pretty sure it's a translation file (*.po, *.mo, *.gmo) which
provides "de %B" rather than the source code (*.c, *.cpp). Again,
this is an attempt to work around the situation and even this does
not work correctly because it generates "de abril" on Linux and
"de d'abril" on BSD. Nothing better can be done until we fix this bug.

> I think this shows that whatever is currently proposed for POSIX has
> plenty of unintended consequences.
>
> > - does not actually solve the problem for any existing application
> > until the authors or translators change %B to %OB (in case of open
> > source programs we can reach the upstreams and suggest solution).
>
> Yes, but this has to be weighed against all the applications which are
> broken after the change.

That's what I'm trying to estimate and so far I guess there are
more apps broken now than those that will get broken.

I'm not sure if there is a way to grep over as many source codes
as possible and check how they use strftime() and nl_langinfo().

> > My solution has these pros:
> >
> > - will automagically solve the problems for all applications where
> > it is broken;
>
> Not true, see the FreeBSD example, where the full-format date string is
> still incorrect.

In case of date utility it's because they provide a default format
string for Catalan locale incorrectly. If you put a correct format string
like this:

date +"%-d %B de %Y"

the result would be correct. I'm not sure why the Catalan locale in FreeBSD
provides that additional "de". Is it a remnant from the times when they
had the same bug (pre-1999)? Is it copied from Linux?

> > - will remain compatible with an existing *BSD solution and possible
> > future POSIX specification.
>
> FreeBSD has to fix things anyway, so changing the approach would not
> create additional work for them.

Well, if you convinced *BSD (and Apple) to swap their meaning of "%B"
and "%OB" it would make it possible to implement the same in Linux.
I'm afraid they wouldn't agree.

> > Also has this disadvantage:
> >
> > - will break some existing applications where current solution is correct;
>
> Some? Most existing applications (which use date formats) for some
> locales, I would say.

No, only those which list months standalone. Except calendars and some
applications grouping objects (e.g., documents, including some blog
managers) by months I can't imagine any software doing so. All other software
would get fixed. The examples in Catalan which you have provided are
caused by an attempt to fix the problem putting "de" before the month
name. Even this workaround is not perfect. But, again, if Catalan people
prefer they may choose to remain with current locale settings and current
workaround. Like German, they will not see any difference if they leave
the locale data unchanged.

>
> > but:
> >
> > - in case of open source software we can reach the upstreams and suggest
> > solution;
> > - in case of closed source software distributed in a binary form we can
> > provide a backward compatible ABI which will provide the old behaviour
> > for older programs.
>
> I think we should, if at all possible, avoid situations where mere
> recompilation of an application introduces subtle changes. Software is
> increasingly bundled and compiled by downstream developers and not
> distributions.
>
> > I believe there are fewer cases where a month name is displayed
> > standalone than those where it is displayed with a day number therefore
> > I believe that a fallout caused by applications broken by my solution
> > is smaller than the fallout caused by the applications broken now.
>
> This might be true for affected Slavic locales (but I haven't investigated).

I'm counting (or rather trying to guess) the numbers of applications
here, not the number of languages (locales). I mean the number of apps
which: are broken now and will get fixed vs those which are working
correctly now and will get broken.

>
> > And the severity of the new bugs is equal to the severity of the current
> > bugs.
>
> I think for Romance languages with elision, the slightly incorrect “de”
> is preferred to the “de de” or “de d'” we'd get with your approach (and
> as the FreeBSD example shows, these situations are hardly temporary, but
> the bugs stick around for quite some time).

Definitely, "de de" or "de d'" is incorrect but if someone touches the
locale data for some language they should remove these additional "de"
from "%c" and "%x" at the same time while providing "alternative"
month names. Regarding the software which has "de" provided by
translators it's a task for translators to fix it.

Here I think that maybe we should reach out to some local communities,
including translators, and ask which solution they would prefer: would they
like nothing to change until they change every "de %B" to "%OB", or would
they like "de de" to suddenly appear until they change "de %B"
to "%B"? Is trans@lists.fedoraproject.org a good place to discuss it?

Regards,

Rafal
  
Andreas Schwab Nov. 10, 2016, 7:19 p.m. UTC | #10
On Nov 10 2016, Rafal Luzynski <digitalfreak@lingonborough.com> wrote:

> I was investigating these cases long ago and in Linux only,
> sorry if I'm inaccurate and please tell me if an actual investigation
> is needed again. I don't know if FreeBSD uses the same GNU coreutils
> as Linux does but if it does then it's not a surprise if some bugs
> are common.

FreeBSD generally does not use GNU tools.  Here is the source for date,
for example:

https://svnweb.freebsd.org/base/head/bin/date/ 

Andreas.
  
Rical Jasan Nov. 11, 2016, 3:52 a.m. UTC | #11
On 11/09/2016 04:33 PM, Rafal Luzynski wrote:
> Is there any committee in glibc which would decide which way to choose?

I'm not aware of any actual committees other than consensus on the list,
so I'd like to cast my vote here.  Locales are not an area of speciality
for me and I've only just kept an eye on this thread, so hopefully I'm
understanding this correctly, but on a high level, changing the meaning
of a format specifier already in use raises a red flag, compared to the
alternative of introducing a new one, with new meaning or behaviour.
Introducing a new one avoids "breaking" the code other people have
written -- right, wrong, or indifferent -- and makes "fixing" one's code
a voluntary action, if the new behaviour is actually what's desired.
That seems more appropriate to me.

Rical
  
Rafal Luzynski Nov. 15, 2016, 1:21 a.m. UTC | #12
10.11.2016 20:19 Andreas Schwab <schwab@linux-m68k.org> wrote:
>
>
> On Nov 10 2016, Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
>
> > I was investigating these cases long ago and in Linux only,
> > sorry if I'm inaccurate and please tell me if an actual investigation
> > is needed again. I don't know if FreeBSD uses the same GNU coreutils
> > as Linux does but if it does then it's not a surprise if some bugs
> > are common.
>
> FreeBSD generally does not use GNU tools. Here is the source for date,
> for example:
>
> https://svnweb.freebsd.org/base/head/bin/date/
>
> Andreas.

Thank you for this link, Andreas.  I took the opportunity to analyze the
problem more thoroughly.

1. The date utility in FreeBSD actually calls strftime() directly, which
   makes it vulnerable to memory overflow if the format string is maliciously
   long, an issue which was fixed in glibc a while ago.  That means
   that the whole implementation of the date format is inside strftime().
   The default format for a date is "%+" (not supported in glibc2)
   which uses whatever is provided in locale data as date_fmt field.
2. In case of Catalan language, date_fmt and other formats had many
   changes this year but the additional unnecessary "de" before the
   month name has been added only in the last commit, on Aug 13, 2016.
   It did not exist before.  Florian, your FreeBSD says it's release 11.0
   from Sep 29, 2016.  All this looks like you have spotted a bug,
   not present in the older releases and not yet fixed.  I don't know
   why this change has been introduced, the commit comment says
   that the time data from CLDR are not good but does not explain where
   the format containing the additional "de" comes from.
   The Catalan language was added to FreeBSD only in October 2015.
   While at it, it seems strange to me that it is listed as ca_IT
   rather than ca_ES but I guess it does work anyway.
3. In case of cal utility I'm not sure if it's a core part of FreeBSD
   because FreeBSD manuals say that ncal is its native cal implementation.
   I'm not sure where cal comes from.  If it comes from Linux or from
   another non-BSD source then it may have the same problems I mentioned
   before: it displays nl_langinfo(MON_1+x) which may return a month
   name in a genitive form (or whatever is appropriate when printing
   a month name in a full date context but not standalone), one of
   those issues which will be introduced by my proposed change.
   At the same time, ncal uses wcsftime() with "%OB" - correctly!
4. Unfortunately, there seem to be no more Western European languages
   supported in FreeBSD and featuring any difference between %B and %OB
   month names.

Please note that whether we implement nominative (standalone) cases
as %OB/ALTMON_x and genitive (full date) as %B/MON_x or the other
way round it does not change the vulnerability to the incorrect
format strings.  If the format string for Catalan (or any other
similar language) contains additional "de" while the month name
is already in its genitive form then whole format string becomes
incorrect no matter which implementation we choose.
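
To make the "de de" problem concrete, here is a small sketch.  The month
strings and the helper name format_day_month are my own illustration and are
not taken from any real locale file; the point is only that a literal "de "
in the format string combined with a month name which already carries its
own "de" gets doubled.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical Catalan month forms, for illustration only.  */
static const char standalone_nov[] = "novembre";
static const char full_date_nov[] = "de novembre";
static const char full_date_oct[] = "d'octubre";

/* Emulate a locale format string "<day> <literal><month>".
   LITERAL stands for whatever text the format string puts
   before the month name.  */
void
format_day_month (char *buf, size_t size, int day, const char *literal,
                  const char *month)
{
  snprintf (buf, size, "%d %s%s", day, literal, month);
}
```

With literal "de " and standalone_nov the result is "10 de novembre"; with
the same literal and full_date_nov it becomes "10 de de novembre".  An empty
literal with full_date_oct gives "10 d'octubre".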

Regards,

Rafal
  
Rafal Luzynski Nov. 15, 2016, 1:38 a.m. UTC | #13
11.11.2016 04:52 Rical Jasan <ricaljasan@pacific.net> wrote:
>
>
> On 11/09/2016 04:33 PM, Rafal Luzynski wrote:
> > Is there any committee in glibc which would decide which way to choose?
>
> I'm not aware of any actual committees other than consensus on the list,
> so I'd like to cast my vote here. Locales are not an area of speciality
> for me and I've only just kept an eye on this thread, so hopefully I'm
> understanding this correctly, but on a high level, changing the meaning
> of a format specifier already in use raises a red flag, compared to the
> alternative of introducing a new one, with new meaning or behaviour.
> Introducing a new one avoids "breaking" the code other people have
> written -- right, wrong, or indifferent -- and makes "fixing" one's code
> a voluntary action, if the new behaviour is actually what's desired.
> That seems more appropriate to me.
>
> Rical

Thank you, Rical for your feedback.  It's valuable even if I don't
agree.  I'm sorry if I'm repeating the same thing again but as you
said that locales are not an area of speciality for you I'd like to
summarize again what the disadvantages are if we decide to implement
the genitive (full date) format as ALTMON_x and %OB and leave the
nominative (standalone) format unchanged as MON_x and %B:

- glibc and *BSD family will remain incompatible forever;
- also glibc will remain forever incompatible with the possible
  future change in POSIX [1];
- we will have trouble defining what the specification of
  g_date_strftime() and g_date_time_printf() should be, given that GLib
  is a multiplatform library [2];
- no application will be fixed until it switches to the new features:
  ALTMON_x and %OB; this means all applications which display dates
  except those which display month names standalone (e.g., calendars).

Maybe it should be somehow counted how many applications use
nl_langinfo/strftime/wcsftime to display month names standalone
and how many in full date context?

Technically, the two implementations are not much different.  Nothing
inside the pipeline says whether MON_x/%B are genitive or
nominative, except the code which provides the backward compatible
behaviour.  Otherwise it's only a matter of convention: we provide
nominative (or genitive) month names in their appropriate places
in locale data and tell the users that they can retrieve them
using MON_x, ALTMON_x, %B, and %OB.  I'm advocating my solution
not because it has any technical advantages (actually, it is
more difficult to implement) but because I think it's good
to ensure compatibility with existing standards.  Note that the *BSD
family introduced its change at the end of the 1990s.  Maybe it's
worth reading what their reasons were for choosing that way
of implementation.
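
From the application side the convention could be used roughly like this.
It is only a sketch assuming the *BSD-style meaning (MON_x for the full date
form, ALTMON_x for the standalone form); the two helper names are
hypothetical, and ALTMON_1 is used only where the C library headers
provide it.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <langinfo.h>

/* Month name (MONTH: 0 = January) in the form meant for a full date.  */
const char *
full_date_month_name (int month)
{
  return nl_langinfo (MON_1 + month);
}

/* Month name in the form meant for standalone use.  Falls back to
   MON_x when the headers do not provide ALTMON_x.  */
const char *
standalone_month_name (int month)
{
#ifdef ALTMON_1
  return nl_langinfo (ALTMON_1 + month);
#else
  return nl_langinfo (MON_1 + month);
#endif
}
```

For languages which do not need two forms both functions return the same
string, so nothing changes for them.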

Regards,

Rafal

[1] http://austingroupbugs.net/view.php?id=258
[2] https://bugzilla.gnome.org/show_bug.cgi?id=749206
  
Rafal Luzynski Nov. 15, 2016, 11:09 a.m. UTC | #14
15.11.2016 02:38 Rafal Luzynski <digitalfreak@lingonborough.com> wrote:
> [...]
> - no application will be fixed until it switches to the new features:
> ALTMON_x and %OB; this means all applications which display dates
> except those which display months names standalone (e.g., calendars).

Typo, it should be:

"... all applications which display dates will continue displaying
months names incorrectly except those which display them standalone..."

Regards,

Rafal
  
Rical Jasan Nov. 16, 2016, 1:05 p.m. UTC | #15
On 11/14/2016 05:38 PM, Rafal Luzynski wrote:
> Thank you, Rical for your feedback.  It's valuable even if I don't
> agree.  I'm sorry if I'm repeating the same thing again

I should apologize because I was not nearly as up on this bug as I
thought I was.  Thank you for the links; following the gnome/glib bug
which referenced the glibc bug 10871 as "glibc developers are not
interested in solving this on their level" had me paying closer
attention to the history there (7 years...) and putting this thread in
that context.

As I understand it:

 1. the standard doesn't say %B/MON_ should be genitive or nominative
 2. the standard does say UB for %O'x', x not in the list
 3. the current glibc %B/MON_ returns nominative
 4. the BSD/POSIX (proposed) %OB/ALTMON_ returns nominative

So your patch is changing the behaviour of the format specifier %B in a
way allowable by the standard, and it adds a non-standard extension %OB
which the standard explicitly states is undefined, but not disallowed.
Additionally, binaries built against pre-%OB glibc running on post-%OB
glibc will still have the old behaviour.

I'm fine with that; I had a different version of this in my mind (see
below).  Thank you for prompting me to give this more thought.  You are
definitely between a rock and a hard place.  :)

I think I should respond to Florian's objection [5] at this point.

Florian,

Were you opposed to the work of trying to make %B a little smarter?  It
looks like Rafal abandoned that approach voluntarily, and opted to chase
the unification on the horizon when it was pointed out BSD and the POSIX
proposal were actually the same.  [6]

I was originally opposed to the idea of making %B act differently in
different contexts where it didn't before, but in light of the fact the
implementation isn't restricted to a particular format/case, that does
seem a reasonable compromise.  The POSIX proposal is 6-1/2 years old
now, barely younger than this bug, and we have someone who is interested
in fixing it *now*, in a way that keeps us flexible for whatever happens
with the standards, who-knows-when.  We could avoid a new specifier
extension (in lieu of standardization), and %B could be viewed as
partially "fixed", in that it just might get it right more often than it
already does.

The bug is about returning the correct case, in a context we might be
able to determine, not whether we should define our own standard for %B
or an extension %OB.  Could we avoid addressing the whole future
standards issue by limiting the fix to making %B return genitive in
cases where it can be sure it should, where the genitive is even available?

Rical

[1]
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03_05
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=10871
[4] http://austingroupbugs.net/view.php?id=258
[5] https://sourceware.org/ml/libc-alpha/2016-11/msg00322.html
[6] https://sourceware.org/bugzilla/show_bug.cgi?id=10871#c40
  
Rafal Luzynski Nov. 17, 2016, 11:18 a.m. UTC | #16
Thank you for your reply, Rical.  I hope it moves us forward.

Please see more comments below:

16.11.2016 14:05 Rical Jasan <ricaljasan@pacific.net> wrote:
> [...]
> As I understand it:
>
> 1. the standard doesn't say %B/MON_ should be genitive or nominative
> 2. the standard does say UB for %O'x', x not in the list
> 3. the current glibc %B/MON_ returns nominative
> 4. the BSD/POSIX (proposed) %OB/ALTMON_ returns nominative

That's true as long as you remember that "nominative" is actually
a shortcut for "a form appropriate when formatting a month name
standalone, without a day number" and "genitive" is actually
a shortcut for "a form appropriate when formatting a month name
in a full date context, with a day number".  As Florian correctly
pointed out in some languages (German, also English) a correct form
when formatting a full date is nominative.  These languages should
not be forced to use genitive if their rules say not to use
genitive in that context.

> So your patch is changing the behaviour of the format specifier %B in a
> way allowable by the standard, and it adds a non-standard extension %OB
> which the standard explicitly states is undefined, but not disallowed.
> Additionally, binaries built against pre-%OB glibc running on post-%OB
> glibc will still have the old behaviour.

True, backward compatibility for existing binaries is retained.
There are of course concerns about existing sources which are
recompiled without any change but as I noted before:

1. If they are open source projects we can reach out to them and
   help them adapt.
2. The cases where the change will be actually needed are probably
   rare compared with the cases where a new behaviour of %B will
   finally generate a correct form without any work on their side.
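
The idea behind the compatibility code in the patch is, as far as the commit
message describes it, one common function with a feature_OB flag and two
thin entry points.  Below is a simplified sketch of that pattern only: the
Polish month strings and the names month_common, month_new and month_compat
are hypothetical, and the real symbol versioning (versioned_symbol,
compat_symbol) is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical Polish forms of November, for illustration only.  */
static const char nominative[] = "listopad";   /* standalone form */
static const char genitive[] = "listopada";    /* full date form */

/* Common implementation: FEATURE_OB selects the new behaviour.
   Returns NULL for anything this sketch does not support.  */
static const char *
month_common (const char *spec, int feature_OB)
{
  if (strcmp (spec, "%B") == 0)
    return feature_OB ? genitive : nominative;
  if (strcmp (spec, "%OB") == 0 && feature_OB)
    return nominative;
  return NULL;
}

/* Entry point for newly linked programs.  */
const char *
month_new (const char *spec)
{
  return month_common (spec, 1);
}

/* Entry point kept for older binaries: %B stays nominative and
   %OB is not implemented.  */
const char *
month_compat (const char *spec)
{
  return month_common (spec, 0);
}
```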

> I'm fine with that; I had a different version of this in my mind (see
> below). Thank you for prompting me to give this more thought. You are
> definitely between a rock and a hard place. :)
>
> I think I should respond to Florian's objection [5] at this point.
>
> Florian,
>
> Were you opposed to the work of trying to make %B a little smarter? It
> looks like Rafal abandoned that approach voluntarily, and opted to chase
> the unification on the horizon when it was pointed out BSD and the POSIX
> proposal were actually the same. [6]
> [...]

When you refer to my earlier smart (heuristic) implementation of %B do
you mean my attempt to implement an algorithm detecting if %B is in
a context of full date (near a day number) or not?  Apart from realizing
that BSD and the POSIX proposal are actually the same there is another
argument against the smart algorithm.  I think that I saw an implementation
in a date utility which scans the format string and then calls strftime()
for each format specifier separately.  This way a smart implementation
would have no way to tell if %B is just after a day number (well, it could
maintain some internal state) and definitely no way to tell if the next
call after a current %B will also contain a day number.
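
A sketch of what such a utility might do is below.  I don't claim this is
the code of any actual date implementation; format_piecewise is a
hypothetical name and only plain two-character specifiers (%d, %B, %Y, ...)
are handled, no flags or modifiers.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Expand FORMAT by calling strftime() once per conversion specifier.
   Returns the number of bytes written (without the terminating NUL),
   or 0 if the output does not fit in SIZE bytes.  */
size_t
format_piecewise (char *out, size_t size, const char *format,
                  const struct tm *tp)
{
  size_t used = 0;

  if (size == 0)
    return 0;

  while (*format != '\0')
    {
      if (format[0] == '%' && format[1] != '\0')
        {
          char spec[3] = { '%', format[1], '\0' };
          char piece[128];
          /* strftime() sees only this single specifier, so it cannot
             know whether a day number precedes or follows a %B.  */
          size_t n = strftime (piece, sizeof piece, spec, tp);
          if (used + n >= size)
            return 0;
          memcpy (out + used, piece, n);
          used += n;
          format += 2;
        }
      else
        {
          if (used + 1 >= size)
            return 0;
          out[used++] = *format++;
        }
    }
  out[used] = '\0';
  return used;
}
```

Every strftime() call here receives a format like "%B" alone, which is
exactly the situation where a heuristic has nothing to look at.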

Here I preserve your links:

>
> [1]
> http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03_05
> [2] http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=10871
> [4] http://austingroupbugs.net/view.php?id=258
> [5] https://sourceware.org/ml/libc-alpha/2016-11/msg00322.html
> [6] https://sourceware.org/bugzilla/show_bug.cgi?id=10871#c40

Regards,

Rafal
  
Rical Jasan Nov. 18, 2016, 9:22 a.m. UTC | #17
On 11/17/2016 03:18 AM, Rafal Luzynski wrote:
> Thank you for your reply, Rical.  I hope it moves us forward.
> 
> Please see more comments below:
> 
> 16.11.2016 14:05 Rical Jasan <ricaljasan@pacific.net> wrote:
>> [...]
>> As I understand it:
>>
>> 1. the standard doesn't say %B/MON_ should be genitive or nominative
>> 2. the standard does say UB for %O'x', x not in the list
>> 3. the current glibc %B/MON_ returns nominative
>> 4. the BSD/POSIX (proposed) %OB/ALTMON_ returns nominative
> 
> That's true as long as you remember that "nominative" is actually
> a shortcut for "a form appropriate when formatting a month name
> standalone, without a day number" and "genitive" is actually
> a shortcut for "a form appropriate when formatting a month name
> in a full date context, with a day number".

Right.  I was sticking with the much-easier-to-type version.

#define nominative ...

> As Florian correctly
> pointed out in some languages (German, also English) a correct form
> when formatting a full date is nominative.  These languages should
> not be forced to use genitive if their rules say not to use
> genitive in that context.

I believe you had pointed out that the genitive/alternate forms would
need to be present in the locale data, and could be omitted for
languages that didn't need them, yielding the correct results in practice.

I'm not sure the problem can be appropriately addressed without a fully
general solution, which makes how you deal with it something of a
linchpin in getting even the most minimal of fixes accepted.  Trying to solve
this issue is what puts you in standards and extensions land---and it
does not look like travel is permitted there at this time.  Extra care
in designing the code to be easily modified no matter how the general
solution winds up looking (whether %OB is standardized, glibc begins
using their own extensions some day, etc.) is probably energy better spent.

>> So your patch is changing the behaviour of the format specifier %B in a
>> way allowable by the standard, and it adds a non-standard extension %OB
>> which the standard explicitly states is undefined, but not disallowed.
>> Additionally, binaries built against pre-%OB glibc running on post-%OB
>> glibc will still have the old behaviour.
> 
> True, backward compatibility for existing binaries is retained.
> There are of course concerns about existing sources which are
> recompiled without any change but as I noted before:
> 
> 1. If they are open source projects we can reach out to them and
>    help them adapt.
> 2. The cases where the change will be actually needed are probably
>    rare compared with the cases where a new behaviour of %B will
>    finally generate a correct form without any work on their side.
> 
>> I'm fine with that; I had a different version of this in my mind (see
>> below). Thank you for prompting me to give this more thought. You are
>> definitely between a rock and a hard place. :)
>>
>> I think I should respond to Florian's objection [5] at this point.
>>
>> Florian,
>>
>> Were you opposed to the work of trying to make %B a little smarter? It
>> looks like Rafal abandoned that approach voluntarily, and opted to chase
>> the unification on the horizon when it was pointed out BSD and the POSIX
>> proposal were actually the same. [6]
>> [...]
> 
> When you refer to my earlier smart (heuristic) implementation of %B do
> you mean my attempt to implement an algorithm detecting if %B is in
> a context of full date (near a day number) or not?

Yes.

> Apart from realizing
> that BSD and the POSIX proposal are actually the same there is another
> argument against the smart algorithm.  I think that I saw an implementation
> in a date utility which scans the format string and then calls strftime()
> for each format specifier separately.  This way a smart implementation
> would have no way to tell if %B is just after a day number (well, it could
> maintain some internal state) and definitely no way to tell if the next
> call after a current %B will also contain a day number.

Just preserve the status quo in cases you can't be absolutely sure the
alternate form is correct.

I think another way to ask the question I posed to Florian, making it
more general for everybody, and getting more to my point of trying to
find a suitable compromise, is: can a solution that doesn't address the
problem of cases in month names in full generality ever be found
acceptable as a fix for this bug?

By tucking some logic away behind %B that can return alternative month
names in the proper context and otherwise just do the same thing it
always did, we add code that may be construed as temporary, even if it
winds up lasting over a decade waiting for some standard to fix the
general issue, and I don't know how everybody feels about that.  Some
may want to avoid the possibility it ever gets grandfathered in due to
longevity.  "Dirty hack" was thrown around a bit in the bug discussion.
There is also the ever-present reality of imposing a maintenance burden
on others.  A narrowly-scoped, smaller fix might assuage some of those
fears.

> Here I preserve your links:
> 
>>
>> [1]
>> http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03_05
>> [2] http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
>> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=10871
>> [4] http://austingroupbugs.net/view.php?id=258
>> [5] https://sourceware.org/ml/libc-alpha/2016-11/msg00322.html
>> [6] https://sourceware.org/bugzilla/show_bug.cgi?id=10871#c40
> 
> Regards,
> 
> Rafal

I sincerely hope we can find a direction that is acceptable to the
maintainers that fixes the blatant problem while managing to avoid
becoming a standardization issue.  I'd hate to see this bug be solved as
WONTFIXYET after all your work.  :)

Rical
  
Rafal Luzynski Nov. 22, 2016, 11:55 p.m. UTC | #18
Hello,

I'm sorry for this late reply.

18.11.2016 10:22 Rical Jasan <ricaljasan@pacific.net> wrote:
>
> On 11/17/2016 03:18 AM, Rafal Luzynski wrote:
>[...]
> > As Florian correctly
> > pointed out in some languages (German, also English) a correct form
> > when formatting a full date is nominative. These languages should
> > not be forced to use genitive if their rules say not to use
> > genitive in that context.
>
> I believe you had pointed out that the genitive/alternate forms would
> need to be present in the locale data, and could be omitted for
> languages that didn't need them, yielding the correct results in practice.

Right, that's the point.  I'm glad I have explained this in an
understandable way.

> I'm not sure the problem can be appropriately addressed without a fully
> general solution, which makes how you deal with it something of a
> linchpin in getting even the most minimal of fixes accepted. Trying to solve
> this issue is what puts you in standards and extensions land---and it
> does not look like travel is permitted there at this time. Extra care
> in designing the code to be easily modified no matter how the general
> solution winds up looking (whether %OB is standardized, glibc begins
> using their own extensions some day, etc.) is probably energy better spent.

I'm not sure if I understand this paragraph correctly.  Did you mean
that we can't introduce this change because it involves some specification
change?  Unfortunately, I'm afraid it's impossible to do it without
a specification change.  That's why I asked about a committee which is
powerful enough to introduce such a change.

Also, regarding the design of the code, I usually pay particular attention
to this.  If you think it can be improved I'll appreciate your comments.
However, I think that the specification questions are more important
at this moment because there is no reason to polish a code which will
be rejected for the specification reasons.

> [...]
> > Apart from realizing
> > that BSD and the POSIX proposal are actually the same there is another
> > argument against the smart algorithm. I think that I saw an implementation
> > in a date utility which scans the format string and then calls strftime()
> > for each format specifier separately. This way a smart implementation
> > would have no way to tell if %B is just after a day number (well, it could
> > maintain some internal state) and definitely no way to tell if the next
> > call after a current %B will also contain a day number.
>
> Just preserve the status quo in cases you can't be absolutely sure the
> alternate form is correct.
>
> I think another way to ask the question I posed to Florian, making it
> more general for everybody, and getting more to my point of trying to
> find a suitable compromise, is: can a solution that doesn't address the
> problem of cases in month names in full generality ever be found
> acceptable as a fix for this bug?
>
> By tucking some logic away behind %B that can return alternative month
> names in the proper context and otherwise just do the same thing it
> always did, we add code that may be construed as temporary, even if it
> winds up lasting over a decade waiting for some standard to fix the
> general issue, and I don't know how everybody feels about that. Some
> may want to avoid the possibility it ever gets grandfathered in due to
> longevity. "Dirty hack" was thrown around a bit in the bug discussion.
> There is also the ever-present reality of imposing a maintenance burden
> on others. A narrowly-scoped, smaller fix might assuage some of those
> fears.

If you prefer my older heuristic solution the patches are somewhere
around so they can always be retrieved, reworked, and applied.
However, I can see a problem here.  This heuristic algorithm would
work correctly in most cases and incorrectly in a few cases.  Yes, one
can say the same about a deterministic algorithm implementing %OB
explicitly.  But in case of a deterministic algorithm the problem
can be fixed by changing the application source code.  In case of
the heuristic algorithm there is no switch to tell "yes, I really
want this nominative (genitive) case here."

Particularly, it will not work correctly in the date command line
utility which I perceive as the most versatile testing tool for
date formats.

I have discussed probably all possible solutions here. [1]

It's probably not a correct solution but thinking hard I've found
another one:

1. Switch nl_langinfo(MON_x) and strftime("%B"...) to generate
   the genitive form (where two forms are needed) always.  Almost
   no coding required, just put the genitive forms into the locale
   data and say this is what should have always been there.
2. Don't implement the "%OB" format specifier.  (See below why).
3. However, do implement nl_langinfo(ALTMON_x) and let it return
   the nominative month names where needed.  This is simpler than
   strftime().
4. nl_langinfo(ALTMON_x) would be the only way to retrieve the
   month name in a nominative case.  It will be impossible to
   do it with strftime().  Sorry, standalone is standalone, not
   format.

Please take it as brainstorming and/or a kinda proof that every
other solution is even less acceptable. :-)

> [...]
> I sincerely hope we can find a direction that is acceptable to the
> maintainers that fixes the blatant problem while managing to avoid
> becoming a standardization issue. I'd hate to see this bug be solved as
> WONTFIXYET after all your work. :)
>
> Rical

Thank you for your support so far.  Ulrich Drepper said "You'll have
to provide a complete patch." [2] As far as I know Ulrich is no
longer around here but here is a complete patch, even if it needs
some rework to fit glibc standards.  I'd appreciate it if you
guys gave more comments or even put some of your work into
finalizing it.

Best regards,

Rafal

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=10871#c7
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=10871#c2
  

Patch

diff --git a/include/time.h b/include/time.h
index 684ceb8..80ac40f 100644
--- a/include/time.h
+++ b/include/time.h
@@ -10,6 +10,15 @@  extern __typeof (strftime_l) __strftime_l;
 libc_hidden_proto (__strftime_l)
 extern __typeof (strptime_l) __strptime_l;
 
+/* Backward compatibility function: feature_OB argument specifies
+   whether or not %OB format specifier should be implemented.  */
+extern size_t __strftime_l_common (char *__restrict __s, size_t __maxsize,
+				   const char *__restrict __format,
+				   const struct tm *__restrict __tp,
+				   const int feature_OB,
+				   __locale_t __loc) __THROW;
+libc_hidden_proto (__strftime_l_common)
+
 libc_hidden_proto (time)
 libc_hidden_proto (asctime)
 libc_hidden_proto (mktime)
diff --git a/include/wchar.h b/include/wchar.h
index 6272130..495ad91 100644
--- a/include/wchar.h
+++ b/include/wchar.h
@@ -25,6 +25,15 @@  libc_hidden_proto (__wcstof_l)
 libc_hidden_proto (__wcstold_l)
 libc_hidden_proto (__wcsftime_l)
 
+/* Backward compatibility function: feature_OB argument specifies
+   whether or not %OB format specifier should be implemented.  */
+extern size_t __wcsftime_l_common (wchar_t *__restrict __s, size_t __maxsize,
+				   const wchar_t *__restrict __format,
+				   const struct tm *__restrict __tp,
+				   const int feature_OB,
+				   __locale_t __loc) __THROW;
+libc_hidden_proto (__wcsftime_l_common)
+
 
 extern double __wcstod_internal (const wchar_t *__restrict __nptr,
 				 wchar_t **__restrict __endptr, int __group)
diff --git a/time/Versions b/time/Versions
index fd83818..e11fe65 100644
--- a/time/Versions
+++ b/time/Versions
@@ -65,4 +65,8 @@  libc {
   GLIBC_2.16 {
     timespec_get;
   }
+  GLIBC_2.25 {
+    __strftime_l; __wcsftime_l;
+    strftime; strftime_l; wcsftime; wcsftime_l;
+  }
 }
diff --git a/time/strftime.c b/time/strftime.c
index 92150d9..45131d2 100644
--- a/time/strftime.c
+++ b/time/strftime.c
@@ -17,11 +17,25 @@ 
 
 #include <time.h>
 #include <locale/localeinfo.h>
+#include <shlib-compat.h>
 
 
 size_t
-strftime (char *s, size_t maxsize, const char *format, const struct tm *tp)
+__strftime (char *s, size_t maxsize, const char *format, const struct tm *tp)
 {
-  return __strftime_l (s, maxsize, format, tp, _NL_CURRENT_LOCALE);
+  return __strftime_l_common (s, maxsize, format, tp, 1, _NL_CURRENT_LOCALE);
 }
-libc_hidden_def (strftime)
+versioned_symbol (libc, __strftime, strftime, GLIBC_2_25);
+libc_hidden_ver (__strftime, strftime)
+
+
+#if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_25)
+size_t
+attribute_compat_text_section
+__strftime_compat (char *s, size_t maxsize, const char *format,
+		   const struct tm *tp)
+{
+  return __strftime_l_common (s, maxsize, format, tp, 0, _NL_CURRENT_LOCALE);
+}
+compat_symbol (libc, __strftime_compat, strftime, GLIBC_2_0);
+#endif
diff --git a/time/strftime_l.c b/time/strftime_l.c
index 4d54e23..4c75584 100644
--- a/time/strftime_l.c
+++ b/time/strftime_l.c
@@ -56,6 +56,8 @@ 
 extern char *tzname[];
 #endif
 
+#include <shlib-compat.h>
+
 /* Do multibyte processing if multibytes are supported, unless
    multibyte sequences are safe in formats.  Multibyte sequences are
    safe if they cannot contain byte sequences that look like format
@@ -279,15 +281,19 @@  static const CHAR_T zeroes[16] = /* "0000000000000000" */
    function gets as an additional argument the locale which has to be
    used.  To access the values we have to redefine the _NL_CURRENT
    macro.  */
-# define strftime		__strftime_l
-# define wcsftime		__wcsftime_l
+# define strftime		__strftime_l_common
+# define wcsftime		__wcsftime_l_common
 # undef _NL_CURRENT
 # define _NL_CURRENT(category, item) \
   (current->values[_NL_ITEM_INDEX (item)].string)
+# define FEATURE_OB_PARAM , int feature_OB
+# define FEATURE_OB_ARG , feature_OB
 # define LOCALE_PARAM , __locale_t loc
 # define LOCALE_ARG , loc
 # define HELPER_LOCALE_ARG  , current
 #else
+# define FEATURE_OB_PARAM
+# define FEATURE_OB_ARG
 # define LOCALE_PARAM
 # define LOCALE_ARG
 # ifdef _LIBC
@@ -435,6 +441,7 @@  static CHAR_T const month_name[][10] =
 static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *,
 				   const struct tm *, bool *
 				   ut_argument_spec
+				   FEATURE_OB_PARAM
 				   LOCALE_PARAM) __THROW;
 
 /* Write information from TP into S according to the format
@@ -446,7 +453,7 @@  static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *,
 
 size_t
 my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T *format,
-	     const struct tm *tp ut_argument_spec LOCALE_PARAM)
+	     const struct tm *tp ut_argument_spec FEATURE_OB_PARAM LOCALE_PARAM)
 {
 #if !defined _LIBC && HAVE_TZNAME && HAVE_TZSET
   /* Solaris 2.5 tzset sometimes modifies the storage returned by localtime.
@@ -457,7 +464,7 @@  my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 #endif
   bool tzset_called = false;
   return __strftime_internal (s, maxsize, format, tp, &tzset_called
-			      ut_argument LOCALE_ARG);
+			      ut_argument FEATURE_OB_ARG LOCALE_ARG);
 }
 #ifdef _LIBC
 libc_hidden_def (my_strftime)
@@ -466,10 +473,12 @@  libc_hidden_def (my_strftime)
 static size_t
 __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 		     const struct tm *tp, bool *tzset_called
-		     ut_argument_spec LOCALE_PARAM)
+		     ut_argument_spec FEATURE_OB_PARAM LOCALE_PARAM)
 {
 #if defined _LIBC && defined USE_IN_EXTENDED_LOCALE_MODEL
   struct __locale_data *const current = loc->__locales[LC_TIME];
+#else
+# define feature_OB 1
 #endif
 
   int hour12 = tp->tm_hour;
@@ -781,6 +790,8 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	case L_('B'):
 	  if (modifier == L_('E'))
 	    goto bad_format;
+	  if (!feature_OB && modifier == L_('O'))
+	    goto bad_format;
 	  if (change_case)
 	    {
 	      to_uppcase = 1;
@@ -788,7 +799,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	    }
 #if defined _NL_CURRENT || !HAVE_STRFTIME
 	  /* Use f_altmonth only if f_altmonth is provided.  */
-	  if (f_altmonth[0] && modifier == L_('O'))
+	  if (f_altmonth[0] && (!feature_OB || modifier == L_('O')))
 	    cpy (STRLEN (f_altmonth), f_altmonth);
 	  else
 	    cpy (STRLEN (f_month), f_month);
@@ -820,10 +831,10 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	    CHAR_T *old_start = p;
 	    size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
 					      tp, tzset_called ut_argument
-					      LOCALE_ARG);
+					      FEATURE_OB_ARG LOCALE_ARG);
 	    add (len, __strftime_internal (p, maxsize - i, subfmt,
 					   tp, tzset_called ut_argument
-					   LOCALE_ARG));
+					   FEATURE_OB_ARG LOCALE_ARG));
 
 	    if (to_uppcase)
 	      while (old_start < p)
@@ -1424,10 +1435,34 @@  size_t
 emacs_strftime (char *s, size_t maxsize, const char *format,
 		const struct tm *tp)
 {
-  return my_strftime (s, maxsize, format, tp, 0);
+  return my_strftime (s, maxsize, format, tp, 1, 0);
 }
 #endif
 
 #if defined _LIBC && !defined COMPILE_WIDE
-weak_alias (__strftime_l, strftime_l)
+size_t
+__strftime_l_internal (char *s, size_t maxsize, const char *format,
+		       const struct tm *tp, __locale_t loc)
+{
+  return my_strftime (s, maxsize, format, tp, 1, loc);
+}
+strong_alias (__strftime_l_internal, __strftime_l_internal2)
+versioned_symbol (libc, __strftime_l_internal2, __strftime_l, GLIBC_2_25);
+libc_hidden_ver (__strftime_l_internal2, __strftime_l)
+versioned_symbol (libc, __strftime_l_internal, strftime_l, GLIBC_2_25);
+libc_hidden_ver (__strftime_l_internal, strftime_l)
+
+# if SHLIB_COMPAT (libc, GLIBC_2_3, GLIBC_2_25)
+size_t
+attribute_compat_text_section
+__strftime_l_compat (char *s, size_t maxsize, const char *format,
+		     const struct tm *tp, __locale_t loc)
+{
+  return my_strftime (s, maxsize, format, tp, 0, loc);
+}
+strong_alias (__strftime_l_compat, __strftime_l_compat2)
+compat_symbol (libc, __strftime_l_compat2, __strftime_l, GLIBC_2_3);
+compat_symbol (libc, __strftime_l_compat, strftime_l, GLIBC_2_3);
+# endif
+
 #endif
diff --git a/time/wcsftime.c b/time/wcsftime.c
index a8f06f1..ee83624 100644
--- a/time/wcsftime.c
+++ b/time/wcsftime.c
@@ -17,12 +17,26 @@ 
 
 #include <wchar.h>
 #include <locale/localeinfo.h>
+#include <shlib-compat.h>
 
 
 size_t
-wcsftime (wchar_t *s, size_t maxsize, const wchar_t *format,
-	  const struct tm *tp)
+__wcsftime (wchar_t *s, size_t maxsize, const wchar_t *format,
+	    const struct tm *tp)
 {
-  return __wcsftime_l (s, maxsize, format, tp, _NL_CURRENT_LOCALE);
+  return __wcsftime_l_common (s, maxsize, format, tp, 1, _NL_CURRENT_LOCALE);
 }
-libc_hidden_def (wcsftime)
+versioned_symbol (libc, __wcsftime, wcsftime, GLIBC_2_25);
+libc_hidden_ver (__wcsftime, wcsftime)
+
+
+#if SHLIB_COMPAT (libc, GLIBC_2_2, GLIBC_2_25)
+size_t
+attribute_compat_text_section
+__wcsftime_compat (wchar_t *s, size_t maxsize, const wchar_t *format,
+		   const struct tm *tp)
+{
+  return __wcsftime_l_common (s, maxsize, format, tp, 0, _NL_CURRENT_LOCALE);
+}
+compat_symbol (libc, __wcsftime_compat, wcsftime, GLIBC_2_2);
+#endif
diff --git a/time/wcsftime_l.c b/time/wcsftime_l.c
index f771417..ccf0c7f 100644
--- a/time/wcsftime_l.c
+++ b/time/wcsftime_l.c
@@ -22,4 +22,27 @@ 
 #define COMPILE_WIDE	1
 #include "strftime_l.c"
 
-weak_alias (__wcsftime_l, wcsftime_l)
+size_t
+__wcsftime_l_internal (wchar_t *s, size_t maxsize, const wchar_t *format,
+		       const struct tm *tp, __locale_t loc)
+{
+  return my_strftime (s, maxsize, format, tp, 1, loc);
+}
+strong_alias (__wcsftime_l_internal, __wcsftime_l_internal2)
+versioned_symbol (libc, __wcsftime_l_internal2, __wcsftime_l, GLIBC_2_25);
+libc_hidden_ver (__wcsftime_l_internal2, __wcsftime_l)
+versioned_symbol (libc, __wcsftime_l_internal, wcsftime_l, GLIBC_2_25);
+libc_hidden_ver (__wcsftime_l_internal, wcsftime_l)
+
+#if SHLIB_COMPAT (libc, GLIBC_2_3, GLIBC_2_25)
+size_t
+attribute_compat_text_section
+__wcsftime_l_compat (wchar_t *s, size_t maxsize, const wchar_t *format,
+		     const struct tm *tp, __locale_t loc)
+{
+  return my_strftime (s, maxsize, format, tp, 0, loc);
+}
+strong_alias (__wcsftime_l_compat, __wcsftime_l_compat2)
+compat_symbol (libc, __wcsftime_l_compat2, __wcsftime_l, GLIBC_2_3);
+compat_symbol (libc, __wcsftime_l_compat, wcsftime_l, GLIBC_2_3);
+#endif
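
Not part of the patch: a minimal standalone sketch of the dispatch the
hunks above set up, where one common routine takes a feature_OB flag and
the new and compat entry points differ only in the value they pass.  The
names month_common, month_new and month_compat are hypothetical, and the
two hard-coded Polish strings stand in for the locale data behind
f_month and f_altmonth.  The sketch assumes, as the commit message
implies, that f_altmonth holds the nominative form.  It does not attempt
the versioned_symbol/compat_symbol binding, and it returns 0 where the
real code jumps to bad_format, whose own handling is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the locale strings the patch reads through
   f_month and f_altmonth; the Polish forms are illustrative only.  */
static const char f_month[] = "maja";    /* assumed genitive form */
static const char f_altmonth[] = "maj";  /* assumed nominative form */

/* Simplified model of the 'B' case in __strftime_internal.  MODIFIER is
   0, 'E' or 'O'.  Returns the number of bytes written (excluding the
   terminating NUL), or 0 where the patch jumps to bad_format or the
   buffer is too small.  */
size_t
month_common (char *s, size_t maxsize, int modifier, int feature_OB)
{
  const char *name;
  size_t len;

  assert (s != NULL);
  if (modifier == 'E')
    return 0;
  /* Old binaries never knew %OB, so the compat path rejects it.  */
  if (!feature_OB && modifier == 'O')
    return 0;
  /* Compat callers get the alternative name for plain %B.  */
  if (f_altmonth[0] && (!feature_OB || modifier == 'O'))
    name = f_altmonth;
  else
    name = f_month;

  len = strlen (name);
  if (len >= maxsize)
    return 0;
  memcpy (s, name, len + 1);
  return len;
}

/* Entry point new links would bind to (feature_OB = 1).  */
size_t
month_new (char *s, size_t maxsize, int modifier)
{
  return month_common (s, maxsize, modifier, 1);
}

/* Entry point kept for old binaries (feature_OB = 0).  */
size_t
month_compat (char *s, size_t maxsize, int modifier)
{
  return month_common (s, maxsize, modifier, 0);
}
```

With these stand-ins, month_new with no modifier yields the f_month
string and with 'O' yields f_altmonth, while month_compat yields
f_altmonth for the plain directive and rejects 'O'.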