[BZ,#15527] strftime_l.c: Support lowercase output

Message ID 576117B9.5080105@redhat.com
State New, archived

Commit Message

Jakub Martisko June 15, 2016, 8:54 a.m. UTC
  strftime_l.c does not provide an easy way to produce lowercase output. While 
the function to create lowercase is implemented, there is no flag which 
would cause it to be called. The provided patch checks whether both the 
to_uppcase and change_case flags are set and, if so, sets to_lowcase, 
which leads to lowercase output.
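
The rule described above can be sketched in isolation. This is a minimal
model, not the code from strftime_l.c: the struct and the function name
scan_case_flags are hypothetical, and only the three variables named in
the commit message are modeled.

```c
/* Hypothetical stand-in for the per-conversion state; only the three
   case-related variables named in the commit message are modeled.  */
struct case_flags
{
  int to_uppcase;
  int change_case;
  int to_lowcase;
};

/* Scan the flag characters that follow a '%' and apply the rule from
   the patch: '^' and '#' together select lowercase.  F points just
   past the '%'.  Returns a pointer to the first non-flag character.  */
static const char *
scan_case_flags (const char *f, struct case_flags *cf)
{
  cf->to_uppcase = 0;
  cf->change_case = 0;
  cf->to_lowcase = 0;

  for (;; ++f)
    {
      if (*f == '^')
        cf->to_uppcase = 1;
      else if (*f == '#')
        cf->change_case = 1;
      else
        break;
    }

  /* Same condition and assignments as the hunk added by the patch.  */
  if (cf->to_uppcase == 1 && cf->change_case == 1)
    {
      cf->to_uppcase = 0;
      cf->change_case = 0;
      cf->to_lowcase = 1;
    }
  return f;
}
```

In this model, scanning "^#A" and "#^A" both end with only to_lowcase
set, because the check runs after all flags have been collected.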
  

Comments

Andreas Schwab June 15, 2016, 9:08 a.m. UTC | #1
Jakub Martisko <jamartis@redhat.com> writes:

>  @item ^
> -The output uses uppercase characters, but only if this is possible
> +The output uses uppercase characters, but only if this is possible.
> +
> +@item #
> +The output uses opposite case characters, but only if this is possible.
> +Can be combined with @samp{^} to produce lowercase characters.
>  (@pxref{Case Conversion}).
>  @end table

Bash is using ${x^} and ${x,} for case replacing expansions.

Andreas.
  
Jakub Martisko June 29, 2016, 7:55 a.m. UTC | #2
Hi Andreas,

thanks for your comment. The reason why I sent the patch is that there 
is a bug/feature request for similar functionality in coreutils' "date" 
program and the maintainers of coreutils/gnulib do not want to diverge 
from the glibc interface. Even though the replacement you mentioned does 
indeed work, a built-in version would be better imo (for example when 
using a shell other than bash), especially since all of the needed 
functionality is already implemented.

Jakub

  
Jakub Martisko Nov. 29, 2016, 12:21 p.m. UTC | #3
Hi, are there any updates regarding this functionality?

  
Jakub Martisko Nov. 29, 2016, 12:41 p.m. UTC | #4
Sorry, this was supposed to be a reply to:
https://sourceware.org/ml/libc-alpha/2016-06/msg00575.html

  
Rafal Luzynski Dec. 1, 2016, 11:29 p.m. UTC | #5
Hi,

(Top-posting to conform to the style you already started. ;)

Your patch has drawn my attention. I must admit that I have not
analyzed it very thoroughly but OTOH I'm not the right person to
say the patch should be committed.

Only one question: what would be the order of applying
the flags? Should "%^#A" mean "convert to uppercase and then
swap" or "swap the case and then convert to uppercase" or
should it be an idiom to "convert to lowercase" no matter
what is the actual order? Should "%^#A" do the same as "%#^A"?

At first sight it may seem that the usefulness of this feature
is limited: who would need a "convert to lowercase" switch
if all letters are already lowercase and those which are
uppercase (first letters of month names and weekday names)
should always be uppercase?

But that's true only for English, German and maybe a few other
languages. It is not true for lots of others, including mine and,
I guess, yours. In many languages there is no rule
saying that month names and weekday names should always begin
with uppercase, but other rules may apply: a name should begin with
uppercase if it's the beginning of a sentence or the beginning of
a title. I really don't like month names (standalone) or weekday
names (a full date starting with a weekday name) starting with
lowercase just because in English they are always uppercase
and developers don't have to worry about it, so they eventually
leave all other languages in all lowercase. I really wish there
was a strftime() flag converting words to titlecase. We have
"convert to uppercase" and "swap the case" but no "convert to
lowercase" nor "convert the first letter to uppercase". OTOH,
the implementation of this feature should not be left to the app
developers because not all languages need it, and converting
to uppercase/titlecase is not a trivial task (in the case of
UTF-8, how many bytes are occupied by the first letter? how
many bytes will its uppercase version occupy? does the letter
have only lowercase and uppercase forms, or is it a ligature
with a separate titlecase, like lj → Lj → LJ?), so it would be
better implemented by a core library.

Unfortunately, I don't have a good candidate for a "convert
to lowercase" or a "convert to titlecase" switch. But your
patch solves the problem if we also provide all month names
and all weekday names in all locale data for all languages
in titlecase, even if a language does not require it by default.
Then we would have:

"%^A" - convert to uppercase => "SUNDAY";
"%#A" - swap the case => "sUNDAY" (yes, not useful);
"%^#A" - convert to lowercase => "sunday" (in the middle of
         a sentence, as required in many languages but not
         in English);
"%A" - leave unchanged, titlecase => "Sunday" (default, always
       in English and some other languages, required in the
       beginning of a sentence in many other languages).

We would have a way to convert all words to any (reasonable)
case and the decision would always be left to the translators
without any change in any application code. What do you guys think?

Regards,

Rafal
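
The four cases listed above can be reproduced with a small byte-wise
sketch. It takes the table literally, so "swap" flips each letter. The
names case_mode and apply_case are hypothetical, and the code assumes
ASCII input in the "C" locale; it does not address the multibyte and
titlecase questions raised earlier in this message.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names for the four rows of the table above.  */
enum case_mode { CASE_KEEP, CASE_UPPER, CASE_SWAP, CASE_LOWER };

/* Copy IN to OUT, which has room for N bytes including the
   terminating NUL, applying MODE to each byte.  Input that does not
   fit is truncated.  Assumption: single-byte ASCII text only.  */
static void
apply_case (const char *in, char *out, size_t n, enum case_mode mode)
{
  size_t i = 0;

  if (n == 0)
    return;

  for (; in[i] != '\0' && i + 1 < n; ++i)
    {
      int c = (unsigned char) in[i];

      switch (mode)
        {
        case CASE_UPPER:
          c = toupper (c);
          break;
        case CASE_LOWER:
          c = tolower (c);
          break;
        case CASE_SWAP:
          c = isupper (c) ? tolower (c) : toupper (c);
          break;
        case CASE_KEEP:
          break;
        }
      out[i] = (char) c;
    }
  out[i] = '\0';
}
```

Starting from the titlecase "Sunday", the four modes give "Sunday",
"SUNDAY", "sUNDAY" and "sunday", which is why storing titlecase names in
the locale data would make every reasonable case reachable.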


  
Jakub Martisko Dec. 5, 2016, 2:55 p.m. UTC | #6
Hello Rafal,

as for the order of ^# flags - right now the change case
flag works as an upper case flag for options which are in
title case by default (Sun -> SUN) and as lowercase for
those, which are in uppercase by default (AM -> am). In my
opinion, treating "%^#A" and "%#^A" as an idiom for lower
case makes the most sense. If you consider "%#^A", the
output would switch case (whatever that means) and then be
switched to uppercase. The "#" flag would thus be ignored.

As for the title case part of your message, I am probably
not the right person to answer it:-(.

Regards,
Jakub
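
The behavior described in this message can be modeled as follows. This
is a simplified sketch under stated assumptions, not the glibc code: the
enum and the function effective_case are hypothetical, 'p' stands for
the conversions that are uppercase by default (AM/PM), and every other
conversion is treated as titlecase by default.

```c
/* Hypothetical result type for the sketch.  */
enum case_result { RESULT_KEEP, RESULT_UPPER, RESULT_LOWER };

/* Return the case conversion that applies to conversion character
   CONV when the '^' flag (CARET) and the '#' flag (HASH) are present
   or absent.  The order of the flags is not an input, so "%^#A" and
   "%#^A" cannot differ.  */
static enum case_result
effective_case (char conv, int caret, int hash)
{
  int to_uppcase = caret;
  int change_case = hash;
  int to_lowcase = 0;

  /* Rule added by the patch: both flags together mean lowercase.  */
  if (to_uppcase && change_case)
    {
      to_uppcase = 0;
      change_case = 0;
      to_lowcase = 1;
    }

  /* '#' alone, as described above: Sun -> SUN, AM -> am.  */
  if (change_case)
    {
      if (conv == 'p')
        {
          to_uppcase = 0;
          to_lowcase = 1;
        }
      else
        to_uppcase = 1;
    }

  if (to_lowcase)
    return RESULT_LOWER;
  if (to_uppcase)
    return RESULT_UPPER;
  return RESULT_KEEP;
}
```

Because the combined check clears change_case before the per-conversion
step runs, the lowercase result cannot be overridden later by the
handling of '#' alone.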

  
Rafal Luzynski Dec. 6, 2016, 10:47 p.m. UTC | #7
5.12.2016 15:55 Jakub Martisko <jamartis@redhat.com> wrote:
>
>
> Hello Rafal,
>
> as for the order of ^# flags - right now the change case
> flag works as an upper case flag for options which are in
> title case by default (Sun -> SUN) and as lowercase for
> those, which are in uppercase by default (AM -> am). In my
> opinion, treating "%^#A" and "%#^A" as an idiom for lower
> case makes the most sense. If you consider "%#^A", the
> output would switch case (whatever that means) and then be
> switched to uppercase. The "#" flag would thus be ignored.

Yes, that's true. Switching the case and then converting to
uppercase wouldn't make sense. And this is how your patch
works, as far as I remember: it switches to lowercase no matter
what is the order of "#" and "^".

I think that this either should be documented or (maybe better)
should be treated as undefined behavior: not documented,
maybe producing bad results, maybe even producing correct
results, and maybe changed in the future. It may also be
explicitly documented as undefined behavior.

> As for the title case part of your message, I am probably
> not the right person to answer it:-(.

I don't think so. :-) I thought you were not a native English
speaker; sorry if I'm wrong, but if I'm not then you can
provide some valuable input here. How does it look in your
native language? Are all month and weekday names always
written in lowercase? Wouldn't you like them to start with
an uppercase letter sometimes? For example, if you display
a calendar would you prefer "December" or "december"
in the header? Or if you display a date with a weekday
would you prefer "Wednesday, 7th of december" or
"wednesday, 7th of december"? What is your solution to
achieve a proper result?

Regards,

Rafal
  
Mike Frysinger Dec. 6, 2016, 11:20 p.m. UTC | #8
On 29 Jun 2016 09:55, Jakub Martisko wrote:
> thanks for your comment. The reason why I sent the patch is that there 
> is a bug/feature request for similar functionality in coreutils' "date" 
> program and the maintainers of coreutils/gnulib do not want to diverge 
> from the glibc interface. Even though the replacing you mentioned does 
> indeed work, built-in version would be imo better (for example when 
> using other shell than bash), especially when all of the needed 
> functionality was already implemented.

i think his point is that bash has already defined a syntax, but you
are doing it differently and there's (afaict) no need for it.  he isn't
saying you should use bash if you want lower/upper case.

so instead of adding new syntax like "%#^x", add "%,x"
-mike
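
For comparison, the alternative suggested here could look roughly like
the sketch below. It is purely hypothetical: ',' is not a strftime flag
in glibc, the function name scan_flags_with_comma is made up, and the
"last flag wins" choice for conflicting flags is an assumption of this
sketch rather than something proposed in the thread.

```c
/* Scan the case flags that follow a '%', treating ',' as a direct
   "lowercase" flag in the spirit of bash's ${x,}.  F points just past
   the '%'.  Returns a pointer to the first non-flag character.
   Hypothetical: ',' is not an existing strftime flag.  */
static const char *
scan_flags_with_comma (const char *f, int *to_uppcase, int *to_lowcase)
{
  *to_uppcase = 0;
  *to_lowcase = 0;

  for (;; ++f)
    {
      if (*f == '^')
        {
          /* Assumption: a later flag overrides an earlier one.  */
          *to_uppcase = 1;
          *to_lowcase = 0;
        }
      else if (*f == ',')
        {
          *to_lowcase = 1;
          *to_uppcase = 0;
        }
      else
        break;
    }
  return f;
}
```

With a dedicated flag, no combination rule is needed and the meaning of
'#' stays untouched; the cost is one more flag character to document.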
  
Mike Frysinger Dec. 6, 2016, 11:21 p.m. UTC | #9
On 15 Jun 2016 10:54, Jakub Martisko wrote:
> +  /*case tests*/

seems like you should drop this comment

> +    }
>    return result + do_bz18985 ();

should be a blank line above this return statement
-mike
  

Patch

2016-05-12  Jakub Martisko  <jamartis@redhat.com>

	* [BZ #15527]
	* time/strftime_l.c (__strftime_internal): Implement conversion to
	all lowercase.
	* manual/time.texi: Document # flag.
	* time/tst-strftime.c (do_test): Test case conversion.



---
 manual/time.texi    |  6 +++++-
 time/strftime_l.c   |  7 ++++++-
 time/tst-strftime.c | 31 ++++++++++++++++++++++++++++++-
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/manual/time.texi b/manual/time.texi
index f94cbe4..a65959f 100644
--- a/manual/time.texi
+++ b/manual/time.texi
@@ -1350,7 +1350,11 @@  The number is padded with zeros even if the format specifies padding
 with spaces.
 
 @item ^
-The output uses uppercase characters, but only if this is possible
+The output uses uppercase characters, but only if this is possible.
+
+@item #
+The output uses opposite case characters, but only if this is possible.
+Can be combined with @samp{^} to produce lowercase characters.
 (@pxref{Case Conversion}).
 @end table
 
diff --git a/time/strftime_l.c b/time/strftime_l.c
index 1205035..c577f28 100644
--- a/time/strftime_l.c
+++ b/time/strftime_l.c
@@ -677,7 +677,12 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	    }
 	  break;
 	}
-
+	if (to_uppcase == 1 && change_case == 1)
+	{
+	  to_uppcase = 0;
+	  change_case = 0;
+	  to_lowcase = 1;
+	}
       /* As a GNU extension we allow to specify the field width.  */
       if (ISDIGIT (*f))
 	{
diff --git a/time/tst-strftime.c b/time/tst-strftime.c
index af3ff72..e369595 100644
--- a/time/tst-strftime.c
+++ b/time/tst-strftime.c
@@ -153,7 +153,36 @@  do_test (void)
 	  result = 1;
 	}
     }
-
+  /*case tests*/
+	const struct
+	  {
+	    const char *fmt;
+	    const char *exp;
+	    size_t n;
+	  } ctests[] =
+	    {
+	      { "%^A", "SUNDAY", 6 },
+	      { "%^#A", "sunday", 6 },
+	      { "%A", "Sunday", 6 },
+	    };
+#define nctests (sizeof (ctests) / sizeof (ctests[0]))
+	  for (cnt = 0; cnt < nctests; ++cnt)
+    {
+      char buf[100];
+      size_t r = strftime (buf, sizeof (buf), ctests[cnt].fmt, &ttm);
+      if (r != ctests[cnt].n)
+	{
+	  printf ("strftime(\"%s\") returned %zu not %zu\n",
+		  ctests[cnt].fmt, r, ctests[cnt].n);
+	  result = 1;
+	}
+      if (strcmp (buf, ctests[cnt].exp) != 0)
+	{
+	  printf ("strftime(\"%s\") produced \"%s\" not \"%s\"\n",
+		  ctests[cnt].fmt, buf, ctests[cnt].exp);
+	  result = 1;
+	}
+    }
   return result + do_bz18985 ();
 }
 
-- 
2.5.0