Fix %Z parsing in strptime [BZ #16088]

Message ID 20230622150021.3828261-1-lancasterharp@gmail.com
State Superseded
Headers
Series Fix %Z parsing in strptime [BZ #16088] |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Testing passed

Commit Message

Stanley Lancaster June 22, 2023, 3 p.m. UTC
  %Z parsing terminated on space, should've terminated on next character in format string

Author: Stanley Lancast <lancasterharp@gmail.com>
---
 time/strptime_l.c   | 5 +++--
 time/tst-strptime.c | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)
  

Comments

Andreas Schwab June 22, 2023, 3:21 p.m. UTC | #1
On Jun 22 2023, Stanley Lancaster via Libc-alpha wrote:

> %Z parsing terminated on space, should've terminated on next character in format string
>
> Author: Stanley Lancast <lancasterharp@gmail.com>
> ---
>  time/strptime_l.c   | 5 +++--
>  time/tst-strptime.c | 1 +
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/time/strptime_l.c b/time/strptime_l.c
> index 85c3249fcc..5f96795406 100644
> --- a/time/strptime_l.c
> +++ b/time/strptime_l.c
> @@ -772,8 +772,9 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
>  	  /* Read timezone but perform no conversion.  */
>  	  while (ISSPACE (*rp))
>  	    rp++;
> -	  while (!ISSPACE (*rp) && *rp != '\0')
> -	    rp++;
> +
> +	  while (*rp != *(fmt) && *rp != '\0')
> +	  	rp++; 

The next character could be '%'.
  
Stanley Lancaster June 22, 2023, 5:05 p.m. UTC | #2
---------- Forwarded message ---------
From: Stanley Lancaster <lancasterharp@gmail.com>
Date: Thu, Jun 22, 2023 at 10:51 AM
Subject: Re: [PATCH] Fix %Z parsing in strptime [BZ #16088]
To: Andreas Schwab <schwab@suse.de>


Yes. I thought about this possibility. %Z does not provide enough
information about what a "time zone name" looks like. The parser cannot
know when a timezone name ends without the user telling them. If we have
something like "%Z%B" as the format string, and "CSTJUN" as the input, the
parser cannot have any idea if CSTJUN is the timezone name, or if CSTJ is
the timezone name, or if CSTJU is the timezone name, and so on and so
forth...

There has to be some type of failure mode here to allow the programmer to
understand that they need to provide some sort of terminator for %Z. I'm
open to suggestions, if a different failure mode, besides simply parsing
the input incorrectly would be appropriate.

On Thu, Jun 22, 2023 at 10:21 AM Andreas Schwab <schwab@suse.de> wrote:

> On Jun 22 2023, Stanley Lancaster via Libc-alpha wrote:
>
> > %Z parsing terminated on space, should've terminated on next character
> in format string
> >
> > Author: Stanley Lancast <lancasterharp@gmail.com>
> > ---
> >  time/strptime_l.c   | 5 +++--
> >  time/tst-strptime.c | 1 +
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/time/strptime_l.c b/time/strptime_l.c
> > index 85c3249fcc..5f96795406 100644
> > --- a/time/strptime_l.c
> > +++ b/time/strptime_l.c
> > @@ -772,8 +772,9 @@ __strptime_internal (const char *rp, const char
> *fmt, struct tm *tmp,
> >         /* Read timezone but perform no conversion.  */
> >         while (ISSPACE (*rp))
> >           rp++;
> > -       while (!ISSPACE (*rp) && *rp != '\0')
> > -         rp++;
> > +
> > +       while (*rp != *(fmt) && *rp != '\0')
> > +             rp++;
>
> The next character could be '%'.
>
> --
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
>
  
Paul Eggert June 22, 2023, 8:53 p.m. UTC | #3
On 2023-06-22 10:05, Stanley Lancaster via Libc-alpha wrote:
> Yes. I thought about this possibility. %Z does not provide enough
> information about what a "time zone name" looks like.

tzdb and C-locale POSIX time zone names can contain only ASCII 
alphanumerics, "+", and "-", and must contain at least three characters. 
So you could parse only instances of [-+a-zA-Z0-9]{3,}. Although not 
perfect, that would be better than parsing until the next space.

By the way, it was misleading for the glibc manual's strptime section to 
document %Z as "The timezone name". It's not a timezone name - it's a 
time zone abbreviation. The correct term is used in the strftime 
section. I installed the attached documentation patch to fix this issue 
where I found it in glibc.
  
Stanley Lancaster June 23, 2023, 2:31 a.m. UTC | #4
I'll rework the patch to parse the string in this manner

Thanks so much for your suggestions


On Thu, Jun 22, 2023, 3:53 PM Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 2023-06-22 10:05, Stanley Lancaster via Libc-alpha wrote:
> > Yes. I thought about this possibility. %Z does not provide enough
> > information about what a "time zone name" looks like.
>
> tzdb and C-locale POSIX time zone names can contain only ASCII
> alphanumerics, "+", and "-", and must contain at least three characters.
> So you could parse only instances of [-+a-zA-Z0-9]{3,}. Although not
> perfect, that would be better than parsing until the next space.
>
> By the way, it was misleading for the glibc manual's strptime section to
> document %Z as "The timezone name". It's not a timezone name - it's a
> time zone abbreviation. The correct term is used in the strftime
> section. I installed the attached documentation patch to fix this issue
> where I found it in glibc.
  
Florian Weimer June 23, 2023, 6:48 a.m. UTC | #5
* Paul Eggert:

> From 21fbc0a19366f89638a30eef2b53c6d4baafdb88 Mon Sep 17 00:00:00 2001
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 22 Jun 2023 13:44:50 -0700
> Subject: [PATCH] Call "CST" a time zone abbreviation, not a name
>
> In documentation, call strings like "CST" time zone abbreviations, not
> time zone names.  This terminology is more precise, and is what tzdb uses.
> A string like "CST" is ambiguous and does not fully name a time zone.

This patch is okay, thanks.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Florian
  

Patch

diff --git a/time/strptime_l.c b/time/strptime_l.c
index 85c3249fcc..5f96795406 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -772,8 +772,9 @@  __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
 	  /* Read timezone but perform no conversion.  */
 	  while (ISSPACE (*rp))
 	    rp++;
-	  while (!ISSPACE (*rp) && *rp != '\0')
-	    rp++;
+
+	  while (*rp != *(fmt) && *rp != '\0')
+	  	rp++; 
 	  break;
 	case 'z':
 	  /* We recognize four formats:
diff --git a/time/tst-strptime.c b/time/tst-strptime.c
index 3dae9e0594..6eb1af5c15 100644
--- a/time/tst-strptime.c
+++ b/time/tst-strptime.c
@@ -37,6 +37,7 @@  static const struct
   { "C", "2000-01-01", "%Y-%m-%d", 6, 0, 0, 1 },
   { "C", "03/03/00", "%D", 5, 62, 2, 3 },
   { "C", "9/9/99", "%x", 4, 251, 8, 9 },
+  { "C", "CST-2000-01-01", "%Z-%Y-%m-%d", 6, 0, 0, 1}, /* Ensure %Z terminates properly */
   { "C", "19990502123412", "%Y%m%d%H%M%S", 0, 121, 4, 2 },
   { "C", "2001 20 Mon", "%Y %U %a", 1, 140, 4, 21 },
   { "C", "2001 21 Mon", "%Y %W %a", 1, 140, 4, 21 },