diff mbox series

Fix %Z parsing in strptime [BZ #16088]

Message ID	20230714145224.1456633-1-lancasterharp@gmail.com
State	Superseded
Headers	DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5C3833858CD1 To: libc-alpha@sourceware.org Cc: Stanley Lancaster <lancasterharp@gmail.com> Subject: Fix %Z parsing in strptime [BZ #16088] Date: Fri, 14 Jul 2023 09:52:25 -0500 Message-Id: <20230714145224.1456633-1-lancasterharp@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list From: Stanley Lancaster via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Stanley Lancaster <lancasterharp@gmail.com> Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
Series	Fix %Z parsing in strptime [BZ #16088] \| Fix %Z parsing in strptime [BZ #16088]

Checks

Context	Check	Description
redhat-pt-bot/TryBot-apply_patch	success	Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit	success	Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm	success	Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm	success	Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64	success	Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64	success	Testing passed

Commit Message

Stanley Lancaster July 14, 2023, 2:52 p.m. UTC

  ---
 time/strptime_l.c   | 5 ++++-
 time/tst-strptime.c | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

Comments

Paul Eggert July 14, 2023, 4:11 p.m. UTC | #1

On 2023-07-14 07:52, Stanley Lancaster via Libc-alpha wrote:

>   	  /* Read timezone but perform no conversion.  */
> +	  /* we recognize the format [-+a-zA-Z0-9]{3,} */

Use GNU style in comment with active voice sentences and two spaces 
after sentence end, e.g., "/* Read time zone but perform no conversion. 
Recognize the format [-+a-zA-Z0-9]{3,}.  */".

> +	  const char* stop_rp = rp + 3;

Again, GNU style: "char *stop_rp" not "char* stop_rp".

More important, this has undefined behavior if rp + 3 overflows. 
Instead, count the number of bytes after the loop finishes, and make 
sure it's 3 or more.

> +	  while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')

Omit "&& *rp != '\0'"; it's redundant. Reindent to 80 columns.

> +  { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },

I don't see how this test passes. "CST0502123412" is treated as a time 
zone abbreviation, so the only info is the year. Did you run the tests? 
If so, why did this test pass? If not, please run.

Since the patch does not fix BZ#16088, it needs a commit message that 
describes what the patch does and why it's a win even though it doesn't 
fix. In particular, the patch does not set tm_zone, and there's a reason 
for that, and this should be explained.

Stanley Lancaster July 18, 2023, 5:17 p.m. UTC | #2

---
 time/strptime_l.c   | 16 +++++++++++-----
 time/tst-strptime.c |  2 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/time/strptime_l.c b/time/strptime_l.c
index 85c3249fcc..2382defc92 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -770,11 +770,17 @@ __strptime_internal (const char *rp, const char *fmt,
struct tm *tmp,
   break;
  case 'Z':
   /* Read timezone but perform no conversion.  */
-  while (ISSPACE (*rp))
-    rp++;
-  while (!ISSPACE (*rp) && *rp != '\0')
-    rp++;
-  break;
+ {
+ while (ISSPACE (*rp))
+     rp++;
+ /* Read time zone but perform no conversion. Recognize the format
[-+a-zA-Z0-9]{3,}.  */
+ const char* start_rp = rp;
+ while ((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp
>= '0' && *rp <= '9'))
+ rp++;
+ if (start_rp+3 < rp)
+ return NULL;
+ }
  case 'z':
   /* We recognize four formats:
      1. Two digits specify hours.
diff --git a/time/tst-strptime.c b/time/tst-strptime.c
index 3dae9e0594..40145cb109 100644
--- a/time/tst-strptime.c
+++ b/time/tst-strptime.c
@@ -48,6 +48,8 @@ static const struct
     6, 0, 0, 1 },
   { "en_US.ISO-8859-1", "2000-01-01 08:12:21 PM", "%Y-%m-%d %I:%M:%S %p",
     6, 0, 0, 1 },
+  { "en_US.ISO-8859-1", "2000-01-01 08:12:21 AM CST/", "%Y-%m-%d %I:%M:%S
%p %Z/",
+    6, 0, 0, 1},
   { "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 },
   { "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 },
   /* Most of the languages do not need the declension of the month names

Stanley Lancaster July 18, 2023, 5:22 p.m. UTC | #3

I made several of the changes Paul suggested. And ran all the tests. This
patch does not fill out the tm_zone information because the manpage for the
api lists this a format character where "the  corresponding fields are
parsed, but no field in tm is changed." I feel that if this is the
behaviour developers have expected & written around in the past, that
behaviour should stay. However, if everyone here feels that a full
implementation should be written, I can do so.

On Tue, Jul 18, 2023 at 12:17 PM Stanley Lancaster <lancasterharp@gmail.com>
wrote:

> ---
>  time/strptime_l.c   | 16 +++++++++++-----
>  time/tst-strptime.c |  2 ++
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/time/strptime_l.c b/time/strptime_l.c
> index 85c3249fcc..2382defc92 100644
> --- a/time/strptime_l.c
> +++ b/time/strptime_l.c
> @@ -770,11 +770,17 @@ __strptime_internal (const char *rp, const char
> *fmt, struct tm *tmp,
>    break;
>   case 'Z':
>    /* Read timezone but perform no conversion.  */
> -  while (ISSPACE (*rp))
> -    rp++;
> -  while (!ISSPACE (*rp) && *rp != '\0')
> -    rp++;
> -  break;
> + {
> + while (ISSPACE (*rp))
> +     rp++;
> + /* Read time zone but perform no conversion. Recognize the format
> [-+a-zA-Z0-9]{3,}.  */
> + const char* start_rp = rp;
> + while ((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp
> >= '0' && *rp <= '9'))
> + rp++;
> + if (start_rp+3 < rp)
> + return NULL;
> + }
>   case 'z':
>    /* We recognize four formats:
>       1. Two digits specify hours.
> diff --git a/time/tst-strptime.c b/time/tst-strptime.c
> index 3dae9e0594..40145cb109 100644
> --- a/time/tst-strptime.c
> +++ b/time/tst-strptime.c
> @@ -48,6 +48,8 @@ static const struct
>      6, 0, 0, 1 },
>    { "en_US.ISO-8859-1", "2000-01-01 08:12:21 PM", "%Y-%m-%d %I:%M:%S %p",
>      6, 0, 0, 1 },
> +  { "en_US.ISO-8859-1", "2000-01-01 08:12:21 AM CST/", "%Y-%m-%d %I:%M:%S
> %p %Z/",
> +    6, 0, 0, 1},
>    { "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 },
>    { "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 },
>    /* Most of the languages do not need the declension of the month names
> --
> 2.39.3
>
> On Fri, Jul 14, 2023 at 11:11 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
>> On 2023-07-14 07:52, Stanley Lancaster via Libc-alpha wrote:
>>
>> >         /* Read timezone but perform no conversion.  */
>> > +       /* we recognize the format [-+a-zA-Z0-9]{3,} */
>>
>> Use GNU style in comment with active voice sentences and two spaces
>> after sentence end, e.g., "/* Read time zone but perform no conversion.
>> Recognize the format [-+a-zA-Z0-9]{3,}.  */".
>>
>>
>> > +       const char* stop_rp = rp + 3;
>>
>> Again, GNU style: "char *stop_rp" not "char* stop_rp".
>>
>> More important, this has undefined behavior if rp + 3 overflows.
>> Instead, count the number of bytes after the loop finishes, and make
>> sure it's 3 or more.
>>
>>
>> > +       while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <=
>> 'z') || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')
>>
>> Omit "&& *rp != '\0'"; it's redundant. Reindent to 80 columns.
>>
>>
>> > +  { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },
>>
>> I don't see how this test passes. "CST0502123412" is treated as a time
>> zone abbreviation, so the only info is the year. Did you run the tests?
>> If so, why did this test pass? If not, please run.
>>
>> Since the patch does not fix BZ#16088, it needs a commit message that
>> describes what the patch does and why it's a win even though it doesn't
>> fix. In particular, the patch does not set tm_zone, and there's a reason
>> for that, and this should be explained.
>>
>

Paul Eggert July 19, 2023, 2:13 a.m. UTC | #4

Unfortunately that patch was corrupted by Gmail. Please use git 
send-email to send patches.

diff mbox series

Patch

diff --git a/time/strptime_l.c b/time/strptime_l.c
index 85c3249fcc..5954015c4e 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -770,9 +770,12 @@  __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
 	  break;
 	case 'Z':
 	  /* Read timezone but perform no conversion.  */
+	  /* we recognize the format [-+a-zA-Z0-9]{3,} */
 	  while (ISSPACE (*rp))
 	    rp++;
-	  while (!ISSPACE (*rp) && *rp != '\0')
+	  
+	  const char* stop_rp = rp + 3;
+	  while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')
 	    rp++;
 	  break;
 	case 'z':
diff --git a/time/tst-strptime.c b/time/tst-strptime.c
index 3dae9e0594..31d6945ef1 100644
--- a/time/tst-strptime.c
+++ b/time/tst-strptime.c
@@ -38,6 +38,7 @@  static const struct
   { "C", "03/03/00", "%D", 5, 62, 2, 3 },
   { "C", "9/9/99", "%x", 4, 251, 8, 9 },
   { "C", "19990502123412", "%Y%m%d%H%M%S", 0, 121, 4, 2 },
+  { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },
   { "C", "2001 20 Mon", "%Y %U %a", 1, 140, 4, 21 },
   { "C", "2001 21 Mon", "%Y %W %a", 1, 140, 4, 21 },
   { "C", "2001 21 Mon", "%2000Y %W %a", 1, 140, 4, 21 },