Fix %Z parsing in strptime [BZ #16088]
Checks
Context |
Check |
Description |
redhat-pt-bot/TryBot-apply_patch |
success
|
Patch applied to master at the time it was sent
|
redhat-pt-bot/TryBot-32bit |
success
|
Build for i686
|
linaro-tcwg-bot/tcwg_glibc_build--master-arm |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-arm |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 |
success
|
Testing passed
|
Commit Message
---
time/strptime_l.c | 5 ++++-
time/tst-strptime.c | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
Comments
On 2023-07-14 07:52, Stanley Lancaster via Libc-alpha wrote:
> /* Read timezone but perform no conversion. */
> + /* we recognize the format [-+a-zA-Z0-9]{3,} */
Use GNU style in comment with active voice sentences and two spaces
after sentence end, e.g., "/* Read time zone but perform no conversion.
Recognize the format [-+a-zA-Z0-9]{3,}. */".
> + const char* stop_rp = rp + 3;
Again, GNU style: "char *stop_rp" not "char* stop_rp".
More important, this has undefined behavior if rp + 3 overflows.
Instead, count the number of bytes after the loop finishes, and make
sure it's 3 or more.
> + while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')
Omit "&& *rp != '\0'"; it's redundant. Reindent to 80 columns.
> + { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },
I don't see how this test passes. "CST0502123412" is treated as a time
zone abbreviation, so the only info is the year. Did you run the tests?
If so, why did this test pass? If not, please run.
Since the patch does not fix BZ#16088, it needs a commit message that
describes what the patch does and why it's a win even though it doesn't
fix. In particular, the patch does not set tm_zone, and there's a reason
for that, and this should be explained.
---
time/strptime_l.c | 16 +++++++++++-----
time/tst-strptime.c | 2 ++
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/time/strptime_l.c b/time/strptime_l.c
index 85c3249fcc..2382defc92 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -770,11 +770,17 @@ __strptime_internal (const char *rp, const char *fmt,
struct tm *tmp,
break;
case 'Z':
/* Read timezone but perform no conversion. */
- while (ISSPACE (*rp))
- rp++;
- while (!ISSPACE (*rp) && *rp != '\0')
- rp++;
- break;
+ {
+ while (ISSPACE (*rp))
+ rp++;
+ /* Read time zone but perform no conversion. Recognize the format
[-+a-zA-Z0-9]{3,}. */
+ const char* start_rp = rp;
+ while ((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp
>= '0' && *rp <= '9'))
+ rp++;
+ if (start_rp+3 < rp)
+ return NULL;
+ }
case 'z':
/* We recognize four formats:
1. Two digits specify hours.
diff --git a/time/tst-strptime.c b/time/tst-strptime.c
index 3dae9e0594..40145cb109 100644
--- a/time/tst-strptime.c
+++ b/time/tst-strptime.c
@@ -48,6 +48,8 @@ static const struct
6, 0, 0, 1 },
{ "en_US.ISO-8859-1", "2000-01-01 08:12:21 PM", "%Y-%m-%d %I:%M:%S %p",
6, 0, 0, 1 },
+ { "en_US.ISO-8859-1", "2000-01-01 08:12:21 AM CST/", "%Y-%m-%d %I:%M:%S
%p %Z/",
+ 6, 0, 0, 1},
{ "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 },
{ "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 },
/* Most of the languages do not need the declension of the month names
I made several of the changes Paul suggested. And ran all the tests. This
patch does not fill out the tm_zone information because the manpage for the
api lists this a format character where "the corresponding fields are
parsed, but no field in tm is changed." I feel that if this is the
behaviour developers have expected & written around in the past, that
behaviour should stay. However, if everyone here feels that a full
implementation should be written, I can do so.
On Tue, Jul 18, 2023 at 12:17 PM Stanley Lancaster <lancasterharp@gmail.com>
wrote:
> ---
> time/strptime_l.c | 16 +++++++++++-----
> time/tst-strptime.c | 2 ++
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/time/strptime_l.c b/time/strptime_l.c
> index 85c3249fcc..2382defc92 100644
> --- a/time/strptime_l.c
> +++ b/time/strptime_l.c
> @@ -770,11 +770,17 @@ __strptime_internal (const char *rp, const char
> *fmt, struct tm *tmp,
> break;
> case 'Z':
> /* Read timezone but perform no conversion. */
> - while (ISSPACE (*rp))
> - rp++;
> - while (!ISSPACE (*rp) && *rp != '\0')
> - rp++;
> - break;
> + {
> + while (ISSPACE (*rp))
> + rp++;
> + /* Read time zone but perform no conversion. Recognize the format
> [-+a-zA-Z0-9]{3,}. */
> + const char* start_rp = rp;
> + while ((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp
> >= '0' && *rp <= '9'))
> + rp++;
> + if (start_rp+3 < rp)
> + return NULL;
> + }
> case 'z':
> /* We recognize four formats:
> 1. Two digits specify hours.
> diff --git a/time/tst-strptime.c b/time/tst-strptime.c
> index 3dae9e0594..40145cb109 100644
> --- a/time/tst-strptime.c
> +++ b/time/tst-strptime.c
> @@ -48,6 +48,8 @@ static const struct
> 6, 0, 0, 1 },
> { "en_US.ISO-8859-1", "2000-01-01 08:12:21 PM", "%Y-%m-%d %I:%M:%S %p",
> 6, 0, 0, 1 },
> + { "en_US.ISO-8859-1", "2000-01-01 08:12:21 AM CST/", "%Y-%m-%d %I:%M:%S
> %p %Z/",
> + 6, 0, 0, 1},
> { "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 },
> { "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 },
> /* Most of the languages do not need the declension of the month names
> --
> 2.39.3
>
> On Fri, Jul 14, 2023 at 11:11 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
>> On 2023-07-14 07:52, Stanley Lancaster via Libc-alpha wrote:
>>
>> > /* Read timezone but perform no conversion. */
>> > + /* we recognize the format [-+a-zA-Z0-9]{3,} */
>>
>> Use GNU style in comment with active voice sentences and two spaces
>> after sentence end, e.g., "/* Read time zone but perform no conversion.
>> Recognize the format [-+a-zA-Z0-9]{3,}. */".
>>
>>
>> > + const char* stop_rp = rp + 3;
>>
>> Again, GNU style: "char *stop_rp" not "char* stop_rp".
>>
>> More important, this has undefined behavior if rp + 3 overflows.
>> Instead, count the number of bytes after the loop finishes, and make
>> sure it's 3 or more.
>>
>>
>> > + while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <=
>> 'z') || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')
>>
>> Omit "&& *rp != '\0'"; it's redundant. Reindent to 80 columns.
>>
>>
>> > + { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },
>>
>> I don't see how this test passes. "CST0502123412" is treated as a time
>> zone abbreviation, so the only info is the year. Did you run the tests?
>> If so, why did this test pass? If not, please run.
>>
>> Since the patch does not fix BZ#16088, it needs a commit message that
>> describes what the patch does and why it's a win even though it doesn't
>> fix. In particular, the patch does not set tm_zone, and there's a reason
>> for that, and this should be explained.
>>
>
Unfortunately that patch was corrupted by Gmail. Please use git
send-email to send patches.
@@ -770,9 +770,12 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
break;
case 'Z':
/* Read timezone but perform no conversion. */
+ /* we recognize the format [-+a-zA-Z0-9]{3,} */
while (ISSPACE (*rp))
rp++;
- while (!ISSPACE (*rp) && *rp != '\0')
+
+ const char* stop_rp = rp + 3;
+ while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')
rp++;
break;
case 'z':
@@ -38,6 +38,7 @@ static const struct
{ "C", "03/03/00", "%D", 5, 62, 2, 3 },
{ "C", "9/9/99", "%x", 4, 251, 8, 9 },
{ "C", "19990502123412", "%Y%m%d%H%M%S", 0, 121, 4, 2 },
+ { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },
{ "C", "2001 20 Mon", "%Y %U %a", 1, 140, 4, 21 },
{ "C", "2001 21 Mon", "%Y %W %a", 1, 140, 4, 21 },
{ "C", "2001 21 Mon", "%2000Y %W %a", 1, 140, 4, 21 },