[v2] timezone: handle truncated timezones from tzcode-2021d and later (BZ #28707)

Message ID 20211217005744.E099F203C8@pchp3.se.axis.com
State Superseded
Headers
Series [v2] timezone: handle truncated timezones from tzcode-2021d and later (BZ #28707) |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Hans-Peter Nilsson Dec. 17, 2021, 12:57 a.m. UTC
  I'm assisting Christopher, who sent the first version; this
one includes a test-case and a bugzilla report reference.
We're both covered by Axis' copyright assignment.  Tested on
x86_64-linux (Debian 9).

brgds, H-P
8< ------------------------------------------------------ >8

When using a timezone file generated by the zic in IANA
tzcode-2021d a.k.a. tzlib-2021d (also in tzlib-2021e; current as
of this writing), glibc asserts in __tzfile_read (on
e.g. tzset() for this file) and you may find lines matching
"tzfile.c:435: __tzfile_read: Assertion `num_types == 1' failed"
in your syslog.

Attached is a test-case including the tzfile for Asuncion
generated by tzlib-2021e as follows, using the tzlib-2021e zic:
"zic -d DEST -r @1546300800 -L /dev/null -b slim
SOURCE/southamerica".  Note that in its type 2 header, it has
two entries in its "time-types" array (types), but only one
entry in its "transition types" array (type_idxs).

This is valid and expected already in the published RFC8536, and
not even frowned upon: "Local time for timestamps before the
first transition is specified by the first time type (time type
0)" ... "every nonzero local time type index SHOULD appear at
least once in the transition type array".  Note the "nonzero ...
index".  Until the 2021d zic, index 0 has been shared by the
first valid transition but with 2021d it's separate, set apart
as a placeholder and only "implicitly" indexed.  (A draft update
of the RFC mandates that the entry at index 0 is a placeholder
in this case, hence can no longer be shared.)

	* time/tzfile.c: Don't assert when no transitions are found.
	* timezone/Makefile, timezone/tst-pr28707.c,
	timezone/testdata/XT5: New test.

Co-authored-by: Christopher Wong <Christopher.Wong@axis.com>
---
 time/tzfile.c           |   4 ++--
 timezone/Makefile       |   4 +++-
 timezone/testdata/XT5   | Bin 0 -> 156 bytes
 timezone/tst-bz28707.c |  46 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 3 deletions(-)
 create mode 100644 timezone/testdata/XT5
 create mode 100644 timezone/tst-bz28707.c
  

Comments

Paul Eggert Dec. 17, 2021, 2:51 a.m. UTC | #1
Thanks for the patch. The basic idea looks right, but:

On 12/16/21 16:57, Hans-Peter Nilsson wrote:
> diff --git a/timezone/testdata/XT5 b/timezone/testdata/XT5
> new file mode 100644
> index 0000000000000000000000000000000000000000..aadcd6dccab37d1177d2c0ca725ea56c5dc7ca48
> GIT binary patch
> literal 156
> zcmWHE%1kq2AP5+NDnJ+nLI`UCDP;m;4v_j7t+fqMz5oATy}-z#Yhb{jYhcX4Wut3g
> cVrK#*jqP-N4Gr`R^$he4bbO8VOh61S07fYg0{{R3
> 
> literal 0

Instead of putting that binary file directly in the repository it might 
be better to use the 'zic' command that you mentioned, to create the 
binary file from text source, as binary files in the repository are a 
pain for the usual reasons.

If it's too much trouble to invoke zic, then a shell 'printf' like the 
following should be OK.

# Put a comment here explaining how this file was derived.
printf >XT5 \
'TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'\
'\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0TZif2\0\0\0\0\0\0\0\0'\
'\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\b\0'\
'\0\0\0\*\255\200\1\0\0\0\0\0\0\377\377\325\320\1\4-00\0-03\0\n'\
'<-04>4<-03>,M10.1.0/0,M3.4.0/0\n'
  
Carlos O'Donell Dec. 17, 2021, 2:05 p.m. UTC | #2
On 12/16/21 21:51, Paul Eggert wrote:
> Thanks for the patch. The basic idea looks right, but:
> 
> On 12/16/21 16:57, Hans-Peter Nilsson wrote:
>> diff --git a/timezone/testdata/XT5 b/timezone/testdata/XT5
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..aadcd6dccab37d1177d2c0ca725ea56c5dc7ca48
>> GIT binary patch
>> literal 156
>> zcmWHE%1kq2AP5+NDnJ+nLI`UCDP;m;4v_j7t+fqMz5oATy}-z#Yhb{jYhcX4Wut3g
>> cVrK#*jqP-N4Gr`R^$he4bbO8VOh61S07fYg0{{R3
>>
>> literal 0
> 
> Instead of putting that binary file directly in the repository it might be better to use the 'zic' command that you mentioned, to create the binary file from text source, as binary files in the repository are a pain for the usual reasons.
> 
> If it's too much trouble to invoke zic, then a shell 'printf' like the following should be OK.
> 
> # Put a comment here explaining how this file was derived.
> printf >XT5 \
> 'TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'\
> '\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0TZif2\0\0\0\0\0\0\0\0'\
> '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\b\0'\
> '\0\0\0\*\255\200\1\0\0\0\0\0\0\377\377\325\320\1\4-00\0-03\0\n'\
> '<-04>4<-03>,M10.1.0/0,M3.4.0/0\n'
 
DJ,

Could you please review the interaction between CI and this patch?

32-bit i686 CI/CD fell over because of the binary patch:
https://patchwork.sourceware.org/project/glibc/patch/20211217005744.E099F203C8@pchp3.se.axis.com/

It should be entirely possible to apply this, and it does
apply with git-pw patch apply.
  
Carlos O'Donell Dec. 17, 2021, 2:24 p.m. UTC | #3
On 12/16/21 21:51, Paul Eggert wrote:
> Thanks for the patch. The basic idea looks right, but:

Paul,

What is the impact of this for all of the downstream distributions?

I would like to avoid having every distribution scrambling to patch
glibc to handle new tdata binary files, but if we need to do this
patch I would like to get it into all of the active release branches.
  
Paul Eggert Dec. 17, 2021, 5:57 p.m. UTC | #4
On 12/17/21 06:24, Carlos O'Donell wrote:
> What is the impact of this for all of the downstream distributions?

For tzdb Zones, this should affect only people who use zic's -r option 
to limit the size of the TZif binary files by supporting only a 
particular time range. For example, 'zic -r@1609459200 ...' means "omit 
all data for timestamps before the year 2021 UTC". Although full distros 
don't do that, I suppose some stripped-down installions might, for 
embedded applications that don't deal with timestamps in the past. zic's 
-r option was introduced in tzdb 2019a (2019-03-25), was propagated into 
glibc 2.32 (2020-08-05), and is the basis for BZ#28707.

If people are using glibc zic, they need to also specify '-b slim' to 
get the slimmed-down TZif files that have the problem, as in the 
original BZ#28707 bug report. However, I expect that people who use -r 
to generate small TZif files will also either use '-b slim', or will use 
upstream zic where '-b slim' is already the default (this is as of tzdb 
2020b).

> I would like to avoid having every distribution scrambling to patch
> glibc to handle new tdata binary files, but if we need to do this
> patch I would like to get it into all of the active release branches.

That depends on how many people trim down TZif files in the 
abovementioned ways. It shouldn't be a problem on mainstream distros. It 
might be a problem if people use these newer RFC 8536-inspired zic 
options for installations on embedded devices or whatever.
  
Hans-Peter Nilsson Dec. 17, 2021, 6:11 p.m. UTC | #5
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Fri, 17 Dec 2021 18:57:52 +0100

> > I would like to avoid having every distribution scrambling to patch
> > glibc to handle new tdata binary files, but if we need to do this
> > patch I would like to get it into all of the active release branches.
> 
> That depends on how many people trim down TZif files in the 
> abovementioned ways. It shouldn't be a problem on mainstream distros. It 
> might be a problem if people use these newer RFC 8536-inspired zic 
> options for installations on embedded devices or whatever.

Just adding that this is exactly the case here.

For some reason the zic from tzcode is preferred over
glibc's, and usually the tzdata and tzcode (run only at
build time) is updated together.  (Not our own invention or
mis/conception, but using a truncated time-range apparently
is, which is the reason we're lucky #1.)

brgds, H-P
  
Paul Eggert Dec. 17, 2021, 6:40 p.m. UTC | #6
On 12/17/21 10:11, Hans-Peter Nilsson wrote:

> For some reason the zic from tzcode is preferred over
> glibc's

If you're using zic's -r and/or -bslim options, it is indeed better to 
use zic 2021d or later, as recent tzcode fixes several bugs relating to 
-r and -bslim.

This raises the issue of when it'd be a good time to sync glibc's copy 
of tzcode source files from upstream. I don't know when (and can't 
guarantee that) Internet RFC 8536's draft successor will be published as 
an RFC. That being said, I doubt whether substantial technical changes 
will be made to the draft between now and then. So partly it boils down 
to whether glibc wants to be a bit ahead of TZif standardization, or a 
bit behind it.

A significant portability implication of syncing from tzcode is that 
this would make '-b slim' the default for glibc zic. This change should 
work OK with existing glibc (so long as you don't also use zic -r) but 
it may affect other programs that read TZif files directly, and that 
don't yet support RFC 8536.
  
Hans-Peter Nilsson Dec. 17, 2021, 8:33 p.m. UTC | #7
> From: Carlos O'Donell <carlos@redhat.com>
> Date: Fri, 17 Dec 2021 15:05:57 +0100

> On 12/16/21 21:51, Paul Eggert wrote:
> > Thanks for the patch. The basic idea looks right, but:
> > 
> > On 12/16/21 16:57, Hans-Peter Nilsson wrote:
> >> diff --git a/timezone/testdata/XT5 b/timezone/testdata/XT5
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..aadcd6dccab37d1177d2c0ca725ea56c5dc7ca48
> >> GIT binary patch
> >> literal 156
> >> zcmWHE%1kq2AP5+NDnJ+nLI`UCDP;m;4v_j7t+fqMz5oATy}-z#Yhb{jYhcX4Wut3g
> >> cVrK#*jqP-N4Gr`R^$he4bbO8VOh61S07fYg0{{R3
> >>
> >> literal 0
> > 
> > Instead of putting that binary file directly in the
> > repository it might be better to use the 'zic' command
> > that you mentioned, to create the binary file from text
> > source, as binary files in the repository are a pain for
> > the usual reasons.

So, here I was thinking:
"That kind of thinking was appropriate for CVS, but git
(should) handle the files just fine.  Also, see the existing
XT1..XT4!"

> > If it's too much trouble to invoke zic,

...and exactly version 2021d or 2021e: yes, I think it is.

> > then a shell 'printf' like the following should be OK.

Nice, thanks.

> DJ,
> 
> Could you please review the interaction between CI and this patch?
> 
> 32-bit i686 CI/CD fell over because of the binary patch:
> https://patchwork.sourceware.org/project/glibc/patch/20211217005744.E099F203C8@pchp3.se.axis.com/

Hah!  I stand corrected; QED binary files (in patches) are
*still* causing problems!

New version coming up.

brgds, H-P
  

Patch

diff --git a/time/tzfile.c b/time/tzfile.c
index 190a777152b3..8668392ad387 100644
--- a/time/tzfile.c
+++ b/time/tzfile.c
@@ -431,8 +431,8 @@  __tzfile_read (const char *file, size_t extra, char **extrap)
   if (__tzname[0] == NULL)
     {
       /* This should only happen if there are no transition rules.
-	 In this case there should be only one single type.  */
-      assert (num_types == 1);
+	 In this case there's usually only one single type, unless
+	 e.g. the data file has a truncated time-range.  */
       __tzname[0] = __tzstring (zone_names);
     }
   if (__tzname[1] == NULL)
diff --git a/timezone/Makefile b/timezone/Makefile
index c624a189b322..369b3e1698ba 100644
--- a/timezone/Makefile
+++ b/timezone/Makefile
@@ -23,7 +23,7 @@  subdir	:= timezone
 include ../Makeconfig
 
 others	:= zdump zic
-tests	:= test-tz tst-timezone tst-tzset
+tests	:= test-tz tst-timezone tst-tzset tst-bz28707
 
 generated-dirs += testdata
 
@@ -85,10 +85,12 @@  $(objpfx)tst-timezone.out: $(addprefix $(testdata)/, \
 				       America/Sao_Paulo Asia/Tokyo \
 				       Europe/London)
 $(objpfx)tst-tzset.out: $(addprefix $(testdata)/XT, 1 2 3 4)
+$(objpfx)tst-bz28707.out: $(testdata)/XT5
 
 test-tz-ENV = TZDIR=$(testdata)
 tst-timezone-ENV = TZDIR=$(testdata)
 tst-tzset-ENV = TZDIR=$(testdata)
+tst-bz28707-ENV = TZDIR=$(testdata)
 
 # Note this must come second in the deps list for $(built-program-cmd) to work.
 zic-deps = $(objpfx)zic $(leapseconds) yearistype
diff --git a/timezone/testdata/XT5 b/timezone/testdata/XT5
new file mode 100644
index 0000000000000000000000000000000000000000..aadcd6dccab37d1177d2c0ca725ea56c5dc7ca48
GIT binary patch
literal 156
zcmWHE%1kq2AP5+NDnJ+nLI`UCDP;m;4v_j7t+fqMz5oATy}-z#Yhb{jYhcX4Wut3g
cVrK#*jqP-N4Gr`R^$he4bbO8VOh61S07fYg0{{R3

literal 0
HcmV?d00001

diff --git a/timezone/tst-bz28707.c b/timezone/tst-bz28707.c
new file mode 100644
index 000000000000..0a9df1e9a094
--- /dev/null
+++ b/timezone/tst-bz28707.c
@@ -0,0 +1,46 @@ 
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <time.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Test that we can use a truncated timezone-file, where the time-type
+   at index 0 is not indexed by the transition-types array (and the
+   transition-types array does not contain at least both one DST and one
+   normal time members).  */
+
+static int
+do_test (void)
+{
+  if (setenv ("TZ", "XT5", 1))
+    {
+      puts ("setenv failed.");
+      return 1;
+    }
+
+  tzset ();
+
+  return
+    /* Sanity-check that we got the right timezone-name for DST.  For
+       normal time, we're likely to get "-00" (the "unspecified" marker),
+       even though the POSIX timezone string says "-04".  Let's not test
+       that.  */
+    !(strcmp (tzname[1], "-03") == 0);
+}
+#include <support/test-driver.c>