Fix multiple minor tzset glitches [BZ #24004]

Message ID 20190211163728.31655-1-eggert@cs.ucla.edu
State Rejected
Headers

Commit Message

Paul Eggert Feb. 11, 2019, 4:37 p.m. UTC
  Avoid polling if TZ is unset.
Consistently ignore leading ":" in TZ as per documentation.
Treat TZ="" as Universal Time without leap seconds,
for consistency with upstream.
* time/tzset.c (tzset_internal): Check for null TZ first.
Do not substitute "Universal" for ""; this is __tzfile_read’s
job and we were doing it incorrectly as we treated TZ=""
differently from TZ=":".
* time/tzfile.c (__tzfile_read): Simplify, as FILE cannot be null.
---
 ChangeLog     | 13 +++++++++++++
 NEWS          |  7 +++++++
 time/tzfile.c |  5 +----
 time/tzset.c  | 15 ++++++---------
 4 files changed, 27 insertions(+), 13 deletions(-)
  

Comments

Florian Weimer Feb. 18, 2019, 9:37 a.m. UTC | #1
* Paul Eggert:

> Avoid polling if TZ is unset.

I thought this polling (for reloading /etc/localtime if it has changed)
was a feature.  I'm not sure if we can remove it.

Thanks,
Florian
  
Paul Eggert Feb. 20, 2019, 7:29 p.m. UTC | #2
On 2/18/19 1:37 AM, Florian Weimer wrote:
> I thought this polling (for reloading /etc/localtime if it has changed)
> was a feature.  I'm not sure if we can remove it.

No such polling is documented. On the contrary, the glibc manual says 
that an unset TZ is treated as if it were set to ':/etc/localtime'. 
There is no polling when you set TZ=':/etc/localtime' (or to 
TZ='/etc/localtime', or to any other value). So the code and the 
documentation disagree, and one or the other needs to be fixed.

Neither Solaris nor OpenBSD poll, so portable programs cannot assume 
that polling occurs. (These are the only two other systems I checked.)

Commentary like this:

https://blog.packagecloud.io/eng/2017/02/21/set-environment-variable-save-thousands-of-system-calls/

says that polling is a bad idea, and gives TZ=':/etc/localtime' as a 
workaround for glibc. I'm sympathetic, as the polling makes glibc look 
bad compared to the competition.
  
Florian Weimer March 13, 2019, 2:52 p.m. UTC | #3
* Paul Eggert:

> On 2/18/19 1:37 AM, Florian Weimer wrote:
>> I thought this polling (for reloading /etc/localtime if it has changed)
>> was a feature.  I'm not sure if we can remove it.
>
> No such polling is documented. On the contrary, the glibc manual says
> that an unset TZ is treated as if it were set to
> ':/etc/localtime'. There is no polling when you set
> TZ=':/etc/localtime' (or to TZ='/etc/localtime', or to any other
> value). So the code and the documentation disagree, and one or the
> other needs to be fixed.
>
> Neither Solaris nor OpenBSD poll, so portable programs cannot assume
> that polling occurs. (These are the only two other systems I checked.)
>
> Commentary like this:
>
> https://blog.packagecloud.io/eng/2017/02/21/set-environment-variable-save-thousands-of-system-calls/
>
> says that polling is a bad idea, and gives TZ=':/etc/localtime' as a
> workaround for glibc. I'm sympathetic, as the polling makes glibc look
> bad compared to the competition.

On the other hand, for files which we do not poll for changes (such as
/etc/resolv.conf previously, and /etc/gai.conf and /etc/nsswitch.conf
presently), we regularly receive requests to add such polling.  For
nsswitch.conf, this is clearly documented in the manual page, and yet
people still ask for it.

Thanks,
Florian
  
Paul Eggert March 13, 2019, 5:02 p.m. UTC | #4
On 3/13/19 7:52 AM, Florian Weimer wrote:
> On the other hand, for files which we do not poll for changes (such as
> /etc/resolv.conf previously, and /etc/gai.conf and /etc/nsswitch.conf
> presently), we regularly receive requests to add such polling.

Yes, and it would be good to add a way to control polling of /etc files,
and whatever method is used to control polling of /etc/nsswitch.conf
etc. could also be used to control polling of /etc/localtime. Still, the
current situation with /etc/localtime is not good: it disagrees with the
documentation and with common practice elsewhere. So I suggest that we
fix it and also look into adding an API to control polling.

One API might be something like a new include file etcpoll.h with an API
like this:

  etcpoll (ETC_NSSWITCH_CONF, ETCPOLL_NOW);
  etcpoll (ETC_GAI_CONF, ETCPOLL_ALWAYS);
  etcpoll (ETC_LOCALTIME, ETCPOLL_DISABLE);

where NOW means reconsult the file now, ALWAYS means always reconsult
the file before every use of its data, and DISABLE means stop
reconsulting. That sort of thing.
  
Florian Weimer March 14, 2019, 2:10 p.m. UTC | #5
* Paul Eggert:

> On 3/13/19 7:52 AM, Florian Weimer wrote:
>> On the other hand, for files which we do not poll for changes (such as
>> /etc/resolv.conf previously, and /etc/gai.conf and /etc/nsswitch.conf
>> presently), we regularly receive requests to add such polling.
>
> Yes, and it would be good to add a way to control polling of /etc files,
> and whatever method is used to control polling of /etc/nsswitch.conf
> etc. could also be used to control polling of /etc/localtime. Still, the
> current situation with /etc/localtime is not good: it disagrees with the
> documentation and with common practice elsewhere. So I suggest that we
> fix it and also look into adding an API to control polling.
>
> One API might be something like a new include file etcpoll.h with an API
> like this:
>
>   etcpoll (ETC_NSSWITCH_CONF, ETCPOLL_NOW);
>   etcpoll (ETC_GAI_CONF, ETCPOLL_ALWAYS);
>   etcpoll (ETC_LOCALTIME, ETCPOLL_DISABLE);
>
> where NOW means reconsult the file now, ALWAYS means always reconsult
> the file before every use of its data, and DISABLE means stop
> reconsulting. That sort of thing.

I think this should be a configuration file.  We can use a shared
mapping (or use /etc/ld.so.cache) to make up-to-date checks really,
really cheap.  But I think we should apply this mechanism only to files
where the system administrator has enabled this feature.

Thanks,
Florian
  
Paul Eggert March 14, 2019, 4:47 p.m. UTC | #6
On 3/14/19 7:10 AM, Florian Weimer wrote:
> * Paul Eggert:
>
>> Yes, and it would be good to add a way to control polling of /etc files,
>> and whatever method is used to control polling of /etc/nsswitch.conf
>> etc. could also be used to control polling of /etc/localtime.
> I think this should be a configuration file.  We can use a shared
> mapping (or use /etc/ld.so.cache) to make up-to-date checks really,
> really cheap.  But I think we should apply this mechanism only to files
> where the system administrator has enabled this feature.

Will the configuration file will have also an entry to control whether
the configuration file itself is polled? :-)

Come to think of it, this has the same feel as the other tunables, so
perhaps it'd be better to add it to the tunables feature. Something like
this:

export GLIBC_TUNABLES='glibc.etcpoll.nsswitch=1'

in order to enable polling of /etc/nsswitch.conf.
  
Adhemerval Zanella March 14, 2019, 6:24 p.m. UTC | #7
On 14/03/2019 13:47, Paul Eggert wrote:
> On 3/14/19 7:10 AM, Florian Weimer wrote:
>> * Paul Eggert:
>>
>>> Yes, and it would be good to add a way to control polling of /etc files,
>>> and whatever method is used to control polling of /etc/nsswitch.conf
>>> etc. could also be used to control polling of /etc/localtime.
>> I think this should be a configuration file.  We can use a shared
>> mapping (or use /etc/ld.so.cache) to make up-to-date checks really,
>> really cheap.  But I think we should apply this mechanism only to files
>> where the system administrator has enabled this feature.
> 
> Will the configuration file will have also an entry to control whether
> the configuration file itself is polled? :-)
> 
> Come to think of it, this has the same feel as the other tunables, so
> perhaps it'd be better to add it to the tunables feature. Something like
> this:
> 
> export GLIBC_TUNABLES='glibc.etcpoll.nsswitch=1'
> 
> in order to enable polling of /etc/nsswitch.conf.
> 

+1 on making this a tunable. Adding another configure switch means another
build permutation to check for consistency.
  
Florian Weimer March 14, 2019, 6:48 p.m. UTC | #8
* Paul Eggert:

> On 3/14/19 7:10 AM, Florian Weimer wrote:
>> * Paul Eggert:
>>
>>> Yes, and it would be good to add a way to control polling of /etc files,
>>> and whatever method is used to control polling of /etc/nsswitch.conf
>>> etc. could also be used to control polling of /etc/localtime.
>> I think this should be a configuration file.  We can use a shared
>> mapping (or use /etc/ld.so.cache) to make up-to-date checks really,
>> really cheap.  But I think we should apply this mechanism only to files
>> where the system administrator has enabled this feature.
>
> Will the configuration file will have also an entry to control whether
> the configuration file itself is polled? :-)

With the mapped file, there is no file polling as such, just a few
memory loads.  The cost is that the system administrator needs to run a
command after updating any of the covered configuration files, which is
why this has to be optional (and opt-in per configuration file).  There
probably aren't that many configuration files where this is needed.

Thanks,
Florian
  
Yann Droneaud March 15, 2019, 1:12 p.m. UTC | #9
Hi,

Le jeudi 14 mars 2019 à 19:48 +0100, Florian Weimer a écrit :
> * Paul Eggert:
> > On 3/14/19 7:10 AM, Florian Weimer wrote:
> > > * Paul Eggert:
> > > 
> > > > Yes, and it would be good to add a way to control polling of /etc files,
> > > > and whatever method is used to control polling of /etc/nsswitch.conf
> > > > etc. could also be used to control polling of /etc/localtime.
> > > I think this should be a configuration file.  We can use a shared
> > > mapping (or use /etc/ld.so.cache) to make up-to-date checks really,
> > > really cheap.  But I think we should apply this mechanism only to files
> > > where the system administrator has enabled this feature.
> > 
> > Will the configuration file will have also an entry to control whether
> > the configuration file itself is polled? :-)
> 
> With the mapped file, there is no file polling as such, just a few
> memory loads.  The cost is that the system administrator needs to run a
> command after updating any of the covered configuration files, which is
> why this has to be optional (and opt-in per configuration file).  There
> probably aren't that many configuration files where this is needed.
> 

I think there's a caveat with memory mapped file: if a new version of 
the configuration file is created and renamed to the configuration
location, replacing the previous one, applications won't notice the
update, as the memory mapping will still reference the previous file.

Regards.
  
Florian Weimer March 15, 2019, 3:05 p.m. UTC | #10
* Yann Droneaud:

>> With the mapped file, there is no file polling as such, just a few
>> memory loads.  The cost is that the system administrator needs to run a
>> command after updating any of the covered configuration files, which is
>> why this has to be optional (and opt-in per configuration file).  There
>> probably aren't that many configuration files where this is needed.
>
> I think there's a caveat with memory mapped file: if a new version of 
> the configuration file is created and renamed to the configuration
> location, replacing the previous one, applications won't notice the
> update, as the memory mapping will still reference the previous file.

Sure, but the configuration compiler can write a flag to the old
mapping that indicates it is out of date.  Checking that flag has to
be part of the sequence that verifies if polling is required.

I believe nscd does something similar when it invalidates a mapping.
  
Yann Droneaud March 15, 2019, 4:01 p.m. UTC | #11
Hi,

Le vendredi 15 mars 2019 à 16:05 +0100, Florian Weimer a écrit :
> * Yann Droneaud:
> 
> > > With the mapped file, there is no file polling as such, just a few
> > > memory loads.  The cost is that the system administrator needs to run a
> > > command after updating any of the covered configuration files, which is
> > > why this has to be optional (and opt-in per configuration file).  There
> > > probably aren't that many configuration files where this is needed.
> > 
> > I think there's a caveat with memory mapped file: if a new version of 
> > the configuration file is created and renamed to the configuration
> > location, replacing the previous one, applications won't notice the
> > update, as the memory mapping will still reference the previous file.
> 
> Sure, but the configuration compiler can write a flag to the old
> mapping that indicates it is out of date.  Checking that flag has to
> be part of the sequence that verifies if polling is required.
> 

Yes, this address my concern. Thanks.

(Assuming the configuration compiler is used, and not a random
stackoverflow copy pasted shell script ;)

Regards.
  
Florian Weimer April 10, 2019, 11:57 a.m. UTC | #12
* Paul Eggert:

> On 2/18/19 1:37 AM, Florian Weimer wrote:
>> I thought this polling (for reloading /etc/localtime if it has changed)
>> was a feature.  I'm not sure if we can remove it.
>
> No such polling is documented. On the contrary, the glibc manual says
> that an unset TZ is treated as if it were set to
> ':/etc/localtime'. There is no polling when you set
> TZ=':/etc/localtime' (or to TZ='/etc/localtime', or to any other
> value). So the code and the documentation disagree, and one or the
> other needs to be fixed.
>
> Neither Solaris nor OpenBSD poll, so portable programs cannot assume
> that polling occurs. (These are the only two other systems I checked.)

It turns out that polling was implemented in response to this user bug
report:

  <https://sourceware.org/bugzilla/show_bug.cgi?id=154>

I must say that the use case in the bug report is not unreasonable.

Thanks,
Florian
  
Carlos O'Donell April 10, 2019, 1:18 p.m. UTC | #13
On 4/10/19 7:57 AM, Florian Weimer wrote:
> * Paul Eggert:
> 
>> On 2/18/19 1:37 AM, Florian Weimer wrote:
>>> I thought this polling (for reloading /etc/localtime if it has changed)
>>> was a feature.  I'm not sure if we can remove it.
>>
>> No such polling is documented. On the contrary, the glibc manual says
>> that an unset TZ is treated as if it were set to
>> ':/etc/localtime'. There is no polling when you set
>> TZ=':/etc/localtime' (or to TZ='/etc/localtime', or to any other
>> value). So the code and the documentation disagree, and one or the
>> other needs to be fixed.
>>
>> Neither Solaris nor OpenBSD poll, so portable programs cannot assume
>> that polling occurs. (These are the only two other systems I checked.)
> 
> It turns out that polling was implemented in response to this user bug
> report:
> 
>    <https://sourceware.org/bugzilla/show_bug.cgi?id=154>
> 
> I must say that the use case in the bug report is not unreasonable.

I agree that today we have *millions* of people travelling through timezones
and they may be adjusting localtime on their laptop continuously and the
timezone should be updated and reflected in the logs.

I think this continues to show the following trend in expectation:

* APIs are MT-safe.

* APIs dynamically respond to changing context.
   - Location.
   - Network.
   - Language.

Few if any developers are asking us to make APIs data read-once
and immutable after reading.

Timezone information is updated on average a dozen times a year,
that's a once-a-month restart that's required to get this data
if we don't internally reload that data.

I think there *have* to be ways in which this data can be dynamically
updated without a huge performance penalty.

I find the following arguments weak:
(1) Solaris nor OpenBSD Poll.
(2) Portable programs will have to poll.

The reason I find them weak is because they do not restrict us from
doing the best we can for our developers on GNU/Linux.

We can offer polling by default, and users who only care about GNU/Linux
can write less code, and save on the maintenance of doing polling
correctly (or having to pay the full cost of tzset()).

$0.02.
  
Paul Eggert April 11, 2019, 5:51 p.m. UTC | #14
On 4/10/19 6:18 AM, Carlos O'Donell wrote:
> I think there *have* to be ways in which this data can be dynamically
> updated without a huge performance penalty. 

What ways are those, though? It's not like the penalty is zero.

One possible compromise would be for localtime to poll, and for
localtime_r to not poll (you must call tzset to poll), and tell people
"if you want performance, use localtime_r". That might hit a sweet spot
of doing polling for naive programs where performance matters, and
giving higher performance for applications that care about timestamp
performance.
  
Carlos O'Donell April 12, 2019, 5:07 p.m. UTC | #15
On 4/11/19 1:51 PM, Paul Eggert wrote:
> On 4/10/19 6:18 AM, Carlos O'Donell wrote:
>> I think there *have* to be ways in which this data can be dynamically
>> updated without a huge performance penalty.
> 
> What ways are those, though? It's not like the penalty is zero.

"without huge performance penalty" ;-)

Without threads you have to stat at some point to notice the file
updated. Cost.

You have to use RCU and a pointer to the binary data. Cost.

I also think we need to do something similar for locale data, and
the NSS reloading that DJ is looking at. In total that's 4 interfaces
which could use a dynamic update/commit style mechanism in glibc.
I'm going to go look and see if anyone has best practice for this in C.

The alternative:

(a) Check for update?

   Provide a function that tells the user if the on-disk data has changed.
   e.g tzdb_changed(), locale_changed(). The function lets the user control
   the cost of the stat and when it happens, but isolates them from the
   implemetnation.

(b) Commit update.

   Provide a function to do the thread-safe update, which the user can
   execute whenever they want e.g. tzset(), locale_reload(). The function
   lets the user control the cost of the reload and when it happens, and
   again isolates them from the implementation.

You still need to have something like RCU or an atomic protocol to make
the update thread-safe. That cost never goes away, but it's minor compared
to when to check and when to commit the update.
  
> One possible compromise would be for localtime to poll, and for
> localtime_r to not poll (you must call tzset to poll), and tell people
> "if you want performance, use localtime_r". That might hit a sweet spot
> of doing polling for naive programs where performance matters, and
> giving higher performance for applications that care about timestamp
> performance.
  
If users have to make code changes it's better to just add a new api
for "Check for update?" and "Do the update!".
  
Paul Eggert April 12, 2019, 6:41 p.m. UTC | #16
On 4/12/19 10:07 AM, Carlos O'Donell wrote:
>> One possible compromise would be for localtime to poll, and for
>> localtime_r to not poll...
>  
> If users have to make code changes it's better to just add a new api
> for "Check for update?" and "Do the update!". 

If we add that API, will we also change localtime() to stop checking for
and doing updates in the (undocumented) cases where localtime checks for
and does them now? And if not, how will programs arrange for localtime
to be faster? This would seem to be the key question.
  
Carlos O'Donell April 12, 2019, 6:57 p.m. UTC | #17
On 4/12/19 2:41 PM, Paul Eggert wrote:
> On 4/12/19 10:07 AM, Carlos O'Donell wrote:
>>> One possible compromise would be for localtime to poll, and for
>>> localtime_r to not poll...
>>   
>> If users have to make code changes it's better to just add a new api
>> for "Check for update?" and "Do the update!".
> 
> If we add that API, will we also change localtime() to stop checking for
> and doing updates in the (undocumented) cases where localtime checks for
> and does them now? And if not, how will programs arrange for localtime
> to be faster? This would seem to be the key question.
  
Sure! I would be on board with changing localtime() to stop checking for
and doing updates (unless someone has a convincing argument not to do this).
  
Florian Weimer April 12, 2019, 7:26 p.m. UTC | #18
* Carlos O'Donell:

> On 4/12/19 2:41 PM, Paul Eggert wrote:
>> On 4/12/19 10:07 AM, Carlos O'Donell wrote:
>>>> One possible compromise would be for localtime to poll, and for
>>>> localtime_r to not poll...
>>>   If users have to make code changes it's better to just add a new
>>> api
>>> for "Check for update?" and "Do the update!".
>>
>> If we add that API, will we also change localtime() to stop checking for
>> and doing updates in the (undocumented) cases where localtime checks for
>> and does them now? And if not, how will programs arrange for localtime
>> to be faster? This would seem to be the key question.

> Sure! I would be on board with changing localtime() to stop checking
> for and doing updates (unless someone has a convincing argument not to
> do this).

I want to make localtime the canonical interface, using thread-local
storage.

Thanks,
Florian
  

Patch

diff --git a/ChangeLog b/ChangeLog
index 6f1d967f62..831d89ba8f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,16 @@ 
+2019-02-11  Paul Eggert  <eggert@cs.ucla.edu>
+
+	Fix multiple minor tzset glitches [BZ #24004]
+	Avoid polling if TZ is unset.
+	Consistently ignore leading ":" in TZ as per documentation.
+	Treat TZ="" as Universal Time without leap seconds,
+	for consistency with upstream.
+	* time/tzset.c (tzset_internal): Check for null TZ first.
+	Do not substitute "Universal" for ""; this is __tzfile_read’s
+	job and we were doing it incorrectly as we treated TZ=""
+	differently from TZ=":".
+	* time/tzfile.c (__tzfile_read): Simplify, as FILE cannot be null.
+
 2019-02-11  Paul A. Clarke  <pc@us.ibm.com>
 
 	* sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrtf):
diff --git a/NEWS b/NEWS
index 0a3b6c7a5a..87de5030c1 100644
--- a/NEWS
+++ b/NEWS
@@ -22,6 +22,13 @@  Deprecated and removed features, and other changes affecting compatibility:
   definitions in libc will be used automatically, which have been available
   since glibc 2.17.
 
+* If the TZ environment variable is unset, functions like localtime no
+  longer stat the file /etc/localtime if the file has already been
+  consulted.  Instead, they reuse the cached timezone info, just as they
+  would with TZ="/etc/localtime".  If TZ="", these functions now omit leap
+  seconds even on unusual installations where TZ="Universal" identifies a
+  TZif file with leap seconds; this is for consistency with tzcode and BSD.
+
 Changes to build and runtime requirements:
 
 * GCC 6.2 or later is required to build the GNU C Library.
diff --git a/time/tzfile.c b/time/tzfile.c
index 7229ed93b7..db66f22a4e 100644
--- a/time/tzfile.c
+++ b/time/tzfile.c
@@ -115,10 +115,7 @@  __tzfile_read (const char *file, size_t extra, char **extrap)
 
   __use_tzfile = 0;
 
-  if (file == NULL)
-    /* No user specification; use the site-wide default.  */
-    file = TZDEFAULT;
-  else if (*file == '\0')
+  if (*file == '\0')
     /* User specified the empty string; use UTC with no leap seconds.  */
     goto ret_free_transitions;
   else
diff --git a/time/tzset.c b/time/tzset.c
index 307eb651ad..1903ef864d 100644
--- a/time/tzset.c
+++ b/time/tzset.c
@@ -375,25 +375,22 @@  tzset_internal (int always)
 
   /* Examine the TZ environment variable.  */
   tz = getenv ("TZ");
-  if (tz && *tz == '\0')
-    /* User specified the empty string; use UTC explicitly.  */
-    tz = "Universal";
+
+  if (tz == NULL)
+    /* No user specification; use the site-wide default.  */
+    tz = TZDEFAULT;
 
   /* A leading colon means "implementation defined syntax".
      We ignore the colon and always use the same algorithm:
      try a data file, and if none exists parse the 1003.1 syntax.  */
-  if (tz && *tz == ':')
+  if (*tz == ':')
     ++tz;
 
   /* Check whether the value changed since the last run.  */
-  if (old_tz != NULL && tz != NULL && strcmp (tz, old_tz) == 0)
+  if (old_tz != NULL && strcmp (tz, old_tz) == 0)
     /* No change, simply return.  */
     return;
 
-  if (tz == NULL)
-    /* No user specification; use the site-wide default.  */
-    tz = TZDEFAULT;
-
   tz_rules[0].name = NULL;
   tz_rules[1].name = NULL;