[v2,2/2] timegm.3: Remove recommendation against use of timegm()
Checks
Commit Message
It was straight after a note that they are nonstandard functions,
which already tells the user that if portability is in mind, they
shouldn't be used, so this recommendation adds nothing in that
sense.
Also, there's a note that timelocal() should _never_ be used, due
to mktime(3) being identical and in the POSIX standard (it is also
in C99), so this note would also add nothing in that sense.
So the only uses not covered by those other notes are non-portable
uses of timegm(). In that scenario, it is an excellent function,
since it avoids races, which a home-made timegm(3) implementation
by means of standard functions would have.
A trivial implementation of timegm(3) using only standard C could
be (I didn't test it; use on your own):
// portable_timegm.c
#include <time.h>
time_t portable_timegm(struct tm *tm)
{
tm->tm_isdst = 0;
/*
* If another thread modifies the timezone during the
* execution of the line below, it will produce undefined
* behavior.
*/
return mktime(tm) - timezone;
}
Cc: Paul Eggert <eggert@cs.ucla.edu>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
---
man3/timegm.3 | 1 -
1 file changed, 1 deletion(-)
Comments
On 10/11/21 4:12 AM, Alejandro Colomar wrote:
> time_t portable_timegm(struct tm *tm)
> {
> tm->tm_isdst = 0;
> /*
> * If another thread modifies the timezone during the
> * execution of the line below, it will produce undefined
> * behavior.
> */
> return mktime(tm) - timezone;
> }
This doesn't work for multiple reasons: it's not thread-safe, mktime
might set timezone even in a single-threaded app, and the subtraction
might overflow.
Hi Paul,
On 10/11/21 5:40 PM, Paul Eggert wrote:
> On 10/11/21 4:12 AM, Alejandro Colomar wrote:
>
>> time_t portable_timegm(struct tm *tm)
>> {
>> tm->tm_isdst = 0;
>> /*
>> * If another thread modifies the timezone during the
>> * execution of the line below, it will produce undefined
>> * behavior.
>> */
>> return mktime(tm) - timezone;
>> }
>
> This doesn't work for multiple reasons:
> it's not thread-safe,
Actually, since timegm(3) is implemented in terms of mktime(3), as far
as I could read from glibc code, the problem will be the same, I think.
I don't understand why it wasn't the other way around; maybe it was
more complex internally... But timegm(3) shouldn't need to depend on
environment variables.
> mktime might set timezone even in a single-threaded app,
Yes, I should have called tzset() before the return line.
> and the subtraction might overflow.
Yup, casting to int64_t needed. BTW, I had a look at mktime source
code, and it uses long, which might be 32 bits, and then there's a lot
of checking for overflow. Wouldn't it be simpler to just implement
mktime(3) with int64_t internally? Then, only at the return, cast it
implicitly to whatever time_t means, but int64_t would simplify the code
very much, I think.
Thanks,
Alex
On 10/15/21 3:03 PM, Alejandro Colomar (man-pages) wrote:
> Actually, since timegm(3) is implemented in terms of mktime(3), as far as I could read from glibc code, the problem will be the same, I think.
No, because another thread could setenv ("TZ", ...) between the time
that you call mktime and the time you look at the 'timezone' variable.
So even though mktime itself is thread-safe, the expression 'mktime(tm)
- timezone' is not.
> But timegm(3) shouldn't need to depend on environment variables.
It does depend, if leap seconds are involved.
>> and the subtraction might overflow.
>
> Yup, casting to int64_t needed.
That would help, but it still wouldn't suffice. It'd mishandle -1
returns, for example. Plus, we're better of not putting today's hardware
assumptions into code (suppose int is 64 bits in future machines?).
> BTW, I had a look at mktime source
> code, and it uses long, which might be 32 bits, and then there's a lot
> of checking for overflow.
mktime uses long_int, which is not necessarily 'long'. And no matter
what type you pick, it could overflow on some platform, even if it's an
only-hypothetical platform now.
Hi Paul,
On 10/16/21 2:20 AM, Paul Eggert wrote:
> On 10/15/21 3:03 PM, Alejandro Colomar (man-pages) wrote:
>
>> Actually, since timegm(3) is implemented in terms of mktime(3), as far
>> as I could read from glibc code, the problem will be the same, I think.
>
> No, because another thread could setenv ("TZ", ...) between the time
> that you call mktime and the time you look at the 'timezone' variable.
> So even though mktime itself is thread-safe, the expression 'mktime(tm)
> - timezone' is not.
Yes, there are probably many bugs in that sample code I wrote, which
glibc solves in its timegm(3) implementation. That probably gives more
force to the original point: timegm(3) is the only non-error-prone
solution in glibc for using UTC times, so it should not be marked as
"avoid using". The standards should get a function that does that, be
it timegm(), mktime_z(), or both.
Just for curiosity, I'm not sure about this, but from what I've seen,
the only lock in glibc is in gmtime(), which is called repeatedly from
within mktime(). If another thread calls setenv("TZ"...) in between one
of those calls, wouldn't it produce the same problems?
>
>> But timegm(3) shouldn't need to depend on environment variables.
>
> It does depend, if leap seconds are involved.
Okay. (I don't know too much about those.)
>
>>> and the subtraction might overflow.
>>
>> Yup, casting to int64_t needed.
>
> That would help, but it still wouldn't suffice. It'd mishandle -1
> returns, for example.
Ahh, yes.
> Plus, we're better of not putting today's hardware
> assumptions into code (suppose int is 64 bits in future machines?).
>
>> BTW, I had a look at mktime source code, and it uses long, which might
>> be 32 bits, and then there's a lot of checking for overflow.
>
> mktime uses long_int, which is not necessarily 'long'. And no matter
> what type you pick, it could overflow on some platform, even if it's an
> only-hypothetical platform now.
I think that's not a problem for the following reasons:
- int is unlikely to be >32 bits. If so, we would miss one of the
"conventional" types: int8_t, int16_t, int32_t couldn't map to
fundamental types, unless we add a new type (short short int?), which is
also unlikely because scanf() already uses %hhi for signed char. I
think it's more likely to see something like 'long long long int'.
- The current types can already handle 128-bit archs (just use the same
mapping as in amd64 and change long long int to be int128_t), so maybe
we'll need the triple long when we get to 256-bit archs. Very
hypothetically, that is.
- Even if int ever happened to be 64 bit, this problem would still be
something very theoretical, since INT64_MAX is way greater than the age
of the universe, and many orders of magnitude greater than the expected
lifespan of the sun, and therefore the concept of leap years, months,
ydays, wdays, and so on will be meaningless for such values. How many
seconds since the Epoch will have happened the 2nd March of the year
that the Milky Way collides with Andromeda, at 11:30? I think the
correct answer to that question should be undefined behavior; or an
error if you want to be nice.
So I wouldn't care for now, and maybe just add some initial check such as:
if (tm->tm_year > SOME_ARBITRARY_HUGE_VALUE || tm->tm_mon >
SOME_ARBITRARY_HUGE_VALUE || ...) {
errno = EOVERFLOW;
return -1;
}
and then go on.
Thanks,
Alex
On 10/17/21 11:02, Alejandro Colomar (man-pages) wrote:
> timegm(3) is the only non-error-prone
> solution in glibc for using UTC times, so it should not be marked as
> "avoid using".
timegm is not portable, so it's reasonable for the documentation to warn
against its use. Perhaps the warning could be made clearer.
> - int is unlikely to be >32 bits.
There are a few platforms where int is >32 bits (these are typically
ILP64). Although glibc doesn't currently support them, let's not place
unnecessary obstacles in the way.
> - Even if int ever happened to be 64 bit, this problem would still be
> something very theoretical
The behavior of the 'zdump' command would change. I imagine it'd affect
other commands as well. Admittedly most people wouldn't notice.
> if (tm->tm_year > SOME_ARBITRARY_HUGE_VALUE
Let's not impose arbitrary limits.
Hi Paul,
On 10/18/21 00:00, Paul Eggert wrote:
> On 10/17/21 11:02, Alejandro Colomar (man-pages) wrote:
>
>> timegm(3) is the only non-error-prone solution in glibc for using UTC
>> times, so it should not be marked as "avoid using".
>
> timegm is not portable, so it's reasonable for the documentation to warn
> against its use. Perhaps the warning could be made clearer.
Well, there's no portable alternative, as we discussed, so if a program
needs to do what timegm(3) does (handle UTC dates), it needs to call it.
The only portable alternatives are using other libraries such as boost
(C++). I prefer porting timegm(3) to those platforms instead. Since
C2X will add timegm(3), it's good news.
>
>
>> - int is unlikely to be >32 bits.
>
> There are a few platforms where int is >32 bits (these are typically
> ILP64).
Interesting; I didn't know that. What a weird thing.
> Although glibc doesn't currently support them, let's not place
> unnecessary obstacles in the way.
>
>
>> - Even if int ever happened to be 64 bit, this problem would still be
>> something very theoretical
>
> The behavior of the 'zdump' command would change. I imagine it'd affect
> other commands as well. Admittedly most people wouldn't notice.
>
>
>> if (tm->tm_year > SOME_ARBITRARY_HUGE_VALUE
>
> Let's not impose arbitrary limits.
Since glibc doesn't support 64-bit 'int' yet, the following restriction
could be added without altering the behavior of any existing program:
[
If any of the fields of 'struct tm' would overflow 'int32_t', the
behavior of timegm(3) is undefined.
]
It is arbitrary, yes, but in reality, the code would handle correctly
more than INT32_MAX; it would only cause overflow for values close to
INT64_MAX.
Moreover, it would only affect those weird platforms.
And in those cases:
- If the date is something sane, UB will never be triggered.
- If the date can be compromised (e.g., user input), the only thing that
the programmer needs to do to avoid UB, is to store the temporary values
in int32_t variables before storing them in the struct tm. That will
handle any overflow by truncating it (if the programmer wants to do
something else, that's fine, but it's the minimum to avoid UB). Not
much of a problem.
It would simplify very much glibc code, by imposing small restrictions
to the user.
Don't you think?
Thanks,
Alex
On 11/4/21 17:47, Alejandro Colomar (man-pages) wrote:
> Since C2X will add timegm(3), it's good news.
Yes at that point we can remove or reword the portability warning.
> the following restriction could be added without altering the behavior of any existing program:
>
> [
> If any of the fields of 'struct tm' would overflow 'int32_t', the behavior of timegm(3) is undefined.
> ]
No, timegm should not be documented to have a 32-bit limit on struct tm
components. It should be documented only to have an int limit on struct
tm components. I.e., the same as mktime, localtime, gmtime, etc.
> It would simplify very much glibc code, by imposing small restrictions to the user.
The general glibc philosophy is to not impose arbitrary limits on the
user, even if the limits are "small restrictions". Occasionally limits
need to be imposed anyway for a good reason, but there's no good reason
here.
@@ -97,7 +97,6 @@ T} Thread safety MT-Safe env locale
.SH CONFORMING TO
These functions are nonstandard GNU extensions
that are also present on the BSDs.
-Avoid their use.
.SH NOTES
The
.BR timelocal ()