[v9] y2038: Introduce the __ASSUME_TIME64_SYSCALLS define

Message ID 20190827173015.24370-1-alistair.francis@wdc.com
State New, archived
Headers

Commit Message

Alistair Francis Aug. 27, 2019, 5:30 p.m. UTC
  From: Lukasz Majewski <lukma@denx.de>

The __ASSUME_TIME64_SYSCALLS macro indicates if the Linux kernel provides
64-bit time syscalls. If __ASSUME_TIME64_SYSCALLS is defined glibc will
only use the *64/_time64 suffixed syscalls.

__ASSUME_TIME64_SYSCALLS is defined when:
 1. __WORDSIZE == 64 - 64-bit time and syscalls have previously been
    supported for 64-bit platforms.

 2. __WORDSIZE == 32 - Since kernel 5.1 a new set of 64-bit time
    syscalls have been added. This provides a 64-bit time ABI to 32-bit
    userspace applications.

    List of new syscalls added to v5.1. kernel (they accept data based on
    struct timespec and timeval with 64 bit tv_sec):

     clock_gettime64
     clock_settime64
     clock_adjtime64
     clock_getres_time64
     clock_nanosleep_time64
     timer_gettime64
     timer_settime64
     timerfd_gettime64
     timerfd_settime64
     utimensat_time64
     pselect6_time64
     ppoll_time64
     io_pgetevents_time64
     recvmmsg_time64
     mq_timedsend_time64
     mq_timedreceive_time64
     semtimedop_time64
     rt_sigtimedwait_time64
     futex_time64
     sched_rr_get_interval_time64

3. __WORDSIZE == 32 && _SYSCALL_WORDSIZE == 64 - a special case for machines
  with ILP32 data model, but already supporting 64 bit time (the 'x32'
  architecture to be precise).

This flag can be removed when glibc's minimal supported kernel version
passes 5.1 as after that all architectures will provide a 64-bit time
syscall.

* sysdeps/unix/sysv/linux/kernel-features.h:
(__ASSUME_TIME64_SYSCALLS): Define.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
---
v9:
 - Reword the commit message and code comment

 sysdeps/unix/sysv/linux/kernel-features.h | 48 +++++++++++++++++++++++
 1 file changed, 48 insertions(+)
  

Comments

Zack Weinberg Aug. 27, 2019, 7:03 p.m. UTC | #1
On Tue, Aug 27, 2019 at 1:35 PM Alistair Francis
<alistair.francis@wdc.com> wrote:
> From: Lukasz Majewski <lukma@denx.de>
...

Thanks for tackling this revision.  I think it's much clearer now what
the flag means.  I would like to suggest some further improvements to
the wording of the comment and the commit message.

The commit message is largely redundant to the comment.  I think all
you need to say in the commit message is:

# Add __ASSUME_TIME64_SYSCALLS macro to linux/kernel-features.h.
#
# __ASSUME_TIME64_SYSCALLS will be defined (with value 1) if the Linux
# kernel can be assumed to provide a complete set of system calls that
# process 64-bit time_t.
#
# __ASSUME_TIME64_SYSCALLS does not indicate whether time_t is
# actually 64 bits (that's __TIMEBITS) and also does not indicate
# whether the 64-bit time_t system calls have "time64" suffixes on
# their names.

And then the comment should be something like this:

# /* Support for 64-bit time_t in the system call interface.
#
#    This flag is always true for Linux 5.1 and later.  Prior to that
#    version, it is true only for some CPU architectures and ABIs:
#
#    - __WORDSIZE == 64 - all supported architectures where pointers
#      are 64 bits also have always had 64-bit time_t.
#
#    - __WORDSIZE == 32 && __SYSCALL_WORDSIZE == 64 - this describes
#      only one supported configuration, x86's 'x32' subarchitecture,
#      where pointers are 32 bits but time_t has always been 64 bits.
#
#    For architectures where __WORDSIZE == 32 && __SYSCALL_WORDSIZE != 64,
#    glibc supports both 32-bit time_t (__TIMESIZE == 32) and 64-bit time_t
#    (__TIMESIZE == 64).  Independent of this, when __ASSUME_TIME64_SYSCALLS
#    is defined, glibc uses only the 64-bit time_t system calls; conversions
#    occur in user space.  When __ASSUME_TIME64_SYSCALLS is *not* defined,
#    glibc will attempt to use the 64-bit time_t system calls but will fall
#    back to the 32-bit system calls if they are not available.
#
#    Because of this, __ASSUME_TIME64_SYSCALLS being defined does not
#    mean that __TIMESIZE is 64, and __TIMESIZE being equal to 64 does
#    not mean that __ASSUME_TIME64_SYSCALLS is defined.  All four
#    cases are possible.  */

I think that gets everything important.  Neither the commit message
nor the comment needs to list all the new system calls.

zw
  
Joseph Myers Aug. 27, 2019, 7:35 p.m. UTC | #2
On Tue, 27 Aug 2019, Zack Weinberg wrote:

> I think that gets everything important.  Neither the commit message
> nor the comment needs to list all the new system calls.

The syscalls need to be listed to make clear exactly what the interfaces 
covered by the macro are.

* If further related syscalls are added in future, e.g. timespec64 
versions of syscalls that currently use struct rusage, they are *not* 
covered by this macro.

* If the semantics of the unsuffixed syscall on 64-bit architectures do 
not exactly match those of the suffixed syscall on 32-bit architectures, 
so that a #define of the suffixed name to the unsuffixed name doesn't 
suffice to use the existing syscall on 64-bit architectures, it is *not* 
covered by this macro, at least not without an explanation of the semantic 
differences in the comment.  (See the semtimedop discussion.)

* If the unsuffixed syscall is not in fact available on all 64-bit 
architectures supported by glibc in the minimum supported kernel version, 
it is *not* covered by this macro.  (Although treating it as covered would 
only cause issues if there is actually any fallback code for the case 
where the unsuffixed syscall isn't present either.)

The fact that the syscalls described by the macro may be suffixed or 
unsuffixed also needs to be in the comment, not just the commit message.
  
Alistair Francis Aug. 27, 2019, 8:33 p.m. UTC | #3
On Tue, Aug 27, 2019 at 12:35 PM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Tue, 27 Aug 2019, Zack Weinberg wrote:
>
> > I think that gets everything important.  Neither the commit message
> > nor the comment needs to list all the new system calls.
>
> The syscalls need to be listed to make clear exactly what the interfaces
> covered by the macro are.

I'm a little confused, should I change anything or leave the patch as is?

>
> * If further related syscalls are added in future, e.g. timespec64
> versions of syscalls that currently use struct rusage, they are *not*
> covered by this macro.
>
> * If the semantics of the unsuffixed syscall on 64-bit architectures do
> not exactly match those of the suffixed syscall on 32-bit architectures,
> so that a #define of the suffixed name to the unsuffixed name doesn't
> suffice to use the existing syscall on 64-bit architectures, it is *not*
> covered by this macro, at least not without an explanation of the semantic
> differences in the comment.  (See the semtimedop discussion.)
>
> * If the unsuffixed syscall is not in fact available on all 64-bit
> architectures supported by glibc in the minimum supported kernel version,
> it is *not* covered by this macro.  (Although treating it as covered would
> only cause issues if there is actually any fallback code for the case
> where the unsuffixed syscall isn't present either.)
>
> The fact that the syscalls described by the macro may be suffixed or
> unsuffixed also needs to be in the comment, not just the commit message.

Is this enough to describe that (already in the commit)?

   On systems with __WORDSIZE == 64 the __NR_clock_settime syscall is used
   to achieve this goal. Systems with __WORDSIZE == 32 use the
   __NR_clock_settime64 syscall available from Linux version 5.1.

Alistair

>
> --
> Joseph S. Myers
> joseph@codesourcery.com
  
Lukasz Majewski Aug. 27, 2019, 9:53 p.m. UTC | #4
Hi Alistair,

> On Tue, Aug 27, 2019 at 12:35 PM Joseph Myers
> <joseph@codesourcery.com> wrote:
> >
> > On Tue, 27 Aug 2019, Zack Weinberg wrote:
> >  
> > > I think that gets everything important.  Neither the commit
> > > message nor the comment needs to list all the new system calls.  
> >
> > The syscalls need to be listed to make clear exactly what the
> > interfaces covered by the macro are.  
> 
> I'm a little confused, should I change anything or leave the patch as
> is?

Although, I do have the little experience in the glibc development, I
would like to propose the patch description (commit message and
in-header comment) as in v8:

https://patchwork.ozlabs.org/patch/1117100/

The commit message is very short as the all info is in the comment.

Moreover, I would like to point out that the v8 had issues with wording
and IMHO in general reflected the idea behind __ASSUME_TIME64_SYSCALLS
flag.

Maybe we could refine v8 and v9 (reworded by Alistair after Joseph's
comments) ?

Zack, would you consider re-reading v8 and v9 and refining them ?

> 
> >
> > * If further related syscalls are added in future, e.g. timespec64
> > versions of syscalls that currently use struct rusage, they are
> > *not* covered by this macro.
> >
> > * If the semantics of the unsuffixed syscall on 64-bit
> > architectures do not exactly match those of the suffixed syscall on
> > 32-bit architectures, so that a #define of the suffixed name to the
> > unsuffixed name doesn't suffice to use the existing syscall on
> > 64-bit architectures, it is *not* covered by this macro, at least
> > not without an explanation of the semantic differences in the
> > comment.  (See the semtimedop discussion.)
> >
> > * If the unsuffixed syscall is not in fact available on all 64-bit
> > architectures supported by glibc in the minimum supported kernel
> > version, it is *not* covered by this macro.  (Although treating it
> > as covered would only cause issues if there is actually any
> > fallback code for the case where the unsuffixed syscall isn't
> > present either.)
> >
> > The fact that the syscalls described by the macro may be suffixed or
> > unsuffixed also needs to be in the comment, not just the commit
> > message.  
> 
> Is this enough to describe that (already in the commit)?
> 
>    On systems with __WORDSIZE == 64 the __NR_clock_settime syscall is
> used to achieve this goal. Systems with __WORDSIZE == 32 use the
>    __NR_clock_settime64 syscall available from Linux version 5.1.
> 
> Alistair
> 
> >
> > --
> > Joseph S. Myers
> > joseph@codesourcery.com  



Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Joseph Myers Aug. 27, 2019, 9:55 p.m. UTC | #5
On Tue, 27 Aug 2019, Alistair Francis wrote:

> On Tue, Aug 27, 2019 at 12:35 PM Joseph Myers <joseph@codesourcery.com> wrote:
> >
> > On Tue, 27 Aug 2019, Zack Weinberg wrote:
> >
> > > I think that gets everything important.  Neither the commit message
> > > nor the comment needs to list all the new system calls.
> >
> > The syscalls need to be listed to make clear exactly what the interfaces
> > covered by the macro are.
> 
> I'm a little confused, should I change anything or leave the patch as is?

My comments were on Zack's version (missing the list of syscalls).  I 
believe Zack's version is not ready to go in as is but needs further 
changes for the issues I described (listing the syscalls and referring to 
the possibility that they may be suffixed or unsuffixed).
  
Zack Weinberg Aug. 27, 2019, 10:21 p.m. UTC | #6
On Tue, Aug 27, 2019 at 5:55 PM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Tue, 27 Aug 2019, Alistair Francis wrote:
>
> > On Tue, Aug 27, 2019 at 12:35 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > >
> > > On Tue, 27 Aug 2019, Zack Weinberg wrote:
> > >
> > > > I think that gets everything important.  Neither the commit message
> > > > nor the comment needs to list all the new system calls.
> > >
> > > The syscalls need to be listed to make clear exactly what the interfaces
> > > covered by the macro are.
> >
> > I'm a little confused, should I change anything or leave the patch as is?
>
> My comments were on Zack's version (missing the list of syscalls).  I
> believe Zack's version is not ready to go in as is but needs further
> changes for the issues I described (listing the syscalls and referring to
> the possibility that they may be suffixed or unsuffixed).

I will reread v8 and v9 and send another draft tomorrow.

zw
  
Rich Felker Aug. 30, 2019, 1:38 p.m. UTC | #7
On Tue, Aug 27, 2019 at 07:35:14PM +0000, Joseph Myers wrote:
> On Tue, 27 Aug 2019, Zack Weinberg wrote:
> 
> > I think that gets everything important.  Neither the commit message
> > nor the comment needs to list all the new system calls.
> 
> The syscalls need to be listed to make clear exactly what the interfaces 
> covered by the macro are.
> 
> * If further related syscalls are added in future, e.g. timespec64 
> versions of syscalls that currently use struct rusage, they are *not* 
> covered by this macro.
> 
> * If the semantics of the unsuffixed syscall on 64-bit architectures do 
> not exactly match those of the suffixed syscall on 32-bit architectures, 
> so that a #define of the suffixed name to the unsuffixed name doesn't 
> suffice to use the existing syscall on 64-bit architectures, it is *not* 
> covered by this macro, at least not without an explanation of the semantic 
> differences in the comment.  (See the semtimedop discussion.)

To clarify, none of the timespec ones "exactly match" -- the suffixed
syscalls on 32-bit require filling the padding around tv_nsec, whereas
the unsuffixed 64-bit ones don't (because there is no padding since
long is 64-bit). This does not affect using the unsuffixed 64-bit
syscall in place of the suffixed syscall (which AIUI is the usage you
had in mind), but would affect the reverse substitution. Any updates
to wording around this should try to avoid making the matter more
confusing to readers, IMO.

Rich
  
Joseph Myers Aug. 30, 2019, 4:36 p.m. UTC | #8
On Fri, 30 Aug 2019, Rich Felker wrote:

> To clarify, none of the timespec ones "exactly match" -- the suffixed
> syscalls on 32-bit require filling the padding around tv_nsec, whereas

What do you mean by "require filling the padding"?  I thought the 
conclusion in the kernel was that it dealt with zeroing the padding when 
reading a timespec64 from userspace on a 32-bit system (with the caveat of 
that not happening for compat tasks under 64-bit kernels before 5.1.5, and 
so the question in 
<https://sourceware.org/ml/libc-alpha/2019-05/msg00698.html> of whether to 
treat 5.1.0 through 5.1.4 as buggy and unsupported).
  
Rich Felker Aug. 30, 2019, 5 p.m. UTC | #9
On Fri, Aug 30, 2019 at 04:36:55PM +0000, Joseph Myers wrote:
> On Fri, 30 Aug 2019, Rich Felker wrote:
> 
> > To clarify, none of the timespec ones "exactly match" -- the suffixed
> > syscalls on 32-bit require filling the padding around tv_nsec, whereas
> 
> What do you mean by "require filling the padding"?  I thought the 
> conclusion in the kernel was that it dealt with zeroing the padding when 
> reading a timespec64 from userspace on a 32-bit system (with the caveat of 
> that not happening for compat tasks under 64-bit kernels before 5.1.5, and 
> so the question in 
> <https://sourceware.org/ml/libc-alpha/2019-05/msg00698.html> of whether to 
> treat 5.1.0 through 5.1.4 as buggy and unsupported).

I wasn't aware of this change. Being that the unfortunate behavior
actually appeared in released kernel versions, I would lean towards
assuming userspace has to patch it up. I'm skeptical of just
considering these versions as "unusably buggy", since, at present and
until there are time64 binaries, they appear to work just fine. It
will only be at some point in the future when time64 userspace appears
that these kernel versions break things catastrophically.

Rich
  
Zack Weinberg Aug. 30, 2019, 5:03 p.m. UTC | #10
On Thu, Aug 29, 2019 at 6:00 PM Alistair Francis <alistair23@gmail.com> wrote:
> On Thu, Aug 29, 2019 at 1:24 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > On Wed, 28 Aug 2019, Zack Weinberg wrote:
> > > OK, how does this look?
> >
> > This version is OK.
>
> Great! Thanks for your help Zack.
>
> Do you want to send this out as a patch?

I could just go ahead and apply it to master, maybe that would be the
simplest thing?  Or would you prefer to incorporate it into either
your patch series or Lukasz' patch series, so that it lands along with
the rest of the work?  (In the latter case, you should be able to feed
my previous message directly to 'git am'.)

On Wed, Aug 28, 2019 at 5:46 PM Lukasz Majewski <lukma@denx.de> wrote:
> > +#ifndef _LINUX_KERNEL_FEATURES_H
> > +#define _LINUX_KERNEL_FEATURES_H 1
>
> I assume that the above is just plain "guard" define (without any extra
> meaning for glibc or other library)?

Yes.  We have a lot of internal headers that are missing
multiple-inclusion guards, so for some time now, whenever I need to
modify an internal header that doesn't have a guard, I add one at the
same time.

On Fri, Aug 30, 2019 at 12:37 PM Joseph Myers <joseph@codesourcery.com> wrote:
> On Fri, 30 Aug 2019, Rich Felker wrote:
> > To clarify, none of the timespec ones "exactly match" -- the suffixed
> > syscalls on 32-bit require filling the padding around tv_nsec, whereas
>
> What do you mean by "require filling the padding"?  I thought the
> conclusion in the kernel was that it dealt with zeroing the padding
...

The kernel should always explicitly clear all of the padding in any
structure it writes to user space, anything else risks leaking kernel
data (e.g. the compiler decides it can use a 64-bit load and store to
copy from the kernel's struct timespec to the user space struct
timespec because those high bits are don't-care in the destination
... oops, the previous allocation had a kernel pointer in that 64-bit
slot and we just exposed the KASLR base).

Regardless, though, I don't think the text I wrote needs to mention
this concern, since it actually doesn't talk at all about the "exactly
match" rule.  The set of system calls covered by the new macro is just
listed as a fact.  Please let me know if you disagree?

zw
  
Rich Felker Aug. 30, 2019, 5:09 p.m. UTC | #11
On Fri, Aug 30, 2019 at 01:03:16PM -0400, Zack Weinberg wrote:
> On Fri, Aug 30, 2019 at 12:37 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > On Fri, 30 Aug 2019, Rich Felker wrote:
> > > To clarify, none of the timespec ones "exactly match" -- the suffixed
> > > syscalls on 32-bit require filling the padding around tv_nsec, whereas
> >
> > What do you mean by "require filling the padding"?  I thought the
> > conclusion in the kernel was that it dealt with zeroing the padding
> ....
> 
> The kernel should always explicitly clear all of the padding in any
> structure it writes to user space, anything else risks leaking kernel
> data (e.g. the compiler decides it can use a 64-bit load and store to
> copy from the kernel's struct timespec to the user space struct
> timespec because those high bits are don't-care in the destination
> .... oops, the previous allocation had a kernel pointer in that 64-bit
> slot and we just exposed the KASLR base).

The issue is the other direction, when timespecs are passed to the
kernel (usually for read-only access), e.g. providing a timeout.

Rich
  
Joseph Myers Aug. 30, 2019, 5:16 p.m. UTC | #12
On Fri, 30 Aug 2019, Rich Felker wrote:

> On Fri, Aug 30, 2019 at 04:36:55PM +0000, Joseph Myers wrote:
> > On Fri, 30 Aug 2019, Rich Felker wrote:
> > 
> > > To clarify, none of the timespec ones "exactly match" -- the suffixed
> > > syscalls on 32-bit require filling the padding around tv_nsec, whereas
> > 
> > What do you mean by "require filling the padding"?  I thought the 
> > conclusion in the kernel was that it dealt with zeroing the padding when 
> > reading a timespec64 from userspace on a 32-bit system (with the caveat of 
> > that not happening for compat tasks under 64-bit kernels before 5.1.5, and 
> > so the question in 
> > <https://sourceware.org/ml/libc-alpha/2019-05/msg00698.html> of whether to 
> > treat 5.1.0 through 5.1.4 as buggy and unsupported).
> 
> I wasn't aware of this change. Being that the unfortunate behavior
> actually appeared in released kernel versions, I would lean towards
> assuming userspace has to patch it up. I'm skeptical of just

If it has to patch it up, it clearly does not need to do so when 5.1.5 or 
later is the configured minimum kernel - or on 32-bit-only architectures 
where the compat syscall case is irrelevant.  (Whether we wish to include 
a special case for that as a micro-optimization in code which already has 
a lot of complicated conditionals is another matter, but if we include 
code to patch things up it should be arranged in a way that makes it easy 
to find and remove all of it once 5.1.5 or later is the minimum supported 
kernel version for glibc.)
  
Zack Weinberg Aug. 30, 2019, 5:19 p.m. UTC | #13
On Fri, Aug 30, 2019 at 1:09 PM Rich Felker <dalias@libc.org> wrote:
> On Fri, Aug 30, 2019 at 01:03:16PM -0400, Zack Weinberg wrote:
> > On Fri, Aug 30, 2019 at 12:37 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > > On Fri, 30 Aug 2019, Rich Felker wrote:
> > > > To clarify, none of the timespec ones "exactly match" -- the suffixed
> > > > syscalls on 32-bit require filling the padding around tv_nsec, whereas
> > >
> > > What do you mean by "require filling the padding"?  I thought the
> > > conclusion in the kernel was that it dealt with zeroing the padding
> > ....
> >
> > The kernel should always explicitly clear all of the padding in any
> > structure it writes to user space, anything else risks leaking kernel
> > data (e.g. the compiler decides it can use a 64-bit load and store to
> > copy from the kernel's struct timespec to the user space struct
> > timespec because those high bits are don't-care in the destination
> > .... oops, the previous allocation had a kernel pointer in that 64-bit
> > slot and we just exposed the KASLR base).
>
> The issue is the other direction, when timespecs are passed to the
> kernel (usually for read-only access), e.g. providing a timeout.

Sorry, I misunderstood which direction you were talking about.

I think ignoring the padding bits has to be the kernel's
responsibility, because the C library can't assume that these pointers
refer to writable storage, nor can it assume that it is safe (or
acceptably efficient) to copy the data structure.

(Do we still have to pretend that using bare 'long' for the type of
tv_nsec isn't a defect in the relevant standards?)

zw
  
Alistair Francis Aug. 30, 2019, 5:24 p.m. UTC | #14
On Fri, Aug 30, 2019 at 10:03 AM Zack Weinberg <zackw@panix.com> wrote:
>
> On Thu, Aug 29, 2019 at 6:00 PM Alistair Francis <alistair23@gmail.com> wrote:
> > On Thu, Aug 29, 2019 at 1:24 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > > On Wed, 28 Aug 2019, Zack Weinberg wrote:
> > > > OK, how does this look?
> > >
> > > This version is OK.
> >
> > Great! Thanks for your help Zack.
> >
> > Do you want to send this out as a patch?
>
> I could just go ahead and apply it to master, maybe that would be the
> simplest thing?  Or would you prefer to incorporate it into either
> your patch series or Lukasz' patch series, so that it lands along with
> the rest of the work?  (In the latter case, you should be able to feed
> my previous message directly to 'git am'.)

Yes, applying it to master is even better!

Alistair

>
> On Wed, Aug 28, 2019 at 5:46 PM Lukasz Majewski <lukma@denx.de> wrote:
> > > +#ifndef _LINUX_KERNEL_FEATURES_H
> > > +#define _LINUX_KERNEL_FEATURES_H 1
> >
> > I assume that the above is just plain "guard" define (without any extra
> > meaning for glibc or other library)?
>
> Yes.  We have a lot of internal headers that are missing
> multiple-inclusion guards, so for some time now, whenever I need to
> modify an internal header that doesn't have a guard, I add one at the
> same time.
>
> On Fri, Aug 30, 2019 at 12:37 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > On Fri, 30 Aug 2019, Rich Felker wrote:
> > > To clarify, none of the timespec ones "exactly match" -- the suffixed
> > > syscalls on 32-bit require filling the padding around tv_nsec, whereas
> >
> > What do you mean by "require filling the padding"?  I thought the
> > conclusion in the kernel was that it dealt with zeroing the padding
> ...
>
> The kernel should always explicitly clear all of the padding in any
> structure it writes to user space, anything else risks leaking kernel
> data (e.g. the compiler decides it can use a 64-bit load and store to
> copy from the kernel's struct timespec to the user space struct
> timespec because those high bits are don't-care in the destination
> ... oops, the previous allocation had a kernel pointer in that 64-bit
> slot and we just exposed the KASLR base).
>
> Regardless, though, I don't think the text I wrote needs to mention
> this concern, since it actually doesn't talk at all about the "exactly
> match" rule.  The set of system calls covered by the new macro is just
> listed as a fact.  Please let me know if you disagree?
>
> zw
  
Rich Felker Aug. 30, 2019, 6:26 p.m. UTC | #15
On Fri, Aug 30, 2019 at 01:19:27PM -0400, Zack Weinberg wrote:
> On Fri, Aug 30, 2019 at 1:09 PM Rich Felker <dalias@libc.org> wrote:
> > On Fri, Aug 30, 2019 at 01:03:16PM -0400, Zack Weinberg wrote:
> > > On Fri, Aug 30, 2019 at 12:37 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > > > On Fri, 30 Aug 2019, Rich Felker wrote:
> > > > > To clarify, none of the timespec ones "exactly match" -- the suffixed
> > > > > syscalls on 32-bit require filling the padding around tv_nsec, whereas
> > > >
> > > > What do you mean by "require filling the padding"?  I thought the
> > > > conclusion in the kernel was that it dealt with zeroing the padding
> > > ....
> > >
> > > The kernel should always explicitly clear all of the padding in any
> > > structure it writes to user space, anything else risks leaking kernel
> > > data (e.g. the compiler decides it can use a 64-bit load and store to
> > > copy from the kernel's struct timespec to the user space struct
> > > timespec because those high bits are don't-care in the destination
> > > .... oops, the previous allocation had a kernel pointer in that 64-bit
> > > slot and we just exposed the KASLR base).
> >
> > The issue is the other direction, when timespecs are passed to the
> > kernel (usually for read-only access), e.g. providing a timeout.
> 
> Sorry, I misunderstood which direction you were talking about.
> 
> I think ignoring the padding bits has to be the kernel's
> responsibility, because the C library can't assume that these pointers
> refer to writable storage, nor can it assume that it is safe (or
> acceptably efficient) to copy the data structure.

You don't patch them up. You copy them. E.g. instead of passing ts,
pass ts?(struct timespec[]){*ts}:0.

> (Do we still have to pretend that using bare 'long' for the type of
> tv_nsec isn't a defect in the relevant standards?)

It's not a defect. There's no reason for a special typedef for this
field because long can necessarily represent the range of nanoseconds,
[0,999999999].

Rich
  
Zack Weinberg Aug. 30, 2019, 6:59 p.m. UTC | #16
On Fri, Aug 30, 2019 at 2:26 PM Rich Felker <dalias@libc.org> wrote:
> > I think ignoring the padding bits has to be the kernel's
> > responsibility, because the C library can't assume that these pointers
> > refer to writable storage, nor can it assume that it is safe (or
> > acceptably efficient) to copy the data structure.
>
> You don't patch them up. You copy them. E.g. instead of passing ts,
> pass ts?(struct timespec[]){*ts}:0.

We can't assume we can do that: the structure might be too large, or
its address might be significant.  In fact, I distinctly remember we
tried that for some other kernel/user ABI mismatch a couple years ago
and it broke stuff and we had to back it out.

> > (Do we still have to pretend that using bare 'long' for the type of
> > tv_nsec isn't a defect in the relevant standards?)
>
> It's not a defect. There's no reason for a special typedef for this
> field because long can necessarily represent the range of nanoseconds,
> [0,999999999].

It's a defect! It's literally the only thing stopping us from making
the user space definition of struct timespec agree with the kernel
definition, which would make this entire problem vanish! Why am I the
only person who sees this! I feel like I'm taking crazy pills!
</mugatu>

zw
  
Paul Eggert Aug. 30, 2019, 7:11 p.m. UTC | #17
On 8/30/19 11:59 AM, Zack Weinberg wrote:
>>> (Do we still have to pretend that using bare 'long' for the type of
>>> tv_nsec isn't a defect in the relevant standards?)
>> It's not a defect. There's no reason for a special typedef for this
>> field because long can necessarily represent the range of nanoseconds,
>> [0,999999999].
> It's a defect!

+1. 'long' causes problems in practice and POSIX should be fixed. It's 
not just the problem Zack mentioned; I have application code where the 
build broke because the app unwisely trusted the POSIX spec's saying 
'long' and assigned the address of tv_nsec to a long * pointer, 
something that doesn't work on x32.
  
Joseph Myers Aug. 30, 2019, 7:27 p.m. UTC | #18
On Fri, 30 Aug 2019, Zack Weinberg wrote:

> It's a defect! It's literally the only thing stopping us from making
> the user space definition of struct timespec agree with the kernel
> definition, which would make this entire problem vanish!

It's not the job of POSIX to make it easy for one bit (glibc) of an 
implementation to compensate for non-POSIX interfaces in another bit (the 
Linux kernel) of the same implementation.  As far as POSIX is concerned 
it's one implementation.
  
Rich Felker Aug. 30, 2019, 8:18 p.m. UTC | #19
On Fri, Aug 30, 2019 at 02:59:05PM -0400, Zack Weinberg wrote:
> On Fri, Aug 30, 2019 at 2:26 PM Rich Felker <dalias@libc.org> wrote:
> > > I think ignoring the padding bits has to be the kernel's
> > > responsibility, because the C library can't assume that these pointers
> > > refer to writable storage, nor can it assume that it is safe (or
> > > acceptably efficient) to copy the data structure.
> >
> > You don't patch them up. You copy them. E.g. instead of passing ts,
> > pass ts?(struct timespec[]){*ts}:0.
> 
> We can't assume we can do that: the structure might be too large, or
> its address might be significant.  In fact, I distinctly remember we
> tried that for some other kernel/user ABI mismatch a couple years ago
> and it broke stuff and we had to back it out.

timespecs passed to the kernel do not appear in any large (or even
largeish) structures except timex, which is very special-purpose and
modified by the syscall anyway. So this is not an issue.

The other case was msghdr and cmsghdr for sockets which is much more
of a problem, especially when they occur in arrays for the Linux
"mmsg" extensions.

> > > (Do we still have to pretend that using bare 'long' for the type of
> > > tv_nsec isn't a defect in the relevant standards?)
> >
> > It's not a defect. There's no reason for a special typedef for this
> > field because long can necessarily represent the range of nanoseconds,
> > [0,999999999].
> 
> It's a defect! It's literally the only thing stopping us from making
> the user space definition of struct timespec agree with the kernel
> definition, which would make this entire problem vanish! Why am I the
> only person who sees this! I feel like I'm taking crazy pills!
> </mugatu>

A defect in the kernel, not the standard. And one that's been fixed,
but the fix seems to have been too late to depend on it. :( And it was
too late for x32 anyway.

Rich
  
Rich Felker Aug. 30, 2019, 8:20 p.m. UTC | #20
On Fri, Aug 30, 2019 at 12:11:24PM -0700, Paul Eggert wrote:
> On 8/30/19 11:59 AM, Zack Weinberg wrote:
> >>>(Do we still have to pretend that using bare 'long' for the type of
> >>>tv_nsec isn't a defect in the relevant standards?)
> >>It's not a defect. There's no reason for a special typedef for this
> >>field because long can necessarily represent the range of nanoseconds,
> >>[0,999999999].
> >It's a defect!
> 
> +1. 'long' causes problems in practice and POSIX should be fixed.
> It's not just the problem Zack mentioned; I have application code
> where the build broke because the app unwisely trusted the POSIX
> spec's saying 'long' and assigned the address of tv_nsec to a long *
> pointer, something that doesn't work on x32.

So fix x32. This is an implementation bug. You can't invalidate
valid application code, especially now that it's not just POSIX but
ISO C that defines timespec.

Rich
  
Lukasz Majewski Sept. 2, 2019, 8:45 a.m. UTC | #21
Hi Alistair, Zack,

> On Fri, Aug 30, 2019 at 10:03 AM Zack Weinberg <zackw@panix.com>
> wrote:
> >
> > On Thu, Aug 29, 2019 at 6:00 PM Alistair Francis
> > <alistair23@gmail.com> wrote:  
> > > On Thu, Aug 29, 2019 at 1:24 PM Joseph Myers
> > > <joseph@codesourcery.com> wrote:  
> > > > On Wed, 28 Aug 2019, Zack Weinberg wrote:  
> > > > > OK, how does this look?  
> > > >
> > > > This version is OK.  
> > >
> > > Great! Thanks for your help Zack.
> > >
> > > Do you want to send this out as a patch?  
> >
> > I could just go ahead and apply it to master, maybe that would be
> > the simplest thing?  Or would you prefer to incorporate it into
> > either your patch series or Lukasz' patch series, so that it lands
> > along with the rest of the work?  (In the latter case, you should
> > be able to feed my previous message directly to 'git am'.)  
> 
> Yes, applying it to master is even better!

+1

(I also would prefer to have this flag applied directly to the master
branch, without the need wait for any other patches - to avoid blocking
work, which can be done in parallel).

> 
> Alistair
> 
> >
> > On Wed, Aug 28, 2019 at 5:46 PM Lukasz Majewski <lukma@denx.de>
> > wrote:  
> > > > +#ifndef _LINUX_KERNEL_FEATURES_H
> > > > +#define _LINUX_KERNEL_FEATURES_H 1  
> > >
> > > I assume that the above is just plain "guard" define (without any
> > > extra meaning for glibc or other library)?  
> >
> > Yes.  We have a lot of internal headers that are missing
> > multiple-inclusion guards, so for some time now, whenever I need to
> > modify an internal header that doesn't have a guard, I add one at
> > the same time.
> >
> > On Fri, Aug 30, 2019 at 12:37 PM Joseph Myers
> > <joseph@codesourcery.com> wrote:  
> > > On Fri, 30 Aug 2019, Rich Felker wrote:  
> > > > To clarify, none of the timespec ones "exactly match" -- the
> > > > suffixed syscalls on 32-bit require filling the padding around
> > > > tv_nsec, whereas  
> > >
> > > What do you mean by "require filling the padding"?  I thought the
> > > conclusion in the kernel was that it dealt with zeroing the
> > > padding  
> > ...
> >
> > The kernel should always explicitly clear all of the padding in any
> > structure it writes to user space, anything else risks leaking
> > kernel data (e.g. the compiler decides it can use a 64-bit load and
> > store to copy from the kernel's struct timespec to the user space
> > struct timespec because those high bits are don't-care in the
> > destination ... oops, the previous allocation had a kernel pointer
> > in that 64-bit slot and we just exposed the KASLR base).
> >
> > Regardless, though, I don't think the text I wrote needs to mention
> > this concern, since it actually doesn't talk at all about the
> > "exactly match" rule.  The set of system calls covered by the new
> > macro is just listed as a fact.  Please let me know if you disagree?
> >
> > zw  



Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Florian Weimer Sept. 4, 2019, 12:34 p.m. UTC | #22
* Rich Felker:

> On Fri, Aug 30, 2019 at 01:19:27PM -0400, Zack Weinberg wrote:
>> On Fri, Aug 30, 2019 at 1:09 PM Rich Felker <dalias@libc.org> wrote:
>> > On Fri, Aug 30, 2019 at 01:03:16PM -0400, Zack Weinberg wrote:
>> > > On Fri, Aug 30, 2019 at 12:37 PM Joseph Myers <joseph@codesourcery.com> wrote:
>> > > > On Fri, 30 Aug 2019, Rich Felker wrote:
>> > > > > To clarify, none of the timespec ones "exactly match" -- the suffixed
>> > > > > syscalls on 32-bit require filling the padding around tv_nsec, whereas
>> > > >
>> > > > What do you mean by "require filling the padding"?  I thought the
>> > > > conclusion in the kernel was that it dealt with zeroing the padding
>> > > ....
>> > >
>> > > The kernel should always explicitly clear all of the padding in any
>> > > structure it writes to user space, anything else risks leaking kernel
>> > > data (e.g. the compiler decides it can use a 64-bit load and store to
>> > > copy from the kernel's struct timespec to the user space struct
>> > > timespec because those high bits are don't-care in the destination
>> > > .... oops, the previous allocation had a kernel pointer in that 64-bit
>> > > slot and we just exposed the KASLR base).
>> >
>> > The issue is the other direction, when timespecs are passed to the
>> > kernel (usually for read-only access), e.g. providing a timeout.
>> 
>> Sorry, I misunderstood which direction you were talking about.
>> 
>> I think ignoring the padding bits has to be the kernel's
>> responsibility, because the C library can't assume that these pointers
>> refer to writable storage, nor can it assume that it is safe (or
>> acceptably efficient) to copy the data structure.
>
> You don't patch them up. You copy them. E.g. instead of passing ts,
> pass ts?(struct timespec[]){*ts}:0.

I don't see how this clears the padding.  I don't think we can support
any concise syntax for that:

If we give the padding a name, this will not work on big-endian targets

  ts?(struct timespec[]){ts->tv_sec, ts->tv_nsec}:0

because the padding is in the middle, and ts->tv_nsec is copied into the
padding.

If we do not give the padding a name, the expression leaves the padding
uninitialized (C11: “Unnamed members of structure objects have
indeterminate value even after initialization.”).

And who should do this?  I think for pselect, glibc has to do it, but
what about ioctls and socket options?

Thanks,
Florian
  
Rich Felker Sept. 4, 2019, 12:50 p.m. UTC | #23
On Wed, Sep 04, 2019 at 02:34:45PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Fri, Aug 30, 2019 at 01:19:27PM -0400, Zack Weinberg wrote:
> >> On Fri, Aug 30, 2019 at 1:09 PM Rich Felker <dalias@libc.org> wrote:
> >> > On Fri, Aug 30, 2019 at 01:03:16PM -0400, Zack Weinberg wrote:
> >> > > On Fri, Aug 30, 2019 at 12:37 PM Joseph Myers <joseph@codesourcery.com> wrote:
> >> > > > On Fri, 30 Aug 2019, Rich Felker wrote:
> >> > > > > To clarify, none of the timespec ones "exactly match" -- the suffixed
> >> > > > > syscalls on 32-bit require filling the padding around tv_nsec, whereas
> >> > > >
> >> > > > What do you mean by "require filling the padding"?  I thought the
> >> > > > conclusion in the kernel was that it dealt with zeroing the padding
> >> > > ....
> >> > >
> >> > > The kernel should always explicitly clear all of the padding in any
> >> > > structure it writes to user space, anything else risks leaking kernel
> >> > > data (e.g. the compiler decides it can use a 64-bit load and store to
> >> > > copy from the kernel's struct timespec to the user space struct
> >> > > timespec because those high bits are don't-care in the destination
> >> > > .... oops, the previous allocation had a kernel pointer in that 64-bit
> >> > > slot and we just exposed the KASLR base).
> >> >
> >> > The issue is the other direction, when timespecs are passed to the
> >> > kernel (usually for read-only access), e.g. providing a timeout.
> >> 
> >> Sorry, I misunderstood which direction you were talking about.
> >> 
> >> I think ignoring the padding bits has to be the kernel's
> >> responsibility, because the C library can't assume that these pointers
> >> refer to writable storage, nor can it assume that it is safe (or
> >> acceptably efficient) to copy the data structure.
> >
> > You don't patch them up. You copy them. E.g. instead of passing ts,
> > pass ts?(struct timespec[]){*ts}:0.
> 
> I don't see how this clears the padding.  I don't think we can support
> any concise syntax for that:
> 
> If we give the padding a name, this will not work on big-endian targets
> 
>   ts?(struct timespec[]){ts->tv_sec, ts->tv_nsec}:0
> 
> because the padding is in the middle, and ts->tv_nsec is copied into the
> padding.

While it may be desirable for compatibility with wrong code for this
sort of initializer to work, there is no specification on the order or
lack of other named members in timespec. Correct initialization has to
use designated initializers. I tried to sidestep the problem by using
*ts, but indeed I botched that; it would not zero the padding.

Actually what I did on musl was just use (long long[]){s,ns} as the
argument to the (time64) syscall, which decouples the userspace
definition of the type from the kernel interface and makes it so we
can use unnamed padding. (And of course no padding at all would work
with this, but we want to match ABI and be able to directly use
timespecs written to userspace by the kernel.)

> If we do not give the padding a name, the expression leaves the padding
> uninitialized (C11: “Unnamed members of structure objects have
> indeterminate value even after initialization.”).
> 
> And who should do this?  I think for pselect, glibc has to do it, but
> what about ioctls and socket options?

These need translation logic anyway or post-time64 software won't run
on pre-time64 kernels. See

https://git.musl-libc.org/cgit/musl/commit/?id=51fd67fcbfa598e2fe1885b517451b84c0bfe3b7

So it's not a big deal to also zero the padding if needed. Ideally
though the kernel should handle this right, especially for new
ioctls/sockopts added post-time64 where there's no need for fallback
translation.

Rich
  
Florian Weimer Sept. 4, 2019, 2:56 p.m. UTC | #24
* Rich Felker:

>> If we give the padding a name, this will not work on big-endian targets
>> 
>>   ts?(struct timespec[]){ts->tv_sec, ts->tv_nsec}:0
>> 
>> because the padding is in the middle, and ts->tv_nsec is copied into the
>> padding.
>
> While it may be desirable for compatibility with wrong code for this
> sort of initializer to work, there is no specification on the order or
> lack of other named members in timespec. Correct initialization has to
> use designated initializers. I tried to sidestep the problem by using
> *ts, but indeed I botched that; it would not zero the padding.

Wrong initializers are fairly common, unfortunately:

<https://codesearch.debian.net/search?q=struct+timespec.*+%3D+%5C%7B%5Cs*%5B%5E+.0%5D&literal=0&perpkg=1>

>> If we do not give the padding a name, the expression leaves the padding
>> uninitialized (C11: “Unnamed members of structure objects have
>> indeterminate value even after initialization.”).
>> 
>> And who should do this?  I think for pselect, glibc has to do it, but
>> what about ioctls and socket options?
>
> These need translation logic anyway or post-time64 software won't run
> on pre-time64 kernels. See
>
> https://git.musl-libc.org/cgit/musl/commit/?id=51fd67fcbfa598e2fe1885b517451b84c0bfe3b7
>
> So it's not a big deal to also zero the padding if needed.

That's not in the patch, right?

Is the set of emulations required fixed?  In the sense that there will
be no new kernel features which are 32-bit only?

Only in that case, I think userspace emulation is feasible.  Getting
full coverage for ioctls will still be difficult, I think.

I wonder if we should define a proper type (with long long int members)
for applications to use with ioctls.  That would enable concise syntax
and documenting intent.

Or we could work with WG14 and POSIX and introduce a snseconds_t type,
formally blessing what x32 and the kernel already do.  If we can get
this changed, I'm not sure if supporting older C and POSIX standards
(which mandate long) is worth the cost.

> Ideally though the kernel should handle this right, especially for new
> ioctls/sockopts added post-time64 where there's no need for fallback
> translation.

Let's hope so.

Thanks,
Florian
  
Lukasz Majewski Sept. 4, 2019, 4:14 p.m. UTC | #25
Hi Florian, Rich,

> * Rich Felker:
> 
> >> If we give the padding a name, this will not work on big-endian
> >> targets
> >> 
> >>   ts?(struct timespec[]){ts->tv_sec, ts->tv_nsec}:0
> >> 
> >> because the padding is in the middle, and ts->tv_nsec is copied
> >> into the padding.  
> >
> > While it may be desirable for compatibility with wrong code for this
> > sort of initializer to work, there is no specification on the order
> > or lack of other named members in timespec. Correct initialization
> > has to use designated initializers. I tried to sidestep the problem
> > by using *ts, but indeed I botched that; it would not zero the
> > padding.  
> 
> Wrong initializers are fairly common, unfortunately:
> 
> <https://codesearch.debian.net/search?q=struct+timespec.*+%3D+%5C%7B%5Cs*%5B%5E+.0%5D&literal=0&perpkg=1>

As fair as I remember I had a discussion about the padding and explicit
initialization with Joseph:
https://patchwork.ozlabs.org/patch/1066737/#2155140


There was also a conclusion regarding explicit clearing of the padding
if necessary - kernel shall do it (or to be more precise - kernel will
perform modulo operation of the number passed to it from glibc if
needed).

> 
> >> If we do not give the padding a name, the expression leaves the
> >> padding uninitialized (C11: “Unnamed members of structure objects
> >> have indeterminate value even after initialization.”).
> >> 
> >> And who should do this?  I think for pselect, glibc has to do it,
> >> but what about ioctls and socket options?  
> >
> > These need translation logic anyway or post-time64 software won't
> > run on pre-time64 kernels. See
> >
> > https://git.musl-libc.org/cgit/musl/commit/?id=51fd67fcbfa598e2fe1885b517451b84c0bfe3b7
> >
> > So it's not a big deal to also zero the padding if needed.  
> 
> That's not in the patch, right?
> 
> Is the set of emulations required fixed?  In the sense that there will
> be no new kernel features which are 32-bit only?
> 
> Only in that case, I think userspace emulation is feasible.  Getting
> full coverage for ioctls will still be difficult, I think.
> 
> I wonder if we should define a proper type (with long long int
> members) for applications to use with ioctls.  That would enable
> concise syntax and documenting intent.
> 
> Or we could work with WG14 and POSIX and introduce a snseconds_t type,
> formally blessing what x32 and the kernel already do.  If we can get
> this changed, I'm not sure if supporting older C and POSIX standards
> (which mandate long) is worth the cost.
> 
> > Ideally though the kernel should handle this right, especially for
> > new ioctls/sockopts added post-time64 where there's no need for
> > fallback translation.  
> 
> Let's hope so.
> 
> Thanks,
> Florian



Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
  
Florian Weimer Sept. 4, 2019, 4:41 p.m. UTC | #26
* Lukasz Majewski:

> There was also a conclusion regarding explicit clearing of the padding
> if necessary - kernel shall do it (or to be more precise - kernel will
> perform modulo operation of the number passed to it from glibc if
> needed).

But that's not what the kernel ended up doing, right?

Thanks,
Florian
  
Joseph Myers Sept. 4, 2019, 4:49 p.m. UTC | #27
On Wed, 4 Sep 2019, Florian Weimer wrote:

> * Lukasz Majewski:
> 
> > There was also a conclusion regarding explicit clearing of the padding
> > if necessary - kernel shall do it (or to be more precise - kernel will
> > perform modulo operation of the number passed to it from glibc if
> > needed).
> 
> But that's not what the kernel ended up doing, right?

My understanding is that it did - except for the case of compat syscalls 
under 64-bit kernels 5.1.0 through 5.1.4.
  
Rich Felker Sept. 4, 2019, 4:53 p.m. UTC | #28
On Wed, Sep 04, 2019 at 04:56:45PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> >> If we give the padding a name, this will not work on big-endian targets
> >> 
> >>   ts?(struct timespec[]){ts->tv_sec, ts->tv_nsec}:0
> >> 
> >> because the padding is in the middle, and ts->tv_nsec is copied into the
> >> padding.
> >
> > While it may be desirable for compatibility with wrong code for this
> > sort of initializer to work, there is no specification on the order or
> > lack of other named members in timespec. Correct initialization has to
> > use designated initializers. I tried to sidestep the problem by using
> > *ts, but indeed I botched that; it would not zero the padding.
> 
> Wrong initializers are fairly common, unfortunately:
> 
> <https://codesearch.debian.net/search?q=struct+timespec.*+%3D+%5C%7B%5Cs*%5B%5E+.0%5D&literal=0&perpkg=1>

Indeed, I believe you on that. :-P

> >> If we do not give the padding a name, the expression leaves the padding
> >> uninitialized (C11: “Unnamed members of structure objects have
> >> indeterminate value even after initialization.”).
> >> 
> >> And who should do this?  I think for pselect, glibc has to do it, but
> >> what about ioctls and socket options?
> >
> > These need translation logic anyway or post-time64 software won't run
> > on pre-time64 kernels. See
> >
> > https://git.musl-libc.org/cgit/musl/commit/?id=51fd67fcbfa598e2fe1885b517451b84c0bfe3b7
> >
> > So it's not a big deal to also zero the padding if needed.
> 
> That's not in the patch, right?

I'm not aware of any equivalent proposed code for glibc. But in the
absence of it, there's a major disincentive to adopt _TIME_BITS=64
until pre-5.1 kernels are mostly gone, since certain functionality
that's been present for decades will suddenly disappear (behaving as
if unsupported).

> Is the set of emulations required fixed?  In the sense that there will
> be no new kernel features which are 32-bit only?

For the most part, yes -- it's sockopt/ioctl interfaces that used
timeval or timespec and that existed prior to time64 support. The only
place it might be unbounded is when out-of-tree special-purpose
drivers or other kernel modules define ioctl interfaces involving
these types, and get built for pre-5.1 kernels. IMO any breakage in
them is much less of a big deal and could probably be dealt with via
local hacks in the (usually hardware-specific) software, or (for
glibc) just always building the small amount of affected software with
_TIME_BITS=32 on 32-bit archs until old kernels are no longer
relevant.

> Only in that case, I think userspace emulation is feasible.  Getting
> full coverage for ioctls will still be difficult, I think.

Yes, I still don't know the scope of the ioctl problem. I've only
worked on the sockopts and ioctls that are for use with sockets.

> I wonder if we should define a proper type (with long long int members)
> for applications to use with ioctls.  That would enable concise syntax
> and documenting intent.
> 
> Or we could work with WG14 and POSIX and introduce a snseconds_t type,
> formally blessing what x32 and the kernel already do.  If we can get
> this changed, I'm not sure if supporting older C and POSIX standards
> (which mandate long) is worth the cost.

I think the kernel is already treating (or intending to treat) the
padding correctly here. (Can someone confirm this? Arnd?) The problem
is not the case where the kernel accepts time64 but needs help fixing
up the padding. The problem is the case where the kernel only supports
the old time32 version of the sockopt/ioctl and userspace has 64-bit
time_t.

Rich
  
Florian Weimer Sept. 4, 2019, 4:53 p.m. UTC | #29
* Joseph Myers:

> On Wed, 4 Sep 2019, Florian Weimer wrote:
>
>> * Lukasz Majewski:
>> 
>> > There was also a conclusion regarding explicit clearing of the padding
>> > if necessary - kernel shall do it (or to be more precise - kernel will
>> > perform modulo operation of the number passed to it from glibc if
>> > needed).
>> 
>> But that's not what the kernel ended up doing, right?
>
> My understanding is that it did - except for the case of compat syscalls 
> under 64-bit kernels 5.1.0 through 5.1.4.

Oh!

Then I don't quite get why we are having this conversation at all.  The
anonymous bit-field hack should solve this quite nicely if we just say
that the exceptional range of kernels you indicated is too buggy to
support in glibc.

What am I missing?

Thanks,
Florian
  
Rich Felker Sept. 4, 2019, 5:06 p.m. UTC | #30
On Wed, Sep 04, 2019 at 06:53:54PM +0200, Florian Weimer wrote:
> * Joseph Myers:
> 
> > On Wed, 4 Sep 2019, Florian Weimer wrote:
> >
> >> * Lukasz Majewski:
> >> 
> >> > There was also a conclusion regarding explicit clearing of the padding
> >> > if necessary - kernel shall do it (or to be more precise - kernel will
> >> > perform modulo operation of the number passed to it from glibc if
> >> > needed).
> >> 
> >> But that's not what the kernel ended up doing, right?
> >
> > My understanding is that it did - except for the case of compat syscalls 
> > under 64-bit kernels 5.1.0 through 5.1.4.
> 
> Oh!
> 
> Then I don't quite get why we are having this conversation at all.  The
> anonymous bit-field hack should solve this quite nicely if we just say
> that the exceptional range of kernels you indicated is too buggy to
> support in glibc.
> 
> What am I missing?

I think you misunderstood my mail you were replying to in Message-ID
<871rwwp7dm.fsf@oldenburg2.str.redhat.com>. I was offering an easy way
(that was actually incorrect) to do the fixup to safely support
kernels 5.1.0-5.1.4. The motivation for doing so is as follows:

Normally, every now and then there's a range of kernel versions that
have a serious bug, and everybody quickly determines that they're not
usable. However for these versions, the "bug" doesn't affect any
existing 32-bit time32 software or 64-bit software, so it's plausible
that they'll stick around for a long time (as long as software built
with _TIME_BITS=64 is rare or nonexistent). Then, when someone
upgrades userspace (possibly even in a container, unrelated to the
host distro providing the bad kernel), everything will break
catastrophically, immediately 'timing out' with EINVAL or spinning at
100% cpu.

It may be that glibc just doesn't want to bother with supporting this.
I wouldn't blame you for taking that position. For musl, it allowed
unifying the x32 padding code paths, which were previously horrible
hacks, with the new general code for time64-on-32, which was already
motivation enough, and getting safety on 5.1.0-5.1.4 is a nice bonus.

Rich
  
Zack Weinberg Sept. 4, 2019, 8:14 p.m. UTC | #31
On Fri, Aug 30, 2019 at 1:28 PM Alistair Francis <alistair23@gmail.com> wrote:
> On Fri, Aug 30, 2019 at 10:03 AM Zack Weinberg <zackw@panix.com> wrote:
> > I could just go ahead and apply it to master, maybe that would be the
> > simplest thing?  Or would you prefer to incorporate it into either
> > your patch series or Lukasz' patch series, so that it lands along with
> > the rest of the work?  (In the latter case, you should be able to feed
> > my previous message directly to 'git am'.)
>
> Yes, applying it to master is even better!

OK, I have merged this now.

zw
  
Florian Weimer Sept. 9, 2019, 1:19 p.m. UTC | #32
* Rich Felker:

> On Wed, Sep 04, 2019 at 06:53:54PM +0200, Florian Weimer wrote:
>> * Joseph Myers:
>> 
>> > On Wed, 4 Sep 2019, Florian Weimer wrote:
>> >
>> >> * Lukasz Majewski:
>> >> 
>> >> > There was also a conclusion regarding explicit clearing of the padding
>> >> > if necessary - kernel shall do it (or to be more precise - kernel will
>> >> > perform modulo operation of the number passed to it from glibc if
>> >> > needed).
>> >> 
>> >> But that's not what the kernel ended up doing, right?
>> >
>> > My understanding is that it did - except for the case of compat syscalls 
>> > under 64-bit kernels 5.1.0 through 5.1.4.
>> 
>> Oh!
>> 
>> Then I don't quite get why we are having this conversation at all.  The
>> anonymous bit-field hack should solve this quite nicely if we just say
>> that the exceptional range of kernels you indicated is too buggy to
>> support in glibc.
>> 
>> What am I missing?
>
> I think you misunderstood my mail you were replying to in Message-ID
> <871rwwp7dm.fsf@oldenburg2.str.redhat.com>. I was offering an easy way
> (that was actually incorrect) to do the fixup to safely support
> kernels 5.1.0-5.1.4. The motivation for doing so is as follows:
>
> Normally, every now and then there's a range of kernel versions that
> have a serious bug, and everybody quickly determines that they're not
> usable. However for these versions, the "bug" doesn't affect any
> existing 32-bit time32 software or 64-bit software, so it's plausible
> that they'll stick around for a long time (as long as software built
> with _TIME_BITS=64 is rare or nonexistent). Then, when someone
> upgrades userspace (possibly even in a container, unrelated to the
> host distro providing the bad kernel), everything will break
> catastrophically, immediately 'timing out' with EINVAL or spinning at
> 100% cpu.

I think it's better to break in catastrophic ways, making clear that the
kernel is at fault, instead of expecting that application developers
support kernels in the problematic range (probably without being able to
test on them).  We could teach valgrind to optionally require the
padding to be defined, but I'm not sure if that's very useful if there
is software out there specifically written *not* to work on those
problematic kernels (kind of like alignment traps on x86 are useless,
now that software assumes that unaligned loads and stores are acceptable
and have reasonable performance).

Thanks,
Florian
  
Rich Felker Sept. 9, 2019, 2:08 p.m. UTC | #33
On Mon, Sep 09, 2019 at 03:19:50PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Wed, Sep 04, 2019 at 06:53:54PM +0200, Florian Weimer wrote:
> >> * Joseph Myers:
> >> 
> >> > On Wed, 4 Sep 2019, Florian Weimer wrote:
> >> >
> >> >> * Lukasz Majewski:
> >> >> 
> >> >> > There was also a conclusion regarding explicit clearing of the padding
> >> >> > if necessary - kernel shall do it (or to be more precise - kernel will
> >> >> > perform modulo operation of the number passed to it from glibc if
> >> >> > needed).
> >> >> 
> >> >> But that's not what the kernel ended up doing, right?
> >> >
> >> > My understanding is that it did - except for the case of compat syscalls 
> >> > under 64-bit kernels 5.1.0 through 5.1.4.
> >> 
> >> Oh!
> >> 
> >> Then I don't quite get why we are having this conversation at all.  The
> >> anonymous bit-field hack should solve this quite nicely if we just say
> >> that the exceptional range of kernels you indicated is too buggy to
> >> support in glibc.
> >> 
> >> What am I missing?
> >
> > I think you misunderstood my mail you were replying to in Message-ID
> > <871rwwp7dm.fsf@oldenburg2.str.redhat.com>. I was offering an easy way
> > (that was actually incorrect) to do the fixup to safely support
> > kernels 5.1.0-5.1.4. The motivation for doing so is as follows:
> >
> > Normally, every now and then there's a range of kernel versions that
> > have a serious bug, and everybody quickly determines that they're not
> > usable. However for these versions, the "bug" doesn't affect any
> > existing 32-bit time32 software or 64-bit software, so it's plausible
> > that they'll stick around for a long time (as long as software built
> > with _TIME_BITS=64 is rare or nonexistent). Then, when someone
> > upgrades userspace (possibly even in a container, unrelated to the
> > host distro providing the bad kernel), everything will break
> > catastrophically, immediately 'timing out' with EINVAL or spinning at
> > 100% cpu.
> 
> I think it's better to break in catastrophic ways, making clear that the
> kernel is at fault, instead of expecting that application developers
> support kernels in the problematic range (probably without being able to
> test on them).  We could teach valgrind to optionally require the
> padding to be defined, but I'm not sure if that's very useful if there
> is software out there specifically written *not* to work on those
> problematic kernels (kind of like alignment traps on x86 are useless,
> now that software assumes that unaligned loads and stores are acceptable
> and have reasonable performance).

Requiring any action by application developers is not on the table;
that's contrary to the specification and a complete non-starter. The
question was whether glibc would support such kernels, which would not
be visible to the application.

Rich
  

Patch

diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
index 1518bb52287..24939b8603d 100644
--- a/sysdeps/unix/sysv/linux/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/kernel-features.h
@@ -139,3 +139,51 @@ 
    */
 
 #define __ASSUME_CLONE_DEFAULT 1
+
+/* Support for the 64-bit time Linux kernel syscalls.
+
+   This flag indicates support for Linux kernel syscalls using the 64-bit time
+   ABI. It is defined for all 64-bit architectures as they have always
+   supported 64-bit time. It is also defined for all 32-bit architectures when
+   using Linux kernel version 5.1 or newer.
+
+   When the __ASSUME_TIME64_SYSCALLS macro is defined glibc can call, without needing to
+   check at runtime for ENOSYS errors, the *64/time64 suffixed syscalls. These
+   should be #defined to the the unsuffixed versions when required (such as
+   when running on 64-bit systems).
+
+   __ASSUME_TIME64_SYSCALLS macro being defined does not mean that __TIMESIZE is
+   64-bit. In cases where the __TIMESIZE is 64-bit the 64-bit syscalls can be
+   used directly. In cases where __TIMESIZE is 32-bit conversions between the
+   original 32-bit values and the kernel's 64-bit values will need to occur.
+
+   As an example - the syscall to set clock (clock_settime) - if the
+   __ASSUME_TIME64_SYSCALLS macro is defined, it indicates that 64-bit time can
+   be set in the system.
+
+   On systems with __WORDSIZE == 64 the __NR_clock_settime syscall is used
+   to achieve this goal. Systems with __WORDSIZE == 32 use the
+   __NR_clock_settime64 syscall available from Linux version 5.1.
+
+   The __ASSUME_TIME64_SYSCALLS macro is defined for:
+
+   1. 64-bit systems that have always had 64-bit time support (__WORDSIZE == 64).
+
+   2. For x32 architecture, which is a special case in respect to 64-bit
+      time support (it has __WORDSIZE==32 but __TIMESIZE==64)
+
+   3. Systems with __WORDSIZE==32, which gain 64-bit time support
+      with the syscalls added to Linux kernel 5.1.
+
+   4. 32-bit architectures for which support was added in Linux 5.1 or later,
+      or for which the kernel/userspace ABI changed incompatibly in Linux 5.1
+      or later so that there are no syscalls using 32-bit time.
+   */
+
+#include <bits/wordsize.h>
+#if (__WORDSIZE == 64                                                   \
+     || (__WORDSIZE == 32                                               \
+         && (__LINUX_KERNEL_VERSION >= 0x050100                         \
+             || (defined __SYSCALL_WORDSIZE && __SYSCALL_WORDSIZE == 64))))
+# define __ASSUME_TIME64_SYSCALLS 1
+#endif