[PATCHv3,00/24] ILP32 support in ARM64

Message ID 20150211192118.GC23507@brightrain.aerifal.cx
State Not Applicable

Commit Message

Rich Felker Feb. 11, 2015, 7:21 p.m. UTC
On Wed, Feb 11, 2015 at 05:39:19PM +0000, Catalin Marinas wrote:
> (adding Marcus)
> 
> On Tue, Feb 10, 2015 at 06:13:02PM +0000, Rich Felker wrote:
> > On Thu, 2 Oct 2014 at 16:52:18 +0100, Catalin Marinas wrote:
> > > On Wed, Sep 03, 2014 at 10:18:54PM +0100, Andrew Pinski wrote:
> > > > New version with all of the requested changes.  Updated to the
> > > > latest sources.
> > > > 
> > > > Notable changes from the previous versions:
> > > > VDSO code has been factored out to be easier to understand and
> > > > easier to maintain.
> > > > Move the config option to the last thing that gets added.
> > > > Added some extra COMPAT_* macros for core dumping for easier usage.
> > > 
> > > Apart from a few comments I've made, I would also like to see non-empty
> > > commit logs and long line wrapping (both in commit logs and
> > > Documentation/). Otherwise, the patches look fine.
> > > 
> > > So what are the next steps? Are the glibc folk ok with the ILP32 Linux
> > > ABI? On the kernel side, what I would like to see:
> > 
> > I don't know if this has been discussed on libc-alpha yet or not, but
> > I think we need to open a discussion of how it relates to open glibc
> > bug #16437, which presently applies only to x32 (ILP32 ABI on x86_64):
> > 
> > https://sourceware.org/bugzilla/show_bug.cgi?id=16437
> 
> I'm trying to understand the problem first. Quoting from the bug above
> (which I guess is quoted from C11):
> 
> "The range and precision of times representable in clock_t and time_t
> are implementation-defined. The timespec structure shall contain at
> least the following members, in any order.
> 
>          time_t tv_sec; // whole seconds -- >= 0
>          long   tv_nsec; // nanoseconds -- [0, 999999999]"
> 
> So changing time_t to 64-bit is fine on x32. The timespec struct
> exported by the kernel always uses a long for tv_nsec. However, glibc uses
> __syscall_slong_t which ends up as 64-bit for x32 (I guess it mirrors
> the __kernel_long_t definition).
> 
> So w.r.t. C11, the exported kernel timespec looks fine. But I think the
> x32 kernel support (and the current ILP32 patches) assume a native
> struct timespec with tv_nsec as 64-bit.

The exported kernel timespec is not fine if long is defined as a
32-bit type, which it is for x32 and the proposed aarch64-ILP32 ABIs.
The type being long means that code like

	long *p = &ts->tv_nsec;
	*p = 42;

is valid. With the proposed definition of timespec, this results in a
compiler warning or error on valid code, which is better than
dangerously wrong code generation, but we can easily hide the warning
via:

	void *p = &ts->tv_nsec;
	long *q = p;
	*q = 42;

Imagine this happening in places such as with callbacks that take void
pointers, for example pthread_create.

But even if the breakage could always be diagnosed by the compiler,
you're still breaking perfectly valid, conforming C programs.
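As a rough illustration of the point above, here is a small sketch, assuming a C11 compiler. The names TV_NSEC_IS_LONG, store_long and demo_store are made up for this example and come from no library. The macro reports whether the implementation's tv_nsec really has type long, and the helper stores through a long pointer that was passed as void *, which is only valid when it does.

```c
#include <assert.h>
#include <time.h>

/* 1 if tv_nsec is declared exactly as long (what C11 and POSIX require), else 0.
 * The controlling expression of _Generic is not evaluated. */
#define TV_NSEC_IS_LONG \
    _Generic(&((struct timespec *)0)->tv_nsec, long *: 1, default: 0)

/* Callback-style helper: it only knows it was handed a pointer to a long. */
void store_long(void *field, long value)
{
    long *q = field;
    *q = value;
}

/* Conforming use: pass the address of tv_nsec through a void pointer. */
long demo_store(void)
{
    struct timespec ts = {0};

    /* The store below is only well defined when tv_nsec really is a long. */
    assert(TV_NSEC_IS_LONG && "tv_nsec is not long on this implementation");
    store_long(&ts.tv_nsec, 42L);
    return (long)ts.tv_nsec;
}
```

On an implementation that declares tv_nsec with some other 64-bit type while long is 32-bit, the macro would evaluate to 0 and the store would touch only half of the field.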

> If we are to be C11 conformant, glibc on x32 has a bug as it defines
> timespec incorrectly. This hid a bug in the kernel handling the
> corresponding x32 syscalls. What's the best fix for x32 I can't really
> tell (we need people to agree on where the bugs are).

For the kernel, it should be casting the value to int/int32_t to
ignore the junk in the upper bits. But that only works on little
endian. On big endian it's a bigger mess.
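To make the endianness remark concrete, here is a userspace sketch. Both struct names and both functions are hypothetical and are not kernel definitions. It models an ILP32 program that leaves junk next to its 32-bit tv_nsec, and a 64-bit reader that treats the same 16 bytes as two 64-bit words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ILP32 userspace layout: 32-bit tv_nsec followed by padding. */
struct user_ts_ilp32 {
    int64_t  tv_sec;
    int32_t  tv_nsec;
    uint32_t pad;          /* uninitialised junk in a real program */
};

/* The same 16 bytes as a 64-bit reader would interpret them. */
struct kernel_ts_view {
    int64_t tv_sec;
    int64_t tv_nsec;
};

_Static_assert(sizeof(struct user_ts_ilp32) == sizeof(struct kernel_ts_view),
               "both views must cover the same bytes");

/* Raw 64-bit value found in the tv_nsec slot, junk included. */
int64_t nsec_seen_by_kernel(const struct user_ts_ilp32 *u)
{
    struct kernel_ts_view k;

    assert(u != NULL);
    memcpy(&k, u, sizeof k);
    return k.tv_nsec;
}

/* The proposed fixup: keep only the low 32 bits, sign-extended.
 * This recovers tv_nsec on a little-endian host only. */
int64_t nsec_after_cast(const struct user_ts_ilp32 *u)
{
    return (int32_t)nsec_seen_by_kernel(u);
}
```

On a little-endian host the low 32 bits of the slot are the user's tv_nsec, so the cast recovers it. On a big-endian host the low 32 bits would be the padding word instead, which is why the member offset itself has to change there.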

However at this point fixing it on the kernel side is not very useful
for x32, at least not right away. Since people will still be running
old kernels with the wrong behavior, the fixup has to happen in
userspace to support those old kernels until the libc drops support
for kernels too old to handle it right.

> At least for AArch64 ILP32 we are still free to change the user/kernel
> ABI, so we could add wrappers for the affected syscalls to fix this up.

Yes. That's what I'd like to see before aarch64-ILP32 is officially
launched. It's a lot harder to retroactively fix this. Right now it's
fairly easy -- just a small wrapper/fixup layer in the kernel.

> > While most of the other type changes proposed (I'm looking at
> > https://lkml.org/lkml/2014/9/3/719) are permissible and simply
> > ugly/undesirable,
> 
> They may be ugly but definitely not undesirable ;).

I can point you to a few other cases that may be undesirable, but less
severe than timespec:

- https://sourceware.org/bugzilla/show_bug.cgi?id=16438
- http://man7.org/linux/man-pages/man2/sysinfo.2.html

In the case of sysinfo, the public API documentation documents the
fields as having type unsigned long, but on x32 (and presumably
aarch64-ILP32 as proposed) they have type unsigned long long.

I can probably find more examples if you're interested.

> > defining struct timespec with tv_nsec having any type other than long
> > conflicts with the requirements of C11 and POSIX, and WG14 is unlikely
> > to be interested in changing the C language because the Linux kernel
> > has the wrong type in timespec.
> 
> I agree. The strange thing is that the Linux exported headers are fine.

I'm confused why you think they're fine.

> > Note that on aarch64 ILP32, the consequences of not fixing this right
> > away will be much worse than on x32, since aarch64 (at least as I
> > understand it) supports big endian where it's not just a matter of
> > sign-extending the value from userspace and ignoring the padding, but
> > rather changing the offset of the tv_nsec member.
> 
> Indeed.
> 
> > Working around the discrepancies in userspace IS possible, but ugly.
> > We do it in musl libc for x32 right now -- see:
> > 
> > http://git.musl-libc.org/cgit/musl/tree/arch/x32/syscall_arch.h?id=v1.1.6
> > http://git.musl-libc.org/cgit/musl/tree/arch/x32/src/syscall_cp_fixup.c?id=v1.1.6
> 
> For AArch64 ILP32 I would rather see the fix-ups in kernel wrappers.
> 
> Are you aware of other cases like this?

For musl x32, the only other thing we had to work around with code was:

http://git.musl-libc.org/cgit/musl/tree/arch/x32/src/sysinfo.c?id=v1.1.6

However, code was mostly only needed where the data is being passed
from userspace to the kernel; for the other direction, most of the
discrepancies were handled simply by defining structures with extra
32-bit padding fields next to their 32-bit longs. I'm including below
a diff of our arch/x86_64/bits versus arch/x32/bits trees to
demonstrate the changes made.

Rich

Comments

Catalin Marinas Feb. 12, 2015, 6:17 p.m. UTC | #1
On Wed, Feb 11, 2015 at 02:21:18PM -0500, Rich Felker wrote:
> On Wed, Feb 11, 2015 at 05:39:19PM +0000, Catalin Marinas wrote:
> > On Tue, Feb 10, 2015 at 06:13:02PM +0000, Rich Felker wrote:
> > > I don't know if this has been discussed on libc-alpha yet or not, but
> > > I think we need to open a discussion of how it relates to open glibc
> > > bug #16437, which presently applies only to x32 (ILP32 ABI on x86_64):
> > > 
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=16437
> > 
> > I'm trying to understand the problem first. Quoting from the bug above
> > (which I guess is quoted from C11):
> > 
> > "The range and precision of times representable in clock_t and time_t
> > are implementation-defined. The timespec structure shall contain at
> > least the following members, in any order.
> > 
> >          time_t tv_sec; // whole seconds -- >= 0
> >          long   tv_nsec; // nanoseconds -- [0, 999999999]"
> > 
> > So changing time_t to 64-bit is fine on x32. The timespec struct
> > exported by the kernel always uses a long for tv_nsec. However, glibc uses
> > __syscall_slong_t which ends up as 64-bit for x32 (I guess it mirrors
> > the __kernel_long_t definition).
> > 
> > So w.r.t. C11, the exported kernel timespec looks fine. But I think the
> > x32 kernel support (and the current ILP32 patches) assume a native
> > struct timespec with tv_nsec as 64-bit.
> 
> The exported kernel timespec is not fine if long is defined as a
> 32-bit type, which it is for x32 and the proposed aarch64-ILP32 ABIs.

The exported kernel headers comply with POSIX as they use long for
tv_nsec. The exported headers can be used in user space and with an
ILP32 ABI, long is 32-bit. The problem is the syscall handler which uses
the same structure in kernel where long is 64-bit. But this doesn't
change the fact that the exported header was still correct from a user
perspective.

The solution (for new ports) could be similar to the other such
solutions in the compat layer. A kernel internal structure which is
binary-compatible with the ILP32 user one (as exported by the kernel):

struct ilp32_timespec_kernel_internal_only {
	__kernel_time_t	tv_sec;			/* seconds */
	int		tv_nsec;		/* nanoseconds */
};

and a syscall wrapper which converts between ilp32_timespec and timespec
(take compat_sys_clock_settime as an example).

If the user structure has some padding (and as I've read in this thread
it is allowed), it could be even easier for the kernel. The padding
could be 32-bit before or after tv_nsec, depending on endianness.
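A rough userspace model of such a conversion wrapper might look like the following. The struct and function names are invented for this sketch. The range check is an assumption about where validation would happen, not a description of what compat_sys_clock_settime does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for the ILP32 user layout: 64-bit seconds, 32-bit nanoseconds. */
struct ilp32_ts {
    int64_t tv_sec;
    int32_t tv_nsec;
};

/* Stand-in for the native 64-bit kernel layout. */
struct native_ts {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Widen the ILP32 structure into the native one.
 * Returns 0 on success or -EINVAL for an out-of-range tv_nsec
 * (the check is an assumption made for this sketch). */
int ilp32_ts_to_native(const struct ilp32_ts *in, struct native_ts *out)
{
    assert(in != 0 && out != 0);

    if (in->tv_nsec < 0 || in->tv_nsec > 999999999)
        return -EINVAL;

    out->tv_sec = in->tv_sec;
    out->tv_nsec = in->tv_nsec;   /* widened from 32 to 64 bits */
    return 0;
}
```

Because the source field is read as a 32-bit int, whatever sits in any adjacent padding never reaches the native structure.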
Arnd Bergmann Feb. 12, 2015, 6:59 p.m. UTC | #2
> Catalin Marinas <catalin.marinas@arm.com> wrote on February 12, 2015 at 19:17:
> On Wed, Feb 11, 2015 at 02:21:18PM -0500, Rich Felker wrote:
> > On Wed, Feb 11, 2015 at 05:39:19PM +0000, Catalin Marinas wrote:
> > > On Tue, Feb 10, 2015 at 06:13:02PM +0000, Rich Felker wrote:
> > > > I don't know if this has been discussed on libc-alpha yet or not, but
> > > > I think we need to open a discussion of how it relates to open glibc
> > > > bug #16437, which presently applies only to x32 (ILP32 ABI on x86_64):
> > > >
> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=16437
> > >
> > > I'm trying to understand the problem first. Quoting from the bug above
> > > (which I guess is quoted from C11):
> > >
> > > "The range and precision of times representable in clock_t and time_t
> > > are implementation-defined. The timespec structure shall contain at
> > > least the following members, in any order.
> > >
> > > time_t tv_sec; // whole seconds -- >= 0
> > > long tv_nsec; // nanoseconds -- [0, 999999999]"
> > >
> > > So changing time_t to 64-bit is fine on x32. The timespec struct
> > > exported by the kernel always uses a long for tv_nsec. However, glibc uses
> > > __syscall_slong_t which ends up as 64-bit for x32 (I guess it mirrors
> > > the __kernel_long_t definition).
> > >
> > > So w.r.t. C11, the exported kernel timespec looks fine. But I think the
> > > x32 kernel support (and the current ILP32 patches) assume a native
> > > struct timespec with tv_nsec as 64-bit.
> >
> > The exported kernel timespec is not fine if long is defined as a
> > 32-bit type, which it is for x32 and the proposed aarch64-ILP32 ABIs.
>
> The exported kernel headers comply with POSIX as they use long for
> tv_nsec. The exported headers can be used in user space and with an
> ILP32 ABI, long is 32-bit. The problem is the syscall handler which uses
> the same structure in kernel where long is 64-bit. But this doesn't
> change the fact that the exported header was still correct from a user
> perspective.

This is not ILP32 specific really, we need to add the same set of syscalls
for all 32-bit systems, in addition to the existing ones that take
a 32-bit time_t.

> The solution (for new ports) could be similar to the other such
> solutions in the compat layer. A kernel internal structure which is
> binary-compatible with the ILP32 user one (as exported by the kernel):
>
> struct ilp32_timespec_kernel_internal_only {
> __kernel_time_t tv_sec; /* seconds */
> int tv_nsec; /* nanoseconds */
> };
>
> and a syscall wrapper which converts between ilp32_timespec and timespec
> (take compat_sys_clock_settime as an example).

We then have to do this on all architectures, and not call it ilp32_timespec,
but call it something else.

I would much prefer to only have two versions of each syscall that takes a
timespec rather than three versions, or having a version that behaves
differently based on the type of program calling it. On native 32-bit
systems, we should have the native syscall taking the 16-byte structure
(using long long __kernel_time64_t) along with the compatibility syscall
with a 8-byte structure for existing applications.

On 64-bit systems, the same syscall source can be used for the normal 16-byte
structure on native 64-bit tasks, ilp32 tasks (x32, aarch64-32), and future
compat32 (i386, aarch32, ...) tasks, while the syscall for the 8-byte structure
deals with legacy compat32 tasks that do not yet use __kernel_time64_t.
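The two layouts being discussed can be sketched with fixed-width types as follows. The names legacy_ts32, wide_ts64 and widen_legacy_ts are made up here, and fixed-width integers stand in for the per-ABI time_t and long, so this is a model of the sizes rather than the real kernel types.

```c
#include <assert.h>   /* for checks like the one in the note below */
#include <stdint.h>

/* 8-byte structure used by existing 32-bit applications. */
struct legacy_ts32 {
    int32_t tv_sec;
    int32_t tv_nsec;
};

/* 16-byte structure shared by native 64-bit, ILP32 and future compat32 tasks. */
struct wide_ts64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

_Static_assert(sizeof(struct legacy_ts32) == 8, "legacy layout is 8 bytes");
_Static_assert(sizeof(struct wide_ts64) == 16, "wide layout is 16 bytes");

/* What the compatibility syscall has to do before calling common code. */
struct wide_ts64 widen_legacy_ts(struct legacy_ts32 in)
{
    struct wide_ts64 out;

    out.tv_sec = in.tv_sec;     /* sign-extends, so pre-1970 times survive */
    out.tv_nsec = in.tv_nsec;
    return out;
}
```

With only these two layouts, every task type funnels into code that works on the 16-byte form. For instance, widening a legacy value with tv_sec of -1 keeps it at -1.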

> If the user structure has some padding (and as I've read in this thread
> it is allowed), it could be even easier for the kernel. The padding
> could be 32-bit before or after tv_nsec, depending on endianness.

The problem as pointed out before is that if you do this, 32-bit tasks
need to have the padding word zeroed at some stage for data passed into
the kernel, while 64-bit tasks need to return an error if the upper half
of the tv_nsec word is nonzero, at least for interfaces that are documented
to do this.

This can be done either in the kernel or in the libc. In the kernel, it
comes down to a function like

int get_user_timespec64(struct timespec64 *ts,
                        struct __kernel_timespec64 __user *uts,
                        bool task_32bit)
{
       struct __kernel_timespec64 input;

       /* Copy the whole structure in from user space. */
       if (copy_from_user(&input, uts, sizeof(input)))
              return -EFAULT;

       ts->tv_sec = input.tv_sec;
       if (task_32bit)
               /* Keep only the low 32 bits; the rest is padding. */
               ts->tv_nsec = (int)input.tv_nsec;
       else
               ts->tv_nsec = input.tv_nsec;

       return 0;
}

with data types of

struct timespec64 {
       time64_t tv_sec;
       long tv_nsec;
};

struct __kernel_timespec64 {
       __kernel_time64_t tv_sec;
#if (__BYTE_ORDER == __BIG_ENDIAN) && (__BITS_PER_LONG == 32)
       u32 __pad;
#endif
       long tv_nsec;
#if (__BYTE_ORDER == __LITTLE_ENDIAN) && (__BITS_PER_LONG == 32)
       u32 __pad;
#endif
};

The data structure definition is a little bit fragile, as it depends on
user space not using the __BIG_ENDIAN symbol in a conflicting way. So
far we have managed to keep that outside of general purpose headers, but
if a conflict does occur, it should at least blow up in an obvious way
rather than breaking silently.
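For anyone who wants to experiment with the padding placement in user space, here is a sketch. It assumes GCC or Clang, which predefine __BYTE_ORDER__. The struct padded_ts32 and the function nsec_slot_as_64bit are hypothetical names, and int32_t stands in for a 32-bit long.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if !defined(__BYTE_ORDER__)
#error "this sketch relies on the compiler predefining __BYTE_ORDER__"
#endif

/* 32-bit tv_nsec placed so that it overlays the low half of a 64-bit slot. */
struct padded_ts32 {
    int64_t  tv_sec;
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    uint32_t pad;          /* high half comes first on big endian */
#endif
    int32_t  tv_nsec;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    uint32_t pad;          /* high half comes last on little endian */
#endif
};

_Static_assert(sizeof(struct padded_ts32) == 16, "layout must be 16 bytes");

/* Read the eight bytes after tv_sec as one 64-bit integer. */
int64_t nsec_slot_as_64bit(const struct padded_ts32 *p)
{
    int64_t slot;

    assert(p != NULL);
    memcpy(&slot, (const unsigned char *)p + sizeof(int64_t), sizeof slot);
    return slot;
}
```

With the padding zeroed and a non-negative tv_nsec, the 64-bit read returns the same number on either byte order. Without the #error guard, an undefined byte-order macro would make both #if branches true, and the duplicate member would fail to compile rather than break silently.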

I still think it's more practical to keep the zeroing in user space though.
In that case, we keep defining __kernel_timespec64 with a 'typedef long
long __kernel_snseconds_t', and it's up to the libc to either use
__kernel_timespec64 as its timespec, or to define a C11-compliant
timespec itself and zero out the bits before passing the data to the kernel.

     Arnd
Catalin Marinas Feb. 13, 2015, 1:33 p.m. UTC | #3
On Thu, Feb 12, 2015 at 07:59:24PM +0100, Arnd Bergmann wrote:
> > Catalin Marinas <catalin.marinas@arm.com> wrote on February 12, 2015 at 19:17:
> > On Wed, Feb 11, 2015 at 02:21:18PM -0500, Rich Felker wrote:
> > > On Wed, Feb 11, 2015 at 05:39:19PM +0000, Catalin Marinas wrote:
> > > > On Tue, Feb 10, 2015 at 06:13:02PM +0000, Rich Felker wrote:
> > > > > I don't know if this has been discussed on libc-alpha yet or not, but
> > > > > I think we need to open a discussion of how it relates to open glibc
> > > > > bug #16437, which presently applies only to x32 (ILP32 ABI on x86_64):
> > > > >
> > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=16437
> > > >
> > > > I'm trying to understand the problem first. Quoting from the bug above
> > > > (which I guess is quoted from C11):
> > > >
> > > > "The range and precision of times representable in clock_t and time_t
> > > > are implementation-defined. The timespec structure shall contain at
> > > > least the following members, in any order.
> > > >
> > > > time_t tv_sec; // whole seconds -- >= 0
> > > > long tv_nsec; // nanoseconds -- [0, 999999999]"
> > > >
> > > > So changing time_t to 64-bit is fine on x32. The timespec struct
> > > > exported by the kernel always uses a long for tv_nsec. However, glibc uses
> > > > __syscall_slong_t which ends up as 64-bit for x32 (I guess it mirrors
> > > > the __kernel_long_t definition).
> > > >
> > > > So w.r.t. C11, the exported kernel timespec looks fine. But I think the
> > > > x32 kernel support (and the current ILP32 patches) assume a native
> > > > struct timespec with tv_nsec as 64-bit.
> > >
> > > The exported kernel timespec is not fine if long is defined as a
> > > 32-bit type, which it is for x32 and the proposed aarch64-ILP32 ABIs.
> >
> > The exported kernel headers comply with POSIX as they use long for
> > tv_nsec. The exported headers can be used in user space and with an
> > ILP32 ABI, long is 32-bit. The problem is the syscall handler which uses
> > the same structure in kernel where long is 64-bit. But this doesn't
> > change the fact that the exported header was still correct from a user
> > perspective.
> 
> This is not ILP32 specific really, we need to add the same set of syscalls
> for all 32-bit systems, in addition to the existing ones that take
> a 32-bit time_t.

We can look at this as two scenarios:

1. existing 32-bit user space with a 32-bit time_t
2. new 32-bit user space, potentially with 64-bit time_t

For (1), we need an additional set of syscalls in parallel with the
old ones and most likely a different structure, let's say timespec64.

For (2), we could go for a 64-bit time_t in timespec directly, without
any timespec64 and additional set of syscalls (though internally the
kernel may handle them as timespec64).

For compat support on a 64-bit kernel, we may need to support both
32-bit time_t via compat_timespec and a 64-bit time_t via a new
compat_timespec64. In case of AArch64 ILP32, any timespec syscall should
be routed directly to the corresponding compat_timespec64 handlers as we
define a 64-bit time_t.

For new 32-bit native architectures (no compat layer), we may want to
enforce a 64-bit time_t from the beginning.

Anyway, since AArch64 ILP32 does not have a legacy ABI with 32-bit
time_t, we can start implementing it independently of the additional
syscalls for 32-bit timespec64. Eventually, the same code path will be
used for legacy 32-bit with the new 64-bit time_t syscalls.

> > The solution (for new ports) could be similar to the other such
> > solutions in the compat layer. A kernel internal structure which is
> > binary-compatible with the ILP32 user one (as exported by the kernel):
> >
> > struct ilp32_timespec_kernel_internal_only {
> > __kernel_time_t tv_sec; /* seconds */
> > int tv_nsec; /* nanoseconds */
> > };
> >
> > and a syscall wrapper which converts between ilp32_timespec and timespec
> > (take compat_sys_clock_settime as an example).
> 
> We then have to do this on all architectures, and not call it ilp32_timespec,
> but call it something else.
> 
> I would much prefer to only have two versions of each syscall that takes a
> timespec rather than three versions, or having a version that behaves
> differently based on the type of program calling it. On native 32-bit
> systems, we should have the native syscall taking the 16-byte structure
> (using long long __kernel_time64_t)

Can this also be 12 bytes in general if tv_nsec stays as 32-bit? The
size of such structure would be 16 bytes on ARM but I guess this depends
on the long long alignment requirements on specific architectures.
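The 12-versus-16 question can be checked with a short sketch like the one below, assuming C11. The struct ts_sec64_nsec32 and the function expected_ts_size are illustrative names only.

```c
#include <assert.h>   /* for checks like the one in the note below */
#include <stddef.h>
#include <stdint.h>

/* 64-bit seconds followed by 32-bit nanoseconds, no explicit padding. */
struct ts_sec64_nsec32 {
    int64_t tv_sec;
    int32_t tv_nsec;
};

/* Size the compiler must pick: 12 bytes of members rounded up to the
 * alignment of the structure, which follows its 64-bit member. */
size_t expected_ts_size(void)
{
    size_t align = _Alignof(struct ts_sec64_nsec32);
    size_t raw = sizeof(int64_t) + sizeof(int32_t);   /* 12 */

    return (raw + align - 1) / align * align;
}
```

Where 64-bit integers are 8-byte aligned this rounds up to 16. Where they are only 4-byte aligned it stays at 12, so the same declaration yields two different ABIs.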

> along with the compatibility syscall with a 8-byte structure for
> existing applications.
> 
> On 64-bit systems, the same syscall source can be used for the normal 16-byte
> structure on native 64-bit tasks, ilp32 tasks (x32, aarch64-32), and future
> compat32 (i386, aarch32, ...) tasks, while the syscall for the 8-byte structure
> deals with legacy compat32 tasks that do not yet use __kernel_time64_t.

We could do with two syscalls but, as you said, we need some padding and
zeroing when the sizeof(time_t) != sizeof(long).

> > If the user structure has some padding (and as I've read in this thread
> > it is allowed), it could be even easier for the kernel. The padding
> > could be 32-bit before or after tv_nsec, depending on endianness.
> 
> The problem as pointed out before is that if you do this, 32-bit tasks
> need to have the padding word zeroed at some stage for data passed into
> the kernel, while 64-bit tasks need to return an error if the upper half
> of the tv_nsec word is nonzero, at least for interfaces that are documented
> to do this.
> 
> This can be done either in the kernel or in the libc.

I think this should be in the kernel, as users are allowed to invoke
syscalls directly, outside the libc wrappers.

> In the kernel, it comes down to a function like
> 
> int get_user_timespec64(struct timespec64 *ts, struct __kernel_timespec64 __user
> *uts, bool task_32bit)
> {
>        struct __kernel_timespec64 input;
> 
>        if (copy_from_user(&input, uts, sizeof(input)))
>               return -EFAULT;
> 
>        ts->tv_sec = input.tv_sec;
>        if (task_32bit)
>                ts->tv_nsec = (int)input.tv_nsec;
>        else
>                ts->tv_nsec = input.tv_nsec;
> 
>        return 0;
> }

The only drawback is that native 64-bit and new 32-bit have the same
handling path, potentially slowing down the former (it may not be
noticeable).

> with data types of
> 
> struct timespec64 {
>        time64_t tv_sec;
>        long tv_nsec;
> };
> 
> struct __kernel_timespec64 {
>        __kernel_time64_t tv_sec;
> #if (__BYTE_ORDER == __BIG_ENDIAN) && (__BITS_PER_LONG == 32)
>        u32 __pad;     
> #endif
>        long tv_nsec;
> #if (__BYTE_ORDER == __LITTLE_ENDIAN) && (__BITS_PER_LONG == 32)
>        u32 __pad;
> #endif
> };
> 
> The data structure definition is a little bit fragile, as it depends on
> user space not using the __BIG_ENDIAN symbol in a conflicting way. So
> far we have managed to keep that outside of general purpose headers, but
> it should at least blow up in an obvious way if it does, rather than
> breaking silently.
> 
> I still think it's more practical to keep the zeroing in user space though.
> In that case, we keep defining __kernel_timespec64 with a 'typedef long
> long __kernel_snseconds_t', and it's up to the libc to either use
> __kernel_timespec64 as its timespec, or to define a C11-compliant
> timespec itself and zero out the bits before passing the data to the kernel.

The problem with doing this in user space is syscall(2). If we don't
allow it, then it's fine to do the padding in libc.
Rich Felker Feb. 13, 2015, 4:30 p.m. UTC | #4
On Fri, Feb 13, 2015 at 01:33:56PM +0000, Catalin Marinas wrote:
> On Thu, Feb 12, 2015 at 07:59:24PM +0100, Arnd Bergmann wrote:
> > > Catalin Marinas <catalin.marinas@arm.com> wrote on February 12, 2015 at 19:17:
> > > On Wed, Feb 11, 2015 at 02:21:18PM -0500, Rich Felker wrote:
> > > > On Wed, Feb 11, 2015 at 05:39:19PM +0000, Catalin Marinas wrote:
> > > > > On Tue, Feb 10, 2015 at 06:13:02PM +0000, Rich Felker wrote:
> > > > > > I don't know if this has been discussed on libc-alpha yet or not, but
> > > > > > I think we need to open a discussion of how it relates to open glibc
> > > > > > bug #16437, which presently applies only to x32 (ILP32 ABI on x86_64):
> > > > > >
> > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=16437
> > > > >
> > > > > I'm trying to understand the problem first. Quoting from the bug above
> > > > > (which I guess is quoted from C11):
> > > > >
> > > > > "The range and precision of times representable in clock_t and time_t
> > > > > are implementation-defined. The timespec structure shall contain at
> > > > > least the following members, in any order.
> > > > >
> > > > > time_t tv_sec; // whole seconds -- >= 0
> > > > > long tv_nsec; // nanoseconds -- [0, 999999999]"
> > > > >
> > > > > So changing time_t to 64-bit is fine on x32. The timespec struct
> > > > > exported by the kernel always uses a long for tv_nsec. However, glibc uses
> > > > > __syscall_slong_t which ends up as 64-bit for x32 (I guess it mirrors
> > > > > the __kernel_long_t definition).
> > > > >
> > > > > So w.r.t. C11, the exported kernel timespec looks fine. But I think the
> > > > > x32 kernel support (and the current ILP32 patches) assume a native
> > > > > struct timespec with tv_nsec as 64-bit.
> > > >
> > > > The exported kernel timespec is not fine if long is defined as a
> > > > 32-bit type, which it is for x32 and the proposed aarch64-ILP32 ABIs.
> > >
> > > The exported kernel headers comply with POSIX as they use long for
> > > tv_nsec. The exported headers can be used in user space and with an
> > > ILP32 ABI, long is 32-bit. The problem is the syscall handler which uses
> > > the same structure in kernel where long is 64-bit. But this doesn't
> > > change the fact that the exported header was still correct from a user
> > > perspective.
> > 
> > This is not ILP32 specific really, we need to add the same set of syscalls
> > for all 32-bit systems, in addition to the existing ones that take
> > a 32-bit time_t.
> 
> We can look at this as two scenarios:
> 
> 1. existing 32-bit user space with a 32-bit time_t
> 2. new 32-bit user space, potentially with 64-bit time_t
> 
> For (1), we need an additional set of syscalls in parallel with the
> old ones and most likely a different structure, let's say timespec64.
> 
> For (2), we could go for a 64-bit time_t in timespec directly, without
> any timespec64 and additional set of syscalls (though internally the
> kernel may handle them as timespec64).
> 
> For compat support on a 64-bit kernel, we may need to support both
> 32-bit time_t via compat_timespec and a 64-bit time_t via a new
> compat_timespec64. In case of AArch64 ILP32, any timespec syscall should
> be routed directly to the corresponding compat_timespec64 handlers as we
> define a 64-bit time_t.
> 
> For new 32-bit native architectures (no compat layer), we may want to
> enforce a 64-bit time_t from the beginning.
> 
> Anyway, since AArch64 ILP32 does not have a legacy ABI with 32-bit
> time_t, we can start implementing it independently of the additional
> syscalls for 32-bit timespec64. Eventually, the same code path will be
> used for legacy 32-bit with the new 64-bit time_t syscalls.

This sounds right.

> > > The solution (for new ports) could be similar to the other such
> > > solutions in the compat layer. A kernel internal structure which is
> > > binary-compatible with the ILP32 user one (as exported by the kernel):
> > >
> > > struct ilp32_timespec_kernel_internal_only {
> > > __kernel_time_t tv_sec; /* seconds */
> > > int tv_nsec; /* nanoseconds */
> > > };
> > >
> > > and a syscall wrapper which converts between ilp32_timespec and timespec
> > > (take compat_sys_clock_settime as an example).
> > 
> > We then have to do this on all architectures, and not call it ilp32_timespec,
> > but call it something else.
> > 
> > I would much prefer to only have two versions of each syscall that takes a
> > timespec rather than three versions, or having a version that behaves
> > differently based on the type of program calling it. On native 32-bit
> > systems, we should have the native syscall taking the 16-byte structure
> > (using long long __kernel_time64_t)
> 
> Can this also be 12 bytes in general if tv_nsec stays as 32-bit? The
> size of such structure would be 16 bytes on ARM but I guess this depends
> on the long long alignment requirements on specific architectures.

The only archs with modern relevance I'm aware of where 64-bit types
are not aligned are i386 and, by a regrettable but hard-to-fix mistake,
or1k. I don't have much opinion on whether the 64-bit-time_t timespec
should be 12 bytes or 16 bytes on such archs. From my perspective it's
a new ABI anyway so I'd like to be able to fix the 64-bit alignment
issue at the same time, in which case the question would go away, but
I'm sure others (glibc) will prefer a more transitional approach with
symbol versioning or feature test macros or something.

> > along with the compatibility syscall with a 8-byte structure for
> > existing applications.
> > 
> > On 64-bit systems, the same syscall source can be used for the normal 16-byte
> > structure on native 64-bit tasks, ilp32 tasks (x32, aarch64-32), and future
> > compat32 (i386, aarch32, ...) tasks, while the syscall for the 8-byte structure
> > deals with legacy compat32 tasks that do not yet use __kernel_time64_t.
> 
> We could do with two syscalls but, as you said, we need some padding and
> zeroing when the sizeof(time_t) != sizeof(long).
> 
> > > If the user structure has some padding (and as I've read in this thread
> > > it is allowed), it could be even easier for the kernel. The padding
> > > could be 32-bit before or after tv_nsec, depending on endianness.
> > 
> > The problem as pointed out before is that if you do this, 32-bit tasks
> > need to have the padding word zeroed at some stage for data passed into
> > the kernel, while 64-bit tasks need to return an error if the upper half
> > of the tv_nsec word is nonzero, at least for interfaces that are documented
> > to do this.
> > 
> > This can be done either in the kernel or in the libc.
> 
> I think this should be in the kernel as user is allowed to invoke
> syscalls directly outside the libc wrappers.

I agree, but see details below on why.

> > In the kernel, it comes down to a function like
> > 
> > int get_user_timespec64(struct timespec64 *ts, struct __kernel_timespec64 __user
> > *uts, bool task_32bit)
> > {
> >        struct __kernel_timespec64 input;
> > 
> >        if (copy_from_user(&input, uts, sizeof(input)))
> >               return -EFAULT;
> > 
> >        ts->tv_sec = input.tv_sec;
> >        if (task_32bit)
> >                ts->tv_nsec = (int)input.tv_nsec;
> >        else
> >                ts->tv_nsec = input.tv_nsec;
> > 
> >        return 0;
> > }
> 
> The only drawback is that native 64-bit and new 32-bit have the same
> handling path, potentially slowing down the former (it may not be
> noticeable).

Offhand, I would not consider a single predictable branch on syscall
entry or return to be noticeable relative to general syscall overhead.

> > The data structure definition is a little bit fragile, as it depends on
> > user space not using the __BIG_ENDIAN symbol in a conflicting way. So
> > far we have managed to keep that outside of general purpose headers, but
> > it should at least blow up in an obvious way if it does, rather than
> > breaking silently.
> > 
> > I still think it's more practical to keep the zeroing in user space though.
> > In that case, we keep defining __kernel_timespec64 with a 'typedef long
> > long __kernel_snseconds_t', and it's up to the libc to either use
> > __kernel_timespec64 as its timespec, or to define a C11-compliant
> > timespec itself and zero out the bits before passing the data to the kernel.
> 
> The problem with doing this in user space is syscall(2). If we don't
> allow it, then it's fine to do the padding in libc.

It's already the case that callers have to tiptoe around syscall(2)
usage on a per-arch basis for silly things like the convention for
passing 64-bit arguments on 32-bit archs, different arg orders to work
around 64-bit alignment and issues with too many args, and various
legacy issues. So I think manual use of syscall(2) is a less-critical
issue, though of course from a libc perspective I would very much like
for the kernel to handle it right.

What is important, on the other hand, is how timespec creeps into
other things. It's a member of lots of other important structs used
for communication with the kernel, some of which libc can't be aware
of -- things like ioctls, socket options, etc. where kernel device
drivers and network protocols, etc. may add new ones that libc isn't
aware of. IMO these are the most compelling reason to ask that the
kernel handle accepting timespecs in the proper userspace ABI form.

Rich
Catalin Marinas Feb. 13, 2015, 5:33 p.m. UTC | #5
On Fri, Feb 13, 2015 at 11:30:13AM -0500, Rich Felker wrote:
> On Fri, Feb 13, 2015 at 01:33:56PM +0000, Catalin Marinas wrote:
> > On Thu, Feb 12, 2015 at 07:59:24PM +0100, Arnd Bergmann wrote:
> > > On 12 February 2015 at 19:17, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > The solution (for new ports) could be similar to the other such
> > > > solutions in the compat layer. A kernel internal structure which is
> > > > binary-compatible with the ILP32 user one (as exported by the kernel):
> > > >
> > > > struct ilp32_timespec_kernel_internal_only {
> > > > __kernel_time_t tv_sec; /* seconds */
> > > > int tv_nsec; /* nanoseconds */
> > > > };
> > > >
> > > > and a syscall wrapper which converts between ilp32_timespec and timespec
> > > > (take compat_sys_clock_settime as an example).
> > > 
> > > We then have to do this on all architectures, and not call it ilp32_timespec,
> > > but call it something else.
> > > 
> > > I would much prefer to only have two versions of each syscall that takes a
> > > timespec rather than three versions, or having a version that behaves
> > > differently based on the type of program calling it. On native 32-bit
> > > systems, we should have the native syscall taking the 16-byte structure
> > > (using long long __kernel_time64_t)
> > 
> > Can this also be 12 bytes in general if tv_nsec stays as 32-bit? The
> > size of such a structure would be 16 bytes on ARM but I guess this depends
> > on the long long alignment requirements of specific architectures.
> 
> The only archs with modern relevance I'm aware of where 64-bit types
> are not aligned are i386 and, by a regrettable but hard-to-fix mistake,
> or1k. I don't have much opinion on whether the 64-bit-time_t timespec
> should be 12 bytes or 16 bytes on such archs. From my perspective it's
> a new ABI anyway so I'd like to be able to fix the 64-bit alignment
> issue at the same time, in which case the question would go away, but
> I'm sure others (glibc) will prefer a more transitional approach with
> symbol versioning or feature test macros or something.

The good thing about a 16-byte timespec64 with appropriate
(endianness-aware) struct padding is that the kernel can write tv_nsec
to user space as a 64-bit value (long on a 64-bit kernel). Only when
reading from user space does the 32-bit value need to be sign-extended
into the kernel structure.
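
[Editor's sketch, not part of the original message. The padded layout
discussed here can be modelled in user-space C roughly as follows. The
names sketch_timespec64, sketch_kernel_timespec64, pad, sketch_put and
sketch_get are hypothetical, and relying on the compiler-provided
__BYTE_ORDER__ macro is an assumption; the thread does not settle on
this exact definition.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte layout as seen by a 32-bit (ILP32) program.
 * tv_nsec is 32 bits wide; the explicit padding occupies the bytes
 * where the upper half of a 64-bit tv_nsec would live, so its position
 * depends on endianness. */
struct sketch_timespec64 {
    int64_t tv_sec;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    int32_t pad;      /* upper half comes first on big-endian */
    int32_t tv_nsec;
#else
    int32_t tv_nsec;
    int32_t pad;      /* upper half comes last on little-endian */
#endif
};

/* Hypothetical native layout used inside a 64-bit kernel. */
struct sketch_kernel_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

static_assert(sizeof(struct sketch_timespec64) == 16, "16-byte layout");
static_assert(sizeof(struct sketch_timespec64) ==
              sizeof(struct sketch_kernel_timespec64), "sizes must match");

/* Kernel to user: tv_nsec is written as a full 64-bit value. The
 * 32-bit view then finds the low half in tv_nsec and zero in pad. */
static struct sketch_timespec64 sketch_put(int64_t sec, int64_t nsec)
{
    struct sketch_kernel_timespec64 k = { sec, nsec };
    struct sketch_timespec64 u;
    memcpy(&u, &k, sizeof u);
    return u;
}

/* User to kernel: pad may hold garbage, so only the 32-bit field is
 * read and then sign-extended. */
static struct sketch_kernel_timespec64
sketch_get(const struct sketch_timespec64 *u)
{
    struct sketch_kernel_timespec64 k = { u->tv_sec, (int64_t)u->tv_nsec };
    return k;
}
```

[In this model only sketch_get needs to know that the caller is a
32-bit task; sketch_put is the same plain 16-byte copy for both kinds
of task.]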

> > > In the kernel, it comes down to a function like
> > > 
> > > int get_user_timespec64(struct timespec64 *ts, struct __kernel_timespec64 __user
> > > *uts, bool task_32bit)
> > > {
> > >        struct __kernel_timespec64 input;
> > > 
> > >        if (copy_from_user(&input, uts, sizeof(input)))
> > >               return -EFAULT;
> > > 
> > >        ts->tv_sec = input.tv_sec;
> > >        if (task_32bit)
> > >                ts->tv_nsec = (int)input.tv_nsec;
> > >        else
> > >                ts->tv_nsec = input.tv_nsec;
> > > 
> > >        return 0;
> > > }
> > 
> > The only drawback is that native 64-bit and new 32-bit have the same
> > handling path, potentially slowing down the former (it may not be
> > noticeable).
> 
> Offhand, I would not consider a single predictable branch on syscall
> entry or return to be noticeable relative to general syscall overhead.

It's not just a check+branch but accessing some TIF flag which requires
reading the current_thread_info()->flags and testing it. It is probably
lost in the noise, unless you do such calls in a loop where you may notice
a slight variation (it depends on the branch predictor as well; on some
architectures we may be able to make use of unlikely(task_32bit)).

> > > The data structure definition is a little bit fragile, as it depends on
> > > user space not using the __BIT_ENDIAN symbol in a conflicting way. So
> > > far we have managed to keep that outside of general purpose headers, but
> > > it should at least blow up in an obvious way if it does, rather than
> > > breaking silently.
> > > 
> > > I still think it's more practical to keep the zeroing in user space though.
> > > In that case, we keep defining __kernel_timespec64 with a 'typedef long
> > > long __kernel_snseconds_t', and it's up to the libc to either use
> > > __kernel_timespec64 as its timespec, or to define a C11-compliant
> > > timespec itself and zero out the bits before passing the data to the kernel.
> > 
> > The problem with doing this in user space is syscall(2). If we don't
> > allow it, then it's fine to do the padding in libc.
> 
> It's already the case that callers have to tiptoe around syscall(2)
> usage on a per-arch basis for silly things like the convention for
> passing 64-bit arguments on 32-bit archs, different arg orders to work
> around 64-bit alignment and issues with too many args, and various
> legacy issues. So I think manual use of syscall(2) is a less-critical
> issue, though of course from a libc perspective I would very much like
> for the kernel to handle it right.

I think there is another problem with sign-extending tv_nsec in libc.
The prototypes for functions like clock_settime(2) take a const struct
timespec *. There isn't anything to prevent such a structure being in a
read-only section, even though it is unlikely. So libc would have to
duplicate the structure rather than just sign-extending tv_nsec in
place.
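
[Editor's sketch, not part of the original message. The copy that libc
would be forced to make might look roughly like this. Every sketch_*
name is hypothetical, and sketch_fake_syscall merely stands in for the
real system call so that the fragment can run anywhere.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical kernel-side layout with a 64-bit nanoseconds field. */
struct sketch_kernel_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Build the kernel-format value in a fresh object. The caller's const
 * object, which may live in a read-only section, is never written. */
static struct sketch_kernel_timespec64
sketch_to_kernel(const struct timespec *ts)
{
    struct sketch_kernel_timespec64 k;
    assert(ts != NULL);
    k.tv_sec = ts->tv_sec;
    k.tv_nsec = ts->tv_nsec;  /* widening a signed long sign-extends */
    return k;
}

/* Stand-in for the kernel: only validates the nanoseconds range. */
static int sketch_fake_syscall(const struct sketch_kernel_timespec64 *k)
{
    return (k->tv_nsec >= 0 && k->tv_nsec <= 999999999) ? 0 : -1;
}

/* Shape of a libc wrapper: convert on the stack, then pass the copy. */
static int sketch_clock_settime(const struct timespec *ts)
{
    struct sketch_kernel_timespec64 k = sketch_to_kernel(ts);
    return sketch_fake_syscall(&k);
}
```

[The cost is one extra 16-byte stack object and copy per call, which is
the duplication referred to above.]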

BTW, I'll be offline for a week (holiday) and I won't be able to follow
up on this thread.
Rich Felker Feb. 13, 2015, 6:37 p.m. UTC | #6
On Fri, Feb 13, 2015 at 05:33:46PM +0000, Catalin Marinas wrote:
> > > > The data structure definition is a little bit fragile, as it depends on
> > > > user space not using the __BIT_ENDIAN symbol in a conflicting way. So
> > > > far we have managed to keep that outside of general purpose headers, but
> > > > it should at least blow up in an obvious way if it does, rather than
> > > > breaking silently.
> > > > 
> > > > I still think it's more practical to keep the zeroing in user space though.
> > > > In that case, we keep defining __kernel_timespec64 with a 'typedef long
> > > > long __kernel_snseconds_t', and it's up to the libc to either use
> > > > __kernel_timespec64 as its timespec, or to define a C11-compliant
> > > > timespec itself and zero out the bits before passing the data to the kernel.
> > > 
> > > The problem with doing this in user space is syscall(2). If we don't
> > > allow it, then it's fine to do the padding in libc.
> > 
> > It's already the case that callers have to tiptoe around syscall(2)
> > usage on a per-arch basis for silly things like the convention for
> > passing 64-bit arguments on 32-bit archs, different arg orders to work
> > around 64-bit alignment and issues with too many args, and various
> > legacy issues. So I think manual use of syscall(2) is a less-critical
> > issue, though of course from a libc perspective I would very much like
> > for the kernel to handle it right.
> 
> I think there is another problem with sign-extending tv_nsec in libc.
> The prototype for functions like clock_settime(2) take a const struct
> timespec *. There isn't anything to prevent such structure being in a
> read-only section, even though it is unlikely. So libc would have to
> duplicate the structure rather than just sign-extending tv_nsec in
> place.

Yes, we already have to do this for x32 in musl. I'd rather not have
to do the same for aarch64-ILP32.

Rich
Arnd Bergmann Feb. 16, 2015, 2:40 p.m. UTC | #7
On Friday 13 February 2015 13:37:07 Rich Felker wrote:
> On Fri, Feb 13, 2015 at 05:33:46PM +0000, Catalin Marinas wrote:
> > > > > The data structure definition is a little bit fragile, as it depends on
> > > > > user space not using the __BIT_ENDIAN symbol in a conflicting way. So
> > > > > far we have managed to keep that outside of general purpose headers, but
> > > > > it should at least blow up in an obvious way if it does, rather than
> > > > > breaking silently.
> > > > > 
> > > > > I still think it's more practical to keep the zeroing in user space though.
> > > > > In that case, we keep defining __kernel_timespec64 with a 'typedef long
> > > > > long __kernel_snseconds_t', and it's up to the libc to either use
> > > > > __kernel_timespec64 as its timespec, or to define a C11-compliant
> > > > > timespec itself and zero out the bits before passing the data to the kernel.
> > > > 
> > > > The problem with doing this in user space is syscall(2). If we don't
> > > > allow it, then it's fine to do the padding in libc.
> > > 
> > > It's already the case that callers have to tiptoe around syscall(2)
> > > usage on a per-arch basis for silly things like the convention for
> > > passing 64-bit arguments on 32-bit archs, different arg orders to work
> > > around 64-bit alignment and issues with too many args, and various
> > > legacy issues.

Right. If one wants to use syscall(), they have to know exactly what the
kernel's calling conventions are, including knowing what the timespec
definition looks like, which could have a different size and padding
compared to the user space one.

> > I think there is another problem with sign-extending tv_nsec in libc.
> > The prototype for functions like clock_settime(2) take a const struct
> > timespec *. There isn't anything to prevent such structure being in a
> > read-only section, even though it is unlikely. So libc would have to
> > duplicate the structure rather than just sign-extending tv_nsec in
> > place.

Do we actually need sign-extend, or does zero-extend have the exact
same effect? As far as I can tell, all invalid nanosecond values
remain invalid, and the accepted values are unchanged regardless
of which type extension gets used.
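
[Editor's sketch, not part of the original message. The claim can be
spot-checked in user space. The sketch_* helpers are hypothetical, and
the range check [0, 999999999] is taken from the timespec definition
quoted earlier in the thread. Converting an out-of-range uint32_t to
int32_t is implementation-defined in C; the sketch assumes the usual
two's-complement wrap-around.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Range check applied to tv_nsec after widening to 64 bits. */
static bool sketch_nsec_valid(int64_t nsec)
{
    return nsec >= 0 && nsec <= 999999999;
}

/* The two candidate ways to widen a 32-bit user value. */
static int64_t sketch_sign_extend(uint32_t raw)
{
    return (int64_t)(int32_t)raw;   /* assumes two's-complement wrap */
}

static int64_t sketch_zero_extend(uint32_t raw)
{
    return (int64_t)raw;
}

/* True if both extensions lead to the same accept/reject decision. */
static bool sketch_same_verdict(uint32_t raw)
{
    return sketch_nsec_valid(sketch_sign_extend(raw)) ==
           sketch_nsec_valid(sketch_zero_extend(raw));
}

/* Check the boundary values where the two extensions could disagree. */
static void sketch_check_boundaries(void)
{
    const uint32_t samples[] = {
        0u, 999999999u, 1000000000u, 0x7fffffffu, 0x80000000u, 0xffffffffu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(sketch_same_verdict(samples[i]));
}
```

[The verdict of a plain range check is the same either way, but the
widened values themselves differ for inputs with the top bit set (for
example -1 versus 4294967295), which only matters to an interface that
gives specific out-of-range values a meaning.]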

> Yes, we already have to do this for x32 in musl. I'd rather not have
> to do the same for aarch64-ILP32.

This would of course be solved by using a 64-bit __kernel_snseconds_t
or snseconds_t, and I suspect other libc implementations would just do
that, when they are less strict about POSIX/C11 compliance compared
to musl.

If you don't mind the (slight) distraction, can you describe what your
plans are for handling 64-bit time_t on the existing 32-bit ABIs?
I'm involved in both the efforts to do that and the ilp32 code on
ARM, so it would be good for me to understand your plans for musl to
get the bigger picture. Specifically, which of these do you plan
to support (if you know already):

- using 64-bit time_t on future arm32/i386/... kernels
- using 64-bit time_t on existing arm32/i386/... kernels with native
  32-bit time_t
- using 32-bit time_t on future architectures that only support 64-bit
  time_t in the kernel
- running existing binaries with 32-bit time_t on a library with 64-bit
  time_t support, using symbol versioning
- compiling new code with 32-bit time_t against a library that supports
  both 32-bit and 64-bit time_t at runtime.
- building a libc for existing architectures but without support for
  running existing 32-bit time_t applications.

	Arnd
Rich Felker Feb. 16, 2015, 3:38 p.m. UTC | #8
On Mon, Feb 16, 2015 at 03:40:54PM +0100, Arnd Bergmann wrote:
> On Friday 13 February 2015 13:37:07 Rich Felker wrote:
> > On Fri, Feb 13, 2015 at 05:33:46PM +0000, Catalin Marinas wrote:
> > > > > > The data structure definition is a little bit fragile, as it depends on
> > > > > > user space not using the __BIT_ENDIAN symbol in a conflicting way. So
> > > > > > far we have managed to keep that outside of general purpose headers, but
> > > > > > it should at least blow up in an obvious way if it does, rather than
> > > > > > breaking silently.
> > > > > > 
> > > > > > I still think it's more practical to keep the zeroing in user space though.
> > > > > > In that case, we keep defining __kernel_timespec64 with a 'typedef long
> > > > > > long __kernel_snseconds_t', and it's up to the libc to either use
> > > > > > __kernel_timespec64 as its timespec, or to define a C11-compliant
> > > > > > timespec itself and zero out the bits before passing the data to the kernel.
> > > > > 
> > > > > The problem with doing this in user space is syscall(2). If we don't
> > > > > allow it, then it's fine to do the padding in libc.
> > > > 
> > > > It's already the case that callers have to tiptoe around syscall(2)
> > > > usage on a per-arch basis for silly things like the convention for
> > > > passing 64-bit arguments on 32-bit archs, different arg orders to work
> > > > around 64-bit alignment and issues with too many args, and various
> > > > legacy issues.
> 
> Right. If one wants to use syscall(), they have to know exactly what the
> kernel's calling conventions are, including knowing what the timespec
> definition looks like, which could have a different size and padding
> compared to the user space one.
> 
> > > I think there is another problem with sign-extending tv_nsec in libc.
> > > The prototype for functions like clock_settime(2) take a const struct
> > > timespec *. There isn't anything to prevent such structure being in a
> > > read-only section, even though it is unlikely. So libc would have to
> > > duplicate the structure rather than just sign-extending tv_nsec in
> > > place.
> 
> Do we actually need sign-extend, or does zero-extend have the exact
> same effect? For all I can tell, all invalid nanoseconds values
> remain invalid, and the accepted values are unchanged regardless
> of which type extension gets used.

I think it matters for futimens/utimensat, which have some special negative
codes you can store in tv_nsec, but perhaps there's an easy trick to
distinguish them even with zero extending.

> > Yes, we already have to do this for x32 in musl. I'd rather not have
> > to do the same for aarch64-ILP32.
> 
> This would of course be solved by using a 64-bit __kernel_snseconds_t
> or snseconds_t, and I suspect other libc implementations would just do
> that, when they are less strict about posix/c11 compliance compared
> to musl.

I think they would be more strict if this were for a target that
actually sees use and they were getting bug reports from C programmers
annoyed that their code was not working correctly or not even
compiling. AFAIK there are no distros based on x32 now and it's
something of an alternate model on x86_64 distros that some people are
playing around with.

> If you don't mind the (slight) distraction, can you describe what your
> plans are for handling 64-bit time_t on the existing 32-bit ABIs?
> I'm involved in both the efforts to do that and the ilp32 code on
> ARM, so it would be good for me to understand your plans for musl to
> get the bigger picture. Specifically, which of these do you plan
> to support (if you know already):

It largely depends on whether there's demand. If we have users who want to
run 32-bit systems with an ABI that will survive Y2038, it will be
supported, but as a new ABI for these targets. This will likely allow
fixing other ABI issues at the same time -- for example, on i386 I
would probably switch to mandating SSE2 for floating point, and
possibly using regparm everywhere. There are a couple of different
ways it could be done though:

1. On a per-arch basis, defining a new ABI variant for the arch.

2. With a new abstraction at the syscall boundary to get rid of all
kernel-arch-specific structures in userspace and redefine all types to
have plenty of room for growth.

In regards to your specific questions about ways it could be done:

> - using 64-bit time_t on future arm32/i386/... kernels
> - using 64-bit time_t on existing arm32/i386/... kernels with native
>   32-bit time_t

If the former is supported, I would think we'd want to support the
latter too. An ABI that only works on very-new kernels is very
restrictive in who can use it. Kernel support hardly matters (until
Y2038 actually arrives); the point of 64-bit time_t is to have an ABI
that's _ready_ for it so existing binaries can keep working.

> - using 32-bit time_t on future architectures that only support 64-bit
>   time_t in the kernel

Definitely will not be supported. Introducing a new ABI with 32-bit
time_t is a huge mistake, and the only reason it's been done for some
of the new targets musl supports is because the kernel does it, and
working around a mismatch between kernel and user time_t is a huge
problem -- all sorts of things, including for example struct stat,
depend on the time_t definition, and if you're going to allow mismatch
with kernel you might as well go ahead and have a full translation
layer for kernel structs like this.

> - running existing binaries with 32-bit time_t on a library with 64-bit
>   time_t support, using symbol versioning

Symbol versions don't solve any problem, and they mask dangerous bugs,
so no. The problem is that a symbol version is only able to represent
a single interface boundary (between a caller and libc), not all the
other interface boundaries between third-party libraries. If code
compiled for 32-bit time_t calls into code that uses 64-bit time_t
with a time_t* argument and the callee writes back a result, it's
corrupted the caller's memory. Symbol versions have no way to diagnose
this.

They're also bound at ld-time, whereas the choice of needed version
depends on compile-time (which definitions were used in the header the
code was compiled against).
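
[Editor's sketch, not part of the original message. The corruption
scenario can be modelled safely in portable C by letting the "caller"
own an 8-byte struct whose first half is its 32-bit time value. All
sketch_* names and the constant stored by the callee are hypothetical.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What a caller built with 32-bit time_t believes it owns. */
struct sketch_caller_frame {
    int32_t when;        /* the caller's idea of a time_t object */
    int32_t neighbour;   /* unrelated data stored right after it */
};

static_assert(sizeof(struct sketch_caller_frame) == 8,
              "model assumes two adjacent 32-bit slots");

/* A callee built with 64-bit time_t: it stores 8 bytes through the
 * pointer it was given, whatever the caller thinks the size is. */
static void sketch_callee_get_time(void *out)
{
    int64_t now = INT64_C(0x0000000100000002);
    memcpy(out, &now, sizeof now);
}

/* Returns the value of the neighbouring slot after the call. In real
 * code the argument would be &f.when; passing &f keeps this model
 * well-defined while writing to exactly the same bytes. */
static int32_t sketch_neighbour_after_call(void)
{
    struct sketch_caller_frame f = { 0, 0x55555555 };
    sketch_callee_get_time(&f);
    return f.neighbour;
}
```

[Nothing in the symbol name or version of the callee tells the linker
that the two sides disagree about the pointed-to size, so the overwrite
of neighbour goes undiagnosed.]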

> - compiling new code with 32-bit time_t against a library that supports
>   both 32-bit and 64-bit time_t at runtime.

No; see above.

> - building a libc for existing architectures but without support for
>   running existing 32-bit time_t applications.

Yes; this would be the way a new ABI would always work. But since musl
inherently supports multi-arch (each arch variant has its own
PT_INTERP name and library path config) you can easily run both types
of binaries on the same system. They just need completely separate
library ecosystems. This is the only way I know to prevent the
dangerous issues that arise with other [non-]solutions like symbol
versioning or feature test macros (as in -D_FILE_OFFSET_BITS=64) for
the problem.

Rich
Arnd Bergmann Feb. 16, 2015, 4:54 p.m. UTC | #9
On Monday 16 February 2015 10:38:18 Rich Felker wrote:
> On Mon, Feb 16, 2015 at 03:40:54PM +0100, Arnd Bergmann wrote:
> > On Friday 13 February 2015 13:37:07 Rich Felker wrote:
> > > On Fri, Feb 13, 2015 at 05:33:46PM +0000, Catalin Marinas wrote:
> > > > I think there is another problem with sign-extending tv_nsec in libc.
> > > > The prototype for functions like clock_settime(2) take a const struct
> > > > timespec *. There isn't anything to prevent such structure being in a
> > > > read-only section, even though it is unlikely. So libc would have to
> > > > duplicate the structure rather than just sign-extending tv_nsec in
> > > > place.
> > 
> > Do we actually need sign-extend, or does zero-extend have the exact
> > same effect? For all I can tell, all invalid nanoseconds values
> > remain invalid, and the accepted values are unchanged regardless
> > of which type extension gets used.
> 
> I think it matters for futimensat which has some special negative
> codes you can store in tv_nsec, but perhaps there's an easy trick to
> distinguish them even with zero extending.

Ah, good point.

> > > Yes, we already have to do this for x32 in musl. I'd rather not have
> > > to do the same for aarch64-ILP32.
> > 
> > This would of course be solved by using a 64-bit __kernel_snseconds_t
> > or snseconds_t, and I suspect other libc implementations would just do
> > that, when they are less strict about posix/c11 compliance compared
> > to musl.
> 
> I think they would be more strict if this were for a target that
> actually sees use and they were getting bug reports from C programmers
> annoyed that their code was not working correctly or not even
> compiling. AFAIK there are no distros based on x32 now and it's
> something of an alternate model on x86_64 distros that some people are
> playing around with.

I would expect to see much more build breakage and runtime problems
from going to 64-bit time_t than from anything accessing the tv_nsec
field.

I'd also like to hear opinions from other libc maintainers on this.

> > If you don't mind the (slight) distraction, can you describe what your
> > plans are for handling 64-bit time_t on the existing 32-bit ABIs?
> > I'm involved in both the efforts to do that and the ilp32 code on
> > ARM, so it would be good for me to understand your plans for musl to
> > get the bigger picture. Specifically, which of these do you plan
> > to support (if you know already):
> 
> It largely depends on if there's demand. If we have users who want to
> run 32-bit systems with an ABI that will survive Y2038, it will be
> supported, but as a new ABI for these targets. This will likely allow
> fixing other ABI issues at the same time -- for example, on i386 I
> would probably switch to mandating SSE2 for floating point, and
> possibly using regparm everywhere.

I see. Note that in case of i386, the main use case would be
embedded systems, so while using regparm works, mandating anything
that is not part of the Quark SoC (MMX, SSE, CMOV, ...) might
be counterproductive.

Regarding ARM, you can probably go more modern, though, and
require armv7-hardfloat, if you don't do that already.

> There are a couple of different ways it could be done though:
>
> 1. On a per-arch basis, defining a new ABI variant for the arch.
> 
> 2. With a new abstraction at the syscall boundary to get rid of all
> kernel-arch-specific structures in userspace and redefine all types to
> have plenty of room for growth.

We currently plan to change the kernel for all 32-bit
architectures to support the new ABI everywhere, in order to
avoid special casing rarely used architectures, which would
in turn be a potential source for bugs.

We will only add system calls with 64-bit time_t that don't have
a replacement already. So e.g. according to the current plan, there
won't be a time(2) or gettimeofday(2) system call with 64-bit
time_t in the kernel, but we expect the C library to implement
these through clock_gettime(2).
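
[Editor's sketch, not part of the original message. Layering the older
interfaces on clock_gettime could look roughly like this in a C
library. sketch_time and sketch_gettimeofday are hypothetical names,
and the timezone argument of gettimeofday is left out for brevity.]

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* time(2) replacement: seconds since the epoch from CLOCK_REALTIME. */
static time_t sketch_time(time_t *out)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return (time_t)-1;
    if (out != NULL)
        *out = ts.tv_sec;
    return ts.tv_sec;
}

/* gettimeofday(2) replacement: same clock, nanoseconds scaled down to
 * microseconds. */
static int sketch_gettimeofday(struct timeval *tv)
{
    struct timespec ts;
    assert(tv != NULL);
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    tv->tv_sec = ts.tv_sec;
    tv->tv_usec = ts.tv_nsec / 1000;
    return 0;
}
```

[With this shape, only clock_gettime itself needs a 64-bit time_t
variant in the kernel; the wrappers inherit the wider type from struct
timespec.]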

> In regards to your specific questions about ways it could be done:
> 
> > - using 64-bit time_t on future arm32/i386/... kernels
> > - using 64-bit time_t on existing arm32/i386/... kernels with native
> >   32-bit time_t
> 
> If the former is supported, I would think we'd want to support the
> latter too. An ABI that only works on very-new kernels is very
> restrictive in who can use it. Kernel support hardly matters (until
> Y2038 actually arrives); the point of 64-bit time_t is to have an ABI
> that's _ready_ for it so existing binaries can keep working.

Ok.

> > - using 32-bit time_t on future architectures that only support 64-bit
> >   time_t in the kernel
> 
> Definitely will not be supported. Introducing a new ABI with 32-bit
> time_t is a huge mistake, and the only reason it's been done for some
> of the new targets musl supports is because the kernel does it, and
> working around a mismatch between kernel and user time_t is a huge
> problem -- all sorts of things, including for example struct stat,
> depend on the time_t definition, and if you're going to allow mismatch
> with kernel you might as well go ahead and have a full translation
> layer for kernel structs like this.

Makes sense. I hope we can do the same for all libc implementations.

> > - running existing binaries with 32-bit time_t on a library with 64-bit
> >   time_t support, using symbol versioning
> 
> Symbol versions don't solve any problem, and they mask dangerous bugs,
> so no. The problem is that a symbol version is only able to represent
> a single interface boundary (between a caller and libc), not all the
> other interface boundaries between third-party libraries. If code
> compiled for 32-bit time_t calls into code that uses 64-bit time_t
> with a time_t* argument and the callee writes back a result, it's
> corrupted the caller's memory. Symbol versions have no way to diagnose
> this.
> 
> They're also bound at ld-time, whereas the choice of needed version
> depends on compile-time (which definitions were used in the header the
> code was compiled against).

Ok, good. This differs from the approach taken by glibc though,
as they plan to use symbol versioning for this, like it was used
for off_t interfaces.

Introducing a new ABI like this definitely helps us prototype
the entire environment when doing the kernel port for the new
syscalls; this should be a lot easier to do than with glibc and all
its combinations of backwards compatibility.

> > - building a libc for existing architectures but without support for
> >   running existing 32-bit time_t applications.
> 
> Yes; this would be the way a new ABI would always work. But since musl
> inherently supports multi-arch (each arch variant has its own
> PT_INTERP name and library path config) you can easily run both types
> of binaries on the same system. They just need completely separate
> library ecosystems. This is the only way I know to prevent the
> dangerous issues that arise with other [non-]solutions like symbol
> versioning or feature test macros (as in -D_FILE_OFFSET_BITS=64) for
> the problem.

Ok, good. This will also help in case of embedded systems that want
to ensure that we use 64-bit time_t system-wide. I plan to add a
configuration option to the kernel to disallow all code that is
not y2038-safe (including ext3, NFSv3 and drivers with broken ioctls),
and it helps to have the same thing in user space. Of course the kernel
will by default have to support both ABIs.

Thanks a lot for your feedback.

	Arnd

Patch

--- arch/x86_64/bits/alltypes.h.in
+++ arch/x32/bits/alltypes.h.in
@@ -1,12 +1,12 @@ 
-#define _Addr long
-#define _Int64 long
-#define _Reg long
+#define _Addr int
+#define _Int64 long long
+#define _Reg long long
 
 TYPEDEF __builtin_va_list va_list;
 TYPEDEF __builtin_va_list __isoc_va_list;
 
 #ifndef __cplusplus
-TYPEDEF int wchar_t;
+TYPEDEF long wchar_t;
 #endif
 
 #if defined(__FLT_EVAL_METHOD__) && __FLT_EVAL_METHOD__ == 2
@@ -19,8 +19,8 @@ 
 
 TYPEDEF struct { long long __ll; long double __ld; } max_align_t;
 
-TYPEDEF long time_t;
-TYPEDEF long suseconds_t;
+TYPEDEF long long time_t;
+TYPEDEF long long suseconds_t;
 
 TYPEDEF struct { union { int __i[14]; unsigned long __s[7]; } __u; } pthread_attr_t;
 TYPEDEF struct { union { int __i[10]; volatile void *volatile __p[5]; } __u; } pthread_mutex_t;
--- arch/x86_64/bits/ipc.h
+++ arch/x32/bits/ipc.h
@@ -7,8 +7,8 @@ 
 	gid_t cgid;
 	mode_t mode;
 	int __ipc_perm_seq;
-	long __pad1;
-	long __pad2;
+	long long __pad1;
+	long long __pad2;
 };
 
 #define IPC_64 0
--- arch/x86_64/bits/limits.h
+++ arch/x32/bits/limits.h
@@ -1,8 +1,8 @@ 
 #if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
  || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
 #define PAGE_SIZE 4096
-#define LONG_BIT 64
+#define LONG_BIT 32
 #endif
 
-#define LONG_MAX  0x7fffffffffffffffL
+#define LONG_MAX  0x7fffffffL
 #define LLONG_MAX  0x7fffffffffffffffLL
--- arch/x86_64/bits/msg.h
+++ arch/x32/bits/msg.h
@@ -5,9 +5,12 @@ 
 	time_t msg_rtime;
 	time_t msg_ctime;
 	unsigned long msg_cbytes;
+	long __unused1;
 	msgqnum_t msg_qnum;
+	long __unused2;
 	msglen_t msg_qbytes;
+	long __unused3;
 	pid_t msg_lspid;
 	pid_t msg_lrpid;
-	unsigned long __unused[2];
+	unsigned long long __unused[2];
 };
--- arch/x86_64/bits/reg.h
+++ arch/x32/bits/reg.h
@@ -1,5 +1,5 @@ 
 #undef __WORDSIZE
-#define __WORDSIZE 64
+#define __WORDSIZE 32
 #define R15    0
 #define R14    1
 #define R13    2
--- arch/x86_64/bits/setjmp.h
+++ arch/x32/bits/setjmp.h
@@ -1 +1 @@ 
-typedef unsigned long __jmp_buf[8];
+typedef unsigned long long __jmp_buf[8];
--- arch/x86_64/bits/shm.h
+++ arch/x32/bits/shm.h
@@ -10,17 +10,24 @@ 
 	pid_t shm_cpid;
 	pid_t shm_lpid;
 	unsigned long shm_nattch;
-	unsigned long __pad1;
-	unsigned long __pad2;
+	unsigned long __pad0;
+	unsigned long long __pad1;
+	unsigned long long __pad2;
 };
 
 struct shminfo {
-	unsigned long shmmax, shmmin, shmmni, shmseg, shmall, __unused[4];
+	unsigned long shmmax, __pad0, shmmin, __pad1, shmmni, __pad2,
+	              shmseg, __pad3, shmall, __pad4;
+	unsigned long long __unused[4];
 };
 
 struct shm_info {
 	int __used_ids;
-	unsigned long shm_tot, shm_rss, shm_swp;
-	unsigned long __swap_attempts, __swap_successes;
-};
-
+	int __pad_ids;
+	unsigned long shm_tot, __pad0, shm_rss, __pad1, shm_swp, __pad2;
+	unsigned long __swap_attempts, __pad3, __swap_successes, __pad4;
+}
+#ifdef __GNUC__
+__attribute__((__aligned__(8)))
+#endif
+;
--- arch/x86_64/bits/signal.h
+++ arch/x32/bits/signal.h
@@ -42,12 +42,12 @@ 
 	unsigned padding[24];
 } *fpregset_t;
 struct sigcontext {
-	unsigned long r8, r9, r10, r11, r12, r13, r14, r15;
-	unsigned long rdi, rsi, rbp, rbx, rdx, rax, rcx, rsp, rip, eflags;
+	unsigned long long r8, r9, r10, r11, r12, r13, r14, r15;
+	unsigned long long rdi, rsi, rbp, rbx, rdx, rax, rcx, rsp, rip, eflags;
 	unsigned short cs, gs, fs, __pad0;
-	unsigned long err, trapno, oldmask, cr2;
+	unsigned long long err, trapno, oldmask, cr2;
 	struct _fpstate *fpstate;
-	unsigned long __reserved1[8];
+	unsigned long long __reserved1[8];
 };
 typedef struct {
 	gregset_t gregs;
@@ -56,7 +56,7 @@ 
 } mcontext_t;
 #else
 typedef struct {
-	unsigned long __space[32];
+	unsigned long long __space[32];
 } mcontext_t;
 #endif
 
@@ -72,7 +72,7 @@ 
 	stack_t uc_stack;
 	mcontext_t uc_mcontext;
 	sigset_t uc_sigmask;
-	unsigned long __fpregs_mem[64];
+	unsigned long long __fpregs_mem[64];
 } ucontext_t;
 
 #define SA_NOCLDSTOP  1
--- arch/x86_64/bits/stat.h
+++ arch/x32/bits/stat.h
@@ -18,5 +18,5 @@ 
 	struct timespec st_atim;
 	struct timespec st_mtim;
 	struct timespec st_ctim;
-	long __unused[3];
+	long long __unused[3];
 };
--- arch/x86_64/bits/statfs.h
+++ arch/x32/bits/statfs.h
@@ -1,7 +1,9 @@ 
 struct statfs {
-	unsigned long f_type, f_bsize;
+	unsigned long f_type, __pad0, f_bsize, __pad1;
 	fsblkcnt_t f_blocks, f_bfree, f_bavail;
 	fsfilcnt_t f_files, f_ffree;
 	fsid_t f_fsid;
-	unsigned long f_namelen, f_frsize, f_flags, f_spare[4];
+	unsigned long f_namelen, __pad2, f_frsize, __pad3;
+	unsigned long f_flags, __pad4;
+	unsigned long long f_spare[4];
 };
--- arch/x86_64/bits/stdint.h
+++ arch/x32/bits/stdint.h
@@ -12,9 +12,9 @@ 
 #define UINT_FAST16_MAX UINT32_MAX
 #define UINT_FAST32_MAX UINT32_MAX
 
-#define INTPTR_MIN      INT64_MIN
-#define INTPTR_MAX      INT64_MAX
-#define UINTPTR_MAX     UINT64_MAX
-#define PTRDIFF_MIN     INT64_MIN
-#define PTRDIFF_MAX     INT64_MAX
-#define SIZE_MAX        UINT64_MAX
+#define INTPTR_MIN      INT32_MIN
+#define INTPTR_MAX      INT32_MAX
+#define UINTPTR_MAX     UINT32_MAX
+#define PTRDIFF_MIN     INT32_MIN
+#define PTRDIFF_MAX     INT32_MAX
+#define SIZE_MAX        UINT32_MAX