[RFC,v2,08/20] sysdeps/wait: Use waitid if avaliable

Message ID 2df9d3878359585ac1cc46243fb6664f7a50b3b3.1561421042.git.alistair.francis@wdc.com
State New, archived
Headers

Commit Message

Alistair Francis June 25, 2019, 12:09 a.m. UTC
  If the waitid syscall is avaliable let's use that as waitpid
and wait4 aren't always avaliable (they aren't avaliable on RV32).

Unfortunately waitid is substantially differnt to waitpid and wait4, so
the conversion ends up being complex.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
---
 ChangeLog                                  |  3 ++
 sysdeps/unix/sysv/linux/wait.c             | 21 ++++++++-
 sysdeps/unix/sysv/linux/waitpid.c          | 54 ++++++++++++++++++++++
 sysdeps/unix/sysv/linux/waitpid_nocancel.c | 53 +++++++++++++++++++++
 4 files changed, 129 insertions(+), 2 deletions(-)
  

Comments

Andreas Schwab June 25, 2019, 8:16 a.m. UTC | #1
On Jun 24 2019, Alistair Francis <alistair.francis@wdc.com> wrote:

> +  if (options | WNOHANG) {
> +    waitid_options |= WNOHANG;
> +  }
> +  if (options | WUNTRACED) {
> +    waitid_options |= WSTOPPED;
> +  }
> +  if (options | WCONTINUED) {
> +    waitid_options |= WCONTINUED;
> +  }

Surely that's not going to work out.

Andreas.
  
Zack Weinberg June 25, 2019, 10:55 a.m. UTC | #2
On Mon, Jun 24, 2019 at 8:12 PM Alistair Francis
<alistair.francis@wdc.com> wrote:
>
> If the waitid syscall is avaliable let's use that as waitpid
> and wait4 aren't always avaliable (they aren't avaliable on RV32).

wait4 does something that can't be done any other way (retrieve
resource usage information), RV32 should have it.

zw
  
Florian Weimer June 25, 2019, 11:06 a.m. UTC | #3
* Zack Weinberg:

> On Mon, Jun 24, 2019 at 8:12 PM Alistair Francis
> <alistair.francis@wdc.com> wrote:
>>
>> If the waitid syscall is avaliable let's use that as waitpid
>> and wait4 aren't always avaliable (they aren't avaliable on RV32).
>
> wait4 does something that can't be done any other way (retrieve
> resource usage information), RV32 should have it.

I think the problem is that the kernel provides wait4 under a different
name because struct rusage has two fields of struct timeval.

I think this needs to be covered by the syscall renaming mechanism, to
get a working wait4, and not changes to wait.

Thanks,
Florian
  
Arnd Bergmann June 25, 2019, noon UTC | #4
On Tue, Jun 25, 2019 at 1:07 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Zack Weinberg:
>
> > On Mon, Jun 24, 2019 at 8:12 PM Alistair Francis
> > <alistair.francis@wdc.com> wrote:
> >>
> >> If the waitid syscall is avaliable let's use that as waitpid
> >> and wait4 aren't always avaliable (they aren't avaliable on RV32).
> >
> > wait4 does something that can't be done any other way (retrieve
> > resource usage information), RV32 should have it.
>
> I think the problem is that the kernel provides wait4 under a different
> name because struct rusage has two fields of struct timeval.
>
> I think this needs to be covered by the syscall renaming mechanism, to
> get a working wait4, and not changes to wait.

The kernel system call waitid() is a superset of glibc's wait4() and
waitid(), it has an extra rusage argument.

Originally, my plan was to replace kernel's waitid() with
a waitid_time64() that takes an updated rusage structure,
but that never happened.

I still left wait4() commented out since it should not be
needed when the kernel has waitid(). I have an implementation
of wait4() based on waitid() that I did for musl and tested
successfully with ltp, see [1].
Unless I did something very wrong there, you should be able
to use something like this in glibc.

Similar coversions of timeval have to be done in getrusage(),
getitimer() and setitimer(), all of which expect a 32-bit
timeval couting elapsed time (so no overflow in y2038).
These need to be converted to the 64-bit timeval in the
public glibc interface.

       Arnd

[1] https://git.linaro.org/people/arnd/musl-y2038.git/tree/src/linux/wait4.c
  
Florian Weimer June 25, 2019, 12:10 p.m. UTC | #5
* Arnd Bergmann:

> The kernel system call waitid() is a superset of glibc's wait4() and
> waitid(), it has an extra rusage argument.
>
> Originally, my plan was to replace kernel's waitid() with
> a waitid_time64() that takes an updated rusage structure,
> but that never happened.
>
> I still left wait4() commented out since it should not be
> needed when the kernel has waitid(). I have an implementation
> of wait4() based on waitid() that I did for musl and tested
> successfully with ltp, see [1].
> Unless I did something very wrong there, you should be able
> to use something like this in glibc.
>
> Similar coversions of timeval have to be done in getrusage(),
> getitimer() and setitimer(), all of which expect a 32-bit
> timeval couting elapsed time (so no overflow in y2038).
> These need to be converted to the 64-bit timeval in the
> public glibc interface.

Does this means that RV32 will use a 32-bit struct timeval in those
system calls?  Even if everything else 64-bit?

Thanks,
Florian
  
Arnd Bergmann June 25, 2019, 1:29 p.m. UTC | #6
On Tue, Jun 25, 2019 at 2:10 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Arnd Bergmann:
>
> > The kernel system call waitid() is a superset of glibc's wait4() and
> > waitid(), it has an extra rusage argument.
> >
> > Originally, my plan was to replace kernel's waitid() with
> > a waitid_time64() that takes an updated rusage structure,
> > but that never happened.
> >
> > I still left wait4() commented out since it should not be
> > needed when the kernel has waitid(). I have an implementation
> > of wait4() based on waitid() that I did for musl and tested
> > successfully with ltp, see [1].
> > Unless I did something very wrong there, you should be able
> > to use something like this in glibc.
> >
> > Similar coversions of timeval have to be done in getrusage(),
> > getitimer() and setitimer(), all of which expect a 32-bit
> > timeval couting elapsed time (so no overflow in y2038).
> > These need to be converted to the 64-bit timeval in the
> > public glibc interface.
>
> Does this means that RV32 will use a 32-bit struct timeval in those
> system calls?  Even if everything else 64-bit?

Correct. Only those four (all deprecated but still used) system calls,
as we could not agree on a new interface before 5.1, and there
is no urgency for deployment when they can be emulated.

I agree this is ugly, sorry for having dropped the mess into your
area instead of fixing it in the kernel.

        Arnd
  
Arnd Bergmann June 25, 2019, 1:39 p.m. UTC | #7
On Tue, Jun 25, 2019 at 3:29 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Jun 25, 2019 at 2:10 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> > Does this means that RV32 will use a 32-bit struct timeval in those
> > system calls?  Even if everything else 64-bit?
>
> Correct. Only those four (all deprecated but still used) system calls,
> as we could not agree on a new interface before 5.1, and there
> is no urgency for deployment when they can be emulated.

Correction: getrusage() is still a recommended interface in POSIX.1-2017
with no nanosecond based replacement, while wait4(), getitimer() and
getrusage() are all obsolete but cannot be implemented on top of other
POSIX system calls.

        Arnd
  
Florian Weimer June 25, 2019, 1:47 p.m. UTC | #8
* Arnd Bergmann:

> On Tue, Jun 25, 2019 at 3:29 PM Arnd Bergmann <arnd@arndb.de> wrote:
>> On Tue, Jun 25, 2019 at 2:10 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> > Does this means that RV32 will use a 32-bit struct timeval in those
>> > system calls?  Even if everything else 64-bit?
>>
>> Correct. Only those four (all deprecated but still used) system calls,
>> as we could not agree on a new interface before 5.1, and there
>> is no urgency for deployment when they can be emulated.
>
> Correction: getrusage() is still a recommended interface in POSIX.1-2017
> with no nanosecond based replacement, while wait4(), getitimer() and
> getrusage() are all obsolete but cannot be implemented on top of other
> POSIX system calls.

This makes me rather unhappy.  I also don't see the benefit of renaming
all time-related system calls for new architectures.  Oh well.  I hope
someone will figure out how to integrate this smoothly into glibc.

Thanks,
Florian
  
Arnd Bergmann June 25, 2019, 2:04 p.m. UTC | #9
On Tue, Jun 25, 2019 at 3:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Arnd Bergmann:
>
> > On Tue, Jun 25, 2019 at 3:29 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >> On Tue, Jun 25, 2019 at 2:10 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> > Does this means that RV32 will use a 32-bit struct timeval in those
> >> > system calls?  Even if everything else 64-bit?
> >>
> >> Correct. Only those four (all deprecated but still used) system calls,
> >> as we could not agree on a new interface before 5.1, and there
> >> is no urgency for deployment when they can be emulated.
> >
> > Correction: getrusage() is still a recommended interface in POSIX.1-2017
> > with no nanosecond based replacement, while wait4(), getitimer() and
> > getrusage() are all obsolete but cannot be implemented on top of other
> > POSIX system calls.
>
> This makes me rather unhappy.  I also don't see the benefit of renaming
> all time-related system calls for new architectures.

What got renamed? I was trying very hard to make it as consistent
and easy as possible. There is a strict mapping of __NR_* macros
to argument types for each system call [1], so e.g. __kernel_timespec
will always be used together with system calls named *time64(),
while the old system calls always refer to the traditional
"struct timespec {long tv_sec; long tv_nsec;}" type. This is the
same way that loff_t gets handled, so I assumed that all C libraries
would know how to deal with this well.

> Oh well.  I hope
> someone will figure out how to integrate this smoothly into glibc.

Integrating this should actually be easier than the other system calls
since you only need to implement the fallback path and not also the
call into the replacement. One major factor for not requiring replacements
for getrusage()/getitimer()/setitimer() first was actually that both glibc
and musl have plans to support old kernels and need to implement
the fallback path regardless of whether we implemented the new
version or not.

       Arnd

[1] ignoring the universally hated x32 ABI
  
Florian Weimer June 25, 2019, 2:08 p.m. UTC | #10
* Arnd Bergmann:

> On Tue, Jun 25, 2019 at 3:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Arnd Bergmann:
>>
>> > On Tue, Jun 25, 2019 at 3:29 PM Arnd Bergmann <arnd@arndb.de> wrote:
>> >> On Tue, Jun 25, 2019 at 2:10 PM Florian Weimer <fweimer@redhat.com> wrote:
>> >>
>> >> > Does this means that RV32 will use a 32-bit struct timeval in those
>> >> > system calls?  Even if everything else 64-bit?
>> >>
>> >> Correct. Only those four (all deprecated but still used) system calls,
>> >> as we could not agree on a new interface before 5.1, and there
>> >> is no urgency for deployment when they can be emulated.
>> >
>> > Correction: getrusage() is still a recommended interface in POSIX.1-2017
>> > with no nanosecond based replacement, while wait4(), getitimer() and
>> > getrusage() are all obsolete but cannot be implemented on top of other
>> > POSIX system calls.
>>
>> This makes me rather unhappy.  I also don't see the benefit of renaming
>> all time-related system calls for new architectures.
>
> What got renamed?

futex to futex_time64 on RV32.  <asm/unistd.h> seems to expose only the
latter.

> I was trying very hard to make it as consistent
> and easy as possible. There is a strict mapping of __NR_* macros
> to argument types for each system call [1], so e.g. __kernel_timespec
> will always be used together with system calls named *time64(),
> while the old system calls always refer to the traditional
> "struct timespec {long tv_sec; long tv_nsec;}" type. This is the
> same way that loff_t gets handled, so I assumed that all C libraries
> would know how to deal with this well.

I'm afraid, but this setup makes little sense if the old system call
does not exist for an architecture.  Instead it requires additional
porting effort.

Thanks,
Florian
  
Arnd Bergmann June 25, 2019, 2:21 p.m. UTC | #11
On Tue, Jun 25, 2019 at 4:08 PM Florian Weimer <fweimer@redhat.com> wrote:
> * Arnd Bergmann:
> > On Tue, Jun 25, 2019 at 3:47 PM Florian Weimer <fweimer@redhat.com> wrote:
> >> This makes me rather unhappy.  I also don't see the benefit of renaming
> >> all time-related system calls for new architectures.
> >
> > What got renamed?
>
> futex to futex_time64 on RV32.  <asm/unistd.h> seems to expose only the
> latter.

But that's the point: futex() takes a timespec argument based on 'long',
so that won't work beyond y2038 on 32-bit architectures and we
cannot provide that.

One of the highest priorities in the conversion was always to ensure
that new architectures would behave exactly the same way as the
existing ones, except for leaving out the compatibility support for
old C libraries that never existed on those architectures.

Reusing the old __NR_futex() macro with a new ABI would require
having architecture specific hacks even in C libraries that never
supported 32-bit time_t on any architectures, or that dropped that
support.

> > I was trying very hard to make it as consistent
> > and easy as possible. There is a strict mapping of __NR_* macros
> > to argument types for each system call [1], so e.g. __kernel_timespec
> > will always be used together with system calls named *time64(),
> > while the old system calls always refer to the traditional
> > "struct timespec {long tv_sec; long tv_nsec;}" type. This is the
> > same way that loff_t gets handled, so I assumed that all C libraries
> > would know how to deal with this well.
>
> I'm afraid, but this setup makes little sense if the old system call
> does not exist for an architecture.  Instead it requires additional
> porting effort.

I think we can still add back the time32 syscalls for rv32 in linux-5.2 if
that helps, it should be a one-line patch in the kernel (and it would
mean that linux-5.1 would be broken for anyone actually using those).

        Arnd
  
Florian Weimer June 25, 2019, 2:29 p.m. UTC | #12
* Arnd Bergmann:

> On Tue, Jun 25, 2019 at 4:08 PM Florian Weimer <fweimer@redhat.com> wrote:
>> * Arnd Bergmann:
>> > On Tue, Jun 25, 2019 at 3:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>> >> This makes me rather unhappy.  I also don't see the benefit of renaming
>> >> all time-related system calls for new architectures.
>> >
>> > What got renamed?
>>
>> futex to futex_time64 on RV32.  <asm/unistd.h> seems to expose only the
>> latter.
>
> But that's the point: futex() takes a timespec argument based on 'long',
> so that won't work beyond y2038 on 32-bit architectures and we
> cannot provide that.
>
> One of the highest priorities in the conversion was always to ensure
> that new architectures would behave exactly the same way as the
> existing ones, except for leaving out the compatibility support for
> old C libraries that never existed on those architectures.
>
> Reusing the old __NR_futex() macro with a new ABI would require
> having architecture specific hacks even in C libraries that never
> supported 32-bit time_t on any architectures, or that dropped that
> support.

I think this fails to take into account that the type differences have
clearly been abstracted away for struct timespec (you really need the
definition from an official header), and perhaps even for time_t.

If there is only a 64-bit version of those types, the syscall name
difference does not add any value whatsoever.  In glibc, I think we will
just add

#define __NR_futex __NR_futex_time64

for these architectures (only internally, we won't define SYS_futex).
And any application that wants to call the futex system call directly
will do the same thing.  Which leads to the question why the kernel
headers aren't doing it.

Thanks,
Florian
  
Alistair Francis June 25, 2019, 11:51 p.m. UTC | #13
On Tue, Jun 25, 2019 at 5:01 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jun 25, 2019 at 1:07 PM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Zack Weinberg:
> >
> > > On Mon, Jun 24, 2019 at 8:12 PM Alistair Francis
> > > <alistair.francis@wdc.com> wrote:
> > >>
> > >> If the waitid syscall is avaliable let's use that as waitpid
> > >> and wait4 aren't always avaliable (they aren't avaliable on RV32).
> > >
> > > wait4 does something that can't be done any other way (retrieve
> > > resource usage information), RV32 should have it.
> >
> > I think the problem is that the kernel provides wait4 under a different
> > name because struct rusage has two fields of struct timeval.
> >
> > I think this needs to be covered by the syscall renaming mechanism, to
> > get a working wait4, and not changes to wait.
>
> The kernel system call waitid() is a superset of glibc's wait4() and
> waitid(), it has an extra rusage argument.
>
> Originally, my plan was to replace kernel's waitid() with
> a waitid_time64() that takes an updated rusage structure,
> but that never happened.
>
> I still left wait4() commented out since it should not be
> needed when the kernel has waitid(). I have an implementation
> of wait4() based on waitid() that I did for musl and tested
> successfully with ltp, see [1].
> Unless I did something very wrong there, you should be able
> to use something like this in glibc.

Thanks for that, I still see hangs though when I port that to glibc.

Alistair

>
> Similar coversions of timeval have to be done in getrusage(),
> getitimer() and setitimer(), all of which expect a 32-bit
> timeval couting elapsed time (so no overflow in y2038).
> These need to be converted to the 64-bit timeval in the
> public glibc interface.
>
>        Arnd
>
> [1] https://git.linaro.org/people/arnd/musl-y2038.git/tree/src/linux/wait4.c
  
Arnd Bergmann June 26, 2019, 2:37 p.m. UTC | #14
On Tue, Jun 25, 2019 at 4:29 PM Florian Weimer <fweimer@redhat.com> wrote:

>
> I think this fails to take into account that the type differences have
> clearly been abstracted away for struct timespec (you really need the
> definition from an official header), and perhaps even for time_t.
>
> If there is only a 64-bit version of those types, the syscall name
> difference does not add any value whatsoever.  In glibc, I think we will
> just add
>
> #define __NR_futex __NR_futex_time64
>
> for these architectures (only internally, we won't define SYS_futex).
> And any application that wants to call the futex system call directly
> will do the same thing.  Which leads to the question why the kernel
> headers aren't doing it.

I think it's a bad trade-off to do this hack for risc-v, and I don't
really want to encourage it.

Redefining __NR_futex for rv32 would clearly speed up the process
of the initial merge because it avoids the dependency on the general
ilp32-time64 support, but the cost would be to maintain rv32 as a
special case forever.

We already need three distinct variants of futex (and any other
time64 syscall):

a) 64-bit, timespec == __old_kernel_timespec == __kernel_timespec
futex() -> syscall(__NR_futex) -> sys_futex

b) 32-bit, timespec == __old_kernel_timespec; will stay around until ~2038
futex() -> syscall(__NR_futex) -> sys_futex

c) 32-bit, timespec == __kernel_timespec, being added now
futex() -> __futex_time64() -> syscall(__NR_futex_time64) -> sys_futex_time64

With __futex_time64() being an internal function like

int __futex_time64(...)
{
      int err = -ENOSYS;
#ifdef __NR_futex_time64
      err = syscall(__NR_futex_time64, ...);
#endif
#ifdef __NR_futex
      if (err = -ENOSYS) { /* pre-5.1 kernel */
            struct __kernel_old_timespec ts32 = ts64_to_ts32(ts);
            err = syscall(__NR_futex_time64, ... &ts32 ...);
      }
#endif
      return ret;
}

What I expected would happen here was to use the same as
case c) for rv32, while your approach would be closer to a),
with no intermediate function:

d) rv32, timespec == __kernel_timespec
futex() -> syscall(__NR_futex_time64) -> sys_futex_time64

This way you have to multiplex between time32 and time64 in
two places: existing architectures by calling __futex_time64()
instead of futex() (through whatever method you normally use
in glibc to choose between different ABI variants at link time),
while rv32 would use the old symbol but redefine the kernel
macros to do the same thing.

Is there any long-term advantage for your approach that I'm
missing?

      Arnd
  
Florian Weimer June 26, 2019, 3:48 p.m. UTC | #15
* Arnd Bergmann:

[Exposing futex as __NR_futex for 64-bit-only-time_t]

> Is there any long-term advantage for your approach that I'm
> missing?

Userspace that calls futex and does not care about timeouts would just
work on RV32 without any porting.

For example, consider this code in glib:

| static void
| g_futex_wait (const volatile gint *address,
|               gint                 value)
| {
|   syscall (__NR_futex, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
| }

This will not work on RV32 for no good reason at all.  You would
actually have to add an #ifdef for RV32 to fix it because you clearly
don't want to do this:

| static void
| g_futex_wait (const volatile gint *address,
|               gint                 value)
| {
| #ifdef __NR_futex_time64
|   syscall (__NR_futex_time64, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
| #else
|   syscall (__NR_futex, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
| #endif
| }

because that would break if the library was used on an older kernel
(older than the kernel headers installed at build time).

Maybe you could use this:

| static void
| g_futex_wait (const volatile gint *address,
|               gint                 value)
| {
| #ifdef __NR_futex
|   syscall (__NR_futex_time64, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
| #else
|   syscall (__NR_futex_time64, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
| #endif
| }

But this would still break if people actually go ahead with the removal
of the 32-bit system calls (something I think is quite impossible to do,
but some people seem to disagree).

Fallback on ENOSYS requires introducing global variable to avoid
pointless future system calls.

Thanks,
Florian
  
Andreas Schwab June 26, 2019, 4:28 p.m. UTC | #16
On Jun 26 2019, Florian Weimer <fweimer@redhat.com> wrote:

> * Arnd Bergmann:
>
> [Exposing futex as __NR_futex for 64-bit-only-time_t]
>
>> Is there any long-term advantage for your approach that I'm
>> missing?
>
> Userspace that calls futex and does not care about timeouts would just
> work on RV32 without any porting.

Calling syscalls directly is always going to be hard to keep up-to-date,
has been fallen apart a lot already.

> But this would still break if people actually go ahead with the removal
> of the 32-bit system calls (something I think is quite impossible to do,
> but some people seem to disagree).

I'd expect that the current futex syscall will continue to work for all
its subfunctions without absolute timestamp.

Andreas.
  
Arnd Bergmann June 26, 2019, 9:08 p.m. UTC | #17
On Wed, Jun 26, 2019 at 5:48 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Arnd Bergmann:
>
> [Exposing futex as __NR_futex for 64-bit-only-time_t]


I'm realizing now that I was missing your point as I took futex as
an example assuming we would handle all 21 time64 syscalls the
same way, but you point out some issues that are specific to futex,
so I should try to separate those two things.

> > Is there any long-term advantage for your approach that I'm
> > missing?
>
> Userspace that calls futex and does not care about timeouts would just
> work on RV32 without any porting.

In general, my hope is that anything should just work on rv32 without
porting, regardless of the timeouts.

One problem specific to futex is the fact that glibc does not come with
a wrapper for it, so it's up to the applications to decide on which macro
to pass.

For most of the other syscalls, having rv32 not define the macro means
that we break applications at compile-time, and are hopefully able to
fix them in a way that works on all architectures. In case of futex,
I agree that it would be better not to break compilation when you
don't pass a timeout, but I see no good way to have it both ways.

> For example, consider this code in glib:
>
> | static void
> | g_futex_wait (const volatile gint *address,
> |               gint                 value)
> | {
> |   syscall (__NR_futex, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
> | }
>
> This will not work on RV32 for no good reason at all.  You would
> actually have to add an #ifdef for RV32 to fix it because you clearly
> don't want to do this:
>
> | static void
> | g_futex_wait (const volatile gint *address,
> |               gint                 value)
> | {
> | #ifdef __NR_futex_time64
> |   syscall (__NR_futex_time64, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
> | #else
> |   syscall (__NR_futex, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
> | #endif
> | }
>
> because that would break if the library was used on an older kernel
> (older than the kernel headers installed at build time).

*nod*

> Maybe you could use this:
>
> | static void
> | g_futex_wait (const volatile gint *address,
> |               gint                 value)
> | {
> | #ifdef __NR_futex
> |   syscall (__NR_futex_time64, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
> | #else
> |   syscall (__NR_futex_time64, address, (gsize) FUTEX_WAIT_PRIVATE, (gsize) value, NULL);
> | #endif
> | }
>
> But this would still break if people actually go ahead with the removal
> of the 32-bit system calls (something I think is quite impossible to do,
> but some people seem to disagree).
>
> Fallback on ENOSYS requires introducing global variable to avoid
> pointless future system calls.

I hope the example of g_futex_wait() becomes a little easier after
we have come up with a solution for fixing the much harder case
of using __NR_futex /with/ timeouts, such as (also from glib)

gboolean
g_cond_wait_until (GCond  *cond, GMutex *mutex, gint64  end_time)
{
 struct timespec span;
  ...
  res = syscall (__NR_futex, &cond->i[0], (gsize) FUTEX_WAIT_PRIVATE,
(gsize) sampled, &span);
 ...
}

The problem obviously is that on existing 32-bit architectures,
the first argument (__NR_futex) must correspond to the type
of the 'span' variable. Fixing this is always ugly, but has to be
done. The best I can think of is

gboolean
g_cond_wait_until (GCond  *cond, GMutex *mutex, gint64  end_time)
{
  struct timespec span;
  ...
  res = -ENOSYS;
  if (sizeof(time_t) > sizeof(__syscall_slong_t)) {
#ifdef __NR_futex_time64
     res = syscall (__NR_futex_time64, &cond->i[0], (gsize) FUTEX_WAIT_PRIVATE,
                 (gsize) sampled, &span);
#endif
 } else {
#ifdef __NR_futex
     res = syscall (__NR_futex, &cond->i[0], (gsize) FUTEX_WAIT_PRIVATE,
                 (gsize) sampled, &span);
#endif
  }
...
}

The version above will
- always use __NR_futex on 64-bit architectures and x32
- use __NR_futex on 32-bit architectures with 32-bit time_t, this will always
  work except on kernels that don't support time32 syscalls (as intended)
- use __NR_futex_time64 on 32-bit architectures with 64-bit time_t, which
  works when built with linux-5.1+ headers and running on linux-5.1+.
  This is probably good enough, as many other things are also broken
  when trying to use time64 with older kernels or headers.

One could achieve something similar by defining SYS_futex in glibc like

#if __WORDSIZE == 64 || __TIMESIZE == 32 || (defined __x86_64__ &&
defined __ILP32__)
#define SYS_futex __NR_futex
#else
#define SYS_futex __NR_futex_time64
#endif

and then requiring applications to use SYS_futex rather than __NR_futex.
Again, there would be no fallback for time64 with older kernels or
headers this way.

We can't really redefine __NR_futex in the kernel headers this way
unfortunately, because
a) there is no reliable way for kernel headers to check __TIMESIZE
    in an #ifdef (users might include asm/unistd.h before libc headers);
    the best approximation would be
  #define __NR_futex (sizeof(time_t) > sizeof(__kernel_long_t) ? \
               __NR_futex_time64 : __NR_futex_time32)
    which has a few downsides as well.
b) we need to provide both plain __NR_futex and __NR_futex_time64
    for libraries that want to call both depending on the timeout argument
    type.

        Arnd
  
Florian Weimer June 27, 2019, 7:33 a.m. UTC | #18
* Arnd Bergmann:

> On Wed, Jun 26, 2019 at 5:48 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Arnd Bergmann:
>>
>> [Exposing futex as __NR_futex for 64-bit-only-time_t]
>
>
> I'm realizing now that I was missing your point as I took futex as
> an example assuming we would handle all 21 time64 syscalls the
> same way, but you point out some issues that are specific to futex,
> so I should try to separate those two things.
>
>> > Is there any long-term advantage for your approach that I'm
>> > missing?
>>
>> Userspace that calls futex and does not care about timeouts would just
>> work on RV32 without any porting.
>
> In general, my hope is that anything should just work on rv32 without
> porting, regardless of the timeouts.
>
> One problem specific to futex is the fact that glibc does not come with
> a wrapper for it, so it's up to the applications to decide on which macro
> to pass.

That's hardly special to futex.  Even if there's a perfectly fine
wrapper, some people don't want to use it.

> For most of the other syscalls, having rv32 not define the macro means
> that we break applications at compile-time, and are hopefully able to
> fix them in a way that works on all architectures. In case of futex,
> I agree that it would be better not to break compilation when you
> don't pass a timeout, but I see no good way to have it both ways.

I'm more and more confused here.

Does rv32 still have 32-bit struct timeval/struct timespec outside
getrusage?

If not, there is no risk at all that programmers confuse the types, and
no need to rename the systen calls.

If there's only a 64-bit time_t, calling futex futex_time64 makes as
much sense as alpha calling getpid getxpid.

> I hope the example of g_futex_wait() becomes a little easier after
> we have come up with a solution for fixing the much harder case
> of using __NR_futex /with/ timeouts, such as (also from glib)
>
> gboolean
> g_cond_wait_until (GCond  *cond, GMutex *mutex, gint64  end_time)
> {
>  struct timespec span;
>   ...
>   res = syscall (__NR_futex, &cond->i[0], (gsize) FUTEX_WAIT_PRIVATE,
> (gsize) sampled, &span);
>  ...
> }
>
> The problem obviously is that on existing 32-bit architectures,
> the first argument (__NR_futex) must correspond to the type
> of the 'span' variable. Fixing this is always ugly, but has to be
> done. The best I can think of is
>
> gboolean
> g_cond_wait_until (GCond  *cond, GMutex *mutex, gint64  end_time)
> {
>   struct timespec span;
>   ...
>   res = -ENOSYS;
>   if (sizeof(time_t) > sizeof(__syscall_slong_t)) {
> #ifdef __NR_futex_time64
>      res = syscall (__NR_futex_time64, &cond->i[0], (gsize) FUTEX_WAIT_PRIVATE,
>                  (gsize) sampled, &span);
> #endif
>  } else {
> #ifdef __NR_futex
>      res = syscall (__NR_futex, &cond->i[0], (gsize) FUTEX_WAIT_PRIVATE,
>                  (gsize) sampled, &span);
> #endif
>   }
> ...
> }

> The version above will
> - always use __NR_futex on 64-bit architectures and x32
> - use __NR_futex on 32-bit architectures with 32-bit time_t, this will always
>   work except on kernels that don't support time32 syscalls (as intended)
> - use __NR_futex_time64 on 32-bit architectures with 64-bit time_t, which
>   works when built with linux-5.1+ headers and running on linux-5.1+.
>   This is probably good enough, as many other things are also broken
>   when trying to use time64 with older kernels or headers.

I think the quoted fix isn't the best we can do.  Portable binaries
32-bit binaries need to call futex_time64 once to see if it is
available, and fall back to futex if it is not.  g_cond_wait_until in
particular does not have a dependency on the size of struct timespec,
but other parts of glib might, and it may not be possible to compile
glib as a while with __TIMESIZE == 64.

But t his is really an aside.  I'm concerned that porting software to
rv32 requires that applications have to deal with the time_t transition
today, and I don't see a good reason for linking the two.

Thanks,
Florian
  
Andreas Schwab June 27, 2019, 8:25 a.m. UTC | #19
On Jun 27 2019, Florian Weimer <fweimer@redhat.com> wrote:

> But t his is really an aside.  I'm concerned that porting software to
> rv32 requires that applications have to deal with the time_t transition
> today, and I don't see a good reason for linking the two.

It's not unlike the situation with aarch64 not defining many traditional
syscalls, but only their *at variants.

Andreas.
  
Arnd Bergmann June 27, 2019, 10:21 a.m. UTC | #20
On Thu, Jun 27, 2019 at 9:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> * Arnd Bergmann:
> > On Wed, Jun 26, 2019 at 5:48 PM Florian Weimer <fweimer@redhat.com> wrote:
> >> * Arnd Bergmann:

> > One problem specific to futex is the fact that glibc does not come with
> > a wrapper for it, so it's up to the applications to decide on which macro
> > to pass.
>
> That's hardly special to futex.  Even if there's a perfectly fine
> wrapper, some people don't want to use it.

Sure, but any application doing that is broken with when it calls into
the kernel with any mismatched types. Calling syscall(__NR_gettimeofday)
or syscall(__NR_futex) with the time64 version of timespec (from the C library)
can't work, just like calling syscall(__NR_ftruncate) with a 64-bit off_t.

Using syscall() is always going to be hard, what makes futex different
is that there is no easy way.

> > For most of the other syscalls, having rv32 not define the macro means
> > that we break applications at compile-time, and are hopefully able to
> > fix them in a way that works on all architectures. In case of futex,
> > I agree that it would be better not to break compilation when you
> > don't pass a timeout, but I see no good way to have it both ways.
>
> I'm more and more confused here.
>
> Does rv32 still have 32-bit struct timeval/struct timespec outside
> getrusage?

Only in a few drivers that we intend to fix before anyone can
use them, so no.

> If not, there is no risk at all that programmers confuse the types, and
> no need to rename the systen calls.
>
> If there's only a 64-bit time_t, calling futex futex_time64 makes as
> much sense as alpha calling getpid getxpid.

No, the rule for syscalls in the kernel is to give a new name if you
change the arguments. When we went through multiple revisions for
fadvise/fadvise64/fadvise64_64, we could have made it so that
each new architecture only has the latest version by the name of
'fadvise', but I think that would have been really confusing.

Admittedly, the ABI for timespec is messed up, since there is
no good way to get to the right argument type. We do have a
'struct timespec' definition in <linux/time.h>, and this is what one
would have to use in combination with __NR_clock_gettime or
__NR_futex, but it clearly conflicts with the definition from
the libc <time.h> when using 64-bit time_t, so you can't include
those two headers together.

I also really don't want to have a different definition of timespec
in the kernel for rv32 and get people to use that. Instead I think
anyone using the __NR_* macros should really use __kernel_timespec
and other __kernel_* types from the kernel headers in place of
the libc types in the long run. Properly cleaning up the kernel
headers is something we definitely need to do at some point
but have no specific plan for.

We have also debated adding all the __NR_*time64 macros for all
64-bit architectures, which would have been more consistent,
but so far have not done that for consistency with
llseek/stat64/fctntl64/truncate64/... that are also 32-bit only.

> > The version above will
> > - always use __NR_futex on 64-bit architectures and x32
> > - use __NR_futex on 32-bit architectures with 32-bit time_t, this will always
> >   work except on kernels that don't support time32 syscalls (as intended)
> > - use __NR_futex_time64 on 32-bit architectures with 64-bit time_t, which
> >   works when built with linux-5.1+ headers and running on linux-5.1+.
> >   This is probably good enough, as many other things are also broken
> >   when trying to use time64 with older kernels or headers.
>
> I think the quoted fix isn't the best we can do.  Portable binaries
> 32-bit binaries need to call futex_time64 once to see if it is
> available, and fall back to futex if it is not.  g_cond_wait_until in
> particular does not have a dependency on the size of struct timespec,
> but other parts of glib might, and it may not be possible to compile
> glib as a while with __TIMESIZE == 64.

Can you clarify this? Making glib work with __TIMESIZE == 64 is
clearly required anyway, and g_cond_wait_until() cannot work
unless the local timeout structure has the same type that the kernel
expects.

> But t his is really an aside.  I'm concerned that porting software to
> rv32 requires that applications have to deal with the time_t transition
> today, and I don't see a good reason for linking the two.

The risc-v kernel maintainers specifically wanted to wait for the time_t
transition with libc, in order for rv32 to avoid going through the transition
after binaries are widely distributed already.

As I said before, we could easily undo the one-line change in the kernel
and start out with the usual 32-bit off_t/time_t/... in glibc if it turns
out to be too hard to get this to work.

     Arnd
  
Florian Weimer June 27, 2019, 11:12 a.m. UTC | #21
* Arnd Bergmann:

>> If not, there is no risk at all that programmers confuse the types, and
>> no need to rename the systen calls.
>>
>> If there's only a 64-bit time_t, calling futex futex_time64 makes as
>> much sense as alpha calling getpid getxpid.
>
> No, the rule for syscalls in the kernel is to give a new name if you
> change the arguments. When we went through multiple revisions for
> fadvise/fadvise64/fadvise64_64, we could have made it so that
> each new architecture only has the latest version by the name of
> 'fadvise', but I think that would have been really confusing.

But futex is different because in userspace, futex_time64 behaves
exactly like futex on other architectures on rv32.

> I also really don't want to have a different definition of timespec
> in the kernel for rv32 and get people to use that. Instead I think
> anyone using the __NR_* macros should really use __kernel_timespec
> and other __kernel_* types from the kernel headers in place of
> the libc types in the long run. Properly cleaning up the kernel
> headers is something we definitely need to do at some point
> but have no specific plan for.

There's only __kernel_old_timeval, no __kernel_old_timespec, so you
can't use the kernel headers to write the 32-bit fallback path.  Once we
get _TIME_BITS=64 (to change struct timespec to 64-bit, like
_FILE_OFFSET_BITS), you can't use the glibc header files either.  Not
good.

>> > The version above will
>> > - always use __NR_futex on 64-bit architectures and x32
>> > - use __NR_futex on 32-bit architectures with 32-bit time_t, this will always
>> >   work except on kernels that don't support time32 syscalls (as intended)
>> > - use __NR_futex_time64 on 32-bit architectures with 64-bit time_t, which
>> >   works when built with linux-5.1+ headers and running on linux-5.1+.
>> >   This is probably good enough, as many other things are also broken
>> >   when trying to use time64 with older kernels or headers.
>>
>> I think the quoted fix isn't the best we can do.  Portable binaries
>> 32-bit binaries need to call futex_time64 once to see if it is
>> available, and fall back to futex if it is not.  g_cond_wait_until in
>> particular does not have a dependency on the size of struct timespec,
>> but other parts of glib might, and it may not be possible to compile
>> glib as a while with __TIMESIZE == 64.
>
> Can you clarify this? Making glib work with __TIMESIZE == 64 is
> clearly required anyway,

I'm not sure.  glib could deprecate all the APIs that use time_t
externally.  Given that they don't want to bump ABI, they may not want
to compile with __TIMESIZE == 64 on old 32-bit systems.

> and g_cond_wait_until() cannot work unless the local timeout structure
> has the same type that the kernel expects.

True, and that code is currently really, really hard to write (for
legacy 32-bit architectures).  For new 32-bit architectures, there is
just pointless source code difference without any added value in terms
of type safety.

>> But t his is really an aside.  I'm concerned that porting software to
>> rv32 requires that applications have to deal with the time_t transition
>> today, and I don't see a good reason for linking the two.
>
> The risc-v kernel maintainers specifically wanted to wait for the
> time_t transition with libc, in order for rv32 to avoid going through
> the transition after binaries are widely distributed already.

That's a different concern.  They didn't want a 32-bit-time_t ABI.  I
expect that they might have chosen differently if they realized that
they'd need to absorb the userspace 64-bit porting cost, just to get
rv32 going.

> As I said before, we could easily undo the one-line change in the
> kernel and start out with the usual 32-bit off_t/time_t/... in glibc
> if it turns out to be too hard to get this to work.

glibc isn't the problem.  It's source-level changes to applications
which use kernel headers.

Thanks,
Florian
  
Arnd Bergmann June 27, 2019, 3:24 p.m. UTC | #22
On Thu, Jun 27, 2019 at 1:12 PM Florian Weimer <fweimer@redhat.com> wrote:
> * Arnd Bergmann:
> >> If not, there is no risk at all that programmers confuse the types, and
> >> no need to rename the systen calls.
> >>
> >> If there's only a 64-bit time_t, calling futex futex_time64 makes as
> >> much sense as alpha calling getpid getxpid.
> >
> > No, the rule for syscalls in the kernel is to give a new name if you
> > change the arguments. When we went through multiple revisions for
> > fadvise/fadvise64/fadvise64_64, we could have made it so that
> > each new architecture only has the latest version by the name of
> > 'fadvise', but I think that would have been really confusing.
>
> But futex is different because in userspace, futex_time64 behaves
> exactly like futex on other architectures on rv32.

Only as long as nobody tries to define their own __timespec structure to
pass into futex, based on the traditional definition. Given the lack
or a correct way to get to the definition, that would not be an
unreasonable thing to do for getting it to work on arm32
as a fallback.

> > I also really don't want to have a different definition of timespec
> > in the kernel for rv32 and get people to use that. Instead I think
> > anyone using the __NR_* macros should really use __kernel_timespec
> > and other __kernel_* types from the kernel headers in place of
> > the libc types in the long run. Properly cleaning up the kernel
> > headers is something we definitely need to do at some point
> > but have no specific plan for.
>
> There's only __kernel_old_timeval, no __kernel_old_timespec, so you
> can't use the kernel headers to write the 32-bit fallback path.  Once we
> get _TIME_BITS=64 (to change struct timespec to 64-bit, like
> _FILE_OFFSET_BITS), you can't use the glibc header files either.  Not
> good.

Right, that is clearly part of the cleanup that needs to happen.
I thought we had actually merged the __kernel_old_timespec
patch at some point, but evidently we have not.

> >> I think the quoted fix isn't the best we can do.  Portable binaries
> >> 32-bit binaries need to call futex_time64 once to see if it is
> >> available, and fall back to futex if it is not.  g_cond_wait_until in
> >> particular does not have a dependency on the size of struct timespec,
> >> but other parts of glib might, and it may not be possible to compile
> >> glib as a while with __TIMESIZE == 64.
> >
> > Can you clarify this? Making glib work with __TIMESIZE == 64 is
> > clearly required anyway,
>
> I'm not sure.  glib could deprecate all the APIs that use time_t
> externally.  Given that they don't want to bump ABI, they may not want
> to compile with __TIMESIZE == 64 on old 32-bit systems.

From a very brief look at glib, my impression is that this is again a
rather special case: all the external interfaces in glib are already
independent of sizeof(time_t), and internally it should be time64
safe already when built against a C library with 64-bit time_t
(all internal types use 64-bit seconds), except for the futex usage
usage in g_cond_wait_until().

[glib has a different problem with its deprecated GTimeVal type
 that they will have to solve to actually run beyond 2038, but
 let's not get into that]

Many other libraries do have external interfaces based on time_t,
and have to deal with those breaking between different values
of _TIME_BITS. I suspect the only realistic way to deal with this
on a large scale is to do a full distro rebuild with _TIME_BITS=64
and then fix whatever broke, but not expect a solution at the ABI
level like glibc is doing to support both versions in one binary.

> >> But t his is really an aside.  I'm concerned that porting software to
> >> rv32 requires that applications have to deal with the time_t transition
> >> today, and I don't see a good reason for linking the two.
> >
> > The risc-v kernel maintainers specifically wanted to wait for the
> > time_t transition with libc, in order for rv32 to avoid going through
> > the transition after binaries are widely distributed already.
>
> That's a different concern.  They didn't want a 32-bit-time_t ABI.  I
> expect that they might have chosen differently if they realized that
> they'd need to absorb the userspace 64-bit porting cost, just to get
> rv32 going.

I warned them that this would be extra work, but I also hope that
we will help each other while bringing up time64 distro support
on rv32, arm32 and other 32-bit architectures.

> > As I said before, we could easily undo the one-line change in the
> > kernel and start out with the usual 32-bit off_t/time_t/... in glibc
> > if it turns out to be too hard to get this to work.
>
> glibc isn't the problem.  It's source-level changes to applications
> which use kernel headers.

Right, I should have not pointed to glibc in particular. The problem
is really anything in user space that makes assumptions about
time_t and how it relates to other types and interfaces.

Applications using __NR_futex (and I hope less commonly the
other time64 syscalls) need to be fixed for arm32, and I hope
to get some help on this if the rv32 developers get there first.

Another example that is independent of the kernel is apparently
the Python C extension API, which fundamentally assumes that
'time_t' is the same as 'long'. If we first address this on arm32,
this will help rv32 as well.

      Arnd
  
Alistair Francis July 3, 2019, 11:50 p.m. UTC | #23
On Thu, Jun 27, 2019 at 8:24 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Jun 27, 2019 at 1:12 PM Florian Weimer <fweimer@redhat.com> wrote:
> > * Arnd Bergmann:
> > >> If not, there is no risk at all that programmers confuse the types, and
> > >> no need to rename the systen calls.
> > >>
> > >> If there's only a 64-bit time_t, calling futex futex_time64 makes as
> > >> much sense as alpha calling getpid getxpid.
> > >
> > > No, the rule for syscalls in the kernel is to give a new name if you
> > > change the arguments. When we went through multiple revisions for
> > > fadvise/fadvise64/fadvise64_64, we could have made it so that
> > > each new architecture only has the latest version by the name of
> > > 'fadvise', but I think that would have been really confusing.
> >
> > But futex is different because in userspace, futex_time64 behaves
> > exactly like futex on other architectures on rv32.
>
> Only as long as nobody tries to define their own __timespec structure to
> pass into futex, based on the traditional definition. Given the lack
> or a correct way to get to the definition, that would not be an
> unreasonable thing to do for getting it to work on arm32
> as a fallback.

I'm unclear what the decision here is. I currently have a patch that does this:

+#if __riscv_xlen == 32
+/* Define the __NR_futex as __NR_futex64 as RV32 doesn't have a
+ * __NR_futex syscall.
+ */
+# ifndef __NR_futex
+#  define __NR_futex __NR_futex_time64
+# endif
+#endif

for futex and other missing syscalls in my RV32 port. It seems to be
working fine for RV32 (limited testing only).

What is the consensus here? I would like to try and tidy up the patch
set next week so that others can use this for RV32 until the glibc
merge window opens again.

Alistair

>
> > > I also really don't want to have a different definition of timespec
> > > in the kernel for rv32 and get people to use that. Instead I think
> > > anyone using the __NR_* macros should really use __kernel_timespec
> > > and other __kernel_* types from the kernel headers in place of
> > > the libc types in the long run. Properly cleaning up the kernel
> > > headers is something we definitely need to do at some point
> > > but have no specific plan for.
> >
> > There's only __kernel_old_timeval, no __kernel_old_timespec, so you
> > can't use the kernel headers to write the 32-bit fallback path.  Once we
> > get _TIME_BITS=64 (to change struct timespec to 64-bit, like
> > _FILE_OFFSET_BITS), you can't use the glibc header files either.  Not
> > good.
>
> Right, that is clearly part of the cleanup that needs to happen.
> I thought we had actually merged the __kernel_old_timespec
> patch at some point, but evidently we have not.
>
> > >> I think the quoted fix isn't the best we can do.  Portable binaries
> > >> 32-bit binaries need to call futex_time64 once to see if it is
> > >> available, and fall back to futex if it is not.  g_cond_wait_until in
> > >> particular does not have a dependency on the size of struct timespec,
> > >> but other parts of glib might, and it may not be possible to compile
> > >> glib as a while with __TIMESIZE == 64.
> > >
> > > Can you clarify this? Making glib work with __TIMESIZE == 64 is
> > > clearly required anyway,
> >
> > I'm not sure.  glib could deprecate all the APIs that use time_t
> > externally.  Given that they don't want to bump ABI, they may not want
> > to compile with __TIMESIZE == 64 on old 32-bit systems.
>
> From a very brief look at glib, my impression is that this is again a
> rather special case: all the external interfaces in glib are already
> independent of sizeof(time_t), and internally it should be time64
> safe already when built against a C library with 64-bit time_t
> (all internal types use 64-bit seconds), except for the futex usage
> usage in g_cond_wait_until().
>
> [glib has a different problem with its deprecated GTimeVal type
>  that they will have to solve to actually run beyond 2038, but
>  let's not get into that]
>
> Many other libraries do have external interfaces based on time_t,
> and have to deal with those breaking between different values
> of _TIME_BITS. I suspect the only realistic way to deal with this
> on a large scale is to do a full distro rebuild with _TIME_BITS=64
> and then fix whatever broke, but not expect a solution at the ABI
> level like glibc is doing to support both versions in one binary.
>
> > >> But t his is really an aside.  I'm concerned that porting software to
> > >> rv32 requires that applications have to deal with the time_t transition
> > >> today, and I don't see a good reason for linking the two.
> > >
> > > The risc-v kernel maintainers specifically wanted to wait for the
> > > time_t transition with libc, in order for rv32 to avoid going through
> > > the transition after binaries are widely distributed already.
> >
> > That's a different concern.  They didn't want a 32-bit-time_t ABI.  I
> > expect that they might have chosen differently if they realized that
> > they'd need to absorb the userspace 64-bit porting cost, just to get
> > rv32 going.
>
> I warned them that this would be extra work, but I also hope that
> we will help each other while bringing up time64 distro support
> on rv32, arm32 and other 32-bit architectures.
>
> > > As I said before, we could easily undo the one-line change in the
> > > kernel and start out with the usual 32-bit off_t/time_t/... in glibc
> > > if it turns out to be too hard to get this to work.
> >
> > glibc isn't the problem.  It's source-level changes to applications
> > which use kernel headers.
>
> Right, I should have not pointed to glibc in particular. The problem
> is really anything in user space that makes assumptions about
> time_t and how it relates to other types and interfaces.
>
> Applications using __NR_futex (and I hope less commonly the
> other time64 syscalls) need to be fixed for arm32, and I hope
> to get some help on this if the rv32 developers get there first.
>
> Another example that is independent of the kernel is apparently
> the Python C extension API, which fundamentally assumes that
> 'time_t' is the same as 'long'. If we first address this on arm32,
> this will help rv32 as well.
>
>       Arnd
  
Florian Weimer July 8, 2019, 12:09 p.m. UTC | #24
* Andreas Schwab:

>> But this would still break if people actually go ahead with the removal
>> of the 32-bit system calls (something I think is quite impossible to do,
>> but some people seem to disagree).
>
> I'd expect that the current futex syscall will continue to work for all
> its subfunctions without absolute timestamp.

I thought that RV32 just wouldn't have a futex system call at all?

Thanks,
Florian
  
Andreas Schwab July 8, 2019, 12:34 p.m. UTC | #25
On Jul 08 2019, Florian Weimer <fweimer@redhat.com> wrote:

> * Andreas Schwab:
>
>>> But this would still break if people actually go ahead with the removal
>>> of the 32-bit system calls (something I think is quite impossible to do,
>>> but some people seem to disagree).
>>
>> I'd expect that the current futex syscall will continue to work for all
>> its subfunctions without absolute timestamp.
>
> I thought that RV32 just wouldn't have a futex system call at all?

I'm talking about legacy 32-bit architectures.

Andreas.
  
Florian Weimer July 8, 2019, 12:36 p.m. UTC | #26
* Andreas Schwab:

> On Jul 08 2019, Florian Weimer <fweimer@redhat.com> wrote:
>
>> * Andreas Schwab:
>>
>>>> But this would still break if people actually go ahead with the removal
>>>> of the 32-bit system calls (something I think is quite impossible to do,
>>>> but some people seem to disagree).
>>>
>>> I'd expect that the current futex syscall will continue to work for all
>>> its subfunctions without absolute timestamp.
>>
>> I thought that RV32 just wouldn't have a futex system call at all?
>
> I'm talking about legacy 32-bit architectures.

Ah.  Yes, I one would hope that.

But there has been talk of a kernel option to remove it even there, and
a suggestion that glibc would try the 64-bit time_t system call first,
to see if it is available, and prefer that.

I think this would be quite … wrong.

Thanks,
Florian
  
Alistair Francis July 9, 2019, 10:57 p.m. UTC | #27
On Mon, Jul 8, 2019 at 5:37 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Andreas Schwab:
>
> > On Jul 08 2019, Florian Weimer <fweimer@redhat.com> wrote:
> >
> >> * Andreas Schwab:
> >>
> >>>> But this would still break if people actually go ahead with the removal
> >>>> of the 32-bit system calls (something I think is quite impossible to do,
> >>>> but some people seem to disagree).
> >>>
> >>> I'd expect that the current futex syscall will continue to work for all
> >>> its subfunctions without absolute timestamp.
> >>
> >> I thought that RV32 just wouldn't have a futex system call at all?
> >
> > I'm talking about legacy 32-bit architectures.
>
> Ah.  Yes, I one would hope that.
>
> But there has been talk of a kernel option to remove it even there, and
> a suggestion that glibc would try the 64-bit time_t system call first,
> to see if it is available, and prefer that.
>
> I think this would be quite … wrong.

So that means that the #define solution (see below) is probably the
way to go then?

#if __riscv_xlen == 32
# ifndef __NR_futex
#  define __NR_futex __NR_futex_time64
# endif
#endif

Alistair

>
> Thanks,
> Florian
  
Andreas Schwab July 10, 2019, 7:33 a.m. UTC | #28
On Jul 09 2019, Alistair Francis <alistair23@gmail.com> wrote:

> So that means that the #define solution (see below) is probably the
> way to go then?
>
> #if __riscv_xlen == 32
> # ifndef __NR_futex
> #  define __NR_futex __NR_futex_time64
> # endif
> #endif

I don't think this is the way to go since all future 32-bit ABIs will
have to do the same.  The generic code should follow the default as
defined by <asm-generic/unistd.h>.

Andreas.
  
Alistair Francis July 10, 2019, 5:47 p.m. UTC | #29
On Wed, Jul 10, 2019 at 12:33 AM Andreas Schwab <schwab@suse.de> wrote:
>
> On Jul 09 2019, Alistair Francis <alistair23@gmail.com> wrote:
>
> > So that means that the #define solution (see below) is probably the
> > way to go then?
> >
> > #if __riscv_xlen == 32
> > # ifndef __NR_futex
> > #  define __NR_futex __NR_futex_time64
> > # endif
> > #endif
>
> I don't think this is the way to go since all future 32-bit ABIs will
> have to do the same.  The generic code should follow the default as
> defined by <asm-generic/unistd.h>.

I'm a little hesitant on making this change generic. This is what I'm
more thinking
 1. Add RV32 support by RV32 specific defines (see above)
 2. As other architectures move to this the RV32 fix can be either
moved to a generic include or if a different solution is decided upon
that can be used. The benefit here is that we don't end up pushing
everyone else to do what we do initially.

There will be some generic changes, we can't just #define everything
but this at least limits the generic code changes when we merge in the
RV32 port.

If everyone doesn't like this option though I'm happy to make the
changes generic.

Alistair

>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
  
Joseph Myers July 25, 2019, 3:49 p.m. UTC | #30
On Mon, 8 Jul 2019, Florian Weimer wrote:

> > I'm talking about legacy 32-bit architectures.
> 
> Ah.  Yes, I one would hope that.
> 
> But there has been talk of a kernel option to remove it even there, and
> a suggestion that glibc would try the 64-bit time_t system call first,
> to see if it is available, and prefer that.
> 
> I think this would be quite … wrong.

glibc should generally use the 64-bit time_t system call first on legacy 
32-bit architectures (and especially in cases such as futex where there is 
a large amount of logic in glibc at a higher level in the syscall, which 
it is particularly important to avoid duplicating for different time_t 
choices), not because of systems with the 32-bit syscall removed where it 
was previously part of the kernel ABI expected by glibc (those are not 
something reasonably supportable), but to avoid duplication and code bloat 
at both the source and binary level and to have a single consistent design 
(32-bit functions wrap the corresponding 64-bit ones, in all cases where 
the function is defined in C rather than through syscalls.list) rather 
than attempting to define what is or is not a sufficiently trivial 
function implementation that duplication is not a problem.
  

Patch

diff --git a/ChangeLog b/ChangeLog
index f1c7acb6ab..9ed9bea8b1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -5,6 +5,9 @@ 
 	* sysdeps/unix/sysv/linux/nanosleep_nocancel.c: Likewise.
 	* sysdeps/unix/sysv/linux/lowlevellock-futex.h: Use __NR_futex_time64 if we don't have __NR_futex.
 	* sysdeps/unix/sysv/linux/gettimeofday.c: Use clock_gettime64 syscall for gettimeofday.
+	* sysdeps/unix/sysv/linux/wait.c: Use __NR_waitid if avaliable.
+	* sysdeps/unix/sysv/linux/waitpid.c: Likewise.
+	* sysdeps/unix/sysv/linux/waitpid_nocancel.c: Likewise.
 
 2019-06-20  Dmitry V. Levin  <ldv@altlinux.org>
 	    Florian Weimer  <fweimer@redhat.com>
diff --git a/sysdeps/unix/sysv/linux/wait.c b/sysdeps/unix/sysv/linux/wait.c
index 498bd1c095..67b94776a4 100644
--- a/sysdeps/unix/sysv/linux/wait.c
+++ b/sysdeps/unix/sysv/linux/wait.c
@@ -26,8 +26,25 @@ 
 pid_t
 __libc_wait (int *stat_loc)
 {
-  pid_t result = SYSCALL_CANCEL (wait4, WAIT_ANY, stat_loc, 0,
-				 (struct rusage *) NULL);
+  pid_t result;
+
+#ifdef __NR_waitid
+  siginfo_t infop;
+
+  result = SYSCALL_CANCEL (waitid, P_ALL, 0, &infop, WEXITED);
+
+  if (stat_loc) {
+    *stat_loc = infop.si_status;
+  }
+
+  if (result == 0) {
+    result = infop.si_pid;
+  }
+#else
+  result = SYSCALL_CANCEL (wait4, WAIT_ANY, stat_loc, 0,
+                           (struct rusage *) NULL);
+#endif
+
   return result;
 }
 
diff --git a/sysdeps/unix/sysv/linux/waitpid.c b/sysdeps/unix/sysv/linux/waitpid.c
index f0897574c0..8c0e2f882a 100644
--- a/sysdeps/unix/sysv/linux/waitpid.c
+++ b/sysdeps/unix/sysv/linux/waitpid.c
@@ -20,12 +20,66 @@ 
 #include <sysdep-cancel.h>
 #include <stdlib.h>
 #include <sys/wait.h>
+#include <unistd.h>
 
 __pid_t
 __waitpid (__pid_t pid, int *stat_loc, int options)
 {
 #ifdef __NR_waitpid
   return SYSCALL_CANCEL (waitpid, pid, stat_loc, options);
+#elif defined(__NR_waitid)
+  int ret, waitid_options = 0;
+  idtype_t idtype = P_PID;
+  siginfo_t infop;
+
+  /* Set this to zero so we can test if WNOHANG was specified in options
+   * and there were no children in a waitable state. This is required to match
+   * waitid() behaviour.
+   */
+  infop.si_pid = 0;
+
+  if (pid < -1) {
+    idtype = P_PGID;
+    pid *= -1;
+  } else if (pid == -1) {
+    idtype = P_ALL;
+  } else if (pid == 0) {
+    idtype = P_PGID;
+    pid = getpgrp();
+  }
+
+  /* Convert the older WNOHANG, WUNTRACED and WCONTINUED options to the newer
+   * ones uses for waitid().
+   */
+  if (options | WNOHANG) {
+    waitid_options |= WNOHANG;
+  }
+  if (options | WUNTRACED) {
+    waitid_options |= WSTOPPED;
+  }
+  if (options | WCONTINUED) {
+    waitid_options |= WCONTINUED;
+  }
+
+  /* waitid() requires at least WEXITED, WSTOPPED or WCONTINUED to be
+   * specified. If none were specified default to WEXITED as that is what
+   * waitpid() and wait4() do.
+   */
+  if (options == 0 || waitid_options == WNOHANG) {
+    waitid_options |= WEXITED;
+  }
+
+  ret = SYSCALL_CANCEL (waitid, idtype, pid, &infop, waitid_options);
+
+  if (stat_loc) {
+    *stat_loc = infop.si_status;
+  }
+
+  if (ret == 0) {
+    return infop.si_pid;
+  }
+
+  return ret;
 #else
   return SYSCALL_CANCEL (wait4, pid, stat_loc, options, NULL);
 #endif
diff --git a/sysdeps/unix/sysv/linux/waitpid_nocancel.c b/sysdeps/unix/sysv/linux/waitpid_nocancel.c
index 89e36a5c0b..d14b7bf260 100644
--- a/sysdeps/unix/sysv/linux/waitpid_nocancel.c
+++ b/sysdeps/unix/sysv/linux/waitpid_nocancel.c
@@ -27,6 +27,59 @@  __waitpid_nocancel (__pid_t pid, int *stat_loc, int options)
 {
 #ifdef __NR_waitpid
   return INLINE_SYSCALL_CALL (waitpid, pid, stat_loc, options);
+#elif defined(__NR_waitid)
+  int ret, waitid_options = 0;
+  idtype_t idtype = P_PID;
+  siginfo_t infop;
+
+  /* Set this to zero so we can test if WNOHANG was specified in options
+   * and there were no children in a waitable state. This is required to match
+   * waitid() behaviour.
+   */
+  infop.si_pid = 0;
+
+  if (pid < -1) {
+    idtype = P_PGID;
+    pid *= -1;
+  } else if (pid == -1) {
+    idtype = P_ALL;
+  } else if (pid == 0) {
+    idtype = P_PGID;
+    pid = getpgrp();
+  }
+
+  /* Convert the older WNOHANG, WUNTRACED and WCONTINUED options to the newer
+   * ones uses for waitid().
+   */
+  if (options | WNOHANG) {
+    waitid_options |= WNOHANG;
+  }
+  if (options | WUNTRACED) {
+    waitid_options |= WSTOPPED;
+  }
+  if (options | WCONTINUED) {
+    waitid_options |= WCONTINUED;
+  }
+
+  /* waitid() requires at least WEXITED, WSTOPPED or WCONTINUED to be
+   * specified. If none were specified default to WEXITED as that is what
+   * waitpid() and wait4() do.
+   */
+  if (options == 0 || waitid_options == WNOHANG) {
+    waitid_options |= WEXITED;
+  }
+
+  ret = INLINE_SYSCALL_CALL (waitid, idtype, pid, &infop, waitid_options);
+
+  if (stat_loc) {
+    *stat_loc = infop.si_status;
+  }
+
+  if (ret == 0) {
+    return infop.si_pid;
+  }
+
+  return ret;
 #else
   return INLINE_SYSCALL_CALL (wait4, pid, stat_loc, options, NULL);
 #endif