[RFC,0/4] futex2: Add new futex interface

Message ID 20200612185122.327860-1-andrealmeid@collabora.com
Headers
Series futex2: Add new futex interface |

Message

André Almeida June 12, 2020, 6:51 p.m. UTC
  Hello,

This RFC is a followup to the previous discussion initiated from my last
patch "futex: Implement mechanism to wait on any of several futexes"[1].
As stated in the thread, the correct approach to move forward with the
wait multiple operation would be to create a new syscall that would have
all new cool features.

The first patch adds the new interface and just translate the call for
the old interface, without implementing new features. The goal here is
to establish the interface and to check if everyone is happy with this
API. The rest of patches are selftests to show the interface in action.
I have the following questions:

- Has anyone stared worked on a implementation of this interface? If
  yes, it would be nice to share the progress so we don't have duplicated
  work.

- What suggestions do you have to implement this? Start from scratch or
  reuse the most code possible?

- The interface seems correct and implements the requirements asked by you?

- The proposed interface uses ktime_t type for absolute timeout, and I
  assumed that it should use values in a nsec resolution. If this is true,
  we have some problems with i386 ABI, please check out the
  COMPAT_32BIT_TIME implementation in patch 1 for more details. I
  haven't added a time64 implementation yet, until this is clarified.

- Is expected to have a x32 ABI implementation as well? In the case of
  wait and wake, we could use the same as x86_64 ABI. However, for the
  waitv (aka wait on multiple futexes) we would need a proper x32 entry
  since we are dealing with 32bit pointers.

Those are the cool new features that this syscall should address some
day:

- Operate with variable bit size futexes, not restricted to 32:
  8, 16 and 64

- Wait on multiple futexes, using the following semantics:

  struct futex_wait {
	void *uaddr;
	unsigned long val;
	unsigned long flags;
  };

  sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
		  unsigned long flags, ktime_t *timo);

- Have NUMA optimizations: if FUTEX_NUMA_FLAG is present, the `void *uaddr`
  argument won't be a u{8, 16, 32, 64} value anymore, but a struct
  containing a NUMA node hint:

  struct futex32_numa {
	  u32 value __attribute__ ((aligned (8)));
	  u32 hint;
  };

  struct futex64_numa {
	  u64 value __attribute__ ((aligned (16)));
	  u64 hint;
  };

Thanks,
	André

André Almeida (4):
  futex2: Add new futex interface
  selftests: futex: Add futex2 wake/wait test
  selftests: futex: Add futex2 timeout test
  selftests: futex: Add futex2 wouldblock test

 MAINTAINERS                                   |   2 +-
 arch/x86/entry/syscalls/syscall_32.tbl        |   2 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   2 +
 include/linux/syscalls.h                      |   9 ++
 include/uapi/asm-generic/unistd.h             |   7 +-
 include/uapi/linux/futex.h                    |  10 ++
 init/Kconfig                                  |   7 ++
 kernel/Makefile                               |   1 +
 kernel/futex2.c                               |  97 ++++++++++++++++
 kernel/sys_ni.c                               |   5 +
 tools/include/uapi/asm-generic/unistd.h       |   7 +-
 .../selftests/futex/functional/.gitignore     |   1 +
 .../selftests/futex/functional/Makefile       |   4 +-
 .../selftests/futex/functional/futex2_wait.c  | 106 ++++++++++++++++++
 .../futex/functional/futex_wait_timeout.c     |  27 ++++-
 .../futex/functional/futex_wait_wouldblock.c  |  34 +++++-
 .../testing/selftests/futex/functional/run.sh |   3 +
 .../selftests/futex/include/futex2test.h      |  50 +++++++++
 18 files changed, 361 insertions(+), 13 deletions(-)
 create mode 100644 kernel/futex2.c
 create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
 create mode 100644 tools/testing/selftests/futex/include/futex2test.h
  

Comments

H.J. Lu June 12, 2020, 7:35 p.m. UTC | #1
On Fri, Jun 12, 2020 at 11:53 AM André Almeida via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Hello,
>
> This RFC is a followup to the previous discussion initiated from my last
> patch "futex: Implement mechanism to wait on any of several futexes"[1].
> As stated in the thread, the correct approach to move forward with the
> wait multiple operation would be to create a new syscall that would have
> all new cool features.
>
> The first patch adds the new interface and just translate the call for
> the old interface, without implementing new features. The goal here is
> to establish the interface and to check if everyone is happy with this
> API. The rest of patches are selftests to show the interface in action.
> I have the following questions:
>
> - Has anyone stared worked on a implementation of this interface? If
>   yes, it would be nice to share the progress so we don't have duplicated
>   work.
>
> - What suggestions do you have to implement this? Start from scratch or
>   reuse the most code possible?
>
> - The interface seems correct and implements the requirements asked by you?
>
> - The proposed interface uses ktime_t type for absolute timeout, and I
>   assumed that it should use values in a nsec resolution. If this is true,
>   we have some problems with i386 ABI, please check out the
>   COMPAT_32BIT_TIME implementation in patch 1 for more details. I
>   haven't added a time64 implementation yet, until this is clarified.
>
> - Is expected to have a x32 ABI implementation as well? In the case of
>   wait and wake, we could use the same as x86_64 ABI. However, for the
>   waitv (aka wait on multiple futexes) we would need a proper x32 entry
>   since we are dealing with 32bit pointers.

x32 should be able to use the same i386 compat systcall entry.   Will it be
problem?

> Those are the cool new features that this syscall should address some
> day:
>
> - Operate with variable bit size futexes, not restricted to 32:
>   8, 16 and 64
>
> - Wait on multiple futexes, using the following semantics:
>
>   struct futex_wait {
>         void *uaddr;
>         unsigned long val;
>         unsigned long flags;
>   };
>
>   sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
>                   unsigned long flags, ktime_t *timo);
>
> - Have NUMA optimizations: if FUTEX_NUMA_FLAG is present, the `void *uaddr`
>   argument won't be a u{8, 16, 32, 64} value anymore, but a struct
>   containing a NUMA node hint:
>
>   struct futex32_numa {
>           u32 value __attribute__ ((aligned (8)));
>           u32 hint;
>   };
>
>   struct futex64_numa {
>           u64 value __attribute__ ((aligned (16)));
>           u64 hint;
>   };
>

H.J.
  
André Almeida June 13, 2020, 7:25 p.m. UTC | #2
Hello H.J.,

On 6/12/20 4:35 PM, H.J. Lu wrote:
> On Fri, Jun 12, 2020 at 11:53 AM André Almeida via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>> - Is expected to have a x32 ABI implementation as well? In the case of
>>   wait and wake, we could use the same as x86_64 ABI. However, for the
>>   waitv (aka wait on multiple futexes) we would need a proper x32 entry
>>   since we are dealing with 32bit pointers.
> 
> x32 should be able to use the same i386 compat systcall entry.   Will it be
> problem?
> 
Indeed, you are right. In the last iteration of this work, I had some
problems dealing with x32 ABI, but this new interface doesn't have the
same problem anymore. We can use the same sys_waitv_time64 interface for
both i368 and x32.

>>
> 
> H.J.
> 

Thanks,
André
  
Arnd Bergmann June 25, 2020, 6:48 a.m. UTC | #3
On Fri, Jun 12, 2020 at 8:51 PM André Almeida <andrealmeid@collabora.com> wrote:

> - The proposed interface uses ktime_t type for absolute timeout, and I
>   assumed that it should use values in a nsec resolution. If this is true,
>   we have some problems with i386 ABI, please check out the
>   COMPAT_32BIT_TIME implementation in patch 1 for more details. I
>   haven't added a time64 implementation yet, until this is clarified.

ktime_t is not part of the uapi headers, and has always been considered
an implementation detail of the kernel so far. I would argue it should
stay that way. The most sensible alternatives would be to either use
a "__u64 *timeout" argument for a relative timeout, or a
"struct __kernel_timespec *timeout" for an absolute timeout.

old_time32_t also makes no sense for multiple reasons:

- It's another kernel internal type and not part of the uapi headers
- your time32 call has different calling conventions from your time64
  version, not just a different type.
- there should be no need to add syscalls that are known to be buggy
  when there is a replacement type that does not have that bug.

> - Is expected to have a x32 ABI implementation as well? In the case of
>   wait and wake, we could use the same as x86_64 ABI. However, for the
>   waitv (aka wait on multiple futexes) we would need a proper x32 entry
>   since we are dealing with 32bit pointers.

For new syscalls, I'd actually recommend not having a separate
entry point, but just checking 'if (in_compat_syscall())' inside of the
implementation to pick one behavior vs the other when accessing
the user pointers. This keeps the implementation simpler and
avoids assigning a new x32 syscall number that would be different
from all the other architectures.

      Arnd
  
André Almeida June 25, 2020, 2:38 p.m. UTC | #4
Hello Arnd,

On 6/25/20 3:48 AM, Arnd Bergmann wrote:
> On Fri, Jun 12, 2020 at 8:51 PM André Almeida <andrealmeid@collabora.com> wrote:
> 
>> - The proposed interface uses ktime_t type for absolute timeout, and I
>>   assumed that it should use values in a nsec resolution. If this is true,
>>   we have some problems with i386 ABI, please check out the
>>   COMPAT_32BIT_TIME implementation in patch 1 for more details. I
>>   haven't added a time64 implementation yet, until this is clarified.
> 
> ktime_t is not part of the uapi headers, and has always been considered
> an implementation detail of the kernel so far. I would argue it should
> stay that way. The most sensible alternatives would be to either use
> a "__u64 *timeout" argument for a relative timeout, or a
> "struct __kernel_timespec *timeout" for an absolute timeout.
> 
> old_time32_t also makes no sense for multiple reasons:
> 
> - It's another kernel internal type and not part of the uapi headers
> - your time32 call has different calling conventions from your time64
>   version, not just a different type.
> - there should be no need to add syscalls that are known to be buggy
>   when there is a replacement type that does not have that bug.
> 

Thanks for the input. As stated by tglx at [1], "supporting relative
timeouts is wrong to begin with", my next patch will use "struct
__kernel_timespec *timeout" for an absolute timeout.

>> - Is expected to have a x32 ABI implementation as well? In the case of
>>   wait and wake, we could use the same as x86_64 ABI. However, for the
>>   waitv (aka wait on multiple futexes) we would need a proper x32 entry
>>   since we are dealing with 32bit pointers.
> 
> For new syscalls, I'd actually recommend not having a separate
> entry point, but just checking 'if (in_compat_syscall())' inside of the
> implementation to pick one behavior vs the other when accessing
> the user pointers. This keeps the implementation simpler and
> avoids assigning a new x32 syscall number that would be different
> from all the other architectures.
> 

Cool, this will make the code cleaner.

>       Arnd
> 


Thanks,
	André

[1] https://lkml.org/lkml/2019/7/31/1499