[0/2] RISC-V: ifunced memcpy using new kernel hwprobe interface

Message ID 20230206194819.1679472-1-evan@rivosinc.com

Message

Evan Green Feb. 6, 2023, 7:48 p.m. UTC
This series illustrates the use of a proposed Linux syscall that
enumerates architectural information about the RISC-V cores the system
is running on. In this series we expose a small wrapper function around
the syscall. An ifunc selector for memcpy queries it to see if unaligned
access is "fast" on this hardware. If it is, it selects a newly provided
implementation of memcpy that doesn't work hard at aligning the src and
destination buffers.
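
As a rough illustration of the selection described above, here is a minimal,
portable sketch in C. It is not the code from the patches: the names
probe_unaligned_fast, memcpy_bytewise, memcpy_noalign and select_memcpy are
hypothetical, the probe is a stub standing in for the hwprobe wrapper, and the
choice is returned as a plain function pointer rather than made by a real
ifunc resolver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Hypothetical stand-in for the hwprobe query.  A real selector would ask
   the kernel; here the answer is a variable so both paths can be tried.  */
static int probe_result = 0;
static int probe_unaligned_fast(void) { return probe_result; }

/* Conservative fallback: one byte at a time, so alignment never matters.  */
static void *memcpy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Alignment-ignorant variant: moves 8-byte chunks without first aligning
   either pointer.  The fixed-size memcpy calls are the portable C way to
   express unaligned loads and stores.  */
static void *memcpy_noalign(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n == 0 || (dst != NULL && src != NULL));
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    while (n--)          /* copy the remaining tail bytes */
        *d++ = *s++;
    return dst;
}

/* Models what the selector decides: the alignment-ignorant copy only when
   the probe reports that unaligned access is fast.  */
static memcpy_fn select_memcpy(void)
{
    return probe_unaligned_fast() ? memcpy_noalign : memcpy_bytewise;
}
```

With a real ifunc the dynamic linker would call the selector once at symbol
resolution time; the function pointer here only models that decision.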

This is somewhat of a proof of concept for the syscall itself, but I do
find that in my goofy memcpy test [1], the unaligned memcpy performed at
least as well as the generic C version. This is, however, on QEMU on an M1
Mac, so not a test of any real hardware (more a smoke test that the
implementation isn't silly).

v1 of the Linux series can be found at [2]. I'm about to post v2 (but
haven't yet!); I can reply here with the link once v2 is posted.

[1] https://pastebin.com/Nj8ixpkX
[2] https://yhbt.net/lore/all/20221013163551.6775-1-palmer@rivosinc.com/


Evan Green (2):
  riscv: Add Linux hwprobe syscall support
  riscv: Add and use alignment-ignorant memcpy

 sysdeps/riscv/memcopy.h                       |  28 +++++
 sysdeps/riscv/memcpy.c                        |  65 +++++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 103 ++++++++++++++++++
 sysdeps/unix/sysv/linux/riscv/Makefile        |   8 +-
 sysdeps/unix/sysv/linux/riscv/Versions        |   3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       |  30 +++++
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
 .../unix/sysv/linux/riscv/rv32/arch-syscall.h |   1 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/arch-syscall.h |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   |  34 ++++++
 sysdeps/unix/sysv/linux/syscall-names.list    |   1 +
 13 files changed, 298 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
  

Comments

Richard Henderson Feb. 6, 2023, 9:28 p.m. UTC | #1
On 2/6/23 09:48, Evan Green wrote:
> 
> This series illustrates the use of a proposed Linux syscall that
> enumerates architectural information about the RISC-V cores the system
> is running on. In this series we expose a small wrapper function around
> the syscall. An ifunc selector for memcpy queries it to see if unaligned
> access is "fast" on this hardware. If it is, it selects a newly provided
> implementation of memcpy that doesn't work hard at aligning the src and
> destination buffers.
> 
> This is somewhat of a proof of concept for the syscall itself, but I do
> find that in my goofy memcpy test [1], the unaligned memcpy performed at
> least as well as the generic C version. This is, however, on QEMU on an M1
> Mac, so not a test of any real hardware (more a smoke test that the
> implementation isn't silly).
> 
> v1 of the Linux series can be found at [2]. I'm about to post v2 (but
> haven't yet!); I can reply here with the link once v2 is posted.
> 
> [1] https://pastebin.com/Nj8ixpkX
> [2] https://yhbt.net/lore/all/20221013163551.6775-1-palmer@rivosinc.com/

Re the syscall:

I question whether the heterogeneous CPU case is something that you really want to query.
In order to handle migration between such CPUs, any such query must return the minimum
level of support.

Remove that possibility, and this becomes a simple array reference.  Now you need to
decide whether a vDSO call, or HWCAP2 as a pointer to read-only data, is more or less
efficient or extensible.
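
To make the point above concrete, here is a small sketch in C under stated
assumptions: the keys, the flat per-core table and the functions min_support
and query are all hypothetical, and capability levels are assumed to be
ordered so that a smaller number means less capability. Once per-core values
are folded into a single minimum per key, a query is just an index into a
read-only array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical keys; not the kernel's actual key space.  */
enum { KEY_UNALIGNED, KEY_VECTOR, KEY_COUNT };

/* Fold per-core capability levels into the minimum that every core
   supports, so the answer stays valid when a thread migrates.
   caps is laid out as caps[core * KEY_COUNT + key].  */
static void min_support(const unsigned *caps, size_t ncores,
                        unsigned out[KEY_COUNT])
{
    assert(ncores > 0);
    for (size_t k = 0; k < KEY_COUNT; k++) {
        out[k] = caps[k];                      /* start from core 0 */
        for (size_t c = 1; c < ncores; c++)
            if (caps[c * KEY_COUNT + k] < out[k])
                out[k] = caps[c * KEY_COUNT + k];
    }
}

/* With the minimum precomputed, a query is a bounds-checked array read.
   Unknown keys report level 0.  */
static unsigned query(const unsigned table[KEY_COUNT], size_t key)
{
    return key < KEY_COUNT ? table[key] : 0;
}
```

Whether such a table would be reached through a vDSO call or through a pointer
published in the auxiliary vector is exactly the open choice raised above; the
sketch takes no position on it.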


r~
  
Adhemerval Zanella Feb. 7, 2023, 12:49 p.m. UTC | #2
On 06/02/23 18:28, Richard Henderson via Libc-alpha wrote:
> On 2/6/23 09:48, Evan Green wrote:
>>
>> This series illustrates the use of a proposed Linux syscall that
>> enumerates architectural information about the RISC-V cores the system
>> is running on. In this series we expose a small wrapper function around
>> the syscall. An ifunc selector for memcpy queries it to see if unaligned
>> access is "fast" on this hardware. If it is, it selects a newly provided
>> implementation of memcpy that doesn't work hard at aligning the src and
>> destination buffers.
>>
>> This is somewhat of a proof of concept for the syscall itself, but I do
>> find that in my goofy memcpy test [1], the unaligned memcpy performed at
>> least as well as the generic C version. This is, however, on QEMU on an M1
>> Mac, so not a test of any real hardware (more a smoke test that the
>> implementation isn't silly).
>>
>> v1 of the Linux series can be found at [2]. I'm about to post v2 (but
>> haven't yet!); I can reply here with the link once v2 is posted.
>>
>> [1] https://pastebin.com/Nj8ixpkX
>> [2] https://yhbt.net/lore/all/20221013163551.6775-1-palmer@rivosinc.com/
> 
> Re the syscall:
> 
> I question whether the heterogeneous CPU case is something that you really want to query. In order to handle migration between such CPUs, any such query must return the minimum level of support.
> 
> Remove that possibility, and this becomes a simple array reference.  Now you need to decide whether a vDSO call, or HWCAP2 as a pointer to read-only data, is more or less efficient or extensible.

It should at least work if the kernel traps and emulates unaligned accesses,
or any instruction not supported by the other core, although performance
would be really subpar.  I would expect the kernel to report the minimum ISA
as well.

I would also recommend caching the values, as we do for aarch64/x86/powerpc,
to avoid issuing multiple syscalls during symbol resolution (see cpu-features.c).
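
The caching suggested above might look roughly like the following sketch. It
does not reproduce glibc's cpu-features.c: expensive_probe, struct
cpu_features and get_cpu_features are hypothetical names, and a call counter
stands in for the syscall so that the effect of the cache is visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the simulated kernel query runs.  */
static int probe_calls = 0;

/* Hypothetical stand-in for the syscall-backed query.  */
static bool expensive_probe(void)
{
    probe_calls++;
    return true;
}

struct cpu_features {
    bool initialized;
    bool unaligned_fast;
};

static struct cpu_features features;

/* Query once, then serve every later lookup from the cached copy.
   Not thread-safe as written; one option is to fill the cache during
   early startup, before any selector can run.  */
static const struct cpu_features *get_cpu_features(void)
{
    if (!features.initialized) {
        features.unaligned_fast = expensive_probe();
        features.initialized = true;
    }
    assert(features.initialized);   /* cache is populated from here on */
    return &features;
}
```

Each selector would then read the cached structure instead of making its own
syscall, so resolving several symbols costs a single query.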