[v10,0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface

Message ID 20231213211142.1543025-1-evan@rivosinc.com
Headers
Series RISC-V: ifunced memcpy using new kernel hwprobe interface |

Message

Evan Green Dec. 13, 2023, 9:11 p.m. UTC
  This series illustrates the use of a recently accepted Linux syscall that
enumerates architectural information about the RISC-V cores the system
is running on. In this series we expose a small wrapper function around
the syscall. An ifunc selector for memcpy queries it to see if unaligned
access is "fast" on this hardware. If it is, it selects a newly provided
implementation of memcpy that doesn't work hard at aligning the src and
destination buffers.

For applications and libraries outside of glibc that want to use
__riscv_hwprobe() in ifunc selectors, this series also sends a pointer
to the riscv_hwprobe() function in as the second argument to ifunc
selectors. A new inline convenience function can help application and
library callers to check for validity and quickly probe a single key.

The memcpy implementation is independent enough from the rest of the
series that it can be omitted safely if desired.

Performance numbers were compared using a small test program [1], run on
a D1 Nezha board, which supports fast unaligned access. "Fast" here
means copying unaligned words is faster than copying byte-wise, but
still slower than copying aligned words. Here's the speed of various
memcpy()s with the generic implementation. The numbers before are using
v4's memcpy implementation, with the "copy last byte via overlapping
misaligned word" fix this should get slightly better.

memcpy size 1 count 1000000 offset 0 took 109564 us
memcpy size 3 count 1000000 offset 0 took 138425 us
memcpy size 4 count 1000000 offset 0 took 148374 us
memcpy size 7 count 1000000 offset 0 took 178433 us
memcpy size 8 count 1000000 offset 0 took 188430 us
memcpy size f count 1000000 offset 0 took 266118 us
memcpy size f count 1000000 offset 1 took 265940 us
memcpy size f count 1000000 offset 3 took 265934 us
memcpy size f count 1000000 offset 7 took 266215 us
memcpy size f count 1000000 offset 8 took 265954 us
memcpy size f count 1000000 offset 9 took 265886 us
memcpy size 10 count 1000000 offset 0 took 195308 us
memcpy size 11 count 1000000 offset 0 took 205161 us
memcpy size 17 count 1000000 offset 0 took 274376 us
memcpy size 18 count 1000000 offset 0 took 199188 us
memcpy size 19 count 1000000 offset 0 took 209258 us
memcpy size 1f count 1000000 offset 0 took 278263 us
memcpy size 20 count 1000000 offset 0 took 207364 us
memcpy size 21 count 1000000 offset 0 took 217143 us
memcpy size 3f count 1000000 offset 0 took 300023 us
memcpy size 40 count 1000000 offset 0 took 231063 us
memcpy size 41 count 1000000 offset 0 took 241259 us
memcpy size 7c count 100000 offset 0 took 32807 us
memcpy size 7f count 100000 offset 0 took 36274 us
memcpy size ff count 100000 offset 0 took 47818 us
memcpy size ff count 100000 offset 0 took 47932 us
memcpy size 100 count 100000 offset 0 took 40468 us
memcpy size 200 count 100000 offset 0 took 64245 us
memcpy size 27f count 100000 offset 0 took 82549 us
memcpy size 400 count 100000 offset 0 took 111254 us
memcpy size 407 count 100000 offset 0 took 119364 us
memcpy size 800 count 100000 offset 0 took 203899 us
memcpy size 87f count 100000 offset 0 took 222465 us
memcpy size 87f count 100000 offset 3 took 222289 us
memcpy size 1000 count 100000 offset 0 took 388846 us
memcpy size 1000 count 100000 offset 1 took 468827 us
memcpy size 1000 count 100000 offset 3 took 397098 us
memcpy size 1000 count 100000 offset 4 took 397379 us
memcpy size 1000 count 100000 offset 5 took 397368 us
memcpy size 1000 count 100000 offset 7 took 396867 us
memcpy size 1000 count 100000 offset 8 took 389227 us
memcpy size 1000 count 100000 offset 9 took 395949 us
memcpy size 3000 count 50000 offset 0 took 674837 us
memcpy size 3000 count 50000 offset 1 took 676944 us
memcpy size 3000 count 50000 offset 3 took 679709 us
memcpy size 3000 count 50000 offset 4 took 680829 us
memcpy size 3000 count 50000 offset 5 took 678024 us
memcpy size 3000 count 50000 offset 7 took 681097 us
memcpy size 3000 count 50000 offset 8 took 670004 us
memcpy size 3000 count 50000 offset 9 took 674553 us

Here is that same test run with the assembly memcpy() in this series:
memcpy size 1 count 1000000 offset 0 took 92703 us
memcpy size 3 count 1000000 offset 0 took 112527 us
memcpy size 4 count 1000000 offset 0 took 120481 us
memcpy size 7 count 1000000 offset 0 took 149558 us
memcpy size 8 count 1000000 offset 0 took 90617 us
memcpy size f count 1000000 offset 0 took 174373 us
memcpy size f count 1000000 offset 1 took 178615 us
memcpy size f count 1000000 offset 3 took 178845 us
memcpy size f count 1000000 offset 7 took 178636 us
memcpy size f count 1000000 offset 8 took 174442 us
memcpy size f count 1000000 offset 9 took 178660 us
memcpy size 10 count 1000000 offset 0 took 99845 us
memcpy size 11 count 1000000 offset 0 took 112522 us
memcpy size 17 count 1000000 offset 0 took 179735 us
memcpy size 18 count 1000000 offset 0 took 110870 us
memcpy size 19 count 1000000 offset 0 took 121472 us
memcpy size 1f count 1000000 offset 0 took 188231 us
memcpy size 20 count 1000000 offset 0 took 119571 us
memcpy size 21 count 1000000 offset 0 took 132429 us
memcpy size 3f count 1000000 offset 0 took 227021 us
memcpy size 40 count 1000000 offset 0 took 166416 us
memcpy size 41 count 1000000 offset 0 took 180206 us
memcpy size 7c count 100000 offset 0 took 28602 us
memcpy size 7f count 100000 offset 0 took 31676 us
memcpy size ff count 100000 offset 0 took 39257 us
memcpy size ff count 100000 offset 0 took 39176 us
memcpy size 100 count 100000 offset 0 took 21928 us
memcpy size 200 count 100000 offset 0 took 35814 us
memcpy size 27f count 100000 offset 0 took 60315 us
memcpy size 400 count 100000 offset 0 took 63652 us
memcpy size 407 count 100000 offset 0 took 73160 us
memcpy size 800 count 100000 offset 0 took 121532 us
memcpy size 87f count 100000 offset 0 took 147269 us
memcpy size 87f count 100000 offset 3 took 144744 us
memcpy size 1000 count 100000 offset 0 took 232057 us
memcpy size 1000 count 100000 offset 1 took 254319 us
memcpy size 1000 count 100000 offset 3 took 256973 us
memcpy size 1000 count 100000 offset 4 took 257655 us
memcpy size 1000 count 100000 offset 5 took 259456 us
memcpy size 1000 count 100000 offset 7 took 260849 us
memcpy size 1000 count 100000 offset 8 took 232347 us
memcpy size 1000 count 100000 offset 9 took 254330 us
memcpy size 3000 count 50000 offset 0 took 382376 us
memcpy size 3000 count 50000 offset 1 took 389872 us
memcpy size 3000 count 50000 offset 3 took 385310 us
memcpy size 3000 count 50000 offset 4 took 389748 us
memcpy size 3000 count 50000 offset 5 took 391707 us
memcpy size 3000 count 50000 offset 7 took 386778 us
memcpy size 3000 count 50000 offset 8 took 385691 us
memcpy size 3000 count 50000 offset 9 took 392030 us

The assembly routine is measurably better.

[1] https://pastebin.com/DRyECNQW


Changes in v10:
 - Remove spurious 5 from syscall patch (Adhemerval)
 - Use one item per line in Makefile (Adhemerval)
 - Remove double underscores from __riscv_hwprobe definition
   (Adhemerval)
 - Use only spaces in macro definitions of hwprobe.h (Adhemerval)
 - Introduced INTERNAL_VSYSCALL patch
 - Remove leading underscores in definition (Adhemerval)
 - Remove spurious 5 from INTERNAL_SYSCALL_CALL (Adhemerval)
 - Use new INTERNAL_VSYSCALL macro instead (Adhemerval)
 - Hand in pointer to __riscv_hwprobe(), not vDSO function
 - Avoid implicit comparisons (Adhemerval)
 - One line per function in Makefile for memcpy (Adhemerval)
 - Space before argument-like things (Adhemerval)

Changes in v9:
 - Alphabetize new entries in libc.abilist (to pass checks)
 - Fix a couple of typos causing powerpc not to build
   (build-many-glibcs)
 - Use __inline rather than inline so c89 compiles (build-many-glibcs)

Changes in v8:
 - Fix missed 2.39 in abilists (Joseph)
 - Just return -r (Florian)

Changes in v7:
 - Bumped Versions up to 2.39 (Joseph)
 - Used INTERNAL_SYSCALL_CALL, and return positive errno to match
   pthreads API (Florian).
 - Remove __THROW since it creates a warning in combination with the
   fortified access decorators.
 - Use INTERNAL_VSYSCALL_CALL (Florian)
 - Remove __THROW from function pointer type, as it creates warnings
   together with __fortified_attr_access.
 - Introduced static inline helper (Richard)
 - Use new helper function in memcpy ifunc selector (Richard)

Changes in v6:
 - Prefixed __riscv_hwprobe() parameters names with __ to avoid user
   macro namespace pollution (Joseph)
 - Introduced riscv-ifunc.h for multi-arg ifunc selectors.
 - Fix a couple regressions in the assembly from v5 :/
 - Use passed hwprobe pointer in memcpy ifunc selector.

Changes in v5:
 - Do unaligned word access for final trailing bytes (Richard)

Changes in v4:
 - Remove __USE_GNU (Florian)
 - __nonnull, __wur, __THROW, and  __fortified_attr_access decorations
  (Florian)
 - change long to long int (Florian)
 - Fix comment formatting (Florian)
 - Update backup kernel header content copy.
 - Fix function declaration formatting (Florian)
 - Changed export versions to 2.38
 - Fixed comment style (Florian)

Changes in v3:
 - Update argument types to match v4 kernel interface
 - Add the "return" to the vsyscall
 - Fix up vdso arg types to match kernel v4 version
 - Remove ifdef around INLINE_VSYSCALL (Adhemerval)
 - Word align dest for large memcpy()s.
 - Add tags
 - Remove spurious blank line from sysdeps/riscv/memcpy.c

Changes in v2:
 - hwprobe.h: Use __has_include and duplicate Linux content to make
   compilation work when Linux headers are absent (Adhemerval)
 - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
 - Use INLINE_SYSCALL_CALL (Adhemerval)
 - Update versions
 - Update UNALIGNED_MASK to match kernel v3 series.
 - Add vDSO interface
 - Used _MASK instead of _FAST value itself.

Evan Green (7):
  riscv: Add Linux hwprobe syscall support
  linux: Introduce INTERNAL_VSYSCALL
  riscv: Add hwprobe vdso call support
  riscv: Add __riscv_hwprobe pointer to ifunc calls
  riscv: Enable multi-arg ifunc resolvers
  riscv: Add ifunc helper method to hwprobe.h
  riscv: Add and use alignment-ignorant memcpy

 include/libc-symbols.h                        |  28 ++--
 sysdeps/riscv/dl-irel.h                       |   9 +-
 sysdeps/riscv/memcopy.h                       |  26 ++++
 sysdeps/riscv/memcpy.c                        |  63 ++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 138 ++++++++++++++++++
 sysdeps/riscv/riscv-ifunc.h                   |  27 ++++
 sysdeps/unix/sysv/linux/dl-vdso-setup.c       |  10 ++
 sysdeps/unix/sysv/linux/dl-vdso-setup.h       |   3 +
 sysdeps/unix/sysv/linux/riscv/Makefile        |  21 ++-
 sysdeps/unix/sysv/linux/riscv/Versions        |   3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       |  36 +++++
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 +++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   | 109 ++++++++++++++
 sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
 sysdeps/unix/sysv/linux/sysdep-vdso.h         |  19 +++
 17 files changed, 502 insertions(+), 17 deletions(-)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/riscv/riscv-ifunc.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
  

Comments

Evan Green Jan. 8, 2024, 5:06 p.m. UTC | #1
Any last thoughts on this series? I think the plan is to land it shortly.
  
Palmer Dabbelt Jan. 8, 2024, 6:10 p.m. UTC | #2
On Mon, 08 Jan 2024 09:06:10 PST (-0800), Evan Green wrote:
> Any last thoughts on this series? I think the plan is to land it shortly.

Sorry for benig slow here.  It's good on my end, so as long as nobody 
has lurking issues I'm happy to take it.

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

in case someone else wants to pick it up, otherwise let's give it a day 
and see?  There's no branch date listed on the wiki, but I'd like to get 
this in sooner rather than later so we can get the distro folks testing 
it and such.
  
Stefan O'Rear Jan. 9, 2024, 12:06 p.m. UTC | #3
On Mon, Jan 8, 2024, at 12:06 PM, Evan Green wrote:
> Any last thoughts on this series? I think the plan is to land it shortly.

I am still wondering what the intended ABI is for the second parameter on
non-Linux ELF systems which do not provide riscv_hwprobe as a system call.
Should it be passed as a null (or 1/-1) pointer in that case, or are dynamic
linkers expected to provide some emulation of riscv_hwprobe of some variable
fidelity?  In the latter case, what features of riscv_hwprobe must be emulated
for the dynamic linker to claim ABI conformance?

-s
  
Evan Green Jan. 9, 2024, 5:10 p.m. UTC | #4
On Tue, Jan 9, 2024 at 4:06 AM Stefan O'Rear <sorear@fastmail.com> wrote:
>
> On Mon, Jan 8, 2024, at 12:06 PM, Evan Green wrote:
> > Any last thoughts on this series? I think the plan is to land it shortly.
>
> I am still wondering what the intended ABI is for the second parameter on
> non-Linux ELF systems which do not provide riscv_hwprobe as a system call.
> Should it be passed as a null (or 1/-1) pointer in that case, or are dynamic
> linkers expected to provide some emulation of riscv_hwprobe of some variable
> fidelity?  In the latter case, what features of riscv_hwprobe must be emulated
> for the dynamic linker to claim ABI conformance?

NULL would make the most sense IMO. Hwprobe is a Linuxism for sure, so
code that uses hwprobe (either inside or outside an ifunc selector)
won't by default be portable across OSes. If Linux is consistently
good at defining the hwprobe bits and not breaking our own ABI, other
OSes could in theory emulate the interface by exposing the same
keys/values. Though at least from our perspective that's not a goal.

-Evan

>
> -s
  
Stefan O'Rear Jan. 9, 2024, 6:29 p.m. UTC | #5
On Tue, Jan 9, 2024, at 12:10 PM, Evan Green wrote:
> On Tue, Jan 9, 2024 at 4:06 AM Stefan O'Rear <sorear@fastmail.com> wrote:
>>
>> On Mon, Jan 8, 2024, at 12:06 PM, Evan Green wrote:
>> > Any last thoughts on this series? I think the plan is to land it shortly.
>>
>> I am still wondering what the intended ABI is for the second parameter on
>> non-Linux ELF systems which do not provide riscv_hwprobe as a system call.
>> Should it be passed as a null (or 1/-1) pointer in that case, or are dynamic
>> linkers expected to provide some emulation of riscv_hwprobe of some variable
>> fidelity?  In the latter case, what features of riscv_hwprobe must be emulated
>> for the dynamic linker to claim ABI conformance?
>
> NULL would make the most sense IMO. Hwprobe is a Linuxism for sure, so
> code that uses hwprobe (either inside or outside an ifunc selector)
> won't by default be portable across OSes. If Linux is consistently
> good at defining the hwprobe bits and not breaking our own ABI, other
> OSes could in theory emulate the interface by exposing the same
> keys/values. Though at least from our perspective that's not a goal.
>
> -Evan

NULL was a mistake; we need to pass a non-NULL value in a1 to signal that
a2 is defined, since the current implementations pass NULL in a1 and
garbage in a2.

If a dynamic linker does not provide a Linux-compatible riscv_hwprobe but
does support features that are passed in a2..a7, would it be better to pass
(long(*)())(-1) or a stub function that just returns 38 (+ENOSYS)?

-s
  
Evan Green Jan. 9, 2024, 6:41 p.m. UTC | #6
On Tue, Jan 9, 2024 at 10:30 AM Stefan O'Rear <sorear@fastmail.com> wrote:
>
> On Tue, Jan 9, 2024, at 12:10 PM, Evan Green wrote:
> > On Tue, Jan 9, 2024 at 4:06 AM Stefan O'Rear <sorear@fastmail.com> wrote:
> >>
> >> On Mon, Jan 8, 2024, at 12:06 PM, Evan Green wrote:
> >> > Any last thoughts on this series? I think the plan is to land it shortly.
> >>
> >> I am still wondering what the intended ABI is for the second parameter on
> >> non-Linux ELF systems which do not provide riscv_hwprobe as a system call.
> >> Should it be passed as a null (or 1/-1) pointer in that case, or are dynamic
> >> linkers expected to provide some emulation of riscv_hwprobe of some variable
> >> fidelity?  In the latter case, what features of riscv_hwprobe must be emulated
> >> for the dynamic linker to claim ABI conformance?
> >
> > NULL would make the most sense IMO. Hwprobe is a Linuxism for sure, so
> > code that uses hwprobe (either inside or outside an ifunc selector)
> > won't by default be portable across OSes. If Linux is consistently
> > good at defining the hwprobe bits and not breaking our own ABI, other
> > OSes could in theory emulate the interface by exposing the same
> > keys/values. Though at least from our perspective that's not a goal.
> >
> > -Evan
>
> NULL was a mistake; we need to pass a non-NULL value in a1 to signal that
> a2 is defined, since the current implementations pass NULL in a1 and
> garbage in a2.
>
> If a dynamic linker does not provide a Linux-compatible riscv_hwprobe but
> does support features that are passed in a2..a7, would it be better to pass
> (long(*)())(-1) or a stub function that just returns 38 (+ENOSYS)?

Oh great point, I hadn't connected those dots. I'm fine with either.
-1 would let them distinguish this case a little more explicitly, so
maybe that's better?

Is there a good spot to document this?

Should I defend against that -1 value in my static helper function as
well by checking for it? It seems like I shouldn't need to since it's
in a Linux-specific header, but if there's a scenario for it then I'll
add it.
-Evan

>
> -s
  
Stefan O'Rear Jan. 9, 2024, 7:09 p.m. UTC | #7
On Tue, Jan 9, 2024, at 1:41 PM, Evan Green wrote:
> On Tue, Jan 9, 2024 at 10:30 AM Stefan O'Rear <sorear@fastmail.com> wrote:
>>
>> On Tue, Jan 9, 2024, at 12:10 PM, Evan Green wrote:
>> > On Tue, Jan 9, 2024 at 4:06 AM Stefan O'Rear <sorear@fastmail.com> wrote:
>> >>
>> >> On Mon, Jan 8, 2024, at 12:06 PM, Evan Green wrote:
>> >> > Any last thoughts on this series? I think the plan is to land it shortly.
>> >>
>> >> I am still wondering what the intended ABI is for the second parameter on
>> >> non-Linux ELF systems which do not provide riscv_hwprobe as a system call.
>> >> Should it be passed as a null (or 1/-1) pointer in that case, or are dynamic
>> >> linkers expected to provide some emulation of riscv_hwprobe of some variable
>> >> fidelity?  In the latter case, what features of riscv_hwprobe must be emulated
>> >> for the dynamic linker to claim ABI conformance?
>> >
>> > NULL would make the most sense IMO. Hwprobe is a Linuxism for sure, so
>> > code that uses hwprobe (either inside or outside an ifunc selector)
>> > won't by default be portable across OSes. If Linux is consistently
>> > good at defining the hwprobe bits and not breaking our own ABI, other
>> > OSes could in theory emulate the interface by exposing the same
>> > keys/values. Though at least from our perspective that's not a goal.
>> >
>> > -Evan
>>
>> NULL was a mistake; we need to pass a non-NULL value in a1 to signal that
>> a2 is defined, since the current implementations pass NULL in a1 and
>> garbage in a2.
>>
>> If a dynamic linker does not provide a Linux-compatible riscv_hwprobe but
>> does support features that are passed in a2..a7, would it be better to pass
>> (long(*)())(-1) or a stub function that just returns 38 (+ENOSYS)?
>
> Oh great point, I hadn't connected those dots. I'm fine with either.
> -1 would let them distinguish this case a little more explicitly, so
> maybe that's better?

+1 might be a better choice (match SIG_IGN rather than MAP_FAILED, already
a non-function in the uABI, and slightly fewer bytes for the branch).

The stub function has a disadvantage of polluting the global ABI with
Linux-specific error constants.

No strong feelings here.

> Is there a good spot to document this?

The details of the IRELATIVE resolution process should probably go in
riscv-elf-psabi-doc/riscv-elf.adoc .

> Should I defend against that -1 value in my static helper function as
> well by checking for it? It seems like I shouldn't need to since it's
> in a Linux-specific header, but if there's a scenario for it then I'll
> add it.
> -Evan

Do you mean select_memcpy_ifunc in patch 7?  I'm inclined to say that it
should be robust in case someone copies it into a library other than glibc
and it can no longer depend on Linux and a specific version of the glibc
dynamic linker.

-s
  
enh Jan. 9, 2024, 7:18 p.m. UTC | #8
On Tue, Jan 9, 2024, 11:10 Stefan O'Rear <sorear@fastmail.com> wrote:

> On Tue, Jan 9, 2024, at 1:41 PM, Evan Green wrote:
> > On Tue, Jan 9, 2024 at 10:30 AM Stefan O'Rear <sorear@fastmail.com>
> wrote:
> >>
> >> On Tue, Jan 9, 2024, at 12:10 PM, Evan Green wrote:
> >> > On Tue, Jan 9, 2024 at 4:06 AM Stefan O'Rear <sorear@fastmail.com>
> wrote:
> >> >>
> >> >> On Mon, Jan 8, 2024, at 12:06 PM, Evan Green wrote:
> >> >> > Any last thoughts on this series? I think the plan is to land it
> shortly.
> >> >>
> >> >> I am still wondering what the intended ABI is for the second
> parameter on
> >> >> non-Linux ELF systems which do not provide riscv_hwprobe as a system
> call.
> >> >> Should it be passed as a null (or 1/-1) pointer in that case, or are
> dynamic
> >> >> linkers expected to provide some emulation of riscv_hwprobe of some
> variable
> >> >> fidelity?  In the latter case, what features of riscv_hwprobe must
> be emulated
> >> >> for the dynamic linker to claim ABI conformance?
> >> >
> >> > NULL would make the most sense IMO. Hwprobe is a Linuxism for sure, so
> >> > code that uses hwprobe (either inside or outside an ifunc selector)
> >> > won't by default be portable across OSes. If Linux is consistently
> >> > good at defining the hwprobe bits and not breaking our own ABI, other
> >> > OSes could in theory emulate the interface by exposing the same
> >> > keys/values. Though at least from our perspective that's not a goal.
> >> >
> >> > -Evan
> >>
> >> NULL was a mistake; we need to pass a non-NULL value in a1 to signal
> that
> >> a2 is defined, since the current implementations pass NULL in a1 and
> >> garbage in a2.
> >>
> >> If a dynamic linker does not provide a Linux-compatible riscv_hwprobe
> but
> >> does support features that are passed in a2..a7, would it be better to
> pass
> >> (long(*)())(-1) or a stub function that just returns 38 (+ENOSYS)?
> >
> > Oh great point, I hadn't connected those dots. I'm fine with either.
> > -1 would let them distinguish this case a little more explicitly, so
> > maybe that's better?
>
> +1 might be a better choice (match SIG_IGN rather than MAP_FAILED, already
> a non-function in the uABI, and slightly fewer bytes for the branch).
>
> The stub function has a disadvantage of polluting the global ABI with
> Linux-specific error constants.
>

(and i'd still argue that no-one really needs it and, worse, folks who
think they need it are probably pessimizing their code.)

No strong feelings here.
>
> > Is there a good spot to document this?
>
> The details of the IRELATIVE resolution process should probably go in
> riscv-elf-psabi-doc/riscv-elf.adoc .
>
> > Should I defend against that -1 value in my static helper function as
> > well by checking for it? It seems like I shouldn't need to since it's
> > in a Linux-specific header, but if there's a scenario for it then I'll
> > add it.
> > -Evan
>
> Do you mean select_memcpy_ifunc in patch 7?  I'm inclined to say that it
> should be robust in case someone copies it into a library other than glibc
> and it can no longer depend on Linux and a specific version of the glibc
> dynamic linker.
>
> -s
>
  
Evan Green Jan. 9, 2024, 7:36 p.m. UTC | #9
On Tue, Jan 9, 2024 at 11:10 AM Stefan O'Rear <sorear@fastmail.com> wrote:
>
> On Tue, Jan 9, 2024, at 1:41 PM, Evan Green wrote:
> > On Tue, Jan 9, 2024 at 10:30 AM Stefan O'Rear <sorear@fastmail.com> wrote:
> >>
> >> On Tue, Jan 9, 2024, at 12:10 PM, Evan Green wrote:
> >> > On Tue, Jan 9, 2024 at 4:06 AM Stefan O'Rear <sorear@fastmail.com> wrote:
> >> >>
> >> >> On Mon, Jan 8, 2024, at 12:06 PM, Evan Green wrote:
> >> >> > Any last thoughts on this series? I think the plan is to land it shortly.
> >> >>
> >> >> I am still wondering what the intended ABI is for the second parameter on
> >> >> non-Linux ELF systems which do not provide riscv_hwprobe as a system call.
> >> >> Should it be passed as a null (or 1/-1) pointer in that case, or are dynamic
> >> >> linkers expected to provide some emulation of riscv_hwprobe of some variable
> >> >> fidelity?  In the latter case, what features of riscv_hwprobe must be emulated
> >> >> for the dynamic linker to claim ABI conformance?
> >> >
> >> > NULL would make the most sense IMO. Hwprobe is a Linuxism for sure, so
> >> > code that uses hwprobe (either inside or outside an ifunc selector)
> >> > won't by default be portable across OSes. If Linux is consistently
> >> > good at defining the hwprobe bits and not breaking our own ABI, other
> >> > OSes could in theory emulate the interface by exposing the same
> >> > keys/values. Though at least from our perspective that's not a goal.
> >> >
> >> > -Evan
> >>
> >> NULL was a mistake; we need to pass a non-NULL value in a1 to signal that
> >> a2 is defined, since the current implementations pass NULL in a1 and
> >> garbage in a2.
> >>
> >> If a dynamic linker does not provide a Linux-compatible riscv_hwprobe but
> >> does support features that are passed in a2..a7, would it be better to pass
> >> (long(*)())(-1) or a stub function that just returns 38 (+ENOSYS)?
> >
> > Oh great point, I hadn't connected those dots. I'm fine with either.
> > -1 would let them distinguish this case a little more explicitly, so
> > maybe that's better?
>
> +1 might be a better choice (match SIG_IGN rather than MAP_FAILED, already
> a non-function in the uABI, and slightly fewer bytes for the branch).
>
> The stub function has a disadvantage of polluting the global ABI with
> Linux-specific error constants.
>
> No strong feelings here.

Sure, the arguments here make sense to me, so +1 seems fine.

>
> > Is there a good spot to document this?
>
> The details of the IRELATIVE resolution process should probably go in
> riscv-elf-psabi-doc/riscv-elf.adoc .
>
> > Should I defend against that -1 value in my static helper function as
> > well by checking for it? It seems like I shouldn't need to since it's
> > in a Linux-specific header, but if there's a scenario for it then I'll
> > add it.
> > -Evan
>
> Do you mean select_memcpy_ifunc in patch 7?  I'm inclined to say that it
> should be robust in case someone copies it into a library other than glibc
> and it can no longer depend on Linux and a specific version of the glibc
> dynamic linker.

Specifically I meant the __riscv_hwprobe_one() static inline helper in
patch 6, which is what's used by patch 7. Yeah, as far as I can tell
the only way to run into trouble is copying that function over to a
substantially different world. Maybe a comment would be sufficient
then?

-Evan