riscv: Resolve symbols directly for symbols with STO_RISCV_VARIANT_CC.

Message ID 1628480827-366232-1-git-send-email-kai.wang@sifive.com
State Changes Requested, archived
Headers
Series riscv: Resolve symbols directly for symbols with STO_RISCV_VARIANT_CC. |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Hsiangkai Wang Aug. 9, 2021, 3:47 a.m. UTC
  In some cases, we do not want to go through the resolver for function
calls. For example, functions with vector arguments will use vector
registers to pass arguments. In the resolver, we do not save/restore the
vector argument registers for lazy binding efficiency. To avoid ruining
the vector arguments, functions with vector arguments will not go
through the resolver.

To achieve the goal, we will annotate the function symbols with
STO_RISCV_VARIANT_CC flag and add DT_RISCV_VARIANT_CC tag in the dynamic
section. In the first pass on PLT relocations, we do not set up to call
_dl_runtime_resolve. Instead, we resolve the functions directly.

The related discussion could be found in
https://github.com/riscv/riscv-elf-psabi-doc/pull/190.
---
 elf/elf.h                    |  7 +++++++
 sysdeps/riscv/dl-dtprocnum.h | 21 +++++++++++++++++++++
 sysdeps/riscv/dl-machine.h   | 26 ++++++++++++++++++++++++++
 3 files changed, 54 insertions(+)
 create mode 100644 sysdeps/riscv/dl-dtprocnum.h
  

Comments

Palmer Dabbelt Aug. 11, 2021, 10:11 p.m. UTC | #1
On Sun, 08 Aug 2021 20:47:07 PDT (-0700), kai.wang@sifive.com wrote:
> In some cases, we do not want to go through the resolver for function
> calls. For example, functions with vector arguments will use vector
> registers to pass arguments. In the resolver, we do not save/restore the
> vector argument registers for lazy binding efficiency. To avoid ruining
> the vector arguments, functions with vector arguments will not go
> through the resolver.
>
> To achieve the goal, we will annotate the function symbols with
> STO_RISCV_VARIANT_CC flag and add DT_RISCV_VARIANT_CC tag in the dynamic
> section. In the first pass on PLT relocations, we do not set up to call
> _dl_runtime_resolve. Instead, we resolve the functions directly.
>
> The related discussion could be found in
> https://github.com/riscv/riscv-elf-psabi-doc/pull/190.

I'm trying to stay away from the foundation stuff these days, but from 
reading that spec it doesn't look like the variant CC has anything to do 
specifically with the V extension but instead is about allowing for 
arbitrary calling conventions for X and F registers as well.  Handling 
non-standard X-register calling conventions is really a whole different 
beast than handling non-standard F or V register calling conventions.  
It's not crazy to allow for these, but lumping them all into a single 
bit just makes this unnecessarily difficult.

In theory we can handle arbitrary F or V register calling conventions 
with the standard resolver, we just need to ensure we deal with 
save/restore of those registers.  IMO the right way to go is to just ban 
F and V register use from the resolver, as there's not much of a reason 
for them to be in there (none for F, and you probably don't want to spin 
up the V registers).  That just requires sorting out the build system, 
and would still allow us to have lazy binding for the variants.  At a 
bare minimum we can support arbitrary V register calling conventions for 
free until we start using the V registers in glibc, which is likely 
quite a way off.

Handling arbitrary X-register calling conventions is a lot tricker: for 
the fully arbitrary case we can't even have PLT entries, for example.  
Is there actually a use case for these?  If so we'll have to support it, 
it just seems odd that one would care enough about the X ABI to want a 
different one while still being able to deal with the overhead of a 
dynamic call.

So I'm not opposed to doing something like this, we just need some way 
to make sure it's actually solving a problem.

> ---
>  elf/elf.h                    |  7 +++++++
>  sysdeps/riscv/dl-dtprocnum.h | 21 +++++++++++++++++++++
>  sysdeps/riscv/dl-machine.h   | 26 ++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+)
>  create mode 100644 sysdeps/riscv/dl-dtprocnum.h
>
> diff --git a/elf/elf.h b/elf/elf.h
> index 4738dfa..0de29bf 100644
> --- a/elf/elf.h
> +++ b/elf/elf.h
> @@ -3889,6 +3889,13 @@ enum
>
>  #define R_TILEGX_NUM		130
>
> +/* RISC-V specific values for the Dyn d_tag field.  */
> +#define DT_RISCV_VARIANT_CC	(DT_LOPROC + 1)
> +#define DT_RISCV_NUM		2
> +
> +/* RISC-V specific values for the st_other field.  */
> +#define STO_RISCV_VARIANT_CC	0x80
> +
>  /* RISC-V ELF Flags */
>  #define EF_RISCV_RVC 			0x0001
>  #define EF_RISCV_FLOAT_ABI 		0x0006
> diff --git a/sysdeps/riscv/dl-dtprocnum.h b/sysdeps/riscv/dl-dtprocnum.h
> new file mode 100644
> index 0000000..f189fd7
> --- /dev/null
> +++ b/sysdeps/riscv/dl-dtprocnum.h
> @@ -0,0 +1,21 @@
> +/* Configuration of lookup functions.  RISC-V version.
> +   Copyright (C) 2019-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Number of extra dynamic section entries for this architecture.  By
> +   default there are none.  */
> +#define DT_THISPROCNUM	DT_RISCV_NUM
> diff --git a/sysdeps/riscv/dl-machine.h b/sysdeps/riscv/dl-machine.h
> index aedf69f..6488483 100644
> --- a/sysdeps/riscv/dl-machine.h
> +++ b/sysdeps/riscv/dl-machine.h
> @@ -54,6 +54,9 @@
>  #define ELF_MACHINE_NO_REL 1
>  #define ELF_MACHINE_NO_RELA 0
>
> +/* Translate a processor specific dynamic tag to the index in l_info array.  */
> +#define DT_RISCV(x) (DT_RISCV_##x - DT_LOPROC + DT_NUM)
> +
>  /* Return nonzero iff ELF header is compatible with the running host.  */
>  static inline int __attribute_used__
>  elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
> @@ -299,6 +302,29 @@ elf_machine_lazy_rel (struct link_map *map, ElfW(Addr) l_addr,
>    /* Check for unexpected PLT reloc type.  */
>    if (__glibc_likely (r_type == R_RISCV_JUMP_SLOT))
>      {
> +      if (__glibc_unlikely (map->l_info[DT_RISCV (VARIANT_CC)] != NULL))
> +	{
> +	  /* Check the symbol table for variant CC symbols.  */
> +	  const Elf_Symndx symndx = ELFW(R_SYM) (reloc->r_info);
> +	  const ElfW(Sym) *symtab =
> +	    (const void *)D_PTR (map, l_info[DT_SYMTAB]);
> +	  const ElfW(Sym) *sym = &symtab[symndx];
> +	  if (__glibc_unlikely (sym->st_other & STO_RISCV_VARIANT_CC))
> +	    {
> +	      /* Avoid lazy resolution of variant CC symbols.  */
> +	      const struct r_found_version *version = NULL;
> +	      if (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
> +		{
> +		  const ElfW(Half) *vernum =
> +		    (const void *)D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
> +		  version = &map->l_versions[vernum[symndx] & 0x7fff];
> +		}
> +	      elf_machine_rela (map, reloc, sym, version, reloc_addr,
> +				skip_ifunc);
> +	      return;
> +	    }
> +	}
> +
>        if (__glibc_unlikely (map->l_mach.plt == 0))
>  	{
>  	  if (l_addr)
  
Jessica Clarke Aug. 11, 2021, 10:51 p.m. UTC | #2
On 11 Aug 2021, at 23:11, Palmer Dabbelt <palmer@dabbelt.com> wrote:
> 
> On Sun, 08 Aug 2021 20:47:07 PDT (-0700), kai.wang@sifive.com wrote:
>> In some cases, we do not want to go through the resolver for function
>> calls. For example, functions with vector arguments will use vector
>> registers to pass arguments. In the resolver, we do not save/restore the
>> vector argument registers for lazy binding efficiency. To avoid ruining
>> the vector arguments, functions with vector arguments will not go
>> through the resolver.
>> 
>> To achieve the goal, we will annotate the function symbols with
>> STO_RISCV_VARIANT_CC flag and add DT_RISCV_VARIANT_CC tag in the dynamic
>> section. In the first pass on PLT relocations, we do not set up to call
>> _dl_runtime_resolve. Instead, we resolve the functions directly.
>> 
>> The related discussion could be found in
>> https://github.com/riscv/riscv-elf-psabi-doc/pull/190.
> 
> I'm trying to stay away from the foundation stuff these days, but from reading that spec it doesn't look like the variant CC has anything to do specifically with the V extension but instead is about allowing for arbitrary calling conventions for X and F registers as well.  Handling non-standard X-register calling conventions is really a whole different beast than handling non-standard F or V register calling conventions.  It's not crazy to allow for these, but lumping them all into a single bit just makes this unnecessarily difficult.

Whilst increased granularity would be nice to have, the reality is that st_other is an 8-bit field, 2 of which are currently allocated by the gABI, with proposals to grow that to 3, so there are only 5 bits to play with, thus anything more than just a single bit for this needs an extremely good justification beyond “it would be less difficult to use”, unfortunately. This bit is sufficient to address the issue with the V calling convention, and is also sufficient for any other non-standard calling convention, whether it involves X registers, F registers or something that hasn’t even been thought up yet.

The second reason why it’s done this way is I have a strong belief that RISC-V should not reinvent the wheel except for problems unique to RISC-V. AArch64 faced exactly the same problem with SVE a few years ago and this is the exact same solution they use and is deployed everywhere, so by following their lead we can be confident it works and reuse existing implementation code for the various tools and runtimes affected.

> In theory we can handle arbitrary F or V register calling conventions with the standard resolver, we just need to ensure we deal with save/restore of those registers.  IMO the right way to go is to just ban F and V register use from the resolver, as there's not much of a reason for them to be in there (none for F, and you probably don't want to spin up the V registers).  That just requires sorting out the build system, and would still allow us to have lazy binding for the variants.  At a bare minimum we can support arbitrary V register calling conventions for free until we start using the V registers in glibc, which is likely quite a way off.

It’s a nice idea in theory, but in practice it doesn’t work, the resolver path is far too complicated for that. Run-time linkers like to call into libc for various things (e.g. on FreeBSD, if you are in a multi-threaded program, all the locking and unlocking guarding the shared data structures ends up calling into libthr via hooks, and in glibc similarly it uses libpthread’s real mutex functions), so you end up needing to make sure those parts of your (extended) libc are also free from vectors and floats (including any auto-vectorisation). But the real kicker is IFUNCs; with lazy binding, the IFUNC resolver isn’t called until you first call the function, and you have no clue what the resolver is going to do other than that it’ll follow the ABI (we should specify that, but IFUNCs as a whole are underspecified in the psABI, we don’t currently document what the arguments are). That means the resolver is free to use F and V registers. Yes, all of these can be individually addressed via various extremely-targeted means, but experience suggests that it will break sooner or later. As a simple example, none of the libc functions called by the run-time linker could ever use memcpy, be it explicit or implicit (compiler-inserted instead of a large inline struct assignment), since that will eventually be vectorised, even if you compiled the directly-called libc functions themselves without V.

> Handling arbitrary X-register calling conventions is a lot tricker: for the fully arbitrary case we can't even have PLT entries, for example.  Is there actually a use case for these?  If so we'll have to support it, it just seems odd that one would care enough about the X ABI to want a different one while still being able to deal with the overhead of a dynamic call.

You’re correct that as it stands PLT entries are technically broken by this. We should fix that so even VARAINT_CC functions can’t use those registers as argument registers (AArch64 has that implicitly as it has IP0 and IP1 as reserved registers for PLTs and veneers, but since we do linker relaxation rather than range-extending with veneers we haven’t yet needed to explicitly reserve the PLT registers).

Jess

> So I'm not opposed to doing something like this, we just need some way to make sure it's actually solving a problem.
> 
>> ---
>> elf/elf.h                    |  7 +++++++
>> sysdeps/riscv/dl-dtprocnum.h | 21 +++++++++++++++++++++
>> sysdeps/riscv/dl-machine.h   | 26 ++++++++++++++++++++++++++
>> 3 files changed, 54 insertions(+)
>> create mode 100644 sysdeps/riscv/dl-dtprocnum.h
>> 
>> diff --git a/elf/elf.h b/elf/elf.h
>> index 4738dfa..0de29bf 100644
>> --- a/elf/elf.h
>> +++ b/elf/elf.h
>> @@ -3889,6 +3889,13 @@ enum
>> 
>> #define R_TILEGX_NUM		130
>> 
>> +/* RISC-V specific values for the Dyn d_tag field.  */
>> +#define DT_RISCV_VARIANT_CC	(DT_LOPROC + 1)
>> +#define DT_RISCV_NUM		2
>> +
>> +/* RISC-V specific values for the st_other field.  */
>> +#define STO_RISCV_VARIANT_CC	0x80
>> +
>> /* RISC-V ELF Flags */
>> #define EF_RISCV_RVC 			0x0001
>> #define EF_RISCV_FLOAT_ABI 		0x0006
>> diff --git a/sysdeps/riscv/dl-dtprocnum.h b/sysdeps/riscv/dl-dtprocnum.h
>> new file mode 100644
>> index 0000000..f189fd7
>> --- /dev/null
>> +++ b/sysdeps/riscv/dl-dtprocnum.h
>> @@ -0,0 +1,21 @@
>> +/* Configuration of lookup functions.  RISC-V version.
>> +   Copyright (C) 2019-2021 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library.  If not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +/* Number of extra dynamic section entries for this architecture.  By
>> +   default there are none.  */
>> +#define DT_THISPROCNUM	DT_RISCV_NUM
>> diff --git a/sysdeps/riscv/dl-machine.h b/sysdeps/riscv/dl-machine.h
>> index aedf69f..6488483 100644
>> --- a/sysdeps/riscv/dl-machine.h
>> +++ b/sysdeps/riscv/dl-machine.h
>> @@ -54,6 +54,9 @@
>> #define ELF_MACHINE_NO_REL 1
>> #define ELF_MACHINE_NO_RELA 0
>> 
>> +/* Translate a processor specific dynamic tag to the index in l_info array.  */
>> +#define DT_RISCV(x) (DT_RISCV_##x - DT_LOPROC + DT_NUM)
>> +
>> /* Return nonzero iff ELF header is compatible with the running host.  */
>> static inline int __attribute_used__
>> elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
>> @@ -299,6 +302,29 @@ elf_machine_lazy_rel (struct link_map *map, ElfW(Addr) l_addr,
>>   /* Check for unexpected PLT reloc type.  */
>>   if (__glibc_likely (r_type == R_RISCV_JUMP_SLOT))
>>     {
>> +      if (__glibc_unlikely (map->l_info[DT_RISCV (VARIANT_CC)] != NULL))
>> +	{
>> +	  /* Check the symbol table for variant CC symbols.  */
>> +	  const Elf_Symndx symndx = ELFW(R_SYM) (reloc->r_info);
>> +	  const ElfW(Sym) *symtab =
>> +	    (const void *)D_PTR (map, l_info[DT_SYMTAB]);
>> +	  const ElfW(Sym) *sym = &symtab[symndx];
>> +	  if (__glibc_unlikely (sym->st_other & STO_RISCV_VARIANT_CC))
>> +	    {
>> +	      /* Avoid lazy resolution of variant CC symbols.  */
>> +	      const struct r_found_version *version = NULL;
>> +	      if (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
>> +		{
>> +		  const ElfW(Half) *vernum =
>> +		    (const void *)D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
>> +		  version = &map->l_versions[vernum[symndx] & 0x7fff];
>> +		}
>> +	      elf_machine_rela (map, reloc, sym, version, reloc_addr,
>> +				skip_ifunc);
>> +	      return;
>> +	    }
>> +	}
>> +
>>       if (__glibc_unlikely (map->l_mach.plt == 0))
>> 	{
>> 	  if (l_addr)
  
Jim Wilson Aug. 11, 2021, 11 p.m. UTC | #3
On Wed, Aug 11, 2021 at 3:13 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:

> I'm trying to stay away from the foundation stuff these days, but from
> reading that spec it doesn't look like the variant CC has anything to do
> specifically with the V extension but instead is about allowing for
> arbitrary calling conventions for X and F registers as well.  Handling
> non-standard X-register calling conventions is really a whole different
> beast than handling non-standard F or V register calling conventions.
> It's not crazy to allow for these, but lumping them all into a single
> bit just makes this unnecessarily difficult.
>

This is the exact same solution that the aarch64 port uses.  We aren't
inventing anything new here.

The intent is not to support arbitrary calling conventions.  The intent is
that we will have two official calling conventions once we add vector
support, one that uses vector registers and one
that doesn't.  Most code will use the non-vector register ABI, and we
expect glibc will be compiled this way for maximum portability.  But we
will need to support the second ABI that does have vector registers, and
hence we only need a single bit for that at this time.

If you aren't willing to participate in the official ABI committee, then
you should at least be willing to accept their decisions.

and would still allow us to have lazy binding for the variants.  At a
> bare minimum we can support arbitrary V register calling conventions for
> free until we start using the V registers in glibc, which is likely
> quite a way off.
>

We already have patches to use V registers in glibc, and we are already
using them.  We already upstreamed the ifunc support for binutils and glibc
required to use vector instructions.  We use them for mem* and str*
routines.  Adopting the position that the resolver isn't allowed to call
any mem* or str* functions is unreasonable.  Hence the need for this patch.

Jim
  
Palmer Dabbelt Aug. 11, 2021, 11:21 p.m. UTC | #4
On Wed, 11 Aug 2021 16:00:31 PDT (-0700), Jim Wilson wrote:
> On Wed, Aug 11, 2021 at 3:13 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
>> I'm trying to stay away from the foundation stuff these days, but from
>> reading that spec it doesn't look like the variant CC has anything to do
>> specifically with the V extension but instead is about allowing for
>> arbitrary calling conventions for X and F registers as well.  Handling
>> non-standard X-register calling conventions is really a whole different
>> beast than handling non-standard F or V register calling conventions.
>> It's not crazy to allow for these, but lumping them all into a single
>> bit just makes this unnecessarily difficult.
>>
>
> This is the exact same solution that the aarch64 port uses.  We aren't
> inventing anything new here.
>
> The intent is not to support arbitrary calling conventions.  The intent is
> that we will have two official calling conventions once we add vector
> support, one that uses vector registers and one
> that doesn't.  Most code will use the non-vector register ABI, and we
> expect glibc will be compiled this way for maximum portability.  But we
> will need to support the second ABI that does have vector registers, and
> hence we only need a single bit for that at this time.
>
> If you aren't willing to participate in the official ABI committee, then
> you should at least be willing to accept their decisions.

All I'm trying to do here is point out that the ABI you've defined has 
some much broader costs to implement than the features you're 
advertising you want to use it for.  If this is the ABI you want that's 
fine, I don't really care, just go make it part of the official psABI 
spec and we'll get on with our lives.

> and would still allow us to have lazy binding for the variants.  At a
>> bare minimum we can support arbitrary V register calling conventions for
>> free until we start using the V registers in glibc, which is likely
>> quite a way off.
>>
>
> We already have patches to use V registers in glibc, and we are already
> using them.  We already upstreamed the ifunc support for binutils and glibc
> required to use vector instructions.  We use them for mem* and str*
> routines.  Adopting the position that the resolver isn't allowed to call
> any mem* or str* functions is unreasonable.  Hence the need for this patch.
>
> Jim
  
Jessica Clarke Aug. 11, 2021, 11:32 p.m. UTC | #5
On 12 Aug 2021, at 00:21, Palmer Dabbelt <palmer@dabbelt.com> wrote:
> On Wed, 11 Aug 2021 16:00:31 PDT (-0700), Jim Wilson wrote:
>> On Wed, Aug 11, 2021 at 3:13 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>> 
>>> I'm trying to stay away from the foundation stuff these days, but from
>>> reading that spec it doesn't look like the variant CC has anything to do
>>> specifically with the V extension but instead is about allowing for
>>> arbitrary calling conventions for X and F registers as well.  Handling
>>> non-standard X-register calling conventions is really a whole different
>>> beast than handling non-standard F or V register calling conventions.
>>> It's not crazy to allow for these, but lumping them all into a single
>>> bit just makes this unnecessarily difficult.
>>> 
>> 
>> This is the exact same solution that the aarch64 port uses.  We aren't
>> inventing anything new here.
>> 
>> The intent is not to support arbitrary calling conventions.  The intent is
>> that we will have two official calling conventions once we add vector
>> support, one that uses vector registers and one
>> that doesn't.  Most code will use the non-vector register ABI, and we
>> expect glibc will be compiled this way for maximum portability.  But we
>> will need to support the second ABI that does have vector registers, and
>> hence we only need a single bit for that at this time.
>> 
>> If you aren't willing to participate in the official ABI committee, then
>> you should at least be willing to accept their decisions.
> 
> All I'm trying to do here is point out that the ABI you've defined has some much broader costs to implement than the features you're advertising you want to use it for.  If this is the ABI you want that's fine, I don't really care, just go make it part of the official psABI spec and we'll get on with our lives.

As I explained in my reply, in practice the solution for the vector calling convention specifically ends up being the solution for the more general problem, and so there are no such broader costs beyond the edge-case you identified that’s just an underspecification issue.

Regarding making it a part of the psABI spec, it has already been approved by all relevant parties, it has just been waiting on a proof-of-concept, or better, patch for either glibc or FreeBSD, which this provides, since for important features like this our policy is to not merge additions to the psABI spec until patches exist to prove its viability.

Jess

>> and would still allow us to have lazy binding for the variants.  At a
>>> bare minimum we can support arbitrary V register calling conventions for
>>> free until we start using the V registers in glibc, which is likely
>>> quite a way off.
>>> 
>> 
>> We already have patches to use V registers in glibc, and we are already
>> using them.  We already upstreamed the ifunc support for binutils and glibc
>> required to use vector instructions.  We use them for mem* and str*
>> routines.  Adopting the position that the resolver isn't allowed to call
>> any mem* or str* functions is unreasonable.  Hence the need for this patch.
>> 
>> Jim
  
Palmer Dabbelt Aug. 11, 2021, 11:52 p.m. UTC | #6
On Wed, 11 Aug 2021 16:32:38 PDT (-0700), jrtc27@jrtc27.com wrote:
> On 12 Aug 2021, at 00:21, Palmer Dabbelt <palmer@dabbelt.com> wrote:
>> On Wed, 11 Aug 2021 16:00:31 PDT (-0700), Jim Wilson wrote:
>>> On Wed, Aug 11, 2021 at 3:13 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>> 
>>>> I'm trying to stay away from the foundation stuff these days, but from
>>>> reading that spec it doesn't look like the variant CC has anything to do
>>>> specifically with the V extension but instead is about allowing for
>>>> arbitrary calling conventions for X and F registers as well.  Handling
>>>> non-standard X-register calling conventions is really a whole different
>>>> beast than handling non-standard F or V register calling conventions.
>>>> It's not crazy to allow for these, but lumping them all into a single
>>>> bit just makes this unnecessarily difficult.
>>>> 
>>> 
>>> This is the exact same solution that the aarch64 port uses.  We aren't
>>> inventing anything new here.
>>> 
>>> The intent is not to support arbitrary calling conventions.  The intent is
>>> that we will have two official calling conventions once we add vector
>>> support, one that uses vector registers and one
>>> that doesn't.  Most code will use the non-vector register ABI, and we
>>> expect glibc will be compiled this way for maximum portability.  But we
>>> will need to support the second ABI that does have vector registers, and
>>> hence we only need a single bit for that at this time.
>>> 
>>> If you aren't willing to participate in the official ABI committee, then
>>> you should at least be willing to accept their decisions.
>> 
>> All I'm trying to do here is point out that the ABI you've defined has some much broader costs to implement than the features you're advertising you want to use it for.  If this is the ABI you want that's fine, I don't really care, just go make it part of the official psABI spec and we'll get on with our lives.
>
> As I explained in my reply, in practice the solution for the vector calling convention specifically ends up being the solution for the more general problem, and so there are no such broader costs beyond the edge-case you identified that’s just an underspecification issue.
>
> Regarding making it a part of the psABI spec, it has already been approved by all relevant parties, it has just been waiting on a proof-of-concept, or better, patch for either glibc or FreeBSD, which this provides, since for important features like this our policy is to not merge additions to the psABI spec until patches exist to prove its viability.

OK, no problem.  Next time just please make it clear when you guys send 
something out for review that all discussion here is irrelevant, because 
it'll save us the time.

IIUC the ABI patch still isn't merged, which is what makes it actually 
official.  I'll take this when it lands.

>
> Jess
>
>>> and would still allow us to have lazy binding for the variants.  At a
>>>> bare minimum we can support arbitrary V register calling conventions for
>>>> free until we start using the V registers in glibc, which is likely
>>>> quite a way off.
>>>> 
>>> 
>>> We already have patches to use V registers in glibc, and we are already
>>> using them.  We already upstreamed the ifunc support for binutils and glibc
>>> required to use vector instructions.  We use them for mem* and str*
>>> routines.  Adopting the position that the resolver isn't allowed to call
>>> any mem* or str* functions is unreasonable.  Hence the need for this patch.
>>> 
>>> Jim
  
Jessica Clarke Aug. 12, 2021, 12:54 a.m. UTC | #7
On 12 Aug 2021, at 00:52, Palmer Dabbelt <palmer@dabbelt.com> wrote:
> On Wed, 11 Aug 2021 16:32:38 PDT (-0700), jrtc27@jrtc27.com wrote:
>> On 12 Aug 2021, at 00:21, Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>> On Wed, 11 Aug 2021 16:00:31 PDT (-0700), Jim Wilson wrote:
>>>> On Wed, Aug 11, 2021 at 3:13 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>>>> I'm trying to stay away from the foundation stuff these days, but from
>>>>> reading that spec it doesn't look like the variant CC has anything to do
>>>>> specifically with the V extension but instead is about allowing for
>>>>> arbitrary calling conventions for X and F registers as well.  Handling
>>>>> non-standard X-register calling conventions is really a whole different
>>>>> beast than handling non-standard F or V register calling conventions.
>>>>> It's not crazy to allow for these, but lumping them all into a single
>>>>> bit just makes this unnecessarily difficult.
>>>> This is the exact same solution that the aarch64 port uses.  We aren't
>>>> inventing anything new here.
>>>> The intent is not to support arbitrary calling conventions.  The intent is
>>>> that we will have two official calling conventions once we add vector
>>>> support, one that uses vector registers and one
>>>> that doesn't.  Most code will use the non-vector register ABI, and we
>>>> expect glibc will be compiled this way for maximum portability.  But we
>>>> will need to support the second ABI that does have vector registers, and
>>>> hence we only need a single bit for that at this time.
>>>> If you aren't willing to participate in the official ABI committee, then
>>>> you should at least be willing to accept their decisions.
>>> All I'm trying to do here is point out that the ABI you've defined has some much broader costs to implement than the features you're advertising you want to use it for.  If this is the ABI you want that's fine, I don't really care, just go make it part of the official psABI spec and we'll get on with our lives.
>> 
>> As I explained in my reply, in practice the solution for the vector calling convention specifically ends up being the solution for the more general problem, and so there are no such broader costs beyond the edge-case you identified that’s just an underspecification issue.
>> 
>> Regarding making it a part of the psABI spec, it has already been approved by all relevant parties, it has just been waiting on a proof-of-concept, or better, patch for either glibc or FreeBSD, which this provides, since for important features like this our policy is to not merge additions to the psABI spec until patches exist to prove its viability.
> 
> OK, no problem.  Next time just please make it clear when you guys send something out for review that all discussion here is irrelevant, because it'll save us the time.

That’s an unfair characterisation and, whilst I appreciate that Jim’s reply was antagonistic in places, please do not take it out on me by using that kind of tone; I’ve tried to address all your comments and concerns with technical justifications. Were there to be actual problems with the implementation we would of course take that feedback; that is one of the reasons why we don’t merge a consequential specification change until people have actually implemented it. In this case AArch64 has already gone this route two years ago and lived to tell the tale without obvious issues so in this specific case there’s unlikely to be a problem that would result in us not adopting the proposal, but it’s always possible we’ve missed something. My point was “this is the specification we (the psABI chairs, plus representatives from both the GNU and LLVM world) will be going with unless anyone has a good reason why it is a bad idea”.

However, there is a small element of truth to your comment. Discussion here isn’t irrelevant, but it’s not the primary place for ABI specification discussions. Because the RISC-V psABI tries to show no preferential treatment to a specific OS or toolchain (or, in the case of Linux, libc implementation) and seek a specification that works for all, the primary place for discussions is not a GNU-specific or Linux-specific mailing list but the psABI repository, mailing list and meetings (for which there publicly-available minutes), and those who want to have more of an input into it should contribute to the discussions there (anyone is free to offer feedback on psABI issues/PRs). When necessary we do try to reach out to the mailing lists for the specific communities involved (e.g. on the compiler front both GCC/binutils and LLVM have their own regular RISC-V sync-up calls where issues can be brought up and discussed with the outcome reported back to the psABI task group), and feedback on proof-of-concept patches that relate to the specification rather than the implementation will be taken into account, but to avoid fragmentation the ultimate sources of rationale and places for discussion are in general those common central locations.

I’m also not sure why you’re acting so surprised about this addition to the psABI; you yourself reviewed the pull request in question back in May and, as far as I can tell, it had already adopted the STO_RISCV_VARIANT_CC nomenclature, and was only using vector registers as an example of a non-standard calling convention, by that point. Since your conditional approval the wording has been overhauled, in part to address your feedback on wording that was the condition for your approval, but nothing that actually changes things as far as implementations are concerned.

And to be clear, I had no part in the authorship or publishing of this patch, nor do I have any kind of affiliation with SiFive. What SiFive, or its employees, do is entirely up to them and completely independent of myself as co-chair of the psABI specification task group. All I ask is that patches exist; whether they’re proposed as RFC/PoC or not is none of my business beyond making sure that implementations don’t officially adopt new ABI features before they are made official (or at least the kind that matter for compatibility) as that causes as much of a headache for me as it does you.

> IIUC the ABI patch still isn't merged, which is what makes it actually official.

Of course. If in light of the technical explanations from Jim and myself you still feel there are reasons why this should not be adopted then please do express them, but otherwise I expect it will be merged shortly.

> I'll take this when it lands.

Sure; I’d do the same in your shoes too.

Jess

>> Jess
>> 
>>>> and would still allow us to have lazy binding for the variants.  At a
>>>>> bare minimum we can support arbitrary V register calling conventions for
>>>>> free until we start using the V registers in glibc, which is likely
>>>>> quite a way off.
>>>> We already have patches to use V registers in glibc, and we are already
>>>> using them.  We already upstreamed the ifunc support for binutils and glibc
>>>> required to use vector instructions.  We use them for mem* and str*
>>>> routines.  Adopting the position that the resolver isn't allowed to call
>>>> any mem* or str* functions is unreasonable.  Hence the need for this patch.
>>>> Jim
  
Jim Wilson Oct. 25, 2021, 9:08 p.m. UTC | #8
On Wed, Aug 11, 2021 at 4:52 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:

> IIUC the ABI patch still isn't merged, which is what makes it actually
> official.  I'll take this when it lands.
>

Just following up, the psABI patch was merged in
    https://github.com/riscv/riscv-elf-psabi-doc/pull/190
and the psABI has gone into public review for the first official release.
So we would like this patch from Kai merged in.
    https://sourceware.org/pipermail/libc-alpha/2021-August/129931.html
in case you lost track of the original message on this thread.

Jim
  
DJ Delorie Oct. 29, 2021, 12:20 a.m. UTC | #9
This patch looks good to me and follows the spec as committed.  I think
it's good to have a "we can't trust the ABI" bit that bypasses all the
problems with lazy relocation and ABI registers.

However, it no longer compiles due to 490e6c62aa31a8aa5c4a059f6e646ede121edf0a
https://sourceware.org/pipermail/libc-alpha/2021-October/131680.html
https://patchwork.sourceware.org/project/glibc/patch/20211006050935.1311740-1-maskray@google.com/

Speficially, this function changed parameters:

-elf_machine_rela (struct link_map *map, const ElfW(Rela) *reloc,
-		  const ElfW(Sym) *sym, const struct r_found_version *version,
+elf_machine_rela (struct link_map *map, struct r_scope_elem *scope[],
+		  const ElfW(Rela) *reloc, const ElfW(Sym) *sym,
+		  const struct r_found_version *version,
 		  void *const reloc_addr, int skip_ifunc)

Hsiangkai Wang <kai.wang@sifive.com> writes:
> +/* RISC-V specific values for the Dyn d_tag field.  */
> +#define DT_RISCV_VARIANT_CC	(DT_LOPROC + 1)
> +#define DT_RISCV_NUM		2

This matches the spec, ok.  I note that (DT_LOPROC) is left unused, so
while there is only one DT_* tag, the "number" of used tags is 2, with
the first tag left undefined.

> +/* RISC-V specific values for the st_other field.  */
> +#define STO_RISCV_VARIANT_CC	0x80

This matches the spec, ok.

> diff --git a/sysdeps/riscv/dl-dtprocnum.h b/sysdeps/riscv/dl-dtprocnum.h
> new file mode 100644
> index 0000000..f189fd7
> --- /dev/null
> +++ b/sysdeps/riscv/dl-dtprocnum.h
> @@ -0,0 +1,21 @@
> +/* Configuration of lookup functions.  RISC-V version.
> +   Copyright (C) 2019-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Number of extra dynamic section entries for this architecture.  By
> +   default there are none.  */
> +#define DT_THISPROCNUM	DT_RISCV_NUM

Ok.

> diff --git a/sysdeps/riscv/dl-machine.h b/sysdeps/riscv/dl-machine.h

> +/* Translate a processor specific dynamic tag to the index in l_info array.  */
> +#define DT_RISCV(x) (DT_RISCV_##x - DT_LOPROC + DT_NUM)

Ok.

> @@ -299,6 +302,29 @@ elf_machine_lazy_rel (struct link_map *map, ElfW(Addr) l_addr,
>    /* Check for unexpected PLT reloc type.  */
>    if (__glibc_likely (r_type == R_RISCV_JUMP_SLOT))
>      {
> +      if (__glibc_unlikely (map->l_info[DT_RISCV (VARIANT_CC)] != NULL))
> +	{

If the object MAY have CC's...

> +	  /* Check the symbol table for variant CC symbols.  */
> +	  const Elf_Symndx symndx = ELFW(R_SYM) (reloc->r_info);
> +	  const ElfW(Sym) *symtab =
> +	    (const void *)D_PTR (map, l_info[DT_SYMTAB]);
> +	  const ElfW(Sym) *sym = &symtab[symndx];

Find the symbol we're relocating against; we already know reloc is not NULL.

> +	  if (__glibc_unlikely (sym->st_other & STO_RISCV_VARIANT_CC))
> +	    {

and if it IS a CC type..

> +	      /* Avoid lazy resolution of variant CC symbols.  */
> +	      const struct r_found_version *version = NULL;
> +	      if (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
> +		{
> +		  const ElfW(Half) *vernum =
> +		    (const void *)D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
> +		  version = &map->l_versions[vernum[symndx] & 0x7fff];
> +		}
> +	      elf_machine_rela (map, reloc, sym, version, reloc_addr,
> +				skip_ifunc);

THEN we relocate it now

> +	      return;
> +	    }
> +	}
> +

Else we continue to keep it lazy.

(conveniently, this code is the same as aarch64's)

>        if (__glibc_unlikely (map->l_mach.plt == 0))
>  	{
>  	  if (l_addr)
  
Palmer Dabbelt Dec. 2, 2021, 8:21 p.m. UTC | #10
[Changing to Jim's new address]

On Mon, 25 Oct 2021 14:08:26 PDT (-0700), jimw@sifive.com wrote:
> On Wed, Aug 11, 2021 at 4:52 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
>> IIUC the ABI patch still isn't merged, which is what makes it actually
>> official.  I'll take this when it lands.
>>
>
> Just following up, the psABI patch was merged in
>     https://github.com/riscv/riscv-elf-psabi-doc/pull/190
> and the psABI has gone into public review for the first official release.
> So we would like this patch from Kai merged in.
>     https://sourceware.org/pipermail/libc-alpha/2021-August/129931.html
> in case you lost track of the original message on this thread.

Sorry this took a while, there's a lot of context here and I figured 
it'd be good to run through that as the issue in play has a pretty 
roundabout effect on this patch.  This is a little long, but I promise 
it gets there at the end:

We had a long-term outstanding issue on the psABI spec
<https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/66>, which
essentially boils down to glibc relying on lazily bound functions to follow the
standard calling convention.  The actual issue isn't specific to vector
registers, we're relying on this for X and F registers too, but AFAIK there's
no users of non-standard X/F register calling conventions for lazily bound
functions so we haven't had any user-visible bugs yet.  This all becomes a lot
more important with the recent addition of V registers, which are both more
expensive to save and all clobbered by the standard ABI.

In order to address the desire for non-standard calling conventions 
driven by the standard V register map, the following text was added to 
the psABI spec:

    Run-time linkers that use lazy binding must preserve all argument 
    registers used in the standard calling convention for the ABI in 
    use. Any functions that use additional argument registers must be 
    annotated with STO_RISCV_VARIANT_CC, as defined in Section 8.4.

As a result of adding that language the bug in question about the lazily 
bound functions was resolved.  We're relying on lazily bound functions 
following a lot more of the standard calling convention than that in 
glibc (sp, and ra for example; along with all the temporaries).  The 
text in question doesn't directly address any of those assumptions, and 
given that there's historically been user-visible breakages around these 
ambiguities I think it's prudent to be more explicit about what the 
glibc ABI is here.

As far as I can see, there are really three ways to go about this:

* We can just define the defacto divergence that glibc has always had 
  from the psABI in this area.  That's the smallest change, as IMO 
  that's as simple as

    diff --git a/manual/platform.texi b/manual/platform.texi
    index d5fdc5bd05..9153408966 100644
    --- a/manual/platform.texi
    +++ b/manual/platform.texi
    @@ -121,6 +121,9 @@ when it is not allowed, the priority is set to medium.
     @node RISC-V
     @appendixsec RISC-V-specific Facilities
    
    +Functions that are lazily bound must be compatible with the standard calling
    +convention.
    +
     Cache management facilities specific to RISC-V systems that implement the Linux
     ABI are declared in @file{sys/cachectl.h}.

  to start, adding something like "unless those functions are 
  annotated with STO_RISCV_VARIANT_CC" after merging something very 
  similar to the patch in question (I haven't looked closely at the 
  code, but it seems pretty-straight-forward).
* We can do our best to allow for the additional functions this new text 
  implicitly allows.  I don't see anything we can do about RA and SP, 
  but we could save the registers we clobber in _dl_runtime_resolve 
  (including the F registers on non-F libraries running on F systems, 
  which will be a headache).  We'd still need some additional 
  glibc-specific ABI for the bits of the standard calling convention we 
  can't otherwise handle, but IMO those are pedantic enough that we 
  could just disallow them entirely.
* We can just stop doing any sort of lazy binding.  That would allow us to
  any relying on any unspecified behavior.

The first option is preferable from my end, as the second option would 
require a lot of complexity (both specification and implementation) as 
well as come with a likely performance hit.
  
Jessica Clarke Dec. 2, 2021, 9:06 p.m. UTC | #11
On 2 Dec 2021, at 20:21, Palmer Dabbelt <palmer@dabbelt.com> wrote:
> 
> [Changing to Jim's new address]
> 
> On Mon, 25 Oct 2021 14:08:26 PDT (-0700), jimw@sifive.com wrote:
>> On Wed, Aug 11, 2021 at 4:52 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>> 
>>> IIUC the ABI patch still isn't merged, which is what makes it actually
>>> official.  I'll take this when it lands.
>>> 
>> 
>> Just following up, the psABI patch was merged in
>>    https://github.com/riscv/riscv-elf-psabi-doc/pull/190
>> and the psABI has gone into public review for the first official release.
>> So we would like this patch from Kai merged in.
>>    https://sourceware.org/pipermail/libc-alpha/2021-August/129931.html
>> in case you lost track of the original message on this thread.
> 
> Sorry this took a while, there's a lot of context here and I figured 
> it'd be good to run through that as the issue in play has a pretty 
> roundabout effect on this patch.  This is a little long, but I promise 
> it gets there at the end:
> 
> We had a long-term outstanding issue on the psABI spec
> <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/66>, which
> essentially boils down to glibc relying on lazily bound functions to follow the
> standard calling convention.  The actual issue isn't specific to vector
> registers, we're relying on this for X and F registers too, but AFAIK there's
> no users of non-standard X/F register calling conventions for lazily bound
> functions so we haven't had any user-visible bugs yet.  This all becomes a lot
> more important with the recent addition of V registers, which are both more
> expensive to save and all clobbered by the standard ABI.
> 
> In order to address the desire for non-standard calling conventions 
> driven by the standard V register map, the following text was added to 
> the psABI spec:
> 
>    Run-time linkers that use lazy binding must preserve all argument 
>    registers used in the standard calling convention for the ABI in 
>    use. Any functions that use additional argument registers must be 
>    annotated with STO_RISCV_VARIANT_CC, as defined in Section 8.4.
> 
> As a result of adding that language the bug in question about the lazily 
> bound functions was resolved.  We're relying on lazily bound functions 
> following a lot more of the standard calling convention than that in 
> glibc (sp, and ra for example; along with all the temporaries).  The 
> text in question doesn't directly address any of those assumptions, and 
> given that there's historically been user-visible breakages around these 
> ambiguities I think it's prudent to be more explicit about what the 
> glibc ABI is here.
> 
> As far as I can see, there are really three ways to go about this:
> 
> * We can just define the defacto divergence that glibc has always had 
>  from the psABI in this area.  That's the smallest change, as IMO 
>  that's as simple as
> 
>    diff --git a/manual/platform.texi b/manual/platform.texi
>    index d5fdc5bd05..9153408966 100644
>    --- a/manual/platform.texi
>    +++ b/manual/platform.texi
>    @@ -121,6 +121,9 @@ when it is not allowed, the priority is set to medium.
>     @node RISC-V
>     @appendixsec RISC-V-specific Facilities
> 
>    +Functions that are lazily bound must be compatible with the standard calling
>    +convention.
>    +
>     Cache management facilities specific to RISC-V systems that implement the Linux
>     ABI are declared in @file{sys/cachectl.h}.
> 
>  to start, adding something like "unless those functions are 
>  annotated with STO_RISCV_VARIANT_CC" after merging something very 
>  similar to the patch in question (I haven't looked closely at the 
>  code, but it seems pretty-straight-forward).
> * We can do our best to allow for the additional functions this new text 
>  implicitly allows.  I don't see anything we can do about RA and SP, 
>  but we could save the registers we clobber in _dl_runtime_resolve 
>  (including the F registers on non-F libraries running on F systems, 
>  which will be a headache).  We'd still need some additional 
>  glibc-specific ABI for the bits of the standard calling convention we 
>  can't otherwise handle, but IMO those are pedantic enough that we 
>  could just disallow them entirely.
> * We can just stop doing any sort of lazy binding.  That would allow us to
>  any relying on any unspecified behavior.
> 
> The first option is preferable from my end, as the second option would 
> require a lot of complexity (both specification and implementation) as 
> well as come with a likely performance hit.

Thanks for the detailed review of this corner of the spec, it’s
especially valuable at the moment whilst we’re trying to reach a
ratified version (though all the confusing/inconsistent/ill-defined
terminology is even less appropriate here, at the end of the day the
ABI is what the toolchains and runtimes say it is so it’s really just
about getting the necessary rubber stamps to get the next label and
reviewing the document style and exact language used...).

The intent of the spec is not to make repurposing just ra/sp/gp/tp as
anything other than their ABI-defined meanings (per table 1 of riscv-cc
and, in the case of the stack pointer, the alignment required by the
integer calling convention) legal unless you use STO_RISCV_VARIANT_CC,
so the glibc requirement, which is also true of FreeBSD for exactly the
same reasons, is intended to be what is specified. Upon re-reading what
was written I can see that requirement was lost or forgotten, so I’ll
look at tightening it up, probably changing

  Any functions that use additional argument registers ...

to

  Any functions that use registers in a way that is incompatible with
  the register convention of the ABI in use ...

I also note we currently only talk about the run-time linker preserving
argument registers, and say nothing about preserved registers, nor the
return address. I’m not sure quite why we do the former (maybe I’ve
just forgotten a past conversation), I feel like that’s implied by lazy
binding being an implementation optimisation that must not break the
calling convention, but if we want to keep that language then it should
probably be changed to:

  Run-time linkers that use lazy binding must only clobber registers
  defined as temporary registers for the ABI in use.

Does that all sound sensible, and sufficient, to you?

Jess
  
Andrew Waterman Dec. 2, 2021, 9:09 p.m. UTC | #12
On Thu, Dec 2, 2021 at 1:06 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>
> On 2 Dec 2021, at 20:21, Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >
> > [Changing to Jim's new address]
> >
> > On Mon, 25 Oct 2021 14:08:26 PDT (-0700), jimw@sifive.com wrote:
> >> On Wed, Aug 11, 2021 at 4:52 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >>
> >>> IIUC the ABI patch still isn't merged, which is what makes it actually
> >>> official.  I'll take this when it lands.
> >>>
> >>
> >> Just following up, the psABI patch was merged in
> >>    https://github.com/riscv/riscv-elf-psabi-doc/pull/190
> >> and the psABI has gone into public review for the first official release.
> >> So we would like this patch from Kai merged in.
> >>    https://sourceware.org/pipermail/libc-alpha/2021-August/129931.html
> >> in case you lost track of the original message on this thread.
> >
> > Sorry this took a while, there's a lot of context here and I figured
> > it'd be good to run through that as the issue in play has a pretty
> > roundabout effect on this patch.  This is a little long, but I promise
> > it gets there at the end:
> >
> > We had a long-term outstanding issue on the psABI spec
> > <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/66>, which
> > essentially boils down to glibc relying on lazily bound functions to follow the
> > standard calling convention.  The actual issue isn't specific to vector
> > registers, we're relying on this for X and F registers too, but AFAIK there's
> > no users of non-standard X/F register calling conventions for lazily bound
> > functions so we haven't had any user-visible bugs yet.  This all becomes a lot
> > more important with the recent addition of V registers, which are both more
> > expensive to save and all clobbered by the standard ABI.
> >
> > In order to address the desire for non-standard calling conventions
> > driven by the standard V register map, the following text was added to
> > the psABI spec:
> >
> >    Run-time linkers that use lazy binding must preserve all argument
> >    registers used in the standard calling convention for the ABI in
> >    use. Any functions that use additional argument registers must be
> >    annotated with STO_RISCV_VARIANT_CC, as defined in Section 8.4.
> >
> > As a result of adding that language the bug in question about the lazily
> > bound functions was resolved.  We're relying on lazily bound functions
> > following a lot more of the standard calling convention than that in
> > glibc (sp, and ra for example; along with all the temporaries).  The
> > text in question doesn't directly address any of those assumptions, and
> > given that there's historically been user-visible breakages around these
> > ambiguities I think it's prudent to be more explicit about what the
> > glibc ABI is here.
> >
> > As far as I can see, there are really three ways to go about this:
> >
> > * We can just define the defacto divergence that glibc has always had
> >  from the psABI in this area.  That's the smallest change, as IMO
> >  that's as simple as
> >
> >    diff --git a/manual/platform.texi b/manual/platform.texi
> >    index d5fdc5bd05..9153408966 100644
> >    --- a/manual/platform.texi
> >    +++ b/manual/platform.texi
> >    @@ -121,6 +121,9 @@ when it is not allowed, the priority is set to medium.
> >     @node RISC-V
> >     @appendixsec RISC-V-specific Facilities
> >
> >    +Functions that are lazily bound must be compatible with the standard calling
> >    +convention.
> >    +
> >     Cache management facilities specific to RISC-V systems that implement the Linux
> >     ABI are declared in @file{sys/cachectl.h}.
> >
> >  to start, adding something like "unless those functions are
> >  annotated with STO_RISCV_VARIANT_CC" after merging something very
> >  similar to the patch in question (I haven't looked closely at the
> >  code, but it seems pretty-straight-forward).
> > * We can do our best to allow for the additional functions this new text
> >  implicitly allows.  I don't see anything we can do about RA and SP,
> >  but we could save the registers we clobber in _dl_runtime_resolve
> >  (including the F registers on non-F libraries running on F systems,
> >  which will be a headache).  We'd still need some additional
> >  glibc-specific ABI for the bits of the standard calling convention we
> >  can't otherwise handle, but IMO those are pedantic enough that we
> >  could just disallow them entirely.
> > * We can just stop doing any sort of lazy binding.  That would allow us to
> >  any relying on any unspecified behavior.
> >
> > The first option is preferable from my end, as the second option would
> > require a lot of complexity (both specification and implementation) as
> > well as come with a likely performance hit.
>
> Thanks for the detailed review of this corner of the spec, it’s
> especially valuable at the moment whilst we’re trying to reach a
> ratified version (though all the confusing/inconsistent/ill-defined
> terminology is even less appropriate here, at the end of the day the
> ABI is what the toolchains and runtimes say it is so it’s really just
> about getting the necessary rubber stamps to get the next label and
> reviewing the document style and exact language used...).
>
> The intent of the spec is not to make repurposing just ra/sp/gp/tp as
> anything other than their ABI-defined meanings (per table 1 of riscv-cc
> and, in the case of the stack pointer, the alignment required by the
> integer calling convention) legal unless you use STO_RISCV_VARIANT_CC,
> so the glibc requirement, which is also true of FreeBSD for exactly the
> same reasons, is intended to be what is specified. Upon re-reading what
> was written I can see that requirement was lost or forgotten, so I’ll
> look at tightening it up, probably changing
>
>   Any functions that use additional argument registers ...
>
> to
>
>   Any functions that use registers in a way that is incompatible with
>   the register convention of the ABI in use ...
>
> I also note we currently only talk about the run-time linker preserving
> argument registers, and say nothing about preserved registers, nor the
> return address. I’m not sure quite why we do the former (maybe I’ve
> just forgotten a past conversation), I feel like that’s implied by lazy
> binding being an implementation optimisation that must not break the
> calling convention, but if we want to keep that language then it should
> probably be changed to:
>
>   Run-time linkers that use lazy binding must only clobber registers
>   defined as temporary registers for the ABI in use.
>
> Does that all sound sensible, and sufficient, to you?

I think so.

>
> Jess
>
  
Florian Weimer Dec. 2, 2021, 9:44 p.m. UTC | #13
* Jessica Clarke:

> The intent of the spec is not to make repurposing just ra/sp/gp/tp as
> anything other than their ABI-defined meanings (per table 1 of riscv-cc
> and, in the case of the stack pointer, the alignment required by the
> integer calling convention) legal unless you use STO_RISCV_VARIANT_CC,
> so the glibc requirement, which is also true of FreeBSD for exactly the
> same reasons, is intended to be what is specified. Upon re-reading what
> was written I can see that requirement was lost or forgotten, so I’ll
> look at tightening it up, probably changing
>
>   Any functions that use additional argument registers ...
>
> to
>
>   Any functions that use registers in a way that is incompatible with
>   the register convention of the ABI in use ...
>
> I also note we currently only talk about the run-time linker preserving
> argument registers, and say nothing about preserved registers, nor the
> return address. I’m not sure quite why we do the former (maybe I’ve
> just forgotten a past conversation), I feel like that’s implied by lazy
> binding being an implementation optimisation that must not break the
> calling convention, but if we want to keep that language then it should
> probably be changed to:
>
>   Run-time linkers that use lazy binding must only clobber registers
>   defined as temporary registers for the ABI in use.
>
> Does that all sound sensible, and sufficient, to you?

One further aspect is that glibc has dynamic linker plugins called
auditors, and arbitrary code can run *after* a function has returned.
For that to work, we need save registers used for return values as well.
(The default dynamic linker trampoline sidesteps this because it makes a
tail call to the actual implementation, based on what the core symbol
binding routine in the dynamic linker has returned.)  This is a fairly
obscure feature, and I don't think anyone has a really good solution for
their architecture today.  There might also be better alternatives
(e.g., if you know the signature of the function you are intercepting,
you don't need a general solution for arbitrary calling conventions
anymore).

Do temporary registers overlap with return registers?

Thanks,
Florian
  
Jessica Clarke Dec. 2, 2021, 10:08 p.m. UTC | #14
On 2 Dec 2021, at 21:44, Florian Weimer <fweimer@redhat.com> wrote:
> 
> * Jessica Clarke:
> 
>> The intent of the spec is not to make repurposing just ra/sp/gp/tp as
>> anything other than their ABI-defined meanings (per table 1 of riscv-cc
>> and, in the case of the stack pointer, the alignment required by the
>> integer calling convention) legal unless you use STO_RISCV_VARIANT_CC,
>> so the glibc requirement, which is also true of FreeBSD for exactly the
>> same reasons, is intended to be what is specified. Upon re-reading what
>> was written I can see that requirement was lost or forgotten, so I’ll
>> look at tightening it up, probably changing
>> 
>>  Any functions that use additional argument registers ...
>> 
>> to
>> 
>>  Any functions that use registers in a way that is incompatible with
>>  the register convention of the ABI in use ...
>> 
>> I also note we currently only talk about the run-time linker preserving
>> argument registers, and say nothing about preserved registers, nor the
>> return address. I’m not sure quite why we do the former (maybe I’ve
>> just forgotten a past conversation), I feel like that’s implied by lazy
>> binding being an implementation optimisation that must not break the
>> calling convention, but if we want to keep that language then it should
>> probably be changed to:
>> 
>>  Run-time linkers that use lazy binding must only clobber registers
>>  defined as temporary registers for the ABI in use.
>> 
>> Does that all sound sensible, and sufficient, to you?
> 
> One further aspect is that glibc has dynamic linker plugins called
> auditors, and arbitrary code can run *after* a function has returned.
> For that to work, we need save registers used for return values as well.
> (The default dynamic linker trampoline sidesteps this because it makes a
> tail call to the actual implementation, based on what the core symbol
> binding routine in the dynamic linker has returned.)  This is a fairly
> obscure feature, and I don't think anyone has a really good solution for
> their architecture today.  There might also be better alternatives
> (e.g., if you know the signature of the function you are intercepting,
> you don't need a general solution for arbitrary calling conventions
> anymore).
> 
> Do temporary registers overlap with return registers?

The audit support in glibc is a good point, and something that hadn’t
even crossed my mind (though probably should have; I have used it).

Short answer is it’s the like AArch64 (but no IP0/1 and veneer
weirdness equivalents). Long answer is that the register convention
currently defines registers as preserved across calls, not preserved
across calls, and unallocatable (special case for gp and tp, as “don’t
clobber unless you’re the runtime or otherwise really know what you’re
doing”). Those registers not preserved across calls are further
subdivided into argument registers, used for both arguments and return
values, and temporary registers, which have no other defined role in
the calling convention.

Dealing with glibc’s auditing in the implementation-agnostic ABI starts
to become a bit awkward; in particular, interposing on the return
requires clobbering the return address on the call path, which is not a
temporary register, and so would not be legal per my suggested text,
but you don’t want to give free reign for people to clobber it because
clearly that’s totally broken in almost other cases.

Perhaps that’s an argument for dropping the first sentence entirely,
and just having the “Any functions ...” sentence, possibly with a bit
of extra non-normative text if it then looks lacking in rationale. Do
you know if any other psABIs specify the requirements on run-time
linkers like we’re currently trying (and failing) to do?

Jess
  

Patch

diff --git a/elf/elf.h b/elf/elf.h
index 4738dfa..0de29bf 100644
--- a/elf/elf.h
+++ b/elf/elf.h
@@ -3889,6 +3889,13 @@  enum
 
 #define R_TILEGX_NUM		130
 
+/* RISC-V specific values for the Dyn d_tag field.  */
+#define DT_RISCV_VARIANT_CC	(DT_LOPROC + 1)
+#define DT_RISCV_NUM		2
+
+/* RISC-V specific values for the st_other field.  */
+#define STO_RISCV_VARIANT_CC	0x80
+
 /* RISC-V ELF Flags */
 #define EF_RISCV_RVC 			0x0001
 #define EF_RISCV_FLOAT_ABI 		0x0006
diff --git a/sysdeps/riscv/dl-dtprocnum.h b/sysdeps/riscv/dl-dtprocnum.h
new file mode 100644
index 0000000..f189fd7
--- /dev/null
+++ b/sysdeps/riscv/dl-dtprocnum.h
@@ -0,0 +1,21 @@ 
+/* Configuration of lookup functions.  RISC-V version.
+   Copyright (C) 2019-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Number of extra dynamic section entries for this architecture.  By
+   default there are none.  */
+#define DT_THISPROCNUM	DT_RISCV_NUM
diff --git a/sysdeps/riscv/dl-machine.h b/sysdeps/riscv/dl-machine.h
index aedf69f..6488483 100644
--- a/sysdeps/riscv/dl-machine.h
+++ b/sysdeps/riscv/dl-machine.h
@@ -54,6 +54,9 @@ 
 #define ELF_MACHINE_NO_REL 1
 #define ELF_MACHINE_NO_RELA 0
 
+/* Translate a processor specific dynamic tag to the index in l_info array.  */
+#define DT_RISCV(x) (DT_RISCV_##x - DT_LOPROC + DT_NUM)
+
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute_used__
 elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
@@ -299,6 +302,29 @@  elf_machine_lazy_rel (struct link_map *map, ElfW(Addr) l_addr,
   /* Check for unexpected PLT reloc type.  */
   if (__glibc_likely (r_type == R_RISCV_JUMP_SLOT))
     {
+      if (__glibc_unlikely (map->l_info[DT_RISCV (VARIANT_CC)] != NULL))
+	{
+	  /* Check the symbol table for variant CC symbols.  */
+	  const Elf_Symndx symndx = ELFW(R_SYM) (reloc->r_info);
+	  const ElfW(Sym) *symtab =
+	    (const void *)D_PTR (map, l_info[DT_SYMTAB]);
+	  const ElfW(Sym) *sym = &symtab[symndx];
+	  if (__glibc_unlikely (sym->st_other & STO_RISCV_VARIANT_CC))
+	    {
+	      /* Avoid lazy resolution of variant CC symbols.  */
+	      const struct r_found_version *version = NULL;
+	      if (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
+		{
+		  const ElfW(Half) *vernum =
+		    (const void *)D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
+		  version = &map->l_versions[vernum[symndx] & 0x7fff];
+		}
+	      elf_machine_rela (map, reloc, sym, version, reloc_addr,
+				skip_ifunc);
+	      return;
+	    }
+	}
+
       if (__glibc_unlikely (map->l_mach.plt == 0))
 	{
 	  if (l_addr)