[v2,00/15] RISC-V: Add Zbb-optimized string routines as ifuncs

Message ID 20240527111900.1060546-1-christoph.muellner@vrull.eu (mailing list archive)

Message

Christoph Müllner May 27, 2024, 11:18 a.m. UTC
Glibc recently got hwprobe() support for RISC-V, which allows querying
available extensions at runtime.  On top of that, an optimized memcpy()
routine (for fast unaligned accesses) has been merged, which is built by
recompiling the generic C code with a different compiler flag.  An ifunc
resolver then detects which routine should be run using hwprobe().
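
To illustrate the selection step, here is a minimal, portable sketch.  It is
not the glibc resolver: the feature bit SKETCH_EXT_ZBB, the two strlen
variants and select_strlen() are hypothetical names, and the extension bits
are passed in as a plain argument instead of being obtained from hwprobe().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bit; stands in for what hwprobe() would report.  */
#define SKETCH_EXT_ZBB (UINT64_C (1) << 4)

typedef size_t (*strlen_fn) (const char *);

/* Baseline variant: plain byte loop.  */
size_t
sketch_strlen_generic (const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

/* Stand-in for the Zbb build.  In the series this would be the same
   generic C source recompiled for Zbb; here it just defers to libc.  */
size_t
sketch_strlen_zbb (const char *s)
{
  return strlen (s);
}

/* Resolver stand-in: map probed extension bits to an implementation.  */
strlen_fn
select_strlen (uint64_t ext_bits)
{
  if (ext_bits & SKETCH_EXT_ZBB)
    return sketch_strlen_zbb;
  return sketch_strlen_generic;
}
```

A real ifunc resolver runs once at symbol binding time, so the check is not
repeated on every call.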

This patchset follows this idea and recompiles the following functions
for Zbb (via function attributes) and enables the existing Zbb/orc.b
optimization in riscv/string-fza.h:
memchr, memrchr, strchrnul, strcmp, strlen, strncmp.
The resulting optimized routines are then selected by the resolver function
if the Zbb extension is present at runtime.
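
For readers unfamiliar with orc.b: the instruction turns every nonzero byte
of a register into 0xff and leaves zero bytes at 0x00, which makes it cheap
to locate a NUL byte in a whole word.  The sketch below emulates this in
portable C for a 64-bit word.  The function names are hypothetical, the byte
loop stands in for the single instruction, and little-endian byte order is
assumed; it is not the code in riscv/string-fza.h.

```c
#include <stdint.h>

/* Portable emulation of orc.b on a 64-bit word: every nonzero byte
   becomes 0xff, every zero byte stays 0x00.  */
uint64_t
orc_b_emulated (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= UINT64_C (0xff) << (8 * i);
  return r;
}

/* Mask with 0xff in every byte position where X holds a zero byte.  */
uint64_t
zero_byte_mask (uint64_t x)
{
  return ~orc_b_emulated (x);
}

/* Index of the first zero byte in memory order (little-endian assumed),
   or 8 if the word contains no zero byte.  */
unsigned int
first_zero_byte (uint64_t x)
{
  uint64_t m = zero_byte_mask (x);
  unsigned int i = 0;
  while (i < 8 && ((m >> (8 * i)) & 0xff) == 0)
    i++;
  return i;
}
```

With the real instruction, orc_b_emulated() collapses to a single orc.b and
the index search to a count-trailing-zeros operation.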

To use target function attributes, a few issues had to be resolved:
- The functions above got a mechanism to be compiled with function attributes
  (patches 2-7).  Only the routines required for this patchset have been
  touched.
- Inlined functions must get the same function attributes (first patch).
- A mechanism was added to explicitly enable the orc.b optimization for
  string functions (patch 8), loosely inspired by USE_FFS_BUILTIN.
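
The attribute mechanism described in the list above can be pictured roughly
as follows.  This is only a sketch under assumptions: SKETCH_FUNC_ATTR,
sketch_is_nul() and sketch_strlen() are hypothetical names, not the macros
used in the patches.  The hook is left empty so the code builds on any
target; a Zbb variant file would presumably define it to something like
__attribute__ ((target ("arch=+zbb"))) before including the generic source.

```c
#include <stddef.h>

/* Hypothetical hook.  A variant build defines it before the generic
   code is compiled; the default adds no attribute at all.  */
#ifndef SKETCH_FUNC_ATTR
# define SKETCH_FUNC_ATTR
#endif

/* Helper that is always inlined into its caller.  It carries the same
   hook so that caller and callee are compiled for the same target.  */
static inline SKETCH_FUNC_ATTR __attribute__ ((always_inline)) int
sketch_is_nul (char c)
{
  return c == '\0';
}

/* Generic routine; the hook decides which target it is built for.  */
SKETCH_FUNC_ATTR size_t
sketch_strlen (const char *s)
{
  size_t n = 0;
  while (!sketch_is_nul (s[n]))
    n++;
  return n;
}
```

The point of the design is that the generic C source stays unchanged and
each variant only differs in the attribute it injects.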

One design question is whether Zbb represents a broad enough optimization
target.  Tests with Zb* extensions showed that no further code improvements
can be achieved with them.  Also, most other extensions likely won't affect
the generated code for string routines (ignoring vector instructions, which
are a different topic).  Therefore, Zbb seemed like a sufficient target.

This series was tested by writing a simple test program to invoke the
libc routines (e.g. strcmp) and a modified QEMU that reports the
emulation of orc.b on stderr.  With that, QEMU can be used to check
whether the optimized routines are executed (-cpu "rv64,zbb=[false,true]").
Further, this series was tested with SPEC CPU 2017 intrate with Zbb
enabled.  The function attribute detection mechanism was tested with
GCC 13 and GCC 14.
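
A test program of the kind mentioned above might look like the following
sketch.  It is not the program used for the series; the function name and
the test strings are made up for illustration.  Calls go through volatile
function pointers so that the compiler cannot fold them into builtins and
the libc entry points are actually reached.  memrchr and strchrnul are left
out because they are GNU extensions.

```c
#include <stddef.h>
#include <string.h>

/* Volatile pointers force real calls into libc instead of builtins.  */
static size_t (*volatile p_strlen) (const char *) = strlen;
static int (*volatile p_strcmp) (const char *, const char *) = strcmp;
static int (*volatile p_strncmp) (const char *, const char *, size_t)
  = strncmp;
static void *(*volatile p_memchr) (const void *, int, size_t) = memchr;

/* Returns the number of failed checks (0 means all passed).  */
int
exercise_string_routines (void)
{
  static const char text[] = "RISC-V Zbb string routines";
  int failures = 0;

  failures += p_strlen (text) != sizeof text - 1;
  failures += p_strcmp (text, text) != 0;
  failures += p_strcmp ("abc", "abd") >= 0;
  failures += p_strncmp ("abcX", "abcY", 3) != 0;
  failures += (const char *) p_memchr (text, 'Z', sizeof text) != text + 7;
  failures += p_memchr (text, '?', sizeof text) != NULL;
  return failures;
}
```

Returning the result of exercise_string_routines() from main and running the
binary under the instrumented QEMU with zbb=true and zbb=false would then
show whether orc.b is executed.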

Changes in v2:
- Drop "Use .insn directive form for orc.b"
- Introduce use of target function attribute (and all dependencies)
- Introduce detection of target function attribute support
- Make orc.b optimization explicit
- Small cleanups

Christoph Müllner (15):
  cdefs: Add mechanism to add attributes to __always_inline functions
  string/memchr: Add mechanism to set function attributes
  string/memrchr: Add mechanism to set function attributes
  string/strchrnul: Add mechanism to set function attributes
  string/strcmp: Add mechanism to set function attributes
  string/strlen: Add mechanism to set function attributes
  string/strncmp: Add mechanism to set function attributes
  RISC-V: string-fz[a,i].h: Make orc.b optimization explicit
  RISC-V: Add compiler test for Zbb function attribute support
  RISC-V: Add Zbb optimized memchr as ifunc
  RISC-V: Add Zbb optimized memrchr as ifunc
  RISC-V: Add Zbb optimized strchrnul as ifunc
  RISC-V: Add Zbb optimized strcmp as ifunc
  RISC-V: Add Zbb optimized strlen as ifunc
  RISC-V: Add Zbb optimized strncmp as ifunc

 config.h.in                                   |  3 +
 misc/sys/cdefs.h                              |  8 ++-
 string/memchr.c                               |  5 ++
 string/memrchr.c                              |  5 ++
 string/strchrnul.c                            |  5 ++
 string/strcmp.c                               |  8 +++
 string/strlen.c                               |  5 ++
 string/strncmp.c                              |  8 +++
 sysdeps/riscv/configure                       | 27 ++++++++
 sysdeps/riscv/configure.ac                    | 18 +++++
 sysdeps/riscv/multiarch/memchr-generic.c      | 24 +++++++
 sysdeps/riscv/multiarch/memchr-zbb.c          | 23 +++++++
 sysdeps/riscv/multiarch/memrchr-generic.c     | 24 +++++++
 sysdeps/riscv/multiarch/memrchr-zbb.c         | 23 +++++++
 sysdeps/riscv/multiarch/strchrnul-generic.c   | 24 +++++++
 sysdeps/riscv/multiarch/strchrnul-zbb.c       | 23 +++++++
 sysdeps/riscv/multiarch/strcmp-generic.c      | 24 +++++++
 sysdeps/riscv/multiarch/strcmp-zbb.c          | 23 +++++++
 sysdeps/riscv/multiarch/strlen-generic.c      | 24 +++++++
 sysdeps/riscv/multiarch/strlen-zbb.c          | 23 +++++++
 sysdeps/riscv/multiarch/strncmp-generic.c     | 26 +++++++
 sysdeps/riscv/multiarch/strncmp-zbb.c         | 25 +++++++
 sysdeps/riscv/string-fza.h                    | 22 +++++-
 sysdeps/riscv/string-fzi.h                    | 20 +++++-
 .../unix/sysv/linux/riscv/multiarch/Makefile  | 23 +++++++
 .../linux/riscv/multiarch/ifunc-impl-list.c   | 67 +++++++++++++++++--
 .../unix/sysv/linux/riscv/multiarch/memchr.c  | 60 +++++++++++++++++
 .../unix/sysv/linux/riscv/multiarch/memrchr.c | 63 +++++++++++++++++
 .../sysv/linux/riscv/multiarch/strchrnul.c    | 63 +++++++++++++++++
 .../unix/sysv/linux/riscv/multiarch/strcmp.c  | 59 ++++++++++++++++
 .../unix/sysv/linux/riscv/multiarch/strlen.c  | 59 ++++++++++++++++
 .../unix/sysv/linux/riscv/multiarch/strncmp.c | 59 ++++++++++++++++
 32 files changed, 863 insertions(+), 10 deletions(-)
 create mode 100644 sysdeps/riscv/multiarch/memchr-generic.c
 create mode 100644 sysdeps/riscv/multiarch/memchr-zbb.c
 create mode 100644 sysdeps/riscv/multiarch/memrchr-generic.c
 create mode 100644 sysdeps/riscv/multiarch/memrchr-zbb.c
 create mode 100644 sysdeps/riscv/multiarch/strchrnul-generic.c
 create mode 100644 sysdeps/riscv/multiarch/strchrnul-zbb.c
 create mode 100644 sysdeps/riscv/multiarch/strcmp-generic.c
 create mode 100644 sysdeps/riscv/multiarch/strcmp-zbb.c
 create mode 100644 sysdeps/riscv/multiarch/strlen-generic.c
 create mode 100644 sysdeps/riscv/multiarch/strlen-zbb.c
 create mode 100644 sysdeps/riscv/multiarch/strncmp-generic.c
 create mode 100644 sysdeps/riscv/multiarch/strncmp-zbb.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/memchr.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/memrchr.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/strchrnul.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/strcmp.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/strlen.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/strncmp.c
  

Comments

Christoph Müllner June 3, 2024, 9:08 p.m. UTC | #1
Ping.

  
Christoph Müllner June 11, 2024, 10 a.m. UTC | #2
Ping.

  
Adhemerval Zanella Netto June 19, 2024, 2:26 p.m. UTC | #3
I tried to check this patchset with gcc 14 [1] (commit 6f6103ccc5b3bf8cb,
built with build-many-glibcs.py) and it shows an ICE:

$ riscv64-glibc-linux-gnu-gcc ../sysdeps/riscv/multiarch/memchr-zbb.c [...]
In file included from ../include/bits/string_fortified.h:1,
                 from ../string/string.h:548,
                 from ../include/string.h:60,
                 from ../string/memchr.c:24,
                 from ../sysdeps/riscv/multiarch/memchr-zbb.c:23:
../string/bits/string_fortified.h:110:1: internal compiler error: in riscv_func_target_put, at common/config/riscv/riscv-common.cc:510
  110 | {
      | ^
0x7af52762a1c9 __libc_start_call_main
        ../sysdeps/nptl/libc_start_call_main.h:58
0x7af52762a28a __libc_start_main_impl
        ../csu/libc-start.c:360

I am building with --enable-fortify-source=2; with --enable-fortify-source=no
the build does not fail.  A gcc 13 build [2] seems to be in better shape, but
the version I am using does not support the required attributes
(HAVE_RISCV_FATTRIBUTE_ZBB is set to 0).

So I think we will need to check whether this still happens with
gcc master/15 and whether it is indeed a regression introduced in gcc 14.

[1] gcc version 14.1.1 20240619 [releases/gcc-14 r14-10324-g6f6103ccc5b]
[2] gcc version 13.1.1 20230525 [releases/gcc-13 r13-7376-ge80487dcbe2]
  
Christoph Müllner June 20, 2024, 3:52 p.m. UTC | #4
I can reproduce the issue with gcc 14 and master/15.

While analysing this, I've discovered two issues in GCC
when processing target-arch attributes:
* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115554
* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115562

Both issues trigger an ICE because of a violated assertion in
riscv_func_target_put().  While the first issue could be fixed by
replacing the assertion with an error message, the second requires
more work.  Unfortunately, it is the second issue that prevents this
patchset from building.

GCC 13 and earlier are not affected, because the RISC-V backend did not
support target-arch attributes back then.

Thanks,
Christoph

  
Christoph Müllner July 17, 2024, 7:34 a.m. UTC | #5
The GCC issue has been fixed on master:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aa8e2de78cae4dca7f9b0efe0685f3382f9ecb9a
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=61c21a719e205f70bd046c6a0275d1a3fd6341a4

GCC-14 backport:

https://gcc.gnu.org/git?p=gcc.git;a=commit;h=0e1f599d637668bba0b2890f4cd81e7fb70473bc
https://gcc.gnu.org/git?p=gcc.git;a=commit;h=b3cff8357e9dce680a20406698fa9dadfe04997d

With the fix applied, building with "--enable-fortify-source=2" no longer
triggers an ICE.

The series still applies cleanly on master and retesting showed that
everything still works as expected.
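
For anyone who wants to repeat the check described in the cover letter, here
is a minimal sketch of a test helper. It is an illustration only, not the
actual test program used for the series: the name run_string_checks() and the
test string are made up, and it covers only the four ISO C routines from the
list (strlen, strcmp, strncmp, memchr).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the series: returns the number of
   failed checks, so 0 means every routine behaved as expected.  */
int
run_string_checks (void)
{
  /* Reading the pointer through a volatile object discourages the
     compiler from folding the calls at compile time, so that the libc
     implementations (and thus the ifunc-selected variants) are used.  */
  static const char *volatile text = "hello, zbb world";
  const char *s = text;
  int failures = 0;

  failures += strlen (s) != 16;
  failures += strcmp (s, "hello, zbb world") != 0;
  failures += !(strcmp (s, "hello, zbc world") < 0);
  failures += strncmp (s, "hello, zbc", 9) != 0;
  failures += !(strncmp (s, "hello, zbc", 10) < 0);
  failures += memchr (s, 'z', 16) != (const void *) (s + 7);
  failures += memchr (s, 'q', 16) != NULL;

  return failures;
}
```

Called from main() and run under QEMU with -cpu "rv64,zbb=true" and then
"rv64,zbb=false", the result should be 0 in both cases; only the selected
implementation should differ. memrchr and strchrnul are GNU extensions and
would additionally need _GNU_SOURCE, so they are left out here. Building the
test with -fno-builtin further reduces the chance that GCC expands the calls
inline.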

BR
Christoph
  
Christoph Müllner Aug. 13, 2024, 2:05 p.m. UTC | #6
Ping.

  
Adhemerval Zanella Netto Aug. 13, 2024, 5:39 p.m. UTC | #7
On 17/07/24 04:34, Christoph Müllner wrote:
> 
> The GCC issue has been fixed on master:
> 
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aa8e2de78cae4dca7f9b0efe0685f3382f9ecb9a
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=61c21a719e205f70bd046c6a0275d1a3fd6341a4
> 
> GCC-14 backport:
> 
> https://gcc.gnu.org/git?p=gcc.git;a=commit;h=0e1f599d637668bba0b2890f4cd81e7fb70473bc
> https://gcc.gnu.org/git?p=gcc.git;a=commit;h=b3cff8357e9dce680a20406698fa9dadfe04997d
> 
> With the fix applied, building with "--enable-fortify-source=2" no longer
> triggers an ICE.
> 
> The series still applies cleanly on master and retesting showed that
> everything still works as expected.

Thanks, I can confirm that with gcc version 14.2.1 20240813 [master r15-2901-gccd7068d462] (GCC)
I don't see these issues.