[00/22] arm: New framework for MVE intrinsics

Message ID 20230418134608.244751-1-christophe.lyon@arm.com
Headers
Series arm: New framework for MVE intrinsics |

Message

Christophe Lyon April 18, 2023, 1:45 p.m. UTC
  Hi,

This is the beginning of a long patch series to change the way Arm MVE
intrinsics are implemented. The goal is to get rid of arm_mve.h, which
takes a long time to parse and compile.

Roughly speaking, it's about using a framework very similar to what is
implemented for AArch64/SVE intrinsics. I haven't converted all the
intrinsics yet, but I think it would be good to start the conversion
when stage-1 reopens.

* Factorizing names
One of the main implementation differences I noticed between SVE and
MVE is that mve.md provides only full builtin names at the moment, and
makes almost no use of "parameterized names"
(https://gcc.gnu.org/onlinedocs/gccint/Parameterized-Names.html#Parameterized-Names).

Without this, we'd need the builtin expander to use a large
switch/case of the form:

switch (code)
case VADDQ_S: insn_code = code_for_mve_vaddq_s (...)
case VADDQ_U: insn_code = code_for_mve_vaddq_u (...)
case VSUBQ_S: insn_code = code_for_mve_vsubq_s (...)
case VSUBQ_U: insn_code = code_for_mve_vsubq_u (...)
....

so part of the work (which I called "factorize" in the commit
messages) is about replacing

(define_insn "mve_vaddq_n_<supf><mode>"
with
(define_insn "@mve_<mve_insn>q_n_<supf><mode>"
with the help of a new iterator (mve_insn).

Doing so makes it more obvious that some patterns are identical,
except for the instruction name. I took this opportunity to merge
them, so for instance I have a patch which merges add, sub and mul
patterns.  Although not strictly necessary for the MVE intrinsics
restructuring work, this is a good opportunity to reduce such code
duplication (I did notice a few bugs during that process, which led me
to post a few small patches in the past months).  Note that identical
patterns will probably remain after the series, they can be merged
later if we want.

This factorization also implies the introduction of new iterators, but
also means that several existing ones become useless. These patches do
not remove them because it's a bit painful to reorder patches which
remove lines at some "random" places, leading to merge conflicts. It's
much simpler to write a big cleanup patch at the end of the serie to
remove all such useless iterators at once.

* Intrinsic re-implementation
After intrinsic names have been factorized, the actual
re-implementation patch is small:
- add 1 line in each of arm-mve-builtins-base.{cc,def,h} describing
  the intrinsic shape/signature, types and predicates involved,
  RTX/unspec codes
- remove the intrinsic definitions from arm_mve.h

The full series of ~140 patches is organized like this:
- patches 1 and 2 introduce the new framework
- new implementation of vreinterpretq
- new implementation of vuninitialized
- patch groups of varying size, consisting in:
  - add a new "shape" if needed (e.g. unary, binary, ternary, ....)
  - add framework support functions if needed
  - factorize a set of intrinsics (at minimum, just make use of
    parameterized-names)
  - actual re-implementation of the intrinsics

I kept patches small so the incremental progress is easy to follow and
check.  I'll submit the patches in small groups, this first one will
make sure we agree on the implementation.

Tested on arm-eabi with -mthumb/-mfloat-abi=hard/-march=armv8.1-m.main+mve.

To help reviewers, I suggest to compare arm-mve-builtins.cc with
aarch64-sve-builtins.cc.

Christophe Lyon (22):
  arm: move builtin function codes into general numberspace
  arm: [MVE intrinsics] Add new framework
  arm: [MVE intrinsics] Rework vreinterpretq
  arm: [MVE intrinsics] Rework vuninitialized
  arm: [MVE intrinsics] add binary_opt_n shape
  arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
  arm: [MVE intrinsics] factorize vadd vsubq vmulq
  arm: [MVE intrinsics] rework vaddq vmulq vsubq
  arm: [MVE intrinsics] add binary shape
  arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
  arm: [MVE intrinsics] rework vandq veorq
  arm: [MVE intrinsics] add binary_orrq shape
  arm: [MVE intrinsics] rework vorrq
  arm: [MVE intrinsics] add unspec_mve_function_exact_insn
  arm: [MVE intrinsics] add create shape
  arm: [MVE intrinsics] factorize vcreateq
  arm: [MVE intrinsics] rework vcreateq
  arm: [MVE intrinsics] factorize several binary_m operations
  arm: [MVE intrinsics] factorize several binary _n operations
  arm: [MVE intrinsics] factorize several binary _m_n operations
  arm: [MVE intrinsics] factorize several binary operations
  arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq
    vqdmulhq vrhaddq vrmulhq

 gcc/config.gcc                                |    2 +-
 gcc/config/arm/arm-builtins.cc                |  237 +-
 gcc/config/arm/arm-builtins.h                 |    1 +
 gcc/config/arm/arm-c.cc                       |   42 +-
 gcc/config/arm/arm-mve-builtins-base.cc       |  163 +
 gcc/config/arm/arm-mve-builtins-base.def      |   50 +
 gcc/config/arm/arm-mve-builtins-base.h        |   47 +
 gcc/config/arm/arm-mve-builtins-functions.h   |  387 +
 gcc/config/arm/arm-mve-builtins-shapes.cc     |  529 ++
 gcc/config/arm/arm-mve-builtins-shapes.h      |   47 +
 gcc/config/arm/arm-mve-builtins.cc            | 2013 ++++-
 gcc/config/arm/arm-mve-builtins.def           |   40 +-
 gcc/config/arm/arm-mve-builtins.h             |  672 +-
 gcc/config/arm/arm-protos.h                   |   24 +
 gcc/config/arm/arm.cc                         |   27 +
 gcc/config/arm/arm_mve.h                      | 7581 +----------------
 gcc/config/arm/arm_mve_builtins.def           |    6 -
 gcc/config/arm/arm_mve_types.h                | 1430 ----
 gcc/config/arm/iterators.md                   |  240 +-
 gcc/config/arm/mve.md                         | 1747 +---
 gcc/config/arm/predicates.md                  |    4 +
 gcc/config/arm/t-arm                          |   32 +-
 gcc/config/arm/unspecs.md                     |    1 +
 gcc/config/arm/vec-common.md                  |    8 +-
 gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
 .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
 .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
 .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
 .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
 29 files changed, 4926 insertions(+), 10492 deletions(-)
 create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
 create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
 create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
 create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
 create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
 create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
  

Comments

Kyrylo Tkachov May 2, 2023, 9:18 a.m. UTC | #1
Hi Christophe,

> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 00/22] arm: New framework for MVE intrinsics
> 
> Hi,
> 
> This is the beginning of a long patch series to change the way Arm MVE
> intrinsics are implemented. The goal is to get rid of arm_mve.h, which
> takes a long time to parse and compile.
> 

Thanks for doing this. It is a significant improvement to the MVE intrinsics and should address some of the biggest maintainability and scalability issues we have in that area.
I'll be going through the patches one-by-one (I've looked at these offline already before), but the approach looks good to me at a high level.

My hope is that we'll move all the intrinsics, including the Neon ones to use this framework in the future, but getting the framework in place first is a good major first step in that direction.

Thanks,
Kyrill

> Roughly speaking, it's about using a framework very similar to what is
> implemented for AArch64/SVE intrinsics. I haven't converted all the
> intrinsics yet, but I think it would be good to start the conversion
> when stage-1 reopens.
> 
> * Factorizing names
> One of the main implementation differences I noticed between SVE and
> MVE is that mve.md provides only full builtin names at the moment, and
> makes almost no use of "parameterized names"
> (https://gcc.gnu.org/onlinedocs/gccint/Parameterized-
> Names.html#Parameterized-Names).
> 
> Without this, we'd need the builtin expander to use a large
> switch/case of the form:
> 
> switch (code)
> case VADDQ_S: insn_code = code_for_mve_vaddq_s (...)
> case VADDQ_U: insn_code = code_for_mve_vaddq_u (...)
> case VSUBQ_S: insn_code = code_for_mve_vsubq_s (...)
> case VSUBQ_U: insn_code = code_for_mve_vsubq_u (...)
> ....
> 
> so part of the work (which I called "factorize" in the commit
> messages) is about replacing
> 
> (define_insn "mve_vaddq_n_<supf><mode>"
> with
> (define_insn "@mve_<mve_insn>q_n_<supf><mode>"
> with the help of a new iterator (mve_insn).
> 
> Doing so makes it more obvious that some patterns are identical,
> except for the instruction name. I took this opportunity to merge
> them, so for instance I have a patch which merges add, sub and mul
> patterns.  Although not strictly necessary for the MVE intrinsics
> restructuring work, this is a good opportunity to reduce such code
> duplication (I did notice a few bugs during that process, which led me
> to post a few small patches in the past months).  Note that identical
> patterns will probably remain after the series, they can be merged
> later if we want.
> 
> This factorization also implies the introduction of new iterators, but
> also means that several existing ones become useless. These patches do
> not remove them because it's a bit painful to reorder patches which
> remove lines at some "random" places, leading to merge conflicts. It's
> much simpler to write a big cleanup patch at the end of the serie to
> remove all such useless iterators at once.
> 
> * Intrinsic re-implementation
> After intrinsic names have been factorized, the actual
> re-implementation patch is small:
> - add 1 line in each of arm-mve-builtins-base.{cc,def,h} describing
>   the intrinsic shape/signature, types and predicates involved,
>   RTX/unspec codes
> - remove the intrinsic definitions from arm_mve.h
> 
> The full series of ~140 patches is organized like this:
> - patches 1 and 2 introduce the new framework
> - new implementation of vreinterpretq
> - new implementation of vuninitialized
> - patch groups of varying size, consisting in:
>   - add a new "shape" if needed (e.g. unary, binary, ternary, ....)
>   - add framework support functions if needed
>   - factorize a set of intrinsics (at minimum, just make use of
>     parameterized-names)
>   - actual re-implementation of the intrinsics
> 
> I kept patches small so the incremental progress is easy to follow and
> check.  I'll submit the patches in small groups, this first one will
> make sure we agree on the implementation.
> 
> Tested on arm-eabi with -mthumb/-mfloat-abi=hard/-march=armv8.1-
> m.main+mve.
> 
> To help reviewers, I suggest to compare arm-mve-builtins.cc with
> aarch64-sve-builtins.cc.
> 
> Christophe Lyon (22):
>   arm: move builtin function codes into general numberspace
>   arm: [MVE intrinsics] Add new framework
>   arm: [MVE intrinsics] Rework vreinterpretq
>   arm: [MVE intrinsics] Rework vuninitialized
>   arm: [MVE intrinsics] add binary_opt_n shape
>   arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
>   arm: [MVE intrinsics] factorize vadd vsubq vmulq
>   arm: [MVE intrinsics] rework vaddq vmulq vsubq
>   arm: [MVE intrinsics] add binary shape
>   arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
>   arm: [MVE intrinsics] rework vandq veorq
>   arm: [MVE intrinsics] add binary_orrq shape
>   arm: [MVE intrinsics] rework vorrq
>   arm: [MVE intrinsics] add unspec_mve_function_exact_insn
>   arm: [MVE intrinsics] add create shape
>   arm: [MVE intrinsics] factorize vcreateq
>   arm: [MVE intrinsics] rework vcreateq
>   arm: [MVE intrinsics] factorize several binary_m operations
>   arm: [MVE intrinsics] factorize several binary _n operations
>   arm: [MVE intrinsics] factorize several binary _m_n operations
>   arm: [MVE intrinsics] factorize several binary operations
>   arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq
>     vqdmulhq vrhaddq vrmulhq
> 
>  gcc/config.gcc                                |    2 +-
>  gcc/config/arm/arm-builtins.cc                |  237 +-
>  gcc/config/arm/arm-builtins.h                 |    1 +
>  gcc/config/arm/arm-c.cc                       |   42 +-
>  gcc/config/arm/arm-mve-builtins-base.cc       |  163 +
>  gcc/config/arm/arm-mve-builtins-base.def      |   50 +
>  gcc/config/arm/arm-mve-builtins-base.h        |   47 +
>  gcc/config/arm/arm-mve-builtins-functions.h   |  387 +
>  gcc/config/arm/arm-mve-builtins-shapes.cc     |  529 ++
>  gcc/config/arm/arm-mve-builtins-shapes.h      |   47 +
>  gcc/config/arm/arm-mve-builtins.cc            | 2013 ++++-
>  gcc/config/arm/arm-mve-builtins.def           |   40 +-
>  gcc/config/arm/arm-mve-builtins.h             |  672 +-
>  gcc/config/arm/arm-protos.h                   |   24 +
>  gcc/config/arm/arm.cc                         |   27 +
>  gcc/config/arm/arm_mve.h                      | 7581 +----------------
>  gcc/config/arm/arm_mve_builtins.def           |    6 -
>  gcc/config/arm/arm_mve_types.h                | 1430 ----
>  gcc/config/arm/iterators.md                   |  240 +-
>  gcc/config/arm/mve.md                         | 1747 +---
>  gcc/config/arm/predicates.md                  |    4 +
>  gcc/config/arm/t-arm                          |   32 +-
>  gcc/config/arm/unspecs.md                     |    1 +
>  gcc/config/arm/vec-common.md                  |    8 +-
>  gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
>  .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
>  .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
>  .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
>  .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
>  29 files changed, 4926 insertions(+), 10492 deletions(-)
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
>  create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
>  create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
>  create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
>  create mode 100644 gcc/testsuite/g++.target/arm/mve/general-
> c++/nomve_fp_1.c
>  create mode 100644 gcc/testsuite/g++.target/arm/mve/general-
> c++/vreinterpretq_1.C
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-
> c/nomve_fp_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-
> c/vreinterpretq_1.c
> 
> --
> 2.34.1
  
Christophe Lyon May 2, 2023, 3:04 p.m. UTC | #2
On 5/2/23 11:18, Kyrylo Tkachov wrote:
> Hi Christophe,
> 
>> -----Original Message-----
>> From: Christophe Lyon <christophe.lyon@arm.com>
>> Sent: Tuesday, April 18, 2023 2:46 PM
>> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
>> <Richard.Sandiford@arm.com>
>> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
>> Subject: [PATCH 00/22] arm: New framework for MVE intrinsics
>>
>> Hi,
>>
>> This is the beginning of a long patch series to change the way Arm MVE
>> intrinsics are implemented. The goal is to get rid of arm_mve.h, which
>> takes a long time to parse and compile.
>>
> 
> Thanks for doing this. It is a significant improvement to the MVE intrinsics and should address some of the biggest maintainability and scalability issues we have in that area.
> I'll be going through the patches one-by-one (I've looked at these offline already before), but the approach looks good to me at a high level.
> 
> My hope is that we'll move all the intrinsics, including the Neon ones to use this framework in the future, but getting the framework in place first is a good major first step in that direction.
> 

Indeed. Ideally we'd probably want to make this framework more generic 
so that it supports aarch64 SVE, arm MVE and Neon, but that can be done 
later. I tried to highlight the differences I noticed compared to SVE, 
so that it helps us think what needs to be specialized for different 
targets, as opposed to what is already generic enough.

Thanks,

Christophe

> Thanks,
> Kyrill
> 
>> Roughly speaking, it's about using a framework very similar to what is
>> implemented for AArch64/SVE intrinsics. I haven't converted all the
>> intrinsics yet, but I think it would be good to start the conversion
>> when stage-1 reopens.
>>
>> * Factorizing names
>> One of the main implementation differences I noticed between SVE and
>> MVE is that mve.md provides only full builtin names at the moment, and
>> makes almost no use of "parameterized names"
>> (https://gcc.gnu.org/onlinedocs/gccint/Parameterized-
>> Names.html#Parameterized-Names).
>>
>> Without this, we'd need the builtin expander to use a large
>> switch/case of the form:
>>
>> switch (code)
>> case VADDQ_S: insn_code = code_for_mve_vaddq_s (...)
>> case VADDQ_U: insn_code = code_for_mve_vaddq_u (...)
>> case VSUBQ_S: insn_code = code_for_mve_vsubq_s (...)
>> case VSUBQ_U: insn_code = code_for_mve_vsubq_u (...)
>> ....
>>
>> so part of the work (which I called "factorize" in the commit
>> messages) is about replacing
>>
>> (define_insn "mve_vaddq_n_<supf><mode>"
>> with
>> (define_insn "@mve_<mve_insn>q_n_<supf><mode>"
>> with the help of a new iterator (mve_insn).
>>
>> Doing so makes it more obvious that some patterns are identical,
>> except for the instruction name. I took this opportunity to merge
>> them, so for instance I have a patch which merges add, sub and mul
>> patterns.  Although not strictly necessary for the MVE intrinsics
>> restructuring work, this is a good opportunity to reduce such code
>> duplication (I did notice a few bugs during that process, which led me
>> to post a few small patches in the past months).  Note that identical
>> patterns will probably remain after the series, they can be merged
>> later if we want.
>>
>> This factorization also implies the introduction of new iterators, but
>> also means that several existing ones become useless. These patches do
>> not remove them because it's a bit painful to reorder patches which
>> remove lines at some "random" places, leading to merge conflicts. It's
>> much simpler to write a big cleanup patch at the end of the serie to
>> remove all such useless iterators at once.
>>
>> * Intrinsic re-implementation
>> After intrinsic names have been factorized, the actual
>> re-implementation patch is small:
>> - add 1 line in each of arm-mve-builtins-base.{cc,def,h} describing
>>    the intrinsic shape/signature, types and predicates involved,
>>    RTX/unspec codes
>> - remove the intrinsic definitions from arm_mve.h
>>
>> The full series of ~140 patches is organized like this:
>> - patches 1 and 2 introduce the new framework
>> - new implementation of vreinterpretq
>> - new implementation of vuninitialized
>> - patch groups of varying size, consisting in:
>>    - add a new "shape" if needed (e.g. unary, binary, ternary, ....)
>>    - add framework support functions if needed
>>    - factorize a set of intrinsics (at minimum, just make use of
>>      parameterized-names)
>>    - actual re-implementation of the intrinsics
>>
>> I kept patches small so the incremental progress is easy to follow and
>> check.  I'll submit the patches in small groups, this first one will
>> make sure we agree on the implementation.
>>
>> Tested on arm-eabi with -mthumb/-mfloat-abi=hard/-march=armv8.1-
>> m.main+mve.
>>
>> To help reviewers, I suggest to compare arm-mve-builtins.cc with
>> aarch64-sve-builtins.cc.
>>
>> Christophe Lyon (22):
>>    arm: move builtin function codes into general numberspace
>>    arm: [MVE intrinsics] Add new framework
>>    arm: [MVE intrinsics] Rework vreinterpretq
>>    arm: [MVE intrinsics] Rework vuninitialized
>>    arm: [MVE intrinsics] add binary_opt_n shape
>>    arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
>>    arm: [MVE intrinsics] factorize vadd vsubq vmulq
>>    arm: [MVE intrinsics] rework vaddq vmulq vsubq
>>    arm: [MVE intrinsics] add binary shape
>>    arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
>>    arm: [MVE intrinsics] rework vandq veorq
>>    arm: [MVE intrinsics] add binary_orrq shape
>>    arm: [MVE intrinsics] rework vorrq
>>    arm: [MVE intrinsics] add unspec_mve_function_exact_insn
>>    arm: [MVE intrinsics] add create shape
>>    arm: [MVE intrinsics] factorize vcreateq
>>    arm: [MVE intrinsics] rework vcreateq
>>    arm: [MVE intrinsics] factorize several binary_m operations
>>    arm: [MVE intrinsics] factorize several binary _n operations
>>    arm: [MVE intrinsics] factorize several binary _m_n operations
>>    arm: [MVE intrinsics] factorize several binary operations
>>    arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq
>>      vqdmulhq vrhaddq vrmulhq
>>
>>   gcc/config.gcc                                |    2 +-
>>   gcc/config/arm/arm-builtins.cc                |  237 +-
>>   gcc/config/arm/arm-builtins.h                 |    1 +
>>   gcc/config/arm/arm-c.cc                       |   42 +-
>>   gcc/config/arm/arm-mve-builtins-base.cc       |  163 +
>>   gcc/config/arm/arm-mve-builtins-base.def      |   50 +
>>   gcc/config/arm/arm-mve-builtins-base.h        |   47 +
>>   gcc/config/arm/arm-mve-builtins-functions.h   |  387 +
>>   gcc/config/arm/arm-mve-builtins-shapes.cc     |  529 ++
>>   gcc/config/arm/arm-mve-builtins-shapes.h      |   47 +
>>   gcc/config/arm/arm-mve-builtins.cc            | 2013 ++++-
>>   gcc/config/arm/arm-mve-builtins.def           |   40 +-
>>   gcc/config/arm/arm-mve-builtins.h             |  672 +-
>>   gcc/config/arm/arm-protos.h                   |   24 +
>>   gcc/config/arm/arm.cc                         |   27 +
>>   gcc/config/arm/arm_mve.h                      | 7581 +----------------
>>   gcc/config/arm/arm_mve_builtins.def           |    6 -
>>   gcc/config/arm/arm_mve_types.h                | 1430 ----
>>   gcc/config/arm/iterators.md                   |  240 +-
>>   gcc/config/arm/mve.md                         | 1747 +---
>>   gcc/config/arm/predicates.md                  |    4 +
>>   gcc/config/arm/t-arm                          |   32 +-
>>   gcc/config/arm/unspecs.md                     |    1 +
>>   gcc/config/arm/vec-common.md                  |    8 +-
>>   gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
>>   .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
>>   .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
>>   .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
>>   .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
>>   29 files changed, 4926 insertions(+), 10492 deletions(-)
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
>>   create mode 100644 gcc/testsuite/g++.target/arm/mve/general-
>> c++/nomve_fp_1.c
>>   create mode 100644 gcc/testsuite/g++.target/arm/mve/general-
>> c++/vreinterpretq_1.C
>>   create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-
>> c/nomve_fp_1.c
>>   create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-
>> c/vreinterpretq_1.c
>>
>> --
>> 2.34.1
>
  
Christophe Lyon May 3, 2023, 3:01 p.m. UTC | #3
On 5/2/23 17:04, Christophe Lyon via Gcc-patches wrote:
> 
> 
> On 5/2/23 11:18, Kyrylo Tkachov wrote:
>> Hi Christophe,
>>
>>> -----Original Message-----
>>> From: Christophe Lyon <christophe.lyon@arm.com>
>>> Sent: Tuesday, April 18, 2023 2:46 PM
>>> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
>>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
>>> <Richard.Sandiford@arm.com>
>>> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
>>> Subject: [PATCH 00/22] arm: New framework for MVE intrinsics
>>>
>>> Hi,
>>>
>>> This is the beginning of a long patch series to change the way Arm MVE
>>> intrinsics are implemented. The goal is to get rid of arm_mve.h, which
>>> takes a long time to parse and compile.
>>>
>>
>> Thanks for doing this. It is a significant improvement to the MVE 
>> intrinsics and should address some of the biggest maintainability and 
>> scalability issues we have in that area.
>> I'll be going through the patches one-by-one (I've looked at these 
>> offline already before), but the approach looks good to me at a high 
>> level.
>>
>> My hope is that we'll move all the intrinsics, including the Neon ones 
>> to use this framework in the future, but getting the framework in 
>> place first is a good major first step in that direction.
>>
> 
> Indeed. Ideally we'd probably want to make this framework more generic 
> so that it supports aarch64 SVE, arm MVE and Neon, but that can be done 
> later. I tried to highlight the differences I noticed compared to SVE, 
> so that it helps us think what needs to be specialized for different 
> targets, as opposed to what is already generic enough.
> 
> Thanks,
> 
> Christophe
> 

Thank you Kyrill for the quick review.
I've pushed the series with the minor changes you requested.

I am going to prepare the next batch :-)

Thanks,

Christophe

>> Thanks,
>> Kyrill
>>
>>> Roughly speaking, it's about using a framework very similar to what is
>>> implemented for AArch64/SVE intrinsics. I haven't converted all the
>>> intrinsics yet, but I think it would be good to start the conversion
>>> when stage-1 reopens.
>>>
>>> * Factorizing names
>>> One of the main implementation differences I noticed between SVE and
>>> MVE is that mve.md provides only full builtin names at the moment, and
>>> makes almost no use of "parameterized names"
>>> (https://gcc.gnu.org/onlinedocs/gccint/Parameterized-
>>> Names.html#Parameterized-Names).
>>>
>>> Without this, we'd need the builtin expander to use a large
>>> switch/case of the form:
>>>
>>> switch (code)
>>> case VADDQ_S: insn_code = code_for_mve_vaddq_s (...)
>>> case VADDQ_U: insn_code = code_for_mve_vaddq_u (...)
>>> case VSUBQ_S: insn_code = code_for_mve_vsubq_s (...)
>>> case VSUBQ_U: insn_code = code_for_mve_vsubq_u (...)
>>> ....
>>>
>>> so part of the work (which I called "factorize" in the commit
>>> messages) is about replacing
>>>
>>> (define_insn "mve_vaddq_n_<supf><mode>"
>>> with
>>> (define_insn "@mve_<mve_insn>q_n_<supf><mode>"
>>> with the help of a new iterator (mve_insn).
>>>
>>> Doing so makes it more obvious that some patterns are identical,
>>> except for the instruction name. I took this opportunity to merge
>>> them, so for instance I have a patch which merges add, sub and mul
>>> patterns.  Although not strictly necessary for the MVE intrinsics
>>> restructuring work, this is a good opportunity to reduce such code
>>> duplication (I did notice a few bugs during that process, which led me
>>> to post a few small patches in the past months).  Note that identical
>>> patterns will probably remain after the series, they can be merged
>>> later if we want.
>>>
>>> This factorization also implies the introduction of new iterators, but
>>> also means that several existing ones become useless. These patches do
>>> not remove them because it's a bit painful to reorder patches which
>>> remove lines at some "random" places, leading to merge conflicts. It's
>>> much simpler to write a big cleanup patch at the end of the serie to
>>> remove all such useless iterators at once.
>>>
>>> * Intrinsic re-implementation
>>> After intrinsic names have been factorized, the actual
>>> re-implementation patch is small:
>>> - add 1 line in each of arm-mve-builtins-base.{cc,def,h} describing
>>>    the intrinsic shape/signature, types and predicates involved,
>>>    RTX/unspec codes
>>> - remove the intrinsic definitions from arm_mve.h
>>>
>>> The full series of ~140 patches is organized like this:
>>> - patches 1 and 2 introduce the new framework
>>> - new implementation of vreinterpretq
>>> - new implementation of vuninitialized
>>> - patch groups of varying size, consisting in:
>>>    - add a new "shape" if needed (e.g. unary, binary, ternary, ....)
>>>    - add framework support functions if needed
>>>    - factorize a set of intrinsics (at minimum, just make use of
>>>      parameterized-names)
>>>    - actual re-implementation of the intrinsics
>>>
>>> I kept patches small so the incremental progress is easy to follow and
>>> check.  I'll submit the patches in small groups, this first one will
>>> make sure we agree on the implementation.
>>>
>>> Tested on arm-eabi with -mthumb/-mfloat-abi=hard/-march=armv8.1-
>>> m.main+mve.
>>>
>>> To help reviewers, I suggest to compare arm-mve-builtins.cc with
>>> aarch64-sve-builtins.cc.
>>>
>>> Christophe Lyon (22):
>>>    arm: move builtin function codes into general numberspace
>>>    arm: [MVE intrinsics] Add new framework
>>>    arm: [MVE intrinsics] Rework vreinterpretq
>>>    arm: [MVE intrinsics] Rework vuninitialized
>>>    arm: [MVE intrinsics] add binary_opt_n shape
>>>    arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
>>>    arm: [MVE intrinsics] factorize vadd vsubq vmulq
>>>    arm: [MVE intrinsics] rework vaddq vmulq vsubq
>>>    arm: [MVE intrinsics] add binary shape
>>>    arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
>>>    arm: [MVE intrinsics] rework vandq veorq
>>>    arm: [MVE intrinsics] add binary_orrq shape
>>>    arm: [MVE intrinsics] rework vorrq
>>>    arm: [MVE intrinsics] add unspec_mve_function_exact_insn
>>>    arm: [MVE intrinsics] add create shape
>>>    arm: [MVE intrinsics] factorize vcreateq
>>>    arm: [MVE intrinsics] rework vcreateq
>>>    arm: [MVE intrinsics] factorize several binary_m operations
>>>    arm: [MVE intrinsics] factorize several binary _n operations
>>>    arm: [MVE intrinsics] factorize several binary _m_n operations
>>>    arm: [MVE intrinsics] factorize several binary operations
>>>    arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq
>>>      vqdmulhq vrhaddq vrmulhq
>>>
>>>   gcc/config.gcc                                |    2 +-
>>>   gcc/config/arm/arm-builtins.cc                |  237 +-
>>>   gcc/config/arm/arm-builtins.h                 |    1 +
>>>   gcc/config/arm/arm-c.cc                       |   42 +-
>>>   gcc/config/arm/arm-mve-builtins-base.cc       |  163 +
>>>   gcc/config/arm/arm-mve-builtins-base.def      |   50 +
>>>   gcc/config/arm/arm-mve-builtins-base.h        |   47 +
>>>   gcc/config/arm/arm-mve-builtins-functions.h   |  387 +
>>>   gcc/config/arm/arm-mve-builtins-shapes.cc     |  529 ++
>>>   gcc/config/arm/arm-mve-builtins-shapes.h      |   47 +
>>>   gcc/config/arm/arm-mve-builtins.cc            | 2013 ++++-
>>>   gcc/config/arm/arm-mve-builtins.def           |   40 +-
>>>   gcc/config/arm/arm-mve-builtins.h             |  672 +-
>>>   gcc/config/arm/arm-protos.h                   |   24 +
>>>   gcc/config/arm/arm.cc                         |   27 +
>>>   gcc/config/arm/arm_mve.h                      | 7581 +----------------
>>>   gcc/config/arm/arm_mve_builtins.def           |    6 -
>>>   gcc/config/arm/arm_mve_types.h                | 1430 ----
>>>   gcc/config/arm/iterators.md                   |  240 +-
>>>   gcc/config/arm/mve.md                         | 1747 +---
>>>   gcc/config/arm/predicates.md                  |    4 +
>>>   gcc/config/arm/t-arm                          |   32 +-
>>>   gcc/config/arm/unspecs.md                     |    1 +
>>>   gcc/config/arm/vec-common.md                  |    8 +-
>>>   gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
>>>   .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
>>>   .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
>>>   .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
>>>   .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
>>>   29 files changed, 4926 insertions(+), 10492 deletions(-)
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
>>>   create mode 100644 gcc/testsuite/g++.target/arm/mve/general-
>>> c++/nomve_fp_1.c
>>>   create mode 100644 gcc/testsuite/g++.target/arm/mve/general-
>>> c++/vreinterpretq_1.C
>>>   create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-
>>> c/nomve_fp_1.c
>>>   create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-
>>> c/vreinterpretq_1.c
>>>
>>> -- 
>>> 2.34.1
>>