RISC-V: Support vfwmul.vv combine lowering

Message ID 20230628041512.188243-1-juzhe.zhong@rivai.ai
State Rejected
Headers
Series RISC-V: Support vfwmul.vv combine lowering |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm warning Patch failed to apply

Commit Message

钟居哲 June 28, 2023, 4:15 a.m. UTC
  Consider the following complicate case:
#define TEST_TYPE(TYPE1, TYPE2)                                                \
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      {                                                                        \
	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
      }                                                                        \
  }

TEST_TYPE (double, float)

Such complicate situation, Combine PASS can not combine extension of both operands on the fly.
So the combine PASS will first try to combine one of the combine extension, and then combine
the other. The combine flow is as follows:

Original IR:
(set (reg 0) (float_extend: (reg 1))
(set (reg 3) (float_extend: (reg 2)) 
(set (reg 4) (mult: (reg 0) (reg 3))

First step of combine:
(set (reg 3) (float_extend: (reg 2))
(set (reg 4) (mult: (float_extend: (reg 1) (reg 3))

Second step of combine:
(set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))

So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL pattern in autovec-opt.md
which is (set (reg 0) (mult (float_extend (reg 1) (reg 2)))).

gcc/ChangeLog:

        * config/riscv/autovec-opt.md (@pred_single_widen_mul<any_extend:su><mode>): Change "@" into "*" in pattern name which simplifies build files.
        (*pred_single_widen_mul<any_extend:su><mode>): Ditto.
        (*pred_single_widen_mul<mode>): New pattern.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/widen/widen-3.c: Add floating-point.
        * gcc.target/riscv/rvv/autovec/widen/widen-7.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c: New test.

---
 gcc/config/riscv/autovec-opt.md               | 41 ++++++++++++++++++-
 .../riscv/rvv/autovec/widen/widen-3.c         |  7 +++-
 .../riscv/rvv/autovec/widen/widen-7.c         |  7 +++-
 .../rvv/autovec/widen/widen-complicate-3.c    |  7 +++-
 .../riscv/rvv/autovec/widen/widen_run-3.c     |  5 ++-
 .../riscv/rvv/autovec/widen/widen_run-7.c     |  5 ++-
 .../rvv/autovec/widen/widen_run_zvfh-3.c      | 28 +++++++++++++
 .../rvv/autovec/widen/widen_run_zvfh-7.c      | 28 +++++++++++++
 8 files changed, 117 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c
  

Comments

Jeff Law June 28, 2023, 4:24 p.m. UTC | #1
On 6/27/23 22:15, Juzhe-Zhong wrote:
> Consider the following complicate case:
> #define TEST_TYPE(TYPE1, TYPE2)                                                \
>    __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
>      TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
>      TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
>      TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
>    {                                                                            \
>      for (int i = 0; i < n; i++)                                                \
>        {                                                                        \
> 	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
> 	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
> 	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
> 	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
>        }                                                                        \
>    }
> 
> TEST_TYPE (double, float)
> 
> Such complicate situation, Combine PASS can not combine extension of both operands on the fly.
> So the combine PASS will first try to combine one of the combine extension, and then combine
> the other. The combine flow is as follows:
> 
> Original IR:
> (set (reg 0) (float_extend: (reg 1))
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (reg 0) (reg 3))
> 
> First step of combine:
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (float_extend: (reg 1) (reg 3))
> 
> Second step of combine:
> (set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))
> 
> So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL pattern in autovec-opt.md
> which is (set (reg 0) (mult (float_extend (reg 1) (reg 2)))).
Hmm, something doesn't make sense here.  Combine knows how to do a 3->1 
combination.  I would expect to see the first step fail (substituting 
just one operand), then a later step try to combine all three 
instructions, substituting the extension for both input operands.

Can you pass along the .combine dump from the failing case?

Jeff
  
钟居哲 June 28, 2023, 10 p.m. UTC | #2
You can see here:

https://godbolt.org/z/d78646hWb 

The first case can't genreate vfwmul.vv but second case succeed.

Failed to match this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
    (if_then_else:VNx2DF (unspec:VNx2BI [
                (const_vector:VNx2BI repeat [
                        (const_int 1 [0x1])
                    ])
                (reg:DI 153)
                (const_int 2 [0x2]) repeated x2
                (const_int 1 [0x1])
                (const_int 7 [0x7])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
                (reg:SI 69 N/A)
            ] UNSPEC_VPREDICATE)
        (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
            (reg:VNx2DF 148 [ vect__8.49 ]))
        (unspec:VNx2DF [
                (reg:SI 0 zero)
            ] UNSPEC_VUNDEF)))


This patch is adding this combine pattern.


juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 00:24
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/27/23 22:15, Juzhe-Zhong wrote:
> Consider the following complicate case:
> #define TEST_TYPE(TYPE1, TYPE2)                                                \
>    __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
>      TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
>      TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
>      TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
>    {                                                                            \
>      for (int i = 0; i < n; i++)                                                \
>        {                                                                        \
> dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
> dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
> dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
> dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
>        }                                                                        \
>    }
> 
> TEST_TYPE (double, float)
> 
> Such complicate situation, Combine PASS can not combine extension of both operands on the fly.
> So the combine PASS will first try to combine one of the combine extension, and then combine
> the other. The combine flow is as follows:
> 
> Original IR:
> (set (reg 0) (float_extend: (reg 1))
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (reg 0) (reg 3))
> 
> First step of combine:
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (float_extend: (reg 1) (reg 3))
> 
> Second step of combine:
> (set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))
> 
> So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL pattern in autovec-opt.md
> which is (set (reg 0) (mult (float_extend (reg 1) (reg 2)))).
Hmm, something doesn't make sense here.  Combine knows how to do a 3->1 
combination.  I would expect to see the first step fail (substituting 
just one operand), then a later step try to combine all three 
instructions, substituting the extension for both input operands.
 
Can you pass along the .combine dump from the failing case?
 
Jeff
  
钟居哲 June 28, 2023, 10:59 p.m. UTC | #3
This is the dump



juzhe.zhong@rivai.ai
 
From: 钟居哲
Date: 2023-06-29 06:00
To: Jeff Law; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
You can see here:

https://godbolt.org/z/d78646hWb 

The first case can't genreate vfwmul.vv but second case succeed.

Failed to match this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
    (if_then_else:VNx2DF (unspec:VNx2BI [
                (const_vector:VNx2BI repeat [
                        (const_int 1 [0x1])
                    ])
                (reg:DI 153)
                (const_int 2 [0x2]) repeated x2
                (const_int 1 [0x1])
                (const_int 7 [0x7])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
                (reg:SI 69 N/A)
            ] UNSPEC_VPREDICATE)
        (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
            (reg:VNx2DF 148 [ vect__8.49 ]))
        (unspec:VNx2DF [
                (reg:SI 0 zero)
            ] UNSPEC_VUNDEF)))


This patch is adding this combine pattern.


juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 00:24
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/27/23 22:15, Juzhe-Zhong wrote:
> Consider the following complicate case:
> #define TEST_TYPE(TYPE1, TYPE2)                                                \
>    __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
>      TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
>      TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
>      TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
>    {                                                                            \
>      for (int i = 0; i < n; i++)                                                \
>        {                                                                        \
> dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
> dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
> dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
> dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
>        }                                                                        \
>    }
> 
> TEST_TYPE (double, float)
> 
> Such complicate situation, Combine PASS can not combine extension of both operands on the fly.
> So the combine PASS will first try to combine one of the combine extension, and then combine
> the other. The combine flow is as follows:
> 
> Original IR:
> (set (reg 0) (float_extend: (reg 1))
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (reg 0) (reg 3))
> 
> First step of combine:
> (set (reg 3) (float_extend: (reg 2))
> (set (reg 4) (mult: (float_extend: (reg 1) (reg 3))
> 
> Second step of combine:
> (set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))
> 
> So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL pattern in autovec-opt.md
> which is (set (reg 0) (mult (float_extend (reg 1) (reg 2)))).
Hmm, something doesn't make sense here.  Combine knows how to do a 3->1 
combination.  I would expect to see the first step fail (substituting 
just one operand), then a later step try to combine all three 
instructions, substituting the extension for both input operands.
 
Can you pass along the .combine dump from the failing case?
 
Jeff
  
Jeff Law June 29, 2023, 10:59 p.m. UTC | #4
On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb <https://godbolt.org/z/d78646hWb>
> 
> The first case can't genreate vfwmul.vv but second case succeed.
> 
> Failed to match this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>      (if_then_else:VNx2DF (unspec:VNx2BI [
>                  (const_vector:VNx2BI repeat [
>                          (const_int 1 [0x1])
>                      ])
>                  (reg:DI 153)
>                  (const_int 2 [0x2]) repeated x2
>                  (const_int 1 [0x1])
>                  (const_int 7 [0x7])
>                  (reg:SI 66 vl)
>                  (reg:SI 67 vtype)
>                  (reg:SI 69 N/A)
>              ] UNSPEC_VPREDICATE)
>          (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
>              (reg:VNx2DF 148 [ vect__8.49 ]))
>          (unspec:VNx2DF [
>                  (reg:SI 0 zero)
>              ] UNSPEC_VUNDEF)))
Right.  We try combining:
   24 -> 27
   25 -> 27
   23, 24 -> 27
   22, 25 -> 27

All of which fail, as expected.  24 -> 27 and 25-> 27 only put an 
extension on one operand of the mult.  The other two try to substitute a 
float extend of an if-then-else which I fully expect to fail.  All as 
expected.

The next one that gets tried is:

> Trying 25, 24 -> 27:
>    25: r149:VNx2DF=float_extend(r141:VNx2SF)
>       REG_DEAD r141:VNx2SF
>    24: r148:VNx2DF=float_extend(r139:VNx2SF)
>       REG_DEAD r139:VNx2SF
>    27: r150:VNx2DF={(unspec[const_vector,r153:DI,0x2,0x2,0x1,0x7,vl:SI,vtype:SI,N/A:SI] 69)?r148:VNx2DF*r149:VNx2DF:unspec[zero:SI] 68}
>       REG_DEAD r149:VNx2DF
>       REG_DEAD r148:VNx2DF
>       REG_DEAD N/A:SI
>       REG_DEAD zero:SI
>       REG_EQUAL r148:VNx2DF*r149:VNx2DF
> Successfully matched this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>     (if_then_else:VNx2DF (unspec:VNx2BI [
>                 (const_vector:VNx2BI repeat [
>                         (const_int 1 [0x1])
>                     ])
>                 (reg:DI 153)
>                 (const_int 2 [0x2]) repeated x2
>                 (const_int 1 [0x1])
>                 (const_int 7 [0x7])
>                 (reg:SI 66 vl)
>                 (reg:SI 67 vtype)
>                 (reg:SI 69 N/A)
>             ] UNSPEC_VPREDICATE)
>         (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 141 [ vect__4.44 ]))
>             (float_extend:VNx2DF (reg:VNx2SF 139 [ vect__7.48 ])))
>         (unspec:VNx2DF [
>                 (reg:SI 0 zero)
>             ] UNSPEC_VUNDEF)))
> allowing combination of insns 24, 25 and 27
> original costs 4 + 4 + 4 = 12
> replacement cost 4

Note how it replaced both operands of the mult with extended versions 
and the pattern matches, as expected.

The point being that I don't think those helper patterns are needed to 
handle the problem you suggested they were there to handle.  Combine 
knows how to handle multiple substitutions just fine.

Right now I don't see a need for this patch.



Jeff
  
钟居哲 June 29, 2023, 11:02 p.m. UTC | #5
>> Right now I don't see a need for this patch.
No, we need this patch.

With this patch,  this following case can be combine into vfwmul.vv:
#define TEST_TYPE(TYPE1, TYPE2)                                                \
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      {                                                                        \
	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
      }                                                                        \
  }
TEST_TYPE (double, float)
You should try this, then you will know I am saying.


juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 06:59
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb <https://godbolt.org/z/d78646hWb>
> 
> The first case can't genreate vfwmul.vv but second case succeed.
> 
> Failed to match this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>      (if_then_else:VNx2DF (unspec:VNx2BI [
>                  (const_vector:VNx2BI repeat [
>                          (const_int 1 [0x1])
>                      ])
>                  (reg:DI 153)
>                  (const_int 2 [0x2]) repeated x2
>                  (const_int 1 [0x1])
>                  (const_int 7 [0x7])
>                  (reg:SI 66 vl)
>                  (reg:SI 67 vtype)
>                  (reg:SI 69 N/A)
>              ] UNSPEC_VPREDICATE)
>          (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
>              (reg:VNx2DF 148 [ vect__8.49 ]))
>          (unspec:VNx2DF [
>                  (reg:SI 0 zero)
>              ] UNSPEC_VUNDEF)))
Right.  We try combining:
   24 -> 27
   25 -> 27
   23, 24 -> 27
   22, 25 -> 27
 
All of which fail, as expected.  24 -> 27 and 25-> 27 only put an 
extension on one operand of the mult.  The other two try to substitute a 
float extend of an if-then-else which I fully expect to fail.  All as 
expected.
 
The next one that gets tried is:
 
> Trying 25, 24 -> 27:
>    25: r149:VNx2DF=float_extend(r141:VNx2SF)
>       REG_DEAD r141:VNx2SF
>    24: r148:VNx2DF=float_extend(r139:VNx2SF)
>       REG_DEAD r139:VNx2SF
>    27: r150:VNx2DF={(unspec[const_vector,r153:DI,0x2,0x2,0x1,0x7,vl:SI,vtype:SI,N/A:SI] 69)?r148:VNx2DF*r149:VNx2DF:unspec[zero:SI] 68}
>       REG_DEAD r149:VNx2DF
>       REG_DEAD r148:VNx2DF
>       REG_DEAD N/A:SI
>       REG_DEAD zero:SI
>       REG_EQUAL r148:VNx2DF*r149:VNx2DF
> Successfully matched this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>     (if_then_else:VNx2DF (unspec:VNx2BI [
>                 (const_vector:VNx2BI repeat [
>                         (const_int 1 [0x1])
>                     ])
>                 (reg:DI 153)
>                 (const_int 2 [0x2]) repeated x2
>                 (const_int 1 [0x1])
>                 (const_int 7 [0x7])
>                 (reg:SI 66 vl)
>                 (reg:SI 67 vtype)
>                 (reg:SI 69 N/A)
>             ] UNSPEC_VPREDICATE)
>         (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 141 [ vect__4.44 ]))
>             (float_extend:VNx2DF (reg:VNx2SF 139 [ vect__7.48 ])))
>         (unspec:VNx2DF [
>                 (reg:SI 0 zero)
>             ] UNSPEC_VUNDEF)))
> allowing combination of insns 24, 25 and 27
> original costs 4 + 4 + 4 = 12
> replacement cost 4
 
Note how it replaced both operands of the mult with extended versions 
and the pattern matches, as expected.
 
The point being that I don't think those helper patterns are needed to 
handle the problem you suggested they were there to handle.  Combine 
knows how to handle multiple substitutions just fine.
 
Right now I don't see a need for this patch.
 
 
 
Jeff
  
钟居哲 June 29, 2023, 11:04 p.m. UTC | #6
Or do you have better solution to make the case succeed to combine into vfwmul?
I am ok with any solution.



juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 06:59
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb <https://godbolt.org/z/d78646hWb>
> 
> The first case can't genreate vfwmul.vv but second case succeed.
> 
> Failed to match this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>      (if_then_else:VNx2DF (unspec:VNx2BI [
>                  (const_vector:VNx2BI repeat [
>                          (const_int 1 [0x1])
>                      ])
>                  (reg:DI 153)
>                  (const_int 2 [0x2]) repeated x2
>                  (const_int 1 [0x1])
>                  (const_int 7 [0x7])
>                  (reg:SI 66 vl)
>                  (reg:SI 67 vtype)
>                  (reg:SI 69 N/A)
>              ] UNSPEC_VPREDICATE)
>          (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
>              (reg:VNx2DF 148 [ vect__8.49 ]))
>          (unspec:VNx2DF [
>                  (reg:SI 0 zero)
>              ] UNSPEC_VUNDEF)))
Right.  We try combining:
   24 -> 27
   25 -> 27
   23, 24 -> 27
   22, 25 -> 27
 
All of which fail, as expected.  24 -> 27 and 25-> 27 only put an 
extension on one operand of the mult.  The other two try to substitute a 
float extend of an if-then-else which I fully expect to fail.  All as 
expected.
 
The next one that gets tried is:
 
> Trying 25, 24 -> 27:
>    25: r149:VNx2DF=float_extend(r141:VNx2SF)
>       REG_DEAD r141:VNx2SF
>    24: r148:VNx2DF=float_extend(r139:VNx2SF)
>       REG_DEAD r139:VNx2SF
>    27: r150:VNx2DF={(unspec[const_vector,r153:DI,0x2,0x2,0x1,0x7,vl:SI,vtype:SI,N/A:SI] 69)?r148:VNx2DF*r149:VNx2DF:unspec[zero:SI] 68}
>       REG_DEAD r149:VNx2DF
>       REG_DEAD r148:VNx2DF
>       REG_DEAD N/A:SI
>       REG_DEAD zero:SI
>       REG_EQUAL r148:VNx2DF*r149:VNx2DF
> Successfully matched this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>     (if_then_else:VNx2DF (unspec:VNx2BI [
>                 (const_vector:VNx2BI repeat [
>                         (const_int 1 [0x1])
>                     ])
>                 (reg:DI 153)
>                 (const_int 2 [0x2]) repeated x2
>                 (const_int 1 [0x1])
>                 (const_int 7 [0x7])
>                 (reg:SI 66 vl)
>                 (reg:SI 67 vtype)
>                 (reg:SI 69 N/A)
>             ] UNSPEC_VPREDICATE)
>         (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 141 [ vect__4.44 ]))
>             (float_extend:VNx2DF (reg:VNx2SF 139 [ vect__7.48 ])))
>         (unspec:VNx2DF [
>                 (reg:SI 0 zero)
>             ] UNSPEC_VUNDEF)))
> allowing combination of insns 24, 25 and 27
> original costs 4 + 4 + 4 = 12
> replacement cost 4
 
Note how it replaced both operands of the mult with extended versions 
and the pattern matches, as expected.
 
The point being that I don't think those helper patterns are needed to 
handle the problem you suggested they were there to handle.  Combine 
knows how to handle multiple substitutions just fine.
 
Right now I don't see a need for this patch.
 
 
 
Jeff
  
Jeff Law June 29, 2023, 11:39 p.m. UTC | #7
On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb <https://godbolt.org/z/d78646hWb>
So just to be explicit, I see no difference with that test before/after 
your proposed change.  Nor would I expect one based on my understanding 
of the patch.

The explicit conversions I see are because we need the output of the 
conversion in multiple vfmul instructions.  That won't be helped by the 
patch you've proposed.

To be more concrete:

>        vsetvli t1,t5,e32,mf2,ta,ma     # 99    [c=0 l=4]  vsetvldi
>         vle32.v v2,0(a4)        # 23    [c=4 l=4]  pred_movvnx2sf/1
>         vle32.v v1,0(a5)        # 25    [c=4 l=4]  pred_movvnx2sf/1
>         vsetvli t0,zero,e32,mf2,ta,ma   # 101   [c=0 l=4]  vsetvldi
>         vfwcvt.f.f.v    v3,v2   # 77    [c=4 l=4]  pred_extendvnx2df/0
>         vfwcvt.f.f.v    v2,v1   # 79    [c=4 l=4]  pred_extendvnx2df/0
>         vsetvli zero,t1,e32,mf2,ta,ma   # 102   [c=0 l=4]  vsetvl_discard_resultdi
>         vle32.v v5,0(a6)        # 31    [c=4 l=4]  pred_movvnx2sf/1
>         vle32.v v4,0(a7)        # 39    [c=4 l=4]  pred_movvnx2sf/1
>         vsetvli t0,zero,e32,mf2,ta,ma   # 103   [c=0 l=4]  vsetvldi
>         vfwcvt.f.f.v    v1,v5   # 81    [c=4 l=4]  pred_extendvnx2df/0
>         vsetvli zero,zero,e64,m1,ta,ma  # 104   [c=16 l=4]  vsetvl_vtype_change_only
>         vfmul.vv        v5,v2,v3        # 29    [c=4 l=4]  pred_mulvnx2df/2
>         vfmul.vv        v2,v1,v2        # 34    [c=4 l=4]  pred_mulvnx2df/2
>         vsetvli zero,t1,e64,m1,ta,ma    # 105   [c=0 l=4]  vsetvl_discard_resultdi
>         vse64.v v2,0(a1)        # 35    [c=4 l=4]  pred_storevnx2df
>         vse64.v v5,0(a0)        # 30    [c=4 l=4]  pred_storevnx2df
>         vsetvli t6,zero,e64,m1,ta,ma    # 106   [c=0 l=4]  vsetvldi
>         vfmul.vv        v1,v1,v3        # 37    [c=4 l=4]  pred_mulvnx2df/2
>         vsetvli zero,zero,e32,mf2,ta,ma # 107   [c=20 l=4]  vsetvl_vtype_change_only
>         vfwcvt.f.f.v    v2,v4   # 83    [c=4 l=4]  pred_extendvnx2df/0
>         vsetvli zero,t1,e64,m1,ta,ma    # 108   [c=0 l=4]  vsetvl_discard_resultdi
>         vse64.v v1,0(a2)        # 38    [c=4 l=4]  pred_storevnx2df
>         vsetvli t6,zero,e64,m1,ta,ma    # 109   [c=0 l=4]  vsetvldi
>         slli    t4,t1,2 # 22    [c=4 l=4]  ashldi3
>         slli    t3,t1,3 # 27    [c=4 l=4]  ashldi3
>         vfmul.vv        v1,v2,v3        # 42    [c=4 l=4]  pred_mulvnx2df/2


Note how the output of the explicit conversion done in insn 77 is used 
by the vfmul in insns 29, 37 and 42.  Similarly for the other explcit 
conversions.

Your pattern isn't going to help that problem.

You could model this as a dependency height reduction.  I think that 
will get you were you want to go.

You'll need a pattern that matches this:

> (parallel [     
>         (set (reg:VNx2DF 160 [ vect__11.15 ])
>             (if_then_else:VNx2DF (unspec:VNx2BI [
>                         (const_vector:VNx2BI repeat [
>                                 (const_int 1 [0x1])
>                             ])      
>                         (reg:DI 169)
>                         (const_int 2 [0x2]) repeated x2
>                         (const_int 1 [0x1]) 
>                         (const_int 7 [0x7])
>                         (reg:SI 66 vl)
>                         (reg:SI 67 vtype)
>                         (reg:SI 69 frm)
>                     ] UNSPEC_VPREDICATE)
>                 (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 ]))
>                     (float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
>                 (unspec:VNx2DF [
>                         (reg:SI 0 zero)
>                     ] UNSPEC_VUNDEF)))
>         (set (reg:VNx2DF 143 [ vect__8.14 ])
>             (float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 ])))
>         (set (reg:VNx2DF 145 [ vect__5.10 ])
>             (float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
>     ])

It'll need to be a define_insn_and_split as its a 3->3 splitter.  The 
split will emit the two extensions and the widening multiply as 3 
distinct insns.

This has two positive effects.  First the widening multiply is no longer 
data dependent on the float_extend and so it can issue when ever r144 
and r146 are ready rather than when r143 and r145 are ready.

The second effect is I think this pattern will end up matching all the 
multiplies in this sample code.  As a result all the float_extend insns 
you generated when splitting become dead and should be removed by DCE.


Jeff
  
Jeff Law June 29, 2023, 11:41 p.m. UTC | #8
On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb <https://godbolt.org/z/d78646hWb>
You patch doesn't help that code and your patch is a result of 
fundamentally misunderstanding combine's capabilities AFAICT.

Jeff
  
Jeff Law June 29, 2023, 11:48 p.m. UTC | #9
On 6/29/23 17:46, juzhe.zhong wrote:
> You should try the example check the codegen before and after the patch. 
> You will understand it.
I've already done that.  It makes _no_ difference on the godbold example.

Jeff
  
钟居哲 June 30, 2023, 12:44 a.m. UTC | #10
Hi, Jeff.

That's odd. I think maybe you should first clean up your environment ?
Or you didn't build up the toolchain correctly with this patch?

Compile option: --param=riscv-autovec-preference=scalable -O3 -ffast-math
Before this patch:
https://godbolt.org/z/Y5d44WMqs 

fail.s:

lw t5,0(sp)
ble t5,zero,.L5
.L3:
vsetvli t1,t5,e32,mf2,ta,ma
vle32.v v2,0(a4)
vle32.v v1,0(a5)
vsetvli t0,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v3,v2
vfwcvt.f.f.v v2,v1
vsetvli zero,t1,e32,mf2,ta,ma
vle32.v v5,0(a6)
vle32.v v4,0(a7)
vsetvli t0,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v1,v5
vsetvli zero,zero,e64,m1,ta,ma
vfmul.vv v5,v2,v3
vfmul.vv v2,v1,v2
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v2,0(a1)
vse64.v v5,0(a0)
vsetvli t6,zero,e64,m1,ta,ma
vfmul.vv v1,v1,v3
vsetvli zero,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v2,v4
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v1,0(a2)
vsetvli t6,zero,e64,m1,ta,ma
slli t4,t1,2
slli t3,t1,3
vfmul.vv v1,v2,v3
sub t5,t5,t1
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v1,0(a3)
add a4,a4,t4
add a5,a5,t4
add a0,a0,t3
add a6,a6,t4
add a1,a1,t3
add a2,a2,t3
add a7,a7,t4
add a3,a3,t3
bne t5,zero,.L3
.L5:
ret

After this patch:
pass.s:

lw t5,0(sp)
ble t5,zero,.L5
.L3:
vsetvli t1,t5,e32,mf2,ta,ma
vle32.v v1,0(a4)
vle32.v v3,0(a5)
vle32.v v2,0(a6)
vle32.v v4,0(a7)
vsetvli t6,zero,e32,mf2,ta,ma
vfwmul.vv v5,v3,v2
vfwmul.vv v6,v1,v3
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v6,0(a0)
vse64.v v5,0(a1)
vsetvli t6,zero,e32,mf2,ta,ma
slli t4,t1,2
slli t3,t1,3
vfwmul.vv v3,v2,v1
sub t5,t5,t1
vfwmul.vv v2,v1,v4
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v3,0(a2)
vse64.v v2,0(a3)
add a4,a4,t4
add a5,a5,t4
add a0,a0,t3
add a6,a6,t4
add a1,a1,t3
add a2,a2,t3
add a7,a7,t4
add a3,a3,t3
bne t5,zero,.L3
.L5:
ret

It's very obvious the codegen with this patch is perfect.

I have attached the .S in this patch.

I am not claiming that this patch solution is the only solution.

I am welcome you can provide another solution as long as you can make this codegen become the perfect codegen that this patch achieved.

I think maybe you should make sure you are using the correct toolchain that built with patch.

Thanks.


juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 07:48
To: juzhe.zhong
CC: gcc-patches; kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/29/23 17:46, juzhe.zhong wrote:
> You should try the example check the codegen before and after the patch. 
> You will understand it.
I've already done that.  It makes _no_ difference on the godbold example.
 
Jeff
  
Robin Dapp June 30, 2023, 10:14 a.m. UTC | #11
> The explicit conversions I see are because we need the output of the
> conversion in multiple vfmul instructions.  That won't be helped by
> the patch you've proposed.

FWIW on my local branch and the patch applied I see that the vfwmuls
are being generated (all of the vfmuls are replaced).

> It'll need to be a define_insn_and_split as its a 3->3 splitter.  The
> split will emit the two extensions and the widening multiply as 3
> distinct insns.

I tried this and while it worked for the first vfwmul the subsequent
ones are not being combined/optimized.  Now I'm not a combine expert
at all but it looks as if the source float_extends are being deleted

 deferring deletion of insn with uid = 39.
 deferring deletion of insn with uid = 37.

with that pattern successfully matched, while they are only "rescanned"
with the synthetic "single widen" one.  Them being deleted (or rather
absorbed by the vfwmul) no further combination is possible (until after
split?)

This seems to be a fundamental difference between the two approaches.
Maybe the "double widen" pattern can be adjusted to also handle this
or I did something wrong when writing the splitter?

With the "single widen" pattern, however, it works more or less
naturally therefore I'd still suggest going for it.

Regards
 Robin
  
Jeff Law June 30, 2023, 10:35 p.m. UTC | #12
On 6/30/23 04:14, Robin Dapp wrote:
>> The explicit conversions I see are because we need the output of the
>> conversion in multiple vfmul instructions.  That won't be helped by
>> the patch you've proposed.
> 
> FWIW on my local branch and the patch applied I see that the vfwmuls
> are being generated (all of the vfmuls are replaced).
> 
>> It'll need to be a define_insn_and_split as its a 3->3 splitter.  The
>> split will emit the two extensions and the widening multiply as 3
>> distinct insns.
> 
> I tried this and while it worked for the first vfwmul the subsequent
> ones are not being combined/optimized.  Now I'm not a combine expert
> at all but it looks as if the source float_extends are being deleted
> 
>   deferring deletion of insn with uid = 39.
>   deferring deletion of insn with uid = 37.
> 
> with that pattern successfully matched, while they are only "rescanned"
> with the synthetic "single widen" one.  Them being deleted (or rather
> absorbed by the vfwmul) no further combination is possible (until after
> split?)
> 
> This seems to be a fundamental difference between the two approaches.
> Maybe the "double widen" pattern can be adjusted to also handle this
> or I did something wrong when writing the splitter?
> 
> With the "single widen" pattern, however, it works more or less
> naturally therefore I'd still suggest going for it.
I'd hoped to have time to revisit all of this today, but I'm quickly 
running out of time.

There has to be some kind of mismatch between the patch or testcase or 
what we're looking at to judge success.

Monday and Tuesday are holidays in the US.  Naturally that means the 
rest of my work week is going to be busier than normal.  I don't want to 
hold things up unnecessarily.

While I really don't see the need to have the bridge pattern, I'm still 
willing to believe that I've missed something, which is why I wanted to 
dive into it myself.  For example, we have heuristics to avoid trying 
too many 4->n combine patterns and we might be tripping over that or who 
knows what.

So my suggestion is that if both of you are getting the desired code, 
then Robin handle the review side of the two patches that introduce the 
helper patterns.

Jeff
  
Robin Dapp July 1, 2023, 11:45 a.m. UTC | #13
> There has to be some kind of mismatch between the patch or testcase
> or what we're looking at to judge success.

Yeah I think the initially posted example was misleading because it
contained an already working example.

> While I really don't see the need to have the bridge pattern, I'm
> still willing to believe that I've missed something, which is why I
> wanted to dive into it myself.  For example, we have heuristics to
> avoid trying too many 4->n combine patterns and we might be tripping
> over that or who knows what.
> 
> So my suggestion is that if both of you are getting the desired code,
> then Robin handle the review side of the two patches that introduce
> the helper patterns.

I went over both patches again and given the context they seem
reasonable to me.  I'd propose go with both of them for now and - in
the meanwhile - I'm going to  brush up on my combine knowledge some
time in the next weeks and get back to this then, hopefully with a
better explanation than my last one.

Regards
 Robin
  
Robin Dapp July 3, 2023, 7:49 a.m. UTC | #14
> Thanks. Ok for trunk?

OK from my side.  As agreed with Jeff, I'm going to get back to this
and revisit/change if needed in the future.

Regards
 Robin
  
钟居哲 July 3, 2023, 8:42 a.m. UTC | #15
We failed to merge it since it's been rejected.
https://patchwork.sourceware.org/project/gcc/patch/20230628041512.188243-1-juzhe.zhong@rivai.ai/ 




juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-07-03 15:49
To: juzhe.zhong
CC: rdapp.gcc; Jeff Law; gcc-patches; kito.cheng; kito.cheng; palmer; palmer
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
> Thanks. Ok for trunk?
 
OK from my side.  As agreed with Jeff, I'm going to get back to this
and revisit/change if needed in the future.
 
Regards
Robin
  
Robin Dapp July 3, 2023, 8:44 a.m. UTC | #16
> We failed to merge it since it's been rejected.
> https://patchwork.sourceware.org/project/gcc/patch/20230628041512.188243-1-juzhe.zhong@rivai.ai/ <https://patchwork.sourceware.org/project/gcc/patch/20230628041512.188243-1-juzhe.zhong@rivai.ai/> 

Err, who rejected?  Or is this about the patch itself
that doesn't apply cleanly anymore?

Regards
 Robin
  
钟居哲 July 3, 2023, 8:45 a.m. UTC | #17
We can apply it but not sure why the patchwork shows it's rejected.



juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-07-03 16:44
To: juzhe.zhong@rivai.ai
CC: rdapp.gcc; jeffreyalaw; gcc-patches; kito.cheng; Kito.cheng; palmer; palmer
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
> We failed to merge it since it's been rejected.
> https://patchwork.sourceware.org/project/gcc/patch/20230628041512.188243-1-juzhe.zhong@rivai.ai/ <https://patchwork.sourceware.org/project/gcc/patch/20230628041512.188243-1-juzhe.zhong@rivai.ai/> 
 
Err, who rejected?  Or is this about the patch itself
that doesn't apply cleanly anymore?
 
Regards
Robin
  
Robin Dapp July 3, 2023, 8:49 a.m. UTC | #18
On 7/3/23 10:45, juzhe.zhong@rivai.ai wrote:
> We can apply it but not sure why the patchwork shows it's rejected.

I believe it also failed for me locally because the order of
patterns in autovec-opt.md was somehow different.  The one attached
worked for me though after some minor merge adjustments on my branch.

Regards
 Robin

From 29b12a473a31b2caa64fa2d1d97920a460ced0a2 Mon Sep 17 00:00:00 2001
From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Wed, 28 Jun 2023 12:15:12 +0800
Subject: [PATCH] RISC-V: Support vfwmul.vv combine lowering

Consider the following complicate case:
#define TEST_TYPE(TYPE1, TYPE2)                                                \
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      {                                                                        \
	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
      }                                                                        \
  }

TEST_TYPE (double, float)

Such complicate situation, Combine PASS can not combine extension of both operands on the fly.
So the combine PASS will first try to combine one of the combine extension, and then combine
the other. The combine flow is as follows:

Original IR:
(set (reg 0) (float_extend: (reg 1))
(set (reg 3) (float_extend: (reg 2))
(set (reg 4) (mult: (reg 0) (reg 3))

First step of combine:
(set (reg 3) (float_extend: (reg 2))
(set (reg 4) (mult: (float_extend: (reg 1) (reg 3))

Second step of combine:
(set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))

So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL pattern in autovec-opt.md
which is (set (reg 0) (mult (float_extend (reg 1) (reg 2)))).

gcc/ChangeLog:

        * config/riscv/autovec-opt.md (@pred_single_widen_mul<any_extend:su><mode>): Change "@" into "*" in pattern name which simplifies build files.
        (*pred_single_widen_mul<any_extend:su><mode>): Ditto.
        (*pred_single_widen_mul<mode>): New pattern.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/widen/widen-3.c: Add floating-point.
        * gcc.target/riscv/rvv/autovec/widen/widen-7.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c: New test.
---
 gcc/config/riscv/autovec-opt.md               | 39 +++++++++++++++++++
 .../riscv/rvv/autovec/widen/widen-3.c         |  7 +++-
 .../riscv/rvv/autovec/widen/widen-7.c         |  7 +++-
 .../rvv/autovec/widen/widen-complicate-3.c    |  7 +++-
 .../riscv/rvv/autovec/widen/widen_run-3.c     |  5 ++-
 .../riscv/rvv/autovec/widen/widen_run-7.c     |  5 ++-
 .../rvv/autovec/widen/widen_run_zvfh-3.c      | 28 +++++++++++++
 .../rvv/autovec/widen/widen_run_zvfh-7.c      | 28 +++++++++++++
 8 files changed, 116 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index fd9cd27f50a..99b609a99d9 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -406,6 +406,45 @@ (define_insn "*pred_extract_first_sextsi<mode>"
   [(set_attr "type" "vimovvx")
    (set_attr "mode" "<MODE>")])
 
+;; We don't have vfwmul.wv instruction like vfwadd.wv in RVV.
+;; This pattern is an intermediate RTL IR as a pseudo vfwmul.wv to enhance
+;; optimization of instructions combine.
+(define_insn_and_split "*pred_single_widen_mul<mode>"
+  [(set (match_operand:VWEXTF 0 "register_operand"                  "=&vr,  &vr")
+       (if_then_else:VWEXTF
+         (unspec:<VM>
+           [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
+            (match_operand 5 "vector_length_operand"              "   rK,   rK")
+            (match_operand 6 "const_int_operand"                  "    i,    i")
+            (match_operand 7 "const_int_operand"                  "    i,    i")
+            (match_operand 8 "const_int_operand"                  "    i,    i")
+            (match_operand 9 "const_int_operand"                  "    i,    i")
+            (reg:SI VL_REGNUM)
+            (reg:SI VTYPE_REGNUM)
+            (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
+         (mult:VWEXTF
+           (float_extend:VWEXTF
+             (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   vr"))
+           (match_operand:VWEXTF 3 "register_operand"             "   vr,   vr"))
+         (match_operand:VWEXTF 2 "vector_merge_operand"           "   vu,    0")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_extend (<MODE>mode);
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    rtx ops[] = {tmp, operands[4]};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops);
+
+    emit_insn (gen_pred (MULT, <MODE>mode, operands[0], operands[1], operands[2],
+                        operands[3], tmp, operands[5], operands[6],
+                        operands[7], operands[8], operands[9]));
+    DONE;
+  }
+  [(set_attr "type" "vfwmul")
+   (set_attr "mode" "<MODE>")])
+
 ;; -------------------------------------------------------------------------
 ;; ---- [FP] VFWMACC
 ;; -------------------------------------------------------------------------
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
index 609a5c09f70..b2b14405902 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <stdint-gcc.h>
 
@@ -19,9 +19,12 @@
   TEST_TYPE (int32_t, int16_t)                                                 \
   TEST_TYPE (uint32_t, uint16_t)                                               \
   TEST_TYPE (int64_t, int32_t)                                                 \
-  TEST_TYPE (uint64_t, uint32_t)
+  TEST_TYPE (uint64_t, uint32_t)                                               \
+  TEST_TYPE (float, _Float16)                                                  \
+  TEST_TYPE (double, float)
 
 TEST_ALL ()
 
 /* { dg-final { scan-assembler-times {\tvwmul\.vv} 3 } } */
 /* { dg-final { scan-assembler-times {\tvwmulu\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {\tvfwmul\.vv} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
index cc43d9ba3fe..3806e8b98ee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <stdint-gcc.h>
 
@@ -19,9 +19,12 @@
   TEST_TYPE (int32_t, int16_t)                                                 \
   TEST_TYPE (uint32_t, uint16_t)                                               \
   TEST_TYPE (int64_t, int32_t)                                                 \
-  TEST_TYPE (uint64_t, uint32_t)
+  TEST_TYPE (uint64_t, uint32_t)                                               \
+  TEST_TYPE (float, _Float16)                                                  \
+  TEST_TYPE (double, float)
 
 TEST_ALL ()
 
 /* { dg-final { scan-assembler-times {\tvsext\.vf2} 3 } } */
 /* { dg-final { scan-assembler-times {\tvzext\.vf2} 3 } } */
+/* { dg-final { scan-assembler-times {\tvfwcvt} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
index e1fd79430c3..1515374890d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <stdint-gcc.h>
 
@@ -24,9 +24,12 @@
   TEST_TYPE (int32_t, int16_t)                                                 \
   TEST_TYPE (uint32_t, uint16_t)                                               \
   TEST_TYPE (int64_t, int32_t)                                                 \
-  TEST_TYPE (uint64_t, uint32_t)
+  TEST_TYPE (uint64_t, uint32_t)                                               \
+  TEST_TYPE (float, _Float16)                                                  \
+  TEST_TYPE (double, float)
 
 TEST_ALL ()
 
 /* { dg-final { scan-assembler-times {\tvwmul\.vv} 12 } } */
 /* { dg-final { scan-assembler-times {\tvwmulu\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvfwmul\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
index beb0cc2b58b..b7dd60fa8e8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <assert.h>
 #include "widen-3.c"
@@ -25,7 +25,8 @@
   RUN (int32_t, int16_t, -32768)                                               \
   RUN (uint32_t, uint16_t, 65535)                                              \
   RUN (int64_t, int32_t, -2147483648)                                          \
-  RUN (uint64_t, uint32_t, 4294967295)
+  RUN (uint64_t, uint32_t, 4294967295)                                         \
+  RUN (double, float, -2147483648)
 
 int
 main ()
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
index 4abddd5d718..ab29f4a0f70 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <assert.h>
 #include "widen-7.c"
@@ -25,7 +25,8 @@
   RUN (int32_t, int16_t, -32768)                                               \
   RUN (uint32_t, uint16_t, 65535)                                              \
   RUN (int64_t, int32_t, -2147483648)                                          \
-  RUN (uint64_t, uint32_t, 4294967295)
+  RUN (uint64_t, uint32_t, 4294967295)                                         \
+  RUN (double, float, -2147483648)
 
 int
 main ()
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
new file mode 100644
index 00000000000..c3efd0b97bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-3.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE2 b##TYPE2[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % 8723;                                          \
+      b##TYPE2[i] = LIMIT + i & 1964;                                          \
+    }                                                                          \
+  vwmul_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ);                  \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i] == ((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]));
+
+#define RUN_ALL() RUN (float, _Float16, -32768)
+
+int
+main ()
+{
+  RUN_ALL ()
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c
new file mode 100644
index 00000000000..60e2401c088
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-7.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE1 b##TYPE1[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % LIMIT;                                         \
+      b##TYPE1[i] = LIMIT + i & LIMIT;                                         \
+    }                                                                          \
+  vwmul_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE1, SZ);                  \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i] == (((TYPE1) a##TYPE2[i]) * b##TYPE1[i]));
+
+#define RUN_ALL() RUN (float, _Float16, -32768)
+
+int
+main ()
+{
+  RUN_ALL ()
+}
  
钟居哲 July 3, 2023, 8:51 a.m. UTC | #19
OK. Thanks. Will commit with your cleanup patch.



juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-07-03 16:49
To: juzhe.zhong@rivai.ai
CC: rdapp.gcc; jeffreyalaw; gcc-patches; kito.cheng; Kito.cheng; palmer; palmer
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
On 7/3/23 10:45, juzhe.zhong@rivai.ai wrote:
> We can apply it but not sure why the patchwork shows it's rejected.
 
I believe it also failed for me locally because the order of
patterns in autovec-opt.md was somehow different.  The one attached
worked for me though after some minor merge adjustments on my branch.
 
Regards
Robin
 
From 29b12a473a31b2caa64fa2d1d97920a460ced0a2 Mon Sep 17 00:00:00 2001
From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Wed, 28 Jun 2023 12:15:12 +0800
Subject: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
Consider the following complicate case:
#define TEST_TYPE(TYPE1, TYPE2)                                                \
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      {                                                                        \
dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
      }                                                                        \
  }
 
TEST_TYPE (double, float)
 
Such complicate situation, Combine PASS can not combine extension of both operands on the fly.
So the combine PASS will first try to combine one of the combine extension, and then combine
the other. The combine flow is as follows:
 
Original IR:
(set (reg 0) (float_extend: (reg 1))
(set (reg 3) (float_extend: (reg 2))
(set (reg 4) (mult: (reg 0) (reg 3))
 
First step of combine:
(set (reg 3) (float_extend: (reg 2))
(set (reg 4) (mult: (float_extend: (reg 1) (reg 3))
 
Second step of combine:
(set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))
 
So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL pattern in autovec-opt.md
which is (set (reg 0) (mult (float_extend (reg 1) (reg 2)))).
 
gcc/ChangeLog:
 
        * config/riscv/autovec-opt.md (@pred_single_widen_mul<any_extend:su><mode>): Change "@" into "*" in pattern name which simplifies build files.
        (*pred_single_widen_mul<any_extend:su><mode>): Ditto.
        (*pred_single_widen_mul<mode>): New pattern.
 
gcc/testsuite/ChangeLog:
 
        * gcc.target/riscv/rvv/autovec/widen/widen-3.c: Add floating-point.
        * gcc.target/riscv/rvv/autovec/widen/widen-7.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: Ditto.
        * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c: New test.
---
gcc/config/riscv/autovec-opt.md               | 39 +++++++++++++++++++
.../riscv/rvv/autovec/widen/widen-3.c         |  7 +++-
.../riscv/rvv/autovec/widen/widen-7.c         |  7 +++-
.../rvv/autovec/widen/widen-complicate-3.c    |  7 +++-
.../riscv/rvv/autovec/widen/widen_run-3.c     |  5 ++-
.../riscv/rvv/autovec/widen/widen_run-7.c     |  5 ++-
.../rvv/autovec/widen/widen_run_zvfh-3.c      | 28 +++++++++++++
.../rvv/autovec/widen/widen_run_zvfh-7.c      | 28 +++++++++++++
8 files changed, 116 insertions(+), 10 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index fd9cd27f50a..99b609a99d9 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -406,6 +406,45 @@ (define_insn "*pred_extract_first_sextsi<mode>"
   [(set_attr "type" "vimovvx")
    (set_attr "mode" "<MODE>")])
+;; We don't have vfwmul.wv instruction like vfwadd.wv in RVV.
+;; This pattern is an intermediate RTL IR as a pseudo vfwmul.wv to enhance
+;; optimization of instructions combine.
+(define_insn_and_split "*pred_single_widen_mul<mode>"
+  [(set (match_operand:VWEXTF 0 "register_operand"                  "=&vr,  &vr")
+       (if_then_else:VWEXTF
+         (unspec:<VM>
+           [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
+            (match_operand 5 "vector_length_operand"              "   rK,   rK")
+            (match_operand 6 "const_int_operand"                  "    i,    i")
+            (match_operand 7 "const_int_operand"                  "    i,    i")
+            (match_operand 8 "const_int_operand"                  "    i,    i")
+            (match_operand 9 "const_int_operand"                  "    i,    i")
+            (reg:SI VL_REGNUM)
+            (reg:SI VTYPE_REGNUM)
+            (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
+         (mult:VWEXTF
+           (float_extend:VWEXTF
+             (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   vr"))
+           (match_operand:VWEXTF 3 "register_operand"             "   vr,   vr"))
+         (match_operand:VWEXTF 2 "vector_merge_operand"           "   vu,    0")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_extend (<MODE>mode);
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    rtx ops[] = {tmp, operands[4]};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops);
+
+    emit_insn (gen_pred (MULT, <MODE>mode, operands[0], operands[1], operands[2],
+                        operands[3], tmp, operands[5], operands[6],
+                        operands[7], operands[8], operands[9]));
+    DONE;
+  }
+  [(set_attr "type" "vfwmul")
+   (set_attr "mode" "<MODE>")])
+
;; -------------------------------------------------------------------------
;; ---- [FP] VFWMACC
;; -------------------------------------------------------------------------
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
index 609a5c09f70..b2b14405902 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
#include <stdint-gcc.h>
@@ -19,9 +19,12 @@
   TEST_TYPE (int32_t, int16_t)                                                 \
   TEST_TYPE (uint32_t, uint16_t)                                               \
   TEST_TYPE (int64_t, int32_t)                                                 \
-  TEST_TYPE (uint64_t, uint32_t)
+  TEST_TYPE (uint64_t, uint32_t)                                               \
+  TEST_TYPE (float, _Float16)                                                  \
+  TEST_TYPE (double, float)
TEST_ALL ()
/* { dg-final { scan-assembler-times {\tvwmul\.vv} 3 } } */
/* { dg-final { scan-assembler-times {\tvwmulu\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {\tvfwmul\.vv} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
index cc43d9ba3fe..3806e8b98ee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
#include <stdint-gcc.h>
@@ -19,9 +19,12 @@
   TEST_TYPE (int32_t, int16_t)                                                 \
   TEST_TYPE (uint32_t, uint16_t)                                               \
   TEST_TYPE (int64_t, int32_t)                                                 \
-  TEST_TYPE (uint64_t, uint32_t)
+  TEST_TYPE (uint64_t, uint32_t)                                               \
+  TEST_TYPE (float, _Float16)                                                  \
+  TEST_TYPE (double, float)
TEST_ALL ()
/* { dg-final { scan-assembler-times {\tvsext\.vf2} 3 } } */
/* { dg-final { scan-assembler-times {\tvzext\.vf2} 3 } } */
+/* { dg-final { scan-assembler-times {\tvfwcvt} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
index e1fd79430c3..1515374890d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
#include <stdint-gcc.h>
@@ -24,9 +24,12 @@
   TEST_TYPE (int32_t, int16_t)                                                 \
   TEST_TYPE (uint32_t, uint16_t)                                               \
   TEST_TYPE (int64_t, int32_t)                                                 \
-  TEST_TYPE (uint64_t, uint32_t)
+  TEST_TYPE (uint64_t, uint32_t)                                               \
+  TEST_TYPE (float, _Float16)                                                  \
+  TEST_TYPE (double, float)
TEST_ALL ()
/* { dg-final { scan-assembler-times {\tvwmul\.vv} 12 } } */
/* { dg-final { scan-assembler-times {\tvwmulu\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvfwmul\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
index beb0cc2b58b..b7dd60fa8e8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
@@ -1,5 +1,5 @@
/* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
#include <assert.h>
#include "widen-3.c"
@@ -25,7 +25,8 @@
   RUN (int32_t, int16_t, -32768)                                               \
   RUN (uint32_t, uint16_t, 65535)                                              \
   RUN (int64_t, int32_t, -2147483648)                                          \
-  RUN (uint64_t, uint32_t, 4294967295)
+  RUN (uint64_t, uint32_t, 4294967295)                                         \
+  RUN (double, float, -2147483648)
int
main ()
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
index 4abddd5d718..ab29f4a0f70 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
@@ -1,5 +1,5 @@
/* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
#include <assert.h>
#include "widen-7.c"
@@ -25,7 +25,8 @@
   RUN (int32_t, int16_t, -32768)                                               \
   RUN (uint32_t, uint16_t, 65535)                                              \
   RUN (int64_t, int32_t, -2147483648)                                          \
-  RUN (uint64_t, uint32_t, 4294967295)
+  RUN (uint64_t, uint32_t, 4294967295)                                         \
+  RUN (double, float, -2147483648)
int
main ()
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
new file mode 100644
index 00000000000..c3efd0b97bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-3.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE2 b##TYPE2[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % 8723;                                          \
+      b##TYPE2[i] = LIMIT + i & 1964;                                          \
+    }                                                                          \
+  vwmul_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ);                  \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i] == ((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]));
+
+#define RUN_ALL() RUN (float, _Float16, -32768)
+
+int
+main ()
+{
+  RUN_ALL ()
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c
new file mode 100644
index 00000000000..60e2401c088
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-7.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE1 b##TYPE1[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % LIMIT;                                         \
+      b##TYPE1[i] = LIMIT + i & LIMIT;                                         \
+    }                                                                          \
+  vwmul_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE1, SZ);                  \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i] == (((TYPE1) a##TYPE2[i]) * b##TYPE1[i]));
+
+#define RUN_ALL() RUN (float, _Float16, -32768)
+
+int
+main ()
+{
+  RUN_ALL ()
+}
-- 
2.41.0
  
Jeff Law July 7, 2023, 9:11 p.m. UTC | #20
On 7/3/23 02:42, juzhe.zhong@rivai.ai wrote:
> We failed to merge it since it's been rejected.
> https://patchwork.sourceware.org/project/gcc/patch/20230628041512.188243-1-juzhe.zhong@rivai.ai/ <https://patchwork.sourceware.org/project/gcc/patch/20230628041512.188243-1-juzhe.zhong@rivai.ai/>
That was based on the belief that the bridging patterns should not be 
needed.  With the decision to move forward with those patterns this 
patch should be reconsidered.

jeff
  
钟居哲 July 7, 2023, 11:05 p.m. UTC | #21
Sure. 

We can come back to see in the future which doesn't change this codegen quality:
https://godbolt.org/z/d6rWPTWeW 



juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-07-08 05:11
To: juzhe.zhong@rivai.ai; Robin Dapp
CC: gcc-patches; kito.cheng; Kito.cheng; palmer; palmer
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 7/3/23 02:42, juzhe.zhong@rivai.ai wrote:
> We failed to merge it since it's been rejected.
> https://patchwork.sourceware.org/project/gcc/patch/20230628041512.188243-1-juzhe.zhong@rivai.ai/ <https://patchwork.sourceware.org/project/gcc/patch/20230628041512.188243-1-juzhe.zhong@rivai.ai/>
That was based on the belief that the bridging patterns should not be 
needed.  With the decision to move forward with those patterns this 
patch should be reconsidered.
 
jeff
  

Patch

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 28040805b23..1fcd55ac2a0 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -21,7 +21,7 @@ 
 ;; We don't have vwmul.wv instruction like vwadd.wv in RVV.
 ;; This pattern is an intermediate RTL IR as a pseudo vwmul.wv to enhance
 ;; optimization of instructions combine.
-(define_insn_and_split "@pred_single_widen_mul<any_extend:su><mode>"
+(define_insn_and_split "*pred_single_widen_mul<any_extend:su><mode>"
   [(set (match_operand:VWEXTI 0 "register_operand"                  "=&vr,&vr")
 	(if_then_else:VWEXTI
 	  (unspec:<VM>
@@ -405,3 +405,42 @@ 
   "vmv.x.s\t%0,%1"
   [(set_attr "type" "vimovvx")
    (set_attr "mode" "<MODE>")])
+
+;; We don't have vfwmul.wv instruction like vfwadd.wv in RVV.
+;; This pattern is an intermediate RTL IR as a pseudo vfwmul.wv to enhance
+;; optimization of instructions combine.
+(define_insn_and_split "*pred_single_widen_mul<mode>"
+  [(set (match_operand:VWEXTF 0 "register_operand"                  "=&vr,  &vr")
+	(if_then_else:VWEXTF
+	  (unspec:<VM>
+	    [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"              "   rK,   rK")
+	     (match_operand 6 "const_int_operand"                  "    i,    i")
+	     (match_operand 7 "const_int_operand"                  "    i,    i")
+	     (match_operand 8 "const_int_operand"                  "    i,    i")
+	     (match_operand 9 "const_int_operand"                  "    i,    i")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)
+	     (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
+	  (mult:VWEXTF
+	    (float_extend:VWEXTF
+	      (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   vr"))
+	    (match_operand:VWEXTF 3 "register_operand"             "   vr,   vr"))
+	  (match_operand:VWEXTF 2 "vector_merge_operand"           "   vu,    0")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_extend (<MODE>mode);
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    rtx ops[] = {tmp, operands[4]};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops);
+
+    emit_insn (gen_pred (MULT, <MODE>mode, operands[0], operands[1], operands[2],
+			 operands[3], tmp, operands[5], operands[6],
+			 operands[7], operands[8], operands[9]));
+    DONE;
+  }
+  [(set_attr "type" "vfwmul")
+   (set_attr "mode" "<MODE>")])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
index 609a5c09f70..b2b14405902 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <stdint-gcc.h>
 
@@ -19,9 +19,12 @@ 
   TEST_TYPE (int32_t, int16_t)                                                 \
   TEST_TYPE (uint32_t, uint16_t)                                               \
   TEST_TYPE (int64_t, int32_t)                                                 \
-  TEST_TYPE (uint64_t, uint32_t)
+  TEST_TYPE (uint64_t, uint32_t)                                               \
+  TEST_TYPE (float, _Float16)                                                  \
+  TEST_TYPE (double, float)
 
 TEST_ALL ()
 
 /* { dg-final { scan-assembler-times {\tvwmul\.vv} 3 } } */
 /* { dg-final { scan-assembler-times {\tvwmulu\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {\tvfwmul\.vv} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
index cc43d9ba3fe..3806e8b98ee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <stdint-gcc.h>
 
@@ -19,9 +19,12 @@ 
   TEST_TYPE (int32_t, int16_t)                                                 \
   TEST_TYPE (uint32_t, uint16_t)                                               \
   TEST_TYPE (int64_t, int32_t)                                                 \
-  TEST_TYPE (uint64_t, uint32_t)
+  TEST_TYPE (uint64_t, uint32_t)                                               \
+  TEST_TYPE (float, _Float16)                                                  \
+  TEST_TYPE (double, float)
 
 TEST_ALL ()
 
 /* { dg-final { scan-assembler-times {\tvsext\.vf2} 3 } } */
 /* { dg-final { scan-assembler-times {\tvzext\.vf2} 3 } } */
+/* { dg-final { scan-assembler-times {\tvfwcvt} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
index e1fd79430c3..1515374890d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <stdint-gcc.h>
 
@@ -24,9 +24,12 @@ 
   TEST_TYPE (int32_t, int16_t)                                                 \
   TEST_TYPE (uint32_t, uint16_t)                                               \
   TEST_TYPE (int64_t, int32_t)                                                 \
-  TEST_TYPE (uint64_t, uint32_t)
+  TEST_TYPE (uint64_t, uint32_t)                                               \
+  TEST_TYPE (float, _Float16)                                                  \
+  TEST_TYPE (double, float)
 
 TEST_ALL ()
 
 /* { dg-final { scan-assembler-times {\tvwmul\.vv} 12 } } */
 /* { dg-final { scan-assembler-times {\tvwmulu\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvfwmul\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
index beb0cc2b58b..b7dd60fa8e8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <assert.h>
 #include "widen-3.c"
@@ -25,7 +25,8 @@ 
   RUN (int32_t, int16_t, -32768)                                               \
   RUN (uint32_t, uint16_t, 65535)                                              \
   RUN (int64_t, int32_t, -2147483648)                                          \
-  RUN (uint64_t, uint32_t, 4294967295)
+  RUN (uint64_t, uint32_t, 4294967295)                                         \
+  RUN (double, float, -2147483648)
 
 int
 main ()
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
index 4abddd5d718..ab29f4a0f70 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
@@ -1,5 +1,5 @@ 
 /* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
 
 #include <assert.h>
 #include "widen-7.c"
@@ -25,7 +25,8 @@ 
   RUN (int32_t, int16_t, -32768)                                               \
   RUN (uint32_t, uint16_t, 65535)                                              \
   RUN (int64_t, int32_t, -2147483648)                                          \
-  RUN (uint64_t, uint32_t, 4294967295)
+  RUN (uint64_t, uint32_t, 4294967295)                                         \
+  RUN (double, float, -2147483648)
 
 int
 main ()
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
new file mode 100644
index 00000000000..c3efd0b97bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
@@ -0,0 +1,28 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-3.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE2 b##TYPE2[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % 8723;                                          \
+      b##TYPE2[i] = LIMIT + i & 1964;                                          \
+    }                                                                          \
+  vwmul_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ);                  \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i] == ((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]));
+
+#define RUN_ALL() RUN (float, _Float16, -32768)
+
+int
+main ()
+{
+  RUN_ALL ()
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c
new file mode 100644
index 00000000000..60e2401c088
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c
@@ -0,0 +1,28 @@ 
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-7.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE1 b##TYPE1[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % LIMIT;                                         \
+      b##TYPE1[i] = LIMIT + i & LIMIT;                                         \
+    }                                                                          \
+  vwmul_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE1, SZ);                  \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i] == (((TYPE1) a##TYPE2[i]) * b##TYPE1[i]));
+
+#define RUN_ALL() RUN (float, _Float16, -32768)
+
+int
+main ()
+{
+  RUN_ALL ()
+}