[v1,1/4] RISC-V: Combine vec_duplicate + vwaddu.vv to vwaddu.vx on GR2VR cost

Message ID 20250912233514.3586587-2-pan2.li@intel.com
State Superseded
Delegated to: Robin Dapp
Headers
Series RISC-V: Combine vec_duplicate + v{widen}u.vv to v{widen}u.vx on GR2VR cost |

Checks

Context Check Description
rivoscibot/toolchain-ci-rivos-lint warning Lint failed
rivoscibot/toolchain-ci-rivos-apply-patch success Patch applied
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-test success Testing passed

Commit Message

Li, Pan2 Sept. 12, 2025, 2:14 p.m. UTC
  From: Pan Li <pan2.li@intel.com>

This patch would like to combine the vec_duplicate + vwaddu.vv to the
vwaddu.vx.  From example as below code.  The related pattern will depend
on the cost of vec_duplicate from GR2VR.  Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.

Assume we have example code like below, GR2VR cost is 0.

Before this patch:
  11       beq a3,zero,.L8
  12       vsetvli a5,zero,e32,m1,ta,ma
  13       vmv.v.x v2,a2
  ...
  16   .L3:
  17       vsetvli a5,a3,e32,m1,ta,ma
  ...
  22       vwaddu.vv v1,v2,v3
  ...
  25       bne a3,zero,.L3

After this patch:
  11       beq a3,zero,.L8
  ...
  14    .L3:
  15       vsetvli a5,a3,e32,m1,ta,ma
  ...
  20       vwaddu.vx v1,a2,v3
  ...
  23       bne a3,zero,.L3

The pattern of this patch only works on DImode, aka below pattern.
v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
  + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));

Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we loss
this extend op after expand.

For uint16_t => uint32_t we have:
(set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))

For uint32_t => uint64_t we have:
(set (reg:DI 148 [ _6 ])
     (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))

We can see there is no zero_extend for uint16_t to uint32_t, and we
cannot hit the pattern above.  So the combine will try below pattern
for uint16_t to uint32_t.

v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
  + (vec_dup:RVVM1SImode (subreg:SIMode (:DImode x2:SImode)))

But it cannot match the vwaddu sematics, thus we need another handing
for the vwaddu.vv for uint16_t to uint32_t, as well as the uint8_t to
uint16_t.

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*widen_frist_<any_extend:su>_vx_<mode>):
	Add helper bridge pattern for vwaddu.vx combine.
	(*widen_<any_widen_binop:optab>_<any_extend:su>_vx_<mode>): Add
	new pattern to match vwaddu.vx combine.
	* config/riscv/iterators.md: Add code attr to get extend CODE.
	* config/riscv/vector-iterators.md: Add Dmode iterator for
	widen.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec-opt.md      | 44 ++++++++++++++++++++++++++++
 gcc/config/riscv/iterators.md        |  3 ++
 gcc/config/riscv/vector-iterators.md | 16 ++++++++++
 3 files changed, 63 insertions(+)
  

Comments

Robin Dapp Sept. 13, 2025, 5:01 a.m. UTC | #1
Hi Pan,

> The pattern of this patch only works on DImode, aka below pattern.
> v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
>   + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));
>
> Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we loss
> this extend op after expand.
>
> For uint16_t => uint32_t we have:
> (set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))
>
> For uint32_t => uint64_t we have:
> (set (reg:DI 148 [ _6 ])
>      (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))
>
> We can see there is no zero_extend for uint16_t to uint32_t, and we
> cannot hit the pattern above.  So the combine will try below pattern
> for uint16_t to uint32_t.
>
> v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
>   + (vec_dup:RVVM1SImode (subreg:SIMode (:DImode x2:SImode)))
>
> But it cannot match the vwaddu sematics, thus we need another handing
> for the vwaddu.vv for uint16_t to uint32_t, as well as the uint8_t to
> uint16_t.

Where does the actual HI->SI extension happen then?  No chance we see it
during combine/late-combine?

> diff --git a/gcc/config/riscv/autovec-opt.md 
> b/gcc/config/riscv/autovec-opt.md
> index 02f19bc6a42..fefd2dc63c3 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -1868,6 +1868,50 @@ (define_insn_and_split "*mul_minus_vx_<mode>"
>    }
>    [(set_attr "type" "vimuladd")])
>  
> +(define_insn_and_split "*widen_frist_<any_extend:su>_vx_<mode>"

first :)

> + [(set (match_operand:VWEXTI_D   0 "register_operand")
> +       (vec_duplicate:VWEXTI_D
> +	 (any_extend:<VEL>
> +	 (match_operand:<VSUBEL> 1 "register_operand"))))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +    machine_mode d_trunc_mode = <V_DOUBLE_TRUNC>mode;
> +    rtx vec_dup = gen_reg_rtx (d_trunc_mode);
> +    insn_code icode = code_for_pred_broadcast (d_trunc_mode);
> +    rtx vec_dup_ops[] = {vec_dup, operands[1]};
> +    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, vec_dup_ops);
> +
> +    icode = code_for_pred_vf2 (<any_extend:CODE>, <MODE>mode);
> +    rtx extend_ops[] = {operands[0], vec_dup};
> +    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, extend_ops);

Technically, shouldn't it be the other way around?  Like first extend and then 
broadcast?
  
Li, Pan2 Sept. 13, 2025, 5:58 a.m. UTC | #2
Thanks Robin for comments.

> Where does the actual HI->SI extension happen then?  No chance we see it
> during combine/late-combine?

There is no HI->SI extension from the 272.expand and combine dump, so there is no change in RTL or
It is unsafe here. I think we need additional fix to make it work, may related to int promotion I guess.

> first :)

Oops, will fix the typo.

> Technically, shouldn't it be the other way around?  Like first extend and then 
> broadcast?

In theory yes, may be I am over-considering here. If it is equality, bring the data
to vector first may be has better opportunities in rvv env.
It is totally to keep original sematics if it is useless or incorrect.

Pan 

-----Original Message-----
From: Robin Dapp <rdapp.gcc@gmail.com> 
Sent: Saturday, September 13, 2025 1:02 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; rdapp.gcc@gmail.com; Chen, Ken <ken.chen@intel.com>; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v1 1/4] RISC-V: Combine vec_duplicate + vwaddu.vv to vwaddu.vx on GR2VR cost

Hi Pan,

> The pattern of this patch only works on DImode, aka below pattern.
> v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
>   + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));
>
> Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we loss
> this extend op after expand.
>
> For uint16_t => uint32_t we have:
> (set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))
>
> For uint32_t => uint64_t we have:
> (set (reg:DI 148 [ _6 ])
>      (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))
>
> We can see there is no zero_extend for uint16_t to uint32_t, and we
> cannot hit the pattern above.  So the combine will try below pattern
> for uint16_t to uint32_t.
>
> v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
>   + (vec_dup:RVVM1SImode (subreg:SIMode (:DImode x2:SImode)))
>
> But it cannot match the vwaddu sematics, thus we need another handing
> for the vwaddu.vv for uint16_t to uint32_t, as well as the uint8_t to
> uint16_t.

Where does the actual HI->SI extension happen then?  No chance we see it
during combine/late-combine?

> diff --git a/gcc/config/riscv/autovec-opt.md 
> b/gcc/config/riscv/autovec-opt.md
> index 02f19bc6a42..fefd2dc63c3 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -1868,6 +1868,50 @@ (define_insn_and_split "*mul_minus_vx_<mode>"
>    }
>    [(set_attr "type" "vimuladd")])
>  
> +(define_insn_and_split "*widen_frist_<any_extend:su>_vx_<mode>"

first :)

> + [(set (match_operand:VWEXTI_D   0 "register_operand")
> +       (vec_duplicate:VWEXTI_D
> +	 (any_extend:<VEL>
> +	 (match_operand:<VSUBEL> 1 "register_operand"))))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +    machine_mode d_trunc_mode = <V_DOUBLE_TRUNC>mode;
> +    rtx vec_dup = gen_reg_rtx (d_trunc_mode);
> +    insn_code icode = code_for_pred_broadcast (d_trunc_mode);
> +    rtx vec_dup_ops[] = {vec_dup, operands[1]};
> +    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, vec_dup_ops);
> +
> +    icode = code_for_pred_vf2 (<any_extend:CODE>, <MODE>mode);
> +    rtx extend_ops[] = {operands[0], vec_dup};
> +    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, extend_ops);

Technically, shouldn't it be the other way around?  Like first extend and then 
broadcast?

-- 
Regards
 Robin
  
Robin Dapp Sept. 15, 2025, 6:44 p.m. UTC | #3
>> Where does the actual HI->SI extension happen then?  No chance we see it
>> during combine/late-combine?
>
> There is no HI->SI extension from the 272.expand and combine dump, so there 
> is no change in RTL or
> It is unsafe here. I think we need additional fix to make it work, may 
> related to int promotion I guess.

Hmm, yes, integer promotion would make sense.  Then I guess the variables are 
already in the optimized dump?

> In theory yes, may be I am over-considering here. If it is equality, bring 
> the data
> to vector first may be has better opportunities in rvv env.
> It is totally to keep original sematics if it is useless or incorrect.

It's not absolutely necessary to keep the same order in this case but it helps 
understanding.  I would only change it if there's a combine opportunity or so.
  
Li, Pan2 Sept. 16, 2025, 1:35 a.m. UTC | #4
Thanks Robin for comments.

> Hmm, yes, integer promotion would make sense.  Then I guess the variables are 
> already in the optimized dump?

For uint16_t rs1 input, we have optimized dump similar as below

unsigned int _6;
vector([4,4]) unsigned int _27;
_6 = (unsigned int) rs1_15(D);
_27 = [vec_duplicate_expr] _6;

Thus, I wonder if we can do something during expand, but in another PR.

> It's not absolutely necessary to keep the same order in this case but it helps 
> understanding.  I would only change it if there's a combine opportunity or so.

I see, let me update it in v2.

Pan

-----Original Message-----
From: Robin Dapp <rdapp.gcc@gmail.com> 
Sent: Tuesday, September 16, 2025 2:45 AM
To: Li, Pan2 <pan2.li@intel.com>; Robin Dapp <rdapp.gcc@gmail.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; Chen, Ken <ken.chen@intel.com>; Liu, Hongtao <hongtao.liu@intel.com>; Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: [PATCH v1 1/4] RISC-V: Combine vec_duplicate + vwaddu.vv to vwaddu.vx on GR2VR cost

>> Where does the actual HI->SI extension happen then?  No chance we see it
>> during combine/late-combine?
>
> There is no HI->SI extension from the 272.expand and combine dump, so there 
> is no change in RTL or
> It is unsafe here. I think we need additional fix to make it work, may 
> related to int promotion I guess.

Hmm, yes, integer promotion would make sense.  Then I guess the variables are 
already in the optimized dump?

> In theory yes, may be I am over-considering here. If it is equality, bring 
> the data
> to vector first may be has better opportunities in rvv env.
> It is totally to keep original sematics if it is useless or incorrect.

It's not absolutely necessary to keep the same order in this case but it helps 
understanding.  I would only change it if there's a combine opportunity or so.
  

Patch

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 02f19bc6a42..fefd2dc63c3 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1868,6 +1868,50 @@  (define_insn_and_split "*mul_minus_vx_<mode>"
   }
   [(set_attr "type" "vimuladd")])
 
+(define_insn_and_split "*widen_frist_<any_extend:su>_vx_<mode>"
+ [(set (match_operand:VWEXTI_D   0 "register_operand")
+       (vec_duplicate:VWEXTI_D
+	 (any_extend:<VEL>
+	 (match_operand:<VSUBEL> 1 "register_operand"))))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    machine_mode d_trunc_mode = <V_DOUBLE_TRUNC>mode;
+    rtx vec_dup = gen_reg_rtx (d_trunc_mode);
+    insn_code icode = code_for_pred_broadcast (d_trunc_mode);
+    rtx vec_dup_ops[] = {vec_dup, operands[1]};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, vec_dup_ops);
+
+    icode = code_for_pred_vf2 (<any_extend:CODE>, <MODE>mode);
+    rtx extend_ops[] = {operands[0], vec_dup};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, extend_ops);
+
+    DONE;
+  })
+
+(define_insn_and_split "*widen_<any_widen_binop:optab>_<any_extend:su>_vx_<mode>"
+ [(set (match_operand:VWEXTI_D             0 "register_operand")
+       (any_widen_binop:VWEXTI_D
+	 (any_extend:VWEXTI_D
+	   (match_operand:<V_DOUBLE_TRUNC> 1 "register_operand"))
+	 (vec_duplicate:VWEXTI_D
+	   (any_extend:<VEL>
+	     (match_operand:<VSUBEL>       2 "register_operand")))))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_dual_widen_scalar (<any_widen_binop:CODE>,
+						       <any_extend:extend_code>,
+						       <MODE>mode);
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+
+    DONE;
+  }
+  [(set_attr "type" "viwalu")])
 
 ;; =============================================================================
 ;; Combine vec_duplicate + op.vv to op.vf
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index d3002241509..41e56c721c5 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -207,6 +207,9 @@  (define_mode_attr slot12_offset [(SI "-52") (DI "-104")])
 ;; This code iterator allows signed and unsigned widening multiplications
 ;; to use the same template.
 (define_code_iterator any_extend [sign_extend zero_extend])
+(define_code_attr extend_code [
+  (sign_extend "SIGN_EXTEND") (zero_extend "ZERO_EXTEND")
+])
 
 ;; These code iterators allow unsigned and signed extraction to be generated
 ;; from the same template.
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index ed7e9c3a951..45af65642cd 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -943,6 +943,22 @@  (define_mode_iterator V_FRACT [
   (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_VECTOR_ELEN_64")
 ])
 
+(define_mode_iterator VWEXTI_D [
+  (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
+  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
+
+  (V1DI "riscv_vector::vls_mode_valid_p (V1DImode) && TARGET_VECTOR_ELEN_64")
+  (V2DI "riscv_vector::vls_mode_valid_p (V2DImode) && TARGET_VECTOR_ELEN_64")
+  (V4DI "riscv_vector::vls_mode_valid_p (V4DImode) && TARGET_VECTOR_ELEN_64")
+  (V8DI "riscv_vector::vls_mode_valid_p (V8DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 64")
+  (V16DI "riscv_vector::vls_mode_valid_p (V16DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+  (V32DI "riscv_vector::vls_mode_valid_p (V32DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 256")
+  (V64DI "riscv_vector::vls_mode_valid_p (V64DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 512")
+  (V128DI "riscv_vector::vls_mode_valid_p (V128DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 1024")
+  (V256DI "riscv_vector::vls_mode_valid_p (V256DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 2048")
+  (V512DI "riscv_vector::vls_mode_valid_p (V512DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 4096")
+])
+
 (define_mode_iterator VWEXTI [
   RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_VECTOR_ELEN_64")