RISC-V: Support combining cond extend and reduce sum into cond widen reduce sum
Checks
Context                                        | Check   | Description
-----------------------------------------------+---------+---------------
linaro-tcwg-bot/tcwg_gcc_build--master-arm     | success | Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm     | success | Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 | success | Testing passed
Commit Message
This patch supports combining a conditional extend and a reduce_sum into a
conditional widening reduce_sum, i.e. it combines the following three insns:
(set (reg:RVVM2HI 149)
(const_vector:RVVM2HI repeat [
(const_int 0)
]))
(set (reg:RVVM2HI 138)
(if_then_else:RVVM2HI
(reg:RVVMF8BI 135)
(reg:RVVM2HI 148)
(reg:RVVM2HI 149)))
(set (reg:HI 150)
(unspec:HI [
(reg:RVVM2HI 138)
] UNSPEC_REDUC_SUM))
into one insn:
(set (reg:HI 150)
(unspec:HI [
(if_then_else:RVVM2HI
(reg:RVVMF8BI 135)
(sign_extend:RVVM2HI (reg:RVVM1QI 136))
(const_vector:RVVM2HI repeat [
(const_int 0)
]))
] UNSPEC_REDUC_SUM))
Consider the following C code:
int16_t foo (int8_t *restrict a, int8_t *restrict pred)
{
int16_t sum = 0;
for (int i = 0; i < 16; i += 1)
if (pred[i])
sum += a[i];
return sum;
}
assembly before this patch:
foo:
vsetivli zero,16,e16,m2,ta,ma
li a5,0
vmv.v.i v2,0
vsetvli zero,zero,e8,m1,ta,ma
vl1re8.v v0,0(a1)
vmsne.vi v0,v0,0
vsetvli zero,zero,e16,m2,ta,mu
vle8.v v4,0(a0),v0.t
vmv.s.x v1,a5
vsext.vf2 v2,v4,v0.t
vredsum.vs v2,v2,v1
vmv.x.s a0,v2
slliw a0,a0,16
sraiw a0,a0,16
ret
assembly after this patch:
foo:
li a5,0
vsetivli zero,16,e16,m1,ta,ma
vmv.s.x v3,a5
vsetivli zero,16,e8,m1,ta,ma
vl1re8.v v0,0(a1)
vmsne.vi v0,v0,0
vle8.v v2,0(a0),v0.t
vwredsum.vs v1,v2,v3,v0.t
vsetivli zero,0,e16,m1,ta,ma
vmv.x.s a0,v1
slliw a0,a0,16
sraiw a0,a0,16
ret
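To make the new sequence easier to read, here is a rough scalar model of the
masked widening reduction (an illustration only; the helper name is invented
and this is not part of the patch). "vwredsum.vs vd, vs2, vs1, v0.t" at SEW=8
seeds a double-width accumulator from element 0 of vs1 and adds only the
sign-extended active elements of vs2:

#include <stdint.h>
#include <stdbool.h>

/* Scalar model of the masked vwredsum.vs above (e8 -> e16):
   only elements whose mask bit is set contribute to the sum.  */
static int16_t
vwredsum_e8_model (const int8_t *vs2, int16_t vs1_0,
                   const bool *mask, int vl)
{
  int16_t acc = vs1_0;
  for (int i = 0; i < vl; i++)
    if (mask[i])
      acc += (int16_t) vs2[i];  /* widen (sign-extend), then add */
  return acc;
}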
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_widen_reduc_plus_scal_<mode>):
New combine patterns.
* config/riscv/autovec.md (vcond_mask_<mode><vm>):
Split vcond_mask pattern into three patterns.
(vec_duplicate_const_0<mode>): Ditto.
(*vcond_mask_<mode><vm>): Ditto.
* config/riscv/predicates.md (vector_register_or_const_0_operand): New.
* config/riscv/riscv-protos.h (enum insn_type): Add REDUCE_OP_M.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c: New test.
---
gcc/config/riscv/autovec-opt.md | 48 +++++++++++++++
gcc/config/riscv/autovec.md | 59 ++++++++++++++++++-
gcc/config/riscv/predicates.md | 5 ++
gcc/config/riscv/riscv-protos.h | 1 +
.../rvv/autovec/cond/cond_widen_reduc-1.c | 30 ++++++++++
.../rvv/autovec/cond/cond_widen_reduc_run-1.c | 28 +++++++++
6 files changed, 170 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
--
2.36.3
Comments
Hi Lehua,
> +(define_expand "vcond_mask_<mode><vm>"
> + [(set (match_operand:V_VLS 0 "register_operand")
> + (if_then_else:V_VLS
> + (match_operand:<VM> 3 "register_operand")
> + (match_operand:V_VLS 1 "nonmemory_operand")
> + (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
> + "TARGET_VECTOR"
Would it hurt to allow any nonmemory operand here and just force the
"unsupported" constants into a register?
> + {
> + if (satisfies_constraint_Wc0 (operands[2]))
> + {
> + rtx reg = gen_reg_rtx (<MODE>mode);
> + emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));
Can't we emit a move_insn directly without going through the new pattern?
Or will that be optimized away in between? And the new pattern isn't
actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
already do? Probably initial values other than 0 don't work out of the box?
In any case it wouldn't hurt to describe the "design decisions" (i.e. why
we need one thing and not another) so it's easier to follow the patterns
in the future.
Regards
Robin
Hi Robin,
>> +(define_expand "vcond_mask_<mode><vm>"
>> + [(set (match_operand:V_VLS 0 "register_operand")
>> + (if_then_else:V_VLS
>> + (match_operand:<VM> 3 "register_operand")
>> + (match_operand:V_VLS 1 "nonmemory_operand")
>> + (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
>> + "TARGET_VECTOR"
>
> Would it hurt to allow any nonmemory operand here and just force the
> "unsupported" constants into a register?
Are you talking about why operand 2 doesn't use the nonmemory_operand
predicate? If so, I think this is because our vmerge.v[vxi]m insns only
allow operand 1 to be a scalar or immediate; operand 2 must be a vector
register.
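For reference, the three vmerge forms in the RVV spec make this asymmetry
visible; only the "true" operand has register/scalar/immediate variants,
while the "false" operand (vs2) is always a vector register:

  vmerge.vvm vd, vs2, vs1, v0    # vd[i] = v0.mask[i] ? vs1[i] : vs2[i]
  vmerge.vxm vd, vs2, rs1, v0    # vd[i] = v0.mask[i] ? x[rs1] : vs2[i]
  vmerge.vim vd, vs2, imm, v0    # vd[i] = v0.mask[i] ? imm    : vs2[i]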
>
>> + {
>> + if (satisfies_constraint_Wc0 (operands[2]))
>> + {
>> + rtx reg = gen_reg_rtx (<MODE>mode);
>> + emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));
>
> Can't we emit a move_insn directly without going through the new pattern?
> Or will that be optimized away in between?
Currently, a move_insn that moves vector const 0 into a vector register
produces the RTL below from vector.md; I would like to keep the simpler
pattern around until the split1 pass:
(set (reg:RVVM2HI 149)
(if_then_else:RVVM2HI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 146)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2HI repeat [
(const_int 0 [0])
])
(unspec:RVVM2HI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
> And the new pattern isn't
> actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
> already do? Probably initial values other than 0 don't work out of the box?
Yes, the name vec_duplicate_const_0 is misleading; maybe changing it to
mov_vec_const_0 would be better.
> In any case it wouldn't hurt to describe the "design decisions" (i.e. why
> we need one thing and not another) so it's easier to follow the patterns
> in the future.
Yes, I should have described the design in more detail. Before this
patch, if I wanted to implement the current combine optimization, I
would need to create a combine pattern that combines the following three
instructions:
(set (reg:RVVM2HI 149)
(if_then_else:RVVM2HI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 146)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2HI repeat [
(const_int 0 [0])
])
(unspec:RVVM2HI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
(set (reg:RVVM2HI 138 [ vect__ifc__38.17 ])
(if_then_else:RVVM2HI (reg:RVVMF8BI 135 [ mask__18.11 ])
(sign_extend:RVVM2HI (reg:RVVM1QI 136 [ vect__6.14 ]))
(reg:RVVM2HI 149)))
(set (reg:HI 151)
(unspec:HI [
(reg:RVVM2HI 138 [ vect__ifc__38.17 ])
] UNSPEC_REDUC_SUM))
into one insn:
(set (reg:HI 151)
(unspec:HI [
(if_then_else:RVVM2HI (reg:RVVMF8BI 135 [ mask__18.11 ])
(sign_extend:RVVM2HI (reg:RVVM1QI 136 [ vect__6.14 ]))
(if_then_else:RVVM2HI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 146)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2HI repeat [
(const_int 0 [0])
])
(unspec:RVVM2HI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
] UNSPEC_REDUC_SUM))
But I think that combine pattern is ugly, because the predicated move of
vector const 0 into a vector register carries many extra operands we don't
care about, so I want to introduce a simple pattern for it. I had two
options here. One is to modify the generic mov pattern, changing the
define_expand into a define_insn_and_split to keep the pattern simple. The
reason I did not choose this one is that the mov pattern is used in a lot
of places, including the reload pass, so the impact would be relatively big.
The other option (the one this patch chooses) is to split the vcond_mask
pattern for this particular case, allowing vec_const_0 to be passed in as
operand 2 and then expanding it through a simple mov pattern. The whole
purpose of splitting the vcond_mask pattern into three patterns is to
generate that simple mov pattern. Simply letting operand 2 of vcond_mask
accept vec const 0 would invalidate other existing combine patterns,
because they only allow the operand to be a register. I hope these details
explain why.
Hi Lehua,
>> Would it hurt to allow any nonmemory operand here and just force the
>> "unsupported" constants into a register?
>
> Are you talking about why operand 2 doesn't use the nonmemory_operand
> predicate? If so, I think this is because our vmerge.v[vxi]m insns
> only allow operand 1 to be a scalar or immediate; operand 2 must be
> a vector register.
My question was rather:
Why doesn't something like
(define_insn_and_split "vcond_mask_<mode><vm>"
[(set (match_operand:V_VLS 0 "register_operand")
(if_then_else:V_VLS
(match_operand:<VM> 3 "register_operand")
(match_operand:V_VLS 1 "nonmemory_operand")
(match_operand:V_VLS 2 "nonmemory_operand")))]
"TARGET_VECTOR && can_create_pseudo_p ()"
"#"
"&& 1"
[(const_int 0)]
{
/* The order of vcond_mask is opposite to pred_merge. */
if (!REG_P (operands[2]))
operands[2] = force_reg (<MODE>mode, operands[2]);
std::swap (operands[1], operands[2]);
riscv_vector::emit_vlmax_insn (code_for_pred_merge (<MODE>mode),
riscv_vector::MERGE_OP, operands);
DONE;
}
[(set_attr "type" "vector")]
)
suffice? You could disallow operands[2] != 0 if needed.
Regards
Robin
Hi Robin,
>>> Would it hurt to allow any nonmemory operand here and just force the
>>> "unsupported" constants into a register?
>>
>> Are you talking about why operand 2 doesn't use the nonmemory_operand
>> predicate? If so, I think this is because our vmerge.v[vxi]m insns
>> only allow operand 1 to be a scalar or immediate; operand 2 must be
>> a vector register.
>
> My question was rather:
>
> Why doesn't something like
>
> (define_insn_and_split "vcond_mask_<mode><vm>"
> [(set (match_operand:V_VLS 0 "register_operand")
> (if_then_else:V_VLS
> (match_operand:<VM> 3 "register_operand")
> (match_operand:V_VLS 1 "nonmemory_operand")
> (match_operand:V_VLS 2 "nonmemory_operand")))]
> "TARGET_VECTOR && can_create_pseudo_p ()"
> "#"
> "&& 1"
> [(const_int 0)]
> {
> /* The order of vcond_mask is opposite to pred_merge. */
> if (!REG_P (operands[2]))
> operands[2] = force_reg (<MODE>mode, operands[2]);
> std::swap (operands[1], operands[2]);
> riscv_vector::emit_vlmax_insn (code_for_pred_merge (<MODE>mode),
> riscv_vector::MERGE_OP, operands);
> DONE;
> }
> [(set_attr "type" "vector")]
> )
>
> suffice? You could disallow operands[2] != 0 if needed.
I think I understand what you're saying. The reason for not doing this
is that simply allowing operand 2 of vcond_mask to be vec_const_0 would
cause the other combine patterns to fail, because they require operand 2
of vcond_mask to be a register. Take the following existing combine
pattern, for example (the if_then_else's else arm, which becomes the
instruction's merge operand, is not allowed to be a non-register):
(define_insn_and_split "*cond_<optab><mode>"
[(set (match_operand:VF 0 "register_operand")
(if_then_else:VF
(match_operand:<VM> 1 "register_operand")
(any_float_unop:VF
(match_operand:VF 2 "register_operand"))
(match_operand:VF 3 "register_operand")))]
"TARGET_VECTOR && can_create_pseudo_p ()"
"#"
"&& 1"
[(const_int 0)]
{
insn_code icode = code_for_pred (<CODE>, <MODE>mode);
rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
riscv_vector::expand_cond_len_unop (icode, ops);
DONE;
}
[(set_attr "type" "vector")])
My current method is still to keep operand 2 of vcond_mask as a
register, but the mov_vec_const_0 pattern is simplified so that the
corresponding combine pattern can be simpler. That's the only reason
I split vcond_mask into three patterns.
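To summarize the division of labor between the three patterns (this just
restates the patch, it is not new code):

  vcond_mask_<mode><vm>        define_expand: also accepts const 0 as
                               operand 2 and materializes it via
                               vec_duplicate_const_0.
  vec_duplicate_const_0<mode>  define_insn_and_split: keeps the bare
                               (set (reg) (const_vector 0)) form until
                               the split1 pass.
  *vcond_mask_<mode><vm>       define_insn_and_split: register-only form
                               that the combine patterns match against.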
Hi Lehua,
thanks for the explanation.
> My current method is still to keep operand 2 of vcond_mask as a
> register, but the mov_vec_const_0 pattern is simplified so that
> the corresponding combine pattern can be simpler. That's the only
> reason I split vcond_mask into three patterns.
My "problem" with the separate split it that it really sticks out
and everybody seeing it would wonder why we need it. It's not that
bad of course but it appears as if we messed up somewhere else.
I checked and I don't see additional FAILs with the vmask pattern
that additionally allows a const0 operand (that is forced into a register)
and a force_reg in abs:VF.
Would you mind re-checking if we can avoid the extra
"vec_duplicate_const_0" by changing the other affected patterns
in a similar manner? I really didn't verify in-depth so if we needed
to add a force_reg to every pattern we might need to reconsider.
Still, I'd be unsure if I preferred the "vec_dup_const_0" over
additional force_regs ;)
Regards
Robin
On 2023/9/20 6:02, Robin Dapp wrote:
> Hi Lehua,
>
> thanks for the explanation.
>
>> My current method is still to keep operand 2 of vcond_mask as a
>> register, but the mov_vec_const_0 pattern is simplified so that
>> the corresponding combine pattern can be simpler. That's the only
>> reason I split vcond_mask into three patterns.
>
> My "problem" with the separate split it that it really sticks out
> and everybody seeing it would wonder why we need it. It's not that
> bad of course but it appears as if we messed up somewhere else.
Can not agree more.
> I checked and I don't see additional FAILs with the vmask pattern
> that additionally allows a const0 operand (that is forced into a register)
> and a force_reg in abs:VF.
>
> Would you mind re-checking if we can avoid the extra
> "vec_duplicate_const_0" by changing the other affected patterns
> in a similar manner? I really didn't verify in-depth so if we needed
> to add a force_reg to every pattern we might need to reconsider.
> Still, I'd be unsure if I preferred the "vec_dup_const_0" over
> additional force_regs ;)
I think that's a little weird, too. I'd prefer to add a single
define_insn_and_split mov_vec_const_0 pattern like the following diff.
How do you feel about that?
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index f4dab9fceb8..c0ab96ae8ab 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -973,7 +973,11 @@ expand_const_vector (rtx target, rtx src)
rtx tmp = register_operand (target, mode) ? target : gen_reg_rtx
(mode);
/* Element in range -16 ~ 15 integer or 0.0 floating-point,
we use vmv.v.i instruction. */
- if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
+ /* For const int or float 0, we keep the simple pattern before split1
+ pass. */
+ if (can_create_pseudo_p () && satisfies_constraint_Wc0 (src))
+ emit_insn (gen_mov_vec_const_0 (mode, tmp, src));
+ else if (satisfies_constraint_vi (src))
{
rtx ops[] = {tmp, src};
emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f66ffebba24..b4973125d04 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1640,6 +1640,19 @@
"TARGET_VECTOR"
{})
+(define_insn_and_split "@mov_vec_const_0<mode>"
+ [(set (match_operand:V_VLS 0 "register_operand")
+ (match_operand:V_VLS 1 "vector_const_0_operand"))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+ {
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+ }
+ [(set_attr "type" "vector")])
+
;; vle.v/vse.v,vmv.v.v
(define_insn_and_split "*pred_mov<mode>"
[(set (match_operand:V_VLS 0 "nonimmediate_operand" "=vr,
vr, vd, m, vr, vr")
Hi Robin,
I have posted a V2 patch implementing the method I mentioned. I wonder
if that makes it a little easier to understand.
V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630985.html
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
@@ -1284,6 +1284,54 @@
}
[(set_attr "type" "vector")])
+;; Combine mask extend + vredsum to mask vwredsum[u]
+(define_insn_and_split "*cond_widen_reduc_plus_scal_<mode>"
+ [(set (match_operand:<V_DOUBLE_EXTEND_VEL> 0 "register_operand")
+ (unspec:<V_DOUBLE_EXTEND_VEL> [
+ (if_then_else:<V_DOUBLE_EXTEND>
+ (match_operand:<VM> 1 "register_operand")
+ (any_extend:<V_DOUBLE_EXTEND>
+ (match_operand:VI_QHS_NO_M8 2 "register_operand"))
+ (match_operand:<V_DOUBLE_EXTEND> 3 "vector_const_0_operand"))
+ ] UNSPEC_REDUC_SUM))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+{
+ rtx ops[] = {operands[0], operands[2], operands[1],
+ gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
+ riscv_vector::expand_reduction (<WREDUC_UNSPEC>,
+ riscv_vector::REDUCE_OP_M,
+ ops, CONST0_RTX (<V_DOUBLE_EXTEND_VEL>mode));
+ DONE;
+}
+[(set_attr "type" "vector")])
+
+;; Combine mask extend + vfredsum to mask vfwredusum
+(define_insn_and_split "*cond_widen_reduc_plus_scal_<mode>"
+ [(set (match_operand:<V_DOUBLE_EXTEND_VEL> 0 "register_operand")
+ (unspec:<V_DOUBLE_EXTEND_VEL> [
+ (if_then_else:<V_DOUBLE_EXTEND>
+ (match_operand:<VM> 1 "register_operand")
+ (float_extend:<V_DOUBLE_EXTEND>
+ (match_operand:VF_HS_NO_M8 2 "register_operand"))
+ (match_operand:<V_DOUBLE_EXTEND> 3 "vector_const_0_operand"))
+ ] UNSPEC_REDUC_SUM_UNORDERED))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+{
+ rtx ops[] = {operands[0], operands[2], operands[1],
+ gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
+ riscv_vector::expand_reduction (UNSPEC_WREDUC_SUM_UNORDERED,
+ riscv_vector::REDUCE_OP_M_FRM_DYN,
+ ops, CONST0_RTX (<V_DOUBLE_EXTEND_VEL>mode));
+ DONE;
+}
+[(set_attr "type" "vector")])
+
;; =============================================================================
;; Misc combine patterns
;; =============================================================================
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
@@ -545,7 +545,64 @@
;; - vfmerge.vf
;; -------------------------------------------------------------------------
-(define_insn_and_split "vcond_mask_<mode><vm>"
+;; The purpose of splitting the original pattern into three patterns here is
+;; to combine the following three insns:
+;; (set (reg:RVVM2HI 149)
+;; (const_vector:RVVM2HI repeat [
+;; (const_int 0)
+;; ]))
+;; (set (reg:RVVM2HI 138)
+;; (if_then_else:RVVM2HI
+;; (reg:RVVMF8BI 135)
+;; (reg:RVVM2HI 148)
+;; (reg:RVVM2HI 149)))
+;; (set (reg:HI 150)
+;; (unspec:HI [
+;; (reg:RVVM2HI 138)
+;; ] UNSPEC_REDUC_SUM))
+;;
+;; into one insn:
+;;
+;; (set (reg:HI 150)
+;; (unspec:HI [
+;; (if_then_else:RVVM2HI
+;; (reg:RVVMF8BI 135)
+;; (sign_extend:RVVM2HI (reg:RVVM1QI 136))
+;; (const_vector:RVVM2HI repeat [
+;; (const_int 0)
+;; ]))
+;; ] UNSPEC_REDUC_SUM))
+
+(define_expand "vcond_mask_<mode><vm>"
+ [(set (match_operand:V_VLS 0 "register_operand")
+ (if_then_else:V_VLS
+ (match_operand:<VM> 3 "register_operand")
+ (match_operand:V_VLS 1 "nonmemory_operand")
+ (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
+ "TARGET_VECTOR"
+ {
+ if (satisfies_constraint_Wc0 (operands[2]))
+ {
+ rtx reg = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));
+ operands[2] = reg;
+ }
+ })
+
+(define_insn_and_split "vec_duplicate_const_0<mode>"
+ [(set (match_operand:V_VLS 0 "register_operand")
+ (match_operand:V_VLS 1 "vector_const_0_operand"))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+{
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+}
+ [(set_attr "type" "vector")])
+
+(define_insn_and_split "*vcond_mask_<mode><vm>"
[(set (match_operand:V_VLS 0 "register_operand")
(if_then_else:V_VLS
(match_operand:<VM> 3 "register_operand")
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
@@ -463,6 +463,11 @@
(ior (match_operand 0 "register_operand")
(match_code "const_vector")))
+(define_predicate "vector_register_or_const_0_operand"
+ (ior (match_operand 0 "register_operand")
+ (and (match_code "const_vector")
+ (match_test "satisfies_constraint_Wc0 (op)"))))
+
(define_predicate "vector_gs_scale_operand_16"
(and (match_code "const_int")
(match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
@@ -337,6 +337,7 @@ enum insn_type : unsigned int
/* For vreduce, no mask policy operand. */
REDUCE_OP = __NORMAL_OP_TA | BINARY_OP_P | VTYPE_MODE_FROM_OP1_P,
+ REDUCE_OP_M = __MASK_OP_TA | BINARY_OP_P | VTYPE_MODE_FROM_OP1_P,
REDUCE_OP_FRM_DYN = REDUCE_OP | FRM_DYN_P | VTYPE_MODE_FROM_OP1_P,
REDUCE_OP_M_FRM_DYN
= __MASK_OP_TA | BINARY_OP_P | FRM_DYN_P | VTYPE_MODE_FROM_OP1_P,
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c
new file mode 100644
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh_zvl128b -mabi=lp64d --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math" } */
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2, N) \
+ __attribute__ ((noipa)) \
+ TYPE1 reduc_##TYPE1##_##TYPE2 (TYPE2 *restrict a, TYPE2 *restrict pred) \
+ { \
+ TYPE1 sum = 0; \
+ for (int i = 0; i < N; i += 1) \
+ if (pred[i]) \
+ sum += a[i]; \
+ return sum; \
+ }
+
+#define TEST_ALL(TEST) \
+ TEST (int16_t, int8_t, 16) \
+ TEST (int32_t, int16_t, 8) \
+ TEST (int64_t, int32_t, 4) \
+ TEST (uint16_t, uint8_t, 16) \
+ TEST (uint32_t, uint16_t, 8) \
+ TEST (uint64_t, uint32_t, 4) \
+ TEST (float, _Float16, 8) \
+ TEST (double, float, 4)
+
+TEST_ALL (TEST_TYPE)
+
+/* { dg-final { scan-assembler-times {\tvfwredusum\.vs\tv[0-9]+,v[0-9]+,v[0-9]+,v0\.t} 2 } } */
+/* { dg-final { scan-assembler-times {\tvwredsum\.vs\tv[0-9]+,v[0-9]+,v[0-9]+,v0\.t} 3 } } */
+/* { dg-final { scan-assembler-times {\tvwredsumu\.vs\tv[0-9]+,v[0-9]+,v[0-9]+,v0\.t} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
new file mode 100644
@@ -0,0 +1,28 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "cond_widen_reduc-1.c"
+
+#define RUN(TYPE1, TYPE2, N) \
+ { \
+ TYPE2 a[N]; \
+ TYPE2 pred[N]; \
+ TYPE1 r = 0; \
+ for (int i = 0; i < N; i++) \
+ { \
+ a[i] = (i * 0.1) * (i & 1 ? 1 : -1); \
+ pred[i] = i % 3; \
+ if (pred[i]) \
+ r += a[i]; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ if (r != reduc_##TYPE1##_##TYPE2 (a, pred)) \
+ __builtin_abort (); \
+ }
+
+int __attribute__ ((optimize (1)))
+main ()
+{
+ TEST_ALL (RUN)
+ return 0;
+}
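For anyone wanting to re-run just these tests locally, something along
these lines should work (the rvv.exp driver and the glob usage here are my
assumption about the setup; adjust to your target/board):

  make check-gcc RUNTESTFLAGS="rvv.exp=cond_widen_reduc*"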