Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

Message ID 20220926065604.783193-1-liwei.xu@intel.com
State New
Headers
Series Optimize nested permutation to single VEC_PERM_EXPR [PR54346] |

Commit Message

Liwei Xu Sept. 26, 2022, 6:56 a.m. UTC
  This patch implemented the optimization in PR 54346, which Merges

	c = VEC_PERM_EXPR <a, b, VCST0>;
        d = VEC_PERM_EXPR <c, c, VCST1>;
                to
        d = VEC_PERM_EXPR <a, b, NEW_VCST>;

	Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
        tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
        is ok to removed it.

gcc/ChangeLog:

	PR target/54346
	* match.pd: Merge the index of VCST then generates the new vec_perm.

gcc/testsuite/ChangeLog:

	PR target/54346
	* gcc.dg/pr54346.c: New test.

Co-authored-by: liuhongt <hongtao.liu@intel.com>
---
 gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
 2 files changed, 54 insertions(+)
 create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
  

Comments

Richard Biener Sept. 26, 2022, 8:21 a.m. UTC | #1
On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>
>         This patch implemented the optimization in PR 54346, which Merges
>
>         c = VEC_PERM_EXPR <a, b, VCST0>;
>         d = VEC_PERM_EXPR <c, c, VCST1>;
>                 to
>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>
>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>         tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>         is ok to removed it.

Looks good, but leave Richard a chance to ask for VLA vector support which
might be trivial to do.

Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
seem to handle VLA vectors.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         PR target/54346
>         * match.pd: Merge the index of VCST then generates the new vec_perm.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/54346
>         * gcc.dg/pr54346.c: New test.
>
> Co-authored-by: liuhongt <hongtao.liu@intel.com>
> ---
>  gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>  gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>  2 files changed, 54 insertions(+)
>  create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 345bcb701a5..9219b0a10e1 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8086,6 +8086,47 @@ and,
>    (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>
>
> +/* (PR54346) Merge
> +   c = VEC_PERM_EXPR <a, b, VCST0>;
> +   d = VEC_PERM_EXPR <c, c, VCST1>;
> +   to
> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
> +
> +(simplify
> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
> + (with
> +  {
> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
> +      return NULL_TREE;
> +
> +    tree op0;
> +    machine_mode result_mode = TYPE_MODE (type);
> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
> +    vec_perm_builder builder0;
> +    vec_perm_builder builder1;
> +    vec_perm_builder builder2 (nelts, nelts, 1);
> +
> +    if (!tree_to_vec_perm_builder (&builder0, @3)
> +    || !tree_to_vec_perm_builder (&builder1, @4))
> +      return NULL_TREE;
> +
> +    vec_perm_indices sel0 (builder0, 2, nelts);
> +    vec_perm_indices sel1 (builder1, 1, nelts);
> +
> +    for (int i = 0; i < nelts; i++)
> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
> +
> +    vec_perm_indices sel2 (builder2, 2, nelts);
> +
> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
> +      return NULL_TREE;
> +
> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
> +  }
> +  (vec_perm @1 @2 { op0; })))
> +
> +
>  /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>     The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>     constant which when multiplied by a power of 2 contains a unique value
> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
> new file mode 100755
> index 00000000000..d87dc3a79a5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr54346.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-dse1" } */
> +
> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> +
> +void fun (veci a, veci b, veci *i)
> +{
> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
> +}
> +
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
> \ No newline at end of file
> --
> 2.18.2
>
  
Richard Sandiford Sept. 30, 2022, 9:17 a.m. UTC | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>>
>>         This patch implemented the optimization in PR 54346, which Merges
>>
>>         c = VEC_PERM_EXPR <a, b, VCST0>;
>>         d = VEC_PERM_EXPR <c, c, VCST1>;
>>                 to
>>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>>
>>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>>         tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>>         is ok to removed it.
>
> Looks good, but leave Richard a chance to ask for VLA vector support which
> might be trivial to do.

Sorry for the slow reply.  It might be tricky to handle the general case,
so I'd suggest going with this for now and dealing with VLA as a follow-up.
(Probably after Prathamesh's changes to fold_vec_perm_expr.)

Thanks,
Richard

> Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
> tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
> seem to handle VLA vectors.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>>         PR target/54346
>>         * match.pd: Merge the index of VCST then generates the new vec_perm.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         PR target/54346
>>         * gcc.dg/pr54346.c: New test.
>>
>> Co-authored-by: liuhongt <hongtao.liu@intel.com>
>> ---
>>  gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>>  gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>>  2 files changed, 54 insertions(+)
>>  create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 345bcb701a5..9219b0a10e1 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -8086,6 +8086,47 @@ and,
>>    (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>>
>>
>> +/* (PR54346) Merge
>> +   c = VEC_PERM_EXPR <a, b, VCST0>;
>> +   d = VEC_PERM_EXPR <c, c, VCST1>;
>> +   to
>> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
>> +
>> +(simplify
>> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
>> + (with
>> +  {
>> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
>> +      return NULL_TREE;
>> +
>> +    tree op0;
>> +    machine_mode result_mode = TYPE_MODE (type);
>> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
>> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
>> +    vec_perm_builder builder0;
>> +    vec_perm_builder builder1;
>> +    vec_perm_builder builder2 (nelts, nelts, 1);
>> +
>> +    if (!tree_to_vec_perm_builder (&builder0, @3)
>> +    || !tree_to_vec_perm_builder (&builder1, @4))
>> +      return NULL_TREE;
>> +
>> +    vec_perm_indices sel0 (builder0, 2, nelts);
>> +    vec_perm_indices sel1 (builder1, 1, nelts);
>> +
>> +    for (int i = 0; i < nelts; i++)
>> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
>> +
>> +    vec_perm_indices sel2 (builder2, 2, nelts);
>> +
>> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>> +      return NULL_TREE;
>> +
>> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
>> +  }
>> +  (vec_perm @1 @2 { op0; })))
>> +
>> +
>>  /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>>     The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>>     constant which when multiplied by a power of 2 contains a unique value
>> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
>> new file mode 100755
>> index 00000000000..d87dc3a79a5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr54346.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O -fdump-tree-dse1" } */
>> +
>> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
>> +
>> +void fun (veci a, veci b, veci *i)
>> +{
>> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
>> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
>> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
>> \ No newline at end of file
>> --
>> 2.18.2
>>
  
Li, Pan2 via Gcc-patches Sept. 30, 2022, 9:36 a.m. UTC | #3
Hi Richard 

Sure, I’ll also check the VLA case once I’m back from vacation, thank you for your help.

Best Regards
Liwei

> On 30 Sep 2022, at 5:17 pm, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>>> 
>>>        This patch implemented the optimization in PR 54346, which Merges
>>> 
>>>        c = VEC_PERM_EXPR <a, b, VCST0>;
>>>        d = VEC_PERM_EXPR <c, c, VCST1>;
>>>                to
>>>        d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>>> 
>>>        Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>>>        tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>>>        is ok to removed it.
>> 
>> Looks good, but leave Richard a chance to ask for VLA vector support which
>> might be trivial to do.
> 
> Sorry for the slow reply.  It might be tricky to handle the general case,
> so I'd suggest going with this for now and dealing with VLA as a follow-up.
> (Probably after Prathamesh's changes to fold_vec_perm_expr.)
> 
> Thanks,
> Richard
> 
>> Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
>> tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
>> seem to handle VLA vectors.
>> 
>> Thanks,
>> Richard.
>> 
>>> gcc/ChangeLog:
>>> 
>>>        PR target/54346
>>>        * match.pd: Merge the index of VCST then generates the new vec_perm.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>        PR target/54346
>>>        * gcc.dg/pr54346.c: New test.
>>> 
>>> Co-authored-by: liuhongt <hongtao.liu@intel.com>
>>> ---
>>> gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>>> gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>>> 2 files changed, 54 insertions(+)
>>> create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>>> 
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index 345bcb701a5..9219b0a10e1 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -8086,6 +8086,47 @@ and,
>>>   (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>>> 
>>> 
>>> +/* (PR54346) Merge
>>> +   c = VEC_PERM_EXPR <a, b, VCST0>;
>>> +   d = VEC_PERM_EXPR <c, c, VCST1>;
>>> +   to
>>> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
>>> +
>>> +(simplify
>>> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
>>> + (with
>>> +  {
>>> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
>>> +      return NULL_TREE;
>>> +
>>> +    tree op0;
>>> +    machine_mode result_mode = TYPE_MODE (type);
>>> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
>>> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
>>> +    vec_perm_builder builder0;
>>> +    vec_perm_builder builder1;
>>> +    vec_perm_builder builder2 (nelts, nelts, 1);
>>> +
>>> +    if (!tree_to_vec_perm_builder (&builder0, @3)
>>> +    || !tree_to_vec_perm_builder (&builder1, @4))
>>> +      return NULL_TREE;
>>> +
>>> +    vec_perm_indices sel0 (builder0, 2, nelts);
>>> +    vec_perm_indices sel1 (builder1, 1, nelts);
>>> +
>>> +    for (int i = 0; i < nelts; i++)
>>> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
>>> +
>>> +    vec_perm_indices sel2 (builder2, 2, nelts);
>>> +
>>> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>>> +      return NULL_TREE;
>>> +
>>> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
>>> +  }
>>> +  (vec_perm @1 @2 { op0; })))
>>> +
>>> +
>>> /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>>>    The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>>>    constant which when multiplied by a power of 2 contains a unique value
>>> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
>>> new file mode 100755
>>> index 00000000000..d87dc3a79a5
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/pr54346.c
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O -fdump-tree-dse1" } */
>>> +
>>> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
>>> +
>>> +void fun (veci a, veci b, veci *i)
>>> +{
>>> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
>>> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
>>> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
>>> \ No newline at end of file
>>> --
>>> 2.18.2
>>>
  
Xi Ruoyao Oct. 12, 2022, 1:51 p.m. UTC | #4
On Mon, 2022-09-26 at 14:56 +0800, Liwei Xu via Gcc-patches wrote:
>         This patch implemented the optimization in PR 54346, which Merges
> 
>         c = VEC_PERM_EXPR <a, b, VCST0>;
>         d = VEC_PERM_EXPR <c, c, VCST1>;
>                 to
>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
> 
>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>         tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>         is ok to removed it.

I'm getting:

FAIL: gcc.dg/pr54346.c scan-tree-dump dse1 "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }"
FAIL: gcc.dg/pr54346.c scan-tree-dump-times dse1 "VEC_PERM_EXPR" 1

on loongarch64-linux-gnu.  Not sure why.
  
Levy Hsu Oct. 13, 2022, 6:15 a.m. UTC | #5
Hi RuoYao

It?s probably because loongarch64 doesn?t support 
can_vec_perm_const_p(result_mode, op_mode, sel2, false)

I?m not sure whether if loongarch will support it or should I just limit the test target for pr54346.c?

Best Regards
Levy

> On 12 Oct 2022, at 9:51 pm, Xi Ruoyao <xry111@xry111.site> wrote:
> 
> pr54346.
  
Xi Ruoyao Oct. 13, 2022, 6:44 a.m. UTC | #6
On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> Hi RuoYao
> 
> It?s probably because loongarch64 doesn?t support 
> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
> 
> I?m not sure whether if loongarch will support it or should I just
> limit the test target for pr54346.c?

I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
actually support vector.  (LoongArch has SIMD instructions but the
support in GCC won't be added in a very recent future.)
  
Lulu Cheng Oct. 13, 2022, 8:15 a.m. UTC | #7
? 2022/10/13 ??2:44, Xi Ruoyao ??:
> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>> Hi RuoYao
>>
>> It?s probably because loongarch64 doesn?t support
>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>
>> I?m not sure whether if loongarch will support it or should I just
>> limit the test target for pr54346.c?
> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> actually support vector.  (LoongArch has SIMD instructions but the
> support in GCC won't be added in a very recent future.)
>
If what I understand is correct, I think this might be a better solution.

 ?/* { dg-do compile } */

+/* { dg-require-effective-target vect_perm } */
 ?/* { dg-options "-O -fdump-tree-dse1" } */
  
Richard Biener Oct. 13, 2022, 11:10 a.m. UTC | #8
On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>
>
> ? 2022/10/13 ??2:44, Xi Ruoyao ??:
> > On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> >> Hi RuoYao
> >>
> >> It?s probably because loongarch64 doesn?t support
> >> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
> >>
> >> I?m not sure whether if loongarch will support it or should I just
> >> limit the test target for pr54346.c?
> > I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> > actually support vector.  (LoongArch has SIMD instructions but the
> > support in GCC won't be added in a very recent future.)
> >
> If what I understand is correct, I think this might be a better solution.
>
>   /* { dg-do compile } */
>
> +/* { dg-require-effective-target vect_perm } */
>   /* { dg-options "-O -fdump-tree-dse1" } */

Btw, what forwprop does is check whether any of the original permutations are
not supported and then elide the supportability check for the result.
The reasoning
is that the original permute(s) would be lowered during vectlower so we can as
well do that for the result.  We should just never turn a supported permutation
sequence into a not supported one.

Richard.

>
  
Liwei Xu Oct. 14, 2022, 1:09 a.m. UTC | #9
Hi Richard
 
Thank your for your detailed explanation, I?ll patch the test case with suggestions form LuLu.

Best
Levy

> On 13 Oct 2022, at 7:12 pm, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> ?On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>> 
>> 
>>> ? 2022/10/13 ??2:44, Xi Ruoyao ??:
>>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>>>> Hi RuoYao
>>>> 
>>>> It?s probably because loongarch64 doesn?t support
>>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>>> 
>>>> I?m not sure whether if loongarch will support it or should I just
>>>> limit the test target for pr54346.c?
>>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
>>> actually support vector.  (LoongArch has SIMD instructions but the
>>> support in GCC won't be added in a very recent future.)
>>> 
>> If what I understand is correct, I think this might be a better solution.
>> 
>>  /* { dg-do compile } */
>> 
>> +/* { dg-require-effective-target vect_perm } */
>>  /* { dg-options "-O -fdump-tree-dse1" } */
> 
> Btw, what forwprop does is check whether any of the original permutations are
> not supported and then elide the supportability check for the result.
> The reasoning
> is that the original permute(s) would be lowered during vectlower so we can as
> well do that for the result.  We should just never turn a supported permutation
> sequence into a not supported one.
> 
> Richard.
>
  
Lulu Cheng Oct. 14, 2022, 1:49 a.m. UTC | #10
? 2022/10/13 ??7:10, Richard Biener ??:
> On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>>
>> ? 2022/10/13 ??2:44, Xi Ruoyao ??:
>>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>>>> Hi RuoYao
>>>>
>>>> It?s probably because loongarch64 doesn?t support
>>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>>>
>>>> I?m not sure whether if loongarch will support it or should I just
>>>> limit the test target for pr54346.c?
>>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
>>> actually support vector.  (LoongArch has SIMD instructions but the
>>> support in GCC won't be added in a very recent future.)
>>>
>> If what I understand is correct, I think this might be a better solution.
>>
>>    /* { dg-do compile } */
>>
>> +/* { dg-require-effective-target vect_perm } */
>>    /* { dg-options "-O -fdump-tree-dse1" } */
> Btw, what forwprop does is check whether any of the original permutations are
> not supported and then elide the supportability check for the result.
> The reasoning
> is that the original permute(s) would be lowered during vectlower so we can as
> well do that for the result.  We should just never turn a supported permutation
> sequence into a not supported one.
>
> Richard.
>
Hi Richard:

I'm very sorry. I don't fully understand what you mean.

Could you give me some more details?


Thanks!

Lulu Cheng
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 345bcb701a5..9219b0a10e1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8086,6 +8086,47 @@  and,
   (minus (mult (vec_perm @1 @1 @3) @2) @4)))
 
 
+/* (PR54346) Merge 
+   c = VEC_PERM_EXPR <a, b, VCST0>; 
+   d = VEC_PERM_EXPR <c, c, VCST1>;
+   to
+   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
+   
+(simplify
+ (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
+ (with
+  {
+    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
+      return NULL_TREE;
+
+    tree op0;
+    machine_mode result_mode = TYPE_MODE (type);
+    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
+    vec_perm_builder builder0;
+    vec_perm_builder builder1;
+    vec_perm_builder builder2 (nelts, nelts, 1);
+
+    if (!tree_to_vec_perm_builder (&builder0, @3) 
+    || !tree_to_vec_perm_builder (&builder1, @4))
+      return NULL_TREE;
+
+    vec_perm_indices sel0 (builder0, 2, nelts);
+    vec_perm_indices sel1 (builder1, 1, nelts);
+   
+    for (int i = 0; i < nelts; i++)
+      builder2.quick_push (sel0[sel1[i].to_constant()]);
+
+    vec_perm_indices sel2 (builder2, 2, nelts);
+
+    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
+      return NULL_TREE;
+
+    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+  }
+  (vec_perm @1 @2 { op0; })))
+
+
 /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
    The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
    constant which when multiplied by a power of 2 contains a unique value
diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
new file mode 100755
index 00000000000..d87dc3a79a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr54346.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1" } */
+
+typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
+
+void fun (veci a, veci b, veci *i)
+{
+  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
+  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
\ No newline at end of file