Message ID | 20220926065604.783193-1-liwei.xu@intel.com |
---|---|
State | New |
Headers |
From: Liwei Xu <liwei.xu@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: wilson@tuliptree.org, admin@levyhsu.com
Subject: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
Date: Mon, 26 Sep 2022 14:56:04 +0800
Message-Id: <20220926065604.783193-1-liwei.xu@intel.com> |
Series |
Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
|
|
Commit Message
Liwei Xu
Sept. 26, 2022, 6:56 a.m. UTC
This patch implements the optimization in PR 54346, which merges

  c = VEC_PERM_EXPR <a, b, VCST0>;
  d = VEC_PERM_EXPR <c, c, VCST1>;

into

  d = VEC_PERM_EXPR <a, b, NEW_VCST>;

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
tree-ssa/forwprop-19.c fails to pass, but I'm not sure whether it is OK to remove it.

gcc/ChangeLog:

	PR target/54346
	* match.pd: Merge the indices of the VCSTs, then generate the new vec_perm.

gcc/testsuite/ChangeLog:

	PR target/54346
	* gcc.dg/pr54346.c: New test.

Co-authored-by: liuhongt <hongtao.liu@intel.com>
---
 gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
 2 files changed, 54 insertions(+)
 create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
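The index arithmetic behind this merge can be checked outside GCC. The C sketch below is a scalar model, not GCC code: `vec_perm` and `merge_masks` are hypothetical helpers, and the 4-lane `int` vectors mirror the new test. Because both operands of the outer permute are `c`, its mask is reduced modulo the lane count (which is what the patch gets by building `sel1` with one input) and then used to index the inner mask.

```c
#include <assert.h>

#define NELTS 4

/* Scalar model of VEC_PERM_EXPR <x, y, sel> on NELTS-lane int vectors:
   lane i of the result is element sel[i] of the 2*NELTS-element
   concatenation of x and y.  Indices are reduced modulo 2*NELTS, as
   __builtin_shuffle documents for the two-operand case.
   OUT must not alias X or Y.  */
static void vec_perm (const int *x, const int *y, const int *sel, int *out)
{
  for (int i = 0; i < NELTS; i++)
    {
      assert (sel[i] >= 0);
      int idx = sel[i] % (2 * NELTS);
      out[i] = idx < NELTS ? x[idx] : y[idx - NELTS];
    }
}

/* Hypothetical helper: fold
     c = VEC_PERM <a, b, sel0>;  d = VEC_PERM <c, c, sel1>;
   into
     d = VEC_PERM <a, b, merged>;
   Both operands of the outer permute are c, so lane sel1[i] of {c, c}
   is lane sel1[i] % NELTS of c, which came from lane sel0[that] of {a, b}.  */
static void merge_masks (const int *sel0, const int *sel1, int *merged)
{
  for (int i = 0; i < NELTS; i++)
    {
      assert (sel0[i] >= 0 && sel1[i] >= 0);
      merged[i] = sel0[sel1[i] % NELTS] % (2 * NELTS);
    }
}
```

With the test's masks `{1, 4, 2, 7}` and `{7, 2, 1, 5}`, `merge_masks` yields `{7, 2, 4, 4}`, and applying that single mask to `a` and `b` with `vec_perm` produces the same lanes as the two-step shuffle.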
Comments
On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>
> This patch implemented the optimization in PR 54346, which Merges
>
> c = VEC_PERM_EXPR <a, b, VCST0>;
> d = VEC_PERM_EXPR <c, c, VCST1>;
> to
> d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
> is ok to removed it.

Looks good, but leave Richard a chance to ask for VLA vector support which
might be trivial to do.

Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
seem to handle VLA vectors.

Thanks,
Richard.
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>>
>> This patch implemented the optimization in PR 54346, which Merges
>>
>> c = VEC_PERM_EXPR <a, b, VCST0>;
>> d = VEC_PERM_EXPR <c, c, VCST1>;
>> to
>> d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>>
>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>> tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>> is ok to removed it.
>
> Looks good, but leave Richard a chance to ask for VLA vector support which
> might be trivial to do.

Sorry for the slow reply.  It might be tricky to handle the general case,
so I'd suggest going with this for now and dealing with VLA as a follow-up.
(Probably after Prathamesh's changes to fold_vec_perm_expr.)

Thanks,
Richard
Hi Richard,

Sure, I'll also check the VLA case once I'm back from vacation. Thank you for your help.

Best Regards
Liwei

> On 30 Sep 2022, at 5:17 pm, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> Sorry for the slow reply.  It might be tricky to handle the general case,
> so I'd suggest going with this for now and dealing with VLA as a follow-up.
> (Probably after Prathamesh's changes to fold_vec_perm_expr.)
On Mon, 2022-09-26 at 14:56 +0800, Liwei Xu via Gcc-patches wrote:
> This patch implemented the optimization in PR 54346, which Merges
>
> c = VEC_PERM_EXPR <a, b, VCST0>;
> d = VEC_PERM_EXPR <c, c, VCST1>;
> to
> d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
> is ok to removed it.

I'm getting:

FAIL: gcc.dg/pr54346.c scan-tree-dump dse1 "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }"
FAIL: gcc.dg/pr54346.c scan-tree-dump-times dse1 "VEC_PERM_EXPR" 1

on loongarch64-linux-gnu.  Not sure why.
Hi RuoYao,

It's probably because loongarch64 doesn't support
can_vec_perm_const_p(result_mode, op_mode, sel2, false).

I'm not sure whether loongarch will support it, or whether I should just
limit the test targets for pr54346.c.

Best Regards
Levy

> On 12 Oct 2022, at 9:51 pm, Xi Ruoyao <xry111@xry111.site> wrote:
>
> pr54346.
On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> Hi RuoYao
>
> It's probably because loongarch64 doesn't support
> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>
> I'm not sure whether if loongarch will support it or should I just
> limit the test target for pr54346.c?

I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
actually support vectors.  (LoongArch has SIMD instructions, but the
support in GCC won't be added in the near future.)
On 2022/10/13 2:44 PM, Xi Ruoyao wrote:
> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>> Hi RuoYao
>>
>> It's probably because loongarch64 doesn't support
>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>
>> I'm not sure whether if loongarch will support it or should I just
>> limit the test target for pr54346.c?
> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> actually support vector. (LoongArch has SIMD instructions but the
> support in GCC won't be added in a very recent future.)
>
If I understand correctly, I think this might be a better solution:

 /* { dg-do compile } */
+/* { dg-require-effective-target vect_perm } */
 /* { dg-options "-O -fdump-tree-dse1" } */
On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>
> If what I understand is correct, I think this might be a better solution.
>
>  /* { dg-do compile } */
> +/* { dg-require-effective-target vect_perm } */
>  /* { dg-options "-O -fdump-tree-dse1" } */

Btw, what forwprop does is check whether any of the original permutations are
not supported and then elide the supportability check for the result.
The reasoning is that the original permute(s) would be lowered during
vectlower, so we can as well do that for the result.  We should just never
turn a supported permutation sequence into a not supported one.

Richard.
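The forwprop policy Richard describes can be sketched as a small decision function. In the C sketch below, `perm_supported_fn` is a hypothetical stand-in for a target query such as `can_vec_perm_const_p`, and both toy targets are invented for illustration. Note that the patch as posted always requires the merged permute to be supported; under this policy, presumably, a target with no constant-permute support at all (like the loongarch64 configuration reported in this thread) would still get the merge, because the original permutes are lowered anyway.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a target's constant-permute support query,
   loosely standing in for GCC's can_vec_perm_const_p.  */
typedef bool (*perm_supported_fn) (const int *sel, int nelts);

/* Toy target: no constant-permute support at all.  */
static bool no_perm_support (const int *sel, int nelts)
{
  (void) sel;
  (void) nelts;
  return false;
}

/* Toy target: accepts only masks in which no index is repeated.  */
static bool distinct_lanes_only (const int *sel, int nelts)
{
  for (int i = 0; i < nelts; i++)
    for (int j = i + 1; j < nelts; j++)
      if (sel[i] == sel[j])
        return false;
  return true;
}

/* Policy as described in the thread: if either original permute is
   unsupported, it will be lowered anyway, so accept the merge without
   checking the result.  Otherwise merge only when the result is also
   supported, so a supported sequence never becomes unsupported.  */
static bool should_merge (perm_supported_fn supported,
                          const int *sel0, const int *sel1,
                          const int *merged, int nelts)
{
  assert (nelts > 0);
  if (!supported (sel0, nelts) || !supported (sel1, nelts))
    return true;
  return supported (merged, nelts);
}
```

With the test's masks `{1, 4, 2, 7}` and `{7, 2, 1, 5}` and the merged mask `{7, 2, 4, 4}`, `should_merge (no_perm_support, ...)` returns true, while `distinct_lanes_only` accepts both originals but rejects the merged mask (index 4 appears twice), so the two permutes are left alone.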
Hi Richard,

Thank you for your detailed explanation. I'll patch the test case with the suggestion from LuLu.

Best
Levy

> On 13 Oct 2022, at 7:12 pm, Richard Biener <richard.guenther@gmail.com> wrote:
>
> Btw, what forwprop does is check whether any of the original permutations are
> not supported and then elide the supportability check for the result.
> The reasoning
> is that the original permute(s) would be lowered during vectlower so we can as
> well do that for the result. We should just never turn a supported permutation
> sequence into a not supported one.
>
> Richard.
On 2022/10/13 7:10 PM, Richard Biener wrote:
> Btw, what forwprop does is check whether any of the original permutations are
> not supported and then elide the supportability check for the result.
> The reasoning
> is that the original permute(s) would be lowered during vectlower so we can as
> well do that for the result. We should just never turn a supported permutation
> sequence into a not supported one.
>
> Richard.
>
Hi Richard:

I'm very sorry, but I don't fully understand what you mean. Could you give me some more details?

Thanks!
Lulu Cheng
diff --git a/gcc/match.pd b/gcc/match.pd
index 345bcb701a5..9219b0a10e1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8086,6 +8086,47 @@ and,
   (minus (mult (vec_perm @1 @1 @3) @2) @4)))
 
 
+/* (PR54346) Merge
+   c = VEC_PERM_EXPR <a, b, VCST0>;
+   d = VEC_PERM_EXPR <c, c, VCST1>;
+   to
+   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
+
+(simplify
+ (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
+ (with
+  {
+    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
+      return NULL_TREE;
+
+    tree op0;
+    machine_mode result_mode = TYPE_MODE (type);
+    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
+    vec_perm_builder builder0;
+    vec_perm_builder builder1;
+    vec_perm_builder builder2 (nelts, nelts, 1);
+
+    if (!tree_to_vec_perm_builder (&builder0, @3)
+        || !tree_to_vec_perm_builder (&builder1, @4))
+      return NULL_TREE;
+
+    vec_perm_indices sel0 (builder0, 2, nelts);
+    vec_perm_indices sel1 (builder1, 1, nelts);
+
+    for (int i = 0; i < nelts; i++)
+      builder2.quick_push (sel0[sel1[i].to_constant()]);
+
+    vec_perm_indices sel2 (builder2, 2, nelts);
+
+    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
+      return NULL_TREE;
+
+    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+  }
+  (vec_perm @1 @2 { op0; })))
+
+
 /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
    The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
    constant which when multiplied by a power of 2 contains a unique value
diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
new file mode 100755
index 00000000000..d87dc3a79a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr54346.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1" } */
+
+typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
+
+void fun (veci a, veci b, veci *i)
+{
+  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
+  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
\ No newline at end of file
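The new test scans for `{ 3, 6, 0, 0 }`, although composing the two masks by hand gives `{ 7, 2, 4, 4 }`. A plausible explanation, which is an assumption about GCC's VEC_PERM_EXPR canonicalization in match.pd rather than something stated in this thread, is that when lane 0 selects from the second operand, the operands are swapped and every index is rotated by the lane count, giving `VEC_PERM_EXPR <b, a, { 3, 6, 0, 0 }>`. The C sketch below models only that swap-and-rotate step for a two-input mask whose lanes come from both operands; `canonicalize_perm` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

#define NELTS 4

/* Hypothetical model of the operand-swap canonicalization: if lane 0
   selects from the second operand (index >= NELTS), swap the operands
   and rotate each index by NELTS modulo 2*NELTS, so that index k of
   {a, b} becomes the same element's index in {b, a}.
   Returns true if the caller should swap the two operands.  */
static bool canonicalize_perm (int *sel)
{
  for (int i = 0; i < NELTS; i++)
    assert (sel[i] >= 0 && sel[i] < 2 * NELTS);

  if (sel[0] < NELTS)
    return false;                 /* Already canonical.  */

  for (int i = 0; i < NELTS; i++)
    sel[i] = (sel[i] + NELTS) % (2 * NELTS);
  return true;
}
```

Applied to `{7, 2, 4, 4}`, the function returns true and rewrites the mask to `{3, 6, 0, 0}`; a mask such as `{1, 4, 2, 7}`, whose lane 0 already comes from the first operand, is left unchanged.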