Message ID | CAAgBjMmC1EX7AvzPVcn86ZqhcZJtXGx16yEDkB2qSkJ_i09sAQ@mail.gmail.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8E48238460B3 for <patchwork@sourceware.org>; Tue, 12 Jul 2022 19:12:13 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8E48238460B3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1657653133; bh=Yt2MAStNlQZpkpLMRPyrnzlPIe8/YwGsrm1QCZ36EEM=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=YeXW+pbtk7DYpahrer44aRltW1tnWCQsfMD4TAmMip/VG0bEYmScSiNuD20BDUDFZ IBe2L3C+N+lAPSqGT1rWYQsPbn6nvWkz7oemHopy2RJ8wlvGExMUeK11z+ZnpkPYB7 zvFvjE3HozbHamupNlUisMznjlZ/eLhF92DW97BE= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-ed1-x52b.google.com (mail-ed1-x52b.google.com [IPv6:2a00:1450:4864:20::52b]) by sourceware.org (Postfix) with ESMTPS id 28E853857C7D for <gcc-patches@gcc.gnu.org>; Tue, 12 Jul 2022 19:11:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 28E853857C7D Received: by mail-ed1-x52b.google.com with SMTP id r18so11322038edb.9 for <gcc-patches@gcc.gnu.org>; Tue, 12 Jul 2022 12:11:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=Yt2MAStNlQZpkpLMRPyrnzlPIe8/YwGsrm1QCZ36EEM=; b=PFqiymGj/78iTySHInZM3/bW2zAYsh4oaHxkZUBs80/2uci7d4Foyg2G2pSPQVsSBT Q34hKrK6F9u/nJvrFGzAK7SppgO5hwAFiXr8BwZdaINcPyUiYIgtkiZBZAuCpFYGOPwi 4rj4uXAIw53cojwop0AXaZZmAm7fp644s2v6dKU7Fx2zBGjuZPwkhmqdofpcuvGEPN2+ ZZCduTenf6c2nMJq5RX7fOeC61mnr3gBPAx3s1SzBDcgaAzOICgN/i4YP+AWEAUIhgwV Sy4PbXivnbOaUciqKQqQYFkkZBoIpN23F0BMQpRECEV5+PpF8eLNozal+YfJo2+d20gc PhCg== X-Gm-Message-State: AJIora+Mv5GTBNt8kx0y9xsk89qRDRINftBj4ag4bHz7iZPZ7QdnE1PG 61E1gdaUhk8dJ7V69zgN5T/kY2VAUlb0MYkcH/15zqm/ecmtfYbT X-Google-Smtp-Source: AGRyM1tcNELIcOhPyAI85JfR7fpyrijmw4go7ZOBxOMelXDqkKpK1zGCb3nROk0j7OXH/n68ySRVrDEnE8rVRFbbV+A= X-Received: by 2002:a05:6402:3807:b0:435:20fb:318d with SMTP id es7-20020a056402380700b0043520fb318dmr33355494edb.272.1657653101504; Tue, 12 Jul 2022 12:11:41 -0700 (PDT) MIME-Version: 1.0 Date: Wed, 13 Jul 2022 00:41:04 +0530 Message-ID: <CAAgBjMmC1EX7AvzPVcn86ZqhcZJtXGx16yEDkB2qSkJ_i09sAQ@mail.gmail.com> Subject: ICE after folding svld1rq to vec_perm_expr duing forwprop To: gcc Patches <gcc-patches@gcc.gnu.org>, Richard Sandiford <richard.sandiford@arm.com> Content-Type: multipart/mixed; boundary="00000000000046df0005e3a070bf" X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE, WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
ICE after folding svld1rq to vec_perm_expr duing forwprop
|
|
Commit Message
Prathamesh Kulkarni
July 12, 2022, 7:11 p.m. UTC
Hi Richard, For the following test: svint32_t f2(int a, int b, int c, int d) { int32x4_t v = (int32x4_t) {a, b, c, d}; return svld1rq_s32 (svptrue_b8 (), &v[0]); } The compiler emits following ICE with -O3 -mcpu=generic+sve: foo.c: In function ‘f2’: foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ 4 | svint32_t f2(int a, int b, int c, int d) | ^~ svint32_t __Int32x4_t _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); during GIMPLE pass: forwprop dump file: foo.c.109t.forwprop2 foo.c:4:11: internal compiler error: verify_gimple failed 0xfda04a verify_gimple_in_cfg(function*, bool) ../../gcc/gcc/tree-cfg.cc:5568 0xe9371f execute_function_todo ../../gcc/gcc/passes.cc:2091 0xe93ccb execute_todo ../../gcc/gcc/passes.cc:2145 This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: int32x4_t v; __Int32x4_t _1; svint32_t _9; vector(4) int _11; <bb 2> : _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; v_12 = _1; _11 = v_12; _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; return _9; During forwprop, simplify_permutation simplifies vec_perm_expr to view_convert_expr, and the end result becomes: svint32_t _7; __Int32x4_t _8; ;; basic block 2, loop depth 0 ;; pred: ENTRY _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); return _7; ;; succ: EXIT which causes the error duing verify_gimple since VIEW_CONVERT_EXPR has incompatible types (svint32_t, int32x4_t). The attached patch disables simplification of VEC_PERM_EXPR in simplify_permutation, if lhs and rhs have non compatible types, which resolves ICE, but am not sure if it's the correct approach ? Alternatively, should we allow assignments from fixed-width to SVE vector, so the above VIEW_CONVERT_EXPR would result in dup ? Thanks, Prathamesh
Comments
On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi Richard, > For the following test: > > svint32_t f2(int a, int b, int c, int d) > { > int32x4_t v = (int32x4_t) {a, b, c, d}; > return svld1rq_s32 (svptrue_b8 (), &v[0]); > } > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > foo.c: In function ‘f2’: > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > 4 | svint32_t f2(int a, int b, int c, int d) > | ^~ > svint32_t > __Int32x4_t > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > during GIMPLE pass: forwprop > dump file: foo.c.109t.forwprop2 > foo.c:4:11: internal compiler error: verify_gimple failed > 0xfda04a verify_gimple_in_cfg(function*, bool) > ../../gcc/gcc/tree-cfg.cc:5568 > 0xe9371f execute_function_todo > ../../gcc/gcc/passes.cc:2091 > 0xe93ccb execute_todo > ../../gcc/gcc/passes.cc:2145 > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > int32x4_t v; > __Int32x4_t _1; > svint32_t _9; > vector(4) int _11; > > <bb 2> : > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > v_12 = _1; > _11 = v_12; > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > return _9; > > During forwprop, simplify_permutation simplifies vec_perm_expr to > view_convert_expr, > and the end result becomes: > svint32_t _7; > __Int32x4_t _8; > > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > return _7; > ;; succ: EXIT > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > has incompatible types (svint32_t, int32x4_t). > > The attached patch disables simplification of VEC_PERM_EXPR > in simplify_permutation, if lhs and rhs have non compatible types, > which resolves ICE, but am not sure if it's the correct approach ? It for sure papers over the issue. I think the error happens earlier, the V_C_E should have been built with the type of the VEC_PERM_EXPR which is the type of the LHS. But then you probably run into the different sizes ICE (VLA vs constant size). I think for this case you want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, selecting the "low" part of the VLA vector. > > Alternatively, should we allow assignments from fixed-width to SVE > vector, so the above > VIEW_CONVERT_EXPR would result in dup ? > > Thanks, > Prathamesh
On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > Hi Richard, > > For the following test: > > > > svint32_t f2(int a, int b, int c, int d) > > { > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > } > > > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > foo.c: In function ‘f2’: > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > 4 | svint32_t f2(int a, int b, int c, int d) > > | ^~ > > svint32_t > > __Int32x4_t > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > during GIMPLE pass: forwprop > > dump file: foo.c.109t.forwprop2 > > foo.c:4:11: internal compiler error: verify_gimple failed > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > ../../gcc/gcc/tree-cfg.cc:5568 > > 0xe9371f execute_function_todo > > ../../gcc/gcc/passes.cc:2091 > > 0xe93ccb execute_todo > > ../../gcc/gcc/passes.cc:2145 > > > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > int32x4_t v; > > __Int32x4_t _1; > > svint32_t _9; > > vector(4) int _11; > > > > <bb 2> : > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > v_12 = _1; > > _11 = v_12; > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > return _9; > > > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > view_convert_expr, > > and the end result becomes: > > svint32_t _7; > > __Int32x4_t _8; > > > > ;; basic block 2, loop depth 0 > > ;; pred: ENTRY > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > return _7; > > ;; succ: EXIT > > > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > has incompatible types (svint32_t, int32x4_t). > > > > The attached patch disables simplification of VEC_PERM_EXPR > > in simplify_permutation, if lhs and rhs have non compatible types, > > which resolves ICE, but am not sure if it's the correct approach ? > > It for sure papers over the issue. I think the error happens earlier, > the V_C_E should have been built with the type of the VEC_PERM_EXPR > which is the type of the LHS. But then you probably run into the > different sizes ICE (VLA vs constant size). I think for this case you > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > selecting the "low" part of the VLA vector. Hi Richard, Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to represent dup operation from fixed width to VLA vector. I am not sure how folding it to BIT_FIELD_REF will work. Could you please elaborate ? Also, the issue doesn't seem restricted to this case. The following test case also ICE's during forwprop: svint32_t foo() { int32x4_t v = (int32x4_t) {1, 2, 3, 4}; svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); return v2; } foo2.c: In function ‘foo’: foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ 9 | } | ^ svint32_t int32x4_t v2_4 = { 1, 2, 3, 4 }; because simplify_permutation folds VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > into: vector_cst {1, 2, 3, 4} and it complains during verify_gimple_assign_single because we don't support assignment of vector_cst to VLA vector. I guess the issue really is that currently, only VEC_PERM_EXPR supports lhs and rhs to have vector types with differing lengths, and simplifying it to other tree codes, like above, will result in type errors ? Thanks, Prathamesh > > > > > Alternatively, should we allow assignments from fixed-width to SVE > > vector, so the above > > VIEW_CONVERT_EXPR would result in dup ? > > > > Thanks, > > Prathamesh
On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > > > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > Hi Richard, > > > For the following test: > > > > > > svint32_t f2(int a, int b, int c, int d) > > > { > > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > } > > > > > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > foo.c: In function ‘f2’: > > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > 4 | svint32_t f2(int a, int b, int c, int d) > > > | ^~ > > > svint32_t > > > __Int32x4_t > > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > during GIMPLE pass: forwprop > > > dump file: foo.c.109t.forwprop2 > > > foo.c:4:11: internal compiler error: verify_gimple failed > > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > ../../gcc/gcc/tree-cfg.cc:5568 > > > 0xe9371f execute_function_todo > > > ../../gcc/gcc/passes.cc:2091 > > > 0xe93ccb execute_todo > > > ../../gcc/gcc/passes.cc:2145 > > > > > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > int32x4_t v; > > > __Int32x4_t _1; > > > svint32_t _9; > > > vector(4) int _11; > > > > > > <bb 2> : > > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > v_12 = _1; > > > _11 = v_12; > > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > return _9; > > > > > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > view_convert_expr, > > > and the end result becomes: > > > svint32_t _7; > > > __Int32x4_t _8; > > > > > > ;; basic block 2, loop depth 0 > > > ;; pred: ENTRY > > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > return _7; > > > ;; succ: EXIT > > > > > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > has incompatible types (svint32_t, int32x4_t). > > > > > > The attached patch disables simplification of VEC_PERM_EXPR > > > in simplify_permutation, if lhs and rhs have non compatible types, > > > which resolves ICE, but am not sure if it's the correct approach ? > > > > It for sure papers over the issue. I think the error happens earlier, > > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > which is the type of the LHS. But then you probably run into the > > different sizes ICE (VLA vs constant size). I think for this case you > > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > selecting the "low" part of the VLA vector. > Hi Richard, > Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > represent dup operation > from fixed width to VLA vector. I am not sure how folding it to > BIT_FIELD_REF will work. > Could you please elaborate ? > > Also, the issue doesn't seem restricted to this case. > The following test case also ICE's during forwprop: > svint32_t foo() > { > int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > return v2; > } > > foo2.c: In function ‘foo’: > foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > 9 | } > | ^ > svint32_t > int32x4_t > v2_4 = { 1, 2, 3, 4 }; > > because simplify_permutation folds > VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > into: > vector_cst {1, 2, 3, 4} > > and it complains during verify_gimple_assign_single because we don't > support assignment of vector_cst to VLA vector. > > I guess the issue really is that currently, only VEC_PERM_EXPR > supports lhs and rhs > to have vector types with differing lengths, and simplifying it to > other tree codes, like above, > will result in type errors ? That might be the case - Richard should know. If so your type check is still too late, you should instead recognize that we are permuting a VLA vector and then refuse to go any of the non-VEC_PERM generating paths - that probably means just allowing the code == VEC_PERM_EXPR case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? Richard. > > Thanks, > Prathamesh > > > > > > > > Alternatively, should we allow assignments from fixed-width to SVE > > > vector, so the above > > > VIEW_CONVERT_EXPR would result in dup ? > > > > > > Thanks, > > > Prathamesh
Richard Biener <richard.guenther@gmail.com> writes: > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: >> >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: >> > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches >> > <gcc-patches@gcc.gnu.org> wrote: >> > > >> > > Hi Richard, >> > > For the following test: >> > > >> > > svint32_t f2(int a, int b, int c, int d) >> > > { >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); >> > > } >> > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: >> > > foo.c: In function ‘f2’: >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ >> > > 4 | svint32_t f2(int a, int b, int c, int d) >> > > | ^~ >> > > svint32_t >> > > __Int32x4_t >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); >> > > during GIMPLE pass: forwprop >> > > dump file: foo.c.109t.forwprop2 >> > > foo.c:4:11: internal compiler error: verify_gimple failed >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) >> > > ../../gcc/gcc/tree-cfg.cc:5568 >> > > 0xe9371f execute_function_todo >> > > ../../gcc/gcc/passes.cc:2091 >> > > 0xe93ccb execute_todo >> > > ../../gcc/gcc/passes.cc:2145 >> > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: >> > > int32x4_t v; >> > > __Int32x4_t _1; >> > > svint32_t _9; >> > > vector(4) int _11; >> > > >> > > <bb 2> : >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; >> > > v_12 = _1; >> > > _11 = v_12; >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; >> > > return _9; >> > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to >> > > view_convert_expr, >> > > and the end result becomes: >> > > svint32_t _7; >> > > __Int32x4_t _8; >> > > >> > > ;; basic block 2, loop depth 0 >> > > ;; pred: ENTRY >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); >> > > return _7; >> > > ;; succ: EXIT >> > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR >> > > has incompatible types (svint32_t, int32x4_t). >> > > >> > > The attached patch disables simplification of VEC_PERM_EXPR >> > > in simplify_permutation, if lhs and rhs have non compatible types, >> > > which resolves ICE, but am not sure if it's the correct approach ? >> > >> > It for sure papers over the issue. I think the error happens earlier, >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR >> > which is the type of the LHS. But then you probably run into the >> > different sizes ICE (VLA vs constant size). I think for this case you >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, >> > selecting the "low" part of the VLA vector. >> Hi Richard, >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to >> represent dup operation >> from fixed width to VLA vector. I am not sure how folding it to >> BIT_FIELD_REF will work. >> Could you please elaborate ? >> >> Also, the issue doesn't seem restricted to this case. >> The following test case also ICE's during forwprop: >> svint32_t foo() >> { >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); >> return v2; >> } >> >> foo2.c: In function ‘foo’: >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ >> 9 | } >> | ^ >> svint32_t >> int32x4_t >> v2_4 = { 1, 2, 3, 4 }; >> >> because simplify_permutation folds >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > >> into: >> vector_cst {1, 2, 3, 4} >> >> and it complains during verify_gimple_assign_single because we don't >> support assignment of vector_cst to VLA vector. >> >> I guess the issue really is that currently, only VEC_PERM_EXPR >> supports lhs and rhs >> to have vector types with differing lengths, and simplifying it to >> other tree codes, like above, >> will result in type errors ? > > That might be the case - Richard should know. I don't see anything particularly special about VEC_PERM_EXPR here, or about the VLA vs. VLS thing. We would have the same issue trying to build a 128-bit vector from 2 64-bit vectors. And there are other tree codes whose input types are/can be different from their output types. So it just seems like a normal type correctness issue: a VEC_PERM_EXPR of type T needs to be replaced by something of type T. Whether T has a constant size or a variable size doesn't matter. > If so your type check > is still too late, you should instead recognize that we are permuting > a VLA vector and then refuse to go any of the non-VEC_PERM generating > paths - that probably means just allowing the code == VEC_PERM_EXPR > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? Yeah. If we're talking about the match.pd code, I think only: (if (sel.series_p (0, 1, 0, 1)) { op0; } (if (sel.series_p (0, 1, nelts, 1)) { op1; } need a type compatibility check. For fold_vec_perm I think we should just rearrange: gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) return NULL_TREE; so that the assert comes after the early-out. It would be good at some point to relax fold_vec_perm to cases where the outputs are a different length from the inputs, since the all-constant VEC_PERM_EXPR above could be folded to a VECTOR_CST. Thanks, Richard
On Thu, 14 Jul 2022 at 17:22, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Richard Biener <richard.guenther@gmail.com> writes: > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > <prathamesh.kulkarni@linaro.org> wrote: > >> > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > >> > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > >> > <gcc-patches@gcc.gnu.org> wrote: > >> > > > >> > > Hi Richard, > >> > > For the following test: > >> > > > >> > > svint32_t f2(int a, int b, int c, int d) > >> > > { > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > >> > > } > >> > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > >> > > foo.c: In function ‘f2’: > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > >> > > | ^~ > >> > > svint32_t > >> > > __Int32x4_t > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > >> > > during GIMPLE pass: forwprop > >> > > dump file: foo.c.109t.forwprop2 > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > >> > > 0xe9371f execute_function_todo > >> > > ../../gcc/gcc/passes.cc:2091 > >> > > 0xe93ccb execute_todo > >> > > ../../gcc/gcc/passes.cc:2145 > >> > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > >> > > int32x4_t v; > >> > > __Int32x4_t _1; > >> > > svint32_t _9; > >> > > vector(4) int _11; > >> > > > >> > > <bb 2> : > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > >> > > v_12 = _1; > >> > > _11 = v_12; > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > >> > > return _9; > >> > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > >> > > view_convert_expr, > >> > > and the end result becomes: > >> > > svint32_t _7; > >> > > __Int32x4_t _8; > >> > > > >> > > ;; basic block 2, loop depth 0 > >> > > ;; pred: ENTRY > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > >> > > return _7; > >> > > ;; succ: EXIT > >> > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > >> > > has incompatible types (svint32_t, int32x4_t). > >> > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > >> > > which resolves ICE, but am not sure if it's the correct approach ? > >> > > >> > It for sure papers over the issue. I think the error happens earlier, > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > >> > which is the type of the LHS. But then you probably run into the > >> > different sizes ICE (VLA vs constant size). I think for this case you > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > >> > selecting the "low" part of the VLA vector. > >> Hi Richard, > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > >> represent dup operation > >> from fixed width to VLA vector. I am not sure how folding it to > >> BIT_FIELD_REF will work. > >> Could you please elaborate ? > >> > >> Also, the issue doesn't seem restricted to this case. > >> The following test case also ICE's during forwprop: > >> svint32_t foo() > >> { > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > >> return v2; > >> } > >> > >> foo2.c: In function ‘foo’: > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > >> 9 | } > >> | ^ > >> svint32_t > >> int32x4_t > >> v2_4 = { 1, 2, 3, 4 }; > >> > >> because simplify_permutation folds > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > >> into: > >> vector_cst {1, 2, 3, 4} > >> > >> and it complains during verify_gimple_assign_single because we don't > >> support assignment of vector_cst to VLA vector. > >> > >> I guess the issue really is that currently, only VEC_PERM_EXPR > >> supports lhs and rhs > >> to have vector types with differing lengths, and simplifying it to > >> other tree codes, like above, > >> will result in type errors ? > > > > That might be the case - Richard should know. > > I don't see anything particularly special about VEC_PERM_EXPR here, > or about the VLA vs. VLS thing. We would have the same issue trying > to build a 128-bit vector from 2 64-bit vectors. And there are other > tree codes whose input types are/can be different from their output > types. > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > of type T needs to be replaced by something of type T. Whether T has a > constant size or a variable size doesn't matter. > > > If so your type check > > is still too late, you should instead recognize that we are permuting > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > paths - that probably means just allowing the code == VEC_PERM_EXPR > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? > > Yeah. If we're talking about the match.pd code, I think only: > > (if (sel.series_p (0, 1, 0, 1)) > { op0; } > (if (sel.series_p (0, 1, nelts, 1)) > { op1; } > > need a type compatibility check. For fold_vec_perm I think > we should just rearrange: > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > return NULL_TREE; > > so that the assert comes after the early-out. > > It would be good at some point to relax fold_vec_perm to cases where the > outputs are a different length from the inputs, since the all-constant > VEC_PERM_EXPR above could be folded to a VECTOR_CST. Hi, For the above case, I think the issue is that simplify_permutation uses TREE_TYPE (arg0) for res_type, while it should now use type for lhs. /* Shuffle of a constructor. */ bool ret = false; tree res_type = TREE_TYPE (arg0); tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)), resolves the ICE, and emits all constant VEC_PERM_EXPR: v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; return v2_4; Does the patch look OK to commit after bootstrap+test ? I will try to address the folding for above VEC_PERM_EXPR in follow-up patch. Thanks, Prathamesh > > Thanks, > Richard diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc index 69567ab3275..b320ece13f2 100644 --- a/gcc/tree-ssa-forwprop.cc +++ b/gcc/tree-ssa-forwprop.cc @@ -2589,7 +2589,7 @@ simplify_permutation (gimple_stmt_iterator *gsi) /* Shuffle of a constructor. */ bool ret = false; - tree res_type = TREE_TYPE (arg0); + tree res_type = TREE_TYPE (gimple_get_lhs (stmt)); tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); if (!opt || (TREE_CODE (opt) != CONSTRUCTOR && TREE_CODE (opt) != VECTOR_CST))
On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > <richard.sandiford@arm.com> wrote: > > > > Richard Biener <richard.guenther@gmail.com> writes: > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > <prathamesh.kulkarni@linaro.org> wrote: > > >> > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > >> > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > >> > <gcc-patches@gcc.gnu.org> wrote: > > >> > > > > >> > > Hi Richard, > > >> > > For the following test: > > >> > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > >> > > { > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > >> > > } > > >> > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > >> > > foo.c: In function ‘f2’: > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > >> > > | ^~ > > >> > > svint32_t > > >> > > __Int32x4_t > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > >> > > during GIMPLE pass: forwprop > > >> > > dump file: foo.c.109t.forwprop2 > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > >> > > 0xe9371f execute_function_todo > > >> > > ../../gcc/gcc/passes.cc:2091 > > >> > > 0xe93ccb execute_todo > > >> > > ../../gcc/gcc/passes.cc:2145 > > >> > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > >> > > int32x4_t v; > > >> > > __Int32x4_t _1; > > >> > > svint32_t _9; > > >> > > vector(4) int _11; > > >> > > > > >> > > <bb 2> : > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > >> > > v_12 = _1; > > >> > > _11 = v_12; > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > >> > > return _9; > > >> > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > >> > > view_convert_expr, > > >> > > and the end result becomes: > > >> > > svint32_t _7; > > >> > > __Int32x4_t _8; > > >> > > > > >> > > ;; basic block 2, loop depth 0 > > >> > > ;; pred: ENTRY > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > >> > > return _7; > > >> > > ;; succ: EXIT > > >> > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > >> > > has incompatible types (svint32_t, int32x4_t). > > >> > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > >> > > > >> > It for sure papers over the issue. I think the error happens earlier, > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > >> > which is the type of the LHS. But then you probably run into the > > >> > different sizes ICE (VLA vs constant size). I think for this case you > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > >> > selecting the "low" part of the VLA vector. > > >> Hi Richard, > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > >> represent dup operation > > >> from fixed width to VLA vector. I am not sure how folding it to > > >> BIT_FIELD_REF will work. > > >> Could you please elaborate ? > > >> > > >> Also, the issue doesn't seem restricted to this case. > > >> The following test case also ICE's during forwprop: > > >> svint32_t foo() > > >> { > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > >> return v2; > > >> } > > >> > > >> foo2.c: In function ‘foo’: > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > >> 9 | } > > >> | ^ > > >> svint32_t > > >> int32x4_t > > >> v2_4 = { 1, 2, 3, 4 }; > > >> > > >> because simplify_permutation folds > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > >> into: > > >> vector_cst {1, 2, 3, 4} > > >> > > >> and it complains during verify_gimple_assign_single because we don't > > >> support assignment of vector_cst to VLA vector. > > >> > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > >> supports lhs and rhs > > >> to have vector types with differing lengths, and simplifying it to > > >> other tree codes, like above, > > >> will result in type errors ? > > > > > > That might be the case - Richard should know. > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > or about the VLA vs. VLS thing. We would have the same issue trying > > to build a 128-bit vector from 2 64-bit vectors. And there are other > > tree codes whose input types are/can be different from their output > > types. > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > > of type T needs to be replaced by something of type T. Whether T has a > > constant size or a variable size doesn't matter. > > > > > If so your type check > > > is still too late, you should instead recognize that we are permuting > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > > paths - that probably means just allowing the code == VEC_PERM_EXPR > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? > > > > Yeah. If we're talking about the match.pd code, I think only: > > > > (if (sel.series_p (0, 1, 0, 1)) > > { op0; } > > (if (sel.series_p (0, 1, nelts, 1)) > > { op1; } > > > > need a type compatibility check. For fold_vec_perm I think > > we should just rearrange: > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > > return NULL_TREE; > > > > so that the assert comes after the early-out. > > > > It would be good at some point to relax fold_vec_perm to cases where the > > outputs are a different length from the inputs, since the all-constant > > VEC_PERM_EXPR above could be folded to a VECTOR_CST. > Hi, > For the above case, I think the issue is that simplify_permutation > uses TREE_TYPE (arg0) for res_type, > while it should now use type for lhs. > > /* Shuffle of a constructor. */ > bool ret = false; > tree res_type = TREE_TYPE (arg0); > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)), > resolves the ICE, and emits all constant VEC_PERM_EXPR: > > v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; > return v2_4; > > Does the patch look OK to commit after bootstrap+test ? Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt). > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch. > > Thanks, > Prathamesh > > > > Thanks, > > Richard
On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > <richard.sandiford@arm.com> wrote: > > > > > > Richard Biener <richard.guenther@gmail.com> writes: > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > >> > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > > >> > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > > >> > <gcc-patches@gcc.gnu.org> wrote: > > > >> > > > > > >> > > Hi Richard, > > > >> > > For the following test: > > > >> > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > >> > > { > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > >> > > } > > > >> > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > >> > > foo.c: In function ‘f2’: > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > >> > > | ^~ > > > >> > > svint32_t > > > >> > > __Int32x4_t > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > >> > > during GIMPLE pass: forwprop > > > >> > > dump file: foo.c.109t.forwprop2 > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > >> > > 0xe9371f execute_function_todo > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > >> > > 0xe93ccb execute_todo > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > >> > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > >> > > int32x4_t v; > > > >> > > __Int32x4_t _1; > > > >> > > svint32_t _9; > > > >> > > vector(4) int _11; > > > >> > > > > > >> > > <bb 2> : > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > >> > > v_12 = _1; > > > >> > > _11 = v_12; > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > >> > > return _9; > > > >> > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > >> > > view_convert_expr, > > > >> > > and the end result becomes: > > > >> > > svint32_t _7; > > > >> > > __Int32x4_t _8; > > > >> > > > > > >> > > ;; basic block 2, loop depth 0 > > > >> > > ;; pred: ENTRY > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > >> > > return _7; > > > >> > > ;; succ: EXIT > > > >> > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > >> > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > > >> > > > > >> > It for sure papers over the issue. I think the error happens earlier, > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > > >> > which is the type of the LHS. But then you probably run into the > > > >> > different sizes ICE (VLA vs constant size). I think for this case you > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > >> > selecting the "low" part of the VLA vector. > > > >> Hi Richard, > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > >> represent dup operation > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > >> BIT_FIELD_REF will work. > > > >> Could you please elaborate ? > > > >> > > > >> Also, the issue doesn't seem restricted to this case. > > > >> The following test case also ICE's during forwprop: > > > >> svint32_t foo() > > > >> { > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > >> return v2; > > > >> } > > > >> > > > >> foo2.c: In function ‘foo’: > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > > >> 9 | } > > > >> | ^ > > > >> svint32_t > > > >> int32x4_t > > > >> v2_4 = { 1, 2, 3, 4 }; > > > >> > > > >> because simplify_permutation folds > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > > >> into: > > > >> vector_cst {1, 2, 3, 4} > > > >> > > > >> and it complains during verify_gimple_assign_single because we don't > > > >> support assignment of vector_cst to VLA vector. > > > >> > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > > >> supports lhs and rhs > > > >> to have vector types with differing lengths, and simplifying it to > > > >> other tree codes, like above, > > > >> will result in type errors ? > > > > > > > > That might be the case - Richard should know. > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > > or about the VLA vs. VLS thing. We would have the same issue trying > > > to build a 128-bit vector from 2 64-bit vectors. And there are other > > > tree codes whose input types are/can be different from their output > > > types. > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > > > of type T needs to be replaced by something of type T. Whether T has a > > > constant size or a variable size doesn't matter. > > > > > > > If so your type check > > > > is still too late, you should instead recognize that we are permuting > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? > > > > > > Yeah. If we're talking about the match.pd code, I think only: > > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > { op0; } > > > (if (sel.series_p (0, 1, nelts, 1)) > > > { op1; } > > > > > > need a type compatibility check. For fold_vec_perm I think > > > we should just rearrange: > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > > > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > > > return NULL_TREE; > > > > > > so that the assert comes after the early-out. > > > > > > It would be good at some point to relax fold_vec_perm to cases where the > > > outputs are a different length from the inputs, since the all-constant > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST. > > Hi, > > For the above case, I think the issue is that simplify_permutation > > uses TREE_TYPE (arg0) for res_type, > > while it should now use type for lhs. > > > > /* Shuffle of a constructor. */ > > bool ret = false; > > tree res_type = TREE_TYPE (arg0); > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)), > > resolves the ICE, and emits all constant VEC_PERM_EXPR: > > > > v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; > > return v2_4; > > > > Does the patch look OK to commit after bootstrap+test ? > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt). Hi, I committed the patch but unfortunately it caused PR106360. The issue is that for slp-reduc-sad-2.c on ppc64le, simplify_permutation sees the following during forwprop4: _78 = (void *) ivtmp.21_73; _92 = MEM <unsigned long> [(uint8_t *)_78]; _91 = {_92, 0}; vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; _87 = {_88, 0}; vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; So for, tree res_type = TREE_TYPE (gimple_assign_lhs (stmt)); tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); we have: res_type = V16QI arg0 = {_92, 0} arg1 = {_88, 0} op2 = {0, 2} and thus we hit the following assert in fold_vec_perm: gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16. If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead, it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR: _92 = MEM <unsigned long> [(uint8_t *)_78]; _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; _5 = {_92, _88}; vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5); I suppose it's legal to cast vector of one type to another as long as sizes match ? IIUC, the above VIEW_CONVERT_EXPR will result in: vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0, 0, 0, 0, 0, 0, 0 } ? In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks if lhs_type and res_type differ but have same size, and in that case emit: lhs = VIEW_CONVERT_EXPR<lhs_type> (opt), instead of: lhs = VIEW_CONVERT_EXPR<op1 type> (opt) where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2> Does it look OK ? Thanks, Prathamesh > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch. > > > > Thanks, > > Prathamesh > > > > > > Thanks, > > > Richard diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc index d04cf4bccf8..6019d9b03ff 100644 --- a/gcc/tree-ssa-forwprop.cc +++ b/gcc/tree-ssa-forwprop.cc @@ -2661,18 +2661,21 @@ simplify_permutation (gimple_stmt_iterator *gsi) /* Shuffle of a constructor. */ bool ret = false; - tree res_type = TREE_TYPE (gimple_assign_lhs (stmt)); + tree res_type = TREE_TYPE (arg0); tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); if (!opt || (TREE_CODE (opt) != CONSTRUCTOR && TREE_CODE (opt) != VECTOR_CST)) return 0; /* Found VIEW_CONVERT_EXPR before, need one explicit conversion. */ - if (res_type != TREE_TYPE (op0)) + tree lhs_type = TREE_TYPE (gimple_assign_lhs (stmt)); + if (res_type != lhs_type) { + if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type))) + return 0; tree name = make_ssa_name (TREE_TYPE (opt)); gimple *ass_stmt = gimple_build_assign (name, opt); gsi_insert_before (gsi, ass_stmt, GSI_SAME_STMT); - opt = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (op0), name); + opt = build1 (VIEW_CONVERT_EXPR, lhs_type, name); } gimple_assign_set_rhs_from_tree (gsi, opt); update_stmt (gsi_stmt (*gsi));
On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote: > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > <richard.sandiford@arm.com> wrote: > > > > > > > > Richard Biener <richard.guenther@gmail.com> writes: > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > >> > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > > > >> > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > > > >> > <gcc-patches@gcc.gnu.org> wrote: > > > > >> > > > > > > >> > > Hi Richard, > > > > >> > > For the following test: > > > > >> > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > >> > > { > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > >> > > } > > > > >> > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > > >> > > foo.c: In function ‘f2’: > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > >> > > | ^~ > > > > >> > > svint32_t > > > > >> > > __Int32x4_t > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > >> > > during GIMPLE pass: forwprop > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > >> > > 0xe9371f execute_function_todo > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > >> > > 0xe93ccb execute_todo > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > >> > > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > > >> > > int32x4_t v; > > > > >> > > __Int32x4_t _1; > > > > >> > > svint32_t _9; > > > > >> > > vector(4) int _11; > > > > >> > > > > > > >> > > <bb 2> : > > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > >> > > v_12 = _1; > > > > >> > > _11 = v_12; > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > > >> > > return _9; > > > > >> > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > > >> > > view_convert_expr, > > > > >> > > and the end result becomes: > > > > >> > > svint32_t _7; > > > > >> > > __Int32x4_t _8; > > > > >> > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > >> > > ;; pred: ENTRY > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > >> > > return _7; > > > > >> > > ;; succ: EXIT > > > > >> > > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > >> > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > > > >> > > > > > >> > It for sure papers over the issue. I think the error happens earlier, > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > > > >> > which is the type of the LHS. But then you probably run into the > > > > >> > different sizes ICE (VLA vs constant size). I think for this case you > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > > >> > selecting the "low" part of the VLA vector. > > > > >> Hi Richard, > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > > >> represent dup operation > > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > > >> BIT_FIELD_REF will work. > > > > >> Could you please elaborate ? > > > > >> > > > > >> Also, the issue doesn't seem restricted to this case. > > > > >> The following test case also ICE's during forwprop: > > > > >> svint32_t foo() > > > > >> { > > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > >> return v2; > > > > >> } > > > > >> > > > > >> foo2.c: In function ‘foo’: > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > > > >> 9 | } > > > > >> | ^ > > > > >> svint32_t > > > > >> int32x4_t > > > > >> v2_4 = { 1, 2, 3, 4 }; > > > > >> > > > > >> because simplify_permutation folds > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > > > >> into: > > > > >> vector_cst {1, 2, 3, 4} > > > > >> > > > > >> and it complains during verify_gimple_assign_single because we don't > > > > >> support assignment of vector_cst to VLA vector. > > > > >> > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > > > >> supports lhs and rhs > > > > >> to have vector types with differing lengths, and simplifying it to > > > > >> other tree codes, like above, > > > > >> will result in type errors ? > > > > > > > > > > That might be the case - Richard should know. > > > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > > > or about the VLA vs. VLS thing. We would have the same issue trying > > > > to build a 128-bit vector from 2 64-bit vectors. And there are other > > > > tree codes whose input types are/can be different from their output > > > > types. > > > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > > > > of type T needs to be replaced by something of type T. Whether T has a > > > > constant size or a variable size doesn't matter. > > > > > > > > > If so your type check > > > > > is still too late, you should instead recognize that we are permuting > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? > > > > > > > > Yeah. If we're talking about the match.pd code, I think only: > > > > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > { op0; } > > > > (if (sel.series_p (0, 1, nelts, 1)) > > > > { op1; } > > > > > > > > need a type compatibility check. For fold_vec_perm I think > > > > we should just rearrange: > > > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > > > > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > > > > return NULL_TREE; > > > > > > > > so that the assert comes after the early-out. > > > > > > > > It would be good at some point to relax fold_vec_perm to cases where the > > > > outputs are a different length from the inputs, since the all-constant > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST. > > > Hi, > > > For the above case, I think the issue is that simplify_permutation > > > uses TREE_TYPE (arg0) for res_type, > > > while it should now use type for lhs. > > > > > > /* Shuffle of a constructor. */ > > > bool ret = false; > > > tree res_type = TREE_TYPE (arg0); > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)), > > > resolves the ICE, and emits all constant VEC_PERM_EXPR: > > > > > > v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; > > > return v2_4; > > > > > > Does the patch look OK to commit after bootstrap+test ? > > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt). > Hi, > I committed the patch but unfortunately it caused PR106360. > The issue is that for slp-reduc-sad-2.c on ppc64le, > simplify_permutation sees the following during forwprop4: > > _78 = (void *) ivtmp.21_73; > _92 = MEM <unsigned long> [(uint8_t *)_78]; > _91 = {_92, 0}; > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > _87 = {_88, 0}; > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > So for, > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt)); > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > we have: > res_type = V16QI > arg0 = {_92, 0} > arg1 = {_88, 0} > op2 = {0, 2} > > and thus we hit the following assert in fold_vec_perm: > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16. > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead, > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR: > _92 = MEM <unsigned long> [(uint8_t *)_78]; > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > _5 = {_92, _88}; > vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5); > > I suppose it's legal to cast vector of one type to another as long as > sizes match ? > > IIUC, the above VIEW_CONVERT_EXPR will result in: > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0, > 0, 0, 0, 0, 0, 0 } ? > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks > if lhs_type and res_type differ but have same size, and in that case emit: > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt), > instead of: > lhs = VIEW_CONVERT_EXPR<op1 type> (opt) > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2> > > Does it look OK ? Definitely the original change was bogus. + if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type))) + return 0; just repeats your very original change though ... I'll note that fold_ternary will ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by returning NULL_TREE on cases it does not handle. I think what should be done is, in the /* If there are any VIEW_CONVERT_EXPRs found when finding permutation operands source, check whether it's valid to transform and prepare the required new operands. */ if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) { ... path also transform the expected result type. It should remain V_C_E compatible to TREE_TYPE (lhs) but get a new element type. But as said, tree fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel) { unsigned int i; unsigned HOST_WIDE_INT nelts; bool need_ctor = false; if (!sel.length ().is_constant (&nelts)) return NULL_TREE; gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary doesn't guard according to those asserts (I think we should extend fold_vec_perm to support the new constraints). Richard. > Thanks, > Prathamesh > > > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch. > > > > > > Thanks, > > > Prathamesh > > > > > > > > Thanks, > > > > Richard
On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote: > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > > <richard.sandiford@arm.com> wrote: > > > > > > > > > > Richard Biener <richard.guenther@gmail.com> writes: > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > >> > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > >> > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > > > > >> > <gcc-patches@gcc.gnu.org> wrote: > > > > > >> > > > > > > > >> > > Hi Richard, > > > > > >> > > For the following test: > > > > > >> > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > > >> > > { > > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > >> > > } > > > > > >> > > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > > > >> > > foo.c: In function ‘f2’: > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > > >> > > | ^~ > > > > > >> > > svint32_t > > > > > >> > > __Int32x4_t > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > >> > > during GIMPLE pass: forwprop > > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > > >> > > 0xe9371f execute_function_todo > > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > > >> > > 0xe93ccb execute_todo > > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > > >> > > > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > > > >> > > int32x4_t v; > > > > > >> > > __Int32x4_t _1; > > > > > >> > > svint32_t _9; > > > > > >> > > vector(4) int _11; > > > > > >> > > > > > > > >> > > <bb 2> : > > > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > > >> > > v_12 = _1; > > > > > >> > > _11 = v_12; > > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > > > >> > > return _9; > > > > > >> > > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > > > >> > > view_convert_expr, > > > > > >> > > and the end result becomes: > > > > > >> > > svint32_t _7; > > > > > >> > > __Int32x4_t _8; > > > > > >> > > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > > >> > > ;; pred: ENTRY > > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > >> > > return _7; > > > > > >> > > ;; succ: EXIT > > > > > >> > > > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > > >> > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > > > > >> > > > > > > >> > It for sure papers over the issue. I think the error happens earlier, > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > > > > >> > which is the type of the LHS. But then you probably run into the > > > > > >> > different sizes ICE (VLA vs constant size). I think for this case you > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > > > >> > selecting the "low" part of the VLA vector. > > > > > >> Hi Richard, > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > > > >> represent dup operation > > > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > > > >> BIT_FIELD_REF will work. > > > > > >> Could you please elaborate ? > > > > > >> > > > > > >> Also, the issue doesn't seem restricted to this case. > > > > > >> The following test case also ICE's during forwprop: > > > > > >> svint32_t foo() > > > > > >> { > > > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > >> return v2; > > > > > >> } > > > > > >> > > > > > >> foo2.c: In function ‘foo’: > > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > > > > >> 9 | } > > > > > >> | ^ > > > > > >> svint32_t > > > > > >> int32x4_t > > > > > >> v2_4 = { 1, 2, 3, 4 }; > > > > > >> > > > > > >> because simplify_permutation folds > > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > > > > >> into: > > > > > >> vector_cst {1, 2, 3, 4} > > > > > >> > > > > > >> and it complains during verify_gimple_assign_single because we don't > > > > > >> support assignment of vector_cst to VLA vector. > > > > > >> > > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > > > > >> supports lhs and rhs > > > > > >> to have vector types with differing lengths, and simplifying it to > > > > > >> other tree codes, like above, > > > > > >> will result in type errors ? > > > > > > > > > > > > That might be the case - Richard should know. > > > > > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > > > > or about the VLA vs. VLS thing. We would have the same issue trying > > > > > to build a 128-bit vector from 2 64-bit vectors. And there are other > > > > > tree codes whose input types are/can be different from their output > > > > > types. > > > > > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > > > > > of type T needs to be replaced by something of type T. Whether T has a > > > > > constant size or a variable size doesn't matter. > > > > > > > > > > > If so your type check > > > > > > is still too late, you should instead recognize that we are permuting > > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR > > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? > > > > > > > > > > Yeah. If we're talking about the match.pd code, I think only: > > > > > > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > > { op0; } > > > > > (if (sel.series_p (0, 1, nelts, 1)) > > > > > { op1; } > > > > > > > > > > need a type compatibility check. For fold_vec_perm I think > > > > > we should just rearrange: > > > > > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > > > > > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > > > > > return NULL_TREE; > > > > > > > > > > so that the assert comes after the early-out. > > > > > > > > > > It would be good at some point to relax fold_vec_perm to cases where the > > > > > outputs are a different length from the inputs, since the all-constant > > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST. > > > > Hi, > > > > For the above case, I think the issue is that simplify_permutation > > > > uses TREE_TYPE (arg0) for res_type, > > > > while it should now use type for lhs. > > > > > > > > /* Shuffle of a constructor. */ > > > > bool ret = false; > > > > tree res_type = TREE_TYPE (arg0); > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)), > > > > resolves the ICE, and emits all constant VEC_PERM_EXPR: > > > > > > > > v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; > > > > return v2_4; > > > > > > > > Does the patch look OK to commit after bootstrap+test ? > > > > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt). > > Hi, > > I committed the patch but unfortunately it caused PR106360. > > The issue is that for slp-reduc-sad-2.c on ppc64le, > > simplify_permutation sees the following during forwprop4: > > > > _78 = (void *) ivtmp.21_73; > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > _91 = {_92, 0}; > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > _87 = {_88, 0}; > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > So for, > > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt)); > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > we have: > > res_type = V16QI > > arg0 = {_92, 0} > > arg1 = {_88, 0} > > op2 = {0, 2} > > > > and thus we hit the following assert in fold_vec_perm: > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16. > > > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead, > > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR: > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > _5 = {_92, _88}; > > vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5); > > > > I suppose it's legal to cast vector of one type to another as long as > > sizes match ? > > > > IIUC, the above VIEW_CONVERT_EXPR will result in: > > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0, > > 0, 0, 0, 0, 0, 0 } ? > > > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks > > if lhs_type and res_type differ but have same size, and in that case emit: > > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt), > > instead of: > > lhs = VIEW_CONVERT_EXPR<op1 type> (opt) > > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2> > > > > Does it look OK ? > > Definitely the original change was bogus. > > + if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type))) > + return 0; > > just repeats your very original change though ... I'll note that > fold_ternary will > ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by > returning NULL_TREE on cases it does not handle. > > I think what should be done is, in the > > /* If there are any VIEW_CONVERT_EXPRs found when finding permutation > operands source, check whether it's valid to transform and prepare > the required new operands. */ > if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > { > ... > > path also transform the expected result type. It should remain V_C_E compatible > to TREE_TYPE (lhs) but get a new element type. > > But as said, > > tree > fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel) > { > unsigned int i; > unsigned HOST_WIDE_INT nelts; > bool need_ctor = false; > > if (!sel.length ().is_constant (&nelts)) > return NULL_TREE; > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary > doesn't guard according to those asserts (I think we should extend fold_vec_perm > to support the new constraints). Hi Richard, Thanks for the suggestions and sorry for late reply. I reverted the change to simplify_permutaton which resolved the ppc64le case ICE. The attached patch extends fold_vec_perm to handle vectors with differing lengths. For, lhs = vec_perm_expr<arg0, arg1, mask>, the patch: (a) asserts lhs and mask have same vector length. (b) asserts arg0 and arg1 have same vector length. (c) returns NULL_TREE if element type differs for lhs, arg0 and arg1. (d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1 if the mask has npatterns == len(arg0) and nelts_per_pattern == 1. The intent is to permute arg0 and arg1, and then to dup elements in result to target vector length. So for eg: vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}> will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6} replicated thru-out. Does it look OK ? With the patch, we don't ICE for either of the aarch64 tests above. For, svint32_t f1() { int32x4_t v = {1, 2, 3, 4}; return svld1rq_s32 (svptrue_b8 (), &v[0]); } optimized dump shows: svint32_t f1 () { int32x4_t v; <bb 2> : return { 1, 2, 3, 4, ... }; } code-gen: f1: .LFB3900: .cfi_startproc ptrue p0.b, all adrp x0, .LC0 add x0, x0, :lo12:.LC0 ld1rqw z0.s, p0/z, [x0] ret .LC0: .word 1 .word 2 .word 3 .word 4 I guess for this particular case, we could use index instead. For, svint32_t f2(int a, int b, int c, int d) { int32x4_t v = {a, b, c, d}; return svld1rq_s32 (svptrue_b8 (), &v[0]); } optimized dump shows: svint32_t f2 (int a, int b, int c, int d) { svint32_t _6; <bb 2> [local count: 1073741824]: _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... }; return _6; The code-gen seems pretty bad however: f2: .LFB3901: .cfi_startproc addvl sp, sp, #-4 .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22 ptrue p0.b, all addvl x4, sp, #3 mov z0.b, #0 st1w z0.s, p0, [sp, #3, mul vl] str w0, [x4] addvl x0, sp, #2 ld1w z0.s, p0/z, [sp, #3, mul vl] st1w z0.s, p0, [sp, #2, mul vl] str w1, [x0, 4] addvl x0, sp, #1 ld1w z0.s, p0/z, [sp, #2, mul vl] st1w z0.s, p0, [sp, #1, mul vl] str w2, [x0, 8] ld1w z0.s, p0/z, [sp, #1, mul vl] st1w z0.s, p0, [sp] str w3, [sp, 12] ld1w z0.s, p0/z, [sp] addvl sp, sp, #4 .cfi_def_cfa_offset 0 ret I will try to address code-gen issues in follow up patches. Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu. Thanks, Prathamesh > > Richard. > > > Thanks, > > Prathamesh > > > > > > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch. > > > > > > > > Thanks, > > > > Prathamesh > > > > > > > > > > Thanks, > > > > > Richard diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 99021a82df4..6912de9b43c 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -10541,15 +10541,33 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel) unsigned HOST_WIDE_INT nelts; bool need_ctor = false; - if (!sel.length ().is_constant (&nelts)) - return NULL_TREE; - gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) - && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) - && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), + sel.length ())); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)))); + if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) return NULL_TREE; + /* If result vector has greater length than input vector, + then allow permuting two vectors as long as: + a) sel.nelts_per_pattern == 1 + b) sel.npatterns == len of input vector. + The intent is to permute input vectors, and + dup the elements in resulting vector to target vector length. */ + + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) + { + nelts = sel.encoding ().npatterns (); + if (sel.encoding ().nelts_per_pattern () != 1 + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) + return NULL_TREE; + } + else if (!sel.length ().is_constant (&nelts)) + return NULL_TREE; + tree *in_elts = XALLOCAVEC (tree, nelts * 2); if (!vec_cst_ctor_to_array (arg0, nelts, in_elts) || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts)) diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index 47371d8bcbe..7e706857f43 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags, pp_space (pp); } } + if (VECTOR_TYPE_P (TREE_TYPE (node)) + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) + pp_string (pp, ", ... "); pp_right_brace (pp); } break;
On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote: > > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > > > <richard.sandiford@arm.com> wrote: > > > > > > > > > > > > Richard Biener <richard.guenther@gmail.com> writes: > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > >> > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > >> > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > > > > > >> > <gcc-patches@gcc.gnu.org> wrote: > > > > > > >> > > > > > > > > >> > > Hi Richard, > > > > > > >> > > For the following test: > > > > > > >> > > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > > > >> > > { > > > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > >> > > } > > > > > > >> > > > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > > > > >> > > foo.c: In function ‘f2’: > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > > > >> > > | ^~ > > > > > > >> > > svint32_t > > > > > > >> > > __Int32x4_t > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > >> > > during GIMPLE pass: forwprop > > > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > > > >> > > 0xe9371f execute_function_todo > > > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > > > >> > > 0xe93ccb execute_todo > > > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > > > >> > > > > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > > > > >> > > int32x4_t v; > > > > > > >> > > __Int32x4_t _1; > > > > > > >> > > svint32_t _9; > > > > > > >> > > vector(4) int _11; > > > > > > >> > > > > > > > > >> > > <bb 2> : > > > > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > > > >> > > v_12 = _1; > > > > > > >> > > _11 = v_12; > > > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > > > > >> > > return _9; > > > > > > >> > > > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > > > > >> > > view_convert_expr, > > > > > > >> > > and the end result becomes: > > > > > > >> > > svint32_t _7; > > > > > > >> > > __Int32x4_t _8; > > > > > > >> > > > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > > > >> > > ;; pred: ENTRY > > > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > >> > > return _7; > > > > > > >> > > ;; succ: EXIT > > > > > > >> > > > > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > > > >> > > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > > > > > >> > > > > > > > >> > It for sure papers over the issue. I think the error happens earlier, > > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > > > > > >> > which is the type of the LHS. But then you probably run into the > > > > > > >> > different sizes ICE (VLA vs constant size). I think for this case you > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > > > > >> > selecting the "low" part of the VLA vector. > > > > > > >> Hi Richard, > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > > > > >> represent dup operation > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > > > > >> BIT_FIELD_REF will work. > > > > > > >> Could you please elaborate ? > > > > > > >> > > > > > > >> Also, the issue doesn't seem restricted to this case. > > > > > > >> The following test case also ICE's during forwprop: > > > > > > >> svint32_t foo() > > > > > > >> { > > > > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > >> return v2; > > > > > > >> } > > > > > > >> > > > > > > >> foo2.c: In function ‘foo’: > > > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > > > > > >> 9 | } > > > > > > >> | ^ > > > > > > >> svint32_t > > > > > > >> int32x4_t > > > > > > >> v2_4 = { 1, 2, 3, 4 }; > > > > > > >> > > > > > > >> because simplify_permutation folds > > > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > > > > > >> into: > > > > > > >> vector_cst {1, 2, 3, 4} > > > > > > >> > > > > > > >> and it complains during verify_gimple_assign_single because we don't > > > > > > >> support assignment of vector_cst to VLA vector. > > > > > > >> > > > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > > > > > >> supports lhs and rhs > > > > > > >> to have vector types with differing lengths, and simplifying it to > > > > > > >> other tree codes, like above, > > > > > > >> will result in type errors ? > > > > > > > > > > > > > > That might be the case - Richard should know. > > > > > > > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > > > > > or about the VLA vs. VLS thing. We would have the same issue trying > > > > > > to build a 128-bit vector from 2 64-bit vectors. And there are other > > > > > > tree codes whose input types are/can be different from their output > > > > > > types. > > > > > > > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > > > > > > of type T needs to be replaced by something of type T. Whether T has a > > > > > > constant size or a variable size doesn't matter. > > > > > > > > > > > > > If so your type check > > > > > > > is still too late, you should instead recognize that we are permuting > > > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR > > > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? > > > > > > > > > > > > Yeah. If we're talking about the match.pd code, I think only: > > > > > > > > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > > > { op0; } > > > > > > (if (sel.series_p (0, 1, nelts, 1)) > > > > > > { op1; } > > > > > > > > > > > > need a type compatibility check. For fold_vec_perm I think > > > > > > we should just rearrange: > > > > > > > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > > > > > > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > > > > > > return NULL_TREE; > > > > > > > > > > > > so that the assert comes after the early-out. > > > > > > > > > > > > It would be good at some point to relax fold_vec_perm to cases where the > > > > > > outputs are a different length from the inputs, since the all-constant > > > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST. > > > > > Hi, > > > > > For the above case, I think the issue is that simplify_permutation > > > > > uses TREE_TYPE (arg0) for res_type, > > > > > while it should now use type for lhs. > > > > > > > > > > /* Shuffle of a constructor. */ > > > > > bool ret = false; > > > > > tree res_type = TREE_TYPE (arg0); > > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)), > > > > > resolves the ICE, and emits all constant VEC_PERM_EXPR: > > > > > > > > > > v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; > > > > > return v2_4; > > > > > > > > > > Does the patch look OK to commit after bootstrap+test ? > > > > > > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt). > > > Hi, > > > I committed the patch but unfortunately it caused PR106360. > > > The issue is that for slp-reduc-sad-2.c on ppc64le, > > > simplify_permutation sees the following during forwprop4: > > > > > > _78 = (void *) ivtmp.21_73; > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > _91 = {_92, 0}; > > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > _87 = {_88, 0}; > > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > So for, > > > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt)); > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > we have: > > > res_type = V16QI > > > arg0 = {_92, 0} > > > arg1 = {_88, 0} > > > op2 = {0, 2} > > > > > > and thus we hit the following assert in fold_vec_perm: > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16. > > > > > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead, > > > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR: > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > _5 = {_92, _88}; > > > vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5); > > > > > > I suppose it's legal to cast vector of one type to another as long as > > > sizes match ? > > > > > > IIUC, the above VIEW_CONVERT_EXPR will result in: > > > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0, > > > 0, 0, 0, 0, 0, 0 } ? > > > > > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks > > > if lhs_type and res_type differ but have same size, and in that case emit: > > > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt), > > > instead of: > > > lhs = VIEW_CONVERT_EXPR<op1 type> (opt) > > > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2> > > > > > > Does it look OK ? > > > > Definitely the original change was bogus. > > > > + if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type))) > > + return 0; > > > > just repeats your very original change though ... I'll note that > > fold_ternary will > > ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by > > returning NULL_TREE on cases it does not handle. > > > > I think what should be done is, in the > > > > /* If there are any VIEW_CONVERT_EXPRs found when finding permutation > > operands source, check whether it's valid to transform and prepare > > the required new operands. */ > > if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > { > > ... > > > > path also transform the expected result type. It should remain V_C_E compatible > > to TREE_TYPE (lhs) but get a new element type. > > > > But as said, > > > > tree > > fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel) > > { > > unsigned int i; > > unsigned HOST_WIDE_INT nelts; > > bool need_ctor = false; > > > > if (!sel.length ().is_constant (&nelts)) > > return NULL_TREE; > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary > > doesn't guard according to those asserts (I think we should extend fold_vec_perm > > to support the new constraints). > Hi Richard, > Thanks for the suggestions and sorry for late reply. I reverted the change to > simplify_permutaton which resolved the ppc64le case ICE. > > The attached patch extends fold_vec_perm to handle vectors with > differing lengths. > For, > lhs = vec_perm_expr<arg0, arg1, mask>, > the patch: > (a) asserts lhs and mask have same vector length. > (b) asserts arg0 and arg1 have same vector length. > (c) returns NULL_TREE if element type differs for lhs, arg0 and arg1. > (d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1 > if the mask has npatterns == len(arg0) and nelts_per_pattern == 1. > The intent is to permute arg0 and arg1, and then to dup elements in result > to target vector length. > So for eg: > vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}> > will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6} > replicated thru-out. > Does it look OK ? > > With the patch, we don't ICE for either of the aarch64 tests above. > For, > svint32_t f1() > { > int32x4_t v = {1, 2, 3, 4}; > return svld1rq_s32 (svptrue_b8 (), &v[0]); > } > > optimized dump shows: > svint32_t f1 () > { > int32x4_t v; > > <bb 2> : > return { 1, 2, 3, 4, ... }; > > } > > code-gen: > f1: > .LFB3900: > .cfi_startproc > ptrue p0.b, all > adrp x0, .LC0 > add x0, x0, :lo12:.LC0 > ld1rqw z0.s, p0/z, [x0] > ret > .LC0: > .word 1 > .word 2 > .word 3 > .word 4 > > I guess for this particular case, we could use index instead. > > For, > svint32_t f2(int a, int b, int c, int d) > { > int32x4_t v = {a, b, c, d}; > return svld1rq_s32 (svptrue_b8 (), &v[0]); > } > > optimized dump shows: > svint32_t f2 (int a, int b, int c, int d) > { > svint32_t _6; > > <bb 2> [local count: 1073741824]: > _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... }; > return _6; > > The code-gen seems pretty bad however: > f2: > .LFB3901: > .cfi_startproc > addvl sp, sp, #-4 > .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22 > ptrue p0.b, all > addvl x4, sp, #3 > mov z0.b, #0 > st1w z0.s, p0, [sp, #3, mul vl] > str w0, [x4] > addvl x0, sp, #2 > ld1w z0.s, p0/z, [sp, #3, mul vl] > st1w z0.s, p0, [sp, #2, mul vl] > str w1, [x0, 4] > addvl x0, sp, #1 > ld1w z0.s, p0/z, [sp, #2, mul vl] > st1w z0.s, p0, [sp, #1, mul vl] > str w2, [x0, 8] > ld1w z0.s, p0/z, [sp, #1, mul vl] > st1w z0.s, p0, [sp] > str w3, [sp, 12] > ld1w z0.s, p0/z, [sp] > addvl sp, sp, #4 > .cfi_def_cfa_offset 0 > ret > > I will try to address code-gen issues in follow up patches. > Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu. /* If result vector has greater length than input vector, + then allow permuting two vectors as long as: + a) sel.nelts_per_pattern == 1 + b) sel.npatterns == len of input vector. + The intent is to permute input vectors, and + dup the elements in resulting vector to target vector length. */ + + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) + { + nelts = sel.encoding ().npatterns (); + if (sel.encoding ().nelts_per_pattern () != 1 + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) + return NULL_TREE; + } so the only case you add is non-VLA to VLA and there explicitely only the case of a period that's same as the element count in the input vectors. @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags, pp_space (pp); } } + if (VECTOR_TYPE_P (TREE_TYPE (node)) + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) + pp_string (pp, ", ... "); pp_right_brace (pp); btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? I had hoped that you would make tree *in_elts = XALLOCAVEC (tree, nelts * 2); if (!vec_cst_ctor_to_array (arg0, nelts, in_elts) || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts)) return NULL_TREE; VLA agnostic, thus support for example permuting { 0, 2, 4, 8, ... }, { 1, 3, 5, 7 ... }, { 0, 4, 1, 5 ... } as { 0, 1, 2, 3, ... }, etc. that should be entirely doable, to somebody familiar with VLA and the APIs even more so. Richard. > Thanks, > Prathamesh > > > > > Richard. > > > > > Thanks, > > > Prathamesh > > > > > > > > > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch. > > > > > > > > > > Thanks, > > > > > Prathamesh > > > > > > > > > > > > Thanks, > > > > > > Richard
On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> wrote: > > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > > > On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > > > > <richard.sandiford@arm.com> wrote: > > > > > > > > > > > > > > Richard Biener <richard.guenther@gmail.com> writes: > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > >> > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > >> > > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > > > > > > >> > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > >> > > > > > > > > > >> > > Hi Richard, > > > > > > > >> > > For the following test: > > > > > > > >> > > > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > > > > >> > > { > > > > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > >> > > } > > > > > > > >> > > > > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > > > > > >> > > foo.c: In function ‘f2’: > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > > > > >> > > | ^~ > > > > > > > >> > > svint32_t > > > > > > > >> > > __Int32x4_t > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > >> > > during GIMPLE pass: forwprop > > > > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > > > > >> > > 0xe9371f execute_function_todo > > > > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > > > > >> > > 0xe93ccb execute_todo > > > > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > > > > >> > > > > > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > > > > > >> > > int32x4_t v; > > > > > > > >> > > __Int32x4_t _1; > > > > > > > >> > > svint32_t _9; > > > > > > > >> > > vector(4) int _11; > > > > > > > >> > > > > > > > > > >> > > <bb 2> : > > > > > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > > > > >> > > v_12 = _1; > > > > > > > >> > > _11 = v_12; > > > > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > > > > > >> > > return _9; > > > > > > > >> > > > > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > > > > > >> > > view_convert_expr, > > > > > > > >> > > and the end result becomes: > > > > > > > >> > > svint32_t _7; > > > > > > > >> > > __Int32x4_t _8; > > > > > > > >> > > > > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > > > > >> > > ;; pred: ENTRY > > > > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > >> > > return _7; > > > > > > > >> > > ;; succ: EXIT > > > > > > > >> > > > > > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > > > > >> > > > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > > > > > > >> > > > > > > > > >> > It for sure papers over the issue. I think the error happens earlier, > > > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > > > > > > >> > which is the type of the LHS. But then you probably run into the > > > > > > > >> > different sizes ICE (VLA vs constant size). I think for this case you > > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > > > > > >> > selecting the "low" part of the VLA vector. > > > > > > > >> Hi Richard, > > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > > > > > >> represent dup operation > > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > > > > > >> BIT_FIELD_REF will work. > > > > > > > >> Could you please elaborate ? > > > > > > > >> > > > > > > > >> Also, the issue doesn't seem restricted to this case. > > > > > > > >> The following test case also ICE's during forwprop: > > > > > > > >> svint32_t foo() > > > > > > > >> { > > > > > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > > > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > >> return v2; > > > > > > > >> } > > > > > > > >> > > > > > > > >> foo2.c: In function ‘foo’: > > > > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > > > > > > >> 9 | } > > > > > > > >> | ^ > > > > > > > >> svint32_t > > > > > > > >> int32x4_t > > > > > > > >> v2_4 = { 1, 2, 3, 4 }; > > > > > > > >> > > > > > > > >> because simplify_permutation folds > > > > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > > > > > > >> into: > > > > > > > >> vector_cst {1, 2, 3, 4} > > > > > > > >> > > > > > > > >> and it complains during verify_gimple_assign_single because we don't > > > > > > > >> support assignment of vector_cst to VLA vector. > > > > > > > >> > > > > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > > > > > > >> supports lhs and rhs > > > > > > > >> to have vector types with differing lengths, and simplifying it to > > > > > > > >> other tree codes, like above, > > > > > > > >> will result in type errors ? > > > > > > > > > > > > > > > > That might be the case - Richard should know. > > > > > > > > > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > > > > > > or about the VLA vs. VLS thing. We would have the same issue trying > > > > > > > to build a 128-bit vector from 2 64-bit vectors. And there are other > > > > > > > tree codes whose input types are/can be different from their output > > > > > > > types. > > > > > > > > > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > > > > > > > of type T needs to be replaced by something of type T. Whether T has a > > > > > > > constant size or a variable size doesn't matter. > > > > > > > > > > > > > > > If so your type check > > > > > > > > is still too late, you should instead recognize that we are permuting > > > > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > > > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR > > > > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? > > > > > > > > > > > > > > Yeah. If we're talking about the match.pd code, I think only: > > > > > > > > > > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > > > > { op0; } > > > > > > > (if (sel.series_p (0, 1, nelts, 1)) > > > > > > > { op1; } > > > > > > > > > > > > > > need a type compatibility check. For fold_vec_perm I think > > > > > > > we should just rearrange: > > > > > > > > > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > > > > > > > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > > > > > > > return NULL_TREE; > > > > > > > > > > > > > > so that the assert comes after the early-out. > > > > > > > > > > > > > > It would be good at some point to relax fold_vec_perm to cases where the > > > > > > > outputs are a different length from the inputs, since the all-constant > > > > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST. > > > > > > Hi, > > > > > > For the above case, I think the issue is that simplify_permutation > > > > > > uses TREE_TYPE (arg0) for res_type, > > > > > > while it should now use type for lhs. > > > > > > > > > > > > /* Shuffle of a constructor. */ > > > > > > bool ret = false; > > > > > > tree res_type = TREE_TYPE (arg0); > > > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > > > > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)), > > > > > > resolves the ICE, and emits all constant VEC_PERM_EXPR: > > > > > > > > > > > > v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; > > > > > > return v2_4; > > > > > > > > > > > > Does the patch look OK to commit after bootstrap+test ? > > > > > > > > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt). > > > > Hi, > > > > I committed the patch but unfortunately it caused PR106360. > > > > The issue is that for slp-reduc-sad-2.c on ppc64le, > > > > simplify_permutation sees the following during forwprop4: > > > > > > > > _78 = (void *) ivtmp.21_73; > > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > > _91 = {_92, 0}; > > > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > > _87 = {_88, 0}; > > > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > > > So for, > > > > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt)); > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > > > we have: > > > > res_type = V16QI > > > > arg0 = {_92, 0} > > > > arg1 = {_88, 0} > > > > op2 = {0, 2} > > > > > > > > and thus we hit the following assert in fold_vec_perm: > > > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16. > > > > > > > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead, > > > > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR: > > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > > _5 = {_92, _88}; > > > > vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5); > > > > > > > > I suppose it's legal to cast vector of one type to another as long as > > > > sizes match ? > > > > > > > > IIUC, the above VIEW_CONVERT_EXPR will result in: > > > > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0, > > > > 0, 0, 0, 0, 0, 0 } ? > > > > > > > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks > > > > if lhs_type and res_type differ but have same size, and in that case emit: > > > > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt), > > > > instead of: > > > > lhs = VIEW_CONVERT_EXPR<op1 type> (opt) > > > > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2> > > > > > > > > Does it look OK ? > > > > > > Definitely the original change was bogus. > > > > > > + if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type))) > > > + return 0; > > > > > > just repeats your very original change though ... I'll note that > > > fold_ternary will > > > ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by > > > returning NULL_TREE on cases it does not handle. > > > > > > I think what should be done is, in the > > > > > > /* If there are any VIEW_CONVERT_EXPRs found when finding permutation > > > operands source, check whether it's valid to transform and prepare > > > the required new operands. */ > > > if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > { > > > ... > > > > > > path also transform the expected result type. It should remain V_C_E compatible > > > to TREE_TYPE (lhs) but get a new element type. > > > > > > But as said, > > > > > > tree > > > fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel) > > > { > > > unsigned int i; > > > unsigned HOST_WIDE_INT nelts; > > > bool need_ctor = false; > > > > > > if (!sel.length ().is_constant (&nelts)) > > > return NULL_TREE; > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary > > > doesn't guard according to those asserts (I think we should extend fold_vec_perm > > > to support the new constraints). > > Hi Richard, > > Thanks for the suggestions and sorry for late reply. I reverted the change to > > simplify_permutaton which resolved the ppc64le case ICE. > > > > The attached patch extends fold_vec_perm to handle vectors with > > differing lengths. > > For, > > lhs = vec_perm_expr<arg0, arg1, mask>, > > the patch: > > (a) asserts lhs and mask have same vector length. > > (b) asserts arg0 and arg1 have same vector length. > > (c) returns NULL_TREE if element type differs for lhs, arg0 and arg1. > > (d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1 > > if the mask has npatterns == len(arg0) and nelts_per_pattern == 1. > > The intent is to permute arg0 and arg1, and then to dup elements in result > > to target vector length. > > So for eg: > > vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}> > > will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6} > > replicated thru-out. > > Does it look OK ? > > > > With the patch, we don't ICE for either of the aarch64 tests above. > > For, > > svint32_t f1() > > { > > int32x4_t v = {1, 2, 3, 4}; > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > } > > > > optimized dump shows: > > svint32_t f1 () > > { > > int32x4_t v; > > > > <bb 2> : > > return { 1, 2, 3, 4, ... }; > > > > } > > > > code-gen: > > f1: > > .LFB3900: > > .cfi_startproc > > ptrue p0.b, all > > adrp x0, .LC0 > > add x0, x0, :lo12:.LC0 > > ld1rqw z0.s, p0/z, [x0] > > ret > > .LC0: > > .word 1 > > .word 2 > > .word 3 > > .word 4 > > > > I guess for this particular case, we could use index instead. > > > > For, > > svint32_t f2(int a, int b, int c, int d) > > { > > int32x4_t v = {a, b, c, d}; > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > } > > > > optimized dump shows: > > svint32_t f2 (int a, int b, int c, int d) > > { > > svint32_t _6; > > > > <bb 2> [local count: 1073741824]: > > _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... }; > > return _6; > > > > The code-gen seems pretty bad however: > > f2: > > .LFB3901: > > .cfi_startproc > > addvl sp, sp, #-4 > > .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22 > > ptrue p0.b, all > > addvl x4, sp, #3 > > mov z0.b, #0 > > st1w z0.s, p0, [sp, #3, mul vl] > > str w0, [x4] > > addvl x0, sp, #2 > > ld1w z0.s, p0/z, [sp, #3, mul vl] > > st1w z0.s, p0, [sp, #2, mul vl] > > str w1, [x0, 4] > > addvl x0, sp, #1 > > ld1w z0.s, p0/z, [sp, #2, mul vl] > > st1w z0.s, p0, [sp, #1, mul vl] > > str w2, [x0, 8] > > ld1w z0.s, p0/z, [sp, #1, mul vl] > > st1w z0.s, p0, [sp] > > str w3, [sp, 12] > > ld1w z0.s, p0/z, [sp] > > addvl sp, sp, #4 > > .cfi_def_cfa_offset 0 > > ret > > > > I will try to address code-gen issues in follow up patches. > > Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu. > > > /* If result vector has greater length than input vector, > + then allow permuting two vectors as long as: > + a) sel.nelts_per_pattern == 1 > + b) sel.npatterns == len of input vector. > + The intent is to permute input vectors, and > + dup the elements in resulting vector to target vector length. */ > + > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > + { > + nelts = sel.encoding ().npatterns (); > + if (sel.encoding ().nelts_per_pattern () != 1 > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > + return NULL_TREE; > + } > > so the only case you add is non-VLA to VLA and there > explicitely only the case of a period that's same as the > element count in the input vectors. > > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > node, int spc, dump_flags_t flags, > pp_space (pp); > } > } > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > + pp_string (pp, ", ... "); > pp_right_brace (pp); > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? Well, it got created for the following case after folding: svint32_t f2(int a, int b, int c, int d) { int32x4_t v = {a, b, c, d}; return svld1rq_s32 (svptrue_b8 (), &v[0]); } The svld1rq_s32 call gets folded to: v = {a, b, c, d} lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> fold_vec_perm then folds the above VEC_PERM_EXPR to VLA constructor, since elements in v (in_elts) are not constant, and need_ctor is thus true: lhs = {a, b, c, d, ...} I added "..." to make it more explicit that it's a VLA constructor. > > I had hoped that you would make > > tree *in_elts = XALLOCAVEC (tree, nelts * 2); > if (!vec_cst_ctor_to_array (arg0, nelts, in_elts) > || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts)) > return NULL_TREE; > > VLA agnostic, thus support for example permuting > { 0, 2, 4, 8, ... }, { 1, 3, 5, 7 ... }, { 0, 4, 1, 5 ... } > as { 0, 1, 2, 3, ... }, etc. > > that should be entirely doable, to somebody familiar with VLA and the APIs > even more so. Thanks, I will work on adding this case. Thanks, Prathamesh > > Richard. > > > Thanks, > > Prathamesh > > > > > > > > Richard. > > > > > > > Thanks, > > > > Prathamesh > > > > > > > > > > > > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch. > > > > > > > > > > > > Thanks, > > > > > > Prathamesh > > > > > > > > > > > > > > Thanks, > > > > > > > Richard
On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> wrote: > > > > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > > > > > <richard.sandiford@arm.com> wrote: > > > > > > > > > > > > > > > > Richard Biener <richard.guenther@gmail.com> writes: > > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > >> > > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > >> > > > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > > > > > > > >> > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > >> > > > > > > > > > > >> > > Hi Richard, > > > > > > > > >> > > For the following test: > > > > > > > > >> > > > > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > > > > > >> > > { > > > > > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > > >> > > } > > > > > > > > >> > > > > > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > > > > > > >> > > foo.c: In function ‘f2’: > > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > > > > > >> > > | ^~ > > > > > > > > >> > > svint32_t > > > > > > > > >> > > __Int32x4_t > > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > > >> > > during GIMPLE pass: forwprop > > > > > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > > > > > >> > > 0xe9371f execute_function_todo > > > > > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > > > > > >> > > 0xe93ccb execute_todo > > > > > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > > > > > >> > > > > > > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > > > > > > >> > > int32x4_t v; > > > > > > > > >> > > __Int32x4_t _1; > > > > > > > > >> > > svint32_t _9; > > > > > > > > >> > > vector(4) int _11; > > > > > > > > >> > > > > > > > > > > >> > > <bb 2> : > > > > > > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > > > > > >> > > v_12 = _1; > > > > > > > > >> > > _11 = v_12; > > > > > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > > > > > > >> > > return _9; > > > > > > > > >> > > > > > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > > > > > > >> > > view_convert_expr, > > > > > > > > >> > > and the end result becomes: > > > > > > > > >> > > svint32_t _7; > > > > > > > > >> > > __Int32x4_t _8; > > > > > > > > >> > > > > > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > > > > > >> > > ;; pred: ENTRY > > > > > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > > >> > > return _7; > > > > > > > > >> > > ;; succ: EXIT > > > > > > > > >> > > > > > > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > > > > > >> > > > > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > > > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > > > > > > > >> > > > > > > > > > >> > It for sure papers over the issue. I think the error happens earlier, > > > > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > > > > > > > >> > which is the type of the LHS. But then you probably run into the > > > > > > > > >> > different sizes ICE (VLA vs constant size). I think for this case you > > > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > > > > > > >> > selecting the "low" part of the VLA vector. > > > > > > > > >> Hi Richard, > > > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > > > > > > >> represent dup operation > > > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > > > > > > >> BIT_FIELD_REF will work. > > > > > > > > >> Could you please elaborate ? > > > > > > > > >> > > > > > > > > >> Also, the issue doesn't seem restricted to this case. > > > > > > > > >> The following test case also ICE's during forwprop: > > > > > > > > >> svint32_t foo() > > > > > > > > >> { > > > > > > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > > > > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > > >> return v2; > > > > > > > > >> } > > > > > > > > >> > > > > > > > > >> foo2.c: In function ‘foo’: > > > > > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > > > > > > > >> 9 | } > > > > > > > > >> | ^ > > > > > > > > >> svint32_t > > > > > > > > >> int32x4_t > > > > > > > > >> v2_4 = { 1, 2, 3, 4 }; > > > > > > > > >> > > > > > > > > >> because simplify_permutation folds > > > > > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > > > > > > > >> into: > > > > > > > > >> vector_cst {1, 2, 3, 4} > > > > > > > > >> > > > > > > > > >> and it complains during verify_gimple_assign_single because we don't > > > > > > > > >> support assignment of vector_cst to VLA vector. > > > > > > > > >> > > > > > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > > > > > > > >> supports lhs and rhs > > > > > > > > >> to have vector types with differing lengths, and simplifying it to > > > > > > > > >> other tree codes, like above, > > > > > > > > >> will result in type errors ? > > > > > > > > > > > > > > > > > > That might be the case - Richard should know. > > > > > > > > > > > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > > > > > > > or about the VLA vs. VLS thing. We would have the same issue trying > > > > > > > > to build a 128-bit vector from 2 64-bit vectors. And there are other > > > > > > > > tree codes whose input types are/can be different from their output > > > > > > > > types. > > > > > > > > > > > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > > > > > > > > of type T needs to be replaced by something of type T. Whether T has a > > > > > > > > constant size or a variable size doesn't matter. > > > > > > > > > > > > > > > > > If so your type check > > > > > > > > > is still too late, you should instead recognize that we are permuting > > > > > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > > > > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR > > > > > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? > > > > > > > > > > > > > > > > Yeah. If we're talking about the match.pd code, I think only: > > > > > > > > > > > > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > > > > > { op0; } > > > > > > > > (if (sel.series_p (0, 1, nelts, 1)) > > > > > > > > { op1; } > > > > > > > > > > > > > > > > need a type compatibility check. For fold_vec_perm I think > > > > > > > > we should just rearrange: > > > > > > > > > > > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > > > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > > > > > > > > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > > > > > > > > return NULL_TREE; > > > > > > > > > > > > > > > > so that the assert comes after the early-out. > > > > > > > > > > > > > > > > It would be good at some point to relax fold_vec_perm to cases where the > > > > > > > > outputs are a different length from the inputs, since the all-constant > > > > > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST. > > > > > > > Hi, > > > > > > > For the above case, I think the issue is that simplify_permutation > > > > > > > uses TREE_TYPE (arg0) for res_type, > > > > > > > while it should now use type for lhs. > > > > > > > > > > > > > > /* Shuffle of a constructor. */ > > > > > > > bool ret = false; > > > > > > > tree res_type = TREE_TYPE (arg0); > > > > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > > > > > > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)), > > > > > > > resolves the ICE, and emits all constant VEC_PERM_EXPR: > > > > > > > > > > > > > > v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; > > > > > > > return v2_4; > > > > > > > > > > > > > > Does the patch look OK to commit after bootstrap+test ? > > > > > > > > > > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt). > > > > > Hi, > > > > > I committed the patch but unfortunately it caused PR106360. > > > > > The issue is that for slp-reduc-sad-2.c on ppc64le, > > > > > simplify_permutation sees the following during forwprop4: > > > > > > > > > > _78 = (void *) ivtmp.21_73; > > > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > > > _91 = {_92, 0}; > > > > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > > > _87 = {_88, 0}; > > > > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > > > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > > > > > So for, > > > > > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt)); > > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > > > > > we have: > > > > > res_type = V16QI > > > > > arg0 = {_92, 0} > > > > > arg1 = {_88, 0} > > > > > op2 = {0, 2} > > > > > > > > > > and thus we hit the following assert in fold_vec_perm: > > > > > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > > > > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16. > > > > > > > > > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead, > > > > > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR: > > > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > > > _5 = {_92, _88}; > > > > > vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5); > > > > > > > > > > I suppose it's legal to cast vector of one type to another as long as > > > > > sizes match ? > > > > > > > > > > IIUC, the above VIEW_CONVERT_EXPR will result in: > > > > > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0, > > > > > 0, 0, 0, 0, 0, 0 } ? > > > > > > > > > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks > > > > > if lhs_type and res_type differ but have same size, and in that case emit: > > > > > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt), > > > > > instead of: > > > > > lhs = VIEW_CONVERT_EXPR<op1 type> (opt) > > > > > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2> > > > > > > > > > > Does it look OK ? > > > > > > > > Definitely the original change was bogus. > > > > > > > > + if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type))) > > > > + return 0; > > > > > > > > just repeats your very original change though ... I'll note that > > > > fold_ternary will > > > > ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by > > > > returning NULL_TREE on cases it does not handle. > > > > > > > > I think what should be done is, in the > > > > > > > > /* If there are any VIEW_CONVERT_EXPRs found when finding permutation > > > > operands source, check whether it's valid to transform and prepare > > > > the required new operands. */ > > > > if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > { > > > > ... > > > > > > > > path also transform the expected result type. It should remain V_C_E compatible > > > > to TREE_TYPE (lhs) but get a new element type. > > > > > > > > But as said, > > > > > > > > tree > > > > fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel) > > > > { > > > > unsigned int i; > > > > unsigned HOST_WIDE_INT nelts; > > > > bool need_ctor = false; > > > > > > > > if (!sel.length ().is_constant (&nelts)) > > > > return NULL_TREE; > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > > > ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary > > > > doesn't guard according to those asserts (I think we should extend fold_vec_perm > > > > to support the new constraints). > > > Hi Richard, > > > Thanks for the suggestions and sorry for late reply. I reverted the change to > > > simplify_permutaton which resolved the ppc64le case ICE. > > > > > > The attached patch extends fold_vec_perm to handle vectors with > > > differing lengths. > > > For, > > > lhs = vec_perm_expr<arg0, arg1, mask>, > > > the patch: > > > (a) asserts lhs and mask have same vector length. > > > (b) asserts arg0 and arg1 have same vector length. > > > (c) returns NULL_TREE if element type differs for lhs, arg0 and arg1. > > > (d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1 > > > if the mask has npatterns == len(arg0) and nelts_per_pattern == 1. > > > The intent is to permute arg0 and arg1, and then to dup elements in result > > > to target vector length. > > > So for eg: > > > vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}> > > > will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6} > > > replicated thru-out. > > > Does it look OK ? > > > > > > With the patch, we don't ICE for either of the aarch64 tests above. > > > For, > > > svint32_t f1() > > > { > > > int32x4_t v = {1, 2, 3, 4}; > > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > } > > > > > > optimized dump shows: > > > svint32_t f1 () > > > { > > > int32x4_t v; > > > > > > <bb 2> : > > > return { 1, 2, 3, 4, ... }; > > > > > > } > > > > > > code-gen: > > > f1: > > > .LFB3900: > > > .cfi_startproc > > > ptrue p0.b, all > > > adrp x0, .LC0 > > > add x0, x0, :lo12:.LC0 > > > ld1rqw z0.s, p0/z, [x0] > > > ret > > > .LC0: > > > .word 1 > > > .word 2 > > > .word 3 > > > .word 4 > > > > > > I guess for this particular case, we could use index instead. > > > > > > For, > > > svint32_t f2(int a, int b, int c, int d) > > > { > > > int32x4_t v = {a, b, c, d}; > > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > } > > > > > > optimized dump shows: > > > svint32_t f2 (int a, int b, int c, int d) > > > { > > > svint32_t _6; > > > > > > <bb 2> [local count: 1073741824]: > > > _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... }; > > > return _6; > > > > > > The code-gen seems pretty bad however: > > > f2: > > > .LFB3901: > > > .cfi_startproc > > > addvl sp, sp, #-4 > > > .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22 > > > ptrue p0.b, all > > > addvl x4, sp, #3 > > > mov z0.b, #0 > > > st1w z0.s, p0, [sp, #3, mul vl] > > > str w0, [x4] > > > addvl x0, sp, #2 > > > ld1w z0.s, p0/z, [sp, #3, mul vl] > > > st1w z0.s, p0, [sp, #2, mul vl] > > > str w1, [x0, 4] > > > addvl x0, sp, #1 > > > ld1w z0.s, p0/z, [sp, #2, mul vl] > > > st1w z0.s, p0, [sp, #1, mul vl] > > > str w2, [x0, 8] > > > ld1w z0.s, p0/z, [sp, #1, mul vl] > > > st1w z0.s, p0, [sp] > > > str w3, [sp, 12] > > > ld1w z0.s, p0/z, [sp] > > > addvl sp, sp, #4 > > > .cfi_def_cfa_offset 0 > > > ret > > > > > > I will try to address code-gen issues in follow up patches. > > > Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu. > > > > > > /* If result vector has greater length than input vector, > > + then allow permuting two vectors as long as: > > + a) sel.nelts_per_pattern == 1 > > + b) sel.npatterns == len of input vector. > > + The intent is to permute input vectors, and > > + dup the elements in resulting vector to target vector length. */ > > + > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > + { > > + nelts = sel.encoding ().npatterns (); > > + if (sel.encoding ().nelts_per_pattern () != 1 > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > + return NULL_TREE; > > + } > > > > so the only case you add is non-VLA to VLA and there > > explicitely only the case of a period that's same as the > > element count in the input vectors. > > > > > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > node, int spc, dump_flags_t flags, > > pp_space (pp); > > } > > } > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > + pp_string (pp, ", ... "); > > pp_right_brace (pp); > > > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > Well, it got created for the following case after folding: > svint32_t f2(int a, int b, int c, int d) > { > int32x4_t v = {a, b, c, d}; > return svld1rq_s32 (svptrue_b8 (), &v[0]); > } > > The svld1rq_s32 call gets folded to: > v = {a, b, c, d} > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > > fold_vec_perm then folds the above VEC_PERM_EXPR to > VLA constructor, since elements in v (in_elts) are not constant, and > need_ctor is thus true: > lhs = {a, b, c, d, ...} > I added "..." to make it more explicit that it's a VLA constructor. But I doubt we do anything reasonable with such a beast? Do we? I suppose it's like a vec_duplicate if you view it as V1TImode but do we actually make sure to do this duplication? > > > > I had hoped that you would make > > > > tree *in_elts = XALLOCAVEC (tree, nelts * 2); > > if (!vec_cst_ctor_to_array (arg0, nelts, in_elts) > > || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts)) > > return NULL_TREE; > > > > VLA agnostic, thus support for example permuting > > { 0, 2, 4, 8, ... }, { 1, 3, 5, 7 ... }, { 0, 4, 1, 5 ... } > > as { 0, 1, 2, 3, ... }, etc. > > > > that should be entirely doable, to somebody familiar with VLA and the APIs > > even more so. > Thanks, I will work on adding this case. > > Thanks, > Prathamesh > > > > Richard. > > > > > Thanks, > > > Prathamesh > > > > > > > > > > > Richard. > > > > > > > > > Thanks, > > > > > Prathamesh > > > > > > > > > > > > > > > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch. > > > > > > > > > > > > > > Thanks, > > > > > > > Prathamesh > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Richard
On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: > > On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > > > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni > > > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford > > > > > > > > <richard.sandiford@arm.com> wrote: > > > > > > > > > > > > > > > > > > Richard Biener <richard.guenther@gmail.com> writes: > > > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > > > > > > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > >> > > > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > > >> > > > > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > > > > > > > > >> > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > >> > > > > > > > > > > > >> > > Hi Richard, > > > > > > > > > >> > > For the following test: > > > > > > > > > >> > > > > > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d) > > > > > > > > > >> > > { > > > > > > > > > >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > > > > > > > >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > > > >> > > } > > > > > > > > > >> > > > > > > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > > > > > > > >> > > foo.c: In function ‘f2’: > > > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d) > > > > > > > > > >> > > | ^~ > > > > > > > > > >> > > svint32_t > > > > > > > > > >> > > __Int32x4_t > > > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > > > >> > > during GIMPLE pass: forwprop > > > > > > > > > >> > > dump file: foo.c.109t.forwprop2 > > > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed > > > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568 > > > > > > > > > >> > > 0xe9371f execute_function_todo > > > > > > > > > >> > > ../../gcc/gcc/passes.cc:2091 > > > > > > > > > >> > > 0xe93ccb execute_todo > > > > > > > > > >> > > ../../gcc/gcc/passes.cc:2145 > > > > > > > > > >> > > > > > > > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > > > > > > > >> > > int32x4_t v; > > > > > > > > > >> > > __Int32x4_t _1; > > > > > > > > > >> > > svint32_t _9; > > > > > > > > > >> > > vector(4) int _11; > > > > > > > > > >> > > > > > > > > > > > >> > > <bb 2> : > > > > > > > > > >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > > > > > > > >> > > v_12 = _1; > > > > > > > > > >> > > _11 = v_12; > > > > > > > > > >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > > > > > > > >> > > return _9; > > > > > > > > > >> > > > > > > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > > > > > > > >> > > view_convert_expr, > > > > > > > > > >> > > and the end result becomes: > > > > > > > > > >> > > svint32_t _7; > > > > > > > > > >> > > __Int32x4_t _8; > > > > > > > > > >> > > > > > > > > > > > >> > > ;; basic block 2, loop depth 0 > > > > > > > > > >> > > ;; pred: ENTRY > > > > > > > > > >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > > > > > > > >> > > return _7; > > > > > > > > > >> > > ;; succ: EXIT > > > > > > > > > >> > > > > > > > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t). > > > > > > > > > >> > > > > > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR > > > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types, > > > > > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ? > > > > > > > > > >> > > > > > > > > > > >> > It for sure papers over the issue. I think the error happens earlier, > > > > > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > > > > > > > > >> > which is the type of the LHS. But then you probably run into the > > > > > > > > > >> > different sizes ICE (VLA vs constant size). I think for this case you > > > > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > > > > > > > > >> > selecting the "low" part of the VLA vector. > > > > > > > > > >> Hi Richard, > > > > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > > > > > > > > > >> represent dup operation > > > > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to > > > > > > > > > >> BIT_FIELD_REF will work. > > > > > > > > > >> Could you please elaborate ? > > > > > > > > > >> > > > > > > > > > >> Also, the issue doesn't seem restricted to this case. > > > > > > > > > >> The following test case also ICE's during forwprop: > > > > > > > > > >> svint32_t foo() > > > > > > > > > >> { > > > > > > > > > >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > > > > > > > > > >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > > > >> return v2; > > > > > > > > > >> } > > > > > > > > > >> > > > > > > > > > >> foo2.c: In function ‘foo’: > > > > > > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > > > > > > > > > >> 9 | } > > > > > > > > > >> | ^ > > > > > > > > > >> svint32_t > > > > > > > > > >> int32x4_t > > > > > > > > > >> v2_4 = { 1, 2, 3, 4 }; > > > > > > > > > >> > > > > > > > > > >> because simplify_permutation folds > > > > > > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > > > > > > > > > >> into: > > > > > > > > > >> vector_cst {1, 2, 3, 4} > > > > > > > > > >> > > > > > > > > > >> and it complains during verify_gimple_assign_single because we don't > > > > > > > > > >> support assignment of vector_cst to VLA vector. > > > > > > > > > >> > > > > > > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR > > > > > > > > > >> supports lhs and rhs > > > > > > > > > >> to have vector types with differing lengths, and simplifying it to > > > > > > > > > >> other tree codes, like above, > > > > > > > > > >> will result in type errors ? > > > > > > > > > > > > > > > > > > > > That might be the case - Richard should know. > > > > > > > > > > > > > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here, > > > > > > > > > or about the VLA vs. VLS thing. We would have the same issue trying > > > > > > > > > to build a 128-bit vector from 2 64-bit vectors. And there are other > > > > > > > > > tree codes whose input types are/can be different from their output > > > > > > > > > types. > > > > > > > > > > > > > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR > > > > > > > > > of type T needs to be replaced by something of type T. Whether T has a > > > > > > > > > constant size or a variable size doesn't matter. > > > > > > > > > > > > > > > > > > > If so your type check > > > > > > > > > > is still too late, you should instead recognize that we are permuting > > > > > > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating > > > > > > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR > > > > > > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? > > > > > > > > > > > > > > > > > > Yeah. If we're talking about the match.pd code, I think only: > > > > > > > > > > > > > > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > > > > > > { op0; } > > > > > > > > > (if (sel.series_p (0, 1, nelts, 1)) > > > > > > > > > { op1; } > > > > > > > > > > > > > > > > > > need a type compatibility check. For fold_vec_perm I think > > > > > > > > > we should just rearrange: > > > > > > > > > > > > > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > > > > if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) > > > > > > > > > || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) > > > > > > > > > return NULL_TREE; > > > > > > > > > > > > > > > > > > so that the assert comes after the early-out. > > > > > > > > > > > > > > > > > > It would be good at some point to relax fold_vec_perm to cases where the > > > > > > > > > outputs are a different length from the inputs, since the all-constant > > > > > > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST. > > > > > > > > Hi, > > > > > > > > For the above case, I think the issue is that simplify_permutation > > > > > > > > uses TREE_TYPE (arg0) for res_type, > > > > > > > > while it should now use type for lhs. > > > > > > > > > > > > > > > > /* Shuffle of a constructor. */ > > > > > > > > bool ret = false; > > > > > > > > tree res_type = TREE_TYPE (arg0); > > > > > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > > > > > > > > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)), > > > > > > > > resolves the ICE, and emits all constant VEC_PERM_EXPR: > > > > > > > > > > > > > > > > v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; > > > > > > > > return v2_4; > > > > > > > > > > > > > > > > Does the patch look OK to commit after bootstrap+test ? > > > > > > > > > > > > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt). > > > > > > Hi, > > > > > > I committed the patch but unfortunately it caused PR106360. > > > > > > The issue is that for slp-reduc-sad-2.c on ppc64le, > > > > > > simplify_permutation sees the following during forwprop4: > > > > > > > > > > > > _78 = (void *) ivtmp.21_73; > > > > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > > > > _91 = {_92, 0}; > > > > > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > > > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > > > > _87 = {_88, 0}; > > > > > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > > > > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > > > > > > > So for, > > > > > > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt)); > > > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); > > > > > > > > > > > > we have: > > > > > > res_type = V16QI > > > > > > arg0 = {_92, 0} > > > > > > arg1 = {_88, 0} > > > > > > op2 = {0, 2} > > > > > > > > > > > > and thus we hit the following assert in fold_vec_perm: > > > > > > > > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > > > > > > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16. > > > > > > > > > > > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead, > > > > > > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR: > > > > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > > > > _5 = {_92, _88}; > > > > > > vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5); > > > > > > > > > > > > I suppose it's legal to cast vector of one type to another as long as > > > > > > sizes match ? > > > > > > > > > > > > IIUC, the above VIEW_CONVERT_EXPR will result in: > > > > > > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0, > > > > > > 0, 0, 0, 0, 0, 0 } ? > > > > > > > > > > > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks > > > > > > if lhs_type and res_type differ but have same size, and in that case emit: > > > > > > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt), > > > > > > instead of: > > > > > > lhs = VIEW_CONVERT_EXPR<op1 type> (opt) > > > > > > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2> > > > > > > > > > > > > Does it look OK ? > > > > > > > > > > Definitely the original change was bogus. > > > > > > > > > > + if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type))) > > > > > + return 0; > > > > > > > > > > just repeats your very original change though ... I'll note that > > > > > fold_ternary will > > > > > ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by > > > > > returning NULL_TREE on cases it does not handle. > > > > > > > > > > I think what should be done is, in the > > > > > > > > > > /* If there are any VIEW_CONVERT_EXPRs found when finding permutation > > > > > operands source, check whether it's valid to transform and prepare > > > > > the required new operands. */ > > > > > if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > > { > > > > > ... > > > > > > > > > > path also transform the expected result type. It should remain V_C_E compatible > > > > > to TREE_TYPE (lhs) but get a new element type. > > > > > > > > > > But as said, > > > > > > > > > > tree > > > > > fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel) > > > > > { > > > > > unsigned int i; > > > > > unsigned HOST_WIDE_INT nelts; > > > > > bool need_ctor = false; > > > > > > > > > > if (!sel.length ().is_constant (&nelts)) > > > > > return NULL_TREE; > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) > > > > > && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); > > > > > > > > > > ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary > > > > > doesn't guard according to those asserts (I think we should extend fold_vec_perm > > > > > to support the new constraints). > > > > Hi Richard, > > > > Thanks for the suggestions and sorry for late reply. I reverted the change to > > > > simplify_permutaton which resolved the ppc64le case ICE. > > > > > > > > The attached patch extends fold_vec_perm to handle vectors with > > > > differing lengths. > > > > For, > > > > lhs = vec_perm_expr<arg0, arg1, mask>, > > > > the patch: > > > > (a) asserts lhs and mask have same vector length. > > > > (b) asserts arg0 and arg1 have same vector length. > > > > (c) returns NULL_TREE if element type differs for lhs, arg0 and arg1. > > > > (d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1 > > > > if the mask has npatterns == len(arg0) and nelts_per_pattern == 1. > > > > The intent is to permute arg0 and arg1, and then to dup elements in result > > > > to target vector length. > > > > So for eg: > > > > vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}> > > > > will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6} > > > > replicated thru-out. > > > > Does it look OK ? > > > > > > > > With the patch, we don't ICE for either of the aarch64 tests above. > > > > For, > > > > svint32_t f1() > > > > { > > > > int32x4_t v = {1, 2, 3, 4}; > > > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > } > > > > > > > > optimized dump shows: > > > > svint32_t f1 () > > > > { > > > > int32x4_t v; > > > > > > > > <bb 2> : > > > > return { 1, 2, 3, 4, ... }; > > > > > > > > } > > > > > > > > code-gen: > > > > f1: > > > > .LFB3900: > > > > .cfi_startproc > > > > ptrue p0.b, all > > > > adrp x0, .LC0 > > > > add x0, x0, :lo12:.LC0 > > > > ld1rqw z0.s, p0/z, [x0] > > > > ret > > > > .LC0: > > > > .word 1 > > > > .word 2 > > > > .word 3 > > > > .word 4 > > > > > > > > I guess for this particular case, we could use index instead. > > > > > > > > For, > > > > svint32_t f2(int a, int b, int c, int d) > > > > { > > > > int32x4_t v = {a, b, c, d}; > > > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > } > > > > > > > > optimized dump shows: > > > > svint32_t f2 (int a, int b, int c, int d) > > > > { > > > > svint32_t _6; > > > > > > > > <bb 2> [local count: 1073741824]: > > > > _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... }; > > > > return _6; > > > > > > > > The code-gen seems pretty bad however: > > > > f2: > > > > .LFB3901: > > > > .cfi_startproc > > > > addvl sp, sp, #-4 > > > > .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22 > > > > ptrue p0.b, all > > > > addvl x4, sp, #3 > > > > mov z0.b, #0 > > > > st1w z0.s, p0, [sp, #3, mul vl] > > > > str w0, [x4] > > > > addvl x0, sp, #2 > > > > ld1w z0.s, p0/z, [sp, #3, mul vl] > > > > st1w z0.s, p0, [sp, #2, mul vl] > > > > str w1, [x0, 4] > > > > addvl x0, sp, #1 > > > > ld1w z0.s, p0/z, [sp, #2, mul vl] > > > > st1w z0.s, p0, [sp, #1, mul vl] > > > > str w2, [x0, 8] > > > > ld1w z0.s, p0/z, [sp, #1, mul vl] > > > > st1w z0.s, p0, [sp] > > > > str w3, [sp, 12] > > > > ld1w z0.s, p0/z, [sp] > > > > addvl sp, sp, #4 > > > > .cfi_def_cfa_offset 0 > > > > ret > > > > > > > > I will try to address code-gen issues in follow up patches. > > > > Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu. > > > > > > > > > /* If result vector has greater length than input vector, > > > + then allow permuting two vectors as long as: > > > + a) sel.nelts_per_pattern == 1 > > > + b) sel.npatterns == len of input vector. > > > + The intent is to permute input vectors, and > > > + dup the elements in resulting vector to target vector length. */ > > > + > > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > + { > > > + nelts = sel.encoding ().npatterns (); > > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > + return NULL_TREE; > > > + } > > > > > > so the only case you add is non-VLA to VLA and there > > > explicitely only the case of a period that's same as the > > > element count in the input vectors. > > > > > > > > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > node, int spc, dump_flags_t flags, > > > pp_space (pp); > > > } > > > } > > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > + pp_string (pp, ", ... "); > > > pp_right_brace (pp); > > > > > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > Well, it got created for the following case after folding: > > svint32_t f2(int a, int b, int c, int d) > > { > > int32x4_t v = {a, b, c, d}; > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > } > > > > The svld1rq_s32 call gets folded to: > > v = {a, b, c, d} > > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > > > > fold_vec_perm then folds the above VEC_PERM_EXPR to > > VLA constructor, since elements in v (in_elts) are not constant, and > > need_ctor is thus true: > > lhs = {a, b, c, d, ...} > > I added "..." to make it more explicit that it's a VLA constructor. > > But I doubt we do anything reasonable with such a beast? Do we? > I suppose it's like a vec_duplicate if you view it as V1TImode > but do we actually make sure to do this duplication? I am not sure. As mentioned above, the current code-gen for VLA constructor looks pretty bad. Should we avoid folding VLA constructors for now ? I guess these are 2 different issues: (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. (b) Extending fold_vec_perm to handle vectors with differing lengths. For (a), I think the issue with using: res_type = gimple_assign_lhs (stmt) in previous patch, was that op2's type will change to match tgt_units, if we go thru (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, and may thus not be same as len(lhs_type) anymore, and hit the assert in fold_vec_perm. IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the following semantics: (1) Element types for lhs, rhs1 and rhs2 should be the same. (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). The attached patch changes res_type from TREE_TYPE (arg0) to following: res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), TYPE_VECTOR_SUBPARTS (op2)) so it has same element type as arg0 (and arg1) and len of op2. Does that look reasonable ? If we need a cast from res_type to lhs_type, then both would be fixed width vectors with len(lhs_type) being a multiple of len(res_type). IIUC, we don't support casting from VLA vector to/from fixed width vector, or from VLA vector of one type to VLA vector of other type ? Currently, if op2 is VLA, and we enter the branch: (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) then I think it will bail out because op2_units will not be a compile time constant, and constant_multiple_p (op2_units, tgt_units, &factor) would return false. The patch resolves ICE's for aarch64 cases, without regressing the ones in PR106360. Does it look OK ? I have a WIP patch for extending fold_vec_perm to fold VLA vectors, that I will post shortly. Thanks, Prathamesh > > > > > > > I had hoped that you would make > > > > > > tree *in_elts = XALLOCAVEC (tree, nelts * 2); > > > if (!vec_cst_ctor_to_array (arg0, nelts, in_elts) > > > || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts)) > > > return NULL_TREE; > > > > > > VLA agnostic, thus support for example permuting > > > { 0, 2, 4, 8, ... }, { 1, 3, 5, 7 ... }, { 0, 4, 1, 5 ... } > > > as { 0, 1, 2, 3, ... }, etc. > > > > > > that should be entirely doable, to somebody familiar with VLA and the APIs > > > even more so. > > Thanks, I will work on adding this case. > > > > Thanks, > > Prathamesh > > > > > > Richard. > > > > > > > Thanks, > > > > Prathamesh > > > > > > > > > > > > > > Richard. > > > > > > > > > > > Thanks, > > > > > > Prathamesh > > > > > > > > > > > > > > > > > > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Prathamesh > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Richard diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 99021a82df4..bc8ba4607a5 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -10541,15 +10541,24 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel) unsigned HOST_WIDE_INT nelts; bool need_ctor = false; - if (!sel.length ().is_constant (&nelts)) - return NULL_TREE; - gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts) - && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts) - && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts)); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), + sel.length ())); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)))); + if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type) || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type)) return NULL_TREE; + /* TODO: Handle VEC_PERM_EXPR with differing lengths. */ + if (!known_eq (TYPE_VECTOR_SUBPARTS (type), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) + return NULL_TREE; + + /* TODO: Handle VLA vectors. */ + if (!sel.length ().is_constant (&nelts)) + return NULL_TREE; + tree *in_elts = XALLOCAVEC (tree, nelts * 2); if (!vec_cst_ctor_to_array (arg0, nelts, in_elts) || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts)) diff --git a/gcc/match.pd b/gcc/match.pd index 330c1db0c8e..aa20cc713c5 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -7845,6 +7845,12 @@ and, (with { tree op0 = @0, op1 = @1, op2 = @2; + + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); + machine_mode result_mode = TYPE_MODE (type); machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc index fdc4bc8909d..4b693ef095c 100644 --- a/gcc/tree-ssa-forwprop.cc +++ b/gcc/tree-ssa-forwprop.cc @@ -2661,7 +2661,9 @@ simplify_permutation (gimple_stmt_iterator *gsi) /* Shuffle of a constructor. */ bool ret = false; - tree res_type = TREE_TYPE (arg0); + tree res_type + = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))); tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2); if (!opt || (TREE_CODE (opt) != CONSTRUCTOR && TREE_CODE (opt) != VECTOR_CST))
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni >> <prathamesh.kulkarni@linaro.org> wrote: >> > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > > >> > > >> > > /* If result vector has greater length than input vector, >> > > + then allow permuting two vectors as long as: >> > > + a) sel.nelts_per_pattern == 1 >> > > + b) sel.npatterns == len of input vector. >> > > + The intent is to permute input vectors, and >> > > + dup the elements in resulting vector to target vector length. */ >> > > + >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) >> > > + { >> > > + nelts = sel.encoding ().npatterns (); >> > > + if (sel.encoding ().nelts_per_pattern () != 1 >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) >> > > + return NULL_TREE; >> > > + } >> > > >> > > so the only case you add is non-VLA to VLA and there >> > > explicitely only the case of a period that's same as the >> > > element count in the input vectors. >> > > >> > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree >> > > node, int spc, dump_flags_t flags, >> > > pp_space (pp); >> > > } >> > > } >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) >> > > + pp_string (pp, ", ... "); >> > > pp_right_brace (pp); >> > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? >> > Well, it got created for the following case after folding: >> > svint32_t f2(int a, int b, int c, int d) >> > { >> > int32x4_t v = {a, b, c, d}; >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); >> > } >> > >> > The svld1rq_s32 call gets folded to: >> > v = {a, b, c, d} >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> >> > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to >> > VLA constructor, since elements in v (in_elts) are not constant, and >> > need_ctor is thus true: >> > lhs = {a, b, c, d, ...} >> > I added "..." to make it more explicit that it's a VLA constructor. >> >> But I doubt we do anything reasonable with such a beast? Do we? >> I suppose it's like a vec_duplicate if you view it as V1TImode >> but do we actually make sure to do this duplication? > I am not sure. As mentioned above, the current code-gen for VLA > constructor looks pretty bad. > Should we avoid folding VLA constructors for now ? VLA constructors aren't really a thing. At least, the only VLA vector you could represent with current CONSTRUCTOR nodes is a fixed-length sequence at the start of an otherwise zero vector. I'm not sure we even use that though (perhaps we do and I've forgotten). > I guess these are 2 different issues: > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > For (a), I think the issue with using: > res_type = gimple_assign_lhs (stmt) > in previous patch, was that op2's type will change to match tgt_units, > if we go thru > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > and may thus not be same as len(lhs_type) anymore, and hit the assert > in fold_vec_perm. > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > following semantics: > (1) Element types for lhs, rhs1 and rhs2 should be the same. > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). Yeah. > The attached patch changes res_type from TREE_TYPE (arg0) to following: > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > TYPE_VECTOR_SUBPARTS (op2)) > so it has same element type as arg0 (and arg1) and len of op2. > Does that look reasonable ? > > If we need a cast from res_type to lhs_type, then both would be fixed > width vectors > with len(lhs_type) being a multiple of len(res_type). > IIUC, we don't support casting from VLA vector to/from fixed width vector, Yes, that's not supported as a cast. If the compiler knows the length of the "VLA" vector then it's not VLA. If it doesn't know the length of the VLA vector then the sizes could be different (preventing VIEW_CONVERT_EXPR) and the number of elements could be different (preventing pointwise CONVERT_EXPRs). > or from VLA vector of one type to VLA vector of other type ? That's supported though. They work just like VLS vectors: if the sizes are the same then we can use VIEW_CONVERT_EXPR, if the number of elements are the same then we can do pointwise conversions (e.g. element-by-element extensions, truncations, conversions to float, conversions to integer, etc). > Currently, if op2 is VLA, and we enter the branch: > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > then I think it will bail out because op2_units will not be a compile > time constant, > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > The patch resolves ICE's for aarch64 cases, without regressing the > ones in PR106360. > Does it look OK ? Richi should have the final say, but it looks OK in principle to me. I'm not sure it's worth applying independently of the follow-on patch though, if that patch is in the offing anyway. I guess my only question is whether tree-ssa-forwprop.cc really needs to build a new type. Couldn't it just use the TREE_TYPE of the lhs instead? Thanks, Richard
On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: > >> > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > >> <prathamesh.kulkarni@linaro.org> wrote: > >> > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > > > >> > > > >> > > /* If result vector has greater length than input vector, > >> > > + then allow permuting two vectors as long as: > >> > > + a) sel.nelts_per_pattern == 1 > >> > > + b) sel.npatterns == len of input vector. > >> > > + The intent is to permute input vectors, and > >> > > + dup the elements in resulting vector to target vector length. */ > >> > > + > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > >> > > + { > >> > > + nelts = sel.encoding ().npatterns (); > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > >> > > + return NULL_TREE; > >> > > + } > >> > > > >> > > so the only case you add is non-VLA to VLA and there > >> > > explicitely only the case of a period that's same as the > >> > > element count in the input vectors. > >> > > > >> > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > >> > > node, int spc, dump_flags_t flags, > >> > > pp_space (pp); > >> > > } > >> > > } > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > >> > > + pp_string (pp, ", ... "); > >> > > pp_right_brace (pp); > >> > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > >> > Well, it got created for the following case after folding: > >> > svint32_t f2(int a, int b, int c, int d) > >> > { > >> > int32x4_t v = {a, b, c, d}; > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > >> > } > >> > > >> > The svld1rq_s32 call gets folded to: > >> > v = {a, b, c, d} > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > >> > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > >> > VLA constructor, since elements in v (in_elts) are not constant, and > >> > need_ctor is thus true: > >> > lhs = {a, b, c, d, ...} > >> > I added "..." to make it more explicit that it's a VLA constructor. > >> > >> But I doubt we do anything reasonable with such a beast? Do we? > >> I suppose it's like a vec_duplicate if you view it as V1TImode > >> but do we actually make sure to do this duplication? > > I am not sure. As mentioned above, the current code-gen for VLA > > constructor looks pretty bad. > > Should we avoid folding VLA constructors for now ? > > VLA constructors aren't really a thing. At least, the only VLA vector > you could represent with current CONSTRUCTOR nodes is a fixed-length > sequence at the start of an otherwise zero vector. I'm not sure > we even use that though (perhaps we do and I've forgotten). > > > I guess these are 2 different issues: > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > For (a), I think the issue with using: > > res_type = gimple_assign_lhs (stmt) > > in previous patch, was that op2's type will change to match tgt_units, > > if we go thru > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > in fold_vec_perm. > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > > following semantics: > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > Yeah. > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > TYPE_VECTOR_SUBPARTS (op2)) > > so it has same element type as arg0 (and arg1) and len of op2. > > Does that look reasonable ? > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > width vectors > > with len(lhs_type) being a multiple of len(res_type). > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > Yes, that's not supported as a cast. If the compiler knows the > length of the "VLA" vector then it's not VLA. If it doesn't > know the length of the VLA vector then the sizes could be different > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > different (preventing pointwise CONVERT_EXPRs). > > > or from VLA vector of one type to VLA vector of other type ? > > That's supported though. They work just like VLS vectors: if the sizes > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > are the same then we can do pointwise conversions (e.g. element-by-element > extensions, truncations, conversions to float, conversions to integer, etc). > > > Currently, if op2 is VLA, and we enter the branch: > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > then I think it will bail out because op2_units will not be a compile > > time constant, > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > ones in PR106360. > > Does it look OK ? > > Richi should have the final say, but it looks OK in principle to me. > I'm not sure it's worth applying independently of the follow-on patch > though, if that patch is in the offing anyway. > > I guess my only question is whether tree-ssa-forwprop.cc really needs > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > instead? I think the point was they are not necssarily the same when we looked through a VIEW_CONVERT? A comment might be in order here. Btw, please don't add new asserts that trivially hold in critical paths. diff --git a/gcc/match.pd b/gcc/match.pd index 330c1db0c8e..aa20cc713c5 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -7845,6 +7845,12 @@ and, (with { tree op0 = @0, op1 = @1, op2 = @2; + + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); + machine_mode result_mode = TYPE_MODE (type); machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > Thanks, > Richard
On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote: > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: > > >> > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > >> <prathamesh.kulkarni@linaro.org> wrote: > > >> > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > > > > >> > > > > >> > > /* If result vector has greater length than input vector, > > >> > > + then allow permuting two vectors as long as: > > >> > > + a) sel.nelts_per_pattern == 1 > > >> > > + b) sel.npatterns == len of input vector. > > >> > > + The intent is to permute input vectors, and > > >> > > + dup the elements in resulting vector to target vector length. */ > > >> > > + > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > >> > > + { > > >> > > + nelts = sel.encoding ().npatterns (); > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > >> > > + return NULL_TREE; > > >> > > + } > > >> > > > > >> > > so the only case you add is non-VLA to VLA and there > > >> > > explicitely only the case of a period that's same as the > > >> > > element count in the input vectors. > > >> > > > > >> > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > >> > > node, int spc, dump_flags_t flags, > > >> > > pp_space (pp); > > >> > > } > > >> > > } > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > >> > > + pp_string (pp, ", ... "); > > >> > > pp_right_brace (pp); > > >> > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > >> > Well, it got created for the following case after folding: > > >> > svint32_t f2(int a, int b, int c, int d) > > >> > { > > >> > int32x4_t v = {a, b, c, d}; > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > >> > } > > >> > > > >> > The svld1rq_s32 call gets folded to: > > >> > v = {a, b, c, d} > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > > >> > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > >> > need_ctor is thus true: > > >> > lhs = {a, b, c, d, ...} > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > >> > > >> But I doubt we do anything reasonable with such a beast? Do we? > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > >> but do we actually make sure to do this duplication? > > > I am not sure. As mentioned above, the current code-gen for VLA > > > constructor looks pretty bad. > > > Should we avoid folding VLA constructors for now ? > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > sequence at the start of an otherwise zero vector. I'm not sure > > we even use that though (perhaps we do and I've forgotten). > > > > > I guess these are 2 different issues: > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > For (a), I think the issue with using: > > > res_type = gimple_assign_lhs (stmt) > > > in previous patch, was that op2's type will change to match tgt_units, > > > if we go thru > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > in fold_vec_perm. > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > > > following semantics: > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > Yeah. > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > TYPE_VECTOR_SUBPARTS (op2)) > > > so it has same element type as arg0 (and arg1) and len of op2. > > > Does that look reasonable ? > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > width vectors > > > with len(lhs_type) being a multiple of len(res_type). > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > Yes, that's not supported as a cast. If the compiler knows the > > length of the "VLA" vector then it's not VLA. If it doesn't > > know the length of the VLA vector then the sizes could be different > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > different (preventing pointwise CONVERT_EXPRs). > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > That's supported though. They work just like VLS vectors: if the sizes > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > are the same then we can do pointwise conversions (e.g. element-by-element > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > Currently, if op2 is VLA, and we enter the branch: > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > then I think it will bail out because op2_units will not be a compile > > > time constant, > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > ones in PR106360. > > > Does it look OK ? > > > > Richi should have the final say, but it looks OK in principle to me. > > I'm not sure it's worth applying independently of the follow-on patch > > though, if that patch is in the offing anyway. > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > instead? > > I think the point was they are not necssarily the same when we > looked through a VIEW_CONVERT? A comment might be in order > here. Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of VIEW_CONVERT_EXPR. For instance in following case: _78 = (void *) ivtmp.21_73; _92 = MEM <unsigned long> [(uint8_t *)_78]; _91 = {_92, 0}; vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; _87 = {_88, 0}; vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; simplify_permutation looks thru V_C_E, and tries to fold: res = VEC_PERM_EXPR (_87, _91, {0, 2}) In this case, lhs type (V16QI) differs from res type (V2DI), and we hit assert in fold_vec_perm. > > Btw, please don't add new asserts that trivially hold in critical paths. Well, it didn't hold true for following case when op2 is VLA: lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), because we're passing res_type == V4SI, and op2_type == VNx4SI from simplify_permutation. The vec_perm pattern hits: (if (sel.series_p (0, 1, 0, 1)) { op0; } and folds it to: lhs = {1, 2, 3, 4} which results in error during verify_gimple since lhs and rhs types differ (VNx4SI, V4SI). I suppose we want to do the transforms: sel.series_p (0, 1, 0, 1) -> op0 and, sel.series_p (0, 1, nelts, 1) -> op1 only if input vectors and sel have same length ? Thanks, Prathamesh > > diff --git a/gcc/match.pd b/gcc/match.pd > index 330c1db0c8e..aa20cc713c5 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -7845,6 +7845,12 @@ and, > (with > { > tree op0 = @0, op1 = @1, op2 = @2; > + > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > + > machine_mode result_mode = TYPE_MODE (type); > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > Thanks, > > Richard
On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote: > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > > <richard.sandiford@arm.com> wrote: > > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: > > > >> > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > > >> <prathamesh.kulkarni@linaro.org> wrote: > > > >> > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > > > > > >> > > > > > >> > > /* If result vector has greater length than input vector, > > > >> > > + then allow permuting two vectors as long as: > > > >> > > + a) sel.nelts_per_pattern == 1 > > > >> > > + b) sel.npatterns == len of input vector. > > > >> > > + The intent is to permute input vectors, and > > > >> > > + dup the elements in resulting vector to target vector length. */ > > > >> > > + > > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > >> > > + { > > > >> > > + nelts = sel.encoding ().npatterns (); > > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > >> > > + return NULL_TREE; > > > >> > > + } > > > >> > > > > > >> > > so the only case you add is non-VLA to VLA and there > > > >> > > explicitely only the case of a period that's same as the > > > >> > > element count in the input vectors. > > > >> > > > > > >> > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > >> > > node, int spc, dump_flags_t flags, > > > >> > > pp_space (pp); > > > >> > > } > > > >> > > } > > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > >> > > + pp_string (pp, ", ... "); > > > >> > > pp_right_brace (pp); > > > >> > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > > >> > Well, it got created for the following case after folding: > > > >> > svint32_t f2(int a, int b, int c, int d) > > > >> > { > > > >> > int32x4_t v = {a, b, c, d}; > > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > >> > } > > > >> > > > > >> > The svld1rq_s32 call gets folded to: > > > >> > v = {a, b, c, d} > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > > > >> > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > > >> > need_ctor is thus true: > > > >> > lhs = {a, b, c, d, ...} > > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > > >> > > > >> But I doubt we do anything reasonable with such a beast? Do we? > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > > >> but do we actually make sure to do this duplication? > > > > I am not sure. As mentioned above, the current code-gen for VLA > > > > constructor looks pretty bad. > > > > Should we avoid folding VLA constructors for now ? > > > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > > sequence at the start of an otherwise zero vector. I'm not sure > > > we even use that though (perhaps we do and I've forgotten). > > > > > > > I guess these are 2 different issues: > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > > > For (a), I think the issue with using: > > > > res_type = gimple_assign_lhs (stmt) > > > > in previous patch, was that op2's type will change to match tgt_units, > > > > if we go thru > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > > in fold_vec_perm. > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > > > > following semantics: > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > > > Yeah. > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > > TYPE_VECTOR_SUBPARTS (op2)) > > > > so it has same element type as arg0 (and arg1) and len of op2. > > > > Does that look reasonable ? > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > > width vectors > > > > with len(lhs_type) being a multiple of len(res_type). > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > > > Yes, that's not supported as a cast. If the compiler knows the > > > length of the "VLA" vector then it's not VLA. If it doesn't > > > know the length of the VLA vector then the sizes could be different > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > > different (preventing pointwise CONVERT_EXPRs). > > > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > > > That's supported though. They work just like VLS vectors: if the sizes > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > > are the same then we can do pointwise conversions (e.g. element-by-element > > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > > > Currently, if op2 is VLA, and we enter the branch: > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > then I think it will bail out because op2_units will not be a compile > > > > time constant, > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > > ones in PR106360. > > > > Does it look OK ? > > > > > > Richi should have the final say, but it looks OK in principle to me. > > > I'm not sure it's worth applying independently of the follow-on patch > > > though, if that patch is in the offing anyway. > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > > instead? > > > > I think the point was they are not necssarily the same when we > > looked through a VIEW_CONVERT? A comment might be in order > > here. > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of > VIEW_CONVERT_EXPR. For instance in following case: > > _78 = (void *) ivtmp.21_73; > _92 = MEM <unsigned long> [(uint8_t *)_78]; > _91 = {_92, 0}; > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > _87 = {_88, 0}; > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > simplify_permutation looks thru V_C_E, and tries to fold: > res = VEC_PERM_EXPR (_87, _91, {0, 2}) > In this case, lhs type (V16QI) differs from res type (V2DI), and we > hit assert in fold_vec_perm. Oops sorry, just to clarify -- this hit assert in fold_vec_perm only when we used res_type = lhs_type. It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI. The issue with using res_type = TREE_TYPE (arg0), comes for case when arg0 is fixed length, and op2 is VLA. Thanks, Prathamesh > > > > Btw, please don't add new asserts that trivially hold in critical paths. > Well, it didn't hold true for following case when op2 is VLA: > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), > because we're passing res_type == V4SI, and op2_type == VNx4SI from > simplify_permutation. > > The vec_perm pattern hits: > (if (sel.series_p (0, 1, 0, 1)) > { op0; } > > and folds it to: > lhs = {1, 2, 3, 4} > > which results in error during verify_gimple since lhs and rhs types > differ (VNx4SI, V4SI). > > I suppose we want to do the transforms: > sel.series_p (0, 1, 0, 1) -> op0 and, > sel.series_p (0, 1, nelts, 1) -> op1 > only if input vectors and sel have same length ? > > Thanks, > Prathamesh > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index 330c1db0c8e..aa20cc713c5 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -7845,6 +7845,12 @@ and, > > (with > > { > > tree op0 = @0, op1 = @1, op2 = @2; > > + > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > > + > > machine_mode result_mode = TYPE_MODE (type); > > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > > > > > > Thanks, > > > Richard
On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > > > <richard.sandiford@arm.com> wrote: > > > > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: > > > > >> > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > > > >> <prathamesh.kulkarni@linaro.org> wrote: > > > > >> > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > > > > > > >> > > > > > > >> > > /* If result vector has greater length than input vector, > > > > >> > > + then allow permuting two vectors as long as: > > > > >> > > + a) sel.nelts_per_pattern == 1 > > > > >> > > + b) sel.npatterns == len of input vector. > > > > >> > > + The intent is to permute input vectors, and > > > > >> > > + dup the elements in resulting vector to target vector length. */ > > > > >> > > + > > > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > > >> > > + { > > > > >> > > + nelts = sel.encoding ().npatterns (); > > > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > > >> > > + return NULL_TREE; > > > > >> > > + } > > > > >> > > > > > > >> > > so the only case you add is non-VLA to VLA and there > > > > >> > > explicitely only the case of a period that's same as the > > > > >> > > element count in the input vectors. > > > > >> > > > > > > >> > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > > >> > > node, int spc, dump_flags_t flags, > > > > >> > > pp_space (pp); > > > > >> > > } > > > > >> > > } > > > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > > >> > > + pp_string (pp, ", ... "); > > > > >> > > pp_right_brace (pp); > > > > >> > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > > > >> > Well, it got created for the following case after folding: > > > > >> > svint32_t f2(int a, int b, int c, int d) > > > > >> > { > > > > >> > int32x4_t v = {a, b, c, d}; > > > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > >> > } > > > > >> > > > > > >> > The svld1rq_s32 call gets folded to: > > > > >> > v = {a, b, c, d} > > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > > > > >> > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > > > >> > need_ctor is thus true: > > > > >> > lhs = {a, b, c, d, ...} > > > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > > > >> > > > > >> But I doubt we do anything reasonable with such a beast? Do we? > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > > > >> but do we actually make sure to do this duplication? > > > > > I am not sure. As mentioned above, the current code-gen for VLA > > > > > constructor looks pretty bad. > > > > > Should we avoid folding VLA constructors for now ? > > > > > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > > > sequence at the start of an otherwise zero vector. I'm not sure > > > > we even use that though (perhaps we do and I've forgotten). > > > > > > > > > I guess these are 2 different issues: > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > > > > > For (a), I think the issue with using: > > > > > res_type = gimple_assign_lhs (stmt) > > > > > in previous patch, was that op2's type will change to match tgt_units, > > > > > if we go thru > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > > > in fold_vec_perm. > > > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > > > > > following semantics: > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > > > > > Yeah. > > > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > > > TYPE_VECTOR_SUBPARTS (op2)) > > > > > so it has same element type as arg0 (and arg1) and len of op2. > > > > > Does that look reasonable ? > > > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > > > width vectors > > > > > with len(lhs_type) being a multiple of len(res_type). > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > > > > > Yes, that's not supported as a cast. If the compiler knows the > > > > length of the "VLA" vector then it's not VLA. If it doesn't > > > > know the length of the VLA vector then the sizes could be different > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > > > different (preventing pointwise CONVERT_EXPRs). > > > > > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > > > > > That's supported though. They work just like VLS vectors: if the sizes > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > > > are the same then we can do pointwise conversions (e.g. element-by-element > > > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > > > > > Currently, if op2 is VLA, and we enter the branch: > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > > then I think it will bail out because op2_units will not be a compile > > > > > time constant, > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > > > ones in PR106360. > > > > > Does it look OK ? > > > > > > > > Richi should have the final say, but it looks OK in principle to me. > > > > I'm not sure it's worth applying independently of the follow-on patch > > > > though, if that patch is in the offing anyway. > > > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > > > instead? > > > > > > I think the point was they are not necssarily the same when we > > > looked through a VIEW_CONVERT? A comment might be in order > > > here. > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of > > VIEW_CONVERT_EXPR. For instance in following case: > > > > _78 = (void *) ivtmp.21_73; > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > _91 = {_92, 0}; > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > _87 = {_88, 0}; > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > simplify_permutation looks thru V_C_E, and tries to fold: > > res = VEC_PERM_EXPR (_87, _91, {0, 2}) > > In this case, lhs type (V16QI) differs from res type (V2DI), and we > > hit assert in fold_vec_perm. > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only > when we used res_type = lhs_type. > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI. > The issue with using res_type = TREE_TYPE (arg0), comes for case when > arg0 is fixed length, and op2 is VLA. Hi, Shall it be OK to commit the above patch, that changes res_type to have same elem type as arg0, and length as op2 ? Also, shall it be OK to gate these transforms in vec_perm match.pd pattern: sel.series_p (0, 1, 0, 1) -> op0 and, sel.series_p (0, 1, nelts, 1) -> op1 if len(type) == len(arg0) ? Thanks, Prathamesh > > Thanks, > Prathamesh > > > > > > Btw, please don't add new asserts that trivially hold in critical paths. > > Well, it didn't hold true for following case when op2 is VLA: > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), > > because we're passing res_type == V4SI, and op2_type == VNx4SI from > > simplify_permutation. > > > > The vec_perm pattern hits: > > (if (sel.series_p (0, 1, 0, 1)) > > { op0; } > > > > and folds it to: > > lhs = {1, 2, 3, 4} > > > > which results in error during verify_gimple since lhs and rhs types > > differ (VNx4SI, V4SI). > > > > I suppose we want to do the transforms: > > sel.series_p (0, 1, 0, 1) -> op0 and, > > sel.series_p (0, 1, nelts, 1) -> op1 > > only if input vectors and sel have same length ? > > > > Thanks, > > Prathamesh > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > index 330c1db0c8e..aa20cc713c5 100644 > > > --- a/gcc/match.pd > > > +++ b/gcc/match.pd > > > @@ -7845,6 +7845,12 @@ and, > > > (with > > > { > > > tree op0 = @0, op1 = @1, op2 = @2; > > > + > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > > > + > > > machine_mode result_mode = TYPE_MODE (type); > > > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > > > > > > > > > > > Thanks, > > > > Richard
On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > > > > <richard.sandiford@arm.com> wrote: > > > > > > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > >> > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > > > > >> <prathamesh.kulkarni@linaro.org> wrote: > > > > > >> > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > > > > > > > >> > > > > > > > >> > > /* If result vector has greater length than input vector, > > > > > >> > > + then allow permuting two vectors as long as: > > > > > >> > > + a) sel.nelts_per_pattern == 1 > > > > > >> > > + b) sel.npatterns == len of input vector. > > > > > >> > > + The intent is to permute input vectors, and > > > > > >> > > + dup the elements in resulting vector to target vector length. */ > > > > > >> > > + > > > > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > > > >> > > + { > > > > > >> > > + nelts = sel.encoding ().npatterns (); > > > > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > > > >> > > + return NULL_TREE; > > > > > >> > > + } > > > > > >> > > > > > > > >> > > so the only case you add is non-VLA to VLA and there > > > > > >> > > explicitely only the case of a period that's same as the > > > > > >> > > element count in the input vectors. > > > > > >> > > > > > > > >> > > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > > > >> > > node, int spc, dump_flags_t flags, > > > > > >> > > pp_space (pp); > > > > > >> > > } > > > > > >> > > } > > > > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > > > >> > > + pp_string (pp, ", ... "); > > > > > >> > > pp_right_brace (pp); > > > > > >> > > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > > > > >> > Well, it got created for the following case after folding: > > > > > >> > svint32_t f2(int a, int b, int c, int d) > > > > > >> > { > > > > > >> > int32x4_t v = {a, b, c, d}; > > > > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > >> > } > > > > > >> > > > > > > >> > The svld1rq_s32 call gets folded to: > > > > > >> > v = {a, b, c, d} > > > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > > > > > >> > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > > > > >> > need_ctor is thus true: > > > > > >> > lhs = {a, b, c, d, ...} > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > > > > >> > > > > > >> But I doubt we do anything reasonable with such a beast? Do we? > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > > > > >> but do we actually make sure to do this duplication? > > > > > > I am not sure. As mentioned above, the current code-gen for VLA > > > > > > constructor looks pretty bad. > > > > > > Should we avoid folding VLA constructors for now ? > > > > > > > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > > > > sequence at the start of an otherwise zero vector. I'm not sure > > > > > we even use that though (perhaps we do and I've forgotten). > > > > > > > > > > > I guess these are 2 different issues: > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > > > > > > > For (a), I think the issue with using: > > > > > > res_type = gimple_assign_lhs (stmt) > > > > > > in previous patch, was that op2's type will change to match tgt_units, > > > > > > if we go thru > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > > > > in fold_vec_perm. > > > > > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > > > > > > following semantics: > > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > > > > > > > Yeah. > > > > > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > > > > TYPE_VECTOR_SUBPARTS (op2)) > > > > > > so it has same element type as arg0 (and arg1) and len of op2. > > > > > > Does that look reasonable ? > > > > > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > > > > width vectors > > > > > > with len(lhs_type) being a multiple of len(res_type). > > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > > > > > > > Yes, that's not supported as a cast. If the compiler knows the > > > > > length of the "VLA" vector then it's not VLA. If it doesn't > > > > > know the length of the VLA vector then the sizes could be different > > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > > > > different (preventing pointwise CONVERT_EXPRs). > > > > > > > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > > > > > > > That's supported though. They work just like VLS vectors: if the sizes > > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > > > > are the same then we can do pointwise conversions (e.g. element-by-element > > > > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > > > > > > > Currently, if op2 is VLA, and we enter the branch: > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > > > then I think it will bail out because op2_units will not be a compile > > > > > > time constant, > > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > > > > ones in PR106360. > > > > > > Does it look OK ? > > > > > > > > > > Richi should have the final say, but it looks OK in principle to me. > > > > > I'm not sure it's worth applying independently of the follow-on patch > > > > > though, if that patch is in the offing anyway. > > > > > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > > > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > > > > instead? > > > > > > > > I think the point was they are not necssarily the same when we > > > > looked through a VIEW_CONVERT? A comment might be in order > > > > here. > > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of > > > VIEW_CONVERT_EXPR. For instance in following case: > > > > > > _78 = (void *) ivtmp.21_73; > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > _91 = {_92, 0}; > > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > _87 = {_88, 0}; > > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > simplify_permutation looks thru V_C_E, and tries to fold: > > > res = VEC_PERM_EXPR (_87, _91, {0, 2}) > > > In this case, lhs type (V16QI) differs from res type (V2DI), and we > > > hit assert in fold_vec_perm. > > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only > > when we used res_type = lhs_type. > > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI. > > The issue with using res_type = TREE_TYPE (arg0), comes for case when > > arg0 is fixed length, and op2 is VLA. > Hi, > Shall it be OK to commit the above patch, that changes res_type to > have same elem type as arg0, and length as op2 ? > > Also, shall it be OK to gate these transforms in vec_perm match.pd pattern: > sel.series_p (0, 1, 0, 1) -> op0 and, > sel.series_p (0, 1, nelts, 1) -> op1 > if len(type) == len(arg0) ? ping Thanks, Prathamesh > > Thanks, > Prathamesh > > > > Thanks, > > Prathamesh > > > > > > > > Btw, please don't add new asserts that trivially hold in critical paths. > > > Well, it didn't hold true for following case when op2 is VLA: > > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), > > > because we're passing res_type == V4SI, and op2_type == VNx4SI from > > > simplify_permutation. > > > > > > The vec_perm pattern hits: > > > (if (sel.series_p (0, 1, 0, 1)) > > > { op0; } > > > > > > and folds it to: > > > lhs = {1, 2, 3, 4} > > > > > > which results in error during verify_gimple since lhs and rhs types > > > differ (VNx4SI, V4SI). > > > > > > I suppose we want to do the transforms: > > > sel.series_p (0, 1, 0, 1) -> op0 and, > > > sel.series_p (0, 1, nelts, 1) -> op1 > > > only if input vectors and sel have same length ? > > > > > > Thanks, > > > Prathamesh > > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > index 330c1db0c8e..aa20cc713c5 100644 > > > > --- a/gcc/match.pd > > > > +++ b/gcc/match.pd > > > > @@ -7845,6 +7845,12 @@ and, > > > > (with > > > > { > > > > tree op0 = @0, op1 = @1, op2 = @2; > > > > + > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > > > > + > > > > machine_mode result_mode = TYPE_MODE (type); > > > > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > Richard
On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > > > > > <richard.sandiford@arm.com> wrote: > > > > > > > > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > >> > > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > > > > > >> <prathamesh.kulkarni@linaro.org> wrote: > > > > > > >> > > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > > > > > > > > >> > > > > > > > > >> > > /* If result vector has greater length than input vector, > > > > > > >> > > + then allow permuting two vectors as long as: > > > > > > >> > > + a) sel.nelts_per_pattern == 1 > > > > > > >> > > + b) sel.npatterns == len of input vector. > > > > > > >> > > + The intent is to permute input vectors, and > > > > > > >> > > + dup the elements in resulting vector to target vector length. */ > > > > > > >> > > + > > > > > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > > > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > > > > >> > > + { > > > > > > >> > > + nelts = sel.encoding ().npatterns (); > > > > > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > > > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > > > > >> > > + return NULL_TREE; > > > > > > >> > > + } > > > > > > >> > > > > > > > > >> > > so the only case you add is non-VLA to VLA and there > > > > > > >> > > explicitely only the case of a period that's same as the > > > > > > >> > > element count in the input vectors. > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > > > > >> > > node, int spc, dump_flags_t flags, > > > > > > >> > > pp_space (pp); > > > > > > >> > > } > > > > > > >> > > } > > > > > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > > > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > > > > >> > > + pp_string (pp, ", ... "); > > > > > > >> > > pp_right_brace (pp); > > > > > > >> > > > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > > > > > >> > Well, it got created for the following case after folding: > > > > > > >> > svint32_t f2(int a, int b, int c, int d) > > > > > > >> > { > > > > > > >> > int32x4_t v = {a, b, c, d}; > > > > > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > >> > } > > > > > > >> > > > > > > > >> > The svld1rq_s32 call gets folded to: > > > > > > >> > v = {a, b, c, d} > > > > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > > > > > > >> > > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > > > > > >> > need_ctor is thus true: > > > > > > >> > lhs = {a, b, c, d, ...} > > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > > > > > >> > > > > > > >> But I doubt we do anything reasonable with such a beast? Do we? > > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > > > > > >> but do we actually make sure to do this duplication? > > > > > > > I am not sure. As mentioned above, the current code-gen for VLA > > > > > > > constructor looks pretty bad. > > > > > > > Should we avoid folding VLA constructors for now ? > > > > > > > > > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > > > > > sequence at the start of an otherwise zero vector. I'm not sure > > > > > > we even use that though (perhaps we do and I've forgotten). > > > > > > > > > > > > > I guess these are 2 different issues: > > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > > > > > > > > > For (a), I think the issue with using: > > > > > > > res_type = gimple_assign_lhs (stmt) > > > > > > > in previous patch, was that op2's type will change to match tgt_units, > > > > > > > if we go thru > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > > > > > in fold_vec_perm. > > > > > > > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > > > > > > > following semantics: > > > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > > > > > > > > > Yeah. > > > > > > > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > > > > > TYPE_VECTOR_SUBPARTS (op2)) > > > > > > > so it has same element type as arg0 (and arg1) and len of op2. > > > > > > > Does that look reasonable ? > > > > > > > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > > > > > width vectors > > > > > > > with len(lhs_type) being a multiple of len(res_type). > > > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > > > > > > > > > Yes, that's not supported as a cast. If the compiler knows the > > > > > > length of the "VLA" vector then it's not VLA. If it doesn't > > > > > > know the length of the VLA vector then the sizes could be different > > > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > > > > > different (preventing pointwise CONVERT_EXPRs). > > > > > > > > > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > > > > > > > > > That's supported though. They work just like VLS vectors: if the sizes > > > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > > > > > are the same then we can do pointwise conversions (e.g. element-by-element > > > > > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > > > > > > > > > Currently, if op2 is VLA, and we enter the branch: > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > > > > then I think it will bail out because op2_units will not be a compile > > > > > > > time constant, > > > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > > > > > ones in PR106360. > > > > > > > Does it look OK ? > > > > > > > > > > > > Richi should have the final say, but it looks OK in principle to me. > > > > > > I'm not sure it's worth applying independently of the follow-on patch > > > > > > though, if that patch is in the offing anyway. > > > > > > > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > > > > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > > > > > instead? > > > > > > > > > > I think the point was they are not necssarily the same when we > > > > > looked through a VIEW_CONVERT? A comment might be in order > > > > > here. > > > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of > > > > VIEW_CONVERT_EXPR. For instance in following case: > > > > > > > > _78 = (void *) ivtmp.21_73; > > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > > _91 = {_92, 0}; > > > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > > _87 = {_88, 0}; > > > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > > > simplify_permutation looks thru V_C_E, and tries to fold: > > > > res = VEC_PERM_EXPR (_87, _91, {0, 2}) > > > > In this case, lhs type (V16QI) differs from res type (V2DI), and we > > > > hit assert in fold_vec_perm. > > > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only > > > when we used res_type = lhs_type. > > > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI. > > > The issue with using res_type = TREE_TYPE (arg0), comes for case when > > > arg0 is fixed length, and op2 is VLA. > > Hi, > > Shall it be OK to commit the above patch, that changes res_type to > > have same elem type as arg0, and length as op2 ? > > > > Also, shall it be OK to gate these transforms in vec_perm match.pd pattern: > > sel.series_p (0, 1, 0, 1) -> op0 and, > > sel.series_p (0, 1, nelts, 1) -> op1 > > if len(type) == len(arg0) ? > ping I've lost track - what is the patch you are refering to? Richard. > Thanks, > Prathamesh > > > > Thanks, > > Prathamesh > > > > > > Thanks, > > > Prathamesh > > > > > > > > > > Btw, please don't add new asserts that trivially hold in critical paths. > > > > Well, it didn't hold true for following case when op2 is VLA: > > > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), > > > > because we're passing res_type == V4SI, and op2_type == VNx4SI from > > > > simplify_permutation. > > > > > > > > The vec_perm pattern hits: > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > { op0; } > > > > > > > > and folds it to: > > > > lhs = {1, 2, 3, 4} > > > > > > > > which results in error during verify_gimple since lhs and rhs types > > > > differ (VNx4SI, V4SI). > > > > > > > > I suppose we want to do the transforms: > > > > sel.series_p (0, 1, 0, 1) -> op0 and, > > > > sel.series_p (0, 1, nelts, 1) -> op1 > > > > only if input vectors and sel have same length ? > > > > > > > > Thanks, > > > > Prathamesh > > > > > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > > index 330c1db0c8e..aa20cc713c5 100644 > > > > > --- a/gcc/match.pd > > > > > +++ b/gcc/match.pd > > > > > @@ -7845,6 +7845,12 @@ and, > > > > > (with > > > > > { > > > > > tree op0 = @0, op1 = @1, op2 = @2; > > > > > + > > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > > > > > + > > > > > machine_mode result_mode = TYPE_MODE (type); > > > > > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > Richard
On Mon, 5 Sept 2022 at 14:39, Richard Biener <richard.guenther@gmail.com> wrote: > > On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > > > On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > > > > > > <richard.sandiford@arm.com> wrote: > > > > > > > > > > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > >> > > > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > > > > > > >> <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > >> > > > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > > > > > > > > > >> > > > > > > > > > >> > > /* If result vector has greater length than input vector, > > > > > > > >> > > + then allow permuting two vectors as long as: > > > > > > > >> > > + a) sel.nelts_per_pattern == 1 > > > > > > > >> > > + b) sel.npatterns == len of input vector. > > > > > > > >> > > + The intent is to permute input vectors, and > > > > > > > >> > > + dup the elements in resulting vector to target vector length. */ > > > > > > > >> > > + > > > > > > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > > > > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > > > > > >> > > + { > > > > > > > >> > > + nelts = sel.encoding ().npatterns (); > > > > > > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > > > > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > > > > > >> > > + return NULL_TREE; > > > > > > > >> > > + } > > > > > > > >> > > > > > > > > > >> > > so the only case you add is non-VLA to VLA and there > > > > > > > >> > > explicitely only the case of a period that's same as the > > > > > > > >> > > element count in the input vectors. > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > > > > > >> > > node, int spc, dump_flags_t flags, > > > > > > > >> > > pp_space (pp); > > > > > > > >> > > } > > > > > > > >> > > } > > > > > > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > > > > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > > > > > >> > > + pp_string (pp, ", ... "); > > > > > > > >> > > pp_right_brace (pp); > > > > > > > >> > > > > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > > > > > > >> > Well, it got created for the following case after folding: > > > > > > > >> > svint32_t f2(int a, int b, int c, int d) > > > > > > > >> > { > > > > > > > >> > int32x4_t v = {a, b, c, d}; > > > > > > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > >> > } > > > > > > > >> > > > > > > > > >> > The svld1rq_s32 call gets folded to: > > > > > > > >> > v = {a, b, c, d} > > > > > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > > > > > > > >> > > > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > > > > > > >> > need_ctor is thus true: > > > > > > > >> > lhs = {a, b, c, d, ...} > > > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > > > > > > >> > > > > > > > >> But I doubt we do anything reasonable with such a beast? Do we? > > > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > > > > > > >> but do we actually make sure to do this duplication? > > > > > > > > I am not sure. As mentioned above, the current code-gen for VLA > > > > > > > > constructor looks pretty bad. > > > > > > > > Should we avoid folding VLA constructors for now ? > > > > > > > > > > > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > > > > > > sequence at the start of an otherwise zero vector. I'm not sure > > > > > > > we even use that though (perhaps we do and I've forgotten). > > > > > > > > > > > > > > > I guess these are 2 different issues: > > > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > > > > > > > > > > > For (a), I think the issue with using: > > > > > > > > res_type = gimple_assign_lhs (stmt) > > > > > > > > in previous patch, was that op2's type will change to match tgt_units, > > > > > > > > if we go thru > > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > > > > > > in fold_vec_perm. > > > > > > > > > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > > > > > > > > following semantics: > > > > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > > > > > > > > > > > Yeah. > > > > > > > > > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > > > > > > TYPE_VECTOR_SUBPARTS (op2)) > > > > > > > > so it has same element type as arg0 (and arg1) and len of op2. > > > > > > > > Does that look reasonable ? > > > > > > > > > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > > > > > > width vectors > > > > > > > > with len(lhs_type) being a multiple of len(res_type). > > > > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > > > > > > > > > > > Yes, that's not supported as a cast. If the compiler knows the > > > > > > > length of the "VLA" vector then it's not VLA. If it doesn't > > > > > > > know the length of the VLA vector then the sizes could be different > > > > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > > > > > > different (preventing pointwise CONVERT_EXPRs). > > > > > > > > > > > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > > > > > > > > > > > That's supported though. They work just like VLS vectors: if the sizes > > > > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > > > > > > are the same then we can do pointwise conversions (e.g. element-by-element > > > > > > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > > > > > > > > > > > Currently, if op2 is VLA, and we enter the branch: > > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > > > > > then I think it will bail out because op2_units will not be a compile > > > > > > > > time constant, > > > > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > > > > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > > > > > > ones in PR106360. > > > > > > > > Does it look OK ? > > > > > > > > > > > > > > Richi should have the final say, but it looks OK in principle to me. > > > > > > > I'm not sure it's worth applying independently of the follow-on patch > > > > > > > though, if that patch is in the offing anyway. > > > > > > > > > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > > > > > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > > > > > > instead? > > > > > > > > > > > > I think the point was they are not necssarily the same when we > > > > > > looked through a VIEW_CONVERT? A comment might be in order > > > > > > here. > > > > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of > > > > > VIEW_CONVERT_EXPR. For instance in following case: > > > > > > > > > > _78 = (void *) ivtmp.21_73; > > > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > > > _91 = {_92, 0}; > > > > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > > > _87 = {_88, 0}; > > > > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > > > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > > > > > simplify_permutation looks thru V_C_E, and tries to fold: > > > > > res = VEC_PERM_EXPR (_87, _91, {0, 2}) > > > > > In this case, lhs type (V16QI) differs from res type (V2DI), and we > > > > > hit assert in fold_vec_perm. > > > > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only > > > > when we used res_type = lhs_type. > > > > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI. > > > > The issue with using res_type = TREE_TYPE (arg0), comes for case when > > > > arg0 is fixed length, and op2 is VLA. > > > Hi, > > > Shall it be OK to commit the above patch, that changes res_type to > > > have same elem type as arg0, and length as op2 ? > > > > > > Also, shall it be OK to gate these transforms in vec_perm match.pd pattern: > > > sel.series_p (0, 1, 0, 1) -> op0 and, > > > sel.series_p (0, 1, nelts, 1) -> op1 > > > if len(type) == len(arg0) ? > > ping > > I've lost track - what is the patch you are refering to? This one: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599586.html Only the tree-ssa-forwprop.cc bits (the other patch extends fold_vec_perm and we don't want asserts in vec_perm pattern). Thanks, Prathamesh > > Richard. > > > Thanks, > > Prathamesh > > > > > > Thanks, > > > Prathamesh > > > > > > > > Thanks, > > > > Prathamesh > > > > > > > > > > > > Btw, please don't add new asserts that trivially hold in critical paths. > > > > > Well, it didn't hold true for following case when op2 is VLA: > > > > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), > > > > > because we're passing res_type == V4SI, and op2_type == VNx4SI from > > > > > simplify_permutation. > > > > > > > > > > The vec_perm pattern hits: > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > > { op0; } > > > > > > > > > > and folds it to: > > > > > lhs = {1, 2, 3, 4} > > > > > > > > > > which results in error during verify_gimple since lhs and rhs types > > > > > differ (VNx4SI, V4SI). > > > > > > > > > > I suppose we want to do the transforms: > > > > > sel.series_p (0, 1, 0, 1) -> op0 and, > > > > > sel.series_p (0, 1, nelts, 1) -> op1 > > > > > only if input vectors and sel have same length ? > > > > > > > > > > Thanks, > > > > > Prathamesh > > > > > > > > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > > > index 330c1db0c8e..aa20cc713c5 100644 > > > > > > --- a/gcc/match.pd > > > > > > +++ b/gcc/match.pd > > > > > > @@ -7845,6 +7845,12 @@ and, > > > > > > (with > > > > > > { > > > > > > tree op0 = @0, op1 = @1, op2 = @2; > > > > > > + > > > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > > > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > > > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > > > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > > > > > > + > > > > > > machine_mode result_mode = TYPE_MODE (type); > > > > > > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > Richard
On Mon, Sep 5, 2022 at 11:27 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Mon, 5 Sept 2022 at 14:39, Richard Biener <richard.guenther@gmail.com> wrote: > > > > On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni > > > > > <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > > > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford > > > > > > > <richard.sandiford@arm.com> wrote: > > > > > > > > > > > > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > > > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > > > >> > > > > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > > > > > > > > >> <prathamesh.kulkarni@linaro.org> wrote: > > > > > > > > >> > > > > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > > > > > > > > > > >> > > > > > > > > > > >> > > /* If result vector has greater length than input vector, > > > > > > > > >> > > + then allow permuting two vectors as long as: > > > > > > > > >> > > + a) sel.nelts_per_pattern == 1 > > > > > > > > >> > > + b) sel.npatterns == len of input vector. > > > > > > > > >> > > + The intent is to permute input vectors, and > > > > > > > > >> > > + dup the elements in resulting vector to target vector length. */ > > > > > > > > >> > > + > > > > > > > > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > > > > > > > > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > > > > > > > > >> > > + { > > > > > > > > >> > > + nelts = sel.encoding ().npatterns (); > > > > > > > > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > > > > > > > > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))) > > > > > > > > >> > > + return NULL_TREE; > > > > > > > > >> > > + } > > > > > > > > >> > > > > > > > > > > >> > > so the only case you add is non-VLA to VLA and there > > > > > > > > >> > > explicitely only the case of a period that's same as the > > > > > > > > >> > > element count in the input vectors. > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > > > > > > > > >> > > node, int spc, dump_flags_t flags, > > > > > > > > >> > > pp_space (pp); > > > > > > > > >> > > } > > > > > > > > >> > > } > > > > > > > > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > > > > > > > > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > > > > > > > > >> > > + pp_string (pp, ", ... "); > > > > > > > > >> > > pp_right_brace (pp); > > > > > > > > >> > > > > > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > > > > > > > > >> > Well, it got created for the following case after folding: > > > > > > > > >> > svint32_t f2(int a, int b, int c, int d) > > > > > > > > >> > { > > > > > > > > >> > int32x4_t v = {a, b, c, d}; > > > > > > > > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > > > > > > >> > } > > > > > > > > >> > > > > > > > > > >> > The svld1rq_s32 call gets folded to: > > > > > > > > >> > v = {a, b, c, d} > > > > > > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > > > > > > > > >> > > > > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > > > > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and > > > > > > > > >> > need_ctor is thus true: > > > > > > > > >> > lhs = {a, b, c, d, ...} > > > > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor. > > > > > > > > >> > > > > > > > > >> But I doubt we do anything reasonable with such a beast? Do we? > > > > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode > > > > > > > > >> but do we actually make sure to do this duplication? > > > > > > > > > I am not sure. As mentioned above, the current code-gen for VLA > > > > > > > > > constructor looks pretty bad. > > > > > > > > > Should we avoid folding VLA constructors for now ? > > > > > > > > > > > > > > > > VLA constructors aren't really a thing. At least, the only VLA vector > > > > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length > > > > > > > > sequence at the start of an otherwise zero vector. I'm not sure > > > > > > > > we even use that though (perhaps we do and I've forgotten). > > > > > > > > > > > > > > > > > I guess these are 2 different issues: > > > > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > > > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > > > > > > > > > > > > > > > For (a), I think the issue with using: > > > > > > > > > res_type = gimple_assign_lhs (stmt) > > > > > > > > > in previous patch, was that op2's type will change to match tgt_units, > > > > > > > > > if we go thru > > > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > > > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > > > > > > > > in fold_vec_perm. > > > > > > > > > > > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > > > > > > > > > following semantics: > > > > > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > > > > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > > > > > > > > > > > > > > > Yeah. > > > > > > > > > > > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > > > > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > > > > > > > > TYPE_VECTOR_SUBPARTS (op2)) > > > > > > > > > so it has same element type as arg0 (and arg1) and len of op2. > > > > > > > > > Does that look reasonable ? > > > > > > > > > > > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > > > > > > > > width vectors > > > > > > > > > with len(lhs_type) being a multiple of len(res_type). > > > > > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > > > > > > > > > > > > > > > Yes, that's not supported as a cast. If the compiler knows the > > > > > > > > length of the "VLA" vector then it's not VLA. If it doesn't > > > > > > > > know the length of the VLA vector then the sizes could be different > > > > > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > > > > > > > > different (preventing pointwise CONVERT_EXPRs). > > > > > > > > > > > > > > > > > or from VLA vector of one type to VLA vector of other type ? > > > > > > > > > > > > > > > > That's supported though. They work just like VLS vectors: if the sizes > > > > > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > > > > > > > > are the same then we can do pointwise conversions (e.g. element-by-element > > > > > > > > extensions, truncations, conversions to float, conversions to integer, etc). > > > > > > > > > > > > > > > > > Currently, if op2 is VLA, and we enter the branch: > > > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > > > > > > > > then I think it will bail out because op2_units will not be a compile > > > > > > > > > time constant, > > > > > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > > > > > > > > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > > > > > > > > ones in PR106360. > > > > > > > > > Does it look OK ? > > > > > > > > > > > > > > > > Richi should have the final say, but it looks OK in principle to me. > > > > > > > > I'm not sure it's worth applying independently of the follow-on patch > > > > > > > > though, if that patch is in the offing anyway. > > > > > > > > > > > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs > > > > > > > > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > > > > > > > > instead? > > > > > > > > > > > > > > I think the point was they are not necssarily the same when we > > > > > > > looked through a VIEW_CONVERT? A comment might be in order > > > > > > > here. > > > > > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of > > > > > > VIEW_CONVERT_EXPR. For instance in following case: > > > > > > > > > > > > _78 = (void *) ivtmp.21_73; > > > > > > _92 = MEM <unsigned long> [(uint8_t *)_78]; > > > > > > _91 = {_92, 0}; > > > > > > vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91); > > > > > > _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B]; > > > > > > _87 = {_88, 0}; > > > > > > vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87); > > > > > > vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2, > > > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>; > > > > > > > > > > > > simplify_permutation looks thru V_C_E, and tries to fold: > > > > > > res = VEC_PERM_EXPR (_87, _91, {0, 2}) > > > > > > In this case, lhs type (V16QI) differs from res type (V2DI), and we > > > > > > hit assert in fold_vec_perm. > > > > > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only > > > > > when we used res_type = lhs_type. > > > > > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI. > > > > > The issue with using res_type = TREE_TYPE (arg0), comes for case when > > > > > arg0 is fixed length, and op2 is VLA. > > > > Hi, > > > > Shall it be OK to commit the above patch, that changes res_type to > > > > have same elem type as arg0, and length as op2 ? > > > > > > > > Also, shall it be OK to gate these transforms in vec_perm match.pd pattern: > > > > sel.series_p (0, 1, 0, 1) -> op0 and, > > > > sel.series_p (0, 1, nelts, 1) -> op1 > > > > if len(type) == len(arg0) ? > > > ping > > > > I've lost track - what is the patch you are refering to? > This one: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599586.html > Only the tree-ssa-forwprop.cc bits (the other patch extends > fold_vec_perm and we don't want asserts in vec_perm pattern). Yes, the tree-ssa-forwprop.cc hunk of that patch is OK. Thanks, Richard. > Thanks, > Prathamesh > > > > Richard. > > > > > Thanks, > > > Prathamesh > > > > > > > > Thanks, > > > > Prathamesh > > > > > > > > > > Thanks, > > > > > Prathamesh > > > > > > > > > > > > > > Btw, please don't add new asserts that trivially hold in critical paths. > > > > > > Well, it didn't hold true for following case when op2 is VLA: > > > > > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }), > > > > > > because we're passing res_type == V4SI, and op2_type == VNx4SI from > > > > > > simplify_permutation. > > > > > > > > > > > > The vec_perm pattern hits: > > > > > > (if (sel.series_p (0, 1, 0, 1)) > > > > > > { op0; } > > > > > > > > > > > > and folds it to: > > > > > > lhs = {1, 2, 3, 4} > > > > > > > > > > > > which results in error during verify_gimple since lhs and rhs types > > > > > > differ (VNx4SI, V4SI). > > > > > > > > > > > > I suppose we want to do the transforms: > > > > > > sel.series_p (0, 1, 0, 1) -> op0 and, > > > > > > sel.series_p (0, 1, nelts, 1) -> op1 > > > > > > only if input vectors and sel have same length ? > > > > > > > > > > > > Thanks, > > > > > > Prathamesh > > > > > > > > > > > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > > > > index 330c1db0c8e..aa20cc713c5 100644 > > > > > > > --- a/gcc/match.pd > > > > > > > +++ b/gcc/match.pd > > > > > > > @@ -7845,6 +7845,12 @@ and, > > > > > > > (with > > > > > > > { > > > > > > > tree op0 = @0, op1 = @1, op2 = @2; > > > > > > > + > > > > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), > > > > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); > > > > > > > + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), > > > > > > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); > > > > > > > + > > > > > > > machine_mode result_mode = TYPE_MODE (type); > > > > > > > machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Richard
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc index 69567ab3275..be888f1c48e 100644 --- a/gcc/tree-ssa-forwprop.cc +++ b/gcc/tree-ssa-forwprop.cc @@ -2414,6 +2414,9 @@ simplify_permutation (gimple_stmt_iterator *gsi) if (TREE_CODE (op2) != VECTOR_CST) return 0; + if (!types_compatible_p (TREE_TYPE (gimple_get_lhs (stmt)), TREE_TYPE (op0))) + return 0; + if (TREE_CODE (op0) == VECTOR_CST) { code = VECTOR_CST;