Message ID | CAAgBjMmAJek1O=pFkRNx9jn2DAzD-0AV57ASq0cyP+aDmg+9sg@mail.gmail.com |
---|---|
State | New |
Series | [SVE] PR96463 - Optimise svld1rq from vectors |
Commit Message
Prathamesh Kulkarni
Dec. 2, 2021, 10:50 a.m. UTC
Hi Richard,
I have attached a WIP untested patch for PR96463.
IIUC, the PR suggests to transform
lhs = svld1rq ({-1, -1, ...}, &v[0])
into:
lhs = vec_perm_expr<v, v, {0, 0, ...}>
if v is vector of 4 elements, and each element is 32 bits on little
endian target ?

I am sorry if this sounds like a silly question, but I am not sure how
to convert a vector of type int32x4_t into svint32_t ? In the patch, I
simply used NOP_EXPR (which I expected to fail), and gave type error
during gimple verification:

svint32_t
foo (int32x4_t x)
{
  return svld1rq (svptrue_b8 (), &x[0]);
}

transformed to:

EMERGENCY DUMP:

svint32_t foo (int32x4_t x)
{
  svint32_t _3;
  __Int32x4_t _4;

  <bb 2> :
  _4 = VEC_PERM_EXPR <x_5(D), x_5(D), { 0, 0, 0, 0 }>;
  _3 = (svint32_t) _4;
  return _3;

}

and ICE's with:
pr96463.c:8:1: error: invalid vector types in nop conversion
    8 | }
      | ^
svint32_t
__Int32x4_t
_3 = (svint32_t) _4;
during GIMPLE pass: ccp

Could you please suggest how to proceed ?

Thanks,
Prathamesh
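For readers following the thread, the intended equivalence can be modelled outside the compiler. The sketch below is not GCC code: `ld1rq_model`, `vec_perm_model` and `repeating_selector` are hypothetical helper names, vectors are plain `std::vector<int32_t>`, and lane numbering is assumed to be little-endian as in the question above. It shows that replicating one 128-bit quadword gives the same lanes as a permute of the source vector with a selector that repeats 0, 1, 2, 3.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of svld1rq with an all-true predicate: read one quadword
// (four int32_t values) and replicate it across `nelts` lanes.
std::vector<int32_t> ld1rq_model(const int32_t *quad, std::size_t nelts) {
    std::vector<int32_t> out(nelts);
    for (std::size_t i = 0; i < nelts; ++i)
        out[i] = quad[i % 4];
    return out;
}

// Model of a two-input permute: each selector value indexes the
// concatenation of `a` and `b`, wrapping modulo the combined length.
std::vector<int32_t> vec_perm_model(const std::vector<int32_t> &a,
                                    const std::vector<int32_t> &b,
                                    const std::vector<std::size_t> &sel) {
    std::vector<int32_t> out;
    out.reserve(sel.size());
    for (std::size_t s : sel) {
        s %= a.size() + b.size();
        out.push_back(s < a.size() ? a[s] : b[s - a.size()]);
    }
    return out;
}

// Selector { 0, 1, ..., source_nelts - 1, 0, 1, ... } with `nelts` entries.
std::vector<std::size_t> repeating_selector(std::size_t source_nelts,
                                            std::size_t nelts) {
    std::vector<std::size_t> sel(nelts);
    for (std::size_t i = 0; i < nelts; ++i)
        sel[i] = i % source_nelts;
    return sel;
}
```

In this model the result has more lanes than either input whenever `nelts` exceeds 4, which is the property a plain conversion between the two vector types cannot express.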
Comments
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi Richard,
> I have attached a WIP untested patch for PR96463.
> IIUC, the PR suggests to transform
> lhs = svld1rq ({-1, -1, ...}, &v[0])
> into:
> lhs = vec_perm_expr<v, v, {0, 0, ...}>
> if v is vector of 4 elements, and each element is 32 bits on little
> endian target ?
>
> I am sorry if this sounds like a silly question, but I am not sure how
> to convert a vector of type int32x4_t into svint32_t ? In the patch, I
> simply used NOP_EXPR (which I expected to fail), and gave type error
> during gimple verification:

It should be possible in principle to have a VEC_PERM_EXPR in which
the operands are Advanced SIMD vectors and the result is an SVE vector.

E.g., the dup in the PR would be something like this:

foo (int32x4_t a)
{
  svint32_t _2;

  _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
  return _2;
}

where the final operand can be built using:

  int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
  vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
  for (int i = 0; i < source_nelts; ++i)
    sel.quick_push (i);

I'm not sure how well-tested that combination is though.  It might need
changes to target-independent code.

Thanks,
Richard
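The builder call above encodes the selector as `source_nelts` patterns with one element each. A minimal stand-alone model of that encoding follows, under the assumption that a pattern with a single encoded element simply repeats that element. `build_encoding` and `expand` are hypothetical names for illustration and are not part of GCC.

```cpp
#include <cstddef>
#include <vector>

// Mirror of the loop in the reply: push 0 .. source_nelts - 1.
// Each pushed value is the only encoded element of its pattern.
std::vector<std::size_t> build_encoding(std::size_t source_nelts) {
    std::vector<std::size_t> encoded;
    encoded.reserve(source_nelts);
    for (std::size_t i = 0; i < source_nelts; ++i)
        encoded.push_back(i);
    return encoded;
}

// Expand an encoding with one element per pattern to a concrete length.
// Patterns are interleaved, so lane j belongs to pattern j % npatterns,
// and a one-element pattern contributes the same value every time.
std::vector<std::size_t> expand(const std::vector<std::size_t> &encoded,
                                std::size_t nelts) {
    const std::size_t npatterns = encoded.size();
    std::vector<std::size_t> full(nelts);
    for (std::size_t j = 0; j < nelts; ++j)
        full[j] = encoded[j % npatterns];
    return full;
}
```

With `source_nelts` equal to 4, expanding to 12 lanes yields 0, 1, 2, 3 repeated three times, which matches the `{ 0, 1, 2, 3, 0, 1, 2, 3, ... }` selector shown in the reply. The same four encoded values describe the selector for any vector length, which is why the encoding suits a variable-length result type.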
On Thu, 2 Dec 2021 at 23:11, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > Hi Richard,
> > I have attached a WIP untested patch for PR96463.
> > IIUC, the PR suggests to transform
> > lhs = svld1rq ({-1, -1, ...}, &v[0])
> > into:
> > lhs = vec_perm_expr<v, v, {0, 0, ...}>
> > if v is vector of 4 elements, and each element is 32 bits on little
> > endian target ?
> >
> > I am sorry if this sounds like a silly question, but I am not sure how
> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I
> > simply used NOP_EXPR (which I expected to fail), and gave type error
> > during gimple verification:
>
> It should be possible in principle to have a VEC_PERM_EXPR in which
> the operands are Advanced SIMD vectors and the result is an SVE vector.
>
> E.g., the dup in the PR would be something like this:
>
> foo (int32x4_t a)
> {
>   svint32_t _2;
>
>   _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
>   return _2;
> }
>
> where the final operand can be built using:
>
>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
>   for (int i = 0; i < source_nelts; ++i)
>     sel.quick_push (i);
>
> I'm not sure how well-tested that combination is though.  It might need
> changes to target-independent code.
Hi Richard,
Thanks for the suggestions.
I tried the above approach in attached patch, but it still results in
ICE due to type mismatch:

pr96463.c: In function ‘foo’:
pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
    8 | }
      | ^
svint32_t
int32x4_t
int32x4_t
svint32_t
_3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
during GIMPLE pass: ccp
dump file: pr96463.c.032t.ccp1
pr96463.c:8:1: internal compiler error: verify_gimple failed

Should we perhaps add another tree code, that "extends" a fixed-width
vector into it's VLA equivalent ?

Thanks,
Prathamesh
>
> Thanks,
> Richard

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 02e42a71e5e..b38c4641535 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -44,6 +44,8 @@
 #include "aarch64-sve-builtins-shapes.h"
 #include "aarch64-sve-builtins-base.h"
 #include "aarch64-sve-builtins-functions.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"
 
 using namespace aarch64_sve;
 
@@ -1207,6 +1209,57 @@ public:
     insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
     return e.use_contiguous_load_insn (icode);
   }
+
+  gimple *
+  fold (gimple_folder &f) const OVERRIDE
+  {
+    tree arg0 = gimple_call_arg (f.call, 0);
+    tree arg1 = gimple_call_arg (f.call, 1);
+
+    /* Transform:
+       lhs = svld1rq ({-1, -1, ... }, &v[0])
+       into:
+       tmp = vec_perm_expr<v, v, {0, 0, ...}>.
+       lhs = nop_expr tmp
+       on little endian target.  */
+
+    if (!BYTES_BIG_ENDIAN
+        && integer_all_onesp (arg0)
+        && TREE_CODE (arg1) == ADDR_EXPR)
+      {
+        tree t = TREE_OPERAND (arg1, 0);
+        if (TREE_CODE (t) == ARRAY_REF)
+          {
+            tree index = TREE_OPERAND (t, 1);
+            t = TREE_OPERAND (t, 0);
+            if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
+              {
+                t = TREE_OPERAND (t, 0);
+                tree vectype = TREE_TYPE (t);
+                if (VECTOR_TYPE_P (vectype)
+                    && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
+                    && wi::to_wide (TYPE_SIZE (vectype)) == 128)
+                  {
+                    tree lhs = gimple_call_lhs (f.call);
+                    tree lhs_type = TREE_TYPE (lhs);
+                    int source_nelts = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+                    vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (lhs_type), source_nelts, 1);
+                    for (int i = 0; i < source_nelts; i++)
+                      sel.quick_push (i);
+
+                    vec_perm_indices indices (sel, 1, source_nelts);
+                    if (!can_vec_perm_const_p (TYPE_MODE (lhs_type), indices))
+                      return NULL;
+
+                    tree mask = vec_perm_indices_to_tree (lhs_type, indices);
+                    return gimple_build_assign (lhs, VEC_PERM_EXPR, t, t, mask);
+                  }
+              }
+          }
+      }
+
+    return NULL;
+  }
 };
 
 class svld1ro_impl : public load_replicate
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Thu, 2 Dec 2021 at 23:11, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > Hi Richard,
>> > I have attached a WIP untested patch for PR96463.
>> > IIUC, the PR suggests to transform
>> > lhs = svld1rq ({-1, -1, ...}, &v[0])
>> > into:
>> > lhs = vec_perm_expr<v, v, {0, 0, ...}>
>> > if v is vector of 4 elements, and each element is 32 bits on little
>> > endian target ?
>> >
>> > I am sorry if this sounds like a silly question, but I am not sure how
>> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I
>> > simply used NOP_EXPR (which I expected to fail), and gave type error
>> > during gimple verification:
>>
>> It should be possible in principle to have a VEC_PERM_EXPR in which
>> the operands are Advanced SIMD vectors and the result is an SVE vector.
>>
>> E.g., the dup in the PR would be something like this:
>>
>> foo (int32x4_t a)
>> {
>>   svint32_t _2;
>>
>>   _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
>>   return _2;
>> }
>>
>> where the final operand can be built using:
>>
>>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
>>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
>>   for (int i = 0; i < source_nelts; ++i)
>>     sel.quick_push (i);
>>
>> I'm not sure how well-tested that combination is though.  It might need
>> changes to target-independent code.
> Hi Richard,
> Thanks for the suggestions.
> I tried the above approach in attached patch, but it still results in
> ICE due to type mismatch:
>
> pr96463.c: In function ‘foo’:
> pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
>     8 | }
>       | ^
> svint32_t
> int32x4_t
> int32x4_t
> svint32_t
> _3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
> during GIMPLE pass: ccp
> dump file: pr96463.c.032t.ccp1
> pr96463.c:8:1: internal compiler error: verify_gimple failed
>
> Should we perhaps add another tree code, that "extends" a fixed-width
> vector into it's VLA equivalent ?

No, I think this is just an extreme example of the combination not being
well-tested. :-)  Obviously it's worse than I thought.

I think accepting this kind of VEC_PERM_EXPR is still the way to go.
Richi, WDYT?

Thanks,
Richard
On Tue, 7 Dec 2021 at 19:08, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > On Thu, 2 Dec 2021 at 23:11, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> >> > Hi Richard,
> >> > I have attached a WIP untested patch for PR96463.
> >> > IIUC, the PR suggests to transform
> >> > lhs = svld1rq ({-1, -1, ...}, &v[0])
> >> > into:
> >> > lhs = vec_perm_expr<v, v, {0, 0, ...}>
> >> > if v is vector of 4 elements, and each element is 32 bits on little
> >> > endian target ?
> >> >
> >> > I am sorry if this sounds like a silly question, but I am not sure how
> >> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I
> >> > simply used NOP_EXPR (which I expected to fail), and gave type error
> >> > during gimple verification:
> >>
> >> It should be possible in principle to have a VEC_PERM_EXPR in which
> >> the operands are Advanced SIMD vectors and the result is an SVE vector.
> >>
> >> E.g., the dup in the PR would be something like this:
> >>
> >> foo (int32x4_t a)
> >> {
> >>   svint32_t _2;
> >>
> >>   _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
> >>   return _2;
> >> }
> >>
> >> where the final operand can be built using:
> >>
> >>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
> >>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
> >>   for (int i = 0; i < source_nelts; ++i)
> >>     sel.quick_push (i);
> >>
> >> I'm not sure how well-tested that combination is though.  It might need
> >> changes to target-independent code.
> > Hi Richard,
> > Thanks for the suggestions.
> > I tried the above approach in attached patch, but it still results in
> > ICE due to type mismatch:
> >
> > pr96463.c: In function ‘foo’:
> > pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
> >     8 | }
> >       | ^
> > svint32_t
> > int32x4_t
> > int32x4_t
> > svint32_t
> > _3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
> > during GIMPLE pass: ccp
> > dump file: pr96463.c.032t.ccp1
> > pr96463.c:8:1: internal compiler error: verify_gimple failed
> >
> > Should we perhaps add another tree code, that "extends" a fixed-width
> > vector into it's VLA equivalent ?
>
> No, I think this is just an extreme example of the combination not being
> well-tested. :-)  Obviously it's worse than I thought.
>
> I think accepting this kind of VEC_PERM_EXPR is still the way to go.
> Richi, WDYT?
Hi Richi, ping ?

Thanks,
Prathamesh
>
> Thanks,
> Richard
On Tue, 14 Dec 2021, Prathamesh Kulkarni wrote:
> On Tue, 7 Dec 2021 at 19:08, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > On Thu, 2 Dec 2021 at 23:11, Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > >>
> > >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > >> > Hi Richard,
> > >> > I have attached a WIP untested patch for PR96463.
> > >> > IIUC, the PR suggests to transform
> > >> > lhs = svld1rq ({-1, -1, ...}, &v[0])
> > >> > into:
> > >> > lhs = vec_perm_expr<v, v, {0, 0, ...}>
> > >> > if v is vector of 4 elements, and each element is 32 bits on little
> > >> > endian target ?
> > >> >
> > >> > I am sorry if this sounds like a silly question, but I am not sure how
> > >> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I
> > >> > simply used NOP_EXPR (which I expected to fail), and gave type error
> > >> > during gimple verification:
> > >>
> > >> It should be possible in principle to have a VEC_PERM_EXPR in which
> > >> the operands are Advanced SIMD vectors and the result is an SVE vector.
> > >>
> > >> E.g., the dup in the PR would be something like this:
> > >>
> > >> foo (int32x4_t a)
> > >> {
> > >>   svint32_t _2;
> > >>
> > >>   _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
> > >>   return _2;
> > >> }
> > >>
> > >> where the final operand can be built using:
> > >>
> > >>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
> > >>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
> > >>   for (int i = 0; i < source_nelts; ++i)
> > >>     sel.quick_push (i);
> > >>
> > >> I'm not sure how well-tested that combination is though.  It might need
> > >> changes to target-independent code.
> > > Hi Richard,
> > > Thanks for the suggestions.
> > > I tried the above approach in attached patch, but it still results in
> > > ICE due to type mismatch:
> > >
> > > pr96463.c: In function ‘foo’:
> > > pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
> > >     8 | }
> > >       | ^
> > > svint32_t
> > > int32x4_t
> > > int32x4_t
> > > svint32_t
> > > _3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
> > > during GIMPLE pass: ccp
> > > dump file: pr96463.c.032t.ccp1
> > > pr96463.c:8:1: internal compiler error: verify_gimple failed
> > >
> > > Should we perhaps add another tree code, that "extends" a fixed-width
> > > vector into it's VLA equivalent ?
> >
> > No, I think this is just an extreme example of the combination not being
> > well-tested. :-)  Obviously it's worse than I thought.
> >
> > I think accepting this kind of VEC_PERM_EXPR is still the way to go.
> > Richi, WDYT?
> Hi Richi, ping ?

We check

    case VEC_PERM_EXPR:
      if (!useless_type_conversion_p (lhs_type, rhs1_type)
          || !useless_type_conversion_p (lhs_type, rhs2_type))
        {
          error ("type mismatch in %qs", code_name);

and LHS is svint32_t while x_4 is int32x4_t (the permutation type can
indeed be different - it needs to be integer for example).  The test
is indeed unnecessarily strict if there are two vector types that
are not compatible but have the same element mode and the same
number of elements.  I guess we could check sth like

      if (!useless_type_conversion_p (TREE_TYPE (lhs_type), TREE_TYPE (rhs1_type))
          || !types_compatible_p (rhs1_type, rhs2_type))

instead - we later check TYPE_VECTOR_SUBPARTS so they match.  But
note that vec_perm_optab has a single mode only so I'm not sure
what mode we should pick at RTL expansion time for your quoted
case so I'm a bit nervous here.

Richard?

Richard.
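The difference between the existing check and the relaxed one can be illustrated with a toy model. This is only a sketch under strong assumptions: `VecType`, `same_type`, `strict_check` and `relaxed_check` are hypothetical stand-ins that compare a few fields, whereas the real `useless_type_conversion_p` and `types_compatible_p` do much more. The later TYPE_VECTOR_SUBPARTS comparison and the question of which mode to use at RTL expansion are deliberately not modelled.

```cpp
#include <string>

// Hypothetical, heavily simplified description of a vector type.
struct VecType {
    std::string elem;    // element type name, e.g. "int32_t"
    unsigned min_nelts;  // element count (minimum count if scalable)
    bool scalable;       // true for a variable-length (SVE-style) type
};

// Stand-in for whole-type compatibility: every field must match.
bool same_type(const VecType &a, const VecType &b) {
    return a.elem == b.elem && a.min_nelts == b.min_nelts
           && a.scalable == b.scalable;
}

// Model of the existing test: the result type must be compatible
// with both input types.
bool strict_check(const VecType &lhs, const VecType &rhs1,
                  const VecType &rhs2) {
    return same_type(lhs, rhs1) && same_type(lhs, rhs2);
}

// Model of the suggested test: only the element types of the result
// and the first input must agree, and the two inputs must match.
bool relaxed_check(const VecType &lhs, const VecType &rhs1,
                   const VecType &rhs2) {
    return lhs.elem == rhs1.elem && same_type(rhs1, rhs2);
}
```

In this model, an `svint32_t`-like result with `int32x4_t`-like inputs fails `strict_check` and passes `relaxed_check`, while a result with a different element type fails both.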
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 02e42a71e5e..3834f33443a 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -44,6 +44,13 @@
 #include "aarch64-sve-builtins-shapes.h"
 #include "aarch64-sve-builtins-base.h"
 #include "aarch64-sve-builtins-functions.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"
+
+/* ??? Including tree-ssanames.h requires including other header dependencies.
+   Just including the prototype for now.  */
+extern tree make_ssa_name_fn (struct function *, tree, gimple *,
+                              unsigned int version = 0);
 
 using namespace aarch64_sve;
 
@@ -1207,6 +1214,52 @@ public:
     insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
     return e.use_contiguous_load_insn (icode);
   }
+
+  gimple *
+  fold (gimple_folder &f) const OVERRIDE
+  {
+    tree arg0 = gimple_call_arg (f.call, 0);
+    tree arg1 = gimple_call_arg (f.call, 1);
+
+    /* Transform:
+       lhs = svld1rq ({-1, -1, ... }, &v[0])
+       into:
+       tmp = vec_perm_expr<v, v, {0, 0, ...}>.
+       lhs = nop_expr tmp
+       on little endian target.  */
+
+    if (!BYTES_BIG_ENDIAN
+        && integer_all_onesp (arg0)
+        && TREE_CODE (arg1) == ADDR_EXPR)
+      {
+        tree t = TREE_OPERAND (arg1, 0);
+        if (TREE_CODE (t) == ARRAY_REF)
+          {
+            tree index = TREE_OPERAND (t, 1);
+            t = TREE_OPERAND (t, 0);
+            if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
+              {
+                t = TREE_OPERAND (t, 0);
+                tree vectype = TREE_TYPE (t);
+                if (VECTOR_TYPE_P (vectype)
+                    && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
+                    && wi::to_wide (TYPE_SIZE (vectype)) == 128)
+                  {
+                    tree new_temp = ::make_ssa_name_fn (cfun, vectype, NULL);
+                    tree zero_vec = build_vector_from_val (vectype, index);
+                    gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, t, t, zero_vec);
+                    /* ??? How to convert between vector types if gimple_call_lhs (f.call) and
+                       new_temp have different types ?  */
+                    gimple *g2 = gimple_build_assign (gimple_call_lhs (f.call), NOP_EXPR, new_temp);
+                    gsi_insert_before (f.gsi, g, GSI_SAME_STMT);
+                    return g2;
+                  }
+              }
+          }
+      }
+
+    return NULL;
+  }
 };
 
 class svld1ro_impl : public load_replicate