Message ID | CAAgBjM=J8Vye=RPBw1sWQnUzxfC1C2UPT_vc+_jmXOeYJG-YGQ@mail.gmail.com |
---|---|
State | Under Review |
Headers |
From: Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org>
Reply-To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
To: gcc Patches <gcc-patches@gcc.gnu.org>, Richard Sandiford <richard.sandiford@arm.com>
Date: Tue, 17 Jan 2023 16:16:45 +0530
Subject: [aarch64] Use wzr/xzr for assigning vector element to 0
Message-ID: <CAAgBjM=J8Vye=RPBw1sWQnUzxfC1C2UPT_vc+_jmXOeYJG-YGQ@mail.gmail.com>
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> |
Series |
[aarch64] Use wzr/xzr for assigning vector element to 0
|
|
Commit Message
Prathamesh Kulkarni
Jan. 17, 2023, 10:46 a.m. UTC
Hi Richard,
For the following (contrived) test:

int32x4_t foo(int32x4_t v)
{
  v[3] = 0;
  return v;
}

-O2 code-gen:
foo:
        fmov    s1, wzr
        ins     v0.s[3], v1.s[0]
        ret

I suppose we can instead emit the following code-gen?
foo:
        ins     v0.s[3], wzr
        ret

combine produces:
Failed to match this instruction:
(set (reg:V4SI 95 [ v ])
    (vec_merge:V4SI (const_vector:V4SI [
                (const_int 0 [0]) repeated x4
            ])
        (reg:V4SI 97)
        (const_int 8 [0x8])))

So, I wrote the following pattern to match the above insn:

(define_insn "aarch64_simd_vec_set_zero<mode>"
  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
	(vec_merge:VALL_F16
	    (match_operand:VALL_F16 1 "const_dup0_operand" "w")
	    (match_operand:VALL_F16 3 "register_operand" "0")
	    (match_operand:SI 2 "immediate_operand" "i")))]
  "TARGET_SIMD"
  {
    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
    return "ins\\t%0.<Vetype>[%p2], wzr";
  }
)

which now matches the above insn produced by combine.
However, in the reload dump, it creates a new insn for assigning a
register to (const_vector (const_int 0)), which results in:

(insn 19 8 13 2 (set (reg:V4SI 33 v1 [99])
        (const_vector:V4SI [
                (const_int 0 [0]) repeated x4
            ])) "wzr-test.c":8:1 1269 {*aarch64_simd_movv4si}
     (nil))
(insn 13 19 14 2 (set (reg/i:V4SI 32 v0)
        (vec_merge:V4SI (reg:V4SI 33 v1 [99])
            (reg:V4SI 32 v0 [97])
            (const_int 8 [0x8]))) "wzr-test.c":8:1 1808 {aarch64_simd_vec_set_zerov4si}
     (nil))

and eventually the code-gen:
foo:
        movi    v1.4s, 0
        ins     v0.s[3], wzr
        ret

To get rid of the redundant assignment of 0 to v1, I tried to split the
above pattern as in the attached patch. This works and emits:
foo:
        ins     v0.s[3], wzr
        ret

However, I am not sure if this is the right approach. Could you suggest
whether it would be possible to get rid of UNSPEC_SETZERO in the patch?

Thanks,
Prathamesh
Comments
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> [...]
> However, I am not sure if this is the right approach. Could you suggest,
> if it'd be possible to get rid of UNSPEC_SETZERO in the patch ?

The problem is with the "w" constraint on operand 1, which tells LRA
to force the zero into an FPR.  It should work if you remove the
constraint.

Also, I think you'll need to use <vwcore>zr for the zero, so that
it uses xzr for 64-bit elements.

I think this and the existing patterns ought to test
exact_log2 (INTVAL (operands[2])) >= 0 in the insn condition,
since there's no guarantee that RTL optimisations won't form
vec_merges that have other masks.

Thanks,
Richard
On Tue, 17 Jan 2023 at 18:29, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> [...]
> The problem is with the "w" constraint on operand 1, which tells LRA
> to force the zero into an FPR.  It should work if you remove the
> constraint.
Ah indeed, sorry about that, changing the constraint works.
Does the attached patch look OK after bootstrap+test?
Since we're in stage-4, shall it be OK to commit now, or queue it for stage-1?

Thanks,
Prathamesh
>
> Also, I think you'll need to use <vwcore>zr for the zero, so that
> it uses xzr for 64-bit elements.
>
> I think this and the existing patterns ought to test
> exact_log2 (INTVAL (operands[2])) >= 0 in the insn condition,
> since there's no guarantee that RTL optimisations won't form
> vec_merges that have other masks.
>
> Thanks,
> Richard

[aarch64] Use wzr/xzr for assigning 0 to vector element.

gcc/ChangeLog:
	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
	New pattern.
	* config/aarch64/predicates.md (const_dup0_operand): New.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 104088f67d2..8e54ee4e886 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1083,6 +1083,20 @@
   [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")]
 )
 
+(define_insn "aarch64_simd_vec_set_zero<mode>"
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(vec_merge:VALL_F16
+	    (match_operand:VALL_F16 1 "const_dup0_operand" "i")
+	    (match_operand:VALL_F16 3 "register_operand" "0")
+	    (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
+  {
+    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
+    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
+    return "ins\\t%0.<Vetype>[%p2], <vwcore>zr";
+  }
+)
+
 (define_insn "@aarch64_simd_vec_copy_lane<mode>"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
 	(vec_merge:VALL_F16
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index ff7f73d3f30..901fa1bd7f9 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -49,6 +49,13 @@
   return CONST_INT_P (op) && IN_RANGE (INTVAL (op), 1, 3);
 })
 
+(define_predicate "const_dup0_operand"
+  (match_code "const_vector")
+{
+  op = unwrap_const_vec_duplicate (op);
+  return CONST_INT_P (op) && rtx_equal_p (op, const0_rtx);
+})
+
 (define_predicate "subreg_lowpart_operator"
   (ior (match_code "truncate")
        (and (match_code "subreg")
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Tue, 17 Jan 2023 at 18:29, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>> [...]
>> The problem is with the "w" constraint on operand 1, which tells LRA
>> to force the zero into an FPR.  It should work if you remove the
>> constraint.
> Ah indeed, sorry about that, changing the constraint works.

"i" isn't right though, because that's for scalar integers.
There's no need for any constraint here -- the predicate does
all of the work.

> Does the attached patch look OK after bootstrap+test?
> Since we're in stage-4, shall it be OK to commit now, or queue it for stage-1?

It needs tests as well. :-)

Also:

> +(define_predicate "const_dup0_operand"
> +  (match_code "const_vector")
> +{
> +  op = unwrap_const_vec_duplicate (op);
> +  return CONST_INT_P (op) && rtx_equal_p (op, const0_rtx);
> +})
> +

We already have aarch64_simd_imm_zero for this.  aarch64_simd_imm_zero
is actually more general, because it works for floating-point modes too.

I think the tests should cover all modes included in VALL_F16, since
that should have picked up this and the xzr thing.

Thanks,
Richard
On Wed, 18 Jan 2023 at 19:59, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> [...]
> We already have aarch64_simd_imm_zero for this.  aarch64_simd_imm_zero
> is actually more general, because it works for floating-point modes too.
>
> I think the tests should cover all modes included in VALL_F16, since
> that should have picked up this and the xzr thing.
Hi Richard,
Thanks for the suggestions. Does the attached patch look OK?
I am not sure how to test for v4bf and v8bf since it seems the compiler
refuses conversions to/from bfloat16_t?

Thanks,
Prathamesh

[aarch64] Use wzr/xzr for assigning 0 to vector element.

gcc/ChangeLog:
	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
	New pattern.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/vec-set-zero.c: New test.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7f212bf37cd..7428e74beaf 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1083,6 +1083,20 @@
   [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")]
 )
 
+(define_insn "aarch64_simd_vec_set_zero<mode>"
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(vec_merge:VALL_F16
+	    (match_operand:VALL_F16 1 "aarch64_simd_imm_zero" "")
+	    (match_operand:VALL_F16 3 "register_operand" "0")
+	    (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
+  {
+    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
+    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
+    return "ins\\t%0.<Vetype>[%p2], <vwcore>zr";
+  }
+)
+
 (define_insn "@aarch64_simd_vec_copy_lane<mode>"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
 	(vec_merge:VALL_F16
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
new file mode 100644
index 00000000000..c260cc9e445
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+#define FOO(type) \
+type f_##type(type v) \
+{ \
+  v[1] = 0; \
+  return v; \
+}
+
+FOO(int8x8_t)
+FOO(int16x4_t)
+FOO(int32x2_t)
+
+FOO(int8x16_t)
+FOO(int16x8_t)
+FOO(int32x4_t)
+FOO(int64x2_t)
+
+FOO(float16x4_t)
+FOO(float32x2_t)
+
+FOO(float16x8_t)
+FOO(float32x4_t)
+FOO(float64x2_t)
+
+/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.b\\\[\[1\]\\\], wzr" 2 } } */
+/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.h\\\[\[1\]\\\], wzr" 4 } } */
+/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.s\\\[\[1\]\\\], wzr" 4 } } */
+/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.d\\\[\[1\]\\\], xzr" 2 } } */
scan-assembler-times "ins\\tv\[0-9\]+\.d\\\[\[1\]\\\], xzr" 2 } } */

Can you test big-endian too?  I'd expect it to use different INS indices.

It might be worth quoting the regexps with {...} rather than "...",
to reduce the number of backslashes needed.

Thanks,
Richard
On Mon, 23 Jan 2023 at 22:26, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > On Wed, 18 Jan 2023 at 19:59, Richard Sandiford > > <richard.sandiford@arm.com> wrote: > >> > >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > >> > On Tue, 17 Jan 2023 at 18:29, Richard Sandiford > >> > <richard.sandiford@arm.com> wrote: > >> >> > >> >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > >> >> > Hi Richard, > >> >> > For the following (contrived) test: > >> >> > > >> >> > void foo(int32x4_t v) > >> >> > { > >> >> > v[3] = 0; > >> >> > return v; > >> >> > } > >> >> > > >> >> > -O2 code-gen: > >> >> > foo: > >> >> > fmov s1, wzr > >> >> > ins v0.s[3], v1.s[0] > >> >> > ret > >> >> > > >> >> > I suppose we can instead emit the following code-gen ? > >> >> > foo: > >> >> > ins v0.s[3], wzr > >> >> > ret > >> >> > > >> >> > combine produces: > >> >> > Failed to match this instruction: > >> >> > (set (reg:V4SI 95 [ v ]) > >> >> > (vec_merge:V4SI (const_vector:V4SI [ > >> >> > (const_int 0 [0]) repeated x4 > >> >> > ]) > >> >> > (reg:V4SI 97) > >> >> > (const_int 8 [0x8]))) > >> >> > > >> >> > So, I wrote the following pattern to match the above insn: > >> >> > (define_insn "aarch64_simd_vec_set_zero<mode>" > >> >> > [(set (match_operand:VALL_F16 0 "register_operand" "=w") > >> >> > (vec_merge:VALL_F16 > >> >> > (match_operand:VALL_F16 1 "const_dup0_operand" "w") > >> >> > (match_operand:VALL_F16 3 "register_operand" "0") > >> >> > (match_operand:SI 2 "immediate_operand" "i")))] > >> >> > "TARGET_SIMD" > >> >> > { > >> >> > int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2]))); > >> >> > operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt); > >> >> > return "ins\\t%0.<Vetype>[%p2], wzr"; > >> >> > } > >> >> > ) > >> >> > > >> >> > which now matches the above insn produced by combine. 
> >> >> > However, in reload dump, it creates a new insn for assigning > >> >> > register to (const_vector (const_int 0)), > >> >> > which results in: > >> >> > (insn 19 8 13 2 (set (reg:V4SI 33 v1 [99]) > >> >> > (const_vector:V4SI [ > >> >> > (const_int 0 [0]) repeated x4 > >> >> > ])) "wzr-test.c":8:1 1269 {*aarch64_simd_movv4si} > >> >> > (nil)) > >> >> > (insn 13 19 14 2 (set (reg/i:V4SI 32 v0) > >> >> > (vec_merge:V4SI (reg:V4SI 33 v1 [99]) > >> >> > (reg:V4SI 32 v0 [97]) > >> >> > (const_int 8 [0x8]))) "wzr-test.c":8:1 1808 > >> >> > {aarch64_simd_vec_set_zerov4si} > >> >> > (nil)) > >> >> > > >> >> > and eventually the code-gen: > >> >> > foo: > >> >> > movi v1.4s, 0 > >> >> > ins v0.s[3], wzr > >> >> > ret > >> >> > > >> >> > To get rid of redundant assignment of 0 to v1, I tried to split the > >> >> > above pattern > >> >> > as in the attached patch. This works to emit code-gen: > >> >> > foo: > >> >> > ins v0.s[3], wzr > >> >> > ret > >> >> > > >> >> > However, I am not sure if this is the right approach. Could you suggest, > >> >> > if it'd be possible to get rid of UNSPEC_SETZERO in the patch ? > >> >> > >> >> The problem is with the "w" constraint on operand 1, which tells LRA > >> >> to force the zero into an FPR. It should work if you remove the > >> >> constraint. > >> > Ah indeed, sorry about that, changing the constrained works. > >> > >> "i" isn't right though, because that's for scalar integers. > >> There's no need for any constraint here -- the predicate does > >> all of the work. > >> > >> > Does the attached patch look OK after bootstrap+test ? > >> > Since we're in stage-4, shall it be OK to commit now, or queue it for stage-1 ? > >> > >> It needs tests as well. :-) > >> > >> Also: > >> > >> > Thanks, > >> > Prathamesh > >> > > >> > > >> >> > >> >> Also, I think you'll need to use <vwcore>zr for the zero, so that > >> >> it uses xzr for 64-bit elements. 
> >> >> > >> >> I think this and the existing patterns ought to test > >> >> exact_log2 (INTVAL (operands[2])) >= 0 in the insn condition, > >> >> since there's no guarantee that RTL optimisations won't form > >> >> vec_merges that have other masks. > >> >> > >> >> Thanks, > >> >> Richard > >> > > >> > [aarch64] Use wzr/xzr for assigning 0 to vector element. > >> > > >> > gcc/ChangeLog: > >> > * config/aaarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>): > >> > New pattern. > >> > * config/aarch64/predicates.md (const_dup0_operand): New. > >> > > >> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md > >> > index 104088f67d2..8e54ee4e886 100644 > >> > --- a/gcc/config/aarch64/aarch64-simd.md > >> > +++ b/gcc/config/aarch64/aarch64-simd.md > >> > @@ -1083,6 +1083,20 @@ > >> > [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")] > >> > ) > >> > > >> > +(define_insn "aarch64_simd_vec_set_zero<mode>" > >> > + [(set (match_operand:VALL_F16 0 "register_operand" "=w") > >> > + (vec_merge:VALL_F16 > >> > + (match_operand:VALL_F16 1 "const_dup0_operand" "i") > >> > + (match_operand:VALL_F16 3 "register_operand" "0") > >> > + (match_operand:SI 2 "immediate_operand" "i")))] > >> > + "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0" > >> > + { > >> > + int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2]))); > >> > + operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt); > >> > + return "ins\\t%0.<Vetype>[%p2], <vwcore>zr"; > >> > + } > >> > +) > >> > + > >> > (define_insn "@aarch64_simd_vec_copy_lane<mode>" > >> > [(set (match_operand:VALL_F16 0 "register_operand" "=w") > >> > (vec_merge:VALL_F16 > >> > diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md > >> > index ff7f73d3f30..901fa1bd7f9 100644 > >> > --- a/gcc/config/aarch64/predicates.md > >> > +++ b/gcc/config/aarch64/predicates.md > >> > @@ -49,6 +49,13 @@ > >> > return CONST_INT_P (op) && IN_RANGE (INTVAL (op), 
1, 3); > >> > }) > >> > > >> > +(define_predicate "const_dup0_operand" > >> > + (match_code "const_vector") > >> > +{ > >> > + op = unwrap_const_vec_duplicate (op); > >> > + return CONST_INT_P (op) && rtx_equal_p (op, const0_rtx); > >> > +}) > >> > + > >> > >> We already have aarch64_simd_imm_zero for this. aarch64_simd_imm_zero > >> is actually more general, because it works for floating-point modes too. > >> > >> I think the tests should cover all modes included in VALL_F16, since > >> that should have picked up this and the xzr thing. > > Hi Richard, > > Thanks for the suggestions. Does the attached patch look OK ? > > I am not sure how to test for v4bf and v8bf since it seems the compiler > > refuses conversions to/from bfloat16_t ? > > > > Thanks, > > Prathamesh > > > >> > >> Thanks, > >> Richard > >> > >> > (define_predicate "subreg_lowpart_operator" > >> > (ior (match_code "truncate") > >> > (and (match_code "subreg") > > > > [aarch64] Use wzr/xzr for assigning 0 to vector element. > > > > gcc/ChangeLog: > > * config/aaarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>): > > New pattern. > > > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/vec-set-zero.c: New test. 
> > > > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md > > index 7f212bf37cd..7428e74beaf 100644 > > --- a/gcc/config/aarch64/aarch64-simd.md > > +++ b/gcc/config/aarch64/aarch64-simd.md > > @@ -1083,6 +1083,20 @@ > > [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")] > > ) > > > > +(define_insn "aarch64_simd_vec_set_zero<mode>" > > + [(set (match_operand:VALL_F16 0 "register_operand" "=w") > > + (vec_merge:VALL_F16 > > + (match_operand:VALL_F16 1 "aarch64_simd_imm_zero" "") > > + (match_operand:VALL_F16 3 "register_operand" "0") > > + (match_operand:SI 2 "immediate_operand" "i")))] > > + "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0" > > + { > > + int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2]))); > > + operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt); > > + return "ins\\t%0.<Vetype>[%p2], <vwcore>zr"; > > + } > > +) > > + > > (define_insn "@aarch64_simd_vec_copy_lane<mode>" > > [(set (match_operand:VALL_F16 0 "register_operand" "=w") > > (vec_merge:VALL_F16 > > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c > > new file mode 100644 > > index 00000000000..c260cc9e445 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c > > @@ -0,0 +1,32 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2" } */ > > + > > +#include "arm_neon.h" > > + > > +#define FOO(type) \ > > +type f_##type(type v) \ > > +{ \ > > + v[1] = 0; \ > > + return v; \ > > +} > > + > > +FOO(int8x8_t) > > +FOO(int16x4_t) > > +FOO(int32x2_t) > > + > > +FOO(int8x16_t) > > +FOO(int16x8_t) > > +FOO(int32x4_t) > > +FOO(int64x2_t) > > + > > +FOO(float16x4_t) > > +FOO(float32x2_t) > > + > > +FOO(float16x8_t) > > +FOO(float32x4_t) > > +FOO(float64x2_t) > > + > > +/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.b\\\[\[1\]\\\], wzr" 2 } } */ > > +/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.h\\\[\[1\]\\\], wzr" 4 } } 
*/
> > +/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.s\\\[\[1\]\\\], wzr" 4 } } */
> > +/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.d\\\[\[1\]\\\], xzr" 2 } } */
>
> Can you test big-endian too?  I'd expect it to use different INS indices.
Ah indeed, thanks for pointing out.
>
> It might be worth quoting the regexps with {...} rather than "...",
> to reduce the number of backslashes needed.
Does the attached patch look OK ?

Thanks,
Prathamesh
>
> Thanks,
> Richard

[aarch64] Use wzr/xzr for assigning 0 to vector element.

gcc/ChangeLog:
	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
	New pattern.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/vec-set-zero.c: New test.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7f212bf37cd..7428e74beaf 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1083,6 +1083,20 @@
   [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")]
 )
 
+(define_insn "aarch64_simd_vec_set_zero<mode>"
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(vec_merge:VALL_F16
+	  (match_operand:VALL_F16 1 "aarch64_simd_imm_zero" "")
+	  (match_operand:VALL_F16 3 "register_operand" "0")
+	  (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
+  {
+    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
+    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
+    return "ins\\t%0.<Vetype>[%p2], <vwcore>zr";
+  }
+)
+
 (define_insn "@aarch64_simd_vec_copy_lane<mode>"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
 	(vec_merge:VALL_F16
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
new file mode 100644
index 00000000000..b34b902cf27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+#define FOO(type) \
+type f_##type(type v) \
+{ \
+  v[1] = 0; \
+  return v; \
+}
+
+FOO(int8x8_t)
+FOO(int16x4_t)
+FOO(int32x2_t)
+
+FOO(int8x16_t)
+FOO(int16x8_t)
+FOO(int32x4_t)
+FOO(int64x2_t)
+
+FOO(float16x4_t)
+FOO(float32x2_t)
+
+FOO(float16x8_t)
+FOO(float32x4_t)
+FOO(float64x2_t)
+
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.b\[1\], wzr} 2 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.h\[1\], wzr} 4 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.s\[1\], wzr} 4 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.d\[1\], xzr} 2 { target aarch64_little_endian } } } */
+
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.b\[6\], wzr} 1 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.b\[14\], wzr} 1 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.h\[2\], wzr} 2 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.h\[6\], wzr} 2 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.s\[0\], wzr} 2 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.s\[2\], wzr} 2 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.d\[0\], xzr} 2 { target aarch64_big_endian } } } */
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Does the attached patch look OK ?

Yeah, OK for GCC 14, thanks.

Richard
On Tue, 31 Jan 2023 at 11:51, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > On Mon, 23 Jan 2023 at 22:26, Richard Sandiford > > <richard.sandiford@arm.com> wrote: > >> > >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > >> > On Wed, 18 Jan 2023 at 19:59, Richard Sandiford > >> > <richard.sandiford@arm.com> wrote: > >> >> > >> >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > >> >> > On Tue, 17 Jan 2023 at 18:29, Richard Sandiford > >> >> > <richard.sandiford@arm.com> wrote: > >> >> >> > >> >> >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > >> >> >> > Hi Richard, > >> >> >> > For the following (contrived) test: > >> >> >> > > >> >> >> > void foo(int32x4_t v) > >> >> >> > { > >> >> >> > v[3] = 0; > >> >> >> > return v; > >> >> >> > } > >> >> >> > > >> >> >> > -O2 code-gen: > >> >> >> > foo: > >> >> >> > fmov s1, wzr > >> >> >> > ins v0.s[3], v1.s[0] > >> >> >> > ret > >> >> >> > > >> >> >> > I suppose we can instead emit the following code-gen ? 
> >> >> >> > foo: > >> >> >> > ins v0.s[3], wzr > >> >> >> > ret > >> >> >> > > >> >> >> > combine produces: > >> >> >> > Failed to match this instruction: > >> >> >> > (set (reg:V4SI 95 [ v ]) > >> >> >> > (vec_merge:V4SI (const_vector:V4SI [ > >> >> >> > (const_int 0 [0]) repeated x4 > >> >> >> > ]) > >> >> >> > (reg:V4SI 97) > >> >> >> > (const_int 8 [0x8]))) > >> >> >> > > >> >> >> > So, I wrote the following pattern to match the above insn: > >> >> >> > (define_insn "aarch64_simd_vec_set_zero<mode>" > >> >> >> > [(set (match_operand:VALL_F16 0 "register_operand" "=w") > >> >> >> > (vec_merge:VALL_F16 > >> >> >> > (match_operand:VALL_F16 1 "const_dup0_operand" "w") > >> >> >> > (match_operand:VALL_F16 3 "register_operand" "0") > >> >> >> > (match_operand:SI 2 "immediate_operand" "i")))] > >> >> >> > "TARGET_SIMD" > >> >> >> > { > >> >> >> > int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2]))); > >> >> >> > operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt); > >> >> >> > return "ins\\t%0.<Vetype>[%p2], wzr"; > >> >> >> > } > >> >> >> > ) > >> >> >> > > >> >> >> > which now matches the above insn produced by combine. 
> >> >> >> > However, in reload dump, it creates a new insn for assigning > >> >> >> > register to (const_vector (const_int 0)), > >> >> >> > which results in: > >> >> >> > (insn 19 8 13 2 (set (reg:V4SI 33 v1 [99]) > >> >> >> > (const_vector:V4SI [ > >> >> >> > (const_int 0 [0]) repeated x4 > >> >> >> > ])) "wzr-test.c":8:1 1269 {*aarch64_simd_movv4si} > >> >> >> > (nil)) > >> >> >> > (insn 13 19 14 2 (set (reg/i:V4SI 32 v0) > >> >> >> > (vec_merge:V4SI (reg:V4SI 33 v1 [99]) > >> >> >> > (reg:V4SI 32 v0 [97]) > >> >> >> > (const_int 8 [0x8]))) "wzr-test.c":8:1 1808 > >> >> >> > {aarch64_simd_vec_set_zerov4si} > >> >> >> > (nil)) > >> >> >> > > >> >> >> > and eventually the code-gen: > >> >> >> > foo: > >> >> >> > movi v1.4s, 0 > >> >> >> > ins v0.s[3], wzr > >> >> >> > ret > >> >> >> > > >> >> >> > To get rid of redundant assignment of 0 to v1, I tried to split the > >> >> >> > above pattern > >> >> >> > as in the attached patch. This works to emit code-gen: > >> >> >> > foo: > >> >> >> > ins v0.s[3], wzr > >> >> >> > ret > >> >> >> > > >> >> >> > However, I am not sure if this is the right approach. Could you suggest, > >> >> >> > if it'd be possible to get rid of UNSPEC_SETZERO in the patch ? > >> >> >> > >> >> >> The problem is with the "w" constraint on operand 1, which tells LRA > >> >> >> to force the zero into an FPR. It should work if you remove the > >> >> >> constraint. > >> >> > Ah indeed, sorry about that, changing the constrained works. > >> >> > >> >> "i" isn't right though, because that's for scalar integers. > >> >> There's no need for any constraint here -- the predicate does > >> >> all of the work. > >> >> > >> >> > Does the attached patch look OK after bootstrap+test ? > >> >> > Since we're in stage-4, shall it be OK to commit now, or queue it for stage-1 ? > >> >> > >> >> It needs tests as well. 
:-)
> >> >>
> >> >> Also:
> >> >>
> >> >> > Thanks,
> >> >> > Prathamesh
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> Also, I think you'll need to use <vwcore>zr for the zero, so that
> >> >> >> it uses xzr for 64-bit elements.
> >> >> >>
> >> >> >> I think this and the existing patterns ought to test
> >> >> >> exact_log2 (INTVAL (operands[2])) >= 0 in the insn condition,
> >> >> >> since there's no guarantee that RTL optimisations won't form
> >> >> >> vec_merges that have other masks.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Richard
> >> >> >
> >> >> > [aarch64] Use wzr/xzr for assigning 0 to vector element.
> >> >> >
> >> >> > gcc/ChangeLog:
> >> >> >     * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
> >> >> >     New pattern.
> >> >> >     * config/aarch64/predicates.md (const_dup0_operand): New.
> >> >> >
> >> >> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >> >> > index 104088f67d2..8e54ee4e886 100644
> >> >> > --- a/gcc/config/aarch64/aarch64-simd.md
> >> >> > +++ b/gcc/config/aarch64/aarch64-simd.md
> >> >> > @@ -1083,6 +1083,20 @@
> >> >> >    [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")]
> >> >> >  )
> >> >> >
> >> >> > +(define_insn "aarch64_simd_vec_set_zero<mode>"
> >> >> > +  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> >> >> > +	(vec_merge:VALL_F16
> >> >> > +	    (match_operand:VALL_F16 1 "const_dup0_operand" "i")
> >> >> > +	    (match_operand:VALL_F16 3 "register_operand" "0")
> >> >> > +	    (match_operand:SI 2 "immediate_operand" "i")))]
> >> >> > +  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
> >> >> > +  {
> >> >> > +    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
> >> >> > +    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
> >> >> > +    return "ins\\t%0.<Vetype>[%p2], <vwcore>zr";
> >> >> > +  }
> >> >> > +)
> >> >> > +
> >> >> >  (define_insn "@aarch64_simd_vec_copy_lane<mode>"
> >> >> >    [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> >> >> >	(vec_merge:VALL_F16
> >> >> > diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> >> >> > index ff7f73d3f30..901fa1bd7f9 100644
> >> >> > --- a/gcc/config/aarch64/predicates.md
> >> >> > +++ b/gcc/config/aarch64/predicates.md
> >> >> > @@ -49,6 +49,13 @@
> >> >> >    return CONST_INT_P (op) && IN_RANGE (INTVAL (op), 1, 3);
> >> >> > })
> >> >> >
> >> >> > +(define_predicate "const_dup0_operand"
> >> >> > +  (match_code "const_vector")
> >> >> > +{
> >> >> > +  op = unwrap_const_vec_duplicate (op);
> >> >> > +  return CONST_INT_P (op) && rtx_equal_p (op, const0_rtx);
> >> >> > +})
> >> >> > +
> >> >>
> >> >> We already have aarch64_simd_imm_zero for this.  aarch64_simd_imm_zero
> >> >> is actually more general, because it works for floating-point modes too.
> >> >>
> >> >> I think the tests should cover all modes included in VALL_F16, since
> >> >> that should have picked up this and the xzr thing.
> >> > Hi Richard,
> >> > Thanks for the suggestions. Does the attached patch look OK ?
> >> > I am not sure how to test for v4bf and v8bf since it seems the compiler
> >> > refuses conversions to/from bfloat16_t ?
> >> >
> >> > Thanks,
> >> > Prathamesh
> >> >
> >> >>
> >> >> Thanks,
> >> >> Richard
> >> >>
> >> >> > (define_predicate "subreg_lowpart_operator"
> >> >> >   (ior (match_code "truncate")
> >> >> >        (and (match_code "subreg")
> >> >
> >> > [aarch64] Use wzr/xzr for assigning 0 to vector element.
> >> >
> >> > gcc/ChangeLog:
> >> >     * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
> >> >     New pattern.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >     * gcc.target/aarch64/vec-set-zero.c: New test.
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >> > index 7f212bf37cd..7428e74beaf 100644
> >> > --- a/gcc/config/aarch64/aarch64-simd.md
> >> > +++ b/gcc/config/aarch64/aarch64-simd.md
> >> > @@ -1083,6 +1083,20 @@
> >> >    [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")]
> >> >  )
> >> >
> >> > +(define_insn "aarch64_simd_vec_set_zero<mode>"
> >> > +  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> >> > +	(vec_merge:VALL_F16
> >> > +	    (match_operand:VALL_F16 1 "aarch64_simd_imm_zero" "")
> >> > +	    (match_operand:VALL_F16 3 "register_operand" "0")
> >> > +	    (match_operand:SI 2 "immediate_operand" "i")))]
> >> > +  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
> >> > +  {
> >> > +    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
> >> > +    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
> >> > +    return "ins\\t%0.<Vetype>[%p2], <vwcore>zr";
> >> > +  }
> >> > +)
> >> > +
> >> >  (define_insn "@aarch64_simd_vec_copy_lane<mode>"
> >> >    [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> >> >	(vec_merge:VALL_F16
> >> > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
> >> > new file mode 100644
> >> > index 00000000000..c260cc9e445
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
> >> > @@ -0,0 +1,32 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-options "-O2" } */
> >> > +
> >> > +#include "arm_neon.h"
> >> > +
> >> > +#define FOO(type) \
> >> > +type f_##type(type v) \
> >> > +{ \
> >> > +  v[1] = 0; \
> >> > +  return v; \
> >> > +}
> >> > +
> >> > +FOO(int8x8_t)
> >> > +FOO(int16x4_t)
> >> > +FOO(int32x2_t)
> >> > +
> >> > +FOO(int8x16_t)
> >> > +FOO(int16x8_t)
> >> > +FOO(int32x4_t)
> >> > +FOO(int64x2_t)
> >> > +
> >> > +FOO(float16x4_t)
> >> > +FOO(float32x2_t)
> >> > +
> >> > +FOO(float16x8_t)
> >> > +FOO(float32x4_t)
> >> > +FOO(float64x2_t)
> >> > +
> >> > +/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.b\\\[\[1\]\\\], wzr" 2 } } */
> >> > +/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.h\\\[\[1\]\\\], wzr" 4 } } */
> >> > +/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.s\\\[\[1\]\\\], wzr" 4 } } */
> >> > +/* { dg-final { scan-assembler-times "ins\\tv\[0-9\]+\.d\\\[\[1\]\\\], xzr" 2 } } */
> >>
> >> Can you test big-endian too?  I'd expect it to use different INS indices.
> > Ah indeed, thanks for pointing out.
> >>
> >> It might be worth quoting the regexps with {...} rather than "...",
> >> to reduce the number of backslashes needed.
> > Does the attached patch look OK ?
>
> Yeah, OK for GCC 14, thanks.
Thanks, committed after verifying bootstrap+test passes on aarch64-linux-gnu in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2c7bf8036dfe2f603f1c135dabf6415d8d28051b

Thanks,
Prathamesh
>
> Richard
>
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Richard
> >
> > [aarch64] Use wzr/xzr for assigning 0 to vector element.
> >
> > gcc/ChangeLog:
> >     * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
> >     New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >     * gcc.target/aarch64/vec-set-zero.c: New test.
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index 7f212bf37cd..7428e74beaf 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1083,6 +1083,20 @@
> >    [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")]
> >  )
> >
> > +(define_insn "aarch64_simd_vec_set_zero<mode>"
> > +  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> > +	(vec_merge:VALL_F16
> > +	    (match_operand:VALL_F16 1 "aarch64_simd_imm_zero" "")
> > +	    (match_operand:VALL_F16 3 "register_operand" "0")
> > +	    (match_operand:SI 2 "immediate_operand" "i")))]
> > +  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
> > +  {
> > +    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
> > +    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
> > +    return "ins\\t%0.<Vetype>[%p2], <vwcore>zr";
> > +  }
> > +)
> > +
> >  (define_insn "@aarch64_simd_vec_copy_lane<mode>"
> >    [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> >	(vec_merge:VALL_F16
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
> > new file mode 100644
> > index 00000000000..b34b902cf27
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/vec-set-zero.c
> > @@ -0,0 +1,40 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +#include "arm_neon.h"
> > +
> > +#define FOO(type) \
> > +type f_##type(type v) \
> > +{ \
> > +  v[1] = 0; \
> > +  return v; \
> > +}
> > +
> > +FOO(int8x8_t)
> > +FOO(int16x4_t)
> > +FOO(int32x2_t)
> > +
> > +FOO(int8x16_t)
> > +FOO(int16x8_t)
> > +FOO(int32x4_t)
> > +FOO(int64x2_t)
> > +
> > +FOO(float16x4_t)
> > +FOO(float32x2_t)
> > +
> > +FOO(float16x8_t)
> > +FOO(float32x4_t)
> > +FOO(float64x2_t)
> > +
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.b\[1\], wzr} 2 { target aarch64_little_endian } } } */
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.h\[1\], wzr} 4 { target aarch64_little_endian } } } */
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.s\[1\], wzr} 4 { target aarch64_little_endian } } } */
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.d\[1\], xzr} 2 { target aarch64_little_endian } } } */
> > +
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.b\[6\], wzr} 1 { target aarch64_big_endian } } } */
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.b\[14\], wzr} 1 { target aarch64_big_endian } } } */
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.h\[2\], wzr} 2 { target aarch64_big_endian } } } */
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.h\[6\], wzr} 2 { target aarch64_big_endian } } } */
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.s\[0\], wzr} 2 { target aarch64_big_endian } } } */
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.s\[2\], wzr} 2 { target aarch64_big_endian } } } */
> > +/* { dg-final { scan-assembler-times {ins\tv[0-9]+\.d\[0\], xzr} 2 { target aarch64_big_endian } } } */
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 104088f67d2..5130f46c0da 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1083,6 +1083,39 @@
   [(set_attr "type" "neon_ins<q>, neon_from_gp<q>, neon_load1_one_lane<q>")]
 )
 
+(define_insn "aarch64_simd_set_zero<mode>"
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "0")
+			  (match_operand:SI 2 "immediate_operand" "i")]
+			 UNSPEC_SETZERO))]
+  "TARGET_SIMD"
+  {
+    if (GET_MODE_INNER (<MODE>mode) == DImode)
+      return "ins\\t%0.<Vetype>[%p2], xzr";
+    return "ins\\t%0.<Vetype>[%p2], wzr";
+  }
+  [(set_attr "type" "neon_ins<q>")]
+)
+
+(define_insn_and_split "aarch64_simd_vec_set_zero<mode>"
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(vec_merge:VALL_F16
+	    (match_operand:VALL_F16 1 "const_dup0_operand" "w")
+	    (match_operand:VALL_F16 3 "register_operand" "0")
+	    (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
+    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
+    emit_insn (gen_aarch64_simd_set_zero<mode> (operands[0], operands[3], operands[2]));
+    DONE;
+  }
+  [(set_attr "type" "neon_ins<q>")]
+)
+
 (define_insn "@aarch64_simd_vec_copy_lane<mode>"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
	(vec_merge:VALL_F16
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 5b26443e5b6..8064841ebb4 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -839,6 +839,7 @@
     UNSPEC_FCMUL_CONJ	; Used in aarch64-simd.md.
     UNSPEC_FCMLA_CONJ	; Used in aarch64-simd.md.
     UNSPEC_FCMLA180_CONJ	; Used in aarch64-simd.md.
+    UNSPEC_SETZERO	; Used in aarch64-simd.md.
     UNSPEC_ASRD		; Used in aarch64-sve.md.
     UNSPEC_ADCLB	; Used in aarch64-sve2.md.
     UNSPEC_ADCLT	; Used in aarch64-sve2.md.
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index ff7f73d3f30..901fa1bd7f9 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -49,6 +49,13 @@
   return CONST_INT_P (op) && IN_RANGE (INTVAL (op), 1, 3);
 })
 
+(define_predicate "const_dup0_operand"
+  (match_code "const_vector")
+{
+  op = unwrap_const_vec_duplicate (op);
+  return CONST_INT_P (op) && rtx_equal_p (op, const0_rtx);
+})
+
 (define_predicate "subreg_lowpart_operator"
   (ior (match_code "truncate")
        (and (match_code "subreg")