From patchwork Fri Apr 5 13:51:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 88105 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 86AFC3858C24 for ; Fri, 5 Apr 2024 13:51:51 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 789FB3858C98 for ; Fri, 5 Apr 2024 13:51:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 789FB3858C98 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 789FB3858C98 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712325077; cv=none; b=un3kKFai7XIOJGYp/Y6/mK8kr41TyT+vKEfhf/dDPUjclthAFXzuFHMSYkWIyLTSvaA1AcyzXQPYEtQtwvww7eEjIZluMRI2tVdZ4Ttc3LRWt29XbfZfhSCweh3InKCE2xUxLis5XMDGK1DJZoyGqSs0fy8paPtDFR5do0R3QbQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712325077; c=relaxed/simple; bh=1oJhFYLAzS9CdBrd3Z8s77PVDoTqqqFFsetHBIcZ02s=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=R+VCf/67sRB8ouwI5XnKGXV3UTUwWsjcHhzdFM3ZMv0CAnyOZ1ZfwnBirqAHKAaaNvbM/htxSQlyAAx+NR+nppriIdVWks4iNUM6TA81iGncjyNkpRBmE0wBjl4xLwWCTml/QA4JEjp3XSIFeODW+DI41QkTf6ap2ArRMYbRjLQ= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8B57EDA7 for ; Fri, 5 Apr 2024 06:51:45 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 917863F7B4 for ; Fri, 5 Apr 2024 06:51:14 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: [pushed] aarch64: Fix bogus cnot optimisation [PR114603] Date: Fri, 05 Apr 2024 14:51:13 +0100 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-20.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org aarch64-sve.md had a pattern that combined: cmpeq pb.T, pa/z, zc.T, #0 mov zd.T, pb/z, #1 into: cnot zd.T, pa/m, zc.T But this is only valid if pa.T is a ptrue. In other cases, the original would set inactive elements of zd.T to 0, whereas the combined form would copy elements from zc.T. This isn't a regression on a known testcase. However, it's a nasty wrong code bug that could conceivably trigger for autovec code (although I've not been able to construct a reproducer so far). That fix is also quite localised to the buggy operation. I'd therefore prefer to push the fix now rather than wait for GCC 15. Tested on aarch64-linux-gnu & pushed. I'll backport to branches if there is no fallout. Richard gcc/ PR target/114603 * config/aarch64/aarch64-sve.md (@aarch64_pred_cnot): Replace with... (@aarch64_ptrue_cnot): ...this, requiring operand 1 to be a ptrue. (*cnot): Require operand 1 to be a ptrue. * config/aarch64/aarch64-sve-builtins-base.cc (svcnot_impl::expand): Use aarch64_ptrue_cnot for _x operations that are predicated with a ptrue. Represent other _x operations as fully-defined _m operations. gcc/testsuite/ PR target/114603 * gcc.target/aarch64/sve/acle/general/cnot_1.c: New test. --- .../aarch64/aarch64-sve-builtins-base.cc | 25 ++++++++++++------- gcc/config/aarch64/aarch64-sve.md | 22 ++++++++-------- .../aarch64/sve/acle/general/cnot_1.c | 23 +++++++++++++++++ 3 files changed, 50 insertions(+), 20 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cnot_1.c diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index 257ca5bf6ad..5be2315a3c6 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -517,15 +517,22 @@ public: expand (function_expander &e) const override { machine_mode mode = e.vector_mode (0); - if (e.pred == PRED_x) - { - /* The pattern for CNOT includes an UNSPEC_PRED_Z, so needs - a ptrue hint. */ - e.add_ptrue_hint (0, e.gp_mode (0)); - return e.use_pred_x_insn (code_for_aarch64_pred_cnot (mode)); - } - - return e.use_cond_insn (code_for_cond_cnot (mode), 0); + machine_mode pred_mode = e.gp_mode (0); + /* The underlying _x pattern is effectively: + + dst = src == 0 ? 1 : 0 + + rather than an UNSPEC_PRED_X. Using this form allows autovec + constructs to be matched by combine, but it means that the + predicate on the src == 0 comparison must be all-true. + + For simplicity, represent other _x operations as fully-defined _m + operations rather than using a separate bespoke pattern. */ + if (e.pred == PRED_x + && gen_lowpart (pred_mode, e.args[0]) == CONSTM1_RTX (pred_mode)) + return e.use_pred_x_insn (code_for_aarch64_ptrue_cnot (mode)); + return e.use_cond_insn (code_for_cond_cnot (mode), + e.pred == PRED_x ? 1 : 0); } }; diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index eca8623e587..0434358122d 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -3363,24 +3363,24 @@ (define_insn_and_split "trunc2" ;; - CNOT ;; ------------------------------------------------------------------------- -;; Predicated logical inverse. -(define_expand "@aarch64_pred_cnot" +;; Logical inverse, predicated with a ptrue. +(define_expand "@aarch64_ptrue_cnot" [(set (match_operand:SVE_FULL_I 0 "register_operand") (unspec:SVE_FULL_I [(unspec: [(match_operand: 1 "register_operand") - (match_operand:SI 2 "aarch64_sve_ptrue_flag") + (const_int SVE_KNOWN_PTRUE) (eq: - (match_operand:SVE_FULL_I 3 "register_operand") - (match_dup 4))] + (match_operand:SVE_FULL_I 2 "register_operand") + (match_dup 3))] UNSPEC_PRED_Z) - (match_dup 5) - (match_dup 4)] + (match_dup 4) + (match_dup 3)] UNSPEC_SEL))] "TARGET_SVE" { - operands[4] = CONST0_RTX (mode); - operands[5] = CONST1_RTX (mode); + operands[3] = CONST0_RTX (mode); + operands[4] = CONST1_RTX (mode); } ) @@ -3389,7 +3389,7 @@ (define_insn "*cnot" (unspec:SVE_I [(unspec: [(match_operand: 1 "register_operand") - (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (const_int SVE_KNOWN_PTRUE) (eq: (match_operand:SVE_I 2 "register_operand") (match_operand:SVE_I 3 "aarch64_simd_imm_zero"))] @@ -11001,4 +11001,4 @@ (define_insn "@aarch64_sve_set_neonq_" GET_MODE (operands[2])); return "sel\t%0., %3, %2., %1."; } -) \ No newline at end of file +) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cnot_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cnot_1.c new file mode 100644 index 00000000000..b1a489f0cf0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cnot_1.c @@ -0,0 +1,23 @@ +/* { dg-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/* +** foo: +** cmpeq (p[0-7])\.s, p0/z, z0\.s, #0 +** mov z0\.s, \1/z, #1 +** ret +*/ +svint32_t foo(svbool_t pg, svint32_t y) +{ + return svsel(svcmpeq(pg, y, 0), svdup_s32(1), svdup_s32(0)); +} + +#ifdef __cplusplus +} +#endif