From patchwork Fri Jan 27 11:06:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 63786 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 17B7B385842D for ; Fri, 27 Jan 2023 11:06:53 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 17B7B385842D DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1674817613; bh=hJxICQ+xGT/wGH6q8NiYsXqdKXDKWCerbKc4ufy1KCw=; h=To:Cc:Subject:Date:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=Cp5ZCahBwcf9LPqKKDSOJGuM2LwvH6WniSsmPr0c1VRkKbaqNczygAa/OQMtH7AfF ELz2pz3bOjmSLdN7KofqBIBM7ErNkLjJf1vH5A/tlWq5bNoEiYD4o0kJtCKmFMOr+E pIXRO5zE+GIcdmNcEkWCQN3J6txUhC69rWAg+VZI= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id AD03A3858422 for ; Fri, 27 Jan 2023 11:06:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AD03A3858422 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 441952B; Fri, 27 Jan 2023 03:07:04 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.99.50]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E02B33F64C; Fri, 27 Jan 2023 03:06:21 -0800 (PST) To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, rguenther@suse.de, richard.sandiford@arm.com Cc: rguenther@suse.de Subject: [PATCH 1/2] Add support for conditional xorsign [PR96373] Date: Fri, 27 Jan 2023 11:06:20 +0000 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-36.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Richard Sandiford via Gcc-patches From: Richard Sandiford Reply-To: Richard Sandiford Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch is an optimisation, but it's also a prerequisite for fixing PR96373 without regressing vect-xorsign_exec.c. Currently the vectoriser vectorises: for (i = 0; i < N; i++) r[i] = a[i] * __builtin_copysignf (1.0f, b[i]); as two unconditional operations (copysign and mult). tree-ssa-math-opts.cc later combines them into an "xorsign" function. This works for both Advanced SIMD and SVE. However, with the fix for PR96373, the vectoriser will instead generate a conditional multiplication (IFN_COND_MUL). Something then needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional xorsign. Three obvious options were: (1) Extend tree-ssa-math-opts.cc. (2) Do the fold in match.pd. (3) Leave it to rtl combine. I'm against (3), because this isn't a target-specific optimisation. (1) would be possible, but would involve open-coding a lot of what match.pd does for us. And, in contrast to doing the current tree-ssa-math-opts.cc optimisation in match.pd, there should be no danger of (2) happening too early. If we have an IFN_COND_MUL then we're already past the stage of simplifying the original source code. There was also a choice between adding a conditional xorsign ifn and simply open-coding the xorsign. The latter seems simpler, and means less boiler-plate for target-specific code. The signed_or_unsigned_type_for change is needed to make sure that we stay in "SVE space" when doing the optimisation on 128-bit fixed-length SVE. Tested on aarch64-linux-gnu. OK to install? Richard gcc/ PR tree-optimization/96373 * tree.h (sign_mask_for): Declare. * tree.cc (sign_mask_for): New function. (signed_or_unsigned_type_for): For vector types, try to use the related_int_vector_mode. * genmatch.cc (commutative_op): Handle conditional internal functions. * match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and. gcc/testsuite/ PR tree-optimization/96373 * gcc.target/aarch64/sve/cond_xorsign_1.c: New test. * gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise. --- gcc/genmatch.cc | 15 ++++++++ gcc/match.pd | 14 ++++++++ .../gcc.target/aarch64/sve/cond_xorsign_1.c | 34 +++++++++++++++++++ .../gcc.target/aarch64/sve/cond_xorsign_2.c | 17 ++++++++++ gcc/tree.cc | 33 ++++++++++++++++++ gcc/tree.h | 1 + 6 files changed, 114 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc index d4cb439a851..e147ab9db7a 100644 --- a/gcc/genmatch.cc +++ b/gcc/genmatch.cc @@ -489,6 +489,21 @@ commutative_op (id_base *id) case CFN_FNMS: return 0; + case CFN_COND_ADD: + case CFN_COND_MUL: + case CFN_COND_MIN: + case CFN_COND_MAX: + case CFN_COND_FMIN: + case CFN_COND_FMAX: + case CFN_COND_AND: + case CFN_COND_IOR: + case CFN_COND_XOR: + case CFN_COND_FMA: + case CFN_COND_FMS: + case CFN_COND_FNMA: + case CFN_COND_FNMS: + return 1; + default: return -1; } diff --git a/gcc/match.pd b/gcc/match.pd index 56ac743aa6d..f605b798c44 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -339,6 +339,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@0))) (COPYSIGN_ALL (negate @0) @1))) +/* Transform c ? x * copysign (1, y) : z to c ? x ^ signs(y) : z. + tree-ssa-math-opts.cc does the corresponding optimization for + unconditional multiplications (via xorsign). */ +(simplify + (IFN_COND_MUL:c @0 @1 (IFN_COPYSIGN real_onep @2) @3) + (with { tree signs = sign_mask_for (type); } + (if (signs) + (with { tree inttype = TREE_TYPE (signs); } + (view_convert:type + (IFN_COND_XOR:inttype @0 + (view_convert:inttype @1) + (bit_and (view_convert:inttype @2) { signs; }) + (view_convert:inttype @3))))))) + /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */ (simplify (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c new file mode 100644 index 00000000000..338ca605923 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#define xorsign(A, B, SUFFIX) ((A) * __builtin_copysign##SUFFIX (1.0, B)) + +#define DEF_LOOP(TYPE, SUFFIX) \ + void __attribute__ ((noinline, noclone)) \ + test_##TYPE (TYPE *__restrict r, TYPE *__restrict a, \ + TYPE *__restrict b, TYPE *__restrict c, \ + int n) \ + { \ + for (int i = 0; i < n; ++i) \ + r[i] = a[i] < 20 ? xorsign(b[i], c[i], SUFFIX) : b[i]; \ + } + +#define TEST_ALL(T) \ + T (_Float16, f16) \ + T (float, f) \ + T (double, ) + +TEST_ALL (DEF_LOOP) + +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */ + +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-not {\tfmul} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tsel\t} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c new file mode 100644 index 00000000000..274dd0ede59 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msve-vector-bits=128 --param aarch64-autovec-preference=2" } */ + +#include "cond_xorsign_1.c" + +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */ + +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-not {\tfmul} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tsel\t} } } */ diff --git a/gcc/tree.cc b/gcc/tree.cc index 7473912a065..94d20eaba17 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -2669,6 +2669,35 @@ build_zero_cst (tree type) } } +/* If floating-point type TYPE has an IEEE-style sign bit, return an + unsigned constant in which only the sign bit is set. Return null + otherwise. */ + +tree +sign_mask_for (tree type) +{ + /* Avoid having to choose between a real-only sign and a pair of signs. + This could be relaxed if the choice becomes obvious later. */ + if (TREE_CODE (type) == COMPLEX_TYPE) + return NULL_TREE; + + auto eltmode = as_a (element_mode (type)); + auto bits = REAL_MODE_FORMAT (eltmode)->ieee_bits; + if (!bits || !pow2p_hwi (bits)) + return NULL_TREE; + + tree inttype = unsigned_type_for (type); + if (!inttype) + return NULL_TREE; + + auto mask = wi::set_bit_in_zero (bits - 1, bits); + if (TREE_CODE (inttype) == VECTOR_TYPE) + { + tree elt = wide_int_to_tree (TREE_TYPE (inttype), mask); + return build_vector_from_val (inttype, elt); + } + return wide_int_to_tree (inttype, mask); +} /* Build a BINFO with LEN language slots. */ @@ -10961,6 +10990,10 @@ signed_or_unsigned_type_for (int unsignedp, tree type) return NULL_TREE; if (inner == inner2) return type; + machine_mode new_mode; + if (VECTOR_MODE_P (TYPE_MODE (type)) + && related_int_vector_mode (TYPE_MODE (type)).exists (&new_mode)) + return build_vector_type_for_mode (inner2, new_mode); return build_vector_type (inner2, TYPE_VECTOR_SUBPARTS (type)); } diff --git a/gcc/tree.h b/gcc/tree.h index e730a2a3e56..c656cd5b7bf 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -4675,6 +4675,7 @@ extern tree build_one_cst (tree); extern tree build_minus_one_cst (tree); extern tree build_all_ones_cst (tree); extern tree build_zero_cst (tree); +extern tree sign_mask_for (tree); extern tree build_string (unsigned, const char * = NULL); extern tree build_poly_int_cst (tree, const poly_wide_int_ref &); extern tree build_tree_list (tree, tree CXX_MEM_STAT_INFO); From patchwork Fri Jan 27 11:08:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 63787 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C81743858428 for ; Fri, 27 Jan 2023 11:09:09 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C81743858428 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1674817749; bh=zxkCTCkDtuix0ix60Nes9qhzA8VlLYukisOkXSJh8F8=; h=To:Cc:Subject:Date:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=TKvZLg9efO/ebMGRC55+mWYsbX7HM8DTh977HWMD7Y58ix321i11rZa4aOx3MJOD+ L8PbsS637O/gqyIPxs2BMREQgEkM9jsQqROFZk4+PaQt+aIiQLBnToF+q/aVG8wzRV p2H6RJOAfhhXlssIhRhPWZLrSTrgqbjZ+P75xaww= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id A3B313858C60; Fri, 27 Jan 2023 11:08:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A3B313858C60 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3EA782B; Fri, 27 Jan 2023 03:09:22 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.99.50]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id BF2643F64C; Fri, 27 Jan 2023 03:08:39 -0800 (PST) To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, rguenther@suse.de, linkw@gcc.gnu.org, richard.sandiford@arm.com Cc: rguenther@suse.de, linkw@gcc.gnu.org Subject: [PATCH 2/2] vect: Make partial trapping ops use predication [PR96373] Date: Fri, 27 Jan 2023 11:08:38 +0000 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-36.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Richard Sandiford via Gcc-patches From: Richard Sandiford Reply-To: Richard Sandiford Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" PR96373 points out that a predicated SVE loop currently converts trapping unconditional ops into unpredicated vector ops. Doing the operation on inactive lanes can then raise an exception. As discussed in the PR trail, we aren't 100% consistent about whether we preserve traps or not. But the direction of travel is clearly to improve that rather than live with it. This patch tries to do that for the SVE case. Doing this regresses gcc.target/aarch64/sve/fabd_1.c. I've added -fno-trapping-math for now and filed PR108571 to track it. A similar problem applies to fsubr_1.d. I think this is likely to regress Power 10, since conditional operations are only available for masked loops. I think we'll need to add -fno-trapping-math to any affected testcases, but I don't have a Power 10 system to test on. Kewen, would you mind giving this a spin and seeing how bad the fallout is? Tested on aarch64-linux-gnu. OK to install assuming no blockers on the Power 10 side? Richard gcc/ PR tree-optimization/96373 * tree-vect-stmts.cc (vectorizable_operation): Predicate trapping operations on the loop mask. Reject partial vectors if this isn't possible. gcc/testsuite/ PR tree-optimization/96373 PR tree-optimization/108571 * gcc.target/aarch64/sve/fabd_1.c: Add -fno-trapping-math. * gcc.target/aarch64/sve/fsubr_1.c: Likewise. * gcc.target/aarch64/sve/fmul_1.c: Expect predicate ops. * gcc.target/aarch64/sve/fp_arith_1.c: Likewise. --- gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c | 2 +- gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c | 12 +++---- .../gcc.target/aarch64/sve/fp_arith_1.c | 12 +++---- .../gcc.target/aarch64/sve/fsubr_1.c | 2 +- gcc/tree-vect-stmts.cc | 32 ++++++++++++++----- 5 files changed, 38 insertions(+), 22 deletions(-) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c index 13ad83be24c..30bde6f0df7 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c @@ -1,5 +1,5 @@ /* { dg-do assemble { target aarch64_asm_sve_ok } } */ -/* { dg-options "-O3 --save-temps" } */ +/* { dg-options "-O3 --save-temps -fno-trapping-math" } */ #define N 16 diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c index 4a3e7c06745..0245a8c1422 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c @@ -27,20 +27,20 @@ DO_ARITH_OPS (_Float16, *, mul) DO_ARITH_OPS (float, *, mul) DO_ARITH_OPS (double, *, mul) -/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */ +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */ /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #0.5\n} 1 } } */ -/* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2} } } */ +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2.0\n} 1 } } */ /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #5} } } */ /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #-} } } */ -/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */ +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */ /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #0.5\n} 1 } } */ -/* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2} } } */ +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2.0\n} 1 } } */ /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #5} } } */ /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #-} } } */ -/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */ +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */ /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #0.5\n} 1 } } */ -/* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2} } } */ +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2.0\n} 1 } } */ /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #5} } } */ /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #-} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c index 5aed0dcb490..419d6e1b5ec 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c @@ -34,37 +34,37 @@ DO_ARITH_OPS (double, -, minus) /* No specific count because it's valid to use fadd or fsub for the out-of-range constants. */ -/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} } } */ +/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} } } */ /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #1.0\n} 2 } } */ /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #0.5\n} 2 } } */ /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2} } } */ /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #-} } } */ -/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} } } */ +/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} } } */ /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #1.0\n} 2 } } */ /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #0.5\n} 2 } } */ /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2} } } */ /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #-} } } */ -/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} } } */ +/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} } } */ /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #1.0\n} 2 } } */ /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #0.5\n} 2 } } */ /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2} } } */ /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #-} } } */ -/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} } } */ +/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} } } */ /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #1.0\n} 2 } } */ /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #0.5\n} 2 } } */ /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2} } } */ /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #-} } } */ -/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} } } */ +/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} } } */ /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #1.0\n} 2 } } */ /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #0.5\n} 2 } } */ /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2} } } */ /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #-} } } */ -/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} } } */ +/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} } } */ /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #1.0\n} 2 } } */ /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #0.5\n} 2 } } */ /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fsubr_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fsubr_1.c index f47a360dee9..012cf6e9e5d 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/fsubr_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/fsubr_1.c @@ -1,5 +1,5 @@ /* { dg-do assemble { target aarch64_asm_sve_ok } } */ -/* { dg-options "-O3 --save-temps" } */ +/* { dg-options "-O3 --save-temps -fno-trapping-math" } */ #define DO_IMMEDIATE_OPS(VALUE, TYPE, NAME) \ void vsubrarithimm_##NAME##_##TYPE (TYPE *dst, int count) \ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index eb4ca1f184e..56e3c30658e 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -6301,6 +6301,7 @@ vectorizable_operation (vec_info *vinfo, int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info); vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL); internal_fn cond_fn = get_conditional_internal_fn (code); + bool could_trap = gimple_could_trap_p (stmt); if (!vec_stmt) /* transformation not required. */ { @@ -6309,7 +6310,7 @@ vectorizable_operation (vec_info *vinfo, keeping the inactive lanes as-is. */ if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) - && reduc_idx >= 0) + && (could_trap || reduc_idx >= 0)) { if (cond_fn == IFN_LAST || !direct_internal_fn_supported_p (cond_fn, vectype, @@ -6452,16 +6453,31 @@ vectorizable_operation (vec_info *vinfo, vop1 = ((op_type == binary_op || op_type == ternary_op) ? vec_oprnds1[i] : NULL_TREE); vop2 = ((op_type == ternary_op) ? vec_oprnds2[i] : NULL_TREE); - if (masked_loop_p && reduc_idx >= 0) + if (masked_loop_p && (reduc_idx >= 0 || could_trap)) { - /* Perform the operation on active elements only and take - inactive elements from the reduction chain input. */ - gcc_assert (!vop2); - vop2 = reduc_idx == 1 ? vop1 : vop0; tree mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies, vectype, i); - gcall *call = gimple_build_call_internal (cond_fn, 4, mask, - vop0, vop1, vop2); + auto_vec vops (5); + vops.quick_push (mask); + vops.quick_push (vop0); + if (vop1) + vops.quick_push (vop1); + if (vop2) + vops.quick_push (vop2); + if (reduc_idx >= 0) + { + /* Perform the operation on active elements only and take + inactive elements from the reduction chain input. */ + gcc_assert (!vop2); + vops.quick_push (reduc_idx == 1 ? vop1 : vop0); + } + else + { + auto else_value = targetm.preferred_else_value + (cond_fn, vectype, vops.length () - 1, &vops[1]); + vops.quick_push (else_value); + } + gcall *call = gimple_build_call_internal_vec (cond_fn, vops); new_temp = make_ssa_name (vec_dest, call); gimple_call_set_lhs (call, new_temp); gimple_call_set_nothrow (call, true);