Message ID | alpine.DEB.2.20.2201251820040.11348@tpp.orcam.me.uk |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 297453947C23 for <patchwork@sourceware.org>; Thu, 27 Jan 2022 12:16:49 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-lj1-x231.google.com (mail-lj1-x231.google.com [IPv6:2a00:1450:4864:20::231]) by sourceware.org (Postfix) with ESMTPS id 1D54E385AC28 for <gcc-patches@gcc.gnu.org>; Thu, 27 Jan 2022 12:16:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1D54E385AC28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com Received: by mail-lj1-x231.google.com with SMTP id z20so4017423ljo.6 for <gcc-patches@gcc.gnu.org>; Thu, 27 Jan 2022 04:16:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; h=date:from:to:cc:subject:message-id:user-agent:mime-version; bh=qK3d95q8GBr20r1OhAgUSb0dPeaNywHA5an923lVrO0=; b=T85ldVSqo+XVf7VlwF8zajbARzmE8Zlg3i1cn2N4JWxoMa+TyxJWb+AJrnWahp2hYO 3//pTPNHe+hoEWxQp0fs6m/1cWdn615M0P4M1CdV3/RKab1i3BbPDrPeCA7itZAHZY+l NnabREQE6HYObwBa0MbMewjFtvxgegVk/UAol/IfXXDnzGP3x4Zr2rKGGReU47b79y55 n9Qfvj6Fk4pgckxX84pgj+YbtAJBa56QBbuBvGdlXbZk/e4HqLbmpO8dN0vSenZ7wkxK KrAgM0pGeDHmmNi9iJ0pNOSOVFcVonuO5CIic10xhYBSYsGxd8aAhyNtFz1t6bxPOMyQ s9tA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:user-agent :mime-version; bh=qK3d95q8GBr20r1OhAgUSb0dPeaNywHA5an923lVrO0=; b=mKpftcDP9/EFH/hN0CPyQvOS6Kc4DW6jV26vpVoNoU4ZmnPbsoV/NZvhu7KshW8r+d gimCcDFsYFTrFGkjyGqJeX9gkofJKnsnfIN28rLRyFIDvMS8H7c2h9eHRNwXxfNO6etJ csWfheY4u2GDV+KKefADTS/fQznt8I4onweFL0LcVK/yrTTgHFG1t5Dh2yuX6wv3IglI m1rEoU9Vt6FefOg/KWhvpkfkcXQttEeK3BKLyzYare904nxfxL2sg1jNSG6nS1hDYt/d tvEwTMbHsO1t4NsC/t/pt/oBCPbFnsUflGcpuCNIB1XMqWZcWLFoumv+RVD/Pt68EziT lY3g== X-Gm-Message-State: AOAM531gmgnhL4L+TMHwsGTlFqWxTnpKbtQjGY3QzUadt8S4V5whE0pe u70Xlg0ZG+Gn/SMJ/dg+xVh4hBNWq3ZFjA== X-Google-Smtp-Source: ABdhPJyn7RdIoBu/G9CUSarLXBTJQulTMOqA/ereV3JYAGxhDraXVSd3yivnejrzGM6XoiiGHSu20w== X-Received: by 2002:a05:651c:1607:: with SMTP id f7mr2599911ljq.367.1643285790741; Thu, 27 Jan 2022 04:16:30 -0800 (PST) Received: from [192.168.219.3] ([78.8.192.131]) by smtp.gmail.com with ESMTPSA id r17sm1807700lfg.237.2022.01.27.04.16.29 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 27 Jan 2022 04:16:30 -0800 (PST) Date: Thu, 27 Jan 2022 12:16:27 +0000 (GMT) From: "Maciej W. Rozycki" <macro@embecosm.com> To: gcc-patches@gcc.gnu.org Subject: [PATCH][GCC13] Don't force side effects for hardware vector element broadcast Message-ID: <alpine.DEB.2.20.2201251820040.11348@tpp.orcam.me.uk> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Cc: Richard Sandiford <richard.sandiford@arm.com>, Marc Glisse <marc.glisse@inria.fr> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
[GCC13] Don't force side effects for hardware vector element broadcast
|
|
Commit Message
Maciej W. Rozycki
Jan. 27, 2022, 12:16 p.m. UTC
Do not mark vector element broadcast resulting from OpenCL operations as having side effects for targets that have a suitable hardware operation, so that the `vec_duplicateM' standard RTL pattern can be directly used for them. This does not happen currently, because any RTL pattern named `vec_duplicateM' cannot express the side effects presumed, and therefore a `!TREE_SIDE_EFFECTS' condition placed in `store_constructor' correctly prevents any `vec_duplicateM' pattern from being used. This side-effect annotation was introduced for the C frontend with: <https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01075.html>, and then: <https://gcc.gnu.org/ml/gcc-patches/2011-08/msg00891.html> ("Scalar vector binary operation"), and finally landed with commit 0e3a99ae913c ("c-typeck.c (scalar_to_vector): New function."), along with original support for scalar-to-vector OpenCL operations. Support for these operations was then gradually added to C++ with: <https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01557.html>, and then: <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00778.html> ("[C++] Mixed scalar-vector operations"), landing with commit a212e43fca22 ("re PR c++/54427 (Expose more vector extensions)"), followed by: <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01665.html>, and then: <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg02253.html> ("[C++] Handle ?: for vectors"), landing with commit 93100c6b5b45 ("re PR c++/54427 (Expose more vector extensions)"). And side-effect annotation for C++ was retrofitted, again gradually, with: <https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00254.html> ("[C++] Missing save_expr in vector-scalar ops"), landing with commit 6698175d1591 ("typeck.c (cp_build_binary_op): Call save_expr before build_vector_from_val."), and then: <https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00233.html> ("vector conditional expression with scalar arguments"), landing with commit 07298ffd6f7c ("call.c (build_conditional_expr_1): Handle the case with 1 vector and 2 scalars.") This was all long before we gained support for the `vec_duplicateM' standard RTL pattern, with commit be4c1d4a42c5 ("Add VEC_DUPLICATE_EXPR and associated optab") and it may have had sense for where the element broadcast operation was always open-coded. However where provided by hardware the element broadcast operation is just like any other vector operation, presumably cheap, and whether to repeat it or not should be based on the cost of the operation and not hardwired. So there is no point in artificially marking the operation as having a side effect for the lone purpose of preventing repetition. This does not currently prevent targets like the Aarch64 and MIPS ones from using the hardware element broadcast operation, but it requires the backend to wire it explicitly via the `vec_initMN' standard RTL pattern rather than relying on the middle end to pick it up via the `vec_duplicateM' pattern automatically. This change lifts this limitation. gcc/ * c/c-typeck.cc (build_binary_op) Do not annotate vector element broadcast operations as having a side effect if there is such an operation provided by the backend. * cp/call.cc (build_conditional_expr): Likewise. * cp/typeck.cc (cp_build_binary_op): Likewise. --- Hi, I've come across this while working on an out-of-tree microarchitecture, which does not require a `vec_initMN' RTL pattern, however has hardware support suitable for cheap `vec_duplicateM'. I could not persuade the middle end to make use of the new `vec_duplicateM' pattern however until I removed the side-effect anbnotation. So I think this is a move in the right direction even if we do not have support for said microarchitecture at the moment. For some reason I could not persuade our current Aarch64 code to use the `vec_duplicateM' pattern directly either. This is because at the time `store_constructor' is called the vector mode for the element broadcast operation is say `V4SFmode' (for a vector of floats), however there is no corresponding `vec_duplicatev4sf' pattern. Code in the `vec_initv4sfsf' pattern does the right thing though, referring to `aarch64_simd_dupv4sf' by hand instead. There is a `vec_duplicatevnx4sf' pattern corresponding to `VNx4SFmode', but I was unable to figure out how to activate it. This change has passed regression-testing with the `aarch64-linux-gnu' target (with `-march=armv9-a') and QEMU in the user emulation mode across all the GCC languages and libraries. NB there are indentation issues with code surrounding places I have modified, but I have chosen not to address them so as not to obfuscate this change. Likewise there are unnecessary compound statements used in the switch statements in the C++ frontend parts modified. Any questions or comments? Otherwise OK once GCC 13 has opened? Maciej --- gcc/c/c-typeck.cc | 9 +++++++-- gcc/cp/call.cc | 19 +++++++++++++++---- gcc/cp/typeck.cc | 9 +++++++-- 3 files changed, 29 insertions(+), 8 deletions(-) gcc-scalar-to-vector-no-side-effect.diff
Comments
On Thu, Jan 27, 2022 at 1:16 PM Maciej W. Rozycki <macro@embecosm.com> wrote: > > Do not mark vector element broadcast resulting from OpenCL operations as > having side effects for targets that have a suitable hardware operation, > so that the `vec_duplicateM' standard RTL pattern can be directly used > for them. This does not happen currently, because any RTL pattern named > `vec_duplicateM' cannot express the side effects presumed, and therefore > a `!TREE_SIDE_EFFECTS' condition placed in `store_constructor' correctly > prevents any `vec_duplicateM' pattern from being used. > > This side-effect annotation was introduced for the C frontend with: > <https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01075.html>, and then: > <https://gcc.gnu.org/ml/gcc-patches/2011-08/msg00891.html> ("Scalar > vector binary operation"), and finally landed with commit 0e3a99ae913c > ("c-typeck.c (scalar_to_vector): New function."), along with original > support for scalar-to-vector OpenCL operations. > > Support for these operations was then gradually added to C++ with: > <https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01557.html>, and then: > <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00778.html> ("[C++] Mixed > scalar-vector operations"), landing with commit a212e43fca22 ("re PR > c++/54427 (Expose more vector extensions)"), followed by: > <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01665.html>, and then: > <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg02253.html> ("[C++] > Handle ?: for vectors"), landing with commit 93100c6b5b45 ("re PR > c++/54427 (Expose more vector extensions)"). > > And side-effect annotation for C++ was retrofitted, again gradually, > with: <https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00254.html> ("[C++] > Missing save_expr in vector-scalar ops"), landing with commit > 6698175d1591 ("typeck.c (cp_build_binary_op): Call save_expr before > build_vector_from_val."), and then: > <https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00233.html> ("vector > conditional expression with scalar arguments"), landing with commit > 07298ffd6f7c ("call.c (build_conditional_expr_1): Handle the case with > 1 vector and 2 scalars.") > > This was all long before we gained support for the `vec_duplicateM' > standard RTL pattern, with commit be4c1d4a42c5 ("Add VEC_DUPLICATE_EXPR > and associated optab") and it may have had sense for where the element > broadcast operation was always open-coded. However where provided by > hardware the element broadcast operation is just like any other vector > operation, presumably cheap, and whether to repeat it or not should be > based on the cost of the operation and not hardwired. > > So there is no point in artificially marking the operation as having a > side effect for the lone purpose of preventing repetition. This does > not currently prevent targets like the Aarch64 and MIPS ones from using > the hardware element broadcast operation, but it requires the backend to > wire it explicitly via the `vec_initMN' standard RTL pattern rather than > relying on the middle end to pick it up via the `vec_duplicateM' pattern > automatically. This change lifts this limitation. > > gcc/ > * c/c-typeck.cc (build_binary_op) Do not annotate vector element > broadcast operations as having a side effect if there is such an > operation provided by the backend. > * cp/call.cc (build_conditional_expr): Likewise. > * cp/typeck.cc (cp_build_binary_op): Likewise. > --- > Hi, > > I've come across this while working on an out-of-tree microarchitecture, > which does not require a `vec_initMN' RTL pattern, however has hardware > support suitable for cheap `vec_duplicateM'. I could not persuade the > middle end to make use of the new `vec_duplicateM' pattern however until I > removed the side-effect anbnotation. So I think this is a move in the > right direction even if we do not have support for said microarchitecture > at the moment. > > For some reason I could not persuade our current Aarch64 code to use the > `vec_duplicateM' pattern directly either. This is because at the time > `store_constructor' is called the vector mode for the element broadcast > operation is say `V4SFmode' (for a vector of floats), however there is no > corresponding `vec_duplicatev4sf' pattern. Code in the `vec_initv4sfsf' > pattern does the right thing though, referring to `aarch64_simd_dupv4sf' > by hand instead. There is a `vec_duplicatevnx4sf' pattern corresponding > to `VNx4SFmode', but I was unable to figure out how to activate it. > > This change has passed regression-testing with the `aarch64-linux-gnu' > target (with `-march=armv9-a') and QEMU in the user emulation mode across > all the GCC languages and libraries. > > NB there are indentation issues with code surrounding places I have > modified, but I have chosen not to address them so as not to obfuscate > this change. Likewise there are unnecessary compound statements used in > the switch statements in the C++ frontend parts modified. > > Any questions or comments? Otherwise OK once GCC 13 has opened? > > Maciej > --- > gcc/c/c-typeck.cc | 9 +++++++-- > gcc/cp/call.cc | 19 +++++++++++++++---- > gcc/cp/typeck.cc | 9 +++++++-- > 3 files changed, 29 insertions(+), 8 deletions(-) > > gcc-scalar-to-vector-no-side-effect.diff > Index: gcc/gcc/c/c-typeck.cc > =================================================================== > --- gcc.orig/gcc/c/c-typeck.cc > +++ gcc/gcc/c/c-typeck.cc > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > #include "gomp-constants.h" > #include "spellcheck-tree.h" > #include "gcc-rich-location.h" > +#include "optabs-query.h" > #include "stringpool.h" > #include "attribs.h" > #include "asan.h" > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > bool maybe_const = true; > tree sc; > sc = c_fully_fold (op0, false, &maybe_const); > - sc = save_expr (sc); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type1)) == CODE_FOR_nothing) > + sc = save_expr (sc); This doesn't make much sense - I suppose the CONSTRUCTOR retains TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE and thus should have been cleared during gimplification or in the end ignored by RTL expansion. You do not add a testcase so it's hard to see what goes wrong where exactly. Richard. > sc = convert (TREE_TYPE (type1), sc); > op0 = build_vector_from_val (type1, sc); > if (!maybe_const) > @@ -11938,7 +11941,9 @@ build_binary_op (location_t location, en > bool maybe_const = true; > tree sc; > sc = c_fully_fold (op1, false, &maybe_const); > - sc = save_expr (sc); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type0)) == CODE_FOR_nothing) > + sc = save_expr (sc); > sc = convert (TREE_TYPE (type0), sc); > op1 = build_vector_from_val (type0, sc); > if (!maybe_const) > Index: gcc/gcc/cp/call.cc > =================================================================== > --- gcc.orig/gcc/cp/call.cc > +++ gcc/gcc/cp/call.cc > @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. > #include "langhooks.h" > #include "c-family/c-objc.h" > #include "internal-fn.h" > +#include "optabs-query.h" > #include "stringpool.h" > #include "attribs.h" > #include "gcc-rich-location.h" > @@ -5435,12 +5436,18 @@ build_conditional_expr (const op_locatio > return error_mark_node; > } > > + bool have_vec_duplicate > + = optab_handler (vec_duplicate_optab, > + TYPE_MODE (vtype)) != CODE_FOR_nothing; > + > arg2 = cp_convert (stype, arg2, complain); > - arg2 = save_expr (arg2); > + if (!have_vec_duplicate) > + arg2 = save_expr (arg2); > arg2 = build_vector_from_val (vtype, arg2); > arg2_type = vtype; > arg3 = cp_convert (stype, arg3, complain); > - arg3 = save_expr (arg3); > + if (!have_vec_duplicate) > + arg3 = save_expr (arg3); > arg3 = build_vector_from_val (vtype, arg3); > arg3_type = vtype; > } > @@ -5458,7 +5465,9 @@ build_conditional_expr (const op_locatio > return error_mark_node; > case stv_firstarg: > { > - arg2 = save_expr (arg2); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (arg3_type)) == CODE_FOR_nothing) > + arg2 = save_expr (arg2); > arg2 = convert (TREE_TYPE (arg3_type), arg2); > arg2 = build_vector_from_val (arg3_type, arg2); > arg2_type = TREE_TYPE (arg2); > @@ -5466,7 +5475,9 @@ build_conditional_expr (const op_locatio > } > case stv_secondarg: > { > - arg3 = save_expr (arg3); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (arg2_type)) == CODE_FOR_nothing) > + arg3 = save_expr (arg3); > arg3 = convert (TREE_TYPE (arg2_type), arg3); > arg3 = build_vector_from_val (arg2_type, arg3); > arg3_type = TREE_TYPE (arg3); > Index: gcc/gcc/cp/typeck.cc > =================================================================== > --- gcc.orig/gcc/cp/typeck.cc > +++ gcc/gcc/cp/typeck.cc > @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. > #include "c-family/c-objc.h" > #include "c-family/c-ubsan.h" > #include "gcc-rich-location.h" > +#include "optabs-query.h" > #include "stringpool.h" > #include "attribs.h" > #include "asan.h" > @@ -5066,7 +5067,9 @@ cp_build_binary_op (const op_location_t > case stv_firstarg: > { > op0 = convert (TREE_TYPE (type1), op0); > - op0 = save_expr (op0); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type1)) == CODE_FOR_nothing) > + op0 = save_expr (op0); > op0 = build_vector_from_val (type1, op0); > type0 = TREE_TYPE (op0); > code0 = TREE_CODE (type0); > @@ -5076,7 +5079,9 @@ cp_build_binary_op (const op_location_t > case stv_secondarg: > { > op1 = convert (TREE_TYPE (type0), op1); > - op1 = save_expr (op1); > + if (optab_handler (vec_duplicate_optab, > + TYPE_MODE (type0)) == CODE_FOR_nothing) > + op1 = save_expr (op1); > op1 = build_vector_from_val (type0, op1); > type1 = TREE_TYPE (op1); > code1 = TREE_CODE (type1);
On Thu, 27 Jan 2022, Richard Biener wrote: > > Index: gcc/gcc/c/c-typeck.cc > > =================================================================== > > --- gcc.orig/gcc/c/c-typeck.cc > > +++ gcc/gcc/c/c-typeck.cc > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > #include "gomp-constants.h" > > #include "spellcheck-tree.h" > > #include "gcc-rich-location.h" > > +#include "optabs-query.h" > > #include "stringpool.h" > > #include "attribs.h" > > #include "asan.h" > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > bool maybe_const = true; > > tree sc; > > sc = c_fully_fold (op0, false, &maybe_const); > > - sc = save_expr (sc); > > + if (optab_handler (vec_duplicate_optab, > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > + sc = save_expr (sc); > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > and thus should have been cleared during gimplification or in the end > ignored by RTL expansion. This is how the expression built here eventually looks in `store_constructor': (gdb) print exp $41 = <constructor 0x7ffff5c06ba0> (gdb) pt <constructor 0x7ffff5c06ba0 type <vector_type 0x7ffff5e7ea48 v4sf type <real_type 0x7ffff5cf1260 float sizes-gimplified SF size <integer_cst 0x7ffff5c00f78 constant 32> unit-size <integer_cst 0x7ffff5c00f90 constant 4> align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 pointer_to_this <pointer_type 0x7ffff5cf1848>> sizes-gimplified V4SF size <integer_cst 0x7ffff5c00d80 constant 128> unit-size <integer_cst 0x7ffff5c00d98 constant 16> align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> side-effects length:4 val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> visited var <parm_decl 0x7ffff5f00080 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> visited var <parm_decl 0x7ffff5f00080 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> visited var <parm_decl 0x7ffff5f00080 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> visited var <parm_decl 0x7ffff5f00080 y> def_stmt GIMPLE_NOP version:2>> (gdb) The `side-effects' flag prevents this conditional from executing: /* Try using vec_duplicate_optab for uniform vectors. */ if (!TREE_SIDE_EFFECTS (exp) && VECTOR_MODE_P (mode) && eltmode == GET_MODE_INNER (mode) && ((icode = optab_handler (vec_duplicate_optab, mode)) != CODE_FOR_nothing) && (elt = uniform_vector_p (exp)) && !VECTOR_TYPE_P (TREE_TYPE (elt))) { class expand_operand ops[2]; create_output_operand (&ops[0], target, mode); create_input_operand (&ops[1], expand_normal (elt), eltmode); expand_insn (icode, 2, ops); if (!rtx_equal_p (target, ops[0].value)) emit_move_insn (target, ops[0].value); break; } I don't know what's supposed to clear the flag (and what the purpose of setting it in the first place would be then). > You do not add a testcase so it's hard to see what goes wrong where exactly. This only happens with an out-of-tree microarchitecture and I was unable to find an in-tree target that ever uses the `vec_duplicateM' standard RTL pattern, so I wasn't able to produce a test case that would trigger with upstream code. Code is essentially like: typedef float v4sf __attribute__ ((vector_size (16))); v4sf odd_even (v4sf x, float y) { return x + f; } I considered the Aarch64 one the most likely candidate as it is the one commit be4c1d4a42c5 has been for, but as I noted in the description it does not appear to use it either. It uses `vec_initMN' instead and does the broadcast by hand in the backend. Maybe I'm missing something. Maciej
On Thu, Jan 27, 2022 at 2:59 PM Maciej W. Rozycki <macro@embecosm.com> wrote: > > On Thu, 27 Jan 2022, Richard Biener wrote: > > > > Index: gcc/gcc/c/c-typeck.cc > > > =================================================================== > > > --- gcc.orig/gcc/c/c-typeck.cc > > > +++ gcc/gcc/c/c-typeck.cc > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > > #include "gomp-constants.h" > > > #include "spellcheck-tree.h" > > > #include "gcc-rich-location.h" > > > +#include "optabs-query.h" > > > #include "stringpool.h" > > > #include "attribs.h" > > > #include "asan.h" > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > > bool maybe_const = true; > > > tree sc; > > > sc = c_fully_fold (op0, false, &maybe_const); > > > - sc = save_expr (sc); > > > + if (optab_handler (vec_duplicate_optab, > > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > > + sc = save_expr (sc); > > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > > and thus should have been cleared during gimplification or in the end > > ignored by RTL expansion. > > This is how the expression built here eventually looks in > `store_constructor': > > (gdb) print exp > $41 = <constructor 0x7ffff5c06ba0> > (gdb) pt > <constructor 0x7ffff5c06ba0 > type <vector_type 0x7ffff5e7ea48 v4sf > type <real_type 0x7ffff5cf1260 float sizes-gimplified SF > size <integer_cst 0x7ffff5c00f78 constant 32> > unit-size <integer_cst 0x7ffff5c00f90 constant 4> > align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 > pointer_to_this <pointer_type 0x7ffff5cf1848>> > sizes-gimplified V4SF > size <integer_cst 0x7ffff5c00d80 constant 128> > unit-size <integer_cst 0x7ffff5c00d98 constant 16> > align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> > side-effects length:4 > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > visited var <parm_decl 0x7ffff5f00080 y> > def_stmt GIMPLE_NOP > version:2> > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > visited var <parm_decl 0x7ffff5f00080 y> > def_stmt GIMPLE_NOP > version:2> > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > visited var <parm_decl 0x7ffff5f00080 y> > def_stmt GIMPLE_NOP > version:2> > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > visited var <parm_decl 0x7ffff5f00080 y> > def_stmt GIMPLE_NOP > version:2>> > (gdb) > > The `side-effects' flag prevents this conditional from executing: > > /* Try using vec_duplicate_optab for uniform vectors. */ > if (!TREE_SIDE_EFFECTS (exp) > && VECTOR_MODE_P (mode) > && eltmode == GET_MODE_INNER (mode) > && ((icode = optab_handler (vec_duplicate_optab, mode)) > != CODE_FOR_nothing) > && (elt = uniform_vector_p (exp)) > && !VECTOR_TYPE_P (TREE_TYPE (elt))) > { > class expand_operand ops[2]; > create_output_operand (&ops[0], target, mode); > create_input_operand (&ops[1], expand_normal (elt), eltmode); > expand_insn (icode, 2, ops); > if (!rtx_equal_p (target, ops[0].value)) > emit_move_insn (target, ops[0].value); > break; > } > > I don't know what's supposed to clear the flag (and what the purpose of > setting it in the first place would be then). It's probably safe to remove the !TREE_SIDE_EFFECTS check above but already gimplification should have made sure all side-effects are pushed to separate stmts. gimplifiation usually calls recompute_side_effects but that doesn't seem to touch CONSTRUCTORs. But I do remember fixing some spurious TREE_SIDE_EFFECTS on CTORs before. Might be worth verifying in verify_gimple_assign_single that CTORs do not have TREE_SIDE_EFFECTS set (unless this is a clobber). > > > You do not add a testcase so it's hard to see what goes wrong where exactly. > > This only happens with an out-of-tree microarchitecture and I was unable > to find an in-tree target that ever uses the `vec_duplicateM' standard RTL > pattern, so I wasn't able to produce a test case that would trigger with > upstream code. Code is essentially like: > > typedef float v4sf __attribute__ ((vector_size (16))); > > v4sf > odd_even (v4sf x, float y) > { > return x + f; > } > > I considered the Aarch64 one the most likely candidate as it is the one > commit be4c1d4a42c5 has been for, but as I noted in the description it > does not appear to use it either. It uses `vec_initMN' instead and does > the broadcast by hand in the backend. > > Maybe I'm missing something. > > Maciej
On Thu, 27 Jan 2022, Richard Biener wrote: > > > > Index: gcc/gcc/c/c-typeck.cc > > > > =================================================================== > > > > --- gcc.orig/gcc/c/c-typeck.cc > > > > +++ gcc/gcc/c/c-typeck.cc > > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > > > #include "gomp-constants.h" > > > > #include "spellcheck-tree.h" > > > > #include "gcc-rich-location.h" > > > > +#include "optabs-query.h" > > > > #include "stringpool.h" > > > > #include "attribs.h" > > > > #include "asan.h" > > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > > > bool maybe_const = true; > > > > tree sc; > > > > sc = c_fully_fold (op0, false, &maybe_const); > > > > - sc = save_expr (sc); > > > > + if (optab_handler (vec_duplicate_optab, > > > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > > > + sc = save_expr (sc); > > > > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > > > and thus should have been cleared during gimplification or in the end > > > ignored by RTL expansion. > > > > This is how the expression built here eventually looks in > > `store_constructor': > > > > (gdb) print exp > > $41 = <constructor 0x7ffff5c06ba0> > > (gdb) pt > > <constructor 0x7ffff5c06ba0 > > type <vector_type 0x7ffff5e7ea48 v4sf > > type <real_type 0x7ffff5cf1260 float sizes-gimplified SF > > size <integer_cst 0x7ffff5c00f78 constant 32> > > unit-size <integer_cst 0x7ffff5c00f90 constant 4> > > align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 > > pointer_to_this <pointer_type 0x7ffff5cf1848>> > > sizes-gimplified V4SF > > size <integer_cst 0x7ffff5c00d80 constant 128> > > unit-size <integer_cst 0x7ffff5c00d98 constant 16> > > align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> > > side-effects length:4 > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > visited var <parm_decl 0x7ffff5f00080 y> > > def_stmt GIMPLE_NOP > > version:2> > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > visited var <parm_decl 0x7ffff5f00080 y> > > def_stmt GIMPLE_NOP > > version:2> > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > visited var <parm_decl 0x7ffff5f00080 y> > > def_stmt GIMPLE_NOP > > version:2> > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > visited var <parm_decl 0x7ffff5f00080 y> > > def_stmt GIMPLE_NOP > > version:2>> > > (gdb) > > > > The `side-effects' flag prevents this conditional from executing: > > > > /* Try using vec_duplicate_optab for uniform vectors. */ > > if (!TREE_SIDE_EFFECTS (exp) > > && VECTOR_MODE_P (mode) > > && eltmode == GET_MODE_INNER (mode) > > && ((icode = optab_handler (vec_duplicate_optab, mode)) > > != CODE_FOR_nothing) > > && (elt = uniform_vector_p (exp)) > > && !VECTOR_TYPE_P (TREE_TYPE (elt))) > > { > > class expand_operand ops[2]; > > create_output_operand (&ops[0], target, mode); > > create_input_operand (&ops[1], expand_normal (elt), eltmode); > > expand_insn (icode, 2, ops); > > if (!rtx_equal_p (target, ops[0].value)) > > emit_move_insn (target, ops[0].value); > > break; > > } > > > > I don't know what's supposed to clear the flag (and what the purpose of > > setting it in the first place would be then). > > It's probably safe to remove the !TREE_SIDE_EFFECTS check above > but already gimplification should have made sure all side-effects are > pushed to separate stmts. gimplifiation usually calls recompute_side_effects > but that doesn't seem to touch CONSTRUCTORs. But I do remember fixing > some spurious TREE_SIDE_EFFECTS on CTORs before. > > Might be worth verifying in verify_gimple_assign_single that CTORs > do not have TREE_SIDE_EFFECTS set (unless this is a clobber). OK, so maybe there's another bug somewhere that causes the side-effects flag not to be cleared where expected, however I an inconvinced as to withdrawing my original point. That is why treat code like: v4sf odd_even (v4sf x, float y) { return x + f; } effectively like: v4sf odd_even (v4sf x, volatile float y) { return x + f; } which I infer from the terse justification in the discussions referred is the sole purpose of making use of `save_expr' here, also for targets that have a cheap (or free if combined with another operation) `vec_duplicateM' machine operation? While removing the !TREE_SIDE_EFFECTS check may cause `vec_duplicateM' to be used, the middle end will still ensure the element broadcast operation won't be repeated, e.g. at the cost of consuming a temporary register to carry a vector of identical elements, where it may not be the least costly approach. Where we have an actual `vec_duplicateM' insn we can use its cost to determine the best approach, can't we? Am I still missing something? Maciej
On Thu, Jan 27, 2022 at 8:14 PM Maciej W. Rozycki <macro@embecosm.com> wrote: > > On Thu, 27 Jan 2022, Richard Biener wrote: > > > > > > Index: gcc/gcc/c/c-typeck.cc > > > > > =================================================================== > > > > > --- gcc.orig/gcc/c/c-typeck.cc > > > > > +++ gcc/gcc/c/c-typeck.cc > > > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > > > > #include "gomp-constants.h" > > > > > #include "spellcheck-tree.h" > > > > > #include "gcc-rich-location.h" > > > > > +#include "optabs-query.h" > > > > > #include "stringpool.h" > > > > > #include "attribs.h" > > > > > #include "asan.h" > > > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > > > > bool maybe_const = true; > > > > > tree sc; > > > > > sc = c_fully_fold (op0, false, &maybe_const); > > > > > - sc = save_expr (sc); > > > > > + if (optab_handler (vec_duplicate_optab, > > > > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > > > > + sc = save_expr (sc); > > > > > > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > > > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > > > > and thus should have been cleared during gimplification or in the end > > > > ignored by RTL expansion. > > > > > > This is how the expression built here eventually looks in > > > `store_constructor': > > > > > > (gdb) print exp > > > $41 = <constructor 0x7ffff5c06ba0> > > > (gdb) pt > > > <constructor 0x7ffff5c06ba0 > > > type <vector_type 0x7ffff5e7ea48 v4sf > > > type <real_type 0x7ffff5cf1260 float sizes-gimplified SF > > > size <integer_cst 0x7ffff5c00f78 constant 32> > > > unit-size <integer_cst 0x7ffff5c00f90 constant 4> > > > align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 > > > pointer_to_this <pointer_type 0x7ffff5cf1848>> > > > sizes-gimplified V4SF > > > size <integer_cst 0x7ffff5c00d80 constant 128> > > > unit-size <integer_cst 0x7ffff5c00d98 constant 16> > > > align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> > > > side-effects length:4 > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > visited var <parm_decl 0x7ffff5f00080 y> > > > def_stmt GIMPLE_NOP > > > version:2> > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > visited var <parm_decl 0x7ffff5f00080 y> > > > def_stmt GIMPLE_NOP > > > version:2> > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > visited var <parm_decl 0x7ffff5f00080 y> > > > def_stmt GIMPLE_NOP > > > version:2> > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > visited var <parm_decl 0x7ffff5f00080 y> > > > def_stmt GIMPLE_NOP > > > version:2>> > > > (gdb) > > > > > > The `side-effects' flag prevents this conditional from executing: > > > > > > /* Try using vec_duplicate_optab for uniform vectors. */ > > > if (!TREE_SIDE_EFFECTS (exp) > > > && VECTOR_MODE_P (mode) > > > && eltmode == GET_MODE_INNER (mode) > > > && ((icode = optab_handler (vec_duplicate_optab, mode)) > > > != CODE_FOR_nothing) > > > && (elt = uniform_vector_p (exp)) > > > && !VECTOR_TYPE_P (TREE_TYPE (elt))) > > > { > > > class expand_operand ops[2]; > > > create_output_operand (&ops[0], target, mode); > > > create_input_operand (&ops[1], expand_normal (elt), eltmode); > > > expand_insn (icode, 2, ops); > > > if (!rtx_equal_p (target, ops[0].value)) > > > emit_move_insn (target, ops[0].value); > > > break; > > > } > > > > > > I don't know what's supposed to clear the flag (and what the purpose of > > > setting it in the first place would be then). > > > > It's probably safe to remove the !TREE_SIDE_EFFECTS check above > > but already gimplification should have made sure all side-effects are > > pushed to separate stmts. gimplifiation usually calls recompute_side_effects > > but that doesn't seem to touch CONSTRUCTORs. But I do remember fixing > > some spurious TREE_SIDE_EFFECTS on CTORs before. > > > > Might be worth verifying in verify_gimple_assign_single that CTORs > > do not have TREE_SIDE_EFFECTS set (unless this is a clobber). > > OK, so maybe there's another bug somewhere that causes the side-effects > flag not to be cleared where expected, however I an inconvinced as to > withdrawing my original point. That is why treat code like: > > v4sf > odd_even (v4sf x, float y) > { > return x + f; > } > > effectively like: > > v4sf > odd_even (v4sf x, volatile float y) > { > return x + f; > } that's not what it does. It treats it like float tem = f; return x + { tem, tem, tem, tem }; avoiding, like for x + (1.0f + f) creating return x + { 1.0+f, 1.0+f, 1.0+f ...} it's more CSE than volatile qualifying. > which I infer from the terse justification in the discussions referred is > the sole purpose of making use of `save_expr' here, also for targets that > have a cheap (or free if combined with another operation) `vec_duplicateM' > machine operation? Because the IL from the frontends should not depend on target capabilities and whether we have to preserve side-effects properly doesn't depend on the cheapness of the operation itself. Consider return x + bar (f); you definitely want bar(f) to be only evaluated once, even when the target can cheaply do the splat. Richard. > While removing the !TREE_SIDE_EFFECTS check may cause `vec_duplicateM' to > be used, the middle end will still ensure the element broadcast operation > won't be repeated, e.g. at the cost of consuming a temporary register to > carry a vector of identical elements, where it may not be the least costly > approach. Where we have an actual `vec_duplicateM' insn we can use its > cost to determine the best approach, can't we? > > Am I still missing something? > > Maciej
On Fri, Jan 28, 2022 at 1:22 PM Richard Biener <richard.guenther@gmail.com> wrote: > > On Thu, Jan 27, 2022 at 8:14 PM Maciej W. Rozycki <macro@embecosm.com> wrote: > > > > On Thu, 27 Jan 2022, Richard Biener wrote: > > > > > > > > Index: gcc/gcc/c/c-typeck.cc > > > > > > =================================================================== > > > > > > --- gcc.orig/gcc/c/c-typeck.cc > > > > > > +++ gcc/gcc/c/c-typeck.cc > > > > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. > > > > > > #include "gomp-constants.h" > > > > > > #include "spellcheck-tree.h" > > > > > > #include "gcc-rich-location.h" > > > > > > +#include "optabs-query.h" > > > > > > #include "stringpool.h" > > > > > > #include "attribs.h" > > > > > > #include "asan.h" > > > > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en > > > > > > bool maybe_const = true; > > > > > > tree sc; > > > > > > sc = c_fully_fold (op0, false, &maybe_const); > > > > > > - sc = save_expr (sc); > > > > > > + if (optab_handler (vec_duplicate_optab, > > > > > > + TYPE_MODE (type1)) == CODE_FOR_nothing) > > > > > > + sc = save_expr (sc); > > > > > > > > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains > > > > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE > > > > > and thus should have been cleared during gimplification or in the end > > > > > ignored by RTL expansion. > > > > > > > > This is how the expression built here eventually looks in > > > > `store_constructor': > > > > > > > > (gdb) print exp > > > > $41 = <constructor 0x7ffff5c06ba0> > > > > (gdb) pt > > > > <constructor 0x7ffff5c06ba0 > > > > type <vector_type 0x7ffff5e7ea48 v4sf > > > > type <real_type 0x7ffff5cf1260 float sizes-gimplified SF > > > > size <integer_cst 0x7ffff5c00f78 constant 32> > > > > unit-size <integer_cst 0x7ffff5c00f90 constant 4> > > > > align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32 > > > > pointer_to_this <pointer_type 0x7ffff5cf1848>> > > > > sizes-gimplified V4SF > > > > size <integer_cst 0x7ffff5c00d80 constant 128> > > > > unit-size <integer_cst 0x7ffff5c00d98 constant 16> > > > > align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>> > > > > side-effects length:4 > > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > > visited var <parm_decl 0x7ffff5f00080 y> > > > > def_stmt GIMPLE_NOP > > > > version:2> > > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > > visited var <parm_decl 0x7ffff5f00080 y> > > > > def_stmt GIMPLE_NOP > > > > version:2> > > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > > visited var <parm_decl 0x7ffff5f00080 y> > > > > def_stmt GIMPLE_NOP > > > > version:2> > > > > val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float> > > > > visited var <parm_decl 0x7ffff5f00080 y> > > > > def_stmt GIMPLE_NOP > > > > version:2>> > > > > (gdb) > > > > > > > > The `side-effects' flag prevents this conditional from executing: > > > > > > > > /* Try using vec_duplicate_optab for uniform vectors. */ > > > > if (!TREE_SIDE_EFFECTS (exp) > > > > && VECTOR_MODE_P (mode) > > > > && eltmode == GET_MODE_INNER (mode) > > > > && ((icode = optab_handler (vec_duplicate_optab, mode)) > > > > != CODE_FOR_nothing) > > > > && (elt = uniform_vector_p (exp)) > > > > && !VECTOR_TYPE_P (TREE_TYPE (elt))) > > > > { > > > > class expand_operand ops[2]; > > > > create_output_operand (&ops[0], target, mode); > > > > create_input_operand (&ops[1], expand_normal (elt), eltmode); > > > > expand_insn (icode, 2, ops); > > > > if (!rtx_equal_p (target, ops[0].value)) > > > > emit_move_insn (target, ops[0].value); > > > > break; > > > > } > > > > > > > > I don't know what's supposed to clear the flag (and what the purpose of > > > > setting it in the first place would be then). > > > > > > It's probably safe to remove the !TREE_SIDE_EFFECTS check above > > > but already gimplification should have made sure all side-effects are > > > pushed to separate stmts. gimplifiation usually calls recompute_side_effects > > > but that doesn't seem to touch CONSTRUCTORs. But I do remember fixing > > > some spurious TREE_SIDE_EFFECTS on CTORs before. > > > > > > Might be worth verifying in verify_gimple_assign_single that CTORs > > > do not have TREE_SIDE_EFFECTS set (unless this is a clobber). > > > > OK, so maybe there's another bug somewhere that causes the side-effects > > flag not to be cleared where expected, however I an inconvinced as to > > withdrawing my original point. That is why treat code like: > > > > v4sf > > odd_even (v4sf x, float y) > > { > > return x + f; > > } > > > > effectively like: > > > > v4sf > > odd_even (v4sf x, volatile float y) > > { > > return x + f; > > } > > that's not what it does. It treats it like > > float tem = f; > return x + { tem, tem, tem, tem }; > > avoiding, like for x + (1.0f + f) creating > > return x + { 1.0+f, 1.0+f, 1.0+f ...} > > it's more CSE than volatile qualifying. > > > which I infer from the terse justification in the discussions referred is > > the sole purpose of making use of `save_expr' here, also for targets that > > have a cheap (or free if combined with another operation) `vec_duplicateM' > > machine operation? > > Because the IL from the frontends should not depend on target capabilities > and whether we have to preserve side-effects properly doesn't depend on > the cheapness of the operation itself. Consider > > return x + bar (f); > > you definitely want bar(f) to be only evaluated once, even when the > target can cheaply do the splat. Btw, diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index efd10332c53..c0f7d98931d 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -4703,6 +4703,12 @@ verify_gimple_assign_single (gassign *stmt) debug_generic_stmt (rhs1); return true; } + if (TREE_SIDE_EFFECTS (rhs1) && !gimple_clobber_p (stmt)) + { + error ("%qs with side-effects", code_name); + debug_generic_stmt (rhs1); + return true; + } return res; case ASSERT_EXPR: does not cause ICEs on the two testcases (on trunk). Richard. > > Richard. > > > While removing the !TREE_SIDE_EFFECTS check may cause `vec_duplicateM' to > > be used, the middle end will still ensure the element broadcast operation > > won't be repeated, e.g. at the cost of consuming a temporary register to > > carry a vector of identical elements, where it may not be the least costly > > approach. Where we have an actual `vec_duplicateM' insn we can use its > > cost to determine the best approach, can't we? > > > > Am I still missing something? > > > > Maciej
On Fri, 28 Jan 2022, Richard Biener wrote: > > that's not what it does. It treats it like > > > > float tem = f; > > return x + { tem, tem, tem, tem }; > > > > avoiding, like for x + (1.0f + f) creating > > > > return x + { 1.0+f, 1.0+f, 1.0+f ...} > > > > it's more CSE than volatile qualifying. I see, thanks for your time to explain me. I got this confused. > > Because the IL from the frontends should not depend on target capabilities > > and whether we have to preserve side-effects properly doesn't depend on > > the cheapness of the operation itself. Consider > > > > return x + bar (f); > > > > you definitely want bar(f) to be only evaluated once, even when the > > target can cheaply do the splat. Indeed. > Btw, > > diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc > index efd10332c53..c0f7d98931d 100644 > --- a/gcc/tree-cfg.cc > +++ b/gcc/tree-cfg.cc > @@ -4703,6 +4703,12 @@ verify_gimple_assign_single (gassign *stmt) > debug_generic_stmt (rhs1); > return true; > } > + if (TREE_SIDE_EFFECTS (rhs1) && !gimple_clobber_p (stmt)) > + { > + error ("%qs with side-effects", code_name); > + debug_generic_stmt (rhs1); > + return true; > + } > return res; > > case ASSERT_EXPR: > > does not cause ICEs on the two testcases (on trunk). Right, so it has turned out I had the wrong binary run under GDB, sigh. I have re-verified the current trunk and indeed the side-effect annotation has gone: (gdb) frame #0 store_constructor (exp=<constructor 0x7ffff5b36e88>, target=0x7ffff5b388e0, cleared=0, size=..., reverse=false) at .../gcc/expr.cc:7169 7169 if (!TREE_SIDE_EFFECTS (exp) (gdb) pt exp <constructor 0x7ffff5b36e88 type <vector_type 0x7ffff5e23678 v4sf type <real_type 0x7ffff5c41260 float sizes-gimplified SF size <integer_cst 0x7ffff5b31050 constant 32> unit-size <integer_cst 0x7ffff5b31068 constant 4> align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5c41260 precision:32 pointer_to_this <pointer_type 0x7ffff5c41848>> sizes-gimplified V4SF size <integer_cst 0x7ffff5b30e58 constant 128> unit-size <integer_cst 0x7ffff5b30e70 constant 16> align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5c6d740 nunits:4 context <translation_unit_decl 0x7ffff5e908e8 v4sf.c>> length:4 val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float> visited var <parm_decl 0x7ffff7f80100 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float> visited var <parm_decl 0x7ffff7f80100 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float> visited var <parm_decl 0x7ffff7f80100 y> def_stmt GIMPLE_NOP version:2> val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float> visited var <parm_decl 0x7ffff7f80100 y> def_stmt GIMPLE_NOP version:2>> (gdb) and I have tracked down commit 512429a885e8 ("tree-optimization/99863 - clear vector CTOR TREE_SIDE_EFFECTS") of yours to be the change required. Thank you for your assistance! Maciej
Index: gcc/gcc/c/c-typeck.cc =================================================================== --- gcc.orig/gcc/c/c-typeck.cc +++ gcc/gcc/c/c-typeck.cc @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. #include "gomp-constants.h" #include "spellcheck-tree.h" #include "gcc-rich-location.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "asan.h" @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en bool maybe_const = true; tree sc; sc = c_fully_fold (op0, false, &maybe_const); - sc = save_expr (sc); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type1)) == CODE_FOR_nothing) + sc = save_expr (sc); sc = convert (TREE_TYPE (type1), sc); op0 = build_vector_from_val (type1, sc); if (!maybe_const) @@ -11938,7 +11941,9 @@ build_binary_op (location_t location, en bool maybe_const = true; tree sc; sc = c_fully_fold (op1, false, &maybe_const); - sc = save_expr (sc); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type0)) == CODE_FOR_nothing) + sc = save_expr (sc); sc = convert (TREE_TYPE (type0), sc); op1 = build_vector_from_val (type0, sc); if (!maybe_const) Index: gcc/gcc/cp/call.cc =================================================================== --- gcc.orig/gcc/cp/call.cc +++ gcc/gcc/cp/call.cc @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. #include "langhooks.h" #include "c-family/c-objc.h" #include "internal-fn.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "gcc-rich-location.h" @@ -5435,12 +5436,18 @@ build_conditional_expr (const op_locatio return error_mark_node; } + bool have_vec_duplicate + = optab_handler (vec_duplicate_optab, + TYPE_MODE (vtype)) != CODE_FOR_nothing; + arg2 = cp_convert (stype, arg2, complain); - arg2 = save_expr (arg2); + if (!have_vec_duplicate) + arg2 = save_expr (arg2); arg2 = build_vector_from_val (vtype, arg2); arg2_type = vtype; arg3 = cp_convert (stype, arg3, complain); - arg3 = save_expr (arg3); + if (!have_vec_duplicate) + arg3 = save_expr (arg3); arg3 = build_vector_from_val (vtype, arg3); arg3_type = vtype; } @@ -5458,7 +5465,9 @@ build_conditional_expr (const op_locatio return error_mark_node; case stv_firstarg: { - arg2 = save_expr (arg2); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (arg3_type)) == CODE_FOR_nothing) + arg2 = save_expr (arg2); arg2 = convert (TREE_TYPE (arg3_type), arg2); arg2 = build_vector_from_val (arg3_type, arg2); arg2_type = TREE_TYPE (arg2); @@ -5466,7 +5475,9 @@ build_conditional_expr (const op_locatio } case stv_secondarg: { - arg3 = save_expr (arg3); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (arg2_type)) == CODE_FOR_nothing) + arg3 = save_expr (arg3); arg3 = convert (TREE_TYPE (arg2_type), arg3); arg3 = build_vector_from_val (arg2_type, arg3); arg3_type = TREE_TYPE (arg3); Index: gcc/gcc/cp/typeck.cc =================================================================== --- gcc.orig/gcc/cp/typeck.cc +++ gcc/gcc/cp/typeck.cc @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. #include "c-family/c-objc.h" #include "c-family/c-ubsan.h" #include "gcc-rich-location.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "asan.h" @@ -5066,7 +5067,9 @@ cp_build_binary_op (const op_location_t case stv_firstarg: { op0 = convert (TREE_TYPE (type1), op0); - op0 = save_expr (op0); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type1)) == CODE_FOR_nothing) + op0 = save_expr (op0); op0 = build_vector_from_val (type1, op0); type0 = TREE_TYPE (op0); code0 = TREE_CODE (type0); @@ -5076,7 +5079,9 @@ cp_build_binary_op (const op_location_t case stv_secondarg: { op1 = convert (TREE_TYPE (type0), op1); - op1 = save_expr (op1); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type0)) == CODE_FOR_nothing) + op1 = save_expr (op1); op1 = build_vector_from_val (type0, op1); type1 = TREE_TYPE (op1); code1 = TREE_CODE (type1);