From patchwork Thu Jan 27 12:16:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej W. Rozycki" X-Patchwork-Id: 50487 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 297453947C23 for ; Thu, 27 Jan 2022 12:16:49 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-lj1-x231.google.com (mail-lj1-x231.google.com [IPv6:2a00:1450:4864:20::231]) by sourceware.org (Postfix) with ESMTPS id 1D54E385AC28 for ; Thu, 27 Jan 2022 12:16:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1D54E385AC28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com Received: by mail-lj1-x231.google.com with SMTP id z20so4017423ljo.6 for ; Thu, 27 Jan 2022 04:16:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; h=date:from:to:cc:subject:message-id:user-agent:mime-version; bh=qK3d95q8GBr20r1OhAgUSb0dPeaNywHA5an923lVrO0=; b=T85ldVSqo+XVf7VlwF8zajbARzmE8Zlg3i1cn2N4JWxoMa+TyxJWb+AJrnWahp2hYO 3//pTPNHe+hoEWxQp0fs6m/1cWdn615M0P4M1CdV3/RKab1i3BbPDrPeCA7itZAHZY+l NnabREQE6HYObwBa0MbMewjFtvxgegVk/UAol/IfXXDnzGP3x4Zr2rKGGReU47b79y55 n9Qfvj6Fk4pgckxX84pgj+YbtAJBa56QBbuBvGdlXbZk/e4HqLbmpO8dN0vSenZ7wkxK KrAgM0pGeDHmmNi9iJ0pNOSOVFcVonuO5CIic10xhYBSYsGxd8aAhyNtFz1t6bxPOMyQ s9tA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:user-agent :mime-version; bh=qK3d95q8GBr20r1OhAgUSb0dPeaNywHA5an923lVrO0=; b=mKpftcDP9/EFH/hN0CPyQvOS6Kc4DW6jV26vpVoNoU4ZmnPbsoV/NZvhu7KshW8r+d gimCcDFsYFTrFGkjyGqJeX9gkofJKnsnfIN28rLRyFIDvMS8H7c2h9eHRNwXxfNO6etJ csWfheY4u2GDV+KKefADTS/fQznt8I4onweFL0LcVK/yrTTgHFG1t5Dh2yuX6wv3IglI m1rEoU9Vt6FefOg/KWhvpkfkcXQttEeK3BKLyzYare904nxfxL2sg1jNSG6nS1hDYt/d tvEwTMbHsO1t4NsC/t/pt/oBCPbFnsUflGcpuCNIB1XMqWZcWLFoumv+RVD/Pt68EziT lY3g== X-Gm-Message-State: AOAM531gmgnhL4L+TMHwsGTlFqWxTnpKbtQjGY3QzUadt8S4V5whE0pe u70Xlg0ZG+Gn/SMJ/dg+xVh4hBNWq3ZFjA== X-Google-Smtp-Source: ABdhPJyn7RdIoBu/G9CUSarLXBTJQulTMOqA/ereV3JYAGxhDraXVSd3yivnejrzGM6XoiiGHSu20w== X-Received: by 2002:a05:651c:1607:: with SMTP id f7mr2599911ljq.367.1643285790741; Thu, 27 Jan 2022 04:16:30 -0800 (PST) Received: from [192.168.219.3] ([78.8.192.131]) by smtp.gmail.com with ESMTPSA id r17sm1807700lfg.237.2022.01.27.04.16.29 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 27 Jan 2022 04:16:30 -0800 (PST) Date: Thu, 27 Jan 2022 12:16:27 +0000 (GMT) From: "Maciej W. Rozycki" To: gcc-patches@gcc.gnu.org Subject: [PATCH][GCC13] Don't force side effects for hardware vector element broadcast Message-ID: User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Richard Sandiford , Marc Glisse Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Do not mark vector element broadcast resulting from OpenCL operations as having side effects for targets that have a suitable hardware operation, so that the `vec_duplicateM' standard RTL pattern can be directly used for them. This does not happen currently, because any RTL pattern named `vec_duplicateM' cannot express the side effects presumed, and therefore a `!TREE_SIDE_EFFECTS' condition placed in `store_constructor' correctly prevents any `vec_duplicateM' pattern from being used. This side-effect annotation was introduced for the C frontend with: , and then: ("Scalar vector binary operation"), and finally landed with commit 0e3a99ae913c ("c-typeck.c (scalar_to_vector): New function."), along with original support for scalar-to-vector OpenCL operations. Support for these operations was then gradually added to C++ with: , and then: ("[C++] Mixed scalar-vector operations"), landing with commit a212e43fca22 ("re PR c++/54427 (Expose more vector extensions)"), followed by: , and then: ("[C++] Handle ?: for vectors"), landing with commit 93100c6b5b45 ("re PR c++/54427 (Expose more vector extensions)"). And side-effect annotation for C++ was retrofitted, again gradually, with: ("[C++] Missing save_expr in vector-scalar ops"), landing with commit 6698175d1591 ("typeck.c (cp_build_binary_op): Call save_expr before build_vector_from_val."), and then: ("vector conditional expression with scalar arguments"), landing with commit 07298ffd6f7c ("call.c (build_conditional_expr_1): Handle the case with 1 vector and 2 scalars.") This was all long before we gained support for the `vec_duplicateM' standard RTL pattern, with commit be4c1d4a42c5 ("Add VEC_DUPLICATE_EXPR and associated optab") and it may have had sense for where the element broadcast operation was always open-coded. However where provided by hardware the element broadcast operation is just like any other vector operation, presumably cheap, and whether to repeat it or not should be based on the cost of the operation and not hardwired. So there is no point in artificially marking the operation as having a side effect for the lone purpose of preventing repetition. This does not currently prevent targets like the Aarch64 and MIPS ones from using the hardware element broadcast operation, but it requires the backend to wire it explicitly via the `vec_initMN' standard RTL pattern rather than relying on the middle end to pick it up via the `vec_duplicateM' pattern automatically. This change lifts this limitation. gcc/ * c/c-typeck.cc (build_binary_op) Do not annotate vector element broadcast operations as having a side effect if there is such an operation provided by the backend. * cp/call.cc (build_conditional_expr): Likewise. * cp/typeck.cc (cp_build_binary_op): Likewise. --- Hi, I've come across this while working on an out-of-tree microarchitecture, which does not require a `vec_initMN' RTL pattern, however has hardware support suitable for cheap `vec_duplicateM'. I could not persuade the middle end to make use of the new `vec_duplicateM' pattern however until I removed the side-effect anbnotation. So I think this is a move in the right direction even if we do not have support for said microarchitecture at the moment. For some reason I could not persuade our current Aarch64 code to use the `vec_duplicateM' pattern directly either. This is because at the time `store_constructor' is called the vector mode for the element broadcast operation is say `V4SFmode' (for a vector of floats), however there is no corresponding `vec_duplicatev4sf' pattern. Code in the `vec_initv4sfsf' pattern does the right thing though, referring to `aarch64_simd_dupv4sf' by hand instead. There is a `vec_duplicatevnx4sf' pattern corresponding to `VNx4SFmode', but I was unable to figure out how to activate it. This change has passed regression-testing with the `aarch64-linux-gnu' target (with `-march=armv9-a') and QEMU in the user emulation mode across all the GCC languages and libraries. NB there are indentation issues with code surrounding places I have modified, but I have chosen not to address them so as not to obfuscate this change. Likewise there are unnecessary compound statements used in the switch statements in the C++ frontend parts modified. Any questions or comments? Otherwise OK once GCC 13 has opened? Maciej --- gcc/c/c-typeck.cc | 9 +++++++-- gcc/cp/call.cc | 19 +++++++++++++++---- gcc/cp/typeck.cc | 9 +++++++-- 3 files changed, 29 insertions(+), 8 deletions(-) gcc-scalar-to-vector-no-side-effect.diff Index: gcc/gcc/c/c-typeck.cc =================================================================== --- gcc.orig/gcc/c/c-typeck.cc +++ gcc/gcc/c/c-typeck.cc @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. #include "gomp-constants.h" #include "spellcheck-tree.h" #include "gcc-rich-location.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "asan.h" @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en bool maybe_const = true; tree sc; sc = c_fully_fold (op0, false, &maybe_const); - sc = save_expr (sc); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type1)) == CODE_FOR_nothing) + sc = save_expr (sc); sc = convert (TREE_TYPE (type1), sc); op0 = build_vector_from_val (type1, sc); if (!maybe_const) @@ -11938,7 +11941,9 @@ build_binary_op (location_t location, en bool maybe_const = true; tree sc; sc = c_fully_fold (op1, false, &maybe_const); - sc = save_expr (sc); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type0)) == CODE_FOR_nothing) + sc = save_expr (sc); sc = convert (TREE_TYPE (type0), sc); op1 = build_vector_from_val (type0, sc); if (!maybe_const) Index: gcc/gcc/cp/call.cc =================================================================== --- gcc.orig/gcc/cp/call.cc +++ gcc/gcc/cp/call.cc @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. #include "langhooks.h" #include "c-family/c-objc.h" #include "internal-fn.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "gcc-rich-location.h" @@ -5435,12 +5436,18 @@ build_conditional_expr (const op_locatio return error_mark_node; } + bool have_vec_duplicate + = optab_handler (vec_duplicate_optab, + TYPE_MODE (vtype)) != CODE_FOR_nothing; + arg2 = cp_convert (stype, arg2, complain); - arg2 = save_expr (arg2); + if (!have_vec_duplicate) + arg2 = save_expr (arg2); arg2 = build_vector_from_val (vtype, arg2); arg2_type = vtype; arg3 = cp_convert (stype, arg3, complain); - arg3 = save_expr (arg3); + if (!have_vec_duplicate) + arg3 = save_expr (arg3); arg3 = build_vector_from_val (vtype, arg3); arg3_type = vtype; } @@ -5458,7 +5465,9 @@ build_conditional_expr (const op_locatio return error_mark_node; case stv_firstarg: { - arg2 = save_expr (arg2); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (arg3_type)) == CODE_FOR_nothing) + arg2 = save_expr (arg2); arg2 = convert (TREE_TYPE (arg3_type), arg2); arg2 = build_vector_from_val (arg3_type, arg2); arg2_type = TREE_TYPE (arg2); @@ -5466,7 +5475,9 @@ build_conditional_expr (const op_locatio } case stv_secondarg: { - arg3 = save_expr (arg3); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (arg2_type)) == CODE_FOR_nothing) + arg3 = save_expr (arg3); arg3 = convert (TREE_TYPE (arg2_type), arg3); arg3 = build_vector_from_val (arg2_type, arg3); arg3_type = TREE_TYPE (arg3); Index: gcc/gcc/cp/typeck.cc =================================================================== --- gcc.orig/gcc/cp/typeck.cc +++ gcc/gcc/cp/typeck.cc @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. #include "c-family/c-objc.h" #include "c-family/c-ubsan.h" #include "gcc-rich-location.h" +#include "optabs-query.h" #include "stringpool.h" #include "attribs.h" #include "asan.h" @@ -5066,7 +5067,9 @@ cp_build_binary_op (const op_location_t case stv_firstarg: { op0 = convert (TREE_TYPE (type1), op0); - op0 = save_expr (op0); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type1)) == CODE_FOR_nothing) + op0 = save_expr (op0); op0 = build_vector_from_val (type1, op0); type0 = TREE_TYPE (op0); code0 = TREE_CODE (type0); @@ -5076,7 +5079,9 @@ cp_build_binary_op (const op_location_t case stv_secondarg: { op1 = convert (TREE_TYPE (type0), op1); - op1 = save_expr (op1); + if (optab_handler (vec_duplicate_optab, + TYPE_MODE (type0)) == CODE_FOR_nothing) + op1 = save_expr (op1); op1 = build_vector_from_val (type0, op1); type1 = TREE_TYPE (op1); code1 = TREE_CODE (type1);