From patchwork Tue Aug 9 13:23:48 2022
From: Andrew Stubbs
Subject: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors
Date: Tue, 9 Aug 2022 14:23:48 +0100

The vecsize_int/vecsize_float fields assume that all arguments use the same
bitsize and that the number of lanes varies according to the element size,
but this is inappropriate on targets where the number of lanes is fixed and
the bitsize varies (i.e. amdgcn).

With this change the vecsize can be left zero, in which case the
vectorization factor will be the same for all types.

gcc/ChangeLog:

	* doc/tm.texi: Regenerate.
	* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
	vecsize.
	(simd_clone_adjust_argument_types): Likewise.
	* target.def (compute_vecsize_and_simdlen): Document the new
	vecsize_int and vecsize_float semantics.
---
 gcc/doc/tm.texi       |  3 +++
 gcc/omp-simd-clone.cc | 20 +++++++++++++++-----
 gcc/target.def        |  3 +++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 92bda1a7e14..c3001c6ded9 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6253,6 +6253,9 @@ stores.
This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float} fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also @var{simdlen} field if it was previously 0. +@var{vecsize_mangle} is a marker for the backend only. @var{vecsize_int} and +@var{vecsize_float} should be left zero on targets where the number of lanes is +not determined by the bitsize (in which case @var{simdlen} is always used). The hook should return 0 if SIMD clones shouldn't be emitted, or number of @var{vecsize_mangle} variants that should be emitted. @end deftypefn diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc index 58bd68b129b..258d3c6377f 100644 --- a/gcc/omp-simd-clone.cc +++ b/gcc/omp-simd-clone.cc @@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node) veclen = node->simdclone->vecsize_int; else veclen = node->simdclone->vecsize_float; - veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t))); + if (known_eq (veclen, 0)) + veclen = node->simdclone->simdlen; + else + veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t))); if (multiple_p (veclen, node->simdclone->simdlen)) veclen = node->simdclone->simdlen; if (POINTER_TYPE_P (t)) @@ -618,8 +621,12 @@ simd_clone_adjust_argument_types (struct cgraph_node *node) veclen = sc->vecsize_int; else veclen = sc->vecsize_float; - veclen = exact_div (veclen, - GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type))); + if (known_eq (veclen, 0)) + veclen = sc->simdlen; + else + veclen = exact_div (veclen, + GET_MODE_BITSIZE + (SCALAR_TYPE_MODE (parm_type))); if (multiple_p (veclen, sc->simdlen)) veclen = sc->simdlen; adj.op = IPA_PARAM_OP_NEW; @@ -669,8 +676,11 @@ simd_clone_adjust_argument_types (struct cgraph_node *node) veclen = sc->vecsize_int; else veclen = sc->vecsize_float; - veclen = exact_div (veclen, - GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type))); + if (known_eq (veclen, 0)) + veclen = sc->simdlen; + else + veclen = exact_div (veclen, + GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type))); if (multiple_p (veclen, sc->simdlen)) veclen = sc->simdlen; if (sc->mask_mode != VOIDmode) diff --git a/gcc/target.def b/gcc/target.def index 2a7fa68f83d..4d49ffc2c88 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1629,6 +1629,9 @@ DEFHOOK "This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}\n\ fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also\n\ @var{simdlen} field if it was previously 0.\n\ +@var{vecsize_mangle} is a marker for the backend only. 
@var{vecsize_int} and\n\
+@var{vecsize_float} should be left zero on targets where the number of lanes is\n\
+not determined by the bitsize (in which case @var{simdlen} is always used).\n\
 The hook should return 0 if SIMD clones shouldn't be emitted,\n\
 or number of @var{vecsize_mangle} variants that should be emitted.",
 int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)

From patchwork Tue Aug 9 13:23:49 2022
From: Andrew Stubbs
Subject: [PATCH 2/3] amdgcn: OpenMP SIMD routine support
Date: Tue, 9 Aug 2022 14:23:49 +0100

Enable and configure SIMD clones for amdgcn.  This affects both the
__simd__ function attribute and the OpenMP "declare simd" directive.

Note that the masked SIMD variants are generated, but the middle end
doesn't actually support calling them yet.

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): New.
	(gcn_simd_clone_adjust): New.
	(gcn_simd_clone_usable): New.
	(TARGET_SIMD_CLONE_ADJUST): New.
	(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New.
	(TARGET_SIMD_CLONE_USABLE): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-1.c: Add dg-warning.
	* gcc.dg/vect/vect-simd-clone-2.c: Add dg-warning.
	* gcc.dg/vect/vect-simd-clone-3.c: Add dg-warning.
	* gcc.dg/vect/vect-simd-clone-4.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-5.c: Add dg-warning. * gcc.dg/vect/vect-simd-clone-8.c: Add dg-warning. --- gcc/config/gcn/gcn.cc | 63 +++++++++++++++++++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c | 2 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c | 2 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c | 1 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c | 1 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c | 1 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c | 2 + 7 files changed, 72 insertions(+) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 96295e23aad..ceb69000807 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -52,6 +52,7 @@ #include "rtl-iter.h" #include "dwarf2.h" #include "gimple.h" +#include "cgraph.h" /* This file should be included last. */ #include "target-def.h" @@ -4555,6 +4556,61 @@ gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost), return 1; } +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN. */ + +static int +gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node), + struct cgraph_simd_clone *clonei, + tree base_type, + int ARG_UNUSED (num)) +{ + unsigned int elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)); + + if (known_eq (clonei->simdlen, 0U)) + clonei->simdlen = 64; + else if (maybe_ne (clonei->simdlen, 64U)) + { + /* Note that x86 has a similar message that is likely to trigger on + sizes that are OK for gcn; the user can't win. */ + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported simdlen %wd (amdgcn)", + clonei->simdlen.to_constant ()); + return 0; + } + + clonei->vecsize_mangle = 'n'; + clonei->vecsize_int = 0; + clonei->vecsize_float = 0; + + /* DImode ought to be more natural here, but VOIDmode produces better code, + at present, due to the shift-and-test steps not being optimized away + inside the in-branch clones. */ + clonei->mask_mode = VOIDmode; + + return 1; +} + +/* Implement TARGET_SIMD_CLONE_ADJUST. */ + +static void +gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node)) +{ + /* This hook has to be defined when + TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN is defined, but we don't + need it to do anything yet. */ +} + +/* Implement TARGET_SIMD_CLONE_USABLE. */ + +static int +gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node)) +{ + /* We don't need to do anything here because + gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one + possibility. */ + return 0; +} + /* }}} */ /* {{{ md_reorg pass. 
*/ @@ -6643,6 +6699,13 @@ gcn_dwarf_register_span (rtx rtl) #define TARGET_SECTION_TYPE_FLAGS gcn_section_type_flags #undef TARGET_SCALAR_MODE_SUPPORTED_P #define TARGET_SCALAR_MODE_SUPPORTED_P gcn_scalar_mode_supported_p +#undef TARGET_SIMD_CLONE_ADJUST +#define TARGET_SIMD_CLONE_ADJUST gcn_simd_clone_adjust +#undef TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN +#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \ + gcn_simd_clone_compute_vecsize_and_simdlen +#undef TARGET_SIMD_CLONE_USABLE +#define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable #undef TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \ gcn_small_register_classes_for_mode_p diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c index 50429049500..cd65fc343f1 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c @@ -56,3 +56,5 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */ +/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c index f89c73a961b..ffcbf9380d6 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c @@ -50,3 +50,5 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */ +/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c index 75ce696ed66..18d68779cc5 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c @@ -43,3 +43,4 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 15 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c index debbe77b79d..e9af0b83162 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c @@ -46,3 +46,4 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 17 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c index 6a098d9a51a..46da496524d 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c @@ -41,3 +41,4 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 15 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c index 1bfd19dc8ab..f414285a170 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c @@ -92,3 +92,5 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 17 } */ +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 24 } */ From patchwork Tue Aug 9 13:23:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 56624 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org 
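As a brief aside between the second and third patches, here is an
illustration of the lane-count semantics introduced by patch 1 and used by
patch 2.  This is a sketch for exposition only and is not part of the patch
series; the helper name and the concrete sizes below are invented.  With a
non-zero vecsize the lane count of a clone is derived from the vector
bitsize, so it differs per element type; with vecsize_int/vecsize_float
left at zero, every clone uses simdlen lanes regardless of element size.

/* Illustrative model only (not GCC code): how the number of lanes per
   clone argument is chosen.  "vecsize_bits" stands in for the hook's
   vecsize_int/vecsize_float; all sizes are in bits.  */

#include <stdio.h>

static unsigned
lanes_for (unsigned vecsize_bits, unsigned simdlen, unsigned elt_bits)
{
  /* Patch 1: a zero vecsize means a fixed lane count, so every argument
     gets simdlen lanes regardless of element size.  */
  if (vecsize_bits == 0)
    return simdlen;
  /* Traditional behaviour: lanes = vector bits / element bits.  */
  return vecsize_bits / elt_bits;
}

int
main (void)
{
  /* x86-like: 512-bit vectors give 16 int lanes but only 8 double lanes.  */
  printf ("bitsize-derived: int %u, double %u\n",
	  lanes_for (512, 16, 32), lanes_for (512, 16, 64));
  /* amdgcn-like (patch 2): vecsize 0, simdlen 64 gives 64 lanes for both.  */
  printf ("fixed-lane:      int %u, double %u\n",
	  lanes_for (0, 64, 32), lanes_for (0, 64, 64));
  return 0;
}

This is why gcn_simd_clone_compute_vecsize_and_simdlen above can simply set
vecsize_int and vecsize_float to zero and simdlen to 64.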
From: Andrew Stubbs
Subject: [PATCH 3/3] vect: inbranch SIMD clones
Date: Tue, 9 Aug 2022 14:23:50 +0100

There has been support for generating "inbranch" SIMD clones for a long
time, but nothing actually uses them (as far as I can see).

This patch adds support for a subset of the possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

This subset should cover all the cases needed by amdgcn, at present.

gcc/ChangeLog:

	* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set
	vector_type for mask arguments also.
	* tree-if-conv.cc: Include cgraph.h.
	(if_convertible_stmt_p): Do if-conversion for calls to SIMD clones.
	(predicate_statements): Pass the predicate to SIMD functions.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
	to clones with mask arguments, in some cases.  Generate the mask
	vector arguments.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-16.c: New test.
	* gcc.dg/vect/vect-simd-clone-16b.c: New test.
	* gcc.dg/vect/vect-simd-clone-16c.c: New test.
	* gcc.dg/vect/vect-simd-clone-16d.c: New test.
	* gcc.dg/vect/vect-simd-clone-16e.c: New test.
	* gcc.dg/vect/vect-simd-clone-16f.c: New test.
	* gcc.dg/vect/vect-simd-clone-17.c: New test.
	* gcc.dg/vect/vect-simd-clone-17b.c: New test.
	* gcc.dg/vect/vect-simd-clone-17c.c: New test.
	* gcc.dg/vect/vect-simd-clone-17d.c: New test.
	* gcc.dg/vect/vect-simd-clone-17e.c: New test.
	* gcc.dg/vect/vect-simd-clone-17f.c: New test.
	* gcc.dg/vect/vect-simd-clone-18.c: New test.
	* gcc.dg/vect/vect-simd-clone-18b.c: New test.
* gcc.dg/vect/vect-simd-clone-18c.c: New test. * gcc.dg/vect/vect-simd-clone-18d.c: New test. * gcc.dg/vect/vect-simd-clone-18e.c: New test. * gcc.dg/vect/vect-simd-clone-18f.c: New test. --- gcc/omp-simd-clone.cc | 1 + .../gcc.dg/vect/vect-simd-clone-16.c | 89 ++++++++++++ .../gcc.dg/vect/vect-simd-clone-16b.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-16c.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-16d.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-16e.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-16f.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-17.c | 89 ++++++++++++ .../gcc.dg/vect/vect-simd-clone-17b.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-17c.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-17d.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-17e.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-17f.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-18.c | 89 ++++++++++++ .../gcc.dg/vect/vect-simd-clone-18b.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-18c.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-18d.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-18e.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-18f.c | 16 +++ gcc/tree-if-conv.cc | 39 ++++- gcc/tree-vect-stmts.cc | 134 ++++++++++++++---- 21 files changed, 641 insertions(+), 28 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc index 258d3c6377f..58e3dc8b2e9 100644 --- a/gcc/omp-simd-clone.cc +++ b/gcc/omp-simd-clone.cc @@ -716,6 +716,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node) } sc->args[i].orig_type = base_type; sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK; + sc->args[i].vector_type = adj.type; } if (node->definition) diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c new file mode 100644 index 00000000000..ffaabb30d1e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c @@ -0,0 +1,89 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +/* Test that simd inbranch clones work correctly. */ + +#ifndef TYPE +#define TYPE int +#endif + +/* A simple function that will be cloned. */ +#pragma omp declare simd +TYPE __attribute__((noinline)) +foo (TYPE a) +{ + return a + 1; +} + +/* Check that "inbranch" clones are called correctly. 
*/ + +void __attribute__((noinline)) +masked (TYPE * __restrict a, TYPE * __restrict b, int size) +{ + #pragma omp simd + for (int i = 0; i < size; i++) + b[i] = a[i]<1 ? foo(a[i]) : a[i]; +} + +/* Check that "inbranch" works when there might be unrolling. */ + +void __attribute__((noinline)) +masked_fixed (TYPE * __restrict a, TYPE * __restrict b) +{ + #pragma omp simd + for (int i = 0; i < 128; i++) + b[i] = a[i]<1 ? foo(a[i]) : a[i]; +} + +/* Validate the outputs. */ + +void +check_masked (TYPE *b, int size) +{ + for (int i = 0; i < size; i++) + if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1)) + || ((TYPE)i >= 1 && b[i] != (TYPE)i)) + { + __builtin_printf ("error at %d\n", i); + __builtin_exit (1); + } +} + +int +main () +{ + TYPE a[1024]; + TYPE b[1024]; + + for (int i = 0; i < 1024; i++) + a[i] = i; + + masked_fixed (a, b); + check_masked (b, 128); + + /* Test various sizes to cover machines with different vectorization + factors. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size); + check_masked (b, size); + } + + /* Test sizes that might exercise the partial vector code-path. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size-4); + check_masked (b, size-4); + } + + return 0; +} + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c new file mode 100644 index 00000000000..a503ef85238 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE float +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c new file mode 100644 index 00000000000..6563879df71 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE short +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=short. 
*/ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c new file mode 100644 index 00000000000..6c5e69482e5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE char +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=char. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c new file mode 100644 index 00000000000..6690844deae --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE double +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c new file mode 100644 index 00000000000..e7b35a6a2dc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE __INT64_TYPE__ +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=int64. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c new file mode 100644 index 00000000000..6f5d374a417 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c @@ -0,0 +1,89 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +/* Test that simd inbranch clones work correctly. */ + +#ifndef TYPE +#define TYPE int +#endif + +/* A simple function that will be cloned. 
*/ +#pragma omp declare simd uniform(b) +TYPE __attribute__((noinline)) +foo (TYPE a, TYPE b) +{ + return a + b; +} + +/* Check that "inbranch" clones are called correctly. */ + +void __attribute__((noinline)) +masked (TYPE * __restrict a, TYPE * __restrict b, int size) +{ + #pragma omp simd + for (int i = 0; i < size; i++) + b[i] = a[i]<1 ? foo(a[i], 1) : a[i]; +} + +/* Check that "inbranch" works when there might be unrolling. */ + +void __attribute__((noinline)) +masked_fixed (TYPE * __restrict a, TYPE * __restrict b) +{ + #pragma omp simd + for (int i = 0; i < 128; i++) + b[i] = a[i]<1 ? foo(a[i], 1) : a[i]; +} + +/* Validate the outputs. */ + +void +check_masked (TYPE *b, int size) +{ + for (int i = 0; i < size; i++) + if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1)) + || ((TYPE)i >= 1 && b[i] != (TYPE)i)) + { + __builtin_printf ("error at %d\n", i); + __builtin_exit (1); + } +} + +int +main () +{ + TYPE a[1024]; + TYPE b[1024]; + + for (int i = 0; i < 1024; i++) + a[i] = i; + + masked_fixed (a, b); + check_masked (b, 128); + + /* Test various sizes to cover machines with different vectorization + factors. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size); + check_masked (b, size); + } + + /* Test sizes that might exercise the partial vector code-path. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size-4); + check_masked (b, size-4); + } + + return 0; +} + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c new file mode 100644 index 00000000000..1e2c3ab11b3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE float +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c new file mode 100644 index 00000000000..007001de669 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE short +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=short. 
*/ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c new file mode 100644 index 00000000000..abb85a4ceee --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE char +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=char. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c new file mode 100644 index 00000000000..2c1d8a659bd --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE double +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c new file mode 100644 index 00000000000..582e690304f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE __INT64_TYPE__ +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=int64. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c new file mode 100644 index 00000000000..750a3f92b62 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c @@ -0,0 +1,89 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +/* Test that simd inbranch clones work correctly. */ + +#ifndef TYPE +#define TYPE int +#endif + +/* A simple function that will be cloned. 
*/ +#pragma omp declare simd uniform(b) +TYPE __attribute__((noinline)) +foo (TYPE b, TYPE a) +{ + return a + b; +} + +/* Check that "inbranch" clones are called correctly. */ + +void __attribute__((noinline)) +masked (TYPE * __restrict a, TYPE * __restrict b, int size) +{ + #pragma omp simd + for (int i = 0; i < size; i++) + b[i] = a[i]<1 ? foo(1, a[i]) : a[i]; +} + +/* Check that "inbranch" works when there might be unrolling. */ + +void __attribute__((noinline)) +masked_fixed (TYPE * __restrict a, TYPE * __restrict b) +{ + #pragma omp simd + for (int i = 0; i < 128; i++) + b[i] = a[i]<1 ? foo(1, a[i]) : a[i]; +} + +/* Validate the outputs. */ + +void +check_masked (TYPE *b, int size) +{ + for (int i = 0; i < size; i++) + if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1)) + || ((TYPE)i >= 1 && b[i] != (TYPE)i)) + { + __builtin_printf ("error at %d\n", i); + __builtin_exit (1); + } +} + +int +main () +{ + TYPE a[1024]; + TYPE b[1024]; + + for (int i = 0; i < 1024; i++) + a[i] = i; + + masked_fixed (a, b); + check_masked (b, 128); + + /* Test various sizes to cover machines with different vectorization + factors. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size); + check_masked (b, size); + } + + /* Test sizes that might exercise the partial vector code-path. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size-4); + check_masked (b, size-4); + } + + return 0; +} + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c new file mode 100644 index 00000000000..a77ccf3bfcc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE float +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c new file mode 100644 index 00000000000..bee5f338abe --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE short +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=short. 
*/ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c new file mode 100644 index 00000000000..a749edefdd7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE char +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=char. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c new file mode 100644 index 00000000000..061e0dc2621 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE double +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c new file mode 100644 index 00000000000..a3037f5809a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE __INT64_TYPE__ +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=int64. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 1c8e1a45234..82b21add802 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -122,6 +122,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa-dse.h" #include "tree-vectorizer.h" #include "tree-eh.h" +#include "cgraph.h" /* Only handle PHIs with no more arguments unless we are asked to by simd pragma. 
*/ @@ -1054,7 +1055,8 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt, A statement is if-convertible if: - it is an if-convertible GIMPLE_ASSIGN, - it is a GIMPLE_LABEL or a GIMPLE_COND, - - it is builtins call. */ + - it is builtins call. + - it is a call to a function with a SIMD clone. */ static bool if_convertible_stmt_p (gimple *stmt, vec refs) @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec refs) tree fndecl = gimple_call_fndecl (stmt); if (fndecl) { + /* We can vectorize some builtins and functions with SIMD + clones. */ int flags = gimple_call_flags (stmt); + struct cgraph_node *node = cgraph_node::get (fndecl); if ((flags & ECF_CONST) && !(flags & ECF_LOOPING_CONST_OR_PURE) - /* We can only vectorize some builtins at the moment, - so restrict if-conversion to those. */ && fndecl_built_in_p (fndecl)) return true; + else if (node && node->simd_clones != NULL) + { + need_to_predicate = true; + return true; + } } return false; } @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop) gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi)); update_stmt (stmt); } + + /* Add a predicate parameter to functions that have a SIMD clone. + This will cause the vectorizer to match the "in branch" clone + variants because they also have the extra parameter, and serves + to build the mask vector in a natural way. */ + gcall *call = dyn_cast (gsi_stmt (gsi)); + if (call && !gimple_call_internal_p (call)) + { + tree orig_fndecl = gimple_call_fndecl (call); + int orig_nargs = gimple_call_num_args (call); + auto_vec args; + for (int i=0; i < orig_nargs; i++) + args.safe_push (gimple_call_arg (call, i)); + args.safe_push (cond); + + /* Replace the call with a new one that has the extra + parameter. The FUNCTION_DECL remains unchanged so that + the vectorizer can find the SIMD clones. This call will + either be deleted or replaced at that time, so the + mismatch is short-lived and we can live with it. */ + gcall *new_call = gimple_build_call_vec (orig_fndecl, args); + gimple_call_set_lhs (new_call, gimple_call_lhs (call)); + gsi_replace (&gsi, new_call, true); + } + lhs = gimple_get_lhs (gsi_stmt (gsi)); if (lhs && TREE_CODE (lhs) == SSA_NAME) ssa_names.add (lhs); diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index f582d238984..2214d216c15 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -4049,16 +4049,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, || thisarginfo.dt == vect_external_def) gcc_assert (thisarginfo.vectype == NULL_TREE); else - { - gcc_assert (thisarginfo.vectype != NULL_TREE); - if (VECTOR_BOOLEAN_TYPE_P (thisarginfo.vectype)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "vector mask arguments are not supported\n"); - return false; - } - } + gcc_assert (thisarginfo.vectype != NULL_TREE); /* For linear arguments, the analyze phase should have saved the base and step in STMT_VINFO_SIMD_CLONE_INFO. */ @@ -4151,9 +4142,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, if (target_badness < 0) continue; this_badness += target_badness * 512; - /* FORNOW: Have to add code to add the mask argument. 
*/ - if (n->simdclone->inbranch) - continue; for (i = 0; i < nargs; i++) { switch (n->simdclone->args[i].arg_type) @@ -4191,7 +4179,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, i = -1; break; case SIMD_CLONE_ARG_TYPE_MASK: - gcc_unreachable (); + break; } if (i == (size_t) -1) break; @@ -4217,18 +4205,55 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, return false; for (i = 0; i < nargs; i++) - if ((arginfo[i].dt == vect_constant_def - || arginfo[i].dt == vect_external_def) - && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR) - { - tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i)); - arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type, - slp_node); - if (arginfo[i].vectype == NULL - || !constant_multiple_p (bestn->simdclone->simdlen, - simd_clone_subparts (arginfo[i].vectype))) + { + if ((arginfo[i].dt == vect_constant_def + || arginfo[i].dt == vect_external_def) + && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR) + { + tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i)); + arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type, + slp_node); + if (arginfo[i].vectype == NULL + || !constant_multiple_p (bestn->simdclone->simdlen, + simd_clone_subparts (arginfo[i].vectype))) + return false; + } + + if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR + && VECTOR_BOOLEAN_TYPE_P (bestn->simdclone->args[i].vector_type)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "vector mask arguments are not supported.\n"); return false; - } + } + + if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK + && bestn->simdclone->mask_mode == VOIDmode + && (simd_clone_subparts (bestn->simdclone->args[i].vector_type) + != simd_clone_subparts (arginfo[i].vectype))) + { + /* FORNOW we only have partial support for vector-type masks that + can't hold all of simdlen. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "in-branch vector clones are not yet" + " supported for mismatched vector sizes.\n"); + return false; + } + if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK + && bestn->simdclone->mask_mode != VOIDmode) + { + /* FORNOW don't support integer-type masks. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "in-branch vector clones are not yet" + " supported for integer mask modes.\n"); + return false; + } + } fndecl = bestn->decl; nunits = bestn->simdclone->simdlen; @@ -4417,6 +4442,65 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, } } break; + case SIMD_CLONE_ARG_TYPE_MASK: + atype = bestn->simdclone->args[i].vector_type; + if (bestn->simdclone->mask_mode != VOIDmode) + { + /* FORNOW: this is disabled above. */ + gcc_unreachable (); + } + else + { + tree elt_type = TREE_TYPE (atype); + tree one = fold_convert (elt_type, integer_one_node); + tree zero = fold_convert (elt_type, integer_zero_node); + o = vector_unroll_factor (nunits, + simd_clone_subparts (atype)); + for (m = j * o; m < (j + 1) * o; m++) + { + if (simd_clone_subparts (atype) + < simd_clone_subparts (arginfo[i].vectype)) + { + /* The mask type has fewer elements than simdlen. */ + + /* FORNOW */ + gcc_unreachable (); + } + else if (simd_clone_subparts (atype) + == simd_clone_subparts (arginfo[i].vectype)) + { + /* The SIMD clone function has the same number of + elements as the current function. 
*/ + if (m == 0) + { + vect_get_vec_defs_for_operand (vinfo, stmt_info, + o * ncopies, + op, + &vec_oprnds[i]); + vec_oprnds_i[i] = 0; + } + vec_oprnd0 = vec_oprnds[i][vec_oprnds_i[i]++]; + vec_oprnd0 + = build3 (VEC_COND_EXPR, atype, vec_oprnd0, + build_vector_from_val (atype, one), + build_vector_from_val (atype, zero)); + gassign *new_stmt + = gimple_build_assign (make_ssa_name (atype), + vec_oprnd0); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + vargs.safe_push (gimple_assign_lhs (new_stmt)); + } + else + { + /* The mask type has more elements than simdlen. */ + + /* FORNOW */ + gcc_unreachable (); + } + } + } + break; case SIMD_CLONE_ARG_TYPE_UNIFORM: vargs.safe_push (op); break;
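To close, a hand-written model of what the in-branch vectorization in this
patch arranges.  This is a sketch for exposition under stated assumptions,
not compiler output: foo_masked_model, LANES and the argument layout are
notional, and the real clones use target-specific mangled names and the
target's vector ABI.  The scalar condition is appended to the call by
tree-if-conv and expanded by the vectorizer into a zero/one mask vector
(via VEC_COND_EXPR) that selects which lanes execute the clone's body.
Compile with -fopenmp-simd to make the pragma meaningful.

#include <stdio.h>

#define LANES 4

/* The pragma requests SIMD clones of foo, including the masked
   ("inbranch") variants that this patch teaches the vectorizer to call.  */
#pragma omp declare simd inbranch
static int foo (int a) { return a + 1; }

/* Notional model of a masked clone for mask_mode == VOIDmode targets:
   one extra mask argument, one element per lane; inactive lanes are
   left untouched.  */
static void
foo_masked_model (const int *a, const int *mask, int *res)
{
  for (int l = 0; l < LANES; l++)
    if (mask[l])
      res[l] = foo (a[l]);
}

int
main (void)
{
  int a[LANES] = { -2, -1, 0, 1 };
  int mask[LANES], res[LANES];

  /* tree-if-conv appends the condition as an extra call argument; the
     vectorizer turns it into a 0/1 vector and calls the masked clone.  */
  for (int l = 0; l < LANES; l++)
    {
      mask[l] = a[l] < 1;
      res[l] = a[l];	/* value kept when the lane is inactive */
    }
  foo_masked_model (a, mask, res);

  for (int l = 0; l < LANES; l++)
    printf ("%d -> %d\n", a[l], res[l]);
  return 0;
}

On amdgcn the mask becomes a 64-lane vector matching simdlen; clones whose
mask layout does not match simdlen are still rejected, as the FORNOW checks
above show.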