From patchwork Thu Oct 28 14:44:19 2021
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 46754
Message-ID: <440687a0-d6e5-50e0-7105-7914b910c8c6@linux.ibm.com>
Date: Thu, 28 Oct 2021 16:44:19 +0200
From: Robin Dapp
To: GCC Patches
Cc: Richard Sandiford
Subject: [PATCH] vect: Add bias parameter for partial vectorization

Hi,

as discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582627.html
this introduces a bias parameter for the len_load/len_store ifns as well
as optabs that is meant to distinguish between Power and s390 variants.
The default is a bias of 0, while in s390's case vll/vstl do not support
lengths of zero bytes and a bias of -1 should be used.

Bootstrapped and regtested on Power9 (--with-cpu=power9) and s390
(--with-arch=z15).

The tiny changes in the Power backend I will post separately.

Regards
 Robin

commit 18a5fcd0f8835247e86d86fb018789fe755404be
Author: Robin Dapp
Date:   Wed Oct 27 11:42:11 2021 +0200

    vect: Add bias parameter for partial vectorization

    This adds a bias parameter for LEN_LOAD and LEN_STORE as well as the
    corresponding internal functions.  A bias of 0 represents the status
    quo, while -1 is used for the s390 vll instruction that expects the
    highest byte to load rather than the number of bytes to load.
    Backends need to support one of these biases via an operand predicate
    and the vectorizer will then emit the appropriate variant.

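To make the two conventions concrete, here is a minimal standalone C sketch. It is a model, not GCC code: the names biased_len, model_len_load and VEC_BYTES are hypothetical and exist only for this illustration. It assumes, following the description above, that with a bias of 0 the length operand is the number of bytes, while with a bias of -1 it is the index of the highest byte, so the caller subtracts one before passing it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vector size in bytes, for illustration only.  */
enum { VEC_BYTES = 16 };

/* Model of what the caller passes as the length operand.
   Bias 0: the byte count itself.
   Bias -1: the byte count minus one, i.e. the index of the highest byte.
   With a bias of -1 a byte count of zero cannot be expressed.  */
static size_t
biased_len (size_t nbytes, int bias)
{
  assert (bias == 0 || bias == -1);
  assert (bias == 0 || nbytes > 0);
  return bias == 0 ? nbytes : nbytes - 1;
}

/* Model of a length-controlled load that copies the leading bytes of SRC
   into DST and leaves the rest of DST untouched.  LEN_OPERAND is
   interpreted according to BIAS as described above.  */
static void
model_len_load (unsigned char *dst, const unsigned char *src,
                size_t len_operand, int bias)
{
  size_t nbytes = bias == 0 ? len_operand : len_operand + 1;
  if (nbytes > VEC_BYTES)
    nbytes = VEC_BYTES;
  memcpy (dst, src, nbytes);
}
```

Under these assumptions, model_len_load (dst, src, biased_len (n, bias), bias) copies the same n bytes for either bias as long as n is at least 1; only the encoding of the operand differs.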
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 8312d08aab2..993e32c1854 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2696,9 +2696,9 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
 static void
 expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[3];
-  tree type, lhs, rhs, maskt;
-  rtx mem, target, mask;
+  class expand_operand ops[4];
+  tree type, lhs, rhs, maskt, biast;
+  rtx mem, target, mask, bias;
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
@@ -2727,7 +2727,16 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
                                  TYPE_UNSIGNED (TREE_TYPE (maskt)));
   else
     create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (icode, 3, ops);
+  if (optab == len_load_optab)
+    {
+      biast = gimple_call_arg (stmt, 3);
+      bias = expand_normal (biast);
+      create_input_operand (&ops[3], bias, QImode);
+      expand_insn (icode, 4, ops);
+    }
+  else
+    expand_insn (icode, 3, ops);
+
   if (!rtx_equal_p (target, ops[0].value))
     emit_move_insn (target, ops[0].value);
 }
@@ -2741,9 +2750,9 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 static void
 expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[3];
-  tree type, lhs, rhs, maskt;
-  rtx mem, reg, mask;
+  class expand_operand ops[4];
+  tree type, lhs, rhs, maskt, biast;
+  rtx mem, reg, mask, bias;
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
@@ -2770,7 +2779,16 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
                                  TYPE_UNSIGNED (TREE_TYPE (maskt)));
   else
     create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (icode, 3, ops);
+
+  if (optab == len_store_optab)
+    {
+      biast = gimple_call_arg (stmt, 4);
+      bias = expand_normal (biast);
+      create_input_operand (&ops[3], bias, QImode);
+      expand_insn (icode, 4, ops);
+    }
+  else
+    expand_insn (icode, 3, ops);
 }
 
 #define expand_mask_store_optab_fn expand_partial_store_optab_fn
@@ -4172,6 +4190,30 @@ internal_check_ptrs_fn_supported_p (internal_fn ifn, tree type,
           && insn_operand_matches (icode, 4, GEN_INT (align)));
 }
 
+/* Return the supported bias for the len_load IFN.  For now we support a
+   default bias of 0 and -1 in case 0 is not an allowable length for len_load.
+   If none of these biases match what the backend provides, return
+   VECT_PARTIAL_BIAS_UNSUPPORTED.  */
+
+signed char
+internal_len_load_bias_supported (internal_fn ifn, machine_mode mode)
+{
+  optab optab = direct_internal_fn_optab (ifn);
+  insn_code icode = direct_optab_handler (optab, mode);
+
+  if (icode != CODE_FOR_nothing)
+    {
+      /* We only support a bias of 0 (default) or -1.  Try both
+         of them.  */
+      if (insn_operand_matches (icode, 3, GEN_INT (0)))
+        return 0;
+      else if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+        return -1;
+    }
+
+  return VECT_PARTIAL_BIAS_UNSUPPORTED;
+}
+
 /* Expand STMT as though it were a call to internal function FN.  */
 
 void
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 19d0f849a5a..af28cf0d566 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -227,6 +227,10 @@ extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
                                                     tree, tree, int);
 extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
                                                 poly_uint64, unsigned int);
+#define VECT_PARTIAL_BIAS_UNSUPPORTED 127
+
+extern signed char internal_len_load_bias_supported (internal_fn ifn,
+                                                     machine_mode);
 
 extern void expand_addsub_overflow (location_t, tree_code, tree, tree, tree,
                                     bool, bool, bool, bool, tree *);
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index e94356d76e9..cd2c33fc4a7 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1163,6 +1163,15 @@ vect_verify_loop_lens (loop_vec_info loop_vinfo)
   if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
     return false;
 
+  opt_machine_mode len_load_mode = get_len_load_store_mode
+    (loop_vinfo->vector_mode, false);
+  /* If the backend requires a bias of -1 for LEN_LOAD, we must not emit
+     len_loads with a length of zero.  In order to avoid that we prohibit
+     more than one loop length here.  */
+  if (internal_len_load_bias_supported (IFN_LEN_LOAD, len_load_mode.require ())
+      == -1 && LOOP_VINFO_LENS (loop_vinfo).length () > 1)
+    return false;
+
   unsigned int max_nitems_per_iter = 1;
   unsigned int i;
   rgroup_controls *rgl;
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 17849b575b7..c3df26c8009 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -8289,9 +8289,30 @@ vectorizable_store (vec_info *vinfo,
                                                    gsi);
                       vec_oprnd = var;
                     }
+
+                  /* Check which bias value to use.  Default is 0.
+                     A bias of -1 means that we cannot emit a LEN_LOAD with
+                     a length of 0 and need to subtract 1 from the length.  */
+                  char biasval = internal_len_load_bias_supported
+                    (IFN_LEN_STORE, new_vmode);
+                  tree bias = build_int_cst (intQI_type_node, biasval);
+                  tree new_len = final_len;
+                  if (biasval != 0
+                      && biasval != VECT_PARTIAL_BIAS_UNSUPPORTED)
+                    {
+                      new_len = make_ssa_name (TREE_TYPE (final_len));
+                      gassign *m1
+                        = gimple_build_assign (new_len, MINUS_EXPR,
+                                               final_len,
+                                               build_one_cst (TREE_TYPE
+                                                          (final_len)));
+                      vect_finish_stmt_generation (vinfo, stmt_info, m1,
+                                                   gsi);
+                    }
                   gcall *call
-                    = gimple_build_call_internal (IFN_LEN_STORE, 4, dataref_ptr,
-                                                  ptr, final_len, vec_oprnd);
+                    = gimple_build_call_internal (IFN_LEN_STORE, 5, dataref_ptr,
+                                                  ptr, new_len, vec_oprnd,
+                                                  bias);
                   gimple_call_set_nothrow (call, true);
                   vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
                   new_stmt = call;
@@ -9588,24 +9609,46 @@ vectorizable_load (vec_info *vinfo,
                                                  vec_num * j + i);
                         tree ptr = build_int_cst (ref_type,
                                                   align * BITS_PER_UNIT);
+
+                        machine_mode vmode = TYPE_MODE (vectype);
+                        opt_machine_mode new_ovmode
+                          = get_len_load_store_mode (vmode, true);
+                        machine_mode new_vmode = new_ovmode.require ();
+                        tree qi_type = unsigned_intQI_type_node;
+                        tree new_vtype
+                          = build_vector_type_for_mode (qi_type, new_vmode);
+
+                        /* Check which bias value to use.  Default is 0.  */
+                        char biasval = internal_len_load_bias_supported
+                          (IFN_LEN_LOAD, new_vmode);
+
+                        tree bias = build_int_cst (intQI_type_node, biasval);
+                        tree new_len = final_len;
+                        if (biasval != 0
+                            && biasval != VECT_PARTIAL_BIAS_UNSUPPORTED)
+                          {
+                            new_len = make_ssa_name (TREE_TYPE (final_len));
+                            gassign *m1 = gimple_build_assign (new_len,
+                                                               MINUS_EXPR,
+                                                               final_len,
+                                                               build_one_cst
+                                                               (TREE_TYPE
+                                                                (final_len)));
+                            vect_finish_stmt_generation (vinfo, stmt_info, m1,
+                                                         gsi);
+                          }
+
                         gcall *call
-                          = gimple_build_call_internal (IFN_LEN_LOAD, 3,
+                          = gimple_build_call_internal (IFN_LEN_LOAD, 4,
                                                         dataref_ptr, ptr,
-                                                        final_len);
+                                                        new_len, bias);
                         gimple_call_set_nothrow (call, true);
                         new_stmt = call;
                         data_ref = NULL_TREE;
 
                         /* Need conversion if it's wrapped with VnQI.  */
-                        machine_mode vmode = TYPE_MODE (vectype);
-                        opt_machine_mode new_ovmode
-                          = get_len_load_store_mode (vmode, true);
-                        machine_mode new_vmode = new_ovmode.require ();
                         if (vmode != new_vmode)
                           {
-                            tree qi_type = unsigned_intQI_type_node;
-                            tree new_vtype
-                              = build_vector_type_for_mode (qi_type, new_vmode);
                             tree var = vect_get_new_ssa_name (new_vtype,
                                                               vect_simple_var);
                             gimple_set_lhs (call, var);