Message ID | e5e1426b-5729-6de6-209f-518dcd4ada38@linux.ibm.com |
---|---|
State | RFC |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 283143857C42 for <patchwork@sourceware.org>; Wed, 20 Oct 2021 08:35:16 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 283143857C42 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1634718916; bh=1o/GutuBGq9Y1auThdVUKB3nR37me2i/uxEADK43vwo=; h=Subject:To:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=g2m5C2GNx1v+VLtq72+hFJao11+ETxMgNP1shm/fjKKdURotGMJA5lV9GGyu+h0vb GytSk9sprxa1zDl8+T3cVxehwGHC8onj86FvJUZuELIYyslqSzjwLwI9RC7uR7g064 4zeanYg5rh8krN97QcEK6lIb5Ob/x638JGfog5o0= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id EAEB43858C3A for <gcc-patches@gcc.gnu.org>; Wed, 20 Oct 2021 08:34:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EAEB43858C3A Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 19K8MAUY020218 for <gcc-patches@gcc.gnu.org>; Wed, 20 Oct 2021 04:34:46 -0400 Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com [149.81.74.108]) by mx0a-001b2d01.pphosted.com with ESMTP id 3btfhv06wu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for <gcc-patches@gcc.gnu.org>; Wed, 20 Oct 2021 04:34:45 -0400 Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1]) by ppma05fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 19K8XXBb009517 for <gcc-patches@gcc.gnu.org>; Wed, 20 Oct 2021 08:34:43 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma05fra.de.ibm.com with ESMTP id 3bqpc9yw5x-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for <gcc-patches@gcc.gnu.org>; Wed, 20 Oct 2021 08:34:42 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 19K8Yd5q62849312 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 20 Oct 2021 08:34:39 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 247C911C066; Wed, 20 Oct 2021 08:34:39 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0969B11C052; Wed, 20 Oct 2021 08:34:39 +0000 (GMT) Received: from li-926bd7cc-2dd1-11b2-a85c-f6adc0f5efec.ibm.com (unknown [9.171.64.225]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Wed, 20 Oct 2021 08:34:38 +0000 (GMT) Subject: [RFC] Partial vectors for s390 To: GCC Patches <gcc-patches@gcc.gnu.org> Message-ID: <e5e1426b-5729-6de6-209f-518dcd4ada38@linux.ibm.com> Date: Wed, 20 Oct 2021 10:34:38 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: AZiGFuxnn_JdRGHS562u7t_ubz0cngBG X-Proofpoint-ORIG-GUID: AZiGFuxnn_JdRGHS562u7t_ubz0cngBG X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.0.607.475 definitions=2021-10-20_04,2021-10-19_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 mlxlogscore=999 spamscore=0 suspectscore=0 mlxscore=0 lowpriorityscore=0 impostorscore=0 malwarescore=0 phishscore=0 priorityscore=1501 clxscore=1015 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109230001 definitions=main-2110200047 X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Robin Dapp <rdapp@linux.ibm.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
[RFC] Partial vectors for s390
|
|
Commit Message
Robin Dapp
Oct. 20, 2021, 8:34 a.m. UTC
Hi, I have been playing around with making Kewen's partial vector changes workable with s390: We have a vll instruction that can be passed the highest byte to load. The rather unfortunate consequence of this is that a length of zero cannot be specified. The partial vector framework, however, relies a lot on the fact that a len_load can be made a NOP using a length of zero. After confirming an additional zero-check before each vll is definitely too slow across SPEC and some discussion with Kewen we figured the easiest way forward is to exclude loops with multiple VFs (despite giving up vectorization possibilities). These are prone to len_loads with zero while the regular induction variable check prevents them in single-VF loops. So, as a quick hack, I went with whether we are guaranteed not to emit len_load with zero now. On top, I subtract 1 from the passed length in the expander, which, supposedly, is also not ideal. There are some regressions that I haven't fully analyzed yet but whether and when to actually enable this feature could be a backend decision with the necessary middle-end checks already in place. Any ideas on how to properly check for the zero condition and exclude the cases that cause it? Kewen suggested enriching the len_load optabs with a separate parameter. Regards Robin
Comments
Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > Hi, > > I have been playing around with making Kewen's partial vector changes > workable with s390: > > We have a vll instruction that can be passed the highest byte to load. > The rather unfortunate consequence of this is that a length of zero > cannot be specified. The partial vector framework, however, relies a > lot on the fact that a len_load can be made a NOP using a length of zero. > > After confirming an additional zero-check before each vll is definitely > too slow across SPEC and some discussion with Kewen we figured the > easiest way forward is to exclude loops with multiple VFs (despite > giving up vectorization possibilities). These are prone to len_loads > with zero while the regular induction variable check prevents them in > single-VF loops. > > So, as a quick hack, I went with > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 75f24e7c4f6..f79222daeb6 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -1170,6 +1170,9 @@ vect_verify_loop_lens (loop_vec_info loop_vinfo) > if (LOOP_VINFO_LENS (loop_vinfo).is_empty ()) > return false; > > + if (LOOP_VINFO_LENS (loop_vinfo).length () > 1) > + return false; > + Yeah, I think this should be sufficient. > which could be made a hook, eventually. FWIW this is sufficient to make > bootstrap, regtest and compiling the SPEC suites succeed. I'm unsure > whether we are guaranteed not to emit len_load with zero now. On top, > I subtract 1 from the passed length in the expander, which, supposedly, > is also not ideal. Exposing the subtraction in gimple would certainly allow for more optimisation. We already have code to probe the predicates of the underlying define_expands/insns to see whether they support certain constant IFN arguments; see e.g. internal_gather_scatter_fn_supported_p. We could do something similar here: add an extra operand to the optab, and an extra argument to the IFN, that gives a bias amount. The PowerPC version would require 0, the System Z version would require -1. The vectoriser would probe to see which value it should use. Doing it that way ensures that the gimple is still self-describing. It avoids gimple semantics depending on target hooks. > There are some regressions that I haven't fully analyzed yet but whether > and when to actually enable this feature could be a backend decision > with the necessary middle-end checks already in place. > > Any ideas on how to properly check for the zero condition and exclude > the cases that cause it? Kewen suggested enriching the len_load optabs > with a separate parameter. Yeah, I think that'd be a good approach. A bias of -1 would indicate that the target can't cope with zero lengths. Thanks, Richard
Hi Richard, > We already have code to probe the predicates of the underlying > define_expands/insns to see whether they support certain constant > IFN arguments; see e.g. internal_gather_scatter_fn_supported_p. > We could do something similar here: add an extra operand to the optab, > and an extra argument to the IFN, that gives a bias amount. > The PowerPC version would require 0, the System Z version would > require -1. The vectoriser would probe to see which value > it should use. > > Doing it that way ensures that the gimple is still self-describing. > It avoids gimple semantics depending on target hooks. As I don't have much previous exposure to the vectoriser code, I cobbled together something pretty ad-hoc (attached). Does this come somehow close to what you have in mind? internal_len_load_supported_p should rather be called internal_len_load_bias_supported_p or so I guess and the part where we exclude multiple loop_lens is still missing. Would we also check for a viable bias there and then either accept multiple lens or not? Regards Robin
Robin Dapp <rdapp@linux.ibm.com> writes: > Hi Richard, > >> We already have code to probe the predicates of the underlying >> define_expands/insns to see whether they support certain constant >> IFN arguments; see e.g. internal_gather_scatter_fn_supported_p. >> We could do something similar here: add an extra operand to the optab, >> and an extra argument to the IFN, that gives a bias amount. >> The PowerPC version would require 0, the System Z version would >> require -1. The vectoriser would probe to see which value >> it should use. >> >> Doing it that way ensures that the gimple is still self-describing. >> It avoids gimple semantics depending on target hooks. > > As I don't have much previous exposure to the vectoriser code, I cobbled > together something pretty ad-hoc (attached). Does this come somehow > close to what you have in mind? Yeah, looks good. > internal_len_load_supported_p should rather be called > internal_len_load_bias_supported_p or so I guess and the part where we > exclude multiple loop_lens is still missing. Since we only support one bias, it might be better to make the internal-fn.c function return the bias as an int (with some marker value for “not supported”), so that the caller doesn't need to probe both values. > Would we also check for a viable bias there and then either accept > multiple lens or not? Yeah, I think so. Thanks, Richard > > Regards > Robin > > commit 2320dbfdfe1477b15a2ac59847d2a52e68de49ab > Author: Robin Dapp <rdapp@linux.ibm.com> > Date: Tue Oct 26 14:36:08 2021 +0200 > > bias1 > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c > index 8312d08aab2..bf97d3e471a 100644 > --- a/gcc/internal-fn.c > +++ b/gcc/internal-fn.c > @@ -2696,9 +2696,9 @@ expand_call_mem_ref (tree type, gcall *stmt, int index) > static void > expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > { > - class expand_operand ops[3]; > - tree type, lhs, rhs, maskt; > - rtx mem, target, mask; > + class expand_operand ops[4]; > + tree type, lhs, rhs, maskt, biast; > + rtx mem, target, mask, bias; > insn_code icode; > > maskt = gimple_call_arg (stmt, 2); > @@ -2727,7 +2727,18 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > TYPE_UNSIGNED (TREE_TYPE (maskt))); > else > create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); > - expand_insn (icode, 3, ops); > + if (optab == len_load_optab) > + { > + biast = gimple_call_arg (stmt, 3); > + bias = expand_normal (biast); > + create_input_operand (&ops[3], bias, SImode); > + } > + > + if (optab != len_load_optab) > + expand_insn (icode, 3, ops); > + else > + expand_insn (icode, 4, ops); > + > if (!rtx_equal_p (target, ops[0].value)) > emit_move_insn (target, ops[0].value); > } > @@ -2741,9 +2752,9 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > static void > expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > { > - class expand_operand ops[3]; > - tree type, lhs, rhs, maskt; > - rtx mem, reg, mask; > + class expand_operand ops[4]; > + tree type, lhs, rhs, maskt, biast; > + rtx mem, reg, mask, bias; > insn_code icode; > > maskt = gimple_call_arg (stmt, 2); > @@ -2770,7 +2781,17 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > TYPE_UNSIGNED (TREE_TYPE (maskt))); > else > create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); > - expand_insn (icode, 3, ops); > + if (optab == len_store_optab) > + { > + biast = gimple_call_arg (stmt, 4); > + bias = expand_normal (biast); > + create_input_operand (&ops[3], bias, SImode); > + } > + > + if (optab != len_store_optab) > + expand_insn (icode, 3, ops); > + else > + expand_insn (icode, 4, ops); > } > > #define expand_mask_store_optab_fn expand_partial_store_optab_fn > @@ -4154,6 +4175,25 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type, > && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale))); > } > > +bool > +internal_len_load_supported_p (internal_fn ifn, tree load_type, int bias) > +{ > + if (bias > 0 || bias < -1) > + return false; > + > + machine_mode mode = TYPE_MODE (load_type); > + > + optab optab = direct_internal_fn_optab (ifn); > + insn_code icode = direct_optab_handler (optab, mode); > + int output_ops = internal_load_fn_p (ifn) ? 1 : 0; > + > + if (icode != CODE_FOR_nothing > + && insn_operand_matches (icode, 2 + output_ops, GEN_INT (bias))) > + return true; > + > + return false; > +} > + > /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN > for pointers of type TYPE when the accesses have LENGTH bytes and their > common byte alignment is ALIGN. */ > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h > index 19d0f849a5a..d0bf9941bcc 100644 > --- a/gcc/internal-fn.h > +++ b/gcc/internal-fn.h > @@ -225,6 +225,7 @@ extern int internal_fn_mask_index (internal_fn); > extern int internal_fn_stored_value_index (internal_fn); > extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree, > tree, tree, int); > +extern bool internal_len_load_supported_p (internal_fn ifn, tree, int); > extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree, > poly_uint64, unsigned int); > > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index d7723b1a92a..50537763ace 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -8272,12 +8272,14 @@ vectorizable_store (vec_info *vinfo, > opt_machine_mode new_ovmode > = get_len_load_store_mode (vmode, false); > machine_mode new_vmode = new_ovmode.require (); > + tree vtype = vectype; > /* Need conversion if it's wrapped with VnQI. */ > if (vmode != new_vmode) > { > tree new_vtype > = build_vector_type_for_mode (unsigned_intQI_type_node, > - new_vmode); > + new_vmode); > + vtype = new_vtype; > tree var > = vect_get_new_ssa_name (new_vtype, vect_simple_var); > vec_oprnd > @@ -8289,9 +8291,29 @@ vectorizable_store (vec_info *vinfo, > gsi); > vec_oprnd = var; > } > + > + /* Check which bias value to use. Default is 0. */ > + tree bias = build_int_cst (intSI_type_node, 0); > + tree new_len = final_len; > + if (!internal_len_load_supported_p (IFN_LEN_LOAD, vtype, 0) > + && internal_len_load_supported_p (IFN_LEN_LOAD, > + vtype, -1)) > + { > + bias = build_int_cst (intSI_type_node, -1); > + new_len = make_ssa_name (TREE_TYPE (final_len)); > + gassign *m1 = gimple_build_assign (new_len, > + MINUS_EXPR, > + final_len, > + build_one_cst > + (TREE_TYPE > + (final_len))); > + vect_finish_stmt_generation (vinfo, stmt_info, m1, > + gsi); > + } > gcall *call > - = gimple_build_call_internal (IFN_LEN_STORE, 4, dataref_ptr, > - ptr, final_len, vec_oprnd); > + = gimple_build_call_internal (IFN_LEN_STORE, 5, dataref_ptr, > + ptr, new_len, vec_oprnd, > + bias); > gimple_call_set_nothrow (call, true); > vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); > new_stmt = call; > @@ -9588,24 +9610,50 @@ vectorizable_load (vec_info *vinfo, > vec_num * j + i); > tree ptr = build_int_cst (ref_type, > align * BITS_PER_UNIT); > + > + machine_mode vmode = TYPE_MODE (vectype); > + opt_machine_mode new_ovmode > + = get_len_load_store_mode (vmode, true); > + machine_mode new_vmode = new_ovmode.require (); > + tree qi_type = unsigned_intQI_type_node; > + tree new_vtype > + = build_vector_type_for_mode (qi_type, new_vmode); > + > + tree vtype = vectype; > + if (vmode != new_vmode) > + vtype = new_vtype; > + > + /* Check which bias value to use. Default is 0. */ > + tree bias = build_int_cst (intSI_type_node, 0); > + tree new_len = final_len; > + if (!internal_len_load_supported_p (IFN_LEN_LOAD, > + vtype, 0) > + && internal_len_load_supported_p (IFN_LEN_LOAD, > + vtype, -1)) > + { > + bias = build_int_cst (intSI_type_node, -1); > + new_len = make_ssa_name (TREE_TYPE (final_len)); > + gassign *m1 = gimple_build_assign (new_len, > + MINUS_EXPR, > + final_len, > + build_one_cst > + (TREE_TYPE > + (final_len))); > + vect_finish_stmt_generation (vinfo, stmt_info, m1, > + gsi); > + } > + > gcall *call > - = gimple_build_call_internal (IFN_LEN_LOAD, 3, > + = gimple_build_call_internal (IFN_LEN_LOAD, 4, > dataref_ptr, ptr, > - final_len); > + new_len, bias); > gimple_call_set_nothrow (call, true); > new_stmt = call; > data_ref = NULL_TREE; > > /* Need conversion if it's wrapped with VnQI. */ > - machine_mode vmode = TYPE_MODE (vectype); > - opt_machine_mode new_ovmode > - = get_len_load_store_mode (vmode, true); > - machine_mode new_vmode = new_ovmode.require (); > if (vmode != new_vmode) > { > - tree qi_type = unsigned_intQI_type_node; > - tree new_vtype > - = build_vector_type_for_mode (qi_type, new_vmode); > tree var = vect_get_new_ssa_name (new_vtype, > vect_simple_var); > gimple_set_lhs (call, var);
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 75f24e7c4f6..f79222daeb6 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -1170,6 +1170,9 @@ vect_verify_loop_lens (loop_vec_info loop_vinfo) if (LOOP_VINFO_LENS (loop_vinfo).is_empty ()) return false; + if (LOOP_VINFO_LENS (loop_vinfo).length () > 1) + return false; + which could be made a hook, eventually. FWIW this is sufficient to make bootstrap, regtest and compiling the SPEC suites succeed. I'm unsure