[v6,02/11] Implement recording/getting of mask/length for BB SLP

Message ID: 20251206165518.5449-3-chris.bazley@arm.com
State: Superseded
Series: Extend BB SLP vectorization to use predicated tails

Commit Message

Christopher Bazley Dec. 6, 2025, 4:55 p.m. UTC
  Add two new fields to SLP tree nodes, which are accessed as
SLP_TREE_CAN_USE_PARTIAL_VECTORS_P and SLP_TREE_PARTIAL_VECTORS_STYLE.

SLP_TREE_CAN_USE_PARTIAL_VECTORS_P is analogous to the existing
predicate LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. It is initialized to
true. This flag only records whether the target could vectorize a
node using a partial vector; it says nothing about whether the
vector actually is partial, or how the target would support use of a
partial vector. Some kinds of node require a mask or length for
partial vectors; others (e.g. add operations) don't, and for those
SLP_TREE_CAN_USE_PARTIAL_VECTORS_P remains true without any style
being recorded.

SLP_TREE_PARTIAL_VECTORS_STYLE is analogous to the existing field
LOOP_VINFO_PARTIAL_VECTORS_STYLE. Both are initialized to 'none'.
The vect_partial_vectors_avx512 enumerator is not used for BB SLP.
Unlike loop vectorization, a different style of partial vectors can be
chosen for each node during analysis of that node.

Implement the recently-introduced wrapper functions,
vect_record_(len|mask), for BB SLP by setting
SLP_TREE_PARTIAL_VECTORS_STYLE to indicate that a mask or length should
be used for a given SLP node. The passed-in vec_info is ignored.

Implement the vect_fully_(masked|with_length)_p wrapper functions for
BB SLP by checking the SLP_TREE_PARTIAL_VECTORS_STYLE. This should be
sufficient because at most one of vect_record_(len|mask) and
vect_cannot_use_partial_vectors is expected to be called for any
given SLP node. SLP_TREE_CAN_USE_PARTIAL_VECTORS_P should be true if
the style is not 'none', but its value isn't used beyond the analysis
phase.

The implementations of vect_get_mask and vect_get_len for BB SLP are
non-trivial (albeit simpler than for loop vectorization), so they are
delegated to SLP-specific functions defined in tree-vect-slp.cc.

Implement the vect_cannot_use_partial_vectors wrapper function by
setting the SLP_TREE_CAN_USE_PARTIAL_VECTORS_P flag to false.
To prevent regressions, vect_can_use_partial_vectors_p still
unconditionally returns false for BB SLP (for now), which prevents
vect_record_mask and vect_record_len from being called.

gcc/ChangeLog:

	* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize new
	partial_vector_style and can_use_partial_vectors members.
	(vect_slp_record_bb_style): Set the partial vector style of an
	SLP node, checking that the style does not flip-flop between
	mask and length.
	(vect_slp_record_bb_mask): Use vect_slp_record_bb_style to set
	the partial vectors style of the SLP tree node to
	vect_partial_vectors_while_ult.
	(vect_slp_get_bb_mask): New function to materialize a mask for
	basic block SLP vectorization.
	(vect_slp_record_bb_len): Use vect_slp_record_bb_style to set
	the partial vectors style of the SLP tree node to
	vect_partial_vectors_len.
	(vect_slp_get_bb_len): New function to materialize a length for
	basic block SLP vectorization.
	* tree-vect-stmts.cc (vect_record_mask): Handle the basic block
	SLP use case by delegating to vect_slp_record_bb_mask.
	(vect_get_mask): Handle the basic block SLP use case by
	delegating to vect_slp_get_bb_mask.
	(vect_record_len): Handle the basic block SLP use case by
	delegating to vect_slp_record_bb_len.
	(vect_get_len): Handle the basic block SLP use case by
	delegating to vect_slp_get_bb_len.
	(vect_gen_while_ssa_name): New function containing code
	refactored out of vect_gen_while for reuse by
	vect_slp_get_bb_mask.
	(vect_gen_while): Use vect_gen_while_ssa_name instead of custom
	code for some of the implementation.
	* tree-vectorizer.h (enum vect_partial_vector_style): Move this
	definition earlier to allow reuse by struct _slp_tree.
	(struct _slp_tree): Add a partial_vector_style member to record
	whether to use a length or mask for the SLP tree node, if
	partial vectors are required and supported.
	Add a can_use_partial_vectors member to record whether partial
	vectors are supported for the SLP tree node.
	(SLP_TREE_PARTIAL_VECTORS_STYLE): New member accessor macro.
	(SLP_TREE_CAN_USE_PARTIAL_VECTORS_P): New member accessor macro.
	(vect_gen_while_ssa_name): Declare.
	(vect_slp_get_bb_mask): Likewise.
	(vect_slp_get_bb_len): Likewise.
	(vect_cannot_use_partial_vectors): Handle the basic block SLP
	use case by setting SLP_TREE_CAN_USE_PARTIAL_VECTORS_P to
	false.
	(vect_fully_with_length_p): Handle the basic block SLP use
	case by checking whether the SLP_TREE_PARTIAL_VECTORS_STYLE is
	vect_partial_vectors_len.
	(vect_fully_masked_p): Handle the basic block SLP use case by
	checking whether the SLP_TREE_PARTIAL_VECTORS_STYLE is
	vect_partial_vectors_while_ult.

---
 gcc/tree-vect-slp.cc   | 121 +++++++++++++++++++++++++++++++++++++++++
 gcc/tree-vect-stmts.cc |  51 ++++++++++-------
 gcc/tree-vectorizer.h  |  44 +++++++++------
 3 files changed, 179 insertions(+), 37 deletions(-)
  

Patch

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 658ad6dc257..57adef2b5d1 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -124,6 +124,8 @@  _slp_tree::_slp_tree ()
   SLP_TREE_GS_BASE (this) = NULL_TREE;
   this->ldst_lanes = false;
   this->avoid_stlf_fail = false;
+  SLP_TREE_PARTIAL_VECTORS_STYLE (this) = vect_partial_vectors_none;
+  SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (this) = true;
   SLP_TREE_VECTYPE (this) = NULL_TREE;
   SLP_TREE_REPRESENTATIVE (this) = NULL;
   this->cycle_info.id = -1;
@@ -12519,3 +12521,122 @@  vect_schedule_slp (vec_info *vinfo, const vec<slp_instance> &slp_instances)
         }
     }
 }
+
+/* Record that a specific partial vector style could be used to vectorize
+   SLP_NODE if required.  */
+
+static void
+vect_slp_record_bb_style (slp_tree slp_node, vect_partial_vector_style style)
+{
+  gcc_assert (style != vect_partial_vectors_none);
+  gcc_assert (style != vect_partial_vectors_avx512);
+
+  if (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) == vect_partial_vectors_none)
+    SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) = style;
+  else
+    gcc_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node) == style);
+}
+
+/* Record that a mask could be used to vectorize SLP_NODE if
+   required.  */
+
+void
+vect_slp_record_bb_mask (slp_tree slp_node)
+{
+  vect_slp_record_bb_style (slp_node, vect_partial_vectors_while_ult);
+}
+
+/* Materialize mask number INDEX for a group of scalar stmts in SLP_NODE that
+   operate on NVECTORS vectors of type VECTYPE, where 0 <= INDEX < NVECTORS.
+   Masking is only required for the tail, therefore NULL_TREE is returned for
+   every value of INDEX except the last.  Insert any set-up statements before
+   GSI.  */
+
+tree
+vect_slp_get_bb_mask (slp_tree slp_node, gimple_stmt_iterator *gsi,
+		      unsigned int nvectors, tree vectype, unsigned int index)
+{
+  gcc_checking_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
+		       == vect_partial_vectors_while_ult);
+
+  /* Only the last vector can be a partial vector.  */
+  if (index < nvectors - 1)
+    return NULL_TREE;
+
+  /* vect_get_num_copies only allows a partial vector if it is the only
+     vector.  */
+  if (nvectors > 1)
+    return NULL_TREE;
+
+  gcc_checking_assert (nvectors == 1);
+
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int group_size = SLP_TREE_LANES (slp_node);
+
+  /* A single vector can be a full vector, in which case no mask is
+     needed.  */
+  if (known_eq (nunits, group_size))
+    return NULL_TREE;
+
+  /* Return a mask for a single partial vector.
+     FORNOW: don't bother maintaining a set of mask constants to allow
+     sharing between nodes belonging to the same instance of bb_vec_info.  */
+  gcc_checking_assert (known_le (group_size, nunits));
+  gimple_seq stmts = NULL;
+  tree cmp_type = size_type_node;
+  tree start_index = build_zero_cst (cmp_type);
+  tree end_index = build_int_cst (cmp_type, group_size);
+  tree masktype = truth_type_for (vectype);
+  tree mask = make_temp_ssa_name (masktype, NULL, "slp_mask");
+  vect_gen_while_ssa_name (&stmts, masktype, start_index, end_index, mask);
+  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+
+  return mask;
+}
+
+/* Record that a length limit could be used to vectorize SLP_NODE if
+   required.  */
+
+void
+vect_slp_record_bb_len (slp_tree slp_node)
+{
+  vect_slp_record_bb_style (slp_node, vect_partial_vectors_len);
+}
+
+/* Materialize length number INDEX for a group of scalar stmts in SLP_NODE that
+   operate on NVECTORS vectors of type VECTYPE, where 0 <= INDEX < NVECTORS.  A
+   length limit is only required for the tail, therefore NULL_TREE is returned
+   for every value of INDEX except the last; otherwise, return a value that
+   contains FACTOR multiplied by the number of elements that should be
+   processed.  */
+
+tree
+vect_slp_get_bb_len (slp_tree slp_node, unsigned int nvectors, tree vectype,
+		     unsigned int index, unsigned int factor)
+{
+  gcc_checking_assert (SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
+		       == vect_partial_vectors_len);
+
+  /* Only the last vector can be a partial vector.  */
+  if (index < nvectors - 1)
+    return NULL_TREE;
+
+  /* vect_get_num_copies only allows a partial vector if it is the only
+     vector.  */
+  if (nvectors > 1)
+    return NULL_TREE;
+
+  gcc_checking_assert (nvectors == 1);
+
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int group_size = SLP_TREE_LANES (slp_node);
+
+  /* A single vector can be a full vector, in which case no length limit is
+     needed.  */
+  if (known_eq (nunits, group_size))
+    return NULL_TREE;
+
+  /* Return the scaled length of a single partial vector.  */
+  gcc_checking_assert (known_lt (group_size, nunits));
+  return size_int (group_size * factor);
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 078cc63b2d9..bd63451b467 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1387,7 +1387,9 @@  vectorizable_internal_function (combined_fn cfn, tree fndecl,
 /* Record that a complete set of masks associated with VINFO would need to
    contain a sequence of NVECTORS masks that each control a vector of type
    VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
-   these vector masks with the vector version of SCALAR_MASK.  */
+   these vector masks with the vector version of SCALAR_MASK.  Alternatively,
+   if doing basic block vectorization, record that a mask could be used to
+   vectorize SLP_NODE if required.  */
 static void
 vect_record_mask (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
 		  tree vectype, tree scalar_mask)
@@ -1397,7 +1399,7 @@  vect_record_mask (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
     vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo), nvectors,
 			   vectype, scalar_mask);
   else
-    (void) slp_node; /* FORNOW */
+    vect_slp_record_bb_mask (slp_node);
 }
 
 /* Given a complete set of masks associated with VINFO, extract mask number
@@ -1415,16 +1417,15 @@  vect_get_mask (vec_info *vinfo, slp_tree slp_node, gimple_stmt_iterator *gsi,
     return vect_get_loop_mask (loop_vinfo, gsi, &LOOP_VINFO_MASKS (loop_vinfo),
 			       nvectors, vectype, index);
   else
-    {
-      (void) slp_node; /* FORNOW */
-      return NULL_TREE;
-    }
+    return vect_slp_get_bb_mask (slp_node, gsi, nvectors, vectype, index);
 }
 
 /* Record that a complete set of lengths associated with VINFO would need to
    contain a sequence of NVECTORS lengths for controlling an operation on
    VECTYPE.  The operation splits each element of VECTYPE into FACTOR separate
-   subelements, measuring the length as a number of these subelements.  */
+   subelements, measuring the length as a number of these subelements.
+   Alternatively, if doing basic block vectorization, record that a length limit
+   could be used to vectorize SLP_NODE if required.  */
 static void
 vect_record_len (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
 		 tree vectype, unsigned int factor)
@@ -1434,7 +1435,7 @@  vect_record_len (vec_info *vinfo, slp_tree slp_node, unsigned int nvectors,
     vect_record_loop_len (loop_vinfo, &LOOP_VINFO_LENS (loop_vinfo), nvectors,
 			  vectype, factor);
   else
-    (void) slp_node; /* FORNOW */
+    vect_slp_record_bb_len (slp_node);
 }
 
 /* Given a complete set of lengths associated with VINFO, extract length number
@@ -1455,10 +1456,7 @@  vect_get_len (vec_info *vinfo, slp_tree slp_node, gimple_stmt_iterator *gsi,
     return vect_get_loop_len (loop_vinfo, gsi, &LOOP_VINFO_LENS (loop_vinfo),
 			      nvectors, vectype, index, factor);
   else
-    {
-      (void) slp_node; /* FORNOW */
-      return NULL_TREE;
-    }
+    return vect_slp_get_bb_len (slp_node, nvectors, vectype, index, factor);
 }
 
 static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
@@ -14499,24 +14497,35 @@  supportable_indirect_convert_operation (code_helper code,
    mask[I] is true iff J + START_INDEX < END_INDEX for all J <= I.
    Add the statements to SEQ.  */
 
+void
+vect_gen_while_ssa_name (gimple_seq *seq, tree mask_type, tree start_index,
+			 tree end_index, tree ssa_name)
+{
+  tree cmp_type = TREE_TYPE (start_index);
+  gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT, cmp_type,
+						       mask_type,
+						       OPTIMIZE_FOR_SPEED));
+  gcall *call
+    = gimple_build_call_internal (IFN_WHILE_ULT, 3, start_index, end_index,
+				  build_zero_cst (mask_type));
+  gimple_call_set_lhs (call, ssa_name);
+  gimple_seq_add_stmt (seq, call);
+}
+
+/* Like vect_gen_while_ssa_name, except that it creates a new SSA_NAME
+   of type MASK_TYPE and sets it as the lhs of the generated GIMPLE_CALL
+   statement.  If NAME is nonnull, it is used for the SSA_NAME in dumps.  */
+
 tree
 vect_gen_while (gimple_seq *seq, tree mask_type, tree start_index,
 		tree end_index, const char *name)
 {
-  tree cmp_type = TREE_TYPE (start_index);
-  gcc_checking_assert (direct_internal_fn_supported_p (IFN_WHILE_ULT,
-						       cmp_type, mask_type,
-						       OPTIMIZE_FOR_SPEED));
-  gcall *call = gimple_build_call_internal (IFN_WHILE_ULT, 3,
-					    start_index, end_index,
-					    build_zero_cst (mask_type));
   tree tmp;
   if (name)
     tmp = make_temp_ssa_name (mask_type, NULL, name);
   else
     tmp = make_ssa_name (mask_type);
-  gimple_call_set_lhs (call, tmp);
-  gimple_seq_add_stmt (seq, call);
+  vect_gen_while_ssa_name (seq, mask_type, start_index, end_index, tmp);
   return tmp;
 }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2d914dca90b..1830c29819a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -307,6 +307,13 @@  struct vect_load_store_data : vect_data {
   bool subchain_p; // VMAT_STRIDED_SLP and VMAT_GATHER_SCATTER
 };
 
+enum vect_partial_vector_style {
+  vect_partial_vectors_none,
+  vect_partial_vectors_while_ult,
+  vect_partial_vectors_avx512,
+  vect_partial_vectors_len
+};
+
 /* A computation tree of an SLP instance.  Each node corresponds to a group of
    stmts to be packed in a SIMD stmt.  */
 struct _slp_tree {
@@ -368,6 +375,13 @@  struct _slp_tree {
   /* For BB vect, flag to indicate this load node should be vectorized
      as to avoid STLF fails because of related stores.  */
   bool avoid_stlf_fail;
+  /* The style used for implementing partial vectors if LANES is less than
+     the minimum number of lanes implied by the VECTYPE.  */
+  vect_partial_vector_style partial_vector_style;
+  /* Flag to indicate whether we still have the option of vectorizing this node
+     using partial vectors (i.e. using lengths or masks to prevent use of
+     inactive scalar lanes).  */
+  bool can_use_partial_vectors;
 
   int vertex;
 
@@ -466,6 +480,8 @@  public:
 #define SLP_TREE_GS_BASE(S)			 (S)->gs_base
 #define SLP_TREE_REDUC_IDX(S)			 (S)->cycle_info.reduc_idx
 #define SLP_TREE_PERMUTE_P(S)			 ((S)->code == VEC_PERM_EXPR)
+#define SLP_TREE_PARTIAL_VECTORS_STYLE(S)	 (S)->partial_vector_style
+#define SLP_TREE_CAN_USE_PARTIAL_VECTORS_P(S)    (S)->can_use_partial_vectors
 
 inline vect_memory_access_type
 SLP_TREE_MEMORY_ACCESS_TYPE (slp_tree node)
@@ -476,13 +492,6 @@  SLP_TREE_MEMORY_ACCESS_TYPE (slp_tree node)
   return VMAT_UNINITIALIZED;
 }
 
-enum vect_partial_vector_style {
-    vect_partial_vectors_none,
-    vect_partial_vectors_while_ult,
-    vect_partial_vectors_avx512,
-    vect_partial_vectors_len
-};
-
 /* Key for map that records association between
    scalar conditions and corresponding loop mask, and
    is populated by vect_record_loop_mask.  */
@@ -2576,6 +2585,7 @@  extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
 extern void optimize_mask_stores (class loop*);
 extern tree vect_gen_while (gimple_seq *, tree, tree, tree,
 			    const char * = nullptr);
+extern void vect_gen_while_ssa_name (gimple_seq *, tree, tree, tree, tree);
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
 extern opt_result vect_get_vector_types_for_stmt (vec_info *,
 						  stmt_vec_info, tree *,
@@ -2760,6 +2770,12 @@  extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
 extern void vect_free_slp_tree (slp_tree);
 extern bool compatible_calls_p (gcall *, gcall *, bool);
 extern int vect_slp_child_index_for_operand (const gimple *, int op, bool);
+extern void vect_slp_record_bb_mask (slp_tree slp_node);
+extern tree vect_slp_get_bb_mask (slp_tree, gimple_stmt_iterator *,
+				  unsigned int, tree, unsigned int);
+extern void vect_slp_record_bb_len (slp_tree slp_node);
+extern tree vect_slp_get_bb_len (slp_tree, unsigned int, tree, unsigned int,
+				 unsigned int);
 extern tree prepare_vec_mask (vec_info *, tree, tree, tree,
 			      gimple_stmt_iterator *);
 extern tree vect_get_mask_load_else (int, tree);
@@ -2924,7 +2940,7 @@  vect_cannot_use_partial_vectors (vec_info *vinfo, slp_tree slp_node)
   if (loop_vinfo)
     LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
   else
-    (void) slp_node; /* FORNOW */
+    SLP_TREE_CAN_USE_PARTIAL_VECTORS_P (slp_node) = false;
 }
 
 /* Return true if VINFO is vectorizer state for loop vectorization, we've
@@ -2938,10 +2954,8 @@  vect_fully_with_length_p (vec_info *vinfo, slp_tree slp_node)
   if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
     return LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
   else
-    {
-      (void) slp_node; /* FORNOW */
-      return false;
-    }
+    return SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
+	   == vect_partial_vectors_len;
 }
 
 /* Return true if VINFO is vectorizer state for loop vectorization, we've
@@ -2955,10 +2969,8 @@  vect_fully_masked_p (vec_info *vinfo, slp_tree slp_node)
   if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
     return LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
   else
-    {
-      (void) slp_node; /* FORNOW */
-      return false;
-    }
+    return SLP_TREE_PARTIAL_VECTORS_STYLE (slp_node)
+	   == vect_partial_vectors_while_ult;
 }
 
 /* If STMT_INFO describes a reduction, return the vect_reduction_type