From patchwork Wed Jan 10 15:45:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
X-Patchwork-Id: 83753
X-Patchwork-Delegate: rdapp.gcc@gmail.com
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 3B16E385802F
	for <patchwork@sourceware.org>; Wed, 10 Jan 2024 15:45:57 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from smtpbguseast1.qq.com (smtpbguseast1.qq.com [54.204.34.129])
 by sourceware.org (Postfix) with ESMTPS id 11B483858C41
 for <gcc-patches@gcc.gnu.org>; Wed, 10 Jan 2024 15:45:28 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 11B483858C41
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=rivai.ai
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 11B483858C41
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=54.204.34.129
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1704901532; cv=none;
 b=i/RrVxHOZyg0reqV99MF4rrMgyEYTtCrTx/F94ibbpKsm/TiGj2Eqcbewx2Fh17CrztGNgaRswtad02wm9J4dRDbBbvU+DPBC1p+uGNAExnP2fb0FzE2+fAxFPUKxWHMysZH5NU5pKbzgkkeT/PnhSbC6HmuhyapgIWK8J/o9dg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1704901532; c=relaxed/simple;
 bh=3c1xMx8rM6TFSOW2xlFZTffF0RBx6JX4OIUpVlSKIGE=;
 h=From:To:Subject:Date:Message-Id:MIME-Version;
 b=fHIZBMciEA6QgmxM70Msgpm1QSC51iwbZi//sKYK0LPC+lR0eOOw0r0rFhepCNslEd/7viNXPMjaMBkBOAdq80JhiX9SP0a/0TRfmCfqfD7Zf6628LyQ/KqrObB7Q2iF6So1IcalYrG5GBDsV9hfnpNMkdkCUS+nExWA4hjwDOo=
ARC-Authentication-Results: i=1; server2.sourceware.org
X-QQ-mid: bizesmtp79t1704901523tzulz40j
Received: from rios-cad121.hadoop.rioslab.org ( [58.60.1.9])
 by bizesmtp.qq.com (ESMTP) with
 id ; Wed, 10 Jan 2024 23:45:22 +0800 (CST)
X-QQ-SSF: 01400000000000G0V000000A0000000
X-QQ-FEAT: cbck7jzG4wbdGZMsQUaWj9Vd8+/Kc7FW85zYHEdUYnDs0j6el4FQJnHue7qPy
 E2Dv6IpRBDjZ6SM5XzXL8bhmSlrBJSIaeH+SE4HPzxn7Yx7HHTu6eUQFGL16YIgpMpFzfAK
 OBdJJKK+kvg39LptHsslhG5t6VmYjpG+yC96286fSXBZZYFDKRp7bm5CXjczzKGT7XdVtrs
 Gi3n4LmUYZ9MSje6RDM/Tz9IaiEqeXVa0u9nA8lSe5kysucsLrrPvV9rsb0ro5Qolha+eh3
 ceuL7ulekQN/hrCav3/FHwNZmo7k/4oGlfUYocHyNgTLujwkXvV6sE18x/ORQPjtCi1i+Hr
 PvW0dnZr3gthCjcdijthX2B75tKOZj4gGovLUP6oTgOfbrMCoqeBTxxiuSR1QsUcUUtxRox
 go06Ji3CACA=
X-QQ-GoodBg: 2
X-BIZMAIL-ID: 5277267550747141958
From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com,
 rdapp.gcc@gmail.com, Juzhe-Zhong <juzhe.zhong@rivai.ai>
Subject: [PATCH V2] RISC-V: Switch RVV cost model.
Date: Wed, 10 Jan 2024 23:45:21 +0800
Message-Id: <20240110154521.146111-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
MIME-Version: 1.0
X-QQ-SENDSIZE: 520
Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0
X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS,
 SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org

This patch is preparing patch for the following cost model tweak.

Since we don't have vector cost model in default tune info (rocket),
we set the cost model default as generic cost model by default.

The reason we want to switch to generic vector cost model is the default
cost model generates inferior codegen for various benchmarks.

For example, PR113247, we have performance bug that we end up having over 70%
performance drop of SHA256.  Currently, no matter how we adapt cost model,
we are not able to fix the performance bug since we always use default cost model by default.

Also, tweak the generic cost model back to default cost model since we have some FAILs in
current tests.

After this patch, we (me an Robin) can work on cost model tunning together to improve performane
in various benchmarks.

Tested on both RV32 and RV64, ok for trunk ?

gcc/ChangeLog:

	* config/riscv/riscv.cc (get_common_costs): Switch RVV cost model.
	(get_vector_costs): Ditto.
	(riscv_builtin_vectorization_cost): Ditto.
---
 gcc/config/riscv/riscv.cc | 144 ++++++++++++++++++++------------------
 1 file changed, 75 insertions(+), 69 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32183d63180..cca01fd54d9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -352,48 +352,49 @@ const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   VD_REGS,	VD_REGS,	VD_REGS,	VD_REGS,
 };
 
-/* Generic costs for VLS vector operations.   */
-static const common_vector_cost generic_vls_vector_cost = {
+/* RVV costs for VLS vector operations.   */
+static const common_vector_cost rvv_vls_vector_cost = {
   1, /* int_stmt_cost  */
   1, /* fp_stmt_cost  */
   1, /* gather_load_cost  */
   1, /* scatter_store_cost  */
-  2, /* vec_to_scalar_cost  */
+  1, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
-  2, /* permute_cost  */
+  1, /* permute_cost  */
   1, /* align_load_cost  */
   1, /* align_store_cost  */
-  1, /* unalign_load_cost  */
-  1, /* unalign_store_cost  */
+  2, /* unalign_load_cost  */
+  2, /* unalign_store_cost  */
 };
 
-/* Generic costs for VLA vector operations.  */
-static const scalable_vector_cost generic_vla_vector_cost = {
+/* RVV costs for VLA vector operations.  */
+static const scalable_vector_cost rvv_vla_vector_cost = {
   {
     1, /* int_stmt_cost  */
     1, /* fp_stmt_cost  */
     1, /* gather_load_cost  */
     1, /* scatter_store_cost  */
-    2, /* vec_to_scalar_cost  */
+    1, /* vec_to_scalar_cost  */
     1, /* scalar_to_vec_cost  */
-    2, /* permute_cost  */
+    1, /* permute_cost  */
     1, /* align_load_cost  */
     1, /* align_store_cost  */
-    1, /* unalign_load_cost  */
-    1, /* unalign_store_cost  */
+    2, /* unalign_load_cost  */
+    2, /* unalign_store_cost  */
   },
 };
 
-/* Generic costs for vector insn classes.  */
+/* Generic costs for vector insn classes.  It is supposed to be the vector cost
+   models used by default if no other cost model was specified.  */
 static const struct cpu_vector_cost generic_vector_cost = {
-  1,			    /* scalar_int_stmt_cost  */
-  1,			    /* scalar_fp_stmt_cost  */
-  1,			    /* scalar_load_cost  */
-  1,			    /* scalar_store_cost  */
-  3,			    /* cond_taken_branch_cost  */
-  1,			    /* cond_not_taken_branch_cost  */
-  &generic_vls_vector_cost, /* vls  */
-  &generic_vla_vector_cost, /* vla */
+  1,			/* scalar_int_stmt_cost  */
+  1,			/* scalar_fp_stmt_cost  */
+  1,			/* scalar_load_cost  */
+  1,			/* scalar_store_cost  */
+  3,			/* cond_taken_branch_cost  */
+  1,			/* cond_not_taken_branch_cost  */
+  &rvv_vls_vector_cost, /* vls  */
+  &rvv_vla_vector_cost, /* vla */
 };
 
 /* Costs to use when optimizing for rocket.  */
@@ -10372,11 +10373,10 @@ riscv_frame_pointer_required (void)
   return riscv_save_frame_pointer && !crtl->is_leaf;
 }
 
-/* Return the appropriate common costs for vectors of type VECTYPE.  */
+/* Return the appropriate common costs according to VECTYPE from COSTS.  */
 static const common_vector_cost *
-get_common_costs (tree vectype)
+get_common_costs (const cpu_vector_cost *costs, tree vectype)
 {
-  const cpu_vector_cost *costs = tune_param->vec_costs;
   gcc_assert (costs);
 
   if (vectype && riscv_v_ext_vls_mode_p (TYPE_MODE (vectype)))
@@ -10384,78 +10384,84 @@ get_common_costs (tree vectype)
   return costs->vla;
 }
 
+/* Return the CPU vector costs according to -mtune if tune info has non-NULL
+   vector cost.  Otherwide, return the default generic vector costs.  */
+static const cpu_vector_cost *
+get_vector_costs ()
+{
+  const cpu_vector_cost *costs = tune_param->vec_costs;
+  if (!costs)
+    return &generic_vector_cost;
+  return costs;
+}
+
 /* Implement targetm.vectorize.builtin_vectorization_cost.  */
 
 static int
 riscv_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
 				  tree vectype, int misalign ATTRIBUTE_UNUSED)
 {
-  unsigned elements;
-  const cpu_vector_cost *costs = tune_param->vec_costs;
+  const cpu_vector_cost *costs = get_vector_costs ();
   bool fp = false;
 
   if (vectype != NULL)
     fp = FLOAT_TYPE_P (vectype);
 
-  if (costs != NULL)
+  const common_vector_cost *common_costs = get_common_costs (costs, vectype);
+  gcc_assert (common_costs != NULL);
+  switch (type_of_cost)
     {
-      const common_vector_cost *common_costs = get_common_costs (vectype);
-      gcc_assert (common_costs != NULL);
-      switch (type_of_cost)
-	{
-	case scalar_stmt:
-	  return fp ? costs->scalar_fp_stmt_cost : costs->scalar_int_stmt_cost;
+    case scalar_stmt:
+      return fp ? costs->scalar_fp_stmt_cost : costs->scalar_int_stmt_cost;
 
-	case scalar_load:
-	  return costs->scalar_load_cost;
+    case scalar_load:
+      return costs->scalar_load_cost;
 
-	case scalar_store:
-	  return costs->scalar_store_cost;
+    case scalar_store:
+      return costs->scalar_store_cost;
 
-	case vector_stmt:
-	  return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
+    case vector_stmt:
+      return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
 
-	case vector_load:
-	  return common_costs->align_load_cost;
+    case vector_load:
+      return common_costs->align_load_cost;
 
-	case vector_store:
-	  return common_costs->align_store_cost;
+    case vector_store:
+      return common_costs->align_store_cost;
 
-	case vec_to_scalar:
-	  return common_costs->vec_to_scalar_cost;
+    case vec_to_scalar:
+      return common_costs->vec_to_scalar_cost;
 
-	case scalar_to_vec:
-	  return common_costs->scalar_to_vec_cost;
+    case scalar_to_vec:
+      return common_costs->scalar_to_vec_cost;
 
-	case unaligned_load:
-	  return common_costs->unalign_load_cost;
-	case vector_gather_load:
-	  return common_costs->gather_load_cost;
+    case unaligned_load:
+      return common_costs->unalign_load_cost;
+    case vector_gather_load:
+      return common_costs->gather_load_cost;
 
-	case unaligned_store:
-	  return common_costs->unalign_store_cost;
-	case vector_scatter_store:
-	  return common_costs->scatter_store_cost;
+    case unaligned_store:
+      return common_costs->unalign_store_cost;
+    case vector_scatter_store:
+      return common_costs->scatter_store_cost;
 
-	case cond_branch_taken:
-	  return costs->cond_taken_branch_cost;
+    case cond_branch_taken:
+      return costs->cond_taken_branch_cost;
 
-	case cond_branch_not_taken:
-	  return costs->cond_not_taken_branch_cost;
+    case cond_branch_not_taken:
+      return costs->cond_not_taken_branch_cost;
 
-	case vec_perm:
-	  return common_costs->permute_cost;
+    case vec_perm:
+      return common_costs->permute_cost;
 
-	case vec_promote_demote:
-	  return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
+    case vec_promote_demote:
+      return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
 
-	case vec_construct:
-	  elements = estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
-	  return elements / 2 + 1;
+    case vec_construct:
+      return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype)) - 1;
 
-	default:
-	  gcc_unreachable ();
-	}
+    default:
+      gcc_unreachable ();
     }
 
   return default_builtin_vectorization_cost (type_of_cost, vectype, misalign);