From patchwork Wed Sep 20 02:30:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Pan2" <pan2.li@intel.com>
X-Patchwork-Id: 76418
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id C38C23858C1F
	for <patchwork@sourceware.org>; Wed, 20 Sep 2023 02:31:28 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.93])
 by sourceware.org (Postfix) with ESMTPS id 945043858D20
 for <gcc-patches@gcc.gnu.org>; Wed, 20 Sep 2023 02:31:06 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 945043858D20
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1695177066; x=1726713066;
 h=from:to:cc:subject:date:message-id:mime-version:
 content-transfer-encoding;
 bh=ME8ZRoelTthAGedlQH12YVxpXo51AcF57kofhGvbeYE=;
 b=M4n+XTsVPmnUbu2hJ29kyW+2ME2gtXTqaqBOiqyFVfOlX7Cgdu7CFTWe
 IBSzdFtBRW1T5HZ5tfu3ByjtxbK0gTzjo4qjrZX4WCzUKrLEX/lebl5tv
 JBQ3rWaD+yM/9b//vLxr0b0jiwvcSqLQzt89CjPqCNeFYCedUtABcuSzF
 HNSGQxs6k1mZjP1nMScAU+LzX6TTgdu5RpHTEAPCa6Rp7/wYrv5YF35BK
 4J7lo4roEb/JwBJUAQ23hQG+eBvrm6dnXoUUTVhUo2PXtsWFe/sUI5ZfS
 IjlZTsuRRY/6aZxVei55v1ycLQD5AyWulFuthF48MW+0DqJZwlKJRQZxK w==;
X-IronPort-AV: E=McAfee;i="6600,9927,10838"; a="377417282"
X-IronPort-AV: E=Sophos;i="6.02,160,1688454000"; d="scan'208";a="377417282"
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 19 Sep 2023 19:31:04 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10838"; a="861791345"
X-IronPort-AV: E=Sophos;i="6.02,160,1688454000"; d="scan'208";a="861791345"
Received: from shvmail02.sh.intel.com ([10.239.244.9])
 by fmsmga002.fm.intel.com with ESMTP; 19 Sep 2023 19:31:03 -0700
Received: from pli-ubuntu.sh.intel.com (pli-ubuntu.sh.intel.com
 [10.239.159.47])
 by shvmail02.sh.intel.com (Postfix) with ESMTP id 4C27E10056A1;
 Wed, 20 Sep 2023 10:31:01 +0800 (CST)
From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai, pan2.li@intel.com, yanzhang.wang@intel.com,
 kito.cheng@gmail.com
Subject: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization
Date: Wed, 20 Sep 2023 10:30:59 +0800
Message-Id: <20230920023059.1728132-1-pan2.li@intel.com>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 KAM_ASCII_DIVIDERS, KAM_SHORT, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org

From: Pan Li <pan2.li@intel.com>

This patch adds auto-vectorization support for both ceil and ceilf
from math.h. It depends on the -ffast-math option.

When ceil/ceilf is called as v2 = ceil (v1), we convert the call into
the insns below (following the LLVM implementation).

* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3

The conditional auto-vectorization for ceil/ceilf is also supported
and covered by test cases.

Before this patch:
math-ceil-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    ceilf
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi   3
.L4:
  vsetvli a5,a2,e32,m1,ta,ma
  vle32.v v1,0(a1)
  vsetvli a3,zero,e32,m1,ta,ma
  slli    a4,a5,2
  vfcvt.x.f.v     v1,v1
  sub     a2,a2,a5
  vfcvt.f.x.v     v1,v1
  vsetvli zero,a5,e32,m1,ta,ma
  vse32.v v1,0(a0)
  add     a1,a1,a4
  add     a0,a0,a4
  bne     a2,zero,.L4
.L14:
  fsrm    a6
  ret

Please note that VLS mode is not involved in this patch and will be
taken care of in follow-up patches soon.

gcc/ChangeLog:

	* config/riscv/autovec.md (ceil<mode>2): New pattern.
	* config/riscv/riscv-protos.h (enum insn_flags): Add FRM_RUP_P.
	(enum insn_type): Add UNARY_OP_FRM_RUP.
	* config/riscv/riscv-v.cc: Handle rounding up.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-4.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/test-math.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec.md                   | 30 +++++++++++++
 gcc/config/riscv/riscv-protos.h               |  4 ++
 gcc/config/riscv/riscv-v.cc                   |  2 +
 .../riscv/rvv/autovec/math-ceil-1.c           | 21 +++++++++
 .../riscv/rvv/autovec/math-ceil-2.c           | 21 +++++++++
 .../riscv/rvv/autovec/math-ceil-3.c           | 24 ++++++++++
 .../riscv/rvv/autovec/math-ceil-4.c           | 24 ++++++++++
 .../riscv/rvv/autovec/math-ceil-run-1.c       | 24 ++++++++++
 .../riscv/rvv/autovec/math-ceil-run-2.c       | 24 ++++++++++
 .../gcc.target/riscv/rvv/autovec/test-math.h  | 45 +++++++++++++++++++
 10 files changed, 219 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 493d5745485..ea508d81047 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2374,3 +2374,33 @@ (define_expand "<u>avg<v_double_trunc>3_ceil"
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
   DONE;
 })
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] Math.h.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ceil/ceilf
+;; -------------------------------------------------------------------------
+(define_expand "ceil<mode>2"
+  [(match_operand:VF 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    rtx tmp = gen_reg_rtx (<VCONVERT>mode);
+    rtx ops_1[] = {tmp, operands[1]};
+    insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, <MODE>mode);
+
+    /* vfcvt.x.f with rounding up (aka ceil).  */
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_1);
+
+    rtx ops_2[] = {operands[0], tmp};
+    icode = code_for_pred (FLOAT, <MODE>mode);
+
+    /* vfcvt.f.x for the final result.  To avoid unnecessary frm register
+       access, we use RUP here and it will never do the rounding up because
+       the tmp rtx comes from the float to int conversion.  */
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_2);
+
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5a2d218d67b..833f1efbaf4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -250,6 +250,9 @@ enum insn_flags : unsigned int
   /* flags for the floating-point rounding mode.  */
   /* Means INSN has FRM operand and the value is FRM_DYN.  */
   FRM_DYN_P = 1 << 15,
+
+  /* Means INSN has FRM operand and the value is FRM_RUP.  */
+  FRM_RUP_P = 1 << 16,
 };
 
 enum insn_type : unsigned int
@@ -290,6 +293,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
+  UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
 
   /* Binary operator.  */
   BINARY_OP = __NORMAL_OP | BINARY_OP_P,
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a9287e5d671..4192f988648 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -323,6 +323,8 @@ public:
     /* Add rounding mode operand.  */
     if (m_insn_flags & FRM_DYN_P)
       add_rounding_mode_operand (FRM_DYN);
+    if (m_insn_flags & FRM_RUP_P)
+      add_rounding_mode_operand (FRM_RUP);
 
     gcc_assert (insn_data[(int) icode].n_operands == m_opno);
     expand (icode, any_mem_p);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
new file mode 100644
index 00000000000..8f0f09609eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float_ceilf:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+3
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ...
+*/
+TEST_CEIL(float, ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
new file mode 100644
index 00000000000..73395d30d7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double_ceil:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+3
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ...
+*/
+TEST_CEIL(double, ceil)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
new file mode 100644
index 00000000000..eb0f3a3db78
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float_ceilf:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+3
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma
+**   ...
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ...
+*/
+TEST_COND_CEIL(float, ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
new file mode 100644
index 00000000000..b9a3c8ebf84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double_ceil:
+**   frrm\s+[atx][0-9]+
+**   ...
+**   fsrmi\s+3
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+**   ...
+**   vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+**   ...
+**   vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+**   ...
+**   fsrm\s+[atx][0-9]+
+**   ...
+*/
+TEST_COND_CEIL(double, ceil)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
new file mode 100644
index 00000000000..014c4c3ac0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -lm" } */
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+float out[ARRAY_SIZE];
+float ref[ARRAY_SIZE];
+
+// Test function declaration
+TEST_CEIL(float, ceilf)
+TEST_INIT(float)
+TEST_ASSERT(float)
+
+int
+main ()
+{
+  test_float_init (in, ref, ARRAY_SIZE);
+  test_float_ceilf (out, in, ARRAY_SIZE);
+  test_float_assert (out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
new file mode 100644
index 00000000000..ae361e11144
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -lm" } */
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+double out[ARRAY_SIZE];
+double ref[ARRAY_SIZE];
+
+// Test function declaration
+TEST_CEIL(double, ceil)
+TEST_INIT(double)
+TEST_ASSERT(double)
+
+int
+main ()
+{
+  test_double_init (in, ref, ARRAY_SIZE);
+  test_double_ceil (out, in, ARRAY_SIZE);
+  test_double_assert (out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
new file mode 100644
index 00000000000..57dd5e0e460
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
@@ -0,0 +1,45 @@
+#include <math.h>
+
+#define TEST_CEIL(TYPE, CALL) \
+  void test_##TYPE##_##CALL (TYPE *out, TYPE *in, unsigned count) \
+  {                                                               \
+    for (unsigned i = 0; i < count; i++)                          \
+      out[i] = CALL (in[i]);                                      \
+  }
+
+#define TEST_COND_CEIL(TYPE, CALL) \
+  void test_##TYPE##_##CALL (TYPE *out, int *cond, TYPE *in, unsigned count) \
+  {                                                                          \
+    for (unsigned i = 0; i < count; i++)                                     \
+      out[i] = cond[i] ? CALL (in[i]) : in[i];                               \
+  }
+
+#define TEST_INIT(TYPE)                                        \
+  void test_##TYPE##_init (TYPE *in, TYPE *ref, unsigned size) \
+  {                                                            \
+    for (unsigned i = 0; i < size; i++)                        \
+      {                                                        \
+	TYPE tmp = (TYPE)i;                                    \
+                                                               \
+	if (i % 2 == 0)                                        \
+	  {                                                    \
+	    in[i] = 1.5f + (TYPE)i;                            \
+	    ref[i] = (TYPE)(i + 2);                            \
+	  }                                                    \
+	else                                                   \
+	  {                                                    \
+	    in[i] = (TYPE)i;                                   \
+	    ref[i] = (TYPE)i;                                  \
+	  }                                                    \
+      }                                                        \
+  }
+
+#define TEST_ASSERT(TYPE)                                         \
+  void test_##TYPE##_assert (TYPE *out, TYPE *ref, unsigned size) \
+  {                                                               \
+    for (unsigned i = 0; i < size; i++)                           \
+      {                                                           \
+	if (out[i] != ref[i])                                     \
+	  __builtin_abort ();                                     \
+      }                                                           \
+  }