[v1] RISC-V: Fix bug for expand_const_vector interleave [PR118931]

Message ID 20250223031157.3014070-1-pan2.li@intel.com
State Superseded
Delegated to: Robin Dapp
Headers
Series [v1] RISC-V: Fix bug for expand_const_vector interleave [PR118931] |

Checks

Context Check Description
rivoscibot/toolchain-ci-rivos-lint success Lint passed
rivoscibot/toolchain-ci-rivos-apply-patch success Patch applied
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap success Build passed
rivoscibot/toolchain-ci-rivos-test fail Testing failed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed

Commit Message

Li, Pan2 Feb. 23, 2025, 3:11 a.m. UTC
  From: Pan Li <pan2.li@intel.com>

This patch would like to fix one bug when expanding const vector for the
interleave case.  For example, we have:

base1 = 151
step = 121

For vec_series, we will generate vector in format of v[i] = base + i * step.
Then the vec_series will have below result for HImode, and we can find
that the result overflow to the highest 8 bits of HImode.

v1.b = {151, 255, 7,  0, 119,  0, 231,  0, 87,  1, 199,  1, 55,   2, 167,   2}

Aka we expect v1.b should be:

v1.b = {151, 0, 7,  0, 119,  0, 231,  0, 87,  0, 199,  0, 55,   0, 167,   0}

After that it will perform the IOR with v2 for the base2(aka another series).

v2.b =  {0,  17, 0, 33,   0, 49,   0, 65,  0, 81,   0, 97,  0, 113,   0, 129}

Unfortunately, the base1 + i * step1 in HImode may overflow to the high
8 bits, and the high 8 bits will pollute the v2 and result in incorrect
value in const_vector.

This patch would like to perform the overflow to smode check before IOR
the base2 series, and perform the clean highest bit if the const_vector
overflow to smode occurs.  If no overflow, there will do nothing here.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

	PR target/118931

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (expand_const_vector): Add overflow to
	smode check and clean up highest bits if overflow.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/pr118931-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/riscv-v.cc                   | 31 ++++++++++++++++++-
 .../riscv/rvv/base/pr118931-run-1.c           | 19 ++++++++++++
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr118931-run-1.c
  

Comments

Robin Dapp Feb. 24, 2025, 8:29 a.m. UTC | #1
> This patch would like to perform the overflow to smode check before IOR
> the base2 series, and perform the clean highest bit if the const_vector
> overflow to smode occurs.  If no overflow, there will do nothing here.

I agree with the general idea but I'm a bit wary fiddling with the coefficients
directly.  I think for a fixed-size, non VLA vector it should be sufficient to
check whether the last step overflows.

For VLA we actually know the largest possible runtime vector length and could
use it the check?  But maybe it's probably easier to bail with VLA anyway.

In addition: if we're already checking the step anyway we could get rid of the
negative step restriction as well.
  
Li, Pan2 Feb. 24, 2025, 11:34 a.m. UTC | #2
Thanks Robin for comments.

> I agree with the general idea but I'm a bit wary fiddling with the coefficients
> directly.  I think for a fixed-size, non VLA vector it should be sufficient to
> check whether the last step overflows.

Initial idea I would like to take care of VLS and VLA in the same chunk of code, given
the expand_const_vector is sort of complicated currently(I don't want to make it worse by add more if VLA, else if VLS ... etc).

I will try to leverage poly_int mult, shift, and compare if possible, instead of touch the coefficients directly in v2.

> For VLA we actually know the largest possible runtime vector length and could
> use it the check?  But maybe it's probably easier to bail with VLA anyway.
> In addition: if we're already checking the step anyway we could get rid of the
> negative step restriction as well.

I don't explore more cases here consider we are in stage 4. I think the expand_const_vector need some
refactor up to a point. 

Pan

-----Original Message-----
From: Robin Dapp <rdapp.gcc@gmail.com> 
Sent: Monday, February 24, 2025 4:29 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: [PATCH v1] RISC-V: Fix bug for expand_const_vector interleave [PR118931]

> This patch would like to perform the overflow to smode check before IOR
> the base2 series, and perform the clean highest bit if the const_vector
> overflow to smode occurs.  If no overflow, there will do nothing here.

I agree with the general idea but I'm a bit wary fiddling with the coefficients
directly.  I think for a fixed-size, non VLA vector it should be sufficient to
check whether the last step overflows.

For VLA we actually know the largest possible runtime vector length and could
use it the check?  But maybe it's probably easier to bail with VLA anyway.

In addition: if we're already checking the step anyway we could get rid of the
negative step restriction as well.

-- 
Regards
 Robin
  
Robin Dapp Feb. 24, 2025, 11:43 a.m. UTC | #3
> I don't explore more cases here consider we are in stage 4. I think the
> expand_const_vector need some refactor up to a point.

I added the negative step check just some weeks ago and I'd see it as
simplification to remove the restriction again if you're touching the actual
step anyway.  So I wouldn't worry about stage 4 in that regard.
  
Li, Pan2 Feb. 24, 2025, 11:50 a.m. UTC | #4
> I added the negative step check just some weeks ago and I'd see it as
> simplification to remove the restriction again if you're touching the actual
> step anyway.  So I wouldn't worry about stage 4 in that regard.

Oh, I see. I'll have a try after this bug fix.

Pan 

-----Original Message-----
From: Robin Dapp <rdapp.gcc@gmail.com> 
Sent: Monday, February 24, 2025 7:44 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: [PATCH v1] RISC-V: Fix bug for expand_const_vector interleave [PR118931]

> I don't explore more cases here consider we are in stage 4. I think the
> expand_const_vector need some refactor up to a point.

I added the negative step check just some weeks ago and I'd see it as
simplification to remove the restriction again if you're touching the actual
step anyway.  So I wouldn't worry about stage 4 in that regard.

-- 
Regards
 Robin
  

Patch

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 7cc15f3d53c..43b54b4184f 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1504,9 +1504,38 @@  expand_const_vector (rtx target, rtx src)
 	      && get_vector_mode (new_smode, new_nunits).exists (&new_mode))
 	    {
 	      rtx tmp1 = gen_reg_rtx (new_mode);
-	      base1 = gen_int_mode (rtx_to_poly_int64 (base1), new_smode);
+	      poly_int64 base1_poly = rtx_to_poly_int64 (base1);
+	      base1 = gen_int_mode (base1_poly, new_smode);
 	      expand_vec_series (tmp1, base1, gen_int_mode (step1, new_smode));
 
+	      bool overflow_smode_p = false;
+
+	      for (int i = 0; i < XVECLEN (src, 0) / 2; i++)
+		{
+		  for (int k = 0; k < NUM_POLY_INT_COEFFS; k++)
+		    if (((uint64_t)(base1_poly.coeffs[k] + i * step1.coeffs[k])
+			>> builder.inner_bits_size ()) != 0)
+		      overflow_smode_p = true;
+
+		  if (overflow_smode_p)
+		    break;
+		}
+
+	      if (overflow_smode_p)
+		{
+		  /* The vec_series base1 may overflow bits to base2 series.  */
+		  rtx vec_mask = gen_vec_duplicate (new_mode,
+						    CONSTM1_RTX (new_smode));
+		  rtx lshift_vec_mask = gen_reg_rtx (new_mode);
+		  rtx shift = gen_int_mode (builder.inner_bits_size (), Xmode);
+		  rtx lshift_ops[] = {lshift_vec_mask, vec_mask, shift};
+		  emit_vlmax_insn (code_for_pred_scalar (LSHIFTRT, new_mode),
+				   BINARY_OP, lshift_ops);
+		  rtx and_ops[] = {tmp1, tmp1, lshift_vec_mask};
+		  emit_vlmax_insn (code_for_pred (AND, new_mode), BINARY_OP,
+				  and_ops);
+		}
+
 	      if (rtx_equal_p (base2, const0_rtx) && known_eq (step2, 0))
 		/* { 1, 0, 2, 0, ... }.  */
 		emit_move_insn (result, gen_lowpart (mode, tmp1));
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr118931-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr118931-run-1.c
new file mode 100644
index 00000000000..ef866a72039
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr118931-run-1.c
@@ -0,0 +1,19 @@ 
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-O3 -march=rv64gcv -flto -mrvv-vector-bits=zvl" } */
+
+long long m;
+char f = 151;
+char h = 103;
+unsigned char a = 109;
+
+int main() {
+  for (char l = 0; l < 255 - 241; l += h - 102)
+    a *= f;
+
+  m = a;
+
+  if (m != 29)
+    __builtin_abort ();
+
+  return 0;
+}