[v2] RISC-V: Fix bug for expand_const_vector interleave [PR118931]
Checks
Context |
Check |
Description |
rivoscibot/toolchain-ci-rivos-lint |
success
|
Lint passed
|
rivoscibot/toolchain-ci-rivos-apply-patch |
success
|
Patch applied
|
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-test |
fail
|
Testing failed
|
linaro-tcwg-bot/tcwg_gcc_check--master-arm |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Test passed
|
Commit Message
From: Pan Li <pan2.li@intel.com>
This patch would like to fix one bug when expanding const vector for the
interleave case. For example, we have:
base1 = 151
step = 121
For vec_series, we will generate vector in format of v[i] = base + i * step.
Then the vec_series will have below result for HImode, and we can find
that the result overflow to the highest 8 bits of HImode.
v1.b = {151, 255, 7, 0, 119, 0, 231, 0, 87, 1, 199, 1, 55, 2, 167, 2}
Aka we expect v1.b should be:
v1.b = {151, 0, 7, 0, 119, 0, 231, 0, 87, 0, 199, 0, 55, 0, 167, 0}
After that it will perform the IOR with v2 for the base2(aka another series).
v2.b = {0, 17, 0, 33, 0, 49, 0, 65, 0, 81, 0, 97, 0, 113, 0, 129}
Unfortunately, the base1 + i * step1 in HImode may overflow to the high
8 bits, and the high 8 bits will pollute the v2 and result in incorrect
value in const_vector.
This patch would like to perform the overflow to smode check before IOR
the base2 series, and perform the clean highest bit if the const_vector
overflow to smode occurs. If no overflow, there will do nothing here.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/118931
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Add overflow to
smode check and clean up highest bits if overflow.
* poly-int.h: Add operator =>> for poly.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118931-run-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
---
gcc/config/riscv/riscv-v.cc | 33 ++++++++++++++++++-
gcc/poly-int.h | 11 +++++++
.../riscv/rvv/base/pr118931-run-1.c | 19 +++++++++++
3 files changed, 62 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr118931-run-1.c
Comments
> This patch would like to fix one bug when expanding const vector for the
> interleave case. For example, we have:
>
> base1 = 151
> step = 121
>
> For vec_series, we will generate vector in format of v[i] = base + i * step.
> Then the vec_series will have below result for HImode, and we can find
> that the result overflow to the highest 8 bits of HImode.
>
> v1.b = {151, 255, 7, 0, 119, 0, 231, 0, 87, 1, 199, 1, 55, 2, 167, 2}
A few remarks:
- IMHO we need to check both series for overflow, if step2 overflows in the
smaller type isn't the result equally wrong?
- As discussed in V1, isn't is sufficient to just check the last element?
XVECLEN is equal to the number of elements in a fixed-length const vector but
not in a variable-length one. So for a variable-length vector you'd be
getting XVECLEN = 6 and only check up to a step of 2 when there might be
many more steps.
We need to either forbid variable-length vectors for this scheme altogether or
determine the maximum runtime number of elements possible for the current mode.
We don't support more than 65536 bits in a vector which would "naturally" limit
the number of elements. I suppose variable-length vectors are not too common
for this scheme and falling back to a less optimized one shouldn't be too
costly. That's just a gut feeling, though.
- Dividing/right shifting poly_ints is tricky and there is can_div_trunc_p to
check if we can do that at all. If we forbade VLA that wouldn't be an issue
and all we needed to check is CONST_VECTOR_NUNITS () * step.
Thanks Robin.
> - IMHO we need to check both series for overflow, if step2 overflows in the
> smaller type isn't the result equally wrong?
The series2 will shift right before IOR, thus the overflow bits never effect on the final result.
For example, the series2 will be similar as below after shift.
v2.b = {0, 17, 0, 33, 0, 49, 0, 65, 0, 81, 0, 97, 0, 113, 0, 129}
> - As discussed in V1, isn't is sufficient to just check the last element?
> XVECLEN is equal to the number of elements in a fixed-length const vector but
> not in a variable-length one. So for a variable-length vector you'd be
> getting XVECLEN = 6 and only check up to a step of 2 when there might be
> many more steps.
For i32 interleave, we will extend to i64 for series1 and series2, I think we can design
a series like base + i * step with last element overflow to bit 65 but have bits 32-64 all zero, and then
the element before the last one will have overflow bits pollute the result.
It seems checking the last 2 elements is good enough here.
> We need to either forbid variable-length vectors for this scheme altogether or
> determine the maximum runtime number of elements possible for the current mode.
> We don't support more than 65536 bits in a vector which would "naturally" limit
> the number of elements. I suppose variable-length vectors are not too common
> for this scheme and falling back to a less optimized one shouldn't be too
> costly. That's just a gut feeling, though.
Make sense, will fall back to less optimized for VLA.
> - Dividing/right shifting poly_ints is tricky and there is can_div_trunc_p to
> check if we can do that at all. If we forbade VLA that wouldn't be an issue
> and all we needed to check is CONST_VECTOR_NUNITS () * step.
I see, that explains why we have poly shift right in previous, will update in v3.
Pan
-----Original Message-----
From: Robin Dapp <rdapp.gcc@gmail.com>
Sent: Wednesday, February 26, 2025 12:46 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: [PATCH v2] RISC-V: Fix bug for expand_const_vector interleave [PR118931]
> This patch would like to fix one bug when expanding const vector for the
> interleave case. For example, we have:
>
> base1 = 151
> step = 121
>
> For vec_series, we will generate vector in format of v[i] = base + i * step.
> Then the vec_series will have below result for HImode, and we can find
> that the result overflow to the highest 8 bits of HImode.
>
> v1.b = {151, 255, 7, 0, 119, 0, 231, 0, 87, 1, 199, 1, 55, 2, 167, 2}
A few remarks:
- IMHO we need to check both series for overflow, if step2 overflows in the
smaller type isn't the result equally wrong?
- As discussed in V1, isn't is sufficient to just check the last element?
XVECLEN is equal to the number of elements in a fixed-length const vector but
not in a variable-length one. So for a variable-length vector you'd be
getting XVECLEN = 6 and only check up to a step of 2 when there might be
many more steps.
We need to either forbid variable-length vectors for this scheme altogether or
determine the maximum runtime number of elements possible for the current mode.
We don't support more than 65536 bits in a vector which would "naturally" limit
the number of elements. I suppose variable-length vectors are not too common
for this scheme and falling back to a less optimized one shouldn't be too
costly. That's just a gut feeling, though.
- Dividing/right shifting poly_ints is tricky and there is can_div_trunc_p to
check if we can do that at all. If we forbade VLA that wouldn't be an issue
and all we needed to check is CONST_VECTOR_NUNITS () * step.
> Thanks Robin.
>
>> - IMHO we need to check both series for overflow, if step2 overflows in the
>> smaller type isn't the result equally wrong?
>
> The series2 will shift right before IOR, thus the overflow bits never effect
> on the final result.
> For example, the series2 will be similar as below after shift.
>
> v2.b = {0, 17, 0, 33, 0, 49, 0, 65, 0, 81, 0, 97, 0, 113, 0, 129}
Hmm, I think it's the other way around and it is being shifted left. In that
case we're only keeping the lower bits and the "overflow bits" don't matter.
That means we should indeed be OK here.
> For i32 interleave, we will extend to i64 for series1 and series2, I think we
> can design
> a series like base + i * step with last element overflow to bit 65 but have
> bits 32-64 all zero, and then
> the element before the last one will have overflow bits pollute the result.
>
> It seems checking the last 2 elements is good enough here.
I guess the worst that can happen theoretically is i = 2 ^ 32 - 1,
step = 2 ^ 32 - 1, and base = 2 ^ 32 - 1? But i is bound by our "largest"
mode so IMHO a 64-bit value should be sufficient.
We always have at least two elements BTW, no need to check that.
Please also adjust the comment above the code regarding overflow.
>> We need to either forbid variable-length vectors for this scheme altogether or
>> determine the maximum runtime number of elements possible for the current mode.
>> We don't support more than 65536 bits in a vector which would "naturally" limit
>> the number of elements. I suppose variable-length vectors are not too common
>> for this scheme and falling back to a less optimized one shouldn't be too
>> costly. That's just a gut feeling, though.
>
> Make sense, will fall back to less optimized for VLA.
Have you checked how the more generic version (last else branch) compares? I
could imagine it's close or even better for our case.
> Hmm, I think it's the other way around and it is being shifted left. In that
> case we're only keeping the lower bits and the "overflow bits" don't matter.
> That means we should indeed be OK here.
Yes, will add some comments here for series2.
> I guess the worst that can happen theoretically is i = 2 ^ 32 - 1,
> step = 2 ^ 32 - 1, and base = 2 ^ 32 - 1? But i is bound by our "largest"
> mode so IMHO a 64-bit value should be sufficient.
> We always have at least two elements BTW, no need to check that.
> Please also adjust the comment above the code regarding overflow.
Sure thing, your are right, the last element is good enough here.
> Have you checked how the more generic version (last else branch) compares? I
> could imagine it's close or even better for our case.
If you mean the last branch of interleave, I think it is safe because it leverage the
merge to generate the result, instead of IOR. Only the IOR for final result have
this issue.
Pan
-----Original Message-----
From: Robin Dapp <rdapp.gcc@gmail.com>
Sent: Thursday, February 27, 2025 12:23 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: [PATCH v2] RISC-V: Fix bug for expand_const_vector interleave [PR118931]
> Thanks Robin.
>
>> - IMHO we need to check both series for overflow, if step2 overflows in the
>> smaller type isn't the result equally wrong?
>
> The series2 will shift right before IOR, thus the overflow bits never effect
> on the final result.
> For example, the series2 will be similar as below after shift.
>
> v2.b = {0, 17, 0, 33, 0, 49, 0, 65, 0, 81, 0, 97, 0, 113, 0, 129}
Hmm, I think it's the other way around and it is being shifted left. In that
case we're only keeping the lower bits and the "overflow bits" don't matter.
That means we should indeed be OK here.
> For i32 interleave, we will extend to i64 for series1 and series2, I think we
> can design
> a series like base + i * step with last element overflow to bit 65 but have
> bits 32-64 all zero, and then
> the element before the last one will have overflow bits pollute the result.
>
> It seems checking the last 2 elements is good enough here.
I guess the worst that can happen theoretically is i = 2 ^ 32 - 1,
step = 2 ^ 32 - 1, and base = 2 ^ 32 - 1? But i is bound by our "largest"
mode so IMHO a 64-bit value should be sufficient.
We always have at least two elements BTW, no need to check that.
Please also adjust the comment above the code regarding overflow.
>> We need to either forbid variable-length vectors for this scheme altogether or
>> determine the maximum runtime number of elements possible for the current mode.
>> We don't support more than 65536 bits in a vector which would "naturally" limit
>> the number of elements. I suppose variable-length vectors are not too common
>> for this scheme and falling back to a less optimized one shouldn't be too
>> costly. That's just a gut feeling, though.
>
> Make sense, will fall back to less optimized for VLA.
Have you checked how the more generic version (last else branch) compares? I
could imagine it's close or even better for our case.
> If you mean the last branch of interleave, I think it is safe because it leverage the
> merge to generate the result, instead of IOR. Only the IOR for final result have
> this issue.
Yep, I meant checking overflow before the initial if
if (known_ge (step1, 0) && known_ge (step2, 0)
&& int_mode_for_size (new_smode_bitsize, 0).exists (&new_smode)
&& get_vector_mode (new_smode, new_nunits).exists (&new_mode))
and basically adding a new clause here. That would make us fall back to the
merge scheme in case of overflow instead of making the other scheme more
complicated.
I haven't checked this but could imagine the merge sequence is not worse or
even preferable.
> and basically adding a new clause here. That would make us fall back to the
> merge scheme in case of overflow instead of making the other scheme more
> complicated.
Oh, I see, that make sense to me, given current implementation of expand_const_vector is complicated, let me update in v4.
Pan
-----Original Message-----
From: Robin Dapp <rdapp.gcc@gmail.com>
Sent: Thursday, February 27, 2025 1:37 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: [PATCH v2] RISC-V: Fix bug for expand_const_vector interleave [PR118931]
> If you mean the last branch of interleave, I think it is safe because it leverage the
> merge to generate the result, instead of IOR. Only the IOR for final result have
> this issue.
Yep, I meant checking overflow before the initial if
if (known_ge (step1, 0) && known_ge (step2, 0)
&& int_mode_for_size (new_smode_bitsize, 0).exists (&new_smode)
&& get_vector_mode (new_smode, new_nunits).exists (&new_mode))
and basically adding a new clause here. That would make us fall back to the
merge scheme in case of overflow instead of making the other scheme more
complicated.
I haven't checked this but could imagine the merge sequence is not worse or
even preferable.
--
Regards
Robin
@@ -1504,9 +1504,40 @@ expand_const_vector (rtx target, rtx src)
&& get_vector_mode (new_smode, new_nunits).exists (&new_mode))
{
rtx tmp1 = gen_reg_rtx (new_mode);
- base1 = gen_int_mode (rtx_to_poly_int64 (base1), new_smode);
+ poly_int64 base1_poly = rtx_to_poly_int64 (base1);
+ base1 = gen_int_mode (base1_poly, new_smode);
expand_vec_series (tmp1, base1, gen_int_mode (step1, new_smode));
+ bool overflow_smode_p = false;
+
+ for (int i = 0; i < XVECLEN (src, 0) / 2; i++)
+ {
+ poly_int64 elem = step1 * i + base1_poly;
+
+ elem >>= builder.inner_bits_size ();
+
+ if (maybe_ne (0, elem))
+ {
+ overflow_smode_p = true;
+ break;
+ }
+ }
+
+ if (overflow_smode_p)
+ {
+ /* The vec_series base1 may overflow bits to base2 series. */
+ rtx vec_mask = gen_vec_duplicate (new_mode,
+ CONSTM1_RTX (new_smode));
+ rtx lshift_vec_mask = gen_reg_rtx (new_mode);
+ rtx shift = gen_int_mode (builder.inner_bits_size (), Xmode);
+ rtx lshift_ops[] = {lshift_vec_mask, vec_mask, shift};
+ emit_vlmax_insn (code_for_pred_scalar (LSHIFTRT, new_mode),
+ BINARY_OP, lshift_ops);
+ rtx and_ops[] = {tmp1, tmp1, lshift_vec_mask};
+ emit_vlmax_insn (code_for_pred (AND, new_mode), BINARY_OP,
+ and_ops);
+ }
+
if (rtx_equal_p (base2, const0_rtx) && known_eq (step2, 0))
/* { 1, 0, 2, 0, ... }. */
emit_move_insn (result, gen_lowpart (mode, tmp1));
@@ -408,6 +408,8 @@ public:
poly_int &operator <<= (unsigned int);
+ poly_int &operator >>= (unsigned int);
+
bool is_constant () const;
template<typename T>
@@ -550,6 +552,15 @@ poly_int<N, C>::operator <<= (unsigned int a)
return *this;
}
+template<unsigned int N, typename C>
+inline poly_int<N, C>&
+poly_int<N, C>::operator >>= (unsigned int a)
+{
+ for (unsigned int i = 0; i < N; i++)
+ this->coeffs[i] >>= a;
+ return *this;
+}
+
/* Return true if the polynomial value is a compile-time constant. */
template<unsigned int N, typename C>
new file mode 100644
@@ -0,0 +1,19 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-O3 -march=rv64gcv -flto -mrvv-vector-bits=zvl" } */
+
+long long m;
+char f = 151;
+char h = 103;
+unsigned char a = 109;
+
+int main() {
+ for (char l = 0; l < 255 - 241; l += h - 102)
+ a *= f;
+
+ m = a;
+
+ if (m != 29)
+ __builtin_abort ();
+
+ return 0;
+}