forwprop: Check mask for type overflow [PR123414].
Checks
| Context | Check | Description |
| rivoscibot/toolchain-ci-rivos-lint | success | Lint passed |
| rivoscibot/toolchain-ci-rivos-apply-patch | success | Patch applied |
| rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib | success | Build passed |
| rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib | success | Build passed |
| rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib | success | Build passed |
| rivoscibot/toolchain-ci-rivos-test | fail | Testing failed |
Commit Message
Hi,
RVV's vectors can get very large with LMUL8. In the PR we have
256-element char vectors which get permuted. For permuting them
we use a mask vectype that is deduced from the element type
without checking if the permute indices fit this type.
That leads to an invalid permute mask which gets optimized away.
This patch punts if the permute index might overflow the mask type.
Bootstrapped and regtested on x86, power10, and aarch64.
Regtested on riscv64.
Regards
Robin
PR tree-optimization/123414
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_vector_constructor):
Give up if the permute index might overflow the mask type.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr123414.c: New test.
---
.../gcc.target/riscv/rvv/autovec/pr123414.c | 31 +++++++++++++++++++
gcc/tree-ssa-forwprop.cc | 13 ++++++--
2 files changed, 41 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
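To make the overflow described in the commit message concrete, here is a minimal sketch in plain C. It is not GCC code: `narrow_to_u8` and `selected_operand` are hypothetical helpers, and the sketch assumes the usual two-input permute convention where indices in [0, N) pick from the first operand and indices in [N, 2N) pick from the second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not GCC code): store a permute index in an 8-bit
   unsigned mask element, which keeps only the low 8 bits.  */
static unsigned
narrow_to_u8 (unsigned index)
{
  return (uint8_t) index;
}

/* Hypothetical helper: which operand (0 or 1) a two-input permute index
   selects when each operand has NELTS elements.  */
static unsigned
selected_operand (unsigned index, unsigned nelts)
{
  assert (nelts > 0);
  return index / nelts;
}
```

With 256-element char vectors, index 257 is meant to select element 1 of the second operand. Narrowed to 8 bits it becomes 1, which selects element 1 of the first operand instead, so the mask no longer describes the intended permute.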
Comments
On Thu, Jan 8, 2026 at 11:20 AM Robin Dapp <rdapp.gcc@gmail.com> wrote:
>
> Hi,
>
> RVV's vectors can get very large with LMUL8. In the PR we have
> 256-element char vectors which get permuted. For permuting them
> we use a mask vectype that is deduced from the element type
> without checking if the permute indices fit this type.
> That leads to an invalid permute mask which gets optimized away.
>
> This patch punts if the permute index might overflow the mask type.
>
> Bootstrapped and regtested on x86, power10, and aarch64.
> Regtested on riscv64.
>
> Regards
> Robin
>
> PR tree-optimization/123414
>
> gcc/ChangeLog:
>
> * tree-ssa-forwprop.cc (simplify_vector_constructor):
> Give up if the permute index might overflow the mask type.
Ok with one minor change listed below:
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr123414.c: New test.
> ---
> .../gcc.target/riscv/rvv/autovec/pr123414.c | 31 +++++++++++++++++++
> gcc/tree-ssa-forwprop.cc | 13 ++++++--
> 2 files changed, 41 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
> new file mode 100644
> index 00000000000..a28ce23f058
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mrvv-max-lmul=m8 -O3 -fsigned-char -fno-strict-aliasing -fwrapv -fdump-tree-optimized -std=gnu99" } */
> +
> +signed char a=2;
> +long long b;
> +long c = 93;
> +int e[1][9];
> +
> +void
> +g (long cc, int ee[][9])
> +{
> + for (int i = 0; i < 4; i++)
> + for (int j = 0; j < 5; j++)
> + for (unsigned k = 0; k < 9; k++)
> + {
> + a *= cc;
> + for (int l = 0; l < 6; l += (ee[k] <= 0) + 2)
> + ;
> + }
> +}
> +
> +int main() {
> + g( c, e);
> + b = (int)a;
> + if (b != 34)
> + __builtin_abort ();
> +}
> +
> +/* We should have four vector constructors that must not get optimized away.
> + */
> +/* { dg-final { scan-tree-dump-times "\[a-zA-Z_\]\[a-zA-Z0-9_\]+.=.\\\{\[a-zA-Z0-9._\]+, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1," 4 "optimized" } } */
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index d00f7b9004e..9435e4dc409 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -4177,9 +4177,16 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
> machine_mode vmode = TYPE_MODE (perm_type);
> if (!can_vec_perm_const_p (vmode, vmode, indices))
> return false;
> - mask_type
> - = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
> - refnelts);
> + /* With an ELEM_SIZEd integer we can shuffle at most
> + (1 << ELEM_SIZE) / 2 - 1 elements. For a more fine-grained
> + check we could take the maximum of the element values
> + but for now this is sufficient. */
> + if (refnelts < (HOST_WIDE_INT_1U << (elem_size - 1)))
> + mask_type
> + = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
> + refnelts);
> + else
> + return false;
Just do:
if (refnelts >= (HOST_WIDE_INT_1U << (elem_size - 1)))
return false;
And then don't touch the whitespace on the mask_type line.
Thanks,
Andrew Pinski
> if (GET_MODE_CLASS (TYPE_MODE (mask_type)) != MODE_VECTOR_INT
> || maybe_ne (GET_MODE_SIZE (TYPE_MODE (mask_type)),
> GET_MODE_SIZE (TYPE_MODE (perm_type))))
> --
> 2.52.0
>
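The early-return form suggested in the review can be sketched as a standalone predicate. This is only an illustration under stated assumptions: `mask_type_may_overflow` is a hypothetical name, `uint64_t` stands in for `unsigned HOST_WIDE_INT`, and `UINT64_C (1)` stands in for `HOST_WIDE_INT_1U`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the v1 guard: return true when a two-input
   permute over REFNELTS elements might need an index that does not fit
   the ELEM_SIZE-bit mask element, i.e. when
   REFNELTS >= 1 << (ELEM_SIZE - 1).  */
static bool
mask_type_may_overflow (uint64_t refnelts, unsigned elem_size)
{
  /* Keep the shift count within the width of uint64_t.  */
  assert (elem_size >= 1 && elem_size <= 64);
  return refnelts >= (UINT64_C (1) << (elem_size - 1));
}
```

For the 256-element char vectors from the PR (`elem_size` of 8) the predicate is true, so the v1 patch would give up on the transformation. The bound is conservative, as the patch comment notes: it looks only at the element count, not at the largest index actually used.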
> On 08.01.2026 at 20:20, Robin Dapp <rdapp.gcc@gmail.com> wrote:
>
> Hi,
>
> RVV's vectors can get very large with LMUL8. In the PR we have
> 256-element char vectors which get permuted. For permuting them
> we use a mask vectype that is deduced from the element type
> without checking if the permute indices fit this type.
> That leads to an invalid permute mask which gets optimized away.
>
> This patch punts if the permute index might overflow the mask type.
>
> Bootstrapped and regtested on x86, power10, and aarch64.
> Regtested on riscv64.
Given it's a constant permute(?) you can just always use ssizetype here.
> Regards
> Robin
>
> PR tree-optimization/123414
>
> gcc/ChangeLog:
>
> * tree-ssa-forwprop.cc (simplify_vector_constructor):
> Give up if the permute index might overflow the mask type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr123414.c: New test.
> ---
> .../gcc.target/riscv/rvv/autovec/pr123414.c | 31 +++++++++++++++++++
> gcc/tree-ssa-forwprop.cc | 13 ++++++--
> 2 files changed, 41 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
> new file mode 100644
> index 00000000000..a28ce23f058
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mrvv-max-lmul=m8 -O3 -fsigned-char -fno-strict-aliasing -fwrapv -fdump-tree-optimized -std=gnu99" } */
> +
> +signed char a=2;
> +long long b;
> +long c = 93;
> +int e[1][9];
> +
> +void
> +g (long cc, int ee[][9])
> +{
> + for (int i = 0; i < 4; i++)
> + for (int j = 0; j < 5; j++)
> + for (unsigned k = 0; k < 9; k++)
> + {
> + a *= cc;
> + for (int l = 0; l < 6; l += (ee[k] <= 0) + 2)
> + ;
> + }
> +}
> +
> +int main() {
> + g( c, e);
> + b = (int)a;
> + if (b != 34)
> + __builtin_abort ();
> +}
> +
> +/* We should have four vector constructors that must not get optimized away.
> + */
> +/* { dg-final { scan-tree-dump-times "\[a-zA-Z_\]\[a-zA-Z0-9_\]+.=.\\\{\[a-zA-Z0-9._\]+, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1," 4 "optimized" } } */
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index d00f7b9004e..9435e4dc409 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -4177,9 +4177,16 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
> machine_mode vmode = TYPE_MODE (perm_type);
> if (!can_vec_perm_const_p (vmode, vmode, indices))
> return false;
> - mask_type
> - = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
> - refnelts);
> + /* With an ELEM_SIZEd integer we can shuffle at most
> + (1 << ELEM_SIZE) / 2 - 1 elements. For a more fine-grained
> + check we could take the maximum of the element values
> + but for now this is sufficient. */
> + if (refnelts < (HOST_WIDE_INT_1U << (elem_size - 1)))
> + mask_type
> + = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
> + refnelts);
> + else
> + return false;
> if (GET_MODE_CLASS (TYPE_MODE (mask_type)) != MODE_VECTOR_INT
> || maybe_ne (GET_MODE_SIZE (TYPE_MODE (mask_type)),
> GET_MODE_SIZE (TYPE_MODE (perm_type))))
> --
> 2.52.0
>
>> Bootstrapped and regtested on x86, power10, and aarch64.
>> Regtested on riscv64.
>
> Given it's a constant permute(?) you can just always use ssizetype here.
Ah, that's even simpler then, thanks. The attached v2 was bootstrapped and
regtested as before.
[PATCH v2] forwprop: Use ssizetype for mask [PR123414].
RVV's vectors can get very large with LMUL8. In the PR we have
256-element char vectors which get permuted. For permuting them
we use a mask vectype that is deduced from the element type
without checking if the permute indices fit this type.
That leads to an invalid permute mask which gets optimized away.
This patch uses ssizetype as masktype instead.
PR tree-optimization/123414
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_vector_constructor):
Use ssizetype as mask type.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr123414.c: New test.
---
.../gcc.target/riscv/rvv/autovec/pr123414.c | 29 +++++++++++++++++++
gcc/tree-ssa-forwprop.cc | 22 +++-----------
2 files changed, 33 insertions(+), 18 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
new file mode 100644
index 00000000000..9a48085a562
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mrvv-max-lmul=m8 -O3 -fsigned-char -fno-strict-aliasing -fwrapv -fdump-tree-optimized -std=gnu99" } */
+
+signed char a=2;
+long long b;
+long c = 93;
+int e[1][9];
+
+void
+g (long cc, int ee[][9])
+{
+ for (int i = 0; i < 4; i++)
+ for (int j = 0; j < 5; j++)
+ for (unsigned k = 0; k < 9; k++)
+ {
+ a *= cc;
+ for (int l = 0; l < 6; l += (ee[k] <= 0) + 2)
+ ;
+ }
+}
+
+int main() {
+ g( c, e);
+ b = (int)a;
+ if (b != 34)
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "\[a-zA-Z_\]\[a-zA-Z0-9_\]+.=.VEC_PERM_EXPR <_\[0-9\]+, \\\{ 1(?:, 1){255} \\\}, \\\{ 0, 257, 258" 3 "optimized" } } */
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index c5bad3ddf38..b70fe8169ad 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -3829,8 +3829,8 @@ static bool
simplify_vector_constructor (gimple_stmt_iterator *gsi)
{
gimple *stmt = gsi_stmt (*gsi);
- tree op, orig[2], type, elem_type;
- unsigned elem_size, i;
+ tree op, orig[2], type;
+ unsigned i;
unsigned HOST_WIDE_INT nelts;
unsigned HOST_WIDE_INT refnelts;
enum tree_code conv_code;
@@ -3843,8 +3843,6 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts))
return false;
- elem_type = TREE_TYPE (type);
- elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
orig[0] = NULL;
orig[1] = NULL;
@@ -4177,13 +4175,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
machine_mode vmode = TYPE_MODE (perm_type);
if (!can_vec_perm_const_p (vmode, vmode, indices))
return false;
- mask_type
- = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
- refnelts);
- if (GET_MODE_CLASS (TYPE_MODE (mask_type)) != MODE_VECTOR_INT
- || maybe_ne (GET_MODE_SIZE (TYPE_MODE (mask_type)),
- GET_MODE_SIZE (TYPE_MODE (perm_type))))
- return false;
+ mask_type = build_vector_type (ssizetype, refnelts);
tree op2 = vec_perm_indices_to_tree (mask_type, indices);
bool converted_orig1 = false;
gimple_seq stmts = NULL;
@@ -4248,13 +4240,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
machine_mode vmode = TYPE_MODE (type);
if (!can_vec_perm_const_p (vmode, vmode, indices))
return false;
- mask_type
- = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
- nelts);
- if (GET_MODE_CLASS (TYPE_MODE (mask_type)) != MODE_VECTOR_INT
- || maybe_ne (GET_MODE_SIZE (TYPE_MODE (mask_type)),
- GET_MODE_SIZE (TYPE_MODE (type))))
- return false;
+ mask_type = build_vector_type (ssizetype, nelts);
blend_op2 = vec_perm_indices_to_tree (mask_type, indices);
}
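As a rough scalar model of why a wider index type avoids the problem, the sketch below applies a two-input permute with `ptrdiff_t` indices. It is not how GCC implements `VEC_PERM_EXPR`: `permute2` is a hypothetical helper, and `ptrdiff_t` is used here only as an assumed stand-in for `ssizetype`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of a two-input permute: indices in [0, NELTS)
   select from A, indices in [NELTS, 2 * NELTS) select from B.  The wide
   index type can represent every index for any realistic NELTS.  */
static void
permute2 (const signed char *a, const signed char *b,
          const ptrdiff_t *sel, size_t nelts, signed char *out)
{
  for (size_t i = 0; i < nelts; i++)
    {
      /* This sketch only handles non-negative indices.  */
      assert (sel[i] >= 0);
      size_t idx = (size_t) sel[i] % (2 * nelts);
      out[i] = idx < nelts ? a[idx] : b[idx - nelts];
    }
}
```

With 256 elements and a selector starting `{ 0, 257, 258, ... }`, as matched by the v2 test's scan pattern, position 1 reads element 1 of the second operand because 257 is stored without truncation.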
> On 09.01.2026 at 12:05, Robin Dapp <rdapp.gcc@gmail.com> wrote:
>
>
>>
>>> Bootstrapped and regtested on x86, power10, and aarch64.
>>> Regtested on riscv64.
>>
>> Given it's a constant permute(?) you can just always use ssizetype here.
>
> Ah, that's even simpler then, thanks. The attached v2 was bootstrapped and
> regtested as before.
Ok
Richard
> [PATCH v2] forwprop: Use ssizetype for mask [PR123414].
>
> RVV's vectors can get very large with LMUL8. In the PR we have
> 256-element char vectors which get permuted. For permuting them
> we use a mask vectype that is deduced from the element type
> without checking if the permute indices fit this type.
> That leads to an invalid permute mask which gets optimized away.
>
> This patch uses ssizetype as masktype instead.
>
> PR tree-optimization/123414
>
> gcc/ChangeLog:
>
> * tree-ssa-forwprop.cc (simplify_vector_constructor):
> Use ssizetype as mask type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr123414.c: New test.
> ---
> .../gcc.target/riscv/rvv/autovec/pr123414.c | 29 +++++++++++++++++++
> gcc/tree-ssa-forwprop.cc | 22 +++-----------
> 2 files changed, 33 insertions(+), 18 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
> new file mode 100644
> index 00000000000..9a48085a562
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr123414.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mrvv-max-lmul=m8 -O3 -fsigned-char -fno-strict-aliasing -fwrapv -fdump-tree-optimized -std=gnu99" } */
> +
> +signed char a=2;
> +long long b;
> +long c = 93;
> +int e[1][9];
> +
> +void
> +g (long cc, int ee[][9])
> +{
> + for (int i = 0; i < 4; i++)
> + for (int j = 0; j < 5; j++)
> + for (unsigned k = 0; k < 9; k++)
> + {
> + a *= cc;
> + for (int l = 0; l < 6; l += (ee[k] <= 0) + 2)
> + ;
> + }
> +}
> +
> +int main() {
> + g( c, e);
> + b = (int)a;
> + if (b != 34)
> + __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\[a-zA-Z_\]\[a-zA-Z0-9_\]+.=.VEC_PERM_EXPR <_\[0-9\]+, \\\{ 1(?:, 1){255} \\\}, \\\{ 0, 257, 258" 3 "optimized" } } */
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index c5bad3ddf38..b70fe8169ad 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -3829,8 +3829,8 @@ static bool
> simplify_vector_constructor (gimple_stmt_iterator *gsi)
> {
> gimple *stmt = gsi_stmt (*gsi);
> - tree op, orig[2], type, elem_type;
> - unsigned elem_size, i;
> + tree op, orig[2], type;
> + unsigned i;
> unsigned HOST_WIDE_INT nelts;
> unsigned HOST_WIDE_INT refnelts;
> enum tree_code conv_code;
> @@ -3843,8 +3843,6 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
>
> if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts))
> return false;
> - elem_type = TREE_TYPE (type);
> - elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>
> orig[0] = NULL;
> orig[1] = NULL;
> @@ -4177,13 +4175,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
> machine_mode vmode = TYPE_MODE (perm_type);
> if (!can_vec_perm_const_p (vmode, vmode, indices))
> return false;
> - mask_type
> - = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
> - refnelts);
> - if (GET_MODE_CLASS (TYPE_MODE (mask_type)) != MODE_VECTOR_INT
> - || maybe_ne (GET_MODE_SIZE (TYPE_MODE (mask_type)),
> - GET_MODE_SIZE (TYPE_MODE (perm_type))))
> - return false;
> + mask_type = build_vector_type (ssizetype, refnelts);
> tree op2 = vec_perm_indices_to_tree (mask_type, indices);
> bool converted_orig1 = false;
> gimple_seq stmts = NULL;
> @@ -4248,13 +4240,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
> machine_mode vmode = TYPE_MODE (type);
> if (!can_vec_perm_const_p (vmode, vmode, indices))
> return false;
> - mask_type
> - = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
> - nelts);
> - if (GET_MODE_CLASS (TYPE_MODE (mask_type)) != MODE_VECTOR_INT
> - || maybe_ne (GET_MODE_SIZE (TYPE_MODE (mask_type)),
> - GET_MODE_SIZE (TYPE_MODE (type))))
> - return false;
> + mask_type = build_vector_type (ssizetype, nelts);
> blend_op2 = vec_perm_indices_to_tree (mask_type, indices);
> }
>
> --
> 2.52.0
>