vect: Ensure both NITERSM1 and NITERS are INTEGER_CSTs or neither of them [PR113210]
Commit Message
Hi!
On the following testcase, e.g. on riscv64 or aarch64 (the latter with
-O3 -march=armv8-a+sve), we ICE, because while NITERS is an INTEGER_CST,
NITERSM1 is a complex expression like
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0
where a.0_1 is unsigned char. The cast operand is therefore at most 255,
so the condition is never true and the above is equivalent to just 0, but
we only manage to simplify it when folding the above with PLUS_EXPR 1 (first
~(short unsigned int) (a.0_1 + 255)
to
-(short unsigned int) (a.0_1 + 255)
and then
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? -(short unsigned int) (a.0_1 + 255) : 1
to
(short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1
and only at this point do we fold the condition to false).
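
As a sanity check on the claim that the condition is never true, here is a
minimal C sketch that evaluates the NITERSM1 expression for every possible
value of a.0_1. The helpers nitersm1_expr and count_nonzero_nitersm1 are made
up for illustration and are not GCC code. The sketch assumes that a.0_1 + 255
is computed in unsigned char, as in the GIMPLE above, and that unsigned short
is 16 bits wide.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: evaluates the NITERSM1
   expression from the message for one value of a.0_1.  */
static unsigned short
nitersm1_expr (unsigned char a0)
{
  /* a.0_1 + 255 is computed in unsigned char and then widened, so
     T is always in the range 0..255.  */
  unsigned short t = (unsigned char) (a0 + 255);
  return (unsigned short) (t + 1) > 256 ? (unsigned short) ~t : 0;
}

/* Hypothetical helper: counts the values of a.0_1 for which the
   expression is nonzero.  */
static int
count_nonzero_nitersm1 (void)
{
  int count = 0;
  for (int a0 = 0; a0 <= 255; a0++)
    if (nitersm1_expr ((unsigned char) a0) != 0)
      count++;
  return count;
}
```

Calling count_nonzero_nitersm1 () from a small driver should return 0, which
matches the statement that the whole expression is equivalent to 0.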
But the vectorizer seems to assume that if NITERS is known (i.e. a suitable
INTEGER_CST) then NITERSM1 is as well, so the following hack ensures that if
NITERS folds into an INTEGER_CST, NITERSM1 will be one too.
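
The idea of the hack can be sketched outside of GCC with a toy model. This is
only an illustration under stated assumptions: struct toy_expr,
toy_fold_plus_one and toy_get_loop_niters are hypothetical names, and the toy
folder simply assumes that folding "expression + 1" always succeeds, which is
the situation described above. The arithmetic is done in a 16-bit unsigned
type to mirror the short unsigned int in the testcase.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a GCC tree: either a known constant or an opaque
   expression that the folder could not simplify on its own.  */
struct toy_expr
{
  bool is_cst;           /* plays the role of TREE_CODE == INTEGER_CST */
  unsigned short value;  /* the value the expression really has */
};

/* Toy fold of E + 1.  Assumption: the sum always folds to a constant.  */
static struct toy_expr
toy_fold_plus_one (struct toy_expr e)
{
  struct toy_expr r = { true, (unsigned short) (e.value + 1) };
  return r;
}

/* Toy version of the fixed logic: compute NITERS from NITERSM1 and, if
   NITERS became a constant while NITERSM1 was not one, recompute
   NITERSM1 as NITERS + (-1) in the same unsigned type.  */
static void
toy_get_loop_niters (struct toy_expr *nitersm1, struct toy_expr *niters)
{
  struct toy_expr niter = toy_fold_plus_one (*nitersm1);
  if (niter.is_cst && !nitersm1->is_cst)
    {
      nitersm1->is_cst = true;
      nitersm1->value = (unsigned short) (niter.value + (unsigned short) -1);
    }
  *niters = niter;
}
```

After toy_get_loop_niters returns, either both values are constants or
neither is, which is the invariant named in the subject line. Adding minus
one in the unsigned type also handles the wrap-around case where NITERS
overflowed to zero.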
Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
with a cross to aarch64-linux using -O3 -march=armv8-a+sve on the
testcase. Ok for trunk?
2024-01-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113210
* tree-vect-loop.cc (vect_get_loop_niters): If non-INTEGER_CST
value in *number_of_iterationsm1 PLUS_EXPR 1 is folded into
INTEGER_CST, recompute *number_of_iterationsm1 as the INTEGER_CST
minus 1.
* gcc.c-torture/compile/pr113210.c: New test.
Jakub
Comments
On Tue, 9 Jan 2024, Jakub Jelinek wrote:
> Hi!
>
> On the following testcase, e.g. on riscv64 or aarch64 (the latter with
> -O3 -march=armv8-a+sve), we ICE, because while NITERS is an INTEGER_CST,
> NITERSM1 is a complex expression like
> (short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0
> where a.0_1 is unsigned char. The cast operand is therefore at most 255,
> so the condition is never true and the above is equivalent to just 0, but
> we only manage to simplify it when folding the above with PLUS_EXPR 1 (first
> ~(short unsigned int) (a.0_1 + 255)
> to
> -(short unsigned int) (a.0_1 + 255)
> and then
> (short unsigned int) (a.0_1 + 255) + 1 > 256 ? -(short unsigned int) (a.0_1 + 255) : 1
> to
> (short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1
> and only at this point do we fold the condition to false).
>
> But the vectorizer seems to assume that if NITERS is known (i.e. a suitable
> INTEGER_CST) then NITERSM1 is as well, so the following hack ensures that if
> NITERS folds into an INTEGER_CST, NITERSM1 will be one too.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
> with a cross to aarch64-linux using -O3 -march=armv8-a+sve on the
> testcase. Ok for trunk?
OK.
> 2024-01-09 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/113210
> * tree-vect-loop.cc (vect_get_loop_niters): If non-INTEGER_CST
> value in *number_of_iterationsm1 PLUS_EXPR 1 is folded into
> INTEGER_CST, recompute *number_of_iterationsm1 as the INTEGER_CST
> minus 1.
>
> * gcc.c-torture/compile/pr113210.c: New test.
>
> --- gcc/tree-vect-loop.cc.jj 2024-01-08 16:13:18.682939712 +0100
> +++ gcc/tree-vect-loop.cc 2024-01-08 16:30:24.062626368 +0100
> @@ -941,9 +941,22 @@ vect_get_loop_niters (class loop *loop,
> ??? For UINT_MAX latch executions this number overflows to zero
> for loops like do { n++; } while (n != 0); */
> if (niter && !chrec_contains_undetermined (niter))
> + {
> niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> unshare_expr (niter),
> build_int_cst (TREE_TYPE (niter), 1));
> + if (TREE_CODE (niter) == INTEGER_CST
> + && TREE_CODE (*number_of_iterationsm1) != INTEGER_CST)
> + {
> + /* If we manage to fold niter + 1 into INTEGER_CST even when
> + niter is some complex expression, ensure back
> + *number_of_iterationsm1 is an INTEGER_CST as well. See
> + PR113210. */
> + *number_of_iterationsm1
> + = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), niter,
> + build_minus_one_cst (TREE_TYPE (niter)));
> + }
> + }
> *number_of_iterations = niter;
> }
>
> --- gcc/testsuite/gcc.c-torture/compile/pr113210.c.jj 2024-01-08 16:17:16.672620793 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr113210.c 2024-01-08 16:17:16.671620807 +0100
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/113210 */
> +
> +unsigned char a, c;
> +unsigned short b;
> +
> +void
> +foo (void)
> +{
> + c = a + 255;
> + b = c;
> + while (++b > 256)
> + ;
> +}
>
> Jakub
>
>
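
To see why NITERS is 1 and NITERSM1 is 0 for the new test, the loop from
pr113210.c can be instrumented to count how often its condition is evaluated.
This is a minimal sketch, not part of the patch: count_condition_evals is a
hypothetical helper that takes a as a parameter instead of reading the global,
and it assumes a 16-bit unsigned short.

```c
#include <assert.h>

/* Hypothetical instrumented copy of foo from pr113210.c: returns how
   many times the loop condition is evaluated for a given value of A.  */
static unsigned int
count_condition_evals (unsigned char a)
{
  unsigned char c = a + 255;  /* truncated to 0..255 */
  unsigned short b = c;
  unsigned int evals = 0;
  /* Same control flow as "while (++b > 256) ;", with a counter that is
     bumped right before each evaluation of the condition.  */
  do
    evals++;
  while (++b > 256);
  return evals;
}
```

Since b starts at no more than 255, ++b is at most 256 and the condition
fails on its first evaluation, so the helper returns 1 for every a: the loop
header runs once and the latch never executes.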