vect: Ensure both NITERSM1 and NITERS are INTEGER_CSTs or neither of them [PR113210]

Message ID ZZ0HuOnrY+ZxfYQu@tucnak
State New

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply

Commit Message

Jakub Jelinek Jan. 9, 2024, 8:45 a.m. UTC
  Hi!

On the following testcase, e.g. on riscv64 or aarch64 (the latter with
-O3 -march=armv8-a+sve), we ICE, because while NITERS is INTEGER_CST,
NITERSM1 is a complex expression like
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0
where a.0_1 is unsigned char.  The condition is never true, so the above
is equivalent to just 0, but only when trying to fold the above with
PLUS_EXPR 1 we manage to simplify it (first
~(short unsigned int) (a.0_1 + 255)
to
-(short unsigned int) (a.0_1 + 255)
and then
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? -(short unsigned int) (a.0_1 + 255) : 1
to
(short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1
and only at this point we fold the condition to be false).
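
A minimal sketch, not part of the patch, that checks the claim above by brute
force: for every unsigned char value of a.0_1 the NITERSM1 expression
evaluates to 0, and adding 1 gives 1.  The function names are hypothetical,
and plain C integer arithmetic with explicit casts is assumed to mirror the
GIMPLE types (unsigned char addition that wraps, then a conversion to
short unsigned int).

```c
/* Hypothetical helper: the NITERSM1 expression from the message,
   evaluated for one concrete value of a.0_1.  */
unsigned short
nitersm1_for (unsigned char a0)
{
  /* (short unsigned int) (a.0_1 + 255), with the addition done in
     unsigned char, so x is always in the range 0..255.  */
  unsigned short x = (unsigned short) (unsigned char) (a0 + 255);

  /* x + 1 is at most 256, so the condition is never true.  */
  return (unsigned short) (x + 1) > 256 ? (unsigned short) ~x : 0;
}

/* Hypothetical helper: NITERS is NITERSM1 + 1 in the same type.  */
unsigned short
niters_for (unsigned char a0)
{
  return (unsigned short) (nitersm1_for (a0) + 1);
}

/* Count the inputs for which NITERSM1 != 0 or NITERS != 1.  */
int
count_non_constant_inputs (void)
{
  int count = 0;
  for (int a = 0; a <= 255; a++)
    if (nitersm1_for ((unsigned char) a) != 0
	|| niters_for ((unsigned char) a) != 1)
      count++;
  return count;
}
```

If count_non_constant_inputs returns 0, both values are constant over the
whole input range, even though only the second one gets folded to an
INTEGER_CST.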

But the vectorizer seems to assume that if NITERS is known (i.e. a suitable
INTEGER_CST) then NITERSM1 is as well, so the following hack ensures that if
NITERS folds into an INTEGER_CST, NITERSM1 will be an INTEGER_CST too.
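
As a rough illustration of that invariant only, here is a toy model in which
a hypothetical struct toy_expr stands in for a GCC tree: it is either a known
constant or an unfolded expression.  None of these names exist in GCC; the
real change uses fold_build2 and build_minus_one_cst on trees, as in the
patch itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a tree: either a known constant
   (like INTEGER_CST) or some expression that was not folded.  */
struct toy_expr
{
  bool is_constant;
  unsigned int value;	/* Meaningful only when is_constant is true.  */
};

/* If NITERS is constant but NITERSM1 is not, recompute NITERSM1 as
   NITERS - 1.  Unsigned arithmetic is used, so NITERS == 0 wraps
   around to UINT_MAX instead of overflowing.  */
void
toy_sync_nitersm1 (const struct toy_expr *niters,
		   struct toy_expr *nitersm1)
{
  if (niters->is_constant && !nitersm1->is_constant)
    {
      nitersm1->is_constant = true;
      nitersm1->value = niters->value - 1u;
    }
}
```

After the call, either both values are constants or neither is, which is the
property the vectorizer appears to rely on.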

Bootstrapped/regtested on x86_64-linux and i686-linux, and additionally
tested the testcase with a cross to aarch64-linux using
-O3 -march=armv8-a+sve.  Ok for trunk?

2024-01-09  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/113210
	* tree-vect-loop.cc (vect_get_loop_niters): If non-INTEGER_CST
	value in *number_of_iterationsm1 PLUS_EXPR 1 is folded into
	INTEGER_CST, recompute *number_of_iterationsm1 as the INTEGER_CST
	minus 1.

	* gcc.c-torture/compile/pr113210.c: New test.


	Jakub
  

Comments

Richard Biener Jan. 9, 2024, 9:15 a.m. UTC | #1
On Tue, 9 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase e.g. on riscv64 or aarch64 (latter with
> -O3 -march=armv8-a+sve ) we ICE, because while NITERS is INTEGER_CST,
> NITERSM1 is a complex expression like
> (short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0
> where a.0_1 is unsigned char.  The condition is never true, so the above
> is equivalent to just 0, but only when trying to fold the above with
> PLUS_EXPR 1 we manage to simplify it (first
> ~(short unsigned int) (a.0_1 + 255)
> to
> -(short unsigned int) (a.0_1 + 255)
> and then
> (short unsigned int) (a.0_1 + 255) + 1 > 256 ? -(short unsigned int) (a.0_1 + 255) : 1
> to
> (short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1
> and only at this point we fold the condition to be false.
> 
> But the vectorizer seems to assume that if NITERS is known (i.e. suitable
> INTEGER_CST) then NITERSM1 also is, so the following hack ensures that if
> NITERS folds into INTEGER_CST NITERSM1 will be one as well.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
> with cross to aarch64-linux with that -O3 -march=armv8-a+sve on the
> testcase, ok for trunk?

OK.

  

Patch

--- gcc/tree-vect-loop.cc.jj	2024-01-08 16:13:18.682939712 +0100
+++ gcc/tree-vect-loop.cc	2024-01-08 16:30:24.062626368 +0100
@@ -941,9 +941,22 @@  vect_get_loop_niters (class loop *loop,
 	 ???  For UINT_MAX latch executions this number overflows to zero
 	 for loops like do { n++; } while (n != 0);  */
       if (niter && !chrec_contains_undetermined (niter))
+	{
 	  niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
 			       unshare_expr (niter),
 			       build_int_cst (TREE_TYPE (niter), 1));
+	  if (TREE_CODE (niter) == INTEGER_CST
+	      && TREE_CODE (*number_of_iterationsm1) != INTEGER_CST)
+	    {
+	      /* If we manage to fold niter + 1 into INTEGER_CST even when
+		 niter is some complex expression, ensure back
+		 *number_of_iterationsm1 is an INTEGER_CST as well.  See
+		 PR113210.  */
+	      *number_of_iterationsm1
+		= fold_build2 (PLUS_EXPR, TREE_TYPE (niter), niter,
+			       build_minus_one_cst (TREE_TYPE (niter)));
+	    }
+	}
       *number_of_iterations = niter;
     }
 
--- gcc/testsuite/gcc.c-torture/compile/pr113210.c.jj	2024-01-08 16:17:16.672620793 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr113210.c	2024-01-08 16:17:16.671620807 +0100
@@ -0,0 +1,13 @@ 
+/* PR tree-optimization/113210 */
+
+unsigned char a, c;
+unsigned short b;
+
+void
+foo (void)
+{
+  c = a + 255;
+  b = c;
+  while (++b > 256)
+    ;
+}