middle-end: Prevent the use of the cond inversion detection code when both conditions are external. [PR105197]

Message ID VI1PR08MB53253C42D28B84ABC462D366FFEA9@VI1PR08MB5325.eurprd08.prod.outlook.com
State Committed
Commit 78c718490bc2843d4dadcef8a0ae14aed1d15a32
Headers
Series middle-end: Prevent the use of the cond inversion detection code when both conditions are external. [PR105197] |

Commit Message

Tamar Christina April 11, 2022, 1:15 p.m. UTC
  Hi All,

Previously ifcvt used to enforce that a mask A and the inverse of said mask be
represented as ~A. So for the masks

  _25 = _6 != 0;
  _44 = _4 != 0;

ifcvt would produce for an operation requiring the inverse of said mask

  _26 = ~_25;
  _43 = ~_44;

but now that VN is applied to the entire function body we get a simplification
on the mask and produce:

  _26 = _6 == 0;
  _43 = _4 == 0;

This in itself is not a problem semantically speaking (though it does create
more masks that need to be tracked) but when vectorizing the masked conditional
we would still detect _26 and _43 to be inverses of _25 and _44 and mark them
as requiring their operands be swapped.

When vectorizing we swap the operands but don't find the BIT_NOT_EXPR to remove
and so we leave the condition as is which produces invalid code:

------>vectorizing statement: _ifc__41 = _43 ? 0 : _ifc__40;
created new init_stmt: vect_cst__136 = { 0, ... }
add new stmt: _137 = mask__43.26_135 & loop_mask_111
note:  add new stmt: vect__ifc__41.27_138 = VEC_COND_EXPR <_137, vect__ifc__40.25_133, vect_cst__136>;

This fixes disabling the inversion detection code when the loop isn't masked
since both conditional would be external.  We'd then not use the new cond_code
and would incorrectly still swap the operands.

The resulting code is also better than GCC-11 with most operations now
predicated on the loop mask rather than a ptrue.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/105197
	* tree-vect-stmts.cc (vectorizable_condition): Prevent cond swap when
	not masked.

gcc/testsuite/ChangeLog:

	PR target/105197
	* gcc.target/aarch64/sve/pr105197-1.c: New test.
	* gcc.target/aarch64/sve/pr105197-2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr105197-1.c b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e33532d8bed5f90f216817a6692544eae1f3ae3f


--
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr105197-1.c b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e33532d8bed5f90f216817a6692544eae1f3ae3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-1.c
@@ -0,0 +1,20 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-additional-options "-O -ftree-vectorize" } */
+
+unsigned char arr_7[9][3];
+unsigned char (*main_arr_7)[3] = arr_7;
+int main() {
+  char arr_2[9];
+  int arr_6[9];
+  int x;
+  unsigned i;
+  for (i = 0; i < 9; ++i) {
+    arr_2[i] = 21;
+    arr_6[i] = 6;
+  }
+  for (i = arr_2[8] - 21; i < 2; i++)
+    x = arr_6[i] ? (main_arr_7[8][i] ? main_arr_7[8][i] : 8) : (char)arr_6[i];
+  if (x != 8)
+    __builtin_abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr105197-2.c b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..5eec5cd837d786390c441fc5ddd2f93c1374d3a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O -ftree-vectorize" } */
+
+void f(int n, int y, char *arr_2, char *arr_6) {
+  for (int i = y; i < n; i++)
+    arr_6[i] = arr_6[i] ? (arr_2[i] ? 3 : 8) : 1;
+}
+
+/* { dg-final { scan-assembler-not {\tand\tp[0-9]+.b} } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5c9e8cfefa5032d39a11696b06cff9ae50f4d46a..a680f991e07f7b147d1fa64e9464d0f7ed0d843f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10493,7 +10493,7 @@ vectorizable_condition (vec_info *vinfo,
 	      bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
 	      tree_code orig_code = cond.code;
 	      cond.code = invert_tree_comparison (cond.code, honor_nans);
-	      if (loop_vinfo->scalar_cond_masked_set.contains (cond))
+	      if (!masked && loop_vinfo->scalar_cond_masked_set.contains (cond))
 		{
 		  masks = &LOOP_VINFO_MASKS (loop_vinfo);
 		  cond_code = cond.code;
  

Comments

Richard Biener April 11, 2022, 1:18 p.m. UTC | #1
On Mon, 11 Apr 2022, Tamar Christina wrote:

> Hi All,
> 
> Previously ifcvt used to enforce that a mask A and the inverse of said mask be
> represented as ~A. So for the masks
> 
>   _25 = _6 != 0;
>   _44 = _4 != 0;
> 
> ifcvt would produce for an operation requiring the inverse of said mask
> 
>   _26 = ~_25;
>   _43 = ~_44;
> 
> but now that VN is applied to the entire function body we get a simplification
> on the mask and produce:
> 
>   _26 = _6 == 0;
>   _43 = _4 == 0;
> 
> This in itself is not a problem semantically speaking (though it does create
> more masks that need to be tracked) but when vectorizing the masked conditional
> we would still detect _26 and _43 to be inverses of _25 and _44 and mark them
> as requiring their operands be swapped.
> 
> When vectorizing we swap the operands but don't find the BIT_NOT_EXPR to remove
> and so we leave the condition as is which produces invalid code:
> 
> ------>vectorizing statement: _ifc__41 = _43 ? 0 : _ifc__40;
> created new init_stmt: vect_cst__136 = { 0, ... }
> add new stmt: _137 = mask__43.26_135 & loop_mask_111
> note:  add new stmt: vect__ifc__41.27_138 = VEC_COND_EXPR <_137, vect__ifc__40.25_133, vect_cst__136>;
> 
> This fixes disabling the inversion detection code when the loop isn't masked
> since both conditional would be external.  We'd then not use the new cond_code
> and would incorrectly still swap the operands.
> 
> The resulting code is also better than GCC-11 with most operations now
> predicated on the loop mask rather than a ptrue.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

LGTM.

Thansk,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR target/105197
> 	* tree-vect-stmts.cc (vectorizable_condition): Prevent cond swap when
> 	not masked.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR target/105197
> 	* gcc.target/aarch64/sve/pr105197-1.c: New test.
> 	* gcc.target/aarch64/sve/pr105197-2.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr105197-1.c b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..e33532d8bed5f90f216817a6692544eae1f3ae3f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-additional-options "-O -ftree-vectorize" } */
> +
> +unsigned char arr_7[9][3];
> +unsigned char (*main_arr_7)[3] = arr_7;
> +int main() {
> +  char arr_2[9];
> +  int arr_6[9];
> +  int x;
> +  unsigned i;
> +  for (i = 0; i < 9; ++i) {
> +    arr_2[i] = 21;
> +    arr_6[i] = 6;
> +  }
> +  for (i = arr_2[8] - 21; i < 2; i++)
> +    x = arr_6[i] ? (main_arr_7[8][i] ? main_arr_7[8][i] : 8) : (char)arr_6[i];
> +  if (x != 8)
> +    __builtin_abort ();
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr105197-2.c b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..5eec5cd837d786390c441fc5ddd2f93c1374d3a4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-2.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O -ftree-vectorize" } */
> +
> +void f(int n, int y, char *arr_2, char *arr_6) {
> +  for (int i = y; i < n; i++)
> +    arr_6[i] = arr_6[i] ? (arr_2[i] ? 3 : 8) : 1;
> +}
> +
> +/* { dg-final { scan-assembler-not {\tand\tp[0-9]+.b} } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 5c9e8cfefa5032d39a11696b06cff9ae50f4d46a..a680f991e07f7b147d1fa64e9464d0f7ed0d843f 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -10493,7 +10493,7 @@ vectorizable_condition (vec_info *vinfo,
>  	      bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
>  	      tree_code orig_code = cond.code;
>  	      cond.code = invert_tree_comparison (cond.code, honor_nans);
> -	      if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> +	      if (!masked && loop_vinfo->scalar_cond_masked_set.contains (cond))
>  		{
>  		  masks = &LOOP_VINFO_MASKS (loop_vinfo);
>  		  cond_code = cond.code;
> 
> 
>
  

Patch

--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-1.c
@@ -0,0 +1,20 @@ 
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-additional-options "-O -ftree-vectorize" } */
+
+unsigned char arr_7[9][3];
+unsigned char (*main_arr_7)[3] = arr_7;
+int main() {
+  char arr_2[9];
+  int arr_6[9];
+  int x;
+  unsigned i;
+  for (i = 0; i < 9; ++i) {
+    arr_2[i] = 21;
+    arr_6[i] = 6;
+  }
+  for (i = arr_2[8] - 21; i < 2; i++)
+    x = arr_6[i] ? (main_arr_7[8][i] ? main_arr_7[8][i] : 8) : (char)arr_6[i];
+  if (x != 8)
+    __builtin_abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr105197-2.c b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..5eec5cd837d786390c441fc5ddd2f93c1374d3a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr105197-2.c
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-O -ftree-vectorize" } */
+
+void f(int n, int y, char *arr_2, char *arr_6) {
+  for (int i = y; i < n; i++)
+    arr_6[i] = arr_6[i] ? (arr_2[i] ? 3 : 8) : 1;
+}
+
+/* { dg-final { scan-assembler-not {\tand\tp[0-9]+.b} } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5c9e8cfefa5032d39a11696b06cff9ae50f4d46a..a680f991e07f7b147d1fa64e9464d0f7ed0d843f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10493,7 +10493,7 @@  vectorizable_condition (vec_info *vinfo,
 	      bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
 	      tree_code orig_code = cond.code;
 	      cond.code = invert_tree_comparison (cond.code, honor_nans);
-	      if (loop_vinfo->scalar_cond_masked_set.contains (cond))
+	      if (!masked && loop_vinfo->scalar_cond_masked_set.contains (cond))
 		{
 		  masks = &LOOP_VINFO_MASKS (loop_vinfo);
 		  cond_code = cond.code;