bswap: Fix up symbolic merging for xor and plus [PR103376]

Message ID 20211124083539.GI2646553@tucnak
State Committed
Series bswap: Fix up symbolic merging for xor and plus [PR103376]

Commit Message

Jakub Jelinek Nov. 24, 2021, 8:35 a.m. UTC
  On Mon, Nov 22, 2021 at 08:39:42AM -0000, Roger Sayle wrote:
> This patch implements PR tree-optimization/103345 to merge adjacent
> loads when combined with addition or bitwise xor.  The current code
> in gimple-ssa-store-merging.c's find_bswap_or_nop already handles ior,
> so that all that's required is to treat PLUS_EXPR and BIT_XOR_EXPR in
> the same way as BIT_IOR_EXPR.

Unfortunately they aren't exactly the same.  They behave the same only
as long as at least one operand (or the corresponding byte in it) is
known to be 0: 0 | 0 = 0 ^ 0 = 0 + 0 = 0.  But for | we also have
x | x = x for any other x, so perform_symbolic_merge has been accepting
either that at least one of the bytes is 0 or that both are the same,
which is wrong for ^ and +.
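
To make the difference concrete, here is a tiny standalone example
(illustration only, not part of the patch) showing how the three
operations behave when the same non-zero byte appears in both operands:

  /* Illustration only: x | x keeps the byte, x ^ x clears it, and
     x + x can even carry into the byte above.  */
  #include <stdio.h>

  int
  main (void)
  {
    unsigned int x = 0xff;                  /* same low byte in both operands */
    printf ("x | x = 0x%04x\n", x | x);     /* 0x00ff: byte preserved */
    printf ("x ^ x = 0x%04x\n", x ^ x);     /* 0x0000: byte cleared */
    printf ("x + x = 0x%04x\n", x + x);     /* 0x01fe: carry into upper byte */
    return 0;
  }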

The following patch fixes that by passing the code of the binary
operation down to perform_symbolic_merge and allowing non-zero
masked1 == masked2 only for BIT_IOR_EXPR.
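
As a minimal standalone sketch of that per-byte test (the names
bytes_mergeable_p and merge_code are made up for illustration; in the
real pass the check sits inside perform_symbolic_merge and works on the
marker bytes of symbolic_number::n):

  /* Sketch, not the GCC sources: each byte of n1/n2 is a symbolic marker,
     with 0 meaning "this byte is known to be zero".  Merging a byte is
     safe if at least one marker is 0; for BIT_IOR_EXPR two identical
     non-zero markers are also fine because x | x == x.  */
  #include <stdbool.h>
  #include <stdint.h>

  enum merge_code { CODE_IOR, CODE_XOR, CODE_PLUS };

  bool
  bytes_mergeable_p (uint64_t n1, uint64_t n2, int size, enum merge_code code)
  {
    uint64_t mask = 0xff;
    for (int i = 0; i < size; i++, mask <<= 8)
      {
        uint64_t masked1 = n1 & mask;
        uint64_t masked2 = n2 & mask;
        if (masked1 && masked2 && (code != CODE_IOR || masked1 != masked2))
          return false;
      }
    return true;
  }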

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
We could allow the masked1 == masked2 case for it, but we would need to
do something different from the
  n->n = n1->n | n2->n;
we do on all the bytes together.
In particular, for masked1 == masked2, if masked1 != 0 (well, for 0
both variants are the same) and masked1 != 0xff, we would need to
clear the corresponding n->n byte instead of setting it to the input,
as x ^ x = 0 (but if we don't know what x and y are, the result is
also unknown).  Now, for plus it is much harder, because not only do
we not know what the result is for non-zero operands, but it can
modify upper bytes as well.  So perhaps, when the current byte has
both masked1 and masked2 non-zero, set the resulting byte to 0xff
(unknown) only if the byte above it is 0 in both operands, and set
that resulting byte above to 0xff too.
Also, even for |, instead of returning NULL we could just set the
resulting byte to 0xff if the two bytes differ; perhaps it will be
masked off later on.
Ok to handle that incrementally?
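
For reference, a rough standalone sketch of what the XOR part of that
idea could look like (merge_xor_bytes is an invented name, the byte
value 0xff stands in for the pass's "unknown" marker, and none of this
is implemented by the patch below):

  /* Sketch of the incremental XOR idea only, not part of this patch:
     when the same known non-zero marker appears in a byte of both
     inputs, x ^ x == 0, so the result byte can be recorded as zero
     instead of giving up; anything else unknown becomes 0xff.  The
     PLUS case would additionally have to mark the byte above as
     unknown (or bail out) because of possible carries.  */
  #include <stdint.h>

  uint64_t
  merge_xor_bytes (uint64_t n1, uint64_t n2, int size)
  {
    uint64_t res = 0;
    uint64_t mask = 0xff;
    for (int i = 0; i < size; i++, mask <<= 8)
      {
        uint64_t b1 = n1 & mask, b2 = n2 & mask;
        if (b1 == 0 || b2 == 0)
          res |= b1 | b2;    /* one side is known zero: keep the other */
        else if (b1 == b2 && b1 != mask)
          ;                  /* same known byte: x ^ x == 0, leave it zero */
        else
          res |= mask;       /* different or unknown bytes: result unknown */
      }
    return res;
  }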

2021-11-24  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/103376
	* gimple-ssa-store-merging.c (perform_symbolic_merge): Add CODE
	argument.  If CODE is not BIT_IOR_EXPR, ensure that one of masked1
	or masked2 is 0.
	(find_bswap_or_nop_1, find_bswap_or_nop,
	imm_store_chain_info::try_coalesce_bswap): Adjust
	perform_symbolic_merge callers.

	* gcc.c-torture/execute/pr103376.c: New test.



	Jakub
  

Comments

Richard Biener Nov. 24, 2021, 8:45 a.m. UTC | #1
On Wed, 24 Nov 2021, Jakub Jelinek wrote:

> On Mon, Nov 22, 2021 at 08:39:42AM -0000, Roger Sayle wrote:
> > This patch implements PR tree-optimization/103345 to merge adjacent
> > loads when combined with addition or bitwise xor.  The current code
> in gimple-ssa-store-merging.c's find_bswap_or_nop already handles ior,
> so that all that's required is to treat PLUS_EXPR and BIT_XOR_EXPR in
> the same way as BIT_IOR_EXPR.
> 
> Unfortunately they aren't exactly the same.  They behave the same only
> as long as at least one operand (or the corresponding byte in it) is
> known to be 0: 0 | 0 = 0 ^ 0 = 0 + 0 = 0.  But for | we also have
> x | x = x for any other x, so perform_symbolic_merge has been accepting
> either that at least one of the bytes is 0 or that both are the same,
> which is wrong for ^ and +.
> 
> The following patch fixes that by passing the code of the binary
> operation down to perform_symbolic_merge and allowing non-zero
> masked1 == masked2 only for BIT_IOR_EXPR.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
> We could allow the masked1 == masked2 case for it, but we would need to
> do something different from the
>   n->n = n1->n | n2->n;
> we do on all the bytes together.
> In particular, for masked1 == masked2, if masked1 != 0 (well, for 0
> both variants are the same) and masked1 != 0xff, we would need to
> clear the corresponding n->n byte instead of setting it to the input,
> as x ^ x = 0 (but if we don't know what x and y are, the result is
> also unknown).  Now, for plus it is much harder, because not only do
> we not know what the result is for non-zero operands, but it can
> modify upper bytes as well.  So perhaps, when the current byte has
> both masked1 and masked2 non-zero, set the resulting byte to 0xff
> (unknown) only if the byte above it is 0 in both operands, and set
> that resulting byte above to 0xff too.
> Also, even for |, instead of returning NULL we could just set the
> resulting byte to 0xff if the two bytes differ; perhaps it will be
> masked off later on.
> Ok to handle that incrementally?

Not sure if it is worth the trouble - the XOR handling sounds
straightforward at least.  But sure, the merging routine could
simply be conservatively correct here.

Thanks,
Richard.

> 2021-11-24  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/103376
> 	* gimple-ssa-store-merging.c (perform_symbolic_merge): Add CODE
> 	argument.  If CODE is not BIT_IOR_EXPR, ensure that one of masked1
> 	or masked2 is 0.
> 	(find_bswap_or_nop_1, find_bswap_or_nop,
> 	imm_store_chain_info::try_coalesce_bswap): Adjust
> 	perform_symbolic_merge callers.
> 
> 	* gcc.c-torture/execute/pr103376.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.c.jj	2021-11-23 10:26:30.000000000 +0100
> +++ gcc/gimple-ssa-store-merging.c	2021-11-23 11:49:33.806168782 +0100
> @@ -434,14 +434,14 @@ find_bswap_or_nop_load (gimple *stmt, tr
>    return true;
>  }
>  
> -/* Compute the symbolic number N representing the result of a bitwise OR on 2
> -   symbolic number N1 and N2 whose source statements are respectively
> -   SOURCE_STMT1 and SOURCE_STMT2.  */
> +/* Compute the symbolic number N representing the result of a bitwise OR,
> +   bitwise XOR or plus on 2 symbolic number N1 and N2 whose source statements
> +   are respectively SOURCE_STMT1 and SOURCE_STMT2.  CODE is the operation.  */
>  
>  gimple *
>  perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
>  			gimple *source_stmt2, struct symbolic_number *n2,
> -			struct symbolic_number *n)
> +			struct symbolic_number *n, enum tree_code code)
>  {
>    int i, size;
>    uint64_t mask;
> @@ -563,7 +563,9 @@ perform_symbolic_merge (gimple *source_s
>  
>        masked1 = n1->n & mask;
>        masked2 = n2->n & mask;
> -      if (masked1 && masked2 && masked1 != masked2)
> +      /* For BIT_XOR_EXPR or PLUS_EXPR, at least one of masked1 and masked2
> +	 has to be 0, for BIT_IOR_EXPR x | x is still x.  */
> +      if (masked1 && masked2 && (code != BIT_IOR_EXPR || masked1 != masked2))
>  	return NULL;
>      }
>    n->n = n1->n | n2->n;
> @@ -769,7 +771,8 @@ find_bswap_or_nop_1 (gimple *stmt, struc
>  	    return NULL;
>  
>  	  source_stmt
> -	    = perform_symbolic_merge (source_stmt1, &n1, source_stmt2, &n2, n);
> +	    = perform_symbolic_merge (source_stmt1, &n1, source_stmt2, &n2, n,
> +				      code);
>  
>  	  if (!source_stmt)
>  	    return NULL;
> @@ -943,7 +946,8 @@ find_bswap_or_nop (gimple *stmt, struct
>  	      else if (!do_shift_rotate (LSHIFT_EXPR, &n0, eltsz))
>  		return NULL;
>  	      ins_stmt
> -		= perform_symbolic_merge (ins_stmt, &n0, source_stmt, &n1, n);
> +		= perform_symbolic_merge (ins_stmt, &n0, source_stmt, &n1, n,
> +					  BIT_IOR_EXPR);
>  
>  	      if (!ins_stmt)
>  		return NULL;
> @@ -2881,7 +2885,7 @@ imm_store_chain_info::try_coalesce_bswap
>  	  end = MAX (end, info->bitpos + info->bitsize);
>  
>  	  ins_stmt = perform_symbolic_merge (ins_stmt, &n, info->ins_stmt,
> -					     &this_n, &n);
> +					     &this_n, &n, BIT_IOR_EXPR);
>  	  if (ins_stmt == NULL)
>  	    return false;
>  	}
> --- gcc/testsuite/gcc.c-torture/execute/pr103376.c.jj	2021-11-23 12:03:38.339948150 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr103376.c	2021-11-23 12:02:44.668723595 +0100
> @@ -0,0 +1,29 @@
> +/* PR tree-optimization/103376 */
> +
> +long long a = 0x123456789abcdef0LL, f;
> +int b, c, *d;
> +
> +__attribute__((noipa)) void
> +foo (int x)
> +{
> +  asm volatile ("" : : "r" (x));
> +}
> +
> +int
> +main ()
> +{
> +  long long e;
> +  e = a;
> +  if (b)
> +    {
> +      foo (c);
> +      d = (int *) 0;
> +      while (*d)
> +	;
> +    }
> +  f = a ^ e;
> +  asm volatile ("" : "+m" (f));
> +  if (f != 0)
> +    __builtin_abort ();
> +  return 0;
> +}
> 
> 
> 	Jakub
> 
>
  

Patch

--- gcc/gimple-ssa-store-merging.c.jj	2021-11-23 10:26:30.000000000 +0100
+++ gcc/gimple-ssa-store-merging.c	2021-11-23 11:49:33.806168782 +0100
@@ -434,14 +434,14 @@  find_bswap_or_nop_load (gimple *stmt, tr
   return true;
 }
 
-/* Compute the symbolic number N representing the result of a bitwise OR on 2
-   symbolic number N1 and N2 whose source statements are respectively
-   SOURCE_STMT1 and SOURCE_STMT2.  */
+/* Compute the symbolic number N representing the result of a bitwise OR,
+   bitwise XOR or plus on 2 symbolic number N1 and N2 whose source statements
+   are respectively SOURCE_STMT1 and SOURCE_STMT2.  CODE is the operation.  */
 
 gimple *
 perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
 			gimple *source_stmt2, struct symbolic_number *n2,
-			struct symbolic_number *n)
+			struct symbolic_number *n, enum tree_code code)
 {
   int i, size;
   uint64_t mask;
@@ -563,7 +563,9 @@  perform_symbolic_merge (gimple *source_s
 
       masked1 = n1->n & mask;
       masked2 = n2->n & mask;
-      if (masked1 && masked2 && masked1 != masked2)
+      /* For BIT_XOR_EXPR or PLUS_EXPR, at least one of masked1 and masked2
+	 has to be 0, for BIT_IOR_EXPR x | x is still x.  */
+      if (masked1 && masked2 && (code != BIT_IOR_EXPR || masked1 != masked2))
 	return NULL;
     }
   n->n = n1->n | n2->n;
@@ -769,7 +771,8 @@  find_bswap_or_nop_1 (gimple *stmt, struc
 	    return NULL;
 
 	  source_stmt
-	    = perform_symbolic_merge (source_stmt1, &n1, source_stmt2, &n2, n);
+	    = perform_symbolic_merge (source_stmt1, &n1, source_stmt2, &n2, n,
+				      code);
 
 	  if (!source_stmt)
 	    return NULL;
@@ -943,7 +946,8 @@  find_bswap_or_nop (gimple *stmt, struct
 	      else if (!do_shift_rotate (LSHIFT_EXPR, &n0, eltsz))
 		return NULL;
 	      ins_stmt
-		= perform_symbolic_merge (ins_stmt, &n0, source_stmt, &n1, n);
+		= perform_symbolic_merge (ins_stmt, &n0, source_stmt, &n1, n,
+					  BIT_IOR_EXPR);
 
 	      if (!ins_stmt)
 		return NULL;
@@ -2881,7 +2885,7 @@  imm_store_chain_info::try_coalesce_bswap
 	  end = MAX (end, info->bitpos + info->bitsize);
 
 	  ins_stmt = perform_symbolic_merge (ins_stmt, &n, info->ins_stmt,
-					     &this_n, &n);
+					     &this_n, &n, BIT_IOR_EXPR);
 	  if (ins_stmt == NULL)
 	    return false;
 	}
--- gcc/testsuite/gcc.c-torture/execute/pr103376.c.jj	2021-11-23 12:03:38.339948150 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr103376.c	2021-11-23 12:02:44.668723595 +0100
@@ -0,0 +1,29 @@ 
+/* PR tree-optimization/103376 */
+
+long long a = 0x123456789abcdef0LL, f;
+int b, c, *d;
+
+__attribute__((noipa)) void
+foo (int x)
+{
+  asm volatile ("" : : "r" (x));
+}
+
+int
+main ()
+{
+  long long e;
+  e = a;
+  if (b)
+    {
+      foo (c);
+      d = (int *) 0;
+      while (*d)
+	;
+    }
+  f = a ^ e;
+  asm volatile ("" : "+m" (f));
+  if (f != 0)
+    __builtin_abort ();
+  return 0;
+}