convert (xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) [PR90323]

Message ID 20211229012706.3244946-1-luoxhu@linux.ibm.com
State New
Headers
Series convert (xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) [PR90323] |

Commit Message

Xionghu Luo Dec. 29, 2021, 1:27 a.m. UTC
  Bootstrapped and regtested on powerpc64le-linux-gnu {P10,P9}
powerpc64-linux-gnu {P8, P7} and X86.  OK for master?

gcc/ChangeLog:

	PR 90323
	* simplify-rtx.c (simplify_context::simplify_binary_operation_1): Relax
	C from constant to constant or reg.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr90323.c: New test.
---
 gcc/simplify-rtx.c                         | 11 ++++----
 gcc/testsuite/gcc.target/powerpc/pr90323.c | 33 ++++++++++++++++++++++
 2 files changed, 39 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr90323.c
  

Comments

Jeff Law Dec. 30, 2021, 4:22 p.m. UTC | #1
On 12/28/2021 6:27 PM, Xionghu Luo via Gcc-patches wrote:
> Bootstrapped and regtested on powerpc64le-linux-gnu {P10,P9}
> powerpc64-linux-gnu {P8, P7} and X86.  OK for master?
>
> gcc/ChangeLog:
>
> 	PR 90323
> 	* simplify-rtx.c (simplify_context::simplify_binary_operation_1): Relax
> 	C from constant to constant or reg.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/pr90323.c: New test.
If C is not a constant and the target does not have and-complement 
instructions, then this is likely worse than the original.  If you want 
to do this transformation, then you probably need to be checking target 
costs.

Jeff
  
Jakub Jelinek Dec. 30, 2021, 4:27 p.m. UTC | #2
On Thu, Dec 30, 2021 at 09:22:51AM -0700, Jeff Law via Gcc-patches wrote:
> On 12/28/2021 6:27 PM, Xionghu Luo via Gcc-patches wrote:
> > Bootstrapped and regtested on powerpc64le-linux-gnu {P10,P9}
> > powerpc64-linux-gnu {P8, P7} and X86.  OK for master?
> > 
> > gcc/ChangeLog:
> > 
> > 	PR 90323
> > 	* simplify-rtx.c (simplify_context::simplify_binary_operation_1): Relax
> > 	C from constant to constant or reg.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.target/powerpc/pr90323.c: New test.
> If C is not a constant and the target does not have and-complement
> instructions, then this is likely worse than the original.  If you want to
> do this transformation, then you probably need to be checking target costs.

I'm not sure we can check costs because what simplify_* is fed especially
during combine might be far from what is a valid instruction and checking
costs on something that isn't valid could lead to bogus results.
Perhaps check if the andnot optab exist for the mode, except we don't have
such an optab...

	Jakub
  
Jeff Law Dec. 30, 2021, 4:30 p.m. UTC | #3
On 12/30/2021 9:27 AM, Jakub Jelinek wrote:
> On Thu, Dec 30, 2021 at 09:22:51AM -0700, Jeff Law via Gcc-patches wrote:
>> On 12/28/2021 6:27 PM, Xionghu Luo via Gcc-patches wrote:
>>> Bootstrapped and regtested on powerpc64le-linux-gnu {P10,P9}
>>> powerpc64-linux-gnu {P8, P7} and X86.  OK for master?
>>>
>>> gcc/ChangeLog:
>>>
>>> 	PR 90323
>>> 	* simplify-rtx.c (simplify_context::simplify_binary_operation_1): Relax
>>> 	C from constant to constant or reg.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/powerpc/pr90323.c: New test.
>> If C is not a constant and the target does not have and-complement
>> instructions, then this is likely worse than the original.  If you want to
>> do this transformation, then you probably need to be checking target costs.
> I'm not sure we can check costs because what simplify_* is fed especially
> during combine might be far from what is a valid instruction and checking
> costs on something that isn't valid could lead to bogus results.
> Perhaps check if the andnot optab exist for the mode, except we don't have
> such an optab...
Should we just defer this to the next stage1?  That would give plenty of 
time to wire up an optab and test it on the appropriate targets.


jeff
  
Segher Boessenkool Dec. 30, 2021, 5:01 p.m. UTC | #4
On Thu, Dec 30, 2021 at 09:22:51AM -0700, Jeff Law wrote:
> On 12/28/2021 6:27 PM, Xionghu Luo via Gcc-patches wrote:
> >	PR 90323
> >	* simplify-rtx.c (simplify_context::simplify_binary_operation_1): 
> >	Relax
> >	C from constant to constant or reg.
> >
> >gcc/testsuite/ChangeLog:
> >
> >	* gcc.target/powerpc/pr90323.c: New test.
> If C is not a constant and the target does not have and-complement 
> instructions, then this is likely worse than the original.  If you want 
> to do this transformation, then you probably need to be checking target 
> costs.

It even then is still not worse on any modern superscalar machine!


Segher
  
Segher Boessenkool Dec. 30, 2021, 6:56 p.m. UTC | #5
Hi!

On Tue, Dec 28, 2021 at 07:27:06PM -0600, Xionghu Luo wrote:
> Bootstrapped and regtested on powerpc64le-linux-gnu {P10,P9}
> powerpc64-linux-gnu {P8, P7} and X86.  OK for master?
> 
> gcc/ChangeLog:
> 
> 	PR 90323
> 	* simplify-rtx.c (simplify_context::simplify_binary_operation_1): Relax
> 	C from constant to constant or reg.

So, this doesn't even do the right thing for PowerPC, for scalar
operations that is: we only have rl[wd]imi for this, and nothing that
uses a register for the mask instead.  Not that it will hurt much at all
(if anything) in practice, but :-)

OTOH for vectors we have vsel/xxsel, which always uses a register for
the mask.

On yet another hand, the formulation with two XORs is just obfuscation,
for RTL.  RTL very much does not have the rule that fewer ops is better.

So, ideally we will never get this nonsense in RTL at all.  Probably the
simplest / best way to get there is to not have it in Gimple either, and
instead just use some "select" operation there, which can be optimised
much better anyway?

And yeah that will be GCC 13 stuff :-(


Segher
  

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index a060f1bbce0..be240b2979e 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -3581,12 +3581,13 @@  simplify_context::simplify_binary_operation_1 (rtx_code code,
 	    }
 	}
 
-      /* If we have (xor (and (xor A B) C) A) with C a constant we can instead
-	 do (ior (and A ~C) (and B C)) which is a machine instruction on some
-	 machines, and also has shorter instruction path length.  */
+      /* If we have (xor (and (xor A B) C) A) with C a constant or register
+	 we can instead do (ior (and A ~C) (and B C)) which is a machine
+	 instruction on some machines, and also has shorter instruction path
+	 length.  */
       if (GET_CODE (op0) == AND
 	  && GET_CODE (XEXP (op0, 0)) == XOR
-	  && CONST_INT_P (XEXP (op0, 1))
+	  && (CONST_INT_P (XEXP (op0, 1)) || REG_P (XEXP (op0, 1)))
 	  && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
 	{
 	  rtx a = trueop1;
@@ -3600,7 +3601,7 @@  simplify_context::simplify_binary_operation_1 (rtx_code code,
       /* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C))  */
       else if (GET_CODE (op0) == AND
 	  && GET_CODE (XEXP (op0, 0)) == XOR
-	  && CONST_INT_P (XEXP (op0, 1))
+	  && (CONST_INT_P (XEXP (op0, 1)) || REG_P (XEXP (op0, 1)))
 	  && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
 	{
 	  rtx a = XEXP (XEXP (op0, 0), 0);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr90323.c b/gcc/testsuite/gcc.target/powerpc/pr90323.c
new file mode 100644
index 00000000000..e80609c477f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr90323.c
@@ -0,0 +1,33 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include <altivec.h>
+#include <stdio.h>
+volatile vector unsigned orig = {0xebebebeb, 0x34343434, 0x76767676, 0x12121212};
+volatile vector unsigned mask = {0xffffffff, 0, 0xffffffff, 0};
+volatile vector unsigned fill = {0xfefefefe, 0xaaaaaaaa, 0xbbbbbbbb, 0xcccccccc};
+volatile vector unsigned expected = {0xfefefefe, 0x34343434, 0xbbbbbbbb, 0x12121212};
+__attribute__((noinline)) vector unsigned
+without_sel(vector unsigned l, vector unsigned r, vector unsigned mask) {
+  l = l & ~mask;
+  l |= mask & r;
+  return l;
+}
+
+__attribute__((noinline)) vector unsigned
+with_sel(vector unsigned l, vector unsigned r, vector unsigned mask) {
+  return vec_sel(l, r, mask);
+}
+
+int main() {
+  vector unsigned res1 = without_sel(orig, fill, mask);
+  vector unsigned res2 = with_sel(orig, fill, mask);
+  if (!vec_all_eq(res1, expected))
+    printf("error1\n");
+  if (!vec_all_eq(res2, expected))
+    printf("error2\n");
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times {\mxxsel\M} 2 } } */