[v3,rs6000] Disable TImode from Bool expanders [PR100694, PR93123]

Message ID 001654df-a083-c10f-6c0f-dfda0de6e87b@linux.ibm.com
State New
Headers
Series [v3,rs6000] Disable TImode from Bool expanders [PR100694, PR93123] |

Commit Message

HAO CHEN GUI July 4, 2022, 6:27 a.m. UTC
  Hi,
  This patch fails TImode for all 128-bit logical operation expanders. So
TImode splits to two DI registers during expand. Potential optimizations can
be taken after expand pass. Originally, the TImode logical operations are
split after reload pass. It's too late. The test case illustrates it.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is
this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-07-04 Haochen Gui <guihaoc@linux.ibm.com>

gcc/
	PR target/100694
	* config/rs6000/rs6000.md (and<mode>3): Fail TImode.
	(ior<mode>3): Likewise.
	(xor<mode>3): Likewise.
	(nor<mode>3): Likewise.
	(andc<mode>3): Likewise.
	(eqv<mode>3): Likewise.
	(nand<mode>3): Likewise.
	(orc<mode>3): Likewise.
	(one_cmpl<mode>2): Define as an expand and fail TImode.
	(*one_cmpl<mode>2): Define as an anonymous insn pattern.

gcc/testsuite/
	PR target/100694
	* gcc.target/powerpc/pr100694.c: New.
	* gcc.target/powerpc/pr92398.c: New.
	* gcc.target/powerpc/pr92398.h: Remove.
	* gcc.target/powerpc/pr92398.p9-.c: Remove.
	* gcc.target/powerpc/pr92398.p9+.c: Remove.


patch.diff
  

Comments

Segher Boessenkool July 12, 2022, 5:26 p.m. UTC | #1
Hi!

On Mon, Jul 04, 2022 at 02:27:42PM +0800, HAO CHEN GUI wrote:
>   This patch fails TImode for all 128-bit logical operation expanders. So
> TImode splits to two DI registers during expand. Potential optimizations can
> be taken after expand pass. Originally, the TImode logical operations are
> split after reload pass. It's too late. The test case illustrates it.

Out of interest, did you see any performance win?  There is a lot of
opportunity for this to cause performance *loss*, on newer systems :-(

> ChangeLog
> 2022-07-04 Haochen Gui <guihaoc@linux.ibm.com>

Two spaces both before and after your name, in changelogs.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7078,27 +7078,38 @@ (define_expand "subti3"
>  })
>  
>  ;; 128-bit logical operations expanders
> +;; Fail TImode in all 128-bit logical operations expanders and split it into
> +;; two DI registers.
> 
>  (define_expand "and<mode>3"
>    [(set (match_operand:BOOL_128 0 "vlogical_operand")
>  	(and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>  		      (match_operand:BOOL_128 2 "vlogical_operand")))]
>    ""
> -  "")
> +{
> +  if (<MODE>mode == TImode)
> +    FAIL;
> +})

It is better to not FAIL it, but simply not have a pattern for the
TImode version at all.

Does nothing depend on the :TI version to exist?

What about the :PTI version?  Getting rid of that as well will allow
some nice optimisations.

Of course we *do* have instructions to do such TImode ops, on newer
CPUs, but in vector registers only.  It isn't obvious what is faster.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100694.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
> +/* { dg-final { scan-assembler-not {\mli\M} } } */
> +/* { dg-final { scan-assembler-not {\mor\M} } } */
> +
> +/* It just needs two std.  */
> +void foo (unsigned __int128* res, unsigned long long hi, unsigned long long lo)
> +{
> +   unsigned __int128 i = hi;
> +   i <<= 64;
> +   i |= lo;
> +   *res = i;
> +}

You can also just count the number of generated insns:
/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 3 } } */
(three, not two, because of the blr insn at the end).

If possible, we should simply not do :TI ops on older systems at all,
and only on the newer systems that have instructions for it (and that
does not fix PR100694 btw, the problems there have to be solved, not
side-stepped :-( )


Segher
  
HAO CHEN GUI July 18, 2022, 8:33 a.m. UTC | #2
Hi Segher,
  Thanks for your comments.

On 13/7/2022 上午 1:26, Segher Boessenkool wrote:
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7078,27 +7078,38 @@ (define_expand "subti3"
>>  })
>>  
>>  ;; 128-bit logical operations expanders
>> +;; Fail TImode in all 128-bit logical operations expanders and split it into
>> +;; two DI registers.
>>
>>  (define_expand "and<mode>3"
>>    [(set (match_operand:BOOL_128 0 "vlogical_operand")
>>  	(and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>>  		      (match_operand:BOOL_128 2 "vlogical_operand")))]
>>    ""
>> -  "")
>> +{
>> +  if (<MODE>mode == TImode)
>> +    FAIL;
>> +})
> It is better to not FAIL it, but simply not have a pattern for the
> TImode version at all.
> 
> Does nothing depend on the :TI version to exist?
> 
> What about the :PTI version?  Getting rid of that as well will allow
> some nice optimisations.
> 
> Of course we *do* have instructions to do such TImode ops, on newer
> CPUs, but in vector registers only.  It isn't obvious what is faster.
> 

During expand, TI mode is split to two registers when it can't match
any expands. So I failed TI mode in each expand and expect to be
split at expand. TI mode is still in some insn_and_split patterns
(e.g. "*and<mode>3_internal"). If later rtl passes generate TI mode
logical operations, they still can be matched.

Originally, the TI mode is split after reload pass by
rs6000_split_logical. It's too late to catch some rtl optimizations.

For the PTI, it can't be split to two registers during expand. PTI
requires an even/odd register pair. So splitting it after reload can
make sure it gets correct registers, I think.

From my understanding, it's sub-optimal to use vector logical operation
instructions for TI mode if the destination is an integer operand. It
needs three instructions (move to vector register, vector logical
operation and move from vector register). When splitting TImode, it only
needs two logical instructions on two separate registers.

Thanks again
Gui Haochen
  

Patch

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c55ee7e171a..6e57aac3ebf 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7078,27 +7078,38 @@  (define_expand "subti3"
 })
 
 ;; 128-bit logical operations expanders
+;; Fail TImode in all 128-bit logical operations expanders and split it into
+;; two DI registers.

 (define_expand "and<mode>3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
 	(and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
 		      (match_operand:BOOL_128 2 "vlogical_operand")))]
   ""
-  "")
+{
+  if (<MODE>mode == TImode)
+    FAIL;
+})

 (define_expand "ior<mode>3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
         (ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
 		      (match_operand:BOOL_128 2 "vlogical_operand")))]
   ""
-  "")
+{
+  if (<MODE>mode == TImode)
+    FAIL;
+})

 (define_expand "xor<mode>3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
         (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
 		      (match_operand:BOOL_128 2 "vlogical_operand")))]
   ""
-  "")
+{
+  if (<MODE>mode == TImode)
+    FAIL;
+})

 (define_expand "nor<mode>3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
@@ -7106,7 +7117,10 @@  (define_expand "nor<mode>3"
 	 (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand"))
 	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))))]
   ""
-  "")
+{
+  if (<MODE>mode == TImode)
+    FAIL;
+})

 (define_expand "andc<mode>3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
@@ -7114,7 +7128,10 @@  (define_expand "andc<mode>3"
 	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))
 	 (match_operand:BOOL_128 1 "vlogical_operand")))]
   ""
-  "")
+{
+  if (<MODE>mode == TImode)
+    FAIL;
+})

 ;; Power8 vector logical instructions.
 (define_expand "eqv<mode>3"
@@ -7123,7 +7140,10 @@  (define_expand "eqv<mode>3"
 	 (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
 		       (match_operand:BOOL_128 2 "vlogical_operand"))))]
   "<MODE>mode == TImode || <MODE>mode == PTImode || TARGET_P8_VECTOR"
-  "")
+{
+  if (<MODE>mode == TImode)
+    FAIL;
+})

 ;; Rewrite nand into canonical form
 (define_expand "nand<mode>3"
@@ -7132,7 +7152,10 @@  (define_expand "nand<mode>3"
 	 (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand"))
 	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))))]
   "<MODE>mode == TImode || <MODE>mode == PTImode || TARGET_P8_VECTOR"
-  "")
+{
+  if (<MODE>mode == TImode)
+    FAIL;
+})

 ;; The canonical form is to have the negated element first, so we need to
 ;; reverse arguments.
@@ -7142,7 +7165,10 @@  (define_expand "orc<mode>3"
 	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))
 	 (match_operand:BOOL_128 1 "vlogical_operand")))]
   "<MODE>mode == TImode || <MODE>mode == PTImode || TARGET_P8_VECTOR"
-  "")
+{
+  if (<MODE>mode == TImode)
+    FAIL;
+})

 ;; 128-bit logical operations insns and split operations
 (define_insn_and_split "*and<mode>3_internal"
@@ -7394,7 +7420,17 @@  (define_insn_and_split "*eqv<mode>3_internal2"
 	 (const_string "16")))])

 ;; 128-bit one's complement
-(define_insn_and_split "one_cmpl<mode>2"
+(define_expand "one_cmpl<mode>2"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(not:BOOL_128
+	  (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_UNARY>")))]
+  ""
+{
+  if (<MODE>mode == TImode)
+    FAIL;
+})
+
+(define_insn_and_split "*one_cmpl<mode>2"
   [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
 	(not:BOOL_128
 	  (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_UNARY>")))]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100694.c b/gcc/testsuite/gcc.target/powerpc/pr100694.c
new file mode 100644
index 00000000000..99dd3ca89ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100694.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
+/* { dg-final { scan-assembler-not {\mli\M} } } */
+/* { dg-final { scan-assembler-not {\mor\M} } } */
+
+/* It just needs two std.  */
+void foo (unsigned __int128* res, unsigned long long hi, unsigned long long lo)
+{
+   unsigned __int128 i = hi;
+   i <<= 64;
+   i |= lo;
+   *res = i;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.c b/gcc/testsuite/gcc.target/powerpc/pr92398.c
new file mode 100644
index 00000000000..7d6201cc5bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr92398.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times {\mnot\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
+
+/* All platforms should generate the same instructions: not;not;std;std.  */
+void bar (__int128_t *dst, __int128_t src)
+{
+  *dst =  ~src;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.h b/gcc/testsuite/gcc.target/powerpc/pr92398.h
deleted file mode 100644
index 5a4a8bcab80..00000000000
--- a/gcc/testsuite/gcc.target/powerpc/pr92398.h
+++ /dev/null
@@ -1,17 +0,0 @@ 
-/* This test code is included into pr92398.p9-.c and pr92398.p9+.c.
-   The two files have the tests for the number of instructions generated for
-   P9- versus P9+.
-
-   store generates difference instructions as below:
-   P9+: mtvsrdd;xxlnot;stxv.
-   P8/P7/P6 LE: not;not;std;std.
-   P8 BE: mtvsrd;mtvsrd;xxpermdi;xxlnor;stxvd2x.
-   P7/P6 BE: std;std;addi;lxvd2x;xxlnor;stxvd2x.
-   P9+ and P9- LE are expected, P6/P7/P8 BE are unexpected.  */
-
-void
-bar (__int128_t *dst, __int128_t src)
-{
-  *dst =  ~src;
-}
-
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c b/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
deleted file mode 100644
index 72dd1d9a274..00000000000
--- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
+++ /dev/null
@@ -1,12 +0,0 @@ 
-/* { dg-do compile { target { lp64 && has_arch_pwr9 } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-O2 -mvsx" } */
-
-/* { dg-final { scan-assembler-times {\mmtvsrdd\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mxxlnor\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mstxv\M} 1 } } */
-/* { dg-final { scan-assembler-not {\mld\M} } } */
-/* { dg-final { scan-assembler-not {\mnot\M} } } */
-
-/* Source code for the test in pr92398.h */
-#include "pr92398.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c b/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
deleted file mode 100644
index bd7fa98af51..00000000000
--- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
+++ /dev/null
@@ -1,10 +0,0 @@ 
-/* { dg-do compile { target { lp64 && {! has_arch_pwr9} } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-O2 -mvsx" } */
-
-/* { dg-final { scan-assembler-times {\mnot\M} 2 { xfail be } } } */
-/* { dg-final { scan-assembler-times {\mstd\M} 2 { xfail { { {! has_arch_pwr9} && has_arch_pwr8 } && be } } } } */
-
-/* Source code for the test in pr92398.h */
-#include "pr92398.h"
-