[PATCH-1,v2,rs6000] Replace shift and ior insns with one rotate and mask insn for bswap pattern [PR93453]

Message ID 01e0bcef-10a0-561b-0227-4700ebe22692@linux.ibm.com
State New
Headers
Series [PATCH-1,v2,rs6000] Replace shift and ior insns with one rotate and mask insn for bswap pattern [PR93453] |

Commit Message

HAO CHEN GUI June 7, 2022, 8:07 a.m. UTC
  Hi,
  This patch replaces shift and ior insns with one rotate and mask
insn for the split patterns which are for DI byte swap on Power6. The
test cases shows the optimization.

  Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-06-07 Haochen Gui <guihaoc@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.md (define_split for bswapdi load): Merge shift
	and ior insns to one rotate and mask insn.
	(define_split for bswapdi register): Likewise.

gcc/testsuite/
	* gcc.target/powerpc/pr93453-1.c: New.


patch.diff
  

Comments

Segher Boessenkool June 7, 2022, 3:59 p.m. UTC | #1
Hi!

On Tue, Jun 07, 2022 at 04:07:58PM +0800, HAO CHEN GUI wrote:
>   This patch replaces shift and ior insns with one rotate and mask
> insn for the split patterns which are for DI byte swap on Power6. The
> test cases shows the optimization.

Nice :-)

> -  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
> -  emit_insn (gen_iordi3 (dest, dest, op3));
> +  emit_insn (gen_rotldi3_insert_3 (dest, op3, GEN_INT (32), dest,
> +				   GEN_INT (0xffffffff)));

You could make some define_expand to make this easier to use.  But not
sure what to call it.  The goal would be to make this easier to read and
use, not to make it harder :-)  Something with duplicate-si-to-di or
such?  Is that pattern somewhere else already, maybe vectors, maybe some
other target even?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-mdejagnu-cpu=power6 -O2" } */

It doesn't require -m64, only -mpowerpc64.  You can use has_arch_ppc64
to test for the latter.

Okay for trunk, even without that improvement.  Thanks!


Segher
  
Segher Boessenkool June 7, 2022, 4:01 p.m. UTC | #2
On Tue, Jun 07, 2022 at 04:07:58PM +0800, HAO CHEN GUI wrote:
>   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.

One further thing: please make sure you have tested on -m32 as well,
especially for integer stuff like this, it is easy to accidentally get
the conditions on a define_* and one of its uses out of synch.  But it
looks to be okay in this case :-)


Segher
  
HAO CHEN GUI June 8, 2022, 7:57 a.m. UTC | #3
Hi,

On 7/6/2022 下午 11:59, Segher Boessenkool wrote:
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile { target lp64 } } */
>> +/* { dg-options "-mdejagnu-cpu=power6 -O2" } */
> It doesn't require -m64, only -mpowerpc64.  You can use has_arch_ppc64
> to test for the latter.

Tested it with 'target has_arch_ppc64', it works on both -m32 and -m64.

Thanks.
Gui Haochen
  

Patch

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bf85baa5370..83800df12aa 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2828,8 +2828,8 @@  (define_split
       emit_insn (gen_bswapsi2 (dest_32, word2));
     }

-  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
-  emit_insn (gen_iordi3 (dest, dest, op3));
+  emit_insn (gen_rotldi3_insert_3 (dest, op3, GEN_INT (32), dest,
+				   GEN_INT (0xffffffff)));
   DONE;
 })

@@ -2914,10 +2914,10 @@  (define_split
   rtx op3_si  = simplify_gen_subreg (SImode, op3, DImode, lo_off);

   emit_insn (gen_lshrdi3 (op2, src, GEN_INT (32)));
-  emit_insn (gen_bswapsi2 (dest_si, src_si));
-  emit_insn (gen_bswapsi2 (op3_si, op2_si));
-  emit_insn (gen_ashldi3 (dest, dest, GEN_INT (32)));
-  emit_insn (gen_iordi3 (dest, dest, op3));
+  emit_insn (gen_bswapsi2 (op3_si, src_si));
+  emit_insn (gen_bswapsi2 (dest_si, op2_si));
+  emit_insn (gen_rotldi3_insert_3 (dest, op3, GEN_INT (32), dest,
+				   GEN_INT (0xffffffff)));
   DONE;
 })

diff --git a/gcc/testsuite/gcc.target/powerpc/pr93453-1.c b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
new file mode 100644
index 00000000000..4271886561f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-mdejagnu-cpu=power6 -O2" } */
+
+unsigned long load_byte_reverse (unsigned long *in)
+{
+   return __builtin_bswap64 (*in);
+}
+
+unsigned long byte_reverse (unsigned long in)
+{
+   return __builtin_bswap64 (in);
+}
+
+/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */