[rs6000] PR target/105991: Recognize PLUS and XOR forms of rldimi.

Message ID 053901d88209$015b0b10$04112130$@nextmovesoftware.com
State New

Commit Message

Roger Sayle June 17, 2022, 5:13 a.m. UTC
  This patch addresses PR target/105991 where a change to prefer representing
shifts and adds at the tree level as multiplications causes problems for
the rldimi patterns in the powerpc backend.  The issue is that rs6000.md
models this pattern using IOR, and some variants that have the equivalent
PLUS or XOR in the RTL fail to match some *rotl<mode>4_insert patterns.
This is fixed in this patch by adding a define_insn_and_split to locally
canonicalize the PLUS and XOR forms to the backend's preferred IOR form.
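
To illustrate why the three forms are interchangeable here: the AND keeps only
the low bits and the shift fills only the bits above them, so the two operands
never have a set bit in common.  With no overlapping bits an addition produces
no carries, and XOR and IOR give the same result.  A minimal sketch in C; the
function names are made up for illustration and are not part of the patch, and
the shift count is assumed to be between 1 and 63:

```c
#include <stdint.h>

/* Illustrative only: these helpers are not GCC code.  Each one combines
   the low `shift` bits of `low` with `high << shift`.  */
static uint64_t
insert_ior (uint64_t low, uint64_t high, unsigned shift)
{
  uint64_t mask = (UINT64_C (1) << shift) - 1;  /* low `shift` bits set */
  return (low & mask) | (high << shift);
}

static uint64_t
insert_plus (uint64_t low, uint64_t high, unsigned shift)
{
  uint64_t mask = (UINT64_C (1) << shift) - 1;
  /* No bit is set in both operands, so the addition never carries.  */
  return (low & mask) + (high << shift);
}

static uint64_t
insert_xor (uint64_t low, uint64_t high, unsigned shift)
{
  uint64_t mask = (UINT64_C (1) << shift) - 1;
  /* With disjoint operands XOR never clears a bit, so it equals IOR.  */
  return (low & mask) ^ (high << shift);
}
```

For any shift count in that range the three functions return the same value,
which is the property the new define_insn_and_split relies on.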

An alternative fix might be for the RTL optimizers to define a canonical
form for these plus_xor_ior equivalent expressions, but the logical
choice might be plus (which may appear in an addressing mode), and such
a change may require a number of tweaks to update various backends
(i.e.  a more intrusive change than the one proposed here).

Many thanks to Marek Polacek for bootstrapping and regression testing
this change without problems.  Hopefully the new testcase is portable
across powerpc's effective-targets.  Ok for mainline?


2022-06-17  Roger Sayle  <roger@nextmovesoftware.com>
	    Marek Polacek  <polacek@redhat.com>

gcc/ChangeLog
	PR target/105991
	* config/rs6000/rs6000.md (plus_xor): New code iterator.
	(*rotl<mode>3_insert_3_<code>): New define_insn_and_split.

gcc/testsuite/ChangeLog
	PR target/105991
	* gcc.target/powerpc/pr105991.c: New test case.


Thanks in advance,
Roger
--
  

Comments

Segher Boessenkool June 20, 2022, 10:10 p.m. UTC | #1
Hi!

On Fri, Jun 17, 2022 at 07:13:37AM +0200, Roger Sayle wrote:
> This patch addresses PR target/105991 where a change to prefer representing
> shifts and adds at the tree level as multiplications causes problems for
> the rldimi patterns in the powerpc backend.

Because it now is converted to different RTL at expand time.  Which the
generic expand code does some premature optimisation on, which makes us
end up with the addition instead of data manipulation insns.  Oh well.

> The issue is that rs6000.md
> models this pattern using IOR, and some variants that have the equivalent
> PLUS or XOR in the RTL fail to match some *rotl<mode>4_insert patterns.
> This is fixed in this patch by adding a define_insn_and_split to locally
> canonicalize the PLUS and XOR forms to the backend's preferred IOR form.

Okay.

> An alternative fix might be for the RTL optimizers to define a canonical
> form for these plus_xor_ior equivalent expressions, but the logical
> choice might be plus (which may appear in an addressing mode), and such
> a change may require a number of tweaks to update various backends
> (i.e.  a more intrusive change than the one proposed here).

This does not make sense in an address at all, thankfully :-)

The only sane canonicalisation for this is something like VEC_DUPLICATE
but for submodes of integer modes, instead of the component mode of a
vector mode.  I don't feel this is worth trying to handle in general
though.

> Many thanks to Marek Polacek for bootstrapping and regression testing
> this change without problems.

You have an account on the cfarm, it is quick and easy to test there :-)
I recommend gcc135, a 32 core p9, with oodles of disk space :-)

> +; Canonicalize the PLUS and XOR forms to IOR for rotl<mode>3_insert_3
> +(define_code_iterator plus_xor [plus xor])
> +
> +(define_insn_and_split "*rotl<mode>3_insert_3_<code>"
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> +	(plus_xor:GPR
> +	  (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
> +		   (match_operand:GPR 4 "const_int_operand" "n"))
> +	  (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
> +		      (match_operand:SI 2 "const_int_operand" "n"))))]
> +  "INTVAL (operands[2]) == exact_log2 (UINTVAL (operands[4]) + 1)"

exact_log2 returns -1 if its argument is not a power of two.  Please
test it is > 0 explicitly here: I don't think this splitter will work
correctly otherwise.  There shouldn't really be a shift by 0 ever of
course, but it isn't invalid RTL.
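
To make the concern concrete, here is a rough model of the check in plain C.
It is only a sketch: exact_log2_sketch is a hypothetical stand-in written for
this example rather than GCC's own exact_log2, and insert_condition_ok is an
invented name for one possible guarded form of the insn condition.

```c
#include <stdint.h>

/* Hypothetical stand-in for GCC's exact_log2: returns log2 (x) when x
   is a power of two, and -1 otherwise (including for x == 0).  */
static int
exact_log2_sketch (uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Assumed shape of the guarded condition: the mask must be of the form
   2^k - 1 with k > 0, and the shift count must equal k.  */
static int
insert_condition_ok (int64_t shift, uint64_t mask)
{
  int log = exact_log2_sketch (mask + 1);
  return log > 0 && shift == log;
}
```

Without the explicit "> 0" test, a shift operand of -1 paired with a mask for
which mask + 1 is not a power of two would compare equal to the -1 error
value, and a zero mask with a shift by 0 would also be accepted.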

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr105991.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +unsigned long long
> +foo (unsigned long long value)
> +{
> +  value &= 0xffffffff;
> +  value |= value << 32;
> +  return value;
> +}
> +/* { dg-final { scan-assembler "rldimi" } } */

Write
/* { dg-final { scan-assembler {\mrldimi\M} } } */
please.


Okay for trunk with those changes.  Thanks!


Segher
  
Kewen.Lin June 21, 2022, 2:03 a.m. UTC | #2
on 2022/6/21 06:10, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jun 17, 2022 at 07:13:37AM +0200, Roger Sayle wrote:
>> This patch addresses PR target/105991 where a change to prefer representing
>> shifts and adds at the tree level as multiplications causes problems for
>> the rldimi patterns in the powerpc backend.
> 
> Because it now is converted to different RTL at expand time.  Which the
> generic expand code does some premature optimisation on, which makes us
> end up with the addition instead of data manipulation insns.  Oh well.
> 
>> The issue is that rs6000.md
>> models this pattern using IOR, and some variants that have the equivalent
>> PLUS or XOR in the RTL fail to match some *rotl<mode>4_insert patterns.
>> This is fixed in this patch by adding a define_insn_and_split to locally
>> canonicalize the PLUS and XOR forms to the backend's preferred IOR form.
> 
> Okay.
> 
>> An alternative fix might be for the RTL optimizers to define a canonical
>> form for these plus_xor_ior equivalent expressions, but the logical
>> choice might be plus (which may appear in an addressing mode), and such
>> a change may require a number of tweaks to update various backends
>> (i.e.  a more intrusive change than the one proposed here).
> 
> This does not make sense in an address at all, thankfully :-)
> 
> The only sane canonicalisation for this is something like VEC_DUPLICATE
> but for submodes of integer modes, instead of the component mode of a
> vector mode.  I don't feel this is worth trying to handle in general
> though.
> 
>> Many thanks to Marek Polacek for bootstrapping and regression testing
>> this change without problems.
> 
> You have an account on the cfarm, it is quick and easy to test there :-)
> I recommend gcc135, a 32 core p9, with oodles of disk space :-)
> 
>> +; Canonicalize the PLUS and XOR forms to IOR for rotl<mode>3_insert_3
>> +(define_code_iterator plus_xor [plus xor])
>> +
>> +(define_insn_and_split "*rotl<mode>3_insert_3_<code>"
>> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>> +	(plus_xor:GPR
>> +	  (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
>> +		   (match_operand:GPR 4 "const_int_operand" "n"))
>> +	  (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
>> +		      (match_operand:SI 2 "const_int_operand" "n"))))]
>> +  "INTVAL (operands[2]) == exact_log2 (UINTVAL (operands[4]) + 1)"
> 
> exact_log2 returns -1 if its argument is not a power of two.  Please
> test it is > 0 explicitly here: I don't think this splitter will work
> correctly otherwise.  There shouldn't really be a shift by 0 ever of
> course, but it isn't invalid RTL.
> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr105991.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +unsigned long long
>> +foo (unsigned long long value)
>> +{
>> +  value &= 0xffffffff;
>> +  value |= value << 32;
>> +  return value;
>> +}
>> +/* { dg-final { scan-assembler "rldimi" } } */
> 
> Write
> /* { dg-final { scan-assembler {\mrldimi\M} } } */
> please.
> 

This case also needs effective-target keyword lp64,
that is /* { dg-require-effective-target lp64 } */

since with -m32, it gets:
  mr 3,4

with -m32 -mpowerpc64, it gets:
  rldicl 3,4,0,32


BR,
Kewen
  
Segher Boessenkool June 21, 2022, 7:34 a.m. UTC | #3
On Tue, Jun 21, 2022 at 10:03:18AM +0800, Kewen.Lin wrote:
> This case also needs effective-target keyword lp64,
> that is /* { dg-require-effective-target lp64 } */

Good point.  Yes.

It would be nice to have just has_arch_ppc64 really.

> since with -m32, it gets:
>   mr 3,4
> 
> with -m32 -mpowerpc64, it gets:
>   rldicl 3,4,0,32

Yes, and that is not lp64 -- both longs and pointers are 32 bits when
you have -m32.

You get different code because parameter passing is different.  The
usual way to sidestep is to have the data in memory instead:

unsigned long long x;
void 
goo (void)
{
  unsigned long long value = x;
  value &= 0xffffffff;
  value |= value << 32;
  x = value;
}

but then the compiler tries to be smart and do code like
	addis 10,2,.LANCHOR0+4@toc@ha
	lwz 10,.LANCHOR0+4@toc@l(10)
	sldi 9,10,32
	add 9,9,10
	addis 10,2,.LANCHOR0@toc@ha
	std 9,.LANCHOR0@toc@l(10)
	blr
for -m64, and
	lis 9,x@ha
	la 10,x@l(9)
	lwz 10,4(10)
	stw 10,x@l(9)
	blr
for just -m32, but
	lis 10,x@ha
	la 9,x@l(10)
	la 10,x@l(10)
	ld 9,0(9)
	rldicl 8,9,0,32
	sldi 9,9,32
	add 9,9,8
	std 9,0(10)
	blr
for -m32 -mpowerpc64 (note it has not managed to do the splitter here;
it gets
Failed to match this instruction:
(set (reg:DI 128)
    (plus:DI (ashift:DI (reg/v:DI 117 [ value ])
            (const_int 32 [0x20]))
        (zero_extend:DI (subreg:SI (reg/v:DI 117 [ value ]) 4))))
and then
Failed to match this instruction:
(set (reg:DI 128)
    (plus:DI (and:DI (reg/v:DI 117 [ value ])
            (const_int 4294967295 [0xffffffff]))
        (ashift:DI (reg/v:DI 117 [ value ])
            (const_int 32 [0x20]))))
but that is not enough).

So let's just do lp64, at least for now :-)
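
Putting the review comments together, the test case might end up looking
something like the sketch below.  This is an assumption about the final form,
not the committed file: the placement of the dg-require-effective-target line
is a guess, and the scan pattern uses the Tcl word-boundary escapes \m and \M
so that only a whole-word rldimi matches.

```c
/* { dg-do compile } */
/* { dg-require-effective-target lp64 } */
/* { dg-options "-O2" } */
unsigned long long
foo (unsigned long long value)
{
  value &= 0xffffffff;
  value |= value << 32;
  return value;
}
/* { dg-final { scan-assembler {\mrldimi\M} } } */
```

The function body is unchanged from the posted patch; only the directives
differ.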


Segher
  

Patch

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c55ee7e..695ec33 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4188,6 +4188,23 @@ 
 }
   [(set_attr "type" "insert")])
 
+; Canonicalize the PLUS and XOR forms to IOR for rotl<mode>3_insert_3
+(define_code_iterator plus_xor [plus xor])
+
+(define_insn_and_split "*rotl<mode>3_insert_3_<code>"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+	(plus_xor:GPR
+	  (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
+		   (match_operand:GPR 4 "const_int_operand" "n"))
+	  (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+		      (match_operand:SI 2 "const_int_operand" "n"))))]
+  "INTVAL (operands[2]) == exact_log2 (UINTVAL (operands[4]) + 1)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(ior:GPR (and:GPR (match_dup 3) (match_dup 4))
+		 (ashift:GPR (match_dup 1) (match_dup 2))))])
+
 (define_code_iterator plus_ior_xor [plus ior xor])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105991.c b/gcc/testsuite/gcc.target/powerpc/pr105991.c
new file mode 100644
index 0000000..e853e53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105991.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+unsigned long long
+foo (unsigned long long value)
+{
+  value &= 0xffffffff;
+  value |= value << 32;
+  return value;
+}
+/* { dg-final { scan-assembler "rldimi" } } */
+