[rs6000] new split pattern for TI to V1TI move [PR103124]

Message ID fc85253b-cc4c-02cf-3c65-717633b08d27@linux.ibm.com

Commit Message

HAO CHEN GUI Dec. 13, 2021, 3 a.m. UTC
  Hi,
   This patch defines a new split pattern for TI to V1TI moves. The pattern concatenates the two
subreg:DI halves of a TI into a V2DI, then moves the V2DI to the V1TI. With this pattern, the subreg
pass can split the TI register whenever there is a TI to V1TI move. The patch eliminates one
unnecessary "mr" on P9, as the new test case illustrates.
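
As a rough illustration of the data movement only (not of GCC internals), the split can be modelled in plain C. This is a sketch under assumptions: it needs a compiler that supports unsigned __int128, and the names v2di_model, subreg_di, concat_v2di, split_ti_to_v1ti and check_split are hypothetical helpers invented for this example. It takes the two 8-byte pieces at byte offsets 0 and 8, places them in elements 0 and 1 of a two-element vector, and reinterprets the result as a single 128-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a V2DI register: two 64-bit elements. */
typedef struct { uint64_t elem[2]; } v2di_model;

/* Models (subreg:DI (reg:TI x) N): the 8 bytes found at byte offset N. */
uint64_t subreg_di(const unsigned __int128 *ti, size_t byte_offset)
{
    uint64_t di;
    memcpy(&di, (const unsigned char *)ti + byte_offset, sizeof di);
    return di;
}

/* Models the concat step: first operand to element 0, second to element 1. */
v2di_model concat_v2di(uint64_t first, uint64_t second)
{
    v2di_model v = { { first, second } };
    return v;
}

/* Models the whole split: TI -> two DI pieces -> V2DI -> V1TI bits. */
unsigned __int128 split_ti_to_v1ti(unsigned __int128 ti)
{
    v2di_model v = concat_v2di(subreg_di(&ti, 0), subreg_di(&ti, 8));
    unsigned __int128 v1ti;
    memcpy(&v1ti, &v, sizeof v1ti);   /* same 16 bytes, viewed as V1TI */
    return v1ti;
}

/* Self-check: the split must preserve all 128 bits of the source value. */
void check_split(unsigned __int128 ti)
{
    assert(split_ti_to_v1ti(ti) == ti);
}
```

Because the model copies bytes in memory order, the round trip holds on both BE and LE hosts. Which half ends up in which GPR on the real target is not modelled here.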

   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is this okay for trunk?
Any recommendations? Thanks a lot.

ChangeLog
2021-12-13 Haochen Gui <guihaoc@linux.ibm.com>

gcc/
	* config/rs6000/vsx.md (split pattern for TI to V1TI move): Defined.

gcc/testsuite/
	* gcc.target/powerpc/pr103124.c: New testcase.


patch.diff
  

Comments

David Edelsohn Dec. 13, 2021, 10:22 p.m. UTC | #1
On Sun, Dec 12, 2021 at 10:00 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>
> Hi,
>    This patch defines a new split pattern for TI to V1TI move. The pattern concatenates two subreg:DI of
> a TI to a V2DI, then move the V2DI to V1TI. With the pattern, the subreg pass can do register split for
> TI when there is a TI to V1TI move. The patch optimizes one unnecessary "mr" out on P9. The new
> test case illustrates it.
>
>    Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is this okay for trunk?
> Any recommendations? Thanks a lot.
>
> ChangeLog
> 2021-12-13 Haochen Gui <guihaoc@linux.ibm.com>
>
> gcc/
>         * config/rs6000/vsx.md (split pattern for TI to V1TI move): Defined.
>
> gcc/testsuite/
>         * gcc.target/powerpc/pr103124.c: New testcase.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index bf033e31c1c..7bca7780735 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
>     [(set_attr "type" "vecperm")
>      (set_attr "prefixed" "yes")])
>
> +;; split TI to V1TI move
> +(define_split
> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> +       (subreg:V1TI
> +         (match_operand:TI 1 "int_reg_operand") 0 ))]
> +  "TARGET_P9_VECTOR && !reload_completed"
> +  [(const_int 0)]
> +{
> +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
> +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
> +  rtx tmp3 = gen_reg_rtx (V2DImode);
> +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
> +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
> +  emit_move_insn (operands[0], tmp4);
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> new file mode 100644
> index 00000000000..724492dbcd2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } */

Please don't include the "powerpc" target selector in the
gcc.target/powerpc directory.  Just use lp64.

> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-final { scan-assembler-not "\mmr\M" } } */
> +
> +vector __int128 add (long long a)
> +{
> +  vector __int128 b;
> +  b = (vector __int128) {a};
> +  return b;
> +}

Okay with that change.

Thanks, David
  
Segher Boessenkool Dec. 13, 2021, 10:59 p.m. UTC | #2
Hi!

On Mon, Dec 13, 2021 at 05:22:06PM -0500, David Edelsohn wrote:
> On Sun, Dec 12, 2021 at 10:00 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
> >     [(set_attr "type" "vecperm")
> >      (set_attr "prefixed" "yes")])
> >
> > +;; split TI to V1TI move

Please comment that this splitter tries to generate mtvsrdd insns, and
don't say the obvious things :-)

> > +(define_split
> > +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> > +       (subreg:V1TI
> > +         (match_operand:TI 1 "int_reg_operand") 0 ))]
> > +  "TARGET_P9_VECTOR && !reload_completed"

Why the "!reload_completed"?  Is this generated after reload as well,
and that is bad for some reason?

> > +  [(const_int 0)]
> > +{
> > +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
> > +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
> > +  rtx tmp3 = gen_reg_rtx (V2DImode);
> > +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
> > +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
> > +  emit_move_insn (operands[0], tmp4);
> > +  DONE;
> > +})

Ah, it is bad because it generates a pseudo.

So either you just make it work when everything is hard regs, or you do
this *and comment it*.

The first option is not very easy to do.  You need to make sure you can
do those subregs (and get GPRs!), and you need to use a hard reg instead
of the new pseudo (you can use operand 0 for this here though, it can
never be the same as operand 1 :-) (but only do this if this *is* after
reload)).

But, it sounds like you actually saw problems when allowing it after
reload, so it sounds like it would actually be useful to do it then?

> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } */
> 
> Please don't include the "powerpc" target selector in the
> gcc.target/powerpc directory.  Just use lp64.

Or actually, don't use anything, and do a  dg-require int128  instead.


Segher
  
HAO CHEN GUI Dec. 14, 2021, 1:41 a.m. UTC | #3
Hi Segher,
  Thanks for your advice. Please see my comments.

On 14/12/2021 6:59 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Dec 13, 2021 at 05:22:06PM -0500, David Edelsohn wrote:
>> On Sun, Dec 12, 2021 at 10:00 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>>> --- a/gcc/config/rs6000/vsx.md
>>> +++ b/gcc/config/rs6000/vsx.md
>>> @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
>>>     [(set_attr "type" "vecperm")
>>>      (set_attr "prefixed" "yes")])
>>>
>>> +;; split TI to V1TI move
> 
> Please comment that this splitter tries to generate mtvsrdd insns, and
> don't say the obvious things :-)
>

OK, I will modify it.

>>> +(define_split
>>> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
>>> +       (subreg:V1TI
>>> +         (match_operand:TI 1 "int_reg_operand") 0 ))]
>>> +  "TARGET_P9_VECTOR && !reload_completed"
> 
> Why the "!reload_completed"?  Is this generated after reload as well,
> and that is bad for some reason?
> 
>>> +  [(const_int 0)]
>>> +{
>>> +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
>>> +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
>>> +  rtx tmp3 = gen_reg_rtx (V2DImode);
>>> +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
>>> +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
>>> +  emit_move_insn (operands[0], tmp4);
>>> +  DONE;
>>> +})
> 
> Ah, it is bad because it generates a pseudo.
> 
> So either you just make it work when everything is hard regs, or you do
> this *and comment it*.
> 
> The first option is not very easy to do.  You need to make sure you can
> do those subregs (and get GPRs!), and you need to use a hard reg instead
> of the new pseudo (you can use operand 0 for this here though, it can
> never be the same as operand 1 :-) (but only do this if this *is* after
> reload)).
> 
> But, it sounds like you actually saw problems when allowing it after
> reload, so it sounds like it would actually be useful to do it then?

The purpose of this split pattern is to build the V1TI from two subregs of the TI, so that
the subsequent subreg pass can recognize the TI in the insn as splittable. As there is no
subreg pass after reload, I want the split to be done only before reload. Also, as
you mentioned, my patch generates a pseudo, so it doesn't work after reload. That's
why I set the "!reload_completed" condition.

> 
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
>>> @@ -0,0 +1,11 @@
>>> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } */
>>
>> Please don't include the "powerpc" target selector in the
>> gcc.target/powerpc directory.  Just use lp64.
> 
> Or actually, don't use anything, and do a  dg-require int128  instead.
> 

Thanks, I will take it.
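
For reference, with both review comments on the target selector applied, the directives at the top of the test might look like the sketch below. This assumes that "dg-require int128" refers to the int128 effective-target keyword, and that the other directives stay as posted. It is not necessarily what was finally committed.

```c
/* { dg-do compile } */
/* { dg-require-effective-target int128 } */
/* { dg-require-effective-target powerpc_p9vector_ok } */
/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
/* { dg-final { scan-assembler-not "\mmr\M" } } */
```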

> 
> Segher
>
  

Patch

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..7bca7780735 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -6589,3 +6589,19 @@  (define_insn "xxeval"
    [(set_attr "type" "vecperm")
     (set_attr "prefixed" "yes")])

+;; split TI to V1TI move
+(define_split
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+	(subreg:V1TI
+	  (match_operand:TI 1 "int_reg_operand") 0 ))]
+  "TARGET_P9_VECTOR && !reload_completed"
+  [(const_int 0)]
+{
+  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
+  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
+  rtx tmp3 = gen_reg_rtx (V2DImode);
+  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
+  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
+  emit_move_insn (operands[0], tmp4);
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c b/gcc/testsuite/gcc.target/powerpc/pr103124.c
new file mode 100644
index 00000000000..724492dbcd2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-not "\mmr\M" } } */
+
+vector __int128 add (long long a)
+{
+  vector __int128 b;
+  b = (vector __int128) {a};
+  return b;
+}