rs6000: fmr gets used instead of faster xxlor [PR93571]

Message ID 16fa34b8-ad8a-20f2-b285-3b3f5bf5d5b2@linux.ibm.com
State New
Headers
Series rs6000: fmr gets used instead of faster xxlor [PR93571] |

Commit Message

Ajit Agarwal Feb. 17, 2023, 4:58 p.m. UTC
  Hello All:

This patch replaces fmr instruction (6 cycles) with xxlor instruction ( 2 cycles)
Bootstrapped and regtested on powerpc64-linux-gnu.

copyright assignment form is still in the process of being sent.
 
Thanks & Regards
Ajit

	rs6000: fmr gets used instead of faster xxlor [PR93571]

        This patch replaces 6 cycles fmr instruction with xxlor
	2 cycles.

	2023-02-17  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

	* config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction.
---
 gcc/config/rs6000/rs6000.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Andrew Pinski Feb. 17, 2023, 5:09 p.m. UTC | #1
On Fri, Feb 17, 2023 at 8:59 AM Ajit Agarwal via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
> Hello All:
>
> This patch replaces fmr instruction (6 cycles) with xxlor instruction ( 2 cycles)
> Bootstrapped and regtested on powerpc64-linux-gnu.

I don't think this can be unconditionally replaced.
xxlor only exists in newer Power ISA.

Thanks,
Andrew Pinski

>
> copyright assignment form is still in the process of being sent.
>
> Thanks & Regards
> Ajit
>
>         rs6000: fmr gets used instead of faster xxlor [PR93571]
>
>         This patch replaces 6 cycles fmr instruction with xxlor
>         2 cycles.
>
>         2023-02-17  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
>
> gcc/ChangeLog:
>
>         * config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction.
> ---
>  gcc/config/rs6000/rs6000.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 4a7812fa592..dfd6c73ffcb 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -8436,7 +8436,7 @@
>    "@
>     stfd%U0%X0 %1,%0
>     lfd%U1%X1 %0,%1
> -   fmr %0,%1
> +   xxlor %0,%1,%1
>     lxsd %0,%1
>     stxsd %1,%0
>     lxsdx %x0,%y1
> --
> 2.31.1
>
>
  
Segher Boessenkool Feb. 17, 2023, 5:23 p.m. UTC | #2
Hi!

On Fri, Feb 17, 2023 at 10:28:41PM +0530, Ajit Agarwal wrote:
> This patch replaces fmr instruction (6 cycles) with xxlor instruction ( 2 cycles)
> Bootstrapped and regtested on powerpc64-linux-gnu.

You tested this on a CPU that does have VSX.  It is incorrect on other
(older) CPUs.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -8436,7 +8436,7 @@
>    "@
>     stfd%U0%X0 %1,%0
>     lfd%U1%X1 %0,%1
> -   fmr %0,%1
> +   xxlor %0,%1,%1
>     lxsd %0,%1
>     stxsd %1,%0
>     lxsdx %x0,%y1

This is the *mov<mode>_hardfloat64 pattern.  You can add some magic to
your Git config so that will show in the patch: in .git/config:

[diff "md"]
        xfuncname = "^\\(define.*$"

(As it says in .gitattributes:
  # Make diff on MD files use "(define" as a function marker.
  # Use together with git config diff.md.xfuncname '^\(define.*$'
  # which is run by contrib/gcc-git-customization.sh too.
  *.md diff=md
)

The third alternative to this insn, the fmr one, has "d" as both input
and output constraint, and has "*" as isa attribute, so it will be used
on any CPU that has floating point registers.  The eight alternative
(the existing xxlor one) has "wa" constraints (via <f64_vsx>) so it
implicitly requires VSX to be enabled.  You need to do something similar
for what you want, but you also need to still allow fmr.


Segher
  
Ajit Agarwal Feb. 21, 2023, 8:48 a.m. UTC | #3
Hello All:

This patch replaces fmr instruction 6 cycles with 2 cycles xxlor instruction
for p7 and p8 architecture.

I have implemented with switch and cases otherwise it is difficult to accommodate
xxlor with p7 and p8 and fmr for other architectures.

Bootstrapped and regtested.

Thanks & Regards
Ajit

		
	rs6000: fmr gets used instead of faster xxlor [PR93571]

	This patch replaces 6 cycles fmr instruction with xxlor
	2 cycles in p8 and p7 architecture.

	2023-02-21  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

	* config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction.
---
 gcc/config/rs6000/rs6000.md | 49 ++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index dfd6c73ffcb..ef587033367 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8433,26 +8433,35 @@ (define_insn "*mov<mode>_hardfloat64"
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
-  "@
-   stfd%U0%X0 %1,%0
-   lfd%U1%X1 %0,%1
-   xxlor %0,%1,%1
-   lxsd %0,%1
-   stxsd %1,%0
-   lxsdx %x0,%y1
-   stxsdx %x1,%y0
-   xxlor %x0,%x1,%x1
-   xxlxor %x0,%x0,%x0
-   li %0,0
-   std%U0%X0 %1,%0
-   ld%U1%X1 %0,%1
-   mr %0,%1
-   mt%0 %1
-   mf%1 %0
-   nop
-   mfvsrd %0,%x1
-   mtvsrd %x0,%1
-   #"
+{
+  switch (which_alternative) {
+    case 0 :  return "stfd%U0%X0 %1,%0";
+    case 1 :  return "lfd%U1%X1 %0,%1";
+    case 2 : if ((TARGET_VSX || TARGET_P8_VECTOR)
+                  && !TARGET_P9_VECTOR
+                  && !TARGET_POWER10)
+               return "xxlor %0,%1,%1";
+              else
+                return "fmr %0,%1";
+
+     case 3 : return "lxsd %0,%1";
+     case 4 : return "stxsd %1,%0";
+     case 5 : return "lxsdx %x0,%y1";
+     case 6 : return "stxsdx %x1,%y0";
+     case 7 : return "xxlor %x0,%x1,%x1";
+     case 8 : return "xxlxor %x0,%x0,%x0";
+     case 9 : return "li %0,0";
+     case 10 : return "std%U0%X0 %1,%0";
+     case 11 : return "ld%U1%X1 %0,%1";
+     case 12 : return "mr %0,%1";
+     case 13 : return "mt%0 %1";
+     case 14 : return "mf%1 %0";
+     case 15 : return "nop";
+     case 16: return "mfvsrd %0,%x1";
+     case 17 : return "mtvsrd %x0,%1";
+   }
+   return "unreachable";
+}
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
  
Segher Boessenkool Feb. 21, 2023, 11:04 a.m. UTC | #4
Hi!

On Tue, Feb 21, 2023 at 02:18:25PM +0530, Ajit Agarwal wrote:
> This patch replaces fmr instruction 6 cycles with 2 cycles xxlor instruction
> for p7 and p8 architecture.
> 
> I have implemented with switch and cases otherwise it is difficult to accommodate
> xxlor with p7 and p8 and fmr for other architectures.

Please domn't use a switch, it isn't needed.  Instead use the "isa"
attribute (with p7v here), and put the preferred alternative first.

> 	rs6000: fmr gets used instead of faster xxlor [PR93571]

rs6000: Use xxlor instead of fmr where possible

> 	This patch replaces 6 cycles fmr instruction with xxlor
> 	2 cycles in p8 and p7 architecture.

No, it also does it on all later architectures.

Do you have any actual timings (i.e. from hardware, not documentation)?

> 	* config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction.

Line too long.  And, that is not what the patch does.  Changelog should
be totally boring just saying what the patch changes.  If the patch
changes things other than what thechangelog says your reviewer will
think something went missin somewhere :-)

> -  "@
> -   stfd%U0%X0 %1,%0
> -   lfd%U1%X1 %0,%1
> -   xxlor %0,%1,%1

That is not what is currently in trunk, so your patch cannot apply.

> +  switch (which_alternative) {
> +    case 0 :  return "stfd%U0%X0 %1,%0";
> +    case 1 :  return "lfd%U1%X1 %0,%1";

Formatting is all incorrect.  We dom't need or want a switch at all, but
correct would be:
  switch (which_alternative)
    {
      case 0:
	return "stfd%U0%X0 %1,%0";
      case 1:
	return "lfd%U1%X1 %0,%1";

etc.

> +    case 2 : if ((TARGET_VSX || TARGET_P8_VECTOR)
> +                  && !TARGET_P9_VECTOR
> +                  && !TARGET_POWER10)
> +               return "xxlor %0,%1,%1";
> +              else
> +                return "fmr %0,%1";

Ah, so you are excluding p9 and p10 here.  Hrm.  That should be written
TARGET_VSX && !TARGET_P9_VECTOR, none of the rest is needed; but is that
a good idea at all?

Please use %xN for VSX arguments whenever possible.  If this alternative
allows only the low numbered vector registers, that is a hint that you
probably should write this differently (and %xN is harmless then).

> +   return "unreachable";

No, never do that.  There is "gcc_unreachable ()" if you need it.

So, let's first do actual timings, and see if it is better on p9 and
p10 as well (or at least not worse).


Segher
  
Ajit Agarwal Feb. 21, 2023, 12:30 p.m. UTC | #5
Hello Segher:

On 21/02/23 4:34 pm, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Feb 21, 2023 at 02:18:25PM +0530, Ajit Agarwal wrote:
>> This patch replaces fmr instruction 6 cycles with 2 cycles xxlor instruction
>> for p7 and p8 architecture.
>>
>> I have implemented with switch and cases otherwise it is difficult to accommodate
>> xxlor with p7 and p8 and fmr for other architectures.
> 
> Please domn't use a switch, it isn't needed.  Instead use the "isa"
> attribute (with p7v here), and put the preferred alternative first.

I am not sure how this is possible without switch and using only "isa".
> 
>> 	rs6000: fmr gets used instead of faster xxlor [PR93571]
> 
> rs6000: Use xxlor instead of fmr where possible
> 
>> 	This patch replaces 6 cycles fmr instruction with xxlor
>> 	2 cycles in p8 and p7 architecture.
> 
> No, it also does it on all later architectures.
> 
> Do you have any actual timings (i.e. from hardware, not documentation)?
> 
>> 	* config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with xxlor instruction.
> 
> Line too long.  And, that is not what the patch does.  Changelog should
> be totally boring just saying what the patch changes.  If the patch
> changes things other than what thechangelog says your reviewer will
> think something went missin somewhere :-)

I will correct this.
> 
>> -  "@
>> -   stfd%U0%X0 %1,%0
>> -   lfd%U1%X1 %0,%1
>> -   xxlor %0,%1,%1
> 
> That is not what is currently in trunk, so your patch cannot apply.
> 
>> +  switch (which_alternative) {
>> +    case 0 :  return "stfd%U0%X0 %1,%0";
>> +    case 1 :  return "lfd%U1%X1 %0,%1";
> 
> Formatting is all incorrect.  We dom't need or want a switch at all, but
> correct would be:
>   switch (which_alternative)
>     {
>       case 0:
> 	return "stfd%U0%X0 %1,%0";
>       case 1:
> 	return "lfd%U1%X1 %0,%1";
> 
> etc.

I will correct that.
> 
>> +    case 2 : if ((TARGET_VSX || TARGET_P8_VECTOR)
>> +                  && !TARGET_P9_VECTOR
>> +                  && !TARGET_POWER10)
>> +               return "xxlor %0,%1,%1";
>> +              else
>> +                return "fmr %0,%1";
> 
> Ah, so you are excluding p9 and p10 here.  Hrm.  That should be written
> TARGET_VSX && !TARGET_P9_VECTOR, none of the rest is needed; but is that
> a good idea at all?
> 
> Please use %xN for VSX arguments whenever possible.  If this alternative
> allows only the low numbered vector registers, that is a hint that you
> probably should write this differently (and %xN is harmless then).
> 
>> +   return "unreachable";
> 
> No, never do that.  There is "gcc_unreachable ()" if you need it.
> 

I will also correct this.

> So, let's first do actual timings, and see if it is better on p9 and
> p10 as well (or at least not worse).
> 
> 
> Segher

Thanks & Regards
Ajit
  
Segher Boessenkool Feb. 21, 2023, 2:09 p.m. UTC | #6
On Tue, Feb 21, 2023 at 06:00:52PM +0530, Ajit Agarwal wrote:
> On 21/02/23 4:34 pm, Segher Boessenkool wrote:
> > Please domn't use a switch, it isn't needed.  Instead use the "isa"
> > attribute (with p7v here), and put the preferred alternative first.
> 
> I am not sure how this is possible without switch and using only "isa".

You have the "p7v" "xxlor" alternative earlier than the "*" "fmr"
alternative.  You can have an "xxlor" for contraints "d", but probably
the best (and certainly the easiest) is to just move the existing
xxlor to before fmr.

Oh, the existing xxlor alternative is implicitly isa p7v, the "wa"
constraint causes that.  It may be nicer to mark it explicitly p7v as
well, nicer for the reader.

Btw, please update the other similar patterns at the same time?  There
are eight patterns with fmr in rs6000.md (the four in dfp.md should
probably not be touched); not all are similar so should be in separate
patches, if changed at all, but a bunch are completely analogous so
should not diverge.

(It is fine to first do this one pattern only, until we have worked out
all kinks, but all should be committed at the same time).

Thanks,


Segher
  
Ajit Agarwal Feb. 22, 2023, 10:28 a.m. UTC | #7
On 21/02/23 7:39 pm, Segher Boessenkool wrote:
> On Tue, Feb 21, 2023 at 06:00:52PM +0530, Ajit Agarwal wrote:
>> On 21/02/23 4:34 pm, Segher Boessenkool wrote:
>>> Please domn't use a switch, it isn't needed.  Instead use the "isa"
>>> attribute (with p7v here), and put the preferred alternative first.
>>
>> I am not sure how this is possible without switch and using only "isa".
> 
> You have the "p7v" "xxlor" alternative earlier than the "*" "fmr"
> alternative.  You can have an "xxlor" for contraints "d", but probably
> the best (and certainly the easiest) is to just move the existing
> xxlor to before fmr.
> 
> Oh, the existing xxlor alternative is implicitly isa p7v, the "wa"
> constraint causes that.  It may be nicer to mark it explicitly p7v as
> well, nicer for the reader.
> 

If I do the above, for power9 it selects xxlor instead of fmr.

> Btw, please update the other similar patterns at the same time?  There
> are eight patterns with fmr in rs6000.md (the four in dfp.md should
> probably not be touched); not all are similar so should be in separate
> patches, if changed at all, but a bunch are completely analogous so
> should not diverge.
> 
> (It is fine to first do this one pattern only, until we have worked out
> all kinks, but all should be committed at the same time).
> 
> Thanks,
> 
> 
> Segher
  
Ajit Agarwal Feb. 24, 2023, 8:11 a.m. UTC | #8
Hello All:

Here is the patch that uses xxlor instead of fmr where possible.
Performance results shows that fmr is better in power9 and 
power10 architectures whereas xxlor is better in power7 and
power 8 architectures.

Bootstrapped and regtested powepc64-linux-gnu.

Thanks & Regards
Ajit

	rs6000: Use xxlor instead of fmr where possible

	This patch replaces fmr with xxlor instruction for power7
	and power8 architectures whereas for power9 and power10
	replaces xxlor with fmr instruction.

	Perf measurement results:

	Power9 fmr:  201,847,661 cycles.
	Power9 xxlor: 201,877,78 cycles.
	Power8 fmr: 201,057,795 cycles.
        Power8 xxlor: 201,004,671 cycles.

	2023-02-24  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

	* config/rs6000/rs6000.md (*movdf_hardfloat64): Use xxlor
	for power7 and power8 and fmr for power9 and power10.
---
 gcc/config/rs6000/rs6000.md | 46 +++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 81bffb04ceb..1253b8622a7 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@ (define_attr "cpu"
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p7p8,p10"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,11 @@ (define_attr "enabled" ""
      (and (eq_attr "isa" "p10")
 	  (match_test "TARGET_POWER10"))
      (const_int 1)
+      
+     (and (eq_attr "isa" "p7p8")
+	  (match_test "TARGET_VSX && !TARGET_P9_VECTOR"))
+     (const_int 1)
+
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
@@ -8436,27 +8441,29 @@ (define_insn "*mov<mode>_softfloat32"
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
-           "=m,           d,          d,          <f64_p9>,   wY,
-             <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
-             YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>,   wa")
+           "=m,           d,          <f64_vsx>,  <f64_p9>,   wY,
+            <f64_av>,     Z,          wa,         <f64_vsx>,  !r,
+            YZ,           r,          !r,         *c*l,       !r,
+            *h,           r,          <f64_dm>,   d,          wn,
+            wa")
 	(match_operand:FMOVE64 1 "input_operand"
-            "d,           m,          d,          wY,         <f64_p9>,
-             Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
+            "d,           m,          <f64_vsx>,  wY,         <f64_p9>,
+             Z,           <f64_av>,   wa,         <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r,          eP"))]
+             0,           <f64_dm>,   r,          d,          wn,
+             eP"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
   "@
    stfd%U0%X0 %1,%0
    lfd%U1%X1 %0,%1
-   fmr %0,%1
+   xxlor %x0,%x1,%x1
    lxsd %0,%1
    stxsd %1,%0
    lxsdx %x0,%y1
    stxsdx %x1,%y0
-   xxlor %x0,%x1,%x1
+   fmr %0,%1
    xxlxor %x0,%x0,%x0
    li %0,0
    std%U0%X0 %1,%0
@@ -8467,23 +8474,28 @@ (define_insn "*mov<mode>_hardfloat64"
    nop
    mfvsrd %0,%x1
    mtvsrd %x0,%1
+   fmr %0,%1
+   fmr %0,%1
    #"
   [(set_attr "type"
-            "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
+            "fpstore,     fpload,     veclogical, fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr,      vecperm")
+             *,           mfvsr,      mtvsr,      fpsimple,   fpsimple,
+             vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
-            "*,           *,          *,          p9v,        p9v,
-             p7v,         p7v,        *,          *,          *,
-             *,           *,          *,          *,          *,
-             *,           p8v,        p8v,        p10")
+            "*,           *,          p7p8,        p9v,        p9v,
+             p7v,         p7v,        *,           *,          *,
+             *,           *,          *,           *,          *,
+             *,           p8v,        p8v,         *,          *,
+             p10")
    (set_attr "prefixed"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           *,          *,          *")])
+             *,           *,          *,          *,          *,
+             *")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
  
Segher Boessenkool Feb. 24, 2023, 3:11 p.m. UTC | #9
Hi!

For future patches: please don't send patches as replies to existing
threads.  Just start a new thread for a new patch (series).  You can
mark it as [PATCH v2] in the subject, if you want.

On Fri, Feb 24, 2023 at 01:41:49PM +0530, Ajit Agarwal wrote:
> Here is the patch that uses xxlor instead of fmr where possible.
> Performance results shows that fmr is better in power9 and 
> power10 architectures whereas xxlor is better in power7 and
> power 8 architectures.

And fmr is the only option before p7.

> 	rs6000: Use xxlor instead of fmr where possible
> 
> 	This patch replaces fmr with xxlor instruction for power7
> 	and power8 architectures whereas for power9 and power10
> 	replaces xxlor with fmr instruction.

Saying "this patch" in a commit message reads strangely.  Just "Replace
fmr with" etc.?

The second part is just wrong, you cannot replace xxlor by fmr in
general.

> 	Perf measurement results:
> 
> 	Power9 fmr:  201,847,661 cycles.
> 	Power9 xxlor: 201,877,78 cycles.
> 	Power8 fmr: 201,057,795 cycles.
>         Power8 xxlor: 201,004,671 cycles.

What is this measuring?  100M insns back-to-back, each dependent on the
previous one?

What are the results on p7 and p10?

These numbers show there is no difference on p8 either.  Did you paste
the wrong numbers maybe?

> 	* config/rs6000/rs6000.md (*movdf_hardfloat64): Use xxlor
> 	for power7 and power8 and fmr for power9 and power10.

Please don't break lines early.  Changelogs lines can be 80 columns
wide, just like source code lines.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -354,7 +354,7 @@ (define_attr "cpu"
>    (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
>  
>  ;; The ISA we implement.
> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
> +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p7p8,p10"

p78v, and sort it after p8v please.

> +     (and (eq_attr "isa" "p7p8")
> +	  (match_test "TARGET_VSX && !TARGET_P9_VECTOR"))
> +     (const_int 1)

Okay.

>  (define_insn "*mov<mode>_hardfloat64"
>    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> -           "=m,           d,          d,          <f64_p9>,   wY,
> -             <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
> -             YZ,          r,          !r,         *c*l,       !r,
> -            *h,           r,          <f64_dm>,   wa")
> +           "=m,           d,          <f64_vsx>,  <f64_p9>,   wY,
> +            <f64_av>,     Z,          wa,         <f64_vsx>,  !r,
> +            YZ,           r,          !r,         *c*l,       !r,
> +            *h,           r,          <f64_dm>,   d,          wn,
> +            wa")
>  	(match_operand:FMOVE64 1 "input_operand"

(You posted this mail as wrapping.  That means the patch cannot be
applied non-manually, and that replies to your mail will be mangled.
Just get a Real mail client, and configure it correctly :-) )

> -            "d,           m,          d,          wY,         <f64_p9>,
> -             Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
> +            "d,           m,          <f64_vsx>,  wY,         <f64_p9>,
> +             Z,           <f64_av>,   wa,         <zero_fp>,  <zero_fp>,
>               r,           YZ,         r,          r,          *h,
> -             0,           <f64_dm>,   r,          eP"))]
> +             0,           <f64_dm>,   r,          d,          wn,
> +             eP"))]

No.  It is impossible to figure out what you changed here by just
reading it.

There is no requirement there should be exactly five alternatives per
line, and/or that there should be the same number everywhere.

If the indentation was incorrect, and you want to fix that, do that in a
separate *earlier* patch in the series, please.

>    "TARGET_POWERPC64 && TARGET_HARD_FLOAT
>     && (gpc_reg_operand (operands[0], <MODE>mode)
>         || gpc_reg_operand (operands[1], <MODE>mode))"
>    "@
>     stfd%U0%X0 %1,%0
>     lfd%U1%X1 %0,%1
> -   fmr %0,%1
> +   xxlor %x0,%x1,%x1
>     lxsd %0,%1
>     stxsd %1,%0
>     lxsdx %x0,%y1
>     stxsdx %x1,%y0
> -   xxlor %x0,%x1,%x1
> +   fmr %0,%1
>     xxlxor %x0,%x0,%x0
>     li %0,0
>     std%U0%X0 %1,%0
> @@ -8467,23 +8474,28 @@ (define_insn "*mov<mode>_hardfloat64"
>     nop
>     mfvsrd %0,%x1
>     mtvsrd %x0,%1
> +   fmr %0,%1
> +   fmr %0,%1
>     #"
>    [(set_attr "type"
> -            "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
> +            "fpstore,     fpload,     veclogical, fpload,     fpstore,
>               fpload,      fpstore,    veclogical, veclogical, integer,
>               store,       load,       *,          mtjmpr,     mfjmpr,
> -             *,           mfvsr,      mtvsr,      vecperm")
> +             *,           mfvsr,      mtvsr,      fpsimple,   fpsimple,
> +             vecperm")
>     (set_attr "size" "64")
>     (set_attr "isa"
> -            "*,           *,          *,          p9v,        p9v,
> -             p7v,         p7v,        *,          *,          *,
> -             *,           *,          *,          *,          *,
> -             *,           p8v,        p8v,        p10")
> +            "*,           *,          p7p8,        p9v,        p9v,
> +             p7v,         p7v,        *,           *,          *,
> +             *,           *,          *,           *,          *,
> +             *,           p8v,        p8v,         *,          *,
> +             p10")

So, you swapped the xxlor and fmr entries, and added two nextra fmr
entries at the end?!


Segher
  
Ajit Agarwal Feb. 25, 2023, 7:42 a.m. UTC | #10
Hello Segher:

On 24/02/23 8:41 pm, Segher Boessenkool wrote:
> Hi!
> 
> For future patches: please don't send patches as replies to existing
> threads.  Just start a new thread for a new patch (series).  You can
> mark it as [PATCH v2] in the subject, if you want.
> 
> On Fri, Feb 24, 2023 at 01:41:49PM +0530, Ajit Agarwal wrote:
>> Here is the patch that uses xxlor instead of fmr where possible.
>> Performance results shows that fmr is better in power9 and 
>> power10 architectures whereas xxlor is better in power7 and
>> power 8 architectures.
> 
> And fmr is the only option before p7.
> 
>> 	rs6000: Use xxlor instead of fmr where possible
>>
>> 	This patch replaces fmr with xxlor instruction for power7
>> 	and power8 architectures whereas for power9 and power10
>> 	replaces xxlor with fmr instruction.
> 
> Saying "this patch" in a commit message reads strangely.  Just "Replace
> fmr with" etc.?
> 

I will correct this.

> The second part is just wrong, you cannot replace xxlor by fmr in
> general.
> 
>> 	Perf measurement results:
>>
>> 	Power9 fmr:  201,847,661 cycles.
>> 	Power9 xxlor: 201,877,78 cycles.
>> 	Power8 fmr: 201,057,795 cycles.
>>         Power8 xxlor: 201,004,671 cycles.
> 
> What is this measuring?  100M insns back-to-back, each dependent on the
> previous one?
> 
Yes.

> What are the results on p7 and p10?
> 
> These numbers show there is no difference on p8 either.  Did you paste
> the wrong numbers maybe?
>

I will measure it again and update with a new patch.
 
>> 	* config/rs6000/rs6000.md (*movdf_hardfloat64): Use xxlor
>> 	for power7 and power8 and fmr for power9 and power10.
> 
> Please don't break lines early.  Changelogs lines can be 80 columns
> wide, just like source code lines.
> 
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -354,7 +354,7 @@ (define_attr "cpu"
>>    (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
>>  
>>  ;; The ISA we implement.
>> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
>> +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p7p8,p10"
> 
> p78v, and sort it after p8v please.
> 
>> +     (and (eq_attr "isa" "p7p8")
>> +	  (match_test "TARGET_VSX && !TARGET_P9_VECTOR"))
>> +     (const_int 1)
> 
> Okay.
> 
>>  (define_insn "*mov<mode>_hardfloat64"
>>    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
>> -           "=m,           d,          d,          <f64_p9>,   wY,
>> -             <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
>> -             YZ,          r,          !r,         *c*l,       !r,
>> -            *h,           r,          <f64_dm>,   wa")
>> +           "=m,           d,          <f64_vsx>,  <f64_p9>,   wY,
>> +            <f64_av>,     Z,          wa,         <f64_vsx>,  !r,
>> +            YZ,           r,          !r,         *c*l,       !r,
>> +            *h,           r,          <f64_dm>,   d,          wn,
>> +            wa")
>>  	(match_operand:FMOVE64 1 "input_operand"
> 
> (You posted this mail as wrapping.  That means the patch cannot be
> applied non-manually, and that replies to your mail will be mangled.
> Just get a Real mail client, and configure it correctly :-) )
>

I am using Thunderbird as mail client and the settings are all correct.
I have set the mailnews.wrapLength 0.

 
>> -            "d,           m,          d,          wY,         <f64_p9>,
>> -             Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
>> +            "d,           m,          <f64_vsx>,  wY,         <f64_p9>,
>> +             Z,           <f64_av>,   wa,         <zero_fp>,  <zero_fp>,
>>               r,           YZ,         r,          r,          *h,
>> -             0,           <f64_dm>,   r,          eP"))]
>> +             0,           <f64_dm>,   r,          d,          wn,
>> +             eP"))]
> 
> No.  It is impossible to figure out what you changed here by just
> reading it.
> 
> There is no requirement there should be exactly five alternatives per
> line, and/or that there should be the same number everywhere.
> 
> If the indentation was incorrect, and you want to fix that, do that in a
> separate *earlier* patch in the series, please.
> 

I will Keep indentation as same.
>>    "TARGET_POWERPC64 && TARGET_HARD_FLOAT
>>     && (gpc_reg_operand (operands[0], <MODE>mode)
>>         || gpc_reg_operand (operands[1], <MODE>mode))"
>>    "@
>>     stfd%U0%X0 %1,%0
>>     lfd%U1%X1 %0,%1
>> -   fmr %0,%1
>> +   xxlor %x0,%x1,%x1
>>     lxsd %0,%1
>>     stxsd %1,%0
>>     lxsdx %x0,%y1
>>     stxsdx %x1,%y0
>> -   xxlor %x0,%x1,%x1
>> +   fmr %0,%1
>>     xxlxor %x0,%x0,%x0
>>     li %0,0
>>     std%U0%X0 %1,%0
>> @@ -8467,23 +8474,28 @@ (define_insn "*mov<mode>_hardfloat64"
>>     nop
>>     mfvsrd %0,%x1
>>     mtvsrd %x0,%1
>> +   fmr %0,%1
>> +   fmr %0,%1
>>     #"
>>    [(set_attr "type"
>> -            "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
>> +            "fpstore,     fpload,     veclogical, fpload,     fpstore,
>>               fpload,      fpstore,    veclogical, veclogical, integer,
>>               store,       load,       *,          mtjmpr,     mfjmpr,
>> -             *,           mfvsr,      mtvsr,      vecperm")
>> +             *,           mfvsr,      mtvsr,      fpsimple,   fpsimple,
>> +             vecperm")
>>     (set_attr "size" "64")
>>     (set_attr "isa"
>> -            "*,           *,          *,          p9v,        p9v,
>> -             p7v,         p7v,        *,          *,          *,
>> -             *,           *,          *,          *,          *,
>> -             *,           p8v,        p8v,        p10")
>> +            "*,           *,          p7p8,        p9v,        p9v,
>> +             p7v,         p7v,        *,           *,          *,
>> +             *,           *,          *,           *,          *,
>> +             *,           p8v,        p8v,         *,          *,
>> +             p10")
> 
> So, you swapped the xxlor and fmr entries, and added two nextra fmr
> entries at the end?!
> 

I have moved xxlor <f64_vsx> "p7p8" before any other constraints with fmr "*".
I have added first constraints as xxlor <f64_vsx> with "p7p8" then wa fmr "*"
and wn,d "*" as fmr at end.
> 
> Segher

Thanks & Regards
Ajit
  

Patch

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 4a7812fa592..dfd6c73ffcb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8436,7 +8436,7 @@ 
   "@
    stfd%U0%X0 %1,%0
    lfd%U1%X1 %0,%1
-   fmr %0,%1
+   xxlor %0,%1,%1
    lxsd %0,%1
    stxsd %1,%0
    lxsdx %x0,%y1