[V2] Disable generating load/store vector pairs for block copies.

Message ID YqNi7DDZC+9Eqzsm@toto.the-meissners.org
State Committed
Commit fddb7f65129a12dabb5ddc3c8082fe576f4af451
Headers
Series [V2] Disable generating load/store vector pairs for block copies. |

Commit Message

Michael Meissner June 10, 2022, 3:27 p.m. UTC
  [PATCH, V2] Disable generating load/store vector pairs for block copies.

Testing has found that using store vector pair for block copies can result
in a slow down on power10.  This patch disables using the vector pair
instructions for block copies if we are tuning for power10.

This is version 2 of the patch.

| Date: Mon, 6 Jun 2022 20:55:55 -0400
| Subject: [PATCH 2/3] Disable generating load/store vector pairs for block copies.
| Message-ID: <Yp6iGwU03VjRSDAc@toto.the-meissners.org>
| https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596275.html

Compared to version 1, this patch is a stand-alone patch, and it doesn't depend
on a new switch (-mno-store-vector-pair).  Instead, this patch just sets the
default for -mblock-ops-vector-pair to be off if the current cpu being tuned
for is power10.  It would be anticipated that it would automatically be eabled
when tuning for a future cpu.

I have tested this patch on:

	little endian power10 using --with-cpu=power10
	little endian power9 using --with-cpu=power9
	big endian power8 using --with-cpu=power8, both 32/64-bit tested

there were no regressions.  Can I apply this to the master branch, and then
apply it to the GCC 12 patch after a burn-in period?


2022-06-09   Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
	not generate block copies with vector pair instructions if we are
	tuning for power10.
---
 gcc/config/rs6000/rs6000.cc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
  

Comments

Segher Boessenkool June 10, 2022, 5:30 p.m. UTC | #1
Hi!

On Fri, Jun 10, 2022 at 11:27:40AM -0400, Michael Meissner wrote:
> Testing has found that using store vector pair for block copies can result
> in a slow down on power10.  This patch disables using the vector pair
> instructions for block copies if we are tuning for power10.

Load paired should be disabled as well, for the same reason.  The patch
seems to do that fine?  Please fix the commit message.

Thanks,


Segher


> 2022-06-09   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
> 	not generate block copies with vector pair instructions if we are
> 	tuning for power10.
> ---
>  gcc/config/rs6000/rs6000.cc | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 0af2085adc0..59481d9ac70 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4141,7 +4141,10 @@ rs6000_option_override_internal (bool global_init_p)
>  
>    if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
>      {
> -      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
> +      /* Do not generate lxvp and stxvp on power10 since there are some
> +	 performance issues.  */
> +      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> +	  && rs6000_tune != PROCESSOR_POWER10)
>  	rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
>        else
>  	rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
  
Michael Meissner July 14, 2022, 3:20 p.m. UTC | #2
I have applied the patch to GCC 12.

| From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001
| From: Michael Meissner <meissner@linux.ibm.com>
| Date: Thu, 14 Jul 2022 11:16:08 -0400
| Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies.

Testing has found that using load and store vector pair for block copies
can result in a slow down on power10.  This patch disables using the
vector pair instructions for block copies if we are tuning for power10.

2022-06-11   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
	not generate block copies with vector pair instructions if we are
	tuning for power10.  Back port from master branch.
---
 gcc/config/rs6000/rs6000.cc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 4030864aa1a..040487bd277 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p)
 
   if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
     {
-      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
+      /* Do not generate lxvp and stxvp on power10 since there are some
+	 performance issues.  */
+      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
+	  && rs6000_tune != PROCESSOR_POWER10)
 	rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
       else
 	rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
  
Michael Meissner July 14, 2022, 6:07 p.m. UTC | #3
Back port patch (changing .cc to .c) from trunk to GCC 11 committed.

| From 3118d0856b030fe491a170354fed2df570df199f Mon Sep 17 00:00:00 2001
| From: Michael Meissner <meissner@linux.ibm.com>
| Date: Thu, 14 Jul 2022 14:03:37 -0400
| Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies.

Testing has found that using load and store vector pair for block copies
can result in a slow down on power10.  This patch disables using the
vector pair instructions for block copies if we are tuning for power10.

2022-06-11   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Do
	not generate block copies with vector pair instructions if we are
	tuning for power10.  Back port from master branch.
---
 gcc/config/rs6000/rs6000.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8c89b45922f..73f90f134f8 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4149,7 +4149,10 @@ rs6000_option_override_internal (bool global_init_p)
 
   if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
     {
-      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
+      /* Do not generate lxvp and stxvp on power10 since there are some
+	 performance issues.  */
+      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
+	  && rs6000_tune != PROCESSOR_POWER10)
 	rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
       else
 	rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
  
Segher Boessenkool July 14, 2022, 9:12 p.m. UTC | #4
On Thu, Jul 14, 2022 at 11:20:56AM -0400, Michael Meissner wrote:
> I have applied the patch to GCC 12.
> 
> | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001
> | From: Michael Meissner <meissner@linux.ibm.com>
> | Date: Thu, 14 Jul 2022 11:16:08 -0400
> | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies.
> 
> Testing has found that using load and store vector pair for block copies
> can result in a slow down on power10.  This patch disables using the
> vector pair instructions for block copies if we are tuning for power10.
> 
> 2022-06-11   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 
> 	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
> 	not generate block copies with vector pair instructions if we are
> 	tuning for power10.  Back port from master branch.

You never posted the trunk version of this, so that never was approved
either.

> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p)
>  
>    if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
>      {
> -      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
> +      /* Do not generate lxvp and stxvp on power10 since there are some
> +	 performance issues.  */
> +      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> +	  && rs6000_tune != PROCESSOR_POWER10)
>  	rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
>        else
>  	rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;

The TARGET_MMA in that should not be there.  Please fix that (that
probably needs more changes).

This statement does the opposite of what the comment says.

Please fix this.  On trunk, first.


Segher
  
Michael Meissner July 14, 2022, 9:49 p.m. UTC | #5
On Thu, Jul 14, 2022 at 04:12:14PM -0500, Segher Boessenkool wrote:
> On Thu, Jul 14, 2022 at 11:20:56AM -0400, Michael Meissner wrote:
> > I have applied the patch to GCC 12.
> > 
> > | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001
> > | From: Michael Meissner <meissner@linux.ibm.com>
> > | Date: Thu, 14 Jul 2022 11:16:08 -0400
> > | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies.
> > 
> > Testing has found that using load and store vector pair for block copies
> > can result in a slow down on power10.  This patch disables using the
> > vector pair instructions for block copies if we are tuning for power10.
> > 
> > 2022-06-11   Michael Meissner  <meissner@linux.ibm.com>
> > 
> > gcc/
> > 
> > 	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
> > 	not generate block copies with vector pair instructions if we are
> > 	tuning for power10.  Back port from master branch.
> 
> You never posted the trunk version of this, so that never was approved
> either.

I did post the trunk version on June 10th, and your only comment was fix the
commit message, which I thought I did in the commit.

> > +++ b/gcc/config/rs6000/rs6000.cc
> > @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p)
> >  
> >    if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
> >      {
> > -      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
> > +      /* Do not generate lxvp and stxvp on power10 since there are some
> > +	 performance issues.  */
> > +      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> > +	  && rs6000_tune != PROCESSOR_POWER10)
> >  	rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> >        else
> >  	rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> 
> The TARGET_MMA in that should not be there.  Please fix that (that
> probably needs more changes).


All of the movoo and movxo support require TARGET_MMA as does the code in
rs6000-string.cc that could possibly generate load/store vector pair.  To
remove the check here would mean also fixing all of the vector load and store
pairs in mma.md.

> This statement does the opposite of what the comment says.
> 
> Please fix this.  On trunk, first.
  
Segher Boessenkool July 14, 2022, 10:04 p.m. UTC | #6
On Thu, Jul 14, 2022 at 05:49:57PM -0400, Michael Meissner wrote:
> On Thu, Jul 14, 2022 at 04:12:14PM -0500, Segher Boessenkool wrote:
> > You never posted the trunk version of this, so that never was approved
> > either.
> 
> I did post the trunk version on June 10th, and your only comment was fix the
> commit message, which I thought I did in the commit.

I did not approve the patch.  Of course not, I didn't even get as far as
reading it.  You should have fixed it and sent again, I did not approve
anything.

> > > +      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> > > +	  && rs6000_tune != PROCESSOR_POWER10)
> > >  	rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> > >        else
> > >  	rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> > 
> > The TARGET_MMA in that should not be there.  Please fix that (that
> > probably needs more changes).
> 
> All of the movoo and movxo support require TARGET_MMA as does the code in
> rs6000-string.cc that could possibly generate load/store vector pair.

And all that is wrong and should be fixed.

> To
> remove the check here would mean also fixing all of the vector load and store
> pairs in mma.md.

That is wha I said, yes,.

> > This statement does the opposite of what the comment says.
> > 
> > Please fix this.  On trunk, first.

This is the core problem with this patch: it is simply wrong.

It is a very roundabout way of saying "only enable vector pairs if
-mcpu=power10 but -mtune=somethingelse".  Which is not a sensible
thing to do, and not what the comment says either.


Segher
  

Patch

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0af2085adc0..59481d9ac70 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -4141,7 +4141,10 @@  rs6000_option_override_internal (bool global_init_p)
 
   if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
     {
-      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
+      /* Do not generate lxvp and stxvp on power10 since there are some
+	 performance issues.  */
+      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
+	  && rs6000_tune != PROCESSOR_POWER10)
 	rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
       else
 	rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;