swap: Fix incorrect lane extraction by vec_extract() [PR106770]

Message ID b6e8ab52-3c92-3fa0-c70a-085c5c53e18e@linux.vnet.ibm.com
State New
Headers
Series swap: Fix incorrect lane extraction by vec_extract() [PR106770] |

Commit Message

Surya Kumari Jangala Jan. 4, 2023, 8:28 a.m. UTC
  swap: Fix incorrect lane extraction by vec_extract() [PR106770]

In the routine rs6000_analyze_swaps(), special handling of swappable
instructions is done even if the webs that contain the swappable
instructions are not optimized, i.e., the webs do not contain any
permuting load/store instructions along with the associated register
swap instructions. Doing special handling in such webs will result in
the extracted lane being adjusted unnecessarily for vec_extract.

Modifying swappable instructions is also incorrect in webs where
loads/stores on quad word aligned addresses are changed to lvx/stvx.
Similarly, in webs where swap(load(vector constant)) instructions are
replaced with load(swapped vector constant), the swappable
instructions should not be modified.

2023-01-04  Surya Kumari Jangala  <jskumari@linux.ibm.com>

gcc/
	PR rtl-optimization/106770
	* rs6000-p8swap.cc (rs6000_analyze_swaps): .

gcc/testsuite/
	PR rtl-optimization/106770
	* gcc.target/powerpc/pr106770.c: New test.
---
  

Comments

Surya Kumari Jangala Jan. 12, 2023, 4:51 p.m. UTC | #1
Ping

On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Modifying swappable instructions is also incorrect in webs where
> loads/stores on quad word aligned addresses are changed to lvx/stvx.
> Similarly, in webs where swap(load(vector constant)) instructions are
> replaced with load(swapped vector constant), the swappable
> instructions should not be modified.
> 
> 2023-01-04  Surya Kumari Jangala  <jskumari@linux.ibm.com>
> 
> gcc/
> 	PR rtl-optimization/106770
> 	* rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> gcc/testsuite/
> 	PR rtl-optimization/106770
> 	* gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
> index 19fbbfb67dc..7ed39251df9 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>    unsigned int special_handling : 4;
>    /* Set if the web represented by this entry cannot be optimized.  */
>    unsigned int web_not_optimizable : 1;
> +  /* Set if the web represented by this entry has been optimized, ie,
> +     register swaps of permuting loads/stores have been removed.  */
> +  unsigned int web_is_optimized : 1;
>    /* Set if this insn should be deleted.  */
>    unsigned int will_delete : 1;
>  };
> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>    /* For each load and store in an optimizable web (which implies
>       the loads and stores are permuting), find the associated
>       register swaps and mark them for removal.  Due to various
> -     optimizations we may mark the same swap more than once.  Also
> -     perform special handling for swappable insns that require it.  */
> +     optimizations we may mark the same swap more than once. Fix up
> +     the non-permuting loads and stores by converting them into
> +     permuting ones.  */
>    for (i = 0; i < e; ++i)
>      if ((insn_entry[i].is_load || insn_entry[i].is_store)
>  	&& insn_entry[i].is_swap)
>        {
>  	swap_web_entry* root_entry
>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> -	if (!root_entry->web_not_optimizable)
> +	if (!root_entry->web_not_optimizable) {
>  	  mark_swaps_for_removal (insn_entry, i);
> +          root_entry->web_is_optimized = true;
> +        }
>        }
> -    else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
> +    else if (insn_entry[i].is_swappable
> +             && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
> +                 insn_entry[i].special_handling == SH_NOSWAP_ST))
> +      {
> +        swap_web_entry* root_entry
> +          = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> +        if (!root_entry->web_not_optimizable) {
> +          handle_special_swappables (insn_entry, i);
> +          root_entry->web_is_optimized = true;
> +        }
> +      }
> +
> +  /* Perform special handling for swappable insns that require it. 
> +     Note that special handling should be done only for those 
> +     swappable insns that are present in webs optimized above.  */
> +  for (i = 0; i < e; ++i)
> +    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> +        !(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> +          insn_entry[i].special_handling == SH_NOSWAP_ST))
>        {
>  	swap_web_entry* root_entry
>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> -	if (!root_entry->web_not_optimizable)
> +	if (root_entry->web_is_optimized)
>  	  handle_special_swappables (insn_entry, i);
>        }
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> new file mode 100644
> index 00000000000..84e9aead975
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> +
> +/* Test case to resolve PR106770  */
> +
> +#include <altivec.h>
> +
> +int cmp2(double a, double b)
> +{
> +    vector double va = vec_promote(a, 1);
> +    vector double vb = vec_promote(b, 1);
> +    vector long long vlt = (vector long long)vec_cmplt(va, vb);
> +    vector long long vgt = (vector long long)vec_cmplt(vb, va);
> +    vector signed long long vr = vec_sub(vlt, vgt);
> +
> +    return vec_extract(vr, 1);
> +}
> +
  
Surya Kumari Jangala Feb. 17, 2023, 5:08 a.m. UTC | #2
Ping. Please review the patch.

On 12/01/23 10:21 pm, Surya Kumari Jangala via Gcc-patches wrote:
> Ping
> 
> On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
>> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable
>> instructions are not optimized, i.e., the webs do not contain any
>> permuting load/store instructions along with the associated register
>> swap instructions. Doing special handling in such webs will result in
>> the extracted lane being adjusted unnecessarily for vec_extract.
>>
>> Modifying swappable instructions is also incorrect in webs where
>> loads/stores on quad word aligned addresses are changed to lvx/stvx.
>> Similarly, in webs where swap(load(vector constant)) instructions are
>> replaced with load(swapped vector constant), the swappable
>> instructions should not be modified.
>>
>> 2023-01-04  Surya Kumari Jangala  <jskumari@linux.ibm.com>
>>
>> gcc/
>> 	PR rtl-optimization/106770
>> 	* rs6000-p8swap.cc (rs6000_analyze_swaps): .
>>
>> gcc/testsuite/
>> 	PR rtl-optimization/106770
>> 	* gcc.target/powerpc/pr106770.c: New test.
>> ---
>>
>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
>> index 19fbbfb67dc..7ed39251df9 100644
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>>    unsigned int special_handling : 4;
>>    /* Set if the web represented by this entry cannot be optimized.  */
>>    unsigned int web_not_optimizable : 1;
>> +  /* Set if the web represented by this entry has been optimized, ie,
>> +     register swaps of permuting loads/stores have been removed.  */
>> +  unsigned int web_is_optimized : 1;
>>    /* Set if this insn should be deleted.  */
>>    unsigned int will_delete : 1;
>>  };
>> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>>    /* For each load and store in an optimizable web (which implies
>>       the loads and stores are permuting), find the associated
>>       register swaps and mark them for removal.  Due to various
>> -     optimizations we may mark the same swap more than once.  Also
>> -     perform special handling for swappable insns that require it.  */
>> +     optimizations we may mark the same swap more than once. Fix up
>> +     the non-permuting loads and stores by converting them into
>> +     permuting ones.  */
>>    for (i = 0; i < e; ++i)
>>      if ((insn_entry[i].is_load || insn_entry[i].is_store)
>>  	&& insn_entry[i].is_swap)
>>        {
>>  	swap_web_entry* root_entry
>>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>> -	if (!root_entry->web_not_optimizable)
>> +	if (!root_entry->web_not_optimizable) {
>>  	  mark_swaps_for_removal (insn_entry, i);
>> +          root_entry->web_is_optimized = true;
>> +        }
>>        }
>> -    else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
>> +    else if (insn_entry[i].is_swappable
>> +             && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
>> +                 insn_entry[i].special_handling == SH_NOSWAP_ST))
>> +      {
>> +        swap_web_entry* root_entry
>> +          = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>> +        if (!root_entry->web_not_optimizable) {
>> +          handle_special_swappables (insn_entry, i);
>> +          root_entry->web_is_optimized = true;
>> +        }
>> +      }
>> +
>> +  /* Perform special handling for swappable insns that require it. 
>> +     Note that special handling should be done only for those 
>> +     swappable insns that are present in webs optimized above.  */
>> +  for (i = 0; i < e; ++i)
>> +    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
>> +        !(insn_entry[i].special_handling == SH_NOSWAP_LD || 
>> +          insn_entry[i].special_handling == SH_NOSWAP_ST))
>>        {
>>  	swap_web_entry* root_entry
>>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>> -	if (!root_entry->web_not_optimizable)
>> +	if (root_entry->web_is_optimized)
>>  	  handle_special_swappables (insn_entry, i);
>>        }
>>  
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c b/gcc/testsuite/gcc.target/powerpc/pr106770.c
>> new file mode 100644
>> index 00000000000..84e9aead975
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
>> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
>> +
>> +/* Test case to resolve PR106770  */
>> +
>> +#include <altivec.h>
>> +
>> +int cmp2(double a, double b)
>> +{
>> +    vector double va = vec_promote(a, 1);
>> +    vector double vb = vec_promote(b, 1);
>> +    vector long long vlt = (vector long long)vec_cmplt(va, vb);
>> +    vector long long vgt = (vector long long)vec_cmplt(vb, va);
>> +    vector signed long long vr = vec_sub(vlt, vgt);
>> +
>> +    return vec_extract(vr, 1);
>> +}
>> +
  
Surya Kumari Jangala Feb. 27, 2023, 11:52 a.m. UTC | #3
Hello,

Ping https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609374.html

Thanks,

Surya

On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Modifying swappable instructions is also incorrect in webs where
> loads/stores on quad word aligned addresses are changed to lvx/stvx.
> Similarly, in webs where swap(load(vector constant)) instructions are
> replaced with load(swapped vector constant), the swappable
> instructions should not be modified.
> 
> 2023-01-04  Surya Kumari Jangala  <jskumari@linux.ibm.com>
> 
> gcc/
> 	PR rtl-optimization/106770
> 	* rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> gcc/testsuite/
> 	PR rtl-optimization/106770
> 	* gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
> index 19fbbfb67dc..7ed39251df9 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>    unsigned int special_handling : 4;
>    /* Set if the web represented by this entry cannot be optimized.  */
>    unsigned int web_not_optimizable : 1;
> +  /* Set if the web represented by this entry has been optimized, ie,
> +     register swaps of permuting loads/stores have been removed.  */
> +  unsigned int web_is_optimized : 1;
>    /* Set if this insn should be deleted.  */
>    unsigned int will_delete : 1;
>  };
> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>    /* For each load and store in an optimizable web (which implies
>       the loads and stores are permuting), find the associated
>       register swaps and mark them for removal.  Due to various
> -     optimizations we may mark the same swap more than once.  Also
> -     perform special handling for swappable insns that require it.  */
> +     optimizations we may mark the same swap more than once. Fix up
> +     the non-permuting loads and stores by converting them into
> +     permuting ones.  */
>    for (i = 0; i < e; ++i)
>      if ((insn_entry[i].is_load || insn_entry[i].is_store)
>  	&& insn_entry[i].is_swap)
>        {
>  	swap_web_entry* root_entry
>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> -	if (!root_entry->web_not_optimizable)
> +	if (!root_entry->web_not_optimizable) {
>  	  mark_swaps_for_removal (insn_entry, i);
> +          root_entry->web_is_optimized = true;
> +        }
>        }
> -    else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
> +    else if (insn_entry[i].is_swappable
> +             && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
> +                 insn_entry[i].special_handling == SH_NOSWAP_ST))
> +      {
> +        swap_web_entry* root_entry
> +          = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> +        if (!root_entry->web_not_optimizable) {
> +          handle_special_swappables (insn_entry, i);
> +          root_entry->web_is_optimized = true;
> +        }
> +      }
> +
> +  /* Perform special handling for swappable insns that require it. 
> +     Note that special handling should be done only for those 
> +     swappable insns that are present in webs optimized above.  */
> +  for (i = 0; i < e; ++i)
> +    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> +        !(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> +          insn_entry[i].special_handling == SH_NOSWAP_ST))
>        {
>  	swap_web_entry* root_entry
>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> -	if (!root_entry->web_not_optimizable)
> +	if (root_entry->web_is_optimized)
>  	  handle_special_swappables (insn_entry, i);
>        }
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> new file mode 100644
> index 00000000000..84e9aead975
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> +
> +/* Test case to resolve PR106770  */
> +
> +#include <altivec.h>
> +
> +int cmp2(double a, double b)
> +{
> +    vector double va = vec_promote(a, 1);
> +    vector double vb = vec_promote(b, 1);
> +    vector long long vlt = (vector long long)vec_cmplt(va, vb);
> +    vector long long vgt = (vector long long)vec_cmplt(vb, va);
> +    vector signed long long vr = vec_sub(vlt, vgt);
> +
> +    return vec_extract(vr, 1);
> +}
> +
  
Segher Boessenkool Feb. 27, 2023, 4:28 p.m. UTC | #4
Hi!

On Wed, Jan 04, 2023 at 01:58:19PM +0530, Surya Kumari Jangala wrote:
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Modifying swappable instructions is also incorrect in webs where
> loads/stores on quad word aligned addresses are changed to lvx/stvx.
> Similarly, in webs where swap(load(vector constant)) instructions are
> replaced with load(swapped vector constant), the swappable
> instructions should not be modified.
> 
> 2023-01-04  Surya Kumari Jangala  <jskumari@linux.ibm.com>
> 
> gcc/
> 	PR rtl-optimization/106770
> 	* rs6000-p8swap.cc (rs6000_analyze_swaps): .

Please add an entry?  Or multiple ones, actually.  Describe all changes.

> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>    unsigned int special_handling : 4;
>    /* Set if the web represented by this entry cannot be optimized.  */
>    unsigned int web_not_optimizable : 1;
> +  /* Set if the web represented by this entry has been optimized, ie,

s/ie/i.e./

> +     register swaps of permuting loads/stores have been removed.  */

If it really means only exactly this, then the name isn't so good.

> +  unsigned int web_is_optimized : 1;

And if it is as general as the name suggests, then the comment is no
good.  Which is it?  :-)

>    /* For each load and store in an optimizable web (which implies
>       the loads and stores are permuting), find the associated
>       register swaps and mark them for removal.  Due to various
> -     optimizations we may mark the same swap more than once.  Also
> -     perform special handling for swappable insns that require it.  */
> +     optimizations we may mark the same swap more than once. Fix up
> +     the non-permuting loads and stores by converting them into
> +     permuting ones.  */

Two spaces after a full stop is correct.  Please put that back.

Is it a good idea convert from/to swapping load/stores in this pass at
all?  Doesdn't that belong elsewhere?  Like, in combine, where we
already should do this.  Why does that not work?

> -	if (!root_entry->web_not_optimizable)
> +	if (!root_entry->web_not_optimizable) {

Blocks start on a new line, indented.

>  	  mark_swaps_for_removal (insn_entry, i);
> +          root_entry->web_is_optimized = true;

Indent using tabs where possible.

> +        swap_web_entry* root_entry
> +          = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());

Space before *, in all cases. Space before the second (.  There are too
many brackets here, too.

> +  /* Perform special handling for swappable insns that require it. 

No trailing spaces.

> +     Note that special handling should be done only for those 
> +     swappable insns that are present in webs optimized above.  */
> +  for (i = 0; i < e; ++i)
> +    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> +        !(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> +          insn_entry[i].special_handling == SH_NOSWAP_ST))
>        {
>  	swap_web_entry* root_entry
>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> -	if (!root_entry->web_not_optimizable)
> +	if (root_entry->web_is_optimized)
>  	  handle_special_swappables (insn_entry, i);
>        }

Why this change?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */

Is -O3 required?  Use -O2 if you can.  And no trailing spaces please.

> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */

Those two remaining are superfluous, so comment that please.


Segher
  
Surya Kumari Jangala March 3, 2023, 10:59 a.m. UTC | #5
On 27/02/23 9:58 pm, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 04, 2023 at 01:58:19PM +0530, Surya Kumari Jangala wrote:
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable
>> instructions are not optimized, i.e., the webs do not contain any
>> permuting load/store instructions along with the associated register
>> swap instructions. Doing special handling in such webs will result in
>> the extracted lane being adjusted unnecessarily for vec_extract.
>>
>> Modifying swappable instructions is also incorrect in webs where
>> loads/stores on quad word aligned addresses are changed to lvx/stvx.
>> Similarly, in webs where swap(load(vector constant)) instructions are
>> replaced with load(swapped vector constant), the swappable
>> instructions should not be modified.
>>
>> 2023-01-04  Surya Kumari Jangala  <jskumari@linux.ibm.com>
>>
>> gcc/
>> 	PR rtl-optimization/106770
>> 	* rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> Please add an entry?  Or multiple ones, actually.  Describe all changes.
Ok

> 
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>>    unsigned int special_handling : 4;
>>    /* Set if the web represented by this entry cannot be optimized.  */
>>    unsigned int web_not_optimizable : 1;
>> +  /* Set if the web represented by this entry has been optimized, ie,
> 
> s/ie/i.e./
> 
>> +     register swaps of permuting loads/stores have been removed.  */
> 
> If it really means only exactly this, then the name isn't so good.

There is another bit in this class named "web_not_optimizable". This stands for webs that cannot be optimized. I am reusing this name for the new bit I am adding, and I have named this bit as "web_is_optimized".

> 
>> +  unsigned int web_is_optimized : 1;
> 
> And if it is as general as the name suggests, then the comment is no
> good.  Which is it?  :-)
> 
>>    /* For each load and store in an optimizable web (which implies
>>       the loads and stores are permuting), find the associated
>>       register swaps and mark them for removal.  Due to various
>> -     optimizations we may mark the same swap more than once.  Also
>> -     perform special handling for swappable insns that require it.  */
>> +     optimizations we may mark the same swap more than once. Fix up
>> +     the non-permuting loads and stores by converting them into
>> +     permuting ones.  */
> 
> Two spaces after a full stop is correct.  Please put that back.
Ok

> 
> Is it a good idea convert from/to swapping load/stores in this pass at
> all?  Doesdn't that belong elsewhere?  Like, in combine, where we
> already should do this.  Why does that not work> 
>> -	if (!root_entry->web_not_optimizable)
>> +	if (!root_entry->web_not_optimizable) {
> 
> Blocks start on a new line, indented.
Ok

> 
>>  	  mark_swaps_for_removal (insn_entry, i);
>> +          root_entry->web_is_optimized = true;
> 
> Indent using tabs where possible.
Ok

> 
>> +        swap_web_entry* root_entry
>> +          = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> 
> Space before *, in all cases. Space before the second (.  There are too
> many brackets here, too.
Ok

> 
>> +  /* Perform special handling for swappable insns that require it. 
> 
> No trailing spaces.
Ok

> 
>> +     Note that special handling should be done only for those 
>> +     swappable insns that are present in webs optimized above.  */
>> +  for (i = 0; i < e; ++i)
>> +    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
>> +        !(insn_entry[i].special_handling == SH_NOSWAP_LD || 
>> +          insn_entry[i].special_handling == SH_NOSWAP_ST))
>>        {
>>  	swap_web_entry* root_entry
>>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>> -	if (!root_entry->web_not_optimizable)
>> +	if (root_entry->web_is_optimized)
>>  	  handle_special_swappables (insn_entry, i);
>>        }
> 
> Why this change?

The swap pass analyzes vector computations and removes unnecessary doubleword swaps (xxswapdi instructions). The swap pass first constructs webs and removes swap instructions if possible. If the web contains operations that are sensitive to element order (ie, insns requiring special handling), such as an extract, then such instructions should be modified. For example, for an extract operation the lane is changed.
However, the swap pass is changing element order of an extract operation that is present in unoptimized webs. Unoptimized webs are those for which register swaps were not removed, one of the reasons being that there are no loads/stores present in this web. For such webs, element order of extract operation should not be changed.
Hence, we first mark webs that can be optimized, and only for such webs we call the routine handle_special_swappables() to modify operations sensitive to element order.
One type of insns that are handled by handle_special_swappables() are non-permuting loads/stores. These insns are converted to permuting loads/stores. This conversion is needed because a permuting load/store is converted into a simple load/store during the final assembly code generation. Whereas, non-permuting loads/stores are converted into a load/store and a swap instruction.
So we first go thru all the webs, and for permuting loads/stores we find the associated register swaps and mark them for removal. And for non-permuting loads/stores, we convert them to permuting loads/stores. All such webs are marked as 'optimized'
Then we go thru all the webs again, and for those marked 'optimized', we call handle_special_swappables() to handle insns that require special handling.

> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
> 
> Is -O3 required?  Use -O2 if you can.  And no trailing spaces please.
Let me check with -O2.

> 
>> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> 
> Those two remaining are superfluous, so comment that please.
Ok

> 
> 
> Segher
  
Segher Boessenkool March 3, 2023, 5:19 p.m. UTC | #6
Hi!

On Fri, Mar 03, 2023 at 04:29:57PM +0530, Surya Kumari Jangala wrote:
> On 27/02/23 9:58 pm, Segher Boessenkool wrote:
> > On Wed, Jan 04, 2023 at 01:58:19PM +0530, Surya Kumari Jangala wrote:
> >> +     register swaps of permuting loads/stores have been removed.  */
> > 
> > If it really means only exactly this, then the name isn't so good.
> 
> There is another bit in this class named "web_not_optimizable". This stands for webs that cannot be optimized. I am reusing this name for the new bit I am adding, and I have named this bit as "web_is_optimized".

"optimizable" and "optimized" are very different things.  Both are not
saying much at all, making this worse :-(

"swaps_can_be_removed" and "swaps_have_been_removed" maybe, or
"swaps_will_be_removed" which is closer to the truth it seems?  The
future tense in that is important.

> >> +     Note that special handling should be done only for those 
> >> +     swappable insns that are present in webs optimized above.  */
> >> +  for (i = 0; i < e; ++i)
> >> +    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> >> +        !(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> >> +          insn_entry[i].special_handling == SH_NOSWAP_ST))
> >>        {
> >>  	swap_web_entry* root_entry
> >>  	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> >> -	if (!root_entry->web_not_optimizable)
> >> +	if (root_entry->web_is_optimized)
> >>  	  handle_special_swappables (insn_entry, i);
> >>        }
> > 
> > Why this change?
> 
> The swap pass analyzes vector computations and removes unnecessary doubleword swaps (xxswapdi instructions). The swap pass first constructs webs and removes swap instructions if possible. If the web contains operations that are sensitive to element order (ie, insns requiring special handling), such as an extract, then such instructions should be modified. For example, for an extract operation the lane is changed.

[Please limit your line lengths to something reasonable :-)  It is hard
to edit and even read like this.  70 or 72 is a usual max length in
email, allowing a few levels of quoting before wrapping on standard
terminals.]

> However, the swap pass is changing element order of an extract operation that is present in unoptimized webs. Unoptimized webs are those for which register swaps were not removed, one of the reasons being that there are no loads/stores present in this web. For such webs, element order of extract operation should not be changed.
> Hence, we first mark webs that can be optimized, and only for such webs we call the routine handle_special_swappables() to modify operations sensitive to element order.
> One type of insns that are handled by handle_special_swappables() are non-permuting loads/stores. These insns are converted to permuting loads/stores. This conversion is needed because a permuting load/store is converted into a simple load/store during the final assembly code generation. Whereas, non-permuting loads/stores are converted into a load/store and a swap instruction.
> So we first go thru all the webs, and for permuting loads/stores we find the associated register swaps and mark them for removal. And for non-permuting loads/stores, we convert them to permuting loads/stores. All such webs are marked as 'optimized'
> Then we go thru all the webs again, and for those marked 'optimized', we call handle_special_swappables() to handle insns that require special handling.

Hrm.  So something like "optimizable_with_special_handling"?

That is still way more vague than wanted, of course.

Two big issues:

1) The code seems to duplicate a lot of existing code.  Can things be
factored better?  This is very good for maintainability.
2) Related a bit, please create a new function if you need to add a
large amount of code.

Restructuring existing code is fine, that is even needed to keep code in
good shape.  Please do that in a separate patch (typically before the
rest of the series), which makes reviewing it much simpler: one patch
that can even be largish but that really changes nothing, and one
hopefully small one that does the real meat.  The former is easier to
review because there are only two things to consider: is the new
structure good, and does the patch really not change things?  The latter
patch will be much easier to review as well, because it has ultra focus
on the actual new code :-)

Looking forward to what you come up with,


Segher
  

Patch

diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
index 19fbbfb67dc..7ed39251df9 100644
--- a/gcc/config/rs6000/rs6000-p8swap.cc
+++ b/gcc/config/rs6000/rs6000-p8swap.cc
@@ -179,6 +179,9 @@  class swap_web_entry : public web_entry_base
   unsigned int special_handling : 4;
   /* Set if the web represented by this entry cannot be optimized.  */
   unsigned int web_not_optimizable : 1;
+  /* Set if the web represented by this entry has been optimized, ie,
+     register swaps of permuting loads/stores have been removed.  */
+  unsigned int web_is_optimized : 1;
   /* Set if this insn should be deleted.  */
   unsigned int will_delete : 1;
 };
@@ -2627,22 +2630,43 @@  rs6000_analyze_swaps (function *fun)
   /* For each load and store in an optimizable web (which implies
      the loads and stores are permuting), find the associated
      register swaps and mark them for removal.  Due to various
-     optimizations we may mark the same swap more than once.  Also
-     perform special handling for swappable insns that require it.  */
+     optimizations we may mark the same swap more than once. Fix up
+     the non-permuting loads and stores by converting them into
+     permuting ones.  */
   for (i = 0; i < e; ++i)
     if ((insn_entry[i].is_load || insn_entry[i].is_store)
 	&& insn_entry[i].is_swap)
       {
 	swap_web_entry* root_entry
 	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
-	if (!root_entry->web_not_optimizable)
+	if (!root_entry->web_not_optimizable) {
 	  mark_swaps_for_removal (insn_entry, i);
+          root_entry->web_is_optimized = true;
+        }
       }
-    else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
+    else if (insn_entry[i].is_swappable
+             && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
+                 insn_entry[i].special_handling == SH_NOSWAP_ST))
+      {
+        swap_web_entry* root_entry
+          = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
+        if (!root_entry->web_not_optimizable) {
+          handle_special_swappables (insn_entry, i);
+          root_entry->web_is_optimized = true;
+        }
+      }
+
+  /* Perform special handling for swappable insns that require it. 
+     Note that special handling should be done only for those 
+     swappable insns that are present in webs optimized above.  */
+  for (i = 0; i < e; ++i)
+    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
+        !(insn_entry[i].special_handling == SH_NOSWAP_LD || 
+          insn_entry[i].special_handling == SH_NOSWAP_ST))
       {
 	swap_web_entry* root_entry
 	  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
-	if (!root_entry->web_not_optimizable)
+	if (root_entry->web_is_optimized)
 	  handle_special_swappables (insn_entry, i);
       }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c b/gcc/testsuite/gcc.target/powerpc/pr106770.c
new file mode 100644
index 00000000000..84e9aead975
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
+
+/* Test case to resolve PR106770  */
+
+#include <altivec.h>
+
+int cmp2(double a, double b)
+{
+    vector double va = vec_promote(a, 1);
+    vector double vb = vec_promote(b, 1);
+    vector long long vlt = (vector long long)vec_cmplt(va, vb);
+    vector long long vgt = (vector long long)vec_cmplt(vb, va);
+    vector signed long long vr = vec_sub(vlt, vgt);
+
+    return vec_extract(vr, 1);
+}
+