[rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

Message ID edc970c5-024a-4466-a0fe-9d237b9316c2@linux.ibm.com
State New
Headers
Series [rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed

Commit Message

HAO CHEN GUI Jan. 11, 2024, 8:28 a.m. UTC
  Hi,
  This patch eliminates unnecessary byte swaps for block clear on P8
LE. For block clear, all the bytes are set to zero. The byte order
doesn't make sense. So the alignment of destination could be set to
the store mode size in stead of 1 byte in order to eliminates
unnecessary byte swap instructions on P8 LE. The test case shows the
problem.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE

gcc/
	PR target/113325
	* config/rs6000/rs6000-string.cc (expand_block_clear): Set the
	alignment of destination to the size of mode.

gcc/testsuite/
	PR target/113325
	* gcc.target/powerpc/pr113325.c: New.

patch.diff
  

Comments

Richard Biener Jan. 11, 2024, 9:20 a.m. UTC | #1
On Thu, Jan 11, 2024 at 9:30 AM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>
> Hi,
>   This patch eliminates unnecessary byte swaps for block clear on P8
> LE. For block clear, all the bytes are set to zero. The byte order
> doesn't make sense. So the alignment of destination could be set to
> the store mode size in stead of 1 byte in order to eliminates
> unnecessary byte swap instructions on P8 LE. The test case shows the
> problem.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE
>
> gcc/
>         PR target/113325
>         * config/rs6000/rs6000-string.cc (expand_block_clear): Set the
>         alignment of destination to the size of mode.
>
> gcc/testsuite/
>         PR target/113325
>         * gcc.target/powerpc/pr113325.c: New.
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
> index 7f777666ba9..4c9b2cbeefc 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
>         }
>
>        dest = adjust_address (orig_dest, mode, offset);
> -
> +      /* Set the alignment of dest to the size of mode in order to
> +        avoid unnecessary byte swaps on LE.  */
> +      set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);

but the alignment is now wrong which might cause ripple-down
wrong-code effects, no?

It's probably bad to hide the byte-swapping in the move patterns (I'm
just guessing
you do that)

>        emit_move_insn (dest, CONST0_RTX (mode));
>      }
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> new file mode 100644
> index 00000000000..4a3cae019c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
> +
> +void* foo (void* s1)
> +{
> +  return __builtin_memset (s1, 0, 32);
> +}
  
HAO CHEN GUI Jan. 12, 2024, 2:15 a.m. UTC | #2
Hi Richard,
   Thanks so much for your comments.


>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
>> index 7f777666ba9..4c9b2cbeefc 100644
>> --- a/gcc/config/rs6000/rs6000-string.cc
>> +++ b/gcc/config/rs6000/rs6000-string.cc
>> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
>>         }
>>
>>        dest = adjust_address (orig_dest, mode, offset);
>> -
>> +      /* Set the alignment of dest to the size of mode in order to
>> +        avoid unnecessary byte swaps on LE.  */
>> +      set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
> 
> but the alignment is now wrong which might cause ripple-down
> wrong-code effects, no?
> 
> It's probably bad to hide the byte-swapping in the move patterns (I'm
> just guessing
> you do that)

Here I just change the alignment of "dest" which is temporary used for
move. The orig_dest is untouched and keep the original alignment. The
subsequent insns which use orig_dest are not affected. I am not sure if
it causes ripple-down effects. Do you mean the dest might be reused
later? But I think the alignment is different even though the mode and
offset is the same.

Looking forward to your advice.

Thanks
Gui Haochen
  
Richard Biener Jan. 12, 2024, 8:03 a.m. UTC | #3
On Fri, Jan 12, 2024 at 3:15 AM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>
> Hi Richard,
>    Thanks so much for your comments.
>
>
> >> patch.diff
> >> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
> >> index 7f777666ba9..4c9b2cbeefc 100644
> >> --- a/gcc/config/rs6000/rs6000-string.cc
> >> +++ b/gcc/config/rs6000/rs6000-string.cc
> >> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
> >>         }
> >>
> >>        dest = adjust_address (orig_dest, mode, offset);
> >> -
> >> +      /* Set the alignment of dest to the size of mode in order to
> >> +        avoid unnecessary byte swaps on LE.  */
> >> +      set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
> >
> > but the alignment is now wrong which might cause ripple-down
> > wrong-code effects, no?
> >
> > It's probably bad to hide the byte-swapping in the move patterns (I'm
> > just guessing
> > you do that)
>
> Here I just change the alignment of "dest" which is temporary used for
> move. The orig_dest is untouched and keep the original alignment. The
> subsequent insns which use orig_dest are not affected. I am not sure if
> it causes ripple-down effects. Do you mean the dest might be reused
> later? But I think the alignment is different even though the mode and
> offset is the same.

If the MEM ends up in the IL then its MEM_ALIGN should be better correct.

> Looking forward to your advice.
>
> Thanks
> Gui Haochen
  
Kewen.Lin Jan. 15, 2024, 7:15 a.m. UTC | #4
Hi Haochen,

on 2024/1/11 16:28, HAO CHEN GUI wrote:
> Hi,
>   This patch eliminates unnecessary byte swaps for block clear on P8
> LE. For block clear, all the bytes are set to zero. The byte order
> doesn't make sense. So the alignment of destination could be set to
> the store mode size in stead of 1 byte in order to eliminates
> unnecessary byte swap instructions on P8 LE. The test case shows the
> problem.

I agree with Richi's concern, a bytes swap can be eliminated if the
bytes swapped result is known as before, one typical case is the vector
constant with predicate const_vector_each_byte_same, we can do some
optimization for that.

BR,
Kewen

> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE
> 
> gcc/
> 	PR target/113325
> 	* config/rs6000/rs6000-string.cc (expand_block_clear): Set the
> 	alignment of destination to the size of mode.
> 
> gcc/testsuite/
> 	PR target/113325
> 	* gcc.target/powerpc/pr113325.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
> index 7f777666ba9..4c9b2cbeefc 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
>  	}
> 
>        dest = adjust_address (orig_dest, mode, offset);
> -
> +      /* Set the alignment of dest to the size of mode in order to
> +	 avoid unnecessary byte swaps on LE.  */
> +      set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
>        emit_move_insn (dest, CONST0_RTX (mode));
>      }
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> new file mode 100644
> index 00000000000..4a3cae019c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
> +
> +void* foo (void* s1)
> +{
> +  return __builtin_memset (s1, 0, 32);
> +}
  

Patch

diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 7f777666ba9..4c9b2cbeefc 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -140,7 +140,9 @@  expand_block_clear (rtx operands[])
 	}

       dest = adjust_address (orig_dest, mode, offset);
-
+      /* Set the alignment of dest to the size of mode in order to
+	 avoid unnecessary byte swaps on LE.  */
+      set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
       emit_move_insn (dest, CONST0_RTX (mode));
     }

diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 00000000000..4a3cae019c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}