rs6000/test: Add emulated gather test case

Message ID d2de3c57-01ab-3e42-97d4-80ad552eaac8@linux.ibm.com
State Committed
Commit 300dbea12693e365c89971527ca14cb0242def64
Series rs6000/test: Add emulated gather test case

Commit Message

Kewen.Lin Nov. 25, 2021, 3:20 a.m. UTC
  Hi,

This patch is to add a test case similar to the one in i386
to add testing coverage for 510.parest_r hotspots.

As evaluated, the emulated gather capability of the vectorizer
(r12-2733) can help to speed up SPEC2017 510.parest_r on
Power8/9/10 by 5% to 9% with option sets Ofast unroll and
Ofast lto.  But since rs6000 lacked unpacking support for
unsigned int before, it could not vectorize the hotspots
until r12-3134.

While checking why r12-2733 didn't immediately show its impact
on SPEC2017 510.parest_r even though the associated test case
could already be vectorized on rs6000 at that time, I realized
that the associated test case uses int as INDEXTYPE while the
hotspots actually use unsigned int.  So, unlike the one in
i386, this patch uses unsigned int as INDEXTYPE, since the
unpack support for unsigned int (r12-3134) also matters for
vectorizing the hotspots.  Not sure if it's worth updating
the one in i386 as well?

Tested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8.

Is it ok for trunk?

BR,
Kewen
-----
gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/vect-gather-1.c: New test.

--
2.25.1
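
For readers unfamiliar with the term, the following is a minimal
hand-written sketch of what an emulated gather amounts to for this
kind of loop, assuming a two-lane double vector.  It is only an
illustration, not what GCC emits: the name vmul_emulated_sketch and
the two-accumulator layout are hypothetical.  Each 32-bit index is
loaded as a scalar, widened to 64 bits, and used for an ordinary
scalar load from dst; only the multiply-add is imagined as a vector
operation.  With unsigned int indices the widening is a zero
extension, which corresponds to the unsigned unpack support the
message says rs6000 lacked before r12-3134.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; not compiler output and not part
   of the patch.  Processes two elements per iteration to mimic a
   two-lane double vector.  */
double vmul_emulated_sketch(const unsigned int *rowstart,
                            const unsigned int *rowend,
                            const double *luval, const double *dst)
{
  assert(rowstart <= rowend);

  /* acc0 and acc1 stand in for the two lanes of a vector accumulator.  */
  double acc0 = 0.0, acc1 = 0.0;
  const unsigned int *col = rowstart;

  while (rowend - col >= 2)
    {
      /* Widen each 32-bit index to 64 bits.  For unsigned int this is
         a zero extension; a signed INDEXTYPE would need sign extension.  */
      uint64_t i0 = (uint64_t) col[0];
      uint64_t i1 = (uint64_t) col[1];

      /* The "gather": two independent scalar loads from dst, which
         would then be assembled into one vector.  */
      double d0 = dst[i0];
      double d1 = dst[i1];

      /* Lane-wise multiply-accumulate.  */
      acc0 += luval[0] * d0;
      acc1 += luval[1] * d1;

      col += 2;
      luval += 2;
    }

  /* Reduce the lanes, then handle a possible leftover element.  */
  double res = acc0 + acc1;
  for (; col != rowend; ++col, ++luval)
    res += *luval * dst[*col];
  return res;
}
```

Note that the sketch sums in a different order than the scalar loop,
so results may differ in rounding for general inputs; reassociation
of this kind is permitted under -Ofast.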
  

Comments

Hongtao Liu Nov. 25, 2021, 5:17 a.m. UTC | #1
On Thu, Nov 25, 2021 at 11:21 AM Kewen.Lin via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> This patch is to add a test case similar to the one in i386
> to add testing coverage for 510.parest_r hotspots.
>
> As evaluated, the emulated gather capability of the vectorizer
> (r12-2733) can help to speed up SPEC2017 510.parest_r on
> Power8/9/10 by 5% to 9% with option sets Ofast unroll and
> Ofast lto.  But since rs6000 lacked unpacking support for
> unsigned int before, it could not vectorize the hotspots
> until r12-3134.
>
> While checking why r12-2733 didn't immediately show its impact
> on SPEC2017 510.parest_r even though the associated test case
> could already be vectorized on rs6000 at that time, I realized
> that the associated test case uses int as INDEXTYPE while the
> hotspots actually use unsigned int.  So, unlike the one in
> i386, this patch uses unsigned int as INDEXTYPE, since the
> unpack support for unsigned int (r12-3134) also matters for
> vectorizing the hotspots.  Not sure if it's worth updating
> the one in i386 as well?
It looks like the same testcase added in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88531
>
> Tested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -----
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/powerpc/vect-gather-1.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/vect-gather-1.c b/gcc/testsuite/gcc.target/powerpc/vect-gather-1.c
> new file mode 100644
> index 00000000000..bf98045ab03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vect-gather-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* Profitable from Power8 since it supports efficient unaligned load.  */
> +/* { dg-options "-Ofast -mdejagnu-cpu=power8 -fdump-tree-vect-details -fdump-tree-forwprop4" } */
> +
> +#ifndef INDEXTYPE
> +#define INDEXTYPE unsigned int
> +#endif
> +double vmul(INDEXTYPE *rowstart, INDEXTYPE *rowend,
> +           double *luval, double *dst)
> +{
> +  double res = 0;
> +  for (const INDEXTYPE * col = rowstart; col != rowend; ++col, ++luval)
> +        res += *luval * dst[*col];
> +  return res;
> +}
> +
> +/* With gather emulation this should be profitable to vectorize from Power8.  */
> +/* { dg-final { scan-tree-dump "loop vectorized" "vect" } } */
> +/* The index vector loads and promotions should be scalar after forwprop.  */
> +/* { dg-final { scan-tree-dump-not "vec_unpack" "forwprop4" } } */
> --
> 2.25.1
>
  
Kewen.Lin Nov. 25, 2021, 5:31 a.m. UTC | #2
on 2021/11/25 1:17 PM, Hongtao Liu wrote:
> On Thu, Nov 25, 2021 at 11:21 AM Kewen.Lin via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi,
>>
>> This patch is to add a test case similar to the one in i386
>> to add testing coverage for 510.parest_r hotspots.
>>
>> As evaluated, the emulated gather capability of the vectorizer
>> (r12-2733) can help to speed up SPEC2017 510.parest_r on
>> Power8/9/10 by 5% to 9% with option sets Ofast unroll and
>> Ofast lto.  But since rs6000 lacked unpacking support for
>> unsigned int before, it could not vectorize the hotspots
>> until r12-3134.
>>
>> While checking why r12-2733 didn't immediately show its impact
>> on SPEC2017 510.parest_r even though the associated test case
>> could already be vectorized on rs6000 at that time, I realized
>> that the associated test case uses int as INDEXTYPE while the
>> hotspots actually use unsigned int.  So, unlike the one in
>> i386, this patch uses unsigned int as INDEXTYPE, since the
>> unpack support for unsigned int (r12-3134) also matters for
>> vectorizing the hotspots.  Not sure if it's worth updating
>> the one in i386 as well?
> It looks like the same testcase added in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88531

Thanks for the information!  Good to know that there are already
some cases to cover.  :)

BR,
Kewen

>>
>> Tested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.target/powerpc/vect-gather-1.c: New test.
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/vect-gather-1.c b/gcc/testsuite/gcc.target/powerpc/vect-gather-1.c
>> new file mode 100644
>> index 00000000000..bf98045ab03
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vect-gather-1.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* Profitable from Power8 since it supports efficient unaligned load.  */
>> +/* { dg-options "-Ofast -mdejagnu-cpu=power8 -fdump-tree-vect-details -fdump-tree-forwprop4" } */
>> +
>> +#ifndef INDEXTYPE
>> +#define INDEXTYPE unsigned int
>> +#endif
>> +double vmul(INDEXTYPE *rowstart, INDEXTYPE *rowend,
>> +           double *luval, double *dst)
>> +{
>> +  double res = 0;
>> +  for (const INDEXTYPE * col = rowstart; col != rowend; ++col, ++luval)
>> +        res += *luval * dst[*col];
>> +  return res;
>> +}
>> +
>> +/* With gather emulation this should be profitable to vectorize from Power8.  */
>> +/* { dg-final { scan-tree-dump "loop vectorized" "vect" } } */
>> +/* The index vector loads and promotions should be scalar after forwprop.  */
>> +/* { dg-final { scan-tree-dump-not "vec_unpack" "forwprop4" } } */
>> --
>> 2.25.1
>>
> 
>
  
Segher Boessenkool Nov. 26, 2021, 4:24 p.m. UTC | #3
Hi!

On Thu, Nov 25, 2021 at 11:20:57AM +0800, Kewen.Lin wrote:
> This patch is to add a test case similar to the one in i386
> to add testing coverage for 510.parest_r hotspots.

> gcc/testsuite/ChangeLog:
> 	* gcc.target/powerpc/vect-gather-1.c: New test.

This is okay for trunk.  Thanks!


Segher
  
Kewen.Lin Nov. 29, 2021, 2:03 a.m. UTC | #4
on 2021/11/27 12:24 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Nov 25, 2021 at 11:20:57AM +0800, Kewen.Lin wrote:
>> This patch is to add a test case similar to the one in i386
>> to add testing coverage for 510.parest_r hotspots.
> 
>> gcc/testsuite/ChangeLog:
>> 	* gcc.target/powerpc/vect-gather-1.c: New test.
> 
> This is okay for trunk.  Thanks!
> 

Thanks Segher!  Committed as r12-5569.

BR,
Kewen
  

Patch

diff --git a/gcc/testsuite/gcc.target/powerpc/vect-gather-1.c b/gcc/testsuite/gcc.target/powerpc/vect-gather-1.c
new file mode 100644
index 00000000000..bf98045ab03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vect-gather-1.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* Profitable from Power8 since it supports efficient unaligned load.  */
+/* { dg-options "-Ofast -mdejagnu-cpu=power8 -fdump-tree-vect-details -fdump-tree-forwprop4" } */
+
+#ifndef INDEXTYPE
+#define INDEXTYPE unsigned int
+#endif
+double vmul(INDEXTYPE *rowstart, INDEXTYPE *rowend,
+	    double *luval, double *dst)
+{
+  double res = 0;
+  for (const INDEXTYPE * col = rowstart; col != rowend; ++col, ++luval)
+        res += *luval * dst[*col];
+  return res;
+}
+
+/* With gather emulation this should be profitable to vectorize from Power8.  */
+/* { dg-final { scan-tree-dump "loop vectorized" "vect" } } */
+/* The index vector loads and promotions should be scalar after forwprop.  */
+/* { dg-final { scan-tree-dump-not "vec_unpack" "forwprop4" } } */
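
The test in the patch is compile-only (dg-do compile), so it checks
the dumps but never runs the kernel.  For anyone who wants to
exercise the same loop at run time, a small harness along the
following lines could be used.  This is only a sketch: the helper
vmul_reference is hypothetical and is not part of the patch.  Because
of the #ifndef guard, the signed variant can be built by passing
-DINDEXTYPE=int on the command line.

```c
#include <assert.h>
#include <stddef.h>

#ifndef INDEXTYPE
#define INDEXTYPE unsigned int
#endif

/* Kernel with the same body as the one in the patch.  */
double vmul(INDEXTYPE *rowstart, INDEXTYPE *rowend,
            double *luval, double *dst)
{
  double res = 0;
  for (const INDEXTYPE *col = rowstart; col != rowend; ++col, ++luval)
    res += *luval * dst[*col];
  return res;
}

/* Hypothetical reference: the same dot product written with explicit
   array indexing, for comparison against vmul.  */
double vmul_reference(const INDEXTYPE *cols, size_t n,
                      const double *luval, const double *dst)
{
  assert(cols != NULL || n == 0);

  double res = 0;
  for (size_t i = 0; i < n; ++i)
    res += luval[i] * dst[cols[i]];
  return res;
}
```

Using small integer-valued inputs keeps every product and partial sum
exactly representable, so the two functions can be compared with ==
even if the compiler reorders the additions.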