[SVE] PR96463 - Optimise svld1rq from vectors

Message ID CAAgBjMmAJek1O=pFkRNx9jn2DAzD-0AV57ASq0cyP+aDmg+9sg@mail.gmail.com
State New
Headers
Series [SVE] PR96463 - Optimise svld1rq from vectors |

Commit Message

Prathamesh Kulkarni Dec. 2, 2021, 10:50 a.m. UTC
  Hi Richard,
I have attached a WIP untested patch for PR96463.
IIUC, the PR suggests to transform
lhs = svld1rq ({-1, -1, ...}, &v[0])
into:
lhs = vec_perm_expr<v, v, {0, 0, ...}>
if v is vector of 4 elements, and each element is 32 bits on little
endian target ?

I am sorry if this sounds like a silly question, but I am not sure how
to convert a vector of type int32x4_t into svint32_t ? In the patch, I
simply used NOP_EXPR (which I expected to fail), and gave type error
during gimple verification:

svint32_t
foo (int32x4_t x)
{
  return svld1rq (svptrue_b8 (), &x[0]);
}

transformed to:
EMERGENCY DUMP:

svint32_t foo (int32x4_t x)
{
  svint32_t _3;
  __Int32x4_t _4;

  <bb 2> :
  _4 = VEC_PERM_EXPR <x_5(D), x_5(D), { 0, 0, 0, 0 }>;
  _3 = (svint32_t) _4;
  return _3;

}

and ICE's with:
pr96463.c:8:1: error: invalid vector types in nop conversion
    8 | }
      | ^
svint32_t
__Int32x4_t
_3 = (svint32_t) _4;
during GIMPLE pass: ccp

Could you please suggest how to proceed ?

Thanks,
Prathamesh
  

Comments

Richard Sandiford Dec. 2, 2021, 5:41 p.m. UTC | #1
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi Richard,
> I have attached a WIP untested patch for PR96463.
> IIUC, the PR suggests to transform
> lhs = svld1rq ({-1, -1, ...}, &v[0])
> into:
> lhs = vec_perm_expr<v, v, {0, 0, ...}>
> if v is vector of 4 elements, and each element is 32 bits on little
> endian target ?
>
> I am sorry if this sounds like a silly question, but I am not sure how
> to convert a vector of type int32x4_t into svint32_t ? In the patch, I
> simply used NOP_EXPR (which I expected to fail), and gave type error
> during gimple verification:

It should be possible in principle to have a VEC_PERM_EXPR in which
the operands are Advanced SIMD vectors and the result is an SVE vector.

E.g., the dup in the PR would be something like this:

foo (int32x4_t a)
{
  svint32_t _2;

  _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
  return _2;
}

where the final operand can be built using:

  int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
  vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
  for (int i = 0; i < source_nelts; ++i)
    sel.quick_push (i);

I'm not sure how well-tested that combination is though.  It might need
changes to target-independent code.

Thanks,
Richard
  
Prathamesh Kulkarni Dec. 7, 2021, 11:45 a.m. UTC | #2
On Thu, 2 Dec 2021 at 23:11, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > Hi Richard,
> > I have attached a WIP untested patch for PR96463.
> > IIUC, the PR suggests to transform
> > lhs = svld1rq ({-1, -1, ...}, &v[0])
> > into:
> > lhs = vec_perm_expr<v, v, {0, 0, ...}>
> > if v is vector of 4 elements, and each element is 32 bits on little
> > endian target ?
> >
> > I am sorry if this sounds like a silly question, but I am not sure how
> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I
> > simply used NOP_EXPR (which I expected to fail), and gave type error
> > during gimple verification:
>
> It should be possible in principle to have a VEC_PERM_EXPR in which
> the operands are Advanced SIMD vectors and the result is an SVE vector.
>
> E.g., the dup in the PR would be something like this:
>
> foo (int32x4_t a)
> {
>   svint32_t _2;
>
>   _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
>   return _2;
> }
>
> where the final operand can be built using:
>
>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
>   for (int i = 0; i < source_nelts; ++i)
>     sel.quick_push (i);
>
> I'm not sure how well-tested that combination is though.  It might need
> changes to target-independent code.
Hi Richard,
Thanks for the suggestions.
I tried the above approach in attached patch, but it still results in
ICE due to type mismatch:

pr96463.c: In function ‘foo’:
pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
    8 | }
      | ^
svint32_t
int32x4_t
int32x4_t
svint32_t
_3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
during GIMPLE pass: ccp
dump file: pr96463.c.032t.ccp1
pr96463.c:8:1: internal compiler error: verify_gimple failed

Should we perhaps add another tree code, that "extends" a fixed-width
vector into it's VLA equivalent ?

Thanks,
Prathamesh
>
> Thanks,
> Richard
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 02e42a71e5e..b38c4641535 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -44,6 +44,8 @@
 #include "aarch64-sve-builtins-shapes.h"
 #include "aarch64-sve-builtins-base.h"
 #include "aarch64-sve-builtins-functions.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"
 
 using namespace aarch64_sve;
 
@@ -1207,6 +1209,57 @@ public:
     insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
     return e.use_contiguous_load_insn (icode);
   }
+
+  gimple *
+  fold (gimple_folder &f) const OVERRIDE
+  {
+    tree arg0 = gimple_call_arg (f.call, 0);
+    tree arg1 = gimple_call_arg (f.call, 1);
+
+    /* Transform:
+       lhs = svld1rq ({-1, -1, ... }, &v[0])
+       into:
+       tmp = vec_perm_expr<v, v, {0, 0, ...}>.
+       lhs = nop_expr tmp
+       on little endian target.  */
+
+    if (!BYTES_BIG_ENDIAN
+	&& integer_all_onesp (arg0)
+	&& TREE_CODE (arg1) == ADDR_EXPR)
+      {
+	tree t = TREE_OPERAND (arg1, 0);
+	if (TREE_CODE (t) == ARRAY_REF)
+	  {
+	    tree index = TREE_OPERAND (t, 1);
+	    t = TREE_OPERAND (t, 0);
+	    if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
+	      {
+		t = TREE_OPERAND (t, 0);
+		tree vectype = TREE_TYPE (t);
+		if (VECTOR_TYPE_P (vectype)
+		    && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
+		    && wi::to_wide (TYPE_SIZE (vectype)) == 128)
+		  {
+		    tree lhs = gimple_call_lhs (f.call);
+		    tree lhs_type = TREE_TYPE (lhs);
+		    int source_nelts = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+		    vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (lhs_type), source_nelts, 1);
+		    for (int i = 0; i < source_nelts; i++)
+		      sel.quick_push (i);
+
+		    vec_perm_indices indices (sel, 1, source_nelts);
+		    if (!can_vec_perm_const_p (TYPE_MODE (lhs_type), indices))
+		      return NULL;
+
+		    tree mask = vec_perm_indices_to_tree (lhs_type, indices);
+		    return gimple_build_assign (lhs, VEC_PERM_EXPR, t, t, mask);
+		  }
+	      }
+	  }
+      }
+
+    return NULL;
+  }
 };
 
 class svld1ro_impl : public load_replicate
  
Richard Sandiford Dec. 7, 2021, 1:38 p.m. UTC | #3
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Thu, 2 Dec 2021 at 23:11, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > Hi Richard,
>> > I have attached a WIP untested patch for PR96463.
>> > IIUC, the PR suggests to transform
>> > lhs = svld1rq ({-1, -1, ...}, &v[0])
>> > into:
>> > lhs = vec_perm_expr<v, v, {0, 0, ...}>
>> > if v is vector of 4 elements, and each element is 32 bits on little
>> > endian target ?
>> >
>> > I am sorry if this sounds like a silly question, but I am not sure how
>> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I
>> > simply used NOP_EXPR (which I expected to fail), and gave type error
>> > during gimple verification:
>>
>> It should be possible in principle to have a VEC_PERM_EXPR in which
>> the operands are Advanced SIMD vectors and the result is an SVE vector.
>>
>> E.g., the dup in the PR would be something like this:
>>
>> foo (int32x4_t a)
>> {
>>   svint32_t _2;
>>
>>   _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
>>   return _2;
>> }
>>
>> where the final operand can be built using:
>>
>>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
>>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
>>   for (int i = 0; i < source_nelts; ++i)
>>     sel.quick_push (i);
>>
>> I'm not sure how well-tested that combination is though.  It might need
>> changes to target-independent code.
> Hi Richard,
> Thanks for the suggestions.
> I tried the above approach in attached patch, but it still results in
> ICE due to type mismatch:
>
> pr96463.c: In function ‘foo’:
> pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
>     8 | }
>       | ^
> svint32_t
> int32x4_t
> int32x4_t
> svint32_t
> _3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
> during GIMPLE pass: ccp
> dump file: pr96463.c.032t.ccp1
> pr96463.c:8:1: internal compiler error: verify_gimple failed
>
> Should we perhaps add another tree code, that "extends" a fixed-width
> vector into it's VLA equivalent ?

No, I think this is just an extreme example of the combination not being
well-tested. :-)  Obviously it's worse than I thought.

I think accepting this kind of VEC_PERM_EXPR is still the way to go.
Richi, WDYT?

Thanks,
Richard
  
Prathamesh Kulkarni Dec. 14, 2021, 8:34 a.m. UTC | #4
On Tue, 7 Dec 2021 at 19:08, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > On Thu, 2 Dec 2021 at 23:11, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> >> > Hi Richard,
> >> > I have attached a WIP untested patch for PR96463.
> >> > IIUC, the PR suggests to transform
> >> > lhs = svld1rq ({-1, -1, ...}, &v[0])
> >> > into:
> >> > lhs = vec_perm_expr<v, v, {0, 0, ...}>
> >> > if v is vector of 4 elements, and each element is 32 bits on little
> >> > endian target ?
> >> >
> >> > I am sorry if this sounds like a silly question, but I am not sure how
> >> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I
> >> > simply used NOP_EXPR (which I expected to fail), and gave type error
> >> > during gimple verification:
> >>
> >> It should be possible in principle to have a VEC_PERM_EXPR in which
> >> the operands are Advanced SIMD vectors and the result is an SVE vector.
> >>
> >> E.g., the dup in the PR would be something like this:
> >>
> >> foo (int32x4_t a)
> >> {
> >>   svint32_t _2;
> >>
> >>   _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
> >>   return _2;
> >> }
> >>
> >> where the final operand can be built using:
> >>
> >>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
> >>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
> >>   for (int i = 0; i < source_nelts; ++i)
> >>     sel.quick_push (i);
> >>
> >> I'm not sure how well-tested that combination is though.  It might need
> >> changes to target-independent code.
> > Hi Richard,
> > Thanks for the suggestions.
> > I tried the above approach in attached patch, but it still results in
> > ICE due to type mismatch:
> >
> > pr96463.c: In function ‘foo’:
> > pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
> >     8 | }
> >       | ^
> > svint32_t
> > int32x4_t
> > int32x4_t
> > svint32_t
> > _3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
> > during GIMPLE pass: ccp
> > dump file: pr96463.c.032t.ccp1
> > pr96463.c:8:1: internal compiler error: verify_gimple failed
> >
> > Should we perhaps add another tree code, that "extends" a fixed-width
> > vector into it's VLA equivalent ?
>
> No, I think this is just an extreme example of the combination not being
> well-tested. :-)  Obviously it's worse than I thought.
>
> I think accepting this kind of VEC_PERM_EXPR is still the way to go.
> Richi, WDYT?
Hi Richi, ping ?

Thanks,
Prathamesh
>
> Thanks,
> Richard
  
Richard Biener Jan. 3, 2022, 10:18 a.m. UTC | #5
On Tue, 14 Dec 2021, Prathamesh Kulkarni wrote:

> On Tue, 7 Dec 2021 at 19:08, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > On Thu, 2 Dec 2021 at 23:11, Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > >>
> > >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > >> > Hi Richard,
> > >> > I have attached a WIP untested patch for PR96463.
> > >> > IIUC, the PR suggests to transform
> > >> > lhs = svld1rq ({-1, -1, ...}, &v[0])
> > >> > into:
> > >> > lhs = vec_perm_expr<v, v, {0, 0, ...}>
> > >> > if v is vector of 4 elements, and each element is 32 bits on little
> > >> > endian target ?
> > >> >
> > >> > I am sorry if this sounds like a silly question, but I am not sure how
> > >> > to convert a vector of type int32x4_t into svint32_t ? In the patch, I
> > >> > simply used NOP_EXPR (which I expected to fail), and gave type error
> > >> > during gimple verification:
> > >>
> > >> It should be possible in principle to have a VEC_PERM_EXPR in which
> > >> the operands are Advanced SIMD vectors and the result is an SVE vector.
> > >>
> > >> E.g., the dup in the PR would be something like this:
> > >>
> > >> foo (int32x4_t a)
> > >> {
> > >>   svint32_t _2;
> > >>
> > >>   _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
> > >>   return _2;
> > >> }
> > >>
> > >> where the final operand can be built using:
> > >>
> > >>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
> > >>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
> > >>   for (int i = 0; i < source_nelts; ++i)
> > >>     sel.quick_push (i);
> > >>
> > >> I'm not sure how well-tested that combination is though.  It might need
> > >> changes to target-independent code.
> > > Hi Richard,
> > > Thanks for the suggestions.
> > > I tried the above approach in attached patch, but it still results in
> > > ICE due to type mismatch:
> > >
> > > pr96463.c: In function ‘foo’:
> > > pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
> > >     8 | }
> > >       | ^
> > > svint32_t
> > > int32x4_t
> > > int32x4_t
> > > svint32_t
> > > _3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
> > > during GIMPLE pass: ccp
> > > dump file: pr96463.c.032t.ccp1
> > > pr96463.c:8:1: internal compiler error: verify_gimple failed
> > >
> > > Should we perhaps add another tree code, that "extends" a fixed-width
> > > vector into it's VLA equivalent ?
> >
> > No, I think this is just an extreme example of the combination not being
> > well-tested. :-)  Obviously it's worse than I thought.
> >
> > I think accepting this kind of VEC_PERM_EXPR is still the way to go.
> > Richi, WDYT?
> Hi Richi, ping ?

We check

    case VEC_PERM_EXPR:
      if (!useless_type_conversion_p (lhs_type, rhs1_type)
          || !useless_type_conversion_p (lhs_type, rhs2_type))
        {
          error ("type mismatch in %qs", code_name);

and LHS is svint32_t while x_4 is int32x4_t (the permutation type
can indeed be different - it needs to be integer for example).

The test is indeed unnecessarily strict if there are two vector
types that are not compatible but have the same element mode and the
same number of elements.

I guess we could check sth like

   if (!useless_type_conversion_p (TREE_TYPE (lhs_type), TREE_TYPE 
(rhs1_type))
       || !types_compatible_p (rhs1_type, rhs2_type))

instead - we later check TYPE_VECTOR_SUBPARTS so they match.  But
note that vec_perm_optab has a single mode only so I'm not sure
what mode we should pick at RTL expansion time for your quoted case
so I'm a bit nervous here.  Richard?

Richard.
  

Patch

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 02e42a71e5e..3834f33443a 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -44,6 +44,13 @@ 
 #include "aarch64-sve-builtins-shapes.h"
 #include "aarch64-sve-builtins-base.h"
 #include "aarch64-sve-builtins-functions.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"
+
+/* ??? Including tree-ssanames.h requires including other header dependencies.
+       Just including the prototype for now.  */
+extern tree make_ssa_name_fn (struct function *, tree, gimple *,
+			      unsigned int version = 0);
 
 using namespace aarch64_sve;
 
@@ -1207,6 +1214,52 @@  public:
     insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
     return e.use_contiguous_load_insn (icode);
   }
+
+  gimple *
+  fold (gimple_folder &f) const OVERRIDE
+  {
+    tree arg0 = gimple_call_arg (f.call, 0);
+    tree arg1 = gimple_call_arg (f.call, 1);
+
+    /* Transform:
+       lhs = svld1rq ({-1, -1, ... }, &v[0])
+       into:
+       tmp = vec_perm_expr<v, v, {0, 0, ...}>.
+       lhs = nop_expr tmp
+       on little endian target.  */
+
+    if (!BYTES_BIG_ENDIAN
+	&& integer_all_onesp (arg0)
+	&& TREE_CODE (arg1) == ADDR_EXPR)
+      {
+	tree t = TREE_OPERAND (arg1, 0);
+	if (TREE_CODE (t) == ARRAY_REF)
+	  {
+	    tree index = TREE_OPERAND (t, 1);
+	    t = TREE_OPERAND (t, 0);
+	    if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
+	      {
+		t = TREE_OPERAND (t, 0);
+		tree vectype = TREE_TYPE (t);
+		if (VECTOR_TYPE_P (vectype)
+		    && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
+		    && wi::to_wide (TYPE_SIZE (vectype)) == 128)
+		  {
+		    tree new_temp = ::make_ssa_name_fn (cfun, vectype, NULL);
+		    tree zero_vec = build_vector_from_val (vectype, index);
+		    gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, t, t, zero_vec);
+		    /* ??? How to convert between vector types if gimple_call_lhs (f.call) and
+			   new_temp have different types ?  */
+		    gimple *g2 = gimple_build_assign (gimple_call_lhs (f.call), NOP_EXPR, new_temp);
+		    gsi_insert_before (f.gsi, g, GSI_SAME_STMT);
+		    return g2;
+		  }
+	      }
+	  }
+      }
+
+    return NULL;
+  }
 };
 
 class svld1ro_impl : public load_replicate