ICE after folding svld1rq to vec_perm_expr duing forwprop

Message ID CAAgBjMmC1EX7AvzPVcn86ZqhcZJtXGx16yEDkB2qSkJ_i09sAQ@mail.gmail.com
State New
Headers
Series ICE after folding svld1rq to vec_perm_expr duing forwprop |

Commit Message

Prathamesh Kulkarni July 12, 2022, 7:11 p.m. UTC
  Hi Richard,
For the following test:

svint32_t f2(int a, int b, int c, int d)
{
  int32x4_t v = (int32x4_t) {a, b, c, d};
  return svld1rq_s32 (svptrue_b8 (), &v[0]);
}

The compiler emits following ICE with -O3 -mcpu=generic+sve:
foo.c: In function ‘f2’:
foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
    4 | svint32_t f2(int a, int b, int c, int d)
      |           ^~
svint32_t
__Int32x4_t
_7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
during GIMPLE pass: forwprop
dump file: foo.c.109t.forwprop2
foo.c:4:11: internal compiler error: verify_gimple failed
0xfda04a verify_gimple_in_cfg(function*, bool)
        ../../gcc/gcc/tree-cfg.cc:5568
0xe9371f execute_function_todo
        ../../gcc/gcc/passes.cc:2091
0xe93ccb execute_todo
        ../../gcc/gcc/passes.cc:2145

This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
  int32x4_t v;
  __Int32x4_t _1;
  svint32_t _9;
  vector(4) int _11;

  <bb 2> :
  _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
  v_12 = _1;
  _11 = v_12;
  _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
  return _9;

During forwprop, simplify_permutation simplifies vec_perm_expr to
view_convert_expr,
and the end result becomes:
  svint32_t _7;
  __Int32x4_t _8;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
  _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
  return _7;
;;    succ:       EXIT

which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
has incompatible types (svint32_t, int32x4_t).

The attached patch disables simplification of VEC_PERM_EXPR
in simplify_permutation, if lhs and rhs have non compatible types,
which resolves ICE, but am not sure if it's the correct approach ?

Alternatively, should we allow assignments from fixed-width to SVE
vector, so the above
VIEW_CONVERT_EXPR would result in dup ?

Thanks,
Prathamesh
  

Comments

Richard Biener July 13, 2022, 6:51 a.m. UTC | #1
On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi Richard,
> For the following test:
>
> svint32_t f2(int a, int b, int c, int d)
> {
>   int32x4_t v = (int32x4_t) {a, b, c, d};
>   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> }
>
> The compiler emits following ICE with -O3 -mcpu=generic+sve:
> foo.c: In function ‘f2’:
> foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
>     4 | svint32_t f2(int a, int b, int c, int d)
>       |           ^~
> svint32_t
> __Int32x4_t
> _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> during GIMPLE pass: forwprop
> dump file: foo.c.109t.forwprop2
> foo.c:4:11: internal compiler error: verify_gimple failed
> 0xfda04a verify_gimple_in_cfg(function*, bool)
>         ../../gcc/gcc/tree-cfg.cc:5568
> 0xe9371f execute_function_todo
>         ../../gcc/gcc/passes.cc:2091
> 0xe93ccb execute_todo
>         ../../gcc/gcc/passes.cc:2145
>
> This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
>   int32x4_t v;
>   __Int32x4_t _1;
>   svint32_t _9;
>   vector(4) int _11;
>
>   <bb 2> :
>   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
>   v_12 = _1;
>   _11 = v_12;
>   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
>   return _9;
>
> During forwprop, simplify_permutation simplifies vec_perm_expr to
> view_convert_expr,
> and the end result becomes:
>   svint32_t _7;
>   __Int32x4_t _8;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
>   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
>   return _7;
> ;;    succ:       EXIT
>
> which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> has incompatible types (svint32_t, int32x4_t).
>
> The attached patch disables simplification of VEC_PERM_EXPR
> in simplify_permutation, if lhs and rhs have non compatible types,
> which resolves ICE, but am not sure if it's the correct approach ?

It for sure papers over the issue.  I think the error happens earlier,
the V_C_E should have been built with the type of the VEC_PERM_EXPR
which is the type of the LHS.  But then you probably run into the
different sizes ICE (VLA vs constant size).  I think for this case you
want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
selecting the "low" part of the VLA vector.

>
> Alternatively, should we allow assignments from fixed-width to SVE
> vector, so the above
> VIEW_CONVERT_EXPR would result in dup ?
>
> Thanks,
> Prathamesh
  
Prathamesh Kulkarni July 14, 2022, 7:54 a.m. UTC | #2
On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi Richard,
> > For the following test:
> >
> > svint32_t f2(int a, int b, int c, int d)
> > {
> >   int32x4_t v = (int32x4_t) {a, b, c, d};
> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > }
> >
> > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > foo.c: In function ‘f2’:
> > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> >     4 | svint32_t f2(int a, int b, int c, int d)
> >       |           ^~
> > svint32_t
> > __Int32x4_t
> > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > during GIMPLE pass: forwprop
> > dump file: foo.c.109t.forwprop2
> > foo.c:4:11: internal compiler error: verify_gimple failed
> > 0xfda04a verify_gimple_in_cfg(function*, bool)
> >         ../../gcc/gcc/tree-cfg.cc:5568
> > 0xe9371f execute_function_todo
> >         ../../gcc/gcc/passes.cc:2091
> > 0xe93ccb execute_todo
> >         ../../gcc/gcc/passes.cc:2145
> >
> > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> >   int32x4_t v;
> >   __Int32x4_t _1;
> >   svint32_t _9;
> >   vector(4) int _11;
> >
> >   <bb 2> :
> >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> >   v_12 = _1;
> >   _11 = v_12;
> >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> >   return _9;
> >
> > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > view_convert_expr,
> > and the end result becomes:
> >   svint32_t _7;
> >   __Int32x4_t _8;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> >   return _7;
> > ;;    succ:       EXIT
> >
> > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > has incompatible types (svint32_t, int32x4_t).
> >
> > The attached patch disables simplification of VEC_PERM_EXPR
> > in simplify_permutation, if lhs and rhs have non compatible types,
> > which resolves ICE, but am not sure if it's the correct approach ?
>
> It for sure papers over the issue.  I think the error happens earlier,
> the V_C_E should have been built with the type of the VEC_PERM_EXPR
> which is the type of the LHS.  But then you probably run into the
> different sizes ICE (VLA vs constant size).  I think for this case you
> want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> selecting the "low" part of the VLA vector.
Hi Richard,
Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
represent dup operation
from fixed width to VLA vector. I am not sure how folding it to
BIT_FIELD_REF will work.
Could you please elaborate ?

Also, the issue doesn't seem restricted to this case.
The following test case also ICE's during forwprop:
svint32_t foo()
{
  int32x4_t v = (int32x4_t) {1, 2, 3, 4};
  svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
  return v2;
}

foo2.c: In function ‘foo’:
foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
    9 | }
      | ^
svint32_t
int32x4_t
v2_4 = { 1, 2, 3, 4 };

because simplify_permutation folds
VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
into:
vector_cst {1, 2, 3, 4}

and it complains during verify_gimple_assign_single because we don't
support assignment of vector_cst to VLA vector.

I guess the issue really is that currently, only VEC_PERM_EXPR
supports lhs and rhs
to have vector types with differing lengths, and simplifying it to
other tree codes, like above,
will result in type errors ?

Thanks,
Prathamesh
>
> >
> > Alternatively, should we allow assignments from fixed-width to SVE
> > vector, so the above
> > VIEW_CONVERT_EXPR would result in dup ?
> >
> > Thanks,
> > Prathamesh
  
Richard Biener July 14, 2022, 8:33 a.m. UTC | #3
On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi Richard,
> > > For the following test:
> > >
> > > svint32_t f2(int a, int b, int c, int d)
> > > {
> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > }
> > >
> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > foo.c: In function ‘f2’:
> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > >       |           ^~
> > > svint32_t
> > > __Int32x4_t
> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > during GIMPLE pass: forwprop
> > > dump file: foo.c.109t.forwprop2
> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > 0xe9371f execute_function_todo
> > >         ../../gcc/gcc/passes.cc:2091
> > > 0xe93ccb execute_todo
> > >         ../../gcc/gcc/passes.cc:2145
> > >
> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > >   int32x4_t v;
> > >   __Int32x4_t _1;
> > >   svint32_t _9;
> > >   vector(4) int _11;
> > >
> > >   <bb 2> :
> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > >   v_12 = _1;
> > >   _11 = v_12;
> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > >   return _9;
> > >
> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > view_convert_expr,
> > > and the end result becomes:
> > >   svint32_t _7;
> > >   __Int32x4_t _8;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > > has incompatible types (svint32_t, int32x4_t).
> > >
> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > which resolves ICE, but am not sure if it's the correct approach ?
> >
> > It for sure papers over the issue.  I think the error happens earlier,
> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > which is the type of the LHS.  But then you probably run into the
> > different sizes ICE (VLA vs constant size).  I think for this case you
> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > selecting the "low" part of the VLA vector.
> Hi Richard,
> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> represent dup operation
> from fixed width to VLA vector. I am not sure how folding it to
> BIT_FIELD_REF will work.
> Could you please elaborate ?
>
> Also, the issue doesn't seem restricted to this case.
> The following test case also ICE's during forwprop:
> svint32_t foo()
> {
>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
>   return v2;
> }
>
> foo2.c: In function ‘foo’:
> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
>     9 | }
>       | ^
> svint32_t
> int32x4_t
> v2_4 = { 1, 2, 3, 4 };
>
> because simplify_permutation folds
> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> into:
> vector_cst {1, 2, 3, 4}
>
> and it complains during verify_gimple_assign_single because we don't
> support assignment of vector_cst to VLA vector.
>
> I guess the issue really is that currently, only VEC_PERM_EXPR
> supports lhs and rhs
> to have vector types with differing lengths, and simplifying it to
> other tree codes, like above,
> will result in type errors ?

That might be the case - Richard should know.  If so your type check
is still too late, you should instead recognize that we are permuting
a VLA vector and then refuse to go any of the non-VEC_PERM generating
paths - that probably means just allowing the code == VEC_PERM_EXPR
case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?

Richard.

>
> Thanks,
> Prathamesh
> >
> > >
> > > Alternatively, should we allow assignments from fixed-width to SVE
> > > vector, so the above
> > > VIEW_CONVERT_EXPR would result in dup ?
> > >
> > > Thanks,
> > > Prathamesh
  
Richard Sandiford July 14, 2022, 11:52 a.m. UTC | #4
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
>>
>> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
>> >
>> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
>> > <gcc-patches@gcc.gnu.org> wrote:
>> > >
>> > > Hi Richard,
>> > > For the following test:
>> > >
>> > > svint32_t f2(int a, int b, int c, int d)
>> > > {
>> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
>> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
>> > > }
>> > >
>> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
>> > > foo.c: In function ‘f2’:
>> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
>> > >     4 | svint32_t f2(int a, int b, int c, int d)
>> > >       |           ^~
>> > > svint32_t
>> > > __Int32x4_t
>> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
>> > > during GIMPLE pass: forwprop
>> > > dump file: foo.c.109t.forwprop2
>> > > foo.c:4:11: internal compiler error: verify_gimple failed
>> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
>> > >         ../../gcc/gcc/tree-cfg.cc:5568
>> > > 0xe9371f execute_function_todo
>> > >         ../../gcc/gcc/passes.cc:2091
>> > > 0xe93ccb execute_todo
>> > >         ../../gcc/gcc/passes.cc:2145
>> > >
>> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
>> > >   int32x4_t v;
>> > >   __Int32x4_t _1;
>> > >   svint32_t _9;
>> > >   vector(4) int _11;
>> > >
>> > >   <bb 2> :
>> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
>> > >   v_12 = _1;
>> > >   _11 = v_12;
>> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
>> > >   return _9;
>> > >
>> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
>> > > view_convert_expr,
>> > > and the end result becomes:
>> > >   svint32_t _7;
>> > >   __Int32x4_t _8;
>> > >
>> > > ;;   basic block 2, loop depth 0
>> > > ;;    pred:       ENTRY
>> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
>> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
>> > >   return _7;
>> > > ;;    succ:       EXIT
>> > >
>> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
>> > > has incompatible types (svint32_t, int32x4_t).
>> > >
>> > > The attached patch disables simplification of VEC_PERM_EXPR
>> > > in simplify_permutation, if lhs and rhs have non compatible types,
>> > > which resolves ICE, but am not sure if it's the correct approach ?
>> >
>> > It for sure papers over the issue.  I think the error happens earlier,
>> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
>> > which is the type of the LHS.  But then you probably run into the
>> > different sizes ICE (VLA vs constant size).  I think for this case you
>> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
>> > selecting the "low" part of the VLA vector.
>> Hi Richard,
>> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
>> represent dup operation
>> from fixed width to VLA vector. I am not sure how folding it to
>> BIT_FIELD_REF will work.
>> Could you please elaborate ?
>>
>> Also, the issue doesn't seem restricted to this case.
>> The following test case also ICE's during forwprop:
>> svint32_t foo()
>> {
>>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
>>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
>>   return v2;
>> }
>>
>> foo2.c: In function ‘foo’:
>> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
>>     9 | }
>>       | ^
>> svint32_t
>> int32x4_t
>> v2_4 = { 1, 2, 3, 4 };
>>
>> because simplify_permutation folds
>> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
>> into:
>> vector_cst {1, 2, 3, 4}
>>
>> and it complains during verify_gimple_assign_single because we don't
>> support assignment of vector_cst to VLA vector.
>>
>> I guess the issue really is that currently, only VEC_PERM_EXPR
>> supports lhs and rhs
>> to have vector types with differing lengths, and simplifying it to
>> other tree codes, like above,
>> will result in type errors ?
>
> That might be the case - Richard should know.

I don't see anything particularly special about VEC_PERM_EXPR here,
or about the VLA vs. VLS thing.  We would have the same issue trying
to build a 128-bit vector from 2 64-bit vectors.  And there are other
tree codes whose input types are/can be different from their output
types.

So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
of type T needs to be replaced by something of type T.  Whether T has a
constant size or a variable size doesn't matter.

> If so your type check
> is still too late, you should instead recognize that we are permuting
> a VLA vector and then refuse to go any of the non-VEC_PERM generating
> paths - that probably means just allowing the code == VEC_PERM_EXPR
> case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?

Yeah.  If we're talking about the match.pd code, I think only:

  (if (sel.series_p (0, 1, 0, 1))
   { op0; }
   (if (sel.series_p (0, 1, nelts, 1))
    { op1; }

need a type compatibility check.  For fold_vec_perm I think
we should just rearrange:

  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
	      && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
	      && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
  if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
      || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
    return NULL_TREE;

so that the assert comes after the early-out.

It would be good at some point to relax fold_vec_perm to cases where the
outputs are a different length from the inputs, since the all-constant
VEC_PERM_EXPR above could be folded to a VECTOR_CST.

Thanks,
Richard
  
Prathamesh Kulkarni July 15, 2022, 1:48 p.m. UTC | #5
On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> >>
> >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> >> >
> >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> >> > <gcc-patches@gcc.gnu.org> wrote:
> >> > >
> >> > > Hi Richard,
> >> > > For the following test:
> >> > >
> >> > > svint32_t f2(int a, int b, int c, int d)
> >> > > {
> >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> >> > > }
> >> > >
> >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> >> > > foo.c: In function ‘f2’:
> >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> >> > >       |           ^~
> >> > > svint32_t
> >> > > __Int32x4_t
> >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> >> > > during GIMPLE pass: forwprop
> >> > > dump file: foo.c.109t.forwprop2
> >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> >> > > 0xe9371f execute_function_todo
> >> > >         ../../gcc/gcc/passes.cc:2091
> >> > > 0xe93ccb execute_todo
> >> > >         ../../gcc/gcc/passes.cc:2145
> >> > >
> >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> >> > >   int32x4_t v;
> >> > >   __Int32x4_t _1;
> >> > >   svint32_t _9;
> >> > >   vector(4) int _11;
> >> > >
> >> > >   <bb 2> :
> >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> >> > >   v_12 = _1;
> >> > >   _11 = v_12;
> >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> >> > >   return _9;
> >> > >
> >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> >> > > view_convert_expr,
> >> > > and the end result becomes:
> >> > >   svint32_t _7;
> >> > >   __Int32x4_t _8;
> >> > >
> >> > > ;;   basic block 2, loop depth 0
> >> > > ;;    pred:       ENTRY
> >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> >> > >   return _7;
> >> > > ;;    succ:       EXIT
> >> > >
> >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> >> > > has incompatible types (svint32_t, int32x4_t).
> >> > >
> >> > > The attached patch disables simplification of VEC_PERM_EXPR
> >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> >> > > which resolves ICE, but am not sure if it's the correct approach ?
> >> >
> >> > It for sure papers over the issue.  I think the error happens earlier,
> >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> >> > which is the type of the LHS.  But then you probably run into the
> >> > different sizes ICE (VLA vs constant size).  I think for this case you
> >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> >> > selecting the "low" part of the VLA vector.
> >> Hi Richard,
> >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> >> represent dup operation
> >> from fixed width to VLA vector. I am not sure how folding it to
> >> BIT_FIELD_REF will work.
> >> Could you please elaborate ?
> >>
> >> Also, the issue doesn't seem restricted to this case.
> >> The following test case also ICE's during forwprop:
> >> svint32_t foo()
> >> {
> >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> >>   return v2;
> >> }
> >>
> >> foo2.c: In function ‘foo’:
> >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> >>     9 | }
> >>       | ^
> >> svint32_t
> >> int32x4_t
> >> v2_4 = { 1, 2, 3, 4 };
> >>
> >> because simplify_permutation folds
> >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> >> into:
> >> vector_cst {1, 2, 3, 4}
> >>
> >> and it complains during verify_gimple_assign_single because we don't
> >> support assignment of vector_cst to VLA vector.
> >>
> >> I guess the issue really is that currently, only VEC_PERM_EXPR
> >> supports lhs and rhs
> >> to have vector types with differing lengths, and simplifying it to
> >> other tree codes, like above,
> >> will result in type errors ?
> >
> > That might be the case - Richard should know.
>
> I don't see anything particularly special about VEC_PERM_EXPR here,
> or about the VLA vs. VLS thing.  We would have the same issue trying
> to build a 128-bit vector from 2 64-bit vectors.  And there are other
> tree codes whose input types are/can be different from their output
> types.
>
> So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> of type T needs to be replaced by something of type T.  Whether T has a
> constant size or a variable size doesn't matter.
>
> > If so your type check
> > is still too late, you should instead recognize that we are permuting
> > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > paths - that probably means just allowing the code == VEC_PERM_EXPR
> > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?
>
> Yeah.  If we're talking about the match.pd code, I think only:
>
>   (if (sel.series_p (0, 1, 0, 1))
>    { op0; }
>    (if (sel.series_p (0, 1, nelts, 1))
>     { op1; }
>
> need a type compatibility check.  For fold_vec_perm I think
> we should just rearrange:
>
>   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
>               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
>               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
>   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
>       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
>     return NULL_TREE;
>
> so that the assert comes after the early-out.
>
> It would be good at some point to relax fold_vec_perm to cases where the
> outputs are a different length from the inputs, since the all-constant
> VEC_PERM_EXPR above could be folded to a VECTOR_CST.
Hi,
For the above case, I think the issue is that simplify_permutation
uses TREE_TYPE (arg0) for res_type,
while it should now use type for lhs.

      /* Shuffle of a constructor.  */
      bool ret = false;
      tree res_type = TREE_TYPE (arg0);
      tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);

Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)),
resolves the ICE, and emits all constant VEC_PERM_EXPR:

  v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
  return v2_4;

Does the patch look OK to commit after bootstrap+test ?
I will try to address the folding for above VEC_PERM_EXPR in follow-up patch.

Thanks,
Prathamesh
>
> Thanks,
> Richard
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 69567ab3275..b320ece13f2 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2589,7 +2589,7 @@ simplify_permutation (gimple_stmt_iterator *gsi)
 
       /* Shuffle of a constructor.  */
       bool ret = false;
-      tree res_type = TREE_TYPE (arg0);
+      tree res_type = TREE_TYPE (gimple_get_lhs (stmt));
       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
       if (!opt
 	  || (TREE_CODE (opt) != CONSTRUCTOR && TREE_CODE (opt) != VECTOR_CST))
  
Richard Biener July 18, 2022, 6:27 a.m. UTC | #6
On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Richard Biener <richard.guenther@gmail.com> writes:
> > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > >>
> > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> > >> >
> > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > >> > <gcc-patches@gcc.gnu.org> wrote:
> > >> > >
> > >> > > Hi Richard,
> > >> > > For the following test:
> > >> > >
> > >> > > svint32_t f2(int a, int b, int c, int d)
> > >> > > {
> > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > >> > > }
> > >> > >
> > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > >> > > foo.c: In function ‘f2’:
> > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > >> > >       |           ^~
> > >> > > svint32_t
> > >> > > __Int32x4_t
> > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > >> > > during GIMPLE pass: forwprop
> > >> > > dump file: foo.c.109t.forwprop2
> > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > >> > > 0xe9371f execute_function_todo
> > >> > >         ../../gcc/gcc/passes.cc:2091
> > >> > > 0xe93ccb execute_todo
> > >> > >         ../../gcc/gcc/passes.cc:2145
> > >> > >
> > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > >> > >   int32x4_t v;
> > >> > >   __Int32x4_t _1;
> > >> > >   svint32_t _9;
> > >> > >   vector(4) int _11;
> > >> > >
> > >> > >   <bb 2> :
> > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > >> > >   v_12 = _1;
> > >> > >   _11 = v_12;
> > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > >> > >   return _9;
> > >> > >
> > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > >> > > view_convert_expr,
> > >> > > and the end result becomes:
> > >> > >   svint32_t _7;
> > >> > >   __Int32x4_t _8;
> > >> > >
> > >> > > ;;   basic block 2, loop depth 0
> > >> > > ;;    pred:       ENTRY
> > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > >> > >   return _7;
> > >> > > ;;    succ:       EXIT
> > >> > >
> > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > >> > > has incompatible types (svint32_t, int32x4_t).
> > >> > >
> > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > >> >
> > >> > It for sure papers over the issue.  I think the error happens earlier,
> > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > >> > which is the type of the LHS.  But then you probably run into the
> > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > >> > selecting the "low" part of the VLA vector.
> > >> Hi Richard,
> > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > >> represent dup operation
> > >> from fixed width to VLA vector. I am not sure how folding it to
> > >> BIT_FIELD_REF will work.
> > >> Could you please elaborate ?
> > >>
> > >> Also, the issue doesn't seem restricted to this case.
> > >> The following test case also ICE's during forwprop:
> > >> svint32_t foo()
> > >> {
> > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > >>   return v2;
> > >> }
> > >>
> > >> foo2.c: In function ‘foo’:
> > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > >>     9 | }
> > >>       | ^
> > >> svint32_t
> > >> int32x4_t
> > >> v2_4 = { 1, 2, 3, 4 };
> > >>
> > >> because simplify_permutation folds
> > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > >> into:
> > >> vector_cst {1, 2, 3, 4}
> > >>
> > >> and it complains during verify_gimple_assign_single because we don't
> > >> support assignment of vector_cst to VLA vector.
> > >>
> > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > >> supports lhs and rhs
> > >> to have vector types with differing lengths, and simplifying it to
> > >> other tree codes, like above,
> > >> will result in type errors ?
> > >
> > > That might be the case - Richard should know.
> >
> > I don't see anything particularly special about VEC_PERM_EXPR here,
> > or about the VLA vs. VLS thing.  We would have the same issue trying
> > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > tree codes whose input types are/can be different from their output
> > types.
> >
> > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > of type T needs to be replaced by something of type T.  Whether T has a
> > constant size or a variable size doesn't matter.
> >
> > > If so your type check
> > > is still too late, you should instead recognize that we are permuting
> > > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > > paths - that probably means just allowing the code == VEC_PERM_EXPR
> > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?
> >
> > Yeah.  If we're talking about the match.pd code, I think only:
> >
> >   (if (sel.series_p (0, 1, 0, 1))
> >    { op0; }
> >    (if (sel.series_p (0, 1, nelts, 1))
> >     { op1; }
> >
> > need a type compatibility check.  For fold_vec_perm I think
> > we should just rearrange:
> >
> >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> >   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
> >       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
> >     return NULL_TREE;
> >
> > so that the assert comes after the early-out.
> >
> > It would be good at some point to relax fold_vec_perm to cases where the
> > outputs are a different length from the inputs, since the all-constant
> > VEC_PERM_EXPR above could be folded to a VECTOR_CST.
> Hi,
> For the above case, I think the issue is that simplify_permutation
> uses TREE_TYPE (arg0) for res_type,
> while it should now use type for lhs.
>
>       /* Shuffle of a constructor.  */
>       bool ret = false;
>       tree res_type = TREE_TYPE (arg0);
>       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
>
> Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)),
> resolves the ICE, and emits all constant VEC_PERM_EXPR:
>
>   v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
>   return v2_4;
>
> Does the patch look OK to commit after bootstrap+test ?

Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt).

> I will try to address the folding for above VEC_PERM_EXPR in follow-up patch.
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard
  
Prathamesh Kulkarni July 20, 2022, 3:35 p.m. UTC | #7
On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > Richard Biener <richard.guenther@gmail.com> writes:
> > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >>
> > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >> >
> > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > >> > <gcc-patches@gcc.gnu.org> wrote:
> > > >> > >
> > > >> > > Hi Richard,
> > > >> > > For the following test:
> > > >> > >
> > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > >> > > {
> > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > >> > > }
> > > >> > >
> > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > >> > > foo.c: In function ‘f2’:
> > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > > >> > >       |           ^~
> > > >> > > svint32_t
> > > >> > > __Int32x4_t
> > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > >> > > during GIMPLE pass: forwprop
> > > >> > > dump file: foo.c.109t.forwprop2
> > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > >> > > 0xe9371f execute_function_todo
> > > >> > >         ../../gcc/gcc/passes.cc:2091
> > > >> > > 0xe93ccb execute_todo
> > > >> > >         ../../gcc/gcc/passes.cc:2145
> > > >> > >
> > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > > >> > >   int32x4_t v;
> > > >> > >   __Int32x4_t _1;
> > > >> > >   svint32_t _9;
> > > >> > >   vector(4) int _11;
> > > >> > >
> > > >> > >   <bb 2> :
> > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > >> > >   v_12 = _1;
> > > >> > >   _11 = v_12;
> > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > >> > >   return _9;
> > > >> > >
> > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > >> > > view_convert_expr,
> > > >> > > and the end result becomes:
> > > >> > >   svint32_t _7;
> > > >> > >   __Int32x4_t _8;
> > > >> > >
> > > >> > > ;;   basic block 2, loop depth 0
> > > >> > > ;;    pred:       ENTRY
> > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > >> > >   return _7;
> > > >> > > ;;    succ:       EXIT
> > > >> > >
> > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > >> > >
> > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > >> >
> > > >> > It for sure papers over the issue.  I think the error happens earlier,
> > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > >> > which is the type of the LHS.  But then you probably run into the
> > > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > >> > selecting the "low" part of the VLA vector.
> > > >> Hi Richard,
> > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > >> represent dup operation
> > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > >> BIT_FIELD_REF will work.
> > > >> Could you please elaborate ?
> > > >>
> > > >> Also, the issue doesn't seem restricted to this case.
> > > >> The following test case also ICE's during forwprop:
> > > >> svint32_t foo()
> > > >> {
> > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > >>   return v2;
> > > >> }
> > > >>
> > > >> foo2.c: In function ‘foo’:
> > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > >>     9 | }
> > > >>       | ^
> > > >> svint32_t
> > > >> int32x4_t
> > > >> v2_4 = { 1, 2, 3, 4 };
> > > >>
> > > >> because simplify_permutation folds
> > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > >> into:
> > > >> vector_cst {1, 2, 3, 4}
> > > >>
> > > >> and it complains during verify_gimple_assign_single because we don't
> > > >> support assignment of vector_cst to VLA vector.
> > > >>
> > > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > > >> supports lhs and rhs
> > > >> to have vector types with differing lengths, and simplifying it to
> > > >> other tree codes, like above,
> > > >> will result in type errors ?
> > > >
> > > > That might be the case - Richard should know.
> > >
> > > I don't see anything particularly special about VEC_PERM_EXPR here,
> > > or about the VLA vs. VLS thing.  We would have the same issue trying
> > > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > > tree codes whose input types are/can be different from their output
> > > types.
> > >
> > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > > of type T needs to be replaced by something of type T.  Whether T has a
> > > constant size or a variable size doesn't matter.
> > >
> > > > If so your type check
> > > > is still too late, you should instead recognize that we are permuting
> > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > > > paths - that probably means just allowing the code == VEC_PERM_EXPR
> > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?
> > >
> > > Yeah.  If we're talking about the match.pd code, I think only:
> > >
> > >   (if (sel.series_p (0, 1, 0, 1))
> > >    { op0; }
> > >    (if (sel.series_p (0, 1, nelts, 1))
> > >     { op1; }
> > >
> > > need a type compatibility check.  For fold_vec_perm I think
> > > we should just rearrange:
> > >
> > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > >   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
> > >       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
> > >     return NULL_TREE;
> > >
> > > so that the assert comes after the early-out.
> > >
> > > It would be good at some point to relax fold_vec_perm to cases where the
> > > outputs are a different length from the inputs, since the all-constant
> > > VEC_PERM_EXPR above could be folded to a VECTOR_CST.
> > Hi,
> > For the above case, I think the issue is that simplify_permutation
> > uses TREE_TYPE (arg0) for res_type,
> > while it should now use type for lhs.
> >
> >       /* Shuffle of a constructor.  */
> >       bool ret = false;
> >       tree res_type = TREE_TYPE (arg0);
> >       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> >
> > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)),
> > resolves the ICE, and emits all constant VEC_PERM_EXPR:
> >
> >   v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
> >   return v2_4;
> >
> > Does the patch look OK to commit after bootstrap+test ?
>
> Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt).
Hi,
I committed the patch but unfortunately it caused PR106360.
The issue is that for slp-reduc-sad-2.c on ppc64le,
simplify_permutation sees the following during forwprop4:

  _78 = (void *) ivtmp.21_73;
  _92 = MEM <unsigned long> [(uint8_t *)_78];
  _91 = {_92, 0};
  vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
  _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
  _87 = {_88, 0};
  vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
  vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;

So for,
tree res_type = TREE_TYPE (gimple_assign_lhs (stmt));
tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);

we have:
res_type = V16QI
arg0 = {_92, 0}
arg1 = {_88, 0}
op2 = {0, 2}

and thus we hit the following assert in fold_vec_perm:

gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
              && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
              && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));

since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16.

If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead,
it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR:
  _92 = MEM <unsigned long> [(uint8_t *)_78];
  _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
  _5 = {_92, _88};
  vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5);

I suppose it's legal to cast vector of one type to another as long as
sizes match ?

IIUC, the above VIEW_CONVERT_EXPR will result in:
vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0,
0, 0, 0, 0, 0, 0 } ?

In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks
if lhs_type and res_type differ but have same size, and in that case emit:
lhs = VIEW_CONVERT_EXPR<lhs_type> (opt),
instead of:
lhs = VIEW_CONVERT_EXPR<op1 type> (opt)
where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2>

Does it look OK ?

Thanks,
Prathamesh

>
> > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch.
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Richard
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index d04cf4bccf8..6019d9b03ff 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2661,18 +2661,21 @@ simplify_permutation (gimple_stmt_iterator *gsi)
 
       /* Shuffle of a constructor.  */
       bool ret = false;
-      tree res_type = TREE_TYPE (gimple_assign_lhs (stmt));
+      tree res_type = TREE_TYPE (arg0);
       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
       if (!opt
 	  || (TREE_CODE (opt) != CONSTRUCTOR && TREE_CODE (opt) != VECTOR_CST))
 	return 0;
       /* Found VIEW_CONVERT_EXPR before, need one explicit conversion.  */
-      if (res_type != TREE_TYPE (op0))
+      tree lhs_type = TREE_TYPE (gimple_assign_lhs (stmt));
+      if (res_type != lhs_type)
 	{
+	  if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type)))
+	    return 0;
 	  tree name = make_ssa_name (TREE_TYPE (opt));
 	  gimple *ass_stmt = gimple_build_assign (name, opt);
 	  gsi_insert_before (gsi, ass_stmt, GSI_SAME_STMT);
-	  opt = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (op0), name);
+	  opt = build1 (VIEW_CONVERT_EXPR, lhs_type, name);
 	}
       gimple_assign_set_rhs_from_tree (gsi, opt);
       update_stmt (gsi_stmt (*gsi));
  
Richard Biener July 21, 2022, 6:51 a.m. UTC | #8
On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > > >
> > > > Richard Biener <richard.guenther@gmail.com> writes:
> > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > >>
> > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > >> >
> > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > > >> > <gcc-patches@gcc.gnu.org> wrote:
> > > > >> > >
> > > > >> > > Hi Richard,
> > > > >> > > For the following test:
> > > > >> > >
> > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > >> > > {
> > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > >> > > }
> > > > >> > >
> > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > >> > > foo.c: In function ‘f2’:
> > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > > > >> > >       |           ^~
> > > > >> > > svint32_t
> > > > >> > > __Int32x4_t
> > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > >> > > during GIMPLE pass: forwprop
> > > > >> > > dump file: foo.c.109t.forwprop2
> > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > > >> > > 0xe9371f execute_function_todo
> > > > >> > >         ../../gcc/gcc/passes.cc:2091
> > > > >> > > 0xe93ccb execute_todo
> > > > >> > >         ../../gcc/gcc/passes.cc:2145
> > > > >> > >
> > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > > > >> > >   int32x4_t v;
> > > > >> > >   __Int32x4_t _1;
> > > > >> > >   svint32_t _9;
> > > > >> > >   vector(4) int _11;
> > > > >> > >
> > > > >> > >   <bb 2> :
> > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > >> > >   v_12 = _1;
> > > > >> > >   _11 = v_12;
> > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > >> > >   return _9;
> > > > >> > >
> > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > > >> > > view_convert_expr,
> > > > >> > > and the end result becomes:
> > > > >> > >   svint32_t _7;
> > > > >> > >   __Int32x4_t _8;
> > > > >> > >
> > > > >> > > ;;   basic block 2, loop depth 0
> > > > >> > > ;;    pred:       ENTRY
> > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > >> > >   return _7;
> > > > >> > > ;;    succ:       EXIT
> > > > >> > >
> > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > >> > >
> > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > > >> >
> > > > >> > It for sure papers over the issue.  I think the error happens earlier,
> > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > >> > which is the type of the LHS.  But then you probably run into the
> > > > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > >> > selecting the "low" part of the VLA vector.
> > > > >> Hi Richard,
> > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > >> represent dup operation
> > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > >> BIT_FIELD_REF will work.
> > > > >> Could you please elaborate ?
> > > > >>
> > > > >> Also, the issue doesn't seem restricted to this case.
> > > > >> The following test case also ICE's during forwprop:
> > > > >> svint32_t foo()
> > > > >> {
> > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > >>   return v2;
> > > > >> }
> > > > >>
> > > > >> foo2.c: In function ‘foo’:
> > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > >>     9 | }
> > > > >>       | ^
> > > > >> svint32_t
> > > > >> int32x4_t
> > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > >>
> > > > >> because simplify_permutation folds
> > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > > >> into:
> > > > >> vector_cst {1, 2, 3, 4}
> > > > >>
> > > > >> and it complains during verify_gimple_assign_single because we don't
> > > > >> support assignment of vector_cst to VLA vector.
> > > > >>
> > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > > > >> supports lhs and rhs
> > > > >> to have vector types with differing lengths, and simplifying it to
> > > > >> other tree codes, like above,
> > > > >> will result in type errors ?
> > > > >
> > > > > That might be the case - Richard should know.
> > > >
> > > > I don't see anything particularly special about VEC_PERM_EXPR here,
> > > > or about the VLA vs. VLS thing.  We would have the same issue trying
> > > > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > > > tree codes whose input types are/can be different from their output
> > > > types.
> > > >
> > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > > > of type T needs to be replaced by something of type T.  Whether T has a
> > > > constant size or a variable size doesn't matter.
> > > >
> > > > > If so your type check
> > > > > is still too late, you should instead recognize that we are permuting
> > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR
> > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?
> > > >
> > > > Yeah.  If we're talking about the match.pd code, I think only:
> > > >
> > > >   (if (sel.series_p (0, 1, 0, 1))
> > > >    { op0; }
> > > >    (if (sel.series_p (0, 1, nelts, 1))
> > > >     { op1; }
> > > >
> > > > need a type compatibility check.  For fold_vec_perm I think
> > > > we should just rearrange:
> > > >
> > > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > >   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
> > > >       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
> > > >     return NULL_TREE;
> > > >
> > > > so that the assert comes after the early-out.
> > > >
> > > > It would be good at some point to relax fold_vec_perm to cases where the
> > > > outputs are a different length from the inputs, since the all-constant
> > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST.
> > > Hi,
> > > For the above case, I think the issue is that simplify_permutation
> > > uses TREE_TYPE (arg0) for res_type,
> > > while it should now use type for lhs.
> > >
> > >       /* Shuffle of a constructor.  */
> > >       bool ret = false;
> > >       tree res_type = TREE_TYPE (arg0);
> > >       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > >
> > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)),
> > > resolves the ICE, and emits all constant VEC_PERM_EXPR:
> > >
> > >   v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
> > >   return v2_4;
> > >
> > > Does the patch look OK to commit after bootstrap+test ?
> >
> > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt).
> Hi,
> I committed the patch but unfortunately it caused PR106360.
> The issue is that for slp-reduc-sad-2.c on ppc64le,
> simplify_permutation sees the following during forwprop4:
>
>   _78 = (void *) ivtmp.21_73;
>   _92 = MEM <unsigned long> [(uint8_t *)_78];
>   _91 = {_92, 0};
>   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
>   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
>   _87 = {_88, 0};
>   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
>   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
>
> So for,
> tree res_type = TREE_TYPE (gimple_assign_lhs (stmt));
> tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
>
> we have:
> res_type = V16QI
> arg0 = {_92, 0}
> arg1 = {_88, 0}
> op2 = {0, 2}
>
> and thus we hit the following assert in fold_vec_perm:
>
> gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
>               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
>               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
>
> since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16.
>
> If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead,
> it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR:
>   _92 = MEM <unsigned long> [(uint8_t *)_78];
>   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
>   _5 = {_92, _88};
>   vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5);
>
> I suppose it's legal to cast vector of one type to another as long as
> sizes match ?
>
> IIUC, the above VIEW_CONVERT_EXPR will result in:
> vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0,
> 0, 0, 0, 0, 0, 0 } ?
>
> In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks
> if lhs_type and res_type differ but have same size, and in that case emit:
> lhs = VIEW_CONVERT_EXPR<lhs_type> (opt),
> instead of:
> lhs = VIEW_CONVERT_EXPR<op1 type> (opt)
> where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2>
>
> Does it look OK ?

Definitely the original change was bogus.

+         if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type)))
+           return 0;

just repeats your very original change though ... I'll note that
fold_ternary will
ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by
returning NULL_TREE on cases it does not handle.

I think what should be done is, in the

      /* If there are any VIEW_CONVERT_EXPRs found when finding permutation
         operands source, check whether it's valid to transform and prepare
         the required new operands.  */
      if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
        {
...

path also transform the expected result type.  It should remain V_C_E compatible
to TREE_TYPE (lhs) but get a new element type.

But as said,

tree
fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
{
  unsigned int i;
  unsigned HOST_WIDE_INT nelts;
  bool need_ctor = false;

  if (!sel.length ().is_constant (&nelts))
    return NULL_TREE;
  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
              && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
              && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));

^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary
doesn't guard according to those asserts (I think we should extend fold_vec_perm
to support the new constraints).

Richard.

> Thanks,
> Prathamesh
>
> >
> > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch.
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Richard
  
Prathamesh Kulkarni Aug. 1, 2022, 3:16 a.m. UTC | #9
On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > <richard.sandiford@arm.com> wrote:
> > > > >
> > > > > Richard Biener <richard.guenther@gmail.com> writes:
> > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > >>
> > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > >> >
> > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > > > >> > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >> > >
> > > > > >> > > Hi Richard,
> > > > > >> > > For the following test:
> > > > > >> > >
> > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > >> > > {
> > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > >> > > }
> > > > > >> > >
> > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > > >> > > foo.c: In function ‘f2’:
> > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > > > > >> > >       |           ^~
> > > > > >> > > svint32_t
> > > > > >> > > __Int32x4_t
> > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > >> > > during GIMPLE pass: forwprop
> > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > > > >> > > 0xe9371f execute_function_todo
> > > > > >> > >         ../../gcc/gcc/passes.cc:2091
> > > > > >> > > 0xe93ccb execute_todo
> > > > > >> > >         ../../gcc/gcc/passes.cc:2145
> > > > > >> > >
> > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > > > > >> > >   int32x4_t v;
> > > > > >> > >   __Int32x4_t _1;
> > > > > >> > >   svint32_t _9;
> > > > > >> > >   vector(4) int _11;
> > > > > >> > >
> > > > > >> > >   <bb 2> :
> > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > >> > >   v_12 = _1;
> > > > > >> > >   _11 = v_12;
> > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > >> > >   return _9;
> > > > > >> > >
> > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > > > >> > > view_convert_expr,
> > > > > >> > > and the end result becomes:
> > > > > >> > >   svint32_t _7;
> > > > > >> > >   __Int32x4_t _8;
> > > > > >> > >
> > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > >> > > ;;    pred:       ENTRY
> > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > >> > >   return _7;
> > > > > >> > > ;;    succ:       EXIT
> > > > > >> > >
> > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > >> > >
> > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > > > >> >
> > > > > >> > It for sure papers over the issue.  I think the error happens earlier,
> > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > > >> > which is the type of the LHS.  But then you probably run into the
> > > > > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > >> > selecting the "low" part of the VLA vector.
> > > > > >> Hi Richard,
> > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > > >> represent dup operation
> > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > >> BIT_FIELD_REF will work.
> > > > > >> Could you please elaborate ?
> > > > > >>
> > > > > >> Also, the issue doesn't seem restricted to this case.
> > > > > >> The following test case also ICE's during forwprop:
> > > > > >> svint32_t foo()
> > > > > >> {
> > > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > >>   return v2;
> > > > > >> }
> > > > > >>
> > > > > >> foo2.c: In function ‘foo’:
> > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > > >>     9 | }
> > > > > >>       | ^
> > > > > >> svint32_t
> > > > > >> int32x4_t
> > > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > > >>
> > > > > >> because simplify_permutation folds
> > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > > > >> into:
> > > > > >> vector_cst {1, 2, 3, 4}
> > > > > >>
> > > > > >> and it complains during verify_gimple_assign_single because we don't
> > > > > >> support assignment of vector_cst to VLA vector.
> > > > > >>
> > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > > > > >> supports lhs and rhs
> > > > > >> to have vector types with differing lengths, and simplifying it to
> > > > > >> other tree codes, like above,
> > > > > >> will result in type errors ?
> > > > > >
> > > > > > That might be the case - Richard should know.
> > > > >
> > > > > I don't see anything particularly special about VEC_PERM_EXPR here,
> > > > > or about the VLA vs. VLS thing.  We would have the same issue trying
> > > > > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > > > > tree codes whose input types are/can be different from their output
> > > > > types.
> > > > >
> > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > > > > of type T needs to be replaced by something of type T.  Whether T has a
> > > > > constant size or a variable size doesn't matter.
> > > > >
> > > > > > If so your type check
> > > > > > is still too late, you should instead recognize that we are permuting
> > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR
> > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?
> > > > >
> > > > > Yeah.  If we're talking about the match.pd code, I think only:
> > > > >
> > > > >   (if (sel.series_p (0, 1, 0, 1))
> > > > >    { op0; }
> > > > >    (if (sel.series_p (0, 1, nelts, 1))
> > > > >     { op1; }
> > > > >
> > > > > need a type compatibility check.  For fold_vec_perm I think
> > > > > we should just rearrange:
> > > > >
> > > > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > > >   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
> > > > >       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
> > > > >     return NULL_TREE;
> > > > >
> > > > > so that the assert comes after the early-out.
> > > > >
> > > > > It would be good at some point to relax fold_vec_perm to cases where the
> > > > > outputs are a different length from the inputs, since the all-constant
> > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST.
> > > > Hi,
> > > > For the above case, I think the issue is that simplify_permutation
> > > > uses TREE_TYPE (arg0) for res_type,
> > > > while it should now use type for lhs.
> > > >
> > > >       /* Shuffle of a constructor.  */
> > > >       bool ret = false;
> > > >       tree res_type = TREE_TYPE (arg0);
> > > >       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > > >
> > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)),
> > > > resolves the ICE, and emits all constant VEC_PERM_EXPR:
> > > >
> > > >   v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
> > > >   return v2_4;
> > > >
> > > > Does the patch look OK to commit after bootstrap+test ?
> > >
> > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt).
> > Hi,
> > I committed the patch but unfortunately it caused PR106360.
> > The issue is that for slp-reduc-sad-2.c on ppc64le,
> > simplify_permutation sees the following during forwprop4:
> >
> >   _78 = (void *) ivtmp.21_73;
> >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> >   _91 = {_92, 0};
> >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> >   _87 = {_88, 0};
> >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> >
> > So for,
> > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt));
> > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> >
> > we have:
> > res_type = V16QI
> > arg0 = {_92, 0}
> > arg1 = {_88, 0}
> > op2 = {0, 2}
> >
> > and thus we hit the following assert in fold_vec_perm:
> >
> > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> >
> > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16.
> >
> > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead,
> > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR:
> >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> >   _5 = {_92, _88};
> >   vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5);
> >
> > I suppose it's legal to cast vector of one type to another as long as
> > sizes match ?
> >
> > IIUC, the above VIEW_CONVERT_EXPR will result in:
> > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0,
> > 0, 0, 0, 0, 0, 0 } ?
> >
> > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks
> > if lhs_type and res_type differ but have same size, and in that case emit:
> > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt),
> > instead of:
> > lhs = VIEW_CONVERT_EXPR<op1 type> (opt)
> > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2>
> >
> > Does it look OK ?
>
> Definitely the original change was bogus.
>
> +         if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type)))
> +           return 0;
>
> just repeats your very original change though ... I'll note that
> fold_ternary will
> ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by
> returning NULL_TREE on cases it does not handle.
>
> I think what should be done is, in the
>
>       /* If there are any VIEW_CONVERT_EXPRs found when finding permutation
>          operands source, check whether it's valid to transform and prepare
>          the required new operands.  */
>       if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
>         {
> ...
>
> path also transform the expected result type.  It should remain V_C_E compatible
> to TREE_TYPE (lhs) but get a new element type.
>
> But as said,
>
> tree
> fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
> {
>   unsigned int i;
>   unsigned HOST_WIDE_INT nelts;
>   bool need_ctor = false;
>
>   if (!sel.length ().is_constant (&nelts))
>     return NULL_TREE;
>   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
>               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
>               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
>
> ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary
> doesn't guard according to those asserts (I think we should extend fold_vec_perm
> to support the new constraints).
Hi Richard,
Thanks for the suggestions and sorry for late reply. I reverted the change to
simplify_permutaton which resolved the ppc64le case ICE.

The attached patch extends fold_vec_perm to handle vectors with
differing lengths.
For,
lhs = vec_perm_expr<arg0, arg1, mask>,
the patch:
(a) asserts lhs and mask have same vector length.
(b) asserts arg0 and arg1 have same vector length.
(c) returns NULL_TREE if element type differs for lhs, arg0 and arg1.
(d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1
if the mask has npatterns == len(arg0) and nelts_per_pattern == 1.
The intent is to permute arg0 and arg1, and then to dup elements in result
to target vector length.
So for eg:
vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}>
will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6}
replicated thru-out.
Does it look OK ?

With the patch, we don't ICE for either of the aarch64 tests above.
For,
svint32_t f1()
{
  int32x4_t v = {1, 2, 3, 4};
  return svld1rq_s32 (svptrue_b8 (), &v[0]);
}

optimized dump shows:
svint32_t f1 ()
{
  int32x4_t v;

  <bb 2> :
  return { 1, 2, 3, 4, ... };

}

code-gen:
f1:
.LFB3900:
        .cfi_startproc
        ptrue   p0.b, all
        adrp    x0, .LC0
        add     x0, x0, :lo12:.LC0
        ld1rqw  z0.s, p0/z, [x0]
        ret
.LC0:
        .word   1
        .word   2
        .word   3
        .word   4

I guess for this particular case, we could use index instead.

For,
svint32_t f2(int a, int b, int c, int d)
{
  int32x4_t v = {a, b, c, d};
  return svld1rq_s32 (svptrue_b8 (), &v[0]);
}

optimized dump shows:
svint32_t f2 (int a, int b, int c, int d)
{
  svint32_t _6;

  <bb 2> [local count: 1073741824]:
  _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... };
  return _6;

The code-gen seems pretty bad however:
f2:
.LFB3901:
        .cfi_startproc
        addvl   sp, sp, #-4
        .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22
        ptrue   p0.b, all
        addvl   x4, sp, #3
        mov     z0.b, #0
        st1w    z0.s, p0, [sp, #3, mul vl]
        str     w0, [x4]
        addvl   x0, sp, #2
        ld1w    z0.s, p0/z, [sp, #3, mul vl]
        st1w    z0.s, p0, [sp, #2, mul vl]
        str     w1, [x0, 4]
        addvl   x0, sp, #1
        ld1w    z0.s, p0/z, [sp, #2, mul vl]
        st1w    z0.s, p0, [sp, #1, mul vl]
        str     w2, [x0, 8]
        ld1w    z0.s, p0/z, [sp, #1, mul vl]
        st1w    z0.s, p0, [sp]
        str     w3, [sp, 12]
        ld1w    z0.s, p0/z, [sp]
        addvl   sp, sp, #4
        .cfi_def_cfa_offset 0
        ret

I will try to address code-gen issues in follow up patches.
Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu.

Thanks,
Prathamesh

>
> Richard.
>
> > Thanks,
> > Prathamesh
> >
> > >
> > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch.
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Richard
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 99021a82df4..6912de9b43c 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10541,15 +10541,33 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
   unsigned HOST_WIDE_INT nelts;
   bool need_ctor = false;
 
-  if (!sel.length ().is_constant (&nelts))
-    return NULL_TREE;
-  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
-	      && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
-	      && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
+  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
+			sel.length ()));
+  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)),
+			TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))));
+
   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
     return NULL_TREE;
 
+  /* If result vector has greater length than input vector,
+     then allow permuting two vectors as long as:
+     a) sel.nelts_per_pattern == 1
+     b) sel.npatterns == len of input vector.
+     The intent is to permute input vectors, and
+     dup the elements in resulting vector to target vector length.  */
+
+  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
+		TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
+    {
+      nelts = sel.encoding ().npatterns ();
+      if (sel.encoding ().nelts_per_pattern () != 1
+	  || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
+	return NULL_TREE;
+    }
+  else if (!sel.length ().is_constant (&nelts))
+    return NULL_TREE;
+
   tree *in_elts = XALLOCAVEC (tree, nelts * 2);
   if (!vec_cst_ctor_to_array (arg0, nelts, in_elts)
       || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts))
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 47371d8bcbe..7e706857f43 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 		pp_space (pp);
 	      }
 	  }
+	if (VECTOR_TYPE_P (TREE_TYPE (node))
+	    && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
+	  pp_string (pp, ", ... ");
 	pp_right_brace (pp);
       }
       break;
  
Richard Biener Aug. 8, 2022, 8:56 a.m. UTC | #10
On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > >
> > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > > <richard.sandiford@arm.com> wrote:
> > > > > >
> > > > > > Richard Biener <richard.guenther@gmail.com> writes:
> > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > >>
> > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > >> >
> > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > > > > >> > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > >> > >
> > > > > > >> > > Hi Richard,
> > > > > > >> > > For the following test:
> > > > > > >> > >
> > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > >> > > {
> > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > >> > > }
> > > > > > >> > >
> > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > > > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > >> > >       |           ^~
> > > > > > >> > > svint32_t
> > > > > > >> > > __Int32x4_t
> > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > >> > >         ../../gcc/gcc/passes.cc:2091
> > > > > > >> > > 0xe93ccb execute_todo
> > > > > > >> > >         ../../gcc/gcc/passes.cc:2145
> > > > > > >> > >
> > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > > > > > >> > >   int32x4_t v;
> > > > > > >> > >   __Int32x4_t _1;
> > > > > > >> > >   svint32_t _9;
> > > > > > >> > >   vector(4) int _11;
> > > > > > >> > >
> > > > > > >> > >   <bb 2> :
> > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > >> > >   v_12 = _1;
> > > > > > >> > >   _11 = v_12;
> > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > >> > >   return _9;
> > > > > > >> > >
> > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > > > > >> > > view_convert_expr,
> > > > > > >> > > and the end result becomes:
> > > > > > >> > >   svint32_t _7;
> > > > > > >> > >   __Int32x4_t _8;
> > > > > > >> > >
> > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > >> > > ;;    pred:       ENTRY
> > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > >> > >   return _7;
> > > > > > >> > > ;;    succ:       EXIT
> > > > > > >> > >
> > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > >> > >
> > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > > > > >> >
> > > > > > >> > It for sure papers over the issue.  I think the error happens earlier,
> > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > > > >> > which is the type of the LHS.  But then you probably run into the
> > > > > > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > >> Hi Richard,
> > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > > > >> represent dup operation
> > > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > > >> BIT_FIELD_REF will work.
> > > > > > >> Could you please elaborate ?
> > > > > > >>
> > > > > > >> Also, the issue doesn't seem restricted to this case.
> > > > > > >> The following test case also ICE's during forwprop:
> > > > > > >> svint32_t foo()
> > > > > > >> {
> > > > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > >>   return v2;
> > > > > > >> }
> > > > > > >>
> > > > > > >> foo2.c: In function ‘foo’:
> > > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > > > >>     9 | }
> > > > > > >>       | ^
> > > > > > >> svint32_t
> > > > > > >> int32x4_t
> > > > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > > > >>
> > > > > > >> because simplify_permutation folds
> > > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > > > > >> into:
> > > > > > >> vector_cst {1, 2, 3, 4}
> > > > > > >>
> > > > > > >> and it complains during verify_gimple_assign_single because we don't
> > > > > > >> support assignment of vector_cst to VLA vector.
> > > > > > >>
> > > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > > > > > >> supports lhs and rhs
> > > > > > >> to have vector types with differing lengths, and simplifying it to
> > > > > > >> other tree codes, like above,
> > > > > > >> will result in type errors ?
> > > > > > >
> > > > > > > That might be the case - Richard should know.
> > > > > >
> > > > > > I don't see anything particularly special about VEC_PERM_EXPR here,
> > > > > > or about the VLA vs. VLS thing.  We would have the same issue trying
> > > > > > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > > > > > tree codes whose input types are/can be different from their output
> > > > > > types.
> > > > > >
> > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > > > > > of type T needs to be replaced by something of type T.  Whether T has a
> > > > > > constant size or a variable size doesn't matter.
> > > > > >
> > > > > > > If so your type check
> > > > > > > is still too late, you should instead recognize that we are permuting
> > > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR
> > > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?
> > > > > >
> > > > > > Yeah.  If we're talking about the match.pd code, I think only:
> > > > > >
> > > > > >   (if (sel.series_p (0, 1, 0, 1))
> > > > > >    { op0; }
> > > > > >    (if (sel.series_p (0, 1, nelts, 1))
> > > > > >     { op1; }
> > > > > >
> > > > > > need a type compatibility check.  For fold_vec_perm I think
> > > > > > we should just rearrange:
> > > > > >
> > > > > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > > > >   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
> > > > > >       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
> > > > > >     return NULL_TREE;
> > > > > >
> > > > > > so that the assert comes after the early-out.
> > > > > >
> > > > > > It would be good at some point to relax fold_vec_perm to cases where the
> > > > > > outputs are a different length from the inputs, since the all-constant
> > > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST.
> > > > > Hi,
> > > > > For the above case, I think the issue is that simplify_permutation
> > > > > uses TREE_TYPE (arg0) for res_type,
> > > > > while it should now use type for lhs.
> > > > >
> > > > >       /* Shuffle of a constructor.  */
> > > > >       bool ret = false;
> > > > >       tree res_type = TREE_TYPE (arg0);
> > > > >       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > > > >
> > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)),
> > > > > resolves the ICE, and emits all constant VEC_PERM_EXPR:
> > > > >
> > > > >   v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
> > > > >   return v2_4;
> > > > >
> > > > > Does the patch look OK to commit after bootstrap+test ?
> > > >
> > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt).
> > > Hi,
> > > I committed the patch but unfortunately it caused PR106360.
> > > The issue is that for slp-reduc-sad-2.c on ppc64le,
> > > simplify_permutation sees the following during forwprop4:
> > >
> > >   _78 = (void *) ivtmp.21_73;
> > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > >   _91 = {_92, 0};
> > >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > >   _87 = {_88, 0};
> > >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> > >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> > >
> > > So for,
> > > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt));
> > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > >
> > > we have:
> > > res_type = V16QI
> > > arg0 = {_92, 0}
> > > arg1 = {_88, 0}
> > > op2 = {0, 2}
> > >
> > > and thus we hit the following assert in fold_vec_perm:
> > >
> > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > >
> > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16.
> > >
> > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead,
> > > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR:
> > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > >   _5 = {_92, _88};
> > >   vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5);
> > >
> > > I suppose it's legal to cast vector of one type to another as long as
> > > sizes match ?
> > >
> > > IIUC, the above VIEW_CONVERT_EXPR will result in:
> > > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0,
> > > 0, 0, 0, 0, 0, 0 } ?
> > >
> > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks
> > > if lhs_type and res_type differ but have same size, and in that case emit:
> > > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt),
> > > instead of:
> > > lhs = VIEW_CONVERT_EXPR<op1 type> (opt)
> > > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2>
> > >
> > > Does it look OK ?
> >
> > Definitely the original change was bogus.
> >
> > +         if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type)))
> > +           return 0;
> >
> > just repeats your very original change though ... I'll note that
> > fold_ternary will
> > ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by
> > returning NULL_TREE on cases it does not handle.
> >
> > I think what should be done is, in the
> >
> >       /* If there are any VIEW_CONVERT_EXPRs found when finding permutation
> >          operands source, check whether it's valid to transform and prepare
> >          the required new operands.  */
> >       if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> >         {
> > ...
> >
> > path also transform the expected result type.  It should remain V_C_E compatible
> > to TREE_TYPE (lhs) but get a new element type.
> >
> > But as said,
> >
> > tree
> > fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
> > {
> >   unsigned int i;
> >   unsigned HOST_WIDE_INT nelts;
> >   bool need_ctor = false;
> >
> >   if (!sel.length ().is_constant (&nelts))
> >     return NULL_TREE;
> >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> >
> > ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary
> > doesn't guard according to those asserts (I think we should extend fold_vec_perm
> > to support the new constraints).
> Hi Richard,
> Thanks for the suggestions and sorry for late reply. I reverted the change to
> simplify_permutaton which resolved the ppc64le case ICE.
>
> The attached patch extends fold_vec_perm to handle vectors with
> differing lengths.
> For,
> lhs = vec_perm_expr<arg0, arg1, mask>,
> the patch:
> (a) asserts lhs and mask have same vector length.
> (b) asserts arg0 and arg1 have same vector length.
> (c) returns NULL_TREE if element type differs for lhs, arg0 and arg1.
> (d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1
> if the mask has npatterns == len(arg0) and nelts_per_pattern == 1.
> The intent is to permute arg0 and arg1, and then to dup elements in result
> to target vector length.
> So for eg:
> vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}>
> will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6}
> replicated thru-out.
> Does it look OK ?
>
> With the patch, we don't ICE for either of the aarch64 tests above.
> For,
> svint32_t f1()
> {
>   int32x4_t v = {1, 2, 3, 4};
>   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> }
>
> optimized dump shows:
> svint32_t f1 ()
> {
>   int32x4_t v;
>
>   <bb 2> :
>   return { 1, 2, 3, 4, ... };
>
> }
>
> code-gen:
> f1:
> .LFB3900:
>         .cfi_startproc
>         ptrue   p0.b, all
>         adrp    x0, .LC0
>         add     x0, x0, :lo12:.LC0
>         ld1rqw  z0.s, p0/z, [x0]
>         ret
> .LC0:
>         .word   1
>         .word   2
>         .word   3
>         .word   4
>
> I guess for this particular case, we could use index instead.
>
> For,
> svint32_t f2(int a, int b, int c, int d)
> {
>   int32x4_t v = {a, b, c, d};
>   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> }
>
> optimized dump shows:
> svint32_t f2 (int a, int b, int c, int d)
> {
>   svint32_t _6;
>
>   <bb 2> [local count: 1073741824]:
>   _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... };
>   return _6;
>
> The code-gen seems pretty bad however:
> f2:
> .LFB3901:
>         .cfi_startproc
>         addvl   sp, sp, #-4
>         .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22
>         ptrue   p0.b, all
>         addvl   x4, sp, #3
>         mov     z0.b, #0
>         st1w    z0.s, p0, [sp, #3, mul vl]
>         str     w0, [x4]
>         addvl   x0, sp, #2
>         ld1w    z0.s, p0/z, [sp, #3, mul vl]
>         st1w    z0.s, p0, [sp, #2, mul vl]
>         str     w1, [x0, 4]
>         addvl   x0, sp, #1
>         ld1w    z0.s, p0/z, [sp, #2, mul vl]
>         st1w    z0.s, p0, [sp, #1, mul vl]
>         str     w2, [x0, 8]
>         ld1w    z0.s, p0/z, [sp, #1, mul vl]
>         st1w    z0.s, p0, [sp]
>         str     w3, [sp, 12]
>         ld1w    z0.s, p0/z, [sp]
>         addvl   sp, sp, #4
>         .cfi_def_cfa_offset 0
>         ret
>
> I will try to address code-gen issues in follow up patches.
> Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu.


  /* If result vector has greater length than input vector,
+     then allow permuting two vectors as long as:
+     a) sel.nelts_per_pattern == 1
+     b) sel.npatterns == len of input vector.
+     The intent is to permute input vectors, and
+     dup the elements in resulting vector to target vector length.  */
+
+  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
+               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
+    {
+      nelts = sel.encoding ().npatterns ();
+      if (sel.encoding ().nelts_per_pattern () != 1
+         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
+       return NULL_TREE;
+    }

so the only case you add is non-VLA to VLA and there
explicitely only the case of a period that's same as the
element count in the input vectors.


@@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
node, int spc, dump_flags_t flags,
                pp_space (pp);
              }
          }
+       if (VECTOR_TYPE_P (TREE_TYPE (node))
+           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
+         pp_string (pp, ", ... ");
        pp_right_brace (pp);

btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?

I had hoped that you would make

  tree *in_elts = XALLOCAVEC (tree, nelts * 2);
  if (!vec_cst_ctor_to_array (arg0, nelts, in_elts)
      || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts))
    return NULL_TREE;

VLA agnostic, thus support for example permuting
{ 0, 2, 4, 8, ... }, { 1, 3, 5, 7 ... }, { 0, 4, 1, 5 ... }
as { 0, 1, 2, 3, ... }, etc.

that should be entirely doable, to somebody familiar with VLA and the APIs
even more so.

Richard.

> Thanks,
> Prathamesh
>
> >
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > >
> > > >
> > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch.
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Thanks,
> > > > > > Richard
  
Prathamesh Kulkarni Aug. 9, 2022, 10:09 a.m. UTC | #11
On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > >
> > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > > > <richard.sandiford@arm.com> wrote:
> > > > > > >
> > > > > > > Richard Biener <richard.guenther@gmail.com> writes:
> > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > > >>
> > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > > >> >
> > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > > > > > >> > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > >> > >
> > > > > > > >> > > Hi Richard,
> > > > > > > >> > > For the following test:
> > > > > > > >> > >
> > > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > > >> > > {
> > > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > >> > > }
> > > > > > > >> > >
> > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > > > > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > > >> > >       |           ^~
> > > > > > > >> > > svint32_t
> > > > > > > >> > > __Int32x4_t
> > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > > >> > >         ../../gcc/gcc/passes.cc:2091
> > > > > > > >> > > 0xe93ccb execute_todo
> > > > > > > >> > >         ../../gcc/gcc/passes.cc:2145
> > > > > > > >> > >
> > > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > > > > > > >> > >   int32x4_t v;
> > > > > > > >> > >   __Int32x4_t _1;
> > > > > > > >> > >   svint32_t _9;
> > > > > > > >> > >   vector(4) int _11;
> > > > > > > >> > >
> > > > > > > >> > >   <bb 2> :
> > > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > > >> > >   v_12 = _1;
> > > > > > > >> > >   _11 = v_12;
> > > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > > >> > >   return _9;
> > > > > > > >> > >
> > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > > > > > >> > > view_convert_expr,
> > > > > > > >> > > and the end result becomes:
> > > > > > > >> > >   svint32_t _7;
> > > > > > > >> > >   __Int32x4_t _8;
> > > > > > > >> > >
> > > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > > >> > > ;;    pred:       ENTRY
> > > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > >> > >   return _7;
> > > > > > > >> > > ;;    succ:       EXIT
> > > > > > > >> > >
> > > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > > >> > >
> > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > > > > > >> >
> > > > > > > >> > It for sure papers over the issue.  I think the error happens earlier,
> > > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > > > > >> > which is the type of the LHS.  But then you probably run into the
> > > > > > > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > > >> Hi Richard,
> > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > > > > >> represent dup operation
> > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > > > >> BIT_FIELD_REF will work.
> > > > > > > >> Could you please elaborate ?
> > > > > > > >>
> > > > > > > >> Also, the issue doesn't seem restricted to this case.
> > > > > > > >> The following test case also ICE's during forwprop:
> > > > > > > >> svint32_t foo()
> > > > > > > >> {
> > > > > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > >>   return v2;
> > > > > > > >> }
> > > > > > > >>
> > > > > > > >> foo2.c: In function ‘foo’:
> > > > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > > > > >>     9 | }
> > > > > > > >>       | ^
> > > > > > > >> svint32_t
> > > > > > > >> int32x4_t
> > > > > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > > > > >>
> > > > > > > >> because simplify_permutation folds
> > > > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > > > > > >> into:
> > > > > > > >> vector_cst {1, 2, 3, 4}
> > > > > > > >>
> > > > > > > >> and it complains during verify_gimple_assign_single because we don't
> > > > > > > >> support assignment of vector_cst to VLA vector.
> > > > > > > >>
> > > > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > > > > > > >> supports lhs and rhs
> > > > > > > >> to have vector types with differing lengths, and simplifying it to
> > > > > > > >> other tree codes, like above,
> > > > > > > >> will result in type errors ?
> > > > > > > >
> > > > > > > > That might be the case - Richard should know.
> > > > > > >
> > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here,
> > > > > > > or about the VLA vs. VLS thing.  We would have the same issue trying
> > > > > > > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > > > > > > tree codes whose input types are/can be different from their output
> > > > > > > types.
> > > > > > >
> > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > > > > > > of type T needs to be replaced by something of type T.  Whether T has a
> > > > > > > constant size or a variable size doesn't matter.
> > > > > > >
> > > > > > > > If so your type check
> > > > > > > > is still too late, you should instead recognize that we are permuting
> > > > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > > > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR
> > > > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?
> > > > > > >
> > > > > > > Yeah.  If we're talking about the match.pd code, I think only:
> > > > > > >
> > > > > > >   (if (sel.series_p (0, 1, 0, 1))
> > > > > > >    { op0; }
> > > > > > >    (if (sel.series_p (0, 1, nelts, 1))
> > > > > > >     { op1; }
> > > > > > >
> > > > > > > need a type compatibility check.  For fold_vec_perm I think
> > > > > > > we should just rearrange:
> > > > > > >
> > > > > > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > > > > >   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
> > > > > > >       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
> > > > > > >     return NULL_TREE;
> > > > > > >
> > > > > > > so that the assert comes after the early-out.
> > > > > > >
> > > > > > > It would be good at some point to relax fold_vec_perm to cases where the
> > > > > > > outputs are a different length from the inputs, since the all-constant
> > > > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST.
> > > > > > Hi,
> > > > > > For the above case, I think the issue is that simplify_permutation
> > > > > > uses TREE_TYPE (arg0) for res_type,
> > > > > > while it should now use type for lhs.
> > > > > >
> > > > > >       /* Shuffle of a constructor.  */
> > > > > >       bool ret = false;
> > > > > >       tree res_type = TREE_TYPE (arg0);
> > > > > >       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > > > > >
> > > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)),
> > > > > > resolves the ICE, and emits all constant VEC_PERM_EXPR:
> > > > > >
> > > > > >   v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
> > > > > >   return v2_4;
> > > > > >
> > > > > > Does the patch look OK to commit after bootstrap+test ?
> > > > >
> > > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt).
> > > > Hi,
> > > > I committed the patch but unfortunately it caused PR106360.
> > > > The issue is that for slp-reduc-sad-2.c on ppc64le,
> > > > simplify_permutation sees the following during forwprop4:
> > > >
> > > >   _78 = (void *) ivtmp.21_73;
> > > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > > >   _91 = {_92, 0};
> > > >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> > > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > > >   _87 = {_88, 0};
> > > >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> > > >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> > > >
> > > > So for,
> > > > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt));
> > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > > >
> > > > we have:
> > > > res_type = V16QI
> > > > arg0 = {_92, 0}
> > > > arg1 = {_88, 0}
> > > > op2 = {0, 2}
> > > >
> > > > and thus we hit the following assert in fold_vec_perm:
> > > >
> > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > >
> > > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16.
> > > >
> > > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead,
> > > > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR:
> > > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > > >   _5 = {_92, _88};
> > > >   vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5);
> > > >
> > > > I suppose it's legal to cast vector of one type to another as long as
> > > > sizes match ?
> > > >
> > > > IIUC, the above VIEW_CONVERT_EXPR will result in:
> > > > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0,
> > > > 0, 0, 0, 0, 0, 0 } ?
> > > >
> > > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks
> > > > if lhs_type and res_type differ but have same size, and in that case emit:
> > > > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt),
> > > > instead of:
> > > > lhs = VIEW_CONVERT_EXPR<op1 type> (opt)
> > > > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2>
> > > >
> > > > Does it look OK ?
> > >
> > > Definitely the original change was bogus.
> > >
> > > +         if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type)))
> > > +           return 0;
> > >
> > > just repeats your very original change though ... I'll note that
> > > fold_ternary will
> > > ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by
> > > returning NULL_TREE on cases it does not handle.
> > >
> > > I think what should be done is, in the
> > >
> > >       /* If there are any VIEW_CONVERT_EXPRs found when finding permutation
> > >          operands source, check whether it's valid to transform and prepare
> > >          the required new operands.  */
> > >       if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > >         {
> > > ...
> > >
> > > path also transform the expected result type.  It should remain V_C_E compatible
> > > to TREE_TYPE (lhs) but get a new element type.
> > >
> > > But as said,
> > >
> > > tree
> > > fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
> > > {
> > >   unsigned int i;
> > >   unsigned HOST_WIDE_INT nelts;
> > >   bool need_ctor = false;
> > >
> > >   if (!sel.length ().is_constant (&nelts))
> > >     return NULL_TREE;
> > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > >
> > > ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary
> > > doesn't guard according to those asserts (I think we should extend fold_vec_perm
> > > to support the new constraints).
> > Hi Richard,
> > Thanks for the suggestions and sorry for late reply. I reverted the change to
> > simplify_permutaton which resolved the ppc64le case ICE.
> >
> > The attached patch extends fold_vec_perm to handle vectors with
> > differing lengths.
> > For,
> > lhs = vec_perm_expr<arg0, arg1, mask>,
> > the patch:
> > (a) asserts lhs and mask have same vector length.
> > (b) asserts arg0 and arg1 have same vector length.
> > (c) returns NULL_TREE if element type differs for lhs, arg0 and arg1.
> > (d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1
> > if the mask has npatterns == len(arg0) and nelts_per_pattern == 1.
> > The intent is to permute arg0 and arg1, and then to dup elements in result
> > to target vector length.
> > So for eg:
> > vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}>
> > will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6}
> > replicated thru-out.
> > Does it look OK ?
> >
> > With the patch, we don't ICE for either of the aarch64 tests above.
> > For,
> > svint32_t f1()
> > {
> >   int32x4_t v = {1, 2, 3, 4};
> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > }
> >
> > optimized dump shows:
> > svint32_t f1 ()
> > {
> >   int32x4_t v;
> >
> >   <bb 2> :
> >   return { 1, 2, 3, 4, ... };
> >
> > }
> >
> > code-gen:
> > f1:
> > .LFB3900:
> >         .cfi_startproc
> >         ptrue   p0.b, all
> >         adrp    x0, .LC0
> >         add     x0, x0, :lo12:.LC0
> >         ld1rqw  z0.s, p0/z, [x0]
> >         ret
> > .LC0:
> >         .word   1
> >         .word   2
> >         .word   3
> >         .word   4
> >
> > I guess for this particular case, we could use index instead.
> >
> > For,
> > svint32_t f2(int a, int b, int c, int d)
> > {
> >   int32x4_t v = {a, b, c, d};
> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > }
> >
> > optimized dump shows:
> > svint32_t f2 (int a, int b, int c, int d)
> > {
> >   svint32_t _6;
> >
> >   <bb 2> [local count: 1073741824]:
> >   _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... };
> >   return _6;
> >
> > The code-gen seems pretty bad however:
> > f2:
> > .LFB3901:
> >         .cfi_startproc
> >         addvl   sp, sp, #-4
> >         .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22
> >         ptrue   p0.b, all
> >         addvl   x4, sp, #3
> >         mov     z0.b, #0
> >         st1w    z0.s, p0, [sp, #3, mul vl]
> >         str     w0, [x4]
> >         addvl   x0, sp, #2
> >         ld1w    z0.s, p0/z, [sp, #3, mul vl]
> >         st1w    z0.s, p0, [sp, #2, mul vl]
> >         str     w1, [x0, 4]
> >         addvl   x0, sp, #1
> >         ld1w    z0.s, p0/z, [sp, #2, mul vl]
> >         st1w    z0.s, p0, [sp, #1, mul vl]
> >         str     w2, [x0, 8]
> >         ld1w    z0.s, p0/z, [sp, #1, mul vl]
> >         st1w    z0.s, p0, [sp]
> >         str     w3, [sp, 12]
> >         ld1w    z0.s, p0/z, [sp]
> >         addvl   sp, sp, #4
> >         .cfi_def_cfa_offset 0
> >         ret
> >
> > I will try to address code-gen issues in follow up patches.
> > Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu.
>
>
>   /* If result vector has greater length than input vector,
> +     then allow permuting two vectors as long as:
> +     a) sel.nelts_per_pattern == 1
> +     b) sel.npatterns == len of input vector.
> +     The intent is to permute input vectors, and
> +     dup the elements in resulting vector to target vector length.  */
> +
> +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> +    {
> +      nelts = sel.encoding ().npatterns ();
> +      if (sel.encoding ().nelts_per_pattern () != 1
> +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> +       return NULL_TREE;
> +    }
>
> so the only case you add is non-VLA to VLA and there
> explicitely only the case of a period that's same as the
> element count in the input vectors.
>
>
> @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> node, int spc, dump_flags_t flags,
>                 pp_space (pp);
>               }
>           }
> +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> +         pp_string (pp, ", ... ");
>         pp_right_brace (pp);
>
> btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
Well, it got created for the following case after folding:
svint32_t f2(int a, int b, int c, int d)
{
  int32x4_t v = {a, b, c, d};
  return svld1rq_s32 (svptrue_b8 (), &v[0]);
}

The svld1rq_s32 call gets folded to:
v = {a, b, c, d}
lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>

fold_vec_perm then folds the above VEC_PERM_EXPR to
VLA constructor, since elements in v (in_elts) are not constant, and
need_ctor is thus true:
lhs = {a, b, c, d, ...}
I added "..." to make it more explicit that it's a VLA constructor.
>
> I had hoped that you would make
>
>   tree *in_elts = XALLOCAVEC (tree, nelts * 2);
>   if (!vec_cst_ctor_to_array (arg0, nelts, in_elts)
>       || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts))
>     return NULL_TREE;
>
> VLA agnostic, thus support for example permuting
> { 0, 2, 4, 8, ... }, { 1, 3, 5, 7 ... }, { 0, 4, 1, 5 ... }
> as { 0, 1, 2, 3, ... }, etc.
>
> that should be entirely doable, to somebody familiar with VLA and the APIs
> even more so.
Thanks, I will work on adding this case.

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> >
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > > > >
> > > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch.
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard
  
Richard Biener Aug. 9, 2022, 1:12 p.m. UTC | #12
On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > >
> > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > >
> > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > > > > <richard.sandiford@arm.com> wrote:
> > > > > > > >
> > > > > > > > Richard Biener <richard.guenther@gmail.com> writes:
> > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > > > >>
> > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > > > >> >
> > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > > > > > > >> > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Hi Richard,
> > > > > > > > >> > > For the following test:
> > > > > > > > >> > >
> > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > > {
> > > > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > >> > > }
> > > > > > > > >> > >
> > > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > > > > > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > >       |           ^~
> > > > > > > > >> > > svint32_t
> > > > > > > > >> > > __Int32x4_t
> > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > > > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > > > >> > >         ../../gcc/gcc/passes.cc:2091
> > > > > > > > >> > > 0xe93ccb execute_todo
> > > > > > > > >> > >         ../../gcc/gcc/passes.cc:2145
> > > > > > > > >> > >
> > > > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > > > > > > > >> > >   int32x4_t v;
> > > > > > > > >> > >   __Int32x4_t _1;
> > > > > > > > >> > >   svint32_t _9;
> > > > > > > > >> > >   vector(4) int _11;
> > > > > > > > >> > >
> > > > > > > > >> > >   <bb 2> :
> > > > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > > > >> > >   v_12 = _1;
> > > > > > > > >> > >   _11 = v_12;
> > > > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > > > >> > >   return _9;
> > > > > > > > >> > >
> > > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > > > > > > >> > > view_convert_expr,
> > > > > > > > >> > > and the end result becomes:
> > > > > > > > >> > >   svint32_t _7;
> > > > > > > > >> > >   __Int32x4_t _8;
> > > > > > > > >> > >
> > > > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > > > >> > > ;;    pred:       ENTRY
> > > > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > >> > >   return _7;
> > > > > > > > >> > > ;;    succ:       EXIT
> > > > > > > > >> > >
> > > > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > > > >> > >
> > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > > > > > > >> >
> > > > > > > > >> > It for sure papers over the issue.  I think the error happens earlier,
> > > > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > > > > > >> > which is the type of the LHS.  But then you probably run into the
> > > > > > > > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > > > >> Hi Richard,
> > > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > > > > > >> represent dup operation
> > > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > > > > >> BIT_FIELD_REF will work.
> > > > > > > > >> Could you please elaborate ?
> > > > > > > > >>
> > > > > > > > >> Also, the issue doesn't seem restricted to this case.
> > > > > > > > >> The following test case also ICE's during forwprop:
> > > > > > > > >> svint32_t foo()
> > > > > > > > >> {
> > > > > > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > > > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > >>   return v2;
> > > > > > > > >> }
> > > > > > > > >>
> > > > > > > > >> foo2.c: In function ‘foo’:
> > > > > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > > > > > >>     9 | }
> > > > > > > > >>       | ^
> > > > > > > > >> svint32_t
> > > > > > > > >> int32x4_t
> > > > > > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > > > > > >>
> > > > > > > > >> because simplify_permutation folds
> > > > > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > > > > > > >> into:
> > > > > > > > >> vector_cst {1, 2, 3, 4}
> > > > > > > > >>
> > > > > > > > >> and it complains during verify_gimple_assign_single because we don't
> > > > > > > > >> support assignment of vector_cst to VLA vector.
> > > > > > > > >>
> > > > > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > > > > > > > >> supports lhs and rhs
> > > > > > > > >> to have vector types with differing lengths, and simplifying it to
> > > > > > > > >> other tree codes, like above,
> > > > > > > > >> will result in type errors ?
> > > > > > > > >
> > > > > > > > > That might be the case - Richard should know.
> > > > > > > >
> > > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here,
> > > > > > > > or about the VLA vs. VLS thing.  We would have the same issue trying
> > > > > > > > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > > > > > > > tree codes whose input types are/can be different from their output
> > > > > > > > types.
> > > > > > > >
> > > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > > > > > > > of type T needs to be replaced by something of type T.  Whether T has a
> > > > > > > > constant size or a variable size doesn't matter.
> > > > > > > >
> > > > > > > > > If so your type check
> > > > > > > > > is still too late, you should instead recognize that we are permuting
> > > > > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > > > > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR
> > > > > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?
> > > > > > > >
> > > > > > > > Yeah.  If we're talking about the match.pd code, I think only:
> > > > > > > >
> > > > > > > >   (if (sel.series_p (0, 1, 0, 1))
> > > > > > > >    { op0; }
> > > > > > > >    (if (sel.series_p (0, 1, nelts, 1))
> > > > > > > >     { op1; }
> > > > > > > >
> > > > > > > > need a type compatibility check.  For fold_vec_perm I think
> > > > > > > > we should just rearrange:
> > > > > > > >
> > > > > > > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > > > > > >   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
> > > > > > > >       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
> > > > > > > >     return NULL_TREE;
> > > > > > > >
> > > > > > > > so that the assert comes after the early-out.
> > > > > > > >
> > > > > > > > It would be good at some point to relax fold_vec_perm to cases where the
> > > > > > > > outputs are a different length from the inputs, since the all-constant
> > > > > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST.
> > > > > > > Hi,
> > > > > > > For the above case, I think the issue is that simplify_permutation
> > > > > > > uses TREE_TYPE (arg0) for res_type,
> > > > > > > while it should now use type for lhs.
> > > > > > >
> > > > > > >       /* Shuffle of a constructor.  */
> > > > > > >       bool ret = false;
> > > > > > >       tree res_type = TREE_TYPE (arg0);
> > > > > > >       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > > > > > >
> > > > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)),
> > > > > > > resolves the ICE, and emits all constant VEC_PERM_EXPR:
> > > > > > >
> > > > > > >   v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
> > > > > > >   return v2_4;
> > > > > > >
> > > > > > > Does the patch look OK to commit after bootstrap+test ?
> > > > > >
> > > > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt).
> > > > > Hi,
> > > > > I committed the patch but unfortunately it caused PR106360.
> > > > > The issue is that for slp-reduc-sad-2.c on ppc64le,
> > > > > simplify_permutation sees the following during forwprop4:
> > > > >
> > > > >   _78 = (void *) ivtmp.21_73;
> > > > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > > > >   _91 = {_92, 0};
> > > > >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> > > > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > > > >   _87 = {_88, 0};
> > > > >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> > > > >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> > > > >
> > > > > So for,
> > > > > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt));
> > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > > > >
> > > > > we have:
> > > > > res_type = V16QI
> > > > > arg0 = {_92, 0}
> > > > > arg1 = {_88, 0}
> > > > > op2 = {0, 2}
> > > > >
> > > > > and thus we hit the following assert in fold_vec_perm:
> > > > >
> > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > > >
> > > > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16.
> > > > >
> > > > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead,
> > > > > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR:
> > > > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > > > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > > > >   _5 = {_92, _88};
> > > > >   vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5);
> > > > >
> > > > > I suppose it's legal to cast vector of one type to another as long as
> > > > > sizes match ?
> > > > >
> > > > > IIUC, the above VIEW_CONVERT_EXPR will result in:
> > > > > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0,
> > > > > 0, 0, 0, 0, 0, 0 } ?
> > > > >
> > > > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks
> > > > > if lhs_type and res_type differ but have same size, and in that case emit:
> > > > > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt),
> > > > > instead of:
> > > > > lhs = VIEW_CONVERT_EXPR<op1 type> (opt)
> > > > > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2>
> > > > >
> > > > > Does it look OK ?
> > > >
> > > > Definitely the original change was bogus.
> > > >
> > > > +         if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type)))
> > > > +           return 0;
> > > >
> > > > just repeats your very original change though ... I'll note that
> > > > fold_ternary will
> > > > ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by
> > > > returning NULL_TREE on cases it does not handle.
> > > >
> > > > I think what should be done is, in the
> > > >
> > > >       /* If there are any VIEW_CONVERT_EXPRs found when finding permutation
> > > >          operands source, check whether it's valid to transform and prepare
> > > >          the required new operands.  */
> > > >       if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > > >         {
> > > > ...
> > > >
> > > > path also transform the expected result type.  It should remain V_C_E compatible
> > > > to TREE_TYPE (lhs) but get a new element type.
> > > >
> > > > But as said,
> > > >
> > > > tree
> > > > fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
> > > > {
> > > >   unsigned int i;
> > > >   unsigned HOST_WIDE_INT nelts;
> > > >   bool need_ctor = false;
> > > >
> > > >   if (!sel.length ().is_constant (&nelts))
> > > >     return NULL_TREE;
> > > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > >
> > > > ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary
> > > > doesn't guard according to those asserts (I think we should extend fold_vec_perm
> > > > to support the new constraints).
> > > Hi Richard,
> > > Thanks for the suggestions and sorry for late reply. I reverted the change to
> > > simplify_permutaton which resolved the ppc64le case ICE.
> > >
> > > The attached patch extends fold_vec_perm to handle vectors with
> > > differing lengths.
> > > For,
> > > lhs = vec_perm_expr<arg0, arg1, mask>,
> > > the patch:
> > > (a) asserts lhs and mask have same vector length.
> > > (b) asserts arg0 and arg1 have same vector length.
> > > (c) returns NULL_TREE if element type differs for lhs, arg0 and arg1.
> > > (d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1
> > > if the mask has npatterns == len(arg0) and nelts_per_pattern == 1.
> > > The intent is to permute arg0 and arg1, and then to dup elements in result
> > > to target vector length.
> > > So for eg:
> > > vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}>
> > > will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6}
> > > replicated thru-out.
> > > Does it look OK ?
> > >
> > > With the patch, we don't ICE for either of the aarch64 tests above.
> > > For,
> > > svint32_t f1()
> > > {
> > >   int32x4_t v = {1, 2, 3, 4};
> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > }
> > >
> > > optimized dump shows:
> > > svint32_t f1 ()
> > > {
> > >   int32x4_t v;
> > >
> > >   <bb 2> :
> > >   return { 1, 2, 3, 4, ... };
> > >
> > > }
> > >
> > > code-gen:
> > > f1:
> > > .LFB3900:
> > >         .cfi_startproc
> > >         ptrue   p0.b, all
> > >         adrp    x0, .LC0
> > >         add     x0, x0, :lo12:.LC0
> > >         ld1rqw  z0.s, p0/z, [x0]
> > >         ret
> > > .LC0:
> > >         .word   1
> > >         .word   2
> > >         .word   3
> > >         .word   4
> > >
> > > I guess for this particular case, we could use index instead.
> > >
> > > For,
> > > svint32_t f2(int a, int b, int c, int d)
> > > {
> > >   int32x4_t v = {a, b, c, d};
> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > }
> > >
> > > optimized dump shows:
> > > svint32_t f2 (int a, int b, int c, int d)
> > > {
> > >   svint32_t _6;
> > >
> > >   <bb 2> [local count: 1073741824]:
> > >   _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... };
> > >   return _6;
> > >
> > > The code-gen seems pretty bad however:
> > > f2:
> > > .LFB3901:
> > >         .cfi_startproc
> > >         addvl   sp, sp, #-4
> > >         .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22
> > >         ptrue   p0.b, all
> > >         addvl   x4, sp, #3
> > >         mov     z0.b, #0
> > >         st1w    z0.s, p0, [sp, #3, mul vl]
> > >         str     w0, [x4]
> > >         addvl   x0, sp, #2
> > >         ld1w    z0.s, p0/z, [sp, #3, mul vl]
> > >         st1w    z0.s, p0, [sp, #2, mul vl]
> > >         str     w1, [x0, 4]
> > >         addvl   x0, sp, #1
> > >         ld1w    z0.s, p0/z, [sp, #2, mul vl]
> > >         st1w    z0.s, p0, [sp, #1, mul vl]
> > >         str     w2, [x0, 8]
> > >         ld1w    z0.s, p0/z, [sp, #1, mul vl]
> > >         st1w    z0.s, p0, [sp]
> > >         str     w3, [sp, 12]
> > >         ld1w    z0.s, p0/z, [sp]
> > >         addvl   sp, sp, #4
> > >         .cfi_def_cfa_offset 0
> > >         ret
> > >
> > > I will try to address code-gen issues in follow up patches.
> > > Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu.
> >
> >
> >   /* If result vector has greater length than input vector,
> > +     then allow permuting two vectors as long as:
> > +     a) sel.nelts_per_pattern == 1
> > +     b) sel.npatterns == len of input vector.
> > +     The intent is to permute input vectors, and
> > +     dup the elements in resulting vector to target vector length.  */
> > +
> > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > +    {
> > +      nelts = sel.encoding ().npatterns ();
> > +      if (sel.encoding ().nelts_per_pattern () != 1
> > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > +       return NULL_TREE;
> > +    }
> >
> > so the only case you add is non-VLA to VLA and there
> > explicitely only the case of a period that's same as the
> > element count in the input vectors.
> >
> >
> > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > node, int spc, dump_flags_t flags,
> >                 pp_space (pp);
> >               }
> >           }
> > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > +         pp_string (pp, ", ... ");
> >         pp_right_brace (pp);
> >
> > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> Well, it got created for the following case after folding:
> svint32_t f2(int a, int b, int c, int d)
> {
>   int32x4_t v = {a, b, c, d};
>   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> }
>
> The svld1rq_s32 call gets folded to:
> v = {a, b, c, d}
> lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
>
> fold_vec_perm then folds the above VEC_PERM_EXPR to
> VLA constructor, since elements in v (in_elts) are not constant, and
> need_ctor is thus true:
> lhs = {a, b, c, d, ...}
> I added "..." to make it more explicit that it's a VLA constructor.

But I doubt we do anything reasonable with such a beast?  Do we?
I suppose it's like a vec_duplicate if you view it as V1TImode
but do we actually make sure to do this duplication?

> >
> > I had hoped that you would make
> >
> >   tree *in_elts = XALLOCAVEC (tree, nelts * 2);
> >   if (!vec_cst_ctor_to_array (arg0, nelts, in_elts)
> >       || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts))
> >     return NULL_TREE;
> >
> > VLA agnostic, thus support for example permuting
> > { 0, 2, 4, 8, ... }, { 1, 3, 5, 7 ... }, { 0, 4, 1, 5 ... }
> > as { 0, 1, 2, 3, ... }, etc.
> >
> > that should be entirely doable, to somebody familiar with VLA and the APIs
> > even more so.
> Thanks, I will work on adding this case.
>
> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > >
> > > >
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >
> > > > > >
> > > > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard
  
Prathamesh Kulkarni Aug. 11, 2022, 1:23 p.m. UTC | #13
On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Thu, 21 Jul 2022 at 12:21, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > >
> > > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > > >
> > > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > > > > > <richard.sandiford@arm.com> wrote:
> > > > > > > > >
> > > > > > > > > Richard Biener <richard.guenther@gmail.com> writes:
> > > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > > > > >>
> > > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > > > > > > > > >> > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > Hi Richard,
> > > > > > > > > >> > > For the following test:
> > > > > > > > > >> > >
> > > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > > > > >> > > {
> > > > > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > > >> > > }
> > > > > > > > > >> > >
> > > > > > > > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > > > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > > > > > > > >> > >     4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > > > > >> > >       |           ^~
> > > > > > > > > >> > > svint32_t
> > > > > > > > > >> > > __Int32x4_t
> > > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > > > > >> > >         ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > > > > >> > >         ../../gcc/gcc/passes.cc:2091
> > > > > > > > > >> > > 0xe93ccb execute_todo
> > > > > > > > > >> > >         ../../gcc/gcc/passes.cc:2145
> > > > > > > > > >> > >
> > > > > > > > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > > > > > > > > >> > >   int32x4_t v;
> > > > > > > > > >> > >   __Int32x4_t _1;
> > > > > > > > > >> > >   svint32_t _9;
> > > > > > > > > >> > >   vector(4) int _11;
> > > > > > > > > >> > >
> > > > > > > > > >> > >   <bb 2> :
> > > > > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > > > > >> > >   v_12 = _1;
> > > > > > > > > >> > >   _11 = v_12;
> > > > > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > > > > >> > >   return _9;
> > > > > > > > > >> > >
> > > > > > > > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > > > > > > > >> > > view_convert_expr,
> > > > > > > > > >> > > and the end result becomes:
> > > > > > > > > >> > >   svint32_t _7;
> > > > > > > > > >> > >   __Int32x4_t _8;
> > > > > > > > > >> > >
> > > > > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > > > > >> > > ;;    pred:       ENTRY
> > > > > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > > >> > >   return _7;
> > > > > > > > > >> > > ;;    succ:       EXIT
> > > > > > > > > >> > >
> > > > > > > > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > > > > >> > >
> > > > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > > > > > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > > > > > > > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > > > > > > > > >> >
> > > > > > > > > >> > It for sure papers over the issue.  I think the error happens earlier,
> > > > > > > > > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > > > > > > > > >> > which is the type of the LHS.  But then you probably run into the
> > > > > > > > > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > > > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > > > > >> Hi Richard,
> > > > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > > > > > > > > >> represent dup operation
> > > > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > > > > > >> BIT_FIELD_REF will work.
> > > > > > > > > >> Could you please elaborate ?
> > > > > > > > > >>
> > > > > > > > > >> Also, the issue doesn't seem restricted to this case.
> > > > > > > > > >> The following test case also ICE's during forwprop:
> > > > > > > > > >> svint32_t foo()
> > > > > > > > > >> {
> > > > > > > > > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > > > > > > > > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > > >>   return v2;
> > > > > > > > > >> }
> > > > > > > > > >>
> > > > > > > > > >> foo2.c: In function ‘foo’:
> > > > > > > > > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > > > > > > > > >>     9 | }
> > > > > > > > > >>       | ^
> > > > > > > > > >> svint32_t
> > > > > > > > > >> int32x4_t
> > > > > > > > > >> v2_4 = { 1, 2, 3, 4 };
> > > > > > > > > >>
> > > > > > > > > >> because simplify_permutation folds
> > > > > > > > > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > > > > > > > > >> into:
> > > > > > > > > >> vector_cst {1, 2, 3, 4}
> > > > > > > > > >>
> > > > > > > > > >> and it complains during verify_gimple_assign_single because we don't
> > > > > > > > > >> support assignment of vector_cst to VLA vector.
> > > > > > > > > >>
> > > > > > > > > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > > > > > > > > >> supports lhs and rhs
> > > > > > > > > >> to have vector types with differing lengths, and simplifying it to
> > > > > > > > > >> other tree codes, like above,
> > > > > > > > > >> will result in type errors ?
> > > > > > > > > >
> > > > > > > > > > That might be the case - Richard should know.
> > > > > > > > >
> > > > > > > > > I don't see anything particularly special about VEC_PERM_EXPR here,
> > > > > > > > > or about the VLA vs. VLS thing.  We would have the same issue trying
> > > > > > > > > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > > > > > > > > tree codes whose input types are/can be different from their output
> > > > > > > > > types.
> > > > > > > > >
> > > > > > > > > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > > > > > > > > of type T needs to be replaced by something of type T.  Whether T has a
> > > > > > > > > constant size or a variable size doesn't matter.
> > > > > > > > >
> > > > > > > > > > If so your type check
> > > > > > > > > > is still too late, you should instead recognize that we are permuting
> > > > > > > > > > a VLA vector and then refuse to go any of the non-VEC_PERM generating
> > > > > > > > > > paths - that probably means just allowing the code == VEC_PERM_EXPR
> > > > > > > > > > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?
> > > > > > > > >
> > > > > > > > > Yeah.  If we're talking about the match.pd code, I think only:
> > > > > > > > >
> > > > > > > > >   (if (sel.series_p (0, 1, 0, 1))
> > > > > > > > >    { op0; }
> > > > > > > > >    (if (sel.series_p (0, 1, nelts, 1))
> > > > > > > > >     { op1; }
> > > > > > > > >
> > > > > > > > > need a type compatibility check.  For fold_vec_perm I think
> > > > > > > > > we should just rearrange:
> > > > > > > > >
> > > > > > > > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > > > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > > > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > > > > > > >   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
> > > > > > > > >       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
> > > > > > > > >     return NULL_TREE;
> > > > > > > > >
> > > > > > > > > so that the assert comes after the early-out.
> > > > > > > > >
> > > > > > > > > It would be good at some point to relax fold_vec_perm to cases where the
> > > > > > > > > outputs are a different length from the inputs, since the all-constant
> > > > > > > > > VEC_PERM_EXPR above could be folded to a VECTOR_CST.
> > > > > > > > Hi,
> > > > > > > > For the above case, I think the issue is that simplify_permutation
> > > > > > > > uses TREE_TYPE (arg0) for res_type,
> > > > > > > > while it should now use type for lhs.
> > > > > > > >
> > > > > > > >       /* Shuffle of a constructor.  */
> > > > > > > >       bool ret = false;
> > > > > > > >       tree res_type = TREE_TYPE (arg0);
> > > > > > > >       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > > > > > > >
> > > > > > > > Using, res_type = TREE_TYPE (gimple_get_lhs (stmt)),
> > > > > > > > resolves the ICE, and emits all constant VEC_PERM_EXPR:
> > > > > > > >
> > > > > > > >   v2_4 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
> > > > > > > >   return v2_4;
> > > > > > > >
> > > > > > > > Does the patch look OK to commit after bootstrap+test ?
> > > > > > >
> > > > > > > Ok with using gimple_assign_lhs (stmt) instead of gimple_get_lhs (stmt).
> > > > > > Hi,
> > > > > > I committed the patch but unfortunately it caused PR106360.
> > > > > > The issue is that for slp-reduc-sad-2.c on ppc64le,
> > > > > > simplify_permutation sees the following during forwprop4:
> > > > > >
> > > > > >   _78 = (void *) ivtmp.21_73;
> > > > > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > > > > >   _91 = {_92, 0};
> > > > > >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> > > > > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > > > > >   _87 = {_88, 0};
> > > > > >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> > > > > >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> > > > > >
> > > > > > So for,
> > > > > > tree res_type = TREE_TYPE (gimple_assign_lhs (stmt));
> > > > > > tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
> > > > > >
> > > > > > we have:
> > > > > > res_type = V16QI
> > > > > > arg0 = {_92, 0}
> > > > > > arg1 = {_88, 0}
> > > > > > op2 = {0, 2}
> > > > > >
> > > > > > and thus we hit the following assert in fold_vec_perm:
> > > > > >
> > > > > > gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > > > >
> > > > > > since nelts == 2, and TYPE_VECTOR_SUBPARTS (type) == 16.
> > > > > >
> > > > > > If we revert the committed patch so we pass res_type = TREE_TYPE (arg0) instead,
> > > > > > it simplifies the above VEC_PERM_EXPR to VIEW_CONVERT_EXPR:
> > > > > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > > > > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > > > > >   _5 = {_92, _88};
> > > > > >   vect__1.8_85 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_5);
> > > > > >
> > > > > > I suppose it's legal to cast vector of one type to another as long as
> > > > > > sizes match ?
> > > > > >
> > > > > > IIUC, the above VIEW_CONVERT_EXPR will result in:
> > > > > > vect__1.8_85 = { (uint8_t) _92, 0, 0, 0, 0, 0, 0, 0, (uint8_t) _88, 0,
> > > > > > 0, 0, 0, 0, 0, 0 } ?
> > > > > >
> > > > > > In the attached patch, it restores res_type to TREE_TYPE (arg0), and checks
> > > > > > if lhs_type and res_type differ but have same size, and in that case emit:
> > > > > > lhs = VIEW_CONVERT_EXPR<lhs_type> (opt),
> > > > > > instead of:
> > > > > > lhs = VIEW_CONVERT_EXPR<op1 type> (opt)
> > > > > > where opt is result of folding VEC_PERM_EXPR<res_type, arg0, arg1, op2>
> > > > > >
> > > > > > Does it look OK ?
> > > > >
> > > > > Definitely the original change was bogus.
> > > > >
> > > > > +         if (!operand_equal_p (TYPE_SIZE (lhs_type), TYPE_SIZE (res_type)))
> > > > > +           return 0;
> > > > >
> > > > > just repeats your very original change though ... I'll note that
> > > > > fold_ternary will
> > > > > ICE on now valid VEC_PERM_EXPRs so we should fix it, possibly by
> > > > > returning NULL_TREE on cases it does not handle.
> > > > >
> > > > > I think what should be done is, in the
> > > > >
> > > > >       /* If there are any VIEW_CONVERT_EXPRs found when finding permutation
> > > > >          operands source, check whether it's valid to transform and prepare
> > > > >          the required new operands.  */
> > > > >       if (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > > > >         {
> > > > > ...
> > > > >
> > > > > path also transform the expected result type.  It should remain V_C_E compatible
> > > > > to TREE_TYPE (lhs) but get a new element type.
> > > > >
> > > > > But as said,
> > > > >
> > > > > tree
> > > > > fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
> > > > > {
> > > > >   unsigned int i;
> > > > >   unsigned HOST_WIDE_INT nelts;
> > > > >   bool need_ctor = false;
> > > > >
> > > > >   if (!sel.length ().is_constant (&nelts))
> > > > >     return NULL_TREE;
> > > > >   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> > > > >               && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
> > > > >
> > > > > ^^^ this doesn't match what we allow for VEC_PERM_EXPRs now and fold_ternary
> > > > > doesn't guard according to those asserts (I think we should extend fold_vec_perm
> > > > > to support the new constraints).
> > > > Hi Richard,
> > > > Thanks for the suggestions and sorry for late reply. I reverted the change to
> > > > simplify_permutaton which resolved the ppc64le case ICE.
> > > >
> > > > The attached patch extends fold_vec_perm to handle vectors with
> > > > differing lengths.
> > > > For,
> > > > lhs = vec_perm_expr<arg0, arg1, mask>,
> > > > the patch:
> > > > (a) asserts lhs and mask have same vector length.
> > > > (b) asserts arg0 and arg1 have same vector length.
> > > > (c) returns NULL_TREE if element type differs for lhs, arg0 and arg1.
> > > > (d) if len(lhs) > len(arg0), then the patch allows permuting arg0, arg1
> > > > if the mask has npatterns == len(arg0) and nelts_per_pattern == 1.
> > > > The intent is to permute arg0 and arg1, and then to dup elements in result
> > > > to target vector length.
> > > > So for eg:
> > > > vec_perm_expr< {1, 2, 3, 4}, {5, 6, 7, 8}, {0, 4, 1, 5, ...}>
> > > > will result in vla vector {1, 5, 2, 6, ....} with {1, 5, 2, 6}
> > > > replicated thru-out.
> > > > Does it look OK ?
> > > >
> > > > With the patch, we don't ICE for either of the aarch64 tests above.
> > > > For,
> > > > svint32_t f1()
> > > > {
> > > >   int32x4_t v = {1, 2, 3, 4};
> > > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > }
> > > >
> > > > optimized dump shows:
> > > > svint32_t f1 ()
> > > > {
> > > >   int32x4_t v;
> > > >
> > > >   <bb 2> :
> > > >   return { 1, 2, 3, 4, ... };
> > > >
> > > > }
> > > >
> > > > code-gen:
> > > > f1:
> > > > .LFB3900:
> > > >         .cfi_startproc
> > > >         ptrue   p0.b, all
> > > >         adrp    x0, .LC0
> > > >         add     x0, x0, :lo12:.LC0
> > > >         ld1rqw  z0.s, p0/z, [x0]
> > > >         ret
> > > > .LC0:
> > > >         .word   1
> > > >         .word   2
> > > >         .word   3
> > > >         .word   4
> > > >
> > > > I guess for this particular case, we could use index instead.
> > > >
> > > > For,
> > > > svint32_t f2(int a, int b, int c, int d)
> > > > {
> > > >   int32x4_t v = {a, b, c, d};
> > > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > }
> > > >
> > > > optimized dump shows:
> > > > svint32_t f2 (int a, int b, int c, int d)
> > > > {
> > > >   svint32_t _6;
> > > >
> > > >   <bb 2> [local count: 1073741824]:
> > > >   _6 = {a_1(D), b_2(D), c_3(D), d_4(D), ... };
> > > >   return _6;
> > > >
> > > > The code-gen seems pretty bad however:
> > > > f2:
> > > > .LFB3901:
> > > >         .cfi_startproc
> > > >         addvl   sp, sp, #-4
> > > >         .cfi_escape 0xf,0x9,0x8f,0,0x92,0x2e,0,0x8,0x20,0x1e,0x22
> > > >         ptrue   p0.b, all
> > > >         addvl   x4, sp, #3
> > > >         mov     z0.b, #0
> > > >         st1w    z0.s, p0, [sp, #3, mul vl]
> > > >         str     w0, [x4]
> > > >         addvl   x0, sp, #2
> > > >         ld1w    z0.s, p0/z, [sp, #3, mul vl]
> > > >         st1w    z0.s, p0, [sp, #2, mul vl]
> > > >         str     w1, [x0, 4]
> > > >         addvl   x0, sp, #1
> > > >         ld1w    z0.s, p0/z, [sp, #2, mul vl]
> > > >         st1w    z0.s, p0, [sp, #1, mul vl]
> > > >         str     w2, [x0, 8]
> > > >         ld1w    z0.s, p0/z, [sp, #1, mul vl]
> > > >         st1w    z0.s, p0, [sp]
> > > >         str     w3, [sp, 12]
> > > >         ld1w    z0.s, p0/z, [sp]
> > > >         addvl   sp, sp, #4
> > > >         .cfi_def_cfa_offset 0
> > > >         ret
> > > >
> > > > I will try to address code-gen issues in follow up patches.
> > > > Bootstrapped+tested on x64_64-linux-gnu and aarch64-linux-gnu.
> > >
> > >
> > >   /* If result vector has greater length than input vector,
> > > +     then allow permuting two vectors as long as:
> > > +     a) sel.nelts_per_pattern == 1
> > > +     b) sel.npatterns == len of input vector.
> > > +     The intent is to permute input vectors, and
> > > +     dup the elements in resulting vector to target vector length.  */
> > > +
> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > +    {
> > > +      nelts = sel.encoding ().npatterns ();
> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > +       return NULL_TREE;
> > > +    }
> > >
> > > so the only case you add is non-VLA to VLA and there
> > > explicitely only the case of a period that's same as the
> > > element count in the input vectors.
> > >
> > >
> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > > node, int spc, dump_flags_t flags,
> > >                 pp_space (pp);
> > >               }
> > >           }
> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > +         pp_string (pp, ", ... ");
> > >         pp_right_brace (pp);
> > >
> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > Well, it got created for the following case after folding:
> > svint32_t f2(int a, int b, int c, int d)
> > {
> >   int32x4_t v = {a, b, c, d};
> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > }
> >
> > The svld1rq_s32 call gets folded to:
> > v = {a, b, c, d}
> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
> >
> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > VLA constructor, since elements in v (in_elts) are not constant, and
> > need_ctor is thus true:
> > lhs = {a, b, c, d, ...}
> > I added "..." to make it more explicit that it's a VLA constructor.
>
> But I doubt we do anything reasonable with such a beast?  Do we?
> I suppose it's like a vec_duplicate if you view it as V1TImode
> but do we actually make sure to do this duplication?
I am not sure. As mentioned above, the current code-gen for VLA
constructor looks pretty bad.
Should we avoid folding VLA constructors for now ?

I guess these are 2 different issues:
(a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
(b) Extending fold_vec_perm to handle vectors with differing lengths.

For (a), I think the issue with using:
res_type = gimple_assign_lhs (stmt)
in previous patch, was that op2's type will change to match tgt_units,
if we go thru
(code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
and may thus not be same as len(lhs_type) anymore, and hit the assert
in fold_vec_perm.

IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
following semantics:
(1) Element types for lhs, rhs1 and rhs2 should be the same.
(2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).

The attached patch changes res_type from TREE_TYPE (arg0) to following:
res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
                                                TYPE_VECTOR_SUBPARTS (op2))
so it has same element type as arg0 (and arg1) and len of op2.
Does that look reasonable ?

If we need a cast from res_type to lhs_type, then both would be fixed
width vectors
with len(lhs_type) being a multiple of len(res_type).
IIUC, we don't support casting from VLA vector to/from fixed width vector,
or from VLA vector of one type to VLA vector of other type ?

Currently, if op2 is VLA, and we enter the branch:
(code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
then I think it will bail out because op2_units will not be a compile
time constant,
and constant_multiple_p (op2_units, tgt_units, &factor) would return false.

The patch resolves ICE's for aarch64 cases, without regressing the
ones in PR106360.
Does it look OK ?

I have a WIP patch for extending fold_vec_perm to fold VLA vectors,
that I will post shortly.

Thanks,
Prathamesh
>
> > >
> > > I had hoped that you would make
> > >
> > >   tree *in_elts = XALLOCAVEC (tree, nelts * 2);
> > >   if (!vec_cst_ctor_to_array (arg0, nelts, in_elts)
> > >       || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts))
> > >     return NULL_TREE;
> > >
> > > VLA agnostic, thus support for example permuting
> > > { 0, 2, 4, 8, ... }, { 1, 3, 5, 7 ... }, { 0, 4, 1, 5 ... }
> > > as { 0, 1, 2, 3, ... }, etc.
> > >
> > > that should be entirely doable, to somebody familiar with VLA and the APIs
> > > even more so.
> > Thanks, I will work on adding this case.
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > >
> > > > > > >
> > > > > > > > I will try to address the folding for above VEC_PERM_EXPR in follow-up patch.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Richard
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 99021a82df4..bc8ba4607a5 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10541,15 +10541,24 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
   unsigned HOST_WIDE_INT nelts;
   bool need_ctor = false;
 
-  if (!sel.length ().is_constant (&nelts))
-    return NULL_TREE;
-  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
-	      && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
-	      && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
+  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
+			sel.length ()));
+  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)),
+			TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))));
+
   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
     return NULL_TREE;
 
+  /* TODO: Handle VEC_PERM_EXPR with differing lengths.  */
+  if (!known_eq (TYPE_VECTOR_SUBPARTS (type),
+		 TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
+    return NULL_TREE;
+
+  /* TODO: Handle VLA vectors.  */
+  if (!sel.length ().is_constant (&nelts))
+    return NULL_TREE;
+
   tree *in_elts = XALLOCAVEC (tree, nelts * 2);
   if (!vec_cst_ctor_to_array (arg0, nelts, in_elts)
       || !vec_cst_ctor_to_array (arg1, nelts, in_elts + nelts))
diff --git a/gcc/match.pd b/gcc/match.pd
index 330c1db0c8e..aa20cc713c5 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7845,6 +7845,12 @@ and,
  (with
   {
     tree op0 = @0, op1 = @1, op2 = @2;
+
+    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
+			  TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))));
+    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)),
+			  TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1))));
+
     machine_mode result_mode = TYPE_MODE (type);
     machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
 
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index fdc4bc8909d..4b693ef095c 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2661,7 +2661,9 @@ simplify_permutation (gimple_stmt_iterator *gsi)
 
       /* Shuffle of a constructor.  */
       bool ret = false;
-      tree res_type = TREE_TYPE (arg0);
+      tree res_type
+	= build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
+			     TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)));
       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
       if (!opt
 	  || (TREE_CODE (opt) != CONSTRUCTOR && TREE_CODE (opt) != VECTOR_CST))
  
Richard Sandiford Aug. 16, 2022, 4:30 p.m. UTC | #14
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
>> <prathamesh.kulkarni@linaro.org> wrote:
>> >
>> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > >
>> > >
>> > >   /* If result vector has greater length than input vector,
>> > > +     then allow permuting two vectors as long as:
>> > > +     a) sel.nelts_per_pattern == 1
>> > > +     b) sel.npatterns == len of input vector.
>> > > +     The intent is to permute input vectors, and
>> > > +     dup the elements in resulting vector to target vector length.  */
>> > > +
>> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
>> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
>> > > +    {
>> > > +      nelts = sel.encoding ().npatterns ();
>> > > +      if (sel.encoding ().nelts_per_pattern () != 1
>> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
>> > > +       return NULL_TREE;
>> > > +    }
>> > >
>> > > so the only case you add is non-VLA to VLA and there
>> > > explicitely only the case of a period that's same as the
>> > > element count in the input vectors.
>> > >
>> > >
>> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
>> > > node, int spc, dump_flags_t flags,
>> > >                 pp_space (pp);
>> > >               }
>> > >           }
>> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
>> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
>> > > +         pp_string (pp, ", ... ");
>> > >         pp_right_brace (pp);
>> > >
>> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
>> > Well, it got created for the following case after folding:
>> > svint32_t f2(int a, int b, int c, int d)
>> > {
>> >   int32x4_t v = {a, b, c, d};
>> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
>> > }
>> >
>> > The svld1rq_s32 call gets folded to:
>> > v = {a, b, c, d}
>> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
>> >
>> > fold_vec_perm then folds the above VEC_PERM_EXPR to
>> > VLA constructor, since elements in v (in_elts) are not constant, and
>> > need_ctor is thus true:
>> > lhs = {a, b, c, d, ...}
>> > I added "..." to make it more explicit that it's a VLA constructor.
>>
>> But I doubt we do anything reasonable with such a beast?  Do we?
>> I suppose it's like a vec_duplicate if you view it as V1TImode
>> but do we actually make sure to do this duplication?
> I am not sure. As mentioned above, the current code-gen for VLA
> constructor looks pretty bad.
> Should we avoid folding VLA constructors for now ?

VLA constructors aren't really a thing.  At least, the only VLA vector
you could represent with current CONSTRUCTOR nodes is a fixed-length
sequence at the start of an otherwise zero vector.  I'm not sure
we even use that though (perhaps we do and I've forgotten).

> I guess these are 2 different issues:
> (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> (b) Extending fold_vec_perm to handle vectors with differing lengths.
>
> For (a), I think the issue with using:
> res_type = gimple_assign_lhs (stmt)
> in previous patch, was that op2's type will change to match tgt_units,
> if we go thru
> (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> and may thus not be same as len(lhs_type) anymore, and hit the assert
> in fold_vec_perm.
>
> IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> following semantics:
> (1) Element types for lhs, rhs1 and rhs2 should be the same.
> (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).

Yeah.

> The attached patch changes res_type from TREE_TYPE (arg0) to following:
> res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
>                                                 TYPE_VECTOR_SUBPARTS (op2))
> so it has same element type as arg0 (and arg1) and len of op2.
> Does that look reasonable ?
>
> If we need a cast from res_type to lhs_type, then both would be fixed
> width vectors
> with len(lhs_type) being a multiple of len(res_type).
> IIUC, we don't support casting from VLA vector to/from fixed width vector,

Yes, that's not supported as a cast.  If the compiler knows the
length of the "VLA" vector then it's not VLA.  If it doesn't
know the length of the VLA vector then the sizes could be different
(preventing VIEW_CONVERT_EXPR) and the number of elements could be
different (preventing pointwise CONVERT_EXPRs).

> or from VLA vector of one type to VLA vector of other type ?

That's supported though.  They work just like VLS vectors: if the sizes
are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
are the same then we can do pointwise conversions (e.g. element-by-element
extensions, truncations, conversions to float, conversions to integer, etc).

> Currently, if op2 is VLA, and we enter the branch:
> (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> then I think it will bail out because op2_units will not be a compile
> time constant,
> and constant_multiple_p (op2_units, tgt_units, &factor) would return false.
>
> The patch resolves ICE's for aarch64 cases, without regressing the
> ones in PR106360.
> Does it look OK ?

Richi should have the final say, but it looks OK in principle to me.
I'm not sure it's worth applying independently of the follow-on patch
though, if that patch is in the offing anyway.

I guess my only question is whether tree-ssa-forwprop.cc really needs
to build a new type.  Couldn't it just use the TREE_TYPE of the lhs
instead?

Thanks,
Richard
  
Richard Biener Aug. 17, 2022, 11:31 a.m. UTC | #15
On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
> >>
> >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> >> <prathamesh.kulkarni@linaro.org> wrote:
> >> >
> >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > >
> >> > >
> >> > >   /* If result vector has greater length than input vector,
> >> > > +     then allow permuting two vectors as long as:
> >> > > +     a) sel.nelts_per_pattern == 1
> >> > > +     b) sel.npatterns == len of input vector.
> >> > > +     The intent is to permute input vectors, and
> >> > > +     dup the elements in resulting vector to target vector length.  */
> >> > > +
> >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> >> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> >> > > +    {
> >> > > +      nelts = sel.encoding ().npatterns ();
> >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> >> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> >> > > +       return NULL_TREE;
> >> > > +    }
> >> > >
> >> > > so the only case you add is non-VLA to VLA and there
> >> > > explicitely only the case of a period that's same as the
> >> > > element count in the input vectors.
> >> > >
> >> > >
> >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> >> > > node, int spc, dump_flags_t flags,
> >> > >                 pp_space (pp);
> >> > >               }
> >> > >           }
> >> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> >> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> >> > > +         pp_string (pp, ", ... ");
> >> > >         pp_right_brace (pp);
> >> > >
> >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> >> > Well, it got created for the following case after folding:
> >> > svint32_t f2(int a, int b, int c, int d)
> >> > {
> >> >   int32x4_t v = {a, b, c, d};
> >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> >> > }
> >> >
> >> > The svld1rq_s32 call gets folded to:
> >> > v = {a, b, c, d}
> >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
> >> >
> >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> >> > VLA constructor, since elements in v (in_elts) are not constant, and
> >> > need_ctor is thus true:
> >> > lhs = {a, b, c, d, ...}
> >> > I added "..." to make it more explicit that it's a VLA constructor.
> >>
> >> But I doubt we do anything reasonable with such a beast?  Do we?
> >> I suppose it's like a vec_duplicate if you view it as V1TImode
> >> but do we actually make sure to do this duplication?
> > I am not sure. As mentioned above, the current code-gen for VLA
> > constructor looks pretty bad.
> > Should we avoid folding VLA constructors for now ?
>
> VLA constructors aren't really a thing.  At least, the only VLA vector
> you could represent with current CONSTRUCTOR nodes is a fixed-length
> sequence at the start of an otherwise zero vector.  I'm not sure
> we even use that though (perhaps we do and I've forgotten).
>
> > I guess these are 2 different issues:
> > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> >
> > For (a), I think the issue with using:
> > res_type = gimple_assign_lhs (stmt)
> > in previous patch, was that op2's type will change to match tgt_units,
> > if we go thru
> > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > in fold_vec_perm.
> >
> > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > following semantics:
> > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
>
> Yeah.
>
> > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> >                                                 TYPE_VECTOR_SUBPARTS (op2))
> > so it has same element type as arg0 (and arg1) and len of op2.
> > Does that look reasonable ?
> >
> > If we need a cast from res_type to lhs_type, then both would be fixed
> > width vectors
> > with len(lhs_type) being a multiple of len(res_type).
> > IIUC, we don't support casting from VLA vector to/from fixed width vector,
>
> Yes, that's not supported as a cast.  If the compiler knows the
> length of the "VLA" vector then it's not VLA.  If it doesn't
> know the length of the VLA vector then the sizes could be different
> (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> different (preventing pointwise CONVERT_EXPRs).
>
> > or from VLA vector of one type to VLA vector of other type ?
>
> That's supported though.  They work just like VLS vectors: if the sizes
> are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
> are the same then we can do pointwise conversions (e.g. element-by-element
> extensions, truncations, conversions to float, conversions to integer, etc).
>
> > Currently, if op2 is VLA, and we enter the branch:
> > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > then I think it will bail out because op2_units will not be a compile
> > time constant,
> > and constant_multiple_p (op2_units, tgt_units, &factor) would return false.
> >
> > The patch resolves ICE's for aarch64 cases, without regressing the
> > ones in PR106360.
> > Does it look OK ?
>
> Richi should have the final say, but it looks OK in principle to me.
> I'm not sure it's worth applying independently of the follow-on patch
> though, if that patch is in the offing anyway.
>
> I guess my only question is whether tree-ssa-forwprop.cc really needs
> to build a new type.  Couldn't it just use the TREE_TYPE of the lhs
> instead?

I think the point was they are not necssarily the same when we
looked through a VIEW_CONVERT?  A comment might be in order
here.

Btw, please don't add new asserts that trivially hold in critical paths.

diff --git a/gcc/match.pd b/gcc/match.pd
index 330c1db0c8e..aa20cc713c5 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7845,6 +7845,12 @@ and,
  (with
   {
     tree op0 = @0, op1 = @1, op2 = @2;
+
+    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
+                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))));
+    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)),
+                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1))));
+
     machine_mode result_mode = TYPE_MODE (type);
     machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));



>
> Thanks,
> Richard
  
Prathamesh Kulkarni Aug. 18, 2022, 12:44 p.m. UTC | #16
On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
> > >>
> > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> > >> <prathamesh.kulkarni@linaro.org> wrote:
> > >> >
> > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > >
> > >> > >
> > >> > >   /* If result vector has greater length than input vector,
> > >> > > +     then allow permuting two vectors as long as:
> > >> > > +     a) sel.nelts_per_pattern == 1
> > >> > > +     b) sel.npatterns == len of input vector.
> > >> > > +     The intent is to permute input vectors, and
> > >> > > +     dup the elements in resulting vector to target vector length.  */
> > >> > > +
> > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > >> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > >> > > +    {
> > >> > > +      nelts = sel.encoding ().npatterns ();
> > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > >> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > >> > > +       return NULL_TREE;
> > >> > > +    }
> > >> > >
> > >> > > so the only case you add is non-VLA to VLA and there
> > >> > > explicitely only the case of a period that's same as the
> > >> > > element count in the input vectors.
> > >> > >
> > >> > >
> > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > >> > > node, int spc, dump_flags_t flags,
> > >> > >                 pp_space (pp);
> > >> > >               }
> > >> > >           }
> > >> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> > >> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > >> > > +         pp_string (pp, ", ... ");
> > >> > >         pp_right_brace (pp);
> > >> > >
> > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > >> > Well, it got created for the following case after folding:
> > >> > svint32_t f2(int a, int b, int c, int d)
> > >> > {
> > >> >   int32x4_t v = {a, b, c, d};
> > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > >> > }
> > >> >
> > >> > The svld1rq_s32 call gets folded to:
> > >> > v = {a, b, c, d}
> > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
> > >> >
> > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > >> > need_ctor is thus true:
> > >> > lhs = {a, b, c, d, ...}
> > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > >>
> > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > >> but do we actually make sure to do this duplication?
> > > I am not sure. As mentioned above, the current code-gen for VLA
> > > constructor looks pretty bad.
> > > Should we avoid folding VLA constructors for now ?
> >
> > VLA constructors aren't really a thing.  At least, the only VLA vector
> > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > sequence at the start of an otherwise zero vector.  I'm not sure
> > we even use that though (perhaps we do and I've forgotten).
> >
> > > I guess these are 2 different issues:
> > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > >
> > > For (a), I think the issue with using:
> > > res_type = gimple_assign_lhs (stmt)
> > > in previous patch, was that op2's type will change to match tgt_units,
> > > if we go thru
> > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > in fold_vec_perm.
> > >
> > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > > following semantics:
> > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> >
> > Yeah.
> >
> > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > >                                                 TYPE_VECTOR_SUBPARTS (op2))
> > > so it has same element type as arg0 (and arg1) and len of op2.
> > > Does that look reasonable ?
> > >
> > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > width vectors
> > > with len(lhs_type) being a multiple of len(res_type).
> > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> >
> > Yes, that's not supported as a cast.  If the compiler knows the
> > length of the "VLA" vector then it's not VLA.  If it doesn't
> > know the length of the VLA vector then the sizes could be different
> > (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> > different (preventing pointwise CONVERT_EXPRs).
> >
> > > or from VLA vector of one type to VLA vector of other type ?
> >
> > That's supported though.  They work just like VLS vectors: if the sizes
> > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
> > are the same then we can do pointwise conversions (e.g. element-by-element
> > extensions, truncations, conversions to float, conversions to integer, etc).
> >
> > > Currently, if op2 is VLA, and we enter the branch:
> > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > > then I think it will bail out because op2_units will not be a compile
> > > time constant,
> > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false.
> > >
> > > The patch resolves ICE's for aarch64 cases, without regressing the
> > > ones in PR106360.
> > > Does it look OK ?
> >
> > Richi should have the final say, but it looks OK in principle to me.
> > I'm not sure it's worth applying independently of the follow-on patch
> > though, if that patch is in the offing anyway.
> >
> > I guess my only question is whether tree-ssa-forwprop.cc really needs
> > to build a new type.  Couldn't it just use the TREE_TYPE of the lhs
> > instead?
>
> I think the point was they are not necssarily the same when we
> looked through a VIEW_CONVERT?  A comment might be in order
> here.
Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of
VIEW_CONVERT_EXPR. For instance in following case:

  _78 = (void *) ivtmp.21_73;
  _92 = MEM <unsigned long> [(uint8_t *)_78];
  _91 = {_92, 0};
  vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
  _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
  _87 = {_88, 0};
  vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
  vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;

simplify_permutation looks thru V_C_E, and tries to fold:
res = VEC_PERM_EXPR (_87, _91, {0, 2})
In this case, lhs type (V16QI) differs from res type (V2DI), and we
hit assert in fold_vec_perm.
>
> Btw, please don't add new asserts that trivially hold in critical paths.
Well, it didn't hold true for following case when op2 is VLA:
lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }),
because we're passing res_type == V4SI, and op2_type == VNx4SI from
simplify_permutation.

The vec_perm pattern hits:
(if (sel.series_p (0, 1, 0, 1))
  { op0; }

and folds it to:
lhs = {1, 2, 3, 4}

which results in error during verify_gimple since lhs and rhs types
differ (VNx4SI, V4SI).

I suppose we want to do the transforms:
sel.series_p (0, 1, 0, 1) -> op0 and,
sel.series_p (0, 1, nelts, 1) -> op1
only if input vectors and sel have same length ?

Thanks,
Prathamesh

>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 330c1db0c8e..aa20cc713c5 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7845,6 +7845,12 @@ and,
>   (with
>    {
>      tree op0 = @0, op1 = @1, op2 = @2;
> +
> +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
> +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))));
> +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)),
> +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1))));
> +
>      machine_mode result_mode = TYPE_MODE (type);
>      machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
>
>
>
> >
> > Thanks,
> > Richard
  
Prathamesh Kulkarni Aug. 18, 2022, 12:50 p.m. UTC | #17
On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >>
> > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> > > >> <prathamesh.kulkarni@linaro.org> wrote:
> > > >> >
> > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > >
> > > >> > >
> > > >> > >   /* If result vector has greater length than input vector,
> > > >> > > +     then allow permuting two vectors as long as:
> > > >> > > +     a) sel.nelts_per_pattern == 1
> > > >> > > +     b) sel.npatterns == len of input vector.
> > > >> > > +     The intent is to permute input vectors, and
> > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > >> > > +
> > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > >> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > >> > > +    {
> > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > >> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > >> > > +       return NULL_TREE;
> > > >> > > +    }
> > > >> > >
> > > >> > > so the only case you add is non-VLA to VLA and there
> > > >> > > explicitely only the case of a period that's same as the
> > > >> > > element count in the input vectors.
> > > >> > >
> > > >> > >
> > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > > >> > > node, int spc, dump_flags_t flags,
> > > >> > >                 pp_space (pp);
> > > >> > >               }
> > > >> > >           }
> > > >> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > >> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > >> > > +         pp_string (pp, ", ... ");
> > > >> > >         pp_right_brace (pp);
> > > >> > >
> > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > >> > Well, it got created for the following case after folding:
> > > >> > svint32_t f2(int a, int b, int c, int d)
> > > >> > {
> > > >> >   int32x4_t v = {a, b, c, d};
> > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > >> > }
> > > >> >
> > > >> > The svld1rq_s32 call gets folded to:
> > > >> > v = {a, b, c, d}
> > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
> > > >> >
> > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > >> > need_ctor is thus true:
> > > >> > lhs = {a, b, c, d, ...}
> > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > >>
> > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > >> but do we actually make sure to do this duplication?
> > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > constructor looks pretty bad.
> > > > Should we avoid folding VLA constructors for now ?
> > >
> > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > we even use that though (perhaps we do and I've forgotten).
> > >
> > > > I guess these are 2 different issues:
> > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > >
> > > > For (a), I think the issue with using:
> > > > res_type = gimple_assign_lhs (stmt)
> > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > if we go thru
> > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > in fold_vec_perm.
> > > >
> > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > > > following semantics:
> > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > >
> > > Yeah.
> > >
> > > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > >                                                 TYPE_VECTOR_SUBPARTS (op2))
> > > > so it has same element type as arg0 (and arg1) and len of op2.
> > > > Does that look reasonable ?
> > > >
> > > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > > width vectors
> > > > with len(lhs_type) being a multiple of len(res_type).
> > > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> > >
> > > Yes, that's not supported as a cast.  If the compiler knows the
> > > length of the "VLA" vector then it's not VLA.  If it doesn't
> > > know the length of the VLA vector then the sizes could be different
> > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> > > different (preventing pointwise CONVERT_EXPRs).
> > >
> > > > or from VLA vector of one type to VLA vector of other type ?
> > >
> > > That's supported though.  They work just like VLS vectors: if the sizes
> > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
> > > are the same then we can do pointwise conversions (e.g. element-by-element
> > > extensions, truncations, conversions to float, conversions to integer, etc).
> > >
> > > > Currently, if op2 is VLA, and we enter the branch:
> > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > > > then I think it will bail out because op2_units will not be a compile
> > > > time constant,
> > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false.
> > > >
> > > > The patch resolves ICE's for aarch64 cases, without regressing the
> > > > ones in PR106360.
> > > > Does it look OK ?
> > >
> > > Richi should have the final say, but it looks OK in principle to me.
> > > I'm not sure it's worth applying independently of the follow-on patch
> > > though, if that patch is in the offing anyway.
> > >
> > > I guess my only question is whether tree-ssa-forwprop.cc really needs
> > > to build a new type.  Couldn't it just use the TREE_TYPE of the lhs
> > > instead?
> >
> > I think the point was they are not necssarily the same when we
> > looked through a VIEW_CONVERT?  A comment might be in order
> > here.
> Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of
> VIEW_CONVERT_EXPR. For instance in following case:
>
>   _78 = (void *) ivtmp.21_73;
>   _92 = MEM <unsigned long> [(uint8_t *)_78];
>   _91 = {_92, 0};
>   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
>   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
>   _87 = {_88, 0};
>   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
>   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
>
> simplify_permutation looks thru V_C_E, and tries to fold:
> res = VEC_PERM_EXPR (_87, _91, {0, 2})
> In this case, lhs type (V16QI) differs from res type (V2DI), and we
> hit assert in fold_vec_perm.
Oops sorry, just to clarify -- this hit assert in fold_vec_perm only
when we used res_type = lhs_type.
It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI.
The issue with using res_type = TREE_TYPE (arg0), comes for case when
arg0 is fixed length, and op2 is VLA.

Thanks,
Prathamesh
> >
> > Btw, please don't add new asserts that trivially hold in critical paths.
> Well, it didn't hold true for following case when op2 is VLA:
> lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }),
> because we're passing res_type == V4SI, and op2_type == VNx4SI from
> simplify_permutation.
>
> The vec_perm pattern hits:
> (if (sel.series_p (0, 1, 0, 1))
>   { op0; }
>
> and folds it to:
> lhs = {1, 2, 3, 4}
>
> which results in error during verify_gimple since lhs and rhs types
> differ (VNx4SI, V4SI).
>
> I suppose we want to do the transforms:
> sel.series_p (0, 1, 0, 1) -> op0 and,
> sel.series_p (0, 1, nelts, 1) -> op1
> only if input vectors and sel have same length ?
>
> Thanks,
> Prathamesh
>
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 330c1db0c8e..aa20cc713c5 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7845,6 +7845,12 @@ and,
> >   (with
> >    {
> >      tree op0 = @0, op1 = @1, op2 = @2;
> > +
> > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
> > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))));
> > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)),
> > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1))));
> > +
> >      machine_mode result_mode = TYPE_MODE (type);
> >      machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> >
> >
> >
> > >
> > > Thanks,
> > > Richard
  
Prathamesh Kulkarni Aug. 29, 2022, 6:23 a.m. UTC | #18
On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > > >
> > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > >>
> > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> > > > >> <prathamesh.kulkarni@linaro.org> wrote:
> > > > >> >
> > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > >
> > > > >> > >
> > > > >> > >   /* If result vector has greater length than input vector,
> > > > >> > > +     then allow permuting two vectors as long as:
> > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > >> > > +     The intent is to permute input vectors, and
> > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > >> > > +
> > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > >> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > >> > > +    {
> > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > >> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > >> > > +       return NULL_TREE;
> > > > >> > > +    }
> > > > >> > >
> > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > >> > > explicitely only the case of a period that's same as the
> > > > >> > > element count in the input vectors.
> > > > >> > >
> > > > >> > >
> > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > > > >> > > node, int spc, dump_flags_t flags,
> > > > >> > >                 pp_space (pp);
> > > > >> > >               }
> > > > >> > >           }
> > > > >> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > >> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > >> > > +         pp_string (pp, ", ... ");
> > > > >> > >         pp_right_brace (pp);
> > > > >> > >
> > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > >> > Well, it got created for the following case after folding:
> > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > >> > {
> > > > >> >   int32x4_t v = {a, b, c, d};
> > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > >> > }
> > > > >> >
> > > > >> > The svld1rq_s32 call gets folded to:
> > > > >> > v = {a, b, c, d}
> > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
> > > > >> >
> > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > >> > need_ctor is thus true:
> > > > >> > lhs = {a, b, c, d, ...}
> > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > >>
> > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > >> but do we actually make sure to do this duplication?
> > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > constructor looks pretty bad.
> > > > > Should we avoid folding VLA constructors for now ?
> > > >
> > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > we even use that though (perhaps we do and I've forgotten).
> > > >
> > > > > I guess these are 2 different issues:
> > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > > >
> > > > > For (a), I think the issue with using:
> > > > > res_type = gimple_assign_lhs (stmt)
> > > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > > if we go thru
> > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > > in fold_vec_perm.
> > > > >
> > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > > > > following semantics:
> > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > > >
> > > > Yeah.
> > > >
> > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > > >                                                 TYPE_VECTOR_SUBPARTS (op2))
> > > > > so it has same element type as arg0 (and arg1) and len of op2.
> > > > > Does that look reasonable ?
> > > > >
> > > > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > > > width vectors
> > > > > with len(lhs_type) being a multiple of len(res_type).
> > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> > > >
> > > > Yes, that's not supported as a cast.  If the compiler knows the
> > > > length of the "VLA" vector then it's not VLA.  If it doesn't
> > > > know the length of the VLA vector then the sizes could be different
> > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> > > > different (preventing pointwise CONVERT_EXPRs).
> > > >
> > > > > or from VLA vector of one type to VLA vector of other type ?
> > > >
> > > > That's supported though.  They work just like VLS vectors: if the sizes
> > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
> > > > are the same then we can do pointwise conversions (e.g. element-by-element
> > > > extensions, truncations, conversions to float, conversions to integer, etc).
> > > >
> > > > > Currently, if op2 is VLA, and we enter the branch:
> > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > > > > then I think it will bail out because op2_units will not be a compile
> > > > > time constant,
> > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false.
> > > > >
> > > > > The patch resolves ICE's for aarch64 cases, without regressing the
> > > > > ones in PR106360.
> > > > > Does it look OK ?
> > > >
> > > > Richi should have the final say, but it looks OK in principle to me.
> > > > I'm not sure it's worth applying independently of the follow-on patch
> > > > though, if that patch is in the offing anyway.
> > > >
> > > > I guess my only question is whether tree-ssa-forwprop.cc really needs
> > > > to build a new type.  Couldn't it just use the TREE_TYPE of the lhs
> > > > instead?
> > >
> > > I think the point was they are not necssarily the same when we
> > > looked through a VIEW_CONVERT?  A comment might be in order
> > > here.
> > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of
> > VIEW_CONVERT_EXPR. For instance in following case:
> >
> >   _78 = (void *) ivtmp.21_73;
> >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> >   _91 = {_92, 0};
> >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> >   _87 = {_88, 0};
> >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> >
> > simplify_permutation looks thru V_C_E, and tries to fold:
> > res = VEC_PERM_EXPR (_87, _91, {0, 2})
> > In this case, lhs type (V16QI) differs from res type (V2DI), and we
> > hit assert in fold_vec_perm.
> Oops sorry, just to clarify -- this hit assert in fold_vec_perm only
> when we used res_type = lhs_type.
> It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI.
> The issue with using res_type = TREE_TYPE (arg0), comes for case when
> arg0 is fixed length, and op2 is VLA.
Hi,
Shall it be OK to commit the above patch, that changes res_type to
have same elem type as arg0, and length as op2 ?

Also, shall it be OK to gate these transforms in vec_perm match.pd pattern:
sel.series_p (0, 1, 0, 1) -> op0 and,
sel.series_p (0, 1, nelts, 1) -> op1
if len(type) == len(arg0) ?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> > >
> > > Btw, please don't add new asserts that trivially hold in critical paths.
> > Well, it didn't hold true for following case when op2 is VLA:
> > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }),
> > because we're passing res_type == V4SI, and op2_type == VNx4SI from
> > simplify_permutation.
> >
> > The vec_perm pattern hits:
> > (if (sel.series_p (0, 1, 0, 1))
> >   { op0; }
> >
> > and folds it to:
> > lhs = {1, 2, 3, 4}
> >
> > which results in error during verify_gimple since lhs and rhs types
> > differ (VNx4SI, V4SI).
> >
> > I suppose we want to do the transforms:
> > sel.series_p (0, 1, 0, 1) -> op0 and,
> > sel.series_p (0, 1, nelts, 1) -> op1
> > only if input vectors and sel have same length ?
> >
> > Thanks,
> > Prathamesh
> >
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 330c1db0c8e..aa20cc713c5 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -7845,6 +7845,12 @@ and,
> > >   (with
> > >    {
> > >      tree op0 = @0, op1 = @1, op2 = @2;
> > > +
> > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
> > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))));
> > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)),
> > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1))));
> > > +
> > >      machine_mode result_mode = TYPE_MODE (type);
> > >      machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> > >
> > >
> > >
> > > >
> > > > Thanks,
> > > > Richard
  
Prathamesh Kulkarni Sept. 5, 2022, 8:54 a.m. UTC | #19
On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
> > > > <richard.sandiford@arm.com> wrote:
> > > > >
> > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > >>
> > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> > > > > >> <prathamesh.kulkarni@linaro.org> wrote:
> > > > > >> >
> > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > >
> > > > > >> > >
> > > > > >> > >   /* If result vector has greater length than input vector,
> > > > > >> > > +     then allow permuting two vectors as long as:
> > > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > > >> > > +     The intent is to permute input vectors, and
> > > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > > >> > > +
> > > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > > >> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > > >> > > +    {
> > > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > > >> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > > >> > > +       return NULL_TREE;
> > > > > >> > > +    }
> > > > > >> > >
> > > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > > >> > > explicitely only the case of a period that's same as the
> > > > > >> > > element count in the input vectors.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > > > > >> > > node, int spc, dump_flags_t flags,
> > > > > >> > >                 pp_space (pp);
> > > > > >> > >               }
> > > > > >> > >           }
> > > > > >> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > > >> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > > >> > > +         pp_string (pp, ", ... ");
> > > > > >> > >         pp_right_brace (pp);
> > > > > >> > >
> > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > > >> > Well, it got created for the following case after folding:
> > > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > > >> > {
> > > > > >> >   int32x4_t v = {a, b, c, d};
> > > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > >> > }
> > > > > >> >
> > > > > >> > The svld1rq_s32 call gets folded to:
> > > > > >> > v = {a, b, c, d}
> > > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
> > > > > >> >
> > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > > >> > need_ctor is thus true:
> > > > > >> > lhs = {a, b, c, d, ...}
> > > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > > >>
> > > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > > >> but do we actually make sure to do this duplication?
> > > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > > constructor looks pretty bad.
> > > > > > Should we avoid folding VLA constructors for now ?
> > > > >
> > > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > > we even use that though (perhaps we do and I've forgotten).
> > > > >
> > > > > > I guess these are 2 different issues:
> > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > > > >
> > > > > > For (a), I think the issue with using:
> > > > > > res_type = gimple_assign_lhs (stmt)
> > > > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > > > if we go thru
> > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > > > in fold_vec_perm.
> > > > > >
> > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > > > > > following semantics:
> > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > > > >
> > > > > Yeah.
> > > > >
> > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > > > >                                                 TYPE_VECTOR_SUBPARTS (op2))
> > > > > > so it has same element type as arg0 (and arg1) and len of op2.
> > > > > > Does that look reasonable ?
> > > > > >
> > > > > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > > > > width vectors
> > > > > > with len(lhs_type) being a multiple of len(res_type).
> > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> > > > >
> > > > > Yes, that's not supported as a cast.  If the compiler knows the
> > > > > length of the "VLA" vector then it's not VLA.  If it doesn't
> > > > > know the length of the VLA vector then the sizes could be different
> > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> > > > > different (preventing pointwise CONVERT_EXPRs).
> > > > >
> > > > > > or from VLA vector of one type to VLA vector of other type ?
> > > > >
> > > > > That's supported though.  They work just like VLS vectors: if the sizes
> > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
> > > > > are the same then we can do pointwise conversions (e.g. element-by-element
> > > > > extensions, truncations, conversions to float, conversions to integer, etc).
> > > > >
> > > > > > Currently, if op2 is VLA, and we enter the branch:
> > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > > > > > then I think it will bail out because op2_units will not be a compile
> > > > > > time constant,
> > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false.
> > > > > >
> > > > > > The patch resolves ICE's for aarch64 cases, without regressing the
> > > > > > ones in PR106360.
> > > > > > Does it look OK ?
> > > > >
> > > > > Richi should have the final say, but it looks OK in principle to me.
> > > > > I'm not sure it's worth applying independently of the follow-on patch
> > > > > though, if that patch is in the offing anyway.
> > > > >
> > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs
> > > > > to build a new type.  Couldn't it just use the TREE_TYPE of the lhs
> > > > > instead?
> > > >
> > > > I think the point was they are not necssarily the same when we
> > > > looked through a VIEW_CONVERT?  A comment might be in order
> > > > here.
> > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of
> > > VIEW_CONVERT_EXPR. For instance in following case:
> > >
> > >   _78 = (void *) ivtmp.21_73;
> > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > >   _91 = {_92, 0};
> > >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > >   _87 = {_88, 0};
> > >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> > >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> > >
> > > simplify_permutation looks thru V_C_E, and tries to fold:
> > > res = VEC_PERM_EXPR (_87, _91, {0, 2})
> > > In this case, lhs type (V16QI) differs from res type (V2DI), and we
> > > hit assert in fold_vec_perm.
> > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only
> > when we used res_type = lhs_type.
> > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI.
> > The issue with using res_type = TREE_TYPE (arg0), comes for case when
> > arg0 is fixed length, and op2 is VLA.
> Hi,
> Shall it be OK to commit the above patch, that changes res_type to
> have same elem type as arg0, and length as op2 ?
>
> Also, shall it be OK to gate these transforms in vec_perm match.pd pattern:
> sel.series_p (0, 1, 0, 1) -> op0 and,
> sel.series_p (0, 1, nelts, 1) -> op1
> if len(type) == len(arg0) ?
ping

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > > >
> > > > Btw, please don't add new asserts that trivially hold in critical paths.
> > > Well, it didn't hold true for following case when op2 is VLA:
> > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }),
> > > because we're passing res_type == V4SI, and op2_type == VNx4SI from
> > > simplify_permutation.
> > >
> > > The vec_perm pattern hits:
> > > (if (sel.series_p (0, 1, 0, 1))
> > >   { op0; }
> > >
> > > and folds it to:
> > > lhs = {1, 2, 3, 4}
> > >
> > > which results in error during verify_gimple since lhs and rhs types
> > > differ (VNx4SI, V4SI).
> > >
> > > I suppose we want to do the transforms:
> > > sel.series_p (0, 1, 0, 1) -> op0 and,
> > > sel.series_p (0, 1, nelts, 1) -> op1
> > > only if input vectors and sel have same length ?
> > >
> > > Thanks,
> > > Prathamesh
> > >
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 330c1db0c8e..aa20cc713c5 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -7845,6 +7845,12 @@ and,
> > > >   (with
> > > >    {
> > > >      tree op0 = @0, op1 = @1, op2 = @2;
> > > > +
> > > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
> > > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))));
> > > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)),
> > > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1))));
> > > > +
> > > >      machine_mode result_mode = TYPE_MODE (type);
> > > >      machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> > > >
> > > >
> > > >
> > > > >
> > > > > Thanks,
> > > > > Richard
  
Richard Biener Sept. 5, 2022, 9:09 a.m. UTC | #20
On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
> > > > > <richard.sandiford@arm.com> wrote:
> > > > > >
> > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > >>
> > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> > > > > > >> <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > >> >
> > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > >
> > > > > > >> > >
> > > > > > >> > >   /* If result vector has greater length than input vector,
> > > > > > >> > > +     then allow permuting two vectors as long as:
> > > > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > > > >> > > +     The intent is to permute input vectors, and
> > > > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > > > >> > > +
> > > > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > > > >> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > > > >> > > +    {
> > > > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > > > >> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > > > >> > > +       return NULL_TREE;
> > > > > > >> > > +    }
> > > > > > >> > >
> > > > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > > > >> > > explicitely only the case of a period that's same as the
> > > > > > >> > > element count in the input vectors.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > > > > > >> > > node, int spc, dump_flags_t flags,
> > > > > > >> > >                 pp_space (pp);
> > > > > > >> > >               }
> > > > > > >> > >           }
> > > > > > >> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > > > >> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > > > >> > > +         pp_string (pp, ", ... ");
> > > > > > >> > >         pp_right_brace (pp);
> > > > > > >> > >
> > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > > > >> > Well, it got created for the following case after folding:
> > > > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > > > >> > {
> > > > > > >> >   int32x4_t v = {a, b, c, d};
> > > > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > >> > }
> > > > > > >> >
> > > > > > >> > The svld1rq_s32 call gets folded to:
> > > > > > >> > v = {a, b, c, d}
> > > > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
> > > > > > >> >
> > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > > > >> > need_ctor is thus true:
> > > > > > >> > lhs = {a, b, c, d, ...}
> > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > > > >>
> > > > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > > > >> but do we actually make sure to do this duplication?
> > > > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > > > constructor looks pretty bad.
> > > > > > > Should we avoid folding VLA constructors for now ?
> > > > > >
> > > > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > > > we even use that though (perhaps we do and I've forgotten).
> > > > > >
> > > > > > > I guess these are 2 different issues:
> > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > > > > >
> > > > > > > For (a), I think the issue with using:
> > > > > > > res_type = gimple_assign_lhs (stmt)
> > > > > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > > > > if we go thru
> > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > > > > in fold_vec_perm.
> > > > > > >
> > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > > > > > > following semantics:
> > > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > > > > >
> > > > > > Yeah.
> > > > > >
> > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > > > > >                                                 TYPE_VECTOR_SUBPARTS (op2))
> > > > > > > so it has same element type as arg0 (and arg1) and len of op2.
> > > > > > > Does that look reasonable ?
> > > > > > >
> > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > > > > > width vectors
> > > > > > > with len(lhs_type) being a multiple of len(res_type).
> > > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> > > > > >
> > > > > > Yes, that's not supported as a cast.  If the compiler knows the
> > > > > > length of the "VLA" vector then it's not VLA.  If it doesn't
> > > > > > know the length of the VLA vector then the sizes could be different
> > > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> > > > > > different (preventing pointwise CONVERT_EXPRs).
> > > > > >
> > > > > > > or from VLA vector of one type to VLA vector of other type ?
> > > > > >
> > > > > > That's supported though.  They work just like VLS vectors: if the sizes
> > > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
> > > > > > are the same then we can do pointwise conversions (e.g. element-by-element
> > > > > > extensions, truncations, conversions to float, conversions to integer, etc).
> > > > > >
> > > > > > > Currently, if op2 is VLA, and we enter the branch:
> > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > > > > > > then I think it will bail out because op2_units will not be a compile
> > > > > > > time constant,
> > > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false.
> > > > > > >
> > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the
> > > > > > > ones in PR106360.
> > > > > > > Does it look OK ?
> > > > > >
> > > > > > Richi should have the final say, but it looks OK in principle to me.
> > > > > > I'm not sure it's worth applying independently of the follow-on patch
> > > > > > though, if that patch is in the offing anyway.
> > > > > >
> > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs
> > > > > > to build a new type.  Couldn't it just use the TREE_TYPE of the lhs
> > > > > > instead?
> > > > >
> > > > > I think the point was they are not necssarily the same when we
> > > > > looked through a VIEW_CONVERT?  A comment might be in order
> > > > > here.
> > > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of
> > > > VIEW_CONVERT_EXPR. For instance in following case:
> > > >
> > > >   _78 = (void *) ivtmp.21_73;
> > > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > > >   _91 = {_92, 0};
> > > >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> > > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > > >   _87 = {_88, 0};
> > > >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> > > >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> > > >
> > > > simplify_permutation looks thru V_C_E, and tries to fold:
> > > > res = VEC_PERM_EXPR (_87, _91, {0, 2})
> > > > In this case, lhs type (V16QI) differs from res type (V2DI), and we
> > > > hit assert in fold_vec_perm.
> > > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only
> > > when we used res_type = lhs_type.
> > > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI.
> > > The issue with using res_type = TREE_TYPE (arg0), comes for case when
> > > arg0 is fixed length, and op2 is VLA.
> > Hi,
> > Shall it be OK to commit the above patch, that changes res_type to
> > have same elem type as arg0, and length as op2 ?
> >
> > Also, shall it be OK to gate these transforms in vec_perm match.pd pattern:
> > sel.series_p (0, 1, 0, 1) -> op0 and,
> > sel.series_p (0, 1, nelts, 1) -> op1
> > if len(type) == len(arg0) ?
> ping

I've lost track - what is the patch you are refering to?

Richard.

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > > >
> > > > > Btw, please don't add new asserts that trivially hold in critical paths.
> > > > Well, it didn't hold true for following case when op2 is VLA:
> > > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }),
> > > > because we're passing res_type == V4SI, and op2_type == VNx4SI from
> > > > simplify_permutation.
> > > >
> > > > The vec_perm pattern hits:
> > > > (if (sel.series_p (0, 1, 0, 1))
> > > >   { op0; }
> > > >
> > > > and folds it to:
> > > > lhs = {1, 2, 3, 4}
> > > >
> > > > which results in error during verify_gimple since lhs and rhs types
> > > > differ (VNx4SI, V4SI).
> > > >
> > > > I suppose we want to do the transforms:
> > > > sel.series_p (0, 1, 0, 1) -> op0 and,
> > > > sel.series_p (0, 1, nelts, 1) -> op1
> > > > only if input vectors and sel have same length ?
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index 330c1db0c8e..aa20cc713c5 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -7845,6 +7845,12 @@ and,
> > > > >   (with
> > > > >    {
> > > > >      tree op0 = @0, op1 = @1, op2 = @2;
> > > > > +
> > > > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
> > > > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))));
> > > > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)),
> > > > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1))));
> > > > > +
> > > > >      machine_mode result_mode = TYPE_MODE (type);
> > > > >      machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Richard
  
Prathamesh Kulkarni Sept. 5, 2022, 9:26 a.m. UTC | #21
On Mon, 5 Sept 2022 at 14:39, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni
> > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > >
> > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
> > > > > > <richard.sandiford@arm.com> wrote:
> > > > > > >
> > > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> > > > > > > >> <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > > >> >
> > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > >
> > > > > > > >> > >
> > > > > > > >> > >   /* If result vector has greater length than input vector,
> > > > > > > >> > > +     then allow permuting two vectors as long as:
> > > > > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > > > > >> > > +     The intent is to permute input vectors, and
> > > > > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > > > > >> > > +
> > > > > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > > > > >> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > > > > >> > > +    {
> > > > > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > > > > >> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > > > > >> > > +       return NULL_TREE;
> > > > > > > >> > > +    }
> > > > > > > >> > >
> > > > > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > > > > >> > > explicitely only the case of a period that's same as the
> > > > > > > >> > > element count in the input vectors.
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > > > > > > >> > > node, int spc, dump_flags_t flags,
> > > > > > > >> > >                 pp_space (pp);
> > > > > > > >> > >               }
> > > > > > > >> > >           }
> > > > > > > >> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > > > > >> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > > > > >> > > +         pp_string (pp, ", ... ");
> > > > > > > >> > >         pp_right_brace (pp);
> > > > > > > >> > >
> > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > > > > >> > Well, it got created for the following case after folding:
> > > > > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > > > > >> > {
> > > > > > > >> >   int32x4_t v = {a, b, c, d};
> > > > > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > >> > }
> > > > > > > >> >
> > > > > > > >> > The svld1rq_s32 call gets folded to:
> > > > > > > >> > v = {a, b, c, d}
> > > > > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
> > > > > > > >> >
> > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > > > > >> > need_ctor is thus true:
> > > > > > > >> > lhs = {a, b, c, d, ...}
> > > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > > > > >>
> > > > > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > > > > >> but do we actually make sure to do this duplication?
> > > > > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > > > > constructor looks pretty bad.
> > > > > > > > Should we avoid folding VLA constructors for now ?
> > > > > > >
> > > > > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > > > > we even use that though (perhaps we do and I've forgotten).
> > > > > > >
> > > > > > > > I guess these are 2 different issues:
> > > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > > > > > >
> > > > > > > > For (a), I think the issue with using:
> > > > > > > > res_type = gimple_assign_lhs (stmt)
> > > > > > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > > > > > if we go thru
> > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > > > > > in fold_vec_perm.
> > > > > > > >
> > > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > > > > > > > following semantics:
> > > > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > > > > > >
> > > > > > > Yeah.
> > > > > > >
> > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > > > > > >                                                 TYPE_VECTOR_SUBPARTS (op2))
> > > > > > > > so it has same element type as arg0 (and arg1) and len of op2.
> > > > > > > > Does that look reasonable ?
> > > > > > > >
> > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > > > > > > width vectors
> > > > > > > > with len(lhs_type) being a multiple of len(res_type).
> > > > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> > > > > > >
> > > > > > > Yes, that's not supported as a cast.  If the compiler knows the
> > > > > > > length of the "VLA" vector then it's not VLA.  If it doesn't
> > > > > > > know the length of the VLA vector then the sizes could be different
> > > > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> > > > > > > different (preventing pointwise CONVERT_EXPRs).
> > > > > > >
> > > > > > > > or from VLA vector of one type to VLA vector of other type ?
> > > > > > >
> > > > > > > That's supported though.  They work just like VLS vectors: if the sizes
> > > > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
> > > > > > > are the same then we can do pointwise conversions (e.g. element-by-element
> > > > > > > extensions, truncations, conversions to float, conversions to integer, etc).
> > > > > > >
> > > > > > > > Currently, if op2 is VLA, and we enter the branch:
> > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > > > > > > > then I think it will bail out because op2_units will not be a compile
> > > > > > > > time constant,
> > > > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false.
> > > > > > > >
> > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the
> > > > > > > > ones in PR106360.
> > > > > > > > Does it look OK ?
> > > > > > >
> > > > > > > Richi should have the final say, but it looks OK in principle to me.
> > > > > > > I'm not sure it's worth applying independently of the follow-on patch
> > > > > > > though, if that patch is in the offing anyway.
> > > > > > >
> > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs
> > > > > > > to build a new type.  Couldn't it just use the TREE_TYPE of the lhs
> > > > > > > instead?
> > > > > >
> > > > > > I think the point was they are not necssarily the same when we
> > > > > > looked through a VIEW_CONVERT?  A comment might be in order
> > > > > > here.
> > > > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of
> > > > > VIEW_CONVERT_EXPR. For instance in following case:
> > > > >
> > > > >   _78 = (void *) ivtmp.21_73;
> > > > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > > > >   _91 = {_92, 0};
> > > > >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> > > > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > > > >   _87 = {_88, 0};
> > > > >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> > > > >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> > > > >
> > > > > simplify_permutation looks thru V_C_E, and tries to fold:
> > > > > res = VEC_PERM_EXPR (_87, _91, {0, 2})
> > > > > In this case, lhs type (V16QI) differs from res type (V2DI), and we
> > > > > hit assert in fold_vec_perm.
> > > > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only
> > > > when we used res_type = lhs_type.
> > > > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI.
> > > > The issue with using res_type = TREE_TYPE (arg0), comes for case when
> > > > arg0 is fixed length, and op2 is VLA.
> > > Hi,
> > > Shall it be OK to commit the above patch, that changes res_type to
> > > have same elem type as arg0, and length as op2 ?
> > >
> > > Also, shall it be OK to gate these transforms in vec_perm match.pd pattern:
> > > sel.series_p (0, 1, 0, 1) -> op0 and,
> > > sel.series_p (0, 1, nelts, 1) -> op1
> > > if len(type) == len(arg0) ?
> > ping
>
> I've lost track - what is the patch you are refering to?
This one: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599586.html
Only the tree-ssa-forwprop.cc bits (the other patch extends
fold_vec_perm and we don't want asserts in vec_perm pattern).

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > > >
> > > > > > Btw, please don't add new asserts that trivially hold in critical paths.
> > > > > Well, it didn't hold true for following case when op2 is VLA:
> > > > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }),
> > > > > because we're passing res_type == V4SI, and op2_type == VNx4SI from
> > > > > simplify_permutation.
> > > > >
> > > > > The vec_perm pattern hits:
> > > > > (if (sel.series_p (0, 1, 0, 1))
> > > > >   { op0; }
> > > > >
> > > > > and folds it to:
> > > > > lhs = {1, 2, 3, 4}
> > > > >
> > > > > which results in error during verify_gimple since lhs and rhs types
> > > > > differ (VNx4SI, V4SI).
> > > > >
> > > > > I suppose we want to do the transforms:
> > > > > sel.series_p (0, 1, 0, 1) -> op0 and,
> > > > > sel.series_p (0, 1, nelts, 1) -> op1
> > > > > only if input vectors and sel have same length ?
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >
> > > > > >
> > > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > > index 330c1db0c8e..aa20cc713c5 100644
> > > > > > --- a/gcc/match.pd
> > > > > > +++ b/gcc/match.pd
> > > > > > @@ -7845,6 +7845,12 @@ and,
> > > > > >   (with
> > > > > >    {
> > > > > >      tree op0 = @0, op1 = @1, op2 = @2;
> > > > > > +
> > > > > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
> > > > > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))));
> > > > > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)),
> > > > > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1))));
> > > > > > +
> > > > > >      machine_mode result_mode = TYPE_MODE (type);
> > > > > >      machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard
  
Richard Biener Sept. 5, 2022, 11:03 a.m. UTC | #22
On Mon, Sep 5, 2022 at 11:27 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Mon, 5 Sept 2022 at 14:39, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 10:54 AM Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Mon, 29 Aug 2022 at 11:53, Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Thu, 18 Aug 2022 at 18:20, Prathamesh Kulkarni
> > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > >
> > > > > On Thu, 18 Aug 2022 at 18:14, Prathamesh Kulkarni
> > > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > > >
> > > > > > On Wed, 17 Aug 2022 at 17:01, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford
> > > > > > > <richard.sandiford@arm.com> wrote:
> > > > > > > >
> > > > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > > > > > > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > > > >>
> > > > > > > > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
> > > > > > > > >> <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > > > >> >
> > > > > > > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guenther@gmail.com> w>> > >
> > > > > > > > >> > >
> > > > > > > > >> > >   /* If result vector has greater length than input vector,
> > > > > > > > >> > > +     then allow permuting two vectors as long as:
> > > > > > > > >> > > +     a) sel.nelts_per_pattern == 1
> > > > > > > > >> > > +     b) sel.npatterns == len of input vector.
> > > > > > > > >> > > +     The intent is to permute input vectors, and
> > > > > > > > >> > > +     dup the elements in resulting vector to target vector length.  */
> > > > > > > > >> > > +
> > > > > > > > >> > > +  if (maybe_gt (TYPE_VECTOR_SUBPARTS (type),
> > > > > > > > >> > > +               TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
> > > > > > > > >> > > +    {
> > > > > > > > >> > > +      nelts = sel.encoding ().npatterns ();
> > > > > > > > >> > > +      if (sel.encoding ().nelts_per_pattern () != 1
> > > > > > > > >> > > +         || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))))
> > > > > > > > >> > > +       return NULL_TREE;
> > > > > > > > >> > > +    }
> > > > > > > > >> > >
> > > > > > > > >> > > so the only case you add is non-VLA to VLA and there
> > > > > > > > >> > > explicitely only the case of a period that's same as the
> > > > > > > > >> > > element count in the input vectors.
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree
> > > > > > > > >> > > node, int spc, dump_flags_t flags,
> > > > > > > > >> > >                 pp_space (pp);
> > > > > > > > >> > >               }
> > > > > > > > >> > >           }
> > > > > > > > >> > > +       if (VECTOR_TYPE_P (TREE_TYPE (node))
> > > > > > > > >> > > +           && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ())
> > > > > > > > >> > > +         pp_string (pp, ", ... ");
> > > > > > > > >> > >         pp_right_brace (pp);
> > > > > > > > >> > >
> > > > > > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"?  Are they?
> > > > > > > > >> > Well, it got created for the following case after folding:
> > > > > > > > >> > svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > {
> > > > > > > > >> >   int32x4_t v = {a, b, c, d};
> > > > > > > > >> >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > >> > }
> > > > > > > > >> >
> > > > > > > > >> > The svld1rq_s32 call gets folded to:
> > > > > > > > >> > v = {a, b, c, d}
> > > > > > > > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }>
> > > > > > > > >> >
> > > > > > > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to
> > > > > > > > >> > VLA constructor, since elements in v (in_elts) are not constant, and
> > > > > > > > >> > need_ctor is thus true:
> > > > > > > > >> > lhs = {a, b, c, d, ...}
> > > > > > > > >> > I added "..." to make it more explicit that it's a VLA constructor.
> > > > > > > > >>
> > > > > > > > >> But I doubt we do anything reasonable with such a beast?  Do we?
> > > > > > > > >> I suppose it's like a vec_duplicate if you view it as V1TImode
> > > > > > > > >> but do we actually make sure to do this duplication?
> > > > > > > > > I am not sure. As mentioned above, the current code-gen for VLA
> > > > > > > > > constructor looks pretty bad.
> > > > > > > > > Should we avoid folding VLA constructors for now ?
> > > > > > > >
> > > > > > > > VLA constructors aren't really a thing.  At least, the only VLA vector
> > > > > > > > you could represent with current CONSTRUCTOR nodes is a fixed-length
> > > > > > > > sequence at the start of an otherwise zero vector.  I'm not sure
> > > > > > > > we even use that though (perhaps we do and I've forgotten).
> > > > > > > >
> > > > > > > > > I guess these are 2 different issues:
> > > > > > > > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests.
> > > > > > > > > (b) Extending fold_vec_perm to handle vectors with differing lengths.
> > > > > > > > >
> > > > > > > > > For (a), I think the issue with using:
> > > > > > > > > res_type = gimple_assign_lhs (stmt)
> > > > > > > > > in previous patch, was that op2's type will change to match tgt_units,
> > > > > > > > > if we go thru
> > > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch,
> > > > > > > > > and may thus not be same as len(lhs_type) anymore, and hit the assert
> > > > > > > > > in fold_vec_perm.
> > > > > > > > >
> > > > > > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the
> > > > > > > > > following semantics:
> > > > > > > > > (1) Element types for lhs, rhs1 and rhs2 should be the same.
> > > > > > > > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2).
> > > > > > > >
> > > > > > > > Yeah.
> > > > > > > >
> > > > > > > > > The attached patch changes res_type from TREE_TYPE (arg0) to following:
> > > > > > > > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)),
> > > > > > > > >                                                 TYPE_VECTOR_SUBPARTS (op2))
> > > > > > > > > so it has same element type as arg0 (and arg1) and len of op2.
> > > > > > > > > Does that look reasonable ?
> > > > > > > > >
> > > > > > > > > If we need a cast from res_type to lhs_type, then both would be fixed
> > > > > > > > > width vectors
> > > > > > > > > with len(lhs_type) being a multiple of len(res_type).
> > > > > > > > > IIUC, we don't support casting from VLA vector to/from fixed width vector,
> > > > > > > >
> > > > > > > > Yes, that's not supported as a cast.  If the compiler knows the
> > > > > > > > length of the "VLA" vector then it's not VLA.  If it doesn't
> > > > > > > > know the length of the VLA vector then the sizes could be different
> > > > > > > > (preventing VIEW_CONVERT_EXPR) and the number of elements could be
> > > > > > > > different (preventing pointwise CONVERT_EXPRs).
> > > > > > > >
> > > > > > > > > or from VLA vector of one type to VLA vector of other type ?
> > > > > > > >
> > > > > > > > That's supported though.  They work just like VLS vectors: if the sizes
> > > > > > > > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements
> > > > > > > > are the same then we can do pointwise conversions (e.g. element-by-element
> > > > > > > > extensions, truncations, conversions to float, conversions to integer, etc).
> > > > > > > >
> > > > > > > > > Currently, if op2 is VLA, and we enter the branch:
> > > > > > > > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR)
> > > > > > > > > then I think it will bail out because op2_units will not be a compile
> > > > > > > > > time constant,
> > > > > > > > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false.
> > > > > > > > >
> > > > > > > > > The patch resolves ICE's for aarch64 cases, without regressing the
> > > > > > > > > ones in PR106360.
> > > > > > > > > Does it look OK ?
> > > > > > > >
> > > > > > > > Richi should have the final say, but it looks OK in principle to me.
> > > > > > > > I'm not sure it's worth applying independently of the follow-on patch
> > > > > > > > though, if that patch is in the offing anyway.
> > > > > > > >
> > > > > > > > I guess my only question is whether tree-ssa-forwprop.cc really needs
> > > > > > > > to build a new type.  Couldn't it just use the TREE_TYPE of the lhs
> > > > > > > > instead?
> > > > > > >
> > > > > > > I think the point was they are not necssarily the same when we
> > > > > > > looked through a VIEW_CONVERT?  A comment might be in order
> > > > > > > here.
> > > > > > Yes, the issue is when rhs1 and rhs2 of VEC_PERM_EXPR are a result of
> > > > > > VIEW_CONVERT_EXPR. For instance in following case:
> > > > > >
> > > > > >   _78 = (void *) ivtmp.21_73;
> > > > > >   _92 = MEM <unsigned long> [(uint8_t *)_78];
> > > > > >   _91 = {_92, 0};
> > > > > >   vect__1.6_90 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_91);
> > > > > >   _88 = MEM <unsigned long> [(uint8_t *)_78 + 16B];
> > > > > >   _87 = {_88, 0};
> > > > > >   vect__1.7_86 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(_87);
> > > > > >   vect__1.8_85 = VEC_PERM_EXPR <vect__1.6_90, vect__1.7_86, { 0, 1, 2,
> > > > > > 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 }>;
> > > > > >
> > > > > > simplify_permutation looks thru V_C_E, and tries to fold:
> > > > > > res = VEC_PERM_EXPR (_87, _91, {0, 2})
> > > > > > In this case, lhs type (V16QI) differs from res type (V2DI), and we
> > > > > > hit assert in fold_vec_perm.
> > > > > Oops sorry, just to clarify -- this hit assert in fold_vec_perm only
> > > > > when we used res_type = lhs_type.
> > > > > It works as-is, with res_type = TREE_TYPE (arg0) since arg0 type is V2DI.
> > > > > The issue with using res_type = TREE_TYPE (arg0), comes for case when
> > > > > arg0 is fixed length, and op2 is VLA.
> > > > Hi,
> > > > Shall it be OK to commit the above patch, that changes res_type to
> > > > have same elem type as arg0, and length as op2 ?
> > > >
> > > > Also, shall it be OK to gate these transforms in vec_perm match.pd pattern:
> > > > sel.series_p (0, 1, 0, 1) -> op0 and,
> > > > sel.series_p (0, 1, nelts, 1) -> op1
> > > > if len(type) == len(arg0) ?
> > > ping
> >
> > I've lost track - what is the patch you are refering to?
> This one: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599586.html
> Only the tree-ssa-forwprop.cc bits (the other patch extends
> fold_vec_perm and we don't want asserts in vec_perm pattern).

Yes, the tree-ssa-forwprop.cc hunk of that patch is OK.

Thanks,
Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > > >
> > > > > > > Btw, please don't add new asserts that trivially hold in critical paths.
> > > > > > Well, it didn't hold true for following case when op2 is VLA:
> > > > > > lhs = VEC_PERM_EXPR ({1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ... }),
> > > > > > because we're passing res_type == V4SI, and op2_type == VNx4SI from
> > > > > > simplify_permutation.
> > > > > >
> > > > > > The vec_perm pattern hits:
> > > > > > (if (sel.series_p (0, 1, 0, 1))
> > > > > >   { op0; }
> > > > > >
> > > > > > and folds it to:
> > > > > > lhs = {1, 2, 3, 4}
> > > > > >
> > > > > > which results in error during verify_gimple since lhs and rhs types
> > > > > > differ (VNx4SI, V4SI).
> > > > > >
> > > > > > I suppose we want to do the transforms:
> > > > > > sel.series_p (0, 1, 0, 1) -> op0 and,
> > > > > > sel.series_p (0, 1, nelts, 1) -> op1
> > > > > > only if input vectors and sel have same length ?
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > >
> > > > > > >
> > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > > > index 330c1db0c8e..aa20cc713c5 100644
> > > > > > > --- a/gcc/match.pd
> > > > > > > +++ b/gcc/match.pd
> > > > > > > @@ -7845,6 +7845,12 @@ and,
> > > > > > >   (with
> > > > > > >    {
> > > > > > >      tree op0 = @0, op1 = @1, op2 = @2;
> > > > > > > +
> > > > > > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type),
> > > > > > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2))));
> > > > > > > +    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)),
> > > > > > > +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1))));
> > > > > > > +
> > > > > > >      machine_mode result_mode = TYPE_MODE (type);
> > > > > > >      machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard
  

Patch

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 69567ab3275..be888f1c48e 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2414,6 +2414,9 @@  simplify_permutation (gimple_stmt_iterator *gsi)
   if (TREE_CODE (op2) != VECTOR_CST)
     return 0;
 
+  if (!types_compatible_p (TREE_TYPE (gimple_get_lhs (stmt)), TREE_TYPE (op0)))
+    return 0;
+
   if (TREE_CODE (op0) == VECTOR_CST)
     {
       code = VECTOR_CST;