[PATCH/RFC] PR tree-optimization/96912: Recognize VEC_COND_EXPR in match.pd

Message ID 005701d86ea5$dfc134f0$9f439ed0$@nextmovesoftware.com

Commit Message

Roger Sayle May 23, 2022, 1:06 p.m. UTC
Hi Richard,
I was wondering what you think of the following patch as a solution to
PR tree-optimization/96912, i.e. the ability to recognize pblendvb from
regular code rather than as a target-specific builtin?

The obvious point of contention is that the current middle-end philosophy
around vector expressions is that the middle-end should continually check
for backend support, whereas I prefer the "old school" view that trees
are an abstraction and that RTL expansion is the point where these abstract
operations get converted/lowered into instructions supported by the target.
[The exceptions being built-in functions, IFN_*, etc.]  Should tree.texi
document which tree codes can't be used without checking the backend?


Bootstrapped and regression tested, but this obviously depends upon RTL
expansion being able to perform the inverse operation/lowering if required.


2022-05-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR tree-optimization/96912
	* match.pd (vector_mask_p): New predicate to identify vectors
	where every element must be zero or all ones.
	(bit_xor (bit_and (bit_xor ...) ...) ...): Recognize a VEC_COND_EXPR
	expressed as logical vector operations.

gcc/testsuite/ChangeLog
	PR tree-optimization/96912
	* gcc.target/i386/pr96912.c: New test case.


Thoughts?  How would you solve this PR?  Are there convenience predicates
for testing whether a target supports vec_cond_expr, vec_duplicate, etc?

Cheers,
Roger
--

/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -msse4" } */

typedef char V __attribute__((vector_size(16)));
typedef long long W __attribute__((vector_size(16)));

W
foo (W x, W y, V m)
{
  W t = (m < 0);
  return (~t & x) | (t & y);
}

V
bar (V x, V y, V m)
{
  V t = (m < 0);
  return (~t & x) | (t & y);
}

/* { dg-final { scan-assembler-times "pblend" 2 } } */
  

Comments

Richard Biener May 23, 2022, 1:35 p.m. UTC | #1
On Mon, May 23, 2022 at 3:06 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Richard,
> I was wondering what you think of the following patch as a solution to
> PR tree-optimization/96912, i.e. the ability to recognize pblendvb from
> regular code rather than as a target specific builtin?
>
> The obvious point of contention is that the current middle-end philosophy
> around vector expressions is that the middle-end should continually check
> for backend support, whereas I prefer the "old school" view that trees
> are an abstraction and that RTL expansion is the point where these abstract
> operations get converted/lowered into instructions supported by the target.

That's still true for trees aka GENERIC, but GIMPLE is more like RTL here.
Note that before vector lowering GIMPLE is like GENERIC, but afterwards the
"RTL expansion" of vectors is considered done.  I _think_ this was done
to get the opportunity to optimize the lowered code (and maybe cheat
by doing that lowering sub-optimally) given that user-written "generic"
vector code tends to run into limitations.

> [The exceptions being built-in functions, IFN_*, etc.]  Should tree.texi
> document which tree codes can't be used without checking the backend?

I suppose yes, but it's not really "which tree codes" but which
types.  Even for PLUS_EXPR you have to check for target support
when vector types are involved.

Note that when being too lazy before vector lowering you could end up
transforming previously supported IL into unsupported IL, triggering
vector lowering to perform elementwise operations and severely slowing
down the code.  That is why you might find checks like
if (target supports new code || target didn't support old code)

> Bootstrapped and regression tested, but this obviously depends upon RTL
> expansion being able to perform the inverse operation/lowering if required.

So the case in question "historically" was a task for RTL combine.  If
we now bring that to GIMPLE we should indeed verify if the target
can efficiently(!) do the operation we like to use.  In this particular
case it would be vec_cond_mask support for the created VEC_COND_EXPR.
We also have to avoid doing this after ISEL.
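
For reference, a minimal sketch of that support test, assuming GCC's
internal optab machinery; the helper name target_has_vcond_mask_p is
invented for illustration:

/* Hypothetical helper: does the target have a vcond_mask pattern for
   selecting DATA_TYPE lanes under a MASK_TYPE selector?  vcond_mask_optab
   is a conversion optab keyed on both the data mode and the mask mode.  */
static bool
target_has_vcond_mask_p (tree data_type, tree mask_type)
{
  machine_mode mode = TYPE_MODE (data_type);
  machine_mode mask_mode = TYPE_MODE (mask_type);
  return VECTOR_MODE_P (mode)
         && (convert_optab_handler (vcond_mask_optab, mode, mask_mode)
             != CODE_FOR_nothing);
}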

Note all original types are data types, while you need a mask type for
the selector, which in turn means you will almost never match
unless you hit the

+(match vector_mask_p
+ VECTOR_CST@0
+ (if (integer_zerop (@0) || integer_all_onesp (@0))))

case?

+(simplify
+ (bit_xor:c (bit_and:c (bit_xor:c @0 @1) (view_convert vector_mask_p@2)) @0)
+ (if (VECTOR_TYPE_P (type)
+      && VECTOR_TYPE_P (TREE_TYPE (@2)))
+  (with { tree masktype = truth_type_for (TREE_TYPE (@2));

I think you want to check truth_type_for (type) instead, and check that
you can V_C_E @2 to it by verifying that it's a vector mode and the same
as the truth_type's mode.

+          tree vecttype = maybe_ne (TYPE_VECTOR_SUBPARTS (masktype),
+                                   TYPE_VECTOR_SUBPARTS (type))
+                         ? unsigned_type_for (masktype)
+                         : type; }
+   (view_convert (vec_cond:vecttype (view_convert:masktype @2)
+                                   (view_convert:vecttype @1)
+                                   (view_convert:vecttype @0))))))

and then have

    (vec_cond (view_convert:masktype @2) @1 @0)

Richard.

>
> 2022-05-23  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR tree-optimization/96912
>         * match.pd (vector_mask_p): New predicate to identify vectors
>         where every element must be zero or all ones.
>         (bit_xor (bit_and (bit_xor ...) ...) ...): Recognize a VEC_COND_EXPR
>         expressed as logical vector operations.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/96912
>         * gcc.target/i386/pr96912.c: New test case.
>
>
> Thoughts?  How would you solve this PR?  Are there convenience predicates
> for testing whether a target supports vec_cond_expr, vec_duplicate, etc?
>
> Cheers,
> Roger
> --
>
  
Roger Sayle May 23, 2022, 2:27 p.m. UTC | #2
Hi Richard,

Currently for pr96912, we end up with:

W foo (W x, W y, V m)
{
  W t;
  vector(16) <signed-boolean:8> _1;
  vector(16) signed char _2;
  W _7;
  vector(2) long long int _9;
  vector(2) long long int _10;

  <bb 2> [local count: 1073741824]:
  _1 = m_3(D) < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _2 = VIEW_CONVERT_EXPR<vector(16) signed char>(_1);
  t_4 = VIEW_CONVERT_EXPR<W>(_2);
  _9 = x_5(D) ^ y_6(D);
  _10 = t_4 & _9;
  _7 = x_5(D) ^ _10;
  return _7;
}

The mask mode is V16QI and the data mode is V2DI (please forgive my RTL terminology).
The assignment t_4 view-converts the mask into the "data type" for the bitwise
operations.  To use x86's pblendvb, the "vec_cond_expr" operation needs to be in the
mask's mode (V16QI) rather than the data's mode (V2DI).  Hence the unsigned_type_for
of the truth_type_for [if they have different NUNITS].  Obviously, converting the
mask to V2DI and performing a V2DI blend won't produce the same result.
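
To see why, here is a standalone sketch (plain C, no GCC internals; the
per-lane decision is a simplification of VEC_COND_EXPR semantics): with an
alternating byte mask, the bitwise idiom selects individual bytes, which no
64-bit-lane select can reproduce.

#include <stdio.h>
#include <stdint.h>

int
main (void)
{
  uint8_t x[16], y[16], t[16], byte_sel[16], wide_sel[16];
  for (int i = 0; i < 16; i++)
    {
      x[i] = i;
      y[i] = 0xf0 | i;
      t[i] = (i & 1) ? 0xff : 0x00;  /* alternating byte mask */
    }

  /* What (~t & x) | (t & y) computes: a per-byte (V16QI) selection.  */
  for (int i = 0; i < 16; i++)
    byte_sel[i] = (t[i] & y[i]) | (~t[i] & x[i]);

  /* What a V2DI vec_cond does: each whole 64-bit lane comes from x or y;
     bytes within a lane cannot be mixed.  */
  for (int lane = 0; lane < 2; lane++)
    for (int j = 0; j < 8; j++)
      wide_sel[8 * lane + j]
        = t[8 * lane] ? y[8 * lane + j] : x[8 * lane + j];

  for (int i = 0; i < 16; i++)
    if (byte_sel[i] != wide_sel[i])
      printf ("byte %2d: per-byte %02x vs per-lane %02x\n",
              i, byte_sel[i], wide_sel[i]);
  return 0;
}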

The most useful clause of vector_mask_p is actually the VECTOR_BOOLEAN_TYPE_P
test that catches all "mask types", such as those that result from vector comparisons.
Note that vector_mask_p is tested against the operand of a view_convert expr.
The remaining cases handle truth_*_expr-like operations on those comparisons.
One follow-up patch is to additionally allow VIEW_CONVERT_EXPR if both source
and destination are VECTOR_TYPE_P with known_eq TYPE_VECTOR_SUBPARTS.
Likewise, a cst_vector_mask_p could check each element rather than the catch-all
"integer_zerop || integer_all_onesp".

I agree with gimple-isel replacing VEC_COND_EXPR when it's supported in hardware,
just like .FMA is inserted to replace the universal MULT_EXPR/PLUS_EXPR tree codes;
the question is whether vec_cond_expr (and vec_duplicate) can always be expanded
moderately efficiently by the middle-end.  For one thing, we have vector cost models
that should be able to distinguish between efficient and inefficient implementations.
And a vec_duplicate expander for SPARC should be able to generate an optimal
sequence of instructions, even if there isn't native hardware (instruction) support.
For example, scalar multiplication by 0x0101010101010101 may be a good way to
vec_duplicate QI mode to V8QI mode (via DI mode), at least with -Os.  As you've
mentioned, the VEC_PERM infrastructure should be useful here.
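
For instance, a standalone illustration of that multiplication trick
(plain C, not from the patch; the function name is invented):

#include <stdio.h>
#include <stdint.h>

/* Broadcast one byte into all eight bytes of a 64-bit word, i.e. a
   QImode -> V8QImode vec_duplicate via DImode, using one multiply.  */
static uint64_t
dup8 (uint8_t b)
{
  return (uint64_t) b * 0x0101010101010101ULL;
}

int
main (void)
{
  printf ("%016llx\n", (unsigned long long) dup8 (0xab));  /* abababababababab */
  return 0;
}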

[p.s. it's unfortunate that some of my patches appear controversial.  By randomly
selecting Bugzilla (regression) PRs that have been open for a long time, I seem to be
selecting/enriching for bugs for which there's no simple solution, and that maintainers
have already thought about for a long time without coming up with a satisfactory solution.]

Thanks again for your assistance/direction.
Cheers,
Roger
--

Richard Biener May 24, 2022, 10:10 a.m. UTC | #3
On Mon, May 23, 2022 at 4:27 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Richard,
>
> Currently for pr96912, we end up with:
>
> W foo (W x, W y, V m)
> {
>   W t;
>   vector(16) <signed-boolean:8> _1;
>   vector(16) signed char _2;
>   W _7;
>   vector(2) long long int _9;
>   vector(2) long long int _10;
>
>   <bb 2> [local count: 1073741824]:
>   _1 = m_3(D) < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _2 = VIEW_CONVERT_EXPR<vector(16) signed char>(_1);
>   t_4 = VIEW_CONVERT_EXPR<W>(_2);
>   _9 = x_5(D) ^ y_6(D);
>   _10 = t_4 & _9;
>   _7 = x_5(D) ^ _10;
>   return _7;
> }
>
> The mask mode is V16QI and the data mode is V2DI (please forgive my RTL terminology).
> The assignment t_4 view-converts the mask into the "data type" for the bitwise
> operations.  To use x86's pblendvb, the "vec_cond_expr" operation needs to be in the
> mask's mode (V16QI) rather than the data's mode (V2DI).  Hence the unsigned_type_for
> of the truth_type_for [if they have different NUNITS].  Obviously, converting the
> mask to V2DI and performing a V2DI blend won't produce the same result.

OK, so that's because pblendvb doesn't produce the bit-operation result but
instead does element-wise selection.  So you rely on the all-zero or all-ones
constant matching, or on the VECTOR_BOOLEAN_TYPE_P guarantee that this is a
suitable mask, to get at the "element" granularity, since from the data modes
in the bit operations there's no way to guess that (which might also be the
reason why it's hard to do this at the RTL level).  Commentary above the
pattern should explain this, I think.

The recursive (and, via the bit-ops, exponential) matching of this is a bit
awkward.  Since you match (view_convert vector_mask_p@2), isn't it enough to
require a VECTOR_BOOLEAN_TYPE_P here?  That should then eventually be good
enough to simply use TREE_TYPE (@2) as the data type without further checking,
and thus

 (view_convert (vec_cond @2 (view_convert:mask_type @1)
                            (view_convert:mask_type @0)))

>
> The most useful clause of vector_mask_p is actually the VECTOR_BOOLEAN_TYPE_P
> test that catches all "mask types", such as those that result from vector comparisons.
> Note that vector_mask_p is tested against the operand of a view_convert expr.
> The remaining cases handle truth_*_expr-like operations on those comparisons.

But those should still be VECTOR_BOOLEAN_TYPE_P?

> One follow-up patch is to additionally allow VIEW_CONVERT_EXPR if both source
> and destination are VECTOR_TYPE_P with known_eq TYPE_VECTOR_SUBPARTS.
> Likewise, a cst_vector_mask_p could check each element rather than the catch-all
> "integer_zerop || integer_all_onesp".

Yes, you'd want to know whether the constant bit pattern covers a vector mode
(and which one), making each lane all-ones or all-zeros.
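
A standalone sketch of that test (plain C; the helper name is invented):
given a byte-serialized constant, check whether it is a valid selection mask
at a given lane width, i.e. every lane is entirely 0x00 or entirely 0xff.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool
mask_at_lane_width_p (const uint8_t *bytes, size_t nbytes, size_t lane_bytes)
{
  for (size_t i = 0; i < nbytes; i += lane_bytes)
    {
      if (bytes[i] != 0x00 && bytes[i] != 0xff)
        return false;
      for (size_t j = 1; j < lane_bytes; j++)  /* lane must be homogeneous */
        if (bytes[i + j] != bytes[i])
          return false;
    }
  return true;
}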

> I agree with gimple-isel replacing VEC_COND_EXPR when it's supported in hardware,
> just like .FMA is inserted to replace the universal MULT_EXPR/PLUS_EXPR tree codes;
> the question is whether vec_cond_expr (and vec_duplicate) can always be expanded
> moderately efficiently by the middle-end.  For one thing, we have vector cost models
> that should be able to distinguish between efficient and inefficient implementations.
> And a vec_duplicate expander for SPARC should be able to generate an optimal
> sequence of instructions, even if there isn't native hardware (instruction) support.
> For example, scalar multiplication by 0x0101010101010101 may be a good way to
> vec_duplicate QI mode to V8QI mode (via DI mode), at least with -Os.  As you've
> mentioned, the VEC_PERM infrastructure should be useful here.

If it can be done efficiently with alternate code sequences, then in the
current setup vector lowering would be the place that does this.  There's
somewhat of a consensus that we want to move away from RTL expansion doing
all the "clever" things toward doing them on (late) GIMPLE, so that RTL
expansion can eventually rely on optabs only.

I'm not saying that's going to win; likewise, the current vector lowering is
both sub-optimal (relying on followup optimization) and placed too early
(because of this).

In theory vector match.pd patterns could condition themselves on
optimize_vectors_before_lowering_p (); alternatively, they have to check
target support.  There are now plenty of passes between vectorization and
vector lowering, though, and we probably should not mess with the
vectorizer-produced code (which is supported by the target), so that we do
not run into the sub-optimal vector lowering implementation.  Possibly vector
lowering should be done earlier, before loop opts (before PRE).
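
A sketch of that guard idiom (GCC internals; target_has_vcond_mask_p is the
hypothetical helper sketched earlier, while optimize_vectors_before_lowering_p
is GCC's existing predicate for "vector lowering hasn't run yet"):

/* Allow creating a VEC_COND_EXPR either while generic vector ops are
   still acceptable (vector lowering will clean up after us) or when
   the target is known to support the masked selection directly.  */
static bool
vec_cond_transform_allowed_p (tree data_type, tree mask_type)
{
  return optimize_vectors_before_lowering_p ()
         || target_has_vcond_mask_p (data_type, mask_type);
}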

> [p.s. it's unfortunate that some of my patches appear controversial.  By randomly
> selecting Bugzilla (regression) PRs that have been open for a long time, I seem to be
> selecting/enriching for bugs for which there's no simple solution, and that maintainers
> have already thought about for a long time but without coming up a satisfactory solution].

You are simply hitting areas where GCC isn't nicely designed and where the
obviously optimal solution isn't clear :/

> Thanks again for your assistance/direction.
> Cheers,
> Roger
> --
>

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index c2fed9b..e365f28 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4221,6 +4221,35 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (integer_all_onesp (@1) && integer_zerop (@2))
     @0))))
 
+/* An integer vector where every element must be 0 or -1.  */
+(match vector_mask_p
+ @0
+ (if (VECTOR_BOOLEAN_TYPE_P (type))))
+(match vector_mask_p
+ VECTOR_CST@0
+ (if (integer_zerop (@0) || integer_all_onesp (@0))))
+(match vector_mask_p
+ (vec_cond @0 vector_mask_p@1 vector_mask_p@2))
+(match vector_mask_p
+ (bit_not vector_mask_p@0))
+(for op (bit_and bit_ior bit_xor)
+ (match vector_mask_p
+  (op vector_mask_p@0 vector_mask_p@1)))
+
+/* Recognize VEC_COND_EXPR.  */
+(simplify
+ (bit_xor:c (bit_and:c (bit_xor:c @0 @1) (view_convert vector_mask_p@2)) @0)
+ (if (VECTOR_TYPE_P (type)
+      && VECTOR_TYPE_P (TREE_TYPE (@2)))
+  (with { tree masktype = truth_type_for (TREE_TYPE (@2));
+          tree vecttype = maybe_ne (TYPE_VECTOR_SUBPARTS (masktype),
+				    TYPE_VECTOR_SUBPARTS (type))
+			  ? unsigned_type_for (masktype)
+			  : type; }
+   (view_convert (vec_cond:vecttype (view_convert:masktype @2)
+				    (view_convert:vecttype @1)
+				    (view_convert:vecttype @0))))))
+
 /* A few simplifications of "a ? CST1 : CST2". */
 /* NOTE: Only do this on gimple as the if-chain-to-switch
    optimization depends on the gimple to have if statements in it. */