[GCC13] Don't force side effects for hardware vector element broadcast

Message ID alpine.DEB.2.20.2201251820040.11348@tpp.orcam.me.uk
State Dropped
Headers
Series [GCC13] Don't force side effects for hardware vector element broadcast |

Commit Message

Maciej W. Rozycki Jan. 27, 2022, 12:16 p.m. UTC
  Do not mark vector element broadcast resulting from OpenCL operations as 
having side effects for targets that have a suitable hardware operation, 
so that the `vec_duplicateM' standard RTL pattern can be directly used 
for them.  This does not happen currently, because any RTL pattern named 
`vec_duplicateM' cannot express the side effects presumed, and therefore 
a `!TREE_SIDE_EFFECTS' condition placed in `store_constructor' correctly 
prevents any `vec_duplicateM' pattern from being used.

This side-effect annotation was introduced for the C frontend with: 
<https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01075.html>, and then: 
<https://gcc.gnu.org/ml/gcc-patches/2011-08/msg00891.html> ("Scalar 
vector binary operation"), and finally landed with commit 0e3a99ae913c 
("c-typeck.c (scalar_to_vector): New function."), along with original 
support for scalar-to-vector OpenCL operations.

Support for these operations was then gradually added to C++ with: 
<https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01557.html>, and then: 
<https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00778.html> ("[C++] Mixed 
scalar-vector operations"), landing with commit a212e43fca22 ("re PR 
c++/54427 (Expose more vector extensions)"), followed by: 
<https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01665.html>, and then: 
<https://gcc.gnu.org/ml/gcc-patches/2012-10/msg02253.html> ("[C++] 
Handle ?: for vectors"), landing with commit 93100c6b5b45 ("re PR 
c++/54427 (Expose more vector extensions)").

And side-effect annotation for C++ was retrofitted, again gradually, 
with: <https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00254.html> ("[C++] 
Missing save_expr in vector-scalar ops"), landing with commit 
6698175d1591 ("typeck.c (cp_build_binary_op): Call save_expr before 
build_vector_from_val."), and then: 
<https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00233.html> ("vector 
conditional expression with scalar arguments"), landing with commit 
07298ffd6f7c ("call.c (build_conditional_expr_1): Handle the case with 
1 vector and 2 scalars.")

This was all long before we gained support for the `vec_duplicateM' 
standard RTL pattern, with commit be4c1d4a42c5 ("Add VEC_DUPLICATE_EXPR 
and associated optab") and it may have had sense for where the element 
broadcast operation was always open-coded.  However where provided by 
hardware the element broadcast operation is just like any other vector 
operation, presumably cheap, and whether to repeat it or not should be 
based on the cost of the operation and not hardwired.

So there is no point in artificially marking the operation as having a 
side effect for the lone purpose of preventing repetition.  This does 
not currently prevent targets like the Aarch64 and MIPS ones from using 
the hardware element broadcast operation, but it requires the backend to 
wire it explicitly via the `vec_initMN' standard RTL pattern rather than 
relying on the middle end to pick it up via the `vec_duplicateM' pattern 
automatically.  This change lifts this limitation.

	gcc/
	* c/c-typeck.cc (build_binary_op) Do not annotate vector element 
	broadcast operations as having a side effect if there is such an 
	operation provided by the backend.
	* cp/call.cc (build_conditional_expr): Likewise.
	* cp/typeck.cc (cp_build_binary_op): Likewise.
---
Hi,

 I've come across this while working on an out-of-tree microarchitecture, 
which does not require a `vec_initMN' RTL pattern, however has hardware 
support suitable for cheap `vec_duplicateM'.  I could not persuade the 
middle end to make use of the new `vec_duplicateM' pattern however until I 
removed the side-effect anbnotation.  So I think this is a move in the 
right direction even if we do not have support for said microarchitecture 
at the moment.

 For some reason I could not persuade our current Aarch64 code to use the 
`vec_duplicateM' pattern directly either.  This is because at the time 
`store_constructor' is called the vector mode for the element broadcast 
operation is say `V4SFmode' (for a vector of floats), however there is no 
corresponding `vec_duplicatev4sf' pattern.  Code in the `vec_initv4sfsf' 
pattern does the right thing though, referring to `aarch64_simd_dupv4sf' 
by hand instead.  There is a `vec_duplicatevnx4sf' pattern corresponding 
to `VNx4SFmode', but I was unable to figure out how to activate it.

 This change has passed regression-testing with the `aarch64-linux-gnu' 
target (with `-march=armv9-a') and QEMU in the user emulation mode across 
all the GCC languages and libraries.

 NB there are indentation issues with code surrounding places I have 
modified, but I have chosen not to address them so as not to obfuscate 
this change.  Likewise there are unnecessary compound statements used in 
the switch statements in the C++ frontend parts modified.

 Any questions or comments?  Otherwise OK once GCC 13 has opened?

  Maciej
---
 gcc/c/c-typeck.cc |    9 +++++++--
 gcc/cp/call.cc    |   19 +++++++++++++++----
 gcc/cp/typeck.cc  |    9 +++++++--
 3 files changed, 29 insertions(+), 8 deletions(-)

gcc-scalar-to-vector-no-side-effect.diff
  

Comments

Richard Biener Jan. 27, 2022, 1:19 p.m. UTC | #1
On Thu, Jan 27, 2022 at 1:16 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> Do not mark vector element broadcast resulting from OpenCL operations as
> having side effects for targets that have a suitable hardware operation,
> so that the `vec_duplicateM' standard RTL pattern can be directly used
> for them.  This does not happen currently, because any RTL pattern named
> `vec_duplicateM' cannot express the side effects presumed, and therefore
> a `!TREE_SIDE_EFFECTS' condition placed in `store_constructor' correctly
> prevents any `vec_duplicateM' pattern from being used.
>
> This side-effect annotation was introduced for the C frontend with:
> <https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01075.html>, and then:
> <https://gcc.gnu.org/ml/gcc-patches/2011-08/msg00891.html> ("Scalar
> vector binary operation"), and finally landed with commit 0e3a99ae913c
> ("c-typeck.c (scalar_to_vector): New function."), along with original
> support for scalar-to-vector OpenCL operations.
>
> Support for these operations was then gradually added to C++ with:
> <https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01557.html>, and then:
> <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00778.html> ("[C++] Mixed
> scalar-vector operations"), landing with commit a212e43fca22 ("re PR
> c++/54427 (Expose more vector extensions)"), followed by:
> <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01665.html>, and then:
> <https://gcc.gnu.org/ml/gcc-patches/2012-10/msg02253.html> ("[C++]
> Handle ?: for vectors"), landing with commit 93100c6b5b45 ("re PR
> c++/54427 (Expose more vector extensions)").
>
> And side-effect annotation for C++ was retrofitted, again gradually,
> with: <https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00254.html> ("[C++]
> Missing save_expr in vector-scalar ops"), landing with commit
> 6698175d1591 ("typeck.c (cp_build_binary_op): Call save_expr before
> build_vector_from_val."), and then:
> <https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00233.html> ("vector
> conditional expression with scalar arguments"), landing with commit
> 07298ffd6f7c ("call.c (build_conditional_expr_1): Handle the case with
> 1 vector and 2 scalars.")
>
> This was all long before we gained support for the `vec_duplicateM'
> standard RTL pattern, with commit be4c1d4a42c5 ("Add VEC_DUPLICATE_EXPR
> and associated optab") and it may have had sense for where the element
> broadcast operation was always open-coded.  However where provided by
> hardware the element broadcast operation is just like any other vector
> operation, presumably cheap, and whether to repeat it or not should be
> based on the cost of the operation and not hardwired.
>
> So there is no point in artificially marking the operation as having a
> side effect for the lone purpose of preventing repetition.  This does
> not currently prevent targets like the Aarch64 and MIPS ones from using
> the hardware element broadcast operation, but it requires the backend to
> wire it explicitly via the `vec_initMN' standard RTL pattern rather than
> relying on the middle end to pick it up via the `vec_duplicateM' pattern
> automatically.  This change lifts this limitation.
>
>         gcc/
>         * c/c-typeck.cc (build_binary_op) Do not annotate vector element
>         broadcast operations as having a side effect if there is such an
>         operation provided by the backend.
>         * cp/call.cc (build_conditional_expr): Likewise.
>         * cp/typeck.cc (cp_build_binary_op): Likewise.
> ---
> Hi,
>
>  I've come across this while working on an out-of-tree microarchitecture,
> which does not require a `vec_initMN' RTL pattern, however has hardware
> support suitable for cheap `vec_duplicateM'.  I could not persuade the
> middle end to make use of the new `vec_duplicateM' pattern however until I
> removed the side-effect anbnotation.  So I think this is a move in the
> right direction even if we do not have support for said microarchitecture
> at the moment.
>
>  For some reason I could not persuade our current Aarch64 code to use the
> `vec_duplicateM' pattern directly either.  This is because at the time
> `store_constructor' is called the vector mode for the element broadcast
> operation is say `V4SFmode' (for a vector of floats), however there is no
> corresponding `vec_duplicatev4sf' pattern.  Code in the `vec_initv4sfsf'
> pattern does the right thing though, referring to `aarch64_simd_dupv4sf'
> by hand instead.  There is a `vec_duplicatevnx4sf' pattern corresponding
> to `VNx4SFmode', but I was unable to figure out how to activate it.
>
>  This change has passed regression-testing with the `aarch64-linux-gnu'
> target (with `-march=armv9-a') and QEMU in the user emulation mode across
> all the GCC languages and libraries.
>
>  NB there are indentation issues with code surrounding places I have
> modified, but I have chosen not to address them so as not to obfuscate
> this change.  Likewise there are unnecessary compound statements used in
> the switch statements in the C++ frontend parts modified.
>
>  Any questions or comments?  Otherwise OK once GCC 13 has opened?
>
>   Maciej
> ---
>  gcc/c/c-typeck.cc |    9 +++++++--
>  gcc/cp/call.cc    |   19 +++++++++++++++----
>  gcc/cp/typeck.cc  |    9 +++++++--
>  3 files changed, 29 insertions(+), 8 deletions(-)
>
> gcc-scalar-to-vector-no-side-effect.diff
> Index: gcc/gcc/c/c-typeck.cc
> ===================================================================
> --- gcc.orig/gcc/c/c-typeck.cc
> +++ gcc/gcc/c/c-typeck.cc
> @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
>  #include "gomp-constants.h"
>  #include "spellcheck-tree.h"
>  #include "gcc-rich-location.h"
> +#include "optabs-query.h"
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "asan.h"
> @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en
>                bool maybe_const = true;
>                tree sc;
>                sc = c_fully_fold (op0, false, &maybe_const);
> -              sc = save_expr (sc);
> +             if (optab_handler (vec_duplicate_optab,
> +                                TYPE_MODE (type1)) == CODE_FOR_nothing)
> +               sc = save_expr (sc);

This doesn't make much sense - I suppose the CONSTRUCTOR retains
TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE
and thus should have been cleared during gimplification or in the end
ignored by RTL expansion.

You do not add a testcase so it's hard to see what goes wrong where exactly.

Richard.

>                sc = convert (TREE_TYPE (type1), sc);
>                op0 = build_vector_from_val (type1, sc);
>                if (!maybe_const)
> @@ -11938,7 +11941,9 @@ build_binary_op (location_t location, en
>               bool maybe_const = true;
>               tree sc;
>               sc = c_fully_fold (op1, false, &maybe_const);
> -             sc = save_expr (sc);
> +             if (optab_handler (vec_duplicate_optab,
> +                                TYPE_MODE (type0)) == CODE_FOR_nothing)
> +               sc = save_expr (sc);
>               sc = convert (TREE_TYPE (type0), sc);
>               op1 = build_vector_from_val (type0, sc);
>               if (!maybe_const)
> Index: gcc/gcc/cp/call.cc
> ===================================================================
> --- gcc.orig/gcc/cp/call.cc
> +++ gcc/gcc/cp/call.cc
> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.
>  #include "langhooks.h"
>  #include "c-family/c-objc.h"
>  #include "internal-fn.h"
> +#include "optabs-query.h"
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "gcc-rich-location.h"
> @@ -5435,12 +5436,18 @@ build_conditional_expr (const op_locatio
>               return error_mark_node;
>             }
>
> +         bool have_vec_duplicate
> +           = optab_handler (vec_duplicate_optab,
> +                            TYPE_MODE (vtype)) != CODE_FOR_nothing;
> +
>           arg2 = cp_convert (stype, arg2, complain);
> -         arg2 = save_expr (arg2);
> +         if (!have_vec_duplicate)
> +           arg2 = save_expr (arg2);
>           arg2 = build_vector_from_val (vtype, arg2);
>           arg2_type = vtype;
>           arg3 = cp_convert (stype, arg3, complain);
> -         arg3 = save_expr (arg3);
> +         if (!have_vec_duplicate)
> +           arg3 = save_expr (arg3);
>           arg3 = build_vector_from_val (vtype, arg3);
>           arg3_type = vtype;
>         }
> @@ -5458,7 +5465,9 @@ build_conditional_expr (const op_locatio
>                 return error_mark_node;
>               case stv_firstarg:
>                 {
> -                 arg2 = save_expr (arg2);
> +                 if (optab_handler (vec_duplicate_optab,
> +                                    TYPE_MODE (arg3_type)) == CODE_FOR_nothing)
> +                   arg2 = save_expr (arg2);
>                   arg2 = convert (TREE_TYPE (arg3_type), arg2);
>                   arg2 = build_vector_from_val (arg3_type, arg2);
>                   arg2_type = TREE_TYPE (arg2);
> @@ -5466,7 +5475,9 @@ build_conditional_expr (const op_locatio
>                 }
>               case stv_secondarg:
>                 {
> -                 arg3 = save_expr (arg3);
> +                 if (optab_handler (vec_duplicate_optab,
> +                                    TYPE_MODE (arg2_type)) == CODE_FOR_nothing)
> +                   arg3 = save_expr (arg3);
>                   arg3 = convert (TREE_TYPE (arg2_type), arg3);
>                   arg3 = build_vector_from_val (arg2_type, arg3);
>                   arg3_type = TREE_TYPE (arg3);
> Index: gcc/gcc/cp/typeck.cc
> ===================================================================
> --- gcc.orig/gcc/cp/typeck.cc
> +++ gcc/gcc/cp/typeck.cc
> @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.
>  #include "c-family/c-objc.h"
>  #include "c-family/c-ubsan.h"
>  #include "gcc-rich-location.h"
> +#include "optabs-query.h"
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "asan.h"
> @@ -5066,7 +5067,9 @@ cp_build_binary_op (const op_location_t
>            case stv_firstarg:
>              {
>                op0 = convert (TREE_TYPE (type1), op0);
> -             op0 = save_expr (op0);
> +             if (optab_handler (vec_duplicate_optab,
> +                                TYPE_MODE (type1)) == CODE_FOR_nothing)
> +               op0 = save_expr (op0);
>                op0 = build_vector_from_val (type1, op0);
>                type0 = TREE_TYPE (op0);
>                code0 = TREE_CODE (type0);
> @@ -5076,7 +5079,9 @@ cp_build_binary_op (const op_location_t
>            case stv_secondarg:
>              {
>                op1 = convert (TREE_TYPE (type0), op1);
> -             op1 = save_expr (op1);
> +             if (optab_handler (vec_duplicate_optab,
> +                                TYPE_MODE (type0)) == CODE_FOR_nothing)
> +               op1 = save_expr (op1);
>                op1 = build_vector_from_val (type0, op1);
>                type1 = TREE_TYPE (op1);
>                code1 = TREE_CODE (type1);
  
Maciej W. Rozycki Jan. 27, 2022, 1:59 p.m. UTC | #2
On Thu, 27 Jan 2022, Richard Biener wrote:

> > Index: gcc/gcc/c/c-typeck.cc
> > ===================================================================
> > --- gcc.orig/gcc/c/c-typeck.cc
> > +++ gcc/gcc/c/c-typeck.cc
> > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
> >  #include "gomp-constants.h"
> >  #include "spellcheck-tree.h"
> >  #include "gcc-rich-location.h"
> > +#include "optabs-query.h"
> >  #include "stringpool.h"
> >  #include "attribs.h"
> >  #include "asan.h"
> > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en
> >                bool maybe_const = true;
> >                tree sc;
> >                sc = c_fully_fold (op0, false, &maybe_const);
> > -              sc = save_expr (sc);
> > +             if (optab_handler (vec_duplicate_optab,
> > +                                TYPE_MODE (type1)) == CODE_FOR_nothing)
> > +               sc = save_expr (sc);
> 
> This doesn't make much sense - I suppose the CONSTRUCTOR retains
> TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE
> and thus should have been cleared during gimplification or in the end
> ignored by RTL expansion.

 This is how the expression built here eventually looks in 
`store_constructor':

(gdb) print exp
$41 = <constructor 0x7ffff5c06ba0>
(gdb) pt
 <constructor 0x7ffff5c06ba0
    type <vector_type 0x7ffff5e7ea48 v4sf
        type <real_type 0x7ffff5cf1260 float sizes-gimplified SF
            size <integer_cst 0x7ffff5c00f78 constant 32>
            unit-size <integer_cst 0x7ffff5c00f90 constant 4>
            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32
            pointer_to_this <pointer_type 0x7ffff5cf1848>>
        sizes-gimplified V4SF
        size <integer_cst 0x7ffff5c00d80 constant 128>
        unit-size <integer_cst 0x7ffff5c00d98 constant 16>
        align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>>
    side-effects length:4
    val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
        visited var <parm_decl 0x7ffff5f00080 y>
        def_stmt GIMPLE_NOP
        version:2>
    val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
        visited var <parm_decl 0x7ffff5f00080 y>
        def_stmt GIMPLE_NOP
        version:2>
    val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
        visited var <parm_decl 0x7ffff5f00080 y>
        def_stmt GIMPLE_NOP
        version:2>
    val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
        visited var <parm_decl 0x7ffff5f00080 y>
        def_stmt GIMPLE_NOP
        version:2>>
(gdb)

The `side-effects' flag prevents this conditional from executing:

	/* Try using vec_duplicate_optab for uniform vectors.  */
	if (!TREE_SIDE_EFFECTS (exp)
	    && VECTOR_MODE_P (mode)
	    && eltmode == GET_MODE_INNER (mode)
	    && ((icode = optab_handler (vec_duplicate_optab, mode))
		!= CODE_FOR_nothing)
	    && (elt = uniform_vector_p (exp))
	    && !VECTOR_TYPE_P (TREE_TYPE (elt)))
	  {
	    class expand_operand ops[2];
	    create_output_operand (&ops[0], target, mode);
	    create_input_operand (&ops[1], expand_normal (elt), eltmode);
	    expand_insn (icode, 2, ops);
	    if (!rtx_equal_p (target, ops[0].value))
	      emit_move_insn (target, ops[0].value);
	    break;
	  }

I don't know what's supposed to clear the flag (and what the purpose of 
setting it in the first place would be then).

> You do not add a testcase so it's hard to see what goes wrong where exactly.

 This only happens with an out-of-tree microarchitecture and I was unable 
to find an in-tree target that ever uses the `vec_duplicateM' standard RTL 
pattern, so I wasn't able to produce a test case that would trigger with 
upstream code.  Code is essentially like:

typedef float v4sf __attribute__ ((vector_size (16)));

v4sf
odd_even (v4sf x, float y)
{
  return x + f;
}

 I considered the Aarch64 one the most likely candidate as it is the one
commit be4c1d4a42c5 has been for, but as I noted in the description it 
does not appear to use it either.  It uses `vec_initMN' instead and does 
the broadcast by hand in the backend.

 Maybe I'm missing something.

  Maciej
  
Richard Biener Jan. 27, 2022, 2:22 p.m. UTC | #3
On Thu, Jan 27, 2022 at 2:59 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> On Thu, 27 Jan 2022, Richard Biener wrote:
>
> > > Index: gcc/gcc/c/c-typeck.cc
> > > ===================================================================
> > > --- gcc.orig/gcc/c/c-typeck.cc
> > > +++ gcc/gcc/c/c-typeck.cc
> > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
> > >  #include "gomp-constants.h"
> > >  #include "spellcheck-tree.h"
> > >  #include "gcc-rich-location.h"
> > > +#include "optabs-query.h"
> > >  #include "stringpool.h"
> > >  #include "attribs.h"
> > >  #include "asan.h"
> > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en
> > >                bool maybe_const = true;
> > >                tree sc;
> > >                sc = c_fully_fold (op0, false, &maybe_const);
> > > -              sc = save_expr (sc);
> > > +             if (optab_handler (vec_duplicate_optab,
> > > +                                TYPE_MODE (type1)) == CODE_FOR_nothing)
> > > +               sc = save_expr (sc);
> >
> > This doesn't make much sense - I suppose the CONSTRUCTOR retains
> > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE
> > and thus should have been cleared during gimplification or in the end
> > ignored by RTL expansion.
>
>  This is how the expression built here eventually looks in
> `store_constructor':
>
> (gdb) print exp
> $41 = <constructor 0x7ffff5c06ba0>
> (gdb) pt
>  <constructor 0x7ffff5c06ba0
>     type <vector_type 0x7ffff5e7ea48 v4sf
>         type <real_type 0x7ffff5cf1260 float sizes-gimplified SF
>             size <integer_cst 0x7ffff5c00f78 constant 32>
>             unit-size <integer_cst 0x7ffff5c00f90 constant 4>
>             align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32
>             pointer_to_this <pointer_type 0x7ffff5cf1848>>
>         sizes-gimplified V4SF
>         size <integer_cst 0x7ffff5c00d80 constant 128>
>         unit-size <integer_cst 0x7ffff5c00d98 constant 16>
>         align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>>
>     side-effects length:4
>     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
>         visited var <parm_decl 0x7ffff5f00080 y>
>         def_stmt GIMPLE_NOP
>         version:2>
>     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
>         visited var <parm_decl 0x7ffff5f00080 y>
>         def_stmt GIMPLE_NOP
>         version:2>
>     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
>         visited var <parm_decl 0x7ffff5f00080 y>
>         def_stmt GIMPLE_NOP
>         version:2>
>     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
>         visited var <parm_decl 0x7ffff5f00080 y>
>         def_stmt GIMPLE_NOP
>         version:2>>
> (gdb)
>
> The `side-effects' flag prevents this conditional from executing:
>
>         /* Try using vec_duplicate_optab for uniform vectors.  */
>         if (!TREE_SIDE_EFFECTS (exp)
>             && VECTOR_MODE_P (mode)
>             && eltmode == GET_MODE_INNER (mode)
>             && ((icode = optab_handler (vec_duplicate_optab, mode))
>                 != CODE_FOR_nothing)
>             && (elt = uniform_vector_p (exp))
>             && !VECTOR_TYPE_P (TREE_TYPE (elt)))
>           {
>             class expand_operand ops[2];
>             create_output_operand (&ops[0], target, mode);
>             create_input_operand (&ops[1], expand_normal (elt), eltmode);
>             expand_insn (icode, 2, ops);
>             if (!rtx_equal_p (target, ops[0].value))
>               emit_move_insn (target, ops[0].value);
>             break;
>           }
>
> I don't know what's supposed to clear the flag (and what the purpose of
> setting it in the first place would be then).

It's probably safe to remove the !TREE_SIDE_EFFECTS check above
but already gimplification should have made sure all side-effects are
pushed to separate stmts.  gimplifiation usually calls recompute_side_effects
but that doesn't seem to touch CONSTRUCTORs.  But I do remember fixing
some spurious TREE_SIDE_EFFECTS on CTORs before.

Might be worth verifying in verify_gimple_assign_single that CTORs
do not have TREE_SIDE_EFFECTS set (unless this is a clobber).

>
> > You do not add a testcase so it's hard to see what goes wrong where exactly.
>
>  This only happens with an out-of-tree microarchitecture and I was unable
> to find an in-tree target that ever uses the `vec_duplicateM' standard RTL
> pattern, so I wasn't able to produce a test case that would trigger with
> upstream code.  Code is essentially like:
>
> typedef float v4sf __attribute__ ((vector_size (16)));
>
> v4sf
> odd_even (v4sf x, float y)
> {
>   return x + f;
> }
>
>  I considered the Aarch64 one the most likely candidate as it is the one
> commit be4c1d4a42c5 has been for, but as I noted in the description it
> does not appear to use it either.  It uses `vec_initMN' instead and does
> the broadcast by hand in the backend.
>
>  Maybe I'm missing something.
>
>   Maciej
  
Maciej W. Rozycki Jan. 27, 2022, 7:14 p.m. UTC | #4
On Thu, 27 Jan 2022, Richard Biener wrote:

> > > > Index: gcc/gcc/c/c-typeck.cc
> > > > ===================================================================
> > > > --- gcc.orig/gcc/c/c-typeck.cc
> > > > +++ gcc/gcc/c/c-typeck.cc
> > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
> > > >  #include "gomp-constants.h"
> > > >  #include "spellcheck-tree.h"
> > > >  #include "gcc-rich-location.h"
> > > > +#include "optabs-query.h"
> > > >  #include "stringpool.h"
> > > >  #include "attribs.h"
> > > >  #include "asan.h"
> > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en
> > > >                bool maybe_const = true;
> > > >                tree sc;
> > > >                sc = c_fully_fold (op0, false, &maybe_const);
> > > > -              sc = save_expr (sc);
> > > > +             if (optab_handler (vec_duplicate_optab,
> > > > +                                TYPE_MODE (type1)) == CODE_FOR_nothing)
> > > > +               sc = save_expr (sc);
> > >
> > > This doesn't make much sense - I suppose the CONSTRUCTOR retains
> > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE
> > > and thus should have been cleared during gimplification or in the end
> > > ignored by RTL expansion.
> >
> >  This is how the expression built here eventually looks in
> > `store_constructor':
> >
> > (gdb) print exp
> > $41 = <constructor 0x7ffff5c06ba0>
> > (gdb) pt
> >  <constructor 0x7ffff5c06ba0
> >     type <vector_type 0x7ffff5e7ea48 v4sf
> >         type <real_type 0x7ffff5cf1260 float sizes-gimplified SF
> >             size <integer_cst 0x7ffff5c00f78 constant 32>
> >             unit-size <integer_cst 0x7ffff5c00f90 constant 4>
> >             align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32
> >             pointer_to_this <pointer_type 0x7ffff5cf1848>>
> >         sizes-gimplified V4SF
> >         size <integer_cst 0x7ffff5c00d80 constant 128>
> >         unit-size <integer_cst 0x7ffff5c00d98 constant 16>
> >         align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>>
> >     side-effects length:4
> >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> >         visited var <parm_decl 0x7ffff5f00080 y>
> >         def_stmt GIMPLE_NOP
> >         version:2>
> >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> >         visited var <parm_decl 0x7ffff5f00080 y>
> >         def_stmt GIMPLE_NOP
> >         version:2>
> >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> >         visited var <parm_decl 0x7ffff5f00080 y>
> >         def_stmt GIMPLE_NOP
> >         version:2>
> >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> >         visited var <parm_decl 0x7ffff5f00080 y>
> >         def_stmt GIMPLE_NOP
> >         version:2>>
> > (gdb)
> >
> > The `side-effects' flag prevents this conditional from executing:
> >
> >         /* Try using vec_duplicate_optab for uniform vectors.  */
> >         if (!TREE_SIDE_EFFECTS (exp)
> >             && VECTOR_MODE_P (mode)
> >             && eltmode == GET_MODE_INNER (mode)
> >             && ((icode = optab_handler (vec_duplicate_optab, mode))
> >                 != CODE_FOR_nothing)
> >             && (elt = uniform_vector_p (exp))
> >             && !VECTOR_TYPE_P (TREE_TYPE (elt)))
> >           {
> >             class expand_operand ops[2];
> >             create_output_operand (&ops[0], target, mode);
> >             create_input_operand (&ops[1], expand_normal (elt), eltmode);
> >             expand_insn (icode, 2, ops);
> >             if (!rtx_equal_p (target, ops[0].value))
> >               emit_move_insn (target, ops[0].value);
> >             break;
> >           }
> >
> > I don't know what's supposed to clear the flag (and what the purpose of
> > setting it in the first place would be then).
> 
> It's probably safe to remove the !TREE_SIDE_EFFECTS check above
> but already gimplification should have made sure all side-effects are
> pushed to separate stmts.  gimplifiation usually calls recompute_side_effects
> but that doesn't seem to touch CONSTRUCTORs.  But I do remember fixing
> some spurious TREE_SIDE_EFFECTS on CTORs before.
> 
> Might be worth verifying in verify_gimple_assign_single that CTORs
> do not have TREE_SIDE_EFFECTS set (unless this is a clobber).

 OK, so maybe there's another bug somewhere that causes the side-effects 
flag not to be cleared where expected, however I an inconvinced as to 
withdrawing my original point.  That is why treat code like:

v4sf
odd_even (v4sf x, float y)
{
  return x + f;
}

effectively like:

v4sf
odd_even (v4sf x, volatile float y)
{
  return x + f;
}

which I infer from the terse justification in the discussions referred is 
the sole purpose of making use of `save_expr' here, also for targets that 
have a cheap (or free if combined with another operation) `vec_duplicateM' 
machine operation?

 While removing the !TREE_SIDE_EFFECTS check may cause `vec_duplicateM' to 
be used, the middle end will still ensure the element broadcast operation 
won't be repeated, e.g. at the cost of consuming a temporary register to 
carry a vector of identical elements, where it may not be the least costly 
approach.  Where we have an actual `vec_duplicateM' insn we can use its 
cost to determine the best approach, can't we?

 Am I still missing something?

  Maciej
  
Richard Biener Jan. 28, 2022, 12:22 p.m. UTC | #5
On Thu, Jan 27, 2022 at 8:14 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> On Thu, 27 Jan 2022, Richard Biener wrote:
>
> > > > > Index: gcc/gcc/c/c-typeck.cc
> > > > > ===================================================================
> > > > > --- gcc.orig/gcc/c/c-typeck.cc
> > > > > +++ gcc/gcc/c/c-typeck.cc
> > > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
> > > > >  #include "gomp-constants.h"
> > > > >  #include "spellcheck-tree.h"
> > > > >  #include "gcc-rich-location.h"
> > > > > +#include "optabs-query.h"
> > > > >  #include "stringpool.h"
> > > > >  #include "attribs.h"
> > > > >  #include "asan.h"
> > > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en
> > > > >                bool maybe_const = true;
> > > > >                tree sc;
> > > > >                sc = c_fully_fold (op0, false, &maybe_const);
> > > > > -              sc = save_expr (sc);
> > > > > +             if (optab_handler (vec_duplicate_optab,
> > > > > +                                TYPE_MODE (type1)) == CODE_FOR_nothing)
> > > > > +               sc = save_expr (sc);
> > > >
> > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains
> > > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE
> > > > and thus should have been cleared during gimplification or in the end
> > > > ignored by RTL expansion.
> > >
> > >  This is how the expression built here eventually looks in
> > > `store_constructor':
> > >
> > > (gdb) print exp
> > > $41 = <constructor 0x7ffff5c06ba0>
> > > (gdb) pt
> > >  <constructor 0x7ffff5c06ba0
> > >     type <vector_type 0x7ffff5e7ea48 v4sf
> > >         type <real_type 0x7ffff5cf1260 float sizes-gimplified SF
> > >             size <integer_cst 0x7ffff5c00f78 constant 32>
> > >             unit-size <integer_cst 0x7ffff5c00f90 constant 4>
> > >             align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32
> > >             pointer_to_this <pointer_type 0x7ffff5cf1848>>
> > >         sizes-gimplified V4SF
> > >         size <integer_cst 0x7ffff5c00d80 constant 128>
> > >         unit-size <integer_cst 0x7ffff5c00d98 constant 16>
> > >         align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>>
> > >     side-effects length:4
> > >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> > >         visited var <parm_decl 0x7ffff5f00080 y>
> > >         def_stmt GIMPLE_NOP
> > >         version:2>
> > >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> > >         visited var <parm_decl 0x7ffff5f00080 y>
> > >         def_stmt GIMPLE_NOP
> > >         version:2>
> > >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> > >         visited var <parm_decl 0x7ffff5f00080 y>
> > >         def_stmt GIMPLE_NOP
> > >         version:2>
> > >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> > >         visited var <parm_decl 0x7ffff5f00080 y>
> > >         def_stmt GIMPLE_NOP
> > >         version:2>>
> > > (gdb)
> > >
> > > The `side-effects' flag prevents this conditional from executing:
> > >
> > >         /* Try using vec_duplicate_optab for uniform vectors.  */
> > >         if (!TREE_SIDE_EFFECTS (exp)
> > >             && VECTOR_MODE_P (mode)
> > >             && eltmode == GET_MODE_INNER (mode)
> > >             && ((icode = optab_handler (vec_duplicate_optab, mode))
> > >                 != CODE_FOR_nothing)
> > >             && (elt = uniform_vector_p (exp))
> > >             && !VECTOR_TYPE_P (TREE_TYPE (elt)))
> > >           {
> > >             class expand_operand ops[2];
> > >             create_output_operand (&ops[0], target, mode);
> > >             create_input_operand (&ops[1], expand_normal (elt), eltmode);
> > >             expand_insn (icode, 2, ops);
> > >             if (!rtx_equal_p (target, ops[0].value))
> > >               emit_move_insn (target, ops[0].value);
> > >             break;
> > >           }
> > >
> > > I don't know what's supposed to clear the flag (and what the purpose of
> > > setting it in the first place would be then).
> >
> > It's probably safe to remove the !TREE_SIDE_EFFECTS check above
> > but already gimplification should have made sure all side-effects are
> > pushed to separate stmts.  gimplifiation usually calls recompute_side_effects
> > but that doesn't seem to touch CONSTRUCTORs.  But I do remember fixing
> > some spurious TREE_SIDE_EFFECTS on CTORs before.
> >
> > Might be worth verifying in verify_gimple_assign_single that CTORs
> > do not have TREE_SIDE_EFFECTS set (unless this is a clobber).
>
>  OK, so maybe there's another bug somewhere that causes the side-effects
> flag not to be cleared where expected, however I an inconvinced as to
> withdrawing my original point.  That is why treat code like:
>
> v4sf
> odd_even (v4sf x, float y)
> {
>   return x + f;
> }
>
> effectively like:
>
> v4sf
> odd_even (v4sf x, volatile float y)
> {
>   return x + f;
> }

that's not what it does.  It treats it like

  float tem = f;
  return x + { tem, tem, tem, tem };

avoiding, like for x + (1.0f + f) creating

  return x + { 1.0+f, 1.0+f, 1.0+f ...}

it's more CSE than volatile qualifying.

> which I infer from the terse justification in the discussions referred is
> the sole purpose of making use of `save_expr' here, also for targets that
> have a cheap (or free if combined with another operation) `vec_duplicateM'
> machine operation?

Because the IL from the frontends should not depend on target capabilities
and whether we have to preserve side-effects properly doesn't depend on
the cheapness of the operation itself.  Consider

  return x + bar (f);

you definitely want bar(f) to be only evaluated once, even when the
target can cheaply do the splat.

Richard.

>  While removing the !TREE_SIDE_EFFECTS check may cause `vec_duplicateM' to
> be used, the middle end will still ensure the element broadcast operation
> won't be repeated, e.g. at the cost of consuming a temporary register to
> carry a vector of identical elements, where it may not be the least costly
> approach.  Where we have an actual `vec_duplicateM' insn we can use its
> cost to determine the best approach, can't we?
>
>  Am I still missing something?
>
>   Maciej
  
Richard Biener Jan. 28, 2022, 12:25 p.m. UTC | #6
On Fri, Jan 28, 2022 at 1:22 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Jan 27, 2022 at 8:14 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
> >
> > On Thu, 27 Jan 2022, Richard Biener wrote:
> >
> > > > > > Index: gcc/gcc/c/c-typeck.cc
> > > > > > ===================================================================
> > > > > > --- gcc.orig/gcc/c/c-typeck.cc
> > > > > > +++ gcc/gcc/c/c-typeck.cc
> > > > > > @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
> > > > > >  #include "gomp-constants.h"
> > > > > >  #include "spellcheck-tree.h"
> > > > > >  #include "gcc-rich-location.h"
> > > > > > +#include "optabs-query.h"
> > > > > >  #include "stringpool.h"
> > > > > >  #include "attribs.h"
> > > > > >  #include "asan.h"
> > > > > > @@ -11923,7 +11924,9 @@ build_binary_op (location_t location, en
> > > > > >                bool maybe_const = true;
> > > > > >                tree sc;
> > > > > >                sc = c_fully_fold (op0, false, &maybe_const);
> > > > > > -              sc = save_expr (sc);
> > > > > > +             if (optab_handler (vec_duplicate_optab,
> > > > > > +                                TYPE_MODE (type1)) == CODE_FOR_nothing)
> > > > > > +               sc = save_expr (sc);
> > > > >
> > > > > This doesn't make much sense - I suppose the CONSTRUCTOR retains
> > > > > TREE_SIDE_EFFECTS but such flag has no meaning on GIMPLE
> > > > > and thus should have been cleared during gimplification or in the end
> > > > > ignored by RTL expansion.
> > > >
> > > >  This is how the expression built here eventually looks in
> > > > `store_constructor':
> > > >
> > > > (gdb) print exp
> > > > $41 = <constructor 0x7ffff5c06ba0>
> > > > (gdb) pt
> > > >  <constructor 0x7ffff5c06ba0
> > > >     type <vector_type 0x7ffff5e7ea48 v4sf
> > > >         type <real_type 0x7ffff5cf1260 float sizes-gimplified SF
> > > >             size <integer_cst 0x7ffff5c00f78 constant 32>
> > > >             unit-size <integer_cst 0x7ffff5c00f90 constant 4>
> > > >             align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5cf1260 precision:32
> > > >             pointer_to_this <pointer_type 0x7ffff5cf1848>>
> > > >         sizes-gimplified V4SF
> > > >         size <integer_cst 0x7ffff5c00d80 constant 128>
> > > >         unit-size <integer_cst 0x7ffff5c00d98 constant 16>
> > > >         align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5d19648 nunits:4 context <translation_unit_decl 0x7ffff5ec0bb8 v4sf-dup.c>>
> > > >     side-effects length:4
> > > >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> > > >         visited var <parm_decl 0x7ffff5f00080 y>
> > > >         def_stmt GIMPLE_NOP
> > > >         version:2>
> > > >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> > > >         visited var <parm_decl 0x7ffff5f00080 y>
> > > >         def_stmt GIMPLE_NOP
> > > >         version:2>
> > > >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> > > >         visited var <parm_decl 0x7ffff5f00080 y>
> > > >         def_stmt GIMPLE_NOP
> > > >         version:2>
> > > >     val <ssa_name 0x7ffff5cb0dc8 type <real_type 0x7ffff5cf1260 float>
> > > >         visited var <parm_decl 0x7ffff5f00080 y>
> > > >         def_stmt GIMPLE_NOP
> > > >         version:2>>
> > > > (gdb)
> > > >
> > > > The `side-effects' flag prevents this conditional from executing:
> > > >
> > > >         /* Try using vec_duplicate_optab for uniform vectors.  */
> > > >         if (!TREE_SIDE_EFFECTS (exp)
> > > >             && VECTOR_MODE_P (mode)
> > > >             && eltmode == GET_MODE_INNER (mode)
> > > >             && ((icode = optab_handler (vec_duplicate_optab, mode))
> > > >                 != CODE_FOR_nothing)
> > > >             && (elt = uniform_vector_p (exp))
> > > >             && !VECTOR_TYPE_P (TREE_TYPE (elt)))
> > > >           {
> > > >             class expand_operand ops[2];
> > > >             create_output_operand (&ops[0], target, mode);
> > > >             create_input_operand (&ops[1], expand_normal (elt), eltmode);
> > > >             expand_insn (icode, 2, ops);
> > > >             if (!rtx_equal_p (target, ops[0].value))
> > > >               emit_move_insn (target, ops[0].value);
> > > >             break;
> > > >           }
> > > >
> > > > I don't know what's supposed to clear the flag (and what the purpose of
> > > > setting it in the first place would be then).
> > >
> > > It's probably safe to remove the !TREE_SIDE_EFFECTS check above
> > > but already gimplification should have made sure all side-effects are
> > > pushed to separate stmts.  gimplifiation usually calls recompute_side_effects
> > > but that doesn't seem to touch CONSTRUCTORs.  But I do remember fixing
> > > some spurious TREE_SIDE_EFFECTS on CTORs before.
> > >
> > > Might be worth verifying in verify_gimple_assign_single that CTORs
> > > do not have TREE_SIDE_EFFECTS set (unless this is a clobber).
> >
> >  OK, so maybe there's another bug somewhere that causes the side-effects
> > flag not to be cleared where expected, however I an inconvinced as to
> > withdrawing my original point.  That is why treat code like:
> >
> > v4sf
> > odd_even (v4sf x, float y)
> > {
> >   return x + f;
> > }
> >
> > effectively like:
> >
> > v4sf
> > odd_even (v4sf x, volatile float y)
> > {
> >   return x + f;
> > }
>
> that's not what it does.  It treats it like
>
>   float tem = f;
>   return x + { tem, tem, tem, tem };
>
> avoiding, like for x + (1.0f + f) creating
>
>   return x + { 1.0+f, 1.0+f, 1.0+f ...}
>
> it's more CSE than volatile qualifying.
>
> > which I infer from the terse justification in the discussions referred is
> > the sole purpose of making use of `save_expr' here, also for targets that
> > have a cheap (or free if combined with another operation) `vec_duplicateM'
> > machine operation?
>
> Because the IL from the frontends should not depend on target capabilities
> and whether we have to preserve side-effects properly doesn't depend on
> the cheapness of the operation itself.  Consider
>
>   return x + bar (f);
>
> you definitely want bar(f) to be only evaluated once, even when the
> target can cheaply do the splat.

Btw,

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index efd10332c53..c0f7d98931d 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -4703,6 +4703,12 @@ verify_gimple_assign_single (gassign *stmt)
          debug_generic_stmt (rhs1);
          return true;
        }
+      if (TREE_SIDE_EFFECTS (rhs1) && !gimple_clobber_p (stmt))
+       {
+         error ("%qs with side-effects", code_name);
+         debug_generic_stmt (rhs1);
+         return true;
+       }
       return res;

     case ASSERT_EXPR:

does not cause ICEs on the two testcases (on trunk).

Richard.

>
> Richard.
>
> >  While removing the !TREE_SIDE_EFFECTS check may cause `vec_duplicateM' to
> > be used, the middle end will still ensure the element broadcast operation
> > won't be repeated, e.g. at the cost of consuming a temporary register to
> > carry a vector of identical elements, where it may not be the least costly
> > approach.  Where we have an actual `vec_duplicateM' insn we can use its
> > cost to determine the best approach, can't we?
> >
> >  Am I still missing something?
> >
> >   Maciej
  
Maciej W. Rozycki Jan. 28, 2022, 9:52 p.m. UTC | #7
On Fri, 28 Jan 2022, Richard Biener wrote:

> > that's not what it does.  It treats it like
> >
> >   float tem = f;
> >   return x + { tem, tem, tem, tem };
> >
> > avoiding, like for x + (1.0f + f) creating
> >
> >   return x + { 1.0+f, 1.0+f, 1.0+f ...}
> >
> > it's more CSE than volatile qualifying.

 I see, thanks for your time to explain me.  I got this confused.

> > Because the IL from the frontends should not depend on target capabilities
> > and whether we have to preserve side-effects properly doesn't depend on
> > the cheapness of the operation itself.  Consider
> >
> >   return x + bar (f);
> >
> > you definitely want bar(f) to be only evaluated once, even when the
> > target can cheaply do the splat.

 Indeed.

> Btw,
> 
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index efd10332c53..c0f7d98931d 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -4703,6 +4703,12 @@ verify_gimple_assign_single (gassign *stmt)
>           debug_generic_stmt (rhs1);
>           return true;
>         }
> +      if (TREE_SIDE_EFFECTS (rhs1) && !gimple_clobber_p (stmt))
> +       {
> +         error ("%qs with side-effects", code_name);
> +         debug_generic_stmt (rhs1);
> +         return true;
> +       }
>        return res;
> 
>      case ASSERT_EXPR:
> 
> does not cause ICEs on the two testcases (on trunk).

 Right, so it has turned out I had the wrong binary run under GDB, sigh.  
I have re-verified the current trunk and indeed the side-effect annotation 
has gone:

(gdb) frame
#0  store_constructor (exp=<constructor 0x7ffff5b36e88>,
    target=0x7ffff5b388e0, cleared=0, size=..., reverse=false)
    at .../gcc/expr.cc:7169
7169		if (!TREE_SIDE_EFFECTS (exp)
(gdb) pt exp
 <constructor 0x7ffff5b36e88
    type <vector_type 0x7ffff5e23678 v4sf
        type <real_type 0x7ffff5c41260 float sizes-gimplified SF
            size <integer_cst 0x7ffff5b31050 constant 32>
            unit-size <integer_cst 0x7ffff5b31068 constant 4>
            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5c41260 precision:32
            pointer_to_this <pointer_type 0x7ffff5c41848>>
        sizes-gimplified V4SF
        size <integer_cst 0x7ffff5b30e58 constant 128>
        unit-size <integer_cst 0x7ffff5b30e70 constant 16>
        align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff5c6d740 nunits:4 context <translation_unit_decl 0x7ffff5e908e8 v4sf.c>>
    length:4
    val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float>
        visited var <parm_decl 0x7ffff7f80100 y>
        def_stmt GIMPLE_NOP
        version:2>
    val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float>
        visited var <parm_decl 0x7ffff7f80100 y>
        def_stmt GIMPLE_NOP
        version:2>
    val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float>
        visited var <parm_decl 0x7ffff7f80100 y>
        def_stmt GIMPLE_NOP
        version:2>
    val <ssa_name 0x7ffff5bd1200 type <real_type 0x7ffff5c41260 float>
        visited var <parm_decl 0x7ffff7f80100 y>
        def_stmt GIMPLE_NOP
        version:2>>
(gdb)

and I have tracked down commit 512429a885e8 ("tree-optimization/99863 - 
clear vector CTOR TREE_SIDE_EFFECTS") of yours to be the change required.

 Thank you for your assistance!

  Maciej
  

Patch

Index: gcc/gcc/c/c-typeck.cc
===================================================================
--- gcc.orig/gcc/c/c-typeck.cc
+++ gcc/gcc/c/c-typeck.cc
@@ -49,6 +49,7 @@  along with GCC; see the file COPYING3.
 #include "gomp-constants.h"
 #include "spellcheck-tree.h"
 #include "gcc-rich-location.h"
+#include "optabs-query.h"
 #include "stringpool.h"
 #include "attribs.h"
 #include "asan.h"
@@ -11923,7 +11924,9 @@  build_binary_op (location_t location, en
               bool maybe_const = true;
               tree sc;
               sc = c_fully_fold (op0, false, &maybe_const);
-              sc = save_expr (sc);
+	      if (optab_handler (vec_duplicate_optab,
+				 TYPE_MODE (type1)) == CODE_FOR_nothing)
+		sc = save_expr (sc);
               sc = convert (TREE_TYPE (type1), sc);
               op0 = build_vector_from_val (type1, sc);
               if (!maybe_const)
@@ -11938,7 +11941,9 @@  build_binary_op (location_t location, en
 	      bool maybe_const = true;
 	      tree sc;
 	      sc = c_fully_fold (op1, false, &maybe_const);
-	      sc = save_expr (sc);
+	      if (optab_handler (vec_duplicate_optab,
+				 TYPE_MODE (type0)) == CODE_FOR_nothing)
+		sc = save_expr (sc);
 	      sc = convert (TREE_TYPE (type0), sc);
 	      op1 = build_vector_from_val (type0, sc);
 	      if (!maybe_const)
Index: gcc/gcc/cp/call.cc
===================================================================
--- gcc.orig/gcc/cp/call.cc
+++ gcc/gcc/cp/call.cc
@@ -39,6 +39,7 @@  along with GCC; see the file COPYING3.
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "internal-fn.h"
+#include "optabs-query.h"
 #include "stringpool.h"
 #include "attribs.h"
 #include "gcc-rich-location.h"
@@ -5435,12 +5436,18 @@  build_conditional_expr (const op_locatio
 	      return error_mark_node;
 	    }
 
+	  bool have_vec_duplicate
+	    = optab_handler (vec_duplicate_optab,
+			     TYPE_MODE (vtype)) != CODE_FOR_nothing;
+
 	  arg2 = cp_convert (stype, arg2, complain);
-	  arg2 = save_expr (arg2);
+	  if (!have_vec_duplicate)
+	    arg2 = save_expr (arg2);
 	  arg2 = build_vector_from_val (vtype, arg2);
 	  arg2_type = vtype;
 	  arg3 = cp_convert (stype, arg3, complain);
-	  arg3 = save_expr (arg3);
+	  if (!have_vec_duplicate)
+	    arg3 = save_expr (arg3);
 	  arg3 = build_vector_from_val (vtype, arg3);
 	  arg3_type = vtype;
 	}
@@ -5458,7 +5465,9 @@  build_conditional_expr (const op_locatio
 		return error_mark_node;
 	      case stv_firstarg:
 		{
-		  arg2 = save_expr (arg2);
+		  if (optab_handler (vec_duplicate_optab,
+				     TYPE_MODE (arg3_type)) == CODE_FOR_nothing)
+		    arg2 = save_expr (arg2);
 		  arg2 = convert (TREE_TYPE (arg3_type), arg2);
 		  arg2 = build_vector_from_val (arg3_type, arg2);
 		  arg2_type = TREE_TYPE (arg2);
@@ -5466,7 +5475,9 @@  build_conditional_expr (const op_locatio
 		}
 	      case stv_secondarg:
 		{
-		  arg3 = save_expr (arg3);
+		  if (optab_handler (vec_duplicate_optab,
+				     TYPE_MODE (arg2_type)) == CODE_FOR_nothing)
+		    arg3 = save_expr (arg3);
 		  arg3 = convert (TREE_TYPE (arg2_type), arg3);
 		  arg3 = build_vector_from_val (arg2_type, arg3);
 		  arg3_type = TREE_TYPE (arg3);
Index: gcc/gcc/cp/typeck.cc
===================================================================
--- gcc.orig/gcc/cp/typeck.cc
+++ gcc/gcc/cp/typeck.cc
@@ -36,6 +36,7 @@  along with GCC; see the file COPYING3.
 #include "c-family/c-objc.h"
 #include "c-family/c-ubsan.h"
 #include "gcc-rich-location.h"
+#include "optabs-query.h"
 #include "stringpool.h"
 #include "attribs.h"
 #include "asan.h"
@@ -5066,7 +5067,9 @@  cp_build_binary_op (const op_location_t
           case stv_firstarg:
             {
               op0 = convert (TREE_TYPE (type1), op0);
-	      op0 = save_expr (op0);
+	      if (optab_handler (vec_duplicate_optab,
+				 TYPE_MODE (type1)) == CODE_FOR_nothing)
+		op0 = save_expr (op0);
               op0 = build_vector_from_val (type1, op0);
               type0 = TREE_TYPE (op0);
               code0 = TREE_CODE (type0);
@@ -5076,7 +5079,9 @@  cp_build_binary_op (const op_location_t
           case stv_secondarg:
             {
               op1 = convert (TREE_TYPE (type0), op1);
-	      op1 = save_expr (op1);
+	      if (optab_handler (vec_duplicate_optab,
+				 TYPE_MODE (type0)) == CODE_FOR_nothing)
+		op1 = save_expr (op1);
               op1 = build_vector_from_val (type0, op1);
               type1 = TREE_TYPE (op1);
               code1 = TREE_CODE (type1);