c++: compile time evaluation of prvalues [PR116416]

Message ID 20240920221815.694554-1-polacek@redhat.com
State New
Headers
Series c++: compile time evaluation of prvalues [PR116416] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed

Commit Message

Marek Polacek Sept. 20, 2024, 10:18 p.m. UTC
  Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR reports a missed optimization.  When we have:

  Str str{"Test"};
  callback(str);

as in the test, we're able to evaluate the Str::Str() call at compile
time.  But when we have:

  callback(Str{"Test"});

we are not.  With this patch (in fact, it's Patrick's patch with a little
tweak), we turn

  callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
    5
    __ct_comp
    D.2890
    (struct Str *) <<< Unknown tree: void_cst >>>
    (const char *) "Test" >>>>)

into

  callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)

I explored the idea of calling maybe_constant_value for the whole
TARGET_EXPR in cp_fold.  That has three problems:
- we can't always elide a TARGET_EXPR, so we'd have to make sure the
  result is also a TARGET_EXPR;
- the resulting TARGET_EXPR must have the same flags, otherwise Bad
  Things happen;
- getting a new slot is also problematic.  I've seen a test where we
  had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
  would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
  D.2680, we can't replace it with D.2681, and things break.

With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.

FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
is easy.  Previously, we would call C::C, so .gimple has:

  D.2590 = {};
  C::C (&D.2590);
  D.2597 = D.2590;
  return D.2597;

Then .einline inlines the C::C call:

  D.2590 = {};
  D.2590.a = {}; // #1
  D.2590.b = 0;  // #2
  D.2597 = D.2590;
  D.2590 ={v} {CLOBBER(eos)};
  return D.2597;

then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
passes.  But with the patch, .gimple won't have that C::C call, so the
IL is of course going to look different.

Unfortunately, pr78687.C is much more complicated and I can't explain
precisely what happens there.  But it seems like a good idea to have
a way to avoid this optimization.  So I've added the "noinline" check.

	PR c++/116416

gcc/cp/ChangeLog:

	* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
	TARGET_EXPR_INITIAL and replace it with the folded result if
	it's TREE_CONSTANT.

gcc/testsuite/ChangeLog:

	* g++.dg/analyzer/pr97116.C: Adjust dg-message.
	* g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
	* g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
	* g++.dg/tree-ssa/pr90883.C: Likewise.
	* g++.dg/cpp1y/constexpr-prvalue1.C: New test.

Co-authored-by: Patrick Palka <ppalka@redhat.com>
---
 gcc/cp/cp-gimplify.cc                         | 14 +++++++++
 gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
 .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 +++++++++++++++++++
 gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
 gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
 gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
 6 files changed, 50 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C


base-commit: 9227a64495d5594613604573b72422e8e3722fc5
  

Comments

Jason Merrill Sept. 20, 2024, 10:39 p.m. UTC | #1
On 9/20/24 12:18 AM, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> This PR reports a missed optimization.  When we have:
> 
>    Str str{"Test"};
>    callback(str);
> 
> as in the test, we're able to evaluate the Str::Str() call at compile
> time.  But when we have:
> 
>    callback(Str{"Test"});
> 
> we are not.  With this patch (in fact, it's Patrick's patch with a little
> tweak), we turn
> 
>    callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
>      5
>      __ct_comp
>      D.2890
>      (struct Str *) <<< Unknown tree: void_cst >>>
>      (const char *) "Test" >>>>)
> 
> into
> 
>    callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
> 
> I explored the idea of calling maybe_constant_value for the whole
> TARGET_EXPR in cp_fold.  That has three problems:
> - we can't always elide a TARGET_EXPR, so we'd have to make sure the
>    result is also a TARGET_EXPR;

I'd think that the result should always be a TARGET_EXPR for a class, 
and that's the case we want to fold; a TARGET_EXPR for a scalar is 
always the initialize-temp-and-use-it pattern you mention below.

> - the resulting TARGET_EXPR must have the same flags, otherwise Bad
>    Things happen;

I guess maybe_constant_value should be fixed to preserve flags 
regardless of this change.

> - getting a new slot is also problematic.  I've seen a test where we
>    had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
>    would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
>    D.2680, we can't replace it with D.2681, and things break.

Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?

> With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
> 
> FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
> is easy.  Previously, we would call C::C, so .gimple has:
> 
>    D.2590 = {};
>    C::C (&D.2590);
>    D.2597 = D.2590;
>    return D.2597;
> 
> Then .einline inlines the C::C call:
> 
>    D.2590 = {};
>    D.2590.a = {}; // #1
>    D.2590.b = 0;  // #2
>    D.2597 = D.2590;
>    D.2590 ={v} {CLOBBER(eos)};
>    return D.2597;
> 
> then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
> passes.  But with the patch, .gimple won't have that C::C call, so the
> IL is of course going to look different.

Maybe -fno-inline instead of the --param?

> Unfortunately, pr78687.C is much more complicated and I can't explain
> precisely what happens there.  But it seems like a good idea to have
> a way to avoid this optimization.  So I've added the "noinline" check.

Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy 
constructors are not constexpr.  And I wouldn't expect the optimization 
to affect the value-initialization option_2().

> 	PR c++/116416
> 
> gcc/cp/ChangeLog:
> 
> 	* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
> 	TARGET_EXPR_INITIAL and replace it with the folded result if
> 	it's TREE_CONSTANT.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.dg/analyzer/pr97116.C: Adjust dg-message.
> 	* g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> 	* g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> 	* g++.dg/tree-ssa/pr90883.C: Likewise.
> 	* g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> 
> Co-authored-by: Patrick Palka <ppalka@redhat.com>
> ---
>   gcc/cp/cp-gimplify.cc                         | 14 +++++++++
>   gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
>   .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 +++++++++++++++++++
>   gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
>   gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
>   gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
>   6 files changed, 50 insertions(+), 3 deletions(-)
>   create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> 
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index 003e68f1ea7..41d6333f650 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
>   	 that case, strip it in favor of this one.  */
>         if (tree &init = TARGET_EXPR_INITIAL (stmt))
>   	{
> +	  tree fn;
> +	  if ((data->flags & ff_genericize)
> +	      /* Give the user an option to opt out.  */
> +	      && !((fn = current_function_decl)
> +		   && lookup_attribute ("noinline",
> +					DECL_ATTRIBUTES (fn))))

This looks backwards from the usual sense of noinline, which suppresses 
inlining of the function so marked, rather than inlining of other 
functions called.  If you want to check noinline, you need to dig out 
the called function (if any), perhaps with extract_call_expr.

This should also share conditions with the code that calls 
maybe_constant_value for CALL_EXPR (flag_no_inline, in particular).

Jason
  
Jakub Jelinek Sept. 20, 2024, 10:40 p.m. UTC | #2
On Fri, Sep 20, 2024 at 06:18:15PM -0400, Marek Polacek wrote:
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
>  	 that case, strip it in favor of this one.  */
>        if (tree &init = TARGET_EXPR_INITIAL (stmt))
>  	{
> +	  tree fn;
> +	  if ((data->flags & ff_genericize)
> +	      /* Give the user an option to opt out.  */
> +	      && !((fn = current_function_decl)
> +		   && lookup_attribute ("noinline",
> +					DECL_ATTRIBUTES (fn))))
> +	    {
> +	      tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt));
> +	      if (folded != init && TREE_CONSTANT (folded))
> +		{
> +		  init = folded;
> +		  break;
> +		}
> +	    }

The CALL_EXPR case in cp_fold uses !flag_no_inline instead, that makes more
sense to me.
Because checking "noinline" attribute (which means don't inline this
function) on current_function_decl rather than on functions being "inlined"
(the constexpr functions being non-manifestly constant evaluated) is just
weird.
If we really wanted, we could honor "noinline" during constant evaluation
on the CALL_EXPR/AGGR_INIT_EXPR fndecls, but dunno if whenever doing the
non-manifestly constant evaluated cases or just in special cases like these
two (CALL_EXPR in cp_fold, this in cp_fold_r).

	Jakub
  
Jason Merrill Sept. 20, 2024, 11:03 p.m. UTC | #3
On 9/20/24 12:40 AM, Jakub Jelinek wrote:
> On Fri, Sep 20, 2024 at 06:18:15PM -0400, Marek Polacek wrote:
>> --- a/gcc/cp/cp-gimplify.cc
>> +++ b/gcc/cp/cp-gimplify.cc
>> @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
>>   	 that case, strip it in favor of this one.  */
>>         if (tree &init = TARGET_EXPR_INITIAL (stmt))
>>   	{
>> +	  tree fn;
>> +	  if ((data->flags & ff_genericize)
>> +	      /* Give the user an option to opt out.  */
>> +	      && !((fn = current_function_decl)
>> +		   && lookup_attribute ("noinline",
>> +					DECL_ATTRIBUTES (fn))))
>> +	    {
>> +	      tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt));
>> +	      if (folded != init && TREE_CONSTANT (folded))
>> +		{
>> +		  init = folded;
>> +		  break;
>> +		}
>> +	    }
> 
> The CALL_EXPR case in cp_fold uses !flag_no_inline instead, that makes more
> sense to me.
> Because checking "noinline" attribute (which means don't inline this
> function) on current_function_decl rather than on functions being "inlined"
> (the constexpr functions being non-manifestly constant evaluated) is just
> weird.
> If we really wanted, we could honor "noinline" during constant evaluation
> on the CALL_EXPR/AGGR_INIT_EXPR fndecls, but dunno if whenever doing the
> non-manifestly constant evaluated cases or just in special cases like these
> two (CALL_EXPR in cp_fold, this in cp_fold_r).

Checking noinline in non-manifestly constant-evaluated cases might make 
sense.

But I suspect there isn't even a call in pr78687; the only thing I can 
see that maybe_constant could do anything with is the option_2() 
value-init, and that should just be a TARGET_EXPR around a CONSTRUCTOR 
already.  Mysterious.

Jason
  
Jakub Jelinek Sept. 21, 2024, 3 p.m. UTC | #4
On Fri, Sep 20, 2024 at 07:03:45PM -0400, Jason Merrill wrote:
> > The CALL_EXPR case in cp_fold uses !flag_no_inline instead, that makes more
> > sense to me.
> > Because checking "noinline" attribute (which means don't inline this
> > function) on current_function_decl rather than on functions being "inlined"
> > (the constexpr functions being non-manifestly constant evaluated) is just
> > weird.
> > If we really wanted, we could honor "noinline" during constant evaluation
> > on the CALL_EXPR/AGGR_INIT_EXPR fndecls, but dunno if whenever doing the
> > non-manifestly constant evaluated cases or just in special cases like these
> > two (CALL_EXPR in cp_fold, this in cp_fold_r).
> 
> Checking noinline in non-manifestly constant-evaluated cases might make
> sense.

Though, if somebody marks some function explicitly constexpr they should be
prepared to get some constexpr evaluation of it, doesn't have to be strictly
standard required one.
And for -fimplicit-constexpr we already have "noinline" attribute check, so
maybe it is ok as is.

	Jakub
  
Marek Polacek Sept. 24, 2024, 9:10 p.m. UTC | #5
On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
> On 9/20/24 12:18 AM, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > This PR reports a missed optimization.  When we have:
> > 
> >    Str str{"Test"};
> >    callback(str);
> > 
> > as in the test, we're able to evaluate the Str::Str() call at compile
> > time.  But when we have:
> > 
> >    callback(Str{"Test"});
> > 
> > we are not.  With this patch (in fact, it's Patrick's patch with a little
> > tweak), we turn
> > 
> >    callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
> >      5
> >      __ct_comp
> >      D.2890
> >      (struct Str *) <<< Unknown tree: void_cst >>>
> >      (const char *) "Test" >>>>)
> > 
> > into
> > 
> >    callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
> > 
> > I explored the idea of calling maybe_constant_value for the whole
> > TARGET_EXPR in cp_fold.  That has three problems:
> > - we can't always elide a TARGET_EXPR, so we'd have to make sure the
> >    result is also a TARGET_EXPR;
> 
> I'd think that the result should always be a TARGET_EXPR for a class, and
> that's the case we want to fold; a TARGET_EXPR for a scalar is always the
> initialize-temp-and-use-it pattern you mention below.

Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
 
> > - the resulting TARGET_EXPR must have the same flags, otherwise Bad
> >    Things happen;
> 
> I guess maybe_constant_value should be fixed to preserve flags regardless of
> this change.

Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
but here we go into the break_out_target_exprs block in maybe_constant_value
and that doesn't necessarily preserve them.

> > - getting a new slot is also problematic.  I've seen a test where we
> >    had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
> >    would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
> >    D.2680, we can't replace it with D.2681, and things break.
> 
> Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?

...unfortunately that doesn't always help.  I've reduced an example into:

  struct optional {
    constexpr optional(int) {}
  };
  optional foo() { return 2; }

where check_return_expr creates a COMPOUND_EXPR:

        retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
                         TREE_OPERAND (retval, 0));

where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
 
> > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
> > 
> > FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
> > is easy.  Previously, we would call C::C, so .gimple has:
> > 
> >    D.2590 = {};
> >    C::C (&D.2590);
> >    D.2597 = D.2590;
> >    return D.2597;
> > 
> > Then .einline inlines the C::C call:
> > 
> >    D.2590 = {};
> >    D.2590.a = {}; // #1
> >    D.2590.b = 0;  // #2
> >    D.2597 = D.2590;
> >    D.2590 ={v} {CLOBBER(eos)};
> >    return D.2597;
> > 
> > then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
> > passes.  But with the patch, .gimple won't have that C::C call, so the
> > IL is of course going to look different.
> 
> Maybe -fno-inline instead of the --param?

Then that C::C call isn't inlined and the test fails :/.
 
> > Unfortunately, pr78687.C is much more complicated and I can't explain
> > precisely what happens there.  But it seems like a good idea to have
> > a way to avoid this optimization.  So I've added the "noinline" check.
> 
> Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
> constructors are not constexpr.  And I wouldn't expect the optimization to
> affect the value-initialization option_2().

In pr78687.C we do this new optimization only once for
"constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept (std::is_nothrow_constructible<T, U&&>::value)".
 
> > 	PR c++/116416
> > 
> > gcc/cp/ChangeLog:
> > 
> > 	* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
> > 	TARGET_EXPR_INITIAL and replace it with the folded result if
> > 	it's TREE_CONSTANT.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > 	* g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> > 	* g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> > 	* g++.dg/tree-ssa/pr90883.C: Likewise.
> > 	* g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> > 
> > Co-authored-by: Patrick Palka <ppalka@redhat.com>
> > ---
> >   gcc/cp/cp-gimplify.cc                         | 14 +++++++++
> >   gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
> >   .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 +++++++++++++++++++
> >   gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
> >   gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
> >   gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
> >   6 files changed, 50 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> > 
> > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > index 003e68f1ea7..41d6333f650 100644
> > --- a/gcc/cp/cp-gimplify.cc
> > +++ b/gcc/cp/cp-gimplify.cc
> > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
> >   	 that case, strip it in favor of this one.  */
> >         if (tree &init = TARGET_EXPR_INITIAL (stmt))
> >   	{
> > +	  tree fn;
> > +	  if ((data->flags & ff_genericize)
> > +	      /* Give the user an option to opt out.  */
> > +	      && !((fn = current_function_decl)
> > +		   && lookup_attribute ("noinline",
> > +					DECL_ATTRIBUTES (fn))))
> 
> This looks backwards from the usual sense of noinline, which suppresses
> inlining of the function so marked, rather than inlining of other functions
> called.  If you want to check noinline, you need to dig out the called
> function (if any), perhaps with extract_call_expr.

Now I wish I hadn't done that.  But for implicit functions, like in
pr90883.C where we have

    class C
    {
        char a[7]{};
        int b{};
    };
 
there's nothing to make noinline anyway.  So I'm stuck.  I suppose
it wouldn't make sense to use -fno-fold-simple-inlines to disable
this optimization.  Can I abuse -fearly-inlining?  I don't want a new
flag for this :(.

Or should we just adjust the tests?

> This should also share conditions with the code that calls
> maybe_constant_value for CALL_EXPR (flag_no_inline, in particular).

Done.  So I have 

--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1473,6 +1473,16 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
     that case, strip it in favor of this one.  */
       if (tree &init = TARGET_EXPR_INITIAL (stmt))
    {
+     if ((data->flags & ff_genericize)
+         && !flag_no_inline)
+       {
+         tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt));
+         if (folded != init && TREE_CONSTANT (folded))
+       {
+         init = folded;
+         break;
+       }
+       }
      cp_walk_tree (&init, cp_fold_r, data, NULL);
      cp_walk_tree (&TARGET_EXPR_CLEANUP (stmt), cp_fold_r, data, NULL);
      *walk_subtrees = 0;

But there are still these two fails as mentioned above:

FAIL: g++.dg/tree-ssa/pr78687.C   scan-tree-dump sra "Removing load:.*ptr;"
FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
  
Marek Polacek Sept. 24, 2024, 9:14 p.m. UTC | #6
On Sat, Sep 21, 2024 at 05:00:51PM +0200, Jakub Jelinek wrote:
> On Fri, Sep 20, 2024 at 07:03:45PM -0400, Jason Merrill wrote:
> > > The CALL_EXPR case in cp_fold uses !flag_no_inline instead, that makes more
> > > sense to me.
> > > Because checking "noinline" attribute (which means don't inline this
> > > function) on current_function_decl rather than on functions being "inlined"
> > > (the constexpr functions being non-manifestly constant evaluated) is just
> > > weird.
> > > If we really wanted, we could honor "noinline" during constant evaluation
> > > on the CALL_EXPR/AGGR_INIT_EXPR fndecls, but dunno if whenever doing the
> > > non-manifestly constant evaluated cases or just in special cases like these
> > > two (CALL_EXPR in cp_fold, this in cp_fold_r).
> > 
> > Checking noinline in non-manifestly constant-evaluated cases might make
> > sense.
> 
> Though, if somebody marks some function explicitly constexpr they should be
> prepared to get some constexpr evaluation of it, doesn't have to be strictly
> standard required one.

Yeah, I would agree with that.

> And for -fimplicit-constexpr we already have "noinline" attribute check, so
> maybe it is ok as is.

Yeah.  I dropped the "noinline" attribute check though because I no longer
see any need for it.

Marek
  
Jason Merrill Sept. 25, 2024, 12:54 p.m. UTC | #7
On 9/24/24 5:10 PM, Marek Polacek wrote:
> On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
>> On 9/20/24 12:18 AM, Marek Polacek wrote:
>>> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>>>
>>> -- >8 --
>>> This PR reports a missed optimization.  When we have:
>>>
>>>     Str str{"Test"};
>>>     callback(str);
>>>
>>> as in the test, we're able to evaluate the Str::Str() call at compile
>>> time.  But when we have:
>>>
>>>     callback(Str{"Test"});
>>>
>>> we are not.  With this patch (in fact, it's Patrick's patch with a little
>>> tweak), we turn
>>>
>>>     callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
>>>       5
>>>       __ct_comp
>>>       D.2890
>>>       (struct Str *) <<< Unknown tree: void_cst >>>
>>>       (const char *) "Test" >>>>)
>>>
>>> into
>>>
>>>     callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
>>>
>>> I explored the idea of calling maybe_constant_value for the whole
>>> TARGET_EXPR in cp_fold.  That has three problems:
>>> - we can't always elide a TARGET_EXPR, so we'd have to make sure the
>>>     result is also a TARGET_EXPR;
>>
>> I'd think that the result should always be a TARGET_EXPR for a class, and
>> that's the case we want to fold; a TARGET_EXPR for a scalar is always the
>> initialize-temp-and-use-it pattern you mention below.
> 
> Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
>   
>>> - the resulting TARGET_EXPR must have the same flags, otherwise Bad
>>>     Things happen;
>>
>> I guess maybe_constant_value should be fixed to preserve flags regardless of
>> this change.
> 
> Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
> but here we go into the break_out_target_exprs block in maybe_constant_value
> and that doesn't necessarily preserve them.
> 
>>> - getting a new slot is also problematic.  I've seen a test where we
>>>     had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
>>>     would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
>>>     D.2680, we can't replace it with D.2681, and things break.
>>
>> Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?
> 
> ...unfortunately that doesn't always help.  I've reduced an example into:
> 
>    struct optional {
>      constexpr optional(int) {}
>    };
>    optional foo() { return 2; }
> 
> where check_return_expr creates a COMPOUND_EXPR:
> 
>          retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
>                           TREE_OPERAND (retval, 0));
> 
> where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
>   
>>> With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
>>>
>>> FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
>>> is easy.  Previously, we would call C::C, so .gimple has:
>>>
>>>     D.2590 = {};
>>>     C::C (&D.2590);
>>>     D.2597 = D.2590;
>>>     return D.2597;
>>>
>>> Then .einline inlines the C::C call:
>>>
>>>     D.2590 = {};
>>>     D.2590.a = {}; // #1
>>>     D.2590.b = 0;  // #2
>>>     D.2597 = D.2590;
>>>     D.2590 ={v} {CLOBBER(eos)};
>>>     return D.2597;
>>>
>>> then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
>>> passes.  But with the patch, .gimple won't have that C::C call, so the
>>> IL is of course going to look different.
>>
>> Maybe -fno-inline instead of the --param?
> 
> Then that C::C call isn't inlined and the test fails :/.
>   
>>> Unfortunately, pr78687.C is much more complicated and I can't explain
>>> precisely what happens there.  But it seems like a good idea to have
>>> a way to avoid this optimization.  So I've added the "noinline" check.
>>
>> Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
>> constructors are not constexpr.  And I wouldn't expect the optimization to
>> affect the value-initialization option_2().
> 
> In pr78687.C we do this new optimization only once for
> "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept (std::is_nothrow_constructible<T, U&&>::value)".

Aha.  Can we remove its constexpr?
...no, it's defaulted.  And moving the defaulting after the class also 
breaks the testcase.

>>> 	PR c++/116416
>>>
>>> gcc/cp/ChangeLog:
>>>
>>> 	* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
>>> 	TARGET_EXPR_INITIAL and replace it with the folded result if
>>> 	it's TREE_CONSTANT.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* g++.dg/analyzer/pr97116.C: Adjust dg-message.
>>> 	* g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
>>> 	* g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
>>> 	* g++.dg/tree-ssa/pr90883.C: Likewise.
>>> 	* g++.dg/cpp1y/constexpr-prvalue1.C: New test.
>>>
>>> Co-authored-by: Patrick Palka <ppalka@redhat.com>
>>> ---
>>>    gcc/cp/cp-gimplify.cc                         | 14 +++++++++
>>>    gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
>>>    .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 +++++++++++++++++++
>>>    gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
>>>    gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
>>>    gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
>>>    6 files changed, 50 insertions(+), 3 deletions(-)
>>>    create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
>>>
>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
>>> index 003e68f1ea7..41d6333f650 100644
>>> --- a/gcc/cp/cp-gimplify.cc
>>> +++ b/gcc/cp/cp-gimplify.cc
>>> @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
>>>    	 that case, strip it in favor of this one.  */
>>>          if (tree &init = TARGET_EXPR_INITIAL (stmt))
>>>    	{
>>> +	  tree fn;
>>> +	  if ((data->flags & ff_genericize)
>>> +	      /* Give the user an option to opt out.  */
>>> +	      && !((fn = current_function_decl)
>>> +		   && lookup_attribute ("noinline",
>>> +					DECL_ATTRIBUTES (fn))))
>>
>> This looks backwards from the usual sense of noinline, which suppresses
>> inlining of the function so marked, rather than inlining of other functions
>> called.  If you want to check noinline, you need to dig out the called
>> function (if any), perhaps with extract_call_expr.
> 
> Now I wish I hadn't done that.  But for implicit functions, like in
> pr90883.C where we have
> 
>      class C
>      {
>          char a[7]{};
>          int b{};
>      };
>   
> there's nothing to make noinline anyway.  So I'm stuck.  I suppose
> it wouldn't make sense to use -fno-fold-simple-inlines to disable
> this optimization.  Can I abuse -fearly-inlining?  I don't want a new
> flag for this :(.
> 
> Or should we just adjust the tests?

I think so, yes.  Probably so that they test that the output doesn't 
contain the redundancy that the PR complains about; reaching optimal 
output by another path isn't a problem.

For pr78687, I guess that means checking the number of aggregate 
temporaries.

For pr90883, maybe check for the lack of ".a =" and ".b = "?

What is the .optimized for these two tests after your patch?

Jason
  
Marek Polacek Sept. 25, 2024, 8:21 p.m. UTC | #8
On Wed, Sep 25, 2024 at 08:54:46AM -0400, Jason Merrill wrote:
> On 9/24/24 5:10 PM, Marek Polacek wrote:
> > On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
> > > On 9/20/24 12:18 AM, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > This PR reports a missed optimization.  When we have:
> > > > 
> > > >     Str str{"Test"};
> > > >     callback(str);
> > > > 
> > > > as in the test, we're able to evaluate the Str::Str() call at compile
> > > > time.  But when we have:
> > > > 
> > > >     callback(Str{"Test"});
> > > > 
> > > > we are not.  With this patch (in fact, it's Patrick's patch with a little
> > > > tweak), we turn
> > > > 
> > > >     callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
> > > >       5
> > > >       __ct_comp
> > > >       D.2890
> > > >       (struct Str *) <<< Unknown tree: void_cst >>>
> > > >       (const char *) "Test" >>>>)
> > > > 
> > > > into
> > > > 
> > > >     callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
> > > > 
> > > > I explored the idea of calling maybe_constant_value for the whole
> > > > TARGET_EXPR in cp_fold.  That has three problems:
> > > > - we can't always elide a TARGET_EXPR, so we'd have to make sure the
> > > >     result is also a TARGET_EXPR;
> > > 
> > > I'd think that the result should always be a TARGET_EXPR for a class, and
> > > that's the case we want to fold; a TARGET_EXPR for a scalar is always the
> > > initialize-temp-and-use-it pattern you mention below.
> > 
> > Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
> > > > - the resulting TARGET_EXPR must have the same flags, otherwise Bad
> > > >     Things happen;
> > > 
> > > I guess maybe_constant_value should be fixed to preserve flags regardless of
> > > this change.
> > 
> > Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
> > but here we go into the break_out_target_exprs block in maybe_constant_value
> > and that doesn't necessarily preserve them.
> > 
> > > > - getting a new slot is also problematic.  I've seen a test where we
> > > >     had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
> > > >     would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
> > > >     D.2680, we can't replace it with D.2681, and things break.
> > > 
> > > Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?
> > 
> > ...unfortunately that doesn't always help.  I've reduced an example into:
> > 
> >    struct optional {
> >      constexpr optional(int) {}
> >    };
> >    optional foo() { return 2; }
> > 
> > where check_return_expr creates a COMPOUND_EXPR:
> > 
> >          retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
> >                           TREE_OPERAND (retval, 0));
> > 
> > where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
> > > > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
> > > > 
> > > > FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
> > > > is easy.  Previously, we would call C::C, so .gimple has:
> > > > 
> > > >     D.2590 = {};
> > > >     C::C (&D.2590);
> > > >     D.2597 = D.2590;
> > > >     return D.2597;
> > > > 
> > > > Then .einline inlines the C::C call:
> > > > 
> > > >     D.2590 = {};
> > > >     D.2590.a = {}; // #1
> > > >     D.2590.b = 0;  // #2
> > > >     D.2597 = D.2590;
> > > >     D.2590 ={v} {CLOBBER(eos)};
> > > >     return D.2597;
> > > > 
> > > > then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
> > > > passes.  But with the patch, .gimple won't have that C::C call, so the
> > > > IL is of course going to look different.
> > > 
> > > Maybe -fno-inline instead of the --param?
> > 
> > Then that C::C call isn't inlined and the test fails :/.
> > > > Unfortunately, pr78687.C is much more complicated and I can't explain
> > > > precisely what happens there.  But it seems like a good idea to have
> > > > a way to avoid this optimization.  So I've added the "noinline" check.
> > > 
> > > Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
> > > constructors are not constexpr.  And I wouldn't expect the optimization to
> > > affect the value-initialization option_2().
> > 
> > In pr78687.C we do this new optimization only once for
> > "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept (std::is_nothrow_constructible<T, U&&>::value)".
> 
> Aha.  Can we remove its constexpr?
> ...no, it's defaulted.  And moving the defaulting after the class also
> breaks the testcase.
> 
> > > > 	PR c++/116416
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > 	* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
> > > > 	TARGET_EXPR_INITIAL and replace it with the folded result if
> > > > 	it's TREE_CONSTANT.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > 	* g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > > > 	* g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> > > > 	* g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> > > > 	* g++.dg/tree-ssa/pr90883.C: Likewise.
> > > > 	* g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> > > > 
> > > > Co-authored-by: Patrick Palka <ppalka@redhat.com>
> > > > ---
> > > >    gcc/cp/cp-gimplify.cc                         | 14 +++++++++
> > > >    gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
> > > >    .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 +++++++++++++++++++
> > > >    gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
> > > >    gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
> > > >    gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
> > > >    6 files changed, 50 insertions(+), 3 deletions(-)
> > > >    create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> > > > 
> > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > index 003e68f1ea7..41d6333f650 100644
> > > > --- a/gcc/cp/cp-gimplify.cc
> > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
> > > >    	 that case, strip it in favor of this one.  */
> > > >          if (tree &init = TARGET_EXPR_INITIAL (stmt))
> > > >    	{
> > > > +	  tree fn;
> > > > +	  if ((data->flags & ff_genericize)
> > > > +	      /* Give the user an option to opt out.  */
> > > > +	      && !((fn = current_function_decl)
> > > > +		   && lookup_attribute ("noinline",
> > > > +					DECL_ATTRIBUTES (fn))))
> > > 
> > > This looks backwards from the usual sense of noinline, which suppresses
> > > inlining of the function so marked, rather than inlining of other functions
> > > called.  If you want to check noinline, you need to dig out the called
> > > function (if any), perhaps with extract_call_expr.
> > 
> > Now I wish I hadn't done that.  But for implicit functions, like in
> > pr90883.C where we have
> > 
> >      class C
> >      {
> >          char a[7]{};
> >          int b{};
> >      };
> > there's nothing to make noinline anyway.  So I'm stuck.  I suppose
> > it wouldn't make sense to use -fno-fold-simple-inlines to disable
> > this optimization.  Can I abuse -fearly-inlining?  I don't want a new
> > flag for this :(.
> > 
> > Or should we just adjust the tests?
> 
> I think so, yes.  Probably so that they test that the output doesn't contain
> the redundancy that the PR complains about; reaching optimal output by
> another path isn't a problem.
> 
> For pr78687, I guess that means checking the number of aggregate
> temporaries.
> 
> For pr90883, maybe check for the lack of ".a =" and ".b = "?

Sure.
 
> What is the .optimized for these two tests after your patch?

pr90883 is fine, the dumps are the same.  In pr78687 they very much
aren't.

Let's see what's happening here.  In .gimple, the difference is
in make_object_1:

 struct ref_proxy make_object_1 ()
 {
   struct variant D.9472;
-  struct option_2 D.9273;

   try
     {
-      try
-        {
-          eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472, &D.9273);
-          ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy (&<retval>, D.9472);
-          return <retval>;
-        }
-      finally
-        {
-          D.9273 = {CLOBBER(eos)};
-        }
+      D.9472 = {};
+      D.9472._storage._which = 2;
+      ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy (&<retval>, D.9472);
+      return <retval>;

that makes sense: the new optimization "inlined" the
eggs::variants::variant<option_1, option_2>::variant<option_2> call.  So later
.einline has nothing to expand here whereas previously 

  eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472, &D.9273);

was expanded into

  MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
  MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
  MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
  MEM[(struct _storage *)&D.9472]._which = 2;

Then make_object_1 gets inlined.  Something happens in SRA.  And
then we got rid of a lot of code.  But now a lot of code remains.

Is it simply the fact that with this opt we expand the ctor into

  D.9472 = {};
  D.9472._storage._which = 2;

which means that we zero-init the variant object and then set _which to 2,
while previously we just allocated storage and set _which to 2?

This is .optimized with the opt on:

struct ref_proxy f ()
{
  struct ref_proxy ptr;
  struct ref_proxy D.10036;
  struct ref_proxy type;
  struct ref_proxy type;
  struct qual_option D.10031;
  struct ref_proxy D.10030;
  struct qual_option inner;
  struct variant t;
  struct variant D.10026;
  struct variant D.10024;
  struct inplace_ref D.10023;
  struct inplace_ref ptr;
  struct ref_proxy D.9898;

  <bb 2> [local count: 1073741824]:
  MEM <char[40]> [(struct variant *)&D.10024] = {};
  D.10024._storage._which = 2;
  D.10026 = D.10024;
  t = D.10026;
  MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
  MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
  t ={v} {CLOBBER(eos)};
  D.10026 ={v} {CLOBBER(eos)};
  D.10024 ={v} {CLOBBER(eos)};
  MEM <size_t> [(struct ref_proxy *)&D.9898 + 40B] = 2;
  D.10036 = D.9898;
  ptr = D.10036;
  MEM[(struct variant_ref *)&D.10030] ={v} {CLOBBER(bob)};
  MEM[(struct variant_ref *)&D.10030].inner_storage_ = ptr.D.9270.inner_storage_;
  ptr ={v} {CLOBBER(eos)};
  D.10036 ={v} {CLOBBER(eos)};
  MEM <size_t> [(struct ref_proxy *)&D.10030 + 40B] = 2;
  type = D.10030;
  type = type;
  MEM[(struct __as_base  &)&D.10031] ={v} {CLOBBER(bob)};
  D.10031.type_ = type;
  type ={v} {CLOBBER(eos)};
  type ={v} {CLOBBER(eos)};
  MEM <size_t> [(struct qual_option *)&D.10031 + 40B] = 2;
  D.10031.quals_ = 0;
  inner = D.10031;
  D.10023 ={v} {CLOBBER(bob)};
  D.10023.inner_ = inner;
  inner ={v} {CLOBBER(eos)};
  D.10030 ={v} {CLOBBER(eos)};
  D.10031 ={v} {CLOBBER(eos)};
  MEM <size_t> [(struct inplace_ref *)&D.10023 + 40B] = 2;
  MEM <int> [(struct inplace_ref *)&D.10023 + 48B] = 0;
  ptr = D.10023;
  <retval> ={v} {CLOBBER(bob)};
  <retval>.D.9858 = ptr;
  ptr ={v} {CLOBBER(eos)};
  D.9898 ={v} {CLOBBER(eos)};
  return <retval>;

}
  
Jason Merrill Sept. 26, 2024, 12:45 a.m. UTC | #9
On 9/25/24 4:21 PM, Marek Polacek wrote:
> On Wed, Sep 25, 2024 at 08:54:46AM -0400, Jason Merrill wrote:
>> On 9/24/24 5:10 PM, Marek Polacek wrote:
>>> On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
>>>> On 9/20/24 12:18 AM, Marek Polacek wrote:
>>>>> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>>>>>
>>>>> -- >8 --
>>>>> This PR reports a missed optimization.  When we have:
>>>>>
>>>>>      Str str{"Test"};
>>>>>      callback(str);
>>>>>
>>>>> as in the test, we're able to evaluate the Str::Str() call at compile
>>>>> time.  But when we have:
>>>>>
>>>>>      callback(Str{"Test"});
>>>>>
>>>>> we are not.  With this patch (in fact, it's Patrick's patch with a little
>>>>> tweak), we turn
>>>>>
>>>>>      callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
>>>>>        5
>>>>>        __ct_comp
>>>>>        D.2890
>>>>>        (struct Str *) <<< Unknown tree: void_cst >>>
>>>>>        (const char *) "Test" >>>>)
>>>>>
>>>>> into
>>>>>
>>>>>      callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
>>>>>
>>>>> I explored the idea of calling maybe_constant_value for the whole
>>>>> TARGET_EXPR in cp_fold.  That has three problems:
>>>>> - we can't always elide a TARGET_EXPR, so we'd have to make sure the
>>>>>      result is also a TARGET_EXPR;
>>>>
>>>> I'd think that the result should always be a TARGET_EXPR for a class, and
>>>> that's the case we want to fold; a TARGET_EXPR for a scalar is always the
>>>> initialize-temp-and-use-it pattern you mention below.
>>>
>>> Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
>>>>> - the resulting TARGET_EXPR must have the same flags, otherwise Bad
>>>>>      Things happen;
>>>>
>>>> I guess maybe_constant_value should be fixed to preserve flags regardless of
>>>> this change.
>>>
>>> Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
>>> but here we go into the break_out_target_exprs block in maybe_constant_value
>>> and that doesn't necessarily preserve them.
>>>
>>>>> - getting a new slot is also problematic.  I've seen a test where we
>>>>>      had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
>>>>>      would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
>>>>>      D.2680, we can't replace it with D.2681, and things break.
>>>>
>>>> Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?
>>>
>>> ...unfortunately that doesn't always help.  I've reduced an example into:
>>>
>>>     struct optional {
>>>       constexpr optional(int) {}
>>>     };
>>>     optional foo() { return 2; }
>>>
>>> where check_return_expr creates a COMPOUND_EXPR:
>>>
>>>           retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
>>>                            TREE_OPERAND (retval, 0));
>>>
>>> where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
>>>>> With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
>>>>>
>>>>> FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
>>>>> is easy.  Previously, we would call C::C, so .gimple has:
>>>>>
>>>>>      D.2590 = {};
>>>>>      C::C (&D.2590);
>>>>>      D.2597 = D.2590;
>>>>>      return D.2597;
>>>>>
>>>>> Then .einline inlines the C::C call:
>>>>>
>>>>>      D.2590 = {};
>>>>>      D.2590.a = {}; // #1
>>>>>      D.2590.b = 0;  // #2
>>>>>      D.2597 = D.2590;
>>>>>      D.2590 ={v} {CLOBBER(eos)};
>>>>>      return D.2597;
>>>>>
>>>>> then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
>>>>> passes.  But with the patch, .gimple won't have that C::C call, so the
>>>>> IL is of course going to look different.
>>>>
>>>> Maybe -fno-inline instead of the --param?
>>>
>>> Then that C::C call isn't inlined and the test fails :/.
>>>>> Unfortunately, pr78687.C is much more complicated and I can't explain
>>>>> precisely what happens there.  But it seems like a good idea to have
>>>>> a way to avoid this optimization.  So I've added the "noinline" check.
>>>>
>>>> Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
>>>> constructors are not constexpr.  And I wouldn't expect the optimization to
>>>> affect the value-initialization option_2().
>>>
>>> In pr78687.C we do this new optimization only once for
>>> "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept (std::is_nothrow_constructible<T, U&&>::value)".
>>
>> Aha.  Can we remove its constexpr?
>> ...no, it's defaulted.  And moving the defaulting after the class also
>> breaks the testcase.
>>
>>>>> 	PR c++/116416
>>>>>
>>>>> gcc/cp/ChangeLog:
>>>>>
>>>>> 	* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
>>>>> 	TARGET_EXPR_INITIAL and replace it with the folded result if
>>>>> 	it's TREE_CONSTANT.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 	* g++.dg/analyzer/pr97116.C: Adjust dg-message.
>>>>> 	* g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
>>>>> 	* g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
>>>>> 	* g++.dg/tree-ssa/pr90883.C: Likewise.
>>>>> 	* g++.dg/cpp1y/constexpr-prvalue1.C: New test.
>>>>>
>>>>> Co-authored-by: Patrick Palka <ppalka@redhat.com>
>>>>> ---
>>>>>     gcc/cp/cp-gimplify.cc                         | 14 +++++++++
>>>>>     gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
>>>>>     .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 +++++++++++++++++++
>>>>>     gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
>>>>>     gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
>>>>>     gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
>>>>>     6 files changed, 50 insertions(+), 3 deletions(-)
>>>>>     create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
>>>>>
>>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
>>>>> index 003e68f1ea7..41d6333f650 100644
>>>>> --- a/gcc/cp/cp-gimplify.cc
>>>>> +++ b/gcc/cp/cp-gimplify.cc
>>>>> @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
>>>>>     	 that case, strip it in favor of this one.  */
>>>>>           if (tree &init = TARGET_EXPR_INITIAL (stmt))
>>>>>     	{
>>>>> +	  tree fn;
>>>>> +	  if ((data->flags & ff_genericize)
>>>>> +	      /* Give the user an option to opt out.  */
>>>>> +	      && !((fn = current_function_decl)
>>>>> +		   && lookup_attribute ("noinline",
>>>>> +					DECL_ATTRIBUTES (fn))))
>>>>
>>>> This looks backwards from the usual sense of noinline, which suppresses
>>>> inlining of the function so marked, rather than inlining of other functions
>>>> called.  If you want to check noinline, you need to dig out the called
>>>> function (if any), perhaps with extract_call_expr.
>>>
>>> Now I wish I hadn't done that.  But for implicit functions, like in
>>> pr90883.C where we have
>>>
>>>       class C
>>>       {
>>>           char a[7]{};
>>>           int b{};
>>>       };
>>> there's nothing to make noinline anyway.  So I'm stuck.  I suppose
>>> it wouldn't make sense to use -fno-fold-simple-inlines to disable
>>> this optimization.  Can I abuse -fearly-inlining?  I don't want a new
>>> flag for this :(.
>>>
>>> Or should we just adjust the tests?
>>
>> I think so, yes.  Probably so that they test that the output doesn't contain
>> the redundancy that the PR complains about; reaching optimal output by
>> another path isn't a problem.
>>
>> For pr78687, I guess that means checking the number of aggregate
>> temporaries.
>>
>> For pr90883, maybe check for the lack of ".a =" and ".b = "?
> 
> Sure.
>   
>> What is the .optimized for these two tests after your patch?
> 
> pr90883 is fine, the dumps are the same.  In pr78687 they very much
> aren't.
> 
> Let's see what's happening here.  In .gimple, the difference is
> in make_object_1:
> 
>   struct ref_proxy make_object_1 ()
>   {
>     struct variant D.9472;
> -  struct option_2 D.9273;
> 
>     try
>       {
> -      try
> -        {
> -          eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472, &D.9273);
> -          ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy (&<retval>, D.9472);
> -          return <retval>;
> -        }
> -      finally
> -        {
> -          D.9273 = {CLOBBER(eos)};
> -        }
> +      D.9472 = {};
> +      D.9472._storage._which = 2;
> +      ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy (&<retval>, D.9472);
> +      return <retval>;
> 
> that makes sense: the new optimization "inlined" the
> eggs::variants::variant<option_1, option_2>::variant<option_2> call.  So later
> .einline has nothing to expand here whereas previously
> 
>    eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472, &D.9273);
> 
> was expanded into
> 
>    MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
>    MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
>    MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
>    MEM[(struct _storage *)&D.9472]._which = 2;
> 
> Then make_object_1 gets inlined.  Something happens in SRA.  And
> then we got rid of a lot of code.  But now a lot of code remains.
> 
> Is it simply the fact that with this opt we expand the ctor into
> 
>    D.9472 = {};
>    D.9472._storage._which = 2;
> 
> which means that we zero-init the variant object and then set _which to 2,
> while previously we just allocated storage and set _which to 2?

That seems likely since the patch that fixed the bug before was dealing 
with partially-initialized objects.  Why does the optimization change that?

> This is .optimized with the opt on:
> 
> struct ref_proxy f ()
> {
>    struct ref_proxy ptr;
>    struct ref_proxy D.10036;
>    struct ref_proxy type;
>    struct ref_proxy type;
>    struct qual_option D.10031;
>    struct ref_proxy D.10030;
>    struct qual_option inner;
>    struct variant t;
>    struct variant D.10026;
>    struct variant D.10024;
>    struct inplace_ref D.10023;
>    struct inplace_ref ptr;
>    struct ref_proxy D.9898;
> 
>    <bb 2> [local count: 1073741824]:
>    MEM <char[40]> [(struct variant *)&D.10024] = {};
>    D.10024._storage._which = 2;
>    D.10026 = D.10024;
>    t = D.10026;
>    MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
>    MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
>    t ={v} {CLOBBER(eos)};
>    D.10026 ={v} {CLOBBER(eos)};
>    D.10024 ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct ref_proxy *)&D.9898 + 40B] = 2;
>    D.10036 = D.9898;
>    ptr = D.10036;
>    MEM[(struct variant_ref *)&D.10030] ={v} {CLOBBER(bob)};
>    MEM[(struct variant_ref *)&D.10030].inner_storage_ = ptr.D.9270.inner_storage_;
>    ptr ={v} {CLOBBER(eos)};
>    D.10036 ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct ref_proxy *)&D.10030 + 40B] = 2;
>    type = D.10030;
>    type = type;
>    MEM[(struct __as_base  &)&D.10031] ={v} {CLOBBER(bob)};
>    D.10031.type_ = type;
>    type ={v} {CLOBBER(eos)};
>    type ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct qual_option *)&D.10031 + 40B] = 2;
>    D.10031.quals_ = 0;
>    inner = D.10031;
>    D.10023 ={v} {CLOBBER(bob)};
>    D.10023.inner_ = inner;
>    inner ={v} {CLOBBER(eos)};
>    D.10030 ={v} {CLOBBER(eos)};
>    D.10031 ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct inplace_ref *)&D.10023 + 40B] = 2;
>    MEM <int> [(struct inplace_ref *)&D.10023 + 48B] = 0;
>    ptr = D.10023;
>    <retval> ={v} {CLOBBER(bob)};
>    <retval>.D.9858 = ptr;
>    ptr ={v} {CLOBBER(eos)};
>    D.9898 ={v} {CLOBBER(eos)};
>    return <retval>;
> 
> }
>
  
Marek Polacek Sept. 26, 2024, 10:03 p.m. UTC | #10
On Wed, Sep 25, 2024 at 08:45:50PM -0400, Jason Merrill wrote:
> On 9/25/24 4:21 PM, Marek Polacek wrote:
> > On Wed, Sep 25, 2024 at 08:54:46AM -0400, Jason Merrill wrote:
> > > On 9/24/24 5:10 PM, Marek Polacek wrote:
> > > > On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
> > > > > On 9/20/24 12:18 AM, Marek Polacek wrote:
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > 
> > > > > > -- >8 --
> > > > > > This PR reports a missed optimization.  When we have:
> > > > > > 
> > > > > >      Str str{"Test"};
> > > > > >      callback(str);
> > > > > > 
> > > > > > as in the test, we're able to evaluate the Str::Str() call at compile
> > > > > > time.  But when we have:
> > > > > > 
> > > > > >      callback(Str{"Test"});
> > > > > > 
> > > > > > we are not.  With this patch (in fact, it's Patrick's patch with a little
> > > > > > tweak), we turn
> > > > > > 
> > > > > >      callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
> > > > > >        5
> > > > > >        __ct_comp
> > > > > >        D.2890
> > > > > >        (struct Str *) <<< Unknown tree: void_cst >>>
> > > > > >        (const char *) "Test" >>>>)
> > > > > > 
> > > > > > into
> > > > > > 
> > > > > >      callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
> > > > > > 
> > > > > > I explored the idea of calling maybe_constant_value for the whole
> > > > > > TARGET_EXPR in cp_fold.  That has three problems:
> > > > > > - we can't always elide a TARGET_EXPR, so we'd have to make sure the
> > > > > >      result is also a TARGET_EXPR;
> > > > > 
> > > > > I'd think that the result should always be a TARGET_EXPR for a class, and
> > > > > that's the case we want to fold; a TARGET_EXPR for a scalar is always the
> > > > > initialize-temp-and-use-it pattern you mention below.
> > > > 
> > > > Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
> > > > > > - the resulting TARGET_EXPR must have the same flags, otherwise Bad
> > > > > >      Things happen;
> > > > > 
> > > > > I guess maybe_constant_value should be fixed to preserve flags regardless of
> > > > > this change.
> > > > 
> > > > Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
> > > > but here we go into the break_out_target_exprs block in maybe_constant_value
> > > > and that doesn't necessarily preserve them.
> > > > 
> > > > > > - getting a new slot is also problematic.  I've seen a test where we
> > > > > >      had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
> > > > > >      would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
> > > > > >      D.2680, we can't replace it with D.2681, and things break.
> > > > > 
> > > > > Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?
> > > > 
> > > > ...unfortunately that doesn't always help.  I've reduced an example into:
> > > > 
> > > >     struct optional {
> > > >       constexpr optional(int) {}
> > > >     };
> > > >     optional foo() { return 2; }
> > > > 
> > > > where check_return_expr creates a COMPOUND_EXPR:
> > > > 
> > > >           retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
> > > >                            TREE_OPERAND (retval, 0));
> > > > 
> > > > where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
> > > > > > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
> > > > > > 
> > > > > > FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
> > > > > > is easy.  Previously, we would call C::C, so .gimple has:
> > > > > > 
> > > > > >      D.2590 = {};
> > > > > >      C::C (&D.2590);
> > > > > >      D.2597 = D.2590;
> > > > > >      return D.2597;
> > > > > > 
> > > > > > Then .einline inlines the C::C call:
> > > > > > 
> > > > > >      D.2590 = {};
> > > > > >      D.2590.a = {}; // #1
> > > > > >      D.2590.b = 0;  // #2
> > > > > >      D.2597 = D.2590;
> > > > > >      D.2590 ={v} {CLOBBER(eos)};
> > > > > >      return D.2597;
> > > > > > 
> > > > > > then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
> > > > > > passes.  But with the patch, .gimple won't have that C::C call, so the
> > > > > > IL is of course going to look different.
> > > > > 
> > > > > Maybe -fno-inline instead of the --param?
> > > > 
> > > > Then that C::C call isn't inlined and the test fails :/.
> > > > > > Unfortunately, pr78687.C is much more complicated and I can't explain
> > > > > > precisely what happens there.  But it seems like a good idea to have
> > > > > > a way to avoid this optimization.  So I've added the "noinline" check.
> > > > > 
> > > > > Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
> > > > > constructors are not constexpr.  And I wouldn't expect the optimization to
> > > > > affect the value-initialization option_2().
> > > > 
> > > > In pr78687.C we do this new optimization only once for
> > > > "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept (std::is_nothrow_constructible<T, U&&>::value)".
> > > 
> > > Aha.  Can we remove its constexpr?
> > > ...no, it's defaulted.  And moving the defaulting after the class also
> > > breaks the testcase.

FWIW, removing EGGS_CXX11_CONSTEXPR on line 360 "fixes" the test.

> > > > > > 	PR c++/116416
> > > > > > 
> > > > > > gcc/cp/ChangeLog:
> > > > > > 
> > > > > > 	* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
> > > > > > 	TARGET_EXPR_INITIAL and replace it with the folded result if
> > > > > > 	it's TREE_CONSTANT.
> > > > > > 
> > > > > > gcc/testsuite/ChangeLog:
> > > > > > 
> > > > > > 	* g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > > > > > 	* g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> > > > > > 	* g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> > > > > > 	* g++.dg/tree-ssa/pr90883.C: Likewise.
> > > > > > 	* g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> > > > > > 
> > > > > > Co-authored-by: Patrick Palka <ppalka@redhat.com>
> > > > > > ---
> > > > > >     gcc/cp/cp-gimplify.cc                         | 14 +++++++++
> > > > > >     gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
> > > > > >     .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 +++++++++++++++++++
> > > > > >     gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
> > > > > >     gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
> > > > > >     gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
> > > > > >     6 files changed, 50 insertions(+), 3 deletions(-)
> > > > > >     create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> > > > > > 
> > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > index 003e68f1ea7..41d6333f650 100644
> > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
> > > > > >     	 that case, strip it in favor of this one.  */
> > > > > >           if (tree &init = TARGET_EXPR_INITIAL (stmt))
> > > > > >     	{
> > > > > > +	  tree fn;
> > > > > > +	  if ((data->flags & ff_genericize)
> > > > > > +	      /* Give the user an option to opt out.  */
> > > > > > +	      && !((fn = current_function_decl)
> > > > > > +		   && lookup_attribute ("noinline",
> > > > > > +					DECL_ATTRIBUTES (fn))))
> > > > > 
> > > > > This looks backwards from the usual sense of noinline, which suppresses
> > > > > inlining of the function so marked, rather than inlining of other functions
> > > > > called.  If you want to check noinline, you need to dig out the called
> > > > > function (if any), perhaps with extract_call_expr.
> > > > 
> > > > Now I wish I hadn't done that.  But for implicit functions, like in
> > > > pr90883.C where we have
> > > > 
> > > >       class C
> > > >       {
> > > >           char a[7]{};
> > > >           int b{};
> > > >       };
> > > > there's nothing to make noinline anyway.  So I'm stuck.  I suppose
> > > > it wouldn't make sense to use -fno-fold-simple-inlines to disable
> > > > this optimization.  Can I abuse -fearly-inlining?  I don't want a new
> > > > flag for this :(.
> > > > 
> > > > Or should we just adjust the tests?
> > > 
> > > I think so, yes.  Probably so that they test that the output doesn't contain
> > > the redundancy that the PR complains about; reaching optimal output by
> > > another path isn't a problem.
> > > 
> > > For pr78687, I guess that means checking the number of aggregate
> > > temporaries.
> > > 
> > > For pr90883, maybe check for the lack of ".a =" and ".b = "?
> > 
> > Sure.
> > > What is the .optimized for these two tests after your patch?
> > 
> > pr90883 is fine, the dumps are the same.  In pr78687 they very much
> > aren't.
> > 
> > Let's see what's happening here.  In .gimple, the difference is
> > in make_object_1:
> > 
> >   struct ref_proxy make_object_1 ()
> >   {
> >     struct variant D.9472;
> > -  struct option_2 D.9273;
> > 
> >     try
> >       {
> > -      try
> > -        {
> > -          eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472, &D.9273);
> > -          ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy (&<retval>, D.9472);
> > -          return <retval>;
> > -        }
> > -      finally
> > -        {
> > -          D.9273 = {CLOBBER(eos)};
> > -        }
> > +      D.9472 = {};
> > +      D.9472._storage._which = 2;
> > +      ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy (&<retval>, D.9472);
> > +      return <retval>;
> > 
> > that makes sense: the new optimization "inlined" the
> > eggs::variants::variant<option_1, option_2>::variant<option_2> call.  So later
> > .einline has nothing to expand here whereas previously
> > 
> >    eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472, &D.9273);
> > 
> > was expanded into
> > 
> >    MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
> >    MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
> >    MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
> >    MEM[(struct _storage *)&D.9472]._which = 2;
> > 
> > Then make_object_1 gets inlined.  Something happens in SRA.  And
> > then we got rid of a lot of code.  But now a lot of code remains.
> > 
> > Is it simply the fact that with this opt we expand the ctor into
> > 
> >    D.9472 = {};
> >    D.9472._storage._which = 2;
> > 
> > which means that we zero-init the variant object and then set _which to 2,
> > while previously we just allocated storage and set _which to 2?
> 
> That seems likely since the patch that fixed the bug before was dealing with
> partially-initialized objects.  Why does the optimization change that?

CCing a few optimizer folks.  To recap, we're talking about tree-ssa/pr78687.C.
In in, there's:

        EGGS_CXX11_CONSTEXPR variant(U&& v)
            noexcept(
                std::is_nothrow_constructible<T, U&&>::value)
          : _storage{detail::index<I + 1>{}, std::forward<U>(v)}
        {}

With a new C++ FE optimization this patch introduces, we can evaluate the call
at compile time.  Then .gimple looks a little different (see above).  What is
not clear is why we can't optimize the code as much as without this patch,
when the variant call isn't evaluated at compile time, and instead we produce
those MEMs as shown above.

Any insights would be appreciated.
 
This is .optimized with the opt on:

__attribute__((noinline))
struct ref_proxy f ()
{
   struct ref_proxy ptr;
   struct ref_proxy D.10036;
   struct ref_proxy type;
   struct ref_proxy type;
   struct qual_option D.10031;
   struct ref_proxy D.10030;
   struct qual_option inner;
   struct variant t;
   struct variant D.10026;
   struct variant D.10024;
   struct inplace_ref D.10023;
   struct inplace_ref ptr;
   struct ref_proxy D.9898;

   <bb 2> [local count: 1073741824]:
   MEM <char[40]> [(struct variant *)&D.10024] = {};
   D.10024._storage._which = 2;
   D.10026 = D.10024;
   t = D.10026;
   MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
   MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
   t ={v} {CLOBBER(eos)};
   D.10026 ={v} {CLOBBER(eos)};
   D.10024 ={v} {CLOBBER(eos)};
   MEM <size_t> [(struct ref_proxy *)&D.9898 + 40B] = 2;
   D.10036 = D.9898;
   ptr = D.10036;
   MEM[(struct variant_ref *)&D.10030] ={v} {CLOBBER(bob)};
   MEM[(struct variant_ref *)&D.10030].inner_storage_ = ptr.D.9270.inner_storage_;
   ptr ={v} {CLOBBER(eos)};
   D.10036 ={v} {CLOBBER(eos)};
   MEM <size_t> [(struct ref_proxy *)&D.10030 + 40B] = 2;
   type = D.10030;
   type = type;
   MEM[(struct __as_base  &)&D.10031] ={v} {CLOBBER(bob)};
   D.10031.type_ = type;
   type ={v} {CLOBBER(eos)};
   type ={v} {CLOBBER(eos)};
   MEM <size_t> [(struct qual_option *)&D.10031 + 40B] = 2;
   D.10031.quals_ = 0;
   inner = D.10031;
   D.10023 ={v} {CLOBBER(bob)};
   D.10023.inner_ = inner;
   inner ={v} {CLOBBER(eos)};
   D.10030 ={v} {CLOBBER(eos)};
   D.10031 ={v} {CLOBBER(eos)};
   MEM <size_t> [(struct inplace_ref *)&D.10023 + 40B] = 2;
   MEM <int> [(struct inplace_ref *)&D.10023 + 48B] = 0;
   ptr = D.10023;
   <retval> ={v} {CLOBBER(bob)};
   <retval>.D.9858 = ptr;
   ptr ={v} {CLOBBER(eos)};
   D.9898 ={v} {CLOBBER(eos)};
   return <retval>;

}

and this is the result without this patch:

__attribute__((noinline))
struct ref_proxy f ()
{
  <bb 2> [local count: 1073741824]:
  <retval> ={v} {CLOBBER(bob)};
  MEM <size_t> [(struct inplace_ref *)&<retval> + 40B] = 2;
  MEM <int> [(struct inplace_ref *)&<retval> + 48B] = 0;
  return <retval>;

}
  
Richard Biener Sept. 27, 2024, 6:16 a.m. UTC | #11
On Thu, 26 Sep 2024, Marek Polacek wrote:

> On Wed, Sep 25, 2024 at 08:45:50PM -0400, Jason Merrill wrote:
> > On 9/25/24 4:21 PM, Marek Polacek wrote:
> > > On Wed, Sep 25, 2024 at 08:54:46AM -0400, Jason Merrill wrote:
> > > > On 9/24/24 5:10 PM, Marek Polacek wrote:
> > > > > On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
> > > > > > On 9/20/24 12:18 AM, Marek Polacek wrote:
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > > 
> > > > > > > -- >8 --
> > > > > > > This PR reports a missed optimization.  When we have:
> > > > > > > 
> > > > > > >      Str str{"Test"};
> > > > > > >      callback(str);
> > > > > > > 
> > > > > > > as in the test, we're able to evaluate the Str::Str() call at compile
> > > > > > > time.  But when we have:
> > > > > > > 
> > > > > > >      callback(Str{"Test"});
> > > > > > > 
> > > > > > > we are not.  With this patch (in fact, it's Patrick's patch with a little
> > > > > > > tweak), we turn
> > > > > > > 
> > > > > > >      callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
> > > > > > >        5
> > > > > > >        __ct_comp
> > > > > > >        D.2890
> > > > > > >        (struct Str *) <<< Unknown tree: void_cst >>>
> > > > > > >        (const char *) "Test" >>>>)
> > > > > > > 
> > > > > > > into
> > > > > > > 
> > > > > > >      callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
> > > > > > > 
> > > > > > > I explored the idea of calling maybe_constant_value for the whole
> > > > > > > TARGET_EXPR in cp_fold.  That has three problems:
> > > > > > > - we can't always elide a TARGET_EXPR, so we'd have to make sure the
> > > > > > >      result is also a TARGET_EXPR;
> > > > > > 
> > > > > > I'd think that the result should always be a TARGET_EXPR for a class, and
> > > > > > that's the case we want to fold; a TARGET_EXPR for a scalar is always the
> > > > > > initialize-temp-and-use-it pattern you mention below.
> > > > > 
> > > > > Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
> > > > > > > - the resulting TARGET_EXPR must have the same flags, otherwise Bad
> > > > > > >      Things happen;
> > > > > > 
> > > > > > I guess maybe_constant_value should be fixed to preserve flags regardless of
> > > > > > this change.
> > > > > 
> > > > > Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
> > > > > but here we go into the break_out_target_exprs block in maybe_constant_value
> > > > > and that doesn't necessarily preserve them.
> > > > > 
> > > > > > > - getting a new slot is also problematic.  I've seen a test where we
> > > > > > >      had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
> > > > > > >      would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
> > > > > > >      D.2680, we can't replace it with D.2681, and things break.
> > > > > > 
> > > > > > Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?
> > > > > 
> > > > > ...unfortunately that doesn't always help.  I've reduced an example into:
> > > > > 
> > > > >     struct optional {
> > > > >       constexpr optional(int) {}
> > > > >     };
> > > > >     optional foo() { return 2; }
> > > > > 
> > > > > where check_return_expr creates a COMPOUND_EXPR:
> > > > > 
> > > > >           retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
> > > > >                            TREE_OPERAND (retval, 0));
> > > > > 
> > > > > where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
> > > > > > > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
> > > > > > > 
> > > > > > > FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
> > > > > > > is easy.  Previously, we would call C::C, so .gimple has:
> > > > > > > 
> > > > > > >      D.2590 = {};
> > > > > > >      C::C (&D.2590);
> > > > > > >      D.2597 = D.2590;
> > > > > > >      return D.2597;
> > > > > > > 
> > > > > > > Then .einline inlines the C::C call:
> > > > > > > 
> > > > > > >      D.2590 = {};
> > > > > > >      D.2590.a = {}; // #1
> > > > > > >      D.2590.b = 0;  // #2
> > > > > > >      D.2597 = D.2590;
> > > > > > >      D.2590 ={v} {CLOBBER(eos)};
> > > > > > >      return D.2597;
> > > > > > > 
> > > > > > > then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
> > > > > > > passes.  But with the patch, .gimple won't have that C::C call, so the
> > > > > > > IL is of course going to look different.
> > > > > > 
> > > > > > Maybe -fno-inline instead of the --param?
> > > > > 
> > > > > Then that C::C call isn't inlined and the test fails :/.
> > > > > > > Unfortunately, pr78687.C is much more complicated and I can't explain
> > > > > > > precisely what happens there.  But it seems like a good idea to have
> > > > > > > a way to avoid this optimization.  So I've added the "noinline" check.
> > > > > > 
> > > > > > Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
> > > > > > constructors are not constexpr.  And I wouldn't expect the optimization to
> > > > > > affect the value-initialization option_2().
> > > > > 
> > > > > In pr78687.C we do this new optimization only once for
> > > > > "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept (std::is_nothrow_constructible<T, U&&>::value)".
> > > > 
> > > > Aha.  Can we remove its constexpr?
> > > > ...no, it's defaulted.  And moving the defaulting after the class also
> > > > breaks the testcase.
> 
> FWIW, removing EGGS_CXX11_CONSTEXPR on line 360 "fixes" the test.
> 
> > > > > > > 	PR c++/116416
> > > > > > > 
> > > > > > > gcc/cp/ChangeLog:
> > > > > > > 
> > > > > > > 	* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
> > > > > > > 	TARGET_EXPR_INITIAL and replace it with the folded result if
> > > > > > > 	it's TREE_CONSTANT.
> > > > > > > 
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > 
> > > > > > > 	* g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > > > > > > 	* g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> > > > > > > 	* g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> > > > > > > 	* g++.dg/tree-ssa/pr90883.C: Likewise.
> > > > > > > 	* g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> > > > > > > 
> > > > > > > Co-authored-by: Patrick Palka <ppalka@redhat.com>
> > > > > > > ---
> > > > > > >     gcc/cp/cp-gimplify.cc                         | 14 +++++++++
> > > > > > >     gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
> > > > > > >     .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 +++++++++++++++++++
> > > > > > >     gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
> > > > > > >     gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
> > > > > > >     gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
> > > > > > >     6 files changed, 50 insertions(+), 3 deletions(-)
> > > > > > >     create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> > > > > > > 
> > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > > index 003e68f1ea7..41d6333f650 100644
> > > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
> > > > > > >     	 that case, strip it in favor of this one.  */
> > > > > > >           if (tree &init = TARGET_EXPR_INITIAL (stmt))
> > > > > > >     	{
> > > > > > > +	  tree fn;
> > > > > > > +	  if ((data->flags & ff_genericize)
> > > > > > > +	      /* Give the user an option to opt out.  */
> > > > > > > +	      && !((fn = current_function_decl)
> > > > > > > +		   && lookup_attribute ("noinline",
> > > > > > > +					DECL_ATTRIBUTES (fn))))
> > > > > > 
> > > > > > This looks backwards from the usual sense of noinline, which suppresses
> > > > > > inlining of the function so marked, rather than inlining of other functions
> > > > > > called.  If you want to check noinline, you need to dig out the called
> > > > > > function (if any), perhaps with extract_call_expr.
> > > > > 
> > > > > Now I wish I hadn't done that.  But for implicit functions, like in
> > > > > pr90883.C where we have
> > > > > 
> > > > >       class C
> > > > >       {
> > > > >           char a[7]{};
> > > > >           int b{};
> > > > >       };
> > > > > there's nothing to make noinline anyway.  So I'm stuck.  I suppose
> > > > > it wouldn't make sense to use -fno-fold-simple-inlines to disable
> > > > > this optimization.  Can I abuse -fearly-inlining?  I don't want a new
> > > > > flag for this :(.
> > > > > 
> > > > > Or should we just adjust the tests?
> > > > 
> > > > I think so, yes.  Probably so that they test that the output doesn't contain
> > > > the redundancy that the PR complains about; reaching optimal output by
> > > > another path isn't a problem.
> > > > 
> > > > For pr78687, I guess that means checking the number of aggregate
> > > > temporaries.
> > > > 
> > > > For pr90883, maybe check for the lack of ".a =" and ".b = "?
> > > 
> > > Sure.
> > > > What is the .optimized for these two tests after your patch?
> > > 
> > > pr90883 is fine, the dumps are the same.  In pr78687 they very much
> > > aren't.
> > > 
> > > Let's see what's happening here.  In .gimple, the difference is
> > > in make_object_1:
> > > 
> > >   struct ref_proxy make_object_1 ()
> > >   {
> > >     struct variant D.9472;
> > > -  struct option_2 D.9273;
> > > 
> > >     try
> > >       {
> > > -      try
> > > -        {
> > > -          eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472, &D.9273);
> > > -          ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy (&<retval>, D.9472);
> > > -          return <retval>;
> > > -        }
> > > -      finally
> > > -        {
> > > -          D.9273 = {CLOBBER(eos)};
> > > -        }
> > > +      D.9472 = {};
> > > +      D.9472._storage._which = 2;
> > > +      ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy (&<retval>, D.9472);
> > > +      return <retval>;
> > > 
> > > that makes sense: the new optimization "inlined" the
> > > eggs::variants::variant<option_1, option_2>::variant<option_2> call.  So later
> > > .einline has nothing to expand here whereas previously
> > > 
> > >    eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472, &D.9273);
> > > 
> > > was expanded into
> > > 
> > >    MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
> > >    MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
> > >    MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
> > >    MEM[(struct _storage *)&D.9472]._which = 2;
> > > 
> > > Then make_object_1 gets inlined.  Something happens in SRA.  And
> > > then we got rid of a lot of code.  But now a lot of code remains.
> > > 
> > > Is it simply the fact that with this opt we expand the ctor into
> > > 
> > >    D.9472 = {};
> > >    D.9472._storage._which = 2;
> > > 
> > > which means that we zero-init the variant object and then set _which to 2,
> > > while previously we just allocated storage and set _which to 2?
> > 
> > That seems likely since the patch that fixed the bug before was dealing with
> > partially-initialized objects.  Why does the optimization change that?
> 
> CCing a few optimizer folks.  To recap, we're talking about tree-ssa/pr78687.C.
> In in, there's:
> 
>         EGGS_CXX11_CONSTEXPR variant(U&& v)
>             noexcept(
>                 std::is_nothrow_constructible<T, U&&>::value)
>           : _storage{detail::index<I + 1>{}, std::forward<U>(v)}
>         {}
> 
> With a new C++ FE optimization this patch introduces, we can evaluate the call
> at compile time.  Then .gimple looks a little different (see above).  What is
> not clear is why we can't optimize the code as much as without this patch,
> when the variant call isn't evaluated at compile time, and instead we produce
> those MEMs as shown above.
> 
> Any insights would be appreciated.
>  
> This is .optimized with the opt on:
> 
> __attribute__((noinline))
> struct ref_proxy f ()
> {
>    struct ref_proxy ptr;
>    struct ref_proxy D.10036;
>    struct ref_proxy type;
>    struct ref_proxy type;
>    struct qual_option D.10031;
>    struct ref_proxy D.10030;
>    struct qual_option inner;
>    struct variant t;
>    struct variant D.10026;
>    struct variant D.10024;
>    struct inplace_ref D.10023;
>    struct inplace_ref ptr;
>    struct ref_proxy D.9898;
> 
>    <bb 2> [local count: 1073741824]:
>    MEM <char[40]> [(struct variant *)&D.10024] = {};

Without actually checking it might be that SRA chokes on the above.
The IL is basically a huge chain of aggregate copies interspersed
with clobbers and occasional scalar inits and I fear that we really
only have SRA dealing with this.

Is there any reason to use the char[40] init instead of a aggregate
{} init of type variant?

I would suggest to open a bugreport.

>    D.10024._storage._which = 2;
>    D.10026 = D.10024;
>    t = D.10026;
>    MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
>    MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
>    t ={v} {CLOBBER(eos)};
>    D.10026 ={v} {CLOBBER(eos)};
>    D.10024 ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct ref_proxy *)&D.9898 + 40B] = 2;
>    D.10036 = D.9898;
>    ptr = D.10036;
>    MEM[(struct variant_ref *)&D.10030] ={v} {CLOBBER(bob)};
>    MEM[(struct variant_ref *)&D.10030].inner_storage_ = ptr.D.9270.inner_storage_;
>    ptr ={v} {CLOBBER(eos)};
>    D.10036 ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct ref_proxy *)&D.10030 + 40B] = 2;
>    type = D.10030;
>    type = type;
>    MEM[(struct __as_base  &)&D.10031] ={v} {CLOBBER(bob)};
>    D.10031.type_ = type;
>    type ={v} {CLOBBER(eos)};
>    type ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct qual_option *)&D.10031 + 40B] = 2;
>    D.10031.quals_ = 0;
>    inner = D.10031;
>    D.10023 ={v} {CLOBBER(bob)};
>    D.10023.inner_ = inner;
>    inner ={v} {CLOBBER(eos)};
>    D.10030 ={v} {CLOBBER(eos)};
>    D.10031 ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct inplace_ref *)&D.10023 + 40B] = 2;
>    MEM <int> [(struct inplace_ref *)&D.10023 + 48B] = 0;
>    ptr = D.10023;
>    <retval> ={v} {CLOBBER(bob)};
>    <retval>.D.9858 = ptr;
>    ptr ={v} {CLOBBER(eos)};
>    D.9898 ={v} {CLOBBER(eos)};
>    return <retval>;
> 
> }
> 
> and this is the result without this patch:
> 
> __attribute__((noinline))
> struct ref_proxy f ()
> {
>   <bb 2> [local count: 1073741824]:
>   <retval> ={v} {CLOBBER(bob)};
>   MEM <size_t> [(struct inplace_ref *)&<retval> + 40B] = 2;
>   MEM <int> [(struct inplace_ref *)&<retval> + 48B] = 0;
>   return <retval>;
> 
> }
> 
>
  
Jakub Jelinek Sept. 27, 2024, 8:50 a.m. UTC | #12
On Fri, Sep 27, 2024 at 08:16:43AM +0200, Richard Biener wrote:
> > __attribute__((noinline))
> > struct ref_proxy f ()
> > {
> >    struct ref_proxy ptr;
> >    struct ref_proxy D.10036;
> >    struct ref_proxy type;
> >    struct ref_proxy type;
> >    struct qual_option D.10031;
> >    struct ref_proxy D.10030;
> >    struct qual_option inner;
> >    struct variant t;
> >    struct variant D.10026;
> >    struct variant D.10024;
> >    struct inplace_ref D.10023;
> >    struct inplace_ref ptr;
> >    struct ref_proxy D.9898;
> > 
> >    <bb 2> [local count: 1073741824]:
> >    MEM <char[40]> [(struct variant *)&D.10024] = {};
> 
> Without actually checking it might be that SRA chokes on the above.
> The IL is basically a huge chain of aggregate copies interspersed
> with clobbers and occasional scalar inits and I fear that we really
> only have SRA dealing with this.

True.

> Is there any reason to use the char[40] init instead of a aggregate
> {} init of type variant?

It is dse1 which introduces that:
-  D.10137 = {};
+  MEM <char[40]> [(struct variant *)&D.10137] = {};
in particular maybe_trim_constructor_store.

So, if SRA chokes on it, it better be fixed to deal with that,
DSE can create that any time.
Though, not sure how to differentiate that from the actual C++ zero
initialization where it is supposed to clear also padding bits if any.
I think a CONSTRUCTOR flag for that would be best, though e.g. in GIMPLE
representing such clears including padding bits with
MEM <char[sizeof (struct whatever)]> [(struct whatever *)&D.whatever] = {};
might be an option too.  But then how to represent the DSE constructor
trimming such that it is clear that it still doesn't initialize the padding
bits?
Anyway, even if padding bits are zero initialized, if SRA could see that
nothing really inspects those padding bits, it would be nice to still optimize
it.

That said, it is
a union of a struct with 5 pointers (i.e. 40 bytes) and an empty struct
(1 byte, with padding) followed by size_t which_ field (the = 2 store).

And, I believe when not constexpr evaluating this, there actually is no
clearing before the = 2; store,
void eggs::variants::detail::_storage<eggs::variants::detail::pack<eggs::variants::detail::empty, option_1, option_2>, true, true>::_storage<2, option_2> (struct _storage * const thi
s, struct index which, struct option_2 & args#0)
{
  struct index D.10676;

  *this = {CLOBBER(bob)};
  {
    _1 = &this->D.9542;
    eggs::variants::detail::_union<eggs::variants::detail::pack<eggs::variants::detail::empty, option_1, option_2>, true>::_union<2, option_2> (_1, D.10676, args#0);
    this->_which = 2;
  }
}
and the call in there is just 3 nested inline calls which do some clobbers
too, take address of something and call another inline and in the end
it is just a clobber and nothing else.

So, another thing to look at is whether the CONSTRUCTOR is
CONSTRUCTOR_NO_CLEARING or not and if that is ok, and/or whether
the gimplification is correct in that case (say if it would be
struct S { union U { struct T { void *a, *b, *c, *d, *e; } t; struct V {} v; } u; unsigned long w; };
void bar (S *);

void
foo ()
{
  S s = { .u = { .v = {} }, .w = 2 };
  bar (&s);
}
why do we expand it as
      s = {};
      s.w = 2;
when just
      s.w = 2;
or maybe
      s.u.v = {};
      s.w = 2;
would be enough.  Because when the large union has just a small member
(in this case empty struct) active, clearing the whole union is really a
waste of time.

> I would suggest to open a bugreport.

Yes.

	Jakub
  
Richard Biener Sept. 27, 2024, 10:14 a.m. UTC | #13
On Fri, 27 Sep 2024, Jakub Jelinek wrote:

> On Fri, Sep 27, 2024 at 08:16:43AM +0200, Richard Biener wrote:
> > > __attribute__((noinline))
> > > struct ref_proxy f ()
> > > {
> > >    struct ref_proxy ptr;
> > >    struct ref_proxy D.10036;
> > >    struct ref_proxy type;
> > >    struct ref_proxy type;
> > >    struct qual_option D.10031;
> > >    struct ref_proxy D.10030;
> > >    struct qual_option inner;
> > >    struct variant t;
> > >    struct variant D.10026;
> > >    struct variant D.10024;
> > >    struct inplace_ref D.10023;
> > >    struct inplace_ref ptr;
> > >    struct ref_proxy D.9898;
> > > 
> > >    <bb 2> [local count: 1073741824]:
> > >    MEM <char[40]> [(struct variant *)&D.10024] = {};
> > 
> > Without actually checking it might be that SRA chokes on the above.
> > The IL is basically a huge chain of aggregate copies interspersed
> > with clobbers and occasional scalar inits and I fear that we really
> > only have SRA dealing with this.
> 
> True.
> 
> > Is there any reason to use the char[40] init instead of a aggregate
> > {} init of type variant?
> 
> It is dse1 which introduces that:
> -  D.10137 = {};
> +  MEM <char[40]> [(struct variant *)&D.10137] = {};
> in particular maybe_trim_constructor_store.
> 
> So, if SRA chokes on it, it better be fixed to deal with that,
> DSE can create that any time.

Agreed, it might be non-trivial though.

> Though, not sure how to differentiate that from the actual C++ zero
> initialization where it is supposed to clear also padding bits if any.
> I think a CONSTRUCTOR flag for that would be best, though e.g. in GIMPLE
> representing such clears including padding bits with
> MEM <char[sizeof (struct whatever)]> [(struct whatever *)&D.whatever] = {};
> might be an option too.  But then how to represent the DSE constructor
> trimming such that it is clear that it still doesn't initialize the padding
> bits?
> Anyway, even if padding bits are zero initialized, if SRA could see that
> nothing really inspects those padding bits, it would be nice to still optimize
> it.
>
> That said, it is
> a union of a struct with 5 pointers (i.e. 40 bytes) and an empty struct
> (1 byte, with padding) followed by size_t which_ field (the = 2 store).
> 
> And, I believe when not constexpr evaluating this, there actually is no
> clearing before the = 2; store,
> void eggs::variants::detail::_storage<eggs::variants::detail::pack<eggs::variants::detail::empty, option_1, option_2>, true, true>::_storage<2, option_2> (struct _storage * const thi
> s, struct index which, struct option_2 & args#0)
> {
>   struct index D.10676;
> 
>   *this = {CLOBBER(bob)};
>   {
>     _1 = &this->D.9542;
>     eggs::variants::detail::_union<eggs::variants::detail::pack<eggs::variants::detail::empty, option_1, option_2>, true>::_union<2, option_2> (_1, D.10676, args#0);
>     this->_which = 2;
>   }
> }
> and the call in there is just 3 nested inline calls which do some clobbers
> too, take address of something and call another inline and in the end
> it is just a clobber and nothing else.
> 
> So, another thing to look at is whether the CONSTRUCTOR is
> CONSTRUCTOR_NO_CLEARING or not and if that is ok, and/or whether
> the gimplification is correct in that case (say if it would be
> struct S { union U { struct T { void *a, *b, *c, *d, *e; } t; struct V {} v; } u; unsigned long w; };
> void bar (S *);
> 
> void
> foo ()
> {
>   S s = { .u = { .v = {} }, .w = 2 };
>   bar (&s);
> }
> why do we expand it as
>       s = {};
>       s.w = 2;
> when just
>       s.w = 2;
> or maybe
>       s.u.v = {};
>       s.w = 2;
> would be enough.  Because when the large union has just a small member
> (in this case empty struct) active, clearing the whole union is really a
> waste of time.
> 
> > I would suggest to open a bugreport.
> 
> Yes.

I can investigate a bit when there's a testcase showing the issue.

Richard.
  
Jakub Jelinek Sept. 27, 2024, 2:01 p.m. UTC | #14
On Fri, Sep 27, 2024 at 12:14:47PM +0200, Richard Biener wrote:
> I can investigate a bit when there's a testcase showing the issue.

The testcase is pr78687.C with Marek's cp-gimplify.cc patch.

Or the
struct S { union U { struct T { void *a, *b, *c, *d, *e; } t; struct V {} v; } u; unsigned long w; };
void bar (struct S *);

void
foo ()
{
  struct S s = { .u = { .v = {} }, .w = 2 };
  bar (&s);
}
I've mentioned shows the same behavior except for SRA (which
is there of course prevented through the object escaping).
Though, not really sure right now if this reduced testcase
in C or C++ actually isn't supposed to clear the whole object rather than
just the initialized fields and what exactly is CONSTRUCTOR_NO_CLEARING
vs. !CONSTRUCTOR_NO_CLEARING supposed to mean.

On pr78687.C with Marek's patch, CONSTRUCTOR_NO_CLEARING is cleared with
  /* The result of a constexpr function must be completely initialized.

     However, in C++20, a constexpr constructor doesn't necessarily have
     to initialize all the fields, so we don't clear CONSTRUCTOR_NO_CLEARING
     in order to detect reading an unitialized object in constexpr instead
     of value-initializing it.  (reduced_constant_expression_p is expected to
     take care of clearing the flag.)  */
  if (TREE_CODE (result) == CONSTRUCTOR
      && (cxx_dialect < cxx20
          || !DECL_CONSTRUCTOR_P (fun)))
    clear_no_implicit_zero (result);

generic.texi says:
"Unrepresented fields will be cleared (zeroed), unless the
CONSTRUCTOR_NO_CLEARING flag is set, in which case their value becomes
undefined."
Now, for RECORD_TYPE, I think the !CONSTRUCTOR_NO_CLEARING meaning is clear,
if the flag isn't set, then if there is no constructor_elt for certain
FIELD_DECL, that FIELD_DECL is implicitly zeroed.  The state of padding bits
is fuzzy, we don't really have a flag whether the CONSTRUCTOR clears also
padding bits (aka C++ zero initialization) or not.
The problem above is with UNION_TYPE.  If the CONSTRUCTOR for it is empty,
that should IMHO still imply zero initialization of the whole thing, we
don't really know which union member is active.  But if the CONSTRUCTOR
has one elt (it should never have more than one), shouldn't that mean
(unless there is a new flag which says that padding bits are cleared too)
that CONSTRUCTOR_NO_CLEARING and !CONSTRUCTOR_NO_CLEARING behave the same,
in particular that the selected FIELD_DECL is initialized to whatever
the initializer is but nothing else is?

The reason why the gimplifier clears the whole struct is because
on (with Marek's patch on the pr78687.C testcase)
D.10137 = {._storage={.D.9542={.D.9123={._tail={.D.9181={._tail={.D.9240={._head={}}}}}}}, ._which=2}};
or that
s = {.u={.v={}}, .w=2};
in the above testcase, categorize_ctor_elements yields
valid_const_initializer = true
num_nonzero_elements = 1
num_unique_nonzero_elements = 1
num_ctor_elements = 1
complete_p = false
- there is a single non-empty initializer in both CONSTRUCTORs,
they aren't CONSTRUCTOR_NO_CLEARING and the reason complete_p is false
is that categorize_ctor_elements_1 does:
  if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
                                                num_fields, elt_type))
    *p_complete = 0;
  else if (*p_complete > 0
           && type_has_padding_at_level_p (TREE_TYPE (ctor)))
    *p_complete = -1;
and for UNION_TYPE/QUAL_UNION_TYPE complete_ctor_at_level_p does:
      if (num_elts == 0)
        return false;
...
      /* ??? We could look at each element of the union, and find the
         largest element.  Which would avoid comparing the size of the
         initialized element against any tail padding in the union.
         Doesn't seem worth the effort...  */
      return simple_cst_equal (TYPE_SIZE (type), TYPE_SIZE (last_type)) == 1;
Now, given the documentation of categorize_ctor_elements:
   * whether the constructor is complete -- in the sense that every
     meaningful byte is explicitly given a value --
     and place it in *P_COMPLETE:
     -  0 if any field is missing
     -  1 if all fields are initialized, and there's no padding
     - -1 if all fields are initialized, but there's padding
I'd argue this handling of UNION_TYPE/QUAL_UNION_TYPE is wrong
(though note that type_has_padding_at_level_p returns true if any of the
union members is smaller than the whole, rather than checking whether
the actually initialized one has the same size as whole), it will
set *p_complete to 0 as if any field is missing, even when no field
is missing, just the union has padding.

So, I think we should go with (but so far completely untested except
for pr78687.C which is optimized with Marek's patch and the above testcase
which doesn't have the clearing anymore) the following patch.

2024-09-27  Jakub Jelinek  <jakub@redhat.com>

	PR c++/116416
	* expr.cc (categorize_ctor_elements_1): Fix up union handling of
	*p_complete.  Clear it only if num_fields is 0 and the union has
	at least one FIELD_DECL, set to -1 if either union has no fields
	and non-zero size, or num_fields is 1 and complete_ctor_at_level_p
	returned false.
	* gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
	handling, return also true for UNION_TYPE with no FIELD_DECLs
	and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.

--- gcc/expr.cc.jj	2024-09-04 12:09:22.598808244 +0200
+++ gcc/expr.cc	2024-09-27 15:34:20.929282525 +0200
@@ -7218,7 +7218,36 @@ categorize_ctor_elements_1 (const_tree c
 
   if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
 						num_fields, elt_type))
-    *p_complete = 0;
+    {
+      if (TREE_CODE (TREE_TYPE (ctor)) == UNION_TYPE
+	  || TREE_CODE (TREE_TYPE (ctor)) == QUAL_UNION_TYPE)
+	{
+	  /* The union case is more complicated.  */
+	  if (num_fields == 0)
+	    {
+	      /* If the CONSTRUCTOR doesn't have any elts, it is
+		 incomplete if the union has at least one field.  */
+	      for (tree f = TYPE_FIELDS (TREE_TYPE (ctor));
+		   f; f = DECL_CHAIN (f))
+		if (TREE_CODE (f) == FIELD_DECL)
+		  {
+		    *p_complete = 0;
+		    break;
+		  }
+	      /* Otherwise it has padding if the union has non-zero size.  */
+	      if (*p_complete > 0
+		  && !integer_zerop (TYPE_SIZE (TREE_TYPE (ctor))))
+		*p_complete = -1;
+	    }
+	  /* Otherwise the CONSTRUCTOR should have exactly one element.
+	     It is then never incomplete, but if complete_ctor_at_level_p
+	     returned false, it has padding.  */
+	  else if (*p_complete > 0)
+	    *p_complete = -1;
+	}
+      else
+	*p_complete = 0;
+    }
   else if (*p_complete > 0
 	   && type_has_padding_at_level_p (TREE_TYPE (ctor)))
     *p_complete = -1;
--- gcc/gimple-fold.cc.jj	2024-09-09 11:25:43.197048840 +0200
+++ gcc/gimple-fold.cc	2024-09-27 15:44:05.896355615 +0200
@@ -4814,12 +4814,22 @@ type_has_padding_at_level_p (tree type)
 	return false;
       }
     case UNION_TYPE:
+    case QUAL_UNION_TYPE:
+      bool any_fields;
+      any_fields = false;
       /* If any of the fields is smaller than the whole, there is padding.  */
       for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
-	if (TREE_CODE (f) == FIELD_DECL)
-	  if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
-				TREE_TYPE (type)) != 1)
-	    return true;
+	if (TREE_CODE (f) != FIELD_DECL)
+	  continue;
+	else if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
+				   TYPE_SIZE (TREE_TYPE (type))) != 1)
+	  return true;
+	else
+	  any_fields = true;
+      /* If the union doesn't have any fields and still has non-zero size,
+	 all of it is padding.  */
+      if (!any_fields && !integer_zerop (TYPE_SIZE (type)))
+	return true;
       return false;
     case ARRAY_TYPE:
     case COMPLEX_TYPE:


	Jakub
  
Richard Biener Sept. 28, 2024, 10:15 a.m. UTC | #15
On Fri, 27 Sep 2024, Jakub Jelinek wrote:

> On Fri, Sep 27, 2024 at 12:14:47PM +0200, Richard Biener wrote:
> > I can investigate a bit when there's a testcase showing the issue.
> 
> The testcase is pr78687.C with Marek's cp-gimplify.cc patch.

OK, I can reproduce.  The main issue I see is that SRA refuses to totally
scalarize any union type (or a type which recursively contains a union).
That's because for total scalarization it will attempt to create 
replacements according to the component types - for unions it would need
to arrange to block-copy it - certainly possible, esp. when none of
its field is accessed, but that will end up as aggregate copy chain
probably not solving things.  For the testcase the union involved is
also quite large (40 bytes).

There are also two variables in the chain disqualified for total
scalarization, one is because

MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;

"contains a VCE" according to contains_vce_or_bfcref_p, in this case
the MEM accesses D.9898 in a type that doesn't match its declaration.
Same for the other case:

MEM[(struct variant_ref *)&D.10030].inner_storage_ = 
ptr.D.9270.inner_storage_;

I'm not sure whether that changed with the patch - without the patch
it seems we avoid touching the union portion at all.  The cruical
difference indeed seems to be

-  MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
-  MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
-  MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
-  MEM[(struct _storage *)&D.9472]._which = 2;
+  D.9472 = {};
+  D.9472._storage._which = 2;

where the extra zeroing triggers the need for total scalarization of it - 
otherwise only _which is determined as used and the rest of the
objects ignored.

So maybe instead of teaching SRA about unions zero-init could be
special-cased as "re-materializable".  OTOH SRA is already full of
too many special-cases.

I'll note that the bob/eos CLOBBERs unfortunately have many of the
aggregates have overlapping lifetimes:

  D.10024 = {}; 
  D.10024._storage._which = 2;
  D.10026 = D.10024;

^^^ D.10024 should be dead here

  t = D.10026;
  MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
  MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
  t ={v} {CLOBBER(eos)};
  D.10026 ={v} {CLOBBER(eos)};
  D.10024 ={v} {CLOBBER(eos)};

^^^ but EOS here, even after D.10026.  For cases like in this testcase
doing BB-local aggregate liveness analysis, relying on bob/eos CLOBBERS
only to determine if an aggregate is BB-local could make for a relatively
cheap aggregate "re-use" pass, replacing D.10026 with D.10024 above.
Note bob CLOBBERS are missing for most vars.  This kind of analysis
could also help SRA code generation as it could generate only one set
of scalar replacements for a chain of aggregate copies.

> Or the
> struct S { union U { struct T { void *a, *b, *c, *d, *e; } t; struct V {} v; } u; unsigned long w; };
> void bar (struct S *);
> 
> void
> foo ()
> {
>   struct S s = { .u = { .v = {} }, .w = 2 };
>   bar (&s);
> }
> I've mentioned shows the same behavior except for SRA (which
> is there of course prevented through the object escaping).
> Though, not really sure right now if this reduced testcase
> in C or C++ actually isn't supposed to clear the whole object rather than
> just the initialized fields and what exactly is CONSTRUCTOR_NO_CLEARING
> vs. !CONSTRUCTOR_NO_CLEARING supposed to mean.
> 
> On pr78687.C with Marek's patch, CONSTRUCTOR_NO_CLEARING is cleared with
>   /* The result of a constexpr function must be completely initialized.
> 
>      However, in C++20, a constexpr constructor doesn't necessarily have
>      to initialize all the fields, so we don't clear CONSTRUCTOR_NO_CLEARING
>      in order to detect reading an unitialized object in constexpr instead
>      of value-initializing it.  (reduced_constant_expression_p is expected to
>      take care of clearing the flag.)  */
>   if (TREE_CODE (result) == CONSTRUCTOR
>       && (cxx_dialect < cxx20
>           || !DECL_CONSTRUCTOR_P (fun)))
>     clear_no_implicit_zero (result);
> 
> generic.texi says:
> "Unrepresented fields will be cleared (zeroed), unless the
> CONSTRUCTOR_NO_CLEARING flag is set, in which case their value becomes
> undefined."
> Now, for RECORD_TYPE, I think the !CONSTRUCTOR_NO_CLEARING meaning is clear,
> if the flag isn't set, then if there is no constructor_elt for certain
> FIELD_DECL, that FIELD_DECL is implicitly zeroed.  The state of padding bits
> is fuzzy, we don't really have a flag whether the CONSTRUCTOR clears also
> padding bits (aka C++ zero initialization) or not.
> The problem above is with UNION_TYPE.  If the CONSTRUCTOR for it is empty,
> that should IMHO still imply zero initialization of the whole thing, we
> don't really know which union member is active.  But if the CONSTRUCTOR
> has one elt (it should never have more than one), shouldn't that mean
> (unless there is a new flag which says that padding bits are cleared too)
> that CONSTRUCTOR_NO_CLEARING and !CONSTRUCTOR_NO_CLEARING behave the same,
> in particular that the selected FIELD_DECL is initialized to whatever
> the initializer is but nothing else is?
> 
> The reason why the gimplifier clears the whole struct is because
> on (with Marek's patch on the pr78687.C testcase)
> D.10137 = {._storage={.D.9542={.D.9123={._tail={.D.9181={._tail={.D.9240={._head={}}}}}}}, ._which=2}};
> or that
> s = {.u={.v={}}, .w=2};
> in the above testcase, categorize_ctor_elements yields
> valid_const_initializer = true
> num_nonzero_elements = 1
> num_unique_nonzero_elements = 1
> num_ctor_elements = 1
> complete_p = false
> - there is a single non-empty initializer in both CONSTRUCTORs,
> they aren't CONSTRUCTOR_NO_CLEARING and the reason complete_p is false
> is that categorize_ctor_elements_1 does:
>   if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
>                                                 num_fields, elt_type))
>     *p_complete = 0;
>   else if (*p_complete > 0
>            && type_has_padding_at_level_p (TREE_TYPE (ctor)))
>     *p_complete = -1;
> and for UNION_TYPE/QUAL_UNION_TYPE complete_ctor_at_level_p does:
>       if (num_elts == 0)
>         return false;
> ...
>       /* ??? We could look at each element of the union, and find the
>          largest element.  Which would avoid comparing the size of the
>          initialized element against any tail padding in the union.
>          Doesn't seem worth the effort...  */
>       return simple_cst_equal (TYPE_SIZE (type), TYPE_SIZE (last_type)) == 1;
> Now, given the documentation of categorize_ctor_elements:
>    * whether the constructor is complete -- in the sense that every
>      meaningful byte is explicitly given a value --
>      and place it in *P_COMPLETE:
>      -  0 if any field is missing
>      -  1 if all fields are initialized, and there's no padding
>      - -1 if all fields are initialized, but there's padding
> I'd argue this handling of UNION_TYPE/QUAL_UNION_TYPE is wrong

I agree (it's pessimistic).

> (though note that type_has_padding_at_level_p returns true if any of the
> union members is smaller than the whole, rather than checking whether
> the actually initialized one has the same size as whole), it will
> set *p_complete to 0 as if any field is missing, even when no field
> is missing, just the union has padding.
> 
> So, I think we should go with (but so far completely untested except
> for pr78687.C which is optimized with Marek's patch and the above testcase
> which doesn't have the clearing anymore) the following patch.
> 
> 2024-09-27  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR c++/116416
> 	* expr.cc (categorize_ctor_elements_1): Fix up union handling of
> 	*p_complete.  Clear it only if num_fields is 0 and the union has
> 	at least one FIELD_DECL, set to -1 if either union has no fields
> 	and non-zero size, or num_fields is 1 and complete_ctor_at_level_p
> 	returned false.
> 	* gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
> 	handling, return also true for UNION_TYPE with no FIELD_DECLs
> 	and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.
> 
> --- gcc/expr.cc.jj	2024-09-04 12:09:22.598808244 +0200
> +++ gcc/expr.cc	2024-09-27 15:34:20.929282525 +0200
> @@ -7218,7 +7218,36 @@ categorize_ctor_elements_1 (const_tree c
>  
>    if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
>  						num_fields, elt_type))
> -    *p_complete = 0;
> +    {
> +      if (TREE_CODE (TREE_TYPE (ctor)) == UNION_TYPE
> +	  || TREE_CODE (TREE_TYPE (ctor)) == QUAL_UNION_TYPE)
> +	{
> +	  /* The union case is more complicated.  */
> +	  if (num_fields == 0)
> +	    {
> +	      /* If the CONSTRUCTOR doesn't have any elts, it is
> +		 incomplete if the union has at least one field.  */

Hmm, but an empty CTOR is zero-initialization?

You know this code more than anybody else, so I suppose the change
is OK if it works.

Thanks,
Richard.

> +	      for (tree f = TYPE_FIELDS (TREE_TYPE (ctor));
> +		   f; f = DECL_CHAIN (f))
> +		if (TREE_CODE (f) == FIELD_DECL)
> +		  {
> +		    *p_complete = 0;
> +		    break;
> +		  }
> +	      /* Otherwise it has padding if the union has non-zero size.  */
> +	      if (*p_complete > 0
> +		  && !integer_zerop (TYPE_SIZE (TREE_TYPE (ctor))))
> +		*p_complete = -1;
> +	    }
> +	  /* Otherwise the CONSTRUCTOR should have exactly one element.
> +	     It is then never incomplete, but if complete_ctor_at_level_p
> +	     returned false, it has padding.  */
> +	  else if (*p_complete > 0)
> +	    *p_complete = -1;
> +	}
> +      else
> +	*p_complete = 0;
> +    }
>    else if (*p_complete > 0
>  	   && type_has_padding_at_level_p (TREE_TYPE (ctor)))
>      *p_complete = -1;
> --- gcc/gimple-fold.cc.jj	2024-09-09 11:25:43.197048840 +0200
> +++ gcc/gimple-fold.cc	2024-09-27 15:44:05.896355615 +0200
> @@ -4814,12 +4814,22 @@ type_has_padding_at_level_p (tree type)
>  	return false;
>        }
>      case UNION_TYPE:
> +    case QUAL_UNION_TYPE:
> +      bool any_fields;
> +      any_fields = false;
>        /* If any of the fields is smaller than the whole, there is padding.  */
>        for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> -	if (TREE_CODE (f) == FIELD_DECL)
> -	  if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
> -				TREE_TYPE (type)) != 1)
> -	    return true;
> +	if (TREE_CODE (f) != FIELD_DECL)
> +	  continue;
> +	else if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
> +				   TYPE_SIZE (TREE_TYPE (type))) != 1)
> +	  return true;
> +	else
> +	  any_fields = true;
> +      /* If the union doesn't have any fields and still has non-zero size,
> +	 all of it is padding.  */
> +      if (!any_fields && !integer_zerop (TYPE_SIZE (type)))
> +	return true;
>        return false;
>      case ARRAY_TYPE:
>      case COMPLEX_TYPE:
> 
> 
> 	Jakub
> 
>
  

Patch

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 003e68f1ea7..41d6333f650 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1473,6 +1473,20 @@  cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
 	 that case, strip it in favor of this one.  */
       if (tree &init = TARGET_EXPR_INITIAL (stmt))
 	{
+	  tree fn;
+	  if ((data->flags & ff_genericize)
+	      /* Give the user an option to opt out.  */
+	      && !((fn = current_function_decl)
+		   && lookup_attribute ("noinline",
+					DECL_ATTRIBUTES (fn))))
+	    {
+	      tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt));
+	      if (folded != init && TREE_CONSTANT (folded))
+		{
+		  init = folded;
+		  break;
+		}
+	    }
 	  cp_walk_tree (&init, cp_fold_r, data, NULL);
 	  cp_walk_tree (&TARGET_EXPR_CLEANUP (stmt), cp_fold_r, data, NULL);
 	  *walk_subtrees = 0;
diff --git a/gcc/testsuite/g++.dg/analyzer/pr97116.C b/gcc/testsuite/g++.dg/analyzer/pr97116.C
index d8e08a73172..1c404c2ceb2 100644
--- a/gcc/testsuite/g++.dg/analyzer/pr97116.C
+++ b/gcc/testsuite/g++.dg/analyzer/pr97116.C
@@ -16,7 +16,7 @@  struct foo
 void test_1 (void)
 {
   foo *p = new(NULL) foo (42); // { dg-warning "non-null expected" "warning" }
-  // { dg-message "argument 'this' \\(\[^\n\]*\\) NULL where non-null expected" "final event" { target *-*-* } .-1 }
+  // { dg-message "argument 'this'( \\(\[^\n\]*\\))? NULL where non-null expected" "final event" { target *-*-* } .-1 }
 }
 
 int test_2 (void)
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
new file mode 100644
index 00000000000..45e3a382056
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
@@ -0,0 +1,29 @@ 
+// PR c++/116416
+// { dg-do compile { target c++14 } }
+
+struct Str {
+  constexpr Str() {}
+  constexpr Str(const char *instr) {
+      str = instr; length = 0;
+      for (auto index = 0; instr[index]; ++index) {
+        ++length;
+      }
+  }
+  const char *str = nullptr;
+  int length = 0;
+};
+extern void callback(Str str);
+void
+func1()
+{
+    callback(Str{"Test"});
+}
+void
+func2()
+{
+    Str str{"Test"};
+    callback(str);
+}
+
+// Check that we don't call Str::Str(char const*)
+// { dg-final { scan-assembler-not "_ZN3StrC1EPKc" } }
diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C b/gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C
index 30129a4a266..64c06fb0b7d 100644
--- a/gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C
+++ b/gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C
@@ -68,7 +68,7 @@  test (int i)
 {
   int r = g (42) + g(i);
   int t = k<int>(42)
-	    + k<int>(i); // { dg-bogus "call to|constant" "" { xfail *-*-* } }
+	    + k<int>(i); // { dg-bogus "call to|constant" }
   return r + t;
 }
 
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr78687.C b/gcc/testsuite/g++.dg/tree-ssa/pr78687.C
index 698458f0e9a..7f5306bf841 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr78687.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr78687.C
@@ -446,7 +446,10 @@  struct qual_option
     int quals_;
 };
 
-inline ref_proxy<option_2, option_ref > make_object_1()
+// Compile-time evaluation of TARGET_EXPR's initials in cp_fold_r/TARGET_EXPR
+// breaks this test.  The attribute prevents that early evaluation.
+__attribute__((noinline))
+ref_proxy<option_2, option_ref > make_object_1()
 {
     return ref_proxy<option_2, option_ref >(option_2());
 }
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr90883.C b/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
index 37df17d0b16..f66124945b6 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
@@ -7,6 +7,7 @@ 
         int b{};
     };
 
+    __attribute__((noinline))
     C slow()
     {
         return {};