tailc: Virtually undo IPA-VRP return value optimization for tail calls [PR118430]
Checks
| Context |
Check |
Description |
| linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Test passed
|
| linaro-tcwg-bot/tcwg_gcc_check--master-arm |
success
|
Test passed
|
Commit Message
Hi!
When we have return somefn (whatever); where somefn is normally tail
callable and IPA-VRP determines somefn returns a singleton range, VRP
just changes the IL to
somefn (whatever);
return 42;
(or whatever the value in that range is). The introduction of IPA-VRP
return value tracking then effectively regresses the tail call optimization.
This is even more important if the call is [[gnu::musttail]].
So, the following patch queries IPA-VRP whether a function returns singleton
range and if so and the value returned is identical to that, marks the
call as [tail call] anyway. If expansion decides it can't use the tail
call, we'll still expand the return 42; or similar statement, and if it
decides it can use the tail call, that part will be ignored and we'll emit
normal tail call.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2025-01-15 Jakub Jelinek <jakub@redhat.com>
Andrew Pinski <quic_apinski@quicinc.com>
PR tree-optimization/118430
* tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h,
symbol-summary.h, ipa-cp.h and ipa-prop.h.
(find_tail_calls): If ass_var is NULL and ret_var is not, check if
IPA-VRP has not found singleton return range for it. In that case,
don't punt if ret_var is the only value in that range. Adjust the
maybe_error_musttail message otherwise to diagnose different value
being returned from the caller and callee rather than using return
slot. Formatting fixes.
* c-c++-common/musttail14.c: New test.
Jakub
Comments
On Wed, 15 Jan 2025, Jakub Jelinek wrote:
> Hi!
>
> When we have return somefn (whatever); where somefn is normally tail
> callable and IPA-VRP determines somefn returns a singleton range, VRP
> just changes the IL to
> somefn (whatever);
> return 42;
> (or whatever the value in that range is). The introduction of IPA-VRP
> return value tracking then effectively regresses the tail call optimization.
> This is even more important if the call is [[gnu::musttail]].
>
> So, the following patch queries IPA-VRP whether a function returns singleton
> range and if so and the value returned is identical to that, marks the
> call as [tail call] anyway. If expansion decides it can't use the tail
> call, we'll still expand the return 42; or similar statement, and if it
> decides it can use the tail call, that part will be ignored and we'll emit
> normal tail call.
Interesting idea. I'd have said that for [[gnu::musttail]] we want to
disable the IPA transform instead? But I can see that when the user
wrote
somefn (whatever);
return 42;
the handling would enable tail-calling, but your patch only handles
it in the [[musttail]] failure path.
Anyway, I think we want to guard IPA transforms on [[gnu::musttail]]
return value which should be more robust?
Richard.
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2025-01-15 Jakub Jelinek <jakub@redhat.com>
> Andrew Pinski <quic_apinski@quicinc.com>
>
> PR tree-optimization/118430
> * tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h,
> symbol-summary.h, ipa-cp.h and ipa-prop.h.
> (find_tail_calls): If ass_var is NULL and ret_var is not, check if
> IPA-VRP has not found singleton return range for it. In that case,
> don't punt if ret_var is the only value in that range. Adjust the
> maybe_error_musttail message otherwise to diagnose different value
> being returned from the caller and callee rather than using return
> slot. Formatting fixes.
>
> * c-c++-common/musttail14.c: New test.
>
> --- gcc/tree-tailcall.cc.jj 2025-01-02 11:23:18.407490465 +0100
> +++ gcc/tree-tailcall.cc 2025-01-14 16:03:13.169294430 +0100
> @@ -45,6 +45,12 @@ along with GCC; see the file COPYING3.
> #include "ipa-utils.h"
> #include "tree-ssa-live.h"
> #include "diagnostic-core.h"
> +#include "gimple-range.h"
> +#include "alloc-pool.h"
> +#include "sreal.h"
> +#include "symbol-summary.h"
> +#include "ipa-cp.h"
> +#include "ipa-prop.h"
>
> /* The file implements the tail recursion elimination. It is also used to
> analyze the tail calls in general, passing the results to the rtl level
> @@ -483,7 +489,7 @@ find_tail_calls (basic_block bb, struct
> {
> if (dump_file)
> fprintf (dump_file, "Basic block %d has extra exit edges\n",
> - bb->index);
> + bb->index);
> return;
> }
> if (!cfun->has_musttail)
> @@ -517,7 +523,8 @@ find_tail_calls (basic_block bb, struct
> if (bad_stmt)
> {
> maybe_error_musttail (call,
> - _("memory reference or volatile after call"));
> + _("memory reference or volatile after "
> + "call"));
> return;
> }
> ass_var = gimple_call_lhs (call);
> @@ -597,10 +604,10 @@ find_tail_calls (basic_block bb, struct
> {
> if (stmt == last_stmt)
> maybe_error_musttail (call,
> - _("call may throw exception that does not propagate"));
> + _("call may throw exception that does not "
> + "propagate"));
> else
> - maybe_error_musttail (call,
> - _("code between call and return"));
> + maybe_error_musttail (call, _("code between call and return"));
> return;
> }
>
> @@ -715,7 +722,8 @@ find_tail_calls (basic_block bb, struct
> {
> if (local_live_vars)
> BITMAP_FREE (local_live_vars);
> - maybe_error_musttail (call, _("call invocation refers to locals"));
> + maybe_error_musttail (call,
> + _("call invocation refers to locals"));
> return;
> }
> else
> @@ -724,7 +732,8 @@ find_tail_calls (basic_block bb, struct
> if (bitmap_bit_p (local_live_vars, *v))
> {
> BITMAP_FREE (local_live_vars);
> - maybe_error_musttail (call, _("call invocation refers to locals"));
> + maybe_error_musttail (call,
> + _("call invocation refers to locals"));
> return;
> }
> }
> @@ -833,15 +842,39 @@ find_tail_calls (basic_block bb, struct
> && (ret_var != ass_var
> && !(is_empty_type (TREE_TYPE (ret_var)) && !ass_var)))
> {
> - maybe_error_musttail (call, _("call uses return slot"));
> - return;
> + bool ok = false;
> + value_range val;
> + tree valr;
> + /* If IPA-VRP proves called function always returns a singleton range,
> + the return value is replaced by the only value in that range.
> + For tail call purposes, pretend such replacement didn't happen. */
> + if (ass_var == NULL_TREE
> + && !tail_recursion
> + && TREE_CONSTANT (ret_var))
> + if (tree type = gimple_range_type (call))
> + if (tree callee = gimple_call_fndecl (call))
> + if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
> + && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
> + type)
> + && useless_type_conversion_p (TREE_TYPE (ret_var), type)
> + && ipa_return_value_range (val, callee)
> + && val.singleton_p (&valr)
> + && operand_equal_p (ret_var, valr, 0))
> + ok = true;
> + if (!ok)
> + {
> + maybe_error_musttail (call,
> + _("call and return value are different"));
> + return;
> + }
> }
>
> /* If this is not a tail recursive call, we cannot handle addends or
> multiplicands. */
> if (!tail_recursion && (m || a))
> {
> - maybe_error_musttail (call, _("operations after non tail recursive call"));
> + maybe_error_musttail (call,
> + _("operations after non tail recursive call"));
> return;
> }
>
> @@ -849,7 +882,8 @@ find_tail_calls (basic_block bb, struct
> if (m && POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl))))
> {
> maybe_error_musttail (call,
> - _("tail recursion with pointers can only use additions"));
> + _("tail recursion with pointers can only use "
> + "additions"));
> return;
> }
>
> --- gcc/testsuite/c-c++-common/musttail14.c.jj 2025-01-14 16:22:06.623276122 +0100
> +++ gcc/testsuite/c-c++-common/musttail14.c 2025-01-14 16:23:17.963267912 +0100
> @@ -0,0 +1,89 @@
> +/* PR tree-optimization/118430 */
> +/* { dg-do compile { target musttail } } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times " bar \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " freddy \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
> +
> +__attribute__ ((noipa)) void
> +foo (int x)
> +{
> + (void) x;
> +}
> +
> +__attribute__ ((noinline)) int
> +bar (int x)
> +{
> + foo (x);
> + return 1;
> +}
> +
> +__attribute__ ((noinline)) int
> +baz (int *x)
> +{
> + foo (*x);
> + return 2;
> +}
> +
> +__attribute__((noipa)) int
> +qux (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + [[gnu::musttail]]
> + return bar (x);
> +}
> +
> +__attribute__((noipa)) int
> +corge (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + return bar (x) + 1;
> +}
> +
> +__attribute__ ((noinline)) float
> +freddy (int x)
> +{
> + foo (x);
> + return 1.75f;
> +}
> +
> +__attribute__((noipa)) float
> +garply (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + [[gnu::musttail]]
> + return freddy (x);
> +}
> +
> +__attribute__((noipa)) float
> +quux (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + return freddy (x) + 0.25f;
> +}
> +
> +int v;
> +
> +int
> +main ()
> +{
> + qux (v);
> + corge (v);
> + garply (v);
> + quux (v);
> +}
>
> Jakub
>
>
On Wed, Jan 15, 2025 at 10:05:35AM +0100, Richard Biener wrote:
> > When we have return somefn (whatever); where somefn is normally tail
> > callable and IPA-VRP determines somefn returns a singleton range, VRP
> > just changes the IL to
> > somefn (whatever);
> > return 42;
> > (or whatever the value in that range is). The introduction of IPA-VRP
> > return value tracking then effectively regresses the tail call optimization.
> > This is even more important if the call is [[gnu::musttail]].
> >
> > So, the following patch queries IPA-VRP whether a function returns singleton
> > range and if so and the value returned is identical to that, marks the
> > call as [tail call] anyway. If expansion decides it can't use the tail
> > call, we'll still expand the return 42; or similar statement, and if it
> > decides it can use the tail call, that part will be ignored and we'll emit
> > normal tail call.
>
> Interesting idea. I'd have said that for [[gnu::musttail]] we want to
> disable the IPA transform instead?
The patch isn't just about [[gnu::musttail]], that is mentioned primarily
because it is then an error rather than just missed optimization.
The patch fixes a regression for the non-musttail cases, where before GCC 14
we used to tail call and we no longer do.
The maybe_error_musttail call in there does nothing if call isn't musttail
(except print a note into the dump file).
> But I can see that when the user
> wrote
>
> somefn (whatever);
> return 42;
>
> the handling would enable tail-calling, but your patch only handles
> it in the [[musttail]] failure path.
No, the patch handles that too.
> Anyway, I think we want to guard IPA transforms on [[gnu::musttail]]
> return value which should be more robust?
The question is what. I think not computing the return range for that case
is a bad idea, we want to propagate it to the caller. One thing I've
considered was extending the range in one of the directions (which?) so that
it isn't singleton, but there could be harm caused if we pick the wrong one.
So maybe just guard the spot which replaces the use of SSA_NAME with
singleton value in the return stmt with the constant if the
SSA_NAME_DEF_STMT is a musttail call? I guess that can be done if there
aren't too many spots which perform such replacement (and I'm afraid there
are, but haven't searched for them).
For non-musttail we shouldn't do that though, we don't know whether it is
tail-callable at all and whether having the constant in there is more
beneficial for optimizations or possibly keeping it as maybe tail callable.
And therefore I wrote the patch as is, it then fixes the 14/15 regression
for it and as a benefit can even handle the cases where user wrote somefn
(whatever); return 42; rather than return somefn (whatever);
I'm willing to at least try the punting on singleton replacements for
gnu::musttail though.
Jakub
On Wed, 15 Jan 2025, Jakub Jelinek wrote:
> On Wed, Jan 15, 2025 at 10:05:35AM +0100, Richard Biener wrote:
> > > When we have return somefn (whatever); where somefn is normally tail
> > > callable and IPA-VRP determines somefn returns a singleton range, VRP
> > > just changes the IL to
> > > somefn (whatever);
> > > return 42;
> > > (or whatever the value in that range is). The introduction of IPA-VRP
> > > return value tracking then effectively regresses the tail call optimization.
> > > This is even more important if the call is [[gnu::musttail]].
> > >
> > > So, the following patch queries IPA-VRP whether a function returns singleton
> > > range and if so and the value returned is identical to that, marks the
> > > call as [tail call] anyway. If expansion decides it can't use the tail
> > > call, we'll still expand the return 42; or similar statement, and if it
> > > decides it can use the tail call, that part will be ignored and we'll emit
> > > normal tail call.
> >
> > Interesting idea. I'd have said that for [[gnu::musttail]] we want to
> > disable the IPA transform instead?
>
> The patch isn't just about [[gnu::musttail]], that is mentioned primarily
> because it is then an error rather than just missed optimization.
> The patch fixes a regression for the non-musttail cases, where before GCC 14
> we used to tail call and we no longer do.
> The maybe_error_musttail call in there does nothing if call isn't musttail
> (except print a note into the dump file).
>
> > But I can see that when the user
> > wrote
> >
> > somefn (whatever);
> > return 42;
> >
> > the handling would enable tail-calling, but your patch only handles
> > it in the [[musttail]] failure path.
>
> No, the patch handles that too.
>
> > Anyway, I think we want to guard IPA transforms on [[gnu::musttail]]
> > return value which should be more robust?
>
> The question is what. I think not computing the return range for that case
> is a bad idea, we want to propagate it to the caller. One thing I've
> considered was extending the range in one of the directions (which?) so that
> it isn't singleton, but there could be harm caused if we pick the wrong one.
> So maybe just guard the spot which replaces the use of SSA_NAME with
> singleton value in the return stmt with the constant if the
> SSA_NAME_DEF_STMT is a musttail call? I guess that can be done if there
> aren't too many spots which perform such replacement (and I'm afraid there
> are, but haven't searched for them).
Yes. I'll note there's a PR (or a bunch of) which are about
x = FOO (y, ..);
<use of x>
vs.
FOO (y, ..);
<use of y>
for FOO returning an argument (like for example memcpy). Where neither
form is best in all cases. For your example above consider '42' being
a FP constant, we'd have to re-load that from the constant pool after
the call rather using the conveniently available copy in the return
register. Or on RISC archs it might be costly to materialize the
integer immediate.
So the question is whether such replacement is a good thing - yes,
we want the knowledge to simplify followup code, but copy-replacing
might not always be good.
I wonder if the IPA improvements now cause replacement for returns-ARG
case as well?
> For non-musttail we shouldn't do that though, we don't know whether it is
> tail-callable at all and whether having the constant in there is more
> beneficial for optimizations or possibly keeping it as maybe tail callable.
> And therefore I wrote the patch as is, it then fixes the 14/15 regression
> for it and as a benefit can even handle the cases where user wrote somefn
> (whatever); return 42; rather than return somefn (whatever);
>
> I'm willing to at least try the punting on singleton replacements for
> gnu::musttail though.
I think that would make it more robust indeed. But general optimality
concerns remain. I don't think LRA knows to remat a constant from
a return register for example.
Richard.
> Jakub
@@ -45,6 +45,12 @@ along with GCC; see the file COPYING3.
#include "ipa-utils.h"
#include "tree-ssa-live.h"
#include "diagnostic-core.h"
+#include "gimple-range.h"
+#include "alloc-pool.h"
+#include "sreal.h"
+#include "symbol-summary.h"
+#include "ipa-cp.h"
+#include "ipa-prop.h"
/* The file implements the tail recursion elimination. It is also used to
analyze the tail calls in general, passing the results to the rtl level
@@ -483,7 +489,7 @@ find_tail_calls (basic_block bb, struct
{
if (dump_file)
fprintf (dump_file, "Basic block %d has extra exit edges\n",
- bb->index);
+ bb->index);
return;
}
if (!cfun->has_musttail)
@@ -517,7 +523,8 @@ find_tail_calls (basic_block bb, struct
if (bad_stmt)
{
maybe_error_musttail (call,
- _("memory reference or volatile after call"));
+ _("memory reference or volatile after "
+ "call"));
return;
}
ass_var = gimple_call_lhs (call);
@@ -597,10 +604,10 @@ find_tail_calls (basic_block bb, struct
{
if (stmt == last_stmt)
maybe_error_musttail (call,
- _("call may throw exception that does not propagate"));
+ _("call may throw exception that does not "
+ "propagate"));
else
- maybe_error_musttail (call,
- _("code between call and return"));
+ maybe_error_musttail (call, _("code between call and return"));
return;
}
@@ -715,7 +722,8 @@ find_tail_calls (basic_block bb, struct
{
if (local_live_vars)
BITMAP_FREE (local_live_vars);
- maybe_error_musttail (call, _("call invocation refers to locals"));
+ maybe_error_musttail (call,
+ _("call invocation refers to locals"));
return;
}
else
@@ -724,7 +732,8 @@ find_tail_calls (basic_block bb, struct
if (bitmap_bit_p (local_live_vars, *v))
{
BITMAP_FREE (local_live_vars);
- maybe_error_musttail (call, _("call invocation refers to locals"));
+ maybe_error_musttail (call,
+ _("call invocation refers to locals"));
return;
}
}
@@ -833,15 +842,39 @@ find_tail_calls (basic_block bb, struct
&& (ret_var != ass_var
&& !(is_empty_type (TREE_TYPE (ret_var)) && !ass_var)))
{
- maybe_error_musttail (call, _("call uses return slot"));
- return;
+ bool ok = false;
+ value_range val;
+ tree valr;
+ /* If IPA-VRP proves called function always returns a singleton range,
+ the return value is replaced by the only value in that range.
+ For tail call purposes, pretend such replacement didn't happen. */
+ if (ass_var == NULL_TREE
+ && !tail_recursion
+ && TREE_CONSTANT (ret_var))
+ if (tree type = gimple_range_type (call))
+ if (tree callee = gimple_call_fndecl (call))
+ if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
+ && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
+ type)
+ && useless_type_conversion_p (TREE_TYPE (ret_var), type)
+ && ipa_return_value_range (val, callee)
+ && val.singleton_p (&valr)
+ && operand_equal_p (ret_var, valr, 0))
+ ok = true;
+ if (!ok)
+ {
+ maybe_error_musttail (call,
+ _("call and return value are different"));
+ return;
+ }
}
/* If this is not a tail recursive call, we cannot handle addends or
multiplicands. */
if (!tail_recursion && (m || a))
{
- maybe_error_musttail (call, _("operations after non tail recursive call"));
+ maybe_error_musttail (call,
+ _("operations after non tail recursive call"));
return;
}
@@ -849,7 +882,8 @@ find_tail_calls (basic_block bb, struct
if (m && POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl))))
{
maybe_error_musttail (call,
- _("tail recursion with pointers can only use additions"));
+ _("tail recursion with pointers can only use "
+ "additions"));
return;
}
@@ -0,0 +1,89 @@
+/* PR tree-optimization/118430 */
+/* { dg-do compile { target musttail } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times " bar \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " freddy \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
+
+__attribute__ ((noipa)) void
+foo (int x)
+{
+ (void) x;
+}
+
+__attribute__ ((noinline)) int
+bar (int x)
+{
+ foo (x);
+ return 1;
+}
+
+__attribute__ ((noinline)) int
+baz (int *x)
+{
+ foo (*x);
+ return 2;
+}
+
+__attribute__((noipa)) int
+qux (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ [[gnu::musttail]]
+ return bar (x);
+}
+
+__attribute__((noipa)) int
+corge (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ return bar (x) + 1;
+}
+
+__attribute__ ((noinline)) float
+freddy (int x)
+{
+ foo (x);
+ return 1.75f;
+}
+
+__attribute__((noipa)) float
+garply (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ [[gnu::musttail]]
+ return freddy (x);
+}
+
+__attribute__((noipa)) float
+quux (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ return freddy (x) + 0.25f;
+}
+
+int v;
+
+int
+main ()
+{
+ qux (v);
+ corge (v);
+ garply (v);
+ quux (v);
+}