tailc, v2: Virtually undo IPA-VRP return value optimization for tail calls [PR118430]
Checks
| Context |
Check |
Description |
| linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
Commit Message
On Wed, Jan 15, 2025 at 10:42:12AM +0100, Richard Biener wrote:
> Yes. I'll note there's a PR (or a bunch of) which are about
>
> x = FOO (y, ..);
> <use of x>
>
> vs.
>
> FOO (y, ..);
> <use of y>
>
> for FOO returning an argument (like for example memcpy). Where neither
> form is best in all cases. For your example above consider '42' being
> a FP constant, we'd have to re-load that from the constant pool after
> the call rather using the conveniently available copy in the return
> register. Or on RISC archs it might be costly to materialize the
> integer immediate.
>
> So the question is whether such replacement is a good thing - yes,
> we want the knowledge to simplify followup code, but copy-replacing
> might not always be good.
Or we can improve it incrementally, in RA or wherever else, perhaps with
some helper function. VRP or ssa propagation unfortunately doesn't know
if something is a tail call and shouldn't redo tailc pass analysis for it
each time it encounters it, and not replacing it in places which aren't tail
call can hamper further optimizations when the lhs is used later on in other
statements.
The point of the patch is that it restores previous behavior at least for
the tail call optimization.
BTW, I think we don't optimize returns-arg stuff like that at least right
now, and if we did, it wouldn't be through IPA-VRP, most of the returns-arg
functions actually return a pointer, not integer and for prange we just
track constant values, not say address of something.
> > For non-musttail we shouldn't do that though, we don't know whether it is
> > tail-callable at all and whether having the constant in there is more
> > beneficial for optimizations or possibly keeping it as maybe tail callable.
> > And therefore I wrote the patch as is, it then fixes the 14/15 regression
> > for it and as a benefit can even handle the cases where user wrote somefn
> > (whatever); return 42; rather than return somefn (whatever);
> >
> > I'm willing to at least try the punting on singleton replacements for
> > gnu::musttail though.
>
> I think that would make it more robust indeed. But general optimality
> concerns remain. I don't think LRA knows to remat a constant from
> a return register for example.
It could be taught that.
In any case, IPA VRP return value optimizion isn't a new thing on the trunk,
we've been doing that already in GCC 14.
So, how about following adjusted patch (same tree-tailcall.cc, I've just
extended the musttail14.c testcase to really verify the non-tailcallable
cases weren't tail called) plus added testcase to verify the non-musttail
cases (both when written as return fncall (args); and as fncall (args);
return singletoncst;), plus follow up patch I'll post next?
2025-01-15 Jakub Jelinek <jakub@redhat.com>
Andrew Pinski <quic_apinski@quicinc.com>
PR tree-optimization/118430
* tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h,
symbol-summary.h, ipa-cp.h and ipa-prop.h.
(find_tail_calls): If ass_var is NULL and ret_var is not, check if
IPA-VRP has not found singleton return range for it. In that case,
don't punt if ret_var is the only value in that range. Adjust the
maybe_error_musttail message otherwise to diagnose different value
being returned from the caller and callee rather than using return
slot. Formatting fixes.
* c-c++-common/musttail14.c: New test.
* c-c++-common/pr118430.c: New test.
Jakub
Comments
On Wed, Jan 15, 2025 at 11:46:28AM +0100, Jakub Jelinek wrote:
> BTW, I think we don't optimize returns-arg stuff like that at least right
> now, and if we did, it wouldn't be through IPA-VRP, most of the returns-arg
> functions actually return a pointer, not integer and for prange we just
> track constant values, not say address of something.
I think for RA, perhaps we could add a REG_EQUIV note to the call insns
which return singleton value range and let CSE etc. deal with that.
Jakub
On Wed, 15 Jan 2025, Jakub Jelinek wrote:
> On Wed, Jan 15, 2025 at 10:42:12AM +0100, Richard Biener wrote:
> > Yes. I'll note there's a PR (or a bunch of) which are about
> >
> > x = FOO (y, ..);
> > <use of x>
> >
> > vs.
> >
> > FOO (y, ..);
> > <use of y>
> >
> > for FOO returning an argument (like for example memcpy). Where neither
> > form is best in all cases. For your example above consider '42' being
> > a FP constant, we'd have to re-load that from the constant pool after
> > the call rather using the conveniently available copy in the return
> > register. Or on RISC archs it might be costly to materialize the
> > integer immediate.
> >
> > So the question is whether such replacement is a good thing - yes,
> > we want the knowledge to simplify followup code, but copy-replacing
> > might not always be good.
>
> Or we can improve it incrementally, in RA or wherever else, perhaps with
> some helper function. VRP or ssa propagation unfortunately doesn't know
> if something is a tail call and shouldn't redo tailc pass analysis for it
> each time it encounters it, and not replacing it in places which aren't tail
> call can hamper further optimizations when the lhs is used later on in other
> statements.
>
> The point of the patch is that it restores previous behavior at least for
> the tail call optimization.
>
> BTW, I think we don't optimize returns-arg stuff like that at least right
> now, and if we did, it wouldn't be through IPA-VRP, most of the returns-arg
> functions actually return a pointer, not integer and for prange we just
> track constant values, not say address of something.
>
> > > For non-musttail we shouldn't do that though, we don't know whether it is
> > > tail-callable at all and whether having the constant in there is more
> > > beneficial for optimizations or possibly keeping it as maybe tail callable.
> > > And therefore I wrote the patch as is, it then fixes the 14/15 regression
> > > for it and as a benefit can even handle the cases where user wrote somefn
> > > (whatever); return 42; rather than return somefn (whatever);
> > >
> > > I'm willing to at least try the punting on singleton replacements for
> > > gnu::musttail though.
> >
> > I think that would make it more robust indeed. But general optimality
> > concerns remain. I don't think LRA knows to remat a constant from
> > a return register for example.
>
> It could be taught that.
>
> In any case, IPA VRP return value optimizion isn't a new thing on the trunk,
> we've been doing that already in GCC 14.
>
> So, how about following adjusted patch (same tree-tailcall.cc, I've just
> extended the musttail14.c testcase to really verify the non-tailcallable
> cases weren't tail called) plus added testcase to verify the non-musttail
> cases (both when written as return fncall (args); and as fncall (args);
> return singletoncst;), plus follow up patch I'll post next?
>
> 2025-01-15 Jakub Jelinek <jakub@redhat.com>
> Andrew Pinski <quic_apinski@quicinc.com>
>
> PR tree-optimization/118430
> * tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h,
> symbol-summary.h, ipa-cp.h and ipa-prop.h.
> (find_tail_calls): If ass_var is NULL and ret_var is not, check if
> IPA-VRP has not found singleton return range for it. In that case,
> don't punt if ret_var is the only value in that range. Adjust the
> maybe_error_musttail message otherwise to diagnose different value
> being returned from the caller and callee rather than using return
> slot. Formatting fixes.
>
> * c-c++-common/musttail14.c: New test.
> * c-c++-common/pr118430.c: New test.
>
> --- gcc/tree-tailcall.cc.jj 2025-01-02 11:23:18.407490465 +0100
> +++ gcc/tree-tailcall.cc 2025-01-14 16:03:13.169294430 +0100
> @@ -45,6 +45,12 @@ along with GCC; see the file COPYING3.
> #include "ipa-utils.h"
> #include "tree-ssa-live.h"
> #include "diagnostic-core.h"
> +#include "gimple-range.h"
> +#include "alloc-pool.h"
> +#include "sreal.h"
> +#include "symbol-summary.h"
> +#include "ipa-cp.h"
> +#include "ipa-prop.h"
>
> /* The file implements the tail recursion elimination. It is also used to
> analyze the tail calls in general, passing the results to the rtl level
> @@ -483,7 +489,7 @@ find_tail_calls (basic_block bb, struct
> {
> if (dump_file)
> fprintf (dump_file, "Basic block %d has extra exit edges\n",
> - bb->index);
> + bb->index);
> return;
> }
> if (!cfun->has_musttail)
> @@ -517,7 +523,8 @@ find_tail_calls (basic_block bb, struct
> if (bad_stmt)
> {
> maybe_error_musttail (call,
> - _("memory reference or volatile after call"));
> + _("memory reference or volatile after "
> + "call"));
> return;
> }
> ass_var = gimple_call_lhs (call);
> @@ -597,10 +604,10 @@ find_tail_calls (basic_block bb, struct
> {
> if (stmt == last_stmt)
> maybe_error_musttail (call,
> - _("call may throw exception that does not propagate"));
> + _("call may throw exception that does not "
> + "propagate"));
> else
> - maybe_error_musttail (call,
> - _("code between call and return"));
> + maybe_error_musttail (call, _("code between call and return"));
> return;
> }
>
> @@ -715,7 +722,8 @@ find_tail_calls (basic_block bb, struct
> {
> if (local_live_vars)
> BITMAP_FREE (local_live_vars);
> - maybe_error_musttail (call, _("call invocation refers to locals"));
> + maybe_error_musttail (call,
> + _("call invocation refers to locals"));
> return;
> }
> else
> @@ -724,7 +732,8 @@ find_tail_calls (basic_block bb, struct
> if (bitmap_bit_p (local_live_vars, *v))
> {
> BITMAP_FREE (local_live_vars);
> - maybe_error_musttail (call, _("call invocation refers to locals"));
> + maybe_error_musttail (call,
> + _("call invocation refers to locals"));
> return;
> }
> }
> @@ -833,15 +842,39 @@ find_tail_calls (basic_block bb, struct
> && (ret_var != ass_var
> && !(is_empty_type (TREE_TYPE (ret_var)) && !ass_var)))
> {
> - maybe_error_musttail (call, _("call uses return slot"));
> - return;
> + bool ok = false;
> + value_range val;
> + tree valr;
> + /* If IPA-VRP proves called function always returns a singleton range,
> + the return value is replaced by the only value in that range.
> + For tail call purposes, pretend such replacement didn't happen. */
> + if (ass_var == NULL_TREE
> + && !tail_recursion
> + && TREE_CONSTANT (ret_var))
> + if (tree type = gimple_range_type (call))
> + if (tree callee = gimple_call_fndecl (call))
> + if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
> + && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
> + type)
> + && useless_type_conversion_p (TREE_TYPE (ret_var), type)
> + && ipa_return_value_range (val, callee)
> + && val.singleton_p (&valr)
I suppose it's good enough to check
useless_type_conversion_p (TREE_TYPE (ret_var), TREE_TYPE (valr))?
but the bigger question is whether RTL expansion does the right thing
here without changing the IL at this point back to return the LHS
and put that back? IIRC there are some sanity checks upon tail call
expansion, but does this all work when the call itself doesn't have
its return value used? Aka some call expansion checks might be
elided in this case.
So I feel a bit nervous marking sth as tail-call that doesn't
actually look like one (unless we make it so again).
Richard.
> + && operand_equal_p (ret_var, valr, 0))
> + ok = true;
> + if (!ok)
> + {
> + maybe_error_musttail (call,
> + _("call and return value are different"));
> + return;
> + }
> }
>
> /* If this is not a tail recursive call, we cannot handle addends or
> multiplicands. */
> if (!tail_recursion && (m || a))
> {
> - maybe_error_musttail (call, _("operations after non tail recursive call"));
> + maybe_error_musttail (call,
> + _("operations after non tail recursive call"));
> return;
> }
>
> @@ -849,7 +882,8 @@ find_tail_calls (basic_block bb, struct
> if (m && POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl))))
> {
> maybe_error_musttail (call,
> - _("tail recursion with pointers can only use additions"));
> + _("tail recursion with pointers can only use "
> + "additions"));
> return;
> }
>
> --- gcc/testsuite/c-c++-common/musttail14.c.jj 2025-01-14 16:22:06.623276122 +0100
> +++ gcc/testsuite/c-c++-common/musttail14.c 2025-01-15 11:32:58.048016223 +0100
> @@ -0,0 +1,90 @@
> +/* PR tree-optimization/118430 */
> +/* { dg-do compile { target musttail } } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times " bar \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " freddy \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " (?:bar|freddy) \\\(\[^\n\r]\*\\\); \\\[tail call\\\]" 2 "optimized" } } */
> +
> +__attribute__ ((noipa)) void
> +foo (int x)
> +{
> + (void) x;
> +}
> +
> +__attribute__ ((noinline)) int
> +bar (int x)
> +{
> + foo (x);
> + return 1;
> +}
> +
> +__attribute__ ((noinline)) int
> +baz (int *x)
> +{
> + foo (*x);
> + return 2;
> +}
> +
> +__attribute__((noipa)) int
> +qux (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + [[gnu::musttail]]
> + return bar (x);
> +}
> +
> +__attribute__((noipa)) int
> +corge (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + return bar (x) + 1;
> +}
> +
> +__attribute__ ((noinline)) float
> +freddy (int x)
> +{
> + foo (x);
> + return 1.75f;
> +}
> +
> +__attribute__((noipa)) float
> +garply (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + [[gnu::musttail]]
> + return freddy (x);
> +}
> +
> +__attribute__((noipa)) float
> +quux (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + return freddy (x) + 0.25f;
> +}
> +
> +int v;
> +
> +int
> +main ()
> +{
> + qux (v);
> + corge (v);
> + garply (v);
> + quux (v);
> +}
> --- gcc/testsuite/c-c++-common/pr118430.c.jj 2025-01-15 10:58:09.258897583 +0100
> +++ gcc/testsuite/c-c++-common/pr118430.c 2025-01-15 10:59:27.349797780 +0100
> @@ -0,0 +1,89 @@
> +/* PR tree-optimization/118430 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times " bar \\\(\[^\n\r]\*\\\); \\\[tail call\\\]" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " freddy \\\(\[^\n\r]\*\\\); \\\[tail call\\\]" 2 "optimized" } } */
> +
> +__attribute__ ((noipa)) void
> +foo (int x)
> +{
> + (void) x;
> +}
> +
> +__attribute__ ((noinline)) int
> +bar (int x)
> +{
> + foo (x);
> + return 1;
> +}
> +
> +__attribute__ ((noinline)) int
> +baz (int *x)
> +{
> + foo (*x);
> + return 2;
> +}
> +
> +__attribute__((noipa)) int
> +qux (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + return bar (x);
> +}
> +
> +__attribute__((noipa)) int
> +corge (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + bar (x);
> + return 1;
> +}
> +
> +__attribute__ ((noinline)) float
> +freddy (int x)
> +{
> + foo (x);
> + return 1.75f;
> +}
> +
> +__attribute__((noipa)) float
> +garply (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + return freddy (x);
> +}
> +
> +__attribute__((noipa)) float
> +quux (int x)
> +{
> + {
> + int v;
> + foo (x);
> + baz (&v);
> + }
> + freddy (x);
> + return 1.75f;
> +}
> +
> +int v;
> +
> +int
> +main ()
> +{
> + qux (v);
> + corge (v);
> + garply (v);
> + quux (v);
> +}
>
>
> Jakub
>
>
On Wed, 15 Jan 2025, Jakub Jelinek wrote:
> On Wed, Jan 15, 2025 at 11:46:28AM +0100, Jakub Jelinek wrote:
> > BTW, I think we don't optimize returns-arg stuff like that at least right
> > now, and if we did, it wouldn't be through IPA-VRP, most of the returns-arg
> > functions actually return a pointer, not integer and for prange we just
> > track constant values, not say address of something.
>
> I think for RA, perhaps we could add a REG_EQUIV note to the call insns
> which return singleton value range and let CSE etc. deal with that.
So in case we expand from _1 = foo (); this looks sensible - add
a REG_EQUAL with the constant. But for the other way around I'm
not sure where to attach such note? Expand the LHS of the call
anyway and add it on the then unused set pseudo?
Richard.
On Wed, Jan 15, 2025 at 03:16:04PM +0100, Richard Biener wrote:
> > + /* If IPA-VRP proves called function always returns a singleton range,
> > + the return value is replaced by the only value in that range.
> > + For tail call purposes, pretend such replacement didn't happen. */
> > + if (ass_var == NULL_TREE
> > + && !tail_recursion
> > + && TREE_CONSTANT (ret_var))
> > + if (tree type = gimple_range_type (call))
> > + if (tree callee = gimple_call_fndecl (call))
> > + if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
> > + && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
> > + type)
> > + && useless_type_conversion_p (TREE_TYPE (ret_var), type)
> > + && ipa_return_value_range (val, callee)
> > + && val.singleton_p (&valr)
>
> I suppose it's good enough to check
>
> useless_type_conversion_p (TREE_TYPE (ret_var), TREE_TYPE (valr))?
One of the checks is for the gimple_fntype to actually match the type of
the gimple_call_fndecl, one could always have some ugly hacks like
int foo (void);
...
long (*fn) (void) = (long (*) (void)) foo;
fn ();
etc., copied over from gimple-range-fold.cc, the other is making sure
it matches also the caller's return type.
> but the bigger question is whether RTL expansion does the right thing
> here without changing the IL at this point back to return the LHS
> and put that back? IIRC there are some sanity checks upon tail call
> expansion, but does this all work when the call itself doesn't have
> its return value used? Aka some call expansion checks might be
> elided in this case.
It does the right thing, it just relies on the tailc pass to do its job
properly.
E.g. when we have
<bb 2> [local count: 1073741824]:
foo (x_2(D));
baz (&v);
v ={v} {CLOBBER(eos)};
bar (x_2(D)); [tail call]
return 1;
when expand_gimple_basic_block handles the bar (x_2(D)); call, it uses
if (call_stmt && gimple_call_tail_p (call_stmt))
{
bool can_fallthru;
new_bb = expand_gimple_tailcall (bb, call_stmt, &can_fallthru);
if (new_bb)
{
if (can_fallthru)
bb = new_bb;
else
{
currently_expanding_gimple_stmt = NULL;
return new_bb;
}
}
}
As it is actually tail callable during expansion of the bar (x_2(D)); call
stmt, expand_gimple_tailbb returns non-NULL and sets can_fallthru to false,
plus emits
;; bar (x_2(D)); [tail call]
(insn 11 10 12 2 (set (reg:SI 5 di)
(reg/v:SI 99 [ x ])) "pr118430.c":35:10 -1
(nil))
(call_insn/j 12 11 13 2 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("bar") [flags 0x3] <function_decl 0x7fb39020bd00 bar>) [0 bar S1 A8])
(const_int 0 [0]))) "pr118430.c":35:10 -1
(expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x3] <function_decl 0x7fb39020bd00 bar>)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)))
(expr_list:SI (use (reg:SI 5 di))
(nil)))
(barrier 13 12 0)
Because it doesn't fallthru, no further statements in the same bb are
expanded. Now, if the bb with return happened to be in some other basic
block from the [tail call], it could be expanded but because the bb with
tail call ends with a barrier, it doesn't fall thru there and if nothing
else could reach it, we'd remove the unreachable bb RSN.
> So I feel a bit nervous marking sth as tail-call that doesn't
> actually look like one (unless we make it so again).
Expansion really counts on tailc to verify all the following statements
are useless. Even if it is
_1 = bar (...); [tail call]
return _1;
it is treated the same, we also don't expand the return _1; there
separately, after all, nothing initializes the _1 anywhere when expanding
the tail call (unless tail call fails and we expand it as normal call of
course). And as tailc allows, there could be further statements, copying of
SSA_NAMEs, debug statements, clobber statements, ...
Jakub
On Wed, Jan 15, 2025 at 03:17:22PM +0100, Richard Biener wrote:
> On Wed, 15 Jan 2025, Jakub Jelinek wrote:
>
> > On Wed, Jan 15, 2025 at 11:46:28AM +0100, Jakub Jelinek wrote:
> > > BTW, I think we don't optimize returns-arg stuff like that at least right
> > > now, and if we did, it wouldn't be through IPA-VRP, most of the returns-arg
> > > functions actually return a pointer, not integer and for prange we just
> > > track constant values, not say address of something.
> >
> > I think for RA, perhaps we could add a REG_EQUIV note to the call insns
> > which return singleton value range and let CSE etc. deal with that.
>
> So in case we expand from _1 = foo (); this looks sensible - add
> a REG_EQUAL with the constant. But for the other way around I'm
> not sure where to attach such note? Expand the LHS of the call
> anyway and add it on the then unused set pseudo?
As can be seen on
int foo (void);
int
bar (void)
{
foo ();
return 1;
}
whenever a function doesn't return void, we expand it as
(call_insn 5 2 6 2 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("foo") [flags 0x41] <function_decl 0x7fe47b788b00 foo>) [0 foo S1 A8])
(const_int 0 [0]))) "oui.c":5:3 -1
(expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x41] <function_decl 0x7fe47b788b00 foo>)
(nil))
(nil))
i.e. use the call_value form rather than call. So adding
REG_EQUIV (or is that REG_EQUAL?, I'm always confused about that)
on the CALL_INSN could just explain that ax has value say (const_int 42)
or whatever else at that point.
Whether CSE and/or LRA would make good use of that is to be seen when it is
a hard register. Perhaps for CSE/combine it is undesirable to extend the
lifetime of a hard register that way (unless we create a pseudo for it),
but maybe LRA can handle it, dunno.
Jakub
On Wed, 15 Jan 2025, Jakub Jelinek wrote:
> On Wed, Jan 15, 2025 at 03:16:04PM +0100, Richard Biener wrote:
> > > + /* If IPA-VRP proves called function always returns a singleton range,
> > > + the return value is replaced by the only value in that range.
> > > + For tail call purposes, pretend such replacement didn't happen. */
> > > + if (ass_var == NULL_TREE
> > > + && !tail_recursion
> > > + && TREE_CONSTANT (ret_var))
> > > + if (tree type = gimple_range_type (call))
> > > + if (tree callee = gimple_call_fndecl (call))
> > > + if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
> > > + && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
> > > + type)
> > > + && useless_type_conversion_p (TREE_TYPE (ret_var), type)
> > > + && ipa_return_value_range (val, callee)
> > > + && val.singleton_p (&valr)
> >
> > I suppose it's good enough to check
> >
> > useless_type_conversion_p (TREE_TYPE (ret_var), TREE_TYPE (valr))?
>
> One of the checks is for the gimple_fntype to actually match the type of
> the gimple_call_fndecl, one could always have some ugly hacks like
> int foo (void);
> ...
> long (*fn) (void) = (long (*) (void)) foo;
> fn ();
> etc., copied over from gimple-range-fold.cc, the other is making sure
> it matches also the caller's return type.
>
> > but the bigger question is whether RTL expansion does the right thing
> > here without changing the IL at this point back to return the LHS
> > and put that back? IIRC there are some sanity checks upon tail call
> > expansion, but does this all work when the call itself doesn't have
> > its return value used? Aka some call expansion checks might be
> > elided in this case.
>
> It does the right thing, it just relies on the tailc pass to do its job
> properly.
> E.g. when we have
> <bb 2> [local count: 1073741824]:
> foo (x_2(D));
> baz (&v);
> v ={v} {CLOBBER(eos)};
> bar (x_2(D)); [tail call]
> return 1;
> when expand_gimple_basic_block handles the bar (x_2(D)); call, it uses
> if (call_stmt && gimple_call_tail_p (call_stmt))
> {
> bool can_fallthru;
> new_bb = expand_gimple_tailcall (bb, call_stmt, &can_fallthru);
> if (new_bb)
> {
> if (can_fallthru)
> bb = new_bb;
> else
> {
> currently_expanding_gimple_stmt = NULL;
> return new_bb;
> }
> }
> }
> As it is actually tail callable during expansion of the bar (x_2(D)); call
> stmt, expand_gimple_tailbb returns non-NULL and sets can_fallthru to false,
> plus emits
> ;; bar (x_2(D)); [tail call]
>
> (insn 11 10 12 2 (set (reg:SI 5 di)
> (reg/v:SI 99 [ x ])) "pr118430.c":35:10 -1
> (nil))
>
> (call_insn/j 12 11 13 2 (set (reg:SI 0 ax)
> (call (mem:QI (symbol_ref:DI ("bar") [flags 0x3] <function_decl 0x7fb39020bd00 bar>) [0 bar S1 A8])
> (const_int 0 [0]))) "pr118430.c":35:10 -1
> (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x3] <function_decl 0x7fb39020bd00 bar>)
> (expr_list:REG_EH_REGION (const_int 0 [0])
> (nil)))
> (expr_list:SI (use (reg:SI 5 di))
> (nil)))
>
> (barrier 13 12 0)
> Because it doesn't fallthru, no further statements in the same bb are
> expanded. Now, if the bb with return happened to be in some other basic
> block from the [tail call], it could be expanded but because the bb with
> tail call ends with a barrier, it doesn't fall thru there and if nothing
> else could reach it, we'd remove the unreachable bb RSN.
>
> > So I feel a bit nervous marking sth as tail-call that doesn't
> > actually look like one (unless we make it so again).
>
> Expansion really counts on tailc to verify all the following statements
> are useless. Even if it is
> _1 = bar (...); [tail call]
> return _1;
> it is treated the same, we also don't expand the return _1; there
> separately, after all, nothing initializes the _1 anywhere when expanding
> the tail call (unless tail call fails and we expand it as normal call of
> course). And as tailc allows, there could be further statements, copying of
> SSA_NAMEs, debug statements, clobber statements, ...
I see.
The patch is OK then.
Thanks,
Richard.
> Jakub
>
>
@@ -45,6 +45,12 @@ along with GCC; see the file COPYING3.
#include "ipa-utils.h"
#include "tree-ssa-live.h"
#include "diagnostic-core.h"
+#include "gimple-range.h"
+#include "alloc-pool.h"
+#include "sreal.h"
+#include "symbol-summary.h"
+#include "ipa-cp.h"
+#include "ipa-prop.h"
/* The file implements the tail recursion elimination. It is also used to
analyze the tail calls in general, passing the results to the rtl level
@@ -483,7 +489,7 @@ find_tail_calls (basic_block bb, struct
{
if (dump_file)
fprintf (dump_file, "Basic block %d has extra exit edges\n",
- bb->index);
+ bb->index);
return;
}
if (!cfun->has_musttail)
@@ -517,7 +523,8 @@ find_tail_calls (basic_block bb, struct
if (bad_stmt)
{
maybe_error_musttail (call,
- _("memory reference or volatile after call"));
+ _("memory reference or volatile after "
+ "call"));
return;
}
ass_var = gimple_call_lhs (call);
@@ -597,10 +604,10 @@ find_tail_calls (basic_block bb, struct
{
if (stmt == last_stmt)
maybe_error_musttail (call,
- _("call may throw exception that does not propagate"));
+ _("call may throw exception that does not "
+ "propagate"));
else
- maybe_error_musttail (call,
- _("code between call and return"));
+ maybe_error_musttail (call, _("code between call and return"));
return;
}
@@ -715,7 +722,8 @@ find_tail_calls (basic_block bb, struct
{
if (local_live_vars)
BITMAP_FREE (local_live_vars);
- maybe_error_musttail (call, _("call invocation refers to locals"));
+ maybe_error_musttail (call,
+ _("call invocation refers to locals"));
return;
}
else
@@ -724,7 +732,8 @@ find_tail_calls (basic_block bb, struct
if (bitmap_bit_p (local_live_vars, *v))
{
BITMAP_FREE (local_live_vars);
- maybe_error_musttail (call, _("call invocation refers to locals"));
+ maybe_error_musttail (call,
+ _("call invocation refers to locals"));
return;
}
}
@@ -833,15 +842,39 @@ find_tail_calls (basic_block bb, struct
&& (ret_var != ass_var
&& !(is_empty_type (TREE_TYPE (ret_var)) && !ass_var)))
{
- maybe_error_musttail (call, _("call uses return slot"));
- return;
+ bool ok = false;
+ value_range val;
+ tree valr;
+ /* If IPA-VRP proves called function always returns a singleton range,
+ the return value is replaced by the only value in that range.
+ For tail call purposes, pretend such replacement didn't happen. */
+ if (ass_var == NULL_TREE
+ && !tail_recursion
+ && TREE_CONSTANT (ret_var))
+ if (tree type = gimple_range_type (call))
+ if (tree callee = gimple_call_fndecl (call))
+ if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
+ && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
+ type)
+ && useless_type_conversion_p (TREE_TYPE (ret_var), type)
+ && ipa_return_value_range (val, callee)
+ && val.singleton_p (&valr)
+ && operand_equal_p (ret_var, valr, 0))
+ ok = true;
+ if (!ok)
+ {
+ maybe_error_musttail (call,
+ _("call and return value are different"));
+ return;
+ }
}
/* If this is not a tail recursive call, we cannot handle addends or
multiplicands. */
if (!tail_recursion && (m || a))
{
- maybe_error_musttail (call, _("operations after non tail recursive call"));
+ maybe_error_musttail (call,
+ _("operations after non tail recursive call"));
return;
}
@@ -849,7 +882,8 @@ find_tail_calls (basic_block bb, struct
if (m && POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl))))
{
maybe_error_musttail (call,
- _("tail recursion with pointers can only use additions"));
+ _("tail recursion with pointers can only use "
+ "additions"));
return;
}
@@ -0,0 +1,90 @@
+/* PR tree-optimization/118430 */
+/* { dg-do compile { target musttail } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times " bar \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " freddy \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " (?:bar|freddy) \\\(\[^\n\r]\*\\\); \\\[tail call\\\]" 2 "optimized" } } */
+
+__attribute__ ((noipa)) void
+foo (int x)
+{
+ (void) x;
+}
+
+__attribute__ ((noinline)) int
+bar (int x)
+{
+ foo (x);
+ return 1;
+}
+
+__attribute__ ((noinline)) int
+baz (int *x)
+{
+ foo (*x);
+ return 2;
+}
+
+__attribute__((noipa)) int
+qux (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ [[gnu::musttail]]
+ return bar (x);
+}
+
+__attribute__((noipa)) int
+corge (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ return bar (x) + 1;
+}
+
+__attribute__ ((noinline)) float
+freddy (int x)
+{
+ foo (x);
+ return 1.75f;
+}
+
+__attribute__((noipa)) float
+garply (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ [[gnu::musttail]]
+ return freddy (x);
+}
+
+__attribute__((noipa)) float
+quux (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ return freddy (x) + 0.25f;
+}
+
+int v;
+
+int
+main ()
+{
+ qux (v);
+ corge (v);
+ garply (v);
+ quux (v);
+}
@@ -0,0 +1,89 @@
+/* PR tree-optimization/118430 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times " bar \\\(\[^\n\r]\*\\\); \\\[tail call\\\]" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " freddy \\\(\[^\n\r]\*\\\); \\\[tail call\\\]" 2 "optimized" } } */
+
+__attribute__ ((noipa)) void
+foo (int x)
+{
+ (void) x;
+}
+
+__attribute__ ((noinline)) int
+bar (int x)
+{
+ foo (x);
+ return 1;
+}
+
+__attribute__ ((noinline)) int
+baz (int *x)
+{
+ foo (*x);
+ return 2;
+}
+
+__attribute__((noipa)) int
+qux (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ return bar (x);
+}
+
+__attribute__((noipa)) int
+corge (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ bar (x);
+ return 1;
+}
+
+__attribute__ ((noinline)) float
+freddy (int x)
+{
+ foo (x);
+ return 1.75f;
+}
+
+__attribute__((noipa)) float
+garply (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ return freddy (x);
+}
+
+__attribute__((noipa)) float
+quux (int x)
+{
+ {
+ int v;
+ foo (x);
+ baz (&v);
+ }
+ freddy (x);
+ return 1.75f;
+}
+
+int v;
+
+int
+main ()
+{
+ qux (v);
+ corge (v);
+ garply (v);
+ quux (v);
+}