[v3] ipa-inline: Add target info into fn summary [PR102059]
Commit Message
Hi!
Power ISA 2.07 (Power8) introduces transactional memory feature
but ISA3.1 (Power10) removes it. It exposes one troublesome
issue as PR102059 shows. Users define some function with
target pragma cpu=power10 then it calls one function with
attribute always_inline which inherits command line option
cpu=power8 which enables HTM implicitly. The current isa_flags
check doesn't allow this inlining due to "target specific
option mismatch" and error mesasge is emitted.
Normally, the callee function isn't intended to exploit HTM
feature, but the default flag setting make it look it has.
As Richi raised in the PR, we have fp_expressions flag in
function summary, and allow us to check the function actually
contains any floating point expressions to avoid overkill.
So this patch follows the similar idea but is more target
specific, for this rs6000 port specific requirement on HTM
feature check, we would like to check rs6000 specific HTM
built-in functions and inline assembly, it allows targets
to do their own customized checks and updates.
It introduces two target hooks need_ipa_fn_target_info and
update_ipa_fn_target_info. The former allows target to do
some previous check and decides to collect target specific
information for this function or not. For some special case,
it can predict the analysis result and set it early without
any scannings. The latter allows the analyze_function_body
to pass gimple stmts down just like fp_expressions handlings,
target can do its own tricks. I put them as one hook initially
with one boolean to indicates whether it's initial time, but
the code looks a bit ugly, to separate them seems to have
better readability.
Against v2 [2], this v3 addressed Martin's review comments:
- Replace HWI auto_vec with unsigned int for target_info
to avoid overkill (also Segher's comments), adjust some
places need to be updated for this change.
- Annotate target_info won't be streamed for offloading
target compilers.
- Scan for all gimple statements instead of those with
non-zero size/time weights.
Against v1 [1], the v2 addressed Richi's and Segher's review
comments, mainly consists of:
- Extend it to cover non always_inline.
- Exclude the case for offload streaming.
- Some function naming and formatting issues.
- Adjust rs6000_can_inline_p.
- Add new cases.
Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
Any comments are highly appreciated!
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579045.html
------
gcc/ChangeLog:
PR ipa/102059
* config/rs6000/rs6000-call.c (rs6000_fn_has_any_of_these_mask_bits):
New function.
* config/rs6000/rs6000-internal.h
(rs6000_fn_has_any_of_these_mask_bits): New declare.
* config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
(rs6000_need_ipa_fn_target_info): New function.
(rs6000_update_ipa_fn_target_info): Likewise.
(rs6000_can_inline_p): Adjust for ipa function summary target info.
* config/rs6000/rs6000.h (RS6000_FN_TARGET_INFO_HTM): New macro.
* ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
summary target info.
(analyze_function_body): Adjust for ipa function summary target
info and call hook rs6000_need_ipa_fn_target_info and
rs6000_update_ipa_fn_target_info.
(ipa_merge_fn_summary_after_inlining): Adjust for ipa function
summary target info.
(inline_read_section): Likewise.
(ipa_fn_summary_write): Likewise.
* ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
hook.
(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
* target.def (update_ipa_fn_target_info): New hook.
(need_ipa_fn_target_info): Likewise.
* targhooks.c (default_need_ipa_fn_target_info): New function.
(default_update_ipa_fn_target_info): Likewise.
* targhooks.h (default_update_ipa_fn_target_info): New declare.
(default_need_ipa_fn_target_info): Likewise.
gcc/testsuite/ChangeLog:
PR ipa/102059
* gcc.dg/lto/pr102059-1_0.c: New test.
* gcc.dg/lto/pr102059-1_1.c: New test.
* gcc.dg/lto/pr102059-1_2.c: New test.
* gcc.dg/lto/pr102059-2_0.c: New test.
* gcc.dg/lto/pr102059-2_1.c: New test.
* gcc.dg/lto/pr102059-2_2.c: New test.
* gcc.target/powerpc/pr102059-5.c: New test.
* gcc.target/powerpc/pr102059-6.c: New test.
* gcc.target/powerpc/pr102059-7.c: New test.
---
gcc/config/rs6000/rs6000-call.c | 12 +++
gcc/config/rs6000/rs6000-internal.h | 3 +
gcc/config/rs6000/rs6000.c | 87 +++++++++++++++++--
gcc/config/rs6000/rs6000.h | 5 ++
gcc/doc/tm.texi | 31 +++++++
gcc/doc/tm.texi.in | 4 +
gcc/ipa-fnsummary.c | 33 ++++++-
gcc/ipa-fnsummary.h | 7 +-
gcc/testsuite/gcc.dg/lto/pr102059-1_0.c | 12 +++
gcc/testsuite/gcc.dg/lto/pr102059-1_1.c | 9 ++
gcc/testsuite/gcc.dg/lto/pr102059-1_2.c | 11 +++
gcc/testsuite/gcc.dg/lto/pr102059-2_0.c | 12 +++
gcc/testsuite/gcc.dg/lto/pr102059-2_1.c | 9 ++
gcc/testsuite/gcc.dg/lto/pr102059-2_2.c | 10 +++
gcc/testsuite/gcc.target/powerpc/pr102059-5.c | 21 +++++
gcc/testsuite/gcc.target/powerpc/pr102059-6.c | 21 +++++
gcc/testsuite/gcc.target/powerpc/pr102059-7.c | 21 +++++
gcc/target.def | 35 ++++++++
gcc/targhooks.c | 16 ++++
gcc/targhooks.h | 2 +
20 files changed, 349 insertions(+), 12 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
create mode 100644 gcc/testsuite/gcc.dg/lto/pr102059-1_1.c
create mode 100644 gcc/testsuite/gcc.dg/lto/pr102059-1_2.c
create mode 100644 gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
create mode 100644 gcc/testsuite/gcc.dg/lto/pr102059-2_1.c
create mode 100644 gcc/testsuite/gcc.dg/lto/pr102059-2_2.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-5.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-6.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-7.c
Comments
Hi,
On Fri, Sep 17 2021, Kewen.Lin wrote:
>
[...]
>
> Against v2 [2], this v3 addressed Martin's review comments:
> - Replace HWI auto_vec with unsigned int for target_info
> to avoid overkill (also Segher's comments), adjust some
> places need to be updated for this change.
> - Annotate target_info won't be streamed for offloading
> target compilers.
> - Scan for all gimple statements instead of those with
> non-zero size/time weights.
>
> Against v1 [1], the v2 addressed Richi's and Segher's review
> comments, mainly consists of:
> - Extend it to cover non always_inline.
> - Exclude the case for offload streaming.
> - Some function naming and formatting issues.
> - Adjust rs6000_can_inline_p.
> - Add new cases.
>
> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
>
> Any comments are highly appreciated!
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579045.html
> ------
> gcc/ChangeLog:
>
> PR ipa/102059
> * config/rs6000/rs6000-call.c (rs6000_fn_has_any_of_these_mask_bits):
> New function.
> * config/rs6000/rs6000-internal.h
> (rs6000_fn_has_any_of_these_mask_bits): New declare.
> * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
> (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
> (rs6000_need_ipa_fn_target_info): New function.
> (rs6000_update_ipa_fn_target_info): Likewise.
> (rs6000_can_inline_p): Adjust for ipa function summary target info.
> * config/rs6000/rs6000.h (RS6000_FN_TARGET_INFO_HTM): New macro.
> * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
> summary target info.
> (analyze_function_body): Adjust for ipa function summary target
> info and call hook rs6000_need_ipa_fn_target_info and
> rs6000_update_ipa_fn_target_info.
> (ipa_merge_fn_summary_after_inlining): Adjust for ipa function
> summary target info.
> (inline_read_section): Likewise.
> (ipa_fn_summary_write): Likewise.
> * ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
> * doc/tm.texi: Regenerate.
> * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
> hook.
> (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
> * target.def (update_ipa_fn_target_info): New hook.
> (need_ipa_fn_target_info): Likewise.
> * targhooks.c (default_need_ipa_fn_target_info): New function.
> (default_update_ipa_fn_target_info): Likewise.
> * targhooks.h (default_update_ipa_fn_target_info): New declare.
> (default_need_ipa_fn_target_info): Likewise.
>
[...]
> diff --git a/gcc/target.def b/gcc/target.def
> index 28a34f1d51b..28ff639684b 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -6600,6 +6600,41 @@ specific target options and the caller does not use the same options.",
> bool, (tree caller, tree callee),
> default_target_can_inline_p)
>
> +DEFHOOK
> +(update_ipa_fn_target_info,
> + "Allow target to analyze all gimple statements for the given function to\n\
> +record and update some target specific information for inlining. A typical\n\
> +example is that a caller with one isa feature disabled is normally not\n\
> +allowed to inline a callee with that same isa feature enabled even which is\n\
> +attributed by always_inline, but with the conservative analysis on all\n\
> +statements of the callee if we are able to guarantee the callee does not\n\
> +exploit any instructions from the mismatch isa feature, it would be safe to\n\
> +allow the caller to inline the callee.\n\
> +@var{info} is one @code{unsigned int} value to record information in which\n\
> +one set bit indicates one corresponding feature is detected in the analysis,\n\
> +@var{stmt} is the statement being analyzed. Return true if target still\n\
> +need to analyze the subsequent statements, otherwise return false to stop\n\
> +subsequent analysis.\n\
> +The default version of this hook returns false.",
> + bool, (unsigned int& info, const gimple* stmt),
> + default_update_ipa_fn_target_info)
> +
> +DEFHOOK
> +(need_ipa_fn_target_info,
> + "Allow target to check early whether it is necessary to analyze all gimple\n\
> +statements in the given function to update target specific information for\n\
> +inlining. See hook @code{update_ipa_fn_target_info} for usage example of\n\
> +target specific information. This hook is expected to be invoked ahead of\n\
> +the iterating with hook @code{update_ipa_fn_target_info}.\n\
> +@var{decl} is the function being analyzed, @var{info} is the same as what\n\
> +in hook @code{update_ipa_fn_target_info}, target can do one time update\n\
> +into @var{info} without iterating for some case. Return true if target\n\
> +decides to analyze all gimple statements to collect information, otherwise\n\
> +return false.\n\
> +The default version of this hook returns false.",
> + bool, (const_tree decl, unsigned int& info),
> + default_need_ipa_fn_target_info)
> +
> DEFHOOK
> (relayout_function,
> "This target hook fixes function @var{fndecl} after attributes are processed.\n\
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index c9b5208853d..5ba539cef1a 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1765,6 +1765,22 @@ default_target_can_inline_p (tree caller, tree callee)
> return callee_opts == caller_opts;
> }
>
> +/* By default, return false to not need to collect any target information
> + for inlining. Target maintainer should re-define the hook if the
> + target want to take advantage of it. */
> +
> +bool
> +default_need_ipa_fn_target_info (const_tree, uint16_t &)
> +{
> + return false;
> +}
> +
> +bool
> +default_update_ipa_fn_target_info (uint16_t &, const gimple *)
> +{
> + return false;
> +}
The parameters have uint16_t type here but you apparently decided to use
unsigned int everywhere else, you probably forgot to change them here
too.
Martin
On Fri, Sep 17, 2021 at 05:42:38PM +0800, Kewen.Lin wrote:
> Against v2 [2], this v3 addressed Martin's review comments:
> - Replace HWI auto_vec with unsigned int for target_info
> to avoid overkill (also Segher's comments), adjust some
> places need to be updated for this change.
I'd have used a single HWI (always 64 bits), but an int (always at least
32 bits in GCC) is fine as well, sure.
> * config/rs6000/rs6000-internal.h
> (rs6000_fn_has_any_of_these_mask_bits): New declare.
You can break that after the ":"... Just :-)
> * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
> hook.
That easily fits one line. (Many more examples here btw).
> +bool
> +rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
> + HOST_WIDE_INT mask)
> +{
> + gcc_assert (code < RS6000_BUILTIN_COUNT);
We don't have this assert anywhere else, so lose it here as well?
If we want such checking we should make an inline accessor function for
this, and check it there. But we already do a check in
rs6000_builtin_decl (and one in def_builtin, but that one has an
off-by-one error in it).
> +extern bool rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
> + HOST_WIDE_INT mask);
The huge unwieldy name suggests it might not be the best abstraction you
could use, btw ;-)
> +static bool
> +rs6000_update_ipa_fn_target_info (unsigned int &info, const gimple *stmt)
> +{
> + /* Assume inline asm can use any instruction features. */
> + if (gimple_code (stmt) == GIMPLE_ASM)
This should be fine for HTM, but it may be a bit *too* pessimistic for
other features. We'll see when we get there :-)
> +@deftypefn {Target Hook} bool TARGET_NEED_IPA_FN_TARGET_INFO (const_tree @var{decl}, unsigned int& @var{info})
> +Allow target to check early whether it is necessary to analyze all gimple
> +statements in the given function to update target specific information for
> +inlining. See hook @code{update_ipa_fn_target_info} for usage example of
[ ... ]
> +The default version of this hook returns false.
And that is really the only reason to have this premature optimisation:
targets that do not care do not have to pay the price, however trivial
that price may be, which is a good idea politically ;-)
> +/* { dg-final { scan-tree-dump-times "Inlining foo/\[0-9\]* " 1 "einline"} } */
If you use {} instead of "" you don't need the backslashes.
> +default_update_ipa_fn_target_info (uint16_t &, const gimple *)
I'm surprised the compiler didn't warn about this btw.
The rs6000 parts are okay for trunk (with the trivial cleanups please).
Thanks!
Segher
Hi,
On Fri, Sep 17 2021, Segher Boessenkool wrote:
> On Fri, Sep 17, 2021 at 05:42:38PM +0800, Kewen.Lin wrote:
>> Against v2 [2], this v3 addressed Martin's review comments:
>> - Replace HWI auto_vec with unsigned int for target_info
>> to avoid overkill (also Segher's comments), adjust some
>> places need to be updated for this change.
>
> I'd have used a single HWI (always 64 bits), but an int (always at least
> 32 bits in GCC) is fine as well, sure.
Let's have it as unsigned int then (in a separate thread I was
suggesting to go even smaller).
> That easily fits one line. (Many more examples here btw).
>
>> +bool
>> +rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
>> + HOST_WIDE_INT mask)
>> +{
>> + gcc_assert (code < RS6000_BUILTIN_COUNT);
>
> We don't have this assert anywhere else, so lose it here as well?
>
> If we want such checking we should make an inline accessor function for
> this, and check it there. But we already do a check in
> rs6000_builtin_decl (and one in def_builtin, but that one has an
> off-by-one error in it).
>
>> +extern bool rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
>> + HOST_WIDE_INT mask);
>
> The huge unwieldy name suggests it might not be the best abstraction you
> could use, btw ;-)
>
>> +static bool
>> +rs6000_update_ipa_fn_target_info (unsigned int &info, const gimple *stmt)
>> +{
>> + /* Assume inline asm can use any instruction features. */
>> + if (gimple_code (stmt) == GIMPLE_ASM)
>
> This should be fine for HTM, but it may be a bit *too* pessimistic for
> other features. We'll see when we get there :-)
>
>> +@deftypefn {Target Hook} bool TARGET_NEED_IPA_FN_TARGET_INFO (const_tree @var{decl}, unsigned int& @var{info})
>> +Allow target to check early whether it is necessary to analyze all gimple
>> +statements in the given function to update target specific information for
>> +inlining. See hook @code{update_ipa_fn_target_info} for usage example of
> [ ... ]
>> +The default version of this hook returns false.
>
> And that is really the only reason to have this premature optimisation:
> targets that do not care do not have to pay the price, however trivial
> that price may be, which is a good idea politically ;-)
>
>> +/* { dg-final { scan-tree-dump-times "Inlining foo/\[0-9\]* " 1 "einline"} } */
>
> If you use {} instead of "" you don't need the backslashes.
>
>> +default_update_ipa_fn_target_info (uint16_t &, const gimple *)
>
> I'm surprised the compiler didn't warn about this btw.
>
> The rs6000 parts are okay for trunk (with the trivial cleanups please).
> Thanks!
the IPA bits are OK too (after the type mismatch is fixed).
Martin
Hi Segher,
Thanks for the review!
on 2021/9/17 下午10:14, Segher Boessenkool wrote:
> On Fri, Sep 17, 2021 at 05:42:38PM +0800, Kewen.Lin wrote:
>> Against v2 [2], this v3 addressed Martin's review comments:
>> - Replace HWI auto_vec with unsigned int for target_info
>> to avoid overkill (also Segher's comments), adjust some
>> places need to be updated for this change.
>
> I'd have used a single HWI (always 64 bits), but an int (always at least
> 32 bits in GCC) is fine as well, sure.
>
>> * config/rs6000/rs6000-internal.h
>> (rs6000_fn_has_any_of_these_mask_bits): New declare.
>
> You can break that after the ":"... Just :-)
>
OK, will adjust.
>> * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
>> hook.
>
> That easily fits one line. (Many more examples here btw).
>
I noticed some practices seem to like breaking it early, maybe due to
it could have a good overall alignment view? I am guessing the demand
for where to break isn't that strict?
>> +bool
>> +rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
>> + HOST_WIDE_INT mask)
>> +{
>> + gcc_assert (code < RS6000_BUILTIN_COUNT);
>
> We don't have this assert anywhere else, so lose it here as well?
>
> If we want such checking we should make an inline accessor function for
> this, and check it there. But we already do a check in
> rs6000_builtin_decl (and one in def_builtin, but that one has an
> off-by-one error in it).
OK, will remove it.
>
>> +extern bool rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
>> + HOST_WIDE_INT mask);
>
> The huge unwieldy name suggests it might not be the best abstraction you
> could use, btw ;-)
>
>> +static bool
>> +rs6000_update_ipa_fn_target_info (unsigned int &info, const gimple *stmt)
>> +{
>> + /* Assume inline asm can use any instruction features. */
>> + if (gimple_code (stmt) == GIMPLE_ASM)
>
> This should be fine for HTM, but it may be a bit *too* pessimistic for
> other features. We'll see when we get there :-)
>
>> +@deftypefn {Target Hook} bool TARGET_NEED_IPA_FN_TARGET_INFO (const_tree @var{decl}, unsigned int& @var{info})
>> +Allow target to check early whether it is necessary to analyze all gimple
>> +statements in the given function to update target specific information for
>> +inlining. See hook @code{update_ipa_fn_target_info} for usage example of
> [ ... ]
>> +The default version of this hook returns false.
>
> And that is really the only reason to have this premature optimisation:
> targets that do not care do not have to pay the price, however trivial
> that price may be, which is a good idea politically ;-)
>
>> +/* { dg-final { scan-tree-dump-times "Inlining foo/\[0-9\]* " 1 "einline"} } */
>
> If you use {} instead of "" you don't need the backslashes.
>
OK, will adjust.
>> +default_update_ipa_fn_target_info (uint16_t &, const gimple *)
>
> I'm surprised the compiler didn't warn about this btw.
>
So am I. Guessing it's due to I only bootstrapped it on ppc64le which re-defines
the hook? Anyway, I will do the testing on x86 as well before committing it.
> The rs6000 parts are okay for trunk (with the trivial cleanups please).
> Thanks!
>
Thanks again!
BR,
Kewen
Hi Martin,
Thanks for the review.
on 2021/9/18 下午7:31, Martin Jambor wrote:
> Hi,
>
> On Fri, Sep 17 2021, Segher Boessenkool wrote:
>> On Fri, Sep 17, 2021 at 05:42:38PM +0800, Kewen.Lin wrote:
>>> Against v2 [2], this v3 addressed Martin's review comments:
>>> - Replace HWI auto_vec with unsigned int for target_info
>>> to avoid overkill (also Segher's comments), adjust some
>>> places need to be updated for this change.
>>
>> I'd have used a single HWI (always 64 bits), but an int (always at least
>> 32 bits in GCC) is fine as well, sure.
>
> Let's have it as unsigned int then (in a separate thread I was
> suggesting to go even smaller).
>
>> That easily fits one line. (Many more examples here btw).
>>
>>> +bool
>>> +rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
>>> + HOST_WIDE_INT mask)
>>> +{
>>> + gcc_assert (code < RS6000_BUILTIN_COUNT);
>>
>> We don't have this assert anywhere else, so lose it here as well?
>>
>> If we want such checking we should make an inline accessor function for
>> this, and check it there. But we already do a check in
>> rs6000_builtin_decl (and one in def_builtin, but that one has an
>> off-by-one error in it).
>>
>>> +extern bool rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
>>> + HOST_WIDE_INT mask);
>>
>> The huge unwieldy name suggests it might not be the best abstraction you
>> could use, btw ;-)
>>
>>> +static bool
>>> +rs6000_update_ipa_fn_target_info (unsigned int &info, const gimple *stmt)
>>> +{
>>> + /* Assume inline asm can use any instruction features. */
>>> + if (gimple_code (stmt) == GIMPLE_ASM)
>>
>> This should be fine for HTM, but it may be a bit *too* pessimistic for
>> other features. We'll see when we get there :-)
>>
>>> +@deftypefn {Target Hook} bool TARGET_NEED_IPA_FN_TARGET_INFO (const_tree @var{decl}, unsigned int& @var{info})
>>> +Allow target to check early whether it is necessary to analyze all gimple
>>> +statements in the given function to update target specific information for
>>> +inlining. See hook @code{update_ipa_fn_target_info} for usage example of
>> [ ... ]
>>> +The default version of this hook returns false.
>>
>> And that is really the only reason to have this premature optimisation:
>> targets that do not care do not have to pay the price, however trivial
>> that price may be, which is a good idea politically ;-)
>>
>>> +/* { dg-final { scan-tree-dump-times "Inlining foo/\[0-9\]* " 1 "einline"} } */
>>
>> If you use {} instead of "" you don't need the backslashes.
>>
>>> +default_update_ipa_fn_target_info (uint16_t &, const gimple *)
>>
>> I'm surprised the compiler didn't warn about this btw.
>>
>> The rs6000 parts are okay for trunk (with the trivial cleanups please).
>> Thanks!
>
>> +/* By default, return false to not need to collect any target information
>> + for inlining. Target maintainer should re-define the hook if the
>> + target want to take advantage of it. */
>> +
>> +bool
>> +default_need_ipa_fn_target_info (const_tree, uint16_t &)
>> +{
>> + return false;
>> +}
>> +
>> +bool
>> +default_update_ipa_fn_target_info (uint16_t &, const gimple *)
>> +{
>> + return false;
>> +}
>
> The parameters have uint16_t type here but you apparently decided to use
unsigned int everywhere else, you probably forgot to change them here
too.
Yeah, I did forget to change them. :( Thanks for catching!
> the IPA bits are OK too (after the type mismatch is fixed).
>
Thanks!
BR,
Kewen
@@ -13795,6 +13795,18 @@ rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
return rs6000_builtin_decls[code];
}
+/* Return true if the builtin with CODE has any mask bits set
+ which are specified by MASK. */
+
+bool
+rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
+ HOST_WIDE_INT mask)
+{
+ gcc_assert (code < RS6000_BUILTIN_COUNT);
+ HOST_WIDE_INT fnmask = rs6000_builtin_info[code].mask;
+ return (fnmask & mask) != 0;
+}
+
static void
altivec_init_builtins (void)
{
@@ -180,6 +180,9 @@ extern tree rs6000_fold_builtin (tree fndecl ATTRIBUTE_UNUSED,
tree *args ATTRIBUTE_UNUSED,
bool ignore ATTRIBUTE_UNUSED);
+extern bool rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
+ HOST_WIDE_INT mask);
+
#if TARGET_ELF
extern bool rs6000_passes_ieee128;
#endif
@@ -71,6 +71,9 @@
#include "tree-vector-builder.h"
#include "context.h"
#include "tree-pass.h"
+#include "symbol-summary.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
#include "except.h"
#if TARGET_XCOFF
#include "xcoffout.h" /* get declarations of xcoff_*_section_name */
@@ -1786,6 +1789,12 @@ static const struct attribute_spec rs6000_attribute_table[] =
#undef TARGET_INVALID_CONVERSION
#define TARGET_INVALID_CONVERSION rs6000_invalid_conversion
+
+#undef TARGET_NEED_IPA_FN_TARGET_INFO
+#define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
+
+#undef TARGET_UPDATE_IPA_FN_TARGET_INFO
+#define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
/* Processor table. */
@@ -25026,7 +25035,63 @@ rs6000_generate_version_dispatcher_body (void *node_p)
return resolver;
}
-
+/* Hook to decide if we need to scan function gimple statements to
+ collect target specific information for inlining, and update the
+ corresponding RS6000_FN_TARGET_INFO_* bit in INFO if we are able
+ to predict which ISA feature is used at this time. Return true
+ if we need to scan, otherwise return false. */
+
+static bool
+rs6000_need_ipa_fn_target_info (const_tree decl,
+ unsigned int &info ATTRIBUTE_UNUSED)
+{
+ tree target = DECL_FUNCTION_SPECIFIC_TARGET (decl);
+ if (!target)
+ target = target_option_default_node;
+ struct cl_target_option *opts = TREE_TARGET_OPTION (target);
+
+ /* See PR102059, we only handle HTM for now, so will only do
+ the consequent scannings when HTM feature enabled. */
+ if (opts->x_rs6000_isa_flags & OPTION_MASK_HTM)
+ return true;
+
+ return false;
+}
+
+/* Hook to update target specific information INFO for inlining by
+ checking the given STMT. Return false if we don't need to scan
+ any more, otherwise return true. */
+
+static bool
+rs6000_update_ipa_fn_target_info (unsigned int &info, const gimple *stmt)
+{
+ /* Assume inline asm can use any instruction features. */
+ if (gimple_code (stmt) == GIMPLE_ASM)
+ {
+ /* Should set any bits we concerned, for now OPTION_MASK_HTM is
+ the only bit we care about. */
+ info |= RS6000_FN_TARGET_INFO_HTM;
+ return false;
+ }
+ else if (gimple_code (stmt) == GIMPLE_CALL)
+ {
+ tree fndecl = gimple_call_fndecl (stmt);
+ if (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD))
+ {
+ enum rs6000_builtins fcode =
+ (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+ /* HTM bifs definitely exploit HTM insns. */
+ if (rs6000_fn_has_any_of_these_mask_bits (fcode, RS6000_BTM_HTM))
+ {
+ info |= RS6000_FN_TARGET_INFO_HTM;
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
/* Hook to determine if one function can safely inline another. */
static bool
@@ -25059,10 +25124,19 @@ rs6000_can_inline_p (tree caller, tree callee)
| OPTION_MASK_P10_FUSION_ADDLOG | OPTION_MASK_P10_FUSION_2ADD
| OPTION_MASK_PCREL_OPT);
- if (always_inline) {
- caller_isa &= ~always_inline_safe_mask;
- callee_isa &= ~always_inline_safe_mask;
- }
+ /* Some features are totally safe for inlining (or always inlines),
+ let's exclude them from the following checkings. */
+ HOST_WIDE_INT safe_mask = always_inline ? always_inline_safe_mask : 0;
+ cgraph_node *callee_node = cgraph_node::get (callee);
+ if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
+ {
+ unsigned int info = ipa_fn_summaries->get (callee_node)->target_info;
+ if ((info & RS6000_FN_TARGET_INFO_HTM) == 0)
+ safe_mask |= OPTION_MASK_HTM;
+ }
+
+ /* To adjust callee is enough since we just check for subset. */
+ callee_isa &= ~safe_mask;
/* The callee's options must be a subset of the caller's options, i.e.
a vsx function may inline an altivec function, but a no-vsx function
@@ -25082,8 +25156,7 @@ rs6000_can_inline_p (tree caller, tree callee)
then we must enforce that the callee's and caller's options match
exactly; see PR70010. */
HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
- if (always_inline)
- explicit_isa &= ~always_inline_safe_mask;
+ explicit_isa &= ~safe_mask;
if ((caller_isa & explicit_isa) != (callee_isa & explicit_isa))
{
if (TARGET_DEBUG_TARGET)
@@ -604,6 +604,11 @@ extern int rs6000_vector_align[];
&& TARGET_P8_VECTOR \
&& TARGET_POWERPC64)
+/* Inlining allows targets to define the meanings of bits in target_info
+ field of ipa_fn_summary by itself, the used bits for rs6000 are listed
+ below. */
+#define RS6000_FN_TARGET_INFO_HTM 1
+
/* Whether the various reciprocal divide/square root estimate instructions
exist, and whether we should automatically generate code for the instruction
by default. */
@@ -10752,6 +10752,37 @@ default, inlining is not allowed if the callee function has function
specific target options and the caller does not use the same options.
@end deftypefn
+@deftypefn {Target Hook} bool TARGET_UPDATE_IPA_FN_TARGET_INFO (unsigned int& @var{info}, const gimple* @var{stmt})
+Allow target to analyze all gimple statements for the given function to
+record and update some target specific information for inlining. A typical
+example is that a caller with one isa feature disabled is normally not
+allowed to inline a callee with that same isa feature enabled even which is
+attributed by always_inline, but with the conservative analysis on all
+statements of the callee if we are able to guarantee the callee does not
+exploit any instructions from the mismatch isa feature, it would be safe to
+allow the caller to inline the callee.
+@var{info} is one @code{unsigned int} value to record information in which
+one set bit indicates one corresponding feature is detected in the analysis,
+@var{stmt} is the statement being analyzed. Return true if target still
+need to analyze the subsequent statements, otherwise return false to stop
+subsequent analysis.
+The default version of this hook returns false.
+@end deftypefn
+
+@deftypefn {Target Hook} bool TARGET_NEED_IPA_FN_TARGET_INFO (const_tree @var{decl}, unsigned int& @var{info})
+Allow target to check early whether it is necessary to analyze all gimple
+statements in the given function to update target specific information for
+inlining. See hook @code{update_ipa_fn_target_info} for usage example of
+target specific information. This hook is expected to be invoked ahead of
+the iterating with hook @code{update_ipa_fn_target_info}.
+@var{decl} is the function being analyzed, @var{info} is the same as what
+in hook @code{update_ipa_fn_target_info}, target can do one time update
+into @var{info} without iterating for some case. Return true if target
+decides to analyze all gimple statements to collect information, otherwise
+return false.
+The default version of this hook returns false.
+@end deftypefn
+
@deftypefn {Target Hook} void TARGET_RELAYOUT_FUNCTION (tree @var{fndecl})
This target hook fixes function @var{fndecl} after attributes are processed.
Default does nothing. On ARM, the default function's alignment is updated
@@ -7198,6 +7198,10 @@ on this implementation detail.
@hook TARGET_CAN_INLINE_P
+@hook TARGET_UPDATE_IPA_FN_TARGET_INFO
+
+@hook TARGET_NEED_IPA_FN_TARGET_INFO
+
@hook TARGET_RELAYOUT_FUNCTION
@node Emulated TLS
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see
#include "system.h"
#include "coretypes.h"
#include "backend.h"
+#include "target.h"
#include "tree.h"
#include "gimple.h"
#include "alloc-pool.h"
@@ -1146,6 +1147,8 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node)
fprintf (f, " calls:\n");
dump_ipa_call_summary (f, 4, node, s);
fprintf (f, "\n");
+ if (s->target_info)
+ fprintf (f, " target_info: %x\n", s->target_info);
}
else
fprintf (f, "IPA summary for %s is missing.\n", node->dump_name ());
@@ -2659,6 +2662,12 @@ analyze_function_body (struct cgraph_node *node, bool early)
bb_predicate,
bb_predicate);
+ /* Only look for target information for inlinable functions. */
+ bool scan_for_target_info =
+ info->inlinable
+ && targetm.target_option.need_ipa_fn_target_info (node->decl,
+ info->target_info);
+
if (fbi.info)
compute_bb_predicates (&fbi, node, info, params_summary);
const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
@@ -2878,6 +2887,15 @@ analyze_function_body (struct cgraph_node *node, bool early)
}
}
+ /* For target specific information, we want to scan all statements
+ rather than those statements with non-zero weights, to avoid
+ missing to scan something interesting for target information,
+ such as: internal function calls. */
+ if (scan_for_target_info)
+ scan_for_target_info =
+ targetm.target_option.update_ipa_fn_target_info
+ (info->target_info, stmt);
+
/* Account cost of address calculations in the statements. */
for (unsigned int i = 0; i < gimple_num_ops (stmt); i++)
{
@@ -4072,6 +4090,7 @@ ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge)
toplev_predicate = true;
info->fp_expressions |= callee_info->fp_expressions;
+ info->target_info |= callee_info->target_info;
if (callee_info->conds)
{
@@ -4412,13 +4431,17 @@ inline_read_section (struct lto_file_decl_data *file_data, const char *data,
bp = streamer_read_bitpack (&ib);
if (info)
{
- info->inlinable = bp_unpack_value (&bp, 1);
- info->fp_expressions = bp_unpack_value (&bp, 1);
+ info->inlinable = bp_unpack_value (&bp, 1);
+ info->fp_expressions = bp_unpack_value (&bp, 1);
+ if (!lto_stream_offload_p)
+ info->target_info = streamer_read_uhwi (&ib);
}
else
{
- bp_unpack_value (&bp, 1);
- bp_unpack_value (&bp, 1);
+ bp_unpack_value (&bp, 1);
+ bp_unpack_value (&bp, 1);
+ if (!lto_stream_offload_p)
+ streamer_read_uhwi (&ib);
}
count2 = streamer_read_uhwi (&ib);
@@ -4654,6 +4677,8 @@ ipa_fn_summary_write (void)
bp_pack_value (&bp, info->inlinable, 1);
bp_pack_value (&bp, info->fp_expressions, 1);
streamer_write_bitpack (&bp);
+ if (!lto_stream_offload_p)
+ streamer_write_uhwi (ob, info->target_info);
streamer_write_uhwi (ob, vec_safe_length (info->conds));
for (i = 0; vec_safe_iterate (info->conds, i, &c); i++)
{
@@ -126,7 +126,7 @@ public:
ipa_fn_summary ()
: min_size (0),
inlinable (false), single_caller (false),
- fp_expressions (false),
+ fp_expressions (false), target_info (0),
estimated_stack_size (false),
time (0), conds (NULL),
size_time_table (), call_size_time_table (vNULL),
@@ -141,6 +141,7 @@ public:
: min_size (s.min_size),
inlinable (s.inlinable), single_caller (s.single_caller),
fp_expressions (s.fp_expressions),
+ target_info (s.target_info),
estimated_stack_size (s.estimated_stack_size),
time (s.time), conds (s.conds), size_time_table (),
call_size_time_table (vNULL),
@@ -164,6 +165,10 @@ public:
unsigned int single_caller : 1;
/* True if function contains any floating point expressions. */
unsigned int fp_expressions : 1;
+ /* Like fp_expressions field above, but it's to hold some target specific
+ information, such as some target specific isa flags. Note that for
+ offloading target compilers, this field isn't streamed. */
+ unsigned int target_info;
/* Information about function that will result after applying all the
inline decisions present in the callgraph. Generally kept up to
new file mode 100644
@@ -0,0 +1,12 @@
+/* { dg-lto-do link } */
+/* { dg-skip-if "power only" { ! { powerpc*-*-* } } } */
+/* -Wno-attributes suppresses always_inline warnings. */
+/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes" } } */
+
+int __attribute__ ((always_inline))
+foo1 (int *b)
+{
+ *b += 100;
+ return *b;
+}
+
new file mode 100644
@@ -0,0 +1,9 @@
+extern int foo1 (int *b);
+
+int __attribute__ ((always_inline)) foo2 (int *b)
+{
+ int res = foo1 (b);
+ *b += res;
+ return *b;
+}
+
new file mode 100644
@@ -0,0 +1,11 @@
+extern int foo2 (int *b);
+
+#pragma GCC target "cpu=power10"
+__attribute__ ((always_inline))
+int
+main (int *a)
+{
+ *a = foo2 (a);
+ return 0;
+}
+
new file mode 100644
@@ -0,0 +1,12 @@
+/* { dg-lto-do link } */
+/* { dg-skip-if "power only" { ! { powerpc*-*-* } } } */
+/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -flto -fdump-ipa-inline" } } */
+
+int
+foo1 (int *b)
+{
+ *b += 100;
+ return *b;
+}
+
+/* { dg-final { scan-wpa-ipa-dump-not "target specific option mismatch" "inline" } } */
new file mode 100644
@@ -0,0 +1,9 @@
+extern int foo1 (int *b);
+
+int foo2 (int *b)
+{
+ int res = foo1 (b);
+ *b += res;
+ return *b;
+}
+
new file mode 100644
@@ -0,0 +1,10 @@
+extern int foo2 (int *b);
+
+#pragma GCC target "cpu=power10"
+int
+main (int *a)
+{
+ *a = foo2 (a);
+ return 0;
+}
+
new file mode 100644
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* -Wno-attributes suppresses always_inline warnings. */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -Wno-attributes" } */
+
+/* Verify the reduced case from PR102059 won't fail. */
+
+__attribute__ ((always_inline)) int
+foo (int *b)
+{
+ *b += 10;
+ return *b;
+}
+
+#pragma GCC target "cpu=power10"
+int
+bar (int *a)
+{
+ *a = foo (a);
+ return 0;
+}
+
new file mode 100644
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-htm" } */
+
+/* Verify target info for inlining still works even if callee
+ disables HTM explicitly while caller enables it. */
+
+static inline int __attribute__ ((always_inline))
+foo (int *b)
+{
+ *b += 10;
+ return *b;
+}
+
+#pragma GCC target "htm"
+int
+bar (int *a)
+{
+ *a = foo (a);
+ return 0;
+}
+
new file mode 100644
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -fdump-tree-einline-optimized" } */
+
+/* Like pr102059-5.c, to verify the inlining still happens
+ even without always_inline attribute. */
+
+int foo (int *b)
+{
+ *b += 10;
+ return *b;
+}
+
+#pragma GCC target "cpu=power10"
+int
+bar (int *a)
+{
+ *a = foo (a);
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Inlining foo/\[0-9\]* " 1 "einline"} } */
@@ -6600,6 +6600,41 @@ specific target options and the caller does not use the same options.",
bool, (tree caller, tree callee),
default_target_can_inline_p)
+DEFHOOK
+(update_ipa_fn_target_info,
+ "Allow target to analyze all gimple statements for the given function to\n\
+record and update some target specific information for inlining. A typical\n\
+example is that a caller with one isa feature disabled is normally not\n\
+allowed to inline a callee with that same isa feature enabled even which is\n\
+attributed by always_inline, but with the conservative analysis on all\n\
+statements of the callee if we are able to guarantee the callee does not\n\
+exploit any instructions from the mismatch isa feature, it would be safe to\n\
+allow the caller to inline the callee.\n\
+@var{info} is one @code{unsigned int} value to record information in which\n\
+one set bit indicates one corresponding feature is detected in the analysis,\n\
+@var{stmt} is the statement being analyzed. Return true if target still\n\
+need to analyze the subsequent statements, otherwise return false to stop\n\
+subsequent analysis.\n\
+The default version of this hook returns false.",
+ bool, (unsigned int& info, const gimple* stmt),
+ default_update_ipa_fn_target_info)
+
+DEFHOOK
+(need_ipa_fn_target_info,
+ "Allow target to check early whether it is necessary to analyze all gimple\n\
+statements in the given function to update target specific information for\n\
+inlining. See hook @code{update_ipa_fn_target_info} for usage example of\n\
+target specific information. This hook is expected to be invoked ahead of\n\
+the iterating with hook @code{update_ipa_fn_target_info}.\n\
+@var{decl} is the function being analyzed, @var{info} is the same as what\n\
+in hook @code{update_ipa_fn_target_info}, target can do one time update\n\
+into @var{info} without iterating for some case. Return true if target\n\
+decides to analyze all gimple statements to collect information, otherwise\n\
+return false.\n\
+The default version of this hook returns false.",
+ bool, (const_tree decl, unsigned int& info),
+ default_need_ipa_fn_target_info)
+
DEFHOOK
(relayout_function,
"This target hook fixes function @var{fndecl} after attributes are processed.\n\
@@ -1765,6 +1765,22 @@ default_target_can_inline_p (tree caller, tree callee)
return callee_opts == caller_opts;
}
+/* By default, return false to not need to collect any target information
+ for inlining. Target maintainer should re-define the hook if the
+ target want to take advantage of it. */
+
+bool
+default_need_ipa_fn_target_info (const_tree, uint16_t &)
+{
+ return false;
+}
+
+bool
+default_update_ipa_fn_target_info (uint16_t &, const gimple *)
+{
+ return false;
+}
+
/* If the machine does not have a case insn that compares the bounds,
this means extra overhead for dispatch tables, which raises the
threshold for using them. */
@@ -194,6 +194,8 @@ extern bool default_new_address_profitable_p (rtx, rtx_insn *, rtx);
extern bool default_target_option_valid_attribute_p (tree, tree, tree, int);
extern bool default_target_option_pragma_parse (tree, tree);
extern bool default_target_can_inline_p (tree, tree);
+extern bool default_update_ipa_fn_target_info (unsigned int &, const gimple *);
+extern bool default_need_ipa_fn_target_info (const_tree, unsigned int &);
extern bool default_valid_pointer_mode (scalar_int_mode);
extern bool default_ref_may_alias_errno (class ao_ref *);
extern scalar_int_mode default_addr_space_pointer_mode (addr_space_t);