[v2] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

Message ID 57a263ea-4ef3-c57e-ff4c-2e5833532162@linux.ibm.com
State New
Headers
Series [v2] rs6000: Fix some issues in rs6000_can_inline_p [PR102059] |

Commit Message

Kewen.Lin Dec. 3, 2021, 3:46 a.m. UTC
  Hi Segher,

Thanks for the review!

on 2021/11/30 上午12:57, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Sep 01, 2021 at 02:55:51PM +0800, Kewen.Lin wrote:
>> This patch is to fix the inconsistent behaviors for non-LTO mode
>> and LTO mode.  As Martin pointed out, currently the function
>> rs6000_can_inline_p simply makes it inlinable if callee_tree is
>> NULL, but it's wrong, we should use the command line options
>> from target_option_default_node as default.
> 
> This is not documented.
> 

Yeah, but according to the document for the target attribute [1],
"Multiple target back ends implement the target attribute to specify
that a function is to be compiled with different target options than
specified on the command line. The original target command-line options
are ignored. ", it seems to say the function without any target
attribute/pragma will be compiled with target options specified on the
command line.  I think it's a normal expectation for users.

Excepting for the inconsistent behaviors between LTO and non-LTO,
it can also make the below case different.

// Option: -O2 -mcpu=power8 -mhtm

int foo(int *b) {
  *b += __builtin_ttest();
  return *b;
}

__attribute__((target("no-htm"))) int bar(int *a) {
  *a = foo(a);
  return 0;
}

Without this fix, bar inlines foo but get the error as below:

In function ‘foo’,
    inlined from ‘bar’ at test.c:8:8:
test.c:3:9: error: ‘__builtin_ttest’ requires the ‘-mhtm’ option
    3 |   *b += __builtin_ttest();
      |         ^~~~~~~~~~~~~~~~~

Since bar doesn't support foo's required HTM feature.

With this fix, bar doesn't inline foo and the case gets compiled well.

I've added this test case as pr102059-5.c.

[1] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

>> It also replaces
>> rs6000_isa_flags with the one from target_option_default_node
>> when caller_tree is NULL as rs6000_isa_flags could probably
>> change since initialization.
> 
> Is this enough then?  Are there no other places that use
> rs6000_isa_flags?  Is it correct for us to have that variable at all
> anymore?  Etc.
> 

Good question, I think the answer is yes.  I digged into it and found
the current target option save/restore scheme would keep rs6000_isa_flags
to be the same as the one saved in target_option_default_node for the context
of inlining.  So I put one assertion as below:

    if (!caller_tree) {
      caller_tree = target_option_default_node;
      gcc_assert (rs6000_isa_flags
                  == TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags);
    }

And kicked off regression testings on both BE and LE, the result showed
only one same failure, which seems to be a latent bug.  I've filed PR103515
for it.

This bug also indicates it's better to use target_option_default_node
rather than rs6000_isa_flags when the caller_tree is NULL.  Since
the target_option_default_node is straightforward and doesn't rely
on the assumption that rs6000_isa_flags would be kept as the default
one correctly, it doesn't suffer from bugs like PR103515.

>> +  bool always_inline =
>> +    (DECL_DISREGARD_INLINE_LIMITS (callee)
>> +     && lookup_attribute ("always_inline", DECL_ATTRIBUTES (callee)));
> 
> The "=" should start a line, not end it.  And please don't use
> unnecessary parens.
> 
>> +  /* Some flags such as fusion can be tolerated for always inlines.  */
>> +  unsigned HOST_WIDE_INT always_inline_safe_mask =
>> +    (MASK_P8_FUSION | MASK_P10_FUSION | OPTION_MASK_SAVE_TOC_INDIRECT
>> +     | OPTION_MASK_P8_FUSION_SIGN | OPTION_MASK_P10_FUSION_LD_CMPI
>> +     | OPTION_MASK_P10_FUSION_2LOGICAL | OPTION_MASK_P10_FUSION_LOGADD
>> +     | OPTION_MASK_P10_FUSION_ADDLOG | OPTION_MASK_P10_FUSION_2ADD
>> +     | OPTION_MASK_PCREL_OPT);
> 
> Same.
> 

Both above and here fixed.

> The fusion ones are obvious.  The other two are not.  Please put those
> two more obviously separate (not somewhere in the sea of fusion
> options), and some comment wouldn't hurt either.  You can make it
> separate statements even, make it really stand out.
> 

Fixed by splitting it into: fusion_options_mask and optimization_options_mask.
I'm not happy for the later name, but poor to get a better in mind.  Do you
have any suggestions on the name and words?

> Why are there OPTION_MASKs for separate P10 fusion types here, as well as
> MASK_P10_FUSION?
> 

Mike helped to explain the history, I've updated all of them using OPTION_MASK_
to avoid potential confusion.

>> +
>> +  if (always_inline) {
>> +    caller_isa &= ~always_inline_safe_mask;
>> +    callee_isa &= ~always_inline_safe_mask;
>> +  }
> 
> "{" starts a new line, indented.
> 

Fixed.  I refined this part a bit since caller doesn't need to take actions.

The new version V2 is attached.  Tested as before.

BR,
Kewen
---
gcc/ChangeLog:

	PR ipa/102059
	* config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
	target_option_default_node and consider always_inline_safe flags.

gcc/testsuite/ChangeLog:

	PR ipa/102059
	* gcc.target/powerpc/pr102059-1.c: New test.
	* gcc.target/powerpc/pr102059-2.c: New test.
	* gcc.target/powerpc/pr102059-3.c: New test.
	* gcc.target/powerpc/pr102059-4.c: New test.
	* gcc.target/powerpc/pr102059-5.c: New test.

-----
---
 gcc/config/rs6000/rs6000.c                    | 95 +++++++++++++------
 gcc/testsuite/gcc.target/powerpc/pr102059-1.c | 24 +++++
 gcc/testsuite/gcc.target/powerpc/pr102059-2.c | 20 ++++
 gcc/testsuite/gcc.target/powerpc/pr102059-3.c | 95 +++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 22 +++++
 gcc/testsuite/gcc.target/powerpc/pr102059-5.c | 22 +++++
 6 files changed, 250 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-5.c
  

Comments

Peter Bergner Dec. 3, 2021, 11:23 p.m. UTC | #1
On 12/2/21 9:46 PM, Kewen.Lin via Gcc-patches wrote:
> on 2021/11/30 上午12:57, Segher Boessenkool wrote:
>> On Wed, Sep 01, 2021 at 02:55:51PM +0800, Kewen.Lin wrote:
>>> This patch is to fix the inconsistent behaviors for non-LTO mode
>>> and LTO mode.  As Martin pointed out, currently the function
>>> rs6000_can_inline_p simply makes it inlinable if callee_tree is
>>> NULL, but it's wrong, we should use the command line options
>>> from target_option_default_node as default.
>>
>> This is not documented.
>>
> 
> Yeah, but according to the document for the target attribute [1],
> "Multiple target back ends implement the target attribute to specify
> that a function is to be compiled with different target options than
> specified on the command line. The original target command-line options
> are ignored. ", it seems to say the function without any target
> attribute/pragma will be compiled with target options specified on the
> command line.  I think it's a normal expectation for users.
> 
> Excepting for the inconsistent behaviors between LTO and non-LTO,
> it can also make the below case different.

I thought Martin and richi mentioned that target attribute options
are treated as if they are appended to the end of the command line
options, so they can potentially override earlier options, but they
don't actually ignore them?

Peter
  
Martin Liška Dec. 6, 2021, 9:35 a.m. UTC | #2
On 12/4/21 00:23, Peter Bergner wrote:
> On 12/2/21 9:46 PM, Kewen.Lin via Gcc-patches wrote:
>> on 2021/11/30 上午12:57, Segher Boessenkool wrote:
>>> On Wed, Sep 01, 2021 at 02:55:51PM +0800, Kewen.Lin wrote:
>>>> This patch is to fix the inconsistent behaviors for non-LTO mode
>>>> and LTO mode.  As Martin pointed out, currently the function
>>>> rs6000_can_inline_p simply makes it inlinable if callee_tree is
>>>> NULL, but it's wrong, we should use the command line options
>>>> from target_option_default_node as default.
>>>
>>> This is not documented.
>>>
>>
>> Yeah, but according to the document for the target attribute [1],
>> "Multiple target back ends implement the target attribute to specify
>> that a function is to be compiled with different target options than
>> specified on the command line. The original target command-line options
>> are ignored. ", it seems to say the function without any target
>> attribute/pragma will be compiled with target options specified on the
>> command line.  I think it's a normal expectation for users.
>>
>> Excepting for the inconsistent behaviors between LTO and non-LTO,
>> it can also make the below case different.
> 
> I thought Martin and richi mentioned that target attribute options
> are treated as if they are appended to the end of the command line
> options, so they can potentially override earlier options, but they
> don't actually ignore them?

No, the described behavior is true for optimize attribute:

optimize (string, …)
...
The optimize attribute arguments of a function behave behave as if appended to the command-line.

but:

target (string, …)
...
The original target command-line options are ignored.

As seen here:
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

Cheers,
Martin

> 
> Peter
>
  
Segher Boessenkool Dec. 6, 2021, 1:06 p.m. UTC | #3
On Fri, Dec 03, 2021 at 11:46:53AM +0800, Kewen.Lin wrote:
> >> This patch is to fix the inconsistent behaviors for non-LTO mode
> >> and LTO mode.  As Martin pointed out, currently the function
> >> rs6000_can_inline_p simply makes it inlinable if callee_tree is
> >> NULL, but it's wrong, we should use the command line options
> >> from target_option_default_node as default.
> > 
> > This is not documented.
> 
> Yeah, but according to the document for the target attribute [1],
> "Multiple target back ends implement the target attribute to specify
> that a function is to be compiled with different target options than
> specified on the command line. The original target command-line options
> are ignored. ", it seems to say the function without any target
> attribute/pragma will be compiled with target options specified on the
> command line.  I think it's a normal expectation for users.

The whole target_option_default_node is not documented, I meant.

> Excepting for the inconsistent behaviors between LTO and non-LTO,
> it can also make the below case different.
> 
> // Option: -O2 -mcpu=power8 -mhtm
> 
> int foo(int *b) {
>   *b += __builtin_ttest();
>   return *b;
> }
> 
> __attribute__((target("no-htm"))) int bar(int *a) {
>   *a = foo(a);
>   return 0;
> }
> 
> Without this fix, bar inlines foo but get the error as below:
> 
> In function ‘foo’,
>     inlined from ‘bar’ at test.c:8:8:
> test.c:3:9: error: ‘__builtin_ttest’ requires the ‘-mhtm’ option
>     3 |   *b += __builtin_ttest();
>       |         ^~~~~~~~~~~~~~~~~
> 
> Since bar doesn't support foo's required HTM feature.
> 
> With this fix, bar doesn't inline foo and the case gets compiled well.

And if you inline something no-htm into something that allows htm?  That
should work fine, be allowed just fine.

> >> It also replaces
> >> rs6000_isa_flags with the one from target_option_default_node
> >> when caller_tree is NULL as rs6000_isa_flags could probably
> >> change since initialization.
> > 
> > Is this enough then?  Are there no other places that use
> > rs6000_isa_flags?  Is it correct for us to have that variable at all
> > anymore?  Etc.
> 
> Good question, I think the answer is yes.  I digged into it and found
> the current target option save/restore scheme would keep rs6000_isa_flags
> to be the same as the one saved in target_option_default_node for the context
> of inlining.  So I put one assertion as below:
> 
>     if (!caller_tree) {
>       caller_tree = target_option_default_node;
>       gcc_assert (rs6000_isa_flags
>                   == TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags);
>     }
> 
> And kicked off regression testings on both BE and LE, the result showed
> only one same failure, which seems to be a latent bug.  I've filed PR103515
> for it.

Ah cool :-)  Please send a patch to add that asser then (well, once we
can bootstrap with it ;-) )

> This bug also indicates it's better to use target_option_default_node
> rather than rs6000_isa_flags when the caller_tree is NULL.  Since
> the target_option_default_node is straightforward and doesn't rely
> on the assumption that rs6000_isa_flags would be kept as the default
> one correctly, it doesn't suffer from bugs like PR103515.

So we should not ever use rs6000_isa_flags anymore?

> > The fusion ones are obvious.  The other two are not.  Please put those
> > two more obviously separate (not somewhere in the sea of fusion
> > options), and some comment wouldn't hurt either.  You can make it
> > separate statements even, make it really stand out.
> > 
> 
> Fixed by splitting it into: fusion_options_mask and optimization_options_mask.
> I'm not happy for the later name, but poor to get a better in mind.  Do you
> have any suggestions on the name and words?

You dont have to split it into variables, as you found out then you get
the usual naming problem.  But you can just split it in the code:

  if (important_condition
      || another_important_one
      /* comment explaining things */
      || bla1 || bla2 || bla3 || bla4 || bla5)

> > Why are there OPTION_MASKs for separate P10 fusion types here, as well as
> > MASK_P10_FUSION?
> 
> Mike helped to explain the history, I've updated all of them using OPTION_MASK_
> to avoid potential confusion.

That is one thing, sure, but why are both needed?  Both the "main" flag,
and the "details" flags.

(The latter should soon go away btw).


Segher
  
Peter Bergner Dec. 6, 2021, 2:40 p.m. UTC | #4
On 12/6/21 3:35 AM, Martin Liška wrote:
> On 12/4/21 00:23, Peter Bergner wrote:
>> I thought Martin and richi mentioned that target attribute options
>> are treated as if they are appended to the end of the command line
>> options, so they can potentially override earlier options, but they
>> don't actually ignore them?
> 
> No, the described behavior is true for optimize attribute:
> 
> optimize (string, …)
> ...
> The optimize attribute arguments of a function behave behave as if appended to the command-line.
> 
> but:
> 
> target (string, …)
> ...
> The original target command-line options are ignored.

Ok, you learn something new every day.  Thanks for setting me straight!

Peter
  
Peter Bergner Dec. 6, 2021, 2:52 p.m. UTC | #5
On 12/6/21 7:06 AM, Segher Boessenkool wrote:
> On Fri, Dec 03, 2021 at 11:46:53AM +0800, Kewen.Lin wrot
>> Without this fix, bar inlines foo but get the error as below:
>>
>> In function ‘foo’,
>>     inlined from ‘bar’ at test.c:8:8:
>> test.c:3:9: error: ‘__builtin_ttest’ requires the ‘-mhtm’ option
>>     3 |   *b += __builtin_ttest();
>>       |         ^~~~~~~~~~~~~~~~~
>>
>> Since bar doesn't support foo's required HTM feature.
>>
>> With this fix, bar doesn't inline foo and the case gets compiled well.
> 
> And if you inline something no-htm into something that allows htm?  That
> should work fine, be allowed just fine.

It is only ok to inline a function to be compiled with no-htm into a
function that will be compiled with htm if the no-htm function is
being compiled with no-htm by default.  If the user explicitly used
-fno-htm on the function, then we should not inline the htm function
into it.  Ditto for the other way around.  Meaning, the caller's
option flags should be a superset of the callee's flags, but the
explicit flags used on both functions need to match exactly.

Peter
  
Kewen.Lin Dec. 7, 2021, 3:42 a.m. UTC | #6
Hi Segher,

on 2021/12/6 下午9:06, Segher Boessenkool wrote:
> On Fri, Dec 03, 2021 at 11:46:53AM +0800, Kewen.Lin wrote:
>>>> This patch is to fix the inconsistent behaviors for non-LTO mode
>>>> and LTO mode.  As Martin pointed out, currently the function
>>>> rs6000_can_inline_p simply makes it inlinable if callee_tree is
>>>> NULL, but it's wrong, we should use the command line options
>>>> from target_option_default_node as default.
>>>
>>> This is not documented.
>>
>> Yeah, but according to the document for the target attribute [1],
>> "Multiple target back ends implement the target attribute to specify
>> that a function is to be compiled with different target options than
>> specified on the command line. The original target command-line options
>> are ignored. ", it seems to say the function without any target
>> attribute/pragma will be compiled with target options specified on the
>> command line.  I think it's a normal expectation for users.
> 
> The whole target_option_default_node is not documented, I meant.
> 

Ah, you meant that.  Yeah, it can be improved separately.

>> Excepting for the inconsistent behaviors between LTO and non-LTO,
>> it can also make the below case different.
>>
>> // Option: -O2 -mcpu=power8 -mhtm
>>
>> int foo(int *b) {
>>   *b += __builtin_ttest();
>>   return *b;
>> }
>>
>> __attribute__((target("no-htm"))) int bar(int *a) {
>>   *a = foo(a);
>>   return 0;
>> }
>>
>> Without this fix, bar inlines foo but get the error as below:
>>
>> In function ‘foo’,
>>     inlined from ‘bar’ at test.c:8:8:
>> test.c:3:9: error: ‘__builtin_ttest’ requires the ‘-mhtm’ option
>>     3 |   *b += __builtin_ttest();
>>       |         ^~~~~~~~~~~~~~~~~
>>
>> Since bar doesn't support foo's required HTM feature.
>>
>> With this fix, bar doesn't inline foo and the case gets compiled well.
> 
> And if you inline something no-htm into something that allows htm?  That
> should work fine, be allowed just fine.
> 

Thanks for helping to answer this, Peter!

>>>> It also replaces
>>>> rs6000_isa_flags with the one from target_option_default_node
>>>> when caller_tree is NULL as rs6000_isa_flags could probably
>>>> change since initialization.
>>>
>>> Is this enough then?  Are there no other places that use
>>> rs6000_isa_flags?  Is it correct for us to have that variable at all
>>> anymore?  Etc.
>>
>> Good question, I think the answer is yes.  I digged into it and found
>> the current target option save/restore scheme would keep rs6000_isa_flags
>> to be the same as the one saved in target_option_default_node for the context
>> of inlining.  So I put one assertion as below:
>>
>>     if (!caller_tree) {
>>       caller_tree = target_option_default_node;
>>       gcc_assert (rs6000_isa_flags
>>                   == TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags);
>>     }
>>
>> And kicked off regression testings on both BE and LE, the result showed
>> only one same failure, which seems to be a latent bug.  I've filed PR103515
>> for it.
> 
> Ah cool :-)  Please send a patch to add that asser then (well, once we
> can bootstrap with it ;-) )
> 

OK.

>> This bug also indicates it's better to use target_option_default_node
>> rather than rs6000_isa_flags when the caller_tree is NULL.  Since
>> the target_option_default_node is straightforward and doesn't rely
>> on the assumption that rs6000_isa_flags would be kept as the default
>> one correctly, it doesn't suffer from bugs like PR103515.
> 
> So we should not ever use rs6000_isa_flags anymore?
> 

If the question is for the scope in function rs6000_can_inline_p, yes.

Before this patch, the only use of rs6000_isa_flags is:

   /* If the caller has option attributes, then use them.
      Otherwise, use the command line options.  */
      if (caller_tree)
	caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
      else
	caller_isa = rs6000_isa_flags;

This patch changes this part to be like (logically):

      if (caller_tree)
	caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
      else
	caller_isa = TREE_TARGET_OPTION (target_option_default_node)->x_rs6000_isa_flags;

If the question is for the others out of function rs6000_can_inline_p, no.
The rs6000_isa_flags holds the flags for the current context of either top level
or some given function decl.  As function rs6000_set_current_function shows, we
already consider target_option_default_node for the case that there is NULL
DECL_FUNCTION_SPECIFIC_TARGET for one decl.

>>> The fusion ones are obvious.  The other two are not.  Please put those
>>> two more obviously separate (not somewhere in the sea of fusion
>>> options), and some comment wouldn't hurt either.  You can make it
>>> separate statements even, make it really stand out.
>>>
>>
>> Fixed by splitting it into: fusion_options_mask and optimization_options_mask.
>> I'm not happy for the later name, but poor to get a better in mind.  Do you
>> have any suggestions on the name and words?
> 
> You dont have to split it into variables, as you found out then you get
> the usual naming problem.  But you can just split it in the code:
> 
>   if (important_condition
>       || another_important_one
>       /* comment explaining things */
>       || bla1 || bla2 || bla3 || bla4 || bla5)
> 

Got it, thanks.  Removed those two variables and updated it to:

  /* Some features can be tolerated for always inlines.  */
  unsigned HOST_WIDE_INT always_inline_safe_mask
    /* Fusion option masks.  */
    = OPTION_MASK_P8_FUSION | OPTION_MASK_P10_FUSION
      | OPTION_MASK_P8_FUSION_SIGN | OPTION_MASK_P10_FUSION
      | OPTION_MASK_P10_FUSION_LD_CMPI | OPTION_MASK_P10_FUSION_2LOGICAL
      | OPTION_MASK_P10_FUSION_LOGADD | OPTION_MASK_P10_FUSION_ADDLOG
      | OPTION_MASK_P10_FUSION_2ADD
      /* Like fusion, some option masks which are just for optimization.  */
      | OPTION_MASK_SAVE_TOC_INDIRECT | OPTION_MASK_PCREL_OPT;

>>> Why are there OPTION_MASKs for separate P10 fusion types here, as well as
>>> MASK_P10_FUSION?
>>
>> Mike helped to explain the history, I've updated all of them using OPTION_MASK_
>> to avoid potential confusion.
> 
> That is one thing, sure, but why are both needed?  Both the "main" flag,
> and the "details" flags.

Because they are taken as independent bits in the below checking with bitwise
manipulation, like:

  (caller_isa & callee_isa) == callee_isa)

> 
> (The latter should soon go away btw).
> 
> 
> Segher
> 

The updated patch is attached, just changing always_inline_safe_mask
should be no any functional changes comparing to the previous version.

Does it look good to you?

BR,
Kewen
---
gcc/ChangeLog:

	PR ipa/102059
	* config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
	target_option_default_node and consider always_inline_safe flags.

gcc/testsuite/ChangeLog:

	PR ipa/102059
	* gcc.target/powerpc/pr102059-1.c: New test.
	* gcc.target/powerpc/pr102059-2.c: New test.
	* gcc.target/powerpc/pr102059-3.c: New test.
	* gcc.target/powerpc/pr102059-4.c: New test.
	* gcc.target/powerpc/pr102059-5.c: New test.

-----
---
 gcc/config/rs6000/rs6000.c                    | 91 ++++++++++++------
 gcc/testsuite/gcc.target/powerpc/pr102059-1.c | 24 +++++
 gcc/testsuite/gcc.target/powerpc/pr102059-2.c | 20 ++++
 gcc/testsuite/gcc.target/powerpc/pr102059-3.c | 95 +++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 22 +++++
 gcc/testsuite/gcc.target/powerpc/pr102059-5.c | 22 +++++
 6 files changed, 246 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-5.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 289c1b3df24..856bd271a14 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25539,45 +25539,80 @@ rs6000_generate_version_dispatcher_body (void *node_p)
 static bool
 rs6000_can_inline_p (tree caller, tree callee)
 {
-  bool ret = false;
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
-  /* If the callee has no option attributes, then it is ok to inline.  */
+  /* If the caller/callee has option attributes, then use them.
+     Otherwise, use the command line options.  */
   if (!callee_tree)
-    ret = true;
+    callee_tree = target_option_default_node;
+  if (!caller_tree)
+    caller_tree = target_option_default_node;
+
+  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
+  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
+
+  bool always_inline
+    = DECL_DISREGARD_INLINE_LIMITS (callee)
+      && lookup_attribute ("always_inline", DECL_ATTRIBUTES (callee));
+
+  /* Some features can be tolerated for always inlines.  */
+  unsigned HOST_WIDE_INT always_inline_safe_mask
+    /* Fusion option masks.  */
+    = OPTION_MASK_P8_FUSION | OPTION_MASK_P10_FUSION
+      | OPTION_MASK_P8_FUSION_SIGN | OPTION_MASK_P10_FUSION
+      | OPTION_MASK_P10_FUSION_LD_CMPI | OPTION_MASK_P10_FUSION_2LOGICAL
+      | OPTION_MASK_P10_FUSION_LOGADD | OPTION_MASK_P10_FUSION_ADDLOG
+      | OPTION_MASK_P10_FUSION_2ADD
+      /* Like fusion, some option masks which are just for optimization.  */
+      | OPTION_MASK_SAVE_TOC_INDIRECT | OPTION_MASK_PCREL_OPT;
+
+  /* To adjust callee is enough since we just check for subset.  */
+  if (always_inline)
+    callee_isa &= ~always_inline_safe_mask;
+
+  /* The callee's options must be a subset of the caller's options, i.e.
+     a vsx function may inline an altivec function, but a no-vsx function
+     must not inline a vsx function.  */
+  if ((caller_isa & callee_isa) != callee_isa)
+    {
+      if (TARGET_DEBUG_TARGET)
+	fprintf (stderr,
+		 "rs6000_can_inline_p:, caller %s, callee %s, cannot "
+		 "inline since callee's options set isn't a subset of "
+		 "caller's options set.\n",
+		 get_decl_name (caller), get_decl_name (callee));
+      return false;
+    }
 
-  else
+  /* For those options that the callee has explicitly enabled or disabled,
+     then we must enforce that the callee's and caller's options match
+     exactly; see PR70010.  */
+  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
+  if (always_inline)
+    explicit_isa &= ~always_inline_safe_mask;
+  if ((caller_isa & explicit_isa) != (callee_isa & explicit_isa))
     {
-      HOST_WIDE_INT caller_isa;
-      struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
-      HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
-      HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
-
-      /* If the caller has option attributes, then use them.
-	 Otherwise, use the command line options.  */
-      if (caller_tree)
-	caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
-      else
-	caller_isa = rs6000_isa_flags;
-
-      /* The callee's options must be a subset of the caller's options, i.e.
-	 a vsx function may inline an altivec function, but a no-vsx function
-	 must not inline a vsx function.  However, for those options that the
-	 callee has explicitly enabled or disabled, then we must enforce that
-	 the callee's and caller's options match exactly; see PR70010.  */
-      if (((caller_isa & callee_isa) == callee_isa)
-	  && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
-	ret = true;
+      if (TARGET_DEBUG_TARGET)
+	fprintf (stderr,
+		 "rs6000_can_inline_p:, caller %s, callee %s, cannot "
+		 "inline since callee's options set isn't a subset of "
+		 "caller's options set by considering callee's "
+		 "explicitly set options.\n",
+		 get_decl_name (caller), get_decl_name (callee));
+      return false;
     }
 
   if (TARGET_DEBUG_TARGET)
-    fprintf (stderr, "rs6000_can_inline_p:, caller %s, callee %s, %s inline\n",
-	     get_decl_name (caller), get_decl_name (callee),
-	     (ret ? "can" : "cannot"));
+    fprintf (stderr,
+	     "rs6000_can_inline_p:, caller %s, callee %s, can inline.\n",
+	     get_decl_name (caller), get_decl_name (callee));
 
-  return ret;
+  return true;
 }
+
 
 /* Allocate a stack temp and fixup the address so it meets the particular
    memory requirements (either offetable or REG+REG addressing).  */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-1.c b/gcc/testsuite/gcc.target/powerpc/pr102059-1.c
new file mode 100644
index 00000000000..d2a002cf141
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_htm_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+
+/* Verify it emits inlining error msg at non-LTO mode.  */
+
+#include <htmintrin.h>
+
+static inline int __attribute__ ((always_inline))
+foo (int *b) /* { dg-error "inlining failed in call to .* target specific option mismatch" } */
+{
+  int res = _HTM_STATE(__builtin_ttest());
+  *b += res;
+  return *b;
+}
+
+#pragma GCC target "cpu=power10"
+int
+bar (int *a)
+{
+  *a = foo (a); /* { dg-message "called from here" } */
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-2.c b/gcc/testsuite/gcc.target/powerpc/pr102059-2.c
new file mode 100644
index 00000000000..1d5d6c38bf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-vsx" } */
+
+/* Verify it emits inlining error msg when the callee has explicit
+   disabling option from command line.  */
+
+vector int c, a, b;
+
+static inline void __attribute__ ((__always_inline__))
+foo () /* { dg-error "inlining failed in call to .* target specific option mismatch" } */
+{
+  c = a + b;
+}
+
+__attribute__ ((target ("vsx")))
+int main ()
+{
+  foo (); /* { dg-message "called from here" } */
+  c = a + b;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
new file mode 100644
index 00000000000..9684cab986a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
@@ -0,0 +1,95 @@
+/* { dg-do compile } */
+/* -Wno-attributes suppresses always_inline warnings.  */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -Wno-attributes" } */
+
+/* Verify it doesn't emit inlining error msg since some mismatched
+   features are considered as safe for always_inline.  */
+
+/* 1. Callee enables Power8 fusion implicitly, while caller
+      with Power9 doesn't support power8 fusion at all.  */
+
+__attribute__ ((always_inline))
+int callee1 (int *b)
+{
+  *b += 1;
+  return *b;
+}
+
+#pragma GCC target "cpu=power9"
+int caller1 (int *a)
+{
+  *a = callee1 (a);
+  return 0;
+}
+
+/* 2. Caller enables indirect toc save feature while callee
+      disables it explicitly.  */
+
+#pragma GCC target "save-toc-indirect"
+__attribute__ ((always_inline))
+int callee2 (int *b)
+{
+  *b += 2;
+  return *b;
+}
+
+#pragma GCC target "no-save-toc-indirect"
+int caller2 (int *a)
+{
+  *a = callee2 (a);
+  return 0;
+}
+
+/* 3. Caller disables Power10 fusion explicitly, while callee
+      still supports it as Power10 turns it on by default.  */
+
+#pragma GCC target "cpu=power10"
+__attribute__ ((always_inline))
+int callee3 (int *b)
+{
+  *b += 3;
+  return *b;
+}
+
+#pragma GCC target "cpu=power10,no-power10-fusion"
+int caller3 (int *a)
+{
+  *a = callee3 (a);
+  return 0;
+}
+
+/* 4. Caller enables Power10 fusion implicitly, while callee
+      disables it explicitly.  */
+
+#pragma GCC target "no-power10-fusion"
+__attribute__ ((always_inline))
+int callee4 (int *b)
+{
+  *b += 4;
+  return *b;
+}
+
+#pragma GCC target "cpu=power10"
+int caller4 (int *a)
+{
+  *a = callee4 (a);
+  return 0;
+}
+
+/* 5. Caller disables pcrel-opt while callee enables it explicitly.  */
+
+#pragma GCC target "cpu=power10,no-pcrel-opt"
+__attribute__ ((always_inline))
+int callee5 (int *b)
+{
+  *b += 5;
+  return *b;
+}
+
+#pragma GCC target "cpu=power10,pcrel-opt"
+int caller5 (int *a)
+{
+  *a = callee5 (a);
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-4.c b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
new file mode 100644
index 00000000000..0f27f2ce7d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* -Wno-attributes suppresses always_inline warnings.  */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -Wno-attributes" } */
+
+/* Verify it doesn't emit inlining error msg since the flag power8
+   fusion is considered as safe for always_inline, it's still safe
+   even the flag is set explicitly.  */
+
+__attribute__ ((always_inline))
+int foo (int *b)
+{
+  *b += 10;
+  return *b;
+}
+
+#pragma GCC target "power8-fusion"
+int bar (int *a)
+{
+  *a = foo (a);
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-5.c b/gcc/testsuite/gcc.target/powerpc/pr102059-5.c
new file mode 100644
index 00000000000..826ce88025f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-5.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_htm_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mhtm" } */
+
+/* Verify the inlining won't perform when the callee requires
+   some target feature which isn't supported by caller, even
+   if the callee doesn't have any target attributes or pragmas.
+   If the inlining performs here, the compilation will fail.  */
+
+int
+foo (int *b)
+{
+  *b += __builtin_ttest ();
+  return *b;
+}
+
+__attribute__ ((target ("no-htm"))) int
+bar (int *a)
+{
+  *a = foo (a);
+  return 0;
+}
  

Patch

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 289c1b3df24..8d4258e58b1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25539,45 +25539,84 @@  rs6000_generate_version_dispatcher_body (void *node_p)
 static bool
 rs6000_can_inline_p (tree caller, tree callee)
 {
-  bool ret = false;
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
-  /* If the callee has no option attributes, then it is ok to inline.  */
+  /* If the caller/callee has option attributes, then use them.
+     Otherwise, use the command line options.  */
   if (!callee_tree)
-    ret = true;
+    callee_tree = target_option_default_node;
+  if (!caller_tree)
+    caller_tree = target_option_default_node;
+
+  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
+  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
+
+  bool always_inline
+    = DECL_DISREGARD_INLINE_LIMITS (callee)
+      && lookup_attribute ("always_inline", DECL_ATTRIBUTES (callee));
+
+  /* Fusion option masks.  */
+  unsigned HOST_WIDE_INT fusion_options_mask
+    = OPTION_MASK_P8_FUSION | OPTION_MASK_P10_FUSION | OPTION_MASK_P8_FUSION_SIGN
+      | OPTION_MASK_P10_FUSION_LD_CMPI | OPTION_MASK_P10_FUSION_2LOGICAL
+      | OPTION_MASK_P10_FUSION_LOGADD | OPTION_MASK_P10_FUSION_ADDLOG
+      | OPTION_MASK_P10_FUSION_2ADD;
+
+  /* Like fusion, some option masks which are just for optimization.  */
+  unsigned HOST_WIDE_INT optimization_options_mask
+    = OPTION_MASK_SAVE_TOC_INDIRECT | OPTION_MASK_PCREL_OPT;
+
+  /* Some features can be tolerated for always inlines.  */
+  unsigned HOST_WIDE_INT always_inline_safe_mask
+    = fusion_options_mask | optimization_options_mask;
+
+  /* To adjust callee is enough since we just check for subset.  */
+  if (always_inline)
+    callee_isa &= ~always_inline_safe_mask;
+
+  /* The callee's options must be a subset of the caller's options, i.e.
+     a vsx function may inline an altivec function, but a no-vsx function
+     must not inline a vsx function.  */
+  if ((caller_isa & callee_isa) != callee_isa)
+    {
+      if (TARGET_DEBUG_TARGET)
+	fprintf (stderr,
+		 "rs6000_can_inline_p:, caller %s, callee %s, cannot "
+		 "inline since callee's options set isn't a subset of "
+		 "caller's options set.\n",
+		 get_decl_name (caller), get_decl_name (callee));
+      return false;
+    }
 
-  else
+  /* For those options that the callee has explicitly enabled or disabled,
+     then we must enforce that the callee's and caller's options match
+     exactly; see PR70010.  */
+  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
+  if (always_inline)
+    explicit_isa &= ~always_inline_safe_mask;
+  if ((caller_isa & explicit_isa) != (callee_isa & explicit_isa))
     {
-      HOST_WIDE_INT caller_isa;
-      struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
-      HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
-      HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
-
-      /* If the caller has option attributes, then use them.
-	 Otherwise, use the command line options.  */
-      if (caller_tree)
-	caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
-      else
-	caller_isa = rs6000_isa_flags;
-
-      /* The callee's options must be a subset of the caller's options, i.e.
-	 a vsx function may inline an altivec function, but a no-vsx function
-	 must not inline a vsx function.  However, for those options that the
-	 callee has explicitly enabled or disabled, then we must enforce that
-	 the callee's and caller's options match exactly; see PR70010.  */
-      if (((caller_isa & callee_isa) == callee_isa)
-	  && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
-	ret = true;
+      if (TARGET_DEBUG_TARGET)
+	fprintf (stderr,
+		 "rs6000_can_inline_p:, caller %s, callee %s, cannot "
+		 "inline since callee's options set isn't a subset of "
+		 "caller's options set by considering callee's "
+		 "explicitly set options.\n",
+		 get_decl_name (caller), get_decl_name (callee));
+      return false;
     }
 
   if (TARGET_DEBUG_TARGET)
-    fprintf (stderr, "rs6000_can_inline_p:, caller %s, callee %s, %s inline\n",
-	     get_decl_name (caller), get_decl_name (callee),
-	     (ret ? "can" : "cannot"));
+    fprintf (stderr,
+	     "rs6000_can_inline_p:, caller %s, callee %s, can inline.\n",
+	     get_decl_name (caller), get_decl_name (callee));
 
-  return ret;
+  return true;
 }
+
 
 /* Allocate a stack temp and fixup the address so it meets the particular
    memory requirements (either offetable or REG+REG addressing).  */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-1.c b/gcc/testsuite/gcc.target/powerpc/pr102059-1.c
new file mode 100644
index 00000000000..d2a002cf141
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-1.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_htm_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+
+/* Verify it emits inlining error msg at non-LTO mode.  */
+
+#include <htmintrin.h>
+
+static inline int __attribute__ ((always_inline))
+foo (int *b) /* { dg-error "inlining failed in call to .* target specific option mismatch" } */
+{
+  int res = _HTM_STATE(__builtin_ttest());
+  *b += res;
+  return *b;
+}
+
+#pragma GCC target "cpu=power10"
+int
+bar (int *a)
+{
+  *a = foo (a); /* { dg-message "called from here" } */
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-2.c b/gcc/testsuite/gcc.target/powerpc/pr102059-2.c
new file mode 100644
index 00000000000..1d5d6c38bf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-2.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-vsx" } */
+
+/* Verify it emits inlining error msg when the callee has explicit
+   disabling option from command line.  */
+
+vector int c, a, b;
+
+static inline void __attribute__ ((__always_inline__))
+foo () /* { dg-error "inlining failed in call to .* target specific option mismatch" } */
+{
+  c = a + b;
+}
+
+__attribute__ ((target ("vsx")))
+int main ()
+{
+  foo (); /* { dg-message "called from here" } */
+  c = a + b;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
new file mode 100644
index 00000000000..9684cab986a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
@@ -0,0 +1,95 @@ 
+/* { dg-do compile } */
+/* -Wno-attributes suppresses always_inline warnings.  */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -Wno-attributes" } */
+
+/* Verify it doesn't emit inlining error msg since some mismatched
+   features are considered as safe for always_inline.  */
+
+/* 1. Callee enables Power8 fusion implicitly, while caller
+      with Power9 doesn't support power8 fusion at all.  */
+
+__attribute__ ((always_inline))
+int callee1 (int *b)
+{
+  *b += 1;
+  return *b;
+}
+
+#pragma GCC target "cpu=power9"
+int caller1 (int *a)
+{
+  *a = callee1 (a);
+  return 0;
+}
+
+/* 2. Caller enables indirect toc save feature while callee
+      disables it explicitly.  */
+
+#pragma GCC target "save-toc-indirect"
+__attribute__ ((always_inline))
+int callee2 (int *b)
+{
+  *b += 2;
+  return *b;
+}
+
+#pragma GCC target "no-save-toc-indirect"
+int caller2 (int *a)
+{
+  *a = callee2 (a);
+  return 0;
+}
+
+/* 3. Caller disables Power10 fusion explicitly, while callee
+      still supports it as Power10 turns it on by default.  */
+
+#pragma GCC target "cpu=power10"
+__attribute__ ((always_inline))
+int callee3 (int *b)
+{
+  *b += 3;
+  return *b;
+}
+
+#pragma GCC target "cpu=power10,no-power10-fusion"
+int caller3 (int *a)
+{
+  *a = callee3 (a);
+  return 0;
+}
+
+/* 4. Caller enables Power10 fusion implicitly, while callee
+      disables it explicitly.  */
+
+#pragma GCC target "no-power10-fusion"
+__attribute__ ((always_inline))
+int callee4 (int *b)
+{
+  *b += 4;
+  return *b;
+}
+
+#pragma GCC target "cpu=power10"
+int caller4 (int *a)
+{
+  *a = callee4 (a);
+  return 0;
+}
+
+/* 5. Caller disables pcrel-opt while callee enables it explicitly.  */
+
+#pragma GCC target "cpu=power10,no-pcrel-opt"
+__attribute__ ((always_inline))
+int callee5 (int *b)
+{
+  *b += 5;
+  return *b;
+}
+
+#pragma GCC target "cpu=power10,pcrel-opt"
+int caller5 (int *a)
+{
+  *a = callee5 (a);
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-4.c b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
new file mode 100644
index 00000000000..0f27f2ce7d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* -Wno-attributes suppresses always_inline warnings.  */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -Wno-attributes" } */
+
+/* Verify it doesn't emit inlining error msg since the flag power8
+   fusion is considered as safe for always_inline, it's still safe
+   even the flag is set explicitly.  */
+
+__attribute__ ((always_inline))
+int foo (int *b)
+{
+  *b += 10;
+  return *b;
+}
+
+#pragma GCC target "power8-fusion"
+int bar (int *a)
+{
+  *a = foo (a);
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-5.c b/gcc/testsuite/gcc.target/powerpc/pr102059-5.c
new file mode 100644
index 00000000000..826ce88025f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-5.c
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_htm_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mhtm" } */
+
+/* Verify the inlining won't perform when the callee requires
+   some target feature which isn't supported by caller, even
+   if the callee doesn't have any target attributes or pragmas.
+   If the inlining performs here, the compilation will fail.  */
+
+int
+foo (int *b)
+{
+  *b += __builtin_ttest ();
+  return *b;
+}
+
+__attribute__ ((target ("no-htm"))) int
+bar (int *a)
+{
+  *a = foo (a);
+  return 0;
+}