[PATCH/RFC] rs6000: Remove optimize_for_speed check for implicit TARGET_SAVE_TOC_INDIRECT [PR108184]

Message ID 6305f0e5-d235-8916-6d42-7110cfede236@linux.ibm.com
State New
Headers
Series [PATCH/RFC] rs6000: Remove optimize_for_speed check for implicit TARGET_SAVE_TOC_INDIRECT [PR108184] |

Commit Message

Kewen.Lin Jan. 16, 2023, 9:39 a.m. UTC
  Hi,

Now we will check optimize_function_for_speed_p (cfun) for
TARGET_SAVE_TOC_INDIRECT if it's implicitly enabled.  But
the effect of -msave-toc-indirect is actually to save the
TOC in the prologue for indirect calls rather than inline,
it's also good for optimize_function_for_size?  So this
patch is to remove the check of optimize_function_for_speed
and make it work for both optimizing for size and speed.

Bootstrapped and regtested on powerpc64-linux-gnu P8,
powerpc64le-linux-gnu P{9,10} and powerpc-ibm-aix.

Any thoughts?

Thanks in advance!

Kewen
-----
gcc/ChangeLog:

        * config/rs6000/rs6000.cc (rs6000_call_aix): Remove the check of
        optimize_function_for_speed_p for implicit SAVE_TOC_INDIRECT.

gcc/testsuite/ChangeLog:

        * gcc.target/powerpc/pr108184-3.c: Adjust.
---
 gcc/config/rs6000/rs6000.cc                   |  5 +----
 gcc/testsuite/gcc.target/powerpc/pr108184-3.c | 17 ++++++++++++-----
 2 files changed, 13 insertions(+), 9 deletions(-)

--
2.27.0
  

Comments

Michael Meissner Jan. 17, 2023, 8:57 p.m. UTC | #1
On Mon, Jan 16, 2023 at 05:39:04PM +0800, Kewen.Lin wrote:
> Hi,
> 
> Now we will check optimize_function_for_speed_p (cfun) for
> TARGET_SAVE_TOC_INDIRECT if it's implicitly enabled.  But
> the effect of -msave-toc-indirect is actually to save the
> TOC in the prologue for indirect calls rather than inline,
> it's also good for optimize_function_for_size?  So this
> patch is to remove the check of optimize_function_for_speed
> and make it work for both optimizing for size and speed.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8,
> powerpc64le-linux-gnu P{9,10} and powerpc-ibm-aix.
> 
> Any thoughts?
> 
> Thanks in advance!

Well in terms of size, it is only a savings if we have 2 or more indirect calls
within a module, and we are not compiling for power10.

On power9, if we have just one indirect call, then it is the same size.

On power10, the -msave-toc-indirect switch does nothing, because we don't need
TOCs when we have prefixed addressing.

So I have objection to the change.  I suspect it may be better with a check for
just optimize either for speed or size, and not for speed.

The option however, can slow things down if there is an early exit to the
function since the store would always be done, even if the function exits
early.
  
Michael Meissner Jan. 17, 2023, 8:58 p.m. UTC | #2
On Tue, Jan 17, 2023 at 03:57:24PM -0500, Michael Meissner wrote:
> So I have objection to the change.  I suspect it may be better with a check for
> just optimize either for speed or size, and not for speed.

Sigh.  I meant I have NO objection to the change.  Sorry about that.
  
Kewen.Lin Jan. 18, 2023, 9:04 a.m. UTC | #3
Hi Mike,

Thanks for the comments!

on 2023/1/18 04:57, Michael Meissner wrote:
> On Mon, Jan 16, 2023 at 05:39:04PM +0800, Kewen.Lin wrote:
>> Hi,
>>
>> Now we will check optimize_function_for_speed_p (cfun) for
>> TARGET_SAVE_TOC_INDIRECT if it's implicitly enabled.  But
>> the effect of -msave-toc-indirect is actually to save the
>> TOC in the prologue for indirect calls rather than inline,
>> it's also good for optimize_function_for_size?  So this
>> patch is to remove the check of optimize_function_for_speed
>> and make it work for both optimizing for size and speed.
>>
>> Bootstrapped and regtested on powerpc64-linux-gnu P8,
>> powerpc64le-linux-gnu P{9,10} and powerpc-ibm-aix.
>>
>> Any thoughts?
>>
>> Thanks in advance!
> 
> Well in terms of size, it is only a savings if we have 2 or more indirect calls
> within a module, and we are not compiling for power10.
> 
> On power9, if we have just one indirect call, then it is the same size.
> 
> On power10, the -msave-toc-indirect switch does nothing, because we don't need
> TOCs when we have prefixed addressing.

Yes, exactly, so the test cases have the explicit option -mno-pcrel.

> 
> So I have objection to the change.  I suspect it may be better with a check for
> just optimize either for speed or size, and not for speed.
> 
> The option however, can slow things down if there is an early exit to the
> function since the store would always be done, even if the function exits
> early.
> 

Good point, I guessed that's why we only try to turn it on under the guard
flag_shrink_wrap_separate when there is no explicit -m{no-,}save-toc-indirect.

BR,
Kewen
  

Patch

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index f47d21980a9..870525347d5 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -25688,10 +25688,7 @@  rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)

          /* Can we optimize saving the TOC in the prologue or
             do we need to do it at every call?  */
-         if (TARGET_SAVE_TOC_INDIRECT
-             && !cfun->calls_alloca
-             && (rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT
-                 || optimize_function_for_speed_p (cfun)))
+         if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
            cfun->machine->save_toc_in_prologue = true;
          else
            {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108184-3.c b/gcc/testsuite/gcc.target/powerpc/pr108184-3.c
index ceaa96e4421..a1ce3a18855 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr108184-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr108184-3.c
@@ -2,15 +2,22 @@ 
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-fpic -mno-pcrel -O2" } */

-/* Verify it doesn't imply -msave-toc-indirect, so
-   it doesn't take effect and we have two separated
-   toc savings for these two long calls.  */
+/* Verify -msave-toc-indirect is implicitly enabled
+   for both optimizing for speed and size, so one
+   toc saving for each function.  */

 void foo (void) __attribute__((__longcall__));
 int baz (void) __attribute__((__longcall__));

-__attribute__ ((cold)) int
-bar (void)
+__attribute__ ((cold, noipa)) int
+bar1 (void)
+{
+  foo ();
+  return baz () + 1;
+}
+
+__attribute__ ((noipa)) int
+bar2 (void)
 {
   foo ();
   return baz () + 1;