[0/3] Add zero cycle move support

Message ID YZe5llec+qA6YdtE@toto.the-meissners.org

Message

Michael Meissner Nov. 19, 2021, 2:49 p.m. UTC
  The next set of 3 patches adds zero cycle move support for Power10.  Zero
cycle moves occur when a move to the LR/CTR/TAR register is adjacent to the
jump to the LR/CTR/TAR register, so that the two can be fused together.

At the moment, this set of three patches adds support for zero cycle moves for
indirect jumps and switch tables using the CTR register.  Potential zero cycle
moves for doing returns are not currently handled.

In looking at the code, I discovered that just using zero cycle moves isn't as
helpful unless we can eliminate the add instruction before doing the jump.  I
also noticed that the various power10 fusion options are only done if
-mcpu=power10.  I added a patch to do the fusion for -mtune=power10 as well.

I have done bootstraps and make check with these patches installed on both
little endian power9 and little endian power10 systems.  Can I install these
patches?

The following patches will be posted:

1) Patch to add zero cycle move for indirect jumps and switches.

2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.

3) Patch to use absolute addresses for switch tables instead of relative
   addresses if zero cycle fusion is enabled.
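
For readers without the patches at hand, here is a minimal C sketch of the kind of source that can lead to a switch table.  It is not taken from the patches, and the function name eval is made up for illustration.  Whether a compiler actually emits a jump table, and whether the move to CTR ends up adjacent to the branch, depends on the compiler version and on options such as -mcpu=power10 or -mtune=power10.

```c
/* Hypothetical example: a dense switch whose cases run different code.
   A compiler may lower this to a table lookup followed by an indirect
   jump; on Power that jump goes through a register such as CTR.  */
unsigned eval(unsigned op, unsigned a, unsigned b)
{
    switch (op) {
    case 0: return a + b;
    case 1: return a - b;
    case 2: return a * b;
    case 3: return a & b;
    case 4: return a | b;
    case 5: return a ^ b;
    case 6: return a << (b & 31);   /* mask keeps the shift count in range */
    case 7: return a >> (b & 31);
    default: return 0;              /* out-of-range selectors fall here */
    }
}
```

Inspecting the generated assembly for a function like this is one way to check whether the table entries are relative offsets or absolute addresses on a given target.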
  

Comments

Bill Schmidt via Gcc-patches Nov. 22, 2021, 3:57 p.m. UTC | #1
Hi!

On 11/19/21 8:49 AM, Michael Meissner wrote:
> The next set of 3 patches adds zero cycle move support for Power10.  Zero
> cycle moves occur when a move to the LR/CTR/TAR register is adjacent to the
> jump to the LR/CTR/TAR register, so that the two can be fused together.
>
> At the moment, this set of three patches adds support for zero cycle moves for
> indirect jumps and switch tables using the CTR register.  Potential zero cycle
> moves for doing returns are not currently handled.
>
> In looking at the code, I discovered that just using zero cycle moves isn't as
> helpful unless we can eliminate the add instruction before doing the jump.  I
> also noticed that the various power10 fusion options are only done if
> -mcpu=power10.  I added a patch to do the fusion for -mtune=power10 as well.
>
> I have done bootstraps and make check with these patches installed on both
> little endian power9 and little endian power10 systems.  Can I install these
> patches?
>
> The following patches will be posted:
>
> 1) Patch to add zero cycle move for indirect jumps and switches.
>
> 2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.
>
> 3) Patch to use absolute addresses for switch tables instead of relative
>    addresses if zero cycle fusion is enabled.
>
For this last point, I had thought that the plan was to always switch over to
absolute addresses for switch tables, following the work that Hao Chen did in
this area.  Am I misremembering?  Hao Chen, can you please remind me where we
ended up here?

Thanks!
Bill
  
David Edelsohn Nov. 22, 2021, 4:09 p.m. UTC | #2
On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>
> Hi!
>
> On 11/19/21 8:49 AM, Michael Meissner wrote:
> > The next set of 3 patches adds zero cycle move support for Power10.  Zero
> > cycle moves occur when a move to the LR/CTR/TAR register is adjacent to the
> > jump to the LR/CTR/TAR register, so that the two can be fused together.
> >
> > At the moment, this set of three patches adds support for zero cycle moves for
> > indirect jumps and switch tables using the CTR register.  Potential zero cycle
> > moves for doing returns are not currently handled.
> >
> > In looking at the code, I discovered that just using zero cycle moves isn't as
> > helpful unless we can eliminate the add instruction before doing the jump.  I
> > also noticed that the various power10 fusion options are only done if
> > -mcpu=power10.  I added a patch to do the fusion for -mtune=power10 as well.
> >
> > I have done bootstraps and make check with these patches installed on both
> > little endian power9 and little endian power10 systems.  Can I install these
> > patches?
> >
> > The following patches will be posted:
> >
> > 1) Patch to add zero cycle move for indirect jumps and switches.
> >
> > 2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.
> >
> > 3) Patch to use absolute addresses for switch tables instead of relative
> >    addresses if zero cycle fusion is enabled.
> >
> For this last point, I had thought that the plan was to always switch over to
> absolute addresses for switch tables, following the work that Hao Chen did in
> this area.  Am I misremembering?  Hao Chen, can you please remind me where we
> ended up here?

And do the changes for absolute addressing in switch tables work on AIX?
I thought that Hao Chen had only done the work for PPC64 Linux ELF
syntax, with promises of future changes to accommodate AIX as well.

Thanks, David
  
Michael Meissner Nov. 22, 2021, 9:17 p.m. UTC | #3
On Mon, Nov 22, 2021 at 11:09:22AM -0500, David Edelsohn wrote:
> On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
> And do the changes for absolute addressing in switch tables work on AIX?
> I thought that Hao Chen had only done the work for PPC64 Linux ELF
> syntax, with promises of future changes to accommodate AIX as well.

In theory it should work on AIX, since the assembler has to support syntax to
load the contents of a 64-bit address in memory.

In the past, when I measured this (probably in the power8 days), the issue was
that having 64-bit loads for the switch tables, instead of 32-bit loads and an
add instruction, occasionally meant a slowdown for 1-2 benchmarks that were
extremely sensitive to cache sizes.
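
To make the trade-off concrete, here is a small sketch using the GCC "labels as values" extension.  It is only an illustration: the function names are made up, and it models the two table layouts by hand rather than showing what the compiler emits for a switch.  The relative version stores 32-bit offsets and needs an add before the jump; the absolute version stores full addresses (64 bits each on a 64-bit target) and jumps straight after the load.

```c
#include <assert.h>

/* Relative table: each entry is a 32-bit offset from a base label.
   Dispatch is load + add + jump.  (Label differences as static
   initializers are a GCC extension.)  */
int dispatch_relative(unsigned i)
{
    static const int table[] = {
        &&case0 - &&case0,
        &&case1 - &&case0,
        &&case2 - &&case0,
    };
    assert(i < sizeof table / sizeof table[0]);
    goto *(&&case0 + table[i]);     /* the add that absolute tables avoid */

case0: return 10;
case1: return 20;
case2: return 30;
}

/* Absolute table: each entry is a full code address.
   Dispatch is load + jump, at the cost of wider table entries.  */
int dispatch_absolute(unsigned i)
{
    static void *const table[] = { &&case0, &&case1, &&case2 };
    assert(i < sizeof table / sizeof table[0]);
    goto *table[i];                 /* no add between the load and the jump */

case0: return 10;
case1: return 20;
case2: return 30;
}
```

Both functions return the same value for the same index; the difference is the table entry size and whether an add sits between the load and the jump.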
  
HAO CHEN GUI Nov. 23, 2021, 3:41 a.m. UTC | #4
Bill and David,

    Currently, the absolute jump table is not enabled by default. It can be enabled with the undocumented option "-mno-relative-jumptables", provided the target supports named sections (have_named_sections). We plan to enable the feature by default in GCC12 and there is a ticket for it.  The latest status is that I am waiting for comments on my patch (https://github.ibm.com/wschmidt/power-gcc/issues/998#issuecomment-34643825). Thanks.


On 23/11/2021 12:09 AM, David Edelsohn wrote:
> On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>> Hi!
>>
>> On 11/19/21 8:49 AM, Michael Meissner wrote:
>>> The next set of 3 patches adds zero cycle move support for Power10.  Zero
>>> cycle moves occur when a move to the LR/CTR/TAR register is adjacent to the
>>> jump to the LR/CTR/TAR register, so that the two can be fused together.
>>>
>>> At the moment, this set of three patches adds support for zero cycle moves for
>>> indirect jumps and switch tables using the CTR register.  Potential zero cycle
>>> moves for doing returns are not currently handled.
>>>
>>> In looking at the code, I discovered that just using zero cycle moves isn't as
>>> helpful unless we can eliminate the add instruction before doing the jump.  I
>>> also noticed that the various power10 fusion options are only done if
>>> -mcpu=power10.  I added a patch to do the fusion for -mtune=power10 as well.
>>>
>>> I have done bootstraps and make check with these patches installed on both
>>> little endian power9 and little endian power10 systems.  Can I install these
>>> patches?
>>>
>>> The following patches will be posted:
>>>
>>> 1) Patch to add zero cycle move for indirect jumps and switches.
>>>
>>> 2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.
>>>
>>> 3) Patch to use absolute addresses for switch tables instead of relative
>>>    addresses if zero cycle fusion is enabled.
>>>
>> For this last point, I had thought that the plan was to always switch over to
>> absolute addresses for switch tables, following the work that Hao Chen did in
>> this area.  Am I misremembering?  Hao Chen, can you please remind me where we
>> ended up here?
> And do the changes for absolute addressing in switch tables work on AIX?
> I thought that Hao Chen had only done the work for PPC64 Linux ELF
> syntax, with promises of future changes to accommodate AIX as well.
>
> Thanks, David