Message ID | YZe5llec+qA6YdtE@toto.the-meissners.org |
---|---|
Series | Add zero cycle move support |
Hi!

On 11/19/21 8:49 AM, Michael Meissner wrote:
> The next set of 3 patches add zero cycle move support to the Power10.  Zero
> cycle moves are where the move to LR/CTR/TAR register that is adjacent to the
> jump to LR/CTR/TAR register can be fused together.
>
> At the moment, these set of three patches add support for zero cycle moves for
> indirect jumps and switch tables using the CTR register.  Potential zero cycle
> moves for doing returns are not currently handled.
>
> In looking at the code, I discovered that just using zero cycle moves isn't as
> helpful unless we can eliminate the add instruction before doing the jump.  I
> also noticed that the various power10 fusion options are only done if
> -mcpu=power10.  I added a patch to do the fusion for -mtune=power10 as well.
>
> I have done bootstraps and make check with these patches installed on both
> little endian power9 and little endian power10 systems.  Can I install these
> patches?
>
> The following patches will be posted:
>
> 1) Patch to add zero cycle move for indirect jumps and switches.
>
> 2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.
>
> 3) Patch to use absolute addresses for switch tables instead of relative
>    addresses if zero cycle fusion is enabled.
>

For this last point, I had thought that the plan was to always switch over to
absolute addresses for switch tables, following the work that Hao Chen did in
this area.  Am I misremembering?  Hao Chen, can you please remind me where we
ended up here?

Thanks!
Bill
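For readers following along, here is a minimal C sketch of the two kinds of source constructs the quoted cover letter mentions: an indirect jump through a pointer and a dense switch. All names (`op_add`, `call_indirect`, `classify`) are hypothetical and do not come from the patches. The comments state an assumption about typical PowerPC code generation (a move to CTR followed by a branch through CTR); the exact instructions depend on the compiler version and options.

```c
#include <assert.h>

typedef int (*handler_fn)(int);

/* Hypothetical handler, used only for illustration. */
int op_add(int x) { return x + 1; }

/* Indirect call through a pointer. Assumption: on PowerPC this is usually
   emitted as a move to CTR followed by a branch through CTR, which is the
   adjacent pair that a zero cycle move would fuse. */
int call_indirect(handler_fn fn, int x)
{
    assert(fn != 0);
    return fn(x);
}

/* Dense switch. Assumption: with optimization the compiler may lower this
   to a switch (jump) table, load the target, move it to CTR and branch. */
int classify(int op, int x)
{
    switch (op) {
    case 0: return x + 1;
    case 1: return x * 2;
    case 2: return x - 3;
    case 3: return x ^ 0x55;
    case 4: return -x;
    case 5: return x + x;
    default: return 0;
    }
}
```

Compiling a file like this to assembly for a Power target and looking at the instructions around the branch through CTR is one way to see whether the move and the jump end up adjacent.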
On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
> On 11/19/21 8:49 AM, Michael Meissner wrote:
> > 3) Patch to use absolute addresses for switch tables instead of relative
> >    addresses if zero cycle fusion is enabled.
>
> For this last point, I had thought that the plan was to always switch over to
> absolute addresses for switch tables, following the work that Hao Chen did in
> this area.  Am I misremembering?  Hao Chen, can you please remind me where we
> ended up here?

And do the absolute addressing for switch tables changes work on AIX?
I thought that Hao Chen only had done the work for PPC64 Linux ELF
syntax with promises of future changes to accommodate AIX as well.

Thanks, David
On Mon, Nov 22, 2021 at 11:09:22AM -0500, David Edelsohn wrote:
> On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
> And do the absolute addressing for switch tables changes work on AIX?
> I thought that Hao Chen only had done the work for PPC64 Linux ELF
> syntax with promises of future changes to accommodate AIX as well.

In theory it should work on AIX, since the assembler has to support syntax to
load the contents of a 64-bit address in memory.

In the past, when I measured this (probably in the power8 days), the issue was
that using 64-bit loads for the switch tables, instead of 32-bit loads plus an
add instruction, occasionally meant a slowdown for 1-2 benchmarks that were
extremely sensitive to cache sizes.
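The trade-off described above (a full-width load per table entry versus a narrower load plus an add) can be sketched in C using the GNU "labels as values" extension. This is only an illustration, not the code GCC generates for switch statements; the function names are hypothetical, it requires GCC or a compatible compiler, and the entry sizes in the comments assume a 64-bit target where `int` is 32 bits.

```c
#include <assert.h>

/* Absolute table: each entry is a full code address (8 bytes on a 64-bit
   target). Dispatch is a single load followed by the indirect jump. */
int dispatch_absolute(unsigned i, int x)
{
    static void *const table[] = { &&inc, &&dbl, &&neg };
    assert(i < 3);
    goto *table[i];
inc:
    return x + 1;
dbl:
    return x * 2;
neg:
    return -x;
}

/* Relative table: each entry is an offset from a base label (4 bytes when
   int is 32 bits). Dispatch is a load, then an add of the base address,
   then the indirect jump. The table is smaller, at the cost of the add. */
int dispatch_relative(unsigned i, int x)
{
    static const int table[] = { &&inc - &&inc, &&dbl - &&inc, &&neg - &&inc };
    assert(i < 3);
    goto *(&&inc + table[i]);
inc:
    return x + 1;
dbl:
    return x * 2;
neg:
    return -x;
}
```

Both functions return the same results for the same inputs; they differ only in table entry size and in whether an add sits between the load and the jump, which is the instruction the cover letter wants to eliminate ahead of the fused move and branch.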
Bill and David,

Currently, the absolute jump table is not enabled by default. It can be
enabled with the undocumented option "-mno-relative-jumptables". The feature
can be enabled if the target supports named sections (have_named_sections).
We plan to enable the feature by default in GCC 12, and there is a ticket for
it. The latest status is that I am waiting for comments on my patch
(https://github.ibm.com/wschmidt/power-gcc/issues/998#issuecomment-34643825).

Thanks.

On 23/11/2021 12:09 AM, David Edelsohn wrote:
> On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>> For this last point, I had thought that the plan was to always switch over to
>> absolute addresses for switch tables, following the work that Hao Chen did in
>> this area.  Am I misremembering?  Hao Chen, can you please remind me where we
>> ended up here?
> And do the absolute addressing for switch tables changes work on AIX?
> I thought that Hao Chen only had done the work for PPC64 Linux ELF
> syntax with promises of future changes to accommodate AIX as well.
>
> Thanks, David