[0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.

Message ID 20230525123550.1072506-1-manolis.tsamis@vrull.eu
Headers
Series RISC-V: New pass to optimize calculation of offsets for memory operations. |

Message

Manolis Tsamis May 25, 2023, 12:35 p.m. UTC
  This pass tries to optimize memory offset calculations by moving them
from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

The first patch in the series contains the implementation of this pass
while the second is a minor change that enables cprop_hardreg's
propgation of the stack pointer, because this pass depends on cprop
to do the propagation of optimized operations. If preferred I can split
this into two different patches (in which cases some of the testcases
included will fail temporarily).



Manolis Tsamis (2):
  Implementation of new RISCV optimizations pass: fold-mem-offsets.
  cprop_hardreg: Enable propagation of the stack pointer if possible.

 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/riscv-fold-mem-offsets.cc    | 637 ++++++++++++++++++
 gcc/config/riscv/riscv-passes.def             |   1 +
 gcc/config/riscv/riscv-protos.h               |   1 +
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/t-riscv                      |   4 +
 gcc/doc/invoke.texi                           |   8 +
 gcc/regcprop.cc                               |   7 +-
 .../gcc.target/riscv/fold-mem-offsets-1.c     |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c     |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c     |  17 +
 11 files changed, 719 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
  

Comments

Jeff Law May 25, 2023, 1:42 p.m. UTC | #1
On 5/25/23 06:35, Manolis Tsamis wrote:
> 
> This pass tries to optimize memory offset calculations by moving them
> from add immediate instructions to the memory loads/stores.
> For example it can transform this:
> 
>    addi t4,sp,16
>    add  t2,a6,t4
>    shl  t3,t2,1
>    ld   a2,0(t3)
>    addi a2,1
>    sd   a2,8(t2)
> 
> into the following (one instruction less):
> 
>    add  t2,a6,sp
>    shl  t3,t2,1
>    ld   a2,32(t3)
>    addi a2,1
>    sd   a2,24(t2)
> 
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
> 
> The first patch in the series contains the implementation of this pass
> while the second is a minor change that enables cprop_hardreg's
> propgation of the stack pointer, because this pass depends on cprop
> to do the propagation of optimized operations. If preferred I can split
> this into two different patches (in which cases some of the testcases
> included will fail temporarily).
Thanks Manolis.  Do you happen to know if this includes the fixes I 
passed along to Philipp a few months back?  My recollection is one fixed 
stale DF data which prevented an ICE during bootstrapping, the other 
needed to ignore debug insns in one or two places so that the behavior 
didn't change based on the existence of debug insns.


Jeff
  
Manolis Tsamis May 25, 2023, 1:57 p.m. UTC | #2
On Thu, May 25, 2023 at 4:42 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> >
> > This pass tries to optimize memory offset calculations by moving them
> > from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >    addi t4,sp,16
> >    add  t2,a6,t4
> >    shl  t3,t2,1
> >    ld   a2,0(t3)
> >    addi a2,1
> >    sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >    add  t2,a6,sp
> >    shl  t3,t2,1
> >    ld   a2,32(t3)
> >    addi a2,1
> >    sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
> >
> > The first patch in the series contains the implementation of this pass
> > while the second is a minor change that enables cprop_hardreg's
> > propgation of the stack pointer, because this pass depends on cprop
> > to do the propagation of optimized operations. If preferred I can split
> > this into two different patches (in which cases some of the testcases
> > included will fail temporarily).
> Thanks Manolis.  Do you happen to know if this includes the fixes I
> passed along to Philipp a few months back?  My recollection is one fixed
> stale DF data which prevented an ICE during bootstrapping, the other
> needed to ignore debug insns in one or two places so that the behavior
> didn't change based on the existence of debug insns.
>

Hi Jeff,

Yes this does include your fixes for DF and debug insns, along with
some other minor improvements.
Also, thanks for catching these!

Manolis

>
> Jeff
  
Jeff Law June 15, 2023, 3:04 p.m. UTC | #3
On 5/25/23 07:42, Jeff Law wrote:

> Thanks Manolis.  Do you happen to know if this includes the fixes I 
> passed along to Philipp a few months back?  My recollection is one fixed 
> stale DF data which prevented an ICE during bootstrapping, the other 
> needed to ignore debug insns in one or two places so that the behavior 
> didn't change based on the existence of debug insns.
So we stumbled over another relatively minor issue in this code this 
week that I'm sure you'll want to fix for a V2.

Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT 
rather than an "int".  Inside the ASHIFT handling you need to change the 
type of shift_scale to a HOST_WIDE_INT as well and potentially the 
actual computation of shift_scale.

The problem is if you have a compile-time constant address on rv64, it 
might be constructed with code like this:




> (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
>         (const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>      (nil))
> (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
>         (plus:DI (reg:DI 14 a4 [267])
>             (const_int 1365 [0x555]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>      (expr_list:REG_EQUAL (const_int 349525 [0x55555])
>         (nil)))
> (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
>         (const_int 1431662592 [0x55557000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>      (nil))
> (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
>         (plus:DI (reg:DI 13 a3 [268])
>             (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>      (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
>         (nil)))
> (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
>         (ashift:DI (reg:DI 14 a4 [267]) 
>             (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
>      (nil)) 
> (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
>         (plus:DI (reg:DI 14 a4 [271]) 
>             (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>      (expr_list:REG_DEAD (reg:DI 13 a3 [268])
>         (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
>             (nil))))



Note that 32bit ASHIFT in insn 215.  If you're doing that computation in 
a 32bit integer type, then it's going to shift off the end of the type.


Jeff
  
Manolis Tsamis June 15, 2023, 3:30 p.m. UTC | #4
On Thu, Jun 15, 2023 at 6:04 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 07:42, Jeff Law wrote:
>
> > Thanks Manolis.  Do you happen to know if this includes the fixes I
> > passed along to Philipp a few months back?  My recollection is one fixed
> > stale DF data which prevented an ICE during bootstrapping, the other
> > needed to ignore debug insns in one or two places so that the behavior
> > didn't change based on the existence of debug insns.
> So we stumbled over another relatively minor issue in this code this
> week that I'm sure you'll want to fix for a V2.
>
> Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT
> rather than an "int".  Inside the ASHIFT handling you need to change the
> type of shift_scale to a HOST_WIDE_INT as well and potentially the
> actual computation of shift_scale.
>
> The problem is if you have a compile-time constant address on rv64, it
> might be constructed with code like this:
>
>
>
>
> > (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
> >         (const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
> >      (nil))
> > (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
> >         (plus:DI (reg:DI 14 a4 [267])
> >             (const_int 1365 [0x555]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >      (expr_list:REG_EQUAL (const_int 349525 [0x55555])
> >         (nil)))
> > (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
> >         (const_int 1431662592 [0x55557000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
> >      (nil))
> > (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
> >         (plus:DI (reg:DI 13 a3 [268])
> >             (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >      (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
> >         (nil)))
> > (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
> >         (ashift:DI (reg:DI 14 a4 [267])
> >             (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
> >      (nil))
> > (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
> >         (plus:DI (reg:DI 14 a4 [271])
> >             (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >      (expr_list:REG_DEAD (reg:DI 13 a3 [268])
> >         (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
> >             (nil))))
>
>
>
> Note that 32bit ASHIFT in insn 215.  If you're doing that computation in
> a 32bit integer type, then it's going to shift off the end of the type.
>
Thanks for reporting. I also noticed this while reworking the
implementation for v2 and I have fixed it among other things.

But I'm still wondering about the type of the offset folding
calculation and whether it could overflow in a bad way:
Could there also be edge cases where HOST_WIDE_INT would be problematic as well?
Maybe unsigned HOST_WIDE_INT is more correct (due to potential overflow issues)?

Manolis

>
> Jeff
  
Jeff Law June 15, 2023, 3:56 p.m. UTC | #5
On 6/15/23 09:30, Manolis Tsamis wrote:
> On Thu, Jun 15, 2023 at 6:04 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 5/25/23 07:42, Jeff Law wrote:
>>
>>> Thanks Manolis.  Do you happen to know if this includes the fixes I
>>> passed along to Philipp a few months back?  My recollection is one fixed
>>> stale DF data which prevented an ICE during bootstrapping, the other
>>> needed to ignore debug insns in one or two places so that the behavior
>>> didn't change based on the existence of debug insns.
>> So we stumbled over another relatively minor issue in this code this
>> week that I'm sure you'll want to fix for a V2.
>>
>> Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT
>> rather than an "int".  Inside the ASHIFT handling you need to change the
>> type of shift_scale to a HOST_WIDE_INT as well and potentially the
>> actual computation of shift_scale.
>>
>> The problem is if you have a compile-time constant address on rv64, it
>> might be constructed with code like this:
>>
>>
>>
>>
>>> (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
>>>          (const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>>>       (nil))
>>> (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
>>>          (plus:DI (reg:DI 14 a4 [267])
>>>              (const_int 1365 [0x555]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>>>       (expr_list:REG_EQUAL (const_int 349525 [0x55555])
>>>          (nil)))
>>> (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
>>>          (const_int 1431662592 [0x55557000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>>>       (nil))
>>> (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
>>>          (plus:DI (reg:DI 13 a3 [268])
>>>              (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>>>       (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
>>>          (nil)))
>>> (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
>>>          (ashift:DI (reg:DI 14 a4 [267])
>>>              (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
>>>       (nil))
>>> (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
>>>          (plus:DI (reg:DI 14 a4 [271])
>>>              (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>>>       (expr_list:REG_DEAD (reg:DI 13 a3 [268])
>>>          (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
>>>              (nil))))
>>
>>
>>
>> Note that 32bit ASHIFT in insn 215.  If you're doing that computation in
>> a 32bit integer type, then it's going to shift off the end of the type.
>>
> Thanks for reporting. I also noticed this while reworking the
> implementation for v2 and I have fixed it among other things.
> 
> But I'm still wondering about the type of the offset folding
> calculation and whether it could overflow in a bad way:
> Could there also be edge cases where HOST_WIDE_INT would be problematic as well?
> Maybe unsigned HOST_WIDE_INT is more correct (due to potential overflow issues)?
I think HOST_WIDE_INT is going to be OK.  If we overflow a H_W_I, then 
there's bigger problems elsewhere.

jeff
  
Jeff Law June 18, 2023, 6:11 p.m. UTC | #6
On 6/15/23 09:30, Manolis Tsamis wrote:

>>
> Thanks for reporting. I also noticed this while reworking the
> implementation for v2 and I have fixed it among other things.
Sounds good.  I stumbled across another problem while testing V2.

GEN_INT can create a non-canonical integer constant (and one might 
legitimately wonder if we should eliminate GEN_INT).  The specific case 
I ran into was something like 0xfffffff0 for an SImode value on a 64bit 
host.  That should have been 0xfffffffffffffff0 to be canonical.

The right way to handle this these days is with gen_int_mode.    You 
should replace the two calls to GEN_INT with gen_int_mode (new_offset, mode)

Still testing the new variant...

jeff