Message ID: <20230525123550.1072506-1-manolis.tsamis@vrull.eu>
Headers:
From: Manolis Tsamis <manolis.tsamis@vrull.eu>
To: gcc-patches@gcc.gnu.org
Cc: Richard Biener <richard.guenther@gmail.com>, Palmer Dabbelt <palmer@rivosinc.com>, Philipp Tomsich <philipp.tomsich@vrull.eu>, Kito Cheng <kito.cheng@gmail.com>, Manolis Tsamis <manolis.tsamis@vrull.eu>
Subject: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.
Date: Thu, 25 May 2023 14:35:49 +0200
Message-Id: <20230525123550.1072506-1-manolis.tsamis@vrull.eu>
Series: RISC-V: New pass to optimize calculation of offsets for memory operations.
Message
Manolis Tsamis
May 25, 2023, 12:35 p.m. UTC
This pass tries to optimize memory offset calculations by moving them
from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

The first patch in the series contains the implementation of this pass
while the second is a minor change that enables cprop_hardreg's
propagation of the stack pointer, because this pass depends on cprop
to do the propagation of optimized operations. If preferred I can split
this into two different patches (in which case some of the testcases
included will fail temporarily).

Manolis Tsamis (2):
  Implementation of new RISCV optimizations pass: fold-mem-offsets.
  cprop_hardreg: Enable propagation of the stack pointer if possible.

 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/riscv-fold-mem-offsets.cc    | 637 ++++++++++++++++++
 gcc/config/riscv/riscv-passes.def             |   1 +
 gcc/config/riscv/riscv-protos.h               |   1 +
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/t-riscv                      |   4 +
 gcc/doc/invoke.texi                           |   8 +
 gcc/regcprop.cc                               |   7 +-
 .../gcc.target/riscv/fold-mem-offsets-1.c     |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c     |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c     |  17 +
 11 files changed, 719 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
Comments
On 5/25/23 06:35, Manolis Tsamis wrote:
>
> This pass tries to optimize memory offset calculations by moving them
> from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
>
> into the following (one instruction less):
>
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
>
> The first patch in the series contains the implementation of this pass
> while the second is a minor change that enables cprop_hardreg's
> propagation of the stack pointer, because this pass depends on cprop
> to do the propagation of optimized operations. If preferred I can split
> this into two different patches (in which case some of the testcases
> included will fail temporarily).

Thanks Manolis.  Do you happen to know if this includes the fixes I
passed along to Philipp a few months back?  My recollection is one fixed
stale DF data which prevented an ICE during bootstrapping, the other
needed to ignore debug insns in one or two places so that the behavior
didn't change based on the existence of debug insns.

Jeff
On Thu, May 25, 2023 at 4:42 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> >
> > This pass tries to optimize memory offset calculations by moving them
> > from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >   addi t4,sp,16
> >   add  t2,a6,t4
> >   shl  t3,t2,1
> >   ld   a2,0(t3)
> >   addi a2,1
> >   sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >   add  t2,a6,sp
> >   shl  t3,t2,1
> >   ld   a2,32(t3)
> >   addi a2,1
> >   sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
> >
> > The first patch in the series contains the implementation of this pass
> > while the second is a minor change that enables cprop_hardreg's
> > propagation of the stack pointer, because this pass depends on cprop
> > to do the propagation of optimized operations. If preferred I can split
> > this into two different patches (in which case some of the testcases
> > included will fail temporarily).
>
> Thanks Manolis.  Do you happen to know if this includes the fixes I
> passed along to Philipp a few months back?  My recollection is one fixed
> stale DF data which prevented an ICE during bootstrapping, the other
> needed to ignore debug insns in one or two places so that the behavior
> didn't change based on the existence of debug insns.
>

Hi Jeff,

Yes this does include your fixes for DF and debug insns, along with
some other minor improvements. Also, thanks for catching these!

Manolis

> Jeff
On 5/25/23 07:42, Jeff Law wrote:

> Thanks Manolis.  Do you happen to know if this includes the fixes I
> passed along to Philipp a few months back?  My recollection is one fixed
> stale DF data which prevented an ICE during bootstrapping, the other
> needed to ignore debug insns in one or two places so that the behavior
> didn't change based on the existence of debug insns.

So we stumbled over another relatively minor issue in this code this
week that I'm sure you'll want to fix for a V2.

Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT
rather than an "int".  Inside the ASHIFT handling you need to change the
type of shift_scale to a HOST_WIDE_INT as well and potentially the
actual computation of shift_scale.

The problem is if you have a compile-time constant address on rv64, it
might be constructed with code like this:

> (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
>         (const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>      (nil))
> (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
>         (plus:DI (reg:DI 14 a4 [267])
>             (const_int 1365 [0x555]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>      (expr_list:REG_EQUAL (const_int 349525 [0x55555])
>         (nil)))
> (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
>         (const_int 1431662592 [0x55557000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>      (nil))
> (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
>         (plus:DI (reg:DI 13 a3 [268])
>             (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>      (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
>         (nil)))
> (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
>         (ashift:DI (reg:DI 14 a4 [267])
>             (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
>      (nil))
> (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
>         (plus:DI (reg:DI 14 a4 [271])
>             (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>      (expr_list:REG_DEAD (reg:DI 13 a3 [268])
>         (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
>             (nil))))

Note that 32bit ASHIFT in insn 215.  If you're doing that computation in
a 32bit integer type, then it's going to shift off the end of the type.

Jeff
On Thu, Jun 15, 2023 at 6:04 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
> On 5/25/23 07:42, Jeff Law wrote:
>
> > Thanks Manolis.  Do you happen to know if this includes the fixes I
> > passed along to Philipp a few months back?  My recollection is one fixed
> > stale DF data which prevented an ICE during bootstrapping, the other
> > needed to ignore debug insns in one or two places so that the behavior
> > didn't change based on the existence of debug insns.
>
> So we stumbled over another relatively minor issue in this code this
> week that I'm sure you'll want to fix for a V2.
>
> Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT
> rather than an "int".  Inside the ASHIFT handling you need to change the
> type of shift_scale to a HOST_WIDE_INT as well and potentially the
> actual computation of shift_scale.
>
> The problem is if you have a compile-time constant address on rv64, it
> might be constructed with code like this:
>
> > (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
> >         (const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
> >      (nil))
> > (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
> >         (plus:DI (reg:DI 14 a4 [267])
> >             (const_int 1365 [0x555]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >      (expr_list:REG_EQUAL (const_int 349525 [0x55555])
> >         (nil)))
> > (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
> >         (const_int 1431662592 [0x55557000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
> >      (nil))
> > (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
> >         (plus:DI (reg:DI 13 a3 [268])
> >             (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >      (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
> >         (nil)))
> > (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
> >         (ashift:DI (reg:DI 14 a4 [267])
> >             (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
> >      (nil))
> > (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
> >         (plus:DI (reg:DI 14 a4 [271])
> >             (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >      (expr_list:REG_DEAD (reg:DI 13 a3 [268])
> >         (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
> >             (nil))))
>
> Note that 32bit ASHIFT in insn 215.  If you're doing that computation in
> a 32bit integer type, then it's going to shift off the end of the type.
>

Thanks for reporting. I also noticed this while reworking the
implementation for v2 and I have fixed it among other things.

But I'm still wondering about the type of the offset folding
calculation and whether it could overflow in a bad way:
Could there also be edge cases where HOST_WIDE_INT would be problematic
as well? Maybe unsigned HOST_WIDE_INT is more correct (due to potential
overflow issues)?

Manolis

> Jeff
On 6/15/23 09:30, Manolis Tsamis wrote:
> On Thu, Jun 15, 2023 at 6:04 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>> On 5/25/23 07:42, Jeff Law wrote:
>>
>>> Thanks Manolis.  Do you happen to know if this includes the fixes I
>>> passed along to Philipp a few months back?  My recollection is one fixed
>>> stale DF data which prevented an ICE during bootstrapping, the other
>>> needed to ignore debug insns in one or two places so that the behavior
>>> didn't change based on the existence of debug insns.
>> So we stumbled over another relatively minor issue in this code this
>> week that I'm sure you'll want to fix for a V2.
>>
>> Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT
>> rather than an "int".  Inside the ASHIFT handling you need to change the
>> type of shift_scale to a HOST_WIDE_INT as well and potentially the
>> actual computation of shift_scale.
>>
>> The problem is if you have a compile-time constant address on rv64, it
>> might be constructed with code like this:
>>
>>> (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
>>>         (const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>>>      (nil))
>>> (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
>>>         (plus:DI (reg:DI 14 a4 [267])
>>>             (const_int 1365 [0x555]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>>>      (expr_list:REG_EQUAL (const_int 349525 [0x55555])
>>>         (nil)))
>>> (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
>>>         (const_int 1431662592 [0x55557000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>>>      (nil))
>>> (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
>>>         (plus:DI (reg:DI 13 a3 [268])
>>>             (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>>>      (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
>>>         (nil)))
>>> (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
>>>         (ashift:DI (reg:DI 14 a4 [267])
>>>             (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
>>>      (nil))
>>> (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
>>>         (plus:DI (reg:DI 14 a4 [271])
>>>             (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>>>      (expr_list:REG_DEAD (reg:DI 13 a3 [268])
>>>         (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
>>>             (nil))))
>>
>> Note that 32bit ASHIFT in insn 215.  If you're doing that computation in
>> a 32bit integer type, then it's going to shift off the end of the type.
>>
> Thanks for reporting. I also noticed this while reworking the
> implementation for v2 and I have fixed it among other things.
>
> But I'm still wondering about the type of the offset folding
> calculation and whether it could overflow in a bad way:
> Could there also be edge cases where HOST_WIDE_INT would be problematic as well?
> Maybe unsigned HOST_WIDE_INT is more correct (due to potential overflow issues)?

I think HOST_WIDE_INT is going to be OK.  If we overflow a H_W_I, then
there's bigger problems elsewhere.

jeff
On 6/15/23 09:30, Manolis Tsamis wrote:
>
> Thanks for reporting. I also noticed this while reworking the
> implementation for v2 and I have fixed it among other things.

Sounds good.

I stumbled across another problem while testing V2.  GEN_INT can create
a non-canonical integer constant (and one might legitimately wonder if
we should eliminate GEN_INT).

The specific case I ran into was something like 0xfffffff0 for an SImode
value on a 64bit host.  That should have been 0xfffffffffffffff0 to be
canonical.

The right way to handle this these days is with gen_int_mode.  You
should replace the two calls to GEN_INT with gen_int_mode (new_offset, mode).

Still testing the new variant...

jeff