From patchwork Mon Feb 26 10:25:32 2024
X-Patchwork-Submitter: Georg-Johann Lay
X-Patchwork-Id: 86369
Date: Mon, 26 Feb 2024 11:25:32 +0100
From: Georg-Johann Lay
To: gcc-patches@gcc.gnu.org
Subject: [patch,avr] PR114100: Better indirect addressing on reduced cores

A description of what the patch does follows in the commit message below.

On ATmega128, there are no changes in test results.  On ATtiny40 (reduced
core) there are no new execution fails, but apart from that there is quite
some noise in the delta:

    unsupported (memory full) -> pass
    unsupported (memory full) -> fail due to unresolved symbol (printf, float, ...)
    unsupported (memory full) -> fail (excess errors)

The latter two occur because the testsuite is far from being diagnostic-clean.
All of these transitions happen because the code size shrinks a lot,
sometimes by 20% or more.

If there are no objections or improvements, I would go ahead and install it
so it can go into v14.

Johann

---

The Reduced Tiny core does not support indirect addressing with offset,
which basically means that every indirect memory access with a size of
more than one byte is effectively POST_INC or PRE_DEC.
The lack of that addressing mode is currently handled by pretending to
support it, and then letting the insn printers add and subtract offsets
again as needed.  For example, the following C code

    int vars[10];
    void inc_var2 (void)
    {
        ++vars[2];
    }

is compiled to:

    ldi r30,lo8(vars)        ;  14  [c=4 l=2]  *movhi/4
    ldi r31,hi8(vars)
    subi r30,lo8(-(4))       ;  15  [c=8 l=6]  *movhi/2
    sbci r31,hi8(-(4))
    ld r20,Z+
    ld r21,Z
    subi r30,lo8((4+1))
    sbci r31,hi8((4+1))
    subi r20,-1              ;  16  [c=4 l=2]  *addhi3_clobber/1
    sbci r21,-1
    subi r30,lo8(-(4+1))     ;  17  [c=4 l=4]  *movhi/3
    sbci r31,hi8(-(4+1))
    st Z,r21
    st -Z,r20

where the code could be -- and with this patch actually is -- like

    ldi r30,lo8(vars+4)      ;  28  [c=4 l=2]  *movhi/4
    ldi r31,hi8(vars+4)
    ld r20,Z+                ;  17  [c=8 l=2]  *movhi/2
    ld r21,Z+
    subi r20,-1              ;  19  [c=4 l=2]  *addhi3_clobber/1
    sbci r21,-1
    st -Z,r21                ;  30  [c=4 l=2]  *movhi/3
    st -Z,r20

This is achieved in two steps:

- A post-reload split into "real" instructions during .split2.

- A new avr-specific mini pass .avr-fuse-add that runs before RTL
  peephole and that tries to combine the generated pointer additions
  into memory accesses to form POST_INC or PRE_DEC.

gcc/
    PR target/114100
    * doc/invoke.texi (AVR Options) <-mfuse-add>: Document.
    * config/avr/avr.opt (-mfuse-add=): New target option.
    * common/config/avr/avr-common.cc (avr_option_optimization_table)
    [OPT_LEVELS_1_PLUS]: Set -mfuse-add=1.
    [OPT_LEVELS_2_PLUS]: Set -mfuse-add=2.
    * config/avr/avr-passes.def (avr_pass_fuse_add): Insert new pass.
    * config/avr/avr-protos.h (avr_split_tiny_move)
    (make_avr_pass_fuse_add): New protos.
    * config/avr/avr.md [AVR_TINY]: New post-reload splitter uses
    avr_split_tiny_move to split indirect memory accesses.
    (gen_move_clobbercc): New define_expand helper.
    * config/avr/avr.cc (avr_pass_data_fuse_add): New pass data.
    (avr_pass_fuse_add): New class from rtl_opt_pass.
    (make_avr_pass_fuse_add, avr_split_tiny_move): New functions.
    (reg_seen_between_p, emit_move_ccc, emit_move_ccc_after): New
    functions.
    (avr_legitimate_address_p) [AVR_TINY]: Don't restrict offsets
    of PLUS addressing for AVR_TINY.
    (avr_regno_mode_code_ok_for_base_p) [AVR_TINY]: Ignore -mstrict-X.
    (avr_out_plus_1) [AVR_TINY]: Tweak ++Y and --Y.
    (avr_mode_code_base_reg_class) [AVR_TINY]: Always return POINTER_REGS.

 gcc/common/config/avr/avr-common.cc |   2 +
 gcc/config/avr/avr-passes.def       |   9 ++
 gcc/config/avr/avr-protos.h         |   2 +
 gcc/config/avr/avr.cc               | 787 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 gcc/config/avr/avr.md               |  29 +++
 gcc/config/avr/avr.opt              |   8 ++
 gcc/doc/invoke.texi                 |  10 +-
 7 files changed, 845 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/avr/avr-common.cc b/gcc/common/config/avr/avr-common.cc
index 7867483909d..fdf130f1e1a 100644
--- a/gcc/common/config/avr/avr-common.cc
+++ b/gcc/common/config/avr/avr-common.cc
@@ -34,6 +34,8 @@ static const struct default_options avr_option_optimization_table[] =
     { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_mmain_is_OS_task, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_mfuse_add_, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_mfuse_add_, NULL, 2 },
     // Stick to the "old" placement of the subreg lowering pass.
     { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types_early, NULL, 1 },
     /* Allow optimizer to introduce store data races. This used to be the
diff --git a/gcc/config/avr/avr-passes.def b/gcc/config/avr/avr-passes.def
index 34e5b95f920..748260edaef 100644
--- a/gcc/config/avr/avr-passes.def
+++ b/gcc/config/avr/avr-passes.def
@@ -17,6 +17,15 @@
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */

+/* A post reload optimization pass that fuses PLUS insns with CONST_INT
+   addend with a load or store insn to get POST_INC or PRE_DEC addressing.
+   It can also fuse two PLUSes to a single one, which may occur due to
+   splits from `avr_split_tiny_move'.
+   We do this in a separate pass because it can find more cases than
+   peephole2, for example when there are unrelated insns between the
+   interesting ones.  */
+
+INSERT_PASS_BEFORE (pass_peephole2, 1, avr_pass_fuse_add);
+
 /* An analysis pass that runs prior to prologue / epilogue generation.
    Computes cfun->machine->gasisr.maybe which is used in prologue and
    epilogue generation provided -mgas-isr-prologues is on. */
diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 7d1f815c664..064a3d23322 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -88,6 +88,7 @@ extern void avr_expand_prologue (void);
 extern void avr_expand_epilogue (bool);
 extern bool avr_emit_cpymemhi (rtx*);
 extern int avr_epilogue_uses (int regno);
+extern bool avr_split_tiny_move (rtx_insn *insn, rtx *operands);

 extern void avr_output_addr_vec (rtx_insn*, rtx);
 extern const char *avr_out_sbxx_branch (rtx_insn *insn, rtx operands[]);
@@ -161,6 +162,7 @@ extern bool avr_have_dimode;
 namespace gcc { class context; }
 class rtl_opt_pass;

+extern rtl_opt_pass *make_avr_pass_fuse_add (gcc::context *);
 extern rtl_opt_pass *make_avr_pass_pre_proep (gcc::context *);
 extern rtl_opt_pass *make_avr_pass_recompute_notes (gcc::context *);
 extern rtl_opt_pass *make_avr_pass_casesi (gcc::context *);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index d3756a2f036..b721d9dc98a 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -1779,6 +1779,586 @@ sequent_regs_live (void)
   return (cur_seq == live_seq) ? live_seq : 0;
 }

+
+namespace {
+static const pass_data avr_pass_data_fuse_add =
+{
+  RTL_PASS,      // type
+  "",            // name (will be patched)
+  OPTGROUP_NONE, // optinfo_flags
+  TV_DF_SCAN,    // tv_id
+  0,             // properties_required
+  0,             // properties_provided
+  0,             // properties_destroyed
+  0,             // todo_flags_start
+  TODO_df_finish // todo_flags_finish
+};
+
+
+class avr_pass_fuse_add : public rtl_opt_pass
+{
+public:
+  avr_pass_fuse_add (gcc::context *ctxt, const char *name)
+    : rtl_opt_pass (avr_pass_data_fuse_add, ctxt)
+  {
+    this->name = name;
+  }
+
+  void fuse_add (function *);
+
+  virtual bool gate (function *) { return optimize && avr_fuse_add > 0; }
+
+  virtual unsigned int execute (function *);
+
+  struct Some_Insn
+  {
+    rtx_insn *insn = nullptr;
+    rtx dest, src;
+    bool valid () const { return (bool) insn; }
+    void set_deleted ()
+    {
+      gcc_assert (insn);
+      SET_INSN_DELETED (insn);
+      insn = nullptr;
+    }
+  };
+
+  // If .insn is not NULL, then this is a reg:HI += const_int.
+  struct Add_Insn : Some_Insn
+  {
+    rtx addend;
+    int regno;
+    Add_Insn () {}
+    Add_Insn (rtx_insn *insn);
+  };
+
+  // If .insn is not NULL, then this sets an address register to
+  // a constant value.
+  struct Ldi_Insn : Some_Insn
+  {
+    int regno;
+    Ldi_Insn () {}
+    Ldi_Insn (rtx_insn *insn);
+  };
+
+  // If .insn is not NULL, then this is a load or store insn where
+  // the address is REG or POST_INC.
+  struct Mem_Insn : Some_Insn
+  {
+    rtx reg_or_0, mem, addr, addr_reg;
+    int addr_regno;
+    enum rtx_code addr_code;
+    machine_mode mode;
+    addr_space_t addr_space;
+    bool store_p, volatile_p, generic_p;
+    Mem_Insn () {}
+    Mem_Insn (rtx_insn *insn);
+  };
+
+  rtx_insn *fuse_ldi_add (Ldi_Insn &prev_ldi, Add_Insn &add);
+  rtx_insn *fuse_add_add (Add_Insn &prev_add, Add_Insn &add);
+  rtx_insn *fuse_add_mem (Add_Insn &prev_add, Mem_Insn &mem);
+  rtx_insn *fuse_mem_add (Mem_Insn &prev_mem, Add_Insn &add);
+}; // avr_pass_fuse_add
+
+} // anon namespace
+
+rtl_opt_pass *
+make_avr_pass_fuse_add (gcc::context *ctxt)
+{
+  return new avr_pass_fuse_add (ctxt, "avr-fuse-add");
+}
+
+/* Describe properties of AVR's indirect load and store instructions
+   LD, LDD, ST, STD, LPM, ELPM depending on register number, volatility etc.
+   Rules for "volatile" accesses are:
+
+         | Xmega           | non-Xmega
+   ------+-----------------+----------------
+   load  | read LSB first  | read LSB first
+   store | write LSB first | write MSB first
+*/
+
+struct AVR_LdSt_Props
+{
+  bool has_postinc, has_predec, has_ldd;
+  // The insn printers will use POST_INC or PRE_DEC addressing, no matter
+  // what addressing modes we are feeding into them.
+  bool want_postinc, want_predec;
+
+  AVR_LdSt_Props (int regno, bool store_p, bool volatile_p, addr_space_t as)
+  {
+    bool generic_p = ADDR_SPACE_GENERIC_P (as);
+    bool flash_p = ! generic_p && as != ADDR_SPACE_MEMX;
+    has_postinc = generic_p || (flash_p && regno == REG_Z);
+    has_predec = generic_p;
+    has_ldd = generic_p && ! AVR_TINY && (regno == REG_Y || regno == REG_Z);
+    want_predec = volatile_p && generic_p && ! AVR_XMEGA && store_p;
+    want_postinc = volatile_p && generic_p && (AVR_XMEGA || ! store_p);
+    want_postinc |= flash_p && regno == REG_Z;
+  }
+
+  AVR_LdSt_Props (const avr_pass_fuse_add::Mem_Insn &m)
+    : AVR_LdSt_Props (m.addr_regno, m.store_p, m.volatile_p, m.addr_space)
+  {
+    gcc_assert (m.valid ());
+  }
+};
+
+/* Emit a single_set that clobbers REG_CC.  */
+
+static rtx_insn *
+emit_move_ccc (rtx dest, rtx src)
+{
+  return emit_insn (gen_gen_move_clobbercc (dest, src));
+}
+
+/* Emit a single_set that clobbers REG_CC after insn AFTER.  */
+
+static rtx_insn *
+emit_move_ccc_after (rtx dest, rtx src, rtx_insn *after)
+{
+  return emit_insn_after (gen_gen_move_clobbercc (dest, src), after);
+}
+
+static bool
+reg_seen_between_p (const_rtx reg, const rtx_insn *from, const rtx_insn *to)
+{
+  return (reg_used_between_p (reg, from, to)
+          || reg_set_between_p (reg, from, to));
+}
+
+
+static void
+avr_maybe_adjust_cfa (rtx_insn *insn, rtx reg, int addend)
+{
+  if (addend
+      && frame_pointer_needed
+      && REGNO (reg) == FRAME_POINTER_REGNUM
+      && avr_fuse_add == 3)
+    {
+      rtx plus = plus_constant (Pmode, reg, addend);
+      RTX_FRAME_RELATED_P (insn) = 1;
+      add_reg_note (insn, REG_CFA_ADJUST_CFA, gen_rtx_SET (reg, plus));
+    }
+}
+
+
+// If successful, this represents a SET of a pointer register to a constant.
+avr_pass_fuse_add::Ldi_Insn::Ldi_Insn (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return;
+
+  src = SET_SRC (set);
+  dest = SET_DEST (set);
+
+  if (REG_P (dest)
+      && GET_MODE (dest) == Pmode
+      && IN_RANGE (regno = REGNO (dest), REG_X, REG_Z)
+      && CONSTANT_P (src))
+    {
+      this->insn = insn;
+    }
+}
+
+// If successful, this represents a PLUS with CONST_INT of a pointer
+// register X, Y or Z.  Otherwise, the object is not valid().
+avr_pass_fuse_add::Add_Insn::Add_Insn (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return;
+
+  src = SET_SRC (set);
+  dest = SET_DEST (set);
+  if (REG_P (dest)
+      // We are only interested in PLUSes that change address regs.
+      && GET_MODE (dest) == Pmode
+      && IN_RANGE (regno = REGNO (dest), REG_X, REG_Z)
+      && PLUS == GET_CODE (src)
+      && rtx_equal_p (XEXP (src, 0), dest)
+      && CONST_INT_P (XEXP (src, 1)))
+    {
+      // This is reg:HI += const_int.
+      addend = XEXP (src, 1);
+      this->insn = insn;
+    }
+}
+
+// If successful, this represents a load or store insn where the addressing
+// mode uses pointer register X, Y or Z.  Otherwise, the object is not valid().
+avr_pass_fuse_add::Mem_Insn::Mem_Insn (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return;
+
+  src = SET_SRC (set);
+  dest = SET_DEST (set);
+  mode = GET_MODE (dest);
+
+  if (MEM_P (dest)
+      && (REG_P (src) || src == CONST0_RTX (mode)))
+    {
+      reg_or_0 = src;
+      mem = dest;
+    }
+  else if (REG_P (dest) && MEM_P (src))
+    {
+      reg_or_0 = dest;
+      mem = src;
+    }
+  else
+    return;
+
+  addr = XEXP (mem, 0);
+  addr_code = GET_CODE (addr);
+
+  if (addr_code == REG)
+    addr_reg = addr;
+  else if (addr_code == POST_INC || addr_code == PRE_DEC)
+    addr_reg = XEXP (addr, 0);
+  else
+    return;
+
+  addr_regno = REGNO (addr_reg);
+
+  if (avr_fuse_add == 2
+      && frame_pointer_needed
+      && addr_regno == FRAME_POINTER_REGNUM)
+    MEM_VOLATILE_P (mem) = 0;
+
+  if (reg_overlap_mentioned_p (reg_or_0, addr) // Can handle CONSTANT_P.
+      || addr_regno > REG_Z
+      || avr_mem_memx_p (mem)
+      // The following optimizations only handle REG and POST_INC,
+      // so that's all we allow here.
+      || (addr_code != REG && addr_code != POST_INC))
+    return;
+
+  addr_space = MEM_ADDR_SPACE (mem);
+  volatile_p = MEM_VOLATILE_P (mem);
+  store_p = MEM_P (dest);
+
+  // Turn this "valid".
+  this->insn = insn;
+}
+
+/* Try to combine a Ldi insn with a PLUS CONST_INT addend to one Ldi insn.
+   If LDI is valid, then it precedes ADD in the same block.
+   When a replacement is found, a new insn is emitted and the old insns
+   are pseudo-deleted.  The returned insn is the point where the calling
+   scanner should continue.  When no replacement is found, nullptr is
+   returned and nothing changed.  */
+
+rtx_insn *
+avr_pass_fuse_add::fuse_ldi_add (Ldi_Insn &ldi, Add_Insn &add)
+{
+  if (! ldi.valid ()
+      || reg_seen_between_p (ldi.dest, ldi.insn, add.insn))
+    {
+      // If something is between the Ldi and the current insn, we can
+      // set the Ldi invalid to speed future scans.
+      return ldi.insn = nullptr;
+    }
+
+  // Found a Ldi with const and a PLUS insns in the same BB,
+  // and with no interfering insns between them.
+
+  // Emit new Ldi with the sum of the original offsets after the old Ldi.
+  rtx xval = plus_constant (Pmode, ldi.src, INTVAL (add.addend));
+
+  rtx_insn *insn = emit_move_ccc_after (ldi.dest, xval, ldi.insn);
+  avr_dump (";; new Ldi[%d] insn %d after %d: R%d = %r\n\n", ldi.regno,
+            INSN_UID (insn), INSN_UID (ldi.insn), ldi.regno, xval);
+
+  rtx_insn *next = NEXT_INSN (add.insn);
+  ldi.set_deleted ();
+  add.set_deleted ();
+
+  return next;
+}
+
+/* Try to combine two PLUS insns with CONST_INT addend to one such insn.
+   If PREV_ADD is valid, then it precedes ADD in the same basic block.
+   When a replacement is found, a new insn is emitted and the old insns
+   are pseudo-deleted.  The returned insn is the point where the calling
+   scanner should continue.  When no replacement is found, nullptr is
+   returned and nothing changed.  */
+
+rtx_insn *
+avr_pass_fuse_add::fuse_add_add (Add_Insn &prev_add, Add_Insn &add)
+{
+  if (! prev_add.valid ()
+      || reg_seen_between_p (add.dest, prev_add.insn, add.insn))
+    {
+      // If something is between the previous Add and the current insn,
+      // we can set the previous Add invalid to speed future scans.
+      return prev_add.insn = nullptr;
+    }
+
+  // Found two PLUS insns in the same BB, and with no interfering
+  // insns between them.
+  rtx plus = plus_constant (Pmode, add.src, INTVAL (prev_add.addend));
+
+  rtx_insn *next;
+  if (REG_P (plus))
+    {
+      avr_dump (";; Add[%d] from %d annihilates %d\n\n", add.regno,
+                INSN_UID (prev_add.insn), INSN_UID (add.insn));
+      next = NEXT_INSN (add.insn);
+    }
+  else
+    {
+      // Emit after the current insn, so that it will be picked
+      // up as next valid Add insn.
+      next = emit_move_ccc_after (add.dest, plus, add.insn);
+      avr_dump (";; #1 new Add[%d] insn %d after %d: R%d += %d\n\n",
+                add.regno, INSN_UID (next), INSN_UID (add.insn),
+                add.regno, (int) INTVAL (XEXP (plus, 1)));
+      gcc_assert (GET_CODE (plus) == PLUS);
+    }
+
+  add.set_deleted ();
+  prev_add.set_deleted ();
+
+  return next;
+}
+
+/* Try to combine a PLUS of the address register with a load or store insn.
+   If ADD is valid, then it precedes MEM in the same basic block.
+   When a replacement is found, a new insn is emitted and the old insns
+   are pseudo-deleted.  The returned insn is the point where the calling
+   scanner should continue.  When no replacement is found, nullptr is
+   returned and nothing changed.  */
+
+rtx_insn *
+avr_pass_fuse_add::fuse_add_mem (Add_Insn &add, Mem_Insn &mem)
+{
+  if (! add.valid ()
+      || reg_seen_between_p (add.dest, add.insn, mem.insn))
+    {
+      // If something is between the Add and the current insn, we can
+      // set the Add invalid to speed future scans.
+      return add.insn = nullptr;
+    }
+
+  AVR_LdSt_Props ap { mem };
+
+  int msize = GET_MODE_SIZE (mem.mode);
+
+  // The mem insn really wants PRE_DEC.
+  bool case1 = ((mem.addr_code == REG || mem.addr_code == POST_INC)
+                && msize > 1 && ap.want_predec && ! ap.has_ldd);
+
+  // The offset can be consumed by a PRE_DEC.
+  bool case2 = (- INTVAL (add.addend) == msize
+                && (mem.addr_code == REG || mem.addr_code == POST_INC)
+                && ap.has_predec && ! ap.want_postinc);
+
+  if (! case1 && ! case2)
+    return nullptr;
+
+  // Change from REG or POST_INC to PRE_DEC.
+  rtx xmem = change_address (mem.mem, mem.mode,
+                             gen_rtx_PRE_DEC (Pmode, mem.addr_reg));
+  rtx dest = mem.store_p ? xmem : mem.reg_or_0;
+  rtx src = mem.store_p ? mem.reg_or_0 : xmem;
+
+  rtx_insn *next = emit_move_ccc_after (dest, src, mem.insn);
+  add_reg_note (next, REG_INC, mem.addr_reg);
+  avr_dump (";; new Mem[%d] insn %d after %d: %r = %r\n\n", mem.addr_regno,
+            INSN_UID (next), INSN_UID (mem.insn), dest, src);
+
+  // Changing REG or POST_INC -> PRE_DEC means that the addend before
+  // the memory access must be increased by the size of the access.
+  rtx plus = plus_constant (Pmode, add.src, msize);
+  if (! REG_P (plus))
+    {
+      rtx_insn *insn = emit_move_ccc_after (add.dest, plus, add.insn);
+      avr_dump (";; #2 new Add[%d] insn %d after %d: R%d += %d\n\n",
+                add.regno, INSN_UID (insn), INSN_UID (add.insn),
+                add.regno, (int) INTVAL (XEXP (plus, 1)));
+      gcc_assert (GET_CODE (plus) == PLUS);
+    }
+  else
+    avr_dump (";; Add[%d] insn %d consumed into %d\n\n",
+              add.regno, INSN_UID (add.insn), INSN_UID (next));
+
+  // Changing POST_INC -> PRE_DEC means that the addend after the mem has to
+  // be the size of the access.  The hope is that this new add insn is unused.
+  if (mem.addr_code == POST_INC)
+    {
+      plus = plus_constant (Pmode, add.dest, msize);
+      rtx_insn *next2 = emit_move_ccc_after (add.dest, plus, next);
+      avr_dump (";; #3 new Add[%d] insn %d after %d: R%d += %d\n\n", add.regno,
+                INSN_UID (next2), INSN_UID (next), add.regno, msize);
+      next = next2;
+    }
+
+  add.set_deleted ();
+  mem.set_deleted ();
+
+  return next;
+}
+
+/* Try to combine a load or store insn with a PLUS of the address register.
+   If MEM is valid, then it precedes ADD in the same basic block.
+   When a replacement is found, a new insn is emitted and the old insns
+   are pseudo-deleted.  The returned insn is the point where the calling
+   scanner should continue.  When no replacement is found, nullptr is
+   returned and nothing changed.  */
+
+rtx_insn *
+avr_pass_fuse_add::fuse_mem_add (Mem_Insn &mem, Add_Insn &add)
+{
+  if (! mem.valid ()
+      || reg_seen_between_p (add.dest, mem.insn, add.insn))
+    {
+      // If something is between the Mem and the current insn, we can
+      // set the Mem invalid to speed future scans.
+      return mem.insn = nullptr;
+    }
+
+  AVR_LdSt_Props ap { mem };
+
+  int msize = GET_MODE_SIZE (mem.mode);
+
+  // The add insn can be consumed by a POST_INC.
+  bool case1 = (mem.addr_code == REG
+                && INTVAL (add.addend) == msize
+                && ap.has_postinc && ! ap.want_predec);
+
+  // There are cases where even a partial consumption of the offset is better.
+  // These are the cases where no LD+offset addressing is available, because
+  // the address register is obviously used after the mem insn, and a mem insn
+  // with REG addressing mode will have to restore the address.
+  bool case2 = (mem.addr_code == REG
+                && msize > 1 && ap.want_postinc && ! ap.has_ldd);
+
+  if (! case1 && ! case2)
+    return nullptr;
+
+  // Change addressing mode from REG to POST_INC.
+  rtx xmem = change_address (mem.mem, mem.mode,
+                             gen_rtx_POST_INC (Pmode, mem.addr_reg));
+  rtx dest = mem.store_p ? xmem : mem.reg_or_0;
+  rtx src = mem.store_p ? mem.reg_or_0 : xmem;
+
+  rtx_insn *insn = emit_move_ccc_after (dest, src, mem.insn);
+  add_reg_note (insn, REG_INC, mem.addr_reg);
+  avr_dump (";; new Mem[%d] insn %d after %d: %r = %r\n\n", add.regno,
+            INSN_UID (insn), INSN_UID (mem.insn), dest, src);
+
+  rtx_insn *next = NEXT_INSN (add.insn);
+
+  // Changing REG -> POST_INC means that the post addend must be
+  // decreased by the size of the access.
+  rtx plus = plus_constant (Pmode, add.src, -msize);
+  if (! REG_P (plus))
+    {
+      next = emit_move_ccc_after (mem.addr_reg, plus, add.insn);
+      avr_dump (";; #4 new Add[%d] insn %d after %d: R%d += %d\n\n",
+                add.regno, INSN_UID (next), INSN_UID (add.insn),
+                add.regno, (int) INTVAL (XEXP (plus, 1)));
+      gcc_assert (GET_CODE (plus) == PLUS);
+    }
+  else
+    avr_dump (";; Add[%d] insn %d consumed into %d\n\n",
+              add.regno, INSN_UID (add.insn), INSN_UID (insn));
+
+  add.set_deleted ();
+  mem.set_deleted ();
+
+  return next;
+}
+
+/* Try to post-reload combine PLUS with CONST_INT of pointer registers with:
+   - Sets to a constant address.
+   - PLUS insn of that kind.
+   - Indirect loads and stores.
+   In almost all cases, combine opportunities arise from the preparation
+   done by `avr_split_tiny_move', but in some rare cases combinations are
+   found for the ordinary cores, too.
+   As we consider at most one Mem insn per try, there may still be missed
+   optimizations like POST_INC + PLUS + POST_INC might be performed
+   as PRE_DEC + PRE_DEC for two adjacent locations.  */
+
+unsigned int
+avr_pass_fuse_add::execute (function *func)
+{
+  df_note_add_problem ();
+  df_analyze ();
+
+  int n_add = 0, n_mem = 0, n_ldi = 0;
+  basic_block bb;
+
+  FOR_EACH_BB_FN (bb, func)
+    {
+      Ldi_Insn prev_ldi_insns[32];
+      Add_Insn prev_add_insns[32];
+      Mem_Insn prev_mem_insns[32];
+      rtx_insn *insn, *curr;
+
+      avr_dump ("\n;; basic block %d\n\n", bb->index);
+
+      FOR_BB_INSNS_SAFE (bb, insn, curr)
+        {
+          rtx_insn *next = nullptr;
+          Ldi_Insn ldi_insn { insn };
+          Add_Insn add_insn { insn };
+          Mem_Insn mem_insn { insn };
+
+          if (add_insn.valid ())
+            {
+              // Found reg:HI += const_int
+              avr_dump (";; insn %d: Add[%d]: R%d += %d\n\n",
+                        INSN_UID (add_insn.insn), add_insn.regno,
+                        add_insn.regno, (int) INTVAL (add_insn.addend));
+              Ldi_Insn &prev_ldi_insn = prev_ldi_insns[add_insn.regno];
+              Add_Insn &prev_add_insn = prev_add_insns[add_insn.regno];
+              Mem_Insn &prev_mem_insn = prev_mem_insns[add_insn.regno];
+              if ((next = fuse_ldi_add (prev_ldi_insn, add_insn)))
+                curr = next, n_ldi += 1;
+              else if ((next = fuse_add_add (prev_add_insn, add_insn)))
+                curr = next, n_add += 1;
+              else if ((next = fuse_mem_add (prev_mem_insn, add_insn)))
+                curr = next, n_mem += 1;
+              else
+                prev_add_insn = add_insn;
+            }
+          else if (mem_insn.valid ())
+            {
+              int addr_regno = REGNO (mem_insn.addr_reg);
+              avr_dump (";; insn %d: Mem[%d]: %r = %r\n\n",
+                        INSN_UID (mem_insn.insn), addr_regno,
+                        mem_insn.dest, mem_insn.src);
+              Add_Insn &prev_add_insn = prev_add_insns[addr_regno];
+              if ((next = fuse_add_mem (prev_add_insn, mem_insn)))
+                curr = next, n_mem += 1;
+              else
+                prev_mem_insns[addr_regno] = mem_insn;
+            }
+          else if (ldi_insn.valid ())
+            {
+              if (! CONST_INT_P (ldi_insn.src))
+                avr_dump (";; insn %d: Ldi[%d]: R%d = %r\n\n",
+                          INSN_UID (ldi_insn.insn), ldi_insn.regno,
+                          ldi_insn.regno, ldi_insn.src);
+              prev_ldi_insns[ldi_insn.regno] = ldi_insn;
+            }
+        } // for insns
+    } // for BBs
+
+  avr_dump (";; Function %f: Found %d changes: %d ldi, %d add, %d mem.\n",
+            n_ldi + n_add + n_mem, n_ldi, n_add, n_mem);
+
+  return 0;
+}
+
+
 namespace {
 static const pass_data avr_pass_data_pre_proep =
 {
@@ -2776,7 +3356,10 @@ avr_legitimate_address_p (machine_mode mode, rtx x, bool strict)
           && CONST_INT_P (op1)
           && INTVAL (op1) >= 0)
         {
-          bool fit = IN_RANGE (INTVAL (op1), 0, MAX_LD_OFFSET (mode));
+          bool fit = (IN_RANGE (INTVAL (op1), 0, MAX_LD_OFFSET (mode))
+                      // Reduced Tiny does not support PLUS addressing
+                      // anyway, so we are not restricted to LD offset.
+                      || AVR_TINY);

           if (fit)
             {
@@ -6014,6 +6597,175 @@ out_movhi_mr_r (rtx_insn *insn, rtx op[], int *plen)
   return "";
 }

+
+/* During reload, we allow many more addresses than Reduced Tiny actually
+   supports.  Split them after reload in order to get closer to the
+   core's capabilities.  This sets the stage for pass .avr-fuse-add.  */
+
+bool
+avr_split_tiny_move (rtx_insn * /*insn*/, rtx *xop)
+{
+  bool store_p = false;
+  rtx mem, reg_or_0;
+
+  if (REG_P (xop[0]) && MEM_P (xop[1]))
+    {
+      reg_or_0 = xop[0];
+      mem = xop[1];
+    }
+  else if (MEM_P (xop[0])
+           && (REG_P (xop[1])
+               || xop[1] == CONST0_RTX (GET_MODE (xop[0]))))
+    {
+      mem = xop[0];
+      reg_or_0 = xop[1];
+      store_p = true;
+    }
+  else
+    return false;
+
+  machine_mode mode = GET_MODE (mem);
+  rtx base, addr = XEXP (mem, 0);
+  enum rtx_code addr_code = GET_CODE (addr);
+
+  if (REG_P (reg_or_0)
+      && reg_overlap_mentioned_p (reg_or_0, addr))
+    return false;
+  else if (addr_code == PLUS || addr_code == PRE_DEC || addr_code == POST_INC)
+    base = XEXP (addr, 0);
+  else if (addr_code == REG)
+    base = addr;
+  else
+    return false;
+
+  if (REGNO (base) > REG_Z)
+    return false;
+
+  bool volatile_p = MEM_VOLATILE_P (mem);
+  bool mem_volatile_p = false;
+  if (frame_pointer_needed
+      && REGNO (base) == FRAME_POINTER_REGNUM)
+    {
+      if (avr_fuse_add < 2
+          // Be a projection (we always split PLUS).
+          || (avr_fuse_add == 2 && volatile_p && addr_code != PLUS))
+        return false;
+
+      // Changing the frame pointer locally may confuse later passes
+      // like .dse2 which don't track changes of FP, not even when
+      // respective CFA notes are present.  An example is pr22141-1.c.
+      if (avr_fuse_add == 2)
+	mem_volatile_p = true;
+    }
+
+  enum rtx_code new_code = UNKNOWN;
+  HOST_WIDE_INT add = 0, sub = 0;
+  int msize = GET_MODE_SIZE (mode);
+
+  AVR_LdSt_Props ap { REGNO (base), store_p, volatile_p, ADDR_SPACE_GENERIC };
+
+  switch (addr_code)
+    {
+    default:
+      return false;
+
+    case PLUS:
+      add = INTVAL (XEXP (addr, 1));
+      if (msize == 1)
+	{
+	  new_code = REG;
+	  sub = -add;
+	}
+      else if (ap.want_predec)
+	{
+	  // volatile stores prefer PRE_DEC (MSB first)
+	  sub = -add;
+	  add += msize;
+	  new_code = PRE_DEC;
+	}
+      else
+	{
+	  new_code = POST_INC;
+	  sub = -add - msize;
+	}
+      break;
+
+    case POST_INC:
+      // volatile stores prefer PRE_DEC (MSB first)
+      if (msize > 1 && ap.want_predec)
+	{
+	  add = msize;
+	  new_code = PRE_DEC;
+	  sub = msize;
+	  break;
+	}
+      return false;
+
+    case PRE_DEC:
+      // volatile loads prefer POST_INC (LSB first)
+      if (msize > 1 && ap.want_postinc)
+	{
+	  add = -msize;
+	  new_code = POST_INC;
+	  sub = -msize;
+	  break;
+	}
+      return false;
+
+    case REG:
+      if (msize == 1)
+	return false;
+
+      if (ap.want_predec)
+	{
+	  add = msize;
+	  new_code = PRE_DEC;
+	  sub = 0;
+	}
+      else
+	{
+	  add = 0;
+	  new_code = POST_INC;
+	  sub = -msize;
+	}
+      break;
+    } // switch addr_code
+
+  rtx_insn *insn;
+
+  if (add)
+    {
+      insn = emit_move_ccc (base, plus_constant (Pmode, base, add));
+      avr_maybe_adjust_cfa (insn, base, add);
+    }
+
+  rtx new_addr = new_code == REG
+    ? base
+    : gen_rtx_fmt_e (new_code, Pmode, base);
+
+  rtx new_mem = change_address (mem, mode, new_addr);
+  if (mem_volatile_p)
+    MEM_VOLATILE_P (new_mem) = 1;
+
+  insn = emit_move_ccc (store_p ? new_mem : reg_or_0,
+			store_p ? reg_or_0 : new_mem);
+  if (auto_inc_p (new_addr))
+    {
+      add_reg_note (insn, REG_INC, base);
+      int off = new_code == POST_INC ? msize : -msize;
+      avr_maybe_adjust_cfa (insn, base, off);
+    }
+
+  if (sub)
+    {
+      insn = emit_move_ccc (base, plus_constant (Pmode, base, sub));
+      avr_maybe_adjust_cfa (insn, base, sub);
+    }
+
+  return true;
+}
+
+
 /* Return 1 if frame pointer for current function required.  */
 
 static bool
@@ -8222,6 +8974,28 @@ avr_out_plus_1 (rtx *xop, int *plen, enum rtx_code code, int *pcc,
 	    }
 	}
 
+      if (AVR_TINY
+	  && optimize
+	  && i == 0
+	  && n_bytes == 2
+	  // When that pass adjusts the frame pointer, then we know that
+	  // reg Y points to ordinary memory, and the only side-effect
+	  // of -Y and Y+ is the side effect on Y.
+	  && avr_fuse_add >= 2
+	  && frame_pointer_needed
+	  && REGNO (xop[0]) == FRAME_POINTER_REGNUM)
+	{
+	  rtx xval16 = simplify_gen_subreg (HImode, xval, imode, i);
+	  if (xval16 == const1_rtx || xval16 == constm1_rtx)
+	    {
+	      avr_asm_len ((code == PLUS) == (xval16 == const1_rtx)
+			   ? "ld __tmp_reg__,%a0+"
+			   : "ld __tmp_reg__,-%a0", xop, plen, 1);
+	      i++;
+	      continue;
+	    }
+	}
+
       if (val8 == 0)
 	{
 	  if (started)
@@ -12812,6 +13586,11 @@ avr_mode_code_base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
       return POINTER_Z_REGS;
     }
 
+  if (AVR_TINY)
+    // We allow all offsets for all pointer regs.  Pass .avr-fuse-add
+    // will rectify it (register allocation cannot do it).
+    return POINTER_REGS;
+
   if (!avr_strict_X)
     return reload_completed ? BASE_POINTER_REGS : POINTER_REGS;
@@ -12873,6 +13652,12 @@ avr_regno_mode_code_ok_for_base_p (int regno,
     }
 
   if (avr_strict_X
+      // On Reduced Tiny, all registers are equal in that they do not
+      // support PLUS addressing; respective addresses will be fake,
+      // even for the frame pointer.  They must be handled in the
+      // printers by add-store-sub sequences -- or may be split after
+      // reload by `avr_split_tiny_move'.
+      && ! AVR_TINY
       && PLUS == outer_code
       && regno == REG_X)
     {
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 49e586723ab..8f6bc288515 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -956,6 +956,30 @@ (define_split ; "split-lpmx"
     operands[4] = gen_int_mode (-GET_MODE_SIZE (<MODE>mode), HImode);
   })
 
+
+;; Legitimate address and stuff allows way more addressing modes than
+;; Reduced Tiny actually supports.  Split them now so that we get
+;; closer to real instructions which may result in some optimization
+;; opportunities.
+(define_split
+  [(parallel [(set (match_operand:MOVMODE 0 "nonimmediate_operand")
+                   (match_operand:MOVMODE 1 "general_operand"))
+              (clobber (reg:CC REG_CC))])]
+  "AVR_TINY
+   && reload_completed
+   && avr_fuse_add > 0
+   // Only split this for .split2 when we are before
+   // pass .avr-fuse-add (which runs after proep).
+   && ! epilogue_completed
+   && (MEM_P (operands[0]) || MEM_P (operands[1]))"
+  [(scratch)]
+  {
+    if (avr_split_tiny_move (curr_insn, operands))
+      DONE;
+    FAIL;
+  })
+
+
 ;;==========================================================================
 ;; xpointer move (24 bit)
 
@@ -6704,6 +6728,11 @@ (define_expand "gen_compare"
                    (match_operand:HISI 1 "const_int_operand")))
        (clobber (match_operand:QI 2 "scratch_operand"))])])
 
+(define_expand "gen_move_clobbercc"
+  [(parallel [(set (match_operand 0)
+                   (match_operand 1))
+              (clobber (reg:CC REG_CC))])])
+
 ;; ----------------------------------------------------------------------
 ;; JUMP INSTRUCTIONS
 ;; ----------------------------------------------------------------------
diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
index c9f2b4d2fe5..01481c4c840 100644
--- a/gcc/config/avr/avr.opt
+++ b/gcc/config/avr/avr.opt
@@ -107,6 +107,14 @@ msp8
 Target RejectNegative Var(avr_sp8) Init(0)
 The device has no SPH special function register.  This option will be overridden by the compiler driver with the correct setting if presence/absence of SPH can be deduced from -mmcu=MCU.
+mfuse-add
+Target Alias(mfuse-add=, 1, 0) Optimization
+Split register additions from load/store instructions.  Most useful on Reduced Tiny.
+
+mfuse-add=
+Target Joined RejectNegative UInteger Var(avr_fuse_add) Init(0) Optimization IntegerRange(0, 2)
+Split register additions from load/store instructions.  Most useful on Reduced Tiny.
+
 Waddr-space-convert
 Warning C Var(avr_warn_addr_space_convert) Init(0)
 Warn if the address space of an address is changed.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e18886e0ac7..016a3fa9c85 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -884,7 +884,7 @@ Objective-C and Objective-C++ Dialects}.
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args
--mbranch-cost=@var{cost}
+-mbranch-cost=@var{cost}  -mfuse-add=@var{level}
 -mcall-prologues  -mgas-isr-prologues  -mint8  -mflmap
 -mdouble=@var{bits}  -mlong-double=@var{bits}
 -mn_flash=@var{size}  -mno-interrupts
@@ -23785,6 +23785,14 @@ integers.  The default branch cost is 0.
 Functions prologues/epilogues are expanded as calls to appropriate
 subroutines.  Code size is smaller.
 
+@opindex mfuse-add
+@item -mfuse-add
+@itemx -mno-fuse-add
+@itemx -mfuse-add=@var{level}
+Optimize indirect memory accesses on reduced Tiny devices.
+The default uses @code{@var{level}=1} for optimizations @option{-Og}
+and @option{-O1}, and @code{@var{level}=2} for higher optimizations.
+
 @opindex mdouble
 @opindex mlong-double
 @item -mdouble=@var{bits}