From patchwork Fri Feb 24 05:51:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Christoph_M=C3=BCllner?=
X-Patchwork-Id: 65570
From: Christoph Muellner
To: gcc-patches@gcc.gnu.org, Kito Cheng, Jim Wilson, Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Jeff Law, Cooper Qu, Lifang Xia, Yunhai Shang, Zhiwei Liu
Cc: =?utf-8?q?Christoph_M=C3=BCllner?=
Subject: [PATCH v3 09/11] riscv: thead: Add support for the XTheadMemPair ISA extension
Date: Fri, 24 Feb 2023 06:51:25 +0100
Message-Id: <20230224055127.2500953-10-christoph.muellner@vrull.eu>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230224055127.2500953-1-christoph.muellner@vrull.eu>
References: <20230224055127.2500953-1-christoph.muellner@vrull.eu>
List-Id: Gcc-patches mailing list

From: Christoph Müllner

The XTheadMemPair ISA extension allows pairing two loads or stores:
* th.ldd (2x LD)
* th.lwd (2x LW)
* th.lwud (2x LWU)
* th.sdd (2x SD)
* th.swd (2x SW)

The displacement of these instructions is quite limited:
* Displacement := imm2 << shamt
* imm2 is a 2-bit unsigned value {0..3}
* shamt is 4 for th.ldd/th.sdd and 3 otherwise
But even with this small displacement we can identify many candidates.

The merge of the two loads/stores is realized in the form of peephole2
passes that support instruction reordering.
The CFA expansion (save/restore registers on/from stack) is not
processed by the peephole2 pass and, therefore, needs special treatment.
Many ideas of this patch are inspired by similar/equal approaches
in other backends.

gcc/ChangeLog:

        * config/riscv/peephole.md: New peephole passes for
        XTheadMemPair.
        * config/riscv/riscv-protos.h (thead_mempair_operands_p):
        New prototype.
        (thead_mempair_order_operands): Likewise.
        (thead_mempair_output_move): Likewise.
        * config/riscv/riscv.cc (extract_base_offset_in_addr):
        New function.
        (riscv_split_plus): New function.
        (thead_mempair_output_move): New function.
        (thead_mempair_check_consecutive_mems): New function.
        (thead_mempair_operand_p): New function.
        (thead_mempair_load_overlap_p): New function.
        (thead_mempair_operands_p): New function.
        (thead_mempair_order_operands): New function.
        (riscv_save_reg): Moved code.
        (th_mempair_save_reg): New function.
        (riscv_restore_reg): Moved code.
        (th_mempair_restore_reg): New function.
        (riscv_for_each_saved_reg): Add code to handle mempair
        instructions.
        * config/riscv/thead.md (*th_mempair_load_2): New pattern.
        (*th_mempair_store_2): Likewise.
        (*th_mempair_load_extendsidi2): Likewise.
        (*th_mempair_load_zero_extendsidi2): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/xtheadmempair-1.c: New test.
        * gcc.target/riscv/xtheadmempair-2.c: New test.
        * gcc.target/riscv/xtheadmempair-3.c: New test.

Changes in v3:
- Don't emit instructions during peephole2, but emit parallel INSNs
- Add proper checks for the INSN patterns to avoid ICEs or illegal
  instructions reported by the assembler
- Don't insert any `add` instructions
- Rework the constraint handling
- Simplify the output function
- Restructure and simplify CFA processing
- Add debug notes to CFA instructions
- Emit parallel INSNs in the CFA code (same as peephole2)
- Drop tests that target reordering
- Drop tests that are irrelevant (e.g. unrolled loops)
- Add tests for all possible displacements and all instructions
- Add tests for CFA

Signed-off-by: Christoph Müllner
---
 gcc/config/riscv/peephole.md                  |  56 +++
 gcc/config/riscv/riscv-protos.h               |   6 +
 gcc/config/riscv/riscv.cc                     | 450 ++++++++++++++++--
 gcc/config/riscv/thead.md                     |  52 ++
 .../gcc.target/riscv/xtheadmempair-1.c        |  98 ++++
 .../gcc.target/riscv/xtheadmempair-2.c        |  84 ++++
 .../gcc.target/riscv/xtheadmempair-3.c        |  29 ++
 7 files changed, 746 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c

diff --git a/gcc/config/riscv/peephole.md b/gcc/config/riscv/peephole.md
index 0ef0c04410b..ea696f3116c 100644
--- a/gcc/config/riscv/peephole.md
+++ b/gcc/config/riscv/peephole.md
@@ -38,3 +38,59 @@ (define_peephole2
 {
   operands[5] = GEN_INT (INTVAL (operands[2]) - INTVAL (operands[5]));
 })
+
+;; XTheadMemPair: merge two SI or DI loads
+(define_peephole2
+  [(set (match_operand:GPR 0 "register_operand" "")
+        (match_operand:GPR 1 "memory_operand" ""))
+   (set (match_operand:GPR 2 "register_operand" "")
+        (match_operand:GPR 3 "memory_operand" ""))]
+  "TARGET_XTHEADMEMPAIR
+  && thead_mempair_operands_p (operands, true, mode)"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+              (set (match_dup 2) (match_dup 3))])]
+{
+  thead_mempair_order_operands (operands, true, mode);
+})
+
+;; XTheadMemPair: merge two SI or DI stores
+(define_peephole2
+  [(set (match_operand:GPR 0 "memory_operand" "")
+        (match_operand:GPR 1 "register_operand" ""))
+   (set (match_operand:GPR 2 "memory_operand" "")
+        (match_operand:GPR 3 "register_operand" ""))]
+  "TARGET_XTHEADMEMPAIR
+  && thead_mempair_operands_p (operands, false, mode)"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+              (set (match_dup 2) (match_dup 3))])]
+{
+  thead_mempair_order_operands (operands, false, mode);
+})
+
+;; XTheadMemPair: merge two SI loads with sign-extension
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand" "")
+        (sign_extend:DI (match_operand:SI 1 "memory_operand" "")))
+   (set (match_operand:DI 2 "register_operand" "")
+        (sign_extend:DI (match_operand:SI 3 "memory_operand" "")))]
+  "TARGET_XTHEADMEMPAIR && TARGET_64BIT
+  && thead_mempair_operands_p (operands, true, SImode)"
+  [(parallel [(set (match_dup 0) (sign_extend:DI (match_dup 1)))
+              (set (match_dup 2) (sign_extend:DI (match_dup 3)))])]
+{
+  thead_mempair_order_operands (operands, true, SImode);
+})
+
+;; XTheadMemPair: merge two SI loads with zero-extension
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand" "")
+        (zero_extend:DI (match_operand:SI 1 "memory_operand" "")))
+   (set (match_operand:DI 2 "register_operand" "")
+        (zero_extend:DI (match_operand:SI 3 "memory_operand" "")))]
+  "TARGET_XTHEADMEMPAIR && TARGET_64BIT
+  && thead_mempair_operands_p (operands, true, SImode)"
+  [(parallel [(set (match_dup 0) (zero_extend:DI (match_dup 1)))
+              (set (match_dup 2) (zero_extend:DI (match_dup 3)))])]
+{
+  thead_mempair_order_operands (operands, true, SImode);
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5cf4fafd662..1b7ba02726d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -54,7 +54,13 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
 extern void riscv_split_doubleword_move (rtx, rtx);
 extern const char *riscv_output_move (rtx, rtx);
 extern const char *riscv_output_return ();
+
+extern bool thead_mempair_operands_p (rtx[4], bool, machine_mode);
+extern void thead_mempair_order_operands (rtx[4], bool, machine_mode);
+
 #ifdef RTX_CODE
+extern const char *thead_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
+
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 48f2cb399ae..33854393bd2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2943,6 +2943,332 @@ riscv_output_move (rtx dest, rtx src)
   gcc_unreachable ();
 }
 
+/* If MEM is in the form of "base+offset", extract the two parts
+   of address and set to BASE and OFFSET, otherwise return false
+   after clearing BASE and OFFSET.  */
+
+static bool
+extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset)
+{
+  rtx addr;
+
+  gcc_assert (MEM_P (mem));
+
+  addr = XEXP (mem, 0);
+
+  if (REG_P (addr))
+    {
+      *base = addr;
+      *offset = const0_rtx;
+      return true;
+    }
+
+  if (GET_CODE (addr) == PLUS
+      && REG_P (XEXP (addr, 0)) && CONST_INT_P (XEXP (addr, 1)))
+    {
+      *base = XEXP (addr, 0);
+      *offset = XEXP (addr, 1);
+      return true;
+    }
+
+  *base = NULL_RTX;
+  *offset = NULL_RTX;
+
+  return false;
+}
+
+/* If X is a PLUS of a CONST_INT, return the two terms in *BASE_PTR
+   and *OFFSET_PTR.  Return X in *BASE_PTR and 0 in *OFFSET_PTR otherwise.  */
+
+static void
+riscv_split_plus (rtx x, rtx *base_ptr, HOST_WIDE_INT *offset_ptr)
+{
+  if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
+    {
+      *base_ptr = XEXP (x, 0);
+      *offset_ptr = INTVAL (XEXP (x, 1));
+    }
+  else
+    {
+      *base_ptr = x;
+      *offset_ptr = 0;
+    }
+}
+
+/* Output a mempair instruction with the provided OPERANDS.
+   LOAD_P is true if we have a pair of loads (stores otherwise).
+   MODE is the access mode (DI or SI).
+   CODE is the extension code (UNKNOWN, SIGN_EXTEND or ZERO_EXTEND).
+   This function does not handle invalid inputs gracefully,
+   but is full of assertions to ensure that only valid instructions
+   are emitted.  */
+
+const char *
+thead_mempair_output_move (rtx operands[4], bool load_p,
+                           machine_mode mode, RTX_CODE code)
+{
+  rtx reg1, reg2, mem1, mem2, base1, base2;
+  HOST_WIDE_INT offset1, offset2;
+  rtx output_operands[5];
+  const char* format;
+
+  gcc_assert (mode == SImode || mode == DImode);
+
+  /* Paired 64-bit access instructions have a fixed shift amount of 4.
+     Paired 32-bit access instructions have a fixed shift amount of 3.  */
+  unsigned shamt = (mode == DImode) ? 4 : 3;
+
+  if (load_p)
+    {
+      reg1 = copy_rtx (operands[0]);
+      reg2 = copy_rtx (operands[2]);
+      mem1 = copy_rtx (operands[1]);
+      mem2 = copy_rtx (operands[3]);
+
+      if (mode == SImode)
+        if (code == ZERO_EXTEND)
+          format = "th.lwud\t%0, %1, (%2), %3, %4";
+        else //SIGN_EXTEND or UNKNOWN
+          format = "th.lwd\t%0, %1, (%2), %3, %4";
+      else
+        format = "th.ldd\t%0, %1, (%2), %3, %4";
+    }
+  else
+    {
+      reg1 = copy_rtx (operands[1]);
+      reg2 = copy_rtx (operands[3]);
+      mem1 = copy_rtx (operands[0]);
+      mem2 = copy_rtx (operands[2]);
+
+      if (mode == SImode)
+        format = "th.swd\t%z0, %z1, (%2), %3, %4";
+      else
+        format = "th.sdd\t%z0, %z1, (%2), %3, %4";
+    }
+
+  riscv_split_plus (XEXP (mem1, 0), &base1, &offset1);
+  riscv_split_plus (XEXP (mem2, 0), &base2, &offset2);
+  gcc_assert (rtx_equal_p (base1, base2));
+  auto size1 = MEM_SIZE (mem1);
+  auto size2 = MEM_SIZE (mem2);
+  gcc_assert (known_eq (size1, size2));
+  gcc_assert (known_eq (offset1 + size1, offset2));
+
+  HOST_WIDE_INT imm2 = offset1 >> shamt;
+
+  /* Make sure all mempair instruction constraints are met.  */
+  gcc_assert (imm2 >= 0 && imm2 < 4);
+  gcc_assert ((imm2 << shamt) == offset1);
+  gcc_assert (REG_P (reg1));
+  gcc_assert (REG_P (reg2));
+  gcc_assert (REG_P (base1));
+  if (load_p)
+    {
+      gcc_assert (REGNO (reg1) != REGNO (reg2));
+      gcc_assert (REGNO (reg1) != REGNO (base1));
+      gcc_assert (REGNO (reg2) != REGNO (base1));
+    }
+
+  /* Output the mempair instruction.  */
+  output_operands[0] = copy_rtx (reg1);
+  output_operands[1] = copy_rtx (reg2);
+  output_operands[2] = copy_rtx (base1);
+  output_operands[3] = gen_rtx_CONST_INT (mode, imm2);
+  output_operands[4] = gen_rtx_CONST_INT (mode, shamt);
+  output_asm_insn (format, output_operands);
+
+  return "";
+}
+
+/* Analyse if a pair of loads/stores MEM1 and MEM2 with given MODE
+   are consecutive so they can be merged into a mempair instruction.
+   REVERSED will be set to true, if a reversal of the accesses is
+   required (false otherwise).  Returns true if the accesses can be
+   merged (even if reversing is necessary) and false if not.  */
+
+static bool
+thead_mempair_check_consecutive_mems (machine_mode mode, rtx *mem1, rtx *mem2,
+                                      bool *reversed)
+{
+  rtx base1, base2, offset1, offset2;
+  extract_base_offset_in_addr (*mem1, &base1, &offset1);
+  extract_base_offset_in_addr (*mem2, &base2, &offset2);
+
+  /* Make sure both mems are in base+offset form.  */
+  if (!base1 || !base2)
+    return false;
+
+  /* If both mems use the same base register, just check the offsets.  */
+  if (rtx_equal_p (base1, base2))
+    {
+      auto size = GET_MODE_SIZE (mode);
+
+      if (known_eq (UINTVAL (offset1) + size, UINTVAL (offset2)))
+        {
+          *reversed = false;
+          return true;
+        }
+
+      if (known_eq (UINTVAL (offset2) + size, UINTVAL (offset1)))
+        {
+          *reversed = true;
+          return true;
+        }
+
+      return false;
+    }
+
+  return false;
+}
+
+/* Check if the given MEM can be used to define the address of a mempair
+   instruction.  */
+
+static bool
+thead_mempair_operand_p (rtx mem, machine_mode mode)
+{
+  if (!MEM_SIZE_KNOWN_P (mem))
+    return false;
+
+  /* Only DI or SI mempair instructions exist.  */
+  auto mem_sz = MEM_SIZE (mem);
+  auto mode_sz = GET_MODE_SIZE (mode);
+  if (!known_eq (mem_sz, mode_sz))
+    return false;
+
+  /* Paired 64-bit access instructions have a fixed shift amount of 4.
+     Paired 32-bit access instructions have a fixed shift amount of 3.  */
+  machine_mode mem_mode = GET_MODE (mem);
+  unsigned shamt = (mem_mode == DImode) ? 4 : 3;
+
+  rtx base;
+  HOST_WIDE_INT offset;
+  riscv_split_plus (XEXP (mem, 0), &base, &offset);
+  HOST_WIDE_INT imm2 = offset >> shamt;
+
+  if (imm2 < 0 || imm2 >= 4)
+    return false;
+
+  if ((imm2 << shamt) != offset)
+    return false;
+
+  return true;
+}
+
+static bool
+thead_mempair_load_overlap_p (rtx reg1, rtx reg2, rtx mem)
+{
+  if (REGNO (reg1) == REGNO (reg2))
+    return true;
+
+  if (reg_overlap_mentioned_p (reg1, mem))
+    return true;
+
+  rtx base;
+  HOST_WIDE_INT offset;
+  riscv_split_plus (XEXP (mem, 0), &base, &offset);
+
+  if (!REG_P (base))
+    return true;
+
+  if (REG_P (base))
+    {
+      if (REGNO (base) == REGNO (reg1)
+          || REGNO (base) == REGNO (reg2))
+        return true;
+    }
+
+  return false;
+}
+
+/* Given OPERANDS of consecutive load/store, check if we can merge
+   them into load-pair or store-pair instructions.
+   LOAD_P is true if they are load instructions.
+   MODE is the mode of memory operation.  */
+
+bool
+thead_mempair_operands_p (rtx operands[4], bool load_p,
+                          machine_mode mode)
+{
+  rtx mem_1, mem_2, reg_1, reg_2;
+
+  if (load_p)
+    {
+      reg_1 = operands[0];
+      mem_1 = operands[1];
+      reg_2 = operands[2];
+      mem_2 = operands[3];
+      if (!REG_P (reg_1) || !REG_P (reg_2))
+        return false;
+      if (thead_mempair_load_overlap_p (reg_1, reg_2, mem_1))
+        return false;
+      if (thead_mempair_load_overlap_p (reg_1, reg_2, mem_2))
+        return false;
+    }
+  else
+    {
+      mem_1 = operands[0];
+      reg_1 = operands[1];
+      mem_2 = operands[2];
+      reg_2 = operands[3];
+    }
+
+  /* Check if the registers are GP registers.  */
+  if (!REG_P (reg_1) || !GP_REG_P (REGNO (reg_1))
+      || !REG_P (reg_2) || !GP_REG_P (REGNO (reg_2)))
+    return false;
+
+  /* The mems cannot be volatile.  */
+  if (!MEM_P (mem_1) || !MEM_P (mem_2))
+    return false;
+  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
+    return false;
+
+  /* If we have slow unaligned access, we only accept aligned memory.  */
+  if (riscv_slow_unaligned_access_p
+      && known_lt (MEM_ALIGN (mem_1), GET_MODE_SIZE (mode) * BITS_PER_UNIT))
+    return false;
+
+  /* Check if the addresses are in the form of [base+offset].  */
+  bool reversed = false;
+  if (!thead_mempair_check_consecutive_mems (mode, &mem_1, &mem_2, &reversed))
+    return false;
+
+  /* The first memory access must be a mempair operand.  */
+  if ((!reversed && !thead_mempair_operand_p (mem_1, mode))
+      || (reversed && !thead_mempair_operand_p (mem_2, mode)))
+    return false;
+
+  /* The operands must be of the same size.  */
+  gcc_assert (known_eq (GET_MODE_SIZE (GET_MODE (mem_1)),
+                        GET_MODE_SIZE (GET_MODE (mem_2))));
+
+  return true;
+}
+
+/* Given OPERANDS of consecutive load/store that can be merged,
+   swap them in place if they are not in ascending order, so that
+   the access with the lower address comes first.  */
+void
+thead_mempair_order_operands (rtx operands[4], bool load_p, machine_mode mode)
+{
+  int mem_op = load_p ? 1 : 0;
+  bool reversed = false;
+  if (!thead_mempair_check_consecutive_mems (mode,
+                                             operands + mem_op,
+                                             operands + mem_op + 2,
+                                             &reversed))
+    gcc_unreachable ();
+
+  if (reversed)
+    {
+      /* Irrespective of whether this is a load or a store,
+         we do the same swap.  */
+      std::swap (operands[0], operands[2]);
+      std::swap (operands[1], operands[3]);
+    }
+}
+
 const char *
 riscv_output_return ()
 {
@@ -4963,6 +5289,61 @@ riscv_set_return_address (rtx address, rtx scratch)
   riscv_emit_move (gen_frame_mem (GET_MODE (address), slot_address), address);
 }
 
+/* Save register REG to MEM.  Make the instruction frame-related.  */
+
+static void
+riscv_save_reg (rtx reg, rtx mem)
+{
+  riscv_emit_move (mem, reg);
+  riscv_set_frame_expr (riscv_frame_set (mem, reg));
+}
+
+/* Save register 2x REG to MEM.  Make the instruction frame-related.  */
+
+static void
+th_mempair_save_reg (rtx operands[4])
+{
+  rtx set1 = gen_rtx_SET (operands[0], operands[1]);
+  rtx set2 = gen_rtx_SET (operands[2], operands[3]);
+  rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set1, set2)));
+  RTX_FRAME_RELATED_P (insn) = 1;
+  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set1));
+  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set2));
+}
+
+/* Restore register REG from MEM.  */
+
+static void
+riscv_restore_reg (rtx reg, rtx mem)
+{
+  rtx insn = riscv_emit_move (reg, mem);
+  rtx dwarf = NULL_RTX;
+  dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+  if (epilogue_cfa_sp_offset && REGNO (reg) == HARD_FRAME_POINTER_REGNUM)
+    {
+      rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+                                         GEN_INT (epilogue_cfa_sp_offset));
+      dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
+    }
+
+  REG_NOTES (insn) = dwarf;
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
+
+/* Restore register 2x REG from MEM.  Make the instruction frame-related.  */
+
+static void
+th_mempair_restore_reg (rtx operands[4])
+{
+  rtx set1 = gen_rtx_SET (operands[0], operands[1]);
+  rtx set2 = gen_rtx_SET (operands[2], operands[3]);
+  rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set1, set2)));
+  RTX_FRAME_RELATED_P (insn) = 1;
+  add_reg_note (insn, REG_CFA_RESTORE, operands[0]);
+  add_reg_note (insn, REG_CFA_RESTORE, operands[2]);
+}
+
 /* A function to save or store a register.  The first argument is the
    register and the second is the stack slot.  */
 typedef void (*riscv_save_restore_fn) (rtx, rtx);
@@ -5057,6 +5438,46 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
           && riscv_is_eh_return_data_register (regno))
         continue;
 
+      if (TARGET_XTHEADMEMPAIR)
+        {
+          /* Get the next reg/offset pair.  */
+          HOST_WIDE_INT offset2 = offset;
+          unsigned int regno2 = riscv_next_saved_reg (regno, limit, &offset2);
+
+          /* Prepare the operands for a mempair instruction.  */
+          bool load_p = (fn == riscv_restore_reg);
+          int reg_op = load_p ? 0 : 1;
+          int mem_op = load_p ? 1 : 0;
+          rtx operands[4];
+          operands[reg_op] = gen_rtx_REG (word_mode, regno);
+          rtx mem1 = plus_constant (word_mode, stack_pointer_rtx, offset);
+          mem1 = gen_frame_mem (word_mode, mem1);
+          operands[mem_op] = mem1;
+          operands[2 + reg_op] = gen_rtx_REG (word_mode, regno2);
+          rtx mem2 = plus_constant (word_mode, stack_pointer_rtx, offset2);
+          mem2 = gen_frame_mem (word_mode, mem2);
+          operands[2 + mem_op] = mem2;
+
+          /* Validate everything before emitting a mempair instruction.  */
+          if (regno2 != INVALID_REGNUM
+              && !cfun->machine->reg_is_wrapped_separately[regno2]
+              && !(epilogue && !maybe_eh_return
+                   && riscv_is_eh_return_data_register (regno2))
+              && thead_mempair_operands_p (operands, load_p, word_mode))
+            {
+              thead_mempair_order_operands (operands, load_p, word_mode);
+
+              if (load_p)
+                th_mempair_restore_reg (operands);
+              else
+                th_mempair_save_reg (operands);
+
+              offset = offset2;
+              regno = regno2;
+              continue;
+            }
+        }
+
       riscv_save_restore_reg (word_mode, regno, offset, fn);
     }
 
@@ -5075,35 +5496,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
       }
 }
 
-/* Save register REG to MEM.  Make the instruction frame-related.  */
-
-static void
-riscv_save_reg (rtx reg, rtx mem)
-{
-  riscv_emit_move (mem, reg);
-  riscv_set_frame_expr (riscv_frame_set (mem, reg));
-}
-
-/* Restore register REG from MEM.  */
-
-static void
-riscv_restore_reg (rtx reg, rtx mem)
-{
-  rtx insn = riscv_emit_move (reg, mem);
-  rtx dwarf = NULL_RTX;
-  dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
-
-  if (epilogue_cfa_sp_offset && REGNO (reg) == HARD_FRAME_POINTER_REGNUM)
-    {
-      rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-                                         GEN_INT (epilogue_cfa_sp_offset));
-      dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
-    }
-
-  REG_NOTES (insn) = dwarf;
-  RTX_FRAME_RELATED_P (insn) = 1;
-}
-
 /* For stack frames that can't be allocated with a single ADDI instruction,
    compute the best value to initially allocate.  It must at a minimum
    allocate enough space to spill the callee-saved registers.  If TARGET_RVC,
diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index 6ec3613071f..cbfe51cbd85 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -297,3 +297,55 @@ (define_insn "*th_sextw_msubhisi4"
   [(set_attr "type" "imul")
    (set_attr "mode" "SI")]
 )
+
+;; XTheadMemPair
+
+;; MEMPAIR load 64/32 bit
+(define_insn "*th_mempair_load_2"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+        (match_operand:GPR 1 "memory_operand" "m"))
+   (set (match_operand:GPR 2 "register_operand" "=r")
+        (match_operand:GPR 3 "memory_operand" "m"))]
+  "TARGET_XTHEADMEMPAIR && reload_completed
+   && thead_mempair_operands_p (operands, true, mode)"
+  { return thead_mempair_output_move (operands, true, mode, UNKNOWN); }
+  [(set_attr "move_type" "load")
+   (set_attr "mode" "")])
+
+;; MEMPAIR store 64/32 bit
+(define_insn "*th_mempair_store_2"
+  [(set (match_operand:GPR 0 "memory_operand" "=m")
+        (match_operand:GPR 1 "register_operand" "r"))
+   (set (match_operand:GPR 2 "memory_operand" "=m")
+        (match_operand:GPR 3 "register_operand" "r"))]
+  "TARGET_XTHEADMEMPAIR && reload_completed
+   && thead_mempair_operands_p (operands, false, mode)"
+  { return thead_mempair_output_move (operands, false, mode, UNKNOWN); }
+  [(set_attr "move_type" "store")
+   (set_attr "mode" "")])
+
+;; MEMPAIR load DI extended signed SI
+(define_insn "*th_mempair_load_extendsidi2"
+  [(set (match_operand 0 "register_operand" "=r")
+        (sign_extend:DI (match_operand 1 "memory_operand" "m")))
+   (set (match_operand 2 "register_operand" "=r")
+        (sign_extend:DI (match_operand 3 "memory_operand" "m")))]
+  "TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
+   && thead_mempair_operands_p (operands, true, SImode)"
+  { return thead_mempair_output_move (operands, true, SImode, SIGN_EXTEND); }
+  [(set_attr "move_type" "load")
+   (set_attr "mode" "DI")
+   (set_attr "length" "8")])
+
+;; MEMPAIR load DI extended unsigned SI
+(define_insn "*th_mempair_load_zero_extendsidi2"
+  [(set (match_operand 0 "register_operand" "=r")
+        (zero_extend:DI (match_operand 1 "memory_operand" "m")))
+   (set (match_operand 2 "register_operand" "=r")
+        (zero_extend:DI (match_operand 3 "memory_operand" "m")))]
+  "TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
+   && thead_mempair_operands_p (operands, true, SImode)"
+  { return thead_mempair_output_move (operands, true, SImode, ZERO_EXTEND); }
+  [(set_attr "move_type" "load")
+   (set_attr "mode" "DI")
+   (set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c
new file mode 100644
index 00000000000..c324555fd35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c
@@ -0,0 +1,98 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906" { target { rv32 } } } */
+
+#include
+
+#if __riscv_xlen == 32
+typedef uint32_t xlen_t;
+#else
+typedef uint64_t xlen_t;
+#endif
+
+void foof (xlen_t*, xlen_t, xlen_t);
+void foor (xlen_t*, xlen_t, xlen_t);
+void foowu (uint32_t*, uint64_t, uint64_t);
+void foows (int32_t*, int64_t, int64_t);
+
+#define LxD_TEST(f, T, i1, i2) \
+void \
+f ## i1 ## i2(T *arr) \
+{ \
+  foo ## f(arr, arr[i1], arr[i2]); \
+}
+
+// works
+LxD_TEST(f, xlen_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(f, xlen_t, 1, 2)
+// works
+LxD_TEST(f, xlen_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(f, xlen_t, 3, 4)
+// works
+LxD_TEST(f, xlen_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(f, xlen_t, 5, 6)
+// works
+LxD_TEST(f, xlen_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(f, xlen_t, 7, 8)
+// does not work (out of range)
+LxD_TEST(f, xlen_t, 8, 9)
+
+// works with reordering
+LxD_TEST(r, xlen_t, 1, 0)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(r, xlen_t, 2, 1)
+// works with reordering
+LxD_TEST(r, xlen_t, 3, 2)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(r, xlen_t, 4, 3)
+// works with reordering
+LxD_TEST(r, xlen_t, 5, 4)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(r, xlen_t, 6, 5)
+// works with reordering
+LxD_TEST(r, xlen_t, 7, 6)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(r, xlen_t, 8, 7)
+// does not work (out of range)
+LxD_TEST(r, xlen_t, 9, 8)
+
+#if __riscv_xlen != 32
+// works
+LxD_TEST(wu, uint32_t, 0, 1)
+LxD_TEST(ws, int32_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(wu, uint32_t, 1, 2)
+LxD_TEST(ws, int32_t, 1, 2)
+// works
+LxD_TEST(wu, uint32_t, 2, 3)
+LxD_TEST(ws, int32_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(wu, uint32_t, 3, 4)
+LxD_TEST(ws, int32_t, 3, 4)
+// works
+LxD_TEST(wu, uint32_t, 4, 5)
+LxD_TEST(ws, int32_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(wu, uint32_t, 5, 6)
+LxD_TEST(ws, int32_t, 5, 6)
+// works
+LxD_TEST(wu, uint32_t, 6, 7)
+LxD_TEST(ws, int32_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(wu, uint32_t, 7, 8)
+LxD_TEST(ws, int32_t, 7, 8)
+// does not work (out of range)
+LxD_TEST(wu, uint32_t, 8, 9)
+LxD_TEST(ws, int32_t, 8, 9)
+#endif
+
+/* { dg-final { scan-assembler-times "th.ldd\t" 8 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.lwud\t" 4 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.lwd\t" 4 { target { rv64 } } } } */
+
+/* { dg-final { scan-assembler-times "th.lwd\t" 8 { target { rv32 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
new file mode 100644
index 00000000000..ed639709196
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
@@ -0,0 +1,84 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906" { target { rv32 } } } */
+
+#include
+
+#if __riscv_xlen == 32
+typedef uint32_t xlen_t;
+#else
+typedef uint64_t xlen_t;
+#endif
+
+#define SxD_TEST(f, T, i1, i2) \
+void \
+f ## i1 ## i2(T *arr, T x, T y) \
+{ \
+  arr[i1] = x; \
+  arr[i2] = y; \
+}
+
+// works
+SxD_TEST(f, xlen_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 1, 2)
+// works
+SxD_TEST(f, xlen_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 3, 4)
+// works
+SxD_TEST(f, xlen_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 5, 6)
+// works
+SxD_TEST(f, xlen_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 7, 8)
+// does not work (out of range)
+SxD_TEST(f, xlen_t, 8, 9)
+
+// works with reordering
+SxD_TEST(r, xlen_t, 1, 0)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 2, 1)
+// works with reordering
+SxD_TEST(r, xlen_t, 3, 2)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 4, 3)
+// works with reordering
+SxD_TEST(r, xlen_t, 5, 4)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 6, 5)
+// works with reordering
+SxD_TEST(r, xlen_t, 7, 6)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 8, 7)
+// does not work (out of range)
+SxD_TEST(r, xlen_t, 9, 8)
+
+#if __riscv_xlen != 32
+// works
+SxD_TEST(w, uint32_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 1, 2)
+// works
+SxD_TEST(w, uint32_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 3, 4)
+// works
+SxD_TEST(w, uint32_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 5, 6)
+// works
+SxD_TEST(w, uint32_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 7, 8)
+// does not work (out of range)
+SxD_TEST(w, uint32_t, 8, 9)
+#endif
+
+/* { dg-final { scan-assembler-times "th.sdd\t" 8 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.swd\t" 4 { target { rv64 } } } } */
+
+/* { dg-final { scan-assembler-times "th.swd\t" 8 { target { rv32 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
new file mode 100644
index 00000000000..5dec702819a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906" { target { rv32 } } } */
+
+#include
+
+#if __riscv_xlen == 32
+typedef uint32_t xlen_t;
+#else
+typedef uint64_t xlen_t;
+#endif
+
+void foo (xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t);
+void bar (xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t);
+
+void baz (xlen_t a, xlen_t b, xlen_t c, xlen_t d, xlen_t e, xlen_t f, xlen_t g, xlen_t h)
+{
+  foo (a, b, c, d, e, f, g, h);
+  /* RV64: We don't use 0(sp), therefore we can only get 3 mempairs.  */
+  /* RV32: We don't use 0(sp)-8(sp), therefore we can only get 2 mempairs.  */
+  bar (a, b, c, d, e, f, g, h);
+}
+
+/* { dg-final { scan-assembler-times "th.ldd\t" 3 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.sdd\t" 3 { target { rv64 } } } } */
+
+/* { dg-final { scan-assembler-times "th.lwd\t" 2 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "th.swd\t" 2 { target { rv32 } } } } */
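
The displacement rule from the commit message (Displacement := imm2 << shamt, with imm2 in {0..3}) can be illustrated with a small standalone C sketch. It is not part of the patch and does not use any GCC internals: the helper names mempair_shamt and mempair_encode_imm2 are hypothetical, and the sketch only assumes that it mirrors the range and alignment checks that thead_mempair_operand_p applies to the lower address of a pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): shift amount of the paired
   instructions, 4 for 64-bit accesses and 3 for 32-bit accesses.  */
static unsigned
mempair_shamt (unsigned access_bytes)
{
  assert (access_bytes == 4 || access_bytes == 8);
  return access_bytes == 8 ? 4 : 3;
}

/* Hypothetical helper: return true and store the 2-bit immediate in *IMM2
   if OFFSET (byte offset of the lower access) equals imm2 << shamt for
   some imm2 in {0..3}.  *IMM2 is left untouched on failure.  */
static bool
mempair_encode_imm2 (int64_t offset, unsigned access_bytes, unsigned *imm2)
{
  unsigned shamt = mempair_shamt (access_bytes);

  /* Negative displacements cannot be encoded (imm2 is unsigned).  */
  if (offset < 0)
    return false;

  int64_t candidate = offset >> shamt;

  /* Reject out-of-range values and offsets that are not a multiple
     of 1 << shamt.  */
  if (candidate >= 4 || (candidate << shamt) != offset)
    return false;

  *imm2 = (unsigned) candidate;
  return true;
}
```

Under these assumptions, 8-byte accesses can only be paired at offsets 0, 16, 32 and 48, and 4-byte accesses at offsets 0, 8, 16 and 24. That matches the tests above, which expect merges only for the index pairs (0,1), (2,3), (4,5) and (6,7).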