From patchwork Thu Mar 2 08:35:34 2023
X-Patchwork-Submitter: Christoph Müllner
X-Patchwork-Id: 65895
From: Christoph Muellner
To: gcc-patches@gcc.gnu.org, Kito Cheng, Jim Wilson, Palmer Dabbelt,
 Andrew Waterman, Philipp Tomsich, Cooper Qu, Lifang Xia, Yunhai Shang,
 Zhiwei Liu, Andrew Pinski, Hans-Peter Nilsson
Cc: Christoph Müllner
Subject: [PATCH v4 9/9] riscv: thead: Add support for the XTheadMemPair ISA extension
Date: Thu, 2 Mar 2023 09:35:34 +0100
Message-Id: <20230302083534.4076244-10-christoph.muellner@vrull.eu>
In-Reply-To: <20230302083534.4076244-1-christoph.muellner@vrull.eu>
References: <20230302083534.4076244-1-christoph.muellner@vrull.eu>
From: Christoph Müllner <christoph.muellner@vrull.eu>

The XTheadMemPair ISA extension allows pairing two loads or stores:
* th.ldd (2x LD)
* th.lwd (2x LW)
* th.lwud (2x LWU)
* th.sdd (2x SD)
* th.swd (2x SW)

The displacement of these instructions is quite limited:
* Displacement := imm2 << shamt
* imm2 is a 2-bit unsigned value {0..3}
* shamt is 4 for th.ldd/th.sdd and 3 otherwise

Even with this small displacement we can identify many candidates.
The merging of the two loads/stores is implemented as peephole2
passes that support instruction reordering.  The CFA expansion
(saving/restoring registers on/from the stack) is not processed by
the peephole2 pass and therefore needs special treatment.

Many ideas in this patch are inspired by similar approaches in
other backends.

gcc/ChangeLog:

        * config.gcc: Add thead.o to RISC-V extra_objs.
        * config/riscv/peephole.md: Add mempair peephole passes.
        * config/riscv/riscv-protos.h (riscv_split_64bit_move_p): New
        prototype.
        (th_mempair_operands_p): Likewise.
        (th_mempair_order_operands): Likewise.
        (th_mempair_prepare_save_restore_operands): Likewise.
        (th_mempair_save_restore_regs): Likewise.
        (th_mempair_output_move): Likewise.
        * config/riscv/riscv.cc (riscv_save_reg): Move code.
        (riscv_restore_reg): Move code.
        (riscv_for_each_saved_reg): Add code to emit mempair insns.
        * config/riscv/t-riscv: Add thead.cc.
        * config/riscv/thead.md (*th_mempair_load_<GPR:mode>2): New insn.
        (*th_mempair_store_<GPR:mode>2): Likewise.
        (*th_mempair_load_extendsidi2): Likewise.
        (*th_mempair_load_zero_extendsidi2): Likewise.
        * config/riscv/thead.cc: New file.
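[For illustration only, not part of the patch: the displacement encoding described above can be sketched as a small standalone C helper. The function name is hypothetical and does not exist in GCC.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the mempair displacement constraint:
   an offset is encodable iff it equals imm2 << shamt with imm2 in
   {0..3}, where shamt is 4 for th.ldd/th.sdd (8-byte elements) and
   3 for the 32-bit pair instructions.  */
bool mempair_offset_ok (int64_t offset, int access_size)
{
  int shamt = (access_size == 8) ? 4 : 3;
  int64_t imm2 = offset >> shamt;

  if (imm2 < 0 || imm2 > 3)
    return false;

  /* Valid offsets are {0, 8, 16, 24} for 32-bit pairs and
     {0, 16, 32, 48} for 64-bit pairs.  */
  return (imm2 << shamt) == offset;
}
```

This is the same check that the patch performs when validating candidate memory operands (imm2 in range and imm2 << shamt reproducing the original offset).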
gcc/testsuite/ChangeLog:

        * gcc.target/riscv/xtheadmempair-1.c: New test.
        * gcc.target/riscv/xtheadmempair-2.c: New test.
        * gcc.target/riscv/xtheadmempair-3.c: New test.

Changes in v4:
- Unify function prefix ("th_")
- Move th_* functions to thead.cc
- Minimize CFA code in riscv.cc

Changes in v3:
- Don't emit instructions during peephole2, but emit parallel INSNs
- Add proper checks for the INSN patterns to avoid ICEs or illegal
  instructions reported by the assembler
- Don't insert any `add` instructions
- Rework the constraint handling
- Simplify the output function
- Restructure and simplify CFA processing
- Add debug notes to CFA instructions
- Emit parallel INSNs in the CFA code (same as peephole2)
- Drop tests that target reordering
- Drop tests that are irrelevant (e.g. unrolled loops)
- Add tests for all possible displacements and all instructions
- Add tests for CFA

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config.gcc                                  |   1 +
 gcc/config/riscv/peephole.md                    |  56 +++
 gcc/config/riscv/riscv-protos.h                 |  14 +
 gcc/config/riscv/riscv.cc                       |  88 ++--
 gcc/config/riscv/t-riscv                        |   4 +
 gcc/config/riscv/thead.cc                       | 427 ++++++++++++++++++
 gcc/config/riscv/thead.md                       |  52 +++
 .../gcc.target/riscv/xtheadmempair-1.c          |  98 ++++
 .../gcc.target/riscv/xtheadmempair-2.c          |  84 ++++
 .../gcc.target/riscv/xtheadmempair-3.c          |  29 ++
 10 files changed, 824 insertions(+), 29 deletions(-)
 create mode 100644 gcc/config/riscv/thead.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c

diff --git a/gcc/config.gcc b/gcc/config.gcc index c070e6ecd2e..ad0b6d3302a 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -531,6 +531,7 @@ riscv*) cpu_type=riscv extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o" extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o 
riscv-vector-builtins-bases.o" + extra_objs="${extra_objs} thead.o" d_target_objs="riscv-d.o" extra_headers="riscv_vector.h" target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc" diff --git a/gcc/config/riscv/peephole.md b/gcc/config/riscv/peephole.md index 0ef0c04410b..67e7046d7e6 100644 --- a/gcc/config/riscv/peephole.md +++ b/gcc/config/riscv/peephole.md @@ -38,3 +38,59 @@ (define_peephole2 { operands[5] = GEN_INT (INTVAL (operands[2]) - INTVAL (operands[5])); }) + +;; XTheadMemPair: merge two SI or DI loads +(define_peephole2 + [(set (match_operand:GPR 0 "register_operand" "") + (match_operand:GPR 1 "memory_operand" "")) + (set (match_operand:GPR 2 "register_operand" "") + (match_operand:GPR 3 "memory_operand" ""))] + "TARGET_XTHEADMEMPAIR + && th_mempair_operands_p (operands, true, mode)" + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +{ + th_mempair_order_operands (operands, true, mode); +}) + +;; XTheadMemPair: merge two SI or DI stores +(define_peephole2 + [(set (match_operand:GPR 0 "memory_operand" "") + (match_operand:GPR 1 "register_operand" "")) + (set (match_operand:GPR 2 "memory_operand" "") + (match_operand:GPR 3 "register_operand" ""))] + "TARGET_XTHEADMEMPAIR + && th_mempair_operands_p (operands, false, mode)" + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +{ + th_mempair_order_operands (operands, false, mode); +}) + +;; XTheadMemPair: merge two SI loads with sign-extension +(define_peephole2 + [(set (match_operand:DI 0 "register_operand" "") + (sign_extend:DI (match_operand:SI 1 "memory_operand" ""))) + (set (match_operand:DI 2 "register_operand" "") + (sign_extend:DI (match_operand:SI 3 "memory_operand" "")))] + "TARGET_XTHEADMEMPAIR && TARGET_64BIT + && th_mempair_operands_p (operands, true, SImode)" + [(parallel [(set (match_dup 0) (sign_extend:DI (match_dup 1))) + (set (match_dup 2) (sign_extend:DI (match_dup 3)))])] +{ + 
th_mempair_order_operands (operands, true, SImode); +}) + +;; XTheadMemPair: merge two SI loads with zero-extension +(define_peephole2 + [(set (match_operand:DI 0 "register_operand" "") + (zero_extend:DI (match_operand:SI 1 "memory_operand" ""))) + (set (match_operand:DI 2 "register_operand" "") + (zero_extend:DI (match_operand:SI 3 "memory_operand" "")))] + "TARGET_XTHEADMEMPAIR && TARGET_64BIT + && th_mempair_operands_p (operands, true, SImode)" + [(parallel [(set (match_dup 0) (zero_extend:DI (match_dup 1))) + (set (match_dup 2) (zero_extend:DI (match_dup 3)))])] +{ + th_mempair_order_operands (operands, true, SImode); +}) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 5cf4fafd662..6b2d1c722b4 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -54,6 +54,7 @@ extern bool riscv_split_64bit_move_p (rtx, rtx); extern void riscv_split_doubleword_move (rtx, rtx); extern const char *riscv_output_move (rtx, rtx); extern const char *riscv_output_return (); + #ifdef RTX_CODE extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx); extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx); @@ -200,4 +201,17 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1; /* Mask that selects the riscv_builtin_class part of a function code. */ const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1; +/* Routines implemented in thead.cc. */ +extern bool th_mempair_operands_p (rtx[4], bool, machine_mode); +extern void th_mempair_order_operands (rtx[4], bool, machine_mode); +extern void th_mempair_prepare_save_restore_operands (rtx[4], bool, + machine_mode, + int, HOST_WIDE_INT, + int, HOST_WIDE_INT); +extern void th_mempair_save_restore_regs (rtx[4], bool, machine_mode); +#ifdef RTX_CODE +extern const char* +th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE); +#endif + #endif /* ! 
GCC_RISCV_PROTOS_H */ diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 48f2cb399ae..20af3506275 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -4963,6 +4963,35 @@ riscv_set_return_address (rtx address, rtx scratch) riscv_emit_move (gen_frame_mem (GET_MODE (address), slot_address), address); } +/* Save register REG to MEM. Make the instruction frame-related. */ + +static void +riscv_save_reg (rtx reg, rtx mem) +{ + riscv_emit_move (mem, reg); + riscv_set_frame_expr (riscv_frame_set (mem, reg)); +} + +/* Restore register REG from MEM. */ + +static void +riscv_restore_reg (rtx reg, rtx mem) +{ + rtx insn = riscv_emit_move (reg, mem); + rtx dwarf = NULL_RTX; + dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf); + + if (epilogue_cfa_sp_offset && REGNO (reg) == HARD_FRAME_POINTER_REGNUM) + { + rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (epilogue_cfa_sp_offset)); + dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf); + } + + REG_NOTES (insn) = dwarf; + RTX_FRAME_RELATED_P (insn) = 1; +} + /* A function to save or store a register. The first argument is the register and the second is the stack slot. */ typedef void (*riscv_save_restore_fn) (rtx, rtx); @@ -5057,6 +5086,36 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn, && riscv_is_eh_return_data_register (regno)) continue; + if (TARGET_XTHEADMEMPAIR) + { + /* Get the next reg/offset pair. */ + HOST_WIDE_INT offset2 = offset; + unsigned int regno2 = riscv_next_saved_reg (regno, limit, &offset2); + + /* Validate everything before emitting a mempair instruction. 
*/ + if (regno2 != INVALID_REGNUM + && !cfun->machine->reg_is_wrapped_separately[regno2] + && !(epilogue && !maybe_eh_return + && riscv_is_eh_return_data_register (regno2))) + { + bool load_p = (fn == riscv_restore_reg); + rtx operands[4]; + th_mempair_prepare_save_restore_operands (operands, + load_p, word_mode, + regno, offset, + regno2, offset2); + + /* If the operands fit into a mempair insn, then emit one. */ + if (th_mempair_operands_p (operands, load_p, word_mode)) + { + th_mempair_save_restore_regs (operands, load_p, word_mode); + offset = offset2; + regno = regno2; + continue; + } + } + } + riscv_save_restore_reg (word_mode, regno, offset, fn); } @@ -5075,35 +5134,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn, } } -/* Save register REG to MEM. Make the instruction frame-related. */ - -static void -riscv_save_reg (rtx reg, rtx mem) -{ - riscv_emit_move (mem, reg); - riscv_set_frame_expr (riscv_frame_set (mem, reg)); -} - -/* Restore register REG from MEM. */ - -static void -riscv_restore_reg (rtx reg, rtx mem) -{ - rtx insn = riscv_emit_move (reg, mem); - rtx dwarf = NULL_RTX; - dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf); - - if (epilogue_cfa_sp_offset && REGNO (reg) == HARD_FRAME_POINTER_REGNUM) - { - rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx, - GEN_INT (epilogue_cfa_sp_offset)); - dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf); - } - - REG_NOTES (insn) = dwarf; - RTX_FRAME_RELATED_P (insn) = 1; -} - /* For stack frames that can't be allocated with a single ADDI instruction, compute the best value to initially allocate. It must at a minimum allocate enough space to spill the callee-saved registers. 
If TARGET_RVC, diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv index d30e0235356..dc30ecff678 100644 --- a/gcc/config/riscv/t-riscv +++ b/gcc/config/riscv/t-riscv @@ -75,6 +75,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc $(COMPILE) $< $(POSTCOMPILE) +thead.o: $(srcdir)/config/riscv/thead.cc + $(COMPILE) $< + $(POSTCOMPILE) + PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def $(common_out_file): $(srcdir)/config/riscv/riscv-cores.def \ diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc new file mode 100644 index 00000000000..75203805310 --- /dev/null +++ b/gcc/config/riscv/thead.cc @@ -0,0 +1,427 @@ +/* Subroutines used for code generation for RISC-V. + Copyright (C) 2023 Free Software Foundation, Inc. + Contributed by Christoph Müllner (christoph.muellner@vrull.eu). + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#define IN_TARGET_CODE 1 + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "target.h" +#include "backend.h" +#include "rtl.h" +#include "memmodel.h" +#include "emit-rtl.h" +#include "poly-int.h" +#include "output.h" + +/* If MEM is in the form of "base+offset", extract the two parts + of address and set to BASE and OFFSET, otherwise return false + after clearing BASE and OFFSET. 
*/ + +static bool +extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset) +{ + rtx addr; + + gcc_assert (MEM_P (mem)); + + addr = XEXP (mem, 0); + + if (REG_P (addr)) + { + *base = addr; + *offset = const0_rtx; + return true; + } + + if (GET_CODE (addr) == PLUS + && REG_P (XEXP (addr, 0)) && CONST_INT_P (XEXP (addr, 1))) + { + *base = XEXP (addr, 0); + *offset = XEXP (addr, 1); + return true; + } + + *base = NULL_RTX; + *offset = NULL_RTX; + + return false; +} + +/* If X is a PLUS of a CONST_INT, return the two terms in *BASE_PTR + and *OFFSET_PTR. Return X in *BASE_PTR and 0 in *OFFSET_PTR otherwise. */ + +static void +split_plus (rtx x, rtx *base_ptr, HOST_WIDE_INT *offset_ptr) +{ + if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))) + { + *base_ptr = XEXP (x, 0); + *offset_ptr = INTVAL (XEXP (x, 1)); + } + else + { + *base_ptr = x; + *offset_ptr = 0; + } +} + +/* Output a mempair instruction with the provided OPERANDS. + LOAD_P is true if we have a pair of loads (stores otherwise). + MODE is the access mode (DI or SI). + CODE is the extension code (UNKNOWN, SIGN_EXTEND or ZERO_EXTEND). + This function does not handle invalid inputs gracefully, + but is full of assertions to ensure that only valid instructions + are emitted. */ + +const char * +th_mempair_output_move (rtx operands[4], bool load_p, + machine_mode mode, RTX_CODE code) +{ + rtx reg1, reg2, mem1, mem2, base1, base2; + HOST_WIDE_INT offset1, offset2; + rtx output_operands[5]; + const char* format; + + gcc_assert (mode == SImode || mode == DImode); + + /* Paired 64-bit access instructions have a fixed shift amount of 4. + Paired 32-bit access instructions have a fixed shift amount of 3. */ + unsigned shamt = (mode == DImode) ? 
4 : 3; + + if (load_p) + { + reg1 = copy_rtx (operands[0]); + reg2 = copy_rtx (operands[2]); + mem1 = copy_rtx (operands[1]); + mem2 = copy_rtx (operands[3]); + + if (mode == SImode) + if (code == ZERO_EXTEND) + format = "th.lwud\t%0, %1, (%2), %3, %4"; + else //SIGN_EXTEND or UNKNOWN + format = "th.lwd\t%0, %1, (%2), %3, %4"; + else + format = "th.ldd\t%0, %1, (%2), %3, %4"; + } + else + { + reg1 = copy_rtx (operands[1]); + reg2 = copy_rtx (operands[3]); + mem1 = copy_rtx (operands[0]); + mem2 = copy_rtx (operands[2]); + + if (mode == SImode) + format = "th.swd\t%z0, %z1, (%2), %3, %4"; + else + format = "th.sdd\t%z0, %z1, (%2), %3, %4"; + } + + split_plus (XEXP (mem1, 0), &base1, &offset1); + split_plus (XEXP (mem2, 0), &base2, &offset2); + gcc_assert (rtx_equal_p (base1, base2)); + auto size1 = MEM_SIZE (mem1); + auto size2 = MEM_SIZE (mem2); + gcc_assert (known_eq (size1, size2)); + gcc_assert (known_eq (offset1 + size1, offset2)); + + HOST_WIDE_INT imm2 = offset1 >> shamt; + + /* Make sure all mempair instruction constraints are met. */ + gcc_assert (imm2 >= 0 && imm2 < 4); + gcc_assert ((imm2 << shamt) == offset1); + gcc_assert (REG_P (reg1)); + gcc_assert (REG_P (reg2)); + gcc_assert (REG_P (base1)); + if (load_p) + { + gcc_assert (REGNO (reg1) != REGNO (reg2)); + gcc_assert (REGNO (reg1) != REGNO (base1)); + gcc_assert (REGNO (reg2) != REGNO (base1)); + } + + /* Output the mempair instruction. */ + output_operands[0] = copy_rtx (reg1); + output_operands[1] = copy_rtx (reg2); + output_operands[2] = copy_rtx (base1); + output_operands[3] = gen_rtx_CONST_INT (mode, imm2); + output_operands[4] = gen_rtx_CONST_INT (mode, shamt); + output_asm_insn (format, output_operands); + + return ""; +} + +/* Analyse if a pair of loads/stores MEM1 and MEM2 with given MODE + are consecutive so they can be merged into a mempair instruction. + REVERSED will be set to true if a reversal of the accesses is 
Returns true if the accesses can be + merged (even if reversing is necessary) and false if not. */ + +static bool +th_mempair_check_consecutive_mems (machine_mode mode, rtx *mem1, rtx *mem2, + bool *reversed) +{ + rtx base1, base2, offset1, offset2; + extract_base_offset_in_addr (*mem1, &base1, &offset1); + extract_base_offset_in_addr (*mem2, &base2, &offset2); + + /* Make sure both mems are in base+offset form. */ + if (!base1 || !base2) + return false; + + /* If both mems use the same base register, just check the offsets. */ + if (rtx_equal_p (base1, base2)) + { + auto size = GET_MODE_SIZE (mode); + + if (known_eq (UINTVAL (offset1) + size, UINTVAL (offset2))) + { + *reversed = false; + return true; + } + + if (known_eq (UINTVAL (offset2) + size, UINTVAL (offset1))) + { + *reversed = true; + return true; + } + + return false; + } + + return false; +} + +/* Check if the given MEM can be used to define the address of a mempair + instruction. */ + +static bool +th_mempair_operand_p (rtx mem, machine_mode mode) +{ + if (!MEM_SIZE_KNOWN_P (mem)) + return false; + + /* Only DI or SI mempair instructions exist. */ + gcc_assert (mode == SImode || mode == DImode); + auto mem_sz = MEM_SIZE (mem); + auto mode_sz = GET_MODE_SIZE (mode); + if (!known_eq (mem_sz, mode_sz)) + return false; + + /* Paired 64-bit access instructions have a fixed shift amount of 4. + Paired 32-bit access instructions have a fixed shift amount of 3. */ + machine_mode mem_mode = GET_MODE (mem); + unsigned shamt = (mem_mode == DImode) ? 
4 : 3; + + rtx base; + HOST_WIDE_INT offset; + split_plus (XEXP (mem, 0), &base, &offset); + HOST_WIDE_INT imm2 = offset >> shamt; + + if (imm2 < 0 || imm2 >= 4) + return false; + + if ((imm2 << shamt) != offset) + return false; + + return true; +} + +static bool +th_mempair_load_overlap_p (rtx reg1, rtx reg2, rtx mem) +{ + if (REGNO (reg1) == REGNO (reg2)) + return true; + + if (reg_overlap_mentioned_p (reg1, mem)) + return true; + + rtx base; + HOST_WIDE_INT offset; + split_plus (XEXP (mem, 0), &base, &offset); + + if (!REG_P (base)) + return true; + + if (REG_P (base)) + { + if (REGNO (base) == REGNO (reg1) + || REGNO (base) == REGNO (reg2)) + return true; + } + + return false; +} + +/* Given OPERANDS of consecutive load/store, check if we can merge + them into load-pair or store-pair instructions. + LOAD is true if they are load instructions. + MODE is the mode of memory operation. */ + +bool +th_mempair_operands_p (rtx operands[4], bool load_p, + machine_mode mode) +{ + rtx mem_1, mem_2, reg_1, reg_2; + + if (load_p) + { + reg_1 = operands[0]; + mem_1 = operands[1]; + reg_2 = operands[2]; + mem_2 = operands[3]; + if (!REG_P (reg_1) || !REG_P (reg_2)) + return false; + if (th_mempair_load_overlap_p (reg_1, reg_2, mem_1)) + return false; + if (th_mempair_load_overlap_p (reg_1, reg_2, mem_2)) + return false; + } + else + { + mem_1 = operands[0]; + reg_1 = operands[1]; + mem_2 = operands[2]; + reg_2 = operands[3]; + } + + /* Check if the registers are GP registers. */ + if (!REG_P (reg_1) || !GP_REG_P (REGNO (reg_1)) + || !REG_P (reg_2) || !GP_REG_P (REGNO (reg_2))) + return false; + + /* The mems cannot be volatile. */ + if (!MEM_P (mem_1) || !MEM_P (mem_2)) + return false; + if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)) + return false; + + /* If we have slow unaligned access, we only accept aligned memory. 
*/ + if (riscv_slow_unaligned_access_p + && known_lt (MEM_ALIGN (mem_1), GET_MODE_SIZE (mode) * BITS_PER_UNIT)) + return false; + + /* Check if the addresses are in the form of [base+offset]. */ + bool reversed = false; + if (!th_mempair_check_consecutive_mems (mode, &mem_1, &mem_2, &reversed)) + return false; + + /* The first memory access must be a mempair operand. */ + if ((!reversed && !th_mempair_operand_p (mem_1, mode)) + || (reversed && !th_mempair_operand_p (mem_2, mode))) + return false; + + /* The operands must be of the same size. */ + gcc_assert (known_eq (GET_MODE_SIZE (GET_MODE (mem_1)), + GET_MODE_SIZE (GET_MODE (mem_2)))); + + return true; +} + +/* Given OPERANDS of consecutive load/store that can be merged, + swap them if they are not in ascending order. */ +void +th_mempair_order_operands (rtx operands[4], bool load_p, machine_mode mode) +{ + int mem_op = load_p ? 1 : 0; + bool reversed = false; + if (!th_mempair_check_consecutive_mems (mode, + operands + mem_op, + operands + mem_op + 2, + &reversed)) + gcc_unreachable (); + + if (reversed) + { + /* Irrespective of whether this is a load or a store, + we do the same swap. */ + std::swap (operands[0], operands[2]); + std::swap (operands[1], operands[3]); + } +} + +/* Similar to riscv_save_reg, but saves two registers to memory + and marks the resulting instruction as frame-related. */ + +static void +th_mempair_save_regs (rtx operands[4]) +{ + rtx set1 = gen_rtx_SET (operands[0], operands[1]); + rtx set2 = gen_rtx_SET (operands[2], operands[3]); + rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set1, set2))); + RTX_FRAME_RELATED_P (insn) = 1; + add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set1)); + add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set2)); +} + +/* Similar to riscv_restore_reg, but restores two registers from memory + and marks the instruction frame-related. 
*/ + +static void +th_mempair_restore_regs (rtx operands[4]) +{ + rtx set1 = gen_rtx_SET (operands[0], operands[1]); + rtx set2 = gen_rtx_SET (operands[2], operands[3]); + rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set1, set2))); + RTX_FRAME_RELATED_P (insn) = 1; + add_reg_note (insn, REG_CFA_RESTORE, operands[0]); + add_reg_note (insn, REG_CFA_RESTORE, operands[2]); +} + +/* Prepare the OPERANDS array to emit a mempair instruction using the + provided information. No checks are performed, the resulting array + should be validated using th_mempair_operands_p(). */ + +void +th_mempair_prepare_save_restore_operands (rtx operands[4], + bool load_p, machine_mode mode, + int regno, HOST_WIDE_INT offset, + int regno2, HOST_WIDE_INT offset2) +{ + int reg_op = load_p ? 0 : 1; + int mem_op = load_p ? 1 : 0; + + rtx mem1 = plus_constant (mode, stack_pointer_rtx, offset); + mem1 = gen_frame_mem (mode, mem1); + rtx mem2 = plus_constant (mode, stack_pointer_rtx, offset2); + mem2 = gen_frame_mem (mode, mem2); + + operands[reg_op] = gen_rtx_REG (mode, regno); + operands[mem_op] = mem1; + operands[2 + reg_op] = gen_rtx_REG (mode, regno2); + operands[2 + mem_op] = mem2; +} + +/* Emit a mempair instruction to save/restore two registers to/from stack. 
*/ + +void +th_mempair_save_restore_regs (rtx operands[4], bool load_p, + machine_mode mode) +{ + gcc_assert (th_mempair_operands_p (operands, load_p, mode)); + + th_mempair_order_operands (operands, load_p, mode); + + if (load_p) + th_mempair_restore_regs (operands); + else + th_mempair_save_regs (operands); +} diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md index b0a71d595fd..63c4af6f77d 100644 --- a/gcc/config/riscv/thead.md +++ b/gcc/config/riscv/thead.md @@ -292,3 +292,55 @@ (define_insn "*th_sextw_msubhisi4" [(set_attr "type" "imul") (set_attr "mode" "SI")] ) + +;; XTheadMemPair + +;; MEMPAIR load 64/32 bit +(define_insn "*th_mempair_load_2" + [(set (match_operand:GPR 0 "register_operand" "=r") + (match_operand:GPR 1 "memory_operand" "m")) + (set (match_operand:GPR 2 "register_operand" "=r") + (match_operand:GPR 3 "memory_operand" "m"))] + "TARGET_XTHEADMEMPAIR && reload_completed + && th_mempair_operands_p (operands, true, mode)" + { return th_mempair_output_move (operands, true, mode, UNKNOWN); } + [(set_attr "move_type" "load") + (set_attr "mode" "")]) + +;; MEMPAIR store 64/32 bit +(define_insn "*th_mempair_store_2" + [(set (match_operand:GPR 0 "memory_operand" "=m") + (match_operand:GPR 1 "register_operand" "r")) + (set (match_operand:GPR 2 "memory_operand" "=m") + (match_operand:GPR 3 "register_operand" "r"))] + "TARGET_XTHEADMEMPAIR && reload_completed + && th_mempair_operands_p (operands, false, mode)" + { return th_mempair_output_move (operands, false, mode, UNKNOWN); } + [(set_attr "move_type" "store") + (set_attr "mode" "")]) + +;; MEMPAIR load DI extended signed SI +(define_insn "*th_mempair_load_extendsidi2" + [(set (match_operand 0 "register_operand" "=r") + (sign_extend:DI (match_operand 1 "memory_operand" "m"))) + (set (match_operand 2 "register_operand" "=r") + (sign_extend:DI (match_operand 3 "memory_operand" "m")))] + "TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed + && th_mempair_operands_p (operands, true, 
SImode)" + { return th_mempair_output_move (operands, true, SImode, SIGN_EXTEND); } + [(set_attr "move_type" "load") + (set_attr "mode" "DI") + (set_attr "length" "8")]) + +;; MEMPAIR load DI extended unsigned SI +(define_insn "*th_mempair_load_zero_extendsidi2" + [(set (match_operand 0 "register_operand" "=r") + (zero_extend:DI (match_operand 1 "memory_operand" "m"))) + (set (match_operand 2 "register_operand" "=r") + (zero_extend:DI (match_operand 3 "memory_operand" "m")))] + "TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed + && th_mempair_operands_p (operands, true, SImode)" + { return th_mempair_output_move (operands, true, SImode, ZERO_EXTEND); } + [(set_attr "move_type" "load") + (set_attr "mode" "DI") + (set_attr "length" "8")]) diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c new file mode 100644 index 00000000000..34a29ab9c04 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c @@ -0,0 +1,98 @@ +/* { dg-do compile } */ +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */ +/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906" { target { rv64 } } } */ +/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906" { target { rv32 } } } */ + +#include + +#if __riscv_xlen == 32 +typedef uint32_t xlen_t; +#else +typedef uint64_t xlen_t; +#endif + +void foof (xlen_t*, xlen_t, xlen_t); +void foor (xlen_t*, xlen_t, xlen_t); +void foowu (uint32_t*, uint64_t, uint64_t); +void foows (int32_t*, int64_t, int64_t); + +#define LxD_TEST(f, T, i1, i2) \ +void \ +f ## i1 ## i2(T *arr) \ +{ \ + foo ## f(arr, arr[i1], arr[i2]); \ +} + +// works +LxD_TEST(f, xlen_t, 0, 1) +// does not work (can't merge with unaligned offset) +LxD_TEST(f, xlen_t, 1, 2) +// works +LxD_TEST(f, xlen_t, 2, 3) +// does not work (can't merge with unaligned offset) +LxD_TEST(f, xlen_t, 3, 4) +// works +LxD_TEST(f, xlen_t, 4, 5) +// does not work (can't merge with unaligned offset) 
+LxD_TEST(f, xlen_t, 5, 6) +// works +LxD_TEST(f, xlen_t, 6, 7) +// does not work (can't merge with unaligned offset) +LxD_TEST(f, xlen_t, 7, 8) +// does not work (out of range) +LxD_TEST(f, xlen_t, 8, 9) + +// works with reordering +LxD_TEST(r, xlen_t, 1, 0) +// does not work (can't merge with unaligned offset) +LxD_TEST(r, xlen_t, 2, 1) +// works with reordering +LxD_TEST(r, xlen_t, 3, 2) +// does not work (can't merge with unaligned offset) +LxD_TEST(r, xlen_t, 4, 3) +// works with reordering +LxD_TEST(r, xlen_t, 5, 4) +// does not work (can't merge with unaligned offset) +LxD_TEST(r, xlen_t, 6, 5) +// works with reordering +LxD_TEST(r, xlen_t, 7, 6) +// does not work (can't merge with unaligned offset) +LxD_TEST(r, xlen_t, 8, 7) +// does not work (out of range) +LxD_TEST(r, xlen_t, 9, 8) + +#if __riscv_xlen != 32 +// works +LxD_TEST(wu, uint32_t, 0, 1) +LxD_TEST(ws, int32_t, 0, 1) +// does not work (can't merge with unaligned offset) +LxD_TEST(wu, uint32_t, 1, 2) +LxD_TEST(ws, int32_t, 1, 2) +// works +LxD_TEST(wu, uint32_t, 2, 3) +LxD_TEST(ws, int32_t, 2, 3) +// does not work (can't merge with unaligned offset) +LxD_TEST(wu, uint32_t, 3, 4) +LxD_TEST(ws, int32_t, 3, 4) +// works +LxD_TEST(wu, uint32_t, 4, 5) +LxD_TEST(ws, int32_t, 4, 5) +// does not work (can't merge with unaligned offset) +LxD_TEST(wu, uint32_t, 5, 6) +LxD_TEST(ws, int32_t, 5, 6) +// works +LxD_TEST(wu, uint32_t, 6, 7) +LxD_TEST(ws, int32_t, 6, 7) +// does not work (can't merge with unaligned offset) +LxD_TEST(wu, uint32_t, 7, 8) +LxD_TEST(ws, int32_t, 7, 8) +// does not work (out of range) +LxD_TEST(wu, uint32_t, 8, 9) +LxD_TEST(ws, int32_t, 8, 9) +#endif + +/* { dg-final { scan-assembler-times "th.ldd\t" 8 { target { rv64 } } } } */ +/* { dg-final { scan-assembler-times "th.lwud\t" 4 { target { rv64 } } } } */ +/* { dg-final { scan-assembler-times "th.lwd\t" 4 { target { rv64 } } } } */ + +/* { dg-final { scan-assembler-times "th.lwd\t" 8 { target { rv32 } } } } */ diff --git 
a/gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
new file mode 100644
index 00000000000..ed639709196
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
@@ -0,0 +1,84 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906" { target { rv32 } } } */
+
+#include <stdint.h>
+
+#if __riscv_xlen == 32
+typedef uint32_t xlen_t;
+#else
+typedef uint64_t xlen_t;
+#endif
+
+#define SxD_TEST(f, T, i1, i2)	\
+void				\
+f ## i1 ## i2(T *arr, T x, T y)	\
+{				\
+  arr[i1] = x;			\
+  arr[i2] = y;			\
+}
+
+// works
+SxD_TEST(f, xlen_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 1, 2)
+// works
+SxD_TEST(f, xlen_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 3, 4)
+// works
+SxD_TEST(f, xlen_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 5, 6)
+// works
+SxD_TEST(f, xlen_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 7, 8)
+// does not work (out of range)
+SxD_TEST(f, xlen_t, 8, 9)
+
+// works with reordering
+SxD_TEST(r, xlen_t, 1, 0)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 2, 1)
+// works with reordering
+SxD_TEST(r, xlen_t, 3, 2)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 4, 3)
+// works with reordering
+SxD_TEST(r, xlen_t, 5, 4)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 6, 5)
+// works with reordering
+SxD_TEST(r, xlen_t, 7, 6)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 8, 7)
+// does not work (out of range)
+SxD_TEST(r, xlen_t, 9, 8)
+
+#if __riscv_xlen != 32
+// works
+SxD_TEST(w, uint32_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 1, 2)
+// works
+SxD_TEST(w, uint32_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 3, 4)
+// works
+SxD_TEST(w, uint32_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 5, 6)
+// works
+SxD_TEST(w, uint32_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 7, 8)
+// does not work (out of range)
+SxD_TEST(w, uint32_t, 8, 9)
+#endif
+
+/* { dg-final { scan-assembler-times "th.sdd\t" 8 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.swd\t" 4 { target { rv64 } } } } */
+
+/* { dg-final { scan-assembler-times "th.swd\t" 8 { target { rv32 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
new file mode 100644
index 00000000000..5dec702819a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906" { target { rv32 } } } */
+
+#include <stdint.h>
+
+#if __riscv_xlen == 32
+typedef uint32_t xlen_t;
+#else
+typedef uint64_t xlen_t;
+#endif
+
+void foo (xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t);
+void bar (xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t);
+
+void baz (xlen_t a, xlen_t b, xlen_t c, xlen_t d, xlen_t e, xlen_t f, xlen_t g, xlen_t h)
+{
+  foo (a, b, c, d, e, f, g, h);
+  /* RV64: We don't use 0(sp), therefore we can only get 3 mempairs.  */
+  /* RV32: We don't use 0(sp)-8(sp), therefore we can only get 2 mempairs.
+     */
+  bar (a, b, c, d, e, f, g, h);
+}
+
+/* { dg-final { scan-assembler-times "th.ldd\t" 3 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.sdd\t" 3 { target { rv64 } } } } */
+
+/* { dg-final { scan-assembler-times "th.lwd\t" 2 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "th.swd\t" 2 { target { rv32 } } } } */