From patchwork Wed Jun 7 05:52:14 2023
X-Patchwork-Submitter: Fei Gao
X-Patchwork-Id: 70701
From: Fei Gao
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, palmer@dabbelt.com, jeffreyalaw@gmail.com, sinan.lin@linux.alibaba.com, jiawei@iscas.ac.cn, Fei Gao
Subject: [PATCH 3/4] [RISC-V] resolve conflict between zcmp multi push/pop and shrink-wrap-separate
Date: Wed, 7 Jun 2023 05:52:14 +0000
Message-Id: <20230607055215.29332-4-gaofei@eswincomputing.com>
In-Reply-To: <20230607055215.29332-1-gaofei@eswincomputing.com>
References: <20230607055215.29332-1-gaofei@eswincomputing.com>

Disable zcmp multi push/pop if shrink-wrap-separate is active.

So at -Os, which prefers smaller code size, shrink-wrap-separate is disabled by default while zcmp multi push/pop is enabled.

At -O2 and other levels that prefer speed, shrink-wrap-separate is enabled by default while zcmp multi push/pop is disabled. To force zcmp multi push/pop on in this case, -fno-shrink-wrap-separate has to be given explicitly.

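The policy described above can be sketched as a small truth table in C. This is only an illustrative model: the struct and both function names are hypothetical and do not exist in GCC, and the real riscv_avoid_multi_push checks further conditions that are left out here.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the per-function state the patch
   consults.  None of these names exist in GCC; they only mirror the
   description in the commit message.  */
struct fn_state
{
  bool optimize_for_speed;     /* true at -O2, false at -Os */
  bool shrink_wrap_separate;   /* false with -fno-shrink-wrap-separate */
  bool target_avoids_separate; /* e.g. save/restore libcalls, interrupt handler */
};

/* Separate shrink-wrapping counts as active only when optimizing for
   speed, the flag is on, and the target has no reason to avoid it.  */
static bool
separate_shrink_wrap_active (const struct fn_state *s)
{
  return s->optimize_for_speed
         && s->shrink_wrap_separate
         && !s->target_avoids_separate;
}

/* Multi push/pop is allowed only when separate shrink-wrapping is
   inactive (assumption: all other multi-push conditions already hold).  */
static bool
multi_push_allowed (const struct fn_state *s)
{
  return !separate_shrink_wrap_active (s);
}
```

Under this model, a function compiled at -Os, or at -O2 with -fno-shrink-wrap-separate, may still use cm.push, while a plain -O2 function may not.
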
The following TC shows the issues at -O2 before this patch, with both shrink-wrap-separate and zcmp multi push/pop active:
1. Duplicated stores of s regs.
2. cm.push pushes ra, s0-s11 in the reverse order of what the normal prologue does, causing stack corruption and failure to restore s regs.

TC: zcmp_shrink_wrap_separate.c included in this patch.

output asm before this patch:
calc_func:
        cm.push {ra, s0-s3}, -32
        ...
        beq     a5,zero,.L2
        ...
.L2:
        ...
        sw      s1,20(sp) //issue here
        sw      s3,12(sp) //issue here
        ...
        sw      s2,16(sp) //issue here

output asm after this patch:
calc_func:
        addi    sp,sp,-32
        sw      s0,24(sp)
        ...
        beq     a5,zero,.L2
        ...
.L2:
        ...
        sw      s1,20(sp)
        sw      s3,12(sp)
        ...
        sw      s2,16(sp)

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_avoid_shrink_wrapping_separate): wrap the condition check in riscv_avoid_shrink_wrapping_separate.
        (riscv_avoid_multi_push): avoid multi push if shrink_wrapping_separate is active.
        (riscv_get_separate_components): call riscv_avoid_shrink_wrapping_separate.
        * shrink-wrap.cc (try_shrink_wrapping_separate): call use_shrink_wrapping_separate.
        (use_shrink_wrapping_separate): wrap the condition check in use_shrink_wrapping_separate.
        * shrink-wrap.h (use_shrink_wrapping_separate): add to extern.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/zcmp_shrink_wrap_separate.c: New test.
        * gcc.target/riscv/zcmp_shrink_wrap_separate2.c: New test.

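The layout mismatch behind issue 2 can be illustrated with a small offset calculation. This is a sketch under stated assumptions, not code from the patch: the helper names are hypothetical, registers are indexed ra=0, s0=1, s1=2, and so on, and the cm.push layout is assumed to place the last register of the list at the top of the frame and ra at the bottom, which is how the commit message's "reverse order" is read here. The normal-prologue offsets agree with the stores shown in the asm above.

```c
#include <stdio.h>

/* Hypothetical helper names; registers are indexed ra=0, s0=1, s1=2, ...  */

/* Offset from the new sp when the ordinary prologue saves register IDX:
   ra sits at the top of the frame and later registers go below it.  */
static int
normal_save_offset (int frame_size, int idx)
{
  return frame_size - 4 * (idx + 1);
}

/* Offset from the new sp under the assumed cm.push layout for a list of
   NREGS registers: the last register sits at the top, ra at the bottom.  */
static int
cm_push_save_offset (int frame_size, int nregs, int idx)
{
  return frame_size - 4 * (nregs - idx);
}

/* Print both layouts for cm.push {ra, s0-s3}, -32 on RV32.  */
static void
print_layouts (void)
{
  static const char *const names[] = { "ra", "s0", "s1", "s2", "s3" };
  for (int i = 0; i < 5; i++)
    printf ("%s: normal %d(sp), cm.push %d(sp)\n", names[i],
            normal_save_offset (32, i), cm_push_save_offset (32, 5, i));
}
```

If the assumed layout holds, a separately shrink-wrapped store of s3 to 12(sp) lands on the slot where cm.push already placed ra, which is the kind of corruption the commit message describes.
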
Signed-off-by: Fei Gao
Co-Authored-By: Zhangjin Liao
---
 gcc/config/riscv/riscv.cc                     | 19 +++-
 gcc/shrink-wrap.cc                            | 25 +++--
 gcc/shrink-wrap.h                             |  1 +
 .../riscv/zcmp_shrink_wrap_separate.c         | 97 +++++++++++++++++++
 .../riscv/zcmp_shrink_wrap_separate2.c        | 97 +++++++++++++++++++
 5 files changed, 228 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f60c241a526..b505cdeca34 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "cfgloop.h"
 #include "cfgrtl.h"
+#include "shrink-wrap.h"
 #include "sel-sched.h"
 #include "fold-const.h"
 #include "gimple-iterator.h"
@@ -389,6 +390,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   false,                                        /* use_divmod_expansion */
 };
 
+static bool riscv_avoid_shrink_wrapping_separate ();
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
 static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
 
@@ -4910,6 +4912,8 @@ riscv_avoid_multi_push(const struct riscv_frame_info *frame)
       || cfun->machine->interrupt_handler_p
       || cfun->machine->varargs_size != 0
       || crtl->args.pretend_args_size != 0
+      || (use_shrink_wrapping_separate ()
+          && !riscv_avoid_shrink_wrapping_separate ())
       || (frame->mask & ~ MULTI_PUSH_GPR_MASK))
     return true;
 
@@ -6077,6 +6081,17 @@ riscv_epilogue_uses (unsigned int regno)
   return false;
 }
 
+static bool
+riscv_avoid_shrink_wrapping_separate ()
+{
+  if (riscv_use_save_libcall (&cfun->machine->frame)
+      || cfun->machine->interrupt_handler_p
+      || !cfun->machine->frame.gp_sp_offset.is_constant ())
+    return true;
+
+  return false;
+}
+
 /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
 
 static sbitmap
@@ -6086,9 +6101,7 @@ riscv_get_separate_components (void)
   sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
   bitmap_clear (components);
 
-  if (riscv_use_save_libcall (&cfun->machine->frame)
-      || cfun->machine->interrupt_handler_p
-      || !cfun->machine->frame.gp_sp_offset.is_constant ())
+  if (riscv_avoid_shrink_wrapping_separate ())
     return components;
 
   offset = cfun->machine->frame.gp_sp_offset.to_constant ();
diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index b8d7b557130..d534964321a 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -1776,16 +1776,14 @@ insert_prologue_epilogue_for_components (sbitmap components)
   commit_edge_insertions ();
 }
 
-/* The main entry point to this subpass.  FIRST_BB is where the prologue
-   would be normally put.  */
-void
-try_shrink_wrapping_separate (basic_block first_bb)
+bool
+use_shrink_wrapping_separate (void)
 {
   if (!(SHRINK_WRAPPING_ENABLED
-        && flag_shrink_wrap_separate
-        && optimize_function_for_speed_p (cfun)
-        && targetm.shrink_wrap.get_separate_components))
-    return;
+        && flag_shrink_wrap_separate
+        && optimize_function_for_speed_p (cfun)
+        && targetm.shrink_wrap.get_separate_components))
+    return false;
 
   /* We don't handle "strange" functions.  */
   if (cfun->calls_alloca
@@ -1794,6 +1792,17 @@ try_shrink_wrapping_separate (basic_block first_bb)
       || crtl->calls_eh_return
       || crtl->has_nonlocal_goto
       || crtl->saves_all_registers)
+    return false;
+
+  return true;
+}
+
+/* The main entry point to this subpass.  FIRST_BB is where the prologue
+   would be normally put.  */
+void
+try_shrink_wrapping_separate (basic_block first_bb)
+{
+  if (!use_shrink_wrapping_separate ())
     return;
 
   /* Ask the target what components there are.  If it returns NULL, don't
diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h
index 161647711a3..82386c2b712 100644
--- a/gcc/shrink-wrap.h
+++ b/gcc/shrink-wrap.h
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET,
                                     HARD_REG_SET);
 extern void try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq);
 extern void try_shrink_wrapping_separate (basic_block first_bb);
+extern bool use_shrink_wrapping_separate (void);
 #define SHRINK_WRAPPING_ENABLED \
   (flag_shrink_wrap && targetm.have_simple_return ())
 
diff --git a/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c b/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
new file mode 100644
index 00000000000..11f87aee607
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
@@ -0,0 +1,97 @@
+/* { dg-do compile } */
+/* { dg-options " -O2 -march=rv32imaf_zca_zcmp -mabi=ilp32f" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-O3" "-Oz" "-flto"} } */
+
+typedef struct MAT_PARAMS_S
+{
+  int N;
+  signed short *A;
+  signed short *B;
+  signed int *C;
+} mat_params;
+
+typedef struct CORE_PORTABLE_S
+{
+  unsigned char portable_id;
+} core_portable;
+
+typedef struct RESULTS_S
+{
+  /* inputs */
+  signed short seed1; /* Initializing seed */
+  signed short seed2; /* Initializing seed */
+  signed short seed3; /* Initializing seed */
+  void * memblock[4]; /* Pointer to safe memory location */
+  unsigned int size; /* Size of the data */
+  unsigned int iterations; /* Number of iterations to execute */
+  unsigned int execs; /* Bitmask of operations to execute */
+  struct list_head_s *list;
+  mat_params mat;
+  /* outputs */
+  unsigned short crc;
+  unsigned short crclist;
+  unsigned short crcmatrix;
+  unsigned short crcstate;
+  signed short err;
+  /* Multithread specific */
+  core_portable port;
+} core_results;
+
+extern signed short
+core_bench_state(unsigned int, void *, signed short, signed short, signed short, unsigned short);
+
+extern signed short
+core_bench_matrix(mat_params *, signed short, unsigned short);
+
+extern unsigned short
+crcu16(signed short, unsigned short);
+
+signed short
+calc_func(signed short *pdata, core_results *res)
+{
+  signed short data = *pdata;
+  signed short retval;
+  unsigned char optype
+    = (data >> 7)
+      & 1; /* bit 7 indicates if the function result has been cached */
+  if (optype) /* if cached, use cache */
+    return (data & 0x007f);
+  else
+    { /* otherwise calculate and cache the result */
+      signed short flag = data & 0x7; /* bits 0-2 is type of function to perform */
+      signed short dtype
+        = ((data >> 3)
+           & 0xf); /* bits 3-6 is specific data for the operation */
+      dtype |= dtype << 4; /* replicate the lower 4 bits to get an 8b value */
+      switch (flag)
+        {
+        case 0:
+          if (dtype < 0x22) /* set min period for bit corruption */
+            dtype = 0x22;
+          retval = core_bench_state(res->size,
+                                    res->memblock[3],
+                                    res->seed1,
+                                    res->seed2,
+                                    dtype,
+                                    res->crc);
+          if (res->crcstate == 0)
+            res->crcstate = retval;
+          break;
+        case 1:
+          retval = core_bench_matrix(&(res->mat), dtype, res->crc);
+          if (res->crcmatrix == 0)
+            res->crcmatrix = retval;
+          break;
+        default:
+          retval = data;
+          break;
+        }
+      res->crc = crcu16(retval, res->crc);
+      retval &= 0x007f;
+      *pdata = (data & 0xff00) | 0x0080 | retval; /* cache the result */
+      return retval;
+    }
+}
+
+/* { dg-final { scan-assembler-not "cm\.push" } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c b/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c
new file mode 100644
index 00000000000..ec7e9c39b5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c
@@ -0,0 +1,97 @@
+/* { dg-do compile } */
+/* { dg-options " -O2 -fno-shrink-wrap-separate -march=rv32imaf_zca_zcmp -mabi=ilp32f" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-O3" "-Oz" "-flto"} } */
+
+typedef struct MAT_PARAMS_S
+{
+  int N;
+  signed short *A;
+  signed short *B;
+  signed int *C;
+} mat_params;
+
+typedef struct CORE_PORTABLE_S
+{
+  unsigned char portable_id;
+} core_portable;
+
+typedef struct RESULTS_S
+{
+  /* inputs */
+  signed short seed1; /* Initializing seed */
+  signed short seed2; /* Initializing seed */
+  signed short seed3; /* Initializing seed */
+  void * memblock[4]; /* Pointer to safe memory location */
+  unsigned int size; /* Size of the data */
+  unsigned int iterations; /* Number of iterations to execute */
+  unsigned int execs; /* Bitmask of operations to execute */
+  struct list_head_s *list;
+  mat_params mat;
+  /* outputs */
+  unsigned short crc;
+  unsigned short crclist;
+  unsigned short crcmatrix;
+  unsigned short crcstate;
+  signed short err;
+  /* Multithread specific */
+  core_portable port;
+} core_results;
+
+extern signed short
+core_bench_state(unsigned int, void *, signed short, signed short, signed short, unsigned short);
+
+extern signed short
+core_bench_matrix(mat_params *, signed short, unsigned short);
+
+extern unsigned short
+crcu16(signed short, unsigned short);
+
+signed short
+calc_func(signed short *pdata, core_results *res)
+{
+  signed short data = *pdata;
+  signed short retval;
+  unsigned char optype
+    = (data >> 7)
+      & 1; /* bit 7 indicates if the function result has been cached */
+  if (optype) /* if cached, use cache */
+    return (data & 0x007f);
+  else
+    { /* otherwise calculate and cache the result */
+      signed short flag = data & 0x7; /* bits 0-2 is type of function to perform */
+      signed short dtype
+        = ((data >> 3)
+           & 0xf); /* bits 3-6 is specific data for the operation */
+      dtype |= dtype << 4; /* replicate the lower 4 bits to get an 8b value */
+      switch (flag)
+        {
+        case 0:
+          if (dtype < 0x22) /* set min period for bit corruption */
+            dtype = 0x22;
+          retval = core_bench_state(res->size,
+                                    res->memblock[3],
+                                    res->seed1,
+                                    res->seed2,
+                                    dtype,
+                                    res->crc);
+          if (res->crcstate == 0)
+            res->crcstate = retval;
+          break;
+        case 1:
+          retval = core_bench_matrix(&(res->mat), dtype, res->crc);
+          if (res->crcmatrix == 0)
+            res->crcmatrix = retval;
+          break;
+        default:
+          retval = data;
+          break;
+        }
+      res->crc = crcu16(retval, res->crc);
+      retval &= 0x007f;
+      *pdata = (data & 0xff00) | 0x0080 | retval; /* cache the result */
+      return retval;
+    }
+}
+
+/* { dg-final { scan-assembler "cm\.push" } } */
+