From patchwork Thu Sep 22 06:58:26 2022
X-Patchwork-Id: 57880
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH] DSE: Enhance dse with def-ref analysis
Date: Thu, 22 Sep 2022 14:58:26 +0800
Message-Id: <20220922065826.329557-1-juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong

This patch fixes PR 99407:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407

The enhancement is simple:
1. Walk the gimple statements in reverse program order.
2. Queue each store statement that may kill the def of an earlier store.
3. Run dse_def_ref_analysis to drop queued stores that cannot kill any def.
   For example:
     a[i_18] = _5;
     ...
     foo (&a);
     a[i_18] = _7;
   a[i_18] = _7 is queued at the beginning, but dse_def_ref_analysis drops it
   from the queue when foo (&a) is reached, so the earlier store
   a[i_18] = _5 is not deleted.
4. Delete the earlier store once a queued store is confirmed to kill its def.

I have fully tested it in the RISC-V foundation downstream port (RVV):
https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next
Would you be willing to review this patch and test it on ARM/x86?

gcc/ChangeLog:

        * tree-ssa-dse.cc (dse_search_def_stores): New function.
        (dse_can_def_ref_p): New function.
        (dse_def_ref_analysis): New function.
        (dse_optimize_stmt): Add stores_queue parameter and try def-ref
        analysis before giving up on a live store.
        (pass_dse::execute): Build a per-basic-block stores_queue and call
        dse_def_ref_analysis for every statement.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr99407.c: New test.
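For reference, here is a reduced source-level version of the pattern this
patch targets, adapted from the new testcase below (the function name,
array names and sizes here are only illustrative):

    #define N 32000
    float a[N], b[N], c[N], d[N], e[N];

    void
    s243_like (void)
    {
      for (int i = 0; i < N - 1; i++)
        {
          a[i] = b[i] + c[i] * d[i];      /* becomes dead ...  */
          b[i] = a[i] + d[i] * e[i];      /* ... once earlier passes (FRE)
                                             forward the just-stored value
                                             to this read of a[i] ...  */
          a[i] = b[i] + a[i + 1] * d[i];  /* ... because this later store to
                                             the same a[i] kills it.  */
        }
    }

With this patch, dse1 queues the last store to a[i]; dse_def_ref_analysis
keeps it in the queue because no statement in between changes the index or
uses a or a[i]; and when the walk reaches the first store,
dse_search_def_stores confirms it is killed, so it is deleted.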
---
 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c |  30 ++++
 gcc/tree-ssa-dse.cc                     | 209 +++++++++++++++++++++++-
 2 files changed, 236 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
new file mode 100644
index 00000000000..57cea77da7c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1-details" } */
+typedef float real_t;
+
+#define iterations 100000
+#define LEN_1D 32000
+#define LEN_2D 256
+real_t flat_2d_array[LEN_2D*LEN_2D];
+
+real_t x[LEN_1D];
+
+real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
+bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];
+
+int indx[LEN_1D];
+
+real_t* __restrict__ xx;
+real_t* yy;
+real_t s243(void)
+{
+    for (int nl = 0; nl < iterations; nl++) {
+        for (int i = 0; i < LEN_1D-1; i++) {
+            a[i] = b[i] + c[i] * d[i];
+            b[i] = a[i] + d[i] * e[i];
+            a[i] = b[i] + a[i+1] * d[i];
+        }
+    }
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store" "dse1" } } */
\ No newline at end of file
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 34cfd1a8802..a8ca3672da2 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -1332,6 +1332,186 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
   return true;
 }
 
+/* Search the stores_queue to see whether there is a later store that writes
+   to the same location as STMT and thus kills the def of STMT.  Return true
+   if STMT can be deleted.  */
+
+static bool
+dse_search_def_stores (function *fun, auto_vec<gimple *> &stores_queue,
+                       gimple *stmt)
+{
+  /* Consider the following sequence:
+       a[i_18] = _5;
+       _8 = e[i_18];
+       _9 = _3 * _8;
+       _10 = _5 + _9;
+       b[i_18] = _10;
+       _12 = i_18 + 1;
+       _13 = a[_12];
+       _15 = _3 * _13;
+       _16 = _10 + _15;
+       a[i_18] = _16
+
+     We should be able to remove a[i_18] = _5.  */
+  for (unsigned int i = 0; i < stores_queue.length (); ++i)
+    {
+      if (!stores_queue[i])
+        continue;
+      tree lhs1 = gimple_assign_lhs (stores_queue[i]);
+      tree lhs2 = gimple_assign_lhs (stmt);
+
+      if (TREE_CODE (lhs1) != TREE_CODE (lhs2))
+        continue;
+      if (operand_equal_p (gimple_assign_lhs (stores_queue[i]),
+                           gimple_assign_lhs (stmt), OEP_ADDRESS_OF))
+        {
+          /* Whether or not STMT can be eliminated, remove the store
+             from the worklist.  */
+          stores_queue[i] = NULL;
+          if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
+              && !is_ctrl_altering_stmt (stmt)
+              && (!stmt_could_throw_p (fun, stmt)
+                  || fun->can_delete_dead_exceptions))
+            return true;
+        }
+    }
+
+  return false;
+}
+
+/* Return true if the TREE_CODE of the mem op is allowed to do dse
+   according to def-ref analysis.  */
+
+static bool
+dse_can_def_ref_p (gimple *stmt)
+{
+  /* TODO: For now, we only support dse according to
+     def-ref analysis for ARRAY_REF.  */
+  return TREE_CODE (gimple_assign_lhs (stmt)) == ARRAY_REF;
+}
+
+/* Perform def-ref analysis on all the stores of the stores_queue worklist.
+   Since dse runs a reverse program order walk, the stores in stores_queue
+   are always after STMT; clear a store in the stores_queue if the address
+   of its lhs is changed by STMT or its lhs is used in STMT.  */
+
+static void
+dse_def_ref_analysis (gimple *stmt, auto_vec<gimple *> &stores_queue)
+{
+  for (unsigned int i = 0; i < stores_queue.length (); ++i)
+    {
+      if (!stores_queue[i])
+        continue;
+
+      /* If STMT is neither a call nor an assignment, give up on
+         the queued store.  */
+      if (gimple_code (stmt) != GIMPLE_CALL
+          && gimple_code (stmt) != GIMPLE_ASSIGN)
+        {
+          stores_queue[i] = NULL;
+          continue;
+        }
+
+      tree lhs = gimple_get_lhs (stores_queue[i]);
+      if (!lhs)
+        continue;
+      switch (TREE_CODE (lhs))
+        {
+          /* Handle an array reference like a[i_18]:
+             1. If i_18 is changed by STMT (the lhs of STMT is i_18),
+                remove the store from the stores_queue.
+
+             2. If a or a[i_18] is used in STMT, remove the store from
+                the stores_queue.  */
+        case ARRAY_REF:
+          {
+            if (gimple_get_lhs (stmt)
+                && operand_equal_p (gimple_get_lhs (stmt),
+                                    TREE_OPERAND (lhs, 1), 0))
+              {
+                stores_queue[i] = NULL;
+                continue;
+              }
+
+            for (unsigned int j = 0; j < gimple_num_ops (stmt); ++j)
+              {
+                tree op = gimple_op (stmt, j);
+                if (!op)
+                  continue;
+
+                /* We only check the use operands.  */
+                if (op == gimple_get_lhs (stmt))
+                  continue;
+
+                if (TREE_OPERAND_LENGTH (op) == 0)
+                  {
+                    if (operand_equal_p (op, lhs, 0)
+                        || operand_equal_p (op, TREE_OPERAND (lhs, 0), 0))
+                      stores_queue[i] = NULL;
+                  }
+                else
+                  {
+                    for (int k = 0; k < TREE_OPERAND_LENGTH (op); ++k)
+                      {
+                        if (!TREE_OPERAND (op, k))
+                          continue;
+
+                        if (TREE_CODE (op) == ARRAY_REF)
+                          {
+                            /* Disable the optimization in a case like this:
+                                 a[i_18] = _5;
+                                 ...
+                                 foo (a[i_18]).
+
+                               Don't disable it in a case like this:
+                                 a[i_18] = _5;
+                                 ...
+                                 foo (a[i_12]).  */
+                            if (operand_equal_p (op, lhs, OEP_ADDRESS_OF))
+                              {
+                                stores_queue[i] = NULL;
+                                break;
+                              }
+                          }
+                        else
+                          {
+                            /* To be conservative, if TREE_OPERAND (op, k)
+                               has operands of its own, disable the possible
+                               dse optimization.
+                               Disable the optimization in a case like this:
+                                 a[i_18] = _5;
+                                 ...
+                                 foo (&a).
+
+                               Or:
+                                 a[i_18] = _5;
+                                 ...
+                                 foo (test (&a)).  */
+                            if (TREE_OPERAND_LENGTH (TREE_OPERAND (op, k)) > 0
+                                || operand_equal_p (TREE_OPERAND (op, k), lhs,
+                                                    0)
+                                || operand_equal_p (TREE_OPERAND (op, k),
+                                                    TREE_OPERAND (lhs, 0), 0))
+                              {
+                                stores_queue[i] = NULL;
+                                break;
+                              }
+                          }
+                      }
+                  }
+                if (!stores_queue[i])
+                  break;
+              }
+          }
+          break;
+        default:
+          gcc_unreachable ();
+        }
+    }
+}
+
 /* Attempt to eliminate dead stores in the statement referenced by BSI.
 
    A dead store is a store into a memory location which will later be
@@ -1344,7 +1524,8 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
    post dominates the first store, then the first store is dead.  */
 
 static void
-dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
+dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes,
+                   auto_vec<gimple *> &stores_queue)
 {
   gimple *stmt = gsi_stmt (*gsi);
 
@@ -1469,7 +1650,25 @@ dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
                                          byte_tracking_enabled, live_bytes, &by_clobber_p);
 
       if (store_status == DSE_STORE_LIVE)
-        return;
+        {
+          if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
+              && !is_ctrl_altering_stmt (stmt)
+              && (!stmt_could_throw_p (fun, stmt)
+                  || fun->can_delete_dead_exceptions))
+            {
+              if (dse_search_def_stores (fun, stores_queue, stmt))
+                ;
+              else if (dse_can_def_ref_p (stmt))
+                {
+                  stores_queue.safe_push (stmt);
+                  return;
+                }
+              else
+                return;
+            }
+          else
+            return;
+        }
 
       if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
         {
@@ -1566,13 +1765,17 @@ pass_dse::execute (function *fun)
     {
       basic_block bb = BASIC_BLOCK_FOR_FN (fun, rpo[i-1]);
       gimple_stmt_iterator gsi;
+      /* Queue the stores that might overwrite the memory written by an
+         earlier store.  We only do this optimization within a basic block.  */
+      auto_vec<gimple *> stores_queue;
       for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
         {
           gimple *stmt = gsi_stmt (gsi);
+          dse_def_ref_analysis (stmt, stores_queue);
 
           if (gimple_vdef (stmt))
-            dse_optimize_stmt (fun, &gsi, live_bytes);
+            dse_optimize_stmt (fun, &gsi, live_bytes, stores_queue);
           else if (def_operand_p
                      def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
             {