From patchwork Tue Sep 27 19:53:56 2022
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 58092
Date: Tue, 27 Sep 2022 13:53:56 -0600
From: Jeff Law
Subject: [RFA] Avoid unnecessary load-immediate in coremark
To: gcc-patches@gcc.gnu.org

This is another minor improvement to coremark.  I suspect it only
improves code size, as the load-immediate was likely issuing with the
ret statement on multi-issue machines.

Basically we're failing to utilize conditional equivalences during the
post-reload CSE pass.  So if a particular block is only reached when a
certain condition holds (say, for example, a4 == 0) and the block has
an assignment like a4 = 0, we would fail to eliminate the unnecessary
assignment.
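As a concrete sketch (hand-written, illustrative RISC-V assembly, not
compiler output, and assuming the branch shown is the block's only
predecessor), the shape of the missed optimization is:

        beq     a4,zero,.L3     # .L3 is only reached when a4 == 0
        ...
.L3:
        li      a4,0            # redundant, a4 is already zero here
        ret

In RTL terms, the two forms of conditional jump the patch digs into
look roughly like this (paraphrased from the code below; the register
and operands are just placeholders):

    ;; EQ condition jumping to the block in question ...
    (set (pc) (if_then_else (eq (reg a4) (const_int 0))
                            (label_ref <head of block>)
                            (pc)))

    ;; ... or NE condition falling through to it.
    (set (pc) (if_then_else (ne (reg a4) (const_int 0))
                            (label_ref <elsewhere>)
                            (pc)))

In both cases the implicit set we synthesize is simply
(set (reg a4) (const_int 0)).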
So the way this works: as we enter each block in reload_cse_regs_1, we
look at the block's predecessors to see if all of them have the same
implicit assignment.  If they do, we create a dummy insn representing
that implicit assignment.

Before processing the first real insn, we enter the implicit
assignment into the cselib hash tables.  This deferred action is
necessary because of CODE_LABEL handling in cselib -- when it sees a
CODE_LABEL it wipes its state.  So we have to add the implicit
assignment after processing the (optional) CODE_LABEL, but before
processing the real insns.

Note we have to walk all the block's predecessors to verify they all
have the same implicit assignment.  That could potentially be
expensive, so we limit it to cases with only a few predecessors.  For
reference, on x86_64, 81% of the cases where an implicit assignment
can be found are single-predecessor blocks; 96% are covered by two
preds, 99.1% by 3 preds, 99.6% by 4 preds, 99.8% by 5 preds, and so
on.  While there were cases where all 19 preds had the same implicit
assignment, capturing those just doesn't seem terribly important.  I
put the clamp at 3 preds.  If folks think it's important, I could
certainly make that a PARAM.

Bootstrapped and regression tested on x86.  Bootstrapped on riscv as
well.

OK for the trunk?

Jeff

gcc/
        * postreload.cc (reload_cse_regs_1): Record implicit sets from
        conditional branches into the cselib tables.

gcc/testsuite/
        * gcc.target/riscv/implicit-set.c: New test.

diff --git a/gcc/postreload.cc b/gcc/postreload.cc
index 41f61d32648..2f155a239ae 100644
--- a/gcc/postreload.cc
+++ b/gcc/postreload.cc
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "emit-rtl.h"
 #include "recog.h"
 
+#include "cfghooks.h"
 #include "cfgrtl.h"
 #include "cfgbuild.h"
 #include "cfgcleanup.h"
@@ -221,13 +222,108 @@ reload_cse_regs_1 (void)
   init_alias_analysis ();
 
   FOR_EACH_BB_FN (bb, cfun)
-    FOR_BB_INSNS (bb, insn)
-      {
-        if (INSN_P (insn))
-          cfg_changed |= reload_cse_simplify (insn, testreg);
+    {
+      /* If BB has a small number of predecessors, see if each of them
+         has the same implicit set.  If so, record that implicit set so
+         that we can add it to the cselib tables.  */
+      rtx_insn *implicit_set;
 
-        cselib_process_insn (insn);
-      }
+      implicit_set = NULL;
+      if (EDGE_COUNT (bb->preds) <= 3)
+        {
+          edge e;
+          edge_iterator ei;
+          rtx src = NULL_RTX;
+          rtx dest = NULL_RTX;
+          bool found = true;
+
+          /* Iterate over each incoming edge and see if they
+             all have the same implicit set.  */
+          FOR_EACH_EDGE (e, ei, bb->preds)
+            {
+              /* If the predecessor does not end in a conditional
+                 jump, then it does not have an implicit set.  */
+              if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
+                  && !block_ends_with_condjump_p (e->src))
+                {
+                  found = false;
+                  break;
+                }
+
+              /* We know the predecessor ends with a conditional
+                 jump.  Now dig into the actual form of the jump
+                 to potentially extract an implicit set.  */
+              rtx_insn *condjump = BB_END (e->src);
+              if (condjump
+                  && any_condjump_p (condjump)
+                  && onlyjump_p (condjump))
+                {
+                  /* Extract the condition.  */
+                  rtx pat = PATTERN (condjump);
+                  rtx i_t_e = SET_SRC (pat);
+                  gcc_assert (GET_CODE (i_t_e) == IF_THEN_ELSE);
+                  rtx cond = XEXP (i_t_e, 0);
+                  if ((GET_CODE (cond) == EQ
+                       && GET_CODE (XEXP (i_t_e, 1)) == LABEL_REF
+                       && XEXP (XEXP (i_t_e, 1), 0) == BB_HEAD (bb))
+                      || (GET_CODE (cond) == NE
+                          && XEXP (i_t_e, 2) == pc_rtx
+                          && e->src->next_bb == bb))
+                    {
+                      /* If this is the first time through, record
+                         the source and destination.  */
+                      if (!dest)
+                        {
+                          dest = XEXP (cond, 0);
+                          src = XEXP (cond, 1);
+                        }
+                      /* If this is not the first time through, then
+                         verify the source and destination match.  */
+                      else if (dest == XEXP (cond, 0) && src == XEXP (cond, 1))
+                        ;
+                      else
+                        {
+                          found = false;
+                          break;
+                        }
+                    }
+                }
+              else
+                {
+                  found = false;
+                  break;
+                }
+            }
+
+          /* If all the incoming edges had the same implicit
+             set, then create a dummy insn for that set.
+
+             It will be entered into the cselib tables before
+             we process the first real insn in this block.  */
+          if (dest && found)
+            implicit_set = make_insn_raw (gen_rtx_SET (dest, src));
+        }
+
+      FOR_BB_INSNS (bb, insn)
+        {
+          if (INSN_P (insn))
+            {
+              /* If we recorded an implicit set, enter it
+                 into the tables before the first real insn.
+
+                 We have to do it this way because a CODE_LABEL
+                 will flush the cselib tables.  */
+              if (implicit_set)
+                {
+                  cselib_process_insn (implicit_set);
+                  implicit_set = NULL;
+                }
+              cfg_changed |= reload_cse_simplify (insn, testreg);
+            }
+
+          cselib_process_insn (insn);
+        }
+    }
 
   /* Clean up.  */
   end_alias_analysis ();
diff --git a/gcc/testsuite/gcc.target/riscv/implicit-set.c b/gcc/testsuite/gcc.target/riscv/implicit-set.c
new file mode 100644
index 00000000000..91106bb5d80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/implicit-set.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -dp" } */
+/* This was extracted from coremark.  */
+
+
+typedef signed short ee_s16;
+typedef struct list_data_s
+{
+  ee_s16 data16;
+  ee_s16 idx;
+} list_data;
+
+typedef struct list_head_s
+{
+  struct list_head_s *next;
+  struct list_data_s *info;
+} list_head;
+
+
+list_head *
+core_list_find(list_head *list, list_data *info)
+{
+  if (info->idx >= 0)
+    {
+      while (list && (list->info->idx != info->idx))
+        list = list->next;
+      return list;
+    }
+  else
+    {
+      while (list && ((list->info->data16 & 0xff) != info->data16))
+        list = list->next;
+      return list;
+    }
+}
+
+/* There was an unnecessary assignment to the return value until
+   recently.  Scan for that in the resulting output.  */
+/* { dg-final { scan-assembler-not "li\\ta0,0" } } */
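For reference, the redundant instruction this test guards against sat
in the loop-exit path of core_list_find.  Roughly (an illustrative
sketch of the expected shape, not verbatim compiler output; only the
"li a0,0" comes from the scan pattern above):

        # loop exits to .L5 only when list (in a0) is already zero
        beq     a0,zero,.L5
        ...
.L5:
        li      a0,0            # redundant load-immediate, now removed
        ret

With the implicit set a0 == 0 recorded on entry to the exit block,
post-reload CSE deletes the li, leaving just the ret.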