From patchwork Thu Oct 31 01:06:02 2024
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 99857
X-Patchwork-Delegate: jlaw@ventanamicro.com
Date: Wed, 30 Oct 2024 19:06:02 -0600
From: Jeff Law
To: gcc-patches@gcc.gnu.org
Subject: [RISC-V] Reorder the ready queue to avoid extraneous vsetvls

So this patch is a very conservative approach to eliminating more vsetvl
instructions.  As we know, scheduling can scramble the instruction stream
based on a variety of factors, and it can easily produce a sequence that
ping-pongs between different vector configurations, resulting in more
vsetvl instructions than if the instructions had been issued in a slightly
different order.

Robin did some experiments with vsetvl-aware scheduling several weeks ago.
I believe he was adjusting the priority or cost of the insns based on their
vsetvl needs.  That experiment actually showed worse performance on our
design, which is a good indicator that scheduling for latency and
functional hazards is more important than eliminating vsetvl instructions
(not a surprise, obviously).

That experiment greatly influenced this implementation.  Specifically, we
don't adjust the cost/priority of any insns.  Instead we use vector
configuration information to potentially swap two insns in the scheduler's
ready queue, iff swapping would avoid a vsetvl and the two insns being
swapped have the same priority/cost.
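As a rough illustration of what "the same vector configuration" means here, the following is a simplified, hypothetical model: the `vconfig` struct and `compatible_p` stand in for the vl/vtype state the patch derives from an insn (via `get_avl`, `get_sew`, `get_vlmul`, `tail_agnostic_p` and `mask_agnostic_p`); they are not the actual GCC types.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the vl/vtype state a vector insn needs.
struct vconfig
{
  int64_t avl;    // application vector length (constant AVLs only)
  uint8_t sew;    // selected element width in bits: 8/16/32/64
  uint8_t vlmul;  // register group multiplier encoding
  bool ta;        // tail agnostic
  bool ma;        // mask agnostic
};

// Two back-to-back vector insns can share one vsetvl only if every
// field of the required configuration matches exactly.
static bool
compatible_p (const vconfig &a, const vconfig &b)
{
  return (a.avl == b.avl && a.sew == b.sew && a.vlmul == b.vlmul
	  && a.ta == b.ta && a.ma == b.ma);
}
```

Any mismatch, even a single field such as SEW, forces a new vsetvl between the two insns.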
So it's quite conservative, essentially using the vector configuration as a
sort key of last resort, only for the highest-priority insns in the ready
queue, and only when it's going to let us eliminate a vsetvl.

For something like SATD we eliminate a few gratuitous vsetvl instructions.
As expected, it makes no performance difference in the pico benchmarks
we've built on our design.  The main goal here is to sidestep questions
about the over-active changing of vector configurations.  My BPI is busy,
so I don't have performance data on that design, but the patch does result
in a ~1% drop in dynamic instructions.  On a high performance design I
wouldn't expect this to help performance in any significant way, but it
does make the code look better.

Bootstrapped and regression tested on riscv64-linux-gnu, also regression
tested in my tester for rv32 and rv64 elf configurations.  I'll wait for
the pre-commit tester to render a verdict *and* for others who know the
vsetvl code to have time to chime in.  Assuming a clean run from
pre-commit and no negative comments, the plan would be to commit sometime
next week.

Jeff

	* config/riscv/riscv-protos.h (has_vtype_op): Declare.
	(mask_agnostic_p, get_avl, vsetvl_insn_p): Likewise.
	* config/riscv/riscv-vsetvl.cc (has_vtype_op): No longer static.
	(vsetvl_insn_p, get_avl, mask_agnostic_p): Likewise.
	* config/riscv/riscv.cc (last_vconfig): New structure.
	(clear_vconfig, compatible_with_last_vconfig): New functions.
	(riscv_sched_init, riscv_sched_reorder): Likewise.
	(riscv_sched_variable_issue): Record vector config as insns
	are issued.
	(TARGET_SCHED_INIT, TARGET_SCHED_REORDER): Define.
	* gcc.target/riscv/rvv/vsetvl/pr111037-1.c: Adjust expected output.
	* gcc.target/riscv/rvv/vsetvl/pr111037-4.c: Likewise.
	* gcc.target/riscv/rvv/vsetvl/pr113248.c: Likewise.
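The reorder heuristic in the patch below can be sketched as a small self-contained model.  This is hypothetical, simplified code: `model_insn` and a flat `vconfig` id stand in for GCC's `rtx_insn *`, `INSN_PRIORITY` and the tracked vl/vtype state; the real hook operates on the scheduler's `rtx_insn **` ready array.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ready-queue entry.
struct model_insn
{
  int priority;   // scheduler priority; higher issues first
  int vconfig;    // flat id for <avl, sew, vlmul, ta, ma>; -1 = not vector
};

// GCC's ready queue keeps the next insn to issue at the *end* of the
// array.  Walk backward through the block of insns tied for the highest
// priority; if one of them matches the last issued vector configuration,
// swap it into the issue slot so no new vsetvl is needed.
static void
model_sched_reorder (std::vector<model_insn> &ready, int last_vconfig)
{
  if (ready.size () < 2)
    return;

  int n = ready.size ();
  int priority = ready[n - 1].priority;
  for (int i = n - 1; i >= 0; i--)
    {
      // Be conservative: never promote an insn of lower priority.
      if (ready[i].priority < priority)
	break;

      if (ready[i].vconfig == last_vconfig)
	{
	  if (i != n - 1)
	    std::swap (ready[i], ready[n - 1]);
	  break;
	}
    }
}
```

Because the walk stops at the first priority drop, an insn with a matching configuration but a lower priority is deliberately left alone; the configuration only breaks ties, it never overrides the scheduler's latency-driven ordering.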
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 1e6d10a1402..96d0e13577e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -197,6 +197,12 @@
 rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
 rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
+/* Routines implemented in riscv-vsetvl.cc.  */
+extern bool has_vtype_op (rtx_insn *);
+extern bool mask_agnostic_p (rtx_insn *);
+extern rtx get_avl (rtx_insn *);
+extern bool vsetvl_insn_p (rtx_insn *);
+
 /* Routines implemented in riscv-string.c.  */
 extern bool riscv_expand_block_compare (rtx, rtx, rtx, rtx);
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 030ffbe2ebb..925b95b4063 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -259,7 +259,7 @@ policy_to_str (bool agnostic_p)
 
 /* Return true if it is an RVV instruction depends on VTYPE global
    status register.  */
-static bool
+bool
 has_vtype_op (rtx_insn *rinsn)
 {
   return recog_memoized (rinsn) >= 0 && get_attr_has_vtype_op (rinsn);
@@ -307,7 +307,7 @@ vector_config_insn_p (rtx_insn *rinsn)
 }
 
 /* Return true if it is vsetvldi or vsetvlsi.  */
-static bool
+bool
 vsetvl_insn_p (rtx_insn *rinsn)
 {
   if (!rinsn || !vector_config_insn_p (rinsn))
@@ -387,7 +387,7 @@ get_vl (rtx_insn *rinsn)
 }
 
 /* Helper function to get AVL operand.  */
-static rtx
+rtx
 get_avl (rtx_insn *rinsn)
 {
   if (vsetvl_insn_p (rinsn) || vsetvl_discard_result_insn_p (rinsn))
@@ -412,7 +412,7 @@ get_default_ma ()
 }
 
 /* Helper function to get MA operand.  */
-static bool
+bool
 mask_agnostic_p (rtx_insn *rinsn)
 {
   /* If it doesn't have MA, we return agnostic by default.
      */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e111cb07284..71a1aaf96fb 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9612,6 +9612,71 @@ riscv_issue_rate (void)
   return tune_param->issue_rate;
 }
 
+/* Structure for very basic vector configuration tracking in the
+   scheduler.  */
+struct last_vconfig
+{
+  bool valid;
+  bool ta;
+  bool ma;
+  uint8_t sew;
+  uint8_t vlmul;
+  rtx avl;
+} last_vconfig;
+
+/* Clear LAST_VCONFIG so we have no known state.  */
+static void
+clear_vconfig (void)
+{
+  memset (&last_vconfig, 0, sizeof (last_vconfig));
+}
+
+/* Return TRUE if INSN is a vector insn needing a particular
+   vector configuration that is trivially equal to the last
+   vector insn issued.  Return FALSE otherwise.  */
+static bool
+compatible_with_last_vconfig (rtx_insn *insn)
+{
+  /* We might be able to extract the data from a preexisting vsetvl.  */
+  if (vsetvl_insn_p (insn))
+    return false;
+
+  /* Nothing to do for these cases.  */
+  if (!NONDEBUG_INSN_P (insn) || !has_vtype_op (insn))
+    return false;
+
+  extract_insn_cached (insn);
+
+  rtx avl = get_avl (insn);
+  if (avl != last_vconfig.avl)
+    return false;
+
+  if (get_sew (insn) != last_vconfig.sew)
+    return false;
+
+  if (get_vlmul (insn) != last_vconfig.vlmul)
+    return false;
+
+  if (tail_agnostic_p (insn) != last_vconfig.ta)
+    return false;
+
+  if (mask_agnostic_p (insn) != last_vconfig.ma)
+    return false;
+
+  /* No differences found, they're trivially compatible.  */
+  return true;
+}
+
+/* Implement TARGET_SCHED_INIT, we use this to track the vector
+   configuration of the last issued vector instruction.  We can then use
+   that information to potentially adjust the ready queue to issue
+   instructions of a compatible vector configuration instead of a
+   conflicting configuration.  That will reduce the number of vsetvl
+   instructions we ultimately emit.
+   */
+static void
+riscv_sched_init (FILE *, int, int)
+{
+  clear_vconfig ();
+}
+
 /* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
 static int
 riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, int more)
@@ -9636,9 +9701,85 @@
      an assert so we can find and fix this problem.  */
   gcc_assert (insn_has_dfa_reservation_p (insn));
 
+  /* If this is a vector insn with vl/vtype info, then record the last
+     vector configuration.  */
+  if (vsetvl_insn_p (insn))
+    clear_vconfig ();
+  else if (NONDEBUG_INSN_P (insn) && has_vtype_op (insn))
+    {
+      extract_insn_cached (insn);
+
+      rtx avl = get_avl (insn);
+      if (avl == RVV_VLMAX)
+	avl = const0_rtx;
+
+      if (!avl || !CONST_INT_P (avl))
+	clear_vconfig ();
+      else
+	{
+	  last_vconfig.valid = true;
+	  last_vconfig.avl = avl;
+	  last_vconfig.sew = get_sew (insn);
+	  last_vconfig.vlmul = get_vlmul (insn);
+	  last_vconfig.ta = tail_agnostic_p (insn);
+	  last_vconfig.ma = mask_agnostic_p (insn);
+	}
+    }
   return more - 1;
 }
 
+/* Implement TARGET_SCHED_REORDER.  The goal here is to look at the ready
+   queue and reorder it ever so slightly to encourage issuing an insn with
+   the same vector configuration as the most recently issued vector
+   instruction.  That will reduce vsetvl instructions.  */
+static int
+riscv_sched_reorder (FILE *, int, rtx_insn **ready, int *nreadyp, int)
+{
+  /* If we don't have a valid prior vector configuration, then there is
+     no point in reordering the ready queue, similarly if there is
+     just one entry in the queue.  */
+  if (!last_vconfig.valid || *nreadyp == 1)
+    return riscv_issue_rate ();
+
+  int nready = *nreadyp;
+  int priority = INSN_PRIORITY (ready[nready - 1]);
+  for (int i = nready - 1; i >= 0; i--)
+    {
+      rtx_insn *insn = ready[i];
+
+      /* On a high performance core, vsetvl instructions should be
+	 inexpensive.  Removing them is very much a secondary concern, so
+	 be extremely conservative with reordering, essentially only
+	 allowing reordering within the highest priority value.
+
+	 Lower end cores may benefit from more flexibility here.  That
+	 tuning is left to those who understand their core's behavior
+	 and can thoroughly benchmark the result.  Assuming such
+	 designs appear, we can probably put an entry in the tuning
+	 structure to indicate how much difference in priority to allow.  */
+      if (INSN_PRIORITY (insn) < priority)
+	break;
+
+      if (compatible_with_last_vconfig (insn))
+	{
+	  /* This entry is compatible with the last vconfig and has
+	     the same priority as the most important insn.  So swap
+	     it so that we keep the vector configuration as-is and
+	     ultimately eliminate a vsetvl.
+
+	     Note no need to swap if this is the first entry in the
+	     queue.  */
+	  if (i == nready - 1)
+	    break;
+
+	  std::swap (ready[i], ready[nready - 1]);
+	  break;
+	}
+    }
+
+  return riscv_issue_rate ();
+}
+
 /* Implement TARGET_SCHED_MACRO_FUSION_P.  Return true if target supports
    instruction fusion of some sort.
   */
@@ -12610,9 +12751,15 @@ riscv_stack_clash_protection_alloca_probe_range (void)
 #undef TARGET_SCHED_MACRO_FUSION_PAIR_P
 #define TARGET_SCHED_MACRO_FUSION_PAIR_P riscv_macro_fusion_pair_p
 
+#undef TARGET_SCHED_INIT
+#define TARGET_SCHED_INIT riscv_sched_init
+
 #undef TARGET_SCHED_VARIABLE_ISSUE
 #define TARGET_SCHED_VARIABLE_ISSUE riscv_sched_variable_issue
 
+#undef TARGET_SCHED_REORDER
+#define TARGET_SCHED_REORDER riscv_sched_reorder
+
 #undef TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST riscv_sched_adjust_cost
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-1.c
index 803ce5702eb..96c1f312cc4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-1.c
@@ -11,5 +11,5 @@ void foo(_Float16 y, int64_t *i64p)
   asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e16,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-4.c
index 5949085bdc9..6cf3c64679f 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-4.c
@@ -12,5 +12,5 @@ void foo(_Float16 y, int16_t z, int64_t *i64p)
   asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy), "vr" (vz));
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e16,\s*m1,\s*t[au],\s*m[au]} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113248.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113248.c
index d95281362a8..fe573933977 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113248.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113248.c
@@ -11,5 +11,5 @@ void foo(_Float16 y, int64_t *i64p)
   asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e16,\s*m1,\s*t[au],\s*m[au]} 1 } } */