Message ID | 20221108195434.2701247-1-philipp.tomsich@vrull.eu
---|---
State | Deferred, archived
Headers |
From: Philipp Tomsich <philipp.tomsich@vrull.eu>
To: gcc-patches@gcc.gnu.org
Cc: Kito Cheng <kito.cheng@gmail.com>, Vineet Gupta <vineetg@rivosinc.com>, Christoph Muellner <christoph.muellner@vrull.eu>, Palmer Dabbelt <palmer@rivosinc.com>, Jeff Law <jlaw@ventanamicro.com>
Subject: [PATCH] RISC-V: costs: support shift-and-add in strength-reduction
Date: Tue, 8 Nov 2022 20:54:34 +0100
Message-Id: <20221108195434.2701247-1-philipp.tomsich@vrull.eu>
Series | RISC-V: costs: support shift-and-add in strength-reduction
Commit Message
Philipp Tomsich
Nov. 8, 2022, 7:54 p.m. UTC
The strength-reduction implementation in expmed.c will assess the
profitability of using shift-and-add using an RTL expression that wraps
a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
function recognizes this as expressing a sh[123]add instruction, we
will return an inflated cost---thus defeating the optimization.

This change adds the necessary idiom recognition to provide an
accurate cost for this form of expressing sh[123]add.

Instead of expanding to
    li      a5,200
    mulw    a0,a5,a0
with this change, the expression 'a * 200' is synthesized as:
    sh2add  a0,a0,a0   // *5 = a + 4 * a
    sh2add  a0,a0,a0   // *5 = a + 4 * a
    slli    a0,a0,3    // *8

gcc/ChangeLog:

	* config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
	if expressed as a plus and multiplication with a power-of-2.

---
 gcc/config/riscv/riscv.cc | 13 +++++++++++++
 1 file changed, 13 insertions(+)
Comments
On Tue, 08 Nov 2022 11:54:34 PST (-0800), philipp.tomsich@vrull.eu wrote:
> The strength-reduction implementation in expmed.c will assess the
> profitability of using shift-and-add using an RTL expression that wraps
> a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
> function recognizes this as expressing a sh[123]add instruction, we
> will return an inflated cost---thus defeating the optimization.
>
> This change adds the necessary idiom recognition to provide an
> accurate cost for this form of expressing sh[123]add.
>
> Instead of expanding to
>     li      a5,200
>     mulw    a0,a5,a0
> with this change, the expression 'a * 200' is synthesized as:
>     sh2add  a0,a0,a0   // *5 = a + 4 * a
>     sh2add  a0,a0,a0   // *5 = a + 4 * a
>     slli    a0,a0,3    // *8

That's more instructions, but multiplication is generally expensive.  At
some point I remember the SiFive cores getting very fast integer
multipliers, but I don't see that reflected in the cost model anywhere
so maybe I'm just wrong?  Andrew or Kito might remember...

If the mul-based sequences are still faster on the SiFive cores then we
should probably find a way to keep emitting them, which may just be a
matter of adjusting those multiply costs.  Moving to the shift-based
sequences seems reasonable for a generic target, though.

Either way, it probably warrants a test case to make sure we don't
regress in the future.

> gcc/ChangeLog:
>
> 	* config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
> 	if expressed as a plus and multiplication with a power-of-2.
>
> ---
>
>  gcc/config/riscv/riscv.cc | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index ab6c745c722..0b2c4b3599d 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -2451,6 +2451,19 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
>  	*total = COSTS_N_INSNS (1);
>  	return true;
>        }
> +      /* Before strength-reduction, the shNadd can be expressed as the addition
> +	 of a multiplication with a power-of-two.  If this case is not handled,
> +	 the strength-reduction in expmed.c will calculate an inflated cost.  */
> +      if (TARGET_ZBA
> +	  && mode == word_mode
> +	  && GET_CODE (XEXP (x, 0)) == MULT
> +	  && REG_P (XEXP (XEXP (x, 0), 0))
> +	  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
> +	  && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))

IIUC the fall-through is biting us here and this matches power-of-2 +1
and power-of-2 -1.  That looks to be the case for the one below, though,
so not sure if I'm just missing something?

> +	{
> +	  *total = COSTS_N_INSNS (1);
> +	  return true;
> +	}
>        /* shNadd.uw pattern for zba.
> 	 [(set (match_operand:DI 0 "register_operand" "=r")
> 	       (plus:DI
On Thu, 10 Nov 2022 at 02:46, Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Tue, 08 Nov 2022 11:54:34 PST (-0800), philipp.tomsich@vrull.eu wrote:
> > The strength-reduction implementation in expmed.c will assess the
> > profitability of using shift-and-add using an RTL expression that wraps
> > a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
> > function recognizes this as expressing a sh[123]add instruction, we
> > will return an inflated cost---thus defeating the optimization.
> > [...]
>
> That's more instructions, but multiplication is generally expensive.  At
> some point I remember the SiFive cores getting very fast integer
> multipliers, but I don't see that reflected in the cost model anywhere
> so maybe I'm just wrong?  Andrew or Kito might remember...
>
> If the mul-based sequences are still faster on the SiFive cores then we
> should probably find a way to keep emitting them, which may just be a
> matter of adjusting those multiply costs.  Moving to the shift-based
> sequences seems reasonable for a generic target, though.

The cost for a regular MULT is COSTS_N_INSNS (4) for the series-7 (see
the SImode and DImode entries in the int_mul line):

    /* Costs to use when optimizing for Sifive 7 Series.  */
    static const struct riscv_tune_param sifive_7_tune_info = {
      {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},	/* fp_add */
      {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},	/* fp_mul */
      {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},	/* fp_div */
      {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},	/* int_mul */
      {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},	/* int_div */
      2,	/* issue_rate */
      4,	/* branch_cost */
      3,	/* memory_cost */
      8,	/* fmv_cost */
      true,	/* slow_unaligned_access */
    };

So the break-even is at COSTS_N_INSNS (4) + rtx_cost(immediate).

Testing against series-7, we get up to 5 (4 for the mul + 1 for the li)
instructions from strength reduction:

    val * 783
    =>
    sh1add  a5,a0,a0
    slli    a5,a5,4
    add     a5,a5,a0
    slli    a5,a5,4
    sub     a0,a5,a0

but fall back to a mul, once the cost exceeds this:

    val * 1574
    =>
    li      a5,1574
    mul     a0,a0,a5

> Either way, it probably warrants a test case to make sure we don't
> regress in the future.

Ack.  Will be added for v2.

> > gcc/ChangeLog:
> >
> > 	* config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
> > 	if expressed as a plus and multiplication with a power-of-2.

This will still need to be regenerated (it's referring to a '.c'
extension still).

> > [...]
>
> IIUC the fall-through is biting us here and this matches power-of-2 +1
> and power-of-2 -1.  That looks to be the case for the one below, though,
> so not sure if I'm just missing something?

The strength-reduction in expmed.cc uses "(PLUS (reg) (MULT (reg)
<pow2>))" to express a shift-then-add.  Here's one of the relevant
snippets (from the internal costing in expmed.cc):

    all.shift_mult = gen_rtx_MULT (mode, all.reg, all.reg);
    all.shift_add = gen_rtx_PLUS (mode, all.shift_mult, all.reg);

So while we normally encounter a "(PLUS (reg) (ASHIFT (reg) <shamt>))",
for the strength-reduction we also need to provide the cost for the
expression with a MULT.  The other idioms (those matching above and
below the new one) always require an ASHIFT for the inner operand.
On Thu, 10 Nov 2022 07:09:35 PST (-0800), philipp.tomsich@vrull.eu wrote:
> On Thu, 10 Nov 2022 at 02:46, Palmer Dabbelt <palmer@rivosinc.com> wrote:
>> [...]
>> If the mul-based sequences are still faster on the SiFive cores then we
>> should probably find a way to keep emitting them, which may just be a
>> matter of adjusting those multiply costs.  Moving to the shift-based
>> sequences seems reasonable for a generic target, though.
>
> The cost for a regular MULT is COSTS_N_INSNS (4) for the series-7 (see
> the SImode and DImode entries in the int_mul line):
> [...]
>
> So the break-even is at COSTS_N_INSNS (4) + rtx_cost(immediate).
>
> Testing against series-7, we get up to 5 (4 for the mul + 1 for the
> li) instructions from strength reduction:
>
>     val * 783
>     =>
>     sh1add  a5,a0,a0
>     slli    a5,a5,4
>     add     a5,a5,a0
>     slli    a5,a5,4
>     sub     a0,a5,a0
>
> but fall back to a mul, once the cost exceeds this:
>
>     val * 1574
>     =>
>     li      a5,1574
>     mul     a0,a0,a5

That's just the cost model, though, not the hardware.  My argument was
essentially that the cost model is wrong, assuming how I remember the
hardware is right.  That was a while ago and there's a lot of options,
though, so I'm not sure what these things actually look like.

IMO that doesn't need to block this patch, though: having one incorrect
cost model so it cancels out another one is a great way to lose our
minds.

>> Either way, it probably warrants a test case to make sure we don't
>> regress in the future.
>
> Ack.  Will be added for v2.
>
>> > gcc/ChangeLog:
>> >
>> > 	* config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
>> > 	if expressed as a plus and multiplication with a power-of-2.
>
> This will still need to be regenerated (it's referring to a '.c'
> extension still).
>
>> > [...]
>>
>> IIUC the fall-through is biting us here and this matches power-of-2 +1
>> and power-of-2 -1.  That looks to be the case for the one below, though,
>> so not sure if I'm just missing something?
>
> The strength-reduction in expmed.cc uses "(PLUS (reg) (MULT (reg)
> <pow2>))" to express a shift-then-add.
> Here's one of the relevant snippets (from the internal costing in expmed.cc):
>
>     all.shift_mult = gen_rtx_MULT (mode, all.reg, all.reg);
>     all.shift_add = gen_rtx_PLUS (mode, all.shift_mult, all.reg);
>
> So while we normally encounter a "(PLUS (reg) (ASHIFT (reg)
> <shamt>))", for the strength-reduction we also need to provide the
> cost for the expression with a MULT.
> The other idioms (those matching above and below the new one) always
> require an ASHIFT for the inner operand.

I was trying to say this also matches "(MINUS (reg) (MULT (reg)
<pow2>))": the switch statement block it's in has PLUS and MINUS as a
fall through.  IIUC we don't have single-instruction patterns for the
minus case, unless I'm missing some sort of trick in ZBA?

That also looks to be the case for some of the other patterns in this
block.
On Thu, 10 Nov 2022 at 21:47, Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Thu, 10 Nov 2022 07:09:35 PST (-0800), philipp.tomsich@vrull.eu wrote:
> > [...]
>
> That's just the cost model, though, not the hardware.  My argument was
> essentially that the cost model is wrong, assuming how I remember the
> hardware is right.  That was a while ago and there's a lot of options,
> though, so I'm not sure what these things actually look like.
>
> IMO that doesn't need to block this patch, though: having one incorrect
> cost model so it cancels out another one is a great way to lose our
> minds.
>
> [...]
>
> I was trying to say this also matches "(MINUS (reg) (MULT (reg)
> <pow2>))": the switch statement block it's in has PLUS and MINUS as a
> fall through.  IIUC we don't have single-instruction patterns for the
> minus case, unless I'm missing some sort of trick in ZBA?

In fact, this was caught with the test cases being pulled in.  If left
unchecked, it led to cases with one or two extra instructions being
generated beyond the break-even point for multiplication.  So this will
also be queued up for v2.  Thanks for pointing it out!

> That also looks to be the case for some of the other patterns in this
> block.
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ab6c745c722..0b2c4b3599d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2451,6 +2451,19 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
 	  *total = COSTS_N_INSNS (1);
 	  return true;
 	}
+      /* Before strength-reduction, the shNadd can be expressed as the addition
+	 of a multiplication with a power-of-two.  If this case is not handled,
+	 the strength-reduction in expmed.c will calculate an inflated cost.  */
+      if (TARGET_ZBA
+	  && mode == word_mode
+	  && GET_CODE (XEXP (x, 0)) == MULT
+	  && REG_P (XEXP (XEXP (x, 0), 0))
+	  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+	  && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+	{
+	  *total = COSTS_N_INSNS (1);
+	  return true;
+	}
       /* shNadd.uw pattern for zba.
 	 [(set (match_operand:DI 0 "register_operand" "=r")
 	       (plus:DI