Message ID | 20220524214703.4022737-4-philipp.tomsich@vrull.eu |
---|---|
State | Deferred, archived |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 33E353857BBE for <patchwork@sourceware.org>; Tue, 24 May 2022 21:48:55 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-lf1-x136.google.com (mail-lf1-x136.google.com [IPv6:2a00:1450:4864:20::136]) by sourceware.org (Postfix) with ESMTPS id 7123338485AD for <gcc-patches@gcc.gnu.org>; Tue, 24 May 2022 21:47:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7123338485AD Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu Received: by mail-lf1-x136.google.com with SMTP id l13so26370306lfp.11 for <gcc-patches@gcc.gnu.org>; Tue, 24 May 2022 14:47:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull.eu; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=+a52S9cqnk2yl6uJOk+Rh5KH44H82rLH1ywEwAsI63E=; b=X7LSB71iYhvlZkuqjvmbfQWhzbtzu0uc60UedMX5MD8p7BE5sQJOzcDppzkfOgAcY6 1nY/pM6R/YCGaJLJGfglFW2aPZqNj0Yd6N7DEFFhCDbuPm8yOl2+IOI+GXc0FM2obhKJ rde6gsR3LuCGQgWrwLc8ubn46k7xaffgvZXb3bD7XoaQ9jYlHKtYhAi/iiKJcFsQAhAn bHHfSA9WbjFYZG7qRIaDQz7Pc2pmMyNfZzyG4A2P4caN6NSa1A0RjSE1He8MJKQsOnT+ 2mWFyEiQYxJSAaeJsAwl9CMj7G0g3f3umSF8wZO4GaL62Odx6Yk5yM8kvk8YcRkjoNga dJTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=+a52S9cqnk2yl6uJOk+Rh5KH44H82rLH1ywEwAsI63E=; b=R1vUXouXlFoEGweGD9jydRanpUfLXrakZtMa/hw9cWhSWIzIJjw0M0VVnHKLnwe7/o lD9V6WmgPKOV7GHOqhGuiRkEGDZeT55Q071bgnFYHUAvuVdoIizb7oCT80+NQHvvqgHo 
mWBkPSWbdRBkY9DQ9aktY5guOO/CVTff9wfeDuG0p9TPQ6YvZibE6Xcy3EKJTb2yiZ28 FIRJzjQF3mqoYJIwM9EupAPdhLgrWLUIIT5e3SuVDAtxn1zpo3iXUuy1RjK18AP+eC8f 5cNvckmChddsBoE5/jDe9AOff+LuzWYNs2kek1Z69ue/2VF2o0MhRPuaHIjZBPMwgkys Pr6A== X-Gm-Message-State: AOAM530yo+Tc4wsqNoBbwlCFflo8hobAz4IIl+C2X0FROaOLonFLZC7S 9XBIt3UOLUtkZ2MNxoalgSNG5HGlKCXiIQLV X-Google-Smtp-Source: ABdhPJxY/7lR/aj6acnffwBSoPQw5dtraYKedxpoB4goyPeHPbwQSmL7XUWVZ9mZfLz2o/1BaQoRIA== X-Received: by 2002:a05:6512:b99:b0:478:53da:91c with SMTP id b25-20020a0565120b9900b0047853da091cmr15463780lfv.583.1653428835801; Tue, 24 May 2022 14:47:15 -0700 (PDT) Received: from ubuntu-focal.. ([2a01:4f9:3a:1e26::2]) by smtp.gmail.com with ESMTPSA id c17-20020a2e9d91000000b0024f3d1dae98sm2724165ljj.32.2022.05.24.14.47.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 May 2022 14:47:15 -0700 (PDT) From: Philipp Tomsich <philipp.tomsich@vrull.eu> To: gcc-patches@gcc.gnu.org Subject: [PATCH v1 3/3] RISC-V: Replace zero_extendsidi2_shifted with generalized split Date: Tue, 24 May 2022 23:47:03 +0200 Message-Id: <20220524214703.4022737-4-philipp.tomsich@vrull.eu> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220524214703.4022737-1-philipp.tomsich@vrull.eu> References: <20220524214703.4022737-1-philipp.tomsich@vrull.eu> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: 
<https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Cc: Andrew Waterman <andrew@sifive.com>, Vineet Gupta <vineetg@rivosinc.com>, Kito Cheng <kito.cheng@gmail.com>, Philipp Tomsich <philipp.tomsich@vrull.eu> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series | RISC-V: Improve sequences with shifted zero-extended operands |
Commit Message
Philipp Tomsich
May 24, 2022, 9:47 p.m. UTC
The current method of treating shifts of extended values on RISC-V
frequently causes sequences of 3 shifts, despite the presence of the
'zero_extendsidi2_shifted' pattern.
Consider:
unsigned long f(unsigned int a, unsigned long b)
{
  a = a << 1;
  unsigned long c = (unsigned long) a;
  c = b + (c << 4);
  return c;
}
which will present at combine-time as:
Trying 7, 8 -> 9:
    7: r78:SI=r81:DI#0<<0x1
      REG_DEAD r81:DI
    8: r79:DI=zero_extend(r78:SI)
      REG_DEAD r78:SI
    9: r72:DI=r79:DI<<0x4
      REG_DEAD r79:DI
Failed to match this instruction:
(set (reg:DI 72 [ _1 ])
    (and:DI (ashift:DI (reg:DI 81)
                (const_int 5 [0x5]))
        (const_int 68719476704 [0xfffffffe0])))
and produce the following (optimized) assembly:
f:
	slliw	a5,a0,1
	slli	a5,a5,32
	srli	a5,a5,28
	add	a0,a5,a1
	ret
The current way of handling this (in 'zero_extendsidi2_shifted')
doesn't apply for two reasons:
- this is seen before reload, and
- (more importantly) the constant mask is not 0xfffffffful.
To address this, we introduce a generalized version of shifting
zero-extended values that supports any mask of consecutive ones, as
long as the number of trailing zeros equals the inner shift amount.
With this new split, we generate the following assembly for the
aforementioned function:
f:
	slli	a0,a0,33
	srli	a0,a0,28
	add	a0,a0,a1
	ret
gcc/ChangeLog:
* config/riscv/riscv.md (zero_extendsidi2_shifted): Replace
with a generalized split that requires no clobber, runs
before reload and works for smaller masks.
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---
gcc/config/riscv/riscv.md | 37 ++++++++++++++++++++-----------------
1 file changed, 20 insertions(+), 17 deletions(-)
Comments
Kito Cheng <kito.cheng@gmail.com>, June 7, 2022, 12:24 p.m. (quoting the patch):

> +  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
> +   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]

I would prefer to keep using another register if possible, like this:

  [(set (match_dup 6) (ashift:DI (match_dup 1) (match_dup 4)))
   (set (match_dup 0) (lshiftrt:DI (match_dup 6) (match_dup 5)))]

  if (can_create_pseudo_p)
    operands[6] = gen_reg_rtx (DImode);
  else
    operands[6] = operands[0];

Philipp Tomsich <philipp.tomsich@vrull.eu>, June 7, 2022, 6:50 p.m.:

I don't see the benefit to this (unless you expect opportunities for
CSE), as there will be a linear dependency chain anyway. I'd like to
understand your reasoning behind this a bit better, as our style
currently tries not to introduce temporaries where avoidable.

Thanks,
Philipp.

Kito Cheng <kito.cheng@gmail.com>:

Using the same pseudo register makes one longer live range instead of
two shorter live ranges. That's not good when the instruction scheduler
tries to separate those two instructions, and I think the register
allocator has more complete knowledge to decide which way is better
(using the same register or a different one), so I prefer to use
another pseudo here if possible. That's also what the AArch64/ARM/x86
ports do: use a new pseudo as a temporary if possible.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b8ab0cf169a..cc10cd90a74 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2119,23 +2119,26 @@ (define_split
 ;; occur when unsigned int is used for array indexing.  Split this into two
 ;; shifts.  Otherwise we can get 3 shifts.
 
-(define_insn_and_split "zero_extendsidi2_shifted"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-	(and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
-			   (match_operand:QI 2 "immediate_operand" "I"))
-		(match_operand 3 "immediate_operand" "")))
-   (clobber (match_scratch:DI 4 "=&r"))]
-  "TARGET_64BIT && !TARGET_ZBA
-   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 4)
-	(ashift:DI (match_dup 1) (const_int 32)))
-   (set (match_dup 0)
-	(lshiftrt:DI (match_dup 4) (match_dup 5)))]
-  "operands[5] = GEN_INT (32 - (INTVAL (operands [2])));"
-  [(set_attr "type" "shift")
-   (set_attr "mode" "DI")])
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+	(and:DI (ashift:DI (match_operand:DI 1 "register_operand")
+			   (match_operand:QI 2 "immediate_operand"))
+		(match_operand:DI 3 "consecutive_bits_operand")))]
+  "TARGET_64BIT"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
+   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]
+{
+	unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
+	int leading = clz_hwi (mask);
+	int trailing = ctz_hwi (mask);
+
+	/* The shift-amount must match the number of trailing bits */
+	if (trailing != UINTVAL (operands[2]))
+	  FAIL;
+
+	operands[4] = GEN_INT (leading + trailing);
+	operands[5] = GEN_INT (leading);
+})
 
 ;;
 ;; ....................