From patchwork Tue Aug 2 11:55:56 2022
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 56491
From: "Roger Sayle"
To: "'GCC Patches'"
Cc: 'Richard Sandiford', 'Segher Boessenkool'
References: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com>
Subject: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.
Date: Tue, 2 Aug 2022 12:55:56 +0100
Message-ID: <01c401d8a666$d48c9e50$7da5daf0$@nextmovesoftware.com>

Many thanks to Segher and Richard for pointing out that my removal of
optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version of
this patch was incorrect, and my assumption that these would be subsumed
by val_signbit_known_clear_p was mistaken.
That the tests for ABS and FFS looked out of place was not an indication
that they were not required, but rather that we were missing
simplifications for the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and
CTZ codes.  To make up for this mistake, in this revised patch I've not
only restored the tests for ABS and FFS, but also added the many sibling
RTX codes that I'd also expect to see optimized here, such as
ABS(PARITY(x)).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-02  Roger Sayle
            Segher Boessenkool
            Richard Sandiford

gcc/ChangeLog
        * simplify-rtx.cc (simplify_unary_operation_1) <ABS>: Add
        optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ
        and LSHIFTRT that are all positive to complement the existing
        FFS and (idempotent) ABS simplifications.
        <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
        val_signbit_known_clear_p is true of the operand.
        Simplify sign extensions of SUBREG truncations of operands
        that are already suitably (zero) extended.
        <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
        of operands that are already suitably zero extended.


Thanks in advance,
Roger
---

> -----Original Message-----
> From: Richard Sandiford
> Sent: 02 August 2022 10:39
> To: Roger Sayle
> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool'
> Subject: Re: [PATCH] Some additional zero-extension related optimizations in
> simplify-rtx.
>
> "Roger Sayle" writes:
> > This patch implements some additional zero-extension and
> > sign-extension related optimizations in simplify-rtx.cc.
> > The original motivation comes from PR rtl-optimization/71775, where
> > in comment #2 Andrew Pinski sees:
> >
> > Failed to match this instruction:
> > (set (reg:DI 88 [ _1 ])
> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> >
> > On many platforms the result of DImode CTZ is constrained to be a
> > small unsigned integer (between 0 and 64), hence the truncation to
> > 32-bits (using a SUBREG) and the following sign extension back to
> > 64-bits are effectively a no-op, so the above should ideally (often)
> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ])))".
> >
> > To implement this, and some closely related transformations, we build
> > upon the existing val_signbit_known_clear_p predicate.  In the first
> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign bit
> > set, so the simplification of ABS (ABS (x)) and ABS (FFS (x))
> > can itself be simplified.
>
> I think I misunderstood, but just in case: RTL ABS is well-defined for the
> minimum integer (giving back the minimum integer), so we can't assume that
> ABS leaves the sign bit clear.
>
> Thanks,
> Richard
>
> > The second transformation is that we can canonicalize SIGN_EXTEND to
> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's
> > sign-bit is known to be clear.  The final two chunks are for
> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating
> > SUBREG respectively.  The nonzero_bits of a truncating SUBREG
> > pessimistically thinks that the upper bits may have an arbitrary value
> > (by taking the SUBREG), so we need to look deeper at the SUBREG's
> > operand to confirm that the high bits are known to be zero.
> >
> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
> > default architecture options is undefined at zero, so we can't be sure
> > the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
> > nonzero_bits knows this, so the above transformations don't trigger,
> > but the transformations themselves are perfectly valid for other
> > operations such as FFS, POPCOUNT and PARITY, and on other
> > targets/-march settings where CTZ is defined at zero.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Testing with CSiBE shows these transformations
> > trigger on several source files (and with -Os reduces the size of the
> > code).  Ok for mainline?
> >
> >
> > 2022-07-27  Roger Sayle
> >
> > gcc/ChangeLog
> >         * simplify-rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
> >         test as both FFS and ABS result in nonzero_bits returning a
> >         mask that satisfies val_signbit_known_clear_p.
> >         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
> >         val_signbit_known_clear_p is true of the operand.
> >         Simplify sign extensions of SUBREG truncations of operands
> >         that are already suitably (zero) extended.
> >         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
> >         of operands that are already suitably zero extended.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > index fa20665..e62bf56 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
> >          break;
> >
> >        /* If operand is something known to be positive, ignore the ABS.  */
> > -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
> > -          || val_signbit_known_clear_p (GET_MODE (op),
> > -                                        nonzero_bits (op, GET_MODE (op))))
> > +      if (val_signbit_known_clear_p (GET_MODE (op),
> > +                                     nonzero_bits (op, GET_MODE (op))))
> >          return op;
> >
> >        /* If operand is known to be only -1 or 0, convert ABS to NEG.  */
> > @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
> >              }
> >          }
> >
> > +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
> > +         we know the sign bit of OP must be clear.  */
> > +      if (val_signbit_known_clear_p (GET_MODE (op),
> > +                                     nonzero_bits (op, GET_MODE (op))))
> > +        return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
> > +
> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> > +      if (GET_CODE (op) == SUBREG
> > +          && subreg_lowpart_p (op)
> > +          && GET_MODE (SUBREG_REG (op)) == mode
> > +          && is_a <scalar_int_mode> (mode, &int_mode)
> > +          && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> > +          && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> > +          && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> > +          && (nonzero_bits (SUBREG_REG (op), mode)
> > +              & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
> > +        return SUBREG_REG (op);
> > +
> >  #if defined(POINTERS_EXTEND_UNSIGNED)
> >        /* As we do not know which address space the pointer is referring to,
> >           we can do this only if the target does not support different pointer
> > @@ -1765,6 +1782,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
> >                                       op0_mode);
> >          }
> >
> > +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> > +      if (GET_CODE (op) == SUBREG
> > +          && subreg_lowpart_p (op)
> > +          && GET_MODE (SUBREG_REG (op)) == mode
> > +          && is_a <scalar_int_mode> (mode, &int_mode)
> > +          && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> > +          && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> > +          && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> > +          && (nonzero_bits (SUBREG_REG (op), mode)
> > +              & ~GET_MODE_MASK (op_mode)) == 0)
> > +        return SUBREG_REG (op);
> > +
> >  #if defined(POINTERS_EXTEND_UNSIGNED)
> >        /* As we do not know which address space the pointer is referring to,
> >           we can do this only if the target does not support different pointer

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index fa20665..b53272b 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1366,11 +1366,57 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
         break;
 
       /* If operand is something known to be positive, ignore the ABS.  */
-      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
-          || val_signbit_known_clear_p (GET_MODE (op),
-                                        nonzero_bits (op, GET_MODE (op))))
+      if (val_signbit_known_clear_p (GET_MODE (op),
+                                     nonzero_bits (op, GET_MODE (op))))
         return op;
 
+      /* Using nonzero_bits doesn't (currently) work for modes wider than
+         HOST_WIDE_INT, so the following transformations help simplify
+         ABS for TImode and wider.  */
+      switch (GET_CODE (op))
+        {
+        case ABS:
+        case CLRSB:
+        case FFS:
+        case PARITY:
+        case POPCOUNT:
+        case SS_ABS:
+          return op;
+
+        case CLZ:
+          if (is_a <scalar_int_mode> (mode, &int_mode)
+              && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
+            {
+              HOST_WIDE_INT val0;
+              if (CLZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
+                  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
+                return op;
+            }
+          break;
+
+        case CTZ:
+          if (is_a <scalar_int_mode> (mode, &int_mode)
+              && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
+            {
+              HOST_WIDE_INT val0;
+              if (CTZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
+                  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
+                return op;
+            }
+          break;
+
+        case LSHIFTRT:
+          if (CONST_INT_P (XEXP (op, 1))
+              && INTVAL (XEXP (op, 1)) > 0
+              && is_a <scalar_int_mode> (mode, &int_mode)
+              && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (int_mode))
+            return op;
+          break;
+
+        default:
+          break;
+        }
+
       /* If operand is known to be only -1 or 0, convert ABS to NEG.  */
       if (is_a <scalar_int_mode> (mode, &int_mode)
           && (num_sign_bit_copies (op, int_mode)
@@ -1615,6 +1661,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
             }
         }
 
+      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
+         we know the sign bit of OP must be clear.  */
+      if (val_signbit_known_clear_p (GET_MODE (op),
+                                     nonzero_bits (op, GET_MODE (op))))
+        return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
+
+      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
+      if (GET_CODE (op) == SUBREG
+          && subreg_lowpart_p (op)
+          && GET_MODE (SUBREG_REG (op)) == mode
+          && is_a <scalar_int_mode> (mode, &int_mode)
+          && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
+          && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
+          && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
+          && (nonzero_bits (SUBREG_REG (op), mode)
+              & ~(GET_MODE_MASK (op_mode) >> 1)) == 0)
+        return SUBREG_REG (op);
+
 #if defined(POINTERS_EXTEND_UNSIGNED)
       /* As we do not know which address space the pointer is referring to,
          we can do this only if the target does not support different pointer
@@ -1765,6 +1829,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
                                      op0_mode);
         }
 
+      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
+      if (GET_CODE (op) == SUBREG
+          && subreg_lowpart_p (op)
+          && GET_MODE (SUBREG_REG (op)) == mode
+          && is_a <scalar_int_mode> (mode, &int_mode)
+          && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
+          && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
+          && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
+          && (nonzero_bits (SUBREG_REG (op), mode)
+              & ~GET_MODE_MASK (op_mode)) == 0)
+        return SUBREG_REG (op);
+
 #if defined(POINTERS_EXTEND_UNSIGNED)
       /* As we do not know which address space the pointer is referring to,
          we can do this only if the target does not support different pointer
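
For anyone wanting to check the two SUBREG mask conditions above by hand,
here is a minimal standalone sketch.  It is not GCC code: it models only
the DImode/SImode case on plain 64-bit integers, and the helper names
(lowpart_si, sign_extend_di, zero_extend_di, sign_extend_is_noop,
zero_extend_is_noop) are hypothetical, chosen for illustration.  The
"nonzero" argument stands in for the mask that nonzero_bits would return;
for a known constant the value itself is the tightest such mask.

```cpp
#include <cassert>
#include <cstdint>

// Models (subreg:SI (x:DI) 0) as a lowpart truncation: keep the low 32 bits.
constexpr std::uint32_t lowpart_si(std::uint64_t x) {
  return static_cast<std::uint32_t>(x);
}

// Models (sign_extend:DI (y:SI)): replicate bit 31 into bits 32..63.
constexpr std::uint64_t sign_extend_di(std::uint32_t y) {
  return (y & UINT32_C(0x80000000))
             ? (UINT64_C(0xFFFFFFFF00000000) | y)
             : static_cast<std::uint64_t>(y);
}

// Models (zero_extend:DI (y:SI)): bits 32..63 become zero.
constexpr std::uint64_t zero_extend_di(std::uint32_t y) {
  return static_cast<std::uint64_t>(y);
}

// Mirrors the SIGN_EXTEND test: no possibly-set bit at or above bit 31,
// i.e. (nonzero & ~(GET_MODE_MASK (SImode) >> 1)) == 0.
constexpr bool sign_extend_is_noop(std::uint64_t nonzero) {
  return (nonzero & ~(UINT64_C(0xFFFFFFFF) >> 1)) == 0;
}

// Mirrors the ZERO_EXTEND test: no possibly-set bit at or above bit 32,
// i.e. (nonzero & ~GET_MODE_MASK (SImode)) == 0.
constexpr bool zero_extend_is_noop(std::uint64_t nonzero) {
  return (nonzero & ~UINT64_C(0xFFFFFFFF)) == 0;
}

// A small value such as 64 (the largest DImode POPCOUNT result) survives
// truncation followed by either extension unchanged.
static_assert(sign_extend_is_noop(64), "bit 31 and above are clear");
static_assert(sign_extend_di(lowpart_si(64)) == 64, "round trip is a no-op");

// With bit 31 set, only the zero-extension round trip is a no-op.
static_assert(!sign_extend_is_noop(UINT64_C(0x80000000)), "bit 31 is set");
static_assert(zero_extend_is_noop(UINT64_C(0x80000000)), "bits 32+ are clear");
```

The sign-extension mask is one bit stricter than the zero-extension mask
because bit 31 of the SImode lowpart would be copied into the upper half.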