From patchwork Mon Mar  6 12:47:06 2023
From: Richard Sandiford <richard.sandiford@arm.com>
To: Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Tamar Christina, Segher Boessenkool, Roger Sayle, Jeff Law
Subject: [PATCH] combine: Try harder to form zero_extends [PR106594]
Date: Mon, 06 Mar 2023 12:47:06 +0000
In-Reply-To: <3b1ed616-5d90-7a66-63b5-bdb5e320eebf@gmail.com>
	(Jeff Law via Gcc-patches's message of "Sun, 5 Mar 2023 12:56:02 -0700")

Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
>>
>> The regression was reported during stage 1.  A patch was provided
>> during stage 1, but the discussions around combine stalled.
>>
>> The regression for AArch64 needs to be fixed in GCC 13.  The hit is
>> too big just to "take".
>>
>> So we need a way forward, even if it's stage 4.
> Then it needs to be done in a way that works within the design
> constraints of combine.
>
> As Segher has indicated, using a magic constant to say "this is always
> cheap enough" isn't acceptable.  Furthermore, what this patch changes
> is combine's internal canonicalization of extensions into shift pairs.
>
> So I think another path forward needs to be found.  I don't see
> hacking up expand_compound_operation as viable.
>
> Jeff

How about the patch below?  Tested on aarch64-linux-gnu, but I'll test
on x86_64-linux-gnu and powerpc64le-linux-gnu before committing.
-----------------------------

The PR contains a case where we want GCC to combine a sign_extend
into an address (which is something that GCC 12 could do).  After
substitution, the sign_extend goes through the usual
expand_compound_operation wrangler and, after taking nonzero_bits
into account, make_compound_operation is presented with:

  X1: (and (mult (subreg x) (const_int N2)) (const_int N1))

where:

  (a) the subreg is paradoxical
  (b) N2 is a power of 2
  (c) all bits outside N1/N2 are known to be zero in x

This is equivalent to:

  X2: (mult (and (subreg x) (const_int N1/N2)) (const_int N2))

Given in this form, the existing code would already use (c) to convert
the inner "and" to a zero_extend:

  (mult (zero_extend x) (const_int N2))

This patch makes the code handle X1 as well as X2.  Logically,
it would make sense to do the same for ASHIFT, which would be
the canonical form outside memory addresses.  However, it seemed
better to do the minimum possible, given the late stage in the
release cycle.

gcc/
	PR rtl-optimization/106594
	* combine.cc (make_compound_operation_int): Extend the AND to
	ZERO_EXTEND transformation so that it can handle an intervening
	multiplication by a power of two.

gcc/testsuite/
	* gcc.target/aarch64/pr106594.c: New test.
---
 gcc/combine.cc                              | 60 ++++++++++++++-------
 gcc/testsuite/gcc.target/aarch64/pr106594.c | 21 ++++++++
 2 files changed, 63 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..b45042bbafd 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -8188,28 +8188,52 @@ make_compound_operation_int (scalar_int_mode mode, rtx *x_ptr,
   /* If the one operand is a paradoxical subreg of a register or memory and
      the constant (limited to the smaller mode) has only zero bits where
      the sub expression has known zero bits, this can be expressed as
-     a zero_extend.  */
-  else if (GET_CODE (XEXP (x, 0)) == SUBREG)
-    {
-      rtx sub;
+     a zero_extend.
+
+     Look also for the case where the operand is such a subreg that
+     is multiplied by 2**N:
 
-      sub = XEXP (XEXP (x, 0), 0);
-      machine_mode sub_mode = GET_MODE (sub);
-      int sub_width;
-      if ((REG_P (sub) || MEM_P (sub))
-	  && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
-	  && sub_width < mode_width)
+       (and (mult ... 2**N) M) --> (mult (and ... M>>N) 2**N) -> ...  */
+  else
+    {
+      rtx y = XEXP (x, 0);
+      rtx top = y;
+      int shift = 0;
+      if (GET_CODE (y) == MULT
+	  && CONST_INT_P (XEXP (y, 1))
+	  && pow2p_hwi (INTVAL (XEXP (y, 1))))
 	{
-	  unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
-	  unsigned HOST_WIDE_INT mask;
+	  shift = exact_log2 (INTVAL (XEXP (y, 1)));
+	  y = XEXP (y, 0);
+	}
+      if (GET_CODE (y) == SUBREG)
+	{
+	  rtx sub;
 
-	  /* original AND constant with all the known zero bits set */
-	  mask = UINTVAL (XEXP (x, 1)) | (~nonzero_bits (sub, sub_mode));
-	  if ((mask & mode_mask) == mode_mask)
+	  sub = XEXP (y, 0);
+	  machine_mode sub_mode = GET_MODE (sub);
+	  int sub_width;
+	  if ((REG_P (sub) || MEM_P (sub))
+	      && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
+	      && sub_width < mode_width)
 	    {
-	      new_rtx = make_compound_operation (sub, next_code);
-	      new_rtx = make_extraction (mode, new_rtx, 0, 0, sub_width,
-					 1, 0, in_code == COMPARE);
+	      unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
+	      unsigned HOST_WIDE_INT mask;
+
+	      /* The shifted AND constant with all the known zero
+		 bits set.  */
+	      mask = ((UINTVAL (XEXP (x, 1)) >> shift)
+		      | ~nonzero_bits (sub, sub_mode));
+	      if ((mask & mode_mask) == mode_mask)
+		{
+		  new_rtx = make_compound_operation (sub, next_code);
+		  new_rtx = make_extraction (mode, new_rtx,
+					     0, 0, sub_width,
+					     1, 0, in_code == COMPARE);
+		  if (shift)
+		    new_rtx = gen_rtx_fmt_ee (GET_CODE (top), mode,
+					      new_rtx, XEXP (top, 1));
+		}
 	    }
 	}
     }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106594.c b/gcc/testsuite/gcc.target/aarch64/pr106594.c
new file mode 100644
index 00000000000..f10d72665e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr106594.c
@@ -0,0 +1,21 @@
+/* { dg-options "-O2" } */
+
+extern const int constellation_64qam[64];
+
+void foo(int nbits,
+         const char *p_src,
+         int *p_dst) {
+
+  while (nbits > 0U) {
+    char first = *p_src++;
+
+    char index1 = ((first & 0x3) << 4) | (first >> 4);
+
+    *p_dst++ = constellation_64qam[index1];
+
+    nbits--;
+  }
+}
+
+/* { dg-final { scan-assembler {ldr\tw[0-9]+, \[x[0-9]+, w[0-9]+, [su]xtw #?2\]} } } */