From patchwork Mon Mar  6 12:47:06 2023
From: Richard Sandiford <richard.sandiford@arm.com>
To: Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Tamar Christina, Segher Boessenkool, Roger Sayle, Jeff Law
Subject: [PATCH] combine: Try harder to form zero_extends [PR106594]
Date: Mon, 06 Mar 2023 12:47:06 +0000
In-Reply-To: <3b1ed616-5d90-7a66-63b5-bdb5e320eebf@gmail.com>
	(Jeff Law via Gcc-patches's message of "Sun, 5 Mar 2023 12:56:02 -0700")

Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
>>
>> The regression was reported during stage 1.  A patch was provided
>> during stage 1, but the discussions around combine stalled.
>>
>> The regression for AArch64 needs to be fixed in GCC 13.  The hit is
>> too big just to "take".
>>
>> So we need a way forward, even if it's stage 4.
> Then it needs to be done in a way that works within the design
> constraints of combine.
>
> As Segher has indicated, using a magic constant to say "this is always
> cheap enough" isn't acceptable.  Furthermore, what this patch changes
> is combine's internal canonicalization of extensions into shift pairs.
>
> So I think another path forward needs to be found.  I don't see
> hacking up expand_compound_operation as viable.
>
> Jeff

How about the patch below?  Tested on aarch64-linux-gnu, but I'll test
on x86_64-linux-gnu and powerpc64le-linux-gnu before committing.
-----------------------------

The PR contains a case where we want GCC to combine a sign_extend
into an address (which is something that GCC 12 could do).  After
substitution, the sign_extend goes through the usual
expand_compound_operation wrangler and, after taking nonzero_bits
into account, make_compound_operation is presented with:

  X1: (and (mult (subreg x) (const_int N2)) (const_int N1))

where:

  (a) the subreg is paradoxical
  (b) N2 is a power of 2
  (c) all bits outside N1/N2 are known to be zero in x

This is equivalent to:

  X2: (mult (and (subreg x) (const_int N1/N2)) (const_int N2))

Given in this form, the existing code would already use (c) to convert
the inner "and" to a zero_extend:

  (mult (zero_extend x) (const_int N2))

This patch makes the code handle X1 as well as X2.  Logically,
it would make sense to do the same for ASHIFT, which would be
the canonical form outside memory addresses.  However, it seemed
better to do the minimum possible, given the late stage in the
release cycle.

gcc/
	PR rtl-optimization/106594
	* combine.cc (make_compound_operation_int): Extend the AND to
	ZERO_EXTEND transformation so that it can handle an intervening
	multiplication by a power of two.

gcc/testsuite/
	* gcc.target/aarch64/pr106594.c: New test.
---
 gcc/combine.cc                              | 60 ++++++++++++++-------
 gcc/testsuite/gcc.target/aarch64/pr106594.c | 21 ++++++++
 2 files changed, 63 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..b45042bbafd 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -8188,28 +8188,52 @@ make_compound_operation_int (scalar_int_mode mode, rtx *x_ptr,
   /* If the one operand is a paradoxical subreg of a register or memory and
      the constant (limited to the smaller mode) has only zero bits where
      the sub expression has known zero bits, this can be expressed as
-     a zero_extend.  */
-  else if (GET_CODE (XEXP (x, 0)) == SUBREG)
-    {
-      rtx sub;
+     a zero_extend.
+
+     Look also for the case where the operand is such a subreg that
+     is multiplied by 2**N:
 
-      sub = XEXP (XEXP (x, 0), 0);
-      machine_mode sub_mode = GET_MODE (sub);
-      int sub_width;
-      if ((REG_P (sub) || MEM_P (sub))
-	  && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
-	  && sub_width < mode_width)
+       (and (mult ... 2**N) M) --> (mult (and ... M>>N) 2**N) -> ...  */
+  else
+    {
+      rtx y = XEXP (x, 0);
+      rtx top = y;
+      int shift = 0;
+      if (GET_CODE (y) == MULT
+	  && CONST_INT_P (XEXP (y, 1))
+	  && pow2p_hwi (INTVAL (XEXP (y, 1))))
 	{
-	  unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
-	  unsigned HOST_WIDE_INT mask;
+	  shift = exact_log2 (INTVAL (XEXP (y, 1)));
+	  y = XEXP (y, 0);
+	}
+      if (GET_CODE (y) == SUBREG)
+	{
+	  rtx sub;
 
-	  /* original AND constant with all the known zero bits set */
-	  mask = UINTVAL (XEXP (x, 1)) | (~nonzero_bits (sub, sub_mode));
-	  if ((mask & mode_mask) == mode_mask)
+	  sub = XEXP (y, 0);
+	  machine_mode sub_mode = GET_MODE (sub);
+	  int sub_width;
+	  if ((REG_P (sub) || MEM_P (sub))
+	      && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
+	      && sub_width < mode_width)
 	    {
-	      new_rtx = make_compound_operation (sub, next_code);
-	      new_rtx = make_extraction (mode, new_rtx, 0, 0, sub_width,
-					 1, 0, in_code == COMPARE);
+	      unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
+	      unsigned HOST_WIDE_INT mask;
+
+	      /* The shifted AND constant with all the known zero
+		 bits set.  */
+	      mask = ((UINTVAL (XEXP (x, 1)) >> shift)
+		      | ~nonzero_bits (sub, sub_mode));
+	      if ((mask & mode_mask) == mode_mask)
+		{
+		  new_rtx = make_compound_operation (sub, next_code);
+		  new_rtx = make_extraction (mode, new_rtx,
+					     0, 0, sub_width,
+					     1, 0, in_code == COMPARE);
+		  if (shift)
+		    new_rtx = gen_rtx_fmt_ee (GET_CODE (top), mode,
+					      new_rtx, XEXP (top, 1));
+		}
 	    }
 	}
     }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106594.c b/gcc/testsuite/gcc.target/aarch64/pr106594.c
new file mode 100644
index 00000000000..f10d72665e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr106594.c
@@ -0,0 +1,21 @@
+/* { dg-options "-O2" } */
+
+extern const int constellation_64qam[64];
+
+void foo(int nbits,
+         const char *p_src,
+         int *p_dst) {
+
+  while (nbits > 0U) {
+    char first = *p_src++;
+
+    char index1 = ((first & 0x3) << 4) | (first >> 4);
+
+    *p_dst++ = constellation_64qam[index1];
+
+    nbits--;
+  }
+}
+
+/* { dg-final { scan-assembler {ldr\tw[0-9]+, \[x[0-9]+, w[0-9]+, [su]xtw #?2\]} } } */