From patchwork Thu Mar 23 11:32:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 66795
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id C7E953858C31
	for <patchwork@sourceware.org>; Thu, 23 Mar 2023 11:32:42 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id ED1493858C31
 for <gcc-patches@gcc.gnu.org>; Thu, 23 Mar 2023 11:32:25 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org ED1493858C31
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.98,283,1673942400";
   d="scan'208";a="115406"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa2.mentor.iphmx.com with ESMTP; 23 Mar 2023 03:32:24 -0800
IronPort-SDR: 
 cSxeHJr1KWc/rS2YHrbZdBFwN+bYcw2tlGw9avfCRo3jmdK5T2fQ7RCxJ1FejllTJBgk8N4M7D
 pSB/M5qJJo6a4ETz9kj2HK4lYcGMTwUFhEDwkabbO0Lx6qI6JOie40LKrtxEfvChnsWLynOhWM
 Z+IrGiJfzFCOmlvpd1OQiEPgbOxSkjWkRhXnGHknrovuyIQe7WZU+fuNh1pK0kRgxR0C7ogNkZ
 yWh18A6gtaiM6tW71QnqWk85zpVawXZcSpPIlTuwCch0dqu6Dp6kglAYs+60L6ThIajojcdllc
 jK8=
Message-ID: <db0a824f-d244-a3bc-6ffd-2dc87b6635db@codesourcery.com>
Date: Thu, 23 Mar 2023 11:32:20 +0000
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.9.0
Content-Language: en-GB
From: Andrew Stubbs <ams@codesourcery.com>
Subject: [committed] amdgcn: vec_extract no-op insns
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2,
 SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

This patch adds new pseudo-insns for no-op vector extractions.

These were previously modelled as simple move instructions, but the 
register allocator has unhelpful special handling for these that 
triggered spills to memory. Modelling them as a vec_select does the 
right thing in the register allocator, but now the instruction has to be 
explicitly empty.

This patch has already been committed to the OG12 branch.

Andrew
amdgcn: vec_extract no-op insns

Just using move insn for no-op conversions triggers special move handling in
IRA which declares that subreg of vectors aren't valid and routes everything
through memory.  These patterns make the vec_select explicit and all is well.

gcc/ChangeLog:

	* config/gcn/gcn-protos.h (gcn_stepped_zero_int_parallel_p): New.
	* config/gcn/gcn-valu.md (V_1REG_ALT): New.
	(V_2REG_ALT): New.
	(vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop): New.
	(vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop): New.
	(vec_extract<V_ALL:mode><V_ALL_ALT:mode>): Use new patterns.
	* config/gcn/gcn.cc (gcn_stepped_zero_int_parallel_p): New.
	* config/gcn/predicates.md (ascending_zero_int_parallel): New.

diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index d7862b21a2a..287ce17d422 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -75,6 +75,7 @@ extern reg_class gcn_regno_reg_class (int regno);
 extern bool gcn_scalar_flat_address_p (rtx);
 extern bool gcn_scalar_flat_mem_p (rtx);
 extern bool gcn_sgpr_move_p (rtx, rtx);
+extern bool gcn_stepped_zero_int_parallel_p (rtx op, int step);
 extern bool gcn_valid_move_p (machine_mode, rtx, rtx);
 extern rtx gcn_vec_constant (machine_mode, int);
 extern rtx gcn_vec_constant (machine_mode, rtx);
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 787d7709d0d..334b6b0b51c 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -49,6 +49,13 @@ (define_mode_iterator V_1REG
 		       V16QI V16HI V16SI V16HF V16SF
 		       V32QI V32HI V32SI V32HF V32SF
 		       V64QI V64HI V64SI V64HF V64SF])
+(define_mode_iterator V_1REG_ALT
+		      [V2QI V2HI V2SI V2HF V2SF
+		       V4QI V4HI V4SI V4HF V4SF
+		       V8QI V8HI V8SI V8HF V8SF
+		       V16QI V16HI V16SI V16HF V16SF
+		       V32QI V32HI V32SI V32HF V32SF
+		       V64QI V64HI V64SI V64HF V64SF])
 
 (define_mode_iterator V_INT_1REG
 		      [V2QI V2HI V2SI
@@ -80,6 +87,13 @@ (define_mode_iterator V_2REG
 		       V16DI V16DF
 		       V32DI V32DF
 		       V64DI V64DF])
+(define_mode_iterator V_2REG_ALT
+		      [V2DI V2DF
+		       V4DI V4DF
+		       V8DI V8DF
+		       V16DI V16DF
+		       V32DI V32DF
+		       V64DI V64DF])
 
 ; Vector modes with native support
 (define_mode_iterator V_noQI
@@ -788,11 +802,36 @@ (define_insn "vec_extract<mode><scalar_mode>"
    (set_attr "exec" "none")
    (set_attr "laneselect" "yes")])
 
+(define_insn "vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop"
+  [(set (match_operand:V_1REG_ALT 0 "register_operand" "=v,v")
+	(vec_select:V_1REG_ALT
+	  (match_operand:V_1REG 1 "register_operand"   " 0,v")
+	  (match_operand 2 "ascending_zero_int_parallel" "")))]
+  "MODE_VF (<V_1REG_ALT:MODE>mode) < MODE_VF (<V_1REG:MODE>mode)
+   && <V_1REG_ALT:SCALAR_MODE>mode == <V_1REG:SCALAR_MODE>mode"
+  "@
+  ; in-place extract %0
+  v_mov_b32\t%L0, %L1"
+  [(set_attr "type" "vmult")
+   (set_attr "length" "0,8")])
+  
+(define_insn "vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop"
+  [(set (match_operand:V_2REG_ALT 0 "register_operand" "=v,v")
+	(vec_select:V_2REG_ALT
+	  (match_operand:V_2REG 1 "register_operand"   " 0,v")
+	  (match_operand 2 "ascending_zero_int_parallel" "")))]
+  "MODE_VF (<V_2REG_ALT:MODE>mode) < MODE_VF (<V_2REG:MODE>mode)
+   && <V_2REG_ALT:SCALAR_MODE>mode == <V_2REG:SCALAR_MODE>mode"
+  "@
+  ; in-place extract %0
+  v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1"
+  [(set_attr "type" "vmult")
+   (set_attr "length" "0,8")])
+  
 (define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
-  [(set (match_operand:V_ALL_ALT 0 "register_operand")
-	(vec_select:V_ALL_ALT
-	  (match_operand:V_ALL 1 "register_operand")
-	  (parallel [(match_operand 2 "immediate_operand")])))]
+  [(match_operand:V_ALL_ALT 0 "register_operand")
+   (match_operand:V_ALL 1 "register_operand")
+   (match_operand 2 "immediate_operand")]
   "MODE_VF (<V_ALL_ALT:MODE>mode) < MODE_VF (<V_ALL:MODE>mode)
    && <V_ALL_ALT:SCALAR_MODE>mode == <V_ALL:SCALAR_MODE>mode"
   {
@@ -802,8 +841,12 @@ (define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
 
     if (firstlane == 0)
       {
-	/* A plain move will do.  */
-	tmp = operands[1];
+	rtx parallel = gen_rtx_PARALLEL (<V_ALL:MODE>mode,
+					  rtvec_alloc (numlanes));
+	for (int i = 0; i < numlanes; i++)
+	  XVECEXP (parallel, 0, i) = GEN_INT (i);
+	emit_insn (gen_vec_extract<V_ALL:mode><V_ALL_ALT:mode>_nop
+		   (operands[0], operands[1], parallel));
       } else {
         /* FIXME: optimize this by using DPP where available.  */
 
@@ -815,10 +858,10 @@ (define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
 	tmp = gen_reg_rtx (<V_ALL:MODE>mode);
 	emit_insn (gen_ds_bpermute<V_ALL:mode> (tmp, permutation, operands[1],
 						get_exec (<V_ALL:MODE>mode)));
-      }
 
-    emit_move_insn (operands[0],
-		    gen_rtx_SUBREG (<V_ALL_ALT:MODE>mode, tmp, 0));
+	emit_move_insn (operands[0],
+			gen_rtx_SUBREG (<V_ALL_ALT:MODE>mode, tmp, 0));
+      }
     DONE;
   })
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index aca17a19d84..5bf88e98083 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -1422,6 +1422,24 @@ CODE_FOR_OP (reload_out)
 #undef CODE_FOR_OP
 #undef CODE_FOR
 
+/* Return true if OP is a PARALLEL of CONST_INTs that form a linear
+   series with step STEP.  */
+
+bool
+gcn_stepped_zero_int_parallel_p (rtx op, int step)
+{
+  if (GET_CODE (op) != PARALLEL || !CONST_INT_P (XVECEXP (op, 0, 0)))
+    return false;
+
+  unsigned HOST_WIDE_INT base = 0;
+  for (int i = 0; i < XVECLEN (op, 0); ++i)
+    if (!CONST_INT_P (XVECEXP (op, 0, i))
+	|| UINTVAL (XVECEXP (op, 0, i)) != base + i * step)
+      return false;
+
+  return true;
+}
+
 /* }}}  */
 /* {{{ Addresses, pointers and moves.  */
 
diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md
index a20acf7e9ef..5554a06b63b 100644
--- a/gcc/config/gcn/predicates.md
+++ b/gcc/config/gcn/predicates.md
@@ -197,3 +197,10 @@ (define_predicate "gcn_register_ds_or_unspec_operand"
        (ior (match_operand 0 "gcn_ds_memory_operand")
 	    (and (match_code "unspec")
               (match_test "XINT (op, 1) == UNSPEC_VECTOR")))))
+
+(define_predicate "ascending_zero_int_parallel"
+  (match_code "parallel")
+{
+  return gcn_stepped_zero_int_parallel_p (op, 1);
+})
+