From patchwork Thu Mar 23 11:32:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 66795
Date: Thu, 23 Mar 2023 11:32:20 +0000
From: Andrew Stubbs
Subject: [committed] amdgcn: vec_extract no-op insns
To: "gcc-patches@gcc.gnu.org"
List-Id: Gcc-patches mailing list

This patch adds new pseudo-insns for no-op vector extractions.

These were previously modelled as simple move instructions, but the
register allocator has unhelpful special handling for these that
triggered spills to memory. Modelling them as a vec_select does the
right thing in the register allocator, but now the instruction has to be
explicitly empty.

This patch has already been committed to the OG12 branch.

Andrew

amdgcn: vec_extract no-op insns

Just using move insn for no-op conversions triggers special move handling in
IRA which declares that subreg of vectors aren't valid and routes everything
through memory.  These patterns make the vec_select explicit and all is well.

gcc/ChangeLog:

	* config/gcn/gcn-protos.h (gcn_stepped_zero_int_parallel_p): New.
	* config/gcn/gcn-valu.md (V_1REG_ALT): New.
	(V_2REG_ALT): New.
	(vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop): New.
	(vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop): New.
	(vec_extract<V_ALL:mode><V_ALL_ALT:mode>): Use new patterns.
	* config/gcn/gcn.cc (gcn_stepped_zero_int_parallel_p): New.
	* config/gcn/predicates.md (ascending_zero_int_parallel): New.

diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index d7862b21a2a..287ce17d422 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -75,6 +75,7 @@ extern reg_class gcn_regno_reg_class (int regno);
 extern bool gcn_scalar_flat_address_p (rtx);
 extern bool gcn_scalar_flat_mem_p (rtx);
 extern bool gcn_sgpr_move_p (rtx, rtx);
+extern bool gcn_stepped_zero_int_parallel_p (rtx op, int step);
 extern bool gcn_valid_move_p (machine_mode, rtx, rtx);
 extern rtx gcn_vec_constant (machine_mode, int);
 extern rtx gcn_vec_constant (machine_mode, rtx);
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 787d7709d0d..334b6b0b51c 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -49,6 +49,13 @@ (define_mode_iterator V_1REG
 		       V16QI V16HI V16SI V16HF V16SF
 		       V32QI V32HI V32SI V32HF V32SF
 		       V64QI V64HI V64SI V64HF V64SF])
+(define_mode_iterator V_1REG_ALT
+		      [V2QI V2HI V2SI V2HF V2SF
+		       V4QI V4HI V4SI V4HF V4SF
+		       V8QI V8HI V8SI V8HF V8SF
+		       V16QI V16HI V16SI V16HF V16SF
+		       V32QI V32HI V32SI V32HF V32SF
+		       V64QI V64HI V64SI V64HF V64SF])
 
 (define_mode_iterator V_INT_1REG
 		      [V2QI V2HI V2SI
@@ -80,6 +87,13 @@ (define_mode_iterator V_2REG
 		       V16DI V16DF
 		       V32DI V32DF
 		       V64DI V64DF])
+(define_mode_iterator V_2REG_ALT
+		      [V2DI V2DF
+		       V4DI V4DF
+		       V8DI V8DF
+		       V16DI V16DF
+		       V32DI V32DF
+		       V64DI V64DF])
 
 ; Vector modes with native support
 (define_mode_iterator V_noQI
@@ -788,11 +802,36 @@ (define_insn "vec_extract<mode><scalar_mode>"
    (set_attr "exec" "none")
    (set_attr "laneselect" "yes")])
 
+(define_insn "vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop"
+  [(set (match_operand:V_1REG_ALT 0 "register_operand" "=v,v")
+	(vec_select:V_1REG_ALT
+	  (match_operand:V_1REG 1 "register_operand" " 0,v")
+	  (match_operand 2 "ascending_zero_int_parallel" "")))]
+  "MODE_VF (<V_1REG_ALT:MODE>mode) < MODE_VF (<V_1REG:MODE>mode)
+   && <V_1REG_ALT:SCALAR_MODE>mode == <V_1REG:SCALAR_MODE>mode"
+  "@
+  ; in-place extract %0
+  v_mov_b32\t%L0, %L1"
+  [(set_attr "type" "vmult")
+   (set_attr "length" "0,8")])
+
+(define_insn "vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop"
+  [(set (match_operand:V_2REG_ALT 0 "register_operand" "=v,v")
+	(vec_select:V_2REG_ALT
+	  (match_operand:V_2REG 1 "register_operand" " 0,v")
+	  (match_operand 2 "ascending_zero_int_parallel" "")))]
+  "MODE_VF (<V_2REG_ALT:MODE>mode) < MODE_VF (<V_2REG:MODE>mode)
+   && <V_2REG_ALT:SCALAR_MODE>mode == <V_2REG:SCALAR_MODE>mode"
+  "@
+  ; in-place extract %0
+  v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1"
+  [(set_attr "type" "vmult")
+   (set_attr "length" "0,8")])
+
 (define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
-  [(set (match_operand:V_ALL_ALT 0 "register_operand")
-	(vec_select:V_ALL_ALT
-	  (match_operand:V_ALL 1 "register_operand")
-	  (parallel [(match_operand 2 "immediate_operand")])))]
+  [(match_operand:V_ALL_ALT 0 "register_operand")
+   (match_operand:V_ALL 1 "register_operand")
+   (match_operand 2 "immediate_operand")]
   "MODE_VF (<V_ALL_ALT:MODE>mode) < MODE_VF (<V_ALL:MODE>mode)
    && <V_ALL_ALT:SCALAR_MODE>mode == <V_ALL:SCALAR_MODE>mode"
   {
@@ -802,8 +841,12 @@ (define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
 
     if (firstlane == 0)
       {
-	/* A plain move will do. */
-	tmp = operands[1];
+	rtx parallel = gen_rtx_PARALLEL (<V_ALL:MODE>mode,
+					  rtvec_alloc (numlanes));
+	for (int i = 0; i < numlanes; i++)
+	  XVECEXP (parallel, 0, i) = GEN_INT (i);
+	emit_insn (gen_vec_extract<V_ALL:mode><V_ALL_ALT:mode>_nop
+		   (operands[0], operands[1], parallel));
       } else {
         /* FIXME: optimize this by using DPP where available. */
 
@@ -815,10 +858,10 @@ (define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
 	tmp = gen_reg_rtx (<V_ALL:MODE>mode);
 	emit_insn (gen_ds_bpermute<V_ALL:mode> (tmp, permutation, operands[1],
 						get_exec (<V_ALL:MODE>mode)));
-      }
 
-    emit_move_insn (operands[0],
-		    gen_rtx_SUBREG (<V_ALL_ALT:MODE>mode, tmp, 0));
+	emit_move_insn (operands[0],
+			gen_rtx_SUBREG (<V_ALL_ALT:MODE>mode, tmp, 0));
+      }
     DONE;
   })
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index aca17a19d84..5bf88e98083 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -1422,6 +1422,24 @@ CODE_FOR_OP (reload_out)
 #undef CODE_FOR_OP
 #undef CODE_FOR
 
+/* Return true if OP is a PARALLEL of CONST_INTs that form a linear
+   series with step STEP. */
+
+bool
+gcn_stepped_zero_int_parallel_p (rtx op, int step)
+{
+  if (GET_CODE (op) != PARALLEL || !CONST_INT_P (XVECEXP (op, 0, 0)))
+    return false;
+
+  unsigned HOST_WIDE_INT base = 0;
+  for (int i = 0; i < XVECLEN (op, 0); ++i)
+    if (!CONST_INT_P (XVECEXP (op, 0, i))
+	|| UINTVAL (XVECEXP (op, 0, i)) != base + i * step)
+      return false;
+
+  return true;
+}
+
 /* }}} */
 /* {{{ Addresses, pointers and moves. */
 
diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md
index a20acf7e9ef..5554a06b63b 100644
--- a/gcc/config/gcn/predicates.md
+++ b/gcc/config/gcn/predicates.md
@@ -197,3 +197,10 @@ (define_predicate "gcn_register_ds_or_unspec_operand"
   (ior (match_operand 0 "gcn_ds_memory_operand")
        (and (match_code "unspec")
             (match_test "XINT (op, 1) == UNSPEC_VECTOR"))))
+
+(define_predicate "ascending_zero_int_parallel"
+  (match_code "parallel")
+{
+  return gcn_stepped_zero_int_parallel_p (op, 1);
+})
+
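
For readers without a GCC tree to hand, the check behind ascending_zero_int_parallel can be sketched outside the compiler. The following is only an illustration, not part of the patch: it replaces the RTL PARALLEL with a plain std::vector, and the name is_stepped_zero_series is hypothetical. It shows the property the predicate asks for, namely that lane i holds exactly i * step, so the series starts at zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for gcn_stepped_zero_int_parallel_p.  The real
// function walks an RTL PARALLEL of CONST_INTs; this sketch takes the lane
// indices as a plain vector so it can be compiled on its own.
//
// Returns true when lanes[i] == i * step for every i, i.e. the series starts
// at zero and advances by STEP.
bool is_stepped_zero_series(const std::vector<std::uint64_t>& lanes, int step)
{
    // Assumption of this sketch: an empty selection is rejected.  The patch
    // itself inspects element 0 before looping.
    if (lanes.empty())
        return false;

    for (std::size_t i = 0; i < lanes.size(); ++i) {
        // Unsigned arithmetic, mirroring the unsigned comparison in the patch.
        const std::uint64_t expected =
            static_cast<std::uint64_t>(i) * static_cast<std::uint64_t>(step);
        if (lanes[i] != expected)
            return false;
    }
    return true;
}
```

With step 1 this accepts 0, 1, 2, ..., which is the selection the expander builds for the firstlane == 0 case; anything offset from zero or out of order is rejected, so those extractions still go through the ds_bpermute path.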