From patchwork Fri Sep 12 14:14:30 2025
X-Patchwork-Submitter: "Li, Pan2"
X-Patchwork-Id: 120174
X-Patchwork-Delegate: rdapp.gcc@gmail.com
From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai, kito.cheng@gmail.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, ken.chen@intel.com, hongtao.liu@intel.com, Pan Li
Subject: [PATCH v1 1/4] RISC-V: Combine vec_duplicate + vwaddu.vv to vwaddu.vx on GR2VR cost
Date: Fri, 12 Sep 2025 22:14:30 +0800
Message-ID: <20250912233514.3586587-2-pan2.li@intel.com>
In-Reply-To: <20250912233514.3586587-1-pan2.li@intel.com>
References: <20250912233514.3586587-1-pan2.li@intel.com>

From: Pan Li

This patch combines vec_duplicate + vwaddu.vv into vwaddu.vx, as in the
example below.  The related pattern depends on the cost of vec_duplicate
from GR2VR: late-combine performs the combination if the GR2VR cost is
zero, and rejects it if the GR2VR cost is greater than zero.

Assume we have the example below, with a GR2VR cost of 0.

Before this patch:
  11       beq a3,zero,.L8
  12       vsetvli a5,zero,e32,m1,ta,ma
  13       vmv.v.x v2,a2
  ...
  16   .L3:
  17       vsetvli a5,a3,e32,m1,ta,ma
  ...
  22       vwaddu.vv v1,v2,v3
  ...
  25       bne a3,zero,.L3

After this patch:
  11       beq a3,zero,.L8
  ...
  14   .L3:
  15       vsetvli a5,a3,e32,m1,ta,ma
  ...
  20       vwaddu.vx v1,a2,v3
  ...
  23       bne a3,zero,.L3

The pattern in this patch only works for DImode, i.e. the pattern below.

  v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
    + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));

Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we lose
this extend op after expand.

For uint16_t => uint32_t we have:
  (set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))

For uint32_t => uint64_t we have:
  (set (reg:DI 148 [ _6 ])
       (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))

We can see there is no zero_extend for uint16_t to uint32_t, so we cannot
hit the pattern above.  Combine will therefore try the pattern below for
uint16_t to uint32_t.

  v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
    + (vec_dup:RVVM1SImode (subreg:SImode (:DImode x2:SImode)))

But it cannot match the vwaddu semantics, so we need separate handling of
vwaddu.vv for uint16_t to uint32_t, as well as for uint8_t to uint16_t.

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*widen_frist__vx_): Add helper bridge
	pattern for vwaddu.vx combine.
	(*widen___vx_): Add new pattern to match vwaddu.vx combine.
	* config/riscv/iterators.md: Add code attr to get extend CODE.
	* config/riscv/vector-iterators.md: Add Dmode iterator for widen.

Signed-off-by: Pan Li
---
 gcc/config/riscv/autovec-opt.md      | 44 ++++++++++++++++++++++++++++
 gcc/config/riscv/iterators.md        |  3 ++
 gcc/config/riscv/vector-iterators.md | 16 ++++++++++
 3 files changed, 63 insertions(+)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 02f19bc6a42..fefd2dc63c3 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1868,6 +1868,50 @@ (define_insn_and_split "*mul_minus_vx_"
   }
   [(set_attr "type" "vimuladd")])
 
+(define_insn_and_split "*widen_frist__vx_"
+  [(set (match_operand:VWEXTI_D 0 "register_operand")
+        (vec_duplicate:VWEXTI_D
+          (any_extend:
+            (match_operand: 1 "register_operand"))))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    machine_mode d_trunc_mode = mode;
+    rtx vec_dup = gen_reg_rtx (d_trunc_mode);
+    insn_code icode = code_for_pred_broadcast (d_trunc_mode);
+    rtx vec_dup_ops[] = {vec_dup, operands[1]};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, vec_dup_ops);
+
+    icode = code_for_pred_vf2 (, mode);
+    rtx extend_ops[] = {operands[0], vec_dup};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, extend_ops);
+
+    DONE;
+  })
+
+(define_insn_and_split "*widen___vx_"
+  [(set (match_operand:VWEXTI_D 0 "register_operand")
+        (any_widen_binop:VWEXTI_D
+          (any_extend:VWEXTI_D
+            (match_operand: 1 "register_operand"))
+          (vec_duplicate:VWEXTI_D
+            (any_extend:
+              (match_operand: 2 "register_operand")))))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_dual_widen_scalar (,
+                                                       ,
+                                                       mode);
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+
+    DONE;
+  }
+  [(set_attr "type" "viwalu")])
 
 ;; =============================================================================
 ;; Combine vec_duplicate + op.vv to op.vf
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index d3002241509..41e56c721c5 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -207,6 +207,9 @@ (define_mode_attr slot12_offset [(SI "-52") (DI "-104")])
 ;; This code iterator allows signed and unsigned widening multiplications
 ;; to use the same template.
 (define_code_iterator any_extend [sign_extend zero_extend])
+(define_code_attr extend_code [
+  (sign_extend "SIGN_EXTEND") (zero_extend "ZERO_EXTEND")
+])
 
 ;; These code iterators allow unsigned and signed extraction to be generated
 ;; from the same template.
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index ed7e9c3a951..45af65642cd 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -943,6 +943,22 @@ (define_mode_iterator V_FRACT [
   (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_VECTOR_ELEN_64")
 ])
 
+(define_mode_iterator VWEXTI_D [
+  (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
+  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
+
+  (V1DI "riscv_vector::vls_mode_valid_p (V1DImode) && TARGET_VECTOR_ELEN_64")
+  (V2DI "riscv_vector::vls_mode_valid_p (V2DImode) && TARGET_VECTOR_ELEN_64")
+  (V4DI "riscv_vector::vls_mode_valid_p (V4DImode) && TARGET_VECTOR_ELEN_64")
+  (V8DI "riscv_vector::vls_mode_valid_p (V8DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 64")
+  (V16DI "riscv_vector::vls_mode_valid_p (V16DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+  (V32DI "riscv_vector::vls_mode_valid_p (V32DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 256")
+  (V64DI "riscv_vector::vls_mode_valid_p (V64DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 512")
+  (V128DI "riscv_vector::vls_mode_valid_p (V128DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 1024")
+  (V256DI "riscv_vector::vls_mode_valid_p (V256DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 2048")
+  (V512DI "riscv_vector::vls_mode_valid_p (V512DImode) && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 4096")
+])
+
 (define_mode_iterator VWEXTI [
   RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_VECTOR_ELEN_64")