From patchwork Fri Feb 23 16:25:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Pinski
X-Patchwork-Id: 86288
X-Original-To: gcc-patches@gcc.gnu.org
From: Andrew Pinski
To:
CC: Andrew Pinski
Subject: [PATCH] aarch64: Fix costing of manual bfi instructions
Date: Fri, 23 Feb 2024 08:25:13 -0800
Message-ID: <20240223162513.3632569-1-quic_apinski@quicinc.com>
X-Mailer: git-send-email 2.34.1
List-Id: Gcc-patches mailing list

This fixes the cost model for BFI instructions which don't directly use
zero_extract on the LHS.
aarch64_bfi_rtx_p does the heavy lifting by matching the patterns.

Note this alone does not fix PR 107270, but it is a step in the right
direction. There we get a zero_extend for the non-shifted part, which we
don't currently match.

Built and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_bfi_rtx_p): New function.
	(aarch64_rtx_costs): For IOR, try calling aarch64_bfi_rtx_p.

Signed-off-by: Andrew Pinski
---
 gcc/config/aarch64/aarch64.cc | 94 +++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3d8341c17fe..dc5c5c23cb3 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -13776,6 +13776,90 @@ aarch64_extr_rtx_p (rtx x, rtx *res_op0, rtx *res_op1)
   return false;
 }
 
+/* Return true iff X is an rtx that will match a bfi instruction
+   i.e. as described in the *aarch64_bfi5 family of patterns.
+   *RES_OP0 and *RES_OP1 will be set to the operands of the insert involved
+   on success and will be NULL_RTX otherwise.  */
+
+static bool
+aarch64_bfi_rtx_p (rtx x, rtx *res_op0, rtx *res_op1)
+{
+  rtx op0, op1;
+  scalar_int_mode mode;
+
+  *res_op0 = NULL_RTX;
+  *res_op1 = NULL_RTX;
+  if (!is_a <scalar_int_mode> (GET_MODE (x), &mode))
+    return false;
+
+  if (GET_CODE (x) != IOR)
+    return false;
+
+  unsigned HOST_WIDE_INT mask1;
+  unsigned HOST_WIDE_INT shft_amnt;
+  unsigned HOST_WIDE_INT mask2;
+  rtx shiftop;
+
+  rtx iop0 = XEXP (x, 0);
+  rtx iop1 = XEXP (x, 1);
+
+  if (GET_CODE (iop0) == AND
+      && CONST_INT_P (XEXP (iop0, 1))
+      && GET_CODE (XEXP (iop0, 0)) != ASHIFT)
+    {
+      op0 = XEXP (iop0, 0);
+      mask1 = UINTVAL (XEXP (iop0, 1));
+      shiftop = iop1;
+    }
+  else if (GET_CODE (iop1) == AND
+           && CONST_INT_P (XEXP (iop1, 1))
+           && GET_CODE (XEXP (iop1, 0)) != ASHIFT)
+    {
+      op0 = XEXP (iop1, 0);
+      mask1 = UINTVAL (XEXP (iop1, 1));
+      shiftop = iop0;
+    }
+  else
+    return false;
+
+  /* Shifted with no mask.  */
+  if (GET_CODE (shiftop) == ASHIFT
+      && CONST_INT_P (XEXP (shiftop, 1)))
+    {
+      shft_amnt = UINTVAL (XEXP (shiftop, 1));
+      mask2 = HOST_WIDE_INT_M1U << shft_amnt;
+      op1 = XEXP (shiftop, 0);
+    }
+  else if (GET_CODE (shiftop) == AND
+           && CONST_INT_P (XEXP (shiftop, 1)))
+    {
+      mask2 = UINTVAL (XEXP (shiftop, 1));
+      if (GET_CODE (XEXP (shiftop, 0)) == ASHIFT
+          && CONST_INT_P (XEXP (XEXP (shiftop, 0), 1)))
+        {
+          op1 = XEXP (XEXP (shiftop, 0), 0);
+          shft_amnt = UINTVAL (XEXP (XEXP (shiftop, 0), 1));
+        }
+      else
+        {
+          op1 = XEXP (shiftop, 0);
+          shft_amnt = 0;
+        }
+    }
+  else
+    return false;
+
+  if (shft_amnt >= GET_MODE_BITSIZE (mode))
+    return false;
+
+  if (!aarch64_masks_and_shift_for_bfi_p (mode, mask1, shft_amnt, mask2))
+    return false;
+
+  *res_op0 = op0;
+  *res_op1 = op1;
+  return true;
+}
+
 /* Calculate the cost of calculating (if_then_else (OP0) (OP1) (OP2)),
    storing it in *COST.  Result is true if the total cost of the operation
    has now been calculated.  */
@@ -14662,6 +14746,16 @@ cost_plus:
           return true;
         }
 
+      if (aarch64_bfi_rtx_p (x, &op0, &op1))
+        {
+          *cost += rtx_cost (op0, mode, IOR, 0, speed);
+          *cost += rtx_cost (op1, mode, IOR, 1, speed);
+          if (speed)
+            *cost += extra_cost->alu.bfi;
+
+          return true;
+        }
+
       if (aarch64_extr_rtx_p (x, &op0, &op1))
         {
           *cost += rtx_cost (op0, mode, IOR, 0, speed);
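
For reference, the IOR-of-ANDs shape that aarch64_bfi_rtx_p looks for can be pictured at the source level with the small standalone sketch below. It is not part of the patch and is not GCC code: bfi_model, manual_insert and masks_fit_bfi are hypothetical names. The mask test is only an assumed, simplified stand-in for the kind of condition aarch64_masks_and_shift_for_bfi_p is relied on for above, namely complementary masks and a contiguous inserted field starting at the shift amount.

```cpp
#include <cstdint>

// Hypothetical reference model of a 64-bit bit-field insert: copy the low
// `width` bits of `src` into `dst` starting at bit `lsb` (lsb < 64 assumed).
constexpr std::uint64_t bfi_model (std::uint64_t dst, std::uint64_t src,
                                   unsigned lsb, unsigned width)
{
  std::uint64_t low = (width == 64) ? ~0ull : ((1ull << width) - 1);
  std::uint64_t field = low << lsb;
  return (dst & ~field) | ((src << lsb) & field);
}

// The "manual" form: (a & mask1) | ((b << shift) & mask2), with
// mask1 = ~0xff00, shift = 8, mask2 = 0xff00.
constexpr std::uint64_t manual_insert (std::uint64_t a, std::uint64_t b)
{
  return (a & ~0xff00ull) | ((b << 8) & 0xff00ull);
}

// Simplified, assumed model of the mask/shift compatibility check.
constexpr bool masks_fit_bfi (std::uint64_t mask1, unsigned shift,
                              std::uint64_t mask2)
{
  // The kept bits and the inserted bits must partition the register.
  if (mask1 != ~mask2)
    return false;
  // An empty or full-width field is not an insert.
  if (mask2 == 0 || mask2 == ~0ull)
    return false;
  // mask2 must be one contiguous run of ones whose lowest bit is `shift`:
  // adding 1 << shift then carries through the run, leaving at most one bit.
  std::uint64_t t = mask2 + (1ull << shift);
  return t == (t & (0 - t));
}

static_assert (manual_insert (0x1234, 0xAB) == bfi_model (0x1234, 0xAB, 8, 8),
               "manual form agrees with the insert model");
static_assert (masks_fit_bfi (~0xff00ull, 8, 0xff00ull),
               "8-bit field at bit 8 is insertable");
static_assert (!masks_fit_bfi (~0xff00ull, 4, 0xff00ull),
               "field does not start at the shift amount");
static_assert (masks_fit_bfi (0xffull, 8, ~0xffull),
               "shift with no mask: field runs to the top bit");
```

The last assertion corresponds to the "Shifted with no mask" branch, where mask2 is derived from the shift amount alone.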