From patchwork Wed Nov 27 04:53:12 2024
X-Patchwork-Submitter: Dhruv Chawla
X-Patchwork-Id: 101946
Message-ID: <43813c0a-ea45-4d70-bdb7-751613f895df@nvidia.com>
Date: Wed, 27 Nov 2024 10:23:12 +0530
To: gcc-patches@gcc.gnu.org
Cc: ktkachov@nvidia.com, richard.sandiford@arm.com
From: Dhruv Chawla
Subject: [RFC][PATCH] aarch64: Fold lsl+lsr+orr to rev for half-width shifts
This patch modifies the intrinsic expanders to expand svlsl and svlsr to
unpredicated forms when the predicate is a ptrue. It also folds the following
pattern:

  lsl <tmp1>, <src>, <shift>
  lsr <tmp2>, <src>, <shift>
  orr <dst>, <tmp1>, <tmp2>

to:

  revb/h/w <dst>, <src>

when the shift amount is equal to half the bit-width of the vector element.
This relies on the RTL combiners turning the "ior (ashift, lshiftrt)" pattern
into a "rotate" when the shift amount is half the element width. When the
element width is 16 bits (i.e. the shift amount is 8), a "bswap" is generated
instead.

While this works well, the problem is that the matchers for instructions like
SRA and ADR expect the shifts to be in an unspec form. So, to keep matching
those patterns when the unpredicated shift instructions are generated, the
patterns have to be duplicated to also accept the unpredicated form.
Looking for feedback on whether this is a good way to proceed with this
problem, or whether there is a better way to do it.

The patch was bootstrapped and regtested on aarch64-linux-gnu.

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 87e9909b55a..d91182b6454 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1947,6 +1947,33 @@ public:
   {
     return f.fold_const_binary (LSHIFT_EXPR);
   }
+
+  rtx expand (function_expander &e) const override
+  {
+    tree pred = TREE_OPERAND (e.call_expr, 3);
+    if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ())))
+      return e.use_unpred_insn (e.direct_optab_handler (ashl_optab));
+    return rtx_code_function::expand (e);
+  }
+};
+
+class svlsr_impl : public rtx_code_function
+{
+public:
+  CONSTEXPR svlsr_impl () : rtx_code_function (LSHIFTRT, LSHIFTRT) {}
+
+  gimple *fold (gimple_folder &f) const override
+  {
+    return f.fold_const_binary (RSHIFT_EXPR);
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+    tree pred = TREE_OPERAND (e.call_expr, 3);
+    if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ())))
+      return e.use_unpred_insn (e.direct_optab_handler (lshr_optab));
+    return rtx_code_function::expand (e);
+  }
 };

 class svmad_impl : public function_base
@@ -3315,7 +3342,7 @@ FUNCTION (svldnt1, svldnt1_impl,)
 FUNCTION (svlen, svlen_impl,)
 FUNCTION (svlsl, svlsl_impl,)
 FUNCTION (svlsl_wide, shift_wide, (ASHIFT, UNSPEC_ASHIFT_WIDE))
-FUNCTION (svlsr, rtx_code_function, (LSHIFTRT, LSHIFTRT))
+FUNCTION (svlsr, svlsr_impl,)
 FUNCTION (svlsr_wide, shift_wide, (LSHIFTRT, UNSPEC_LSHIFTRT_WIDE))
 FUNCTION (svmad, svmad_impl,)
 FUNCTION (svmax, rtx_code_function, (SMAX, UMAX, UNSPEC_COND_FMAX,
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 9afd11d3476..3d0bd3b8a67 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3233,6 +3233,55 @@
 ;; - REVW
 ;; -------------------------------------------------------------------------

+(define_insn_and_split "*v_rev"
+  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand" "=w")
+	(rotate:SVE_FULL_HSDI
+	  (match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
+	  (match_operand:SVE_FULL_HSDI 2 "aarch64_constant_vector_operand")))]
+  "TARGET_SVE"
+  "#"
+  "&& !reload_completed"
+  [(set (match_dup 3)
+	(ashift:SVE_FULL_HSDI (match_dup 1)
+			      (match_dup 2)))
+   (set (match_dup 0)
+	(plus:SVE_FULL_HSDI
+	  (lshiftrt:SVE_FULL_HSDI (match_dup 1)
+				  (match_dup 4))
+	  (match_dup 3)))]
+  {
+    if (aarch64_emit_opt_vec_rotate (operands[0], operands[1], operands[2]))
+      DONE;
+
+    operands[3] = gen_reg_rtx (mode);
+    rtx shift_amount = unwrap_const_vec_duplicate (operands[2]);
+    int bitwidth = GET_MODE_UNIT_BITSIZE (mode);
+    operands[4] = aarch64_simd_gen_const_vector_dup (mode,
+						     bitwidth - INTVAL (shift_amount));
+  }
+)
+
+;; The RTL combiners are able to combine "ior (ashift, lshiftrt)" to a
+;; "bswap".  Match that as well.
+(define_insn_and_split "*v_revvnx8hi"
+  [(set (match_operand:VNx8HI 0 "register_operand" "=w")
+	(bswap:VNx8HI
+	  (match_operand 1 "register_operand" "w")))]
+  "TARGET_SVE"
+  "#"
+  "&& !reload_completed"
+  [(set (match_dup 0)
+	(unspec:VNx8HI
+	  [(match_dup 2)
+	   (unspec:VNx8HI
+	     [(match_dup 1)]
+	     UNSPEC_REVB)]
+	  UNSPEC_PRED_X))]
+  {
+    operands[2] = aarch64_ptrue_reg (VNx8BImode);
+  }
+)
+
 ;; Predicated integer unary operations.
 (define_insn "@aarch64_pred_"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
@@ -4163,6 +4212,17 @@
   }
 )

+(define_expand "@aarch64_adr_shift_unpred"
+  [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
+	(plus:SVE_FULL_SDI
+	  (ashift:SVE_FULL_SDI
+	    (match_operand:SVE_FULL_SDI 2 "register_operand")
+	    (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))
+	  (match_operand:SVE_FULL_SDI 1 "register_operand")))]
+  "TARGET_SVE && TARGET_NON_STREAMING"
+  {}
+)
+
 (define_insn_and_rewrite "*aarch64_adr_shift"
   [(set (match_operand:SVE_24I 0 "register_operand" "=w")
	(plus:SVE_24I
@@ -4181,6 +4241,17 @@
   }
 )

+(define_insn "*aarch64_adr_shift_unpred"
+  [(set (match_operand:SVE_24I 0 "register_operand" "=w")
+	(plus:SVE_24I
+	  (ashift:SVE_24I
+	    (match_operand:SVE_24I 2 "register_operand" "w")
+	    (match_operand:SVE_24I 3 "const_1_to_3_operand"))
+	  (match_operand:SVE_24I 1 "register_operand" "w")))]
+  "TARGET_SVE && TARGET_NON_STREAMING"
+  "adr\t%0., [%1., %2., lsl %3]"
+)
+
 ;; Same, but with the index being sign-extended from the low 32 bits.
 (define_insn_and_rewrite "*aarch64_adr_shift_sxtw"
   [(set (match_operand:VNx2DI 0 "register_operand" "=w")
@@ -4205,6 +4276,26 @@
   }
 )

+(define_insn_and_rewrite "*aarch64_adr_shift_sxtw_unpred"
+  [(set (match_operand:VNx2DI 0 "register_operand" "=w")
+	(plus:VNx2DI
+	  (ashift:VNx2DI
+	    (unspec:VNx2DI
+	      [(match_operand 4)
+	       (sign_extend:VNx2DI
+		 (truncate:VNx2SI
+		   (match_operand:VNx2DI 2 "register_operand" "w")))]
+	      UNSPEC_PRED_X)
+	    (match_operand:VNx2DI 3 "const_1_to_3_operand"))
+	  (match_operand:VNx2DI 1 "register_operand" "w")))]
+  "TARGET_SVE && TARGET_NON_STREAMING"
+  "adr\t%0.d, [%1.d, %2.d, sxtw %3]"
+  "&& !CONSTANT_P (operands[4])"
+  {
+    operands[4] = CONSTM1_RTX (VNx2BImode);
+  }
+)
+
 ;; Same, but with the index being zero-extended from the low 32 bits.
 (define_insn_and_rewrite "*aarch64_adr_shift_uxtw"
   [(set (match_operand:VNx2DI 0 "register_operand" "=w")
@@ -4226,6 +4317,19 @@
   }
 )

+(define_insn "*aarch64_adr_shift_uxtw_unpred"
+  [(set (match_operand:VNx2DI 0 "register_operand" "=w")
+	(plus:VNx2DI
+	  (ashift:VNx2DI
+	    (and:VNx2DI
+	      (match_operand:VNx2DI 2 "register_operand" "w")
+	      (match_operand:VNx2DI 4 "aarch64_sve_uxtw_immediate"))
+	    (match_operand:VNx2DI 3 "const_1_to_3_operand"))
+	  (match_operand:VNx2DI 1 "register_operand" "w")))]
+  "TARGET_SVE && TARGET_NON_STREAMING"
+  "adr\t%0.d, [%1.d, %2.d, uxtw %3]"
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] Absolute difference
 ;; -------------------------------------------------------------------------
@@ -4804,6 +4908,9 @@
 ;; Unpredicated shift by a scalar, which expands into one of the vector
 ;; shifts below.
+;;
+;; The unpredicated form is emitted only when the shift amount is a constant
+;; value that is valid for the shift being carried out.
 (define_expand "3"
   [(set (match_operand:SVE_I 0 "register_operand")
	(ASHIFT:SVE_I
@@ -4811,20 +4918,29 @@
	  (match_operand: 2 "general_operand")))]
   "TARGET_SVE"
   {
-    rtx amount;
+    rtx amount = NULL_RTX;
     if (CONST_INT_P (operands[2]))
       {
-	amount = gen_const_vec_duplicate (mode, operands[2]);
-	if (!aarch64_sve_shift_operand (operands[2], mode))
-	  amount = force_reg (mode, amount);
+	if (aarch64_simd_shift_imm_p (operands[2], mode, _optab == ashl_optab))
+	  operands[2] = aarch64_simd_gen_const_vector_dup (mode,
+							   INTVAL (operands[2]));
+	else
+	  {
+	    amount = gen_const_vec_duplicate (mode, operands[2]);
+	    if (!aarch64_sve_shift_operand (operands[2], mode))
+	      amount = force_reg (mode, amount);
+	  }
       }
     else
       {
	amount = convert_to_mode (mode, operands[2], 0);
	amount = expand_vector_broadcast (mode, amount);
       }
-    emit_insn (gen_v3 (operands[0], operands[1], amount));
-    DONE;
+
+    if (amount)
+      {
+	emit_insn (gen_v3 (operands[0], operands[1], amount));
+	DONE;
+      }
   }
 )

@@ -4868,27 +4984,27 @@
   ""
 )

-;; Unpredicated shift operations by a constant (post-RA only).
+;; Unpredicated shift operations by a constant.
 ;; These are generated by splitting a predicated instruction whose
 ;; predicate is unused.
-(define_insn "*post_ra_v_ashl3"
+(define_insn "*v_ashl3"
   [(set (match_operand:SVE_I 0 "register_operand")
	(ashift:SVE_I
	  (match_operand:SVE_I 1 "register_operand")
	  (match_operand:SVE_I 2 "aarch64_simd_lshift_imm")))]
-  "TARGET_SVE && reload_completed"
+  "TARGET_SVE"
   {@ [ cons: =0 , 1 , 2 ]
      [ w , w , vs1 ] add\t%0., %1., %1.
     [ w , w , Dl ] lsl\t%0., %1., #%2
   }
 )

-(define_insn "*post_ra_v_3"
+(define_insn "*v_3"
   [(set (match_operand:SVE_I 0 "register_operand" "=w")
	(SHIFTRT:SVE_I
	  (match_operand:SVE_I 1 "register_operand" "w")
	  (match_operand:SVE_I 2 "aarch64_simd_rshift_imm")))]
-  "TARGET_SVE && reload_completed"
+  "TARGET_SVE"
   "\t%0., %1., #%2"
 )
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 66affa85d36..b3fb0460b70 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1876,6 +1876,16 @@
   }
 )

+(define_expand "@aarch64_sve_add__unpred"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(plus:SVE_FULL_I
+	  (SHIFTRT:SVE_FULL_I
+	    (match_operand:SVE_FULL_I 2 "register_operand")
+	    (match_operand:SVE_FULL_I 3 "aarch64_simd_rshift_imm"))
+	  (match_operand:SVE_FULL_I 1 "register_operand")))]
+  "TARGET_SVE2"
+)
+
 ;; Pattern-match SSRA and USRA as a predicated operation whose predicate
 ;; isn't needed.
 (define_insn_and_rewrite "*aarch64_sve2_sra"
@@ -1899,6 +1909,20 @@
   }
 )

+(define_insn "*aarch64_sve2_sra_unpred"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(plus:SVE_FULL_I
+	  (SHIFTRT:SVE_FULL_I
+	    (match_operand:SVE_FULL_I 2 "register_operand")
+	    (match_operand:SVE_FULL_I 3 "aarch64_simd_rshift_imm"))
+	  (match_operand:SVE_FULL_I 1 "register_operand")))]
+  "TARGET_SVE2"
+  {@ [ cons: =0 , 1 , 2 ; attrs: movprfx ]
+     [ w , 0 , w ; * ] sra\t%0., %2., #%3
+     [ ?&w , w , w ; yes ] movprfx\t%0, %1\;sra\t%0., %2., #%3
+  }
+)
+
 ;; SRSRA and URSRA.
 (define_insn "@aarch64_sve_add_"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
@@ -2539,6 +2563,18 @@
   "addhnb\t%0., %2., %3."
 )

+(define_insn "*bitmask_shift_plus_unpred"
+  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand" "=w")
+	(lshiftrt:SVE_FULL_HSDI
+	  (plus:SVE_FULL_HSDI
+	    (match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
+	    (match_operand:SVE_FULL_HSDI 2 "register_operand" "w"))
+	  (match_operand:SVE_FULL_HSDI 3
+	    "aarch64_simd_shift_imm_vec_exact_top" "")))]
+  "TARGET_SVE2"
+  "addhnb\t%0., %1., %2."
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] Narrowing right shifts
 ;; -------------------------------------------------------------------------
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
new file mode 100644
index 00000000000..3a30f80d152
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+sve" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** ror32_sve_lsl_imm:
+**	ptrue	p3.b, all
+**	revw	z0.d, p3/m, z0.d
+**	ret
+*/
+svuint64_t
+ror32_sve_lsl_imm (svuint64_t r)
+{
+  return svorr_u64_z (svptrue_b64 (), svlsl_n_u64_z (svptrue_b64 (), r, 32),
+		      svlsr_n_u64_z (svptrue_b64 (), r, 32));
+}
+
+/*
+** ror32_sve_lsl_operand:
+**	ptrue	p3.b, all
+**	revw	z0.d, p3/m, z0.d
+**	ret
+*/
+svuint64_t
+ror32_sve_lsl_operand (svuint64_t r)
+{
+  svbool_t pt = svptrue_b64 ();
+  return svorr_u64_z (pt, svlsl_n_u64_z (pt, r, 32), svlsr_n_u64_z (pt, r, 32));
+}
+
+/*
+** ror16_sve_lsl_imm:
+**	ptrue	p3.b, all
+**	revh	z0.s, p3/m, z0.s
+**	ret
+*/
+svuint32_t
+ror16_sve_lsl_imm (svuint32_t r)
+{
+  return svorr_u32_z (svptrue_b32 (), svlsl_n_u32_z (svptrue_b32 (), r, 16),
+		      svlsr_n_u32_z (svptrue_b32 (), r, 16));
+}
+
+/*
+** ror16_sve_lsl_operand:
+**	ptrue	p3.b, all
+**	revh	z0.s, p3/m, z0.s
+**	ret
+*/
+svuint32_t
+ror16_sve_lsl_operand (svuint32_t r)
+{
+  svbool_t pt = svptrue_b32 ();
+  return svorr_u32_z (pt, svlsl_n_u32_z (pt, r, 16),
+		      svlsr_n_u32_z (pt, r, 16));
+}
+
+/*
+** ror8_sve_lsl_imm:
+**	ptrue	p3.b, all
+**	revb	z0.h, p3/m, z0.h
+**	ret
+*/
+svuint16_t
+ror8_sve_lsl_imm (svuint16_t r)
+{
+  return svorr_u16_z (svptrue_b16 (), svlsl_n_u16_z (svptrue_b16 (), r, 8),
+		      svlsr_n_u16_z (svptrue_b16 (), r, 8));
+}
+
+/*
+** ror8_sve_lsl_operand:
+**	ptrue	p3.b, all
+**	revb	z0.h, p3/m, z0.h
+**	ret
+*/
+svuint16_t
+ror8_sve_lsl_operand (svuint16_t r)
+{
+  svbool_t pt = svptrue_b16 ();
+  return svorr_u16_z (pt, svlsl_n_u16_z (pt, r, 8), svlsr_n_u16_z (pt, r, 8));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c
new file mode 100644
index 00000000000..89d5a8a8b3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+sve" } */
+
+#include <arm_sve.h>
+
+#define PTRUE_B(BITWIDTH) svptrue_b##BITWIDTH ()
+
+#define ROR_SVE_LSL(NAME, INPUT_TYPE, SHIFT_AMOUNT, BITWIDTH)		\
+  INPUT_TYPE								\
+  NAME##_imm (INPUT_TYPE r)						\
+  {									\
+    return svorr_u##BITWIDTH##_z (PTRUE_B (BITWIDTH),			\
+				  svlsl_n_u##BITWIDTH##_z (PTRUE_B (BITWIDTH), \
+							   r, SHIFT_AMOUNT), \
+				  svlsr_n_u##BITWIDTH##_z (PTRUE_B (BITWIDTH), \
+							   r, SHIFT_AMOUNT)); \
+  }									\
+									\
+  INPUT_TYPE								\
+  NAME##_operand (INPUT_TYPE r)						\
+  {									\
+    svbool_t pt = PTRUE_B (BITWIDTH);					\
+    return svorr_u##BITWIDTH##_z (					\
+      pt, svlsl_n_u##BITWIDTH##_z (pt, r, SHIFT_AMOUNT),		\
+      svlsr_n_u##BITWIDTH##_z (pt, r, SHIFT_AMOUNT));			\
+  }
+
+/* Make sure that the pattern doesn't match incorrect bit-widths, e.g. a shift
+   of 8 matching the 32-bit mode.  */
+
+ROR_SVE_LSL (higher_ror32, svuint64_t, 64, 64);
+ROR_SVE_LSL (higher_ror16, svuint32_t, 32, 32);
+ROR_SVE_LSL (higher_ror8, svuint16_t, 16, 16);
+
+ROR_SVE_LSL (lower_ror32, svuint64_t, 16, 64);
+ROR_SVE_LSL (lower_ror16, svuint32_t, 8, 32);
+ROR_SVE_LSL (lower_ror8, svuint16_t, 4, 16);
+
+/* Check off-by-one cases.  */
+
+ROR_SVE_LSL (off_1_high_ror32, svuint64_t, 33, 64);
+ROR_SVE_LSL (off_1_high_ror16, svuint32_t, 17, 32);
+ROR_SVE_LSL (off_1_high_ror8, svuint16_t, 9, 16);
+
+ROR_SVE_LSL (off_1_low_ror32, svuint64_t, 31, 64);
+ROR_SVE_LSL (off_1_low_ror16, svuint32_t, 15, 32);
+ROR_SVE_LSL (off_1_low_ror8, svuint16_t, 7, 16);
+
+/* Check out of bounds cases.  */
+
+ROR_SVE_LSL (oob_ror32, svuint64_t, 65, 64);
+ROR_SVE_LSL (oob_ror16, svuint32_t, 33, 32);
+ROR_SVE_LSL (oob_ror8, svuint16_t, 17, 16);
+
+/* Check zero case.  */
+
+ROR_SVE_LSL (zero_ror32, svuint64_t, 0, 64);
+ROR_SVE_LSL (zero_ror16, svuint32_t, 0, 32);
+ROR_SVE_LSL (zero_ror8, svuint16_t, 0, 16);
+
+/* { dg-final { scan-assembler-times "revb" 0 } } */
+/* { dg-final { scan-assembler-times "revh" 0 } } */
+/* { dg-final { scan-assembler-times "revw" 0 } } */