Message ID | Z1BIjoIUZDTLQAd5@arm.com |
---|---|
State | New |
Series | None |
Commit Message
Tamar Christina
Dec. 4, 2024, 12:18 p.m. UTC
Hi All,

This patch adds support for constructing a full SVE vector from two
partial SVE vectors.  It also implements support for the standard
vec_init optab to do this.

gcc/ChangeLog:

	PR target/96342
	* config/aarch64/aarch64-sve.md (vec_init<mode><Vhalf>): New.
	(@aarch64_pack_partial<mode>): New.
	* config/aarch64/aarch64.cc (aarch64_sve_expand_vector_init): Special
	case constructors of two vectors.
	* config/aarch64/iterators.md (SVE_NO2E, SVE_PARTIAL_NO2E): New.
	(VHALF, Vhalf, Vwstype): Add SVE partial vectors.

gcc/testsuite/ChangeLog:

	PR target/96342
	* gcc.target/aarch64/vect-simd-clone-2.c: New test.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu (-m32 and -m64) with
no issues.

Ok for master?

Thanks,
Tamar

---
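For illustration, the kind of source this targets can be sketched as
follows.  This is my own example, not from the patch or the thread; the
mode mapping is an assumption that relies on -march=armv8-a+sve
-msve-vector-bits=512, the same types and flags as the GIMPLE reproducer
quoted in the review below.  Whether plain C like this is presented to
the backend as a two-vector constructor depends on how the middle end
lowers it; the -fgimple test later in the thread exercises the
constructor path directly.

/* Hypothetical example: with -msve-vector-bits=512, v16si is assumed to
   map to the full SVE mode and v8si to a partial SVE mode, so building a
   v16si from two v8si halves is the two-vector constructor that the new
   vec_init<mode><Vhalf> expander can turn into a single UZP1 instead of
   a round trip through memory.  */
typedef unsigned int v8si __attribute__ ((vector_size (32)));
typedef unsigned int v16si __attribute__ ((vector_size (64)));

v16si
concat_halves (v8si x, v8si y)
{
  /* Indices 0-7 select elements of x, 8-15 elements of y.  */
  return __builtin_shufflevector (x, y, 0, 1, 2, 3, 4, 5, 6, 7,
                                  8, 9, 10, 11, 12, 13, 14, 15);
}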
Comments
ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Wednesday, December 4, 2024 12:18 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> ktkachov@gcc.gnu.org; Richard Sandiford <Richard.Sandiford@arm.com>
> Subject: [PATCH 7/7]AArch64: Implement vector concat of partial SVE vectors
>
> [...]
Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> This patch adds support for constructing a full SVE vector from two
> partial SVE vectors.  It also implements support for the standard
> vec_init optab to do this.
>
> [...]

This triggers an ICE for:

typedef unsigned int v8si __attribute__((vector_size(32)));
typedef unsigned int v16si __attribute__((vector_size(64)));

v16si __GIMPLE foo (v8si x, v8si y)
{
  v16si res;
  res = _Literal (v16si) { x, y };
  return res;
}

compiled with -O2 -march=armv8-a+sve -msve-vector-bits=512 -fgimple.
Suggested fix below.

> +(define_expand "vec_init<mode><Vhalf>"
> +  [(match_operand:SVE_NO2E 0 "register_operand")
> +	(match_operand 1 "")]

Nit: excess indentation.

> +  "TARGET_SVE"
> +  {
> +    aarch64_sve_expand_vector_init (operands[0], operands[1]);
> +    DONE;
> +  }
> +)
>
> [...]
>
> +;; Integer partial pack packing two partial SVE types into a single full SVE
> +;; type of the same element type.  Use UZP1 on the wider type, which discards
> +;; the high part of each wide element.  This allows to concat SVE partial types
> +;; into a wider vector.
> +(define_insn "@aarch64_pack_partial<mode>"
> +  [(set (match_operand:SVE_PARTIAL_NO2E 0 "register_operand" "=w")
> +	(unspec:SVE_PARTIAL_NO2E
> +	  [(match_operand:<VHALF> 1 "register_operand" "w")
> +	   (match_operand:<VHALF> 2 "register_operand" "w")]
> +	  UNSPEC_PACK))]
> +  "TARGET_SVE"
> +  "uzp1\t%0.<Vwstype>, %1.<Vwstype>, %2.<Vwstype>"
> +)

To fix the ICE above, I think we should define this pattern for all
SVE_NO2E.  We can also make it a vec_concat, which should work for both
endiannesses.

Rather than use Vwstype, I think this is conceptually a permute of the
containers, so should use Vctype.  That will change VNx4QI from using .h
(as in the patch) to .s (to match VNx4SI), but both work.

> +  /* If we have two elements and are concatting vector.  */
> +  machine_mode elem_mode = GET_MODE (v.elt (0));
> +  if (nelts == 2 && VECTOR_MODE_P (elem_mode))
> +    {
> +      /* We've failed expansion using a dup.  Try using a cheeky truncate.  */
> +      rtx arg0 = force_reg (elem_mode, v.elt(0));
> +      rtx arg1 = force_reg (elem_mode, v.elt(1));
> +      emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
> +      return;
> +    }

I think it'd be better to use an independent routine for this, since
there's not really any overlap with the scalar-element code.  In
particular, we might as well get the vectors directly from
XVECEXP (val, 0, ...), since we don't need the rtx_vector_builder for
the expansion.

> +			 (VNx4SI "VNx2SI") (VNx2SI "SI")
> +			 (VNx4SF "VNx2SF") (VNx2SF "SF")
> +			 (VNx2DI "DI") (VNx2DF "DF")])

Are the x2 entries necessary, given that the new uses are restricted
to NO2E?

Thanks,
Richard

> [...]
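To make the "permute of the containers" view concrete, here is a rough
little-endian scalar model (my own sketch, not code from the thread) of
what UZP1 at the container size does when each input is a partial vector
keeping one payload element in the low half of every container:

#include <stdint.h>
#include <stddef.h>

/* Little-endian model of "uzp1 zd.h, zn.h, zm.h" packing two partial
   VNx4HI-style inputs (one 16-bit payload in the low half of each 32-bit
   container) into one contiguous vector.  Each register is viewed as
   "units" 16-bit lanes; UZP1 takes the even-numbered lanes of zn followed
   by the even-numbered lanes of zm, which on little-endian are exactly
   the payload halves of the containers.  */
static void
uzp1_h (uint16_t *zd, const uint16_t *zn, const uint16_t *zm, size_t units)
{
  for (size_t i = 0; i < units / 2; i++)
    zd[i] = zn[2 * i];               /* Low half of zn's container i.  */
  for (size_t i = 0; i < units / 2; i++)
    zd[units / 2 + i] = zm[2 * i];   /* Low half of zm's container i.  */
}

int
main (void)
{
  uint16_t zn[4] = { 1, 0, 2, 0 };   /* Payloads {1, 2}.  */
  uint16_t zm[4] = { 3, 0, 4, 0 };   /* Payloads {3, 4}.  */
  uint16_t zd[4];
  uzp1_h (zd, zn, zm, 4);            /* zd == { 1, 2, 3, 4 }.  */
  return 0;
}

This is also why vec_concat is a natural RTL representation here: the
instruction really does produce the payloads of operand 1 followed by
the payloads of operand 2.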
> > +			 (VNx4SI "VNx2SI") (VNx2SI "SI")
> > +			 (VNx4SF "VNx2SF") (VNx2SF "SF")
> > +			 (VNx2DI "DI") (VNx2DF "DF")])
>
> Are the x2 entries necessary, given that the new uses are restricted
> to NO2E?

No, but I wanted to keep the symmetry with the Adv. SIMD modes.  Since
the mode attributes don't really control the number of alternatives I
thought it would be better to have the attributes be "fully" defined
rather than only the subset I use.

gcc/ChangeLog:

	PR target/96342
	* config/aarch64/aarch64-sve.md (vec_init<mode><Vhalf>): New.
	(@aarch64_pack_partial<mode>): New.
	* config/aarch64/aarch64.cc
	(aarch64_sve_expand_vector_init_subvector): New.
	* config/aarch64/iterators.md (SVE_NO2E): New.
	(VHALF, Vhalf): Add SVE partial vectors.

gcc/testsuite/ChangeLog:

	PR target/96342
	* gcc.target/aarch64/vect-simd-clone-2.c: New test.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu (-m32 and -m64) with
no issues.

Ok for master?

Thanks,
Tamar

-- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index a72ca2a500d394598268c6adfe717eed94a304b3..8ed4221dbe5c49db97b37f186365fa391900eadb 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2839,6 +2839,16 @@ (define_expand "vec_init<mode><Vel>"
   }
 )
 
+(define_expand "vec_init<mode><Vhalf>"
+  [(match_operand:SVE_NO2E 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_SVE"
+  {
+    aarch64_sve_expand_vector_init (operands[0], operands[1]);
+    DONE;
+  }
+)
+
 ;; Shift an SVE vector left and insert a scalar into element 0.
 (define_insn "vec_shl_insert_<mode>"
   [(set (match_operand:SVE_FULL 0 "register_operand")
@@ -9289,6 +9299,19 @@ (define_insn "vec_pack_trunc_<Vwide>"
   "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
 )
 
+;; Integer partial pack packing two partial SVE types into a single full SVE
+;; type of the same element type.  Use UZP1 on the wider type, which discards
+;; the high part of each wide element.  This allows to concat SVE partial types
+;; into a wider vector.
+(define_insn "@aarch64_pack_partial<mode>"
+  [(set (match_operand:SVE_NO2E 0 "register_operand" "=w")
+	(vec_concat:SVE_NO2E
+	  (match_operand:<VHALF> 1 "register_operand" "w")
+	  (match_operand:<VHALF> 2 "register_operand" "w")))]
+  "TARGET_SVE"
+  "uzp1\t%0.<Vctype>, %1.<Vctype>, %2.<Vctype>"
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT<-INT] Unpacks
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index de4c0a0783912b54ac35d7c818c24574b27a4ca0..40214e318f3c4e30e619d96073b253887c973efc 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24859,6 +24859,17 @@ aarch64_sve_expand_vector_init (rtx target, rtx vals)
     v.quick_push (XVECEXP (vals, 0, i));
   v.finalize ();
 
+  /* If we have two elements and are concatting vector.  */
+  machine_mode elem_mode = GET_MODE (v.elt (0));
+  if (nelts == 2 && VECTOR_MODE_P (elem_mode))
+    {
+      /* We've failed expansion using a dup.  Try using a cheeky truncate.  */
+      rtx arg0 = force_reg (elem_mode, v.elt(0));
+      rtx arg1 = force_reg (elem_mode, v.elt(1));
+      emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
+      return;
+    }
+
   /* If neither sub-vectors of v could be initialized specially,
      then use INSR to insert all elements from v into TARGET.
      ??? This might not be optimal for vectors with large
@@ -24870,6 +24881,30 @@ aarch64_sve_expand_vector_init (rtx target, rtx vals)
   aarch64_sve_expand_vector_init_insert_elems (target, v, nelts);
 }
 
+/* Initialize register TARGET from the two vector subelements in PARALLEL
+   rtx VALS.  */
+
+void
+aarch64_sve_expand_vector_init_subvector (rtx target, rtx vals)
+{
+  machine_mode mode = GET_MODE (target);
+  int nelts = XVECLEN (vals, 0);
+
+  gcc_assert (nelts == 2);
+
+  rtx arg0 = XVECEXP (vals, 0, 0);
+  rtx arg1 = XVECEXP (vals, 0, 1);
+
+  /* If we have two elements and are concatting vector.  */
+  machine_mode elem_mode = GET_MODE (arg0);
+  gcc_assert (VECTOR_MODE_P (elem_mode));
+
+  arg0 = force_reg (elem_mode, arg0);
+  arg1 = force_reg (elem_mode, arg1);
+  emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
+  return;
+}
+
 /* Check whether VALUE is a vector constant in which every element
    is either a power of 2 or a negated power of 2.  If so, return
    a constant vector of log2s, and flip CODE between PLUS and MINUS
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 89c72b24aeb791adbbd3edfdb131478d52b248e6..09c2d24c4b8f1f39c27ea691f7cfe0b51bc4f788 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -140,6 +140,10 @@ (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI])
 ;; VQ without 2 element modes.
 (define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V8HF V4SF V8BF])
 
+;; SVE modes without 2 element modes.
+(define_mode_iterator SVE_NO2E [VNx16QI VNx8QI VNx4QI VNx8HI VNx4HI VNx8HF
+				VNx4HF VNx8BF VNx4BF VNx4SI VNx4SF])
+
 ;; 2 element quad vector modes.
 (define_mode_iterator VQ_2E [V2DI V2DF])
 
@@ -1737,7 +1741,15 @@ (define_mode_attr VHALF [(V8QI "V4QI") (V16QI "V8QI")
 			 (V2DI "DI")    (V2SF  "SF")
 			 (V4SF "V2SF")  (V4HF "V2HF")
 			 (V8HF "V4HF")  (V2DF  "DF")
-			 (V8BF "V4BF")])
+			 (V8BF "V4BF")
+			 (VNx16QI "VNx8QI") (VNx8QI "VNx4QI")
+			 (VNx4QI "VNx2QI") (VNx2QI "QI")
+			 (VNx8HI "VNx4HI") (VNx4HI "VNx2HI") (VNx2HI "HI")
+			 (VNx8HF "VNx4HF") (VNx4HF "VNx2HF") (VNx2HF "HF")
+			 (VNx8BF "VNx4BF") (VNx4BF "VNx2BF") (VNx2BF "BF")
+			 (VNx4SI "VNx2SI") (VNx2SI "SI")
+			 (VNx4SF "VNx2SF") (VNx2SF "SF")
+			 (VNx2DI "DI") (VNx2DF "DF")])
 
 ;; Half modes of all vector modes, in lower-case.
 (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
@@ -1745,7 +1757,15 @@ (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
 			 (V8HF "v4hf")  (V8BF "v4bf")
 			 (V2SI "si")    (V4SI  "v2si")
 			 (V2DI "di")    (V2SF  "sf")
-			 (V4SF "v2sf")  (V2DF  "df")])
+			 (V4SF "v2sf")  (V2DF  "df")
+			 (VNx16QI "vnx8qi") (VNx8QI "vnx4qi")
+			 (VNx4QI "vnx2qi") (VNx2QI "qi")
+			 (VNx8HI "vnx4hi") (VNx4HI "vnx2hi") (VNx2HI "hi")
+			 (VNx8HF "vnx4hf") (VNx4HF "vnx2hf") (VNx2HF "hf")
+			 (VNx8BF "vnx4bf") (VNx4BF "vnx2bf") (VNx2BF "bf")
+			 (VNx4SI "vnx2si") (VNx2SI "si")
+			 (VNx4SF "vnx2sf") (VNx2SF "sf")
+			 (VNx2DI "di") (VNx2DF "df")])
 
 ;; Single-element half modes of quad vector modes.
 (define_mode_attr V1HALF [(V2DI "V1DI") (V2DF "V1DF")])
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..a25cae2708dd18cc91a7732f845419bbdb06c5c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99" } */
+/* { dg-additional-options "-O3 -march=armv8-a" } */
+
+#pragma GCC target ("+sve")
+extern char __attribute__ ((simd, const)) fn3 (int, char);
+void test_fn3 (int *a, int *b, char *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = (int) (fn3 (b[i], c[i]) + c[i]);
+}
+
+/* { dg-final { scan-assembler {\s+_ZGVsMxvv_fn3\n} } } */
Tamar Christina <Tamar.Christina@arm.com> writes:
>>> +			 (VNx4SI "VNx2SI") (VNx2SI "SI")
>>> +			 (VNx4SF "VNx2SF") (VNx2SF "SF")
>>> +			 (VNx2DI "DI") (VNx2DF "DF")])
>>
>> Are the x2 entries necessary, given that the new uses are restricted
>> to NO2E?
>
> No, but I wanted to keep the symmetry with the Adv. SIMD modes.  Since
> the mode attributes don't really control the number of alternatives I
> thought it would be better to have the attributes be "fully" defined
> rather than only the subset I use.

But these are variable-length modes, so DI is only half of VNx2DI for
the minimum vector length.  It's less than half for Neoverse V1 or
A64FX.

IMO it'd be better to leave them out for now and define them when
needed, at which point the right choice would be more obvious.

Thanks,
Richard

> [...]
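As a quick numeric sketch of that point (my own illustration; the lane
counts are assumed from the SVE mode naming, where VNxK means K lanes
per 128-bit granule):

#include <stdio.h>

/* Lanes of a VNxK SVE mode at a given vector length in bits:
   K lanes per 128-bit granule.  */
static unsigned
sve_lanes (unsigned lanes_per_granule, unsigned vl_bits)
{
  return lanes_per_granule * (vl_bits / 128);
}

int
main (void)
{
  /* VNx2DI has 2 DI lanes per granule, so a scalar DI is half of it
     only at the minimum 128-bit vector length; at 256 bits it is a
     quarter, and so on.  A fixed (VNx2DI "DI") VHALF entry would
     therefore only be right for one vector length.  */
  for (unsigned vl = 128; vl <= 512; vl *= 2)
    printf ("VL=%u: VNx2DI has %u DI lanes\n", vl, sve_lanes (2, vl));
  return 0;
}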
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Thursday, December 19, 2024 11:03 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; ktkachov@gcc.gnu.org
> Subject: Re: [PATCH 7/7]AArch64: Implement vector concat of partial SVE
> vectors
>
> [...]
>
> But these are variable-length modes, so DI is only half of VNx2DI for
> the minimum vector length.  It's less than half for Neoverse V1 or
> A64FX.
>
> IMO it'd be better to leave them out for now and define them when
> needed, at which point the right choice would be more obvious.

OK.

gcc/ChangeLog:

	PR target/96342
	* config/aarch64/aarch64-sve.md (vec_init<mode><Vhalf>): New.
	(@aarch64_pack_partial<mode>): New.
	* config/aarch64/aarch64.cc
	(aarch64_sve_expand_vector_init_subvector): New.
	* config/aarch64/iterators.md (SVE_NO2E): New.
	(VHALF, Vhalf): Add SVE partial vectors.

gcc/testsuite/ChangeLog:

	PR target/96342
	* gcc.target/aarch64/vect-simd-clone-2.c: New test.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

-- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index a72ca2a500d394598268c6adfe717eed94a304b3..8ed4221dbe5c49db97b37f186365fa391900eadb 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2839,6 +2839,16 @@ (define_expand "vec_init<mode><Vel>"
   }
 )
 
+(define_expand "vec_init<mode><Vhalf>"
+  [(match_operand:SVE_NO2E 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_SVE"
+  {
+    aarch64_sve_expand_vector_init (operands[0], operands[1]);
+    DONE;
+  }
+)
+
 ;; Shift an SVE vector left and insert a scalar into element 0.
 (define_insn "vec_shl_insert_<mode>"
   [(set (match_operand:SVE_FULL 0 "register_operand")
@@ -9289,6 +9299,19 @@ (define_insn "vec_pack_trunc_<Vwide>"
   "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
 )
 
+;; Integer partial pack packing two partial SVE types into a single full SVE
+;; type of the same element type.  Use UZP1 on the wider type, which discards
+;; the high part of each wide element.  This allows to concat SVE partial types
+;; into a wider vector.
+(define_insn "@aarch64_pack_partial<mode>"
+  [(set (match_operand:SVE_NO2E 0 "register_operand" "=w")
+	(vec_concat:SVE_NO2E
+	  (match_operand:<VHALF> 1 "register_operand" "w")
+	  (match_operand:<VHALF> 2 "register_operand" "w")))]
+  "TARGET_SVE"
+  "uzp1\t%0.<Vctype>, %1.<Vctype>, %2.<Vctype>"
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT<-INT] Unpacks
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index de4c0a0783912b54ac35d7c818c24574b27a4ca0..40214e318f3c4e30e619d96073b253887c973efc 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24859,6 +24859,17 @@ aarch64_sve_expand_vector_init (rtx target, rtx vals)
     v.quick_push (XVECEXP (vals, 0, i));
   v.finalize ();
 
+  /* If we have two elements and are concatting vector.  */
+  machine_mode elem_mode = GET_MODE (v.elt (0));
+  if (nelts == 2 && VECTOR_MODE_P (elem_mode))
+    {
+      /* We've failed expansion using a dup.  Try using a cheeky truncate.  */
+      rtx arg0 = force_reg (elem_mode, v.elt(0));
+      rtx arg1 = force_reg (elem_mode, v.elt(1));
+      emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
+      return;
+    }
+
   /* If neither sub-vectors of v could be initialized specially,
      then use INSR to insert all elements from v into TARGET.
      ??? This might not be optimal for vectors with large
@@ -24870,6 +24881,30 @@ aarch64_sve_expand_vector_init (rtx target, rtx vals)
   aarch64_sve_expand_vector_init_insert_elems (target, v, nelts);
 }
 
+/* Initialize register TARGET from the two vector subelements in PARALLEL
+   rtx VALS.  */
+
+void
+aarch64_sve_expand_vector_init_subvector (rtx target, rtx vals)
+{
+  machine_mode mode = GET_MODE (target);
+  int nelts = XVECLEN (vals, 0);
+
+  gcc_assert (nelts == 2);
+
+  rtx arg0 = XVECEXP (vals, 0, 0);
+  rtx arg1 = XVECEXP (vals, 0, 1);
+
+  /* If we have two elements and are concatting vector.  */
+  machine_mode elem_mode = GET_MODE (arg0);
+  gcc_assert (VECTOR_MODE_P (elem_mode));
+
+  arg0 = force_reg (elem_mode, arg0);
+  arg1 = force_reg (elem_mode, arg1);
+  emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
+  return;
+}
+
 /* Check whether VALUE is a vector constant in which every element
    is either a power of 2 or a negated power of 2.  If so, return
    a constant vector of log2s, and flip CODE between PLUS and MINUS
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 89c72b24aeb791adbbd3edfdb131478d52b248e6..34200b05a3abf6d51919313de1027aa4988bcb8d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -140,6 +140,10 @@ (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI])
 ;; VQ without 2 element modes.
 (define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V8HF V4SF V8BF])
 
+;; SVE modes without 2 element modes.
+(define_mode_iterator SVE_NO2E [VNx16QI VNx8QI VNx4QI VNx8HI VNx4HI VNx8HF
+				VNx4HF VNx8BF VNx4BF VNx4SI VNx4SF])
+
 ;; 2 element quad vector modes.
 (define_mode_iterator VQ_2E [V2DI V2DF])
 
@@ -1737,7 +1741,13 @@ (define_mode_attr VHALF [(V8QI "V4QI") (V16QI "V8QI")
 			 (V2DI "DI")    (V2SF  "SF")
 			 (V4SF "V2SF")  (V4HF "V2HF")
 			 (V8HF "V4HF")  (V2DF  "DF")
-			 (V8BF "V4BF")])
+			 (V8BF "V4BF")
+			 (VNx16QI "VNx8QI") (VNx8QI "VNx4QI")
+			 (VNx4QI "VNx2QI")
+			 (VNx8HI "VNx4HI") (VNx4HI "VNx2HI")
+			 (VNx8HF "VNx4HF") (VNx4HF "VNx2HF")
+			 (VNx8BF "VNx4BF") (VNx4BF "VNx2BF")
+			 (VNx4SI "VNx2SI") (VNx4SF "VNx2SF")])
 
 ;; Half modes of all vector modes, in lower-case.
 (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
@@ -1745,7 +1755,13 @@ (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
 			 (V8HF "v4hf")  (V8BF "v4bf")
 			 (V2SI "si")    (V4SI  "v2si")
 			 (V2DI "di")    (V2SF  "sf")
-			 (V4SF "v2sf")  (V2DF  "df")])
+			 (V4SF "v2sf")  (V2DF  "df")
+			 (VNx16QI "vnx8qi") (VNx8QI "vnx4qi")
+			 (VNx4QI "vnx2qi")
+			 (VNx8HI "vnx4hi") (VNx4HI "vnx2hi")
+			 (VNx8HF "vnx4hf") (VNx4HF "vnx2hf")
+			 (VNx8BF "vnx4bf") (VNx4BF "vnx2bf")
+			 (VNx4SI "vnx2si") (VNx4SF "vnx2sf")])
 
 ;; Single-element half modes of quad vector modes.
 (define_mode_attr V1HALF [(V2DI "V1DI") (V2DF "V1DF")])
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..a25cae2708dd18cc91a7732f845419bbdb06c5c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99" } */
+/* { dg-additional-options "-O3 -march=armv8-a" } */
+
+#pragma GCC target ("+sve")
+extern char __attribute__ ((simd, const)) fn3 (int, char);
+void test_fn3 (int *a, int *b, char *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = (int) (fn3 (b[i], c[i]) + c[i]);
+}
+
+/* { dg-final { scan-assembler {\s+_ZGVsMxvv_fn3\n} } } */
Tamar Christina <Tamar.Christina@arm.com> writes: > gcc/ChangeLog: > > PR target/96342 > * config/aarch64/aarch64-sve.md (vec_init<mode><Vhalf>): New. > (@aarch64_pack_partial<mode>): New. > * config/aarch64/aarch64.cc (aarch64_sve_expand_vector_init_subvector): New. > * config/aarch64/iterators.md (SVE_NO2E): New. > (VHALF, Vhalf): Add SVE partial vectors. > > gcc/testsuite/ChangeLog: > > PR target/96342 > * gcc.target/aarch64/vect-simd-clone-2.c: New test. OK, thanks. Richard > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > -- inline copy of patch -- > > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md > index a72ca2a500d394598268c6adfe717eed94a304b3..8ed4221dbe5c49db97b37f186365fa391900eadb 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -2839,6 +2839,16 @@ (define_expand "vec_init<mode><Vel>" > } > ) > > +(define_expand "vec_init<mode><Vhalf>" > + [(match_operand:SVE_NO2E 0 "register_operand") > + (match_operand 1 "")] > + "TARGET_SVE" > + { > + aarch64_sve_expand_vector_init (operands[0], operands[1]); > + DONE; > + } > +) > + > ;; Shift an SVE vector left and insert a scalar into element 0. > (define_insn "vec_shl_insert_<mode>" > [(set (match_operand:SVE_FULL 0 "register_operand") > @@ -9289,6 +9299,19 @@ (define_insn "vec_pack_trunc_<Vwide>" > "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>" > ) > > +;; Integer partial pack packing two partial SVE types into a single full SVE > +;; type of the same element type. Use UZP1 on the wider type, which discards > +;; the high part of each wide element. This allows to concat SVE partial types > +;; into a wider vector. > +(define_insn "@aarch64_pack_partial<mode>" > + [(set (match_operand:SVE_NO2E 0 "register_operand" "=w") > + (vec_concat:SVE_NO2E > + (match_operand:<VHALF> 1 "register_operand" "w") > + (match_operand:<VHALF> 2 "register_operand" "w")))] > + "TARGET_SVE" > + "uzp1\t%0.<Vctype>, %1.<Vctype>, %2.<Vctype>" > +) > + > ;; ------------------------------------------------------------------------- > ;; ---- [INT<-INT] Unpacks > ;; ------------------------------------------------------------------------- > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index de4c0a0783912b54ac35d7c818c24574b27a4ca0..40214e318f3c4e30e619d96073b253887c973efc 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -24859,6 +24859,17 @@ aarch64_sve_expand_vector_init (rtx target, rtx vals) > v.quick_push (XVECEXP (vals, 0, i)); > v.finalize (); > > + /* If we have two elements and are concatting vector. */ > + machine_mode elem_mode = GET_MODE (v.elt (0)); > + if (nelts == 2 && VECTOR_MODE_P (elem_mode)) > + { > + /* We've failed expansion using a dup. Try using a cheeky truncate. */ > + rtx arg0 = force_reg (elem_mode, v.elt(0)); > + rtx arg1 = force_reg (elem_mode, v.elt(1)); > + emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1)); > + return; > + } > + > /* If neither sub-vectors of v could be initialized specially, > then use INSR to insert all elements from v into TARGET. > ??? This might not be optimal for vectors with large > @@ -24870,6 +24881,30 @@ aarch64_sve_expand_vector_init (rtx target, rtx vals) > aarch64_sve_expand_vector_init_insert_elems (target, v, nelts); > } > > +/* Initialize register TARGET from the two vector subelements in PARALLEL > + rtx VALS. 
> +   rtx VALS.  */
> +
> +void
> +aarch64_sve_expand_vector_init_subvector (rtx target, rtx vals)
> +{
> +  machine_mode mode = GET_MODE (target);
> +  int nelts = XVECLEN (vals, 0);
> +
> +  gcc_assert (nelts == 2);
> +
> +  rtx arg0 = XVECEXP (vals, 0, 0);
> +  rtx arg1 = XVECEXP (vals, 0, 1);
> +
> +  /* If we have two elements and are concatenating vectors.  */
> +  machine_mode elem_mode = GET_MODE (arg0);
> +  gcc_assert (VECTOR_MODE_P (elem_mode));
> +
> +  arg0 = force_reg (elem_mode, arg0);
> +  arg1 = force_reg (elem_mode, arg1);
> +  emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
> +  return;
> +}
> +
>  /* Check whether VALUE is a vector constant in which every element
>     is either a power of 2 or a negated power of 2.  If so, return
>     a constant vector of log2s, and flip CODE between PLUS and MINUS
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 89c72b24aeb791adbbd3edfdb131478d52b248e6..34200b05a3abf6d51919313de1027aa4988bcb8d 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -140,6 +140,10 @@ (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI])
>  ;; VQ without 2 element modes.
>  (define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V8HF V4SF V8BF])
> 
> +;; SVE modes without 2 element modes.
> +(define_mode_iterator SVE_NO2E [VNx16QI VNx8QI VNx4QI VNx8HI VNx4HI VNx8HF
> +				VNx4HF VNx8BF VNx4BF VNx4SI VNx4SF])
> +
>  ;; 2 element quad vector modes.
>  (define_mode_iterator VQ_2E [V2DI V2DF])
> 
> @@ -1737,7 +1741,13 @@ (define_mode_attr VHALF [(V8QI "V4QI") (V16QI "V8QI")
>  			 (V2DI "DI") (V2SF "SF")
>  			 (V4SF "V2SF") (V4HF "V2HF")
>  			 (V8HF "V4HF") (V2DF "DF")
> -			 (V8BF "V4BF")])
> +			 (V8BF "V4BF")
> +			 (VNx16QI "VNx8QI") (VNx8QI "VNx4QI")
> +			 (VNx4QI "VNx2QI")
> +			 (VNx8HI "VNx4HI") (VNx4HI "VNx2HI")
> +			 (VNx8HF "VNx4HF") (VNx4HF "VNx2HF")
> +			 (VNx8BF "VNx4BF") (VNx4BF "VNx2BF")
> +			 (VNx4SI "VNx2SI") (VNx4SF "VNx2SF")])
> 
>  ;; Half modes of all vector modes, in lower-case.
>  (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
> @@ -1745,7 +1755,13 @@ (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
>  			 (V8HF "v4hf") (V8BF "v4bf")
>  			 (V2SI "si") (V4SI "v2si")
>  			 (V2DI "di") (V2SF "sf")
> -			 (V4SF "v2sf") (V2DF "df")])
> +			 (V4SF "v2sf") (V2DF "df")
> +			 (VNx16QI "vnx8qi") (VNx8QI "vnx4qi")
> +			 (VNx4QI "vnx2qi")
> +			 (VNx8HI "vnx4hi") (VNx4HI "vnx2hi")
> +			 (VNx8HF "vnx4hf") (VNx4HF "vnx2hf")
> +			 (VNx8BF "vnx4bf") (VNx4BF "vnx2bf")
> +			 (VNx4SI "vnx2si") (VNx4SF "vnx2sf")])
> 
>  ;; Single-element half modes of quad vector modes.
>  (define_mode_attr V1HALF [(V2DI "V1DI") (V2DF "V1DF")])
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a25cae2708dd18cc91a7732f845419bbdb06c5c1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c99" } */
> +/* { dg-additional-options "-O3 -march=armv8-a" } */
> +
> +#pragma GCC target ("+sve")
> +extern char __attribute__ ((simd, const)) fn3 (int, char);
> +void test_fn3 (int *a, int *b, char *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    a[i] = (int) (fn3 (b[i], c[i]) + c[i]);
> +}
> +
> +/* { dg-final { scan-assembler {\s+_ZGVsMxvv_fn3\n} } } */
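
To illustrate the UZP1 trick in @aarch64_pack_partial with a worked example
(assuming a 128-bit vector length for concreteness; SVE itself is
length-agnostic): two VNx4HI inputs each hold four 16-bit elements, one per
32-bit container, in the low half of each container.  Viewed as .h lanes:

	z1: [ a0, --, a1, --, a2, --, a3, -- ]    (-- = unused high half)
	z2: [ b0, --, b1, --, b2, --, b3, -- ]

	uzp1	z0.h, z1.h, z2.h
	z0: [ a0, a1, a2, a3, b0, b1, b2, b3 ]

UZP1 keeps the even-numbered lanes of each source, i.e. the low half of every
32-bit container, so z0 is the concatenation of the two partial vectors as a
full VNx8HI.
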
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 9afd11d347626eeb640722fdba2ab763b8479aa7..9e3577be6e943d7a5c951196463873d4bcfee07c 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2840,6 +2840,16 @@ (define_expand "vec_init<mode><Vel>"
   }
 )
 
+(define_expand "vec_init<mode><Vhalf>"
+  [(match_operand:SVE_NO2E 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_SVE"
+  {
+    aarch64_sve_expand_vector_init (operands[0], operands[1]);
+    DONE;
+  }
+)
+
 ;; Shift an SVE vector left and insert a scalar into element 0.
 (define_insn "vec_shl_insert_<mode>"
   [(set (match_operand:SVE_FULL 0 "register_operand")
@@ -9347,6 +9357,20 @@ (define_insn "vec_pack_trunc_<Vwide>"
   "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
 )
 
+;; Integer partial pack: pack two partial SVE vectors into a single full SVE
+;; vector of the same element type.  Use UZP1 on the wider type, which
+;; discards the high part of each wide element.  This allows us to
+;; concatenate SVE partial types into a wider vector.
+(define_insn "@aarch64_pack_partial<mode>"
+  [(set (match_operand:SVE_PARTIAL_NO2E 0 "register_operand" "=w")
+	(unspec:SVE_PARTIAL_NO2E
+	  [(match_operand:<VHALF> 1 "register_operand" "w")
+	   (match_operand:<VHALF> 2 "register_operand" "w")]
+	  UNSPEC_PACK))]
+  "TARGET_SVE"
+  "uzp1\t%0.<Vwstype>, %1.<Vwstype>, %2.<Vwstype>"
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT<-INT] Unpacks
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index af6fede102c2be6673c24f8020d000ea56322997..690d54b0a2954327e00d559f96c414c81c2e18cd 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24790,6 +24790,17 @@ aarch64_sve_expand_vector_init (rtx target, rtx vals)
       v.quick_push (XVECEXP (vals, 0, i));
     v.finalize ();
 
+  /* If we have two elements and are concatenating vectors.  */
+  machine_mode elem_mode = GET_MODE (v.elt (0));
+  if (nelts == 2 && VECTOR_MODE_P (elem_mode))
+    {
+      /* We've failed expansion using a dup.  Try using a cheeky truncate.  */
+      rtx arg0 = force_reg (elem_mode, v.elt (0));
+      rtx arg1 = force_reg (elem_mode, v.elt (1));
+      emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
+      return;
+    }
+
   /* If neither sub-vectors of v could be initialized specially,
      then use INSR to insert all elements from v into TARGET.
      ??? This might not be optimal for vectors with large
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 023893d35f3e955e222c322ce370e84c95c29ee6..77d23d6ad795630d3d5fb5c076c086a479d46fee 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -138,6 +138,14 @@ (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI])
 ;; VQ without 2 element modes.
 (define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V8HF V4SF V8BF])
 
+;; SVE modes without 2 element modes.
+(define_mode_iterator SVE_NO2E [VNx16QI VNx8QI VNx4QI VNx8HI VNx4HI VNx8HF
+				VNx4HF VNx8BF VNx4BF VNx4SI VNx4SF])
+
+;; Partial SVE modes without 2 element modes.
+(define_mode_iterator SVE_PARTIAL_NO2E [VNx8QI VNx4QI VNx4HI
+					VNx4HF VNx8BF VNx4BF])
+
 ;; 2 element quad vector modes.
 (define_mode_iterator VQ_2E [V2DI V2DF])
 
@@ -1678,7 +1686,15 @@ (define_mode_attr VHALF [(V8QI "V4QI") (V16QI "V8QI")
 			 (V2DI "DI") (V2SF "SF")
 			 (V4SF "V2SF") (V4HF "V2HF")
 			 (V8HF "V4HF") (V2DF "DF")
-			 (V8BF "V4BF")])
+			 (V8BF "V4BF")
+			 (VNx16QI "VNx8QI") (VNx8QI "VNx4QI")
+			 (VNx4QI "VNx2QI") (VNx2QI "QI")
+			 (VNx8HI "VNx4HI") (VNx4HI "VNx2HI") (VNx2HI "HI")
+			 (VNx8HF "VNx4HF") (VNx4HF "VNx2HF") (VNx2HF "HF")
+			 (VNx8BF "VNx4BF") (VNx4BF "VNx2BF") (VNx2BF "BF")
+			 (VNx4SI "VNx2SI") (VNx2SI "SI")
+			 (VNx4SF "VNx2SF") (VNx2SF "SF")
+			 (VNx2DI "DI") (VNx2DF "DF")])
 
 ;; Half modes of all vector modes, in lower-case.
 (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
@@ -1686,7 +1702,15 @@ (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
 			 (V8HF "v4hf") (V8BF "v4bf")
 			 (V2SI "si") (V4SI "v2si")
 			 (V2DI "di") (V2SF "sf")
-			 (V4SF "v2sf") (V2DF "df")])
+			 (V4SF "v2sf") (V2DF "df")
+			 (VNx16QI "vnx8qi") (VNx8QI "vnx4qi")
+			 (VNx4QI "vnx2qi") (VNx2QI "qi")
+			 (VNx8HI "vnx4hi") (VNx4HI "vnx2hi") (VNx2HI "hi")
+			 (VNx8HF "vnx4hf") (VNx4HF "vnx2hf") (VNx2HF "hf")
+			 (VNx8BF "vnx4bf") (VNx4BF "vnx2bf") (VNx2BF "bf")
+			 (VNx4SI "vnx2si") (VNx2SI "si")
+			 (VNx4SF "vnx2sf") (VNx2SF "sf")
+			 (VNx2DI "di") (VNx2DF "df")])
 
 ;; Single-element half modes of quad vector modes.
 (define_mode_attr V1HALF [(V2DI "V1DI") (V2DF "V1DF")])
 
@@ -1815,7 +1839,10 @@ (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s")
 ;; Widened scalar register suffixes.
 (define_mode_attr Vwstype [(V8QI "h") (V4HI "s")
			    (V2SI "") (V16QI "h")
-			    (V8HI "s") (V4SI "d")])
+			    (V8HI "s") (V4SI "d")
+			    (VNx8QI "h") (VNx4QI "h")
+			    (VNx4HF "s") (VNx4HI "s")
+			    (VNx8BF "h") (VNx4BF "h")])
 
 ;; Add a .1d for V2SI.
 (define_mode_attr Vwsuf [(V8QI "") (V4HI "") (V2SI ".1d") (V16QI "")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..a25cae2708dd18cc91a7732f845419bbdb06c5c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99" } */
+/* { dg-additional-options "-O3 -march=armv8-a" } */
+
+#pragma GCC target ("+sve")
+extern char __attribute__ ((simd, const)) fn3 (int, char);
+void test_fn3 (int *a, int *b, char *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = (int) (fn3 (b[i], c[i]) + c[i]);
+}
+
+/* { dg-final { scan-assembler {\s+_ZGVsMxvv_fn3\n} } } */
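
One more worked example of how the pieces line up (my reading of the patch,
so treat the details as a sketch): for VNx4HI the new expander instantiates
as vec_initvnx4hivnx2hi, the vec_init optab entry that builds one VNx4HI
(16-bit elements in 32-bit containers) from two VNx2HI halves (16-bit
elements in 64-bit containers).  aarch64_sve_expand_vector_init forces both
halves into registers and emits aarch64_pack_partial, so the whole
concatenation costs a single instruction:

	uzp1	z0.s, z1.s, z2.s	// keep the low word of each
					// 64-bit container of z1, then z2
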