From patchwork Thu Jun 16 10:48:44 2022
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 55130
Date: Thu, 16 Jun 2022 11:48:44 +0100
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/2]AArch64 Add fallback case using sdot for usdot
From: Tamar Christina
Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com

Hi All,

The usdot operation is common in video encoders and decoders, including some
of the most widely used ones.

This patch adds a +dotprod version of the optab as a fallback for when sdot is
available but usdot is not.

The fallback works by applying a bias to the unsigned argument to convert it
to a signed value and then correcting for the bias later on.

Essentially it relies on (x - 128)y + 128y == xy, where x is unsigned and y is
signed (assuming both are 8-bit values).
Because the range of a signed byte only extends to 127, we split the bias
correction into:

    (x - 128)y + 127y + y

Concretely for:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

we generate:

        movi    v5.16b, 0x7f
        mov     x3, 0
        movi    v4.16b, 0x1
        movi    v3.16b, 0xffffffffffffff80
        movi    v0.4s, 0
.L2:
        ldr     q2, [x2, x3]
        ldr     q1, [x1, x3]
        add     x3, x3, 16
        sub     v2.16b, v2.16b, v3.16b
        sdot    v0.4s, v2.16b, v1.16b
        sdot    v0.4s, v5.16b, v1.16b
        sdot    v0.4s, v4.16b, v1.16b
        cmp     x3, 480
        bne     .L2

instead of:

        movi    v0.4s, 0
        mov     x3, 0
.L2:
        ldr     q2, [x1, x3]
        ldr     q1, [x2, x3]
        add     x3, x3, 16
        sxtl    v4.8h, v2.8b
        sxtl2   v3.8h, v2.16b
        uxtl    v2.8h, v1.8b
        uxtl2   v1.8h, v1.16b
        mul     v2.8h, v2.8h, v4.8h
        mul     v1.8h, v1.8h, v3.8h
        saddw   v0.4s, v0.4s, v2.4h
        saddw2  v0.4s, v0.4s, v2.8h
        saddw   v0.4s, v0.4s, v1.4h
        saddw2  v0.4s, v0.4s, v1.8h
        cmp     x3, 480
        bne     .L2

The new sequence is significantly faster: per iteration it replaces four
extends, two multiplies and four widening adds with one subtract and three
sdot instructions, which are well optimized.

Note that execution tests are already in the mid-end testsuite.

Thanks to James Greenhalgh for the tip-off.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md (usdot_prod): Generate fallback
        or call original insns ...
        (usdot_prod_insn): ...here.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/simd/vusdot-autovec-2.c: New test.
--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index cf2f4badacc594df9ecf06de3f8ea570ef9e0ff2..235a6fa371e471816284e3383e8564e9cf643a74 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -623,7 +623,7 @@ (define_insn "dot_prod"
 
 ;; These instructions map to the __builtins for the Armv8.6-a I8MM usdot
 ;; (vector) Dot Product operation and the vectorized optab.
-(define_insn "usdot_prod"
+(define_insn "usdot_prod_insn"
   [(set (match_operand:VS 0 "register_operand" "=w")
         (plus:VS
           (unspec:VS [(match_operand: 1 "register_operand" "w")
@@ -635,6 +635,43 @@ (define_insn "usdot_prod"
   [(set_attr "type" "neon_dot")]
 )
 
+;; usdot auto-vec fallback code
+(define_expand "usdot_prod"
+  [(set (match_operand:VS 0 "register_operand")
+        (plus:VS
+          (unspec:VS [(match_operand: 1 "register_operand")
+                      (match_operand: 2 "register_operand")]
+          UNSPEC_USDOT)
+          (match_operand:VS 3 "register_operand")))]
+  "TARGET_DOTPROD || TARGET_I8MM"
+{
+  if (TARGET_I8MM)
+    {
+      emit_insn (gen_usdot_prod_insn (operands[0], operands[1],
+                                      operands[2], operands[3]));
+      DONE;
+    }
+
+  machine_mode elemmode = GET_MODE_INNER (mode);
+  HOST_WIDE_INT val = 1 << (GET_MODE_BITSIZE (elemmode).to_constant () - 1);
+  rtx signbit = gen_int_mode (val, elemmode);
+  rtx t1 = gen_reg_rtx (mode);
+  rtx t2 = gen_reg_rtx (mode);
+  rtx tmp = gen_reg_rtx (mode);
+  rtx c1 = gen_const_vec_duplicate (mode,
+                                    gen_int_mode (val - 1, elemmode));
+  rtx c2 = gen_const_vec_duplicate (mode, gen_int_mode (1, elemmode));
+  rtx dup = gen_const_vec_duplicate (mode, signbit);
+  c1 = force_reg (mode, c1);
+  c2 = force_reg (mode, c2);
+  dup = force_reg (mode, dup);
+  emit_insn (gen_sub3 (tmp, operands[1], dup));
+  emit_insn (gen_sdot_prod (t1, tmp, operands[2], operands[3]));
+  emit_insn (gen_sdot_prod (t2, c1, operands[2], t1));
+  emit_insn (gen_sdot_prod (operands[0], c2, operands[2], t2));
+  DONE;
+})
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "aarch64_dot_lane"
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec-2.c b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..acd8e36209690386d021df72f1467a696750ac3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vusdot-autovec-2.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+noi8mm+dotprod" } */
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler-not {\tusdot\t} } } */
+/* { dg-final { scan-assembler-times {\tsdot\t} 3 } } */