From patchwork Tue Dec 21 12:31:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 49144 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D3AB03858428 for ; Tue, 21 Dec 2021 12:32:21 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D3AB03858428 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1640089941; bh=rDT61YR4z6quhpthK4GJQNLDDPfZNLX3gnWPf9oPJBw=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=ecnEIsau7U/F2C1XJ97hMabxbMErSl4NM8fZ74/a0diZebRHKo9myQLebMoYuJ61J BjFZKKebQFbRc/U1zxYIrIi1W6yuIeWvY2MmPhrVYzD5yOjSvF3XCKl930H2KmO/Xz 9B8OLamvJ9fX6U8pV7lEfNfFBQp4J0ly/XcA4MXY= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR02-AM5-obe.outbound.protection.outlook.com (mail-eopbgr00065.outbound.protection.outlook.com [40.107.0.65]) by sourceware.org (Postfix) with ESMTPS id EAEF93858C2C for ; Tue, 21 Dec 2021 12:31:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EAEF93858C2C Received: from AM6P193CA0056.EURP193.PROD.OUTLOOK.COM (2603:10a6:209:8e::33) by AM6PR08MB3800.eurprd08.prod.outlook.com (2603:10a6:20b:87::25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.17; Tue, 21 Dec 2021 12:31:45 +0000 Received: from VE1EUR03FT063.eop-EUR03.prod.protection.outlook.com (2603:10a6:209:8e:cafe::98) by AM6P193CA0056.outlook.office365.com (2603:10a6:209:8e::33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.14 via Frontend Transport; Tue, 21 Dec 2021 12:31:45 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by VE1EUR03FT063.mail.protection.outlook.com (10.152.18.236) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.14 via Frontend Transport; Tue, 21 Dec 2021 12:31:45 +0000 Received: ("Tessian outbound a33f292be81b:v110"); Tue, 21 Dec 2021 12:31:44 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: abfad42acdacc5e4 X-CR-MTA-TID: 64aa7808 Received: from 7aad7245fcda.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 9166E5F6-B13C-4EA2-B596-3DA131CAC558.1; Tue, 21 Dec 2021 12:31:32 +0000 Received: from EUR02-HE1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 7aad7245fcda.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Tue, 21 Dec 2021 12:31:32 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=e4a8Vj3UbOMED+s3I/ChXsDRgV/SRAbtcFDkkFPobyGXbx9xzJfUMlP5r52K0fgQ563OsVn8iiQ4X66dxIcJUkunl8mfJ+ygwHrWul/ouAVRPL++BGXoGOORNdz0bxb9o0vIvbMI1DAGbHijgPEjYW6B3D9l0CHx2Av9hWOkF8G80SykGedOgULP35lWvgUZYOr4/qkTMCk6oSwEOifJsaUWJGX6lNd8EJg9/sQkqohFKpp4zpgz5G1wy50j4xTLJqYLinJJqRF3ghZG+sopO+eP3JlRaejXgut0MOA6OFH+e0qNBBOF8FyB53Evv5KUVDhU242G0DwFB5No2kknCA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=rDT61YR4z6quhpthK4GJQNLDDPfZNLX3gnWPf9oPJBw=; b=Tqin329Z7Ku2+jjpw+l8tP4vNjSRACT28hDDZC0r2Tf4TtPUGvtNlBEMpAGZ0pz0HK6l4Qy6YNgEbSUiz4hRB3gnx5kaGXovF+/iVXoj1ZHnps8CNoslr/Yl0Gdapyr1z+M9Fg8s5GM3T4W3wGcIXxSzl/gGBY+BPaPf1uH5yZfF2JzsQ2ayXKAakJN7eR6lQFKN3SdNOVx2rMgpa3+YOQ1+Wx0g18ew+lEGAjjuh3siE/+SwgpS5vLQExgr7AwZUiJ1SaV0V6kW1lystkd3FZINl6XXPHC7k3+XcSwfWrbw0QM60TpHg3iKHHL06cLN42Ja1a7mz6m+C1HPk/PZEw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by VI1PR08MB4253.eurprd08.prod.outlook.com (2603:10a6:803:f6::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.18; Tue, 21 Dec 2021 12:31:29 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::12a:3d2c:81ff:8fcc]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::12a:3d2c:81ff:8fcc%8]) with mapi id 15.20.4801.022; Tue, 21 Dec 2021 12:31:29 +0000 Date: Tue, 21 Dec 2021 12:31:20 +0000 To: gcc-patches@gcc.gnu.org Subject: [AArch32]: correct dot-product RTL patterns. Message-ID: Content-Disposition: inline User-Agent: Mutt/1.9.4 (2018-02-28) X-ClientProxiedBy: SN4PR0701CA0046.namprd07.prod.outlook.com (2603:10b6:803:2d::33) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-Office365-Filtering-Correlation-Id: 0958b726-af54-425d-35d0-08d9c47dd9c1 X-MS-TrafficTypeDiagnostic: VI1PR08MB4253:EE_|VE1EUR03FT063:EE_|AM6PR08MB3800:EE_ X-LD-Processed: f34e5979-57d9-4aaa-ad4d-b122a662184d,ExtAddr X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true NoDisclaimer: true X-MS-Oob-TLC-OOBClassifiers: OLM:9508;OLM:9508; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 1AsgoCtwEJD1km9cpGkWIDnMEPTo7dO+99sVY/sFK1zYxHt5Jv1aYDtevOvyyaoPHgj3/qsprMsmWRUw1oHxxuq2/4jVChU2z/bfU/6gyLGTuIw+k+M5z/49Jbjv3Gyv5PqNZas3JiJosfvdaVPcqLj97xMwlf3gZS4i9oGdWQJVl0V59P+6nTpI2Keoq0UukQxegrPxcttTytMTGI/8HOnPQjfzIKXabwuSty1unD2+KHP20keBVsB+pjv732jgt5YPSmwTouWpeKofpaIwWoQjC1HfmFP167UksDETfQwnWf15R5HfWlJsd3ipaEakqY/vtxtpEqO9FUzmwPZ0EuGRBNWNLNIVKpB4EZz/EMHanyG82IRc+NZE7mWlnRP430xTDu8NgOmAO6aFagZHGVrEsEAFnt2WRbiqOlAG5rPmyLAfnMhunzzz4epseAyAVKYaWCc1vApP2pQbSPk4jc4PPwoVfmG9I3+HcjpFuesBIrtR5HS9R36NttLJRFQ1HK1bQQKdvgvn4X0cdHyUUohPJ3LL4XrcnNbc7X8imFP125Gx0HkCgceg2322QR2wTxaRNSM/0nKbexonno1wU51IG1rhV839mIQRLE5QtKNwbqZuhp7nEcdazJ3m1Ch6j4Lfk2bMLWrfzsRhyKllOb+8L63fD0Vy17TLLUbrMAQBkLiqcjJeUIsmhA+t3zp3ZPaf9ZjNDOeAjj0THH5ZKQ7fl+mFEeBh80rpwwbTY4fiRnWinocQgU+GAACLTnHGF0VIe3zU7HAN03oOevqYfm8bfSCxABM+LKK1jN8erd2Wlz88i46JFNMLjOfH+eZ1QT9iU7DURXDwLToC0hFLnMUcLUEXyBj8spVkaiwylfCDUNjaE2n90y8C2inWBH8C X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(366004)(30864003)(6512007)(66946007)(66556008)(186003)(6506007)(2906002)(508600001)(38350700002)(36756003)(6486002)(4326008)(38100700002)(66476007)(8676002)(4743002)(26005)(966005)(83380400001)(86362001)(5660300002)(316002)(8936002)(6916009)(235185007)(2616005)(6666004)(84970400001)(44144004)(33964004)(52116002)(44832011)(4216001)(2700100001); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR08MB4253 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT063.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 38972b7a-5f6c-4382-9bdf-08d9c47dcfee X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: f3992B400FC/vGC0XUOECOxBYl7RJFHmt5cEgB1ts4+qrihsuGJCg9pjCbb5fg/KKFk90PjJvYix7wvlRzDO36iObBAt7SC+L6mYXlX9q16s6PRIO/4jZJmKr2jhpAVKtchV7/AyMf4zohDRYtgGpRrFGdhFpTZNG5v65pQRLj8G9HGf0bFv7FOJJv5fJU3ZjMQk99sT7pYKmWvJEmYuCrroi6ZoyqVC+3X+Ur8TBtNvM+V6X2a+3j16+z9Bj7Bqvp+RDKcjzHDTwsdV5Q5OAuEV9SIxG/bgWlKgzeb41OsJL5FpCChEer3fSB1bn1gqT1uUUg99wkOQsKViLFZtGNeTXTpyoojEVbXJ83Pq6EHZrVjhQtbHAVwUUlF9E3kJbmytU0N/MVBfwby3UKdaCUCRjE7aF4WNwz1YP09ZWWu4JNvvoAjHcbkefSS+3xfNLhUl7zUh5CEEKVXKG9OqvUudq4P7riPsGMyaWLqIibYAwEck1+QffWvxBWzGcW98bmTrPFyvbgp8X85NKbTajPZh0uiP0s1umbtmViClqTFAOoHwsYBubPGDJ6semwAMQfqpzWmTA/9vSlIvW8jat3Befgk8GIcegvWFbtCOmpztxCOZalYAwU6XYCxsbSxrq9uSBBdw1tuFyUvlIaM2l3Txf57BmWLuhgAAW3DQ1jXp2SDciynpdmbghvvGs6VL87HXm5XwuX1PqsBam/9hawp/8q6x716vjwRHc6qigQlxELnhBHvUtt7NSV+VUVV1PQsCy8ftZ/0CoLg0nHATiwHX6zcapIBB4S1JVh3O8wZXGQ9lHWvhc+2OWKt8tEPzwTCS+liLMpQ807j7i/0x5t407uxwAxQ76xJwMURdgLlKRiLlINxMjYOTeqmRdIyi X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(4636009)(36840700001)(46966006)(81166007)(36756003)(5660300002)(83380400001)(6512007)(356005)(86362001)(186003)(84970400001)(30864003)(6506007)(33964004)(8936002)(6486002)(8676002)(316002)(2906002)(26005)(966005)(47076005)(4743002)(2616005)(6666004)(70586007)(6916009)(4326008)(82310400004)(235185007)(36860700001)(44832011)(70206006)(336012)(44144004)(508600001)(4216001)(2700100001); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 21 Dec 2021 12:31:45.1347 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 0958b726-af54-425d-35d0-08d9c47dd9c1 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT063.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM6PR08MB3800 X-Spam-Status: No, score=-13.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Cc: Richard.Earnshaw@arm.com, nd@arm.com, Ramana.Radhakrishnan@arm.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hi All, The previous fix for this problem was wrong due to a subtle difference between where NEON expects the RMW values and where intrinsics expects them. The insn pattern is modeled after the intrinsics and so needs an expand for the vectorizer optab to switch the RTL. However operand[3] is not expected to be written to so the current pattern is bogus. Instead we use the expand to shuffle around the RTL. The vectorizer expects operands[3] and operands[0] to be the same but the aarch64 intrinsics expanders expect operands[0] and operands[1] to be the same. This also fixes some issues with big-endian, each dot product performs 4 8-byte multiplications. However compared to AArch64 we don't enter lanes in GCC lane indexed in AArch32 aside from loads/stores. This means no lane remappings are done in arm-builtins.c and so none should be done at the instruction side. There are some other instructions that need inspections as I think there are more incorrect ones. Third there was a bug in the ACLE specication for dot product which has now been fixed[1]. This means some intrinsics were missing and are added by this patch. Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues. Ok for master? and active branches after some stew? [1] https://github.com/ARM-software/acle/releases/tag/r2021Q3 Thanks, Tamar gcc/ChangeLog: * config/arm/arm_neon.h (vdot_laneq_u32, vdotq_laneq_u32, vdot_laneq_s32, vdotq_laneq_s32): New. * config/arm/arm_neon_builtins.def (sdot_laneq, udot_laneq: New. * config/arm/neon.md (neon_dot): New. (dot_prod): Re-order rtl. (neon_dot_lane): Fix rtl order and endiannes. (neon_dot_laneq): New. gcc/testsuite/ChangeLog: * gcc.target/arm/simd/vdot-compile.c: Add new cases. * gcc.target/arm/simd/vdot-exec.c: Likewise. --- inline copy of patch -- diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3364b37f69dfc33082388246c03149d9ad66a634..af6ac63dc3b47830d92f199d93153ff510f658e9 100644 diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3364b37f69dfc33082388246c03149d9ad66a634..af6ac63dc3b47830d92f199d93153ff510f658e9 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -18243,6 +18243,35 @@ vdotq_lane_s32 (int32x4_t __r, int8x16_t __a, int8x8_t __b, const int __index) return __builtin_neon_sdot_lanev16qi (__r, __a, __b, __index); } +__extension__ extern __inline uint32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vdot_laneq_u32 (uint32x2_t __r, uint8x8_t __a, uint8x16_t __b, const int __index) +{ + return __builtin_neon_udot_laneqv8qi_uuuus (__r, __a, __b, __index); +} + +__extension__ extern __inline uint32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vdotq_laneq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b, + const int __index) +{ + return __builtin_neon_udot_laneqv16qi_uuuus (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vdot_laneq_s32 (int32x2_t __r, int8x8_t __a, int8x16_t __b, const int __index) +{ + return __builtin_neon_sdot_laneqv8qi (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vdotq_laneq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b, const int __index) +{ + return __builtin_neon_sdot_laneqv16qi (__r, __a, __b, __index); +} + #pragma GCC pop_options #endif diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index fafb5c6fc51c16679ead1afda7cccfea8264fd15..f83dd4327c16c0af68f72eb6d9ca8cf21e2e56b5 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -342,6 +342,8 @@ VAR2 (TERNOP, sdot, v8qi, v16qi) VAR2 (UTERNOP, udot, v8qi, v16qi) VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi) VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi) +VAR2 (MAC_LANE, sdot_laneq, v8qi, v16qi) +VAR2 (UMAC_LANE, udot_laneq, v8qi, v16qi) VAR1 (USTERNOP, usdot, v8qi) VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 8b0a396947cc8e7345f178b926128d7224fb218a..848166311b5f82c5facb66e97c2260a5aba5d302 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -2866,20 +2866,49 @@ (define_expand "cmul3" }) -;; These instructions map to the __builtins for the Dot Product operations. -(define_insn "neon_dot" +;; These map to the auto-vectorizer Dot Product optab. +;; The auto-vectorizer expects a dot product builtin that also does an +;; accumulation into the provided register. +;; Given the following pattern +;; +;; for (i=0; idot_prod" [(set (match_operand:VCVTI 0 "register_operand" "=w") - (plus:VCVTI (match_operand:VCVTI 1 "register_operand" "0") - (unspec:VCVTI [(match_operand: 2 - "register_operand" "w") - (match_operand: 3 - "register_operand" "w")] - DOTPROD)))] + (plus:VCVTI + (unspec:VCVTI [(match_operand: 1 "register_operand" "w") + (match_operand: 2 "register_operand" "w")] + DOTPROD) + (match_operand:VCVTI 3 "register_operand" "0")))] "TARGET_DOTPROD" - "vdot.\\t%0, %2, %3" + "vdot.\\t%0, %1, %2" [(set_attr "type" "neon_dot")] ) +;; These instructions map to the __builtins for the Dot Product operations +(define_expand "neon_dot" + [(set (match_operand:VCVTI 0 "register_operand" "=w") + (plus:VCVTI + (unspec:VCVTI [(match_operand: 2 "register_operand") + (match_operand: 3 "register_operand")] + DOTPROD) + (match_operand:VCVTI 1 "register_operand")))] + "TARGET_DOTPROD" +) + ;; These instructions map to the __builtins for the Dot Product operations. (define_insn "neon_usdot" [(set (match_operand:VCVTI 0 "register_operand" "=w") @@ -2898,17 +2927,40 @@ (define_insn "neon_usdot" ;; indexed operations. (define_insn "neon_dot_lane" [(set (match_operand:VCVTI 0 "register_operand" "=w") - (plus:VCVTI (match_operand:VCVTI 1 "register_operand" "0") - (unspec:VCVTI [(match_operand: 2 - "register_operand" "w") - (match_operand:V8QI 3 "register_operand" "t") - (match_operand:SI 4 "immediate_operand" "i")] - DOTPROD)))] + (plus:VCVTI + (unspec:VCVTI [(match_operand: 2 "register_operand" "w") + (match_operand:V8QI 3 "register_operand" "t") + (match_operand:SI 4 "immediate_operand" "i")] + DOTPROD) + (match_operand:VCVTI 1 "register_operand" "0")))] + "TARGET_DOTPROD" + "vdot.\\t%0, %2, %P3[%c4]"; + [(set_attr "type" "neon_dot")] +) + +;; These instructions map to the __builtins for the Dot Product +;; indexed operations. +(define_insn "neon_dot_laneq" + [(set (match_operand:VCVTI 0 "register_operand" "=w") + (plus:VCVTI + (unspec:VCVTI [(match_operand: 2 "register_operand" "w") + (match_operand:V16QI 3 "register_operand" "t") + (match_operand:SI 4 "immediate_operand" "i")] + DOTPROD) + (match_operand:VCVTI 1 "register_operand" "0")))] "TARGET_DOTPROD" { - operands[4] - = GEN_INT (NEON_ENDIAN_LANE_N (V8QImode, INTVAL (operands[4]))); - return "vdot.\\t%0, %2, %P3[%c4]"; + int lane = INTVAL (operands[4]); + if (lane > GET_MODE_NUNITS (V2SImode) - 1) + { + operands[4] = GEN_INT (lane - GET_MODE_NUNITS (V2SImode)); + return "vdot.\\t%0, %2, %f3[%c4]"; + } + else + { + operands[4] = GEN_INT (lane); + return "vdot.\\t%0, %2, %e3[%c4]"; + } } [(set_attr "type" "neon_dot")] ) @@ -2932,43 +2984,6 @@ (define_insn "neon_dot_lane" [(set_attr "type" "neon_dot")] ) -;; These expands map to the Dot Product optab the vectorizer checks for. -;; The auto-vectorizer expects a dot product builtin that also does an -;; accumulation into the provided register. -;; Given the following pattern -;; -;; for (i=0; idot_prod" - [(set (match_operand:VCVTI 0 "register_operand") - (plus:VCVTI (unspec:VCVTI [(match_operand: 1 - "register_operand") - (match_operand: 2 - "register_operand")] - DOTPROD) - (match_operand:VCVTI 3 "register_operand")))] - "TARGET_DOTPROD" -{ - emit_insn ( - gen_neon_dot (operands[3], operands[3], operands[1], - operands[2])); - emit_insn (gen_rtx_SET (operands[0], operands[3])); - DONE; -}) - ;; Auto-vectorizer pattern for usdot (define_expand "usdot_prod" [(set (match_operand:VCVTI 0 "register_operand") diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c index b3bd3bf00e3822fdd60b5955165583d5a5cdc1d0..d3541e829a44fa07972096a02226adea1d26f09d 100644 --- a/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c @@ -49,8 +49,28 @@ int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, int8x8_t y) return vdotq_lane_s32 (r, x, y, 0); } -/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\td[0-9]+, d[0-9]+, d[0-9]+} 4 } } */ +int32x2_t sfoo_laneq1 (int32x2_t r, int8x8_t x, int8x16_t y) +{ + return vdot_laneq_s32 (r, x, y, 0); +} + +int32x4_t sfooq_lane1 (int32x4_t r, int8x16_t x, int8x16_t y) +{ + return vdotq_laneq_s32 (r, x, y, 0); +} + +int32x2_t sfoo_laneq2 (int32x2_t r, int8x8_t x, int8x16_t y) +{ + return vdot_laneq_s32 (r, x, y, 2); +} + +int32x4_t sfooq_lane2 (int32x4_t r, int8x16_t x, int8x16_t y) +{ + return vdotq_laneq_s32 (r, x, y, 2); +} + +/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\td[0-9]+, d[0-9]+, d[0-9]+} 6 } } */ /* { dg-final { scan-assembler-times {v[us]dot\.[us]8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */ -/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\td[0-9]+, d[0-9]+, d[0-9]+\[#?[0-9]\]} 2 } } */ -/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?[0-9]\]} 2 } } */ +/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\td[0-9]+, d[0-9]+, d[0-9]+\[#?[0-9]\]} 4 } } */ +/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?[0-9]\]} 4 } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-exec.c b/gcc/testsuite/gcc.target/arm/simd/vdot-exec.c index 054f4703394b4184284dac371415bef8e9bac45d..97b7898bd6a0fc9a898eba0ea15fbf38eb1405a3 100644 --- a/gcc/testsuite/gcc.target/arm/simd/vdot-exec.c +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-exec.c @@ -2,6 +2,7 @@ /* { dg-additional-options "-O3" } */ /* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw } */ /* { dg-add-options arm_v8_2a_dotprod_neon } */ +/* { dg-additional-options "-w" } */ #include @@ -33,7 +34,20 @@ extern void abort(); t3 f##_##rx1 = {0}; \ f##_##rx1 = f (f##_##rx1, f##_##x, f##_##y, ORDER (1, 1)); \ if (f##_##rx1[0] != n3 || f##_##rx1[1] != n4) \ - abort (); \ + abort (); + +#define P2(n1,n2) n1,n1,n1,n1,n2,n2,n2,n2,n1,n1,n1,n1,n2,n2,n2,n2 +#define TEST_LANEQ(t1, t2, t3, f, r1, r2, n1, n2, n3, n4) \ + ARR(f, x, t1, r1); \ + ARR(f, y, t2, r2); \ + t3 f##_##rx = {0}; \ + f##_##rx = f (f##_##rx, f##_##x, f##_##y, ORDER (3, 2)); \ + if (f##_##rx[0] != n1 || f##_##rx[1] != n2) \ + abort (); \ + t3 f##_##rx1 = {0}; \ + f##_##rx1 = f (f##_##rx1, f##_##x, f##_##y, ORDER (3, 3)); \ + if (f##_##rx1[0] != n3 || f##_##rx1[1] != n4) \ + abort (); int main() @@ -45,11 +59,16 @@ main() TEST (int8x16_t, int8x16_t, int32x4_t, vdotq_s32, P(1,2), P(-2,-3), -8, -24); TEST_LANE (uint8x8_t, uint8x8_t, uint32x2_t, vdot_lane_u32, P(1,2), P(2,3), 8, 16, 12, 24); - TEST_LANE (int8x8_t, int8x8_t, int32x2_t, vdot_lane_s32, P(1,2), P(-2,-3), -8, -16, -12, -24); TEST_LANE (uint8x16_t, uint8x8_t, uint32x4_t, vdotq_lane_u32, P(1,2), P(2,3), 8, 16, 12, 24); TEST_LANE (int8x16_t, int8x8_t, int32x4_t, vdotq_lane_s32, P(1,2), P(-2,-3), -8, -16, -12, -24); + TEST_LANEQ (uint8x8_t, uint8x16_t, uint32x2_t, vdot_laneq_u32, P2(1,2), P2(2,3), 8, 16, 12, 24); + TEST_LANEQ (int8x8_t, int8x16_t, int32x2_t, vdot_laneq_s32, P2(1,2), P2(-2,-3), -8, -16, -12, -24); + + TEST_LANEQ (uint8x16_t, uint8x16_t, uint32x4_t, vdotq_laneq_u32, P2(1,2), P2(2,3), 8, 16, 12, 24); + TEST_LANEQ (int8x16_t, int8x16_t, int32x4_t, vdotq_laneq_s32, P2(1,2), P2(-2,-3), -8, -16, -12, -24); + return 0; }