Message ID | 20210929161942.GA28881@arm.com |
---|---|
State | Committed |
Headers | Date: Wed, 29 Sep 2021 17:19:44 +0100 From: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Tamar Christina <tamar.christina@arm.com> To: gcc-patches@gcc.gnu.org Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com Subject: [PATCH 2/7]AArch64 Add combine patterns for narrowing shift of half top bits (shuffle) Message-ID: <20210929161942.GA28881@arm.com> In-Reply-To: <patch-14899-tamar@arm.com> |
Series | AArch64 Optimize truncation, shifts and bitmask comparisons |
Commit Message
Tamar Christina
Sept. 29, 2021, 4:19 p.m. UTC
Hi All,

When doing a (narrowing) right shift by half the width of the original type then
we are essentially shuffling the top bits from the first number down.

If we have a hi/lo pair we can just use a single shuffle instead of needing two
shifts.

i.e.

typedef short int16_t;
typedef unsigned short uint16_t;

void foo (uint16_t * restrict a, int16_t * restrict d, int n)
{
    for( int i = 0; i < n; i++ )
      d[i] = (a[i] * a[i]) >> 16;
}

now generates:

.L4:
        ldr     q0, [x0, x3]
        umull   v1.4s, v0.4h, v0.4h
        umull2  v0.4s, v0.8h, v0.8h
        uzp2    v0.8h, v1.8h, v0.8h
        str     q0, [x1, x3]
        add     x3, x3, 16
        cmp     x4, x3
        bne     .L4

instead of

.L4:
        ldr     q0, [x0, x3]
        umull   v1.4s, v0.4h, v0.4h
        umull2  v0.4s, v0.8h, v0.8h
        sshr    v1.4s, v1.4s, 16
        sshr    v0.4s, v0.4s, 16
        xtn     v1.4h, v1.4s
        xtn2    v1.8h, v0.4s
        str     q1, [x1, x3]
        add     x3, x3, 16
        cmp     x4, x3
        bne     .L4

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md
        (*aarch64_<srn_op>topbits_shuffle<mode>,
        *aarch64_topbits_shuffle<mode>): New.
        * config/aarch64/predicates.md
        (aarch64_simd_shift_imm_vec_exact_top): New.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/shrn-combine-2.c: New test.
        * gcc.target/aarch64/shrn-combine-3.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d7b6cae424622d259f97a3d5fa9093c0fb0bd5ce..300bf001b59ca7fa197c580b10adb7f70f20d1e0 100644
--
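The equivalence described in the commit message can be checked with a small scalar model. The following is only a sketch under stated assumptions: lanes are modelled as plain array elements, all function names (`shift_narrow_pair`, `as_h_lanes`, `uzp2_h`, `topbits_shuffle_matches`, `demo`) are hypothetical and not part of GCC, and only the plain truncating shift by 16 is covered; the rounding `UNSPEC_RSHRN` variants are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model, not GCC code: each vector is a plain array.  */

/* Reference sequence: shift every 32-bit lane right by 16 and narrow it,
   lanes of `lo` first, then lanes of `hi` (the shift + xtn/xtn2 form).  */
void shift_narrow_pair (const uint32_t lo[4], const uint32_t hi[4],
                        uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = (uint16_t) (lo[i] >> 16);
      out[i + 4] = (uint16_t) (hi[i] >> 16);
    }
}

/* View four 32-bit lanes as eight 16-bit lanes: lane 2*i holds the low
   half of element i and lane 2*i + 1 holds its high half.  */
void as_h_lanes (const uint32_t v[4], uint16_t h[8])
{
  for (int i = 0; i < 4; i++)
    {
      h[2 * i] = (uint16_t) (v[i] & 0xffffu);
      h[2 * i + 1] = (uint16_t) (v[i] >> 16);
    }
}

/* Model of UZP2 on 8 x 16-bit lanes: the odd-numbered lanes of `n`
   followed by the odd-numbered lanes of `m`.  */
void uzp2_h (const uint16_t n[8], const uint16_t m[8], uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = n[2 * i + 1];
      out[i + 4] = m[2 * i + 1];
    }
}

/* Return 1 when the single shuffle reproduces the shift-and-narrow result.  */
int topbits_shuffle_matches (const uint32_t lo[4], const uint32_t hi[4])
{
  uint16_t expect[8], lo_h[8], hi_h[8], got[8];

  shift_narrow_pair (lo, hi, expect);
  as_h_lanes (lo, lo_h);
  as_h_lanes (hi, hi_h);
  uzp2_h (lo_h, hi_h, got);

  for (int i = 0; i < 8; i++)
    if (expect[i] != got[i])
      return 0;
  return 1;
}

/* Small usage example on arbitrary sample values.  */
void demo (void)
{
  const uint32_t lo[4] = { 0x12345678u, 0xffff0000u, 0x0000ffffu, 0x80008000u };
  const uint32_t hi[4] = { 0xdeadbeefu, 0x00010002u, 0u, 0xffffffffu };

  assert (topbits_shuffle_matches (lo, hi));
}
```

Calling `demo ()` exercises the model on a handful of values; it illustrates why one permute can stand in for two shifts plus two narrowing moves, but it says nothing about how combine actually matches the patterns.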
Comments
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Wednesday, September 29, 2021 5:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: [PATCH 2/7]AArch64 Add combine patterns for narrowing shift of
> half top bits (shuffle)
>
> Hi All,
>
> When doing a (narrowing) right shift by half the width of the original type then
> we are essentially shuffling the top bits from the first number down.
>
> If we have a hi/lo pair we can just use a single shuffle instead of needing two
> shifts.
>
> i.e.
>
> typedef short int16_t;
> typedef unsigned short uint16_t;
>
> void foo (uint16_t * restrict a, int16_t * restrict d, int n)
> {
>     for( int i = 0; i < n; i++ )
>       d[i] = (a[i] * a[i]) >> 16;
> }
>
> now generates:
>
> .L4:
>         ldr     q0, [x0, x3]
>         umull   v1.4s, v0.4h, v0.4h
>         umull2  v0.4s, v0.8h, v0.8h
>         uzp2    v0.8h, v1.8h, v0.8h
>         str     q0, [x1, x3]
>         add     x3, x3, 16
>         cmp     x4, x3
>         bne     .L4
>
> instead of
>
> .L4:
>         ldr     q0, [x0, x3]
>         umull   v1.4s, v0.4h, v0.4h
>         umull2  v0.4s, v0.8h, v0.8h
>         sshr    v1.4s, v1.4s, 16
>         sshr    v0.4s, v0.4s, 16
>         xtn     v1.4h, v1.4s
>         xtn2    v1.8h, v0.4s
>         str     q1, [x1, x3]
>         add     x3, x3, 16
>         cmp     x4, x3
>         bne     .L4
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>

Ok.
Thanks,
Kyrill

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-simd.md
>         (*aarch64_<srn_op>topbits_shuffle<mode>,
>         *aarch64_topbits_shuffle<mode>): New.
>         * config/aarch64/predicates.md
>         (aarch64_simd_shift_imm_vec_exact_top): New.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/shrn-combine-2.c: New test.
>         * gcc.target/aarch64/shrn-combine-3.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index d7b6cae424622d259f97a3d5fa9093c0fb0bd5ce..300bf001b59ca7fa197c580b10adb7f70f20d1e0 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1840,6 +1840,36 @@ (define_insn "*aarch64_<srn_op>shrn<mode>2_vect"
>    [(set_attr "type" "neon_shift_imm_narrow_q")]
>  )
>
> +(define_insn "*aarch64_<srn_op>topbits_shuffle<mode>"
> +  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
> +        (vec_concat:<VNARROWQ2>
> +          (truncate:<VNARROWQ>
> +            (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
> +              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")))
> +          (truncate:<VNARROWQ>
> +            (SHIFTRT:VQN (match_operand:VQN 3 "register_operand" "w")
> +              (match_dup 2)))))]
> +  "TARGET_SIMD"
> +  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
> +  [(set_attr "type" "neon_permute<q>")]
> +)
> +
> +(define_insn "*aarch64_topbits_shuffle<mode>"
> +  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
> +        (vec_concat:<VNARROWQ2>
> +          (unspec:<VNARROWQ> [
> +              (match_operand:VQN 1 "register_operand" "w")
> +              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")
> +            ] UNSPEC_RSHRN)
> +          (unspec:<VNARROWQ> [
> +              (match_operand:VQN 3 "register_operand" "w")
> +              (match_dup 2)
> +            ] UNSPEC_RSHRN)))]
> +  "TARGET_SIMD"
> +  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
> +  [(set_attr "type" "neon_permute<q>")]
> +)
> +
>  (define_expand "aarch64_shrn<mode>"
>    [(set (match_operand:<VNARROWQ> 0 "register_operand")
>          (truncate:<VNARROWQ>
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index 49f02ae0381359174fed80c2a2264295c75bc189..7fd4f9e7d06d3082d6f3047290f0446789e1d0d2 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -545,6 +545,12 @@ (define_predicate "aarch64_simd_shift_imm_offset_di"
>    (and (match_code "const_int")
>         (match_test "IN_RANGE (INTVAL (op), 1, 64)")))
>
> +(define_predicate "aarch64_simd_shift_imm_vec_exact_top"
> +  (and (match_code "const_vector")
> +       (match_test "aarch64_const_vec_all_same_in_range_p (op,
> +                      GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2,
> +                      GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2)")))
> +
>  (define_predicate "aarch64_simd_shift_imm_vec_qi"
>    (and (match_code "const_vector")
>         (match_test "aarch64_const_vec_all_same_in_range_p (op, 1, 8)")))
> diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..924b3b849e449082b8c0b7dc6b955a2bad8d0911
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
> @@ -0,0 +1,15 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
> +
> +typedef short int16_t;
> +typedef unsigned short uint16_t;
> +
> +void foo (uint16_t * restrict a, int16_t * restrict d, int n)
> +{
> +    for( int i = 0; i < n; i++ )
> +      d[i] = (a[i] * a[i]) >> 16;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
> +/* { dg-final { scan-assembler-not {\tshrn\t} } } */
> +/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..929a55c5c338844e6a5c5ad249af482286ab9c61
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c
> @@ -0,0 +1,14 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
> +
> +
> +#include <arm_neon.h>
> +
> +uint16x8_t foo (uint32x4_t a, uint32x4_t b)
> +{
> +  return vrshrn_high_n_u32 (vrshrn_n_u32 (a, 16), b, 16);
> +}
> +
> +/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
> +/* { dg-final { scan-assembler-not {\tshrn\t} } } */
> +/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
>
>
> --
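The new predicate passes the same value, half the element bit size, as both the lower and upper bound, so it accepts only a constant vector whose elements all equal exactly half the element width. A minimal C sketch of that check follows. It assumes the range test is inclusive at both ends; `all_same_in_range` is a hypothetical stand-in for `aarch64_const_vec_all_same_in_range_p`, and `shift_is_exact_top` and `predicate_demo` are names invented here, not GCC functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for aarch64_const_vec_all_same_in_range_p:
   true when all N elements are equal and lie in [LO, HI] inclusive.  */
bool all_same_in_range (const int64_t *elts, size_t n, int64_t lo, int64_t hi)
{
  if (n == 0)
    return false;
  for (size_t i = 1; i < n; i++)
    if (elts[i] != elts[0])
      return false;
  return elts[0] >= lo && elts[0] <= hi;
}

/* Model of the new predicate: with LO == HI == UNIT_BITS / 2 the only
   accepted shift amount is exactly half the element width.  */
bool shift_is_exact_top (const int64_t *elts, size_t n, unsigned unit_bits)
{
  int64_t half = unit_bits / 2;
  return all_same_in_range (elts, n, half, half);
}

/* Usage example for 32-bit elements.  */
void predicate_demo (void)
{
  const int64_t by16[4] = { 16, 16, 16, 16 };
  const int64_t by15[4] = { 15, 15, 15, 15 };

  assert (shift_is_exact_top (by16, 4, 32));
  assert (!shift_is_exact_top (by15, 4, 32));
}
```

Any other shift amount, or a vector with mixed amounts, is rejected, which matches the intent that only a shift by exactly half the width selects the top half of each element.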
Hi All,

This is a new version with more tests and BE support.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md
        (*aarch64_<srn_op>topbits_shuffle<mode>_le): New.
        (*aarch64_topbits_shuffle<mode>_le): New.
        (*aarch64_<srn_op>topbits_shuffle<mode>_be): New.
        (*aarch64_topbits_shuffle<mode>_be): New.
        * config/aarch64/predicates.md
        (aarch64_simd_shift_imm_vec_exact_top): New.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/shrn-combine-10.c: New test.
        * gcc.target/aarch64/shrn-combine-5.c: New test.
        * gcc.target/aarch64/shrn-combine-6.c: New test.
        * gcc.target/aarch64/shrn-combine-7.c: New test.
        * gcc.target/aarch64/shrn-combine-8.c: New test.
        * gcc.target/aarch64/shrn-combine-9.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 5715db4e1e1386e724e4d4defd5e5ed9efd8a874..7f0888ee2f81ae17ac97be1f8438a2e588587c2a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1852,6 +1852,66 @@ (define_insn "*aarch64_<srn_op>shrn<mode>2_vect_be"
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )

+(define_insn "*aarch64_<srn_op>topbits_shuffle<mode>_le"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+        (vec_concat:<VNARROWQ2>
+          (truncate:<VNARROWQ>
+            (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
+              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")))
+          (truncate:<VNARROWQ>
+            (SHIFTRT:VQN (match_operand:VQN 3 "register_operand" "w")
+              (match_dup 2)))))]
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute<q>")]
+)
+
+(define_insn "*aarch64_topbits_shuffle<mode>_le"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+        (vec_concat:<VNARROWQ2>
+          (unspec:<VNARROWQ> [
+              (match_operand:VQN 1 "register_operand" "w")
+              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")
+            ] UNSPEC_RSHRN)
+          (unspec:<VNARROWQ> [
+              (match_operand:VQN 3 "register_operand" "w")
+              (match_dup 2)
+            ] UNSPEC_RSHRN)))]
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute<q>")]
+)
+
+(define_insn "*aarch64_<srn_op>topbits_shuffle<mode>_be"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+        (vec_concat:<VNARROWQ2>
+          (truncate:<VNARROWQ>
+            (SHIFTRT:VQN (match_operand:VQN 3 "register_operand" "w")
+              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")))
+          (truncate:<VNARROWQ>
+            (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
+              (match_dup 2)))))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute<q>")]
+)
+
+(define_insn "*aarch64_topbits_shuffle<mode>_be"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+        (vec_concat:<VNARROWQ2>
+          (unspec:<VNARROWQ> [
+              (match_operand:VQN 3 "register_operand" "w")
+              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")
+            ] UNSPEC_RSHRN)
+          (unspec:<VNARROWQ> [
+              (match_operand:VQN 1 "register_operand" "w")
+              (match_dup 2)
+            ] UNSPEC_RSHRN)))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute<q>")]
+)
+
 (define_expand "aarch64_shrn<mode>"
   [(set (match_operand:<VNARROWQ> 0 "register_operand")
         (truncate:<VNARROWQ>
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 49f02ae0381359174fed80c2a2264295c75bc189..7fd4f9e7d06d3082d6f3047290f0446789e1d0d2 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -545,6 +545,12 @@ (define_predicate "aarch64_simd_shift_imm_offset_di"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 1, 64)")))

+(define_predicate "aarch64_simd_shift_imm_vec_exact_top"
+  (and (match_code "const_vector")
+       (match_test "aarch64_const_vec_all_same_in_range_p (op,
+                      GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2,
+                      GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2)")))
+
 (define_predicate "aarch64_simd_shift_imm_vec_qi"
   (and (match_code "const_vector")
        (match_test "aarch64_const_vec_all_same_in_range_p (op, 1, 8)")))
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c
new file mode 100644
index 0000000000000000000000000000000000000000..3a1cfce93e9065e8d5b43a770b0ef24a17586411
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c
@@ -0,0 +1,14 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+
+#include <arm_neon.h>
+
+uint32x4_t foo (uint64x2_t a, uint64x2_t b)
+{
+  return vrshrn_high_n_u64 (vrshrn_n_u64 (a, 32), b, 32);
+}
+
+/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
+/* { dg-final { scan-assembler-not {\tshrn\t} } } */
+/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-5.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..408e85535788b2c1c9b05672a269e4e6567f2683
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-5.c
@@ -0,0 +1,16 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+#define TYPE1 char
+#define TYPE2 short
+#define SHIFT 8
+
+void foo (TYPE2 * restrict a, TYPE1 * restrict d, int n)
+{
+    for( int i = 0; i < n; i++ )
+      d[i] = a[i] >> SHIFT;
+}
+
+/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
+/* { dg-final { scan-assembler-not {\tshrn\t} } } */
+/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-6.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..6211ba3e41c199f325b80217d298801767c8dad5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-6.c
@@ -0,0 +1,16 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+#define TYPE1 short
+#define TYPE2 int
+#define SHIFT 16
+
+void foo (TYPE2 * restrict a, TYPE1 * restrict d, int n)
+{
+    for( int i = 0; i < n; i++ )
+      d[i] = a[i] >> SHIFT;
+}
+
+/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
+/* { dg-final { scan-assembler-not {\tshrn\t} } } */
+/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c
new file mode 100644
index 0000000000000000000000000000000000000000..56cbeacc6de54f177f5b66d26b62ba6cefb921ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c
@@ -0,0 +1,16 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+#define TYPE1 int
+#define TYPE2 long long
+#define SHIFT 32
+
+void foo (TYPE2 * restrict a, TYPE1 * restrict d, int n)
+{
+    for( int i = 0; i < n; i++ )
+      d[i] = a[i] >> SHIFT;
+}
+
+/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
+/* { dg-final { scan-assembler-not {\tshrn\t} } } */
+/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-8.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-8.c
new file mode 100644
index 0000000000000000000000000000000000000000..6a47f3cdaee399e603c57a1c6a0c09c6cfd21abb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-8.c
@@ -0,0 +1,14 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+
+#include <arm_neon.h>
+
+uint8x16_t foo (uint16x8_t a, uint16x8_t b)
+{
+  return vrshrn_high_n_u16 (vrshrn_n_u16 (a, 8), b, 8);
+}
+
+/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
+/* { dg-final { scan-assembler-not {\tshrn\t} } } */
+/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-9.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..929a55c5c338844e6a5c5ad249af482286ab9c61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-9.c
@@ -0,0 +1,14 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+
+#include <arm_neon.h>
+
+uint16x8_t foo (uint32x4_t a, uint32x4_t b)
+{
+  return vrshrn_high_n_u32 (vrshrn_n_u32 (a, 16), b, 16);
+}
+
+/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
+/* { dg-final { scan-assembler-not {\tshrn\t} } } */
+/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
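Tests shrn-combine-5.c to shrn-combine-7.c shift signed types, while the first `define_insn` is written over the `SHIFTRT` iterator. Assuming that iterator covers both arithmetic and logical right shifts (as its pairing with `<srn_op>` suggests), the reason one shuffle serves both is that the sign or zero fill bits are discarded by the truncation. The sketch below illustrates this for the 32-bit to 16-bit case; the function names are hypothetical, and it relies on `>>` of a negative value being an arithmetic shift, which ISO C leaves implementation-defined but GCC documents.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift by half the width, then truncate to 16 bits.  */
uint16_t top_half_logical (uint32_t x)
{
  return (uint16_t) (x >> 16);
}

/* Arithmetic shift by half the width, then truncate to 16 bits.
   Assumption: >> on a negative int32_t sign-extends (true for GCC).
   The conversion to uint16_t is well defined (reduction modulo 2^16).  */
uint16_t top_half_arithmetic (int32_t x)
{
  return (uint16_t) (x >> 16);
}

/* Return 1 when both shifts leave the same 16 bits after truncation.  */
int shifts_agree (int32_t x)
{
  return top_half_arithmetic (x) == top_half_logical ((uint32_t) x);
}

/* Usage example on a few boundary values.  */
void shift_demo (void)
{
  assert (shifts_agree (0));
  assert (shifts_agree (-1));
  assert (shifts_agree (INT32_MIN));
  assert (shifts_agree (INT32_MAX));
}
```

The same argument applies to the 16-bit to 8-bit and 64-bit to 32-bit cases exercised by the other tests, since in each case the shift amount equals the width of the narrow result.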
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Tuesday, October 12, 2021 5:23 PM
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: RE: [PATCH 2/7]AArch64 Add combine patterns for narrowing shift
> of half top bits (shuffle)
>
> Hi All,
>
> This is a new version with more tests and BE support.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

Ok.
Thanks,
Kyrill

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-simd.md
>         (*aarch64_<srn_op>topbits_shuffle<mode>_le): New.
>         (*aarch64_topbits_shuffle<mode>_le): New.
>         (*aarch64_<srn_op>topbits_shuffle<mode>_be): New.
>         (*aarch64_topbits_shuffle<mode>_be): New.
>         * config/aarch64/predicates.md
>         (aarch64_simd_shift_imm_vec_exact_top): New.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/shrn-combine-10.c: New test.
>         * gcc.target/aarch64/shrn-combine-5.c: New test.
>         * gcc.target/aarch64/shrn-combine-6.c: New test.
>         * gcc.target/aarch64/shrn-combine-7.c: New test.
>         * gcc.target/aarch64/shrn-combine-8.c: New test.
>         * gcc.target/aarch64/shrn-combine-9.c: New test.
> +/* { dg-final { scan-assembler-not {\tshrn2\t} } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c > b/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..56cbeacc6de54f177f5b66d > 26b62ba6cefb921ad > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c > @@ -0,0 +1,16 @@ > +/* { dg-do assemble } */ > +/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */ > + > +#define TYPE1 int > +#define TYPE2 long long > +#define SHIFT 32 > + > +void foo (TYPE2 * restrict a, TYPE1 * restrict d, int n) > +{ > + for( int i = 0; i < n; i++ ) > + d[i] = a[i] >> SHIFT; > +} > + > +/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */ > +/* { dg-final { scan-assembler-not {\tshrn\t} } } */ > +/* { dg-final { scan-assembler-not {\tshrn2\t} } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-8.c > b/gcc/testsuite/gcc.target/aarch64/shrn-combine-8.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..6a47f3cdaee399e603c57a1 > c6a0c09c6cfd21abb > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-8.c > @@ -0,0 +1,14 @@ > +/* { dg-do assemble } */ > +/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */ > + > + > +#include <arm_neon.h> > + > +uint8x16_t foo (uint16x8_t a, uint16x8_t b) > +{ > + return vrshrn_high_n_u16 (vrshrn_n_u16 (a, 8), b, 8); > +} > + > +/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */ > +/* { dg-final { scan-assembler-not {\tshrn\t} } } */ > +/* { dg-final { scan-assembler-not {\tshrn2\t} } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-9.c > b/gcc/testsuite/gcc.target/aarch64/shrn-combine-9.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..929a55c5c338844e6a5c5ad > 249af482286ab9c61 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-9.c > @@ -0,0 +1,14 @@ > +/* { dg-do assemble } */ > 
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */ > + > + > +#include <arm_neon.h> > + > +uint16x8_t foo (uint32x4_t a, uint32x4_t b) > +{ > + return vrshrn_high_n_u32 (vrshrn_n_u32 (a, 16), b, 16); > +} > + > +/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */ > +/* { dg-final { scan-assembler-not {\tshrn\t} } } */ > +/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d7b6cae424622d259f97a3d5fa9093c0fb0bd5ce..300bf001b59ca7fa197c580b10adb7f70f20d1e0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1840,6 +1840,36 @@ (define_insn "*aarch64_<srn_op>shrn<mode>2_vect"
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
 
+(define_insn "*aarch64_<srn_op>topbits_shuffle<mode>"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+        (vec_concat:<VNARROWQ2>
+          (truncate:<VNARROWQ>
+            (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
+              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")))
+          (truncate:<VNARROWQ>
+            (SHIFTRT:VQN (match_operand:VQN 3 "register_operand" "w")
+              (match_dup 2)))))]
+  "TARGET_SIMD"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute<q>")]
+)
+
+(define_insn "*aarch64_topbits_shuffle<mode>"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+        (vec_concat:<VNARROWQ2>
+          (unspec:<VNARROWQ> [
+              (match_operand:VQN 1 "register_operand" "w")
+              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")
+             ] UNSPEC_RSHRN)
+          (unspec:<VNARROWQ> [
+              (match_operand:VQN 3 "register_operand" "w")
+              (match_dup 2)
+             ] UNSPEC_RSHRN)))]
+  "TARGET_SIMD"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute<q>")]
+)
+
 (define_expand "aarch64_shrn<mode>"
   [(set (match_operand:<VNARROWQ> 0 "register_operand")
         (truncate:<VNARROWQ>
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 49f02ae0381359174fed80c2a2264295c75bc189..7fd4f9e7d06d3082d6f3047290f0446789e1d0d2 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -545,6 +545,12 @@ (define_predicate "aarch64_simd_shift_imm_offset_di"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 1, 64)")))
 
+(define_predicate "aarch64_simd_shift_imm_vec_exact_top"
+  (and (match_code "const_vector")
+       (match_test "aarch64_const_vec_all_same_in_range_p (op,
+                    GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2,
+                    GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2)")))
+
 (define_predicate "aarch64_simd_shift_imm_vec_qi"
   (and (match_code "const_vector")
        (match_test "aarch64_const_vec_all_same_in_range_p (op, 1, 8)")))
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..924b3b849e449082b8c0b7dc6b955a2bad8d0911
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+typedef short int16_t;
+typedef unsigned short uint16_t;
+
+void foo (uint16_t * restrict a, int16_t * restrict d, int n)
+{
+  for( int i = 0; i < n; i++ )
+    d[i] = (a[i] * a[i]) >> 16;
+}
+
+/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
+/* { dg-final { scan-assembler-not {\tshrn\t} } } */
+/* { dg-final { scan-assembler-not {\tshrn2\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..929a55c5c338844e6a5c5ad249af482286ab9c61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c
@@ -0,0 +1,14 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+
+#include <arm_neon.h>
+
+uint16x8_t foo (uint32x4_t a, uint32x4_t b)
+{
+  return vrshrn_high_n_u32 (vrshrn_n_u32 (a, 16), b, 16);
+}
+
+/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
+/* { dg-final { scan-assembler-not {\tshrn\t} } } */
+/* { dg-final { scan-assembler-not {\tshrn2\t} } } */