From patchwork Tue Apr 5 18:08:55 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 52649
From: Tamar Christina
Reply-To: Tamar Christina
To: nd, GCC Patches
Cc: Richard Earnshaw, Richard Sandiford, Marcus Shawcroft
Subject: [PATCH]AArch64 Fix left fold sum reduction RTL patterns [PR104049]
Date: Tue, 5 Apr 2022 18:08:55 +0000
List-Id: Gcc-patches mailing list

Hi All,

As the discussion in the PR pointed out, the RTL we have for the REDUC_PLUS
patterns is wrong.  The UNSPECs are modelled as returning a vector, and then
in an expand pattern we emit a vec_select of the 0th element to get the scalar.

This is incorrect: the instruction itself already returns only a single
scalar, and declaring that it returns a vector allows combine to push a subreg
into the pattern, which causes reload to make duplicate moves.

This patch corrects this by removing the weird indirection and making the RTL
pattern model the correct semantics of the instruction directly.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/104049
	* config/aarch64/aarch64-simd.md (aarch64_reduc_plus_internal): Fix RTL
	and rename to...
	(reduc_plus_scal_): ... This.
	(reduc_plus_scal_v4sf): Moved.
	(aarch64_reduc_plus_internalv2si): Fix RTL and rename to...
	(reduc_plus_scal_v2si): ... This.

gcc/testsuite/ChangeLog:

	PR target/104049
	* gcc.target/aarch64/vadd_reduc-1.c: New test.
	* gcc.target/aarch64/vadd_reduc-2.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 18733428f3fb91d937346aa360f6d1fe13ca1eae..ad69a1c4eafe30a98e3d2273fe498694eeddc05a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3385,20 +3385,6 @@ (define_insn "3"
 
 ;; 'across lanes' add.
 
-(define_expand "reduc_plus_scal_"
-  [(match_operand: 0 "register_operand")
-   (unspec:VDQ_I [(match_operand:VDQ_I 1 "register_operand")]
-                 UNSPEC_ADDV)]
-  "TARGET_SIMD"
-  {
-    rtx elt = aarch64_endian_lane_rtx (mode, 0);
-    rtx scratch = gen_reg_rtx (mode);
-    emit_insn (gen_aarch64_reduc_plus_internal (scratch, operands[1]));
-    emit_insn (gen_aarch64_get_lane (operands[0], scratch, elt));
-    DONE;
-  }
-)
-
 (define_insn "aarch64_faddp"
  [(set (match_operand:VHSDF 0 "register_operand" "=w")
        (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
@@ -3409,15 +3395,47 @@ (define_insn "aarch64_faddp"
   [(set_attr "type" "neon_fp_reduc_add_")]
 )
 
-(define_insn "aarch64_reduc_plus_internal"
- [(set (match_operand:VDQV 0 "register_operand" "=w")
-       (unspec:VDQV [(match_operand:VDQV 1 "register_operand" "w")]
+(define_insn "reduc_plus_scal_"
+ [(set (match_operand: 0 "register_operand" "=w")
+       (unspec: [(match_operand:VDQV 1 "register_operand" "w")]
                     UNSPEC_ADDV))]
  "TARGET_SIMD"
  "add\\t%0, %1."
   [(set_attr "type" "neon_reduc_add")]
 )
 
+(define_insn "reduc_plus_scal_v2si"
+ [(set (match_operand:SI 0 "register_operand" "=w")
+       (unspec:SI [(match_operand:V2SI 1 "register_operand" "w")]
+                  UNSPEC_ADDV))]
+ "TARGET_SIMD"
+ "addp\\t%0.2s, %1.2s, %1.2s"
+  [(set_attr "type" "neon_reduc_add")]
+)
+
+(define_insn "reduc_plus_scal_"
+ [(set (match_operand: 0 "register_operand" "=w")
+       (unspec: [(match_operand:V2F 1 "register_operand" "w")]
+                UNSPEC_FADDV))]
+ "TARGET_SIMD"
+ "faddp\\t%0, %1."
+  [(set_attr "type" "neon_fp_reduc_add_")]
+)
+
+(define_expand "reduc_plus_scal_v4sf"
+ [(set (match_operand:SF 0 "register_operand")
+       (unspec:SF [(match_operand:V4SF 1 "register_operand")]
+                  UNSPEC_FADDV))]
+ "TARGET_SIMD"
+{
+  rtx elt = aarch64_endian_lane_rtx (V4SFmode, 0);
+  rtx scratch = gen_reg_rtx (V4SFmode);
+  emit_insn (gen_aarch64_faddpv4sf (scratch, operands[1], operands[1]));
+  emit_insn (gen_aarch64_faddpv4sf (scratch, scratch, scratch));
+  emit_insn (gen_aarch64_get_lanev4sf (operands[0], scratch, elt));
+  DONE;
+})
+
 (define_insn "aarch64_addlv"
  [(set (match_operand: 0 "register_operand" "=w")
        (unspec: [(match_operand:VDQV_L 1 "register_operand" "w")]
@@ -3447,38 +3465,6 @@ (define_insn "aarch64_zero_extend_reduc_plus_"
   [(set_attr "type" "neon_reduc_add")]
 )
 
-(define_insn "aarch64_reduc_plus_internalv2si"
- [(set (match_operand:V2SI 0 "register_operand" "=w")
-       (unspec:V2SI [(match_operand:V2SI 1 "register_operand" "w")]
-                    UNSPEC_ADDV))]
- "TARGET_SIMD"
- "addp\\t%0.2s, %1.2s, %1.2s"
-  [(set_attr "type" "neon_reduc_add")]
-)
-
-(define_insn "reduc_plus_scal_"
- [(set (match_operand: 0 "register_operand" "=w")
-       (unspec: [(match_operand:V2F 1 "register_operand" "w")]
-                UNSPEC_FADDV))]
- "TARGET_SIMD"
- "faddp\\t%0, %1."
-  [(set_attr "type" "neon_fp_reduc_add_")]
-)
-
-(define_expand "reduc_plus_scal_v4sf"
- [(set (match_operand:SF 0 "register_operand")
-       (unspec:V4SF [(match_operand:V4SF 1 "register_operand")]
-                    UNSPEC_FADDV))]
- "TARGET_SIMD"
-{
-  rtx elt = aarch64_endian_lane_rtx (V4SFmode, 0);
-  rtx scratch = gen_reg_rtx (V4SFmode);
-  emit_insn (gen_aarch64_faddpv4sf (scratch, operands[1], operands[1]));
-  emit_insn (gen_aarch64_faddpv4sf (scratch, scratch, scratch));
-  emit_insn (gen_aarch64_get_lanev4sf (operands[0], scratch, elt));
-  DONE;
-})
-
 (define_insn "clrsb2"
   [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
         (clrsb:VDQ_BHSI (match_operand:VDQ_BHSI 1 "register_operand" "w")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/vadd_reduc-1.c b/gcc/testsuite/gcc.target/aarch64/vadd_reduc-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..271a1c3e8c319f7bdb8669e4b3199b6f88ef7f5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vadd_reduc-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <arm_neon.h>
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+/*
+**bar:
+** ...
+** addv s0, v0.4s
+** fmov w0, s0
+** lsr w1, w0, 16
+** add w0, w1, w0, uxth
+** ret
+*/
+int bar (v4si x)
+{
+  unsigned int sum = vaddvq_s32 (x);
+  return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vadd_reduc-2.c b/gcc/testsuite/gcc.target/aarch64/vadd_reduc-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..5355cbcca34f40a686da837acf75780eb4286974
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vadd_reduc-2.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <stdint.h>
+
+/*
+**test:
+** ...
+** addv s0, v0.4s
+** fmov w0, s0
+** and w1, w0, 65535
+** add w0, w1, w0, lsr 16
+** lsr w0, w0, 1
+** ret
+*/
+int test (uint8_t *p, uint32_t t[1][1], int n) {
+
+  int sum = 0;
+  uint32_t a0;
+  for (int i = 0; i < 4; i++, p++)
+    t[i][0] = p[0];
+
+  for (int i = 0; i < 4; i++) {
+    {
+      int t0 = t[0][i] + t[0][i];
+      a0 = t0;
+    };
+    sum += a0;
+  }
+  return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
+}
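
For anyone without an AArch64 toolchain to hand, the arithmetic that bar in
vadd_reduc-1.c exercises can be sanity-checked on any host.  The sketch below
is only a scalar model and is not part of the patch: addv_model and fold16 are
hypothetical helper names (not GCC or ACLE functions), and the loop merely
stands in for what a single across-lanes add produces, namely one scalar that
is the wrapping sum of the four 32-bit lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of an across-lanes add over four 32-bit
   lanes.  The result is a single scalar, not a vector, which is the
   property the corrected RTL patterns are meant to express.  */
static uint32_t
addv_model (const int32_t lanes[4])
{
  assert (lanes != NULL);
  uint32_t sum = 0;
  for (int i = 0; i < 4; i++)
    sum += (uint32_t) lanes[i];	/* Unsigned addition wraps modulo 2^32.  */
  return sum;
}

/* Same expression as the return statement of bar: add the low 16 bits
   of the sum to its high 16 bits.  */
static uint32_t
fold16 (uint32_t sum)
{
  return (uint16_t) (sum & 0xffff) + (sum >> 16);
}
```

With lanes { 0x10000, 0x20000, 3, 4 } the modelled sum is 0x30007, and fold16
of that value is 10 (3 from the high half plus 7 from the low half).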