From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Kyrylo Tkachov
Date: Thu, 1 Feb 2024 13:45:31 +0000
Subject: [PATCH] AArch64: memcpy/memset expansions should not emit LDP/STP [PR113618]

The new RTL introduced for LDP/STP results in regressions due to use of UNSPEC.
Given that the new LDP fusion pass is good at finding LDP opportunities,
change the memcpy, memmove and memset expansions to emit single vector
loads/stores.  This fixes the regression and enables more RTL optimization
on the standard memory accesses.  SPEC2017 performance improves slightly.
Code size is slightly worse due to missed LDP opportunities, as discussed
in the PR.

Passes regress, OK for commit?

gcc/ChangeLog:

	PR target/113618
	* config/aarch64/aarch64.cc (aarch64_copy_one_block): Remove.
	(aarch64_expand_cpymem): Emit single load/store only.
	(aarch64_set_one_block): Remove.
	(aarch64_expand_setmem): Emit single stores only.

gcc/testsuite/ChangeLog:

	PR target/113618
	* gcc.target/aarch64/pr113618.c: New test.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d17198b4a5f73f8be8aeca3258b81809ffb48eac..2194441b949a53f181fe373e07bc18341c014918 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26376,33 +26376,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
                                     next, amount);
 }
 
-typedef auto_vec, 12> copy_ops;
-
-/* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
-static void
-aarch64_copy_one_block (copy_ops &ops, rtx src, rtx dst,
-                        int offset, machine_mode mode)
-{
-  /* Emit explict load/store pair instructions for 32-byte copies.  */
-  if (known_eq (GET_MODE_SIZE (mode), 32))
-    {
-      mode = V4SImode;
-      rtx src1 = adjust_address (src, mode, offset);
-      rtx dst1 = adjust_address (dst, mode, offset);
-      rtx reg1 = gen_reg_rtx (mode);
-      rtx reg2 = gen_reg_rtx (mode);
-      rtx load = aarch64_gen_load_pair (reg1, reg2, src1);
-      rtx store = aarch64_gen_store_pair (dst1, reg1, reg2);
-      ops.safe_push ({ load, store });
-      return;
-    }
-
-  rtx reg = gen_reg_rtx (mode);
-  rtx load = gen_move_insn (reg, adjust_address (src, mode, offset));
-  rtx store = gen_move_insn (adjust_address (dst, mode, offset), reg);
-  ops.safe_push ({ load, store });
-}
-
 /* Expand a cpymem/movmem using the MOPS extension.
    OPERANDS are taken from the cpymem/movmem pattern.  IS_MEMMOVE is true
    if this is a memmove rather than memcpy.  Return true iff we succeeded.  */
@@ -26438,7 +26411,7 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   rtx src = operands[1];
   unsigned align = UINTVAL (operands[3]);
   rtx base;
-  machine_mode cur_mode = BLKmode, next_mode;
+  machine_mode mode = BLKmode, next_mode;
 
   /* Variable-sized or strict-align copies may use the MOPS expansion.  */
   if (!CONST_INT_P (operands[2]) || (STRICT_ALIGNMENT && align < 16))
@@ -26465,7 +26438,7 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
      ??? Although it would be possible to use LDP/STP Qn in streaming mode
      (so using TARGET_BASE_SIMD instead of TARGET_SIMD), it isn't clear
      whether that would improve performance.  */
-  unsigned copy_max = (size <= 24 || !TARGET_SIMD) ? 16 : 32;
+  bool use_qregs = size > 24 && TARGET_SIMD;
 
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
@@ -26473,7 +26446,7 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   base = copy_to_mode_reg (Pmode, XEXP (src, 0));
   src = adjust_automodify_address (src, VOIDmode, base, 0);
 
-  copy_ops ops;
+  auto_vec, 16> ops;
   int offset = 0;
 
   while (size > 0)
@@ -26482,23 +26455,27 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
          or writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-        if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (size, copy_max))
-          cur_mode = mode_iter.require ();
+        if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (size, 16))
+          mode = mode_iter.require ();
+
+      gcc_assert (mode != BLKmode);
 
-      gcc_assert (cur_mode != BLKmode);
+      mode_bytes = GET_MODE_SIZE (mode).to_constant ();
 
-      mode_bytes = GET_MODE_SIZE (cur_mode).to_constant ();
+      /* Prefer Q-register accesses.  */
+      if (mode_bytes == 16 && use_qregs)
+        mode = V4SImode;
 
-      /* Prefer Q-register accesses for the last bytes.  */
-      if (mode_bytes == 16 && copy_max == 32)
-        cur_mode = V4SImode;
-      aarch64_copy_one_block (ops, src, dst, offset, cur_mode);
+      rtx reg = gen_reg_rtx (mode);
+      rtx load = gen_move_insn (reg, adjust_address (src, mode, offset));
+      rtx store = gen_move_insn (adjust_address (dst, mode, offset), reg);
+      ops.safe_push ({ load, store });
       size -= mode_bytes;
       offset += mode_bytes;
 
       /* Emit trailing copies using overlapping unaligned accesses
          (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
-      if (size > 0 && size < copy_max / 2 && !STRICT_ALIGNMENT)
+      if (size > 0 && size < 16 && !STRICT_ALIGNMENT)
         {
           next_mode = smallest_mode_for_size (size * BITS_PER_UNIT, MODE_INT);
           int n_bytes = GET_MODE_SIZE (next_mode).to_constant ();
@@ -26510,7 +26487,7 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
 
   /* Memcpy interleaves loads with stores, memmove emits all loads first.  */
   int nops = ops.length();
-  int inc = is_memmove ? nops : nops == 4 ? 2 : 3;
+  int inc = is_memmove || nops <= 8 ? nops : 6;
 
   for (int i = 0; i < nops; i += inc)
     {
@@ -26525,23 +26502,6 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   return true;
 }
 
-/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
-static void
-aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
-{
-  /* Emit explict store pair instructions for 32-byte writes.  */
-  if (known_eq (GET_MODE_SIZE (mode), 32))
-    {
-      mode = V16QImode;
-      rtx dst1 = adjust_address (dst, mode, offset);
-      emit_insn (aarch64_gen_store_pair (dst1, src, src));
-      return;
-    }
-  if (known_lt (GET_MODE_SIZE (mode), 16))
-    src = lowpart_subreg (mode, src, GET_MODE (src));
-  emit_move_insn (adjust_address (dst, mode, offset), src);
-}
-
 /* Expand a setmem using the MOPS instructions.  OPERANDS are the same
    as for the setmem pattern.  Return true iff we succeed.  */
 static bool
@@ -26574,7 +26534,7 @@ aarch64_expand_setmem (rtx *operands)
   rtx val = operands[2], src;
   unsigned align = UINTVAL (operands[3]);
   rtx base;
-  machine_mode cur_mode = BLKmode, next_mode;
+  machine_mode mode = BLKmode, next_mode;
 
   /* Variable-sized or strict-align memset may use the MOPS expansion.  */
   if (!CONST_INT_P (operands[1]) || !TARGET_SIMD
@@ -26595,11 +26555,8 @@ aarch64_expand_setmem (rtx *operands)
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
 
   /* Prepare the val using a DUP/MOVI v0.16B, val.  */
-  src = expand_vector_broadcast (V16QImode, val);
-  src = force_reg (V16QImode, src);
-
-  /* Set maximum number of bytes to write per instruction.  */
-  unsigned set_max = (len <= 24) ? 16 : 32;
+  val = expand_vector_broadcast (V16QImode, val);
+  val = force_reg (V16QImode, val);
 
   int offset = 0;
   while (len > 0)
@@ -26608,24 +26565,28 @@ aarch64_expand_setmem (rtx *operands)
          over writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-        if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (len, set_max))
-          cur_mode = mode_iter.require ();
+        if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (len, 16))
+          mode = mode_iter.require ();
+
+      gcc_assert (mode != BLKmode);
 
-      gcc_assert (cur_mode != BLKmode);
+      mode_bytes = GET_MODE_SIZE (mode).to_constant ();
 
-      mode_bytes = GET_MODE_SIZE (cur_mode).to_constant ();
+      src = val;
 
-      /* Prefer Q-register accesses for the last bytes.  */
+      /* Prefer Q-register accesses.  */
       if (mode_bytes == 16)
-        cur_mode = V16QImode;
+        mode = V16QImode;
+      else
+        src = lowpart_subreg (mode, src, GET_MODE (val));
 
-      aarch64_set_one_block (src, dst, offset, cur_mode);
+      emit_move_insn (adjust_address (dst, mode, offset), src);
       len -= mode_bytes;
       offset += mode_bytes;
 
       /* Emit trailing writes using overlapping unaligned accesses
          (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
-      if (len > 0 && len < set_max / 2 && !STRICT_ALIGNMENT)
+      if (len > 0 && len < 16 && !STRICT_ALIGNMENT)
         {
           next_mode = smallest_mode_for_size (len * BITS_PER_UNIT, MODE_INT);
           int n_bytes = GET_MODE_SIZE (next_mode).to_constant ();
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113618.c b/gcc/testsuite/gcc.target/aarch64/pr113618.c
new file mode 100644
index 0000000000000000000000000000000000000000..f582360e7c1d149ad94c78c3e66f1c4a973750bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113618.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+
+/*
+** move32:
+**	...
+**	ldp	q([0-9]+), q([0-9]+), \[x0\]
+**	stp	q\1, q\2, \[x1\]
+**	...
+*/
+
+void move32 (char *a, char *b)
+{
+  char temp[32];
+  __builtin_memcpy (temp, a, 32);
+  __builtin_memcpy (b, temp, 32);
+}
+
+/*
+** move64:
+**	...
+**	ldp	q([0-9]+), q([0-9]+), \[x0\]
+**	ldp	q([0-9]+), q([0-9]+), \[x0, 32\]
+**	stp	q\1, q\2, \[x1\]
+**	stp	q\3, q\4, \[x1, 32\]
+**	...
+*/
+
+void move64 (char *a, char *b)
+{
+  char temp[64];
+  __builtin_memcpy (temp, a, 64);
+  __builtin_memcpy (b, temp, 64);
+}
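To see which single accesses a constant-size copy or set turns into after this change, the loop arithmetic of the reworked aarch64_expand_cpymem and aarch64_expand_setmem can be replayed outside the compiler. The C sketch below is a rough model, not GCC code. It assumes !STRICT_ALIGNMENT and treats integer modes as power-of-two widths of 1 to 16 bytes. It fills in two behaviours that the hunks only partly show: the trailing block is assumed to step the offset back so the last access overlaps the previous one, and the memcpy loop body is assumed to emit each group's loads before its stores. The names plan_accesses, group_size and emission_order are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC code.  Replays the size arithmetic of the
   reworked aarch64_expand_cpymem / aarch64_expand_setmem loops for a
   constant length, assuming !STRICT_ALIGNMENT and integer modes of
   1, 2, 4, 8 and 16 bytes.  */

#define MAX_OPS 64

/* Largest power of two <= N, capped at 16: models picking the widest
   integer mode no larger than MIN (size, 16).  */
static int
largest_chunk (int n)
{
  int c = 1;
  while (c * 2 <= n && c * 2 <= 16)
    c *= 2;
  return c;
}

/* Smallest power of two >= N: models smallest_mode_for_size for the
   trailing bytes.  */
static int
smallest_chunk (int n)
{
  int c = 1;
  while (c < n)
    c *= 2;
  return c;
}

/* Record one single access per loop iteration: OFFSETS[i] and SIZES[i]
   are its byte offset and width.  Returns the number of accesses.  */
int
plan_accesses (int size, int offsets[], int sizes[])
{
  int n = 0, offset = 0;
  while (size > 0)
    {
      int bytes = largest_chunk (size);
      assert (n < MAX_OPS);
      offsets[n] = offset;
      sizes[n] = bytes;
      n++;
      size -= bytes;
      offset += bytes;

      /* Assumed trailing step: round the remainder up to a power of two
         and move the offset back so the final access overlaps the
         previous one.  */
      if (size > 0 && size < 16)
        {
          int n_bytes = smallest_chunk (size);
          assert (n_bytes <= bytes);
          offset -= n_bytes - size;
          size = n_bytes;
        }
    }
  return n;
}

/* Group size for NOPS load/store pairs, mirroring the new
   "int inc = is_memmove || nops <= 8 ? nops : 6;".  */
int
group_size (int nops, bool is_memmove)
{
  return is_memmove || nops <= 8 ? nops : 6;
}

/* Emission order, assuming each group emits its loads and then its
   stores.  Load J is written to OUT as J and store J as -(J + 1).
   Returns the number of entries written (2 * NOPS).  */
int
emission_order (int nops, bool is_memmove, int out[])
{
  int inc = group_size (nops, is_memmove);
  int k = 0;
  for (int i = 0; i < nops; i += inc)
    {
      int end = i + inc < nops ? i + inc : nops;
      for (int j = i; j < end; j++)
        out[k++] = j;           /* load j */
      for (int j = i; j < end; j++)
        out[k++] = -(j + 1);    /* store j */
    }
  return k;
}
```

Under this model, a 30-byte copy becomes two overlapping 16-byte accesses at offsets 0 and 14. A 7-byte copy becomes two 4-byte accesses at offsets 0 and 3. A 32-byte copy becomes two adjacent 16-byte accesses, and the new pr113618.c test expects these to end up as one ldp/stp q-register pair. With 10 load/store pairs, memcpy emits 6 loads, 6 stores, 4 loads and 4 stores, while memmove emits all 10 loads first. The model tracks only offsets and widths. It does not track whether a 16-byte access uses a Q register (V4SImode when size > 24) or an integer mode.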