From patchwork Thu Sep 21 16:19:51 2023
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 76535
From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Richard Earnshaw
Subject: [PATCH] AArch64: Add inline memmove expansion
Date: Thu, 21 Sep 2023 16:19:51 +0000
Add support for inline memmove expansion.  The generated code is identical
to that for memcpy, except that all loads are emitted before the stores
rather than being interleaved.  The maximum size is 256 bytes, which
requires at most 16 registers.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog:
	* config/aarch64/aarch64.opt (aarch64_mops_memmove_size_threshold):
	Change default.
	* config/aarch64/aarch64.md (cpymemdi): Add a parameter.
	(movmemdi): Call aarch64_expand_cpymem.
	* config/aarch64/aarch64.cc (aarch64_copy_one_block): Rename function,
	simplify, support storing generated loads/stores.
	(aarch64_expand_cpymem): Support expansion of memmove.
	* config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Add bool arg.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/memmove.c: Add new test.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e8d91cba30e32e03c4794ccc24254691d135f2dd..e224218600969d9d052128790f1524414bbab5c6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -766,7 +766,7 @@ bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 tree aarch64_vector_load_decl (tree);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem_mops (rtx *, bool);
-bool aarch64_expand_cpymem (rtx *);
+bool aarch64_expand_cpymem (rtx *, bool);
 bool aarch64_expand_setmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_float_const_rtx_p (rtx);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 8a12894d6b80de1031d6e7d02dca680c57bce136..a573e3bded2736f5108ad2d4004f530e0f32c99c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25191,48 +25191,35 @@ aarch64_progress_pointer (rtx pointer)
    MODE bytes.  */
 
 static void
-aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
-					      machine_mode mode)
+aarch64_copy_one_block (rtx *load, rtx *store, rtx src, rtx dst,
+			int offset, machine_mode mode)
 {
   /* Handle 256-bit memcpy separately.  We do this by making 2 adjacent memory
      address copies using V4SImode so that we can use Q registers.  */
   if (known_eq (GET_MODE_BITSIZE (mode), 256))
     {
       mode = V4SImode;
+      rtx src1 = adjust_address (src, mode, offset);
+      rtx src2 = adjust_address (src, mode, offset + 16);
+      rtx dst1 = adjust_address (dst, mode, offset);
+      rtx dst2 = adjust_address (dst, mode, offset + 16);
       rtx reg1 = gen_reg_rtx (mode);
       rtx reg2 = gen_reg_rtx (mode);
-      /* "Cast" the pointers to the correct mode.  */
-      *src = adjust_address (*src, mode, 0);
-      *dst = adjust_address (*dst, mode, 0);
-      /* Emit the memcpy.  */
-      emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2,
-					aarch64_progress_pointer (*src)));
-      emit_insn (aarch64_gen_store_pair (mode, *dst, reg1,
-					 aarch64_progress_pointer (*dst), reg2));
-      /* Move the pointers forward.  */
-      *src = aarch64_move_pointer (*src, 32);
-      *dst = aarch64_move_pointer (*dst, 32);
+      *load = aarch64_gen_load_pair (mode, reg1, src1, reg2, src2);
+      *store = aarch64_gen_store_pair (mode, dst1, reg1, dst2, reg2);
       return;
     }
 
   rtx reg = gen_reg_rtx (mode);
-
-  /* "Cast" the pointers to the correct mode.  */
-  *src = adjust_address (*src, mode, 0);
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memcpy.  */
-  emit_move_insn (reg, *src);
-  emit_move_insn (*dst, reg);
-  /* Move the pointers forward.  */
-  *src = aarch64_progress_pointer (*src);
-  *dst = aarch64_progress_pointer (*dst);
+  *load = gen_move_insn (reg, adjust_address (src, mode, offset));
+  *store = gen_move_insn (adjust_address (dst, mode, offset), reg);
 }
 
 /* Expand a cpymem/movmem using the MOPS extension.  OPERANDS are taken
    from the cpymem/movmem pattern.  IS_MEMMOVE is true if this is a memmove
    rather than memcpy.  Return true iff we succeeded.  */
 bool
-aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
+aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove)
 {
   if (!TARGET_MOPS)
     return false;
@@ -25251,12 +25238,12 @@ aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
   return true;
 }
 
-/* Expand cpymem, as if from a __builtin_memcpy.  Return true if
-   we succeed, otherwise return false, indicating that a libcall to
-   memcpy should be emitted.  */
-
+/* Expand cpymem/movmem, as if from a __builtin_memcpy/memmove.
+   OPERANDS are taken from the cpymem/movmem pattern.  IS_MEMMOVE is true
+   if this is a memmove rather than memcpy.  Return true if we succeed,
+   otherwise return false, indicating that a libcall should be emitted.  */
 bool
-aarch64_expand_cpymem (rtx *operands)
+aarch64_expand_cpymem (rtx *operands, bool is_memmove)
 {
   int mode_bits;
   rtx dst = operands[0];
@@ -25268,17 +25255,22 @@ aarch64_expand_cpymem (rtx *operands)
 
   /* Variable-sized or strict-align copies may use the MOPS expansion.  */
   if (!CONST_INT_P (operands[2]) || (STRICT_ALIGNMENT && align < 16))
-    return aarch64_expand_cpymem_mops (operands);
+    return aarch64_expand_cpymem_mops (operands, is_memmove);
 
   unsigned HOST_WIDE_INT size = UINTVAL (operands[2]);
 
-  /* Try to inline up to 256 bytes.  */
-  unsigned max_copy_size = 256;
-  unsigned mops_threshold = aarch64_mops_memcpy_size_threshold;
+  /* Set inline limits for memmove/memcpy.  MOPS has a separate threshold.  */
+  unsigned max_copy_size = TARGET_SIMD ? 256 : 128;
+  unsigned mops_threshold = is_memmove ? aarch64_mops_memmove_size_threshold
+				       : aarch64_mops_memcpy_size_threshold;
+
+  /* Reduce the maximum size with -Os.  */
+  if (size_p)
+    max_copy_size /= 4;
 
   /* Large copies use MOPS when available or a library call.  */
   if (size > max_copy_size || (TARGET_MOPS && size > mops_threshold))
-    return aarch64_expand_cpymem_mops (operands);
+    return aarch64_expand_cpymem_mops (operands, is_memmove);
 
   int copy_bits = 256;
 
@@ -25290,23 +25282,20 @@ aarch64_expand_cpymem (rtx *operands)
 			  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
     copy_bits = 128;
 
-  /* Emit an inline load+store sequence and count the number of operations
-     involved.  We use a simple count of just the loads and stores emitted
-     rather than rtx_insn count as all the pointer adjustments and reg copying
-     in this function will get optimized away later in the pipeline.  */
-  start_sequence ();
-  unsigned nops = 0;
-
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
   base = copy_to_mode_reg (Pmode, XEXP (src, 0));
   src = adjust_automodify_address (src, VOIDmode, base, 0);
 
+  const int max_ops = 40;
+  rtx load[max_ops], store[max_ops];
+
   /* Convert size to bits to make the rest of the code simpler.  */
   int n = size * BITS_PER_UNIT;
+  int nops, offset;
 
-  while (n > 0)
+  for (nops = 0, offset = 0; n > 0; nops++)
     {
       /* Find the largest mode in which to do the copy in without over reading
 	 or writing.  */
@@ -25315,7 +25304,7 @@
       if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_bits))
 	cur_mode = mode_iter.require ();
 
-      gcc_assert (cur_mode != BLKmode);
+      gcc_assert (cur_mode != BLKmode && nops < max_ops);
 
       mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
 
@@ -25323,49 +25312,38 @@
       if (mode_bits == 128 && copy_bits == 256)
 	cur_mode = V4SImode;
 
-      aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode);
-      /* A single block copy is 1 load + 1 store.  */
-      nops += 2;
+      aarch64_copy_one_block (&load[nops], &store[nops], src, dst, offset, cur_mode);
 
       n -= mode_bits;
+      offset += mode_bits / BITS_PER_UNIT;
 
-      /* Emit trailing copies using overlapping unaligned accesses
-	 (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
-      if (n > 0 && n < copy_bits / 2 && !STRICT_ALIGNMENT)
+      /* Emit trailing copies using overlapping unaligned accesses -
	 this is smaller and faster.  */
+      if (n > 0 && n < copy_bits / 2)
	{
	  machine_mode next_mode = smallest_mode_for_size (n, MODE_INT);
	  int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
	  gcc_assert (n_bits <= mode_bits);
-	  src = aarch64_move_pointer (src, (n - n_bits) / BITS_PER_UNIT);
-	  dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
+	  offset -= (n_bits - n) / BITS_PER_UNIT;
	  n = n_bits;
	}
     }
 
-  rtx_insn *seq = get_insns ();
-  end_sequence ();
-
-  /* MOPS sequence requires 3 instructions for the memory copying + 1 to move
-     the constant size into a register.  */
-  unsigned mops_cost = 3 + 1;
-
-  /* If MOPS is available at this point we don't consider the libcall as it's
-     not a win even on code size.  At this point only consider MOPS if
-     optimizing for size.  For speed optimizations we will have chosen between
-     the two based on copy size already.  */
-  if (TARGET_MOPS)
-    {
-      if (size_p && mops_cost < nops)
-	return aarch64_expand_cpymem_mops (operands);
-      emit_insn (seq);
-      return true;
-    }
-
-  /* A memcpy libcall in the worst case takes 3 instructions to prepare the
-     arguments + 1 for the call.  When MOPS is not available and we're
-     optimizing for size a libcall may be preferable.  */
-  unsigned libcall_cost = 4;
-  if (size_p && libcall_cost < nops)
-    return false;
+  /* Memcpy interleaves loads with stores, memmove emits all loads first.  */
+  int i, j, m, inc;
+  inc = is_memmove ? nops : 3;
+  if (nops == inc + 1)
+    inc = nops / 2;
+  for (i = 0; i < nops; i += inc)
+    {
+      m = inc;
+      if (i + m > nops)
+	m = nops - i;
 
-  emit_insn (seq);
+      for (j = 0; j < m; j++)
+	emit_insn (load[i + j]);
+      for (j = 0; j < m; j++)
+	emit_insn (store[i + j]);
+    }
 
   return true;
 }
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 96508a2580876d1fdbdfa6c67d1a3d02608c1d24..d08598fcdb146dfe0f6283cf57088b224f695c9b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1629,7 +1629,7 @@ (define_expand "cpymemdi"
    (match_operand:DI 3 "immediate_operand")]
    ""
 {
-  if (aarch64_expand_cpymem (operands))
+  if (aarch64_expand_cpymem (operands, false))
     DONE;
   FAIL;
 }
@@ -1673,17 +1673,9 @@ (define_expand "movmemdi"
    (match_operand:BLK 1 "memory_operand")
    (match_operand:DI 2 "general_operand")
    (match_operand:DI 3 "immediate_operand")]
-  "TARGET_MOPS"
+  ""
 {
-  rtx sz_reg = operands[2];
-  /* For constant-sized memmoves check the threshold.
-     FIXME: We should add a non-MOPS memmove expansion for smaller,
-     constant-sized memmove to avoid going to a libcall.  */
-  if (CONST_INT_P (sz_reg)
-      && INTVAL (sz_reg) < aarch64_mops_memmove_size_threshold)
-    FAIL;
-
-  if (aarch64_expand_cpymem_mops (operands, true))
+  if (aarch64_expand_cpymem (operands, true))
     DONE;
   FAIL;
 }
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 4a0580435a8d3c92eca8936515026882c7ea7f48..305923751067ac14d228c5fd51bc24eeaca164dc 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -327,7 +327,7 @@ Target Joined UInteger Var(aarch64_mops_memcpy_size_threshold) Init(256) Param
 Constant memcpy size in bytes above which to start using MOPS sequence.
 
 -param=aarch64-mops-memmove-size-threshold=
-Target Joined UInteger Var(aarch64_mops_memmove_size_threshold) Init(0) Param
+Target Joined UInteger Var(aarch64_mops_memmove_size_threshold) Init(256) Param
 Constant memmove size in bytes above which to start using MOPS sequence.
 
 -param=aarch64-mops-memset-size-threshold=
diff --git a/gcc/testsuite/gcc.target/aarch64/memmove.c b/gcc/testsuite/gcc.target/aarch64/memmove.c
new file mode 100644
index 0000000000000000000000000000000000000000..6926a97761eb2578d3f1db7e6eb19dba17b888be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/memmove.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void
+copy1 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 12);
+}
+
+void
+copy2 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 128);
+}
+
+void
+copy3 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 255);
+}
+
+/* { dg-final { scan-assembler-not {\tb\tmemmove} } } */