From patchwork Thu Sep 21 14:24:44 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 76528
From: Wilco Dijkstra
To: Richard Earnshaw, GCC Patches
CC: Richard Sandiford
Subject: [PATCH v2] AArch64: Fix strict-align cpymem/setmem [PR103100]
Date: Thu, 21 Sep 2023 14:24:44 +0000
References: <9d14aa1a-91fb-cc6a-cc75-fb6c4447d7a2@arm.com>
In-Reply-To: <9d14aa1a-91fb-cc6a-cc75-fb6c4447d7a2@arm.com>
List-Id: Gcc-patches mailing list

v2: Use UINTVAL, rename max_mops_size.

The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
Clean up the condition when to use MOPS.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
        PR target/103100
        * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
        (setmemdi): Likewise.
        * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support
        strict-align.  Cleanup condition for using MOPS.
        (aarch64_expand_setmem): Likewise.

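For reviewers, the early exits this patch puts at the top of
aarch64_expand_cpymem can be modelled outside the compiler roughly as
below.  This is only a sketch: struct target_model, enum strategy and
choose_cpymem_strategy are hypothetical names invented for illustration,
the fields stand in for STRICT_ALIGNMENT, TARGET_MOPS and
aarch64_mops_memcpy_size_threshold, and the real function goes on to make
further cost decisions that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the compiler state consulted by the patch.  */
struct target_model
{
  bool strict_alignment;    /* Models STRICT_ALIGNMENT.  */
  bool have_mops;           /* Models TARGET_MOPS.  */
  unsigned mops_threshold;  /* Models aarch64_mops_memcpy_size_threshold.  */
};

enum strategy
{
  USE_MOPS_OR_LIBCALL,      /* Hand over to aarch64_expand_cpymem_mops.  */
  USE_INLINE_SEQUENCE       /* Continue with the inline expansion.  */
};

/* Model of the early exits only; SIZE is meaningful only when
   SIZE_IS_CONSTANT is true, and ALIGN is the value of operands[3].  */
static enum strategy
choose_cpymem_strategy (const struct target_model *t, bool size_is_constant,
                        unsigned long long size, unsigned align)
{
  assert (t != NULL);

  /* Variable-sized or strict-align copies skip the inline expansion.  */
  if (!size_is_constant || (t->strict_alignment && align < 16))
    return USE_MOPS_OR_LIBCALL;

  /* Inline up to 256 bytes, or less if MOPS has a lower threshold.  */
  const unsigned max_copy_size = 256;
  if (size > max_copy_size || (t->have_mops && size > t->mops_threshold))
    return USE_MOPS_OR_LIBCALL;

  return USE_INLINE_SEQUENCE;
}
```

Under this model, a constant 64-byte copy with alignment 8 is no longer
expanded inline when strict alignment is in force, while the same copy
with alignment 16 still is.  The setmem path has the same shape, with the
extra !TARGET_SIMD check in the first condition.
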
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index dd6874d13a75f20d10a244578afc355b25c73da2..8a12894d6b80de1031d6e7d02dca680c57bce136 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25261,27 +25261,23 @@ aarch64_expand_cpymem (rtx *operands)
   int mode_bits;
   rtx dst = operands[0];
   rtx src = operands[1];
+  unsigned align = UINTVAL (operands[3]);
   rtx base;
   machine_mode cur_mode = BLKmode;
+  bool size_p = optimize_function_for_size_p (cfun);
 
-  /* Variable-sized memcpy can go through the MOPS expansion if available.  */
-  if (!CONST_INT_P (operands[2]))
+  /* Variable-sized or strict-align copies may use the MOPS expansion.  */
+  if (!CONST_INT_P (operands[2]) || (STRICT_ALIGNMENT && align < 16))
     return aarch64_expand_cpymem_mops (operands);
 
-  unsigned HOST_WIDE_INT size = INTVAL (operands[2]);
-
-  /* Try to inline up to 256 bytes or use the MOPS threshold if available.  */
-  unsigned HOST_WIDE_INT max_copy_size
-    = TARGET_MOPS ? aarch64_mops_memcpy_size_threshold : 256;
+  unsigned HOST_WIDE_INT size = UINTVAL (operands[2]);
 
-  bool size_p = optimize_function_for_size_p (cfun);
+  /* Try to inline up to 256 bytes.  */
+  unsigned max_copy_size = 256;
+  unsigned mops_threshold = aarch64_mops_memcpy_size_threshold;
 
-  /* Large constant-sized cpymem should go through MOPS when possible.
-     It should be a win even for size optimization in the general case.
-     For speed optimization the choice between MOPS and the SIMD sequence
-     depends on the size of the copy, rather than number of instructions,
-     alignment etc.  */
-  if (size > max_copy_size)
+  /* Large copies use MOPS when available or a library call.  */
+  if (size > max_copy_size || (TARGET_MOPS && size > mops_threshold))
     return aarch64_expand_cpymem_mops (operands);
 
   int copy_bits = 256;
@@ -25445,12 +25441,13 @@ aarch64_expand_setmem (rtx *operands)
   unsigned HOST_WIDE_INT len;
   rtx dst = operands[0];
   rtx val = operands[2], src;
+  unsigned align = UINTVAL (operands[3]);
   rtx base;
   machine_mode cur_mode = BLKmode, next_mode;
 
-  /* If we don't have SIMD registers or the size is variable use the MOPS
-     inlined sequence if possible.  */
-  if (!CONST_INT_P (operands[1]) || !TARGET_SIMD)
+  /* Variable-sized or strict-align memset may use the MOPS expansion.  */
+  if (!CONST_INT_P (operands[1]) || !TARGET_SIMD
+      || (STRICT_ALIGNMENT && align < 16))
     return aarch64_expand_setmem_mops (operands);
 
   bool size_p = optimize_function_for_size_p (cfun);
@@ -25458,10 +25455,13 @@ aarch64_expand_setmem (rtx *operands)
   /* Default the maximum to 256-bytes when considering only libcall vs
      SIMD broadcast sequence.  */
   unsigned max_set_size = 256;
+  unsigned mops_threshold = aarch64_mops_memset_size_threshold;
 
-  len = INTVAL (operands[1]);
-  if (len > max_set_size && !TARGET_MOPS)
-    return false;
+  len = UINTVAL (operands[1]);
+
+  /* Large memset uses MOPS when available or a library call.  */
+  if (len > max_set_size || (TARGET_MOPS && len > mops_threshold))
+    return aarch64_expand_setmem_mops (operands);
 
   int cst_val = !!(CONST_INT_P (val) && (INTVAL (val) != 0));
   /* The MOPS sequence takes:
@@ -25474,12 +25474,6 @@ aarch64_expand_setmem (rtx *operands)
      the arguments + 1 for the call.  */
   unsigned libcall_cost = 4;
 
-  /* Upper bound check.  For large constant-sized setmem use the MOPS sequence
-     when available.  */
-  if (TARGET_MOPS
-      && len >= (unsigned HOST_WIDE_INT) aarch64_mops_memset_size_threshold)
-    return aarch64_expand_setmem_mops (operands);
-
   /* Attempt a sequence with a vector broadcast followed by stores.
      Count the number of operations involved to see if it's worth it
      against the alternatives.  A simple counter simd_ops on the
@@ -25521,10 +25515,8 @@ aarch64_expand_setmem (rtx *operands)
       simd_ops++;
       n -= mode_bits;
 
-      /* Do certain trailing copies as overlapping if it's going to be
-         cheaper.  i.e. less instructions to do so.  For instance doing a 15
-         byte copy it's more efficient to do two overlapping 8 byte copies than
-         8 + 4 + 2 + 1.  Only do this when -mstrict-align is not supplied.  */
+      /* Emit trailing writes using overlapping unaligned accesses
+         (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
       if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
         {
           next_mode = smallest_mode_for_size (n, MODE_INT);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6d0f072a9dd6d094e8764a513222a9129d8296fa..96508a2580876d1fdbdfa6c67d1a3d02608c1d24 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1627,7 +1627,7 @@ (define_expand "cpymemdi"
    (match_operand:BLK 1 "memory_operand")
    (match_operand:DI 2 "general_operand")
    (match_operand:DI 3 "immediate_operand")]
-   "!STRICT_ALIGNMENT || TARGET_MOPS"
+   ""
 {
   if (aarch64_expand_cpymem (operands))
     DONE;
@@ -1724,7 +1724,7 @@ (define_expand "setmemdi"
         (match_operand:QI 2 "nonmemory_operand")) ;; Value
    (use (match_operand:DI 1 "general_operand")) ;; Length
    (match_operand 3 "immediate_operand")] ;; Align
- "TARGET_SIMD || TARGET_MOPS"
+ ""
 {
   if (aarch64_expand_setmem (operands))
     DONE;