From patchwork Mon Aug 9 13:07:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44609
X-Patchwork-Delegate: szabolcs.nagy@arm.com
To: "naohirot@fujitsu.com"
Subject: [PATCH v4 1/5] AArch64: Improve A64FX memset for small sizes
Date: Mon, 9 Aug 2021 13:07:02 +0000
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

v4: Don't remove ZF_DIST yet

Improve performance of small memsets by reducing instruction counts and
improving alignment. Bench-memset shows 35-45% performance gain for small
sizes.

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index ce54e5418b08c8bc0ecc7affff68a59272ba6397..cf3d402ef681a9d98964d1751537945692a1ae68 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -51,78 +51,54 @@
 	.endm
 
 	.macro st1b_unroll first=0, last=7
-	st1b	z0.b, p0, [dst, #\first, mul vl]
+	st1b	z0.b, p0, [dst, \first, mul vl]
 	.if \last-\first
 	st1b_unroll "(\first+1)", \last
 	.endif
 	.endm
 
-	.macro	shortcut_for_small_size exit
-	// if rest <= vector_length * 2
-	whilelo	p0.b, xzr, count
-	whilelo	p1.b, vector_length, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	ret
-1:	// if rest > vector_length * 8
-	cmp	count, vector_length, lsl 3	// vector_length * 8
-	b.hi	\exit
-	// if rest <= vector_length * 4
-	lsl	tmp1, vector_length, 1	// vector_length * 2
-	whilelo	p2.b, tmp1, count
-	incb	tmp1
-	whilelo	p3.b, tmp1, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	ret
-1:	// if rest <= vector_length * 8
-	lsl	tmp1, vector_length, 2	// vector_length * 4
-	whilelo	p4.b, tmp1, count
-	incb	tmp1
-	whilelo	p5.b, tmp1, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	st1b	z0.b, p4, [dstin, #4, mul vl]
-	st1b	z0.b, p5, [dstin, #5, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 2	// vector_length * 4
-	incb	tmp1			// vector_length * 5
-	incb	tmp1			// vector_length * 6
-	whilelo	p6.b, tmp1, count
-	incb	tmp1
-	whilelo	p7.b, tmp1, count
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	st1b	z0.b, p4, [dstin, #4, mul vl]
-	st1b	z0.b, p5, [dstin, #5, mul vl]
-	st1b	z0.b, p6, [dstin, #6, mul vl]
-	st1b	z0.b, p7, [dstin, #7, mul vl]
-	ret
-	.endm
 
-ENTRY (MEMSET)
+#undef BTI_C
+#define BTI_C
+ENTRY (MEMSET)
 
 	PTR_ARG (0)
 	SIZE_ARG (2)
 
-	cbnz	count, 1f
-	ret
-1:	dup	z0.b, valw
 	cntb	vector_length
-	// shortcut for less than vector_length * 8
-	// gives a free ptrue to p0.b for n >= vector_length
-	shortcut_for_small_size L(vl_agnostic)
-	// end of shortcut
+	dup	z0.b, valw
+	whilelo	p0.b, vector_length, count
+	b.last	1f
+	whilelo	p1.b, xzr, count
+	st1b	z0.b, p1, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	ret
+
+	// count >= vector_length * 2
+1:	cmp	count, vector_length, lsl 2
+	add	dstend, dstin, count
+	b.hi	1f
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
+	ret
+
+	// count > vector_length * 4
+1:	lsl	tmp1, vector_length, 3
+	cmp	count, tmp1
+	b.hi	L(vl_agnostic)
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	st1b	z0.b, p0, [dstin, 2, mul vl]
+	st1b	z0.b, p0, [dstin, 3, mul vl]
+	st1b	z0.b, p0, [dstend, -4, mul vl]
+	st1b	z0.b, p0, [dstend, -3, mul vl]
+	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
+	ret
 
+	.p2align 4
 L(vl_agnostic): // VL Agnostic
 	mov	rest, count
 	mov	dst, dstin