From patchwork Mon Aug 9 13:15:45 2021
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44612
X-Patchwork-Delegate: szabolcs.nagy@arm.com
To: "naohirot@fujitsu.com"
Subject: [PATCH v4 5/5] AArch64: Improve A64FX memset medium loops
Date: Mon, 9 Aug 2021 13:15:45 +0000
From: Wilco Dijkstra
Cc: 'GNU C Library'

v4: minor loop change

Simplify the code for memsets smaller than L1. Improve the unroll8 and
L1_prefetch loops.

Reviewed-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 89dba912588c243e67a9527a56b4d3a44659d542..318c6350a31e0fad788b5f2139de645ddc51493f 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -30,7 +30,6 @@
 #define L2_SIZE (8*1024*1024) // L2 8MB
 #define CACHE_LINE_SIZE 256
 #define PF_DIST_L1 (CACHE_LINE_SIZE * 16) // Prefetch distance L1
-#define rest x2
 #define vector_length x9
 
 #if HAVE_AARCH64_SVE_ASM
@@ -89,29 +88,19 @@ ENTRY (MEMSET)
 
 	.p2align 4
 L(vl_agnostic): // VL Agnostic
-	mov	rest, count
 	mov	dst, dstin
-	add	dstend, dstin, count
-	// if rest >= L2_SIZE && vector_length == 64 then L(L2)
-	mov	tmp1, 64
-	cmp	rest, L2_SIZE
-	ccmp	vector_length, tmp1, 0, cs
-	b.eq	L(L2)
-	// if rest >= L1_SIZE && vector_length == 64 then L(L1_prefetch)
-	cmp	rest, L1_SIZE
-	ccmp	vector_length, tmp1, 0, cs
-	b.eq	L(L1_prefetch)
-
+	cmp	count, L1_SIZE
+	b.hi	L(L1_prefetch)
 
+	// count >= 8 * vector_length
 L(unroll8):
-	lsl	tmp1, vector_length, 3
-	.p2align 3
-1:	cmp	rest, tmp1
-	b.cc	L(last)
-	st1b_unroll
+	sub	count, count, tmp1
+	.p2align 4
+1:	st1b_unroll 0, 7
 	add	dst, dst, tmp1
-	sub	rest, rest, tmp1
-	b	1b
+	subs	count, count, tmp1
+	b.hi	1b
+	add	count, count, tmp1
 
 L(last):
 	cmp	count, vector_length, lsl 1
@@ -129,18 +118,22 @@ L(last):
 	st1b	z0.b, p0, [dstend, -1, mul vl]
 	ret
 
-L(L1_prefetch): // if rest >= L1_SIZE
+	// count >= L1_SIZE
 	.p2align 3
+L(L1_prefetch):
+	cmp	count, L2_SIZE
+	b.hs	L(L2)
+	cmp	vector_length, 64
+	b.ne	L(unroll8)
 1:	st1b_unroll 0, 3
 	prfm	pstl1keep, [dst, PF_DIST_L1]
 	st1b_unroll 4, 7
 	prfm	pstl1keep, [dst, PF_DIST_L1 + CACHE_LINE_SIZE]
 	add	dst, dst, CACHE_LINE_SIZE * 2
-	sub	rest, rest, CACHE_LINE_SIZE * 2
-	cmp	rest, L1_SIZE
-	b.ge	1b
-	cbnz	rest, L(unroll8)
-	ret
+	sub	count, count, CACHE_LINE_SIZE * 2
+	cmp	count, PF_DIST_L1
+	b.hs	1b
+	b	L(unroll8)
 
 	// count >= L2_SIZE
 	.p2align 3
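The C model below is a sketch of the control flow that the patched L(vl_agnostic), L(unroll8) and L(L1_prefetch) paths implement. It is an illustration, not glibc code, and it rests on several assumptions:

- model_vl_agnostic and struct memset_trace are hypothetical names.
- memset stands in for the st1b_unroll stores, and the prfm pstl1keep hints are dropped.
- L(L2) is not modelled. A plain memset stands in for L(last).
- L1_SIZE is taken as 64 KiB, since its definition is outside this diff.
- tmp1 already holds 8 * vector_length on entry, because the patch removes the lsl that used to compute it inside L(unroll8).
- count >= 8 * vector_length on entry, as the new comment above L(unroll8) says.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Constants mirroring memset_a64fx.S.  L1_SIZE is not visible in this
   diff; 64 KiB is an assumption of this sketch.  */
#define L1_SIZE (64 * 1024)
#define L2_SIZE (8 * 1024 * 1024)
#define CACHE_LINE_SIZE 256
#define PF_DIST_L1 (CACHE_LINE_SIZE * 16)

/* Hypothetical bookkeeping: how many bytes each path writes.  */
struct memset_trace
{
  size_t l1_prefetch_bytes; /* written by the L(L1_prefetch) loop */
  size_t unroll8_bytes;     /* written by the L(unroll8) loop */
  size_t tail_bytes;        /* left over for L(last) */
  int took_l2;              /* 1 if dispatched to L(L2) */
};

/* Hypothetical C model of the patched L(vl_agnostic) path.  vl is the
   SVE vector length in bytes; memset stands in for the st1b stores.  */
static struct memset_trace
model_vl_agnostic (unsigned char *dst, int c, size_t count, size_t vl)
{
  struct memset_trace t = { 0, 0, 0, 0 };
  const size_t tmp1 = 8 * vl;        /* assumed set before L(vl_agnostic) */
  assert (count >= tmp1);            /* "count >= 8 * vector_length" */

  if (count > L1_SIZE)               /* cmp count, L1_SIZE; b.hi */
    {
      if (count >= L2_SIZE)          /* cmp count, L2_SIZE; b.hs L(L2) */
        {
          t.took_l2 = 1;             /* L(L2) is not modelled here */
          return t;
        }
      if (vl == 64)                  /* cmp vector_length, 64; b.ne */
        {
          do
            {
              /* st1b_unroll 0, 3 and 4, 7: two cache lines per pass.  */
              memset (dst, c, CACHE_LINE_SIZE * 2);
              dst += CACHE_LINE_SIZE * 2;
              count -= CACHE_LINE_SIZE * 2;
              t.l1_prefetch_bytes += CACHE_LINE_SIZE * 2;
            }
          while (count >= PF_DIST_L1); /* cmp count, PF_DIST_L1; b.hs 1b */
        }
    }

  /* L(unroll8): sub count, count, tmp1 biases the count by one block so
     the loop needs only subs + b.hi at the bottom.  */
  size_t rest = count - tmp1;
  for (;;)
    {
      memset (dst, c, tmp1);         /* st1b_unroll 0, 7 */
      dst += tmp1;
      t.unroll8_bytes += tmp1;
      if (rest <= tmp1)              /* b.hi 1b not taken */
        break;
      rest -= tmp1;
    }
  /* add count, count, tmp1 undoes the final subs: at most tmp1 bytes
     remain.  A plain memset stands in for L(last).  */
  memset (dst, c, rest);
  t.tail_bytes = rest;
  return t;
}
```

As an example, take count = 100000 and vl = 64. The prefetch loop writes 512-byte chunks until fewer than PF_DIST_L1 bytes remain. L(unroll8) then finishes in 512-byte blocks and leaves the last 160 bytes for L(last).

Pre-biasing count is what lets each unroll8 iteration end in a single subs/b.hi pair. The old loop needed a cmp/b.cc at the top plus a sub/b at the bottom.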