From patchwork Thu Jul 22 16:02:08 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44459
To: "naohirot@fujitsu.com"
Subject: [PATCH v3 3/5] AArch64: Improve A64FX memset
Date: Thu, 22 Jul 2021 16:02:08 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

Simplify handling of remaining bytes.  Avoid lots of taken branches and
complex whilelo computations; instead, unconditionally write vectors from
the end.

Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 608e0e2e2ff5259178e2fdadf1eea8816194d879..fce257fa68120c2b101f29b438c397e10b4c275e 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -130,38 +130,19 @@ L(unroll8):
 	b	1b
 
 L(last):
-	whilelo	p0.b, xzr, rest
-	whilelo	p1.b, vector_length, rest
-	b.last	1f
-	st1b	z0.b, p0, [dst, #0, mul vl]
-	st1b	z0.b, p1, [dst, #1, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 1	// vector_length * 2
-	whilelo	p2.b, tmp1, rest
-	incb	tmp1
-	whilelo	p3.b, tmp1, rest
-	b.last	1f
-	st1b	z0.b, p0, [dst, #0, mul vl]
-	st1b	z0.b, p1, [dst, #1, mul vl]
-	st1b	z0.b, p2, [dst, #2, mul vl]
-	st1b	z0.b, p3, [dst, #3, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 2	// vector_length * 4
-	whilelo	p4.b, tmp1, rest
-	incb	tmp1
-	whilelo	p5.b, tmp1, rest
-	incb	tmp1
-	whilelo	p6.b, tmp1, rest
-	incb	tmp1
-	whilelo	p7.b, tmp1, rest
-	st1b	z0.b, p0, [dst, #0, mul vl]
-	st1b	z0.b, p1, [dst, #1, mul vl]
-	st1b	z0.b, p2, [dst, #2, mul vl]
-	st1b	z0.b, p3, [dst, #3, mul vl]
-	st1b	z0.b, p4, [dst, #4, mul vl]
-	st1b	z0.b, p5, [dst, #5, mul vl]
-	st1b	z0.b, p6, [dst, #6, mul vl]
-	st1b	z0.b, p7, [dst, #7, mul vl]
+	cmp	count, vector_length, lsl 1
+	b.ls	2f
+	add	tmp2, vector_length, vector_length, lsl 2
+	cmp	count, tmp2
+	b.ls	5f
+	st1b	z0.b, p0, [dstend, -8, mul vl]
+	st1b	z0.b, p0, [dstend, -7, mul vl]
+	st1b	z0.b, p0, [dstend, -6, mul vl]
+5:	st1b	z0.b, p0, [dstend, -5, mul vl]
+	st1b	z0.b, p0, [dstend, -4, mul vl]
+	st1b	z0.b, p0, [dstend, -3, mul vl]
+2:	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
 	ret
 
 L(L1_prefetch): // if rest >= L1_SIZE
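
Note on the new L(last) sequence: the control flow can be modelled in scalar C. This is only a sketch and is not part of the patch. It assumes a fixed 64-byte vector (the A64FX SVE width), that p0 is an all-true predicate, and that at least 8 vectors' worth of bytes before dstend belong to the region being set, so the stores that reach back simply rewrite bytes already covered. The names store_vec, set_tail, model_memset and check are hypothetical, as is the bulk loop in model_memset, which stands in for the unrolled loop that precedes L(last).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { VL = 64 };  /* assumed vector length in bytes (512-bit SVE) */

/* Models: st1b z0.b, p0, [base, idx, mul vl] with an all-true p0. */
static void store_vec(unsigned char *base, int idx, unsigned char c)
{
    memset(base + (ptrdiff_t)idx * VL, c, VL);
}

/* Models the new L(last): write 2, 5 or 8 whole vectors ending at dstend.
   count is the number of bytes still unset (count < 8 * VL). */
static void set_tail(unsigned char *dstend, size_t count, unsigned char c)
{
    if (count > 2 * VL) {            /* cmp count, vl, lsl 1 ; b.ls 2f */
        if (count > 5 * VL) {        /* cmp count, 5 * vl    ; b.ls 5f */
            store_vec(dstend, -8, c);
            store_vec(dstend, -7, c);
            store_vec(dstend, -6, c);
        }
        store_vec(dstend, -5, c);    /* label 5: */
        store_vec(dstend, -4, c);
        store_vec(dstend, -3, c);
    }
    store_vec(dstend, -2, c);        /* label 2: */
    store_vec(dstend, -1, c);
}

/* Hypothetical driver: bulk loop of 8 vectors per iteration, then the tail.
   Requires n >= 8 * VL so the tail never writes before dst. */
static void model_memset(unsigned char *dst, unsigned char c, size_t n)
{
    unsigned char *dstend = dst + n;
    size_t count = n;
    while (count >= 8 * VL) {
        for (int i = 0; i < 8; i++)
            store_vec(dst, i, c);
        dst += 8 * VL;
        count -= 8 * VL;
    }
    set_tail(dstend, count, c);
}

/* Returns 1 if exactly n bytes were set and the guard bytes are intact. */
static int check(size_t n)
{
    enum { GUARD = 16, MAX = 32 * VL };
    static unsigned char buf[GUARD + MAX + GUARD];
    if (n < 8 * VL || n > MAX)
        return 0;                    /* outside the sketch's precondition */
    memset(buf, 0xAA, sizeof buf);
    model_memset(buf + GUARD, 0x55, n);
    for (size_t i = 0; i < sizeof buf; i++) {
        unsigned char want = (i >= GUARD && i < GUARD + n) ? 0x55 : 0xAA;
        if (buf[i] != want)
            return 0;
    }
    return 1;
}
```

The tail never computes a predicate from the remaining length. It picks one of three entry points and lets the overlapping full-width stores cover whatever is left, which is what replaces the three whilelo/b.last stages of the old code.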