From patchwork Mon Aug 9 13:13:30 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 44611
X-Patchwork-Delegate: szabolcs.nagy@arm.com
To: "naohirot@fujitsu.com"
Subject: [PATCH v4 4/5] AArch64: Improve A64FX memset by removing unroll32
Date: Mon, 9 Aug 2021 13:13:30 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

v4: no changes

Remove unroll32 code since it doesn't improve performance.

Reviewed-by: Naohiro Tamura

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 55f28b644defdffb140c88da0635ef099235546c..89dba912588c243e67a9527a56b4d3a44659d542 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -102,22 +102,6 @@ L(vl_agnostic): // VL Agnostic
 	ccmp	vector_length, tmp1, 0, cs
 	b.eq	L(L1_prefetch)

-L(unroll32):
-	lsl	tmp1, vector_length, 3	// vector_length * 8
-	lsl	tmp2, vector_length, 5	// vector_length * 32
-	.p2align 3
-1:	cmp	rest, tmp2
-	b.cc	L(unroll8)
-	st1b_unroll
-	add	dst, dst, tmp1
-	st1b_unroll
-	add	dst, dst, tmp1
-	st1b_unroll
-	add	dst, dst, tmp1
-	st1b_unroll
-	add	dst, dst, tmp1
-	sub	rest, rest, tmp2
-	b	1b

 L(unroll8):
 	lsl	tmp1, vector_length, 3
@@ -155,7 +139,7 @@ L(L1_prefetch): // if rest >= L1_SIZE
 	sub	rest, rest, CACHE_LINE_SIZE * 2
 	cmp	rest, L1_SIZE
 	b.ge	1b
-	cbnz	rest, L(unroll32)
+	cbnz	rest, L(unroll8)
 	ret

 	// count >= L2_SIZE