From patchwork Wed Apr 2 14:13:57 2025
X-Patchwork-Submitter: Jovan Dmitrovic
X-Patchwork-Id: 109704
From: Jovan Dmitrovic
To: "newlib@sourceware.org"
CC: Djordje Todorovic, Jeff Johnston, Faraz Shahbazker
Subject: [PATCH v2 09/13] libc: mips: memcpy prefetches beyond copied memory
Date: Wed, 2 Apr 2025 14:13:57 +0000
Message-ID: <20250402141228.1973965-10-jovan.dmitrovic@htecgroup.com>
In-Reply-To: <20250402141228.1973965-1-jovan.dmitrovic@htecgroup.com>

From: Faraz Shahbazker

Fix prefetching in core loop to avoid exceeding the operated upon
memory region. Revert accidentally changed prefetch-hint back to
streaming mode.
Refactor various bits and provide pre-processor checks to allow
parameters to be overridden from compiler command line.
---
 newlib/libc/machine/mips/memcpy.c | 150 +++++++++++++++++++-----------
 1 file changed, 97 insertions(+), 53 deletions(-)

diff --git a/newlib/libc/machine/mips/memcpy.c b/newlib/libc/machine/mips/memcpy.c
index 2d5031814..03ef299b5 100644
--- a/newlib/libc/machine/mips/memcpy.c
+++ b/newlib/libc/machine/mips/memcpy.c
@@ -27,7 +27,9 @@
  */
 
 /* Typical observed latency in cycles in fetching from DRAM. */
-#define LATENCY_CYCLES 63
+#ifndef LATENCY_CYCLES
+  #define LATENCY_CYCLES 63
+#endif
 
 /* Pre-fetch performance is subject to accurate prefetch ahead,
    which in turn depends on both the cache-line size and the amount
@@ -44,30 +46,42 @@
   #define LATENCY_CYCLES 150
 #elif defined(_MIPS_TUNE_I6400) || defined(_MIPS_TUNE_I6500)
   #define CACHE_LINE 64
-  #define BLOCK_CYCLES 16
+  #define BLOCK_CYCLES 15
 #elif defined(_MIPS_TUNE_P6600)
   #define CACHE_LINE 32
-  #define BLOCK_CYCLES 12
-#elif defined(_MIPS_TUNE_INTERAPTIV) || defined(_MIPS_TUNE_INTERAPTIV_MR2)
+  #define BLOCK_CYCLES 15
+#elif defined(_MIPS_TUNE_INTERAPTIV) || defined(_MIPS_TUNE_INTERAPTIV_MR2)
   #define CACHE_LINE 32
   #define BLOCK_CYCLES 30
 #else
-  #define CACHE_LINE 32
-  #define BLOCK_CYCLES 11
+  #ifndef CACHE_LINE
+    #define CACHE_LINE 32
+  #endif
+  #ifndef BLOCK_CYCLES
+    #ifdef __nanomips__
+      #define BLOCK_CYCLES 20
+    #else
+      #define BLOCK_CYCLES 11
+    #endif
+  #endif
 #endif
 
 /* Pre-fetch look ahead = ceil (latency / block-cycles) */
 #define PREF_AHEAD (LATENCY_CYCLES / BLOCK_CYCLES \
                     + ((LATENCY_CYCLES % BLOCK_CYCLES) == 0 ? 0 : 1))
 
-/* Unroll-factor, controls how many words at a time in the core loop. */
-#define BLOCK (CACHE_LINE == 128 ? 16 : 8)
+/* The unroll-factor controls how many words at a time in the core loop. */
+#ifndef BLOCK_SIZE
+  #define BLOCK_SIZE (CACHE_LINE == 128 ? 16 : 8)
+#elif BLOCK_SIZE != 8 && BLOCK_SIZE != 16
+  #error "BLOCK_SIZE must be 8 or 16"
+#endif
 
 #define __overloadable
-#ifndef UNALIGNED_INSTR_SUPPORT
+#if !defined(UNALIGNED_INSTR_SUPPORT)
 /* does target have unaligned lw/ld/ualw/uald instructions? */
   #define UNALIGNED_INSTR_SUPPORT 0
-  #if (__mips_isa_rev < 6 && !__mips1)
+#if (__mips_isa_rev < 6 && !defined(__mips1)) || defined(__nanomips__)
     #undef UNALIGNED_INSTR_SUPPORT
     #define UNALIGNED_INSTR_SUPPORT 1
   #endif
@@ -75,17 +89,35 @@
 #endif
 #if !defined(HW_UNALIGNED_SUPPORT)
 /* Does target have hardware support for unaligned accesses? */
   #define HW_UNALIGNED_SUPPORT 0
-  #if __mips_isa_rev >= 6
+  #if __mips_isa_rev >= 6 && !defined(__nanomips__)
     #undef HW_UNALIGNED_SUPPORT
     #define HW_UNALIGNED_SUPPORT 1
   #endif
 #endif
-#define ENABLE_PREFETCH 1
+
+#ifndef ENABLE_PREFETCH
+  #define ENABLE_PREFETCH 1
+#endif
+
+#ifndef ENABLE_PREFETCH_CHECK
+  #define ENABLE_PREFETCH_CHECK 0
+#endif
+
 #if ENABLE_PREFETCH
-  #define PREFETCH(addr) __builtin_prefetch (addr, 0, 0)
-#else
+  #if ENABLE_PREFETCH_CHECK
+#include
+static char *limit;
+#define PREFETCH(addr) \
+  do { \
+    assert ((char *)(addr) < limit); \
+    __builtin_prefetch ((addr), 0, 1); \
+  } while (0)
+#else /* ENABLE_PREFETCH_CHECK */
+  #define PREFETCH(addr) __builtin_prefetch (addr, 0, 1)
+  #endif /* ENABLE_PREFETCH_CHECK */
+#else /* ENABLE_PREFETCH */
   #define PREFETCH(addr)
-#endif
+#endif /* ENABLE_PREFETCH */
 
 #include
@@ -95,17 +127,18 @@ typedef struct
 {
   reg_t B0:8, B1:8, B2:8, B3:8, B4:8, B5:8, B6:8, B7:8;
 } bits_t;
-#else
+#else /* __mips64 */
 typedef unsigned long reg_t;
 typedef struct
 {
   reg_t B0:8, B1:8, B2:8, B3:8;
 } bits_t;
-#endif
+#endif /* __mips64 */
 
-#define CACHE_LINES_PER_BLOCK ((BLOCK * sizeof (reg_t) > CACHE_LINE) ? \
-                               (BLOCK * sizeof (reg_t) / CACHE_LINE) \
-                               : 1)
+#define CACHE_LINES_PER_BLOCK \
+  ((BLOCK_SIZE * sizeof (reg_t) > CACHE_LINE) \
+   ? (BLOCK_SIZE * sizeof (reg_t) / CACHE_LINE) \
+   : 1)
 
 typedef union
 {
@@ -116,7 +149,7 @@ typedef union
 #define DO_BYTE(a, i) \
   a[i] = bw.b.B##i; \
   len--; \
-  if(!len) return ret; \
+  if (!len) return ret; \
 
 /* This code is called when aligning a pointer, there are remaining bytes
    after doing word compares, or architecture does not have some form
@@ -144,7 +177,7 @@ do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret)
 {
   unsigned char *x = (unsigned char *) a;
   bitfields_t bw;
-  if(len > 0)
+  if (len > 0)
     {
       bw.v = *(reg_t *)b;
       DO_BYTE(x, 0);
@@ -155,7 +188,7 @@ do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret)
       DO_BYTE(x, 4);
       DO_BYTE(x, 5);
       DO_BYTE(x, 6);
-#endif
+#endif /* __mips64 */
     }
   return ret;
 }
@@ -166,7 +199,7 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
 {
   /* Use a set-back so that load/stores have incremented addresses in
      order to promote bonding. */
-  int off = (BLOCK - words);
+  int off = (BLOCK_SIZE - words);
   a -= off;
   b -= off;
   switch (off)
@@ -178,7 +211,7 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
       case 5: a[5] = b[5];
       case 6: a[6] = b[6];
       case 7: a[7] = b[7];
-#if BLOCK==16
+#if BLOCK_SIZE==16
       case 8: a[8] = b[8];
       case 9: a[9] = b[9];
       case 10: a[10] = b[10];
@@ -187,9 +220,9 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
       case 13: a[13] = b[13];
       case 14: a[14] = b[14];
       case 15: a[15] = b[15];
-#endif
+#endif /* BLOCK_SIZE==16 */
     }
-  return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret);
+  return do_bytes_remaining (a + BLOCK_SIZE, b + BLOCK_SIZE, bytes, ret);
 }
 
 #if !HW_UNALIGNED_SUPPORT
@@ -206,7 +239,7 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
 {
   /* Use a set-back so that load/stores have incremented addresses in
      order to promote bonding. */
-  int off = (BLOCK - words);
+  int off = (BLOCK_SIZE - words);
   a -= off;
   b -= off;
   switch (off)
@@ -218,7 +251,7 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
       case 5: a[5].uli = b[5];
       case 6: a[6].uli = b[6];
       case 7: a[7].uli = b[7];
-#if BLOCK==16
+#if BLOCK_SIZE==16
       case 8: a[8].uli = b[8];
       case 9: a[9].uli = b[9];
       case 10: a[10].uli = b[10];
@@ -227,9 +260,9 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
       case 13: a[13].uli = b[13];
       case 14: a[14].uli = b[14];
       case 15: a[15].uli = b[15];
-#endif
+#endif /* BLOCK_SIZE==16 */
     }
-  return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret);
+  return do_bytes_remaining (a + BLOCK_SIZE, b + BLOCK_SIZE, bytes, ret);
 }
 
 /* The first pointer is not aligned while second pointer is. */
@@ -238,13 +271,19 @@ unaligned_words (struct ulw *a, const reg_t * b,
                  unsigned long words, unsigned long bytes, void *ret)
 {
   unsigned long i, words_by_block, words_by_1;
-  words_by_1 = words % BLOCK;
-  words_by_block = words / BLOCK;
+  words_by_1 = words % BLOCK_SIZE;
+  words_by_block = words / BLOCK_SIZE;
+
   for (; words_by_block > 0; words_by_block--)
     {
-      if (words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK)
+      /* This condition is deliberately conservative. One could theoretically
+         pre-fetch another time around in some cases without crossing the page
+         boundary at the limit, but checking for the right conditions here is
+         too expensive to be worth it. */
+      if (words_by_block > PREF_AHEAD)
         for (i = 0; i < CACHE_LINES_PER_BLOCK; i++)
-          PREFETCH (b + (BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i));
+          PREFETCH (b + ((BLOCK_SIZE / CACHE_LINES_PER_BLOCK)
+                         * (PREF_AHEAD + i)));
 
       reg_t y0 = b[0], y1 = b[1], y2 = b[2], y3 = b[3];
       reg_t y4 = b[4], y5 = b[5], y6 = b[6], y7 = b[7];
@@ -256,7 +295,7 @@ unaligned_words (struct ulw *a, const reg_t * b,
       a[5].uli = y5;
       a[6].uli = y6;
       a[7].uli = y7;
-#if BLOCK==16
+#if BLOCK_SIZE==16
       y0 = b[8], y1 = b[9], y2 = b[10], y3 = b[11];
       y4 = b[12], y5 = b[13], y6 = b[14], y7 = b[15];
       a[8].uli = y0;
@@ -267,16 +306,16 @@ unaligned_words (struct ulw *a, const reg_t * b,
       a[13].uli = y5;
       a[14].uli = y6;
       a[15].uli = y7;
-#endif
-      a += BLOCK;
-      b += BLOCK;
+#endif /* BLOCK_SIZE==16 */
+      a += BLOCK_SIZE;
+      b += BLOCK_SIZE;
     }
 
   /* Mop up any remaining bytes. */
   return do_uwords_remaining (a, b, words_by_1, bytes, ret);
 }
 
-#else
+#else /* !UNALIGNED_INSTR_SUPPORT */
 
 /* No HW support or unaligned lw/ld/ualw/uald instructions. */
 static void *
@@ -294,7 +333,7 @@ unaligned_words (reg_t * a, const reg_t * b,
       x[1] = bw.b.B1;
       x[2] = bw.b.B2;
       x[3] = bw.b.B3;
-#if __mips64
+#ifdef __mips64
       x[4] = bw.b.B4;
       x[5] = bw.b.B5;
       x[6] = bw.b.B6;
@@ -316,13 +355,15 @@ aligned_words (reg_t * a, const reg_t * b,
                unsigned long words, unsigned long bytes, void *ret)
 {
   unsigned long i, words_by_block, words_by_1;
-  words_by_1 = words % BLOCK;
-  words_by_block = words / BLOCK;
+  words_by_1 = words % BLOCK_SIZE;
+  words_by_block = words / BLOCK_SIZE;
+
   for (; words_by_block > 0; words_by_block--)
     {
-      if(words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK)
+      if (words_by_block > PREF_AHEAD)
         for (i = 0; i < CACHE_LINES_PER_BLOCK; i++)
-          PREFETCH (b + ((BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i)));
+          PREFETCH (b + ((BLOCK_SIZE / CACHE_LINES_PER_BLOCK)
+                         * (PREF_AHEAD + i)));
 
       reg_t x0 = b[0], x1 = b[1], x2 = b[2], x3 = b[3];
       reg_t x4 = b[4], x5 = b[5], x6 = b[6], x7 = b[7];
@@ -334,7 +375,7 @@ aligned_words (reg_t * a, const reg_t * b,
       a[5] = x5;
       a[6] = x6;
       a[7] = x7;
-#if BLOCK==16
+#if BLOCK_SIZE==16
       x0 = b[8], x1 = b[9], x2 = b[10], x3 = b[11];
       x4 = b[12], x5 = b[13], x6 = b[14], x7 = b[15];
       a[8] = x0;
@@ -345,9 +386,9 @@ aligned_words (reg_t * a, const reg_t * b,
       a[13] = x5;
       a[14] = x6;
       a[15] = x7;
-#endif
-      a += BLOCK;
-      b += BLOCK;
+#endif /* BLOCK_SIZE==16 */
+      a += BLOCK_SIZE;
+      b += BLOCK_SIZE;
     }
 
   /* mop up any remaining bytes. */
@@ -359,13 +400,16 @@ memcpy (void *a, const void *b, size_t len) __overloadable
 {
   unsigned long bytes, words, i;
   void *ret = a;
+#if ENABLE_PREFETCH_CHECK
+  limit = (char *)b + len;
+#endif /* ENABLE_PREFETCH_CHECK */
   /* shouldn't hit that often. */
   if (len <= 8)
     return do_bytes (a, b, len, a);
 
   /* Start pre-fetches ahead of time. */
-  if (len > CACHE_LINE * (PREF_AHEAD - 1))
-    for (i = 1; i < PREF_AHEAD - 1; i++)
+  if (len > CACHE_LINE * PREF_AHEAD)
+    for (i = 1; i < PREF_AHEAD; i++)
       PREFETCH ((char *)b + CACHE_LINE * i);
   else
     for (i = 1; i < len / CACHE_LINE; i++)
@@ -396,10 +440,10 @@ memcpy (void *a, const void *b, size_t len) __overloadable
 #if HW_UNALIGNED_SUPPORT
   /* treat possible unaligned first pointer as aligned. */
   return aligned_words (a, b, words, bytes, ret);
-#else
+#else /* !HW_UNALIGNED_SUPPORT */
   if (((unsigned long) a) % sizeof (reg_t) == 0)
     return aligned_words (a, b, words, bytes, ret);
   /* need to use unaligned instructions on first pointer. */
   return unaligned_words (a, b, words, bytes, ret);
-#endif
+#endif /* HW_UNALIGNED_SUPPORT */
 }