From patchwork Thu Oct 22 04:50:05 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Karumanchi, Sajan"
X-Patchwork-Id: 40808
From: sajan.karumanchi@amd.com
To: libc-alpha@sourceware.org, carlos@redhat.com, fweimer@redhat.com
Subject: [PATCH 1/1] x86: Optimizing memcpy for AMD Zen architecture.
Date: Thu, 22 Oct 2020 10:20:05 +0530
Message-Id: <20201022045005.17371-2-sajan.karumanchi@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20201022045005.17371-1-sajan.karumanchi@amd.com>
References: <20201022045005.17371-1-sajan.karumanchi@amd.com>
List-Id: Libc-alpha mailing list
Cc: Sajan Karumanchi, premachandra.mallappa@amd.com
Sender: "Libc-alpha"

From: Sajan Karumanchi

Modify the shareable cache size '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold
'__x86_shared_non_temporal_threshold', to optimize memcpy for AMD Zen
architectures.

The existing implementation computes the shareable cache as
'L3 per thread, L2 per core'. Recomputing it as 'L3 per CCX
(Core Complex)' yields performance gains. According to the large bench
variant results, this patch also addresses the regression on AMD Zen
architectures.

Reviewed-by: Premachandra Mallappa
Signed-off-by: Premachandra Mallappa
Signed-off-by: Sajan Karumanchi
---
 sysdeps/x86/cacheinfo.h | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.h b/sysdeps/x86/cacheinfo.h
index 7f342fdc23..d6d6877702 100644
--- a/sysdeps/x86/cacheinfo.h
+++ b/sysdeps/x86/cacheinfo.h
@@ -303,6 +303,8 @@ init_cacheinfo (void)
       data = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
       long int core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
+      unsigned int eax;
+      unsigned int threads_per_ccx = 0;
 
       /* Get maximum extended function. */
       __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
@@ -320,7 +322,7 @@ init_cacheinfo (void)
               threads = 1 << ((ecx >> 12) & 0x0f);
             }
 
-          if (threads == 0)
+          if (threads == 0 || cpu_features->basic.family >= 0x17)
             {
               /* If APIC ID width is not available, use logical
                  processor count.  */
@@ -335,13 +337,27 @@ init_cacheinfo (void)
           if (threads > 0)
             shared /= threads;
 
-          /* Account for exclusive L2 and L3 caches.  */
-          shared += core;
-        }
+          /* Get shared cache per CCX for Zen architectures.  */
+          if (cpu_features->basic.family >= 0x17)
+            {
+              /* Get the number of threads sharing the L3 cache in a CCX.  */
+              __cpuid_count (0x8000001D, 0x3, eax, ebx, ecx, edx);
+              threads_per_ccx = ((eax >> 14) & 0xfff) + 1;
+              shared = shared * threads_per_ccx;
+            }
+          else
+            {
+              /* Account for exclusive L2 and L3 caches.  */
+              shared += core;
+            }
+        }
     }
 
   if (cpu_features->data_cache_size != 0)
-    data = cpu_features->data_cache_size;
+    {
+      if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
+        data = cpu_features->data_cache_size;
+    }
 
   if (data > 0)
     {
@@ -354,7 +370,10 @@ init_cacheinfo (void)
     }
 
   if (cpu_features->shared_cache_size != 0)
-    shared = cpu_features->shared_cache_size;
+    {
+      if (shared == 0 || cpu_features->basic.kind != arch_kind_amd)
+        shared = cpu_features->shared_cache_size;
+    }
 
   if (shared > 0)
     {
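
For reviewers who want to see the effect of the change in isolation, the following is a minimal sketch of the two computations, not the glibc code itself. The function names are hypothetical, and the inputs (L3 size, L2 size, thread count, raw EAX value) are passed in as plain parameters rather than read from CPUID, so the arithmetic can be checked on any machine. The bit-field decoding mirrors the expression used in the patch.

```c
#include <assert.h>

/* Decode the number of threads sharing the cache from the EAX value of
   CPUID leaf 0x8000001D, using the same expression as the patch:
   a 12-bit field starting at bit 14, holding the count minus one.  */
unsigned int
threads_sharing_cache (unsigned int eax)
{
  return ((eax >> 14) & 0xfff) + 1;
}

/* Previous scheme (hypothetical helper): L3 per thread plus L2 per core.  */
long int
shared_l3_per_thread (long int l3, long int l2, unsigned int threads)
{
  long int shared = l3;
  if (threads > 0)
    shared /= threads;
  return shared + l2;
}

/* Patched scheme for family >= 0x17 (hypothetical helper): the per-thread
   share of L3 multiplied by the number of threads in one CCX.  */
long int
shared_l3_per_ccx (long int l3, unsigned int threads, unsigned int eax)
{
  long int shared = l3;
  if (threads > 0)
    shared /= threads;
  return shared * threads_sharing_cache (eax);
}
```

With made-up inputs of a 16 MiB L3, a 512 KiB L2, 16 logical threads and an EAX field value of 7 (8 threads per CCX), the previous scheme gives 1.5 MiB and the patched scheme gives 8 MiB. These figures only illustrate the arithmetic and are not measurements from any particular processor.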