From patchwork Wed Oct 28 12:48:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 40905 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9F17F386F002; Wed, 28 Oct 2020 12:48:33 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9F17F386F002 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1603889313; bh=C2J3Hbun8XblOciESszKYZ7u/WuZQe+0tGVeg5yaIHU=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=ZbjNoO+gmCPux/HrjuPJL8xgsQL6jz82CA0vsGfu9RsnB63Zzc7H7jB9L0/oH7b/f X5ZevG/9VQ77mKiVG2NaGLuZdywls0fy4YFooB/oDi0+h/7CeI8vgy4kXZi/uPNlhE B20vFmrenA3kttY1v1J+RLmjJ/0nTezleQ71nu78= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id 103B63858012 for ; Wed, 28 Oct 2020 12:48:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 103B63858012 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-123-5_uxBFdhNRW9qCaZLZCePA-1; Wed, 28 Oct 2020 08:48:28 -0400 X-MC-Unique: 5_uxBFdhNRW9qCaZLZCePA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5EF9F188C136; Wed, 28 Oct 2020 12:48:27 +0000 (UTC) Received: from oldenburg2.str.redhat.com (ovpn-113-60.ams2.redhat.com [10.36.113.60]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 78A6F5C22E; Wed, 28 Oct 2020 12:48:26 +0000 (UTC) To: Sajan Karumanchi Subject: [PATCH 2.32] x86: Optimizing memcpy for AMD Zen architecture. Date: Wed, 28 Oct 2020 13:48:24 +0100 Message-ID: <878sbqy0rr.fsf@oldenburg2.str.redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Florian Weimer via Libc-alpha From: Florian Weimer Reply-To: Florian Weimer Cc: Premachandra Mallappa , libc-alpha@sourceware.org Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" I would like to backport your fix to various stable release branches. Due to some refactoring, the patch doesn't apply cleanly. Would you kindly review this 2.32 backport? Thanks. Florian 8<------------------------------------------------------------------8< From: Sajan Karumanchi Modifying the shareable cache '__x86_shared_cache_size', which is a factor in computing the non-temporal threshold parameter '__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen architectures. In the existing implementation, the shareable cache is computed as 'L3 per thread, L2 per core'. Recomputing this shareable cache as 'L3 per CCX(Core-Complex)' has brought in performance gains. As per the large bench variant results, this patch also addresses the regression problem on AMD Zen architectures. Backport of commit 59803e81f96b479c17f583b31eac44b57591a1bf upstream. Reviewed-by: Premachandra Mallappa Reviewed-by: Premachandra Mallappa --- sysdeps/x86/cacheinfo.c | 30 +++++++++++++++++++++++++----- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c index dadec5d58f..2c06435170 100644 --- a/sysdeps/x86/cacheinfo.c +++ b/sysdeps/x86/cacheinfo.c @@ -808,7 +808,7 @@ init_cacheinfo (void) threads = 1 << ((ecx >> 12) & 0x0f); } - if (threads == 0) + if (threads == 0 || cpu_features->basic.family >= 0x17) { /* If APIC ID width is not available, use logical processor count. */ @@ -823,13 +823,30 @@ init_cacheinfo (void) if (threads > 0) shared /= threads; - /* Account for exclusive L2 and L3 caches. */ - shared += core; + /* Get shared cache per ccx for Zen architectures. */ + if (cpu_features->basic.family >= 0x17) + { + unsigned int eax; + + /* Get number of threads share the L3 cache in CCX. */ + __cpuid_count (0x8000001D, 0x3, eax, ebx, ecx, edx); + + unsigned int threads_per_ccx = ((eax >> 14) & 0xfff) + 1; + shared *= threads_per_ccx; + } + else + { + /* Account for exclusive L2 and L3 caches. */ + shared += core; + } } } if (cpu_features->data_cache_size != 0) - data = cpu_features->data_cache_size; + { + if (data == 0 || cpu_features->basic.kind != arch_kind_amd) + data = cpu_features->data_cache_size; + } if (data > 0) { @@ -842,7 +859,10 @@ init_cacheinfo (void) } if (cpu_features->shared_cache_size != 0) - shared = cpu_features->shared_cache_size; + { + if (shared == 0 || cpu_features->basic.kind != arch_kind_amd) + shared = cpu_features->shared_cache_size; + } if (shared > 0) {