From patchwork Fri Oct 30 08:45:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 40933 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 06DE738708E3; Fri, 30 Oct 2020 08:45:51 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 06DE738708E3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1604047551; bh=poN6I0NrcRcoAy4MTVkOI5P1LubqTM1bigU0IeVJTQM=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=IZ6M0CeldMHZINv3XBYVMJIcfzdFlXr2V8BSfvwptdQGgaJNQ6VnX4WHuFYfHUdEV iTY2HuLffZGcZC8v24esm0lC7CK2D5gDz6XP3ifE5xJe4sSUaZGyOTbjIPW1ddVgZb vR5sbhsbNlAkduTjSJhFSd1DSc9MN6RPsw8VSsx4= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id 1B1393857033 for ; Fri, 30 Oct 2020 08:45:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 1B1393857033 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-547-0kVeiMH2M2i81lRUdcCwSQ-1; Fri, 30 Oct 2020 04:45:44 -0400 X-MC-Unique: 0kVeiMH2M2i81lRUdcCwSQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3F945107AFCC; Fri, 30 Oct 2020 08:45:43 +0000 (UTC) Received: from oldenburg2.str.redhat.com (ovpn-113-60.ams2.redhat.com [10.36.113.60]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 144AB60BFA; Fri, 30 Oct 2020 08:45:41 +0000 (UTC) To: Sajan Karumanchi Subject: [PATCH v2 2.32] x86: Optimizing memcpy for AMD Zen architecture. Date: Fri, 30 Oct 2020 09:45:38 +0100 Message-ID: <87imasnlu5.fsf@oldenburg2.str.redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Florian Weimer via Libc-alpha From: Florian Weimer Reply-To: Florian Weimer Cc: Premachandra Mallappa , libc-alpha@sourceware.org Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" From: Sajan Karumanchi Modifying the shareable cache '__x86_shared_cache_size', which is a factor in computing the non-temporal threshold parameter '__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen architectures. In the existing implementation, the shareable cache is computed as 'L3 per thread, L2 per core'. Recomputing this shareable cache as 'L3 per CCX(Core-Complex)' has brought in performance gains. As per the large bench variant results, this patch also addresses the regression problem on AMD Zen architectures. Backport of commit 59803e81f96b479c17f583b31eac44b57591a1bf upstream, with the fix from cb3a749a22a55645dc6a52659eea765300623f98 ("x86: Restore processing of cache size tunables in init_cacheinfo") applied. Reviewed-by: Premachandra Mallappa Co-Authored-by: Florian Weimer --- sysdeps/x86/cacheinfo.c | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c index dadec5d58f..3fb4a028d8 100644 --- a/sysdeps/x86/cacheinfo.c +++ b/sysdeps/x86/cacheinfo.c @@ -808,7 +808,7 @@ init_cacheinfo (void) threads = 1 << ((ecx >> 12) & 0x0f); } - if (threads == 0) + if (threads == 0 || cpu_features->basic.family >= 0x17) { /* If APIC ID width is not available, use logical processor count. */ @@ -823,8 +823,22 @@ init_cacheinfo (void) if (threads > 0) shared /= threads; - /* Account for exclusive L2 and L3 caches. */ - shared += core; + /* Get shared cache per ccx for Zen architectures. */ + if (cpu_features->basic.family >= 0x17) + { + unsigned int eax; + + /* Get number of threads share the L3 cache in CCX. */ + __cpuid_count (0x8000001D, 0x3, eax, ebx, ecx, edx); + + unsigned int threads_per_ccx = ((eax >> 14) & 0xfff) + 1; + shared *= threads_per_ccx; + } + else + { + /* Account for exclusive L2 and L3 caches. */ + shared += core; + } } }