From patchwork Thu May 19 17:50:31 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 12395 Received: (qmail 9293 invoked by alias); 19 May 2016 17:50:36 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 9278 invoked by uid 89); 19 May 2016 17:50:35 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=smt, SMT, family X-HELO: mail-qk0-f173.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to; bh=4OEEOyZPO9PMScCmAX1Vf1i7Zgwt8nRpR/Tn1nFrgkY=; b=cZUGRlYK6Ghxaz3G9BtsXjxxd3As7RmNfpxXAA1crj1HVEyPT6UPYcsJCbfnfA7xcM ediUe5iS+CgSbjLeN6xpYc8MWSQRnyFOhfdlSL3JCt4m032MUEv0/YkvuRt/mhFmi+dv L78P8KaWjnceyrzs3w77n95vR49WNPUDvqW7cSxDNkgS0zDIym+B4dzkS1OzxtQlkHBI FeWflyWPDLAolouQEU7fgai7fWtVrElhgqgoI9/lDWWZVodAVdFw63jY2M75lsx01doe Hn83Tzr00/dF8Q9TKSWb7GuXuoyMAA+1+sshAJAbIgV4Ke9F3kxdRG/xLyMVh5nFHWMN lhSA== X-Gm-Message-State: AOPr4FVKJDzOYxEvgnS6COqDC0gyoJ9s/jOKU3WtVQfG6LUI6Bj+qY7E+wCpqvGTHkPZM2FwkootwKE0/yKsQA== MIME-Version: 1.0 X-Received: by 10.55.93.130 with SMTP id r124mr16086527qkb.195.1463680231231; Thu, 19 May 2016 10:50:31 -0700 (PDT) In-Reply-To: References: Date: Thu, 19 May 2016 10:50:31 -0700 Message-ID: Subject: Re: [PATCH] Count number of logical processors sharing L2 cache From: "H.J. Lu" To: GNU C Library On Fri, May 13, 2016 at 1:39 PM, H.J. Lu wrote: > > We need count number of available logical processors sharing L2 > cache. For Intel processors, when there are both L2 and L3 caches, SMT level type should be ued to count number of available logical processors sharing L2 cache. If there is only L2 cache, core level type should be used to count number of available logical processors sharing L2 cache. Number of available logical processors sharing L2 cache should be used for non-inclusive L2 and L3 caches. Any comments? From 5f7d26b9f1d85819700150ea9ee97f46d048ca54 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Fri, 13 May 2016 13:26:37 -0700 Subject: [PATCH] Count number of logical processors sharing L2 cache For Intel processors, when there are both L2 and L3 caches, SMT level type should be ued to count number of available logical processors sharing L2 cache. If there is only L2 cache, core level type should be used to count number of available logical processors sharing L2 cache. Number of available logical processors sharing L2 cache should be used for non-inclusive L2 and L3 caches. * sysdeps/x86/cacheinfo.c (init_cacheinfo): Count number of available logical processors with SMT level type sharing L2 cache for Intel processors. --- sysdeps/x86/cacheinfo.c | 156 +++++++++++++++++++++++++++++++++++++----------- 1 file changed, 120 insertions(+), 36 deletions(-) diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c index 020d3fd..c357bd4 100644 --- a/sysdeps/x86/cacheinfo.c +++ b/sysdeps/x86/cacheinfo.c @@ -499,11 +499,24 @@ init_cacheinfo (void) level = 3; shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, max_cpuid); + /* Number of logical processors sharing L2 cache. */ + int threads_l2; + + /* Number of logical processors sharing L3 cache. */ + int threads_l3; + if (shared <= 0) { /* Try L2 otherwise. */ level = 2; shared = core; + threads_l2 = 0; + threads_l3 = -1; + } + else + { + threads_l2 = 0; + threads_l3 = 0; } /* A value of 0 for the HTT bit indicates there is only a single @@ -519,7 +532,8 @@ init_cacheinfo (void) int i = 0; - /* Query until desired cache level is enumerated. */ + /* Query until cache level 2 and 3 are enumerated. */ + int check = 0x1 | (threads_l3 == 0) << 1; do { __cpuid_count (4, i++, eax, ebx, ecx, edx); @@ -530,24 +544,53 @@ init_cacheinfo (void) assume there is no such information. */ if ((eax & 0x1f) == 0) goto intel_bug_no_cache_info; - } - while (((eax >> 5) & 0x7) != level); - /* Check if cache is inclusive of lower cache levels. */ - inclusive_cache = (edx & 0x2) != 0; + switch ((eax >> 5) & 0x7) + { + default: + break; + case 2: + if ((check & 0x1)) + { + /* Get maximum number of logical processors + sharing L2 cache. */ + threads_l2 = (eax >> 14) & 0x3ff; + check &= ~0x1; + } + break; + case 3: + if ((check & (0x1 << 1))) + { + /* Get maximum number of logical processors + sharing L3 cache. */ + threads_l3 = (eax >> 14) & 0x3ff; - threads = (eax >> 14) & 0x3ff; + /* Check if L2 and L3 caches are inclusive. */ + inclusive_cache = (edx & 0x2) != 0; + check &= ~(0x1 << 1); + } + break; + } + } + while (check); - /* If max_cpuid >= 11, THREADS is the maximum number of - addressable IDs for logical processors sharing the - cache, instead of the maximum number of threads + /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum + numbers of addressable IDs for logical processors sharing + the cache, instead of the maximum number of threads sharing the cache. */ - if (threads && max_cpuid >= 11) + if (max_cpuid >= 11) { /* Find the number of logical processors shipped in one core and apply count mask. */ i = 0; - while (1) + + /* Count SMT only if there is L3 cache. Always count + core if there is no L3 cache. */ + int count = ((threads_l2 > 0 && level == 3) + | ((threads_l3 > 0 + || (threads_l2 > 0 && level == 2)) << 1)); + + while (count) { __cpuid_count (11, i++, eax, ebx, ecx, edx); @@ -555,38 +598,75 @@ init_cacheinfo (void) int type = ecx & 0xff00; if (shipped == 0 || type == 0) break; + else if (type == 0x100) + { + /* Count SMT. */ + if ((count & 0x1)) + { + int count_mask; + + /* Compute count mask. */ + asm ("bsr %1, %0" + : "=r" (count_mask) : "g" (threads_l2)); + count_mask = ~(-1 << (count_mask + 1)); + threads_l2 = (shipped - 1) & count_mask; + count &= ~0x1; + } + } else if (type == 0x200) { - int count_mask; - - /* Compute count mask. */ - asm ("bsr %1, %0" - : "=r" (count_mask) : "g" (threads)); - count_mask = ~(-1 << (count_mask + 1)); - threads = (shipped - 1) & count_mask; - break; + /* Count core. */ + if ((count & (0x1 << 1))) + { + int count_mask; + int threads_core + = (level == 2 ? threads_l2 : threads_l3); + + /* Compute count mask. */ + asm ("bsr %1, %0" + : "=r" (count_mask) : "g" (threads_core)); + count_mask = ~(-1 << (count_mask + 1)); + threads_core = (shipped - 1) & count_mask; + if (level == 2) + threads_l2 = threads_core; + else + threads_l3 = threads_core; + count &= ~(0x1 << 1); + } } } } - threads += 1; - if (threads > 2 && level == 2 && family == 6) + if (threads_l2 > 0) + threads_l2 += 1; + if (threads_l3 > 0) + threads_l3 += 1; + if (level == 2) { - switch (model) + if (threads_l2) { - case 0x57: - /* Knights Landing has L2 cache shared by 2 cores. */ - case 0x37: - case 0x4a: - case 0x4d: - case 0x5a: - case 0x5d: - /* Silvermont has L2 cache shared by 2 cores. */ - threads = 2; - break; - default: - break; + threads = threads_l2; + if (threads > 2 && family == 6) + { + switch (model) + { + case 0x57: + /* Knights Landing has L2 cache shared by 2 cores. */ + case 0x37: + case 0x4a: + case 0x4d: + case 0x5a: + case 0x5d: + /* Silvermont has L2 cache shared by 2 cores. */ + threads = 2; + break; + default: + break; + } + } } } + else if (threads_l3) + threads = threads_l3; } else { @@ -606,8 +686,12 @@ intel_bug_no_cache_info: } /* Account for non-inclusive L2 and L3 caches. */ - if (level == 3 && !inclusive_cache) - shared += core; + if (!inclusive_cache) + { + if (threads_l2 > 0) + core /= threads_l2; + shared += core; + } } /* This spells out "AuthenticAMD". */ else if (is_amd) -- 2.5.5