Subject: Re: [PATCH] Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors
From: "H.J. Lu"
To: GNU C Library, "Pawar, Amit"
Date: Mon, 21 Mar 2016 08:16:29 -0700
In-Reply-To: <20160318202535.GA13420@intel.com>
References: <20160318202535.GA13420@intel.com>

On Fri, Mar 18, 2016 at 1:25 PM, H.J. Lu wrote:
> Since only Intel processors with AVX2 have fast unaligned load, we
> should set index_arch_AVX_Fast_Unaligned_Load only for Intel
> processors.
>
> Add PTR_HAS_CPU_FEATURE and PTR_HAS_ARCH_FEATURE to avoid loading
> GLRO(dl_x86_cpu_features) in cpu-features.c.
>
> Tested on x86-64.  OK for master?
>
> H.J.
> ---
> 	* sysdeps/x86/cpu-features.c (get_common_indeces): Remove
> 	inline.  Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here.
> 	(init_cpu_features): Replace HAS_CPU_FEATURE and
> 	HAS_ARCH_FEATURE with PTR_HAS_CPU_FEATURE and
> 	PTR_HAS_ARCH_FEATURE.  Set index_arch_AVX_Fast_Unaligned_Load
> 	for Intel processors with usable AVX2.
> 	* sysdeps/x86/cpu-features.h (PTR_HAS_CPU_FEATURE): New macro.
> 	(PTR_HAS_ARCH_FEATURE): Likewise.
> 	(HAS_CPU_FEATURE): Use PTR_HAS_CPU_FEATURE.
> 	(HAS_ARCH_FEATURE): Use PTR_HAS_ARCH_FEATURE.

Here is the updated patch.  It renames PTR_HAS_CPU_FEATURE and
PTR_HAS_ARCH_FEATURE to CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P,
and it also calls get_common_indeces for other processors so that AVX,
AVX2, AVX512, FMA and FMA4 are still detected for them.  This should
help fix

  https://sourceware.org/bugzilla/show_bug.cgi?id=19583

by setting AVX_Fast_Unaligned_Load only for Intel processors.

OK for master?

Thanks.
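To make the difference concrete: HAS_CPU_FEATURE and HAS_ARCH_FEATURE
fetch the feature struct through __get_cpu_features (), which inside the
dynamic linker means a GLRO(dl_x86_cpu_features) load, while the new
CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P test a struct pointer the
caller already holds.  Below is a minimal standalone sketch of the two
forms; the struct layout, index and bit values are simplified stand-ins,
not glibc's real definitions:

/* Minimal sketch of the pointer-based feature check the patch
   introduces.  Everything here is a simplified stand-in for glibc's
   real cpu_features machinery.  */
#include <stdio.h>

struct cpu_features
{
  unsigned int feature[1];	/* hypothetical: one word of arch bits */
};

#define index_arch_AVX2_Usable	0		/* hypothetical index */
#define bit_arch_AVX2_Usable	(1u << 0)	/* hypothetical bit */

/* Pointer form, as in the patch: test a struct the caller already
   holds.  cpu-features.c needs this while it is still filling in the
   struct that the global accessor would return.  */
#define CPU_FEATURES_ARCH_P(ptr, name) \
  ((ptr->feature[index_arch_##name] & (bit_arch_##name)) != 0)

static struct cpu_features global_features;

/* Stand-in for __get_cpu_features (); in glibc this is where the
   GLRO(dl_x86_cpu_features) load happens.  */
static const struct cpu_features *
get_cpu_features (void)
{
  return &global_features;
}

/* Accessor form: what HAS_ARCH_FEATURE becomes after the patch.  */
#define HAS_ARCH_FEATURE(name) \
  CPU_FEATURES_ARCH_P (get_cpu_features (), name)

int
main (void)
{
  struct cpu_features *cpu_features = &global_features;

  /* Init-time code sets and tests bits through the pointer it holds.  */
  cpu_features->feature[index_arch_AVX2_Usable] |= bit_arch_AVX2_Usable;
  printf ("pointer check:  %d\n",
	  CPU_FEATURES_ARCH_P (cpu_features, AVX2_Usable));

  /* All other code keeps using the accessor-based macro.  */
  printf ("accessor check: %d\n", HAS_ARCH_FEATURE (AVX2_Usable));
  return 0;
}

The pointer form is what lets get_common_indeces test feature bits in
the very struct it is still initializing.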
Lu" Date: Fri, 18 Mar 2016 12:45:58 -0700 Subject: [PATCH] Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to aoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P. --- sysdeps/x86/cpu-features.c | 152 ++++++++++++++++++++++++--------------------- sysdeps/x86/cpu-features.h | 10 ++- 2 files changed, 88 insertions(+), 74 deletions(-) diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c index 1787716..c8f81ef 100644 --- a/sysdeps/x86/cpu-features.c +++ b/sysdeps/x86/cpu-features.c @@ -19,23 +19,79 @@ #include #include -static inline void +static void get_common_indeces (struct cpu_features *cpu_features, unsigned int *family, unsigned int *model, unsigned int *extended_model) { - unsigned int eax; - __cpuid (1, eax, cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx, - cpu_features->cpuid[COMMON_CPUID_INDEX_1].ecx, - cpu_features->cpuid[COMMON_CPUID_INDEX_1].edx); - GLRO(dl_x86_cpu_features).cpuid[COMMON_CPUID_INDEX_1].eax = eax; - *family = (eax >> 8) & 0x0f; - *model = (eax >> 4) & 0x0f; - *extended_model = (eax >> 12) & 0xf0; - if (*family == 0x0f) + if (family) { - *family += (eax >> 20) & 0xff; - *model += *extended_model; + unsigned int eax; + __cpuid (1, eax, cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx, + cpu_features->cpuid[COMMON_CPUID_INDEX_1].ecx, + cpu_features->cpuid[COMMON_CPUID_INDEX_1].edx); + cpu_features->cpuid[COMMON_CPUID_INDEX_1].eax = eax; + *family = (eax >> 8) & 0x0f; + *model = (eax >> 4) & 0x0f; + *extended_model = (eax >> 12) & 0xf0; + if (*family == 0x0f) + { + *family += (eax >> 20) & 0xff; + *model += *extended_model; + } + } + + if (cpu_features->max_cpuid >= 7) + __cpuid_count (7, 0, + cpu_features->cpuid[COMMON_CPUID_INDEX_7].eax, + cpu_features->cpuid[COMMON_CPUID_INDEX_7].ebx, + cpu_features->cpuid[COMMON_CPUID_INDEX_7].ecx, + cpu_features->cpuid[COMMON_CPUID_INDEX_7].edx); + + /* Can we call xgetbv? */ + if (CPU_FEATURES_CPU_P (cpu_features, OSXSAVE)) + { + unsigned int xcrlow; + unsigned int xcrhigh; + asm ("xgetbv" : "=a" (xcrlow), "=d" (xcrhigh) : "c" (0)); + /* Is YMM and XMM state usable? */ + if ((xcrlow & (bit_YMM_state | bit_XMM_state)) == + (bit_YMM_state | bit_XMM_state)) + { + /* Determine if AVX is usable. */ + if (CPU_FEATURES_CPU_P (cpu_features, AVX)) + cpu_features->feature[index_arch_AVX_Usable] + |= bit_arch_AVX_Usable; + /* Determine if AVX2 is usable. */ + if (CPU_FEATURES_CPU_P (cpu_features, AVX2)) + cpu_features->feature[index_arch_AVX2_Usable] + |= bit_arch_AVX2_Usable; + /* Check if OPMASK state, upper 256-bit of ZMM0-ZMM15 and + ZMM16-ZMM31 state are enabled. 
+	  if ((xcrlow & (bit_Opmask_state | bit_ZMM0_15_state
+			 | bit_ZMM16_31_state)) ==
+	      (bit_Opmask_state | bit_ZMM0_15_state | bit_ZMM16_31_state))
+	    {
+	      /* Determine if AVX512F is usable.  */
+	      if (CPU_FEATURES_CPU_P (cpu_features, AVX512F))
+		{
+		  cpu_features->feature[index_arch_AVX512F_Usable]
+		    |= bit_arch_AVX512F_Usable;
+		  /* Determine if AVX512DQ is usable.  */
+		  if (CPU_FEATURES_CPU_P (cpu_features, AVX512DQ))
+		    cpu_features->feature[index_arch_AVX512DQ_Usable]
+		      |= bit_arch_AVX512DQ_Usable;
+		}
+	    }
+	  /* Determine if FMA is usable.  */
+	  if (CPU_FEATURES_CPU_P (cpu_features, FMA))
+	    cpu_features->feature[index_arch_FMA_Usable]
+	      |= bit_arch_FMA_Usable;
+	  /* Determine if FMA4 is usable.  */
+	  if (CPU_FEATURES_CPU_P (cpu_features, FMA4))
+	    cpu_features->feature[index_arch_FMA4_Usable]
+	      |= bit_arch_FMA4_Usable;
+	}
     }
 }

@@ -135,6 +191,12 @@ init_cpu_features (struct cpu_features *cpu_features)
 	      break;
 	    }
 	}
+
+      /* Unaligned load with 256-bit AVX registers are faster on
+	 Intel processors with AVX2.  */
+      if (CPU_FEATURES_ARCH_P (cpu_features, AVX2_Usable))
+	cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
+	  |= bit_arch_AVX_Fast_Unaligned_Load;
     }
   /* This spells out "AuthenticAMD".  */
   else if (ebx == 0x68747541 && ecx == 0x444d4163 && edx == 0x69746e65)
@@ -165,73 +227,19 @@ init_cpu_features (struct cpu_features *cpu_features)
 	}
     }
   else
-    kind = arch_kind_other;
+    {
+      kind = arch_kind_other;
+      get_common_indeces (cpu_features, NULL, NULL, NULL);
+    }

   /* Support i586 if CX8 is available.  */
-  if (HAS_CPU_FEATURE (CX8))
+  if (CPU_FEATURES_CPU_P (cpu_features, CX8))
     cpu_features->feature[index_arch_I586] |= bit_arch_I586;

   /* Support i686 if CMOV is available.  */
-  if (HAS_CPU_FEATURE (CMOV))
+  if (CPU_FEATURES_CPU_P (cpu_features, CMOV))
     cpu_features->feature[index_arch_I686] |= bit_arch_I686;

-  if (cpu_features->max_cpuid >= 7)
-    __cpuid_count (7, 0,
-		   cpu_features->cpuid[COMMON_CPUID_INDEX_7].eax,
-		   cpu_features->cpuid[COMMON_CPUID_INDEX_7].ebx,
-		   cpu_features->cpuid[COMMON_CPUID_INDEX_7].ecx,
-		   cpu_features->cpuid[COMMON_CPUID_INDEX_7].edx);
-
-  /* Can we call xgetbv?  */
-  if (HAS_CPU_FEATURE (OSXSAVE))
-    {
-      unsigned int xcrlow;
-      unsigned int xcrhigh;
-      asm ("xgetbv" : "=a" (xcrlow), "=d" (xcrhigh) : "c" (0));
-      /* Is YMM and XMM state usable?  */
-      if ((xcrlow & (bit_YMM_state | bit_XMM_state)) ==
-	  (bit_YMM_state | bit_XMM_state))
-	{
-	  /* Determine if AVX is usable.  */
-	  if (HAS_CPU_FEATURE (AVX))
-	    cpu_features->feature[index_arch_AVX_Usable]
-	      |= bit_arch_AVX_Usable;
-#if index_arch_AVX2_Usable != index_arch_AVX_Fast_Unaligned_Load
-# error index_arch_AVX2_Usable != index_arch_AVX_Fast_Unaligned_Load
-#endif
-	  /* Determine if AVX2 is usable.  Unaligned load with 256-bit
-	     AVX registers are faster on processors with AVX2.  */
-	  if (HAS_CPU_FEATURE (AVX2))
-	    cpu_features->feature[index_arch_AVX2_Usable]
-	      |= bit_arch_AVX2_Usable | bit_arch_AVX_Fast_Unaligned_Load;
-	  /* Check if OPMASK state, upper 256-bit of ZMM0-ZMM15 and
-	     ZMM16-ZMM31 state are enabled.  */
-	  if ((xcrlow & (bit_Opmask_state | bit_ZMM0_15_state
-			 | bit_ZMM16_31_state)) ==
-	      (bit_Opmask_state | bit_ZMM0_15_state | bit_ZMM16_31_state))
-	    {
-	      /* Determine if AVX512F is usable.  */
-	      if (HAS_CPU_FEATURE (AVX512F))
-		{
-		  cpu_features->feature[index_arch_AVX512F_Usable]
-		    |= bit_arch_AVX512F_Usable;
-		  /* Determine if AVX512DQ is usable.  */
-		  if (HAS_CPU_FEATURE (AVX512DQ))
-		    cpu_features->feature[index_arch_AVX512DQ_Usable]
-		      |= bit_arch_AVX512DQ_Usable;
-		}
-	    }
-	  /* Determine if FMA is usable.  */
-	  if (HAS_CPU_FEATURE (FMA))
-	    cpu_features->feature[index_arch_FMA_Usable]
-	      |= bit_arch_FMA_Usable;
-	  /* Determine if FMA4 is usable.  */
-	  if (HAS_CPU_FEATURE (FMA4))
-	    cpu_features->feature[index_arch_FMA4_Usable]
-	      |= bit_arch_FMA4_Usable;
-	}
-    }
-
 #if !HAS_CPUID
 no_cpuid:
 #endif
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index 0624a92..e06eb7e 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -204,11 +204,17 @@ extern const struct cpu_features *__get_cpu_features (void)
 # endif

+/* Only used directly in cpu-features.c.  */
+# define CPU_FEATURES_CPU_P(ptr, name) \
+  ((ptr->cpuid[index_cpu_##name].reg_##name & (bit_cpu_##name)) != 0)
+# define CPU_FEATURES_ARCH_P(ptr, name) \
+  ((ptr->feature[index_arch_##name] & (bit_arch_##name)) != 0)
+
 /* HAS_* evaluates to true if we may use the feature at runtime.  */
 # define HAS_CPU_FEATURE(name) \
-  ((__get_cpu_features ()->cpuid[index_cpu_##name].reg_##name & (bit_cpu_##name)) != 0)
+  CPU_FEATURES_CPU_P (__get_cpu_features (), name)
 # define HAS_ARCH_FEATURE(name) \
-  ((__get_cpu_features ()->feature[index_arch_##name] & (bit_arch_##name)) != 0)
+  CPU_FEATURES_ARCH_P (__get_cpu_features (), name)

 # define index_cpu_CX8 COMMON_CPUID_INDEX_1
 # define index_cpu_CMOV COMMON_CPUID_INDEX_1
-- 
2.5.0
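For reference outside the glibc tree, the usability check that
get_common_indeces now centralizes can be reproduced as a standalone
program.  This is only a sketch of the same three steps the patch
performs, with the CPUID and XCR0 bit positions taken from Intel's
documentation rather than glibc's headers:

/* Standalone sketch of the AVX2 usability check: a CPUID feature bit
   alone is not enough; the OS must also have enabled the matching
   XSAVE state in XCR0.  Build with GCC or Clang on x86.  */
#include <cpuid.h>
#include <stdio.h>

static int
avx2_usable (void)
{
  unsigned int eax, ebx, ecx, edx;

  /* CPUID.1:ECX bit 27 = OSXSAVE: may we execute xgetbv at all?  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)
      || !(ecx & (1u << 27)))
    return 0;

  /* XCR0 bits 1 and 2: the OS preserves XMM and YMM state across
     context switches, so 256-bit registers are safe to use.  */
  unsigned int xcrlow, xcrhigh;
  asm ("xgetbv" : "=a" (xcrlow), "=d" (xcrhigh) : "c" (0));
  if ((xcrlow & 0x6) != 0x6)
    return 0;

  /* CPUID.(EAX=7,ECX=0):EBX bit 5 = AVX2.  */
  if (__get_cpuid_max (0, NULL) < 7)
    return 0;
  __cpuid_count (7, 0, eax, ebx, ecx, edx);
  return (ebx & (1u << 5)) != 0;
}

int
main (void)
{
  printf ("AVX2 usable: %s\n", avx2_usable () ? "yes" : "no");
  return 0;
}

The XCR0 test matters because a CPU may report AVX2 in CPUID while the
kernel has not enabled YMM state saving, in which case the 256-bit
registers must not be used.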