From patchwork Wed Apr 9 16:28:58 2014
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 466
References: <1396595802-21567-1-git-send-email-ling.ma.program@gmail.com>
Date: Wed, 9 Apr 2014 09:28:58 -0700
Subject: Re: [PATCH RFC] X86_64 Avx2 Detection
From: "H.J. Lu"
To: Ling Ma
Cc: GNU C Library, Ondrej Bilka, Liubov Dmitrieva, Sihai Yao

On Wed, Apr 9, 2014 at 9:12 AM, H.J. Lu wrote:
> On Fri, Apr 4, 2014 at 12:16 AM, wrote:
>> From: Sihai Yao
>>
>> This patch sets bit_AVX2_Usable of __cpu_features.feature by checking
>> COMMON_CPUID_INDEX_7 for Haswell. Architecture related assembler file
>> can use this bit to determine calling path.
>>
>> ---
>>  ChangeLog                                  | 9 +++++++++
>>  sysdeps/x86_64/multiarch/ifunc-defines.sym | 2 ++
>>  sysdeps/x86_64/multiarch/init-arch.c       | 3 +++
>>  sysdeps/x86_64/multiarch/init-arch.h       | 9 +++++++++
>>  4 files changed, 23 insertions(+)
>>
>> diff --git a/ChangeLog b/ChangeLog
>> index da8ea6d..ab23a3a 100644
>> --- a/ChangeLog
>> +++ b/ChangeLog
>> @@ -1,3 +1,12 @@
>> +2014-04-04 Sihai Yao
>> +
>> + * sysdeps/x86_64/multiarch/ifunc-defines.sym: Add COMMON_CPU_INDEX_7 and
>> + FEATURE_INDEX_7.
>> + * sysdeps/x86_64/multiarch/init-arch.c: Add AVX2 detection from cpu
>> + features word of COMMON_CPUID_INDEX_7.
>> + * sysdeps/x86_64/multiarch/init-arch.h: Add bit_AVX2_Usable for memset.S
>> + to determine calling path.
>> +
>> 2014-04-03 David Svoboda
>>
>> [BZ #5666]
>> diff --git a/sysdeps/x86_64/multiarch/ifunc-defines.sym b/sysdeps/x86_64/multiarch/ifunc-defines.sym
>> index eb1538a..448b8c4 100644
>> --- a/sysdeps/x86_64/multiarch/ifunc-defines.sym
>> +++ b/sysdeps/x86_64/multiarch/ifunc-defines.sym
>> @@ -17,4 +17,6 @@ FEATURE_OFFSET offsetof (struct cpu_features, feature)
>> FEATURE_SIZE sizeof (unsigned int)
>>
>> COMMON_CPUID_INDEX_1
>> +COMMON_CPUID_INDEX_7
>> FEATURE_INDEX_1
>> +FEATURE_INDEX_7
>> diff --git a/sysdeps/x86_64/multiarch/init-arch.c b/sysdeps/x86_64/multiarch/init-arch.c
>> index db74d97..2bbc5eb 100644
>> --- a/sysdeps/x86_64/multiarch/init-arch.c
>> +++ b/sysdeps/x86_64/multiarch/init-arch.c
>> @@ -106,6 +106,7 @@ __init_cpu_features (void)
>>     case 0x2c:
>>     case 0x2e:
>>     case 0x2f:
>> +   case 0x3c:
>
> This isn't mentioned in ChangeLog. IA Optimization reference manual
> shows that 0x45 and 0x46 are also Haswell. This should be in a separate
> patch.
>
>>     /* Rep string instructions, copy backward, unaligned loads
>>        and pminub are fast on Intel Core i3, i5 and i7. */
>> #if index_Fast_Rep_String != index_Fast_Copy_Backward
>> @@ -153,6 +154,8 @@ __init_cpu_features (void)
>>     __cpu_features.cpuid[COMMON_CPUID_INDEX_7].ebx,
>>     __cpu_features.cpuid[COMMON_CPUID_INDEX_7].ecx,
>>     __cpu_features.cpuid[COMMON_CPUID_INDEX_7].edx);
>> +  if (CPUID_AVX2)
>> +    __cpu_features.feature[index_AVX2_Usable] |= bit_AVX2_Usable;
> ^^^^^^^^^^
>
> This should be inside if (CPUID_OSXSAVE), similar to bit_AVX_Usable.
>
>>   /* Can we call xgetbv? */
>>   if (CPUID_OSXSAVE)
>> diff --git a/sysdeps/x86_64/multiarch/init-arch.h b/sysdeps/x86_64/multiarch/init-arch.h
>> index 793707a..e453ccc 100644
>> --- a/sysdeps/x86_64/multiarch/init-arch.h
>> +++ b/sysdeps/x86_64/multiarch/init-arch.h
>> @@ -24,6 +24,7 @@
>> #define bit_FMA_Usable (1 << 7)
>> #define bit_FMA4_Usable (1 << 8)
>> #define bit_Slow_SSE4_2 (1 << 9)
>> +#define bit_AVX2_Usable (1 << 10)
>>
>> /* CPUID Feature flags. */
>>
>> @@ -40,6 +41,7 @@
>>
>> /* COMMON_CPUID_INDEX_7. */
>> #define bit_RTM (1 << 11)
>> +#define bit_AVX2 (1 << 5)
>>
>> /* XCR0 Feature flags. */
>> #define bit_XMM_state (1 << 1)
>> @@ -54,6 +56,7 @@
>> # define index_SSE4_1 COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
>> # define index_SSE4_2 COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
>> # define index_AVX COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
>> +# define index_AVX2 COMMON_CPUID_INDEX_7*CPUID_SIZE+CPUID_EBX_OFFSET
>>
>> # define index_Fast_Rep_String FEATURE_INDEX_1*FEATURE_SIZE
>> # define index_Fast_Copy_Backward FEATURE_INDEX_1*FEATURE_SIZE
>> @@ -64,6 +67,7 @@
>> # define index_FMA_Usable FEATURE_INDEX_1*FEATURE_SIZE
>> # define index_FMA4_Usable FEATURE_INDEX_1*FEATURE_SIZE
>> # define index_Slow_SSE4_2 FEATURE_INDEX_1*FEATURE_SIZE
>> +# define index_AVX2_Usable FEATURE_INDEX_7*FEATURE_SIZE
>>
>> #else /* __ASSEMBLER__ */
>>
>> @@ -81,6 +85,7 @@ enum
>> enum
>>   {
>>     FEATURE_INDEX_1 = 0,
>> +   FEATURE_INDEX_7,
>>     /* Keep the following line at the end. */
>>     FEATURE_INDEX_MAX
>>   };
>> @@ -145,6 +150,8 @@ extern const struct cpu_features *__get_cpu_features (void)
>>   HAS_CPUID_FLAG (COMMON_CPUID_INDEX_80000001, ecx, bit_FMA4)
>> # define CPUID_RTM \
>>   HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_RTM)
>> +# define CPUID_AVX2 \
>> +  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_AVX2)
>>
>> /* HAS_* evaluates to true if we may use the feature at runtime. */
>> # define HAS_SSE2 HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, edx, bit_SSE2)
>> @@ -153,6 +160,7 @@ extern const struct cpu_features *__get_cpu_features (void)
>> # define HAS_SSE4_1 HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_1)
>> # define HAS_SSE4_2 HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_2)
>> # define HAS_RTM HAS_CPU_FEATURE (COMMON_CPUID_INDEX_7, ebx, bit_RTM)
>> +# define HAS_AVX2 HAS_CPU_FEATURE (COMMON_CPUID_INDEX_7, ebx, bit_AVX2)
>>
>> # define index_Fast_Rep_String FEATURE_INDEX_1
>> # define index_Fast_Copy_Backward FEATURE_INDEX_1
>> @@ -163,6 +171,7 @@ extern const struct cpu_features *__get_cpu_features (void)
>> # define index_FMA_Usable FEATURE_INDEX_1
>> # define index_FMA4_Usable FEATURE_INDEX_1
>> # define index_Slow_SSE4_2 FEATURE_INDEX_1
>> +# define index_AVX2_Usable FEATURE_INDEX_7
>>
>> # define HAS_ARCH_FEATURE(name) \
>>   ((__get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
>> --
>> 1.8.1.4

Here is the updated patch. We don't need FEATURE_INDEX_7.

diff --git a/sysdeps/x86_64/multiarch/ifunc-defines.sym b/sysdeps/x86_64/multiarch/ifunc-defines.sym
index eb1538a..a410d88 100644
--- a/sysdeps/x86_64/multiarch/ifunc-defines.sym
+++ b/sysdeps/x86_64/multiarch/ifunc-defines.sym
@@ -17,4 +17,5 @@ FEATURE_OFFSET offsetof (struct cpu_features, feature)
 FEATURE_SIZE sizeof (unsigned int)

 COMMON_CPUID_INDEX_1
+COMMON_CPUID_INDEX_7
 FEATURE_INDEX_1
diff --git a/sysdeps/x86_64/multiarch/init-arch.c b/sysdeps/x86_64/multiarch/init-arch.c
index db74d97..2a6dcb7 100644
--- a/sysdeps/x86_64/multiarch/init-arch.c
+++ b/sysdeps/x86_64/multiarch/init-arch.c
@@ -167,6 +167,9 @@ __init_cpu_features (void)
   /* Determine if AVX is usable. */
   if (CPUID_AVX)
     __cpu_features.feature[index_AVX_Usable] |= bit_AVX_Usable;
+  /* Determine if AVX2 is usable. */
+  if (CPUID_AVX2)
+    __cpu_features.feature[index_AVX2_Usable] |= bit_AVX2_Usable;
   /* Determine if FMA is usable. */
   if (CPUID_FMA)
     __cpu_features.feature[index_FMA_Usable] |= bit_FMA_Usable;
diff --git a/sysdeps/x86_64/multiarch/init-arch.h b/sysdeps/x86_64/multiarch/init-arch.h
index 793707a..813b6de 100644
--- a/sysdeps/x86_64/multiarch/init-arch.h
+++ b/sysdeps/x86_64/multiarch/init-arch.h
@@ -24,6 +24,7 @@
 #define bit_FMA_Usable (1 << 7)
 #define bit_FMA4_Usable (1 << 8)
 #define bit_Slow_SSE4_2 (1 << 9)
+#define bit_AVX2_Usable (1 << 10)

 /* CPUID Feature flags. */

@@ -40,6 +41,7 @@

 /* COMMON_CPUID_INDEX_7. */
 #define bit_RTM (1 << 11)
+#define bit_AVX2 (1 << 5)

 /* XCR0 Feature flags. */
 #define bit_XMM_state (1 << 1)
@@ -54,6 +56,7 @@
 # define index_SSE4_1 COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
 # define index_SSE4_2 COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
 # define index_AVX COMMON_CPUID_INDEX_1*CPUID_SIZE+CPUID_ECX_OFFSET
+# define index_AVX2 COMMON_CPUID_INDEX_7*CPUID_SIZE+CPUID_EBX_OFFSET

 # define index_Fast_Rep_String FEATURE_INDEX_1*FEATURE_SIZE
 # define index_Fast_Copy_Backward FEATURE_INDEX_1*FEATURE_SIZE
@@ -64,6 +67,7 @@
 # define index_FMA_Usable FEATURE_INDEX_1*FEATURE_SIZE
 # define index_FMA4_Usable FEATURE_INDEX_1*FEATURE_SIZE
 # define index_Slow_SSE4_2 FEATURE_INDEX_1*FEATURE_SIZE
+# define index_AVX2_Usable FEATURE_INDEX_1*FEATURE_SIZE

 #else /* __ASSEMBLER__ */

@@ -145,6 +149,8 @@ extern const struct cpu_features *__get_cpu_features (void)
   HAS_CPUID_FLAG (COMMON_CPUID_INDEX_80000001, ecx, bit_FMA4)
 # define CPUID_RTM \
   HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_RTM)
+# define CPUID_AVX2 \
+  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_7, ebx, bit_AVX2)

 /* HAS_* evaluates to true if we may use the feature at runtime. */
 # define HAS_SSE2 HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, edx, bit_SSE2)
@@ -153,6 +159,7 @@ extern const struct cpu_features *__get_cpu_features (void)
 # define HAS_SSE4_1 HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_1)
 # define HAS_SSE4_2 HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_2)
 # define HAS_RTM HAS_CPU_FEATURE (COMMON_CPUID_INDEX_7, ebx, bit_RTM)
+# define HAS_AVX2 HAS_CPU_FEATURE (COMMON_CPUID_INDEX_7, ebx, bit_AVX2)

 # define index_Fast_Rep_String FEATURE_INDEX_1
 # define index_Fast_Copy_Backward FEATURE_INDEX_1
@@ -163,6 +170,7 @@ extern const struct cpu_features *__get_cpu_features (void)
 # define index_FMA_Usable FEATURE_INDEX_1
 # define index_FMA4_Usable FEATURE_INDEX_1
 # define index_Slow_SSE4_2 FEATURE_INDEX_1
+# define index_AVX2_Usable FEATURE_INDEX_1

 # define HAS_ARCH_FEATURE(name) \
   ((__get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
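
For anyone who wants to try the ordering of checks discussed in this thread outside of glibc, a rough standalone sketch follows. It is only an illustration under assumptions, not the glibc code: the names avx2_usable_from_regs, avx2_usable and the BIT_* macros are hypothetical, <cpuid.h> is assumed to be the GCC/Clang header, and the positions of OSXSAVE (CPUID.1:ECX bit 27) and YMM state (XCR0 bit 2) are taken from the Intel SDM rather than from the diff. The AVX2 bit (CPUID leaf 7, EBX bit 5) and the XMM state bit (XCR0 bit 1) match the values in the patch.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
# include <cpuid.h>   /* assumed: GCC/Clang helper header */
#endif

/* Hypothetical names; values follow the patch or the Intel SDM. */
#define BIT_OSXSAVE   (1u << 27)  /* CPUID.1:ECX, OS has enabled XSAVE/XGETBV */
#define BIT_AVX2      (1u << 5)   /* CPUID.(EAX=7,ECX=0):EBX, as bit_AVX2 */
#define BIT_XMM_STATE (1u << 1)   /* XCR0, as bit_XMM_state */
#define BIT_YMM_STATE (1u << 2)   /* XCR0, upper halves of YMM registers */

/* Pure decision logic: AVX2 counts as usable only if the OS exposes
   XGETBV, XCR0 says XMM and YMM state are saved, and CPUID reports AVX2.  */
static inline int
avx2_usable_from_regs (uint32_t cpuid1_ecx, uint32_t cpuid7_ebx,
                       uint32_t xcr0_lo)
{
  const uint32_t need = BIT_XMM_STATE | BIT_YMM_STATE;

  if ((cpuid1_ecx & BIT_OSXSAVE) == 0)
    return 0;                     /* xgetbv must not be executed */
  if ((xcr0_lo & need) != need)
    return 0;                     /* OS does not save YMM state */
  return (cpuid7_ebx & BIT_AVX2) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
/* Reads the real registers and applies the logic above.  */
static inline int
avx2_usable (void)
{
  unsigned int eax, ebx, ecx, edx;
  uint32_t cpuid1_ecx, cpuid7_ebx = 0, xcr0_lo = 0;

  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;
  cpuid1_ecx = ecx;

  /* Leaf 7, subleaf 0; returns 0 if the leaf is not supported.  */
  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    cpuid7_ebx = ebx;

  if (cpuid1_ecx & BIT_OSXSAVE)
    {
      unsigned int lo, hi;
      /* XCR0 is selected with ECX = 0.  */
      __asm__ volatile ("xgetbv" : "=a" (lo), "=d" (hi) : "c" (0));
      xcr0_lo = lo;
    }

  return avx2_usable_from_regs (cpuid1_ecx, cpuid7_ebx, xcr0_lo);
}
#endif
```

Splitting the decision from the register reads keeps the logic checkable on any machine: avx2_usable_from_regs (BIT_OSXSAVE, BIT_AVX2, BIT_XMM_STATE | BIT_YMM_STATE) is nonzero, while clearing any one of the three inputs gives zero.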