FMA4 detection fails.

Message ID SN1PR12MB073357908CD9467919A8814797580@SN1PR12MB0733.namprd12.prod.outlook.com
State New, archived
Headers

Commit Message

Pawar, Amit June 2, 2016, 9:35 a.m. UTC
  After the git commit "f4b6d20366aac66070f1cf50552cf2951991a1e5" FMA4 detection is failing in trunk.
This commit was related to Bugzilla id https://sourceware.org/bugzilla/show_bug.cgi?id=19214 and is ok to fix in following way in cpu-features.c file?



Thanks,
Amit Pawar
  

Comments

Joseph Myers June 2, 2016, 11:32 p.m. UTC | #1
On Thu, 2 Jun 2016, Pawar, Amit wrote:

> After the git commit "f4b6d20366aac66070f1cf50552cf2951991a1e5" FMA4 
> detection is failing in trunk.
> This commit was related to Bugzilla id 
> https://sourceware.org/bugzilla/show_bug.cgi?id=19214 and is ok to fix 
> in following way in cpu-features.c file?

You need to open a new bug for what appears to be a new issue (and then 
mention [BZ #N] in your ChangeLog entry, etc.).
  
Carlos O'Donell June 3, 2016, 12:20 a.m. UTC | #2
On 06/02/2016 05:35 AM, Pawar, Amit wrote:
> After the git commit "f4b6d20366aac66070f1cf50552cf2951991a1e5" FMA4 detection is failing in trunk.

You need to explain _why_ it's not working.

> This commit was related to Bugzilla id https://sourceware.org/bugzilla/show_bug.cgi?id=19214 and is ok to fix in following way in cpu-features.c file?
> 
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index a5fa81f..384e328 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -87,10 +87,6 @@ get_common_indeces (struct cpu_features *cpu_features,
>           if (CPU_FEATURES_CPU_P (cpu_features, FMA))
>             cpu_features->feature[index_arch_FMA_Usable]
>               |= bit_arch_FMA_Usable;
> -         /* Determine if FMA4 is usable.  */
> -         if (CPU_FEATURES_CPU_P (cpu_features, FMA4))
> -           cpu_features->feature[index_arch_FMA4_Usable]
> -             |= bit_arch_FMA4_Usable;
>         }

You move the feature check from the subroutine get_common_indeces...


>      }
>  }
> @@ -230,6 +226,25 @@ init_cpu_features (struct cpu_features *cpu_features)
>                  cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ecx,
>                  cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].edx);
> 
> +      /* Can we call xgetbv?  */
> +      if (CPU_FEATURES_CPU_P (cpu_features, OSXSAVE))
> +       {
> +         unsigned int xcrlow;
> +         unsigned int xcrhigh;
> +
> +         asm ("xgetbv" : "=a" (xcrlow), "=d" (xcrhigh) : "c" (0));
> +
> +         /* Is YMM and XMM state usable?  */
> +         if ((xcrlow & (bit_YMM_state | bit_XMM_state)) ==
> +             (bit_YMM_state | bit_XMM_state))
> +           {
> +             /* Determine if FMA4 is usable.  */
> +               if (CPU_FEATURES_CPU_P (cpu_features, FMA4))
> +                 cpu_features->feature[index_arch_FMA4_Usable]
> +                   |= bit_arch_FMA4_Usable;

... and up into the init_cpu_features (caller of get_common_indeces).

As you refactor it up the call stack you end up copying the osxsave/xgetbv
checks which get_common_indeces was already doing.

As far as I can tell your patch just moves the check earlier... why?

> +           }
> +        }
> +
>        if (family == 0x15)
>         {
>  #if index_arch_Fast_Unaligned_Load != index_arch_Fast_Copy_Backward
> 
> 
> Thanks,
> Amit Pawar
  
Pawar, Amit June 3, 2016, 7:41 a.m. UTC | #3
>... and up into the init_cpu_features (caller of get_common_indeces).
>
>As you refactor it up the call stack you end up copying the osxsave/xgetbv checks which get_common_indeces was already doing.
>
>As far as I can tell your patch just moves the check earlier... why?
Thanks for replying and I have filed a new defect in Bugzilla as per the suggestion and please check the same https://sourceware.org/bugzilla/show_bug.cgi?id=20195

Thanks,
Amit
  

Patch

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index a5fa81f..384e328 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -87,10 +87,6 @@  get_common_indeces (struct cpu_features *cpu_features,
          if (CPU_FEATURES_CPU_P (cpu_features, FMA))
            cpu_features->feature[index_arch_FMA_Usable]
              |= bit_arch_FMA_Usable;
-         /* Determine if FMA4 is usable.  */
-         if (CPU_FEATURES_CPU_P (cpu_features, FMA4))
-           cpu_features->feature[index_arch_FMA4_Usable]
-             |= bit_arch_FMA4_Usable;
        }
     }
 }
@@ -230,6 +226,25 @@  init_cpu_features (struct cpu_features *cpu_features)
                 cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ecx,
                 cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].edx);

+      /* Can we call xgetbv?  */
+      if (CPU_FEATURES_CPU_P (cpu_features, OSXSAVE))
+       {
+         unsigned int xcrlow;
+         unsigned int xcrhigh;
+
+         asm ("xgetbv" : "=a" (xcrlow), "=d" (xcrhigh) : "c" (0));
+
+         /* Is YMM and XMM state usable?  */
+         if ((xcrlow & (bit_YMM_state | bit_XMM_state)) ==
+             (bit_YMM_state | bit_XMM_state))
+           {
+             /* Determine if FMA4 is usable.  */
+               if (CPU_FEATURES_CPU_P (cpu_features, FMA4))
+                 cpu_features->feature[index_arch_FMA4_Usable]
+                   |= bit_arch_FMA4_Usable;
+           }
+        }
+
       if (family == 0x15)
        {
 #if index_arch_Fast_Unaligned_Load != index_arch_Fast_Copy_Backward