x86-64: Check FMA_Usable in ifunc-mathvec-avx2.h [BZ #21966]
Since the AVX2 version of mathvec functions uses FMA, it can only be
used when FMA is usable.
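To make the selection rule concrete, here is a minimal model of the dispatch decision. It assumes two boolean inputs standing in for the AVX2_Usable and FMA_Usable bits; the enum and function names are hypothetical and are not part of glibc. It shows that the old check picks the AVX2 variant on a CPU that reports AVX2 but not FMA, while the fixed check falls back to the SSE wrapper.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for OPTIMIZE (avx2) and OPTIMIZE (sse_wrapper).  */
enum mathvec_variant { VARIANT_SSE_WRAPPER, VARIANT_AVX2 };

/* Old rule: only AVX2_Usable is consulted.  */
static enum mathvec_variant
select_old (bool avx2_usable, bool fma_usable)
{
  (void) fma_usable;  /* Ignored, which is the bug.  */
  return avx2_usable ? VARIANT_AVX2 : VARIANT_SSE_WRAPPER;
}

/* Fixed rule: the AVX2 code also executes FMA instructions, so both
   features must be usable before it is selected.  */
static enum mathvec_variant
select_new (bool avx2_usable, bool fma_usable)
{
  if (fma_usable && avx2_usable)
    return VARIANT_AVX2;
  return VARIANT_SSE_WRAPPER;
}
```

With avx2_usable = true and fma_usable = false, select_old returns VARIANT_AVX2, whose first FMA instruction could then fault (SIGILL), whereas select_new returns VARIANT_SSE_WRAPPER.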
Any comments?
Andrew, please take a look and check if it should be fixed on branches.
Thanks.
H.J.
[BZ #21966]
* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx2.h
(IFUNC_SELECTOR): Don't use the AVX2 version if FMA isn't
usable.
---
sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx2.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
> -----Original Message-----
> From: Lu, Hongjiu
> Sent: Friday, August 18, 2017 01:01
> To: GNU C Library <libc-alpha@sourceware.org>
> Cc: Senkevich, Andrew <andrew.senkevich@intel.com>
> Subject: [PATCH] x86-64: Check FMA_Usable in ifunc-mathvec-avx2.h [BZ
> #21966]
>
> Since the AVX2 version of mathvec functions uses FMA, it can only be used when
> FMA is usable.
>
> Any comments?
>
> Andrew, please take a look and check if it should be fixed on branches.
Yes, branches should be fixed also. Thanks.
--
Andrew
On Fri, Aug 18, 2017 at 6:02 AM, Senkevich, Andrew
<andrew.senkevich@intel.com> wrote:
> Yes, branches should be fixed also. Thanks.
>
Hi Andrew,
I checked my patch into the master branch. Can you take care of
the branches?
Thanks.
@@ -31,7 +31,8 @@ IFUNC_SELECTOR (void)
 {
   const struct cpu_features* cpu_features = __get_cpu_features ();
 
-  if (CPU_FEATURES_ARCH_P (cpu_features, AVX2_Usable))
+  if (CPU_FEATURES_ARCH_P (cpu_features, FMA_Usable)
+      && CPU_FEATURES_ARCH_P (cpu_features, AVX2_Usable))
     return OPTIMIZE (avx2);
 
   return OPTIMIZE (sse_wrapper);
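Outside glibc, the same rule can be applied with GCC's (and Clang's) __builtin_cpu_supports. The sketch below is only a rough analogue of IFUNC_SELECTOR, not glibc's code: the dot-product kernels and the names dot_fn, dot_generic, dot_avx2_fma and select_dot are hypothetical, it is x86-64 only, and it returns a plain function pointer instead of using a real ifunc. The point it illustrates is that a variant compiled with target ("avx2,fma") is only handed out when both features are reported.

```c
#include <stddef.h>

/* Hypothetical kernel signature used for illustration only.  */
typedef double (*dot_fn) (const double *, const double *, size_t);

/* Baseline variant: plain C, safe on any x86-64 CPU.  */
static double
dot_generic (const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Variant compiled for AVX2 and FMA: __builtin_fma is expanded to an
   FMA instruction here, so this must never run on a CPU that reports
   AVX2 without FMA.  */
static __attribute__ ((target ("avx2,fma"))) double
dot_avx2_fma (const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum = __builtin_fma (a[i], b[i], sum);
  return sum;
}

/* Selector mirroring the fixed check: require FMA and AVX2.  */
static dot_fn
select_dot (void)
{
  __builtin_cpu_init ();  /* Needed if called before constructors run.  */
  if (__builtin_cpu_supports ("fma") && __builtin_cpu_supports ("avx2"))
    return dot_avx2_fma;
  return dot_generic;
}
```

A caller would do `dot_fn f = select_dot ();` once and then call `f (a, b, n)`. Checking only "avx2", as the old code did, would let dot_avx2_fma be chosen on a CPU or VM that hides FMA.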