[RFC] How to add vector math functions to Glibc
Commit Message
2014-09-26 18:15 GMT+04:00 Carlos O'Donell <carlos@redhat.com>:
> On 09/26/2014 09:45 AM, Andrew Senkevich wrote:
>> So let's discuss Glibc build changes.
>> The build of libmvec (and hence the libm.so installation) needs to be
>> architecture dependent and optional, and some changes were already
>> discussed in https://sourceware.org/ml/libc-alpha/2014-09/msg00578.html.
>> Is it OK to additionally have a configure option --enable-mathvec with
>> default=no in general and default=yes for the x86_64 build?
>
> Under what circumstances would a non-x86_64 target build with
> --enable-mathvec?
>
> When they have their own API/ABI standard to implement and provide
> in libmvec.so?
>
> What's wrong with simply producing a libmvec.so that has no public
> symbols?
>
> It simplifies everything to just always ship libmvec.so, even if
> it's empty.
Based on previous discussion, now we have the following changes:
--
WBR,
Andrew
Comments
Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
> diff --git a/configure.ac b/configure.ac
> index 82d0896..c5c1758 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
> # Accept binutils 2.20 or newer.
> AC_CHECK_PROG_VER(AS, $AS, --version,
> [GNU assembler.* \([0-9]*\.[0-9.]*\)],
> - [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=:
> critic_missing="$critic_missing as")
> + [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=:
> critic_missing="$critic_missing as")
What are you trying to do here? That doesn't look correct.
Andreas.
2014-09-30 19:44 GMT+04:00 Andreas Schwab <schwab@suse.de>:
> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
>
>> diff --git a/configure.ac b/configure.ac
>> index 82d0896..c5c1758 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>> # Accept binutils 2.20 or newer.
>> AC_CHECK_PROG_VER(AS, $AS, --version,
>> [GNU assembler.* \([0-9]*\.[0-9.]*\)],
>> - [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=:
>> critic_missing="$critic_missing as")
>> + [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=:
>> critic_missing="$critic_missing as")
>
> What are you trying to do here? That doesn't look correct.
It is an update of the minimum required version of binutils to 2.22.
And of course the comment needs to be updated as well...
--
WBR,
Andrew
Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
> 2014-09-30 19:44 GMT+04:00 Andreas Schwab <schwab@suse.de>:
>> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
>>
>>> diff --git a/configure.ac b/configure.ac
>>> index 82d0896..c5c1758 100644
>>> --- a/configure.ac
>>> +++ b/configure.ac
>>> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>>> # Accept binutils 2.20 or newer.
>>> AC_CHECK_PROG_VER(AS, $AS, --version,
>>> [GNU assembler.* \([0-9]*\.[0-9.]*\)],
>>> - [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=:
>>> critic_missing="$critic_missing as")
>>> + [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=:
>>> critic_missing="$critic_missing as")
>>
>> What are you trying to do here? That doesn't look correct.
>
> It is an update of the minimum required version of binutils to 2.22.
Along with excluding 2.30, 2.31, 2.40, 2.41 ...
Andreas.
2014-09-30 20:16 GMT+04:00 Andreas Schwab <schwab@suse.de>:
> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
>
>> 2014-09-30 19:44 GMT+04:00 Andreas Schwab <schwab@suse.de>:
>>> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
>>>
>>>> diff --git a/configure.ac b/configure.ac
>>>> index 82d0896..c5c1758 100644
>>>> --- a/configure.ac
>>>> +++ b/configure.ac
>>>> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>>>> # Accept binutils 2.20 or newer.
>>>> AC_CHECK_PROG_VER(AS, $AS, --version,
>>>> [GNU assembler.* \([0-9]*\.[0-9.]*\)],
>>>> - [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=:
>>>> critic_missing="$critic_missing as")
>>>> + [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=:
>>>> critic_missing="$critic_missing as")
>>>
>>> What are you trying to do here? That doesn't look correct.
>>
>> It is an update of the minimum required version of binutils to 2.22.
>
> Along with excluding 2.30, 2.31, 2.40, 2.41 ...
Yes, thank you, it needs to be fixed:
- [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=:
+ [2.1[0-9][0-9]*|2.2[2-9]*|2.[3-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=:
And the same check needs to be added for the ld version...
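
The alternatives in these checks are shell "case" glob patterns, where "*" matches any string rather than repeating the preceding bracket expression. A minimal sketch of how the three patterns from this thread behave, assuming Python's fnmatch is a close enough stand-in for the shell's matching on these inputs (the helper name accepts is hypothetical, not part of configure):

```python
from fnmatch import fnmatchcase

# Alternatives copied from the thread; split on "|" as a shell "case" would.
ORIGINAL = "2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*"
FIRST_TRY = "2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*"
CORRECTED = "2.1[0-9][0-9]*|2.2[2-9]*|2.[3-9][0-9]*|[3-9].*|[1-9][0-9]*"


def accepts(alternatives, version):
    """Return True if any glob alternative matches the version string."""
    return any(fnmatchcase(version, pat) for pat in alternatives.split("|"))


if __name__ == "__main__":
    # Columns: version, original pattern, first attempt, corrected pattern.
    for version in ("2.20", "2.21", "2.22", "2.29", "2.30", "2.31", "2.40"):
        print(version,
              accepts(ORIGINAL, version),
              accepts(FIRST_TRY, version),
              accepts(CORRECTED, version))
```

Under that assumption, the first attempt rejects 2.30, 2.31, 2.40 and 2.41 because the second bracket only admits the digits 2 to 9, while the corrected pattern rejects only versions below 2.22.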
--
WBR,
Andrew
On Tue, 30 Sep 2014, Andrew Senkevich wrote:
> diff --git a/configure.ac b/configure.ac
> index 82d0896..c5c1758 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
> # Accept binutils 2.20 or newer.
> AC_CHECK_PROG_VER(AS, $AS, --version,
> [GNU assembler.* \([0-9]*\.[0-9.]*\)],
> - [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=:
> critic_missing="$critic_missing as")
> + [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=:
> critic_missing="$critic_missing as")
> AC_CHECK_PROG_VER(LD, $LD, --version,
> [GNU ld.* \([0-9][0-9]*\.[0-9.]*\)],
> [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], LD=:
> critic_missing="$critic_missing ld")
Any change to required versions needs to include an update to install.texi
(and the generated INSTALL file). It should also be proposed in a
separate thread whose subject describes what is being proposed.
> +# We need to install libm.so as linker script
> +# for more comfortable use of vector math library.
> +subdir_install: $(inst_libdir)/libm.so
> +
> +$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
> + $(common-objpfx)math/libm.so$(libm.so-version) \
> + $(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
> + $(+force)
> + (echo '/* GNU ld script */';\
> + cat $<; \
> + echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
> + 'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
> + ) > $@.new
> + mv -f $@.new $@
Do you have ordering issues here? It seems bad for math/ to install a
direct symlink and then mathvec/ to change it to something else - all
installation rules for libm should be in the math/ directory.
Do you need to link libmvec against libm (and if so, I'd expect associated
Makefile rules, and maybe a Depend file to ensure the directories are
built in the right order)?
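
For reference, the file that the quoted recipe writes can be modelled in a few lines. This is only a sketch of the recipe's output, up to whitespace; the function name is hypothetical, and the example values for format.lds, $(slibdir) and the two version suffixes are assumptions, not taken from the patch:

```python
def libm_linker_script(format_lds, slibdir, libm_version, libmvec_version):
    """Build the text the quoted recipe writes to $(inst_libdir)/libm.so."""
    group = "GROUP ( {d}/libm.so{m} AS_NEEDED ( {d}/libmvec.so{v} ) )".format(
        d=slibdir, m=libm_version, v=libmvec_version)
    # Order mirrors the recipe: comment line, contents of format.lds, GROUP.
    return "/* GNU ld script */\n" + format_lds + group + "\n"


if __name__ == "__main__":
    # Assumed example values for an x86_64 installation.
    print(libm_linker_script("OUTPUT_FORMAT(elf64-x86-64)\n",
                             "/lib64", ".6", ".1"), end="")
```

With AS_NEEDED, a program linked with -lm only records a dependency on libmvec if it actually references a symbol from it, which is why the script is installed in place of the usual libm.so symlink.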
Also, I'm not sure the empty libmvec option for unsupported architectures
works when we consider the case where functions require GCC or binutils
versions newer than we wish to require, so they are optional on some
architecture.
I think having libmvec built or not built on that architecture, depending
on the tools installed, is better than possibly having it built but empty
if the tools are too old.
> diff --git a/sysdeps/unix/sysv/linux/shlib-versions
> b/sysdeps/unix/sysv/linux/shlib-versions
> index 9160557..4a32c8a 100644
> --- a/sysdeps/unix/sysv/linux/shlib-versions
> +++ b/sysdeps/unix/sysv/linux/shlib-versions
> @@ -1,2 +1,3 @@
> libm=6
> libc=6
> +libmvec=1
There is nothing Linux-specific about this library, so the toplevel
shlib-versions seems better.
Did the patch pass the testsuite? If so, you have a problem - you didn't
add ABI test baselines for this library (in this version, a default empty
baseline, and one in sysdeps/unix/sysv/linux/x86_64), so the ABI tests
should have failed, and you need to find out why they didn't run for this
library, and fix that. If it failed for lack of ABI test baselines, add
them.
> +#if defined __x86_64__ && defined __FAST_MATH__
> +# define __DECL_SIMD_AVX2
> +# define __DECL_SIMD_SSE4
I don't see the need for this initial define to empty and subsequent
#undef. Except you should probably have comments explaining exactly what
these macros mean in terms of what function versions they define to be
available.
> +# if defined _OPENMP && _OPENMP >= 201307
> +/* OpenMP case. */
> +# undef __DECL_SIMD_AVX2
> +# undef __DECL_SIMD_SSE4
> +# define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
> +# define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
I think there should be a comment pointing to the ABI/API documentation
that says what function versions this pragma defines to be available and
guaranteeing that it will not be redefined to e.g. say that AVX512 is
available so that existing headers will work with future compilers (but
another pragma will be needed if in future AVX512 versions are added).
> +# elif defined _CILKPLUS && _CILKPLUS >= 0
> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
> +# undef __DECL_SIMD_AVX2
> +# undef __DECL_SIMD_SSE4
> +# define __DECL_SIMD_AVX2 __attribute__((vector (nomask)))
> +# define __DECL_SIMD_SSE4 __attribute__((vector (processor(core_i7_sse4_2), \
> + nomask)))
To be namespace-clean, you have to use reserved-namespace versions of
attributes. That is, __vector__, __nomask__, __processor__ and
__core_i7_sse4_2__.
> + .align 64
> + .globl __gnu_svml_dcos_data
> +__gnu_svml_dcos_data:
> + .long 4294967295
What are the semantics of the values in this table (please add a comment)?
How was this table generated?
> + .type __gnu_svml_dcos_data,@object
> + .size __gnu_svml_dcos_data,1600
.size __gnu_svml_dcos_data,.-__gnu_svml_dcos_data
seems better than hardcoding another magic number for the size here.
Hi all,
just 2cts from someone who wrote a couple of libm functions already in
his life:
Joseph S. Myers wrote on 30/09/2014 18:35:
>> +# if defined _OPENMP && _OPENMP >= 201307
>> +/* OpenMP case. */
>> +# undef __DECL_SIMD_AVX2
>> +# undef __DECL_SIMD_SSE4
>> +# define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
>> +# define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
>
> I think there should be a comment pointing to the ABI/API documentation
> that says what function versions this pragma defines to be available and
> guaranteeing that it will not be redefined to e.g. say that AVX512 is
> available so that existing headers will work with future compilers (but
> another pragma will be needed if in future AVX512 versions are added).
>
Yeah, the ABI/API is not quite self-documenting with functions declared
as follows:
Andrew Senkevich wrote on 30/09/2014 17:00:
> +#include <sysdep.h>
> +
> + .text
> +ENTRY(_ZGVdN4v_cos)
> +
> +/* ALGORITHM DESCRIPTION:
> + *
> + * ( low accuracy ( < 4ulp ) or enhanced performance ( half of correct mantissa ) implementation )
> + *
> + * Argument representation:
> + * arg + Pi/2 = (N*Pi + R)
> + *
> + * Result calculation:
> + * cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
> + * sin(R) is approximated by corresponding polynomial
> + */
> + pushq %rbp
> + movq %rsp, %rbp
> + andq $-64, %rsp
> + subq $448, %rsp
> + movq __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
> + vmovapd %ymm0, %ymm1
> + vmovupd 192(%rax), %ymm4
> + vmovupd 256(%rax), %ymm5
> +
Of course, there are comments in the code about how the algorithm works
but the code mainly is assembly with lots of magic numbers everywhere.
Frankly speaking, I have trouble seeing the difference between that code
and a binary blob. Yes, this last remark is polemic.
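
The reduction in the ALGORITHM DESCRIPTION comment can be written out as a scalar model. This is a sketch under stated assumptions, not the patch's method: math.sin stands in for the polynomial, a single double-precision pi replaces the three-part Pi1/Pi2/Pi3 subtraction, and the function name is hypothetical:

```python
import math


def cos_by_shifted_sine(x):
    """Scalar model of the reduction described in the quoted comment."""
    # N: nearest integer to (x + pi/2) / pi, so that x + pi/2 = N*pi + R.
    n = round((x + math.pi / 2) / math.pi)
    # R = x - (N - 0.5)*pi; the assembly subtracts pi in three pieces.
    r = x - (n - 0.5) * math.pi
    # (-1)^N: the assembly takes this from the lowest bit of the shifted value.
    sign = -1.0 if n % 2 else 1.0
    # math.sin stands in for the polynomial approximation of sin(R).
    return sign * math.sin(r)


if __name__ == "__main__":
    for x in (-10.0, 0.0, 1.0, 7.5, 100.0):
        print(x, cos_by_shifted_sine(x), math.cos(x))
```

The model is only meant for moderate arguments; the assembly branches to a scalar cos call for lanes whose argument exceeds the large-argument threshold.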
>> +# elif defined _CILKPLUS && _CILKPLUS >= 0
>> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
>> +# undef __DECL_SIMD_AVX2
>> +# undef __DECL_SIMD_SSE4
>> +# define __DECL_SIMD_AVX2 __attribute__((vector (nomask)))
>> +# define __DECL_SIMD_SSE4 __attribute__((vector (processor(core_i7_sse4_2), \
>> + nomask)))
>
> To be namespace-clean, you have to use reserved-namespace versions of
> attributes. That is, __vector__, __nomask__, __processor__ and
> __core_i7_sse4_2__.
>
>> + .align 64
>> + .globl __gnu_svml_dcos_data
>> +__gnu_svml_dcos_data:
>> + .long 4294967295
>
> What are the semantics of the values in this table (please add a comment)?
> How was this table generated?
>
Yeah, who codes floating-point values as (little-endian ?) memory
notation in decimal? I would understand hexadecimal but decimal?
As is, the code is unmaintainable.
>> + .type __gnu_svml_dcos_data,@object
>> + .size __gnu_svml_dcos_data,1600
>
> .size __gnu_svml_dcos_data,.-__gnu_svml_dcos_data
>
> seems better than hardcoding another magic number for the size here.
>
Yeah, so in conclusion: is there any technical rationale why a compiler
couldn't produce vectorized libm function suitable for the purpose of
gcc/cilk integration?
Best Regards,
Christoph Lauter
On Tue, 30 Sep 2014, Christoph Lauter wrote:
> Hi all,
>
> just 2cts from someone who wrote a couple of libm functions already in his
> life:
>
> Joseph S. Myers wrote on 30/09/2014 18:35:
>
> > > +# if defined _OPENMP && _OPENMP >= 201307
> > > +/* OpenMP case. */
> > > +# undef __DECL_SIMD_AVX2
> > > +# undef __DECL_SIMD_SSE4
> > > +# define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
> > > +# define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
> >
> > I think there should be a comment pointing to the ABI/API documentation
> > that says what function versions this pragma defines to be available and
> > guaranteeing that it will not be redefined to e.g. say that AVX512 is
> > available so that existing headers will work with future compilers (but
> > another pragma will be needed if in future AVX512 versions are added).
> >
>
> Yeah, the ABI/API is not quite self-documenting with functions declared as
> follows:
What I'm referring to here is somewhat different - it's the ABI/API that
defines the contract between the library and compiler implied by the pragma
(or, in the Cilk Plus case, by the attribute).
That ABI/API will effectively say "this pragma / attribute means that
versions of this function are available for the following vector ISAs"
(and then go on to say what the ABI is for each ISA). It should also say
explicitly that compilers must not interpret the pragma / attribute as
meaning that functions are available for any other vector ISAs and that
new pragmas / attributes will be defined for any new vector ISAs as
needed. That avoids future compilers misinterpreting glibc 2.21's headers
as meaning it provides e.g. AVX512 versions of functions.
This ABI/API should be generically about OpenMP / Cilk Plus on x86_64
processors, rather than specifically about GCC, to establish an
interpretation intended to be shared by any compiler that implements those
features, now or in the future.
(Of course then the patch does actually need to provide all the function
versions implied by the pragma / attribute.)
> > > + .align 64
> > > + .globl __gnu_svml_dcos_data
> > > +__gnu_svml_dcos_data:
> > > + .long 4294967295
> >
> > What are the semantics of the values in this table (please add a comment)?
> > How was this table generated?
> >
>
> Yeah, who codes floating-point values as (little-endian ?) memory notation in
> decimal? I would understand hexadecimal but decimal?
>
> As is, the code is unmaintainable.
And, generally, we want to be able to regenerate any such tables if there
are changes to the algorithms. This means at a minimum having comments
giving the semantics of the table (coefficients of whatever polynomial
approximation to a given function on given intervals, for example), but
preferably source code to generate the table.
>> +# We need to install libm.so as linker script
>> +# for more comfortable use of vector math library.
>> +subdir_install: $(inst_libdir)/libm.so
>> +
>> +$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
>> + $(common-objpfx)math/libm.so$(libm.so-version) \
>> + $(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
>> + $(+force)
>> + (echo '/* GNU ld script */';\
>> + cat $<; \
>> + echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
>> + 'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
>> + ) > $@.new
>> + mv -f $@.new $@
>
> Do you have ordering issues here? It seems bad for math/ to install a
> direct symlink and then mathvec/ to change it to something else - all
> installation rules for libm should be in the math/ directory.
When inserted in math/Makefile, this rule produces a warning about
overriding the recipe for target libm.so (as far as I can see, a rule for
libm.so is already generated from o-iterator.mk).
If I use a temporary target:
+subdir_install: $(inst_libdir)/libm.so.tmp
+$(inst_libdir)/libm.so.tmp: $(common-objpfx)format.lds \
+ $(common-objpfx)math/libm.so$(libm.so-version) \
+ $(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
+ $(+force)
+ (echo '/* GNU ld script */';\
+ cat $<; \
+ echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+ 'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
+ ) > $@
+ mv -f $@ $(inst_libdir)/libm.so
then $(inst_libdir)/libm.so is overwritten later.
So I have a temporary file and need to move it to $(inst_libdir)/libm.so
at the end.
It would be great if someone could give me advice on how to do it.
--
WBR,
Andrew
>> > > + .align 64
>> > > + .globl __gnu_svml_dcos_data
>> > > +__gnu_svml_dcos_data:
>> > > + .long 4294967295
>> >
>> > What are the semantics of the values in this table (please add a comment)?
These tables contain data of several types - polynomial coefficients,
some constants, lookup tables.
>> > How was this table generated?
The values were calculated with Maple, Mathematica and Sollya.
>> Yeah, who codes floating-point values as (little-endian ?) memory notation in
>> decimal? I would understand hexadecimal but decimal?
What are the requirements for data representation? Let's determine how
values will be represented here.
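
To make the representation question concrete, the decimal words can be decoded back into doubles. A minimal sketch, assuming each pair of .long directives is the low word followed by the high word of one little-endian IEEE 754 double (the function name is hypothetical; the pairs are copied from the start of the posted table):

```python
import math
import struct


def double_from_longs(low, high):
    """Combine two 32-bit words (low word first) into one IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]


# (low, high) pairs copied from the start of __gnu_svml_dcos_data.
PAIRS = [
    (4294967295, 2147483647),
    (0, 1096810496),
    (1413754136, 1073291771),
    (1841940611, 1070882608),
    (0, 1127743488),
    (0, 1071644672),
]

if __name__ == "__main__":
    for low, high in PAIRS:
        bits = (high << 32) | low
        value = double_from_longs(low, high)
        # A NaN bit pattern here is a mask, so label it instead of printing it.
        text = "bit mask" if math.isnan(value) else value.hex()
        print("0x%016x  %s" % (bits, text))
```

Under that assumption the first six distinct entries decode to the mask 0x7fffffffffffffff, 2^23, Pi/2, 1/Pi, 1.5*2^52 and 0.5, which appears to line up with the comments at offsets 0, 64, 128, 192, 256 and 320 in svml_d_cos4_core.S. Hexadecimal bit patterns or hex floats with a comment per entry would carry the same information in a reviewable form.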
> And, generally, we want to be able to regenerate any such tables if there
> are changes to the algorithms. This means at a minimum having comments
> giving the semantics of the table (coefficients of whatever polynomial
> approximation to a given function on given intervals, for example), but
> preferably source code to generate the table.
We can add some comments, but regeneration of these tables is not supported.
--
WBR,
Andrew
On Thu, 2 Oct 2014, Andrew Senkevich wrote:
> >> > > + .align 64
> >> > > + .globl __gnu_svml_dcos_data
> >> > > +__gnu_svml_dcos_data:
> >> > > + .long 4294967295
> >> >
> >> > What are the semantics of the values in this table (please add a comment)?
>
> These tables contain data of several types - polynomial coefficients,
> some constants, lookup tables.
That then indicates that each part of the table should have a comment
explaining the exact semantics of the values in that part of the table,
and naming the macro used for the offset of that part of the table from
the start of the table - and where the code refers to parts of the table,
it should use those macros for the offsets instead of hardcoding magic
constants in the relevant instructions. Furthermore, if you define those
macros in a common header, the table can do
.if .-__gnu_svml_dcos_data != MACRO_NAME
.err
.endif
at the start of each section of the table, so avoiding the need for
comments to mention the macro names and making sure the macros are
accurate. Then if someone changes part of the function implementation,
requiring replacing just one section of the table, you don't get silent
breakage from not updating offsets - failing to update the
macros correctly will cause an immediate build failure.
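
One way to combine this with the earlier request for regenerable tables is a small generator that emits both the offset macros and the checked table. This is a sketch only: the macro names are hypothetical, the two values are the first decoded entries of the posted table, and the replication factor of 8 is an assumption read off that table:

```python
import struct

LANES = 8  # each constant is replicated to fill 64 bytes, as in the posted table

# Hypothetical macro names; the values are the first two decoded table entries.
SECTIONS = [
    ("DCOS_ABS_MASK", 0x7FFFFFFFFFFFFFFF),  # raw bit pattern, not a number
    ("DCOS_RANGE_THRESHOLD", 2.0 ** 23),
]


def to_bits(value):
    """Return the 64-bit pattern for a float, or pass an int mask through."""
    if isinstance(value, int):
        return value
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def generate(symbol, sections, lanes=LANES):
    """Return (header_lines, asm_lines) for a table with checked offsets."""
    header, asm, offset = [], [symbol + ":"], 0
    for name, value in sections:
        header.append("#define %s %d" % (name, offset))
        # Assembly-time check that this section starts where the macro says.
        asm += ["\t.if .-%s != %s" % (symbol, name), "\t.err", "\t.endif"]
        asm += ["\t.quad 0x%016x" % to_bits(value)] * lanes
        offset += 8 * lanes
    return header, asm


if __name__ == "__main__":
    header, asm = generate("__gnu_svml_dcos_data", SECTIONS)
    print("\n".join(header + asm))
```

The header lines would go into the common header and the assembly lines into the preprocessed data file, so that the code can use the macro names as displacements instead of literal offsets.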
@@ -4521,7 +4521,7 @@ $as_echo_n "checking version of $AS... " >&6; }
ac_prog_version=`$AS --version 2>&1 | sed -n 's/^.*GNU assembler.* \([0-9]*\.[0-9.]*\).*$/\1/p'`
case $ac_prog_version in
'') ac_prog_version="v. ?.??, bad"; ac_verc_fail=yes;;
- 2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*)
+ 2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*)
ac_prog_version="$ac_prog_version, ok"; ac_verc_fail=no;;
*) ac_prog_version="$ac_prog_version, bad"; ac_verc_fail=yes;;
@@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
# Accept binutils 2.20 or newer.
AC_CHECK_PROG_VER(AS, $AS, --version,
[GNU assembler.* \([0-9]*\.[0-9.]*\)],
- [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
+ [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
AC_CHECK_PROG_VER(LD, $LD, --version,
[GNU ld.* \([0-9][0-9]*\.[0-9.]*\)],
[2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], LD=:
critic_missing="$critic_missing ld")
@@ -1018,7 +1018,7 @@ all-subdirs = csu assert ctype locale intl
catgets math setjmp signal \
stdlib stdio-common libio malloc string wcsmbs time dirent \
grp pwd posix io termios resource misc socket sysvipc gmon \
gnulib iconv iconvdata wctype manual shadow gshadow po argp \
- crypt localedata timezone rt conform debug \
+ crypt localedata timezone rt conform debug mathvec \
$(add-on-subdirs) dlfcn elf
ifndef avoid-generated
new file mode 100644
@@ -0,0 +1,22 @@
+/* Platform-specific SIMD declarations of math functions.
+ Copyright (C) 2014 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+ include <math.h> instead."
+#endif
@@ -26,7 +26,7 @@ headers := math.h bits/mathcalls.h bits/mathinline.h
bits/huge_val.h \
bits/huge_valf.h bits/huge_vall.h bits/inf.h bits/nan.h \
fpu_control.h complex.h bits/cmathcalls.h fenv.h \
bits/fenv.h bits/fenvinline.h bits/mathdef.h tgmath.h \
- bits/math-finite.h
+ bits/math-finite.h bits/math-vector.h
# FPU support code.
aux := setfpucw fpu_control
@@ -60,6 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
__MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
/* Cosine of X. */
+#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
+__DECL_SIMD_cos
+#endif
+#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
+__DECL_SIMD_cosf
+#endif
+#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
+__DECL_SIMD_cosl
+#endif
__MATHCALL (cos,, (_Mdouble_ __x));
/* Sine of X. */
__MATHCALL (sin,, (_Mdouble_ __x));
@@ -27,6 +27,9 @@
__BEGIN_DECLS
+/* Get machine-dependent vector math functions declarations */
+#include <bits/math-vector.h>
+
/* Get machine-dependent HUGE_VAL value (returned on overflow).
On all IEEE754 machines, this is +Infinity. */
#include <bits/huge_val.h>
new file mode 100644
@@ -0,0 +1,45 @@
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Makefile for the vector math library.
+
+subdir := mathvec
+
+include ../Makeconfig
+
+extra-libs := libmvec
+extra-libs-others = $(extra-libs)
+
+libmvec.so-no-z-defs = yes
+libmvec-routines = $(strip $(libmvec-support))
+
+# We need to install libm.so as linker script
+# for more comfortable use of vector math library.
+subdir_install: $(inst_libdir)/libm.so
+
+$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
+ $(common-objpfx)math/libm.so$(libm.so-version) \
+ $(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
+ $(+force)
+ (echo '/* GNU ld script */';\
+ cat $<; \
+ echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+ 'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
+ ) > $@.new
+ mv -f $@.new $@
+
+include ../Rules
b/sysdeps/unix/sysv/linux/shlib-versions
@@ -1,2 +1,3 @@
libm=6
libc=6
+libmvec=1
b/sysdeps/x86/fpu/bits/math-vector.h
new file mode 100644
@@ -0,0 +1,45 @@
+/* Platform-specific SIMD declarations of math functions.
+ Copyright (C) 2014 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+ include <math.h> instead."
+#endif
+
+#if defined __x86_64__ && defined __FAST_MATH__
+# define __DECL_SIMD_AVX2
+# define __DECL_SIMD_SSE4
+
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case. */
+# undef __DECL_SIMD_AVX2
+# undef __DECL_SIMD_SSE4
+# define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
+# define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
+# elif defined _CILKPLUS && _CILKPLUS >= 0
+/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
+# undef __DECL_SIMD_AVX2
+# undef __DECL_SIMD_SSE4
+# define __DECL_SIMD_AVX2 __attribute__((vector (nomask)))
+# define __DECL_SIMD_SSE4 __attribute__((vector (processor(core_i7_sse4_2), \
+ nomask)))
+# endif
+
+# define __DECL_SIMD_cos __DECL_SIMD_AVX2
+# define __DECL_SIMD_cosf __DECL_SIMD_SSE4
+#endif
new file mode 100644
@@ -0,0 +1,3 @@
+ifeq ($(subdir),mathvec)
+libmvec-support += svml_d_cos4_core svml_d_cos_data
+endif
new file mode 100644
@@ -0,0 +1,5 @@
+libmvec {
+ GLIBC_2.21 {
+ _ZGVdN4v_cos;
+ }
+}
b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
new file mode 100644
@@ -0,0 +1,185 @@
+/* Function cos vectorized with AVX2.
+ Copyright (C) 2014 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+ .text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *
+ * ( low accuracy ( < 4ulp ) or enhanced performance ( half of correct mantissa ) implementation )
+ *
+ * Argument representation:
+ * arg + Pi/2 = (N*Pi + R)
+ *
+ * Result calculation:
+ * cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ * sin(R) is approximated by corresponding polynomial
+ */
+ pushq %rbp
+ movq %rsp, %rbp
+ andq $-64, %rsp
+ subq $448, %rsp
+ movq __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
+ vmovapd %ymm0, %ymm1
+ vmovupd 192(%rax), %ymm4
+ vmovupd 256(%rax), %ymm5
+
+/* ARGUMENT RANGE REDUCTION:
+ * Add Pi/2 to argument: X' = X+Pi/2
+ */
+ vaddpd 128(%rax), %ymm1, %ymm7
+
+/* Get absolute argument value: X' = |X'| */
+ vandpd (%rax), %ymm7, %ymm2
+
+/* Y = X'*InvPi + RS : right shifter add */
+ vfmadd213pd %ymm5, %ymm4, %ymm7
+ vmovupd 1216(%rax), %ymm4
+
+/* Check for large arguments path */
+ vcmpnle_uqpd 64(%rax), %ymm2, %ymm3
+
+/* N = Y - RS : right shifter sub */
+ vsubpd %ymm5, %ymm7, %ymm6
+ vmovupd 640(%rax), %ymm2
+
+/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
+ vpsllq $63, %ymm7, %ymm7
+
+/* N = N - 0.5 */
+ vsubpd 320(%rax), %ymm6, %ymm0
+ vmovmskpd %ymm3, %ecx
+
+/* R = X - N*Pi1 */
+ vmovapd %ymm1, %ymm3
+ vfnmadd231pd %ymm0, %ymm2, %ymm3
+
+/* R = R - N*Pi2 */
+ vfnmadd231pd 704(%rax), %ymm0, %ymm3
+
+/* R = R - N*Pi3 */
+ vfnmadd132pd 768(%rax), %ymm3, %ymm0
+
+/* POLYNOMIAL APPROXIMATION:
+ * R2 = R*R
+ */
+ vmulpd %ymm0, %ymm0, %ymm5
+ vfmadd213pd 1152(%rax), %ymm5, %ymm4
+ vfmadd213pd 1088(%rax), %ymm5, %ymm4
+ vfmadd213pd 1024(%rax), %ymm5, %ymm4
+
+/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
+ vfmadd213pd 960(%rax), %ymm5, %ymm4
+ vfmadd213pd 896(%rax), %ymm5, %ymm4
+ vfmadd213pd 832(%rax), %ymm5, %ymm4
+ vmulpd %ymm5, %ymm4, %ymm6
+ vfmadd213pd %ymm0, %ymm0, %ymm6
+
+/* RECONSTRUCTION:
+ * Final sign setting: Res = Poly^SignRes
+ */
+ vxorpd %ymm7, %ymm6, %ymm0
+ testl %ecx, %ecx
+ jne _LBL_1_3
+
+_LBL_1_2:
+ movq %rbp, %rsp
+ popq %rbp
+ ret
+
+_LBL_1_3:
+ vmovupd %ymm1, 320(%rsp)
+ vmovupd %ymm0, 384(%rsp)
+ je _LBL_1_2
+
+ xorb %dl, %dl
+ xorl %eax, %eax
+ vmovups %ymm8, 224(%rsp)
+ vmovups %ymm9, 192(%rsp)
+ vmovups %ymm10, 160(%rsp)
+ vmovups %ymm11, 128(%rsp)
+ vmovups %ymm12, 96(%rsp)
+ vmovups %ymm13, 64(%rsp)
+ vmovups %ymm14, 32(%rsp)
+ vmovups %ymm15, (%rsp)
+ movq %rsi, 264(%rsp)
+ movq %rdi, 256(%rsp)
+ movq %r12, 296(%rsp)
+ movb %dl, %r12b
+ movq %r13, 288(%rsp)
+ movl %ecx, %r13d
+ movq %r14, 280(%rsp)
+ movl %eax, %r14d
+ movq %r15, 272(%rsp)
+
+_LBL_1_6:
+ btl %r14d, %r13d
+ jc _LBL_1_12
+
+_LBL_1_7:
+ lea 1(%r14), %esi
+ btl %esi, %r13d
+ jc _LBL_1_10
+
+_LBL_1_8:
+ incb %r12b
+ addl $2, %r14d
+ cmpb $16, %r12b
+ jb _LBL_1_6
+
+ vmovups 224(%rsp), %ymm8
+ vmovups 192(%rsp), %ymm9
+ vmovups 160(%rsp), %ymm10
+ vmovups 128(%rsp), %ymm11
+ vmovups 96(%rsp), %ymm12
+ vmovups 64(%rsp), %ymm13
+ vmovups 32(%rsp), %ymm14
+ vmovups (%rsp), %ymm15
+ vmovupd 384(%rsp), %ymm0
+ movq 264(%rsp), %rsi
+ movq 256(%rsp), %rdi
+ movq 296(%rsp), %r12
+ movq 288(%rsp), %r13
+ movq 280(%rsp), %r14
+ movq 272(%rsp), %r15
+ jmp _LBL_1_2
+
+_LBL_1_10:
+ movzbl %r12b, %r15d
+ shlq $4, %r15
+ vmovsd 328(%rsp,%r15), %xmm0
+ vzeroupper
+
+ call cos@PLT
+
+ vmovsd %xmm0, 392(%rsp,%r15)
+ jmp _LBL_1_8
+
+_LBL_1_12:
+ movzbl %r12b, %r15d
+ shlq $4, %r15
+ vmovsd 320(%rsp,%r15), %xmm0
+ vzeroupper
+
+ call cos@PLT
+
+ vmovsd %xmm0, 384(%rsp,%r15)
+ jmp _LBL_1_7
+END(_ZGVdN4v_cos)
b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
@@ -0,0 +1,426 @@
+/* Data for vectorized cos.
+ Copyright (C) 2014 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+ .section .rodata, "a"
+
+ .align 64
+ .globl __gnu_svml_dcos_data
+__gnu_svml_dcos_data:
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .type __gnu_svml_dcos_data,@object
+ .size __gnu_svml_dcos_data,1600
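
The table above is easier to review once decoded. Each pair of .long values is the low and then the high 32-bit word of one little-endian IEEE 754 double, and each value is repeated 8 times, so every constant fills one 64-byte row (25 rows, 1600 bytes, which matches the .size directive). The following is a minimal sketch for decoding a pair. The helper name is hypothetical, and the meanings given in the comments are inferred from the bit patterns, not taken from names in the patch.

```python
import math
import struct

def words_to_double(lo, hi):
    """Combine two 32-bit words (low word first, as laid out in the
    table on a little-endian target) into the double they encode."""
    return struct.unpack("<d", struct.pack("<II", lo, hi))[0]

def words_to_bits(lo, hi):
    """Return the raw 64-bit pattern, useful for rows that are bit masks."""
    return (hi << 32) | lo

# (low, high) pairs copied from the table, with the inferred meaning.
SAMPLES = [
    ((0, 1096810496), "2**23"),
    ((1413754136, 1073291771), "pi/2"),
    ((1841940611, 1070882608), "approximately 1/pi"),
    ((0, 1127743488), "1.5 * 2**52"),
    ((0, 1071644672), "0.5"),
    ((1413754136, 1074340347), "pi"),
    ((0, 1127219200), "2**52"),
]

if __name__ == "__main__":
    for (lo, hi), meaning in SAMPLES:
        print(f"{lo:>10} {hi:>10}  {words_to_double(lo, hi)!r}  ({meaning})")
    # The first row is not a useful floating-point value: it is a mask with
    # every bit set except the sign bit, as would be used to clear the sign.
    print(hex(words_to_bits(4294967295, 2147483647)))
```

Emitting the table as .quad values in hex, or with a short comment per row, would make the constants reviewable without a decoder like this.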