[RFC] How to add vector math functions to Glibc

Message ID CAMXFM3uGOKqEAvGYew+9K7bmhObVmnP2u4kUOSh8_Cpwyk8s5g@mail.gmail.com
State New, archived
Headers

Commit Message

Andrew Senkevich Oct. 16, 2014, 4:37 p.m. UTC
  Hi, Joseph,

here is the patch with test suite changes with fixes based on your
previous comments.

>> @@ -6258,7 +6274,11 @@ static const struct test_f_f_data cos_test_data[] =
>>  static void
>>  cos_test (void)
>>  {
>> +#ifndef TEST_MATHVEC
>>    ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
>> +#else
>> +  TN_RM_TEST (vector_cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
>> +#endif
>>  }
>
> And I don't think we want conditionals like this for every function -
> indeed, the tests shouldn't need to know which functions have vector
> versions at all.

Do you mean to use the same *_test function for testing vector
(through wrapper)?
I have such scheme now but it requires to add macros named as standard
function and it also caused changes in START macros.

>> +CFLAGS-test-vec-double.c = -fno-inline -ffloat-store -fno-builtin
>> -frounding-math -mavx2 -Wno-unused-function
>
> And since you can't determine at configure
> time what host the tests might run on, instruction set features such as
> AVX need testing for at runtime (this means building separate source files
> for the test with separate options so that you know the compiler won't
> generate AVX code before you've tested for AVX availability).

Because of vector tests grouped by ISA we have different test driver
names containing vector length (test-double-vlen4.c for AVX2).
Scalar wrappers (called from test driver) will be in separate files
(test-double-vlen4-wrapper.c) and will be built with
architecture-specific options specified in sysdeps Makefile.
For runtime check we need to insert condition before wrapper start so
with help of new macros added in *_test function that condition could
be defined in test driver.
Now I have built test-double-vlen4 manually and if this way is ok I
will prepare that sysdeps Makefile.

+#include "libm-test.c"


--
WBR,
Andrew
  

Comments

Joseph Myers Oct. 16, 2014, 9:51 p.m. UTC | #1
On Thu, 16 Oct 2014, Andrew Senkevich wrote:

> >> @@ -6258,7 +6274,11 @@ static const struct test_f_f_data cos_test_data[] =
> >>  static void
> >>  cos_test (void)
> >>  {
> >> +#ifndef TEST_MATHVEC
> >>    ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
> >> +#else
> >> +  TN_RM_TEST (vector_cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
> >> +#endif
> >>  }
> >
> > And I don't think we want conditionals like this for every function -
> > indeed, the tests shouldn't need to know which functions have vector
> > versions at all.
> 
> Do you mean to use the same *_test function for testing vector
> (through wrapper)?

Yes.  I don't have a full design, but the principle is to change how the 
macros for running tests expand (or what functions they call do) 
conditional on what is being tested, so that none of the conditionals are 
at the level of individual functions if it can be avoided.  And I don't 
think you should need to change calls to START, just the expansion.

> Because of vector tests grouped by ISA we have different test driver
> names containing vector length (test-double-vlen4.c for AVX2).
> Scalar wrappers (called from test driver) will be in separate files
> (test-double-vlen4-wrapper.c) and will be built with
> architecture-specific options specified in sysdeps Makefile.
> For runtime check we need to insert condition before wrapper start so
> with help of new macros added in *_test function that condition could
> be defined in test driver.

I'd think that the check for AVX2 etc. availability could run once in 
main, rather than in the tests of individual functions.

> @@ -6247,7 +6248,7 @@ copysign_test (void)
> 
> 
>  static const struct test_f_f_data cos_test_data[] =
> -  {
> +  {

This looks like a bogus diff hunk.

> +  /* Vector trigonometric functions:  */
> +#ifdef TEST_MATHVEC
> +
> +  cos_test ();
> +
> +#else

There shouldn't be such conditionals.  It should be arranged that if 
there isn't a relevant vector version of a particular function, running 
vector tests for that function does nothing - so there are no conditionals 
on which *_test functions to run, and none inside those functions, just 
conditionals affecting what the test macros do (by means of conditionals 
inside them such as if (HAVE_VECTOR_cos_double_vlen4), for example, 
resulting from appropriate concatenations).

> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps
> b/sysdeps/x86_64/fpu/libm-test-ulps
> index 36e1b76..0e11cd5 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -905,6 +905,12 @@ idouble: 1
>  ildouble: 2
>  ldouble: 2
> 
> +
> +Function: "vlen4_cos":
> +double: 1
> +
>  Function: "cosh":
>  double: 1
>  float: 1

This looks odd.  There shouldn't be the double blank line, and entries 
should be sorted alphabetically - this file should be updated by "make 
regen-ulps", and you need to ensure that regen-ulps does include the ulps 
for the tests of the vector functions.
  
Andrew Senkevich Oct. 21, 2014, 1:19 p.m. UTC | #2
2014-10-17 1:51 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:

>> +  /* Vector trigonometric functions:  */
>> +#ifdef TEST_MATHVEC
>> +
>> +  cos_test ();
>> +
>> +#else
>
> There shouldn't be such conditionals.  It should be arranged that if
> there isn't a relevant vector version of a particular function, running
> vector tests for that function does nothing - so there are no conditionals
> on which *_test functions to run, and none inside those functions, just
> conditionals affecting what the test macros do (by means of conditionals
> inside them such as if (HAVE_VECTOR_cos_double_vlen4), for example,
> resulting from appropriate concatenations).

With HAVE_VECTOR_cos_double_vlen4 we need to have such macros with
zero for all set of not vector functions which is huge.
May be more suitable way is to have determined name of vector function
wrapper and selection based on function name?
I mean to have something like this in test driver test-double-vlen4.c:

#define HAVE_VECTOR 1
#define VEC_PREFIX_STR "VECTOR_LEN_"
#define cos VECTOR_LEN_4_cos
#include "libm-test.c"

in test-double-vlen4-wrapper.c:

VECTOR_WRAPPER(VECTOR_LEN_4_cos,_ZGVdN4v_cos)

and in libm-test.inc:

#ifndef HAVE_VECTOR
# define HAVE_VECTOR 0
#endif

#ifndef VEC_PREFIX_STR
# define VEC_PREFIX_STR ""
#endif

static const char *vec_prefix = VEC_PREFIX_STR;

static int is_vector_name(const char *this_func)
{
  if (strncmp(this_func, vec_prefix, strlen(vec_prefix))==0)
    return 1;
  return 0;
}

#define STR_CON(x,y) __STRING(x##y)

/* Start and end the tests for a given function.  */
#define START(FUNC, SUFF, EXACT) \
  const char *this_func = STR_CON (FUNC, SUFF); \
  if (HAVE_VECTOR && !is_vector_name(this_func)) return; \
  init_max_error (this_func, EXACT)

Is this way ok?


--
WBR,
Andrew
  
Joseph Myers Oct. 21, 2014, 3:29 p.m. UTC | #3
On Tue, 21 Oct 2014, Andrew Senkevich wrote:

> > There shouldn't be such conditionals.  It should be arranged that if
> > there isn't a relevant vector version of a particular function, running
> > vector tests for that function does nothing - so there are no conditionals
> > on which *_test functions to run, and none inside those functions, just
> > conditionals affecting what the test macros do (by means of conditionals
> > inside them such as if (HAVE_VECTOR_cos_double_vlen4), for example,
> > resulting from appropriate concatenations).
> 
> With HAVE_VECTOR_cos_double_vlen4 we need to have such macros with
> zero for all set of not vector functions which is huge.

But I'd hope such macros could be generated by gen-libm-test.pl (or some 
such script, anyway) rather than needing lots of repetitive definitions to 
be maintained by hand and checked in.

Essentially:

* The architecture-specific headers (installed headers, or possibly 
non-installed ones used only by the testsuite in some cases) contain the 
information about what vector versions of what functions are available.  
Things are designed so that they only need to contain definitions where 
vector functions are available, not where they aren't (to avoid needing to 
repeat slightly different huge lists for each architecture).

* Where a default definition to 0 is needed in any cases, the relevant 
definitions are generated automatically.  (Indeed, this might make sense 
for a header included by bits/mathcalls.h, so that __MATHCALL can expand 
to include the right __DECL_SIMD_*, which might end up empty, rather than 
needing lots of #if conditionals before every function declaration there.)

(Incidentally, there have been so many different patch fragments posted in 
this discussion that it's hard to follow what you're proposing, if e.g. in 
the discussion of testing it's relevant to look at what you're proposing 
for installed headers.  I think it would help if you had a git branch with 
the current set of proposed changes, that you frequently rebase so it 
always shows what you currently propose.)

> May be more suitable way is to have determined name of vector function
> wrapper and selection based on function name?
> I mean to have something like this in test driver test-double-vlen4.c:
> 
> #define HAVE_VECTOR 1
> #define VEC_PREFIX_STR "VECTOR_LEN_"
> #define cos VECTOR_LEN_4_cos
> #include "libm-test.c"
> 
> in test-double-vlen4-wrapper.c:
> 
> VECTOR_WRAPPER(VECTOR_LEN_4_cos,_ZGVdN4v_cos)

If you did something like that, I think it would still be desirable to 
have some form of automatic generation of a list of defines, one per 
function and conditional as needed on whether the relevant vector version 
of that function exists.
  
Andrew Senkevich Oct. 23, 2014, 7:22 p.m. UTC | #4
Hi, Joseph,

attach contains current situation in my branch. I have generated
additional header with series of definitions, now that header included
in libm-test.inc but also only generation can be left in some script.
Currently information from math.h is used for selection what test to
run in vector case, but it required according changes in
math-vector.h. Added tests for cos and cosf (float with no vector
function body, just a stub now).
Test suite passed with no fails on math tests on non AVX2 target, on
AVX2 target also (but math/test-float-vlen8 must fail so it is
strange, will look).
Let me know if such changes ok in general.


--
WBR,
Andrew
  
Joseph Myers Oct. 23, 2014, 9:37 p.m. UTC | #5
On Thu, 23 Oct 2014, Andrew Senkevich wrote:

> Let me know if such changes ok in general.

I'm not clear we yet reached consensus on whether glibc is the right place 
for this; I think that discussion tailed off without a clear conclusion, 
and someone needs to reread it, post a careful analysis of the discussion 
so far and try to help the community reach consensus.

Regarding the specific patch:

> +	      [Enable building and installing mathvec @<:@default=yes on x86_64 build, else default=no@:>@])],

I don't think the help text in an architecture-independent file should 
refer to specific architectures like this; just say "default depends on 
architecture" or similar.

> +ifeq ($(build-mathvect),yes)
> +# We need to install libm.so as linker script
> +# for more comfortable use of vector math library.
> +subdir_install: $(inst_libdir)/libm.so.tmp
> +$(inst_libdir)/libm.so.tmp: $(common-objpfx)format.lds \
> +	$(common-objpfx)math/libm.so$(libm.so-version) \
> +	$(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
> +	$(+force)
> +	(echo '/* GNU ld script */';\
> +	cat $<; \
> +	echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
> +	'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
> +	) > $@
> +	mv -f $@ $(inst_libdir)/libm.so # TODO do it somehow after all other
> +endif

Clearly it's necessary to resolve how to disable the normal installation 
rule for libm.so so it can be cleanly replaced by this new one.

> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 8a94a7e..2d31a11 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -60,6 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
>  __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
>  
>  /* Cosine of X.  */
> +#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
> +__DECL_SIMD_cos
> +#endif
> +#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
> +__DECL_SIMD_cosf
> +#endif
> +#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
> +__DECL_SIMD_cosl
> +#endif
>  __MATHCALL (cos,, (_Mdouble_ __x));

As previously noted, I think it would be much better if the definition of 
__MATHCALL can include all the conditional bits (possibly through a 
generated header that defines __DECL_SIMD_cos etc. to empty if not defined 
by bits/math-vector.h).

> diff --git a/math/have_vector.h b/math/have_vector.h
> new file mode 100644
> index 0000000..94aacf0
> --- /dev/null
> +++ b/math/have_vector.h
> @@ -0,0 +1,2574 @@
> +/* 
> +Definitions below are generated with the following bash script:
> +for func in $(grep ALL_RM_TEST math/libm-test.inc | awk {'print $2'} | sed -e "s/(//" -e "s/,//"); do 

Rather than having such a file checked in, makefile rules / scripts to 
generate it at test time should be checked in.

> +static int avx2_usable;		/* Set to 1 if AVX2 supported */

Given that we expect multiple architectures to have vector functions, 
this belongs in some architecture-specific file that libm-test.inc 
includes, rather than directly in libm-test.inc (which shouldn't refer 
directly to AVX at all).

> -#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
> -		     EXCEPTIONS)					\
> -  do									\
> -    if (enable_test (EXCEPTIONS))					\
> -      {									\
> -	COMMON_TEST_SETUP (ARG_STR);					\
> -	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
> -		     EXCEPTIONS);					\
> -	COMMON_TEST_CLEANUP;						\
> -      }									\
> +#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,				\
> +		     EXCEPTIONS)						\
> +  do										\
> +    if (enable_test (EXCEPTIONS))						\
> +      {										\
> +	COMMON_TEST_SETUP (ARG_STR);						\
> +	check_float (test_name,							\
> +		     CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )),	\
> +			     FUNC (FUNC_NAME)) (ARG),				\
> +			     EXPECTED,						\
> +		     EXCEPTIONS);						\
> +	COMMON_TEST_CLEANUP;							\
> +      }										\

I think it would be better for FUNC to be defined, in the test file that 
includes libm-test.inc, in a way that avoids the need for the CONCAT* 
calls here.  (To avoid warnings / errors about undeclared functions, I 
suppose the generated header might then need to redefine e.g. vec_sin to 
sin if there isn't a vector version of sin.)

> +#if defined __x86_64__ && defined __FAST_MATH__
> +# if defined _OPENMP && _OPENMP >= 201307
> +/* OpenMP case. */
> +#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
> +#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")

Of course we still need the API/ABI documentation providing the stable 
guarantee about exactly what this pragma means regarding the function 
versions it is saying are available in glibc.

> +#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
> +#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
> +# elif defined _CILKPLUS && _CILKPLUS >= 0
> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
> +#  define __DECL_SIMD_AVX2 __attribute__((__vector__(nomask)))
> +#  define __DECL_SIMD_SSE4 __attribute__((__vector__(processor(core_i7_sse4_2),\
> +						     nomask)))

And as previously noted, this needs to be fixed to be namespace-clean - 
using __nomask__, __processor__, __core_i7_sse4_2__.

> +#if defined TEST_MATHVEC

No, you can't have such conditionals on a macro in the user's namespace in 
an installed header.

> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
> index c9f9a51..91c4cdf 100644
> --- a/sysdeps/x86_64/configure.ac
> +++ b/sysdeps/x86_64/configure.ac
> @@ -5,6 +5,24 @@ AC_CHECK_HEADER([cpuid.h], ,
>    [AC_MSG_ERROR([gcc must provide the <cpuid.h> header])],
>    [/* No default includes.  */])
>  
> +dnl Check if compiler target is x86_64.

Not needed.  preconfigure fragments in sysdeps directories need to check 
the architecture, but configure ones don't (they'll only be run for the 
relevant architecture, unless one fragment explicitly sources another).

> diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
> new file mode 100644
> index 0000000..d585fa0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/Makefile
> @@ -0,0 +1,33 @@
> +ifeq ($(subdir),mathvec)
> +libmvec-support += svml_d_cos4_core svml_d_cos_data
> +endif
> +
> +# Rules for libmvec tests
> +ifeq ($(subdir),math)
> +ifneq ($(PERL),no)
> +ifeq ($(build-mathvec),yes)
> +libm-tests += test-double-vlen4 test-float-vlen8
> +
> +CFLAGS-test-double-vlen4-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
> +				     -frounding-math -mavx2
> +CFLAGS-test-float-vlen8-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
> +				    -frounding-math -mavx2

I think the sysdeps makefile should actually just define that double-vlen4 
and float-vlen8 are the vector lengths for which testing should take 
place, with all the other testing rules being arranged in an 
architecture-independent way.

> +/* General constants:
> + * lAbsMask
> + */
> +	.long	0xffffffff
> +	.long	0x7fffffff

My previous point from 
<https://sourceware.org/ml/libc-alpha/2014-10/msg00324.html> still applies 
about how to make these tables more readable (one line per "double" 
constant, more explicitly say what the constants are) and ensure that the 
offsets in the tables are directly linked to the offsets used in the 
function implementation.

> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
> new file mode 100644
> index 0000000..0778e23
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c

This file may well need to be architecture-specific, at least as written, 
but ...

> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
> new file mode 100644
> index 0000000..4d3d9a3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4.c

 ... it's not at all clear that this one should need to be.  At present it 
has some architecture-specific bits

> +#define CHECK_ARCH_EXT if (!avx2_usable) return;
> +
> +extern FLOAT WRAPPER_NAME (cos) (FLOAT);

but I'd think those are all that needs to go somewhere 
architecture-specific and the rest is pretty generic to any architecture 
supporting vector functions for vectors of 4 doubles.
  
Andrew Senkevich Oct. 27, 2014, 2 p.m. UTC | #6
2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
> On Thu, 23 Oct 2014, Andrew Senkevich wrote:
>
>> Let me know if such changes ok in general.
>
> I'm not clear we yet reached consensus on whether glibc is the right place
> for this; I think that discussion tailed off without a clear conclusion,
> and someone needs to reread it, post a careful analysis of the discussion
> so far and try to help the community reach consensus.

It was already decided and written in Consensus paragraph on wiki in
https://sourceware.org/ml/libc-alpha/2014-09/msg00596.html.
Link to wiki - https://sourceware.org/glibc/wiki/libm#Consensus

>> +#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
>> +#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
>> +# elif defined _CILKPLUS && _CILKPLUS >= 0
>> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
>> +#  define __DECL_SIMD_AVX2 __attribute__((__vector__(nomask)))
>> +#  define __DECL_SIMD_SSE4 __attribute__((__vector__(processor(core_i7_sse4_2),\
>> +                                                  nomask)))
>
> And as previously noted, this needs to be fixed to be namespace-clean -
> using __nomask__, __processor__, __core_i7_sse4_2__.

It seems there are no such reserved-namespace word versions now...

>> +#if defined TEST_MATHVEC
>
> No, you can't have such conditionals on a macro in the user's namespace in
> an installed header.

Then we have to build vector tests with -D__FAST_MATH__
-DTEST_FAST_MATH -D_OPENMP=201307 to be sure we have needed
definitions from math.h?

>> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
>> index c9f9a51..91c4cdf 100644
>> --- a/sysdeps/x86_64/configure.ac
>> +++ b/sysdeps/x86_64/configure.ac
>> @@ -5,6 +5,24 @@ AC_CHECK_HEADER([cpuid.h], ,
>>    [AC_MSG_ERROR([gcc must provide the <cpuid.h> header])],
>>    [/* No default includes.  */])
>>
>> +dnl Check if compiler target is x86_64.
>
> Not needed.  preconfigure fragments in sysdeps directories need to check
> the architecture, but configure ones don't (they'll only be run for the
> relevant architecture, unless one fragment explicitly sources another).

Clear, then it can be done in root configure like so:
+if test x"$build_mathvec" = xnotset; then
+  if test x"$machine" = xx86_64/64; then
+    build_mathvec=yes
+  else
+    build_mathvec=no
+  fi
+fi
+LIBC_CONFIG_VAR([build-mathvec], [$build_mathvec])


--
WBR,
Andrew
  
Joseph Myers Oct. 27, 2014, 2:39 p.m. UTC | #7
On Mon, 27 Oct 2014, Andrew Senkevich wrote:

> >> +#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
> >> +#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
> >> +# elif defined _CILKPLUS && _CILKPLUS >= 0
> >> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
> >> +#  define __DECL_SIMD_AVX2 __attribute__((__vector__(nomask)))
> >> +#  define __DECL_SIMD_SSE4 __attribute__((__vector__(processor(core_i7_sse4_2),\
> >> +                                                  nomask)))
> >
> > And as previously noted, this needs to be fixed to be namespace-clean -
> > using __nomask__, __processor__, __core_i7_sse4_2__.
> 
> It seems there are no such reserved-namespace word versions now...

Then fix the compiler to have such reserved-namespace versions and put 
appropriate conditionals on a fixed compiler version in the header.  It's 
not OK to put random identifiers into an installed header like that.

> >> +#if defined TEST_MATHVEC
> >
> > No, you can't have such conditionals on a macro in the user's namespace in
> > an installed header.
> 
> Then we have to build vector tests with -D__FAST_MATH__
> -DTEST_FAST_MATH -D_OPENMP=201307 to be sure we have needed
> definitions from math.h?

Yes, -D__FAST_MATH__ is used for some other libm tests.

> >> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
> >> index c9f9a51..91c4cdf 100644
> >> --- a/sysdeps/x86_64/configure.ac
> >> +++ b/sysdeps/x86_64/configure.ac
> >> @@ -5,6 +5,24 @@ AC_CHECK_HEADER([cpuid.h], ,
> >>    [AC_MSG_ERROR([gcc must provide the <cpuid.h> header])],
> >>    [/* No default includes.  */])
> >>
> >> +dnl Check if compiler target is x86_64.
> >
> > Not needed.  preconfigure fragments in sysdeps directories need to check
> > the architecture, but configure ones don't (they'll only be run for the
> > relevant architecture, unless one fragment explicitly sources another).
> 
> Clear, then it can be done in root configure like so:
> +if test x"$build_mathvec" = xnotset; then
> +  if test x"$machine" = xx86_64/64; then

No.  Such conditionals on particular systems do not go in the toplevel 
configure script.

If you want something for x86_64 (both -m64 and -mx32), it can go in 
sysdeps/x86_64/configure.ac, without any machine conditionals.  And if 
there's a good reason (please state the reason if so) something won't work 
for x32, put it in sysdeps/x86_64/64/configure.ac, again with no machine 
conditionals (and in that case, the implementation files would also go in 
sysdeps/x86_64/64/ directories).
  
Andrew Senkevich Oct. 29, 2014, 12:59 p.m. UTC | #8
2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:

>> +static int avx2_usable;              /* Set to 1 if AVX2 supported */
>
> Given that we expect multiple architectures to have vector functions,
> this belongs in some architecture-specific file that libm-test.inc
> includes, rather than directly in libm-test.inc (which shouldn't refer
> directly to AVX at all).

>which shouldn't refer directly to AVX at all
Do you mean to place avx2_usable initialization in procedure in
architecture-specific *.c file and have generic stub, call it from
test main() and change build accordingly?
May be simply stay __cpu_features.feature[index_AVX2_Usable] &
bit_AVX2_Usable in every test function inserted through macros? It
don't require so big changes and don't affect performance
significantly. Or insert initialization in test's main() through
macros also.

>> -#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,                      \
>> -                  EXCEPTIONS)                                        \
>> -  do                                                                 \
>> -    if (enable_test (EXCEPTIONS))                                    \
>> -      {                                                                      \
>> -     COMMON_TEST_SETUP (ARG_STR);                                    \
>> -     check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,       \
>> -                  EXCEPTIONS);                                       \
>> -     COMMON_TEST_CLEANUP;                                            \
>> -      }                                                                      \
>> +#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,                              \
>> +                  EXCEPTIONS)                                                \
>> +  do                                                                         \
>> +    if (enable_test (EXCEPTIONS))                                            \
>> +      {                                                                              \
>> +     COMMON_TEST_SETUP (ARG_STR);                                            \
>> +     check_float (test_name,                                                 \
>> +                  CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )),      \
>> +                          FUNC (FUNC_NAME)) (ARG),                           \
>> +                          EXPECTED,                                          \
>> +                  EXCEPTIONS);                                               \
>> +     COMMON_TEST_CLEANUP;                                                    \
>> +      }                                                                              \
>
> I think it would be better for FUNC to be defined, in the test file that
> includes libm-test.inc, in a way that avoids the need for the CONCAT*
> calls here.  (To avoid warnings / errors about undeclared functions, I
> suppose the generated header might then need to redefine e.g. vec_sin to
> sin if there isn't a vector version of sin.)

Not good idea to change FUNC definition since it used in libm-test.c
not only in test macros (so it may cause usage of vector function with
not vector parameter). But it is possible to reduce number of
concatenation if change generated definitions in way your have
proposed.

Not all functions tested trough ALL_RM_TEST - cexp, tgamma and jn
tested in not all rounding modes, so definitions for them we have to
generate in script manually.

>> diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
>> new file mode 100644
>> index 0000000..d585fa0
>> --- /dev/null
>> +++ b/sysdeps/x86_64/fpu/Makefile
>> @@ -0,0 +1,33 @@
>> +ifeq ($(subdir),mathvec)
>> +libmvec-support += svml_d_cos4_core svml_d_cos_data
>> +endif
>> +
>> +# Rules for libmvec tests
>> +ifeq ($(subdir),math)
>> +ifneq ($(PERL),no)
>> +ifeq ($(build-mathvec),yes)
>> +libm-tests += test-double-vlen4 test-float-vlen8
>> +
>> +CFLAGS-test-double-vlen4-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
>> +                                  -frounding-math -mavx2
>> +CFLAGS-test-float-vlen8-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
>> +                                 -frounding-math -mavx2
>
> I think the sysdeps makefile should actually just define that double-vlen4
> and float-vlen8 are the vector lengths for which testing should take
> place, with all the other testing rules being arranged in an
> architecture-independent way.

Do you mean to stay in sysdeps/x86_64/fpu/Makefile only CFLAGS-*
definitions or to setup some variable which will be used in common
makefile for build vector tests?

>> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
>> new file mode 100644
>> index 0000000..0778e23
>> --- /dev/null
>> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
>
> This file may well need to be architecture-specific, at least as written,
> but ...
>
>> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
>> new file mode 100644
>> index 0000000..4d3d9a3
>> --- /dev/null
>> +++ b/sysdeps/x86_64/fpu/test-double-vlen4.c
>
>  ... it's not at all clear that this one should need to be.  At present it
> has some architecture-specific bits
>
>> +#define CHECK_ARCH_EXT if (!avx2_usable) return;
>> +
>> +extern FLOAT WRAPPER_NAME (cos) (FLOAT);
>
> but I'd think those are all that needs to go somewhere
> architecture-specific and the rest is pretty generic to any architecture
> supporting vector functions for vectors of 4 doubles.

Then lets have math/test-double-vlen4.h with common definitions and
sysdeps/x86_64/fpu/test-double-vlen4.c containing wrapper.
Attached patch contains only discussed here changes.


--
WBR,
Andrew
  
Joseph Myers Oct. 29, 2014, 6:50 p.m. UTC | #9
On Wed, 29 Oct 2014, Andrew Senkevich wrote:

> 2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
> 
> >> +static int avx2_usable;              /* Set to 1 if AVX2 supported */
> >
> > Given that we expect multiple architectures to have vector functions,
> > this belongs in some architecture-specific file that libm-test.inc
> > includes, rather than directly in libm-test.inc (which shouldn't refer
> > directly to AVX at all).
> 
> >which shouldn't refer directly to AVX at all
> Do you mean to place avx2_usable initialization in procedure in
> architecture-specific *.c file and have generic stub, call it from
> test main() and change build accordingly?

For example.  The aim is to get something clean in accordance with glibc's 
design principles - such as putting things that are 
architecture-independent in architecture-independent places, and things 
that are architecture-specific in architecture-specific places, with a 
minimum of duplication between architectures.  There may be multiple 
approaches that achieve that.

> > I think it would be better for FUNC to be defined, in the test file that
> > includes libm-test.inc, in a way that avoids the need for the CONCAT*
> > calls here.  (To avoid warnings / errors about undeclared functions, I
> > suppose the generated header might then need to redefine e.g. vec_sin to
> > sin if there isn't a vector version of sin.)
> 
> Not good idea to change FUNC definition since it used in libm-test.c
> not only in test macros (so it may cause usage of vector function with
> not vector parameter). But it is possible to reduce number of
> concatenation if change generated definitions in way your have
> proposed.

Well, maybe a preliminary refactoring patch is needed that separates FUNC 
into multiple macros, one for functions used in testsuite infrastructure 
and one for functions being tested.

There are lots of RUN_TEST_* macros (I don't think we should assume that 
only one of them will only ever be relevant for vector tests) - it seems a 
bad idea for every one of them to need to repeat something so cryptic as 
CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )), FUNC (FUNC_NAME)).

> Not all functions tested trough ALL_RM_TEST - cexp, tgamma and jn
> tested in not all rounding modes, so definitions for them we have to
> generate in script manually.

Yes, conversion of those to ALL_RM_TEST was deferred because of bugs it 
showed up that need fixing.  And in the case of cexp, the bugs appear to 
be present in other functions as well, but it's not convenient to add 
tests for them in all cases until csin / csinh have moved to tests in 
auto-libm-test-in - and for that, I'm waiting for a new MPC release with 
last December's speedups for mpc_sin / mpc_sinh.  I'm doubtful any changes 
to the arguments to START should be needed, but if they are, then you do 
indeed need to change the code for those three functions' tests manually.

> >> diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
> >> new file mode 100644
> >> index 0000000..d585fa0
> >> --- /dev/null
> >> +++ b/sysdeps/x86_64/fpu/Makefile
> >> @@ -0,0 +1,33 @@
> >> +ifeq ($(subdir),mathvec)
> >> +libmvec-support += svml_d_cos4_core svml_d_cos_data
> >> +endif
> >> +
> >> +# Rules for libmvec tests
> >> +ifeq ($(subdir),math)
> >> +ifneq ($(PERL),no)
> >> +ifeq ($(build-mathvec),yes)
> >> +libm-tests += test-double-vlen4 test-float-vlen8
> >> +
> >> +CFLAGS-test-double-vlen4-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
> >> +                                  -frounding-math -mavx2
> >> +CFLAGS-test-float-vlen8-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
> >> +                                 -frounding-math -mavx2
> >
> > I think the sysdeps makefile should actually just define that double-vlen4
> > and float-vlen8 are the vector lengths for which testing should take
> > place, with all the other testing rules being arranged in an
> > architecture-independent way.
> 
> Do you mean to stay in sysdeps/x86_64/fpu/Makefile only CFLAGS-*
> definitions or to setup some variable which will be used in common
> makefile for build vector tests?

Only libmvec-support, and a variable containing "double-vlen4 float-vlen8" 
or similar as a list of vector formats for which to run tests, and a 
variable containing "-mavx2" as compiler options for building vector tests 
(all the other options there should be architecture-independent and 
defined only once in a variable in math/Makefile).
  
Andrew Senkevich Oct. 30, 2014, 12:14 p.m. UTC | #10
2014-10-29 21:50 GMT+03:00 Joseph S. Myers <joseph@codesourcery.com>:

>> > I think it would be better for FUNC to be defined, in the test file that
>> > includes libm-test.inc, in a way that avoids the need for the CONCAT*
>> > calls here.  (To avoid warnings / errors about undeclared functions, I
>> > suppose the generated header might then need to redefine e.g. vec_sin to
>> > sin if there isn't a vector version of sin.)
>>
>> Not good idea to change FUNC definition since it used in libm-test.c
>> not only in test macros (so it may cause usage of vector function with
>> not vector parameter). But it is possible to reduce number of
>> concatenation if change generated definitions in way your have
>> proposed.
>
> Well, maybe a preliminary refactoring patch is needed that separates FUNC
> into multiple macros, one for functions used in testsuite infrastructure
> and one for functions being tested.
>
> There are lots of RUN_TEST_* macros (I don't think we should assume that
> only one of them will only ever be relevant for vector tests) - it seems a
> bad idea for every one of them to need to repeat something so cryptic as
> CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )), FUNC (FUNC_NAME)).

But it is already old code, yesterday's patch looks so in this place:
FUNC_TEST (FUNC_NAME) (ARG)


--
WBR,
Andrew
  
Joseph Myers Oct. 30, 2014, 1:55 p.m. UTC | #11
On Thu, 30 Oct 2014, Andrew Senkevich wrote:

> > Well, maybe a preliminary refactoring patch is needed that separates FUNC
> > into multiple macros, one for functions used in testsuite infrastructure
> > and one for functions being tested.
> >
> > There are lots of RUN_TEST_* macros (I don't think we should assume that
> > only one of them will only ever be relevant for vector tests) - it seems a
> > bad idea for every one of them to need to repeat something so cryptic as
> > CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )), FUNC (FUNC_NAME)).
> 
> But it is already old code, yesterday's patch looks so in this place:
> FUNC_TEST (FUNC_NAME) (ARG)

As I said, *preliminary refactoring patch*.  Long sequences of variations 
on the same patch aren't helpful; if you find yourself sending them, you 
need to step back and think very carefully about how to restructure the 
submission to make things as clear and as easy to review as possible.  
That includes separating out any pieces, large or small, that are 
reasonably separable and can be justified on their own merits.  Having 
separated them, you then need to make *self-contained* submissions 
(including all relevant rationale and background), and ping those 
submissions weekly as needed (I haven't seen any pings of the binutils 
version requirement patch).  And please keep the state for your own 
patches in patchwork.sourceware.org clean; I see six entries there with 
the same description "[RFC] How to add vector math functions to Glibc", 
when there should be at most one.

If you do need to make multiple submissions of successive versions of the 
same patch, consider the submission style where each submission contains 
both the full self-contained description and rationale (that would go in 
the git log message) and a separate description of what has changed 
relative to the previous patch version (and number each patch version).
  
Joseph Myers Oct. 30, 2014, 8:07 p.m. UTC | #12
Also, I don't see you in copyright.list, so unless you're covered by a 
corporate copyright assignment for glibc you should start work on 
completing the paperwork.
  
Andrew Senkevich Oct. 31, 2014, 10:24 a.m. UTC | #13
2014-10-30 23:07 GMT+03:00 Joseph S. Myers <joseph@codesourcery.com>:

> Also, I don't see you in copyright.list, so unless you're covered by a
> corporate copyright assignment for glibc you should start work on
> completing the paperwork.

Paperwork in progress.


--
WBR,
Andrew
  
Andrew Senkevich Nov. 6, 2014, 8:51 p.m. UTC | #14
Hi, Joseph,

2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:

> On Thu, 23 Oct 2014, Andrew Senkevich wrote:

>> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
>> index 8a94a7e..2d31a11 100644
>> --- a/math/bits/mathcalls.h
>> +++ b/math/bits/mathcalls.h
>> @@ -60,6 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
>>  __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
>>
>>  /* Cosine of X.  */
>> +#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
>> +__DECL_SIMD_cos
>> +#endif
>> +#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
>> +__DECL_SIMD_cosf
>> +#endif
>> +#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
>> +__DECL_SIMD_cosl
>> +#endif
>>  __MATHCALL (cos,, (_Mdouble_ __x));
>
> As previously noted, I think it would be much better if the definition of
> __MATHCALL can include all the conditional bits (possibly through a
> generated header that defines __DECL_SIMD_cos etc. to empty if not defined
> by bits/math-vector.h).

proposal is to use separated __MATHCALL_VEC for vector cases, because
it reduces number of needed empty definitions and can be simply
generated (__MATHCALL case requires a lot of manual search to obtain
all affected function names because of redefinitions in some files).

>> +#if defined __x86_64__ && defined __FAST_MATH__
>> +# if defined _OPENMP && _OPENMP >= 201307
>> +/* OpenMP case. */
>> +#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
>> +#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
>
> Of course we still need the API/ABI documentation providing the stable
> guarantee about exactly what this pragma means regarding the function
> versions it is saying are available in glibc.

We will follow-up on this soon.

I attached patch with almost all infrastructure fixes discussed
before. It seems pragma meaning and data tables remain to be done.
Patch affects a lot of files and of course will be separated to
minimal disjoint parts for submission later.


--
WBR,
Andrew
  
Andrew Senkevich Nov. 14, 2014, 3:44 p.m. UTC | #15
2014-11-06 23:51 GMT+03:00 Andrew Senkevich <andrew.n.senkevich@gmail.com>:
> Hi, Joseph,
>
> 2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
>
>> On Thu, 23 Oct 2014, Andrew Senkevich wrote:
>
>>> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
>>> index 8a94a7e..2d31a11 100644
>>> --- a/math/bits/mathcalls.h
>>> +++ b/math/bits/mathcalls.h
>>> @@ -60,6 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
>>>  __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
>>>
>>>  /* Cosine of X.  */
>>> +#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
>>> +__DECL_SIMD_cos
>>> +#endif
>>> +#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
>>> +__DECL_SIMD_cosf
>>> +#endif
>>> +#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
>>> +__DECL_SIMD_cosl
>>> +#endif
>>>  __MATHCALL (cos,, (_Mdouble_ __x));
>>
>> As previously noted, I think it would be much better if the definition of
>> __MATHCALL can include all the conditional bits (possibly through a
>> generated header that defines __DECL_SIMD_cos etc. to empty if not defined
>> by bits/math-vector.h).
>
> proposal is to use separated __MATHCALL_VEC for vector cases, because
> it reduces number of needed empty definitions and can be simply
> generated (__MATHCALL case requires a lot of manual search to obtain
> all affected function names because of redefinitions in some files).
>
>>> +#if defined __x86_64__ && defined __FAST_MATH__
>>> +# if defined _OPENMP && _OPENMP >= 201307
>>> +/* OpenMP case. */
>>> +#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
>>> +#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
>>
>> Of course we still need the API/ABI documentation providing the stable
>> guarantee about exactly what this pragma means regarding the function
>> versions it is saying are available in glibc.
>
> We will follow-up on this soon.
>
> I attached patch with almost all infrastructure fixes discussed
> before. It seems pragma meaning and data tables remain to be done.
> Patch affects a lot of files and of course will be separated to
> minimal disjoint parts for submission later.

Here is the patch updated in part of data table and function code
accordingly points mentioned before in this discussion.


--
WBR,
Andrew
  
Joseph Myers Nov. 14, 2014, 4:51 p.m. UTC | #16
On Fri, 14 Nov 2014, Andrew Senkevich wrote:

> +#define __SIMD_DECL(function) __CONCAT(__DECL_SIMD_,function)
> +
> +#define __MATHCALL_VEC(function,suffix, args) 	\
> +  __SIMD_DECL(__MATH_PRECNAME(function,suffix)) \
> +  __MATHCALL(function,suffix, args)

Generally, throughout the patch, use GNU style: spaces before open 
parentheses for calls to functions and function-like macros (not of course 
in "#define func(args)" where C syntax doesn't allow that space) and after 
commas.

> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
> index c9f9a51..0b73d5b 100644
> --- a/sysdeps/x86_64/configure.ac
> +++ b/sysdeps/x86_64/configure.ac
> @@ -99,6 +99,15 @@ if test $libc_cv_cc_avx2 = yes; then
>  fi
>  LIBC_CONFIG_VAR([config-cflags-avx2], [$libc_cv_cc_avx2])
>  
> +if test x"$build_mathvec" = xnotset; then
> +  if test x"$base_machine" = xx86_64; then

No need for the base_machine test here; this configure fragment will never 
be called for non-x86_64 machines.  It's only preconfigure fragments that 
need to check for an applicable machine, not configure ones.

> +LIBC_CONFIG_VAR([build-mathvec], [$build_mathvec])

I think the LIBC_CONFIG_VAR call belongs in the toplevel configure script 
(after the sysdeps configure fragments have been run) - as does setting 
build_mathvec to "no" if it's still "notset" after running the sysdeps 
configure fragments.
  
Andrew Senkevich Nov. 18, 2014, 7:06 p.m. UTC | #17
Hi Joseph,

attached patch now contain versions of vector cos in SSE4, AVX and AVX2 ISAs.

Because of both AVX and AVX2 versions have vector length 4 there are
some changes in tests - I put AVX2 make rules in sysdeps makefile and
its name changed to test-double-vlen4-avx2 , AVX test stay with old
name, in ULPs file specified both versions.

If everything is okey, let me know and I will prepare separated
patches while document about pragma meaning is preparing (we plan to
add it at last steps).


--
WBR,
Andrew
  
Joseph Myers Nov. 18, 2014, 10:49 p.m. UTC | #18
On Tue, 18 Nov 2014, Andrew Senkevich wrote:

> Hi Joseph,
> 
> attached patch now contain versions of vector cos in SSE4, AVX and AVX2 ISAs.
> 
> Because of both AVX and AVX2 versions have vector length 4 there are
> some changes in tests - I put AVX2 make rules in sysdeps makefile and
> its name changed to test-double-vlen4-avx2 , AVX test stay with old
> name, in ULPs file specified both versions.
> 
> If everything is okey, let me know and I will prepare separated
> patches while document about pragma meaning is preparing (we plan to
> add it at last steps).

The overall approach seems reasonable.  I fully expect further revisions 
to be needed to some of the individual patches once they are submitted.
  

Patch

diff --git a/math/libm-test.inc b/math/libm-test.inc
index f86a4fa..9ddb77e 100644
--- a/math/libm-test.inc
+++ b/math/libm-test.inc
@@ -684,7 +684,7 @@  static void
 test_single_errno (const char *test_name, int errno_value,
    int expected_value, const char *expected_name)
 {
-#ifndef TEST_INLINE
+#if !defined TEST_INLINE && !defined TEST_MATHVEC
   if (errno_value == expected_value)
     {
       if (print_screen (1))
@@ -1691,8 +1691,9 @@  struct test_fFF_11_data
   ROUND_RESTORE_ ## ROUNDING_MODE

 /* Start and end the tests for a given function.  */
-#define START(FUNC, EXACT) \
-  const char *this_func = #FUNC; \
+#define STR_CON(x,y) __STRING(x##y)
+#define START(FUNC, SUFF, EXACT) \
+  const char *this_func = STR_CON (FUNC, SUFF); \
   init_max_error (this_func, EXACT)
 #define END \
   print_max_error (this_func)
@@ -1705,28 +1706,28 @@  struct test_fFF_11_data
     { \
       do \
  { \
-  START (FUNC, EXACT); \
+  START (FUNC, , EXACT); \
   LOOP_MACRO (FUNC, ARRAY, , ## __VA_ARGS__); \
   END_MACRO; \
  } \
       while (0); \
       do \
  { \
-  START (FUNC ## _downward, EXACT); \
+  START (FUNC, _downward, EXACT); \
   LOOP_MACRO (FUNC, ARRAY, FE_DOWNWARD, ## __VA_ARGS__); \
   END_MACRO; \
  } \
       while (0); \
       do \
  { \
-  START (FUNC ## _towardzero, EXACT); \
+  START (FUNC, _towardzero, EXACT); \
   LOOP_MACRO (FUNC, ARRAY, FE_TOWARDZERO, ## __VA_ARGS__); \
   END_MACRO; \
  } \
       while (0); \
       do \
  { \
-  START (FUNC ## _upward, EXACT); \
+  START (FUNC, _upward, EXACT); \
   LOOP_MACRO (FUNC, ARRAY, FE_UPWARD, ## __VA_ARGS__); \
   END_MACRO; \
  } \
@@ -6034,7 +6035,7 @@  static const struct test_c_c_data cexp_test_data[] =
 static void
 cexp_test (void)
 {
-  START (cexp, 0);
+  START (cexp, , 0);
   RUN_TEST_LOOP_c_c (cexp, cexp_test_data, );
   END_COMPLEX;
 }
@@ -6247,7 +6248,7 @@  copysign_test (void)


 static const struct test_f_f_data cos_test_data[] =
-  {
+  {
     TEST_f_f (cos, plus_infty, qnan_value, INVALID_EXCEPTION|ERRNO_EDOM),
     TEST_f_f (cos, minus_infty, qnan_value, INVALID_EXCEPTION|ERRNO_EDOM),
     TEST_f_f (cos, qnan_value, qnan_value,
NO_INEXACT_EXCEPTION|ERRNO_UNCHANGED),
@@ -6255,9 +6256,14 @@  static const struct test_f_f_data cos_test_data[] =
     AUTO_TESTS_f_f (cos),
   };

+#ifndef CHECKARCH
+# define CHECKARCH
+#endif
+
 static void
 cos_test (void)
 {
+  CHECKARCH
   ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
 }

@@ -7548,7 +7554,7 @@  static const struct test_if_f_data jn_test_data[] =
 static void
 jn_test (void)
 {
-  START (jn, 0);
+  START (jn, , 0);
   RUN_TEST_LOOP_if_f (jn, jn_test_data, );
   END;
 }
@@ -9374,7 +9380,7 @@  static const struct test_f_f_data tgamma_test_data[] =
 static void
 tgamma_test (void)
 {
-  START (tgamma, 0);
+  START (tgamma, , 0);
   RUN_TEST_LOOP_f_f (tgamma, tgamma_test_data, );
   END;
 }
@@ -9824,6 +9830,12 @@  main (int argc, char **argv)
   initialize ();
   printf (TEST_MSG);

+  /* Vector trigonometric functions:  */
+#ifdef TEST_MATHVEC
+
+  cos_test ();
+
+#else
   check_ulp ();

   /* Keep the tests a wee bit ordered (according to ISO C99).  */
@@ -9960,6 +9972,7 @@  main (int argc, char **argv)
   y0_test ();
   y1_test ();
   yn_test ();
+#endif

   if (output_ulps)
     fclose (ulps_file);

diff --git a/sysdeps/x86_64/fpu/libm-test-ulps
b/sysdeps/x86_64/fpu/libm-test-ulps
index 36e1b76..0e11cd5 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -905,6 +905,12 @@  idouble: 1
 ildouble: 2
 ldouble: 2

+
+Function: "vlen4_cos":
+double: 1
+
 Function: "cosh":
 double: 1
 float: 1

diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
new file mode 100644
index 0000000..35e130e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
@@ -0,0 +1,40 @@ 
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FLOAT double
+
+// Wrapper from scalar to vector function implemented in AVX2.
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+extern __m256d vector_func(__m256d); \
+FLOAT scalar_func(FLOAT x)\
+{\
+  int i;\
+  __m256d mx = _mm256_set1_pd(x);\
+  __m256d mr = vector_func(mx);\
+  for(i=1;i<4;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#include <immintrin.h>
+
+VECTOR_WRAPPER(vlen4_cos,_ZGVdN4v_cos)

diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c
b/sysdeps/x86_64/fpu/test-double-vlen4.c
new file mode 100644
index 0000000..ce40c04
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4.c
@@ -0,0 +1,44 @@ 
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FUNC(function) function
+#define FLOAT double
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat)
Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define TEST_MATHVEC
+#define EXCEPTION_TESTS_double 0
+#define ROUNDING_TESTS_double(MODE) ((MODE) == FE_TONEAREST)
+
+#define cos vlen4_cos
+
+#include <init-arch.h>
+
+#define CHECKARCH \
+__init_cpu_features();\
+if (__cpu_features.feature[index_AVX2_Usable] & bit_AVX2_Usable)
+