[x86-64,BZ,#20024] Fixed vector sincos/sincosf ABI

Message ID CAMXFM3tO0RGFjbrQaEV80rTCWmUy8LQjkZ7gRd9j+Q3YQ2+nAQ@mail.gmail.com
State New, archived

Commit Message

Andrew Senkevich May 31, 2016, 7:25 p.m. UTC
  Hi,

this patch fixes the wrong vector sincos/sincosf ABI to make it compatible
with the current vector function declaration.  According to that declaration,
the vectorized sincos should take vectors of pointers as its second and third
parameters, so the fix implements it as a wrapper around a version that takes
plain pointers as its second and third parameters.
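In C terms the difference is as follows (a sketch using the symbol names
from this patch for the 128-bit double case; the declarations are
illustrative, not the patch text):

  #include <immintrin.h>

  /* What the renamed implementation provides: plain pointer
     arguments, one output array per result.  */
  extern void _ZGVbN2vl8l8_sincos (__m128d x, double *s, double *c);

  /* What the vector function declaration requires of the public
     symbol: a vector of pointers per output, one pointer per lane.  */
  extern void _ZGVbN2vvv_sincos (__m128d x, __m128i s_ptrs,
                                 __m128i c_ptrs);
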
Is it OK for trunk and the 2.22 and 2.23 release branches?

2016-05-31  Andrew Senkevich  <andrew.senkevich@intel.com>

        [BZ #20024]
        * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI
        of this implementation of vector function.
        * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
        Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise.
        * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Redefined wrapper
        for testing vector function with fixed ABI.
        * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated on KNL.



--
WBR,
Andrew
  

Comments

Carlos O'Donell June 1, 2016, 12:14 a.m. UTC | #1
On 05/31/2016 03:25 PM, Andrew Senkevich wrote:
> Hi,
> 
> this patch fixes the wrong vector sincos/sincosf ABI to make it compatible
> with the current vector function declaration.  According to that declaration,
> the vectorized sincos should take vectors of pointers as its second and third
> parameters, so the fix implements it as a wrapper around a version that takes
> plain pointers as its second and third parameters.
> Is it OK for trunk and the 2.22 and 2.23 release branches?

How did you test this? Is it possible to add a regression test that detects
this issue?

> 2016-05-31  Andrew Senkevich  <andrew.senkevich@intel.com>
> 
>         [BZ #20024]
>         * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI
>         of this implementation of vector function.
>         * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise.
>         * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
>         * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
>         Likewise.
>         * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
>         * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise.
>         * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Redefined wrapper
>         for testing vector function with fixed ABI.
>         * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.


>         * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated on KNL.

This should be a separate patch that you commit without any real need
for discussion (unless the numbers are way out).

Cheers,
Carlos.
  
Andrew Senkevich June 1, 2016, 8:03 p.m. UTC | #2
2016-06-01 3:14 GMT+03:00 Carlos O'Donell <carlos@redhat.com>:
> On 05/31/2016 03:25 PM, Andrew Senkevich wrote:
>> Hi,
>>
>> this patch fixes the wrong vector sincos/sincosf ABI to make it compatible
>> with the current vector function declaration.  According to that declaration,
>> the vectorized sincos should take vectors of pointers as its second and third
>> parameters, so the fix implements it as a wrapper around a version that takes
>> plain pointers as its second and third parameters.
>> Is it OK for trunk and the 2.22 and 2.23 release branches?
>
> How did you test this? Is it possible to add a regression test that detects
> this issue?

I tested with the test case from the bug.  But the AVX512 variants
need GCC 6.1 or newer.

Is it OK to add regression tests with assembly sources (or at least
for those vector versions for which it can be hard to get a vectorized
test case with any GCC version)?


--
WBR,
Andrew
  
Joseph Myers June 2, 2016, 10:50 p.m. UTC | #3
On Tue, 31 May 2016, Andrew Senkevich wrote:

> Hi,
> 
> this patch fixes the wrong vector sincos/sincosf ABI to make it compatible
> with the current vector function declaration.  According to that declaration,
> the vectorized sincos should take vectors of pointers as its second and third
> parameters, so the fix implements it as a wrapper around a version that takes
> plain pointers as its second and third parameters.
> Is it OK for trunk and the 2.22 and 2.23 release branches?

Do you intend a followup for trunk only that exports the new functions 
with the intended ABI and makes the old ones into compat symbols?
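
For reference, downgrading the old symbols would use the usual
shlib-compat.h machinery, roughly along these lines (a sketch; the
version boundaries here are assumptions, not part of this patch):

  #include <shlib-compat.h>

  /* Keep the old vvv entry point for binaries already linked against
     it, but stop offering it at link time; new links get the fixed
     ABI.  */
  #if SHLIB_COMPAT (libmvec, GLIBC_2_22, GLIBC_2_23)
  compat_symbol (libmvec, _ZGVbN2vvv_sincos, _ZGVbN2vvv_sincos,
                 GLIBC_2_22);
  #endif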

> +  INIT_VEC_LOOP (mr, (long int)r, VEC_LEN);    \
> +  INIT_VEC_LOOP (mr1, (long int)r1, VEC_LEN);  \
> +  vector_func (mx, mr, mr1);                   \
> +  TEST_VEC_LOOP (*(FLOAT*)mr, VEC_LEN);                \
> +  TEST_VEC_LOOP (*(FLOAT*)mr1, VEC_LEN);       \

Should have a space in casts, "(type) value", throughout this patch.
  
Andrew Senkevich June 6, 2016, 1:34 p.m. UTC | #4
2016-06-03 1:50 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> On Tue, 31 May 2016, Andrew Senkevich wrote:
>
>> Hi,
>>
>> this patch fixes the wrong vector sincos/sincosf ABI to make it compatible
>> with the current vector function declaration.  According to that declaration,
>> the vectorized sincos should take vectors of pointers as its second and third
>> parameters, so the fix implements it as a wrapper around a version that takes
>> plain pointers as its second and third parameters.
>> Is it OK for trunk and the 2.22 and 2.23 release branches?
>
> Do you intend a followup for trunk only that exports the new functions
> with the intended ABI and makes the old ones into compat symbols?

Is it a suitable way to have both SIMD declarations for sincos in the headers?

Do we need tests, and is it OK to have them in assembly?



--
WBR,
Andrew
  
Joseph Myers June 6, 2016, 2:08 p.m. UTC | #5
On Mon, 6 Jun 2016, Andrew Senkevich wrote:

> 2016-06-03 1:50 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> > On Tue, 31 May 2016, Andrew Senkevich wrote:
> >
> >> Hi,
> >>
> >> this patch fixes the wrong vector sincos/sincosf ABI to make it compatible
> >> with the current vector function declaration.  According to that declaration,
> >> the vectorized sincos should take vectors of pointers as its second and third
> >> parameters, so the fix implements it as a wrapper around a version that takes
> >> plain pointers as its second and third parameters.
> >> Is it OK for trunk and the 2.22 and 2.23 release branches?
> >
> > Do you intend a followup for trunk only that exports the new functions
> > with the intended ABI and makes the old ones into compat symbols?
> 
> Is it a suitable way to have both SIMD declarations for sincos in the headers?

(a) Would that work usefully, and cause both functions to be used 
depending on the code to be vectorized?

(b) How useful are the existing functions, i.e. would real code be likely 
to use both functions?

> Do we need tests, and is it OK to have them in assembly?

All public interfaces should have tests.  Compat interfaces may be 
trickier to test, but it's still a good idea to do so if possible.

C tests seem safer where possible.  For example: the existing functions 
are said to take vectors of pointers.  Does that mean vectors of 8-byte 
pointers for the 64-bit ABI and vectors of 4-byte pointers for x32?  If 
so, a C test is more likely to get right that the ABI to test is different 
in those cases.  Did you test your patch for x32?
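
A hypothetical one-liner makes the point about the pointer lanes (not
part of the patch):

  #include <stdio.h>

  int
  main (void)
  {
    /* Prints 8 on the 64-bit ABI and 4 on x32, so each lane of the
       pointer vectors passed to the _ZGV*vvv_sincos symbols has a
       different width on the two ABIs.  */
    printf ("sizeof (double *) = %zu\n", sizeof (double *));
    return 0;
  }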
  
Carlos O'Donell June 7, 2016, 12:02 a.m. UTC | #6
On 06/06/2016 10:08 AM, Joseph Myers wrote:
> C tests seem safer where possible.  For example: the existing functions 
> are said to take vectors of pointers.  Does that mean vectors of 8-byte 
> pointers for the 64-bit ABI and vectors of 4-byte pointers for x32?  If 
> so, a C test is more likely to get right that the ABI to test is different 
> in those cases.

+1
  
Andrew Senkevich June 11, 2016, 12:56 p.m. UTC | #7
2016-06-07 3:02 GMT+03:00 Carlos O'Donell <carlos@redhat.com>:
> On 06/06/2016 10:08 AM, Joseph Myers wrote:
>> C tests seem safer where possible.  For example: the existing functions
>> are said to take vectors of pointers.  Does that mean vectors of 8-byte
>> pointers for the 64-bit ABI and vectors of 4-byte pointers for x32?  If
>> so, a C test is more likely to get right that the ABI to test is different
>> in those cases.
>
> +1

Here is a new version of the patch, with fixed implementations for x32
and new C tests for the sincos ABI.
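
The shape of such a C ABI test is roughly as follows (a sketch rather
than the attached test source; it assumes the file is compiled with
vectorization enabled, e.g. -O2 -ffast-math plus the matching -march
flags, so that GCC replaces the scalar sincos calls with the _ZGV*
variants):

  #define _GNU_SOURCE
  #include <math.h>

  #define N 1000
  static double x[N], s[N], c[N];

  int
  test_sincos_abi (void)
  {
    int i, ret = 0;
    for (i = 0; i < N; i++)
      x[i] = i / 3.0;
    /* This is the loop the compiler vectorizes; with the broken ABI
       the vector variant stores through garbage addresses.  */
    for (i = 0; i < N; i++)
      sincos (x[i], &s[i], &c[i]);
    /* Sanity-check the stored results.  */
    for (i = 0; i < N; i++)
      if (fabs (s[i] * s[i] + c[i] * c[i] - 1.0) > 1.0e-6)
        ret = 1;
    return ret;
  }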


--
WBR,
Andrew
  
Andrew Senkevich June 18, 2016, 10:02 a.m. UTC | #8
2016-06-11 15:56 GMT+03:00 Andrew Senkevich <andrew.n.senkevich@gmail.com>:
> 2016-06-07 3:02 GMT+03:00 Carlos O'Donell <carlos@redhat.com>:
>> On 06/06/2016 10:08 AM, Joseph Myers wrote:
>>> C tests seem safer where possible.  For example: the existing functions
>>> are said to take vectors of pointers.  Does that mean vectors of 8-byte
>>> pointers for the 64-bit ABI and vectors of 4-byte pointers for x32?  If
>>> so, a C test is more likely to get right that the ABI to test is different
>>> in those cases.
>>
>> +1
>
> Here is a new version of the patch, with fixed implementations for x32
> and new C tests for the sincos ABI.

Ping.

ChangeLog is:

2016-06-11  Andrew Senkevich  <andrew.senkevich@intel.com>

        [BZ #20024]
        * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI
        of this implementation of vector function.
        * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
        Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise.
        * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Redefined wrapper
        for testing vector function with fixed ABI.
        * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx.c: New test.
        * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx2.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx512.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-libmvec-sincos.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx2.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx512.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-libmvec-sincosf.c: Likewise.
        * sysdeps/x86_64/fpu/Makefile: Added new tests.


--
WBR,
Andrew
  
Joseph Myers June 22, 2016, 3:12 p.m. UTC | #9
On Sat, 11 Jun 2016, Andrew Senkevich wrote:

> +#define VECTOR_WRAPPER_fFF(scalar_func, vector_func)		\
> +extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE);	\
> +void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)		\
> +{							\
> +  int i;						\
> +  VEC_TYPE mx;						\
> +  VEC_INT_TYPE mr, mr1;					\
> +  INIT_VEC_LOOP (mx, x, VEC_LEN);			\
> +  INIT_VEC_LOOP (((FLOAT **) &mr), r, VEC_LEN);		\
> +  INIT_VEC_LOOP (((FLOAT **) &mr1), r1, VEC_LEN);	\
> +  vector_func (mx, mr, mr1);				\
> +  char *mr_ptr = (char *) &mr;				\
> +  char *mr1_ptr = (char *) &mr1;			\
> +  TEST_VEC_LOOP (*((FLOAT **) mr_ptr), VEC_LEN);	\
> +  TEST_VEC_LOOP (*((FLOAT **) mr1_ptr), VEC_LEN);	\
> +  return;						\

You seem to have lots of duplicate copies of this VECTOR_WRAPPER_fFF 
definition.  Please unify them somehow.

Also, I don't see how this definition can work.  It looks to me like: you 
initialize the vectors of pointers with lots of copies of the same pointer 
(as INIT_VEC_LOOP is about putting lots of copies of the same value in a 
vector).  Then you call the vector function.  Then the TEST_VEC_LOOP calls 
have a first argument that is, via some indirection, just r or r1, so they 
would look successively at r[0], r[1] etc. - but only r[0] and r1[0] 
actually exist.  Given this, I don't understand why the implementation you 
have would have passed the tests at all.

What I'd expect is: you define vector result variables locally in the 
macro, like math/test-math-vector.h's copy of VECTOR_WRAPPER_fFF does.  
You initialize the vectors of pointers to point to each successive element 
of the vector result variables - not to have every element pointing to the 
same place.  Then everything after that would be as in the 
math/test-math-vector.h version.
  
Andrew Senkevich June 22, 2016, 5:52 p.m. UTC | #10
2016-06-22 18:12 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> Also, I don't see how this definition can work.  It looks to me like: you
> initialize the vectors of pointers with lots of copies of the same pointer
> (as INIT_VEC_LOOP is about putting lots of copies of the same value in a
> vector).  Then you call the vector function.  Then the TEST_VEC_LOOP calls
> have a first argument that is, via some indirection, just r or r1, so they
> would look successively at r[0], r[1] etc. - but only r[0] and r1[0]
> actually exist.  Given this, I don't understand why the implementation you
> have would have passed the tests at all.

The unfolded TEST_VEC_LOOP looks successively at mr[0], mr[1], not at
r[0], r[1].  mr[0], mr[1], etc. are the same pointer, yes, but mx also
contains equal values...
Is that OK?


--
WBR,
Andrew
  
Joseph Myers June 22, 2016, 5:56 p.m. UTC | #11
On Wed, 22 Jun 2016, Andrew Senkevich wrote:

> 2016-06-22 18:12 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> > Also, I don't see how this definition can work.  It looks to me like: you
> > initialize the vectors of pointers with lots of copies of the same pointer
> > (as INIT_VEC_LOOP is about putting lots of copies of the same value in a
> > vector).  Then you call the vector function.  Then the TEST_VEC_LOOP calls
> > have a first argument that is, via some indirection, just r or r1, so they
> > would look successively at r[0], r[1] etc. - but only r[0] and r1[0]
> > actually exist.  Given this, I don't understand why the implementation you
> > have would have passed the tests at all.
> 
> The unfolded TEST_VEC_LOOP looks successively at mr[0], mr[1], not at
> r[0], r[1].  mr[0], mr[1], etc. are the same pointer, yes, but mx also
> contains equal values...
> Is that OK?

The whole point of TEST_VEC_LOOP is to make sure that the N floating-point 
results are equal, given equal inputs (to fit vector tests into the scalar 
test infrastructure).

This means you need to use N separate pointers in the vector of pointers.
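
Concretely, that means pointing each lane at its own element of a
local result array, along the lines of the INIT_VEC_PTRS_LOOP helper
the next revision introduces (a sketch, assuming the wrapper's loop
index i is in scope):

  /* Give every vector lane its own destination: lane i points at
     element i of a local result array.  */
  #define INIT_VEC_PTRS_LOOP(vec, val, len)     \
    do                                          \
      {                                         \
        for (i = 0; i < (len); i++)             \
          (vec)[i] = &(val)[i];                 \
      }                                         \
    while (0)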
  
Andrew Senkevich June 23, 2016, 4:33 p.m. UTC | #12
2016-06-22 20:56 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> On Wed, 22 Jun 2016, Andrew Senkevich wrote:
>
>> 2016-06-22 18:12 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
>> > Also, I don't see how this definition can work.  It looks to me like: you
>> > initialize the vectors of pointers with lots of copies of the same pointer
>> > (as INIT_VEC_LOOP is about putting lots of copies of the same value in a
>> > vector).  Then you call the vector function.  Then the TEST_VEC_LOOP calls
>> > have a first argument that is, via some indirection, just r or r1, so they
>> > would look successively at r[0], r[1] etc. - but only r[0] and r1[0]
>> > actually exist.  Given this, I don't understand why the implementation you
>> > have would have passed the tests at all.
>>
>> The unfolded TEST_VEC_LOOP looks successively at mr[0], mr[1], not at
>> r[0], r[1].  mr[0], mr[1], etc. are the same pointer, yes, but mx also
>> contains equal values...
>> Is that OK?
>
> The whole point of TEST_VEC_LOOP is to make sure that the N floating-point
> results are equal, given equal inputs (to fit vector tests into the scalar
> test infrastructure).
>
> This means you need to use N separate pointers in the vector of pointers.

Attached is the refactored version; the ChangeLog is:

        [BZ #20024]
        * sysdeps/x86/fpu/test-math-vector-sincos.h: New.
        * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI
        of this implementation of vector function.
        * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
        Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise.
        * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise.
        * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise.
        * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Use another wrapper
        for testing vector sincos with fixed ABI.
        * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx.c: New test.
        * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx2.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx512.c: Likewise.
        * sysdeps/x86_64/fpu/test-double-libmvec-sincos.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx2.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx512.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-libmvec-sincosf.c: Likewise.
        * sysdeps/x86_64/fpu/Makefile: Added new tests.


--
WBR,
Andrew
  
Andrew Senkevich June 27, 2016, 11:26 a.m. UTC | #13
2016-06-23 19:33 GMT+03:00 Andrew Senkevich <andrew.n.senkevich@gmail.com>:
> 2016-06-22 20:56 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
>> On Wed, 22 Jun 2016, Andrew Senkevich wrote:
>>
>>> 2016-06-22 18:12 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
>>> > Also, I don't see how this definition can work.  It looks to me like: you
>>> > initialize the vectors of pointers with lots of copies of the same pointer
>>> > (as INIT_VEC_LOOP is about putting lots of copies of the same value in a
>>> > vector).  Then you call the vector function.  Then the TEST_VEC_LOOP calls
>>> > have a first argument that is, via some indirection, just r or r1, so they
>>> > would look successively at r[0], r[1] etc. - but only r[0] and r1[0]
>>> > actually exist.  Given this, I don't understand why the implementation you
>>> > have would have passed the tests at all.
>>>
>>> The unfolded TEST_VEC_LOOP looks successively at mr[0], mr[1], not at
>>> r[0], r[1].  mr[0], mr[1], etc. are the same pointer, yes, but mx also
>>> contains equal values...
>>> Is that OK?
>>
>> The whole point of TEST_VEC_LOOP is to make sure that the N floating-point
>> results are equal, given equal inputs (to fit vector tests into the scalar
>> test infrastructure).
>>
>> This means you need to use N separate pointers in the vector of pointers.
>
> Attached is the refactored version; the ChangeLog is:
>
>         [BZ #20024]
>         * sysdeps/x86/fpu/test-math-vector-sincos.h: New.
>         * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI
>         of this implementation of vector function.
>         * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise.
>         * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
>         * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
>         Likewise.
>         * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
>         * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise.
>         * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise.
>         * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Use another wrapper
>         for testing vector sincos with fixed ABI.
>         * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
>         * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx.c: New test.
>         * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx2.c: Likewise.
>         * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx512.c: Likewise.
>         * sysdeps/x86_64/fpu/test-double-libmvec-sincos.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx2.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx512.c: Likewise.
>         * sysdeps/x86_64/fpu/test-float-libmvec-sincosf.c: Likewise.
>         * sysdeps/x86_64/fpu/Makefile: Added new tests.

Tested on x86_64 and x32 on all needed ISAs.  OK for trunk?


--
WBR,
Andrew
  
Joseph Myers June 29, 2016, 9:59 p.m. UTC | #14
On Thu, 23 Jun 2016, Andrew Senkevich wrote:

> +#define VECTOR_WRAPPER_fFF_2(scalar_func, vector_func)		\
> +extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE);	\
> +void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)		\
> +{								\
> +  int i;							\
> +  FLOAT r_loc[VEC_LEN], r1_loc[VEC_LEN];			\
> +  VEC_TYPE mx;							\
> +  VEC_INT_TYPE mr, mr1;						\
> +  INIT_VEC_LOOP (mx, x, VEC_LEN);				\
> +  INIT_VEC_PTRS_LOOP (((FLOAT **) &mr), r_loc, VEC_LEN);	\
> +  INIT_VEC_PTRS_LOOP (((FLOAT **) &mr1), r1_loc, VEC_LEN);	\
> +  vector_func (mx, mr, mr1);					\
> +  char *mr_ptr = (char *) &mr;					\
> +  char *mr1_ptr = (char *) &mr1;				\
> +  TEST_VEC_LOOP (*((FLOAT **) mr_ptr), VEC_LEN);		\
> +  TEST_VEC_LOOP (*((FLOAT **) mr1_ptr), VEC_LEN);		\
> +  *r = *((FLOAT **) mr_ptr)[0];					\
> +  *r1 = *((FLOAT **) mr1_ptr)[0];				\

I still think this is much more complicated than necessary.

Rather than having variables mr_ptr and mr1_ptr at all, and having a load 
of pointer casts, I'd expect you just to pass r_loc and r1_loc - the 
arrays in which the results have been stored - directly to TEST_VEC_LOOP.  
And then store the results in *r and *r1 taken from r_loc[0] and 
r1_loc[0], without all the unnecessary indirection.
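
With that simplification the wrapper reduces to roughly the following
(a sketch of the quoted macro after this suggestion; the final attached
patch is what counts):

  #define VECTOR_WRAPPER_fFF_2(scalar_func, vector_func)          \
  extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE); \
  void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)               \
  {                                                               \
    int i;                                                        \
    FLOAT r_loc[VEC_LEN], r1_loc[VEC_LEN];                        \
    VEC_TYPE mx;                                                  \
    VEC_INT_TYPE mr, mr1;                                         \
    INIT_VEC_LOOP (mx, x, VEC_LEN);                               \
    INIT_VEC_PTRS_LOOP (((FLOAT **) &mr), r_loc, VEC_LEN);        \
    INIT_VEC_PTRS_LOOP (((FLOAT **) &mr1), r1_loc, VEC_LEN);      \
    vector_func (mx, mr, mr1);                                    \
    /* The results are in the local arrays; test them directly    \
       and return the first lane via the scalar interface.  */    \
    TEST_VEC_LOOP (r_loc, VEC_LEN);                               \
    TEST_VEC_LOOP (r1_loc, VEC_LEN);                              \
    *r = r_loc[0];                                                \
    *r1 = r1_loc[0];                                              \
  }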
  
Andrew Senkevich June 30, 2016, 12:40 p.m. UTC | #15
2016-06-30 0:59 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> On Thu, 23 Jun 2016, Andrew Senkevich wrote:
>
>> +#define VECTOR_WRAPPER_fFF_2(scalar_func, vector_func)               \
>> +extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE);      \
>> +void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)            \
>> +{                                                            \
>> +  int i;                                                     \
>> +  FLOAT r_loc[VEC_LEN], r1_loc[VEC_LEN];                     \
>> +  VEC_TYPE mx;                                                       \
>> +  VEC_INT_TYPE mr, mr1;                                              \
>> +  INIT_VEC_LOOP (mx, x, VEC_LEN);                            \
>> +  INIT_VEC_PTRS_LOOP (((FLOAT **) &mr), r_loc, VEC_LEN);     \
>> +  INIT_VEC_PTRS_LOOP (((FLOAT **) &mr1), r1_loc, VEC_LEN);   \
>> +  vector_func (mx, mr, mr1);                                 \
>> +  char *mr_ptr = (char *) &mr;                                       \
>> +  char *mr1_ptr = (char *) &mr1;                             \
>> +  TEST_VEC_LOOP (*((FLOAT **) mr_ptr), VEC_LEN);             \
>> +  TEST_VEC_LOOP (*((FLOAT **) mr1_ptr), VEC_LEN);            \
>> +  *r = *((FLOAT **) mr_ptr)[0];                                      \
>> +  *r1 = *((FLOAT **) mr1_ptr)[0];                            \
>
> I still think this is much more complicated than necessary.
>
> Rather than having variables mr_ptr and mr1_ptr at all, and having a load
> of pointer casts, I'd expect you just to pass r_loc and r1_loc - the
> arrays in which the results have been stored - directly to TEST_VEC_LOOP.
> And then store the results in *r and *r1 taken from r_loc[0] and
> r1_loc[0], without all the unnecessary indirection.

Indeed, it can be simplified now.

Is it OK with that change for trunk as well as for the 2.22 and 2.23
release branches?


--
WBR,
Andrew
  
Joseph Myers June 30, 2016, 1:46 p.m. UTC | #16
On Thu, 30 Jun 2016, Andrew Senkevich wrote:

> Indeed, it can be simplified now.
> 
> Is it OK with that change for trunk as well as for the 2.22 and 2.23
> release branches?

Please send the actual patch you are proposing.
  
Andrew Senkevich June 30, 2016, 3:22 p.m. UTC | #17
2016-06-30 16:46 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> On Thu, 30 Jun 2016, Andrew Senkevich wrote:
>
>> Indeed, it can be simplified now.
>>
>> Is it OK with that change for trunk as well as for the 2.22 and 2.23
>> release branches?
>
> Please send the actual patch you are proposing.

Here it is, attached.


--
WBR,
Andrew
  
Joseph Myers June 30, 2016, 10:25 p.m. UTC | #18
On Thu, 30 Jun 2016, Andrew Senkevich wrote:

> 2016-06-30 16:46 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> > On Thu, 30 Jun 2016, Andrew Senkevich wrote:
> >
> >> Indeed, it can be simplified now.
> >>
> >> Is it  Ok with that change for trunk as well as for 2.22 and 2.23
> >> release branches?
> >
> > Please send the actual patch you are proposing.
> 
> Here it is, attached.

This one is OK.
  

Patch

diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 7e7707b..38c4218 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1029,7 +1029,7 @@  Function: "cos_vlen4_avx2":
 double: 2

 Function: "cos_vlen8":
-double: 1
+double: 2
 float: 1

 Function: "cos_vlen8_avx2":
@@ -2125,7 +2125,7 @@  Function: "sincos_vlen4_avx2":
 double: 2

 Function: "sincos_vlen8":
-double: 1
+double: 2
 float: 1

 Function: "sincos_vlen8_avx2":
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S
index d37275d..56e9c57
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S
@@ -20,7 +20,7 @@ 
 #include "svml_d_trig_data.h"

        .text
-ENTRY (_ZGVbN2vvv_sincos_sse4)
+ENTRY (_ZGVbN2vl8l8_sincos_sse4)
 /*
    ALGORITHM DESCRIPTION:

@@ -311,4 +311,31 @@  ENTRY (_ZGVbN2vvv_sincos_sse4)

         movsd     %xmm0, 256(%rsp,%r15)
         jmp       .LBL_1_7
+END (_ZGVbN2vl8l8_sincos_sse4)
+libmvec_hidden_def(_ZGVbN2vl8l8_sincos_sse4)
+
+/* vvv version implemented with wrapper to vl8l8 variant.  */
+ENTRY (_ZGVbN2vvv_sincos_sse4)
+        subq      $72, %rsp
+        .cfi_def_cfa_offset 80
+        movdqu    %xmm1, 32(%rsp)
+        lea       (%rsp), %rdi
+        movdqu    %xmm2, 48(%rdi)
+        lea       16(%rsp), %rsi
+        call      HIDDEN_JUMPTARGET(_ZGVbN2vl8l8_sincos_sse4)
+        movq      32(%rsp), %rdx
+        movq      48(%rsp), %rsi
+        movq      40(%rsp), %r8
+        movq      56(%rsp), %r10
+        movq      (%rsp), %rax
+        movq      16(%rsp), %rcx
+        movq      8(%rsp), %rdi
+        movq      24(%rsp), %r9
+        movq      %rax, (%rdx)
+        movq      %rcx, (%rsi)
+        movq      %rdi, (%r8)
+        movq      %r9, (%r10)
+        addq      $72, %rsp
+        .cfi_def_cfa_offset 8
+        ret
 END (_ZGVbN2vvv_sincos_sse4)
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S
index 24b57f4..fc2b526
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S
@@ -20,7 +20,7 @@ 
 #include "svml_d_trig_data.h"

        .text
-ENTRY (_ZGVdN4vvv_sincos_avx2)
+ENTRY (_ZGVdN4vl8l8_sincos_avx2)
 /*
    ALGORITHM DESCRIPTION:

@@ -274,4 +274,51 @@  ENTRY (_ZGVdN4vvv_sincos_avx2)
         vmovsd    %xmm0, 384(%rsp,%r15)
         jmp       .LBL_1_7

+END (_ZGVdN4vl8l8_sincos_avx2)
+libmvec_hidden_def(_ZGVdN4vl8l8_sincos_avx2)
+
+/* vvv version implemented with wrapper to vl8l8 variant.  */
+ENTRY (_ZGVdN4vvv_sincos_avx2)
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-32, %rsp
+        subq      $128, %rsp
+        vmovdqu   %ymm1, 64(%rsp)
+        lea       (%rsp), %rdi
+        vmovdqu   %ymm2, 96(%rdi)
+        lea       32(%rsp), %rsi
+        call      HIDDEN_JUMPTARGET(_ZGVdN4vl8l8_sincos_avx2)
+        movq      64(%rsp), %rdx
+        movq      96(%rsp), %rsi
+        movq      72(%rsp), %r8
+        movq      104(%rsp), %r10
+        movq      (%rsp), %rax
+        movq      32(%rsp), %rcx
+        movq      8(%rsp), %rdi
+        movq      40(%rsp), %r9
+        movq      %rax, (%rdx)
+        movq      %rcx, (%rsi)
+        movq      80(%rsp), %rax
+        movq      112(%rsp), %rcx
+        movq      %rdi, (%r8)
+        movq      %r9, (%r10)
+        movq      88(%rsp), %rdi
+        movq      120(%rsp), %r9
+        movq      16(%rsp), %r11
+        movq      48(%rsp), %rdx
+        movq      24(%rsp), %rsi
+        movq      56(%rsp), %r8
+        movq      %r11, (%rax)
+        movq      %rdx, (%rcx)
+        movq      %rsi, (%rdi)
+        movq      %r8, (%r9)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
 END (_ZGVdN4vvv_sincos_avx2)
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S
index 1d9f426..1e1f220
--- a/sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S
@@ -36,9 +36,9 @@ 
      sin(R), sin(R') are approximated by corresponding polynomial.  */

        .text
-ENTRY (_ZGVeN8vvv_sincos_knl)
+ENTRY (_ZGVeN8vl8l8_sincos_knl)
 #ifndef HAVE_AVX512_ASM_SUPPORT
-WRAPPER_IMPL_AVX512_fFF _ZGVdN4vvv_sincos
+WRAPPER_IMPL_AVX512_fFF _ZGVdN4vl8l8_sincos
 #else
         pushq     %rbp
         cfi_adjust_cfa_offset (8)
@@ -304,11 +304,12 @@  WRAPPER_IMPL_AVX512_fFF _ZGVdN4vvv_sincos
         jmp       .LBL_1_7

 #endif
-END (_ZGVeN8vvv_sincos_knl)
+END (_ZGVeN8vl8l8_sincos_knl)
+libmvec_hidden_def(_ZGVeN8vl8l8_sincos_knl)

-ENTRY (_ZGVeN8vvv_sincos_skx)
+ENTRY (_ZGVeN8vl8l8_sincos_skx)
 #ifndef HAVE_AVX512_ASM_SUPPORT
-WRAPPER_IMPL_AVX512_fFF _ZGVdN4vvv_sincos
+WRAPPER_IMPL_AVX512_fFF _ZGVdN4vl8l8_sincos
 #else
         pushq     %rbp
         cfi_adjust_cfa_offset (8)
@@ -585,6 +586,100 @@  WRAPPER_IMPL_AVX512_fFF _ZGVdN4vvv_sincos
         jmp       .LBL_2_7

 #endif
+END (_ZGVeN8vl8l8_sincos_skx)
+libmvec_hidden_def(_ZGVeN8vl8l8_sincos_skx)
+
+/* Wrapper between vvv and vl8l8 vector variants.  */
+.macro WRAPPER_AVX512_vvv_vl8l8 callee
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $256, %rsp
+        /* Encoding for vmovups %zmm1, 128(%rsp).  */
+        .byte 0x62
+        .byte 0xf1
+        .byte 0x7c
+        .byte 0x48
+        .byte 0x11
+        .byte 0x4c
+        .byte 0x24
+        .byte 0x02
+        lea       (%rsp), %rdi
+        /* Encoding for vmovups %zmm2, 192(%rdi).  */
+        .byte 0x62
+        .byte 0xf1
+        .byte 0x7c
+        .byte 0x48
+        .byte 0x11
+        .byte 0x57
+        .byte 0x03
+        lea       64(%rsp), %rsi
+        call      HIDDEN_JUMPTARGET(\callee)
+        movq      128(%rsp), %rdx
+        movq      136(%rsp), %rsi
+        movq      144(%rsp), %r8
+        movq      152(%rsp), %r10
+        movq      (%rsp), %rax
+        movq      8(%rsp), %rcx
+        movq      16(%rsp), %rdi
+        movq      24(%rsp), %r9
+        movq      %rax, (%rdx)
+        movq      %rcx, (%rsi)
+        movq      160(%rsp), %rax
+        movq      168(%rsp), %rcx
+        movq      %rdi, (%r8)
+        movq      %r9, (%r10)
+        movq      176(%rsp), %rdi
+        movq      184(%rsp), %r9
+        movq      32(%rsp), %r11
+        movq      40(%rsp), %rdx
+        movq      48(%rsp), %rsi
+        movq      56(%rsp), %r8
+        movq      %r11, (%rax)
+        movq      %rdx, (%rcx)
+        movq      192(%rsp), %r11
+        movq      200(%rsp), %rdx
+        movq      %rsi, (%rdi)
+        movq      %r8, (%r9)
+        movq      208(%rsp), %rsi
+        movq      216(%rsp), %r8
+        movq      64(%rsp), %r10
+        movq      72(%rsp), %rax
+        movq      80(%rsp), %rcx
+        movq      88(%rsp), %rdi
+        movq      %r10, (%r11)
+        movq      %rax, (%rdx)
+        movq      224(%rsp), %r10
+        movq      232(%rsp), %rax
+        movq      %rcx, (%rsi)
+        movq      %rdi, (%r8)
+        movq      240(%rsp), %rcx
+        movq      248(%rsp), %rdi
+        movq      96(%rsp), %r9
+        movq      104(%rsp), %r11
+        movq      112(%rsp), %rdx
+        movq      120(%rsp), %rsi
+        movq      %r9, (%r10)
+        movq      %r11, (%rax)
+        movq      %rdx, (%rcx)
+        movq      %rsi, (%rdi)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+.endm
+
+ENTRY (_ZGVeN8vvv_sincos_knl)
+WRAPPER_AVX512_vvv_vl8l8 _ZGVeN8vl8l8_sincos_knl
+END (_ZGVeN8vvv_sincos_knl)
+
+ENTRY (_ZGVeN8vvv_sincos_skx)
+WRAPPER_AVX512_vvv_vl8l8 _ZGVeN8vl8l8_sincos_skx
 END (_ZGVeN8vvv_sincos_skx)

        .section .rodata, "a"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
index e375de8..c26ee0d
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
@@ -49,9 +49,9 @@ 
            R2 = XOR( RC, SC ).  */

        .text
-ENTRY (_ZGVeN16vvv_sincosf_knl)
+ENTRY (_ZGVeN16vl4l4_sincosf_knl)
 #ifndef HAVE_AVX512_ASM_SUPPORT
-WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
+WRAPPER_IMPL_AVX512_fFF _ZGVdN8vl4l4_sincosf
 #else
         pushq     %rbp
         cfi_adjust_cfa_offset (8)
@@ -267,9 +267,10 @@  WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
         vmovss    %xmm0, 1280(%rsp,%r15,8)
         jmp       .LBL_1_7
 #endif
-END (_ZGVeN16vvv_sincosf_knl)
+END (_ZGVeN16vl4l4_sincosf_knl)
+libmvec_hidden_def(_ZGVeN16vl4l4_sincosf_knl)

-ENTRY (_ZGVeN16vvv_sincosf_skx)
+ENTRY (_ZGVeN16vl4l4_sincosf_skx)
 #ifndef HAVE_AVX512_ASM_SUPPORT
 WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
 #else
@@ -496,6 +497,164 @@  WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
         vmovss    %xmm0, 1280(%rsp,%r15,8)
         jmp       .LBL_2_7
 #endif
+END (_ZGVeN16vl4l4_sincosf_skx)
+libmvec_hidden_def(_ZGVeN16vl4l4_sincosf_skx)
+
+/* Wrapper between vvv and vl4l4 vector variants.  */
+.macro WRAPPER_AVX512_vvv_vl4l4 callee
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $384, %rsp
+        /* Encoding for vmovups %zmm1, 128(%rsp).  */
+        .byte 0x62
+        .byte 0xf1
+        .byte 0x7c
+        .byte 0x48
+        .byte 0x11
+        .byte 0x4c
+        .byte 0x24
+        .byte 0x02
+        lea       (%rsp), %rdi
+        /* Encoding for vmovups %zmm2, 192(%rdi).  */
+        .byte 0x62
+        .byte 0xf1
+        .byte 0x7c
+        .byte 0x48
+        .byte 0x11
+        .byte 0x57
+        .byte 0x03
+        /* Encoding for vmovups %zmm3, 256(%rdi).  */
+        .byte 0x62
+        .byte 0xf1
+        .byte 0x7c
+        .byte 0x48
+        .byte 0x11
+        .byte 0x5f
+        .byte 0x04
+        /* Encoding for vmovups %zmm4, 320(%rdi).  */
+        .byte 0x62
+        .byte 0xf1
+        .byte 0x7c
+        .byte 0x48
+        .byte 0x11
+        .byte 0x67
+        .byte 0x05
+        lea       64(%rsp), %rsi
+        call      HIDDEN_JUMPTARGET(\callee)
+        movq      128(%rsp), %rdx
+        movq      136(%rsp), %rsi
+        movq      144(%rsp), %r8
+        movq      152(%rsp), %r10
+        movl      (%rsp), %eax
+        movl      4(%rsp), %ecx
+        movl      8(%rsp), %edi
+        movl      12(%rsp), %r9d
+        movl      %eax, (%rdx)
+        movl      %ecx, (%rsi)
+        movq      160(%rsp), %rax
+        movq      168(%rsp), %rcx
+        movl      %edi, (%r8)
+        movl      %r9d, (%r10)
+        movq      176(%rsp), %rdi
+        movq      184(%rsp), %r9
+        movl      16(%rsp), %r11d
+        movl      20(%rsp), %edx
+        movl      24(%rsp), %esi
+        movl      28(%rsp), %r8d
+        movl      %r11d, (%rax)
+        movl      %edx, (%rcx)
+        movq      192(%rsp), %r11
+        movq      200(%rsp), %rdx
+        movl      %esi, (%rdi)
+        movl      %r8d, (%r9)
+        movq      208(%rsp), %rsi
+        movq      216(%rsp), %r8
+        movl      32(%rsp), %r10d
+        movl      36(%rsp), %eax
+        movl      40(%rsp), %ecx
+        movl      44(%rsp), %edi
+        movl      %r10d, (%r11)
+        movl      %eax, (%rdx)
+        movq      224(%rsp), %r10
+        movq      232(%rsp), %rax
+        movl      %ecx, (%rsi)
+        movl      %edi, (%r8)
+        movq      240(%rsp), %rcx
+        movq      248(%rsp), %rdi
+        movl      48(%rsp), %r9d
+        movl      52(%rsp), %r11d
+        movl      56(%rsp), %edx
+        movl      60(%rsp), %esi
+        movl      %r9d, (%r10)
+        movl      %r11d, (%rax)
+        movq      256(%rsp), %r9
+        movq      264(%rsp), %r11
+        movl      %edx, (%rcx)
+        movl      %esi, (%rdi)
+        movq      272(%rsp), %rdx
+        movq      280(%rsp), %rsi
+        movl      64(%rsp), %r8d
+        movl      68(%rsp), %r10d
+        movl      72(%rsp), %eax
+        movl      76(%rsp), %ecx
+        movl      %r8d, (%r9)
+        movl      %r10d, (%r11)
+        movq      288(%rsp), %r8
+        movq      296(%rsp), %r10
+        movl      %eax, (%rdx)
+        movl      %ecx, (%rsi)
+        movq      304(%rsp), %rax
+        movq      312(%rsp), %rcx
+        movl      80(%rsp), %edi
+        movl      84(%rsp), %r9d
+        movl      88(%rsp), %r11d
+        movl      92(%rsp), %edx
+        movl      %edi, (%r8)
+        movl      %r9d, (%r10)
+        movq      320(%rsp), %rdi
+        movq      328(%rsp), %r9
+        movl      %r11d, (%rax)
+        movl      %edx, (%rcx)
+        movq      336(%rsp), %r11
+        movq      344(%rsp), %rdx
+        movl      96(%rsp), %esi
+        movl      100(%rsp), %r8d
+        movl      104(%rsp), %r10d
+        movl      108(%rsp), %eax
+        movl      %esi, (%rdi)
+        movl      %r8d, (%r9)
+        movq      352(%rsp), %rsi
+        movq      360(%rsp), %r8
+        movl      %r10d, (%r11)
+        movl      %eax, (%rdx)
+        movq      368(%rsp), %r10
+        movq      376(%rsp), %rax
+        movl      112(%rsp), %ecx
+        movl      116(%rsp), %edi
+        movl      120(%rsp), %r9d
+        movl      124(%rsp), %r11d
+        movl      %ecx, (%rsi)
+        movl      %edi, (%r8)
+        movl      %r9d, (%r10)
+        movl      %r11d, (%rax)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+.endm
+
+ENTRY (_ZGVeN16vvv_sincosf_knl)
+WRAPPER_AVX512_vvv_vl4l4 _ZGVeN16vl4l4_sincosf_knl
+END (_ZGVeN16vvv_sincosf_knl)
+
+ENTRY (_ZGVeN16vvv_sincosf_skx)
+WRAPPER_AVX512_vvv_vl4l4 _ZGVeN16vl4l4_sincosf_skx
 END (_ZGVeN16vvv_sincosf_skx)

        .section .rodata, "a"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
index 562367b..54205ce
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
@@ -20,7 +20,7 @@ 
 #include "svml_s_trig_data.h"

        .text
-ENTRY (_ZGVbN4vvv_sincosf_sse4)
+ENTRY (_ZGVbN4vl4l4_sincosf_sse4)
 /*
    ALGORITHM DESCRIPTION:

@@ -265,4 +265,45 @@  ENTRY (_ZGVbN4vvv_sincosf_sse4)
         movss     %xmm0, 256(%rsp,%r15,8)
         jmp       .LBL_1_7

+END (_ZGVbN4vl4l4_sincosf_sse4)
+libmvec_hidden_def(_ZGVbN4vl4l4_sincosf_sse4)
+
+/* vvv version implemented with wrapper to vl4l4 variant.  */
+ENTRY (_ZGVbN4vvv_sincosf_sse4)
+        subq      $104, %rsp
+        .cfi_def_cfa_offset 112
+        movdqu    %xmm1, 32(%rsp)
+        lea       (%rsp), %rdi
+        movdqu    %xmm2, 48(%rdi)
+        lea       16(%rsp), %rsi
+        movdqu    %xmm3, 48(%rsi)
+        movdqu    %xmm4, 64(%rsi)
+        call      HIDDEN_JUMPTARGET(_ZGVbN4vl4l4_sincosf_sse4)
+        movq      32(%rsp), %rdx
+        movq      40(%rsp), %rsi
+        movq      48(%rsp), %r8
+        movq      56(%rsp), %r10
+        movl      (%rsp), %eax
+        movl      4(%rsp), %ecx
+        movl      8(%rsp), %edi
+        movl      12(%rsp), %r9d
+        movl      %eax, (%rdx)
+        movl      %ecx, (%rsi)
+        movq      64(%rsp), %rax
+        movq      72(%rsp), %rcx
+        movl      %edi, (%r8)
+        movl      %r9d, (%r10)
+        movq      80(%rsp), %rdi
+        movq      88(%rsp), %r9
+        movl      16(%rsp), %r11d
+        movl      20(%rsp), %edx
+        movl      24(%rsp), %esi
+        movl      28(%rsp), %r8d
+        movl      %r11d, (%rax)
+        movl      %edx, (%rcx)
+        movl      %esi, (%rdi)
+        movl      %r8d, (%r9)
+        addq      $104, %rsp
+        .cfi_def_cfa_offset 8
+        ret
 END (_ZGVbN4vvv_sincosf_sse4)
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
index baf887d..fef0b75
--- a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
@@ -20,7 +20,7 @@ 
 #include "svml_s_trig_data.h"

        .text
-ENTRY(_ZGVdN8vvv_sincosf_avx2)
+ENTRY (_ZGVdN8vl4l4_sincosf_avx2)
 /*
    ALGORITHM DESCRIPTION:

@@ -238,4 +238,77 @@  ENTRY(_ZGVdN8vvv_sincosf_avx2)
         vmovss    %xmm0, 384(%rsp,%r15,8)
         jmp       .LBL_1_7

-END(_ZGVdN8vvv_sincosf_avx2)
+END (_ZGVdN8vl4l4_sincosf_avx2)
+libmvec_hidden_def(_ZGVdN8vl4l4_sincosf_avx2)
+
+/* vvv version implemented with wrapper to vl4l4 variant.  */
+ENTRY (_ZGVdN8vvv_sincosf_avx2)
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-32, %rsp
+        subq      $192, %rsp
+        vmovdqu   %ymm1, 64(%rsp)
+        lea       (%rsp), %rdi
+        vmovdqu   %ymm2, 96(%rdi)
+        vmovdqu   %ymm3, 128(%rdi)
+        vmovdqu   %ymm4, 160(%rdi)
+        lea       32(%rsp), %rsi
+        call      HIDDEN_JUMPTARGET(_ZGVdN8vl4l4_sincosf_avx2)
+        movq      64(%rsp), %rdx
+        movq      72(%rsp), %rsi
+        movq      80(%rsp), %r8
+        movq      88(%rsp), %r10
+        movl      (%rsp), %eax
+        movl      4(%rsp), %ecx
+        movl      8(%rsp), %edi
+        movl      12(%rsp), %r9d
+        movl      %eax, (%rdx)
+        movl      %ecx, (%rsi)
+        movq      96(%rsp), %rax
+        movq      104(%rsp), %rcx
+        movl      %edi, (%r8)
+        movl      %r9d, (%r10)
+        movq      112(%rsp), %rdi
+        movq      120(%rsp), %r9
+        movl      16(%rsp), %r11d
+        movl      20(%rsp), %edx
+        movl      24(%rsp), %esi
+        movl      28(%rsp), %r8d
+        movl      %r11d, (%rax)
+        movl      %edx, (%rcx)
+        movq      128(%rsp), %r11
+        movq      136(%rsp), %rdx
+        movl      %esi, (%rdi)
+        movl      %r8d, (%r9)
+        movq      144(%rsp), %rsi
+        movq      152(%rsp), %r8
+        movl      32(%rsp), %r10d
+        movl      36(%rsp), %eax
+        movl      40(%rsp), %ecx
+        movl      44(%rsp), %edi
+        movl      %r10d, (%r11)
+        movl      %eax, (%rdx)
+        movq      160(%rsp), %r10
+        movq      168(%rsp), %rax
+        movl      %ecx, (%rsi)
+        movl      %edi, (%r8)
+        movq      176(%rsp), %rcx
+        movq      184(%rsp), %rdi
+        movl      48(%rsp), %r9d
+        movl      52(%rsp), %r11d
+        movl      56(%rsp), %edx
+        movl      60(%rsp), %esi
+        movl      %r9d, (%r10)
+        movl      %r11d, (%rax)
+        movl      %edx, (%rcx)
+        movl      %esi, (%rdi)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+END (_ZGVdN8vvv_sincosf_avx2)
diff --git a/sysdeps/x86_64/fpu/svml_d_sincos2_core.S b/sysdeps/x86_64/fpu/svml_d_sincos2_core.S
index 74afa0a..3dbc692
--- a/sysdeps/x86_64/fpu/svml_d_sincos2_core.S
+++ b/sysdeps/x86_64/fpu/svml_d_sincos2_core.S
@@ -20,8 +20,13 @@ 
 #include "svml_d_wrapper_impl.h"

        .text
-ENTRY (_ZGVbN2vvv_sincos)
+ENTRY (_ZGVbN2vl8l8_sincos)
 WRAPPER_IMPL_SSE2_fFF sincos
+END (_ZGVbN2vl8l8_sincos)
+libmvec_hidden_def (_ZGVbN2vl8l8_sincos)
+
+ENTRY (_ZGVbN2vvv_sincos)
+WRAPPER_IMPL_SSE2_fFF_vvv sincos
 END (_ZGVbN2vvv_sincos)

 #ifndef USE_MULTIARCH
diff --git a/sysdeps/x86_64/fpu/svml_d_sincos4_core.S b/sysdeps/x86_64/fpu/svml_d_sincos4_core.S
index 2c0b011..f2cf1c7
--- a/sysdeps/x86_64/fpu/svml_d_sincos4_core.S
+++ b/sysdeps/x86_64/fpu/svml_d_sincos4_core.S
@@ -20,8 +20,13 @@ 
 #include "svml_d_wrapper_impl.h"

        .text
+ENTRY (_ZGVdN4vl8l8_sincos)
+WRAPPER_IMPL_AVX_fFF _ZGVbN2vl8l8_sincos
+END (_ZGVdN4vl8l8_sincos)
+libmvec_hidden_def (_ZGVdN4vl8l8_sincos)
+
 ENTRY (_ZGVdN4vvv_sincos)
-WRAPPER_IMPL_AVX_fFF _ZGVbN2vvv_sincos
+WRAPPER_IMPL_AVX2_fFF_vvv _ZGVbN2vl8l8_sincos
 END (_ZGVdN4vvv_sincos)

 #ifndef USE_MULTIARCH
diff --git a/sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S
index e4320a9..cf3cd79
--- a/sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S
+++ b/sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S
@@ -20,6 +20,10 @@ 
 #include "svml_d_wrapper_impl.h"

        .text
+ENTRY (_ZGVcN4vl8l8_sincos)
+WRAPPER_IMPL_AVX_fFF _ZGVbN2vl8l8_sincos
+END (_ZGVcN4vl8l8_sincos)
+
 ENTRY (_ZGVcN4vvv_sincos)
-WRAPPER_IMPL_AVX_fFF _ZGVbN2vvv_sincos
+WRAPPER_IMPL_AVX_fFF_vvv _ZGVbN2vl8l8_sincos
 END (_ZGVcN4vvv_sincos)
diff --git a/sysdeps/x86_64/fpu/svml_d_sincos8_core.S b/sysdeps/x86_64/fpu/svml_d_sincos8_core.S
index 68d490e..7aba5f7
--- a/sysdeps/x86_64/fpu/svml_d_sincos8_core.S
+++ b/sysdeps/x86_64/fpu/svml_d_sincos8_core.S
@@ -20,6 +20,10 @@ 
 #include "svml_d_wrapper_impl.h"

        .text
+ENTRY (_ZGVeN8vl8l8_sincos)
+WRAPPER_IMPL_AVX512_fFF _ZGVdN4vl8l8_sincos
+END (_ZGVeN8vl8l8_sincos)
+
 ENTRY (_ZGVeN8vvv_sincos)
-WRAPPER_IMPL_AVX512_fFF _ZGVdN4vvv_sincos
+WRAPPER_IMPL_AVX512_fFF_vvv _ZGVdN4vl8l8_sincos
 END (_ZGVeN8vvv_sincos)
diff --git a/sysdeps/x86_64/fpu/svml_s_sincosf16_core.S b/sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
index 5cbf10b..e6a83e6
--- a/sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
@@ -20,6 +20,10 @@ 
 #include "svml_s_wrapper_impl.h"

        .text
+ENTRY (_ZGVeN16vl4l4_sincosf)
+WRAPPER_IMPL_AVX512_fFF _ZGVdN8vl4l4_sincosf
+END (_ZGVeN16vl4l4_sincosf)
+
 ENTRY (_ZGVeN16vvv_sincosf)
-WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
+WRAPPER_IMPL_AVX512_fFF_vvv _ZGVdN8vl4l4_sincosf
 END (_ZGVeN16vvv_sincosf)
diff --git a/sysdeps/x86_64/fpu/svml_s_sincosf4_core.S b/sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
index 1a7d273..e546c1c
--- a/sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
@@ -16,13 +16,17 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */

-
 #include <sysdep.h>
 #include "svml_s_wrapper_impl.h"

        .text
-ENTRY (_ZGVbN4vvv_sincosf)
+ENTRY (_ZGVbN4vl4l4_sincosf)
 WRAPPER_IMPL_SSE2_fFF sincosf
+END (_ZGVbN4vl4l4_sincosf)
+libmvec_hidden_def (_ZGVbN4vl4l4_sincosf)
+
+ENTRY (_ZGVbN4vvv_sincosf)
+WRAPPER_IMPL_SSE2_fFF_vvv sincosf
 END (_ZGVbN4vvv_sincosf)

 #ifndef USE_MULTIARCH
diff --git a/sysdeps/x86_64/fpu/svml_s_sincosf8_core.S b/sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
index 74d1dfd..0cffa1f
--- a/sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
@@ -20,8 +20,13 @@ 
 #include "svml_s_wrapper_impl.h"

        .text
+ENTRY (_ZGVdN8vl4l4_sincosf)
+WRAPPER_IMPL_AVX_fFF _ZGVbN4vl4l4_sincosf
+END (_ZGVdN8vl4l4_sincosf)
+libmvec_hidden_def (_ZGVdN8vl4l4_sincosf)
+
 ENTRY (_ZGVdN8vvv_sincosf)
-WRAPPER_IMPL_AVX_fFF _ZGVbN4vvv_sincosf
+WRAPPER_IMPL_AVX2_fFF_vvv _ZGVbN4vl4l4_sincosf
 END (_ZGVdN8vvv_sincosf)

 #ifndef USE_MULTIARCH
diff --git a/sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
index 55b8b2d..0ccd9b5
--- a/sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
@@ -20,6 +20,10 @@ 
 #include "svml_s_wrapper_impl.h"

         .text
-ENTRY(_ZGVcN8vvv_sincosf)
-WRAPPER_IMPL_AVX_fFF _ZGVbN4vvv_sincosf
-END(_ZGVcN8vvv_sincosf)
+ENTRY (_ZGVcN8vl4l4_sincosf)
+WRAPPER_IMPL_AVX_fFF _ZGVbN4vl4l4_sincosf
+END (_ZGVcN8vl4l4_sincosf)
+
+ENTRY (_ZGVcN8vvv_sincosf)
+WRAPPER_IMPL_AVX_fFF_vvv _ZGVbN4vl4l4_sincosf
+END (_ZGVcN8vvv_sincosf)
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index a9d1597..dc393be
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -23,7 +23,28 @@ 

 VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVbN2v_cos)
 VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVbN2v_sin)
-VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVbN2vvv_sincos)
 VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log)
 VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow)
+
+#define VEC_INT_TYPE __m128i
+
+/* Redefinition of wrapper to be compatible with _ZGVbN2vvv_sincos.  */
+#undef VECTOR_WRAPPER_fFF
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func)           \
+extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE);        \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)              \
+{                                              \
+  int i;                                       \
+  VEC_TYPE mx;                                 \
+  VEC_INT_TYPE mr, mr1;                                \
+  INIT_VEC_LOOP (mx, x, VEC_LEN);              \
+  INIT_VEC_LOOP (mr, (long int)r, VEC_LEN);    \
+  INIT_VEC_LOOP (mr1, (long int)r1, VEC_LEN);  \
+  vector_func (mx, mr, mr1);                   \
+  TEST_VEC_LOOP (*(FLOAT*)mr, VEC_LEN);                \
+  TEST_VEC_LOOP (*(FLOAT*)mr1, VEC_LEN);       \
+  return;                                      \
+}
+
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVbN2vvv_sincos)
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index eb6a531..26448ea
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -26,7 +26,28 @@ 

 VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVdN4v_cos)
 VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVdN4v_sin)
-VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVdN4vvv_sincos)
 VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVdN4v_log)
 VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVdN4v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVdN4vv_pow)
+
+#define VEC_INT_TYPE __m256i
+
+/* Redefinition of wrapper to be compatible with _ZGVdN4vvv_sincos.  */
+#undef VECTOR_WRAPPER_fFF
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func)           \
+extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE);        \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)              \
+{                                              \
+  int i;                                       \
+  VEC_TYPE mx;                                 \
+  VEC_INT_TYPE mr, mr1;                                \
+  INIT_VEC_LOOP (mx, x, VEC_LEN);              \
+  INIT_VEC_LOOP (mr, (long int)r, VEC_LEN);    \
+  INIT_VEC_LOOP (mr1, (long int)r1, VEC_LEN);  \
+  vector_func (mx, mr, mr1);                   \
+  TEST_VEC_LOOP (*(FLOAT*)mr, VEC_LEN);                \
+  TEST_VEC_LOOP (*(FLOAT*)mr1, VEC_LEN);       \
+  return;                                      \
+}
+
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVdN4vvv_sincos)
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 52b81da..52a67be
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -23,7 +23,29 @@ 

 VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVcN4v_cos)
 VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVcN4v_sin)
-VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVcN4vvv_sincos)
 VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVcN4v_log)
 VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVcN4v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVcN4vv_pow)
+
+#define VEC_INT_TYPE __m128i
+
+/* Redefinition of wrapper to be compatible with _ZGVcN4vvv_sincos.  */
+#undef VECTOR_WRAPPER_fFF
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func)           \
+extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE,  \
+                        VEC_INT_TYPE, VEC_INT_TYPE);           \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)              \
+{                                              \
+  int i;                                       \
+  VEC_TYPE mx;                                 \
+  VEC_INT_TYPE mr, mr1;                                \
+  INIT_VEC_LOOP (mx, x, VEC_LEN);              \
+  INIT_VEC_LOOP (mr, (long int)r, VEC_LEN/2);  \
+  INIT_VEC_LOOP (mr1, (long int)r1, VEC_LEN/2);        \
+  vector_func (mx, mr, mr, mr1, mr1);          \
+  TEST_VEC_LOOP (*(FLOAT*)mr, VEC_LEN/2);      \
+  TEST_VEC_LOOP (*(FLOAT*)mr1, VEC_LEN/2);     \
+  return;                                      \
+}
+
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVcN4vvv_sincos)
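
(Note why this c/AVX variant takes four pointer arguments where the b and d
variants take two: an AVX-only target has no 256-bit integer vector type,
so each vector of four 64-bit addresses is passed as two xmm halves, low
lanes first.  Passing mr and mr1 twice is correct here because
INIT_VEC_LOOP fills every lane with the same address, making both halves
identical.  The same splitting rule yields four __m512i arguments for
_ZGVeN16vvv_sincosf and eight __m128i arguments for _ZGVcN8vvv_sincosf
further down.  A sketch with distinct per-lane addresses follows;
call_sincos4_avx is a hypothetical name, and the prototype is the one
declared above.)

#include <immintrin.h>

extern void _ZGVcN4vvv_sincos (__m256d, __m128i, __m128i,
                               __m128i, __m128i);

void
call_sincos4_avx (const double *x, double *s, double *c)
{
  __m256d vx = _mm256_loadu_pd (x);
  /* Two 64-bit addresses per xmm half: lanes 0-1, then lanes 2-3.  */
  __m128i ps_lo = _mm_set_epi64x ((long long) &s[1], (long long) &s[0]);
  __m128i ps_hi = _mm_set_epi64x ((long long) &s[3], (long long) &s[2]);
  __m128i pc_lo = _mm_set_epi64x ((long long) &c[1], (long long) &c[0]);
  __m128i pc_hi = _mm_set_epi64x ((long long) &c[3], (long long) &c[2]);
  _ZGVcN4vvv_sincos (vx, ps_lo, ps_hi, pc_lo, pc_hi);
}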
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index c10bb9c..557cb1e
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -23,7 +23,28 @@ 

 VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVeN8v_cos)
 VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVeN8v_sin)
-VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVeN8vvv_sincos)
 VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVeN8v_log)
 VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVeN8v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVeN8vv_pow)
+
+#define VEC_INT_TYPE __m512i
+
+/* Redefinition of wrapper to be compatible with _ZGVeN8vvv_sincos.  */
+#undef VECTOR_WRAPPER_fFF
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func)           \
+extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE);        \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)              \
+{                                              \
+  int i;                                       \
+  VEC_TYPE mx;                                 \
+  VEC_INT_TYPE mr, mr1;                                \
+  INIT_VEC_LOOP (mx, x, VEC_LEN);              \
+  INIT_VEC_LOOP (mr, (long int)r, VEC_LEN);    \
+  INIT_VEC_LOOP (mr1, (long int)r1, VEC_LEN);  \
+  vector_func (mx, mr, mr1);                   \
+  TEST_VEC_LOOP (*(FLOAT*)mr, VEC_LEN);                \
+  TEST_VEC_LOOP (*(FLOAT*)mr1, VEC_LEN);       \
+  return;                                      \
+}
+
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVeN8vvv_sincos)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index dc09e4a..9137dbe
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -23,7 +23,29 @@ 

 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVeN16v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVeN16v_sinf)
-VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVeN16vvv_sincosf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVeN16v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf)
+
+#define VEC_INT_TYPE __m512i
+
+/* Redefinition of wrapper to be compatible with _ZGVeN16vvv_sincosf.  */
+#undef VECTOR_WRAPPER_fFF
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func)           \
+extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE, \
+                        VEC_INT_TYPE, VEC_INT_TYPE);           \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)              \
+{                                              \
+  int i;                                       \
+  VEC_TYPE mx;                                 \
+  VEC_INT_TYPE mr, mr1;                                \
+  INIT_VEC_LOOP (mx, x, VEC_LEN);              \
+  INIT_VEC_LOOP (mr, (long int)r, VEC_LEN/2);  \
+  INIT_VEC_LOOP (mr1, (long int)r1, VEC_LEN/2);        \
+  vector_func (mx, mr, mr, mr1, mr1);          \
+  TEST_VEC_LOOP (*(FLOAT*)mr, VEC_LEN/2);      \
+  TEST_VEC_LOOP (*(FLOAT*)mr1, VEC_LEN/2);     \
+  return;                                      \
+}
+
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVeN16vvv_sincosf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 0bb9818..005ad22
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -23,7 +23,29 @@ 

 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf)
-VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVbN4vvv_sincosf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf)
+
+#define VEC_INT_TYPE __m128i
+
+/* Redefinition of wrapper to be compatible with _ZGVbN4vvv_sincosf.  */
+#undef VECTOR_WRAPPER_fFF
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func)           \
+extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE, \
+                        VEC_INT_TYPE, VEC_INT_TYPE);           \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)              \
+{                                              \
+  int i;                                       \
+  VEC_TYPE mx;                                 \
+  VEC_INT_TYPE mr, mr1;                                \
+  INIT_VEC_LOOP (mx, x, VEC_LEN);              \
+  INIT_VEC_LOOP (mr, (long int)r, VEC_LEN/2);  \
+  INIT_VEC_LOOP (mr1, (long int)r1, VEC_LEN/2);        \
+  vector_func (mx, mr, mr, mr1, mr1);          \
+  TEST_VEC_LOOP (*(FLOAT*)mr, VEC_LEN/2);      \
+  TEST_VEC_LOOP (*(FLOAT*)mr1, VEC_LEN/2);     \
+  return;                                      \
+}
+
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVbN4vvv_sincosf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 4985ac2..53f4221
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -26,7 +26,29 @@ 

 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVdN8v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVdN8v_sinf)
-VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVdN8vvv_sincosf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVdN8v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf)
+
+#define VEC_INT_TYPE __m256i
+
+/* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
+#undef VECTOR_WRAPPER_fFF
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func)           \
+extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE, \
+                        VEC_INT_TYPE, VEC_INT_TYPE);           \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)              \
+{                                              \
+  int i;                                       \
+  VEC_TYPE mx;                                 \
+  VEC_INT_TYPE mr, mr1;                                \
+  INIT_VEC_LOOP (mx, x, VEC_LEN);              \
+  INIT_VEC_LOOP (mr, (long int)r, VEC_LEN/2);  \
+  INIT_VEC_LOOP (mr1, (long int)r1, VEC_LEN/2);        \
+  vector_func (mx, mr, mr, mr1, mr1);          \
+  TEST_VEC_LOOP (*(FLOAT*)mr, VEC_LEN/2);      \
+  TEST_VEC_LOOP (*(FLOAT*)mr1, VEC_LEN/2);     \
+  return;                                      \
+}
+
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVdN8vvv_sincosf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 9cc2883..12dc4b9
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -23,7 +23,31 @@ 

 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVcN8v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVcN8v_sinf)
-VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVcN8vvv_sincosf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVcN8v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf)
+
+#define VEC_INT_TYPE __m128i
+
+/* Redefinition of wrapper to be compatible with _ZGVcN8vvv_sincosf.  */
+#undef VECTOR_WRAPPER_fFF
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func)           \
+extern void vector_func (VEC_TYPE, VEC_INT_TYPE, VEC_INT_TYPE, \
+                        VEC_INT_TYPE, VEC_INT_TYPE,            \
+                        VEC_INT_TYPE, VEC_INT_TYPE,            \
+                        VEC_INT_TYPE, VEC_INT_TYPE);           \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)              \
+{                                                      \
+  int i;                                               \
+  VEC_TYPE mx;                                         \
+  VEC_INT_TYPE mr, mr1;                                        \
+  INIT_VEC_LOOP (mx, x, VEC_LEN);                      \
+  INIT_VEC_LOOP (mr, (long int)r, VEC_LEN/4);          \
+  INIT_VEC_LOOP (mr1, (long int)r1, VEC_LEN/4);                \
+  vector_func (mx, mr, mr, mr, mr, mr1, mr1, mr1, mr1);        \
+  TEST_VEC_LOOP (*(FLOAT*)mr, VEC_LEN/4);              \
+  TEST_VEC_LOOP (*(FLOAT*)mr1, VEC_LEN/4);             \
+  return;                                              \
+}
+
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVcN8vvv_sincosf)
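
(For reference, collecting the extern declarations introduced by the test
wrappers above pins down the pointer-argument layout of every
compatibility symbol touched by this patch.  Each 64-bit lane of an
integer vector argument is one destination address; the types assume
<immintrin.h>.)

extern void _ZGVbN2vvv_sincos (__m128d, __m128i, __m128i);
extern void _ZGVdN4vvv_sincos (__m256d, __m256i, __m256i);
extern void _ZGVcN4vvv_sincos (__m256d, __m128i, __m128i,
                               __m128i, __m128i);
extern void _ZGVeN8vvv_sincos (__m512d, __m512i, __m512i);
extern void _ZGVbN4vvv_sincosf (__m128, __m128i, __m128i,
                                __m128i, __m128i);
extern void _ZGVdN8vvv_sincosf (__m256, __m256i, __m256i,
                                __m256i, __m256i);
extern void _ZGVcN8vvv_sincosf (__m256, __m128i, __m128i, __m128i, __m128i,
                                __m128i, __m128i, __m128i, __m128i);
extern void _ZGVeN16vvv_sincosf (__m512, __m512i, __m512i,
                                 __m512i, __m512i);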