[RFC] How to add vector math functions to Glibc

Message ID	CAMXFM3uPiuJvSpgmt+8d0B1qh3QSA=TVx0ZExfojDVHzrscL8A@mail.gmail.com
State	New, archived
Headers	Received: (qmail 9689 invoked by alias); 9 Oct 2014 17:10:32 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 9655 invoked by uid 89); 9 Oct 2014 17:10:31 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-qg0-f53.google.com X-Received: by 10.140.19.36 with SMTP id 33mr57109906qgg.32.1412874625343; Thu, 09 Oct 2014 10:10:25 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <Pine.LNX.4.64.1410021411420.24886@digraph.polyomino.org.uk> References: <CAMXFM3tjquzniXP1weqxSVFJyhXqsf2PHuyrrrmqp7K0ZzORqA@mail.gmail.com> <CAMXFM3tO7MTYCq8-YFZacdbLvR4iAab_n04AuB+rp2Phs4BvQg@mail.gmail.com> <Pine.LNX.4.64.1409242011260.7597@digraph.polyomino.org.uk> <CAMXFM3tqiqUNuSU2KXvAFM-QescX3+6xUO9=z5X0Ac6C9qJ7zg@mail.gmail.com> <CAMe9rOq7bZHb8R=opUzSmAMGWjLpX21mR=Sx96cuBph=TTtDXA@mail.gmail.com> <54246CB5.7020908@redhat.com> <CAMe9rOoLmJ2jHWmERoB0M83WNKovJOgh0--Kquw9O86A1tPU0g@mail.gmail.com> <5424733D.6010305@redhat.com> <CAMe9rOpacze055qyBFzz3M-b-GNtXCqZzMmkScBL9a94zVj28g@mail.gmail.com> <54247FAB.6050002@redhat.com> <CAMXFM3v8narOLMHC5U=fvyTFWV6s4ZACN-UrAC4fAcUs9SOFfA@mail.gmail.com> <54257507.9070508@redhat.com> <CAMXFM3vOLspQtHxgJfD_Emht480w2RMbiwnEH6A_LhoS-JZFag@mail.gmail.com> <Pine.LNX.4.64.1409301620020.15186@digraph.polyomino.org.uk> <542AF92E.8090708@lip6.fr> <Pine.LNX.4.64.1409302003410.12188@digraph.polyomino.org.uk> <CAMXFM3tuM_p6Acp4hzoQ2xzR=4BZqtw8NbezqY6h8V4Xx=5hUA@mail.gmail.com> <Pine.LNX.4.64.1410021411420.24886@digraph.polyomino.org.uk> From: Andrew Senkevich <andrew.n.senkevich@gmail.com> Date: Thu, 9 Oct 2014 21:09:55 +0400 Message-ID: <CAMXFM3uPiuJvSpgmt+8d0B1qh3QSA=TVx0ZExfojDVHzrscL8A@mail.gmail.com> Subject: Re: [RFC] How to add vector math functions to Glibc To: "Joseph S. Myers" <joseph@codesourcery.com> Cc: Christoph Lauter <christoph.lauter@lip6.fr>, "Carlos O'Donell" <carlos@redhat.com>, libc-alpha <libc-alpha@sourceware.org> Content-Type: text/plain; charset=UTF-8

Message ID

CAMXFM3uPiuJvSpgmt+8d0B1qh3QSA=TVx0ZExfojDVHzrscL8A@mail.gmail.com

State

New, archived

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
MIME-Version: 1.0
In-Reply-To: <Pine.LNX.4.64.1410021411420.24886@digraph.polyomino.org.uk>
References: <CAMXFM3tjquzniXP1weqxSVFJyhXqsf2PHuyrrrmqp7K0ZzORqA@mail.gmail.com>
	<CAMXFM3tO7MTYCq8-YFZacdbLvR4iAab_n04AuB+rp2Phs4BvQg@mail.gmail.com>
	<Pine.LNX.4.64.1409242011260.7597@digraph.polyomino.org.uk>
	<CAMXFM3tqiqUNuSU2KXvAFM-QescX3+6xUO9=z5X0Ac6C9qJ7zg@mail.gmail.com>
	<CAMe9rOq7bZHb8R=opUzSmAMGWjLpX21mR=Sx96cuBph=TTtDXA@mail.gmail.com>
	<54246CB5.7020908@redhat.com>
	<CAMe9rOoLmJ2jHWmERoB0M83WNKovJOgh0--Kquw9O86A1tPU0g@mail.gmail.com>
	<5424733D.6010305@redhat.com>
	<CAMe9rOpacze055qyBFzz3M-b-GNtXCqZzMmkScBL9a94zVj28g@mail.gmail.com>
	<54247FAB.6050002@redhat.com>
	<CAMXFM3v8narOLMHC5U=fvyTFWV6s4ZACN-UrAC4fAcUs9SOFfA@mail.gmail.com>
	<54257507.9070508@redhat.com>
	<CAMXFM3vOLspQtHxgJfD_Emht480w2RMbiwnEH6A_LhoS-JZFag@mail.gmail.com>
	<Pine.LNX.4.64.1409301620020.15186@digraph.polyomino.org.uk>
	<542AF92E.8090708@lip6.fr>
	<Pine.LNX.4.64.1409302003410.12188@digraph.polyomino.org.uk>
	<CAMXFM3tuM_p6Acp4hzoQ2xzR=4BZqtw8NbezqY6h8V4Xx=5hUA@mail.gmail.com>
	<Pine.LNX.4.64.1410021411420.24886@digraph.polyomino.org.uk>
From: Andrew Senkevich <andrew.n.senkevich@gmail.com>
Date: Thu, 9 Oct 2014 21:09:55 +0400
Message-ID: <CAMXFM3uPiuJvSpgmt+8d0B1qh3QSA=TVx0ZExfojDVHzrscL8A@mail.gmail.com>
Subject: Re: [RFC] How to add vector math functions to Glibc
To: "Joseph S. Myers" <joseph@codesourcery.com>
Cc: Christoph Lauter <christoph.lauter@lip6.fr>,
	"Carlos O'Donell" <carlos@redhat.com>, 
	libc-alpha <libc-alpha@sourceware.org>
Content-Type: text/plain; charset=UTF-8

Commit Message

Andrew Senkevich Oct. 9, 2014, 5:09 p.m. UTC

  Hi all,

lets discuss changes in the testsuite, --enable-mathvec configure
option and comments for data table.
Some runtime or configure check also need to be added for running
tests only on appropriate hardware.




--
WBR,
Andrew

Comments

Andreas Schwab Oct. 9, 2014, 5:39 p.m. UTC | #1

Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:

> +      [if test -n "$(gcc -v 2>&1 | grep 'Target: x86_64')"; then :

You can get the target with -dumpmachine.  But neither takes -m32 into
account, so you'd better check the __x86_64__ predefine.

Andreas.

Joseph Myers Oct. 9, 2014, 5:45 p.m. UTC | #2

On Thu, 9 Oct 2014, Andrew Senkevich wrote:

> lets discuss changes in the testsuite, --enable-mathvec configure
> option and comments for data table.

I think the patch submission needs much more explanation (several 
paragraphs explaining what this patch does and how it relates to previous 
patch submissions and discussion in this area).  At this stage of 
discussion, the carefully written analysis of the implementation choices 
you faced and the decisions you reached, with rationale, is much more 
important than the patch itself.

> diff --git a/configure.ac b/configure.ac
> index 82d0896..c32e508 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -353,6 +353,17 @@ if test "$build_pt_chown" = yes; then
>    AC_DEFINE(HAVE_PT_CHOWN)
>  fi
> 
> +AC_ARG_ENABLE([mathvec],
> +      [AS_HELP_STRING([--enable-mathvec],
> +      [Enable building and installing mathvec @<:@default=yes on
> x86_64 build, else default=no@:>@])],
> +      [build_mathvec=$enableval],
> +      [if test -n "$(gcc -v 2>&1 | grep 'Target: x86_64')"; then :

No, you never put architecture-dependencies in the toplevel configure 
script.  Instead, the default needs to be determined by variables that may 
be set by sysdeps configure fragments.

> diff --git a/math/libm-test.inc b/math/libm-test.inc
> index f86a4fa..39901c4 100644
> --- a/math/libm-test.inc
> +++ b/math/libm-test.inc
> @@ -706,13 +706,15 @@ test_single_errno (const char *test_name, int errno_value,
>  static void
>  test_errno (const char *test_name, int errno_value, int exceptions)
>  {
> -  ++noErrnoTests;
> -  if (exceptions & ERRNO_UNCHANGED)
> -    test_single_errno (test_name, errno_value, 0, "unchanged");
> -  if (exceptions & ERRNO_EDOM)
> -    test_single_errno (test_name, errno_value, EDOM, "EDOM");
> -  if (exceptions & ERRNO_ERANGE)
> -    test_single_errno (test_name, errno_value, ERANGE, "ERANGE");
> +#ifndef TEST_MATHVEC

It would seem better to change test_single_errno where it already has a 
conditional "#ifndef TEST_INLINE".

> +/* Run tests for a given function in TONEAREST rounding modes.  */
> +#define TN_RM_TEST(FUNC, EXACT, ARRAY, LOOP_MACRO, END_MACRO, ...) \

I think you should arrange for IF_ROUND_INIT_* to return false for modes 
other than FE_TONEAREST when doing the vector tests, rather than having a 
new macro like this.

> @@ -6258,7 +6274,11 @@ static const struct test_f_f_data cos_test_data[] =
>  static void
>  cos_test (void)
>  {
> +#ifndef TEST_MATHVEC
>    ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
> +#else
> +  TN_RM_TEST (vector_cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
> +#endif
>  }

And I don't think we want conditionals like this for every function - 
indeed, the tests shouldn't need to know which functions have vector 
versions at all.

Instead, I suggest that all the testing of different variants takes place 
in the math/ directory - and in addition to testing float, double, 
ldouble, ifloat, idouble, ildoubl, that there are also cases float-vector, 
double-vector, ldouble-vector.  (I also suggest renaming the ifloat, 
idouble, ildoubl cases to match this general pattern.)

That is, there are some number of variants that may be tested for each 
floating-point type.  It may be useful for sysdeps Makefile fragments to 
be able to add to the list of variants.  math/Makefile should then arrange 
for the tests to be run for all relevant combinations of (type, variant).

> +CFLAGS-test-vec-double.c = -fno-inline -ffloat-store -fno-builtin
> -frounding-math -mavx2 -Wno-unused-function

Again, nothing architecture-specific (such as -mavx) in 
architecture-independent files.  If architecture-specific options are 
needed for testing, you need to set up a system of variables that can go 
in sysdeps Makefile fragments.  And since you can't determine at configure 
time what host the tests might run on, instruction set features such as 
AVX need testing for at runtime (this means building separate source files 
for the test with separate options so that you know the compiler won't 
generate AVX code before you've tested for AVX availability).

> +#include <immintrin.h>
> +
> +extern __m256d _ZGVdN4v_cos(__m256d);

We need an architecture-independent way of testing.  It might involve 
architecture-specific files providing information about how to map from 
the scalar function to the vector function, what vector functions are 
available, etc. - but the structure needs to have such a division between 
architecture-specific and architecture-independent files.

(I'd like tests to cover normal use via the installed headers, such as 
-fopenmp, but I think testing the vector functions directly *is* a good 
idea as well.)

> +/* General constants:
> + * lAbsMask
> + */

I really don't think these comments are sufficient to explain the 
semantics of the values.  I'm expecting comments of the form "the 
following N 64-bit values are IEEE binary64 constants a[0], a[1], ... for 
a minimax polynomial expansion a[0] + a[1]x + a[2]x^2 + ... of 
cos(x+0.125) for x in the interval [0.125,0.25]" or similar - an 
unambiguous description of exactly what the values mean / how they are 
used.  And see my previous point about defining macros for the offsets in 
this table in such a way that build errors will occur if the macro values 
are wrong.

Joseph Myers Oct. 9, 2014, 5:46 p.m. UTC | #3

On Thu, 9 Oct 2014, Andreas Schwab wrote:

> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
> 
> > +      [if test -n "$(gcc -v 2>&1 | grep 'Target: x86_64')"; then :
> 
> You can get the target with -dumpmachine.  But neither takes -m32 into
> account, so you'd better check the __x86_64__ predefine.

And it shouldn't use "gcc" at all - the compiler used is $CC.  But we've 
moved all such configuration into sysdeps configure scripts, so that's the 
right approach here.

Andrew Senkevich Oct. 10, 2014, 1:27 p.m. UTC | #4

2014-10-09 21:45 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:

>> +/* General constants:
>> + * lAbsMask
>> + */
>
> I really don't think these comments are sufficient to explain the
> semantics of the values.  I'm expecting comments of the form "the
> following N 64-bit values are IEEE binary64 constants a[0], a[1], ... for
> a minimax polynomial expansion a[0] + a[1]x + a[2]x^2 + ... of
> cos(x+0.125) for x in the interval [0.125,0.25]" or similar - an
> unambiguous description of exactly what the values mean / how they are
> used.

Table values were obtained mostly through many years of research and
experimental work, were part of old enough codes and we have no
detailed comments there either. So our proposal is to stay at current
level of comments as these codes proved their correctness and
effectiveness through many years of intensive usage in math
applications in such institutions as CERN, LLNL, etc.

> And see my previous point about defining macros for the offsets in
> this table in such a way that build errors will occur if the macro values
> are wrong.

We will follow-up, though these sources will not change often and they
have no influence on usage value.


--
WBR,
Andrew

Joseph Myers Oct. 10, 2014, 3:23 p.m. UTC | #5

On Fri, 10 Oct 2014, Andrew Senkevich wrote:

> Table values were obtained mostly through many years of research and
> experimental work, were part of old enough codes and we have no
> detailed comments there either. So our proposal is to stay at current
> level of comments as these codes proved their correctness and
> effectiveness through many years of intensive usage in math
> applications in such institutions as CERN, LLNL, etc.

So maybe you aren't sure if e.g. the values are the result of rounding to 
floating-point values a minimax polynomial approximation over the reals, 
or if they are a minimax polynomial approximation over floating-point 
values, or if they are some other kind of polynomial approximation.  But 
you can still make the comments say how they are used.

E.g. in <https://sourceware.org/ml/libc-alpha/2014-09/msg00680.html> you 
have a comment saying "Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7)))".  Now if 
you repeated that in the table, with the additional information of *what 
this is a polynomial approximation for* ((cos(x)-1)/x^2? (sin(x)-x)/x^3?), 
and *what interval the approximation is used on*, you've provided enough 
information there for someone who wants to recompute values optimized in a 
particular way to do so.

This goes together with a few other things to make the table more 
readable:

* If the values are 64-bit doubles, representing them with .quad rather 
than as pairs of .long would make things clearer.

* Where you have vectors repeating the same value eight times, using .rept 
/ .endr would make this obvious and make the source code smaller.

* Combining this with my previous suggestion in 
<https://sourceware.org/ml/libc-alpha/2014-10/msg00040.html> regarding how 
to make the offsets of table entries explicit, you could do:

/* Define a vector of eight copies of VALUE, whose offset from the
   start of the table __gnu_svml_dcos_data must be OFFSET.  */
.macro double_vector offset value
.if .-__gnu_svml_dcos_data != \offset
.err
.endif
.rept 8
.quad \value
.endr
.endm

and then define the values as

double_vector OFFSET_LABSMASK 0x7fffffffffffffff
double_vector OFFSET_LRANGEVAL 0x4160000000000000

etc. - you still need the comments explaining what each of the values is / 
how it is used, and still need the function implementation to use those 
OFFSET_* macros for offsets rather than hardcoding their values, but I 
think macro calls like this are about as clear as you can get for actually 
putting the constants into the table in a .S file.

> > And see my previous point about defining macros for the offsets in
> > this table in such a way that build errors will occur if the macro values
> > are wrong.
> 
> We will follow-up, though these sources will not change often and they
> have no influence on usage value.

Software is for people to read and modify, not just for computers to 
execute.  It's inherent to Free Software that you don't know who might be 
using or modifying it and in what way - so enough information should be 
provided in the source code that someone other than the original author 
can plausibly make local changes (e.g. changing the algorithm in a 
particular case only).

diff mbox

Patch

diff --git a/Makeconfig b/Makeconfig
index 24a3b82..4672008 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -476,7 +476,7 @@  link-libc = $(link-libc-rpath-link)
$(link-libc-before-gnulib) $(gnulib)
 link-libc-tests = $(link-libc-tests-rpath-link) \
   $(link-libc-before-gnulib) $(gnulib-tests)
 # This is how to find at build-time things that will be installed there.
-rpath-dirs = math elf dlfcn nss nis rt resolv crypt
+rpath-dirs = math elf dlfcn nss nis rt resolv crypt mathvec
 rpath-link = \
 $(common-objdir):$(subst $(empty) ,:,$(patsubst
../$(subdir),.,$(rpath-dirs:%=$(common-objpfx)%)))
 else
@@ -1018,7 +1018,7 @@  all-subdirs = csu assert ctype locale intl
catgets math setjmp signal    \
       stdlib stdio-common libio malloc string wcsmbs time dirent    \
       grp pwd posix io termios resource misc socket sysvipc gmon    \
       gnulib iconv iconvdata wctype manual shadow gshadow po argp   \
-      crypt localedata timezone rt conform debug    \
+      crypt localedata timezone rt conform debug mathvec    \
       $(add-on-subdirs) dlfcn elf

 ifndef avoid-generated

diff --git a/config.make.in b/config.make.in
index 4a781fd..09fe220 100644
--- a/config.make.in
+++ b/config.make.in
@@ -93,6 +93,7 @@  use-nscd = @use_nscd@
 build-hardcoded-path-in-tests= @hardcoded_path_in_tests@
 build-pt-chown = @build_pt_chown@
 enable-lock-elision = @enable_lock_elision@
+build-mathvect = @build_mathvec@

 # Build tools.
 CC = @CC@

diff --git a/configure.ac b/configure.ac
index 82d0896..c32e508 100644
--- a/configure.ac
+++ b/configure.ac
@@ -353,6 +353,17 @@  if test "$build_pt_chown" = yes; then
   AC_DEFINE(HAVE_PT_CHOWN)
 fi

+AC_ARG_ENABLE([mathvec],
+      [AS_HELP_STRING([--enable-mathvec],
+      [Enable building and installing mathvec @<:@default=yes on
x86_64 build, else default=no@:>@])],
+      [build_mathvec=$enableval],
+      [if test -n "$(gcc -v 2>&1 | grep 'Target: x86_64')"; then :
+    build_mathvec=yes
+  else
+    build_mathvec=no
+  fi])
+AC_SUBST(build_mathvec)
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.

diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl
index b5d599f..9899e1a 100755
--- a/math/gen-libm-test.pl
+++ b/math/gen-libm-test.pl
@@ -87,7 +87,7 @@  if ($opt_h) {
 $ulps_file = $opt_u if ($opt_u);
 $output_dir = $opt_o if ($opt_o);

-$input = "libm-test.inc";
+$input = "${srcdir}libm-test.inc";
 $auto_input = "${srcdir}auto-libm-test-out";
 $output = "${output_dir}libm-test.c";

diff --git a/math/libm-test.inc b/math/libm-test.inc
index f86a4fa..39901c4 100644
--- a/math/libm-test.inc
+++ b/math/libm-test.inc
@@ -706,13 +706,15 @@  test_single_errno (const char *test_name, int errno_value,
 static void
 test_errno (const char *test_name, int errno_value, int exceptions)
 {
-  ++noErrnoTests;
-  if (exceptions & ERRNO_UNCHANGED)
-    test_single_errno (test_name, errno_value, 0, "unchanged");
-  if (exceptions & ERRNO_EDOM)
-    test_single_errno (test_name, errno_value, EDOM, "EDOM");
-  if (exceptions & ERRNO_ERANGE)
-    test_single_errno (test_name, errno_value, ERANGE, "ERANGE");
+#ifndef TEST_MATHVEC
+      ++noErrnoTests;
+      if (exceptions & ERRNO_UNCHANGED)
+        test_single_errno (test_name, errno_value, 0, "unchanged");
+      if (exceptions & ERRNO_EDOM)
+        test_single_errno (test_name, errno_value, EDOM, "EDOM");
+      if (exceptions & ERRNO_ERANGE)
+        test_single_errno (test_name, errno_value, ERANGE, "ERANGE");
+#endif
 }

 /* Returns the number of ulps that GIVEN is away from EXPECTED.  */
@@ -1734,6 +1736,20 @@  struct test_fFF_11_data
     } \
   while (0);

+/* Run tests for a given function in TONEAREST rounding modes.  */
+#define TN_RM_TEST(FUNC, EXACT, ARRAY, LOOP_MACRO, END_MACRO, ...) \
+  do \
+    { \
+      do \
+ { \
+  START (FUNC, EXACT); \
+  LOOP_MACRO (FUNC, ARRAY, FE_TONEAREST, ## __VA_ARGS__); \
+  END_MACRO; \
+ } \
+      while (0); \
+    } \
+  while (0);
+
 /* This is to prevent messages from the SVID libm emulation.  */
 int
 matherr (struct exception *x __attribute__ ((unused)))
@@ -6258,7 +6274,11 @@  static const struct test_f_f_data cos_test_data[] =
 static void
 cos_test (void)
 {
+#ifndef TEST_MATHVEC
   ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
+#else
+  TN_RM_TEST (vector_cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
+#endif
 }

@@ -9824,6 +9844,7 @@  main (int argc, char **argv)
   initialize ();
   printf (TEST_MSG);

+#ifndef TEST_MATHVEC
   check_ulp ();

   /* Keep the tests a wee bit ordered (according to ISO C99).  */
@@ -9960,6 +9981,11 @@  main (int argc, char **argv)
   y0_test ();
   y1_test ();
   yn_test ();
+#else
+  /* Vector trigonometric functions:  */
+  cos_test ();
+
+#endif

   if (output_ulps)
     fclose (ulps_file);

diff --git a/mathvec/Makefile b/mathvec/Makefile
new file mode 100644
index 0000000..546741a
--- /dev/null
+++ b/mathvec/Makefile
@@ -0,0 +1,63 @@ 
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Makefile for the vector math library.
+
+subdir := mathvec
+
+include ../Makeconfig
+
+ifeq ($(build-mathvect),yes)
+extra-libs := libmvec
+extra-libs-others = $(extra-libs)
+endif
+
+libmvec-routines = $(strip $(libmvec-support))
+
+$(objpfx)libmvec.so: $(common-objpfx)math/libm.so
+
+# Rules for the test suite.
+ifeq ($(build-mathvect),yes)
+ifneq (no,$(PERL))
+libmvec-tests = test-vec-double
+libmvec-tests.o = $(addsuffix .o,$(libmvec-tests))
+tests = $(libmvec-tests)
+
+libmvec-tests-generated = $(common-objpfx)math/libm-test-ulps.h
$(common-objpfx)math/libm-test.c
+generated += $(libmvec-tests-generated) libmvec-test.stmp
+
+# This is needed for dependencies
+before-compile += $(common-objpfx)math/libm-test.c
+ulps-file = $(firstword $(wildcard $(sysdirs:%=%/libm-test-ulps)))
+
+$(addprefix $(objpfx), $(libmvec-tests-generated)): $(objpfx)libmvec-test.stmp
+
+$(objpfx)libmvec-test.stmp: $(ulps-file) ../math/libm-test.inc \
+ ../math/gen-libm-test.pl ../math/auto-libm-test-out
+ $(make-target-directory)
+ $(PERL) ../math/gen-libm-test.pl -u $< -o "$(common-objpfx)math/"
+ @echo > $@
+
+$(objpfx)test-vec-double.o: $(objpfx)libmvec-test.stmp
+endif
+endif
+
+CFLAGS-test-vec-double.c = -fno-inline -ffloat-store -fno-builtin
-frounding-math -mavx2 -Wno-unused-function
+
+rtld-tests-LDFLAGS += $(common-objpfx)math/libm.so $(objpfx)libmvec.so
+
+include ../Rules

diff --git a/mathvec/test-vec-double.c b/mathvec/test-vec-double.c
new file mode 100644
index 0000000..d418ac2
--- /dev/null
+++ b/mathvec/test-vec-double.c
@@ -0,0 +1,58 @@ 
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FUNC(function) function
+#define FLOAT double
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat)
Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#include <immintrin.h>
+
+extern __m256d _ZGVdN4v_cos(__m256d);
+
+double vector_cos(double x)
+{
+  int i;
+  __m256d mx = _mm256_set1_pd(x);
+  __m256d mr = _ZGVdN4v_cos(mx);
+
+  for(i=1;i<4;i++)
+  {
+    if (((double*)&mr)[0]!=((double*)&mr)[i])
+    {
+      return ((double*)&mr)[0]+0.1;
+    }
+  }
+
+  return ((double*)&mr)[0];
+}
+
+#define TEST_MATHVEC
+#define EXCEPTION_TESTS_double 0
+
+#include "../math/libm-test.c"

diff --git a/sysdeps/x86_64/fpu/libm-test-ulps
b/sysdeps/x86_64/fpu/libm-test-ulps
index 36e1b76..bdc7b56 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -905,6 +905,9 @@  idouble: 1
 ildouble: 2
 ldouble: 2

+Function: "vector_cos":
+double: 1
+
 Function: "cosh":
 double: 1
 float: 1

diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S
b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
index 0000000..0f2ff1f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
@@ -0,0 +1,492 @@ 
+/* Data for vectorized cos.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+ .section .rodata, "a"
+
+ .align 64
+ .globl __gnu_svml_dcos_data
+
+/* Data table for vector implementations of function cos.
+ * The table may contain polynomial, reduction, lookup
+ * coefficients and other constants obtained through different
+ * methods of research and experimental work.
+ */
+__gnu_svml_dcos_data:
+
+/* General constants:
+ * lAbsMask
+ */
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+
+/* lRangeVal */
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+
+/* HalfPI */
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+
+/* InvPI */
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+
+/* RShifter */
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+
+/* OneHalf */
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+
+/* Range reduction PI-based constants:
+ * PI1
+ */
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+
+/* PI2 */
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+
+/* PI3 */
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+
+/* PI4 */
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+
+/* Range reduction PI-based constants if FMA available:
+ * PI1_FMA
+ */
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+
+/* PI2_FMA */
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+
+/* PI3_FMA */
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+
+/* Polynomial coeffifients (relative error 2^(-52.115)):
+ * C1
+ */
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+
+/* C2 */
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+
+/* C3 */
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+
+/* C4 */
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+
+/* C5 */
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+
+/* C6 */
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+
+/* C7 */
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+
+/* Additional constants:
+ * AbsMask
+ */
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+
+/* InvPI */
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+
+/* RShifter_la */
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+
+/* RShifter_la */
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+
+/* RSXmax_la */
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .type __gnu_svml_dcos_data,@object
+ .size __gnu_svml_dcos_data,.-__gnu_svml_dcos_data