[1/N] x86_64 vectorization support: vectorized math functions addition to Glibc

Message ID 20140917100849.GD17454@tucnak.redhat.com
State Rejected
Headers

Commit Message

Jakub Jelinek Sept. 17, 2014, 10:08 a.m. UTC
  On Wed, Sep 17, 2014 at 01:56:06PM +0400, Andrew Senkevich wrote:
> > The wiki says:
> >
> > 3.1. Goal
> >
> > Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
> > constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
> > and Cilk Plus constructs (6-7 in
> > http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
> > on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
> > several vector math functions (float and double versions). AVX-512
> > versions are planned to be added later. These functions can be also
> > used manually (with intrincics) by developers to obtain speedup.
> >
> > It is the opposite of
> >
> > https://sourceware.org/ml/libc-alpha/2014-09/msg00277.html
> >
> > which is for programmers to use them directly in their
> > applications, mostly independent of compilers.
> >
> > We need to come to an agreement on what goal is first.
> >
> > --
> > H.J.
> 
> Hi H.J.,
> 
> of course the first goal is to improve vectorization. Usage with
> intrinsics is additional goal and is not very significant.
> 
> Attached first patch corrected according last comments in
> https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html.

you need all of SSE2, AVX and AVX2 versions, the other two can be
thunked (extract arguments and call cos in a loop or similarly, then
pass result in vector reg again).

	Jakub
  

Comments

Andrew Senkevich Sept. 17, 2014, 11:57 a.m. UTC | #1
Hi Jakub,

> +/* For now we have vectorized version only for _Mdouble_ case */
> +#if !defined _Mfloat_ && !defined _Mlong_double_
> +# if defined _OPENMP && _OPENMP >= 201307
> +#  define __DECL_SIMD _Pragma ("omp declare simd")
>
> As the function is provided only on x86_64, it needs to be guarded
> by defined __x86_64__ too (or have some way how arch specific
> headers can tell what function are elemental).
> Also, only the N (notinbranch) version is provided, so you'd
> need to use "omp declare simd notinbranch", and furthermore only
> the AVX2 version is provided (that is not possible for gcc,
> you need all of SSE2, AVX and AVX2 versions, the other two can be
> thunked (extract arguments and call cos in a loop or similarly, then
> pass result in vector reg again).

thank you, this place will look so:

#if defined (__x86_64__) && !defined _Mfloat_ && !defined _Mlong_double_ \
     && defined _OPENMP && _OPENMP >= 201307
# define __DECL_SIMD _Pragma ("omp declare simd notinbranch")
#else
# define __DECL_SIMD
#endif

Only AVX will be thunked, SSE4 we have implemented.
It will be added in next patch soon.
  
Joseph Myers Sept. 17, 2014, 12:34 p.m. UTC | #2
On Wed, 17 Sep 2014, Andrew Senkevich wrote:

> thank you, this place will look so:
> 
> #if defined (__x86_64__) && !defined _Mfloat_ && !defined _Mlong_double_ \
>      && defined _OPENMP && _OPENMP >= 201307
> # define __DECL_SIMD _Pragma ("omp declare simd notinbranch")
> #else
> # define __DECL_SIMD
> #endif

No, we never put architecture conditionals like that in 
architecture-independent headers.  You have to use bits/*.h headers 
(per-architecture) instead to provide all the information about which 
functions have which vectorized versions.  See what I suggested in 
<https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html> about macros 
such as __DECL_SIMD_COS_DOUBLE.

There's no point in sending more patch revisions until there's consensus 
on the overall implementation approach.  Please go back several steps and 
start an architecture-independent discussion in a new 
architecture-independent thread seeking consensus on all the points I 
raised in <https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html>.  
State your initial answers to these points so people can agree or disagree 
with them, but be prepared to change following the results of the 
discussion.  Only once there is such consensus, write up the results on 
the wiki, seek confirmation on the list that other people agree that what 
you wrote up is an accurate representation of the consensus, and only then 
start sending patches (which will probably put functions in libmvec, not 
libm, based on the discussions so far).  The patches have to follow the 
consensus, not your preference if the consensus goes against what you 
prefer.

<https://sourceware.org/glibc/wiki/libm> does not in any way reflect any 
sort of consensus.  It's *ideas and proposals*.  Consensus can only be 
reached after discussions on libc-alpha (and if no-one comments on 
something, that's not consensus).  In general, proposals are best posted 
to libc-alpha so it's easy to tell what was being discussed in a 
particular thread - proposals on wiki pages make it harder to follow the 
discussion, as you need to work out which version of the wiki page someone 
was referring to.

Note that if we're relying on #pragma omp declare simd meaning a precise 
set of function versions are available - with a guarantee that no future 
compiler version will interpret is also meaning an AVX512 version is 
available, for example, so that it's safely possible to use older glibc 
with a newer compiler - there should be some sort of ABI document 
(preferably compiler-independent) stating that this is the meaning of that 
pragma on x86_64 and that this pragma says nothing about availability of 
function versions for other vector extensions and that if an ABI is 
defined for such versions in future, it will use a different pragma to 
declare their availability.  Is there such an ABI document available that 
defines what this pragma means on x86_64?
  

Patch

--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -46,6 +46,17 @@ 
 # error "Never include <bits/mathcalls.h> directly; include <math.h> instead."
 #endif
 
+#undef __DECL_SIMD
+
+/* For now we have vectorized version only for _Mdouble_ case */
+#if !defined _Mfloat_ && !defined _Mlong_double_
+# if defined _OPENMP && _OPENMP >= 201307
+#  define __DECL_SIMD _Pragma ("omp declare simd")

As the function is provided only on x86_64, it needs to be guarded
by defined __x86_64__ too (or have some way how arch specific
headers can tell what function are elemental).

Also, only the N (notinbranch) version is provided, so you'd
need to use "omp declare simd notinbranch", and furthermore only
the AVX2 version is provided (that is not possible for gcc,