From patchwork Wed Sep 17 10:08:49 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek <jakub@redhat.com>
X-Patchwork-Id: 2877
Received: (qmail 22635 invoked by alias); 17 Sep 2014 10:09:04 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 22625 invoked by uid 89); 17 Sep 2014 10:09:03 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.7 required=5.0 tests=AWL, BAYES_00,
	RP_MATCHES_RCVD, SPF_HELO_PASS,
	SPF_PASS autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Date: Wed, 17 Sep 2014 12:08:49 +0200
From: Jakub Jelinek <jakub@redhat.com>
To: Andrew Senkevich <andrew.n.senkevich@gmail.com>
Cc: "H.J. Lu" <hjl.tools@gmail.com>, "Carlos O'Donell" <carlos@redhat.com>,
	"Joseph S. Myers" <joseph@codesourcery.com>,
	libc-alpha <libc-alpha@sourceware.org>,
	"Zamyatin, Igor" <igor.zamyatin@intel.com>,
	"Melik-Adamyan, Areg" <areg.melik-adamyan@intel.com>
Subject: Re: [PATCH 1/N] x86_64 vectorization support: vectorized math
	functions addition to Glibc
Message-ID: <20140917100849.GD17454@tucnak.redhat.com>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: 
 <CAMXFM3t=ppndDUBzHzSus7xyuF5hTaLFZ5b273jD39NtddSvsw@mail.gmail.com>
	<Pine.LNX.4.64.1409101549490.12853@digraph.polyomino.org.uk>
	<5411F8D3.7050001@redhat.com>
	<CAMXFM3vEbTO1ntx7KOAG21axosPApTG6vwpcnu7B4VVATD+USw@mail.gmail.com>
	<CAMe9rOqFmwMWYBSsg9gPNeB_nskWZMSpzeWwc=YomsTNzjCn1A@mail.gmail.com>
	<CAMXFM3uNrRrAHDdS0LnbRZ7QwEFv1yd25cu1Ht2NC8fMBxLsBA@mail.gmail.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: 
 <CAMXFM3uNrRrAHDdS0LnbRZ7QwEFv1yd25cu1Ht2NC8fMBxLsBA@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)

On Wed, Sep 17, 2014 at 01:56:06PM +0400, Andrew Senkevich wrote:
> > The wiki says:
> >
> > 3.1. Goal
> >
> > Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
> > constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
> > and Cilk Plus constructs (6-7 in
> > http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
> > on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
> > several vector math functions (float and double versions). AVX-512
> > versions are planned to be added later. These functions can be also
> > used manually (with intrincics) by developers to obtain speedup.
> >
> > It is the opposite of
> >
> > https://sourceware.org/ml/libc-alpha/2014-09/msg00277.html
> >
> > which is for programmers to use them directly in their
> > applications, mostly independent of compilers.
> >
> > We need to come to an agreement on what goal is first.
> >
> > --
> > H.J.
> 
> Hi H.J.,
> 
> of course the first goal is to improve vectorization. Usage with
> intrinsics is additional goal and is not very significant.
> 
> Attached first patch corrected according last comments in
> https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html.

you need all of SSE2, AVX and AVX2 versions, the other two can be
thunked (extract arguments and call cos in a loop or similarly, then
pass result in vector reg again).

	Jakub

--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -46,6 +46,17 @@
 # error "Never include <bits/mathcalls.h> directly; include <math.h> instead."
 #endif
 
+#undef __DECL_SIMD
+
+/* For now we have vectorized version only for _Mdouble_ case */
+#if !defined _Mfloat_ && !defined _Mlong_double_
+# if defined _OPENMP && _OPENMP >= 201307
+#  define __DECL_SIMD _Pragma ("omp declare simd")

As the function is provided only on x86_64, it needs to be guarded
by defined __x86_64__ too (or have some way how arch specific
headers can tell what function are elemental).

Also, only the N (notinbranch) version is provided, so you'd
need to use "omp declare simd notinbranch", and furthermore only
the AVX2 version is provided (that is not possible for gcc,