Date: Wed, 17 Sep 2014 12:08:49 +0200
From: Jakub Jelinek
To: Andrew Senkevich
Cc: "H.J. Lu", "Carlos O'Donell", "Joseph S. Myers", libc-alpha, "Zamyatin, Igor", "Melik-Adamyan, Areg"
Subject: Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
Message-ID: <20140917100849.GD17454@tucnak.redhat.com>

On Wed, Sep 17, 2014 at 01:56:06PM +0400, Andrew Senkevich wrote:
> > The wiki says:
> >
> > 3.1. Goal
> >
> > Main goal is to improve vectorization of GCC with OpenMP 4.0 SIMD
> > constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf)
> > and Cilk Plus constructs (6-7 in
> > http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
> > on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
> > several vector math functions (float and double versions). AVX-512
> > versions are planned to be added later. These functions can also be
> > used manually (with intrinsics) by developers to obtain speedup.
> >
> > It is the opposite of
> >
> > https://sourceware.org/ml/libc-alpha/2014-09/msg00277.html
> >
> > which is for programmers to use them directly in their
> > applications, mostly independent of compilers.
> >
> > We need to come to an agreement on what the goal is first.
> >
> > --
> > H.J.
>
> Hi H.J.,
>
> of course the first goal is to improve vectorization. Usage with
> intrinsics is an additional goal and is not very significant.
>
> Attached is the first patch, corrected according to the last comments in
> https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html.

--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -46,6 +46,17 @@
 # error "Never include <bits/mathcalls.h> directly; include <math.h> instead."
 #endif
 
+#undef __DECL_SIMD
+
+/* For now we have vectorized version only for _Mdouble_ case */
+#if !defined _Mfloat_ && !defined _Mlong_double_
+# if defined _OPENMP && _OPENMP >= 201307
+# define __DECL_SIMD _Pragma ("omp declare simd")

As the function is provided only on x86_64, it needs to be guarded by
defined __x86_64__ too (or there must be some way for arch-specific
headers to tell which functions are elemental). Also, only the N
(notinbranch) version is provided, so you'd need to use "omp declare
simd notinbranch". Furthermore, only the AVX2 version is provided.
That is not possible for gcc: you need all of the SSE2, AVX and AVX2
versions. The other two can be thunked, i.e. extract the arguments and
call cos in a loop or similar, then pass the result back in a vector
reg.

	Jakub