From patchwork Tue Dec 5 13:47:42 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 24736 Received: (qmail 75588 invoked by alias); 5 Dec 2017 13:47:54 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 75568 invoked by uid 89); 5 Dec 2017 13:47:53 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS, URIBL_RED autolearn=ham version=3.3.2 spammy= X-HELO: mail-qt0-f180.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=T+COJJ696kuIaaXGyfuIXJmF3AyniY5GSERqdXnRDxc=; b=Fg05y0nKFGV+DV17owTbLiKQx3sBJ5YAXDMu1KIxDMdjcPwK+Wjgr2S3PiJY85XamF ztQCEwvrZDCTAso+0/epd5fxMITSJPsFJLnLm4ouLqa2JNHA9NjltEBpVJbeB/Up2x4F VuFNB0VUSiRbsEfmM/0g2YgeccLi1hVMkFr9AZF4uiwuC+78RKmOkRcPrH/aX1N1fKws hl01IrlxGqCioYIyMFav6q7IvofZV2RrOajp3xpBZchwguDyOGk6kJLatbmltgW4vm1H kEasoS1iPCkfr5dliVM/mDupgfDp9MdvefN/N0qdK0OFYz3qJFxW9xVekTbYzWBEdPRl YQhg== X-Gm-Message-State: AKGB3mJAN7xvM7wrSVu6DA66f2v/XWyzCr3DUQV+85PiVJ5aAqGuGHut U8zHet/N2CiwPQ4jj/lhSzUOE2OrY08= X-Google-Smtp-Source: AGs4zMacB8QnIr69q7kmfBGVA/TqXW3Wvq7UW9fbWbPy9ftTvjVFwuLYsq9RZIdGHSMnSnODeQ2hJQ== X-Received: by 10.200.45.215 with SMTP id q23mr2147898qta.134.1512481667706; Tue, 05 Dec 2017 05:47:47 -0800 (PST) Subject: Re: [PATCH] x86-64: Add sinf with FMA To: "H.J. Lu" , Joseph Myers Cc: GNU C Library References: <20171204180905.GA31592@gmail.com> <3c53189f-818f-0473-9ccd-1c0ecf40ab1c@linaro.org> From: Adhemerval Zanella Message-ID: Date: Tue, 5 Dec 2017 11:47:42 -0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: On 04/12/2017 20:42, H.J. Lu wrote: > On Mon, Dec 4, 2017 at 12:59 PM, Joseph Myers wrote: >> On Mon, 4 Dec 2017, H.J. Lu wrote: >> >>> Is >>> >>> (*__errno_location ()) = xxx >> >> If anything I'd expect a direct TLS initial-exec access to errno to be >> faster. > > I will update x86-64 s_sinf.S to use errno. > >>> faster? On x86-64, there should be no call to __floor. >> >> The x86_64 __floor inline in math_private.h is only when compiling glibc >> for SSE4.1 or later. >> >> The case of inlining floor / __floor and related functions for x86_64 >> without SSE4.1 is tricky. Supposing we had appropriate asm redirects to >> allow libm to call floor / ceil / trunc etc. directly so the compiler >> could inline them but __* are still called if not inlined, the default >> SSE2 inlines would come into play. But those inlines are slower on SSE4.1 >> hardware than an out-of-line call to the floor / ceil / trunc IFUNC, so if >> you're building a generic SSE2 glibc that may well be used on SSE4.1 >> hardware, you may wish either to avoid those inlines or, if there is a >> significant performance difference in benchmarks, have an SSE4.1 IFUNC of >> the calling function using floor (or __floor, with the present inline). >> >> The expf etc. set of optimized float functions have several different >> choices of how conversions to integer are handled, which may be configured >> by an architecture. That may make sense in other cases as well. > > x86-64 s_sinf.S avoids floor () with cvttsd2si. I don't think it can be used > with generic sinf. > I think we can the compiler builtins for math_private.h as a fallback for non SSE 4.1 as: --- @@ -109,11 +116,15 @@ extern __always_inline double __floor (double d) { double res; +#ifdef __SSE4_1__ # if defined __AVX__ || defined SSE2AVX asm ("vroundsd $1, %1, %0, %0" : "=x" (res) : "xm" (d)); # else asm ("roundsd $1, %1, %0" : "=x" (res) : "xm" (d)); # endif +#else + res = __builtin_floor (d); +#endif return res; } --- As least for GCC7 compiler will expand to an inline version even for generic tune option. However the code that actually calls __floor is not showing on profiling, but rather int to fp conversion. Generic implementation shows on my system (gcc version 7.1.1, i7-4790K): "sinf": { "": { "duration": 4.00221e+10, "iterations": 1.4128e+09, "max": 587.555, "min": 12.747, "mean": 28.3281 } And with a simple modification to avoid int to fp conversion: --- --- I get: "sinf": { "": { "duration": 4.0015e+10, "iterations": 1.4535e+09, "max": 640.456, "min": 11.437, "mean": 27.5301 } Which is roughly 3% on mean and 11.5% on min. I think we can improve it even more by avoiding the int to fp conversion to get the sign right and try operate with sign as double argument. diff --git a/sysdeps/ieee754/flt-32/s_sinf.c b/sysdeps/ieee754/flt-32/s_sinf.c index 40d3d19..a2fd3cf 100644 --- a/sysdeps/ieee754/flt-32/s_sinf.c +++ b/sysdeps/ieee754/flt-32/s_sinf.c @@ -75,7 +75,7 @@ static const double invpio4_table[] = { 0x1.0e4107cp-169 }; -static const int ones[] = { +1, -1 }; +static const double ones[] = { 1.0, -1.0 }; /* Compute the sine value using Chebyshev polynomials where THETA is the range reduced absolute value of the input @@ -92,7 +92,7 @@ reduced (const double theta, const unsigned long int n, const double theta2 = theta * theta; /* We are operating on |x|, so we need to add back the original signbit for sinf. */ - int sign; + double sign; /* Determine positive or negative primary interval. */ sign = ones[((n >> 2) & 1) ^ signbit]; /* Are we in the primary interval of sin or cos? */