From patchwork Tue Jun 9 21:32:52 2020
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 39536
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Subject: [PATCH 04/13] powerpc: Use sqrt{f} builtin
Date: Tue, 9 Jun 2020 18:32:52 -0300
Message-Id: <20200609213301.3591135-4-adhemerval.zanella@linaro.org>
In-Reply-To: <20200609213301.3591135-1-adhemerval.zanella@linaro.org>
References: <20200609213301.3591135-1-adhemerval.zanella@linaro.org>

The powerpc sqrt implementation is also simplified:

  - the static constants are open-coded within the implementation.
  - for !USE_SQRT_BUILTIN the function is implemented directly in
    __ieee754_sqrt, which avoids a superfluous extra jump through
    __slow_ieee754_sqrt.

Checked on powerpc-linux-gnu and powerpc64le-linux-gnu.

Reviewed-by: Paul A. Clarke
---
 sysdeps/powerpc/fpu/e_sqrt.c                 | 57 ++++++--------------
 sysdeps/powerpc/fpu/e_sqrtf.c                | 56 ++++++-------------
 sysdeps/powerpc/fpu/math-use-builtins-sqrt.h |  9 ++++
 3 files changed, 42 insertions(+), 80 deletions(-)
 create mode 100644 sysdeps/powerpc/fpu/math-use-builtins-sqrt.h

diff --git a/sysdeps/powerpc/fpu/e_sqrt.c b/sysdeps/powerpc/fpu/e_sqrt.c
index a47f77966f..505ae72339 100644
--- a/sysdeps/powerpc/fpu/e_sqrt.c
+++ b/sysdeps/powerpc/fpu/e_sqrt.c
@@ -18,22 +18,16 @@
 
 #include
 #include
-#include
 #include
-#include
-#include
-#include
-#include
 #include
+#include
 
-#ifndef _ARCH_PPCSQ
-static const double almost_half = 0.5000000000000001; /* 0.5 + 2^-53 */
-static const ieee_float_shape_type a_nan = {.word = 0x7fc00000 };
-static const ieee_float_shape_type a_inf = {.word = 0x7f800000 };
-static const float two108 = 3.245185536584267269e+32;
-static const float twom54 = 5.551115123125782702e-17;
-extern const float __t_sqrt[1024];
-
+double
+__ieee754_sqrt (double x)
+{
+#if USE_SQRT_BUILTIN
+  return __builtin_sqrt (x);
+#else
 /* The method is based on a description in
    Computation of elementary functions on the IBM RISC System/6000 processor,
    P. W. Markstein, IBM J. Res. Develop, 34(1) 1990.
@@ -48,10 +42,7 @@ extern const float __t_sqrt[1024];
    generated guesses (which mostly runs on the integer unit, while the
    Newton-Raphson is running on the FPU). */
 
-double
-__slow_ieee754_sqrt (double x)
-{
-  const float inf = a_inf.value;
+  extern const float __t_sqrt[1024];
 
   if (x > 0)
     {
@@ -60,7 +51,7 @@ __slow_ieee754_sqrt (double x)
       ieee_double_shape_type ew_u;
       ieee_double_shape_type iw_u;
       ew_u.value = (x);
-      if (x != inf)
+      if (x != INFINITY)
         {
           /* Variables named starting with 's' exist in the
              argument-reduced space, so that 2 > sx >= 0.5,
@@ -112,7 +103,7 @@ __slow_ieee754_sqrt (double x)
           INSERT_WORDS (fsg, fsgi, 0);
           iw_u.parts.msw = fsgi;
           iw_u.parts.lsw = (0);
-          e = -__builtin_fma (sy, sg, -almost_half);
+          e = -__builtin_fma (sy, sg, -0x1.0000000000001p-1);
           sd = -__builtin_fma (sg, sg, -sx);
           if ((xi0 & 0x7ff00000) == 0)
             goto denorm;
@@ -122,7 +113,7 @@ __slow_ieee754_sqrt (double x)
           sy2 = sy + sy;
           /* complete the INSERT_WORDS (fsg, fsgi, 0) operation. */
           fsg = iw_u.value;
-          e = -__builtin_fma (sy, sg, -almost_half);
+          e = -__builtin_fma (sy, sg, -0x1.0000000000001p-1);
           sd = -__builtin_fma (sg, sg, -sx);
           sy = __builtin_fma (e, sy2, sy);
           shx = sx * fsg;
@@ -131,7 +122,7 @@ __slow_ieee754_sqrt (double x)
              rounded incorrectly. */
           sy2 = sy + sy;
           g = sg * fsg;
-          e = -__builtin_fma (sy, sg, -almost_half);
+          e = -__builtin_fma (sy, sg, -0x1.0000000000001p-1);
           d = -__builtin_fma (g, sg, -shx);
           sy = __builtin_fma (e, sy2, sy);
           fesetenv_register (fe);
@@ -140,38 +131,24 @@ __slow_ieee754_sqrt (double x)
           /* For denormalised numbers, we normalise, calculate the
              square root, and return an adjusted result. */
           fesetenv_register (fe);
-          return __slow_ieee754_sqrt (x * two108) * twom54;
+          return __ieee754_sqrt (x * 0x1p+108f) * 0x1p-54f;
         }
     }
   else if (x < 0)
     {
       /* For some reason, some PowerPC32 processors don't implement
          FE_INVALID_SQRT. */
-#ifdef FE_INVALID_SQRT
+# ifdef FE_INVALID_SQRT
       __feraiseexcept (FE_INVALID_SQRT);
 
       fenv_union_t u = { .fenv = fegetenv_register () };
       if ((u.l & FE_INVALID) == 0)
-#endif
+# endif
         __feraiseexcept (FE_INVALID);
-      x = a_nan.value;
+      x = NAN;
     }
   return f_wash (x);
+#endif /* USE_SQRT_BUILTIN */
 }
-#endif /* _ARCH_PPCSQ */
 
-#undef __ieee754_sqrt
-double
-__ieee754_sqrt (double x)
-{
-  double z;
-
-#ifdef _ARCH_PPCSQ
-  asm ("fsqrt %0,%1\n" :"=f" (z):"f" (x));
-#else
-  z = __slow_ieee754_sqrt (x);
-#endif
-
-  return z;
-}
 libm_alias_finite (__ieee754_sqrt, __sqrt)
diff --git a/sysdeps/powerpc/fpu/e_sqrtf.c b/sysdeps/powerpc/fpu/e_sqrtf.c
index f119dcf5d9..ae76bb1e10 100644
--- a/sysdeps/powerpc/fpu/e_sqrtf.c
+++ b/sysdeps/powerpc/fpu/e_sqrtf.c
@@ -18,22 +18,16 @@
 
 #include
 #include
-#include
 #include
-#include
-#include
-#include
-#include
 #include
+#include
 
-#ifndef _ARCH_PPCSQ
-static const float almost_half = 0.50000006; /* 0.5 + 2^-24 */
-static const ieee_float_shape_type a_nan = {.word = 0x7fc00000 };
-static const ieee_float_shape_type a_inf = {.word = 0x7f800000 };
-static const float two48 = 281474976710656.0;
-static const float twom24 = 5.9604644775390625e-8;
-extern const float __t_sqrt[1024];
-
+float
+__ieee754_sqrtf (float x)
+{
+#if USE_SQRTF_BUILTIN
+  return __builtin_sqrtf (x);
+#else
 /* The method is based on a description in
    Computation of elementary functions on the IBM RISC System/6000 processor,
    P. W. Markstein, IBM J. Res. Develop, 34(1) 1990.
@@ -48,14 +42,11 @@ extern const float __t_sqrt[1024];
    generated guesses (which mostly runs on the integer unit, while the
    Newton-Raphson is running on the FPU). */
 
-float
-__slow_ieee754_sqrtf (float x)
-{
-  const float inf = a_inf.value;
+  extern const float __t_sqrt[1024];
 
   if (x > 0)
     {
-      if (x != inf)
+      if (x != INFINITY)
         {
           /* Variables named starting with 's' exist in the
              argument-reduced space, so that 2 > sx >= 0.5,
@@ -94,7 +85,7 @@ __slow_ieee754_sqrtf (float x)
           sy2 = sy + sy;
           sg = __builtin_fmaf (sy, sd, sg);
           /* 16-bit approximation to sqrt(sx). */
-          e = -__builtin_fmaf (sy, sg, -almost_half);
+          e = -__builtin_fmaf (sy, sg, -0x1.0000020365653p-1);
           SET_FLOAT_WORD (fsg, fsgi);
           sd = -__builtin_fmaf (sg, sg, -sx);
           sy = __builtin_fmaf (e, sy2, sy);
@@ -106,7 +97,7 @@ __slow_ieee754_sqrtf (float x)
              rounded incorrectly. */
           sy2 = sy + sy;
           g = sg * fsg;
-          e = -__builtin_fmaf (sy, sg, -almost_half);
+          e = -__builtin_fmaf (sy, sg, -0x1.0000020365653p-1);
           d = -__builtin_fmaf (g, sg, -shx);
           sy = __builtin_fmaf (e, sy2, sy);
           fesetenv_register (fe);
@@ -115,38 +106,23 @@ __slow_ieee754_sqrtf (float x)
           /* For denormalised numbers, we normalise, calculate the
              square root, and return an adjusted result. */
           fesetenv_register (fe);
-          return __slow_ieee754_sqrtf (x * two48) * twom24;
+          return __ieee754_sqrtf (x * 0x1p+48) * 0x1p-24;
         }
     }
   else if (x < 0)
     {
       /* For some reason, some PowerPC32 processors don't implement
          FE_INVALID_SQRT. */
-#ifdef FE_INVALID_SQRT
+# ifdef FE_INVALID_SQRT
       feraiseexcept (FE_INVALID_SQRT);
 
       fenv_union_t u = { .fenv = fegetenv_register () };
       if ((u.l & FE_INVALID) == 0)
-#endif
+# endif
         feraiseexcept (FE_INVALID);
-      x = a_nan.value;
+      x = NAN;
     }
   return f_washf (x);
-}
-#endif /* _ARCH_PPCSQ */
-
-#undef __ieee754_sqrtf
-float
-__ieee754_sqrtf (float x)
-{
-  float z;
-
-#ifdef _ARCH_PPCSQ
-  asm ("fsqrts %0,%1\n" :"=f" (z):"f" (x));
-#else
-  z = __slow_ieee754_sqrtf (x);
-#endif
-
-  return z;
+#endif /* USE_SQRTF_BUILTIN */
 }
 libm_alias_finite (__ieee754_sqrtf, __sqrtf)
diff --git a/sysdeps/powerpc/fpu/math-use-builtins-sqrt.h b/sysdeps/powerpc/fpu/math-use-builtins-sqrt.h
new file mode 100644
index 0000000000..653309a7e7
--- /dev/null
+++ b/sysdeps/powerpc/fpu/math-use-builtins-sqrt.h
@@ -0,0 +1,9 @@
+#ifdef _ARCH_PPCSQ
+# define USE_SQRT_BUILTIN 1
+# define USE_SQRTF_BUILTIN 1
+#else
+# define USE_SQRT_BUILTIN 0
+# define USE_SQRTF_BUILTIN 0
+#endif
+#define USE_SQRTL_BUILTIN 0
+#define USE_SQRTF128_BUILTIN 0
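
The commit message says the static constants are now open-coded. As a rough sanity check (not part of the patch), the sketch below compares each removed decimal constant with the hexadecimal literal that replaces it. The function names are hypothetical and exist only for this illustration. It assumes IEEE 754 binary32/binary64 and no excess precision in float comparisons (FLT_EVAL_METHOD == 0).

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when the constants removed from
   e_sqrt.c equal the literals the patch writes inline.  */
int
sqrt_constants_match (void)
{
  const double almost_half = 0.5000000000000001;	/* 0.5 + 2^-53 */
  const float two108 = 3.245185536584267269e+32;
  const float twom54 = 5.551115123125782702e-17;

  return almost_half == 0x1.0000000000001p-1
	 && two108 == 0x1p+108f
	 && twom54 == 0x1p-54f;
}

/* Hypothetical helper: same check for the e_sqrtf.c constants.  */
int
sqrtf_constants_match (void)
{
  const float almost_half = 0.50000006;		/* 0.5 + 2^-24 */
  const float two48 = 281474976710656.0;
  const float twom24 = 5.9604644775390625e-8;

  /* The patch spells almost_half as a double literal.  Converting it
     to float, as happens for a float parameter of __builtin_fmaf,
     rounds it to 0.5 + 2^-24, the value the old constant held.  */
  return almost_half == (float) 0x1.0000020365653p-1
	 && almost_half == 0x1.000002p-1f
	 && two48 == 0x1p+48
	 && twom24 == 0x1p-24;
}

/* Aborts if any open-coded literal differs from the old constant.  */
void
check_open_coded_constants (void)
{
  assert (sqrt_constants_match ());
  assert (sqrtf_constants_match ());
}
```

Calling check_open_coded_constants () from a small test program should return silently on such a target. Note that 0x1.0000020365653p-1 is simply the double nearest to the decimal 0.50000006, so the float value used in the computation is unchanged.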