From patchwork Wed Mar 25 19:22:24 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 132302 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from vm01.sourceware.org (localhost [127.0.0.1]) by sourceware.org (Postfix) with ESMTP id 2002C4BB58DD for ; Wed, 25 Mar 2026 19:29:50 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2002C4BB58DD Authentication-Results: sourceware.org; dkim=pass (2048-bit key, unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=PvMZ43MK X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-vk1-xa2a.google.com (mail-vk1-xa2a.google.com [IPv6:2607:f8b0:4864:20::a2a]) by sourceware.org (Postfix) with ESMTPS id 83B844BB58FD for ; Wed, 25 Mar 2026 19:24:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 83B844BB58FD Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 83B844BB58FD Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::a2a ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1774466660; cv=none; b=FUcckrlZ83f+AzscC3x+LC8xLpRoQ3JblxJVRyQkuWxHpsivBoXUAGAuSdmRjgAxXjWdjYpfMx0DbiOqGKIEPJawpcP+JetecliXjlk/K8uKarr1Z/LMULs+IX6TO6Ojcalg/YKiusckfRhWyG1ngcdGAKhnfBJ5oH4jIZAtnhI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1774466660; c=relaxed/simple; bh=zF/1FZOD/7LPrc/5zxp14Vl+KWQdlXEiiAjlt6rOzmA=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=iSfJavIDN+EpcOdpZPEzzlvsH7jYYeyGh+rTuMeToaBMMotQBkw8PWm71wDPVObAzkUi59to3Kr420bjhl7okFS/35FB7UgzUF9vl95GssOonqEPBCYMkwo5ZS9J8YjgyDx1ajmJajjz2cvwrHyp7ur0b0OJlPphy56iyvmIQHI= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 83B844BB58FD Received: by mail-vk1-xa2a.google.com with SMTP id 71dfb90a1353d-56ba039eecbso125772e0c.0 for ; Wed, 25 Mar 2026 12:24:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1774466659; x=1775071459; darn=sourceware.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CdXcMji6L+FMz7B84UL4n5RBn9XLl+7+jjshZzc/W9U=; b=PvMZ43MKN7vh60/vkgfnuS1H3Hb12XVSK9v9MvDXMqYHEsIFSxPDMptebBENiE2cnt npwtMCGNejpDxamU0mYMqcss7y2dgurgiwmqm8r5fG9brBhojAck2QsBaBCoZhTHw/EO lw3uPH1TbbclOH97JJ6ds5cO6J4aclQAd4FEzS9xI7bC9tQCuRgr2zF/DdMs5hj7FhMb WBeyGeYSpKXpe38r4d9d3L337Bl2vhNL4skZA9o0h2mOlmh0m7mRW6IDMNfEYCfYeUWj 2rrON+seICaI/dgZgutjrLfEQ71A1FLKGqSgxuoNV0BOalVrCYDh2qljj6j1uebAKgUN ZjEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774466659; x=1775071459; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=CdXcMji6L+FMz7B84UL4n5RBn9XLl+7+jjshZzc/W9U=; b=Yf5ytJI4uMF/4+M5T/BBaD8lR54h62PDzT71K88GgLrLx7Oe/91CW0ZDN58g1TNRm4 gvm7Gt6Fu3Nfcchxm7L/xWnKtVwUo6Rrq76j205gvNm0koNTOd76FMDqhsV2O4Ly5PTb lFUOYWNuXUibQU9HdN/iDNSJAmDOluW628Vaqbv4Jkgt78sKiY+edIVTa9kApt52A7yz PBB5WdfKVT7jfGSdaG+9mmUqzxf8yKTJZRdZCinSx1QQa0bZ0UTYQ/PqYGO9qgpw97hf cFy+ZWT8Xepo/Yq8gozoq8Jsz0KzC1+jARCSQV0wOo4tGB+FKniLxC5aepTkq7Fqbd/H 7/yA== X-Gm-Message-State: AOJu0YwFjzrYnEBmAWHd3W/mPt1nQEclJFJHdHk1A+akWWw9i495+vze vpXn5+yyUq3FNicEhLzfcE7l1/JG3FSbULI3cDmdP+sDrp1/VRPgj+bres1QLa3C+bv+GmNLxSQ sx0QF X-Gm-Gg: ATEYQzy33troz1vERI3NFJ3lHUlEZMtYfE/aBqjC0y+BTS4iyUmvQ2M2+laM2QwbbK4 ikQHLkmnfgZLDQozHObDid1W7H5STK60g5VAyX14rqEzYsVwY+utyAqS1rh8KufBGIlaBCIQ7GU zcRBSRfa67/21w2ZGQie0sW+ztiEwv+00VUhnBHmGA088It8V4yaKJ9AG2dwt9u3B1J0pDxXofa 6jgT4cykjGFXwBmuiYWEWKnhEruxdgujmKWmVCXOQSu2p7u1rXxqsGmwzoiRRoeUKHSZPiVZNHK uux1WtAPVpyT2KT+27Cw+ze+yC6xwnraJdD/9BZpEXcJ2OX5Cuhxp4KHf9C1wR0i6J9z/l3iwrg 4pPOxlx7QVjSDSWY5kc03ZvnGbbGprdOaaCE97J8DAGyViqPX+kdRAodP48ygUtDKdJ5Jcm3hI1 k356q6eb7u/K8WEHXyo4QoYC4JdP0MCeLNEAo= X-Received: by 2002:a05:6122:d96:b0:56a:9841:9f81 with SMTP id 71dfb90a1353d-56d21f50c1fmr2651154e0c.6.1774466659410; Wed, 25 Mar 2026 12:24:19 -0700 (PDT) Received: from mandiga.. ([2804:1b3:a7c1:90ea:f31d:ca7d:f8dd:c20b]) by smtp.gmail.com with ESMTPSA id 71dfb90a1353d-56d31d394fcsm1040021e0c.12.2026.03.25.12.24.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Mar 2026 12:24:18 -0700 (PDT) From: Adhemerval Zanella To: libc-alpha@sourceware.org Cc: Paul Zimmermann , DJ Delorie Subject: [PATCH 6/8] math: Use sincosf from CORE-MATH Date: Wed, 25 Mar 2026 16:22:24 -0300 Message-ID: <20260325192357.1284741-7-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260325192357.1284741-1-adhemerval.zanella@linaro.org> References: <20260325192357.1284741-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE, SPF_PASS, TXREP, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org The CORE-MATH implementation is correctly rounded (in any rounding mode) and performs similarly. * latency - input: random (-2*pi, 2*pi) latency master patched improvement x86_64 15.4175 21.4282 -38.99% x86_64v2 14.0476 14.3099 -1.87% x86_64v3 10.9744 11.6395 -6.06% aarch64 9.2646 10.2755 -10.91% power10 5.2857 4.8504 8.24% - input: large (-64, 64) latency master patched improvement x86_64 15.8593 22.4521 -41.57% x86_64v2 14.7728 15.0530 -1.90% x86_64v3 12.5962 13.3065 -5.64% aarch64 9.2468 11.5874 -25.31% power10 6.3992 5.5319 13.55% - input: huge (-1024, 1024) latency master patched improvement x86_64 19.6595 22.7598 -15.77% x86_64v2 15.0780 15.3032 -1.49% x86_64v3 12.6576 13.3942 -5.82% aarch64 10.5701 10.9711 -3.79% power10 7.3381 5.1920 29.25% * reciprocal-throughput - input: random (-2*pi, 2*pi) master patched improvement x86_64 15.3955 21.4922 -39.60% x86_64v2 14.0497 14.3085 -1.84% x86_64v3 10.9702 11.6287 -6.00% aarch64 9.1699 10.0887 -10.02% power10 5.3090 4.8565 8.52% - input: rlarge (-64, 64) master patched improvement x86_64 15.8584 22.4857 -41.79% x86_64v2 14.7689 15.0531 -1.92% x86_64v3 11.8228 12.4646 -5.43% aarch64 9.2013 11.5711 -25.76% power10 6.4112 5.5249 13.82% - input: huge (-1024, 1024) master patched improvement x86_64 19.6392 22.7250 -15.71% x86_64v2 15.0745 15.2802 -1.36% x86_64v3 12.0551 12.7136 -5.46% aarch64 10.5383 10.9630 -4.03% power10 7.2801 5.1874 28.74% The benchmarks were run on x64_64 (Ryzen 9 5900X, gcc 15.2.1) with --disable-multi-arch disable; aarch64 (Neoverse-N1, gcc 15.2.1), and powerpc (POWER10, gcc 15.2.1). The code is adapted to glibc style and to use the math_config definition.h (to handle errno, overflow, and underflow). The x86_64-v1 shows some performance regression because the CORE-MATH implementation relies on roundeven. The x86_64v2 and forward provide a specific instruction, and x86_64 already provides an FMA ifunc variant. Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. --- SHARED-FILES | 2 + math/auto-libm-test-in | 1 + math/auto-libm-test-out-sincos | 25 ++ sysdeps/ieee754/flt-32/libm-test-ulps | 12 + sysdeps/ieee754/flt-32/s_sincosf.c | 254 +++++++++++++----- sysdeps/ieee754/flt-32/s_sincosf_data.h | 13 + .../ieee754/flt-32/s_sincosf_data_generic.c | 13 + 7 files changed, 247 insertions(+), 73 deletions(-) diff --git a/SHARED-FILES b/SHARED-FILES index 751b31fae4..68413ec08f 100644 --- a/SHARED-FILES +++ b/SHARED-FILES @@ -310,6 +310,8 @@ core-math: sysdeps/ieee754/flt-32/s_log2p1f.c # src/binary32/sin/sinf.c revision 8ea8ea35 sysdeps/ieee754/flt-32/s_sinf.c + # src/binary32/sincos/sincosf.c revision 8ea8ea35 + sysdeps/ieee754/flt-32/s_sincosf.c # src/binary32/sinpi/sinpif.c, revision bbfabd99d sysdeps/ieee754/flt-32/s_sinpif.c # src/binary32/tan/tanf.c, revision 59d21d7 diff --git a/math/auto-libm-test-in b/math/auto-libm-test-in index 4f38095453..2574957ada 100644 --- a/math/auto-libm-test-in +++ b/math/auto-libm-test-in @@ -9499,6 +9499,7 @@ sincos 1e22 sincos 0x1p1023 sincos 0x1p16383 sincos 0x1p+120 +sincos 0x1p-126 sincos 0x1p+127 sincos 0x1.fffff8p+127 sincos 0x1.fffffep+127 diff --git a/math/auto-libm-test-out-sincos b/math/auto-libm-test-out-sincos index f0457631a9..085b9f421a 100644 --- a/math/auto-libm-test-out-sincos +++ b/math/auto-libm-test-out-sincos @@ -1197,6 +1197,31 @@ sincos 0x1p+120 = sincos tonearest ibm128 0x1p+120 : 0x6.0b8d19579bf2db5e5f1aa933f4p-4 -0xe.d06685b36c66c4cf35c11f6518p-4 : inexact-ok = sincos towardzero ibm128 0x1p+120 : 0x6.0b8d19579bf2db5e5f1aa933f2p-4 -0xe.d06685b36c66c4cf35c11f6518p-4 : inexact-ok = sincos upward ibm128 0x1p+120 : 0x6.0b8d19579bf2db5e5f1aa933f4p-4 -0xe.d06685b36c66c4cf35c11f6518p-4 : inexact-ok +sincos 0x1p-126 += sincos downward binary32 0x4p-128 : 0x3.fffff8p-128 0xf.fffffp-4 : inexact-ok underflow-ok errno-erange-ok += sincos tonearest binary32 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok underflow-ok errno-erange-ok += sincos towardzero binary32 0x4p-128 : 0x3.fffff8p-128 0xf.fffffp-4 : inexact-ok underflow-ok errno-erange-ok += sincos upward binary32 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok underflow-ok errno-erange-ok += sincos downward binary64 0x4p-128 : 0x3.ffffffffffffep-128 0xf.ffffffffffff8p-4 : inexact-ok += sincos tonearest binary64 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok += sincos towardzero binary64 0x4p-128 : 0x3.ffffffffffffep-128 0xf.ffffffffffff8p-4 : inexact-ok += sincos upward binary64 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok += sincos downward intel96 0x4p-128 : 0x3.fffffffffffffffcp-128 0xf.fffffffffffffffp-4 : inexact-ok += sincos tonearest intel96 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok += sincos towardzero intel96 0x4p-128 : 0x3.fffffffffffffffcp-128 0xf.fffffffffffffffp-4 : inexact-ok += sincos upward intel96 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok += sincos downward m68k96 0x4p-128 : 0x3.fffffffffffffffcp-128 0xf.fffffffffffffffp-4 : inexact-ok += sincos tonearest m68k96 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok += sincos towardzero m68k96 0x4p-128 : 0x3.fffffffffffffffcp-128 0xf.fffffffffffffffp-4 : inexact-ok += sincos upward m68k96 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok += sincos downward binary128 0x4p-128 : 0x3.fffffffffffffffffffffffffffep-128 0xf.fffffffffffffffffffffffffff8p-4 : inexact-ok += sincos tonearest binary128 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok += sincos towardzero binary128 0x4p-128 : 0x3.fffffffffffffffffffffffffffep-128 0xf.fffffffffffffffffffffffffff8p-4 : inexact-ok += sincos upward binary128 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok += sincos downward ibm128 0x4p-128 : 0x3.ffffffffffffffffffffffffffp-128 0xf.fffffffffffffffffffffffffcp-4 : inexact-ok += sincos tonearest ibm128 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok += sincos towardzero ibm128 0x4p-128 : 0x3.ffffffffffffffffffffffffffp-128 0xf.fffffffffffffffffffffffffcp-4 : inexact-ok += sincos upward ibm128 0x4p-128 : 0x4p-128 0x1p+0 : inexact-ok sincos 0x1p+127 = sincos downward binary32 0x8p+124 : 0x9.f9631p-4 0xc.82b8ep-4 : inexact-ok = sincos tonearest binary32 0x8p+124 : 0x9.f9631p-4 0xc.82b8fp-4 : inexact-ok diff --git a/sysdeps/ieee754/flt-32/libm-test-ulps b/sysdeps/ieee754/flt-32/libm-test-ulps index 0179223018..dfe78a63c8 100644 --- a/sysdeps/ieee754/flt-32/libm-test-ulps +++ b/sysdeps/ieee754/flt-32/libm-test-ulps @@ -275,6 +275,18 @@ float: 0 Function: "sin_upward": float: 0 +Function: "sincos": +float: 0 + +Function: "sincos_downward": +float: 0 + +Function: "sincos_towardzero": +float: 0 + +Function: "sincos_upward": +float: 0 + Function: "sinh": float: 0 diff --git a/sysdeps/ieee754/flt-32/s_sincosf.c b/sysdeps/ieee754/flt-32/s_sincosf.c index fd49d20b58..eb5d849174 100644 --- a/sysdeps/ieee754/flt-32/s_sincosf.c +++ b/sysdeps/ieee754/flt-32/s_sincosf.c @@ -1,28 +1,35 @@ -/* Compute sine and cosine of argument. - Copyright (C) 2018-2026 Free Software Foundation, Inc. - This file is part of the GNU C Library. +/* Correctly-rounded sincos of binary32 value. - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. +Copyright (c) 2024-2025 Alexei Sibidanov - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. +The original version of this file was copied from the CORE-MATH +project (file src/binary32/sincos/sincosf.c, revision 8ea8ea35. - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: -#include -#include +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. +*/ + +#include #include -#include #include #include "math_config.h" -#include "s_sincosf.h" +#include +#include #ifndef SECTION # define SECTION @@ -34,73 +41,174 @@ # define SINCOSF_FUNC SINCOSF #endif -/* Fast sincosf implementation. Worst-case ULP is 0.5607, maximum relative - error is 0.5303 * 2^-23. A single-step range reduction is used for - small values. Large inputs have their range reduced using fast integer - arithmetic. */ -void -SECTION -SINCOSF_FUNC (float y, float *sinp, float *cosp) +static double __attribute__ ((noinline)) +rbig (uint32_t u, int *q) { - double x = y; - double s; - int n; - const sincos_t *p = &__sincosf_table[0]; - - if (abstop12 (y) < abstop12 (pio4)) + int e = (u >> 23) & 0xff, i; + uint64_t m = (u & (~0u >> 9)) | 1 << 23; + u128 p0 = u128_mul (u128_from_u64 (m), u128_from_u64 (IPI[0])); + u128 p1 = u128_mul (u128_from_u64 (m), u128_from_u64 (IPI[1])); + p1 = u128_add (p1, u128_rshift (p0, 64)); + u128 p2 = u128_mul (u128_from_u64 (m), u128_from_u64 (IPI[2])); + p2 = u128_add (p2, u128_rshift (p1, 64)); + u128 p3 = u128_mul (u128_from_u64 (m), u128_from_u64 (IPI[3])); + p3 = u128_add (p3, u128_rshift (p2, 64)); + uint64_t p3h = u128_high (p3), p3l = u128_low (p3), p2l = u128_low (p2), + p1l = u128_low (p1); + int64_t a; + int k = e - 124, s = k - 23; + /* in cr_sinf(), rbig() is called in the case 127+28 <= e < 0xff + thus 155 <= e <= 254, which yields 28 <= k <= 127 and 5 <= s <= 104 */ + if (s < 64) { - double x2 = x * x; + i = p3h << s | p3l >> (64 - s); + a = p3l << s | p2l >> (64 - s); + } + else if (s == 64) + { + i = p3l; + a = p2l; + } + else + { /* s > 64 */ + i = p3l << (s - 64) | p2l >> (128 - s); + a = p2l << (s - 64) | p1l >> (128 - s); + } + int sgn = u; + sgn >>= 31; + int64_t sm = a >> 63; + i -= sm; + double z = (a ^ sgn) * 0x1p-64; + i = (i ^ sgn) - sgn; + *q = i; + return z; +} - if (__glibc_unlikely (abstop12 (y) < abstop12 (0x1p-12f))) +static inline double +rltl (float z, int *q) +{ + double x = z; + double idl = -0x1.b1bbead603d8bp-29 * x, idh = 0x1.45f306ep+2 * x, + id = roundeven_finite (idh); + *q = asuint64 (0x1.8p52 + id); + return (idh - id) + idl; +} + +static inline double +rltl0 (double x, int *q) +{ + double idh = 0x1.45f306dc9c883p+2 * x, id = roundeven_finite (idh); + *q = asuint64 (0x1.8p52 + id); + return idh - id; +} + +static inline float +add_sign (float x, float rh, float rl) +{ + float sgn = copysignf (1.0f, x); + return sgn * rh + sgn * rl; +} + +static void __attribute__ ((noinline)) +as_sincosf_database (float x, float *sout, float *cout) +{ + uint32_t t = asuint (x); + uint32_t ax = t & (~0u >> 1); + for (unsigned i = 0; i < array_length (ST_SINCOSF); i++) + if (__glibc_unlikely (ST_SINCOSF[i].uarg == ax)) { - /* Force underflow for tiny y. */ - if (__glibc_unlikely (abstop12 (y) < abstop12 (0x1p-126f))) - math_force_eval ((float)x2); - *sinp = y; - *cosp = 1.0f; - return; + *sout = add_sign (x, ST_SINCOSF[i].sh, ST_SINCOSF[i].sl); + *cout = ST_SINCOSF[i].ch + ST_SINCOSF[i].cl; + break; } +} - sincosf_poly (x, x2, p, 0, sinp, cosp); +static void __attribute__ ((noinline)) +as_sincosf_big (float x, float *sout, float *cout) +{ + uint32_t t = asuint (x); + uint32_t ax = t << 1; + if (__glibc_unlikely (ax >= 0xffu << 24)) + { // nan or +-inf + if (ax << 8) + { + *sout = x + x; + *cout = x + x; + return; // nan + } + *sout = *cout = x - x; + __math_invalidf (x + x); + return; } - else if (abstop12 (y) < abstop12 (120.0f)) - { - x = reduce_fast (x, p, &n); + int ia; + double z = rbig (t, &ia); + double z2 = z * z, z4 = z2 * z2; + double aa = (A[0] + z2 * A[1]) + z4 * (A[2] + z2 * A[3]); + double bb = (B[0] + z2 * B[1]) + z4 * (B[2] + z2 * B[3]); + bb *= z; + double s0 = TB[ia & 31], c0 = TB[(ia + 8u) & 31]; + double s = s0 + z * (aa * c0 - bb * s0); + double c = c0 - z * (aa * s0 + bb * c0); + *sout = s; + *cout = c; + uint64_t tr = asuint64 (c); + uint64_t tail = (tr + 6) & (~UINT64_C(0) >> 36); + if (__glibc_unlikely (tail <= 12)) + return as_sincosf_database (x, sout, cout); +} - /* Setup the signs for sin and cos. */ - s = p->sign[n & 3]; - - if (n & 2) - p = &__sincosf_table[1]; - - sincosf_poly (x * s, x * x, p, n, sinp, cosp); - } - else if (__glibc_likely (abstop12 (y) < abstop12 (INFINITY))) - { - uint32_t xi = asuint (y); - int sign = xi >> 31; - - x = reduce_large (xi, &n); - - /* Setup signs for sin and cos - include original sign. */ - s = p->sign[(n + sign) & 3]; - - if ((n + sign) & 2) - p = &__sincosf_table[1]; - - sincosf_poly (x * s, x * x, p, n, sinp, cosp); +void SECTION +SINCOSF_FUNC (float x, float *sout, float *cout) +{ + uint32_t ax = asuint (x) << 1; + int ia; + double z0 = x, z; + if (__glibc_likely (ax < 0x822d97c8u)) + { // |x| < 0x1.2d97c8p+3 + if (__glibc_unlikely (ax < 0x73000000u)) + { // |x| < 0x1p-12 + if (__glibc_unlikely (ax < 0x66000000u)) + { // |x| < 0x1p-25 + if (__glibc_unlikely (ax == 0u)) + { + *sout = x; + *cout = 1.0f; + } + else + { + *sout = fmaf (-x, fabsf (x), x); + *cout = 1.0f - 0x1p-25f; + } + } + else + { + *sout = (-0x1.555556p-3f * x) * (x * x) + x; + *cout = (-0x1p-1f * x) * x + 1.0f; + } + return; + } + if (__glibc_unlikely (ax == 0x812d97c8u)) + return as_sincosf_database (x, sout, cout); + z = rltl0 (z0, &ia); } else { - /* Return NaN if Inf or NaN for both sin and cos. */ - *sinp = *cosp = y - y; -#if WANT_ERRNO - /* Needed to set errno for +-Inf, the add is a hack to work - around a gcc register allocation issue: just passing y - affects code generation in the fast path (PR86901). */ - __math_invalidf (y + y); -#endif + if (__glibc_unlikely (ax > 0x99000000u)) + return as_sincosf_big (x, sout, cout); + if (__glibc_unlikely (ax == 0x8c333330u)) + return as_sincosf_database (x, sout, cout); + z = rltl (z0, &ia); } + double z2 = z * z, z4 = z2 * z2; + double aa = (A[0] + z2 * A[1]) + z4 * (A[2] + z2 * A[3]); + double bb = (B[0] + z2 * B[1]) + z4 * (B[2] + z2 * B[3]); + aa *= z; + bb *= z2; + double s0 = TB[ia & 31], c0 = TB[(ia + 8) & 31]; + double rs = s0 + (aa * c0 - bb * s0); + double rc = c0 - (aa * s0 + bb * c0); + *sout = rs; + *cout = rc; } #ifndef SINCOSF diff --git a/sysdeps/ieee754/flt-32/s_sincosf_data.h b/sysdeps/ieee754/flt-32/s_sincosf_data.h index 8bea0042f3..354e9b6a88 100644 --- a/sysdeps/ieee754/flt-32/s_sincosf_data.h +++ b/sysdeps/ieee754/flt-32/s_sincosf_data.h @@ -46,4 +46,17 @@ extern const sincosf_database_t __sinf_st[4] attribute_hidden; extern const sincosf_database_t __cosf_st[5] attribute_hidden; #define ST_COSF __cosf_st +typedef struct +{ + union + { + float arg; + uint32_t uarg; + }; + float sh, sl; + float ch, cl; +} sincosf2_database_t; +extern const sincosf2_database_t __sincosf_st[9] attribute_hidden; +#define ST_SINCOSF __sincosf_st + #endif diff --git a/sysdeps/ieee754/flt-32/s_sincosf_data_generic.c b/sysdeps/ieee754/flt-32/s_sincosf_data_generic.c index 2e58f0b423..2a85b687dd 100644 --- a/sysdeps/ieee754/flt-32/s_sincosf_data_generic.c +++ b/sysdeps/ieee754/flt-32/s_sincosf_data_generic.c @@ -88,3 +88,16 @@ const sincosf_database_t __cosf_st[] = { { 0x1.3170fp+63 }, 0x1.fe2976p-1, 0x1p-26 }, { { 0x1.2b9622p+67 }, 0x1.f0285ep-1, -0x1p-26 }, }; + +const sincosf2_database_t __sincosf_st[] = +{ + { { 0x1.33333p+13 }, -0x1.63f4bap-2, -0x1p-27, -0x1.e01216p-1, -0x1p-26 }, + { { 0x1.75b8a2p-1 }, 0x1.55688ap-1, -0x1p-26, 0x1.7d8e1ep-1, 0x1p-26 }, + { { 0x1.4f0654p+0 }, 0x1.ee836cp-1, -0x1p-26, 0x1.09558p-2, -0x1p-27 }, + { { 0x1.2d97c8p+3 }, -0x1.99bc5ap-26, -0x1p-51, -0x1p+0, 0x1p-25 }, + { { 0x1.2d97c8p+2 }, -0x1p+0, 0x1p-25, 0x1.99bc5cp-27, -0x1p-52 }, + { { 0x1.4555p+51 }, -0x1.b0ea44p-1, 0x1p-26, 0x1.115d7ep-1, -0x1p-26 }, + { { 0x1.48a858p+54 }, 0x1.beac8cp-1, 0x1p-26, 0x1.f48148p-2, 0x1p-27 }, + { { 0x1.3170fp+63 }, 0x1.5ac1eep-4, -0x1p-30, 0x1.fe2976p-1, 0x1p-26 }, + { { 0x1.2b9622p+67 }, -0x1.f983c2p-3, 0x1p-28, 0x1.f0285ep-1, -0x1p-26 }, +};