[09/15] math: Use asinpif from CORE-MATH
Checks
Context | Check | Description
redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Build passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | success | Test passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Test passed
Commit Message
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic asinpif.

The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                  master    patched  improvement
x86_64                  46.4996    51.0456       -9.78%
x86_64v2                46.7551    52.1317      -11.50%
x86_64v3                42.6235    34.8162       18.32%
aarch64 (Neoverse)      17.4161    14.3604       17.55%
power8                  10.7347     9.0193       15.98%
power10                 10.6420     9.0362       15.09%

reciprocal-throughput    master    patched  improvement
x86_64                  24.7208    29.0812      -17.64%
x86_64v2                24.2177    29.7166      -22.71%
x86_64v3                20.5617    12.3679       39.85%
aarch64 (Neoverse)      13.4827    7.17613       46.78%
power8                  6.46134    3.56089       44.89%
power10                 5.79007    3.49544       39.63%

x86_64 and x86_64-v2 show slower performance due to the use of an fma
operation in the fast path; only x86_64-v3 provides it without a
function call.
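
The "improvement" column in the tables above appears to be the relative change with respect to master, where a lower timing is better. A minimal sketch of that arithmetic follows; the helper name `improvement_percent` is made up for illustration and is not part of the benchmark tooling.

```c
/* Relative improvement in percent.  Positive means the patched build is
   faster, since lower latency or reciprocal throughput is better.
   Hypothetical helper, not taken from the glibc benchtests.  */
static double
improvement_percent (double master, double patched)
{
  return (master - patched) / master * 100.0;
}
```

For example, the x86_64v3 latency row (42.6235 before, 34.8162 after) gives roughly 18.3, and the x86_64 latency row gives roughly -9.8, matching the table up to rounding.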
---
SHARED-FILES | 4 +
sysdeps/aarch64/libm-test-ulps | 4 -
sysdeps/arc/fpu/libm-test-ulps | 4 -
sysdeps/arc/nofpu/libm-test-ulps | 1 -
sysdeps/arm/libm-test-ulps | 4 -
sysdeps/hppa/fpu/libm-test-ulps | 4 -
sysdeps/i386/fpu/libm-test-ulps | 4 -
.../i386/i686/fpu/multiarch/libm-test-ulps | 4 -
sysdeps/ieee754/flt-32/s_asinpif.c | 136 ++++++++++++++++++
sysdeps/loongarch/lp64/libm-test-ulps | 4 -
sysdeps/mips/mips64/libm-test-ulps | 4 -
sysdeps/or1k/fpu/libm-test-ulps | 4 -
sysdeps/or1k/nofpu/libm-test-ulps | 1 -
sysdeps/powerpc/fpu/libm-test-ulps | 4 -
sysdeps/riscv/nofpu/libm-test-ulps | 1 -
sysdeps/riscv/rvd/libm-test-ulps | 4 -
sysdeps/s390/fpu/libm-test-ulps | 4 -
sysdeps/sparc/fpu/libm-test-ulps | 4 -
sysdeps/x86_64/fpu/libm-test-ulps | 4 -
19 files changed, 140 insertions(+), 59 deletions(-)
create mode 100644 sysdeps/ieee754/flt-32/s_asinpif.c
Comments
I confirm we get correct rounding for all rounding modes and all binary32 inputs
on x86_64.
Paul
> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Cc: DJ Delorie <dj@redhat.com>,
> Joseph Myers <josmyers@redhat.com>,
> Paul Zimmermann <Paul.Zimmermann@inria.fr>,
> Alexei Sibidanov <sibid@uvic.ca>
> Date: Fri, 31 Jan 2025 16:17:13 -0300
>
> The CORE-MATH implementation is correctly rounded (for any rounding mode)
> and shows better performance to the generic asinpif.
>
> The code was adapted to glibc style and to use the definition of
> math_config.h (to handle errno, overflow, and underflow).
>
> Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
> gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
>
> latency master patched improvement
> x86_64 46.4996 51.0456 -9.78%
> x86_64v2 46.7551 52.1317 -11.50%
> x86_64v3 42.6235 34.8162 18.32%
> aarch64 (Neoverse) 17.4161 14.3604 17.55%
> power8 10.7347 9.0193 15.98%
> power10 10.6420 9.0362 15.09%
>
> reciprocal-throughput master patched improvement
> x86_64 24.7208 29.0812 -17.64%
> x86_64v2 24.2177 29.7166 -22.71%
> x86_64v3 20.5617 12.3679 39.85%
> aarch64 (Neoverse) 13.4827 7.17613 46.78%
> power8 6.46134 3.56089 44.89%
> power10 5.79007 3.49544 39.63%
>
> x86_64/x86_64-v2 shows slower performance due the use of a fma
> operation in the fast patch, only x86_64-v3 provides it without a
> function call.
> ---
> SHARED-FILES | 4 +
> sysdeps/aarch64/libm-test-ulps | 4 -
> sysdeps/arc/fpu/libm-test-ulps | 4 -
> sysdeps/arc/nofpu/libm-test-ulps | 1 -
> sysdeps/arm/libm-test-ulps | 4 -
> sysdeps/hppa/fpu/libm-test-ulps | 4 -
> sysdeps/i386/fpu/libm-test-ulps | 4 -
> .../i386/i686/fpu/multiarch/libm-test-ulps | 4 -
> sysdeps/ieee754/flt-32/s_asinpif.c | 136 ++++++++++++++++++
> sysdeps/loongarch/lp64/libm-test-ulps | 4 -
> sysdeps/mips/mips64/libm-test-ulps | 4 -
> sysdeps/or1k/fpu/libm-test-ulps | 4 -
> sysdeps/or1k/nofpu/libm-test-ulps | 1 -
> sysdeps/powerpc/fpu/libm-test-ulps | 4 -
> sysdeps/riscv/nofpu/libm-test-ulps | 1 -
> sysdeps/riscv/rvd/libm-test-ulps | 4 -
> sysdeps/s390/fpu/libm-test-ulps | 4 -
> sysdeps/sparc/fpu/libm-test-ulps | 4 -
> sysdeps/x86_64/fpu/libm-test-ulps | 4 -
> 19 files changed, 140 insertions(+), 59 deletions(-)
> create mode 100644 sysdeps/ieee754/flt-32/s_asinpif.c
>
> diff --git a/SHARED-FILES b/SHARED-FILES
> index 3fde72644a..e700f4b155 100644
> --- a/SHARED-FILES
> +++ b/SHARED-FILES
> @@ -338,3 +338,7 @@ sysdeps/ieee754/flt-32/s_acospif.c:
> (src/binary32/acospi/acospif.c in CORE-MATH)
> - the code was adapted to use glibc code style and internal
> functions to handle errno, overflow, and underflow.
> +sysdeps/ieee754/flt-32/s_asinpif.c:
> + (src/binary32/asinpi/asinpif.c in CORE-MATH)
> + - the code was adapted to use glibc code style and internal
> + functions to handle errno, overflow, and underflow.
> diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
> index 1a403d95b6..abb0611ee5 100644
> --- a/sysdeps/aarch64/libm-test-ulps
> +++ b/sysdeps/aarch64/libm-test-ulps
> @@ -115,22 +115,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
> index c0c5daa589..35aebba38a 100644
> --- a/sysdeps/arc/fpu/libm-test-ulps
> +++ b/sysdeps/arc/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
> index 2b34f5a0ab..325546e582 100644
> --- a/sysdeps/arc/nofpu/libm-test-ulps
> +++ b/sysdeps/arc/nofpu/libm-test-ulps
> @@ -18,7 +18,6 @@ double: 2
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
> index afb0532a66..0927fdb980 100644
> --- a/sysdeps/arm/libm-test-ulps
> +++ b/sysdeps/arm/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
> index b9959c8a12..02cc3b5ddc 100644
> --- a/sysdeps/hppa/fpu/libm-test-ulps
> +++ b/sysdeps/hppa/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
> index 85c58f34e9..69d0eb1eec 100644
> --- a/sysdeps/i386/fpu/libm-test-ulps
> +++ b/sysdeps/i386/fpu/libm-test-ulps
> @@ -101,25 +101,21 @@ ldouble: 5
>
> Function: "asinpi":
> double: 1
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 2
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> float128: 1
> ldouble: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> float128: 2
> ldouble: 2
>
> diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> index bc14e7e115..392d7d252c 100644
> --- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> +++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> @@ -101,25 +101,21 @@ ldouble: 5
>
> Function: "asinpi":
> double: 1
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 2
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> float128: 1
> ldouble: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> float128: 2
> ldouble: 2
>
> diff --git a/sysdeps/ieee754/flt-32/s_asinpif.c b/sysdeps/ieee754/flt-32/s_asinpif.c
> new file mode 100644
> index 0000000000..585dc3f06e
> --- /dev/null
> +++ b/sysdeps/ieee754/flt-32/s_asinpif.c
> @@ -0,0 +1,136 @@
> +/* Correctly-rounded half-revolution arc-sine function for binary32 value.
> +
> +Copyright (c) 2022-2025 Alexei Sibidanov.
> +
> +The original version of this file was copied from the CORE-MATH
> +project (file src/binary32/asinpi/asinpif.c, revision 49e223e).
> +
> +Permission is hereby granted, free of charge, to any person obtaining a copy
> +of this software and associated documentation files (the "Software"), to deal
> +in the Software without restriction, including without limitation the rights
> +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> +copies of the Software, and to permit persons to whom the Software is
> +furnished to do so, subject to the following conditions:
> +
> +The above copyright notice and this permission notice shall be included in all
> +copies or substantial portions of the Software.
> +
> +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> +SOFTWARE.
> +
> +*/
> +
> +#include <errno.h>
> +#include <math.h>
> +#include <stdint.h>
> +#include <libm-alias-float.h>
> +#include "math_config.h"
> +
> +float
> +__asinpif (float x)
> +{
> + float ax = fabsf (x);
> + double az = ax;
> + double z = x;
> + uint32_t t = asuint (x);
> + int32_t e = (t >> 23) & 0xff;
> + if (__glibc_unlikely (e >= 127))
> + {
> + if (ax == 1.0f)
> + return copysignf (0.5f, x);
> + if (e == 0xff && (t << 9))
> + return x + x; /* nan */
> + return __math_edomf ((x - x) / (x - x)); /* nan */
> + }
> + int32_t s = 146 - e;
> + int32_t i = 0;
> + if (__glibc_likely (s < 32))
> + i = ((t & (~0u >> 9)) | 1 << 23) >> s;
> + static const double ch[][8] =
> + {
> + { 0x1.45f306dc9c882p-2, 0x1.b2995e7b7dc2fp-5, 0x1.8723a1cf50c7ep-6,
> + 0x1.d1a4591d16a29p-7, 0x1.3ce3aa68ddaeep-7, 0x1.d3182ab0cc1bfp-8,
> + 0x1.62b379a8b88e3p-8, 0x1.6811411fcfec2p-8 },
> + { 0x1.ffffffffd3cdap-2, -0x1.17cc1b3355fdcp-4, 0x1.d067a1e8d5a99p-6,
> + -0x1.08e16fb09314ap-6, 0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8,
> + 0x1.5dab64e2dcf15p-8, -0x1.59270e30797acp-9 },
> + { 0x1.fffffff7c4617p-2, -0x1.17cc149ded3a2p-4, 0x1.d0654d4cb2c1ap-6,
> + -0x1.08c3ba713d33ap-6, 0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8,
> + 0x1.303baca167dddp-8, -0x1.dee8d16d06b38p-10 },
> + { 0x1.ffffffa749848p-2, -0x1.17cbe7155935p-4, 0x1.d05a312269adfp-6,
> + -0x1.0862b3ee617d7p-6, 0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8,
> + 0x1.02b82478f95d7p-8, -0x1.52a7b8579e729p-10 },
> + { 0x1.fffffe1f92bb5p-2, -0x1.17cb3e74c64e3p-4, 0x1.d03af67311cbfp-6,
> + -0x1.079441cbfc7ap-6, 0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8,
> + 0x1.b2f1210d9701bp-9, -0x1.e740ddc25afd6p-11 },
> + { 0x1.fffff92beb6e2p-2, -0x1.17c986fe9518bp-4, 0x1.cff98167c9a5ep-6,
> + -0x1.0638b591eae52p-6, 0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8,
> + 0x1.6b9a7ba05dfcep-9, -0x1.640521a43b2dp-11 },
> + { 0x1.ffffeccee5bfcp-2, -0x1.17c5f1753f5eap-4, 0x1.cf874e4fe258fp-6,
> + -0x1.043e6cf77b256p-6, 0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8,
> + 0x1.2f6543162bc61p-9, -0x1.07d5da05822b6p-11 },
> + { 0x1.ffffd2f64431dp-2, -0x1.17bf8208c10c1p-4, 0x1.ced7487cdb124p-6,
> + -0x1.01a0d30932905p-6, 0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8,
> + 0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 },
> + { 0x1.ffffa36d1712ep-2, -0x1.17b523971bd4ep-4, 0x1.cddee26de2deep-6,
> + -0x1.fccb00abaaabcp-7, 0x1.269afc3622342p-7, -0x1.2933152686752p-8,
> + 0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 },
> + { 0x1.ffff5402ab3a1p-2, -0x1.17a5ba85da77ap-4, 0x1.cc96894e05c02p-6,
> + -0x1.f532143cb832ep-7, 0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8,
> + 0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 },
> + { 0x1.fffed8d639751p-2, -0x1.1790349f3ae76p-4, 0x1.caf9a4fd1b398p-6,
> + -0x1.ec986b111342ep-7, 0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9,
> + 0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 },
> + { 0x1.fffe24b714161p-2, -0x1.177394fbcb719p-4, 0x1.c90652d920ebdp-6,
> + -0x1.e3239197bddf1p-7, 0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9,
> + 0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 },
> + { 0x1.fffd298bec9e2p-2, -0x1.174efbfd34648p-4, 0x1.c6bcfe48ea92bp-6,
> + -0x1.d8f9f2a16157cp-7, 0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9,
> + 0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 },
> + { 0x1.fffbd8b784c4dp-2, -0x1.1721abdd3722ep-4, 0x1.c41fee756d4bp-6,
> + -0x1.ce40bccf8065fp-7, 0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9,
> + 0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 },
> + { 0x1.fffa23749cf88p-2, -0x1.16eb0a8285c06p-4, 0x1.c132d762e1b0dp-6,
> + -0x1.c31a959398f4ep-7, 0x1.ac1c5b46bc8ap-8, -0x1.3e34f1abe51dcp-9,
> + 0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 },
> + { 0x1.fff7fb25bb407p-2, -0x1.16aaa14d7564p-4, 0x1.bdfa75fca5ff2p-6,
> + -0x1.b7a6e260d079cp-7, 0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9,
> + 0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15
> + }
> + };
> + const double *c = ch[i];
> + double z2 = z * z;
> + double z4 = z2 * z2;
> + if (__glibc_unlikely (i == 0))
> + {
> + double c0 = c[0] + z2 * c[1];
> + double c2 = c[2] + z2 * c[3];
> + double c4 = c[4] + z2 * c[5];
> + double c6 = c[6] + z2 * c[7];
> + c0 += c2 * z4;
> + c4 += c6 * z4;
> + c0 += c4 * (z4 * z4);
> + if (__glibc_unlikely (ax != 0.0 && ax <= 0x1.921fb4p-126f))
> + __set_errno (ERANGE);
> + return z * c0;
> + }
> + else
> + {
> + double f = sqrt (1 - az);
> + double c0 = fma (az, c[1], c[0]);
> + double c2 = c[2] + az * c[3];
> + double c4 = c[4] + az * c[5];
> + double c6 = c[6] + az * c[7];
> + c0 += c2 * z2;
> + c4 += c6 * z2;
> + c0 += c4 * z4;
> + double r = fma (-c0, copysign (f, x), copysign (0.5, x));
> + return r;
> + }
> +}
> +libm_alias_float (__asinpi, asinpi)
> diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
> index ce84ddf1e6..33dd6718ba 100644
> --- a/sysdeps/loongarch/lp64/libm-test-ulps
> +++ b/sysdeps/loongarch/lp64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
> index 67c37dfd5e..869ceff928 100644
> --- a/sysdeps/mips/mips64/libm-test-ulps
> +++ b/sysdeps/mips/mips64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
> index d3b1036d29..75db236e09 100644
> --- a/sysdeps/or1k/fpu/libm-test-ulps
> +++ b/sysdeps/or1k/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
> index 14b7e0f3f9..a1f7c80097 100644
> --- a/sysdeps/or1k/nofpu/libm-test-ulps
> +++ b/sysdeps/or1k/nofpu/libm-test-ulps
> @@ -54,7 +54,6 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
> index c9c86de147..fa3cf2e844 100644
> --- a/sysdeps/powerpc/fpu/libm-test-ulps
> +++ b/sysdeps/powerpc/fpu/libm-test-ulps
> @@ -107,25 +107,21 @@ ldouble: 7
>
> Function: "asinpi":
> double: 1
> -float: 1
> float128: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> float128: 2
> ldouble: 4
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> float128: 1
> ldouble: 4
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> float128: 2
> ldouble: 4
>
> diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
> index 6206a9531a..a5184ecad9 100644
> --- a/sysdeps/riscv/nofpu/libm-test-ulps
> +++ b/sysdeps/riscv/nofpu/libm-test-ulps
> @@ -71,7 +71,6 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
> index 124ca4b719..3bfc9668d5 100644
> --- a/sysdeps/riscv/rvd/libm-test-ulps
> +++ b/sysdeps/riscv/rvd/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
> index 364ccf3326..7d61bf1cef 100644
> --- a/sysdeps/s390/fpu/libm-test-ulps
> +++ b/sysdeps/s390/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
> index 1174972002..426f45893e 100644
> --- a/sysdeps/sparc/fpu/libm-test-ulps
> +++ b/sysdeps/sparc/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 5ed5112b49..d4c4bfa42b 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -180,25 +180,21 @@ float: 1
>
> Function: "asinpi":
> double: 1
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> float128: 1
> ldouble: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> float128: 2
> ldouble: 2
>
> --
> 2.43.0
>
>
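
The table lookup in the quoted s_asinpif.c can be read as follows: for |x| < 1 the shift `s = 146 - e` keeps four bits of the 24-bit significand above the binary point, so the index works out to floor(16 * |x|), which selects one of the 16 rows of `ch`. The standalone sketch below reproduces that arithmetic; `float_bits` stands in for glibc's internal `asuint`, and `asinpif_table_index` is a name made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Bit pattern of a float (stand-in for glibc's internal asuint).  */
static uint32_t
float_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Same index arithmetic as in the patch.  Only meaningful for |x| < 1,
   i.e. a biased exponent below 127; the patch handles larger inputs
   before reaching this point.  */
static int
asinpif_table_index (float x)
{
  uint32_t t = float_bits (x);
  int32_t e = (t >> 23) & 0xff;   /* biased exponent, sign bit dropped */
  int32_t s = 146 - e;            /* shift leaving floor (16 * |x|) */
  int32_t i = 0;
  if (s < 32)
    i = ((t & (~0u >> 9)) | 1u << 23) >> s;   /* significand >> s */
  return i;
}
```

With this reading, row 0 covers |x| < 1/16 and is evaluated as a polynomial in z^2 multiplied by z, while rows 1 to 15 feed the branch that uses sqrt (1 - |x|).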
I suggest the following change, which should improve performance on x86_64/x86_64-v2:
--- a/sysdeps/ieee754/flt-32/s_asinpif.c
+++ b/sysdeps/ieee754/flt-32/s_asinpif.c
@@ -122,7 +122,7 @@ __asinpif (float x)
else
{
double f = sqrt (1 - az);
- double c0 = fma (az, c[1], c[0]);
+ double c0 = c[0] + az * c[1];
double c2 = c[2] + az * c[3];
double c4 = c[4] + az * c[5];
double c6 = c[6] + az * c[7];
Moreover, "fast patch" should be "fast path".
Paul
> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Cc: DJ Delorie <dj@redhat.com>,
> Joseph Myers <josmyers@redhat.com>,
> Paul Zimmermann <Paul.Zimmermann@inria.fr>,
> Alexei Sibidanov <sibid@uvic.ca>
> Date: Fri, 31 Jan 2025 16:17:13 -0300
>
> The CORE-MATH implementation is correctly rounded (for any rounding mode)
> and shows better performance to the generic asinpif.
>
> The code was adapted to glibc style and to use the definition of
> math_config.h (to handle errno, overflow, and underflow).
>
> Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
> gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
>
> latency master patched improvement
> x86_64 46.4996 51.0456 -9.78%
> x86_64v2 46.7551 52.1317 -11.50%
> x86_64v3 42.6235 34.8162 18.32%
> aarch64 (Neoverse) 17.4161 14.3604 17.55%
> power8 10.7347 9.0193 15.98%
> power10 10.6420 9.0362 15.09%
>
> reciprocal-throughput master patched improvement
> x86_64 24.7208 29.0812 -17.64%
> x86_64v2 24.2177 29.7166 -22.71%
> x86_64v3 20.5617 12.3679 39.85%
> aarch64 (Neoverse) 13.4827 7.17613 46.78%
> power8 6.46134 3.56089 44.89%
> power10 5.79007 3.49544 39.63%
>
> x86_64/x86_64-v2 shows slower performance due the use of a fma
> operation in the fast patch, only x86_64-v3 provides it without a
> function call.
> ---
> SHARED-FILES | 4 +
> sysdeps/aarch64/libm-test-ulps | 4 -
> sysdeps/arc/fpu/libm-test-ulps | 4 -
> sysdeps/arc/nofpu/libm-test-ulps | 1 -
> sysdeps/arm/libm-test-ulps | 4 -
> sysdeps/hppa/fpu/libm-test-ulps | 4 -
> sysdeps/i386/fpu/libm-test-ulps | 4 -
> .../i386/i686/fpu/multiarch/libm-test-ulps | 4 -
> sysdeps/ieee754/flt-32/s_asinpif.c | 136 ++++++++++++++++++
> sysdeps/loongarch/lp64/libm-test-ulps | 4 -
> sysdeps/mips/mips64/libm-test-ulps | 4 -
> sysdeps/or1k/fpu/libm-test-ulps | 4 -
> sysdeps/or1k/nofpu/libm-test-ulps | 1 -
> sysdeps/powerpc/fpu/libm-test-ulps | 4 -
> sysdeps/riscv/nofpu/libm-test-ulps | 1 -
> sysdeps/riscv/rvd/libm-test-ulps | 4 -
> sysdeps/s390/fpu/libm-test-ulps | 4 -
> sysdeps/sparc/fpu/libm-test-ulps | 4 -
> sysdeps/x86_64/fpu/libm-test-ulps | 4 -
> 19 files changed, 140 insertions(+), 59 deletions(-)
> create mode 100644 sysdeps/ieee754/flt-32/s_asinpif.c
>
> diff --git a/SHARED-FILES b/SHARED-FILES
> index 3fde72644a..e700f4b155 100644
> --- a/SHARED-FILES
> +++ b/SHARED-FILES
> @@ -338,3 +338,7 @@ sysdeps/ieee754/flt-32/s_acospif.c:
> (src/binary32/acospi/acospif.c in CORE-MATH)
> - the code was adapted to use glibc code style and internal
> functions to handle errno, overflow, and underflow.
> +sysdeps/ieee754/flt-32/s_asinpif.c:
> + (src/binary32/asinpi/asinpif.c in CORE-MATH)
> + - the code was adapted to use glibc code style and internal
> + functions to handle errno, overflow, and underflow.
> diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
> index 1a403d95b6..abb0611ee5 100644
> --- a/sysdeps/aarch64/libm-test-ulps
> +++ b/sysdeps/aarch64/libm-test-ulps
> @@ -115,22 +115,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
> index c0c5daa589..35aebba38a 100644
> --- a/sysdeps/arc/fpu/libm-test-ulps
> +++ b/sysdeps/arc/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
> index 2b34f5a0ab..325546e582 100644
> --- a/sysdeps/arc/nofpu/libm-test-ulps
> +++ b/sysdeps/arc/nofpu/libm-test-ulps
> @@ -18,7 +18,6 @@ double: 2
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
> index afb0532a66..0927fdb980 100644
> --- a/sysdeps/arm/libm-test-ulps
> +++ b/sysdeps/arm/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
> index b9959c8a12..02cc3b5ddc 100644
> --- a/sysdeps/hppa/fpu/libm-test-ulps
> +++ b/sysdeps/hppa/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
> index 85c58f34e9..69d0eb1eec 100644
> --- a/sysdeps/i386/fpu/libm-test-ulps
> +++ b/sysdeps/i386/fpu/libm-test-ulps
> @@ -101,25 +101,21 @@ ldouble: 5
>
> Function: "asinpi":
> double: 1
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 2
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> float128: 1
> ldouble: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> float128: 2
> ldouble: 2
>
> diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> index bc14e7e115..392d7d252c 100644
> --- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> +++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> @@ -101,25 +101,21 @@ ldouble: 5
>
> Function: "asinpi":
> double: 1
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 2
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> float128: 1
> ldouble: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> float128: 2
> ldouble: 2
>
> diff --git a/sysdeps/ieee754/flt-32/s_asinpif.c b/sysdeps/ieee754/flt-32/s_asinpif.c
> new file mode 100644
> index 0000000000..585dc3f06e
> --- /dev/null
> +++ b/sysdeps/ieee754/flt-32/s_asinpif.c
> @@ -0,0 +1,136 @@
> +/* Correctly-rounded half-revolution arc-sine function for binary32 value.
> +
> +Copyright (c) 2022-2025 Alexei Sibidanov.
> +
> +The original version of this file was copied from the CORE-MATH
> +project (file src/binary32/asinpi/asinpif.c, revision 49e223e).
> +
> +Permission is hereby granted, free of charge, to any person obtaining a copy
> +of this software and associated documentation files (the "Software"), to deal
> +in the Software without restriction, including without limitation the rights
> +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> +copies of the Software, and to permit persons to whom the Software is
> +furnished to do so, subject to the following conditions:
> +
> +The above copyright notice and this permission notice shall be included in all
> +copies or substantial portions of the Software.
> +
> +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> +SOFTWARE.
> +
> +*/
> +
> +#include <errno.h>
> +#include <math.h>
> +#include <stdint.h>
> +#include <libm-alias-float.h>
> +#include "math_config.h"
> +
> +float
> +__asinpif (float x)
> +{
> + float ax = fabsf (x);
> + double az = ax;
> + double z = x;
> + uint32_t t = asuint (x);
> + int32_t e = (t >> 23) & 0xff;
> + if (__glibc_unlikely (e >= 127))
> + {
> + if (ax == 1.0f)
> + return copysignf (0.5f, x);
> + if (e == 0xff && (t << 9))
> + return x + x; /* nan */
> + return __math_edomf ((x - x) / (x - x)); /* nan */
> + }
> + int32_t s = 146 - e;
> + int32_t i = 0;
> + if (__glibc_likely (s < 32))
> + i = ((t & (~0u >> 9)) | 1 << 23) >> s;
> + static const double ch[][8] =
> + {
> + { 0x1.45f306dc9c882p-2, 0x1.b2995e7b7dc2fp-5, 0x1.8723a1cf50c7ep-6,
> + 0x1.d1a4591d16a29p-7, 0x1.3ce3aa68ddaeep-7, 0x1.d3182ab0cc1bfp-8,
> + 0x1.62b379a8b88e3p-8, 0x1.6811411fcfec2p-8 },
> + { 0x1.ffffffffd3cdap-2, -0x1.17cc1b3355fdcp-4, 0x1.d067a1e8d5a99p-6,
> + -0x1.08e16fb09314ap-6, 0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8,
> + 0x1.5dab64e2dcf15p-8, -0x1.59270e30797acp-9 },
> + { 0x1.fffffff7c4617p-2, -0x1.17cc149ded3a2p-4, 0x1.d0654d4cb2c1ap-6,
> + -0x1.08c3ba713d33ap-6, 0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8,
> + 0x1.303baca167dddp-8, -0x1.dee8d16d06b38p-10 },
> + { 0x1.ffffffa749848p-2, -0x1.17cbe7155935p-4, 0x1.d05a312269adfp-6,
> + -0x1.0862b3ee617d7p-6, 0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8,
> + 0x1.02b82478f95d7p-8, -0x1.52a7b8579e729p-10 },
> + { 0x1.fffffe1f92bb5p-2, -0x1.17cb3e74c64e3p-4, 0x1.d03af67311cbfp-6,
> + -0x1.079441cbfc7ap-6, 0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8,
> + 0x1.b2f1210d9701bp-9, -0x1.e740ddc25afd6p-11 },
> + { 0x1.fffff92beb6e2p-2, -0x1.17c986fe9518bp-4, 0x1.cff98167c9a5ep-6,
> + -0x1.0638b591eae52p-6, 0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8,
> + 0x1.6b9a7ba05dfcep-9, -0x1.640521a43b2dp-11 },
> + { 0x1.ffffeccee5bfcp-2, -0x1.17c5f1753f5eap-4, 0x1.cf874e4fe258fp-6,
> + -0x1.043e6cf77b256p-6, 0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8,
> + 0x1.2f6543162bc61p-9, -0x1.07d5da05822b6p-11 },
> + { 0x1.ffffd2f64431dp-2, -0x1.17bf8208c10c1p-4, 0x1.ced7487cdb124p-6,
> + -0x1.01a0d30932905p-6, 0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8,
> + 0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 },
> + { 0x1.ffffa36d1712ep-2, -0x1.17b523971bd4ep-4, 0x1.cddee26de2deep-6,
> + -0x1.fccb00abaaabcp-7, 0x1.269afc3622342p-7, -0x1.2933152686752p-8,
> + 0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 },
> + { 0x1.ffff5402ab3a1p-2, -0x1.17a5ba85da77ap-4, 0x1.cc96894e05c02p-6,
> + -0x1.f532143cb832ep-7, 0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8,
> + 0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 },
> + { 0x1.fffed8d639751p-2, -0x1.1790349f3ae76p-4, 0x1.caf9a4fd1b398p-6,
> + -0x1.ec986b111342ep-7, 0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9,
> + 0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 },
> + { 0x1.fffe24b714161p-2, -0x1.177394fbcb719p-4, 0x1.c90652d920ebdp-6,
> + -0x1.e3239197bddf1p-7, 0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9,
> + 0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 },
> + { 0x1.fffd298bec9e2p-2, -0x1.174efbfd34648p-4, 0x1.c6bcfe48ea92bp-6,
> + -0x1.d8f9f2a16157cp-7, 0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9,
> + 0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 },
> + { 0x1.fffbd8b784c4dp-2, -0x1.1721abdd3722ep-4, 0x1.c41fee756d4bp-6,
> + -0x1.ce40bccf8065fp-7, 0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9,
> + 0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 },
> + { 0x1.fffa23749cf88p-2, -0x1.16eb0a8285c06p-4, 0x1.c132d762e1b0dp-6,
> + -0x1.c31a959398f4ep-7, 0x1.ac1c5b46bc8ap-8, -0x1.3e34f1abe51dcp-9,
> + 0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 },
> + { 0x1.fff7fb25bb407p-2, -0x1.16aaa14d7564p-4, 0x1.bdfa75fca5ff2p-6,
> + -0x1.b7a6e260d079cp-7, 0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9,
> + 0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15
> + }
> + };
> + const double *c = ch[i];
> + double z2 = z * z;
> + double z4 = z2 * z2;
> + if (__glibc_unlikely (i == 0))
> + {
> + double c0 = c[0] + z2 * c[1];
> + double c2 = c[2] + z2 * c[3];
> + double c4 = c[4] + z2 * c[5];
> + double c6 = c[6] + z2 * c[7];
> + c0 += c2 * z4;
> + c4 += c6 * z4;
> + c0 += c4 * (z4 * z4);
> + if (__glibc_unlikely (ax != 0.0 && ax <= 0x1.921fb4p-126f))
> + __set_errno (ERANGE);
> + return z * c0;
> + }
> + else
> + {
> + double f = sqrt (1 - az);
> + double c0 = fma (az, c[1], c[0]);
> + double c2 = c[2] + az * c[3];
> + double c4 = c[4] + az * c[5];
> + double c6 = c[6] + az * c[7];
> + c0 += c2 * z2;
> + c4 += c6 * z2;
> + c0 += c4 * z4;
> + double r = fma (-c0, copysign (f, x), copysign (0.5, x));
> + return r;
> + }
> +}
> +libm_alias_float (__asinpi, asinpi)
> diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
> index ce84ddf1e6..33dd6718ba 100644
> --- a/sysdeps/loongarch/lp64/libm-test-ulps
> +++ b/sysdeps/loongarch/lp64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
> index 67c37dfd5e..869ceff928 100644
> --- a/sysdeps/mips/mips64/libm-test-ulps
> +++ b/sysdeps/mips/mips64/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
> index d3b1036d29..75db236e09 100644
> --- a/sysdeps/or1k/fpu/libm-test-ulps
> +++ b/sysdeps/or1k/fpu/libm-test-ulps
> @@ -63,19 +63,15 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
> index 14b7e0f3f9..a1f7c80097 100644
> --- a/sysdeps/or1k/nofpu/libm-test-ulps
> +++ b/sysdeps/or1k/nofpu/libm-test-ulps
> @@ -54,7 +54,6 @@ double: 3
>
> Function: "asinpi":
> double: 1
> -float: 1
>
> Function: "atan":
> double: 1
> diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
> index c9c86de147..fa3cf2e844 100644
> --- a/sysdeps/powerpc/fpu/libm-test-ulps
> +++ b/sysdeps/powerpc/fpu/libm-test-ulps
> @@ -107,25 +107,21 @@ ldouble: 7
>
> Function: "asinpi":
> double: 1
> -float: 1
> float128: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> float128: 2
> ldouble: 4
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> float128: 1
> ldouble: 4
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> float128: 2
> ldouble: 4
>
> diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
> index 6206a9531a..a5184ecad9 100644
> --- a/sysdeps/riscv/nofpu/libm-test-ulps
> +++ b/sysdeps/riscv/nofpu/libm-test-ulps
> @@ -71,7 +71,6 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
> index 124ca4b719..3bfc9668d5 100644
> --- a/sysdeps/riscv/rvd/libm-test-ulps
> +++ b/sysdeps/riscv/rvd/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
> index 364ccf3326..7d61bf1cef 100644
> --- a/sysdeps/s390/fpu/libm-test-ulps
> +++ b/sysdeps/s390/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
> index 1174972002..426f45893e 100644
> --- a/sysdeps/sparc/fpu/libm-test-ulps
> +++ b/sysdeps/sparc/fpu/libm-test-ulps
> @@ -83,22 +83,18 @@ ldouble: 4
>
> Function: "asinpi":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> ldouble: 1
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> ldouble: 2
>
> Function: "atan":
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 5ed5112b49..d4c4bfa42b 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -180,25 +180,21 @@ float: 1
>
> Function: "asinpi":
> double: 1
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_downward":
> double: 1
> -float: 1
> float128: 2
> ldouble: 2
>
> Function: "asinpi_towardzero":
> double: 1
> -float: 2
> float128: 1
> ldouble: 2
>
> Function: "asinpi_upward":
> double: 2
> -float: 2
> float128: 2
> ldouble: 2
>
> --
> 2.43.0
>
>
On 03/02/25 03:40, Paul Zimmermann wrote:
> I suggest the following change which should improve performance on x86_64/x86_64-v2:
>
> --- a/sysdeps/ieee754/flt-32/s_asinpif.c
> +++ b/sysdeps/ieee754/flt-32/s_asinpif.c
> @@ -122,7 +122,7 @@ __asinpif (float x)
> else
> {
> double f = sqrt (1 - az);
> - double c0 = fma (az, c[1], c[0]);
> + double c0 = c[0] + az * c[1];
> double c2 = c[2] + az * c[3];
> double c4 = c[4] + az * c[5];
> double c6 = c[6] + az * c[7];
>
> Moreover "fast patch" should be fast path.
Thanks, this allows dropping the last patch, which adds an asinpif ifunc variant
for x86_64-v3, since the patched code is now faster than master on all x86_64 ABIs.
>
> Paul
>
>> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
>> Cc: DJ Delorie <dj@redhat.com>,
>> Joseph Myers <josmyers@redhat.com>,
>> Paul Zimmermann <Paul.Zimmermann@inria.fr>,
>> Alexei Sibidanov <sibid@uvic.ca>
>> Date: Fri, 31 Jan 2025 16:17:13 -0300
>>
>> The CORE-MATH implementation is correctly rounded (for any rounding mode)
>> and shows better performance than the generic asinpif.
>>
>> The code was adapted to glibc style and to use the definition of
>> math_config.h (to handle errno, overflow, and underflow).
>>
>> Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
>> gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
>>
>> latency                  master    patched   improvement
>> x86_64                   46.4996   51.0456        -9.78%
>> x86_64v2                 46.7551   52.1317       -11.50%
>> x86_64v3                 42.6235   34.8162        18.32%
>> aarch64 (Neoverse)       17.4161   14.3604        17.55%
>> power8                   10.7347    9.0193        15.98%
>> power10                  10.6420    9.0362        15.09%
>>
>> reciprocal-throughput    master    patched   improvement
>> x86_64                   24.7208   29.0812       -17.64%
>> x86_64v2                 24.2177   29.7166       -22.71%
>> x86_64v3                 20.5617   12.3679        39.85%
>> aarch64 (Neoverse)       13.4827    7.17613       46.78%
>> power8                    6.46134   3.56089       44.89%
>> power10                   5.79007   3.49544       39.63%
>>
>> x86_64/x86_64-v2 show slower performance due to the use of an fma
>> operation in the fast patch; only x86_64-v3 provides it without a
>> function call.
>> ---
>> SHARED-FILES | 4 +
>> sysdeps/aarch64/libm-test-ulps | 4 -
>> sysdeps/arc/fpu/libm-test-ulps | 4 -
>> sysdeps/arc/nofpu/libm-test-ulps | 1 -
>> sysdeps/arm/libm-test-ulps | 4 -
>> sysdeps/hppa/fpu/libm-test-ulps | 4 -
>> sysdeps/i386/fpu/libm-test-ulps | 4 -
>> .../i386/i686/fpu/multiarch/libm-test-ulps | 4 -
>> sysdeps/ieee754/flt-32/s_asinpif.c | 136 ++++++++++++++++++
>> sysdeps/loongarch/lp64/libm-test-ulps | 4 -
>> sysdeps/mips/mips64/libm-test-ulps | 4 -
>> sysdeps/or1k/fpu/libm-test-ulps | 4 -
>> sysdeps/or1k/nofpu/libm-test-ulps | 1 -
>> sysdeps/powerpc/fpu/libm-test-ulps | 4 -
>> sysdeps/riscv/nofpu/libm-test-ulps | 1 -
>> sysdeps/riscv/rvd/libm-test-ulps | 4 -
>> sysdeps/s390/fpu/libm-test-ulps | 4 -
>> sysdeps/sparc/fpu/libm-test-ulps | 4 -
>> sysdeps/x86_64/fpu/libm-test-ulps | 4 -
>> 19 files changed, 140 insertions(+), 59 deletions(-)
>> create mode 100644 sysdeps/ieee754/flt-32/s_asinpif.c
>>
LGTM
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> diff --git a/SHARED-FILES b/SHARED-FILES
> index 3fde72644a..e700f4b155 100644
> --- a/SHARED-FILES
> +++ b/SHARED-FILES
> @@ -338,3 +338,7 @@ sysdeps/ieee754/flt-32/s_acospif.c:
> (src/binary32/acospi/acospif.c in CORE-MATH)
> - the code was adapted to use glibc code style and internal
> functions to handle errno, overflow, and underflow.
> +sysdeps/ieee754/flt-32/s_asinpif.c:
> + (src/binary32/asinpi/asinpif.c in CORE-MATH)
> + - the code was adapted to use glibc code style and internal
> + functions to handle errno, overflow, and underflow.
Ok.
> diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
> diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
> diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
> diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
> diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
> diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
> diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
Ok.
> diff --git a/sysdeps/ieee754/flt-32/s_asinpif.c b/sysdeps/ieee754/flt-32/s_asinpif.c
> +/* Correctly-rounded half-revolution arc-sine function for binary32 value.
> +
> +Copyright (c) 2022-2025 Alexei Sibidanov.
> +
> +The original version of this file was copied from the CORE-MATH
> +project (file src/binary32/asinpi/asinpif.c, revision 49e223e).
> +
> +Permission is hereby granted, free of charge, to any person obtaining a copy
> +of this software and associated documentation files (the "Software"), to deal
> +in the Software without restriction, including without limitation the rights
> +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> +copies of the Software, and to permit persons to whom the Software is
> +furnished to do so, subject to the following conditions:
> +
> +The above copyright notice and this permission notice shall be included in all
> +copies or substantial portions of the Software.
> +
> +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> +SOFTWARE.
> +
> +*/
> +
> +#include <errno.h>
> +#include <math.h>
> +#include <stdint.h>
> +#include <libm-alias-float.h>
> +#include "math_config.h"
Ok.
> +float
> +__asinpif (float x)
> +{
> +  float ax = fabsf (x);
> +  double az = ax;
> +  double z = x;
> +  uint32_t t = asuint (x);
> +  int32_t e = (t >> 23) & 0xff;
> +  if (__glibc_unlikely (e >= 127))
> +    {
> +      if (ax == 1.0f)
> +        return copysignf (0.5f, x);
> +      if (e == 0xff && (t << 9))
> +        return x + x; /* nan */
> +      return __math_edomf ((x - x) / (x - x)); /* nan */
> +    }
Ok.
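
As a reading aid for the block above, here is a minimal sketch of how the biased exponent and the significand bits separate the special inputs. The enum, the names classify and float_bits, and the use of memcpy in place of glibc's internal asuint () are assumptions made for this illustration, not part of the patch.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification of the inputs handled by the early-exit
   block; the names are illustrative, not glibc's.  */
enum asinpi_class { IN_RANGE, PLUS_MINUS_ONE, NAN_INPUT, DOMAIN_ERROR };

/* Portable stand-in for glibc's internal asuint ().  */
static uint32_t
float_bits (float x)
{
  uint32_t u;
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&u, &x, sizeof u);
  return u;
}

static enum asinpi_class
classify (float x)
{
  uint32_t t = float_bits (x);
  uint32_t e = (t >> 23) & 0xff;          /* biased exponent */
  if (e < 127)
    return IN_RANGE;                      /* |x| < 1 */
  if ((t & 0x7fffffffu) == 0x3f800000u)
    return PLUS_MINUS_ONE;                /* patch returns +-0.5 */
  if (e == 0xff && (uint32_t) (t << 9) != 0)
    return NAN_INPUT;                     /* patch returns x + x */
  return DOMAIN_ERROR;                    /* |x| > 1 or infinity */
}
```

In the patch, the last case goes through __math_edomf, which returns NaN and sets errno to EDOM.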
> +  int32_t s = 146 - e;
> +  int32_t i = 0;
> +  if (__glibc_likely (s < 32))
> +    i = ((t & (~0u >> 9)) | 1 << 23) >> s;
Ok.
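
A note on what the shift computes, as far as I can tell from the code: the 24-bit significand is shifted right by 146 - e = (127 + 23 - 4) - e, which leaves the integer part of |x| * 16, so i selects one of 16 intervals of width 1/16. The stand-alone sketch below assumes this reading; table_index is a hypothetical name and memcpy replaces glibc's internal asuint ().

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-alone copy of the index computation, for |x| < 1.  */
static int
table_index (float x)
{
  uint32_t t;
  assert (fabsf (x) < 1.0f);
  memcpy (&t, &x, sizeof t);              /* stand-in for asuint () */
  int32_t e = (t >> 23) & 0xff;           /* biased exponent */
  int32_t s = 146 - e;                    /* 146 = 127 + 23 - 4 */
  if (s >= 32)
    return 0;                             /* |x| < 2^-12, well below 1/16 */
  /* 24-bit significand (implicit bit restored), shifted so that only
     the integer part of |x| * 16 remains.  */
  return (int) (((t & (~0u >> 9)) | (1u << 23)) >> s);
}
```

Under this reading, table_index (x) agrees with (int) (fabsf (x) * 16.0f) for every |x| < 1, and only the first row (i == 0) takes the small-argument branch.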
> + static const double ch[][8] =
> + {
> + { 0x1.45f306dc9c882p-2, 0x1.b2995e7b7dc2fp-5, 0x1.8723a1cf50c7ep-6,
> + 0x1.d1a4591d16a29p-7, 0x1.3ce3aa68ddaeep-7, 0x1.d3182ab0cc1bfp-8,
> + 0x1.62b379a8b88e3p-8, 0x1.6811411fcfec2p-8 },
> + { 0x1.ffffffffd3cdap-2, -0x1.17cc1b3355fdcp-4, 0x1.d067a1e8d5a99p-6,
> + -0x1.08e16fb09314ap-6, 0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8,
> + 0x1.5dab64e2dcf15p-8, -0x1.59270e30797acp-9 },
> + { 0x1.fffffff7c4617p-2, -0x1.17cc149ded3a2p-4, 0x1.d0654d4cb2c1ap-6,
> + -0x1.08c3ba713d33ap-6, 0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8,
> + 0x1.303baca167dddp-8, -0x1.dee8d16d06b38p-10 },
> + { 0x1.ffffffa749848p-2, -0x1.17cbe7155935p-4, 0x1.d05a312269adfp-6,
> + -0x1.0862b3ee617d7p-6, 0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8,
> + 0x1.02b82478f95d7p-8, -0x1.52a7b8579e729p-10 },
> + { 0x1.fffffe1f92bb5p-2, -0x1.17cb3e74c64e3p-4, 0x1.d03af67311cbfp-6,
> + -0x1.079441cbfc7ap-6, 0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8,
> + 0x1.b2f1210d9701bp-9, -0x1.e740ddc25afd6p-11 },
> + { 0x1.fffff92beb6e2p-2, -0x1.17c986fe9518bp-4, 0x1.cff98167c9a5ep-6,
> + -0x1.0638b591eae52p-6, 0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8,
> + 0x1.6b9a7ba05dfcep-9, -0x1.640521a43b2dp-11 },
> + { 0x1.ffffeccee5bfcp-2, -0x1.17c5f1753f5eap-4, 0x1.cf874e4fe258fp-6,
> + -0x1.043e6cf77b256p-6, 0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8,
> + 0x1.2f6543162bc61p-9, -0x1.07d5da05822b6p-11 },
> + { 0x1.ffffd2f64431dp-2, -0x1.17bf8208c10c1p-4, 0x1.ced7487cdb124p-6,
> + -0x1.01a0d30932905p-6, 0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8,
> + 0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 },
> + { 0x1.ffffa36d1712ep-2, -0x1.17b523971bd4ep-4, 0x1.cddee26de2deep-6,
> + -0x1.fccb00abaaabcp-7, 0x1.269afc3622342p-7, -0x1.2933152686752p-8,
> + 0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 },
> + { 0x1.ffff5402ab3a1p-2, -0x1.17a5ba85da77ap-4, 0x1.cc96894e05c02p-6,
> + -0x1.f532143cb832ep-7, 0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8,
> + 0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 },
> + { 0x1.fffed8d639751p-2, -0x1.1790349f3ae76p-4, 0x1.caf9a4fd1b398p-6,
> + -0x1.ec986b111342ep-7, 0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9,
> + 0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 },
> + { 0x1.fffe24b714161p-2, -0x1.177394fbcb719p-4, 0x1.c90652d920ebdp-6,
> + -0x1.e3239197bddf1p-7, 0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9,
> + 0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 },
> + { 0x1.fffd298bec9e2p-2, -0x1.174efbfd34648p-4, 0x1.c6bcfe48ea92bp-6,
> + -0x1.d8f9f2a16157cp-7, 0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9,
> + 0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 },
> + { 0x1.fffbd8b784c4dp-2, -0x1.1721abdd3722ep-4, 0x1.c41fee756d4bp-6,
> + -0x1.ce40bccf8065fp-7, 0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9,
> + 0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 },
> + { 0x1.fffa23749cf88p-2, -0x1.16eb0a8285c06p-4, 0x1.c132d762e1b0dp-6,
> + -0x1.c31a959398f4ep-7, 0x1.ac1c5b46bc8ap-8, -0x1.3e34f1abe51dcp-9,
> + 0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 },
> + { 0x1.fff7fb25bb407p-2, -0x1.16aaa14d7564p-4, 0x1.bdfa75fca5ff2p-6,
> + -0x1.b7a6e260d079cp-7, 0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9,
> + 0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15
> + }
> + };
Ok.
> +  const double *c = ch[i];
> +  double z2 = z * z;
> +  double z4 = z2 * z2;
> +  if (__glibc_unlikely (i == 0))
> +    {
> +      double c0 = c[0] + z2 * c[1];
> +      double c2 = c[2] + z2 * c[3];
> +      double c4 = c[4] + z2 * c[5];
> +      double c6 = c[6] + z2 * c[7];
> +      c0 += c2 * z4;
> +      c4 += c6 * z4;
> +      c0 += c4 * (z4 * z4);
> +      if (__glibc_unlikely (ax != 0.0 && ax <= 0x1.921fb4p-126f))
> +        __set_errno (ERANGE);
> +      return z * c0;
> +    }
Ok.
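
For readers following the small-argument branch: it evaluates a degree-7 polynomial in z^2 by pairing coefficients (an Estrin-like grouping) instead of a Horner chain, then multiplies by z. The sketch below is an illustration under that reading, not the patch itself. ROW0 is the first row of ch[] copied from the patch; poly7_horner, poly7_paired and asinpi_small are hypothetical names, and the errno handling is left out.

```c
#include <assert.h>
#include <math.h>

/* First row of ch[] from the patch (used when i == 0).  */
static const double ROW0[8] = {
  0x1.45f306dc9c882p-2, 0x1.b2995e7b7dc2fp-5, 0x1.8723a1cf50c7ep-6,
  0x1.d1a4591d16a29p-7, 0x1.3ce3aa68ddaeep-7, 0x1.d3182ab0cc1bfp-8,
  0x1.62b379a8b88e3p-8, 0x1.6811411fcfec2p-8
};

/* Degree-7 polynomial in u, Horner form (reference only).  */
static double
poly7_horner (const double c[8], double u)
{
  double r = c[7];
  for (int k = 6; k >= 0; k--)
    r = c[k] + u * r;
  return r;
}

/* Same polynomial, grouped as in the patch: four independent pairs
   combined with u^2 and u^4, which shortens the dependency chain.  */
static double
poly7_paired (const double c[8], double u)
{
  double u2 = u * u;
  double c0 = c[0] + u * c[1];
  double c2 = c[2] + u * c[3];
  double c4 = c[4] + u * c[5];
  double c6 = c[6] + u * c[7];
  c0 += c2 * u2;
  c4 += c6 * u2;
  c0 += c4 * (u2 * u2);
  return c0;
}

/* Hypothetical helper: asinpi for |x| < 1/16, without errno handling.  */
static double
asinpi_small (float x)
{
  assert (fabsf (x) < 0x1p-4f);
  double z = x;
  return z * poly7_paired (ROW0, z * z);
}
```

The leading coefficient of ROW0 is 1/pi to double precision, consistent with asinpi (x) ~ x/pi for small x.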
> +  else
> +    {
> +      double f = sqrt (1 - az);
> +      double c0 = fma (az, c[1], c[0]);
> +      double c2 = c[2] + az * c[3];
> +      double c4 = c[4] + az * c[5];
> +      double c6 = c[6] + az * c[7];
> +      c0 += c2 * z2;
> +      c4 += c6 * z2;
> +      c0 += c4 * z4;
> +      double r = fma (-c0, copysign (f, x), copysign (0.5, x));
> +      return r;
> +    }
> +}
> +libm_alias_float (__asinpi, asinpi)
Ok.
> diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
> diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
> diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps
> diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps
> diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
> diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
> diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
> diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
> diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
Ok.
@@ -338,3 +338,7 @@ sysdeps/ieee754/flt-32/s_acospif.c:
(src/binary32/acospi/acospif.c in CORE-MATH)
- the code was adapted to use glibc code style and internal
functions to handle errno, overflow, and underflow.
+sysdeps/ieee754/flt-32/s_asinpif.c:
+ (src/binary32/asinpi/asinpif.c in CORE-MATH)
+ - the code was adapted to use glibc code style and internal
+ functions to handle errno, overflow, and underflow.
@@ -115,22 +115,18 @@ ldouble: 4
Function: "asinpi":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_downward":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_towardzero":
double: 1
-float: 2
ldouble: 1
Function: "asinpi_upward":
double: 2
-float: 2
ldouble: 2
Function: "atan":
@@ -63,19 +63,15 @@ double: 3
Function: "asinpi":
double: 1
-float: 1
Function: "asinpi_downward":
double: 1
-float: 1
Function: "asinpi_towardzero":
double: 1
-float: 2
Function: "asinpi_upward":
double: 2
-float: 2
Function: "atan":
double: 1
@@ -18,7 +18,6 @@ double: 2
Function: "asinpi":
double: 1
-float: 1
Function: "atan":
double: 1
@@ -63,19 +63,15 @@ double: 3
Function: "asinpi":
double: 1
-float: 1
Function: "asinpi_downward":
double: 1
-float: 1
Function: "asinpi_towardzero":
double: 1
-float: 2
Function: "asinpi_upward":
double: 2
-float: 2
Function: "atan":
double: 1
@@ -63,19 +63,15 @@ double: 3
Function: "asinpi":
double: 1
-float: 1
Function: "asinpi_downward":
double: 1
-float: 1
Function: "asinpi_towardzero":
double: 1
-float: 2
Function: "asinpi_upward":
double: 2
-float: 2
Function: "atan":
double: 1
@@ -101,25 +101,21 @@ ldouble: 5
Function: "asinpi":
double: 1
-float: 1
float128: 2
ldouble: 2
Function: "asinpi_downward":
double: 2
-float: 1
float128: 2
ldouble: 2
Function: "asinpi_towardzero":
double: 1
-float: 2
float128: 1
ldouble: 2
Function: "asinpi_upward":
double: 2
-float: 2
float128: 2
ldouble: 2
@@ -101,25 +101,21 @@ ldouble: 5
Function: "asinpi":
double: 1
-float: 1
float128: 2
ldouble: 2
Function: "asinpi_downward":
double: 2
-float: 1
float128: 2
ldouble: 2
Function: "asinpi_towardzero":
double: 1
-float: 2
float128: 1
ldouble: 2
Function: "asinpi_upward":
double: 2
-float: 2
float128: 2
ldouble: 2
new file mode 100644
@@ -0,0 +1,136 @@
+/* Correctly-rounded half-revolution arc-sine function for binary32 value.
+
+Copyright (c) 2022-2025 Alexei Sibidanov.
+
+The original version of this file was copied from the CORE-MATH
+project (file src/binary32/asinpi/asinpif.c, revision 49e223e).
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+*/
+
+#include <errno.h>
+#include <math.h>
+#include <stdint.h>
+#include <libm-alias-float.h>
+#include "math_config.h"
+
+float
+__asinpif (float x)
+{
+  float ax = fabsf (x);
+  double az = ax;
+  double z = x;
+  uint32_t t = asuint (x);
+  int32_t e = (t >> 23) & 0xff;
+  if (__glibc_unlikely (e >= 127))
+    {
+      if (ax == 1.0f)
+        return copysignf (0.5f, x);
+      if (e == 0xff && (t << 9))
+        return x + x; /* nan */
+      return __math_edomf ((x - x) / (x - x)); /* nan */
+    }
+  int32_t s = 146 - e;
+  int32_t i = 0;
+  if (__glibc_likely (s < 32))
+    i = ((t & (~0u >> 9)) | 1 << 23) >> s;
+ static const double ch[][8] =
+ {
+ { 0x1.45f306dc9c882p-2, 0x1.b2995e7b7dc2fp-5, 0x1.8723a1cf50c7ep-6,
+ 0x1.d1a4591d16a29p-7, 0x1.3ce3aa68ddaeep-7, 0x1.d3182ab0cc1bfp-8,
+ 0x1.62b379a8b88e3p-8, 0x1.6811411fcfec2p-8 },
+ { 0x1.ffffffffd3cdap-2, -0x1.17cc1b3355fdcp-4, 0x1.d067a1e8d5a99p-6,
+ -0x1.08e16fb09314ap-6, 0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8,
+ 0x1.5dab64e2dcf15p-8, -0x1.59270e30797acp-9 },
+ { 0x1.fffffff7c4617p-2, -0x1.17cc149ded3a2p-4, 0x1.d0654d4cb2c1ap-6,
+ -0x1.08c3ba713d33ap-6, 0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8,
+ 0x1.303baca167dddp-8, -0x1.dee8d16d06b38p-10 },
+ { 0x1.ffffffa749848p-2, -0x1.17cbe7155935p-4, 0x1.d05a312269adfp-6,
+ -0x1.0862b3ee617d7p-6, 0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8,
+ 0x1.02b82478f95d7p-8, -0x1.52a7b8579e729p-10 },
+ { 0x1.fffffe1f92bb5p-2, -0x1.17cb3e74c64e3p-4, 0x1.d03af67311cbfp-6,
+ -0x1.079441cbfc7ap-6, 0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8,
+ 0x1.b2f1210d9701bp-9, -0x1.e740ddc25afd6p-11 },
+ { 0x1.fffff92beb6e2p-2, -0x1.17c986fe9518bp-4, 0x1.cff98167c9a5ep-6,
+ -0x1.0638b591eae52p-6, 0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8,
+ 0x1.6b9a7ba05dfcep-9, -0x1.640521a43b2dp-11 },
+ { 0x1.ffffeccee5bfcp-2, -0x1.17c5f1753f5eap-4, 0x1.cf874e4fe258fp-6,
+ -0x1.043e6cf77b256p-6, 0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8,
+ 0x1.2f6543162bc61p-9, -0x1.07d5da05822b6p-11 },
+ { 0x1.ffffd2f64431dp-2, -0x1.17bf8208c10c1p-4, 0x1.ced7487cdb124p-6,
+ -0x1.01a0d30932905p-6, 0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8,
+ 0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 },
+ { 0x1.ffffa36d1712ep-2, -0x1.17b523971bd4ep-4, 0x1.cddee26de2deep-6,
+ -0x1.fccb00abaaabcp-7, 0x1.269afc3622342p-7, -0x1.2933152686752p-8,
+ 0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 },
+ { 0x1.ffff5402ab3a1p-2, -0x1.17a5ba85da77ap-4, 0x1.cc96894e05c02p-6,
+ -0x1.f532143cb832ep-7, 0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8,
+ 0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 },
+ { 0x1.fffed8d639751p-2, -0x1.1790349f3ae76p-4, 0x1.caf9a4fd1b398p-6,
+ -0x1.ec986b111342ep-7, 0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9,
+ 0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 },
+ { 0x1.fffe24b714161p-2, -0x1.177394fbcb719p-4, 0x1.c90652d920ebdp-6,
+ -0x1.e3239197bddf1p-7, 0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9,
+ 0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 },
+ { 0x1.fffd298bec9e2p-2, -0x1.174efbfd34648p-4, 0x1.c6bcfe48ea92bp-6,
+ -0x1.d8f9f2a16157cp-7, 0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9,
+ 0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 },
+ { 0x1.fffbd8b784c4dp-2, -0x1.1721abdd3722ep-4, 0x1.c41fee756d4bp-6,
+ -0x1.ce40bccf8065fp-7, 0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9,
+ 0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 },
+ { 0x1.fffa23749cf88p-2, -0x1.16eb0a8285c06p-4, 0x1.c132d762e1b0dp-6,
+ -0x1.c31a959398f4ep-7, 0x1.ac1c5b46bc8ap-8, -0x1.3e34f1abe51dcp-9,
+ 0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 },
+ { 0x1.fff7fb25bb407p-2, -0x1.16aaa14d7564p-4, 0x1.bdfa75fca5ff2p-6,
+ -0x1.b7a6e260d079cp-7, 0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9,
+ 0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15
+ }
+ };
+  const double *c = ch[i];
+  double z2 = z * z;
+  double z4 = z2 * z2;
+  if (__glibc_unlikely (i == 0))
+    {
+      double c0 = c[0] + z2 * c[1];
+      double c2 = c[2] + z2 * c[3];
+      double c4 = c[4] + z2 * c[5];
+      double c6 = c[6] + z2 * c[7];
+      c0 += c2 * z4;
+      c4 += c6 * z4;
+      c0 += c4 * (z4 * z4);
+      if (__glibc_unlikely (ax != 0.0 && ax <= 0x1.921fb4p-126f))
+        __set_errno (ERANGE);
+      return z * c0;
+    }
+  else
+    {
+      double f = sqrt (1 - az);
+      double c0 = fma (az, c[1], c[0]);
+      double c2 = c[2] + az * c[3];
+      double c4 = c[4] + az * c[5];
+      double c6 = c[6] + az * c[7];
+      c0 += c2 * z2;
+      c4 += c6 * z2;
+      c0 += c4 * z4;
+      double r = fma (-c0, copysign (f, x), copysign (0.5, x));
+      return r;
+    }
+}
+libm_alias_float (__asinpi, asinpi)
@@ -83,22 +83,18 @@ ldouble: 4
Function: "asinpi":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_downward":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_towardzero":
double: 1
-float: 2
ldouble: 1
Function: "asinpi_upward":
double: 2
-float: 2
ldouble: 2
Function: "atan":
@@ -83,22 +83,18 @@ ldouble: 4
Function: "asinpi":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_downward":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_towardzero":
double: 1
-float: 2
ldouble: 1
Function: "asinpi_upward":
double: 2
-float: 2
ldouble: 2
Function: "atan":
@@ -63,19 +63,15 @@ double: 3
Function: "asinpi":
double: 1
-float: 1
Function: "asinpi_downward":
double: 1
-float: 1
Function: "asinpi_towardzero":
double: 1
-float: 2
Function: "asinpi_upward":
double: 2
-float: 2
Function: "atan":
double: 1
@@ -54,7 +54,6 @@ double: 3
Function: "asinpi":
double: 1
-float: 1
Function: "atan":
double: 1
@@ -107,25 +107,21 @@ ldouble: 7
Function: "asinpi":
double: 1
-float: 1
float128: 1
ldouble: 2
Function: "asinpi_downward":
double: 1
-float: 1
float128: 2
ldouble: 4
Function: "asinpi_towardzero":
double: 1
-float: 2
float128: 1
ldouble: 4
Function: "asinpi_upward":
double: 2
-float: 2
float128: 2
ldouble: 4
@@ -71,7 +71,6 @@ ldouble: 4
Function: "asinpi":
double: 1
-float: 1
ldouble: 2
Function: "atan":
@@ -83,22 +83,18 @@ ldouble: 4
Function: "asinpi":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_downward":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_towardzero":
double: 1
-float: 2
ldouble: 1
Function: "asinpi_upward":
double: 2
-float: 2
ldouble: 2
Function: "atan":
@@ -83,22 +83,18 @@ ldouble: 4
Function: "asinpi":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_downward":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_towardzero":
double: 1
-float: 2
ldouble: 1
Function: "asinpi_upward":
double: 2
-float: 2
ldouble: 2
Function: "atan":
@@ -83,22 +83,18 @@ ldouble: 4
Function: "asinpi":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_downward":
double: 1
-float: 1
ldouble: 2
Function: "asinpi_towardzero":
double: 1
-float: 2
ldouble: 1
Function: "asinpi_upward":
double: 2
-float: 2
ldouble: 2
Function: "atan":
@@ -180,25 +180,21 @@ float: 1
Function: "asinpi":
double: 1
-float: 1
float128: 2
ldouble: 2
Function: "asinpi_downward":
double: 1
-float: 1
float128: 2
ldouble: 2
Function: "asinpi_towardzero":
double: 1
-float: 2
float128: 1
ldouble: 2
Function: "asinpi_upward":
double: 2
-float: 2
float128: 2
ldouble: 2