Message ID | 20241129132032.476978-1-adhemerval.zanella@linaro.org (mailing list archive) |
---|---|
Headers | From: Adhemerval Zanella <adhemerval.zanella@linaro.org> To: libc-alpha@sourceware.org Cc: DJ Delorie <dj@redhat.com> Subject: [PATCH 00/23] Add remaining CORE-MATH binary32 implementations to libm Date: Fri, 29 Nov 2024 10:17:24 -0300 Message-ID: <20241129132032.476978-1-adhemerval.zanella@linaro.org> |
Series | Add remaining CORE-MATH binary32 implementations to libm |
Message
Adhemerval Zanella Netto
Nov. 29, 2024, 1:17 p.m. UTC
This patchset adds the optimized and correctly rounded acosf, acoshf, asinf, asinhf, atanf, atan2f, atanhf, coshf, sinhf, and tanhf from CORE-MATH [1].

Each implementation has a benchmark to evaluate the performance improvements. In general, the results are pretty good, with just some remarks:

* acosf/asinf hit the branch predictor hard on the range x < 0x1.c2a1dcp-1 for acosf and x < 0x1.852p+126 for asinf.

  For acosf, this is required to make it correctly rounded in non-default rounding modes, and for asinf, it is a fast-path optimization.

  The performance profiles are wildly different depending on the chip: I see regressions compared to the glibc implementation on AMD Zen3 targeting x86-64/x86-64-v2, but I also see large improvements for x86-64-v3 and also on aarch64 Neoverse-N1 and powerpc POWER10.

  I think it should not be a blocker for integration.

* coshf performance is the only regression compared to the current glibc. This is mostly due to the benchmark used (which I modeled using the CORE-MATH input range) showing that the glibc code hotspot is on expf, an optimized version from ARM-optimized routines.

  Neither the current expf nor coshf is correctly rounded; the maximum error is 2 ulps for FE_TONEAREST and 3 ulps for the other rounding modes. I am not sure if this would be a blocker, and I plan to remove the old SVID compat wrapper in subsequent patches (which should improve the function performance by about 10%).
[1] https://gitlab.inria.fr/core-math/core-math

Adhemerval Zanella (23):
  benchtests: Add acosf benchmark
  benchtests: Add acoshf benchmark
  benchtests: Add asinf benchmark
  benchtests: Add asinhf benchmark
  benchtests: Add atanf benchmark
  benchtests: Add atan2f benchmark
  benchtests: Add atanhf benchmark
  benchtests: Add coshf benchmark
  benchtests: Add sinhf benchmark
  benchtests: Add tanhf benchmark
  math: Add inf support on gen-auto-libm-tests.c
  math: Fix the expected atanf (inf) results
  math: Fix the expected atan2f (inf) results
  math: Use acosf from CORE-MATH
  math: Use acoshf from CORE-MATH
  math: Use asinf from CORE-MATH
  math: Use asinhf from CORE-MATH
  math: Use atanf from CORE-MATH
  math: Use atan2f from CORE-MATH
  math: Use atanhf from CORE-MATH
  math: Use coshf from CORE-MATH
  math: Use sinhf from CORE-MATH
  math: Use tanhf from CORE-MATH

 SHARED-FILES | 40 +
 benchtests/Makefile | 10 +
 benchtests/acosf-inputs | 2710 +++++++++++++++++
 benchtests/acoshf-inputs | 1005 ++++++
 benchtests/asinf-inputs | 2710 +++++++++++++++++
 benchtests/asinhf-inputs | 2005 ++++++++++++
 benchtests/atan2f-inputs | 2005 ++++++++++++
 benchtests/atanf-inputs | 2005 ++++++++++++
 benchtests/atanhf-inputs | 2005 ++++++++++++
 benchtests/coshf-inputs | 2005 ++++++++++++
 benchtests/sinhf-inputs | 2005 ++++++++++++
 benchtests/tanhf-inputs | 2005 ++++++++++++
 math/auto-libm-test-in | 52 +
 math/auto-libm-test-out-atan | 50 +
 math/auto-libm-test-out-atan2 | 2316 ++++++++++++++
 math/gen-auto-libm-tests.c | 23 +-
 math/libm-test-atan.inc | 2 -
 math/libm-test-atan2.inc | 56 -
 sysdeps/aarch64/libm-test-ulps | 44 +-
 sysdeps/alpha/fpu/libm-test-ulps | 40 -
 sysdeps/arc/fpu/libm-test-ulps | 40 -
 sysdeps/arc/nofpu/libm-test-ulps | 10 -
 sysdeps/arm/libm-test-ulps | 48 +-
 sysdeps/csky/fpu/libm-test-ulps | 40 -
 sysdeps/csky/nofpu/libm-test-ulps | 40 -
 sysdeps/hppa/fpu/libm-test-ulps | 40 -
 sysdeps/i386/fpu/e_acosf.S | 23 -
 sysdeps/i386/fpu/e_acoshf.S | 101 -
 sysdeps/i386/fpu/e_asinf.S | 38 -
 sysdeps/i386/fpu/e_atan2f.S | 30 -
 sysdeps/i386/fpu/e_atanhf.S | 110 -
 sysdeps/i386/fpu/libm-test-ulps | 25 -
 sysdeps/i386/fpu/s_asinhf.S | 139 -
 sysdeps/i386/fpu/s_atanf.S | 30 -
 .../i386/i686/fpu/multiarch/libm-test-ulps | 25 -
 sysdeps/ieee754/flt-32/e_acosf.c | 191 +-
 sysdeps/ieee754/flt-32/e_acoshf.c | 230 +-
 sysdeps/ieee754/flt-32/e_asinf.c | 210 +-
 sysdeps/ieee754/flt-32/e_atan2f.c | 337 +-
 sysdeps/ieee754/flt-32/e_atanhf.c | 210 +-
 sysdeps/ieee754/flt-32/e_coshf.c | 156 +-
 sysdeps/ieee754/flt-32/e_sinhf.c | 169 +-
 sysdeps/ieee754/flt-32/s_asinhf.c | 219 +-
 sysdeps/ieee754/flt-32/s_atanf.c | 186 +-
 sysdeps/ieee754/flt-32/s_tanhf.c | 131 +-
 sysdeps/loongarch/lp64/libm-test-ulps | 44 +-
 sysdeps/m68k/coldfire/fpu/libm-test-ulps | 2 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps | 11 -
 sysdeps/microblaze/libm-test-ulps | 10 -
 sysdeps/mips/mips32/libm-test-ulps | 40 -
 sysdeps/mips/mips64/libm-test-ulps | 40 -
 sysdeps/or1k/fpu/libm-test-ulps | 40 -
 sysdeps/or1k/nofpu/libm-test-ulps | 40 -
 sysdeps/powerpc/fpu/libm-test-ulps | 44 +-
 sysdeps/powerpc/nofpu/libm-test-ulps | 40 -
 sysdeps/riscv/nofpu/libm-test-ulps | 40 -
 sysdeps/riscv/rvd/libm-test-ulps | 40 -
 sysdeps/s390/fpu/libm-test-ulps | 40 -
 sysdeps/sh/libm-test-ulps | 20 -
 sysdeps/sparc/fpu/libm-test-ulps | 40 -
 sysdeps/x86_64/fpu/libm-test-ulps | 46 +-
 61 files changed, 24381 insertions(+), 2027 deletions(-)
 create mode 100644 benchtests/acosf-inputs
 create mode 100644 benchtests/acoshf-inputs
 create mode 100644 benchtests/asinf-inputs
 create mode 100644 benchtests/asinhf-inputs
 create mode 100644 benchtests/atan2f-inputs
 create mode 100644 benchtests/atanf-inputs
 create mode 100644 benchtests/atanhf-inputs
 create mode 100644 benchtests/coshf-inputs
 create mode 100644 benchtests/sinhf-inputs
 create mode 100644 benchtests/tanhf-inputs
 delete mode 100644 sysdeps/i386/fpu/e_acosf.S
 delete mode 100644 sysdeps/i386/fpu/e_acoshf.S
 delete mode 100644 sysdeps/i386/fpu/e_asinf.S
 delete mode 100644 sysdeps/i386/fpu/e_atan2f.S
 delete mode 100644 sysdeps/i386/fpu/e_atanhf.S
 delete mode 100644 sysdeps/i386/fpu/s_asinhf.S
 delete mode 100644 sysdeps/i386/fpu/s_atanf.S
Comments
* Adhemerval Zanella:

> * acosf/asinf hit the branch predictor hard on the range x < 0x1.c2a1dcp-1
>   for acosf and x < 0x1.852p+126 for asinf.
>
>   For acosf, this is required to make it correctly rounded in non-default
>   rounding modes, and for asinf, it is a fast-path optimization.
>
>   The performance profiles are wildly different depending on the chip: I see
>   regressions compared to the glibc implementation on AMD Zen3 targeting
>   x86-64/x86-64-v2, but I also see large improvements for x86-64-v3 and also
>   on aarch64 Neoverse-N1 and powerpc POWER10.

We can add an FMA variant for acosf/asinf as an IFUNC. This should avoid the performance regression for x86-64 on most machines.

According to the commit message, that would also help with coshf for x86-64?

Thanks,
Florian
I'm currently[*] seeing ULP failures for atan, atan2, asin, and acos when passed PI-related constants, for example:

FAIL: math/test-double-acospi
FAIL: math/test-double-asinpi
FAIL: math/test-double-atan2pi
FAIL: math/test-double-atanpi
FAIL: math/test-float-acospi
FAIL: math/test-float-asinpi
FAIL: math/test-float-atan2pi
FAIL: math/test-float-atanpi
. . .

... on s390, ppc64le, and i686.

The results are all "saw 1 expected 0 ULPs".

Could these be related to this patch set?

[*] while syncing glibc to Fedora Rawhide, at least. The i686 ci/cd trybot is not seeing ULP problems.
* DJ Delorie:

> I'm currently[*] seeing ULP failures for atan, atan2, asin, and acos
> when passed PI-related constants, for example:
>
> FAIL: math/test-double-acospi
> FAIL: math/test-double-asinpi
> FAIL: math/test-double-atan2pi
> FAIL: math/test-double-atanpi
> FAIL: math/test-float-acospi
> FAIL: math/test-float-asinpi
> FAIL: math/test-float-atan2pi
> FAIL: math/test-float-atanpi
> . . .
>
> ... on s390, ppc64le, and i686.
>
> The results are all "saw 1 expected 0 ULPs".
>
> Could these be related to this patch set?
>
> [*] while syncing glibc to Fedora Rawhide, at least.  The i686 ci/cd
> trybot is not seeing ULP problems.

Yes, these functions are not expected to be correctly rounded, and as they are newly added, many architectures do not have ulps bounds for them yet.

Thanks,
Florian
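The "saw 1 expected 0 ULPs" message is consistent with a test harness that falls back to a bound of zero when an architecture has no recorded entry for a function. A rough sketch of that logic is shown below; the table contents, key format, and function names are hypothetical and do not reproduce the format of the real `libm-test-ulps` files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ulps_entry
{
  const char *func;   /* hypothetical key, e.g. "cosh_float" */
  int max_ulps;       /* largest error accepted for that function */
};

/* Hypothetical excerpt of one architecture's recorded bounds.  A newly
   added function such as acospi has no entry here yet.  */
static const struct ulps_entry known_bounds[] = {
  { "cosh_float", 2 },
  { "exp_float", 1 },
};

/* Return the recorded bound, or 0 when the function is not listed.  */
static int
allowed_ulps (const char *func)
{
  size_t n = sizeof known_bounds / sizeof known_bounds[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (known_bounds[i].func, func) == 0)
      return known_bounds[i].max_ulps;
  return 0;
}

/* A result passes when its observed error does not exceed the bound.  */
static int
check_result (const char *func, int observed_ulps)
{
  return observed_ulps <= allowed_ulps (func);
}
```

Under this model, a 1-ulp result from an unlisted function fails even though the function behaves as intended, and the failure disappears once the architecture's bounds are regenerated.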
> > I'm currently[*] seeing ULP failures for atan, atan2, asin, and acos
> > when passed PI-related constants, for example:
> >
> > FAIL: math/test-double-acospi
> > FAIL: math/test-double-asinpi
> > FAIL: math/test-double-atan2pi
> > FAIL: math/test-double-atanpi
> > FAIL: math/test-float-acospi
> > FAIL: math/test-float-asinpi
> > FAIL: math/test-float-atan2pi
> > FAIL: math/test-float-atanpi
> > . . .
> >
> > ... on s390, ppc64le, and i686.
> >
> > The results are all "saw 1 expected 0 ULPs".
> >
> > Could these be related to this patch set?
> >
> > [*] while syncing glibc to Fedora Rawhide, at least.  The i686 ci/cd
> > trybot is not seeing ULP problems.
>
> Yes, these functions are not expected to be correctly rounded, and as
> they are newly added, many architectures do not have ulps bounds for
> them yet.

however, this is unrelated to the "Add remaining CORE-MATH binary32 implementations" patchset, since these functions do not yet use the CORE-MATH implementations. This will have to wait after the 2.41 release.

Paul
On Tue, 17 Dec 2024, Paul Zimmermann wrote:

> however, this is unrelated to the "Add remaining CORE-MATH binary32 implementations"
> patchset, since these functions do not yet use the CORE-MATH implementations.
> This will have to wait after the 2.41 release.

The release freeze hasn't started yet, though it's soon (soft freeze 21 Dec, hard freeze 4 Jan - changing which implementation is used for a new function like this would probably be OK until we're in hard freeze, though it may be hard to get review in the next couple of weeks).
Florian Weimer <fweimer@redhat.com> writes:
> and as they are newly added,
Ah, that's the key. Thanks!
Joseph Myers <josmyers@redhat.com> writes:

> On Tue, 17 Dec 2024, Paul Zimmermann wrote:
>
>> however, this is unrelated to the "Add remaining CORE-MATH binary32 implementations"
>> patchset, since these functions do not yet use the CORE-MATH implementations.
>> This will have to wait after the 2.41 release.
>
> The release freeze hasn't started yet, though it's soon (soft freeze 21
> Dec, hard freeze 4 Jan - changing which implementation is used for a new
> function like this would probably be OK until we're in hard freeze, though
> it may be hard to get review in the next couple of weeks).

Given how long it takes to review these, and the upcoming holiday break for many of us, I think 2.41 is much more attainable.