From patchwork Thu Oct 5 16:10:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Ramsay X-Patchwork-Id: 77182 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A19283888C54 for ; Thu, 5 Oct 2023 16:12:08 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR03-AM7-obe.outbound.protection.outlook.com (mail-am7eur03on2056.outbound.protection.outlook.com [40.107.105.56]) by sourceware.org (Postfix) with ESMTPS id 9D638385773C for ; Thu, 5 Oct 2023 16:11:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9D638385773C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=yexMuPn3UGo/AEeGnyNbrt088CTCuk1RVuNixrhVMp0=; b=bOLsX6Jl4vzsNSduChflsZvmYDq3rLX3Qml4iKeYyc+q4ywhCMKSPXv92EsiF/1iD9Hsx9v4DUTlamOZ//pff2vzdCZWSP6UMpR/7hXzauZMxUOZIM6iImIF6/u+hJCFLqOuNnSR4AUADhso3ZZOSMZR4loCvvvBW5C1d9u3BJk= Received: from AS9PR06CA0549.eurprd06.prod.outlook.com (2603:10a6:20b:485::22) by AM0PR08MB5395.eurprd08.prod.outlook.com (2603:10a6:208:188::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6813.20; Thu, 5 Oct 2023 16:11:04 +0000 Received: from AM7EUR03FT008.eop-EUR03.prod.protection.outlook.com (2603:10a6:20b:485:cafe::d1) by AS9PR06CA0549.outlook.office365.com (2603:10a6:20b:485::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6838.33 via Frontend Transport; Thu, 5 Oct 2023 16:11:04 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM7EUR03FT008.mail.protection.outlook.com (100.127.141.25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6863.28 via Frontend Transport; Thu, 5 Oct 2023 16:11:04 +0000 Received: ("Tessian outbound 6d14f3380669:v211"); Thu, 05 Oct 2023 16:11:04 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 30ec9364c250670a X-CR-MTA-TID: 64aa7808 Received: from e98dc09077db.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id B281D666-ADC4-48D2-8A4E-28CFDA9A7951.1; Thu, 05 Oct 2023 16:10:58 +0000 Received: from EUR04-HE1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id e98dc09077db.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Thu, 05 Oct 2023 16:10:58 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=f+sYcusRFHraHFNcreC9gjyBiwKUp1VTI3+w2pzYCOjNfOAQ/lTVONzMO/r5ysftI3J2kpmIoPZ6vzOkSHfm9LCOK1wEmyy9CKdmAbmcN3rbcrX+pFot/ALLZPBcKHq9kYbhg+vHFz/3HXOVEa9+4yQaZhznZERqrxjXVNiwji3Bunb8ODjQDBFTdY0NsnjJuIsjve3hmkHIeZVMh2wPAEp9RvhIIas7ByEqQTqKSroLmM6fLSrrBLrJIuTqjPHKHgqDKfRFN8qzSh5obVcpuneoSBp7Uyb7nqMFs/AoRHfUlqMq7cN77w0VTJAqFUNsUh4DeNxinqPe6JbhGwqTgg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=yexMuPn3UGo/AEeGnyNbrt088CTCuk1RVuNixrhVMp0=; b=FqrCdkiZKG884bQNvtpceQkKwsmaPrcOmx0C4c8//uJYKAaBHyn97QZOvhu+Okyczi7cp440w9yKWXxP7MfPy09F28541lc0uJBGboGn0YnnXXyTjE7AlDO/4JeTihdSzaPSqW6mabXUZJ3Mb8Nvowel+K6UtDM1eQ0uvVSO6XWcN00h1MfMthwQS0utpdvHEP6T30cjhCTuFoFC2HYh5tepeMzCqs3iC6ilqizb+UNky8D82zenLPM/j4kCm5Wc1Gz2cQpIAxq2c9wBbXECIYjSE/Kvq4Pn0zr74EwU1eOsENnVw4U/6F6ZrD90IReL7V3kU4aX4UTKKvJW5unCcw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=yexMuPn3UGo/AEeGnyNbrt088CTCuk1RVuNixrhVMp0=; b=bOLsX6Jl4vzsNSduChflsZvmYDq3rLX3Qml4iKeYyc+q4ywhCMKSPXv92EsiF/1iD9Hsx9v4DUTlamOZ//pff2vzdCZWSP6UMpR/7hXzauZMxUOZIM6iImIF6/u+hJCFLqOuNnSR4AUADhso3ZZOSMZR4loCvvvBW5C1d9u3BJk= Received: from AM5PR1001CA0072.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:206:15::49) by DB4PR08MB7933.eurprd08.prod.outlook.com (2603:10a6:10:37b::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6792.26; Thu, 5 Oct 2023 16:10:54 +0000 Received: from AM7EUR03FT004.eop-EUR03.prod.protection.outlook.com (2603:10a6:206:15:cafe::c1) by AM5PR1001CA0072.outlook.office365.com (2603:10a6:206:15::49) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6838.33 via Frontend Transport; Thu, 5 Oct 2023 16:10:54 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by AM7EUR03FT004.mail.protection.outlook.com (100.127.140.210) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6863.29 via Frontend Transport; Thu, 5 Oct 2023 16:10:54 +0000 Received: from AZ-NEU-EX04.Arm.com (10.251.24.32) by AZ-NEU-EX04.Arm.com (10.251.24.32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Thu, 5 Oct 2023 16:10:53 +0000 Received: from vcn-man-apps.manchester.arm.com (10.32.108.22) by mail.arm.com (10.251.24.32) with Microsoft SMTP Server id 15.1.2507.27 via Frontend Transport; Thu, 5 Oct 2023 16:10:53 +0000 From: Joe Ramsay To: CC: Joe Ramsay Subject: [PATCH v2 1/5] aarch64: Add vector implementations of tan routines Date: Thu, 5 Oct 2023 17:10:48 +0100 Message-ID: <20231005161052.11878-1-Joe.Ramsay@arm.com> X-Mailer: git-send-email 2.27.0 MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: AM7EUR03FT004:EE_|DB4PR08MB7933:EE_|AM7EUR03FT008:EE_|AM0PR08MB5395:EE_ X-MS-Office365-Filtering-Correlation-Id: 1c0ebe0d-f7ff-4655-a038-08dbc5bdad1f x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: yo5i/m8DHFj6pyK78vHc0Gi5gVgLfmzBflX8yaORJ233Ro4kJJXgMcNBtw9u6zX3NLvcU+7YLOpLz4dIrYpJDpkVrp3J/R/e1EdZDO19Q39V18MT+oegXOmPsd3A20lH4hlvGgTnj8zyOQtaiFEJ4jEKUA8nk5ldhIcrdvOBd/14dDDsiRCpyVERwmKs4BpIqqUWeu/NEswOTxoacjVtR+hs4Bp2J7Y66QCRcgkvQ+yvqnAQjp/OpUwpEaiB6176NX358q4ftKc8uIU9w8s8570Orjk5fZ8ZBxHQ7KC6f/GiL2zli5Jhu6mPWQWQjepzOS2NRzVIJ47aWTo83W8GBVly2uHjIxkfrDuDGQN6yNCG2R++jQyKsff4CjTP5kfng0n19d/cLwEq3x8phLOSnNMG/YX7eqZmKR7pRmUU5CKKuf8ZiSi1wJ09Ssq0A+Ck36zZzRhCHTse3/+UxcMkuOiRWtlGOeWDSLJc+0oaFjAmVVcgdfeD/YOMRGCPD+CahMk1uaGFK/ibjRlOJToQlVuQ4chwrAFpWX46upkhSIMdjFd05pLn84ATULDnsmWClWrCFpmtg4kL9kV3hOkFODsE4POxj3QfUtj3YCjLjTEDXRu2jZKZ/VoCx/CziEkskASk8iFYrCuB9Q9j0zIGk2PZjvo7PQ08PWx+AGLyuTdrCl8ErFrMbl3+uJHsZF9+nDDZ2dKTsr9Kngobb0SvPanu0vSoPFHy4Ca6TglJ208WLkk521v36ygVJ23ZCQmYSBZrlypoIvUcIl4pTJBTew== X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230031)(4636009)(396003)(136003)(39850400004)(376002)(346002)(230922051799003)(82310400011)(64100799003)(1800799009)(186009)(451199024)(46966006)(36840700001)(41300700001)(70206006)(316002)(8676002)(8936002)(6916009)(4326008)(6666004)(30864003)(5660300002)(2906002)(70586007)(426003)(478600001)(40480700001)(83380400001)(47076005)(86362001)(82740400003)(36860700001)(336012)(36756003)(2616005)(1076003)(26005)(7696005)(81166007)(356005)(2004002)(36900700001)(559001)(579004)(473944003); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB4PR08MB7933 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM7EUR03FT008.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 8df2ab04-2f2c-4d9f-2c3f-08dbc5bda6fd X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: qPw2tT9bLMZgglW3QrW/fPbD19VpYSjUa42JQK1YrLE840tZAymYSNGpdKZmTCYdvcN4lBDr6R95W37nL8CNUujjPfwD7WhKvhDkAZsVIQDZop/70cECZHJpcNxhF5Wrgd7/KVoLGWsMSDWtEidiiz0QxbKx2nAd6eALptGiEJvaqyM4KFery09sfVV8FOT9m7VdDHk4twmVzQ6SbjEHsa92p0oPwZJRCairw9BOKs/iJEiSAgnb9R9xbGXIR23JEQIiD0WSkEvJ71zB+RM1vIyqHOmWh1d2yogW+ovjmwIi7IV3XeGsYD4T6Z+IhV79Aigi5piiclRDReVlZOEq9XuyKXW+cGMUApHs0jzRP4WABIFBQ+fp3WPVaNkbRZM47z72d6+u7vytfaebDp5pO5mgGdewmM8C2RR09kY+qAt9yoX1XEoDPTSWTIC6vdpj2RfdE8+3zQkgya/slYAB5qGpgoFPC4K7YiGgnzze/5N43jH1ucPeH443qODCJh9rYWAZVT4EY8mu0O60vnHlK5oOwuIz5LP0SrvWQ9ldobvb82I/OOf+rJ470R1BGJt8yC0ci++zWfvOFj0lMvwdDp+JtTzU4pTJMZALfo3w39wOdimXvcDd9RuEdSdn7X3SQE3PJRhejmBAHKD09cLBH8Q6OFCPBYV3GdNdGItC2baKnvaWM1vJ653ECOaBGAJR0b5cq565jalbK3nS7ORPKhWjY7oUunZ1kdgWiWuMsfmjC7TxdPFvU5ObAr5vxr75C73881BRTEjI7/O6cEAp+Q== X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230031)(4636009)(39860400002)(136003)(396003)(376002)(346002)(230922051799003)(1800799009)(64100799003)(451199024)(82310400011)(186009)(36840700001)(46966006)(40470700004)(426003)(40460700003)(7696005)(6666004)(478600001)(5660300002)(6916009)(36860700001)(82740400003)(336012)(86362001)(81166007)(2616005)(41300700001)(40480700001)(30864003)(2906002)(83380400001)(26005)(47076005)(1076003)(70206006)(316002)(8936002)(36756003)(70586007)(8676002)(4326008)(2004002)(579004)(559001)(473944003); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 05 Oct 2023 16:11:04.6297 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 1c0ebe0d-f7ff-4655-a038-08dbc5bdad1f X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM7EUR03FT008.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM0PR08MB5395 X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org This includes some utility headers for evaluating polynomials using various schemes. --- Changes from v1: * Silence sign-of-zero check for mathvec tan * Optimise register usage around special-case, and remove -0 check Thanks, Joe math/auto-libm-test-in | 2 +- math/auto-libm-test-out-tan | 50 +-- sysdeps/aarch64/fpu/Makefile | 3 +- sysdeps/aarch64/fpu/Versions | 6 + sysdeps/aarch64/fpu/bits/math-vector.h | 4 + sysdeps/aarch64/fpu/poly_advsimd_f32.h | 36 ++ sysdeps/aarch64/fpu/poly_advsimd_f64.h | 36 ++ sysdeps/aarch64/fpu/poly_generic.h | 285 ++++++++++++++++ sysdeps/aarch64/fpu/poly_sve_f32.h | 38 +++ sysdeps/aarch64/fpu/poly_sve_f64.h | 38 +++ sysdeps/aarch64/fpu/poly_sve_generic.h | 313 ++++++++++++++++++ sysdeps/aarch64/fpu/tan_advsimd.c | 123 +++++++ sysdeps/aarch64/fpu/tan_sve.c | 104 ++++++ sysdeps/aarch64/fpu/tanf_advsimd.c | 129 ++++++++ sysdeps/aarch64/fpu/tanf_sve.c | 118 +++++++ .../fpu/test-double-advsimd-wrappers.c | 1 + .../aarch64/fpu/test-double-sve-wrappers.c | 1 + .../aarch64/fpu/test-float-advsimd-wrappers.c | 1 + sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 1 + sysdeps/aarch64/libm-test-ulps | 8 + .../unix/sysv/linux/aarch64/libmvec.abilist | 4 + 21 files changed, 1274 insertions(+), 27 deletions(-) create mode 100644 sysdeps/aarch64/fpu/poly_advsimd_f32.h create mode 100644 sysdeps/aarch64/fpu/poly_advsimd_f64.h create mode 100644 sysdeps/aarch64/fpu/poly_generic.h create mode 100644 sysdeps/aarch64/fpu/poly_sve_f32.h create mode 100644 sysdeps/aarch64/fpu/poly_sve_f64.h create mode 100644 sysdeps/aarch64/fpu/poly_sve_generic.h create mode 100644 sysdeps/aarch64/fpu/tan_advsimd.c create mode 100644 sysdeps/aarch64/fpu/tan_sve.c create mode 100644 sysdeps/aarch64/fpu/tanf_advsimd.c create mode 100644 sysdeps/aarch64/fpu/tanf_sve.c diff --git a/math/auto-libm-test-in b/math/auto-libm-test-in index 2672eb1f6a..70892503d6 100644 --- a/math/auto-libm-test-in +++ b/math/auto-libm-test-in @@ -7655,7 +7655,7 @@ sqrt min sqrt min_subnorm tan 0 -tan -0 +tan -0 no-mathvec tan pi/4 tan pi/2 tan -pi/2 diff --git a/math/auto-libm-test-out-tan b/math/auto-libm-test-out-tan index 7d00d03e1d..f46fdc7ec6 100644 --- a/math/auto-libm-test-out-tan +++ b/math/auto-libm-test-out-tan @@ -23,31 +23,31 @@ tan 0 = tan tonearest ibm128 0x0p+0 : 0x0p+0 : inexact-ok = tan towardzero ibm128 0x0p+0 : 0x0p+0 : inexact-ok = tan upward ibm128 0x0p+0 : 0x0p+0 : inexact-ok -tan -0 -= tan downward binary32 -0x0p+0 : -0x0p+0 : inexact-ok -= tan tonearest binary32 -0x0p+0 : -0x0p+0 : inexact-ok -= tan towardzero binary32 -0x0p+0 : -0x0p+0 : inexact-ok -= tan upward binary32 -0x0p+0 : -0x0p+0 : inexact-ok -= tan downward binary64 -0x0p+0 : -0x0p+0 : inexact-ok -= tan tonearest binary64 -0x0p+0 : -0x0p+0 : inexact-ok -= tan towardzero binary64 -0x0p+0 : -0x0p+0 : inexact-ok -= tan upward binary64 -0x0p+0 : -0x0p+0 : inexact-ok -= tan downward intel96 -0x0p+0 : -0x0p+0 : inexact-ok -= tan tonearest intel96 -0x0p+0 : -0x0p+0 : inexact-ok -= tan towardzero intel96 -0x0p+0 : -0x0p+0 : inexact-ok -= tan upward intel96 -0x0p+0 : -0x0p+0 : inexact-ok -= tan downward m68k96 -0x0p+0 : -0x0p+0 : inexact-ok -= tan tonearest m68k96 -0x0p+0 : -0x0p+0 : inexact-ok -= tan towardzero m68k96 -0x0p+0 : -0x0p+0 : inexact-ok -= tan upward m68k96 -0x0p+0 : -0x0p+0 : inexact-ok -= tan downward binary128 -0x0p+0 : -0x0p+0 : inexact-ok -= tan tonearest binary128 -0x0p+0 : -0x0p+0 : inexact-ok -= tan towardzero binary128 -0x0p+0 : -0x0p+0 : inexact-ok -= tan upward binary128 -0x0p+0 : -0x0p+0 : inexact-ok -= tan downward ibm128 -0x0p+0 : -0x0p+0 : inexact-ok -= tan tonearest ibm128 -0x0p+0 : -0x0p+0 : inexact-ok -= tan towardzero ibm128 -0x0p+0 : -0x0p+0 : inexact-ok -= tan upward ibm128 -0x0p+0 : -0x0p+0 : inexact-ok +tan -0 no-mathvec += tan downward binary32 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan tonearest binary32 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan towardzero binary32 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan upward binary32 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan downward binary64 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan tonearest binary64 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan towardzero binary64 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan upward binary64 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan downward intel96 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan tonearest intel96 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan towardzero intel96 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan upward intel96 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan downward m68k96 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan tonearest m68k96 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan towardzero m68k96 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan upward m68k96 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan downward binary128 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan tonearest binary128 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan towardzero binary128 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan upward binary128 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan downward ibm128 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan tonearest ibm128 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan towardzero ibm128 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok += tan upward ibm128 -0x0p+0 : -0x0p+0 : no-mathvec inexact-ok tan pi/4 = tan downward binary32 0xc.90fdbp-4 : 0x1p+0 : inexact-ok = tan tonearest binary32 0xc.90fdbp-4 : 0x1p+0 : inexact-ok diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile index 04aa2e37ca..a1bbc9bcaa 100644 --- a/sysdeps/aarch64/fpu/Makefile +++ b/sysdeps/aarch64/fpu/Makefile @@ -1,7 +1,8 @@ libmvec-supported-funcs = cos \ exp \ log \ - sin + sin \ + tan float-advsimd-funcs = $(libmvec-supported-funcs) double-advsimd-funcs = $(libmvec-supported-funcs) diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions index c85c0f3efb..f0ca0940a9 100644 --- a/sysdeps/aarch64/fpu/Versions +++ b/sysdeps/aarch64/fpu/Versions @@ -17,4 +17,10 @@ libmvec { _ZGVsMxv_sin; _ZGVsMxv_sinf; } + GLIBC_2.39 { + _ZGVnN4v_tanf; + _ZGVnN2v_tan; + _ZGVsMxv_tanf; + _ZGVsMxv_tan; + } } diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h index 7c200599c1..6193213147 100644 --- a/sysdeps/aarch64/fpu/bits/math-vector.h +++ b/sysdeps/aarch64/fpu/bits/math-vector.h @@ -53,11 +53,13 @@ __vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_expf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_logf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_sinf (__f32x4_t); +__vpcs __f32x4_t _ZGVnN4v_tanf (__f32x4_t); __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_exp (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_log (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_sin (__f64x2_t); +__vpcs __f64x2_t _ZGVnN2v_tan (__f64x2_t); # undef __ADVSIMD_VEC_MATH_SUPPORTED #endif /* __ADVSIMD_VEC_MATH_SUPPORTED */ @@ -68,11 +70,13 @@ __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_expf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_logf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_sinf (__sv_f32_t, __sv_bool_t); +__sv_f32_t _ZGVsMxv_tanf (__sv_f32_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_exp (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_log (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_sin (__sv_f64_t, __sv_bool_t); +__sv_f64_t _ZGVsMxv_tan (__sv_f64_t, __sv_bool_t); # undef __SVE_VEC_MATH_SUPPORTED #endif /* __SVE_VEC_MATH_SUPPORTED */ diff --git a/sysdeps/aarch64/fpu/poly_advsimd_f32.h b/sysdeps/aarch64/fpu/poly_advsimd_f32.h new file mode 100644 index 0000000000..9e2ad9ad94 --- /dev/null +++ b/sysdeps/aarch64/fpu/poly_advsimd_f32.h @@ -0,0 +1,36 @@ +/* Helpers for evaluating polynomials on single-precision AdvSIMD input, using + various schemes. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef AARCH64_FPU_POLY_ADVSIMD_F32_H +#define AARCH64_FPU_POLY_ADVSIMD_F32_H + +#include + +/* Wrap AdvSIMD f32 helpers: evaluation of some scheme/order has form: + v_[scheme]_[order]_f32. */ +#define VTYPE float32x4_t +#define FMA(x, y, z) vfmaq_f32 (z, x, y) +#define VWRAP(f) v_##f##_f32 +#include "poly_generic.h" +#undef VWRAP +#undef FMA +#undef VTYPE + +#endif diff --git a/sysdeps/aarch64/fpu/poly_advsimd_f64.h b/sysdeps/aarch64/fpu/poly_advsimd_f64.h new file mode 100644 index 0000000000..955cfc08ce --- /dev/null +++ b/sysdeps/aarch64/fpu/poly_advsimd_f64.h @@ -0,0 +1,36 @@ +/* Helpers for evaluating polynomials on double-precision AdvSIMD input, using + various schemes. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef AARCH64_FPU_POLY_ADVSIMD_F64_H +#define AARCH64_FPU_POLY_ADVSIMD_F64_H + +#include + +/* Wrap AdvSIMD f64 helpers: evaluation of some scheme/order has form: + v_[scheme]_[order]_f64. */ +#define VTYPE float64x2_t +#define FMA(x, y, z) vfmaq_f64 (z, x, y) +#define VWRAP(f) v_##f##_f64 +#include "poly_generic.h" +#undef VWRAP +#undef FMA +#undef VTYPE + +#endif diff --git a/sysdeps/aarch64/fpu/poly_generic.h b/sysdeps/aarch64/fpu/poly_generic.h new file mode 100644 index 0000000000..84f042182b --- /dev/null +++ b/sysdeps/aarch64/fpu/poly_generic.h @@ -0,0 +1,285 @@ +/* Generic helpers for evaluating polynomials with various schemes. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + + +#ifndef VTYPE +# error Cannot use poly_generic without defining VTYPE +#endif +#ifndef VWRAP +# error Cannot use poly_generic without defining VWRAP +#endif +#ifndef FMA +# error Cannot use poly_generic without defining FMA +#endif + +static inline VTYPE VWRAP (pairwise_poly_3) (VTYPE x, VTYPE x2, + const VTYPE *poly) +{ + /* At order 3, Estrin and Pairwise Horner are identical. */ + VTYPE p01 = FMA (poly[1], x, poly[0]); + VTYPE p23 = FMA (poly[3], x, poly[2]); + return FMA (p23, x2, p01); +} + +static inline VTYPE VWRAP (estrin_4) (VTYPE x, VTYPE x2, VTYPE x4, + const VTYPE *poly) +{ + VTYPE p03 = VWRAP (pairwise_poly_3) (x, x2, poly); + return FMA (poly[4], x4, p03); +} +static inline VTYPE VWRAP (estrin_5) (VTYPE x, VTYPE x2, VTYPE x4, + const VTYPE *poly) +{ + VTYPE p03 = VWRAP (pairwise_poly_3) (x, x2, poly); + VTYPE p45 = FMA (poly[5], x, poly[4]); + return FMA (p45, x4, p03); +} +static inline VTYPE VWRAP (estrin_6) (VTYPE x, VTYPE x2, VTYPE x4, + const VTYPE *poly) +{ + VTYPE p03 = VWRAP (pairwise_poly_3) (x, x2, poly); + VTYPE p45 = FMA (poly[5], x, poly[4]); + VTYPE p46 = FMA (poly[6], x2, p45); + return FMA (p46, x4, p03); +} +static inline VTYPE VWRAP (estrin_7) (VTYPE x, VTYPE x2, VTYPE x4, + const VTYPE *poly) +{ + VTYPE p03 = VWRAP (pairwise_poly_3) (x, x2, poly); + VTYPE p47 = VWRAP (pairwise_poly_3) (x, x2, poly + 4); + return FMA (p47, x4, p03); +} +static inline VTYPE VWRAP (estrin_8) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + const VTYPE *poly) +{ + return FMA (poly[8], x8, VWRAP (estrin_7) (x, x2, x4, poly)); +} +static inline VTYPE VWRAP (estrin_9) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + const VTYPE *poly) +{ + VTYPE p89 = FMA (poly[9], x, poly[8]); + return FMA (p89, x8, VWRAP (estrin_7) (x, x2, x4, poly)); +} +static inline VTYPE VWRAP (estrin_10) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + const VTYPE *poly) +{ + VTYPE p89 = FMA (poly[9], x, poly[8]); + VTYPE p8_10 = FMA (poly[10], x2, p89); + return FMA (p8_10, x8, VWRAP (estrin_7) (x, x2, x4, poly)); +} +static inline VTYPE VWRAP (estrin_11) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + const VTYPE *poly) +{ + VTYPE p8_11 = VWRAP (pairwise_poly_3) (x, x2, poly + 8); + return FMA (p8_11, x8, VWRAP (estrin_7) (x, x2, x4, poly)); +} +static inline VTYPE VWRAP (estrin_12) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + const VTYPE *poly) +{ + return FMA (VWRAP (estrin_4) (x, x2, x4, poly + 8), x8, + VWRAP (estrin_7) (x, x2, x4, poly)); +} +static inline VTYPE VWRAP (estrin_13) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + const VTYPE *poly) +{ + return FMA (VWRAP (estrin_5) (x, x2, x4, poly + 8), x8, + VWRAP (estrin_7) (x, x2, x4, poly)); +} +static inline VTYPE VWRAP (estrin_14) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + const VTYPE *poly) +{ + return FMA (VWRAP (estrin_6) (x, x2, x4, poly + 8), x8, + VWRAP (estrin_7) (x, x2, x4, poly)); +} +static inline VTYPE VWRAP (estrin_15) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + const VTYPE *poly) +{ + return FMA (VWRAP (estrin_7) (x, x2, x4, poly + 8), x8, + VWRAP (estrin_7) (x, x2, x4, poly)); +} +static inline VTYPE VWRAP (estrin_16) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + VTYPE x16, const VTYPE *poly) +{ + return FMA (poly[16], x16, VWRAP (estrin_15) (x, x2, x4, x8, poly)); +} +static inline VTYPE VWRAP (estrin_17) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + VTYPE x16, const VTYPE *poly) +{ + VTYPE p16_17 = FMA (poly[17], x, poly[16]); + return FMA (p16_17, x16, VWRAP (estrin_15) (x, x2, x4, x8, poly)); +} +static inline VTYPE VWRAP (estrin_18) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + VTYPE x16, const VTYPE *poly) +{ + VTYPE p16_17 = FMA (poly[17], x, poly[16]); + VTYPE p16_18 = FMA (poly[18], x2, p16_17); + return FMA (p16_18, x16, VWRAP (estrin_15) (x, x2, x4, x8, poly)); +} +static inline VTYPE VWRAP (estrin_19) (VTYPE x, VTYPE x2, VTYPE x4, VTYPE x8, + VTYPE x16, const VTYPE *poly) +{ + VTYPE p16_19 = VWRAP (pairwise_poly_3) (x, x2, poly + 16); + return FMA (p16_19, x16, VWRAP (estrin_15) (x, x2, x4, x8, poly)); +} + +static inline VTYPE VWRAP (horner_3) (VTYPE x, const VTYPE *poly) +{ + VTYPE p = FMA (poly[3], x, poly[2]); + p = FMA (x, p, poly[1]); + p = FMA (x, p, poly[0]); + return p; +} +static inline VTYPE VWRAP (horner_4) (VTYPE x, const VTYPE *poly) +{ + VTYPE p = FMA (poly[4], x, poly[3]); + p = FMA (x, p, poly[2]); + p = FMA (x, p, poly[1]); + p = FMA (x, p, poly[0]); + return p; +} +static inline VTYPE VWRAP (horner_5) (VTYPE x, const VTYPE *poly) +{ + return FMA (x, VWRAP (horner_4) (x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_6) (VTYPE x, const VTYPE *poly) +{ + return FMA (x, VWRAP (horner_5) (x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_7) (VTYPE x, const VTYPE *poly) +{ + return FMA (x, VWRAP (horner_6) (x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_8) (VTYPE x, const VTYPE *poly) +{ + return FMA (x, VWRAP (horner_7) (x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_9) (VTYPE x, const VTYPE *poly) +{ + return FMA (x, VWRAP (horner_8) (x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_10) (VTYPE x, const VTYPE *poly) +{ + return FMA (x, VWRAP (horner_9) (x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_11) (VTYPE x, const VTYPE *poly) +{ + return FMA (x, VWRAP (horner_10) (x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_12) (VTYPE x, const VTYPE *poly) +{ + return FMA (x, VWRAP (horner_11) (x, poly + 1), poly[0]); +} + +static inline VTYPE VWRAP (pw_horner_4) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p01 = FMA (poly[1], x, poly[0]); + VTYPE p23 = FMA (poly[3], x, poly[2]); + VTYPE p; + p = FMA (x2, poly[4], p23); + p = FMA (x2, p, p01); + return p; +} +static inline VTYPE VWRAP (pw_horner_5) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p01 = FMA (poly[1], x, poly[0]); + VTYPE p23 = FMA (poly[3], x, poly[2]); + VTYPE p45 = FMA (poly[5], x, poly[4]); + VTYPE p; + p = FMA (x2, p45, p23); + p = FMA (x2, p, p01); + return p; +} +static inline VTYPE VWRAP (pw_horner_6) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p26 = VWRAP (pw_horner_4) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p26, p01); +} +static inline VTYPE VWRAP (pw_horner_7) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p27 = VWRAP (pw_horner_5) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p27, p01); +} +static inline VTYPE VWRAP (pw_horner_8) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p28 = VWRAP (pw_horner_6) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p28, p01); +} +static inline VTYPE VWRAP (pw_horner_9) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p29 = VWRAP (pw_horner_7) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p29, p01); +} +static inline VTYPE VWRAP (pw_horner_10) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p2_10 = VWRAP (pw_horner_8) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p2_10, p01); +} +static inline VTYPE VWRAP (pw_horner_11) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p2_11 = VWRAP (pw_horner_9) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p2_11, p01); +} +static inline VTYPE VWRAP (pw_horner_12) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p2_12 = VWRAP (pw_horner_10) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p2_12, p01); +} +static inline VTYPE VWRAP (pw_horner_13) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p2_13 = VWRAP (pw_horner_11) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p2_13, p01); +} +static inline VTYPE VWRAP (pw_horner_14) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p2_14 = VWRAP (pw_horner_12) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p2_14, p01); +} +static inline VTYPE VWRAP (pw_horner_15) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p2_15 = VWRAP (pw_horner_13) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p2_15, p01); +} +static inline VTYPE VWRAP (pw_horner_16) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p2_16 = VWRAP (pw_horner_14) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p2_16, p01); +} +static inline VTYPE VWRAP (pw_horner_17) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p2_17 = VWRAP (pw_horner_15) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p2_17, p01); +} +static inline VTYPE VWRAP (pw_horner_18) (VTYPE x, VTYPE x2, const VTYPE *poly) +{ + VTYPE p2_18 = VWRAP (pw_horner_16) (x, x2, poly + 2); + VTYPE p01 = FMA (poly[1], x, poly[0]); + return FMA (x2, p2_18, p01); +} diff --git a/sysdeps/aarch64/fpu/poly_sve_f32.h b/sysdeps/aarch64/fpu/poly_sve_f32.h new file mode 100644 index 0000000000..dcf2fab8dd --- /dev/null +++ b/sysdeps/aarch64/fpu/poly_sve_f32.h @@ -0,0 +1,38 @@ +/* Helpers for evaluating polynomials on single-precision SVE input, using + various schemes. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef AARCH64_FPU_POLY_SVE_F32_H +#define AARCH64_FPU_POLY_SVE_F32_H + +#include + +/* Wrap SVE f32 helpers: evaluation of some scheme/order has form: + sv_[scheme]_[order]_f32_x. */ +#define VTYPE svfloat32_t +#define STYPE float +#define VWRAP(f) sv_##f##_f32_x +#define DUP svdup_n_f32 +#include "poly_sve_generic.h" +#undef DUP +#undef VWRAP +#undef STYPE +#undef VTYPE + +#endif diff --git a/sysdeps/aarch64/fpu/poly_sve_f64.h b/sysdeps/aarch64/fpu/poly_sve_f64.h new file mode 100644 index 0000000000..97a0b76637 --- /dev/null +++ b/sysdeps/aarch64/fpu/poly_sve_f64.h @@ -0,0 +1,38 @@ +/* Helpers for evaluating polynomials on double-precision SVE input, using + various schemes. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef AARCH64_FPU_POLY_SVE_F64_H +#define AARCH64_FPU_POLY_SVE_F64_H + +#include + +/* Wrap SVE f64 helpers: evaluation of some scheme/order has form: + sv_[scheme]_[order]_f64_x. */ +#define VTYPE svfloat64_t +#define STYPE double +#define VWRAP(f) sv_##f##_f64_x +#define DUP svdup_n_f64 +#include "poly_sve_generic.h" +#undef DUP +#undef VWRAP +#undef STYPE +#undef VTYPE + +#endif diff --git a/sysdeps/aarch64/fpu/poly_sve_generic.h b/sysdeps/aarch64/fpu/poly_sve_generic.h new file mode 100644 index 0000000000..0ecf5ce45b --- /dev/null +++ b/sysdeps/aarch64/fpu/poly_sve_generic.h @@ -0,0 +1,313 @@ +/* Helpers for evaluating polynomials with various schemes - specific to SVE + but precision-agnostic. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef VTYPE +# error Cannot use poly_generic without defining VTYPE +#endif +#ifndef STYPE +# error Cannot use poly_generic without defining STYPE +#endif +#ifndef VWRAP +# error Cannot use poly_generic without defining VWRAP +#endif +#ifndef DUP +# error Cannot use poly_generic without defining DUP +#endif + +static inline VTYPE VWRAP (pairwise_poly_3) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + /* At order 3, Estrin and Pairwise Horner are identical. */ + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + VTYPE p23 = svmla_x (pg, DUP (poly[2]), x, poly[3]); + return svmla_x (pg, p01, p23, x2); +} + +static inline VTYPE VWRAP (estrin_4) (svbool_t pg, VTYPE x, VTYPE x2, VTYPE x4, + const STYPE *poly) +{ + VTYPE p03 = VWRAP (pairwise_poly_3) (pg, x, x2, poly); + return svmla_x (pg, p03, x4, poly[4]); +} +static inline VTYPE VWRAP (estrin_5) (svbool_t pg, VTYPE x, VTYPE x2, VTYPE x4, + const STYPE *poly) +{ + VTYPE p03 = VWRAP (pairwise_poly_3) (pg, x, x2, poly); + VTYPE p45 = svmla_x (pg, DUP (poly[4]), x, poly[5]); + return svmla_x (pg, p03, p45, x4); +} +static inline VTYPE VWRAP (estrin_6) (svbool_t pg, VTYPE x, VTYPE x2, VTYPE x4, + const STYPE *poly) +{ + VTYPE p03 = VWRAP (pairwise_poly_3) (pg, x, x2, poly); + VTYPE p45 = svmla_x (pg, DUP (poly[4]), x, poly[5]); + VTYPE p46 = svmla_x (pg, p45, x, poly[6]); + return svmla_x (pg, p03, p46, x4); +} +static inline VTYPE VWRAP (estrin_7) (svbool_t pg, VTYPE x, VTYPE x2, VTYPE x4, + const STYPE *poly) +{ + VTYPE p03 = VWRAP (pairwise_poly_3) (pg, x, x2, poly); + VTYPE p47 = VWRAP (pairwise_poly_3) (pg, x, x2, poly + 4); + return svmla_x (pg, p03, p47, x4); +} +static inline VTYPE VWRAP (estrin_8) (svbool_t pg, VTYPE x, VTYPE x2, VTYPE x4, + VTYPE x8, const STYPE *poly) +{ + return svmla_x (pg, VWRAP (estrin_7) (pg, x, x2, x4, poly), x8, poly[8]); +} +static inline VTYPE VWRAP (estrin_9) (svbool_t pg, VTYPE x, VTYPE x2, VTYPE x4, + VTYPE x8, const STYPE *poly) +{ + VTYPE p89 = svmla_x (pg, DUP (poly[8]), x, poly[9]); + return svmla_x (pg, VWRAP (estrin_7) (pg, x, x2, x4, poly), p89, x8); +} +static inline VTYPE VWRAP (estrin_10) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, const STYPE *poly) +{ + VTYPE p89 = svmla_x (pg, DUP (poly[8]), x, poly[9]); + VTYPE p8_10 = svmla_x (pg, p89, x2, poly[10]); + return svmla_x (pg, VWRAP (estrin_7) (pg, x, x2, x4, poly), p8_10, x8); +} +static inline VTYPE VWRAP (estrin_11) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, const STYPE *poly) +{ + VTYPE p8_11 = VWRAP (pairwise_poly_3) (pg, x, x2, poly + 8); + return svmla_x (pg, VWRAP (estrin_7) (pg, x, x2, x4, poly), p8_11, x8); +} +static inline VTYPE VWRAP (estrin_12) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, const STYPE *poly) +{ + return svmla_x (pg, VWRAP (estrin_7) (pg, x, x2, x4, poly), + VWRAP (estrin_4) (pg, x, x2, x4, poly + 8), x8); +} +static inline VTYPE VWRAP (estrin_13) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, const STYPE *poly) +{ + return svmla_x (pg, VWRAP (estrin_7) (pg, x, x2, x4, poly), + VWRAP (estrin_5) (pg, x, x2, x4, poly + 8), x8); +} +static inline VTYPE VWRAP (estrin_14) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, const STYPE *poly) +{ + return svmla_x (pg, VWRAP (estrin_7) (pg, x, x2, x4, poly), + VWRAP (estrin_6) (pg, x, x2, x4, poly + 8), x8); +} +static inline VTYPE VWRAP (estrin_15) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, const STYPE *poly) +{ + return svmla_x (pg, VWRAP (estrin_7) (pg, x, x2, x4, poly), + VWRAP (estrin_7) (pg, x, x2, x4, poly + 8), x8); +} +static inline VTYPE VWRAP (estrin_16) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, VTYPE x16, + const STYPE *poly) +{ + return svmla_x (pg, VWRAP (estrin_15) (pg, x, x2, x4, x8, poly), x16, + poly[16]); +} +static inline VTYPE VWRAP (estrin_17) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, VTYPE x16, + const STYPE *poly) +{ + VTYPE p16_17 = svmla_x (pg, DUP (poly[16]), x, poly[17]); + return svmla_x (pg, VWRAP (estrin_15) (pg, x, x2, x4, x8, poly), p16_17, + x16); +} +static inline VTYPE VWRAP (estrin_18) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, VTYPE x16, + const STYPE *poly) +{ + VTYPE p16_17 = svmla_x (pg, DUP (poly[16]), x, poly[17]); + VTYPE p16_18 = svmla_x (pg, p16_17, x2, poly[18]); + return svmla_x (pg, VWRAP (estrin_15) (pg, x, x2, x4, x8, poly), p16_18, + x16); +} +static inline VTYPE VWRAP (estrin_19) (svbool_t pg, VTYPE x, VTYPE x2, + VTYPE x4, VTYPE x8, VTYPE x16, + const STYPE *poly) +{ + return svmla_x (pg, VWRAP (estrin_15) (pg, x, x2, x4, x8, poly), + VWRAP (pairwise_poly_3) (pg, x, x2, poly + 16), x16); +} + +static inline VTYPE VWRAP (horner_3) (svbool_t pg, VTYPE x, const STYPE *poly) +{ + VTYPE p = svmla_x (pg, DUP (poly[2]), x, poly[3]); + p = svmad_x (pg, x, p, poly[1]); + p = svmad_x (pg, x, p, poly[0]); + return p; +} +static inline VTYPE VWRAP (horner_4) (svbool_t pg, VTYPE x, const STYPE *poly) +{ + VTYPE p = svmla_x (pg, DUP (poly[3]), x, poly[4]); + p = svmad_x (pg, x, p, poly[2]); + p = svmad_x (pg, x, p, poly[1]); + p = svmad_x (pg, x, p, poly[0]); + return p; +} +static inline VTYPE VWRAP (horner_5) (svbool_t pg, VTYPE x, const STYPE *poly) +{ + return svmad_x (pg, x, VWRAP (horner_4) (pg, x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_6) (svbool_t pg, VTYPE x, const STYPE *poly) +{ + return svmad_x (pg, x, VWRAP (horner_5) (pg, x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_7) (svbool_t pg, VTYPE x, const STYPE *poly) +{ + return svmad_x (pg, x, VWRAP (horner_6) (pg, x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_8) (svbool_t pg, VTYPE x, const STYPE *poly) +{ + return svmad_x (pg, x, VWRAP (horner_7) (pg, x, poly + 1), poly[0]); +} +static inline VTYPE VWRAP (horner_9) (svbool_t pg, VTYPE x, const STYPE *poly) +{ + return svmad_x (pg, x, VWRAP (horner_8) (pg, x, poly + 1), poly[0]); +} +static inline VTYPE +sv_horner_10_f32_x (svbool_t pg, VTYPE x, const STYPE *poly) +{ + return svmad_x (pg, x, VWRAP (horner_9) (pg, x, poly + 1), poly[0]); +} +static inline VTYPE +sv_horner_11_f32_x (svbool_t pg, VTYPE x, const STYPE *poly) +{ + return svmad_x (pg, x, sv_horner_10_f32_x (pg, x, poly + 1), poly[0]); +} +static inline VTYPE +sv_horner_12_f32_x (svbool_t pg, VTYPE x, const STYPE *poly) +{ + return svmad_x (pg, x, sv_horner_11_f32_x (pg, x, poly + 1), poly[0]); +} + +static inline VTYPE VWRAP (pw_horner_4) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + VTYPE p23 = svmla_x (pg, DUP (poly[2]), x, poly[3]); + VTYPE p; + p = svmla_x (pg, p23, x2, poly[4]); + p = svmla_x (pg, p01, x2, p); + return p; +} +static inline VTYPE VWRAP (pw_horner_5) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + VTYPE p23 = svmla_x (pg, DUP (poly[2]), x, poly[3]); + VTYPE p45 = svmla_x (pg, DUP (poly[4]), x, poly[5]); + VTYPE p; + p = svmla_x (pg, p23, x2, p45); + p = svmla_x (pg, p01, x2, p); + return p; +} +static inline VTYPE VWRAP (pw_horner_6) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p26 = VWRAP (pw_horner_4) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p26); +} +static inline VTYPE VWRAP (pw_horner_7) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p27 = VWRAP (pw_horner_5) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p27); +} +static inline VTYPE VWRAP (pw_horner_8) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p28 = VWRAP (pw_horner_6) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p28); +} +static inline VTYPE VWRAP (pw_horner_9) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p29 = VWRAP (pw_horner_7) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p29); +} +static inline VTYPE VWRAP (pw_horner_10) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p2_10 = VWRAP (pw_horner_8) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p2_10); +} +static inline VTYPE VWRAP (pw_horner_11) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p2_11 = VWRAP (pw_horner_9) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p2_11); +} +static inline VTYPE VWRAP (pw_horner_12) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p2_12 = VWRAP (pw_horner_10) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p2_12); +} +static inline VTYPE VWRAP (pw_horner_13) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p2_13 = VWRAP (pw_horner_11) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p2_13); +} +static inline VTYPE VWRAP (pw_horner_14) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p2_14 = VWRAP (pw_horner_12) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p2_14); +} +static inline VTYPE VWRAP (pw_horner_15) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p2_15 = VWRAP (pw_horner_13) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p2_15); +} +static inline VTYPE VWRAP (pw_horner_16) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p2_16 = VWRAP (pw_horner_14) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p2_16); +} +static inline VTYPE VWRAP (pw_horner_17) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p2_17 = VWRAP (pw_horner_15) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p2_17); +} +static inline VTYPE VWRAP (pw_horner_18) (svbool_t pg, VTYPE x, VTYPE x2, + const STYPE *poly) +{ + VTYPE p2_18 = VWRAP (pw_horner_16) (pg, x, x2, poly + 2); + VTYPE p01 = svmla_x (pg, DUP (poly[0]), x, poly[1]); + return svmla_x (pg, p01, x2, p2_18); +} diff --git a/sysdeps/aarch64/fpu/tan_advsimd.c b/sysdeps/aarch64/fpu/tan_advsimd.c new file mode 100644 index 0000000000..936a0569c8 --- /dev/null +++ b/sysdeps/aarch64/fpu/tan_advsimd.c @@ -0,0 +1,123 @@ +/* Double-precision vector (Advanced SIMD) tan function + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" +#include "poly_advsimd_f64.h" + +static const struct data +{ + float64x2_t poly[9]; + float64x2_t half_pi_hi, half_pi_lo, two_over_pi, shift; +#if !WANT_SIMD_EXCEPT + float64x2_t range_val; +#endif +} data = { + /* Coefficients generated using FPMinimax. */ + .poly = { V2 (0x1.5555555555556p-2), V2 (0x1.1111111110a63p-3), + V2 (0x1.ba1ba1bb46414p-5), V2 (0x1.664f47e5b5445p-6), + V2 (0x1.226e5e5ecdfa3p-7), V2 (0x1.d6c7ddbf87047p-9), + V2 (0x1.7ea75d05b583ep-10), V2 (0x1.289f22964a03cp-11), + V2 (0x1.4e4fd14147622p-12) }, + .half_pi_hi = V2 (0x1.921fb54442d18p0), + .half_pi_lo = V2 (0x1.1a62633145c07p-54), + .two_over_pi = V2 (0x1.45f306dc9c883p-1), + .shift = V2 (0x1.8p52), +#if !WANT_SIMD_EXCEPT + .range_val = V2 (0x1p23), +#endif +}; + +#define RangeVal 0x4160000000000000 /* asuint64(0x1p23). */ +#define TinyBound 0x3e50000000000000 /* asuint64(2^-26). */ +#define Thresh 0x310000000000000 /* RangeVal - TinyBound. */ + +/* Special cases (fall back to scalar calls). */ +static float64x2_t VPCS_ATTR NOINLINE +special_case (float64x2_t x) +{ + return v_call_f64 (tan, x, x, v_u64 (-1)); +} + +/* Vector approximation for double-precision tan. + Maximum measured error is 3.48 ULP: + __v_tan(0x1.4457047ef78d8p+20) got -0x1.f6ccd8ecf7dedp+37 + want -0x1.f6ccd8ecf7deap+37. */ +float64x2_t VPCS_ATTR V_NAME_D1 (tan) (float64x2_t x) +{ + const struct data *dat = ptr_barrier (&data); + /* Our argument reduction cannot calculate q with sufficient accuracy for very + large inputs. Fall back to scalar routine for all lanes if any are too + large, or Inf/NaN. If fenv exceptions are expected, also fall back for tiny + input to avoid underflow. */ +#if WANT_SIMD_EXCEPT + uint64x2_t iax = vreinterpretq_u64_f64 (vabsq_f64 (x)); + /* iax - tiny_bound > range_val - tiny_bound. */ + uint64x2_t special + = vcgtq_u64 (vsubq_u64 (iax, v_u64 (TinyBound)), v_u64 (Thresh)); + if (__glibc_unlikely (v_any_u64 (special))) + return special_case (x); +#endif + + /* q = nearest integer to 2 * x / pi. */ + float64x2_t q + = vsubq_f64 (vfmaq_f64 (dat->shift, x, dat->two_over_pi), dat->shift); + int64x2_t qi = vcvtq_s64_f64 (q); + + /* Use q to reduce x to r in [-pi/4, pi/4], by: + r = x - q * pi/2, in extended precision. */ + float64x2_t r = x; + r = vfmsq_f64 (r, q, dat->half_pi_hi); + r = vfmsq_f64 (r, q, dat->half_pi_lo); + /* Further reduce r to [-pi/8, pi/8], to be reconstructed using double angle + formula. */ + r = vmulq_n_f64 (r, 0.5); + + /* Approximate tan(r) using order 8 polynomial. + tan(x) is odd, so polynomial has the form: + tan(x) ~= x + C0 * x^3 + C1 * x^5 + C3 * x^7 + ... + Hence we first approximate P(r) = C1 + C2 * r^2 + C3 * r^4 + ... + Then compute the approximation by: + tan(r) ~= r + r^3 * (C0 + r^2 * P(r)). */ + float64x2_t r2 = vmulq_f64 (r, r), r4 = vmulq_f64 (r2, r2), + r8 = vmulq_f64 (r4, r4); + /* Offset coefficients to evaluate from C1 onwards. */ + float64x2_t p = v_estrin_7_f64 (r2, r4, r8, dat->poly + 1); + p = vfmaq_f64 (dat->poly[0], p, r2); + p = vfmaq_f64 (r, r2, vmulq_f64 (p, r)); + + /* Recombination uses double-angle formula: + tan(2x) = 2 * tan(x) / (1 - (tan(x))^2) + and reciprocity around pi/2: + tan(x) = 1 / (tan(pi/2 - x)) + to assemble result using change-of-sign and conditional selection of + numerator/denominator, dependent on odd/even-ness of q (hence quadrant). */ + float64x2_t n = vfmaq_f64 (v_f64 (-1), p, p); + float64x2_t d = vaddq_f64 (p, p); + + uint64x2_t no_recip = vtstq_u64 (vreinterpretq_u64_s64 (qi), v_u64 (1)); + +#if !WANT_SIMD_EXCEPT + uint64x2_t special = vceqzq_u64 (vcaleq_f64 (x, dat->range_val)); + if (__glibc_unlikely (v_any_u64 (special))) + return special_case (x); +#endif + + return vdivq_f64 (vbslq_f64 (no_recip, n, vnegq_f64 (d)), + vbslq_f64 (no_recip, d, n)); +} diff --git a/sysdeps/aarch64/fpu/tan_sve.c b/sysdeps/aarch64/fpu/tan_sve.c new file mode 100644 index 0000000000..df9666711d --- /dev/null +++ b/sysdeps/aarch64/fpu/tan_sve.c @@ -0,0 +1,104 @@ +/* Double-precision vector (SVE) tan function + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" +#include "poly_sve_f64.h" + +static const struct data +{ + double poly[9]; + double half_pi_hi, half_pi_lo, inv_half_pi, range_val, shift; +} data = { + /* Polynomial generated with FPMinimax. */ + .poly = { 0x1.5555555555556p-2, 0x1.1111111110a63p-3, 0x1.ba1ba1bb46414p-5, + 0x1.664f47e5b5445p-6, 0x1.226e5e5ecdfa3p-7, 0x1.d6c7ddbf87047p-9, + 0x1.7ea75d05b583ep-10, 0x1.289f22964a03cp-11, + 0x1.4e4fd14147622p-12, }, + .half_pi_hi = 0x1.921fb54442d18p0, + .half_pi_lo = 0x1.1a62633145c07p-54, + .inv_half_pi = 0x1.45f306dc9c883p-1, + .range_val = 0x1p23, + .shift = 0x1.8p52, +}; + +static svfloat64_t NOINLINE +special_case (svfloat64_t x, svfloat64_t y, svbool_t special) +{ + return sv_call_f64 (tan, x, y, special); +} + +/* Vector approximation for double-precision tan. + Maximum measured error is 3.48 ULP: + _ZGVsMxv_tan(0x1.4457047ef78d8p+20) got -0x1.f6ccd8ecf7dedp+37 + want -0x1.f6ccd8ecf7deap+37. */ +svfloat64_t SV_NAME_D1 (tan) (svfloat64_t x, svbool_t pg) +{ + const struct data *dat = ptr_barrier (&data); + + /* Invert condition to catch NaNs and Infs as well as large values. */ + svbool_t special = svnot_z (pg, svaclt (pg, x, dat->range_val)); + + /* q = nearest integer to 2 * x / pi. */ + svfloat64_t shift = sv_f64 (dat->shift); + svfloat64_t q = svmla_x (pg, shift, x, dat->inv_half_pi); + q = svsub_x (pg, q, shift); + svint64_t qi = svcvt_s64_x (pg, q); + + /* Use q to reduce x to r in [-pi/4, pi/4], by: + r = x - q * pi/2, in extended precision. */ + svfloat64_t r = x; + svfloat64_t half_pi = svld1rq (svptrue_b64 (), &dat->half_pi_hi); + r = svmls_lane (r, q, half_pi, 0); + r = svmls_lane (r, q, half_pi, 1); + /* Further reduce r to [-pi/8, pi/8], to be reconstructed using double angle + formula. */ + r = svmul_x (pg, r, 0.5); + + /* Approximate tan(r) using order 8 polynomial. + tan(x) is odd, so polynomial has the form: + tan(x) ~= x + C0 * x^3 + C1 * x^5 + C3 * x^7 + ... + Hence we first approximate P(r) = C1 + C2 * r^2 + C3 * r^4 + ... + Then compute the approximation by: + tan(r) ~= r + r^3 * (C0 + r^2 * P(r)). */ + svfloat64_t r2 = svmul_x (pg, r, r); + svfloat64_t r4 = svmul_x (pg, r2, r2); + svfloat64_t r8 = svmul_x (pg, r4, r4); + /* Use offset version coeff array by 1 to evaluate from C1 onwards. */ + svfloat64_t p = sv_estrin_7_f64_x (pg, r2, r4, r8, dat->poly + 1); + p = svmad_x (pg, p, r2, dat->poly[0]); + p = svmla_x (pg, r, r2, svmul_x (pg, p, r)); + + /* Recombination uses double-angle formula: + tan(2x) = 2 * tan(x) / (1 - (tan(x))^2) + and reciprocity around pi/2: + tan(x) = 1 / (tan(pi/2 - x)) + to assemble result using change-of-sign and conditional selection of + numerator/denominator dependent on odd/even-ness of q (hence quadrant). */ + svbool_t use_recip + = svcmpeq (pg, svand_x (pg, svreinterpret_u64 (qi), 1), 0); + + svfloat64_t n = svmad_x (pg, p, p, -1); + svfloat64_t d = svmul_x (pg, p, 2); + svfloat64_t swap = n; + n = svneg_m (n, use_recip, d); + d = svsel (use_recip, swap, d); + if (__glibc_unlikely (svptest_any (pg, special))) + return special_case (x, svdiv_x (svnot_z (pg, special), n, d), special); + return svdiv_x (pg, n, d); +} diff --git a/sysdeps/aarch64/fpu/tanf_advsimd.c b/sysdeps/aarch64/fpu/tanf_advsimd.c new file mode 100644 index 0000000000..4c8a7f740e --- /dev/null +++ b/sysdeps/aarch64/fpu/tanf_advsimd.c @@ -0,0 +1,129 @@ +/* Single-precision vector (Advanced SIMD) tan function + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" +#include "poly_advsimd_f32.h" + +static const struct data +{ + float32x4_t poly[6]; + float32x4_t neg_half_pi_1, neg_half_pi_2, neg_half_pi_3, two_over_pi, shift; +#if !WANT_SIMD_EXCEPT + float32x4_t range_val; +#endif +} data = { + /* Coefficients generated using FPMinimax. */ + .poly = { V4 (0x1.55555p-2f), V4 (0x1.11166p-3f), V4 (0x1.b88a78p-5f), + V4 (0x1.7b5756p-6f), V4 (0x1.4ef4cep-8f), V4 (0x1.0e1e74p-7f) }, + .neg_half_pi_1 = V4 (-0x1.921fb6p+0f), + .neg_half_pi_2 = V4 (0x1.777a5cp-25f), + .neg_half_pi_3 = V4 (0x1.ee59dap-50f), + .two_over_pi = V4 (0x1.45f306p-1f), + .shift = V4 (0x1.8p+23f), +#if !WANT_SIMD_EXCEPT + .range_val = V4 (0x1p15f), +#endif +}; + +#define RangeVal v_u32 (0x47000000) /* asuint32(0x1p15f). */ +#define TinyBound v_u32 (0x30000000) /* asuint32 (0x1p-31f). */ +#define Thresh v_u32 (0x16000000) /* asuint32(RangeVal) - TinyBound. */ + +/* Special cases (fall back to scalar calls). */ +static float32x4_t VPCS_ATTR NOINLINE +special_case (float32x4_t x, float32x4_t y, uint32x4_t cmp) +{ + return v_call_f32 (tanf, x, y, cmp); +} + +/* Use a full Estrin scheme to evaluate polynomial. */ +static inline float32x4_t +eval_poly (float32x4_t z, const struct data *d) +{ + float32x4_t z2 = vmulq_f32 (z, z); +#if WANT_SIMD_EXCEPT + /* Tiny z (<= 0x1p-31) will underflow when calculating z^4. If fp exceptions + are to be triggered correctly, sidestep this by fixing such lanes to 0. */ + uint32x4_t will_uflow + = vcleq_u32 (vreinterpretq_u32_f32 (vabsq_f32 (z)), TinyBound); + if (__glibc_unlikely (v_any_u32 (will_uflow))) + z2 = vbslq_f32 (will_uflow, v_f32 (0), z2); +#endif + float32x4_t z4 = vmulq_f32 (z2, z2); + return v_estrin_5_f32 (z, z2, z4, d->poly); +} + +/* Fast implementation of AdvSIMD tanf. + Maximum error is 3.45 ULP: + __v_tanf(-0x1.e5f0cap+13) got 0x1.ff9856p-1 + want 0x1.ff9850p-1. */ +float32x4_t VPCS_ATTR V_NAME_F1 (tan) (float32x4_t x) +{ + const struct data *d = ptr_barrier (&data); + float32x4_t special_arg = x; + + /* iax >= RangeVal means x, if not inf or NaN, is too large to perform fast + regression. */ +#if WANT_SIMD_EXCEPT + uint32x4_t iax = vreinterpretq_u32_f32 (vabsq_f32 (x)); + /* If fp exceptions are to be triggered correctly, also special-case tiny + input, as this will load to overflow later. Fix any special lanes to 1 to + prevent any exceptions being triggered. */ + uint32x4_t special = vcgeq_u32 (vsubq_u32 (iax, TinyBound), Thresh); + if (__glibc_unlikely (v_any_u32 (special))) + x = vbslq_f32 (special, v_f32 (1.0f), x); +#else + /* Otherwise, special-case large and special values. */ + uint32x4_t special = vcageq_f32 (x, d->range_val); +#endif + + /* n = rint(x/(pi/2)). */ + float32x4_t q = vfmaq_f32 (d->shift, d->two_over_pi, x); + float32x4_t n = vsubq_f32 (q, d->shift); + /* Determine if x lives in an interval, where |tan(x)| grows to infinity. */ + uint32x4_t pred_alt = vtstq_u32 (vreinterpretq_u32_f32 (q), v_u32 (1)); + + /* r = x - n * (pi/2) (range reduction into -pi./4 .. pi/4). */ + float32x4_t r; + r = vfmaq_f32 (x, d->neg_half_pi_1, n); + r = vfmaq_f32 (r, d->neg_half_pi_2, n); + r = vfmaq_f32 (r, d->neg_half_pi_3, n); + + /* If x lives in an interval, where |tan(x)| + - is finite, then use a polynomial approximation of the form + tan(r) ~ r + r^3 * P(r^2) = r + r * r^2 * P(r^2). + - grows to infinity then use symmetries of tangent and the identity + tan(r) = cotan(pi/2 - r) to express tan(x) as 1/tan(-r). Finally, use + the same polynomial approximation of tan as above. */ + + /* Invert sign of r if odd quadrant. */ + float32x4_t z = vmulq_f32 (r, vbslq_f32 (pred_alt, v_f32 (-1), v_f32 (1))); + + /* Evaluate polynomial approximation of tangent on [-pi/4, pi/4]. */ + float32x4_t z2 = vmulq_f32 (r, r); + float32x4_t p = eval_poly (z2, d); + float32x4_t y = vfmaq_f32 (z, vmulq_f32 (z, z2), p); + + /* Compute reciprocal and apply if required. */ + float32x4_t inv_y = vdivq_f32 (v_f32 (1.0f), y); + + if (__glibc_unlikely (v_any_u32 (special))) + return special_case (special_arg, vbslq_f32 (pred_alt, inv_y, y), special); + return vbslq_f32 (pred_alt, inv_y, y); +} diff --git a/sysdeps/aarch64/fpu/tanf_sve.c b/sysdeps/aarch64/fpu/tanf_sve.c new file mode 100644 index 0000000000..856cbece7e --- /dev/null +++ b/sysdeps/aarch64/fpu/tanf_sve.c @@ -0,0 +1,118 @@ +/* Single-precision vector (SVE) tan function + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" + +static const struct data +{ + float pio2_1, pio2_2, pio2_3, invpio2; + float c1, c3, c5; + float c0, c2, c4, range_val, shift; +} data = { + /* Coefficients generated using: + poly = fpminimax((tan(sqrt(x))-sqrt(x))/x^(3/2), + deg, + [|single ...|], + [a*a;b*b]); + optimize relative error + final prec : 23 bits + deg : 5 + a : 0x1p-126 ^ 2 + b : ((pi) / 0x1p2) ^ 2 + dirty rel error: 0x1.f7c2e4p-25 + dirty abs error: 0x1.f7c2ecp-25. */ + .c0 = 0x1.55555p-2, .c1 = 0x1.11166p-3, + .c2 = 0x1.b88a78p-5, .c3 = 0x1.7b5756p-6, + .c4 = 0x1.4ef4cep-8, .c5 = 0x1.0e1e74p-7, + + .pio2_1 = 0x1.921fb6p+0f, .pio2_2 = -0x1.777a5cp-25f, + .pio2_3 = -0x1.ee59dap-50f, .invpio2 = 0x1.45f306p-1f, + .range_val = 0x1p15f, .shift = 0x1.8p+23f +}; + +static svfloat32_t NOINLINE +special_case (svfloat32_t x, svfloat32_t y, svbool_t cmp) +{ + return sv_call_f32 (tanf, x, y, cmp); +} + +/* Fast implementation of SVE tanf. + Maximum error is 3.45 ULP: + SV_NAME_F1 (tan)(-0x1.e5f0cap+13) got 0x1.ff9856p-1 + want 0x1.ff9850p-1. */ +svfloat32_t SV_NAME_F1 (tan) (svfloat32_t x, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + /* Determine whether input is too large to perform fast regression. */ + svbool_t cmp = svacge (pg, x, d->range_val); + + svfloat32_t odd_coeffs = svld1rq (svptrue_b32 (), &d->c1); + svfloat32_t pi_vals = svld1rq (svptrue_b32 (), &d->pio2_1); + + /* n = rint(x/(pi/2)). */ + svfloat32_t q = svmla_lane (sv_f32 (d->shift), x, pi_vals, 3); + svfloat32_t n = svsub_x (pg, q, d->shift); + /* n is already a signed integer, simply convert it. */ + svint32_t in = svcvt_s32_x (pg, n); + /* Determine if x lives in an interval, where |tan(x)| grows to infinity. */ + svint32_t alt = svand_x (pg, in, 1); + svbool_t pred_alt = svcmpne (pg, alt, 0); + + /* r = x - n * (pi/2) (range reduction into 0 .. pi/4). */ + svfloat32_t r; + r = svmls_lane (x, n, pi_vals, 0); + r = svmls_lane (r, n, pi_vals, 1); + r = svmls_lane (r, n, pi_vals, 2); + + /* If x lives in an interval, where |tan(x)| + - is finite, then use a polynomial approximation of the form + tan(r) ~ r + r^3 * P(r^2) = r + r * r^2 * P(r^2). + - grows to infinity then use symmetries of tangent and the identity + tan(r) = cotan(pi/2 - r) to express tan(x) as 1/tan(-r). Finally, use + the same polynomial approximation of tan as above. */ + + /* Perform additional reduction if required. */ + svfloat32_t z = svneg_m (r, pred_alt, r); + + /* Evaluate polynomial approximation of tangent on [-pi/4, pi/4], + using Estrin on z^2. */ + svfloat32_t z2 = svmul_x (pg, z, z); + svfloat32_t p01 = svmla_lane (sv_f32 (d->c0), z2, odd_coeffs, 0); + svfloat32_t p23 = svmla_lane (sv_f32 (d->c2), z2, odd_coeffs, 1); + svfloat32_t p45 = svmla_lane (sv_f32 (d->c4), z2, odd_coeffs, 2); + + svfloat32_t z4 = svmul_x (pg, z2, z2); + svfloat32_t p = svmla_x (pg, p01, z4, p23); + + svfloat32_t z8 = svmul_x (pg, z4, z4); + p = svmla_x (pg, p, z8, p45); + + svfloat32_t y = svmla_x (pg, z, p, svmul_x (pg, z, z2)); + + /* Transform result back, if necessary. */ + svfloat32_t inv_y = svdivr_x (pg, y, 1.0f); + + /* No need to pass pg to specialcase here since cmp is a strict subset, + guaranteed by the cmpge above. */ + if (__glibc_unlikely (svptest_any (pg, cmp))) + return special_case (x, svsel (pred_alt, inv_y, y), cmp); + + return svsel (pred_alt, inv_y, y); +} diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c index 3b6b1e343d..f76726e324 100644 --- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c @@ -27,3 +27,4 @@ VPCS_VECTOR_WRAPPER (cos_advsimd, _ZGVnN2v_cos) VPCS_VECTOR_WRAPPER (exp_advsimd, _ZGVnN2v_exp) VPCS_VECTOR_WRAPPER (log_advsimd, _ZGVnN2v_log) VPCS_VECTOR_WRAPPER (sin_advsimd, _ZGVnN2v_sin) +VPCS_VECTOR_WRAPPER (tan_advsimd, _ZGVnN2v_tan) diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c index d7ac47ca22..5a9e5b552a 100644 --- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c @@ -36,3 +36,4 @@ SVE_VECTOR_WRAPPER (cos_sve, _ZGVsMxv_cos) SVE_VECTOR_WRAPPER (exp_sve, _ZGVsMxv_exp) SVE_VECTOR_WRAPPER (log_sve, _ZGVsMxv_log) SVE_VECTOR_WRAPPER (sin_sve, _ZGVsMxv_sin) +SVE_VECTOR_WRAPPER (tan_sve, _ZGVsMxv_tan) diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c index d4a9ac7154..66c6227087 100644 --- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c @@ -27,3 +27,4 @@ VPCS_VECTOR_WRAPPER (cosf_advsimd, _ZGVnN4v_cosf) VPCS_VECTOR_WRAPPER (expf_advsimd, _ZGVnN4v_expf) VPCS_VECTOR_WRAPPER (logf_advsimd, _ZGVnN4v_logf) VPCS_VECTOR_WRAPPER (sinf_advsimd, _ZGVnN4v_sinf) +VPCS_VECTOR_WRAPPER (tanf_advsimd, _ZGVnN4v_tanf) diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c index d44033eab0..3cbf8b48b7 100644 --- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c @@ -36,3 +36,4 @@ SVE_VECTOR_WRAPPER (cosf_sve, _ZGVsMxv_cosf) SVE_VECTOR_WRAPPER (expf_sve, _ZGVsMxv_expf) SVE_VECTOR_WRAPPER (logf_sve, _ZGVsMxv_logf) SVE_VECTOR_WRAPPER (sinf_sve, _ZGVsMxv_sinf) +SVE_VECTOR_WRAPPER (tanf_sve, _ZGVsMxv_tanf) diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps index 6b25ed43e0..ec84381c75 100644 --- a/sysdeps/aarch64/libm-test-ulps +++ b/sysdeps/aarch64/libm-test-ulps @@ -1340,11 +1340,19 @@ Function: "tan": float: 1 ldouble: 1 +Function: "tan_advsimd": +double: 2 +float: 2 + Function: "tan_downward": double: 1 float: 2 ldouble: 1 +Function: "tan_sve": +double: 2 +float: 2 + Function: "tan_towardzero": double: 1 float: 1 diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist index ae46ef8c34..0ec668582e 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -14,3 +14,7 @@ GLIBC_2.38 _ZGVsMxv_log F GLIBC_2.38 _ZGVsMxv_logf F GLIBC_2.38 _ZGVsMxv_sin F GLIBC_2.38 _ZGVsMxv_sinf F +GLIBC_2.39 _ZGVnN2v_tan F +GLIBC_2.39 _ZGVnN4v_tanf F +GLIBC_2.39 _ZGVsMxv_tan F +GLIBC_2.39 _ZGVsMxv_tanf F