From patchwork Thu Oct 5 09:31:38 2023
X-Patchwork-Submitter: Joe Ramsay
X-Patchwork-Id: 77144
From: Joe Ramsay
CC: Joe Ramsay
Subject: [PATCH v2] aarch64: Improve vecmath sin routines
Date: Thu, 5 Oct 2023 10:31:38 +0100
Message-ID: <20231005093138.7209-1-Joe.Ramsay@arm.com>
X-Mailer: git-send-email 2.27.0
List-Id: Libc-alpha mailing list

* Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
* Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
* Improve register use near special-case branch

Also use overloaded intrinsics for SVE.
---
Changes from v1:
* Report new observed global max and max in [-pi/2, pi/2]

Thanks,
Joe

 sysdeps/aarch64/fpu/sin_advsimd.c |  10 ++-
 sysdeps/aarch64/fpu/sin_sve.c     | 106 ++++++++++++++++--------------
 sysdeps/aarch64/fpu/sinf_sve.c    |  44 +++++++------
 3 files changed, 87 insertions(+), 73 deletions(-)

diff --git a/sysdeps/aarch64/fpu/sin_advsimd.c b/sysdeps/aarch64/fpu/sin_advsimd.c
index 0389b334cc..3d87a1da79 100644
--- a/sysdeps/aarch64/fpu/sin_advsimd.c
+++ b/sysdeps/aarch64/fpu/sin_advsimd.c
@@ -24,7 +24,6 @@ static const struct data
   float64x2_t poly[7];
   float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
 } data = {
-  /* Worst-case error is 2.8 ulp in [-pi/2, pi/2].  */
   .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
             V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
             V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
@@ -52,6 +51,15 @@ special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp)
   return v_call_f64 (sin, x, y, cmp);
 }
 
+/* Vector (AdvSIMD) sin approximation.
+   Maximum observed error in [-pi/2, pi/2], where argument is not reduced,
+   is 2.87 ULP:
+   _ZGVnN2v_sin (0x1.921d5c6a07142p+0) got 0x1.fffffffa7dc02p-1
+                                       want 0x1.fffffffa7dc05p-1
+   Maximum observed error in the entire non-special domain ([-2^23, 2^23])
+   is 3.22 ULP:
+   _ZGVnN2v_sin (0x1.5702447b6f17bp+22) got 0x1.ffdcd125c84fbp-3
+                                        want 0x1.ffdcd125c84f8p-3.  */
 float64x2_t VPCS_ATTR V_NAME_D1 (sin) (float64x2_t x)
 {
   const struct data *d = ptr_barrier (&data);
diff --git a/sysdeps/aarch64/fpu/sin_sve.c b/sysdeps/aarch64/fpu/sin_sve.c
index c3f450d0ea..54c8dae286 100644
--- a/sysdeps/aarch64/fpu/sin_sve.c
+++ b/sysdeps/aarch64/fpu/sin_sve.c
@@ -21,20 +21,22 @@
 
 static const struct data
 {
-  double inv_pi, half_pi, inv_pi_over_2, pi_over_2_1, pi_over_2_2, pi_over_2_3,
-    shift;
+  double inv_pi, pi_1, pi_2, pi_3, shift, range_val;
+  double poly[7];
 } data = {
-  /* Polynomial coefficients are hard-wired in the FTMAD instruction.  */
+  .poly = { -0x1.555555555547bp-3, 0x1.1111111108a4dp-7, -0x1.a01a019936f27p-13,
+            0x1.71de37a97d93ep-19, -0x1.ae633919987c6p-26,
+            0x1.60e277ae07cecp-33, -0x1.9e9540300a1p-41, },
+
   .inv_pi = 0x1.45f306dc9c883p-2,
-  .half_pi = 0x1.921fb54442d18p+0,
-  .inv_pi_over_2 = 0x1.45f306dc9c882p-1,
-  .pi_over_2_1 = 0x1.921fb50000000p+0,
-  .pi_over_2_2 = 0x1.110b460000000p-26,
-  .pi_over_2_3 = 0x1.1a62633145c07p-54,
-  .shift = 0x1.8p52
+  .pi_1 = 0x1.921fb54442d18p+1,
+  .pi_2 = 0x1.1a62633145c06p-53,
+  .pi_3 = 0x1.c1cd129024e09p-106,
+  .shift = 0x1.8p52,
+  .range_val = 0x1p23,
 };
 
-#define RangeVal 0x4160000000000000 /* asuint64 (0x1p23).  */
+#define C(i) sv_f64 (d->poly[i])
 
 static svfloat64_t NOINLINE
 special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp)
@@ -42,56 +44,58 @@ special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp)
   return sv_call_f64 (sin, x, y, cmp);
 }
 
-/* A fast SVE implementation of sin based on trigonometric
-   instructions (FTMAD, FTSSEL, FTSMUL).
-   Maximum observed error in 2.52 ULP:
-   SV_NAME_D1 (sin)(0x1.2d2b00df69661p+19) got 0x1.10ace8f3e786bp-40
-                                          want 0x1.10ace8f3e7868p-40.  */
+/* A fast SVE implementation of sin.
+   Maximum observed error in [-pi/2, pi/2], where argument is not reduced,
+   is 2.87 ULP:
+   _ZGVsMxv_sin (0x1.921d5c6a07142p+0) got 0x1.fffffffa7dc02p-1
+                                       want 0x1.fffffffa7dc05p-1
+   Maximum observed error in the entire non-special domain ([-2^23, 2^23])
+   is 3.22 ULP:
+   _ZGVsMxv_sin (0x1.5702447b6f17bp+22) got 0x1.ffdcd125c84fbp-3
+                                        want 0x1.ffdcd125c84f8p-3.  */
 svfloat64_t SV_NAME_D1 (sin) (svfloat64_t x, const svbool_t pg)
 {
   const struct data *d = ptr_barrier (&data);
 
-  svfloat64_t r = svabs_f64_x (pg, x);
-  svuint64_t sign
-    = sveor_u64_x (pg, svreinterpret_u64_f64 (x), svreinterpret_u64_f64 (r));
-  svbool_t cmp = svcmpge_n_u64 (pg, svreinterpret_u64_f64 (r), RangeVal);
+  /* Load some values in quad-word chunks to minimise memory access.  */
+  const svbool_t ptrue = svptrue_b64 ();
+  svfloat64_t shift = sv_f64 (d->shift);
+  svfloat64_t inv_pi_and_pi1 = svld1rq (ptrue, &d->inv_pi);
+  svfloat64_t pi2_and_pi3 = svld1rq (ptrue, &d->pi_2);
 
-  /* Load first two pio2-related constants to one vector.  */
-  svfloat64_t invpio2_and_pio2_1
-    = svld1rq_f64 (svptrue_b64 (), &d->inv_pi_over_2);
+  /* n = rint(x/pi).  */
+  svfloat64_t n = svmla_lane (shift, x, inv_pi_and_pi1, 0);
+  svuint64_t odd = svlsl_x (pg, svreinterpret_u64 (n), 63);
+  n = svsub_x (pg, n, shift);
 
-  /* n = rint(|x|/(pi/2)).  */
-  svfloat64_t q = svmla_lane_f64 (sv_f64 (d->shift), r, invpio2_and_pio2_1, 0);
-  svfloat64_t n = svsub_n_f64_x (pg, q, d->shift);
+  /* r = x - n*pi  (range reduction into -pi/2 .. pi/2).  */
+  svfloat64_t r = x;
+  r = svmls_lane (r, n, inv_pi_and_pi1, 1);
+  r = svmls_lane (r, n, pi2_and_pi3, 0);
+  r = svmls_lane (r, n, pi2_and_pi3, 1);
 
-  /* r = |x| - n*(pi/2)  (range reduction into -pi/4 .. pi/4).  */
-  r = svmls_lane_f64 (r, n, invpio2_and_pio2_1, 1);
-  r = svmls_n_f64_x (pg, r, n, d->pi_over_2_2);
-  r = svmls_n_f64_x (pg, r, n, d->pi_over_2_3);
+  /* sin(r) poly approx.  */
+  svfloat64_t r2 = svmul_x (pg, r, r);
+  svfloat64_t r3 = svmul_x (pg, r2, r);
+  svfloat64_t r4 = svmul_x (pg, r2, r2);
 
-  /* Final multiplicative factor: 1.0 or x depending on bit #0 of q.  */
-  svfloat64_t f = svtssel_f64 (r, svreinterpret_u64_f64 (q));
+  svfloat64_t t1 = svmla_x (pg, C (4), C (5), r2);
+  svfloat64_t t2 = svmla_x (pg, C (2), C (3), r2);
+  svfloat64_t t3 = svmla_x (pg, C (0), C (1), r2);
 
-  /* sin(r) poly approx.  */
-  svfloat64_t r2 = svtsmul_f64 (r, svreinterpret_u64_f64 (q));
-  svfloat64_t y = sv_f64 (0.0);
-  y = svtmad_f64 (y, r2, 7);
-  y = svtmad_f64 (y, r2, 6);
-  y = svtmad_f64 (y, r2, 5);
-  y = svtmad_f64 (y, r2, 4);
-  y = svtmad_f64 (y, r2, 3);
-  y = svtmad_f64 (y, r2, 2);
-  y = svtmad_f64 (y, r2, 1);
-  y = svtmad_f64 (y, r2, 0);
-
-  /* Apply factor.  */
-  y = svmul_f64_x (pg, f, y);
-
-  /* sign = y^sign.  */
-  y = svreinterpret_f64_u64 (
-    sveor_u64_x (pg, svreinterpret_u64_f64 (y), sign));
+  svfloat64_t y = svmla_x (pg, t1, C (6), r4);
+  y = svmla_x (pg, t2, y, r4);
+  y = svmla_x (pg, t3, y, r4);
+  y = svmla_x (pg, r, y, r3);
 
+  svbool_t cmp = svacle (pg, x, d->range_val);
+  cmp = svnot_z (pg, cmp);
   if (__glibc_unlikely (svptest_any (pg, cmp)))
-    return special_case (x, y, cmp);
-  return y;
+    return special_case (x,
+                         svreinterpret_f64 (sveor_z (
+                           svnot_z (pg, cmp), svreinterpret_u64 (y), odd)),
+                         cmp);
+
+  /* Copy sign.  */
+  return svreinterpret_f64 (sveor_z (pg, svreinterpret_u64 (y), odd));
 }
diff --git a/sysdeps/aarch64/fpu/sinf_sve.c b/sysdeps/aarch64/fpu/sinf_sve.c
index 4d2ce7a846..590881c14b 100644
--- a/sysdeps/aarch64/fpu/sinf_sve.c
+++ b/sysdeps/aarch64/fpu/sinf_sve.c
@@ -23,7 +23,7 @@ static const struct data
 {
   float poly[4];
   /* Pi-related values to be loaded as one quad-word and used with
-     svmla_lane_f32.  */
+     svmla_lane.  */
   float negpi1, negpi2, negpi3, invpi;
   float shift;
 } data = {
@@ -57,40 +57,42 @@ svfloat32_t SV_NAME_F1 (sin) (svfloat32_t x, const svbool_t pg)
 {
   const struct data *d = ptr_barrier (&data);
 
-  svfloat32_t ax = svabs_f32_x (pg, x);
-  svuint32_t sign = sveor_u32_x (pg, svreinterpret_u32_f32 (x),
-                                 svreinterpret_u32_f32 (ax));
-  svbool_t cmp = svcmpge_n_u32 (pg, svreinterpret_u32_f32 (ax), RangeVal);
+  svfloat32_t ax = svabs_x (pg, x);
+  svuint32_t sign
+      = sveor_x (pg, svreinterpret_u32 (x), svreinterpret_u32 (ax));
+  svbool_t cmp = svcmpge (pg, svreinterpret_u32 (ax), RangeVal);
 
   /* pi_vals are a quad-word of helper values - the first 3 elements contain
      -pi in extended precision, the last contains 1 / pi.  */
-  svfloat32_t pi_vals = svld1rq_f32 (svptrue_b32 (), &d->negpi1);
+  svfloat32_t pi_vals = svld1rq (svptrue_b32 (), &d->negpi1);
 
   /* n = rint(|x|/pi).  */
-  svfloat32_t n = svmla_lane_f32 (sv_f32 (d->shift), ax, pi_vals, 3);
-  svuint32_t odd = svlsl_n_u32_x (pg, svreinterpret_u32_f32 (n), 31);
-  n = svsub_n_f32_x (pg, n, d->shift);
+  svfloat32_t n = svmla_lane (sv_f32 (d->shift), ax, pi_vals, 3);
+  svuint32_t odd = svlsl_x (pg, svreinterpret_u32 (n), 31);
+  n = svsub_x (pg, n, d->shift);
 
   /* r = |x| - n*pi  (range reduction into -pi/2 .. pi/2).  */
   svfloat32_t r;
-  r = svmla_lane_f32 (ax, n, pi_vals, 0);
-  r = svmla_lane_f32 (r, n, pi_vals, 1);
-  r = svmla_lane_f32 (r, n, pi_vals, 2);
+  r = svmla_lane (ax, n, pi_vals, 0);
+  r = svmla_lane (r, n, pi_vals, 1);
+  r = svmla_lane (r, n, pi_vals, 2);
 
   /* sin(r) approx using a degree 9 polynomial from the Taylor series
      expansion. Note that only the odd terms of this are non-zero.  */
-  svfloat32_t r2 = svmul_f32_x (pg, r, r);
+  svfloat32_t r2 = svmul_x (pg, r, r);
   svfloat32_t y;
-  y = svmla_f32_x (pg, C (2), r2, C (3));
-  y = svmla_f32_x (pg, C (1), r2, y);
-  y = svmla_f32_x (pg, C (0), r2, y);
-  y = svmla_f32_x (pg, r, r, svmul_f32_x (pg, y, r2));
+  y = svmla_x (pg, C (2), r2, C (3));
+  y = svmla_x (pg, C (1), r2, y);
+  y = svmla_x (pg, C (0), r2, y);
+  y = svmla_x (pg, r, r, svmul_x (pg, y, r2));
 
   /* sign = y^sign^odd.  */
-  y = svreinterpret_f32_u32 (sveor_u32_x (pg, svreinterpret_u32_f32 (y),
-                                          sveor_u32_x (pg, sign, odd)));
+  sign = sveor_x (pg, sign, odd);
 
   if (__glibc_unlikely (svptest_any (pg, cmp)))
-    return special_case (x, y, cmp);
-  return y;
+    return special_case (x,
+                         svreinterpret_f32 (sveor_x (
+                           svnot_z (pg, cmp), svreinterpret_u32 (y), sign)),
+                         cmp);
+  return svreinterpret_f32 (sveor_x (pg, svreinterpret_u32 (y), sign));
 }