From patchwork Wed Oct  4 10:58:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joe Ramsay
X-Patchwork-Id: 77093
From: Joe Ramsay
To:
CC: Joe Ramsay
Subject: [PATCH] aarch64: Improve vecmath sin routines
Date: Wed, 4 Oct 2023 11:58:09 +0100
Message-ID: <20231004105809.50464-1-Joe.Ramsay@arm.com>
X-Mailer: git-send-email 2.27.0
List-Id: Libc-alpha mailing list

* Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
* Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
* Improve register use near special-case branch

Also use overloaded intrinsics for SVE.
---
Subsumes a patch from August which replaced FTRIG instructions with
polynomial.

Thanks,
Joe

 sysdeps/aarch64/fpu/sin_advsimd.c |   2 +-
 sysdeps/aarch64/fpu/sin_sve.c     | 102 +++++++++++++++---------------
 sysdeps/aarch64/fpu/sinf_sve.c    |  44 +++++++------
 3 files changed, 75 insertions(+), 73 deletions(-)

diff --git a/sysdeps/aarch64/fpu/sin_advsimd.c b/sysdeps/aarch64/fpu/sin_advsimd.c
index 0389b334cc..55644c4cc6 100644
--- a/sysdeps/aarch64/fpu/sin_advsimd.c
+++ b/sysdeps/aarch64/fpu/sin_advsimd.c
@@ -24,7 +24,7 @@ static const struct data
   float64x2_t poly[7];
   float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
 } data = {
-  /* Worst-case error is 2.8 ulp in [-pi/2, pi/2].  */
+  /* Worst-case error is 2.87 ulp in [-pi/2, pi/2].  */
   .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
             V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
             V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
diff --git a/sysdeps/aarch64/fpu/sin_sve.c b/sysdeps/aarch64/fpu/sin_sve.c
index c3f450d0ea..9e7f5ff684 100644
--- a/sysdeps/aarch64/fpu/sin_sve.c
+++ b/sysdeps/aarch64/fpu/sin_sve.c
@@ -21,20 +21,23 @@
 
 static const struct data
 {
-  double inv_pi, half_pi, inv_pi_over_2, pi_over_2_1, pi_over_2_2, pi_over_2_3,
-    shift;
+  double inv_pi, pi_1, pi_2, pi_3, shift, range_val;
+  double poly[7];
 } data = {
-  /* Polynomial coefficients are hard-wired in the FTMAD instruction.  */
+  /* Worst-case error is 2.87 ulp in [-pi/2, pi/2].  */
+  .poly = { -0x1.555555555547bp-3, 0x1.1111111108a4dp-7, -0x1.a01a019936f27p-13,
+            0x1.71de37a97d93ep-19, -0x1.ae633919987c6p-26,
+            0x1.60e277ae07cecp-33, -0x1.9e9540300a1p-41, },
+
   .inv_pi = 0x1.45f306dc9c883p-2,
-  .half_pi = 0x1.921fb54442d18p+0,
-  .inv_pi_over_2 = 0x1.45f306dc9c882p-1,
-  .pi_over_2_1 = 0x1.921fb50000000p+0,
-  .pi_over_2_2 = 0x1.110b460000000p-26,
-  .pi_over_2_3 = 0x1.1a62633145c07p-54,
-  .shift = 0x1.8p52
+  .pi_1 = 0x1.921fb54442d18p+1,
+  .pi_2 = 0x1.1a62633145c06p-53,
+  .pi_3 = 0x1.c1cd129024e09p-106,
+  .shift = 0x1.8p52,
+  .range_val = 0x1p23,
 };
 
-#define RangeVal 0x4160000000000000 /* asuint64 (0x1p23).  */
+#define C(i) sv_f64 (d->poly[i])
 
 static svfloat64_t NOINLINE
 special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp)
@@ -42,56 +45,53 @@ special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp)
   return sv_call_f64 (sin, x, y, cmp);
 }
 
-/* A fast SVE implementation of sin based on trigonometric
-   instructions (FTMAD, FTSSEL, FTSMUL).
-   Maximum observed error in 2.52 ULP:
-   SV_NAME_D1 (sin)(0x1.2d2b00df69661p+19) got 0x1.10ace8f3e786bp-40
-                                          want 0x1.10ace8f3e7868p-40.  */
+/* A fast SVE implementation of sin.
+   Maximum observed error in 3.22 ULP:
+   _ZGVsMxv_sin (0x1.d70eef40f39b1p+12) got -0x1.ffe9537d5dbb7p-3
+                                       want -0x1.ffe9537d5dbb4p-3.  */
 svfloat64_t SV_NAME_D1 (sin) (svfloat64_t x, const svbool_t pg)
 {
   const struct data *d = ptr_barrier (&data);
 
-  svfloat64_t r = svabs_f64_x (pg, x);
-  svuint64_t sign
-      = sveor_u64_x (pg, svreinterpret_u64_f64 (x), svreinterpret_u64_f64 (r));
-  svbool_t cmp = svcmpge_n_u64 (pg, svreinterpret_u64_f64 (r), RangeVal);
+  /* Load some values in quad-word chunks to minimise memory access.  */
+  const svbool_t ptrue = svptrue_b64 ();
+  svfloat64_t shift = sv_f64 (d->shift);
+  svfloat64_t inv_pi_and_pi1 = svld1rq (ptrue, &d->inv_pi);
+  svfloat64_t pi2_and_pi3 = svld1rq (ptrue, &d->pi_2);
 
-  /* Load first two pio2-related constants to one vector.  */
-  svfloat64_t invpio2_and_pio2_1
-      = svld1rq_f64 (svptrue_b64 (), &d->inv_pi_over_2);
+  /* n = rint(x/pi).  */
+  svfloat64_t n = svmla_lane (shift, x, inv_pi_and_pi1, 0);
+  svuint64_t odd = svlsl_x (pg, svreinterpret_u64 (n), 63);
+  n = svsub_x (pg, n, shift);
 
-  /* n = rint(|x|/(pi/2)).  */
-  svfloat64_t q = svmla_lane_f64 (sv_f64 (d->shift), r, invpio2_and_pio2_1, 0);
-  svfloat64_t n = svsub_n_f64_x (pg, q, d->shift);
+  /* r = x - n*pi  (range reduction into -pi/2 .. pi/2).  */
+  svfloat64_t r = x;
+  r = svmls_lane (r, n, inv_pi_and_pi1, 1);
+  r = svmls_lane (r, n, pi2_and_pi3, 0);
+  r = svmls_lane (r, n, pi2_and_pi3, 1);
 
-  /* r = |x| - n*(pi/2)  (range reduction into -pi/4 .. pi/4).  */
-  r = svmls_lane_f64 (r, n, invpio2_and_pio2_1, 1);
-  r = svmls_n_f64_x (pg, r, n, d->pi_over_2_2);
-  r = svmls_n_f64_x (pg, r, n, d->pi_over_2_3);
+  /* sin(r) poly approx.  */
+  svfloat64_t r2 = svmul_x (pg, r, r);
+  svfloat64_t r3 = svmul_x (pg, r2, r);
+  svfloat64_t r4 = svmul_x (pg, r2, r2);
 
-  /* Final multiplicative factor: 1.0 or x depending on bit #0 of q.  */
-  svfloat64_t f = svtssel_f64 (r, svreinterpret_u64_f64 (q));
+  svfloat64_t t1 = svmla_x (pg, C (4), C (5), r2);
+  svfloat64_t t2 = svmla_x (pg, C (2), C (3), r2);
+  svfloat64_t t3 = svmla_x (pg, C (0), C (1), r2);
 
-  /* sin(r) poly approx.  */
-  svfloat64_t r2 = svtsmul_f64 (r, svreinterpret_u64_f64 (q));
-  svfloat64_t y = sv_f64 (0.0);
-  y = svtmad_f64 (y, r2, 7);
-  y = svtmad_f64 (y, r2, 6);
-  y = svtmad_f64 (y, r2, 5);
-  y = svtmad_f64 (y, r2, 4);
-  y = svtmad_f64 (y, r2, 3);
-  y = svtmad_f64 (y, r2, 2);
-  y = svtmad_f64 (y, r2, 1);
-  y = svtmad_f64 (y, r2, 0);
-
-  /* Apply factor.  */
-  y = svmul_f64_x (pg, f, y);
-
-  /* sign = y^sign.  */
-  y = svreinterpret_f64_u64 (
-      sveor_u64_x (pg, svreinterpret_u64_f64 (y), sign));
+  svfloat64_t y = svmla_x (pg, t1, C (6), r4);
+  y = svmla_x (pg, t2, y, r4);
+  y = svmla_x (pg, t3, y, r4);
+  y = svmla_x (pg, r, y, r3);
 
+  svbool_t cmp = svacle (pg, x, d->range_val);
+  cmp = svnot_z (pg, cmp);
   if (__glibc_unlikely (svptest_any (pg, cmp)))
-    return special_case (x, y, cmp);
-  return y;
+    return special_case (x,
+                         svreinterpret_f64 (sveor_z (
+                             svnot_z (pg, cmp), svreinterpret_u64 (y), odd)),
+                         cmp);
+
+  /* Copy sign.  */
+  return svreinterpret_f64 (sveor_z (pg, svreinterpret_u64 (y), odd));
 }
diff --git a/sysdeps/aarch64/fpu/sinf_sve.c b/sysdeps/aarch64/fpu/sinf_sve.c
index 4d2ce7a846..590881c14b 100644
--- a/sysdeps/aarch64/fpu/sinf_sve.c
+++ b/sysdeps/aarch64/fpu/sinf_sve.c
@@ -23,7 +23,7 @@ static const struct data
 {
   float poly[4];
   /* Pi-related values to be loaded as one quad-word and used with
-     svmla_lane_f32.  */
+     svmla_lane.  */
   float negpi1, negpi2, negpi3, invpi;
   float shift;
 } data = {
@@ -57,40 +57,42 @@ svfloat32_t SV_NAME_F1 (sin) (svfloat32_t x, const svbool_t pg)
 {
   const struct data *d = ptr_barrier (&data);
 
-  svfloat32_t ax = svabs_f32_x (pg, x);
-  svuint32_t sign = sveor_u32_x (pg, svreinterpret_u32_f32 (x),
-                                 svreinterpret_u32_f32 (ax));
-  svbool_t cmp = svcmpge_n_u32 (pg, svreinterpret_u32_f32 (ax), RangeVal);
+  svfloat32_t ax = svabs_x (pg, x);
+  svuint32_t sign
+      = sveor_x (pg, svreinterpret_u32 (x), svreinterpret_u32 (ax));
+  svbool_t cmp = svcmpge (pg, svreinterpret_u32 (ax), RangeVal);
 
   /* pi_vals are a quad-word of helper values - the first 3 elements contain
      -pi in extended precision, the last contains 1 / pi.  */
-  svfloat32_t pi_vals = svld1rq_f32 (svptrue_b32 (), &d->negpi1);
+  svfloat32_t pi_vals = svld1rq (svptrue_b32 (), &d->negpi1);
 
   /* n = rint(|x|/pi).  */
-  svfloat32_t n = svmla_lane_f32 (sv_f32 (d->shift), ax, pi_vals, 3);
-  svuint32_t odd = svlsl_n_u32_x (pg, svreinterpret_u32_f32 (n), 31);
-  n = svsub_n_f32_x (pg, n, d->shift);
+  svfloat32_t n = svmla_lane (sv_f32 (d->shift), ax, pi_vals, 3);
+  svuint32_t odd = svlsl_x (pg, svreinterpret_u32 (n), 31);
+  n = svsub_x (pg, n, d->shift);
 
   /* r = |x| - n*pi  (range reduction into -pi/2 .. pi/2).  */
   svfloat32_t r;
-  r = svmla_lane_f32 (ax, n, pi_vals, 0);
-  r = svmla_lane_f32 (r, n, pi_vals, 1);
-  r = svmla_lane_f32 (r, n, pi_vals, 2);
+  r = svmla_lane (ax, n, pi_vals, 0);
+  r = svmla_lane (r, n, pi_vals, 1);
+  r = svmla_lane (r, n, pi_vals, 2);
 
   /* sin(r) approx using a degree 9 polynomial from the Taylor series
      expansion. Note that only the odd terms of this are non-zero.  */
-  svfloat32_t r2 = svmul_f32_x (pg, r, r);
+  svfloat32_t r2 = svmul_x (pg, r, r);
   svfloat32_t y;
-  y = svmla_f32_x (pg, C (2), r2, C (3));
-  y = svmla_f32_x (pg, C (1), r2, y);
-  y = svmla_f32_x (pg, C (0), r2, y);
-  y = svmla_f32_x (pg, r, r, svmul_f32_x (pg, y, r2));
+  y = svmla_x (pg, C (2), r2, C (3));
+  y = svmla_x (pg, C (1), r2, y);
+  y = svmla_x (pg, C (0), r2, y);
+  y = svmla_x (pg, r, r, svmul_x (pg, y, r2));
 
   /* sign = y^sign^odd.  */
-  y = svreinterpret_f32_u32 (sveor_u32_x (pg, svreinterpret_u32_f32 (y),
-                                          sveor_u32_x (pg, sign, odd)));
+  sign = sveor_x (pg, sign, odd);
 
   if (__glibc_unlikely (svptest_any (pg, cmp)))
-    return special_case (x, y, cmp);
-  return y;
+    return special_case (x,
+                         svreinterpret_f32 (sveor_x (
+                             svnot_z (pg, cmp), svreinterpret_u32 (y), sign)),
+                         cmp);
+  return svreinterpret_f32 (sveor_x (pg, svreinterpret_u32 (y), sign));
 }