From patchwork Fri Jun 23 15:24:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Ramsay X-Patchwork-Id: 71601 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 24934385828E for ; Fri, 23 Jun 2023 15:26:00 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 24934385828E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1687533960; bh=dG7684NOn6cvsUxSl9CXlC0vuv3N12sk/pY9q91RBVc=; h=To:CC:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=dQAz+tCSQwmia2GcgoCmhXfoJ1LTu7r4FZdAK76s/3RwbcWaUROJdWyZ7TaNp8MPl neWn7NngYJDqhB07Fw7YPFe8UEd1iSRrzFLP0uxXA9VYRaD0mMQvxkkaWsf9G/0vIo olsPZL5EgnCV7gPyEsTUt7q6RQrYYp5MSdgtYBG4= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR04-DB3-obe.outbound.protection.outlook.com (mail-db3eur04on2058.outbound.protection.outlook.com [40.107.6.58]) by sourceware.org (Postfix) with ESMTPS id 059D83858C3A for ; Fri, 23 Jun 2023 15:24:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 059D83858C3A Received: from DBBPR09CA0031.eurprd09.prod.outlook.com (2603:10a6:10:d4::19) by DB9PR08MB8290.eurprd08.prod.outlook.com (2603:10a6:10:3de::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26; Fri, 23 Jun 2023 15:24:38 +0000 Received: from DBAEUR03FT023.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:d4:cafe::d1) by DBBPR09CA0031.outlook.office365.com (2603:10a6:10:d4::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26 via Frontend Transport; Fri, 23 Jun 2023 15:24:38 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DBAEUR03FT023.mail.protection.outlook.com (100.127.142.253) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6544.11 via Frontend Transport; Fri, 23 Jun 2023 15:24:38 +0000 Received: ("Tessian outbound 7c913606c6e6:v142"); Fri, 23 Jun 2023 15:24:38 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: fc2e5704032375eb X-CR-MTA-TID: 64aa7808 Received: from c0e9ae69340a.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 5AE4B991-8FE5-4093-A5E4-27DDCB1F5DA5.1; Fri, 23 Jun 2023 15:24:31 +0000 Received: from EUR01-HE1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id c0e9ae69340a.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Fri, 23 Jun 2023 15:24:31 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=fiE7n1eZ0j1J3rg+RvWcd54ktph8HqYqUaLzMSJCjhTIjRs120i9r9a1iQW2TDWMt4bHq8DhplSV0EBVMo2d1M3o4MsGkalMPMFJCS9eJ43Zd7pSpVUZ32UpUKlJ4AKHLtkQI/SEukggXtujk0xhXhVY1Msp04Z2ulEnTf+VdlpIRz4k4weMvwnlDfq4mJzlIblYWo5D2j7hkm/ryZhGlP78CK12Apmcgm1gQJhecLDhNA/8v7PFu2YQityyELEoPhuNmjoz31xJYnfcauVd9Ggs9aY2qsymtnc6pHUgwAQ7LnDyzgFf5ER+/azEJajIzWpgUyZdBwmzKCtuRwspAw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=dG7684NOn6cvsUxSl9CXlC0vuv3N12sk/pY9q91RBVc=; b=NAOJBQF3NZyPpMlJ1oylsJw1x+66ModYCMuGOU2L6wWH9tWl+wfJvOV5aDTlQD8avXhncYfUKqKhfcgu1MhIeuD+aYqTLK9343kjAFVKa+Eac72cVQExTxnxFJ5CpTm3fZeha/JIkvX+AiEbTExPEAu7drymC3AbrKOfZrr9sVeMo4s0wfbliq1N0O1w+Tr9UnFN9Xx02sYXXITQczCgQkcpW2Mw2h/ql1N8awzQk7b2B1ksJveMA9PEPbAYTniwc7JCKwCIm+I7D5cwpYx6ti2btmav/bo84qcFbczEU0Lf7BNo6wUINkKGHj/6m6XrsDFt/ZIgnGgCn/cFhjhIpg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none Received: from DB9PR06CA0012.eurprd06.prod.outlook.com (2603:10a6:10:1db::17) by PAXPR08MB7622.eurprd08.prod.outlook.com (2603:10a6:102:240::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26; Fri, 23 Jun 2023 15:24:28 +0000 Received: from DBAEUR03FT054.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:1db:cafe::62) by DB9PR06CA0012.outlook.office365.com (2603:10a6:10:1db::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26 via Frontend Transport; Fri, 23 Jun 2023 15:24:27 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by DBAEUR03FT054.mail.protection.outlook.com (100.127.142.218) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6521.26 via Frontend Transport; Fri, 23 Jun 2023 15:24:27 +0000 Received: from AZ-NEU-EX02.Emea.Arm.com (10.251.26.5) by AZ-NEU-EX04.Arm.com (10.251.24.32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Fri, 23 Jun 2023 15:24:27 +0000 Received: from AZ-NEU-EX04.Arm.com (10.251.24.32) by AZ-NEU-EX02.Emea.Arm.com (10.251.26.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Fri, 23 Jun 2023 15:24:26 +0000 Received: from vcn-man-apps.manchester.arm.com (10.32.108.22) by mail.arm.com (10.251.24.32) with Microsoft SMTP Server id 15.1.2507.23 via Frontend Transport; Fri, 23 Jun 2023 15:24:26 +0000 To: CC: Joe Ramsay Subject: [PATCH v3 2/4] aarch64: Add vector implementations of sin routines Date: Fri, 23 Jun 2023 16:24:21 +0100 Message-ID: <20230623152423.1683-2-Joe.Ramsay@arm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230623152423.1683-1-Joe.Ramsay@arm.com> References: <20230623152423.1683-1-Joe.Ramsay@arm.com> MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: DBAEUR03FT054:EE_|PAXPR08MB7622:EE_|DBAEUR03FT023:EE_|DB9PR08MB8290:EE_ X-MS-Office365-Filtering-Correlation-Id: 82adcb0f-ed0c-4cc4-efdd-08db73fdf574 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: CzmJSWnKxzDNSbjef/cD2WWzqiEIVVa6K6FlqGi8gL8uGfR2tVr6jQIq4slCNJOM3eE3qcwzX2CKqth2N0pGyHN5LLdHBCKa7zqApe5g6gLg5Bfmafyl+8AqpNwA79DludFJC9q7mUPfHVBTf63CkKAfQEZG1HMKVmqWPCAWaAAwX/rwxyGZYnriYiaqYMLlcgGWni5wKlFCOgUe4HhuwZ+Fx4fVFN6HwQJfxCtdQgTOkzjeLCIVB+PDC9JKKzwTE6nN2AQwr9u1vdcpRUTYY+QZelpIPscqNnc2I94KLdQ/cPWXVC45NtaFOEHTp/GLMkj2LOx19Q/OK3rRL9B8mbrXgOcAsllvDCAPtDRKpiJweQSTxBeSclAZ59F53+IcKAJq8IYIhr2Yq+r83MK/qzwAS6QVlDzaQx8fl/kb9ESBr4IL9GIR4JbHKtsrYr9KMcXZpNyE21nMgPYwYCobyLZ18ZWDd0n57U6QFINRSyISXi2QOIIE2XVRRxnljHeWCTHb1hUqb2OEnDY2WkDj4H6I+q4mcUnyFoQlBmTrsor0U32jH5a5WfO3jdjKRKtZJm3rH+OBuGvE8ufZCfjl0CIgSTKEPoxrWf/rwBhD+by1T1gPuKVVRlsa+rEMFJlwjx1R+HPPvqiK2bf4mDFnpBpji8NUdWlStHz5CnuY5iaynBqSq7Ps5SxhtfRmTVlECWoMVDShiOQHCbeNFizIqQoxvF/9rutjbsLCVXTIohjvgV8AY4Bcd2M3Ck7HL5TvpzF+gyMwFKNrVQ3Kg5D3+Q== X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230028)(4636009)(396003)(136003)(376002)(346002)(39860400002)(451199021)(36840700001)(40470700004)(46966006)(82310400005)(7696005)(6666004)(40460700003)(86362001)(82740400003)(1076003)(36860700001)(186003)(26005)(426003)(336012)(2616005)(47076005)(40480700001)(36756003)(356005)(81166007)(83380400001)(8676002)(8936002)(2906002)(30864003)(6916009)(316002)(4326008)(41300700001)(70206006)(70586007)(478600001)(5660300002)(2004002)(36900700001); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAXPR08MB7622 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DBAEUR03FT023.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: e527d2b2-f232-4c00-386b-08db73fdef14 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: mJYHwh/YZMkl3PxtqEYHfflQKaLbUMPaj0fQAOBm9yB0bD0HlP0KQ13u8GX02+GB6kqqo5JEm1AF+HFy8tm+HLYK0hUYJjjVUiPgk26N43JN7IPBS4WKwaL9ufHDLTC27vodEhqqrAbFa8hOPOnkiGbTVWIJiK4zSUSiow7kvrALWQSUD0efao24mVdA7Ot3aUrPm89yuNmVLzxe/qDFc3NkuG829+O+uhAME8JuDG3cJsI5LPBpvC5sP+DSbfgCFTbkAXvVUt9wxOl4jr3YzSxkN8o1Wt9OWGrlNisJxKukDPhJTsZfygYkYdVFJV0bPRaGYkOjZJKGIQDi6izz1eiKs3MOtLnNtPj0/8saABAQAGZEC9VjdtJJyTL7yyqG5hKcw1srqYjEqKjS6orVJy9NqwJbIaqiObjF1uLA8X5942PFsEtP7ynPOM9IDnqOJpRaeaSH0AJf70ZuwANlPlwEr3HX5cRECrMYxcKGEwslCIqluevXnTjIez9AwQmnlqLfDzKWtj7NXfcgPEmkCkbezjHL5b7vt3curz1nED9edF3W+ciVDHblbjk+p/cZpPuCSH6zoMQKGSsqb3t9oaSsJT/6zsLesrpqhregLuC7qx1KphkV13VDPqRFvT3CCOn/q+4j2Am7XhGC6EpbcPhdB85klSOfSwJMSD0yI5NEbfdmuHii0l+KatRWYmbQ0qY7tZ8VXkseynM89VM+VGvC9thGtqn7BHxLgGUdJ+2aOPHoaaHukcVQIVuS0Y3/ X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230028)(4636009)(346002)(136003)(39860400002)(376002)(396003)(451199021)(46966006)(36840700001)(40470700004)(40460700003)(2906002)(30864003)(7696005)(82310400005)(82740400003)(6666004)(81166007)(2616005)(83380400001)(336012)(426003)(1076003)(186003)(26005)(47076005)(41300700001)(36860700001)(86362001)(478600001)(40480700001)(316002)(70586007)(70206006)(36756003)(4326008)(6916009)(5660300002)(8676002)(8936002)(2004002); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 23 Jun 2023 15:24:38.5058 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 82adcb0f-ed0c-4cc4-efdd-08db73fdf574 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DBAEUR03FT023.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB9PR08MB8290 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Joe Ramsay via Libc-alpha From: Joe Ramsay Reply-To: Joe Ramsay Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. As previously, data tables are marked volatile or used via pointers to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. --- Changes to v2: * Mark SVE data tables as const, and use new barrier * Fix sign of sin(-0) for AdvSIMD routines. * Remove special handling for mathvec routines in tests, and this is no longer needed. * Newline for every routine in the Makefile. Thanks, Joe sysdeps/aarch64/fpu/Makefile | 12 +- sysdeps/aarch64/fpu/Versions | 4 + sysdeps/aarch64/fpu/bits/math-vector.h | 6 + sysdeps/aarch64/fpu/sin_advsimd.c | 105 ++++++++++++++++++ sysdeps/aarch64/fpu/sin_sve.c | 97 ++++++++++++++++ sysdeps/aarch64/fpu/sinf_advsimd.c | 98 ++++++++++++++++ sysdeps/aarch64/fpu/sinf_sve.c | 96 ++++++++++++++++ .../fpu/test-double-advsimd-wrappers.c | 1 + .../aarch64/fpu/test-double-sve-wrappers.c | 1 + .../aarch64/fpu/test-float-advsimd-wrappers.c | 1 + sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 1 + sysdeps/aarch64/libm-test-ulps | 8 ++ .../unix/sysv/linux/aarch64/libmvec.abilist | 4 + 13 files changed, 428 insertions(+), 6 deletions(-) create mode 100644 sysdeps/aarch64/fpu/sin_advsimd.c create mode 100644 sysdeps/aarch64/fpu/sin_sve.c create mode 100644 sysdeps/aarch64/fpu/sinf_advsimd.c create mode 100644 sysdeps/aarch64/fpu/sinf_sve.c diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile index 850cfb9012..9ceea35148 100644 --- a/sysdeps/aarch64/fpu/Makefile +++ b/sysdeps/aarch64/fpu/Makefile @@ -1,10 +1,10 @@ -float-advsimd-funcs = cos +libmvec-supported-funcs = cos \ + sin -double-advsimd-funcs = cos - -float-sve-funcs = cos - -double-sve-funcs = cos +float-advsimd-funcs = $(libmvec-supported-funcs) +double-advsimd-funcs = $(libmvec-supported-funcs) +float-sve-funcs = $(libmvec-supported-funcs) +double-sve-funcs = $(libmvec-supported-funcs) ifeq ($(subdir),mathvec) libmvec-support = $(addsuffix f_advsimd,$(float-advsimd-funcs)) \ diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions index 5222a6f180..d26b3968a9 100644 --- a/sysdeps/aarch64/fpu/Versions +++ b/sysdeps/aarch64/fpu/Versions @@ -1,8 +1,12 @@ libmvec { GLIBC_2.38 { _ZGVnN2v_cos; + _ZGVnN2v_sin; _ZGVnN4v_cosf; + _ZGVnN4v_sinf; _ZGVsMxv_cos; _ZGVsMxv_cosf; + _ZGVsMxv_sin; + _ZGVsMxv_sinf; } } diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h index a2f2277591..ad9c9945e8 100644 --- a/sysdeps/aarch64/fpu/bits/math-vector.h +++ b/sysdeps/aarch64/fpu/bits/math-vector.h @@ -50,7 +50,10 @@ typedef __SVBool_t __sv_bool_t; # define __vpcs __attribute__ ((__aarch64_vector_pcs__)) __vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t); +__vpcs __f32x4_t _ZGVnN4v_sinf (__f32x4_t); + __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t); +__vpcs __f64x2_t _ZGVnN2v_sin (__f64x2_t); # undef __ADVSIMD_VEC_MATH_SUPPORTED #endif /* __ADVSIMD_VEC_MATH_SUPPORTED */ @@ -58,7 +61,10 @@ __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t); #ifdef __SVE_VEC_MATH_SUPPORTED __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t); +__sv_f32_t _ZGVsMxv_sinf (__sv_f32_t, __sv_bool_t); + __sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t); +__sv_f64_t _ZGVsMxv_sin (__sv_f64_t, __sv_bool_t); # undef __SVE_VEC_MATH_SUPPORTED #endif /* __SVE_VEC_MATH_SUPPORTED */ diff --git a/sysdeps/aarch64/fpu/sin_advsimd.c b/sysdeps/aarch64/fpu/sin_advsimd.c new file mode 100644 index 0000000000..2e7cf3f59f --- /dev/null +++ b/sysdeps/aarch64/fpu/sin_advsimd.c @@ -0,0 +1,105 @@ +/* Double-precision vector (Advanced SIMD) sin function. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" + +static const volatile struct +{ + float64x2_t poly[7]; + float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3; +} data = { + /* Worst-case error is 2.8 ulp in [-pi/2, pi/2]. */ + .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7), + V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19), + V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33), + V2 (-0x1.9e9540300a1p-41) }, + + .range_val = V2 (0x1p23), + .inv_pi = V2 (0x1.45f306dc9c883p-2), + .pi_1 = V2 (0x1.921fb54442d18p+1), + .pi_2 = V2 (0x1.1a62633145c06p-53), + .pi_3 = V2 (0x1.c1cd129024e09p-106), + .shift = V2 (0x1.8p52), +}; + +#if WANT_SIMD_EXCEPT +# define TinyBound v_u64 (0x3000000000000000) /* asuint64 (0x1p-255). */ +# define Thresh v_u64 (0x1160000000000000) /* RangeVal - TinyBound. */ +#endif + +#define C(i) data.poly[i] + +static float64x2_t VPCS_ATTR NOINLINE +special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp) +{ + y = vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd)); + return v_call_f64 (sin, x, y, cmp); +} + +float64x2_t VPCS_ATTR V_NAME_D1 (sin) (float64x2_t x) +{ + float64x2_t n, r, r2, r3, r4, y, t1, t2, t3; + uint64x2_t odd, cmp, eqz; + +#if WANT_SIMD_EXCEPT + /* Detect |x| <= TinyBound or |x| >= RangeVal. If fenv exceptions are to be + triggered correctly, set any special lanes to 1 (which is neutral w.r.t. + fenv). These lanes will be fixed by special-case handler later. */ + uint64x2_t ir = vreinterpretq_u64_f64 (vabsq_f64 (x)); + cmp = vcgeq_u64 (vsubq_u64 (ir, TinyBound), Thresh); + r = vbslq_f64 (cmp, vreinterpretq_f64_u64 (cmp), x); +#else + r = x; + cmp = vcageq_f64 (data.range_val, x); + cmp = vceqzq_u64 (cmp); /* cmp = ~cmp. */ +#endif + eqz = vceqzq_f64 (x); + + /* n = rint(|x|/pi). */ + n = vfmaq_f64 (data.shift, data.inv_pi, r); + odd = vshlq_n_u64 (vreinterpretq_u64_f64 (n), 63); + n = vsubq_f64 (n, data.shift); + + /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */ + r = vfmsq_f64 (r, data.pi_1, n); + r = vfmsq_f64 (r, data.pi_2, n); + r = vfmsq_f64 (r, data.pi_3, n); + + /* sin(r) poly approx. */ + r2 = vmulq_f64 (r, r); + r3 = vmulq_f64 (r2, r); + r4 = vmulq_f64 (r2, r2); + + t1 = vfmaq_f64 (C (4), C (5), r2); + t2 = vfmaq_f64 (C (2), C (3), r2); + t3 = vfmaq_f64 (C (0), C (1), r2); + + y = vfmaq_f64 (t1, C (6), r4); + y = vfmaq_f64 (t2, y, r4); + y = vfmaq_f64 (t3, y, r4); + y = vfmaq_f64 (r, y, r3); + + /* Sign of 0 is discarded by polynomial, so copy it back here. */ + if (__glibc_unlikely (v_any_u64 (eqz))) + y = vbslq_f64 (eqz, x, y); + + if (__glibc_unlikely (v_any_u64 (cmp))) + return special_case (x, y, odd, cmp); + return vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd)); +} diff --git a/sysdeps/aarch64/fpu/sin_sve.c b/sysdeps/aarch64/fpu/sin_sve.c new file mode 100644 index 0000000000..c3f450d0ea --- /dev/null +++ b/sysdeps/aarch64/fpu/sin_sve.c @@ -0,0 +1,97 @@ +/* Double-precision vector (SVE) sin function. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" + +static const struct data +{ + double inv_pi, half_pi, inv_pi_over_2, pi_over_2_1, pi_over_2_2, pi_over_2_3, + shift; +} data = { + /* Polynomial coefficients are hard-wired in the FTMAD instruction. */ + .inv_pi = 0x1.45f306dc9c883p-2, + .half_pi = 0x1.921fb54442d18p+0, + .inv_pi_over_2 = 0x1.45f306dc9c882p-1, + .pi_over_2_1 = 0x1.921fb50000000p+0, + .pi_over_2_2 = 0x1.110b460000000p-26, + .pi_over_2_3 = 0x1.1a62633145c07p-54, + .shift = 0x1.8p52 +}; + +#define RangeVal 0x4160000000000000 /* asuint64 (0x1p23). */ + +static svfloat64_t NOINLINE +special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp) +{ + return sv_call_f64 (sin, x, y, cmp); +} + +/* A fast SVE implementation of sin based on trigonometric + instructions (FTMAD, FTSSEL, FTSMUL). + Maximum observed error in 2.52 ULP: + SV_NAME_D1 (sin)(0x1.2d2b00df69661p+19) got 0x1.10ace8f3e786bp-40 + want 0x1.10ace8f3e7868p-40. */ +svfloat64_t SV_NAME_D1 (sin) (svfloat64_t x, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + svfloat64_t r = svabs_f64_x (pg, x); + svuint64_t sign + = sveor_u64_x (pg, svreinterpret_u64_f64 (x), svreinterpret_u64_f64 (r)); + svbool_t cmp = svcmpge_n_u64 (pg, svreinterpret_u64_f64 (r), RangeVal); + + /* Load first two pio2-related constants to one vector. */ + svfloat64_t invpio2_and_pio2_1 + = svld1rq_f64 (svptrue_b64 (), &d->inv_pi_over_2); + + /* n = rint(|x|/(pi/2)). */ + svfloat64_t q = svmla_lane_f64 (sv_f64 (d->shift), r, invpio2_and_pio2_1, 0); + svfloat64_t n = svsub_n_f64_x (pg, q, d->shift); + + /* r = |x| - n*(pi/2) (range reduction into -pi/4 .. pi/4). */ + r = svmls_lane_f64 (r, n, invpio2_and_pio2_1, 1); + r = svmls_n_f64_x (pg, r, n, d->pi_over_2_2); + r = svmls_n_f64_x (pg, r, n, d->pi_over_2_3); + + /* Final multiplicative factor: 1.0 or x depending on bit #0 of q. */ + svfloat64_t f = svtssel_f64 (r, svreinterpret_u64_f64 (q)); + + /* sin(r) poly approx. */ + svfloat64_t r2 = svtsmul_f64 (r, svreinterpret_u64_f64 (q)); + svfloat64_t y = sv_f64 (0.0); + y = svtmad_f64 (y, r2, 7); + y = svtmad_f64 (y, r2, 6); + y = svtmad_f64 (y, r2, 5); + y = svtmad_f64 (y, r2, 4); + y = svtmad_f64 (y, r2, 3); + y = svtmad_f64 (y, r2, 2); + y = svtmad_f64 (y, r2, 1); + y = svtmad_f64 (y, r2, 0); + + /* Apply factor. */ + y = svmul_f64_x (pg, f, y); + + /* sign = y^sign. */ + y = svreinterpret_f64_u64 ( + sveor_u64_x (pg, svreinterpret_u64_f64 (y), sign)); + + if (__glibc_unlikely (svptest_any (pg, cmp))) + return special_case (x, y, cmp); + return y; +} diff --git a/sysdeps/aarch64/fpu/sinf_advsimd.c b/sysdeps/aarch64/fpu/sinf_advsimd.c new file mode 100644 index 0000000000..507d60ce9a --- /dev/null +++ b/sysdeps/aarch64/fpu/sinf_advsimd.c @@ -0,0 +1,98 @@ +/* Single-precision vector (Advanced SIMD) sin function. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" + +static const volatile struct +{ + float32x4_t poly[4]; + float32x4_t range_val, inv_pi, shift, pi_1, pi_2, pi_3; +} data = { + /* 1.886 ulp error. */ + .poly = { V4 (-0x1.555548p-3f), V4 (0x1.110df4p-7f), V4 (-0x1.9f42eap-13f), + V4 (0x1.5b2e76p-19f) }, + + .pi_1 = V4 (0x1.921fb6p+1f), + .pi_2 = V4 (-0x1.777a5cp-24f), + .pi_3 = V4 (-0x1.ee59dap-49f), + + .inv_pi = V4 (0x1.45f306p-2f), + .shift = V4 (0x1.8p+23f), + .range_val = V4 (0x1p20f) +}; + +#if WANT_SIMD_EXCEPT +# define TinyBound v_u32 (0x21000000) /* asuint32(0x1p-61f). */ +# define Thresh v_u32 (0x28800000) /* RangeVal - TinyBound. */ +#endif + +#define C(i) data.poly[i] + +static float32x4_t VPCS_ATTR NOINLINE +special_case (float32x4_t x, float32x4_t y, uint32x4_t odd, uint32x4_t cmp) +{ + /* Fall back to scalar code. */ + y = vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd)); + return v_call_f32 (sinf, x, y, cmp); +} + +float32x4_t VPCS_ATTR V_NAME_F1 (sin) (float32x4_t x) +{ + float32x4_t n, r, r2, y; + uint32x4_t odd, cmp, eqz; + +#if WANT_SIMD_EXCEPT + uint32x4_t ir = vreinterpretq_u32_f32 (vabsq_f32 (x)); + cmp = vcgeq_u32 (vsubq_u32 (ir, TinyBound), Thresh); + /* If fenv exceptions are to be triggered correctly, set any special lanes + to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by + special-case handler later. */ + r = vbslq_f32 (cmp, vreinterpretq_f32_u32 (cmp), x); +#else + r = x; + cmp = vcageq_f32 (data.range_val, x); + cmp = vceqzq_u32 (cmp); /* cmp = ~cmp. */ +#endif + eqz = vceqzq_f32 (x); + + /* n = rint(|x|/pi) */ + n = vfmaq_f32 (data.shift, data.inv_pi, r); + odd = vshlq_n_u32 (vreinterpretq_u32_f32 (n), 31); + n = vsubq_f32 (n, data.shift); + + /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2) */ + r = vfmsq_f32 (r, data.pi_1, n); + r = vfmsq_f32 (r, data.pi_2, n); + r = vfmsq_f32 (r, data.pi_3, n); + + /* y = sin(r) */ + r2 = vmulq_f32 (r, r); + y = vfmaq_f32 (C (2), C (3), r2); + y = vfmaq_f32 (C (1), y, r2); + y = vfmaq_f32 (C (0), y, r2); + y = vfmaq_f32 (r, vmulq_f32 (y, r2), r); + + /* Sign of 0 is discarded by polynomial, so copy it back here. */ + if (__glibc_unlikely (v_any_u32 (eqz))) + y = vbslq_f32 (eqz, x, y); + + if (__glibc_unlikely (v_any_u32 (cmp))) + return special_case (x, y, odd, cmp); + return vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd)); +} diff --git a/sysdeps/aarch64/fpu/sinf_sve.c b/sysdeps/aarch64/fpu/sinf_sve.c new file mode 100644 index 0000000000..4d2ce7a846 --- /dev/null +++ b/sysdeps/aarch64/fpu/sinf_sve.c @@ -0,0 +1,96 @@ +/* Single-precision vector (SVE) sin function. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" + +static const struct data +{ + float poly[4]; + /* Pi-related values to be loaded as one quad-word and used with + svmla_lane_f32. */ + float negpi1, negpi2, negpi3, invpi; + float shift; +} data = { + .poly = { + /* Non-zero coefficients from the degree 9 Taylor series expansion of + sin. */ + -0x1.555548p-3f, 0x1.110df4p-7f, -0x1.9f42eap-13f, 0x1.5b2e76p-19f + }, + .negpi1 = -0x1.921fb6p+1f, + .negpi2 = 0x1.777a5cp-24f, + .negpi3 = 0x1.ee59dap-49f, + .invpi = 0x1.45f306p-2f, + .shift = 0x1.8p+23f +}; + +#define RangeVal 0x49800000 /* asuint32 (0x1p20f). */ +#define C(i) sv_f32 (d->poly[i]) + +static svfloat32_t NOINLINE +special_case (svfloat32_t x, svfloat32_t y, svbool_t cmp) +{ + return sv_call_f32 (sinf, x, y, cmp); +} + +/* A fast SVE implementation of sinf. + Maximum error: 1.89 ULPs. + This maximum error is achieved at multiple values in [-2^18, 2^18] + but one example is: + SV_NAME_F1 (sin)(0x1.9247a4p+0) got 0x1.fffff6p-1 want 0x1.fffffap-1. */ +svfloat32_t SV_NAME_F1 (sin) (svfloat32_t x, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + svfloat32_t ax = svabs_f32_x (pg, x); + svuint32_t sign = sveor_u32_x (pg, svreinterpret_u32_f32 (x), + svreinterpret_u32_f32 (ax)); + svbool_t cmp = svcmpge_n_u32 (pg, svreinterpret_u32_f32 (ax), RangeVal); + + /* pi_vals are a quad-word of helper values - the first 3 elements contain + -pi in extended precision, the last contains 1 / pi. */ + svfloat32_t pi_vals = svld1rq_f32 (svptrue_b32 (), &d->negpi1); + + /* n = rint(|x|/pi). */ + svfloat32_t n = svmla_lane_f32 (sv_f32 (d->shift), ax, pi_vals, 3); + svuint32_t odd = svlsl_n_u32_x (pg, svreinterpret_u32_f32 (n), 31); + n = svsub_n_f32_x (pg, n, d->shift); + + /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */ + svfloat32_t r; + r = svmla_lane_f32 (ax, n, pi_vals, 0); + r = svmla_lane_f32 (r, n, pi_vals, 1); + r = svmla_lane_f32 (r, n, pi_vals, 2); + + /* sin(r) approx using a degree 9 polynomial from the Taylor series + expansion. Note that only the odd terms of this are non-zero. */ + svfloat32_t r2 = svmul_f32_x (pg, r, r); + svfloat32_t y; + y = svmla_f32_x (pg, C (2), r2, C (3)); + y = svmla_f32_x (pg, C (1), r2, y); + y = svmla_f32_x (pg, C (0), r2, y); + y = svmla_f32_x (pg, r, r, svmul_f32_x (pg, y, r2)); + + /* sign = y^sign^odd. */ + y = svreinterpret_f32_u32 (sveor_u32_x (pg, svreinterpret_u32_f32 (y), + sveor_u32_x (pg, sign, odd))); + + if (__glibc_unlikely (svptest_any (pg, cmp))) + return special_case (x, y, cmp); + return y; +} diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c index cb45fd3298..4af97a25a2 100644 --- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c @@ -24,3 +24,4 @@ #define VEC_TYPE float64x2_t VPCS_VECTOR_WRAPPER (cos_advsimd, _ZGVnN2v_cos) +VPCS_VECTOR_WRAPPER (sin_advsimd, _ZGVnN2v_sin) diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c index cf72ef83b7..64c790adc5 100644 --- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c @@ -33,3 +33,4 @@ } SVE_VECTOR_WRAPPER (cos_sve, _ZGVsMxv_cos) +SVE_VECTOR_WRAPPER (sin_sve, _ZGVsMxv_sin) diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c index fa146862b0..50e776b952 100644 --- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c @@ -24,3 +24,4 @@ #define VEC_TYPE float32x4_t VPCS_VECTOR_WRAPPER (cosf_advsimd, _ZGVnN4v_cosf) +VPCS_VECTOR_WRAPPER (sinf_advsimd, _ZGVnN4v_sinf) diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c index bc26558c62..7355032929 100644 --- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c @@ -33,3 +33,4 @@ } SVE_VECTOR_WRAPPER (cosf_sve, _ZGVsMxv_cosf) +SVE_VECTOR_WRAPPER (sinf_sve, _ZGVsMxv_sinf) diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps index 07da4ab843..4145662b2d 100644 --- a/sysdeps/aarch64/libm-test-ulps +++ b/sysdeps/aarch64/libm-test-ulps @@ -1257,11 +1257,19 @@ double: 1 float: 1 ldouble: 2 +Function: "sin_advsimd": +double: 2 +float: 1 + Function: "sin_downward": double: 1 float: 1 ldouble: 3 +Function: "sin_sve": +double: 2 +float: 1 + Function: "sin_towardzero": double: 1 float: 1 diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist index 13af421af2..a4c564859c 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -1,4 +1,8 @@ GLIBC_2.38 _ZGVnN2v_cos F +GLIBC_2.38 _ZGVnN2v_sin F GLIBC_2.38 _ZGVnN4v_cosf F +GLIBC_2.38 _ZGVnN4v_sinf F GLIBC_2.38 _ZGVsMxv_cos F GLIBC_2.38 _ZGVsMxv_cosf F +GLIBC_2.38 _ZGVsMxv_sin F +GLIBC_2.38 _ZGVsMxv_sinf F